Data Validation/Cleansing Tool Query

We are currently using SAS to do validation.  We write programs to check
things like value ranges, whether all fields are present, etc.
Since this is a clinical trials environment, it is also necessary to check
across records for visit sequence, missing visits, etc.
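
For readers who have not seen checks of this kind, here is a minimal sketch of the two levels described above: within-record checks (required fields, ranges) and cross-record checks (visit sequence, missing visits). It is written in plain Python rather than SAS purely for illustration, and the field names, the blood-pressure limits, and the three-visit schedule are all made-up assumptions, not anything from a real protocol.

```python
from collections import defaultdict

# Hypothetical study setup: field names, limits, and visit schedule are
# illustrative assumptions only.
REQUIRED_FIELDS = ("subject_id", "visit", "sbp")
RANGES = {"sbp": (70, 250)}      # assumed plausible systolic BP limits, mmHg
EXPECTED_VISITS = (1, 2, 3)      # assumed scheduled visit numbers


def check_records(records):
    """Return a list of (subject_id, message) findings for a list of dicts."""
    findings = []
    visits_by_subject = defaultdict(list)

    for rec in records:
        subject = rec.get("subject_id")

        # Within-record check: every required field is present and non-missing.
        for field in REQUIRED_FIELDS:
            if rec.get(field) is None:
                findings.append((subject, f"missing field: {field}"))

        # Within-record check: numeric values fall inside their allowed range.
        for field, (low, high) in RANGES.items():
            value = rec.get(field)
            if value is not None and not (low <= value <= high):
                findings.append((subject, f"{field}={value} outside {low}-{high}"))

        # Collect visit numbers in file order for the cross-record checks.
        if rec.get("visit") is not None:
            visits_by_subject[subject].append(rec["visit"])

    # Cross-record checks: visit order and missing visits per subject.
    for subject, visits in visits_by_subject.items():
        if visits != sorted(visits):
            findings.append((subject, "visits out of sequence"))
        for missing in sorted(set(EXPECTED_VISITS) - set(visits)):
            findings.append((subject, f"missing visit: {missing}"))

    return findings
```

Because the rules live in the three constants at the top rather than in the logic, a non-programmer could in principle maintain them in a spreadsheet or config file, which is roughly what a packaged tool would offer.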

While we have a lot of this packaged into macros, it seems to me that
there should be tools available that allow non-programmers to do a lot
(preferably, all) of this.  It seems a waste to need programmers to do
something so low-level.

Anyone have suggestions for products that might fill the bill?

TIA.

Jonathan
Reply jgoldberg (119) 1/4/2010 8:43:29 PM

If you have macros defined for it already, then a non-programmer can do it
trivially.

I would, however, disagree about it being a waste; a data-savvy programmer can
be highly useful in data cleaning, as it's not necessarily trivial to make
decisions and/or see issues that require additional cleaning steps.  Trivial
data cleaning is, well, trivial, and shouldn't take an appreciable amount of
a programmer's actual physical time; data cleaning that is not truly
trivial, but instead requires analysis, should be done by a programmer, in
my book.

-Joe

Reply snoopy369 (1752) 1/4/2010 9:02:15 PM


Have to agree with the others here, but if you ever move to SDD (SAS Drug
Development), then SAS does an excellent job of doing this for programmers
and non-programmers.



Reply sethstjames (68) 1/4/2010 9:40:38 PM

It would seem as though a reference to the following book might be in order
here:

Cody, R. (1999).  Cody's Data Cleaning Techniques Using SAS Software.  Cary,
NC: SAS Institute Inc.

HTH
Dennis Fisher

Dennis G. Fisher, Ph.D.
Professor and Director
Center for Behavioral Research and Services
California State University, Long Beach
1090 Atlantic Avenue
Long Beach, CA 90813
tel: 562-495-2330 x121
fax: 562-983-1421
http://www.csulb.edu/centers/cbrs


Reply dfisher (224) 1/4/2010 9:45:22 PM

Jonathan,

I have to both agree and disagree with my colleagues; for the most part,
however, I agree with everything they said.

Data validation is FAR from being trivial and, at the risk of offending some
of my colleagues, shouldn't be left solely to the responsibility of
programmers.

Sure, you can write or buy routines for doing many of the tasks, but a lot
of validity checks require business knowledge that programmers might not
have and often require the talents of staff whose salaries are even higher
(believe it or not SAS programmers are not necessarily the highest paid
employees in some organizations).

Are correct codes used? Are entries reasonable and consistent? Over time can anomalies or unexpected patterns be identified?

Those questions are all components of data validity and can require anything
from running and reviewing the results of a simple algorithm, to comparing
differences between statistical models based on samples from the data.
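
As a rough illustration of the "simple algorithm" end of that range, the sketch below checks values against a code list and flags numeric entries that sit far from the median. It is plain Python for illustration only; the code list, the function names, and the threshold of 3 are assumptions, and deciding what a flagged value actually means still takes someone who knows the data.

```python
from statistics import median

VALID_SEX_CODES = {"M", "F", "U"}   # hypothetical code list


def invalid_codes(values, valid_codes):
    """Return the distinct values that are not in the agreed code list."""
    return sorted({v for v in values if v not in valid_codes})


def flag_outliers(values, threshold=3.0):
    """Flag values far from the median, scaled by the median absolute deviation."""
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        return []   # no spread to judge against, so nothing is flagged
    return [v for v in values if abs(v - centre) / mad > threshold]
```

The output of either function is only a list of candidates for review, not a verdict.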

Who should do the work, I think, depends upon which specific task is being
done, the available staff, and the skills required.

Art
Reply art297 (4237) 1/4/2010 11:35:19 PM

Art, I absolutely agree re: business knowledge.  I suspect it has a lot to
do with the corporate culture of one's company [or university etc. etc.]; in
mine, the programmers are usually the ones with the business knowledge of
the data.  Sometimes the business end of the group will have good data
skills, and we will involve them 100% in data validation, either directly
[by running reports out for them and/or giving them direct data access] or
indirectly [by asking questions].  However, that's certainly not always
true; in those cases, I'm the one who knows the data in and out, and I make
most of the decisions.  I'd much prefer that not to be the case ever - it's
a lot better when you have more eyes on the data - but sometimes it is the
best case.

I do have the advantage though of being primarily tasked with particular
projects.  In a corporate culture where programmers work on any given
project and projects don't have a specific programmer assigned to them, that
lack of continuous business knowledge certainly would affect whose
responsibility ultimately data cleaning/validation should be.

-Joe

Reply snoopy369 (1752) 1/4/2010 11:43:04 PM
