Dataset Comparison Using Proc Compare

  • Follow


Hi All:


I have two adverse events data sets Ae_old and Ae_New .. I  am looking
to compare Aeterm (Term entered in the CRF) to see if there were any
additions/deletions or changes in the AEterm in Ae_new from Ae_old
dataset and create a flag in the Ae_new  to know which data point was
added (flag = 1) ,which data point was deleted (flag = 2) and which
data point is changed(flag = 3)

I am using this proc compare .


proc compare base = ae_old compare = ae_new OUTNOEQUAL  OUTCOMP
OUTBASE /*OUTDIF*/ NOPRINT out = Ae_diff;
   id subject  ;
   var aeterm ;
run;

but I am having difficulty to create a flags .Any suggestions

Thanks in advance

0
Reply Al 4/8/2010 8:41:38 PM

On Apr 8, 1:41=A0pm, Al <ali6...@gmail.com> wrote:
> Hi All:
>
> I have two adverse events data sets Ae_old and Ae_New .. I =A0am looking
> to compare Aeterm (Term entered in the CRF) to see if there were any
> additions/deletions or changes in the AEterm in Ae_new from Ae_old
> dataset and create a flag in the Ae_new =A0to know which data point was
> added (flag =3D 1) ,which data point was deleted (flag =3D 2) and which
> data point is changed(flag =3D 3)
>
> I am using this proc compare .
>
> proc compare base =3D ae_old compare =3D ae_new OUTNOEQUAL =A0OUTCOMP
> OUTBASE /*OUTDIF*/ NOPRINT out =3D Ae_diff;
> =A0 =A0id subject =A0;
> =A0 =A0var aeterm ;
> run;
>
> but I am having difficulty to create a flags .Any suggestions
>
> Thanks in advance

Hi,

If you have a primary key in both datasets and keep only the key and
aeterm the following code will identify inserts, deletes and updates

For a cleaner version
  You might want to take a look at

/* T003300 GIVEN TWO DATASETS CREATE AND APPLY THE TRANSACTION DATASET
THAT WILL MAKE THE TABLES THE SAME

http://homepage.mac.com/magdelina/.Public/utl.html
utl_tipweb.txt

OLD DATA
  |       SSN         JOB            ACTIVE        |
    |                                                |
    |      001001110  carpenter1       YES           |
    |      002001110  plumber1         YES           |
    |      003001110  mason1           YES           |
    |      004001110  plumber1         YES           |
    |      005001110  electrician1     YES           |
    |      006001110  mason3           YES           |
    |      008001110  mason4           NO            |
    |                                                |
    |   New table ( New data not in RDBMS )          |
    |   (ie NUSSNJOB)                                |
    |                                                |
    |       SSN         JOB            ACTIVE        |
    |                                                |
    |      001001110  carpenter1       YES           |
    |      002001110  plumber1         YES           |
    |      003001110  painter1         YES           |
    |      004001110  plumber1         YES           |
    |      005001110  electrician1     YES           |
    |      007001110  painter2         YES           |

Here is what the code gives you


 |  Transaction data set   to make onld data look like new data

You will just have a primary key(SSN)  and AETERM
Type will tell you if a AETERM was added, deleted or updated.


                       |
    |                                                |
    |     SSN       JOB     ACTIVE          __TYPE   |
    |                                                |
    |  006001110    mason3       YES       DELETE    |
    |  008001110    mason4       NO        DELETE    |
    |  007001110    painter2     YES       INSERT    |
    |  003001110    painter1     YES       UPDATE








=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D


*----------------------------------------------------------*\
|  Make oldrep look exactly like newrep by applying          |
|  the delta dateset using utldmod                           |
|  utl_update is the same as macro above                     |
|  Transaction dataset must contain __type variable          |
|  with the value UPDATE, INSERT or DELETE                   |
\*----------------------------------------------------------*/
%macro utl_update
   (
      master=3D
     ,transaction=3D
     ,key=3D
    )
    / des=3D"Update Insert Delete in Sybase Table";

 /*-------------------------------------------------*\
 |  This object applyes a transaction dataset to     |
 |  a master RDBMS table or SAS dataset. The result  |
 |  is an updated master ( RDBMS data )              |
 \*-------------------------------------------------*/

 /*----------------------------------------------*\
 |  IPO                                           |
 |    Inputs                                      |
 |    =3D=3D=3D=3D=3D=3D                                      |
 |      transaction - Transaction table must have |
 |               variable type which can          |
 |               have 3 values (SAS dataset)      |
 |               DELETE -- Delete Row in Master   |
 |               INSERT -- Insert Row in Master   |
 |               UPDATE -- Update Row in Master   |
 |      key   - unique index                      |
 |                                                |
 |    Process  - Modify SAS/RDBMS Table           |
 |    =3D=3D=3D=3D=3D=3D     based on transaction file        |
 |                                                |
 |    Output   - master Modified                  |
 |    =3D=3D=3D=3D=3D=3D                                      |
 |    Transaction table can be in work library    |
 \*----------------------------------------------*/

 data &master;
   modify &master &transaction;
   by &key;
   select (_iorc_);
     when (%sysrc(_dsenmr))   /* nomatch in MASTER _Add */
       do;
          if __type=3D"INSERT" then output &master;
          *put "Nomatch" __type=3D;
          _error_=3D0;
       end;
     when (%sysrc(_sok))   /* matched Update or Delete */
       do;
          *put "Matched " __type=3D;
          if      __type=3D"DELETE" then remove  &master;
          else if __type=3D"UPDATE" then replace &master;
          _error_=3D0;
       end;
     otherwise
       do;
          put "Unexpected ERROR Condition: _IORC_ =3D" _IORC_;
          _error_=3D0;
       end;
   end;
run;
%mend utl_update;

%macro utl_delta
    (
     uinmem1 =3Dwork.oldrep,       /* Last Months Data */
     uinmem2 =3Dwork.newrep,       /* Current Month Data */
     uinkey  =3Drep_socs,           /* primary unique key both tables */
     uotmem1 =3Drepdelta,         /* delta tble for RDBMS update */
     uotmem2 =3D repsame           /* records that do not change  */
    )
    / des =3D "Build delta SAS table for RDBMS update";
    /*----------------------------------------------*\
    |  WIN95 SAS611   --  UNIX SAS611 SOLARIS 2.5    |
    |                                                |
    |                                                |
    |  Create a delta ( transaction ) dataset        |
    |  based on a comparison of old and new data.    |
    |                                                |
    |  Use this new delta dataset to bring older     |
    |  RDBMS table up to date. ( invoke utldmod )    |
    |                                                |
    |  Proc compare new feed against updated RDBMS   |
    |  Result should be an exact comparison          |
    |                                                |
    |  You are probably asking why not just drop     |
    |  the RDBMS table and load the new table.       |
    |                                                |
    |  1. As a rule we keep SAS image tables         |
    |     of most read only RDBMS tables.            |
    |     And it is very easy to do detailed analysis|
    |     such as this before any major sales        |
    |     representative alignment.                  |
    |                                                |
    |  2. The delta dataset is very useful for QC,   |
    |     before updating RDBMS tables.              |
    |                                                |
    |  3. Table may reside in as many as three       |
    |     different databases on different platforms.|
    |     (Oracle, Sybase, Watcom, MS-Access)        |
    |     This plays to SASes strength               |
    |                                                |
    |  This code represents a slightly new           |
    |  methodology and as such has not been          |
    |  completely tested. CompuCraft would           |
    |  greatly appreciate any feedback.              |
    \*----------------------------------------------*/

    /*%^&*------------------------------------------*\
    | Description:                                   |
    |  Create a delta ( transaction ) dataset        |
    |  based on new data.                            |
    |                                                |
    |  This code creates the delta ( transaction )   |
    |  dataset.                                      |
    |                                                |
    |  IPO                                           |
    |   INPUTS                                       |
    |   =3D=3D=3D=3D=3D=3D                                       |
    |   Old table ( previous month - in RDBMS )      |
    |   (ie OLSSNJOB)                                |
    |                                                |
    |       SSN         JOB            ACTIVE        |
    |                                                |
    |      001001110  carpenter1       YES           |
    |      002001110  plumber1         YES           |
    |      003001110  mason1           YES           |
    |      004001110  plumber1         YES           |
    |      005001110  electrician1     YES           |
    |      006001110  mason3           YES           |
    |      008001110  mason4           NO            |
    |                                                |
    |   New table ( New data not in RDBMS )          |
    |   (ie NUSSNJOB)                                |
    |                                                |
    |       SSN         JOB            ACTIVE        |
    |                                                |
    |      001001110  carpenter1       YES           |
    |      002001110  plumber1         YES           |
    |      003001110  painter1         YES           |
    |      004001110  plumber1         YES           |
    |      005001110  electrician1     YES           |
    |      007001110  painter2         YES           |
    |                                                |
    |   PROCESS                                      |
    |   =3D=3D=3D=3D=3D=3D=3D                                      |
    |    Extract the old data from the RDBMS.        |
    |                                                |
    |    Union old and new data. Put an indicator    |
    |    variable on union. This variable            |
    |    identifies the origin of record ( old/new). |
    |                                                |
    |    Sort the union table on all fields.         |
    |                                                |
    |    If record appears only in the old table     |
    |    then deactivate the record. ( DELETE )      |
    |                                                |
    |    If a record appears only in the new data    |
    |    ( at least one field makes record diff)     |
    |    and the key appears in both new and old     |
    |    data then perform an update ( UPDATE )      |
    |                                                |
    |    If a record appears only in the new data    |
    |    and the key does not appear in old table    |
    |    then perform an insert ( INSERT )           |
    |                                                |
    |    If the record is exactly the same in        |
    |    both tables then leave it alone             |
    |                                                |
    |   OUTPUT                                       |
    |   =3D=3D=3D=3D=3D=3D                                       |
    |  Transaction data set                          |
    |                                                |
    |     SSN       JOB     ACTIVE          __TYPE   |
    |                                                |
    |  006001110    mason3       YES       DELETE    |
    |  008001110    mason4       NO        DELETE    |
    |  007001110    painter2     YES       INSERT    |
    |  003001110    painter1     YES       UPDATE    |
    |                                                |
    \*%^&*------------------------------------------*/

 /*  for testing without macro
 %let uinmem1 =3Doldrep;
 %let uinmem2 =3Dnewrep;
 %let uinkey  =3Drep_socs;
 %let uotmem1 =3Drepdelta;
 %let uotmem2=3D repsame;
 */

 %put %sysfunc(ifc(%sysevalf(%superq(uinmem1)=3D,boolean),  **** Please
Provide Previous dataset                  ,));
 %put %sysfunc(ifc(%sysevalf(%superq(uinmem2)=3D,boolean),  **** Please
Provide Current dataset                   ,));
 %put %sysfunc(ifc(%sysevalf(%superq(uinkey)=3D,boolean),**** Please
Provide primary unique key both tables    ,));
 %put %sysfunc(ifc(%sysevalf(%superq(uotmem1)=3D,boolean),**** Please
Provide transaction records dataset       ,));
 %put %sysfunc(ifc(%sysevalf(%superq(uotmem2)=3D,boolean), **** Please
Provide records that do not change dataset,));

 %if %eval(
    %sysfunc(ifc(%sysevalf(%superq(uinmem1)=3D,boolean),1,0)) +
    %sysfunc(ifc(%sysevalf(%superq(uinmem2)=3D,boolean),1,0)) +
    %sysfunc(ifc(%sysevalf(%superq(uinkey)=3D,boolean),1,0)) +
    %sysfunc(ifc(%sysevalf(%superq(uotmem1)=3D,boolean),1,0)) +
    %sysfunc(ifc(%sysevalf(%superq(uotmem2)=3D,boolean),1,0))
    ) eq 0 %then %do;

    data utldlta1
         (
          label =3D "Union of current & previous month"
         )
         / view=3Dutldlta1;
       retain &uinkey;
       set  &uinmem1 ( in =3D prevmnth )
            &uinmem2 ( in =3D currmnth );
       if prevmnth then __split =3D "OLD";
       else __split =3D "NEW";
    run;
    proc sql;
        select
            name into : ucols separated by ' '
        from
            dictionary.columns
         where
            libname =3D %upcase('work')         and
            memname =3D %upcase ( 'utldlta1' )  and
            name not eq %upcase( '__split' );
    quit;
    %put ucols =3D &ucols;
    proc sort data =3D utldlta1  out =3D utldlta2;
        by &ucols __split;
    run;
    %let uwrds =3D %sysfunc(countw(&ucols));
    %let ulastcol =3D %scan (&ucols, &uwrds );
    data utldlta3
         (
          label =3D "Transactions"
         )
         &uotmem2
         (
          label =3D "No change"
         );
        set utldlta2;
        by &ucols;
        retain __split __type;
        select;
            /* Only in previous month */
            when ( ( first.&ulastcol eq last.&ulastcol ) and __split =3D
"OLD") do;
                if ( first.&uinkey eq last.&uinkey ) then do; /*
unique record */
                    __type=3D"DELETE";                          /* key
unique    */
                    output utldlta3;                          /* OLD
data only */
                end;
                else do;                                      /*
unique record */
                    __type =3D "UPDATE";                        /* key
same      */
                    output &uotmem2;                      /* other
field changed */
                end;
            end;
            /* Only in new month */
            when ( first.&ulastcol eq last.&ulastcol ) do;    /*
unique record */
                if ( first.&uinkey eq last.&uinkey ) then do; /* new
key       */
                    __type =3D "INSERT";
                    output utldlta3;
                end;
                else do;                                      /*
unique record */
                    __type =3D "UPDATE";                        /* same
key      */
                    output utldlta3;                          /* new
data only */
                end;
            end;

            /* Duplicate records same in both months */
            when ( first.&ulastcol ne last.&ulastcol ) do;    /*
duplicate record  */
                output &uotmem2;
            end;
        otherwise put "ERROR =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D> NEVER SAY N=
EVER
<=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D ERROR";
        end;
    run;
    proc sort data=3Dutldlta3
        out=3D&uotmem1
        (
        label =3D "Transaction dataset"
        index =3D ( &uinkey   / unique )
        );
    by __type;
    run;
    %macro utl_nlobs(dsn);
       %let dsid=3D%sysfunc(open(&dsn));%sysfunc(attrn(&dsid,nlobs))
%let rc=3D%sysfunc(close(&dsid));
    %mend utl_lnobs;
    proc print data=3D&uinmem1(obs=3D30) width=3Dminimum noobs;
    title "Old RDBMS Data obs=3D%utl_nlobs(&uinmem1)";
    run;
    proc print data=3D&uinmem2(obs=3D30) width=3Dminimum noobs;
    title "New Feed Data obs=3D%utl_nlobs(&uinmem2)";
    run;
    proc print data=3D&uotmem1(obs=3D30) width=3Dminimum noobs;
    title "Transaction data set obs=3D%utl_nlobs(&uotmem1)";
    run;
    proc print data=3D&uotmem2(obs=3D30 drop=3D__type) width=3Dminimum noob=
s;
    title "No change data set obs=3D%utl_nlobs(&uotmem2)";
    run;
    /*----------------------------------------------------------*\
    |  Make oldrep look exactly like newrep by applying          |
    |  the delta dateset using utldmod                           |
    \*----------------------------------------------------------*/

    /*

   data oldrep;
     input rep_socs : $9. JOB  : $16. ACTIVE  : $3.;
   cards;
      001001110  carpenter1       YES
      002001110  plumber1         YES
      003001110  mason1           YES
      004001110  plumber1         YES
      005001110  electrician1     YES
      006001110  mason3           YES
      008001110  mason4           NO
   ;
   run;
   data newrep;
     input rep_socs : $9. JOB  : $16. ACTIVE  : $3.;
   cards;
      001001110  carpenter1       YES
      002001110  plumber1         YES
      003001110  painter1         YES
      004001110  plumber1         YES
      005001110  electrician1     YES
      007001110  painter2         YES
   ;
   run;

   %utl_delta
    (
     uinmem1 =3Doldrep,
     uinmem2 =3Dnewrep,
     uinkey  =3Drep_socs,
     uotmem1 =3Drepdelta,
     uotmem2=3D repsame
    );
    %utl_delta;

    proc sort data=3Ddat.oldrep out=3Dol;by rep_socs;run;
    proc sort data=3Ddat.newrep out=3Dnu;by rep_socs;run;

    title "Indepth comparison Updated RDMS table with New Feed";

    proc compare data=3Dol compare=3Dnu;
    run;
    */
 %end; /* end macro argument checks */

%mend utl_delta;

data oldrep;  /* rep_socs is primary ket */
  input rep_socs : $9. JOB  : $16. ACTIVE  : $3.;
cards;
   001001110  carpenter1       YES
   002001110  plumber1         YES
   003001110  mason1           YES
   004001110  plumber1         YES
   005001110  electrician1     YES
   006001110  mason3           YES
   008001110  mason4           NO
;
run;

data newrep;
  input rep_socs : $9. JOB  : $16. ACTIVE  : $3.;
cards;
   001001110  carpenter1       YES
   002001110  plumber1         YES
   003001110  painter1         YES
   004001110  plumber1         YES
   005001110  electrician1     YES
   007001110  painter2         YES
;
run;

%utl_delta
    (
     uinmem1 =3Doldrep,       /* Last Months Data */
     uinmem2 =3Dnewrep,       /* Current Month Data */
     uinkey  =3Drep_socs,      /* primary unique key both tables */
     uotmem1 =3Drepdelta,   /* delta tble for RDBMS update */
     uotmem2=3D repsame
    );

%utl_update
   (
    master=3Doldrep
   ,transaction=3Drepdelta
   ,key=3Drep_socs
   );

proc sort data=3Doldrep out=3Dol;by rep_socs;run;  /* oldrep has been
updated with transactions and now is equal to newrep */
proc sort data=3Dnewrep out=3Dnu;by rep_socs;run;
title "Indepth comparison Updated RDMS table with New Feed";

proc compare data=3Dol compare=3Dnu;
run;



0
Reply xlr82sas 4/8/2010 10:06:24 PM


On Apr 8, 1:41=A0pm, Al <ali6...@gmail.com> wrote:
> Hi All:
>
> I have two adverse events data sets Ae_old and Ae_New .. I =A0am looking
> to compare Aeterm (Term entered in the CRF) to see if there were any
> additions/deletions or changes in the AEterm in Ae_new from Ae_old
> dataset and create a flag in the Ae_new =A0to know which data point was
> added (flag =3D 1) ,which data point was deleted (flag =3D 2) and which
> data point is changed(flag =3D 3)
>
> I am using this proc compare .
>
> proc compare base =3D ae_old compare =3D ae_new OUTNOEQUAL =A0OUTCOMP
> OUTBASE /*OUTDIF*/ NOPRINT out =3D Ae_diff;
> =A0 =A0id subject =A0;
> =A0 =A0var aeterm ;
> run;
>
> but I am having difficulty to create a flags .Any suggestions
>
> Thanks in advance

Use a merge instead?

UNTESTED

data new;
merge ae_old (in=3Da) ae_new (in=3Db rename aeterm=3Daterm2);
by id;
if a and not b then flag=3D1;
if b and not a then flag=3D2;
if a and b and aeterm ne aeterm2 then flag=3D3; *assumes you renamed
variable in new dataset (original =3Dvar1 new =3Dvar2);
if flag=3D. then flag=3D4; *check if missing any;
run;

HTH,
Reese

0
Reply Reeza 4/8/2010 11:21:51 PM

On Apr 8, 4:21=A0pm, Reeza <fkhurs...@hotmail.com> wrote:
> On Apr 8, 1:41=A0pm, Al <ali6...@gmail.com> wrote:
>
>
>
> > Hi All:
>
> > I have two adverse events data sets Ae_old and Ae_New .. I =A0am lookin=
g
> > to compare Aeterm (Term entered in the CRF) to see if there were any
> > additions/deletions or changes in the AEterm in Ae_new from Ae_old
> > dataset and create a flag in the Ae_new =A0to know which data point was
> > added (flag =3D 1) ,which data point was deleted (flag =3D 2) and which
> > data point is changed(flag =3D 3)
>
> > I am using this proc compare .
>
> > proc compare base =3D ae_old compare =3D ae_new OUTNOEQUAL =A0OUTCOMP
> > OUTBASE /*OUTDIF*/ NOPRINT out =3D Ae_diff;
> > =A0 =A0id subject =A0;
> > =A0 =A0var aeterm ;
> > run;
>
> > but I am having difficulty to create a flags .Any suggestions
>
> > Thanks in advance
>
> Use a merge instead?
>
> UNTESTED
>
> data new;
> merge ae_old (in=3Da) ae_new (in=3Db rename aeterm=3Daterm2);
> by id;
> if a and not b then flag=3D1;
> if b and not a then flag=3D2;
> if a and b and aeterm ne aeterm2 then flag=3D3; *assumes you renamed
> variable in new dataset (original =3Dvar1 new =3Dvar2);
> if flag=3D. then flag=3D4; *check if missing any;
> run;
>
> HTH,
> Reese

I little rework of Reese excellent post. I added it to my tips. thanks
Reese.

/* T005780 CHANGES IN LAST MONTHS CLASS DATASET - MARKING UPDATES,
INSERTS AND DELETES IN STUDENT WEIGHTS  - SAS-L REESE POST */
http://homepage.mac.com/magdelina/.Public/utl.html
utl_tipweb.txt

/* insert, update and delete for one variable from SAS-L Reese  */
data jan2010;
 keep name weight;
 set sashelp.class;
run;

data feb2010;
 keep name weight;
 set sashelp.class end=3Ddne;
 if name=3D'John' then weight=3D100;
 if name=3D'Jane' then delete;
 output;
 if dne then do; name=3D'Zero'; weight=3D110; output; end;
run;

data dif;
   merge jan2010 (in=3Djan rename=3Dweight=3Djan_weight)
         feb2010 (in=3Dfeb rename=3Dweight=3Dfeb_weight);
   by name;
   select;
    when ( jan and not feb ) flg=3D'DELETE';
    when ( feb and not jan ) flg=3D'INSERT';
    when ( feb and jan and feb_weight ne jan_weight ) flg=3D'UPDATE';
    when ( jan and feb and feb_weight eq jan_weight ) flg=3D'';
    /* LEAVE OFF OTHERWISE TO FORCE AN HARD ERROR */
  end;
run;

/*
             JAN_      FEB_
 NAME       WEIGHT    WEIGHT     FLG

 Alfred      112.5     112.5
 Alice        84.0      84.0
 Barbara      98.0      98.0
 Carol       102.5     102.5
 Henry       102.5     102.5
 James        83.0      83.0
 Jane         84.5        .     DELETE
 Janet       112.5     112.5
 Jeffrey      84.0      84.0
 John         99.5     100.0    UPDATE
 Joyce        50.5      50.5
 Judy         90.0      90.0
 Louise       77.0      77.0
 Mary        112.0     112.0
 Philip      150.0     150.0
 Robert      128.0     128.0
 Ronald      133.0     133.0
 Thomas       85.0      85.0
 William     112.0     112.0
 Zero           .      110.0    INSERT
*/
0
Reply xlr82sas 4/11/2010 6:40:31 PM

3 Replies
212 Views

(page loaded in 0.213 seconds)


Reply: