f



How to filter sas data sets into separate sas data sets

Listers,

This is my failing point in coding SAS.  The use of 2 separate SAS data sets to create a third.

I can do this in a program with nested Do loops.  But with SAS it is just different enough I seem to be unable to see the coding technique.

Problem:

Sas Data set 1 contains 1 variable and 1000 obs.
Sas Data set 2 contains 15 variables and 500000 obs.

I need to filter off the data in Sas data set 2 based on Sas data set 1

Node1, Node2 and Node3 must match VAR1 exactly.  I have all the code I need to do this except for splitting the data into a separate SAS data set (NEWLISTS).


Psuedo code:

Sas Data set 1:  Var1 Length 7 Char
Sas Data set 2:  Node1 Length 1-8 Char   Node2 Length 1-8 Char  Node3 Length 1-8 Char



Data Newlist (KEEP=node1 node2 node3 ... var15)
     Otherds (Keep=node1 node2 node3 .... var15) ;

Set SASDS1 ;   *Contains 1 var 1000 obs   ;
Set SASDS2 ;   *Contains 15 vars and 500,000 obs  ;


If Node1 = VAR1 or Node2 = VAR1 or Node3 = Var1 then Output Newlist ;
Else Output Otherds ;


I have tried putting in two SET statements, but I seem to reduce the information dramatically.  I have been reviewing the coding examples on the SAS Wesbsite as well as in the archives.  I am just so crunched for time that I must lean on you for assistance.

I know this should be easy, I am just not seeing.

Thanks for the assistance

Lizette
0
starsoul (24)
4/2/2007 2:53:08 PM
comp.soft-sys.sas 142828 articles. 3 followers. Post Follow

9 Replies
616 Views

Similar Articles

[PageSpeed] 45

Lizette,

a quick question first: what release of SAS are you using?

Seems like an ideal task for a hash solution. Hashes however are only available with SAS version 9.

Robert Bardos
Ansys AG, Zurich, Switzerland

> -----Ursprüngliche Nachricht-----
> Von: SAS(r) Discussion
> [mailto:SAS-L@LISTSERV.UGA.EDU]Im Auftrag von
> Lizette Koehler
> Gesendet: Montag, 2. April 2007 16:53
> An: SAS-L@LISTSERV.UGA.EDU
> Betreff: How to filter sas data sets into separate sas data sets
>
>
> Listers,
>
> This is my failing point in coding SAS.  The use of 2
> separate SAS data sets to create a third.
>
> I can do this in a program with nested Do loops.  But
> with SAS it is just different enough I seem to be
> unable to see the coding technique.
>
> Problem:
>
> Sas Data set 1 contains 1 variable and 1000 obs.
> Sas Data set 2 contains 15 variables and 500000 obs.
>
> I need to filter off the data in Sas data set 2 based
> on Sas data set 1
>
> Node1, Node2 and Node3 must match VAR1 exactly.  I have
> all the code I need to do this except for splitting the
> data into a separate SAS data set (NEWLISTS).
>
>
> Psuedo code:
>
> Sas Data set 1:  Var1 Length 7 Char
> Sas Data set 2:  Node1 Length 1-8 Char   Node2 Length
> 1-8 Char  Node3 Length 1-8 Char
>
>
>
> Data Newlist (KEEP=node1 node2 node3 ... var15)
>      Otherds (Keep=node1 node2 node3 .... var15) ;
>
> Set SASDS1 ;   *Contains 1 var 1000 obs   ;
> Set SASDS2 ;   *Contains 15 vars and 500,000 obs  ;
>
>
> If Node1 = VAR1 or Node2 = VAR1 or Node3 = Var1 then
> Output Newlist ;
> Else Output Otherds ;
>
>
> I have tried putting in two SET statements, but I seem
> to reduce the information dramatically.  I have been
> reviewing the coding examples on the SAS Wesbsite as
> well as in the archives.  I am just so crunched for
> time that I must lean on you for assistance.
>
> I know this should be easy, I am just not seeing.
>
> Thanks for the assistance
>
> Lizette
>
0
bardos2 (393)
4/2/2007 3:17:30 PM
you cannot do this as described without a common link
here, your Var1

Note: both data sets must be sorted by Var1

Data Newlist (KEEP=node1 node2 node3 ... var15)
     Otherds (Keep=node1 node2 node3 .... var15) ;

do until(EndoFile);

Set SASDS1
    SASDS2 end = EndoFile;
by  Var1;

*consider: if Var1 in (Node1 Node2 Node3);

If    Node1 = VAR1
   or Node2 = VAR1
   or Node3 = Var1
   then Output Newlist ;
Else    Output Otherds ;

end;
stop;

Undoubtedly the HashHeads will propose a lookup table

but the above is what you are trying to do.

Ron Fehd  the macro maven  CDC Atlanta GA USA RJF2 at cdc dot gov

> From: starsoul@mindspring.com
> Listers,
>
> This is my failing point in coding SAS.  The use of 2
> separate SAS data sets to create a third.
>
> I can do this in a program with nested Do loops.  But with
> SAS it is just different enough I seem to be unable to see
> the coding technique.
>
> Problem:
>
> Sas Data set 1 contains 1 variable and 1000 obs.
> Sas Data set 2 contains 15 variables and 500000 obs.
>
> I need to filter off the data in Sas data set 2 based on Sas
> data set 1
>
> Node1, Node2 and Node3 must match VAR1 exactly.  I have all
> the code I need to do this except for splitting the data into
> a separate SAS data set (NEWLISTS).
>
>
> Psuedo code:
>
> Sas Data set 1:  Var1 Length 7 Char
> Sas Data set 2:  Node1 Length 1-8 Char   Node2 Length 1-8
> Char  Node3 Length 1-8 Char
>
>
>
> Data Newlist (KEEP=node1 node2 node3 ... var15)
>      Otherds (Keep=node1 node2 node3 .... var15) ;
>
> Set SASDS1 ;   *Contains 1 var 1000 obs   ;
> Set SASDS2 ;   *Contains 15 vars and 500,000 obs  ;
>
>
> If Node1 = VAR1 or Node2 = VAR1 or Node3 = Var1 then Output
> Newlist ; Else Output Otherds ;
>
>
> I have tried putting in two SET statements, but I seem to
> reduce the information dramatically.  I have been reviewing
> the coding examples on the SAS Wesbsite as well as in the
> archives.  I am just so crunched for time that I must lean on
> you for assistance.
>
> I know this should be easy, I am just not seeing.
0
rjf2 (3342)
4/2/2007 3:20:51 PM
Lizette,

Instead of trying to merge the two data sets, I would probably try to create
a SAS format from the values of VAR1 in data set 1.  Then, NODE1, NODE2 and
NODE3 could be compared against the format for a match.

The example below is a simplified version of what you could do and shows a
printout of how it works.  It has 5 observations in data set 1 and only 3
variables in data set 2, but I think the logic should hold for the example
you gave.  After the example is code that could be used to actually split
the data as you had requested.  Hope this helps.


* create sas data set 1 ;
data sasds1;
   input var1 $;
cards;
AAA
BBB
DDD
FFF
AAA
HHH
;
run;

* sort data set 1 to eliminate any duplicate values ;
proc sort nodupkey data = sasds1 out = sasds1_dd (rename=(var1=start));
   by var1;
run;

* create data set that will be used to build SAS format ;
data sasds1_dd;
   set sasds1_dd end=last;
fmtname = '$NODES';
label = 'Y';
output;
if last then do;
   hlo='O';
   label='N';
   output;
end;
run;

* build format (the optional fmtlib option will print the format for you to
review) ;
proc format cntlin = sasds1_dd fmtlib;
run;

* create sas dataset 2 ;
data sasds2;
   input node1 $ node2 $ node3 $;
cards;
ZZZ YYY XXX
AAA YYY XXX
ZZZ FFF XXX
ZZZ YYY DDD
AA  YYY XXX
ZZZ AAAA XXX
;
run;

* apply format to node1-node3 to determine if there is a match ;
data testing;
   set sasds2;
if put(node1,$nodes.) = 'Y' or
   put(node2,$nodes.) = 'Y' or
   put(node3,$nodes.) = 'Y' then keep_it='Y';
        else keep_it = 'N';
run;

proc print data = testing;
run;


Once you review this example, if it seems to do what you want, the splitting
of the data could be accomplished with the code below.


data newlist otherds;
   set sasds2;
if put(node1,$nodes.) = 'Y' or
   put(node2,$nodes.) = 'Y' or
   put(node3,$nodes.) = 'Y' then output newlist;
        else output otherds;
run;



Jack Clark
Research Analyst
Center for Health Program Development and Management
University of Maryland, Baltimore County




-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Lizette
Koehler
Sent: Monday, April 02, 2007 10:53 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: How to filter sas data sets into separate sas data sets

Listers,

This is my failing point in coding SAS.  The use of 2 separate SAS data sets
to create a third.

I can do this in a program with nested Do loops.  But with SAS it is just
different enough I seem to be unable to see the coding technique.

Problem:

Sas Data set 1 contains 1 variable and 1000 obs.
Sas Data set 2 contains 15 variables and 500000 obs.

I need to filter off the data in Sas data set 2 based on Sas data set 1

Node1, Node2 and Node3 must match VAR1 exactly.  I have all the code I need
to do this except for splitting the data into a separate SAS data set
(NEWLISTS).


Psuedo code:

Sas Data set 1:  Var1 Length 7 Char
Sas Data set 2:  Node1 Length 1-8 Char   Node2 Length 1-8 Char  Node3 Length
1-8 Char



Data Newlist (KEEP=node1 node2 node3 ... var15)
     Otherds (Keep=node1 node2 node3 .... var15) ;

Set SASDS1 ;   *Contains 1 var 1000 obs   ;
Set SASDS2 ;   *Contains 15 vars and 500,000 obs  ;


If Node1 = VAR1 or Node2 = VAR1 or Node3 = Var1 then Output Newlist ;
Else Output Otherds ;


I have tried putting in two SET statements, but I seem to reduce the
information dramatically.  I have been reviewing the coding examples on the
SAS Wesbsite as well as in the archives.  I am just so crunched for time
that I must lean on you for assistance.

I know this should be easy, I am just not seeing.

Thanks for the assistance

Lizette
0
JClark1 (431)
4/2/2007 3:24:34 PM
Something like this is the old way.  You could use a proc sql if you
have a new enough version.  Increase your buffersize and if you have
enough memory you may get it into a hash routine.

DATA WORK.NEW;
        MERGE small (IN=A OBS=500) big ;
                BY ID_FIELD;
        IF A=1;
RUN;
QUIT;

RICH
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@listserv.vt.edu] On Behalf Of
Lizette Koehler
Sent: Monday, April 02, 2007 10:53 AM
To: SAS-L@LISTSERV.VT.EDU
Subject: How to filter sas data sets into separate sas data sets

Listers,

This is my failing point in coding SAS.  The use of 2 separate SAS data
sets to create a third.

I can do this in a program with nested Do loops.  But with SAS it is
just different enough I seem to be unable to see the coding technique.

Problem:

Sas Data set 1 contains 1 variable and 1000 obs.
Sas Data set 2 contains 15 variables and 500000 obs.

I need to filter off the data in Sas data set 2 based on Sas data set 1

Node1, Node2 and Node3 must match VAR1 exactly.  I have all the code I
need to do this except for splitting the data into a separate SAS data
set (NEWLISTS).


Psuedo code:

Sas Data set 1:  Var1 Length 7 Char
Sas Data set 2:  Node1 Length 1-8 Char   Node2 Length 1-8 Char  Node3
Length 1-8 Char



Data Newlist (KEEP=node1 node2 node3 ... var15)
     Otherds (Keep=node1 node2 node3 .... var15) ;

Set SASDS1 ;   *Contains 1 var 1000 obs   ;
Set SASDS2 ;   *Contains 15 vars and 500,000 obs  ;


If Node1 = VAR1 or Node2 = VAR1 or Node3 = Var1 then Output Newlist ;
Else Output Otherds ;


I have tried putting in two SET statements, but I seem to reduce the
information dramatically.  I have been reviewing the coding examples on
the SAS Wesbsite as well as in the archives.  I am just so crunched for
time that I must lean on you for assistance.

I know this should be easy, I am just not seeing.

Thanks for the assistance

Lizette
0
reeves5032 (10)
4/2/2007 3:39:10 PM
Lizette:

  Can you show
(1) your nested do loops that work for you, and
(2) some sample data and the data sets that you desire to get?

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Lizette Koehler
Sent: Monday, April 02, 2007 9:53 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: How to filter sas data sets into separate sas data sets

Listers,

This is my failing point in coding SAS.  The use of 2 separate SAS data
sets to create a third.

I can do this in a program with nested Do loops.  But with SAS it is
just different enough I seem to be unable to see the coding technique.

Problem:

Sas Data set 1 contains 1 variable and 1000 obs.
Sas Data set 2 contains 15 variables and 500000 obs.

I need to filter off the data in Sas data set 2 based on Sas data set 1

Node1, Node2 and Node3 must match VAR1 exactly.  I have all the code I
need to do this except for splitting the data into a separate SAS data
set (NEWLISTS).


Psuedo code:

Sas Data set 1:  Var1 Length 7 Char
Sas Data set 2:  Node1 Length 1-8 Char   Node2 Length 1-8 Char  Node3
Length 1-8 Char



Data Newlist (KEEP=node1 node2 node3 ... var15)
     Otherds (Keep=node1 node2 node3 .... var15) ;

Set SASDS1 ;   *Contains 1 var 1000 obs   ;
Set SASDS2 ;   *Contains 15 vars and 500,000 obs  ;


If Node1 = VAR1 or Node2 = VAR1 or Node3 = Var1 then Output Newlist ;
Else Output Otherds ;


I have tried putting in two SET statements, but I seem to reduce the
information dramatically.  I have been reviewing the coding examples on
the SAS Wesbsite as well as in the archives.  I am just so crunched for
time that I must lean on you for assistance.

I know this should be easy, I am just not seeing.

Thanks for the assistance

Lizette


-----Message Disclaimer-----

This e-mail message is intended only for the use of the individual or
entity to which it is addressed, and may contain information that is
privileged, confidential and exempt from disclosure under applicable law.
If you are not the intended recipient, any dissemination, distribution or
copying of this communication is strictly prohibited. If you have
received this communication in error, please notify us immediately by
reply email to Connect@principal.com and delete or destroy all copies of
the original message and attachments thereto. Email sent to or from the
Principal Financial Group or any of its member companies may be retained
as required by law or regulation.

Nothing in this message is intended to constitute an Electronic signature
for purposes of the Uniform Electronic Transactions Act (UETA) or the
Electronic Signatures in Global and National Commerce Act ("E-Sign")
unless a specific statement to the contrary is included in this message.

While this communication may be used to promote or market a transaction
or an idea that is discussed in the publication, it is intended to provide
general information about the subject matter covered and is provided with
the understanding that The Principal is not rendering legal, accounting,
or tax advice. It is not a marketed opinion and may not be used to avoid
penalties under the Internal Revenue Code. You should consult with
appropriate counsel or other advisors on all matters pertaining to legal,
tax, or accounting obligations and requirements.
0
Huang.JS (234)
4/2/2007 5:23:01 PM
First off, you don't need two set statements, I suspect you want a
merge.

(remember to sort datasets before merging)

Data newlist (KEEP=node1 node2 node3 ... var15)
     Otherds (Keep=node1 node2 node3 .... var15) ;
Merge sasds1 sasds2;
By ;/*not sure which variable you need, but there has to be something
that is the same between the two*/



What I'd really need to know is what are the field names in DS1 and DS2
(You described them  but didn't tell us the names)

You will possibly need a REANME= statement to get a matching name to
merge by.

Either read up more on merges and RENAME or get back to us with the
variable names, and some sample data (maybe a proc print with obs=20).

HTH






-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Lizette Koehler
Sent: Monday, April 02, 2007 9:53 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: How to filter sas data sets into separate sas data sets

Listers,

This is my failing point in coding SAS.  The use of 2 separate SAS data
sets to create a third.

I can do this in a program with nested Do loops.  But with SAS it is
just different enough I seem to be unable to see the coding technique.

Problem:

Sas Data set 1 contains 1 variable and 1000 obs.
Sas Data set 2 contains 15 variables and 500000 obs.

I need to filter off the data in Sas data set 2 based on Sas data set 1

Node1, Node2 and Node3 must match VAR1 exactly.  I have all the code I
need to do this except for splitting the data into a separate SAS data
set (NEWLISTS).


Psuedo code:

Sas Data set 1:  Var1 Length 7 Char
Sas Data set 2:  Node1 Length 1-8 Char   Node2 Length 1-8 Char  Node3
Length 1-8 Char



Data Newlist (KEEP=node1 node2 node3 ... var15)
     Otherds (Keep=node1 node2 node3 .... var15) ;

Set SASDS1 ;   *Contains 1 var 1000 obs   ;
Set SASDS2 ;   *Contains 15 vars and 500,000 obs  ;


If Node1 = VAR1 or Node2 = VAR1 or Node3 = Var1 then Output Newlist ;
Else Output Otherds ;


I have tried putting in two SET statements, but I seem to reduce the
information dramatically.  I have been reviewing the coding examples on
the SAS Wesbsite as well as in the archives.  I am just so crunched for
time that I must lean on you for assistance.

I know this should be easy, I am just not seeing.

Thanks for the assistance

Lizette


-----Message Disclaimer-----

This e-mail message is intended only for the use of the individual or
entity to which it is addressed, and may contain information that is
privileged, confidential and exempt from disclosure under applicable
law.
If you are not the intended recipient, any dissemination, distribution
or copying of this communication is strictly prohibited. If you have
received this communication in error, please notify us immediately by
reply email to Connect@principal.com and delete or destroy all copies of
the original message and attachments thereto. Email sent to or from the
Principal Financial Group or any of its member companies may be retained
as required by law or regulation.

Nothing in this message is intended to constitute an Electronic
signature for purposes of the Uniform Electronic Transactions Act (UETA)
or the Electronic Signatures in Global and National Commerce Act
("E-Sign") unless a specific statement to the contrary is included in
this message.

While this communication may be used to promote or market a transaction
or an idea that is discussed in the publication, it is intended to
provide general information about the subject matter covered and is
provided with the understanding that The Principal is not rendering
legal, accounting, or tax advice. It is not a marketed opinion and may
not be used to avoid penalties under the Internal Revenue Code. You
should consult with appropriate counsel or other advisors on all matters
pertaining to legal, tax, or accounting obligations and requirements.







The information contained in this e-mail is confidential and/or proprietary

to Capital One and/or its affiliates. The information transmitted herewith

is intended only for use by the individual or entity to which it is

addressed.  If the reader of this message is not the intended recipient,

you are hereby notified that any review, retransmission, dissemination,

distribution, copying or other use of, or taking of any action in reliance

upon this information is strictly prohibited. If you have received this

communication in error, please contact the sender and delete the material

from your computer.
0
4/2/2007 6:00:35 PM
Summary: You need a loop and the POINT option or SQL
#iw-value=1

Lizette,

I think a lot of respondents have misunderstood the problem.  This probably
could have been avoided with a simplified example.  Here is my
interpretation.  Here is code to generate data.

   data control ;
     do var1 = 1 to 4 ; output ; end ;
   run ;

   data tosplit ;
     input node1 node2 node3 other ;
   cards ;
   1 2 3 4
   1 2 3 5
   7 7 7 7
   ;

Here is a way to split using the DATA step.

   data newlist other ;
     drop flag var1 ;
     set tosplit ;
     do pt = 1 to nobs ;
        set control point = pt nobs = nobs ;
        if var1 = node1 or var1 = node2 or var1 = node3 then
           flag = 1 ;
     end ;
     if flag then output newlist ;
     else
        output other ;
   run ;

And here is a way to split using SQL.

   proc sql ;
      create table newlist as
      select * from tosplit
       where node1 in (select var1 from control)
          or node2 in (select var1 from control)
          or node2 in (select var1 from control)
      ;
      create table other as
      select * from tosplit
       where not(node1 in (select var1 from control)
          or node2 in (select var1 from control)
          or node2 in (select var1 from control))
      ;
   quit ;

Ian Whitlock
================

Date:         Mon, 2 Apr 2007 10:53:08 -0400
Reply-To:     starsoul@mindspring.com
Sender:       "SAS(r) Discussion"
From:         Lizette Koehler <starsoul@MINDSPRING.COM>
Subject:      How to filter sas data sets into separate sas data sets
Content-Type: text/plain; charset=UTF-8

Listers,

This is my failing point in coding SAS. The use of 2 separate SAS data sets
to create a third.

I can do this in a program with nested Do loops. But with SAS it is just
different enough I seem to be unable to see the coding technique.

Problem:

Sas Data set 1 contains 1 variable and 1000 obs. Sas Data set 2 contains 15
variables and 500000 obs.

I need to filter off the data in Sas data set 2 based on Sas data set 1

Node1, Node2 and Node3 must match VAR1 exactly. I have all the code I need
to do this except for splitting the data into a separate SAS data set
(NEWLISTS).

Psuedo code:

Sas Data set 1: Var1 Length 7 Char Sas Data set 2: Node1 Length 1-8 Char
Node2 Length 1-8 Char Node3 Length 1-8 Char

Data Newlist (KEEP=node1 node2 node3 ... var15) Otherds (Keep=node1 node2
node3 .... var15) ;

Set SASDS1 ; *Contains 1 var 1000 obs ; Set SASDS2 ; *Contains 15 vars and
500,000 obs ;

If Node1 = VAR1 or Node2 = VAR1 or Node3 = Var1 then Output Newlist ; Else
Output Otherds ;

I have tried putting in two SET statements, but I seem to reduce the
information dramatically. I have been reviewing the coding examples on the
SAS Wesbsite as well as in the archives. I am just so crunched for time
that I must lean on you for assistance.

I know this should be easy, I am just not seeing.

Thanks for the assistance

Lizette
0
iw1junk (1187)
4/2/2007 11:37:16 PM
On Mon, 2 Apr 2007 23:37:16 +0000, Ian Whitlock <iw1junk@COMCAST.NET> wrote:

>Summary: You need a loop and the POINT option or SQL
>#iw-value=1
>
>Lizette,
>
>I think a lot of respondents have misunderstood the problem.  This probably
>could have been avoided with a simplified example.  Here is my
>interpretation.  Here is code to generate data.
>
>   data control ;
>     do var1 = 1 to 4 ; output ; end ;
>   run ;
>
>   data tosplit ;
>     input node1 node2 node3 other ;
>   cards ;
>   1 2 3 4
>   1 2 3 5
>   7 7 7 7
>   ;

Here are test data generators which conform to the actual scale of the problem.

   data control;
   do _n_ = 1 to 1000;
      var1 = floor(ranuni(135)* 100000);
      output;
      end;
   run;

   data tosplit;
   do other = 1 to 500000;
      node1 = floor(ranuni(246)* 100000);
      node2 = floor(ranuni(246)* 100000);
      node3 = floor(ranuni(246)* 100000);
      output;
      end;
   run;

>
>Here is a way to split using the DATA step.
>
>   data newlist other ;
>     drop flag var1 ;
>     set tosplit ;
>     do pt = 1 to nobs ;
>        set control point = pt nobs = nobs ;
>        if var1 = node1 or var1 = node2 or var1 = node3 then
>           flag = 1 ;
>     end ;
>     if flag then output newlist ;
>     else
>        output other ;
>   run ;

At full scale the loop runs one half billion times. That took about 9
minutes on my PC. So I would use a different table lookup device (format,
hash, SET with an index).

>
>And here is a way to split using SQL.
>
>   proc sql ;
>      create table newlist as
>      select * from tosplit
>       where node1 in (select var1 from control)
>          or node2 in (select var1 from control)
>          or node2 in (select var1 from control)
>      ;
>      create table other as
>      select * from tosplit
>       where not(node1 in (select var1 from control)
>          or node2 in (select var1 from control)
>          or node2 in (select var1 from control))
>      ;
>   quit ;

Only downside here is that NEWLIST has to be passed twice, because SQL can
build only one table at a time.

Here's a way to let SQL do most of the work while a DATA step does the
actual splitting, all in one pass.

  proc sql ;
      create view intermediate as
      select * ,
             node1 in (select var1 from control)
          or node2 in (select var1 from control)
          or node2 in (select var1 from control) as matched
               from tosplit
      ;
   quit ;

   data newlist other;
   set intermediate;
   if matched then output newlist;
   else            output other;
   drop matched;
   run;

>
>Ian Whitlock
>================
>
>Date:         Mon, 2 Apr 2007 10:53:08 -0400
>Reply-To:     starsoul@mindspring.com
>Sender:       "SAS(r) Discussion"
>From:         Lizette Koehler <starsoul@MINDSPRING.COM>
>Subject:      How to filter sas data sets into separate sas data sets
>Content-Type: text/plain; charset=UTF-8
>
>Listers,
>
>This is my failing point in coding SAS. The use of 2 separate SAS data sets
>to create a third.
>
>I can do this in a program with nested Do loops. But with SAS it is just
>different enough I seem to be unable to see the coding technique.
>
>Problem:
>
>Sas Data set 1 contains 1 variable and 1000 obs. Sas Data set 2 contains 15
>variables and 500000 obs.
>
>I need to filter off the data in Sas data set 2 based on Sas data set 1
>
>Node1, Node2 and Node3 must match VAR1 exactly. I have all the code I need
>to do this except for splitting the data into a separate SAS data set
>(NEWLISTS).
>
>Psuedo code:
>
>Sas Data set 1: Var1 Length 7 Char Sas Data set 2: Node1 Length 1-8 Char
>Node2 Length 1-8 Char Node3 Length 1-8 Char
>
>Data Newlist (KEEP=node1 node2 node3 ... var15) Otherds (Keep=node1 node2
>node3 .... var15) ;
>
>Set SASDS1 ; *Contains 1 var 1000 obs ; Set SASDS2 ; *Contains 15 vars and
>500,000 obs ;
>
>If Node1 = VAR1 or Node2 = VAR1 or Node3 = Var1 then Output Newlist ; Else
>Output Otherds ;
>
>I have tried putting in two SET statements, but I seem to reduce the
>information dramatically. I have been reviewing the coding examples on the
>SAS Wesbsite as well as in the archives. I am just so crunched for time
>that I must lean on you for assistance.
>
>I know this should be easy, I am just not seeing.
>
>Thanks for the assistance
>
>Lizette
0
nospam1405 (4666)
4/5/2007 8:27:29 PM
Lizette Koehler wrote:


You state
> Node1, Node2 and Node3 must match VAR1 exactly.

I presume this criteria means the values of all three variables must exist
in the lookup table.

Yet your coding uses OR logic that is not sufficient to encompass the
criteria.

> I have all the code
> I need to do this except for splitting the data into a separate SAS
> data set (NEWLISTS).
....
> If Node1 = VAR1 or Node2 = VAR1 or Node3 = Var1 then Output Newlist ;

What is the problem from a set theory perspective ?

Suppose set one is LOOKUP and set two is OBSERVED

Is NewLists to be
- the set of rows from OBSERVED which contain a Node value that does not
exist in LOOKUP?
- the set of rows from OBSERVED which contain at least one Node value that
does exist in LOOKUP ?
- the set of rows from OBSERVED which has all Node values existing in LOOKUP
?

The following sample code breaks the data into three tables, fullmatch,
partialmatch and nomatch.  The partial match row are categorized by the
nature of the nodes that were found in lookup.  It operates very quickly and
splits the data in a hair over one second on my system.

There is no explicit need to create three tables at all.  The categorical
variable _type_ would deliver all the information you need to implement your
subsetting criteria in a WHERE clause at content generation (proc
report/tabulate/print/export etc) time.

------------------------------------------------
data lookup;
  do rowid = 1 to 1e4;
    var1 = floor(ranuni(135)* 1e5);
    output;
  end;
run;

data observed;
  do rowid = 1 to 5e5;
    node1 = floor(ranuni(246)* 1e5);
    node2 = floor(ranuni(246)* 1e5);
    node3 = floor(ranuni(246)* 1e5);
    output;
  end;
run;

data
  fullmatch    (keep=rowid node1-node3)
  nomatch      (keep=rowid node1-node3)
  partialmatch (keep=rowid node1-node3 _type_)
;

  if 0 then set lookup(obs=0); * does nothing, but preps PDV;

  declare hash lookup (dataset:'lookup');
  lookup.defineKey('var1');
  lookup.defineDone();

  do until (LastObserved);
    set observed end=LastObserved;

    found1 = lookup.find (key:node1) = 0;
    found2 = lookup.find (key:node2) = 0;
    found3 = lookup.find (key:node3) = 0;

    _type_ = found1 + 2*found2 + 4*found3;
    select (_type_);
      when (0) output nomatch;
      when (7) output fullmatch;
      otherwise output partialmatch;
    end;
  end;

  format _type_ binary32.;

  stop;
run;
------------------------------------------------

Richard A. DeVenezia
http://www.devenezia.com



0
rdevenezia1 (1435)
4/6/2007 2:01:52 AM
Reply: