Hi Everyone,
I have a dataset that looks like this.
data have;
input id rank;
datalines;
1 1
1 2
1 2
1 2
1 5
1 6
2 1
2 1
2 1
2 1
2 5
3 1
3 2
3 3
3 3
4 1
4 2
4 3
5 1
5 2
6 1
6 1
6 3
6 3
;
run;
ID 1 has 6 records, ID 2 has 5 records, ID 3 has 4 records, ID 4 has 3
records, ID 5 has 2 records, and ID 6 has 4 records.
I only want to get the top 3 records for each ID. For your reference the
value 1 for rank is considered the top rank.
Clearly there is no problem with IDs 4 and 5.
However, I have problems with IDs 1, 2 , 3, and 6 because of the ties with
at least the 4th record for each ID.
Here's the problem:
I only want 3 records for each ID and they should be the top 3. In case of
ties in ranks (like the rank 2's in ID 1, the 1's in ID 2, and the 3's in
IDs 3 adn 6 , i want to randomly select from among the tied ranks so that
I can come up with my 3 records for each case. To be clear, I want to
randomly select 2 records in ID 1 with rank 2; 3 records in ID 2 with rank
1; and 1 record in ID 3 and 6 with rank 3. I also would like to be able
to get the same results each time I submit the code, maybe a seed should be
set?
Thanks for you help in advance everyone.
Best regards,
Florio
|
|
0
|
|
|
|
Reply
|
foa2
|
11/18/2009 9:06:57 PM |
|
Hi Florio,
The FIRST. and LAST. processing lets
you count and keep track of where you
are in each group. The RETAIN lets you
keep a value such as the counter across
observations.
data have;
input id rank;
datalines;
1 1
1 2
1 2
1 2
1 5
1 6
2 1
2 1
2 1
2 1
2 5
3 1
3 2
3 3
3 3
4 1
4 2
4 3
5 1
5 2
6 1
6 1
6 3
6 3
;
run;
data result;
set have;
by id rank;
retain counter 0;
if first.id then counter=0; * reset ;
counter+1;
if counter le 3 then output;
run;
Hope this is helpful.
Mark Terjeson
Investment Business Intelligence
Investment Management & Research
Russell Investments
253-439-2367
Russell
Global Leaders in Multi-Manager Investing
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Florio Arguillas
Sent: Wednesday, November 18, 2009 1:07 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Selecting the top 3 records with a twist
Hi Everyone,
I have a dataset that looks like this.
data have;
input id rank;
datalines;
1 1
1 2
1 2
1 2
1 5
1 6
2 1
2 1
2 1
2 1
2 5
3 1
3 2
3 3
3 3
4 1
4 2
4 3
5 1
5 2
6 1
6 1
6 3
6 3
;
run;
ID 1 has 6 records, ID 2 has 5 records, ID 3 has 4 records, ID 4 has 3
records, ID 5 has 2 records, and ID 6 has 4 records.
I only want to get the top 3 records for each ID. For your reference
the
value 1 for rank is considered the top rank.
Clearly there is no problem with IDs 4 and 5.
However, I have problems with IDs 1, 2 , 3, and 6 because of the ties
with
at least the 4th record for each ID.
Here's the problem:
I only want 3 records for each ID and they should be the top 3. In case
of
ties in ranks (like the rank 2's in ID 1, the 1's in ID 2, and the 3's
in
IDs 3 adn 6 , i want to randomly select from among the tied ranks so
that
I can come up with my 3 records for each case. To be clear, I want to
randomly select 2 records in ID 1 with rank 2; 3 records in ID 2 with
rank
1; and 1 record in ID 3 and 6 with rank 3. I also would like to be
able
to get the same results each time I submit the code, maybe a seed should
be
set?
Thanks for you help in advance everyone.
Best regards,
Florio
|
|
0
|
|
|
|
Reply
|
mterjeson (1932)
|
11/18/2009 9:16:59 PM
|
|
Give each a random value to sort by, and use a fixed seed as you say [the
date you run it, perhaps, or a wave number, or whatever]. Sort by that.
So:
data have;
input id rank;
ranval = ranuni(7);
datalines;
1 1
1 2
1 2
1 2
1 5
1 6
2 1
2 1
2 1
2 1
2 5
3 1
3 2
3 3
3 3
4 1
4 2
4 3
5 1
5 2
6 1
6 1
6 3
6 3
;
run;
proc sort data=have;
by id rank ranval;
run;
data want;
set have;
by id rank ranval;
if first.id then _t = 0;
_t+1;
if _t le 3;
drop _t;
run;
-Joe
On Wed, Nov 18, 2009 at 3:06 PM, Florio Arguillas <foa2@cornell.edu> wrote:
> Hi Everyone,
>
> I have a dataset that looks like this.
>
> data have;
> input id rank;
> datalines;
> 1 1
> 1 2
> 1 2
> 1 2
> 1 5
> 1 6
> 2 1
> 2 1
> 2 1
> 2 1
> 2 5
> 3 1
> 3 2
> 3 3
> 3 3
> 4 1
> 4 2
> 4 3
> 5 1
> 5 2
> 6 1
> 6 1
> 6 3
> 6 3
> ;
> run;
>
> ID 1 has 6 records, ID 2 has 5 records, ID 3 has 4 records, ID 4 has 3
> records, ID 5 has 2 records, and ID 6 has 4 records.
> I only want to get the top 3 records for each ID. For your reference the
> value 1 for rank is considered the top rank.
> Clearly there is no problem with IDs 4 and 5.
> However, I have problems with IDs 1, 2 , 3, and 6 because of the ties with
> at least the 4th record for each ID.
>
> Here's the problem:
> I only want 3 records for each ID and they should be the top 3. In case of
> ties in ranks (like the rank 2's in ID 1, the 1's in ID 2, and the 3's in
> IDs 3 adn 6 , i want to randomly select from among the tied ranks so that
> I can come up with my 3 records for each case. To be clear, I want to
> randomly select 2 records in ID 1 with rank 2; 3 records in ID 2 with rank
> 1; and 1 record in ID 3 and 6 with rank 3. I also would like to be able
> to get the same results each time I submit the code, maybe a seed should be
> set?
>
> Thanks for you help in advance everyone.
>
> Best regards,
>
> Florio
>
|
|
0
|
|
|
|
Reply
|
snoopy369 (1752)
|
11/18/2009 9:18:41 PM
|
|
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> Florio Arguillas
> Sent: Wednesday, November 18, 2009 1:07 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Selecting the top 3 records with a twist
>
> Hi Everyone,
>
> I have a dataset that looks like this.
>
> data have;
> input id rank;
> datalines;
> 1 1
> 1 2
> 1 2
> 1 2
> 1 5
> 1 6
> 2 1
> 2 1
> 2 1
> 2 1
> 2 5
> 3 1
> 3 2
> 3 3
> 3 3
> 4 1
> 4 2
> 4 3
> 5 1
> 5 2
> 6 1
> 6 1
> 6 3
> 6 3
> ;
> run;
>
> ID 1 has 6 records, ID 2 has 5 records, ID 3 has 4 records, ID 4 has 3
> records, ID 5 has 2 records, and ID 6 has 4 records.
> I only want to get the top 3 records for each ID. For your reference the
> value 1 for rank is considered the top rank.
> Clearly there is no problem with IDs 4 and 5.
> However, I have problems with IDs 1, 2 , 3, and 6 because of the ties with
> at least the 4th record for each ID.
>
> Here's the problem:
> I only want 3 records for each ID and they should be the top 3. In case of
> ties in ranks (like the rank 2's in ID 1, the 1's in ID 2, and the 3's in
> IDs 3 adn 6 , i want to randomly select from among the tied ranks so that
> I can come up with my 3 records for each case. To be clear, I want to
> randomly select 2 records in ID 1 with rank 2; 3 records in ID 2 with rank
> 1; and 1 record in ID 3 and 6 with rank 3. I also would like to be able
> to get the same results each time I submit the code, maybe a seed should be
> set?
>
> Thanks for you help in advance everyone.
>
> Best regards,
>
> Florio
Here is one possible solution
data temp;
set have;
rnd = uniform(549127);
run;
proc sort data=temp;
by id rank rnd;
run;
data want;
do _n_ = 1 by 1 until(last.id);
set temp;
by id rank rnd;
if i LT 4 then output;
end;
run;
Hope this is helpful,
Dan
Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
|
|
0
|
|
|
|
Reply
|
NordlDJ
|
11/18/2009 9:24:50 PM
|
|
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> Nordlund, Dan (DSHS/RDA)
> Sent: Wednesday, November 18, 2009 1:25 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: Selecting the top 3 records with a twist
>
> > -----Original Message-----
> > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> > Florio Arguillas
> > Sent: Wednesday, November 18, 2009 1:07 PM
> > To: SAS-L@LISTSERV.UGA.EDU
> > Subject: Selecting the top 3 records with a twist
> >
> > Hi Everyone,
> >
> > I have a dataset that looks like this.
> >
> > data have;
> > input id rank;
> > datalines;
> > 1 1
> > 1 2
> > 1 2
> > 1 2
> > 1 5
> > 1 6
> > 2 1
> > 2 1
> > 2 1
> > 2 1
> > 2 5
> > 3 1
> > 3 2
> > 3 3
> > 3 3
> > 4 1
> > 4 2
> > 4 3
> > 5 1
> > 5 2
> > 6 1
> > 6 1
> > 6 3
> > 6 3
> > ;
> > run;
> >
> > ID 1 has 6 records, ID 2 has 5 records, ID 3 has 4 records, ID 4 has 3
> > records, ID 5 has 2 records, and ID 6 has 4 records.
> > I only want to get the top 3 records for each ID. For your reference the
> > value 1 for rank is considered the top rank.
> > Clearly there is no problem with IDs 4 and 5.
> > However, I have problems with IDs 1, 2 , 3, and 6 because of the ties with
> > at least the 4th record for each ID.
> >
> > Here's the problem:
> > I only want 3 records for each ID and they should be the top 3. In case of
> > ties in ranks (like the rank 2's in ID 1, the 1's in ID 2, and the 3's in
> > IDs 3 adn 6 , i want to randomly select from among the tied ranks so that
> > I can come up with my 3 records for each case. To be clear, I want to
> > randomly select 2 records in ID 1 with rank 2; 3 records in ID 2 with rank
> > 1; and 1 record in ID 3 and 6 with rank 3. I also would like to be able
> > to get the same results each time I submit the code, maybe a seed should be
> > set?
> >
> > Thanks for you help in advance everyone.
> >
> > Best regards,
> >
> > Florio
>
> Here is one possible solution
>
> data temp;
> set have;
> rnd = uniform(549127);
> run;
> proc sort data=temp;
> by id rank rnd;
> run;
> data want;
> do _n_ = 1 by 1 until(last.id);
> set temp;
> by id rank rnd;
> if i LT 4 then output;
> end;
> run;
>
Sorry for the error, the output statement should be
if _n_ LT 4 then output;
Hope this is more helpful :-),
Dan
Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
|
|
0
|
|
|
|
Reply
|
NordlDJ
|
11/18/2009 9:30:04 PM
|
|
|
4 Replies
257 Views
(page loaded in 0.009 seconds)
|