Hi,
I am having difficulty with a many-to-many merge between two
datasets.
The two datasets actually have 2 variables in common (ID and VISNO),
but I want to merge on ID only since I would lose some data from the
first dataset if I merge with ID and VISNO. Here are both my merge and
procSQL codes:
proc sort data=dataset1; by ID;
proc sort data=dataset2; by ID;
Data want;
Merge
dataset1 (IN=in1)
dataset2;
By ID;
if in1;
Run;
OR
PROC SQL;
CREATE TABLE want AS
SELECT *
FROM dataset1,dataset2
WHERE dataset1.ID=dataset2.ID
;
QUIT;
Here is a snapshot of how the data looks:
dataset1:
ID VISNO Varx
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
2 1 N
2 2 Y
2 3 N
2 4 N
2 5 N
dataset2:
ID VISNO Varx
1 0 N
1 3 N
1 4 Y
1 6 N
1 8 N
2 0 N
2 4 Y
2 6 N
2 8 N
2 10 N
|
|
0
|
|
|
|
Reply
|
jiji
|
3/7/2010 3:56:32 PM |
|
On Mar 7, 9:56=A0am, jiji <joelleahal...@gmail.com> wrote:
> Hi,
>
> I am having difficulty with a many-to-many merge between two
> datasets.
> The two datasets actually have 2 variables in common (ID and VISNO),
> but I want to merge on ID only since I would lose some data from the
> first dataset if I merge with ID and VISNO. Here are both my merge and
> procSQL codes:
>
> proc sort data=3Ddataset1; by ID;
> proc sort data=3Ddataset2; by ID;
> Data want;
> Merge
> dataset1 (IN=3Din1)
> dataset2;
> By ID;
> if in1;
> Run;
>
> OR
>
> PROC SQL;
> CREATE TABLE want AS
> SELECT *
> FROM dataset1,dataset2
> WHERE dataset1.ID=3Ddataset2.ID
> ;
> QUIT;
>
> Here is a snapshot of how the data looks:
> dataset1:
>
> ID =A0 =A0VISNO =A0 =A0 =A0Varx
> 1 =A0 =A0 =A0 =A01 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A02 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 Y
> 1 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A05 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A01 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A02 =A0 =A0 =A0 =A0 =A0 =A0 Y
> 2 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A05 =A0 =A0 =A0 =A0 =A0 =A0 N
>
> dataset2:
>
> ID =A0 =A0VISNO =A0 =A0 =A0Vary
> 1 =A0 =A0 =A0 =A00 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 Y
> 1 =A0 =A0 =A0 =A06 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A08 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A00 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 Y
> 2 =A0 =A0 =A0 =A06 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A08 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A010 =A0 =A0 =A0 =A0 =A0 =A0 N
I should have changed VarX to Var Y in dataset2. The two datasets do
not have variables in common.
I want the final dataset to look something like this
ID VISNO Varx Vary
1 0 . N
1 1 N .
1 2 N .
1 3 Y N
1 4 N Y
1 5 N .
1 6 . N
1 8 . N
|
|
0
|
|
|
|
Reply
|
jiji
|
3/7/2010 4:49:25 PM
|
|
Joelle,
There is a time delay (sometimes a large time delay) between when posts
are appear on the listserv and Google group's copy of the listserv. I
can't recall on which I had placed my original post.
However, your problem doesn't appear to be a many-to-many merge but,
rather, a straight forward merge using both id and visno. e.g.,:
data dataset1;
input ID VISNO Varx $;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
2 1 N
2 2 Y
2 3 N
2 4 N
2 5 N
;
data dataset2;
input ID VISNO Vary $;
cards;
1 0 N
1 3 N
1 4 Y
1 6 N
1 8 N
2 0 N
2 4 Y
2 6 N
2 8 N
2 10 N
;
proc sort data=dataset1; by ID VISNO;run;
proc sort data=dataset2; by ID VISNO;run;
data want;
Merge dataset1 dataset2;
By ID VISNO;
run;
HTH,
Art
-----Original Message-----
From: Joelle Hallak <joelleahallak@gmail.com>
To: Arthur Tabachneck <art297@netscape.net>
Sent: Sun, Mar 7, 2010 11:52 am
Subject: Re: Many-to-many merge
Hi Arthur,
for some reason I do not see your question posted on the discussion
page.
Anyway, i posted my answer to your question there
I should have changed VarX to Var Y in dataset2. The two datasets do
not have variables in common.
I want the final dataset to look something like this
ID � � VISNO � � Varx � � � Vary
1 � � � � 0 � � � � � � . � � � � � N
1 � � � � 1 � � � � � � N � � � � � .
1 � � � � 2 � � � � � � N � � � � � .
1 � � � � 3 � � � � � � Y � � � � N
1 � � � � 4 � � � � � � N � � � � Y
1 � � � � 5 � � � � � � N � � � � .
1 � � � � 6 � � � � � � . � � � � � N
1 � � � � 8 � � � � � � . � � � � � N
Thanks,
On Sun, Mar 7, 2010 at 10:39 AM, Arthur Tabachneck
<art297@netscape.net> wrote:
jiji,
Given the sample data you provided, what do you want the resulting
file to look like? � There are many possibilities and, for both ids,
visno 4 has conflicting values for varx.
Art
-------------
On Mar 7, 10:56� am, jiji <joelleahal...@gmail.com> wrote:
> Hi,
>
> I am having difficulty with a many-to-many merge between two
> datasets.
> The two datasets actually have 2 variables in common (ID and
VISNO),
> but I want to merge on ID only since I would lose some data from
the
> first dataset if I merge with ID and VISNO. Here are both my merge
and
> procSQL codes:
>
> proc sort data=dataset1; by ID;
> proc sort data=dataset2; by ID;
> Data want;
> Merge
> dataset1 (IN=in1)
> dataset2;
> By ID;
> if in1;
> Run;
>
> OR
>
> PROC SQL;
> CREATE TABLE want AS
> SELECT *
> FROM dataset1,dataset2
> WHERE dataset1.ID=dataset2.ID
> ;
> QUIT;
>
> Here is a snapshot of how the data looks:
> dataset1:
>
> ID � � VISNO � � � Varx
> 1 � � � � 1 � � � � � � N
> 1 � � � � 2 � � � � � � N
> 1 � � � � 3 � � � � � � Y
> 1 � � � � 4 � � � � � � N
> 1 � � � � 5 � � � � � � N
> 2 � � � � 1 � � � � � � N
> 2 � � � � 2 � � � � � � Y
> 2 � � � � 3 � � � � � � N
> 2 � � � � 4 � � � � � � N
> 2 � � � � 5 � � � � � � N
>
> dataset2:
>
> ID � � VISNO � � � Varx
> 1 � � � � 0 � � � � � � N
> 1 � � � � 3 � � � � � � N
> 1 � � � � 4 � � � � � � Y
> 1 � � � � 6 � � � � � � N
> 1 � � � � 8 � � � � � � N
> 2 � � � � 0 � � � � � � N
> 2 � � � � 4 � � � � � � Y
> 2 � � � � 6 � � � � � � N
> 2 � � � � 8 � � � � � � N
> 2 � � � 10 � � � � � � N
|
|
0
|
|
|
|
Reply
|
art297 (4237)
|
3/7/2010 6:01:32 PM
|
|
Why not just merge by ID and VISNO? You will not lose any
information.
Perhaps you want to carry-forward the values for X or Y onto the
missing visits?
You can do that by generating new variables and using the RETAIN
statement.
data one;
input id visno x $ @@;
cards;
1 1 N 1 2 N 1 3 Y 1 4 N 1 5 N
2 1 N 2 2 Y 2 3 N 2 4 N 2 5 N
run;
data two;
input id visno y $ @@;
cards;
1 0 N 1 3 N 1 4 Y 1 6 N 1 8 N
2 0 N 2 4 Y 2 6 N 2 8 N 2 10 N
run;
data new;
merge one(in=3Din1) two(in=3Din2);
by id visno;
if first.id then do; newx=3D' ' ; newy=3D' '; end;
retain newx newy;
format visno z2.;
if in1 then newx=3Dx;
if in2 then newy=3Dy;
put id visno newx newy;
if last.id then put;
run;
1 00 N
1 01 N N
1 02 N N
1 03 Y N
1 04 N Y
1 05 N Y
1 06 N N
1 08 N N
2 00 N
2 01 N N
2 02 Y N
2 03 N N
2 04 N Y
2 05 N Y
2 06 N N
2 08 N N
2 10 N N
On Mar 7, 10:56=A0am, jiji <joelleahal...@gmail.com> wrote:
> Hi,
>
> I am having difficulty with a many-to-many merge between two
> datasets.
> The two datasets actually have 2 variables in common (ID and VISNO),
> but I want to merge on ID only since I would lose some data from the
> first dataset if I merge with ID and VISNO. Here are both my merge and
> procSQL codes:
>
> proc sort data=3Ddataset1; by ID;
> proc sort data=3Ddataset2; by ID;
> Data want;
> Merge
> dataset1 (IN=3Din1)
> dataset2;
> By ID;
> if in1;
> Run;
>
> OR
>
> PROC SQL;
> CREATE TABLE want AS
> SELECT *
> FROM dataset1,dataset2
> WHERE dataset1.ID=3Ddataset2.ID
> ;
> QUIT;
>
> Here is a snapshot of how the data looks:
> dataset1:
>
> ID =A0 =A0VISNO =A0 =A0 =A0Varx
> 1 =A0 =A0 =A0 =A01 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A02 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 Y
> 1 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A05 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A01 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A02 =A0 =A0 =A0 =A0 =A0 =A0 Y
> 2 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A05 =A0 =A0 =A0 =A0 =A0 =A0 N
>
> dataset2:
>
> ID =A0 =A0VISNO =A0 =A0 =A0Varx
> 1 =A0 =A0 =A0 =A00 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 Y
> 1 =A0 =A0 =A0 =A06 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A08 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A00 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 Y
> 2 =A0 =A0 =A0 =A06 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A08 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A010 =A0 =A0 =A0 =A0 =A0 =A0 N
|
|
0
|
|
|
|
Reply
|
Tom
|
3/7/2010 6:45:11 PM
|
|
On Mar 7, 12:45=A0pm, Tom Abernathy <tom.aberna...@gmail.com> wrote:
> Why not just merge by ID and VISNO? =A0You will not lose any
> information.
> Perhaps you want to carry-forward the values for X or Y onto the
> missing visits?
> You can do that by generating new variables and using the RETAIN
> statement.
>
> data one;
> =A0input id visno x $ @@;
> cards;
> 1 1 N 1 2 N 1 3 Y 1 4 N 1 5 N
> 2 1 N 2 2 Y 2 3 N 2 4 N 2 5 N
> run;
>
> data two;
> =A0input id visno y $ @@;
> cards;
> 1 0 N 1 3 N 1 4 Y 1 6 N 1 8 N
> 2 0 N 2 4 Y 2 6 N 2 8 N 2 10 N
> run;
>
> data new;
> =A0 merge one(in=3Din1) two(in=3Din2);
> =A0 by id visno;
> =A0 if first.id then do; newx=3D' ' ; newy=3D' '; end;
> =A0 retain newx newy;
> =A0 format visno z2.;
> =A0 if in1 then newx=3Dx;
> =A0 if in2 then newy=3Dy;
> =A0 put id visno newx newy;
> =A0 if last.id then put;
> run;
>
> 1 00 =A0 N
> 1 01 N N
> 1 02 N N
> 1 03 Y N
> 1 04 N Y
> 1 05 N Y
> 1 06 N N
> 1 08 N N
>
> 2 00 =A0 N
> 2 01 N N
> 2 02 Y N
> 2 03 N N
> 2 04 N Y
> 2 05 N Y
> 2 06 N N
> 2 08 N N
> 2 10 N N
>
> On Mar 7, 10:56=A0am, jiji <joelleahal...@gmail.com> wrote:
>
>
>
> > Hi,
>
> > I am having difficulty with a many-to-many merge between two
> > datasets.
> > The two datasets actually have 2 variables in common (ID and VISNO),
> > but I want to merge on ID only since I would lose some data from the
> > first dataset if I merge with ID and VISNO. Here are both my merge and
> > procSQL codes:
>
> > proc sort data=3Ddataset1; by ID;
> > proc sort data=3Ddataset2; by ID;
> > Data want;
> > Merge
> > dataset1 (IN=3Din1)
> > dataset2;
> > By ID;
> > if in1;
> > Run;
>
> > OR
>
> > PROC SQL;
> > CREATE TABLE want AS
> > SELECT *
> > FROM dataset1,dataset2
> > WHERE dataset1.ID=3Ddataset2.ID
> > ;
> > QUIT;
>
> > Here is a snapshot of how the data looks:
> > dataset1:
>
> > ID =A0 =A0VISNO =A0 =A0 =A0Varx
> > 1 =A0 =A0 =A0 =A01 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 1 =A0 =A0 =A0 =A02 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 1 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 Y
> > 1 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 1 =A0 =A0 =A0 =A05 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 2 =A0 =A0 =A0 =A01 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 2 =A0 =A0 =A0 =A02 =A0 =A0 =A0 =A0 =A0 =A0 Y
> > 2 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 2 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 2 =A0 =A0 =A0 =A05 =A0 =A0 =A0 =A0 =A0 =A0 N
>
> > dataset2:
>
> > ID =A0 =A0VISNO =A0 =A0 =A0Varx
> > 1 =A0 =A0 =A0 =A00 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 1 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 1 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 Y
> > 1 =A0 =A0 =A0 =A06 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 1 =A0 =A0 =A0 =A08 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 2 =A0 =A0 =A0 =A00 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 2 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 Y
> > 2 =A0 =A0 =A0 =A06 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 2 =A0 =A0 =A0 =A08 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 2 =A0 =A0 =A010 =A0 =A0 =A0 =A0 =A0 =A0 N- Hide quoted text -
>
> - Show quoted text -
Thank you Tom.
How can I apply this to all the dataset. Since I have 102 variables. I
tried the following, however it still did not retain the missing VISNO
in either dataset
proc sort data=3Ddataset1; by ID VISNO;
proc sort data=3Ddataset2; by ID VISNO;
data new;
merge dataset1(in=3Din1) dataset2(in=3Din2);
by id visno;
if first.id then do; end;
retain;
if in1;
if in2;
put id;
if last.id then put;
run;
run;
|
|
0
|
|
|
|
Reply
|
jiji
|
3/7/2010 8:17:32 PM
|
|
On Mar 7, 2:17=A0pm, jiji <joelleahal...@gmail.com> wrote:
> On Mar 7, 12:45=A0pm, Tom Abernathy <tom.aberna...@gmail.com> wrote:
>
>
>
>
>
> > Why not just merge by ID and VISNO? =A0You will not lose any
> > information.
> > Perhaps you want to carry-forward the values for X or Y onto the
> > missing visits?
> > You can do that by generating new variables and using the RETAIN
> > statement.
>
> > data one;
> > =A0input id visno x $ @@;
> > cards;
> > 1 1 N 1 2 N 1 3 Y 1 4 N 1 5 N
> > 2 1 N 2 2 Y 2 3 N 2 4 N 2 5 N
> > run;
>
> > data two;
> > =A0input id visno y $ @@;
> > cards;
> > 1 0 N 1 3 N 1 4 Y 1 6 N 1 8 N
> > 2 0 N 2 4 Y 2 6 N 2 8 N 2 10 N
> > run;
>
> > data new;
> > =A0 merge one(in=3Din1) two(in=3Din2);
> > =A0 by id visno;
> > =A0 if first.id then do; newx=3D' ' ; newy=3D' '; end;
> > =A0 retain newx newy;
> > =A0 format visno z2.;
> > =A0 if in1 then newx=3Dx;
> > =A0 if in2 then newy=3Dy;
> > =A0 put id visno newx newy;
> > =A0 if last.id then put;
> > run;
>
> > 1 00 =A0 N
> > 1 01 N N
> > 1 02 N N
> > 1 03 Y N
> > 1 04 N Y
> > 1 05 N Y
> > 1 06 N N
> > 1 08 N N
>
> > 2 00 =A0 N
> > 2 01 N N
> > 2 02 Y N
> > 2 03 N N
> > 2 04 N Y
> > 2 05 N Y
> > 2 06 N N
> > 2 08 N N
> > 2 10 N N
>
> > On Mar 7, 10:56=A0am, jiji <joelleahal...@gmail.com> wrote:
>
> > > Hi,
>
> > > I am having difficulty with a many-to-many merge between two
> > > datasets.
> > > The two datasets actually have 2 variables in common (ID and VISNO),
> > > but I want to merge on ID only since I would lose some data from the
> > > first dataset if I merge with ID and VISNO. Here are both my merge an=
d
> > > procSQL codes:
>
> > > proc sort data=3Ddataset1; by ID;
> > > proc sort data=3Ddataset2; by ID;
> > > Data want;
> > > Merge
> > > dataset1 (IN=3Din1)
> > > dataset2;
> > > By ID;
> > > if in1;
> > > Run;
>
> > > OR
>
> > > PROC SQL;
> > > CREATE TABLE want AS
> > > SELECT *
> > > FROM dataset1,dataset2
> > > WHERE dataset1.ID=3Ddataset2.ID
> > > ;
> > > QUIT;
>
> > > Here is a snapshot of how the data looks:
> > > dataset1:
>
> > > ID =A0 =A0VISNO =A0 =A0 =A0Varx
> > > 1 =A0 =A0 =A0 =A01 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 1 =A0 =A0 =A0 =A02 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 1 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 Y
> > > 1 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 1 =A0 =A0 =A0 =A05 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 2 =A0 =A0 =A0 =A01 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 2 =A0 =A0 =A0 =A02 =A0 =A0 =A0 =A0 =A0 =A0 Y
> > > 2 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 2 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 2 =A0 =A0 =A0 =A05 =A0 =A0 =A0 =A0 =A0 =A0 N
>
> > > dataset2:
>
> > > ID =A0 =A0VISNO =A0 =A0 =A0Varx
> > > 1 =A0 =A0 =A0 =A00 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 1 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 1 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 Y
> > > 1 =A0 =A0 =A0 =A06 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 1 =A0 =A0 =A0 =A08 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 2 =A0 =A0 =A0 =A00 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 2 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 Y
> > > 2 =A0 =A0 =A0 =A06 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 2 =A0 =A0 =A0 =A08 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 2 =A0 =A0 =A010 =A0 =A0 =A0 =A0 =A0 =A0 N- Hide quoted text -
>
> > - Show quoted text -
>
> Thank you Tom.
> How can I apply this to all the dataset. Since I have 102 variables. I
> tried the following, however it still did not retain the missing VISNO
> in either dataset
>
> proc sort data=3Ddataset1; by ID VISNO;
> proc sort data=3Ddataset2; by ID VISNO;
> data new;
> =A0 merge dataset1(in=3Din1) dataset2(in=3Din2);
> =A0 by id visno;
> =A0 if first.id then do; end;
> =A0 retain;
> =A0 if in1;
> =A0 if in2;
> =A0 put id;
> =A0 if last.id then put;
> =A0 run;
> run;- Hide quoted text -
>
> - Show quoted text -
It actually worked with the following code:
data want;
merge dataset1(in=3Din1) dataset2(in=3Din2);
by id visno;
retain if in1 or in2;
run;
Thanks!!
|
|
0
|
|
|
|
Reply
|
jiji
|
3/7/2010 8:37:57 PM
|
|
Hey,
you could also try a FULL JOIN on ID and VISNO or a NATURAL FULL JOIN.
The PROC SQL code posted earlier was an INNER JOIN.
/*full join*/
proc sql;
select coalesce(d1.id,d2.id) as id, coalesce(d1.visno,d2.visno) as visno,varx,vary
from dataset1 d1 full join dataset2 d2
on d1.id = d2.id and d1.visno = d2.visno;
quit;
/*equivalent to the code above, but less coding for you if the key variables have the same
names*/
proc sql;
select * from dataset1 natural full join dataset2;
quit;
HTH,
Marianne
On Sun, 7 Mar 2010 13:01:32 -0500, Arthur Tabachneck <art297@NETSCAPE.NET> wrote:
>Joelle,
>
>There is a time delay (sometimes a large time delay) between when posts
>are appear on the listserv and Google group's copy of the listserv. I
>can't recall on which I had placed my original post.
>
>However, your problem doesn't appear to be a many-to-many merge but,
>rather, a straight forward merge using both id and visno. e.g.,:
>
>data dataset1;
> input ID VISNO Varx $;
> cards;
>1 1 N
>1 2 N
>1 3 Y
>1 4 N
>1 5 N
>2 1 N
>2 2 Y
>2 3 N
>2 4 N
>2 5 N
>;
>
>data dataset2;
> input ID VISNO Vary $;
> cards;
>1 0 N
>1 3 N
>1 4 Y
>1 6 N
>1 8 N
>2 0 N
>2 4 Y
>2 6 N
>2 8 N
>2 10 N
>;
>
>proc sort data=dataset1; by ID VISNO;run;
>proc sort data=dataset2; by ID VISNO;run;
>
>data want;
> Merge dataset1 dataset2;
> By ID VISNO;
>run;
>
>HTH,
>Art
>-----Original Message-----
>From: Joelle Hallak <joelleahallak@gmail.com>
>To: Arthur Tabachneck <art297@netscape.net>
>Sent: Sun, Mar 7, 2010 11:52 am
>Subject: Re: Many-to-many merge
>
>Hi Arthur,
>for some reason I do not see your question posted on the discussion
>page.
>Anyway, i posted my answer to your question there
>
>I should have changed VarX to Var Y in dataset2. The two datasets do
> not have variables in common.
>
>I want the final dataset to look something like this
> ID � � VISNO � � Varx � � � Vary
> 1 � � � � 0 � � � � � � . � � � � � N
> 1 � � � � 1 � � � � � � N � � � � � .
> 1 � � � � 2 � � � � � � N � � � � � .
> 1 � � � � 3 � � � � � � Y � � � � N
> 1 � � � � 4 � � � � � � N � � � � Y
> 1 � � � � 5 � � � � � � N � � � � .
> 1 � � � � 6 � � � � � � . � � � � � N
> 1 � � � � 8 � � � � � � . � � � � � N
>
>Thanks,
>
>On Sun, Mar 7, 2010 at 10:39 AM, Arthur Tabachneck
><art297@netscape.net> wrote:
>jiji,
>
>Given the sample data you provided, what do you want the resulting
>file to look like? � There are many possibilities and, for both ids,
>visno 4 has conflicting values for varx.
>
>Art
>-------------
>On Mar 7, 10:56� am, jiji <joelleahal...@gmail.com> wrote:
>> Hi,
>>
>> I am having difficulty with a many-to-many merge between two
>> datasets.
>> The two datasets actually have 2 variables in common (ID and
>VISNO),
>> but I want to merge on ID only since I would lose some data from
>the
>> first dataset if I merge with ID and VISNO. Here are both my merge
>and
>> procSQL codes:
>>
>> proc sort data=dataset1; by ID;
>> proc sort data=dataset2; by ID;
>> Data want;
>> Merge
>> dataset1 (IN=in1)
>> dataset2;
>> By ID;
>> if in1;
>> Run;
>>
>> OR
>>
>> PROC SQL;
>> CREATE TABLE want AS
>> SELECT *
>> FROM dataset1,dataset2
>> WHERE dataset1.ID=dataset2.ID
>> ;
>> QUIT;
>>
>> Here is a snapshot of how the data looks:
>> dataset1:
>>
>> ID � � VISNO � � � Varx
>> 1 � � � � 1 � � � � � � N
>> 1 � � � � 2 � � � � � � N
>> 1 � � � � 3 � � � � � � Y
>> 1 � � � � 4 � � � � � � N
>> 1 � � � � 5 � � � � � � N
>> 2 � � � � 1 � � � � � � N
>> 2 � � � � 2 � � � � � � Y
>> 2 � � � � 3 � � � � � � N
>> 2 � � � � 4 � � � � � � N
>> 2 � � � � 5 � � � � � � N
>>
>> dataset2:
>>
>> ID � � VISNO � � � Varx
>> 1 � � � � 0 � � � � � � N
>> 1 � � � � 3 � � � � � � N
>> 1 � � � � 4 � � � � � � Y
>> 1 � � � � 6 � � � � � � N
>> 1 � � � � 8 � � � � � � N
>> 2 � � � � 0 � � � � � � N
>> 2 � � � � 4 � � � � � � Y
>> 2 � � � � 6 � � � � � � N
>> 2 � � � � 8 � � � � � � N
>> 2 � � � 10 � � � � � � N
|
|
0
|
|
|
|
Reply
|
m.weires (5)
|
3/7/2010 8:42:13 PM
|
|
|
6 Replies
356 Views
(page loaded in 0.173 seconds)
|