Many-to-many merge

  • Follow


Hi,

I am having difficulty with a many-to-many merge between two
datasets.
The two datasets actually have 2 variables in common (ID and VISNO),
but I want to merge on ID only since I would lose some data from the
first dataset if I merge with ID and VISNO. Here are both my merge and
procSQL codes:

proc sort data=dataset1; by ID;
proc sort data=dataset2; by ID;
Data want;
Merge
dataset1 (IN=in1)
dataset2;
By ID;
if in1;
Run;

OR

PROC SQL;
CREATE TABLE want AS
SELECT *
FROM dataset1,dataset2
WHERE dataset1.ID=dataset2.ID
;
QUIT;

Here is a snapshot of how the data looks:
dataset1:

ID    VISNO      Varx
1        1             N
1        2             N
1        3             Y
1        4             N
1        5             N
2        1             N
2        2             Y
2        3             N
2        4             N
2        5             N

dataset2:

ID    VISNO      Varx
1        0             N
1        3             N
1        4             Y
1        6             N
1        8             N
2        0             N
2        4             Y
2        6             N
2        8             N
2      10             N
0
Reply jiji 3/7/2010 3:56:32 PM

On Mar 7, 9:56=A0am, jiji <joelleahal...@gmail.com> wrote:
> Hi,
>
> I am having difficulty with a many-to-many merge between two
> datasets.
> The two datasets actually have 2 variables in common (ID and VISNO),
> but I want to merge on ID only since I would lose some data from the
> first dataset if I merge with ID and VISNO. Here are both my merge and
> procSQL codes:
>
> proc sort data=3Ddataset1; by ID;
> proc sort data=3Ddataset2; by ID;
> Data want;
> Merge
> dataset1 (IN=3Din1)
> dataset2;
> By ID;
> if in1;
> Run;
>
> OR
>
> PROC SQL;
> CREATE TABLE want AS
> SELECT *
> FROM dataset1,dataset2
> WHERE dataset1.ID=3Ddataset2.ID
> ;
> QUIT;
>
> Here is a snapshot of how the data looks:
> dataset1:
>
> ID =A0 =A0VISNO =A0 =A0 =A0Varx
> 1 =A0 =A0 =A0 =A01 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A02 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 Y
> 1 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A05 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A01 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A02 =A0 =A0 =A0 =A0 =A0 =A0 Y
> 2 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A05 =A0 =A0 =A0 =A0 =A0 =A0 N
>
> dataset2:
>
> ID =A0 =A0VISNO =A0 =A0 =A0Vary
> 1 =A0 =A0 =A0 =A00 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 Y
> 1 =A0 =A0 =A0 =A06 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A08 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A00 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 Y
> 2 =A0 =A0 =A0 =A06 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A08 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A010 =A0 =A0 =A0 =A0 =A0 =A0 N

I should have changed VarX to Var Y in dataset2. The two datasets do
not have variables in common.

I want the final dataset to look something like this

ID    VISNO     Varx      Vary
1        0             .           N
1        1             N          .
1        2             N          .
1        3             Y         N
1        4             N         Y
1        5             N         .
1        6             .          N
1        8             .          N
0
Reply jiji 3/7/2010 4:49:25 PM


Joelle,

There is a time delay (sometimes a large time delay) between when posts
are appear on the listserv and Google group's copy of the listserv.  I
can't recall on which I had placed my original post.

However, your problem doesn't appear to be a many-to-many merge but,
rather, a straight forward merge using both id and visno.  e.g.,:

data dataset1;
  input ID    VISNO      Varx $;
  cards;
1        1             N
1        2             N
1        3             Y
1        4             N
1        5             N
2        1             N
2        2             Y
2        3             N
2        4             N
2        5             N
;

data dataset2;
  input ID    VISNO      Vary $;
  cards;
1        0             N
1        3             N
1        4             Y
1        6             N
1        8             N
2        0             N
2        4             Y
2        6             N
2        8             N
2      10             N
;

proc sort data=dataset1; by ID VISNO;run;
proc sort data=dataset2; by ID VISNO;run;

data want;
  Merge dataset1 dataset2;
  By ID VISNO;
run;

HTH,
Art
-----Original Message-----
From: Joelle Hallak <joelleahallak@gmail.com>
To: Arthur Tabachneck <art297@netscape.net>
Sent: Sun, Mar 7, 2010 11:52 am
Subject: Re: Many-to-many merge

Hi Arthur,
for some reason I do not see your question posted on the discussion
page.
Anyway, i posted my answer to your question there

I should have changed VarX to Var Y in dataset2. The two datasets do
 not have variables in common.

I want the final dataset to look something like this
 ID �  � VISNO �  �  Varx �  �  � Vary
 1 �  �  �  � 0 �  �  �  �  �  �  . �  �  �  �  �  N
 1 �  �  �  � 1 �  �  �  �  �  �  N �  �  �  �  � .
 1 �  �  �  � 2 �  �  �  �  �  �  N �  �  �  �  � .
 1 �  �  �  � 3 �  �  �  �  �  �  Y �  �  �  �  N
 1 �  �  �  � 4 �  �  �  �  �  �  N �  �  �  �  Y
 1 �  �  �  � 5 �  �  �  �  �  �  N �  �  �  �  .
 1 �  �  �  � 6 �  �  �  �  �  �  . �  �  �  �  � N
 1 �  �  �  � 8 �  �  �  �  �  �  . �  �  �  �  � N

Thanks,

On Sun, Mar 7, 2010 at 10:39 AM, Arthur Tabachneck
&lt;art297@netscape.net&gt; wrote:
jiji,

Given the sample data you provided, what do you want the resulting
file to look like? � There are many possibilities and, for both ids,
visno 4 has conflicting values for varx.

Art
-------------
On Mar 7, 10:56� am, jiji &lt;joelleahal...@gmail.com&gt; wrote:
&gt; Hi,
&gt;
&gt; I am having difficulty with a many-to-many merge between two
&gt; datasets.
&gt; The two datasets actually have 2 variables in common (ID and
VISNO),
&gt; but I want to merge on ID only since I would lose some data from
the
&gt; first dataset if I merge with ID and VISNO. Here are both my merge
and
&gt; procSQL codes:
&gt;
&gt; proc sort data=dataset1; by ID;
&gt; proc sort data=dataset2; by ID;
&gt; Data want;
&gt; Merge
&gt; dataset1 (IN=in1)
&gt; dataset2;
&gt; By ID;
&gt; if in1;
&gt; Run;
&gt;
&gt; OR
&gt;
&gt; PROC SQL;
&gt; CREATE TABLE want AS
&gt; SELECT *
&gt; FROM dataset1,dataset2
&gt; WHERE dataset1.ID=dataset2.ID
&gt; ;
&gt; QUIT;
&gt;
&gt; Here is a snapshot of how the data looks:
&gt; dataset1:
&gt;
&gt; ID �  � VISNO �  �  � Varx
&gt; 1 �  �  �  � 1 �  �  �  �  �  �  N
&gt; 1 �  �  �  � 2 �  �  �  �  �  �  N
&gt; 1 �  �  �  � 3 �  �  �  �  �  �  Y
&gt; 1 �  �  �  � 4 �  �  �  �  �  �  N
&gt; 1 �  �  �  � 5 �  �  �  �  �  �  N
&gt; 2 �  �  �  � 1 �  �  �  �  �  �  N
&gt; 2 �  �  �  � 2 �  �  �  �  �  �  Y
&gt; 2 �  �  �  � 3 �  �  �  �  �  �  N
&gt; 2 �  �  �  � 4 �  �  �  �  �  �  N
&gt; 2 �  �  �  � 5 �  �  �  �  �  �  N
&gt;
&gt; dataset2:
&gt;
&gt; ID �  � VISNO �  �  � Varx
&gt; 1 �  �  �  � 0 �  �  �  �  �  �  N
&gt; 1 �  �  �  � 3 �  �  �  �  �  �  N
&gt; 1 �  �  �  � 4 �  �  �  �  �  �  Y
&gt; 1 �  �  �  � 6 �  �  �  �  �  �  N
&gt; 1 �  �  �  � 8 �  �  �  �  �  �  N
&gt; 2 �  �  �  � 0 �  �  �  �  �  �  N
&gt; 2 �  �  �  � 4 �  �  �  �  �  �  Y
&gt; 2 �  �  �  � 6 �  �  �  �  �  �  N
&gt; 2 �  �  �  � 8 �  �  �  �  �  �  N
&gt; 2 �  �  � 10 �  �  �  �  �  �  N
0
Reply art297 (4237) 3/7/2010 6:01:32 PM

Why not just merge by ID and VISNO?  You will not lose any
information.
Perhaps you want to carry-forward the values for X or Y onto the
missing visits?
You can do that by generating new variables and using the RETAIN
statement.

data one;
 input id visno x $ @@;
cards;
1 1 N 1 2 N 1 3 Y 1 4 N 1 5 N
2 1 N 2 2 Y 2 3 N 2 4 N 2 5 N
run;

data two;
 input id visno y $ @@;
cards;
1 0 N 1 3 N 1 4 Y 1 6 N 1 8 N
2 0 N 2 4 Y 2 6 N 2 8 N 2 10 N
run;

data new;
  merge one(in=3Din1) two(in=3Din2);
  by id visno;
  if first.id then do; newx=3D' ' ; newy=3D' '; end;
  retain newx newy;
  format visno z2.;
  if in1 then newx=3Dx;
  if in2 then newy=3Dy;
  put id visno newx newy;
  if last.id then put;
run;


1 00   N
1 01 N N
1 02 N N
1 03 Y N
1 04 N Y
1 05 N Y
1 06 N N
1 08 N N

2 00   N
2 01 N N
2 02 Y N
2 03 N N
2 04 N Y
2 05 N Y
2 06 N N
2 08 N N
2 10 N N

On Mar 7, 10:56=A0am, jiji <joelleahal...@gmail.com> wrote:
> Hi,
>
> I am having difficulty with a many-to-many merge between two
> datasets.
> The two datasets actually have 2 variables in common (ID and VISNO),
> but I want to merge on ID only since I would lose some data from the
> first dataset if I merge with ID and VISNO. Here are both my merge and
> procSQL codes:
>
> proc sort data=3Ddataset1; by ID;
> proc sort data=3Ddataset2; by ID;
> Data want;
> Merge
> dataset1 (IN=3Din1)
> dataset2;
> By ID;
> if in1;
> Run;
>
> OR
>
> PROC SQL;
> CREATE TABLE want AS
> SELECT *
> FROM dataset1,dataset2
> WHERE dataset1.ID=3Ddataset2.ID
> ;
> QUIT;
>
> Here is a snapshot of how the data looks:
> dataset1:
>
> ID =A0 =A0VISNO =A0 =A0 =A0Varx
> 1 =A0 =A0 =A0 =A01 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A02 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 Y
> 1 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A05 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A01 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A02 =A0 =A0 =A0 =A0 =A0 =A0 Y
> 2 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A05 =A0 =A0 =A0 =A0 =A0 =A0 N
>
> dataset2:
>
> ID =A0 =A0VISNO =A0 =A0 =A0Varx
> 1 =A0 =A0 =A0 =A00 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 Y
> 1 =A0 =A0 =A0 =A06 =A0 =A0 =A0 =A0 =A0 =A0 N
> 1 =A0 =A0 =A0 =A08 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A00 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 Y
> 2 =A0 =A0 =A0 =A06 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A0 =A08 =A0 =A0 =A0 =A0 =A0 =A0 N
> 2 =A0 =A0 =A010 =A0 =A0 =A0 =A0 =A0 =A0 N

0
Reply Tom 3/7/2010 6:45:11 PM

On Mar 7, 12:45=A0pm, Tom Abernathy <tom.aberna...@gmail.com> wrote:
> Why not just merge by ID and VISNO? =A0You will not lose any
> information.
> Perhaps you want to carry-forward the values for X or Y onto the
> missing visits?
> You can do that by generating new variables and using the RETAIN
> statement.
>
> data one;
> =A0input id visno x $ @@;
> cards;
> 1 1 N 1 2 N 1 3 Y 1 4 N 1 5 N
> 2 1 N 2 2 Y 2 3 N 2 4 N 2 5 N
> run;
>
> data two;
> =A0input id visno y $ @@;
> cards;
> 1 0 N 1 3 N 1 4 Y 1 6 N 1 8 N
> 2 0 N 2 4 Y 2 6 N 2 8 N 2 10 N
> run;
>
> data new;
> =A0 merge one(in=3Din1) two(in=3Din2);
> =A0 by id visno;
> =A0 if first.id then do; newx=3D' ' ; newy=3D' '; end;
> =A0 retain newx newy;
> =A0 format visno z2.;
> =A0 if in1 then newx=3Dx;
> =A0 if in2 then newy=3Dy;
> =A0 put id visno newx newy;
> =A0 if last.id then put;
> run;
>
> 1 00 =A0 N
> 1 01 N N
> 1 02 N N
> 1 03 Y N
> 1 04 N Y
> 1 05 N Y
> 1 06 N N
> 1 08 N N
>
> 2 00 =A0 N
> 2 01 N N
> 2 02 Y N
> 2 03 N N
> 2 04 N Y
> 2 05 N Y
> 2 06 N N
> 2 08 N N
> 2 10 N N
>
> On Mar 7, 10:56=A0am, jiji <joelleahal...@gmail.com> wrote:
>
>
>
> > Hi,
>
> > I am having difficulty with a many-to-many merge between two
> > datasets.
> > The two datasets actually have 2 variables in common (ID and VISNO),
> > but I want to merge on ID only since I would lose some data from the
> > first dataset if I merge with ID and VISNO. Here are both my merge and
> > procSQL codes:
>
> > proc sort data=3Ddataset1; by ID;
> > proc sort data=3Ddataset2; by ID;
> > Data want;
> > Merge
> > dataset1 (IN=3Din1)
> > dataset2;
> > By ID;
> > if in1;
> > Run;
>
> > OR
>
> > PROC SQL;
> > CREATE TABLE want AS
> > SELECT *
> > FROM dataset1,dataset2
> > WHERE dataset1.ID=3Ddataset2.ID
> > ;
> > QUIT;
>
> > Here is a snapshot of how the data looks:
> > dataset1:
>
> > ID =A0 =A0VISNO =A0 =A0 =A0Varx
> > 1 =A0 =A0 =A0 =A01 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 1 =A0 =A0 =A0 =A02 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 1 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 Y
> > 1 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 1 =A0 =A0 =A0 =A05 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 2 =A0 =A0 =A0 =A01 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 2 =A0 =A0 =A0 =A02 =A0 =A0 =A0 =A0 =A0 =A0 Y
> > 2 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 2 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 2 =A0 =A0 =A0 =A05 =A0 =A0 =A0 =A0 =A0 =A0 N
>
> > dataset2:
>
> > ID =A0 =A0VISNO =A0 =A0 =A0Varx
> > 1 =A0 =A0 =A0 =A00 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 1 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 1 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 Y
> > 1 =A0 =A0 =A0 =A06 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 1 =A0 =A0 =A0 =A08 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 2 =A0 =A0 =A0 =A00 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 2 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 Y
> > 2 =A0 =A0 =A0 =A06 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 2 =A0 =A0 =A0 =A08 =A0 =A0 =A0 =A0 =A0 =A0 N
> > 2 =A0 =A0 =A010 =A0 =A0 =A0 =A0 =A0 =A0 N- Hide quoted text -
>
> - Show quoted text -

Thank you Tom.
How can I apply this to all the dataset. Since I have 102 variables. I
tried the following, however it still did not retain the missing VISNO
in either dataset

proc sort data=3Ddataset1; by ID VISNO;
proc sort data=3Ddataset2; by ID VISNO;
data new;
  merge dataset1(in=3Din1) dataset2(in=3Din2);
  by id visno;
  if first.id then do; end;
  retain;
  if in1;
  if in2;
  put id;
  if last.id then put;
  run;
run;
0
Reply jiji 3/7/2010 8:17:32 PM

On Mar 7, 2:17=A0pm, jiji <joelleahal...@gmail.com> wrote:
> On Mar 7, 12:45=A0pm, Tom Abernathy <tom.aberna...@gmail.com> wrote:
>
>
>
>
>
> > Why not just merge by ID and VISNO? =A0You will not lose any
> > information.
> > Perhaps you want to carry-forward the values for X or Y onto the
> > missing visits?
> > You can do that by generating new variables and using the RETAIN
> > statement.
>
> > data one;
> > =A0input id visno x $ @@;
> > cards;
> > 1 1 N 1 2 N 1 3 Y 1 4 N 1 5 N
> > 2 1 N 2 2 Y 2 3 N 2 4 N 2 5 N
> > run;
>
> > data two;
> > =A0input id visno y $ @@;
> > cards;
> > 1 0 N 1 3 N 1 4 Y 1 6 N 1 8 N
> > 2 0 N 2 4 Y 2 6 N 2 8 N 2 10 N
> > run;
>
> > data new;
> > =A0 merge one(in=3Din1) two(in=3Din2);
> > =A0 by id visno;
> > =A0 if first.id then do; newx=3D' ' ; newy=3D' '; end;
> > =A0 retain newx newy;
> > =A0 format visno z2.;
> > =A0 if in1 then newx=3Dx;
> > =A0 if in2 then newy=3Dy;
> > =A0 put id visno newx newy;
> > =A0 if last.id then put;
> > run;
>
> > 1 00 =A0 N
> > 1 01 N N
> > 1 02 N N
> > 1 03 Y N
> > 1 04 N Y
> > 1 05 N Y
> > 1 06 N N
> > 1 08 N N
>
> > 2 00 =A0 N
> > 2 01 N N
> > 2 02 Y N
> > 2 03 N N
> > 2 04 N Y
> > 2 05 N Y
> > 2 06 N N
> > 2 08 N N
> > 2 10 N N
>
> > On Mar 7, 10:56=A0am, jiji <joelleahal...@gmail.com> wrote:
>
> > > Hi,
>
> > > I am having difficulty with a many-to-many merge between two
> > > datasets.
> > > The two datasets actually have 2 variables in common (ID and VISNO),
> > > but I want to merge on ID only since I would lose some data from the
> > > first dataset if I merge with ID and VISNO. Here are both my merge an=
d
> > > procSQL codes:
>
> > > proc sort data=3Ddataset1; by ID;
> > > proc sort data=3Ddataset2; by ID;
> > > Data want;
> > > Merge
> > > dataset1 (IN=3Din1)
> > > dataset2;
> > > By ID;
> > > if in1;
> > > Run;
>
> > > OR
>
> > > PROC SQL;
> > > CREATE TABLE want AS
> > > SELECT *
> > > FROM dataset1,dataset2
> > > WHERE dataset1.ID=3Ddataset2.ID
> > > ;
> > > QUIT;
>
> > > Here is a snapshot of how the data looks:
> > > dataset1:
>
> > > ID =A0 =A0VISNO =A0 =A0 =A0Varx
> > > 1 =A0 =A0 =A0 =A01 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 1 =A0 =A0 =A0 =A02 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 1 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 Y
> > > 1 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 1 =A0 =A0 =A0 =A05 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 2 =A0 =A0 =A0 =A01 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 2 =A0 =A0 =A0 =A02 =A0 =A0 =A0 =A0 =A0 =A0 Y
> > > 2 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 2 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 2 =A0 =A0 =A0 =A05 =A0 =A0 =A0 =A0 =A0 =A0 N
>
> > > dataset2:
>
> > > ID =A0 =A0VISNO =A0 =A0 =A0Varx
> > > 1 =A0 =A0 =A0 =A00 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 1 =A0 =A0 =A0 =A03 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 1 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 Y
> > > 1 =A0 =A0 =A0 =A06 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 1 =A0 =A0 =A0 =A08 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 2 =A0 =A0 =A0 =A00 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 2 =A0 =A0 =A0 =A04 =A0 =A0 =A0 =A0 =A0 =A0 Y
> > > 2 =A0 =A0 =A0 =A06 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 2 =A0 =A0 =A0 =A08 =A0 =A0 =A0 =A0 =A0 =A0 N
> > > 2 =A0 =A0 =A010 =A0 =A0 =A0 =A0 =A0 =A0 N- Hide quoted text -
>
> > - Show quoted text -
>
> Thank you Tom.
> How can I apply this to all the dataset. Since I have 102 variables. I
> tried the following, however it still did not retain the missing VISNO
> in either dataset
>
> proc sort data=3Ddataset1; by ID VISNO;
> proc sort data=3Ddataset2; by ID VISNO;
> data new;
> =A0 merge dataset1(in=3Din1) dataset2(in=3Din2);
> =A0 by id visno;
> =A0 if first.id then do; end;
> =A0 retain;
> =A0 if in1;
> =A0 if in2;
> =A0 put id;
> =A0 if last.id then put;
> =A0 run;
> run;- Hide quoted text -
>
> - Show quoted text -

It actually worked with the following code:
data want;
  merge dataset1(in=3Din1) dataset2(in=3Din2);
  by id visno;
  retain if in1 or in2;
run;

Thanks!!
0
Reply jiji 3/7/2010 8:37:57 PM

Hey,

you could also try a FULL JOIN on ID and VISNO or a NATURAL FULL JOIN.
The PROC SQL code posted earlier was an INNER JOIN.

/*full join*/
proc sql;
 select coalesce(d1.id,d2.id) as id, coalesce(d1.visno,d2.visno) as visno,varx,vary
  from dataset1 d1 full join dataset2 d2
  on d1.id = d2.id and d1.visno = d2.visno;
quit;


/*equivalent to the code above, but less coding for you if the key variables have the same
names*/
proc sql;
 select * from dataset1 natural full join dataset2;
quit;

HTH,
Marianne

On Sun, 7 Mar 2010 13:01:32 -0500, Arthur Tabachneck <art297@NETSCAPE.NET> wrote:

>Joelle,
>
>There is a time delay (sometimes a large time delay) between when posts
>are appear on the listserv and Google group's copy of the listserv.  I
>can't recall on which I had placed my original post.
>
>However, your problem doesn't appear to be a many-to-many merge but,
>rather, a straight forward merge using both id and visno.  e.g.,:
>
>data dataset1;
>  input ID    VISNO      Varx $;
>  cards;
>1        1             N
>1        2             N
>1        3             Y
>1        4             N
>1        5             N
>2        1             N
>2        2             Y
>2        3             N
>2        4             N
>2        5             N
>;
>
>data dataset2;
>  input ID    VISNO      Vary $;
>  cards;
>1        0             N
>1        3             N
>1        4             Y
>1        6             N
>1        8             N
>2        0             N
>2        4             Y
>2        6             N
>2        8             N
>2      10             N
>;
>
>proc sort data=dataset1; by ID VISNO;run;
>proc sort data=dataset2; by ID VISNO;run;
>
>data want;
>  Merge dataset1 dataset2;
>  By ID VISNO;
>run;
>
>HTH,
>Art
>-----Original Message-----
>From: Joelle Hallak <joelleahallak@gmail.com>
>To: Arthur Tabachneck <art297@netscape.net>
>Sent: Sun, Mar 7, 2010 11:52 am
>Subject: Re: Many-to-many merge
>
>Hi Arthur,
>for some reason I do not see your question posted on the discussion
>page.
>Anyway, i posted my answer to your question there
>
>I should have changed VarX to Var Y in dataset2. The two datasets do
> not have variables in common.
>
>I want the final dataset to look something like this
> ID �  � VISNO �  �  Varx �  �  � Vary
> 1 �  �  �  � 0 �  �  �  �  �  �  . �  �  �  �  �  N
> 1 �  �  �  � 1 �  �  �  �  �  �  N �  �  �  �  � .
> 1 �  �  �  � 2 �  �  �  �  �  �  N �  �  �  �  � .
> 1 �  �  �  � 3 �  �  �  �  �  �  Y �  �  �  �  N
> 1 �  �  �  � 4 �  �  �  �  �  �  N �  �  �  �  Y
> 1 �  �  �  � 5 �  �  �  �  �  �  N �  �  �  �  .
> 1 �  �  �  � 6 �  �  �  �  �  �  . �  �  �  �  � N
> 1 �  �  �  � 8 �  �  �  �  �  �  . �  �  �  �  � N
>
>Thanks,
>
>On Sun, Mar 7, 2010 at 10:39 AM, Arthur Tabachneck
>&lt;art297@netscape.net&gt; wrote:
>jiji,
>
>Given the sample data you provided, what do you want the resulting
>file to look like? � There are many possibilities and, for both ids,
>visno 4 has conflicting values for varx.
>
>Art
>-------------
>On Mar 7, 10:56� am, jiji &lt;joelleahal...@gmail.com&gt; wrote:
>&gt; Hi,
>&gt;
>&gt; I am having difficulty with a many-to-many merge between two
>&gt; datasets.
>&gt; The two datasets actually have 2 variables in common (ID and
>VISNO),
>&gt; but I want to merge on ID only since I would lose some data from
>the
>&gt; first dataset if I merge with ID and VISNO. Here are both my merge
>and
>&gt; procSQL codes:
>&gt;
>&gt; proc sort data=dataset1; by ID;
>&gt; proc sort data=dataset2; by ID;
>&gt; Data want;
>&gt; Merge
>&gt; dataset1 (IN=in1)
>&gt; dataset2;
>&gt; By ID;
>&gt; if in1;
>&gt; Run;
>&gt;
>&gt; OR
>&gt;
>&gt; PROC SQL;
>&gt; CREATE TABLE want AS
>&gt; SELECT *
>&gt; FROM dataset1,dataset2
>&gt; WHERE dataset1.ID=dataset2.ID
>&gt; ;
>&gt; QUIT;
>&gt;
>&gt; Here is a snapshot of how the data looks:
>&gt; dataset1:
>&gt;
>&gt; ID �  � VISNO �  �  � Varx
>&gt; 1 �  �  �  � 1 �  �  �  �  �  �  N
>&gt; 1 �  �  �  � 2 �  �  �  �  �  �  N
>&gt; 1 �  �  �  � 3 �  �  �  �  �  �  Y
>&gt; 1 �  �  �  � 4 �  �  �  �  �  �  N
>&gt; 1 �  �  �  � 5 �  �  �  �  �  �  N
>&gt; 2 �  �  �  � 1 �  �  �  �  �  �  N
>&gt; 2 �  �  �  � 2 �  �  �  �  �  �  Y
>&gt; 2 �  �  �  � 3 �  �  �  �  �  �  N
>&gt; 2 �  �  �  � 4 �  �  �  �  �  �  N
>&gt; 2 �  �  �  � 5 �  �  �  �  �  �  N
>&gt;
>&gt; dataset2:
>&gt;
>&gt; ID �  � VISNO �  �  � Varx
>&gt; 1 �  �  �  � 0 �  �  �  �  �  �  N
>&gt; 1 �  �  �  � 3 �  �  �  �  �  �  N
>&gt; 1 �  �  �  � 4 �  �  �  �  �  �  Y
>&gt; 1 �  �  �  � 6 �  �  �  �  �  �  N
>&gt; 1 �  �  �  � 8 �  �  �  �  �  �  N
>&gt; 2 �  �  �  � 0 �  �  �  �  �  �  N
>&gt; 2 �  �  �  � 4 �  �  �  �  �  �  Y
>&gt; 2 �  �  �  � 6 �  �  �  �  �  �  N
>&gt; 2 �  �  �  � 8 �  �  �  �  �  �  N
>&gt; 2 �  �  � 10 �  �  �  �  �  �  N
0
Reply m.weires (5) 3/7/2010 8:42:13 PM

6 Replies
356 Views

(page loaded in 0.173 seconds)


Reply: