merge two files in awk

Hi,
I have two files (file 1 has one column and file 2 has four columns). I
have to choose the rows of file 2 where columns 2 and 3 of file 2 match
column 1 of file 1. Does anybody have any idea?
Thanks.

Reply amrita.ray (1) 10/27/2006 4:51:16 PM

To add to this, here is an example:

file 1 :
1_8
1_9
1_10
1_11


file 2 :
1 1_500 1_600 0.000 1.0 0.0 0.0
1 1_500 1_500 0.000 0.0 0.0 1.0
1 1_9 1_100 0.000 0.50000 0.50000 0.00000
1 1_9 1_200 0.000 0.50000 0.50000 0.00000
1 1_9 1_400 0.000 1.0 0.0 0.0
.....
1 1_8 1_500 2.107 0.59766 0.40234 0.00000
1 1_8 1_9 2.107 0.89431 0.10569 0.00000
1 1_8 1_300 2.107 0.0 1.0 0.0


Merge the two files so that it prints
1 1_8 1_9 2.107 0.89431 0.10569 0.00000

i.e. the rows of file 2 where col. 2 and col. 3 both match
entries in file 1


amrita.ray@gmail.com wrote:
> Hi,
> I have two files (file 1 has one column and file 2 has four columns). I
> have to choose the rows of file 2 where columns 2 and 3 of file 2 match
> column 1 of file 1. Does anybody have any idea?
> Thanks.

Reply mainak 10/27/2006 4:57:34 PM


Please don't top-post. Corrected below.

mainak.sen@gmail.com wrote:
> Hi,
> I have two files (file 1 has one column and file 2 has four columns). I
> have to choose the rows of file 2 where columns 2 and 3 of file 2 match
> column 1 of file 1. Does anybody have any idea?
> Thanks.
> > To add to this, here is an example:
> >
> > file 1 :
> > 1_8
> > 1_9
> > 1_10
> > 1_11
> >
> >
> > file 2 :
> > 1 1_500 1_600 0.000 1.0 0.0 0.0
> > 1 1_500 1_500 0.000 0.0 0.0 1.0
> > 1 1_9 1_100 0.000 0.50000 0.50000 0.00000
> > 1 1_9 1_200 0.000 0.50000 0.50000 0.00000
> > 1 1_9 1_400 0.000 1.0 0.0 0.0
> > ....
> > 1 1_8 1_500 2.107 0.59766 0.40234 0.00000
> > 1 1_8 1_9 2.107 0.89431 0.10569 0.00000
> > 1 1_8 1_300 2.107 0.0 1.0 0.0
> >
> >
> > Merge the two files so that it prints
> > 1 1_8 1_9 2.107 0.89431 0.10569 0.00000
> >
> > i.e. the rows of file 2 where col. 2 and col. 3 both match
> > entries in file 1

awk 'NR == FNR { col[$0]++ }
$2 in col && $3 in col' file1 file2
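To try the one-liner above, the sample data from the question can be recreated in a scratch directory. This is a sketch that assumes a POSIX-style shell with `mktemp -d` and a standard awk on the PATH; the rows elided as "....." in the question are simply left out.

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 1_8 1_9 1_10 1_11 > "$dir/file1"
cat > "$dir/file2" <<'EOF'
1 1_500 1_600 0.000 1.0 0.0 0.0
1 1_500 1_500 0.000 0.0 0.0 1.0
1 1_9 1_100 0.000 0.50000 0.50000 0.00000
1 1_9 1_200 0.000 0.50000 0.50000 0.00000
1 1_9 1_400 0.000 1.0 0.0 0.0
1 1_8 1_500 2.107 0.59766 0.40234 0.00000
1 1_8 1_9 2.107 0.89431 0.10569 0.00000
1 1_8 1_300 2.107 0.0 1.0 0.0
EOF

# The posted one-liner: load every file1 line as a key of col[], then
# print the file2 rows whose 2nd and 3rd fields are both keys of col[].
result=$(awk 'NR == FNR { col[$0]++ }
$2 in col && $3 in col' "$dir/file1" "$dir/file2")

printf '%s\n' "$result"
rm -r "$dir"
```

Only the row whose columns 2 and 3 (1_8 and 1_9) both appear in file1 is printed.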

Reply Vassilis 10/27/2006 6:12:14 PM

You already got your answer in comp.unix.shell.

Reply William 10/27/2006 7:28:51 PM

Yes, the answer:
awk 'NR==FNR {s[$1]} NR!=FNR && ($2 in s) && ($3 in s)' file1 file2
Thanks.


William James wrote:
> You already got your answer in comp.unix.shell.
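Why guard the second pattern at all? Without `NR!=FNR` (or a `next`), the second pattern is also evaluated for every file1 line, where `$2` and `$3` are empty strings. The sketch below uses hypothetical inputs in which file1 happens to contain a blank line (assuming a POSIX-style shell with `mktemp -d` and a standard awk). That blank line stores "" as a key, so the unguarded one-liner posted earlier in the thread also prints the blank line and every later file1 line, while the guarded version prints only the file2 match.

```shell
#!/bin/sh
# Hypothetical inputs: file1 has a blank second line, file2 has one row.
dir=$(mktemp -d)
printf '1_8\n\n1_9\n' > "$dir/file1"
printf '1 1_8 1_9 2.107 0.89431 0.10569 0.00000\n' > "$dir/file2"

# Unguarded: the second pattern is also tested on file1 lines.
# The blank line stores "" as a key; from then on (the blank line
# included) the empty $2 and $3 of each file1 line are "in" col[].
unguarded=$(awk 'NR == FNR { col[$0]++ }
$2 in col && $3 in col' "$dir/file1" "$dir/file2")

# Guarded: NR!=FNR restricts the second pattern to file2 records.
guarded=$(awk 'NR==FNR {s[$1]} NR!=FNR && ($2 in s) && ($3 in s)' "$dir/file1" "$dir/file2")

printf 'unguarded:\n%s\n---\nguarded:\n%s\n' "$unguarded" "$guarded"
rm -r "$dir"
```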

Reply amrita 10/27/2006 8:27:53 PM

amrita.ray@gmail.com wrote:
> Yes, the answer:
> awk 'NR==FNR {s[$1]} NR!=FNR && ($2 in s) && ($3 in s)' file1 file2

That's the wrong answer. Check the others you got.

	Ed.

> 
> William James wrote:
> 
>>You already got your answer in comp.unix.shell.
> 
> 
Reply Ed 10/27/2006 9:10:59 PM

Ed Morton wrote:
> amrita.ray@gmail.com wrote:
> 
>> Yes, the answer:
>> awk 'NR==FNR {s[$1]} NR!=FNR && ($2 in s) && ($3 in s)' file1 file2
> 
> That's the wrong answer. Check the others you got.

What's wrong with it?  In c.u.s the OP said it works.

Janis

> 
>     Ed.
> 
>>
>> William James wrote:
>>
>>> You already got your answer in comp.unix.shell.
>>
>>
>>
Reply Janis 10/27/2006 9:38:37 PM

Janis Papanagnou wrote:
> Ed Morton wrote:
> 
>> amrita.ray@gmail.com wrote:
>>
>>> Yes, the answer:
>>> awk 'NR==FNR {s[$1]} NR!=FNR && ($2 in s) && ($3 in s)' file1 file2
>>
>>
>> That's the wrong answer. Check the others you got.
> 
> 
> What's wrong with it?  In c.u.s the OP said it works.

It has 2 tests instead of one so it's less efficient and more 
complicated than it has to be. The right answer is:

awk 'NR==FNR {s[$1]; next} ($2 in s) && ($3 in s)' file1 file2
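A quick side-by-side check that the `next` form selects exactly the same rows as the `NR!=FNR` form on the sample data from the question. This is a sketch assuming a POSIX-style shell with `mktemp -d` and a standard awk; the "....." rows are omitted.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf '%s\n' 1_8 1_9 1_10 1_11 > "$dir/file1"
cat > "$dir/file2" <<'EOF'
1 1_500 1_600 0.000 1.0 0.0 0.0
1 1_500 1_500 0.000 0.0 0.0 1.0
1 1_9 1_100 0.000 0.50000 0.50000 0.00000
1 1_9 1_200 0.000 0.50000 0.50000 0.00000
1 1_9 1_400 0.000 1.0 0.0 0.0
1 1_8 1_500 2.107 0.59766 0.40234 0.00000
1 1_8 1_9 2.107 0.89431 0.10569 0.00000
1 1_8 1_300 2.107 0.0 1.0 0.0
EOF

# Guarded form: the second rule re-tests NR!=FNR on every record.
guarded=$(awk 'NR==FNR {s[$1]} NR!=FNR && ($2 in s) && ($3 in s)' "$dir/file1" "$dir/file2")

# next form: file1 records never reach the second rule.
with_next=$(awk 'NR==FNR {s[$1]; next} ($2 in s) && ($3 in s)' "$dir/file1" "$dir/file2")

printf 'guarded:   %s\nwith next: %s\n' "$guarded" "$with_next"
rm -r "$dir"
```

On this input the difference between the two is one of style and of how many tests each record goes through, not of output.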

	Ed.
Reply Ed 10/28/2006 1:58:22 AM

Ed Morton wrote:
> Janis Papanagnou wrote:
>> Ed Morton wrote:
>>> amrita.ray@gmail.com wrote:
>>>
>>>> Yes, the answer:
>>>> awk 'NR==FNR {s[$1]} NR!=FNR && ($2 in s) && ($3 in s)' file1 file2
>>>
>>> That's the wrong answer. Check the others you got.
>>
>> What's wrong with it?  In c.u.s the OP said it works.
> 
> It has 2 tests instead of one so it's less efficient and more 
> complicated than it has to be.

I wouldn't call that wrong, just different. Efficiency? Maybe, but I
think any difference is of little relevance here (it may even depend
on how much the awk interpreter cares about optimization).
Never mind.

But personally I think that breaking awk's natural parse sequence
by using 'next' is more "complicated" than guarding the conditions
à la Dijkstra's if-guards.

But I wouldn't call either of the two proposed one-liners complicated,
just as I wouldn't call either solution "wrong".

Janis

> The right answer is:
> 
> awk 'NR==FNR {s[$1]; next} ($2 in s) && ($3 in s)' file1 file2
> 
>     Ed.
Reply Janis 10/28/2006 2:32:42 AM

Janis Papanagnou wrote:
> Ed Morton wrote:
> 
>> Janis Papanagnou wrote:
>>
>>> Ed Morton wrote:
>>>
>>>> amrita.ray@gmail.com wrote:
>>>>
>>>>> Yes, the answer:
>>>>> awk 'NR==FNR {s[$1]} NR!=FNR && ($2 in s) && ($3 in s)' file1 file2
>>>>
>>>>
>>>> That's the wrong answer. Check the others you got.
>>>
>>>
>>> What's wrong with it?  In c.u.s the OP said it works.
>>
>>
>> It has 2 tests instead of one so it's less efficient and more 
>> complicated than it has to be.
> 
> 
> I wouldn't call that wrong, just different.

I would call it wrong because in addition to the above it's not 
extensible. Let's say you want to do other things with the file2 
records. Would you then do this:

awk '
NR==FNR {s[$1]}
NR!=FNR && ($2 in s) && ($3 in s) { print }
NR!=FNR && theSkyIsGrey { ... }
NR!=FNR && scotlandWinsWorldCup { ... }
NR!=FNR && endOfWorldArrives { ... }
' file1 file2

Rather than this:

awk '
NR==FNR {s[$1]; next}
($2 in s) && ($3 in s) { print }
theSkyIsGrey { ... }
scotlandWinsWorldCup { ... }
endOfWorldArrives { ... }
' file1 file2

Yes, the first version will work, but I'd be surprised if anyone 
advocated doing it that way. Also, as a Scot, I suspect that second from 
last condition will unfortunately never be true....
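A concrete (and hypothetical) instance of the `next` style above: the `$4 > 2` rule and its `END` summary stand in for the placeholder conditions, and neither needs an `NR!=FNR` guard because file1 records never get past the first rule. The sketch assumes a POSIX-style shell with `mktemp -d`, a standard awk, and the sample data from the question without the "....." rows.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf '%s\n' 1_8 1_9 1_10 1_11 > "$dir/file1"
cat > "$dir/file2" <<'EOF'
1 1_500 1_600 0.000 1.0 0.0 0.0
1 1_500 1_500 0.000 0.0 0.0 1.0
1 1_9 1_100 0.000 0.50000 0.50000 0.00000
1 1_9 1_200 0.000 0.50000 0.50000 0.00000
1 1_9 1_400 0.000 1.0 0.0 0.0
1 1_8 1_500 2.107 0.59766 0.40234 0.00000
1 1_8 1_9 2.107 0.89431 0.10569 0.00000
1 1_8 1_300 2.107 0.0 1.0 0.0
EOF

# Once file1 is loaded, "next" means every later rule sees only file2
# records, so the extra (hypothetical) rules carry no NR!=FNR guard.
out=$(awk '
NR==FNR {s[$1]; next}
($2 in s) && ($3 in s) { print }
$4 > 2 { big++ }
END { print big+0, "rows with column 4 > 2" }
' "$dir/file1" "$dir/file2")

printf '%s\n' "$out"
rm -r "$dir"
```

On this data the three 1_8 rows have 2.107 in column 4, so the summary line reports 3.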

Regards,
	
	Ed.
Reply Ed 10/29/2006 4:30:02 PM

<OT>
Ed Morton wrote:
> awk '
> NR==FNR {s[$1]; next}
> ($2 in s) && ($3 in s) { print }
> theSkyIsGrey { ... }
> scotlandWinsWorldCup { ... }
> endOfWorldArrives { ... }
> ' file1 file2
>
> Yes, the first version will work, but I'd be surprised if anyone
> advocated doing it that way. Also, as a Scot, I suspect that second from
> last condition will unfortunately never be true....
>
> Regards,
>
> 	Ed.

Cheer up, mate. Greece won Euro 2004.
Impossible is nothing.
I hear Scotland has some team these days.
</OT>

Reply Vassilis 10/29/2006 6:07:36 PM

Ed Morton wrote:
> Janis Papanagnou wrote:
>> Ed Morton wrote:
>>> Janis Papanagnou wrote:
>>>> Ed Morton wrote:
>>>>> amrita.ray@gmail.com wrote:
>>>>>
>>>>>> Yes, the answer:
>>>>>> awk 'NR==FNR {s[$1]} NR!=FNR && ($2 in s) && ($3 in s)' file1 file2
>>>>>
>>>>> That's the wrong answer. Check the others you got.
>>>>
>>>> What's wrong with it?  In c.u.s the OP said it works.
>>>
>>> It has 2 tests instead of one so it's less efficient and more 
>>> complicated than it has to be.
>>
>> I wouldn't call that wrong, just different.
> 
> I would call it wrong because in addition to the above it's not 
> extensible. Let's say you want to do other things with the file2 
> records. Would you then do this:
> 
> awk '
> NR==FNR {s[$1]}
> NR!=FNR && ($2 in s) && ($3 in s) { print }
> NR!=FNR && theSkyIsGrey { ... }
> NR!=FNR && scotlandWinsWorldCup { ... }
> NR!=FNR && endOfWorldArrives { ... }
> ' file1 file2
> 
> Rather than this:
> 
> awk '
> NR==FNR {s[$1]; next}
> ($2 in s) && ($3 in s) { print }
> theSkyIsGrey { ... }
> scotlandWinsWorldCup { ... }
> endOfWorldArrives { ... }
> ' file1 file2

I would have done exactly the same as you _in this case_, using 'next'.

But extensibility is a multifaceted (and here perhaps academic?) argument.
If you want to extend your program _in a different way_, say...

NR==FNR {s[$1]}
NR!=FNR && ($2 in s) && ($3 in s) { print }
otherConditionForAllFiles1 { ... }
otherConditionForAllFiles2 { ... }
otherConditionForAllFilesN { ... }
{ ...}

...where the action code in the otherConditionForAllFiles<i> rules depends on
state set by either of the first two rules, say s[], 'next' would not
be helpful. (And this extension is just one example of many.)

A 'next' breaks native control flow. It's an optimization command, IMO,
as is a continue, break, or goto in other languages. And sometimes it
makes code even more readable/comprehensible/maintainable. Sometimes.
Sometimes not.
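The counterexample above can be made concrete with a catch-all rule that is meant to see every record of every file. In this sketch the hypothetical `{ n++ }` rule stands in for the otherConditionForAllFiles<i> rules (assuming a POSIX-style shell with `mktemp -d`, a standard awk, and the question's sample data minus the "....." rows). With guards it counts all 12 records (4 from file1, 8 from file2); with `next` it only ever sees the 8 file2 records.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf '%s\n' 1_8 1_9 1_10 1_11 > "$dir/file1"
cat > "$dir/file2" <<'EOF'
1 1_500 1_600 0.000 1.0 0.0 0.0
1 1_500 1_500 0.000 0.0 0.0 1.0
1 1_9 1_100 0.000 0.50000 0.50000 0.00000
1 1_9 1_200 0.000 0.50000 0.50000 0.00000
1 1_9 1_400 0.000 1.0 0.0 0.0
1 1_8 1_500 2.107 0.59766 0.40234 0.00000
1 1_8 1_9 2.107 0.89431 0.10569 0.00000
1 1_8 1_300 2.107 0.0 1.0 0.0
EOF

# Guarded style: the catch-all rule still runs for file1 records.
guarded=$(awk '
NR==FNR {s[$1]}
NR!=FNR && ($2 in s) && ($3 in s) { print }
{ n++ }
END { print n, "records seen" }
' "$dir/file1" "$dir/file2")

# next style: file1 records stop at the first rule, so the catch-all
# rule never sees them.
with_next=$(awk '
NR==FNR {s[$1]; next}
($2 in s) && ($3 in s) { print }
{ n++ }
END { print n, "records seen" }
' "$dir/file1" "$dir/file2")

printf 'guarded:\n%s\nwith next:\n%s\n' "$guarded" "$with_next"
rm -r "$dir"
```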

> Yes, the first version will work, but I'd be surprised if anyone 
> advocated doing it that way.

I'd still advocate it, since the conditions are clearer.

(Though, as I said, in one-liners like these the difference is of
little relevance.)

> Also, as a Scot, I suspect that second from 
> last condition will unfortunately never be true....

There are many ways to reach a goal; in awk as well as in football/soccer.
:-)

Janis

> Regards,
>     
>     Ed.
Reply Janis 10/29/2006 11:36:24 PM
