Concatenate fields


Greetings,

New to Awk, new to programming, learning awk as my first language. I'm
understanding it better than my attempts at Python, or Perl. However,
I've got an issue that I know is pretty easy, but I can't quite make
it work.

File format

Cat  bat
Cat  mouse
Cat  rat
Horse grass
Horse hay

etc. The first field is repeated multiple times, and the second is
unique. (with one oddball exception). The first value can be repeated
a variable number of times.

What I want, is to turn that into

Cat  bat mouse rat
Horse grass hay

It's going to be output to a tab delimited file (I know about OFS) for
import into a spreadsheet.

Thank you!


I.E, add the results of the second field to
Reply Da_Gut 10/18/2009 8:50:35 PM

Sun, 18 Oct 2009 13:50:35 -0700, Da_Gut did cat :

> Greetings,
> 
> New to Awk, new to programming, learning awk as my first language. I'm
> understanding it better than my attempts at Python, or Perl. However,
> I've got an issue that I know is pretty easy, but I can't quite make it
> work.
> 
> File format
> 
> Cat  bat
> Cat  mouse
> Cat  rat
> Horse grass
> Horse hay
> 
> etc. The first field is repeated multiple times, and the second is
> unique. (with one oddball exception). The first value can be repeated a
> variable number of times.
> 
> What I want, is to turn that into
> 
> Cat  bat mouse rat
> Horse grass hay
> 
> It's going to be output to a tab delimited file (I know about OFS) for
> import into a spreadsheet.

this should give you a start:
$ awk '{v[$1]=v[$1]OFS$2}END{for(i in v){print i OFS v[i]}}' OFS=';' yourfile

Getting rid of the doubled OFS is left as an exercise (think about the
conditional operator [x?a:b]).
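
One possible way to finish that exercise, offered only as a sketch: use the conditional operator so that OFS is added only when the key already holds a value. The helper name `join_by_key` and the sample input are assumptions made for illustration, and `-v OFS=';'` is used here in place of the trailing `OFS=';'` operand.

```shell
# Hypothetical helper: collect the second fields of each key, joined by ';'.
join_by_key() {
    awk -v OFS=';' '
    {
        # Put OFS in front of $2 only if this key already has a stored value.
        v[$1] = v[$1] (v[$1] == "" ? "" : OFS) $2
    }
    END {
        # "for (i in v)" visits the keys in an unspecified order.
        for (i in v) print i OFS v[i]
    }' "$@"
}

# Sample run (the order of the two output lines may vary between awks).
printf 'Cat bat\nCat mouse\nHorse hay\n' | join_by_key
```

Testing the stored value rather than writing `($1 in v)` on the right-hand side of the assignment avoids relying on the order in which awk evaluates the two sides.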
Reply Loki 10/18/2009 9:02:18 PM


Da_Gut wrote:

> File format
> 
> Cat  bat
> Cat  mouse
> Cat  rat
> Horse grass
> Horse hay
> 
> etc. The first field is repeated multiple times, and the second is
> unique. (with one oddball exception). The first value can be repeated
> a variable number of times.
> 
> What I want, is to turn that into
> 
> Cat  bat mouse rat
> Horse grass hay
> 
> It's going to be output to a tab delimited file (I know about OFS) for
> import into a spreadsheet.

If the lines with the same first field are consecutive:

awk '{printf "%s",($1!=p)?(p""?ORS:"") $0:OFS $2;p=$1}END{print""}' file

or maybe more clearly:

awk '$1!=p{if(a"")print a;p=a=$1}{a=a OFS $2}END{if(a"")print a}' file

if they are not (which of course also works for the previous case):

awk '{a[$1]=a[$1] OFS $2}END{for(i in a)print i a[i]}'
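
For anyone reading along who is new to awk, the "consecutive lines" version above can be spelled out roughly as follows. This is a sketch of the same logic, not a replacement for it: the names `group_consecutive`, `prev` and `line` are illustrative renames, and OFS is set to a tab on the assumption that the output is meant for the spreadsheet import mentioned in the question.

```shell
# Hypothetical helper: merge runs of consecutive lines that share field 1.
group_consecutive() {
    awk -v OFS='\t' '
    $1 != prev {                       # field 1 changed: a new group starts
        if (line != "") print line     # flush the previous group, if any
        prev = line = $1               # the new output line starts with the key
    }
    { line = line OFS $2 }             # append field 2 of every input line
    END { if (line != "") print line } # flush the last group
    ' "$@"
}

# Sample run; prints two tab-separated lines:
#   Cat    bat    mouse    rat
#   Horse  grass  hay
printf 'Cat  bat\nCat  mouse\nCat  rat\nHorse grass\nHorse hay\n' | group_consecutive
```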
Reply pk 10/18/2009 9:20:34 PM

On Oct 18, 5:20 pm, pk <p...@pk.invalid> wrote:
> Da_Gut wrote:
> > File format
>
> > Cat  bat
> > Cat  mouse
> > Cat  rat
> > Horse grass
> > Horse hay
>
> > etc. The first field is repeated multiple times, and the second is
> > unique. (with one oddball exception). The first value can be repeated
> > a variable number of times.
>
> > What I want, is to turn that into
>
> > Cat  bat mouse rat
> > Horse grass hay
>
> > It's going to be output to a tab delimited file (I know about OFS) for
> > import into a spreadsheet.
>
> If the lines with the same first field are consecutive:
>
> awk '{printf "%s",($1!=p)?(p""?ORS:"") $0:OFS $2;p=$1}END{print""}' file
>
> or maybe more clearly:
>
> awk '$1!=p{if(a"")print a;p=a=$1}{a=a OFS $2}END{if(a"")print a}' file
>
> if they are not (which of course also works for the previous case):
>
> awk '{a[$1]=a[$1] OFS $2}END{for(i in a)print i a[i]}'

Many thanks to you both. I'm presently tearing all of these apart to
understand them (hopefully anyway). What I was trying wasn't anything
like these.
Reply Da_Gut 10/19/2009 7:57:09 PM

Mon, 19 Oct 2009 12:57:09 -0700, Da_Gut did cat :

> On Oct 18, 5:20 pm, pk <p...@pk.invalid> wrote:
>> Da_Gut wrote:
>> > File format
>>
>> > Cat  bat
>> > Cat  mouse
>> > Cat  rat
>> > Horse grass
>> > Horse hay
>>
>> > etc. The first field is repeated multiple times, and the second is
>> > unique. (with one oddball exception). The first value can be repeated
>> > a variable number of times.
>>
>> > What I want, is to turn that into
>>
>> > Cat  bat mouse rat
>> > Horse grass hay
>>
>> > It's going to be output to a tab delimited file (I know about OFS) for
>> > import into a spreadsheet.
>>
>> If the lines with the same first field are consecutive:
>>
>> awk '{printf "%s",($1!=p)?(p""?ORS:"") $0:OFS $2;p=$1}END{print""}' file
>>
>> or maybe more clearly:
>>
>> awk '$1!=p{if(a"")print a;p=a=$1}{a=a OFS $2}END{if(a"")print a}' file
>>
>> if they are not (which of course also works for the previous case):
>>
>> awk '{a[$1]=a[$1] OFS $2}END{for(i in a)print i a[i]}'
> 
> Many thanks to you both. I'm presently tearing all of these apart to
> understand them (hopefully anyway).

Good :-) Don't hesitate to follow up with your
interpretation (and/or further questions if you have any doubt ;-)

> What I was trying wasn't anything
> like these.

Then maybe 'awk' now has a new friend ?-) Cheers!
Reply Loki 10/20/2009 6:01:33 PM

On Oct 18, 2:50 pm, Da_Gut <googlegro...@gutcup.com> wrote:
> Greetings,
>
> New to Awk, new to programming, learning awk as my first language. I'm
> understanding it better than my attempts at Python, or Perl. However,
> I've got an issue that I know is pretty easy, but I can't quite make
> it work.
>
> File format
>
> Cat  bat
> Cat  mouse
> Cat  rat
> Horse grass
> Horse hay
>
> etc. The first field is repeated multiple times, and the second is
> unique. (with one oddball exception). The first value can be repeated
> a variable number of times.
>
> What I want, is to turn that into
>
> Cat  bat mouse rat
> Horse grass hay
>
> It's going to be output to a tab delimited file (I know about OFS) for
> import into a spreadsheet.
>
> Thank you!
>
> I.E, add the results of the second field to

$1 != prev { if (s) print s; prev = s = $1 }
           { s = s OFS $2 }
END        { print s }

Reply w_a_x_man 10/25/2009 1:00:56 PM

In article <3010269.cnM708Kxvf@xkzjympik>, pk  <pk@pk.invalid> wrote:
....
>if they are not (which of course also works for the previous case):
>
>awk '{a[$1]=a[$1] OFS $2}END{for(i in a)print i a[i]}'

Yes.  I think of all the solutions, the array-based one is best.
Especially, since it handles the general case (amalgamate all references
to the key - regardless of ordering in the file).

2 notes/nitpicks about the above code:
1) The above works because you concatenate i with a[i] at the end,
    thereby consuming the excess OFS at the beginning of the array
    elements.  Which is OK if your only intended purpose is to print it
    out.  However, most of the time, what you really want is for the
    data in the array to "be correct" - and that takes some additional
    programming (in the usual use case).
2) The use of "for (i in a)" at the end is suspect, as it will print out
    the results in "random" order (*).  Usually, you will want them in
    some specified order.

(*) Usual caveat: Unless you are using TAWK or are a sufficiently whiny user.
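
To make both nitpicks concrete, here is a rough sketch that keeps the array values free of a leading OFS and prints the keys in the order they first appear in the input. The names `group_in_input_order`, `keys`, `vals` and `n` are assumptions chosen for this example, and first-seen order is only one of several orderings one might want.

```shell
# Hypothetical helper: group by field 1, preserving first-seen key order.
group_in_input_order() {
    awk -v OFS='\t' '
    # New key: remember its position and store the first value with no separator.
    !($1 in vals) { keys[++n] = $1; vals[$1] = $2; next }
    # Known key: append the next value after OFS.
    { vals[$1] = vals[$1] OFS $2 }
    # Walk the keys by their recorded position instead of using "for (i in ...)".
    END { for (k = 1; k <= n; k++) print keys[k], vals[keys[k]] }
    ' "$@"
}

# Sample run with interleaved keys; Horse is printed first because it appears first.
printf 'Horse grass\nCat bat\nHorse hay\nCat mouse\n' | group_in_input_order
```

If sorted output is wanted instead, piping the result of any of the array-based versions through `sort` is a simple alternative.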

Reply gazelle 10/25/2009 1:46:07 PM

Kenny McCormack wrote:

> In article <3010269.cnM708Kxvf@xkzjympik>, pk  <pk@pk.invalid> wrote:
> ...
>>if they are not (which of course also works for the previous case):
>>
>>awk '{a[$1]=a[$1] OFS $2}END{for(i in a)print i a[i]}'
> 
> Yes.  I think of all the solutions, the array-based one is best.
> Especially, since it handles the general case (amalgamate all references
> to the key - regardless of ordering in the file).
> 
> 2 notes/nitpicks about the above code:
> 1) The above works because you concatenate i with a[i] at the end,
>     thereby consuming the excess OFS at the beginning of the array
>     elements.  Which is OK if your only intended purpose is to print it
>     out.  However, most of the time, what you really want is for the
>     data in the array to "be correct" - and that takes some additional
>     programming (in the usual use case).

In this case it was intended. The first version of the code was something 
along the usual pattern of

awk '{a[$1]=a[$1] sep[$1] $2; sep[$1]=OFS} END{for(i in a)print i OFS a[i]}'

but then I soon realized that I could shorten it and get the same result by 
doing what I finally posted. It's true that it's not a general solution, but 
it looked OK to use it in this case.
Of course, given more information (for example: is this throwaway code, or 
is it part of a larger script, that should possibly handle slightly 
different variations in the input?) it would be possible to help the OP 
better. Maybe he just didn't realize that the more information about the 
context is provided, the better help you can get; or maybe he purposely kept 
it to a minimum to just get some suggestions and figure out the rest by 
himself. Whatever the reason, I think that the code I posted was appropriate 
given the information provided. And if not, he could always post back to 
explain how and why it is not, and provide more information.

> 2) The use of "for (i in a)" at the end is suspect, as it will print out
>     the results in "random" order (*).  Usually, you will want them in
>     some specified order.

Same as above. Lacking more information, I went for the simplest solution, 
but if that's not OK the OP could always post back.
Reply pk 10/25/2009 7:49:00 PM
