Greetings,
New to awk, new to programming, learning awk as my first language. I'm
understanding it better than my attempts at Python or Perl. However,
I've got an issue that I know is pretty easy, but I can't quite make
it work.
File format
Cat bat
Cat mouse
Cat rat
Horse grass
Horse hay
etc. The first field is repeated multiple times, and the second is
unique (with one oddball exception). The first value can be repeated
a variable number of times.
What I want is to turn that into:
Cat bat mouse rat
Horse grass hay
It's going to be output to a tab-delimited file (I know about OFS) for
import into a spreadsheet.
Thank you!
I.e., add the results of the second field to
Da_Gut
10/18/2009 8:50:35 PM
Sun, 18 Oct 2009 13:50:35 -0700, Da_Gut did cat :
> Greetings,
>
> New to awk, new to programming, learning awk as my first language. I'm
> understanding it better than my attempts at Python or Perl. However,
> I've got an issue that I know is pretty easy, but I can't quite make it
> work.
>
> File format
>
> Cat bat
> Cat mouse
> Cat rat
> Horse grass
> Horse hay
>
> etc. The first field is repeated multiple times, and the second is
> unique (with one oddball exception). The first value can be repeated a
> variable number of times.
>
> What I want is to turn that into:
>
> Cat bat mouse rat
> Horse grass hay
>
> It's going to be output to a tab-delimited file (I know about OFS) for
> import into a spreadsheet.
This should give you a start:
$ awk '{v[$1]=v[$1]OFS$2}END{for(i in v){print i OFS v[i]}}' OFS=';' yourfile
Getting rid of the doubled OFS is left as an exercise (think about a
conditional assignment [x?a:b]).
Loki
10/18/2009 9:02:18 PM
Da_Gut wrote:
> File format
>
> Cat bat
> Cat mouse
> Cat rat
> Horse grass
> Horse hay
>
> etc. The first field is repeated multiple times, and the second is
> unique (with one oddball exception). The first value can be repeated
> a variable number of times.
>
> What I want is to turn that into:
>
> Cat bat mouse rat
> Horse grass hay
>
> It's going to be output to a tab-delimited file (I know about OFS) for
> import into a spreadsheet.
If the lines with the same first field are consecutive:
awk '{printf "%s",($1!=p)?(p""?ORS:"") $0:OFS $2;p=$1}END{print""}' file
or maybe more clearly:
awk '$1!=p{if(a"")print a;p=a=$1}{a=a OFS $2}END{if(a"")print a}' file
if they are not (which of course also works for the previous case):
awk '{a[$1]=a[$1] OFS $2}END{for(i in a)print i a[i]}'
pk
10/18/2009 9:20:34 PM
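A caveat when applying pk's first one-liner to the OP's tab-delimited goal: it prints `$0` unchanged for the first line of each group, so the key and the first value keep the input's original separator (a space here), and only the later values are joined with OFS. The second form builds every separator from OFS. Below is a sketch of that second form wrapped in a hypothetical shell function `join_consecutive`, with `-v OFS='\t'` assumed for tab output; pk's `a""` tests are written as `a != ""`, which tests the same thing (non-empty). The second run shows the limitation pk mentions: when equal keys are not adjacent, each run of them gets its own line.

```shell
#!/bin/sh
# join_consecutive: hypothetical wrapper around pk's second one-liner.
# Assumes lines with the same first field are adjacent in the input.
join_consecutive() {
  awk -v OFS='\t' '
    # Key changed: flush the previous group (if any) and start a new one.
    $1 != p { if (a != "") print a; p = a = $1 }
    # Every line appends its second field, preceded by OFS (a tab here).
    { a = a OFS $2 }
    # Flush the last group.
    END { if (a != "") print a }' "$@"
}

# Grouped input: one line per key.
printf 'Cat bat\nCat mouse\nCat rat\nHorse grass\nHorse hay\n' | join_consecutive
# Non-adjacent keys: Cat appears on two separate lines.
printf 'Cat bat\nHorse hay\nCat rat\n' | join_consecutive
```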
On Oct 18, 5:20 pm, pk <p...@pk.invalid> wrote:
> Da_Gut wrote:
> > File format
>
> > Cat bat
> > Cat mouse
> > Cat rat
> > Horse grass
> > Horse hay
>
> > etc. The first field is repeated multiple times, and the second is
> > unique (with one oddball exception). The first value can be repeated
> > a variable number of times.
>
> > What I want is to turn that into:
>
> > Cat bat mouse rat
> > Horse grass hay
>
> > It's going to be output to a tab-delimited file (I know about OFS) for
> > import into a spreadsheet.
>
> If the lines with the same first field are consecutive:
>
> awk '{printf "%s",($1!=p)?(p""?ORS:"") $0:OFS $2;p=$1}END{print""}' file
>
> or maybe more clearly:
>
> awk '$1!=p{if(a"")print a;p=a=$1}{a=a OFS $2}END{if(a"")print a}' file
>
> if they are not (which of course also works for the previous case):
>
> awk '{a[$1]=a[$1] OFS $2}END{for(i in a)print i a[i]}'
Many thanks to you both. I'm presently tearing all of these apart to
understand them (hopefully anyway). What I was trying wasn't anything
like these.
Da_Gut
10/19/2009 7:57:09 PM
Mon, 19 Oct 2009 12:57:09 -0700, Da_Gut did cat :
> On Oct 18, 5:20 pm, pk <p...@pk.invalid> wrote:
>> Da_Gut wrote:
>> > File format
>>
>> > Cat bat
>> > Cat mouse
>> > Cat rat
>> > Horse grass
>> > Horse hay
>>
>> > etc. The first field is repeated multiple times, and the second is
>> > unique (with one oddball exception). The first value can be repeated
>> > a variable number of times.
>>
>> > What I want is to turn that into:
>>
>> > Cat bat mouse rat
>> > Horse grass hay
>>
>> > It's going to be output to a tab-delimited file (I know about OFS) for
>> > import into a spreadsheet.
>>
>> If the lines with the same first field are consecutive:
>>
>> awk '{printf "%s",($1!=p)?(p""?ORS:"") $0:OFS $2;p=$1}END{print""}' file
>>
>> or maybe more clearly:
>>
>> awk '$1!=p{if(a"")print a;p=a=$1}{a=a OFS $2}END{if(a"")print a}' file
>>
>> if they are not (which of course also works for the previous case):
>>
>> awk '{a[$1]=a[$1] OFS $2}END{for(i in a)print i a[i]}'
>
> Many thanks to you both. I'm presently tearing all of these apart to
> understand them (hopefully anyway).
Good :-) Don't hesitate to follow up with your
interpretation (and/or further questions if anything is unclear ;-)
> What I was trying wasn't anything
> like these.
Then maybe 'awk' now has a new friend ?-) Cheers!
Loki
10/20/2009 6:01:33 PM
On Oct 18, 2:50 pm, Da_Gut <googlegro...@gutcup.com> wrote:
> Greetings,
>
> New to awk, new to programming, learning awk as my first language. I'm
> understanding it better than my attempts at Python or Perl. However,
> I've got an issue that I know is pretty easy, but I can't quite make
> it work.
>
> File format
>
> Cat bat
> Cat mouse
> Cat rat
> Horse grass
> Horse hay
>
> etc. The first field is repeated multiple times, and the second is
> unique (with one oddball exception). The first value can be repeated
> a variable number of times.
>
> What I want is to turn that into:
>
> Cat bat mouse rat
> Horse grass hay
>
> It's going to be output to a tab-delimited file (I know about OFS) for
> import into a spreadsheet.
>
> Thank you!
>
> I.e., add the results of the second field to
$1 != prev { if (s) print s; prev = s = $1 }
{ s = s OFS $2 }
END { print s }
w_a_x_man
10/25/2009 1:00:56 PM
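w_a_x_man's three rules form a standalone awk program, meant to be saved to a file and run with `awk -f`. A sketch of that follows, assuming a temporary file from `mktemp` (any file name would do) and `-v OFS='\t'` for the tab-delimited output; the variable names `prog` and `result` are illustrative. One small change from the post, labeled in the code: the END rule is guarded with `if (s)`, as in pk's version, so empty input does not produce a stray blank line.

```shell
#!/bin/sh
# Write the awk program to a temporary file (mktemp is an assumption;
# a fixed name such as group.awk works just as well).
prog=$(mktemp) || exit 1
cat > "$prog" <<'EOF'
# New key: print the finished line for the previous key, start a new one.
$1 != prev { if (s) print s; prev = s = $1 }
# Append the value for the current key, separated by OFS.
{ s = s OFS $2 }
# Change from the post: guard so empty input prints nothing.
END { if (s) print s }
EOF

result=$(printf 'Cat bat\nCat mouse\nCat rat\nHorse grass\nHorse hay\n' |
  awk -v OFS='\t' -f "$prog")
rm -f "$prog"
printf '%s\n' "$result"
```

Against a real file this becomes `awk -v OFS='\t' -f group.awk yourfile > out.tsv`, where `group.awk` and `out.tsv` are placeholder names. Like pk's first two answers, it assumes equal keys are adjacent.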
In article <3010269.cnM708Kxvf@xkzjympik>, pk <pk@pk.invalid> wrote:
....
>if they are not (which of course also works for the previous case):
>
>awk '{a[$1]=a[$1] OFS $2}END{for(i in a)print i a[i]}'
Yes. Of all the solutions, I think the array-based one is best,
especially since it handles the general case (amalgamating all references
to the key, regardless of their order in the file).
2 notes/nitpicks about the above code:
1) The above works because you concatenate i directly with a[i] at the end,
thereby consuming the excess OFS at the beginning of each array
element as the key/value separator. That is OK if your only purpose is
to print the data out. However, most of the time, what you really want is
for the data in the array to "be correct", and that takes some additional
programming (in the usual use case).
2) The use of "for (i in a)" at the end is suspect, as it will print out
the results in "random" order (*). Usually, you will want them in
some specified order.
(*) Usual caveat: Unless you are using TAWK or are a sufficiently whiny user.
gazelle
10/25/2009 1:46:07 PM
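A sketch addressing both of Kenny's points, assuming that "order of first appearance in the file" is the wanted order (the thread does not say which order the OP needs). A second array, `order`, records each key the first time it is seen, and `a[k]` holds only the values, with no leading OFS, so the array contents are usable for more than printing. The function name `join_in_order` is hypothetical, and tab output is assumed.

```shell
#!/bin/sh
# join_in_order: hypothetical helper, not from the thread.
# Groups values by key regardless of input order, and prints keys
# in the order they first appear.
join_in_order() {
  awk -v OFS='\t' '
    {
      if (!($1 in a)) {
        order[++n] = $1     # remember first-seen order of keys
        a[$1] = $2          # first value: stored without a separator
      } else {
        a[$1] = a[$1] OFS $2
      }
    }
    END {
      # Numeric loop over order[] instead of "for (k in a)",
      # so the output order is deterministic.
      for (i = 1; i <= n; i++) print order[i], a[order[i]]
    }' "$@"
}

# Keys interleaved on purpose: Horse is seen first, so it is printed first.
printf 'Horse grass\nCat bat\nHorse hay\nCat mouse\n' | join_in_order
```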
Kenny McCormack wrote:
> In article <3010269.cnM708Kxvf@xkzjympik>, pk <pk@pk.invalid> wrote:
> ...
>>if they are not (which of course also works for the previous case):
>>
>>awk '{a[$1]=a[$1] OFS $2}END{for(i in a)print i a[i]}'
>
> Yes. Of all the solutions, I think the array-based one is best,
> especially since it handles the general case (amalgamating all references
> to the key, regardless of their order in the file).
>
> 2 notes/nitpicks about the above code:
> 1) The above works because you concatenate i directly with a[i] at the end,
> thereby consuming the excess OFS at the beginning of each array
> element as the key/value separator. That is OK if your only purpose is
> to print the data out. However, most of the time, what you really want is
> for the data in the array to "be correct", and that takes some additional
> programming (in the usual use case).
In this case it was intended. The first version of my code followed
the usual pattern of
awk '{a[$1]=a[$1] sep[$1] $2; sep[$1]=OFS} END{for(i in a)print i OFS a[i]}'
but then I soon realized that I could shorten it and get the same result by
doing what I finally posted. It's true that it's not a general solution, but
it seemed fine for this case.
Of course, given more information (for example: is this throwaway code, or
is it part of a larger script that should possibly handle slightly
different variations in the input?) it would be possible to help the OP
better. Maybe he just didn't realize that the more context he provides,
the better the help he can get; or maybe he purposely kept it to a minimum
to just get some suggestions and figure out the rest by
himself. Whatever the reason, I think that the code I posted was appropriate
given the information provided. And if not, he could always post back to
explain how and why it is not, and provide more information.
> 2) The use of "for (i in a)" at the end is suspect, as it will print out
> the results in "random" order (*). Usually, you will want them in
> some specified order.
Same as above. Lacking more information, I went for the simplest solution,
but if that's not OK the OP could always post back.
pk
10/25/2009 7:49:00 PM
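pk's original `sep[]` pattern from the last post keeps the array free of a leading separator, which answers Kenny's first point. Below is a sketch of it wrapped in a hypothetical shell function `join_with_sep`, assuming `-v OFS='\t'` for tab output. Piping through `sort` is one simple way to get a predictable (alphabetical) order out of `for (i in a)`, which covers the second point when that order is acceptable.

```shell
#!/bin/sh
# join_with_sep: hypothetical wrapper around pk's sep[] one-liner.
join_with_sep() {
  awk -v OFS='\t' '
    {
      a[$1] = a[$1] sep[$1] $2   # sep[$1] is empty the first time ...
      sep[$1] = OFS              # ... and OFS for every later value
    }
    END { for (i in a) print i OFS a[i] }' "$@"
}

# for (i in a) order is unspecified; sort makes the output predictable.
printf 'Cat bat\nHorse grass\nCat mouse\nHorse hay\nCat rat\n' | join_with_sep | sort
```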