AWK command inside the shell script

Hi,

The following awk command works fine if I use it separately. It
prints 3 columns.

awk 'BEGIN{FS="\t"} NR==FNR{a[$1]=$2;next}
  {if($1 in a) {printf("%s\t%s\t%s\n", $1,$1,a[$1])}
   else {printf("%s\t%s\tNA\n", $1,$1)}}' file1 file2 > res.txt

I would like to run this command for 100 columns of file 1. So, I
tried:

#! /bin/bash
for i in {2..100}
do
        awk 'BEGIN{FS="\t"}NR==FNR{a[$1]=$i;next} {if($1 in a)
{printf("%s\t%s\t%s\n", $1,$1,a[$1])} else {printf("%s\t%s\tNA\n",
$1,$1)}}' file1 file2 > res$i.txt
done

It prints all the columns of file2 twice (I don't know why).  Could you
please help me fix this? I have used shell since I have to pass
res*.txt to another program as an input inside the loop.

Thanks in advance.

Kind regards,
Ezhil
Reply ezhil 12/16/2010 3:54:18 AM

Step 1: make it readable

#! /bin/bash
for i in {2..100}
do
         awk '
	BEGIN{FS="\t"}
	NR==FNR {
	    a[$1]=$i    #<<<< NOTE
	    next
	}
	{   if($1 in a) {
		printf("%s\t%s\t%s\n", $1,$1,a[$1])
	    } else {
		printf("%s\t%s\tNA\n", $1,$1)
	    }
	}' file1 file2 > res$i.txt
done

Looks like you're trying to access the contents of the shell variable
"i" above, but inside the single quotes you're actually accessing the awk
variable "i", which is unset and so evaluates to zero, so this:

	a[$1]=$i

Is the same as this:

	a[$1]=$0

To access the value of a shell variable inside an awk script you do this:

	awk -v awki="$shelli" '...awki...'

rather than:

	awk '...$shelli...'
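
To see the difference for yourself, here is a minimal sketch. The sample line and the variable names unset_result and passed_result are made up for illustration; only the -v mechanism comes from the advice above.

```shell
#!/bin/sh
# One tab-separated sample record (hypothetical data, three fields).
line=$(printf 'k1\tx\ty')
i=3

# Inside single quotes the shell never expands $i. awk sees its own
# variable i, which is unset, so $i is evaluated as $0 (the whole record).
unset_result=$(printf '%s\n' "$line" | awk -F'\t' '{ print $i }')

# With -v the shell value is copied into the awk variable i before the
# program runs, so $i now means field 3.
passed_result=$(printf '%s\n' "$line" | awk -F'\t' -v i="$i" '{ print $i }')

printf 'without -v: %s\n' "$unset_result"
printf 'with -v:    %s\n' "$passed_result"
```

The first call prints the whole record, the second prints only the third field.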

You can also simplify your printf so I think the above should really be:

#! /bin/bash
for i in {2..100}
do
     awk -F'\t' -v i="$i" '
	NR==FNR { a[$1]=$i; next }
	{ printf "%s\t%s\t%s\n", $1,$1,($1 in a ? a[$1] : "NA") }
     ' file1 file2 > res$i.txt
done

Try that and if it's still not doing what you want (I'm not sure what 
that is!) then post a followup. I'm sure you don't need that external 
shell loop either but since I can't figure out what you're really trying 
to do I hesitate to suggest a better way within the awk script.
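
For what it's worth, one possible way to drop the shell loop is sketched below. It is only a guess at the intent: it assumes every column of file1 from 2 up to the widest row should produce its own res<N>.txt (rather than a hard-coded 2..100), and the sample files and the array names seen, val and keys are invented for the example. A step that must run once per output file would still need a separate loop afterwards.

```shell
#!/bin/sh
# Hypothetical sample data standing in for file1 and file2 (tab-separated).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'k1\ta\tb\nk2\tc\td\n' > file1
printf 'k1\nk3\n' > file2

awk -F'\t' '
    NR==FNR {                       # first file: remember every column per key
        seen[$1]
        for (c = 2; c <= NF; c++) val[$1, c] = $c
        if (NF > ncols) ncols = NF
        next
    }
    { keys[++n] = $1 }              # second file: remember the key order
    END {
        for (c = 2; c <= ncols; c++) {
            out = "res" c ".txt"
            for (k = 1; k <= n; k++) {
                key = keys[k]
                printf "%s\t%s\t%s\n", key, key, (key in seen ? val[key, c] : "NA") > out
            }
            close(out)              # one output file open at a time
        }
    }
' file1 file2
```

This reads file1 once instead of 99 times, at the cost of holding all of its columns in memory.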

	Ed.
Reply Ed 12/16/2010 4:15:54 AM


On Dec 16, 4:15 am, Ed Morton <mortons...@gmail.com> wrote:

Brilliant, it works fine. Thanks a lot Ed. I am passing res$i.txt as
an input to another program called eig3.

#! /bin/bash
for i in {2..100}
do
      awk -F'\t' -v i="$i" '
         NR==FNR { a[$1]=$i; next }
         { printf "%s\t%s\t%s\n", $1,$1,($1 in a ? a[$1] : "NA") }
      ' file1 file2 > res$i.txt

      eig3 -d 10 cons.txt - p res$i.txt -o out$i.txt
done

The out$i.txt file is the output generated by 'eig3', which has 3
columns. The first column is the same in all 99 files. I would like to
make a final output file that has the first column (which is common)
followed by the other 2 columns from each of the 99 files. Could you
please suggest how to do this?  I have tried 'cat out$i.txt >> final.txt',
knowing this doesn't do what I want.

Thanks a lot for your time.

Kind regards,
Ezhil

Reply ezhil 12/16/2010 5:16:30 AM

Maybe something like this:

awk 'NR==FNR { a[FNR]=$1 } { a[FNR]=a[FNR]"\t"$2"\t"$3 }
      END { for (i=1;i<=FNR;i++) print a[i] }' out*.txt > final.txt

called after the shell loop is what you want.
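
A small worked sketch of that merge, with made-up sample files in place of the real eig3 output. One thing to watch: a glob such as out*.txt expands in lexical order (out10.txt sorts before out2.txt), so this version builds the file list in numeric order instead. That ordering, the sample data and the names row and nrows are assumptions, not part of the suggestion above.

```shell
#!/bin/sh
# Hypothetical stand-ins for three of the out<N>.txt files (3 columns each).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'id1\t1\t2\nid2\t3\t4\n'    > out2.txt
printf 'id1\t5\t6\nid2\t7\t8\n'    > out3.txt
printf 'id1\t9\t10\nid2\t11\t12\n' > out10.txt

# Collect the file names in numeric order as positional parameters.
# In the real script the list would run over 2..100 instead of "2 3 10".
set --
for i in 2 3 10; do
    set -- "$@" "out$i.txt"
done

awk '
    NR==FNR { row[FNR] = $1; nrows = FNR }      # first file: common column
    { row[FNR] = row[FNR] "\t" $2 "\t" $3 }     # every file: append columns 2 and 3
    END { for (r = 1; r <= nrows; r++) print row[r] }
' "$@" > final.txt

cat final.txt
```

This relies on every out file having the same rows in the same order, as described in the question.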

	Ed.
Reply Ed 12/16/2010 9:30:00 AM

In addition to the solutions others have given, my advice is to
always put non-trivial awk code in a file, say 'mycode.awk', and then
invoke awk as:

awk -f mycode.awk ...

This avoids a lot of shell quoting problems.
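
As a rough illustration, using the program already posted in this thread: the sample file1 and file2 below are invented, and the here-document is only there so the example is self-contained. Normally mycode.awk would simply be created in an editor.

```shell
#!/bin/sh
# Hypothetical sample data standing in for file1 and file2 (tab-separated).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'k1\ta\tb\nk2\tc\td\n' > file1
printf 'k1\nk3\n' > file2

# Store the awk program in its own file. The quoted delimiter ('EOF')
# stops the shell from expanding $1, $i and friends while writing it.
cat > mycode.awk <<'EOF'
BEGIN { FS = "\t" }
NR == FNR { a[$1] = $i; next }
{ printf "%s\t%s\t%s\n", $1, $1, ($1 in a ? a[$1] : "NA") }
EOF

# The shell value still goes in with -v; nothing in the program needs quoting.
i=3
awk -v i="$i" -f mycode.awk file1 file2 > "res$i.txt"

cat "res$i.txt"
```

Since the program never passes through the shell's quoting, the only shell expansion left on the command line is "$i".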

Regards.
-- 
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado

Reply Manuel 12/16/2010 10:16:50 AM
