Remove values from columns in txt file with awk

  • Follow


Hi

I have a file which has a number of columns (both the number rows and
columns can vary from file to file) as follows.


Elapsed(ms)	Ratio(11)	Ratio(110)	Ratio(111)	Ratio(112)
19185		0.420455	0		0		0
566480		0.428044	0		0		0
600226		0.438697	0		0		0
633620		0.436214	0		0		0
664160		0.428571	0		0		0
698094		0.426641	0		0		0
729617		0.424603	0		0		0
760946		0.423459	0.394673	0.491124	0.505263
792959		0.424658	0.396277	0.490946	0
821160		0.42126		0.390745	0.491159	0.508621
856719		0.420432	0.441527	0		0.503171
889029		0.42998		2.26506		0.5		0.51693
924815		0.424		2.215385	0.486022	0.506849
956986		0.42515		1.481481	0.491736	0.527845
990435		0.432		1.096774	0.491632	0.516908
1024625		0		1.010753	0.488753	0.51773
1058423		0.428571	0.954545	0.488613	0.513317
1094587		0.429167	1.054054	0.486258	0.513854
1125157		0.430962	0.811765	0.487912	0.514139
1160837		0.427948	0.663366	0.489933	0.5
1196926		0.424628	0.617647	0.493274	0.503836
1227740		0.421268	0.546296	0.483193	0.49005
1261482		0.420635	0.468254	0.491416	0.493703
1296704		0.422179	0.461538	0.486081	0.479419
1331305		0.423625	0.460317	0.48866		0.49005
1360964		0.42418		0.454545	0.486708	0.488889
1398889		0.424547	0.438017	0.480435	0.481013
1430577		0.424051	0.438596	0.481153	0.486005
1466249		0.420382	0.430894	0.476395	0.476998
1499169		0.432548	0.439024	0.477322	0.47619
1535165		0.422078	0.41791		0.478555	0.472036
1574422		0.411236	0.404762	0.475983	0.472906

In columns 3, 4 and 5 you can see that the first 7 values are 0. (I
can make this 0.00 if it makes this easier).
I want to remove this block of zeros from any column and move all the
values below it, up. I'm sychronizing the data by removing those
leading zeros. In some columns there may be one leading 0, in other
columns there could be a block of 7 or 70 leading zeros. But once I
hit a number greater than zero any zero further down must remain. It's
only leading zeros, not ALL zeros I wanna remove.

So the above data could look like this:

Elapsed(ms)	Ratio(11)	Ratio(110)	Ratio(111)	Ratio(112)
19185		0.420455	0.394673	0.491124	0.505263
566480		0.428044	0.396277	0.490946	0
600226		0.438697	0.390745	0.491159	0.508621
633620		0.436214	0.441527	0		0.503171
664160		0.428571	2.26506		0.5		0.51693
698094		0.426641	2.215385	0.486022	0.506849
729617		0.424603	1.481481	0.491736	0.527845
760946		0.423459	1.096774	0.491632	0.516908
792959		0.424658	1.010753	0.488753	0.51773
821160		0.42126		0.954545	0.488613	0.513317
856719		0.420432	1.054054	0.486258	0.513854
889029		0.42998		0.811765	0.487912	0.514139
924815		0.424		0.663366	0.489933	0.5
956986		0.42515		0.617647	0.493274	0.503836
990435		0.432		0.546296	0.483193	0.49005
1024625		0		0.468254	0.491416	0.493703
1058423		0.428571	0.461538	0.486081	0.479419
1094587		0.429167	0.460317	0.48866		0.49005
1125157		0.430962	0.454545	0.486708	0.488889
1160837		0.427948	0.438017	0.480435	0.481013
1196926		0.424628	0.438596	0.481153	0.486005
1227740		0.421268	0.430894	0.476395	0.476998
1261482		0.420635	0.439024	0.477322	0.47619
1296704		0.422179	0.41791		0.478555	0.472036
1331305		0.423625	0.404762	0.475983	0.472906
1360964		0.42418			
1398889		0.424547			
1430577		0.424051			
1466249		0.420382			
1499169		0.432548			
1535165		0.422078			
1574422		0.411236			

Usually I have about 200 columns and 100 rows. The first 15 or so
columns are usually fine but by column 20 the zero values start
showing up. This is data for light intensities from cells ... and not
all light up at the same time. So removing leading zeros allows me to
sychronize columns.

Anyone have an idea to do this in awk ... I'd appreciate the help.

Thanx

Roger
0
Reply spacemancw 4/8/2004 6:07:28 PM

Since your dataset is so small, this an in memory solution.

	{ for(col=1;col<=NF;col++) {
		if($col!=0) data[NR,current[col]++] = $NF
		else if(current[col]>0) data[NR,current[col]++] = $NF
		}
		records = NR
		if(fields<NF) fields = NF
	}
END	{ for(row=1;row<=records;row++) {
		for(col=1;col<=fields;col++) {
			if(row<current[col])printf("%g ",data[row,col])
			}
		printf("\n")
		}
	)
Basically keep track of the current maximum number of values per column. 
If the input datum in this column is not zero, put the datum in the 
doubly subscripted array data and increment the count for this column. 
If the input datum is zero, then if the number of elements in this 
column already is greater than 0, put the datum in the doubly 
subscripted array data and increment the count for this column.

After all the records have been read, for each possible field in each 
record, output the corresponding element of data, if we have not gone 
beyond the number of data values for this column. This solution does not 
guarantee that the column k<l implies data(column k) > data(column l)!

The usual caveats apply; this off-the-cuff, untested code.
spacemancw wrote:
> Hi
> 
> I have a file which has a number of columns (both the number rows and
> columns can vary from file to file) as follows.
> 
> 
> Elapsed(ms)	Ratio(11)	Ratio(110)	Ratio(111)	Ratio(112)
> 19185		0.420455	0		0		0
> 566480		0.428044	0		0		0
> 600226		0.438697	0		0		0
> 633620		0.436214	0		0		0
> 664160		0.428571	0		0		0
> 698094		0.426641	0		0		0
> 729617		0.424603	0		0		0
> 760946		0.423459	0.394673	0.491124	0.505263
> 792959		0.424658	0.396277	0.490946	0
> 821160		0.42126		0.390745	0.491159	0.508621
> 856719		0.420432	0.441527	0		0.503171
> 889029		0.42998		2.26506		0.5		0.51693
> 924815		0.424		2.215385	0.486022	0.506849
> 956986		0.42515		1.481481	0.491736	0.527845
> 990435		0.432		1.096774	0.491632	0.516908
> 1024625		0		1.010753	0.488753	0.51773
> 1058423		0.428571	0.954545	0.488613	0.513317
> 1094587		0.429167	1.054054	0.486258	0.513854
> 1125157		0.430962	0.811765	0.487912	0.514139
> 1160837		0.427948	0.663366	0.489933	0.5
> 1196926		0.424628	0.617647	0.493274	0.503836
> 1227740		0.421268	0.546296	0.483193	0.49005
> 1261482		0.420635	0.468254	0.491416	0.493703
> 1296704		0.422179	0.461538	0.486081	0.479419
> 1331305		0.423625	0.460317	0.48866		0.49005
> 1360964		0.42418		0.454545	0.486708	0.488889
> 1398889		0.424547	0.438017	0.480435	0.481013
> 1430577		0.424051	0.438596	0.481153	0.486005
> 1466249		0.420382	0.430894	0.476395	0.476998
> 1499169		0.432548	0.439024	0.477322	0.47619
> 1535165		0.422078	0.41791		0.478555	0.472036
> 1574422		0.411236	0.404762	0.475983	0.472906
> 
> In columns 3, 4 and 5 you can see that the first 7 values are 0. (I
> can make this 0.00 if it makes this easier).
> I want to remove this block of zeros from any column and move all the
> values below it, up. I'm sychronizing the data by removing those
> leading zeros. In some columns there may be one leading 0, in other
> columns there could be a block of 7 or 70 leading zeros. But once I
> hit a number greater than zero any zero further down must remain. It's
> only leading zeros, not ALL zeros I wanna remove.
> 
> So the above data could look like this:
> 
> Elapsed(ms)	Ratio(11)	Ratio(110)	Ratio(111)	Ratio(112)
> 19185		0.420455	0.394673	0.491124	0.505263
> 566480		0.428044	0.396277	0.490946	0
> 600226		0.438697	0.390745	0.491159	0.508621
> 633620		0.436214	0.441527	0		0.503171
> 664160		0.428571	2.26506		0.5		0.51693
> 698094		0.426641	2.215385	0.486022	0.506849
> 729617		0.424603	1.481481	0.491736	0.527845
> 760946		0.423459	1.096774	0.491632	0.516908
> 792959		0.424658	1.010753	0.488753	0.51773
> 821160		0.42126		0.954545	0.488613	0.513317
> 856719		0.420432	1.054054	0.486258	0.513854
> 889029		0.42998		0.811765	0.487912	0.514139
> 924815		0.424		0.663366	0.489933	0.5
> 956986		0.42515		0.617647	0.493274	0.503836
> 990435		0.432		0.546296	0.483193	0.49005
> 1024625		0		0.468254	0.491416	0.493703
> 1058423		0.428571	0.461538	0.486081	0.479419
> 1094587		0.429167	0.460317	0.48866		0.49005
> 1125157		0.430962	0.454545	0.486708	0.488889
> 1160837		0.427948	0.438017	0.480435	0.481013
> 1196926		0.424628	0.438596	0.481153	0.486005
> 1227740		0.421268	0.430894	0.476395	0.476998
> 1261482		0.420635	0.439024	0.477322	0.47619
> 1296704		0.422179	0.41791		0.478555	0.472036
> 1331305		0.423625	0.404762	0.475983	0.472906
> 1360964		0.42418			
> 1398889		0.424547			
> 1430577		0.424051			
> 1466249		0.420382			
> 1499169		0.432548			
> 1535165		0.422078			
> 1574422		0.411236			
> 
> Usually I have about 200 columns and 100 rows. The first 15 or so
> columns are usually fine but by column 20 the zero values start
> showing up. This is data for light intensities from cells ... and not
> all light up at the same time. So removing leading zeros allows me to
> sychronize columns.
> 
> Anyone have an idea to do this in awk ... I'd appreciate the help.
> 
> Thanx
> 
> Roger

0
Reply Robert 4/8/2004 7:21:01 PM



spacemancw wrote:
> Hi
> 
> I have a file which has a number of columns (both the number rows and
> columns can vary from file to file) as follows.
> 
> 
> Elapsed(ms)	Ratio(11)	Ratio(110)	Ratio(111)	Ratio(112)
> 19185		0.420455	0		0		0
> 566480		0.428044	0		0		0
> 600226		0.438697	0		0		0
> 633620		0.436214	0		0		0
> 664160		0.428571	0		0		0
> 698094		0.426641	0		0		0
> 729617		0.424603	0		0		0
> 760946		0.423459	0.394673	0.491124	0.505263
> 792959		0.424658	0.396277	0.490946	0
<snip>
I posted this yesterday but apparently it never made it out of my news 
server. Appologies if you see it twice.

Something like this should work:

gawk 'BEGIN{OFS="\t"}
     NR == 1 {
         for (i=1; i<=NF; i++) {
                 line[NR,i] = $i
         }
         next
     }
     {
         for (i=1; i<=NF; i++) {
                 if (($i != 0) && (! (i in start))) {
                         start[i] = NR
                 }
                 if (i in start) {
                         idx = 2 + NR - start[i]
                         line[idx,i] = $i
                 }
         }
     }
     END {
         for (i=1; i<=NR; i++) {
                 for (j=1; j<=NF; j++) {
                         $j = line[i,j]
                 }
                 print
         }
     }'

Regards,

     Ed.

0
Reply Ed 4/9/2004 12:51:32 PM

Thanx much for the reply. I'm having problems as follows

I put the code into a file called ratio-script.awk and did a chmod +x
------------------------------------------------------------
#!/usr/bin/awk

gawk 'BEGIN{OFS=" "}
     NR == 1 {
         for (i=1; i<=NF; i++) {
                 line[NR,i] = $i
         }
         next
     }
     {
 ......... and so on .. exactly as you gave it to me.
------------------------------------------------------------

Then I do:

bin/ratio-script.awk ratio.txt 

and I get

awk: division by zero
 input record number 1, file ratio.txt
 source line number 1

I tried this for the script in the first reply to my post also and got
the same message:

awk: division by zero
 input record number 1, file ratio.txt
 source line number 1

And tried both scripts with and without the first row in my data file:
Elapsed(ms) Ratio(11) Ratio(110) Ratio(111) Ratio(112)

anymore ideas?

Thanx

Roger

Ed Morton <morton@lsupcaemnt.com> wrote in message news:<NbSdnZXDXtLJAevdRVn-ig@comcast.com>...
> spacemancw wrote:
> > Hi
> > 
> > I have a file which has a number of columns (both the number rows and
> > columns can vary from file to file) as follows.
> > 
> > 
> > Elapsed(ms)	Ratio(11)	Ratio(110)	Ratio(111)	Ratio(112)
> > 19185		0.420455	0		0		0
> > 566480		0.428044	0		0		0
> > 600226		0.438697	0		0		0
> > 633620		0.436214	0		0		0
> > 664160		0.428571	0		0		0
> > 698094		0.426641	0		0		0
> > 729617		0.424603	0		0		0
> > 760946		0.423459	0.394673	0.491124	0.505263
> > 792959		0.424658	0.396277	0.490946	0
> <snip>
> I posted this yesterday but apparently it never made it out of my news 
> server. Appologies if you see it twice.
> 
> Something like this should work:
> 
> gawk 'BEGIN{OFS="\t"}
>      NR == 1 {
>          for (i=1; i<=NF; i++) {
>                  line[NR,i] = $i
>          }
>          next
>      }
>      {
>          for (i=1; i<=NF; i++) {
>                  if (($i != 0) && (! (i in start))) {
>                          start[i] = NR
>                  }
>                  if (i in start) {
>                          idx = 2 + NR - start[i]
>                          line[idx,i] = $i
>                  }
>          }
>      }
>      END {
>          for (i=1; i<=NR; i++) {
>                  for (j=1; j<=NF; j++) {
>                          $j = line[i,j]
>                  }
>                  print
>          }
>      }'
> 
> Regards,
> 
>      Ed.
0
Reply spacemancw 4/12/2004 12:20:40 PM


spacemancw wrote:
> Thanx much for the reply. I'm having problems as follows
> 
> I put the code into a file called ratio-script.awk and did a chmod +x
> ------------------------------------------------------------
> #!/usr/bin/awk

Try getting rid of the above "#!..." line.

	Ed.

0
Reply Ed 4/12/2004 2:25:43 PM

On Mon, 12 Apr 2004 09:25:43 -0500 in comp.lang.awk, Ed Morton
<morton@lsupcaemnt.com> wrote:

>
>
>spacemancw wrote:
>> Thanx much for the reply. I'm having problems as follows
>> 
>> I put the code into a file called ratio-script.awk and did a chmod +x
>> ------------------------------------------------------------
>> #!/usr/bin/awk
>
>Try getting rid of the above "#!..." line.

Or get rid of the "gawk '" prefix and the "'" suffix. 

-- 
Thanks. Take care, Brian Inglis 	Calgary, Alberta, Canada

Brian.Inglis@CSi.com 	(Brian dot Inglis at SystematicSw dot ab dot ca)
    fake address		use address above to reply
0
Reply Brian 4/12/2004 6:17:29 PM

Ed

ya I did that and now I get 

spaceman@ibookx> bin/ratio-script.awk ratio.txt 
tcsh: echo: No such file or directory.

I'm on OS X ... just so u know .. I don't have gawk .. just awk


Ed Morton <morton@lsupcaemnt.com> wrote in message news:<c5e8t9$5in@netnews.proxy.lucent.com>...
> spacemancw wrote:
> > Thanx much for the reply. I'm having problems as follows
> > 
> > I put the code into a file called ratio-script.awk and did a chmod +x
> > ------------------------------------------------------------
> > #!/usr/bin/awk
> 
> Try getting rid of the above "#!..." line.
> 
> 	Ed.
0
Reply spacemancw 4/12/2004 6:50:08 PM


spacemancw wrote:
> Ed
> 
> ya I did that and now I get 
> 
> spaceman@ibookx> bin/ratio-script.awk ratio.txt 
> tcsh: echo: No such file or directory.
> 
> I'm on OS X ... just so u know .. I don't have gawk .. just awk

The problem I was trying to help you with with my most recent posting is 
that you can either have the #!....awk shebang line OR you specifically 
invoke awk within the script - you can't do both or it's like you're 
trying to invoke awk within an awk program. You say you only have "awk" 
but that could mean many things. Some older versions of awk may not 
support the features in the script in which case you can always download 
and install gawk or look for a newer version of awk on your machine. I 
don't know whey you got the error message you did, but then I never use 
tcsh. You could change the first line of your script to be a shebang 
that invokes another shell if you like, then change the "gawk" to "awk" 
in my script, or just delete the "gawk '" from the start and the "'" 
from the end of my script.

	Ed.

> 
> Ed Morton <morton@lsupcaemnt.com> wrote in message news:<c5e8t9$5in@netnews.proxy.lucent.com>...
> 
>>spacemancw wrote:
>>
>>>Thanx much for the reply. I'm having problems as follows
>>>
>>>I put the code into a file called ratio-script.awk and did a chmod +x
>>>------------------------------------------------------------
>>>#!/usr/bin/awk
>>
>>Try getting rid of the above "#!..." line.
>>
>>	Ed.

0
Reply Ed 4/12/2004 8:11:24 PM

Ed thanx a million

I was originally putting in the #!/usr/bin/awk
then removed it and was doing
awk -f <path to script> <inputfile>
and this was giving no errors ... just a big blank output. Nothing in
the output.
Anway so I went to gnu.org and downloaded and installed gawk 3.0.6
I run gawk -f <path to script> <inputfile> and it works
I ran it on a file with 125 columns and 50 rows and it works ...
mostly.
This is brilliant and I thank you very much ..... next thing is to get
gawk for cygwin on Win2k .....

thanx

Roger

Ed Morton <morton@lsupcaemnt.com> wrote in message news:<c5et5c$fjr@netnews.proxy.lucent.com>...
> spacemancw wrote:
> > Ed
> > 
> > ya I did that and now I get 
> > 
> > spaceman@ibookx> bin/ratio-script.awk ratio.txt 
> > tcsh: echo: No such file or directory.
> > 
> > I'm on OS X ... just so u know .. I don't have gawk .. just awk
> 
> The problem I was trying to help you with with my most recent posting is 
> that you can either have the #!....awk shebang line OR you specifically 
> invoke awk within the script - you can't do both or it's like you're 
> trying to invoke awk within an awk program. You say you only have "awk" 
> but that could mean many things. Some older versions of awk may not 
> support the features in the script in which case you can always download 
> and install gawk or look for a newer version of awk on your machine. I 
> don't know whey you got the error message you did, but then I never use 
> tcsh. You could change the first line of your script to be a shebang 
> that invokes another shell if you like, then change the "gawk" to "awk" 
> in my script, or just delete the "gawk '" from the start and the "'" 
> from the end of my script.
> 
> 	Ed.
> 
> > 
> > Ed Morton <morton@lsupcaemnt.com> wrote in message news:<c5e8t9$5in@netnews.proxy.lucent.com>...
> > 
> >>spacemancw wrote:
> >>
> >>>Thanx much for the reply. I'm having problems as follows
> >>>
> >>>I put the code into a file called ratio-script.awk and did a chmod +x
> >>>------------------------------------------------------------
> >>>#!/usr/bin/awk
> >>
> >>Try getting rid of the above "#!..." line.
> >>
> >>	Ed.
0
Reply spacemancw 4/13/2004 12:31:29 PM

8 Replies
166 Views

(page loaded in 0.113 seconds)

Similiar Articles:













7/28/2012 9:53:40 AM


Reply: