Dear all!
I want to plot two data files using Gnuplot; in the Gnuplot newsgroup
comp.graphics.apps.gnuplot, I was advised to take my problem
to the awk newsgroup, since it seems to be beyond the scope of
Gnuplot. I am new to awk, and thus don't know whether awk is capable
of solving my problem. I would be very grateful if one of You
specialists here could give me a clear hint. Thanks in advance! This
is the problem:
I have two data files that I'd need to plot - one is experimental data,
one reference data. I think it is not important for You what type of
data it is, but just briefly: the data are mass spectrometry data.
The problem: the files have slightly different formatting. My original
experimental data, for example, look something like this (beginning of
a file):
45.000000 11093
46.100000 670
47.100000 747
48.100000 670
49.000000 5200
50.000000 16501
51.100000 26043
As You'll see: the x-values start at 45, and are non-integer (some
values are a bit off from the ".0"-reading) - but the x-spacing between
the data points is approx. one.
My reference data, however, start like this:
15 150
26 80
27 380
29 160
31 60
37 60
38 190
You see: the reference only states the values that are non-zero;
additionally, all x-values are integer, and start at an arbitrary value
(the first non-zero point).
Of course, also the y-values are different (but that is just an
arbitrary scaling factor which isn't important).
Now, what I'd like to have is ONE file that contains both data sets; I
would only need the x-values once, and I can live with integer values
for the x-axis; in fact, the few non-integer values in the experimental
data are just odd, and for our purpose it would be okay to round them
to integers. So, what I'd need would look something like this:
1 0 0
2 0 0
3 0 0
....
15 0 150
16 0 0
....
26 0 80
....
45 11093 90
Okay? So, I'd need to pad the missing lines into both files (it would
be okay to start at "1", since this would be the first meaningful
x-value in my case - although the intensity (y-value) might be zero).
Additionally, I'd need to round the x-values from the experimental
data file to integers, and use only one set of x-values for the file;
then, I'd need to copy the respective y-values from both files into the
two other columns...
Actually, it is quite a simple task - but I have no clue if awk would
be able to do this - and how to start... unfortunately, I am not a
specialist in those sorts of things, and I don't know much about
regular expressions and such things.
So, as said: could one of You give me a hint a) as to whether or
not this is possible, and b) how I might start?
Thank You very much for considering my question! And in advance already
for any answer!
With kind regards,
Björn
iso | 3/8/2010 10:11:12 PM
Yes, awk can do it. Whatever it is. I think what you're trying to say is that
given input like this:
file1:
3.00 5
5.10 9
7.10 2
file2:
4 8
5 2
you'd want output like this:
1 0 0
2 0 0
3 5 0
4 0 8
5 9 2
6 0 0
7 2 0
So, the first output field is the first input field rounded down, running from
1 to the highest number seen; the 2nd output field is the 2nd field from file1,
and the 3rd output field is the 2nd field from file2. That'd be something
like (untested):
awk '{ $1 = int($1); max = ($1 > max ? $1 : max); out[$1,NR==FNR] = $2 }
END { for (i=1;i<=max;i++) printf "%d %d %d\n",i,out[i,1],out[i,0] }
If that's not what you want, provide some small, simple sample input and
expected output as I showed above.
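For anyone who wants to try the one-liner above end to end, here is a minimal sketch. It assumes the two sample inputs are saved as file1 and file2, closes the awk program with a single quote, and passes both files as arguments; the output name merged.txt is just a placeholder.

```shell
# Sample inputs from the example above (file names are placeholders).
printf '3.00 5\n5.10 9\n7.10 2\n' > file1
printf '4 8\n5 2\n' > file2

# NR==FNR is 1 while awk reads the first file and 0 afterwards,
# so out[x,1] holds the file1 values and out[x,0] the file2 values.
awk '{ $1 = int($1); max = ($1 > max ? $1 : max); out[$1,NR==FNR] = $2 }
     END { for (i=1;i<=max;i++) printf "%d %d %d\n",i,out[i,1],out[i,0] }' file1 file2 > merged.txt

cat merged.txt
```

Array entries that were never set print as 0 with %d, which is what pads the missing x-values.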
Ed.
Ed | 3/8/2010 10:46:20 PM
Hey, Ed!
GREAT! Thank You very much!!! Great help! :-D I am really happy!!!
Your suggestion really does the trick for me. I had to tweak around a
bit to get it working (as said, I have never used awk - so I had to
find out how to call it and name the input and output files...), but
that's nothing. It was a real motivation to know it would work! (In
fact, I was a bit surprised that Gnuplot couldn't handle this task...
and thus, I was a bit sceptical about which tool might be able to help me).
Just for reference (for other awk-newbies such as myself): if You want
Ed's example to work from command line, You have to add the input and
output file information; this is what I used in the end as a call from
the terminal:
awk '{ $1 = int($1); max = ($1 > max ? $1 : max); out[$1,NR==FNR] = $2 }
     END { for (i=1;i<=max;i++) printf "%d %d %d\n",i,out[i,1],out[i,0] }' expt.txt ref.txt > out.txt
You may recognize that it is Ed's example, but I added the apostrophe
(or single quote sign) at the end to close the command definition for
awk, and then added the two input files (expt.txt and ref.txt), and -
using the >-sign - the output file out.txt. Thus, my output ends up in
out.txt.
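For repeated use, the same logic could also live in a script file. The following is only a sketch and a variant, not Ed's original: it rounds x to the nearest integer with int($1 + 0.5) instead of truncating (assuming x is never negative), the file name merge.awk is hypothetical, and the two demo inputs are made up purely to exercise the script.

```shell
cat > merge.awk <<'EOF'
# merge.awk (hypothetical file name): merge two x/y files on integer x.
{
    x = int($1 + 0.5)            # round to nearest integer; assumes x >= 0
    if (x > max) max = x         # remember the largest x seen in either file
    y[x, NR == FNR] = $2         # second index: 1 = first file, 0 = second file
}
END {
    # print every x from 1 to max; entries never set print as 0
    for (i = 1; i <= max; i++)
        printf "%d %d %d\n", i, y[i, 1], y[i, 0]
}
EOF

# Small made-up inputs, just to exercise the script.
printf '2.900000 7\n4.100000 3\n' > expt_demo.txt
printf '1 150\n4 80\n' > ref_demo.txt

awk -f merge.awk expt_demo.txt ref_demo.txt > out_demo.txt
cat out_demo.txt
```

With truncation, 2.9 would land on x = 2; with rounding to nearest it lands on x = 3. Which behaviour is right depends on how far off the ".0" reading the experimental values can be.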
Thanks again, Ed, for taking the time to answer my (probably simple)
question! You saved me a lot of time looking around erratically!
Thanks!
Kind regards,
Björn
bbcda | 3/9/2010 6:05:40 AM
You're welcome. FYI gnuplot is a tool for plotting graphs, not manipulating text
files, while awk is a tool for manipulating text files, not for plotting graphs.
Since a lot of what typically needs to be plotted is the result of manipulating
text files, awk is gnuplot's best friend. So if you're going to be using gnuplot
a lot, you might want to learn awk from the book Effective Awk Programming,
Third Edition, by Arnold Robbins (http://www.oreilly.com/catalog/awkprog3/) and
lurk around here a bit.
Ed.
Ed | 3/9/2010 12:42:45 PM