##ALERT!!!### Noobie struggling with Awk
Hi all,
I have a problem trying to figure out how Awk can process two input
files simultaneously, as there is some reference on the web to it doing
so, eg http://www.vectorsite.net/tsawk_3.html, but it is short on detail
and example. I am running Ubuntu 7.04 (gawk I think).
Problem: take one csv file full of data, including one field that needs
to be changed/processed on a regular, automatic, scalable basis. Take
second csv file with data in two columns; first column = field matching
data in first csv, second column = field of replacement data.
So far I have:
in Batch file:
cat old.csv replacement.csv | awk -f awkins f1="old.csv"
f2="replacement.csv" >> newfile
awkins file:
BEGIN { FS = ":" };{
if($10 =="y"||$10=="Y"){ # begin process of extraction because ALLOWSELL=y
qty=$6+$8;
if (($37 =="N"||$37=="n")&&qty<=0){
status=9
}else{ status=1;
}
{
print $1," ",$1".jpg"," ",$1," ",$2," ","url"," ",$7," ",$7," ",qty,"
",$4," ",$5," ",status,"EOREOR";
}
}
}
I couldn't fihure out how to get Awk to look at the second file; getline
worked with it , but filled the field variables with data from the
second file.
Thanks for any hints or help!
|
|
0
|
|
|
|
Reply
|
Combatwombat
|
2/7/2008 2:35:34 AM |
|
On 2/6/2008 8:35 PM, Combatwombat wrote:
> ##ALERT!!!### Noobie struggling with Awk
>
> Hi all,
>
> I have a problem trying to figure out how Awk can process two input
> files simultaneously, as there is some reference on the web to it doing
> so, eg http://www.vectorsite.net/tsawk_3.html, but it is short on detail
> and example. I am running Ubuntu 7.04 (gawk I think).
>
> Problem: take one csv file full of data, including one field that needs
> to be changed/processed on a regular, automatic, scalable basis. Take
> second csv file with data in two columns; first column = field matching
> data in first csv, second column = field of replacement data.
>
> So far I have:
>
> in Batch file:
>
> cat old.csv replacement.csv | awk -f awkins f1="old.csv"
> f2="replacement.csv" >> newfile
>
> awkins file:
>
> BEGIN { FS = ":" };{
>
> if($10 =="y"||$10=="Y"){ # begin process of extraction because ALLOWSELL=y
> qty=$6+$8;
> if (($37 =="N"||$37=="n")&&qty<=0){
> status=9
> }else{ status=1;
> }
> {
>
> print $1," ",$1".jpg"," ",$1," ",$2," ","url"," ",$7," ",$7," ",qty,"
> ",$4," ",$5," ",status,"EOREOR";
>
> }
> }
> }
>
> I couldn't fihure out how to get Awk to look at the second file; getline
> worked with it , but filled the field variables with data from the
> second file.
> Thanks for any hints or help!
It's very hard to tell what it is you're trying to do. Posting some sample input
and expected output would be a big help. In the meantime:
a) "getline" is almost certainly the wrong approach.
b) concatenating both files and setting awk variables with their names is also
almost certainly the wrong approach.
c) let's start with this:
awk '{print FILENAME, $0}' old.csv replacement.csv > newfile
so you can see how you can run an awk script on multiple input files and we'll
take it from there.
Ed.
|
|
0
|
|
|
|
Reply
|
Ed
|
2/7/2008 6:45:51 AM
|
|
Thanks for your reply, Ed!
The old.csv has numerous lines like this:
AAA80008 Automobile fins AAA CS 0
the replacement csv has lines like this:
AAA Archies Automobiles and Airplanes
so by replacing the AAA in the record of old.csv we end up with:
AAA80008 Automobile fins Archies Automobiles and Airplanes CS 0
Thanks!
|
|
0
|
|
|
|
Reply
|
Combatwombat
|
2/7/2008 7:45:48 AM
|
|
"Ed Morton" <morton@lsupcaemnt.com> schreef in bericht
news:47AAA91F.9010208@lsupcaemnt.com...
>
>
> On 2/6/2008 8:35 PM, Combatwombat wrote:
>> ##ALERT!!!### Noobie struggling with Awk
>>
> a) "getline" is almost certainly the wrong approach.
> b) concatenating both files and setting awk variables with their names is
> also
> almost certainly the wrong approach.
> c) let's start with this:
>
> awk '{print FILENAME, $0}' old.csv replacement.csv > newfile
>
> so you can see how you can run an awk script on multiple input files and
> we'll
> take it from there.
>
> Ed.
>
awk '{ print FILENAME, FNR, NR }' old.csv replacement.csv
will output
old.csv 1 1
old.csv 2 2
old.csv 3.3
replacement.csv 1 4
replacement.csv 2 5
replacement.csv 3 6
so, you can check in your script which inputfile you're reading from bye
comparing FNR and NR
so, i would start with:
awk 'NR=FNR{ repl[$1]=substr($0,length($1+1)); next; } { rest of your
script, which checks if repl[] things are contained in file old.csv }'
replacement.csv old.csv
|
|
0
|
|
|
|
Reply
|
Luuk
|
2/7/2008 9:27:08 AM
|
|
On 7 Feb., 08:45, Combatwombat <combatwombat@> wrote:
> Thanks for your reply, Ed!
>
> The old.csv has numerous lines like this:
>
> AAA80008 =A0 =A0 =A0 =A0Automobile fins =A0 =A0 =A0 =A0 AAA =A0 =A0 CS =A0=
=A0 =A00
>
> the replacement csv has lines like this:
> AAA =A0 =A0 Archies Automobiles and Airplanes
>
> so by replacing the AAA in the record of old.csv we end up with:
>
> AAA80008 =A0 =A0 =A0 =A0Automobile fins =A0 =A0 =A0 =A0 Archies Automobile=
s and Airplanes =A0 =A0 =A0 CS =A0 =A0 =A00
>
> Thanks!
You would first build up a mapping table from the data in
replacement file, then process the data file and replace the
respective data...
awk '...' replacementfile datafile
Now your data files seem to have different field separators,
I suspect there's a TAB in the replacementfile and comma in
the datafile, at least in the examples of your first posting
in this thread...
awk -v FS=3D"\t" -v OFS=3D"\t" '...' replacementfile FS=3D"," datafile
=2E..but if you have just TABs as separators, as it *seems*
here use
awk -v FS=3D"\t" -v OFS=3D"\t" '...' replacementfile datafile
To distinguish whether you are in the first or second file
you compare NR and FNR; in case of the first processed file
they match...
awk -v FS=3D"\t" -v OFS=3D"\t" '
NR=3D=3DFNR { map[$1] =3D $2 ; next }
...
' replacementfile datafile
Now you replace the respective field in the data file...
awk -v FS=3D"\t" -v OFS=3D"\t"
'NR=3D=3DFNR { map[$1] =3D $2 ; next }
$3 in map { $3 =3D map[$3] }
{ print }
' replacementfile datafile
(The code is untested.)
Janis
|
|
0
|
|
|
|
Reply
|
Janis
|
2/7/2008 12:44:35 PM
|
|
On 2/7/2008 1:45 AM, Combatwombat wrote:
> Thanks for your reply, Ed!
You're welcome, but in future please leave enough context so your post stands
alone as this is netnews not a web forum.
> The old.csv has numerous lines like this:
>
> AAA80008 Automobile fins AAA CS 0
>
>
> the replacement csv has lines like this:
> AAA Archies Automobiles and Airplanes
>
>
> so by replacing the AAA in the record of old.csv we end up with:
>
>
> AAA80008 Automobile fins Archies Automobiles and Airplanes CS 0
>
> Thanks!
If those are tabs between the fields then all you need is:
awk 'BEGIN{FS=OFS="\t"} NR==FNR{a[$1]=$2;next} {$3=a[$3];print}' replacement.csv
old.csv
Ed.
|
|
0
|
|
|
|
Reply
|
Ed
|
2/7/2008 2:01:22 PM
|
|
Ed Morton wrote:
>
> On 2/7/2008 1:45 AM, Combatwombat wrote:
>> Thanks for your reply, Ed!
>
> You're welcome, but in future please leave enough context so your post stands
> alone as this is netnews not a web forum.
>
>> The old.csv has numerous lines like this:
>>
>> AAA80008 Automobile fins AAA CS 0
>>
>>
>> the replacement csv has lines like this:
>> AAA Archies Automobiles and Airplanes
>>
>>
>> so by replacing the AAA in the record of old.csv we end up with:
>>
>>
>> AAA80008 Automobile fins Archies Automobiles and Airplanes CS 0
>>
>> Thanks!
>
> If those are tabs between the fields then all you need is:
>
> awk 'BEGIN{FS=OFS="\t"} NR==FNR{a[$1]=$2;next} {$3=a[$3];print}' replacement.csv
> old.csv
>
> Ed.
>
Thanks, Mate! that did it;
awk 'BEGIN{FS=OFS=":"} NR==FNR{a[$1]=$2;next} {$4=a[$4];print}'
replacement.csv old.csv > result.csv
(there was another tab in there, and the csv is colon delimited)
You the man, I've spent hours bashing my head on this one, and was just
about to go and learn python to do it.
|
|
0
|
|
|
|
Reply
|
Combatwombat
|
2/7/2008 2:46:46 PM
|
|
Ed Morton wrote:
>
> On 2/7/2008 1:45 AM, Combatwombat wrote:
>
>>Thanks for your reply, Ed!
>
>
> You're welcome, but in future please leave enough context so your post stands
> alone as this is netnews not a web forum.
>
>
>>The old.csv has numerous lines like this:
>>
>>AAA80008 Automobile fins AAA CS 0
>>
>>
>>the replacement csv has lines like this:
>>AAA Archies Automobiles and Airplanes
>>
>>
>>so by replacing the AAA in the record of old.csv we end up with:
>>
>>
>>AAA80008 Automobile fins Archies Automobiles and Airplanes CS 0
>>
>>Thanks!
>
>
> If those are tabs between the fields then all you need is:
>
> awk 'BEGIN{FS=OFS="\t"} NR==FNR{a[$1]=$2;next} {$3=a[$3];print}' replacement.csv
> old.csv
>
> Ed.
>
It seems to fit for the OP, but mind that {$3=a[$3];print} will
_always_ replace $3, even in cases where there's no substitute for
the third field defined in the map; in that case the third field
would be blanked.
Janis
|
|
0
|
|
|
|
Reply
|
Janis
|
2/7/2008 6:35:34 PM
|
|
Janis Papanagnou wrote:
> Ed Morton wrote:
>>
>> On 2/7/2008 1:45 AM, Combatwombat wrote:
>>
>>> Thanks for your reply, Ed!
>>
>>
>> You're welcome, but in future please leave enough context so your post
>> stands
>> alone as this is netnews not a web forum.
>>
>>
>>> The old.csv has numerous lines like this:
>>>
>>> AAA80008 Automobile fins AAA CS 0
>>>
>>>
>>> the replacement csv has lines like this:
>>> AAA Archies Automobiles and Airplanes
>>>
>>>
>>> so by replacing the AAA in the record of old.csv we end up with:
>>>
>>>
>>> AAA80008 Automobile fins Archies Automobiles and
>>> Airplanes CS 0
>>>
>>> Thanks!
>>
>>
>> If those are tabs between the fields then all you need is:
>>
>> awk 'BEGIN{FS=OFS="\t"} NR==FNR{a[$1]=$2;next} {$3=a[$3];print}'
>> replacement.csv
>> old.csv
>>
>> Ed.
>>
>
> It seems to fit for the OP, but mind that {$3=a[$3];print} will
> _always_ replace $3, even in cases where there's no substitute for
> the third field defined in the map; in that case the third field
> would be blanked.
>
> Janis
Yes, you are right Janis.
I soon discovered that, and reworked the orginal code I had using getline:
BEGIN { FS = ":" };{
if($10 =="y"||$10=="Y"){ # begin process of extraction because ALLOWSELL=y
qty=$6+$8;
if (($37 =="N"||$37=="n")&&qty<=0){
status=9
}else{ status=1;
}
{
#Reassign Globals
NewNR=NR; NewFNR=FNR
for (i=1; i<=37; i++){
store[i]=$i
}#end for loop of storing data
while ((getline < "replacement.csv") > 0)# load replacements file
if ($1==store[4]){ manf=$2 } #select the replacement
close("replacement.csv") #close replacement file
NR=NewNR; FNR=NewFNR
for (i=1; i<=37; i++){
$i=store[i]
}#end for loop of restoring data
$4=manf #assign the replacement
print
$1,":",$1".jpg",":",$1,":",$4,":",$2,":","url",":",$7,":",$7,":",qty,":",$5,":",status,"EOREOR";
}
}
}
I'm betting this could probably be cleaned up code-wise, but it is easy
to read and debug at the moment.
The entire project works well at converting an aging DBF based
accounting system data into data usable by Zencart. If anyone is
interested I will be posting my findings at
http://combatwombat.7doves.com soon.
Thanks for your help everyone!
|
|
0
|
|
|
|
Reply
|
Combatwombat
|
2/7/2008 8:53:59 PM
|
|
On 2/7/2008 2:53 PM, Combatwombat wrote:
> Janis Papanagnou wrote:
>
>>Ed Morton wrote:
>>
>>>On 2/7/2008 1:45 AM, Combatwombat wrote:
>>>
>>>
>>>>Thanks for your reply, Ed!
>>>
>>>
>>>You're welcome, but in future please leave enough context so your post
>>>stands
>>>alone as this is netnews not a web forum.
>>>
>>>
>>>
>>>>The old.csv has numerous lines like this:
>>>>
>>>>AAA80008 Automobile fins AAA CS 0
>>>>
>>>>
>>>>the replacement csv has lines like this:
>>>>AAA Archies Automobiles and Airplanes
>>>>
>>>>
>>>>so by replacing the AAA in the record of old.csv we end up with:
>>>>
>>>>
>>>>AAA80008 Automobile fins Archies Automobiles and
>>>>Airplanes CS 0
>>>>
>>>>Thanks!
>>>
>>>
>>>If those are tabs between the fields then all you need is:
>>>
>>>awk 'BEGIN{FS=OFS="\t"} NR==FNR{a[$1]=$2;next} {$3=a[$3];print}'
>>>replacement.csv
>>>old.csv
>>>
>>> Ed.
>>>
>>
>>It seems to fit for the OP, but mind that {$3=a[$3];print} will
>>_always_ replace $3, even in cases where there's no substitute for
>>the third field defined in the map; in that case the third field
>>would be blanked.
>>
>>Janis
>
>
> Yes, you are right Janis.
Yes, all you need is a tweak to check for the key field being present in the array:
awk 'BEGIN{FS=OFS="\t"} NR==FNR{a[$1]=$2;next} $3 in a{$3=a[$3]} {print}'
replacement.csv old.csv
> I soon discovered that, and reworked the orginal code I had using getline:
>
> BEGIN { FS = ":" };{
>
> if($10 =="y"||$10=="Y"){ # begin process of extraction because ALLOWSELL=y
> qty=$6+$8;
> if (($37 =="N"||$37=="n")&&qty<=0){
> status=9
> }else{ status=1;
> }
>
> {
> #Reassign Globals
> NewNR=NR; NewFNR=FNR
> for (i=1; i<=37; i++){
> store[i]=$i
>
> }#end for loop of storing data
> while ((getline < "replacement.csv") > 0)# load replacements file
> if ($1==store[4]){ manf=$2 } #select the replacement
> close("replacement.csv") #close replacement file
>
> NR=NewNR; FNR=NewFNR
> for (i=1; i<=37; i++){
> $i=store[i]
>
> }#end for loop of restoring data
> $4=manf #assign the replacement
>
>
> print
> $1,":",$1".jpg",":",$1,":",$4,":",$2,":","url",":",$7,":",$7,":",qty,":",$5,":",status,"EOREOR";
>
> }
> }
>
>
> }
>
> I'm betting this could probably be cleaned up code-wise, but it is easy
> to read and debug at the moment.
>
> The entire project works well at converting an aging DBF based
> accounting system data into data usable by Zencart. If anyone is
> interested I will be posting my findings at
> http://combatwombat.7doves.com soon.
>
> Thanks for your help everyone!
You REALLY need to read http://tinyurl.com/yn9ka9 if you're still considering
using getline. Among other problems, the above will read the whole of
"replacement.csv" once for each record in "old.csv".
Ed.
|
|
0
|
|
|
|
Reply
|
Ed
|
2/7/2008 8:59:43 PM
|
|
|
9 Replies
242 Views
(page loaded in 0.089 seconds)
|