Process two files simultaneously

  • Follow


##ALERT!!!### Noobie struggling with Awk

Hi all,

I have a problem trying to figure out how Awk can process two input 
files simultaneously, as there is some reference on the web to it doing 
so, eg http://www.vectorsite.net/tsawk_3.html, but it is short on detail 
and example. I am running Ubuntu 7.04 (gawk I think).

Problem: take one csv file full of data, including one field that needs 
to be changed/processed on a regular, automatic, scalable basis. Take 
second csv file with data in two columns; first column = field matching 
data in first csv, second column = field of replacement data.

So far I have:

in Batch file:

cat old.csv replacement.csv | awk -f awkins f1="old.csv" 
f2="replacement.csv" >> newfile

awkins file:

BEGIN  { FS = ":" };{

if($10 =="y"||$10=="Y"){ # begin process of extraction because ALLOWSELL=y
qty=$6+$8;
if (($37 =="N"||$37=="n")&&qty<=0){
	status=9
}else{ status=1;
}
{

print $1,"	",$1".jpg","	",$1,"	",$2,"	","url","	",$7,"	",$7,"	",qty," 
",$4,"	",$5,"	",status,"EOREOR";

}
}
}

I couldn't fihure out how to get Awk to look at the second file; getline 
  worked with it , but filled the field variables with data from the 
second file.
Thanks for any hints or help!
0
Reply Combatwombat 2/7/2008 2:35:34 AM


On 2/6/2008 8:35 PM, Combatwombat wrote:
> ##ALERT!!!### Noobie struggling with Awk
> 
> Hi all,
> 
> I have a problem trying to figure out how Awk can process two input 
> files simultaneously, as there is some reference on the web to it doing 
> so, eg http://www.vectorsite.net/tsawk_3.html, but it is short on detail 
> and example. I am running Ubuntu 7.04 (gawk I think).
> 
> Problem: take one csv file full of data, including one field that needs 
> to be changed/processed on a regular, automatic, scalable basis. Take 
> second csv file with data in two columns; first column = field matching 
> data in first csv, second column = field of replacement data.
> 
> So far I have:
> 
> in Batch file:
> 
> cat old.csv replacement.csv | awk -f awkins f1="old.csv" 
> f2="replacement.csv" >> newfile
> 
> awkins file:
> 
> BEGIN  { FS = ":" };{
> 
> if($10 =="y"||$10=="Y"){ # begin process of extraction because ALLOWSELL=y
> qty=$6+$8;
> if (($37 =="N"||$37=="n")&&qty<=0){
> 	status=9
> }else{ status=1;
> }
> {
> 
> print $1,"	",$1".jpg","	",$1,"	",$2,"	","url","	",$7,"	",$7,"	",qty," 
> ",$4,"	",$5,"	",status,"EOREOR";
> 
> }
> }
> }
> 
> I couldn't fihure out how to get Awk to look at the second file; getline 
>   worked with it , but filled the field variables with data from the 
> second file.
> Thanks for any hints or help!

It's very hard to tell what it is you're trying to do. Posting some sample input
and expected output would be a big help. In the meantime:

a) "getline" is almost certainly the wrong approach.
b) concatenating both files and setting awk variables with their names is also
almost certainly the wrong approach.
c) let's start with this:

	awk '{print FILENAME, $0}' old.csv replacement.csv > newfile

so you can see how you can run an awk script on multiple input files and we'll
take it from there.

	Ed.

0
Reply Ed 2/7/2008 6:45:51 AM


Thanks for your reply, Ed!

The old.csv has numerous lines like this:

AAA80008	Automobile fins		AAA	CS	0


the replacement csv has lines like this:
AAA 	Archies Automobiles and Airplanes


so by replacing the AAA in the record of old.csv we end up with:


AAA80008	Automobile fins		Archies Automobiles and Airplanes	CS	0

Thanks!
0
Reply Combatwombat 2/7/2008 7:45:48 AM

"Ed Morton" <morton@lsupcaemnt.com> schreef in bericht 
news:47AAA91F.9010208@lsupcaemnt.com...
>
>
> On 2/6/2008 8:35 PM, Combatwombat wrote:
>> ##ALERT!!!### Noobie struggling with Awk
>>

> a) "getline" is almost certainly the wrong approach.
> b) concatenating both files and setting awk variables with their names is 
> also
> almost certainly the wrong approach.
> c) let's start with this:
>
> awk '{print FILENAME, $0}' old.csv replacement.csv > newfile
>
> so you can see how you can run an awk script on multiple input files and 
> we'll
> take it from there.
>
> Ed.
>

awk '{ print FILENAME, FNR, NR }' old.csv replacement.csv

will output
old.csv 1 1
old.csv 2 2
old.csv 3.3
replacement.csv 1 4
replacement.csv 2 5
replacement.csv 3 6

so, you can check in your script which inputfile you're reading from bye 
comparing FNR and NR

so, i would start with:

awk 'NR=FNR{ repl[$1]=substr($0,length($1+1)); next; } { rest of your 
script, which checks if repl[] things are contained in file old.csv }' 
replacement.csv old.csv



0
Reply Luuk 2/7/2008 9:27:08 AM

On 7 Feb., 08:45, Combatwombat <combatwombat@> wrote:
> Thanks for your reply, Ed!
>
> The old.csv has numerous lines like this:
>
> AAA80008 =A0 =A0 =A0 =A0Automobile fins =A0 =A0 =A0 =A0 AAA =A0 =A0 CS =A0=
 =A0 =A00
>
> the replacement csv has lines like this:
> AAA =A0 =A0 Archies Automobiles and Airplanes
>
> so by replacing the AAA in the record of old.csv we end up with:
>
> AAA80008 =A0 =A0 =A0 =A0Automobile fins =A0 =A0 =A0 =A0 Archies Automobile=
s and Airplanes =A0 =A0 =A0 CS =A0 =A0 =A00
>
> Thanks!

You would first build up a mapping table from the data in
replacement file, then process the data file and replace the
respective data...

awk '...' replacementfile datafile

Now your data files seem to have different field separators,
I suspect there's a TAB in the replacementfile and comma in
the datafile, at least in the examples of your first posting
in this thread...

  awk -v FS=3D"\t" -v OFS=3D"\t" '...'  replacementfile FS=3D"," datafile

=2E..but if you have just TABs as separators, as it *seems*
here use

  awk -v FS=3D"\t" -v OFS=3D"\t" '...' replacementfile datafile

To distinguish whether you are in the first or second file
you compare NR and FNR; in case of the first processed file
they match...

awk -v FS=3D"\t" -v OFS=3D"\t" '
     NR=3D=3DFNR { map[$1] =3D $2 ; next }
     ...
    ' replacementfile datafile

Now you replace the respective field in the data file...

awk -v FS=3D"\t" -v OFS=3D"\t"
    'NR=3D=3DFNR { map[$1] =3D $2 ; next }
     $3 in map { $3 =3D map[$3] }
     { print }
    ' replacementfile datafile


(The code is untested.)

Janis
0
Reply Janis 2/7/2008 12:44:35 PM


On 2/7/2008 1:45 AM, Combatwombat wrote:
> Thanks for your reply, Ed!

You're welcome, but in future please leave enough context so your post stands
alone as this is netnews not a web forum.

> The old.csv has numerous lines like this:
> 
> AAA80008	Automobile fins		AAA	CS	0
> 
> 
> the replacement csv has lines like this:
> AAA 	Archies Automobiles and Airplanes
> 
> 
> so by replacing the AAA in the record of old.csv we end up with:
> 
> 
> AAA80008	Automobile fins		Archies Automobiles and Airplanes	CS	0
> 
> Thanks!

If those are tabs between the fields then all you need is:

awk 'BEGIN{FS=OFS="\t"} NR==FNR{a[$1]=$2;next} {$3=a[$3];print}' replacement.csv
old.csv

	Ed.

0
Reply Ed 2/7/2008 2:01:22 PM

Ed Morton wrote:
> 
> On 2/7/2008 1:45 AM, Combatwombat wrote:
>> Thanks for your reply, Ed!
> 
> You're welcome, but in future please leave enough context so your post stands
> alone as this is netnews not a web forum.
> 
>> The old.csv has numerous lines like this:
>>
>> AAA80008	Automobile fins		AAA	CS	0
>>
>>
>> the replacement csv has lines like this:
>> AAA 	Archies Automobiles and Airplanes
>>
>>
>> so by replacing the AAA in the record of old.csv we end up with:
>>
>>
>> AAA80008	Automobile fins		Archies Automobiles and Airplanes	CS	0
>>
>> Thanks!
> 
> If those are tabs between the fields then all you need is:
> 
> awk 'BEGIN{FS=OFS="\t"} NR==FNR{a[$1]=$2;next} {$3=a[$3];print}' replacement.csv
> old.csv
> 
> 	Ed.
> 

Thanks, Mate! that did it;
awk 'BEGIN{FS=OFS=":"} NR==FNR{a[$1]=$2;next} {$4=a[$4];print}' 
replacement.csv old.csv > result.csv

(there was another tab in there, and the csv is colon delimited)
You the man, I've spent hours bashing my head on this one, and was just 
about to go and learn python to do it.
0
Reply Combatwombat 2/7/2008 2:46:46 PM

Ed Morton wrote:
> 
> On 2/7/2008 1:45 AM, Combatwombat wrote:
> 
>>Thanks for your reply, Ed!
> 
> 
> You're welcome, but in future please leave enough context so your post stands
> alone as this is netnews not a web forum.
> 
> 
>>The old.csv has numerous lines like this:
>>
>>AAA80008	Automobile fins		AAA	CS	0
>>
>>
>>the replacement csv has lines like this:
>>AAA 	Archies Automobiles and Airplanes
>>
>>
>>so by replacing the AAA in the record of old.csv we end up with:
>>
>>
>>AAA80008	Automobile fins		Archies Automobiles and Airplanes	CS	0
>>
>>Thanks!
> 
> 
> If those are tabs between the fields then all you need is:
> 
> awk 'BEGIN{FS=OFS="\t"} NR==FNR{a[$1]=$2;next} {$3=a[$3];print}' replacement.csv
> old.csv
> 
> 	Ed.
> 

It seems to fit for the OP, but mind that  {$3=a[$3];print}  will
_always_ replace $3, even in cases where there's no substitute for
the third field defined in the map; in that case the third field
would be blanked.

Janis
0
Reply Janis 2/7/2008 6:35:34 PM

Janis Papanagnou wrote:
> Ed Morton wrote:
>>
>> On 2/7/2008 1:45 AM, Combatwombat wrote:
>>
>>> Thanks for your reply, Ed!
>>
>>
>> You're welcome, but in future please leave enough context so your post 
>> stands
>> alone as this is netnews not a web forum.
>>
>>
>>> The old.csv has numerous lines like this:
>>>
>>> AAA80008    Automobile fins        AAA    CS    0
>>>
>>>
>>> the replacement csv has lines like this:
>>> AAA     Archies Automobiles and Airplanes
>>>
>>>
>>> so by replacing the AAA in the record of old.csv we end up with:
>>>
>>>
>>> AAA80008    Automobile fins        Archies Automobiles and 
>>> Airplanes    CS    0
>>>
>>> Thanks!
>>
>>
>> If those are tabs between the fields then all you need is:
>>
>> awk 'BEGIN{FS=OFS="\t"} NR==FNR{a[$1]=$2;next} {$3=a[$3];print}' 
>> replacement.csv
>> old.csv
>>
>>     Ed.
>>
> 
> It seems to fit for the OP, but mind that  {$3=a[$3];print}  will
> _always_ replace $3, even in cases where there's no substitute for
> the third field defined in the map; in that case the third field
> would be blanked.
> 
> Janis

Yes, you are right Janis.
I soon discovered that, and reworked the orginal code I had using getline:

BEGIN  { FS = ":" };{

if($10 =="y"||$10=="Y"){ # begin process of extraction because ALLOWSELL=y
qty=$6+$8;
if (($37 =="N"||$37=="n")&&qty<=0){
status=9
}else{ status=1;
}

{
#Reassign Globals
NewNR=NR; NewFNR=FNR
for (i=1; i<=37; i++){
store[i]=$i

	}#end for loop of storing data
while ((getline < "replacement.csv") > 0)# load replacements file
      if ($1==store[4]){ manf=$2 } #select the replacement
close("replacement.csv") #close replacement file

NR=NewNR; FNR=NewFNR
for (i=1; i<=37; i++){
$i=store[i]

}#end for loop of restoring data
$4=manf #assign the replacement


print 
$1,":",$1".jpg",":",$1,":",$4,":",$2,":","url",":",$7,":",$7,":",qty,":",$5,":",status,"EOREOR";

}
}


}

I'm betting this could probably be cleaned up code-wise, but it is easy 
to read and debug at the moment.

The entire project works well at converting an aging DBF based 
accounting system data into data usable by Zencart. If anyone is 
interested I will be posting my findings at 
http://combatwombat.7doves.com soon.

Thanks for your help everyone!
0
Reply Combatwombat 2/7/2008 8:53:59 PM


On 2/7/2008 2:53 PM, Combatwombat wrote:
> Janis Papanagnou wrote:
> 
>>Ed Morton wrote:
>>
>>>On 2/7/2008 1:45 AM, Combatwombat wrote:
>>>
>>>
>>>>Thanks for your reply, Ed!
>>>
>>>
>>>You're welcome, but in future please leave enough context so your post 
>>>stands
>>>alone as this is netnews not a web forum.
>>>
>>>
>>>
>>>>The old.csv has numerous lines like this:
>>>>
>>>>AAA80008    Automobile fins        AAA    CS    0
>>>>
>>>>
>>>>the replacement csv has lines like this:
>>>>AAA     Archies Automobiles and Airplanes
>>>>
>>>>
>>>>so by replacing the AAA in the record of old.csv we end up with:
>>>>
>>>>
>>>>AAA80008    Automobile fins        Archies Automobiles and 
>>>>Airplanes    CS    0
>>>>
>>>>Thanks!
>>>
>>>
>>>If those are tabs between the fields then all you need is:
>>>
>>>awk 'BEGIN{FS=OFS="\t"} NR==FNR{a[$1]=$2;next} {$3=a[$3];print}' 
>>>replacement.csv
>>>old.csv
>>>
>>>    Ed.
>>>
>>
>>It seems to fit for the OP, but mind that  {$3=a[$3];print}  will
>>_always_ replace $3, even in cases where there's no substitute for
>>the third field defined in the map; in that case the third field
>>would be blanked.
>>
>>Janis
> 
> 
> Yes, you are right Janis.

Yes, all you need is a tweak to check for the key field being present in the array:

awk 'BEGIN{FS=OFS="\t"} NR==FNR{a[$1]=$2;next} $3 in a{$3=a[$3]} {print}'
replacement.csv old.csv

> I soon discovered that, and reworked the orginal code I had using getline:
> 
> BEGIN  { FS = ":" };{
> 
> if($10 =="y"||$10=="Y"){ # begin process of extraction because ALLOWSELL=y
> qty=$6+$8;
> if (($37 =="N"||$37=="n")&&qty<=0){
> status=9
> }else{ status=1;
> }
> 
> {
> #Reassign Globals
> NewNR=NR; NewFNR=FNR
> for (i=1; i<=37; i++){
> store[i]=$i
> 
> 	}#end for loop of storing data
> while ((getline < "replacement.csv") > 0)# load replacements file
>       if ($1==store[4]){ manf=$2 } #select the replacement
> close("replacement.csv") #close replacement file
> 
> NR=NewNR; FNR=NewFNR
> for (i=1; i<=37; i++){
> $i=store[i]
> 
> }#end for loop of restoring data
> $4=manf #assign the replacement
> 
> 
> print 
> $1,":",$1".jpg",":",$1,":",$4,":",$2,":","url",":",$7,":",$7,":",qty,":",$5,":",status,"EOREOR";
> 
> }
> }
> 
> 
> }
> 
> I'm betting this could probably be cleaned up code-wise, but it is easy 
> to read and debug at the moment.
> 
> The entire project works well at converting an aging DBF based 
> accounting system data into data usable by Zencart. If anyone is 
> interested I will be posting my findings at 
> http://combatwombat.7doves.com soon.
> 
> Thanks for your help everyone!

You REALLY need to read http://tinyurl.com/yn9ka9 if you're still considering
using getline. Among other problems, the above will read the whole of
"replacement.csv" once for each record in "old.csv".

	Ed.

0
Reply Ed 2/7/2008 8:59:43 PM

9 Replies
242 Views

(page loaded in 0.089 seconds)

Similiar Articles:













7/26/2012 7:38:34 AM


Reply: