matched search string

I am trying to read in a data file that is comma separated and match 4
chars and keep the subsequent date chars and add that to my output
file.  e.g.

My input file format is something like this:
BBBB:2005/11/01,BBBC:2005/12/01,BBBD:2005/12/07,BBBB:2005/12/08

I want to read the file in, match BBBB, save its <date string>, and
output a new 4 char name like ZZZZ with the saved date string appended
for each match (in this case ":2005/11/01" and ":2005/12/08"), so that my
file would then read:

BBBB:2005/11/01,ZZZZ:2005/11/01,BBBC:2005/12/01,BBBD:2005/12/07,BBBB:2005/12/08,ZZZZ:2005/12/08

I have scoured the O'Reilly sed & awk and vi books but cannot see a
similar option other than the hold space in sed. Maybe Perl?

Thanks in advance,

Mike D

Reply eeb4u (26) 11/2/2005 2:19:01 AM

eeb4u@hotmail.com wrote:
> I am trying to read in a data file that is comma separated and match 4
> chars and keep the subsequent date chars and add that to my output
> file.  e.g.
> 
> My input file format is something like this:
> BBBB:2005/11/01,BBBC:2005/12/01,BBBD:2005/12/07,BBBB:2005/12/08
> 
> I want to read the file in, match BBBB, save its <date string>, and
> output a new 4 char name like ZZZZ with the saved date string appended
> for each match (in this case ":2005/11/01" and ":2005/12/08"), so that my
> file would then read:
> 
> BBBB:2005/11/01,ZZZZ:2005/11/01,BBBC:2005/12/01,BBBD:2005/12/07,BBBB:2005/12/08,ZZZZ:2005/12/08
> 
> I have scoured the O'Reilly sed & awk and vi books but cannot see a
> similar option other than the hold space in sed. Maybe Perl?
> 
> Thanks in advance,
> 
> Mike D
> 

It's not apparent whether your file consists of multiple lines like in
the example above or just a stream of characters in one line. In case
it's a one line stream the following awk program does what you want...

BEGIN { ORS=RS="," ; OFS=FS=":" }
{ print $1,$2 ; if ($1 == "BBBB") print "ZZZZ",$2 }
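
To make the behaviour concrete, here is a small sketch that runs the same program on the sample line from the original post. The function name add_zzzz_stream is only an illustrative wrapper, and the input is assumed to be a single stream with no trailing newline.

```shell
# The two-line awk program above, wrapped in a function.
# The function name is illustrative and not part of the original post.
add_zzzz_stream() {
  awk 'BEGIN { ORS = RS = ","; OFS = FS = ":" }
       { print $1, $2; if ($1 == "BBBB") print "ZZZZ", $2 }'
}

# printf '%s' writes the sample without a trailing newline.
printf '%s' 'BBBB:2005/11/01,BBBC:2005/12/01,BBBD:2005/12/07,BBBB:2005/12/08' |
  add_zzzz_stream
# Output (no final newline; ORS adds a comma after every record, including the last):
# BBBB:2005/11/01,ZZZZ:2005/11/01,BBBC:2005/12/01,BBBD:2005/12/07,BBBB:2005/12/08,ZZZZ:2005/12/08,
```

Because ORS is a comma, the output always ends with one extra comma, which may need stripping afterwards.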


Janis
Reply Janis 11/2/2005 3:05:59 AM


The file consists of several lines, one record per line where one
record can have many dates and publications (BBBB, BBBD etc).  I showed
only one line, sorry!

This worked on all but one occurrence during my tests.  Additionally, I
had to modify the script, as my input file actually contains quotes and
reads:

""BBBB:2005/10/31"",""BBBC:2005/11/01"",""BBBD:2005/11/01""

etc.

thus

BEGIN { ORS=RS="," ; OFS=FS=":" }
{ print $1,$2 ; if ($1 == "\"\"BBBB") print "\"\"ZZZZ",$2 }

The only instance that fails is if BBBB is the last record on the line
and isn't followed by the RS comma.  The script fails to translate this
single record.

I am almost at the end of my shift. I will continue trying to get this
working or hopefully you may have a solution by tomorrow.

Thanks again for your excellent solution.  You turned my 30 line ksh
script that took about 2 hours to run (on my sparc 20) into a
lightspeed one liner!

Mike D

Reply eeb4u 11/2/2005 6:15:42 AM

A quick fix is to append a comma to each line using sed, run the awk
one-liner, and strip the comma off afterwards.

sed 's/"$/",/g' inputfile > appended.file
awk -f scriptfile appended.file > awked.file
sed 's/",$/"/g' awked.file > datemod.file

I am sure there is a much more elegant, efficient way to accomplish
this!

thanks again

Mike D

Reply eeb4u 11/2/2005 6:24:12 AM

eeb4u@hotmail.com wrote:
> a quick fix is to append each line with a comma using sed, run the awk
> one-liner and strip off the comma afterwards.
> 
> sed 's/"$/",/g' inputfile > appended.file
> awk -f scriptfile appended.file > awked.file
> sed 's/",$/"/g' awked.file > datemod.file
> 
> I am sure there is a much more elegant, efficient way to accomplish
> this!
> 
> thanks again
> 
> Mike D
> 

Please read these before posting again:

       http://cfaj.freeshell.org/google
       http://en.wikipedia.org/wiki/Top-posting
       http://en.wikipedia.org/wiki/Netiquette

Now, wrt your problem, does this do what you want:

$ cat file
""BBBB:2005/10/31"",""BBBC:2005/11/01"",""BBBD:2005/11/01""
""BBBA:2005/10/31"",""BBBC:2005/11/01"",""BBBB:2005/11/01""
$ awk 'BEGIN { OFS=FS="," }
{ for (i=1;i<=NF;i++) if ($i ~ /^""BBBB:/) {
    tmp = $i; sub(/BBBB/,"ZZZZ",tmp); $i = $i OFS tmp } }1' file
""BBBB:2005/10/31"",""ZZZZ:2005/10/31"",""BBBC:2005/11/01"",""BBBD:2005/11/01""
""BBBA:2005/10/31"",""BBBC:2005/11/01"",""BBBB:2005/11/01"",""ZZZZ:2005/11/01""

Regards,

	Ed.
Reply Ed 11/2/2005 12:09:13 PM

eeb4u@hotmail.com wrote:
> The file consists of several lines, one record per line where one
> record can have many dates and publications (BBBB, BBBD etc).  I showed
> only one line, sorry!
>
> this worked on all but one occurrence during my tests.  Additionally, I
> had to modify script as my input file actually contains quotes and
> reads:
>
> ""BBBB:2005/10/31"",""BBBC:2005/11/01"",""BBBD:2005/11/01""
>
> etc.
>
> thus
>
> BEGIN { ORS=RS="," ; OFS=FS=":" }
> { print $1,$2 ; if ($1 == "\"\"BBBB") print "\"\"ZZZZ",$2 }
>
> The only instance that fails is if BBBB is the last record on the line
> and isn't followed by the RS comma.  The script fails to translate this
> single record.
>
> I am almost at the end of my shift. I will continue trying to get this
> working or hopefully you may have a solution by tomorrow.
>
> Thanks again for your excellent solution.  You turned my 30 line ksh
> script that took about 2 hours to run (on my sparc 20) into a
> lightspeed one liner!
>
> Mike D

BEGIN { FS=OFS="\"\",\"\"" }
{ gsub( /^""|""$/, "" )
  for (i=1;i<=NF;i++)
    if ( $i ~ /^BBBB/ )
      $i = $i FS "ZZZZ" substr($i,5)
  print "\"\"" $0 "\"\""
}

Reply William 11/2/2005 7:49:49 PM

On 1 Nov 2005 22:24:12 -0800, eeb4u@hotmail.com wrote:

>a quick fix is to append each line with a comma using sed, run the awk
>one-liner and strip off the comma afterwards.
>
>sed 's/"$/",/g' inputfile > appended.file
>awk -f scriptfile appended.file > awked.file
>sed 's/",$/"/g' awked.file > datemod.file
>
>I am sure there is a much more elegant, efficient way to accomplish
>this!


Hi Mike,

You could just stay with one invocation of 'sed';

sed 's|BBBB:\([^,]*\)\(,*\)|BBBB:\1,ZZZZ:\1\2|g' datemod.file

Provided of course the fields are all in the format given
in the original post ( "BBBB:" can't be embedded in the data).
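
For the doubled-quote format shown later in the thread, a variant along the following lines might work. It is a sketch, not tested against the real data: it assumes every entry looks like ""TAG:date"" with no commas inside, and the function name dup_bbbb_quoted is illustrative.

```shell
# Variant for entries wrapped in doubled quotes, e.g. ""BBBB:2005/10/31"".
# & is the whole match; \1 is the date plus its closing quotes.
# The function name is illustrative.
dup_bbbb_quoted() {
  sed 's|""BBBB:\([^,]*\)|&,""ZZZZ:\1|g'
}

printf '%s\n' '""BBBB:2005/10/31"",""BBBC:2005/11/01"",""BBBB:2005/11/01""' |
  dup_bbbb_quoted
# ""BBBB:2005/10/31"",""ZZZZ:2005/10/31"",""BBBC:2005/11/01"",""BBBB:2005/11/01"",""ZZZZ:2005/11/01""
```

Using & for the matched entry means the trailing-comma group is not needed, and a match at the end of a line is handled the same way as one in the middle.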

byefornow
laura



>
>thanks again
>
>Mike D
>

-- 
echo alru_aafriehdab@ittnreen.tocm |sed 's/\(.\)\(.\)/\2\1/g'
Reply run_signature_script 11/2/2005 10:11:59 PM

"laura fairhead" <run_signature_script_for_my_email@INVALID.com> wrote in 
message news:436937d8.36534764@news.btinternet.com...
> You could just stay with one invocation of 'sed';
>
> sed 's|BBBB:\([^,]*\)\(,*\)|BBBB:\1,ZZZZ:\1\2|g' datemod.file
>
> Provided of course the fields are all in the format given
> in the original post ( "BBBB:" can't be embedded in the data).
>
> byefornow
> laura

I will try the two new solutions tonight.

Thanks,


Mike 


Reply Mike 11/4/2005 3:39:38 PM
