I'm running gawk.
I have an ASCII file with the following format:
start: record 1
head1: fjoijefowijfwoijf
head2: fiwjowiefojwf
head3: fwofjwfoiwfoj
headx: woifjowjwioef
end
name: blb abl bla blb fjie j
address: fwoijwe fwlkjwefj
phone: wfjowejf wfjw ofi
cell: ifejw foiw jfowi jeoi
value: fi woiw fowiej owefj
start: record 2 etc...
here's my gawk code:
BEGIN { RS="(start.*end)*" }
{
print "---\n"$0"\n===";
}
for those not familiar w/ gawk, you can use a full regex for the
record separator.
I have a long RS because 1) I don't care about that data, & 2) there
is a variable amount of data there.
the problem I'm having is this:
the regex, as I'm using it, matches the "start" at the beginning of the
FILE, and the "end" at the END of the FILE.
I therefore only get 2 records printed.
I want to see all the records - I need my regex to match EVERY
occurrence of the start...end "string".
any ideas?
tia - Bob

Posted by Bob, 3/2/2004 10:34:45 PM

["Followup-To:" header set to comp.lang.awk.]
On Tue, 02 Mar 2004 16:34:45 -0600, Bob
<nospam_nsh@starnetwx.net> wrote:
>
> here's my gawk code:
> BEGIN { RS="(start.*end)*" }
> {
> print "---\n"$0"\n===";
> }
>
>
> for those not familiar w/ gawk, you can use a full regex for the
> record separator.
>
> I have a long RS because 1) I don't care about that data, & 2) there
> is a variable amount of data there.
>
> the problem I'm having is this:
> the regex, as I'm using it, matches the "start" at the beginning of the
> FILE, and the "end" at the END of the FILE.
>
> I therefore only get 2 records printed.
>
> I want to see all the records - I need my regex to match EVERY
> occurrence of the start...end "string".
>
RS="start[^e]*end"
--
Incrsease your earoning poswer and gaerner profwessional resspect.
Get the Un1iversity Dewgree you have already earned.
[from the prestigious, non-accredited University of Spam!]

Posted by Bill, 3/3/2004 7:53:30 AM

On Wed, 3 Mar 2004 02:53:30 -0500, Bill Marcum
<bmarcum@iglou.com.urgent> wrote:
>["Followup-To:" header set to comp.lang.awk.]
>On Tue, 02 Mar 2004 16:34:45 -0600, Bob
> <nospam_nsh@starnetwx.net> wrote:
>>
>> here's my gawk code:
>> BEGIN { RS="(start.*end)*" }
>> {
>> print "---\n"$0"\n===";
>> }
>>
>> the problem I'm having is this:
>> the regex, as I'm using it, matches the "start" at the beginning of the
>> FILE, and the "end" at the END of the FILE.
>>
>> I therefore only get 2 records printed.
>>
>> I want to see all the records - I need my regex to match EVERY
>> occurrence of the start...end "string".
>>
>RS="start[^e]*end"
Bill - Tera-thanks!
that did the trick. Another question, though: as I was playing around
with other permutations of your RE, trying to understand why your RE
worked and mine didn't, I discovered another strange thing.
I THOUGHT that:
"start.*end" == "start[.]*end"
I found, in fact, that these two RS regexes produced vastly different
results. I suppose that to understand why my original RE didn't work
and yours did, I should re-read the order of precedence for gawk; but
in my last example, I can't imagine why the 2 REs shouldn't be the
same.
can you shed any light?
tx again ia!!!
Bob

Posted by Bob, 3/3/2004 12:26:33 PM

On Wed, 03 Mar 2004 06:26:33 -0600, Bob <nospam_nsh@starnetwx.net>
wrote:
>>RS="start[^e]*end"
>
>Bill - Tera-thanks!
>
>that did the trick. Another question, though: as I was playing around
>with other permutations of your RE, trying to understand why your RE
>worked and mine didn't, I discovered another strange thing.
>
>I THOUGHT that:
>"start.*end" == "start[.]*end"
OH MY GOD - what the hell was I thinking!!!
sorry to bother - I just realized my brain fart..... ;-)

Posted by Bob, 3/3/2004 12:58:41 PM

On Wed, 03 Mar 2004 06:26:33 -0600, Bob
<nospam_nsh@starnetwx.net> wrote:
>>> I want to see all the records - I need my regex to match EVERY
>>> occurrence of the start...end "string".
>>>
>>RS="start[^e]*end"
>
> Bill - Tera-thanks!
>
> that did the trick. Another question, though: as I was playing around
> with other permutations of your RE, trying to understand why your RE
> worked and mine didn't, I discovered another strange thing.
>
> I THOUGHT that:
> "start.*end" == "start[.]*end"
>
> I found, in fact, that these two RS regexes produced vastly different
> results. I suppose that to understand why my original RE didn't work
> and yours did, I should re-read the order of precedence for gawk; but
> in my last example, I can't imagine why the 2 REs shouldn't be the
> same.
>
> can you shed any light?
>
Regular expressions like "a.*b" are greedy; as the expression is
evaluated from left to right, each "*" matches the longest possible
string.
Actually, my "start[^e]*end" might not work if the letter "e" appears
between "start" and "end". A better solution might be
BEGIN{RS="end"}
{sub(/start.*/,""); print}

Posted by Bill, 3/7/2004 5:03:41 AM