here is a snippet from a large program that behaves the same way in a
standalone "file":
{gsub(" the | of "," ")}1
echo hair of the dog | gawk -f file
hair the dog
was expecting: hair dog
same output if {gsub(" of | the "," ")}1 is in "file"
my question is why? Am I misunderstanding the "|" operator in gsub or is
there something magic about " the |" or "| the"? same for:
Gnu Awk (gawk) 3.0, patchlevel 0
and
GNU Awk 3.1.6
windows XP pro sp2.
--
(^\pop/^)
I'm lost... I've gone to look for myself.
If I should return before I get back, keep me here.
--
pop | 5/25/2009 12:54:33 PM
pop wrote:
> here is a snippet from a large program that behaves the same way in a
> standalone "file":
> {gsub(" the | of "," ")}1
> echo hair of the dog | gawk -f file
> hair the dog
> was expecting: hair dog
>
> same output if {gsub(" of | the "," ")}1 is in "file"
>
> my question is why?
See this simpler example that might make it more apparent...
echo abc | awk '{gsub(/ab|bc/,"X")}1'
In your case; why do you think that the blank between "hair of"
and "the dog" should be considered in two substitutions?
Now try...
echo abbc | awk '{gsub(/ab|bc/,"X")}1'
or (with two inner spaces)...
echo "hair of  the dog" | awk '{gsub(/ the | of /," ")}1'
to see what happens. Makes sense, don't you think?
Janis
> Am I misunderstanding the "|" operator in gsub or is
> there something magic about " the |" or "| the"? same for:
> Gnu Awk (gawk) 3.0, patchlevel 0
> and
> GNU Awk 3.1.6
>
> windows XP pro sp2.
Janis | 5/25/2009 1:07:24 PM
On Monday 25 May 2009 14:54, pop wrote:
> here is a snippet from a large program that behaves the same way in a
> standalone "file":
> {gsub(" the | of "," ")}1
> echo hair of the dog | gawk -f file
> hair the dog
> was expecting: hair dog
The first match encountered in the string is " of ", which is replaced
with " ", leaving
hair the dog
gsub then continues to scan the string starting from the "t" in "the" (which
is the character after the end of the previous match), and no further match
is found. So the end result you see is correct.
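A small sketch that makes the scan visible. It is not from the thread, and show_matches is a hypothetical helper name. It walks the string with match() and substr(), resuming right after each match the way the explanation above describes, and prints every match it finds in brackets. Only POSIX awk is assumed.

```shell
# Print each non-overlapping match of the regex, left to right, in brackets.
show_matches() {
    awk '{
        s = $0
        while (match(s, / the | of /)) {
            printf "[%s]", substr(s, RSTART, RLENGTH)
            s = substr(s, RSTART + RLENGTH)   # resume after the match, as gsub does
        }
        print ""
    }'
}

echo 'hair of the dog'  | show_matches    # one space between "of" and "the": [ of ]
echo 'hair of  the dog' | show_matches    # two spaces there: [ of ][ the ]
```

With a single space between "of" and "the", that space is consumed by " of ", so " the " has no leading blank left to match. With two spaces, both alternatives find their own blank.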
pk | 5/25/2009 1:19:35 PM
pk said the following on 5/25/2009 8:19 AM:
> On Monday 25 May 2009 14:54, pop wrote:
>
>> here is a snippet from a large program that behaves the same way in a
>> standalone "file":
>> {gsub(" the | of "," ")}1
>> echo hair of the dog | gawk -f file
>> hair the dog
>> was expecting: hair dog
>
> The first match encountered in the string is " of ", which is replaced
> with " ", leaving
>
> hair the dog
>
> gsub then continues to scan the string starting from the "t" in "the" (which
> is the character after the end of the previous match), and no further match
> is found. So the end result you see is correct.
>
OK - thanks; that makes sense now but it was sure unexpected. Solved by
using separate gsubs. Sure do encounter unexpected situations in the
programming world :)
pop | 5/25/2009 1:25:30 PM
Janis Papanagnou said the following on 5/25/2009 8:07 AM:
> pop wrote:
>> here is a snippet from a large program that behaves the same way in a
>> standalone "file":
>> {gsub(" the | of "," ")}1
>> echo hair of the dog | gawk -f file
>> hair the dog
>> was expecting: hair dog
>>
>> same output if {gsub(" of | the "," ")}1 is in "file"
>>
>> my question is why?
>
> See this simpler example that might make it more apparent...
>
> echo abc | awk '{gsub(/ab|bc/,"X")}1'
>
> In your case; why do you think that the blank between "hair of"
> and "the dog" should be considered in two substitutions?
>
> Now try...
>
> echo abbc | awk '{gsub(/ab|bc/,"X")}1'
>
> or (with two inner spaces)...
>
> echo "hair of  the dog" | awk '{gsub(/ the | of /," ")}1'
>
> to see what happens. Makes sense, don't you think?
>
> Janis
>
>> Am I misunderstanding the "|" operator in gsub or is there something
>> magic about " the |" or "| the"? same for:
>> Gnu Awk (gawk) 3.0, patchlevel 0
>> and
>> GNU Awk 3.1.6
>>
>> windows XP pro sp2.
OK - thanks; that makes sense now but it was sure unexpected. Solved by
using separate gsubs. Sure do encounter unexpected situations in the
programming world :)
pop | 5/25/2009 1:26:00 PM
On May 25, 8:25 am, pop <p_...@hotmail.com> wrote:
> pk said the following on 5/25/2009 8:19 AM:
>
>
>
> > On Monday 25 May 2009 14:54, pop wrote:
>
> >> here is a snippet from a large program that behaves the same way in a
> >> standalone "file":
> >> {gsub(" the | of "," ")}1
> >> echo hair of the dog | gawk -f file
> >> hair the dog
> >> was expecting: hair  dog
>
> > The first match encountered in the string is " of ", which is replaced
> > with " ", leaving
>
> > hair the dog
>
> > gsub then continues to scan the string starting from the "t" in "the" (which
> > is the character after the end of the previous match), and no further match
> > is found. So the end result you see is correct.
>
> OK - thanks; that makes sense now but it was sure unexpected. Solved by
> using separate gsubs. Sure do encounter unexpected situations in the
> programming world :)
There's almost certainly no need for coding separate calls to gsub().
If you tell us IN WORDS what it is you're trying to do, we could
probably tell you how to do it by writing one call to gsub(). For
example, if you don't really want to add spurious white-space between
words, then one of these might be what you really want:
$ echo hair of the dog | awk '{while (gsub(/(the|of) /," "));}1'
hair dog
$ echo hair of the dog | awk '{gsub(/ (the|of)\>/,"")}1'
hair dog
That last one is, I believe, gawk specific.
Regards,
Ed.
Ed | 5/25/2009 2:55:32 PM
In article <gve50c$k5q$1@aioe.org>, pk <pk@pk.invalid> wrote:
>On Monday 25 May 2009 14:54, pop wrote:
>
>> here is a snippet from a large program that behaves the same way in a
>> standalone "file":
>> {gsub(" the | of "," ")}1
>> echo hair of the dog | gawk -f file
>> hair the dog
>> was expecting: hair dog
>
>The first match encountered in the string is " of ", which is replaced
>with " ", leaving
>
>hair the dog
>
>gsub then continues to scan the string starting from the "t" in "the" (which
>is the character after the end of the previous match), and no further match
>is found. So the end result you see is correct.
Note that, sometimes, what you want is:
while (sub(...));
instead of gsub(). I use this from time to time; it just keeps doing
the sub until it fails.
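To see the difference side by side, here is a minimal sketch. The function names one_pass and until_stable are made up for illustration, and the regex is the one from later in the thread rather than anything in this post. Note that the loop only terminates when the replacement cannot keep creating new matches; here every substitution shortens the line, so it does.

```shell
# One gsub pass: the space consumed by " of " is not available to " the ".
one_pass() {
    awk '{ gsub(/ (the|of) /, " "); print }'
}

# Repeat sub() until it returns 0, i.e. until nothing is left to replace.
until_stable() {
    awk '{
        while (sub(/ (the|of) /, " "))
            ;
        print
    }'
}

echo 'hair of the dog' | one_pass       # prints: hair the dog
echo 'hair of the dog' | until_stable   # prints: hair dog
```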
gazelle | 5/25/2009 3:07:22 PM
Ed Morton said the following on 5/25/2009 9:55 AM:
> On May 25, 8:25 am, pop <p_...@hotmail.com> wrote:
>> pk said the following on 5/25/2009 8:19 AM:
>>
>>
>>
<snip>
>> OK - thanks; that makes sense now but it was sure unexpected. Solved by
>> using separate gsubs. Sure do encounter unexpected situations in the
>> programming world :)
>
> There's almost certainly no need for coding separate calls to gsub().
> If you tell us IN WORDS what it is you're trying to do, we could
> probably tell you how to do it by writing one call to gsub(). For
> example, if you don't really want to add spurious white-space between
> words, then one of these might be what you really want:
>
> $ echo hair of the dog | awk '{while (gsub(/(the|of) /," "));}1'
> hair dog
>
> $ echo hair of the dog | awk '{gsub(/ (the|of)\>/,"")}1'
> hair dog
>
> That last one is, I believe, gawk specific.
>
> Regards,
>
> Ed.
Thanks - actually I used:
echo hair of the dog|awk '{while(gsub(" (the|of) "," "));}1'
hair dog
in the final program to accomplish what I needed. As far as an
explanation; I was removing all "connectors/articles,etc." such as
"the,a,an,and,or,in,if,..." from title of books,movies,plays,etc. just
leaving the significant keywords for a personal project I am working on.
pop | 5/25/2009 3:44:55 PM
On May 25, 10:44 am, pop <p_...@hotmail.com> wrote:
> Ed Morton said the following on 5/25/2009 9:55 AM:
>
>
>
> > On May 25, 8:25 am, pop <p_...@hotmail.com> wrote:
> >> pk said the following on 5/25/2009 8:19 AM:
>
> <snip>
> >> OK - thanks; that makes sense now but it was sure unexpected. Solved by
> >> using separate gsubs. Sure do encounter unexpected situations in the
> >> programming world :)
>
> > There's almost certainly no need for coding separate calls to gsub().
> > If you tell us IN WORDS what it is you're trying to do, we could
> > probably tell you how to do it by writing one call to gsub(). For
> > example, if you don't really want to add spurious white-space between
> > words, then one of these might be what you really want:
>
> > $ echo hair of the dog | awk '{while (gsub(/(the|of) /," "));}1'
> > hair dog
>
> > $ echo hair of the dog | awk '{gsub(/ (the|of)\>/,"")}1'
> > hair dog
>
> > That last one is, I believe, gawk specific.
>
> > Regards,
>
> >     Ed.
>
> Thanks - actually I used:
> echo hair of the dog|awk '{while(gsub(" (the|of) "," "));}1'
> hair dog
> in the final program to accomplish what I needed. As far as an
> explanation; I was removing all "connectors/articles,etc." such as
> "the,a,an,and,or,in,if,..." from title of books,movies,plays,etc. just
> leaving the significant keywords for a personal project I am working on.
>
The way you're approaching it won't work, then, as it'll leave in
words suffixed with punctuation marks, words that contain
capitalisation, etc. You could do this instead with GNU awk:
$ echo "The ghost walked in!" | awk 'BEGIN{IGNORECASE=1}
{gsub(/\<(the|a|an|and|or|in|if)\>/,"")}1'
Regards,
Ed.
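For an awk without IGNORECASE or the \< and \> word-boundary operators, one possible alternative is to filter field by field. This is only a sketch under stated assumptions: strip_stopwords is a hypothetical name, the stopword list is just the words mentioned in this thread, and a word is compared after lowercasing it and dropping everything except letters and digits. Unlike the gsub() version above, it drops punctuation attached to a removed word and collapses runs of whitespace to single spaces.

```shell
strip_stopwords() {
    awk '
    BEGIN {
        # Hypothetical stopword list taken from the words named in the thread.
        n = split("the a an and or in if of", w, " ")
        for (i = 1; i <= n; i++)
            stop[w[i]] = 1
    }
    {
        out = ""
        for (i = 1; i <= NF; i++) {
            key = tolower($i)
            gsub(/[^a-z0-9]/, "", key)      # compare without punctuation
            if (key in stop)
                continue                    # skip stopwords entirely
            out = (out == "") ? $i : (out " " $i)
        }
        print out
    }'
}

echo 'The ghost walked in!' | strip_stopwords   # prints: ghost walked
```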
Ed | 5/25/2009 4:53:26 PM
Ed Morton said the following on 5/25/2009 11:53 AM:
> On May 25, 10:44 am, pop <p_...@hotmail.com> wrote:
>> Ed Morton said the following on 5/25/2009 9:55 AM:
>>
>>
>>
>>> On May 25, 8:25 am, pop <p_...@hotmail.com> wrote:
>>>> pk said the following on 5/25/2009 8:19 AM:
>> <snip>
>>>> OK - thanks; that makes sense now but it was sure unexpected. Solved by
>>>> using separate gsubs. Sure do encounter unexpected situations in the
>>>> programming world :)
>>> There's almost certainly no need for coding separate calls to gsub().
>>> If you tell us IN WORDS what it is you're trying to do, we could
>>> probably tell you how to do it by writing one call to gsub(). For
>>> example, if you don't really want to add spurious white-space between
>>> words, then one of these might be what you really want:
>>> $ echo hair of the dog | awk '{while (gsub(/(the|of) /," "));}1'
>>> hair dog
>>> $ echo hair of the dog | awk '{gsub(/ (the|of)\>/,"")}1'
>>> hair dog
>>> That last one is, I believe, gawk specific.
>>> Regards,
>>> Ed.
>> Thanks - actually I used:
>> echo hair of the dog|awk '{while(gsub(" (the|of) "," "));}1'
>> hair dog
>> in the final program to accomplish what I needed. As far as an
>> explanation; I was removing all "connectors/articles,etc." such as
>> "the,a,an,and,or,in,if,..." from title of books,movies,plays,etc. just
>> leaving the significant keywords for a personal project I am working on.
>>
>
> The way you're approaching it won't work, then, as it'll leave in
> words suffixed with punctuation marks, words that contain
> capitalisation, etc. You could do this instead with GNU awk:
>
> $ echo "The ghost walked in!" | awk 'BEGIN{IGNORECASE=1}
> {gsub(/\<(the|a|an|and|or|in|if)\>/,"")}1'
>
> Regards,
>
> Ed.
>
Good idea! Thanks, I hadn't thought of that... I'll use it.
pop | 5/25/2009 5:26:50 PM