confused with regular expression


Sorry, I am not sure what is wrong.

I have a file with two kinds of lines:

5673058-
AE-848/14271200-

On the command line I typed
awk '/^[0-9]/ { print }' file
to get "5673058-", but it printed nothing.
So I tried
awk --re-interval '/[0-9]{7}/ { print}' file
Strangely, it didn't print "5673058-"; instead it printed "AE-848/14271200-".

My gawk version is 3.1.4.

Thanks all

Jui-Hua
Reply juihuahsieh 11/3/2004 4:00:08 AM


moggces wrote:
> Sorry
> I am not sure what was wrong. 
> 
> I have a file which has two pattern
> 
> 5673058-
> AE-848/14271200-
> 
> in command line I typed:
> awk '/^[0-9]/ { print }' file
> , tried to get "5673058-" but it printed nothing
> so I tried to type
> awk --re-interval '/[0-9]{7}/ { print}' file
> strangely, it didn't print "5673058-" but instead it printed "AE-848/14271200-"
> 
> my gawk version is 3.1.4

Mine is 3.1.3 and it works as expected:

$ printf "5673058-\nAE-848/14271200-\n" | awk '/^[0-9]/ { print }'
5673058-

$ printf "5673058-\nAE-848/14271200-\n" | awk --re-interval '/[0-9]{7}/ { print }'
5673058-
AE-848/14271200-

$ awk --version
GNU Awk 3.1.3

So, at least you know it isn't your RE(s). If grep or sed don't find 
them, it's your file that's got control chars or something, otherwise 
it's your gawk version.
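
A rough sketch of that cross-check, assuming the data is saved in a scratch file. The name sample.txt is a placeholder and not from the original post; the patterns are the same ones used with awk above.

```shell
# Recreate the two sample lines in a scratch file (sample.txt is a placeholder name).
printf '5673058-\nAE-848/14271200-\n' > sample.txt

# Lines that start with a digit, using grep and sed instead of awk.
grep '^[0-9]' sample.txt
sed -n '/^[0-9]/p' sample.txt

# Lines that contain seven consecutive digits (grep -E enables interval expressions).
grep -E '[0-9]{7}' sample.txt
```

If grep and sed find the lines but awk does not, the file itself is probably fine and the difference lies in how awk is being run.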

Regards,

	Ed.
Reply Ed 11/3/2004 5:53:29 AM


In article <BJ6dnfoEnfzB7xXcRVn-gw@comcast.com>,
Ed Morton  <morton@lsupcaemnt.com> wrote:
>So, at least you know it isn't your RE(s). If grep or sed don't find 
>them, it's your file that's got control chars or something, otherwise 
>it's your gawk version.
>
>Regards,
>
>	Ed.

Actually, I'm 99% certain it's not the gawk version, but the locale; the
original poster is in China.  It's probably better to use

	/^[[:digit:]]/

and

	/^[[:digit:]]{7}/

instead of [0-9].  I suspect that if the original poster uses

	export LC_ALL=C

before running gawk, things will work as expected.
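
A minimal sketch of trying this for a single command, without exporting anything into the shell session. The file name sample.txt is a placeholder, not from the original post. The same LC_ALL=C prefix can be put in front of the --re-interval command; note that in gawk 3.1.x the {7} form still needs --re-interval.

```shell
# Recreate the two sample lines (sample.txt is a placeholder name).
printf '5673058-\nAE-848/14271200-\n' > sample.txt

# Set the locale for this one command only; the shell's own settings are untouched.
LC_ALL=C awk '/^[0-9]/ { print }' sample.txt

# Same test with the POSIX character class suggested above.
LC_ALL=C awk '/^[[:digit:]]/ { print }' sample.txt
```

Running the plain `locale` command first shows which settings are currently in effect, which helps confirm whether the locale is a plausible cause.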

Arnold
-- 
Aharon (Arnold) Robbins --- Pioneer Consulting Ltd.	arnold AT skeeve DOT com
P.O. Box 354		Home Phone: +972  8 979-0381	Fax: +1 206 350 8765
Nof Ayalon		Cell Phone: +972 50  729-7545
D.N. Shimshon 99785	ISRAEL
Reply arnold 11/3/2004 9:09:28 AM

arnold@skeeve.com (Aharon Robbins) wrote in message news:<4188a048@news.012.net.il>...
> Actually, I'm 99% certain it's not the gawk version, but the locale; the
> original poster is in China.  It's probably better to use
> 
> 	/^[[:digit:]]/
> 
> and
> 
> 	/^[[:digit:]]{7}/
> 
> instead of [0-9].  I suspect that if the original poster uses
> 
> 	export LC_ALL=C
> 
> before running gawk, things will work as expected.
> 
> Arnold



It didn't work when I used /^[[:digit:]]/ without "export LC_ALL=C",

but both /^[0-9]/ and /^[[:digit:]]/ worked after "export LC_ALL=C".

Thanks very much for all the responses.

Jui-Hua 
from Taiwan
Reply juihuahsieh 11/5/2004 1:40:23 AM

On 2 Nov 2004 20:00:08 -0800, moggces 
  <juihuahsieh@nhri.org.tw> wrote:
> Sorry
> I am not sure what was wrong. 
>
> I have a file which has two pattern
>
> 5673058-
> AE-848/14271200-
>
> in command line I typed: awk '/^[0-9]/ { print }' file , tried to get
> "5673058-" but it printed nothing so I tried to type awk --re-interval
> '/[0-9]{7}/ { print}' file strangely, it didn't print "5673058-" but
> instead it printed "AE-848/14271200-"
>
Perhaps the file contains a CR character, so that it prints
"AE-848/14271200-" on top of "5673058-".
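
One way to test this idea, as a sketch only. It assumes one possible shape of such a file, with a bare CR where a newline was expected; the names cr.txt and clean.txt are placeholders.

```shell
# Build a scratch file with a bare CR between the two records (an assumed layout).
printf '5673058-\rAE-848/14271200-\n' > cr.txt

# od -c makes control characters visible; a carriage return shows up as \r.
od -c cr.txt

# Turn each CR into a newline in a clean copy, then match against that copy.
tr '\r' '\n' < cr.txt > clean.txt
awk '/^[0-9]/ { print }' clean.txt
```

If od -c shows no \r in the real file, stray carriage returns can be ruled out.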

-- 
Of course power tools and alcohol don't mix.  Everyone knows power
tools aren't soluble in alcohol...
		-- Crazy Nigel
Reply Bill 11/5/2004 5:18:57 AM
