How to get only numbers from a row using match() and regex?

Hi,

I am using gawk for text parsing. Now I have a simple problem. I have
a string looking like this:
"Found 3 log items"

What I want to do is to extract the "3" only. The thing is that the
log might look like:
"Found 35 log items"
or
"Found 143 log items"
(never more than 3 digits though)

So I don't know the length of the number. But this is what I tried:

In gawk I use the function match, which takes a string and a regex as
input, but I simply can't make it work. match gives the position where
the first match is found (RSTART) and the length of the matched
substring (RLENGTH). This is the regex I used:

   where = match($0, "[0-9]{1,3}")
But this returns 0 and -1, which means it is not found?
Is there some simple hello-world method to extract all numbers from a
row?
Reply di98mase 1/26/2009 4:42:58 PM

di98mase wrote:
> Hi,
> 
> I am using gawk for text parsing. Now I have a simple problem. I have
> a string looking like this:
> "Found 3 log items"
> 
> What I want to do is to extract the "3" only. The thing is that the
> log might look like:
> "Found 35 log items"
> or
> "Found 143 log items"
> (never more than 3 digits though)

If it's always the second field just access $2.

   awk '{print $2}'


Janis

Reply Janis 1/26/2009 4:48:02 PM


Janis Papanagnou wrote:
> di98mase wrote:
> 
>> Hi,
>>
>> I am using gawk for text parsing. Now I have a simple problem. I have
>> a string looking like this:
>> "Found 3 log items"
>>
>> What I want to do is to extract the "3" only. The thing is that the
>> log might look like:
>> "Found 35 log items"
>> or
>> "Found 143 log items"
>> (never more than 3 digits though)
> 
> 
> If it's always the second field just access $2.
> 
>   awk '{print $2}'

Or if it's really only available through a string try this example...

awk 'BEGIN{s="Found 35 log items"
            split(s,a) ; print a[2]
           }'

Or if you really think you need regexps play around with...

awk 'BEGIN{s="Found 35 log items"
            match(s,/[0-9]+/)
            print substr(s,RSTART,RLENGTH)
           }'
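
To pull every number out of a row, not just the first one, the match()
call can be repeated on the remainder of the string. A minimal sketch,
assuming a POSIX awk; the function name all_numbers is made up for
illustration and is not part of awk:

```shell
# all_numbers is a hypothetical helper name, not part of awk.
all_numbers() {
    printf '%s\n' "$1" | awk '{
        s = $0
        # Find the next run of digits, print it, then continue
        # with whatever follows the match.
        while (match(s, /[0-9]+/)) {
            print substr(s, RSTART, RLENGTH)
            s = substr(s, RSTART + RLENGTH)
        }
    }'
}

all_numbers "Found 35 log items, 2 errors"
```

The loop ends when match() returns 0, that is, when no digits are left
in the remaining string.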


Reply Janis 1/26/2009 5:02:13 PM

On 26 Jan, 18:02, Janis Papanagnou <janis_papanag...@hotmail.com>
wrote:
> Janis Papanagnou wrote:
> > di98mase wrote:
>
> >> Hi,
>
> >> I am using gawk for text parsing. Now I have a simple problem. I have
> >> a string looking like this:
> >> "Found 3 log items"
>
> >> What I want to do is to extract the "3" only. The thing is that the
> >> log might look like:
> >> "Found 35 log items"
> >> or
> >> "Found 143 log items"
> >> (never more than 3 digits though)
>
> > If it's always the second field just access $2.
>
> >   awk '{print $2}'
>
> Or if it's really only available through a string try this example...
>
> awk 'BEGIN{s="Found 35 log items"
>             split(s,a) ; print a[2]
>            }'
>
> Or if you really think you need regexps play around with...
>
> awk 'BEGIN{s="Found 35 log items"
>             match(s,/[0-9]+/)
>             print substr(s,RSTART,RLENGTH)
>            }'
>

Hi Janis,

Thanks for your elaborate and simple solution. You are right, why not
just use $2? I got so into regex that I couldn't bear the thought
of not solving it with a regex. I simply couldn't see the wood for
the trees :)

Thanks
Reply di98mase 1/26/2009 8:12:54 PM

In article <0326a161-f48e-4ca7-be04-d96aa53dbb45@m4g2000vbp.googlegroups.com>,
 <r.p.loui@gmail.com> wrote:
>But this does raise the regexp question of why [0-9]{1,3} is not
>supported.

Gawk currently requires --re-interval on the command line to enable
interval expressions. This will change in the development release one day.
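
For illustration, the option mentioned above could be used roughly like
this. This is only a sketch: it assumes gawk is installed, and the
function name first_number_gawk is invented for the example. Later gawk
releases are understood to enable interval expressions by default and to
keep the option for compatibility.

```shell
# first_number_gawk is a hypothetical helper name.
# --re-interval asks gawk to honour {n,m} interval expressions.
first_number_gawk() {
    printf '%s\n' "$1" | gawk --re-interval '{
        # Print the first run of one to three digits, if any.
        if (match($0, /[0-9]{1,3}/))
            print substr($0, RSTART, RLENGTH)
    }'
}

# Example call:
#   first_number_gawk "Found 143 log items"
```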

Arnold
-- 
Aharon (Arnold) Robbins 				arnold AT skeeve DOT com
P.O. Box 354		Home Phone: +972  8 979-0381
Nof Ayalon		Cell Phone: +972 50  729-7545
D.N. Shimshon 99785	ISRAEL
Reply arnold 1/29/2009 6:25:40 PM

On Jan 26, 11:42 am, di98mase <sebastian.madu...@gmail.com> wrote:
> Hi,
>
> I am using gawk for text parsing. Now I have a simple problem. I have
> a string looking like this:
> "Found 3 log items"
>
> What I want to do is to extract the "3" only. The thing is that the
> log might look like:
> "Found 35 log items"
> or
> "Found 143 log items"
> (never more than 3 digits though)
>
> So I don't know the length of the number. But this is what I tried:
>
> In gawk I use the function match, which takes a string and a regex as
> input, but I simply can't make it work. match gives the position where
> the first match is found (RSTART) and the length of the matched
> substring (RLENGTH). This is the regex I used:
>
>    where = match($0, "[0-9]{1,3}")
> But this returns 0 and -1, which means it is not found?
> Is there some simple hello-world method to extract all numbers from a
> row?

Janis is exactly right that this screams for the use of $2 ...

In fact, you can say

     $0 = x; print $2

if you don't need $0 (often you do!) and you don't mind re-parsing all
NF fields.  The split is elegant, and Janis just taught me that you
can omit the regexp...
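
The $0 assignment can be tried out in isolation. A small sketch, where
the function name second_field and the variable x are only stand-ins for
whatever holds the string in a real script:

```shell
# second_field is a hypothetical helper name; x stands in for the
# variable that holds the log line.
second_field() {
    awk -v x="$1" 'BEGIN {
        $0 = x      # assigning to $0 re-splits the record and resets NF
        print $2    # the second whitespace-separated field
    }'
}

second_field "Found 35 log items"
```

The original record is overwritten by the assignment, which is why this
only suits cases where $0 is no longer needed.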

But this does raise the regexp question of why [0-9]{1,3} is not
supported.  You find various levels of support for complex regexps
throughout Unix tools.  I would be happy with any of

     /[0-9]+/
     /[0-9][0-9]?[0-9]?/
     /[0-9][^ ]*/

as a regexp that would work in your case.
Reply r 1/29/2009 7:42:11 PM
