Hi,
I am using gawk for text parsing and have a simple problem. I have
a string that looks like this:
"Found 3 log items"
What I want to do is extract only the "3". The thing is that the
log might also look like:
"Found 35 log items"
or
"Found 143 log items"
(never more than 3 digits, though)
So I don't know the length of the number. This is what I tried:
In gawk I use the function match, which takes a string and a regex as
input, but I simply can't make it work. match returns the column where
the first match is found and sets the length of the matched substring.
This is the call I used:
where = match($0, "[0-9]{1,3}")
But this returns 0 and -1, which means nothing was found?
Is there some simple hello-world method to extract all numbers from a
row?
Posted by di98mase, 1/26/2009 4:42:58 PM

di98mase wrote:
> Hi,
>
> I am using gawk for text parsing. Now I have a simple problem. I have
> a string looking like this:
> "Found 3 log items"
>
> What I want to do is to extract the "3" only. The thing is that the
> log might look like:
> "Found 35 log items"
> or
> "Found 143 log items"
> (never more than 3 digits though)
If it's always the second field just access $2.
  awk '{print $2}'
Janis
Posted by Janis, 1/26/2009 4:48:02 PM

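For reference, here is a minimal sketch of the field approach suggested above. It assumes every line has the form "Found N log items" with whitespace-separated words, so the number is always the second field. The function name second_field and the sample lines are made up for illustration.

```shell
# Hypothetical helper: print the second whitespace-separated field
# of every line read from standard input.
second_field() {
    awk '{ print $2 }'
}

# Sample input modelled on the log lines in the question.
printf '%s\n' 'Found 3 log items' 'Found 35 log items' 'Found 143 log items' | second_field
```

This only works while the number stays in the second position; if the wording of the log line changes, the field index has to change with it.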
Janis Papanagnou wrote:
> If it's always the second field just access $2.
>
> awk '{print $2}'
Or if it's really only available through a string try this example...
  awk 'BEGIN{s="Found 35 log items"
             split(s,a) ; print a[2]
            }'
Or if you really think you need regexps play around with...
  awk 'BEGIN{s="Found 35 log items"
             match(s,/[0-9]+/)
             print substr(s,RSTART,RLENGTH)
            }'
Posted by Janis, 1/26/2009 5:02:13 PM

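The "0 and -1" from the original question is what match() reports when nothing matches: the return value (also stored in RSTART) is 0 and RLENGTH is -1. A small sketch that makes both cases visible, assuming input lines like those in the question; the function name first_number is hypothetical.

```shell
# Hypothetical helper: print the first run of digits on each line,
# or a "no match" note showing RSTART and RLENGTH when there is none.
first_number() {
    awk '{
        # match() returns the 1-based start of the match (same as RSTART),
        # or 0 when nothing matches; RLENGTH is then -1.
        if (match($0, /[0-9]+/))
            print substr($0, RSTART, RLENGTH)
        else
            print "no match (RSTART=" RSTART ", RLENGTH=" RLENGTH ")"
    }'
}

# One line with a number and one without, both made up for illustration.
printf '%s\n' 'Found 35 log items' 'Found no log items' | first_number
```

Testing the return value of match() before calling substr() avoids printing an empty string for lines that contain no digits.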
On 26 Jan, 18:02, Janis Papanagnou <janis_papanag...@hotmail.com>
wrote:
> Janis Papanagnou wrote:
> > If it's always the second field just access $2.
>
> >   awk '{print $2}'
>
> Or if it's really only available through a string try this example...
>
> awk 'BEGIN{s="Found 35 log items"
>            split(s,a) ; print a[2]
>           }'
>
> Or if you really think you need regexps play around with...
>
> awk 'BEGIN{s="Found 35 log items"
>            match(s,/[0-9]+/)
>            print substr(s,RSTART,RLENGTH)
>           }'
Hi Janis,
Thanks for your thorough and simple solution. You are right, why not
just use $2? I got so caught up in regex that I couldn't bear the thought
of not solving it with a regex. I simply couldn't see the wood for the
trees :)
Thanks
Posted by di98mase, 1/26/2009 8:12:54 PM

In article <0326a161-f48e-4ca7-be04-d96aa53dbb45@m4g2000vbp.googlegroups.com>,
<r.p.loui@gmail.com> wrote:
>But this does raise the regexp question of why [0-9]{1,3} is not
>supported.
Gawk currently requires --re-interval on the command line to enable
interval expressions. This will change in the development release one day.
Arnold
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL
Posted by arnold, 1/29/2009 6:25:40 PM

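A sketch of what the --re-interval flag mentioned above looks like in use, applied to the pattern from the original question. It assumes gawk is installed and on PATH; the function name first_number_interval is hypothetical. As far as I know, gawk 4.0 and later enable interval expressions by default and still accept the flag for backward compatibility, but check the manual for the version you run.

```shell
# Hypothetical helper; assumes gawk is available.
# --re-interval turns on {n,m} interval expressions in regexps.
first_number_interval() {
    gawk --re-interval '{
        if (match($0, /[0-9]{1,3}/))
            print substr($0, RSTART, RLENGTH)
    }'
}

# Run the demo only where gawk is available.
command -v gawk >/dev/null 2>&1 &&
    echo 'Found 143 log items' | first_number_interval
```

Note that {1,3} caps the match at three digits, so a longer number would be cut off after its third digit; /[0-9]+/ does not have that limit.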
On Jan 26, 11:42 am, di98mase <sebastian.madu...@gmail.com> wrote:
> In gawk I use the function match that takes a string and a regex as
> input but I simply can't make it. Match returns the column where the
> first token is found and the length of the found "substring". This is
> the regex I used:
>
>    where = match($0, "[0-9]{1,3}")
> But this returns 0 and -1 which means it is not found?
> Is there some simple hello world method to extract all numbers from a
> row?
Janis is exactly right that this screams for the use of $2 ...
In fact, you can say
  $0 = x; print $2
if you don't need $0 (often you do!) and you don't mind re-parsing all
NF fields. The split is elegant, and Janis just taught me that you
can omit the regexp...
But this does raise the regexp question of why [0-9]{1,3} is not
supported. You find various levels of support for complex regexps
throughout Unix tools. I would be happy with any of
/[0-9]+/
/[0-9][0-9]?[0-9]?/
/[0-9][^ ]*/
as a regexp that would work in your case.
Posted by r, 1/29/2009 7:42:11 PM