Hi,

I'm trying to write a script that first reads a list of words and then,
for each file in the argument list, checks all lines for occurrences of
those words and writes a file listing the words that were present.

Pretty simple, eh?

Actually not, if some words can be substrings of others and you're
using mawk. I wrote a script that I thought should work (and it does
work on my test case in gawk), but mawk seems not to recognize "\<",
"\>", or "\y".

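To make the substring problem concrete, here is a minimal sketch that tests one line two ways: a bare "$0 ~ name", and a whole-word match in which "\<" and "\>" are emulated with the bracket expressions "(^|[^A-Za-z0-9_])" and "([^A-Za-z0-9_]|$)". The helper name match_demo is hypothetical, and the sketch assumes identifiers consist only of letters, digits, and underscores.

```sh
# match_demo LINE NAME   (hypothetical helper, for illustration only)
# Prints whether NAME occurs in LINE as a bare substring and as a whole
# word. The whole-word test avoids \<, \> and \y, so it should also work
# in mawk. Word characters are assumed to be [A-Za-z0-9_].
match_demo() {
    printf '%s\n' "$1" | awk -v name="$2" '{
        bare = ($0 ~ name)
        word = ($0 ~ ("(^|[^A-Za-z0-9_])" name "([^A-Za-z0-9_]|$)"))
        print "bare=" bare, "word=" word
    }'
}

match_demo 'outs = sum(splat)/honker' honk     # bare=1 word=0
match_demo 'outs = sum(splat)/honker' honker   # bare=1 word=1
```

The bare test reports "honk" as present in "honker"; the emulated boundary does not, because the character after "honk" is a word character.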
A simplified version of my script, "findvars", looks like:
-------------
#!/usr/bin/awk -f
#
# findvars
#
BEGIN { IGNORECASE = 1; namesfile = ""; }   # IGNORECASE is gawk-only
NR==1 {
    if (length(namesfile) == 0) exit   # actually with an error message
    # One word-bounded regex per name (\< and \> are GNU regex operators)
    while ((getline nraw < namesfile) > 0)
        { varre[nraw] = "\\<" nraw "\\>"; ++nn }
}
#
NF==0 { next }
/^[Cc]/ { next }   # guess what kind of files I'm processing
/^ *[!]/ { next }
#
{
    sub(/[!].*$/, "")   # strip trailing comments
    for (nraw in varre)
        if ($0 ~ varre[nraw]) { used[nraw] = 1; ++nm }
}
#
END {
    if (nm)
    { usesfile = "used-from-" namesfile;
        for (nraw in used)
        { if (np++)
              printf ",%s", nraw >> usesfile;
          else
              printf ", only: %s", nraw > usesfile;
        }
        printf "\n" >> usesfile;
    }
}
-------------
A typical file of names would be "barfmodvars" looking like this:
--------------
xbarf
honk
honker
splat
xbarf1
xbarf2
barf
--------------
(these would be variables declared in a Fortran-90 module)

A typical file to process would be "sloppycode.f", with lines like this:
--------------
      subroutine sloppycode(in1,in2,outs)
      use barfmod
      implicit real(a-h,o-z)

      splat(:) = in1*xbarf1(:)+in2*xbarf2(:)
      outs = sum(splat)/honker
      return
      end
--------------

Then we apply the script with a command, e.g.

~> ./findvars namesfile=barfmodvars sloppycode.f

...and hope to get a file, "used-from-barfmodvars", with a ", only: ..."
clause that we can paste into the source file.

This works fine with gawk (at home), but the computers at work have mawk.
On mawk, the word boundary metacharacters apparently don't work, and the
result is no matches for any names.

The man-page for mawk says it uses "extended regular expressions as with
egrep (1)." However, the local egrep recognizes word boundary
metacharacters just fine.

Am I missing something here, or is there a work-around for this? Or
should I just download and install gawk?

Regards,
Fred

fred, 5/8/2009 1:17:52 AM

Mawk doesn't support the word matching operators. I think it doesn't support
IGNORECASE either. If you need those features, just download and install gawk.
Arnold
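One possible portable work-around, sketched here under stated assumptions: spell the word boundary out with bracket expressions, "(^|[^a-z0-9_])NAME([^a-z0-9_]|$)", and replace IGNORECASE by lower-casing both the names and each input line with tolower(). The sketch assumes identifiers contain only letters, digits, and underscores, wraps awk in a hypothetical shell function named findvars_portable, and passes the names file with -v so it can be read in BEGIN. It uses only POSIX awk features, so it is intended to run unchanged under mawk or gawk.

```sh
# findvars_portable NAMESFILE FILE...   (hypothetical name)
# Portable sketch: avoids \<, \>, \y and IGNORECASE.
# Assumption: identifiers use only [A-Za-z0-9_].
findvars_portable() {
    namesfile=$1
    shift
    awk -v namesfile="$namesfile" '
    BEGIN {
        # One emulated word-boundary regex per name, lower-cased so that
        # matching against tolower($0) is case-insensitive.
        while ((getline nraw < namesfile) > 0)
            if (nraw != "")
                varre[nraw] = "(^|[^a-z0-9_])" tolower(nraw) "([^a-z0-9_]|$)"
        close(namesfile)
    }
    NF == 0   { next }
    /^[Cc]/   { next }    # fixed-form Fortran comment lines
    /^ *[!]/  { next }    # free-form comment lines
    {
        line = tolower($0)
        sub(/[!].*$/, "", line)    # drop trailing comments
        for (nraw in varre)
            if (line ~ varre[nraw])
                used[nraw] = 1
    }
    END {
        out = ""
        for (nraw in used)
            out = out (np++ ? "," : ", only: ") nraw
        if (np)
            print out > ("used-from-" namesfile)
    }
    ' "$@"
}
```

Usage would then be, e.g., "findvars_portable barfmodvars sloppycode.f", which writes "used-from-barfmodvars". As in the original script, the order of names in the ", only:" list follows awk's array traversal order, which is unspecified.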
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL

arnold, 5/8/2009 8:14:05 AM