efficiency in awk

Hi all,

I have a situation in which I have like 90 tests.  When considering memory
and speed, am I better off writing it as a series of if ... else if ...
statements (some nested), or one if test that has a long string of
conditions connected by || and &&?
I hope I am making sense here.

TIA, M.

-- 
Knowledge is power; power corrupts.
Study hard; be evil.
-----
Candy for spammers:
http://members.fortunecity.com/enderian/spamthis.shtml
Reply V 10/29/2003 12:32:32 PM

The subject of this post is a bit of a contradiction.

Awk scripts are written for ease, portability, prototyping and
experimentation (and any of a host of other reasons why people love
using awk), and notably Awk is used when efficiency is not a requirement.

When efficiency becomes a requirement, it's time to take your Awk
prototype script to another language.

The above is me repeating the mantras of "The Art of Programming" (TAOP)
by Brian Kernighan and Rob Pike, the former being one of the co-authors of
the Awk language.  Look for it in your local library.

So to end the missionary work and answer your question:

Using combined boolean expressions rather than nested if-else statements
is not only clearer to read (in most cases) for other programmers
(including yourself), but it should also be faster (and most likely never
slower, in any Awk implementation).
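
For concreteness, here is a minimal sketch of the two styles being compared. The field tests and the name `classify` are invented for illustration and do not come from the original question; the point is only that both forms accept and reject the same records.

```shell
#!/bin/sh
# Two equivalent ways to write the same tests: nested ifs and one combined condition.
# The field tests below are hypothetical; they only stand in for "real" checks.
classify() {
    # $1 = "nested" or "combined"; records are read from stdin
    awk -v style="$1" '
        {
            ok = 0
            if (style == "nested") {
                if ($1 > 0) if ($2 != "") if (length($2) < 5) ok = 1
            } else {
                if ($1 > 0 && $2 != "" && length($2) < 5) ok = 1
            }
            print ok
        }'
}

printf '%s\n' '3 abc' '0 abc' '2 toolong' | classify nested
printf '%s\n' '3 abc' '0 abc' '2 toolong' | classify combined
```

Both calls should print the same three results, which suggests that the choice between the two forms is mostly one of readability.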

I'm amazed conditional statements or boolean expressions could be
suspected as performance problem areas.  Performance concerns of an Awk
script usually occur when the task expected of Awk is enormous (large
inputs, large files, complex algorithms) and performance weakens as a
result of iterations (while and for-loops or recursion) over large ranges
or sets of data.

/a

Reply Aaron 10/30/2003 12:57:14 AM


V. Mark Lehky <siking@myrealbox.com> wrote:
> Hi all,
> 
> I have a situation in which I have like 90 tests.  When considering memory
> and speed, am I better off writing it as a series of if ... else if ...
> statements (some nested), or one if test that has a long string of
> conditions connected by || and &&?
> I hope I am making sense here.

Well, try it one way.  If it fails, then try it another way.

-- 
William Park, Open Geometry Consulting, <opengeometry@yahoo.ca>
Linux solution for data management and processing. 
Reply William 10/30/2003 3:50:56 AM

For speed, you might try using one of the publicly-available AWK compilers.
See, for example,

http://awka.sourceforge.net/index.html

for the AWKA compiler.

U Netter

Reply unetter 10/30/2003 2:02:57 PM

It's hard to improve on Aaron's advice, so I'll attempt to merely
augment it.

(1) Get your script working correctly before you begin worrying about
speed.

(2) You might find that the program is 'fast enough' regardless of how
you code it. If this is a one-time program, you probably don't need
efficiency.

(3) I would think that I/O takes far longer than most of the string
processing. But if not, in some cases you might gain efficiency in
coding/debugging, documentation, and execution by splitting your
task up into a series of simpler AWK programs. Each program would
perform some processing and then pass the output along to the next
program.

(4) Back to your original question, I agree that there is probably
little difference between:

if (a && b && c)

and

if (a) if (b) if (c)

Is that what you meant?

However, there can be times where

if (c && b && a)

is much more efficient.

For example, if 'a' is a function call or a complex string comparison,
and 'c' is a simple boolean expression, then because usually 'a' won't
be executed when 'c' is false, you can benefit from testing 'c' first.

Similarly if 'a' is almost always true and 'c' is more likely to be
false, evaluating 'c' first can speed up the entire expression
evaluation.

However, I doubt that there are many programs where this advice pays
off. I'd stick to (1), (2), and (3) first. Also, if you have an
application where (4) is worthwhile, you probably should be using C
instead of an interpreter.
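
The effect described in (4) can be observed by counting how often the costly test actually runs. This is a minimal sketch, not code from the thread: `expensive()` and `count_calls` are hypothetical names, and the regex match merely stands in for a function call or complex comparison.

```shell
#!/bin/sh
# Count how many times a costly test runs, depending on its position in an && chain.
# expensive() is a hypothetical stand-in for an expensive check.
count_calls() {
    # $1 = "cheap_first" or "costly_first"
    printf '%s\n' 1 0 0 0 0 | awk -v order="$1" '
        function expensive(x) { calls++; return x ~ /^[0-9]+$/ }
        {
            if (order == "cheap_first") {
                if ($1 == 1 && expensive($1)) hits++
            } else {
                if (expensive($1) && $1 == 1) hits++
            }
        }
        END { print calls + 0 }'
}

count_calls cheap_first    # the cheap test fails on four of the five lines
count_calls costly_first   # the costly test runs on every line
```

With the cheap test first, `expensive()` is reached only for the single line that passes it; with the order reversed it runs for every input line. Whether that difference matters depends on how costly the test really is.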

DKM




To contact me directly, send EMAIL to (single letters all)
DEE KAY EMM AT CEE TEE ESS D0T CEE OH EMM
Reply Doug 10/30/2003 7:06:20 PM

In article <3699f0db.0310300602.1cd979ba@posting.google.com>,
U Netter <unetter@fastmail.fm> wrote:
>For speed, you might try using one of the publicly-available AWK compilers.
>See, for example,
>
>http://awka.sourceforge.net/index.html
>
>for the AWKA compiler.

Yes, awka is good.  Also worth trying is 'mawk'.  I avoid it for most purposes,
but it is much faster than other awk interpreters, and the problems that make
me avoid it won't affect most awk code.

	John
-- 
John DuBois  spcecdt@armory.com  KC6QKZ/AE  http://www.armory.com/~spcecdt/
Reply spcecdt 10/30/2003 8:01:42 PM

In article <3fa16e26$0$1101$8eec23a@newsreader.tycho.net>,
John DuBois <spcecdt@deeptht.armory.com> wrote:
>In article <3699f0db.0310300602.1cd979ba@posting.google.com>,
>U Netter <unetter@fastmail.fm> wrote:
>>For speed, you might try using one of the publicly-available AWK compilers.
>>See, for example,
>>
>>http://awka.sourceforge.net/index.html
>>
>>for the AWKA compiler.
>
>Yes, awka is good.  Also worth trying is 'mawk'.  I avoid it for most
>purposes, but it is much faster than other awk interpreters, and the
>problems that make me avoid it won't affect most awk code.

Stating the obvious question: Why do you avoid it?

(Not that I have any agenda here - I have no strong feelings one way or the
other on mawk [*].  In fact, I use TAWK for everything, since it is the
most efficient of all the interpreted AWKs and has the most complete
language)

[*] Though to be fair, mawk has 2 nice attributes.  It is, as you allude to,
the most efficient of the free (interpreted) AWKs, and it has comprehensible
source code.
Reply gazelle 10/30/2003 9:58:33 PM

gazelle@yin.interaccess.com (Kenny McCormack) writes:

> In article <3fa16e26$0$1101$8eec23a@newsreader.tycho.net>,
> John DuBois <spcecdt@deeptht.armory.com> wrote:
> >In article <3699f0db.0310300602.1cd979ba@posting.google.com>,
> >U Netter <unetter@fastmail.fm> wrote:
> >>For speed, you might try using one of the publicly-available AWK compilers.
> >>See, for example,
> >>
> >>http://awka.sourceforge.net/index.html
> >>
> >>for the AWKA compiler.
> >
> >Yes, awka is good.  Also worth trying is 'mawk'.  I avoid it for most
> >purposes, but it is much faster than other awk interpreters, and the
> >problems that make me avoid it won't affect most awk code.
> 
> Stating the obvious question: Why do you avoid it?

mawk:
 - doesn't support character classes in regular expressions
 - has limits on input line length
 - crashes on huge strings or regular expressions
 - lacks the many useful extensions that gawk has

-- 
Best regards, Aleksey Cheusov.
Reply Aleksey 10/31/2003 12:26:59 PM

In article <bns1u4$4bn$1@yin.interaccess.com>,
Kenny McCormack <gazelle@interaccess.com> wrote:
>In article <3fa16e26$0$1101$8eec23a@newsreader.tycho.net>,
>John DuBois <spcecdt@deeptht.armory.com> wrote:
>>Yes, awka is good.  Also worth trying is 'mawk'.  I avoid it for most
>>purposes, but it is much faster than other awk interpreters, and the
>>problems that make me avoid it won't affect most awk code.
>
>Stating the obvious question: Why do you avoid it?

The most common problems I encounter:

- It dies on some regexps
- It has memory leaks
- It requires that all functions that are referred to in the code be defined,
  even if they can never be called (for example, if they're referenced in a
  function that is itself never called).  All of the other awk interpreters
  I've used don't complain until an undefined function is actually called.
  awka doesn't complain about undefined functions if it determines that they
  will never be called.
  This makes dealing with libraries that use functions from other libraries
  something of a pain, in that every one of those
  functions-that-will-never-be-called must be resolved, either by pulling in
  other libraries that will also never be used (and which themselves might
  require other libraries!), or by stubbing out those functions.  Library code
  that references builtin awk-extension functions that don't exist in mawk
  (e.g. gensub, strftime) is even more problematic: the functions can be
  stubbed out, but then the awk program won't work under awks that implement
  those functions unless the stubs are removed.
  Yuck.  Of course checking for undefined functions is useful, but I'd really
  much rather see it left up to gawk-style --lint, or at least awka-style
  post-dead-code-pruning checking.

The last one is the biggest problem.  I use an option-processing library that
exercises this problem in almost all of my awk programs.  The net result of
this is that out of 200+ awk programs, I have only one that is still #!mawk -
one that I run quite often and interactively, where the (amazing) ~3x
difference in execution time makes it worthwhile.  For some time I was down to
two mawk programs; the other was a logfile processor that was at the other end
of the spectrum - it consumes the better part of a day of CPU time when I run
it, making it again worth using mawk.  But the memory leaks eventually made it
unusable for that purpose.

Lesser problems:
- Doesn't handle nulls in the input
- Doesn't implement most of the nice extensions in gawk, some of which I use
  quite a lot (like strftime).

Pluses:
- Speed!
- The -Wexec command line option.  This is really nice, as it allows #!mawk
  programs to take POSIX-style options.  gawk stops processing options when
  it encounters one that it doesn't recognize and passes it and the remaining
  arguments to the awk program, but this has two problems:
  First, you can't use any gawk options as options to your program (and if
  the user does give them by accident, you don't have an opportunity to handle
  it; instead the user gets unexpected behavior).
  Second, it makes the program non-POSIXy in that '--' is not sufficient to
  terminate processing of the option list.  If you aren't passing any options
  to your program, and the first argument begins with '-' (or in a script that
  invokes your program, if the first argument *might* begin with a '-'), you
  have to use '-- -- -myfirstarg' (the first '--' being absorbed by gawk, the
  second by your POSIX option-processing library, which has no way of knowing
  about the first one).
  A few years ago I implemented -Wexec in the awk shipped by the OS vendor
  I work for; it would be nice to see it in other awks (especially gawk).

	John
-- 
John DuBois  spcecdt@armory.com  KC6QKZ/AE  http://www.armory.com/~spcecdt/
Reply spcecdt 10/31/2003 8:58:38 PM

On Wed, 29 Oct 2003, Aaron S. Hawley wrote:

> ...
>
> This above is me repeating the mantras of "The Art of Programming" (TAOP)
> by Brian Kernighan and Rob Pike, the former being one of the co-authors of
> Awk language.  Look for it in your local library.

The book I was attempting to mention is actually called:
"The Practice of Programming" (TPOP).  Brian W. Kernighan, Rob Pike.
Addison-Wesley. 1999.

> So to end the missionary work and answer your question:
>
> ...
Reply Aaron 11/1/2003 5:56:47 PM

First off, I would like to thank everyone in this thread who answered. 
Very useful advice.


Doug McClure wrote:
> if (a && b && c)
> 
> and
> 
> if (a) if (b) if (c)
> 
> Is that what you meant?
> 
> However, there can be times where

This is sorta what I meant, but not really :D

I should have phrased the question this way, however AWKwardly:
Some languages have a feature, and I have no idea what this is called.  For
example, if I have a test such as (A && B) and A evaluates to zero then B
does not even get looked at; alternatively, if I have (A || B) and A
evaluates to one then B does not even get looked at.  Does awk have this
feature?
This would be useful in my case, because I am going to have something like
(A && B && C || D || E ......); if A evaluated to zero then the rest (in my
case approximately 90) of the terms will not even get looked at.  Or if B
evaluates to zero then the remaining 89 terms will be looked at ....
hopefully you get the idea.

TIA, ML.
Reply V 11/3/2003 9:04:29 AM

Hello,

In article <3fa2ccfe$0$1097$8eec23a@newsreader.tycho.net>, John DuBois wrote:
> - The -Wexec command line option.

I'd like to understand what the -Wexec option would buy us, if it were added
to gawk.  (Of course, I'm not in a position to _decide_ this, I'm just
curious.)

There are two issues: first with #!, second with options to the awk script
and `--'.  Please let me discuss them in detail.
(Sorry, I'm not able to make it shorter. :-( )

1) #! scripts
=============

I understand that starting an awk script with
	#!/usr/bin/mawk -Wexec
looks very elegant---there is no need for a shell wrapper.
OTOH, how could such a script acquire the ability to do POSIX-like option
parsing?  Would you paste the routines into the script?

In gawk, the standard way to do it is to add
	-f /usr/share/awk/getopt.awk
But this would not enable us to use #!, we would again have to resort to
a shell wrapper.

There is a possibility that --exec (and -W exec) would take an optional
argument, which would be a comma-separated list of files to source.
One could then do:
	#!/usr/bin/gawk --exec=/usr/share/awk/getopt.awk,anotherlib.awk
or
	#!/usr/bin/gawk -Wexec=/usr/share/awk/getopt.awk,anotherlib.awk
to get the desired effect.
But is it worth it, just to avoid the shell wrapper? After all, awk should
be a rapid prototyping language...

2) options to awk scripts
=========================

>  This is really nice, as it allows #!mawk
>   programs to take POSIX-style options.  gawk stops processing options when
>   it encounters one that it doesn't recognize and passes it and the remaining
>   arguments to the awk program, but this has two problems:
>   First, you can't use any gawk options as options to your program (and if
>   the user does give them by accident, you don't have an opportunity to handle
>   it; instead the user gets unexpected behavior).
>   Second, it makes the program non-POSIXy in that '--' is not sufficient to
>   terminate processing of the option list.  If you aren't passing any options
>   to your program, and the first argument begins with '-' (or in a script that
>   invokes your program, if the first argument *might* begin with a '-'), you
>   have to use '-- -- -myfirstarg' (the first '--' being absorbed by gawk, the
>   second by your POSIX option-processing library, which has no way of knowing
>   about the first one).

I think all this is due to the fact that the args array is processed by
getopt() twice.  It's similar to shell escaping: when you need more levels of
escaping, things start to look odd, but everything can be logically deduced
from well-documented axioms.  Trying to smooth over some special cases would
bring us to... ehm... C-shell.

In our case, I think that if an awk script is more complex, one creates a shell
wrapper, something like:
	#!/bin/sh
	exec awk -f /usr/share/awk/getopt.awk -f script.awk -- "$@"
or
	exec awk ${DEBUG+"-vDEBUG=1"} -f /usr/share/awk/getopt.awk \
		-f script.awk -- "$@"

And the script is called with parameters as a normal program.
(In the latter version, we can use "DEBUG=x ./shell_wrapper args" to tell the
awk script to print debugging information.)
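
What such a wrapper hands to the awk program can be inspected with a small sketch. The function name `show_args` is hypothetical, and the program text is given inline after `--` rather than with -f only to keep the example self-contained; the assumption is that this behaves like the `-f script.awk --` form as far as ARGV is concerned.

```shell
#!/bin/sh
# Print the arguments an awk program receives, one per line.
# show_args is a hypothetical wrapper name used only for this illustration.
show_args() {
    # "--" ends awk's own option list; the quoted "$@" keeps each argument intact
    awk -- 'BEGIN { for (i = 1; i < ARGC; i++) print i ": " ARGV[i] }' "$@"
}

show_args -x "two words" file.txt
```

The argument beginning with `-` reaches the program in ARGV instead of being interpreted by awk, and the argument containing a space stays in one piece because `"$@"` is quoted.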

If the awk script is not yet ``wrapped,'' things may look strange
('-- -- -myfirstarg'), but everything is logical and consistent.

Conclusion
==========
As it seems to me, ``-Wexec script.awk'' can be replaced by
``-f script.awk --'' with no significant problems.

Have I missed something?

Stepan Kasal
Reply Stepan 11/3/2003 10:41:30 AM

Hello,

In article <3FA61A1D.2632A9C6@myrealbox.com>, V. Mark Lehky wrote:
> Some languages have a feature, and I have no idea what this is called.

I think the feature you describe is "short-circuit Boolean evaluation."
And awk does have it. (It is one of the many things it shares with C.)
Please see:
http://www.gnu.org/software/gawk/manual/gawk-3.1.1/html_node/Boolean-Ops.html
if you are interested in the details.
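
Short-circuit evaluation can be made visible with a small sketch. The helper `probe()` and the wrapper `trace_eval` are hypothetical names for this illustration: `probe()` records its label each time it is evaluated and returns the given truth value.

```shell
#!/bin/sh
# Show which operands awk actually evaluates in && and || expressions.
# probe() is a hypothetical helper: it appends its label to "seen" and returns value.
trace_eval() {
    awk 'function probe(label, value) { seen = seen label; return value }
         BEGIN {
             seen = ""
             r = (probe("A", 0) && probe("B", 1))
             print "and:", seen, r
             seen = ""
             r = (probe("A", 1) || probe("B", 0))
             print "or:", seen, r
         }'
}

trace_eval
```

In both cases only the label A should be recorded, since the value of A alone already decides the result.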

> This would be useful in my case, because I am going to have something like
> (A && B && C || D || E ......); if A evaluated to zero then the rest (in my
> case approximately 90) of the terms will not even get looked at.  Or if B
> evaluates to zero then the remaining 89 terms will be looked at ....

Beware of operator precedence, though.  The behaviour you describe would
correspond to the expression
	(A && B && (C || D || E ......))

The expression (A && B && C || D || E ......) is in fact
	((A && B && C) || D || E ...)
because && (Boolean multiplication) has higher precedence than ||
(Boolean sum)--it's analogous to normal sum (+) and multiplication (*).

Again, see
http://www.gnu.org/software/gawk/manual/gawk-3.1.1/html_node/Precedence.html
if you want to see the full plot.
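
The difference between the two groupings can be checked directly. This is a sketch under assumed truth values (A false; B, C and D true); `probe()` and `precedence_demo` are hypothetical names, with `probe()` recording which terms are evaluated.

```shell
#!/bin/sh
# Compare (A && B && C || D) with (A && B && (C || D)) when A is false.
# probe() is a hypothetical helper: it appends its label to "seen" and returns value.
precedence_demo() {
    awk 'function probe(label, value) { seen = seen label; return value }
         BEGIN {
             seen = ""
             r = (probe("A", 0) && probe("B", 1) && probe("C", 1) || probe("D", 1))
             print "ungrouped:", seen, r
             seen = ""
             r = (probe("A", 0) && probe("B", 1) && (probe("C", 1) || probe("D", 1)))
             print "grouped:", seen, r
         }'
}

precedence_demo
```

Without the extra parentheses a false A only skips B and C; D is still evaluated and makes the whole expression true. With the parentheses, a false A ends the evaluation at once.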

Hope this time we gave you the right explanation, ;-)
	Stepan
Reply Stepan 11/3/2003 10:54:28 AM

Stepan Kasal wrote:
> Hope this time we gave you the right explanation, ;-)

Yes, and Thank You.

VML.

-- 
Knowledge is power; power corrupts.
Study hard; be evil.
-----
Candy for spammers:
http://members.fortunecity.com/enderian/spamthis.shtml
Reply V 11/3/2003 3:32:26 PM

Stepan Kasal <kasal@ucw.cz> writes:

....
> Conclusion
> ==========
> As it seems to me, ``-Wexec script.awk'' can be replaced by
> ``-f script.awk --'' with no significant problems.
> 
> Have I missed something?

Such code

#!/bin/sh

awk '
  ...
'


looks very ugly.
It is just inconvenient.

#!/usr/bin/gawk --exec f1.awk,f2.awk
looks much better.

-- 
Best regards, Aleksey Cheusov.
Reply Aleksey 11/3/2003 4:40:20 PM

In article <slrnbqcc6q.h05.kasal@matsrv.math.cas.cz>,
Stepan Kasal  <kasal@ucw.cz> wrote:
>In article <3fa2ccfe$0$1097$8eec23a@newsreader.tycho.net>, John DuBois wrote:
>> - The -Wexec command line option.
>
>I'd like to understand what would -Wexec option buy us, if it were added
>to gawk.  (Of course, I'm not in a position to _decide_ this, I'm just
>curious.)
>
>There are two issues: first with #!, second with options to the awk script
>and `--'.  Please let me discuss them in detail.
>(Sorry, I'm not able to make it shorter. :-( )
>
>1) #! scripts
>=============
>
>I understand that starting an awk script with
>	#!/usr/bin/mawk -Wexec
>looks very elegant---there is no need for a shell wrapper.
>OTOH, how could such a script acquire the ability to do POSIX-like option
>parsing?  Would you paste the routines into the script?

Yes, they're included as a library.

>After all, awk should be rapid prototyping language...

awk is perfectly suited to being both a prototyping and a final-implementation
language.  Very, very few of my awk programs ever become anything else, though
I do occasionally take library code initially written in awk and convert it to
other languages for similar uses there.  I'm not sure if this is what you mean,
but I think the days of interpreted languages being considered unsuitable for
anything other than small/prototype projects are long past.

>2) options to awk scripts
>=========================
>
>>  This is really nice, as it allows #!mawk
>>   programs to take POSIX-style options.  gawk stops processing options when
>>   it encounters one that it doesn't recognize and passes it and the remaining
>>   arguments to the awk program, but this has two problems:
>>   First, you can't use any gawk options as options to your program (and if
>>   the user does give them by accident, you don't have an opportunity to handle
>>   it; instead the user gets unexpected behavior).
>>   Second, it makes the program non-POSIXy in that '--' is not sufficient to
>>   terminate processing of the option list.  If you aren't passing any options
>>   to your program, and the first argument begins with '-' (or in a script that
>>   invokes your program, if the first argument *might* begin with a '-'), you
>>   have to use '-- -- -myfirstarg' (the first '--' being absorbed by gawk, the
>>   second by your POSIX option-processing library, which has no way of knowing
>>   about the first one).
>
>I think all this is due to the fact that the args array is processed by the
>getopt() twice.  It's similar to shell escaping: when you need more levels of
>escaping, things start to look odd, but all can be logically deduced from well
>documented axioms.

The problem, really, is that awk's behavior is *different* from that of other
interpreters.  Unlike other interpreters, awk does not take arguments on the
command line to be the names of files to interpret.  Instead, such arguments
are programs.  So, to interpret a file, the name of the file must be given as
the argument to an option.  But this means that the option-list has not been
terminated by a non-option argument (the name of a file) as it is with other
interpreters.  We need a way to get the same semantics with awk: the filename
given with this option is a program, *and* it terminates the option list.

And yes, of course you can wrap an awk program with a shell script.  But this
introduces unnecessary complexity, and makes it more difficult to distribute
awk programs.  Programs written in other interpreted languages, and executables
for a suitable platform, can be distributed as single files.  You download it,
put it wherever you want, and run it.  In the case of an awk program +
wrapper, you not only have to deal with two files, you have to either put the
awk program wherever the wrapper expects it or edit the wrapper.  If you're
just prototyping with awk or don't intend to distribute your program or have a
very targeted audience, that's (sort of) OK... but it limits the utility of awk
as a more general-purpose programming language.

	John
-- 
John DuBois  spcecdt@armory.com  KC6QKZ/AE  http://www.armory.com/~spcecdt/
Reply spcecdt 11/3/2003 9:33:14 PM

Hello.

In article <3fa6c99a$0$1098$8eec23a@newsreader.tycho.net>, John DuBois wrote:
> The problem, really, is that awk's behavior is *different* from that of other
> interpreters. [...] to interpret a file, the name of the file must be given as
> the argument to an option.  But this means that the option-list has not been
> terminated by a non-option argument (the name of a file) as it is with other
> interpreters.

Another problem is that GNU Coding Standards allow options to be mixed
with arguments, which I often make use of.  An interpreter has to break
this rule in order to get the effect you ask for.
But if you set POSIXLY_CORRECT in the environment, then indeed, the first
non-option argument stops option processing.  That's probably what you
have in mind.

I often call gawk with --re-intervals (-Wre-intervals), and that option
doesn't fit into the #! line as it has no one-letter equivalent (and you
cannot ask for it from within the awk program).
But again, this vanishes with POSIXLY_CORRECT set.

I'm not sure what you meant by this:

>> Would you paste the routines into the script?
>
> Yes, they're included as a library.

You mean that you copy the source?  Isn't including the ``library''
functions by multiple -f's more elegant?

But I understand now that copying the functions in enables you
to create one self-contained script file.

(Well, to be exact, the script is not completely self-contained as soon
as it has { or \{ in a regex, as it either supposes that POSIXLY_CORRECT
is set, or that it isn't.)

>>After all, awk should be rapid prototyping language...
>
> awk is perfectly suited to being both a prototyping and a
> final-implementation language.

You are right, my comment was silly.
I have to confess that I've distributed a pair of *.BAT and *.AWK files
to my MS-DOS users just a few weeks ago.

You have (almost) convinced me, but I don't know what Arnold's opinion is.
And his opinion is the only thing which has any relevance here.  ;-)

Have a nice day,
	Stepan Kasal
Reply Stepan 11/4/2003 12:47:28 PM

Stepan Kasal wrote:
>>>After all, awk should be rapid prototyping language...
>>
>>awk is perfectly suited to being both a prototyping and a
>>final-implementation language.
> 
> 
> You are right, my comment was silly.
> I have to confess that I've distributed a pair of *.BAT and *.AWK files
> to my MS-DOS users just a few weeks ago.
> 
I use gawk as my primary scripting language in Unix and Cygwin.

The sizes range from 5 to 1300 lines, with the outlier being a
text formatter that is currently 6754 lines.

Martin Cohen

Reply Martin 11/4/2003 5:42:42 PM

Mark,

There is no reason why you can't test your own AWK interpreter to
learn exactly how it evaluates boolean expressions.

BEGIN {
	# Read test data
	for (i = 1; i < ARGC; i++)
		{
		s[i] = ARGV[i]
		ARGV[i] = ""
		}

	# Evaluate a boolean expression
	if (F_1() || F_2() || F_3())
		print "TRUE"
	else
		print "FALSE"
	}

function F_1()
{
print "F_1 returns", s[1]
return s[1]
}

function F_2()
{
print "F_2 returns", s[2]
return s[2]
}

function F_3()
{
print "F_3 returns", s[3]
return s[3]
}

Once you've learned how (a || b || c) is evaluated, rewrite the test
in the BEGIN section and see how (a && b && c) is evaluated.
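
As a hedged sketch of that second experiment, the && variant could look like the following. The wrapper name `and_chain` and the single helper `F(n)` are assumptions that replace the three separate functions above, and `+ 0` is added so that an argument of "0" counts as false however the interpreter types ARGV strings.

```shell
#!/bin/sh
# The && variant of the experiment: report which functions run for given truth values.
# and_chain and F(n) are hypothetical names, not taken from the script above.
and_chain() {
    awk -- '
        function F(n) { print "F_" n " returns " s[n]; return s[n] }
        BEGIN {
            # Read test data; "+ 0" forces each argument to be numeric
            for (i = 1; i < ARGC; i++) {
                s[i] = ARGV[i] + 0
                ARGV[i] = ""
            }
            if (F(1) && F(2) && F(3))
                print "TRUE"
            else
                print "FALSE"
        }' "$@"
}

and_chain 1 0 1
```

With the arguments 1 0 1, evaluation should stop after F_2, so F_3 is never reported; with 1 1 1, all three functions run and the result is TRUE.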

DKM



To contact me directly, send EMAIL to (single letters all)
DEE KAY EMM AT CEE TEE ESS D0T CEE OH EMM
Reply Doug 11/7/2003 5:22:27 PM

Doug McClure wrote:
> There is no reason why you can't test your own AWK interpreter to
> learn exactly how it evaluates boolean expressions.
[snip]

Yup, I had not thought of that.  Thanx! ;)  It does break out early if
it can.

ML.

-- 
The problem with your computer resides on the external side 
of the user interface.
-----
Candy for spammers:
http://members.fortunecity.com/enderian/spamthis.shtml
Reply V 11/11/2003 2:43:33 PM
