Is there UB in (g)awk?


We all know that in C, printf("%d %d\n",x,--x) invokes, blah, blah, blah,
undefined behavior, nasal demons, and the end of civilization as we know it.

But what about in AWK (in general) and GAWK (in specific) ?
Does any document say anything about:

$ gawk 'BEGIN {x=5;print x,--x}'
5 4
$

which is, obviously, the *correct* thing to do.

P.S.  Incidentally, TAWK prints "4 4" for the above program; make of that
what you will...

-- 

Some of the more common characteristics of Asperger syndrome include: 

* Inability to think in abstract ways (eg: puns, jokes, sarcasm, etc)
* Difficulties in empathising with others
* Problems with understanding another person's point of view
* Hampered conversational ability
* Problems with controlling feelings such as anger, depression 
    and anxiety
* Adherence to routines and schedules, and stress if expected routine 
    is disrupted
* Inability to manage appropriate social conduct
* Delayed understanding of sexual codes of conduct
* A narrow field of interests. For example a person with Asperger 
    syndrome may focus on learning all there is to know about 
    baseball statistics, politics or television shows.
* Anger and aggression when things do not happen as they want
* Sensitivity to criticism
* Eccentricity
* Behaviour varies from mildly unusual to quite aggressive 
    and difficult

Reply gazelle 3/27/2011 8:39:05 PM

On Sun, 27 Mar 2011 20:39:05 +0000 (UTC)
gazelle@shell.xmission.com (Kenny McCormack) wrote:

> We all know that in C, printf("%d %d\n",x,--x) invokes, blah, blah, blah,
> undefined behavior, nasal demons, and the end of civilization as we know
> it.
> 
> But what about in AWK (in general) and GAWK (in specific) ?
> Does any document say anything about:
> 
> $ gawk 'BEGIN {x=5;print x,--x}'
> 5 4
> $
> 
> which is, obviously, the *correct* thing to do.

Obviously. Yes.


> P.S.  Incidentally, TAWK prints "4 4" for the above program; make of that
> what you will...

If TAWK prints "4 4", that "obviously" means that "4 4", and not "5 4", *IS*
the *correct* thing to do. Yes?


Reply pk 3/27/2011 9:00:18 PM


In article <imo8mq$o9n$1@speranza.aioe.org>, pk  <pk@pk.invalid> wrote:
....
>> P.S.  Incidentally, TAWK prints "4 4" for the above program; make of that
>> what you will...
>
>If TAWK prints "4 4", that "obviously" means that "4 4", and not "5 4", *IS*
>the *correct* thing to do. Yes?

heh heh - I did say: Make of it what you will.

In fact, TAWK always evaluates function args from right-to-left, which is
the most natural way for a stack-based, C-like language to do things.  I'm
actually kind of curious as to why GAWK doesn't (GAWK seems to evaluate from
left to right).

P.S.  gcc seems to give "4 4" for either: printf("%d %d\n",x,x--)
or: printf("%d %d\n",x--,x)

(at least at the default optimization level)

-- 
"The anti-regulation business ethos is based on the charmingly naive notion
that people will not do unspeakable things for money." - Dana Carpender

Quoted by Paul Ciszek (pciszek at panix dot com).  But what I want to know
is why is this diet/low-carb food author making pithy political/economic
statements?

Nevertheless, the above quote is dead-on, because, the thing is - business
in one breath tells us they don't need to be regulated (which is to say:
that they can morally self-regulate), then in the next breath tells us that
corporations are amoral entities which have no obligations to anyone except
their officers and shareholders, then in the next breath they tell us they
don't need to be regulated (that they can morally self-regulate) ...

Reply gazelle 3/27/2011 9:09:59 PM

On 27.03.2011 22:39, Kenny McCormack wrote:
> We all know that in C, printf("%d %d\n",x,--x) invokes, blah, blah, blah,
> undefined behavior, nasal demons, and the end of civilization as we know it.
> 
> But what about in AWK (in general) and GAWK (in specific) ?
> Does any document say anything about:
> 
> $ gawk 'BEGIN {x=5;print x,--x}'
> 5 4
> $
> 
> which is, obviously, the *correct* thing to do.
> 
> P.S.  Incidentally, TAWK prints "4 4" for the above program; make of that
> what you will...
> 

In C they speak of "sequence points"[*], and for all languages that allow
ambiguous constructs we need such sequence points or some similar semantic
meta construct to explain things in a more or less convenient way.
I'd really avoid those side effects in the first place; if only because of
the "nasal demons".

WRT the print example; if print were a function I would have expected what
C produces:

> P.S.  gcc seems to give "4 4" for either: printf("%d %d\n",x,x--)
> or: printf("%d %d\n",x--,x)

But gawk seems to produce the same strange result for the printf commands
as for the plain print

printf("%d %d\n",x,--x)   =>   5 4
printf "%d %d\n",x,--x    =>   5 4

So the arguments seem to be treated as a sequential list, processed
from left to right, with "sequence points" at every comma. I'd really avoid
those constructs. (Oh, I already said that.)

Janis

[*] http://en.wikipedia.org/wiki/Sequence_point
Reply Janis 3/28/2011 7:43:07 PM

In article <imo799$aiq$1@news.xmission.com>,
Kenny McCormack <gazelle@shell.xmission.com> wrote:
>We all know that in C, printf("%d %d\n",x,--x) invokes, blah, blah, blah,
>undefined behavior, nasal demons, and the end of civilization as we know it.
>
>But what about in AWK (in general) and GAWK (in specific) ?
>Does any document say anything about:
>
>$ gawk 'BEGIN {x=5;print x,--x}'
>5 4
>$
>
>which is, obviously, the *correct* thing to do.

I don't think the gawk doc says anything about print/printf, nor do I
recall if POSIX says anything, but basically it's undefined.  All four awks
that I tested (gawk stable, gawk devel, mawk, nawk) print 5 and then 4.
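
For anyone who wants to repeat that comparison, here is a rough sketch. It
assumes a POSIX shell; the function name probe_awks and the list of command
names are illustrative choices, and any implementation not found on PATH is
simply skipped. The output is whatever each awk happens to do, not something
any standard promises.

```shell
# Run the test program under every awk implementation found on PATH.
probe_awks() {
    for impl in awk gawk mawk nawk; do
        if command -v "$impl" >/dev/null 2>&1; then
            printf '%s: ' "$impl"
            "$impl" 'BEGIN { x = 5; print x, --x }'
        fi
    done
}

probe_awks
```

Each line of output names the command and shows the two values it printed.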

The gawk doc does state that something like

	x = 4
	y = x++ + ++x

is undefined.

Gawk-stable evaluates print and printf arguments left to right, since
it builds the list of arguments in the order they're given in the program.
Gawk-devel is likely similar.

Such things are best avoided. :-)
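
A minimal sketch of the avoidance, assuming any POSIX awk is installed as
awk (the variable names before, after, and out are made up for
illustration): give the decrement its own statement, so the order in which
print evaluates its arguments no longer matters.

```shell
out=$(awk 'BEGIN {
    x = 5
    before = x      # read x first, in its own statement
    after = --x     # then decrement, also in its own statement
    print before, after
}')
echo "$out"    # prints "5 4" regardless of argument evaluation order
```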
-- 
Aharon (Arnold) Robbins 			arnold AT skeeve DOT com
P.O. Box 354		Home Phone: +972  8 979-0381
Nof Ayalon		Cell Phone: +972 50  729-7545
D.N. Shimshon 99785	ISRAEL
Reply arnold847 (183) 3/29/2011 6:23:03 AM

In article <imrts6$nnm$1@tornado.tornevall.net>,
Aharon Robbins <arnold@skeeve.com> wrote:
>In article <imo799$aiq$1@news.xmission.com>,
>Kenny McCormack <gazelle@shell.xmission.com> wrote:
>>We all know that in C, printf("%d %d\n",x,--x) invokes, blah, blah, blah,
>>undefined behavior, nasal demons, and the end of civilization as we know it.
>>
>>But what about in AWK (in general) and GAWK (in specific) ?
>>Does any document say anything about:
>>
>>$ gawk 'BEGIN {x=5;print x,--x}'
>>5 4
>>$
>>
>>which is, obviously, the *correct* thing to do.
>
>I don't think the gawk doc says anything about print/printf, nor do I
>recall if POSIX says anything, but basically it's undefined.  All four awks
>that I tested (gawk stable, gawk devel, mawk, nawk) print 5 and then 4.

Yes.  As I noted, I think that left-to-right is the intuitively correct
thing to do, although right-to-left makes "under the hood" sense in, as I
put it in the OP, any "C-like, stack oriented, language".  Once you get used
to it, the right-to-left method used by TAWK makes sense.

>The gawk doc does state that something like
>
>	x = 4
>	y = x++ + ++x
>
>is undefined.

OK, so gawk *does* have the concept of "UB".  Good to know.

>Gawk-stable evaluates print and printf arguments left to right, since
>it builds the list of arguments in the order they're given in the program.
>Gawk-devel is likely similar.

I'm curious if there is anything special about print/printf - or are they
just like any other awk function?  I ask because we've been talking as if
they are, but, really, they aren't - aren't really functions, because you
can't do: x = print(f)...

>Such things are best avoided. :-)

It's funny that everybody says this.  You said it, Janis said it (twice in
one post!), so it is pretty clear that there's some agenda here.

But why?  Why bother with it?  I think we can all agree that sky-diving is
best avoided, but that doesn't mean either that:

    1) People don't do it (and some of them enjoy it).

    2) People like us (sensible people who avoid the practice) can't discuss
	it.

-- 
Faced with the choice between changing one's mind and proving that there is
no need to do so, almost everyone gets busy on the proof. 

    - John Kenneth Galbraith -

Reply gazelle3 (1598) 3/29/2011 11:44:48 AM

On 29.03.2011 13:44, Kenny McCormack wrote:
> In article <imrts6$nnm$1@tornado.tornevall.net>,
> Aharon Robbins <arnold@skeeve.com> wrote:
[...]
>> Such things are best avoided. :-)
> 
> It's funny that everybody says this.  You said it, Janis said it (twice in
> one post!), so it is pretty clear that there's some agenda here.
> 
> But why?  Why bother with it?  I think we can all agree that sky-diving is
> best avoided, but that doesn't mean either that:
> 
>     1) People don't do it (and some of them enjoy it).
> 
>     2) People like us (sensible people who avoid the practice) can't discuss
> 	it.

Sure. I didn't mean to discourage you. Rather I appreciate those discussions.

Janis
Reply janis_papanagnou (1029) 3/29/2011 5:33:39 PM

In article <imsgng$s95$2@news.xmission.com>,
Kenny McCormack <gazelle@shell.xmission.com> wrote:
>>Gawk-stable evaluates print and printf arguments left to right, since
>>it builds the list of arguments in the order they're given in the program.
>>Gawk-devel is likely similar.
>
>I'm curious if there is anything special about print/printf - or are they
>just like any other awk function?  I ask because we've been talking as if
>they are, but, really, they aren't - aren't really functions, because you
>can't do: x = print(f)...

They're the same as anything else. Gawk-stable uses different code to
build the argument lists for print/printf than it does for user-defined
function calls, but both are still left-to-right.  I'd have to look harder
at gawk-devel.

>>Such things are best avoided. :-)
>
>It's funny that everybody says this.  You said it, Janis said it (twice in
>one post!), so it is pretty clear that there's some agenda here.
>
>But why?  Why bother with it?  I think we can all agree that sky-diving is
>best avoided,

Well, at least without a parachute... :-)

>but that doesn't mean either that:
>
>    1) People don't do it (and some of them enjoy it).
>
>    2) People like us (sensible people who avoid the practice) can't discuss
>	it.

You can discuss it all you want. :-)  It should be avoided in code since
you're relying on the implementation to do something a certain way and
it could change when you move your code to a different awk, or to a newer
version of the same awk.  Same thing for C - nothing requires right-to-left
evaluation order of function arguments or things like y = x++ + ++x to
work in a certain order.

Instead, it's better to write your code such that there will never be
any doubt as to what you want to happen.

	y = ++x; y += x++
or
	y = x++; y += ++x

Both say what you mean and you always get the same results.
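
Here is a small check of those two orderings, as a sketch assuming a POSIX
awk installed as awk and a starting value of x = 4 (borrowed from the
earlier example; the shell variable names first and second are illustrative).
Each form is well defined on its own, and from this starting value both
happen to end with y = 10 and x = 6.

```shell
# Each side effect gets its own statement, so evaluation order is never in question.
first=$(awk 'BEGIN { x = 4; y = ++x; y += x++; print y, x }')
second=$(awk 'BEGIN { x = 4; y = x++; y += ++x; print y, x }')

echo "$first"     # 10 6
echo "$second"    # 10 6
```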

It gets worse in C, where different optimization levels can change the
results for the same code...

It's worth discussing if only to know what to avoid and why.

HTH,

Arnold
Reply arnold847 (183) 3/29/2011 7:30:07 PM
