GAWK feature request


As we all know, having any non-standard "locale" setting - i.e., anything
other than "C" - causes (g)awk scripts to misbehave in totally mysterious
ways.  Plain obvious things like /[0-9]+/ stop working.  Further, we find
that a lot of mysterious ("That just can't happen") support problems on
these newsgroups (and, I would imagine, in real life as well) are caused by
people running with non-standard locale settings.

I'd like to suggest that GAWK have some kind of feature (perhaps an
extension of the WHINY_USER idea) that would cause it to ignore any locale
setting (i.e., behave as if all the locale settings are "C").  This would be
very helpful.

Notes:
    1) Yes, I know you can get this functionality in a shell script, but I
	would like to be able to do it in #!/bin/gawk scripts as well.
    2) Just as a funny aside, I have a situation where, using an "extension
	library", I was able to call the libc "setlocale" function from
	within a GAWK script - and I found, interestingly, that, while
	dynamic regular expressions worked fine, fixed/static reg exps
	caused the program to blow up with an "internal error - aborted"
	message.  I.e., the code is like:
	    MyExtensionLib(1,"C")
	    x = $0 ~ "[0-9]+"	# OK
	    x = $0 ~ /[0-9]+/	# Blows up
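
For reference, the shell-script workaround mentioned in note 1 might look roughly like the sketch below. The function name `awk_c` is made up for illustration, and plain `awk` is used so the snippet runs anywhere; substitute `gawk` as needed. It relies only on the fact that LC_ALL overrides LANG and the individual LC_* variables.

```shell
#!/bin/sh
# awk_c is a hypothetical helper name, not part of gawk.
# It runs awk with every locale category forced to "C".
awk_c() {
    LC_ALL=C awk "$@"
}

# Usage: extract the first run of ASCII digits from each input line.
printf 'abc123\n' | awk_c '{ if (match($0, /[0-9]+/)) print substr($0, RSTART, RLENGTH) }'
```

For a self-contained script, an interpreter line such as `#!/usr/bin/env -S LC_ALL=C gawk -f` may also work, assuming the system's env supports the -S option (recent GNU coreutils and FreeBSD do; others may not).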

La de dah...

-- 
(This discussion group is about C, ...)

Wrong.  It is only OCCASIONALLY a discussion group
about C; mostly, like most "discussion" groups, it is
off-topic Rorsharch [sic] revelations of the childhood
traumas of the participants...

Reply gazelle3 (1609) 12/23/2011 3:28:46 AM

On Fri, 23 Dec 2011 03:28:46 +0000 (UTC), gazelle@shell.xmission.com (Kenny
McCormack) wrote:

> As we all know, having any non-standard "locale" setting - i.e., anything
> other than "C" - causes (g)awk scripts to misbehave in totally mysterious
> ways.  Plain obvious things like /[0-9]+/ stop working.  Further, we find
> that a lot of mysterious ("That just can't happen") support problems on
> these newsgroups (and, I would imagine, in real life as well) are caused
> by people running with non-standard locale settings.
> 
> I'd like to suggest that GAWK have some kind of feature (perhaps an
> extension of the WHINY_USER idea) that would cause it to ignore any locale
> setting (i.e., behave as if all the locale settings are "C").  This would
> be very helpful.

ISTR that Arnold is already considering that, at least for bracket
expressions.

> 
> Notes:
>     1) Yes, I know you can get this functionality in a shell script, but I
> 	would like to be able to do it in #!/bin/gawk scripts as well.
>     2) Just as a funny aside, I have a situation where, using an "extension
> 	library", I was able to call the libc "setlocale" function from
> 	within a GAWK script - and I found, interestingly, that, while
> 	dynamic regular expressions worked fine, fixed/static reg exps
> 	caused the program to blow up with an "internal error - aborted"
> 	message.  I.e., the code is like:
> 	    MyExtensionLib(1,"C")
> 	    x = $0 ~ "[0-9]+"	# OK
> 	    x = $0 ~ /[0-9]+/	# Blows up




Reply pk (425) 12/23/2011 8:41:07 AM


On 23.12.2011 09:41, pk wrote:
> On Fri, 23 Dec 2011 03:28:46 +0000 (UTC), gazelle@shell.xmission.com (Kenny
> McCormack) wrote:
> 
[...]
>> I'd like to suggest that GAWK have some kind of feature (perhaps an
>> extension of the WHINY_USER idea) that would cause it to ignore any locale
>> setting (i.e., behave as if all the locale settings are "C").  This would
>> be very helpful.
> 
> ISTR that Arnold is already considering that, at least for bracket
> expressions.

Considering? - I think that became part of the 4.0 release.

From the changelog file...

Sun Jun 12 23:43:06 2011  Arnold D. Robbins  <arnold@skeeve.com>
        * re.c (resetup): Always turn on RE_RANGES_IGNORE_LOCALES.


Janis

> 
>> [...]
Reply janis_papanagnou (1038) 12/23/2011 9:56:02 AM

In article <jd0she$ip3$1@news.xmission.com>,
Kenny McCormack <gazelle@shell.xmission.com> wrote:
>As we all know, having any non-standard "locale" setting - i.e., anything
>other than "C" - causes (g)awk scripts to misbehave in totally mysterious
>ways.  Plain obvious things like /[0-9]+/ stop working.  Further, we find
>that a lot of mysterious ("That just can't happen") support problems on
>these newsgroups (and, I would imagine, in real life as well) are caused by
>people running with non-standard locale settings.

This is already handled in gawk 4.0.  You're 6 months behind the times.  The phrase coined by
Karl Berry is "Rational Range Interpretation". It applies even when --posix
is in effect, since the latest standard allows it.
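
As a rough illustration of what range interpretation by character code means in practice, here is a small sketch. It forces LC_ALL=C so that any awk behaves this way; the point above is that gawk 4.0 is meant to give the same result for ranges without that override. The function name `count_lower` is hypothetical.

```shell
#!/bin/sh
# count_lower is a made-up helper for this illustration.
# It counts input lines containing at least one character in the range a-z.
# In the "C" locale the range covers only the ASCII lowercase letters.
count_lower() {
    LC_ALL=C awk '/[a-z]/ { n++ } END { print n + 0 }'
}

# "ABC" has no lowercase letter; "abc" and "XyZ" each have at least one.
printf 'ABC\nabc\nXyZ\n' | count_lower
```

In locales where ranges follow collation order instead, an older awk might count "ABC" as well, which is the kind of surprise described at the top of the thread.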

>    2) Just as a funny aside, I have a situation where, using an "extension
>	library", I was able to call the libc "setlocale" function from
>	within a GAWK script - and I found, interestingly, that, while
>	dynamic regular expressions worked fine, fixed/static reg exps
>	caused the program to blow up with an "internal error - aborted"
>	message.  I.e., the code is like:
>	    MyExtensionLib(1,"C")
>	    x = $0 ~ "[0-9]+"	# OK
>	    x = $0 ~ /[0-9]+/	# Blows up

Makes sense. Static regexps were compiled with the locale in effect at the
time gawk was started, the dynamic ones afterwards.
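
To make the two forms concrete, here is a minimal sketch that uses both on the same input. It does not change the locale in mid-run (that needs the extension library described above, which is not shown), so both forms agree here. The function name `regex_forms` is made up, and LC_ALL=C is set only to keep the result predictable with any awk.

```shell
#!/bin/sh
# regex_forms is a hypothetical helper for this illustration.
# It prints 1 or 0 for each of the two regexp forms on every input line.
regex_forms() {
    LC_ALL=C awk '{
        static_hit  = ($0 ~ /[0-9]+/)   # regexp constant, compiled up front
        dynamic_hit = ($0 ~ "[0-9]+")   # string used as a regexp, compiled at run time
        print static_hit, dynamic_hit
    }'
}

printf 'abc123\n' | regex_forms
```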

Move to gawk 4.0. You'll be happier.
-- 
Aharon (Arnold) Robbins 			arnold AT skeeve DOT com
P.O. Box 354		Home Phone: +972  8 979-0381
Nof Ayalon		Cell Phone: +972 50 729-7545
D.N. Shimshon 99785	ISRAEL
Reply arnold847 (183) 12/23/2011 12:14:29 PM

In article <jd1rb5$bkj$1@dont-email.me>,
Aharon Robbins <arnold@skeeve.com> wrote:
>In article <jd0she$ip3$1@news.xmission.com>,
>Kenny McCormack <gazelle@shell.xmission.com> wrote:
>>As we all know, having any non-standard "locale" setting - i.e., anything
>>other than "C" - causes (g)awk scripts to misbehave in totally mysterious
>>ways.  Plain obvious things like /[0-9]+/ stop working.  Further, we find
>>that a lot of mysterious ("That just can't happen") support problems on
>>these newsgroups (and, I would imagine, in real life as well) are caused by
>>people running with non-standard locale settings.
>
>In gawk 4.0.  You're 6 months behind the times.

At least...

More like 7 years.

>The phrase coined by Karl Berry is "Rational Range Interpretation". It
>applies even when --posix is in effect, since the latest standard allows
>it.

Good to hear.  I think I will go ahead and start looking at the latest
version.  Right now, I only have a MacOS version of gawk4; will need to
build a Linux version.

>>    2) Just as a funny aside, I have a situation where, using an "extension
>>	library", I was able to call the libc "setlocale" function from
>>	within a GAWK script - and I found, interestingly, that, while
>>	dynamic regular expressions worked fine, fixed/static reg exps
>>	caused the program to blow up with an "internal error - aborted"
>>	message.  I.e., the code is like:
>>	    MyExtensionLib(1,"C")
>>	    x = $0 ~ "[0-9]+"	# OK
>>	    x = $0 ~ /[0-9]+/	# Blows up
>
>Makes sense. Static regexps were compiled with the locale in effect at the
>time gawk was started, the dynamic ones afterwards.

Yup.  That's what I figured.

>Move to gawk 4.0. You'll be happier.

Maybe so.  I'll need to retro-fit my changes, though...

-- 
Windows 95 n. (Win-doze): A 32 bit extension to a 16 bit user interface for
an 8 bit operating system based on a 4 bit architecture from a 2 bit company
that can't stand 1 bit of competition.

Modern day upgrade --> Windows XP Professional x64: Windows is now a 64 bit
tweak of a 32 bit extension to a 16 bit user interface for an 8 bit
operating system based on a 4 bit architecture from a 2 bit company that
can't stand 1 bit of competition.
Reply gazelle3 (1609) 12/23/2011 1:43:55 PM
