As we all know, having any non-standard "locale" setting - i.e., anything
other than "C" - causes (g)awk scripts to misbehave in totally mysterious
ways. Plain obvious things like /[0-9]+/ stop working. Further, we find
that a lot of mysterious ("That just can't happen") support problems on
these newsgroups (and, I would imagine, in real life as well) are caused by
people running with non-standard locale settings.

I'd like to suggest that GAWK have some kind of feature (perhaps an
extension of the WHINY_USER idea) that would cause it to ignore any locale
setting (i.e., behave as if all the locale settings are "C"). This would be
very helpful.

Notes:
    1) Yes, I know you can get this functionality in a shell script, but I
       would like to be able to do it in #!/bin/gawk scripts as well.
    2) Just as a funny aside, I have a situation where, using an "extension
       library", I was able to call the libc "setlocale" function from
       within a GAWK script - and I found, interestingly, that, while
       dynamic regular expressions worked fine, fixed/static reg exps
       caused the program to blow up with an "internal error - aborted"
       message. I.e., the code is like:

           MyExtensionLib(1,"C")
           x = $0 ~ "[0-9]+"    # OK
           x = $0 ~ /[0-9]+/    # Blows up

La de dah...
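
For reference, the shell-script workaround mentioned in note 1 might look
like the sketch below. It assumes a POSIX shell and some awk on the PATH
(substitute gawk where available); the helper name c_awk is made up for
illustration and is not part of gawk.

```shell
#!/bin/sh
# c_awk: hypothetical helper that runs awk with every locale category
# forced to "C". LC_ALL overrides LANG and the individual LC_* variables,
# and the assignment applies to this one awk invocation only.
c_awk() {
    LC_ALL=C awk "$@"
}

# Count the input lines that contain at least one digit.
digit_lines=$(printf '%s\n' 'abc' 'a1b' '42' |
    c_awk '/[0-9]+/ { n++ } END { print n + 0 }')
echo "$digit_lines"
```

This only covers scripts launched through the wrapper; it does nothing for
a script started directly via a #! line, which is the gap the request above
is about.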
--
(This discussion group is about C, ...)
Wrong. It is only OCCASIONALLY a discussion group
about C; mostly, like most "discussion" groups, it is
off-topic Rorsharch [sic] revelations of the childhood
traumas of the participants...
Posted by gazelle3 (1609), 12/23/2011 3:28:46 AM

On Fri, 23 Dec 2011 03:28:46 +0000 (UTC), gazelle@shell.xmission.com (Kenny
McCormack) wrote:
> As we all know, having any non-standard "locale" setting - i.e., anything
> other than "C" - causes (g)awk scripts to misbehave in totally mysterious
> ways. Plain obvious things like /[0-9]+/ stop working. Further, we find
> that a lot of mysterious ("That just can't happen") support problems on
> these newsgroups (and, I would imagine, in real life as well) are caused
> by people running with non-standard locale settings.
>
> I'd like to suggest that GAWK have some kind of feature (perhaps an
> extension of the WHINY_USER idea) that would cause it to ignore any locale
> setting (i.e., behave as if all the locale settings are "C"). This would
> be very helpful.
ISTR that Arnold is already considering that, at least for bracket
expressions.
>
> Notes:
> 1) Yes, I know you can get this functionality in a shell script, but I
> would like to be able to do it in #!/bin/gawk scripts as well.
> 2) Just as a funny aside, I have a situation where, using an
> "extension library", I was able to call the libc "setlocale" function from
> within a GAWK script - and I found, interestingly, that, while
> dynamic regular expressions worked fine, fixed/static reg exps
> caused the program to blow up with an "internal error - aborted"
> message. I.e., the code is like:
> MyExtensionLib(1,"C")
> x = $0 ~ "[0-9]+" # OK
> x = $0 ~ /[0-9]+/ # Blows up
Posted by pk (425), 12/23/2011 8:41:07 AM

On 23.12.2011 09:41, pk wrote:
> On Fri, 23 Dec 2011 03:28:46 +0000 (UTC), gazelle@shell.xmission.com (Kenny
> McCormack) wrote:
>
[...]
>> I'd like to suggest that GAWK have some kind of feature (perhaps an
>> extension of the WHINY_USER idea) that would cause it to ignore any locale
>> setting (i.e., behave as if all the locale settings are "C"). This would
>> be very helpful.
>
> ISTR that Arnold is already considering that, at least for bracket
> expressions.
Considering? - I think that became part of the 4.0 release.
From the changelog file...

    Sun Jun 12 23:43:06 2011  Arnold D. Robbins  <arnold@skeeve.com>

        * re.c (resetup): Always turn on RE_RANGES_IGNORE_LOCALES.

Janis
>
>> [...]
Posted by janis_papanagnou (1038), 12/23/2011 9:56:02 AM

In article <jd0she$ip3$1@news.xmission.com>,
Kenny McCormack <gazelle@shell.xmission.com> wrote:
>As we all know, having any non-standard "locale" setting - i.e., anything
>other than "C" - causes (g)awk scripts to misbehave in totally mysterious
>ways. Plain obvious things like /[0-9]+/ stop working. Further, we find
>that a lot of mysterious ("That just can't happen") support problems on
>these newsgroups (and, I would imagine, in real life as well) are caused by
>people running with non-standard locale settings.
In gawk 4.0. You're 6 months behind the times. The phrase coined by
Karl Berry is "Rational Range Interpretation". It applies even when --posix
is in effect, since the latest standard allows it.
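
A quick way to see what a range such as [a-z] matches on a given setup is
to feed it single characters. This is only a sketch, assuming a POSIX shell
and some awk on the PATH; range_hits is a hypothetical helper name. In the
"C" locale the range covers exactly the ASCII lowercase letters. What it
prints under another locale depends on the awk version and on which locales
are installed, so no output is claimed for that case.

```shell
#!/bin/sh
# range_hits LOCALE: hypothetical helper that prints which of the test
# characters are matched by the bracket expression [a-z] under LOCALE.
range_hits() {
    printf '%s\n' a B m Z z 5 |
        LC_ALL="$1" awk '/^[a-z]$/ { out = out $0 } END { print out }'
}

c_result=$(range_hits C)
echo "C locale: $c_result"
```

Calling range_hits with the name of an installed non-C locale and comparing
the result against the "C" run shows whether ranges are locale-sensitive on
that system.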
> 2) Just as a funny aside, I have a situation where, using an "extension
> library", I was able to call the libc "setlocale" function from
> within a GAWK script - and I found, interestingly, that, while
> dynamic regular expressions worked fine, fixed/static reg exps
> caused the program to blow up with an "internal error - aborted"
> message. I.e., the code is like:
> MyExtensionLib(1,"C")
> x = $0 ~ "[0-9]+" # OK
> x = $0 ~ /[0-9]+/ # Blows up
Makes sense. Static regexps were compiled with the locale in effect at the
time gawk was started, the dynamic ones afterwards.
Move to gawk 4.0. You'll be happier.
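
For readers unfamiliar with the distinction: a regexp constant written
between slashes is part of the program text, while a string used on the
right of ~ is turned into a regexp when the match runs. The sketch below
only puts the two forms side by side in a fixed "C" locale, where they
agree. It does not reproduce the crash, which needed an extension library
calling setlocale. It assumes a POSIX shell and some awk on the PATH.

```shell
#!/bin/sh
# Compare a regexp constant with a dynamic (string) regexp on the same input.
forms=$(printf '%s\n' 'abc123' | LC_ALL=C awk '{
    pat = "[0-9]+"                        # dynamic: built from a string at run time
    s = ($0 ~ /[0-9]+/) ? "yes" : "no"    # static: regexp constant in the program
    d = ($0 ~ pat)      ? "yes" : "no"
    print "static=" s, "dynamic=" d
}')
echo "$forms"
```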
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL
Posted by arnold847 (183), 12/23/2011 12:14:29 PM

In article <jd1rb5$bkj$1@dont-email.me>,
Aharon Robbins <arnold@skeeve.com> wrote:
>In article <jd0she$ip3$1@news.xmission.com>,
>Kenny McCormack <gazelle@shell.xmission.com> wrote:
>>As we all know, having any non-standard "locale" setting - i.e., anything
>>other than "C" - causes (g)awk scripts to misbehave in totally mysterious
>>ways. Plain obvious things like /[0-9]+/ stop working. Further, we find
>>that a lot of mysterious ("That just can't happen") support problems on
>>these newsgroups (and, I would imagine, in real life as well) are caused by
>>people running with non-standard locale settings.
>
>In gawk 4.0. You're 6 months behind the times.
At least...
More like 7 years.
>The phrase coined by Karl Berry is "Rational Range Interpretation". It
>applies even when --posix is in effect, since the latest standard allows
>it.
Good to hear. I think I will go ahead and start looking at the latest
version. Right now, I only have a MacOS version of gawk4; will need to
build a Linux version.
>> 2) Just as a funny aside, I have a situation where, using an "extension
>> library", I was able to call the libc "setlocale" function from
>> within a GAWK script - and I found, interestingly, that, while
>> dynamic regular expressions worked fine, fixed/static reg exps
>> caused the program to blow up with an "internal error - aborted"
>> message. I.e., the code is like:
>> MyExtensionLib(1,"C")
>> x = $0 ~ "[0-9]+" # OK
>> x = $0 ~ /[0-9]+/ # Blows up
>
>Makes sense. Static regexps were compiled with the locale in effect at the
>time gawk was started, the dynamic ones afterwards.
Yup. That's what I figured.
>Move to gawk 4.0. You'll be happier.
Maybe so. I'll need to retro-fit my changes, though...
--
Windows 95 n. (Win-doze): A 32 bit extension to a 16 bit user interface for
an 8 bit operating system based on a 4 bit architecture from a 2 bit company
that can't stand 1 bit of competition.
Modern day upgrade --> Windows XP Professional x64: Windows is now a 64 bit
tweak of a 32 bit extension to a 16 bit user interface for an 8 bit
operating system based on a 4 bit architecture from a 2 bit company that
can't stand 1 bit of competition.
Posted by gazelle3 (1609), 12/23/2011 1:43:55 PM