Spurious diagnostic messages

In rec.puzzles, I was reading that the probability of hitting any square 
on a sufficiently large board converges to 1/7th (assuming you're 
rolling two fair six-sided dice to determine your progress along the board).

It was sufficiently early in the morning that this rather bland 
assertion struck me as being curious, and I resolved (in my sleepy way) 
to test it. So I spent five contented and rather sleepy minutes writing 
a C program to confirm the obvious fact that the average 2D6 roll is 7 
by playing a million games of "no snakes or ladders" on a rather large 
board and seeing how often each square got landed on.
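The experiment described above can be sketched in a few lines. This is an illustrative reconstruction, not the original simboard.c: the function names, the board size, and the use of `rand() % 6` (slightly biased, but fine for a sanity check) are all my assumptions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Roll one die; rand() % 6 is marginally biased, which is acceptable
   for a sanity check of the 1/7 claim. */
int roll_d6(void)
{
    return 1 + rand() % 6;
}

/* Play one game of "no snakes or ladders": advance by 2d6 until the
   end of the board, tallying each square landed on. Over many games,
   each square's hit count approaches 1/7 of the squares passed,
   because the mean 2d6 roll is 7. */
void play_one_game(unsigned long hits[], size_t board_size)
{
    size_t pos = 0;
    while (pos < board_size) {
        pos += (size_t)(roll_d6() + roll_d6());
        if (pos < board_size)
            hits[pos]++;
    }
}
```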

Having just devoted two whole paragraphs to explaining why this had 
nothing whatsoever to do with spurious diagnostics (nothing could have 
been further from my mind), I will now show you the diagnostics, all of 
which I consider spurious:

simboard.c:5:5: warning: no previous prototype for ‘die’ 
[-Wmissing-prototypes]

The line under consideration was:

int die(int sides)

which *IS* a prototype! The compiler is telling me that this is the 
first prototype it's seen for this function. What am I supposed to do 
with that information - alert the media?

On the other hand, do I want to suppress this message? Heck no. If I 
actually /use/ a function without a prototype, I want the compiler to 
tell me.

Can I improve the code to remove the warning? Well, experimentation 
shows that I could write:

int foo(int);
int foo(int x)

every time, but I'm not convinced that this is "improved". It's just one 
more maintenance chore.

simboard.c: In function ‘die’:
simboard.c:7:3: warning: conversion to ‘int’ from ‘double’ may alter its 
value [-Wconversion]

Here's the line:
   return 1 + sides * (rand() / (RAND_MAX + 1.0));

This is very much along the lines of the hypothetical "adding two 
unsigned integers may result in a value lower than either operand", 
which was thought to be so ridiculous in a recent thread. This line is 
idiomatic, and there is no need whatsoever to change it.
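For reference, here is the idiom as a standalone function. `rand() / (RAND_MAX + 1.0)` is a double in [0, 1), so multiplying by `sides` gives [0, sides), and adding 1 gives a result in [1, sides]. The implicit double-to-int truncation at the return is exactly what -Wconversion flags; the explicit cast shown here is one conventional way to tell the compiler the truncation is intended, though (as argued above) the uncast version is equally correct.

```c
#include <stdlib.h>

/* Return a uniform-ish value in [1, sides]. The cast makes the
   intended double-to-int truncation explicit, which silences
   -Wconversion without changing behaviour. */
int die(int sides)
{
    return 1 + (int)(sides * (rand() / (RAND_MAX + 1.0)));
}
```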

simboard.c: At top level:
simboard.c:10:8: warning: no previous prototype for ‘mean’ 
[-Wmissing-prototypes]
simboard.c:21:8: warning: no previous prototype for ‘stddev’ 
[-Wmissing-prototypes]
simboard.c:34:6: warning: no previous prototype for ‘minimaxi’ 
[-Wmissing-prototypes]

All of these are "gosh, this is the first time I've seen this prototype" 
messages.

simboard.c: In function ‘minimaxi’:
simboard.c:38:3: warning: negative integer implicitly converted to 
unsigned type [-Wsign-conversion]

I have a pointer to an unsigned long, which I wish to set to the lowest 
value in a range of data. Here's the setup line before the loop:

   *low = -1;

Unsigned integer arithmetic *guarantees* that this will assign to *low 
the largest possible unsigned long value. (Of course, the idea is that, 
in the loop, it will be reset to any lower value encountered.)

The warning is 100% correct. Do I want to turn it off? No, because I'd 
like to know whether I'm doing that sort of thing accidentally. But do I 
want to change the code? No. The code as written is perfectly clear, and 
needs no rewriting. A cast would be pointless obscurantism.
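A sketch of the pattern in question follows; minimaxi's signature here is a guess for illustration, not the original. C guarantees that converting -1 to an unsigned type yields that type's maximum value (conversion is modulo ULONG_MAX + 1), so `*low` starts at the largest possible unsigned long and is pulled down by the loop.

```c
#include <stddef.h>

/* Find the minimum and maximum of n unsigned longs. */
void minimaxi(const unsigned long *data, size_t n,
              unsigned long *low, unsigned long *high)
{
    size_t i;
    *low = -1;   /* wraps to ULONG_MAX; this is the -Wsign-conversion site */
    *high = 0;
    for (i = 0; i < n; i++) {
        if (data[i] < *low)  *low = data[i];
        if (data[i] > *high) *high = data[i];
    }
}
```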

simboard.c: In function ‘main’:
simboard.c:79:19: warning: conversion to ‘size_t’ from ‘int’ may change 
the sign of the result [-Wsign-conversion]
simboard.c:80:19: warning: conversion to ‘size_t’ from ‘int’ may change 
the sign of the result [-Wsign-conversion]

Perfectly true, except that it won't. Here are the lines:

   score += die(6);
   score += die(6);

The score object is indeed an unsigned type, but it's set to 0 inside 
the loop where it's used, and then we add a couple of values in the 
range 1 to 6 to it, so the biggest value it's going to reach is 12000 or 
so (because that's how big the board is), so it can't ever wrap.

Do I want to suppress the warning? No, I'd like to be told about such 
issues. Does the warning indicate an actual problem with this code? No. 
Should I change the code to remove the warning? This is perhaps the 
closest candidate for a change in the whole program, but even then I'm 
very much in two minds about it. The obvious candidates for change are:

1) change score to a signed type - but it's used as an index into an 
array, so that's not so bright an idea
2) change die so that it returns an unsigned type - better, but what if 
I had dice marked up with -3 to +2, say? The unsigned type then becomes 
an unspoken assumption.

I'm not really happy with either of these possibilities.

I know! I'll switch back to using Windows, and Visual Studio. That way, 
the only warning I'll have to worry about is that // comment in the 
math.h header.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
4/23/2015 7:09:35 AM
comp.lang.c

Hello Richard,

On 23/04/2015 09:09, Richard Heathfield wrote:

> simboard.c:5:5: warning: no previous prototype for ‘die’ [-Wmissing-prototypes]
>
> The line under consideration was:
>
> int die(int sides)
>
> which *IS* a prototype! The compiler is telling me that this is the
> first prototype it's seen for this function. What am I supposed to do
> with that information - alert the media?
>
> On the other hand, do I want to suppress this message? Heck no. If I
> actually /use/ a function without a prototype, I want the compiler to
> tell me.
>
> Can I improve the code to remove the warning? Well, experimentation
> shows that I could write:
>
> int foo(int);
> int foo(int x)
>
> every time, but I'm not convinced that this is "improved". It's just one more maintenance chore.

You didn't specify compiler, version, and compilation options.
(I'm using gcc 4.8.2)

First of all, please note that -Wmissing-prototypes is neither
in -Wall nor in -Wextra (it has to be explicitly requested).

https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wmissing-prototypes-548

> Warn if a global function is defined without a previous prototype
> declaration. This warning is issued even if the definition itself
> provides a prototype. Use this option to detect global functions that
> do not have a matching prototype declaration in a header file.

I think the intent is clear:
- either the function is local, then it must be static
- or it is global, then you declare it in the appropriate header

The warning will not trigger for static functions.
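A minimal illustration of the two cases, per the documented behaviour (function names are my own):

```c
/* With gcc -Wmissing-prototypes, these two definitions are treated
   differently. */

/* External linkage, no prior declaration: triggers the warning. */
int doubled(int x)
{
    return x * 2;
}

/* Internal linkage: the warning does not fire for static functions. */
static int halved(int x)
{
    return x / 2;
}
```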

Regards.

Noob
4/23/2015 8:56:26 AM
Noob wrote:
> Hello Richard,
>
> On 23/04/2015 09:09, Richard Heathfield wrote:
>
>> Warn if a global function is defined without a previous prototype
>> declaration. This warning is issued even if the definition itself
>> provides a prototype. Use this option to detect global functions that
>> do not have a matching prototype declaration in a header file.
>
> I think the intent is clear:
> - either the function is local, then it must be static
> - or it is global, then you declare it in the appropriate header

Nit-pick time!

By the time the compiler can diagnose this condition, the preprocessor 
will have done its thing with include files.  Therefore there is no way 
for it to tell whether a prototype is local or in a header.

-- 
Ian Collins
Ian
4/23/2015 9:13:32 AM
Ian Collins wrote:
> Noob wrote:
>> Richard Heathfield wrote:
>>
>>> Warn if a global function is defined without a previous prototype
>>> declaration. This warning is issued even if the definition itself
>>> provides a prototype. Use this option to detect global functions that
>>> do not have a matching prototype declaration in a header file.
>>
>> I think the intent is clear:
>> - either the function is local, then it must be static
>> - or it is global, then you declare it in the appropriate header
>
> Nit-pick time!
>
> By the time the compiler can diagnose this condition, the
> preprocessor will have done its thing with include files.  Therefore
> there there is no way for it to tell whether a prototype is local or
> in a header.

I don't understand your remark.

If a given function is only intended to be used in the current
translation unit, then it should be static.

If it is not static, then it is intended to be called from other
translation units, and a prototype should be provided in the
appropriate header.

Regards.

Noob
4/23/2015 9:25:29 AM
In article <cprrdsF6f3iU3@mid.individual.net>,
Ian Collins  <ian-news@hotmail.com> wrote:
....
>Nit-pick time!
>
>By the time the compiler can diagnose this condition, the preprocessor 
>will have done its thing with include files.  Therefore there there is 
>no way for it to tell whether a prototype is local or in a header.

True, but it doesn't matter.  The OP showed that if he put his own
prototype in place (as shown below) it eliminates the warning:

int foo(int);	/* Add this line right before the definition */
int foo(int x) { ... }

Also, the previous poster's statement was that that was the intent (that
you have a declaration in a header file), not that the system would enforce
it (since, as you point out, it can't really do that).

-- 
Religion is regarded by the common people as true,
	by the wise as foolish,
	and by the rulers as useful.

(Seneca the Younger, 65 AD)

gazelle
4/23/2015 9:29:03 AM
On 23/04/15 10:25, Noob wrote:
> Ian Collins wrote:
>> Noob wrote:
>>> Richard Heathfield wrote:
>>>
>>>> Warn if a global function is defined without a previous prototype
>>>> declaration. This warning is issued even if the definition itself
>>>> provides a prototype. Use this option to detect global functions that
>>>> do not have a matching prototype declaration in a header file.
>>>
>>> I think the intent is clear:
>>> - either the function is local, then it must be static
>>> - or it is global, then you declare it in the appropriate header
>>
>> Nit-pick time!
>>
>> By the time the compiler can diagnose this condition, the
>> preprocessor will have done its thing with include files.  Therefore
>> there there is no way for it to tell whether a prototype is local or
>> in a header.
>
> I don't understand your remark.
>
> If a given function is only intended to be used in the current
> translation unit, then it should be static.
>
> If it is not static, then it is intended to be called from other
> translation units, and a prototype should be provided in the
> appropriate header.

What Ian is saying is that the compiler cannot know whether a function 
prototype appears in an appropriate header. And in fact, it not only 
can't, but doesn't.

int foo(int);
int foo(int x)
{
   return x/2;
}

doesn't trigger the diagnostic message, even though the first prototype 
is in the source, not a header.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
4/23/2015 9:36:35 AM
On 23/04/15 09:56, Noob wrote:
 > Hello Richard,
 >
 > On 23/04/2015 09:09, Richard Heathfield wrote:
 >
 >> simboard.c:5:5: warning: no previous prototype for ‘die’ 
[-Wmissing-prototypes]
 >>
 >> The line under consideration was:
 >>
 >> int die(int sides)
 >>
 >> which *IS* a prototype! The compiler is telling me that this is the
 >> first prototype it's seen for this function. What am I supposed to do
 >> with that information - alert the media?
 >>
 >> On the other hand, do I want to suppress this message? Heck no. If I
 >> actually /use/ a function without a prototype, I want the compiler to
 >> tell me.
 >>
 >> Can I improve the code to remove the warning? Well, experimentation
 >> shows that I could write:
 >>
 >> int foo(int);
 >> int foo(int x)
 >>
 >> every time, but I'm not convinced that this is "improved". It's just 
one more maintenance chore.
 >
 > You didn't specify compiler, version, and compilation options.
 > (I'm using gcc 4.8.2)

4.6.3 - the compilation options are the ones I always use for 
development. Normally, I don't have this problem, but this morning I got 
eight spurious diagnostic messages in just a hundred lines of code, 
immediately after a long clc discussion about such messages. It is 
coincidental, by the way. The program was not constructed with a view to 
forming a part of the argument (in fact, had that been my intent, I 
doubt very much whether I could have intentionally constructed such a 
fine example).

 >
 > First of all, please note that -Wmissing-prototypes is neither
 > in -Wall nor in -Wextra (it has to be explicitly requested).

Sure. And of course I do explicitly request that diagnostic message by
default (i.e. my Makefile generator writes it into my CFLAGS), because 
I want very much to know whether I'm using a function without a valid 
prototype in scope. I wasn't, but I got the warning anyway.

But I take note of your documentation ref. And, as it turns out, the 
purpose of the diagnostic message seems to bear little relation to its 
name. In fact, its documented intended meaning makes it unfit for any 
purpose that I can see. What we need is a diagnostic message that tells 
us whether a function prototype /conflicts/ with a prior prototype, and 
one that tells us whether we are calling a function without a valid 
prototype available. We don't need one to tell us that we're writing a 
function.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
4/23/2015 9:53:52 AM
On 23/04/2015 11:36, Richard Heathfield wrote:
> On 23/04/15 10:25, Noob wrote:
>> Ian Collins wrote:
>>> Noob wrote:
>>>> Richard Heathfield wrote:
>>>>
>>>>> Warn if a global function is defined without a previous prototype
>>>>> declaration. This warning is issued even if the definition itself
>>>>> provides a prototype. Use this option to detect global functions that
>>>>> do not have a matching prototype declaration in a header file.
>>>>
>>>> I think the intent is clear:
>>>> - either the function is local, then it must be static
>>>> - or it is global, then you declare it in the appropriate header
>>>
>>> Nit-pick time!
>>>
>>> By the time the compiler can diagnose this condition, the
>>> preprocessor will have done its thing with include files.  Therefore
>>> there there is no way for it to tell whether a prototype is local or
>>> in a header.
>>
>> I don't understand your remark.
>>
>> If a given function is only intended to be used in the current
>> translation unit, then it should be static.
>>
>> If it is not static, then it is intended to be called from other
>> translation units, and a prototype should be provided in the
>> appropriate header.
>
> What Ian is saying is that the compiler cannot know whether a
> function prototype appears in an appropriate header. And in fact, it
> not only can't, but doesn't.
>
> int foo(int);
> int foo(int x)
> {
>    return x/2;
> }
>
> doesn't trigger the diagnostic message, even though the first prototype is in the source, not a header.

As Kenny points out, the warning is used to diagnose /missing/
declarations. If you want to silence the warning by providing a
declaration within the source, have a blast. (Or simply don't
request the warning on the command line!)

Noob
4/23/2015 9:54:09 AM
On 23/04/15 10:54, Noob wrote:
> On 23/04/2015 11:36, Richard Heathfield wrote:
>> On 23/04/15 10:25, Noob wrote:
>>> Ian Collins wrote:
>>>> Noob wrote:
>>>>> Richard Heathfield wrote:
>>>>>
>>>>>> Warn if a global function is defined without a previous prototype
>>>>>> declaration. This warning is issued even if the definition itself
>>>>>> provides a prototype. Use this option to detect global functions that
>>>>>> do not have a matching prototype declaration in a header file.
>>>>>
>>>>> I think the intent is clear:
>>>>> - either the function is local, then it must be static
>>>>> - or it is global, then you declare it in the appropriate header
>>>>
>>>> Nit-pick time!
>>>>
>>>> By the time the compiler can diagnose this condition, the
>>>> preprocessor will have done its thing with include files.  Therefore
>>>> there there is no way for it to tell whether a prototype is local or
>>>> in a header.
>>>
>>> I don't understand your remark.
>>>
>>> If a given function is only intended to be used in the current
>>> translation unit, then it should be static.
>>>
>>> If it is not static, then it is intended to be called from other
>>> translation units, and a prototype should be provided in the
>>> appropriate header.
>>
>> What Ian is saying is that the compiler cannot know whether a
>> function prototype appears in an appropriate header. And in fact, it
>> not only can't, but doesn't.
>>
>> int foo(int);
>> int foo(int x)
>> {
>>    return x/2;
>> }
>>
>> doesn't trigger the diagnostic message, even though the first prototype is in the source, not a header.
>
> As Kenny points out, the warning is used to diagnose /missing/
> declarations. If you want to silence the warning by providing a
> declaration within the source, have a blast. (Or simply don't
> request the warning on the command line!)

Firstly, the nitpick was about the inaccurate claim in the documentation 
for the diagnostic message.

Secondly, obviously I don't want to silence the warning like that. It's 
clearly a daft warning, so I've removed it from the Makefile. To check 
that genuine prototype problems are still diagnosed, I moved a function 
from top to bottom, and got a warning about an implicit declaration. 
That's fine - that's what I want.

Unfortunately, removing the -Wmissing-prototypes warning from the 
Makefile did not reduce my diagnostic count, because now I'm getting 
precisely the same message text from -Wmissing-declarations, which seems 
to suffer from the same problem as -Wmissing-prototypes in that it is 
completely and utterly useless (yes, I checked the docs carefully this 
time to ensure that I knew what the gcc team claims for the warning).

So that's another warning that I can trim from the Makefile generator. 
(Done.)

Unfortunately, I don't think I can do anything about -Wconversion or 
-Wsign-conversion. I do actually need those.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
4/23/2015 10:06:29 AM
On 23/04/2015 12:06, Richard Heathfield wrote:
> On 23/04/15 10:54, Noob wrote:
>> On 23/04/2015 11:36, Richard Heathfield wrote:
>>> On 23/04/15 10:25, Noob wrote:
>>>> Ian Collins wrote:
>>>>> Noob wrote:
>>>>>> Richard Heathfield wrote:
>>>>>>
>>>>>>> Warn if a global function is defined without a previous prototype
>>>>>>> declaration. This warning is issued even if the definition itself
>>>>>>> provides a prototype. Use this option to detect global functions that
>>>>>>> do not have a matching prototype declaration in a header file.
>>>>>>
>>>>>> I think the intent is clear:
>>>>>> - either the function is local, then it must be static
>>>>>> - or it is global, then you declare it in the appropriate header
>>>>>
>>>>> Nit-pick time!
>>>>>
>>>>> By the time the compiler can diagnose this condition, the
>>>>> preprocessor will have done its thing with include files.  Therefore
>>>>> there there is no way for it to tell whether a prototype is local or
>>>>> in a header.
>>>>
>>>> I don't understand your remark.
>>>>
>>>> If a given function is only intended to be used in the current
>>>> translation unit, then it should be static.
>>>>
>>>> If it is not static, then it is intended to be called from other
>>>> translation units, and a prototype should be provided in the
>>>> appropriate header.
>>>
>>> What Ian is saying is that the compiler cannot know whether a
>>> function prototype appears in an appropriate header. And in fact, it
>>> not only can't, but doesn't.
>>>
>>> int foo(int);
>>> int foo(int x)
>>> {
>>>    return x/2;
>>> }
>>>
>>> doesn't trigger the diagnostic message, even though the first prototype is in the source, not a header.
>>
>> As Kenny points out, the warning is used to diagnose /missing/
>> declarations. If you want to silence the warning by providing a
>> declaration within the source, have a blast. (Or simply don't
>> request the warning on the command line!)
> 
> Firstly, the nitpick was about the inaccurate claim in the documentation 
> for the diagnostic message.

I see it now. Makes sense.

> Secondly, obviously I don't want to silence the warning like that. It's 
> clearly a daft warning, so I've removed it from the Makefile. To check 
> that genuine prototype problems are still diagnosed, I moved a function 
> from top to bottom, and got a warning about an implicit declaration. 
> That's fine - that's what I want.

I must be missing something obvious, because I don't see where
-Wmissing-prototypes would trigger spurious warnings.

static functions are ignored, and non-static functions should
be declared in an included header.

> Unfortunately, removing the -Wmissing-prototypes warning from the 
> Makefile did not reduce my diagnostic count, because now I'm getting 
> precisely the same message text from -Wmissing-declarations, which seems 
> to suffer from the same problem as -Wmissing-prototypes in that it is 
> completely and utterly useless (yes, I checked the docs carefully this 
> time to ensure that I knew what the gcc team claims for the warning).

I read the description for -Wmissing-declarations several times
and still don't understand it.

> Unfortunately, I don't think I can do anything about -Wconversion or 
> -Wsign-conversion. I do actually need those.

I'll have to look at those more closely, but warning that converting
from double to int may produce weird results seems reasonable?

Regards.

Noob
4/23/2015 10:35:02 AM
Richard Heathfield <rjh@cpax.org.uk> writes:
<snip>
> Having just devoted two whole paragraphs to explaining why this had
> nothing whatsoever to do with spurious diagnostics (nothing could have
> been further from my mind), I will now show you the diagnostics, all
> of which I consider spurious:
>
> simboard.c:5:5: warning: no previous prototype for ‘die’
> [-Wmissing-prototypes]
>
> The line under consideration was:
>
> int die(int sides)
>
> which *IS* a prototype! The compiler is telling me that this is the
> first prototype it's seen for this function. What am I supposed to do
> with that information - alert the media?

I'm aware of a long related thread which I have not studied, but this
baffles me.  Presumably you asked for this warning, so why are you now
objecting to it?  From man gcc:

  "Warn if a global function is defined without a previous prototype
  declaration.  This warning is issued even if the definition itself
  provides a prototype.  Use this option to detect global functions that
  do not have a matching prototype declaration in a header file."

It's doing what you asked for.

> On the other hand, do I want to suppress this message? Heck no. If I
> actually /use/ a function without a prototype, I want the compiler to
> tell me.

What's the connection?  Omitting -Wmissing-prototypes won't prevent
that very important warning.

<snip>
> simboard.c: In function ‘die’:
> simboard.c:7:3: warning: conversion to ‘int’ from ‘double’ may alter
> its value [-Wconversion]
>
> Here's the line:
>   return 1 + sides * (rand() / (RAND_MAX + 1.0));
>
> This is very much along the lines of the hypothetical "adding two
> unsigned integers may result in a value lower than either operand",
> which was thought to be so ridiculous in a recent thread. This line is
> idiomatic, and there is no need whatsoever to change it.

Idiomatic, but not 100% safe.  On systems where the code can go wrong
(i.e. when RAND_MAX + 1.0 == RAND_MAX), you would hope for a better
error, but something is better than nothing.

<snip>
-- 
Ben.
Ben
4/23/2015 10:37:15 AM
Richard Heathfield <rjh@cpax.org.uk> writes:

> On 23/04/15 10:25, Noob wrote:
>> Ian Collins wrote:
>>> Noob wrote:
>>>> Richard Heathfield wrote:
>>>>
>>>>> Warn if a global function is defined without a previous prototype
>>>>> declaration. This warning is issued even if the definition itself
>>>>> provides a prototype. Use this option to detect global functions that
>>>>> do not have a matching prototype declaration in a header file.
>>>>
>>>> I think the intent is clear:
>>>> - either the function is local, then it must be static
>>>> - or it is global, then you declare it in the appropriate header
>>>
>>> Nit-pick time!
>>>
>>> By the time the compiler can diagnose this condition, the
>>> preprocessor will have done its thing with include files.  Therefore
>>> there there is no way for it to tell whether a prototype is local or
>>> in a header.
>>
>> I don't understand your remark.
>>
>> If a given function is only intended to be used in the current
>> translation unit, then it should be static.
>>
>> If it is not static, then it is intended to be called from other
>> translation units, and a prototype should be provided in the
>> appropriate header.
>
> What Ian is saying is that the compiler cannot know whether a function
> prototype appears in an appropriate header. And in fact, it not only
> can't, but doesn't.

But Ian's comment (I'm piggybacking my reply here for simplicity) is not
strictly true, surely?  The compiler can know, and if it wants, remember
where it saw a prototype.  Whether that's worth the effort is debatable,
especially as it would then have to decide what to do with multiple
prototypes, some in headers and some not.

<snip>
-- 
Ben.
Ben
4/23/2015 10:43:15 AM
On 23/04/15 11:35, Noob wrote:

<snip>

> I must be missing something obvious, because I don't see where
> -Wmissing-prototypes would trigger spurious warnings.
>
> static functions are ignored, and non-static functions should
> be declared in an included header.

So to deal with the warning, we have to make functions static even in a 
single-module C source, where they are rather pointless. I think I'd 
rather lose the warning.

>> Unfortunately, removing the -Wmissing-prototypes warning from the
>> Makefile did not reduce my diagnostic count, because now I'm getting
>> precisely the same message text from -Wmissing-declarations, which seems
>> to suffer from the same problem as -Wmissing-prototypes in that it is
>> completely and utterly useless (yes, I checked the docs carefully this
>> time to ensure that I knew what the gcc team claims for the warning).
>
> I read the description for -Wmissing-declarations several times
> and still don't understand it.

I think I do. If you have this code:

#include <stdio.h>
unsigned long seed;
unsigned long prng(void)
{
   ++seed;
   seed *= 0x87654321;
   return seed >> 7; /* don't use this at home, folks! */
}

int main(void)
{
   printf("%lu\n", prng());
   return 0;
}

-Wmissing-declarations would warn for seed, because there is no 
declaration preceding its definition. It would also warn for prng(), 
because there is no declaration (a function declaration in this case) 
preceding its definition.

Fix 1: write a header in which unsigned long seed is declared extern, 
and in which prng is prototyped;
Fix 2: make them static;
Fix 3: don't use that warning.

I think I'll go for Fix 3.
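For completeness, Fix 1 sketched inline (in a real project the first two lines would live in a header, say prng.h, and be included here; the header name is illustrative):

```c
/* --- the "header" part: declarations visible before definitions --- */
extern unsigned long seed;
unsigned long prng(void);

/* --- the definitions, now preceded by declarations, so
       -Wmissing-declarations has nothing to say --- */
unsigned long seed;
unsigned long prng(void)
{
   ++seed;
   seed *= 0x87654321;
   return seed >> 7; /* don't use this at home, folks! */
}
```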

>
>> Unfortunately, I don't think I can do anything about -Wconversion or
>> -Wsign-conversion. I do actually need those.
>
> I'll have to look at those more closely, but warning that converting
> from double to int may produce weird results seems reasonable?

Yes, it is indeed reasonable to warn about that, but it is also 
reasonable to read the warning, decide that in this case it doesn't 
apply, comment the code accordingly, and leave it at that.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
4/23/2015 10:44:27 AM
On 23/04/2015 09:09, Richard Heathfield wrote:

> It was sufficiently early in the morning that this rather bland 
> assertion struck me as being curious, and I resolved (in my sleepy way) 
> to test it. So I spent five contented and rather sleepy minutes writing 
> a C program to confirm the obvious fact that the average 2D6 roll is 7 
> by playing a million games of "no snakes or ladders" on a rather large 
> board and seeing how often each square got landed on.

OT remark :-)

The expected value (in the mathematical sense, EV) of a discrete random
variable is the probability-weighted average of all possible values.
Even simpler for a uniform distribution.

https://en.wikipedia.org/wiki/Expected_value
https://en.wikipedia.org/wiki/Uniform_distribution_%28discrete%29

(1+2+3+4+5+6) / 6 = 3.5

Regards.

Noob
4/23/2015 10:46:50 AM
On 23/04/2015 12:44, Richard Heathfield wrote:

> On 23/04/15 11:35, Noob wrote:
> 
>> I must be missing something obvious, because I don't see where
>> -Wmissing-prototypes would trigger spurious warnings.
>>
>> static functions are ignored, and non-static functions should
>> be declared in an included header.
> 
> So to deal with the warning, we have to make functions static even
> in a single-module C source, where they are rather pointless.

Disagree. 'static' tells the compiler: "there are no other
users of this function, apart from those you see here".
Helps the compiler with inlining decisions.

Regards.

Noob
4/23/2015 10:56:29 AM
On 23/04/15 11:37, Ben Bacarisse wrote:
> Richard Heathfield <rjh@cpax.org.uk> writes:
> <snip>
>> Having just devoted two whole paragraphs to explaining why this had
>> nothing whatsoever to do with spurious diagnostics (nothing could have
>> been further from my mind), I will now show you the diagnostics, all
>> of which I consider spurious:
>>
>> simboard.c:5:5: warning: no previous prototype for ‘die’
>> [-Wmissing-prototypes]
>>
>> The line under consideration was:
>>
>> int die(int sides)
>>
>> which *IS* a prototype! The compiler is telling me that this is the
>> first prototype it's seen for this function. What am I supposed to do
>> with that information - alert the media?
>
> I'm aware of a long related thread which I have not studied, but this
> baffles me.  Presumably you asked for this warning, so why are you now
> objecting to it?

Yes, I asked for it. Why? Cargo-cult accumulation of warnings 
(embarrassing but true). Now that I understand the warning message more 
thoroughly, I've decided I can easily live without it. :-)

More importantly...

> <snip>
>> simboard.c: In function ‘die’:
>> simboard.c:7:3: warning: conversion to ‘int’ from ‘double’ may alter
>> its value [-Wconversion]
>>
>> Here's the line:
>>   return 1 + sides * (rand() / (RAND_MAX + 1.0));
>>
>> This is very much along the lines of the hypothetical "adding two
>> unsigned integers may result in a value lower than either operand",
>> which was thought to be so ridiculous in a recent thread. This line is
>> idiomatic, and there is no need whatsoever to change it.
>
> Idiomatic, but not 100% safe.  On systems where the code can go wrong
> (i.e. when RAND_MAX + 1.0 == RAND_MAX), you would hope for a better
> error, but something is better than nothing.

That's an interesting angle that I hadn't considered. On such a system, 
RAND_MAX would have, I think, to be at least ten decimal digits. In 
fact, on my system it /is/ ten decimal digits - it's 2^31 - 1 - but a 
quick test reveals that RAND_MAX + 1.0 != RAND_MAX right here, anyway.

On a system such as you describe, with a decent PRNG, the code would 
still succeed (at least) 99.99999999% of the time! That doesn't make it 
/right/, obviously. An actuary of my acquaintance (hi James) would 
describe such code as "sweet" - meaning that it isn't actually right, 
but that nobody else would ever, ever know. :-)

I'm not advocating ignoring the problem, however. An assertion might be 
in order (for real code, at any rate).
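The assertion suggested above might look something like this; the predicate name is my own, and the check is a sketch of the idea rather than a full exactness proof:

```c
#include <stdlib.h>

/* Check that RAND_MAX + 1.0 is distinct from RAND_MAX in double, and
   that rand()'s largest possible value divided by RAND_MAX + 1.0 is
   strictly below 1.0, so that 1 + sides * (rand() / (RAND_MAX + 1.0))
   can never yield sides + 1. */
int rand_max_plus_one_is_exact(void)
{
    return RAND_MAX + 1.0 != (double)RAND_MAX
        && (double)RAND_MAX / (RAND_MAX + 1.0) < 1.0;
}
```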

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
4/23/2015 11:12:13 AM
On Thursday, April 23, 2015 at 11:44:42 AM UTC+1, Richard Heathfield wrote:
> On 23/04/15 11:35, Noob wrote:
>
> <snip>
>
> > I must be missing something obvious, because I don't see where
> > -Wmissing-prototypes would trigger spurious warnings.
> >
> > static functions are ignored, and non-static functions should
> > be declared in an included header.
>
> So to deal with the warning, we have to make functions static even in a
> single-module C source, where they are rather pointless. I think I'd
> rather lose the warning.

The documentation clearly describes the intended use of this warning -
and single-module C source is evidently not that.  You *deliberately*
and *specifically* turned on a particular warning that is *singularly,
clearly and explicitly* not intended for a single source module and
feel the need to point that fact out.

    "Use this option to detect global functions
that do not have a matching prototype declaration in a header file."

Very, very lame, Richard.
gwowen
4/23/2015 11:16:01 AM
On 23/04/15 11:56, Noob wrote:
> On 23/04/2015 12:44, Richard Heathfield wrote:
>
>> On 23/04/15 11:35, Noob wrote:
>>
>>> I must be missing something obvious, because I don't see where
>>> -Wmissing-prototypes would trigger spurious warnings.
>>>
>>> static functions are ignored, and non-static functions should
>>> be declared in an included header.
>>
>> So to deal with the warning, we have to make functions static even
>> in a single-module C source, where they are rather pointless.
>
> Disagree. 'static' tells the compiler: "there are no other
> users of this function, apart from those you see here".
> Helps the compiler with inlining decisions.

That smacks of premature optimisation. I realise some people do write 
code that's right on the ragged edge and they need to squeeze every last 
millicycle out of their source, but I would venture to suggest that for 
most programmers this is probably not actually essential. Their 
bottlenecks will generally lie elsewhere.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
4/23/2015 11:16:50 AM
On 23/04/15 12:16, gwowen wrote:
> On Thursday, April 23, 2015 at 11:44:42 AM UTC+1, Richard Heathfield wrote:
>> On 23/04/15 11:35, Noob wrote:
>>
>> <snip>
>>
>> > I must be missing something obvious, because I don't see where
>> > -Wmissing-prototypes would trigger spurious warnings.
>> >
>> > static functions are ignored, and non-static functions should
>> > be declared in an included header.
>>
>> So to deal with the warning, we have to make functions static even in a
>> single-module C source, where they are rather pointless. I think I'd
>> rather lose the warning.
>
> The documentation clearly describes the intended use of this warning - and single-module C source is evidently not that.  You *deliberately* and *specifically* turned on a particular warning that is *singularly, clearly and explicitly* not intended for a single source module and feel the need to point that fact out.
>
>      "Use this option to detect global functions
> that do not have a matching prototype declaration in a header file."
>
> Very, very lame, Richard.

I'm afraid it's even lamer than you think. Cargo cult warnings. I hadn't 
even /read/ the documentation.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
4/23/2015 11:19:59 AM
On 04/23/2015 03:09 AM, Richard Heathfield wrote:
....
> simboard.c:5:5: warning: no previous prototype for ‘die’ 
> [-Wmissing-prototypes]
> 
> The line under consideration was:
> 
> int die(int sides)
> 
> which *IS* a prototype! The compiler is telling me that this is the 
> first prototype it's seen for this function. What am I supposed to do 
> with that information - alert the media?

You have two options:
If the function is used only in the same module, declare it static. This
has a couple of advantages. It enables a warning message if the function
is never actually used in that same module, and it prevents the
function's name from polluting the space of names with external linkage.

If the function is used in modules other than the one it is used in,
there should be a single declaration of the function in a separate
header file, and that file should be #included in every module where
that function is used. Less obviously, but very important: it should
also be #included in the file where the function is defined. Putting it
there will provoke a diagnostic if the header file's declaration is
incompatible with the function definition.

I think both things are improvements over the original code (though the
second case is far more important than the first), and either one will
make the warning message go away.

I'm in full agreement with all of the other cases you discuss.
-- 
James Kuyper
James
4/23/2015 12:47:34 PM
On Thursday, April 23, 2015 at 12:10:02 AM UTC-7, Richard Heathfield wrote:
> In rec.puzzles, I was reading that the probability of hitting any square 
> on a sufficiently large board converges to 1/7th (assuming you're 
> rolling two fair six-sided dice to determine your progress along the board).
> 
> It was sufficiently early in the morning that this rather bland 
> assertion struck me as being curious, and I resolved (in my sleepy way) 
> to test it. So I spent five contented and rather sleepy minutes writing 
> a C program to confirm the obvious fact that the average 2D6 roll is 7 
> by playing a million games of "no snakes or ladders" on a rather large 
> board and seeing how often each square got landed on.
> 
> Having just devoted two whole paragraphs to explaining why this had 
> nothing whatsoever to do with spurious diagnostics (nothing could have 
> been further from my mind), I will now show you the diagnostics, all of 
> which I consider spurious:
> 
> simboard.c:5:5: warning: no previous prototype for 'die' 
> [-Wmissing-prototypes]
> 
> The line under consideration was:
> 
> int die(int sides)
> 
> which *IS* a prototype! The compiler is telling me that this is the 
> first prototype it's seen for this function. What am I supposed to do 
> with that information - alert the media?
> 
> On the other hand, do I want to suppress this message? Heck no. If I 
> actually /use/ a function without a prototype, I want the compiler to 
> tell me.
> 
> Can I improve the code to remove the warning? Well, experimentation 
> shows that I could write:
> 
> int foo(int);
> int foo(int x)
> 
> every time, but I'm not convinced that this is "improved". It's just one 
> more maintenance chore.
> 
> simboard.c: In function 'die':
> simboard.c:7:3: warning: conversion to 'int' from 'double' may alter its 
> value [-Wconversion]
> 
> Here's the line:
>    return 1 + sides * (rand() / (RAND_MAX + 1.0));
> 
> This is very much along the lines of the hypothetical "adding two 
> unsigned integers may result in a value lower than either operand", 
> which was thought to be so ridiculous in a recent thread. This line is 
> idiomatic, and there is no need whatsoever to change it.
> 
> simboard.c: At top level:
> simboard.c:10:8: warning: no previous prototype for 'mean' 
> [-Wmissing-prototypes]
> simboard.c:21:8: warning: no previous prototype for 'stddev' 
> [-Wmissing-prototypes]
> simboard.c:34:6: warning: no previous prototype for 'minimaxi' 
> [-Wmissing-prototypes]
> 
> All of these are "gosh, this is the first time I've seen this prototype" 
> messages.
> 
> simboard.c: In function 'minimaxi':
> simboard.c:38:3: warning: negative integer implicitly converted to 
> unsigned type [-Wsign-conversion]
> 
> I have a pointer to an unsigned long, which I wish to set to the lowest 
> value in a range of data. Here's the setup line before the loop:
> 
>    *low = -1;
> 
> Unsigned integer arithmetic *guarantees* that this will assign to *low 
> the largest possible unsigned long value. (Of course, the idea is that, 
> in the loop, it will be reset to any lower value encountered.)
> 
> The warning is 100% correct. Do I want to turn it off? No, because I'd 
> like to know whether I'm doing that sort of thing accidentally. But do I 
> want to change the code? No. The code as written is perfectly clear, and 
> needs no rewriting. A cast would be pointless obscurantism.
> 
> simboard.c: In function 'main':
> simboard.c:79:19: warning: conversion to 'size_t' from 'int' may change 
> the sign of the result [-Wsign-conversion]
> simboard.c:80:19: warning: conversion to 'size_t' from 'int' may change 
> the sign of the result [-Wsign-conversion]
> 
> Perfectly true, except that it won't. Here are the lines:
> 
>    score += die(6);
>    score += die(6);
> 
> The score object is indeed an unsigned type, but it's set to 0 inside 
> the loop where it's used, and then we add a couple of values in the 
> range 1 to 6 to it, so the biggest value it's going to reach is 12000 or 
> so (because that's how big the board is), so it can't ever wrap.
> 
> Do I want to suppress the warning? No, I'd like to be told about such 
> issues. Does the warning indicate an actual problem with this code? No. 
> Should I change the code to remove the warning? This is perhaps the 
> closest candidate for a change in the whole program, but even then I'm 
> very much in two minds about it. The obvious candidates for change are:
> 
> 1) change score to a signed type - but it's used as an index into an 
> array, so that's not so bright an idea
> 2) change die so that it returns an unsigned type - better, but what if 
> I had dice marked up with -3 to +2, say? The unsigned type then becomes 
> an unspoken assumption.

If the dice can have negative numbers, then 'score' MUST NOT be unsigned,
since the total sum CAN remain negative. 

If 'score' is used as index into an array, then the design of the program
is such that the dice MUST NOT have negative values.

-- 
Fred Kleinschmidt


<snip>
Fred
4/23/2015 2:39:51 PM
On 23/04/15 15:39, Fred K wrote:
> On Thursday, April 23, 2015 at 12:10:02 AM UTC-7, Richard Heathfield wrote:

<snip>

>> The obvious candidates for change are:
>>
>> 1) change score to a signed type - but it's used as an index into an
>> array, so that's not so bright an idea
>> 2) change die so that it returns an unsigned type - better, but what if
>> I had dice marked up with -3 to +2, say? The unsigned type then becomes
>> an unspoken assumption.
>
> If the dice can have negative numbers, then 'score' MUST NOT be unsigned,
> since the total sum CAN remain negative.

It would be a mild abuse of unsigned integer arithmetic, but it could be 
done as long as the indexing into the array were to be calculated modulo 
the array size.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
4/23/2015 3:20:22 PM
On 04/23/2015 11:20 AM, Richard Heathfield wrote:
....
> It would be a mild abuse of unsigned integer arithmetic, but it could be 
> done as long as the indexing into the array were to be calculated modulo 
> the array size.

Another alternative is used in some languages I know of: a negative
subscript refers to the first element of the array, a subscript equal to
or larger than the number of elements in the array refers to the last
element of the array. I've even seen algorithms written specifically for
such languages which explicitly relied upon this behavior to simplify
the algorithm - equivalent C code would have had to explicitly test for
those cases. Those languages don't actually avoid the test - they just
make it implicit in the subscript operation, wasting time when the code
is written to render those tests unnecessary.
James
4/23/2015 4:43:31 PM
Ian Collins <ian-news@hotmail.com> writes:
> Noob wrote:
>> Hello Richard,
>>
>> On 23/04/2015 09:09, Richard Heathfield wrote:
>>
>>> Warn if a global function is defined without a previous prototype
>>> declaration. This warning is issued even if the definition itself
>>> provides a prototype. Use this option to detect global functions that
>>> do not have a matching prototype declaration in a header file.
>>
>> I think the intent is clear:
>> - either the function is local, then it must be static
>> - or it is global, then you declare it in the appropriate header
>
> Nit-pick time!
>
> By the time the compiler can diagnose this condition, the preprocessor
> will have done its thing with include files.  Therefore there there is
> no way for it to tell whether a prototype is local or in a header.

Well, it *could*.  The preprocessor can (and in the case of gcc, does)
emit directives that the compiler could use to determine which source
file a given declaration appears in.  Note that compilers are able to
emit warning messages pointing to the exact file name and line number on
which a construct actually appears.  A compiler could use that same
information to decide whether or not to issue a warning.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
4/23/2015 6:52:27 PM
Noob <root@127.0.0.1> writes:
> On 23/04/2015 12:44, Richard Heathfield wrote:
>> On 23/04/15 11:35, Noob wrote:
>>> I must be missing something obvious, because I don't see where
>>> -Wmissing-prototypes would trigger spurious warnings.
>>>
>>> static functions are ignored, and non-static functions should
>>> be declared in an included header.
>> 
>> So to deal with the warning, we have to make functions static even
>> in a single-module C source, where they are rather pointless.
>
> Disagree. 'static' tells the compiler: "there are no other
> users of this function, apart from those you see here".
> Helps the compiler with inlining decisions.

More important, it means that the name isn't exported, so it won't
interfere with uses of the same identifier in other source files.

In my opinion, this is a case where C gets the default wrong.  If you
define an object or function at file scope, it should not be exported
(made visible via the linker to other translation units) unless you
explicitly ask for that.  My preferred workaround is to use "static"
for file-scope definitions unless I specifically want them to have
external linkage.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
4/23/2015 7:09:30 PM
Noob wrote:
> On 23/04/2015 12:44, Richard Heathfield wrote:
>
>> On 23/04/15 11:35, Noob wrote:
>>
>>> I must be missing something obvious, because I don't see where
>>> -Wmissing-prototypes would trigger spurious warnings.
>>>
>>> static functions are ignored, and non-static functions should
>>> be declared in an included header.
>>
>> So to deal with the warning, we have to make functions static even
>> in a single-module C source, where they are rather pointless.
>
> Disagree. 'static' tells the compiler: "there are no other
> users of this function, apart from those you see here".
> Helps the compiler with inlining decisions.

How?

-- 
Ian Collins
Ian
4/23/2015 7:52:55 PM
On 04/23/2015 03:52 PM, Ian Collins wrote:
> Noob wrote:
....
>> Disagree. 'static' tells the compiler: "there are no other
>> users of this function, apart from those you see here".
>> Helps the compiler with inlining decisions.
> 
> How?

If a function is NOT declared static, then actual code for that function
must be generated, just in case it gets called from some other
translation unit. Since an actual function body must be created anyway,
that provides a disincentive for inlining. The function can still be
inlined - but in marginal cases the existence of an actual function body
means it isn't worthwhile to inline it.

James
4/23/2015 8:15:33 PM
James Kuyper wrote:
> On 04/23/2015 03:52 PM, Ian Collins wrote:
>> Noob wrote:
> ....
>>> Disagree. 'static' tells the compiler: "there are no other
>>> users of this function, apart from those you see here".
>>> Helps the compiler with inlining decisions.
>>
>> How?
>
> If a function is NOT declared static, then actual code for that function
> must be generated, just in case it gets called from some other
> translation unit. Since an actual function body must be created anyway,
> that provides a disincentive for inlining. The function can still be
> inlined - but in marginal cases the existence of an actual function body
> means it isn't worthwhile to inline it.

Yes the code for the function body must be emitted, but that shouldn't 
have any influence on whether it actually gets inlined.

-- 
Ian Collins
Ian
4/23/2015 8:21:46 PM
On 04/23/2015 04:21 PM, Ian Collins wrote:
> James Kuyper wrote:
>> On 04/23/2015 03:52 PM, Ian Collins wrote:
>>> Noob wrote:
>> ....
>>>> Disagree. 'static' tells the compiler: "there are no other
>>>> users of this function, apart from those you see here".
>>>> Helps the compiler with inlining decisions.
>>>
>>> How?
>>
>> If a function is NOT declared static, then actual code for that function
>> must be generated, just in case it gets called from some other
>> translation unit. Since an actual function body must be created anyway,
>> that provides a disincentive for inlining. The function can still be
>> inlined - but in marginal cases the existence of an actual function body
>> means it isn't worthwhile to inline it.
> 
> Yes the code for the function body must be emitted, but that shouldn't 
> have any influence on whether it actually gets inlined.

Why not? Inlining is a space/time tradeoff. Let's consider the extreme
case where there's only one call to the function. If it's static, it's
almost trivially the case that the call should be inlined, pretty much
regardless of how big it is or how long it takes to execute, no matter
how small of an advantage it gains from being inlined (so long as it
actually is an advantage, rather than a disadvantage, and I believe that
this is the usual case).

However, if the function has external linkage, and an actual function
body must therefore be created anyway, just in case it does get called
externally, then whether or not it makes sense to inline the function
rather than calling the actual function depends upon all of those
things. For a sufficiently big function that takes a sufficiently long
amount of time to execute, if the time saved by inlining it is
sufficiently small, and if the space required by the inline version of
the code is sufficiently close to the size of the real function, I'd
expect a decent optimizer to favor saving space rather than time, by
calling the real function rather than inlining it.

James
4/23/2015 8:44:19 PM
Two nitpicks.

On Thursday, April 23, 2015 at 2:10:02 PM UTC+7, Richard Heathfield wrote:
>    *low = -1;
> The code as written is perfectly clear, and 
> needs no rewriting. A cast would be pointless obscurantism.

Au contraire, much clearer (for me) would be, for example
     *low = (unsigned)-1;
YMMV.  Some codings clear to me are less clear to others;
and, in this case, vice versa.  Yes, the cast is implicit
anyway *but only if I'm bothered to check low's type.*
In any event, I fail to see why such a cast is "obscurantism;"
it's rather *the opposite.*

>    return 1 + sides * (rand() / (RAND_MAX + 1.0));

That introduces a tiny bias.  If RAND_MAX is a power-of-2
the simplest change to get an unbiased variate is
     return 1 + sides * ((rand() | 1) / (RAND_MAX + 1.0));
Yes, the bias is minuscule unless RAND_MAX is small.
But nitpicking about random() usage is one thing we do here.  :-)

James Dow Allen

James
4/24/2015 8:40:02 AM
On 24/04/15 09:40, James Dow Allen wrote:
> Two nitpicks.
>
> On Thursday, April 23, 2015 at 2:10:02 PM UTC+7, Richard Heathfield wrote:
>>    *low = -1;
>> The code as written is perfectly clear, and
>> needs no rewriting. A cast would be pointless obscurantism.
>
> Au contraire, much clearer (for me) would be, for example
>       *low = (unsigned)-1;

You make my point well. The cast introduces a bug that is not present 
without it. If you re-read my OP, you will see that *low is an unsigned 
long, not an unsigned int.

> YMMV.

MMDoesIndeedV.

> Some codings clear to me are less clear to others;
> and, in this case, vice versa.  Yes, the cast is implicit
> anyway *but only if I'm bothered to check low's type.*

For the cast to be valid, you /have/ to check low's type, which you didn't.

> In any event, I fail to see why such a cast is "obscurantism;"
> it's rather *the opposite.*

James, you win some, you lose some. I think you just lost this one.

>
>>    return 1 + sides * (rand() / (RAND_MAX + 1.0));
>
> That introduces a tiny bias.  If RAND_MAX is a power-of-2
> the simplest change to get an unbiased variate is
>       return 1 + sides * ((rand() | 1) / (RAND_MAX + 1.0));
> Yes, the bias is minuscule unless RAND_MAX is small.
> But nitpicking about random() usage is one thing we do here.  :-)

Yes, it is. Nevertheless, for this particular (and ludicrously 
unimportant) exercise, I wasn't even remotely concerned about tiny 
biases. I'd have been concerned about *big* biases, but even with 
RAND_MAX at 32767 the bias isn't really all that big.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
4/24/2015 10:14:21 AM
James Dow Allen <jdallen2000@yahoo.com> writes:

> Two nitpicks.
>
> On Thursday, April 23, 2015 at 2:10:02 PM UTC+7, Richard Heathfield wrote:
<snip>
>>    return 1 + sides * (rand() / (RAND_MAX + 1.0));
>
> That introduces a tiny bias.  If RAND_MAX is a power-of-2
> the simplest change to get an unbiased variate is
>      return 1 + sides * ((rand() | 1) / (RAND_MAX + 1.0));

RAND_MAX is very unlikely to be a power of two, but if it is, this
change introduces a bug (in that the result can now be larger than
intended) whilst not removing the bias.

When RAND_MAX is one less than a power of two (the more usual case)
there is no bug, but the bias remains (though altered, of course).

<snip>
-- 
Ben.
Ben
4/24/2015 10:30:46 AM
On 23/04/15 22:44, James Kuyper wrote:
> On 04/23/2015 04:21 PM, Ian Collins wrote:
>> James Kuyper wrote:
>>> On 04/23/2015 03:52 PM, Ian Collins wrote:
>>>> Noob wrote:
>>> ....
>>>>> Disagree. 'static' tells the compiler: "there are no other
>>>>> users of this function, apart from those you see here".
>>>>> Helps the compiler with inlining decisions.
>>>>
>>>> How?
>>>
>>> If a function is NOT declared static, then actual code for that function
>>> must be generated, just in case it gets called from some other
>>> translation unit. Since an actual function body must be created anyway,
>>> that provides a disincentive for inlining. The function can still be
>>> inlined - but in marginal cases the existence of an actual function body
>>> means it isn't worthwhile to inline it.
>>
>> Yes the code for the function body must be emitted, but that shouldn't 
>> have any influence on whether it actually gets inlined.
> 
> Why not? Inlining is a space/time tradeoff. Let's consider the extreme
> case where there's only one call to the function. If it's static, it's
> almost trivially the case that the call should be inlined, pretty much
> regardless of how big it is or how long it takes to execute, no matter
> how small of an advantage it gains from being inlined (so long as it
> actually is an advantage, rather than a disadvantage, and I believe that
> this is the usual case).

The only exception to this would be if the compiler used likely/unlikely
paths to rearrange hot and cold sections of code to optimise cache hits.

(Or one might disable inlining of single-use functions for debugging
convenience.)

> 
> However, if the function has external linkage, and an actual function
> body must therefore be created anyway, just in case it does get called
> externally, then whether or not it makes sense to inline the function
> rather than calling the actual function depends upon all of those
> things. For a sufficiently big function that takes a sufficiently long
> amount of time to execute, if the time saved by inlining it is
> sufficiently small, and if the space required by the inline version of
> the code is sufficiently close to the size of the real function, I'd
> expect a decent optimizer to favor saving space rather than time, by
> calling the real function rather than inlining it.
> 

In particular, it is very common in smaller embedded systems to use
"optimise for space" optimisations ("-Os" in gcc terms), where you want
fast code but have a bit more emphasis on space.  A compiler should then
not inline a function that it has generated in its entirety (due to
external linkage), unless inlining leads to smaller code overall,
perhaps due to extra constant propagation optimisations.



David
4/24/2015 11:01:51 AM
On 4/23/2015 3:44 AM, Richard Heathfield wrote:

> #include <stdio.h>
> unsigned long seed;
> unsigned long prng(void)
> {
>    ++seed;
>    seed *= 0x87654321;
>    return seed >> 7; /* don't use this at home, folks! */
> }
>
> int main(void)
> {
>    printf("%lu\n", prng());
>    return 0;
> }
>
> -Wmissing-declarations would warn for seed, because there
> is no declaration preceding its definition.

Actually, it doesn't. Global variables don't need to be declared
before they're defined in the same compilation unit. A non-definition
declaration is required only if they're accessed in *other*
compilation units; those src files would need:

extern unsigned long seed;

That tells the compiler that the variable is defined in some other
file; the linker then resolves the reference to that definition.

But putting that declaration in the src file that defines the
variable is not only unnecessary but causes confusion and
possibility of errors. I used to maintain a large software
project using MANY *.c and *.h files and a large number of global
variables. (Not my doing, I swear.) To help unscramble that mess,
I moved all the global variables to one file only: "globals.h".
Every *.c file #include'd that file. The file went like this:

#ifndef GLOBALS_H
#define GLOBALS_H
#ifdef  MAIN_C
#define EXTERN
#define INIT = 0
#else
#define EXTERN extern
#define INIT
#endif
EXTERN long int       Aardvark   INIT ;
EXTERN double         Baklava    INIT ;
EXTERN unsigned long  Crunchy    INIT ;
EXTERN char           Delta[52]       ;
/* hundreds more variables */
#endif

That way, every global variable is declared, defined, and
initialized in one place only: "globals.h".  The copy that
gets included in "main.c" is the only copy that defines and
initializes the variables; the copy included in every other
*.c file just declares the variables "extern".

> It would also warn for prng(), because there is no declaration
> (a function declaration in this case) preceding its definition.

Yes.

What I get when I compile it with a whole mess of
needlessly pedantic warnings turned on is:

gcc  -I /rhe/include -pedantic -Wall -Wextra -Wfloat-equal
-Wshadow -Wcast-qual -Wcast-align -Wconversion -Winline -Wcomments
-Wundef -Wunused-macros -Wold-style-definition -Wmissing-prototypes
-Wmissing-declarations -Wnested-externs -std=c11 -Os -s declaration-test.c
-L/rhe/lib -lrh -lfl -ly -lm -o /rhe/bin/test/declaration-test.exe

declaration-test.c:3:15: warning:
no previous prototype for ‘prng’ [-Wmissing-prototypes]
  unsigned long prng(void)
                ^

> Fix 1: write a header in which unsigned long seed is declared
> extern, and in which prng is prototyped;
> Fix 2: make them static;
> Fix 3: don't use that warning.
> I think I'll go for Fix 3.

It can be fixed more easily:

#include <stdio.h>
unsigned long prng(void); /* added prototype */
unsigned long seed;
unsigned long prng(void)
{
   ++seed;
   seed *= 0x87654321;
   return seed >> 7;
}
int main(void)
{
   printf("%lu\n", prng());
   return 0;
}

Compilation:

gcc  -I /rhe/include -pedantic -Wall -Wextra -Wfloat-equal
-Wshadow -Wcast-qual -Wcast-align -Wconversion -Winline -Wcomments
-Wundef -Wunused-macros -Wold-style-definition -Wmissing-prototypes
-Wmissing-declarations -Wnested-externs -std=c11 -Os -s declaration-test.c
-L/rhe/lib -lrh -lfl -ly -lm -o /rhe/bin/test/declaration-test.exe

[no errors, no warnings]


If that was intended to generate pseudo-random numbers, though,
it fails miserably. On my system it prints 17746566 repeatedly.


-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley
Robbie
4/24/2015 11:42:53 AM
On 24/04/15 12:42, Robbie Hatley wrote:
>
<snip>
>
>
> If that was intended to generate pseudo-random numbers, though,
> it fails miserably. On my system it prints 17746566 repeatedly.


That doesn't surprise me. Nor does it surprise me that, with delicious 
irony, you ignored my warning.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
4/24/2015 3:07:48 PM
On Friday, April 24, 2015 at 5:14:33 PM UTC+7, Richard Heathfield wrote:
> On 24/04/15 09:40, James Dow Allen wrote:
> > Au contraire, much clearer (for me) would be, for example
> >       *low = (unsigned)-1;
> 
> You make my point well. The cast introduces a bug that is not present 
> without it. If you re-read my OP, you will see that *low is an unsigned 
> long, not an unsigned int.

No.  Obviously I would have checked the type of low
before changing the code; my post was hurried and illustrative.
The point is that writing code correctly ONCE is
easier than reading it correctly MANY times.

> If RAND_MAX is a power-of-2

Obviously (again) I meant one less than a power-of-two.

Geeez.  Can't you guys assume I wrote what was intended?
  :-)  :-)

James
James
4/24/2015 5:02:36 PM
On 24/04/15 18:02, James Dow Allen wrote:
> On Friday, April 24, 2015 at 5:14:33 PM UTC+7, Richard Heathfield wrote:
>> On 24/04/15 09:40, James Dow Allen wrote:
>> > Au contraire, much clearer (for me) would be, for example
>> >       *low = (unsigned)-1;
>>
>> You make my point well. The cast introduces a bug that is not present
>> without it. If you re-read my OP, you will see that *low is an unsigned
>> long, not an unsigned int.
>
> No.  Obviously I would have checked the type of low
> before changing the code; my post was hurried and illustrative.

Many code changes are hurried. After such a "simple" change, many people 
don't even bother to re-test. (I know it, you know it.) This hurried 
code change introduced a bug.

> The point is that writing code correctly ONCE is
> easier than reading it correctly MANY times.

True enough, but there is no substitute for knowing the language. The 
concept of reduction modulo (<TYPE>_MAX + 1) for unsigned integer types 
is integral to a proper understanding of C - unlike, say, the rote 
learning of p53 of K&R2; I wouldn't expect any C programmer to know the 
precedence table off by heart (especially as it is only illustrative, 
not a formal statement of how the rules are constructed), so I am quite 
in favour of adding parentheses where they clarify the intent of the 
author. But I *would* expect C programmers to know that setting an 
unsigned integer object's value to -1 would result in a reduction into 
the appropriate range in the obvious manner.

Once you add the cast, you have to convince yourself that it's correct, 
/every/ time you read the code (because casts are tricksy beasts, and 
are the source of many a bug). As you say, writing code correctly ONCE 
is easier than reading it correctly MANY times, so it's better to leave 
the cast out completely. And, if I were unkind, I would reiterate that 
it has the added advantage of not introducing a bug. But I'm not unkind, 
so I won't say that. Oh, I already did. Oops. ;-)

<snip>

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
4/24/2015 5:20:54 PM
On Saturday, April 25, 2015 at 12:21:03 AM UTC+7, Richard Heathfield wrote: 
> concept of reduction modulo (<TYPE>_MAX + 1) for unsigned integer types 
> is integral to a proper understanding of C

BUT when browsing the code quickly, it is easily overlooked that
the variable is unsigned rather than signed.  Initializing a
max/min aggregate to a very large or very small value (or often just
any negative one) is very common.  "-1" looks like a lower bound.
"(unsigned [whatever])-1" would be recognized as an upper bound.
Stated differently, writing "-1" and expecting the user to
immediately know that a large positive integer is intended is a form
of obfuscation.

I don't feel strongly about this.  But my coding is sometimes
denigrated for assuming reader knows, e.g. precedence rules, and
turnabout is fair play.  :-)

A key difference is that many complex expressions are understandable
by themselves, BUT your example requires that the reader be aware
of the specific type declared elsewhere (and often irrelevant
for quick browsing).  Of course anyone planning to modify the code
would know the type, but sometimes we browse large
chunks of code quickly for general comprehension.

James

0
James
4/24/2015 8:27:22 PM
James Dow Allen <jdallen2000@yahoo.com> writes:
<snip>
>> If RAND_MAX is a power-of-2
>
> Obviously (again) I meant one less than a power-of-two.
>
> Geeez.  Can't you guys assume I wrote what was intended?

That's a wise bit of advice, but I am completely stumped by what you
intended to say about removing the bias.

-- 
Ben.
0
Ben
4/24/2015 10:02:42 PM
Ian Collins <ian-news@hotmail.com> writes:

> Noob wrote:
>> Hello Richard,
>>
>> On 23/04/2015 09:09, Richard Heathfield wrote:
>>
>>> Warn if a global function is defined without a previous prototype
>>> declaration.  This warning is issued even if the definition itself
>>> provides a prototype.  Use this option to detect global functions that
>>> do not have a matching prototype declaration in a header file.
>>
>> I think the intent is clear:
>> - either the function is local, then it must be static
>> - or it is global, then you declare it in the appropriate header
>
> Nit-pick time!
>
> By the time the compiler can diagnose
> this condition, the preprocessor will
> have done its thing with include files.
> Therefore there there is no way for it
> to tell whether a prototype is local or
> in a header.

I suggest otherwise:


  File proto.h:

    int foo( double );


  File proto.c:

    #include "proto.h"

    int foo( int );

    int
    main(){
        return  0;
    }


  Result of gcc -ansi -pedantic proto.c -

    proto.c:3: error: conflicting types for 'foo'
    proto.h:1: error: previous declaration of 'foo' was here

0
Tim
4/25/2015 10:03:43 AM
James Dow Allen <jdallen2000@yahoo.com> wrote:

> Geeez.  Can't you guys assume I wrote what was intended?
>   :-)  :-)

The compiler can't...

Richard
0
raltbos
4/25/2015 10:25:00 AM
Richard Heathfield <rjh@cpax.org.uk> writes:

> simboard.c: In function 'die':
> simboard.c:7:3: warning: conversion to
> 'int' from 'double' may alter its value
> [-Wconversion]
>
> Here's the line:
>   return 1 + sides * (rand() / (RAND_MAX + 1.0));
>
> This is very much along the lines of the
> hypothetical "adding two unsigned
> integers may result in a value lower
> than either operand", which was thought
> to be so ridiculous in a recent
> thread.  This line is idiomatic, and
> there is no need whatsoever to change
> it.

I'm not sure you have realized what the complaint is about.
The problem is not the expression itself.  The problem is
that the value of the expression, which is of type (double),
is used as the return value for a function of type (int).
An analogous situation is

    int
    foo( int x ){
        return  x / 2.0;
    }

which IMO is a reasonable situation in which to issue a
diagnostic.  Not that I would use that class of diagnostics
all the time necessarily, but on those occasions when I'm
looking for "all suspicious cases" this one definitely
qualifies.
0
Tim
4/25/2015 10:32:14 AM
Richard Heathfield <rjh@cpax.org.uk> writes:

> simboard.c: In function 'main':
> simboard.c:79:19: warning: conversion to
> 'size_t' from 'int' may change the sign
> of the result [-Wsign-conversion]
> simboard.c:80:19: warning: conversion to
> 'size_t' from 'int' may change the sign
> of the result [-Wsign-conversion]
>
> Perfectly true, except that it won't.  Here are the lines:
>
>   score += die(6);
>   score += die(6);
>
> The score object is indeed an unsigned
> type, but it's set to 0 inside the loop
> where it's used, and then we add a
> couple of values in the range 1 to 6 to
> it, so the biggest value it's going to
> reach is 12000 or so (because that's how
> big the board is), so it can't ever
> wrap.
>
> Do I want to suppress the warning?  No,
> I'd like to be told about such
> issues.  Does the warning indicate an
> actual problem with this code?
> No.  Should I change the code to remove
> the warning?  This is perhaps the closest
> candidate for a change in the whole
> program, but even then I'm very much in
> two minds about it.  The obvious
> candidates for change are:
>
> 1) change score to a signed type - but
> it's used as an index into an array, so
> that's not so bright an idea
> 2) change die so that it returns an
> unsigned type - better, but what if I
> had dice marked up with -3 to +2, say?
> The unsigned type then becomes an
> unspoken assumption.

If die() might return a negative value, then the running
total 'score' should normally be signed as well.  Apparently
there is an unstated assumption that die() always returns
non-negative values, which suggests to me that die() should
be given an unsigned return type, at least until such time
as it needs to return a negative value, at which time
potential conversion difficulties like this one will
surface again.  That's my take anyway, fwiw.

If I might add an editorial comment, it looks like the
diagnostic option -Wsign-conversion has become so
restrictive that it is basically useless (ie, to use on a
regular basis, as opposed to occasional spot checks.)
0
Tim
4/25/2015 11:01:26 AM
On 4/24/2015 8:07 AM, Richard Heathfield wrote:

> ... Nor does it surprise me that,with delicious irony,
> you ignored my warning.

Warning? Ah, you must mean your "don't try this at home".
But Richard, that's the best way in the world to pique my
curiosity; when I see something like that, of *course* I'm
going to try it at home. :D


-- 
Cheers,
Robbie Hatley
Midway City, CA, USA
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
http://www.well.com/user/lonewolf/
https://www.facebook.com/robbie.hatley
0
Robbie
4/26/2015 5:08:17 AM
On 25/04/15 11:32, Tim Rentsch wrote:
> Richard Heathfield <rjh@cpax.org.uk> writes:
>
>> simboard.c: In function 'die':
>> simboard.c:7:3: warning: conversion to
>> 'int' from 'double' may alter its value
>> [-Wconversion]
>>
>> Here's the line:
>>   return 1 + sides * (rand() / (RAND_MAX + 1.0));
>>
>> This is very much along the lines of the
>> hypothetical "adding two unsigned
>> integers may result in a value lower
>> than either operand", which was thought
>> to be so ridiculous in a recent
>> thread.  This line is idiomatic, and
>> there is no need whatsoever to change
>> it.
>
> I'm not sure you have realized what the complaint is about.

I think I have.

> The problem is not the expression itself.

Agreed. The function returns int (because a die roll yields an integer 
value). The expression is calculated as a double because we want to 
multiply the maximum score of the die by a value in the range [0,1), a 
value that must be allowed to be non-integer (although it can still be 
an integer if it's 0), and then add 1 to it to give the final result. 
That result is still a double, but its floor is an integer value, and 
that floor is what actually gets returned.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
4/26/2015 8:33:03 AM
On 26/04/15 06:08, Robbie Hatley wrote:
>
> On 4/24/2015 8:07 AM, Richard Heathfield wrote:
>
>> ... Nor does it surprise me that,with delicious irony,
>> you ignored my warning.
>
> Warning? Ah, you must mean your "don't try this at home".
> But Richard, that's the best way in the world to pique my
> curiosity; when I see something like that, of *course* I'm
> going to try it at home. :D


No user-serviceable parts inside? *I'll* be the judge of that!

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
4/26/2015 1:29:15 PM
Richard Heathfield <rjh@cpax.org.uk> writes:

> On 25/04/15 11:32, Tim Rentsch wrote:
>> Richard Heathfield <rjh@cpax.org.uk> writes:
>>
>>> simboard.c: In function 'die':
>>> simboard.c:7:3: warning: conversion to
>>> 'int' from 'double' may alter its value
>>> [-Wconversion]
>>>
>>> Here's the line:
>>>   return 1 + sides * (rand() / (RAND_MAX + 1.0));
>>>
>>> This is very much along the lines of the
>>> hypothetical "adding two unsigned
>>> integers may result in a value lower
>>> than either operand", which was thought
>>> to be so ridiculous in a recent
>>> thread.  This line is idiomatic, and
>>> there is no need whatsoever to change
>>> it.
>>
>> I'm not sure you have realized what the complaint is about.
>
> I think I have.

Sorry, I was misled by your remark that the line was
idiomatic, whereas the idiom used (and indeed almost
everything in the whole expression) is incidental to
the diagnostic generated.

>> The problem is not the expression itself.
>
> Agreed.  The function returns int
> (because a die roll yields an integer
> value).  The expression is calculated as
> a double because we want to multiply the
> maximum score of the die by a value in
> the range [0,1), a value that we need to
> be able to be non-integer (although it
> might still be an integer if it's 0),
> and then add 1 to it to give the final
> result, which is what we return.  That
> result is still a double, but its floor
> is an integer value, and that is what
> actually gets returned.

Given what you're trying to do, I think it's better to
solve the problem entirely in integers:

    int
    die( int sides ){
        /* verify 1 <= sides <= RAND_MAX */
        int least = (RAND_MAX - sides + 1) % sides;
        int r;
        do  r = rand();  while(  least > r  );
     #  if  PERSNICKETY
            return  1 + (r-least) / (1 + (RAND_MAX - sides + 1) / sides);
     #  else
            return  1 + (r - least) % sides;
     #  endif
    }

This formulation avoids the diagnostic, probably runs faster,
and also eliminates bias.  A win all the way around.  (You can
do a '#define PERSNICKETY 1' to lessen dependence on low-order
bits if you think that's necessary.)
0
Tim
4/26/2015 2:37:20 PM
James Dow Allen <jdallen2000@yahoo.com> writes:

> Two nitpicks.
>
> On Thursday, April 23, 2015 at 2:10:02 PM UTC+7, Richard Heathfield wrote:
>>    *low = -1;
>> The code as written is perfectly clear, and 
>> needs no rewriting.  A cast would be pointless obscurantism.
>
> Au contraire, much clearer (for me) would be, for example
>      *low = (unsigned)-1;
> YMMV.  Some codings clear to me are less clear to others;
> and, in this case, vice versa.  Yes, the cast is implicit
> anyway *but only if I'm bothered to check low's type.*
> In any event, I fail to see why such a cast is "obscurantism;"
> it's rather *the opposite.*

I have a strong aversion to casts, especially in situations
like this one.  I agree however that this line merits
some sort of fix-up, since what's going on isn't obvious.
(That might be different if we were initializing rather
than assigning, because the type would be right there in the
declaration, but
we aren't.)  Taking both of these concerns into account, I
suggest

    *low = (uintmax_t){ -1 };

which surely does the right thing if the left-hand-side type is
any unsigned integer type, and also makes it obvious that setting
some unsigned type to all ones is what is intended.

>>    return 1 + sides * (rand() / (RAND_MAX + 1.0));
>
> That introduces a tiny bias.  If RAND_MAX is a power-of-2
> the simplest change to get an unbiased variate is
>      return 1 + sides * ((rand() | 1) / (RAND_MAX + 1.0));
Yes, the bias is minuscule unless RAND_MAX is small.
> But nitpicking about random() usage is one thing we do here.  :-)

I have to disagree on this one.  If RAND_MAX % sides != sides-1
then there is no way to get an unbiased sample without multiple
calls (potentially) to rand().  The /simplest/ change to get an
unbiased variate is probably this one:

    int
    die( int sides ){
        int r = rand();
        return  r < sides  ?  r + 1  :  die( sides );
    }

which I'm sure other folks will have no difficulty criticizing
or improving.  :)
0
Tim
4/26/2015 3:09:32 PM
Richard Heathfield <rjh@cpax.org.uk> wrote:

> On 26/04/15 06:08, Robbie Hatley wrote:
> >
> > On 4/24/2015 8:07 AM, Richard Heathfield wrote:
> >
> >> ... Nor does it surprise me that,with delicious irony,
> >> you ignored my warning.
> >
> > Warning? Ah, you must mean your "don't try this at home".
> > But Richard, that's the best way in the world to pique my
> > curiosity; when I see something like that, of *course* I'm
> > going to try it at home. :D
> 
> No user-serviceable parts inside? *I'll* be the judge of that!

As behooves the proper programmer. Hack it, and if (when...) it breaks,
hack it back.

Richard
0
raltbos
4/26/2015 3:45:45 PM
Tim Rentsch <txr@alumni.caltech.edu> writes:
[...]
> I have a strong aversion to casts, especially in situations
> like this one.  I agree however that this line merits
> some sort of fix-up, since what's going on isn't obvious.
> (That might be different if we were initializing rather
> than assigning, because the type would be right there in the
> declaration, but
> we aren't.)  Taking both of these concerns into account, I
> suggest
>
>     *low = (uintmax_t){ -1 };
>
> which surely does the right thing if the left-hand-side type is
> any unsigned integer type, and also makes it obvious that setting
> some unsigned type to all ones is what is intended.
[...]

The use of a compound literal for a scalar type might be a bit
confusing to some.  I'm not saying it *should* be, and I wouldn't
mind if that became a common idiom, but I'm not sure I've seen anyone
other than you take advantage of it.  A cast would be equivalent
and arguably clearer.  (It's also supported in C90, but uintmax_t
isn't, so ...).

But it doesn't *always* work correctly.  On 64-bit targets, gcc
supports types `__int128` and `unsigned __int128` as an extension --
but they're *not* treated as extended integer types, and intmax_t
and uintmax_t are still 64 bits.  Which means that if *low happens
to be of type unsigned __int128, then the above will assign an
incorrect value to it, whereas

    *low = -1;

would have worked correctly.

I've been told that some ABI requires intmax_t and uintmax_t
to be 64 bits.  IMHO that's a horrible requirement.
It doesn't forbid extended types wider than 64 bits, but
it forbids implementing them correctly.  But I just checked
http://www.x86-64.org/documentation/abi.pdf, and it doesn't mention
[u]intmax_t or <stdint.h>.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
0
Keith
4/26/2015 7:26:26 PM
Keith Thompson <kst-u@mib.org> writes:

> Tim Rentsch <txr@alumni.caltech.edu> writes:
> [...]
>> I have a strong aversion to casts, especially in situations
>> like this one.  I agree however that this line merits
>> some sort of fix-up, since what's going on isn't obvious.
>> (That might be different if we were initializing rather
>> than assigning, because the type would be right there in the
>> declaration, but
>> we aren't.)  Taking both of these concerns into account, I
>> suggest
>>
>>     *low = (uintmax_t){ -1 };
>>
>> which surely does the right thing if the left-hand-side type is
>> any unsigned integer type, and also makes it obvious that setting
>> some unsigned type to all ones is what is intended.
>
> [...]
>
> The use of a compound literal for a scalar type might be a bit
> confusing to some.  I'm not saying it *should* be, and I wouldn't
> mind if that became a common idiom, but I'm not sure I've seen
> anyone other than you take advantage of it.

I agree it's more likely to see a cast used in such cases, which
is probably largely historical.  I agree also that it would be
unfamiliar (not sure about confusing, but at least unfamiliar)
to many readers.  However that's not a reason not to use it -
if it is a better way to express what is intended (which IMO it
is), then it's better to make the switch.  Change has to start
somewhere. 

> A cast would be equivalent and arguably clearer.

A cast would be equivalent /in this case/, but not in similar
cases, and that is the point.  Using a compound literal in cases
like this is safer because it accepts only assignable types, not
the wider range of types that casts allow.  As for clearer, I
believe that is a mistaken identification;  it's what most
people are used to, but that doesn't make it clearer.  I would
argue that using a cast is less clear, because casts can effect
both safe and unsafe conversions, whereas compound literals can
effect only safe conversions (with the obvious disclaimers about
narrowing, void *, etc).  If I see a cast I have to wonder (if
even only briefly) whether the conversion is safe or unsafe,
but with compound literals there is no such concern.  IMO casts
should be limited to cases of unsafe conversions, and compound
literals used for all safe conversions.

> (It's also supported in C90, but uintmax_t isn't, so ...).

Right.

> But it doesn't *always* work correctly.  On 64-bit targets, gcc
> supports types `__int128` and `unsigned __int128` as an extension
> -- but they're *not* treated as extended integer types, and
> intmax_t and uintmax_t are still 64 bits.  Which means that if
> *low happens to be of type unsigned __int128, then the above will
> assign an incorrect value to it, whereas
>
>     *low = -1;
>
> would have worked correctly.

It does work correctly in the cases I said it would, where the
left-hand side is an unsigned integer type, since the __int128
types are not integer types as the Standard uses the term (kind
of what you pointed out).  However I take your point, and it is
a good one.

> I've been told that some ABI requires intmax_t and uintmax_t to
> be 64 bits.  IMHO that's a horrible requirement.

Beyond horrible.  Stupefyingly bad.

> It doesn't
> forbid extended types wider than 64 bits, but it forbids
> implementing them correctly.  But I just checked
> http://www.x86-64.org/documentation/abi.pdf, and it doesn't
> mention [u]intmax_t or <stdint.h>.

So, this raises the question - do you know of some actual
specification that limits [u]intmax_t to 64 bits?
0
Tim
4/28/2015 4:52:39 PM
Tim Rentsch <txr@alumni.caltech.edu> writes:
> Keith Thompson <kst-u@mib.org> writes:
[...]
>> I've been told that some ABI requires intmax_t and uintmax_t to
>> be 64 bits.  IMHO that's a horrible requirement.
>
> Beyond horrible.  Stupefyingly bad.
>
>> It doesn't
>> forbid extended types wider than 64 bits, but it forbids
>> implementing them correctly.  But I just checked
>> http://www.x86-64.org/documentation/abi.pdf, and it doesn't
>> mention [u]intmax_t or <stdint.h>.
>
> So, this raises the question - do you know of some actual
> specification that limits [u]intmax_t to 64 bits?

No, I don't.  The web page showing C++11 implementation status for
Clang, in the section on C99 features in C++11, says:

    Extended integral types: N/A

with a footnote that says:

    No compiler changes are required for an implementation such as Clang
    that does not provide any extended integer types. __int128 is not
    treated as an extended integer type, because changing intmax_t would
    be an ABI-incompatible change.

I don't know what ABI that's referring to.  As I said, the x86-64 ABI
document I was able to find (which isn't referenced from the Clang page)
doesn't mention [u]intmax_t, or even <stdint.h>.

There's a gcc bug report:

    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49595

that's been resolved as "invalid".  The response says that __int128 is
not an "extended integer type" as defined by the C99 standard, and
therefore the definition of [u]intmax_t needn't reflect its existence.
That's true, but no real justification is given for the (permitted but
IMHO flawed) decision to treat __int128 as an extension rather than as
an extended integer type.  A comment in the bug report claims that

    sizeof(intmax_t) is fixed by various LP64 ABIs and cannot be changed

but no support is given for that claim, even though someone else
specifically asked for specifics.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
0
Keith
4/28/2015 6:27:53 PM
On 28/04/15 19:27, Keith Thompson wrote:

<snip>

> A comment in the bug report claims that
>
>      sizeof(intmax_t) is fixed by various LP64 ABIs and cannot be changed

That's priceless. Next stop: intmaxmax_t

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
4/28/2015 7:09:43 PM
James Dow Allen <jdallen2000@yahoo.com> writes:

> On Thursday, April 23, 2015 at 2:10:02 PM UTC+7, Richard Heathfield wrote:
>>
>>    return 1 + sides * (rand() / (RAND_MAX + 1.0));
>
> That introduces a tiny bias.  If RAND_MAX is a power-of-2
> the simplest change to get an unbiased variate is
>      return 1 + sides * ((rand() | 1) / (RAND_MAX + 1.0));
> Yes, the bias is minuscule unless RAND_MAX is small.
> But nitpicking about random() usage is one thing we do here.  :-)

I posted a response on this earlier, giving a solution that was
at least somewhat tongue-in-cheek.

Thinking it over a little more though I arrived at this:

    int
    die( int sides ){
        int  least = (RAND_MAX - sides + 1) % sides,  r = rand();
        return  r < least  ?  die( sides )  :  1 + (r-least) % sides;
    }

This definition is suitable as an actual implementation:  it
makes use of as many different rand() values as possible (without
being biased), and also compiles to good code (on a fairly new
gcc, but I think earlier versions are also good).  In particular,
besides turning the tail call into a loop, gcc is smart enough to
hoist the computation of 'least' out of the loop so it is
calculated only once.
0
Tim
4/28/2015 7:30:41 PM
Tim Rentsch <txr@alumni.caltech.edu> might have writ, in 
news:kfnk2wwcaou.fsf@x-alumni2.alumni.caltech.edu:
> Thinking it over a little more though I arrived at this:
> 
>     int
>     die( int sides ){
>         int  least = (RAND_MAX - sides + 1) % sides,  r = rand();
>         return  r < least  ?  die( sides )  :  1 + (r-least) % sides;
>     }

Sigh.  :-)  It's bad enough to reinvent a wheel, but to reinvent an 
(inferior)  wheel that's already reinvented thrice a year just in c.l.c. 
is downright weird. (And since you've gone all-out for keystroke 
minimization, the useless "-least" near the end is odd.  :-)

If a goal is to reduce calls to rand(), why not just get almost TWELVE 
random die flips from a single 32-bit rand(), instead of only (almost) 
ONE as you do here?  I showed how, months ago, in code you didn't seem 
to understand.

And if a goal is simplifying source code, why not just invoke a ... 
(gasp!) ... library function
    roll = rand_index(6) + 1;

Tim can't even claim ignorance of these techniques; they were mentioned 
a few months ago and Tim stepped in with (ignorant) commentary.

James Dow Allen
0
James
4/29/2015 4:07:12 AM
Keith Thompson <kst-u@mib.org> writes:

> Tim Rentsch <txr@alumni.caltech.edu> writes:
>> Keith Thompson <kst-u@mib.org> writes:
>
> [...]
>
>>> I've been told that some ABI requires intmax_t and uintmax_t to
>>> be 64 bits.  IMHO that's a horrible requirement.
>>
>> Beyond horrible.  Stupefyingly bad.
>>
>>> It doesn't
>>> forbid extended types wider than 64 bits, but it forbids
>>> implementing them correctly.  But I just checked
>>> http://www.x86-64.org/documentation/abi.pdf, and it doesn't
>>> mention [u]intmax_t or <stdint.h>.
>>
>> So, this raises the question - do you know of some actual
>> specification that limits [u]intmax_t to 64 bits?
>
> No, I don't.  The web page showing C++11 implementation status for
> Clang, in the section on C99 features in C++11, says:
>
>     Extended integral types:  N/A
>
> with a footnote that says:
>
>     No compiler changes are required for an implementation such as Clang
>     that does not provide any extended integer types.  __int128 is not
>     treated as an extended integer type, because changing intmax_t would
>     be an ABI-incompatible change.
>
> I don't know what ABI that's referring to.  As I said, the x86-64 ABI
> document I was able to find (which isn't referenced from the Clang page)
> doesn't mention [u]intmax_t, or even <stdint.h>.
>
> There's a gcc bug report:
>
>     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49595
>
> that's been resolved as "invalid".  The response says that __int128 is
> not an "extended integer type" as defined by the C99 standard, and
> therefore the definition of [u]intmax_t needn't reflect its existence.
> That's true, but no real justification is given for the (permitted but
> IMHO flawed) decision to treat __int128 as an extension rather than as
> an extended integer type.  A comment in the bug report claims that
>
>     sizeof(intmax_t) is fixed by various LP64 ABIs and cannot be changed
>
> but no support is given for that claim, even though someone else
> specifically asked for specifics.

Thank you for the information.  Following this, I decided to do
some web research.  Based on what I read, I have a theory of
sorts, which is this.  Operating systems such as Linux supply a
system call library, to access functions in the OS.  What is
needed by that library defines the ABI for that system.  Rightly
or wrongly, the system-supplied library is thought to include
(some portion of) a standard C library.  Since the OS now "owns"
those functions, the ABI for that OS covers any types used in
said library.  The types [u]intmax_t are among those types,
because they may be used for arguments given to printf (ie, there
is a format specifier that expects such types).  Hence the ABI
for the OS includes these types as part of its specifications.

Some selected quotes:

   "(b) The <stdint.h> types may need to be compatible with
    those chosen by vendors.  For intmax_t, this may affect the
    printf ABI."

   "Indeed. The underlying type of all typedefs is part of the
    ABI if you're considering C++.  [...]  I think [u]intmax_t
    should just be moved back to alltypes.h with the correct
    per-arch definitions."

   "Unfortunately, not many software developers rush towards
    binary compatibility - for instance, the Microsoft compilers
    use their own C++ ABI. And because Morpher produces binary
    object files, users of compilers that do not stick to the
    Generic C++ ABI can face compatibility problems."

   "Most platforms have a well-defined ABI that covers C code,
    but ABIs that cover C++ functionality are not yet common."

For reference here are URLS for pages that turned up in my search
(some of which may be incidental to the question).  If you look
into this question further I'd be interested to hear any other
conclusions you might reach.

http://gcc.gnu.org/ml/gcc/2000-07/msg00142.html
http://www.risc.jku.at/education/courses/ws2003/intropar/origin-new/MproCplrDbx_TG/sgi_html/ch06.html
http://upstream-tracker.org/compat_reports/jemalloc/3.4.1_to_3.5.1/abi_compat_report.html
http://www.openwall.com/lists/musl/2013/02/09/3
http://morpher.com/documentation/articles/abi/
http://gcc.gnu.org/onlinedocs/gcc/Compatibility.html
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2270.html
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2130.html
http://developerblog.redhat.com/2014/10/23/comparing-abis-for-compatibility-with-libabigail-part-1/
http://infocenter.arm.com/help/topic/com.arm.doc.dui0774a/chr1383660321827.html
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20140224/100324.html
http://herbsutter.com/2012/12/04/compatibility/
0
Tim
4/30/2015 1:01:01 AM
James Dow Allen <gmail@jamesdowallen.nospam> writes:

> Tim Rentsch <txr@alumni.caltech.edu> might have writ, in 
> news:kfnk2wwcaou.fsf@x-alumni2.alumni.caltech.edu:
>> Thinking it over a little more though I arrived at this:
>>
>>     int
>>     die( int sides ){
>>         int  least = (RAND_MAX - sides + 1) % sides,  r = rand();
>>         return  r < least  ?  die( sides )  :  1 + (r-least) % sides;
>>     }
>
> Sigh.  :-)  It's bad enough to reinvent a wheel, but to reinvent an 
> (inferior)  wheel that's already reinvented thrice a year just in c.l.c. 
> is downright weird.  (And since you've gone all-out for keystroke 
> minimization, the useless "-least" near the end is odd.  :-)
>
> If a goal is to reduce calls to rand(), why not just get almost TWELVE 
> random die flips from a single 32-bit rand(), instead of only (almost) 
> ONE as you do here?  I showed how, months ago, in code you didn't seem 
> to understand.
>
> And if a goal is simplifying source code, why not just invoke a ... 
> (gasp!) ... library function
>     roll = rand_index(6) + 1;
>
> Tim can't even claim ignorance of these techniques;  they were mentioned 
> a few months ago and Tim stepped in with (ignorant) commentary.

The point of my posting was to illustrate a general technique, in
case other people might be interested in that.  The particular
functionality supplied is incidental to the point I was trying to
make.  I'm sorry that didn't come across more clearly.

As far as the function itself goes, whatever its shortcomings may
be, it does have the useful property that it correctly shows how
to accomplish a non-biased uniform sample, which the attempt made
upthread (ie, in the posting to which I was responding) did not.
So it seems there is some reason to think explaining this again
is useful (and of course there may be readers now who missed any
earlier postings on the topic).
0
Tim
4/30/2015 2:48:44 PM
Keith Thompson <kst-u@mib.org> writes:
> Tim Rentsch <txr@alumni.caltech.edu> writes:
> > Keith Thompson <kst-u@mib.org> writes:
> [...]
> >> I've been told that some ABI requires intmax_t and uintmax_t to
> >> be 64 bits.  IMHO that's a horrible requirement.
> >
> > Beyond horrible.  Stupefyingly bad.

I think I'm somewhere between you two on that point.

> >> It doesn't
> >> forbid extended types wider than 64 bits, but it forbids
> >> implementing them correctly.  But I just checked
> >> http://www.x86-64.org/documentation/abi.pdf, and it doesn't
> >> mention [u]intmax_t or <stdint.h>.
> >
> > So, this raises the question - do you know of some actual
> > specification that limits [u]intmax_t to 64 bits?
> 
> No, I don't.  The web page showing C++11 implementation status for
> Clang, in the section on C99 features in C++11, says:
> 
>     Extended integral types: N/A
> 
> with a footnote that says:
> 
>     No compiler changes are required for an implementation such as Clang
>     that does not provide any extended integer types. __int128 is not
>     treated as an extended integer type, because changing intmax_t would
>     be an ABI-incompatible change.
> 
> I don't know what ABI that's referring to.  As I said, the x86-64 ABI
> document I was able to find (which isn't referenced from the Clang page)
> doesn't mention [u]intmax_t, or even <stdint.h>.

Looks like proof by assertion.

> There's a gcc bug report:
> 
>     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49595
> 
> that's been resolved as "invalid".  The response says that __int128 is
> not an "extended integer type" as defined by the C99 standard, and
> therefore the definition of [u]intmax_t needn't reflect its existence.
> That's true, but no real justification is given for the (permitted but
> IMHO flawed) decision to treat __int128 as an extension rather than as
> an extended integer type.

I presume it does all the usual (extended) integer type behaviour
(appropriate conversion rank and associated automatic conversions/
promotions)? How does GCC explain that away - does it list all of
the usual behaviour of integer types and say that despite 128-bit
types not being integer types they behave the same way?

>  A comment in the bug report claims that
> 
>     sizeof(intmax_t) is fixed by various LP64 ABIs and cannot be changed
> 
> but no support is given for that claim, even though someone else
> specifically asked for specifics.

Ah, proof by assertion again, always a winner. At least they didn't
just point the finger at Clang and say "we're just doing what they
said was right".

Phil
-- 
A well regulated militia, being necessary to the security of a free state,
the right of the people to keep and bear arms, shall be well regulated.
Phil
5/9/2015 10:14:42 AM
Phil Carmody <pc+usenet@asdf.org> writes:

> Keith Thompson <kst-u@mib.org> writes:
>> Tim Rentsch <txr@alumni.caltech.edu> writes:
>>> Keith Thompson <kst-u@mib.org> writes:
>>
>> [...]
>>
>>>> I've been told that some ABI requires intmax_t and uintmax_t to
>>>> be 64 bits.  IMHO that's a horrible requirement.
>>>
>>> Beyond horrible.  Stupefyingly bad.
>
> I think I'm somewhere between you two on that point.

Oh really?  I thought "stupefyingly bad" was a fairly mild
increase over "horrible" - ie, not so extreme as, eg,
"horrifically grotesque".  :)

>>>> It doesn't
>>>> forbid extended types wider than 64 bits, but it forbids
>>>> implementing them correctly.  But I just checked
>>>> http://www.x86-64.org/documentation/abi.pdf, and it doesn't
>>>> mention [u]intmax_t or <stdint.h>.
>>>
>>> So, this raises the question - do you know of some actual
>>> specification that limits [u]intmax_t to 64 bits?
>>
>> No, I don't.  The web page showing C++11 implementation status for
>> Clang, in the section on C99 features in C++11, says:
>>
>>     Extended integral types:  N/A
>>
>> with a footnote that says:
>>
>>     No compiler changes are required for an implementation such as
>>     Clang that does not provide any extended integer types.
>>     __int128 is not treated as an extended integer type, because
>>     changing intmax_t would be an ABI-incompatible change.
>>
>> I don't know what ABI that's referring to.  As I said, the x86-64
>> ABI document I was able to find (which isn't referenced from the
>> Clang page) doesn't mention [u]intmax_t, or even <stdint.h>.
>
> Looks like proof by assertion.

I think the remark is meant only as a statement, not as a proof.

>> There's a gcc bug report:
>>
>>     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49595
>>
>> that's been resolved as "invalid".  The response says that __int128
>> is not an "extended integer type" as defined by the C99 standard,
>> and therefore the definition of [u]intmax_t needn't reflect its
>> existence.  That's true, but no real justification is given for the
>> (permitted but IMHO flawed) decision to treat __int128 as an
>> extension rather than as an extended integer type.
>
> I presume it does all the usual (extended) integer type behaviour
> (appropriate conversion rank and associated automatic conversions/
> promotions)?  How does GCC explain that away - does it list all of
> the usual behaviour of integer types and say that despite 128-bit
> types not being integer types they behave the same way?

The types __[u]int128 satisfy some, but not all, requirements
that the Standard gives for standard/extended integer types.
This page

    http://gcc.gnu.org/onlinedocs/gcc/_005f_005fint128.html

says this:

    As an extension the integer scalar type __int128 is
    supported for targets which have an integer mode wide enough
    to hold 128 bits. Simply write __int128 for a signed 128-bit
    integer, or unsigned __int128 for an unsigned 128-bit
    integer. There is no support in GCC for expressing an
    integer constant of type __int128 for targets with long long
    integer less than 128 bits wide.

Note that there are other considerations besides promotion and
conversion rank.  For example, are int128_t or PRI*128 defined?
(I believe they are not.)  Also there are implications for the
preprocessor, where all integer types have a width matching
[u]intmax_t, but not __[u]int128.

>>  A comment in the bug report claims that
>>
>>     sizeof(intmax_t) is fixed by various LP64 ABIs and cannot be
>>     changed
>>
>> but no support is given for that claim, even though someone else
>> specifically asked for specifics.
>
> Ah, proof by assertion again, always a winner.  At least they
> didn't just point the finger at Clang and say "we're just doing
> what they said was right".

Did you see my followup note?  There is some substance to the
claim, because of practical realities concerning how linking
is done with third-party libraries.  To me this indicates a
flaw somewhere, but it isn't clear just where.
Tim
5/10/2015 3:23:15 PM
On 04/28/2015 02:27 PM, Keith Thompson wrote:
....
>     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49595
> 
> that's been resolved as "invalid".  The response says that __int128 is
> not an "extended integer type" as defined by the C99 standard, and
> therefore the definition of [u]intmax_t needn't reflect its existence.
> That's true, but no real justification is given for the (permitted but
> IMHO flawed) decision to treat __int128 as an extension rather than as
> an extended integer type.  A comment in the bug report claims that

That's a perfectly feasible approach - but it has certain consequences:
if __int128 isn't an extended integer type (and it's certainly not a
standard integer type), it can't qualify as an integer type. The
standard doesn't allow for extended floating point types, so it can't
qualify as a floating point type (and probably wouldn't meet the
appropriate requirements if the standard did allow extended floating
point types). Therefore, it can't qualify as an arithmetic type, either.
By similar logic, it can't qualify as a scalar type.

Therefore, wherever an integer, arithmetic, or scalar type is required
to avoid a constraint violation, a diagnostic must be issued if
__int128 is used.

With the appropriate options chosen, does gcc issue the required
diagnostics? I'd test it myself, but I'm very short of spare time right now.
-- 
James Kuyper
James
5/10/2015 4:20:54 PM
On 5/10/15 12:20 PM, James Kuyper wrote:
> On 04/28/2015 02:27 PM, Keith Thompson wrote:
> ...
>>      https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49595
>>
>> that's been resolved as "invalid".  The response says that __int128 is
>> not an "extended integer type" as defined by the C99 standard, and
>> therefore the definition of [u]intmax_t needn't reflect its existence.
>> That's true, but no real justification is given for the (permitted but
>> IMHO flawed) decision to treat __int128 as an extension rather than as
>> an extended integer type.  A comment in the bug report claims that
>
> That's a perfectly feasible approach - but it has certain consequences:
> if __int128 isn't an extended integer type (and it's certainly not a
> standard integer type), it can't qualify as an integer type. The
> standard doesn't allow for extended floating point types, so it can't
> qualify as a floating point type (and probably wouldn't meet the
> appropriate requirements if the standard did allow extended floating
> point types). Therefore, it can't qualify as an arithmetic type, either.
> By similar logic, it can't qualify as a scalar type.
>
> Therefore, wherever an integer, arithmetic, or scalar type is required
> to avoid a constraint violation, a diagnostic must be issued if
> __int128 is used.
>
> With the appropriate options chosen, does gcc issue the required
> diagnostics? I'd test it myself, but I'm very short of spare time right now.
>

Not diagnostic*s* but diagnostic. The standard makes no requirement as 
to the number of diagnostics generated for an "incorrect" program. A 
compiler could just say "You have errors" if a constraint is violated 
and be conforming. There is also no requirement that a diagnostic be an 
"error", so a perfectly conforming diagnostic would be:

Info: __int128 used, printed just once per translation unit.


Richard
5/10/2015 5:22:27 PM
On 10/05/15 18:22, Richard Damon wrote:
> On 5/10/15 12:20 PM, James Kuyper wrote:

<snip>

>> With the appropriate options chosen, does gcc issue the required
>> diagnostics? I'd test it myself, but I'm very short of spare time right now.
>>
>
> Not diagnostic*s* but diagnostic.

I might beg to differ. :-)

The term "diagnostics" is often used in the same grammatical sense as 
"statistics" or "pediatrics" (indeed, the Standard itself uses the word 
in that way), and so the way in which James used the word is perfectly 
correct (assuming that he meant it in that way). The word "diagnostic" 
is an adjective, not a noun. The term "issue a diagnostic" is a common 
one, and perhaps acceptable for that reason, but the proper term should 
be "issue a diagnostic message" (which is the way in which the Standard 
uses the term when it is referring to the issuing of a message).

If James meant "diagnostics" in the sense of a plural, however, then I 
agree with your correction.

<snip>

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
5/10/2015 5:44:25 PM
Tim Rentsch <txr@alumni.caltech.edu> writes:
[...]
> The types __[u]int128 satisfy some, but not all, requirements
> that the Standard gives for standard/extended integer types.

The types are `__int128` and `unsigned __int128`.  `__int128` is treated
as an implementation-defined keyword.

The main requirement that they fail to satisfy is documentation that
they're extended integer types.  The gcc manual specifically says that
gcc doesn't support any extended integer types.

> This page
>
>     http://gcc.gnu.org/onlinedocs/gcc/_005f_005fint128.html
>
> says this:
>
>     As an extension the integer scalar type __int128 is
>     supported for targets which have an integer mode wide enough
>     to hold 128 bits. Simply write __int128 for a signed 128-bit
>     integer, or unsigned __int128 for an unsigned 128-bit
>     integer. There is no support in GCC for expressing an
>     integer constant of type __int128 for targets with long long
>     integer less than 128 bits wide.
>
> Note that there are other considerations besides promotion and
> conversion rank.  For example, are int128_t or PRI*128 defined?

No.

> (I believe they are not.)  Also there are implications for the
> preprocessor, where all integer types have a width matching
> [u]intmax_t, but not __[u]int128.

gcc's preprocessor warns about integer constants bigger than 2**64-1.

[...]

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
5/10/2015 6:06:07 PM
James Kuyper <jameskuyper@verizon.net> writes:
[...]
> Therefore, wherever an integer, arithmetic, or scalar type is required
> to avoid a constraint violation, a diagnostic must be issued if
> __int128 is used.
>
> With the appropriate options chosen, does gcc issue the required
> diagnostics? I'd test it myself, but I'm very short of spare time right now.

Yes.  With "gcc -std=cNN -pedantic" (where NN is 90, 99, or 11), it
warns:

c.c:2:5: warning: ISO C does not support ‘__int128’ type [-Wpedantic]
     __int128 n = 0;
     ^

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
5/10/2015 6:12:26 PM
James Kuyper <jameskuyper@verizon.net> writes:

> On 04/28/2015 02:27 PM, Keith Thompson wrote:
> ...
>>     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49595
>>
>> that's been resolved as "invalid".  The response says that __int128 is
>> not an "extended integer type" as defined by the C99 standard, and
>> therefore the definition of [u]intmax_t needn't reflect its existence.
>> That's true, but no real justification is given for the (permitted but
>> IMHO flawed) decision to treat __int128 as an extension rather than as
>> an extended integer type.  A comment in the bug report claims that
>
> That's a perfectly feasible approach - but it has certain consequences:
> if __int128 isn't an extended integer type (and it's certainly not a
> standard integer type), [snip]

The last part there is ambiguous, and I'm not sure what is meant.
It may be true that __int128 is not a standard integer type in
a particular implementation, but __int128 can refer to a standard
integer type in a(nother) conforming implementation.
Tim
5/12/2015 4:46:36 AM
Keith Thompson <kst-u@mib.org> writes:

> Tim Rentsch <txr@alumni.caltech.edu> writes:
> [...]
>> The types __[u]int128 satisfy some, but not all, requirements
>> that the Standard gives for standard/extended integer types.
>
> The types are `__int128` and `unsigned __int128`.

Ahh, okay, thank you for the correction.

> `__int128` is treated
> as an implementation-defined keyword.
>
> The main requirement that they fail to satisfy is documentation that
> they're extended integer types.  The gcc manual specifically says that
> gcc doesn't support any extended integer types.

I was thinking of implicit requirements like their ranges being
a subset of the range of [u]intmax_t.  (I don't know if there
are others, I didn't try to make an exhaustive list.)

>> This page
>>
>>     http://gcc.gnu.org/onlinedocs/gcc/_005f_005fint128.html
>>
>> says this:
>>
>>     As an extension the integer scalar type __int128 is
>>     supported for targets which have an integer mode wide enough
>>     to hold 128 bits.  Simply write __int128 for a signed 128-bit
>>     integer, or unsigned __int128 for an unsigned 128-bit
>>     integer.  There is no support in GCC for expressing an
>>     integer constant of type __int128 for targets with long long
>>     integer less than 128 bits wide.
>>
>> Note that there are other considerations besides promotion and
>> conversion rank.  For example, are int128_t or PRI*128 defined?
>
> No.
>
>> (I believe they are not.)  Also there are implications for the
>> preprocessor, where all integer types have a width matching
>> [u]intmax_t, but not __[u]int128.
>
> gcc's preprocessor warns about integer constants bigger than 2**64-1.

Right, which presumably it would not do if __int128 were
considered by gcc to be an extended integer type.
Tim
5/12/2015 4:52:52 AM
Tim Rentsch <txr@alumni.caltech.edu> writes:
> James Kuyper <jameskuyper@verizon.net> writes:
>> On 04/28/2015 02:27 PM, Keith Thompson wrote:
>> ...
>>>     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49595
>>>
>>> that's been resolved as "invalid".  The response says that __int128 is
>>> not an "extended integer type" as defined by the C99 standard, and
>>> therefore the definition of [u]intmax_t needn't reflect its existence.
>>> That's true, but no real justification is given for the (permitted but
>>> IMHO flawed) decision to treat __int128 as an extension rather than as
>>> an extended integer type.  A comment in the bug report claims that
>>
>> That's a perfectly feasible approach - but it has certain consequences:
>> if __int128 isn't an extended integer type (and it's certainly not a
>> standard integer type), [snip]
>
> The last part there is ambiguous, and I'm not sure what is meant.
> It may be true that __int128 is not a standard integer type in
> a particular implementation, but __int128 can refer to a standard
> integer type in a(nother) conforming implementation.

I was referring mostly to gcc.  (A side point: __int128 isn't
supported on all targets.)

The standard integer types are _Bool and signed and unsigned char,
short, int, long, and long long.  The integer types are those plus
any extended integer types.  (Plain char doesn't seem to fit into
that scheme, which I find odd.)

__int128 could be the same type as long long, for example (it's in the
implementation's reserved namespace so it could be just about
*anything*).  But there wouldn't be much point in defining it that way,
since it should also be int128_t.  (That's me, not the standard, saying
it "should" be defined as int128_t.)

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
5/12/2015 5:11:56 PM
On 12/05/15 18:11, Keith Thompson wrote:

<snip>

> The standard integer types are _Bool and signed and unsigned char,
> short, int, long, and long long.  The integer types are those plus
> any extended integer types.  (Plain char doesn't seem to fit into
> that scheme, which I find odd.)

It /is/ odd, but for fairly unremarkable historical reasons. A char has 
to be able to store, with a non-negative value, every character in the 
basic execution character set. EBCDIC has some fairly important 
characters with code points greater than 127. On an EBCDIC system in 
which 8-bit char is considered desirable, implementations are more or 
less forced to make char unsigned.

We have some choices:

1) outlaw EBCDIC;
2) force ASCII-based implementations to make char unsigned;
3) make char at least 9 bits rather than 8;
4) live with a weird char, like we do now.

If we do (1), IBM will sulk.

If we do (2), GNU, Microsoft, etc. will sulk (and we'll introduce a 
different anomaly: that of char being unsigned by default whereas short, 
int, long, and long long are signed by default).

If we do (3), *everybody* will sulk.

But if we go for (4), we don't have to change anything, which is always 
a popular option.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
5/12/2015 5:26:04 PM
Keith Thompson <kst-u@mib.org> writes:

> Tim Rentsch <txr@alumni.caltech.edu> writes:
>> James Kuyper <jameskuyper@verizon.net> writes:
>>> On 04/28/2015 02:27 PM, Keith Thompson wrote:
>>> ...
>>>>     https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49595
>>>>
>>>> that's been resolved as "invalid".  The response says that __int128 is
>>>> not an "extended integer type" as defined by the C99 standard, and
>>>> therefore the definition of [u]intmax_t needn't reflect its existence.
>>>> That's true, but no real justification is given for the (permitted but
>>>> IMHO flawed) decision to treat __int128 as an extension rather than as
>>>> an extended integer type.  A comment in the bug report claims that
>>>
>>> That's a perfectly feasible approach - but it has certain consequences:
>>> if __int128 isn't an extended integer type (and it's certainly not a
>>> standard integer type), [snip]
>>
>> The last part there is ambiguous, and I'm not sure what is meant.
>> It may be true that __int128 is not a standard integer type in
>> a particular implementation, but __int128 can refer to a standard
>> integer type in a(nother) conforming implementation.
>
> I was referring mostly to gcc.

So was I.  In fact I mentioned it because I saw the possibility
mentioned somewhere in connection with gcc.

> (A side point: __int128 isn't supported on all targets.)

No, this is C, why should it be? ;)

> The standard integer types are _Bool and signed and unsigned char,
> short, int, long, and long long.  The integer types are those plus
> any extended integer types.  (Plain char doesn't seem to fit into
> that scheme, which I find odd.)

Eh?  The integer types are char, unsigned integer types, signed
integer types, and enumeration types.  You must be thinking of
something else.

> __int128 could be the same type as long long, for example (it's in
> the implementation's reserved namespace so it could be just about
> *anything*).  But there wouldn't be much point in defining it that
> way, since it should also be int128_t.  (That's me, not the
> standard, saying it "should" be defined as in128_t.)

Offhand I can think of two reasons.  One, assuming the type is
supported, since __int128 is a keyword extension, it might be
more convenient to define it than not.  Two, the requirements
for __int128 are less stringent than they are for int128_t -
some implementations might provide __int128 even though they
couldn't (conveniently) provide int128_t.  Oh, make that three
reasons:  three, if int128_t is going to be provided, it is
almost certainly easier to define it in terms of __int128
than it is with #if's having 'long long' as one of the choices.
Tim
5/12/2015 6:23:02 PM
Richard Heathfield <rjh@cpax.org.uk> writes:
> On 12/05/15 18:11, Keith Thompson wrote:
>
> <snip>
>
>> The standard integer types are _Bool and signed and unsigned char,
>> short, int, long, and long long.  The integer types are those plus
>> any extended integer types.  (Plain char doesn't seem to fit into
>> that scheme, which I find odd.)
>
> It /is/ odd, but for fairly unremarkable historical reasons. A char
> has to be able to store, with a non-negative value, every character in
> the basic execution character set. EBCDIC has some fairly important
> characters with code points greater than 127. On an EBCDIC system in
> which 8-bit char is considered desirable, implementations are more or
> less forced to make char unsigned.
>
> We have some choices:
>
> 1) outlaw EBCDIC;
> 2) force ASCII-based implementations to make char unsigned;
> 3) make char at least 9 bits rather than 8;
> 4) live with a weird char, like we do now.
>
> If we do (1), IBM will sulk.
>
> If we do (2), GNU, Microsoft, etc will sulk (and we'll introduce a
> different anomaly, that of char being unsigned by default whereas
> short, int, long, and long long, are signed by default).
>
> If we do (3), *everybody* will sulk.
>
> But if we go for (4), we don't have to change anything, which is
> always a popular option.

Sure, I understand all that, and I'm not suggesting a change.  (I
wouldn't mind requiring plain char to be unsigned, but that's probably
not going to happen, and it could have performance implications for some
systems.)

If I had been writing section 6.2.5 of the standard, I'd include
plain "char" in the list of standard integer types.  Clearly it's
standard, and integer, and a type, so excluding it seems odd.

Currently, the standard defines the *standard signed integer types*
as:
    signed char
    short
    int
    long
    long long
the *standard unsigned integer types* as:
    _Bool
    unsigned char
    unsigned short
    unsigned int
    unsigned long
    unsigned long long
and the *standard integer types* as the standard signed integer types
and the standard unsigned integer types.

6.2.5p17 defines the category of *integer types* as the signed and
unsigned integer types, plain char, and the enumerated types.  So plain
char is an *integer type* but not a *standard integer type*.
Furthermore, though char is an integer type and is either signed or
unsigned, it is neither a *signed integer type* nor an *unsigned integer
type*.

I don't think the standard uses these categories in a way that causes
real problems, but I find it unnecessarily confusing.

I suggest that it would have made more sense to say that plain char
is either one of the standard signed integer types or one of the
standard unsigned integer types (an implementation-defined choice)
*or* to define the *standard integer types* as the standard signed
integer types, the standard unsigned integer types, and plain char.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
5/12/2015 6:29:54 PM
Tim Rentsch <txr@alumni.caltech.edu> writes:
> Keith Thompson <kst-u@mib.org> writes:
>> Tim Rentsch <txr@alumni.caltech.edu> writes:
[...]
>> The standard integer types are _Bool and signed and unsigned char,
>> short, int, long, and long long.  The integer types are those plus
>> any extended integer types.  (Plain char doesn't seem to fit into
>> that scheme, which I find odd.)
>
> Eh?  The integer types are char, unsigned integer types, signed
> integer types, and enumeration types.  You must be thinking of
> something else.

Just a careless mistake.  I think I got the details right in my recent
followup to Richard.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
5/12/2015 8:08:34 PM
On Wed, 13 May 2015 14:55:27 +0200, David Brown
<david.brown@hesbynett.no> wrote:

>On 13/05/15 12:55, Richard Heathfield wrote:
>> On 13/05/15 08:14, David Brown wrote:
>> 
>>> Isn't EBCDIC only used in really old systems?
>> 
>> Spoken like a true PC junkie. :-)
>
>Well, PC downwards.  I don't do mainframes, minis, etc.
>
>> 
>> Which reminds me: isn't ASCII only used in really tiny systems?
>> 
>
>The point of ASCII is that apart from EBCDIC, everyone agrees on the
>first 128 characters.  And those first 128 characters are sufficient for
>a large proportion of programs.  We won't get everyone to agree about
>anything beyond that, so there is no point in trying.


Well, we don't really agree on the first 128 characters so much
either.  More these days, but things like $ and # in pre-ANSI ASCII
varied between languages considerably.  And we only semi-agree on a
few of the control characters.
Robert
5/13/2015 1:01:01 AM
On 12/05/15 20:29, Keith Thompson wrote:
> Richard Heathfield <rjh@cpax.org.uk> writes:
>> On 12/05/15 18:11, Keith Thompson wrote:
>>
>> <snip>
>>
>>> The standard integer types are _Bool and signed and unsigned char,
>>> short, int, long, and long long.  The integer types are those plus
>>> any extended integer types.  (Plain char doesn't seem to fit into
>>> that scheme, which I find odd.)
>>
>> It /is/ odd, but for fairly unremarkable historical reasons. A char
>> has to be able to store, with a non-negative value, every character in
>> the basic execution character set. EBCDIC has some fairly important
>> characters with code points greater than 127. On an EBCDIC system in
>> which 8-bit char is considered desirable, implementations are more or
>> less forced to make char unsigned.
>>
>> We have some choices:
>>
>> 1) outlaw EBCDIC;
>> 2) force ASCII-based implementations to make char unsigned;
>> 3) make char at least 9 bits rather than 8;
>> 4) live with a weird char, like we do now.
>>
>> If we do (1), IBM will sulk.

Isn't EBCDIC only used in really old systems?  I have always thought the
C standard committee should make a clean break at some point and say
that with C15 (for example) they get rid of the cruft and complications
required to support weird and/or outdated systems.  People programming
dinosaurs can stick to C11 or before.  Then they could drop EBCDIC
support and enshrine ASCII (but keep the choice of utf-8, latin-9,
etc.), mandate 8-bit char, 2's complement integers, etc.  It would
simplify many aspects of the language and work well with every modern
system.

>>
>> If we do (2), GNU, Microsoft, etc will sulk (and we'll introduce a
>> different anomaly, that of char being unsigned by default whereas
>> short, int, long, and long long, are signed by default).

gcc works fine with "-funsigned-char", and some gcc targets have
unsigned plain chars by default.  But of course some programmers write
code that makes assumptions about the signedness of plain char - even
though they are wrong to do so, it's not a good idea to change such things.

>>
>> If we do (3), *everybody* will sulk.

/I/ certainly would!

>>
>> But if we go for (4), we don't have to change anything, which is
>> always a popular option.

And it's easy to implement :-)

> 
> Sure, I understand all that, and I'm not suggesting a change.  (I
> wouldn't mind requiring plain char to be unsigned, but that's probably
> not going to happen, and it could have performance implications for some
> systems.)

I would prefer plain char to be unsigned - it is on some of the targets
I use, and seems much more natural to me.  It just doesn't make sense to
have a "negative" character (and small integers in my code use int8_t or
uint8_t, not "char").  For some targets, unsigned chars can be
noticeably more efficient for arithmetic than signed chars, so
performance can swing both ways.

> 
> If I had been writing section 6.2.5 of the standard, I'd include
> plain "char" in the list of standard integer types.  Clearly it's
> standard, and integer, and a type, so excluding it seems odd.
> 
> Currently, the standard defines the *standard signed integer types*
> as:
>     signed char
>     short
>     int
>     long
>     long long
> the *standard unsigned integer types* as:
>     _Bool
>     unsigned char
>     unsigned short
>     unsigned int
>     unsigned long
>     unsigned long long
> and the *standard integer types* as the standard signed integer types
> and the standard unsigned integer types.

I suppose _Bool is the odd one out, there being no corresponding "signed
_Bool".

> 
> 6.2.5p17 defines the category of *integer types* as the signed and
> unsigned integer types, plain char, and the enumerated types.  So plain
> char is an *integer type* but not a *standard integer type*.
> Furthermore, though char is an integer type and is either signed or
> unsigned, it is neither a *signed integer type* nor an *unsigned integer
> type*.
> 
> I don't think the standard uses these categories in a way that causes
> real problems, but I find it unnecessarily confusing.
> 
> I suggest that it would have made more sense to say that plain char
> is either one of the standard signed integer types or one of the
> standard unsigned integer types (an implementation-defined choice)
> *or* to define the *standard integer types* as the standard signed
> integer types, the standard unsigned integer types, and plain char.
> 

I think it would have been better to define a category "integer types"
consisting of the signed and unsigned integer types, and a category
"discrete types" consisting of the "integer types", plain char, and
enumerated types.  Arithmetic should then only be allowed on true
"integer types", while the other discrete types had a restricted set of
operations (including comparisons, but not arithmetic operators).

Then the distinction would make sense, and improve type safety in C
programming.


David
5/13/2015 7:14:15 AM
On 13/05/15 08:14, David Brown wrote:

> Isn't EBCDIC only used in really old systems?

Spoken like a true PC junkie. :-)

Which reminds me: isn't ASCII only used in really tiny systems?

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
5/13/2015 10:55:54 AM
On 13/05/15 12:55, Richard Heathfield wrote:
> On 13/05/15 08:14, David Brown wrote:
> 
>> Isn't EBCDIC only used in really old systems?
> 
> Spoken like a true PC junkie. :-)

Well, PC downwards.  I don't do mainframes, minis, etc.

> 
> Which reminds me: isn't ASCII only used in really tiny systems?
> 

The point of ASCII is that apart from EBCDIC, everyone agrees on the
first 128 characters.  And those first 128 characters are sufficient for
a large proportion of programs.  We won't get everyone to agree about
anything beyond that, so there is no point in trying.


David
5/13/2015 12:55:27 PM
On 13/05/15 13:55, David Brown wrote:
> On 13/05/15 12:55, Richard Heathfield wrote:
>> On 13/05/15 08:14, David Brown wrote:
>>
>>> Isn't EBCDIC only used in really old systems?
>>
>> Spoken like a true PC junkie. :-)
>
> Well, PC downwards.  I don't do mainframes, minis, etc.
>
>>
>> Which reminds me: isn't ASCII only used in really tiny systems?
>>
>
> The point of ASCII is that apart from EBCDIC, everyone agrees on the
> first 128 characters.

So are you claiming that a 7-bit code that has been standardised is a 
7-bit code that has been standardised? Outrageous! :-)

The point of EBCDIC is that, like it or not, big iron runs it, *almost* 
without exception, so we're kinda stuck with it for now (although there 
are signs that IBM is slowly edging towards ASCII).

> And those first 128 characters are sufficient for
> a large proportion of programs.  We won't get everyone to agree about
> anything beyond that, so there is no point in trying.

Unfortunately, EBCDIC will be with us for some time to come (decades at 
least, I should imagine), so if we want our C programs to run on big 
iron (and, believe it or not, some of us really really do), we're kind 
of stuck with a weird char.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
5/13/2015 1:35:51 PM
Richard Heathfield <rjh@cpax.org.uk> writes:
[...]
> Unfortunately, EBCDIC will be with us for some time to come (decades
> at least, I should imagine), so if we want our C programs to run on
> big iron (and, believe it or not, some of us really really do), we're
> kind of stuck with a weird char.

If the next C standard were incompatible with EBCDIC (for example,
requiring 'a' + 25 == 'z', or even 'a' == 97), IBM would continue
to sell and use C-like compilers that use EBCDIC.  They'd probably
just claim conformance to an earlier edition of the standard (have
they adopted C11? C99?).  Or, if the new standard had other features
IBM finds useful, they'd just implement them and acknowledge the
non-conformance of the character set.

The problem is that it would fragment the language, and encourage
programmers to write "portable" C code that wouldn't work on IBM
mainframes.

In any case, I think IBM has enough influence on the C standard
committee that this isn't going to happen any time soon.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
5/13/2015 3:24:29 PM
On 13/05/15 15:35, Richard Heathfield wrote:
> On 13/05/15 13:55, David Brown wrote:
>> On 13/05/15 12:55, Richard Heathfield wrote:
>>> On 13/05/15 08:14, David Brown wrote:
>>>
>>>> Isn't EBCDIC only used in really old systems?
>>>
>>> Spoken like a true PC junkie. :-)
>>
>> Well, PC downwards.  I don't do mainframes, minis, etc.
>>
>>>
>>> Which reminds me: isn't ASCII only used in really tiny systems?
>>>
>>
>> The point of ASCII is that apart from EBCDIC, everyone agrees on the
>> first 128 characters.
>
> So are you claiming that a 7-bit code that has been standardised is a
> 7-bit code that has been standardised? Outrageous! :-)
>
> The point of EBCDIC is that, like it or not, big iron runs it, *almost*
> without exception, so we're kinda stuck with it for now (although there
> are signs that IBM is slowly edging towards ASCII).
>
>> And those first 128 characters are sufficient for
>> a large proportion of programs.  We won't get everyone to agree about
>> anything beyond that, so there is no point in trying.
>
> Unfortunately, EBCDIC will be with us for some time to come (decades at
> least, I should imagine), so if we want our C programs to run on big
> iron (and, believe it or not, some of us really really do), we're kind
> of stuck with a weird char.
>

That's fine - I am not suggesting that C compilers drop support for 
these systems.  I am just wondering if such systems need support from 
newer C standards - and if it would be practical to say that if a 
program is written to C15 (or whatever), then the character set must 
be 8-bit, and the first 128 characters must be ASCII.  This doesn't 
change C for EBCDIC computers - it merely hinders them from running C15 
programs.  But it would make life a little easier for everyone who 
/doesn't/ write code to run on those systems - as well as making the 
standard itself a little shorter and clearer.

As another example, consider support for chars wider than 8 bits. 
Outside of mainframes, the most common class of systems with wider 
chars are DSP devices - they sometimes have 16-bit or even 32-bit chars 
(with a few really weird systems having 24-bit chars).  But would there 
be any disadvantage of saying that these cannot support C15, or at least 
that they cannot do so efficiently?  I don't think anything would be 
lost - few compilers for such devices support anything beyond C99 (and 
not all have come that far!).  And of course developers would be happy 
to see such tools support 8-bit chars, even if it meant somewhat less 
efficient generated code.

David
5/13/2015 7:15:22 PM
David Brown <david.brown@hesbynett.no> wrote:
(snip)
> That's fine - I am not suggesting that C compilers drop support for 
> these systems.  I am just wondering if such systems need support from 
> newer C standards - and if it would be practical to say that if a 
> program is written to C15 (or whatever), then the character set must 
> be 8-bit, and the first 128 characters must be ASCII.  This doesn't 
> change C for EBCDIC computers - it merely hinders them from running C15 
> programs.  But it would make life a little easier for everyone who 
> /doesn't/ write code to run on those systems - as well as making the 
> standard itself a little shorter and clearer.

IBM will probably disagree. I believe they have good C support
for z/OS.
 
> As another example, consider support for chars wider than 8 bits. 
> Outside of mainframes, the most common class of systems with wider 
> chars are DSP devices - they sometimes have 16-bit or even 32-bit chars 
> (with a few really weird systems having 24-bit chars).  But would there 
> be any disadvantage of saying that these cannot support C15, or at least 
> that they cannot do so efficiently?  I don't think anything would be 
> lost - few compilers for such devices support anything beyond C99 (and 
> not all have come that far!).  And of course developers would be happy 
> to see such tools support 8-bit chars, even if it meant somewhat less 
> efficient generated code.

We have a C compiler on the PDP-10, but I don't know that anyone
expects C15 for that.  It does have 9 bit char.

-- glen
glen
5/13/2015 7:40:16 PM
On 13/05/2015 20:40, glen herrmannsfeldt wrote:
> David Brown <david.brown@hesbynett.no> wrote:
> (snip)
>> That's fine - I am not suggesting that C compilers drop support for
>> these systems.  I am just wondering if such systems need support from
>> newer C standards - and if it would be practical to say that if a
>> program is written to C15 (or whatever), then the character set must
>> be 8-bit, and the first 128 characters must be ASCII.  This doesn't
>> change C for EBCDIC computers - it merely hinders them from running C15
>> programs.  But it would make life a little easier for everyone who
>> /doesn't/ write code to run on those systems - as well as making the
>> standard itself a little shorter and clearer.
>
> IBM will probably disagree. I believe they have good C support
> for z/OS.

They can keep EBCDIC if they want, but why inflict it on everyone else? 
Especially as it was such a crazy character mapping.

People can spend lifetimes programming and not come across EBCDIC, yet 
they are denied language features such as: case 'A'..'F', and EBCDIC is 
usually quoted as one of the reasons.

> We have a C compiler on the PDP-10, but I don't know that anyone
> expects C15 for that.  It does have 9 bit char.

The one I used had mainly 7-bit characters (not in C). And used ASCII.

-- 
Bartc

Bartc
5/13/2015 8:08:02 PM
Bartc <bc@freeuk.com> wrote:
>> David Brown <david.brown@hesbynett.no> wrote:
>> (snip)
>>> That's fine - I am not suggesting that C compilers drop support for
>>> these systems.  I am just wondering if such systems need support from
>>> newer C standards - and if it would be practical to say that if a
>>> program is written to C15 (or whatever), then the character set must
>>> be 8-bit, and the first 128 characters must be ASCII.  

(snip, then I wrote)
>> IBM will probably disagree. I believe they have good C support
>> for z/OS.
 
> They can keep EBCDIC if they want, but why inflict it on 
> everyone else?   Especially as it was such a crazy 
> character mapping.

Seems that ASCII and EBCDIC are about the same age. 

IBM was considering ASCII-8 (which isn't ASCII-7 with an extra
bit on the left), but the standard never came. 
 
> People can spend lifetimes programming and not come across EBCDIC, 
> yet they are denied language features such as: case 'A'..'F', 
> and EBCDIC is usually quoted as one of the reasons.

You can use 'A'..'F' just fine. There are some other codes between
'I' and 'J', and also between 'R' and 'S', but not many printable
characters. Specifically, only '}' and '\'. 
 
>> We have a C compiler on the PDP-10, but I don't know that anyone
>> expects C15 for that.  It does have 9 bit char.
 
> The one I used had mainly 7-bit characters (not in C). And used ASCII.

The PDP-10 file system stores 7-bit ASCII, five to a 36 bit word.
But C requires that char be at least 8 bits, and that a word (int)
be an integral multiple of char. That leaves 9, 12, 18, and 36 bits
as possible choices. 

-- glen
glen
5/13/2015 8:52:42 PM
In article <mj0diq$f3t$1@speranza.aioe.org>,
glen herrmannsfeldt  <gah@ugcs.caltech.edu> wrote:
....
>> People can spend lifetimes programming and not come across EBCDIC, 
>> yet they are denied language features such as: case 'A'..'F', 
>> and EBCDIC is usually quoted as one of the reasons.
>
>You can use 'A'..'F' just fine. There are some other codes between
>'I' and 'J', and also between 'R' and 'S', but not many printable
>characters. Specifically, only '}' and '\'. 

I think you missed Bart's point.  He was (I believe) talking about
"language features", not encoding specifics.

(What follows is speculation as to what Bart was talking about; I may be
wrong, but if I am, then consider it to be tagged to me and not to him)

The problem is that because C wants to support weird character sets like
EBCDIC, so as a result we can't have nice things.  Specifically, we can't
have as part of the syntax of "switch":

	case 'A'..'F':
	    puts("Valid grade!");

I'm pretty sure other languages support this - I think Pascal does.

-- 
This is the GOP's problem.  When you're at the beginning of the year
and you've got nine Democrats running for the nomination, maybe one or
two of them are Dennis Kucinich.  When you have nine Republicans, seven
or eight of them are Michelle Bachmann.
gazelle
5/13/2015 9:06:50 PM
Kenny McCormack <gazelle@shell.xmission.com> wrote:

(snip)
> I think you missed Bart's point.  He was (I believe) talking about
> "language features", not encoding specifics.
 
> (What follows is speculation as to what Bart was talking about; I may be
> wrong, but if I am, then consider it to be tagged to me and not to him)
 
> The problem is that because C wants to support weird character sets like
> EBCDIC, so as a result we can't have nice things.  Specifically, we can't
> have as part of the syntax of "switch":
 
>        case 'A'..'F':
>            puts("Valid grade!");
 
> I'm pretty sure other languages support this - I think Pascal does.

In both ASCII and EBCDIC, the characters 'A' through 'F' are contiguous,
so you could do that. (Seems to me a language design decision.)

There are some ranges that are not contiguous, and in all cases
it is up to the programmer to know how to use them. 

There are plenty of programming mistakes you can make using
just ASCII characters.

-- glen


glen
5/13/2015 11:28:55 PM
On Wed, 13 May 2015 08:24:29 -0700, Keith Thompson <kst-u@mib.org>
wrote:

>If the next C standard were incompatible with EBCDIC (for example,
>requiring 'a' + 25 == 'z', or even 'a' == 97), IBM would continue
>to sell and use C-like compilers that use EBCDIC.  They'd probably
>just claim conformance to an earlier edition of the standard (have
>they adopted C11? C99?).


AFAIK, the zOS compiler is pretty complete on C99, and has partial C11
and C++11 support.
Robert
5/13/2015 11:47:30 PM
On Wed, 13 May 2015 21:06:50 +0000 (UTC), gazelle@shell.xmission.com
(Kenny McCormack) wrote:

>In article <mj0diq$f3t$1@speranza.aioe.org>,
>glen herrmannsfeldt  <gah@ugcs.caltech.edu> wrote:
>...
>>> People can spend lifetimes programming and not come across EBCDIC, 
>>> yet they are denied language features such as: case 'A'..'F', 
>>> and EBCDIC is usually quoted as one of the reasons.
>>
>>You can use 'A'..'F' just fine. There are some other codes between
>>'I' and 'J', and also between 'R' and 'S', but not many printable
>>characters. Specifically, only '}' and '\'. 
>
>I think you missed Bart's point.  He was (I believe) talking about
>"language features", not encoding specifics.
>
>(What follows is speculation as to what Bart was talking about; I may be
>wrong, but if I am, then consider it to be tagged to me and not to him)
>
>The problem is that because C wants to support weird character sets like
>EBCDIC, so as a result we can't have nice things.  Specifically, we can't
>have as part of the syntax of "switch":
>
>	case 'A'..'F':
>	    puts("Valid grade!");
>
>I'm pretty sure other languages support this - I think Pascal does.


We *could* have that in C, but what it would mean, when using
character constants, would be quite implementation defined.  Is
avoiding the inevitable unexpected performance of 'a'..'z' worth not
having the feature?  You'd have exactly the same problem coding a
range like that in an if statement.

OTOH, I'm having trouble thinking of a time I wanted a character
range, but I've wanted a numeric range quite a few times (not that
those two cases are distinct in C).
Robert
5/13/2015 11:54:01 PM
In article <mj0mnm$4i8$1@speranza.aioe.org>,
glen herrmannsfeldt  <gah@ugcs.caltech.edu> wrote:
>Kenny McCormack <gazelle@shell.xmission.com> wrote:
>
>(snip)
>> I think you missed Bart's point.  He was (I believe) talking about
>> "language features", not encoding specifics.
> 
>> (What follows is speculation as to what Bart was talking about; I may be
>> wrong, but if I am, then consider it to be tagged to me and not to him)
> 
>> The problem is that because C wants to support weird character sets like
>> EBCDIC, so as a result we can't have nice things.  Specifically, we can't
>> have as part of the syntax of "switch":
> 
>>        case 'A'..'F':
>>            puts("Valid grade!");
> 
>> I'm pretty sure other languages support this - I think Pascal does.
>
>In both ASCII and EBCDIC, the characters 'A' through 'F' are contiguous,
>so you could do that.

If you think that what you've written is an answer to or an objection to
what I wrote, then you have completely missed the point of my post.

> (Seems to me a language design decision.)

Yes.  That *is* the point.  That C's syntax does not allow ranges on
switch() statements.  And that, I speculate, part of the reason it doesn't
is because of wanting to support EBCDIC (and/or other weird encodings).

I'm going to quote myself here.  Please read the following again, carefully:

* The problem is that because C wants to support weird character sets like
* EBCDIC, so as a result we can't have nice things.

Note that the phrase "can't have nice things" is a current cliche.  A nice
turn of phrase...

-- 
"I heard somebody say, 'Where's Nelson Mandela?' Well, 
Mandela's dead. Because Saddam killed all the Mandelas." 

George W. Bush, on the former South African president who 
is still very much alive, Sept. 20, 2007

gazelle
5/14/2015 12:19:42 AM
On 14/05/2015 00:54, Robert Wessel wrote:
> On Wed, 13 May 2015 21:06:50 +0000 (UTC), gazelle@shell.xmission.com
> (Kenny McCormack) wrote:

>> (What follows is speculation as to what Bart was talking about; I may be
>> wrong, but if I am, then consider it to be tagged to me and not to him)

(You're spot on)

>> The problem is that because C wants to support weird character sets like
>> EBCDIC, so as a result we can't have nice things.  Specifically, we can't
>> have as part of the syntax of "switch":
>>
>> 	case 'A'..'F':
>> 	    puts("Valid grade!");
>>
>> I'm pretty sure other languages support this - I think Pascal does.
>
>
> We *could* have that in C, but what it would mean, when using
> character constants, would be quite implementation defined.

Why? Within the ASCII range of 32 to 127, and especially for digits and 
letters, it would be well-defined.


> Is avoiding the inevitable unexpected performance of 'a'..'z' worth not
> having the feature?  You'd have exactly the same problem coding a
> range like that in an if statement.

Yes, and the same advantage. But there's a particular advantage in 
having it available in a switch statement:

> OTOH, I'm having trouble thinking of a time I wanted a character
> range, but I've wanted a numeric range quite a few times (not that
> those two cases are distinct in C).

I use character ranges all the time. The following example is typical, 
but I can only do this outside C because C doesn't allow it (one reason 
for this is the business with EBCDIC):

  int c

  switch c
  when 'A'..'Z','a'..'z','0'..'9','_','$' then

  end switch

If I get my compiler to turn this into C, it produces this:

     int c;
     switch (c)
     {
     case 65:
     case 66:
     case 67:
     case 68:
     case 69:
     case 70:
     case 71:
     case 72:
     case 73:
     case 74:
     case 75:
     case 76:
     case 77:
     case 78:
     case 79:
     case 80:
     case 81:
     case 82:
     case 83:
     case 84:
     case 85:
     case 86:
     case 87:
     case 88:
     case 89:
     case 90:
     case 97:
     case 98:
     case 99:
     case 100:
     case 101:
     case 102:
     case 103:
     case 104:
     case 105:
     case 106:
     case 107:
     case 108:
     case 109:
     case 110:
     case 111:
     case 112:
     case 113:
     case 114:
     case 115:
     case 116:
     case 117:
     case 118:
     case 119:
     case 120:
     case 121:
     case 122:
     case 48:
     case 49:
     case 50:
     case 51:
     case 52:
     case 53:
     case 54:
     case 55:
     case 56:
     case 57:
     case 95:
     case 36:
         break;
     default:;
     }

OK, actual C will have 'A' instead of 65, and there might be several 
cases per line to keep it more compact. But it's still a lot of typing!

BTW those 'A'..'Z' ranges are part of working code and they work 
properly. But it need not be just because it assumes ASCII. If my 
translator generated 'A' constants instead of 65, then the same C output 
would compile in an EBCDIC system with the same 'A'..'Z' range in the 
input source, and work correctly!

-- 
Bartc
Bartc
5/14/2015 12:31:40 AM
On Thu, 14 May 2015 09:57:49 +0100, Bartc <bc@freeuk.com> wrote:

>On 14/05/2015 00:28, glen herrmannsfeldt wrote:
>> Kenny McCormack <gazelle@shell.xmission.com> wrote:
>>
>> (snip)
>>> I think you missed Bart's point.  He was (I believe) talking about
>>> "language features", not encoding specifics.
>>
>>> (What follows is speculation as to what Bart was talking about; I may be
>>> wrong, but if I am, then consider it to be tagged to me and not to him)
>>
>>> The problem is that because C wants to support weird character sets like
>>> EBCDIC, so as a result we can't have nice things.  Specifically, we can't
>>> have as part of the syntax of "switch":
>>
>>>         case 'A'..'F':
>>>             puts("Valid grade!");
>>
>>> I'm pretty sure other languages support this - I think Pascal does.
>>
>> In both ASCII and EBCDIC, the characters 'A' through 'F' are contiguous,
>> so you could do that. (Seems to me a language design decision.)
>
>You keep saying that. Perhaps I should have chosen a different example! 
>(I had no idea where the breaks in the EBCDIC set were.) The point is 
>that because of those breaks in the sequence - wherever they happen to 
>be - you can't use case letter1 .. letter2: in general.
>
>But it does make you wonder why they bothered making some subsets of the 
>alphabet have consecutive codes at all, rather than just make them all 
>arbitrary and help keep everyone on their toes.


EBCDIC derives from earlier codes (all with "BCD" somewhere in the
name) and punch card usage.

The three groups of uppercase letters on IBM standard punch cards use
a 12/11/0 punch, plus a 1-9 punch.  That first punch effectively
represents the high nibble of the EBCDIC code (0xc0/0xd0/0xe0) for the
characters, and the second punch becomes the low nibble of the code.  The
excess code point, 0xe1, is used for a slash.  The digits are just
single 0-9 punches.

704 BCD followed a similar pattern, despite having 6-bit characters,
and the digits and three groups of alphabetic characters were in the
0x00/0x10/0x20/0x30 ranges.

In the 704 days relating the internal representation to the punch card
format made some sense.  As mentioned elsewhere, IBM had intended an
ASCII variant for S/360, but both the standard and the ASCII
peripherals were late (many of the card devices and printers - "unit
record" gear - were enhanced/adapted versions of devices from
pre-S/360 systems), so the OS and everything else got written with the
updated BCD code, and despite having an ASCII mode, S/360 never used
it.

ASCII has some odd choices too, but it's clearly a much cleaner
organization.

But if you want discontiguous (and disordered!) letters, Baudot and
its various descendants will do the trick.
Robert
5/14/2015 1:01:01 AM
On 13-May-15 19:31, Bartc wrote:
> On 14/05/2015 00:54, Robert Wessel wrote:
>> gazelle@shell.xmission.com (Kenny McCormack) wrote:
>>> case 'A'..'F': puts("Valid grade!");
>>> 
>>> I'm pretty sure other languages support this - I think Pascal
>>> does.
>> 
>> We *could* have that in C, but what it would mean, when using 
>> character constants, would be quite implementation defined.
> 
> Why? Within the ASCII range of 32 to 127, and especially for digits
> and letters, it would be well-defined.

Sadly, not all character sets agree 100% within that range, even if you
ignore the oddballs like EBCDIC where you expect problems.  Various ISO
character sets (and most of them still in common use) replace some of the
punctuation characters--and not even the same ones from set to set!

Above 127, of course, it's a complete free-for-all.

> BTW those 'A'..'Z' ranges are part of working code and they work 
> properly. But it need not be just because it assumes ASCII. If my 
> translator generated 'A' constants instead of 65, then the same C
> output would compile in an EBCDIC system with the same 'A'..'Z' range
> in the input source, and work correctly!

The problem is that, in C, 'A'..'Z' may include things that are _not_
upper case letters if the character set in use isn't one of the ASCII
compatible ones, e.g. EBCDIC.  That's why we can't have nice things.

For numeric ranges, the justification is a lot weaker, but since C
doesn't see a real difference between integers and characters, if we
have a problem with characters, we also have a problem with integers, so
again, we can't have nice things.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
Stephen
5/14/2015 2:05:08 AM
Bartc <bc@freeuk.com> writes:
> On 13/05/2015 20:40, glen herrmannsfeldt wrote:
>> David Brown <david.brown@hesbynett.no> wrote:
>> (snip)
>>> That's fine - I am not suggesting that C compilers drop support for
>>> these systems.  I am just wondering if such systems need support from
>>> newer C standards - and if it would be practical to say that if a
>>> program is written to C15 (or whatever), then the character set must
>>> be 8-bit, and the first 128 characters must be ASCII.  This doesn't
>>> change C for EBCDIC computers - it merely hinders them from running C15
>>> programs.  But it would make life a little easier for everyone who
>>> /doesn't/ write code to run on those systems - as well as making the
>>> standard itself a little shorter and clearer.
>>
>> IBM will probably disagree. I believe they have good C support
>> for z/OS.
>
> They can keep EBCDIC if they want, but why inflict it on everyone
> else? Especially as it was such a crazy character mapping.
>
> People can spend lifetimes programming and not come across EBCDIC, yet
> they are denied language features such as: case 'A'..'F', and EBCDIC
> is usually quoted as one of the reasons.

The characters 'A'..'F' and 'a'..'f' are contiguous even in EBCDIC.  A
future standard could guarantee that without breaking any systems that
use ASCII-based or EBCDIC-based character sets.

I just don't see all that many cases where case ranges would be useful.
Maybe you see more than I do.

If you want more than that, say 'A'..'Z' to test for uppercase letters,
that might seem reasonable -- but there are potentially a lot more than 26
uppercase letters.

[...]

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
5/14/2015 2:17:17 AM
Richard Heathfield wrote:
> On 13/05/15 13:55, David Brown wrote:
>> On 13/05/15 12:55, Richard Heathfield wrote:
>>> On 13/05/15 08:14, David Brown wrote:
>>>
>>>> Isn't EBCDIC only used in really old systems?
>>>
>>> Spoken like a true PC junkie. :-)
>>
>> Well, PC downwards.  I don't do mainframes, minis, etc.
>>
>>>
>>> Which reminds me: isn't ASCII only used in really tiny systems?
>>>
>>
>> The point of ASCII is that apart from EBCDIC, everyone agrees on the
>> first 128 characters.
>
> So are you claiming that a 7-bit code that has been standardised is a
> 7-bit code that has been standardised? Outrageous! :-)
>
> The point of EBCDIC is that, like it or not, big iron runs it, *almost*
> without exception, so we're kinda stuck with it for now (although there
> are signs that IBM is slowly edging towards ASCII).

Well that depends on where you get your big iron...  Everything "big" 
I've used this century runs Linux, Solaris or <insert vendors name here> 
Unix.

-- 
Ian Collins
Ian
5/14/2015 2:37:51 AM
Kenny McCormack <gazelle@shell.xmission.com> wrote:

(snip)
>>> The problem is that because C wants to support weird character sets like
>>> EBCDIC, so as a result we can't have nice things.  Specifically, we can't
>>> have as part of the syntax of "switch":
 
>>>        case 'A'..'F':
>>>            puts("Valid grade!");
 
>>> I'm pretty sure other languages support this - I think Pascal does.

(snip)

>>In both ASCII and EBCDIC, the characters 'A' through 'F' are contiguous,
>>so you could do that.
 
> If you think that what you've written is an answer to or an objection
> to what I wrote, then you have completely missed the point of my post.

That is possible, but I am not convinced. Do you have any reference
for the decision?  Since a large fraction of switch/case statements use
non-char operands, this reason wouldn't apply. 

Well, also, when I first learned switch/case I considered it the C
version of the Fortran computed GOTO, commonly implemented as a
branch table. Branch tables are a more obvious solution with a smaller
number of choices.
 
>> (Seems to me a language design decision.)
 
> Yes.  That *is* the point.  That C's syntax does not allow ranges on
> switch() statements.  And that, I speculate, part of the reason it doesn't
> is because of wanting to support EBCDIC (and/or other weird encodings).

You can speculate that if you want, I speculate that it isn't that.
 
> I'm going to quote myself here.  Please read the following again, carefully:
 
> * The problem is that because C wants to support weird character 
>   sets like  EBCDIC, so as a result we can't have nice things.  
>  Note that the phrase "can't have nice things" is a current cliche.  
>  A nice turn of phrase...

There is a long list of features that other languages have and C
doesn't. We can speculate all we want, but the most important reason
is that C was meant to be a fairly simple language. 

-- glen
 
glen
5/14/2015 4:29:20 AM
On Wed, 13 May 2015 21:05:08 -0500, Stephen Sprunk
<stephen@sprunk.org> wrote:

>On 13-May-15 19:31, Bartc wrote:
>> On 14/05/2015 00:54, Robert Wessel wrote:
>>> gazelle@shell.xmission.com (Kenny McCormack) wrote:
>>>> case 'A'..'F': puts("Valid grade!");
>>>> 
>>>> I'm pretty sure other languages support this - I think Pascal
>>>> does.
>>> 
>>> We *could* have that in C, but what it would mean, when using 
>>> character constants, would be quite implementation defined.
>> 
>> Why? Within the ASCII range of 32 to 127, and especially for digits
>> and letters, it would be well-defined.
>
>Sadly, not all character sets agree 100% within that range, even if you
>ignore the oddballs like EBCDIC where you expect problems.  Various ISO
>character sets (and most of them still in common use) replace some of
>punctuation characters--and not even the same ones from set to set!
>
>Above 127, of course, it's a complete free-for-all.
>
>> BTW those 'A'..'Z' ranges are part of working code and they work 
>> properly. But it need not be just because it assumes ASCII. If my 
>> translator generated 'A' constants instead of 65, then the same C
>> output would compile in an EBCDIC system with the same 'A'..'Z' range
>> in the input source, and work correctly!
>
>The problem is that, in C, 'A'..'Z' may include things that are _not_
>upper case letters if the character set in use isn't one of the ASCII
>compatible ones, e.g. EBCDIC.  That's why we can't have nice things.


I think BartC is suggesting that case 'H'..'K' generate (the
equivalent of):

  case 'H':
  case 'I':
  case 'J':
  case 'K':

or the numeric equivalents, and *not*, for EBCDIC, the equivalent of:

  case 0xc8:
  case 0xc9:
  case 0xca:
  case 0xcb:
  case 0xcc:
  case 0xcd:
  case 0xce:
  case 0xcf:
  case 0xd0:
  case 0xd1:
  case 0xd2:

Of course defining that behavior for the language presents an
interesting challenge.  The behavior is fairly obvious so long as the
range is entirely within *one* of the sets of digits, uppercase
characters, or lowercase characters.  That would also have some
interesting implications for the syntax of a case label.
Robert
5/14/2015 5:48:53 AM
On 14/05/2015 03:05, Stephen Sprunk wrote:
> On 13-May-15 19:31, Bartc wrote:

>> BTW those 'A'..'Z' ranges are part of working code and they work
>> properly. But it need not be just because it assumes ASCII. If my
>> translator generated 'A' constants instead of 65, then the same C
>> output would compile in an EBCDIC system with the same 'A'..'Z' range
>> in the input source, and work correctly!
>
> The problem is that, in C, 'A'..'Z' may include things that are _not_
> upper case letters if the character set in use isn't one of the ASCII
> compatible ones, e.g. EBCDIC.  That's why we can't have nice things.

Strange then that when a programmer thinks "A to Z", they then proceed 
to write out:

'A','B','C','D','E','F','G','H','I','J','K','L','M',
'N','O','P','Q','R','S','T','U','V','W','X','Y','Z'

without any non-upper-case letters polluting the sequence. So, at least 
for someone thinking in English (and you might notice C has English 
keywords and many English-based function names), it's not because A to Z 
has an implicit ASCII coding, it's because it's a natural sequence.

Since computers are supposed to stop us wasting time writing out stuff 
such as the above (which was actually machine-generated), why is it that 
C requires people to do things the long way?

All it needs is a rule that when two character constants are separated 
by a symbol such as "..", then an alphabetical sequence is generated: all 
the intervening letters, with any non-alphabetic values skipped:

  case 'H'..'O':

becomes:

  case 'H','I','J','K','L','M','N','O':

(making use of another trivial enhancement where you don't need to 
repeat 'case').

> For numeric ranges, the justification is a lot weaker

When it's been discussed in this group before, the 'A'..'Z' argument is 
always trotted out.

I agree it would cause problems if people were to write: c>='A' && 
c<='Z', but that's another matter. That would break under EBCDIC but 
that's what warnings are for.

-- 
Bartc
0
Bartc
5/14/2015 8:41:41 AM
On 14/05/2015 00:28, glen herrmannsfeldt wrote:
> Kenny McCormack <gazelle@shell.xmission.com> wrote:
>
> (snip)
>> I think you missed Bart's point.  He was (I believe) talking about
>> "language features", not encoding specifics.
>
>> (What follows is speculation as to what Bart was talking about; I may be
>> wrong, but if I am, then consider it to be tagged to me and not to him)
>
>> The problem is that because C wants to support weird character sets like
>> EBCDIC, so as a result we can't have nice things.  Specifically, we can't
>> have as part of the syntax of "switch":
>
>>         case 'A'..'F':
>>             puts("Valid grade!");
>
>> I'm pretty sure other languages support this - I think Pascal does.
>
> In both ASCII and EBCDIC, the characters 'A' through 'F' are contiguous,
> so you could do that. (Seems to me a language design decision.)

You keep saying that. Perhaps I should have chosen a different example! 
(I had no idea where the breaks in the EBCDIC set were.) The point is 
that because of those breaks in the sequence - wherever they happen to 
be - you can't use case letter1 .. letter2: in general.

But it does make you wonder why they bothered making some subsets of the 
alphabet have consecutive codes at all, rather than just make them all 
arbitrary and help keep everyone on their toes.

-- 
Bartc
0
Bartc
5/14/2015 8:57:49 AM
On 2015-05-14, Keith Thompson <kst-u@mib.org> wrote:
> The characters 'A'..'F' and 'a'..'f' are contiguous even in EBCDIC.  A
> future standard could guarantee that without breaking any systems that
> use ASCII-based or EBCDIC-based character sets.
>
> I just don't see all that many cases where case ranges would be useful.
> Maybe you see more than I do.
>
> If you want more than that, say 'A'..'Z' to test for uppercase letters,
> that might seem reasonable -- but there are potentially lot more than 26
> uppercase letters.

ISTM that this is what isupper() is for.  Furthermore, it's not wildly
difficult to gin up a function or macro to do any more complex test you
may have.

I think it would be better in general to lobby for more of these sort of
tests to be added to the standard, rather than making some radical
change to the base system requirements.

For example, when was isxdigit() added?  If there are lots more that are
really generally useful, by all means, let's add them to the standard.

Granted, this breaks a nice switch construct, but I would just do the
isupper() (or whatever) test first, and then put a switch for the
special cases in an else block.

-- 
nw
0
Nathan
5/14/2015 9:54:46 AM
On Thu, 14 May 2015 09:41:41 +0100, Bartc <bc@freeuk.com> wrote:

>On 14/05/2015 03:05, Stephen Sprunk wrote:
>> On 13-May-15 19:31, Bartc wrote:
>
>>> BTW those 'A'..'Z' ranges are part of working code and they work
>>> properly. But it need not be just because it assumes ASCII. If my
>>> translator generated 'A' constants instead of 65, then the same C
>>> output would compile in an EBCDIC system with the same 'A'..'Z' range
>>> in the input source, and work correctly!
>>
>> The problem is that, in C, 'A'..'Z' may include things that are _not_
>> upper case letters if the character set in use isn't one of the ASCII
>> compatible ones, e.g. EBCDIC.  That's why we can't have nice things.
>
>Strange then that when a programmer thinks "A to Z", they then proceed 
>to write out:
>
>'A','B','C','D','E','F','G','H','I','J','K','L','M',
>'N','O','P','Q','R','S','T','U','V','W','X','Y','Z'
>
>without any non-upper-case letters polluting the sequence. So, at least 
>for someone thinking in English (and you might notice C has English 
>keywords and many English-based function names), it's not because A to Z 
>has an implicit ASCII coding, it's because it's a natural sequence.


Those 26 aren't the only alphabetic characters.  And given that you
have to retrofit this into C, what forms are you going to allow?  What
should the following generate:

  case 1..9:
  case 'A'+2..'J'+2:
  case 'A'..'b':
  case '['..']':


>Since computers are supposed to stop us wasting time writing out stuff 
>such as the above (which was actually machine-generated), why is it that 
>C requires people to do things the long way?
>
>All it needs is a rule that when two character constants are separated 
>by a symbol such as "..", then an alphabetical sequence is generated: all 
>intervening values are generated without non-alphabeticals:
>
>  case 'H'..'O':
>
>becomes:
>
>  case 'H','I','J','K','L','M','N','O':
>
>(making use of another trivial enhancement where you don't need to 
>repeat 'case').


So what happens when someone specifies 'A'..'b'?


>> For numeric ranges, the justification is a lot weaker
>
>When it's been discussed in this group before, the 'A'..'Z' argument is 
>always trotted out.
>
>I agree it would cause problems if people were to write: c>='A' && 
>c<='Z', but that's another matter. That would break under EBCDIC but 
>that's what warnings are for.
0
Robert
5/14/2015 10:02:30 AM
On 14/05/15 10:56, Robert Wessel wrote:

<snip>

> But if you want discontiguous (and disordered!) letters, Baudot and
> its various descendants will do the trick.

The designers of the latest version of the DeathStation 9000, in a rare 
spirit of generosity and amity towards their customers, decided that 
their character set (which is of course 100% conforming) would have the 
following layout:

Code Point 0: null (obviously), followed by no fewer than 81 
DeathStation-specific code points which I won't go into here.

Then come Esc f1-f12 (in that order) PrtScr ScrollLock Break ESC F1-F12 
PRTSCR SCROLLLOCK Pause

¬ ! " £ $ % ^ & * ( ) _ + Backspace Insert Home PageUp

`
(At this point, there is a note in the design document along the lines 
of "growf!", as they realise they're going to have to break the logical 
pattern at this point in order to remain conforming)

0 1 2 3 4 5 6 7 8 9 - = BACKSPACE INSERT HOME PAGEUP

Tab q w e r t y u i o p [ ] Newline Delete End PageDown
TAB Q W E R T Y U I O P { } Carriage DELETE END PAGEDOWN
CapsLock a s d f g h j k l ; ' #
CAPSLOCK A S D F G H J K L : @ ~
LeftShift \ z x c v b n m , . / RightShift UpArrow
LEFTSHIFT | Z X C V B N M < > ? RIGHTSHIFT UPARROW
LeftCtrl Meta Alt Space Metameta Metametameta  RightCtrl LeftArrow 
DownArrow RightArrow
LEFTCTRL META ALT SPACE METAMETA METAMETAMETA RIGHTCTRL LEFTARROW 
DOWNARROW RIGHTARROW

In case you're curious, the DS9000 keyboard hardware distinguishes 
between LEFTSHIFT (upper case left shift) and RIGHTSHIFT by remembering 
which shift key was pressed first. Left first, you get RIGHTSHIFT. Right 
first, you get LEFTSHIFT. But you have to be quick, or you just get 
leftshift or rightshift instead.

You can order the very latest DS9000 by writing to:

The Purchasing Department
Armed Response Technologi%^t"\%Qaa^(`v9}NO CARRIER
0
Richard
5/14/2015 10:34:02 AM
On 14/05/15 04:37, Ian Collins wrote:
> Richard Heathfield wrote:
>> On 13/05/15 13:55, David Brown wrote:
>>> On 13/05/15 12:55, Richard Heathfield wrote:
>>>> On 13/05/15 08:14, David Brown wrote:
>>>>
>>>>> Isn't EBCDIC only used in really old systems?
>>>>
>>>> Spoken like a true PC junkie. :-)
>>>
>>> Well, PC downwards.  I don't do mainframes, minis, etc.
>>>
>>>>
>>>> Which reminds me: isn't ASCII only used in really tiny systems?
>>>>
>>>
>>> The point of ASCII is that apart from EBCDIC, everyone agrees on the
>>> first 128 characters.
>>
>> So are you claiming that a 7-bit code that has been standardised is a
>> 7-bit code that has been standardised? Outrageous! :-)
>>
>> The point of EBCDIC is that, like it or not, big iron runs it, *almost*
>> without exception, so we're kinda stuck with it for now (although there
>> are signs that IBM is slowly edging towards ASCII).
>
> Well that depends on where you get your big iron...  Everything "big"
> I've used this century runs Linux, Solaris or <insert vendors name here>
> Unix.
>

And I think many of the really "big iron" systems that IBM sells these 
days are used to run vast numbers of Linux virtual machines.


0
David
5/14/2015 11:13:35 AM
On 14/05/2015 11:02, Robert Wessel wrote:
> On Thu, 14 May 2015 09:41:41 +0100, Bartc <bc@freeuk.com> wrote:

>> Strange then that when a programmer thinks "A to Z", they then proceed
>> to write out:
>>
>> 'A','B','C','D','E','F','G','H','I','J','K','L','M',
>> 'N','O','P','Q','R','S','T','U','V','W','X','Y','Z'
>>
>> without any non-upper-case letters polluting the sequence. So, at least
>> for someone thinking in English (and you might notice C has English
>> keywords and many English-based function names), it's not because A to Z
>> has an implicit ASCII coding, it's because it's a natural sequence.
>
>
> Those 26 aren't the only alphabetic characters.

The 26-letter A-Z range is special. While many languages (without even 
getting into other alphabets) can have extra letters, with a coding 
outside the ASCII range, and others have fewer letters (Italian used to 
get by with only 21 letters), the 26-letter A-Z (and a-z) range remains 
important enough to be given special treatment.

(For example, they are the only letters that C allows in identifiers. In 
fact, on the first web page I looked at on the subject, it says that 
identifiers consist of a-z and A-Z (as well as 0-9 and underscore). It 
doesn't go into details as to what letters a-z or A-Z might include. 
Presumably it's obvious! Which is my point.)

> And given that you
> have to retrofit this into C, what forms are you going to allow?  What
> should the following generate:
>
>    case 1..9:
>    case 'A'+2..'J'+2:
>    case 'A'..'b':
>    case '['..']':

Doing it properly in a language, where you want to be able to assume 
that letters have a contiguous encoding, though your target might have 
an arbitrary one, is difficult. And that's when you have the freedom to 
do anything.

With C, anything retrofitted is going to be clunky enough and ugly 
enough that no one would use it. The simplest is to just allow 
everything in your examples, but give a warning when the char encoding 
doesn't have contiguous letters in the special set.

(What /can/ be retrofitted easily is to simply allow a list after 
'case', so that if a range does have to be enumerated one by one, it is 
a bit less painful.)

 >    case 1..9:

This one is just the set 1,2,3,4,5,6,7,8,9

 >    case 'A'..'b':
 >    case '['..']':

These two are bad practice anyway. Suppose someone were to talk about, 
in any context including real life, the sequence from capital A to small 
b: what would it mean? So such ranges would be ambiguous.

 >    case 'A'+2..'J'+2:

Expressions such as these are already allowed in C. They may or 
may not be the equivalent of 'C' and 'L'. But no-one is suggesting they 
oughtn't to be allowed because they might give different results under EBCDIC.

-- 
Bartc
0
Bartc
5/14/2015 12:44:22 PM
On 14-May-15 04:54, Nathan Wagner wrote:
> On 2015-05-14, Keith Thompson <kst-u@mib.org> wrote:
>> If you want more than that, say 'A'..'Z' to test for uppercase
>> letters, that might seem reasonable -- but there are potentially
>> lot more than 26 uppercase letters.
> 
> ISTM that this is what isupper() is for.  Furthermore, it's not
> wildly difficult to gin up a function or macro to do any more complex
> test you may have.

Unfortunately, those functions only handle single-byte characters.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
0
Stephen
5/14/2015 1:57:05 PM
Stephen Sprunk <stephen@sprunk.org> writes:
> On 14-May-15 04:54, Nathan Wagner wrote:
>> On 2015-05-14, Keith Thompson <kst-u@mib.org> wrote:
>>> If you want more than that, say 'A'..'Z' to test for uppercase
>>> letters, that might seem reasonable -- but there are potentially
>>> lot more than 26 uppercase letters.
>> 
>> ISTM that this is what isupper() is for.  Furthermore, it's not
>> wildly difficult to gin up a function or macro to do any more complex
>> test you may have.
>
> Unfortunately, those functions only handle single-byte characters.

<wctype.h> declares iswupper() et al.  It was added in C95 (Amendment 1).

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
0
Keith
5/14/2015 3:31:39 PM
Nathan Wagner <nw@hydaspes.if.org> writes:
[...]
> For example, when was isxdigit() added?  If there are lots more that are
> really generally useful, by all means, let's add them to the standard.
[...]

isxdigit() is in C90.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
0
Keith
5/14/2015 3:33:00 PM
Bartc <bc@freeuk.com> wrote:

(snip, I wrote)
>> In both ASCII and EBCDIC, the characters 'A' through 'F' are contiguous,
>> so you could do that. (Seems to me a language design decision.)
 
> You keep saying that. Perhaps I should have chosen a different example! 
> (I had no idea where the breaks in the EBCDIC set were.) The point is 
> that because of those breaks in the sequence - wherever they happen to 
> be - you can't use case letter1 .. letter2: in general.
 
> But it does make you wonder why they bothered making some subsets of the 
> alphabet have consecutive codes at all, rather than just make them all 
> arbitrary and help keep everyone on their toes.

As noted earlier, EBCDIC and ASCII date from about the same time, but
EBCDIC traces back through BCDIC (most often just called BCD) to
the punched card codes from years earlier.  Even before anyone
thought about computers.

The codes match up with the punch codes on the cards, each letter
having two punches. 

EBCDIC makes somewhat more sense than the BCDIC code, though.

In any case, I am not convinced that this had anything to do
with the development of C. If you find a reliable source that
knows more, I would be interested to hear about it.

-- glen

0
glen
5/14/2015 5:13:15 PM
Robert Wessel <robertwessel2@yahoo.com> wrote:

(snip)
> EBCDIC derives from earlier codes (all with "BCD" somewhere in the
> name) and punch card usage.
 
> The three groups of uppercase letters on IBM standard punch cards use
> a 12/11/0 punch, plus a 1-9 punch.  That first punch effectively
> represents the high nibble of the EBCDIC code (0xc0/0xd0/0xe0) for the
> characters, and second punch becomes the low nibble of the code.  The
> excess code point, oxe1, is used for a slash.  The digits are just
> single 0-9 punches.
 
> 704 BCD followed a similar pattern, despite having 6-bit characters,
> and the digits and three groups of alphabetic characters were in the
> 0x00/0x10/0x20/0x30 ranges.

In the six bit code days, there were multiple characters for some
of the codes, with commercial and scientific systems using different
ones.  I believe the 026 keypunch came both ways, but otherwise
you would learn what to type to get the character you wanted.
 
> In the 704 days relating the internal representation to the punch card
> format made some sense.  As mentioned elsewhere IBM had intended an
> ASCII variant for S/360, but both the standard and the ASCII
> peripherals were late (many of the card devices and printers - "unit
> record" gear - were enhanced/adapted versions of devices from
> pre-S/360 systems), so the OS and everything else got written with the
> updated BCD code, and despite having an ASCII mode, S/360 never used
> it.

Well, the ASCII-8 standard never came, still hasn't. 

Note that the 704 has to convert the input punches into BCD code
in software. The hardware just reads in the punch positions.
(Cards are read top to bottom, each punch row into two 36 bit words.)

The 360 peripherals, and smaller 360 systems, convert punch codes
to EBCDIC in microcode. That is, no look-up table.  (EBCDIC punch
codes allow only zero or one punch in rows 1 through 7, which allows
for eight possibilities. That, along with the other five punch rows,
gives 256 codes. Not all match so nicely to internal values, though.)
 
> ASCII has some odd choices too, but it's clearly a much cleaner
> organization.
 
> But if you want discontiguous (and disordered!) letters, Baudot and
> its various descendants will do the trick.

Specifically, the codes used by Teletype corporation before the ASR33.

-- glen
0
glen
5/14/2015 5:22:07 PM
Richard Heathfield <rjh@cpax.org.uk> wrote:

(snip)
> The designers of the latest version of the DeathStation 9000, in a rare 
> spirit of generosity and amity towards their customers, decided that 
> their character set (which is of course 100% conforming) would have the 
> following layout:
 
(snip)

> ¬ ! " £ $ % ^ & * ( ) _ + Backspace Insert Home PageUp

and the real problem with ASCII, no ¬ sign. Very inconvenient
for PL/I programming.

-- glen
0
glen
5/14/2015 5:24:03 PM
Robert Wessel <robertwessel2@yahoo.com> wrote:

(snip, someone wrote)
>>The problem is that, in C, 'A'..'Z' may include things that are _not_
>>upper case letters if the character set in use isn't one of the ASCII
>>compatible ones, e.g. EBCDIC.  That's why we can't have nice things.
 
> I think BartC is suggesting that case 'H'..'K' generate (the
> equivalent of):
 
>  case 'H':
>  case 'I':
>  case 'J':
>  case 'K':
 
(snip)

> Of course defining that behavior for the language presents an
> interesting challenge.  The behavior is fairly obvious so long as the
> range is entirely within *one* of the sets of digits, uppercase
> characters, or lowercase characters.  That would also have some
> interesting implications for the syntax of a case label.

regexp compilers know how to do it.  It would also be interesting
to allow:

case upper(x):

(Yes I know that C doesn't allow non-constant expressions in case,
but verilog does.  But there is probably another way to add it
to C.)

-- glen


0
glen
5/14/2015 5:27:18 PM
Bartc <bc@freeuk.com> wrote:

(snip)

> The 26-letter A-Z range is special. 

Residents of many other countries might disagree.

> While many languages (without even 
> getting into other alphabets) can have extra letters, with a coding 
> outside the ASCII range, and others have fewer letters (Italian used to 
> get by with only 21 letters), the 26-letter A-Z (and a-z) range remains 
> important enough to be given special treatment.
 
> (For example, they are the only letters that C allows in identifiers. In 
> fact, on the first web page I looked at on the subject, it says that 
> identifiers consist of a-z and A-Z (as well as 0-9 and underscore). It 
> doesn't go into details as to what letters a-z or A-Z might include. 
> Presumably it's obvious! Which is my point.)

Note that Java doesn't have that restriction. All Unicode letters
and digits are allowed in Java identifiers.  That is especially 
interesting as many look almost the same as Roman alphabet letters.
(Notice that I didn't say English alphabet.)

Also, note that C is now an ISO standard. 

-- glen
0
glen
5/14/2015 5:32:10 PM
On 14/05/2015 18:32, glen herrmannsfeldt wrote:
> Bartc <bc@freeuk.com> wrote:
>
> (snip)
>
>> The 26-letter A-Z range is special.
>
> Residents of many other countries might disagree.

No. But that doesn't change the facts.

That particular set of letters is fully supported and can be represented in:

- Baudot code
- Morse code
- Semaphore
- Sign Language (BSL for sure)
- Radix 50
- 'Sixbit'
- ASCII
- Even EBCDIC

That's just off the top of my head.

There's also been plenty of hardware and software in the past which, if 
it was restricted in the characters it could deal with, would at least 
have supported upper case A to Z.

>> (For example, they are the only letters that C allows in identifiers. In
>> fact, on the first web page I looked at on the subject, it says that
>> identifiers consist of a-z and A-Z (as well as 0-9 and underscore). It
>> doesn't go into details as to what letters a-z or A-Z might include.
>> Presumably it's obvious! Which is my point.)
>
> Note that Java doesn't have that restriction. All unicode letter
> and digits are allowed in Java identifiers.  That is especially
> interesting as many look almost the same as Roman alphabet letters.
> (Notice that I didn't say English alphabet.)

Languages are usually more restrictive in the character set that is used 
to express source code. But one thing they will invariably have in 
common is to allow A to Z. (Oddball languages excluded.)

Or perhaps you have another candidate for an alphabet that has been of 
similar significance and ubiquity?

-- 
Bartc
0
Bartc
5/14/2015 7:18:40 PM
On Thu, 14 May 2015 13:13:35 +0200, David Brown
<david.brown@hesbynett.no> wrote:

>On 14/05/15 04:37, Ian Collins wrote:
>> Richard Heathfield wrote:
>>> On 13/05/15 13:55, David Brown wrote:
>>>> On 13/05/15 12:55, Richard Heathfield wrote:
>>>>> On 13/05/15 08:14, David Brown wrote:
>>>>>
>>>>>> Isn't EBCDIC only used in really old systems?
>>>>>
>>>>> Spoken like a true PC junkie. :-)
>>>>
>>>> Well, PC downwards.  I don't do mainframes, minis, etc.
>>>>
>>>>>
>>>>> Which reminds me: isn't ASCII only used in really tiny systems?
>>>>>
>>>>
>>>> The point of ASCII is that apart from EBCDIC, everyone agrees on the
>>>> first 128 characters.
>>>
>>> So are you claiming that a 7-bit code that has been standardised is a
>>> 7-bit code that has been standardised? Outrageous! :-)
>>>
>>> The point of EBCDIC is that, like it or not, big iron runs it, *almost*
>>> without exception, so we're kinda stuck with it for now (although there
>>> are signs that IBM is slowly edging towards ASCII).
>>
>> Well that depends on where you get your big iron...  Everything "big"
>> I've used this century runs Linux, Solaris or <insert vendors name here>
>> Unix.
>>
>
>And I think many of the really "big iron" systems that IBM sells these 
>days are used to run vast numbers of Linux virtual machines.


Well a few of them are, anyway.  Having a few Linux images on a
mainframe is quite common, but the actual wholesale replacement of
"squatty box" server farms appears quite rare.  Virtualizing a bunch
of low-utilization servers onto a single physical box certainly has
merit, but a mainframe doesn't really bring all that much that you
can't do with VMWare (and the like).
0
Robert
5/15/2015 1:01:01 AM
On Thursday, 14 May 2015 20:32:20 UTC+3, glen herrmannsfeldt  wrote:
>
> Note that Java doesn't have that restriction. All unicode letter
> and digits are allowed in Java identifiers.  That is especially 
> interesting as many look almost the same as Roman alphabet letters.
> (Notice that I didn't say English alphabet.)

What is the significance or trick about that notice? My (possibly wrong)
knowledge is that English uses the classical Roman alphabet, to which 
letters J, U and W were added. There are other characters in Unicode
that look very similar to characters J, U, W, j, u and w.
0
ISO
5/15/2015 12:32:28 PM
On 15-May-15 07:32, Öö Tiib wrote:
> On Thursday, 14 May 2015 20:32:20 UTC+3, glen herrmannsfeldt  wrote:
>> Note that Java doesn't have that restriction. All unicode letter 
>> and digits are allowed in Java identifiers.  That is especially 
>> interesting as many look almost the same as Roman alphabet
>> letters. (Notice that I didn't say English alphabet.)
> 
> What is the significance or trick about that notice? My (possibly
> wrong) knowledge is that English uses classical Roman alphabet to
> what letters J, U and W were added.

It's the Latin, not Roman, alphabet.

English uses the "modern" Latin alphabet, which adds some letters, e.g.
J, U and W, and excludes some others, e.g. Æ, Þ, Œ.  Note that most
Latin texts used in schools have been transformed into the "modern"
alphabet, rather than being studied in the original form.

Most other languages using the Latin alphabet have diacritical marks,
which aren't in ASCII either; English had some too, until the advent of
the typewriter, when they were dropped as "unnecessary".

> There are other characters in Unicode that look very similar to
> characters J, U, W, j, u and w.

Several letters in the Greek and/or Cyrillic alphabets look the same as
ones in the Latin alphabet, but they all have distinct code points:

http://upload.wikimedia.org/wikipedia/commons/8/84/Venn_diagram_showing_Greek,_Latin_and_Cyrillic_letters.svg

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
0
Stephen
5/15/2015 6:47:03 PM
On Thu, 14 May 2015 11:34:02 +0100, Richard Heathfield
<rjh@cpax.org.uk> wrote:

>On 14/05/15 10:56, Robert Wessel wrote:
>
><snip>
>
>> But if you want discontiguous (and disordered!) letters, Baudot and
>> its various descendants will do the trick.
>
>The designers of the latest version of the DeathStation 9000, in a rare 
>spirit of generosity and amity towards their customers, decided that 
>their character set (which is of course 100% conforming) would have the 
>following layout:
>
>Code Point 0: null (obviously), followed by no fewer than 81 
>DeathStation-specific code points which I won't go into here.
>
>Then come Esc f1-f12 (in that order) PrtScr ScrollLock Break ESC F1-F12 
>PRTSCR SCROLLLOCK Pause
>
>¬ ! " £ $ % ^ & * ( ) _ + Backspace Insert Home PageUp
>
>`
>(At this point, there is a note in the design document along the lines 
>of "growf!", as they realise they're going to have to break the logical 
>pattern at this point in order to remain conforming)
>
>0 1 2 3 4 5 6 7 8 9 - = BACKSPACE INSERT HOME PAGEUP
>
>Tab q w e r t y u i o p [ ] Newline Delete End PageDown
>TAB Q W E R T Y U I O P { } Carriage DELETE END PAGEDOWN
>CapsLock a s d f g h j k l ; ' #
>CAPSLOCK A S D F G H J K L : @ ~
>LeftShift \ z x c v b n m , . / RightShift UpArrow
>LEFTSHIFT | Z X C V B N M < > ? RIGHTSHIFT UPARROW
>LeftCtrl Meta Alt Space Metameta Metametameta  RightCtrl LeftArrow 
>DownArrow RightArrow
>LEFTCTRL META ALT SPACE METAMETA METAMETAMETA RIGHTCTRL LEFTARROW 
>DOWNARROW RIGHTARROW
>
>In case you're curious, the DS9000 keyboard hardware distinguishes 
>between LEFTSHIFT (upper case left shift) and RIGHTSHIFT by remembering 
>which shift key was pressed first. Left first, you get RIGHTSHIFT. Right 
>first, you get LEFTSHIFT. But you have to be quick, or you just get 
>leftshift or rightshift instead.
>
>You can order the very latest DS9000 by writing to:
>
>The Purchasing Department
>Armed Response Technologi%^t"\%Qaa^(`v9}NO CARRIER


Is the collating sequence different on DS9000's manufactured for sale
in France?
0
Robert
5/16/2015 6:21:37 AM
On Thu, 14 May 2015 17:22:07 +0000 (UTC), glen herrmannsfeldt
<gah@ugcs.caltech.edu> wrote:

>Robert Wessel <robertwessel2@yahoo.com> wrote:
>
>(snip)
>> EBCDIC derives from earlier codes (all with "BCD" somewhere in the
>> name) and punch card usage.
> 
>> The three groups of uppercase letters on IBM standard punch cards use
>> a 12/11/0 punch, plus a 1-9 punch.  That first punch effectively
>> represents the high nibble of the EBCDIC code (0xc0/0xd0/0xe0) for the
>> characters, and second punch becomes the low nibble of the code.  The
>> excess code point, 0xe1, is used for a slash.  The digits are just
>> single 0-9 punches.
> 
>> 704 BCD followed a similar pattern, despite having 6-bit characters,
>> and the digits and three groups of alphabetic characters were in the
>> 0x00/0x10/0x20/0x30 ranges.
>
>In the six bit code days, there were multiple characters for some
>of the codes, with commercial and scientific systems using different
>ones.  I believe the 026 keypunch came both ways, but otherwise
>you would learn what to type to get the character you wanted.
> 
>> In the 704 days relating the internal representation to the punch card
>> format made some sense.  As mentioned elsewhere IBM had intended an
>> ASCII variant for S/360, but both the standard and the ASCII
>> peripherals were late (many of the card devices and printers - "unit
>> record" gear - were enhanced/adapted versions of devices from
>> pre-S/360 systems), so the OS and everything else got written with the
>> updated BCD code, and despite having an ASCII mode, S/360 never used
>> it.
>
>Well, the ASCII-8 standard never came, still hasn't. 


I've never been clear on whether ASCII-8 was actually supposed to be
an X3/ASA/USASI/ANSI standard, or it was just an IBM thing.  While X2
considered eight bits as a possibility, I don't think it ever went
anywhere.  As defined by IBM, ASCII-8 was ASCII(-7)*, with the high
two bits shifted left one position, with the high bit duplicated into
the vacated position.


*With a few variations, especially in the control characters, from
what the final ASCII was.  Which may just be because ASCII(-7) was
late.
0
Robert
5/16/2015 6:59:30 AM
On Thu, 14 May 2015 17:27:18 +0000 (UTC), glen herrmannsfeldt
<gah@ugcs.caltech.edu> wrote:

>Robert Wessel <robertwessel2@yahoo.com> wrote:
>
>(snip, someone wrote)
>>>The problem is that, in C, 'A'..'Z' may include things that are _not_
>>>upper case letters if the character set in use isn't one of the ASCII
>>>compatible ones, e.g. EBCDIC.  That's why we can't have nice things.
> 
>> I think BartC is suggesting that case 'H'..'K' generate (the
>> equivalent of):
> 
>>  case 'H':
>>  case 'I':
>>  case 'J':
>>  case 'K':
> 
>(snip)
>
>> Of course defining that behavior for the language presents an
>> interesting challenge.  The behavior is fairly obvious so long as the
>> range is entirely within *one* of the sets of digits, uppercase
>> characters, or lowercase characters.  That would also have some
>> interesting implications for the syntax of a case label.
>
>regexp compilers know how to do it.  It would also be interesting
>to allow:
>
>case upper(x):
>
>(Yes I know that C doesn't allow non-constant expressions in case,
>but verilog does.  But there is probably another way to add it
>to C.)


A more sophisticated selection statement, akin to Cobol "evaluate", is
possible.
0
Robert
5/16/2015 7:00:35 AM
On Friday, 15 May 2015 21:47:21 UTC+3, Stephen Sprunk  wrote:
> On 15-May-15 07:32, Öö Tiib wrote:
> > On Thursday, 14 May 2015 20:32:20 UTC+3, glen herrmannsfeldt  wrote:
> >> Note that Java doesn't have that restriction. All unicode letter
> >> and digits are allowed in Java identifiers.  That is especially
> >> interesting as many look almost the same as Roman alphabet
> >> letters. (Notice that I didn't say English alphabet.)
> >
> > What is the significance or trick about that notice? My (possibly
> > wrong) knowledge is that English uses classical Roman alphabet to
> > what letters J, U and W were added.
>
> It's the Latin, not Roman, alphabet.

Aren't those same? At least Wikipedia says "Latin alphabet", AKA
"Roman alphabet".

> English uses the "modern" Latin alphabet, which adds some letters, e.g.
> J, U and W, and excludes some others, e.g. Æ, Þ, OE.

Thanks for more accurate description of differences. I'm still unsure
why Glen emphasised the difference between Latin and English alphabets
in context of similar-looking code points in Unicode.
0
ISO
5/16/2015 9:51:00 AM
Robert Wessel <robertwessel2@yahoo.com> wrote:

(snip, I wrote)
>>In the six bit code days, there were multiple characters for some
>>of the codes, with commercial and scientific systems using different
>>ones.  I believe the 026 keypunch came both ways, but otherwise
>>you would learn what to type to get the character you wanted.

(after someone else wrote)
>>> In the 704 days relating the internal representation to the punch card
>>> format made some sense.  As mentioned elsewhere IBM had intended an
>>> ASCII variant for S/360, but both the standard and the ASCII
>>> peripherals were late (many of the card devices and printers - "unit
>>> record" gear - were enhanced/adapted versions of devices from
>>> pre-S/360 systems), so the OS and everything else got written with the
>>> updated BCD code, and despite having an ASCII mode, S/360 never used
>>> it.

>>Well, the ASCII-8 standard never came, still hasn't. 
 
> I've never been clear on whether ASCII-8 was actually supposed to be
> an X3/ASA/USASI/ANSI standard, or it was just an IBM thing.  While X2
> considered eight bits as a possibility, I don't think it ever went
> anywhere.  As defined by IBM, ASCII-8 was ASCII(-7)*, with the high
> two bits shifted left one position, with the high bit duplicated into
> the vacated position.

As I understand it, if ASCII-8 was accepted, then OS/360 would
have used it instead of EBCDIC.  It wasn't, and it didn't. 

It would have been a challenge to get the card readers and such
to use it, but that was about the time that 256 byte translation
tables became affordable. It might have cost a little more, but
not a lot more.

But also, translation tables allowed for ease of conversion
when needed. OS/360 can read and write AL tapes, and, not so much
later, the 3705 can talk to ASCII terminals.
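
The translate-table technique is a single 256-byte lookup per byte,
which is what the S/360 TR instruction does in hardware.  A minimal C
sketch (the table contents here are a placeholder, not a real
EBCDIC-to-ASCII table):

```c
#include <stddef.h>

/* Rewrite each byte of buf through a 256-entry translation table,
   one lookup per byte -- the same shape as the S/360 TR instruction. */
void translate(unsigned char *buf, size_t n, const unsigned char table[256])
{
    for (size_t i = 0; i < n; i++)
        buf[i] = table[buf[i]];
}
```

Since the table is just data, converting between character sets is a
matter of loading a different 256-byte table, which is why it made
conversion cheap once that much fast storage became affordable.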
 
> *With a few variations, especially in the control characters, from
> what the final ASCII was.  Which may just be because ASCII(-7) was
> late.

In addition to ^ and _ replacing up and back arrow.

-- glen
0
glen
5/16/2015 9:51:58 AM
On Sat, 16 May 2015 09:51:58 +0000 (UTC), glen herrmannsfeldt
<gah@ugcs.caltech.edu> wrote:

>Robert Wessel <robertwessel2@yahoo.com> wrote:
>
>(snip, I wrote)
>>>In the six bit code days, there were multiple characters for some
>>>of the codes, with commercial and scientific systems using different
>>>ones.  I believe the 026 keypunch came both ways, but otherwise
>>>you would learn what to type to get the character you wanted.
>
>(after someone else wrote)
>>>> In the 704 days relating the internal representation to the punch card
>>>> format made some sense.  As mentioned elsewhere IBM had intended an
>>>> ASCII variant for S/360, but both the standard and the ASCII
>>>> peripherals were late (many of the card devices and printers - "unit
>>>> record" gear - were enhanced/adapted versions of devices from
>>>> pre-S/360 systems), so the OS and everything else got written with the
>>>> updated BCD code, and despite having an ASCII mode, S/360 never used
>>>> it.
>
>>>Well, the ASCII-8 standard never came, still hasn't. 
> 
>> I've never been clear on whether ASCII-8 was actually supposed to be
>> an X3/ASA/USASI/ANSI standard, or it was just an IBM thing.  While X2
>> considered eight bits as a possibility, I don't think it ever went
>> anywhere.  As defined by IBM, ASCII-8 was ASCII(-7)*, with the high
>> two bits shifted left one position, with the high bit duplicated into
>> the vacated position.
>
>As I understand it, if ASCII-8 was accepted, then OS/360 would
>have used it instead of EBCDIC.  It wasn't, and it didn't. 


I'm not sure I see the timing work out.  While ASCII was planned for
the S/360, and the standardization was underway, the S/360 project
really got running at the end of 1961 (there were a couple of years of
discussion on unification prior to that, much focusing on a proposed
"8000" machine, which was never built).  The first (partial) version
of ASCII, missing the lower case letters and then some, wasn't
published until early 1963.  A more complete version was published
later that year.  It seems inconceivable that there would have been
enough time to convert OS/360 to ASCII between then and 1964 (or even
April 1965 when the first non-beta machines shipped to customers),
especially since OS/360 was already a software development disaster by
then.  Even ignoring the issue of peripherals.

Again, I've never seen a reference to "ASCII-8" outside of IBM.  Even
if that had been something X3 was considering, and it had been
standardized on the same day the seven-bit code was standardized, I
still don't see how it could have ended up in OS/360.

ASCII in any form just wasn't there in time for it to be a realistic
option for S/360.
0
Robert
5/16/2015 10:36:03 AM
Robert Wessel <robertwessel2@yahoo.com> wrote:

(snip, I previously wrote)
>>>>Well, the ASCII-8 standard never came, still hasn't. 
 
>>> I've never been clear on whether ASCII-8 was actually supposed to be
>>> an X3/ASA/USASI/ANSI standard, or it was just an IBM thing.  While X2
>>> considered eight bits as a possibility, I don't think it ever went
>>> anywhere.  As defined by IBM, ASCII-8 was ASCII(-7)*, with the high
>>> two bits shifted left one position, with the high bit duplicated into
>>> the vacated position.

>>As I understand it, if ASCII-8 was accepted, then OS/360 would
>>have used it instead of EBCDIC.  It wasn't, and it didn't. 
 
> I'm not sure I see the timing work out.  While ASCII was planned for
> the S/360, and the standardization was underway, the S/360 project
> really got running at the end of 1961 (there were a couple of years of
> discussion on unification prior to that, much focusing on a proposed
> "8000" machine, which was never built).  

I recommend reading "Computer Architecture Concepts and Evolution"
by Blaauw and Brooks. For one, it has many details on many different
computers built over the years, but is especially detailed on the
design of S/360 and OS/360. There is a lot of detail on different
character sets, who did what, when, and why. 

Note that much of the development of OS/360 was done on 7090s. 
There weren't yet 360s to run it on. 

They at least thought about this problem in writing the PL/I (F)
compiler and library. Routines have comments indicating that either
they are independent of the character set used, or that they adopt
the character set that they are assembled in. That is, that 
character set specific values are only used in character constants,
and vice versa.  If done right, the only change needed is to
convert the source and assemble with the new assembler!
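
The same discipline carries straight into C: keep code-point
assumptions out of the logic by using character constants, which the
compiler resolves in whatever execution character set the source is
compiled for.  A hedged sketch (function names are mine):

```c
/* Portable: '0'..'9' are guaranteed contiguous by the C standard,
   in both ASCII and EBCDIC, and the constants are resolved by the
   compiler's execution character set. */
int is_digit_portable(int c)
{
    return c >= '0' && c <= '9';
}

/* Unportable: hard-codes the ASCII code points, so recompiling for
   an EBCDIC target silently gives wrong answers. */
int is_digit_unportable(int c)
{
    return c >= 0x30 && c <= 0x39;
}
```

On an ASCII host the two agree; only the first survives the kind of
source conversion glen describes.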

> The first (partial) version
> of ASCII, missing the lower case letters and then some, wasn't
> published until early 1963.  A more complete version was published
> later that year.  It seems inconceivable that there would have been
> enough time to convert OS/360 to ASCII between then and 1964 (or even
> April 1965 when the first non-beta machines shipped to customers),
> especially since OS/360 was already a software development disaster by
> then.  Even ignoring the issue of peripherals.

If OS/360 development was done on the 7090, in BCDIC, it had to be
converted somewhere along the line. It doesn't make so much difference
which way it goes.
 
> Again, I've never seen a reference to "ASCII-8" outside of IBM.  Even
> if that had been something X3 was considering, and it had been
> standardized on the same day the seven-bit code was standardized, I
> still don't see how it could have ended up in OS/360.

Consider that the printer usually used with S/360, the 1403, is the
printer that came from the 1401 series. (They didn't even renumber
it.)  To print, there is a mapping from the characters to the position
on the print train, but note that position isn't any easier to find
in EBCDIC than ASCII. (They are not, for example, in consecutive 
EBCDIC codes.) It is a little easier to translate card punch codes
to EBCDIC, but not all that much harder to ASCII.  It might take
a little more microcode to do it.
 
> ASCII in any form just wasn't there in time for it to be a realistic
> option for S/360.

Blaauw and Brooks description sounds like ASCII specifically wanted
to be as incompatible with BCDIC as they could be. 

Now, the problem for IBM was to make current customers happy. 
That was easier converting to EBCDIC, not so different from BCDIC
as ASCII-7 or ASCII-8.

-- glen
0
glen
5/16/2015 7:32:19 PM
On 16-May-15 04:51, Öö Tiib wrote:
> On Friday, 15 May 2015 21:47:21 UTC+3, Stephen Sprunk  wrote:
>> On 15-May-15 07:32, Öö Tiib wrote:
>>> What is the significance or trick about that notice? My
>>> (possibly wrong) knowledge is that English uses classical Roman
>>> alphabet to what letters J, U and W were added.
>> 
>> It's the Latin, not Roman, alphabet.
> 
> Aren't those same? At least Wikipedia says "Latin alphabet", AKA 
> "Roman alphabet".

"Roman" could be misinterpreted as one of several other alphabets that
have been used in Rome, before and after Latin was king.  That, and
Unicode officially calls them "Latin" characters.

Also, "Roman" has a specific meaning in the font world, so it's best not
to use the same term for something different in a related field.

>> English uses the "modern" Latin alphabet, which adds some letters,
>> e.g. J, U and W, and excludes some others, e.g. Æ, Þ, OE.

It's interesting that your reply has "OE" rather than the "Œ" of my
original post.  I've never seen a reply decompose a glyph like that;
either it comes through correctly or it's completely garbled.

> Thanks for more accurate description of differences. I'm still
> unsure why Glen emphasised the difference between Latin and English
> alphabets in context of similar-looking code points in Unicode.

I assumed that he was pointing out that English isn't the only user,
much less originator, of that alphabet, but that's not relevant to the
issue of lookalike glyphs, so maybe not.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
0
Stephen
5/16/2015 10:13:47 PM
On 16/05/15 07:21, Robert Wessel wrote:

<snip>

> Is the collating sequence different on DS9000's manufactured for sale
> in France?

Yes, and a different one still for Germany, and yet another for Russia 
(and yes, worryingly, DS9Ks are very popular in Russia for some reason).

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
5/17/2015 9:00:16 AM
glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:

> Kenny McCormack <gazelle@shell.xmission.com> wrote:
> 
> > The problem is that because C wants to support weird character sets like
> > EBCDIC, so as a result we can't have nice things.  Specifically, we can't
> > have as part of the syntax of "switch":
>  
> >        case 'A'..'F':
> >            puts("Valid grade!");
>  
> > I'm pretty sure other languages support this - I think Pascal does.
> 
> In both ASCII and EBCDIC, the characters 'A' through 'F' are contiguous,
> so you could do that. (Seems to me a language design decision.)
> 
> There are some ranges that are not contiguous, and in all cases
> it is up to the programmer to know how to use them. 
> 
> There are plenty of programming mistakes you can make using
> just ASCII characters.

For example, assuming that all letters are within the original ASCII
range. This hasn't been true in Europe for decades.
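
In C this means range tests like c >= 'A' && c <= 'Z' miss such
letters; the locale-aware classification functions are the tool for
the job, with the usual unsigned-char caveat.  A sketch (the wrapper
name is mine):

```c
#include <ctype.h>

/* isupper()'s argument must be representable as unsigned char (or be
   EOF); on platforms where plain char is signed, bytes >= 0x80 --
   exactly the accented letters in question -- would otherwise be
   passed as negative values, which is undefined behavior. */
int is_upper_letter(char c)
{
    return isupper((unsigned char)c) != 0;
}
```

Which letters beyond A-Z it accepts then depends on the current
locale, which is rather the point.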

Richard
0
raltbos
5/17/2015 10:13:56 AM
Bartc <bc@freeuk.com> wrote:

> I agree it would cause problems if people were to write: c>='A' && 
> c<='Z', but that's another matter. That would break under EBCDIC but 
> that's what warnings are for.

Indeed, and it would be _more_ difficult to write an acceptable
exception for conditional expressions than it would be to write one for
an extended case.
Which means that your argument that the existence of EBCDIC is the
crucial thing that means we don't have an extended case in C is
foundered by your own argument.

Richard
0
raltbos
5/17/2015 10:36:32 AM
In article <55586ec4.5718781@news.xs4all.nl>,
Richard Bos <rlbos@xs4all.nl> wrote:
>Bartc <bc@freeuk.com> wrote:
>
>> I agree it would cause problems if people were to write: c>='A' && 
>> c<='Z', but that's another matter. That would break under EBCDIC but 
>> that's what warnings are for.
>
>Indeed, and it would be _more_ difficult to write an acceptable
>exception for conditional expressions than it would be to write one for
>an extended case.
>Which means that your argument that the existence of EBCDIC is the
>crucial thing that means we don't have an extended case in C is
>foundered by your own argument.

I think you are talking in circles here.  I heard a great quote the other
day, that seems somehow appropriate: Speaking of one or another of the
Republican clown car guys, the line was "He sounds like what dumb people
think smart people sound like".

But, having said that, let's be clear that it was I, not Bart, who
suggested that it was because of EBCDIC that we couldn't have nice things.
And, further, I actually think that:

    case 'A'..'Z':

*should* mean (if it were legal C syntax, which it, of course, is not [yet])
the same thing as "isupper" - i.e. the upper case letters.  Bart & I
disagree here, as he thinks it should mean the same thing as:

    c>='A' && c<='Z'

As has been pointed out ad nauseum in this thread, this has equivalent
functionality in ASCII, but not so in EBCDIC.
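
Concretely: in EBCDIC the uppercase letters sit in three separate
runs, with non-letter codes in the gaps, so a single range test and an
isupper-style classification disagree.  A sketch using the standard
EBCDIC code points (function names are mine):

```c
/* EBCDIC uppercase letters: A..I = 0xC1..0xC9, J..R = 0xD1..0xD9,
   S..Z = 0xE2..0xE9 -- three runs, not one. */
int ebcdic_is_upper(unsigned c)
{
    return (c >= 0xC1 && c <= 0xC9)   /* A..I */
        || (c >= 0xD1 && c <= 0xD9)   /* J..R */
        || (c >= 0xE2 && c <= 0xE9);  /* S..Z */
}

/* What "c >= 'A' && c <= 'Z'" compiles to under EBCDIC: a single
   range that also admits the non-letter codes in the gaps. */
int ebcdic_naive_range(unsigned c)
{
    return c >= 0xC1 && c <= 0xE9;
}
```

0xCA, for instance, is inside the naive range but is not a letter.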

-- 
Windows 95 n. (Win-doze): A 32 bit extension to a 16 bit user interface for
an 8 bit operating system based on a 4 bit architecture from a 2 bit company
that can't stand 1 bit of competition.

Modern day upgrade --> Windows XP Professional x64: Windows is now a 64 bit
tweak of a 32 bit extension to a 16 bit user interface for an 8 bit
operating system based on a 4 bit architecture from a 2 bit company that
can't stand 1 bit of competition.
0
gazelle
5/17/2015 12:26:36 PM
On 17/05/2015 11:36, Richard Bos wrote:
> Bartc <bc@freeuk.com> wrote:
>
>> I agree it would cause problems if people were to write: c>='A' &&
>> c<='Z', but that's another matter. That would break under EBCDIC but
>> that's what warnings are for.
>
> Indeed, and it would be _more_ difficult to write an acceptable
> exception for conditional expressions than it would be to write one for
> an extended case.
> Which means that your argument that the existence of EBCDIC is the
> crucial thing that means we don't have an extended case in C is
> foundered by your own argument.

Not at all. C allows c>='A' && c<='Z' even though what is probably
intended would only work with ASCII and not EBCDIC.

But it doesn't allow case 'A'..'Z' even though exactly the same is true.

Part of the reason it doesn't allow it is this problem (it's also
easier to isolate the ".." or whatever construction is used to
signify a set of consecutive values).

Most of us don't care that our code might not run on EBCDIC. And those
that did, can continue writing out all the values individually.

(BTW would you have any objection to being able to write case 
'A','B','C': or does it absolutely have to be case 'A': case 'B': case 
'C': ? What about case "ABC": where the characters of the string are 
separated out into 'A', 'B', 'C'?)

-- 
Bartc
0
Bartc
5/17/2015 1:13:50 PM
In article <mja44d$n5p$1@dont-email.me>, Bartc  <bc@freeuk.com> wrote:
....
>Most of us don't care that our code might not run on EBCDIC. And those 
>that did, can continue writing out all the values individually.

True, 'dat!  But the point is that the standard has to continue to support
these weird (and by "weird", yes, I mean, "Other than ASCII") character sets.

That is, until such time, as it (i.e., the committee) decides not to
support them any more.  Something that *will* happen, at some future time...

>(BTW would you have any objection to being able to write case 
>'A','B','C': or does it absolutely have to be case 'A': case 'B': case 
>'C': ? What about case "ABC": where the characters of the string are 
>separated out into 'A', 'B', 'C'?)

The *right* solution to all of this (i.e., this specific problem as well as
a whole bunch more) is to allow expression in the case labels, like certain
other "C-like" languages, such as PHP, do.

That is, to allow:

switch (1) {
    case !!isupper(c):
    ...
    }

This form of switch exists in several "scripting type" languages, PHP
probably being the best known example.

P.S.  Yes, I can hear the caterwauling already about how inefficient this
would be - how it would prevent the compiler from using so-called "jump
tables" and so on.  Well, I'm not listening to said caterwauling...

-- 
"Unattended children will be given an espresso and a free kitten."
0
gazelle
5/17/2015 1:45:56 PM
On 17/05/2015 14:45, Kenny McCormack wrote:
> In article <mja44d$n5p$1@dont-email.me>, Bartc  <bc@freeuk.com> wrote:
> ...
>> Most of us don't care that our code might not run on EBCDIC. And those
>> that did, can continue writing out all the values individually.
>
> True, 'dat!  But the point is that the standard has to continue to support
> these weird (and by "weird", yes, I mean, "Other than ASCII") character sets.
>
> That is, until such time, as it (i.e., the committee) decides not to
> support them any more.  Something that *will* happen, at some future time...
>
>> (BTW would you have any objection to being able to write case
>> 'A','B','C': or does it absolutely have to be case 'A': case 'B': case
>> 'C': ? What about case "ABC": where the characters of the string are
>> separated out into 'A', 'B', 'C'?)
>
> The *right* solution to all of this (i.e., this specific problem as well as
> a whole bunch more) is to allow expression in the case labels, like certain
> other "C-like" languages, such as PHP, do.
>
> That is, to allow:
>
> switch (1) {
>      case !!isupper(c):
>      ...
>      }
>
> This form of switch exists in several "scripting type" languages, PHP
> probably being the best known example.
>
> P.S.  Yes, I can hear the caterwauling already about how inefficient this
> would be - how it would prevent the compiler from using so-called "jump
> tables" and so on.  Well, I'm not listening to said caterwauling...

That's already allowed, but you have to write it like this:

if (!!isupper) ...

But sometimes you do need a proper jump-table switch for speed, for 
which I suppose you can write:

  #define UPPER \
          'A': case 'B': case 'C': case 'D': case 'E': \
     case 'F': case 'G': case 'H': case 'I': case 'J': \
     case 'K': case 'L': case 'M': case 'N': case 'O': \
     case 'P': case 'Q': case 'R': case 'S': case 'T': \
     case 'U': case 'V': case 'W': case 'X': case 'Y': \
     case 'Z'

then use:

     switch (c) {
     case UPPER:

But, you still have to write that lot in the first place, with the same 
potential for mistakes, which will now be propagated everywhere.

It still doesn't seem right that the programmer has to take care of this 
stuff because the language is unwilling to.

-- 
Bartc
0
Bartc
5/17/2015 2:02:08 PM
In article <mja6uv$10b$1@dont-email.me>, Bartc  <bc@freeuk.com> wrote:
....
>> switch (1) {
>>      case !!isupper(c):
>>      ...
>>      }
>>
....
>That's already allowed, but you have to write it like this:
>
>if (!!isupper) ...

No, that's not the same thing.  Think more carefully.
You might want to read the Wikipedia page about switch statements (which
was referenced somewhere upthread) and read specifically what it has to say
about the PHP style "switch" statement.

Besides, if you use an 'if' statement, then you don't need the '!!'.

-- 
"I heard somebody say, 'Where's Nelson Mandela?' Well, 
Mandela's dead. Because Saddam killed all the Mandelas." 

George W. Bush, on the former South African president who 
is still very much alive, Sept. 20, 2007

0
gazelle
5/17/2015 2:20:48 PM
On 17/05/2015 15:20, Kenny McCormack wrote:
> In article <mja6uv$10b$1@dont-email.me>, Bartc  <bc@freeuk.com> wrote:
> ...
>>> switch (1) {
>>>       case !!isupper(c):
>>>       ...
>>>       }
>>>
> ...
>> That's already allowed, but you have to write it like this:
>>
>> if (isupper(c)) ... [fixed]
>
> No, that's not the same thing.  Think more carefully.
> You might want to read the Wikipedia page about switch statements (which
> was referenced somewhere upthread) and read specifically what it has to say
> about the PHP style "switch" statement.
>
> Besides, if you use an 'if' statement, then you don't need the '!!'.

Sorry, I can't see the difference. In both cases some code gets executed 
when c is a capital letter. (I understand that form of switch won't work 
in C.)

-- 
Bartc
0
Bartc
5/17/2015 2:51:25 PM
Bartc <bc@freeuk.com> writes:
> On 17/05/2015 11:36, Richard Bos wrote:
>> Bartc <bc@freeuk.com> wrote:
>>
>>> I agree it would cause problems if people were to write: c>='A' &&
>>> c<='Z', but that's another matter. That would break under EBCDIC but
>>> that's what warnings are for.
>>
>> Indeed, and it would be _more_ difficult to write an acceptable
>> exception for conditional expressions than it would be to write one for
>> an extended case.
>> Which means that your argument that the existence of EBCDIC is the
>> crucial thing that means we don't have an extended case in C is
>> foundered by your own argument.
>
> Not at all. C allows c>='A' && c<='Z' even though what is probably
> intended would only work with ASCII and not EBCDIC.
>
> But it doesn't allow case 'A'..'Z' even though exactly the same is true.
>
> Part of the reason it doesn't allow is because of this problem (it's
> also easier to isolate the ".." or whatever construction is used to
> signify a set of consecutive values.

Certainly C doesn't support case ranges (gcc supports them as an
extension, but using "...", since that token already exists in the
language).  But I'm not convinced EBCDIC is the reason for that.
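
For reference, the gcc extension looks like this.  It is not standard
C, though gcc and clang both accept it; gcc's manual also advises
writing spaces around the "..." so it can't be mis-lexed next to
numeric constants:

```c
/* gcc/clang case-range extension -- a sketch, not standard C. */
int grade_ok(int c)
{
    switch (c) {
    case 'A' ... 'F':   /* expands to the six consecutive code points */
        return 1;
    default:
        return 0;
    }
}
```

Note that the range is over code points in the execution character
set, so it inherits exactly the ASCII-vs-EBCDIC question being argued
in this thread.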

The lack of case ranges isn't an arbitrary distinction, it's just a
feature that doesn't exist in the language.  That may be a subtle
distinction, but it's significant.

C's switch statement is a very low-level construct, essentially a
computed goto (see also Duff's Device).  BCPL and B both had switch
statements very similar to C's, and neither supported ranges.  I doubt
that the design of BCPL was influenced by concern about EBCDIC.

> Most of us don't care that our code might not run on EBCDISC. And
> those that did, can continue writing out all the values individually.
>
> (BTW would you have any objection to being able to write case
> 'A','B','C': or does it absolutely have to be case 'A': case 'B': case
> 'C': ?

It depends on what you mean by "objection".  Certainly it's not
valid in current C.  It could be added without breaking anything;
the current syntax is

    case constant-expression :

and constant expression cannot include a comma operator.  I wouldn't
object to adding it to the language, but even if it appeared in, say,
C2020, I wouldn't be able to take advantage of it in code that needs
to be compiled by pre-C2020 compilers.  I doubt that it would be
added, since I've seen no demand for it other than your suggestion,
and it adds no real expressive power.  If someone designing a new
language wants to support a more terse syntax for multiple cases,
that's great; some languages already do.

I wouldn't advise someone designing a new language to use C-style switch
statements in the first place, unless C compatibility is an important
requirement.

>       What about case "ABC": where the characters of the string are
> separated out into 'A', 'B', 'C'?)

My only objection to that is that I find it a bit ugly.  It would
assign a new and very different meaning to string literals.
There's nothing inconsistent about it, but it could cause some
confusion; I've seen code by new programmers trying to use switch
statements on strings.  It adds no real expressive power, and IMHO
it's not worth the cost.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
0
Keith
5/17/2015 2:56:04 PM
Bartc <bc@freeuk.com> writes:
[...]
> if (!!isupper) ...
[...]

I don't know what this is in response to, but I presume you didn't
mean that literally.  The expression `!!isupper` simply yields 1,
since isupper is of function pointer type and is non-null.

You can certainly use the result of isupper() in an if statement, but
there's no need to apply `!!` to it.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
0
Keith
5/17/2015 2:58:49 PM
In article <mja9rb$b72$1@dont-email.me>, Bartc  <bc@freeuk.com> wrote:
....
>Sorry, I can't see the difference. In both cases some code gets executed 
>when c is a capital letter. (I understand that form of switch won't work 
>in C.)

You need to think beyond what's directly in front of you.  This is a high
bar for most CLCers.  That's what makes CLC CLC.  But I think you are
capable of it - although too much time spent in CLC tends to make one's
brain rot.

Anyway, I'm starting a new thread about the new form of switch() (Yes, the
one that C doesn't have [yet]).
-- 
"Unattended children will be given an espresso and a free kitten."
0
gazelle
5/17/2015 3:00:40 PM
On 17/05/2015 15:56, Keith Thompson wrote:
> Bartc <bc@freeuk.com> writes:

>>        What about case "ABC": where the characters of the string are
>> separated out into 'A', 'B', 'C'?)
>
> My only objection to that is that I find it a bit ugly.  It would
> assign a new and very different meaning to string literals.
> There's nothing inconsistent about it, but it could cause some
> confusion; I've seen code by new programmers trying to use switch
> statements on strings.  It adds no real expressive power, and IMHO
> it's not worth the cost.


Yet there is a precedent for it in C:

  char str[] = "ABC";

instead of having to write:

  char str[] = {'A','B','C',0};

(although there is the ambiguity of that final 0 which isn't needed when 
used in a switch case.)

The expressive power is in being able to match an arbitrary set of 
characters more easily, where even the range doesn't help. And it's 
easier also to define that string elsewhere, and use it in several 
places including the switch, as well as being able print it or pass it 
to functions.

Something that exists only as case 'X': case 'P': case '.' is less flexible.
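
For what it's worth, the "set defined in one place, usable in several
places" part is already expressible in today's C with strchr(), at the
cost of the jump table.  A sketch (names are mine):

```c
#include <string.h>

/* The set lives in one string, which can also be printed or passed
   around.  The c != '\0' guard matters because strchr() counts the
   terminating null as part of the searched string. */
static const char specials[] = "XP.";

int is_special(int c)
{
    return c != '\0' && strchr(specials, (char)c) != NULL;
}
```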

Anyway that was just something I thought of while writing that post.

-- 
Bartc




0
Bartc
5/17/2015 3:14:30 PM
Keith Thompson <kst-u@mib.org> wrote:
> Bartc <bc@freeuk.com> writes:

(snip)
>> Not at all. C allows c>='A' && c<='Z' even though what is probably
>> intended would only work with ASCII and not EBCDIC.

>> But it doesn't allow case 'A'..'Z' even though exactly the same is true.

>> Part of the reason it doesn't allow is because of this problem (it's
>> also easier to isolate the ".." or whatever construction is used to
>> signify a set of consecutive values.
 
> Certainly C doesn't support case ranges (gcc supports them as an
> extension, but using "...", since that token already exists in the
> language).  But I'm not convinced EBCDIC is the reason for that.

I am not at all convinced, either, though have no direct evidence
either way.
 
> The lack of case ranges isn't an arbitrary distinction, it's just a
> feature that doesn't exist in the language.  That may be a subtle
> distinction, but it's significant.
 
> C's switch statement is a very low-level construct, essentially a
> computed goto (see also Duff's Device).  BCPL and B both had switch
> statements very similar to C's, and neither supported ranges.  I doubt
> that the design of BCPL was influenced by concern about EBCDIC.

Having learned Fortran long before C, I was surprised after some
years to find out when it isn't one.  If you put a

   case 1000000000:

you find the difference.
 
(snip)

> It depends on what you mean by "objection".  Certainly it's not
> valid in current C.  It could be added without breaking anything;
> the current syntax is
 
>    case constant-expression :
 
> and constant expression cannot include a comma operator.  I wouldn't
> object to adding it to the language, but even if it appeared in, say,
> C2020, I wouldn't be able to take advantage of it in code that needs
> to be compiled by pre-C2020 compilers.  

If a new switch/case is added to C, I would suggest one like
the verilog case statement.

In verilog, the cases aren't required to be constant, and the first
one that matches is taken. (And no fall through, either.)
(Though most verilog code with case does use constants.)

> I doubt that it would be
> added, since I've seen no demand for it other than your suggestion,
> and it adds no real expressive power.  If someone designing a new
> language wants to support a more terse syntax for multiple cases,
> that's great; some languages already do.

-- glen
0
glen
5/17/2015 3:46:34 PM
Bartc <bc@freeuk.com> writes:
> On 17/05/2015 15:56, Keith Thompson wrote:
>> Bartc <bc@freeuk.com> writes:
>
>>>        What about case "ABC": where the characters of the string are
>>> separated out into 'A', 'B', 'C'?)
>>
>> My only objection to that is that I find it a bit ugly.  It would
>> assign a new and very different meaning to string literals.
>> There's nothing inconsistent about it, but it could cause some
>> confusion; I've seen code by new programmers trying to use switch
>> statements on strings.  It adds no real expressive power, and IMHO
>> it's not worth the cost.
>
> Yet there is a precedent for it in C:
>
>  char str[] = "ABC";
>
> instead of having to write:
>
>  char str[] = {'A','B','C',0};

No, that's nothing more than the usual meaning of a string literal.
(It's one of the three cases where an expression of array type isn't
implicitly converted to a pointer.)

> (although there is the ambiguity of that final 0 which isn't needed
> when used in a switch case.)
>
> The expressive power is in being able to match an arbitrary set of
> characters more easily, where even the range doesn't help. And it's
> easier also to define that string elsewhere, and use it in several
> places including the switch, as well as being able print it or pass it
> to functions.
>
> Something that exists only as case 'X': case 'P': case '.' is less
> flexible.

When I said it adds no expressive power, I meant that there's
nothing you can express using the proposed construct that can't be
straightforwardly expressed (though more verbosely) using existing
constructs.

As for defining it elsewhere, you could always write:

    #define CASE_UPPER \
        case 'A': case 'B': case 'C': \
        ... \
        case 'Z'

> Anyway that was just something I thought of while writing that post.

If you're going to make switch statements more flexible, you'd have
to reach a consensus on just how far to take it.  Should non-constant
expressions be permitted?  String comparisons?  Floating-point?
Should there be a special syntax that matches uppercase letters
regardless of the character set?

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
5/17/2015 7:57:54 PM
On 17-May-15 08:45, Kenny McCormack wrote:
> Bartc  <bc@freeuk.com> wrote:
>> (BTW would you have any objection to being able to write case 
>> 'A','B','C': or does it absolutely have to be case 'A': case 'B':
>> case 'C': ? What about case "ABC": where the characters of the
>> string are separated out into 'A', 'B', 'C'?)
> 
> The *right* solution to all of this (i.e., this specific problem as
> well as a whole bunch more) is to allow expression in the case
> labels, like certain other "C-like" languages, such as PHP, do.
> 
> That is, to allow:
> 
> switch (1) { case !!isupper(c): ... }
> 
> This form of switch exists in several "scripting type" languages,
> PHP probably being the best known example.

That sounds great, but what if two (or more) of the case expressions
were true, which might not be determinable at compile time because the
value returned by isupper() depends on the locale?  Do you execute the
first one?  All of them?  Would the result be implementation-defined,
unspecified, or undefined?

> P.S.  Yes, I can hear the caterwauling already about how inefficient
> this would be - how it would prevent the compiler from using
> so-called "jump tables" and so on.  Well, I'm not listening to said
> caterwauling...

An inability to use jump tables, which is the main advantage of a switch
statement over an if/else chain, seems to defeat the purpose.

Bartc's suggestion of "case 'A','B','C':" would be compatible with jump
tables, however, since it's just a syntactic shortcut for the existing
"case 'A': case 'B': case 'C':".

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
Stephen
5/17/2015 8:35:35 PM
On 17-May-15 09:02, Bartc wrote:
> But sometimes you do need a proper jump-table switch for speed, for 
> which I suppose you can write:
> 
> #define UPPER \ 'A': case 'B': case 'C': case 'D': case 'E': \ case
> 'F': case 'G': case 'H': case 'I': case 'J': \ case 'K': case 'M':
> case 'N': case 'O': case 'P': \ case 'Q': case 'R': case 'S': case
> 'T': case 'U': \ case 'V': case 'W': case 'X': case 'Y': case 'Z'
> 
> then use:
> 
> switch (c) { case UPPER:
> 
> But, you still have to write that lot in the first place, with the same
> potential for mistakes, which will now be propagated everywhere.

Worse, now your lack of support for non-ASCII letters is hidden in some
header file, rather than in the code where it belongs.

That's part of why we have isupper() et al.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
Stephen
5/17/2015 8:36:45 PM
On 17-May-15 09:56, Keith Thompson wrote:
> I've seen code by new programmers trying to use switch statements on
> strings.  It adds no real expressive power, and IMHO it's not worth
> the cost.

If I were going to muck with how switch statements work, that's probably
the first place I'd look.  I've seen a _lot_ of code that would be much
clearer if it could simply switch on a string value.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
Stephen
5/17/2015 8:39:21 PM
Stephen Sprunk <stephen@sprunk.org> writes:
> On 17-May-15 09:56, Keith Thompson wrote:
>> I've seen code by new programmers trying to use switch statements on
>> strings.  It adds no real expressive power, and IMHO it's not worth
>> the cost.
>
> If I were going to muck with how switch statements work, that's probably
> the first place I'd look.  I've seen a _lot_ of code that would be much
> clearer if it could simply switch on a string value.

Sure, I've seen code that could sensibly be implemented using a switch
on a string value, equivalent to:

    if (strcmp(s, "this") == 0) {
        /* ... */
    }
    else if (strcmp(s, "that") == 0) {
        /* ... */
    }
    ...

But very often the required test for each case isn't just a string
equality comparison.  Frequently you really need a case-insensitive
comparison, or a substring comparison, or even a regular expression
test.
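The if/else chain above is the standard idiom, and when a true switch is wanted the usual C workaround is to map the string to a small integer code first, then switch on the code. A minimal sketch (the command names and the `cmd_lookup`/`describe` helpers are hypothetical, purely for illustration):

```c
#include <string.h>

/* Map a string to a small integer code once, then switch on the code. */
enum cmd { CMD_THIS, CMD_THAT, CMD_UNKNOWN };

static enum cmd cmd_lookup(const char *s)
{
    static const struct { const char *name; enum cmd id; } table[] = {
        { "this", CMD_THIS },
        { "that", CMD_THAT },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(s, table[i].name) == 0)
            return table[i].id;
    return CMD_UNKNOWN;
}

static const char *describe(const char *s)
{
    switch (cmd_lookup(s)) {   /* now an ordinary integer switch */
    case CMD_THIS: return "handled this";
    case CMD_THAT: return "handled that";
    default:       return "unknown command";
    }
}
```

Of course, this only works when each case really is an exact string match; the case-insensitive, substring, and regex variants mentioned above still need explicit tests.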

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
5/17/2015 8:58:31 PM
On 17/05/2015 21:36, Stephen Sprunk wrote:
> On 17-May-15 09:02, Bartc wrote:
>> But sometimes you do need a proper jump-table switch for speed, for
>> which I suppose you can write:
>>
>> #define UPPER \ 'A': case 'B': case 'C': case 'D': case 'E': \ case
>> 'F': case 'G': case 'H': case 'I': case 'J': \ case 'K': case 'M':
>> case 'N': case 'O': case 'P': \ case 'Q': case 'R': case 'S': case
>> 'T': case 'U': \ case 'V': case 'W': case 'X': case 'Y': case 'Z'
>>
>> then use:
>>
>> switch (c) { case UPPER:
>>
>> But, you still have to write that lot in the first place, with the same
>> potential for mistakes, which will now be propagated everywhere.
>
> Worse, now your lack of support for non-ASCII letters is hidden in some
> header file, rather than in the code where it belongs.

So how would you code a single switch case to detect all 26 letters of 
the alphabet? (And if you say use something other than a switch, then 
that's side-stepping the issue.)

Anyway using a macro such as UPPER is one approach that can work 
provided it's reliably defined, if there is no way to avoid enumerating 
every single character. (Maybe you've noticed there's a letter missing.) 
However this obviously needs to be done; it would be better if it was 
built-in.

> That's part of why we have isupper() et al.

What does it do with character codes 128 to 255? This is an unknown. If 
I'm parsing some input and specifically need to detect A-Z, a-z, 0-9 and 
a few extra, then it's not so useful.

-- 
Bartc

Bartc
5/17/2015 9:28:33 PM
On Sunday, 17 May 2015 01:13:55 UTC+3, Stephen Sprunk  wrote:
> >> English uses the "modern" Latin alphabet, which adds some letters,
> >> e.g. J, U and W, and excludes some others, e.g. Æ, Þ, OE.
>
> It's interesting that your reply has "OE" rather than the "OE" of my
> original post.  I've never seen a reply decompose a glyph like that;
> either it comes through correctly or it's completely garbled.

Interesting indeed; I used Google Groups to reply. Let's see if it does
that again above. I observe both in quoted text above.
ISO
5/18/2015 5:34:57 AM
On 05/17/2015 11:46 AM, glen herrmannsfeldt wrote:
> Keith Thompson <kst-u@mib.org> wrote:
....
>> C's switch statement is a very low-level construct, essentially a
>> computed goto (see also Duff's Device).  BCPL and B both had switch
>> statements very similar to C's, and neither supported ranges.  I doubt
>> that the design of BCPL was influenced by concern about EBCDIC.
> 
> Having learned Fortran long before C, I was surprised after some
> years to find out when it isn't.  If you put a 
> 
>    case 1000000000:
> 
> you find the difference.

It could very easily be due to a lack of sleep on my part, but I had
trouble understanding that comment. If you were to expand the phrase
"when it isn't" to "when A is not B", what would "A" be, and what would
"B" be?

What is the difference you found with the following line?

	case 1000000000:

That line causes the value 1000000000 to be converted to the promoted
type of the controlling switch expression. If that's a type which isn't
capable of representing that value, the results can be unexpected, but a
decent compiler should warn you of that problem. In principle, that's
equally true of any case label, though a value that large is more likely
to trigger such a problem. If that's the issue you're referring to, you
should have provided information about the type of the controlling
switch expression. Otherwise, that looks to me like a perfectly ordinary
case label.
James
5/18/2015 3:10:10 PM
James Kuyper <jameskuyper@verizon.net> writes:
> On 05/17/2015 11:46 AM, glen herrmannsfeldt wrote:
>> Keith Thompson <kst-u@mib.org> wrote:
> ...
>>> C's switch statement is a very low-level construct, essentially a
>>> computed goto (see also Duff's Device).  BCPL and B both had switch
>>> statements very similar to C's, and neither supported ranges.  I doubt
>>> that the design of BCPL was influenced by concern about EBCDIC.
>> 
>> Having learned Fortran long before C, I was surprised after some
>> years to find out when it isn't.  If you put a 
>> 
>>    case 1000000000:
>> 
>> you find the difference.
>
> It could very easily be due to a lack of sleep on my part, but I had
> trouble understanding that comment. If you were to expand the phrase
> "when it isn't" to "when A is not B", what would "A" be, and what would
> "B" be?
>
> What is the difference you found with the following line?
>
> 	case 1000000000:
>
> That line causes the value 1000000000 to be converted to the promoted
> type of the controlling switch expression. If that's a type which isn't
> capable of representing that value, the results can be unexpected, but a
> decent compiler should warn you of that problem. In principle, that's
> equally true of any case label, though a value that large is more likely
> to trigger such a problem. If that's the issue you're referring to, you
> should have provided information about the type of the controlling
> switch expression. Otherwise, that looks to me like a perfectly ordinary
> case label.

A sufficiently naive compiler, given

    case 1000000000:

might (try to) generate a jump table with a billion entries.
(I've seen exactly that problem for the equivalent construct with
a Pascal compiler, though things started blowing up around 10,000.)

I haven't checked, but I'm skeptical that any "sufficiently naive"
compilers in this sense are currently in production.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
5/18/2015 3:56:14 PM
James Kuyper <jameskuyper@verizon.net> wrote:

(snip, I wrote)
>> Having learned Fortran long before C, I was surprised after some
>> years to find out when it isn't.  If you put a 
 
>>    case 1000000000:
 
>> you find the difference.
 
> It could very easily be due to a lack of sleep on my part, but I had
> trouble understanding that comment. If you were to expand the phrase
> "when it isn't" to "when A is not B", what would "A" be, and what would
> "B" be?
 
> What is the difference you found with the following line?
 
>        case 1000000000:

Yes, I didn't make it very clear at all.  Consider:

switch(i) {
   case 1: s="one"; break;
   case 1000000: s="million"; break;
   case 1000000000: s="billion"; break;
   }

I do hope the compiler doesn't generate a 1000000001 entry branch table.
If all the cases are close together:

switch(i) {
   case 999999999: s="lots of nines"; break;
   case 1000000000: s="billion"; break;
   case 1000000001: s="billion and one"; break;
   case 1000000002: s="billion and two"; break;
   }

The compiler might subtract 999999999 and then use a small branch
table on the difference.
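The rebasing trick can be sketched at the source level. This is a hand-written equivalent of that lowering, not what any particular compiler emits: the clustered labels 999999999..1000000002 are shifted down to 0..3 and used to index a small dense table, instead of a billion-entry jump table.

```c
/* Rebase clustered case values to a small dense table. */
static const char *name_it(long i)
{
    static const char *const table[] = {
        "lots of nines", "billion", "billion and one", "billion and two",
    };
    long d = i - 999999999;            /* rebase so the labels start at 0 */
    return (d >= 0 && d < 4) ? table[d] : "other";
}
```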

(snip)

-- glen
glen
5/18/2015 6:03:03 PM
Keith Thompson <kst-u@mib.org> wrote:
> James Kuyper <jameskuyper@verizon.net> writes:

(snip, I wrote)
>>> Having learned Fortran long before C, I was surprised after some
>>> years to find out when it isn't.  If you put a 
 
>>>    case 1000000000:
 
(snip)
>> What is the difference you found with the following line?

>>       case 1000000000:

(snip)

> A sufficiently naive compiler, given
 
>    case 1000000000:
 
> might (try to) generate a jump table with a billion entries.
> (I've seen exactly that problem for the equivalent construct with
> a Pascal compiler, though things started blowing up around 10,000.)

> I haven't checked, but I'm skeptical that any "sufficiently naive"
> compilers in thise sense are currently in production.

It was some years ago when I first tried that.  But then again, 
compilers stay in use long after they are out of production.

I was decoding the high bits of a 32 bit word, such as:

   case 0x10000000: ...
   case 0x20000000: ...
   case 0x30000000: ...
   case 0x40000000: ...

My first use of switch/case was a translation of a Pascal program...

-- glen
glen
5/18/2015 6:09:01 PM
On 05/18/2015 02:03 PM, glen herrmannsfeldt wrote:
> James Kuyper <jameskuyper@verizon.net> wrote:
> 
> (snip, I wrote)
>>> Having learned Fortran long before C, I was surprised after some
>>> years to find out when it isn't.  If you put a 
>  
>>>    case 1000000000:
>  
>>> you find the difference.
>  
>> It could very easily be due to a lack of sleep on my part, but I had
>> trouble understanding that comment. If you were to expand the phrase
>> "when it isn't" to "when A is not B", what would "A" be, and what would
>> "B" be?

You didn't answer that bit, so I'm still confused about it.

>> What is the difference you found with the following line?
>  
>>        case 1000000000:
> 
> Yes, I didn't make it very clear at all.  Consider:
> 
> switch(i) {
>    case 1: s="one"; break;
>    case 1000000: s="million"; break;
>    case 1000000000: s="billion"; break;
>    }
> 
> I do hope the compiler doesn't generate a 1000000001 entry branch table.

That's purely a quality of implementation issue (QoI), and I would
expect any decent compiler to generate a branch table only if doing so
was superior to the alternatives. Given that the equivalent of the
following code is also a valid alternative:

    if(i == 1)
        s = "one";
    else if(i == 1000000)
       s = "million";
    else if(i == 1000000000)
       s = "billion";

It's hard to imagine a compiler validly concluding that such a huge
branch table is a superior approach.
Unless, of course, the required memory is cheaper (in all relevant
senses of the word) than the average CPU time needed to evaluate the
if()s, in which case, what's wrong with using a billion-entry branch table?

James
5/18/2015 6:41:00 PM
Stephen Sprunk wrote:
> On 17-May-15 14:28, Bartc wrote:
>> On 17/05/2015 21:36, Stephen Sprunk wrote:
>>> On 17-May-15 09:02, Bartc wrote:
>>>> switch (c) { case UPPER:
>>>>
>>>> But, you still have to write that lot in the first place, with the
>>>> same potential for mistakes, which will now be propagated
>>>> everywhere.
>>>
>>> Worse, now your lack of support for non-ASCII letters is hidden in
>>> some header file, rather than in the code where it belongs.
>>
>> So how would you code a single switch case to detect all 26 letters
>> of the alphabet? (And if you say use something other than a switch,
>> then that's side-stepping the issue.)
>
> Who says the user's alphabet has 26 letters--or that the user's script
> is an alphabet at all, or that the concept of "upper case" applies?
>
>> Anyway using a macro such as UPPER is one approach that can work
>> provided it's reliably defined, if there is no way to avoid
>> enumerating every single character. (Maybe you've noticed there's a
>> letter missing.) However this obviously needs to be done; it would be
>> better if it was built-in.
>
> It doesn't seem to be so "obvious" to those who work with languages
> other than English, i.e. the majority of humanity.
>
>>> That's part of why we have isupper() et al.
>>
>> What does it do with character codes 128 to 255? This is an unknown.
>> If I'm parsing some input and specifically need to detect A-Z, a-z,
>> 0-9 and a few extra, then it's not so useful.
>
> If you need to detect those _specific_ characters, and you can safely
> ignore the million or so other characters, then you can spend a minute
> or two writing the cases out--but I agree that "case 'A','B':" would be
> a nice syntactic shortcut for such code.


case 'A':
case 'B':

Oh so onerous :)

> We've already demonstrated why
> we can't have "case 'A'..'Z':", which I know is what you really want.
>

If you want Pascal, use Pascal. Royal "you" there, not pointed at Stephen.

> OTOH, if what you really want is to detect letters and numbers in
> general, then there are various is*() functions that are far better for
> that purpose than anything the average programmer will come up with,


+1

> and
> IMHO we shouldn't tempt him away from the correct solution by offering a
> better way to do the incorrect solution.
>
> S
>


-- 
Les Cargill
Les
5/20/2015 1:01:01 AM
bart4858@gmail.com wrote:
> On Wednesday, 20 May 2015 17:52:26 UTC+1, Les Cargill  wrote:
>> Stephen Sprunk wrote:
>
>>> or two writing the cases out--but I agree that "case 'A','B':"
>>> would be a nice syntactic shortcut for such code.
>>
>>
>> case 'A': case 'B':
>>
>> Oh so onerous :)
>
> Try writing 26 of them. Or 100, if you want to detect the range 100
> to 199 for example (or 100 to c. 209 with every tenth intermediate
> value skipped, just to forestall you suggesting using an 'if').
>

Doctor, doctor, it hurts when I do that...

Declarative range checks and possibly tables spring to mind
as better approaches.

>>> We've already demonstrated why we can't have "case 'A'..'Z':",
>>> which I know is what you really want.
>>>
>>
>> If you want Pascal, use Pascal. Royal "you" there, not pointed at
>> Stephen.
>
> Pascal had some good stuff in it. But I can write 'A'..'Z'  now, and
> I can make it compile under C. However I need to go to the trouble of
> superimposing a somewhat different language on top of it when it
> could be implemented so easily.
>

Sure. But it's not 'C' any more.

"switch" control in 'C' would compile to a jump table in
older compilers. Very handy when you were shaving every cycle.
Today, it's less of a thing.

> Look, what exactly would happen if it /was/ available in C? Probably
> nothing. If you're worried about EBCDIC, then don't use it. (I
> understand that gcc may already have that extension, and the world
> doesn't seem to have ended yet.)
>

Wouldn't hurt a thing, really. I think it's somewhat less concise than
"if (x<0) ; else if (x>100) ; else
{
}"

with the appropriate bracing/line breaks.

> If a program intended for ASCII is ever run on EBCDIC hardware, then
> there may indeed be a problem, but I suspect it will go wrong in a
> dozen other ways first, including not being able to compile at all
> (many C applications seem extremely fragile).
>
> (In the early 80s I was programming machines with Z80s with display
> systems based on ASCII codes for text. I was anyway coding font
> bitmaps and glyphs myself based on codes that I chose, which usually
> were ASCII (sometimes dropping to 6-bit, but A-Z were still
> consecutive).
>
> The chances of my code running on some IBM mainframe were miniscule.
> So why /shouldn't/ I write 'A'..'Z' and assume ASCII code points? As
> I said above, other things would break first, such as the graphics
> hardware being obsolete and incompatible anyway.)
>

-- 
Les Cargill

Les
5/20/2015 1:01:01 AM
On 17-May-15 22:34, Öö Tiib wrote:
> On Sunday, 17 May 2015 01:13:55 UTC+3, Stephen Sprunk  wrote:
>>>> English uses the "modern" Latin alphabet, which adds some letters,
>>>> e.g. J, U and W, and excludes some others, e.g. Æ, Þ, OE.
>>
>> It's interesting that your reply has "OE" rather than the "OE" of my
>> original post.  I've never seen a reply decompose a glyph like that;
>> either it comes through correctly or it's completely garbled.
> 
> Interesting indeed; I used Google Groups to reply. Let's see if it does
> that again above. I observe both in quoted text above.

Decomposed again.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
Stephen
5/20/2015 3:47:53 PM
On 17-May-15 14:28, Bartc wrote:
> On 17/05/2015 21:36, Stephen Sprunk wrote:
>> On 17-May-15 09:02, Bartc wrote:
>>> switch (c) { case UPPER:
>>> 
>>> But, you still have to write that lot in the
>>> same potential for mistakes, which will now be propagated
>>> everywhere.
>> 
>> Worse, now your lack of support for non-ASCII letters is hidden in
>> some header file, rather than in the code where it belongs.
> 
> So how would you code a single switch case to detect all 26 letters
> of the alphabet? (And if you say use something other than a switch,
> then that's side-stepping the issue.)

Who says the user's alphabet has 26 letters--or that the user's script
is an alphabet at all, or that the concept of "upper case" applies?

> Anyway using a macro such as UPPER is one approach that can work 
> provided it's reliably defined, if there is no way to avoid
> enumerating every single character. (Maybe you've noticed there's a
> letter missing.) However this obviously needs to be done; it would be
> better if it was built-in.

It doesn't seem to be so "obvious" to those who work with languages
other than English, i.e. the majority of humanity.

>> That's part of why we have isupper() et al.
> 
> What does it do with character codes 128 to 255? This is an unknown.
> If I'm parsing some input and specifically need to detect A-Z, a-z,
> 0-9 and a few extra, then it's not so useful.

If you need to detect those _specific_ characters, and you can safely
ignore the million or so other characters, then you can spend a minute
or two writing the cases out--but I agree that "case 'A','B':" would be
a nice syntactic shortcut for such code.  We've already demonstrated why
we can't have "case 'A'..'Z':", which I know is what you really want.

OTOH, if what you really want is to detect letters and numbers in
general, then there are various is*() functions that are far better for
that purpose than anything the average programmer will come up with, and
IMHO we shouldn't tempt him away from the correct solution by offering a
better way to do the incorrect solution.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
Stephen
5/20/2015 3:59:22 PM
Stephen Sprunk <stephen@sprunk.org> writes:
[...]
> OTOH, if what you really want is to detect letters and numbers in
> general, then there are various is*() functions that are far better for
> that purpose than anything the average programmer will come up with, and
> IMHO we shouldn't tempt him away from the correct solution by offering a
> better way to do the incorrect solution.

I suspect the average programmer wouldn't write a set of is*() and to*()
functions that have undefined behavior when called with a negative
`char` argument other than EOF.
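For readers unfamiliar with that trap: since plain char may be signed, the usual fix is to cast through unsigned char before calling any of the <ctype.h> functions. A minimal sketch (the wrapper name is mine):

```c
#include <ctype.h>

/* Passing a negative value other than EOF to isupper() is undefined
 * behavior; casting through unsigned char keeps the argument in range. */
static int is_upper_char(char c)
{
    return isupper((unsigned char)c);
}
```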

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
5/20/2015 5:44:46 PM
On Wednesday, 20 May 2015 16:59:31 UTC+1, Stephen Sprunk  wrote:
> On 17-May-15 14:28, Bartc wrote:

> > Anyway using a macro such as UPPER is one approach that can work 
> > provided it's reliably defined, if there is no way to avoid
> > enumerating every single character. (Maybe you've noticed there's a
> > letter missing.) However this obviously needs to be done; it would be
> > better if it was built-in.
> 
> It doesn't seem to be so "obvious" to those who work with languages
> other than English, i.e. the majority of humanity.

I've already posted the various ways in which that 26-letter alphabet dominates.

There's also a reason why it's the set supported in ASCII (now the base-set of Unicode and the only letters that don't need escapes in UTF8).

So it is still very important. In fact, you can see A-Z almost anywhere, and everyone will know what is meant.

That's in the English speaking world, but it's still a very dominant language, although other languages are available. But if someone is coding in English, then why shouldn't they be able to express A-Z succinctly?

(Posting from google-groups (and in a place where the official language isn't English, but nearly everyone speaks it anyway. It gets everywhere) so apologies if this comes out screwed up.)

-- 
bartc

> 
> >> That's part of why we have isupper() et al.
> > 
> > What does it do with character codes 128 to 255? This is an unknown.
> > If I'm parsing some input and specifically need to detect A-Z, a-z,
> > 0-9 and a few extra, then it's not so useful.
> 
> If you need to detect those _specific_ characters, and you can safely
> ignore the million or so other characters, then you can spend a minute
> or two writing the cases out--but I agree that "case 'A','B':" would be
> a nice syntactic shortcut for such code.  We've already demonstrated why
> we can't have "case 'A'..'Z':", which I know is what you really want.
> 
> OTOH, if what you really want is to detect letters and numbers in
> general, then there are various is*() functions that are far better for
> that purpose than anything the average programmer will come up with, and
> IMHO we shouldn't tempt him away from the correct solution by offering a
> better way to do the incorrect solution.
> 
> S
> 
> -- 
> Stephen Sprunk         "God does not play dice."  --Albert Einstein
> CCIE #3723         "God is an inveterate gambler, and He throws the
> K5SSS        dice at every possible opportunity." --Stephen Hawking

bart4858
5/20/2015 8:42:24 PM
On Wednesday, 20 May 2015 17:52:26 UTC+1, Les Cargill  wrote:
> Stephen Sprunk wrote:

> > or two writing the cases out--but I agree that "case 'A','B':" would be
> > a nice syntactic shortcut for such code.
>
>
> case 'A':
> case 'B':
>
> Oh so onerous :)

Try writing 26 of them. Or 100, if you want to detect the range 100 to 199 for example (or 100 to c. 209 with every tenth intermediate value skipped, just to forestall you suggesting using an 'if').

> > We've already demonstrated why
> > we can't have "case 'A'..'Z':", which I know is what you really want.
> >
>
> If you want Pascal, use Pascal. Royal "you" there, not pointed at Stephen.

Pascal had some good stuff in it. But I can write 'A'..'Z' now, and I can make it compile under C. However I need to go to the trouble of superimposing a somewhat different language on top of it when it could be implemented so easily.

Look, what exactly would happen if it /was/ available in C? Probably nothing. If you're worried about EBCDIC, then don't use it. (I understand that gcc may already have that extension, and the world doesn't seem to have ended yet.)

If a program intended for ASCII is ever run on EBCDIC hardware, then there may indeed be a problem, but I suspect it will go wrong in a dozen other ways first, including not being able to compile at all (many C applications seem extremely fragile).

(In the early 80s I was programming machines with Z80s with display systems based on ASCII codes for text. I was anyway coding font bitmaps and glyphs myself based on codes that I chose, which usually were ASCII (sometimes dropping to 6-bit, but A-Z were still consecutive).

The chances of my code running on some IBM mainframe were miniscule. So why /shouldn't/ I write 'A'..'Z' and assume ASCII code points? As I said above, other things would break first, such as the graphics hardware being obsolete and incompatible anyway.)

-- 
Bartc
bart4858
5/20/2015 9:00:30 PM
bart4858@gmail.com writes:
> On Wednesday, 20 May 2015 16:59:31 UTC+1, Stephen Sprunk  wrote:
>> On 17-May-15 14:28, Bartc wrote:
>
>> > Anyway using a macro such as UPPER is one approach that can work 
>> > provided it's reliably defined, if there is no way to avoid
>> > enumerating every single character. (Maybe you've noticed there's a
>> > letter missing.) However this obviously needs to be done; it would be
>> > better if it was built-in.
>> 
>> It doesn't seem to be so "obvious" to those who work with languages
>> other than English, i.e. the majority of humanity.
>
> I've already posted the various ways in which that 26-letter alphabet
> dominates.
>
> There's also a reason why it's the set supported in ASCII (now the
> base-set of Unicode and the only letters that don't need escapes in
> UTF8).
>
> So it is still very important. In fact, you can see A-Z almost
> anywhere, and everyone will know what is meant.
>
> That's in the English speaking world, but it's still a very dominant
> language, although other languages are available. But if someone is
> coding in English, then why shouldn't they be able to express A-Z
> succinctly?

Because C doesn't support case ranges.

You think it should.  We get it.  Speaking only for myself, I don't
necessarily disagree.

But convincing us here in comp.lang.c that C *should* support case
ranges is neither necessary nor sufficient to change the language.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
5/20/2015 9:17:17 PM
In article <87twv70x0i.fsf@kst-u.example.com>,
Keith Thompson  <kst-u@mib.org> wrote:
....
>Because C doesn't support case ranges.
>
>You think it should.  We get it.  Speaking only for myself, I don't
>necessarily disagree.
>
>But convincing us here in comp.lang.c that C *should* support case
>ranges is neither necessary nor sufficient to change the language.

Quite so.

I think the salient point here is that gcc *does* support it, and, really,
when you get right down to it, why aren't you (very, very rhetorical "you"
here) using gcc to compile your code anyway?
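For reference, the gcc extension in question is the case-range syntax, which looks like this (not standard C; clang also accepts it):

```c
/* GNU C extension: case ranges.  The spaces around "..." are required
 * so that e.g. "case 1...5" doesn't misparse as a floating constant. */
static int is_ascii_upper(int c)
{
    switch (c) {
    case 'A' ... 'Z':
        return 1;
    default:
        return 0;
    }
}
```

As discussed upthread, this bakes in the assumption that 'A'..'Z' are contiguous, which holds for ASCII but not EBCDIC.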

The choices are pretty clear-cut: If you want maximum flexibility and, dare
I say it, fun, in your C programming, then use GCC & POSIX - both of which
extend quite a bit the very primitive language described in the beloved
ANSI C standards documents.  If you want maximum portability then, as is
noted many times every day in this newsgroup, stick to *only* what is in
those documents.

Seems pretty clear to me.

Now, if I can just get the GCC folks to add in a "Kenny-style" switch()
statement...

-- 
Those on the right constantly remind us that America is not a
democracy; now they claim that Obama is a threat to democracy.
gazelle
5/20/2015 9:25:47 PM
bart4858@gmail.com wrote:
> On Thursday, 21 May 2015 00:04:05 UTC+1, Les Cargill  wrote:
>> Bart:
>
>>>> Oh so onerous :)
>>>
>>> Try writing 26 of them. Or 100, if you want to detect the range 100
>>> to 199 for example (or 100 to c. 209 with every tenth intermediate
>>> value skipped, just to forestall you suggesting using an 'if').
>>>
>>
>> Doctor, doctor, it hurts when I do that...
>>
>> Declarative range checks and possibly tables spring to mind
>> as better approaches.
> ....
>> Sure. But it's not 'C' any more.
>>
>> "switch" control in 'C' would compile to a jump table in
>> older compilers. Very handy when you were shaving every cycle.
>> Today, it's less of a thing.
>
> It's about being expressive. To check whether some value lies
> in one particular set of values or another, then 'switch' is
> a good way of coding that.
>
> A set will be some combination of single values and ranges.
> That can be concisely denoted by syntax such as case 10,12..20:
> instead of enumerating each value (and possibly getting one
> wrong, as happens with tedious repetition).  (It can also be
> impossible when the endpoints of the range are symbols or
> expressions). It can still be C. But the intention can be taken
> in at a glance.
>

When I've run into cases where I felt there was risk from this -
around 200 values it starts to be pretty painful - I'd write something
in a scripting language to generate the overall structure.

If it gets big enough, it's worth thinking in tables. Having
a bespoke switch with even dozens of labels is probably doing
it wrong.


> It's not good when you have to start looking at workarounds
> because the language doesn't support what you want to do.
>
>

Sure. So maybe 'C' is not for you. I don't mean that to sound
elitist, but if it bothers  you, use something else.

IMO, you ( or at least I ) see what happened with C++.
It seemed like a good idea at the time...

-- 
Les Cargill
Les
5/21/2015 1:01:01 AM
On Thursday, 21 May 2015 00:04:05 UTC+1, Les Cargill  wrote:
>Bart:

> >> Oh so onerous :)
> >
> > Try writing 26 of them. Or 100, if you want to detect the range 100
> > to 199 for example (or 100 to c. 209 with every tenth intermediate
> > value skipped, just to forestall you suggesting using an 'if').
> >
> 
> Doctor, doctor, it hurts when I do that...
> 
> Declarative range checks and possibly tables spring to mind
> as better approaches.
.....
> Sure. But it's not 'C' any more.
> 
> "switch" control in 'C' would compile to a jump table in
> older compilers. Very handy when you were shaving every cycle.
> Today, it's less of a thing.

It's about being expressive. To check whether some value lies
in one particular set of values or another, then 'switch' is
a good way of coding that.

A set will be some combination of single values and ranges.
That can be concisely denoted by syntax such as case 10,12..20:
instead of enumerating each value (and possibly getting one
wrong, as happens with tedious repetition). (It can also be
impossible when the endpoints of the range are symbols or
expressions). It can still be C. But the intention can be taken
in at a glance.

It's not good when you have to start looking at workarounds
because the language doesn't support what you want to do.


-- 
Bartc
bart4858
5/21/2015 7:57:08 AM
On Thursday, 21 May 2015 18:34:46 UTC+1, Les Cargill  wrote:
> bart....@gmail.com wrote:

> > A set will be some combination of single values and ranges.
> > That can be concisely denoted by syntax such as case 10,12..20:
> > instead of enumerating each value (and possibly getting one
> > wrong, as happens with tedious repetition).  (It can also be
> > impossible when the endpoints of the range are symbols or
> > expressions). It can still be C. But the intention can be taken
> > in at a glance.

> If it gets big enough, it's worth thinking in tables. Having
> a bespoke switch with even dozens of labels is probably doing
> it wrong.

Tell that to the people who wrote CPython; the main dispatch loop is a giant switch. (Although you can't recognise it beneath a mess of multi-layered macros.)  

> > It's not good when you have to start looking at workarounds
> > because the language doesn't support what you want to do.
> >
> >
> 
> Sure. So maybe 'C' is not for you. I don't mean that to sound
> elitist, but if it bothers  you, use something else.

I have done and I do. But I also have to be involved with C
from time to time. My OS uses APIs defined in C for example.

> IMO, you ( or at least I ) see what happened with C++.
> It seemed like a good idea at the time...

I agree. But I reckon I know how to make modest enhancements
to a language without turning it into a huge, incomprehensible mess
that no single person understands fully.

(Actually, this is a list I wrote up some time ago, of 90-odd
features of my own C-class language which I considered
improvements from their C equivalents or omissions:

http://pastebin.com/9ZdmGd68

The resulting language is still low-level (it does what C does).
Most features are straightforward to implement, although not all
can be retro-fitted into C easily.

The 'switch' features mentioned in this thread are #45 and #46
in the list.)

-- 
Bartc
bart4858
5/21/2015 8:25:47 PM
bart4858@gmail.com writes:
[...]
> (Actually, this is a list I wrote up some time ago, of 90-odd
> features of my own C-class language which I considered
> improvements from their C equivalents or omissions:
>
> http://pastebin.com/9ZdmGd68
>
> The resulting language is still low-level (it does what C does).
> Most features are straightforward to implement, although not all
> can be retro-fitted into C easily.
>
> The 'switch' features mentioned in this thread are #45 and #46
> in the list.)

Interesting.  I like quite a few of these features.

A few comments:

    5 Binary literals can be written as 2x1101 or 1101B (C will have
      this soon)

I haven't heard of any proposals to add binary integer constants
to C; have you?  C++ has added them with the syntax `0b1101` or
`0B1101`.  I'd like to see C adopt this.

    6 Octal literals written as 8x377 not 0377

If hexadecimal literals are still written as 0xFFFF, I can imagine
that "8x" and "0x" might be a little difficult to distinguish.  If you
drop C's "0x" syntax and just use "16x", that wouldn't be an issue.

    8 Numeric literals can have separators (either _ ' or `)

I can imagine ' and ` causing syntactic ambiguities.  If I were
designing a language (I'm not), I'd just use a single character; I don't
see providing multiple choices as an advantage in this particular case.

    9 Large integer literals don't need L LL or other suffixes except to
      designate type

How does that differ from C?  For example, an unsuffixed decimal
constant is of type int, long int, or long long int, depending on its
value.

    11 Exact-width integer types can be specified via syntax, as int:N
       for N bits, or int*N for N bytes wide.

Another case where multiple options are IMHO not helpful.  I'd probably
stick to the ":N" syntax.  (I presume N has to be a constant?)

    14 Exact-width floating point types available, eg. real:64 or real*8
       (this comes from an old Fortran dialect)

The size of a floating-point type doesn't completely describe it.  It's
also not the most important characteristic for a programmer.  I care
about precision and range; the compiler can figure out how many bytes
are needed.

Ada's syntax is, for example:
type Real is digits 6 range -1.0e100 .. +1.0e100;

But whatever syntax you have is probably just going to select one of two
or three supported formats.

    17 Arrays are value types (however there are no array ops other than
       assignment and equality, same as structs)

Arrays are tricky.  What happens if you try to compare or assign two
arrays of different lengths?

    19 Arrays and pointers are distinct types. To pass an array to a
       function, usually a pointer to the array type (not element type)
       is needed

Arrays and pointers are distinct types in C.

If a function has a parameter of pointer-to-array type, can it accept a
pointer to an array of any length?  Are there distinct types "pointer to
array of ints" and "pointer to array of 42 ints"?

    40 Loop controls 'break' and 'continue' (I call them 'exit' and
       'next') work with multi-level loops using an index

If you support calling C functions, making "exit" a keyword might cause
problems.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
5/21/2015 9:33:05 PM
On 21/05/15 23:33, Keith Thompson wrote:
> bart4858@gmail.com writes:
> [...]
>> (Actually, this is a list I wrote up some time ago, of 90-odd
>> features of my own C-class language which I considered
>> improvements from their C equivalents or omissions:
>>
>> http://pastebin.com/9ZdmGd68
>>
>> The resulting language is still low-level (it does what C does).
>> Most features are straightforward to implement, although not all
>> can be retro-fitted into C easily.
>>
>> The 'switch' features mentioned in this thread are #45 and #46
>> in the list.)
> 
> Interesting.  I like quite a few of these features.
> 
> A few comments:
> 
>     5 Binary literals can be written as 2x1101 or 1101B (C will have
>       this soon)
> 
> I haven't heard of any proposals to add binary integer constants
> to C; have you?  C++ has added them with the syntax `0b1101` or
> `0B1101`.  I'd like to see C adopt this.

I don't think I have seen proposals to add 0b1101 constants to the C
standards, but it is a very common extension which has been supported in
gcc and clang (and many embedded compilers) for many years.  Since it is
a supported /de facto/ standard in C, and a standard in C++, it will
probably turn up in the C standards eventually - it should be just a
matter of paperwork.

I certainly don't see competing formats like 2x1101 or 1101B being used.

> 
>     6 Octal literals written as 8x377 not 0377
> 
> If hexadecimal literals are still written as 0xFFFF, I can imagine
> that "8x" and "0x" might be a little difficult to distinguish.  If you
> drop C's "0x" syntax and just use "16x", that wouldn't be an issue.

Personally, I would rather see octal support being dropped entirely.  I
know that's not going to happen - but hopefully compilers will one day
have optional warnings to detect them (I know of some compilers that
already do).

> 
>     8 Numeric literals can have separators (either _ ' or `)
> 
> I can imagine ' and ` causing syntactic ambiguities.  If I were
> designing a language (I'm not), I'd just use a single character; I don't
> see providing multiple choices as an advantage in this particular case.
> 

C++14 has ' as a digit separator.  Many people would have preferred _,
but that's the one that got chosen.  It is not unlikely that C (or at
least, C compilers) will follow.

>     9 Large integer literals don't need L LL or other suffixes except to
>       designate type
> 
> How does that differ from C?  For example, an unsuffixed decimal
> constant is of type int, long int, or long long int, depending on its
> value.
> 
>     11 Exact-width integer types can be specified via syntax, as int:N
>        for N bits, or int*N for N bytes wide.
> 
> Another case where multiple options are IMHO not helpful.  I'd probably
> stick to the ":N" syntax.  (I presume N has to be a constant?)
> 
>     14 Exact-width floating point types available, eg. real:64 or real*8
>        (this comes from an old Fortran dialect)
> 
> The size of a floating-point type doesn't completely describe it.  It's
> also not the most important characteristic for a programmer.  I care
> about precision and range; the compiler can figure out how many bytes
> are needed.

For some uses, exact sizes /are/ important.  A good option would be to
have a <stdfloat.h> like <stdint.h>, defining float32_t and float64_t
(and optionally float128_t, or others that people might find useful)
with the requirement that they are the given size, and in standard IEEE
formats (just as int32_t must be two's complement as well as being 32
bits).  That would cover both requirements.

> 
> Ada's syntax is, for example:
> type Real is digits 6 range -1.0e100 .. +1.0e100;
> 
> But whatever syntax you have is probably just going to select one of two
> or three supported formats.
> 
>     17 Arrays are value types (however there are no array ops other than
>        assignment and equality, same as structs)
> 
> Arrays are tricky.  What happens if you try to compare or assign two
> arrays of different lengths?
> 
>     19 Arrays and pointers are distinct types. To pass an array to a
>        function, usually a pointer to the array type (not element type)
>        is needed
> 
> Arrays and pointers are distinct types in C.
> 
> If a function has a parameter of pointer-to-array type, can it accept a
> pointer to an array of any length?  Are there distinct types "pointer to
> array of ints" and "pointer to array of 42 ints"?
> 
>     40 Loop controls 'break' and 'continue' (I call them 'exit' and
>        'next') work with multi-level loops using an index
> 
> If you support calling C functions, making "exit" a keyword might cause
> problems.
> 

David
5/22/2015 7:11:36 AM
On 21/05/2015 22:33, Keith Thompson wrote:
> bart4858@gmail.com writes:

>> (Actually, this is a list I wrote up some time ago, of 90-odd
>> features of my own C-class language which I considered
>> improvements from their C equivalents or omissions:
>>
>> http://pastebin.com/9ZdmGd68

> Interesting.  I like quite a few of these features.

Thanks.

> A few comments:
>
>      5 Binary literals can be written as 2x1101 or 1101B (C will have
>        this soon)
>
> I haven't heard of any proposals to add binary integer constants
> to C; have you?  C++ has added them with the syntax `0b1101` or
> `0B1101`.  I'd like to see C adopt this.
>
>      6 Octal literals written as 8x377 not 0377
>
> If hexadecimal literals are still written as 0xFFFF, I can imagine
> that "8x" and "0x" might be a little difficult to distinguish.  If you
> drop C's "0x" syntax and just use "16x", that wouldn't be an issue.

Octal constants are rare enough that you might be right that at first 
glance, it can be mistaken for 0x. But I think that C's leading zero 
syntax is going to be worse for catching people out.

Maybe octals can be highlighted with separators: 8x_377 or 8x'377' for 
example.

16x is a possibility, but 0x is near universal for hex. (I used to use
an H suffix but switched to 0x.)

>      8 Numeric literals can have separators (either _ ' or `)
>
> I can imagine ' and ` causing syntactic ambiguities.  If I were
> designing a language (I'm not), I'd just use a single character; I don't
> see providing multiple choices as an advantage in this particular case.

Yeah, but which one? Allowing several means that data sources can be 
pasted directly from other languages; some might use ', others _

>
>      9 Large integer literals don't need L LL or other suffixes except to
>        designate type
>
> How does that differ from C?  For example, an unsuffixed decimal
> constant is of type int, long int, or long long int, depending on its
> value.

TBH I don't know exactly what the rules are in C. I know that sometimes 
odd compiler errors go away if I put a suffix in.

>
>      11 Exact-width integer types can be specified via syntax, as int:N
>         for N bits, or int*N for N bytes wide.
>
> Another case where multiple options are IMHO not helpful.  I'd probably
> stick to the ":N" syntax.  (I presume N has to be a constant?)

Again, this allows people to choose a style they prefer. So long as it's 
consistent within the same code, I can't see that it's a problem. So a 
32-bit signed int can be expressed as any of:

  int:32 int*4 int32 i32

Don't forget that C also allows:

  int, signed int, int signed, signed and int32_t

(with even more variations if you introduce long, short, unsigned and 
const.)

 > (I presume N has to be a constant?)

It has to be a number, macro, or constant expression. But usually a 
colloquial name is used ('byte'), or a name without : or * (int16).

>      14 Exact-width floating point types available, eg. real:64 or real*8
>         (this comes from an old Fortran dialect)
>
> The size of a floating-point type doesn't completely describe it.  It's
> also not the most important characteristic for a programmer.  I care
> about precision and range; the compiler can figure out how many bytes
> are needed.

> Ada's syntax is, for example:
> type Real is digits 6 range -1.0e100 .. +1.0e100;
>
> But whatever syntax you have is probably just going to select one of two
> or three supported formats.

It's too easy designing something and ending up with something like Ada! 
And I know my limits. With floating point, in practice you're going to 
get 32 or 64 bit IEEE floats. And you will normally just use 'real' 
(64-bit) unless the design or interface calls for a short version.

>      17 Arrays are value types (however there are no array ops other than
>         assignment and equality, same as structs)
>
> Arrays are tricky.  What happens if you try to compare or assign two
> arrays of different lengths?

You just get a type error. (However lower bounds have to match too; I 
might look into that.)

>      19 Arrays and pointers are distinct types. To pass an array to a
>         function, usually a pointer to the array type (not element type)
>         is needed
>
> Arrays and pointers are distinct types in C.
>
> If a function has a parameter of pointer-to-array type, can it accept a
> pointer to an array of any length?  Are there distinct types "pointer to
> array of ints" and "pointer to array of 42 ints"?

The array is just unbounded like the equivalent (but rarely used) C version:

   proc fn(ref[]int a) = ...  # pointer to array of int

(I won't attempt the C version because I will get it wrong.)


>      40 Loop controls 'break' and 'continue' (I call them 'exit' and
>         'next') work with multi-level loops using an index
>
> If you support calling C functions, making "exit" a keyword might cause
> problems.

That's part of a larger set of problems with trying to use C libraries, 
where I have to duplicate what's in the corresponding header files. 
Sometimes it's easier to write an intermediate module in actual C, but 
with a simpler interface that I specify. Then I can call exit() 
indirectly via my_exit() in the C helper module.

If it's just name clashes with C runtime functions, sometimes I can 
write clib.printf() to distinguish from a local printf. But with 'exit' 
being a keyword, that doesn't work. However defining it as _exit() seems 
to work under Windows.

-- 
Bartc
Bartc
5/22/2015 1:16:37 PM
Keith Thompson <kst-u@mib.org> writes:

> Tim Rentsch <txr@alumni.caltech.edu> writes:
>> Keith Thompson <kst-u@mib.org> writes:
>>> Tim Rentsch <txr@alumni.caltech.edu> writes:
>
> [...]
>
>>> The standard integer types are _Bool and signed and unsigned char,
>>> short, int, long, and long long.  The integer types are those plus
>>> any extended integer types.  (Plain char doesn't seem to fit into
>>> that scheme, which I find odd.)
>>
>> Eh?  The integer types are char, unsigned integer types, signed
>> integer types, and enumeration types.  You must be thinking of
>> something else.
>
> Just a careless mistake.  I think I got the details right in my recent
> followup to Richard.

Yes I see that now, apparently you posted that about the
same time I was writing my message.
Tim
5/22/2015 1:32:36 PM
Keith Thompson <kst-u@mib.org> writes:

> Richard Heathfield <rjh@cpax.org.uk> writes:
>> On 12/05/15 18:11, Keith Thompson wrote:
>>
>> <snip>
>>
>>> The standard integer types are _Bool and signed and unsigned char,
>>> short, int, long, and long long.  The integer types are those plus
>>> any extended integer types.  (Plain char doesn't seem to fit into
>>> that scheme, which I find odd.)
>>
>> It /is/ odd, but for fairly unremarkable historical reasons.  A
>> char has to be able to store, with a non-negative value, every
>> character in the basic execution character set.  EBCDIC has some
>> fairly important characters with code points greater than 127.  On
>> an EBCDIC system in which 8-bit char is considered desirable,
>> implementations are more or less forced to make char unsigned.
>>
>> We have some choices:
>>
>> 1) outlaw EBCDIC;
>> 2) force ASCII-based implementations to make char unsigned;
>> 3) make char at least 9 bits rather than 8;
>> 4) live with a weird char, like we do now.
>>
>> If we do (1), IBM will sulk.
>>
>> If we do (2), GNU, Microsoft, etc. will sulk (and we'll introduce a
>> different anomaly, that of char being unsigned by default whereas
>> short, int, long, and long long are signed by default).
>>
>> If we do (3), *everybody* will sulk.
>>
>> But if we go for (4), we don't have to change anything, which is
>> always a popular option.
>
> Sure, I understand all that, and I'm not suggesting a change.  (I
> wouldn't mind requiring plain char to be unsigned, but that's
> probably not going to happen, and it could have performance
> implications for some systems.)
>
> If I had been writing section 6.2.5 of the standard, I'd include
> plain "char" in the list of standard integer types.  Clearly it's
> standard, and integer, and a type, so excluding it seems odd.
>
> Currently, the standard defines the *standard signed integer types*
> as:
>     signed char
>     short
>     int
>     long
>     long long
> the *standard unsigned integer types* as:
>     _Bool
>     unsigned char
>     unsigned short
>     unsigned int
>     unsigned long
>     unsigned long long
> and the *standard integer types* as the standard signed integer types
> and the standard unsigned integer types.
>
> 6.2.5p17 defines the category of *integer types* as the signed and
> unsigned integer types, plain char, and the enumerated types.  So
> plain char is an *integer type* but not a *standard integer
> type*.  Furthermore, though char is an integer type and is either
> signed or unsigned, it is neither a *signed integer type* nor an
> *unsigned integer type*.
>
> I don't think the standard uses these categories in a way that
> causes real problems, but I find it unnecessarily confusing.
>
> I suggest that it would have made more sense to say that plain
> char is either one of the standard signed integer types or one of
> the standard unsigned integer types (an implementation-defined
> choice) *or* to define the *standard integer types* as the
> standard signed integer types, the standard unsigned integer
> types, and plain char.

Without meaning to argue, let me offer an argument that how the
Standard does this now may be a better choice.

First, having the terms '[standard] [un]signed integer types'
depend on the signedness of 'char' would make the definitions
awkward.  Having the sets of standard [un]signed integer types be
fixed is simpler.

Second, perhaps more important, 'char' isn't really used (or
shouldn't be used, anyway) the same way that types like 'int'
are.  In this respect, 'char' is more like an enumeration than a
{signed,unsigned} integer type.  From that point of view it makes
sense to lump it in with enumerations under the umbrella term
'integer types', but not specifically as either a signed or
unsigned integer type, in the same way and for much the same
reasons that enumeration types are not listed in those categories
even though each individual enumeration type is either signed
or unsigned (ie, by virtue of being compatible with a type that
has signed or unsigned behavior).  So there is some sense in
'char' being part of 'integer types' but not part of either
'signed integer types' or 'unsigned integer types'.

I agree the existing classification is a bit surprising.  However
the question is not IMO clear cut either way, so it might be
better to reshape our thinking so how the Standard does things
now makes more sense, especially since it is highly unlikely
that is ever going to change.
Tim
5/22/2015 2:37:16 PM
David Brown <david.brown@hesbynett.no> writes:
> On 21/05/15 23:33, Keith Thompson wrote:
[...]
>>     6 Octal literals written as 8x377 not 0377
>> 
>> If hexadecimal literals are still written as 0xFFFF, I can imagine
>> that "8x" and "0x" might be a little difficult to distinguish.  If you
>> drop C's "0x" syntax and just use "16x", that wouldn't be an issue.
>
> Personally, I would rather see octal support being dropped entirely.  I
> know that's not going to happen - but hopefully compilers will one day
> have optional warnings to detect them (I know of some compilers that
> already do).

I'd miss being able to use 0.  8-)}

>>     8 Numeric literals can have separators (either _ ' or `)
>> 
>> I can imagine ' and ` causing syntactic ambiguities.  If I were
>> designing a language (I'm not), I'd just use a single character; I don't
>> see providing multiple choices as an advantage in this particular case.
>> 
>
> C++14 has ' as a digit separator.  Many people would have preferred _,
> but that's the one that got chosen.  It is not unlikely that C (or at
> least, C compilers) will follow.

I think C++14 uses ' because it uses _ for user-defined literals.  IMHO
that's a pity; there's ample precedent for using _ as a digit separator,
and I would have liked to see the same thing in C and C++.

[...]

>>     14 Exact-width floating point types available, eg. real:64 or real*8
>>        (this comes from an old Fortran dialect)
>> 
>> The size of a floating-point type doesn't completely describe it.  It's
>> also not the most important characteristic for a programmer.  I care
>> about precision and range; the compiler can figure out how many bytes
>> are needed.
>
> For some uses, exact sizes /are/ important.  A good option would be to
> have a <stdfloat.h> like <stdint.h>, defining float32_t and float64_t
> (and optionally float128_t, or others that people might find useful)
> with the requirement that they are the given size, and in standard IEEE
> formats (just as int32_t must be two's complement as well as being 32
> bits).  That would cover both requirements.

In that case it's not the size that's important, it's the size *and the
representation*.

In C implementations that support Annex F, float and double are required
to be IEEE 32-bit and 64-bit, respectively.  The requirements for long
double are less strict.

It might make sense to have a <stdfloat.h> header that defines types
guaranteed to match the IEEE formats.

[...]

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
5/22/2015 3:15:13 PM
Bartc <bc@freeuk.com> writes:
> On 21/05/2015 22:33, Keith Thompson wrote:
[...]
>>      8 Numeric literals can have separators (either _ ' or `)
>>
>> I can imagine ' and ` causing syntactic ambiguities.  If I were
>> designing a language (I'm not), I'd just use a single character; I don't
>> see providing multiple choices as an advantage in this particular case.
>
> Yeah, but which one? Allowing several means that data sources can be
> pasted directly from other languages; some might use ', others _

If there are three different ways to write a numeric literal,
programmers are going to use all three.  In my opinion, that's not
a good thing, and the minor advantage of the added flexibility is
IMHO outweighed by the potential confusion.

A minor point: Markdown uses the backtick to delimit code.  Using it as
part of your language syntax could make it slightly more difficult to
discuss in some forums.

Of course it's your language, and my opinion probably shouldn't
count for much.

>>      9 Large integer literals don't need L LL or other suffixes except to
>>        designate type
>>
>> How does that differ from C?  For example, an unsuffixed decimal
>> constant is of type int, long int, or long long int, depending on its
>> value.
>
> TBH I don't know exactly what the rules are in C. I know that
> sometimes odd compiler errors go away if I put a suffix in.

It can be necessary in expressions.  For example `1000 * 1000` will
overflow if int is 16 bits; you can write `1000L * 1000L` to avoid that.

>>      11 Exact-width integer types can be specified via syntax, as int:N
>>         for N bits, or int*N for N bytes wide.
>>
>> Another case where multiple options are IMHO not helpful.  I'd probably
>> stick to the ":N" syntax.  (I presume N has to be a constant?)
>
> Again, this allows people to choose a style they prefer. So long as
> it's consistent within the same code, I can't see that it's a
> problem.

It *won't* always be consistent within the same code, unless each
program is written by one and only one person who never changes his or
her mind.

>          So a 32-bit signed int can be expressed as any of:
>
>  int:32 int*4 int32 i32
>
> Don't forget that C also allows:
>
>  int, signed int, int signed, signed and int32_t
>
> (with even more variations if you introduce long, short, unsigned and
> const.)

int32_t doesn't belong in that list.

Again, in my opinion a language shouldn't provide multiple ways to
express the same thing *unless* there's some real expressive advantage
to doing so.  If there's only one way to do it, code is easier to read.

(Ironically, I'm a big fan of Perl, whose motto is TMTOWTDI, "There's
more than one way to do it".  I never claimed to be entirely
consistent.)

[...]

> It's too easy designing something and ending up with something like
> Ada!

I think the designers of Ada would disagree with that!

[...]

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
5/22/2015 3:38:15 PM
On 22-May-15 10:15, Keith Thompson wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> C++14 has ' as a digit separator.  Many people would have preferred
>> _, but that's the one that got chosen.  It is not unlikely that C
>> (or at least, C compilers) will follow.
> 
> I think C++14 uses ' because it uses _ for user-defined literals.
> IMHO that's a pity; there's ample precedent for using _ as a digit
> separator, and I would have liked to see the same thing in C and
> C++.

There's plenty of room for debate about which new features of C++ that
we should backport into C, but once we've decided to do so, IMHO C
should, to the extent possible, strive to be compatible with the C-like
subset of C++ (and vice versa, when applicable).

For the record, I'm also in favor of cleaning up existing differences,
e.g. type of character literals, constness of string literals, etc., not
because any of them are important in and of themselves but because any
differences make it harder for the many people who need to compile the
same code (typically header files, but not always) in both languages.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
Stephen
5/22/2015 4:47:36 PM
On 22/05/2015 16:38, Keith Thompson wrote:
> Bartc <bc@freeuk.com> writes:
>> On 21/05/2015 22:33, Keith Thompson wrote:
> [...]
>>>       8 Numeric literals can have separators (either _ ' or `)
>>>
>>> I can imagine ' and ` causing syntactic ambiguities.  If I were
>>> designing a language (I'm not), I'd just use a single character; I don't
>>> see providing multiple choices as an advantage in this particular case.
>>
>> Yeah, but which one? Allowing several means that data sources can be
>> pasted directly from other languages; some might use ', others _
>
> If there are three different ways to write a numeric literal,
> programmers are going to use all three.

I think I've used two separator styles in the same number, the extra one 
to superimpose some extra structure or meaning on top of the normal 3- 
and 4-digit groups. People will take liberties even with a single 
separator character, unless you impose restrictions:

0__________________1_2__3___4____5_____6______7_______8

> A minor point; Markdown uses the backtick to delimit code.  Using it as
> part of your language syntax could make it slightly more difficult to
> discuss in some forums.

I can't remember why I allowed the back-tick, as single quote is also 
available without shifting.

>>           So a 32-bit signed int can be expressed as any of:
>>
>>   int:32 int*4 int32 i32
>>
>> Don't forget that C also allows:
>>
>>   int, signed int, int signed, signed and int32_t
>>
>> (with even more variations if you introduce long, short, unsigned and
>> const.)
>
> int32_t doesn't belong in that list.

Well, OK, but the point is that signed, unsigned, const, long, short, 
int and so on can appear in any order. Sometimes you can leave something 
out, so that 'signed short int', 'short signed int' and just 'short' all 
mean the same thing.

So if you're talking about different ways of specifying the same thing, 
C has got it down to a fine art!

What I'm doing is providing an orthogonal set of different styles, you 
just choose the column you like:

word:8   word*1  word8   u8     byte
word:16  word*2  word16  u16    halfword (I've just made this up..)
word:32  word*4  word32  u32    word
word:64  word*8  word64  u64    dword
int:8    int*1   int8    i8
int:16   int*2   int16   i16    halfint (..and this)
int:32   int*4   int32   i32    int
int:64   int*8   int64   i64    dint
real:32  real*4  real32  r32
real:64  real*8  real64  r64    real
char:8   char*1  char8   c8     char
char:16  char*2  char16  c16
char:32  char*4  char32  c32
bit:1    bit*1   bit     (u1)   bit   (also word:1, etc)
bit:2    bit*2   bit2    (u2)
bit:4    bit*4   bit4    (u4)

But remember you will only choose from the first four columns when an 
exact-width type is needed. Mostly names from the last column will be used.

>> It's too easy designing something and ending up with something like
>> Ada!
>
> I think the designers of Ada would disagree with that!

I wasn't criticising it. I mean if you take some things to their logical 
conclusion, you might end up with something as comprehensive as Ada. So 
specifying types by ranges, you will eventually end up with Ada's type 
system if done properly.

I admire Ada but this is intended to be a low level language with a 
simple type system.

-- 
Bartc
0
Bartc
5/22/2015 5:06:09 PM
Stephen Sprunk <stephen@sprunk.org> writes:
> On 22-May-15 10:15, Keith Thompson wrote:
>> David Brown <david.brown@hesbynett.no> writes:
>>> C++14 has ' as a digit separator.  Many people would have preferred
>>> _, but that's the one that got chosen.  It is not unlikely that C
>>> (or at least, C compilers) will follow.
>> 
>> I think C++14 uses ' because it uses _ for user-defined literals.
>> IMHO that's a pity; there's ample precedent for using _ as a digit
>> separator, and I would have liked to see the same thing in C and
>> C++.
>
> There's plenty of room for debate about which new features of C++ we
> should backport into C, but once we've decided to do so, IMHO C
> should, to the extent possible, strive to be compatible with the C-like
> subset of C++ (and vice versa, when applicable).

I agree.

> For the record, I'm also in favor of cleaning up existing differences,
> e.g. type of character literals, constness of string literals, etc., not
> because any of them are important in and of themselves but because any
> differences make it harder for the many people who need to compile the
> same code (typically header files, but not always) in both languages.

I'd like string literals to be const, but it would break some existing
code.  Dropping implicit int did that too, but it was easier to fix the
code.  If the next standard made string literals const, a lot of users
would either stick with the older standard or use compilers that are
non-conforming to the new standard.

Changing the type of character literals wouldn't break as much code --
but there aren't many cases where it matters.  In most cases, character
literals are implicitly converted to the required type anyway.  (It
matters in C++ because of overloading.)

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
0
Keith
5/22/2015 7:11:01 PM
Keith Thompson wrote:
>
> I'd like string literals to be const, but it would break some existing
> code.  Dropping implicit int did that too, but it was easier to fix the
> code.  If the next standard made string literals const, a lot of users
> would either stick with the older standard or use compilers that are
> non-conforming to the new standard.

Wouldn't  making string literals const simply cause already broken code 
to fail to compile, rather than break currently correct code?

-- 
Ian Collins
0
Ian
5/22/2015 7:26:40 PM
Bartc <bc@freeuk.com> writes:
> On 22/05/2015 16:38, Keith Thompson wrote:
>> Bartc <bc@freeuk.com> writes:
[...]
>>> Don't forget that C also allows:
>>>
>>>   int, signed int, int signed, signed and int32_t
>>>
>>> (with even more variations if you introduce long, short, unsigned and
>>> const.)
>>
>> int32_t doesn't belong in that list.
>
> Well, OK, but the point is that signed, unsigned, const, long, short,
> int and so on can appear in any order. Sometimes you can leave
> something out, so that 'signed short int', 'short signed int' and just
> 'short' all mean the same thing.
>
> So if you're talking about different ways of specifying the same
> thing, C has got it down to a fine art!

Sure -- but why repeat C's mistakes?

> What I'm doing is providing an orthogonal set of different styles, you
> just choose the column you like:
>
> word:8   word*1  word8   u8     byte
> word:16  word*2  word16  u16    halfword (I've just made this up..)
> word:32  word*4  word32  u32    word
> word:64  word*8  word64  u64    dword
> int:8    int*1   int8    i8
> int:16   int*2   int16   i16    halfint (..and this)
> int:32   int*4   int32   i32    int
> int:64   int*8   int64   i64    dint
> real:32  real*4  real32  r32
> real:64  real*8  real64  r64    real
> char:8   char*1  char8   c8     char
> char:16  char*2  char16  c16
> char:32  char*4  char32  c32
> bit:1    bit*1   bit     (u1)   bit   (also word:1, etc)
> bit:2    bit*2   bit2    (u2)
> bit:4    bit*4   bit4    (u4)
>
> But remember you will only choose from the first four columns when an
> exact-width type is needed. Mostly names from the last column will be
> used.

As I said, it's your language.  I just don't see any benefit, and some
cost in potential confusion, from permitting both int:32 and int*4 with
exactly the same meaning.  (Am I correct in assuming that your language,
unlike C, assumes 8-bit bytes?)

[...]

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
0
Keith
5/22/2015 8:56:01 PM
Ian Collins <ian-news@hotmail.com> writes:
> Keith Thompson wrote:
>> I'd like string literals to be const, but it would break some existing
>> code.  Dropping implicit int did that too, but it was easier to fix the
>> code.  If the next standard made string literals const, a lot of users
>> would either stick with the older standard or use compilers that are
>> non-conforming to the new standard.
>
> Wouldn't  making string literals const simply cause already broken
> code to fail to compile, rather than break currently correct code?

Not entirely.  The code that would be broken is probably poor style, but
it's perfectly valid.

This program:

#include <stdio.h>

void print_string(char *s) {
    puts(s);
}

int main(void) {
    print_string("hello");
    return 0;
}

is strictly conforming under C90, C99, and C11 (I'm assuming the
possibility of puts() failing doesn't affect strict conformance).
It would be better to define the parameter `s` as `const char *s`, but
as long as the function *doesn't* modify the string there's no
actual problem.

(Prior to C89, there was no "const" keyword; making string literals
const would have broken all pre-ANSI programs that passed string
literals to functions.)

gcc's "-Wwrite-strings" option makes it warn about the above program.
Combined with "-pedantic-errors", it makes gcc non-conforming.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
0
Keith
5/22/2015 9:14:21 PM
Keith Thompson wrote:
> Ian Collins <ian-news@hotmail.com> writes:
>> Keith Thompson wrote:
>>> I'd like string literals to be const, but it would break some existing
>>> code.  Dropping implicit int did that too, but it was easier to fix the
>>> code.  If the next standard made string literals const, a lot of users
>>> would either stick with the older standard or use compilers that are
>>> non-conforming to the new standard.
>>
>> Wouldn't  making string literals const simply cause already broken
>> code to fail to compile, rather than break currently correct code?
>
> Not entirely.  The code that would be broken is probably poor style, but
> it's perfectly valid.
>
> This program:
>
> #include <stdio.h>
>
> void print_string(char *s) {
>      puts(s);
> }
>
> int main(void) {
>      print_string("hello");
>      return 0;
> }
>
> is strictly conforming under C90, C99, and C11 (I'm assuming the
> possibility of puts() failing doesn't affect strict conformance).
> It would be better to define the parameter `s` as `const char *s`, but
> as long as the function *doesn't* modify the string there's no
> actual problem.
>
> (Prior to C89, there was no "const" keyword; making string literals
> const would have broken all pre-ANSI programs that passed string
> literals to functions.)
>
> gcc's "-Wwrite-strings" option makes it warn about the above program.
> Combined with "-pedantic-errors", it makes gcc non-conforming.

Non-conforming but a sensible combination none the less!  I remember the 
amount of cruft that was flushed out of the old Solaris code bases when 
the compilers changed to make string literals read only.

-- 
Ian Collins
0
Ian
5/22/2015 9:28:45 PM
On 22/05/2015 21:56, Keith Thompson wrote:
> Bartc <bc@freeuk.com> writes:

>> What I'm doing is providing an orthogonal set of different styles, you
>> just choose the column you like:

>> int:32   int*4   int32   i32    int

>> But remember you will only choose from the first four columns when an
>> exact-width type is needed. Mostly names from the last column will be
>> used.
>
> As I said, it's your language.  I just don't see any benefit, and some
> cost in potential confusion, from permitting both int:32 and int*4 with
> exactly the same meaning.

Sometimes you put in a bunch of features, some of which get used, and 
others don't, after which they are quietly dropped. In this case they 
will probably stay in but not be used (I don't use them any more), other 
than to underpin designations such as int32 and i32.

> (An I correct in assuming that your language,
> unlike C, assumes 8-bit bytes?)

I guess so. I think C# and Java do too, and doubtless many others 
(including the Fortran version that 'integer*4' etc came from).

Perhaps a fork of C could do the same and maybe benefit from not beating 
around the bush about what exactly a byte is. It is very refreshing 
seeing a language reference list the primitive types together with the 
exact widths of each.

(My first compiler project was on the 'pdp-10' 36-bit machine which 
didn't really have bytes, but that was a long time ago and such systems 
are pretty much dead (I hope).)

-- 
Bartc
0
Bartc
5/22/2015 10:38:27 PM
On Fri, 22 May 2015 14:16:37 +0100, Bartc <bc@freeuk.com> wrote:

>On 21/05/2015 22:33, Keith Thompson wrote:
>> bart4858@gmail.com writes:
>
>>> (Actually, this is a list I wrote up some time ago, of 90-odd
>>> features of my own C-class language which I considered
>>> improvements from their C equivalents or omissions:
>>>
>>> http://pastebin.com/9ZdmGd68
>
>> Interesting.  I like quite a few of these features.
>
>Thanks.
>
>> A few comments:
>>
>>      5 Binary literals can be written as 2x1101 or 1101B (C will have
>>        this soon)
>>
>> I haven't heard of any proposals to add binary integer constants
>> to C; have you?  C++ has added them with the syntax `0b1101` or
>> `0B1101`.  I'd like to see C adopt this.
>>
>>      6 Octal literals written as 8x377 not 0377
>>
>> If hexadecimal literals are still written as 0xFFFF, I can imagine
>> that "8x" and "0x" might be a little difficult to distinguish.  If you
>> drop C's "0x" syntax and just use "16x", that wouldn't be an issue.
>
>Octal constants are rare enough that you might be right that at first 
>glance, it can be mistaken for 0x. But I think that C's leading zero 
>syntax is going to be worse for catching people out.
>
>Maybe octals can be highlighted with separators: 8x_377 or 8x'377' for 
>example.
>
>16x is a possibility, but 0x is near universal for hex. (I used to use a 
>H suffix but switched to 0x.)


0x is certainly common for hex, but is far from universal.  Consider
the list of methods in:

https://en.wikipedia.org/wiki/Hexadecimal

But I've always figured for C adding 0b and 0o for binary and octal
would be a good idea.  I'd also deprecate the leading-zero-only form
of octal, although I have to assume someone is actually using that. In
my personal experience that's never been anything but the inadvertent
introduction of a bug.
0
Robert
5/22/2015 11:28:40 PM
Robert Wessel <robertwessel2@yahoo.com> writes:
[...]
> 0x is certainly common for hex, but is far from universal.  Consider
> the list of methods in:
>
> https://en.wikipedia.org/wiki/Hexadecimal
>
> But I've always figured for C adding 0b and 0o for binary and octal
> would be a good idea.  I'd also deprecate the leading-zero-only form
> of octal, although I have to assume someone is actually using that. In
> my personal experience that's never been anything but the inadvertent
> introduction of a bug.

You've used octal constants, unless you've managed to avoid using the
constant 0.  Of course if the current leading-zero notation were
dropped, the grammar would be changed to make 0 a decimal constant.  (Or
it could be base 42; that would work too.)

Octal is commonly used for the mode argument to the POSIX chmod() and
open() functions.  Permissions are encoded in binary in groups of 3, so
for example 0750 means read/write/execute for owner, read/execute for
group, and no permissions for others.  Such code would have to be
changed to use the new notation.

I'm not sure "0o" is the best choice, unless it's restricted to lower
case; 0O750 is too difficult to read.  Perhaps 0C, reminiscent of the
first two letters of "octal"?

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
0
Keith
5/22/2015 11:44:40 PM
Ian Collins wrote:
> Keith Thompson wrote:
>> Ian Collins <ian-news@hotmail.com> writes:
>>> Keith Thompson wrote:
>>>> I'd like string literals to be const, but it would break some existing
>>>> code.  Dropping implicit int did that too, but it was easier to fix the
>>>> code.  If the next standard made string literals const, a lot of users
>>>> would either stick with the older standard or use compilers that are
>>>> non-conforming to the new standard.
>>>
>>> Wouldn't  making string literals const simply cause already broken
>>> code to fail to compile, rather than break currently correct code?
>>
>> Not entirely.  The code that would be broken is probably poor style, but
>> it's perfectly valid.
>>
>> This program:
>>
>> #include <stdio.h>
>>
>> void print_string(char *s) {
>>      puts(s);
>> }
>>
>> int main(void) {
>>      print_string("hello");
>>      return 0;
>> }
>>
>> is strictly conforming under C90, C99, and C11 (I'm assuming the
>> possibility of puts() failing doesn't affect strict conformance).
>> It would be better to define the parameter `s` as `const char *s`, but
>> as long as the function *doesn't* modify the string there's no
>> actual problem.
>>
>> (Prior to C89, there was no "const" keyword; making string literals
>> const would have broken all pre-ANSI programs that passed string
>> literals to functions.)
>>
>> gcc's "-Wwrite-strings" option makes it warn about the above program.
>> Combined with "-pedantic-errors", it makes gcc non-conforming.
>
> Non-conforming but a sensible combination none the less!

Yes.

> I remember the
> amount of cruft that was flushed out of the old Solaris code bases when
> the compilers changed to make string literals read only.
>

I have to wonder how many defects were found in that way.

Maybe it's a bit rude, but I consider the mixing of buffers
and literals conceptually to be an excellent way to write lots
of Heisenbugs. I've fixed lots of those; you know they happened,
and nobody seems to have caught them at it.

To wit:

char z[] = "Hello ";
printf("%s\n",strcat(z," in there."));

That may not even crash. It *should*, but you never know.

-- 
Les Cargill

0
Les
5/23/2015 1:01:01 AM
bart4858@gmail.com wrote:

> On Wednesday, 20 May 2015 16:59:31 UTC+1, Stephen Sprunk  wrote:
> > On 17-May-15 14:28, Bartc wrote:
> 
> > > Anyway using a macro such as UPPER is one approach that can work 
> > > provided it's reliably defined, if there is no way to avoid
> > > enumerating every single character. (Maybe you've noticed there's a
> > > letter missing.) However this obviously needs to be done; it would be
> > > better if it was built-in.
> > 
> > It doesn't seem to be so "obvious" to those who work with languages
> > other than English, i.e. the majority of humanity.
> 
> I've already posted the various ways in which that 26-letter alphabet dominates.

Yes, but you were wrong then, too.

Even in English; at least, if you want to read poetry. Or stories about
cagéd whales that know nothing of mighty deeps.

Richard
0
raltbos
5/23/2015 10:15:26 AM
Keith Thompson <kst-u@mib.org> wrote:

> bart4858@gmail.com writes:

> > There's also a reason why it's the set supported in ASCII (now the
> > base-set of Unicode and the only letters that don't need escapes in
> > UTF8).

> > That's in the English speaking world, but it's still a very dominant
> > language, although other languages are available. But if someone is
> > coding in English, then why shouldn't they be able to express A-Z
> > succinctly?
> 
> Because C doesn't support case ranges.
> 
> You think it should.  We get it.  Speaking only for myself, I don't
> necessarily disagree.
> 
> But convincing us here in comp.lang.c that C *should* support case
> ranges is neither necessary nor sufficient to change the language.

And more to the point, it has absolutely _nothing_ to do with limiting
the language's support to ASCII. You can do the former without the
latter, or even /vice versa/.

Richard
0
raltbos
5/23/2015 10:17:04 AM
On 23/05/2015 11:15, Richard Bos wrote:
> bart4858@gmail.com wrote:
>
>> On Wednesday, 20 May 2015 16:59:31 UTC+1, Stephen Sprunk  wrote:
>>> On 17-May-15 14:28, Bartc wrote:
>>
>>>> Anyway using a macro such as UPPER is one approach that can work
>>>> provided it's reliably defined, if there is no way to avoid
>>>> enumerating every single character. (Maybe you've noticed there's a
>>>> letter missing.) However this obviously needs to be done; it would be
>>>> better if it was built-in.
>>>
>>> It doesn't seem to be so "obvious" to those who work with languages
>>> other than English, i.e. the majority of humanity.
>>
>> I've already posted the various ways in which that 26-letter alphabet dominates.
>
> Yes, but you were wrong then, too.


> Even in English; at least, if you want to read poetry. Or stories about
> cagéd whales that know nothing of mighty deeps.

You mean because the basic alphabet can't represent that é character?

I'm not sure it disproves my assertion that the 26-letter alphabet dominates.

On my typewriter, I don't have é either. And I don't recall seeing it on 
Telex machines or Teletypes.

My rather more modern phone can't do it, but it can support the groups 
ABC DEF GHI JKL MNO PQRS TUV WXYZ associated with the digits 2 to 9. In 
other words, the basic 26-letter alphabet.

And neither does an Android tablet I've just looked at, not on its 
default on-screen keyboard anyway. Nor on a Kindle. (And actually, I've 
no idea how to get é on my PC keyboard other than pasting it from your 
post!)

Looking further afield, it seems that URLs can only use ASCII (anything 
else needs to be encoded), and I think that web-site names can only use 
those letters too and some symbols.

So if this alphabet isn't dominant, which is? It seems pretty conclusive 
to me.

-- 
bartc
0
Bartc
5/23/2015 10:46:18 AM
Bartc <bc@freeuk.com> writes:
<snip>
> On my typewriter, I don't have é either.
<snip>
> And neither does an Android tablet I've just looked at, not on its
> default on-screen keyboard anyway.

My Android phone does it by default -- I just hold the 'e' for a moment
to bring up five accented alternatives (just as well as I need to text
French and Spanish).

<snip>
-- 
Ben.
0
Ben
5/23/2015 4:22:24 PM
On 23-May-15 05:46, Bartc wrote:
> My rather more modern phone can't do it, but it can support the
> groups ABC DEF GHI JKL MNO PQRS TUV WXYZ associated with the digits 2
> to 9. In other words, the basic 26-letter alphabet.

If it's a smartphone, it can do it.  Most dumb phones probably can too
(after all, they're sold around the world), but it varies by model.

> And neither does an Android tablet I've just looked at, not on its 
> default on-screen keyboard anyway.

Just hold down the "e" key for a second to display accented versions;
that's what the "..." on certain characters denotes.

> Nor on a Kindle.

Never used one, but I'd bet it's similar.

> (And actually, I've no idea how to get é on my PC keyboard other
> than pasting it from your post!)

If you rarely need them, just use Character Map (for Windows; other
platforms have similar tools) to find the character you need and copy it
to the clipboard.

I have my system set up so I can easily toggle between the standard US
and US-International keyboards, and typing accented letters is easy on
the latter, though I often forget where some of them are since nobody
seems to make keycaps that show the entire set.

> Looking further afield, it seems that URLs can only use ASCII
> (anything else needs to be encoded), and I think that web-site names
> can only use those letters too and some symbols.

We've had int'l domain names for years, and most/all browsers and
servers these days allow UTF-8 in the path part of URLs as well, though
I'm not sure that's actually allowed by the standards.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
0
Stephen
5/23/2015 8:57:59 PM
On 23/05/2015 21:57, Stephen Sprunk wrote:
> On 23-May-15 05:46, Bartc wrote:
>> My rather more modern phone can't do it, but it can support the
>> groups ABC DEF GHI JKL MNO PQRS TUV WXYZ associated with the digits 2
>> to 9. In other words, the basic 26-letter alphabet.
>
> If it's a smartphone, it can do it.  Most dumb phones probably can too
> (after all, they're sold around the world), but it varies by model.
>
>> And neither does an Android tablet I've just looked at, not on its
>> default on-screen keyboard anyway.
>
> Just hold down the "e" key for a second to display accented versions;
> that's what the "..." on certain characters denotes.

Yes, Ben mentioned that earlier. But it's not quite the point...

>> Nor on a Kindle.
>
> Never used one, but I'd bet it's similar.

There's a hoop that needs to be jumped through, and then you have a 
'keyboard' full of all the vowel/accent combinations you need. I guess 
you choose one then get back to the main A-Z layout. (But this 
button-controlled on-screen keyboard is a complete pita to use anyway 
even without accented characters.)

My point is that A-Z is the de-facto standard, just as the 8-bit byte 
is, and as the ASCII coding is for that basic alphabet.

>> (And actually, I've no idea how to get é on my PC keyboard other
>> than pasting it from your post!)
>
> If you rarely need them, just use Character Map (for Windows; other
> platforms have similar tools) to find the character you need and copy it
> to the clipboard.

(I was writing international applications in the 80s. I designed actual 
keyboard layouts to use on digitising tablets, where all the special 
characters could be very easily entered. That was before Windows and 
Unicode came along to make it hard again!)

> I have my system set up so I can easily toggle between the standard US
> and US-International keyboards, and typing accented letters is easy on
> the latter, though I often forget where some of them are since nobody
> seems to make keycaps that show the entire set.

(Before laptop computers were available, when I visited clients abroad I 
had to use their desktop keyboards and even the swapping of Y and Z, or 
A and W or whatever it was, used to drive me to distraction.)

-- 
Bartc
0
Bartc
5/23/2015 9:51:43 PM
Bartc <bc@freeuk.com> writes:
[...]
> My point is that A-Z is the de-facto standard, just as the 8-bit byte
> is, and as the ASCII coding is for that basic alphabet.

A-Z is the de facto standard for what exactly?

Most methods of transmitting and storing text these days allow for
Unicode, typically using either UTF-8 or UTF-16 (the latter is mostly
used on MS Windows).  You were able to include an é in your recent
followup, and I was able to read it on my screen.  And there are other
ways of encoding non-ASCII characters (see HTML for example).

Sure, the ways of *entering* non-ASCII characters are varied and often
awkward; that's the price we pay for not having a million or so keys on
our keyboards.  Written languages were designed for pens, brushes, or
styluses; entering text by pressing buttons is a more recent and
imperfect innovation.

And ASCII is not universal; IBM mainframes using EBCDIC still exist.
You're free to ignore them; not everyone has that freedom.

[...]

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
0
Keith
5/24/2015 2:01:36 AM
On Sat, 23 May 2015 19:01:36 -0700, Keith Thompson <kst-u@mib.org>
wrote:

>Bartc <bc@freeuk.com> writes:
>[...]
>> My point is that A-Z is the de-facto standard, just as the 8-bit byte
>> is, and as the ASCII coding is for that basic alphabet.
>
>A-Z is the de facto standard for what exactly?
>
>Most methods of transmitting and storing text these days allow for
>Unicode, typically using either UTF-8 or UTF-16 (the latter is mostly
>used on MS Windows).  You were able to include an é in your recent
>followup, and I was able to read it on my screen.  And there are other
>ways of encoding non-ASCII characters (see HTML for example).
>
>Sure, the ways of *entering* non-ASCII characters are varied and often
>awkward; that's the price we pay for not having a million or so keys on
>our keyboards.  Written languages were designed for pens, brushes, or
>styluses; entering text by pressing buttons is a more recent and
>imperfect innovation.
>
>And ASCII is not universal; IBM mainframes using EBCDIC still exist.
>You're free to ignore them; not everyone has that freedom.


Just to get the conversation back on track (I think)...

I believe the subject was an extended form of the case label where you
could specify something like:

  case 3..7:

to represent the equivalent of:

  case 3:
  case 4:
  case 5:
  case 6:
  case 7:

I think we're all agreed that is clear enough, and has at least some
use cases (although not whether it's sufficiently useful to include in
the language).

The problem comes up when the specification is:

  case 'A'..'Z':

In C, of course, those character literals are numbers, and what
numbers they are depends on the implementation, but what the above
means is obvious assuming the standard handling of character literals.
That the above would have different results on ASCII and EBCDIC
machines is a slightly different issue.  As has been pointed out, the
same problem exists if you were to code a character test as:

  if (c >= 'a' && c <= 'z')

Personally I find the use case for a simple range compelling enough,
and the issue with collating sequence small enough, that I'd like the
simple extension in the language.  YMMV.

Bartc appears to be proposing special handling of some forms of the
range, specifically so that the above case would result in matches for
the 26 basic letters.  A major issue is exactly what the syntax and
semantics of that would be.  For example, what happens when the
following are specified:

  case 'A'+2..'J'+2:
  case 'A'..'b':
  case '['..']':

I don't see any solutions that aren't particularly ugly.  So there's
probably no way, in C, to actually do that.
0
Robert
5/24/2015 3:50:30 AM
On 5/23/15 10:50 PM, Robert Wessel wrote:
> The problem comes up when the specification is:
>
>    case 'A'..'Z':
>
> In C, of course, those character literals are numbers, and what
> numbers they are depends on the implementation, but what the above
> means is obvious assuming the standard handling of character literals.
> That the above would have different results on an ASCII and EBCDIC
> machines, is a slightly different issue.  As has been pointed out, the
> same problem exists if you were to code a character test as:
>
>    if (c >= 'a' && c <= 'z')
>
> Personally I find the use case for a simple range compelling enough,
> and the issue with collating sequence small enough, that I'd like the
> simple extension in the language.  YMMV.

I’d kinda like to see:

     case 'A'..'Ω':

too. :-)

-- 
Morris Dovey
http://www.iedu.com/Solar
0
Morris
5/24/2015 4:41:03 AM
On 23-May-15 23:41, Morris Dovey wrote:
> On 5/23/15 10:50 PM, Robert Wessel wrote:
>> The problem comes up when the specification is:
>> 
>> case 'A'..'Z':
>> 
>> ...
> 
> I’d kinda like to see:
> 
> case 'A'..'Ω':
> 
> too. :-)

Even if we assume you're switching on a wchar_t (or a UTF-8 string, if
that were to be allowed too), that probably won't do what you expect:

http://unicode.org/charts/PDF/U0370.pdf

Internationalization is hard; let's go shopping!

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
0
Stephen
5/24/2015 5:40:00 AM
On 24/05/2015 04:50, Robert Wessel wrote:

 > I think we're all agreed that is clear enough, and has at least some
 > use cases (although not whether it's sufficiently useful to include in
 > the language).

I briefly mentioned the kind of use where it might not be practical to 
enumerate all the cases, such as:

  case first .. last:

You don't know what first or last are, and even if you did, now these 
cases need updating whenever first or last change:

  case first:
  case first+1:
  case ???
  ...
  case last:

I don't think there's any question that it would be useful. That's 
probably why gcc has it (but with three dots which have to be separated 
from numeric constants).

> Bartc appears to be proposing special handling of some forms of the
> range, specifically so that the above case would result in matches for
> the 26 basic letters.

Yes, otherwise encoding A to Z as consecutive code points in ASCII, and 
having the lower case versions offset by exactly 32, would have been a 
complete waste of time, if no-one is allowed to make use of that fact! 
They might as well have been assigned completely randomly.

> A major issue is exactly what the syntax and
> semantics of that would be.  For example, what heppens when the
> following are specified:
>
>    case 'A'+2..'J'+2:
>    case 'A'..'b':
>    case '['..']':
>
> I don't see any solutions that aren't particularly ugly.  So there's
> probably no way, in C, to actually do that.

I think I replied to these examples before. What exactly do you want to 
happen here? It's neither obvious nor intuitive. But sub-ranges of the 
English alphabet, /of the same case/, are.

While C /does/ allow you to write:

  case 'A'+2: case 'J'+2:

which also makes some assumptions.

-- 
Bartc
0
Bartc
5/24/2015 9:57:36 AM
If a language is written with
256 chars, one can use a permutation
P: {0..255} -> {0..255}
instead of a switch.
If a language's character layout
differs from ASCII, then for a char x
in that layout, P(x)
would be the ASCII char
corresponding to x.

0
asetofsymbols
5/24/2015 9:57:36 AM
Bartc <bc@freeuk.com> writes:
>I briefly mentioned the kind of use where it might not be practical to 
>enumerate all the cases, such as:
>  case first .. last:

  untested, not the best readability, but possible:

switch( i >= first && i <= last ? -1 : i )
{ case -1: ...
  case 12: ...
  case 14: ...
  default: ... }

  or

if( i >= first && i <= last )...
else switch( i )
{ case 12: ...
  case 14: ...
  default: ... }

  or - with a hypothetical preprocessor extension:

switch( i )
{

#{ for( int i = FIRST; i <= LAST; ++i )
#  fprintf( stdsrc, "case %d:\n", i ); }

  ...
  case 12: ...
  case 14: ...
  default: ... }

0
ram
5/24/2015 12:09:00 PM
On 5/24/15 12:40 AM, Stephen Sprunk wrote:
> On 23-May-15 23:41, Morris Dovey wrote:
>> On 5/23/15 10:50 PM, Robert Wessel wrote:
>>> The problem comes up when the specification is:
>>>
>>> case 'A'..'Z':
>>>
>>> ...
>>
>> I’d kinda like to see:
>>
>> case 'A'..'Ω':
>>
>> too. :-)
>
> Even if we assume you're switching on a wchar_t (or a UTF-8 string, if
> that were to be allowed too), that probably won't do what you expect:
>
> http://unicode.org/charts/PDF/U0370.pdf
>
> Internationalization is hard; let's go shopping!

I was being (somewhat) facetious. I think the problem has been poorly 
stated, because it does not allow distinction between the numerical 
value of a character and a collating sequence. Consider

    collate("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ");

    case co('A')..co('F'):

where co(x) returns the subscript of x within the specified collating 
sequence, independent of the character encoding - and there's no reason 
not to make a wchar_t version.

This kind of approach could resolve ASCII/EBCDIC/? encoding headaches 
and (perhaps) make some parts of internationalization easier.

Having said all that, I’ll add that I get along quite well without 
ranges, TYVM.

-- 
Morris Dovey
http://www.iedu.com/Solar
0
Morris
5/24/2015 12:40:05 PM
On 24/05/2015 13:40, Morris Dovey wrote:
> On 5/24/15 12:40 AM, Stephen Sprunk wrote:
>> On 23-May-15 23:41, Morris Dovey wrote:

>>> I’d kinda like to see:
>>>
>>> case 'A'..'Ω':
>>>
>>> too. :-)
>>
>> Even if we assume you're switching on a wchar_t (or a UTF-8 string, if
>> that were to be allowed too), that probably won't do what you expect:
>>
>> http://unicode.org/charts/PDF/U0370.pdf
>>
>> Internationalization is hard; let's go shopping!
>
> I was being (somewhat) facetious. I think the problem has been poorly
> stated, because it does not allow distinction between the numerical
> value of a character and a collating sequence. Consider
>
>     collate("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ");
>
>     case co('A')..co('F'):
>
> where co(x) returns the subscript of x within the specified collating
> sequence, independent of the character encoding - and there's no reason
> not to make a wchar_t version.
>
> This kind of approach could resolve ASCII/EBCDIC/? encoding headaches
> and (perhaps) make some parts of internationalization easier.

Yes, or just ascii('A'), since the sort of people who code such ranges 
will usually have ASCII in mind.

And of course ascii() will be a no-op on most compilers. On the EBCDIC 
ones, it could just generate an error, or map to ebcdic_to_ascii[].

It also solves the problem of 'A'+2 by writing it as co('A')+2 (with 
perhaps an inverse operation available).

However, this would still require ranges, which are harder to add tidily 
when they are not built-in.

-- 
Bartc
0
Bartc
5/24/2015 1:24:40 PM
On 5/24/15 8:24 AM, Bartc wrote:
> On 24/05/2015 13:40, Morris Dovey wrote:
>> On 5/24/15 12:40 AM, Stephen Sprunk wrote:
>>> On 23-May-15 23:41, Morris Dovey wrote:
>
>>>> I’d kinda like to see:
>>>>
>>>> case 'A'..'Ω':
>>>>
>>>> too. :-)
>>>
>>> Even if we assume you're switching on a wchar_t (or a UTF-8 string, if
>>> that were to be allowed too), that probably won't do what you expect:
>>>
>>> http://unicode.org/charts/PDF/U0370.pdf
>>>
>>> Internationalization is hard; let's go shopping!
>>
>> I was being (somewhat) facetious. I think the problem has been poorly
>> stated, because it does not allow distinction between the numerical
>> value of a character and a collating sequence. Consider
>>
>>     collate("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ");
>>
>>     case co('A')..co('F'):
>>
>> where co(x) returns the subscript of x within the specified collating
>> sequence, independent of the character encoding - and there's no reason
>> not to make a wchar_t version.
>>
>> This kind of approach could resolve ASCII/EBCDIC/? encoding headaches
>> and (perhaps) make some parts of internationalization easier.
>
> Yes, or just ascii('A'), since the sort of people who code such ranges
> will usually have ASCII in mind.
>
> And of course ascii() will be a no-op on most compilers. On the EBCDIC
> ones, it could just generate an error, or map to ebcdic_to_ascii[].
>
> It also solve the problem of 'A'+2 by writing it as co('A')+2 (with
> perhaps an inverse operation available).
>
> However, this would still require ranges, which are harder to add tidily
> when they are not built-in.

They're easy enough, but assuming English/ASCII is rather parochial in a 
programming language with an /international/ standard - in a world where 
only 5.4% of people are native English speakers.

https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers

If it's worth doing, it's worth doing right - and if it's not worth 
doing right, then it's not worth doing.


-- 
Morris Dovey
http://www.iedu.com/Solar
0
Morris
5/24/2015 2:23:30 PM
On 24/05/15 15:23, Morris Dovey wrote:

<snip>

>
> [...] assuming English/ASCII is rather parochial in a
> programming language with an /international/ standard - in a world where
> only 5.4% of people are native English speakers.
>
> https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers
>
> If it's worth doing, it's worth doing right - and if it's not worth
> doing right, then it's not worth doing.

If you're saying it's worth learning English to get the percentage up a 
bit (say, for instance, to about 100%), I'd have to agree with you.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
5/24/2015 2:36:15 PM
Keith Thompson <kst-u@mib.org> writes:

> Robert Wessel <robertwessel2@yahoo.com> writes:
> [...]
>> 0x is certainly common for hex, but is far from universal.  Consider
>> the list of methods in:
>>
>> https://en.wikipedia.org/wiki/Hexadecimal
>>
>> But I've always figured for C adding 0b and 0o for binary and octal
>> would be a good idea.  I'd also depreciate the leading-zero-only form
>> of octal, although I have to assume someone is actually using that.  In
>> my personal experience that's never been anything but the inadvertent
>> introduction of a bug.
>
> You've used octal constants, unless you've managed to avoid using the
> constant 0.  Of course if the current leading-zero notation were
> dropped, the grammar would be changed to make 0 a decimal constant.  (Or
> it could be base   42;  that would work too.)
>
> Octal is commonly used for the mode argument to the POSIX chmod() and
> open() functions.  Permissions are encoded in binary in groups of 3, so
> for example 0750 means read/write/execute for owner, read/execute for
> group, and no permissions for others.  Such code would have to be
> changed to use the new notation.
>
> I'm not sure "0o" is the best choice, unless it's restricted to lower
> case;  0O750 is too difficult to read.  Perhaps 0C, reminiscent of the
> first two letters of "octal"?

I see no reason to make any of these changes.  A binary base
isn't useful because the lengths involved make it hard to
see just what the value is (and is error prone to write).
The current notation for octal is a bit surprising if you're
not used to it, but it isn't that bad either.  I agree that
octal constants offer an opportunity for program flaws, but
they are easily found with a simple grep command.  Note also
that extending constant literals with new base formats will
probably also mean adding stuff in library functions, for
example strtol().  The current scheme isn't ideal but I can
live with it, as I am sure can most others.  There are more
important things to worry about.

Also, note that the matter of expressing literals in different
bases is less of an issue in C++, by virtue of constexpr.  To
my way of thinking that reduces the incentive to make changes
to how C handles them.
0
Tim
5/24/2015 2:44:10 PM
Morris Dovey <mrdovey@iedu.com> writes:

> [snip]
> Having said all that, I'll add that I get along quite well without
> ranges, TYVM.

+1
0
Tim
5/24/2015 2:44:59 PM
On 24/05/2015 15:44, Tim Rentsch wrote:

> I see no reason to make any of these changes.  A binary base
> isn't useful because the lengths involved make it hard to
> see just what the value is (and is error prone to write).

Are you suggesting that binary constants are /never/ useful? This is 
such a trivially implemented enhancement that you can afford to just 
throw it in, and let people use it if they want.

Obviously nobody is suggesting switching from decimal, or hex, to binary 
just for the hell of it. Binary will be used for a reason.

Here are some cases where I've used binary constants (most not in C):

(1) As a map. Here, a '1' bit in the last element marks an optional item 
in the preceding list of eight:

  (ki,ki, 0, 0, 0, 0, 0, 0, 01000000B),

  The position of the '1' corresponds exactly to the position of the 
item in the list.

(2) To mask the Nth bit. For one-off masking, specifying the mask in 
binary is less error-prone than doing so in hex. It will also be more 
obvious that you are masking a single bit, but if masking several, then 
the pattern will be far clearer.

(3) To define patterns. So if I had a dotted-line pattern marked by 1 
pixel on ('1') and 2 off, then 3 on and 2 off, then repeating, I might 
denote it as the following in a 16-bit cycle:

     100_11100_100_11100B

(4) Defining a set of masks:

    const attr1 = 0000_0001B
    const attr2 = 0000_0010B
    const attr3 = 0000_0100B

Binary makes it easy to keep track of where you are. It also makes it 
easy to see which masks might occupy two consecutive bits, or how they 
would interact when combined with & and |.

(5) Defining data which is grouped by bitfields:

   const fadd_op = 01_010_111B   # some made-up x87 op

(6) Defining small images. This is more efficient if you can use 1 bit 
per pixel, but that needs binary:

   00000b,
   11110b,
   10001b,
   10001b,
   11110b,
   10001b,
   10001b,
   11110b

Etc. The advantage of binary is that it lets you visualise the data when 
there's a certain pattern or grouping involved. (It would need 
separators too.)

Or maybe you are right and this would all be just as easy in 
decimal or hex. (When it's come up before, someone devised a macro which 
interpreted its parameter as a binary string. But if someone goes to 
that much trouble, it means it could do with being in the language.)

-- 
Bartc
0
Bartc
5/24/2015 3:52:44 PM
On 05/24/2015 10:23 AM, Morris Dovey wrote:
....
> They're easy enough, but assuming English/ASCII is rather parochial in a 
> programming language with an /international/ standard - in a world where 
> only 5.4% of people are native English speakers.
> 
> https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers

Keep in mind, however, that C originated in the US. Despite the fact
that it's an international standard, I believe that it's officially
published only in English. I would expect that the committee's official
meetings are conducted almost entirely in English - can anyone who's
actually attended those meetings confirm that guess?

All of C's keywords, and all of the identifiers that provide the
interface to the C standard library, are either actual English words, or
are clearly derived from English. I get the impression that even the C
users who are not native speakers of English, are disproportionately
likely to be able to at least read English, even if they are horrible at
writing it.

I'm all in favor of internationalization - but C is far from being the
ideal starting point for a completely internationalized programming
language.

> If it's worth doing, it's worth doing right - and if it's not worth 
> doing right, then it's not worth doing.

I've frequently heard this claim, but it makes no sense to me. In
general, there's no unique right way to do something, just a variety of
different ways to do it better or worse, with corresponding costs.
There's a way of doing any particular thing that maximizes the value of
the results minus the values of the associated costs, and that's the
best way to do it. Doing the job any better than that is a waste of
valuable resources that could be better spent on some other task.

Example: at a time when I was in charge of a grand total of 1.5
subordinates, my group was asked to spend a lot of time writing up detailed
descriptions of the procedures we used to develop our software. The goal
was to get us officially certified as having met a certain level of
sophistication in our development practices. I strongly approve of the
general idea, but the specific requirements we were asked to meet were
designed for a group that is large enough to justify hiring a full-time
person whose only job was to monitor conformance with those procedures.
They were ridiculously excessive for a group of our size.
I don't mean that the development procedures were too onerous - that
would not have been a problem - the procedures we actually currently use
would have been good enough. It was the requirement that we determine
what our procedures should be, write them up in detail, get them
approved, and then periodically review them for appropriateness. At the
required level of detail, that would have kept me (the only person in my
group sufficiently fluent in English to do the job) busy for several
years. That was overkill for a group of our size - but according to
someone, doing things that way was "doing it right".
-- 
James Kuyper
0
James
5/24/2015 5:10:07 PM
Bartc <bc@freeuk.com> writes:

> On 24/05/2015 15:44, Tim Rentsch wrote:
>
>> I see no reason to make any of these changes.  A binary base
>> isn't useful because the lengths involved make it hard to
>> see just what the value is (and is error prone to write).
>
> Are you suggesting that binary constants are /never/ useful? This is
> such a trivially implemented enhancement that you can afford to just
> throw it in, and let people use it if they want.
>
> Obviously nobody is suggesting switching from decimal, or hex, to
> binary just for the hell of it. Binary will be used for a reason.
>
> Here are some cases where I've used binary constants (most not in C):
>
> (1) As a map. Here, a '1' bit in the last element marks an optional
> item in the preceding list of eight:
>
>  (ki,ki, 0, 0, 0, 0, 0, 0, 01000000B),
>
>  The position of the '1' corresponds exactly to the position item in
> the list.

I don't know exactly what you are doing here but if the "eight" is
important why is (ki,ki, 0, 0, 0, 0, 0, 0, 1 << 8) not clearer?

> (2) To mask the Nth bit. For one-off masking, specifying the mask in
> binary is less error-prone than doing so in hex. It will also be more
> obvious that you are masking a single bit,

Surely 1 << N is even less error prone and needs no counting of zeros.

> but if masking several,
> then the pattern will be far clearer.

Maybe.  It depends on the pattern and the logic by which it is defined.
An example (with context) would help.

> (3) To define patterns. So if I had dotted line pattern marked by 1
> pixel on ('1') and 2 off, then 3 on and 2 off, then repeating, I might
> denote it as the following in a 16-bit cycle:
>
>     100_11100_100_11100B

Yes, I've used them for this and it's often the clearest notation but
see below in number 6 for a C option.

> (4) Defining a set of masks:
>
>    const attr1 = 0000_0001B
>    const attr2 = 0000_0010B
>    const attr3 = 0000_0100B

Surely

    const attr1 = 1 << 0;
    const attr2 = 1 << 1;
    const attr3 = 1 << 2;

makes the pattern clearer and simpler to check.  You could even use an
array or list comprehension (e.g. let attrs = [bit n | n <- [0..6]]).

> Binary makes it easy to keep track of where you are. And also see
> which might occupy two consecutive bits, or to see how they would
> interact when combined with & and |.

All but the last are made simpler using 1 << n, and I'm not sure the
last is really a strong point.

> (5) Defining data which is grouped by bitfields:
>
>   const fadd_op = 01_010_111B   # some made-up x87 op

Maybe for a couple of once-off cases, but as soon as I have more than
two I'd want to have, say, a macro to build the result from nicely named
parts.  Depending on the context, binary constants might help for some
parts:

  const fadd_op = FLAGS(1) | ADDR_MODE(010B) | OP_CODE(7);
  
> (6) Defining small images, this is more efficient if you can use 1 bit
> per pixel but this needs binary:
>
>   00000b,
>   11110b,
>   10001b,
>   10001b,
>   11110b,
>   10001b,
>   10001b,
>   11110b
>
> Etc. The advantage of binary is that it lets you visualise the data
> when there's a certain pattern or grouping involved. (It would need
> separators too.)

Like a linear pattern (number 3) this is one I've used too, but even in
C you can do some messing about in macros to get patterns that stand out
quite strongly:

  const unsigned image[] = {
      B(X___,___X)
      B(_X__,__X_)
      B(__X_,_X__)
      B(___X,X___)
      B(___X,X___)
      B(__X_,_X__)
      B(_X__,__X_)
      B(X___,___X)
  };

using these definitions:

  #define PAT_____ 0
  #define PAT____X 1
  #define PAT___X_ 2
  #define PAT___XX 3
  #define PAT__X__ 4
  #define PAT__X_X 5
  #define PAT__XX_ 6
  #define PAT__XXX 7
  #define PAT_X___ 8
  #define PAT_X__X 9
  #define PAT_X_X_ a
  #define PAT_X_XX b
  #define PAT_XX__ c
  #define PAT_XX_X d
  #define PAT_XXX_ e
  #define PAT_XXXX f
   
  #define B(a, b)   BX(PAT_ ## a, PAT_ ## b)
  #define BX(a, b)  BXX(a, b)
  #define BXX(a, b) 0x ## a ## b,
   
I agree that the very low cost to implement binary literals should be
taken into account.  Even a few uses makes it a net gain, but it almost
certainly needs a digit separator as well.

<snip>
-- 
Ben.
0
Ben
5/24/2015 9:19:11 PM
On 24/05/15 22:19, Ben Bacarisse wrote:
> Bartc <bc@freeuk.com> writes:
>
<snip>
>>
>> The advantage of binary is that it lets you visualise the data
>> when there's a certain pattern or grouping involved. (It would need
>> separators too.)
>
> Like a linear pattern (number 3) this is one I've used too, but even in
> C you can do some messing about in macros to get patterns that stand out
> quite strongly:
>
>    const unsigned image[] = {
>        B(X___,___X)
>        B(_X__,__X_)
>        B(__X_,_X__)
>        B(___X,X___)
>        B(___X,X___)
>        B(__X_,_X__)
>        B(_X__,__X_)
>        B(X___,___X)
>    };
>
> using these definitions:
>
>    #define PAT_____ 0
>    #define PAT____X 1
>    #define PAT___X_ 2
>    #define PAT___XX 3
>    #define PAT__X__ 4
>    #define PAT__X_X 5
>    #define PAT__XX_ 6
>    #define PAT__XXX 7
>    #define PAT_X___ 8
>    #define PAT_X__X 9
>    #define PAT_X_X_ a
>    #define PAT_X_XX b
>    #define PAT_XX__ c
>    #define PAT_XX_X d
>    #define PAT_XXX_ e
>    #define PAT_XXXX f
>
>    #define B(a, b)   BX(PAT_ ## a, PAT_ ## b)
>    #define BX(a, b)  BXX(a, b)
>    #define BXX(a, b) 0x ## a ## b,
>
> I agree that the very low cost to implement binary literals should be
> taken into account.  Even a few uses makes it a net gain, but it almost
> certainly needs a digit separator as well.
>
> <snip>
>
#define l )*2+1
#define O )*2
#define binary ((((((((((((((((0
/* add 16 more (s for 32 bit quantities */

Usage:

a = binary O O O O O O O O O O O O O O O O ; /* 0 */
b = binary O O O O O O O O O O O O O O O l ; /* 1 */
c = binary O O O O O O O O O O O O O O l O ; /* 2 */
d = binary O O O O O O O O O O O O O O l l ; /* 3 */

etc.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
5/24/2015 9:28:55 PM
On 24/05/2015 22:19, Ben Bacarisse wrote:
> Bartc <bc@freeuk.com> writes:

>> (1) As a map. Here, a '1' bit in the last element marks an optional
>> item in the preceding list of eight:
>>
>>   (ki,ki, 0, 0, 0, 0, 0, 0, 01000000B),
>>
>>   The position of the '1' corresponds exactly to the position item in
>> the list.
>
> I don't know exactly what you are doing here but if the "eight" is
> important why is (ki,ki, 0, 0, 0, 0, 0, 0, 1 << 8) not clearer?

It might be clearer with a slightly different example written like this:

    (ki,ki,kr,ki, 0, 0, 0, 0,

      0__1__1__0__0__0__0__0_B)

Each bit is a flag for the corresponding item above. This way it's easy 
to see at a glance which item is marked, and maintenance is simpler. 
Probably the separators are not really needed with only eight items: 
0110_0000B would have done; the original didn't use them.

(A series of int values would have worked also, but in the original 
context, a single int had to contain the bit flags.)

>> (4) Defining a set of masks:
>>
>>     const attr1 = 0000_0001B
>>     const attr2 = 0000_0010B
>>     const attr3 = 0000_0100B
>
> Surely
>
>      const attr1 = 1 << 0;
>      const attr2 = 1 << 1;
>      const attr3 = 1 << 2;
>
> makes the pattern clearer and simpler to check.  You could even use an
> array or list comprehension (e.g. let attrs = [bit n | n <- [0..6]]).

Maybe, but sometimes just laying out all the bits can be clearer than 
using a formula. For the same reasons that tabulating numbers in aligned 
columns can make it easier to compare or to take in data.

For example, I'd imagine that this:

      const rmask = 0x00_00_FF
      const gmask = 0x00_FF_00
      const bmask = 0xFF_00_00

is a little easier to grasp than:

      const rmask = 0xFF << 0
      const gmask = 0xFF << 8
      const bmask = 0xFF << 16


>> (6) Defining small images, this is more efficient if you can use 1 bit
>> per pixel but this needs binary:
>>
>>    00000b,
>>    11110b,
>>    10001b,

>> Etc. The advantage of binary is that it lets you visualise the data
>> when there's a certain pattern or grouping involved. (It would need
>> separators too.)
>
> Like a linear pattern (number 3) this is one I've used too, but even in
> C you can do some messing about in macros to get patterns that stand out
> quite strongly:
>
>    const unsigned image[] = {
>        B(X___,___X)
>        B(_X__,__X_)
>        B(__X_,_X__)
>        B(___X,X___)
>        B(___X,X___)
>        B(__X_,_X__)
>        B(_X__,__X_)
>        B(X___,___X)
>    };
>
> using these definitions:
>
>    #define PAT_____ 0
>    #define PAT____X 1
.....

> I agree that the very low cost to implement binary literals should be
> taken into account.  Even a few uses makes it a net gain, but it almost
> certainly needs a digit separator as well.

Yes, there are lots of ways to do some of this stuff, but it sometimes 
needs some ingenuity. And obviously C has managed without it up to now.

But I think if binary constants had been available, they would be used 
for a lot of it. Tim however was saying he couldn't see any uses for 
them at all.

-- 
Bartc

0
Bartc
5/24/2015 10:13:42 PM
Richard Heathfield <rjh@cpax.org.uk> writes:
<snip>
> #define l )*2+1
> #define O )*2
> #define binary ((((((((((((((((0
> /* add 16 more (s for 32 bit quantities */
>
> Usage:
>
> a = binary O O O O O O O O O O O O O O O O ; /* 0 */
> b = binary O O O O O O O O O O O O O O O l ; /* 1 */
> c = binary O O O O O O O O O O O O O O l O ; /* 2 */
> d = binary O O O O O O O O O O O O O O l l ; /* 3 */
>
> etc.

That's nice.

(I'd want to #undef O and l right after -- just to be on the safe side.)

-- 
Ben.
0
Ben
5/24/2015 11:00:27 PM
On 24-May-15 07:40, Morris Dovey wrote:
> On 5/24/15 12:40 AM, Stephen Sprunk wrote:
>> On 23-May-15 23:41, Morris Dovey wrote:
>>> I’d kinda like to see:
>>> 
>>> case 'A'..'Ω':
>>> 
>>> too. :-)
>> 
>> Even if we assume you're switching on a wchar_t (or a UTF-8 string,
>> if that were to be allowed too), that probably won't do what you
>> expect:
>> 
>> http://unicode.org/charts/PDF/U0370.pdf
>> 
>> Internationalization is hard; let's go shopping!
> 
> I was being (somewhat) facetious. I think the problem has been
> poorly stated, because it does not allow distinction between the
> numerical value of a character and a collating sequence. Consider
> 
> collate("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ");
> 
> case co('A')..co('F'):
> 
> where co(x) returns the subscript of x within the specified
> collating sequence, independent of the character encoding - and
> there's no reason not to make a wchar_t version.

static inline int coll(char c) {
  const char *s = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
  const char *p = strchr(s, c);
  return p ? (int)(p - s) : -1;
}
....
switch (coll(c)) {
  case coll('A')..coll('F'):
....

That requires two significant changes to C's switch syntax, but it
solves all of the objections I've seen in this thread so far, and it
wouldn't be difficult to implement at all.

With some implementation-specific hackery, you could also tap into the
Standard Library's collation tables so that it automatically adapted to
the user's locale, and it'd only require a tiny bit of work to handle
UTF-8 strings as well.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
0
Stephen
5/25/2015 12:26:57 AM
On 5/24/15 7:26 PM, Stephen Sprunk wrote:
> static inline int coll(char c) {
>    return strchr("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ", c);
> }
> ...
> switch (coll(c)) {
>    case coll('A')..coll('F'):
> ...
>
> That requires two significant changes to C's switch syntax, but it
> solves all of the objections I've seen in this thread so far, and it
> wouldn't be difficult to implement at all.
>
> With some implementation-specific hackery, you could also tap into the
> Standard Library's collation tables so that it automatically adapted to
> the user's locale, and it'd only require a tiny bit of work to handle
> UTF-8 strings as well.

I'd prefer to allow user-specified character sets and collation orders 
for each switch statement. That would increase usefulness with not too 
much additional compiler design complexity.

I did a bit of doodling to try to pin down what execution logic might 
look like and produced this bit of pseudocode (note that 'case code' can 
include a break statement):

char *seqstr;  /* Pointer to collating sequence string */
int   seqlen;  /* Length of sequence string */
int   target;  /* Switch target value */
int   skip;    /* Target matched, do not test again */

void collate(char *s) /* Register collating sequence */
{  seqstr = s;
    seqlen = strlen(s);
}

int co(char x) /* Return index of x in sequence, or -1 */
{  for (int i=0; i<seqlen; i++) if (x == seqstr[i]) return i;
    return -1;
}

int inrange(char min,char max) /* Case range test or skip */
{  int i1,i2;
    if (skip) return 1;
    i1 = co(min);
    i2 = co(max);
    skip = -1 != i1 && -1 != i2 && target >= i1 && target <= i2;
    return skip;
}

collate("0123456789ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ");

/* Switch (z) */        skip = 0; if (-1 != (target = co(z))) do
{
    /* case 'A'..'Z': */ if (!inrange('A','Z')) goto L1;
       /* case code */
L1:
    /* case 'H'..'Σ': */   if (!inrange('H','Σ')) goto L2;
       /* case code */
L2:
    /* default: */
       /* case code */
}
else /* Invalid target */
else /* Invalid target */

What I have NOT accounted for is nested switch statements - nor did I 
even try to deal with nested switch statements using different character 
sets and/or collating orders. I think I know what kind of code structure 
that might entail, but it's too late/early and I'm just not up to 
dealing with that tonight. :-)

-- 
Morris Dovey
http://www.iedu.com/Solar
0
Morris
5/25/2015 5:51:41 AM
Keith Thompson <kst-u@mib.org> writes:
> David Brown <david.brown@hesbynett.no> writes:
> > On 21/05/15 23:33, Keith Thompson wrote:
> [...]
> >>     6 Octal literals written ax 8x377 not 0377
> >> 
> >> If hexadecimal literals are still written as 0xFFFF, I can imagine
> >> that "8x" and "0x" might be a little difficult to distinguish.  If you
> >> drop C's "0x" syntax and just use "16x", that wouldn't be an issue.
> >
> > Personally, I would rather see octal support being dropped entirely.  I
> > know that's not going to happen - but hopefully compilers will one day
> > have optional warnings to detect them (I know of some compilers that
> > already do).
> 
> I'd miss being able to use 0.  8-)}

Were octal constants outlawed, you'd still be able to use 0,
as it would be a decimal constant instead.

Defaulting to octal I never liked. I'd have been happier with
0c755 (and I like 0b... likewise). Alas it's too late to change
now.

Phil
-- 
A well regulated militia, being necessary to the security of a free state,
the right of the people to keep and bear arms, shall be well regulated.
0
Phil
5/25/2015 8:36:26 AM
Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
> Bartc <bc@freeuk.com> writes:
> <snip>
> > On my typewriter, I don't have é either.
> <snip>
> > And neither does an Android tablet I've just looked at, not on its
> > default on-screen keyboard anyway.
> 
> My Android phone does it by default -- I just hold the 'e' for a moment
> to bring up five accented alternatives (just as well as I need to text
> French and Spanish).

If you have to press and hold to pull up an additional selection list,
then I'd say it's not on the default on-screen keyboard, it's on a
selection list of alternatives that are optionally pulled up by a long
press.

Phil
-- 
A well regulated militia, being necessary to the security of a free state,
the right of the people to keep and bear arms, shall be well regulated.
0
Phil
5/25/2015 8:49:10 AM
Stephen Sprunk <stephen@sprunk.org> writes:
> On 23-May-15 23:41, Morris Dovey wrote:
> > On 5/23/15 10:50 PM, Robert Wessel wrote:
> >> The problem comes up when the specification is:
> >> 
> >> case 'A'..'Z':
> >> 
> >> ...
> > 
> > I'd kinda like to see:
> > 
> > case 'A'..'Ω':
> > 
> > too. :-)
> 
> Even if we assume you're switching on a wchar_t (or a UTF-8 string, if
> that were to be allowed too), that probably won't do what you expect:
> 
> http://unicode.org/charts/PDF/U0370.pdf
> 
> Internationalization is hard; let's go shopping!

If unicrud has anything to do with internationalisation any more,
I'm a dutchman. It's now just one of those crappy clip-art libraries
which seemed really 'kewl' for all of 5 seconds back in the early 90s:
 http://unicode.org/charts/PDF/U1F300.pdf

Phil, not even vaguely dutch
-- 
A well regulated militia, being necessary to the security of a free state,
the right of the people to keep and bear arms, shall be well regulated.
Phil
5/25/2015 9:11:50 AM
Phil Carmody <pc+usenet@asdf.org> writes:

> Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
>> Bartc <bc@freeuk.com> writes:
>> <snip>
>> > On my typewriter, I don't have é either.
>> <snip>
>> > And neither does an Android tablet I've just looked at, not on its
>> > default on-screen keyboard anyway.
>> 
>> My Android phone does it by default -- I just hold the 'e' for a moment
>> to bring up five accented alternatives (just as well as I need to text
>> French and Spanish).
>
> If you have to press and hold to pull up an additional selection list,
> then I'd say it's not on the default on-screen keyboard, it's on a
> selection list of alternatives that are optionally pulled up by a long
> press.

So had BartC said that he does not have a comma, any digits, or a
capital 'A' on his Android's default on-screen keyboard you'd have
agreed?  That interpretation does not make sense given the context of
A-Z being "universal" and all the talk about ASCII.

The usual interpretation of "keyboard", when it is soft (as it is for
on-screen keyboards), is the software, not the "main screen" it
displays.  Replacing the "default keyboard" usually means installing a
new program.

-- 
Ben.
Ben
5/25/2015 9:50:33 AM
On 24/05/15 16:44, Tim Rentsch wrote:

> I see no reason to make any of these changes.  A binary base
> isn't useful because the lengths involved make it hard to
> see just what the value is (and is error prone to write).

Binary constants of the format 0b0110 made it into C compilers and the 
C++ standards because they are very useful, particularly in embedded 
programming where we live in a world of bits for device registers, IO 
ports, and so on.

The format 0b0110 was implemented as a target-specific extension in the 
AVR (an 8-bit microcontroller) port of gcc.  It was then accepted as a 
general extension in gcc, and copied by clang and other embedded 
compilers.  It is quite likely that other embedded compilers used the 
same notation, probably before avr-gcc, but not in any of the ones I 
happen to have used.

Someone on the C++ committee realised that this was a useful extension, 
with no conflicts with existing code and wide-spread implementation, so 
it was cheap and easy to add to the C++ standards.  It works 
particularly well with C++14's digit separator, the single-quote character (').

Hopefully the C committee will think likewise.


(Octal literals other than 0, on the other hand, are practically 
non-existent in embedded code, and banned by many coding standards as 
confusing, hard to read, and error prone.  When you work with data sizes 
of 8, 16 and 32 bits, a notation for 3 bit sizes is useless.)


David
5/25/2015 10:04:15 AM
On 25/05/2015 10:11, Phil Carmody wrote:
> Stephen Sprunk <stephen@sprunk.org> writes:

>> http://unicode.org/charts/PDF/U0370.pdf
>>
>> Internationalization is hard; let's go shopping!
>
> If unicrud has anything to do with internationalisation any more,
> I'm a dutchman. It's now just one of those crappy clip-art libraries
> which seemed really 'kewl' for all of 5 seconds back in the early 90s:
>   http://unicode.org/charts/PDF/U1F300.pdf

So this is why we have to go to 32-bits per character and still need 
UTF-16 instead of just having 16-bit characters and be done with it.

To be able to have pictures of farmyard animals and slices of pizza!

I agree that this stuff doesn't belong inside a character set mapping. 
Even if we'd need more than 64K code points anyway.

(At one time it was my responsibility to provide glyphs for both raster 
and vector fonts, fortunately for 8-bit character sets; doing this lot 
would have taken ages!)

-- 

Bartc
Bartc
5/25/2015 10:04:36 AM
On 24/05/2015 22:19, Ben Bacarisse wrote:
> Bartc <bc@freeuk.com> writes:

>> (5) Defining data which is grouped by bitfields:
>>
>>    const fadd_op = 01_010_111B   # some made-up x87 op
>
> Maybe for a couple of once-off cases, but as soon as I have more than
> two I'd want to have, say, a macro to build the result from nicely named
> parts.  Depending on the context, binary constants might help for some
> parts:
>
>    const fadd_op = FLAGS(1) | ADDR_MODE(010B) | OP_CODE(7);

This is the actual set of examples that I couldn't find yesterday:

http://pastebin.com/GiWaTEu1

The binary form keeps it compact.

(Note that the original 22-page Intel datasheets for the 8087 (among 
other chips) use binary in the instruction set details. Then it's useful 
that the source form matches that format. Although I can't remember how 
the above example originated.)

-- 
Bartc

Bartc
5/25/2015 10:30:11 AM
On Monday, 25 May 2015 00:19:20 UTC+3, Ben Bacarisse  wrote:
> Bartc <bc@freeuk.com> writes:
> 
> > Etc. The advantage of binary is that it lets you visualise the data
> > when there's a certain pattern or grouping involved. (It would need
> > separators too.)
> 
> Like a linear pattern (number 3) this is one I've used too, but even in
> C you can do some messing about in macros to get patterns that stand out
> quite strongly:
> 
>   const unsigned image[] = {
>       B(X___,___X)
>       B(_X__,__X_)
>       B(__X_,_X__)
>       B(___X,X___)
>       B(___X,X___)
>       B(__X_,_X__)
>       B(_X__,__X_)
>       B(X___,___X)
>   };
> 
> using these definitions:
> 
>   #define PAT_____ 0
>   #define PAT____X 1
>   #define PAT___X_ 2
>   #define PAT___XX 3
>   #define PAT__X__ 4
>   #define PAT__X_X 5
>   #define PAT__XX_ 6
>   #define PAT__XXX 7
>   #define PAT_X___ 8
>   #define PAT_X__X 9
>   #define PAT_X_X_ a
>   #define PAT_X_XX b
>   #define PAT_XX__ c
>   #define PAT_XX_X d
>   #define PAT_XXX_ e
>   #define PAT_XXXX f
>    
>   #define B(a, b)   BX(PAT_ ## a, PAT_ ## b)
>   #define BX(a, b)  BXX(a, b)
>   #define BXX(a, b) 0x ## a ## b,
>    
> I agree that the very low cost to implement binary literals should be
> taken into account.  Even a few uses makes it a net gain, but it almost
> certainly needs a digit separator as well.

Neat trick of self constructing binary literals. 
Can some pedant reject usage of  those 'PAT__XX_' on review as usage
of reserved names? 
Can perhaps fall back to usage of 'oXXo' or the like on such case but
'_XX_' looks better.
ISO
5/25/2015 2:41:02 PM
Öö Tiib <ootiib@hot.ee> writes:

> On Monday, 25 May 2015 00:19:20 UTC+3, Ben Bacarisse  wrote:
>> Bartc <bc@freeuk.com> writes:
>> 
>> > Etc. The advantage of binary is that it lets you visualise the data
>> > when there's a certain pattern or grouping involved. (It would need
>> > separators too.)
>> 
>> Like a linear pattern (number 3) this is one I've used too, but even in
>> C you can do some messing about in macros to get patterns that stand out
>> quite strongly:
>> 
>>   const unsigned image[] = {
>>       B(X___,___X)
>>       B(_X__,__X_)
>>       B(__X_,_X__)
>>       B(___X,X___)
>>       B(___X,X___)
>>       B(__X_,_X__)
>>       B(_X__,__X_)
>>       B(X___,___X)
>>   };
>> 
>> using these definitions:
>> 
>>   #define PAT_____ 0
>>   #define PAT____X 1
>>   #define PAT___X_ 2
>>   #define PAT___XX 3
>>   #define PAT__X__ 4
>>   #define PAT__X_X 5
>>   #define PAT__XX_ 6
>>   #define PAT__XXX 7
>>   #define PAT_X___ 8
>>   #define PAT_X__X 9
>>   #define PAT_X_X_ a
>>   #define PAT_X_XX b
>>   #define PAT_XX__ c
>>   #define PAT_XX_X d
>>   #define PAT_XXX_ e
>>   #define PAT_XXXX f
>>    
>>   #define B(a, b)   BX(PAT_ ## a, PAT_ ## b)
>>   #define BX(a, b)  BXX(a, b)
>>   #define BXX(a, b) 0x ## a ## b,
>>    
>> I agree that the very low cost to implement binary literals should be
>> taken into account.  Even a few uses makes it a net gain, but it almost
>> certainly needs a digit separator as well.
>
> Neat trick of self constructing binary literals.
> Can some pedant reject usage of  those 'PAT__XX_' on review as usage
> of reserved names?

I think it's OK.  The PAT__ names are fine, and though an implementation
is permitted to define macros called __XX (and so on), they will have no
effect on the code because no macro called __XX is ever defined or
expanded.

> Can perhaps fall back to usage of 'oXXo' or the like on such case but
> '_XX_' looks better.

I like Richard's nested expression idea better in some ways.  I think
(for patterns) you could get the stronger contrast by using X and _
where he wrote O and l.  That's safe because _ on its own is not a
reserved name.

-- 
Ben.
Ben
5/25/2015 4:10:23 PM
On 25/05/15 17:10, Ben Bacarisse wrote:

> I like Richard's nested expression idea better in some ways.  I think
> (for patterns) you could get the stringer contrast by using X and _
> where he wrote O and l.  That's safe because _ on its own is not a
> reserved name.

The idea is not mine - it comes from Peter van der Linden's book, "Deep 
C Secrets". All I did was to introduce O and l (to mimic 0 and 1, of 
course). His original tokens were, IIRC, _ and X. :-)

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
5/25/2015 4:42:19 PM
Öö Tiib <ootiib@hot.ee> writes:
[...]
> Neat trick of self constructing binary literals. 
> Can some pedant reject usage of  those 'PAT__XX_' on review as usage
> of reserved names? 

In C, PAT__XX_ is not reserved.

(In C++, identifiers containing a double underscore are reserved.)

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
5/25/2015 7:20:29 PM
Bartc <bc@freeuk.com> writes:
> On 25/05/2015 10:11, Phil Carmody wrote:
>> Stephen Sprunk <stephen@sprunk.org> writes:
>>>> http://unicode.org/charts/PDF/U0370.pdf
>>>
>>> Internationalization is hard; let's go shopping!
>>
>> If unicrud has anything to do with internationalisation any more,
>> I'm a dutchman. It's now just one of those crappy clip-art libraries
>> which seemed really 'kewl' for all of 5 seconds back in the early 90s:
>>   http://unicode.org/charts/PDF/U1F300.pdf

How much of the range of Unicode characters is taken up by Emoji?

> So this is why we have to go to 32-bits per character and still need
> UTF-16 instead of just having 16-bit characters and be done with it.
>
> To be able to have pictures of farmyard animals and slices of pizza!

Not at all.  There are plenty of characters from real-world alphabets
that don't fit into the initial 2**16 code values.

> I agree that this stuff doesn't belong inside a character set
> mapping. Even if we'd need more than 64K code points anyway.

I haven't paid much attention to the Emoji.  I have no strong
opinion on whether Unicode *should* support them, but presumably
they were added because there was demand for them.

As long as I'm able to read UTF-8 text without having to worry about
them, I don't have any problem with their presence.  (And they
probably reduce the incentive to render punctuation sequences as
smiley faces.  I get USGS earthquake notifications as text messages,
and I got really tired of every occurrence of the messages being
displayed with random smiley faces.)

[...]

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
5/25/2015 7:30:36 PM
On Sun, 24 May 2015 10:57:36 +0100, Bartc <bc@freeuk.com> wrote:

>On 24/05/2015 04:50, Robert Wessel wrote:
>
> > I think we're all agreed that is clear enough, and has at least some
> > use cases (although not whether it's sufficiently useful to include in
> > the language).
>
>I briefly mentioned the kind of use where it might not be practical to 
>enumerate all the cases, such as:
>
>  case first .. last:
>
>You don't know what first or last are, and even if you did, now these 
>cases need updating whenever first or last change:
>
>  case first:
>  case first+1:
>  case ???
>  ...
>  case last:
>
>I don't think there's any question that it would be useful. That's 
>probably why gcc has it (but with three dots which have to be separated 
>from numeric constants).
>
>> Bartc appears to be proposing special handling of some forms of the
>> range, specifically so that the above case would result in matches for
>> the 26 basic letters.
>
>Yes, otherwise encoding A to Z as consecutive code points in ASCII, and 
>having the lower case versions offset by exactly 32, would have been a 
>complete waste of time, if no-one is allowed to make use of that fact! 
>They might as well have been assigned completely randomly.
>
>> A major issue is exactly what the syntax and
>> semantics of that would be.  For example, what happens when the
>> following are specified:
>>
>>    case 'A'+2..'J'+2:
>>    case 'A'..'b':
>>    case '['..']':
>>
>> I don't see any solutions that aren't particularly ugly.  So there's
>> probably no way, in C, to actually do that.
>
>I think I replied to these examples before. What exactly do you want to 
>happen here? It's not obvious nor intuitive. But sub-ranges of the 
>English alphabet, /of the same case/, are.
>
>While C /does/ allow you to write:
>
>  case 'A'+2: case 'J'+2:
>
>which also makes some assumptions.


My question is what exactly syntactically distinguishes the above
cases from the "sub-ranges of the  English alphabet" that would get
special handling.

I understand that you *want* "case 'a'..'z':" to mean the 26 letters, I
want to know what reasonable syntax rules would produce such a thing.
Robert
5/25/2015 7:31:43 PM
On 25-May-15 05:04, Bartc wrote:
> On 25/05/2015 10:11, Phil Carmody wrote:
>> Stephen Sprunk <stephen@sprunk.org> writes:
>>> http://unicode.org/charts/PDF/U0370.pdf
>>> 
>>> Internationalization is hard; let's go shopping!
>> 
>> If unicrud has anything to do with internationalisation any more, 
>> I'm a dutchman. It's now just one of those crappy clip-art
>> libraries which seemed really 'kewl' for all of 5 seconds back in
>> the early 90s:
>> http://unicode.org/charts/PDF/U1F300.pdf
> 
> So this is why we have to go to 32-bits per character and still need 
> UTF-16 instead of just having 16-bit characters and be done with it.

No; 16 bits wasn't enough to support even just CJK characters, much less
all the other (past and present) scripts of humanity.

However, once we had a 21-bit space, it was natural to start to look for
things to fill it; emoji is a relatively small and harmless example.

> (At one time it was my responsibility to provide glyphs for both
> raster and vector fonts, fortunately for 8-bit character sets; doing
> this lot would have taken ages!)

Still, only one person has to do it, and in the vast majority of cases,
that's done before a code point is assigned by Unicode anyway.  Where do
you think the glyphs in their tables come from?

If the user-selected font doesn't have a glyph for a particular code
point, e.g. because it's in an unsupported script, the system will just
find another font that does have a glyph.  Emoji is just another script,
no different from the dozens of others in common use.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
Stephen
5/25/2015 8:20:31 PM
On 25-May-15 14:30, Keith Thompson wrote:
> Bartc <bc@freeuk.com> writes:
>> On 25/05/2015 10:11, Phil Carmody wrote:
>>> Stephen Sprunk <stephen@sprunk.org> writes:
>>>>> http://unicode.org/charts/PDF/U0370.pdf
>>>> 
>>>> Internationalization is hard; let's go shopping!
>>> 
>>> If unicrud has anything to do with internationalisation any
>>> more, I'm a dutchman. It's now just one of those crappy clip-art
>>> libraries which seemed really 'kewl' for all of 5 seconds back in
>>> the early 90s: http://unicode.org/charts/PDF/U1F300.pdf
> 
> How much of the range of Unicode characters is taken up by Emoji?

As of Unicode 7.0, there are two blocks:

U+2600 - U+27BF
U+1F300 - U+1F6FF

That's 448 + 1,024 = 1,472 code points out of 1,114,112, or about 0.13%.  I suspect both of
those blocks have some reservations for expansion, but even if so, they
are still a tiny, tiny fraction of the entire code space.

>> So this is why we have to go to 32-bits per character and still
>> need UTF-16 instead of just having 16-bit characters and be done
>> with it.
>> 
>> To be able to have pictures of farmyard animals and slices of
>> pizza!
> 
> Not at all.  There are plenty of characters from real-world
> alphabets that don't fit into the initial 2**16 code values.

Heck, there is _one_ script that doesn't fit in 2^16 codes, even by
itself; add in all the others, even just the ones in current use around
the world, and it's not even close.

The original UCS-2 proposal apparently assumed that folks would simply
stop using all characters outside the BMP.  However, since that includes
characters that some people need _just_ to write their own names, that
was obviously never going to work.

>> I agree that this stuff doesn't belong inside a character set 
>> mapping. Even if we'd need more than 64K code points anyway.
> 
> I haven't paid much attention to the Emoji.  I have no strong opinion
> on whether Unicode *should* support them, but presumably they were
> added because there was demand for them.

Well, part of Unicode's charter is to provide code points for _every_
character that has a code point in some legacy encoding, and the
original emoji fell into that category.  Once some were in, it would
have been a lot more difficult to deny useful expansions.  For instance,
if there was already an emoji for the Japanese flag, then how can you
deny adding emoji for other countries' flags?  What about the recent
proposal for skin-tone modifiers, to address claims of racism with the
"white" (actually mostly yellow) human emoji?

> As long as I'm able to read UTF-8 text without having to worry about 
> them, I don't have any problem with their presence.

Agreed; we have to support UTF-8 and dozens of scripts anyway, for
proper internationalization, so emoji don't make things any worse.

> (And they probably reduce the incentive to render punctuation
> sequences as smiley faces.  I get USGS earthquake notifications as
> text messages, and I got really tired of every occurrence of the
> messages being displayed with random smiley faces.)

Agreed, and it's often hard to figure out which characters errant smiley
faces were supposed to be.  Emoji solve that problem nicely.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
Stephen
5/25/2015 8:41:01 PM
On 25-May-15 00:51, Morris Dovey wrote:
> On 5/24/15 7:26 PM, Stephen Sprunk wrote:
>> static inline int coll(char c) {
>>    return strchr("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ", c);
>> }
>> ...
>> switch (coll(c)) {
>>    case coll('A')..coll('F'):
>> ...
>>
>> That requires two significant changes to C's switch syntax, but it
>> solves all of the objections I've seen in this thread so far, and it
>> wouldn't be difficult to implement at all.
>>
>> With some implementation-specific hackery, you could also tap into the
>> Standard Library's collation tables so that it automatically adapted to
>> the user's locale, and it'd only require a tiny bit of work to handle
>> UTF-8 strings as well.
> 
> I'd prefer to allow user-specified character sets and collation orders
> for each switch statement. That would increase usefulness with not too
> much additional compiler design complexity.

The two changes required to make the above example work are:

1. Allowing all compile-time constants for cases, not just "integer
constant expressions".  C++ already has this.
2. Add support for numeric ranges.  GCC already has this.

Note that my example does _not_ require any character-specific hackery
in the language itself.  It'd be nice if the Standard Library provided a
function to access collation ordinals (in addition to strcoll() and
wcscoll(), which only compare two characters' ordinals), but that is
completely optional.

> I did a bit of doodling to try to pin down what execution logic might
> look like and produced this bit of pseudocode (note that 'case code' can
> include a break statement):
> 
> char *seqstr;  /* Pointer to collating sequence string */
> int   seqlen;  /* Length of sequence string */
> int   switch;  /* Switch target value */
> int   skip;    /* Target matched, do not test again */
> 
> void collate(char *s) /* Register collating sequence */
> {  seqstr = s;
>    seqlen = strlen(s);
> }
> 
> int co(char x) /* Return index of x in sequence, or -1 */
> {  for (int i=0; i<seqlen; i++) if (x == seqstr[i]) return i;
>    return -1;
> }

Why not just use strchr() as in my example?  That also lets you get rid
of your seqlen variable.  If you want to keep the latter, you could use
memchr() instead of an explicit loop.

You'd need to change that to strstr() for UTF-8 support, but that's a
minor change; it doesn't break the design.

> int inrange(char min,char max) /* Case range test or skip */
> {  int i1,i2;
>    if (skip) return 1;
>    i1 = co(min),
>    i2 = co(max);
>    skip = -1 != i1 && -1 != i2 && switch >= i1 && switch <= i2;
>    return skip;
> }

If you're going to do it that way, you might as well use strcoll() (or
wcscoll()) rather than rolling your own.

> What I have NOT accounted for is nested switch statements - nor did
> I even try to deal with nested switch statements using different
> character sets and/or collating orders. I think I know what kind of
> code structure that might entail, but it's too late/early and I'm
> just not up to dealing with that tonight. :-)

If that's truly necessary, then by leaning on the existing collation
functions, you could accommodate it by having each level change locale,
without any other changes to the language itself.  That'd be slow, but I
suspect it's a rare enough use case to not matter.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
Stephen
5/25/2015 8:48:44 PM
On 25/05/2015 20:31, Robert Wessel wrote:
> On Sun, 24 May 2015 10:57:36 +0100, Bartc <bc@freeuk.com> wrote:
>> On 24/05/2015 04:50, Robert Wessel wrote:

>>> A major issue is exactly what the syntax and
>>> semantics of that would be.  For example, what happens when the
>>> following are specified:
>>>
>>>     case 'A'+2..'J'+2:
>>>     case 'A'..'b':
>>>     case '['..']':
>>>
>>> I don't see any solutions that aren't particularly ugly.  So there's
>>> probably no way, in C, to actually do that.
>>
>> I think I replied to these examples before. What exactly do you want to
>> happen here? It's not obvious nor intuitive. But sub-ranges of the
>> English alphabet, /of the same case/, are.
>>
>> While C /does/ allow you to write:
>>
>>   case 'A'+2: case 'J'+2:
>>
>> which also makes some assumptions.
>
>
> My question is what exactly syntactically distinguishes the above
> cases from the "sub-ranges of the  English alphabet" that would get
> special handling.

> I understand that you *want* "case 'a..'z':" to mean the 26 letters, I
> want to know what reasonable syntax rules would produce such a thing.

Morris Dovey suggested one approach which was to apply a function to 
each character code which would map it into whatever character set you 
wanted (a collating sequence he called it).

I'd be happy if it just mapped any character to ASCII (and this is very 
easy to do on most machines as it will already be ASCII).

But it means the examples would need rewriting as something like:

  case asc('A')+2 .. asc('J')+2:
  case asc('A') .. asc('b'):

Here, the assumption is that asc() will accept both upper and lower case 
and return a code within a single range (eg. 0 to 127). Other functions 
might take only upper case, or map everything to lower case. 
Out-of-range codes might return 0.

  case asc('[')..asc(']'):

In this example, this just ensures that whatever range is intended, it 
is consistent across different machines.

However, take away asc(), and this is what C does now on machines with 
ASCII targets. asc() or some equivalent function might satisfy people 
who worry about EBCDIC, or need to work with an alphabet that needs to 
include extra, or fewer, characters.

(It would still need the range feature, and asc() etc would need to be a 
compile-time constant for use as a case label.)

-- 
Bartc
Bartc
5/25/2015 9:30:50 PM
On Mon, 25 May 2015 22:30:50 +0100, Bartc <bc@freeuk.com> wrote:

>On 25/05/2015 20:31, Robert Wessel wrote:
>> On Sun, 24 May 2015 10:57:36 +0100, Bartc <bc@freeuk.com> wrote:
>>> On 24/05/2015 04:50, Robert Wessel wrote:
>
>>>> A major issue is exactly what the syntax and
>>>> semantics of that would be.  For example, what happens when the
>>>> following are specified:
>>>>
>>>>     case 'A'+2..'J'+2:
>>>>     case 'A'..'b':
>>>>     case '['..']':
>>>>
>>>> I don't see any solutions that aren't particularly ugly.  So there's
>>>> probably no way, in C, to actually do that.
>>>
>>> I think I replied to these examples before. What exactly do you want to
>>> happen here? It's not obvious nor intuitive. But sub-ranges of the
>>> English alphabet, /of the same case/, are.
>>>
>>> While C /does/ allow you to write:
>>>
>>>   case 'A'+2: case 'J'+2:
>>>
>>> which also makes some assumptions.
>>
>>
>> My question is what exactly syntactically distinguishes the above
>> cases from the "sub-ranges of the  English alphabet" that would get
>> special handling.
>
>> I understand that you *want* "case 'a'..'z':" to mean the 26 letters, I
>> want to know what reasonable syntax rules would produce such a thing.
>
>Morris Dovey suggested one approach which was to apply a function to 
>each character code which would map it into whatever character set you 
>wanted (a collating sequence he called it).
>
>I'd be happy if it just mapped any character to ASCII (and this is very 
>easy to do on most machines as it will already be ASCII).
>
>But it means the examples would need rewriting as something like:
>
>  case asc('A')+2 .. asc('J')+2:
>  case asc('A') .. asc('b'):
>
>Here, the assumption is that asc() will accept both upper and lower case 
>and return a code within a single range (eg. 0 to 127). Other functions 
>might take only upper case, or map everything to lower case. 
>Out-of-range codes might return 0.
>
>  case asc('[')..asc(']'):
>
>In this example, this just ensures that whatever range is intended, it 
>is consistent across different machines.


I really don't see how this would work.  For something like this to
work you'd need to translate *both* the value and the comparand(s).

IOW:

switch (co(value))
  {
  case co('a')..co('z'):
  }


>However, take away asc(), and this is what C does now on machines with 
>ASCII targets. asc() or some equivalent function might satisfy people 
>who worry about EBCDIC, or need to work with an alphabet that needs to 
>include extra, or fewer, characters.
>
>(It would still need the range feature, and asc() etc would need to be a 
>compile-time constant for use as a case label.)


At least the range would be needed, the rest could be a port of C++
constexprs.
Robert
5/25/2015 10:56:09 PM
On 5/25/15 3:48 PM, Stephen Sprunk wrote:
> On 25-May-15 00:51, Morris Dovey wrote:
>> On 5/24/15 7:26 PM, Stephen Sprunk wrote:
>>> static inline int coll(char c) {
>>>     return strchr("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ", c);
>>> }
>>> ...
>>> switch (coll(c)) {
>>>     case coll('A')..coll('F'):
>>> ...
>>>
>>> That requires two significant changes to C's switch syntax, but it
>>> solves all of the objections I've seen in this thread so far, and it
>>> wouldn't be difficult to implement at all.
>>>
>>> With some implementation-specific hackery, you could also tap into the
>>> Standard Library's collation tables so that it automatically adapted to
>>> the user's locale, and it'd only require a tiny bit of work to handle
>>> UTF-8 strings as well.
>>
>> I'd prefer to allow user-specified character sets and collation orders
>> for each switch statement. That would increase usefulness with not too
>> much additional compiler design complexity.
>
> The two changes required to make the above example work are:
>
> 1. Allowing all compile-time constants for cases, not just "integer
> constant expressions".  C++ already has this.

"All compile-time constants" could be pretty squirrely. How close do 
floating point values need to be to a range bound to be classified as 
'in' the range?

What non-C languages support is irrelevant. If C++ already has what you 
need, then your problem is already solved and by insisting on a C 
solution, you’re rejecting the very solution you’re proposing. :-P

> 2. Add support for numeric ranges.  GCC already has this.

Goody for gcc – but it’s just one C compiler package out of a whole pile 
of C compilers, and Gnu lacks imperial authority.

> Note that my example does _not_ require any character-specific hackery
> in the language itself.  It'd be nice if the Standard Library provided a
> function to access collation ordinals (in addition to strcoll() and
> wcscoll(), which only compare two characters' ordinals), but that is
> completely optional.

You’re wanting to do pointer comparisons and I still prefer integer 
comparisons. I’m probably not going to change my mind on this one 
without seeing compelling reasons.

I think you’re missing an important point - that while the character set 
for program code can be nailed down for a language, programmers 
can reasonably be expected to deal with textual data in a multitude of 
languages - in Canada, for example, with both English and French 
character sets and collating sequences - or in Greece with Greek, Latin, 
Turkish, Arabic, and Macedonian textual data. For those reasons I would 
argue that if we’re going to allow character ranges, the programmer 
should be able to specify the character set and associated collating 
sequence.

One of the reasons I picked Greek for my example, by the way, is that a 
range of 'Α'..'Ζ' (Greek capital alpha to zeta) only includes the first 
six letters of the Greek alphabet.
>
>> I did a bit of doodling to try to pin down what execution logic might
>> look like and produced this bit of pseudocode (note that 'case code' can
>> include a break statement):
>>
>> char *seqstr;  /* Pointer to collating sequence string */
>> int   seqlen;  /* Length of sequence string */
>> int   switch;  /* Switch target value */
>> int   skip;    /* Target matched, do not test again */
>>
>> void collate(char *s) /* Register collating sequence */
>> {  seqstr = s;
>>     seqlen = strlen(s);
>> }
>>
>> int co(char x) /* Return index of x in sequence, or -1 */
>> {  for (int i=0; i<seqlen; i++) if (x == seqstr[i]) return i;
>>     return -1;
>> }
>
> Why not just use strchr() as in my example?  That also lets you get rid
> of your seqlen variable.  If you want to keep the latter, you could use
> memchr() instead of an explicit loop.
>
> You'd need to change that to strstr() for UTF-8 support, but that's a
> minor change; it doesn't break the design.
>
>> int inrange(char min,char max) /* Case range test or skip */
>> {  int i1,i2;
>>     if (skip) return 1;
>>     i1 = co(min),
>>     i2 = co(max);
>>     skip = -1 != i1 && -1 != i2 && switch >= i1 && switch <= i2;
>>     return skip;
>> }
>
> If you're going to do it that way, you might as well use strcoll() (or
> wcscoll()) rather than rolling your own.

Perhaps – but since I was writing pseudocode, I was more concerned with 
clarity of concept than the library. Please feel free to tweak it any 
way that pleases you.

The sharp programmer will probably notice that the character set 
specified for a switch statement need only include those characters used 
in the range specifications, and that by so limiting the character set 
there are speed and memory advantages over hackish tricks with the 
locale. :-D

>> What I have NOT accounted for is nested switch statements - nor did
>> I even try to deal with nested switch statements using different
>> character sets and/or collating orders. I think I know what kind of
>> code structure that might entail, but it's too late/early and I'm
>> just not up to dealing with that tonight. :-)
>
> If that's truly necessary, then by leaning on the existing collation
> functions, you could accommodate it by having each level change locale,
> without any other changes to the language itself.  That'd be slow, but I
> suspect it's a rare enough use case to not matter.

It is truly necessary - the standard allows nested switch statements.

There was an unanswered request for possible syntax suggestions. I meant 
to respond but got sidetracked. I can suggest

    switch (x) using ("ABCDEF")
    {  case 'A'..'F':
          /* case code */
       case 'G':
          /* case code */
       default:
          /* case code */
    }

-- 
Morris Dovey
http://www.iedu.com/Solar
Morris
5/26/2015 7:36:56 AM
David Brown <david.brown@hesbynett.no> wrote:

> (Octal literals other than 0, on the other hand, are practically 
> non-existent in embedded code, and banned by many coding standards as 
> confusing, hard to read, and error prone.  When you work with data sizes 
> of 8, 16 and 32 bits, a notation for 3 bit sizes is useless.)

The only _common_ use of octal literals I know of is Unix file
permissions, but there they're very common.

There may, of course, be many other _un_common uses.

Richard
raltbos
5/26/2015 9:11:54 AM
On 26/05/15 11:11, Richard Bos wrote:
> David Brown <david.brown@hesbynett.no> wrote:
> 
>> (Octal literals other than 0, on the other hand, are practically 
>> non-existent in embedded code, and banned by many coding standards as 
>> confusing, hard to read, and error prone.  When you work with data sizes 
>> of 8, 16 and 32 bits, a notation for 3 bit sizes is useless.)
> 
> The only _common_ use of octal literals I know of is Unix file
> permissions, but there they're very common.

That's my impression too (plus 0, which is technically an octal literal).

> 
> There may, of course, be many other _un_common uses.
> 

My gut feeling is that accidental usage (i.e., hard-to-spot bugs) is
more common than intentional usage of octal, other than for *nix file
permissions and for zero.  The only uncommon use I can think of is that
some people may use it for ASCII characters.



David
5/26/2015 9:26:38 AM
David Brown wrote:
> On 26/05/15 11:11, Richard Bos wrote:
>> David Brown <david.brown@hesbynett.no> wrote:
>>
>>> (Octal literals other than 0, on the other hand, are practically
>>> non-existent in embedded code, and banned by many coding standards as
>>> confusing, hard to read, and error prone.  When you work with data sizes
>>> of 8, 16 and 32 bits, a notation for 3 bit sizes is useless.)
>>
>> The only _common_ use of octal literals I know of is Unix file
>> permissions, but there they're very common.
>
> That's my impression too (plus 0, which is technically an octal literal).
>
>>
>> There may, of course, be many other _un_common uses.
>>
>
> My gut feeling is that accidental usage (i.e., hard-to-spot bugs) is
> more common than intentional usage of octal, other than for *nix file
> permissions and for zero.  The only uncommon use I can think of is that
> some people may use it for ASCII characters.

This seems to be particularly common in old code!

-- 
Ian Collins
Ian
5/26/2015 10:02:38 AM
David Brown <david.brown@hesbynett.no> writes:
<snip>
> (Octal literals other than 0, on the other hand, are practically
> non-existent in embedded code, and banned by many coding standards as
> confusing, hard to read, and error prone.  When you work with data
> sizes of 8, 16 and 32 bits, a notation for 3 bit sizes is useless.)

If the data are arbitrary, I agree, but octal is useful for some kinds
of data.  The most common data you entered via the switches on the front
panel of a PDP-11 were instructions, and it's no accident that the
switches are grouped in threes as the instruction format is full of
octal-significant patterns.  If you are doing low-level PDP-11
programming you will use octal a lot, despite it being a 16-bit machine
with 8-bit bytes.  The use of octal in the PDP-11 probably comes from the
36-bit PDP-10 range.

It's tempting to think the octal got into C because of the octal-heavy
DEC heritage, but octal constants were in B in exactly the same form (a
leading 0 indicating base 8) so I think they were probably just copied
into C in an environment where octal was used enough that there would be
no motivation to "tidy up" B by dropping them.

B was developed on Honeywell 6000 machines (re-badged GE machines) which
also make heavy use of octal.  Addresses are 18 bits and alphanumeric
data are composed of either 6 or 9 bit bytes.  The opcodes are 9 bits
and some 3-bit portions denote variations.  Octal was used almost
exclusively.

-- 
Ben.
Ben
5/26/2015 1:59:41 PM
On 26/05/15 15:59, Ben Bacarisse wrote:
> David Brown <david.brown@hesbynett.no> writes:
> <snip>
>> (Octal literals other than 0, on the other hand, are practically
>> non-existent in embedded code, and banned by many coding standards as
>> confusing, hard to read, and error prone.  When you work with data
>> sizes of 8, 16 and 32 bits, a notation for 3 bit sizes is useless.)
> 
> If the data are arbitrary, I agree, but octal is useful for some kinds
> of data.  The most common data you entered via the switches on the front
> panel of a PDP-11 were instructions, and it's no accident that the
> switches are grouped in threes as the instruction format is full of
> octal-significant patterns.  If you are doing low-level PDP-11
> programming you will use octal a lot, despite it being a 16-bit machine
>> with 8-bit bytes.  The use of octal in the PDP-11 probably comes from the
> 36-bit PDP-10 range.

I can accept that there is historical significance here, but I can't see
this as a reason to want to use octal in C programming.  If anyone still
enters instructions to PDP-11 machines by switches, they are programming
in machine code - not C.  And while I have not used a PDP-11, I have
used MSP430 microcontrollers extensively, which share a substantial
common subset with the PDP-11 ISA.  Never in all my assembly or C
programming on these devices have I come across anything that would be
sensibly expressed in octal.

> 
> It's tempting to think the octal got into C because of the octal-heavy
> DEC heritage, but octal constants were in B in exactly the same form (a
> leading 0 indicating base 8) so I think they were probably just copied
> into C in an environment where octal was used enough that there would be
> no motivation to "tidy up" B by dropping them.
> 
> B was developed on Honeywell 6000 machines (re-badged GE machines) which
> also make heavy use of octal.  Addresses are 18 bits and alphanumeric
> data are composed of either 6 or 9 bit bytes.  The opcodes are 9 bits
> and some 3-bit portions denote variations.  Octal was used almost
> exclusively.
> 

I think this suggests that octal and the octal notation were historical
baggage from before the start of C, which happened to be convenient for
*nix permission bits.

David
5/26/2015 2:58:13 PM
On 26/05/2015 15:58, David Brown wrote:
> On 26/05/15 15:59, Ben Bacarisse wrote:

>> B was developed on Honeywell 6000 machines (re-badged GE machines) which
>> also make heavy use of octal.  Addresses are 18 bits and alphanumeric
>> data are composed of either 6 or 9 bit bytes.  The opcodes are 9 bits
>> and some 3-bit portions denote variations.  Octal was used almost
>> exclusively.
>>
>
> I think this suggests that octal and the octal notation were historical
> baggage from before the start of C, which happened to be convenient for
> *nix permission bits.

I'm not bothered whether there is octal notation or not, but the leading 
zero notation is just wrong.

Octal constants should be more explicitly denoted.

Then everyone can forget about them, and we have the freedom to write 
decimal numbers with leading zeros if we want without introducing subtle 
bugs.

-- 
Bartc
Bartc
5/26/2015 3:19:26 PM
On 5/26/15 2:36 AM, Morris Dovey wrote:
> There was an unanswered request for possible syntax suggestions. I meant
> to respond but got sidetracked. I can suggest
>
>     switch (x) using ("ABCDEF")
>     {  case 'A'..'F':
>           /* case code */
>        case 'G':
>           /* case code */
>        default:
>           /* case code */
>     }

After sleeping on that, I like better:

    switch (x)
    {  case 'A'..'F' using ("ABCDEF"):
          /* case code */
       case 'G':
          /* case code */
       default:
          /* case code */
    }

-- 
Morris Dovey
http://www.iedu.com/Solar
Morris
5/26/2015 6:23:49 PM
On 26-May-15 02:36, Morris Dovey wrote:
> On 5/25/15 3:48 PM, Stephen Sprunk wrote:
>> On 25-May-15 00:51, Morris Dovey wrote:
>>> On 5/24/15 7:26 PM, Stephen Sprunk wrote:
>>>> static inline int coll(char c) {
>>>>     return strchr("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ", c);
>>>> }
>>>> ...
>>>> switch (coll(c)) {
>>>>     case coll('A')..coll('F'):
>>>> ...
>>>>
>>>> That requires two significant changes to C's switch syntax, but
>>>> it solves all of the objections I've seen in this thread so
>>>> far, and it wouldn't be difficult to implement at all.
>>>> 
>>>> With some implementation-specific hackery, you could also tap
>>>> into the Standard Library's collation tables so that it
>>>> automatically adapted to the user's locale, and it'd only
>>>> require a tiny bit of work to handle UTF-8 strings as well.
>>> 
>>> I'd prefer to allow user-specified character sets and collation
>>> orders for each switch statement. That would increase usefulness
>>> with not too much additional compiler design complexity.
>> 
>> The two changes required to make the above example work are:
>> 
>> 1. Allowing all compile-time constants for cases, not just
>> "integer constant expressions".  C++ already has this.
> 
> "All compile-time constants" could be pretty squirrely. How close do 
> floating point values need to be to a range bound to be classified
> as 'in' the range?

Hmm; I was assuming integers, but if you wanted to add floating point as
well, which becomes a lot more useful with ranges, then I suppose you
could define it in terms of the relative comparison operators.

> What non-C languages support is irrelevant. If C++ already has what
> you need, then your problem is already solved and by insisting on a
> C solution, you’re rejecting the very solution you’re proposing. :-P

My point was that we have an example that is easily copied, just as C
has copied many other things from C++ (and vice versa, of course).

>> 2. Add support for numeric ranges.  GCC already has this.
> 
> Goody for gcc – but it’s just one C compiler package out of a whole
> pile of C compilers, and Gnu lacks imperial authority.

Again, my point was that we have an example that is easily copied, and
many "new" features in C started as compiler (often GCC) extensions.

>> Note that my example does _not_ require any character-specific
>> hackery in the language itself.  It'd be nice if the Standard
>> Library provided a function to access collation ordinals (in
>> addition to strcoll() and wcscoll(), which only compare two
>> characters' ordinals), but that is completely optional.
> 
> You’re wanting to do pointer comparisons and I still prefer integer 
> comparisons. I’m probably not going to change my mind on this one 
> without seeing compelling reasons.

Huh?  My example was all integer comparisons.

> I think you’re missing an important point - that while the character 
> set for program code can be nailed down for a programming language, 
> programmers can reasonably be expected to deal with textual data in
> a multitude of languages - in Canada, for example, with both English 
> and French character sets and collating sequences - or in Greece
> with Greek, Latin, Turkish, Arabic, and Macedonian textual data. For
> those reasons I would argue that if we’re going to allow character
> ranges, the programmer should be able to specify the character set
> and associated collating sequence.

That's why my example had a caller-provided collating sequence and, as
an exercise for the reader, I suggested tapping into the existing
collation tables that are automatically selected by user locale.

> One of the reasons I picked Greek for my example, by the way, is
> that a range of 'A'..'Z' only includes the first six letters of the
> Greek alphabet.

Does it?  I can't tell if that's U+0391 .. U+0396 or U+0041 .. U+005A.

> The sharp programmer will probably notice that the character set 
> specified for a switch statement need only include those characters
> used in the range specifications, and that by so limiting the
> character set there are speed and memory advantages over hackish
> tricks with the locale. :-D

If you don't include the entire range explicitly, then you're back to
being dependent on the particular character encoding used by the
specific implementation you're running on, e.g. it works on ASCII
systems but breaks on EBCDIC ones.

>>> What I have NOT accounted for is nested switch statements - nor
>>> did I even try to deal with nested switch statements using
>>> different character sets and/or collating orders. I think I know
>>> what kind of code structure that might entail, but it's too
>>> late/early and I'm just not up to dealing with that tonight. :-)
>> 
>> If that's truly necessary, then by leaning on the existing
>> collation functions, you could accommodate it by having each level
>> change locale, without any other changes to the language itself.
>> That'd be slow, but I suspect it's a rare enough use case to not
>> matter.
> 
> It is truly necessary - the standard allows nested switch
> statements.

True, but that doesn't mean you _must_ allow different collation orders
at each level of nesting.

> There was an unanswered request for possible syntax suggestions. I
> meant to respond but got sidetracked. I can suggest
> 
>    switch (x) using ("ABCDEF")
>    {  case 'A'..'F':
>          /* case code */
>       case 'G':
>          /* case code */
>       default:
>          /* case code */
>    }

Do you really want to have to specify the collation sequence for every
switch, even though for the vast majority of code it'll be fixed, either
at compile time or by selecting a single locale at run-time?

Also, that syntax change makes this a very specific core language
feature; my example was designed to leverage two simpler, more general
language features that have _already_ been proposed numerous times and
have working examples in the wild.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
Stephen
5/26/2015 6:27:11 PM
Morris Dovey <mrdovey@iedu.com> writes:

> On 5/26/15 2:36 AM, Morris Dovey wrote:
>> There was an unanswered request for possible syntax suggestions. I meant
>> to respond but got sidetracked. I can suggest
>>
>>     switch (x) using ("ABCDEF")
>>     {  case 'A'..'F':
>>           /* case code */
>>        case 'G':
>>           /* case code */
>>        default:
>>           /* case code */
>>     }
>
> After sleeping on that, I like better:
>
>    switch (x)
>    {  case 'A'..'F' using ("ABCDEF"):
>          /* case code */
>       case 'G':
>          /* case code */
>       default:
>          /* case code */
>    }

I suppose you could even have:

    switch (x)
    {  case "ABCDEF":
	  /* case code */
       case 'G':
	  /* case code */
       default:
	  /* case code */
    }

-- 
Ben.
Ben
5/26/2015 11:55:00 PM
Bartc wrote:
> On 27/05/2015 15:54, David Brown wrote:
>
>> I might object to allowing floating point constants as cases unless you
>> are using ranges, because it is generally a bad idea to test floating
>> point values for equality.
>
> If == is allowed between floating point values, then you can't really
> object to comparing floats anywhere else.
>


There is a long list of caveats in just comparing floating
point numbers.

> The switch might simply be used as a more organised way of doing lots of
> ifs and else-ifs.
>

Not so much. A switch is a (sparse) "array of code blocks".

> (And there are a huge number of values for which floating point equality
> can be tested without problem, including all the 4 billion values of a
> 32-bit int.)
>

That is so implementation dependent that if I were part of
the governance for the language, I'd suggest avoiding it.

-- 
Les Cargill
Les
5/27/2015 1:01:01 AM
On 5/26/15 1:27 PM, Stephen Sprunk wrote:
> On 26-May-15 02:36, Morris Dovey wrote:
>> On 5/25/15 3:48 PM, Stephen Sprunk wrote:

>>> The two changes required to make the above example work are:
>>>
>>> 1. Allowing all compile-time constants for cases, not just
>>> "integer constant expressions".  C++ already has this.
>>
>> "All compile-time constants" could be pretty squirrely. How close do
>> floating point values need to be to a range bound to be classified
>> as 'in' the range?
>
> Hmm; I was assuming integers, but if you wanted to add floating point as
> well, which becomes a lot more useful with ranges, then I suppose you
> could define it in terms of the relative comparison operators.

'Scuse me, but I'm not the one wanting to add anything - I'm joining in 
the discussion to explore the possibility that you might have a useful 
idea. "All compile-time constants" seemed fairly unambiguous, so I took 
it literally.

    case 0.1-1E-10..sqrt(2.)+1E-12: /* Like this? */

>> What non-C languages support is irrelevant. If C++ already has what
>> you need, then your problem is already solved and by insisting on a
>> C solution, you’re rejecting the very solution you’re proposing. :-P
>
> My point was that we have an example that is easily copied, just as C
> has copied many other things from C++ (and vice versa, of course).

How interesting.

>>> 2. Add support for numeric ranges.  GCC already has this.
>>
>> Goody for gcc – but it’s just one C compiler package out of a whole
>> pile of C compilers, and Gnu lacks imperial authority.
>
> Again, my point was that we have an example that is easily copied, and
> many "new" features in C started as compiler (often GCC) extensions.

Hmm. If I were in a contentious mood, I would invite your complete 
enumeration of those that originated in the gcc compiler group... :-)

>>> Note that my example does _not_ require any character-specific
>>> hackery in the language itself.  It'd be nice if the Standard
>>> Library provided a function to access collation ordinals (in
>>> addition to strcoll() and wcscoll(), which only compare two
>>> characters' ordinals), but that is completely optional.
>>
>> You’re wanting to do pointer comparisons and I still prefer integer
>> comparisons. I’m probably not going to change my mind on this one
>> without seeing compelling reasons.
>
> Huh?  My example was all integer comparisons.

Err - yesterday you wrote "Why not just use strchr() as in my example?"

>> I think you’re missing an important point - that while the character
>> set for program code can be nailed down for a programming language,
>> programmers can reasonably be expected to deal with textual data in
>> a multitude of languages - in Canada, for example, with both English
>> and French character sets and collating sequences - or in Greece
>> with Greek, Latin, Turkish, Arabic, and Macedonian textual data. For
>> those reasons I would argue that if we’re going to allow character
>> ranges, the programmer should be able to specify the character set
>> and associated collating sequence.
>
> That's why my example had a caller-provided collating sequence and, as
> an exercise for the reader, I suggested tapping into the existing
> collation tables that are automatically selected by user locale.

You did, and I'm still interested in how that would work in situations 
where multiple locales would be needed.

>> One of the reasons I picked Greek for my example, by the way, is
>> that a range of 'A'..'Z' only includes the first six letters of the
>> Greek alphabet.
>
> Does it?  I can't tell if that's U+0391 .. U+0396 or U+0041 .. U+005A.

You shouldn’t need to. "ΑΒΓΔΕΖ" should be sufficient.

>> The sharp programmer will probably notice that the character set
>> specified for a switch statement need only include those characters
>> used in the range specifications, and that by so limiting the
>> character set there are speed and memory advantages over hackish
>> tricks with the locale. :-D
>
> If you don't include the entire range explicitly, then you're back to
> being dependent on the particular character encoding used by the
> specific implementation you're running on, e.g. it works on ASCII
> systems but breaks on EBCDIC ones.

Switch statements are inherently data-dependent, and the programmer is 
responsible for knowing those details.

Interestingly, one could build context strings for both ASCII and EBCDIC 
and specify

    switch (x)
    {  case 'A'..'E' using (ascii_context_string):
          /* case code */
          break;
       case 'A'..'E' using (ebcdic_context_string):
          /* case code */
       default:
          /* case code */
    }

to handle both encodings in the same switch.

>>>> What I have NOT accounted for is nested switch statements - nor
>>>> did I even try to deal with nested switch statements using
>>>> different character sets and/or collating orders. I think I know
>>>> what kind of code structure that might entail, but it's too
>>>> late/early and I'm just not up to dealing with that tonight. :-)
>>>
>>> If that's truly necessary, then by leaning on the existing
>>> collation functions, you could accommodate it by having each level
>>> change locale, without any other changes to the language itself.
>>> That'd be slow, but I suspect it's a rare enough use case to not
>>> matter.
>>
>> It is truly necessary - the standard allows nested switch
>> statements.
>
> True, but that doesn't mean you _must_ allow different collation orders
> at each level of nesting.

That's true - but if we're going to consider adding a feature to the 
language, let's aim to add as much power as possible at the lowest 
possible cost. At present C doesn't support ranges at all, and if we're 
to consider adding them to the standard a larger benefit probably means 
a higher probability of adoption.

>> There was an unanswered request for possible syntax suggestions. I
>> meant to respond but got sidetracked. I can suggest
>>
>>     switch (x) using ("ABCDEF")
>>     {  case 'A'..'F':
>>           /* case code */
>>        case 'G':
>>           /* case code */
>>        default:
>>           /* case code */
>>     }

Quoting my "rethink":

>> After sleeping on that, I like better:
>>
>>    switch (x)
>>    {  case 'A'..'F' using ("ABCDEF"):
>>          /* case code */
>>       case 'G':
>>          /* case code */
>>       default:
>>          /* case code */
>>    }

Which would allow (but not require) specification of a different 
collation sequence for each range case.

> Do you really want to have to specify the collation sequence for every
> switch, even though for the vast majority of code it'll be fixed, either
> at compile time or by selecting a single locale at run-time?

Nope. The only time an explicit collation sequence specification should 
be needed is when the programmer wants to use other than that which is 
the default at compilation time, and then only for character range cases.

> Also, that syntax change makes this a very specific core language
> feature; my example was designed to leverage two simpler, more general
> language features that have _already_ been proposed numerous times and
> have working examples in the wild.

Range cases may indeed have been proposed numerous times – which means 
that they've been deemed to offer insufficient benefit all those 
numerous times.

Paraphrasing: The definition of "insanity" may be proposing the same 
change repeatedly, each time expecting a different response, while 
always getting the same response.

-- 
Morris Dovey
http://www.iedu.com/Solar
Morris
5/27/2015 2:09:05 AM
On 26-May-15 21:09, Morris Dovey wrote:
> On 5/26/15 1:27 PM, Stephen Sprunk wrote:
>> On 26-May-15 02:36, Morris Dovey wrote:
>>> "All compile-time constants" could be pretty squirrely. How close
>>> do floating point values need to be to a range bound to be
>>> classified as 'in' the range?
>> 
>> Hmm; I was assuming integers, but if you wanted to add floating
>> point as well, which becomes a lot more useful with ranges, then I
>> suppose you could define it in terms of the relative comparison
>> operators.
> 
> 'Scuse me, but I'm not the one wanting to add anything - I'm joining
> in the discussion to explore the possibility that you might have a
> useful idea. "All compile-time constants" seemed fairly unambiguous,
> so I took it literally.
> 
> case 0.1-1E-10..sqrt(2.)+1E-12: /* Like this? */

I just realized that makes jump tables impossible, so that probably
isn't a good idea; I'd keep it limited to integers.  Good question.

>>>> Note that my example does _not_ require any character-specific 
>>>> hackery in the language itself.  It'd be nice if the Standard 
>>>> Library provided a function to access collation ordinals (in 
>>>> addition to strcoll() and wcscoll(), which only compare two 
>>>> characters' ordinals), but that is completely optional.
>>> 
>>> You’re wanting to do pointer comparisons and I still prefer
>>> integer comparisons. I’m probably not going to change my mind on
>>> this one without seeing compelling reasons.
>> 
>> Huh?  My example was all integer comparisons.
> 
> Err - yesterday you wrote "Why not just use strchr() as in my
> example?"

My example used strchr() to find the collation ordinal of a given
character, and then compared those; it never compared pointers.

>> That's why my example had a caller-provided collating sequence and,
>> as an exercise for the reader, I suggested tapping into the
>> existing collation tables that are automatically selected by user
>> locale.
> 
> You did, and I'm still interested in how that would work in
> situations where multiple locales would be needed.

The only way I can see the latter working is to require changing locales
for each level of nesting.  As I said, that'd be slow, but it should
also be rare.

>>> One of the reasons I picked Greek for my example, by the way, is 
>>> that a range of 'A'..'Z' only includes the first six letters of
>>> the Greek alphabet.
>> 
>> Does it?  I can't tell if that's U+0391 .. U+0396 or U+0041 ..
>> U+005A.
> 
> You shouldn’t need to. "ΑΒΓΔΕΖ" should be sufficient.

Then we're back to specifying the entire collation sequence rather than
just a range.

And once you step outside the basic ASCII set, those collation sequences
can be _massive_, and the average programmer is almost certain to get
them wrong, which is why I suggested using the library's set, since it
already has to have those for its own collation functions.

>> Do you really want to have to specify the collation sequence for
>> every switch, even though for the vast majority of code it'll be
>> fixed, either at compile time or by selecting a single locale at
>> run-time?
> 
> Nope. The only time an explicit collation sequence specification
> should be needed is when the programmer wants to use other than that
> which is the default at compilation time, and then only for character
> range cases.

What is default at compilation time is almost certainly incorrect, at
least for any program with even halfway decent localization.

>> Also, that syntax change makes this a very specific core language 
>> feature; my example was designed to leverage two simpler, more
>> general language features that have _already_ been proposed
>> numerous times and have working examples in the wild.
> 
> Range cases may indeed have been proposed numerous times – which
> means that they've been deemed to offer insufficient benefit all
> those numerous times.
> 
> Paraphrasing: The definition of "insanity" may be proposing the same 
> change repeatedly, each time expecting a different response, while 
> always getting the same response.

Fair enough, or perhaps nobody so far has provided sufficient
justification for their inclusion.  I don't know the history, so I don't
know exactly why past attempts failed.  If I were submitting a formal
proposal, I'd do the necessary research and find a better approach.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
Stephen
5/27/2015 5:03:14 AM
On Wed, 27 May 2015 00:03:14 -0500, Stephen Sprunk
<stephen@sprunk.org> wrote:

>On 26-May-15 21:09, Morris Dovey wrote:
>> On 5/26/15 1:27 PM, Stephen Sprunk wrote:
>>> On 26-May-15 02:36, Morris Dovey wrote:
>>>> "All compile-time constants" could be pretty squirrely. How close
>>>> do floating point values need to be to a range bound to be
>>>> classified as 'in' the range?
>>> 
>>> Hmm; I was assuming integers, but if you wanted to add floating
>>> point as well, which becomes a lot more useful with ranges, then I
>>> suppose you could define it in terms of the relative comparison
>>> operators.
>> 
>> 'Scuse me, but I'm not the one wanting to add anything - I'm joining
>> in the discussion to explore the possibility that you might have a
>> useful idea. "All compile-time constants" seemed fairly unambiguous,
>> so I took it literally.
>> 
>> case 0.1-1E-10..sqrt(2.)+1E-12: /* Like this? */
>
>I just realized that makes jump tables impossible, so that probably
>isn't a good idea; I'd keep it limited to integers.  Good question.


Only if you assume a 1970s compiler.  If all the cases are integers,
optimize as you do now.  If not, then jump tables may not be
applicable.
Robert
5/27/2015 5:59:01 AM
On 5/27/15 12:03 AM, Stephen Sprunk wrote:
> On 26-May-15 21:09, Morris Dovey wrote:
>> On 5/26/15 1:27 PM, Stephen Sprunk wrote:
>>> On 26-May-15 02:36, Morris Dovey wrote:
>>>> "All compile-time constants" could be pretty squirrely. How close
>>>> do floating point values need to be to a range bound to be
>>>> classified as 'in' the range?
>>>
>>> Hmm; I was assuming integers, but if you wanted to add floating
>>> point as well, which becomes a lot more useful with ranges, then I
>>> suppose you could define it in terms of the relative comparison
>>> operators.
>>
>> 'Scuse me, but I'm not the one wanting to add anything - I'm joining
>> in the discussion to explore the possibility that you might have a
>> useful idea. "All compile-time constants" seemed fairly unambiguous,
>> so I took it literally.
>>
>> case 0.1-1E-10..sqrt(2.)+1E-12: /* Like this? */
>
> I just realized that makes jump tables impossible, so that probably
> isn't a good idea; I'd keep it limited to integers.  Good question.

Thanks. I'm not sure that it wouldn't be useful, but am sure that FP 
could offer some challenges. It might look 'friendlier' if I wrote

    case 0.1-EPSILON..sqrt(2)+EPSILON:

or it might even be possible for the compiler to automagically provide a 
target-dependent EPSILON so we could get away with writing

    case 0.1..sqrt(2):

but that would take more analysis than I want to do tonight.

>>>>> Note that my example does _not_ require any character-specific
>>>>> hackery in the language itself.  It'd be nice if the Standard
>>>>> Library provided a function to access collation ordinals (in
>>>>> addition to strcoll() and wcscoll(), which only compare two
>>>>> characters' ordinals), but that is completely optional.
>>>>
>>>> You’re wanting to do pointer comparisons and I still prefer
>>>> integer comparisons. I’m probably not going to change my mind on
>>>> this one without seeing compelling reasons.
>>>
>>> Huh?  My example was all integer comparisons.
>>
>> Err - yesterday you wrote "Why not just use strchr() as in my
>> example?"
>
> My example used strchr() to find the collation ordinal of a given
> character, and then compared those; it never compared pointers.

Yabbut what you get from strchr() is a pointer (to char)...

>>> That's why my example had a caller-provided collating sequence and,
>>> as an exercise for the reader, I suggested tapping into the
>>> existing collation tables that are automatically selected by user
>>> locale.
>>
>> You did, and I'm still interested in how that would work in
>> situations where multiple locales would be needed.
>
> The only way I can see the latter working is to require changing locales
> for each level of nesting.  As I said, that'd be slow, but it should
> also be rare.
>
>>>> One of the reasons I picked Greek for my example, by the way, is
>>>> that a range of 'A'..'Z' only includes the first six letters of
>>>> the Greek alphabet.
>>>
>>> Does it?  I can't tell if that's U+0391 .. U+0396 or U+0041 ..
>>> U+005A.
>>
>> You shouldn’t need to. "ΑΒΓΔΕΖ" should be sufficient.
>
> Then we're back to specifying the entire collation sequence rather than
> just a range.

No - only enough of the sequence to determine whether the switch value 
is within the specified sequence! Consider:

    switch (x)
    {  case 'A'..'Y' using ("AEIOUY"):
          /* case code for processing a vowel */
          break;
       /* other cases */
    }

> And once you step outside the basic ASCII set, those collation sequences
> can be _massive_, and the average programmer is almost certain to get
> them wrong, which is why I suggested using the library's set, since it
> already has to have those for its own collation functions.
>
>>> Do you really want to have to specify the collation sequence for
>>> every switch, even though for the vast majority of code it'll be
>>> fixed, either at compile time or by selecting a single locale at
>>> run-time?
>>
>> Nope. The only time an explicit collation sequence specification
>> should be needed is when the programmer wants to use other than that
>> which is the default at compilation time, and then only for character
>> range cases.
>
> What is default at compilation time is almost certainly incorrect, at
> least for any program with even halfway decent localization.

Yes. Add this to the list of issues to be resolved prior to making a 
proposal to the standards committee.

>>> Also, that syntax change makes this a very specific core language
>>> feature; my example was designed to leverage two simpler, more
>>> general language features that have _already_ been proposed
>>> numerous times and have working examples in the wild.
>>
>> Range cases may indeed have been proposed numerous times – which
>> means that they've been deemed to offer insufficient benefit all
>> those numerous times.
>>
>> Paraphrasing: The definition of "insanity" may be proposing the same
>> change repeatedly, each time expecting a different response, while
>> always getting the same response.
>
> Fair enough, or perhaps nobody so far has provided sufficient
> justification for their inclusion.  I don't know the history, so I don't
> know exactly why past attempts failed.  If I were submitting a formal
> proposal, I'd do the necessary research and find a better approach.

First class! An excellent plan. :-)

-- 
Morris Dovey
http://www.iedu.com/Solar
Morris
5/27/2015 6:15:00 AM
On 26/05/15 17:19, Bartc wrote:
> On 26/05/2015 15:58, David Brown wrote:
>> On 26/05/15 15:59, Ben Bacarisse wrote:
> 
>>> B was developed on Honeywell 6000 machines (re-badged GE machines) which
>>> also make heavy use of octal.  Addresses are 18 bits and alphanumeric
>>> data are composed of either 6 or 9 bit bytes.  The opcodes are 9 bits
>>> and some 3-bit portions denote variations.  Octal was used almost
>>> exclusively.
>>>
>>
>> I think this suggests that octal and the octal notation were historical
>> baggage from before the start of C, which happened to be convenient for
>> *nix permission bits.
> 
> I'm not bothered whether there is octal notation or not, but the leading
> zero notation is just wrong.

Agreed.

> 
> Octal constants should be more explicitly denoted.

Yes - in a way that is hard to miss or get wrong either when writing
them, or when reading them.  (A leading "O", for example, would be hard
to get wrong in writing, but easy to misunderstand in reading.)

> 
> Then everyone can forget about them, and we have the freedom to write
> decimal numbers with leading zeros if we want without introducing subtle
> bugs.
> 

That would be nice - and a good plan for a non-C language.

But since I work with C, and it's not going to lose its badly designed
octal literals, I just have to hope that compilers such as gcc and clang
introduce optional warnings for them.


David
5/27/2015 6:45:20 AM
On 27/05/2015 03:09, Morris Dovey wrote:

>> Also, that syntax change makes this a very specific core language
>> feature; my example was designed to leverage two simpler, more general
>> language features that have _already_ been proposed numerous times and
>> have working examples in the wild.
>
> Range cases may indeed have been proposed numerous times – which means
> that they've been deemed to offer insufficient benefit all those
> numerous times.

Most languages I've created have included some sort of range feature 
(usually in the form of an inclusive range denoted by a..b, just like 
Pascal).

They are used by themselves (eg. as x in a..b), as switch-case values, 
to build sets (eg. [a..b, c, d..e]), for array bounds (eg. [a..b]int), 
to specify slices and substrings (eg. x[a..b]), or sometimes simply to 
combine two values into one (eg. return a..b), with obviously the 
ability to extract either limit.

And I use them everywhere (well, until I started using C).

But if the people responsible for C decide they are of insufficient 
benefit, then they must be right. After all, what do I know? (I will 
just continue using them to increase my own productivity and readability 
in my own way.)

-- 
Bartc
Bartc
5/27/2015 9:22:33 AM
On 05/27/2015 01:03 AM, Stephen Sprunk wrote:
> On 26-May-15 21:09, Morris Dovey wrote:
>> On 5/26/15 1:27 PM, Stephen Sprunk wrote:
>>> On 26-May-15 02:36, Morris Dovey wrote:
>>>> "All compile-time constants" could be pretty squirrely. How close
>>>> do floating point values need to be to a range bound to be
>>>> classified as 'in' the range?
>>>
>>> Hmm; I was assuming integers, but if you wanted to add floating
>>> point as well, which becomes a lot more useful with ranges, then I
>>> suppose you could define it in terms of the relative comparison
>>> operators.
>>
>> 'Scuse me, but I'm not the one wanting to add anything - I'm joining
>> in the discussion to explore the possibility that you might have a
>> useful idea. "All compile-time constants" seemed fairly unambiguous,
>> so I took it literally.
>>
>> case 0.1-1E-10..sqrt(2.)+1E-12: /* Like this? */
> 
> I just realized that makes jump tables impossible, so that probably
> isn't a good idea; I'd keep it limited to integers.  Good question.

Any decent compiler should decide whether or not to use a jump table
only after considering the values of the corresponding case labels, to
determine whether it's a reasonable strategy. This change would only add
additional cases where jump tables are not reasonable; it doesn't
prevent the use of jump tables when they are reasonable.
-- 
James Kuyper
James
5/27/2015 2:36:57 PM
On 27/05/15 16:36, James Kuyper wrote:
> On 05/27/2015 01:03 AM, Stephen Sprunk wrote:
>> On 26-May-15 21:09, Morris Dovey wrote:
>>> On 5/26/15 1:27 PM, Stephen Sprunk wrote:
>>>> On 26-May-15 02:36, Morris Dovey wrote:
>>>>> "All compile-time constants" could be pretty squirrely. How close
>>>>> do floating point values need to be to a range bound to be
>>>>> classified as 'in' the range?
>>>>
>>>> Hmm; I was assuming integers, but if you wanted to add floating
>>>> point as well, which becomes a lot more useful with ranges, then I
>>>> suppose you could define it in terms of the relative comparison
>>>> operators.
>>>
>>> 'Scuse me, but I'm not the one wanting to add anything - I'm joining
>>> in the discussion to explore the possibility that you might have a
>>> useful idea. "All compile-time constants" seemed fairly unambiguous,
>>> so I took it literally.
>>>
>>> case 0.1-1E-10..sqrt(2.)+1E-12: /* Like this? */
>>
>> I just realized that makes jump tables impossible, so that probably
>> isn't a good idea; I'd keep it limited to integers.  Good question.
> 
> Any decent compiler should decide whether or not to use a jump table
> only after considering the values of the corresponding case labels, to
> determine whether it's a reasonable strategy. This change would only add
> additional cases where jump tables are not reasonable; it doesn't
> prevent the use of jump tables when they are reasonable.
> 

Indeed - no (decent!) compiler will generate a jump table for switch
labels 1000000, 2000000, 3000000, which are perfectly legal to use in
current C.

I might object to allowing floating point constants as cases unless you
are using ranges, because it is generally a bad idea to test floating
point values for equality.


David
5/27/2015 2:54:26 PM
On 27/05/2015 15:54, David Brown wrote:

> I might object to allowing floating point constants as cases unless you
> are using ranges, because it is generally a bad idea to test floating
> point values for equality.

If == is allowed between floating point values, then you can't really 
object to comparing floats anywhere else.

The switch might simply be used as a more organised way of doing lots of 
ifs and else-ifs.

(And there are a huge number of values for which floating point equality 
can be tested without problem, including all the 4 billion values of a 
32-bit int.)

-- 
Bartc


Bartc
5/27/2015 3:08:05 PM
On 5/27/15 4:22 AM, Bartc wrote:
> On 27/05/2015 03:09, Morris Dovey wrote:
>
>>> Also, that syntax change makes this a very specific core language
>>> feature; my example was designed to leverage two simpler, more general
>>> language features that have _already_ been proposed numerous times and
>>> have working examples in the wild.
>>
>> Range cases may indeed have been proposed numerous times – which means
>> that they've been deemed to offer insufficient benefit all those
>> numerous times.
>
> Most languages I've created have included some sort of range feature
> (usually in the form of an inclusive range denoted by a..b, just like
> Pascal).
>
> They are used by themselves (eg. as x in a..b), as switch-case values,
> to build sets (eg. [a..b, c, d..e]), for array bounds (eg. [a..b]int),
> to specify slices and substrings (eg. x[a..b]), or sometimes simply to
> combine two values into one (eg. return a..b), with obviously the
> ability to extract either limit.
>
> And I use them everywhere (well, until I started using C).
>
> But if the people responsible for C decide they are of insufficient
> benefit, then they must be right. After all, what do I know? (I will
> just continue using them to increase my own productivity and readability
> in my own way.)

There would seem to be a lot of interesting possibilities...

One that I've wondered about from time to time is

    switch (x)
    {  case { expression }:
          /* case code */
       /* other cases */
    }

where <expression> is code that examines x and yields TRUE (execute case 
code) or FALSE (do not execute case code) – of which an interesting 
subset/derivative possibility might be:

    switch (x)
    {  case { function(x) }:
          /* case code */
       /* other cases */
    }

which causes the named function to be invoked to produce the TRUE/FALSE 
result.

C actually does have a (primitive) range-testing feature:

    if (x >= lower_bound && x <= upper_bound) /* x in range! */;

:-)

-- 
Morris Dovey
http://www.iedu.com/Solar
Morris
5/27/2015 3:32:05 PM
On 27/05/15 16:32, Morris Dovey wrote:

<snip>

> C actually does have a (primitive) range-testing feature:
>
>      if (x >= lower_bound && x <= upper_bound) /* x in range! */;

If you don't like &&, you can avoid it as follows:

   if(abs(2 * x - (upper + lower)) <= upper - lower) /* x in range! */

That is, if the (absolute) distance of x from the midpoint is no more 
than half the range, then x is within range (note the <=, which keeps 
both endpoints in range, matching the && version). Multiplying by 2 
throughout removes the (potentially expensive) divisions.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
Richard
5/27/2015 4:08:25 PM
On 05/27/2015 02:15 AM, Morris Dovey wrote:
....
> Thanks. I'm not sure that it wouldn't be useful, but am sure that FP 
> could offer some challenges. It might look 'friendlier' if I wrote
> 
>     case 0.1-EPSILON..sqrt(2)+EPSILON:
> 
> or it might even be possible for the compiler to automagically provide a 
> target-dependent EPSILON so we could get away with writing

Were such a change made to the language, the following code would more
accurately reflect what I would want to do:

     case nextafter(0.1,0.0)..nextafter(sqrt(2.0), 2.0):

In the more general case, I would use either -DBL_MAX or DBL_MAX as the
second argument to the nextafter() calls.
James
5/27/2015 5:34:57 PM
On 27/05/2015 18:18, Les Cargill wrote:
> Bartc wrote:

>> (And there are a huge number of values for which floating point equality
>> can be tested without problem, including all the 4 billion values of a
>> 32-bit int.)
>>
>
> That is so implementation dependent than if I were part of
> the governance for the language, I'd suggest avoiding it.

Which implementations would return false for 1.0 == 1? (or 1.0 == 1.0?)

How about this:

double a,b=1.0;

a = b;

Are there any implementations where a == b is false? (Assume b contains 
a valid value.)

-- 
Bartc
Bartc
5/27/2015 5:39:16 PM
Morris Dovey <mrdovey@iedu.com> writes:
[...]
> One that I've wondered about from time to time is
>
>    switch (x)
>    {  case { expression }:
>          /* case code */
>       /* other cases */
>    }
>
> where <expression> is code that examines x and yields TRUE (execute
> case code) or FALSE (do not execute case code) – of which an
> interesting subset/derivative possibility might be:
>
>    switch (x)
>    {  case { function(x) }:
>          /* case code */
>       /* other cases */
>    }
>
> which causes the named function to be invoked to produce the
> TRUE/FALSE result.

What's the point of specifying x in `switch (x)`?  Can the expression
refer to things other than x?  If it can't, it seems like an odd
restriction.  If it can, it's just an if/else in disguise.

> C actually does have a (primitive) range-testing feature:
>
>    if (x >= lower_bound && x <= upper_bound) /* x in range! */;
>
> :-)

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
5/27/2015 5:40:38 PM
On 27/05/2015 16:32, Morris Dovey wrote:
> On 5/27/15 4:22 AM, Bartc wrote:

>>
>> But if the people responsible for C decide they are of insufficient
>> benefit, then they must be right. After all, what do I know?

> There would seem to be a lot of interesting possibilities...
>
> One that I've wondered about from time to time is
>
>     switch (x)
>     {  case { expression }:
>           /* case code */
>        /* other cases */
>     }
>
> where <expression> is code that examines x and yields TRUE (execute case
> code) or FALSE (do not execute case code) – of which an interesting
> subset/derivative possibility might be:
>
>     switch (x)
>     {  case { function(x) }:
>           /* case code */
>        /* other cases */
>     }
>
> which causes the named function to be invoked to produce the TRUE/FALSE
> result.

Unless I've missed something, those are equivalent to:

  if (expression) {
    ....

and:

  if (function(x)) {
    ....

so there is a less compelling reason for such a feature (ie. runtime 
case expressions, although you've confused it a little by requiring a 
match with TRUE rather than the switch expression).

With a more elaborate example, I have a syntax which would look like:

  case
  when expr1, expr2 then
    ....
  when expr3, expr4, expr5 then
   ....
  else
   ....
  end

which is tidy, but is still just another way of writing:

  if (expr1 || expr2) {
     ....
  } else if (expr3 || expr4 || expr5) {
     ....
  } else {
     ....
  }


> C actually does have a (primitive) range-testing feature:
>
>     if (x >= lower_bound && x <= upper_bound) /* x in range! */;

Here, on the other hand, a proper treatment of ranges would have real 
advantages. (You can do certain things with macros, but they would look 
terrible. Some things simply need to be built-in.)

-- 
Bartc
Bartc
5/27/2015 5:55:51 PM
On 27-May-15 01:15, Morris Dovey wrote:
> On 5/27/15 12:03 AM, Stephen Sprunk wrote:
>> On 26-May-15 21:09, Morris Dovey wrote:
>>>> Huh?  My example was all integer comparisons.
>>> 
>>> Err - yesterday you wrote "Why not just use strchr() as in my 
>>> example?"
>> 
>> My example used strchr() to find the collation ordinal of a given 
>> character, and then compared those; it never compared pointers.
> 
> Yabbut what you get from strchr() is a pointer (to char)...

D'oh!  That's what I get for not testing my code.  Notice that I had
"return strchr(...);" in a function returning int.

That should have been:

static const char seq[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
return strchr(seq, c) - seq;

It's still lacking any error-handling code; that's left as an exercise
for the reader, as is making the sequence caller-supplied.

>>>>> One of the reasons I picked Greek for my example, by the way,
>>>>> is that a range of 'A'..'Z' only includes the first six
>>>>> letters of the Greek alphabet.
>>>> 
>>>> Does it?  I can't tell if that's U+0391 .. U+0396 or U+0041 .. 
>>>> U+005A.
>>> 
>>> You shouldn’t need to. "ΑΒΓΔΕΖ" should be sufficient.
>> 
>> Then we're back to specifying the entire collation sequence rather
>> than just a range.
> 
> No - only enough of the sequence to determine whether the switch
> value is within the specified sequence! Consider:
> 
> switch (x)
> {  case 'A'..'Y' using ("AEIOUY"):

That's still the entire collation sequence, just a rather short one.

>       /* case code for processing a vowel */
>       break;
>    /* other cases */
> }

I still don't see why you want your "using" clause on every case, which
obviously opens up the opportunity of having multiple cases match,
rather than putting it on the switch.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
Stephen
5/27/2015 7:07:28 PM
Stephen Sprunk <stephen@sprunk.org> wrote:
(snip) 
> D'oh!  That's what I get for not testing my code.  Notice that I had
> "return strchr(...);" in a function returning int.
 
> That should have been:
 
> static const char seq[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
> return strchr(seq, c) - seq;

Reminds me of my first program using strstr.  

Being used to PL/I, where INDEX return an integer, I had:

j=strstr(s,t)-s;

then later, when I needed to test it:

if(j==NULL-s) ...
 
> It's still lacking any error-handling code; that's left as an exercise
> for the reader, as is making the sequence caller-supplied.

especially when strchr() returns null.

-- glen
glen
5/27/2015 7:53:10 PM
Keith Thompson <kst-u@mib.org> wrote:
> Morris Dovey <mrdovey@iedu.com> writes:
> [...]
>> One that I've wondered about from time to time is

>>    switch (x)
>>    {  case { expression }:
>>          /* case code */
>>       /* other cases */
>>    }

(snip)
>> which causes the named function to be invoked to produce the
>> TRUE/FALSE result.
 
> What's the point of specifying x in `switch (x)`?  

I presume x should be true (1).

> Can the expression
> refer to things other than x?  If it can't, it seems like an odd
> restriction.  If it can, it's just an if/else in disguise.

The Verilog switch/case allows expressions in the case, though most
often they are constant. It evaluates them in order, until the first
one that matches the switch argument. It would be unusual to put
a constant on the switch() and non-constant expressions on the case,
but it is legal.

-- glen
glen
5/27/2015 7:56:47 PM
On 27/05/15 19:39, Bartc wrote:
> On 27/05/2015 18:18, Les Cargill wrote:
>> Bartc wrote:
>
>>> (And there are a huge number of values for which floating point equality
>>> can be tested without problem, including all the 4 billion values of a
>>> 32-bit int.)
>>>
>>
>> That is so implementation dependent than if I were part of
>> the governance for the language, I'd suggest avoiding it.
>
> Which implementations would return false for 1.0 == 1? (or 1.0 == 1.0?)
>
> How about this:
>
> double a,b=1.0;
>
> a = b;
>
> Are there any implementations where a == b is false? (Assume b contains
> a valid value.)
>

I believe it is perfectly possible for the comparison to be false 
(perhaps not with 1.0, since that has a nice easy representation in IEEE 
formats).  In particular, compilers on x86 systems might use 80-bit 
registers to store values temporarily, even if they are 32-bit or 64-bit 
in memory - and equality comparisons may fail.

It is also possible, I think, for some types of NaN (which may count as 
"valid value", depending on your definitions) to always fail 
comparisons.  In such cases, neither "a == b" nor "a != b" will be true.

I cannot say for sure what the standards allow, and I haven't seen such 
problems personally, but I have heard enough war stories from real code 
to know that one should never compare floating point values for equality.

David
5/27/2015 8:04:19 PM
Stephen Sprunk <stephen@sprunk.org> writes:

> On 27-May-15 01:15, Morris Dovey wrote:
>> On 5/27/15 12:03 AM, Stephen Sprunk wrote:
<snip>
>>> My example used strchr() to find the collation ordinal of a given 
>>> character, and then compared those;
<snip>
> static const char seq[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
> return strchr(seq, c) - seq;
>
> It's still lacking any error-handling code; that's left as an exercise
> for the reader, as is making the sequence caller-supplied.

You might consider using

  strcspn(seq, (char []){c, 0})

for that.  You get some kind of error handling free, in that all
characters not in the collating sequence are given the same high (yet
valid) index.

The strspn and strcspn functions are much overlooked.  For example,
membership of the collating sequence can be detected with

  strspn((char []){c, 0}, seq)

This might be what's needed for some of these switch examples (sorry,
I've lost track of the actual point of the discussion!).

<snip>
-- 
Ben.
Ben
5/27/2015 8:21:07 PM
On 5/27/15 12:55 PM, Bartc wrote:
> On 27/05/2015 16:32, Morris Dovey wrote:
>> On 5/27/15 4:22 AM, Bartc wrote:
>
>>>
>>> But if the people responsible for C decide they are of insufficient
>>> benefit, then they must be right. After all, what do I know?
>
>> There would seem to be a lot of interesting possibilities...
>>
>> One that I've wondered about from time to time is
>>
>>     switch (x)
>>     {  case { expression }:
>>           /* case code */
>>        /* other cases */
>>     }
>>
>> where <expression> is code that examines x and yields TRUE (execute case
>> code) or FALSE (do not execute case code) – of which an interesting
>> subset/derivative possibility might be:
>>
>>     switch (x)
>>     {  case { function(x) }:
>>           /* case code */
>>        /* other cases */
>>     }
>>
>> which causes the named function to be invoked to produce the TRUE/FALSE
>> result.
>
> Unless I've missed something, those are equivalent to:
>
>   if (expression) {
>     ....
>
> and:
>
>   if (function(x)) {
>     ....

I agree - although there's a need to determine whether this case test 
can be bypassed and its code executed without qualification because a 
previous case qualified, executed, and neither it nor any of the 
intervening cases contained a break statement.

> so there is a less compelling reason for such a feature (ie. runtime
> case expressions, although you've confused it a little by requiring a
> match with TRUE rather then the switch expression).
>
> With a more elaborate example, I have a syntax which would look like:
>
>   case
>   when expr1, expr2 then
>     ....
>   when expr3, expr4, expr5 then
>    ....
>   else
>    ....
>   end

I think I could like that were it not for the fact that it would break a 
lot of existing code.
>
> which is tidy, but is still just another way of writing:
>
>   if (expr1 || expr2) {
>      ....
>   } else if (expr3 || expr4 || expr5) {
>      ....
>   } else {
>      ....
>   }
>
>
>> C actually does have a (primitive) range-testing feature:
>>
>>     if (x >= lower_bound && x <= upper_bound) /* x in range! */;
>
> Here, on the other hand, a proper treatment of ranges would have real
> advantages. (You can do certain things with macros, but they would look
> terrible. Some things simply need to be built-in.)

Agreed.

-- 
Morris Dovey
http://www.iedu.com/Solar
Morris
5/27/2015 8:31:21 PM
On 27/05/2015 21:04, David Brown wrote:
> On 27/05/15 19:39, Bartc wrote:
>> On 27/05/2015 18:18, Les Cargill wrote:
>>> Bartc wrote:

>> Which implementations would return false for 1.0 == 1? (or 1.0 == 1.0?)

> I believe it is perfectly possible for the comparison to be false
> (perhaps not with 1.0, since that has a nice easy representation in IEEE
> formats).  In particular, compilers on x86 systems might use 80-bit
> registers to store values temporarily, even if they are 32-bit or 64-bit
> in memory - and equality comparisons may fail.
>
> It is also possible, I think, for some types of NaN (which may count as
> "valid value", depending on your definitions) to always fail
> comparisons.  In such cases, neither "a == b" nor "a != b" will be true.

What about a <= b, a < b, a > b or a >= b when either operand is a NaN?

> I cannot say for sure what the standards allow, and I haven't seen such
> problems personally, but I have heard enough war stories from real code
> to know that one should never compare floating point values for equality.

From badly designed code, no doubt. It should be fine if you know what 
you're doing.

But since the language doesn't actually ban testing for floating point 
equality, I can't see that being a reason for not allowing it anywhere else.

(Looking at my own code, there are plenty of comparisons against 0.0, 
most likely of values which might have been specifically set to zero 
rather than the approximate result of a calculation, when some tolerance 
would need to be taken into account.)

-- 
Bartc



Bartc
5/27/2015 8:38:01 PM
glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:
> Keith Thompson <kst-u@mib.org> wrote:
>> Morris Dovey <mrdovey@iedu.com> writes:
>> [...]
>>> One that I've wondered about from time to time is
>
>>>    switch (x)
>>>    {  case { expression }:
>>>          /* case code */
>>>       /* other cases */
>>>    }
>
> (snip)
>>> which causes the named function to be invoked to produce the
>>> TRUE/FALSE result.
>  
>> What's the point of specifying x in `switch (x)`?  
>
> I presume x should be true (1).

That seems unlikely.  Given the proposed syntax:

    switch (x) {
        case { expr1 }: ...;
        case { expr2 }: ...;
        case { expr3 }: ...;
        default:        ...;
    }

I presume it would branch to the first case whose expression is true.

What I don't understand from Morris's description is the relationship
between x and the expressions.  If x itself is expected to be true,
then there seems little point in specifying it.  (I'm probably
making too big a deal over a minor oversight.)

You could have an enhanced form of switch statement where the switch
expression is somehow substituted into each of the case expressions.
For example:

    int sign(int x) {
        switch (x) {
            case <  0: return -1;
            case == 0: return  0;
            case >  0: return +1;
        }
    }

But defining the syntax and how it's evaluated would be non-trivial, and
I'm not convinced it's that much better than an equivalent if/else chain
that refers to x explicitly in each expression.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
5/27/2015 8:46:34 PM
On 5/27/15 12:40 PM, Keith Thompson wrote:

> What's the point of specifying x in `switch (x)`?  Can the expression
> refer to things other than x?  If it can't, it seems like an odd
> restriction.  If it can, it's just an if/else in disguise.

Because it's needed for the kind of case testing already in use.

I think one of the goals should be to extend the existing syntax and 
capabilities without affecting any existing code in any way.

Keith, I confess that I've never thought of 'switch' as anything other 
than a string of 'if' statements with (hidden) 'goto' statements.

-- 
Morris Dovey
http://www.iedu.com/Solar
Morris
5/27/2015 8:47:59 PM
David Brown <david.brown@hesbynett.no> writes:
> On 27/05/15 19:39, Bartc wrote:
[...]
>> double a,b=1.0;
>>
>> a = b;
>>
>> Are there any implementations where a == b is false? (Assume b contains
>> a valid value.)
>>
>
> I believe it is perfectly possible for the comparison to be false
> (perhaps not with 1.0, since that has a nice easy representation in
> IEEE formats).  In particular, compilers on x86 systems might use
> 80-bit registers to store values temporarily, even if they are 32-bit
> or 64-bit in memory - and equality comparisons may fail.

C doesn't require IEEE formats or semantics.

Any floating-point type that meets the requirements of the
floating-point model in N1570 5.2.4.2.2 can represent 1.0 exactly
-- but that doesn't necessarily mean that 1.0 == 1.0 will evaluate
to 1.  An implementation where 1.0 != 1.0 might be conforming,
but it would IMHO be badly broken.

> It is also possible, I think, for some types of NaN (which may count
> as "valid value", depending on your definitions) to always fail
> comparisons.  In such cases, neither "a == b" nor "a != b" will be
> true.

If a and b are both NaNs, then `a == b` is false and `a != b` is true
(assuming IEEE semantics).

[...]

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
0
Keith
5/27/2015 8:51:22 PM
Morris Dovey <mrdovey@iedu.com> writes:
> On 5/27/15 12:40 PM, Keith Thompson wrote:
>> What's the point of specifying x in `switch (x)`?  Can the expression
>> refer to things other than x?  If it can't, it seems like an odd
>> restriction.  If it can, it's just an if/else in disguise.
>
> Because it's needed for the kind of case testing already in use.

I *think* I see what you're suggesting.

Correct me if I'm mistaken.  You're suggesting a new kind of case label,
with the syntax

    case { expression } :

where the surrounding switch statement branches to it if the expression
evaluates to a true value.

You'd be able to mix old-style and new-style case labels within the same
switch statement, right?  So a sign function might be written like this:

    int sign(int x) {
        switch (x) {
            case { x < 0 }: return -1;
            case 0:         return  0;
            case { x > 0 }: return +1;
        }
    }

A new-style case label would evaluate its expression and would not have
to refer to the expression in the switch at all (unless you want to
impose that restriction, but it seems arbitrary).

> I think one of the goals should be to extend the existing syntax and
> capabilities without affecting any existing code in any way.
>
> Keith, I confess that I've never thought of 'switch' as anything other
> than a string of 'if' statements with (hidden) 'goto' statements.

It can be implemented that way, but the current restrictions are
specifically intended to make it implementable as a jump table.

For example, specifying the same value in two case labels is a
compile-time constraint violation.  That rule would have to be modified for
switch statements containing new-style case labels.  Perhaps all the
old-style case labels would have to be distinct, and any new-style case
labels are evaluated only if none of the old-style labels match?  It
seems overly complicated.

My sign() function could be written in current C as follows:

    int sign(int x) {
        switch (x) {
            case 0:
                return 0;
            default:
                if      (x < 0) return -1;
                else if (x > 0) return +1;
        }
    }

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
0
Keith
5/27/2015 9:08:42 PM
On 27/05/15 21:47, Morris Dovey wrote:

<snip>

> Keith, I confess that I've never thought of 'switch' as anything other
> than a string of 'if' statements with (hidden) 'goto' statements.


As is usual for me, I prefer to look at things more abstractly. As soon 
as I encountered the word "switch", I thought of a switch (odd, I know, 
but there you are). Something that turns something else on or off. 
Having learned what it actually does, though, I realised that it was 
more like a railway switch (in the UK, we call them "points"), but not 
/precisely/ like a railway switch, because a railway switch gives you 
two possibilities (if/else), whereas switch(){} gives you several.

Railways often make for good programming analogies, though, so I spent a 
little time thinking about it, and the best I could come up with was a 
turntable. You get on, you spin the handle a few times depending on your 
condition, and then you disembark and carry merrily on your way.

What I can't work into the railway analogy is "fall-through", which 
seems a very odd thing to me. If we must have it, I'd have preferred to 
have it explicitly coded in (thus eliminating the need for break). For 
example, fallthrough BAZ; meaning "jump to case BAZ". But I'd prefer to 
do without it completely.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
5/27/2015 9:11:34 PM
Keith Thompson <kst-u@mib.org> wrote:

(snip)
>>>>    switch (x)
>>>>    {  case { expression }:
>>>>          /* case code */
>>>>       /* other cases */
>>>>    }

>> (snip)
>>>> which causes the named function to be invoked to produce the
>>>> TRUE/FALSE result.
  
>>> What's the point of specifying x in `switch (x)`?  

(snip)
(I wrote)

>> I presume x should be true (1).
 
> That seems unlikely.  Given the proposed syntax:
 
>    switch (x) {
>        case { expr1 }: ...;
>        case { expr2 }: ...;
>        case { expr3 }: ...;
>        default:        ...;
>    }
 
> I presume it would branch to the first case whose expression is true.

I was assuming it would take the first one equal to x, and so x
should be true (1, with the usual caveats for true and 1).  

You could:

switch (3) {
   case {i+j}: printf("i+j==3");
   case {i-j}: printf("i-j==3");
   case {i*j}: printf("i*j==3");
   case {i/j}: printf("i/j==3");
   }

(as I noted earlier, you can do this in Verilog).

> What I don't understand from Morris's description is the relationship
> between x and the expressions.  If x itself is expected to be true,
> then there seems little point in specifying it.  (I'm probably
> making too big a deal over a minor oversight.)

Well, x could be false, or 3, or any other value.
 
> You could have an enhanced form of switch statement where the switch
> expression is somehow substituted into each of the case expressions.
> For example:
> 
>    int sign(int x) {
>        switch (x) {
>            case <  0: return -1;
>            case == 0: return  0;
>            case >  0: return +1;
>        }
>    }
 
> But defining the syntax and how it's evaluated would be non-trivial, and
> I'm not convinced it's that much better than an equivalent if/else chain
> that refers to x explicitly in each expression.

That is another way to do it.

-- glen

0
glen
5/27/2015 9:16:27 PM
On 27/05/2015 21:47, Morris Dovey wrote:
> On 5/27/15 12:40 PM, Keith Thompson wrote:
>
>> What's the point of specifying x in `switch (x)`?  Can the expression
>> refer to things other than x?  If it can't, it seems like an odd
>> restriction.  If it can, it's just an if/else in disguise.
>
> Because it's needed for the kind of case testing already in use.
>
> I think one of the goals should be to extend the existing syntax and
> capabilities without affecting any existing code in any way.
>
> Keith, I confess that I've never thought of 'switch' as anything other
> than a string of 'if' statements with (hidden) 'goto' statements.

(I've always implemented 'switch' specifically for jump tables, which 
means various restrictions on what is allowed. For anything else, or 
where a jump table is not practical, then there is a separate kind of 
statement.

I've recently found however that switch statements with a small number 
of cases (<8 in one program) are more efficient when implemented as a 
sequential set of tests (for x64 at least). So small switches are 
silently converted to the other form which does test sequentially.

But with either kind of statement (C has just the one), the pattern is 
clear to me: you are testing the same expression against a range of 
possibilities. Rather different from the arbitrary testing you can do in 
an if-else chain.)

-- 
Bartc

0
Bartc
5/27/2015 9:22:03 PM
On Wed, 27 May 2015 21:38:01 +0100, Bartc <bc@freeuk.com> wrote:

>On 27/05/2015 21:04, David Brown wrote:
>> On 27/05/15 19:39, Bartc wrote:
>>> On 27/05/2015 18:18, Les Cargill wrote:
>>>> Bartc wrote:
>
>>> Which implementations would return false for 1.0 == 1? (or 1.0 == 1.0?)
>
>> I believe it is perfectly possible for the comparison to be false
>> (perhaps not with 1.0, since that has a nice easy representation in IEEE
>> formats).  In particular, compilers on x86 systems might use 80-bit
>> registers to store values temporarily, even if they are 32-bit or 64-bit
>> in memory - and equality comparisons may fail.
>>
>> It is also possible, I think, for some types of NaN (which may count as
>> "valid value", depending on your definitions) to always fail
>> comparisons.  In such cases, neither "a == b" nor "a != b" will be true.
>
>What about a <= b, a < b, a > b or a >= b when either operand is a NaN?


All false.  And NaNs don't compare to themselves either:  

  double a=NaN;
  if (a==a)
     Never.
  if (a!=a)
     Not this either.
  if (a>a)
     Uh uh.
  if (a<=a)
     Nope.
0
Robert
5/28/2015 12:54:57 AM
On Wed, 27 May 2015 19:54:57 -0500, Robert Wessel
<robertwessel2@yahoo.com> wrote:

>On Wed, 27 May 2015 21:38:01 +0100, Bartc <bc@freeuk.com> wrote:
>
>>On 27/05/2015 21:04, David Brown wrote:
>>> On 27/05/15 19:39, Bartc wrote:
>>>> On 27/05/2015 18:18, Les Cargill wrote:
>>>>> Bartc wrote:
>>
>>>> Which implementations would return false for 1.0 == 1? (or 1.0 == 1.0?)
>>
>>> I believe it is perfectly possible for the comparison to be false
>>> (perhaps not with 1.0, since that has a nice easy representation in IEEE
>>> formats).  In particular, compilers on x86 systems might use 80-bit
>>> registers to store values temporarily, even if they are 32-bit or 64-bit
>>> in memory - and equality comparisons may fail.
>>>
>>> It is also possible, I think, for some types of NaN (which may count as
>>> "valid value", depending on your definitions) to always fail
>>> comparisons.  In such cases, neither "a == b" nor "a != b" will be true.
>>
>>What about a <= b, a < b, a > b or a >= b when either operand is a NaN?
>
>
>All false.  And NaNs don't compare to themselves either:  
>
>  double a=NaN;
>  if (a==a)
>     Never.
>  if (a!=a)
>     Not this either.
>  if (a>a)
>     Uh uh.
>  if (a<=a)
>     Nope.


My mistake, the inequality test above would return *true*.  All the
others return false.
0
Robert
5/28/2015 1:01:01 AM
On 28/05/15 14:49, Thomas Jahns wrote:
> On 05/27/15 18:08, Richard Heathfield wrote:
>>> C actually does have a (primitive) range-testing feature:
>>>
>>>      if (x >= lower_bound && x <= upper_bound) /* x in range! */;
>>
>> If you don't like &&, you can avoid it as follows:
>>
>>    if(abs(2 * x - (upper + lower)) < upper - lower) /* x in range! */
>>
>> That is, if the (absolute) distance of x from the midpoint is lower than half
>> the range, then x is within range. Multiplying by 2 throughout removes the
>> (potentially expensive) divisions.
>
> Since you didn't write it*: note that this method introduces a number of places
> where precision might get lost, namely x might be close to HUGE_VAL, so that
> multiplying by 2 or subtracting (upper + lower) might push the expression over
> the edge into INFINITY resulting in an incorrect predicate.
>
> Of course the code is likely to be valid for any well-behaved data/bound
> combination.
>
> Thomas
>
> * I assume Richard is well aware of the caveats and just forgot to mention them.

Actually, I was well aware of the caveats and assumed that anyone for 
whom they would be even remotely relevant would also be aware of them. 
The advantage of doing this was that I could post a short article. The 
disadvantage was the risk that someone like you would nit-pick it. 
(Please note: I actually approve of nit-picking, so this is not a 
criticism!)

In any article, one always has to decide where to draw the line - if one 
tries to explain every detail, the article becomes unwieldy in its 
length and indeed in its comprehensibility. It also takes far too long 
to write. So it is necessary to make the cut at some point, and there 
will always be someone for whom that cut was made in the wrong place. 
(Which is fair enough, of course. This isn't a complaint, merely an 
observation.)

I should perhaps take this opportunity to add that it was a bit daft of 
me to say that a couple of divisions by two are potentially expensive. 
They're actually very cheap (although not /quite/ as cheap as a single 
multiplication by two). I still prefer the multiplied-out method though, 
because (in my view) it is easier to read.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
5/28/2015 1:01:01 AM
On Wed, 27 May 2015 22:11:34 +0100, Richard Heathfield
<rjh@cpax.org.uk> wrote:

>On 27/05/15 21:47, Morris Dovey wrote:
>
><snip>
>
>> Keith, I confess that I've never thought of 'switch' as anything other
>> than a string of 'if' statements with (hidden) 'goto' statements.
>
>
>As is usual for me, I prefer to look at things more abstractly. As soon 
>as I encountered the word "switch", I thought of a switch (odd, I know, 
>but there you are). Something that turns something else on or off. 
>Having learned what it actually does, though, I realised that it was 
>more like a railway switch (in the UK, we call them "points"), but not 
>/precisely/ like a railway switch, because a railway switch gives you 
>two possibilities (if/else), whereas switch(){} gives you several.


Except when there are three choices:

https://en.wikipedia.org/wiki/File:ThreeWayStub.jpg

I don't see why larger (more than three-way) switches would not be
possible as well, although I'm not sure they'd have much advantage
over a set of smaller switches.  Likely they'd be a bit shorter, but
would have more severe speed limits.
0
Robert
5/28/2015 1:10:59 AM
On 28/05/15 02:10, Robert Wessel wrote:
> On Wed, 27 May 2015 22:11:34 +0100, Richard Heathfield
> <rjh@cpax.org.uk> wrote:
>
>>Having learned what it actually does, though, I realised that it was
>>more like a railway switch (in the UK, we call them "points"), but not
>>/precisely/ like a railway switch, because a railway switch gives you
>>two possibilities (if/else), whereas switch(){} gives you several.
>
>
> Except when there are three choices:

That's the trouble with knowledge. When a fact is so obvious, so 
blindingly obvious, that it is completely and utterly beyond dispute, 
and there is no need whatsoever to even think about it, let alone check 
it... that's precisely when some pedant[1] comes along and proves that 
the fact is completely wrong.

Just as well, really - turntables make me dizzy.

[1] Just for the record, I consider the word "pedant" to be a compliment.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within
0
Richard
5/28/2015 1:21:50 AM
Just for grins, I wrote a tiny program to exercise the logic I imagined 
might be used in a switch statement with both range and 'normal' case 
statements. Notice that I used strchr() to do the range test. :-)

I've commented out the switch C code and shown unoptimized replacement 
code on the next line. I think this approach would allow nesting switch 
statements, but haven't tested. I think this should be enough to provide 
a start for anyone who might like to play...

~/Desktop: nt 3 strange.c
#include <stdio.h>
#include <string.h>

int main(void)
{  char *data = "123ABCDEFGHIJKLMN", *p = data, x;

    while (x = *p++)
    {  /* switch (x) { */
       do { int test = 1;

          /* case 'A'..'F' in "ABCDEF": */
          if (test && (test = !strchr("ABCDEF",x))) goto L2;
             printf("1st case: %c\n",x);
             break;

          /* case 'G'..'L' in "GHIJKL": */
    L2:   if (test && (test = !strchr("GHIJKL",x))) goto L3;
             printf("2nd case: %c\n",x);
             break;

          /* case 'M': */
    L3:   if (test && (test = (x != 'M'))) goto L4;
             printf("3rd case: %c\n",x);
             break;

          /* default: */
    L4:      printf("4th case: %c\n",x);
       /* } */
       } while (0);
    }
    return 0;
}
~/Desktop: gcc strange.c -o strange
~/Desktop: strange
4th case: 1
4th case: 2
4th case: 3
1st case: A
1st case: B
1st case: C
1st case: D
1st case: E
1st case: F
2nd case: G
2nd case: H
2nd case: I
2nd case: J
2nd case: K
2nd case: L
3rd case: M
4th case: N
~/Desktop:

-- 
Morris Dovey
http://www.iedu.com/Solar
0
Morris
5/28/2015 3:20:21 AM
Richard Heathfield <rjh@cpax.org.uk> wrote:
> On 27/05/15 21:47, Morris Dovey wrote:
 
> <snip>
 
>> Keith, I confess that I've never thought of 'switch' as anything other
>> than a string of 'if' statements with (hidden) 'goto' statements.
 
> As is usual for me, I prefer to look at things more abstractly. As soon 
> as I encountered the word "switch", I thought of a switch (odd, I know, 
> but there you are). Something that turns something else on or off. 
> Having learned what it actually does, though, I realised that it was 
> more like a railway switch (in the UK, we call them "points"), but not 
> /precisely/ like a railway switch, because a railway switch gives you 
> two possibilities (if/else), whereas switch(){} gives you several.

It is usual for railway switchyards to put in a series of switches to
allow trains down any of the needed tracks. That is often in the form
of a binary tree, which is also a possible expansion of C switch.

But I learned Fortran and computed GOTO before C existed, and also
knew the generated assembly code for that, which is usually a jump
table. 
 
> Railways often make for good programming analogies, though, so I spent a 
> little time thinking about it, and the best I could come up with was a 
> turntable. You get on, you spin the handle a few times depending on your 
> condition, and then you disembark and carry merrily on your way.

Reminds me that some Fortran books, I believe from compiler vendors,
suggest using arithmetic IF in place of three way computed GOTO.

The original Fortran IF statement has the form:

      IF(expression) stmt1, stmt2, stmt3


where the three stmts are statement numbers (labels), chosen as
expression is less than, equal to, or greater than zero, respectively.

So,

      IF(I-2) stmt1, stmt2, stmt3

has a similar function to a three way computed GOTO.

(In the Fortran 66 standard, there is no default fall through.
It is undefined if the selection variable is out of range, possibly with
a branch to whatever happens to be in that position.  Fixed in 1977.)
 
> What I can't work into the railway analogy is "fall-through", which 
> seems a very odd thing to me. If we must have it, I'd have preferred to 
> have it explicitly coded in (thus eliminating the need for break). For 
> example, fallthrough BAZ; meaning "jump to case BAZ". But I'd prefer to 
> do without it completely.

Well, you do want fall through in the case of multiple case labels for
the same statement.  Fall through also happens naturally with computed GOTO.

-- glen



0
glen
5/28/2015 3:21:27 AM
On Wednesday, May 27, 2015 at 3:21:15 PM UTC-5, Ben Bacarisse wrote:
> Stephen Sprunk <stephen@sprunk.org> writes:
> 
> > On 27-May-15 01:15, Morris Dovey wrote:
> >> On 5/27/15 12:03 AM, Stephen Sprunk wrote:
> <snip>
> >>> My example used strchr() to find the collation ordinal of a given 
> >>> character, and then compared those;
> <snip>
> > static const char seq[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
> > return strchr(seq, c) - seq;
> >
> > It's still lacking any error-handling code; that's left as an exercise
> > for the reader, as is making the sequence caller-supplied.
> 
> You might consider using
> 
>   strcspn(seq, (char []){c, 0})
> 
> for that.  You get some kind of error handling free, in that all
> characters not in the collating sequence are given the same high (yet
> valid) index.
> 
> The strspn and strcspn functions are much overlooked.  For example,
> membership of the collating sequence can be detected with
> 
>   strspn((char []){c, 0}, seq)
> 
> This might be what's needed for some of these switch examples (sorry,
> I've lost track of the actual point of the discussion!).
> 
> <snip>

That is awesome! I love you, dude!

-- 
+500 bounty
0
luser
5/28/2015 4:45:08 AM
On Wednesday, May 27, 2015 at 4:11:42 PM UTC-5, Richard Heathfield wrote:
> On 27/05/15 21:47, Morris Dovey wrote:
> 
> <snip>
> 
> > Keith, I confess that I've never thought of 'switch' as anything other
> > than a string of 'if' statements with (hidden) 'goto' statements.
> 
> 
> As is usual for me, I prefer to look at things more abstractly. As soon 
> as I encountered the word "switch", I thought of a switch (odd, I know, 
> but there you are). Something that turns something else on or off. 
> Having learned what it actually does, though, I realised that it was 
> more like a railway switch (in the UK, we call them "points"), but not 
> /precisely/ like a railway switch, because a railway switch gives you 
> two possibilities (if/else), whereas switch(){} gives you several.
> 
> Railways often make for good programming analogies, though, so I spent a 
> little time thinking about it, and the best I could come up with was a 
> turntable. You get on, you spin the handle a few times depending on your 
> condition, and then you disembark and carry merrily on your way.
> 
> What I can't work into the railway analogy is "fall-through", which 
> seems a very odd thing to me. If we must have it, I'd have preferred to 
> have it explicitly coded in (thus eliminating the need for break). For 
> example, fallthrough BAZ; meaning "jump to case BAZ". But I'd prefer to 
> do without it completely.
> 

I think the best analogy would be a telephone 
switching plugboard. Just imagine that Saturday
Night Live sketch with Lily Tomlin. One end of
the cable is fixed (the test expression) and the
other end connects to the appropriate junction
determined by the whim (protocol) of the operator
(implementation). 

-- 
note that parenthetical terms which (over)emphasize
the metaphor are for the audience and not Mr.
Heathfield, author of the message to which this is
a reply. I assume he would get the idea without
such "baby talk". man am i stoned.
0
luser
5/28/2015 5:17:04 AM
On 28/05/2015 04:20, Morris Dovey wrote:
> Just for grins, I wrote a tiny program to exercise the logic I imagined
> might be used in a switch statement with both range and 'normal' case
> statements. Notice that I used strchr() to do the range test. :-)
>
> I've commented out the switch C code and shown unoptimized replacement
> code on the next line. I think this approach would allow nesting switch
> statements, but haven't tested. I think this should be enough to provide
> a start for anyone who might like to play...
>
> ~/Desktop: nt 3 strange.c
> #include <stdio.h>
> #include <string.h>
>
> int main(void)
> {  char *data = "123ABCDEFGHIJKLMN", *p = data, x;
>
>     while (x = *p++)
>     {  /* switch (x) { */
>        do { int test = 1;
>
>           /* case 'A'..'F' in "ABCDEF": */
>           if (test && (test = !strchr("ABCDEF",x))) goto L2;
>              printf("1st case: %c\n",x);
>              break;
>
>           /* case 'G'..'L' in "GHIJKL": */
>     L2:   if (test && (test = !strchr("GHIJKL",x))) goto L3;
>              printf("2nd case: %c\n",x);
>              break;
>
>           /* case 'M': */
>     L3:   if (test && (test = (x != 'M'))) goto L4;
>              printf("3rd case: %c\n",x);
>              break;
>
>           /* default: */
>     L4:      printf("4th case: %c\n",x);
>        /* } */
>        } while (0);
>     }
>     return 0;
> }

What's the purpose of the 'test' variable here? I can't see it, although 
it seems to make it marginally faster than without it.

However, using a conventional switch was about 6 times as fast (using 
the test code here: http://pastebin.com/7eyBwyVc which just keeps count 
of the different groups of letters).

Maintaining the speed is part of the reason for using switch; you don't 
want to be calling functions at runtime even if they would be faster 
than strchr.

(Adding a char->char mapping on the switch expression, such as ebcdic to 
ascii, added a 10-15% overhead. This would not be needed on the case 
labels as they are constant and you would hope the translation is done 
at compile time. But it was still 5 times as fast as using strchr.)

I don't understand either what problems you are envisioning with nested 
switch statements; each statement should be independent of any other.

-- 
Bartc
0
Bartc
5/28/2015 9:28:58 AM
On 5/28/15 4:28 AM, Bartc wrote:

> What's the purpose of the 'test' variable here? I can't see it, although
> it seems to make it marginally faster than without it.

It's a state variable indicating that case testing is or is not 
required. It is initially set and is reset on success to disable testing 
for subsequent cases when there is no break statement. It's not there 
for speed - it's required to provide correct operation.

A conventional switch statement /should/ be considerably faster, since 
it makes only a single value comparison for each case - except, of 
course, for the default case (where no comparison is done).

-- 
Morris Dovey
http://www.iedu.com/Solar
0
Morris
5/28/2015 11:00:42 AM
On 27/05/15 22:38, Bartc wrote:
> On 27/05/2015 21:04, David Brown wrote:
>> On 27/05/15 19:39, Bartc wrote:
>>> On 27/05/2015 18:18, Les Cargill wrote:
>>>> Bartc wrote:
> 
>>> Which implementations would return false for 1.0 == 1? (or 1.0 == 1.0?)
> 
>> I believe it is perfectly possible for the comparison to be false
>> (perhaps not with 1.0, since that has a nice easy representation in IEEE
>> formats).  In particular, compilers on x86 systems might use 80-bit
>> registers to store values temporarily, even if they are 32-bit or 64-bit
>> in memory - and equality comparisons may fail.
>>
>> It is also possible, I think, for some types of NaN (which may count as
>> "valid value", depending on your definitions) to always fail
>> comparisons.  In such cases, neither "a == b" nor "a != b" will be true.
> 
> What about a <= b, a < b, a > b or a >= b when either operand is a NaN?

I believe these will also return "unordered" for the comparisons.  The
compiler will probably consider them as not "true", but it may also lead
to some sort of exception or signal.

> 
>> I cannot say for sure what the standards allow, and I haven't seen such
>> problems personally, but I have heard enough war stories from real code
>> to know that one should never compare floating point values for equality.
> 
> From badly designed code no doubt. It should be fine if you know what
> you're doing.

Technically, that may be a true statement - but because the details of
"what you're doing" depends highly on the compiler, the target, the
compiler flags, the values in use, and the exact source code and exact
generated object code, I'd say you can so rarely rely on floating point
equality that it is best to assume it is always bad design.

> 
> But since the language doesn't actually ban testing for floating point
> equality, I can't see that being a reason for not allowing it anywhere
> else.

The language doesn't ban jumping off cliffs either, but I am sure there
are good reasons not to do so!

Remember, the C standards are standards documents - not guides to good
programming or software design.

> 
> (Looking at my own code, there are plenty of comparisons against 0.0,
> most likely of values which might have been specifically set to zero
> rather than the approximate result of a calculation, when some tolerance
> would need be taken into account.)
> 

A comparison to 0.0 like that will almost certainly work as expected.
But remember that you can still get weird effects - the format supports
signed zeros, subnormals which may appear to be 0 in some circumstances
but not others, and qNaNs that will compare "false" for equality with 0.0.



0
David
5/28/2015 12:24:51 PM
On 27/05/15 23:22, Bartc wrote:
> On 27/05/2015 21:47, Morris Dovey wrote:
>> On 5/27/15 12:40 PM, Keith Thompson wrote:
>>
>>> What's the point of specifying x in `switch (x)`?  Can the expression
>>> refer to things other than x?  If it can't, it seems like an odd
>>> restriction.  If it can, it's just an if/else in disguise.
>>
>> Because it's needed for the kind of case testing already in use.
>>
>> I think one of the goals should be to extend the existing syntax and
>> capabilities without affecting any existing code in any way.
>>
>> Keith, I confess that I've never thought of 'switch' as anything other
>> than a string of 'if' statements with (hidden) 'goto' statements.
> 
> (I've always implemented 'switch' specifically for jump tables, which
> means various restrictions on what is allowed. For anything else, or
> where a jump table is not practical, then there is a separate kind of
> statement.
> 
> I've recently found however that switch statements with a small number
> of cases (<8 in one program) are more efficient when implemented as a
> sequential set of tests (for x64 at least). So small switches are
> silently converted to the other form which does test sequentially.

There are many situations where a series of tests is a better choice
than a jump table for implementing a switch.  In particular, jump tables
quickly get inefficient if the cases are not sequential (or where the
cases can be combined in a compact but non-sequential manner).  But even
if they are sequential, a sequence of tests (usually a balanced tree)
can be better than a jump table if it works faster on the target cpu -
the tests can be predicted or executed speculatively, while a computed
jump may need a full pipeline flush.

> 
> But with either kind of statement (C has just the one), the pattern is
> clear to me: you are testing the same expression against a range of
> possibilities. Rather different from the arbitrary testing you can do in
> an if-else chain.)
> 

0
David
5/28/2015 12:52:58 PM
On 05/27/15 18:08, Richard Heathfield wrote:
>> C actually does have a (primitive) range-testing feature:
>>
>>      if (x >= lower_bound && x <= upper_bound) /* x in range! */;
>
> If you don't like &&, you can avoid it as follows:
>
>    if(abs(2 * x - (upper + lower)) < upper - lower) /* x in range! */
>
> That is, if the (absolute) distance of x from the midpoint is lower than half
> the range, then x is within range. Multiplying by 2 throughout removes the
> (potentially expensive) divisions.

Since you didn't write it*: note that this method introduces a number of places 
where precision might get lost, namely x might be close to HUGE_VAL, so that 
multiplying by 2 or subtracting (upper + lower) might push the expression over 
the edge into INFINITY resulting in an incorrect predicate.

Of course the code is likely to be valid for any well-behaved data/bound 
combination.

Thomas

* I assume Richard is well aware of the caveats and just forgot to mention them.
0
Thomas
5/28/2015 1:49:29 PM
On 27/05/2015 17:08, Richard Heathfield wrote:
> On 27/05/15 16:32, Morris Dovey wrote:
>
> <snip>
>
>> C actually does have a (primitive) range-testing feature:
>>
>>      if (x >= lower_bound && x <= upper_bound) /* x in range! */;
>
> If you don't like &&, you can avoid it as follows:
>
>    if(abs(2 * x - (upper + lower)) < upper - lower) /* x in range! */

Is there any advantage in this compared with the obvious version?

I can see 6 operations here, compared with two in the original (plus the 
&&, but that's not really an op, it's usually a conditional jump, and 
often the second compare is not needed).

I'm not sure about the need to avoid the &&, but my version would be:

  if ((x-lower)<=(upper-lower)) ...

with both sides cast to unsigned (omitted for clarity).

-- 
Bartc
0
Bartc
5/28/2015 3:54:02 PM
On Thursday, May 28, 2015 at 11:54:15 AM UTC-4, Bart wrote:
> On 27/05/2015 17:08, Richard Heathfield wrote:
> > On 27/05/15 16:32, Morris Dovey wrote:
> >
> > <snip>
> >
> >> C actually does have a (primitive) range-testing feature:
> >>
> >>      if (x >= lower_bound && x <= upper_bound) /* x in range! */;
> >
> > If you don't like &&, you can avoid it as follows:
> >
> >    if(abs(2 * x - (upper + lower)) < upper - lower) /* x in range! */
> 
> Is there any advantage in this compared with the obvious version?

If there is, I don't think the performance gain would make up for
the loss of readability of the second version.  The first style is
idiomatic, though I tend to write it as

if (lower_bound <= x && x <= upper_bound)
jadill33
5/28/2015 5:32:47 PM
On 05/28/2015 09:49 AM, Thomas Jahns wrote:
> On 05/27/15 18:08, Richard Heathfield wrote:
>>> C actually does have a (primitive) range-testing feature:
>>>
>>>      if (x >= lower_bound && x <= upper_bound) /* x in range! */;
>>
>> If you don't like &&, you can avoid it as follows:
>>
>>    if(abs(2 * x - (upper + lower)) < upper - lower) /* x in range! */
>>
>> That is, if the (absolute) distance of x from the midpoint is lower than half
>> the range, then x is within range. Multiplying by 2 throughout removes the
>> (potentially expensive) divisions.
> 
> Since you didn't write it*: note that this method introduces a number of places 
> where precision might get lost, namely x might be close to HUGE_VAL, so that

Note: unless __STDC_IEC559__ is predefined by the implementation, the
only requirement imposed by the standard on HUGE_VAL is that it must be
positive. Its name, and the contexts in which it is returned by
standard library functions, both imply that it should be INFINITY, if an
implementation supports infinities, and otherwise should be DBL_MAX -
but the standard imposes no such requirement. As far as I can tell, a
fully conforming implementation could have HUGE_VAL == DBL_MIN.

I think that DBL_MAX is what you want in this context, not HUGE_VAL.
-- 
James Kuyper
James
5/28/2015 6:35:33 PM
On 27-May-15 15:21, Ben Bacarisse wrote:
> Stephen Sprunk <stephen@sprunk.org> writes:
>> static const char seq[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
>> return strchr(seq, c) - seq;
>>
>> It's still lacking any error-handling code; that's left as an exercise
>> for the reader, as is making the sequence caller-supplied.
> 
> You might consider using
> 
>   strcspn(seq, (char []){c, 0})
> 
> for that.  You get some kind of error handling free, in that all
> characters not in the collating sequence are given the same high (yet
> valid) index.
> 
> The strspn and strcspn functions are much overlooked.

Indeed; I don't think I've ever used either, probably because when I
first (last?) encountered them, no uses came to mind, so your example is
quite helpful.

S

-- 
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
K5SSS        dice at every possible opportunity." --Stephen Hawking
Stephen
5/28/2015 10:36:54 PM
On 05/28/15 17:54, Bartc wrote:
> On 27/05/2015 17:08, Richard Heathfield wrote:
>> On 27/05/15 16:32, Morris Dovey wrote:
>>> C actually does have a (primitive) range-testing feature:
>>>
>>>      if (x >= lower_bound && x <= upper_bound) /* x in range! */;
>>
>> If you don't like &&, you can avoid it as follows:
>>
>>    if(abs(2 * x - (upper + lower)) < upper - lower) /* x in range! */
>
> Is there any advantage in this compared with the obvious version?
>
> I can see 6 operations here, compared with two in the original (plus the &&, but
> that's not really an op, it's usually a conditional jump, and often the second
> compare is not needed).

Since branches are much more expensive than a pipelined extra compare, jumping 
is probably not what any compiler would emit for integer computations. For 
floats (and then fabs instead of abs) the second version might save a trip to 
memory and back if the comparisons are done in the FPU and the && in the ALU 
(and there is no fast route from FPU to ALU registers, as e.g. on some 
PowerPC/POWER CPUs, which can do non-branching FPU predicates with FSEL instead).

But like any other micro-optimization: measure first to see if this is an issue 
at all.

Thomas
Thomas
5/29/2015 8:17:28 AM
Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:

> I suppose you could even have:
> 
>     switch (x)
>     {  case "ABCDEF":
> 	  /* case code */

Well, except that there have already been calls to allow a switch to
compare on string literals, so that you could have


  switch (permutation) {
    case "ABC":
    case "ACB":
      /* code... */
    case "CBA":
      /* code... */
    default:
      /* code... */
  }

So there'd need to be a way of disunconfusing those cases.

Once you start being creative with your desired additions to the
language, there is no end to the complications.

Richard
raltbos
5/29/2015 12:28:22 PM
Robert Wessel <robertwessel2@yahoo.com> wrote:

> I understand that you *want* "case 'a..'z':" to mean the 26 letters, I
> want to know what reasonable syntax rules would produce such a thing.

What I want to know is where he thinks he's going to go with that
proposal in Turkey, or Wales.

Richard
raltbos
5/29/2015 12:34:59 PM
On 29/05/2015 13:28, Richard Bos wrote:
> Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
>
>> I suppose you could even have:
>>
>>      switch (x)
>>      {  case "ABCDEF":
>> 	  /* case code */
>
> Well, except that there have already been calls to allow a switch to
> compare on string literals, so that you could have
>
>
>    switch (permutation) {
>      case "ABC":
>      case "ACB":
>        /* code... */
>      case "CBA":
>        /* code... */
>      default:
>        /* code... */
>    }
>
> So there'd need to be a way of disunconfusing those cases.

Presumably x would be a char or int type, so if matching with a string 
literal, the intention can be assumed to be to match one of those 
characters.

But 'permutation' would have to be a char* type, so this is string 
matching. Doing "strcmp" however is not quite the same as the "==" kind 
of matching that switch effectively does. It's not been well thought out.

> Once you start being creative with your desired additions to the
> language, there is no end to the complications.

Not really, if you know when to stop, and know what is suitable for the 
language.

Your example is no problem at all for a scripting language, but it's 
difficult to justify in C or to make it fit in.

(What /does/ suit C more is something like:

  int permutation;

   switch (permutation) {
     case 'ABC':
     case 'ACB':
     etc.

Except that such char literals are probably badly defined, and with 
Unicode becoming popular, that introduces problems.)

-- 
Bartc

Bartc
5/29/2015 12:44:42 PM
On 29/05/2015 13:34, Richard Bos wrote:
> Robert Wessel <robertwessel2@yahoo.com> wrote:
>
>> I understand that you *want* "case 'a..'z':" to mean the 26 letters, I
>> want to know what reasonable syntax rules would produce such a thing.
>
> What I want to know is where he thinks he's going to go with that
> proposal in Turkey, or Wales.

I'm looking at a street atlas of Cardiff right now, published by the A-Z 
map company. The street index at the back lists all the usual letters of 
the alphabet and in the usual order (although X is missing, probably 
because there are no streets starting with X).

Any composite letters I guess are represented by two letters from A-Z 
(eg "ll" as "l" followed by "l"). It shows how versatile it is.

As for Turkish, didn't they recently convert from Arabic? (I wonder 
why?) In any case, their Latin alphabet seems strikingly similar to our 
A-Z one, just with more bits stuck on, and probably different ways of 
pronouncing them.

-- 
Bartc
Bartc
5/29/2015 1:04:50 PM
Bartc <bc@freeuk.com> writes:

> On 29/05/2015 13:34, Richard Bos wrote:
>> Robert Wessel <robertwessel2@yahoo.com> wrote:
>>
>>> I understand that you *want* "case 'a..'z':" to mean the 26 letters, I
>>> want to know what reasonable syntax rules would produce such a thing.
>>
>> What I want to know is where he thinks he's going to go with that
>> proposal in Turkey, or Wales.
[...]
> As for Turkish, didn't they recently convert from Arabic?

Yes. In 1928.

> (I wonder why?) In any case, their Latin alphabet seems strikingly
> familiar to our A-Z one, just with more bits stuck on, probably
> different ways of pronouncing them.

It's not that simple. The Turkish alphabet has a dotless-i (ı, aka
U+0131) whose uppercase is I (U+0049), and an i-with-dot (i, U+0069), whose
uppercase is İ (U+0130).

-- Alain.
Alain
5/29/2015 1:28:01 PM
On 29/05/15 15:04, Bartc wrote:
> On 29/05/2015 13:34, Richard Bos wrote:
>> Robert Wessel <robertwessel2@yahoo.com> wrote:
>>
>>> I understand that you *want* "case 'a..'z':" to mean the 26 letters, I
>>> want to know what reasonable syntax rules would produce such a thing.
>>
>> What I want to know is where he thinks he's going to go with that
>> proposal in Turkey, or Wales.
> 
> I'm looking at a street atlas of Cardiff right now, published by the A-Z
> map company. The street index at the back lists all the usual letters of
> the alphabet and in the usual order (although X is missing, probably
> because there are no streets starting with X).
> 
> Any composite letters I guess are represented by two letters from A-Z
> (eg "ll" as "l" followed by "l"). It shows how versatile it is.
> 
> As for Turkish, didn't they recently convert from Arabic? (I wonder
> why?) In any case, their Latin alphabet seems strikingly familiar to our
> A-Z one, just with more bits stuck on, probably different ways of
> pronouncing them.
> 

They converted from the Arabic alphabet to a Latin alphabet in the
1920's, as part of the modernisation and Westernisation of the country
after WWI.  There were several reasons for doing so - they wanted to be
closer to Europe and further from the Middle East, the Latin alphabet is
a better fit for the language (which is completely unrelated to Arabic),
it would make it easier for communicating in different languages, it is
arguably easier to learn than the Arabic alphabet, and it would allow
the usage of modern technology such as typewriters and more flexible
printing presses.

The result is that within just a few years, literacy rates in Turkey
jumped from around 7% to 15-20%, and continued to climb thereafter.
(There were a great many radical changes in Turkey at the time, which
also affected this figure.)

But what I cannot understand is that when they took on the Latin
alphabet, they added a range of accents and diacritical marks that are
unknown in any other language, making it impossible to use standard
typewriters (or in today's terms, standard fonts and keyboards) from
other common Latin alphabet languages.  I think the idea was to avoid
merely copying the "foreign" Latin alphabet, but to take it and make it
into something Turkish - nationalism was a very big motivator for Turkey
at the time.

David
5/29/2015 1:31:55 PM
Bartc <bc@freeuk.com> wrote:

> On 29/05/2015 13:28, Richard Bos wrote:

> > Once you start being creative with your desired additions to the
> > language, there is no end to the complications.
> 
> Not really, if you know when to stop, and know what is suitable for the 
> language.

Yeah, and _if_ my aunt had as many bollocks as are talked in this
thread, she'd be my uncle.

Richard
raltbos
5/29/2015 3:55:46 PM
On 29/05/2015 16:55, Richard Bos wrote:
> Bartc <bc@freeuk.com> wrote:
>
>> On 29/05/2015 13:28, Richard Bos wrote:
>
>>> Once you start being creative with your desired additions to the
>>> language, there is no end to the complications.
>>
>> Not really, if you know when to stop, and know what is suitable for the
>> language.
>
> Yeah, and _if_ my aunt had as many bollocks as are talked in this
> thread, she'd be my uncle.

Well, some of us have implemented many of the possible features that 
have been discussed in the group, not just in the thread. So we have a 
good idea of what's easy, what's difficult, and what isn't appropriate.

-- 
bartc
Bartc
5/29/2015 5:17:29 PM
Bartc <bc@freeuk.com> writes:
[...]
> (What /does/ suit C more is something like:
>
>  int permutation;
>
>   switch (permutation) {
>     case 'ABC':
>     case 'ACB':
>     etc.
>
> Except that such char literals are probably badly defined, and with
> Unicode becoming popular, that introduces problems.)

'ABC' is already a valid character constant.  It has type int and an
implementation-defined value.  (With gcc on my system, its value is
('A'<< 16) + ('B' << 8) + 'C').

It's a feature that I've seen used incorrectly far more often than
it's been used correctly -- and it pretty much *can't* be used
correctly in portable code.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
5/29/2015 5:52:41 PM
Alain Ketterlin <alain@universite-de-strasbourg.fr.invalid> wrote:

> Bartc <bc@freeuk.com> writes:
> 
> > On 29/05/2015 13:34, Richard Bos wrote:
> >> Robert Wessel <robertwessel2@yahoo.com> wrote:
> >>
> >>> I understand that you *want* "case 'a..'z':" to mean the 26 letters, I
> >>> want to know what reasonable syntax rules would produce such a thing.
> >>
> >> What I want to know is where he thinks he's going to go with that
> >> proposal in Turkey, or Wales.

> > (I wonder why?) In any case, their Latin alphabet seems strikingly
> > familiar to our A-Z one, just with more bits stuck on, probably
> > different ways of pronouncing them.
> 
> It's not that simple. The Turkish alphabet has a dotless-i (ı, aka
> U+0131) whose uppercase is I (U+0049), and a i-with-dot (U+0069), whose
> uppercase is İ (U+0130).

And crucially, to the Turks, the dotless i isn't stuck _on_, it's stuck
_in_. A-Z, to them, has more than Bart's 26 letters.

Richard
raltbos
6/5/2015 3:18:50 PM
On Tue, 26 May 2015 14:59:41 +0100, Ben Bacarisse
<ben.usenet@bsb.me.uk> wrote:

> David Brown <david.brown@hesbynett.no> writes:
> <snip>
> > (Octal literals other than 0, on the other hand, are practically
> > non-existent in embedded code, and banned by many coding standards as
> > confusing, hard to read, and error prone.  When you work with data
> > sizes of 8, 16 and 32 bits, a notation for 3 bit sizes is useless.)
> 
> If the data are arbitrary, I agree, but octal is useful for some kinds
> of data.  The most common data you entered via the switches on the front
> panel of a PDP-11 were instructions, and it's no accident that the
> switches are grouped in threes as the instruction format is full of
> octal-significant patterns.  If you are doing low-level PDP-11
> programming you will use octal a lot, despite it being a 16-bit machine
> with 8-bit bytes.  The use of octal in the PDP-11 probably comes from
> the 36-bit PDP-10 range.
> 
PDP-6 and -10 software did mostly use octal, but it wasn't a great
fit; instructions had a mixture including 4-bit and 5-bit fields, and
software mostly used 7 bits for characters. The 12-bit PDP-5, -8 and -12,
and IIRC also the 18-bit PDP-7 and -9, were more conveniently octal.

And on LSI-11 models without switches (or blinkenlights, sadly) the
serial-console 'virtual front panel' was a firmware form of the
software debugger ODT, where O means what you can guess.

> It's tempting to think the octal got into C because of the octal-heavy
> DEC heritage, but octal constants where in B in exactly the same form (a
> leading 0 indicating base 8) so I think they were probably just copied
> into C in an environment where octal was used enough that there would be
> no motivation to "tidy up" B by dropping them.
> 
> B was developed on Honeywell 6000 machines (re-badged GE machines) which
> also make heavy use of octal.  Addresses are 18 bits and alphanumeric
> data are composed of either 6 or 9 bit bytes.  The opcodes are 9 bits
> and some 3-bit portions denote variations.  Octal was used almost
> exclusively.

Are you sure? Ritchie's HOPL2 paper on C says B was developed on the
PDP-7 and moved to the (new) PDP-11 along with very early Unix, where
it quickly 'evolved' into C, and *also* was ported to and used on
H6070. By then it was too late to influence C. 

Ritchie's home page still(!) at https://www.bell-labs.com/usr/dmr/www/
has a 'User's Reference to B' dated Jan. 7 1972 for the PDP-11
version, and 'Computing Science Technical Report #8' in 2 parts (and 3
forms) dated Jan. 1973 for the GCOS version. Both say leading 0 means
octal but DIGITS 8 AND 9 ARE STILL ALLOWED; the former gives an
example that 09 is the same as 011! At least C fixed that.

David
6/21/2015 4:46:38 PM
On Sun, 17 May 2015 10:00:16 +0100, Richard Heathfield
<rjh@cpax.org.uk> wrote:

> On 16/05/15 07:21, Robert Wessel wrote:
> 
> <snip>
> 
> > Is the collating sequence different on DS9000's manufactured for sale
> > in France?
> 
> Yes, and a different one still for Germany, and yet another for Russia 
> (and yes, worryingly, DS9Ks are very popular in Russia for some reason).

Why is this a factory option? Even IBM back in the 60s managed to do
field mods, especially the widely rumored one to upgrade a lower-speed
card sorter to a higher-rental higher-speed sorter by removing the
speed reduction circuit. For that matter, DS9k ought to be able to
detect its position by GPS and *automatically* set itself to the
correct locale -- with the possible exception of cases where
boundaries 'magically' move without warning, like Crimea. <G??>
David
6/21/2015 4:46:38 PM
On Wed, 13 May 2015 18:44:23 -0500, Robert Wessel
<robertwessel2@yahoo.com> wrote:

> On Wed, 13 May 2015 14:55:27 +0200, David Brown
> <david.brown@hesbynett.no> wrote: <snip>
> >The point of ASCII is that apart from EBCDIC, everyone agrees on the
> >first 128 characters.  <snip>
> 
> Well, we don't really agree on the first 128 characters so much
> either.  More these days, but things like $ and # in pre-ANSI ASCII
> varied between languages considerably.  And we only semi-agree on a
> few of the control characters.

ASCII was from its start a standard of the American (meaning USA)
standards body; are you referring to the fact that ANSI was previously
known by other names? If so, that's true and is the reason why the
(exact) same standard is called both ASCII and USASCII, and I believe
there was even a period it was ANSCII. But that organization change
had NO effect on the characters IN X3.4.

The 7-bit char-code standard that DID vary a few (about 10) characters
for (most) European countries was ISO 646, which was modified from
ASCII, and thus necessarily after. ASCII/646 were 'extended' by
(several) techniques to 'switch' 7-bit codes among more than 128 chars,
then superseded by 8-bit ISO 8859 with the first 128 identical to ASCII
always, but the *second* 128 varying among about a dozen choices,
including non-Latin alphabets like Greek and Cyrillic, and Latin
variants such as Turkish. And finally ISO 10646/Unicode, which retains
the first variant of 8859, namely 8859-1, for the first 256 codepoints,
but expands to 21 bits (17 planes) to include thousands of ideographs
for "CJK" (China, Japan, Korea).

David
6/21/2015 4:46:38 PM
On 06/21/2015 12:46 PM, David Thompson wrote:
> On Sun, 17 May 2015 10:00:16 +0100, Richard Heathfield
> <rjh@cpax.org.uk> wrote:
> 
>> On 16/05/15 07:21, Robert Wessel wrote:
>>
>> <snip>
>>
>>> Is the collating sequence different on DS9000's manufactured for sale
>>> in France?
>>
>> Yes, and a different one still for Germany, and yet another for Russia 
>> (and yes, worryingly, DS9Ks are very popular in Russia for some reason).
> 
> Why is this a factory option? Even IBM back in the 60s managed to do
> field mods, especially the widely rumored one to upgrade a lower-speed
> card sorter to a higher-rental higher-speed sorter by removing the
> speed reduction circuit. For that matter, DS9k ought to be able to
> detect its position by GPS and *automatically* set itself to the
> correct locale -- with the possible exception of cases where
> boundaries 'magically' move without warning, like Crimea. <G??>

The makers of the DS9000 are always looking for new ways to surprise
their users - I'm sure that they will be quite interested in your
suggestion.
-- 
James Kuyper
James
6/21/2015 5:20:41 PM
David Thompson <dave.thompson2@verizon.net> writes:

> On Tue, 26 May 2015 14:59:41 +0100, Ben Bacarisse
> <ben.usenet@bsb.me.uk> wrote:
<snip>
>> B was developed on Honeywell 6000 machines (re-badged GE machines) which
>> also make heavy use of octal.  Addresses are 18 bits and alphanumeric
>> data are composed of either 6 or 9 bit bytes.  The opcodes are 9 bits
>> and some 3-bit portions denote variations.  Octal was used almost
>> exclusively.
>
> Are you sure? Ritchie's HOPL2 paper on C says B was developed on the
> PDP-7 and moved to the (new) PDP-11 along with very early Unix, where
> it quickly 'evolved' into C, and *also* was ported to and used on
> H6070. By then it was too late to influence C.

No, not sure at all.  I first used B on a Honeywell (at Aberdeen
University of all places) which, I think, contributed to this notion,
and the only document I've ever seen about it is "A TUTORIAL
INTRODUCTION TO THE LANGUAGE B" by Kernighan (along with an associated
language reference manual).  Both refer only to the GCOS version but
that is quite possibly because they were written only for a GCOS
audience.  Both are undated but they probably come from Ritchie's site.

I think the wording probably also led me astray in a subliminal way:

  "B is a new computer language designed and implemented at Murray
  Hill. It runs and is actively supported and documented on the H6070
  TSS system at Murray Hill."

  "The original design and implementation are the work of K. L. Thompson
  and D. M. Ritchie; their original 6070 version has been substantially
  improved by S. C. Johnson, who also wrote the runtime library."

despite it not actually saying that the Honeywell version was the
original one!

<snip>
-- 
Ben.
Ben
6/21/2015 7:41:46 PM
David Thompson <dave.thompson2@verizon.net> writes:
[...]
> Ritchie's home page still(!) at https://www.bell-labs.com/usr/dmr/www/
[...]

Ah, that's good news!

I had been accessing dmr's home page (and the documents linked
from it) via http://cm.bell-labs.com/cm/cs/who/dmr/, which has been
incommunicado for some time.  I was afraid it had all been removed.
I'm glad to see that it still exists.

And in fact http://cm.bell-labs.com/ links to
http://www.bell-labs.com/; I hadn't thought to check for that.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith
6/21/2015 8:27:58 PM
Bartc <bc@freeuk.com> writes:

> On 24/05/2015 15:44, Tim Rentsch wrote:
>
>> I see no reason to make any of these changes.  A binary base
>> isn't useful because the lengths involved make it hard to
>> see just what the value is (and is error prone to write).
>
> Are you suggesting that binary constants are /never/ useful?

No, only that they offer very marginal utility, and cases where
the extra utility might be worthwhile are sufficiently rare so
that it is better for the language to leave them out.  Any
language construct anyone can think up is going to be useful
in _some_ circumstances;  the question is does the value outweigh
the intellectual cost of making the language bigger.  Binary
constants are just not very useful, as evidenced by their rarity
in regular writing.  By comparison octal constants or hexadecimal
constants occur with reasonable frequency.

> This is such a trivially implemented enhancement that you can
> afford to just throw it in, and let people use it if they want.

The cost of implementation is irrelevant here.  Also you are
ignoring the collateral costs for things like library functions,
etc.

> Obviously nobody is suggesting switching from decimal, or hex, to
> binary just for the hell of it.  Binary will be used for a reason.
> [examples]

I don't find any of these examples compelling, let alone
convincing.  Most of them are better done using techniques others
have pointed out in their responses.  The bitmap example is
arguably an exception, but even in that case other approaches
were pointed out that IMO do a better job;  plus putting bitmaps
in programs is very rare (and these days tends to be done using
resource files, which don't have to rely on language syntax).  In
general I think it's a good idea to follow Tony Hoare's advice:
a proposed language construct should be added to the language
only if it is going to be used by every (non-trivial) program
written in that language.  Binary constants fail that test.

> Or, you maybe you are right and this would all be just as easy in
> decimal or hex.  (When it's come up before, someone devised a macro
> which interpreted its parameter as a binary string.  But if someone
> goes to that much trouble, it means it could do with being in the
> language.)

It could mean that, but in this case I don't think it does.
The reason is pretty simple:  most programs don't need it.
Better to leave the funny macro solution in for those (very!) few
programs that do.
Tim
7/10/2015 5:07:22 PM
Richard Heathfield <rjh@cpax.org.uk> writes:

> On 24/05/15 22:19, Ben Bacarisse wrote:
>> Bartc <bc@freeuk.com> writes:
>
> <snip>
>
>>> The advantage of binary is that it lets you visualise the data
>>> when there's a certain pattern or grouping involved.  (It would need
>>> separators too.)
>>
>> Like a linear pattern (number 3) this is one I've used too, but even in
>> C you can do some messing about in macros to get patterns that stand out
>> quite strongly:
>>
>>    const unsigned image[] = {
>>        B(X___,___X)
>>        B(_X__,__X_)
>>        B(__X_,_X__)
>>        B(___X,X___)
>>        B(___X,X___)
>>        B(__X_,_X__)
>>        B(_X__,__X_)
>>        B(X___,___X)
>>    };
>>
>> using these definitions:
>>
>>    #define PAT_____ 0
>>    #define PAT____X 1
>>    #define PAT___X_ 2
>>    #define PAT___XX 3
>>    #define PAT__X__ 4
>>    #define PAT__X_X 5
>>    #define PAT__XX_ 6
>>    #define PAT__XXX 7
>>    #define PAT_X___ 8
>>    #define PAT_X__X 9
>>    #define PAT_X_X_ a
>>    #define PAT_X_XX b
>>    #define PAT_XX__ c
>>    #define PAT_XX_X d
>>    #define PAT_XXX_ e
>>    #define PAT_XXXX f
>>
>>    #define B(a, b)   BX(PAT_ ## a, PAT_ ## b)
>>    #define BX(a, b)  BXX(a, b)
>>    #define BXX(a, b) 0x ## a ## b,
>>
>> I agree that the very low cost to implement binary literals should be
>> taken into account.  Even a few uses makes it a net gain, but it almost
>> certainly needs a digit separator as well.
>>
>> <snip>
>
> #define l )*2+1
> #define O )*2
> #define binary ((((((((((((((((0
> /* add 16 more (s for 32 bit quantities */
>
> Usage:
>
> a = binary O O O O O O O O O O O O O O O O ; /* 0 */
> b = binary O O O O O O O O O O O O O O O l ; /* 1 */
> c = binary O O O O O O O O O O O O O O l O ; /* 2 */
> d = binary O O O O O O O O O O O O O O l l ; /* 3 */

Clever.  Thank you for posting this.

I might suggest an (untested) improvement:

    #define l *2+1)
    #define O *2)
    #define binary ((((((((((((((((0U

This should be somewhat safer, especially in implementations
that have 16-bit ints.
Tim
7/10/2015 5:12:34 PM
David Brown <david.brown@hesbynett.no> writes:

> On 24/05/15 16:44, Tim Rentsch wrote:
>
>> I see no reason to make any of these changes.  A binary base
>> isn't useful because the lengths involved make it hard to
>> see just what the value is (and is error prone to write).
>
> Binary constants of the format 0b0110 made it into C compilers and the
> C++ standards because they are very useful,

C++ has made lots of decisions which are IMO bad ones.  So I guess
I'm not surprised to see another one.

> particularly in embedded
> programming where we live in a world of bits for device registers, IO
> ports, and so on.  [snip elaboration]

I don't find this motivation convincing.  ISTM that in such
cases one would want to define symbolic names for the various
masks, etc, and use those rather than putting binary constants
everywhere, in which cases the exact form of the constant
used is of much less importance.  (Surely you don't mean to
suggest that embedded programmers are guilty of bad programming
practices?)  Moreover I think it's likely that the documentation
that describes such fields gives their values in octal, decimal,
or hex rather than binary, excepting perhaps for masks which
are better described in terms of width and position than using
a long binary constant.
Tim
7/10/2015 5:19:58 PM
David Brown <david.brown@hesbynett.no> writes:

> On 27/05/15 19:39, Bartc wrote:
>> On 27/05/2015 18:18, Les Cargill wrote:
>>> Bartc wrote:
>>>
>>>> (And there are a huge number of values for which floating point equality
>>>> can be tested without problem, including all the 4 billion values of a
>>>> 32-bit int.)
>>>
>>> That is so implementation dependent than if I were part of
>>> the governance for the language, I'd suggest avoiding it.
>>
>> Which implementations would return false for   1.0 == 1?  (or 1.0 == 1.0?)
>>
>> How about this:
>>
>> double a,b=1.0;
>>
>> a = b;
>>
>> Are there any implementations where a == b is false?  (Assume b contains
>> a valid value.)
>
> I believe it is perfectly possible for the comparison to be false
> (perhaps not with 1.0, since that has a nice easy representation in
> IEEE formats).

In conforming C it is not possible, as long as the value held in
'b' is a valid and comparable value (and not, eg, NaN).

> In particular, compilers on x86 systems might use
> 80-bit registers to store values temporarily, even if they are 32-bit
> or 64-bit in memory - and equality comparisons may fail.

That is irrelevant in this case.

> It is also possible, I think, for some types of NaN (which may count
> as "valid value", depending on your definitions) to always fail
> comparisons.  In such cases, neither "a == b" nor "a != b" will be
> true.

Yes but that is not the case here.

> I cannot say for sure what the standards allow, and I haven't seen
> such problems personally, but I have heard enough war stories from
> real code to know that one should never compare floating point values
> for equality.

ISTM that if you are going to position yourself as an expert then
it behooves you to _be_ sure.  There are cases where C semantics
requires floating-point equality to work, and this is one of
them.
Tim
7/10/2015 6:05:26 PM
Keith Thompson <kst-u@mib.org> writes:

> David Brown <david.brown@hesbynett.no> writes:
>> On 27/05/15 19:39, Bartc wrote:
>
> [...]
>
>>> double a,b=1.0;
>>>
>>> a = b;
>>>
>>> Are there any implementations where a == b is false?  (Assume b contains
>>> a valid value.)
>>
>> I believe it is perfectly possible for the comparison to be false
>> (perhaps not with 1.0, since that has a nice easy representation in
>> IEEE formats).  In particular, compilers on x86 systems might use
>> 80-bit registers to store values temporarily, even if they are 32-bit
>> or 64-bit in memory - and equality comparisons may fail.
>
> C doesn't require IEEE formats or semantics.
>
> Any floating-point type that meets the requirements of the
> floating-point model in N1570 5.2.4.2.2 can represent 1.0 exactly
> -- but that doesn't necessarily mean that 1.0 == 1.0 will evaluate
> to 1.  An implementation where 1.0 != 1.0 might be conforming,
> but it would IMHO be badly broken.

That's true for '1.0 == 1.0' but only because of the latitude
granted in converting floating-point constants.  In the original
question the equality test must yield '1' as its value.
0
Tim
7/10/2015 6:08:02 PM
In article <kfn7fq76g7t.fsf@x-alumni2.alumni.caltech.edu>,
Tim Rentsch  <txr@alumni.caltech.edu> wrote:
....
>ISTM that if you are going to position yourself as an expert then
>it behooves you to _be_ sure.  There are cases where C semantics
>requires floating-point equality to work, and this is one of
>them.

When did DB "position" himself as an "expert" ?

-- 
People who say they'll vote for someone else because Obama couldn't solve
all of Bush's messes are like people complaining that he couldn't cure cancer,
so they'll go and vote for cancer.

gazelle
7/10/2015 6:08:50 PM
On 10/07/2015 18:07, Tim Rentsch wrote:
> Bartc <bc@freeuk.com> writes:

>> Are you suggesting that binary constants are /never/ useful?
>
> No, only that they offer very marginal utility, and cases where
> the extra utility might be worthwhile are sufficiently rare so
> that it is better for the language to leave them out.

You mean, such as octal constants, and a hundred other things that 
hardly anyone ever uses?

> Any
> language construct anyone can think up is going to be useful
> in _some_ circumstances;  the question is does the value outweigh
> the intellectual cost of making the language bigger.

Sometimes orthogonality and completeness are also of value.

> Binary
> constants are just not very useful, as evidenced by their rarity
> in regular writing.

Well, there might be a good reason for that! It's difficult to use 
binary constants when they don't exist in the language. If you mean 
writing in natural language, then I can understand they will not figure 
too much in Shakespeare or Dickens, but I'm looking at the AMD64 manuals 
now and they do use binary constants within the text.

> By comparison octal constants  or hexadecimal
> constants occur with reasonable frequency.

Octal constants don't, unless you include the value 0. I'm fairly sure a 
lot of C coders don't even know they exist.

>> This is such a trivially implemented enhancement that you can
>> afford to just throw it in, and let people use it if they want.
>
> The cost of implementation is irrelevant here.  Also you are
> ignoring the collateral costs for things like library functions,
> etc.

What are those costs? I can't see how they impact library functions at all.

>> Obviously nobody is suggesting switching from decimal, or hex, to
>> binary just for the hell of it.  Binary will be used for a reason.
>> [examples]
>
> I don't find any of these examples compelling, let alone
> convincing.  Most of them are better done using techniques others
> have pointed out in their responses.

Well, /I/ use them and I find them a benefit. I don't care if other 
people try to convince themselves that they are not needed.

I am in the fortunate position of being able to write or adapt languages 
so that I can do what I like. But I feel sorry for those deprived of a 
simple, handy feature such as this, or who are forced to resort to 
hairy-looking macros.

> It could mean that, but in this case I don't think it does.
> The reason is pretty simple:  most programs don't need it.

I found it amusing once to list all the things that C programs didn't 
really need because it was possible to fall back on something else. 
Believe me, there wasn't much of the language left!

-- 
Bartc
Bartc
7/10/2015 7:09:59 PM
Tim Rentsch wrote:
> David Brown <david.brown@hesbynett.no> writes:
>
>> On 24/05/15 16:44, Tim Rentsch wrote:
>>
>>> I see no reason to make any of these changes.  A binary base
>>> isn't useful because the lengths involved make it hard to
>>> see just what the value is (and is error prone to write).
>>
>> Binary constants of the format 0b0110 made it into C compilers and the
>> C++ standards because they are very useful,
>
> C++ has made lots of decisions which are IMO bad ones.  So I guess
> I'm not surprised to see another one.

Your opinions appear to be at odds with those of the majority of C++ 
programmers. Features get added because we programmers want them.

>> particularly in embedded
>> programming where we live in a world of bits for device registers, IO
>> ports, and so on.  [snip elaboration]
>
> I don't find this motivation convincing.  ISTM that in such
> cases one would want to define symbolic names for the various
> masks, etc, and use those rather than putting binary constants
> everywhere, in which cases the exact form of the constant
> used is of much less importance.  (Surely you don't mean to
> suggest that embedded programmers are guilty of bad programming
> practices?)  Moreover I think it's likely that the documentation
> that describes such fields gives their values in octal, decimal,
> or hex rather than binary, excepting perhaps for masks which
> are better described in terms of width and position than using
> a long binary constant.

The documentation that describes such fields tends to be tables of bits, 
or a diagrammatic representation of registers. I for one have typed 
many a binary pattern into a calculator to generate a hex constant; at 
least I now have the option not to.

-- 
Ian Collins
Ian
7/10/2015 8:47:19 PM
gazelle@shell.xmission.com (Kenny McCormack) writes:

> In article <kfn7fq76g7t.fsf@x-alumni2.alumni.caltech.edu>,
> Tim Rentsch  <txr@alumni.caltech.edu> wrote:
> ...
>> ISTM that if you are going to position yourself as an expert then
>> it behooves you to _be_ sure.  There are cases where C semantics
>> requires floating-point equality to work, and this is one of
>> them.
>
> When did DB "position" himself as an "expert" ?

Of course I don't know if David Brown intends to position himself
as an expert or not.  But that is often how it comes across,
based on his writing.
Tim
7/11/2015 2:55:56 PM
In article <kfnpp3yzqtf.fsf@x-alumni2.alumni.caltech.edu>,
Tim Rentsch  <txr@alumni.caltech.edu> wrote:
>gazelle@shell.xmission.com (Kenny McCormack) writes:
>
>> In article <kfn7fq76g7t.fsf@x-alumni2.alumni.caltech.edu>,
>> Tim Rentsch  <txr@alumni.caltech.edu> wrote:
>> ...
>>> ISTM that if you are going to position yourself as an expert then
>>> it behooves you to _be_ sure.  There are cases where C semantics
>>> requires floating-point equality to work, and this is one of
>>> them.
>>
>> When did DB "position" himself as an "expert" ?
>
>Of course I don't know if David Brown intends to position himself
>as an expert or not.  But that is often how it comes across,
>based on his writing.

I've never been all that impressed with anything he's written on this forum.

-- 

There are many self-professed Christians who seem to think that because
they believe in Jesus' sacrifice they can reject Jesus' teachings about
how we should treat others. In this country, they show that they reject
Jesus' teachings by voting for Republicans.

gazelle
7/11/2015 3:02:22 PM
Bartc <bc@freeuk.com> writes:

> On 10/07/2015 18:07, Tim Rentsch wrote:
>> Bartc <bc@freeuk.com> writes:
>>
>>> Are you suggesting that binary constants are /never/ useful?
>>
>> No, only that they offer very marginal utility, and cases where
>> the extra utility might be worthwhile are sufficiently rare so
>> that it is better for the language to leave them out.
>
> You mean, such as octal constants,

Personally I find octal constants useful perhaps somewhat rarely,
but they do offer significant utility in those cases.  YMMV.

> and a hundred other things that
> hardly anyone ever uses?

Not sure which hundred things you think no one uses, but
if you post a list here I'd be happy to take a look at it.

>> Any
>> language construct anyone can think up is going to be useful
>> in _some_ circumstances;  the question is does the value outweigh
>> the intellectual cost of making the language bigger.
>
> Sometimes orthogonality and completeness are also of value.

Yes, that is a factor.  It may not be the determining factor,
but it is a factor.

>> Binary
>> constants are just not very useful, as evidenced by their rarity
>> in regular writing.
>
> Well, there might be a good reason for that!  It's difficult to use
> binary constants when they don't exist in the language.  If you mean
> writing in natural language, then I can understand they will not
> figure too much in Shakespeare or Dickens, but I'm looking at the
> AMD64 manuals now and they do use binary constants within the text.

I was thinking of technical literature and documentation.

>> By comparison octal constants  or hexadecimal
>> constants occur with reasonable frequency.
>
> Octal constants don't, unless you include the value 0.  I'm fairly sure
> a lot of C coders don't even know they exist.

I guess you haven't read the same technical literature that I
have.

>>> This is such a trivially implemented enhancement that you can
>>> afford to just throw it in, and let people use it if they want.
>>
>> The cost of implementation is irrelevant here.  Also you are
>> ignoring the collateral costs for things like library functions,
>> etc.
>
> What are those costs?  I can't see how they impact library
> functions at all.

Really?  Are you familiar with the functions in <stdlib.h>
and <stdio.h>?

>>> Obviously nobody is suggesting switching from decimal, or hex, to
>>> binary just for the hell of it.  Binary will be used for a reason.
>>> [examples]
>>
>> I don't find any of these examples compelling, let alone
>> convincing.  Most of them are better done using techniques others
>> have pointed out in their responses.
>
> Well, /I/ use them and I find them a benefit.  I don't care if other
> people try to convince themselves that they are not needed.

When it comes to adding something to C, it's necessary to take
into account what other people think.

> I am in the fortunate position of being able to write or adapt
> languages so that I can do what I like.  But I feel sorry for those
> deprived of a simple, handy feature such as this, or who are forced to
> resort to hairy-looking macros.

Okay, but my comments here are made in the context of whether
something should be added to C.  I have no opinion on what
should go into your pet language.

>> It could mean that, but in this case I don't think it does.
>> The reason is pretty simple:  most programs don't need it.
>
> I found it amusing once to list all the things that C programs didn't
> really need because it was possible to fall back on something
> else.  Believe me, there wasn't much of the language left!

I've looked through that list, or at least what I think you're
referring to.  I didn't find it to be of much value, since
basically all it reflects is your own opinions;  that is, it
doesn't say what other people think they need, but only what you
think they need.  If one wants to be effective in getting changes
made to C, it's important to take into account what other people
actually think, not just what you think they should think.
Tim
7/11/2015 6:19:01 PM
On 11/07/2015 19:19, Tim Rentsch wrote:
> Bartc <bc@freeuk.com> writes:

>> What are those costs?  As I can't see they impact library
>> functions at all.
>
> Really?  Are you familiar with the functions in <stdlib.h>
> and <stdio.h>?

Many of them, yes. One or two might be affected if it becomes necessary 
to support binary input and output. But binary is most useful when it's 
possible to write binary constants within the source code. I/O can be 
done with user functions. In that case, which library functions are 
affected?

>> I am in the fortunate position of being able to write or adapt
>> languages so that I can do what I like.  But I feel sorry for those
>> deprived of a simple,