Hello,
What would you recommend as the best way to construct a std::string from
a numerical variable (e.g. an int)?
I've found the _itoa() function on MSDN; it's part of the MS C runtime
library. It is presumably only available in their library, though, and
thus not portable?
One could implement such a conversion function, e.g. by looping (e.g. via %)
through the digits, d, in the number, converting each to a character
via e.g. 'a'+ d, and appending it to a string. I have the distinct feeling
I'd be reinventing the wheel and missing generalisations/caveats like
different bases etc.
I'm surprised that this isn't a function in the C library or the like, as
atoi() is. In fact I'm surprised it isn't a std::string member function.
Am I missing it?
Thanks very much in advance.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
p8mode | 6/6/2004 11:28:15 AM
p8mode <p8mode@gmx.net> wrote:
> What would you recommend is the best way to construct a std::string from
> a numerical variable (eg an int).
>
> Ive found _itoa() function on msdn thats part of the ms c runtime
> library. This is presumably only available in their library though and
> thus importable?
>
> One implement such a conversion function eg. by looping (eg via %)
> through the digits, d, in the number, and convert each to a character
> via eg 'a'+ d, and appand this to a string. I have the distinct feeling
> Id be reinventing the wheel, and missing on generalisations/caveats like
> different bases etc.
>
> Im surprsied that this isnt a function in the c library or the like, as
> a atoi() is. In fact Im surprised it isnt a std::string member function.
> Am I missing it?
There is a standard, portable way of doing such conversions in C++. You can use streams:
#include <sstream>
#include <string>
// Format an int through a string stream and return the resulting text.
std::string itoa(int i)
{
    std::stringstream s;
    s << i;
    return s.str();
}
Or use boost::lexical_cast<>, which is based on the above approach:
std::string s = boost::lexical_cast<std::string>(i);
The good thing about the latter approach is that once you've defined std::basic_ostream<>& operator<<(std::basic_ostream<>&, T t) for your own classes, you can use boost::lexical_cast<> to convert objects of those classes to their string representation.
The drawback of both is that, IMHO, streams are not very efficient. For time-critical tasks you might prefer something like _itoa().
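A minimal sketch of the stream-based idea applied to a user-defined class, assuming only the standard library (no Boost). `Point` and `stream_to_string` are hypothetical names invented for this illustration; the helper is a simplified stand-in for the concept, not boost::lexical_cast itself:

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user-defined type, used only for illustration.
struct Point {
    int x;
    int y;
};

// Stream insertion operator: defines the textual form of a Point.
std::ostream& operator<<(std::ostream& os, const Point& p)
{
    return os << '(' << p.x << ", " << p.y << ')';
}

// Convert any streamable value to std::string (hypothetical helper name).
template <typename T>
std::string stream_to_string(const T& value)
{
    std::ostringstream out;
    out << value;
    return out.str();
}
```

Once operator<< exists for a type, the same helper covers both built-in numbers and that type.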
--
Maxim Yegorushkin
Maxim | 6/6/2004 5:16:23 PM
Hi,
p8mode wrote:
> What would you recommend is the best way to construct a std::string from
> a numerical variable (eg an int).
I think you may find these articles interesting:
http://www.cuj.com/documents/s=8006/cuj0212wilson/
http://www.cuj.com/documents/s=8840/cujexp0309wilson/
http://www.cuj.com/documents/s=8906/cujexp0311wilson/
http://www.cuj.com/documents/s=8943/cujexp0312wilson/
--
Maciej Sobczak : http://www.msobczak.com/
Programming : http://www.msobczak.com/prog/
Maciej | 6/6/2004 11:14:34 PM
p8mode <p8mode@gmx.net> wrote:
> What would you recommend is the best way to construct a std::string from
> a numerical variable (eg an int).
This is answered in sections 38.2 and 38.3 of the FAQ:
http://www.parashift.com/c++-faq-lite/misc-technical-issues.html#faq-38.2
Take a look at the rest of the FAQ as well - I think you'll find it
useful.
> Ive found _itoa() function on msdn thats part of the ms c runtime
> library. This is presumably only available in their library though and
> thus importable?
You can use std::itoa(), which is a standard C function that, like all
C functions, is available in C++ as well. If you are programming in
C++, you should prefer the C++ way described in the FAQ unless you
have a good reason not to.
Best regards,
Tom
Thomas8675309 | 6/6/2004 11:15:36 PM
Tom <Thomas8675309@yahoo.com> wrote:
>
> You can use std::itoa(), which is a standard C function that, like all
> C functions, is available in C++ as well. If you are programming in
> C++, you should prefer the C++ way described in the FAQ unless you
> have a good reason not to.
>
> Best regards,
>
> Tom
itoa() and _itoa() are non-standard functions; they are not mentioned
in the C99 or C90 standards. Thus itoa is not a standard C++ function.
You can use stringstreams, directly or via boost::lexical_cast<..>,
or, if you are game, you can use the put function of the num_put facet of
the current locale to do what a string stream would do, writing directly
into the string you will return.
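A rough sketch of calling the num_put facet by hand. It is an assumption-laden illustration: the function name is hypothetical, the classic "C" locale is imbued rather than the current one so the output is predictable, and the characters are collected in a std::stringbuf (via the standard ostreambuf_iterator facet) rather than written straight into a std::string:

```cpp
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Format an int by calling the num_put facet directly, which is roughly the
// step that operator<< delegates to. The function name is hypothetical.
std::string int_to_string_via_num_put(int value)
{
    std::stringbuf buffer;                 // receives the formatted characters
    std::ios format(&buffer);              // carries formatting flags and the locale
    format.imbue(std::locale::classic());  // assumption: plain "C" formatting is wanted

    const std::num_put<char>& facet =
        std::use_facet<std::num_put<char> >(format.getloc());
    // The cast selects the long overload; put() has no int overload.
    facet.put(std::ostreambuf_iterator<char>(&buffer), format, ' ',
              static_cast<long>(value));
    return buffer.str();
}
```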
cbarron3 | 6/7/2004 11:08:30 AM
I wrote:
> p8mode <p8mode@gmx.net> wrote:
>
> > What would you recommend is the best way to construct a std::string
> > from a numerical variable (eg an int).
>
> This is answered in sections 38.2 and 38.3 of the FAQ:
>
> http://www.parashift.com/c++-faq-lite/misc-technical-issues.html#faq-38.2
Oops. I got it backwards - I was thinking about converting a string
_into_ a numerical type. The correct FAQ reference is to section
38.1:
http://www.parashift.com/c++-faq-lite/misc-technical-issues.html#faq-38.1
[snip]
> > Ive found _itoa() function on msdn thats part of the ms c runtime
> > library. This is presumably only available in their library though and
> > thus importable?
>
> You can use std::itoa(), which is a standard C function that, like all
> C functions, is available in C++ as well. If you are programming in
> C++, you should prefer the C++ way described in the FAQ unless you
> have a good reason not to.
As Carl Barron points out, itoa is not standard (although it is
available on my gcc implementation). Again, I was confused and had it
backwards - I was thinking of std::atoi(), which is standard. If you
insisted on doing it the C way, I suppose you could use sprintf and
std::string::c_str() - but I can't think of a good,
non-implementation-specific reason why you would.
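For completeness, a sketch of what the C-library route could look like. The function name is hypothetical, and std::snprintf is used on the assumption that a C99/C++11 library is available; plain sprintf would also work here because the buffer is comfortably large:

```cpp
#include <cstdio>
#include <string>

// Format an int with the C library and wrap the result in a std::string.
std::string int_to_string_c_style(int value)
{
    char buffer[32];  // ample for any 64-bit int, sign and terminator
    int length = std::snprintf(buffer, sizeof buffer, "%d", value);
    if (length < 0) {
        return std::string();  // encoding error reported by snprintf
    }
    return std::string(buffer, static_cast<std::string::size_type>(length));
}
```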
Best regards and mea culpa,
Tom
Thomas8675309 | 6/7/2004 10:53:57 PM
Thomas8675309@yahoo.com (Tom) wrote in message
news:<7b68d58f.0406061332.29befdbb@posting.google.com>...
> p8mode <p8mode@gmx.net> wrote:
> > What would you recommend is the best way to construct a std::string
> > from a numerical variable (eg an int).
> This is answered in sections 38.2 and 38.3 of the FAQ:
>
> http://www.parashift.com/c++-faq-lite/misc-technical-issues.html#faq-38.2
> Take a look at the rest of the FAQ as well - I think you'll find it
> useful.
> > Ive found _itoa() function on msdn thats part of the ms c runtime
> > library. This is presumably only available in their library though
> > and thus importable?
> You can use std::itoa(), which is a standard C function that, like all
> C functions, is available in C++ as well.
That's news to me. I was unable to find it in the C standard.
There is a reason why the C standard has functions like atoi(), but not
the reverse: there is no universally valid semantics for itoa().
Although there is normally only one way of representing a value as an
int (or at least, only one canonical way), the same is hardly true for
representing it as a string.
There is one thing that intrigues me in the original posting. He
suggests as his own implementation generating the digits with 'a'+d
(where d is the value of the digit). First, of course, this isn't very
well defined -- there is no guarantee in the standard that 'a'+1 ==
'b'. So you should normally use an array, e.g.:
"abcdefghijklmnopqrstuvwxyz"[d]. Except that if you start with 'a' for
0, the numbers won't be very legible to humans. While it seems
reasonable to suppose that the original poster wasn't aware that letters
in the alphabet might not be contiguous (since they are contiguous under
Windows or any of the Unix-like systems), I really can't imagine that he
thinks that "a" is the usual representation for 0, "b" for 1, and so
on. So I'm led to conclude that his goal in the conversion is something
particular.
If the goal is not legibility, there was an interesting thread here not
too long ago about converting using base 96, or a little less, which
resulted in the shortest possible string consisting of only printable
ASCII characters. (I forget the details, although I am the one that
posted many of them.)
--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
kanze | 6/7/2004 10:56:07 PM
Hi,
kanze@gabi-soft.fr wrote:
> There is a reason why the C standard has functions like atoi(), but not
> the reverse: there is no universally valid semantics for itoa().
Could you please clarify the justification for this apparent lack of
symmetry?
If there is no "universally valid semantics" for itoa, then I think it
is also true for atoi. These functions should reverse each other in the
following sense:
int i = ...;
assert(i == atoi(itoa(i)));
(forget the exact interface of these functions, I'm talking about
conceptual complementation)
> There is one thing that intregues me in the original posting. He
> suggests as his own implementation generating the digits with 'a'+d
> (where d is the value of the digit). First, of course, this isn't very
> well defined -- there is no guarantee in the standard that 'a'+1 ==
> 'b'.
I believe he meant '0' + 1 == '1'. This is a reasonable expectation, and I
suspect it is also backed up by either the C or the C++ standard.
--
Maciej Sobczak : http://www.msobczak.com/
Programming : http://www.msobczak.com/prog/
Maciej | 6/9/2004 12:22:50 PM
Maciej Sobczak <no.spam@no.spam.com> wrote in message
news:<ca5ds8$ktu$1@atlantis.news.tpi.pl>...
> kanze@gabi-soft.fr wrote:
> > There is a reason why the C standard has functions like atoi(), but
> > not the reverse: there is no universally valid semantics for
> > itoa().
> Could you please clarify the justification for this apparent lack of
> symmetry?
Because the functions aren't symmetric. A function can only return one
value. So a function can implement an n->1 mapping, but not a 1->n.
Atoi is n->1. Itoa, if it existed, would be 1->n. To be usable, you
must be able to specify which of the n you want. That's why there are
so many options for formatting integers in (s)printf.
> If there is no "universally valid semantics" for itoa, then I think it
> is also true for atoi. These functions should reverse each other in
> the following sense:
> int i = ...;
> assert(i == atoi(itoa(i)));
> (forget the exact interface of these functions, I'm talking about
> conceptual complementation)
And what about:
assert( strcmp(itoa(atoi(someString)), someString) == 0 ) ;
It's mathematically impossible, since we don't have a bijection.
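A small sketch of the many-to-one point: several different strings convert to the same int, so converting back can reproduce only one of them. `canonical` and `round_trips` are hypothetical helper names, and the stream-based conversion merely stands in for whatever itoa would be:

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Canonical decimal form of an int, produced via a string stream
// (hypothetical helper for this illustration).
std::string canonical(int value)
{
    std::ostringstream out;
    out << value;
    return out.str();
}

// True if converting text to int and back reproduces the text exactly.
bool round_trips(const std::string& text)
{
    return canonical(std::atoi(text.c_str())) == text;
}
```

With these, "42", "042", "+42" and "  42" all yield 42 from std::atoi, but only "42" survives the round trip.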
> > There is one thing that intregues me in the original posting. He
> > suggests as his own implementation generating the digits with 'a'+d
> > (where d is the value of the digit). First, of course, this isn't
> > very well defined -- there is no guarantee in the standard that
> > 'a'+1 == 'b'.
> I believe he meant '0' + 1 == '1'. This is reasonable expectation and
> I suspect it is also backed up by either C or C++ standard.
Only for the ten digits in the basic character set. For no other
characters.
--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
kanze | 6/10/2004 10:53:35 AM
In article <ca5ds8$ktu$1@atlantis.news.tpi.pl>, Maciej Sobczak
<no.spam@no.spam.com> writes
>Hi,
>
>kanze@gabi-soft.fr wrote:
>
>> There is a reason why the C standard has functions like atoi(), but not
>> the reverse: there is no universally valid semantics for itoa().
>
>Could you please clarify the justification for this apparent lack of
>symmetry?
Because the processes are only superficially symmetrical. In order to
convert ascii to int you just need storage for an int and that is of a
size known to the compiler and a function whose declaration is:
int atoi(char const *);
works fine. However:
char const * itoa(int i);
has a serious problem which starts with the return type (char const *).
To what is the pointer pointing?
>If there is no "universally valid semantics" for itoa, then I think it
>is also true for atoi. These functions should reverse each other in the
>following sense:
The problem isn't with the 'semantics' but with the way such semantics
would have to be provided.
>
>int i = ...;
>assert(i == atoi(itoa(i)));
Great, and how do you propose to deal with the memory leak?
>
>(forget the exact interface of these functions, I'm talking about
>conceptual complementation)
No, we cannot ignore the implementation details; they are an inherent
part of the problem. Now if we were to tackle the problem in C++ rather
than just inheriting a solution from C, we might use std::string, but that
is a different issue.
>
>
>> There is one thing that intregues me in the original posting. He
>> suggests as his own implementation generating the digits with 'a'+d
>> (where d is the value of the digit). First, of course, this isn't very
>> well defined -- there is no guarantee in the standard that 'a'+1 ==
>> 'b'.
>
>I believe he meant '0' + 1 == '1'. This is reasonable expectation and I
>suspect it is also backed up by either C or C++ standard.
Yes, for some reason C over-specified its native character set by
requiring that the ten digits have consecutive numerical values. Note
that no such requirement is made for any other sub-set of the native
character set.
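The digit guarantee can be sketched as follows; the helper names are hypothetical. The same arithmetic is not guaranteed to work for letters:

```cpp
#include <cassert>

// Character for a single decimal digit value in [0, 9]. Relies on the
// guarantee that '0'..'9' have consecutive codes.
char digit_to_char(int digit)
{
    assert(digit >= 0 && digit <= 9);
    return static_cast<char>('0' + digit);
}

// Value of a decimal digit character, or -1 if c is not a decimal digit.
int char_to_digit(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}
```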
--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects
Francis | 6/10/2004 3:06:23 PM
Hi,
kanze@gabi-soft.fr wrote:
> > Could you please clarify the justification for this apparent lack of
> > symmetry?
>
> Because the functions aren't symetric.
[...]
> And what about ?
> assert( strcmp(itoa(atoi(someString)), someString) == 0 ) ;
>
> It's mathematically impossible, since we don't have a bijection.
So?
For advanced formatting options we have streams.
The question about itoa is a FAQ (even literally) and every time
somebody asks this question, the intention is to have the most basic
formatting, most often the equivalent of printf("%d", i); or s << i;
with no modifiers applied and basic locale.
For me, the fact that lots of people want it, and that lots of them also
implement it, is an indication that it is a good candidate for the standard
library.
Take into account the following:
using namespace boost;
// ...
assert(i == lexical_cast<int>(lexical_cast<std::string>(i)));
The fact that it does not always work the other way round does not
change the fact that lexical_cast is a really handy tool.
There are many other examples of pairs of functions that are not
bijections, but nevertheless are useful. Some of them can be made
bijective by just limiting their domains, for example sqr (not in C/C++,
but in other languages defined as x^2) and sqrt reverse each other for x
>= 0.
In the case of itoa and atoi, there is a set of strings and integers for
which the assertions in both directions are always true. In my opinion, the
bare fact that the full domain of atoi is bigger than the codomain of itoa
does not explain the absence of itoa from the stdlib.
> > I believe he meant '0' + 1 == '1'. This is reasonable expectation and
> > I suspect it is also backed up by either C or C++ standard.
>
> Only for the ten digits in the basic character set. For no other
> characters.
Sure. Yet this guarantee is enough (and usually used) to allow very
efficient implementations of itoa, at least for base 10.
--
Maciej Sobczak : http://www.msobczak.com/
Programming : http://www.msobczak.com/prog/
Maciej | 6/10/2004 11:19:11 PM
Hi,
Francis Glassborow wrote:
>>Could you please clarify the justification for this apparent lack of
>>symmetry?
>
> Because the processes are only superficially symmetrical.
[...]
> char const * itoa(int i);
>
> has a serious problem which starts with the return type (char const *).
> To what is the pointer pointing?
To the buffer I'll give it? (that's why I wrote "forget the exact
interface").
One of the versions (the first that Google returned) has the signature:
char * itoa(int value, char * buffer, int radix);
In other words, it is *possible* to implement such a function. Its
conceptual interface can be described as "takes an int and returns its
string representation". And that's all that's usually needed.
The articles in CUJ referenced in one of my previous posts present many
alternative approaches to this problem, but the concept is always the
same and all the assertions I wrote refer to this concept. The memory
management issues are not important here and can be always resolved, for
example as in the example above.
The question remains: why is there no function in the standard library
that would produce simple strings out of ints?
For the sake of programming style, a function that operates on const
char * would belong only to the C stdlib, although its more "civilized"
version:
std::string to_string(int);
would for sure find its place in the C++ stdlib.
(implementations of std::string with the small string optimization would be
a very good place to show the good points of such a function)
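For readers on a later toolchain: C++11 added std::to_string overloads for the arithmetic types in <string>, with essentially this signature. A minimal sketch, assuming a C++11 or newer compiler (`describe_count` is a hypothetical name):

```cpp
#include <string>

// Requires C++11 or later: std::to_string(int) returns the decimal text.
std::string describe_count(int count)
{
    return "count=" + std::to_string(count);
}
```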
boost::lexical_cast is fine, but performance is a strong argument
against it, especially when all that is required is the simplest "canonical"
string representation, without any locale trickery.
Hence the Frequently Asked Question (it is really in the FAQ) and hence
the articles in CUJ.
People need it and people make it.
If there is a need and it is feasible to do it, why is it not in the stdlib?
> The problem isn't with the 'semantics' but with the way such semantics
> would have to be provided.
> Yes, for some reason C over-specified its native character set by
> requiring that the ten digits have consecutive numerical values.
I wasn't there when that was decided, but I'm almost dead sure that the
motivation for this over-specification was exactly to allow fast
numeric-to-string and string-to-numeric conversions in base 10.
--
Maciej Sobczak : http://www.msobczak.com/
Programming : http://www.msobczak.com/prog/
Maciej | 6/11/2004 10:39:23 AM
Maciej Sobczak <no.spam@no.spam.com> wrote in message
news:<ca9jd5$7b3$1@nemesis.news.tpi.pl>...
> kanze@gabi-soft.fr wrote:
>>> Could you please clarify the justification for this apparent lack
>>> of symmetry?
>> Because the functions aren't symetric.
> [...]
>> And what about ?
>> assert( strcmp(itoa(atoi(someString)), someString) == 0 ) ;
>> It's mathematically impossible, since we don't have a bijection.
> So?
So there is no symmetry, so the argument in favor of symmetry is
worthless.
> For advanced formatting options we have streams. The question about
> itoa is a FAQ (even literally) and every time somebody asks this
> question, the intention is to have the most basic formatting, most
> often the equivalent of printf("%d", i); or s << i; with no modifiers
> applied and basic locale.
As far as I can see, every time someone asks the question, it is a
beginner who doesn't realize the lack of symmetry.
The differences are, of course, much more obvious for double; it's true
that most of the time, the simple equivalent of %d would be an
acceptable default. But there is no acceptable default for double that
I can see. And symmetry argues against having itoa, but not ftoa.
> For me, the fact that lots of people want it and lots of them also
> implement it, is an indication that it is a good candidate for
> standard library.
> Take into account the following:
> using namespace boost;
> // ...
> assert(i == lexical_cast<int>(lexical_cast<std::string>(i)));
> The fact that it does not always work the other way round does not
> change the fact that lexical_cast is a really handy tool.
I can't say that I've ever found a use for it. Most of Boost is very
useful. lexical_cast is the exception.
> There are many other examples of pairs of functions that are not
> bijections, but nevertheless are useful. Some of them can be made
> bijective by just limiting their domains, for example sqr (not in
> C/C++, but in other languages defined as x^2) and sqrt reverse each
> other for x >= 0.
> In the case of itoa and atoi, there is a set of strings and integers in
> which assertions in both directions are always true. In my opinion, the
> bare fact that the full domain of atoi is bigger than codomain of itoa
> does not explain the absence of itoa in stdlib.
>>> I believe he meant '0' + 1 == '1'. This is reasonable expectation
>>> and I suspect it is also backed up by either C or C++ standard.
>> Only for the ten digits in the basic character set. For no other
>> characters.
> Sure. Yet this guarantee is enough (and usually used) to allow very
> efficient implementations of itoa, at least for base 10.
Actually, it's more useful on input conversions, where it can, in a very
few, limited, cases, be used to avoid a table lookup. When converting
for output, directly indexing into the table is more flexible, and not
generally more expensive. Every time I've implemented the code, I've
used something like `"0123456789abcdef"[ value % base ]'.
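A sketch of the table-indexing approach for output conversion. The function name, the std::string return type and the base range of 2 to 16 are assumptions made for this illustration, not details from the post:

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Convert value to text in the given base (2..16) using a digit table.
// Works on the unsigned magnitude so the most negative int is handled too.
std::string int_to_string_base(int value, int base)
{
    assert(base >= 2 && base <= 16);
    static const char digits[] = "0123456789abcdef";
    const unsigned int ubase = static_cast<unsigned int>(base);

    unsigned int magnitude = static_cast<unsigned int>(value);
    if (value < 0) {
        magnitude = 0u - magnitude;  // well-defined modular negation
    }

    std::string result;
    do {
        result += digits[magnitude % ubase];
        magnitude /= ubase;
    } while (magnitude != 0);

    if (value < 0) {
        result += '-';
    }
    std::reverse(result.begin(), result.end());  // digits were produced low-to-high
    return result;
}
```

Because the digits come from a table, no assumption is needed about the letters 'a' to 'f' having consecutive codes.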
--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
kanze | 6/12/2004 12:37:52 AM
"Maciej Sobczak" <no.spam@no.spam.com> wrote in message
news:caak2n$anj$1@atlantis.news.tpi.pl...
> The question remains: why there is no function in the standard library
> that would produce simple strings out of ints?
And the answer remains: see sprintf. We intentionally dropped itoa
when we standardized C, just as we intentionally dropped open/close
etc.
> Hence the Frequently Asked Question (it is really in the FAQ) and hence
> the articles in CUJ.
>
> People need it and people make it.
> If there is a need and it is feasible to do it, why it is not in stdlib?
Because, perhaps, not every frigging thing that people need and people
make *has* to end up in a standard library. See Java.
> > Yes, for some reason C over-specified its native character set by
> > requiring that the ten digits have consecutive numerical values.
>
> I wasn't there when that was decided, but I'm almost dead sure that the
> motivation for this over-specification was exactly to allow fast
> numeric-to-string and string-to-numeric conversions in base 10.
Exactly. Another intentional decision on the part of the C committee.
P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
P | 6/12/2004 8:32:19 AM
In article <caak2n$anj$1@atlantis.news.tpi.pl>, Maciej Sobczak
<no.spam@no.spam.com> writes
>Francis Glassborow wrote:
>
>>>Could you please clarify the justification for this apparent lack of
>>>symmetry?
>>
>> Because the processes are only superficially symmetrical.
>[...]
>> char const * itoa(int i);
>>
>> has a serious problem which starts with the return type (char const *).
>> To what is the pointer pointing?
>
>To the buffer I'll give it? (that's why I wrote "forget the exact
>interface").
>One of the versions (the first that Google returned) has the signature:
>
>char * itoa(int value, char * buffer, int radix);
But that 'exact' interface completely breaks any putative symmetry.
>
>In other words, it is *possible* to implement such function. Its
>conceptual interface can be described as "takes an int and returns its
>string representation". And that's all that's usually needed.
>The articles in CUJ referenced in one of my previous posts present many
>alternative approaches to this problem, but the concept is always the
>same and all the assertions I wrote refer to this concept. The memory
>management issues are not important here and can be always resolved, for
>example as in the example above.
Interesting but that is not true in the common C implementations of
itoa(). The argument is for symmetry and not for one more place to
provide opportunities for buffer overruns.
>
>The question remains: why there is no function in the standard library
>that would produce simple strings out of ints?
Because in the days when that part of the library was being developed we
were less conscious of the advantages to be gained from using a genuine
string type rather than an array of char.
>For the sake of programming style, a function that operates on const
>char * would belong only to the C stdlib, although its more "civilized"
>version:
C tended to assume competence among programmers and only provided the
kind of common library subset that made porting operating systems easy
(have a careful look at the C Standard Library and note all the things
that are missing as well as the odd things that are there)
>
>std::string to_string(int);
But why not to_wstring or to_japanese_string etc.? Yes, I happen to agree
that providing a std::string version would make sense, but for rather
more things than just int.
>
>would for sure find its place in the C++ stdlib.
>(implementations of std::string with small string optimizations would be
>a very good place to show the good points of such function)
Yes, but of course those are a post 1998 development.
>
>boost::lexical_cast is fine, but performance is a strong argument
>against it, especially when all is required is the simplest "canonical"
>string representation, without any locale trickery.
Hm... I am less convinced. Don't you think itoa() should be locale
correct? Even more so when we want to convert double to string.
Shouldn't all fundamental types be treated equally?
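To illustrate what "locale correct" can mean for the stream-based conversions, here is a sketch in which the same value formats differently depending on the imbued locale. `GroupedDigits` and `format_with_locale` are hypothetical names, and the comma grouping is an arbitrary choice for the example:

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical numpunct facet that groups digits in threes with ','.
struct GroupedDigits : std::numpunct<char> {
    char do_thousands_sep() const { return ','; }
    std::string do_grouping() const { return "\3"; }
};

// Format value using whatever numeric conventions the given locale carries.
std::string format_with_locale(long value, const std::locale& loc)
{
    std::ostringstream out;
    out.imbue(loc);
    out << value;
    return out.str();
}
```

A locale built as std::locale(std::locale::classic(), new GroupedDigits) produces grouped output, while the classic locale produces plain digits, so a single "canonical" itoa result has to pick one convention.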
>
>Hence the Frequently Asked Question (it is really in the FAQ) and hence
>the articles in CUJ.
>
>People need it and people make it.
>If there is a need and it is feasible to do it, why it is not in stdlib?
Well propose it to WG14 :-) But I think a rather bigger collection of
additions might make more impact.
>
>> The problem isn't with the 'semantics' but with the way such semantics
>> would have to be provided.
>
>> Yes, for some reason C over-specified its native character set by
>> requiring that the ten digits have consecutive numerical values.
>
>I wasn't there when that was decided, but I'm almost dead sure that the
>motivation for this over-specification was exactly to allow fast
>numeric-to-string and string-to-numeric conversions in base 10.
Yes, and a classic example of premature optimisation:-) There is no
reason that platforms where the codes for '0' to '9' are consecutive
shouldn't use fast code but there is also no reason to require platforms
to enable such coding mechanisms. I mean let us be logical, why not
require that '0' through '9' through 'A' to 'F' be consecutive to allow
fast coding for hexadecimal?
--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects
Francis | 6/12/2004 11:06:26 AM
In article <hylyc.12144$H65.10299@nwrddc02.gnilink.net>, P.J. Plauger
<pjp@dinkumware.com> writes
> > > Yes, for some reason C over-specified its native character set by
> > > requiring that the ten digits have consecutive numerical values.
> >
> > I wasn't there when that was decided, but I'm almost dead sure that the
> > motivation for this over-specification was exactly to allow fast
> > numeric-to-string and string-to-numeric conversions in base 10.
>
>Exactly. Another intentional decision on the part of the C committee.
Being intentional does not make it correct :-) In this case I think it
is almost completely academic because I know of no character set where
the codes for digits are not consecutive. I say almost because in modern
times we have the possibility that a character set may have more than
one set of consecutive representations of the digits (for example 0x3021
through 0x3029 represent Suzhou numerals in the Unicode coding)
--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects
Francis | 6/12/2004 7:58:30 PM
Hi,
I have to admit that the posts from you, from Mr. Kanze and from Mr.
Plauger prompted me to look at itoa from some distance.
But let's continue a bit, there are still some interesting issues.
Francis Glassborow wrote:
>>char * itoa(int value, char * buffer, int radix);
>
> But that 'exact' interface completely breaks any putative symmetry.
Yes, but that's only one of the possible approaches.
There are others, including the use of static local buffers, thread-local
storage and other stuff. But they all differ only in their
tradeoffs, meaning that none is perfectly suited for the stdlib.
Which provokes an interesting question.
Possibly, atoi was the "safest" to specify and implement, without
risking any tradeoff related to memory management. But I cannot believe
that the mood of the standards committee was "hey, let's do the easy stuff and let
others do their homework with the non-obvious pieces".
I've asked why there is no itoa in the stdlib.
I should be more verbose: why is there no itoa in the stdlib if there is
atoi? Or, even better: what was the reason to provide atoi if there was
no wish to do itoa?
Or, probably the best of all: "why is there atoi in the first place"?
Was it considered useful? Was it considered better than sscanf?
If it was useful and better, why is there no itoa?
If atoi can be explained in terms of sscanf, then itoa can be explained
in terms of sprintf. With the same level of safety, similar tradeoffs,
etc. I just cannot understand that having a pair sscanf/sprintf, there
is no pair atoi/itoa. There is only atoi, fooling around and provoking
discussions about symmetry, feasibility and applicability. Every
argument for and against itoa can be applied also for sprintf.
Possibly, without atoi, there would never be a question and debate about
asymmetry in the library.
>>std::string to_string(int);
>
> But why not to_wstring or to_japanese_string etc.?
OK.
So why is there no atoi_from_wstring or atoi_from_japanese_string?
Why is there only atoi?
What still bothers me is a hole in the library, because there is no
itoa. atoi looks like part of something bigger that was started but
never finished.
If there is atoi (and there is), there should be a big bunch of related
functions, covering other types, wide strings and... reverse conversions.
There is either a hole in stdlib (no itoa) or a stranger (atoi).
> Hm... I am less convinced. Don't you think itoa() should be locale
> correct?
Well...
atoi is defined as equivalent to strtol, which can accept various input
based on the locale in use. So, yes, itoa should use the locale as well, in
a way that would be recognized by a subsequent call to atoi. This is
also the kind of symmetry I expressed in the conceptual assertions.
--
Maciej Sobczak : http://www.msobczak.com/
Programming : http://www.msobczak.com/prog/
Maciej
|
6/13/2004 10:15:07 AM
|
|
Francis Glassborow <francis@robinton.demon.co.uk> writes:
[...]
|> >The question remains: why there is no function in the standard
|> >library that would produce simple strings out of ints?
|> Because in the days when that part of the library was being
|> developed we were less conscious of the advantages to be gained from
|> using a genuine string type rather than an array of char.
Would you say then that it might be a good idea to have a function
itoa() (probably under a different name) which returned a std::string?
What about ftoa(), then?
Depending on the answer to the last question:
- Why support this for integers and not for floating point, or
- What should the semantics of the function be?
--
James Kanze
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34
James
|
6/13/2004 10:23:01 AM
|
|
In article <caftpf$ssa$1@nemesis.news.tpi.pl>, Maciej Sobczak
<no.spam@no.spam.com> writes
>Francis Glassborow wrote:
>
>>>char * itoa(int value, char * buffer, int radix);
>>
>> But that 'exact' interface completely breaks any putative symmetry.
>
>Yes, but that's only one of the possible approaches.
And there is the key. Where there was more than one way to do something
with different trade-offs C nearly always leaves it up to the user to
decide what they want to do. atoi() has just about only one reasonable
semantic. itoa() has a range of different ones. In C++ we might get
somewhere with either overloading or a template but we chose to leave
the area alone. If you feel strongly enough write up a consistent
proposal to WG21 for conversion of all arithmetic types to a string type
but please do not try to special case integer types -- they do not
deserve to have special treatment.
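As an illustration of the template route, and not as a proposal, a stream-based sketch could look like this. to_text is a hypothetical name. It inherits the behaviour of operator<<, so char types print as characters and floating-point values use the stream's default precision, which is exactly the kind of semantic question a real proposal would have to settle.

```cpp
#include <sstream>
#include <string>

// Converts any streamable value to a std::string using the default
// formatting of operator<<. No type is special-cased.
template <typename T>
std::string to_text(T const& value)
{
    std::ostringstream out;
    out << value;
    return out.str();
}
```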
--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects
Francis
|
6/14/2004 2:23:50 AM
|
|
In article <m2isdwzked.fsf@lns-vlq-29-82-254-1-63.adsl.proxad.net>,
James Kanze <kanze@gabi-soft.fr> writes
>Francis Glassborow <francis@robinton.demon.co.uk> writes:
>
> [...]
>|> >The question remains: why there is no function in the standard
>|> >library that would produce simple strings out of ints?
>
>|> Because in the days when that part of the library was being
>|> developed we were less conscious of the advantages to be gained from
>|> using a genuine string type rather than an array of char.
>
>Would you say then that it might be a good idea to have a function
>itoa() (probably under a different name) which returned a std::string?
>What about ftoa(), then?
>
>Depending on the answer to the last question:
> - Why support this for integers and not for floating point, or
I wouldn't, though the fact that they have exact representations gives
them some advantages, they aren't, IMO, that special.
> - What should the semantics of the function be?
It isn't my proposal but that should be addressed by anyone who wishes
to make such a proposal.
--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects
Francis
|
6/14/2004 2:24:41 AM
|
|
Francis Glassborow <francis@robinton.demon.co.uk> writes:
|> In article <hylyc.12144$H65.10299@nwrddc02.gnilink.net>, P.J. Plauger
|> <pjp@dinkumware.com> writes
|> > > > Yes, for some reason C over-specified its native character
|> > > > set by requiring that the ten digits have consecutive
|> > > > numerical values.
|> > > I wasn't there when that was decided, but I'm almost dead sure
|> > > that the motivation for this over-specification was exactly to
|> > > allow fast numeric-to-string and string-to-numeric conversions
|> > > in base 10.
|> >Exactly. Another intentional decision on the part of the C committee.
|> Being intentional does not make it correct :-) In this case I think
|> it is almost completely academic because I know of no character set
|> where the codes for digits are not consecutive.
I've actually used one. And in the (distant) past, things like Gray
encoding were considered.
I don't know what the motivation of this decision was. I do know that
it wasn't extended to alphabetical characters, and I suspect (without
being 100% sure) that this is because EBCDIC was still fairly
widespread, and the alphabetical characters are not contiguous in
EBCDIC.
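In code, the practical consequence is roughly the following. This is a sketch with hypothetical names: '0' + d is portable for decimal digits because of the guarantee discussed above, while for larger radixes a lookup table avoids assuming that the letters are contiguous.

```cpp
#include <string>

// Portable for d in 0..9: the digits '0'..'9' have consecutive values.
char decimal_digit(int d)
{
    return static_cast<char>('0' + d);
}

// For radix 2..36 a table is used instead of 'a' + d, since the letters
// need not be contiguous (they are not in EBCDIC).
// Returns an empty string for an unsupported radix.
std::string to_radix(unsigned value, unsigned radix)
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    if (radix < 2 || radix > 36)
        return std::string();
    std::string result;
    do {
        result.insert(result.begin(), digits[value % radix]);
        value /= radix;
    } while (value != 0);
    return result;
}
```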
|> I say almost because in modern times we have the possibility that a
|> character set may have more than one set of consecutive
|> representations of the digits (for example 0x3021 through 0x3029
|> represent Suzhou numerals in the Unicode coding)
From the standard's point of view, the ONLY digits are the ten digits in
the basic character set. For the others, isdigit is required to return
false.
--
James Kanze
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34
James
|
6/14/2004 2:38:25 AM
|
|
Maciej Sobczak <no.spam@no.spam.com> writes:
|> I have to admit that the posts from you, from Mr. Kanze and from Mr.
|> Plauger prompted me to look at itoa from some distance.
|> But let's continue a bit, there are still some interesting issues.
|> Francis Glassborow wrote:
|> >>char * itoa(int value, char * buffer, int radix);
|> > But that 'exact' interface completely breaks any putative
|> > symmetry.
Note that we are really talking about C here. The atoi in C++ is
inherited from C, and only C needs the extended interface. In C++, it
would be relatively simple to address the memory management issues
using:
int atoi( std::string const& ) ;
std::string itoa( int ) ;
The C++ standard didn't decide to do this, however. (Had they decided
to go this route, there would also be questions about functions like
strftime.)
|> Yes, but that's only one of the possible approaches. There are
|> others, including the use of static local buffers, thread local
|> storage and other stuff. But they all differ only in their
|> tradeoffs, meaning that neither is perfectly suited for stdlib.
For various historical reasons, there are a number of functions in the C
library which return pointers to static buffers. I don't think that
this was the motivation for excluding itoa.
|> Which provokes an interesting question. Possibly, atoi was the
|> "safest" to specify and implement, without risking any tradeoff
|> related to memory management. But I cannot believe that the mood of
|> the standards committee was "hey, let's do easy stuff and let others do
|> their homework with non-obvious pieces".
|> I've asked why there is no itoa in stdlib. I should be more
|> verbose: why is there no itoa in stdlib if there is atoi? Or, even
|> better: what was the reason to provide atoi if there was no wish to
|> do itoa?
A valid question? Why is there an atoi, when strtol is always a better
choice? (Because of its defined behavior in case of errors.)
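For readers who have not used strtol this way, a minimal sketch of the error checking it permits follows. parse_int is a hypothetical helper; the policy of rejecting trailing characters is a choice made for this example, not something strtol imposes.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Stores the parsed value in result and returns true only if the whole
// string is a decimal integer that fits in an int.
bool parse_int(const char* text, int& result)
{
    char* end = nullptr;
    errno = 0;
    long value = std::strtol(text, &end, 10);
    if (end == text)
        return false;                 // no digits were found
    if (*end != '\0')
        return false;                 // trailing characters after the number
    if (errno == ERANGE)
        return false;                 // out of range for long
    if (value < INT_MIN || value > INT_MAX)
        return false;                 // fits in long but not in int
    result = static_cast<int>(value);
    return true;
}
```

With atoi, none of these cases can be told apart from a legitimate result, and the out-of-range case is undefined behavior.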
|> Or, probably the best of all: "why is there an atoi in the first
|> place"? Was it considered useful? Was it considered better than
|> sscanf? If it was useful and better, why is there no itoa?
Attitudes concerning what is good and acceptable evolve. Almost by
definition, a program today which uses gets is considered broken. But
at one time, someone apparently thought that the function would be a
good idea, and it was widespread enough that the C standards committee
felt obliged to incorporate it, although by the end of the 80's, its
problems were well known. (The C standard was adopted after the
Internet worm, which propagated by exploiting buffer overflow in a
couple of Unix daemons. fingerd, if I recall correctly, and I wouldn't
be at all surprised if the buffer overflow wasn't due to the program
using gets.)
I presume that it was considered useful at one time. I don't see much
use for it today. But frankly, I never use atoi either -- I don't
consider having undefined behavior in case of erroneous input acceptable
program behavior today.
|> If atoi can be explained in terms of sscanf, then itoa can be
|> explained in terms of sprintf. With the same level of safety,
|> similar tradeoffs, etc. I just cannot understand that having a pair
|> sscanf/sprintf, there is no pair atoi/itoa.
The obvious difference is the fact that there is a one to many mapping.
And that it is relatively trivial for atoi to map the many into one,
whereas it is less obvious for itoa to know which of the many it should
map the one into.
In the case of itoa, of course, there is an almost canonical
representation, which could be summed up by the "%d" format. But in the
same way you place itoa in a set of functions with atoi, I place it in a
set with ftoa. And in the case of ftoa, there most definitely isn't a
canonical representation; listen to the complaints concerning appending
a double to String in Java.
|> There is only atoi, fooling around and provoking discussions about
|> symmetry, feasibility and applicability.
There is also atof. And, potentially, itoa and ftoa.
And there is also strtoi, strtol, strtof...
There's also strftime; I could easily imagine arguments for strfi,
strff, etc. along the same lines.
The total set is large, and there are many possible subsets.
|> Every argument for and against itoa can be applied also for sprintf.
No. The problem of the one to many mapping doesn't exist, since sprintf
provides for specifying the mapping. On the other hand, the problem of
memory management is, if anything, worse. Not to mention the usual
problems of the printf family; there are reasons why we say to use
ostream in C++. (For what it's worth, sprintf is deprecated in C.
Doubtlessly because of the memory management issues, since the
replacement, snprintf, is still like printf, and has all of the other
problems.)
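Both points (the format string selects the mapping, and the caller owns the memory problem) show up in a small sketch like the one below. It assumes C++11 for std::snprintf, and to_hex_text is a hypothetical name. The first call only measures the required length; the second does the formatting.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Formats value as hexadecimal with a 0x prefix. The "%#x" format picks
// one of the many possible textual mappings of the number.
std::string to_hex_text(unsigned value)
{
    int needed = std::snprintf(nullptr, 0, "%#x", value);  // length only
    if (needed < 0)
        return std::string();                              // encoding error
    std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
    std::snprintf(buffer.data(), buffer.size(), "%#x", value);
    return std::string(buffer.data(), static_cast<std::size_t>(needed));
}
```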
|> Possibly, without atoi, there would never be a question and debate
|> about asymmetry in the library.
Possibly, without an ftoa, there would never have been a question of not
having itoa as part of the library. I don't know.
|> >>std::string to_string(int);
|> > But why not to_wstring or to_japanese_string etc.?
|> OK. So why is there no atoi_from_wstring or
|> atoi_from_japanese_string? Why is there only atoi?
Because both C and C++ only recognize western European digits. Unlike
most of <ctype.h>, for example isdigit (and isxdigit) is NOT locale
dependent. And the C++ standard requires std::num_put to generate the
ten digits in the basic character set -- while there are provisions in
numpunct for changing the decimal character and the thousands
separator, there are no provisions for changing the actual digits used.
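A short sketch of what numpunct does allow: the facet below (european_punct, a made-up name) swaps the decimal point and thousands separator and turns on grouping by threes. The digits written are still the ten basic ones. The code uses the C++11 override keyword.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: '.' separates thousands, ',' is the decimal point.
class european_punct : public std::numpunct<char>
{
protected:
    char do_decimal_point() const override { return ','; }
    char do_thousands_sep() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; }  // groups of 3
};

// Formats value through a stream imbued with the custom punctuation.
// The locale takes ownership of the facet allocated with new.
std::string format_grouped(long value)
{
    std::ostringstream out;
    out.imbue(std::locale(std::locale::classic(), new european_punct));
    out << value;
    return out.str();
}
```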
In many ways, I think that Glassborow is ahead of most of us in this
regard; he's certainly ahead of what the C++ standard requires. In some
of my own Unicode-based code, I provide an entry for the character to be
used for 0 (and my input routines accept anything defined as a decimal
digit in Unicode); when outputting, I add the value of the digit to
this. (All of the decimal digits in Unicode come in blocks of ten,
conveniently in order.) I have no idea whether this is really
sufficient; some of the alphabets have special characters for 100, or
IIRC, for 20; if it is necessary to support these, the resulting
function is going to be quite complicated. (I once wrote a function
which would convert integral values into French text. It wasn't
trivial, since parts of the French number system are base 20. For that
matter, even in English, you can't generate thirteen from ten and three.)
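The "entry for the character to be used for 0" idea can be sketched as follows. This is not the code described above, only an illustration of it: to_script_digits is a hypothetical name, and it assumes C++11 char32_t strings and a script whose ten decimal digits are contiguous and in order.

```cpp
#include <string>

// Writes value in decimal using the digits of a given script. zero is the
// code point of that script's digit zero, for example U'0' or U'\u0660'
// (ARABIC-INDIC DIGIT ZERO); each digit is zero plus the digit's value.
std::u32string to_script_digits(unsigned value, char32_t zero)
{
    std::u32string result;
    do {
        result.insert(result.begin(),
                      static_cast<char32_t>(zero + value % 10));
        value /= 10;
    } while (value != 0);
    return result;
}
```

Scripts with dedicated characters for 20 or 100 would need more than this offset trick.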
|> What still bothers me is a hole in the library, because there is no
|> itoa. atoi looks like part of something bigger that was started but
|> never finished.
|> If there is atoi (and there is), there should be a big bunch of
|> related functions, covering other types, wide strings and... reverse
|> conversions. There is either a hole in stdlib (no itoa) or a
|> stranger (atoi).
Since we're talking about C++, I would rather see an:
template< typename T >
T atot( std::string const& ) ;
In fact, it has been done. Except that it isn't named std::atot, but
boost::lexical_cast. (I don't much like the name, but that is neither
here nor there.)
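A stream-based sketch of such a template is shown below. It is not Boost's implementation and its error policy is an assumption made for the example: from_text (a hypothetical name) throws std::invalid_argument when extraction fails or when anything other than whitespace is left over.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Parses text as a T using operator>>. Throws std::invalid_argument if
// extraction fails or if non-whitespace characters follow the value.
template <typename T>
T from_text(std::string const& text)
{
    std::istringstream in(text);
    T value{};
    if (!(in >> value) || !(in >> std::ws).eof())
        throw std::invalid_argument("cannot convert: " + text);
    return value;
}
```

Usage would be along the lines of from_text<int>("42") or from_text<double>("2.5").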
|> > Hm... I am less convinced. Don't you think itoa() should be
|> > locale correct?
|> Well...
|> atoi is defined as equivalent to strtol, which can accept various input
|> based on the locale in use. So, yes, itoa should use the locale as well, in
|> a way that would be recognized by a subsequent call to atoi. This is
|> also the kind of symmetry I expressed in the conceptual assertions.
It's true that the specifications of strtol state that "In other than
the "C" locale, additional implementation-defined subject sequences may
be accepted." Which is pretty vague, and I don't know of an
implementation which does accept any. Other than that, strtol is
defined in terms of isspace() (which is locale dependent), and the
format of an integral constant in C (which is NOT locale dependent).
--
James Kanze
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34
James
|
6/14/2004 2:47:49 AM
|
|
In article <m2zn77woj1.fsf@lns-vlq-29-82-254-1-63.adsl.proxad.net>,
James Kanze <kanze@gabi-soft.fr> writes
>|> I say almost because in modern times we have the possibility that a
>|> character set may have more than one set of consecutive
>|> representations of the digits (for example 0x3021 through 0x3029
>|> represent Suzhou numerals in the Unicode coding)
>
>From the standard's point of view, the ONLY digits are the ten digits in
>the basic character set. For the others, isdigit is required to return
>false.
And reading the relevant sections of the C & C++ Standards (basic
character sets) leads me to think that the implicit requirement for
codes to represent the 10 glyphs used to represent the digits as 0, 1
... 9 is another example of Anglo-Saxon bias. Indeed the whole of 7.4.1
in C99 is deeply suspect once we move to 16-bit (or higher) character
sets. According to my reading, assuming a Unicode character set
islower(0x3021), isupper(0x3021), isalpha(0x3021) return true unless we
are in a C locale.
I think that this area of C and C++ is ripe for reconsideration.
--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects
Francis
|
6/14/2004 10:24:51 AM
|
|
James Kanze wrote:
> Maciej Sobczak <no.spam@no.spam.com> writes:
<snip>
>|> I've asked why there is no itoa in stdlib. I should be more
>|> verbose: why is there no itoa in stdlib if there is atoi? Or, even
>|> better: what was the reason to provide atoi if there was no wish to
>|> do itoa?
>
> A valid question? Why is there an atoi, when strtol is always a better
> choice? (Because of its defined behavior in case of errors.)
According to the Rationale:
"atof, atoi, and atol are subsumed by strtod and strtol, but have
been retained because they are used extensively in existing code.
They are less reliable, but may be faster if the argument is known
to be in a valid range."
>|> Or, probably the best of all: "why is there an atoi in the first
>|> place"? Was it considered useful? Was it considered better than
>|> sscanf? If it was useful and better, why is there no itoa?
>
> Attitudes concerning what is good and acceptable evolve. Almost by
> definition, a program today which uses gets is considered broken. But
> at one time, someone apparently thought that the function would be a
> good idea, and it was widespread enough that the C standards committee
> felt obliged to incorporate it, although by the end of the 80's, its
> problems were well known.
In any case gets is a vestigial part of the portable I/O library,
distributed with Unix version 6, which was obsoleted by the standard
I/O library in version 7 (released in 1979).
> (The C standard was adopted after the Internet worm, which
> propagated by exploiting buffer overflow in a couple of Unix
> daemons. fingerd, if I recall correctly, and I wouldn't be at all
> surprised if the buffer overflow wasn't due to the program using
> gets.)
It exploited three different kinds of vulnerability: a buffer overflow
in fingerd on 4.3BSD, a debugging mode in sendmail enabled by default
in several releases of BSD and SunOS, and poor password selection
combined with weak hashing in conventional Unix password files.
<snip>
> (I once wrote a function which would convert integral values into
> French text. It wasn't trivial, since parts of the French number
> system are base 20.
<snip>
Except in Switzerland, where it's much more sensible.
Ben
|
6/14/2004 9:21:18 PM
|
|
Ben Hutchings <do-not-spam-benh@bwsint.com> wrote in message
news:<slrnccraca.pld.do-not-spam-benh@shadbolt.i.decadentplace.org.uk>...
> James Kanze wrote:
> > Maciej Sobczak <no.spam@no.spam.com> writes:
> <snip>
> >|> I've asked why there is no itoa in stdlib. I should be more
> >|> verbose: why is there no itoa in stdlib if there is atoi? Or,
> >|> even better: what was the reason to provide atoi if there was no
> >|> wish to do itoa?
> > A valid question? Why is there an atoi, when strtol is always a
> > better choice? (Because of its defined behavior in case of errors.)
> According to the Rationale:
> "atof, atoi, and atol are subsumed by strtod and strtol, but have
> been retained because they are used extensively in existing code.
> They are less reliable, but may be faster if the argument is known
> to be in a valid range."
The same rationale is valid for itoa and ftoa.
[...]
> <snip>
> > (I once wrote a function which would convert integral values into
> > French text. It wasn't trivial, since parts of the French number
> > system are base 20.
> <snip>
> Except in Switzerland, where it's much more sensible.
And in Belgium.
In fact, even in English, you still have to handle eleven and twelve,
and even the rest of the teens are irregular. In the end, I
implemented a base 10, with pointers to the strings (like "twenty",
"thirty", etc.); if there was a null pointer, I dropped into base
twenty. And I used this to handle not only French, but the teens in
English.
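A rough reconstruction of that table scheme, for English and values below 100 only, might look like this. It is a sketch based on the description above, not the original code, and english_under_100 is a hypothetical name. The null entry for the tens digit 1 is what triggers the drop into base twenty; a French table would presumably carry further null entries for 70 and 90.

```cpp
#include <string>

// Spells out n (0..99) in English. A null entry in tens means that this
// decade has no word of its own, so the value is folded into the previous
// decade and a unit in the range 10..19 is used instead.
// Returns an empty string if n is out of range.
std::string english_under_100(unsigned n)
{
    static const char* const units[20] = {
        "zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"
    };
    static const char* const tens[10] = {
        "", nullptr, "twenty", "thirty", "forty", "fifty", "sixty",
        "seventy", "eighty", "ninety"
    };
    if (n >= 100)
        return std::string();
    unsigned t = n / 10;
    unsigned u = n % 10;
    if (tens[t] == nullptr) {   // drop into base twenty
        --t;
        u += 10;
    }
    std::string result = tens[t];
    if (u != 0 || result.empty()) {
        if (!result.empty())
            result += '-';
        result += units[u];
    }
    return result;
}
```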
I never did find a good solution to some of the other odd rules,
however: 100 (cent) is plural if there is more than one, but only for
round hundreds (e.g. "deux cents", but "deux cent quarante-deux"), and
above all, 1000 is written "mille" except in a year in a date, when it
is "mil". The first of these rules has since been abrogated by a
spelling reform. So my code handled everything but the last case
correctly, for English, French, German and Italian. Including the fact
that word order is different in German than in the other languages, and
that in French, you insert an "et" between the tens and a one, but not
between the tens and other values (e.g. vingt-et-un, but vingt-deux).
--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
kanze
|
6/15/2004 11:34:54 PM
|
|
Francis Glassborow <francis@robinton.demon.co.uk> wrote in message
news:<DgYfWpDD1WzAFwg4@robinton.demon.co.uk>...
> In article <m2zn77woj1.fsf@lns-vlq-29-82-254-1-63.adsl.proxad.net>,
> James Kanze <kanze@gabi-soft.fr> writes
> >|> I say almost because in modern times we have the possibility that
> >|> a character set may have more than one set of consecutive
> >|> representations of the digits (for example 0x3021 through 0x3029
> >|> represent Suzhou numerals in the Unicode coding)
> >From the standards point of view, the ONLY digits are the ten digits
> >in the basic character set. For the others, isdigit is required to
> >return false.
> And reading the relevant sections of the C & C++ Standards (basic
> character sets) leads me to think that the implicit requirement for
> codes to represent the 10 glyphs used to represent the digits as 0, 1
> ...9 is another example of Anglo-Saxon bias. Indeed the whole of
> 7.4.1 in C99 is deeply suspect once we move to 16-bit (or higher)
> character sets. According to my reading, assuming a Unicode character
> set islower(0x3021), isupper(0x3021), isalpha(0x3021) return true
> unless we are in a C locale.
> I think that this area of C and C++ is ripe for reconsideration.
I quite agree, but I'm still not sure just how far it should go. Using
a different set of digits seems obvious, but perhaps there are locales
where people don't normally use base 10. Should we consider this (a
locale specific base) as well? And what about a locale for Roman
numerals?
With regards to the results of islower, isupper, etc...
The forms you used correspond to the functions in <cctype>, not those in
<locale>. I don't know whether this was the intent; however, on my
machine, the calls you give result in undefined behavior -- the
constraint on the functions in <cctype> is that the argument be in the
range [0...UCHAR_MAX] or that it be EOF.
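That constraint is why the usual idiom for calling the <cctype> functions on a plain char goes through unsigned char first. A minimal sketch, with is_digit_char as a hypothetical wrapper name:

```cpp
#include <cctype>

// Safe wrapper: a plain char may be negative on platforms where char is
// signed, so it is converted to unsigned char before the call to keep the
// argument inside the range the function accepts.
bool is_digit_char(char c)
{
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}
```

Passing a value such as 0x3021 directly is outside that range no matter how it is cast, which is the problem described above.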
If you meant in fact the functions iswlower, iswupper and iswalpha, then
all the C standard requires is that iswalpha be true if either iswlower
or iswupper are true; they certainly don't require any of them to be
true, and assuming Unicode, it would seem evident that the intent is
that they all be false (in all locales). (It's interesting to note that
while the C standard allows iswalpha to return true when neither
iswupper nor iswlower return true, it doesn't forbid characters where
both iswupper and iswlower are true.)
Finally, if you really meant the templated functions from <locale>, and
just forgot the second parameter, then you have undefined behavior,
since these functions are normally only defined for char and wchar_t,
and 0x3021 has type int:-). Cast to wchar_t: 1) I don't see anything
which specifies the behavior for any locale, including "C", and in fact,
in the case of Unicode, I would expect all locales to behave
identically, and 2) assuming Unicode, I would also assume that the
Unicode character categorizations are used; at the very least, both
isupper and islower should return false. I'm not sure whether isalpha
should return true for an Nl or not; in my experimental implementation
of Unicode character classes, it doesn't. I agree that some indication
in the standard would be useful; something more along the lines of a TR
or a recommended practice, however, and not an absolute constraint.
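For reference, the two-argument forms from <locale> are called as in the sketch below. The wrapper names are hypothetical. What these functions return for a character such as wchar_t(0x3021) depends on the implementation's locale data, so the sketch makes no claim about it.

```cpp
#include <locale>

// The <locale> templates take the character and the locale explicitly.
// The character type must be one for which the locale has a ctype facet,
// which in practice means char or wchar_t.
bool is_digit_in(wchar_t c, std::locale const& loc)
{
    return std::isdigit(c, loc);
}

bool is_upper_in(wchar_t c, std::locale const& loc)
{
    return std::isupper(c, loc);
}
```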
(That said, I'm curious as to why Unicode classifies this character as
Nl, and not Nd. I don't know the alphabet in question, and how the
character is actually used, but in general, when I see a sequence of ten
numeric characters, with values from 0 to 9, I would certainly expect
Nd. Still, I'm not going to second guess Unicode; if Unicode says Nl,
then isdigit returns false. As does islower, isupper, and at least in
my implementation, isalpha. The only possible ambiguity that I see is
the last.)
--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
kanze
|
6/15/2004 11:35:28 PM
|
|
kanze@gabi-soft.fr wrote:
> In fact, even in English, you still have to handle eleven and twelve,
> and even the rest of the teens are irregular. In the end, I
> implemented a base 10, with pointers to the strings (like "twenty",
> "thirty", etc.); if there was a null pointer, I dropped into base
> twenty. And I used this to handle not only French, but the teens in
> English.
And did you check the location to know whether to use "thousand million"
or "billion"?
--
Andy V
Andy
|
6/16/2004 10:14:22 AM
|
|
Andy V <nobody@nowhere.net> wrote in message
news:<CvNzc.181$Zx3.6614@petpeeve.ziplink.net>...
> kanze@gabi-soft.fr wrote:
> > In fact, even in English, you still have to handle eleven and twelve,
> > and even the rest of the teens are irregular. In the end, I
> > implemented a base 10, with pointers to the strings (like "twenty",
> > "thirty", etc.); if there was a null pointer, I dropped into base
> > twenty. And I used this to handle not only French, but the teens in
> > English.
> And did you check the location to know whether to use
> "thousand million" or "billion"?
No. It was a long time ago, on a 16 bit machine, so the problem
didn't come up:-).
--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
kanze
|
6/17/2004 8:41:12 AM
|
|