### Max integer in a floating point type?

```Given <float.h> (or <limits>), how can you calculate the largest X such
that all the integers in [0, X] are exactly representable in floating
point type T?

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated.    First time posters: Do this! ]

```
Reply anti_spam_email2003 (129) 2/12/2006 11:41:18 PM

```In article <1139780739.125353.78060@f14g2000cwb.googlegroups.com>, Me
<anti_spam_email2003@yahoo.com> wrote:

> Given <float.h> (or <limits>), how can you calculate the largest X such
> that all the integers in [0, X] are exactly representable in floating
> point type T?
>
Not directly, but numeric_limits<T> has two const int members, digits
and digits10.  digits is the number of radix digits in the mantissa and
digits10 is the number of decimal digits that can be represented
without change, so the largest representable integer in a
floating point type is 2^numeric_limits<T>::digits - 1.

double max_int = std::pow(2., std::numeric_limits<T>::digits) - 1 as a
first approximation.

double calc_max_integer(unsigned int n)
{
    double ans = 1.;
    for (; n != 0; --n)
        ans += ans;      // doubles ans on each pass, giving 2^n
    return --ans;        // 2^n - 1
}

const double max_integer =
    calc_max_integer(std::numeric_limits<double>::digits);
etc...

```
Reply Carl 2/13/2006 6:43:04 AM

```Me wrote:
> Given <float.h> (or <limits>), how can you calculate the largest X such
> that all the integers in [0, X] are exactly representable in floating
> point type T?

I would use std::numeric_limits<float>::digits to calculate the maximum
integer value that a floating point type can represent with a ±1.0
precision:

#include <limits>
#include <math.h>

int main()
{
    float f = powf(2, std::numeric_limits<float>::digits);
    double d = pow(2, std::numeric_limits<double>::digits);
    ...

For 32-bit (single precision IEEE) floating point, X is 16777216.0. For
64-bit (double precision IEEE), X is 9007199254740992.0. The
corresponding minimum value for a floating point type would be -X.

Greg

```
Reply Greg 2/13/2006 6:44:40 AM
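A minimal sketch of the calculation in the post above, assuming a radix-2 format as the post does. The helper name `two_to_digits` is hypothetical; it uses `std::ldexp` instead of `pow` so that the power of two is produced by scaling the exponent directly.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: returns 2^digits for floating point type T.
// Assumes numeric_limits<T>::radix == 2 (true for IEEE 754 binary formats).
template <typename T>
T two_to_digits()
{
    assert(std::numeric_limits<T>::radix == 2);
    // ldexp(x, n) computes x * 2^n; for x == 1 the result is an exact power of two.
    return std::ldexp(T(1), std::numeric_limits<T>::digits);
}
```

On an implementation with IEEE single and double precision, `two_to_digits<float>()` gives 16777216 and `two_to_digits<double>()` gives 9007199254740992, matching the figures quoted in the post.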

```Greg Herlihy wrote:
> Me wrote:
> > Given <float.h> (or <limits>), how can you calculate the
> > largest X such that all the integers in [0, X] are exactly
> > representable in floating point type T?

> I would use std::numeric_limits<float>::digits to calculate
> the maximum integer value that a floating point type can
> represent with a ±1.0 precision:

>     #include <limits>
>     #include <math.h>

>     int main()
>     {
>          float f = powf(2, std::numeric_limits<float>::digits);
>          double d = pow(2, std::numeric_limits<double>::digits);

Why pow(2., ...)?  That won't work with IBM format, for example.
I'd use:

pow( std::numeric_limits< T >::radix,
std::numeric_limits< T >::digits ) ;

(Also, if you include <cmath>, instead of <math.h>, you should
have overloads of the pow function, so you don't need an
explicit powf.)

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

```
Reply kanze 2/13/2006 11:48:01 AM
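The radix-aware expression from the post above could be sketched as follows. The helper name `radix_to_digits` is hypothetical, and repeated multiplication is used here in place of `pow` purely as an illustration, so the result does not depend on which `pow` overload is selected.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: radix^digits computed by repeated multiplication.
template <typename T>
T radix_to_digits()
{
    typedef std::numeric_limits<T> lim;
    assert(!lim::is_integer && lim::radix >= 2);
    const T radix = static_cast<T>(lim::radix);
    T result = T(1);
    for (int i = 0; i < lim::digits; ++i)
        result *= radix;   // every intermediate value is a power of the radix
    return result;
}
```

For a radix-2 type this returns the same value as `pow(2, digits)`; the difference only shows up on formats with another radix.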

```kanze wrote:
> Greg Herlihy wrote:
> > Me wrote:
> > > Given <float.h> (or <limits>), how can you calculate the
> > > largest X such that all the integers in [0, X] are exactly
> > > representable in floating point type T?
>
> > I would use std::numeric_limits<float>::digits to calculate
> > the maximum integer value that a floating point type can
> > represent with a ±1.0 precision:
>
> >     #include <limits>
> >     #include <math.h>
>
> >     int main()
> >     {
> >          float f = powf(2, std::numeric_limits<float>::digits);
> >          double d = pow(2, std::numeric_limits<double>::digits);
>
> Why pow(2., ...)?  That won't work with IBM format, for example.
> I'd use:
>
>      pow( std::numeric_limits< T >::radix,
>           std::numeric_limits< T >::digits ) ;

The value 2 must be used in the pow() expression if we are to obtain an
answer with the requisite precision for any floating point type. Using
the actual radix of the floating point format (should it not happen to
be 2) would lead to an answer with a different - and incorrect -
precision.

The required granularity of 1.0 dictates the value of the exponent in
the floating point representation. So the question then becomes the maximum
representable value of the significand. The answer is:

pow( 2, std::numeric_limits< T >::digits) - 1;

For floating point values with a radix 2 representation, the next
higher integer is also exactly representable, hence my original answer.

Greg

```
Reply Greg 2/14/2006 12:29:23 AM

```kanze wrote:
> Greg Herlihy wrote:
> > Me wrote:
> > > Given <float.h> (or <limits>), how can you calculate the
> > > largest X such that all the integers in [0, X] are exactly
> > > representable in floating point type T?
>
> > I would use std::numeric_limits<float>::digits to calculate
> > the maximum integer value that a floating point type can
> > represent with a ±1.0 precision:
>
> >     #include <limits>
> >     #include <math.h>
>
> >     int main()
> >     {
> >          float f = powf(2, std::numeric_limits<float>::digits);
> >          double d = pow(2, std::numeric_limits<double>::digits);
>
> Why pow(2., ...)?  That won't work with IBM format, for example.
> I'd use:
>
>      pow( std::numeric_limits< T >::radix,
>           std::numeric_limits< T >::digits ) ;

My understanding is that because of normalization (having the 1st
digit of mantissa equal to 1 and not using it in the representation),
it should be std::numeric_limits< T >::digits + 1.

```
Reply Michael 2/14/2006 12:33:05 AM
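A small sketch for checking what `digits` counts on an actual compiler, assuming `float` and `double` are IEEE 754 binary32 and binary64. Those formats store 23 and 52 fraction bits, so the static assertions below compile only if `digits` is 24 and 53, that is, only if `digits` already includes the implicit leading bit. The helper name `digits_match` is hypothetical.

```cpp
#include <limits>

// True when T reports `expected_digits` significand digits, or when T is not
// an IEEE 754 type at all (in which case there is nothing to check here).
template <typename T>
constexpr bool digits_match(int expected_digits)
{
    return !std::numeric_limits<T>::is_iec559
        || std::numeric_limits<T>::digits == expected_digits;
}

// Stored fraction bits plus the one implicit leading bit.
static_assert(digits_match<float>(23 + 1), "float: 24 significand digits");
static_assert(digits_match<double>(52 + 1), "double: 53 significand digits");
```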

```Greg Herlihy wrote:
> kanze wrote:
> > Greg Herlihy wrote:
> > > Me wrote:
> > > > Given <float.h> (or <limits>), how can you calculate the
> > > > largest X such that all the integers in [0, X] are
> > > > exactly representable in floating point type T?

> > > I would use std::numeric_limits<float>::digits to
> > > calculate the maximum integer value that a floating point
> > > type can represent with a ±1.0 precision:

> > >     #include <limits>
> > >     #include <math.h>

> > >     int main()
> > >     {
> > >          float f = powf(2, std::numeric_limits<float>::digits);
> > >          double d = pow(2, std::numeric_limits<double>::digits);

> > Why pow(2., ...)?  That won't work with IBM format, for example.
> > I'd use:

> >      pow( std::numeric_limits< T >::radix,
> >           std::numeric_limits< T >::digits ) ;

> The value 2 must be used in the pow() expression if we are to
> obtain an answer with the requisite precision for any floating
> point type.  Using the actual radix of the floating point
> format (should it not happen to be 2) would lead to an answer
> with a different - and incorrect - precision.

Using the value of 2 simply gives a totally wrong result.  The
number of digits is the number of radix digits, not the number
of base two digits (which wouldn't even make sense for base 10).
If we consider an IBM float, for example, we have a radix = 16
and digits = 6.  Your formula would give 64 as the maximum
integral value which can be held without loss, rather than
16777216, which is the correct answer.

> The required granularity of 1.0 dictates the value of the
> exponent in the floating point representation.

The representation of a "digit" in the floating point format
will be able to hold all of the integral values in the range
[0...radix).  The granularity of the representation is radix ^
(weight of low order digit) -- as long as a digit with the
weight radix^0 is present, the granularity is 1.  And that's
what my expression determines.

Strictly speaking, it determines the minimum value for which the
weight of the lowest order digit present is larger than 1.  It
is obvious, however, that for this value, the low order digit
that is no longer represented is 0, so the representation is
also exact.  For all smaller values, the weight of the low order
digit is less than or equal to 1, and for the next highest value,
the lost low order digit has a value of 1, so the representation
is not exact.

That's the proof, of course.  Even the simplest test would have
shown that your expression doesn't give a reasonable answer for
radixes other than 2.

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

```
Reply kanze 2/14/2006 10:50:06 AM
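The arithmetic in the post above can be checked with plain integers. This is only a sketch: `ipow` is a hypothetical helper, and the radix and digit counts are the IBM single-precision figures quoted in the post, not values queried from a compiler.

```cpp
// Hypothetical helper: base^exp in unsigned 64-bit arithmetic.
// The caller must keep the result below 2^64.
constexpr unsigned long long ipow(unsigned long long base, unsigned exp)
{
    return exp == 0 ? 1ULL : base * ipow(base, exp - 1);
}

// IBM float as described in the thread: radix = 16, digits = 6.
constexpr unsigned long long with_radix = ipow(16, 6);  // radix^digits
constexpr unsigned long long with_two   = ipow(2, 6);   // 2^digits

static_assert(with_radix == 16777216ULL, "16^6 equals 2^24");
static_assert(with_two == 64ULL, "2^6 is only 64");
```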

```Michael Tiomkin wrote:
> kanze wrote:
> > Greg Herlihy wrote:
> > > Me wrote:
> > > > Given <float.h> (or <limits>), how can you calculate the
> > > > largest X such that all the integers in [0, X] are exactly
> > > > representable in floating point type T?

> > > I would use std::numeric_limits<float>::digits to calculate
> > > the maximum integer value that a floating point type can
> > > represent with a ±1.0 precision:

> > >     #include <limits>
> > >     #include <math.h>

> > >     int main()
> > >     {
> > >          float f = powf(2, std::numeric_limits<float>::digits);
> > >          double d = pow(2, std::numeric_limits<double>::digits);

> > Why pow(2., ...)?  That won't work with IBM format, for example.
> > I'd use:

> >      pow( std::numeric_limits< T >::radix,
> >           std::numeric_limits< T >::digits ) ;

>   My understanding is that because of normalization (having the 1st
> digit of mantissa equal to 1 and not using it in the representation),
> it should be std::numeric_limits< T >::digits + 1.

Your understanding is wrong.

First, of course, the high order digit is only necessarily 1 if
the radix is 2.  When the radix is e.g. 16, as in IBM format,
the high order digit of a normalized number can be anything in
the range [1...16).

Secondly, even in the case of base 2, numeric_limits<>::digits
is the number of digits, not the number of bits used to
represent them.  While it's true that IEEE exploits the fact that
the high order digit is known, and need not be represented, the
digit is still present in the value, and is reflected in the
value of numeric_limits<>::digits.

For the two most common formats, we have thus:

IEEE:
    float
        digits = 24
        maxint = 16777216
    double
        digits = 53
        maxint = 9007199254740992

IBM:
    float
        digits = 6
        maxint = 16777216
    double
        digits = 14
        maxint = 72057594037927936

If you output values around maxint, cast to the target type, you
will find that values below maxint are exact, as is maxint, but
that values above jump, with a granularity of radix -- thus,
with IEEE float, you get something like: 16777213, 16777214,
16777215, 16777216, 16777216, 16777218 ... (The exact sequence
after maxint will depend on the rounding rules.)  Note too that
incrementing the floating point value (instead of using long
long and converting to floating point) will become a no-op after
maxint.

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

```
Reply kanze 2/14/2006 10:52:01 AM
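The behaviour around maxint described in the post above can be reproduced with a sketch like the following, assuming IEEE single precision and the default round-to-nearest mode. The helper names `through_float` and `increment_is_noop` are hypothetical.

```cpp
#include <cassert>
#include <limits>

// Converts n to float and back. The volatile store forces the value to be
// rounded to float even if the compiler keeps temporaries in wider registers.
inline long long through_float(long long n)
{
    assert(n >= 0 && n < (1LL << 40));   // keep the demo well inside float's range
    volatile float f = static_cast<float>(n);
    return static_cast<long long>(f);
}

// True when adding 1 to x leaves it unchanged after rounding back to float.
inline bool increment_is_noop(float x)
{
    volatile float next = x + 1.0f;
    return next == x;
}
```

Under those assumptions, `through_float` maps 16777215, 16777216, 16777217 and 16777218 to 16777215, 16777216, 16777216 and 16777218, and `increment_is_noop(16777216.0f)` is true, which is the sequence and the no-op increment the post describes.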

```Hi!

Greg Herlihy wrote:

> I would use std::numeric_limits<float>::digits to calculate the maximum
> integer value that a floating point type can represent with a ±1.0
> precision:
>
>     #include <limits>
>     #include <math.h>
>
>     int main()
>     {
>          float f = powf(2, std::numeric_limits<float>::digits);
>          double d = pow(2, std::numeric_limits<double>::digits);
>          ...
>
> For 32-bit (single precision IEEE) floating point, X is 16777216.0. For
> 64-bit (double precision IEEE), X is 9007199254740992.0. The
> corresponding minimum value for a floating point type would be -X.

On my machine, std::pow( 2.0, std::numeric_limits<float>::digits )
yields 16777216.0. To prove that experimentally, I used:

float e = 1.0, f = 2.0;
while ( static_cast<int>(f) - 1 == static_cast<int>(f-1.0) ) {
    f *= 2.0; e *= 2.0;
}

Funny enough, e holds 9007199254740992.0 after that.

Regards,
Matthias

```
Reply Matthias 2/14/2006 10:59:30 AM

```Matthias Kluwe wrote:

> Greg Herlihy wrote:

> > I would use std::numeric_limits<float>::digits to calculate
> > the maximum integer value that a floating point type can
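> > represent with a ±1.0 precision:

> >     #include <limits>
> >     #include <math.h>
> >
> >     int main()
> >     {
> >          float f = powf(2, std::numeric_limits<float>::digits);
> >          double d = pow(2, std::numeric_limits<double>::digits);
> >          ...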

> > For 32-bit (single precision IEEE) floating point, X is
> > 16777216.0. For 64-bit (double precision IEEE), X is
> > 9007199254740992.0. The corresponding minimum value for a
> > floating point type would be -X.

> On my machine, std::pow( 2.0, std::numeric_limits<float>::digits )
> yields 16777216.0. To prove that experimentally, I used:

>     float e = 1.0, f = 2.0 ;
>     while ( static_cast<int>(f) - 1 == static_cast<int>(f-1.0) ) {
>         f *= 2.0 ; e *= 2.0;  }

(I'm guessing a bit on that reconstitution.  For some reason,
all of the code examples in this thread are screwed up when I
read them -- problems with Google, perhaps?)

> Funny enough, e holds 9007199254740992.0 after that.

On what machine?

There are several problems here.  The first is that the standard
allows intermediate floating point values to use extended
formats, so f-1.0 may contain the correct value, even if that
value is not representable in a float.  You need to cast it back
to float (or assign it to a float) before converting it to an
integral type.

The value you find is the correct value for an IEEE double,
which would suggest that your implementation uses double for
intermediate values in floating point calculations.

The second problem is that int may not be big enough.  It
typically will be for float, but since you're actually testing
double...

Also, I wonder if there might be an off by one problem as well.
At the limit, subtracting one works, but adding one doesn't.

Finally, of course, this only works if your radix is 2 or a
power of 2.  True for all of the machines I know today (where 2
and 16 seem to be the only radixes in use), but historically,
there have been base 10 representations as well.

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

```
Reply kanze 2/15/2006 1:36:43 PM
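One way to see whether an implementation evaluates float expressions in a wider format is the standard `FLT_EVAL_METHOD` macro from `<cfloat>` (available since C++11). The sketch below only reports what the macro says; the function name `describe_float_evaluation` is hypothetical, and the macro reflects the compiler's declared behaviour for the current build settings.

```cpp
#include <cfloat>
#include <string>

// Describes how floating point expressions are evaluated, according to
// the FLT_EVAL_METHOD macro.
inline std::string describe_float_evaluation()
{
#ifdef FLT_EVAL_METHOD
    switch (FLT_EVAL_METHOD) {
    case 0:  return "operations are evaluated in the type of their operands";
    case 1:  return "float and double operations are evaluated as double";
    case 2:  return "all operations are evaluated as long double";
    default: return "evaluation method is indeterminate or implementation-defined";
    }
#else
    return "FLT_EVAL_METHOD is not provided by this implementation";
#endif
}
```

A value of 1 or 2 would be consistent with `f-1.0` keeping more precision than a float can hold until it is stored back into a float.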

```Hi!

"kanze" <kanze@gabi-soft.fr>:
> Matthias Kluwe wrote:
> > On my machine, std::pow( 2.0, std::numeric_limits<float>::digits )
> > yields 16777216.0. To prove that experimentally, I used:
>
> >     float e = 1.0, f = 2.0 ;
> >     while ( static_cast<int>(f) - 1 == static_cast<int>(f-1.0) ) {
> >         f *= 2.0 ; e *= 2.0;  }
>
> (I'm guessing a bit on that reconstitution.  For some reason,
> all of the code examples in this thread are screwed up when I
> read them -- problems with Google, perhaps?)
>
> > Funny enough, e holds 9007199254740992.0 after that.
>
> On what machine?

That's an Intel PIV, MS VC++ 7.1.

> There are several problems here.  The first is that the standard
> allows intermediate floating point values to use extended
> formats, so f-1.0 may contain the correct value, even if that
> value is not representable in a float.  You need to cast it back
> to float (or assign it to a float) before converting it to an
> integral type.

You're right. I was not aware of this. Casting the f-1.0 to float
leads to the expected result.

Regards,
Matthias

```
Reply Matthias 3/8/2006 10:03:41 PM
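A sketch of the probing loop with the two fixes raised in the thread applied: the difference is stored in a variable of the tested type before it is converted, and `long long` replaces `int`. The function name `probe_max_exact_integer` is hypothetical, and the loop assumes radix 2 and fewer than 62 significand digits so that every value fits in a `long long`.

```cpp
#include <cassert>
#include <limits>

// Doubles f until f - 1 can no longer be held exactly in T, and returns
// the previous power of two.
template <typename T>
T probe_max_exact_integer()
{
    assert(std::numeric_limits<T>::radix == 2 &&
           std::numeric_limits<T>::digits < 62);
    T e = T(1), f = T(2);
    for (;;) {
        // The volatile store discards any extended intermediate precision.
        volatile T f_minus_1 = f - T(1);
        if (static_cast<long long>(f) - 1 != static_cast<long long>(f_minus_1))
            break;
        f *= T(2);
        e *= T(2);
    }
    return e;
}
```

With IEEE formats, `probe_max_exact_integer<float>()` returns 16777216 and `probe_max_exact_integer<double>()` returns 9007199254740992, in line with the values computed from `digits` earlier in the thread.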
