Hello,
I'm currently developing a little C-like programming language as a
hobby project. After having implemented the basic integer
types as known from Java/C# (with fixed sizes for each type), I
thought a bit about 64-bit machines and wanted to ask: if you develop
on a 64-bit machine, would it be preferable to still leave the
standard integer type ("int") 32-bit, or would it be better to have
"int" grow to 64 bit? In this case, I could have an
architecture-dependent "int" type along with fixed-sized types like
"int8", "int16", "int32" etc.
What do you think?
Cheers,
Denis Washington
[I would make my int type the natural word size of the machine. If people
want a particular size, they can certainly say so. -John]
Posted by dwashington, 7/2/2007 3:43:35 PM
Denis Washington <dwashington@gmx.net> writes:
> I'm currently developing a little C-like programming language as a
> hobby project. After having implemented the basic integral integer
> types like known from Java/C# (with fixed sizes for each type), I
> thought a bit about 64-bit machines and wanted to ask: if you develop
> on a 64-bit machine, would it be preferable to still leave the
> standard integer type ("int") 32-bit, or would it be better to have
> "int" grow to 64 bit? In this case, I could have an
> architecture-dependent "int" type along with fixed-sized types like
> "int8", "int16", "int32" etc.
>
> What do you think?
>
> [I would make my int type the natural word size of the machine. If people
> want a particular size, they can certainly say so. -John]
I never really liked C's machine-dependent integer type. I prefer
integer types to have explicit fixed sizes (and a selection of those)
or be unbounded. However, I'm happy to allow the implementation to
use more bits than required, so an int16 could be implemented as a
32-bit integer on machines where operating on 16-bit entities is
difficult or costly.
Even better than a small fixed number of sizes (such as int8, int16,
int32 and int64) is to (like in Pascal) explicitly state the required
minimum and maximum values, so you have types like -10..10 or 0..255.
You would be guaranteed that all values in the interval would be
representable. Ideally (as in Pascal), you would get errors if you
put a larger value into a variable than its type supports, but if you
are worried about performance, it would be acceptable to drop these
tests. Many of them could be eliminated at compile time, though, as
can index checks.
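A minimal sketch of such a checked subrange in C, assuming a hypothetical ranged_int struct and a NO_RANGE_CHECKS macro as the switch that drops the tests. C has no built-in subrange types, so the check is done by hand here.

```c
/* Hypothetical subrange type: a value plus the inclusive bounds it must
   stay within, e.g. -10..10 or 0..255. */
typedef struct {
    long lo, hi;
    long value;
} ranged_int;

/* Store v if it lies in lo..hi. Returns 1 on success, 0 on a range error.
   Building with NO_RANGE_CHECKS defined removes the test entirely. */
int ranged_store(ranged_int *r, long v)
{
#ifndef NO_RANGE_CHECKS
    if (v < r->lo || v > r->hi)
        return 0;            /* out of range: keep the old value */
#endif
    r->value = v;
    return 1;
}
```

A compiler with real subrange types could skip the test whenever it can prove that v already lies within the bounds.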
In addition to explicitly bounded numbers, you could have an integer
type that is only bounded by the available memory to store it. If you
just add a machine-dependent bounded integer type (as Pascal does), people
will tend to use it instead of the bounded type and just make tacit
assumptions about the range of values.
Torben
[PL/I let you specify how big all your integers needed to be, and I
can't say that part was a rousing success. -John]
Posted by torbenm, 7/4/2007 9:30:40 AM
On 2007-07-02, Denis Washington <dwashington@gmx.net> wrote:
> I'm currently developing a little C-like programming language as a
> hobby project. After having implemented the basic integral integer
> types like known from Java/C# (with fixed sizes for each type), I
> thought a bit about 64-bit machines and wanted to ask: if you develop
> on a 64-bit machine, would it be preferable to still leave the
> standard integer type ("int") 32-bit, or would it be better to have
> "int" grow to 64 bit?
It depends on the hardware (the specifics of the 64-bit architecture).
If the architecture heavily penalizes 32-bit access, then a 64-bit
value is better. If not, enlarging the INT type will only blow up the
average data structure (and decrease cache utilization).
So you can have two 64-bit architectures where this differs. The key
thing is to keep some type equivalent to pointers (and maybe another
for differences between pointers, if you are very strict).
This is also the current situation for existing native compilers.
Search for the terms LLP64, ILP64 and LP64 to find more of these discussions.
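For orientation, the three names encode which of int, long, and pointer are 64 bits wide. A small sketch that classifies a compiler by its type sizes, assuming 8-bit bytes; the function names are invented for this example.

```c
#include <stddef.h>

/* Name the data model from the sizes (in bytes) of int, long and a pointer.
   Only ILP32 and the three 64-bit models named above are recognised. */
const char *data_model(size_t int_sz, size_t long_sz, size_t ptr_sz)
{
    if (int_sz == 4 && long_sz == 4 && ptr_sz == 4) return "ILP32";
    if (int_sz == 4 && long_sz == 4 && ptr_sz == 8) return "LLP64";
    if (int_sz == 4 && long_sz == 8 && ptr_sz == 8) return "LP64";
    if (int_sz == 8 && long_sz == 8 && ptr_sz == 8) return "ILP64";
    return "other";
}

/* The model of whichever compiler builds this file. */
const char *this_data_model(void)
{
    return data_model(sizeof(int), sizeof(long), sizeof(void *));
}
```

In LP64 and LLP64 the int stays at 32 bits; only ILP64 grows it to the pointer size.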
Posted by Marco, 7/4/2007 11:21:26 AM
C and C++ have size_t, which on most machines matches the natural word size.
You can keep the int8/16/32/64/128 types and then add a type
int_machinetype to represent the natural word size of the machine.
The question to ask (if you want int to default to 64 bits) is: how
often do we create objects which hold an integral value greater than
what can be represented in 32 bits?
Posted by Amit, 7/5/2007 12:31:06 AM
Torben Ægidius Mogensen wrote:
>>[I would make my int type the natural word size of the machine. If people
>>want a particular size, they can certainly say so. -John]
>
> I never really liked C's machine-dependent integer type. I prefer
> integer types to have explicit fixed sizes (and a selection of those)
> or be unbounded. However, I'm happy to allow the implementation to
> use more bits than required, so an int16 could be implemented as a
> 32-bit integer on machines where operating on 16-bit entities is
> difficult or costly.
Just some thoughts:
Computations can use any appropriate size internally, which typically
will be the (biggest) natural machine word size. As long as only one
size is used in computations, this is the natural size of the default
integer type.
IMO users are not very pleased with multiple fixed-size integer types,
which require a careful selection for every local variable. Not to
mention the documentation for every decision... ;-)
Sized types primarily are required for processing data, which are
stored in legacy formats (like API function arguments). Portability is
another key, when sized/ranged data types again come into play. As
long as only 32 and 64 bit machines are involved, 32 bit integers are
a meaningful default (guaranteed) size, which AFAIK is also used for
ints in 64 bit programs. AFAIR Integer*4 has been the most frequently used
data type in scientific Fortran libraries (IMSL...) for decades.
File sizes, however, already require more than 32 bits, so this may be
another mandatory type, to be implemented for every target machine.
> Even better than a small fixed number of sizes (such as int8, int16,
> int32 and int64) is to (like in Pascal) explicitly state the required
> minimum and maximum values, so you have types like -10..10 or 0..255.
Another model is the digit count, as known from COBOL, and occurring in
formatted output in most programming languages. These approaches are
quite self-documenting :-)
OTOH a specified range of values doesn't prevent the compiler from using
any suitable storage format for such numbers. Range checks are required
only for assignments to such variables, integrating neatly into the
inevitable code for reading and writing variables of different byte
count. A single option can be used to enable/disable such checks, for
optimization purposes.
Another aspect is the scope of the project. In an
experimental/self-educational language project I'd use a Fortran-like
notation, which can be implemented and extended in multiple steps:
1) machine specific "Integer"
2) with a global option to specify a minimum size
3) explicitly typed "Integer*n"
4) unbounded "Integer**"
Unsigned, scaled, and true fixed-size integer types are independent
options, e.g. for binary data exchange, monetary calculations etc.
For binary compatibility a notation like "Integer!n" could be used, to
indicate a mandatory bit or byte size. And to complete the confusion,
another "Integer#i.f" notation could be used, indicating scaled integers
with i integral and f fractional *digits*.
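To make the scaled-integer idea concrete, here is a rough C sketch of a type with two fractional decimal digits, as one might use for money. The names scaled2, SCALE2 and the helper functions are hypothetical, and only non-negative inputs to scaled2_make are considered.

```c
#include <stdint.h>

/* Hypothetical scaled integer with 2 fractional decimal digits:
   12.34 is stored as the plain integer 1234. */
typedef int64_t scaled2;
#define SCALE2 100

/* Build a value from its whole part and hundredths (both non-negative). */
scaled2 scaled2_make(int64_t whole, int64_t hundredths)
{
    return whole * SCALE2 + hundredths;
}

/* Addition needs no rescaling because both operands share one scale. */
scaled2 scaled2_add(scaled2 a, scaled2 b) { return a + b; }

/* A product carries the scale twice, so divide once to restore it.
   This truncates toward zero; a real design would choose a rounding rule. */
scaled2 scaled2_mul(scaled2 a, scaled2 b) { return (a * b) / SCALE2; }
```

For instance, scaled2_mul(scaled2_make(1, 50), scaled2_make(2, 0)) multiplies 1.50 by 2.00 and yields the stored value 300, i.e. 3.00.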
For practical reasons I'd also implement type aliases (C/Pascal style
typedefs), which allow the user to specify type names for specific
variables/purposes, in a distinct place. For educational reasons the use
of "styled" data types could be restricted to type declarations, whereas
in the remaining code only type names are acceptable.
DoDi
[Defining the range for each variable is one of those ideas that seems
like a good idea but in practice, programmers don't do it. In Multics
PL/I programs, all the variables were FIXED INT(35) because everyone
knew that was how you got a word variable. On IBM mainframes, they
were all FIXED INT(31). -John]
Posted by Hans, 7/5/2007 5:00:09 AM
Denis Washington <dwashington@gmx.net> writes:
>I'm currently developing a little C-like programming language as a
>hobby project. After having implemented the basic integral integer
>types like known from Java/C# (with fixed sizes for each type), I
>thought a bit about 64-bit machines and wanted to ask: if you develop
>on a 64-bit machine, would it be preferable to still leave the
>standard integer type ("int") 32-bit, or would it be better to have
>"int" grow to 64 bit? In this case, I could have an
>architecture-dependent "int" type along with fixed-sized types like
>"int8", "int16", "int32" etc.
For a C-like language (i.e., where integers have some connection to
pointers, e.g., pointer arithmetic), the main integer type should have
the same size as a pointer; for your 64-bit machines, it should be 64
bits.
As for having a zoo of integer types like C, that's a bad idea in my
experience. It causes lots of portability bugs; in contrast, in Forth
there's one dominant integer type, the cell, which can also contain an
address; in Forth portability bugs between 32-bit and 64-bit systems,
and between different byte orders are very rare.
- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
Posted by anton, 7/5/2007 6:09:41 PM
On Thu, 05 Jul 2007 07:00:09 +0200, Hans-Peter Diettrich wrote:
> IMO users are not very pleased with multiple fixed-size integer types,
> which require a careful selection for every local variable. Not to
> mention the documentation for every decision... ;-)
Hmm, if you take int16 to hold values from the range 1..512, then you
indeed have to document somewhere that 513 is not a legal value. In
contrast, when you write in the program:
type Foo is range 1..512;
then that barely requires further documentation. But what is more important
is that an exception will be raised when -3 is assigned to a variable of
the type Foo.
> [Defining the range for each variable is one of those ideas that seems
> like a good idea but in practice, programmers don't do it.
True, range (or more general constraint) is a property of a type or a
subtype. It is not a property of a variable.
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
[The problem isn't that it's a variable rather than a type, the problem is
that programmers aren't very good at predicting the range of values
that a variable can take, so they won't try. If you force them to do so,
as PL/I does, they'll pick a size that works reasonably consistently and
always use that. -John]
Posted by Dmitry, 7/5/2007 8:33:01 PM
Our esteemed moderator writes:
> [The problem isn't that it's a variable rather than a type, the problem is
> that programmers aren't very good at predicting the range of values
> that a variable can take, so they won't try. If you force them to do so,
> as PL/I does, they'll pick a size that works reasonably consistently and
> always use that. -John]
My experience is different, for languages that support ranges
(as opposed to bit-counts) -- Pascal, Modula-2, and Ada, for example.
Programmers in those languages use subrange types quite heavily.
Sometimes the range is something like 1..Max_Int, where as you say, the
programmer couldn't predict the upper bound very well. But even
in that case, the lower bound is clear. Other times, more meaningful
upper bounds are used. 1..31 for day-within-month, for example.
In Ada, you can say:
type Foo is range 1..1_000_000;
subtype Bar is Foo'Base range 1..Foo'Base'Last;
which means that Bar has a range whose lower bound is 1,
and whose upper bound is at least a million, but chosen
by the compiler to match the hardware (perhaps 2**31-1).
Subranges of enumeration types and character types also
make sense. (A character type is an enumeration type,
in Ada.) And for floating-point and fixed-point types.
In Lisp, Smalltalk, etc, the programmer who is not "very good at
predicting the range of values" uses bignums, at the cost of some
efficiency. I'd like to get the best of both worlds.
Anyway, if programmers (who supposedly know their problem domain)
can't choose ranges properly, how on Earth can the language designer
or the compiler writer (who knows the hardware, but has no idea about
the problem being solved)? And what is the programmer supposed to do
with a type called "int" or "Integer" that is a small subset of the
integers, whose bounds are based on the whim of the compiler writer?
The programmer who wants to write portable code, I mean.
- Bob
Posted by Robert, 7/6/2007 12:50:03 AM
Hans-Peter Diettrich wrote:
> Sized types primarily are required for processing data, which are
> stored in legacy formats (like API function arguments).
(snip)
> AFAIR Integer*4 was the most frequently used
> data type in scientific Fortran libraries (IMSL...), since decades.
Well, Fortran requires the default INTEGER and default REAL to have
the same size (so that EQUIVALENCE works). Some compilers on 16 bit
machines didn't do that, others store 16 bit integers in 32 bits.
> File sizes, however, already require more than 32 bits, so this may be
> another mandatory type, to be implemented for every target machine.
(snip)
> Another model is the digit count, as known from COBOL, and occuring in
> formatted output in most programming languages. These approaches are
> quite self-documenting :-)
(snip)
In some cases you know how big the value in a variable could
possibly be with useful data, other times you don't want to
unnecessarily restrict the user, and a convenient default word
size is nice.
(snip)
> [Defining the range for each variable is one of those ideas that seems
> like a good idea but in practice, programmers don't do it. In Multics
> PL/I programs, all the variables were FIXED INT(35) because everyone
> knew that was how you got a word variable. On IBM mainframes, they
> were all FIXED INT(31). -John]
I agree. Having the ability to specify it is nice, and sometimes
useful, but more often one wants a convenient size. (By the way,
I think you meant FIXED BIN(31).) If the default width and scale
factor are carefully selected, and those are used, it might not
be so bad.
-- glen
Posted by glen, 7/6/2007 1:44:01 AM
Hans-Peter Diettrich wrote:
> [Defining the range for each variable is one of those ideas that seems
> like a good idea but in practice, programmers don't do it. In Multics
> PL/I programs, all the variables were FIXED BIN(35) because everyone
> knew that was how you got a word variable. On IBM mainframes, they
> were all FIXED BIN(31). -John]
Perhaps somebody can help me with a proper citation: AFAIR Bjarne
Stroustrup (or Wirth?) said about OOP that it makes a difference
whether a language *supports* OOP, or whether it *encourages* the use of OOP.
The same for other features of a language. In strictly typed languages,
like Delphi/OPL, it's common practice to use subrange types, along with
enumerated and set types.
Subrange and other types can not only simplify the design and debugging
of a program, they also can lead to faster execution, due to better
optimization criteria for the compiler.
Consider a common situation: the difference between two n-bit values
requires n+1 bits, or overflows can occur when fewer bits are used for
the result. Using a subrange type of wordsize-n bits, both the user and
the compiler can be sure that a sum or difference cannot produce an
overflow, in computations with full wordsize registers or variables.
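The n+1 bit argument can be illustrated in C with 32-bit operands. This is only a sketch: the function names are invented, and the "subrange" is emulated by a predicate because C has no such types.

```c
#include <stdint.h>

/* The difference of two 32-bit values needs 33 bits in the worst case,
   so compute it in a wider type. */
int64_t wide_diff(int32_t a, int32_t b)
{
    return (int64_t)a - (int64_t)b;
}

/* Nonzero if x fits in 31 bits, i.e. lies in -2^30 .. 2^30 - 1.
   The difference of two such values always fits in an int32_t. */
int fits_31_bits(int32_t x)
{
    return x >= -(INT32_C(1) << 30) && x < (INT32_C(1) << 30);
}
```

With a language-level subrange, the compiler could establish the fits_31_bits property statically instead of testing it at run time.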
A common situation, found in much open source C code, is the comparison
of signed and unsigned values. The Delphi compiler warns the user that
the comparison requires extending the values to the next higher type. I
dunno what a C compiler will do in such cases, but in Delphi I can use
appropriate subrange types to eliminate such conversions.
Now tell me please, how can one prevent overflows or inappropriate or
bloated comparison code when e.g. only a single integer type is available?
DoDi
Posted by Hans, 7/6/2007 4:22:49 AM
On 2007-07-05, Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> For a C-like language (i.e., where integers have some connection to
> pointers, e.g., pointer arithmetic), the main integer type should have
> the same size as a pointer; for your 64-bit machines, it should be 64
> bits.
In C, integers have no connection to pointers, long has. However the
current situation of multiple models in use for 64-bit architectures
probably has to do with code that doesn't follow the exact guidelines.
> As for having a zoo of integer types like C, that's a bad idea in my
> experience. It causes lots of portability bugs; in contrast, in Forth
> there's one dominant integer type, the cell, which can also contain an
> address; in Forth portability bugs between 32-bit and 64-bit systems,
> and between different byte orders are very rare.
You need a range of fixed-size int types anyway to be able to define a
structure that maps (at least nearly) any external structure, so that
you can communicate without marshalling.
[Jeez, I thought we got away from the abomination of pointers that are
longer than ints when we left the 286 large mode. No such luck, huh? -John]
Posted by Marco, 7/6/2007 5:45:21 AM
Anton Ertl wrote:
> For a C-like language (i.e., where integers have some connection to
> pointers, e.g., pointer arithmetic), the main integer type should have
> the same size as a pointer; for your 64-bit machines, it should be 64
> bits.
I beg to differ. Even in C, pointers and computational values should
be kept separate. The pointer size is of little interest in pointer
arithmetic; it only dictates the required precision of the computations
and results.
Of course it's convenient to have an integer type that can hold a
pointer (FILE, HANDLE...), but this can also be a long int, or simply a
void*. For portability reasons, a language should provide a corresponding
type name in the standard library. Along with precisely sized integer
types...
With regard to pointer differences, which are IMO the only case where 2
pointers are involved in pointer arithmetic, there exists a need for a
*bigger* integer type, or for a convention that valid pointer (address!)
values are a subrange of ptrdiff_t.
In the current migration from 32 to 64 bit code we should keep in mind,
that we move from a memory model with a somewhat "saturated" address
range, to a model where the available address space is a fraction of the
addressable space. The same consideration applies to smaller
(embedded...) systems, where IMO it would be fine to have at least two
distinct models, for dealing with machines with little or much memory.
The "a pointer is an int" paradigm applies to the little-memory
category, where pointer differences don't exceed the typical integer
value range. Following this model, one should use 64 bit integers on 64
bit machines. But many people will be happy with 32 bit (or even smaller)
integers, because their computations either do not require bigger
values, or require a still wider range (floating point values!).
IMO the *application* should be considered, to determine which data
types to use in computations, not the machine. Consider files: it's
stupid to load huge files into memory, only because the file size is
lower than the addressing capabilities. Consider arrays: word-size
indices are overkill, when the element size prevents the allocation of
so much physical memory. The requirements for applications stay the
same, regardless of the evolution of the machines: the resources
(RAM...) are limited and require careful data management, based on
reasonable assumptions about a minimum equipment of the target machines.
DoDi
Posted by Hans, 7/6/2007 6:10:29 AM
Marco van de Voort wrote:
> [Jeez, I thought we got away from the abomination of pointers that are
> longer than ints when we left the 286 large mode. No such luck, huh? -John]
Why do you see a need for *any* relationship between addresses and values?
For a compiler writer, addresses are just another data type, whose size
varies with the architecture of the target machine.
For a debugger writer, addresses are offsets into the address space of
a process.
For an application writer, addresses should not have any specific
meaning. Can you give a practical example, where addresses (i.e. the
values of pointers) have to be handled as numerical data types?
Hint: Even in C, a pointer difference is expressed in *elements*, not in
bytes. The same for pointer increments...
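The element-versus-byte distinction in the hint can be seen in a few lines of C. The helper names are invented for illustration.

```c
#include <stddef.h>

/* Distance between two pointers into the same array, counted in elements. */
ptrdiff_t elem_distance(const double *from, const double *to)
{
    return to - from;
}

/* The same distance counted in bytes, obtained via char pointers. */
ptrdiff_t byte_distance(const double *from, const double *to)
{
    return (const char *)to - (const char *)from;
}
```

For an array a of double, elem_distance(&a[2], &a[7]) gives 5, while byte_distance gives 5 times sizeof(double).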
The NULL pointer also doesn't have a natural value; it's zero only by
convention. Floating point numbers have more such special "values", like
NaN, but all these have a useful meaning only within their own data
type, not as integer values.
And for a machine code hacker, the word and register sizes are fixed by
the architecture, no room for discussions ;-)
DoDi
Posted by Hans, 7/6/2007 8:08:33 AM
Marco van de Voort <marcov@stack.nl> writes:
>On 2007-07-05, Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>> For a C-like language (i.e., where integers have some connection to
>> pointers, e.g., pointer arithmetic), the main integer type should have
>> the same size as a pointer; for your 64-bit machines, it should be 64
>> bits.
>
>In C, integers have no connection to pointers, long has.
long is an integer type. And long has no more connection to pointers
than int, neither in the standard nor in practice (e.g., on the
PDP-11, i.e., the classical C implementation, long is 32-bit and
pointers and ints are 16-bit).
What I meant with the connection between integers and pointers is that
one can do pointer arithmetic and cast between pointers and integers.
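As one illustration of such casts, C99 provides the optional type uintptr_t for round-tripping a pointer through an integer. The sketch below uses it for an alignment test; the function names are hypothetical, and the numeric value obtained from the cast is implementation-defined (on common flat-address machines it is the address).

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p is aligned to `align` bytes (align must be nonzero).
   Relies on the converted pointer value behaving like an address. */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}

/* Round-trip a pointer through an integer. C99 guarantees that the
   result compares equal to the original for a valid void pointer. */
void *round_trip(void *p)
{
    uintptr_t bits = (uintptr_t)p;
    return (void *)bits;
}
```

Note that uintptr_t is sized to hold a pointer whatever the width of int, so a cast like this does not by itself require int and pointers to have the same size.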
>You need a range of fixed types ints anyway to be able to define a
>structure that maps (at least nearly) any external structure, so that
>you can communicate without marshalling.
I think that a zoo of types is worse than marshalling, and you have to
marshal anyway if your structure involves pointers. In any case,
there are also other alternatives, e.g., one could allow the
fixed-size types only in structures.
>[Jeez, I thought we got away from the abomination of pointers that are
>longer than ints when we left the 286 large mode. No such luck, huh? -John]
No, the 286 guys took over the C compiler groups of MIPS and DEC and
invented the I32LP64 model, so that Unix programmers had to suffer
just as much as DOS programmers used to. But that's water down
the river.
- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
Posted by anton, 7/6/2007 9:00:25 AM
On 2007-07-06, Marco van de Voort <marcov@stack.nl> wrote:
> [Jeez, I thought we got away from the abomination of pointers that are
> longer than ints when we left the 286 large mode. No such luck, huh? -John]
AFAIK even Linux-on-Alpha already had 32-bit ints. It's not such a surprise
that x86_64 follows that.
Posted by Marco, 7/8/2007 8:32:04 AM
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> No, the 286 guys took over the C compiler groups of MIPS and DEC and
> invented the I32LP64 model, so that Unix programmers had to suffer
> just as much as DOS programmers used to. But that's water down
> the river.
That's not the way I recall it, and I was working on the Alpha Unix C
compiler at the time. All of us preferred the ILP64 model--the
instruction set was much better suited to that model. However, we had
some GNU software that wouldn't work (at that time) if ints were 64
bits--I don't recall whether it was emacs or gcc or something else
important. Therefore, since we needed to get something out-the-door
without rewriting that software into an alpha specific cul-de-sac
version, we did the I32LP64 model with a special hack that all "user
space" addresses actually fit in 32 bits, so that the machine would
look like I32L64P32 when needed. I'm sure to some that looked like a
hack, but it seemed like an elegant hack to me. It wasn't what we
wanted, but it allowed the assumption that-all-the-world-was-a-vax to
persist a little bit longer, when very few applications were 64 bit
ready.
Just my recollections,
-Chris
*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------
Posted by Chris, 7/8/2007 11:53:27 PM
Anton Ertl wrote:
> What I meant with the connection between integers and pointers is that
> one can do pointer arithmetic and cast between pointers and integers.
Again my question: can you give an example where such conversions are a
requirement?
Or do you mean that pointer arithmetic should be *emulated* by integer
arithmetic, not be *implemented* as true portable arithmetic on pointers?
DoDi
Posted by Hans, 7/9/2007 5:05:38 AM
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> What I meant with the connection between integers and pointers is that
> one can do pointer arithmetic and cast between pointers and integers.
Pointer arithmetic does not require pointers and integers to be of the
same size. All that is required is that an integer can hold the
number of elements in an allocated array, so the difference of two
pointers into the same array can fit in an integer. Since it makes
sense to have virtual addresses much larger than physical memory, it
also makes sense to have addresses larger than integers.
Casting pointers to integers is an abomination.
> I think that a zoo of types is worse than marshalling, and you have to
> marshal anyway if your structure involves pointers.
I think machine-dependent integers are a much worse impediment to
marshalling than subrange types. With machine-dependent integers, you
can't write machine-independent marshalling code. And with
marshalling/unmarshalling you can't even be sure that the unmarshaller
has the same size ints as the marshaller, so you can get into loads of
trouble if you are not careful (such as storing the size of the number
with each number).
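By contrast, marshalling code written against a fixed-size type can be machine-independent. A minimal sketch, assuming a big-endian wire format (the byte order is a choice made here for illustration, and the function names are invented):

```c
#include <stdint.h>

/* Write v as exactly four bytes, most significant first, regardless of
   the host's int size or byte order. */
void put_u32_be(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)((v >> 24) & 0xFF);
    out[1] = (unsigned char)((v >> 16) & 0xFF);
    out[2] = (unsigned char)((v >> 8) & 0xFF);
    out[3] = (unsigned char)(v & 0xFF);
}

/* Read the four bytes back into a 32-bit value. */
uint32_t get_u32_be(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Because both sides agree on four bytes, there is no need to store the size of the number alongside it.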
When marshalling structures that use pointers, all you need is an
ability to find the offset between two pointers in the same allocated
array and to compare two pointers that are not in the same allocated
array for equality. No one in their right mind would marshal a pointer
as its bit pattern. What you need is information sufficient to
reconstruct an equivalent heap structure -- without keeping exact
addresses.
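One way to read this, sketched in C under the assumption that all nodes live in a single array: each pointer is marshalled as the index of its target, which is exactly the offset between two pointers in the same allocated array. The node type, NO_NODE, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type; all nodes live in one array (the "pool"). */
typedef struct node {
    int32_t      payload;
    struct node *next;      /* NULL or a pointer into the same pool */
} node;

#define NO_NODE (-1)        /* marshalled form of NULL */

/* Replace each `next` pointer by the index of its target in the pool.
   out_next must have room for n entries. */
void marshal_links(const node *pool, size_t n, int32_t *out_next)
{
    for (size_t i = 0; i < n; i++)
        out_next[i] = pool[i].next ? (int32_t)(pool[i].next - pool) : NO_NODE;
}

/* Rebuild equivalent links in a pool that may sit at a different address. */
void unmarshal_links(node *pool, size_t n, const int32_t *in_next)
{
    for (size_t i = 0; i < n; i++)
        pool[i].next = (in_next[i] == NO_NODE) ? NULL : &pool[in_next[i]];
}
```

The rebuilt structure has the same shape as the original even though no address survives the trip.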
Torben
Posted by torbenm, 7/9/2007 7:50:05 AM
On Fri, 06 Jul 2007 06:22:49 +0200, Hans-Peter Diettrich
<DrDiettrich1@aol.com> wrote:
>Consider a common situation: the difference between two n-bit values
>requires n+1 bits, or overflows can occur when less bits are used for
>the result. Using an subrange type of wordsize-n bits, both the user and
>the compiler can be sure that a sum or difference cannot produce an
>overflow, in computations with full wordsize registers or variables.
>
>A common situation, found in many open source C codes, are comparisons
>of signed and unsigned values. The Delphi compiler warns the user, that
>the comparison requires to extend the values to the next higher type. I
>dunno what a C compiler will do in such cases, but in Delphi I can use
>appropriate subrange types to eliminate such conversions.
Subranges are mapped into an appropriately sized integral type. So
depending on where you make your range declaration(s), you are either
using a larger data type everywhere or performing a local conversion
anyway.
Actually, the integer compare instructions on many CPUs correctly
handle the n+1 bit result, but a compiler may choose instead to use
subtraction and not catch the over/underflow.
Most C compilers warn about mixing signed and unsigned values in an
expression - at least if the programmer doesn't turn off the warning.
Comparing them is usually a stupid mistake, but it works except at the
corner case where the unsigned value exceeds the largest positive
value of the corresponding signed type. C doesn't catch integer
over/underflows and it won't warn you about the possibility.
>Now tell me please, how one can prevent overflows or inappropriate or
>bloated comparison code, when e.g. only a single integer type is available.
You either have to catch the overflow (possibly using assembler) or
change the data representation - e.g., by scaling or offsetting - to
make sure an overflow can't happen.
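Catching the overflow does not strictly need assembler. A portable sketch tests the operands before adding; checked_add is a name chosen for this example.

```c
#include <limits.h>

/* Add a and b, storing the sum in *out. Returns 1 on success and 0 if
   the mathematical sum would not fit in an int; *out is untouched then. */
int checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *out = a + b;
    return 1;
}
```

The test runs before the addition because signed overflow itself is undefined behaviour in C and cannot be reliably detected afterwards.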
If the data cooperates, one trick you can use in C is to define
reduced range integer types using structures and bit fields. For
example, if you define a 20-bit signed type and a 19-bit unsigned
type, you know that these can be compared cleanly (even if the
compiler warns) and that an unsigned value could be safely assigned to
a signed type (though not the reverse).
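The bit-field trick might look like the following, using the widths from the text and assuming int is at least 32 bits wide. The struct and function names are made up.

```c
/* Reduced-range integers built from bit fields. */
struct s20 { signed int   v : 20; };   /* -524288 .. 524287 */
struct u19 { unsigned int v : 19; };   /*       0 .. 524287 */

/* Under the C standard's integer promotions both fields become int
   before the comparison, because int can represent every value of
   either field, so no signed/unsigned mismatch arises. */
int s20_less_u19(struct s20 a, struct u19 b)
{
    return a.v < b.v;
}

/* Every u19 value fits in an s20, so this assignment never truncates. */
struct s20 s20_from_u19(struct u19 b)
{
    struct s20 r;
    r.v = (int)b.v;
    return r;
}
```

The reverse conversion, from s20 to u19, would need a range test because negative values have no u19 counterpart.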
Bit field integers must fit into a machine register (so no 37-bit
integers on a 32-bit machine :) and they require masking or shifting
so they are slower to use than plain integers (though the hit is
generally minor if you limit one bit field to a structure). And the
compiler still won't warn you if you screw up using them - they obey
the same rules as normal C integers. But with an appropriate variable
naming scheme you could at least look at an expression and be able to
tell whether it might bust.
George
--
for email reply remove "/" from address
Posted by George, 7/9/2007 10:28:16 AM
Chris F Clark <cfc@shell01.TheWorld.com> wrote:
>
>That's not the way I recall it and I was working on the Alpha Unix C
>compiler at the time. All of us, preferred the ILP64 model--the
>instruction set was much better suited to that model.
John Mashey wrote an excellent history of the 32/64 bit transition:
http://www.acmqueue.org/modules.php?name=Content&pa=printer_friendly&pid=421&page=1
Tony.
--
f.a.n.finch <dot@dotat.at> http://dotat.at/
Posted by Tony, 7/9/2007 4:37:10 PM
On 2007-07-06, Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> Marco van de Voort <marcov@stack.nl> writes:
>>On 2007-07-05, Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>>> For a C-like language (i.e., where integers have some connection to
>>> pointers, e.g., pointer arithmetic), the main integer type should have
>>> the same size as a pointer; for your 64-bit machines, it should be 64
>>> bits.
>>
>>In C, integers have no connection to pointers, long has.
>
> long is an integer type.
_an_ integer type, but not the "main" integer type.
> And long has no more connection to pointers
> than int, neither in the standard nor in practice
No, the standard is indeed vague. What I mentioned is AFAIK more POSIX. And
probably it is not equal, but equal or greater.
> What I meant with the connection between integers and pointers is that
> one can do pointer arithmetic and cast between pointers and integers.
One can also do this (using the suitable integer type) if the main type
isn't. But that allows the compiler some leeway in which precision it does
non pointer related arithmetic.
>>You need a range of fixed types ints anyway to be able to define a
>>structure that maps (at least nearly) any external structure, so that
>>you can communicate without marshalling.
>
> I think that a zoo of types is worse than marshalling, and you have to
> marshal anyway if your structure involves pointers.
External to the language (e.g. OS or foreign library), not necessarily
external to the process.
> In any case, there are also other alternatives, e.g., one could allow the
> fixed-size types only in structures.
IMHO one should simply declare one type pointer-compatible, and be done with
it. I know this has been tried in the past, and some companies have deviated
again (the LLP64 model), but I don't see the need to blow up the main integer
type unnecessarily.
I don't see much advantage in equating pointer and integer. If people
deviate from the rules (as with LLP64), any scheme breaks.
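A minimal sketch of the "one pointer-compatible type" idea in today's C, assuming a C99 compiler that provides the optional uintptr_t from <stdint.h>. The function names are made up for illustration. Nothing here depends on how wide "int" is:

```c
#include <stddef.h>
#include <stdint.h>

/* Round-trip a pointer through the one integer type meant to hold it.
   C99 guarantees the result compares equal to the original void pointer. */
void *roundtrip(void *p)
{
    uintptr_t bits = (uintptr_t)p;   /* pointer -> integer */
    return (void *)bits;             /* integer -> the same pointer */
}

/* Distance between two pointers into the same array. The result type is
   ptrdiff_t, not int, so it is sized for the address space. */
ptrdiff_t byte_distance(const char *from, const char *to)
{
    return to - from;
}
```

With this split, "int" can stay at 32 bits on an LP64 or LLP64 system while pointer-related arithmetic still gets a wide enough type.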
|
|
0
|
|
|
|
Reply
|
Marco
|
7/10/2007 9:19:50 AM
|
|
George Neuner wrote:
> Subranges are mapped into an appropriately sized integral type. So
> depending on where you make your range declaration(s), you are either
> using a larger data type everywhere or performing a local conversion
> anyway.
Right, the *storage* format must match the container (memory, register).
Local conversions occur just like for other variables, when a smaller
operand must be extended to the size of the bigger one.
> Actually, the integer compare instructions on many CPUs correctly
> handles the n+1 bit result,
You mean the overflow and carry flags? These flags contain meaningful
values only after a comparison of arguments of the same signedness.
Typically the *following* instructions determine the *interpretation* of
the comparison result, as either a comparison of two signed or two
unsigned values.
> but a compiler may choose instead to use
> subtraction and not catch the over/underflow.
When the language allows for such behaviour...
> Most C compilers warn about mixing signed and unsigned values in an
> expression - at least if the programmer doesn't turn off the warning.
> Comparing them is usually a stupid mistake, but it works except at the
> corner case where the unsigned value exceeds the largest positive
> value of the corresponding signed type.
Right. With subrange types one can make sure that the corner case never
will occur, without assembly code hacks.
> If the data cooperates, one trick you can use in C is to define
> reduced range integer types using structures and bit fields. For
> example, if you define a 20-bit signed type and a 19-bit unsigned
> type, you know that these can be compared cleanly (even if the
> compiler warns) and that an unsigned value could be safely assigned to
> a signed type (though not the reverse).
Right, C bitfields define subranges. But I never saw code that used
bitfields for exactly that purpose, only for packing multiple values
into one container.
> Bit field integers must fit into a machine register (so no 37-bit
> integers on a 32-bit machine :) and they require masking or shifting
> so they are slower to use than plain integers (though the hit is
> generally minor if you limit one bit field to a structure).
Right, packing multiple bitfields into one container results in more
code and slower computations. It's the designer's choice, as appropriate
or acceptable in a concrete application.
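For reference, George's 20-bit/19-bit example could be sketched roughly like this. It assumes "int" is at least 32 bits wide, and the struct and function names are invented for the illustration:

```c
/* Reduced-range integers built from bit-fields (assumes int >= 32 bits). */
struct s20 { signed int v : 20; };     /* range -524288 .. 524287 */
struct u19 { unsigned int v : 19; };   /* range       0 .. 524287 */

/* Every value of either field fits in a plain int, so after widening
   both sides the comparison has no mixed-sign corner case. */
int s20_less_than_u19(struct s20 a, struct u19 b)
{
    return (int)a.v < (int)b.v;
}

/* Safe direction: every u19 value is representable as an s20.
   The reverse would lose negative values. */
struct s20 s20_from_u19(struct u19 b)
{
    struct s20 r;
    r.v = (int)b.v;
    return r;
}
```

Keeping one bit-field per structure, as George suggests, avoids most of the masking and shifting cost.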
DoDi
|
|
0
|
|
|
|
Reply
|
Hans
|
7/13/2007 9:57:04 AM
|
|
Tony Finch <dot@dotat.at> writes:
> John Mashey wrote an excellent history of the 32/64 bit transition:
> http://www.acmqueue.org/modules.php?name=Content&pa=printer_friendly&pid=421&page=1
Thanks for the pointer. It was good reading and a welcome reminder as
these days I'm actually doing instruction set design (not intended to
be user accessible, but we all know how intentions like that fare) and
we are trying to fit things in an 8/16 bit world. It gives me a good
argument for laying down some 32 (and 64 bit) address instructions,
even though on the first implementation, the machine will just barely
have 12 bits of address space, and thus they won't get used.
Thanks,
-Chris
*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
|
|
0
|
|
|
|
Reply
|
Chris
|
7/13/2007 4:18:14 PM
|
|
On Jul 2, 11:43 pm, Denis Washington <dwashing...@gmx.net> wrote:
> Hello,
>
> I'm currently developing a little C-like programming language as a
> hobby project. After having implemented the basic integral integer
> types like known from Java/C# (with fixed sizes for each type), I
> thought a bit about 64-bit machines and wanted to ask: if you develop
> on a 64-bit machine, would it be preferable to still leave the
> standard integer type ("int") 32-bit, or would it be better to have
> "int" grow to 64 bit? In this case, I could have an
> architecture-dependent "int" type along with fixed-sized types like
> "int8", "int16", "int32" etc.
IMHO, for 32-bit architectures and upwards I would fix "int" at
32 bits for practicality. Then you can have "int8", "int16",
"int32", "int64", "int128", etc.
This is because, since ANSI C 1989 (ISO C 1990) came out, most C
compilers for 32-bit machines have made "int" 32 bits, for about
two decades now, and an overwhelming number of applications have
been written with that in mind. Although 64-bit architectures were
available at the time, they were not nearly as common as 32-bit
architectures until recently.
But in the end it's up to you :-)
Regards.
Napi
|
|
0
|
|
|
|
Reply
|
napi
|
7/14/2007 12:32:07 AM
|
|
> Casting pointers to integers is an abomination.
Not really. It's pretty convenient for various tricks such as
eq-hash tables.
The real abomination is casting from integers to pointers.
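A rough sketch of the eq-hash trick, assuming C99's optional uintptr_t is available. The function name and the 4-bit shift are arbitrary choices for the example, not anything standard:

```c
#include <stddef.h>
#include <stdint.h>

/* Hash an object by identity (its address), not by its contents.
   nbuckets must be nonzero. */
size_t eq_hash(const void *p, size_t nbuckets)
{
    uintptr_t bits = (uintptr_t)p;   /* pointer -> integer, one way only */
    bits >>= 4;                      /* assumption: low bits are mostly zero
                                        because of allocator alignment */
    return (size_t)(bits % nbuckets);
}
```

The integer is only ever used as a bucket index and is never turned back into a pointer, which is the direction I called the real abomination.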
Stefan
|
|
0
|
|
|
|
Reply
|
Stefan
|
7/15/2007 3:11:34 AM
|
|
On Fri, 13 Jul 2007 11:57:04 +0200, Hans-Peter Diettrich
<DrDiettrich1@aol.com> wrote:
>George Neuner wrote:
>
>> Actually, the integer compare instructions on many CPUs correctly
>> handles the n+1 bit result,
>
>You mean the overflow and carry flags? These flags contain meaningful
>values only after a comparison of arguments of the same signedness.
>Typically the *following* instructions determine the *interpretation* of
>the comparison result, as either a comparison of two signed or two
>unsigned values.
Only partly. What I mean is that, at the assembler level, the compare
and subtraction instructions are not necessarily interchangeable.
There are ISAs where some or all of the arithmetic instructions do not
set any condition codes and others in which add/subtract only indicate
a carry or borrow occurred. On such machines, the result can be
tested for zero and sign, but overflow information is lost. However,
the compare instruction generally correctly indicates when an overflow
occurs in the intrinsic subtraction.
There are ISAs which make multiword arithmetic very expensive because
propagating the carry/borrow correctly requires several instructions.
>> but a compiler may choose instead to use
>> subtraction and not catch the over/underflow.
>
>When the language allows for such behaviour...
What compare generally tells you that subtract may not is whether the
corresponding arithmetic result is valid. Offhand, I can't think of
any example of an HLL where the semantics of comparison depend on that
particular difference. Anybody?
For ISAs where compare and subtract both set condition codes in the
same way, or in which condition codes are not exposed but compare
produces a signed value in a register, compare and subtract are
interchangeable when a value needs to be produced (such as for a
boolean store). Even if subtract does not produce exactly the same
conditions as compare, the compiler may use the subtraction form
anyway because the arithmetic result of the subtraction will be
needed.
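To make the compare-versus-subtract difference concrete, here is a small C sketch. The function names are made up, and the subtraction is done in uint32_t to get wraparound without undefined behaviour. It mimics a machine that keeps only the sign of the truncated difference:

```c
#include <stdint.h>

/* What a real compare gives you: correct for all inputs. */
int less_by_compare(int32_t a, int32_t b)
{
    return a < b;
}

/* Deliberately flawed: treats "a < b" as "sign bit of (a - b)", with the
   difference truncated to 32 bits and the overflow information discarded. */
int less_by_subtract(int32_t a, int32_t b)
{
    uint32_t diff = (uint32_t)a - (uint32_t)b;   /* wraps modulo 2^32 */
    return (diff >> 31) != 0;                    /* sign bit of the result */
}
```

The two agree for 3 and 5, but for INT32_MIN and 1 the truncated difference is 0x7FFFFFFF, so the subtract form wrongly says INT32_MIN is not less than 1.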
George
|
|
0
|
|
|
|
Reply
|
George
|
7/16/2007 7:09:16 AM
|
|
George Neuner wrote:
> What compare generally tells you that subtract may not is whether the
> corresponding arithmetic result is valid. Offhand, I can't think of
> any example of an HLL where the semantics of comparison depend on that
> particular difference. Anybody?
A HLL comparison can return a perfectly valid True/False result,
without an extra overflow case. That's different for a subtraction,
where overflows can occur, and the language specification or
implementation must specify the handling of this case.
IMO you missed my point. On machines with a single compare
instruction, different conditional jumps must be used for signed or
unsigned compares, i.e. HLL ">" becomes either BHI or BGT. No such
instructions or condition code settings exist for mixed sign
operands. The same restrictions apply to other machine architectures,
and to subtraction instead of comparison instructions.
That's why a comparison or other arithmetic on mixed sign operands
becomes more expensive than comparisons of operands of the same
signedness, on every machine.
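In C terms, the extra cost shows up as an extra test. A minimal sketch with hypothetical function names, comparing a signed "int" against an "unsigned int":

```c
/* What C's usual arithmetic conversions do to "a < b" when b is unsigned:
   a is converted to unsigned first, so a negative a becomes huge. */
int naive_less_int_uint(int a, unsigned int b)
{
    return (unsigned int)a < b;
}

/* Correct for all values: one extra sign test, then an unsigned compare
   that is only reached when a is non-negative. */
int less_int_uint(int a, unsigned int b)
{
    return a < 0 || (unsigned int)a < b;
}
```

For -1 against 1u the naive form answers "not less", because -1 converts to UINT_MAX. The second form needs two conditional branches (or equivalent) where a same-signedness compare needs one.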
In the case of true subranges (fewer bits required than a machine word
can hold), the values can be stored as either signed or unsigned
words, according to their signedness. Then all such values can be
retrieved and interpreted as signed words, preventing mixed mode
arithmetic. With packed bitfields, by contrast, different instructions
would have to be used to sign- or zero-extend the values to the full
word size when loading them into the ALU.
If you or anybody else disagrees, we had better discuss the topic
privately and present the results to the group.
DoDi
|
|
0
|
|
|
|
Reply
|
Hans
|
7/19/2007 6:29:15 AM
|
|
On Thu, 19 Jul 2007 08:29:15 +0200, Hans-Peter Diettrich
<DrDiettrich1@aol.com> wrote:
>George Neuner wrote:
>
>> What compare generally tells you that subtract may not is whether the
>> corresponding arithmetic result is valid. Offhand, I can't think of
>> any example of an HLL where the semantics of comparison depend on that
>> particular difference. Anybody?
>
>A HLL comparison can return a perfectly valid True/False result,
>without an extra overflow case. That's different for a subtraction,
>where overflows can occur, and the language specification or
>implementation must specify the handling of this case.
My point was that the overflow occurs during compare as well as during
subtraction, and on some architectures only compare will tell you that
it did. As you correctly point out, overflow doesn't matter for
comparison.
>IMO you missed my point.
Sorry if I did.
>On machines with a single compare
>instruction, different conditional jumps must be used for signed or
>unsigned compares, i.e. HLL ">" becomes either BHI or BGT. No such
>instructions or condition code settings exist for mixed sign
>operands. The same restrictions apply to other machine architectures,
>and to subtraction instead of comparison instructions.
Not necessarily. For machines that don't expose codes but simply
return a signed value in a register reflecting the result, the basic
six signed conditional jumps are sufficient for all uniform
comparisons.
>That's why a comparison or other arithmetic on mixed sign operands
>becomes more expensive than comparisons of operands of the same
>signedness, on every machine.
I believe comparing with mixed signedness is usually a semantic mistake.
The unfortunate reality is that it happens to work approximately
99.999999976716935634613037109375% of the time and this lulls people
into thinking it is okay.
>In the case of true subranges (fewer bits required than a machine word
>can hold), the values can be stored as either signed or unsigned
>words, according to their signedness. Then all such values can be
>retrieved and interpreted as signed words, preventing mixed mode
>arithmetic.
At least within integer subranges. I haven't used a language with
real subranges for quite a while but I recall that Pascal and
derivatives required deliberate ord() calls to compare enumerated
types, which structurally are also subranges.
>If you or anybody else disagrees, we had better discuss the topic
>privately, and present the results to the group.
I don't think we have any real disagreement ... this exchange has been
mainly expository.
George
|
|
0
|
|
|
|
Reply
|
George
|
7/19/2007 10:17:15 PM
|
|
|