"Edsko de Vries" <devriese@cs.tcd.ie> writes: > our current definition for integer constant looks something like > > INT ([1-9][0-9]*)|0 > > In particular, note that it does not allow for an (optional) "+" or "-" > at the start of the integer. In other words, "-" is treated as a unary > operator, rather than as part of the number. > > This works fine, with the sole exception of the number "-2147483648". > The problem is, of course, overflow: -2147483648 is a valid negative > number (assuming 32-bit numbers), but the integer 2147483648 is _not_ a > valid positive number. > > At the moment, I have added a new rule to the lexer that matches > this exact number, which solves the problem - but it's a hack.. One solution is to let the lexer keep the number as a string and handle unary minus applied to a constant in the parser (at this time converting the string to a number). Or you can use a bignum package in the compiler, only converting to machine integers when you generate code. Torben

0 |

1/31/2006 5:47:16 PM

> This works fine, with the sole exception of the number "-2147483648". > The problem is, of course, overflow: -2147483648 is a valid negative > number (assuming 32-bit numbers), but the integer 2147483648 is _not_ a > valid positive number. Thus, the above method of dealing with "-" as a > unary operator breaks down. I know a few vendor-supplied C compilers that get this wrong, printing warnings about "positive constant overflows" or something like that, which is all the more confusing an error when you look at it and say "but that's *not* a positive constant!". > The solution is to interpret the "-" as part of the number, and > generate INT(-2147483648), I think a better solution is to let the parser do its job and determine whether the - is a negation or a subtraction, and then once that is done, you can warn about INT(2147483648) that isn't inside a negation. I put this into the Plan 9 C compilers recently and it worked well. Russ

0 |

1/31/2006 5:47:25 PM

"Edsko de Vries" <devriese@cs.tcd.ie> writes: > Is there a good solution that solves both problems? I don't know if you consider it "good", but doing compile time evaluation of constant expressions with an unbounded precision package would solve the issue. It immediately extend to having unbounded precision integers at runtime when you'll add it. -- Jean-Marc

0 |

1/31/2006 5:47:34 PM

On 31 Jan 2006 09:57:37 -0500, Edsko de Vries wrote: [...] > Is there a good solution that solves both problems? At the moment, I > have added a new rule to the lexer that matches this exact number, > which solves the problem - but it's a hack.. (actually, there are two > rules, one for 32-bit machines, one for 64-bit machines..) In Ada all literals are considered of some indefinite numeric type [Universal_Integer]. The compiler evaluates all expressions exactly and generates error when the result is converted to the target [finite] type. So, for example, if Int_8 has range -128..127, nevertheless the following is legal Ada: X : Int_8 := 1000 - 900; -- The result is in the range With this approach, literals are unsigned and unary minus works just fine. Of course, you will not be able to compile something like 2**98756466 - 2**98756466 which is also legal. But it is not a big loss. (:-)) -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de

0 |

1/31/2006 5:47:46 PM

On 31 Jan 2006 09:57:37 -0500, "Edsko de Vries" <devriese@cs.tcd.ie> wrote: >-2147483648 is a valid negative number (assuming 32-bit numbers), >but the integer 2147483648 is _not_ a valid positive number. >Even requiring a space before the operator >does not help, because "1 -1" will be recognised as INT(1), INT(-1) >instead of INT(1), OP(-), INT(1). > >Is there a good solution that solves both problems? You can solve it two ways: require that white space or another operator (such as parentheses) follow the binary operator, or restrict the range of integer constants so that unary negation is always possible. Or you could try a suitably condescending error message ... it's obvious that "1-1" is a misspelling of "H" 8-) George

0 |

2/1/2006 2:20:51 AM

Russ Cox wrote: >>This works fine, with the sole exception of the number "-2147483648". >>The problem is, of course, overflow: -2147483648 is a valid negative >>number (assuming 32-bit numbers), but the integer 2147483648 is _not_ a >>valid positive number. Thus, the above method of dealing with "-" as a >>unary operator breaks down. > > > I know a few vendor-supplied C compilers that get this wrong, > printing warnings about "positive constant overflows" or something > like that, which is all the more confusing an error when you look at > it and say "but that's *not* a positive constant!". > Your C compiler is correct. In standard C there are no "negative" integer constants. See section 6.4.4.1 of the C99 standard. What is happening here, on a 32-bit 2's compliment machine is that the value is too large to fit into a signed 'int', so it becomes an 'unsigned int', following the type hierarchy in section 6.4.4.1. You've then applied a unary negation to an 'unsigned int' and the result is also an 'unsigned int' (see section 6.5.3.3).... it's not clear what the arithmetic negation of an 'unsigned int' would be. For this example, our compiled generates: Warning #2296: unary negation applied to an unsigned type Some other compilers also generate warnings like "constant is so large that it is unsigned", I think that's one I've seen from GCC before. So - to go back to what the original post was about, the C language has the same problem. Our solution is to keep constants as strings. When it is time for the value of the constant to be evaluated, we have to examine it to see which type can "hold" it, and place it in appropriate "containers" for that type. This can be particularly cumbersome in a cross-platform environment, such as a 32-bit host compiling for a 64-bit machine... where there would be a requirement for manipulating large values in a non-native fashion (e.g. some "bignum" implemenation.) - Dave Rivers - -- rivers@dignus.com Work: (919) 676-0847 Get your mainframe programming tools at http://www.dignus.com

0 |

2/1/2006 2:21:08 AM

Edsko de Vries writes: >> our current definition for integer constant looks something like >> INT ([1-9][0-9]*)|0 >> In particular, note that it does not allow for an (optional) "+" or "-" >> at the start of the integer. In other words, "-" is treated as a unary >> operator, rather than as part of the number. >> >> This works fine, with the sole exception of the number "-2147483648". >> The problem is, of course, overflow: -2147483648 is a valid negative >> number (assuming 32-bit numbers), but the integer 2147483648 is _not_ a >> valid positive number. >> >> At the moment, I have added a new rule to the lexer that matches >> this exact number, which solves the problem - but it's a hack.. > Torben �gidius Mogensen wrote: > One solution is to let the lexer keep the number as a string and > handle unary minus applied to a constant in the parser (at this time > converting the string to a number). Or you can use a bignum package > in the compiler, only converting to machine integers when you generate > code. Another method is to convert all but the last digit into an integer, so that <2147483648> becomes <214748364, 8>. Then a simple range comparison can be made to determine if the token is too big for a valid integer value. This method avoids overflow. It's interesting to note that the C and C++ grammars do not treat leading signs as part of a constant but as a separate unary operator. Other languages such as COBOL do treat leading signs as part of a literal constant, but these kinds of languages also typically require additional spaces to separate operands from operators. -drt

0 |

2/1/2006 2:22:45 AM

Edsko de Vries <devriese@cs.tcd.ie> wrote: >The problem is, of course, overflow: -2147483648 is a valid negative >number (assuming 32-bit numbers), but the integer 2147483648 is _not_ a >valid positive number... Is there a good solution...? The hack that has been used for this in the past is to compute and store a number as the negative of its actual value (temporarily, until you know whether there's a unary minus in front). If that's too distasteful, you really have to find an extra bit somewhere: + Store the constant as an unsigned number (32-bit unsigned goes up to 4_294_967_295 without overflow, so 2_147_483_648 is no problem), until you know the sign. + Use a larger signed type, if you have one. + Use a bignum (unlimited-length integer) library. + Store the number as a string, unconverted, until you know its sign. -- spsystems.net is temporarily off the air; | Henry Spencer mail to henry at zoo.utoronto.ca instead. | henry@spsystems.net

0 |

2/1/2006 2:23:00 AM

Edsko de Vries wrote: > I have a question regarding lexical analysis. I recently came across a > bug in our lexical analyser in phc (www.phpcompiler.org), that I am > unsure how to solve. This is the problem: our current definition for > integer constant looks something like (snip) > This works fine, with the sole exception of the number "-2147483648". > The problem is, of course, overflow: -2147483648 is a valid negative > number (assuming 32-bit numbers), but the integer 2147483648 is _not_ a > valid positive number. Thus, the above method of dealing with "-" as a > unary operator breaks down. (snip) Look at the language definition. C allows only positive constants and unary operators, as you say, though a constant expression might have the value -2147483648. Fortran allows only positive constants in expressions, but optionally signed constants in DATA statements. It might not be required, but I believe most compilers will allow the maximum negative number in a DATA statement. -- glen

0 |

2/2/2006 4:34:02 PM

Edsko de Vries wrote: > This works fine, with the sole exception of the number "-2147483648". > The problem is, of course, overflow: -2147483648 is a valid negative > number (assuming 32-bit numbers), but the integer 2147483648 is _not_ a > valid positive number. Thus, the above method of dealing with "-" as a > unary operator breaks down. When you separate the sign from the numbers, you should use unsigned values to prevent overflows. Shouldn't your code also accept unsigned integers > 2^31? DoDi

0 |

2/2/2006 4:35:41 PM

David R Tribble wrote: > Another method is to convert all but the last digit into an integer, > so that <2147483648> becomes <214748364, 8>. Then a simple > range comparison can be made to determine if the token is too > big for a valid integer value. This method avoids overflow. Just to add a note... There is a rule (I thought it was Horner's Rule, but that's not right), which allows you to check whether adding a digit to a number will result in overflow before doing the actual multiply and add of the digit. -drt

0 |

2/3/2006 11:35:45 PM

Reuters Michael Bloomberg may launch independent US presidential bid - source Reuters Michael Bloomberg, the billionaire former mayor of New ...

Millions of Americans were preparing to dig themselves out Sunday after a mammoth blizzard with hurricane-force winds and record-setting snowfall ...

Police on Saturday charged a 17-year-old boy with four counts of first-degree murder and seven counts of attempted murder in a mass shooting ...

Tian Tian, the giant panda bear at Washington, DC's National Zoo enjoys rolling around in the snow after a winter storm dumped nearly two feet ...

Xinhua China, Iran upgrade ties to carry forward millennia-old friendship Xinhua TEHRAN, Jan. 23, 2016 (Xinhua) Chinese PresidentXi Jinping(R ...

The private sector space race is heating up, and Amazon.com co-founder Jeff Bezos’ Blue Origin is back out in front with the first successful ...

Comparisons can be drawn to John Waters and Harmony Korine, among others, but they do not flatter "Strangler."

Sean Mclain in the Wall Street Journal reported that the studio wanted to avoid clashing dates with blockbuster Bollywood releases — Shah Rukh ...

One of the biggest issues the Obama administration had to deal with throughout their eight-year tenure is the threat of the Islamic State in ...

"I think what the Academy did this week is a step in the right direction but it's going to be a slow moving change because the Academy reflects ...

Resources last updated: 1/24/2016 8:05:25 AM