Inspired by reading comp.lang.c What is the least painful way for detecting signed integer overflow on Itanium and on other processors that have no special hardware support for the task (RISC-V)? After relatively short thinking I was unable to figure out anything better than following relatively long sequence (in pseudo-C, for addition, but subtraction will be similar): sum = a + b; signdiff_ab = a ^ b; signdiff_as = sum ^ a; ovf_in_msb = signdiff_as & (~signdiff_ab); // andcm on IPF ovf = ovf_in_msb[MSB]; // tbit on IPF Do you know better options?

0 |

12/11/2016 12:32:23 PM

On Sunday, December 11, 2016 at 3:32:24 PM UTC+3, already...@yahoo.com wrot= e: > Inspired by reading comp.lang.c >=20 > What is the least painful way for detecting signed integer overflow on It= anium and on other processors that have no special hardware support for the= task (RISC-V)? >=20 > After relatively short thinking I was unable to figure out anything bette= r than following relatively long sequence (in pseudo-C, for addition, but s= ubtraction will be similar): >=20 > sum =3D a + b; signdiff_ab =3D a ^ b; > signdiff_as =3D sum ^ a; > ovf_in_msb =3D signdiff_as & (~signdiff_ab); // andcm on IPF > ovf =3D ovf_in_msb[MSB]; // tbit on IPF >=20 > Do you know better options? Nope, three binary operations is as good as you're going to get. You're luc= ky to have and-with-complement, because you're going to have to complement = *something* ovf =3D (a^b | ~a^sum) >=3D 0; ovf =3D (a^b | ~(a^sum)) >=3D 0; ovf =3D (~(a^b) & (a^sum)) < 0; ovf =3D ((~a^b) & (a^sum)) < 0; Any of those will work. Or, obviously, inverting b or sum instead of a. If = you've got an instruction NAND or NOR or ANDC/BIC or ORC then use that vari= ant and save an explicit NOT. Lots of intruction sets have ANDC (also know as Bit Clear). PDP11 and VAX h= ave *only* BIC (and BIS aka OR) not AND! Both Power/PowerPC and Aarch64 have OR with Complement as well as ANDC. The= y also both have EQV (PowerPC) EON (Exclusive Or with Not) making ~(a^b) or= a^~b one instruction (it's the same thing). So there are a lot of ways to = get a free NOT in PowerPC and Aarch64 -- smart people.

0 |

12/11/2016 3:30:00 PM

On 11/12/2016 13:32, already5chosen@yahoo.com wrote: > Inspired by reading comp.lang.c > > What is the least painful way for detecting signed integer overflow > on Itanium and on other processors that have no special hardware > support for the task (RISC-V)? I'm glad something positive came out of my... "interaction" with Jacob. (His know-it-all attitude drives me up the wall.) In LWN, someone offered: bool add_overflow(int a, int b) { if (b > 0) return a > INT_MAX - b; else return a < INT_MIN - b; } Easy to review, but probably not optimal, unless the compiler can divine the purpose of the function. I was hoping to find an Itanic codegen on https://godbolt.org/ but no such luck :-( Regards.

0 |

12/11/2016 4:47:27 PM

On 12/11/2016 4:32 AM, already5chosen@yahoo.com wrote: > Inspired by reading comp.lang.c > > What is the least painful way for detecting signed integer overflow on Itanium and on other processors that have no special hardware support for the task (RISC-V)? > > After relatively short thinking I was unable to figure out anything better than following relatively long sequence (in pseudo-C, for addition, but subtraction will be similar): > > sum = a + b; signdiff_ab = a ^ b; > signdiff_as = sum ^ a; > ovf_in_msb = signdiff_as & (~signdiff_ab); // andcm on IPF > ovf = ovf_in_msb[MSB]; // tbit on IPF > > Do you know better options? > The direct Mill translation of the above algorithm on Mill would be: sum = addu(a, b), signdiff_ab = xor(a, b); signdiff_as = xor(sum, a); ovf_in_msb = nand(signdiff_as, signdiff_ab);] ovf = test(ovf_in_msb, MSB), brtr(<lab>, ovf); which is six ops including the branch, over four cycles. Noob posted this alternative: bool add_overflow(int a, int b) { if (b > 0) return a > INT_MAX - b; else return a < INT_MIN - b; } for which Mill gets: %1 = lsss(b, 0), %2 = rsubs(b, INT_MAX); %3 = gtrs(a, %2), % 4 = lsss(a, %2), %5 = pick(%1, %3, %4), brtr(<lab", %5); which is significantly better at six ops and two cycles. The compiler finds both of the above schedules from the corresponding C source. The best Mill code is: addsx(a,b), isNaR(), brtr(<lab>, b1); That's three op slots in one instruction, one cycle. addsx(add<signed><excepting>) will detect the overflow, the isNaR will drop it to the belt as a bool, where the brtr branch uses it as a predicate. You get the add result to the belt too, if you can use it. However, you need to use intrinsics to get this code; the C to get this is: #include "millintrins.h" if (isNaR(a+b)) {...}

0 |

12/11/2016 6:57:15 PM

On Sunday, December 11, 2016 at 6:47:32 PM UTC+2, Noob wrote: > On 11/12/2016 13:32, already5chosen@yahoo.com wrote: > > > Inspired by reading comp.lang.c > > > > What is the least painful way for detecting signed integer overflow > > on Itanium and on other processors that have no special hardware > > support for the task (RISC-V)? > > I'm glad something positive came out of my... "interaction" > with Jacob. (His know-it-all attitude drives me up the wall.) > > In LWN, someone offered: > > bool add_overflow(int a, int b) { > if (b > 0) > return a > INT_MAX - b; > else > return a < INT_MIN - b; > } > > Easy to review, but probably not optimal, unless the compiler > can divine the purpose of the function. Thank you. As written, it already can be done in 3 clocks on IPF, which is already one clock faster than my variant. In order to save one syllable, it can be modified a little (it seems, Mill compiler found modification by itself): bool add_overflow(int a, int b) { int a_limit = (int)((unsigned)INT_MIN - (unsigned)b); // take advantage of INT_MIN == INT_MAX+1 if (b > 0) return a >= a_limit; else return a < a_limit; } Itanium code generator in my head compiles it into following asm: sub a_limit=INT_MIN,b; sub neg_b=0,b; cmp.none.ne ovf,dummy=0,0; cmp.none.ge (pos_ovf,neg_ovf)=a,a_limit (pos_ovf) tbit.or.nz (ovf,dummy)=neg_b,MSB_POS; (neg_ovf) tbit.or.z (ovf,dummy)=neg_b,MSB_POS; 6 syllables, 3 clocks (not including branch). 7th syllables required if we also need sum, but it's still only 3 clocks. If either a or b is available early, than the first line can be removed from critical path, leaving only 2 clocks of "critical" latency. And I am still not sure that it is the best way to do it. > > I was hoping to find an Itanic codegen on https://godbolt.org/ > but no such luck :-( > > Regards.

0 |

12/12/2016 1:49:59 PM

On Monday, December 12, 2016 at 3:50:01 PM UTC+2, already...@yahoo.com wrote: > On Sunday, December 11, 2016 at 6:47:32 PM UTC+2, Noob wrote: > > On 11/12/2016 13:32, already5chosen@yahoo.com wrote: > > > > > Inspired by reading comp.lang.c > > > > > > What is the least painful way for detecting signed integer overflow > > > on Itanium and on other processors that have no special hardware > > > support for the task (RISC-V)? > > > > I'm glad something positive came out of my... "interaction" > > with Jacob. (His know-it-all attitude drives me up the wall.) > > > > In LWN, someone offered: > > > > bool add_overflow(int a, int b) { > > if (b > 0) > > return a > INT_MAX - b; > > else > > return a < INT_MIN - b; > > } > > > > Easy to review, but probably not optimal, unless the compiler > > can divine the purpose of the function. > > > Thank you. > As written, it already can be done in 3 clocks on IPF, which is already one clock faster than my variant. > > In order to save one syllable, it can be modified a little (it seems, Mill compiler found modification by itself): > > > bool add_overflow(int a, int b) { > int a_limit = (int)((unsigned)INT_MIN - (unsigned)b); // take advantage of INT_MIN == INT_MAX+1 > if (b > 0) > return a >= a_limit; > else > return a < a_limit; > } > > Itanium code generator in my head compiles it into following asm: > sub a_limit=INT_MIN,b; sub neg_b=0,b; cmp.none.ne ovf,dummy=0,0; > cmp.none.ge (pos_ovf,neg_ovf)=a,a_limit > (pos_ovf) tbit.or.nz (ovf,dummy)=neg_b,MSB_POS; (neg_ovf) tbit.or.z (ovf,dummy)=neg_b,MSB_POS; > > 6 syllables, 3 clocks (not including branch). 7th syllables required if we also need sum, but it's still only 3 clocks. > O.k. In code above I was too excited about "parallel" cmp/tbit feature and didn't realize that it does not help our case. Here is "sequential" variant: sub a_limit=INT_MIN,b; cmp.none.gt (b_pos,b_negz)=b,0; (b_pos) cmp.unc.ge (ovf,dummy)=a,a_limit; (b_negz) cmp.none.lt (ovf,dummy)=a,a_limit; The same number of clocks as above, but only 4 syllables instead of 6. > If either a or b is available early, than the first line can be removed from critical path, leaving only 2 clocks of "critical" latency. > > And I am still not sure that it is the best way to do it. > > > > > I was hoping to find an Itanic codegen on https://godbolt.org/ > > but no such luck :-( > > > > Regards.

0 |

12/12/2016 3:34:45 PM