COMPGROUPS.NET | Search | Post Question | Groups | Stream | About | Register

Integer division using IMUL???

• Email
• Follow

```Hello,

While debugging an application made by someone, I've come accross a
confusing integer calculation. The application is a disk defragmenter
with a disk cluster map display shown as rows and columns. The
application displays a fixed 8x8 pixel blocks with one pixel as the
separator between each blocks. The blocks always fills the entire
available drawing area (with some left over space), so only the number
of rows & column is calculated. The objective of the calculation is to
find out how many blocks can be fitted in the drawing area.

To make things simple, I adjusted the window height so that the number
of columns is only one. The relevant variables/values that are used by
the application is like so:

W = Drawing area width
B = Block width (not sure if it includes the separator)
C = Number of blocks

Obviously, I will have this formula:
C = W div B

However, the application uses confusing calculation:
mov   eax, 38E38E39h ;edx:eax = 38E38E39h (hard coded)
xor   edx, edx
mov   ecx, W         ;ecx = W
imul  ecx
sar   edx, 1
mov   C, edx         ;C = edx

So the question is, what kind of calculation is that? I've never seen
something like this. A division calculation using a multiplication. What
I can understand is that it's a compiler optimization because
multiplication instructions are relatively faster than division
instructions. But the method/trick of accomplishing that is new and
unknown to me. Does anyone know something like this?

Regards,
Jaelani

```
 0

See related articles to this posting

```In article <49cdc658\$0\$90272\$14726298@news.sunsite.dk>,
Jaelani  <spamtrap@crayne.org> writes:
> Hello,
>
> While debugging an application made by someone, I've come accross a
> confusing integer calculation. The application is a disk defragmenter
> with a disk cluster map display shown as rows and columns. The
> application displays a fixed 8x8 pixel blocks with one pixel as the
> separator between each blocks. The blocks always fills the entire
> available drawing area (with some left over space), so only the number
> of rows & column is calculated. The objective of the calculation is to
> find out how many blocks can be fitted in the drawing area.
>
> To make things simple, I adjusted the window height so that the number
> of columns is only one. The relevant variables/values that are used by
> the application is like so:
>
>
> W = Drawing area width
> B = Block width (not sure if it includes the separator)

It includes the separator, so B=9.

> C = Number of blocks

> Obviously, I will have this formula:
> C = W div B
>

That is not obvious. Division is slow. Multiplication by the
reciprocal is faster:

C = W * (1/B)

assuming that 1/B is known at compile-time.

>
> However, the application uses confusing calculation:
>
> mov   eax, 38E38E39h ;edx:eax = 38E38E39h (hard coded)
> xor   edx, edx
> mov   ecx, W         ;ecx = W
> imul  ecx
> sar   edx, 1
> mov   C, edx         ;C = edx

> So the question is, what kind of calculation is that?

As said, the reciprocal is 1/9. But that would be truncated to zero

C = W * (2^n/B)
---------
2^n

for some convenient value of n. n=32 is very convenient but may not
be accurate enough. This program fragment uses n=33.

2^33/9 = 38E38E38

with a fraction of .38E38E3838E38E3838E38E38....et cetera.
The division by 2^33 is carried out in two steps:

1) discard the low 32 bits of W * (2^n/B). In other words use DX
which contains the upper half of the result, ignoring AX which
contains the lowest 32 bits.

2) Shift right to conclude division by 2^33.

```
 0

```Oops.
Correction on my previous posting:
>         2^33/9 = 38E38E38
>
> with a fraction of .38E38E3838E38E3838E38E38....et cetera.

That should be:

>         2^33/9 = 38E38E38
>
> with a fraction of .E38E3838E38E3838E38E38....et cetera.

Rounding 38E38E38.E3  gives 38E38E39

```
 0

```On Mar 28, 12:42�am, Jaelani  <spamt...@crayne.org> wrote:

> So the question is, what kind of calculation is that? I've never seen
> something like this. A division calculation using a multiplication. What
> I can understand is that it's a compiler optimization because
> multiplication instructions are relatively faster than division
> instructions. But the method/trick of accomplishing that is new and
> unknown to me. Does anyone know something like this?
>
>
> Regards,
> Jaelani

Chapter 8 of the AMD Optimization Guide for x86 does a pretty good
job:

http://support.amd.com/us/Processor_TechDocs/22007.pdf

And it will help you better understand, where the constant is coming
from
in the code you provided.

Brian

```
 0

```On Mar 27, 11:42�pm, Jaelani  <spamt...@crayne.org> wrote:
> Hello,
>
> While debugging an application made by someone, I've come accross a
> confusing integer calculation. The application is a disk defragmenter
> with a disk cluster map display shown as rows and columns. The
> application displays a fixed 8x8 pixel blocks with one pixel as the
> separator between each blocks. The blocks always fills the entire
> available drawing area (with some left over space), so only the number
> of rows & column is calculated. The objective of the calculation is to
> find out how many blocks can be fitted in the drawing area.
>
> To make things simple, I adjusted the window height so that the number
> of columns is only one. The relevant variables/values that are used by
> the application is like so:
>
> W = Drawing area width
> B = Block width (not sure if it includes the separator)
> C = Number of blocks
>
> Obviously, I will have this formula:
> C = W div B
>
> However, the application uses confusing calculation:
> mov � eax, 38E38E39h ;edx:eax = 38E38E39h (hard coded)
> xor � edx, edx
> mov � ecx, W � � � � ;ecx = W
> imul �ecx
> sar � edx, 1
> mov � C, edx � � � � ;C = edx
>
> So the question is, what kind of calculation is that? I've never seen
> something like this. A division calculation using a multiplication. What
> I can understand is that it's a compiler optimization because
> multiplication instructions are relatively faster than division
> instructions. But the method/trick of accomplishing that is new and
> unknown to me. Does anyone know something like this?
>

It's so-called fixed-point arithmetic. I'm pretty sure you can work it
out if you follow the calculations.

An example in decimal... Suppose your CPU registers aren't binary but
decimal and can hold 4 decimal digits.
Suppose you want to calculate 3333 / 10.
Now, what if you represent the 10 as 10000/1000 and calculate 3333 *
1000 / 10000?
Watch it:
mov ax, 3333
mov bx, 1000
mul bx
After this dx:ax = 3333*1000=3333000. But if ax holds 4 decimal digits
as we agreed earlier (3000) and we simply drop them off, we end up
with 333 in dx.
And that's what you wanted: 3333 / 10 = 333.

Now, of course, those aren't decimal registers, they're binary. Do the
math.

Alex

```
 0

```Jaelani schrieb:
> However, the application uses confusing calculation:
> mov   eax, 38E38E39h ;edx:eax = 38E38E39h (hard coded)
> xor   edx, edx
> mov   ecx, W         ;ecx = W
> imul  ecx
> sar   edx, 1
> mov   C, edx         ;C = edx
>
> So the question is, what kind of calculation is that? I've never seen
> something like this. A division calculation using a multiplication. What
> I can understand is that it's a compiler optimization because
> multiplication instructions are relatively faster than division
> instructions. But the method/trick of accomplishing that is new and
> unknown to me. Does anyone know something like this?

Division is the same as multplying by the reciprocal.
38E38E39h is (2^33)/9. SAR corrects for 2^33 vs 2^32.

Hendrik vdH

```
 0

```Jaelani wrote:
> Hello,
>
> While debugging an application made by someone, I've come accross a
> confusing integer calculation. The application is a disk defragmenter
> with a disk cluster map display shown as rows and columns. The
> application displays a fixed 8x8 pixel blocks with one pixel as the
> separator between each blocks. The blocks always fills the entire
> available drawing area (with some left over space), so only the number
> of rows & column is calculated. The objective of the calculation is to
> find out how many blocks can be fitted in the drawing area.
>
> To make things simple, I adjusted the window height so that the number
> of columns is only one. The relevant variables/values that are used by
> the application is like so:
>
> W = Drawing area width
> B = Block width (not sure if it includes the separator)
> C = Number of blocks
>
> Obviously, I will have this formula:
> C = W div B
>
> However, the application uses confusing calculation:
> mov   eax, 38E38E39h ;edx:eax = 38E38E39h (hard coded)
> xor   edx, edx
> mov   ecx, W         ;ecx = W
> imul  ecx
> sar   edx, 1
> mov   C, edx         ;C = edx
>
> So the question is, what kind of calculation is that? I've never seen
> something like this. A division calculation using a multiplication. What
> I can understand is that it's a compiler optimization because
> multiplication instructions are relatively faster than division
> instructions. But the method/trick of accomplishing that is new and
> unknown to me. Does anyone know something like this?

Sure!

I once helped write a paper about it, together with Agner Fog (do a
search!).

The key here is that the compiler could figure out that B was a
constant, in which case _all_ modern compilers will replace division by
a reciprocal multiplication followed by a shift.

It looks like the divisor could have been around 10?

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

```
 0

```Terje Mathisen wrote:
> Sure!
>
> I once helped write a paper about it, together with Agner Fog (do a
> search!).
>
> The key here is that the compiler could figure out that B was a
> constant, in which case _all_ modern compilers will replace division by
> a reciprocal multiplication followed by a shift.
>
> It looks like the divisor could have been around 10?
>
> Terje

Many thanks for that. I later found a similar method in the AMD
documentations, but Agner Fog's document is much easier to understand.

I only sure that the divisor is 9 (8 pixel block & 1 pixel separator),
but I don't know why the application uses 38E38E39h instead of
E38E38E4h. It probably uses the other method described in the AMD
document although the application seems to use the exact same method.

Thanks again.

_______
Jaelani

```
 0

```bwaichu@yahoo.com wrote:
> Chapter 8 of the AMD Optimization Guide for x86 does a pretty good
> job:
>
> http://support.amd.com/us/Processor_TechDocs/22007.pdf
>
> And it will help you better understand, where the constant is coming
> from
> in the code you provided.
>
> Brian

Yes, it helps. Thank you.

One things though... It's unfortunate that, for Intel CPUs which has a
relatively slower integer division instructions than AMD, don't have
this particular topic in their CPU optimization manual.

```
 0

```>> Obviously, I will have this formula:
>> C = W div B
>>
>
> That is not obvious. Division is slow. Multiplication by the
> reciprocal is faster:
>
>   C = W * (1/B)
>
> assuming that 1/B is known at compile-time.

I'm an idiot, so it was the only formula I can think of.

>> So the question is, what kind of calculation is that?
>
> As said, the reciprocal is 1/9. But that would be truncated to zero
> using integer arithmetic. Instead use:
>
>
>   C = W * (2^n/B)
>       ---------
>        2^n
>
> for some convenient value of n. n=32 is very convenient but may not
> be accurate enough. This program fragment uses n=33.
>
>         2^33/9 = 38E38E38
>
> with a fraction of .38E38E3838E38E3838E38E38....et cetera.
> The division by 2^33 is carried out in two steps:
>
> 1) discard the low 32 bits of W * (2^n/B). In other words use DX
>    which contains the upper half of the result, ignoring AX which
>    contains the lowest 32 bits.
>
> 2) Shift right to conclude division by 2^33.

The explanation makes more sense. So it's similar to scaling-up the
dividend prior dividing and scaling-down back the result, correct?

For 32-bit integers, does that mean the additional bits (n above 32)
serves as the fractional part? 32 significant bits and 32 fractional
bits perhaps?

```
 0

```Alexei A. Frounze wrote:
> It's so-called fixed-point arithmetic. I'm pretty sure you can work it
> out if you follow the calculations.
>
> An example in decimal... Suppose your CPU registers aren't binary but
> decimal and can hold 4 decimal digits.
> Suppose you want to calculate 3333 / 10.
> Now, what if you represent the 10 as 10000/1000 and calculate 3333 *
> 1000 / 10000?
> Watch it:
> mov ax, 3333
> mov bx, 1000
> mul bx
> After this dx:ax = 3333*1000=3333000. But if ax holds 4 decimal digits
> as we agreed earlier (3000) and we simply drop them off, we end up
> with 333 in dx.
> And that's what you wanted: 3333 / 10 = 333.
>
> Now, of course, those aren't decimal registers, they're binary. Do the
> math.
>
> Alex

Great. This isn't just an answer to my question, but also helps me
understand how to multiply two 32-bit integers using only 16-bit
registers. A question of "how did they come up with that?" which was
long overdue.

Thanks.

```
 0

```Hendrik van der Heijden wrote:
> Division is the same as multplying by the reciprocal.
> 38E38E39h is (2^33)/9. SAR corrects for 2^33 vs 2^32.
>
>
> Hendrik vdH

Thanks.

With all the people here bursting with answers makes me feel like a newbie.

```
 0

```On Mar 29, 4:10�am, Terje Mathisen  <spamt...@crayne.org> wrote:
>
> I once helped write a paper about it, together with Agner Fog (do a
> search!).

These are really good:

http://www.agner.org/optimize/#manuals

Now, I'm learning C++ because of them.

Thanks,

Brian

```
 0

```Jaelani wrote:
> Terje Mathisen wrote:
>> Sure!
>>
>> I once helped write a paper about it, together with Agner Fog (do a
>> search!).
>>
>> The key here is that the compiler could figure out that B was a
>> constant, in which case _all_ modern compilers will replace division
>> by a reciprocal multiplication followed by a shift.
>>
>> It looks like the divisor could have been around 10?
>>
>> Terje
>
> Many thanks for that. I later found a similar method in the AMD
> documentations, but Agner Fog's document is much easier to understand.
>
> I only sure that the divisor is 9 (8 pixel block & 1 pixel separator),
> but I don't know why the application uses 38E38E39h instead of
> E38E38E4h. It probably uses the other method described in the AMD

Ah! This is easy to answer: It uses the shortest equivalent reciprocal,
i.e. the one where any trailing zero bits have been shifted out:

.....E4 is a number which ends in two zero bits, so the only effect these
bits have is to increase the required shift amount by 2, i.e. you would
need

SHR EDX,3

SHR EDX,1

> document although the application seems to use the exact same method.
Good luck!

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

```
 0

13 Replies
113 Views

Similar Articles

12/14/2013 12:40:29 PM
page loaded in 86481 ms. (0)

Similar Artilces:

division by general integer using shifts
Hi all, As most here are already aware (but just in case some aren't), if you shift the digits of a number to the left or right one time then it's like multiplying or dividing the number by its radix. So if you perform a binary left shift on say: 00011011 (27 base 10) you get: 00110110 (54 base 10 or 27 * 2) So you can multiply a number by any power of two very quickly (for a computer, register shifts are much faster than mult/div. instructions) by shifting. What about numbers that are not powers of two? They can be done by distributing the operation across nu...

Division of an integer by a real number using VHDL
Hi, Is there a way to divide an integer by a real number(decimal number). Can we simply use the operator '/' as follows: eg: a <= 1234/ 1.36; where we define 'a' as an integer. Are there any specific libraries to be included for such a VHDL design file. If so, should these libraries be included in a particular order? Thankyou Can you give us a hint? If this is for a testbench then you can easily do the division using varaibles. If this is hardware then we need to know how fast and how big the vectors are. Lookup tables are fast if the numbers are small. This is...

Integer division
Hi I am looking for references on how to support general 32 bit integer division (not just invariants) on hardware without dedicated integer-division support and 64 bit floating point support. However there is 32 bit floating point support and dedicated hardware to support 32 bit math functions such as logarithm and exponential. Can someone please point me to references. I realize this question has been asked before but they seemed to pertain to invariant denominators. And invariant perhaps refers to constants... thanks dz [I'd look in Knuth's "Seminumerical Algorithms" ...

integer division
what is the best way to do an interger divsion in CL? I'm using (truncate (/ numerator denominator)) but is there a better way? I want 10/4=2 not 2.5. -jim In article <2m5gimFimlgfU1@uni-berlin.de>, Jim Newton <jimka@rdrop.com> wrote: > what is the best way to do an interger divsion in CL? > I'm using (truncate (/ numerator denominator)) but is > there a better way? > > I want 10/4=2 not 2.5. TRUNCATE takes a divisor directly. (truncate 10 4) => 2, 2. Ignore the second return value if you don't care about the remainder. Alternately use FLOOR,...

96-bit integer modulo, Athlon64 gcc 64-bit integers, libc codefor 64-bit division, etc.
Hi, To effectively solve a research problem, I need a very fast (sub-microsecond, if possible) method to determine whether one 96-bit integer is an exact multiple of another 96-bit integer. The high-level application is mostly written in Perl and is intended to be portable, but right now I am concentrating on the hardware that I have in-hand, which include 2.4 GHz Xeons, a 1.8 GHz Athlon64 ("3000+), a 1.8GHz Pentium M, and a Mac with a G5 CPU (sorry, I don't know the CPU speed). The x86s (except for the Pentium M) are running Linux, the Pentium M is running Windows XP and the Mac i...

re: Integer division
"dz" <drizzle76@gmail.com> wrote in message news:10-12-050@comp.compilers... | Hi | I am looking for references on how to support general 32 bit | integer division (not just invariants) on hardware without dedicated | integer-division support and 64 bit floating point support. However | there is 32 bit floating point support and dedicated hardware to | support 32 bit math functions such as logarithm and exponential. Can | someone please point me to references. I realize this question has | been asked before but they seemed to pertain to invariant | denominators. And invarian...

Integer Division #2
>>> a = 1 >>> b = 25 >>> a / b 0 >>> float(a) / b 0.040000000000000001 >>> >>> from __future__ import division >>> a = 1 >>> b = 25 >>> a / b 0.040000000000000001 >>> In what simple way can I get just 0.04 ? -- Anjanesh Lekshmnarayanan On Jun 19, 8:22=A0am, Anjanesh Lekshminarayanan <m...@anjanesh.net> wrote: > >>> a =3D 1 > >>> b =3D 25 > >>> a / b > 0 > >>> float(a) / b > > 0.040000000000000001 Python typically stores floats in binary,...

use integer != int()
I should have left it alone but I decided to clean up a script and replace a number of calls to "int()" in a subroutine with "use integer". I do not understand why I am getting different results with the following code... -------------------- begin script -------------------- use strict; &method_a(730791); &method_b(730791); exit; sub method_a { my(\$g) = @_; my \$y; \$y = int((10000*\$g + 14780)/3652425); printf STDOUT "Method A = %d\n", \$y; } sub method_b { use integer; my(\$g) = @_; my \$y; \$y = (10000*\$g + 1478...

python3 integer division debugging
The change in integer division seems to be the most insidious source of silent errors in porting code from python2 - since it changes the behaviour or valid code silently. I wish the interpreter had an instrumented mode to detect and report such problems. ...

Using Integers and Bit Manipulation
I have a LOT of data that I need to keep track of that can have only 1 of 2 specific values. The possible memory requirements of tracking this data are huge. Thus... I'm considering using fortrans bit computation procedures to keep track of things. Theoretically, using a default Integer type with my compiler (32 bits), I can keep track of up to 32 data items using a single default integer. However, I've not used the bit functions before and I don't want to trip over my feet while using them. Integer :: ix, iy, iz ! assume these are 32 bit integers for now ::: code ...

Re: Unexpected integer division
chrisstankevitz@gmail.com wrote: > On Nov 30, 12:00 am, chrisstankev...@gmail.com wrote: >> Can anyone think of a context/compiler for which this code should >> perform integer division y/2? > > Thanks for your responses everyone. A colleague favored y/2.0 over y/ > 2 because he felt he had been bitten by unexpected integer division > when dividing a float by an int. I figured if anyone could come up > with a scenario in which that would happen, it was this group. I > especially enjoyed "sure, the machine has no FP unit" :) Another situation, wh...

I'm having trouble using File::ReadBackwards on large files. The problem is that the line: my \$read_size = \$seek_pos % \$max_read_size || \$max_read_size ; yields -1 when \$seek_pos is large. I've "solved" the problem by commenting out the line: #use integer; But I'm sure that "use integer" wasn't put there just be ornery, so now I wonder what awful thing is going to happen because I commented it out. Any thoughts? Thanks, Xho -- -------------------- http://NewsReader.Com/ -------------------- Usenet Newsgroup Service \$9.95/Month...

Fast big integer division
Starting point is: http://www.x86asm.net/articles/working-with-big-numbers-using-x86-instructions/ So case when one of numbers fits in register is trivial. Multiplications (karatzuba variant) y1 = x1*2^32 + n1 y2 = x2*2^32 + n2 y1*y2 = (x1*2^32 + n1) * (x2*2^32+n2) = x1*2^32*x2*2^32 + x1*n2*2^32 + x2*n1*2^32 + n1*n2 = x1*x2 * 2^32 * 2^32 + (x1*n2+x2*n1)*2^32 + n1*n2 x1*n2 (trivial) x2*n1 (trivial) n1*n2 (trivial) So this is easy. x1, x2 are big n1,n2 < 2^32 recurse until x1 or x2 <2^32 exponentiation is easy as it is shifting. Problem is how to do integer division. I have found ne...

Using integers for counters in synthesis
Hi everyone, I have a simple question. Assuming we have this process : MY_PROCESS : process(CLK) variable cnt : natural range 0 to 255; begin if rising_edge(CLK) then output_signal <= '0'; if (srst = '1') then cnt := 0; else if (cnt = 100) then output_signal <= '1'; end if; cnt := cnt + 1; end if; end if; end process; In simulation, I will get an error at the rising edge of CLK when cnt is 255. However, what happens in synthesis? Will the synthesizer add logic to prevent cnt from wrapping aroung ? Or will it simply imp...

Problem using hexadecimal [format] on integers
Consider: % format %x [expr 0x5ade3412ab] de3412ab % format %10x [expr 0x5ade3412ab] de3412ab Why doesn't [format] return the two leading digits "5a"? I'm probably overlooking something, but can't see what. Any help greatly appreciated, Erik -- leunissen@ nl | Merge the left part of these two lines into one, e. hccnet. | respecting a character's position in a line. Erik Leunissen a �crit : > Consider: > > % format %x [expr 0x5ade3412ab] > de3412ab > % format %10x [expr 0x5ade3412ab] > de3412ab > > Why doesn'...

how to convert an integer to std_logic_vector using vhdl
Hello everyone, I am trying to divite a std_logic_vector by a std_logic_vector. So i converted both of them into integers and divided. But how do i convert back to std_logic_vector. 1) Is the integer type or the division" / " synthesizable in xilinx ISE. 2) if i use integer as my output port and when i download my code onto the fpga, does it convert back to the binary. If yes to how many bits. I appreciate if you could answer these questions thanks Ashwin ashwin wrote: > Hello everyone, > I am trying to divite a std_logic_vector by a std_logic_vector. > So i conv...