COMPGROUPS.NET | Search | Post Question | Groups | Stream | About | Register

### Math logic

• Follow

```I am new to using division in my code.

I am trying to come up with the logic to convert milliseconds to
seconds,minutes, and hours.

This is what I have.
Alright, let me have it. :-)

; seconds = milliseconds/1000
; minutes = seconds/60
; hours = minutes/60
; days = hours/24

Ex. 1,514,047 milliseconds

Divide result by 1000,

if less than 1, answer is in milliseconds
if greater than 1 or less than 60, answer is in seconds
if 60 or less than 3600, answer is in minutes
if 3600 or less than 216000, answer is in hours
```
 0

```In article <f1a3fa55-cee2-4605-9fbc-215e915c290e@e5g2000yqn.googlegroups.com>,
Andy <chocolatemint77581@nospicedham.yahoo.com> writes:
> I am new to using division in my code.
>
> I am trying to come up with the logic to convert milliseconds to
> seconds,minutes, and hours.
>
> This is what I have.
> Alright, let me have it. :-)
>
> ; seconds = milliseconds/1000
> ; minutes = seconds/60
> ; hours = minutes/60
> ; days = hours/24
>
> Ex. 1,514,047 milliseconds
>
> Divide result by 1000,
>
> if less than 1, answer is in milliseconds
> if greater than 1 or less than 60, answer is in seconds
> if 60 or less than 3600, answer is in minutes
> if 3600 or less than 216000, answer is in hours

In order to prevent division overflow, it is better to start dividing by
a large numer, e.g. the number of milliseconds in a day and then divide
by ms/hour ms/minute and finally by 1000.

If you use 32b arithmetic then the largest time interval in ms that can
be represented in 32 bits is +- 49 days. If that is enough for your
purpose then set EDX to 0, EAX to the input number, some other register
to 1000 and divide.

In order to handle larger intervals such as 50 days you'll need to use
64 bit integers. Set EDX to the most significant 32b of the input, EAX
to the last significant and divide.

However, this will not work if you need to handle large intervals such
as 100000 days. In that case the result of dividing by 1000 will not
fit into 32 bits and you get an overflow error.

If you were to divide by 1000*60*60*24 first then the result should
fit.

; EDX:EAX               64b input in ms

mov  ECX, 1000*60*60*24
div  ECX                ; EAX=days EDX=remainder in ms

; if eax <> 0 result is in days, otherwise

mov  EAX, EDX
mov  EDX, 0
mov  ECX, 1000*60*60
div  ECX                ; EAX=hours EDX=remainder in ms

; if eax <> 0 result is in hours, otherwise

mov  EAX, EDX
mov  EDX, 0
mov  ECX, 1000*60
div  ECX                ; EAX=minutes EDX=remainder in ms

etc cetera
```
 0

```Andy wrote:
> I am new to using division in my code.

Look out! "div ebx", for example, divides edx:eax by ebx, not just eax.
If you leave "garbage" in edx, the result may not fit in ebx. This
causes an exception. Dos used to report this as "divide by zero".
("What? I didn't divide by zero!") Linux reports it as "floating point
exception". ("What? I didn't use floating point!") Can be a very
puzzling error!

> I am trying to come up with the logic to convert milliseconds to
> seconds,minutes, and hours.
>
> This is what I have.
> Alright, let me have it. :-)
>
> ; seconds = milliseconds/1000
> ; minutes = seconds/60
> ; hours = minutes/60
> ; days = hours/24
>
> Ex. 1,514,047 milliseconds
>
> Divide result by 1000,
>
> if less than 1, answer is in milliseconds
> if greater than 1 or less than 60, answer is in seconds
> if 60 or less than 3600, answer is in minutes
> if 3600 or less than 216000, answer is in hours

What's wrong with leaving it in milliseconds?

Best,
Frank

A deliberately perverse mixture of C and asm - if you changed the couple
Linux-specific calls to C (or 'doze APIs), it'd probably work on
Windows, too.

; get month in Linux
; mixed asm and C :)

; nasm -f elf timec.asm
; ld -o timec timec.o -I/lib/ld-linux.so.2 -lc

global _start

extern strftime
extern localtime_r

;----------------------------------------------
; some structures for "time stuff"
; note that these are mere "typedefs" - no memory allocated

struc tv
tv_sec resd 1
tv_usec resd 1
endstruc

struc tz
tz_minuteswest resd 1
tz_dsttime resd 1          ; obsolete! do not use!
endstruc

struc tm
tm_sec resd 1              ; Seconds. [0-60] (1 leap second)
tm_min resd 1              ; Minutes. [0-59]
tm_hour resd 1             ; Hours. [0-23]
tm_mday resd 1             ; Day. [1-31]
tm_mon resd 1              ; Month. [0-11]
tm_year resd 1             ; Year - 1900.
tm_wday resd 1             ; Day of week. [0-6]
tm_yday resd 1             ; Day of year.[0-365]
tm_isdst resd 1            ; DST. [-1/0/1]
endstruc

%define OUTBUFSIZ 100h

section .bss

; reserve memory for our structures
timeval resb tv_size
timezone resb tz_size
the_time resb tm_size
; etc...
outbuf resb OUTBUFSIZ

section .data

; note that the "back apostrophes" allow us to use "\n" etc.
; a fairly "new" Nasm feature...
;
; as requested, the month.

the_format db `%B! :)\n`, 0

; or give strftime a bit more of a workout...

;    the_format db `%A, %B %d, %G\n%l:%M %p - %D\n`, 0

;---------------------------------

section .text
_start:

; find out what time it is

mov eax, 78         ; __NR_gettimeofday
mov ebx, timeval    ; fills in these structures
mov ecx, timezone
int 80h
cmp eax, -4096
ja exit ; go wind your clock

; make C do the difficult long division

push the_time ; tm structure to fill
push timeval  ; tv structure to work from
call localtime_r

; and the tedious text lookup...

push the_time   ; tm structure to work from
push the_format ; format string in strftime syntax
push OUTBUFSIZ  ; max
push outbuf     ; buffer to fill
call strftime

; we can print it without C's help...

mov edx, eax ; count, returned by strftime
mov ecx, outbuf
mov ebx, 1 ; STDOUT
mov eax, 4 ; __NR_write
int 80h

exit:
mov eax, 1 ; __NR_exit
int 80h
;--------------------------

```
 0

```Andy posted

> I am new to using division in my code.

> I am trying to come up with the logic to convert milliseconds to
> seconds, minutes and hours.

> This is what I have.
> Alright, let me have it. :-)

> ; seconds = milliseconds/1000
> ; minutes = seconds/60
> ; hours = minutes/60
> ; days = hours/24

> Ex. 1,514,047 milliseconds

> Divide result by 1000,

> if less than 1, answer is in milliseconds
> if greater than 1 or less than 60, answer is in seconds

and how would you name it if it's exact 1  ? :)
Ok, less than 60 covers that, but check '>1' become redundant
because of the previous else 'not less than 1' implies >=1.

> if 60 or less than 3600, answer is in minutes
> if 3600 or less than 216000, answer is in hours

Oh, your days got 60 hours :)

If your target is watch format conversion, ie: '23:59:59'
And IIRC a whole day got 05265c00h (86400000) mSec, so this
would fit 32-bits for 49.71 days, almost a full month here :)

__
wolfgang

```
 0

```"Andy" <chocolatemint77581@nospicedham.yahoo.com> ha scritto nel messaggio
>I am new to using division in my code.
>
> I am trying to come up with the logic to convert milliseconds to
> seconds,minutes, and hours.

are millisecond in a 64 bit variable or in one 32 bit variable?
i implemented one data class in C++ with operators +, -, <, <= etc
and use it and its functions for every date,
time etc problem in the few language i know C, C++, asm.

> This is what I have.
> Alright, let me have it. :-)
>
> ; seconds = milliseconds/1000
> ; minutes = seconds/60
> ; hours = minutes/60
> ; days = hours/24
>
> Ex. 1,514,047 milliseconds
>
> Divide result by 1000,
>
> if less than 1, answer is in milliseconds
> if greater than 1 or less than 60, answer is in seconds
> if 60 or less than 3600, answer is in minutes
> if 3600 or less than 216000, answer is in hours

```
 0

```Andy wrote:

> I am new to using division in my code.
>
> I am trying to come up with the logic to convert milliseconds to
> seconds,minutes, and hours.
>
> This is what I have.
> Alright, let me have it. :-)
>
> ; seconds = milliseconds/1000
> ; minutes = seconds/60
> ; hours = minutes/60
> ; days = hours/24
>
> Ex. 1,514,047 milliseconds
>
> Divide result by 1000,
>
> if less than 1, answer is in milliseconds
> if greater than 1 or less than 60, answer is in seconds
> if 60 or less than 3600, answer is in minutes
> if 3600 or less than 216000, answer is in hours

In modern code, division is replaced by (faster)
multiplication with reciprocals:

....
movl (input_in_ms),%ecx
movl \$0x4A90BE59,%edx

# test if valid:

cmpl \$0x36EE7FFF,%ecx
ja too_large

# extract hours:

mull %edx
shrl \$0x14,%edx
movl \$0x0036EE80,%eax
movl %edx,%esi
mull %edx
subl %eax,%ecx

# ECX = remaining milliseconds
# ESI = full hours

# extract minutes:

movl \$0x45E7B273,%edx
movl %ecx,%eax
mull %edx
shrl \$0x0E,%edx
movl \$0xEA60,%eax
movl %edx,%edi
mull %edx
subl %eax,%ecx

# ECX = remaining milliseconds
# EDI = full minutes

# extract seconds:

movl \$10624DD3,%eax
mull %ecx
shrl \$0x06,%edx
movl \$0x03E8,%eax
movl %edx,%ebx
mull %edx
subl %eax,%ecx
....

# EBX = full seconds
# ECX = milliseconds

This code compares the input against the maximum
(255 h, 59 m, 59 s, 999 ms) and jumps to a label
called too_large if it is above. The first stage
divides by 3,600,000, leaving full hours in ESI.
The second stage divides by 60,000, leaving full
minutes in EDI. The last stage divides by 1,000,
full seconds are in EBX and the remaining milli-
seconds are in ECX.

Insert some comparisons between the stages if it
is required to emit different formats (h, m, s,
ms). ECX always is the remainder of the previous
stage. Test if ECX is zero after each step, then
display the content of ESI, EDI, EBX or ECX.

The sample code should execute about three times
faster than any solution using DIV opcodes.

Greetings from Augsburg

Bernhard Schornak
```
 0

```Andy wrote:
> I am new to using division in my code.
>
> I am trying to come up with the logic to convert milliseconds to
> seconds,minutes, and hours.

From what you write further down, it seems like you are only supposed
to print out the minimum number of terms needed?

Anyway, it still makes sense to first do a full conversion into 4 terms:

I.e.

void ms2hms_ms(unsigned ms, unsigned *hms_ms)
{
unsigned sec = ms / 1000;
ms -= sec * 1000;
hms_ms[3] = ms;

unsigned min = sec / 60;
sec -= min * 60;
hms_ms[2] = sec;

unsigned hour = min / 60;
min -= hour * 60;
hms_ms[1] = min;
hms_ms[0] = hour;
}

The only reason for doing something like this in asm is to make it fast,
right?

Well, in that case we can do some really strange tricks, like
calculating the hours first, then get all the other results by shift,
sub & shift:

mov eax,[ms]
mov ebx,[hms_ms]
;; Scaled reciprocal of ms in 1 hour, rounded up
mov edx, ((1 shl 32) + 3599999) / 3600000
mul edx

;; At this point EDX has # of hours, EAX has the fractional hours
mov ecx,eax
mov [ebx],edx		;; Save hour
shr ecx,4
sub eax,ecx
mov ecx,eax
shr eax,26
and ecx,03ffffffh	;; min/sec/ms fraction
mov [ebx+4],eax	;; Save min
mov eax,ecx
shr ecx,4
sub eax,ecx
mov ecx,eax
shr eax,20
and ecx,0fffffh
mov [ebx+8],eax	;; Save sec
lea eax,[ecx+ecx*2]	;; ms_fraction * 3
shl ecx,7		;; ms_fraction * 128
sub ecx,eax		;; -- " -- * 125
shr ecx,18
mov [ebx+12],ecx

Assuming this code works, it should do the full 4-way conversion in

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
```
 0

```Terje Mathisen wrote:

<snip>

> mov eax,[ms]
> mov ebx,[hms_ms]
> ;; Scaled reciprocal of ms in 1 hour, rounded up
> mov edx, ((1 shl 32) + 3599999) / 3600000

Stupid question: What is (1 shl 32) good for?

As described in my docs, processors mask out all
bits above 0x1F, so the above shift left is zero
in any case - even if 32 was not masked to zero,
shifting a 32 bit register left (or right) by 32
still is zero (except SAR, where the sign bit is
shifted in).

Greetings from Augsburg

Bernhard Schornak
```
 0

```On Jun 7, 1:39=A0am, Bernhard Schornak <schor...@nospicedham.web.de>
wrote:
> Terje Mathisen wrote:
>
> <snip>
>
> > mov eax,[ms]
> > mov ebx,[hms_ms]
> > ;; Scaled reciprocal of ms in 1 hour, rounded up
> > mov edx, ((1 shl 32) + 3599999) / 3600000
>
> Stupid question: What is (1 shl 32) good for?
>
> As described in my docs, processors mask out all
> bits above 0x1F, so the above shift left is zero
> in any case - even if 32 was not masked to zero,
> shifting a 32 bit register left (or right) by 32
> still is zero (except SAR, where the sign bit is
> shifted in).

It depends on the particular assembler implementation (just as well as
1<<(sizeof(int)*CHAR_BIT) depends on the particular C/C++
implementation because shifts of ints by sizeof(int)*CHAR_BIT or more
per the C/C++ standard are either undefined or implementation-specific
and there may be special cases for signed ints (which 1 typically
is)).

Alex
```
 0

```Bernhard Schornak wrote:
> Terje Mathisen wrote:
>
> <snip>
>
>> mov eax,[ms]
>> mov ebx,[hms_ms]
>> ;; Scaled reciprocal of ms in 1 hour, rounded up
>> mov edx, ((1 shl 32) + 3599999) / 3600000
>
> Stupid question: What is (1 shl 32) good for?

This is a (symbolic) constant, it is, as you note, quite possible that a
given assembler won't be able to calculate this at assembly time, i
which case you'll have to use a 32-bit constant instead...

Anyway, the proper reciprocal will probably need more significant bits
anyway, so you need an even higher shift count that definitely requires
64-bit processing for the calculation.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
```
 0

```Alexei A. Frounze wrote:

> On Jun 7, 1:39 am, Bernhard Schornak wrote:
>> Terje Mathisen wrote:
>>
>> <snip>
>>
>>> mov eax,[ms]
>>> mov ebx,[hms_ms]
>>> ;; Scaled reciprocal of ms in 1 hour, rounded up
>>> mov edx, ((1 shl 32) + 3599999) / 3600000
>>
>> Stupid question: What is (1 shl 32) good for?
>>
>> As described in my docs, processors mask out all
>> bits above 0x1F, so the above shift left is zero
>> in any case - even if 32 was not masked to zero,
>> shifting a 32 bit register left (or right) by 32
>> still is zero (except SAR, where the sign bit is
>> shifted in).
>
> It depends on the particular assembler implementation (just as well as
> 1<<(sizeof(int)*CHAR_BIT) depends on the particular C/C++
> implementation because shifts of ints by sizeof(int)*CHAR_BIT or more
> per the C/C++ standard are either undefined or implementation-specific
> and there may be special cases for signed ints (which 1 typically
> is)).

logic behind this construct.

GCC uses the idiv (udiv) algorithm introduced in
the AMD Optimisation Guide to calculate recipro-
cals, but Terje's code doesn't work with the EAX
(remainder) emitted by that multiplication. It's
a fast algorithm if it works, but: It depends on
the remainder stored in EAX - which obviously is
not working with a wrong 'divisor'. I'm not that
good with math stuff, so I cannot figure out how
to extract the remainder from the content of EAX
after the multiply.

My 'divisor' : 0x95217CB1
EAX after MUL: 0xED89E8F8

Would be great if I could replace multiple MULs,
see the code I posted, with something faster. At
least, my slow-bloat throws proper results... ;)

Greetings from Augsburg

Bernhard Schornak
```
 0

```Bernhard Schornak wrote:
> Alexei A. Frounze wrote:
>
>> On Jun 7, 1:39 am, Bernhard Schornak wrote:
>>> Terje Mathisen wrote:
>>>
>>> <snip>
>>>
>>>> mov eax,[ms]
>>>> mov ebx,[hms_ms]
>>>> ;; Scaled reciprocal of ms in 1 hour, rounded up
>>>> mov edx, ((1 shl 32) + 3599999) / 3600000
>>>
>>> Stupid question: What is (1 shl 32) good for?
>>>
>>> As described in my docs, processors mask out all
>>> bits above 0x1F, so the above shift left is zero
>>> in any case - even if 32 was not masked to zero,
>>> shifting a 32 bit register left (or right) by 32
>>> still is zero (except SAR, where the sign bit is
>>> shifted in).
>>
>> It depends on the particular assembler implementation (just as well as
>> 1<<(sizeof(int)*CHAR_BIT) depends on the particular C/C++
>> implementation because shifts of ints by sizeof(int)*CHAR_BIT or more
>> per the C/C++ standard are either undefined or implementation-specific
>> and there may be special cases for signed ints (which 1 typically
>> is)).
>
> logic behind this construct.
>
> GCC uses the idiv (udiv) algorithm introduced in
> the AMD Optimisation Guide to calculate recipro-

This algorithm was re-invented by myself and Agner Fog a long time
before AMD picked it up. :-)

> cals, but Terje's code doesn't work with the EAX
> (remainder) emitted by that multiplication. It's
> a fast algorithm if it works, but: It depends on
> the remainder stored in EAX - which obviously is
> not working with a wrong 'divisor'. I'm not that
> good with math stuff, so I cannot figure out how
> to extract the remainder from the content of EAX
> after the multiply.

I have used this particular trick in another piece of code that AMD
"stole" (i.e. used without attribution) for their optimization guides.

After several years, they've now removed it afaik, but I've never gotten
any explanations or "we're sorry we removed your name".

The code in question can convert a full 32-bit unsigned value into 10
decimal digits in just a tiny bit more time than what's needed for a
single DIV.

>
> My 'divisor' : 0x95217CB1
> EAX after MUL: 0xED89E8F8
>
> Would be great if I could replace multiple MULs,
> see the code I posted, with something faster. At
> least, my slow-bloat throws proper results... ;)

That's always good.

The nice thing about these algorithms is that the solution space is so
small that it is very quick to do an exhaustive test of all possible inputs!

I.e. if the ms value is limited to the 1-day range ([0..86399999], then
we can try all of these values and verify the answers.

To get back to the original problem: How about writing a full binary to
ascii conversion, returning something like

03:45:37.123

for the corresponding number of ms?

The input value can have up to 8 significant digits, so we can start by
splitting the value around 10000:

temp64 = ms * recip_10k;

The top 16 bits contain the number of 10-second periods, while the
remaining 48 bits is the number of fractional such periods.

We can now convert the bottom 4 digits by masking away the integer part,
multiplying by 5 (LEA) and moving the mask to get the factors of 2:

hi_frac = (unsigned) (temp64 >> 20) & 0x0fffffff; // 28 bits
hi_frac += hi_frac*4; // hi_frac = LEA(hi_frac, hi_frac*4)
result[7] = (hi_frac >> 27) + '0'; // seconds
hi_frac &= 0x07ffffff;
hi_frac += hi_frac*4;
result[9] = (hi_frac >> 26) + '0'; // 10ths
hi_frac &= 0x03ffffff;
hi_frac += hi_frac*4;
result[10] = (hi_frac >> 25) + '0'; // 100s
hi_frac &= 0x01ffffff;
hi_frac += hi_frac*4;
result[11] = (hi_frac >> 24) + '0'; // 1000s

At the same time we need to convert the upper digits:

lo_frac = (unsigned) (temp64 >> 48);
lo_frac *= recip_3600; // Scaled reciprocal for the number of
// 10-second periods in 10 hours...
result[0] = (lo_frac >> 29) + '0';
lo_frac &= 0x1fffffff;
lo_frac += lo_frac*4;
result[1] = (lo_frac >> 28) + '0';
lo_frac &= 0x0fffffff;

// Next is the 10s of minutes, so scale by 6 (3) instead of 10 (5)

lo_frac += lo_frac*2;
result[3] = (lo_frac >> 27) + '0';
lo_frac &= 0x07ffffff;

// 2nd digit minutes:

lo_frac += lo_frac*4;
result[4] = (lo_frac >> 26) + '0';
lo_frac &= 0x03ffffff;

// 1st digit seconds

lo_frac += lo_frac*2;
result[6] = (lo_frac >> 25) + '0';

We will of course interleave the hi and lo parts of the algorithm, since
this makes it almost twice as fast. :-)

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
```
 0

```On Jun 7, 10:56=A0am, Terje Mathisen <"terje.mathisen at
tmsw.no"@giganews.com> wrote:
> Bernhard Schornak wrote:
> > Alexei A. Frounze wrote:
>
> >> On Jun 7, 1:39 am, Bernhard Schornak wrote:
>
> We will of course interleave the hi and lo parts of the algorithm, since
> this makes it almost twice as fast. :-)
>
> Terje
> --
> - <Terje.Mathisen at tmsw.no>
> "almost all programming can be viewed as an exercise in caching"

Thanks to everyone for their help and feedback.

This is what I ended up with.

; ptime.asm by QWord
;
include \masm32\include\masm32rt.inc

..code
main proc
LOCAL sui:STARTUPINFO
LOCAL pi:PROCESS_INFORMATION
LOCAL creationTime:QWORD
LOCAL exitTime:QWORD
LOCAL sysTime:QWORD
LOCAL runTime:REAL8
LOCAL ppwszCommand:PVOID
LOCAL pwszName:PWCHAR
LOCAL nArgs:SDWORD
LOCAL time[5]:DWORD

mov ppwszCommand,rv(CommandLineToArgvW,ASM(mov
.if nArgs !=3D 2
print "missing argumnet",13,10
ret
.endif
m2m pwszName,[eax+4]

invoke memfill,ADDR sui,SIZEOF sui,0	; fill STARTUPINFO
mov sui.cb,SIZEOF sui					;
.if rv(WaitForSingleObject,pi.hProcess,INFINITE) !=3D WAIT_FAILED

mov eax,DWORD ptr exitTime[0]		; calc time difference
mov edx,DWORD ptr exitTime[4]       ; =3D run time
sub eax,DWORD ptr creationTime[0]   ;
sbb edx,DWORD ptr creationTime[4]   ;

mov ecx,10000		; div by 10*1000
div ecx				; eax =3D ms

mov ecx,1000		;
xor edx,edx         ;
div ecx             ; eax =3D s , edx =3D ms
mov time[0*4],edx   ;

mov ecx,60          ;
xor edx,edx         ;
div ecx             ; eax =3D m , edx =3D s
mov time[1*4],edx   ;

mov ecx,60          ;
xor edx,edx         ;
div ecx             ; eax =3D h , edx =3D m
mov time[2*4],edx   ;

mov ecx,24          ;
xor edx,edx         ;
div ecx             ; eax =3D d , edx =3D h
mov time[3*4],edx   ;
mov time[4*4],eax   ;

print "          dd:hh:mm:ss:_ms",13,10
fn crt_printf,"run time: %02d:%02d:%02d:%02d:
%03d",time[3*4],time[3*4],time[2*4],time[1*4],time[0*4]
print chr\$(13,10)
.else
print "GetProcessTimes fails",13,10
.endif
.else
print "WaitForSingelObject fails",13,10
.endif
invoke CloseHandle,pi.hProcess
.else
.endif

ret
main endp
end main
```
 0

```Terje Mathisen wrote:

> Bernhard Schornak wrote:
>> Alexei A. Frounze wrote:
>>
>>> On Jun 7, 1:39 am, Bernhard Schornak wrote:
>>>> Terje Mathisen wrote:
>>>>
>>>> <snip>
>>>>
>>>>> mov eax,[ms]
>>>>> mov ebx,[hms_ms]
>>>>> ;; Scaled reciprocal of ms in 1 hour, rounded up
>>>>> mov edx, ((1 shl 32) + 3599999) / 3600000
>>>>
>>>> Stupid question: What is (1 shl 32) good for?
>>>>
>>>> As described in my docs, processors mask out all
>>>> bits above 0x1F, so the above shift left is zero
>>>> in any case - even if 32 was not masked to zero,
>>>> shifting a 32 bit register left (or right) by 32
>>>> still is zero (except SAR, where the sign bit is
>>>> shifted in).
>>>
>>> It depends on the particular assembler implementation (just as well as
>>> 1<<(sizeof(int)*CHAR_BIT) depends on the particular C/C++
>>> implementation because shifts of ints by sizeof(int)*CHAR_BIT or more
>>> per the C/C++ standard are either undefined or implementation-specific
>>> and there may be special cases for signed ints (which 1 typically
>>> is)).
>>
>> logic behind this construct.
>>
>> GCC uses the idiv (udiv) algorithm introduced in
>> the AMD Optimisation Guide to calculate recipro-
>
> This algorithm was re-invented by myself and Agner Fog a long time
> before AMD picked it up. :-)

Sorry - I did not know that AMD 'lent' these algo-
rithms from somewhere else. I just know the manual
by heart. That is: I know where I've to look at if
I don't remember something particular...

>> cals, but Terje's code doesn't work with the EAX
>> (remainder) emitted by that multiplication. It's
>> a fast algorithm if it works, but: It depends on
>> the remainder stored in EAX - which obviously is
>> not working with a wrong 'divisor'. I'm not that
>> good with math stuff, so I cannot figure out how
>> to extract the remainder from the content of EAX
>> after the multiply.
>
> I have used this particular trick in another piece of code that AMD
> "stole" (i.e. used without attribution) for their optimization guides.
>
> After several years, they've now removed it afaik, but I've never gotten
> any explanations or "we're sorry we removed your name".
>
> The code in question can convert a full 32-bit unsigned value into 10
> decimal digits in just a tiny bit more time than what's needed for a
> single DIV.
>
>>
>> My 'divisor' : 0x95217CB1
>> EAX after MUL: 0xED89E8F8
>>
>> Would be great if I could replace multiple MULs,
>> see the code I posted, with something faster. At
>> least, my slow-bloat throws proper results... ;)
>
> That's always good.
>
> The nice thing about these algorithms is that the solution space is so
> small that it is very quick to do an exhaustive test of all possible
> inputs!
>
> I.e. if the ms value is limited to the 1-day range ([0..86399999], then
> we can try all of these values and verify the answers.
>
> To get back to the original problem: How about writing a full binary to
> ascii conversion, returning something like
>
> 03:45:37.123
>
> for the corresponding number of ms?
>
> The input value can have up to 8 significant digits, so we can start by
> splitting the value around 10000:
>
> temp64 = ms * recip_10k;
>
> The top 16 bits contain the number of 10-second periods, while the
> remaining 48 bits is the number of fractional such periods.
>
> We can now convert the bottom 4 digits by masking away the integer part,
> multiplying by 5 (LEA) and moving the mask to get the factors of 2:
>
> hi_frac = (unsigned) (temp64 >> 20) & 0x0fffffff; // 28 bits
> hi_frac += hi_frac*4; // hi_frac = LEA(hi_frac, hi_frac*4)
> result[7] = (hi_frac >> 27) + '0'; // seconds
> hi_frac &= 0x07ffffff;
> hi_frac += hi_frac*4;
> result[9] = (hi_frac >> 26) + '0'; // 10ths
> hi_frac &= 0x03ffffff;
> hi_frac += hi_frac*4;
> result[10] = (hi_frac >> 25) + '0'; // 100s
> hi_frac &= 0x01ffffff;
> hi_frac += hi_frac*4;
> result[11] = (hi_frac >> 24) + '0'; // 1000s
>
> At the same time we need to convert the upper digits:
>
> lo_frac = (unsigned) (temp64 >> 48);
> lo_frac *= recip_3600; // Scaled reciprocal for the number of
> // 10-second periods in 10 hours...
> result[0] = (lo_frac >> 29) + '0';
> lo_frac &= 0x1fffffff;
> lo_frac += lo_frac*4;
> result[1] = (lo_frac >> 28) + '0';
> lo_frac &= 0x0fffffff;
>
> // Next is the 10s of minutes, so scale by 6 (3) instead of 10 (5)
>
> lo_frac += lo_frac*2;
> result[3] = (lo_frac >> 27) + '0';
> lo_frac &= 0x07ffffff;
>
> // 2nd digit minutes:
>
> lo_frac += lo_frac*4;
> result[4] = (lo_frac >> 26) + '0';
> lo_frac &= 0x03ffffff;
>
> // 1st digit seconds
>
> lo_frac += lo_frac*2;
> result[6] = (lo_frac >> 25) + '0';
>
> We will of course interleave the hi and lo parts of the algorithm, since
> this makes it almost twice as fast. :-)

Thank you for this comprehensive description! I am
porting my OS/2 libraries to 64 bit Windows at the
moment, adding as many improvements as possible. A
faster conversion from binary numbers to different
formats and vice versa is interesting - especially
if multiple multiplications are replaced by simple
one clock operations. I'll probably figure out how
to apply the described algorithm to some real con-
version functions. If they work, their origin cer-
tainly is mentioned in my sources and docs! ;)

Greetings from Augsburg

Bernhard Schornak
```
 0

13 Replies
284 Views

Similiar Articles:

7/26/2012 2:38:32 PM