COMPGROUPS.NET | Search | Post Question | Groups | Stream | About | Register

Binary Log Routine

• Follow

```Where can I find a quick routine in assembler that will calculate the
log-base-two of an 8-bit number and a 16-bit number without a lookup table
if possible.

Even if it only indicated the nearest power-of-two might be helpful.
Thanks.

```
 0

```DCamera wrote:
> Where can I find a quick routine in assembler that will calculate the
> log-base-two of an 8-bit number and a 16-bit number without a lookup table
> if possible.
>
> Even if it only indicated the nearest power-of-two might be helpful.

What do you want to do with the result?

```
 0

```DCamera wrote:
> Where can I find a quick routine in assembler that will calculate the
> log-base-two of an 8-bit number and a 16-bit number without a lookup table
> if possible.
>
> Even if it only indicated the nearest power-of-two might be helpful.
> Thanks.

The BSR instruction finds the top bit set of a 16/32-bit value, which
is the nearest log-base-two below what you want, if that's any use.

Bart

```
 0

```DCamera wrote:

> Where can I find a quick routine in assembler that will calculate the
> log-base-two of an 8-bit number and a 16-bit number without a lookup table
> if possible.
>
> Even if it only indicated the nearest power-of-two might be helpful.

What result do you expect from foo(10)?
3?
3.32...
4?

The BSR instruction might prove useful...

"Searches the value in a register or a memory location (second operand)
for the most-significant set bit. If a set bit is found, the instruction
clears the zero flag (ZF) and stores the index of the most-significant
set bit in a destination register (first operand). If the second operand
contains 0, the instruction sets ZF to 1 and does not change the
contents of the destination register. The bit index is an unsigned
offset from bit 0 of the searched value."

```
 0

```DCamera wrote:
> Where can I find a quick routine in assembler that will calculate the
> log-base-two of an 8-bit number and a 16-bit number without a lookup table
> if possible.
>
> Even if it only indicated the nearest power-of-two might be helpful.
> Thanks.
>
; -------------------
; 8-bit
; -----
mov cl, 7
mov al, number

loop:
rcl al
jc found
dec cl
jns loop

found:                ; cl = Log (number) in base 2

; --------------------
; 16-bit
; ------
mov cl, 15
mov ax, number

loop:
rcl ax
jc found
dec cl
jns loop

found:                 ; cl = Log (number) in base 2
; --------------------

Not tested :)

For 386+, see BSF ??or  BSR

Jacques

```
 0

```One application would be to optimize a division routine by knowing how large
the number is. For example if you have a 32x32 bit division routine but when
called the values are say 100 / 10 then there is no need to loop through all
32 bits.

Might also come-in-handy for making approximations
For example A/B = AntiLog[(log(A) - Log(b)]

In general just a quick-and-dirty way to know the size of binary numbers
before going to work on them. I was just hoping there might be some neat
trick that had been worked out amongst the pro ASM programmers.

For example: I tried playing with A&(A-1)=0 iff A is power-of-two.

The obvious routine would be to left-shift the numbers until the first "1"
(non-zero) appeared at left-end, but I was hoping for something faster that
would avoid the shifting & looping. I could look at sign-bit and that would
save one time around the loop rather than checking the carry flag.

I was looking for a super-optimized square-root routine also, I'm pretty
sure I have one kicking around here somewhere that only about 8 lines long.

Division and Square-Root seem to be the two routines that slow most
applications down now that there are fairly fast MULs. The AVR has a MUL
that takes 2 clock cycles, I don't remember what the Intel's is but I'm sure
it's no where as bad as a DIV or SQRT.

"Jeffrey Schwab" <spamtrap@crayne.org> wrote in message
news:bl7bg.6813\$JW5.5197@southeast.rr.com...
> DCamera wrote:
> > Where can I find a quick routine in assembler that will calculate the
> > log-base-two of an 8-bit number and a 16-bit number without a lookup
table
> > if possible.
> >
> > Even if it only indicated the nearest power-of-two might be helpful.
>
> What do you want to do with the result?
>

```
 0

```I want to use it to determine the size of a binary number.

ie: Binary Length = log2(X)+1

Obvious solution is to loop & left-shift until a non-zero appears, but I was
hoping that maybe someone had found something faster.

For example to design an optimized 32x32 divide it might be good to know
that two numbers are small and resort to a much faster 8x8 division instead.

"Jeffrey Schwab" <spamtrap@crayne.org> wrote in message
news:bl7bg.6813\$JW5.5197@southeast.rr.com...
> DCamera wrote:
> > Where can I find a quick routine in assembler that will calculate the
> > log-base-two of an 8-bit number and a 16-bit number without a lookup
table
> > if possible.
> >
> > Even if it only indicated the nearest power-of-two might be helpful.
>
> What do you want to do with the result?
>

```
 0

```DCamera <spamtrap@crayne.org> wrote in part:
> I was looking for a super-optimized square-root routine also, I'm pretty
> sure I have one kicking around here somewhere that only about 8 lines long.

sqrt() is easy:  just find 1/sqrt() via Newton-Raphson.
It converges very quickly.

> Division and Square-Root seem to be the two routines that slow most

AFAIK, FP div is done through 1/sqrt().

-- Robert

```
 0

```Thank you very much.

On the Intel I'm using A86 and don't recall any RCL command, I'm assuming
it's a LSL AL,1 ; Logical Shift Left by one. So you just keep going until a
one appears in MSB. I was hoping there might be a faster way than looping if
number was fairly large 32/64+ bits, that's a lot of time spent shifting
8-bits at-a-time each time through loop.

PS. I have no idea what BSF or BSR are. This is for 8 bit routines. No 286+
commands.

"Erdemal" <spamtrap@crayne.org> wrote in message
> DCamera wrote:
> > Where can I find a quick routine in assembler that will calculate the
> > log-base-two of an 8-bit number and a 16-bit number without a lookup
table
> > if possible.
> >
> > Even if it only indicated the nearest power-of-two might be helpful.
> > Thanks.
> >
> ; -------------------
> ; 8-bit
> ; -----
>        mov cl, 7
>        mov al, number
>
>        loop:
>           rcl al
>           jc found
>           dec cl
>           jns loop
>
>        found:                ; cl = Log (number) in base 2
>
> ; --------------------
> ; 16-bit
> ; ------
>       mov cl, 15
>       mov ax, number
>
>       loop:
>           rcl ax
>           jc found
>           dec cl
>           jns loop
>
>       found:                 ; cl = Log (number) in base 2
> ; --------------------
>
> Not tested :)
>
> For 386+, see BSF ??or  BSR
>
> Jacques
>

```
 0

```In article <e4kvtm\$1tp\$1@utornnr1pp.grouptelecom.net>,
spamtrap@crayne.org says...
> I want to use it to determine the size of a binary number.
>
> ie: Binary Length = log2(X)+1
>
> Obvious solution is to loop & left-shift until a non-zero appears, but I was
> hoping that maybe someone had found something faster.
>
> For example to design an optimized 32x32 divide it might be good to know
> that two numbers are small and resort to a much faster 8x8 division instead.

Here, an approximation of the magnitude is probably
sufficient. I'd consider cheating and (for example) only
attempting to figure it out to the nearest 8-bit
boundary.

Probably the most obvious way to do that would be
bisection: Start by splitting the word in two, and figure
out if the upper half is zero. Having found which half of
the word contains the most significant set bit, you can
divide that half into halves again, and so on until you
know it as accurately as you want.

test eax, eax
jz its_zero
test eax, 0ffff0000h
jnz top16
test ah, ah
jnz byte1	; at least one bit in AH is set
jmp byte0	; only bits in AL are set
top16:
test eax, 0ff000000h
jnz byte3	; at least one bit in top byte is set
jmp byte2	; only bits in bottom three bytes set
; continue further if necessary.

Keep in mind that finer granularity takes more time and
more code. You can keep the code from getting a lot
bigger by next moving the appropriate byte into AL, and
using merged code to find the bit inside of that byte.
That only helps space, not speed, though.

--
Later,
Jerry.

The universe is a figment of its own imagination.

```
 0

```Thanks, I'm going to play around with the concept of your routine.
It still loops, but at least it does not converge linearly,
so it should be a big improvement as the numbers get longer.

"Jerry Coffin" <spamtrap@crayne.org> wrote in message
news:MPG.1ed7de9b357301279897d6@news.sunsite.dk...
> In article <e4kvtm\$1tp\$1@utornnr1pp.grouptelecom.net>,
> spamtrap@crayne.org says...
> > I want to use it to determine the size of a binary number.
> >
> > ie: Binary Length = log2(X)+1
> >
> > Obvious solution is to loop & left-shift until a non-zero appears, but I
was
> > hoping that maybe someone had found something faster.
> >
> > For example to design an optimized 32x32 divide it might be good to know
> > that two numbers are small and resort to a much faster 8x8 division
>
> Here, an approximation of the magnitude is probably
> sufficient. I'd consider cheating and (for example) only
> attempting to figure it out to the nearest 8-bit
> boundary.
>
> Probably the most obvious way to do that would be
> bisection: Start by splitting the word in two, and figure
> out if the upper half is zero. Having found which half of
> the word contains the most significant set bit, you can
> divide that half into halves again, and so on until you
> know it as accurately as you want.
>
> test eax, eax
> jz its_zero
> test eax, 0ffff0000h
> jnz top16
> test ah, ah
> jnz byte1 ; at least one bit in AH is set
> jmp byte0 ; only bits in AL are set
> top16:
> test eax, 0ff000000h
> jnz byte3 ; at least one bit in top byte is set
> jmp byte2 ; only bits in bottom three bytes set
> ; continue further if necessary.
>
> Keep in mind that finer granularity takes more time and
> more code. You can keep the code from getting a lot
> bigger by next moving the appropriate byte into AL, and
> using merged code to find the bit inside of that byte.
> That only helps space, not speed, though.
>
> --
>     Later,
>     Jerry.
>
> The universe is a figment of its own imagination.
>

```
 0

```DanCam wrote:
> Thank you very much.
>
> On the Intel I'm using A86 and don't recall any RCL command, I'm assuming
> it's a LSL AL,1 ; Logical Shift Left by one. So you just keep going until a
> one appears in MSB. I was hoping there might be a faster way than looping if
> number was fairly large 32/64+ bits, that's a lot of time spent shifting
> 8-bits at-a-time each time through loop.
>
> PS. I have no idea what BSF or BSR are. This is for 8 bit routines. No 286+
> commands.

If you're targeting something 8086-ish, your best bet is probably a 256
byte lookup table, and just count the number of leading zero bytes.
Something like (untested):

; assume [di]->big endian input number, cx=length

lea bx,clztable
xor ax,ax
mov dx,cx
rep scasb
je allzeros
dec di
lodsb
xlat
allzeros:
sub dx,cx
shl dx,1
shl dx,1
shl dx,1
ret  ;number of leading zero bits in ax

clztable db 8,7,7,6,6,6,6,5,5,5,5,5,5,5,5,4,4...1,1,0

Then you'd need to transform the leading zero count appropriately.

That obviously simplifies if you have a one or two byte input, or if
you can assure that the byte following the number has the high bit set.
And changing that for a little endian number is trivial (change the
dec to an inc, and assume DF comes in the other way).

```
 0

```DanCam wrote:
> Thank you very much.
>
> On the Intel I'm using A86 and don't recall any RCL command, I'm assuming
> it's a LSL AL,1 ; Logical Shift Left by one. So you just keep going until a
> one appears in MSB. I was hoping there might be a faster way than looping if
> number was fairly large 32/64+ bits, that's a lot of time spent shifting
> 8-bits at-a-time each time through loop.
>
> PS. I have no idea what BSF or BSR are. This is for 8 bit routines. No 286+
> commands.

Are you kidding ?

If not, you absolutly need a list of x86 assembler opcodes :)

IE: http://www-ivs.cs.uni-magdeburg.de/bs/lehre/sose99/bs1/nasm/nasmdoca.html

Jacques

```
 0

```"Erdemal" <spamtrap@crayne.org> wrote in message
> DanCam wrote:
> > Thank you very much.
> >
> > On the Intel I'm using A86 and don't recall any RCL command, I'm
assuming
> > it's a LSL AL,1 ; Logical Shift Left by one. So you just keep going
until a
> > one appears in MSB. I was hoping there might be a faster way than
looping if
> > number was fairly large 32/64+ bits, that's a lot of time spent shifting
> > 8-bits at-a-time each time through loop.
> >
> > PS. I have no idea what BSF or BSR are. This is for 8 bit routines. No
286+
> > commands.
>
> Are you kidding ?
>
> If not, you absolutly need a list of x86 assembler opcodes :)
>
> IE:
http://www-ivs.cs.uni-magdeburg.de/bs/lehre/sose99/bs1/nasm/nasmdoca.html
>
> Jacques
>

Hmm, I know my A86 Assembler is old...but, I'll have to check I don't think
it includes BSR, you sure that's not a 286+ command, but if it does, that is
exactly what I am looking for so the Intel application would be easy. From
what I understand BSR scans a byte/word from the top and reports the first
instance of non-zero, is that correct?

Well, then I guess what I'm looking for is a routine that would do this
implicitly for embedded chip without a BSR command. I thought someone might
have come across a neat short-cut that takes advantage of something like
A&(A-1)=0 iff A is power of two, or like lg2(x)=log10(x)+ln(x).

```
 0

```DanCam wrote:

> Hmm, I know my A86 Assembler is old...but, I'll have to check I don't think
> it includes BSR, you sure that's not a 286+ command,

I use the MASM help file for x86 opcodes, here is the BSF MASM short doc

BSF - Bit Scan Forward (386+)
Usage:  BSF     dest,src
Modifies flags: ZF
Scans source operand for first bit set.  Sets ZF if a bit is found
set and loads the destination with an index to first set bit.  Clears
ZF is no bits are found set.  BSF scans forward across bit pattern
(0-n) while BSR scans in reverse (n-0).
Clocks                 Size
Operands         808x  286   386   486          Bytes

reg,reg           -     -   10+3n  6-42           3
reg,mem           -     -   10+3n  7-43          3-7
reg32,reg32       -     -   10+3n  6-42          3-7
reg32,mem32       -     -   10+3n  7-43          3-7

0F BC BSF r16,r/m16 Bit scan forward on r/m16

0F BC BSF r32,r/m32 Bit scan forward on r/m32

> but if it does, that is
> exactly what I am looking for so the Intel application would be easy. From
> what I understand BSR scans a byte/word from the top and reports the first
> instance of non-zero, is that correct?

> Well, then I guess what I'm looking for is a routine that would do this
> implicitly for embedded chip without a BSR command. I thought someone might
> have come across a neat short-cut that takes advantage of something like
> A&(A-1)=0 iff A is power of two, or like lg2(x)=log10(x)+ln(x).

Nice approximation but how do you compute log(10)x and ln(x) without FPU
and if you have a FPU ...

I must have missed something ... as often :)

Jacques

```
 0

```In article <e4lrgg\$t5a\$1@utornnr1pp.grouptelecom.net>
spamtrap@crayne.org "DanCam " writes:

> Hmm, I know my A86 Assembler is old...but, I'll have to check I don't think
> it includes BSR, you sure that's not a 286+ command, but if it does, that is
> exactly what I am looking for so the Intel application would be easy. From
> what I understand BSR scans a byte/word from the top and reports the first
> instance of non-zero, is that correct?

from the same era and has useful info on asm as well as other DOS
topics.

Pete
--
"We have not inherited the earth from our ancestors,
we have borrowed it from our descendants."

```
 0

```Erdemal wrote:
> DanCam wrote:

> > have come across a neat short-cut that takes advantage of something like
> > A&(A-1)=0 iff A is power of two, or like lg2(x)=log10(x)+ln(x).
>
> Nice approximation

lg(x)+ln(x)       lg(x)+ln(x)
Nice?  ld(x)= -------------  =  ------------
lg(2)+ln(2)        0.994177

so it's 0.6% to small. And if you calculate x with the
above approximation you get an error:

2**(lg(x)+ln(x))
---------------  = 2**(-0.005823 * ld(x))
2**(ld(x))

so, for ld(x) = 1/0.005823 = 172 (i.e x= 6*10^51) the approximation
is only half of the correct value.

```
 0

```"Erdemal" <spamtrap@crayne.org> wrote in message
> DanCam wrote:
>
> > Hmm, I know my A86 Assembler is old...but, I'll have to check I don't
think
> > it includes BSR, you sure that's not a 286+ command,
>
> I use the MASM help file for x86 opcodes, here is the BSF MASM short doc
>
> BSF - Bit Scan Forward (386+)
>          Usage:  BSF     dest,src
>          Modifies flags: ZF
>          Scans source operand for first bit set.  Sets ZF if a bit is
found
>          set and loads the destination with an index to first set bit.
Clears
>          ZF is no bits are found set.  BSF scans forward across bit
pattern
>          (0-n) while BSR scans in reverse (n-0).
>                                   Clocks                 Size
>          Operands         808x  286   386   486          Bytes
>
>          reg,reg           -     -   10+3n  6-42           3
>          reg,mem           -     -   10+3n  7-43          3-7
>          reg32,reg32       -     -   10+3n  6-42          3-7
>          reg32,mem32       -     -   10+3n  7-43          3-7
>
>          0F BC BSF r16,r/m16 Bit scan forward on r/m16
>
>          0F BC BSF r32,r/m32 Bit scan forward on r/m32
>
> > but if it does, that is
> > exactly what I am looking for so the Intel application would be easy.
From
> > what I understand BSR scans a byte/word from the top and reports the
first
> > instance of non-zero, is that correct?
>
> > Well, then I guess what I'm looking for is a routine that would do this
> > implicitly for embedded chip without a BSR command. I thought someone
might
> > have come across a neat short-cut that takes advantage of something like
> > A&(A-1)=0 iff A is power of two, or like lg2(x)=log10(x)+ln(x).
>
> Nice approximation but how do you compute log(10)x and ln(x) without FPU
> and if you have a FPU ...
>
> I must have missed something ... as often :)
>
> Jacques
>

You misunderstand. Those were just EXAMPLES of the type of "short-cut" I was
looking for. The embedded uCs are only 8-bits and don't have any Floating
Point and there's no room to do one in software. The BSF opcode might look
good, but my guess is that if you looked at the number of clock-cycles, that
it would probably be just as slow to do it manually by shifting and looking
at bits.

There was one I saw a while back that could quickly tell the "distance" to
the next higher power-of-two, but I did not save it, and now I can't find
it. If I took the original number and added the "distance" then I would have
the next largest power-of-two or lg2(x).

```
 0

```Erdemal  <spamtrap@crayne.org> wrote:

>DanCam wrote:
>
>> Hmm, I know my A86 Assembler is old...but, I'll have to check I don't think
>> it includes BSR, you sure that's not a 286+ command,
>
>I use the MASM help file for x86 opcodes, here is the BSF MASM short doc
>
>BSF - Bit Scan Forward (386+)

Notice this:            ^^^^^^

Yes, BSR and BSF were not introduced until the 386.  The original poster
certainly did not make clear that he was in an 8086 environment.
--
- Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.

```
 0

```Here's C++ code, you get assembler routine if you compile this
(compilers will give assembly output with right compiler switch, for
example Visual C++ that would be /Faout.asm )

inline int log2ui(uint32 u)
{
float v(u);
return (static_cast<int32>(floatInt(v)) >> 23) - 127;
}

This doesn't use lookup table but does int-to-float conversion, it
works like this:

The value is loaded into floating-point format. The IEEE754 encoding
stores the the exponent like this:

31                             0
xEEEEEEEEmmmmmmmmmmmmmmmmmmmmmmm

The bits 30 to 23 store the exponent, it's biased with 127 (meaning the
exponent is *signed* using two's complement), so, when we have the
integer value encoded like this we shift the bits down 23 times, now we
have exponent as 8 bit value in 8 lowest bits.

Then we remove the bias (-127 above) and we have the log2 of the value.
The downside is that we have to *store* the floating point value into
memory on x86 architechture to access the bits in floating-point
register. So it's store-load situation, which may not be what you are
looking for because it's not the best possible way around things, but
hey, lookup table -or- silly iterative methods are atleast not used.
;)=

```
 0

```Op Sat, 27 May 2006 17:59:05 +0200 schreef jukka@liimatta.org
<spamtrap@crayne.org>:
> Here's C++ code, you get assembler routine if you compile this
> (compilers will give assembly output with right compiler switch, for
> example Visual C++ that would be /Faout.asm )
>
> inline int log2ui(uint32 u)
> {
>   float v(u);
>   return (static_cast<int32>(floatInt(v)) >> 23) - 127;
> }
>

Might as well do it compiler-independant, type-safely and in C:

inline int log2i(unsigned int i)
{
if (i == 0) {
return -1;
}
union {
float mf;
int mi;
} u;
u.mf = (float) i;
return (u.mi >> 23) - 127;
}

This one still requires 32-bit arch and IEEE-754 floats though.

> This doesn't use lookup table but does int-to-float conversion, it
> works like this:
>
> The value is loaded into floating-point format. The IEEE754 encoding
> stores the the exponent like this:
>
> 31                             0
> xEEEEEEEEmmmmmmmmmmmmmmmmmmmmmmm
>
> The bits 30 to 23 store the exponent, it's biased with 127 (meaning the
> exponent is *signed* using two's complement), so, when we have the
> integer value encoded like this we shift the bits down 23 times, now we
> have exponent as 8 bit value in 8 lowest bits.

give nonsense) and zero which should throw an exception or return -1.

> Then we remove the bias (-127 above) and we have the log2 of the value.
> The downside is that we have to *store* the floating point value into
> memory on x86 architechture to access the bits in floating-point
> register. So it's store-load situation, which may not be what you are
> looking for because it's not the best possible way around things, but
> hey, lookup table -or- silly iterative methods are atleast not used.
> ;)=
>

--
Gemaakt met Opera's revolutionaire e-mailprogramma:
http://www.opera.com/mail/

```
 0

```Boudewijn Dijkstra wrote:

> jukka wrote:
>
>> Here's C++ code, you get assembler routine if you compile this
>> (compilers will give assembly output with right compiler switch,
>> for example Visual C++ that would be /Faout.asm )
>>
>> inline int log2ui(uint32 u)
>> {
>>   float v(u);
>>   return (static_cast<int32>(floatInt(v)) >> 23) - 127;
>> }
>
> Might as well do it compiler-independant, type-safely and in C:
>
> inline int log2i(unsigned int i)
> {
>   if (i == 0) {
>     return -1;
>   }
>   union {
>     float mf;
>     int mi;
>   } u;
>   u.mf = (float) i;
>   return (u.mi >> 23) - 127;
> }

As far as I understand, your code has implementation-defined behavior,
therefore it is not compiler-independent ;-)

<quote>
With one exception, if a member of a union object is accessed after
a value has been stored in a different member of the object, the
behavior is implementation-defined. One special guarantee is made
in order to simplify the use of unions: If a union contains several
structures that share a common initial sequence, and if the union
object currently contains one of these structures, it is permitted to
inspect the common initial part of any of them. Two structures share
a common initial sequence if corresponding members have compatible
types for a sequence of one or more initial members.
</quote>

By the way, the OP said he didn't have an FPU :-)

```
 0

```Op Wed, 31 May 2006 17:08:16 +0200 schreef Spoon <root@127.0.0.1>:
> Boudewijn Dijkstra wrote:
>> jukka wrote:
>>
>>> Here's C++ code, you get assembler routine if you compile this
>>> (compilers will give assembly output with right compiler switch,
>>> for example Visual C++ that would be /Faout.asm )
>>>
>>> inline int log2ui(uint32 u)
>>> {
>>>   float v(u);
>>>   return (static_cast<int32>(floatInt(v)) >> 23) - 127;
>>> }
>> Might as well do it compiler-independant, type-safely and in C:
>> inline int log2i(unsigned int i)
>> {
>>   if (i == 0) {
>>     return -1;
>>   }
>>   union {
>>     float mf;
>>     int mi;
>>   } u;
>>   u.mf = (float) i;
>>   return (u.mi >> 23) - 127;
>> }
>
> As far as I understand, your code has implementation-defined behavior,
> therefore it is not compiler-independent ;-)

Let's say it doesn't depend on functionality that was caughed up by a
single specific compiler manufacturer.

> By the way, the OP said he didn't have an FPU :-)

Too bad, I did not notice that from alt.comp.lang.assembler.

--
Gemaakt met Opera's revolutionaire e-mailprogramma:
http://www.opera.com/mail/

```
 0

```"Spoon" <root@127.0.0.1> wrote in message news:e5kb97\$lrh\$1@nntp.aioe.org...
> Boudewijn Dijkstra wrote:
>
>> jukka wrote:
>>
>>> Here's C++ code, you get assembler routine if you compile this
>>> (compilers will give assembly output with right compiler switch,
>>> for example Visual C++ that would be /Faout.asm )
>>>
>>> inline int log2ui(uint32 u)
>>> {
>>>   float v(u);
>>>   return (static_cast<int32>(floatInt(v)) >> 23) - 127;
>>> }
>>
>> Might as well do it compiler-independant, type-safely and in C:
>>
>> inline int log2i(unsigned int i)
>> {
>>   if (i == 0) {
>>     return -1;
>>   }
>>   union {
>>     float mf;
>>     int mi;
>>   } u;
>>   u.mf = (float) i;
>>   return (u.mi >> 23) - 127;
>> }
>
> As far as I understand, your code has implementation-defined behavior,
> therefore it is not compiler-independent ;-)
>
> <quote>
> With one exception, if a member of a union object is accessed after
> a value has been stored in a different member of the object, the
> behavior is implementation-defined. One special guarantee is made
> in order to simplify the use of unions: If a union contains several
> structures that share a common initial sequence, and if the union
> object currently contains one of these structures, it is permitted to
> inspect the common initial part of any of them. Two structures share
> a common initial sequence if corresponding members have compatible
> types for a sequence of one or more initial members.
> </quote>
>
> By the way, the OP said he didn't have an FPU :-)
/*
This glop is obviously C, but compile it with assembly output and it should
give you a simple assembly routine or just code the same algorithms.

Benchmark to see which work best for your system.
*/
const int tab_m37_ff[] = {
-1,0,25,1,22,26,31,2,15,23,29,27,10,-1,12,3,6,16,
-1,24,21,30,14,28,9,11,5,-1,20,13,8,4,19,7,18,17
};

const int tab_8[]={
-1,0,1,1,2,2,2,2,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7
};

/***********************************************/
/* Locate the postion of the highest bit set.  */
/* A binary search is used.  The result is an  */
/* approximation of log2(n) [the integer part] */
/***********************************************/
unsigned long   a_log2(unsigned long n)
{
unsigned long   i = (-1);
/* Is there a bit on in the high word? */
/* Else, all the high bits are already zero. */
if (n & 0xffff0000) {
i += 16;                /* Update our search position */
n >>= 16;               /* Shift out lower (irrelevant) bits */
}
/* Is there a bit on in the high byte of the current word? */
/* Else, all the high bits are already zero. */
if (n & 0xff00) {
i += 8;                 /* Update our search position */
n >>= 8;                /* Shift out lower (irrelevant) bits */
}
/* Is there a bit on in the current nybble? */
/* Else, all the high bits are already zero. */
if (n & 0xf0) {
i += 4;                 /* Update our search position */
n >>= 4;                /* Shift out lower (irrelevant) bits */
}
/* Is there a bit on in the high 2 bits of the current nybble? */
/* 0xc is 1100 in binary... */
/* Else, all the high bits are already zero. */
if (n & 0xc) {
i += 2;                 /* Update our search position */
n >>= 2;                /* Shift out lower (irrelevant) bits */
}
/* Is the 2nd bit on? [ 0x2 is 0010 in binary...] */
/* Else, all the 2nd bit is already zero. */
if (n & 0x2) {
i++;                    /* Update our search position */
n >>= 1;                /* Shift out lower (irrelevant) bit */
}
/* Is the lowest bit set? */
if (n)
i++;                    /* Update our search position */
return i;
}

/*
** From: <bousek@\$smtpg.compsys.com\$>
** Subject: Re: Is there a faster way to find the highest bit set in a 32
bit integer? [Includes C code listing]
** Newsgroups: comp.graphics.algorithms
** References: <01bc3fac\$69b314c0\$ca61e426@DCorbit.solutionsiq.com>
** Organization: TDS Telecom - Madison, WI
** Message-ID: <01bc405b\$2c1c8740\$ca61e426@DCorbit.solutionsiq.com>
** X-Newsreader: Microsoft Internet News 4.70.1155
** MIME-Version: 1.0
** Content-Type: text/plain; charset=ISO-8859-1
** Content-Transfer-Encoding: 7bit
**
** "Dann Corbit" <dcorbit@solutionsiq.com> wrote:
** <snip>
** hey Dann:
** you could modify your routine something like this (although i would tend
to
** put this type of routine in assembler)...
*/
unsigned long   b_log2(unsigned long n)
{
register unsigned long i = (n & 0xffff0000) ? 16 : 0;
if ((n >>= i) & 0xff00)
i |= 8, n >>= 8;
if (n & 0xf0)
i |= 4, n >>= 4;
if (n & 0xc)
i |= 2, n >>= 2;
return (i | (n >> 1));
}

unsigned long   c_log2(unsigned long n)
{
register unsigned long p = 16,
i = 0;
do {
unsigned long   t = n >> p;
if (t)
n = t, i |= p;
} while (p >>= 1);
return i;
}

#include <math.h>
/*
* This is the fully portable version, which uses
* the 'frexp' function to get the mantissa and
* exponent of a number.
*/
unsigned long   d_log2(unsigned long value)
{
int  exponent;
double          d_val = (double) value;
(void)frexp(d_val, &exponent);
return exponent - 1;
}

/*
* This is the rather less portable version, which
* requires you to know how doubles are stored
* in memory, and the location and bias of the
* exponent for such numbers.
* This example is for doubles ( 64 bits )
* on Intel hardware - YMMV.
*/
/* CHANGED FROM DOUBLE TO FLOAT --
#define MP(x)  ((long unsigned long *)&d_val)[x]
unsigned long tlog2( unsigned long value )
{
double d_val = (double)value ;
return (int)( MP(1) >> 20 & 0x7ff ) - 0x3ff ;
}

*/
#define MP(x)  ((unsigned long *)&d_val)[x]
unsigned long   e_log2(unsigned long value)
{
float           d_val = (float) value;
return (int) (MP(0) >> 23 & 0xff) - 0x7f;
}

/*
From:   Robert E. Minsk[SMTP:egbert@spimageworks.com]
Sent:   Saturday, April 12, 1997 9:44 PM
To:     Dann Corbit
Subject:        Is there a faster way to find the highest bit set in a 32
bit integer? [Includes C code listing]
I did not see the original post but why don't you build a table of the
highest bit set in a byte and then check each byte.  For example.
*/
static unsigned char hiBitSetTab[] = {
0, 1, 2, 2, 3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4,
5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5,
6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6,
7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7
};
/* On my machine unsigned unsigned long is 32-bits, big ended */
unsigned long   f_log2(unsigned long val)
{
unsigned long   tmp;
tmp = val >> 24;
if (tmp) {
return hiBitSetTab[tmp] + 23;
}
tmp = (val >> 16) & 0xff;
if (tmp) {
return hiBitSetTab[tmp] + 15;
}
tmp = (val >> 8) & 0xff;
if (tmp) {
return hiBitSetTab[tmp] + 7;
}
return hiBitSetTab[val & 0xff] - 1;
}

/* On my machine unsigned unsigned long is 32-bits, big ended */
unsigned long   g_log2(unsigned long val)
{
//    unsigned long tmp;
unsigned char  *foo = (unsigned char *) &val;
if (foo[3]) {
return hiBitSetTab[foo[3]] + 23;
}
if (foo[2]) {
return hiBitSetTab[foo[2]] + 15;
}
if (foo[1]) {
return hiBitSetTab[foo[1]] + 7;
}
return hiBitSetTab[foo[0]] - 1;
}

/*
or even:
*/
typedef union {
unsigned long   i;
unsigned char   c[4];
}               u32bitType;
/* On my machine unsigned unsigned long is 32-bits, big ended */
unsigned long   h_log2(long l)
{
u32bitType      val;
val.i = l;
if (val.c[3]) {
return hiBitSetTab[val.c[3]] + 23;
}
if (val.c[2]) {
return hiBitSetTab[val.c[2]] + 15;
}
if (val.c[1]) {
return hiBitSetTab[val.c[1]] + 7;
}
return hiBitSetTab[val.c[0]] - 1;
}

/* exhaustive binary search (you won't get much faster without a lookup
table) */
unsigned long   i_log2(unsigned long n)
{
return n & 65535 << 16 ? n & 255 << 24 ? n & 15 << 28 ? n & 12 << 28 ? n
& 8 << 28 ? 31 : 30 : n & 2 << 28 ? 29 : 28
: n & 12 << 24 ? n & 8 << 24 ? 27 : 26 : n & 2 << 24 ? 25 : 24 : n & 15
<< 20 ? n & 12 << 20 ? n & 8 << 20 ? 23 :
22 : n & 2 << 20 ? 21 : 20 : n & 12 << 16 ? n & 8 << 16 ? 19 : 18 : n &
2 << 16 ? 17 : 16 : n & 255 << 8 ? n & 15
<< 12 ? n & 12 << 12 ? n & 8 << 12 ? 15 : 14 : n & 2 << 12 ? 13 : 12 : n
& 12 << 8 ? n & 8 << 8 ? 11 : 10 : n & 2
<< 8 ? 9 : 8 : n & 240 ? n & 192 ? n & 128 ? 7 : 6 : n & 32 ? 5 : 4 : n
& 12 ? n & 8 ? 3 : 2 : n & 2 ? 1 : n & 1 ? 0 : -1;
}

/* bytewise binary search, lookup table */
unsigned long   j_log2(unsigned long n)
{
static const unsigned long *t = tab_8;
return n >> 16 ? n >> 24 ? 24 + t[n >> 24] : 16 + t[n >> 16] : n >> 8 ?
8 + t[n >> 8] : t[n];
}

/* frexp */
unsigned long   k_log2(unsigned long n)
{
int   h;
frexp((double) n, &h);
return h - 1;
}

/* magical :) but slow :( */
unsigned long   l_log2(unsigned long n)
{
static const unsigned long *t = tab_m37_ff;
return t[(n |= n >> 1, n |= n >> 2, n |= n >> 4, n |= n >> 8, n |= n >>
16) % 37];
}

/* bytewise linear search, with a small comparison-avoiding twist */
unsigned long   m_log2(unsigned long n)
{
static const unsigned long *t = tab_8;
return n >= 1 << 23 ? 24 + t[n >> 24] : n >= 1 << 15 ? 16 + t[n >> 16] :
n >= 1 << 8 ? 8 + t[n >> 8] : t[n];
}

/* bytewise binary search, with the same twist, even though there's no
reason it
should work here (always two comparisons anyway) */
unsigned long   n_log2(unsigned long n)
{
static const unsigned long *t = tab_8;
return n >= 1 << 15 ? n >= 1 << 23 ? 24 + t[n >> 24] : 16 + t[n >> 16] :
n >= 1 << 8 ? 8 + t[n >> 8] : t[n];
}

/* log() - found to be non-portable due to vague log() accuracy */
unsigned long   o_log2(unsigned long n)
{
return n ? (int) floor(log((double) n) / log(2.)) : -1;
}

/* IEEE trick */
unsigned long   p_log2(unsigned long n)
{
if (n) {
float           f = n;
return ((*(unsigned long *) &f) >> 23) - 127;
} else {
return -1;
}
}

#ifdef TEST
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int   main(void)
{
unsigned long   l;
unsigned long   m;
unsigned long   limit = 100000000;
for (l = 0; l < limit; l++) {
m = l * rand();
m = a_log2(m), m = b_log2(m), m = c_log2(m), m = /*d_log2(m), m =*/
e_log2(m), m = f_log2(m),
m = g_log2(m), m = h_log2(m), m = i_log2(m), m = j_log2(m), m =
/*k_log2(m), m =*/ l_log2(m),
m = m_log2(m), m = n_log2(m),/* m = o_log2(m), */ m = p_log2(m);
}
for (m = 0; m < limit; m += 10000000) {
printf(
"m = %lu, a_log2 = %lu, b_log2 = %lu, c_log2 = %lu, d_log2 =
%lu, e_log2 = %lu, f_log2 = %lu, g_log2 = %lu, h_log2 = %lu, i_log2 = %lu,
j_log2 = %lu, k_log2 = %lu, l_log2 = %lu, _log2 = %lu, n_log2 = %lu, o_log2
= %lu, p_log2 = %lu",
m, a_log2(m), b_log2(m), c_log2(m), d_log2(m), e_log2(m),
f_log2(m), g_log2(m), h_log2(m), i_log2(m), j_log2(m), k_log2(m), l_log2(m),
m_log2(m), n_log2(m), o_log2(m), p_log2(m));
printf("floating point approximation is %f\n", log((double) m) /
log(2.));
}
return 0;
}

#endif

```
 0

23 Replies
159 Views

Similiar Articles:

7/10/2012 1:41:59 PM