pow/log functions in fixed point

  • Follow


Hi,

I have successfully converted add, sub, mul and div operations to fixed pt
and see ~10 times improvement.

I'm not sure how to go about doing pow and log functions...can some one
please suggest an algo or pt me to an open source code for reference.

thanks,
Marty


0
Reply dsp 7/20/2010 11:15:44 AM

dsp.study <learner.study@n_o_s_p_a_m.gmail.com> wrote:
 
> I have successfully converted add, sub, mul and div operations to fixed pt
> and see ~10 times improvement.
 
> I'm not sure how to go about doing pow and log functions...can some one
> please suggest an algo or pt me to an open source code for reference.

Fixed point exp() and log() are in Knuth's "Metafont: The Program."

(In Pascal, if that isn't too bad.)

-- glen
0
Reply gah (12303) 7/20/2010 1:16:34 PM


On Tue, 20 Jul 2010 06:15:44 -0500, "dsp.study"
<learner.study@n_o_s_p_a_m.gmail.com> wrote:

>I'm not sure how to go about doing pow and log functions...can some one
>please suggest an algo or pt me to an open source code for reference.

http://groups.google.com/group/comp.dsp/browse_thread/thread/ce4f56d93425ee17
0
Reply gberchin6695 (81) 7/20/2010 2:22:00 PM

On Jul 20, 7:15 am, "dsp.study" <learner.study@n_o_s_p_a_m.gmail.com>
wrote:
>
> I have successfully converted add, sub, mul and div operations to fixed pt
> and see ~10 times improvement.

on what platform did you do this?  the only kind of platform i can
think of that would have this kind of speed improvement is a chip
without native floating-point capability.

and what did you convert the add and sub operations from?

i'm curious.

> I'm not sure how to go about doing pow and log functions...can some one
> please suggest an algo or pt me to an open source code for reference.

do you need bit-wise accuracy?  or can you trade off some error
performance for speed performance?  for DSP chips, i've always been a
proponent of simple polynomials with optimized coefficients for a
single octave and then to use shift operations to get the other
octaves.

r b-j
0
Reply robert 7/20/2010 6:23:29 PM

On 07/20/2010 04:15 AM, dsp.study wrote:
> Hi,
>
> I have successfully converted add, sub, mul and div operations to fixed pt
> and see ~10 times improvement.

In what?  Speed? Code size?  Weight?  Beauty?  Management approval?  Wages?

> I'm not sure how to go about doing pow and log functions...can some one
> please suggest an algo or pt me to an open source code for reference.

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Do you need to implement control loops in software?
"Applied Control Theory for Embedded Systems" was written for you.
See details at http://www.wescottdesign.com/actfes/actfes.html
0
Reply tim177 (4427) 7/20/2010 6:39:14 PM

This is on a mips processor which doesn't have native floating pt support.
I have converted from floating pt to fixed pt (8.24) i.e. 8 bits for int
part and 24 for fraction.

I'd looking at log10 - any pointers would be highly appreciated.

thanks,
Marty

>On Jul 20, 7:15 am, "dsp.study" <learner.study@n_o_s_p_a_m.gmail.com>
>wrote:
>>
>> I have successfully converted add, sub, mul and div operations to fixed
pt
>> and see ~10 times improvement.
>
>on what platform did you do this?  the only kind of platform i can
>think of that would have this kind of speed improvement is a chip
>without native floating-point capability.
>
>and what did you convert the add and sub operations from?
>
>i'm curious.
>
>> I'm not sure how to go about doing pow and log functions...can some one
>> please suggest an algo or pt me to an open source code for reference.
>
>do you need bit-wise accuracy?  or can you trade off some error
>performance for speed performance?  for DSP chips, i've always been a
>proponent of simple polynomials with optimized coefficients for a
>single octave and then to use shift operations to get the other
>octaves.
>
>r b-j
>
0
Reply learner.study (2) 7/21/2010 7:39:27 AM

I tried to use the sample code for BinLog() and notice that its taking 1.5x
more cycles as compared to std libc log2(). 

I'm using MIPS cpu.

thanks!

>On Tue, 20 Jul 2010 06:15:44 -0500, "dsp.study"
><learner.study@n_o_s_p_a_m.gmail.com> wrote:
>
>>I'm not sure how to go about doing pow and log functions...can some one
>>please suggest an algo or pt me to an open source code for reference.
>
>http://groups.google.com/group/comp.dsp/browse_thread/thread/ce4f56d93425ee17
>
0
Reply learner.study (2) 7/21/2010 8:17:47 AM

On Wed, 21 Jul 2010 02:39:27 -0500, dsp.study wrote:

> I'd looking at log10 - any pointers would be highly appreciated.

log10 is only a constant factor different from log2, and the latter is 
where you want to be starting from.

You can get the integer part from whatever the processor-native version 
of count-leading-zeros is.  That leaves you with a value between 0.5 and 
1.0 (in whatever fixed point form you're using).  How you proceed from 
there depends on the accuracy you require.  You can do iterations of the 
taylor series, polynomial approximations, or table-lookup and further 
refinements.  Depends a lot on what you want to do with your logs 
afterwards...

Cheers,

-- 
Andrew
0
Reply Andrew 7/21/2010 12:07:35 PM

dsp.study <learner.study@n_o_s_p_a_m.gmail.com> wrote:

>This is on a mips processor which doesn't have native floating pt support.
>I have converted from floating pt to fixed pt (8.24) i.e. 8 bits for int
>part and 24 for fraction.
>
>I'd looking at log10 - any pointers would be highly appreciated.

I'd recommend implementing log2 in fixed point, and recoding
anything to use it instead of log10.

Since you asked....


S.
0
Reply spope33 (691) 7/22/2010 1:00:34 AM

On Jul 20, 4:15=A0am, "dsp.study" <learner.study@n_o_s_p_a_m.gmail.com>
wrote:
> Hi,
>
> I have successfully converted add, sub, mul and div operations to fixed p=
t
> and see ~10 times improvement.
>
> I'm not sure how to go about doing pow and log functions...can some one
> please suggest an algo or pt me to an open source code for reference.
>
> thanks,
> Marty

Did you try the obvious and search?   I always go to TI's website.
http://focus.ti.com/lit/an/spra619/spra619.pdf
http://www.dattalo.com/technical/theory/logs.html#2.%20Borchardt's%20Algori=
thm
You still need to find sqrt() though.
http://math.fullerton.edu/mathews/n2003/pade/PadeApproximationMod/Links/Pad=
eApproximationMod_lnk_8.html

Peter Nachtwey
0
Reply pnachtwey (92) 7/22/2010 2:03:55 AM

On 7/21/2010 9:00 PM, Steve Pope wrote:
> dsp.study<learner.study@n_o_s_p_a_m.gmail.com>  wrote:
>
>> This is on a mips processor which doesn't have native floating pt support.
>> I have converted from floating pt to fixed pt (8.24) i.e. 8 bits for int
>> part and 24 for fraction.
>>
>> I'd looking at log10 - any pointers would be highly appreciated.
>
> I'd recommend implementing log2 in fixed point, and recoding
> anything to use it instead of log10.

Even without recoding, they differ only by a simple multiplicative 
constant. Does .30103 seem familiar?

Jerry
-- 
Engineering is the art of making what you want from things you can get.
�����������������������������������������������������������������������
0
Reply jya (12870) 7/22/2010 3:32:27 AM

Jerry Avins  <jya@ieee.org> wrote:

>On 7/21/2010 9:00 PM, Steve Pope wrote:

>> I'd recommend implementing log2 in fixed point, and recoding
>> anything to use it instead of log10.

>Even without recoding, they differ only by a simple multiplicative 
>constant. Does .30103 seem familiar?

Sure, one can put in this factor every time one had used a log10(),
but you will have less round-off error accumulating if you don't do this.

Whether it's worth redoing your code depends on how sensitive
it is to such things.

Steve
0
Reply spope33 7/22/2010 3:44:19 AM

"Jerry Avins" <jya@ieee.org> wrote in message news:fZO1o.21428$lS1.13758@newsfe12.iad...
> On 7/21/2010 9:00 PM, Steve Pope wrote:
>> dsp.study<learner.study@n_o_s_p_a_m.gmail.com>  wrote:
>>
>>> This is on a mips processor which doesn't have native floating pt support.
>>> I have converted from floating pt to fixed pt (8.24) i.e. 8 bits for int
>>> part and 24 for fraction.
>>>
>>> I'd looking at log10 - any pointers would be highly appreciated.
>>
>> I'd recommend implementing log2 in fixed point, and recoding
>> anything to use it instead of log10.
>
> Even without recoding, they differ only by a simple multiplicative constant. Does .30103 seem familiar?
>
> Jerry

If you can accomodate a reasonably hefty table and your CPU has a fast
integer multiply (with the ability to get either a double-wide product
or with a MULHI instruction available), quadratic interpolation in a
table of function values is hard to beat in my experience.

Below I demonstrate how this could be done for log2() in 8.24 fixed
point. Our input x is of the form 2^e*m, 1 <= m < 2, m = 1 + f. Using
a "count leading zeros" operation, we quickly establish e and f. Given
f, 0 <= f < 1, we compute the index i of the closest, smaller, table
entry in a table of values of log2(x) on [1,2). The difference between
the actual argument and the function argument corresponding to i shall
be d. Using the function value from the table, f0, as well as the next
two table entries, f1 and f2, we compute the coefficients a, b of a
parabola that fits through all three function values. The value of
log2(m) is then computed as f0 + ((a * d) + b) * d, which is finally
added to the integer part of the result, i.e. log2(2^e) = e, computed
earlier.

The same technique can be used for exp2(), and pow(x,y) can then be
constructed from those two in the straightforward manner. I am aware
that this leads to loss of accuracy if x is near 1 and y is large, but
in many situations this is good enough.

Note that the log implementation below is not fully accurate to the
8.24 precision, but it gets close.

-- Norbert


#include <stdlib.h>
#include <stdio.h>
#include <math.h>

/* This is a single machine instruction on many platforms */
__inline int clz (unsigned int x)
{
  /* Use Robert Harley's table-based algorithm */
  static const unsigned char clz_tab[64] = {
    32, 31, 31, 16, 31, 30,  3, 31, 15, 31, 31, 31, 29, 10,  2, 31,
    31, 31, 12, 14, 21, 31, 19, 31, 31, 28, 31, 25, 31,  9,  1, 31,
    17, 31,  4, 31, 31, 31, 11, 31, 13, 22, 20, 31, 26, 31, 31, 18,
     5, 31, 31, 23, 31, 27, 31,  6, 31, 24,  7, 31,  8, 31,  0, 31
  };
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  x *= 7U*255U*255U*255U;
  return (int)clz_tab[x >> 26];
}

/* One or two machine instructions on most platforms */
__inline long long int mul_wide (int x, int y)
{
  return ((long long int)x) * y;
}

/* compute log2(x) in 8.24 fixed-point format, x != 0 */
unsigned log2_8_24 (unsigned x)
{
  /* log2(x) on interval [1, 2+2/128] in 1.31 format */
  static const unsigned int log_tab[130] = {
    0x00000000, 0x016fe50b, 0x02dcf2d1, 0x04473475,
    0x05aeb4dd, 0x07137eae, 0x08759c50, 0x09d517ef,
    0x0b31fb7d, 0x0c8c50b7, 0x0de42120, 0x0f397609,
    0x108c588d, 0x11dcd197, 0x132ae9e2, 0x1476a9fa,
    0x15c01a3a, 0x170742d5, 0x184c2bd0, 0x198edd07,
    0x1acf5e2e, 0x1c0db6ce, 0x1d49ee4c, 0x1e840be7,
    0x1fbc16b9, 0x20f215b7, 0x22260fb6, 0x23580b65,
    0x24880f56, 0x25b621f9, 0x26e2499d, 0x280c8c76,
    0x2934f098, 0x2a5b7bf9, 0x2b803474, 0x2ca31fc9,
    0x2dc4439b, 0x2ee3a575, 0x30014ac6, 0x311d38e6,
    0x32377512, 0x33500472, 0x3466ec15, 0x357c30f3,
    0x368fd7ee, 0x37a1e5d4, 0x38b25f5a, 0x39c14924,
    0x3acea7c0, 0x3bda7fa9, 0x3ce4d544, 0x3dedace6,
    0x3ef50ad2, 0x3ffaf335, 0x40ff6a2e, 0x420273ca,
    0x43041403, 0x44044ec5, 0x450327eb, 0x4600a33e,
    0x46fcc47a, 0x47f78f4c, 0x48f10751, 0x49e93016,
    0x4ae00d1d, 0x4bd5a1d8, 0x4cc9f1ab, 0x4dbcffee,
    0x4eaecfeb, 0x4f9f64de, 0x508ec1fa, 0x517cea63,
    0x5269e12f, 0x5355a96d, 0x5440461c, 0x5529ba33,
    0x5612089a, 0x56f93433, 0x57df3fd0, 0x58c42e3d,
    0x59a80239, 0x5a8abe79, 0x5b6c65aa, 0x5c4cfa6c,
    0x5d2c7f59, 0x5e0af6ff, 0x5ee863e5, 0x5fc4c886,
    0x60a02757, 0x617a82c3, 0x6253dd2c, 0x632c38ed,
    0x64039858, 0x64d9fdb7, 0x65af6b4b, 0x6683e34f,
    0x675767f5, 0x6829fb69, 0x68fb9fce, 0x69cc5741,
    0x6a9c23d6, 0x6b6b079c, 0x6c39049b, 0x6d061cd3,
    0x6dd2523d, 0x6e9da6ce, 0x6f681c73, 0x7031b512,
    0x70fa728c, 0x71c256ba, 0x72896373, 0x734f9a83,
    0x7414fdb5, 0x74d98eca, 0x759d4f81, 0x76604191,
    0x772266ad, 0x77e3c082, 0x78a450b8, 0x796418f2,
    0x7a231ace, 0x7ae157e3, 0x7b9ed1c7, 0x7c5b8a07,
    0x7d17822f, 0x7dd2bbc4, 0x7e8d3846, 0x7f46f932,
    0x80000000, 0x80b84e23
  };

  int a, b, c, d, e, f, i, t, r, f0, f1, f2;

  c = clz(x);
  e = 7 - c;
  f = x << (c + 1);
  i = (unsigned int)f >> 25;
  d = f - (i << 25);
  f0 = log_tab[i];
  f1 = log_tab[i+1];
  f2 = log_tab[i+2];
  t = (f1 - f0) << 1;
  a = (f2 - f0) - t;
  b = t - a;
  t = (int)(mul_wide(a, d) >> 25) + b;
  t = (int)(mul_wide(t, d) >> 26);
  r = f0 + t;
  return (e << 24) + (((unsigned int)r + 64) >> 7);
}

int main (void)
{
  unsigned int arg, ref, res, err;
  unsigned int cnt[4] = {0, 0, 0, 0};

  for (arg = 1; arg != 0; arg++) {
    ref = (int)(2.4204406323122971e+007 * log (arg / 16777216.0) + 0.5);
    res = log2_8_24 (arg);
    err = abs (res - ref);
    if (err > 3) {
      printf ("arg=%08x  res=%08x  ref=%08x\n", arg, res, ref);
    } else {
      cnt[err]++;
    }
    if (!(arg & 0x07ffffff)) {
      printf ("arg=%08x\n", arg);
    }
  }
  printf ("errcnt 0: %u   1: %u   2: %u   3: %u\n",
          cnt[0], cnt[1], cnt[2], cnt[3]);
  return EXIT_SUCCESS;
}




0
Reply juffa (19) 7/23/2010 7:48:32 AM

On Tue, 20 Jul 2010 06:15:44 -0500, "dsp.study"
<learner.study@n_o_s_p_a_m.gmail.com> wrote:

>Hi,
>
>I have successfully converted add, sub, mul and div operations to fixed pt
>and see ~10 times improvement.
>
>I'm not sure how to go about doing pow and log functions...can some one
>please suggest an algo or pt me to an open source code for reference.
>
>thanks,
>Marty

Hi Marty,
  Clay S. Turner has written a neat article, titled 
"A Fast Binary Logarithm Algorithm".  That article, most 
appropriate for fixed-point processors, will appear in 
the Sept. 2010 "DSP Tips & Tricks" column of the IEEE 
Signal Processing magazine.

Good Luck,
[-Rick-]
0
Reply Rick 7/23/2010 9:23:43 AM

Norbert Juffa <juffa@earthlink.net> wrote:

>Below I demonstrate how this could be done for log2() in 8.24 fixed
>point. Our input x is of the form 2^e*m, 1 <= m < 2, m = 1 + f. Using
>a "count leading zeros" operation, we quickly establish e and f. Given
>f, 0 <= f < 1, we compute the index i of the closest, smaller, table
>entry in a table of values of log2(x) on [1,2). The difference between
>the actual argument and the function argument corresponding to i shall
>be d. Using the function value from the table, f0, as well as the next
>two table entries, f1 and f2, we compute the coefficients a, b of a
>parabola that fits through all three function values. 

The above is a valuable technique, but times have changed and in many
DSP scenarios these days one can just plug the mantissa into 
a (larger) LUT and be done with it.  This applies to log(), square(), 
and similar functions.

Steve
0
Reply spope33 7/23/2010 4:30:01 PM

"Steve Pope" <spope33@speedymail.org> wrote in message news:i2cg29$hq6$1@blue.rahul.net...
> Norbert Juffa <juffa@earthlink.net> wrote:
>
>>Below I demonstrate how this could be done for log2() in 8.24 fixed
>>point. Our input x is of the form 2^e*m, 1 <= m < 2, m = 1 + f. Using
>>a "count leading zeros" operation, we quickly establish e and f. Given
>>f, 0 <= f < 1, we compute the index i of the closest, smaller, table
>>entry in a table of values of log2(x) on [1,2). The difference between
>>the actual argument and the function argument corresponding to i shall
>>be d. Using the function value from the table, f0, as well as the next
>>two table entries, f1 and f2, we compute the coefficients a, b of a
>>parabola that fits through all three function values.
>
> The above is a valuable technique, but times have changed and in many
> DSP scenarios these days one can just plug the mantissa into
> a (larger) LUT and be done with it.  This applies to log(), square(),
> and similar functions.
>
> Steve

For a table lookup method to work efficiently on most embedded processors
requires a high likelihood that the table elements are present in L1 cache.
Often there will be tables for division, sqrt(), exp(), log(), all of whom
would want to fit into L1 at the same time, leaving room for other data as
well. The L1 may be as small as 8KB or even 4KB, so keeping lookup tables
small seems very much desirable.

To achieve accuracy to full 8.24 precision one needs a table of 258*4 bytes
(~1KB) each for both exp() and log() with quadratic interpolation. Table
storage will increase significantly if linear interpolation is used, even
if an economical scheme such as bipartite tables is used. Even for hardware
implementations the use of quadratic interpolation may well be preferred to
keep ROM sizes reasonable.

I would be interested in learning in which scenarios requiring relative high
accuracy (such as here) quadratic interpolation is dispensable, and linear
interpolation or straight lookup are feasible.

-- Norbert


0
Reply Norbert 7/23/2010 6:42:53 PM

On Jul 23, 2:42=A0pm, "Norbert Juffa" <ju...@earthlink.net> wrote:
> "Steve Pope" <spop...@speedymail.org> wrote in messagenews:i2cg29$hq6$1@b=
lue.rahul.net...
> > Norbert Juffa <ju...@earthlink.net> wrote:
>
> >>Below I demonstrate how this could be done for log2() in 8.24 fixed
> >>point. Our input x is of the form 2^e*m, 1 <=3D m < 2, m =3D 1 + f. Usi=
ng
> >>a "count leading zeros" operation, we quickly establish e and f. Given
> >>f, 0 <=3D f < 1, we compute the index i of the closest, smaller, table
> >>entry in a table of values of log2(x) on [1,2). The difference between
> >>the actual argument and the function argument corresponding to i shall
> >>be d. Using the function value from the table, f0, as well as the next
> >>two table entries, f1 and f2, we compute the coefficients a, b of a
> >>parabola that fits through all three function values.
>
> > The above is a valuable technique, but times have changed and in many
> > DSP scenarios these days one can just plug the mantissa into
> > a (larger) LUT and be done with it. =A0This applies to log(), square(),
> > and similar functions.
>
> > Steve
>
> For a table lookup method to work efficiently on most embedded processors
> requires a high likelihood that the table elements are present in L1 cach=
e.
> Often there will be tables for division, sqrt(), exp(), log(), all of who=
m
> would want to fit into L1 at the same time, leaving room for other data a=
s
> well. The L1 may be as small as 8KB or even 4KB, so keeping lookup tables
> small seems very much desirable.
>
> To achieve accuracy to full 8.24 precision one needs a table of 258*4 byt=
es
> (~1KB) each for both exp() and log() with quadratic interpolation. Table
> storage will increase significantly if linear interpolation is used, even
> if an economical scheme such as bipartite tables is used. Even for hardwa=
re
> implementations the use of quadratic interpolation may well be preferred =
to
> keep ROM sizes reasonable.
>
> I would be interested in learning in which scenarios requiring relative h=
igh
> accuracy (such as here) quadratic interpolation is dispensable, and linea=
r
> interpolation or straight lookup are feasible.
>
> -- Norbert- Hide quoted text -
>
> - Show quoted text -

Years ago I made a light meter that showed relative intensity in terms
of F stops and density. Both of these are logarithmic quantities and I
used a Mot 68HC05 micro and I went for the accuracy since the
computation time being a microsecond or 10mSec just didn't matter! The
idea was to force all of the error into the sensor and not have the
processor's math add to the final error. Simplicity and tiny code was
the desirable thing. My 4 arithematic floating point routines took
only 512 bytes! When I added the input, output and log functions, I
was still under 1k bytes!

Clay
0
Reply clay (738) 7/23/2010 7:19:17 PM

On Jul 23, 2:42=A0pm, "Norbert Juffa" <ju...@earthlink.net> wrote:
> "Steve Pope" <spop...@speedymail.org> wrote in messagenews:i2cg29$hq6$1@b=
lue.rahul.net...
> > Norbert Juffa <ju...@earthlink.net> wrote:
>
> >>Below I demonstrate how this could be done for log2() in 8.24 fixed
> >>point. Our input x is of the form 2^e*m, 1 <=3D m < 2, m =3D 1 + f. Usi=
ng
> >>a "count leading zeros" operation, we quickly establish e and f. Given
> >>f, 0 <=3D f < 1, we compute the index i of the closest, smaller, table
> >>entry in a table of values of log2(x) on [1,2). The difference between
> >>the actual argument and the function argument corresponding to i shall
> >>be d. Using the function value from the table, f0, as well as the next
> >>two table entries, f1 and f2, we compute the coefficients a, b of a
> >>parabola that fits through all three function values.
>
> > The above is a valuable technique, but times have changed and in many
> > DSP scenarios these days one can just plug the mantissa into
> > a (larger) LUT and be done with it. =A0This applies to log(), square(),
> > and similar functions.
>
....
>
> For a table lookup method to work efficiently on most embedded processors
> requires a high likelihood that the table elements are present in L1 cach=
e.
> Often there will be tables for division, sqrt(), exp(), log(), all of who=
m
> would want to fit into L1 at the same time, leaving room for other data a=
s
> well. The L1 may be as small as 8KB or even 4KB, so keeping lookup tables
> small seems very much desirable.
>
> To achieve accuracy to full 8.24 precision one needs a table of 258*4 byt=
es
> (~1KB) each for both exp() and log() with quadratic interpolation. Table
> storage will increase significantly if linear interpolation is used, even
> if an economical scheme such as bipartite tables is used.

i had to look up "bipartite table" to get an idea about what you're
talking about.  would this paper:

 http://citeseerx.ist.psu.edu/viewdoc/download?doi=3D10.1.1.36.6368&rep=3Dr=
ep1&type=3Dpdf

be a good initial description of the technique?  it's a little bit
intriguing, it's based on the notion that the first derivative of a
function changes much less in a segment of its domain than the "zeroth
derivative" of the function.

but i don't see quadratic interpolation in this.  how is quadratic
interpolation done?  do you continue to divide the input word into
three parts, use two tables, and have a quadratic function rather than
a linear one (referring to Eq. 6 in the paper i cited above) for the
least significant portion of the sum?

> Even for hardware implementations the use of quadratic
> interpolation may well be preferred to keep ROM sizes reasonable.

if it's just the two tables, why stop at quadratic?

> I would be interested in learning in which scenarios requiring relative h=
igh
> accuracy (such as here) quadratic interpolation is dispensable, and linea=
r
> interpolation or straight lookup are feasible.

i need more context to understand what you are seeking to answer.  if
it's still just two parallel table lookups (with sign-extension and
addition following), there is no reason that the "a1" coefficient
lookup need be a linear function of "x2" as is depicted in Eq. 6 of
that paper.  in fact, it need not be quadratic.  furthermore, you can
toss all assumptions to the wind and make some kinda MATLAB program to
optimally determine *both* of the a1 and a0 tables in such a way to
minimize error.

BTW, for all of the fiddling around you need to do to implement this
bipartite table lookup, possible sign extension, and addition in
software, you might be able to perform a decent polynomial
approximation for as few instruction cycles.  the only reason i see
LUT as superior to a, say, 4th-order or 6th-order polynomial
approximation is in sheer speed when that is important and there is
plenty of memory to burn.

r b-j

0
Reply rbj (3940) 7/24/2010 4:41:28 AM

I do not doubt the validity of the algorithms (for interpolating
down to a point where it reduces your table lookup size).

What I doubt is the significance, these days, of the vanishing
grey area between what you'd want to just do in IEEE double precision
floating point, and what you'd want to design to a target bit-width
in hardware.  The idea of an in-between, general-purpose format
is what I'm not sure is still there anymore.

Steve
0
Reply spope33 7/24/2010 5:46:14 AM

On Jul 24, 1:46=A0am, spop...@speedymail.org (Steve Pope) wrote:
> I do not doubt the validity of the algorithms (for interpolating
> down to a point where it reduces your table lookup size).
>
> What I doubt is the significance, these days, of the vanishing
> grey area between what you'd want to just do in IEEE double precision
> floating point, and what you'd want to design to a target bit-width
> in hardware. =A0The idea of an in-between, general-purpose format
> is what I'm not sure is still there anymore.

dunno what you mean by an "in-between, general-purpose format" but,
Steve, even if you have IEEE doubles, but no std math lib, what would
you do to get exp() and log()?  how would you implement the stdlib
exp() and log() or sqrt() or sin() or tan() or atan(), if you had to?

isn't that of significance to some degree?  or is the source code (say
for the gnu std math lib) published somewhere?  if so, i would like to
see what order of polynomial they use and the coefficients.

r b-j
0
Reply rbj (3940) 7/24/2010 6:55:04 AM

"robert bristow-johnson" <rbj@audioimagination.com> wrote in message 
news:21b0a3de-4c48-4364-90d1-540dbba8bc8c@q12g2000yqj.googlegroups.com...
> On Jul 23, 2:42 pm, "Norbert Juffa" <ju...@earthlink.net> wrote:
> > "Steve Pope" <spop...@speedymail.org> wrote in messagenews:i2cg29$hq6$1@blue.rahul.net...
> > > Norbert Juffa <ju...@earthlink.net> wrote:
> >
> > >>Below I demonstrate how this could be done for log2() in 8.24 fixed
> > >>point. Our input x is of the form 2^e*m, 1 <= m < 2, m = 1 + f. Using
> > >>a "count leading zeros" operation, we quickly establish e and f. Given
> > >>f, 0 <= f < 1, we compute the index i of the closest, smaller, table
> > >>entry in a table of values of log2(x) on [1,2). The difference between
> > >>the actual argument and the function argument corresponding to i shall
> > >>be d. Using the function value from the table, f0, as well as the next
> > >>two table entries, f1 and f2, we compute the coefficients a, b of a
> > >>parabola that fits through all three function values.
> >
> > > The above is a valuable technique, but times have changed and in many
> > > DSP scenarios these days one can just plug the mantissa into
> > > a (larger) LUT and be done with it. This applies to log(), square(),
> > > and similar functions.
> >
> ...
> >
> > For a table lookup method to work efficiently on most embedded processors
> > requires a high likelihood that the table elements are present in L1 cache.
> > Often there will be tables for division, sqrt(), exp(), log(), all of whom
> > would want to fit into L1 at the same time, leaving room for other data as
> > well. The L1 may be as small as 8KB or even 4KB, so keeping lookup tables
> > small seems very much desirable.
> >
> > To achieve accuracy to full 8.24 precision one needs a table of 258*4 bytes
> > (~1KB) each for both exp() and log() with quadratic interpolation. Table
> > storage will increase significantly if linear interpolation is used, even
> > if an economical scheme such as bipartite tables is used.
>
> i had to look up "bipartite table" to get an idea about what you're
> talking about.  would this paper:
>
>  http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.36.6368&rep=rep1&type=pdf
>
> be a good initial description of the technique?  it's a little bit
> intriguing, it's based on the notion that the first derivative of a
> function changes much less in a segment of its domain than the "zeroth
> derivative" of the function.
>
> but i don't see quadratic interpolation in this.  how is quadratic
> interpolation done?  do you continue to divide the input word into
> three parts, use two tables, and have a quadratic function rather than
> a linear one (referring to Eq. 6 in the paper i cited above) for the
> least significant portion of the sum?


That paper looks like a reasonable starting point if you want to get acquainted
with bipartite tables. I first learned about them through papers by David Matula,
whose work is cited by the paper you dug up.

Bipartite tables are basically a more economical form of linear interpolation,
that allow for smaller storage and for rapid evaluation (two table lookups and
an addition). For hardware implementations, the two table lookup can happen in
parallel and only an adder is needed (multipliers being relatively costly). To
do linear interpolation without a multiplier, one keeps two tables. A base table
that tabulates function values at fairly wide intervals, and a table of offsets
relative to those base entries, sampled on narrower intervals in between.

Obviously the deltas stored in the second table are small and require fewer bits
of storage. Further compression is achieved by averaging the slopes of multiple
intervals of the first table, and thus using one section of the second table in
combination with multiple sections of the first table.

Bipartite tables are mostly a popular scheme for hardware implementations. For
example, we used it at AMD implement the reciprocal and reciprocal square root
HW approximations for the 3DNow! technology. With reasonable amounts of ROM
storage, 16-17 bit accuracy could be achieved. One could use bipartite tables
in software, but the storage savings are more limited since one can adjust the
storage per element at best in increments of one byte. For example one could
use ints for the base table and chars or shorts for the offset table.

For 24-bit accuracy, the ROM size required for bipartite tables becomes somewhat
prohibitive, which is why modern GPUs use quadratic interpolation to generate
several functions accurate to apprximately single precision, such as reciprocal,
reciprocal square root, exp2(), log2(), sin(), and cos(). See:

 Stuart Oberman and Michael Siu. "A high-performance area-efficient multifunction
 interpolator". In Proceedings of the 17th IEEE Symposium on Computer Arithmetic,
 pages 272-279, July 2005.


> > I would be interested in learning in which scenarios requiring relative high
> > accuracy (such as here) quadratic interpolation is dispensable, and linear
> > interpolation or straight lookup are feasible.
>
> i need more context to understand what you are seeking to answer.  if [...]

I was mostly hoping to hear about use cases where quadratic interpolation is
no longer necessary since Steve stated that

   "in many DSP scenarios these days one can just plug the mantissa into
    a (larger) LUT and be done with it."

Obviously that's true if the accuracy requirements are fairly low, but for
scenarios like the one posed by the original poster (8.24 precision), I have
a difficult time imagining how one could arrive at an efficient solution with
straight table lookup, or even linear interpolation. But I have been out of
CPU design work for 10+ years, and out of embedded SW work for 5+ year so
the crossover point for switching from straight LUT to linear interpolation
and from linear to quadratic interpolation could have shifted considerably.

-- Norbert


0
Reply juffa (19) 7/24/2010 8:19:18 AM

robert bristow-johnson  <rbj@audioimagination.com> wrote:

>On Jul 24, 1:46�am, spop...@speedymail.org (Steve Pope) wrote:

>Steve, even if you have IEEE doubles, but no std math lib, what would
>you do to get exp() and log()?  how would you implement the stdlib
>exp() and log() or sqrt() or sin() or tan() or atan(), if you had to?

>isn't that of significance to some degree?  

I can see where it would be significant, but I haven't had to do
this.  This undoubtedly is a function of the exact mix of projects
I happen to have worked on but, no, I typically am using either a
C/C++ math library or I am working with target bit widths (or
in a few cases, target short floating-point formats) which vary
from point to point in a system, and aren't amenable to building
a library around.

>or is the source code (say
>for the gnu std math lib) published somewhere?  if so, i would like to
>see what order of polynomial they use and the coefficients.
>
>r b-j

Steve
0
Reply spope33 7/24/2010 5:45:18 PM

> "robert bristow-johnson" <rbj@audioimagination.com> wrote in message 
> news:03c06b7f-7ce4-4925-b1b6-fe8b01e91268@c10g2000yqi.googlegroups.com...
> On Jul 24, 1:46 am, spop...@speedymail.org (Steve Pope) wrote:
> > I do not doubt the validity of the algorithms (for interpolating
> > down to a point where it reduces your table lookup size).
> >
> > What I doubt is the significance, these days, of the vanishing
> > grey area between what you'd want to just do in IEEE double precision
> > floating point, and what you'd want to design to a target bit-width
> > in hardware. The idea of an in-between, general-purpose format
> > is what I'm not sure is still there anymore.
>
> dunno what you mean by an "in-between, general-purpose format" but,
> Steve, even if you have IEEE doubles, but no std math lib, what would
> you do to get exp() and log()?  how would you implement the stdlib
> exp() and log() or sqrt() or sin() or tan() or atan(), if you had to?
>
> isn't that of significance to some degree?  or is the source code (say
> for the gnu std math lib) published somewhere?  if so, i would like to
> see what order of polynomial they use and the coefficients.
>
> r b-j

Best I know, the glibc source distribution includes math library sources:

http://ftp.gnu.org/gnu/glibc/

Source code for two well-regarded math libraries is available via netlib:

http://netlib.org/fdlibm/
http://netlib.org/cephes/

There are probably many more math libraries available for perusal on the
internet, possibly including the open source versions of various vendors'
math libraries.

If you want to "roll your own", there are a number of books that provide
a reasonable starting point. A fairly recent one is

Jean-Michel Muller, "Elementary Functions: Algorithms and Implementation,
2nd ed." (Birkh�user, 2005)


-- Norbert 


0
Reply juffa (19) 7/24/2010 7:24:23 PM

22 Replies
1149 Views

(page loaded in 0.218 seconds)

Similiar Articles:


















7/20/2012 2:14:05 PM


Reply: