Suitable Integer Square Root Algorithm for 32-64-Bit Integers on Inexpensive Microcontroller?

  • Follow


Hi,

I have the need in a microcontroller application to calculate floor(sqrt(x)) 
where x is approximately in the range of 2^32 - 2^64.

What are the algorithms I should look at?

These small microcontrollers are characterized by having a 8-bit times 8-bit 
to give 16-bit result integer unsigned multiply instruction, and a similar 
16/8 division instruction to give a quotient and a remainder. 
Mulitplications of larger integers have to be done by multiplying the bytes 
and adding.

I'm aware of these algorithms:

a)Adding consecutive odd integers and counting how many you have to add, 
i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.

b)Trial squaring, testing setting each bit to "1", i.e.

http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic/modxx/atu8sqrtfrxx/src/atu8sqrtfrxx.c?revision=1.3&view=markup

c)Another method that seems to work (don't know the name of it), here is 
source code:

http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic/modxx/atu16sqrtx10frxx/src/atu16sqrtx10frxx.c?revision=1.3&view=markup

What other options should I investigate?

Thanks, Datesfat 

0
Reply Datesfat 2/4/2010 7:22:54 PM

In article <yY6dnfRKeZATg_bWnZ2dnUVZ_sydnZ2d@giganews.com>, 
datesfat.chicks@gmail.com says...
> 
> Hi,
> 
> I have the need in a microcontroller application to calculate floor(sqrt(x)) 
> where x is approximately in the range of 2^32 - 2^64.
> 
> What are the algorithms I should look at?
> 
> These small microcontrollers are characterized by having a 8-bit times 8-bit 
> to give 16-bit result integer unsigned multiply instruction, and a similar 
> 16/8 division instruction to give a quotient and a remainder. 
> Mulitplications of larger integers have to be done by multiplying the bytes 
> and adding.
> 
> I'm aware of these algorithms:
> 
> a)Adding consecutive odd integers and counting how many you have to add, 
> i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.
> 
> b)Trial squaring, testing setting each bit to "1", i.e.
> 
> http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic/modxx/atu8sqrtfrxx/src/atu8sqrtfrxx.c?revision=1.3&view=markup
> 
> c)Another method that seems to work (don't know the name of it), here is 
> source code:
> 
> http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic/modxx/atu16sqrtx10frxx/src/atu16sqrtx10frxx.c?revision=1.3&view=markup
> 
> What other options should I investigate?

Here is some C code for integer square roots using many different 
algorithms:
http://cap.connx.com/chess-engines/new-approach/jsqrt0.ZIP

Worth a look:
http://www.azillionmonkeys.com/qed/sqroot.html

Consider also:
http://www.google.com/#hl=en&source=hp&q=fast+integer+square+root&rlz=
1W1GGLD_en&aq=0&aqi=g2&oq=fast+integer+sq&fp=a048890d3c90c6fc

0
Reply Dann 2/4/2010 7:32:18 PM


Datesfat Chicks wrote:
> I have the need in a microcontroller application to calculate 
> floor(sqrt(x)) where x is approximately in the range of 2^32 - 2^64.

x = y^2 - (2^16)^2

> What are the algorithms I should look at?

Google is your friend  :>  There are many sqrt algorithms with
different tradeoffs (speed/size).

I would, instead, ask that you might want to examine what *else*
you know about "x".  I.e., you've already constrained the range
(why?  does that give you any other information that can be
exploited?); are there any other characteristics about how
"x" is synthesized that you can exploit to divide and conquer?

E.g. if x = y * z, then knowing sqrt(y) and/or sqrt(z) simplifies
your problem.

> These small microcontrollers are characterized by having a 8-bit times 
> 8-bit to give 16-bit result integer unsigned multiply instruction, and a 
> similar 16/8 division instruction to give a quotient and a remainder. 
> Mulitplications of larger integers have to be done by multiplying the 
> bytes and adding.
> 
> I'm aware of these algorithms:
> 
> a)Adding consecutive odd integers and counting how many you have to add, 
> i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.
> 
> b)Trial squaring, testing setting each bit to "1", i.e.
> 
> http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic/modxx/atu8sqrtfrxx/src/atu8sqrtfrxx.c?revision=1.3&view=markup 
> 
> c)Another method that seems to work (don't know the name of it), here is 
> source code:
> 
> http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic/modxx/atu16sqrtx10frxx/src/atu16sqrtx10frxx.c?revision=1.3&view=markup 
> 
> What other options should I investigate?
0
Reply D 2/4/2010 7:51:16 PM

On Feb 4, 9:22=A0pm, "Datesfat Chicks" <datesfat.chi...@gmail.com>
wrote:
> Hi,
>
> I have the need in a microcontroller application to calculate floor(sqrt(=
x))
> where x is approximately in the range of 2^32 - 2^64.
>
> What are the algorithms I should look at?
>
> These small microcontrollers are characterized by having a 8-bit times 8-=
bit
> to give 16-bit result integer unsigned multiply instruction, and a simila=
r
> 16/8 division instruction to give a quotient and a remainder.
> Mulitplications of larger integers have to be done by multiplying the byt=
es
> and adding.
>
> I'm aware of these algorithms:
>
> a)Adding consecutive odd integers and counting how many you have to add,
> i.e. 1 + 3 =3D 4, 4 + 5 =3D 9, 9 + 7 =3D 16, 16 + 9 =3D 25, etc.
>
> b)Trial squaring, testing setting each bit to "1", i.e.
>
> http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic...
>
> c)Another method that seems to work (don't know the name of it), here is
> source code:
>
> http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic...
>
> What other options should I investigate?
>
> Thanks, Datesfat

The one I have been using last 15-20 years is probably your B, at
least sounds
like what I made up back then - successive approximation for each bit
of the result.
If multiplication is fast on the core you will be using it it is hard
to beat.

Dimiter
0
Reply Didi 2/4/2010 8:19:32 PM

"Didi" <dp@tgi-sci.com> wrote in message 
news:8b5a6ce8-5e18-49e8-815a-5733e4fe5fe3@g1g2000yqi.googlegroups.com...
On Feb 4, 9:22 pm, "Datesfat Chicks" <datesfat.chi...@gmail.com>
wrote:
> Hi,
>
> I have the need in a microcontroller application to calculate 
> floor(sqrt(x))
> where x is approximately in the range of 2^32 - 2^64.
>
> What are the algorithms I should look at?
>
> These small microcontrollers are characterized by having a 8-bit times 
> 8-bit
> to give 16-bit result integer unsigned multiply instruction, and a similar
> 16/8 division instruction to give a quotient and a remainder.
> Mulitplications of larger integers have to be done by multiplying the 
> bytes
> and adding.
>
> I'm aware of these algorithms:
>
> a)Adding consecutive odd integers and counting how many you have to add,
> i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.
>
> b)Trial squaring, testing setting each bit to "1", i.e.
>
> http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic...
>
> c)Another method that seems to work (don't know the name of it), here is
> source code:
>
> http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic...
>
> What other options should I investigate?
>
> Thanks, Datesfat
>
>The one I have been using last 15-20 years is probably your B, at
>least sounds
>like what I made up back then - successive approximation for each bit
>of the result.
>If multiplication is fast on the core you will be using it it is hard
>to beat.

Thanks for the insight.

For small operands, I agree with your analysis.

However, for larger operands, the Babylonian method:

http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Babylonian_method

looks very promising.

The issue is that for the cost of one division (the /2 and addition is 
virtually free) you get far more than one bit of additional information 
about the square root.

For larger operands, I don't believe that (b) will be the right answer any 
longer.

But I'm not sure yet ...

Thanks for all, Datesfat 

0
Reply Datesfat 2/4/2010 8:33:36 PM

Datesfat Chicks wrote:
> a)Adding consecutive odd integers and counting how many you have to add, 
> i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.

That quite certainly can't work for the input range you're targeting. 
You would be looking at up to four billion iterations if x was 2^64

> b)Trial squaring, testing setting each bit to "1", i.e.

A whole lot better already.  It becomes even better if you update the 
square of your current result differentially, applying the binomial 
formula.  As you update the result-to-be ('res' in the following), by 
adding a 1-bit at position 'k':

	  x --> x + 2^k

it's square, which eventually should be equal to the input number, 
updates too:

	x^2 --> x^2 + 2*(2^k)*x + (2^k)^2
               = x^2 + x<<(k+1) + 1<<(2*k)

That's two additions and a bit of shifting.  Even an 8-bit CPU should be 
able to manage that in reasonable time.  Basically you go looking for a 
bit position 'k' such that the current difference between x^2 and the 
target value is still larger than or equal to (x<<(k+1)+1<<(2*k)).

This is actually a special case of an algorithm people used to be taught 
in higher schools.  Just like we all (hopefully) learned to do long 
multiplication and long division in elementary school, there used to be 
an algorithm for long square-roots.  I've seen it in a textbook used to 
teach metal workers' apprentices from the 1940s.  It's a little daunting 
in decimal numbers on paper, but boils down to the above when performed 
in binary: you figure out one digit per iteration by looking at the 
difference between the target figure and square of the result-to-be. 
It's also somewhat similar to long division in that way.

> What other options should I investigate?

One traditional trick for sqrt(), and some other inverse functions, too, 
  is to treat it as a root-finding problem for the function

	f(x) := x^2 - y

The x that solves f(x)=0 is sqrt(y).  You can solve that using the 
Newton-Raphson iteration algorithm, a.k.a. the tangent method:

	x' = x - f(x)/f'(x)
	   = x - (x^2 - y) / (2 * x)
	   = x - x/2 - y/(2*x)
	   = x/2 - (y/2)/x

This requires long division, but pays off by converging on the solution 
real fast, once you've come anywhere near it.

This algorithm and the bit-by-bit one above are somewhat related.  The 
search for the right bit position, 'k', is effectively a simplified 
version of the division (y/2)/x.
0
Reply windows 2/4/2010 9:32:25 PM

On Thu, 04 Feb 2010 14:22:54 -0500, Datesfat Chicks wrote:

> I have the need in a microcontroller application to calculate floor(sqrt(x)) 
> where x is approximately in the range of 2^32 - 2^64.
> 
> What are the algorithms I should look at?

Newton-Raphson or bisection, depending upon how slow division is. N-R is
asymptotically faster (the number of correct digits doubles at each step),
but each step requires a division.

Bisection is essentially your option b), but it can be computed much more
efficiently than the algorithm given. You don't need arbitrary multiplies,
only shifts, as:

(a+2^k)^2 = a^2 + 2.a.2^k + (2^k)^2
          = a^2 + a.2^(k+1) + 2^2k

IOW:

uint32_t isqrt(uint64_t x)
{
    uint64_t a = 0;
    uint64_t a2 = 0;
    int k;

    for (k = 31; k >= 0; k--) {

	uint64_t t = a2 + (a << (k+1)) + ((uint64_t)1 << (k + k));
	if (x > t) {
	    a += 1 << k;
	    a2 = t;
	}
    }

    return a;
}

Depending upon the CPU, it may be faster to calculate the a.2^(k+1) and
2^2k terms incrementally.

0
Reply Nobody 2/5/2010 1:38:50 AM


Datesfat Chicks wrote:

> Hi,
> 
> I have the need in a microcontroller application to calculate 
> floor(sqrt(x)) where x is approximately in the range of 2^32 - 2^64.

Homework?

For microcontroller purpose, a 10-bit accurate sqrt approximation should 
probably suffice. I can hardly imagine anything practical that would 
require more then 14 significant bits. So, normalize your number and 
then do polynomial or piecewise linear approximation.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
0
Reply Vladimir 2/5/2010 1:58:08 AM

"Nobody" <nobody@nowhere.com> wrote in message 
news:pan.2010.02.05.01.38.49.156000@nowhere.com...
> On Thu, 04 Feb 2010 14:22:54 -0500, Datesfat Chicks wrote:
>
>> I have the need in a microcontroller application to calculate 
>> floor(sqrt(x))
>> where x is approximately in the range of 2^32 - 2^64.
>>
>> What are the algorithms I should look at?
>
> Newton-Raphson or bisection, depending upon how slow division is. N-R is
> asymptotically faster (the number of correct digits doubles at each step),
> but each step requires a division.
>
> Bisection is essentially your option b), but it can be computed much more
> efficiently than the algorithm given. You don't need arbitrary multiplies,
> only shifts, as:
>
> (a+2^k)^2 = a^2 + 2.a.2^k + (2^k)^2
>          = a^2 + a.2^(k+1) + 2^2k

The identity you provided above is so obvious that I'm beating myself on the 
head with a hammer right now for not spotting that as an alternative to 
multiplying again on each iteration.

Ouch ouch ouch.

Thanks sincerely.

As you know, division of large integers is fairly expensive on a little 
microcontroller.  The "bisection" insight you provided is definitely 
something I will investigate.

Datesfat 

0
Reply Datesfat 2/5/2010 2:26:42 AM

In article <hkfecv$scf$02$1@news.t-online.com>,
Hans-Bernhard Br�ker  <HBBroeker@t-online.de> wrote:

>One traditional trick for sqrt(), and some other inverse functions, too, 
>  is to treat it as a root-finding problem for the function
>
>	f(x) := x^2 - y
>
>The x that solves f(x)=0 is sqrt(y).  You can solve that using the 
>Newton-Raphson iteration algorithm, a.k.a. the tangent method:

It might be better to solve g(x)=0  where  g(x) = 1/x^2 - y   (the
solution is obviously  1/sqrt(y) ) and then multiply by  y.  The
reason is that  NR iteration in this case does not involve any
division except for a bit shift:

	x' = x - g(x)/g'(x)
	   = x - (1/x^2 - y) / (-2 / x^3)
	   = x + x/2 - y*x^3/2
	   = x*(3 - y*x^2)/2

In the same way we can compute reciprocals using only multiplication
(use  h(x) = 1/x - y ).

dave

0
Reply rusin 2/5/2010 8:07:48 PM

Hans-Bernhard Br�ker wrote:
> Datesfat Chicks wrote:

> 
>> What other options should I investigate?
> 
> One traditional trick for sqrt(), and some other inverse functions, too, 
>  is to treat it as a root-finding problem for the function
> 
>     f(x) := x^2 - y
> 
> The x that solves f(x)=0 is sqrt(y).  You can solve that using the 
> Newton-Raphson iteration algorithm, a.k.a. the tangent method:
> 
>     x' = x - f(x)/f'(x)
>        = x - (x^2 - y) / (2 * x)
>        = x - x/2 - y/(2*x)
>        = x/2 - (y/2)/x
> 
> This requires long division, but pays off by converging on the solution 
> real fast, once you've come anywhere near it.
> 

The trick I have seen for finding a good starting value is to find
the first '1' which you can do with table lookups on each byte and
generate 2^(N/2) which is just bit shifts.*

Hence if the highest '1' is at bit 47, then  2^47 <= num < 2^48
so your starting value would be 2^23 = 1 << 23

ISTR that there was a proof that N-R converged for 32-bit numbers
to the correct 16-bit result given this starting value in at most
3 iterations, but I no longer have the analysis.  At any rate,
it would not be difficult to test.


* Some useful bit hacks for doing the table lookups, etc.
http://graphics.stanford.edu/~seander/bithacks.html
0
Reply Steve 2/5/2010 11:07:08 PM

"Steve C" <smallpond@juno.com> wrote in message 
news:hki8bg$ors$1@news.eternal-september.org...
>
> ISTR that there was a proof that N-R converged for 32-bit numbers
> to the correct 16-bit result given this starting value in at most
> 3 iterations, but I no longer have the analysis.  At any rate,
> it would not be difficult to test.

One thing I often do is simulate microcontroller algorithms on a PC.  I 
would find out rather rapidly for the 2^32 case whether it always converged 
within three iterations.  I don't know how long it would take, but 
definitely under an hour and probably much less.

However, some of the integers I'm interested in may be around 2^64.  I 
believe that would be a problem even for a PC.  So some formal analysis 
would be necessary.

Datesfat 

0
Reply Datesfat 2/7/2010 5:23:15 AM

On Feb 4, 12:22=A0pm, "Datesfat Chicks" <datesfat.chi...@gmail.com>
wrote:
> Hi,
>
> I have the need in a microcontroller application to calculate floor(sqrt(=
x))
> where x is approximately in the range of 2^32 - 2^64.
>
> What are the algorithms I should look at?
>
> These small microcontrollers are characterized by having a 8-bit times 8-=
bit
> to give 16-bit result integer unsigned multiply instruction, and a simila=
r
> 16/8 division instruction to give a quotient and a remainder.
> Mulitplications of larger integers have to be done by multiplying the byt=
es
> and adding.
>
> I'm aware of these algorithms:
>
> a)Adding consecutive odd integers and counting how many you have to add,
> i.e. 1 + 3 =3D 4, 4 + 5 =3D 9, 9 + 7 =3D 16, 16 + 9 =3D 25, etc.
>
> b)Trial squaring, testing setting each bit to "1", i.e.
>
> http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic...
>
> c)Another method that seems to work (don't know the name of it), here is
> source code:
>
> http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic...
>
> What other options should I investigate?
>
> Thanks, Datesfat

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D

From this site:

http://www.convict.lu/Jeunes/Math/square_root_CORDIC.htm

(Copied from site, untested)
Here a C-routine for integer-square-rooting for numbers between 0 and
65536:

int sqrt (int x)
{
int base, i, y ;
       base =3D 128 ;
       y =3D 0 ;
       for (i =3D 1; i <=3D 8; i++)
       {
               y + =3D  base ;
               if  ( (y * y) > x )
               {
                       y -=3D base ;  // base should not have been
added, so we substract again
               }
               base >> 1 ;      // shift 1 digit to the right =3D divide
by 2
        }
        return y ;
}


                                                           Enrico
0
Reply Enrico 2/7/2010 6:26:09 AM

Is this a homework question?

You might want to consider a 32 bit MCU with Floating Point Unit.
http://america.renesas.com/fmwk.jsp?cnt=r32c111_root.jsp&fp=/products/mpumcu/m16c_family/r32c100_series/r32c111_group/


--Vinnie	   
					
---------------------------------------		
Posted through http://www.EmbeddedRelated.com
0
Reply vinnie 2/16/2010 9:35:33 PM

13 Replies
497 Views

(page loaded in 0.085 seconds)

Similiar Articles:






7/21/2012 7:02:48 PM


Reply: