COMPGROUPS.NET

Suitable Integer Square Root Algorithm for 32-64-Bit Integers on Inexpensive Microcontroller?


```Hi,

I have the need in a microcontroller application to calculate floor(sqrt(x))
where x is approximately in the range of 2^32 - 2^64.

What are the algorithms I should look at?

These small microcontrollers are characterized by having an 8-bit by 8-bit
unsigned multiply instruction that gives a 16-bit result, and a similar
16/8 division instruction that gives a quotient and a remainder.
Multiplications of larger integers have to be done by multiplying the bytes.

I'm aware of these algorithms:

a)Adding consecutive odd integers and counting how many you have to add,
i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.

b)Trial squaring, testing setting each bit to "1", i.e.

http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic/modxx/atu8sqrtfrxx/src/atu8sqrtfrxx.c?revision=1.3&view=markup

c)Another method that seems to work (don't know the name of it), here is
source code:

http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic/modxx/atu16sqrtx10frxx/src/atu16sqrtx10frxx.c?revision=1.3&view=markup

What other options should I investigate?

Thanks, Datesfat

```
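For reference, method (a) boils down to something like the following sketch (the function name is mine). Its weakness, as pointed out downthread, is that the loop runs about sqrt(x) times, which is hopeless for 64-bit inputs:

```c
#include <stdint.h>

/* Method (a): floor(sqrt(x)) by summing consecutive odd integers.
   1 + 3 + 5 + ... + (2n-1) = n^2, so count how many odd numbers fit.
   Takes about sqrt(x) iterations - fine for small x, hopeless near 2^64. */
uint32_t isqrt_odd(uint64_t x)
{
    uint64_t sum = 0, odd = 1;
    uint32_t n = 0;
    while (sum + odd <= x) {
        sum += odd;     /* running square n^2 */
        odd += 2;       /* next odd number */
        n++;
    }
    return n;
}
```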

```In article <yY6dnfRKeZATg_bWnZ2dnUVZ_sydnZ2d@giganews.com>,
datesfat.chicks@gmail.com says...
>
> Hi,
>
> I have the need in a microcontroller application to calculate floor(sqrt(x))
> where x is approximately in the range of 2^32 - 2^64.
>
> What are the algorithms I should look at?
>
> These small microcontrollers are characterized by having a 8-bit times 8-bit
> to give 16-bit result integer unsigned multiply instruction, and a similar
> 16/8 division instruction to give a quotient and a remainder.
> Multiplications of larger integers have to be done by multiplying the bytes
>
> I'm aware of these algorithms:
>
> a)Adding consecutive odd integers and counting how many you have to add,
> i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.
>
> b)Trial squaring, testing setting each bit to "1", i.e.
>
> http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic/modxx/atu8sqrtfrxx/src/atu8sqrtfrxx.c?revision=1.3&view=markup
>
> c)Another method that seems to work (don't know the name of it), here is
> source code:
>
> http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic/modxx/atu16sqrtx10frxx/src/atu16sqrtx10frxx.c?revision=1.3&view=markup
>
> What other options should I investigate?

Here is some C code for integer square roots using many different
algorithms:
http://cap.connx.com/chess-engines/new-approach/jsqrt0.ZIP

Worth a look:
http://www.azillionmonkeys.com/qed/sqroot.html


```

```Datesfat Chicks wrote:
> I have the need in a microcontroller application to calculate
> floor(sqrt(x)) where x is approximately in the range of 2^32 - 2^64.

x = y^2 - (2^16)^2

> What are the algorithms I should look at?

I would, instead, ask you to examine what *else* you know
(why do you need it?  does that give you any other information that
can be exploited?); are there any other characteristics about how
"x" is synthesized that you can exploit to divide and conquer?

E.g. if x = y * z, then knowing sqrt(y) and/or sqrt(z) simplifies the problem.

> These small microcontrollers are characterized by having a 8-bit times
> 8-bit to give 16-bit result integer unsigned multiply instruction, and a
> similar 16/8 division instruction to give a quotient and a remainder.
> Multiplications of larger integers have to be done by multiplying the bytes
>
> I'm aware of these algorithms:
>
> a)Adding consecutive odd integers and counting how many you have to add,
> i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.
>
> b)Trial squaring, testing setting each bit to "1", i.e.
>
> http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic/modxx/atu8sqrtfrxx/src/atu8sqrtfrxx.c?revision=1.3&view=markup
>
> c)Another method that seems to work (don't know the name of it), here is
> source code:
>
> http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic/modxx/atu16sqrtx10frxx/src/atu16sqrtx10frxx.c?revision=1.3&view=markup
>
> What other options should I investigate?
```

```On Feb 4, 9:22 pm, "Datesfat Chicks" <datesfat.chi...@gmail.com>
wrote:
> Hi,
>
> I have the need in a microcontroller application to calculate floor(sqrt(x))
> where x is approximately in the range of 2^32 - 2^64.
>
> What are the algorithms I should look at?
>
> These small microcontrollers are characterized by having a 8-bit times 8-bit
> to give 16-bit result integer unsigned multiply instruction, and a similar
> 16/8 division instruction to give a quotient and a remainder.
> Multiplications of larger integers have to be done by multiplying the bytes
>
> I'm aware of these algorithms:
>
> a)Adding consecutive odd integers and counting how many you have to add,
> i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.
>
> b)Trial squaring, testing setting each bit to "1", i.e.
>
> http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic...
>
> c)Another method that seems to work (don't know the name of it), here is
> source code:
>
> http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic...
>
> What other options should I investigate?
>
> Thanks, Datesfat

The one I have been using for the last 15-20 years is probably your (b);
at least it sounds like what I made up back then - successive
approximation for each bit of the result.
If multiplication is fast on the core you will be using, it is hard
to beat.

Dimiter
```

```"Didi" <dp@tgi-sci.com> wrote in message
On Feb 4, 9:22 pm, "Datesfat Chicks" <datesfat.chi...@gmail.com>
wrote:
> Hi,
>
> I have the need in a microcontroller application to calculate
> floor(sqrt(x))
> where x is approximately in the range of 2^32 - 2^64.
>
> What are the algorithms I should look at?
>
> These small microcontrollers are characterized by having a 8-bit times
> 8-bit
> to give 16-bit result integer unsigned multiply instruction, and a similar
> 16/8 division instruction to give a quotient and a remainder.
> Multiplications of larger integers have to be done by multiplying the
> bytes
>
> I'm aware of these algorithms:
>
> a)Adding consecutive odd integers and counting how many you have to add,
> i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.
>
> b)Trial squaring, testing setting each bit to "1", i.e.
>
> http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic...
>
> c)Another method that seems to work (don't know the name of it), here is
> source code:
>
> http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic...
>
> What other options should I investigate?
>
> Thanks, Datesfat
>
>The one I have been using last 15-20 years is probably your B, at
>least sounds
>like what I made up back then - successive approximation for each bit
>of the result.
>If multiplication is fast on the core you will be using it it is hard
>to beat.

Thanks for the insight.

For small operands, I agree with your analysis.

However, for larger operands, the Babylonian method:

http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Babylonian_method

looks very promising.

The issue is that for the cost of one division (the /2 and addition are
virtually free) you get far more than one bit of additional information.

For larger operands, I don't believe that (b) will be the right answer any
longer.

But I'm not sure yet ...

Thanks for all, Datesfat

```
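An integer version of the Babylonian iteration can be sketched like this (illustrative only; the function name is mine). Seeding with x0 = y/2 + 1 guarantees x0 >= sqrt(y), so the iterate decreases monotonically, and the first time it stops decreasing, the previous value is floor(sqrt(y)):

```c
#include <stdint.h>

/* Babylonian / Newton-Raphson integer square root.
   Each step costs one long division, but the number of correct
   bits roughly doubles per step. */
uint32_t isqrt_newton(uint64_t y)
{
    if (y < 2)
        return (uint32_t)y;
    uint64_t x = y / 2 + 1;        /* seed: always >= sqrt(y), no overflow */
    uint64_t prev;
    do {
        prev = x;
        x = (x + y / x) / 2;       /* the /2 and add are cheap; y/x is not */
    } while (x < prev);
    return (uint32_t)prev;
}
```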

```Datesfat Chicks wrote:
> a)Adding consecutive odd integers and counting how many you have to add,
> i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.

That quite certainly can't work for the input range you're targeting.
You would be looking at up to four billion iterations if x were near 2^64.

> b)Trial squaring, testing setting each bit to "1", i.e.

A whole lot better already.  It becomes even better if you update the
square of your current result differentially, applying the binomial
formula.  As you update the result-to-be, x, by adding a 1-bit at
position 'k':

x --> x + 2^k

its square, which eventually should be equal to the input number,
changes as

x^2 --> x^2 + 2*(2^k)*x + (2^k)^2
      = x^2 + (x << (k+1)) + (1 << (2*k))

That's two additions and a bit of shifting.  Even an 8-bit CPU should be
able to manage that in reasonable time.  Basically you go looking for a
bit position 'k' such that the current difference between x^2 and the
target value is still larger than or equal to (x << (k+1)) + (1 << (2*k)).

This is actually a special case of an algorithm people used to be taught
in secondary school.  Just like we all (hopefully) learned to do long
multiplication and long division in elementary school, there used to be
an algorithm for long square roots.  I've seen it in a 1940s textbook
used to teach metal workers' apprentices.  It's a little daunting in
decimal on paper, but boils down to the above when performed in binary:
you figure out one digit per iteration by looking at the difference
between the target figure and the square of the result-to-be.  It's also
somewhat similar to long division in that way.

> What other options should I investigate?

One traditional trick for sqrt(), and some other inverse functions, too,
is to treat it as a root-finding problem for the function

f(x) := x^2 - y

The x that solves f(x)=0 is sqrt(y).  You can solve that using the
Newton-Raphson iteration algorithm, a.k.a. the tangent method:

x' = x - f(x)/f'(x)
   = x - (x^2 - y) / (2 * x)
   = x - x/2 + y/(2*x)
   = x/2 + (y/2)/x

This requires long division, but pays off by converging on the solution
real fast, once you've come anywhere near it.

This algorithm and the bit-by-bit one above are somewhat related.  The
search for the right bit position, 'k', is effectively a simplified
version of the division (y/2)/x.
```

```On Thu, 04 Feb 2010 14:22:54 -0500, Datesfat Chicks wrote:

> I have the need in a microcontroller application to calculate floor(sqrt(x))
> where x is approximately in the range of 2^32 - 2^64.
>
> What are the algorithms I should look at?

Newton-Raphson or bisection, depending upon how slow division is. N-R is
asymptotically faster (the number of correct digits doubles at each step),
but each step requires a division.

Bisection is essentially your option b), but it can be computed much more
efficiently than the algorithm given. You don't need arbitrary multiplies,
only shifts, as:

(a+2^k)^2 = a^2 + 2.a.2^k + (2^k)^2
= a^2 + a.2^(k+1) + 2^2k

IOW:

uint32_t isqrt(uint64_t x)
{
    uint64_t a = 0;
    uint64_t a2 = 0;   /* invariant: a2 == a*a */
    int k;

    for (k = 31; k >= 0; k--) {

        uint64_t t = a2 + (a << (k + 1)) + ((uint64_t)1 << (k + k));
        if (x >= t) {                  /* >= so perfect squares work */
            a += (uint64_t)1 << k;     /* cast avoids 32-bit overflow at k=31 */
            a2 = t;
        }
    }

    return (uint32_t)a;
}

Depending upon the CPU, it may be faster to calculate the a.2^(k+1) and
2^2k terms incrementally.

```
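Computing those terms incrementally leads to the classic restoring, digit-by-digit form (a sketch; the function name is mine). It consumes two bits of the operand per step and needs only shifts, adds and compares, which maps well onto an 8-bit CPU:

```c
#include <stdint.h>

/* Digit-by-digit (restoring) square root: two operand bits per step,
   no multiplication or division, only shifts, adds and compares. */
uint32_t isqrt_bits(uint64_t x)
{
    uint64_t rem = 0, root = 0;        /* root holds 2 * (result so far) */
    for (int i = 0; i < 32; i++) {
        root <<= 1;
        rem = (rem << 2) | (x >> 62);  /* bring down the next two bits */
        x <<= 2;
        if (root + 1 <= rem) {         /* trial subtract of (2r + 1) */
            rem -= root + 1;
            root += 2;
        }
    }
    return (uint32_t)(root >> 1);
}
```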

```
Datesfat Chicks wrote:

> Hi,
>
> I have the need in a microcontroller application to calculate
> floor(sqrt(x)) where x is approximately in the range of 2^32 - 2^64.

Homework?

For microcontroller purposes, a 10-bit accurate sqrt approximation should
probably suffice. I can hardly imagine anything practical that would
require more than 14 significant bits. So, normalize your number and
then do polynomial or piecewise-linear approximation.

DSP and Mixed Signal Design Consultant
http://www.abvolt.com
```
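One way to sketch that idea (illustrative only; the function name and the single-segment choice are mine): normalize y to m * 4^k with m in [1,4), then take the chord through (1,1) and (4,2) as the approximation to sqrt(m). One chord is only good to roughly 4-5 bits, and integer truncation hurts small inputs; a table of per-segment slopes and intercepts would be needed to reach the ~10-bit accuracy mentioned above:

```c
#include <stdint.h>

/* Normalize y = m * 4^k with m in [1,4), then approximate
   sqrt(m) by the chord through (1,1) and (4,2):
     sqrt(y) ~ 2^k * (1 + (m-1)/3) = 2^k + (y - 4^k) / (3 * 2^k)
   One segment is good to roughly 4-5 bits; more segments (or a small
   polynomial per segment) buy more accuracy. */
uint32_t sqrt_chord(uint64_t y)
{
    if (y == 0)
        return 0;
    int k = 0;
    uint64_t t = y;
    while (t >= 4) {                   /* find k with 4^k <= y < 4^(k+1) */
        t >>= 2;
        k++;
    }
    uint64_t p = (uint64_t)1 << k;     /* 2^k */
    return (uint32_t)(p + (y - p * p) / (3 * p));
}
```

Exact at powers of 4; for large inputs the relative error stays within about 6 percent.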

```"Nobody" <nobody@nowhere.com> wrote in message
news:pan.2010.02.05.01.38.49.156000@nowhere.com...
> On Thu, 04 Feb 2010 14:22:54 -0500, Datesfat Chicks wrote:
>
>> I have the need in a microcontroller application to calculate
>> floor(sqrt(x))
>> where x is approximately in the range of 2^32 - 2^64.
>>
>> What are the algorithms I should look at?
>
> Newton-Raphson or bisection, depending upon how slow division is. N-R is
> asymptotically faster (the number of correct digits doubles at each step),
> but each step requires a division.
>
> Bisection is essentially your option b), but it can be computed much more
> efficiently than the algorithm given. You don't need arbitrary multiplies,
> only shifts, as:
>
> (a+2^k)^2 = a^2 + 2.a.2^k + (2^k)^2
>          = a^2 + a.2^(k+1) + 2^2k

The identity you provided above is so obvious that I'm beating myself on the
head with a hammer right now for not spotting that as an alternative to
multiplying again on each iteration.

Ouch ouch ouch.

Thanks sincerely.

As you know, division of large integers is fairly expensive on a little
microcontroller.  The "bisection" insight you provided is definitely
something I will investigate.

Datesfat

```

```In article <hkfecv$scf$02$1@news.t-online.com>,
Hans-Bernhard Bröker  <HBBroeker@t-online.de> wrote:

>One traditional trick for sqrt(), and some other inverse functions, too,
>  is to treat it as a root-finding problem for the function
>
>	f(x) := x^2 - y
>
>The x that solves f(x)=0 is sqrt(y).  You can solve that using the
>Newton-Raphson iteration algorithm, a.k.a. the tangent method:

It might be better to solve g(x)=0  where  g(x) = 1/x^2 - y   (the
solution is obviously  1/sqrt(y) ) and then multiply by  y.  The
reason is that  NR iteration in this case does not involve any
division except for a bit shift:

x' = x - g(x)/g'(x)
= x - (1/x^2 - y) / (-2 / x^3)
= x + x/2 - y*x^3/2
= x*(3 - y*x^2)/2

In the same way we can compute reciprocals using only multiplication
(use  h(x) = 1/x - y ).

dave

```
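A quick floating-point demonstration of that iteration (a sketch only; double precision here just to show convergence, and the crude 1/y seed, which itself uses a division, would in real code be replaced by an exponent-based shift):

```c
/* Newton-Raphson on g(x) = 1/x^2 - y:  x' = x * (3 - y*x*x) / 2.
   No division inside the loop; sqrt(y) is then recovered as y * x. */
double rsqrt_demo(double y)
{
    double x = 1.0 / y;               /* crude seed; assumes y >= 1 */
    for (int i = 0; i < 100; i++)     /* plenty of iterations for a demo */
        x = x * (3.0 - y * x * x) * 0.5;
    return x;
}
```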

```Hans-Bernhard Bröker wrote:
> Datesfat Chicks wrote:

>
>> What other options should I investigate?
>
> One traditional trick for sqrt(), and some other inverse functions, too,
>  is to treat it as a root-finding problem for the function
>
>     f(x) := x^2 - y
>
> The x that solves f(x)=0 is sqrt(y).  You can solve that using the
> Newton-Raphson iteration algorithm, a.k.a. the tangent method:
>
>     x' = x - f(x)/f'(x)
>        = x - (x^2 - y) / (2 * x)
>        = x - x/2 + y/(2*x)
>        = x/2 + (y/2)/x
>
> This requires long division, but pays off by converging on the solution
> real fast, once you've come anywhere near it.
>

The trick I have seen for finding a good starting value is to find
the first '1' (which you can do with table lookups on each byte) and
generate 2^(N/2), which is just bit shifts.*

Hence if the highest '1' is at bit 47, then  2^47 <= num < 2^48
so your starting value would be 2^23 = 1 << 23

ISTR that there was a proof that N-R converged for 32-bit numbers
to the correct 16-bit result given this starting value in at most
3 iterations, but I no longer have the analysis.  At any rate,
it would not be difficult to test.

* Some useful bit hacks for doing the table lookups, etc.
http://graphics.stanford.edu/~seander/bithacks.html
```
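A sketch of that seeding (function name mine; one small variation: rounding the seed *up* to 2^ceil((n+1)/2), where n is the index of the highest set bit, guarantees the seed is >= sqrt(num), which makes the N-R stopping test trivial). The byte-wise table lookup from the bit-hacks page is replaced here by a plain loop for clarity:

```c
#include <stdint.h>

/* Newton-Raphson seeded from the highest set bit.  With bit n the
   top '1' of y we have y < 2^(n+1), hence sqrt(y) < 2^ceil((n+1)/2),
   so this seed lies above the root and the iterate only decreases. */
uint32_t isqrt_seeded(uint64_t y)
{
    if (y < 2)
        return (uint32_t)y;
    int n = 63;
    while (!(y >> n))              /* a table lookup per byte is faster */
        n--;
    uint64_t x = (uint64_t)1 << ((n + 2) / 2);   /* 2^ceil((n+1)/2) */
    uint64_t prev;
    do {
        prev = x;
        x = (x + y / x) / 2;
    } while (x < prev);
    return (uint32_t)prev;
}
```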

```"Steve C" <smallpond@juno.com> wrote in message
news:hki8bg$ors$1@news.eternal-september.org...
>
> ISTR that there was a proof that N-R converged for 32-bit numbers
> to the correct 16-bit result given this starting value in at most
> 3 iterations, but I no longer have the analysis.  At any rate,
> it would not be difficult to test.

One thing I often do is simulate microcontroller algorithms on a PC.  I
would find out rather rapidly for the 2^32 case whether it always converged
within three iterations.  I don't know how long it would take, but
definitely under an hour and probably much less.

However, some of the integers I'm interested in may be around 2^64.  I
believe that would be a problem even for a PC.  So some formal analysis
would be necessary.

Datesfat

```

```On Feb 4, 12:22 pm, "Datesfat Chicks" <datesfat.chi...@gmail.com>
wrote:
> Hi,
>
> I have the need in a microcontroller application to calculate floor(sqrt(x))
> where x is approximately in the range of 2^32 - 2^64.
>
> What are the algorithms I should look at?
>
> These small microcontrollers are characterized by having a 8-bit times 8-bit
> to give 16-bit result integer unsigned multiply instruction, and a similar
> 16/8 division instruction to give a quotient and a remainder.
> Multiplications of larger integers have to be done by multiplying the bytes
>
> I'm aware of these algorithms:
>
> a)Adding consecutive odd integers and counting how many you have to add,
> i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.
>
> b)Trial squaring, testing setting each bit to "1", i.e.
>
> http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic...
>
> c)Another method that seems to work (don't know the name of it), here is
> source code:
>
> http://www.uculib.com/vvcuculib01/viewvc.cgi/uculib01/src/stm8/cosmic...
>
> What other options should I investigate?
>
> Thanks, Datesfat

================================================================

From this site:

http://www.convict.lu/Jeunes/Math/square_root_CORDIC.htm

(Copied from site, untested)
Here is a C-routine for integer square-rooting of numbers between 0 and
65536:

int sqrt (int x)
{
    int base, i, y;
    base = 128;
    y = 0;
    for (i = 1; i <= 8; i++)
    {
        y += base;
        if ((y * y) > x)      /* note: y*y needs an int wider than 16 bits */
        {
            y -= base;        /* base should not have been added */
        }
        base >>= 1;           /* shift 1 digit to the right = divide by 2 */
    }
    return y;
}

Enrico
```

```Is this a homework question?

You might want to consider a 32 bit MCU with Floating Point Unit.
http://america.renesas.com/fmwk.jsp?cnt=r32c111_root.jsp&fp=/products/mpumcu/m16c_family/r32c100_series/r32c111_group/

--Vinnie

---------------------------------------
Posted through http://www.EmbeddedRelated.com
```

13 Replies
497 Views