CORDIC bit-serial vs. bit-parallel

Hello,

I'm trying to write a CORDIC macro for a polar transmitter FPGA design.
I've used the parallel approach, but when I do a timing estimation the
longest delay path is through the CORDIC routine, and limits the
maximum clock rate of the whole design to about 40MHz. Other parts of
the design on the same board need to run at a much faster rate, so I'm
considering using a bit-serial version.

As I understand it, the bit-parallel implementation has low latency and
therefore high throughput, but because of the word-wide shifts it
clocks at a slower rate. Conversely, the bit-serial
routine has a high latency and low throughput rate, but allows the
board to run at a faster clock rate. Is this right?

My question is:

In the bit-serial implementation, you still need to perform the shift
operation on the entire word to select the right bit to send to the
bit-serial adder/subtractor, so how does this solve the problem of a
slow clock rate due to the shift operation?

Thanks,

Mees

m_oylulan
5/18/2005 11:16:01 AM
comp.arch.fpga

This is why you will get paid the big bucks!!!

Now figure out how much of it you need to make serial or registered to
accomplish the task.

I bet some registering, rather than a completely serial approach, will work.

gm

GMM50
5/18/2005 6:53:04 PM
Bit wide shifts are slowing you down....
Are you implementing the CORDIC as a beautifully pipelined hardware datapath
or are you doing this in software?
CORDIC should be able to run easily at 150 MHz+ if I recall correctly.  If
Ray Andraka doesn't respond quickly here, look for his CORDIC information
in his paper entitled:

    A Survey of CORDIC Algorithms for FPGAs
at
    http://www.andraka.com/papers.htm
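
As a point of reference for where the shifts come from, here is a minimal
software sketch of a rotation-mode CORDIC in Python. It is not taken from the
thread or from the paper; the function name cordic_rotate, the 16 iterations,
and the 16 fractional bits are assumptions chosen for illustration.

```python
import math

def cordic_rotate(angle, iterations=16, frac_bits=16):
    """Rotation-mode CORDIC sketch: (cos(angle), sin(angle)) for |angle| <= pi/2."""
    scale = 1 << frac_bits
    # Fixed-point arctangent table, one entry per iteration.
    atan_table = [round(math.atan(2.0 ** -i) * scale) for i in range(iterations)]
    # Pre-divide the start vector by the CORDIC gain so the result has unit length.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x = round(scale / gain)
    y = 0
    z = round(angle * scale)
    for i in range(iterations):
        # Word-wide arithmetic right shifts by i; i changes every iteration.
        dx = y >> i
        dy = x >> i
        if z >= 0:
            x, y, z = x - dx, y + dy, z - atan_table[i]
        else:
            x, y, z = x + dx, y - dy, z + atan_table[i]
    return x / scale, y / scale

c, s = cordic_rotate(math.pi / 6)
print(c, s)  # close to cos(30 deg) and sin(30 deg)
```

In an iterative design the shift amount i changes on every pass, which is
what calls for a variable shifter. In an unrolled pipeline each stage has a
fixed i, so the shift reduces to wiring.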


<m_oylulan@hotmail.com> wrote in message
news:1116414961.535018.32730@g49g2000cwa.googlegroups.com...
> Hello,
>
> I'm trying to write a CORDIC macro for a polar transmitter FPGA design.
> I've used the parallel approach, but when I do a timing estimation the
> longest delay path is through the CORDIC routine, and limits the
> maximum clock rate of the whole design to about 40MHz. Other parts of
> the design on the same board need to run at a much faster rate, so I'm
> considering using a bit-serial version.
>
> As I understand it, the bit-parallel implementation has low latency and
> therefore high throughput, but because of the word-wide shifts it
> clocks at a slower rate. Conversely, the bit-serial
> routine has a high latency and low throughput rate, but allows the
> board to run at a faster clock rate. Is this right?
>
> My question is:
>
> In the bit-serial implementation, you still need to perform the shift
> operation on the entire word to select the right bit to send to the
> bit-serial adder/subtractor, so how does this solve the problem of a
> slow clock rate due to the shift operation?
>
> Thanks,
>
> Mees
>


John_H
5/18/2005 7:43:41 PM
m_oylulan@hotmail.com wrote:

>Hello,
>
>I'm trying to write a CORDIC macro for a polar transmitter FPGA design.
>I've used the parallel approach, but when I do a timing estimation the
>longest delay path is through the CORDIC routine, and limits the
>maximum clock rate of the whole design to about 40MHz. Other parts of
>the design on the same board need to run at a much faster rate, so I'm
>considering using a bit-serial version.
>
>As I understand it, the bit-parallel implementation has low latency and
>therefore high throughput, but because of the word-wide shifts it
>clocks at a slower rate. Conversely, the bit-serial
>routine has a high latency and low throughput rate, but allows the
>board to run at a faster clock rate. Is this right?
>
>My question is:
>
>In the bit-serial implementation, you still need to perform the shift
>operation on the entire word to select the right bit to send to the
>bit-serial adder/subtractor, so how does this solve the problem of a
>slow clock rate due to the shift operation?
>
>Thanks,
>
>Mees
>
>  
>
Is this an iterative or an unrolled design?  I am assuming it is iterative,
in which case you have a rather nasty shifter to deal with, which is
killing your performance assuming one clock per iteration.  You can
pipeline the iterations to allow more than one iteration result at a
time in the loop, but it requires a bit of careful bookkeeping in the
design.  For a bit-serial implementation, the shift is accomplished by
varying the delay, which, if implemented in memories, involves messing
with the address to reduce the overhead for the shifter.
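
To make the "shift by varying the delay" idea concrete, here is a small
behavioural sketch in Python, not a hardware description. It assumes the
operands sit in a bit memory LSB first and that one bit is processed per
clock; the helper names to_bits, from_bits, and serial_add_shifted are
hypothetical. The right shift is nothing more than an offset on the read
address, with the sign bit held once the address runs past the MSB.

```python
def to_bits(value, width):
    """Two's-complement bits of value, LSB first (as stored in the bit memory)."""
    return [(value >> n) & 1 for n in range(width)]

def from_bits(bits):
    """Interpret an LSB-first bit list as a signed two's-complement integer."""
    width = len(bits)
    value = sum(b << n for n, b in enumerate(bits))
    return value - (1 << width) if bits[-1] else value

def serial_add_shifted(x_bits, y_bits, shift, subtract=False):
    """Bit-serial x +/- (y >> shift), producing one result bit per simulated clock."""
    width = len(x_bits)
    invert = 1 if subtract else 0
    carry = invert                      # subtract = invert y and add 1
    out = []
    for clk in range(width):
        # The shift is only a read-address offset; past the MSB, hold the sign bit.
        addr = min(clk + shift, width - 1)
        total = x_bits[clk] + (y_bits[addr] ^ invert) + carry
        out.append(total & 1)
        carry = total >> 1
    return out

x, y = to_bits(100, 16), to_bits(-37, 16)
print(from_bits(serial_add_shifted(x, y, 2)))                 # 100 + (-37 >> 2)
print(from_bits(serial_add_shifted(x, y, 2, subtract=True)))  # 100 - (-37 >> 2)
```

No word-wide multiplexer appears anywhere in the loop: each clock touches one
bit of each operand, which is why the shift does not limit the clock rate the
way a barrel shifter does in the parallel iterative design.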

-- 
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com  
http://www.andraka.com  

 "They that give up essential liberty to obtain a little 
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759


Ray
5/19/2005 3:45:58 PM
I don't know if anyone is still reading this thread, but could I ask a
couple more questions?

I am using (or trying to use) the iterative CORDIC algorithm written in
software. I've read Ray Andraka's paper on designing a bit serial
processor, in which he writes that when considering whether or not to
use a bit-serial design:

"...the application for the processor must be able to tolerate any
pipeline delay introduced by the serial processor. The latency in a
parallel system is frequently as high or higher than the equivalent
serial system so this is rarely a concern."

I find this statement confusing.  I thought that the advantage of the
bit-parallel was that it has a much lower latency = number of
iterations, while the bit-serial has a latency = word width * number of
iterations. So why is the "latency in a parallel system as high or
higher?"

Thank you,
Mees

m_oylulan
6/7/2005 5:02:23 PM
m_oylulan@hotmail.com wrote:

>I find this statement confusing.  I thought that the advantage of the
>bit-parallel was that it has a much lower latency = number of
>iterations, while the bit-serial has a latency = word width * number of
>iterations. So why is the "latency in a parallel system as high or
>higher?"
>
>Thank you,
>Mees
>
>  
>
At a given clock frequency, it is true that the bit-parallel design will have
a lower latency (that should be obvious); however, a totally bit-serial
design can generally be clocked faster than an equivalent bit-parallel
design.  In certain pipelined bit-serial designs, you can also begin the
next stage before the previous one is completed, hiding some of the
latency, so the overall latency is only a little longer than the bit-parallel
latency.  Unfortunately, CORDIC is not one of those, because you
need the sign (last bit generated) of one stage before you start the
processing for the next stage.  Nevertheless, at the time that paper was
written, a bit-serial design in the then-current FPGAs could be clocked
much faster than a bit-parallel arithmetic design in the same part, so
while the number of clocks of latency was greater, the higher clock
frequency made up for much of that latency in terms of absolute time.
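
The trade-off between clock count and absolute time can be put into numbers
with a short Python sketch. It uses the latency model from the question
(iterations for bit-parallel, word width * iterations for bit-serial, with no
overlap between stages). The 40 MHz figure is the one reported earlier in the
thread; the 16 iterations, 16-bit word, and 200 MHz serial clock are assumed
values for illustration only, not measurements.

```python
def latency_ns(clocks, f_clk_mhz):
    """Absolute latency in nanoseconds for a given clock count and clock rate."""
    return clocks * 1000.0 / f_clk_mhz

def parallel_latency_ns(iterations, f_clk_mhz):
    """Bit-parallel iterative CORDIC: one clock per iteration."""
    return latency_ns(iterations, f_clk_mhz)

def serial_latency_ns(iterations, word_width, f_clk_mhz):
    """Bit-serial CORDIC: word_width clocks per iteration, stages not overlapped."""
    return latency_ns(iterations * word_width, f_clk_mhz)

ITERATIONS = 16      # assumed
WORD_WIDTH = 16      # assumed
F_PARALLEL = 40.0    # MHz, from the original post
F_SERIAL = 200.0     # MHz, assumed for illustration

par = parallel_latency_ns(ITERATIONS, F_PARALLEL)
ser = serial_latency_ns(ITERATIONS, WORD_WIDTH, F_SERIAL)
print(par)        # 400.0
print(ser)        # 1280.0
print(ser / par)  # 3.2
```

Under these assumptions the serial design takes 16 times as many clocks but
only 3.2 times as long in absolute time, which is the sense in which the
higher clock frequency recovers much of the latency.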

-- 
--Ray Andraka, P.E.


Ray
6/10/2005 1:01:36 AM