Block RAM vs. Distributed RAM

Hi,

I have a query regarding RAM usage in FPGAs. Please help me
understand it.

What's the rule of thumb for choosing block RAM vs. distributed RAM?

When should each be preferred, and why?


Thanks


Ashish

ashish.shringarpure, 6/8/2006 4:41:19 AM

On 7 Jun 2006 21:41:19 -0700, "Ashish" <ashish.shringarpure@gmail.com>
wrote:

>What's the rule of thumb for choosing block RAM vs. distributed RAM?
>
>When should each be preferred, and why?

- If it must be dual-ported with a different clock on each port, use
block RAM.
- If it is large (more than 1 kbit), use block RAM.
- If you have unused block RAM, use it.
- ...

Use distributed RAM only when you can't use block RAM.
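As a minimal sketch, the guidelines above can be written down as a decision function. The name `choose_ram`, its parameters, and checking the rules in the order listed are assumptions for illustration; the 1 kbit threshold is the one from the list. It models a design-time judgment, not any vendor tool.

```python
def choose_ram(depth: int, width: int, dual_clock: bool, free_block_rams: int) -> str:
    """Hypothetical helper encoding the rules of thumb above.

    depth, width    -- memory size in words and bits per word
    dual_clock      -- True if the two ports run on different clocks
    free_block_rams -- block RAMs still unused in the device
    """
    bits = depth * width
    if dual_clock:
        # Independent clock per port: use block RAM.
        return "block"
    if bits > 1024:
        # Large memory (more than 1 kbit): use block RAM.
        return "block"
    if free_block_rams > 0:
        # Unused block RAM is already paid for, so use it.
        return "block"
    # Fall back to distributed RAM only when block RAM is not an option.
    return "distributed"


print(choose_ram(16, 8, dual_clock=False, free_block_rams=0))   # distributed
print(choose_ram(512, 8, dual_clock=False, free_block_rams=0))  # block
```

These are guidelines rather than hard rules, so the function deliberately ignores timing and placement, which can override any of them.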

Zara


BTW, these are guidelines. Like every guideline, they may be followed, or
not. It is your choice, and you may decide differently if considerations
such as timing or clock distribution indicate you should.

Zara, 6/8/2006 5:22:41 AM


With each Virtex FPGA you get a certain number of BlockRAMs. Until you
have used all of them, they are incrementally free, so use them, even
if it may look wasteful (18 Kbit or 36 Kbit capacity).
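To put "wasteful" in numbers, here is a rough sketch of how much of one 18 Kbit block (18 x 1024 bits, counting parity) a small memory occupies. It assumes the memory fits in a single block and ignores the fixed width/depth aspect ratios real block RAMs impose; `block_utilization` is a hypothetical helper.

```python
BLOCK_BITS_18K = 18 * 1024  # 18 Kbit block RAM (data + parity bits)


def block_utilization(depth: int, width: int, block_bits: int = BLOCK_BITS_18K) -> float:
    """Fraction of one block RAM used by a depth x width memory (rough estimate)."""
    return (depth * width) / block_bits


for depth, width in [(16, 8), (256, 8), (1024, 16)]:
    print(f"{depth:5d} x {width:2d}: {block_utilization(depth, width):6.1%}")
```

A 16 x 8 memory uses well under 1% of the block, but as long as blocks are left over, the unused part costs nothing extra.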

Nice features include a programmable width/depth ratio, dual-porting, a
read-before-write option, etc.
You can even do one read-modify-write operation per clock cycle (using
both ports).

One caveat: Reading is a synchronous operation, it only occurs after a
clock edge.
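A behavioral sketch of what a synchronous read means, written in Python rather than HDL. The class `BlockRamPort` is hypothetical and models a single port only: the address presented at a clock edge determines what appears on `dout` after that edge, and a write is modeled in read-before-write mode, so `dout` shows the old contents of the written address.

```python
class BlockRamPort:
    """Hypothetical model of one synchronous block RAM port (read-before-write)."""

    def __init__(self, depth: int):
        self.mem = [0] * depth
        self.dout = 0  # output register, updated only on clock edges

    def clock(self, addr: int, we: bool = False, din: int = 0) -> int:
        """One rising clock edge: latch the old contents, then perform any write."""
        self.dout = self.mem[addr]  # read-before-write: old value goes to dout
        if we:
            self.mem[addr] = din
        return self.dout


ram = BlockRamPort(16)
ram.clock(3, we=True, din=42)  # write; dout shows the old contents of address 3
print(ram.clock(3))            # 42: the read result appears only after the edge
```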

Distributed RAM uses 16-bit LUTs and reads in a combinatorial way, and
is faster, but requires more design effort.
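Since each 16-bit LUT can act as a 16 x 1 RAM, a rough resource count and a model of the combinational read might look like the following. The names `dist_ram_luts` and `DistributedRam` are hypothetical, and the count deliberately ignores the read multiplexers and write-enable decoding that memories deeper than 16 words need.

```python
import math

LUT_BITS = 16  # one 4-input LUT used as a 16 x 1 RAM


def dist_ram_luts(depth: int, width: int) -> int:
    """Rough count of data-holding LUTs for a depth x width distributed RAM."""
    return math.ceil(depth / LUT_BITS) * width


class DistributedRam:
    """Hypothetical model: synchronous write, asynchronous (combinational) read."""

    def __init__(self, depth: int):
        self.mem = [0] * depth

    def read(self, addr: int) -> int:
        # No clock edge needed: data is available in the same cycle.
        return self.mem[addr]

    def clock(self, addr: int, we: bool, din: int) -> None:
        # Writes still happen on a clock edge.
        if we:
            self.mem[addr] = din


print(dist_ram_luts(16, 8))  # 8
print(dist_ram_luts(64, 8))  # 32
r = DistributedRam(16)
r.clock(5, we=True, din=7)
print(r.read(5))             # 7
```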

Peter Alfke, Xilinx Applications

Peter, 6/8/2006 5:38:31 AM

Peter Alfke wrote:
> Distributed RAM uses 16-bit LUTs and reads in a combinatorial way, and
> is faster, but requires more design effort.
>
This is my main query. How does speed matter here? What is the rule of
thumb from an operating-speed point of view?

I have some logic that uses counters, some configuration registers and
a FIFO.
Fortunately, the same clock is used for all of them.

What would be the best choice for implementing these: BRAM or distributed
RAM?
Please advise.

I am using a Spartan-II device.




Thanks 


Ashish

Ashish, 6/8/2006 6:51:14 AM

From my experience, there will be a large routing delay if you use block
RAMs, and that is very difficult to avoid if your block RAMs are
scattered across the chip. I tried to implement a 16-port RAM design
suggested earlier in this group, but even with a lot of effort I could only
get it down to a 10 ns delay.
Sumesh V S

vssumesh, 6/8/2006 9:18:25 AM

On Wed, 07 Jun 2006 22:38:31 -0700, Peter Alfke wrote:

> Distributed RAM uses 16-bit LUTs and reads in a combinatorial way, and
> is faster, but requires more design effort.


The big advantage of distributed RAM is that it's everywhere on the chip
which means that PAR can put it near the logic that it's driving.
Routing delays are the dominant factor in determining the overall
performance of a design. When I look at the worst case paths in .twr
reports I generally see that 75% of the delay is due to interconnect so
anything that can be done to simplify PAR's placement job is helpful.
Sixteen-word FIFOs are deep enough for many applications; for example, I
generally use them to couple different stages of my pipelines together.
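A behavioral sketch of such a sixteen-word FIFO, assuming one 16 x 1 LUT RAM per data bit so the pointers are 4 bits wide and wrap at 16. The class `SmallFifo` is hypothetical, and it tracks occupancy with a simple count; a hardware FIFO would more often compare pointers (for example with an extra wrap bit).

```python
class SmallFifo:
    """Hypothetical 16-word FIFO model with wrapping 4-bit read/write pointers."""

    DEPTH = 16

    def __init__(self):
        self.mem = [0] * self.DEPTH
        self.wr_ptr = 0
        self.rd_ptr = 0
        self.count = 0

    def full(self) -> bool:
        return self.count == self.DEPTH

    def empty(self) -> bool:
        return self.count == 0

    def push(self, data: int) -> None:
        if self.full():
            raise OverflowError("FIFO full")
        self.mem[self.wr_ptr] = data
        self.wr_ptr = (self.wr_ptr + 1) % self.DEPTH  # 4-bit pointer wraps at 16
        self.count += 1

    def peek(self) -> int:
        # The head word is visible without a clock edge, as with an async read.
        if self.empty():
            raise IndexError("FIFO empty")
        return self.mem[self.rd_ptr]

    def pop(self) -> int:
        data = self.peek()
        self.rd_ptr = (self.rd_ptr + 1) % self.DEPTH
        self.count -= 1
        return data
```

Between two pipeline stages, the producer pushes while `full()` is false and the consumer pops while `empty()` is false.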
The other advantage of distributed RAM is lower latency. The read port on
distributed RAM is asynchronous which saves a minimum of one clock cycle.
In high-speed designs it's frequently necessary to double-pipeline the
output of a Block RAM, i.e., use the output register that's included in the
V4 Block RAM plus an additional D flip-flop register to handle the
interconnect delay, so the difference between distributed RAM and Block RAM
can be as much as three cycles (although I would generally pipeline the
output of the distributed RAM in those applications too, so the difference
is two cycles).
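The cycle counting above can be made explicit. This is a minimal sketch; the constant names are hypothetical and each value is the number of clock edges between presenting an address and the data being usable by downstream logic.

```python
# Clock edges between presenting an address and usable data (illustrative names).
BRAM_SYNC_READ = 1   # block RAM read happens at a clock edge
BRAM_OUTPUT_REG = 1  # output register included in the V4 block RAM
FABRIC_REG = 1       # extra D flip-flop to absorb interconnect delay
DIST_READ = 0        # distributed RAM read is asynchronous

bram_latency = BRAM_SYNC_READ + BRAM_OUTPUT_REG + FABRIC_REG
dist_unregistered = DIST_READ
dist_registered = DIST_READ + FABRIC_REG

print(bram_latency - dist_unregistered)  # 3
print(bram_latency - dist_registered)    # 2
```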

In my experience it has always been pretty obvious which type of memory to
use. Either you need a lot, in which case you use Block RAM, or you need a
little, in which case you use distributed RAM. I've never run into a
situation where I needed something intermediate: either 16 or, rarely, 32
words is enough, or I need 1K or more.

Josh, 6/8/2006 12:38:39 PM

I think everybody asking for advice should always mention the intended
clock rate.
A 5 ns delay is horribly slow in some applications, but blindingly fast
in others.
Most experienced designers are accustomed to pushing the envelope, but
many novices might operate below 50 MHz. My first question is always:
How fast is your design?
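The same 5 ns can be judged against a few clock rates with simple arithmetic. This sketch treats the delay as the whole register-to-register path and ignores setup time, clock skew and margin; `period_ns` and `fits` are hypothetical helpers.

```python
def period_ns(f_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / f_mhz


def fits(delay_ns: float, f_mhz: float) -> bool:
    """True if a path delay fits in one clock period (no setup time or margin)."""
    return delay_ns <= period_ns(f_mhz)


for f in (50, 100, 200, 300):
    print(f"{f} MHz: period {period_ns(f):.2f} ns, 5 ns path fits: {fits(5.0, f)}")
```

At 50 MHz the 5 ns path uses a quarter of the 20 ns period; at 300 MHz (about 3.33 ns) it does not fit. By the same arithmetic, Sumesh's 10 ns corresponds to roughly 100 MHz.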

Peter Alfke, Xilinx

Peter, 6/8/2006 5:12:35 PM
