Hi,
I have a query regarding RAM usage in FPGAs. Please help me
understand it.
What's the rule of thumb for choosing block RAM vs. distributed RAM?
When should each be preferred, and why?
Thanks
Ashish
ashish.shringarpure (8), 6/8/2006 4:41:19 AM
On 7 Jun 2006 21:41:19 -0700, "Ashish" <ashish.shringarpure@gmail.com>
wrote:
>What's the rule of thumb for choosing block RAM vs. distributed RAM?
>
>When should each be preferred, and why?
- If it must be dual-ported with a different clock on each port, use
block RAM.
- If it is large (more than 1 kbit), use block RAM.
- If you have unused block RAM, use it.
...
Use distributed RAM only when you can't use block RAM.
Zara
BTW, these are guidelines. Like every guideline, they may be followed,
or not. It is your choice, and you may decide differently if factors
such as timing or clock distribution indicate you should.
Zara, 6/8/2006 5:22:41 AM
With each Virtex FPGA you get a certain number of BlockRAMs. Until you
have used all of them, they are incrementally free, so use them, even
if it may look wasteful (18 K or 36 K bit capacity).
Nice features include a programmable width/depth ratio, dual-port
operation, a read-before-write option, etc.
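To make the "programmable width/depth ratio" concrete, here is a small sketch that lists the depth available at each data width of an 18 Kbit block RAM. It assumes the Virtex-II/Virtex-4 style organisation, where the x1/x2/x4 widths use only the 16 Kbit data array and the x9/x18/x36 widths also use the 2 Kbit parity array. The function name is hypothetical, and other families differ: Spartan-II blocks, for example, hold 4096 bits (4096x1 up to 256x16).

```python
def aspect_ratios_18kbit() -> dict:
    """Depth for each data width of an assumed Virtex-II/Virtex-4 style 18 Kbit block RAM."""
    table = {}
    for width in (1, 2, 4, 9, 18, 36):
        # x1/x2/x4 use the 16 Kbit data array only;
        # x9/x18/x36 also use the 2 Kbit parity array.
        usable_bits = 16384 if width < 9 else 18432
        table[width] = usable_bits // width
    return table

for width, depth in aspect_ratios_18kbit().items():
    print(f"{depth} x {width}")
```

The same physical block therefore serves as a deep, narrow memory or a shallow, wide one without wasting capacity.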
You can even do one read-modify-write operation per clock cycle (using
both ports).
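A minimal cycle-level sketch of that read-modify-write trick, assuming one port only reads and the other only writes: on each clock edge port A reads the next address while port B writes back the incremented value that port A read on the previous edge, so after a one-cycle fill the pipeline completes one read-modify-write per clock. The class and function names are hypothetical. Real block RAMs have device-specific rules when both ports hit the same address on the same edge, so this sketch assumes consecutive addresses differ and never exercises that case.

```python
class DualPortBlockRam:
    """Cycle model: port A does synchronous reads, port B does writes."""
    def __init__(self, depth: int):
        self.mem = [0] * depth
        self.dout_a = 0  # port A output register

    def clock(self, read_addr=None, write_addr=None, write_data=None):
        # Both ports act on the same edge; port A reads from a different
        # address than port B writes (collisions assumed not to occur).
        if read_addr is not None:
            self.dout_a = self.mem[read_addr]
        if write_addr is not None:
            self.mem[write_addr] = write_data

def increment_all(ram: DualPortBlockRam, addresses) -> int:
    """Add 1 at each address, pipelined; returns the number of clock edges used."""
    pending = None  # address read on the previous edge, not yet written back
    edges = 0
    for addr in list(addresses) + [None]:  # trailing None drains the pipeline
        if pending is None:
            ram.clock(read_addr=addr)
        else:
            # dout_a still holds the word read for `pending` on the last edge
            ram.clock(read_addr=addr, write_addr=pending, write_data=ram.dout_a + 1)
        pending = addr
        edges += 1
    return edges

ram = DualPortBlockRam(8)
ram.mem[2] = 5
print(increment_all(ram, [0, 1, 2, 3]), ram.mem[:4])
```

Four updates take five edges here; for long address streams the cost approaches one clock per read-modify-write.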
One caveat: reading is a synchronous operation; data appears only after
a clock edge.
Distributed RAM uses 16-bit LUTs and reads combinatorially, so it is
faster, but it requires more design effort.
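The difference between the two read styles can be sketched as a simple behavioural model. The class names are hypothetical, and the block RAM output register is assumed to power up at 0. The block RAM output changes only when a clock edge is applied, while the distributed RAM output follows the address immediately.

```python
class BlockRamModel:
    """Synchronous read: the output register updates only on a clock edge."""
    def __init__(self, contents):
        self.mem = list(contents)
        self.dout = 0  # output register, assumed to power up at 0

    def clock_edge(self, addr):
        self.dout = self.mem[addr]

class DistributedRamModel:
    """Asynchronous read: the output follows the address without a clock."""
    def __init__(self, contents):
        self.mem = list(contents)

    def read(self, addr):
        return self.mem[addr]

contents = [10, 20, 30, 40]
bram, dram = BlockRamModel(contents), DistributedRamModel(contents)
print(dram.read(3))  # 40, available in the same cycle
print(bram.dout)     # 0, nothing has been clocked in yet
bram.clock_edge(3)
print(bram.dout)     # 40, only after the clock edge
```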
Peter Alfke, Xilinx Applications
Peter, 6/8/2006 5:38:31 AM
Peter Alfke wrote:
> Distributed RAM uses 16-bit LUTs and reads in a combinatorial way, and
> is faster, but requires more design effort.
>
This is my main question. How does speed matter here? What is the rule
of thumb from the point of view of operating speed?
I have some logic which uses counters, some configuration registers and
a FIFO.
Fortunately, the same clock is used for all of them.
What would be the best choice for implementing these: BRAM or
distributed RAM?
Please advise.
I am using a Spartan-II device.
Thanks
Ashish
Ashish, 6/8/2006 6:51:14 AM
From my experience, there will be a large routing delay if you use
block RAMs, and it is very difficult to avoid if your block RAMs are
scattered across the chip. I tried to implement a 16-port RAM design
suggested earlier in this group, but even with a lot of effort I could
only get it down to 10 ns of delay.
Sumesh V S
vssumesh, 6/8/2006 9:18:25 AM
On Wed, 07 Jun 2006 22:38:31 -0700, Peter Alfke wrote:
> Distributed RAM uses 16-bit LUTs and reads in a combinatorial way, and
> is faster, but requires more design effort.
The big advantage of distributed RAM is that it's everywhere on the chip,
which means that PAR can put it near the logic that it's driving.
Routing delays are the dominant factor in determining the overall
performance of a design. When I look at the worst-case paths in .twr
reports, I generally see that 75% of the delay is due to interconnect, so
anything that can be done to simplify PAR's placement job is helpful.
Sixteen-word FIFOs are deep enough for many applications; for example, I
generally use them to couple different stages of my pipelines together.
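A behavioural sketch of such a 16-word, single-clock FIFO, assuming it is built around a 16-entry RAM (one 16x1 LUT RAM per data bit) with wrapping read and write pointers and an occupancy count. Because distributed RAM reads asynchronously, the word at the read pointer can be looked at without waiting for a clock edge. The class name and the policy of ignoring pushes when full and pops when empty are assumptions of this sketch, not a description of any particular core.

```python
class Fifo16:
    """Hypothetical 16-word single-clock FIFO around a 16-entry RAM."""
    DEPTH = 16

    def __init__(self):
        self.mem = [0] * self.DEPTH
        self.wr = 0      # write pointer
        self.rd = 0      # read pointer
        self.count = 0   # words currently stored

    @property
    def empty(self) -> bool:
        return self.count == 0

    @property
    def full(self) -> bool:
        return self.count == self.DEPTH

    def peek(self) -> int:
        # Asynchronous read: the head word is visible combinationally.
        return self.mem[self.rd]

    def clock(self, push=False, din=0, pop=False):
        do_push = push and not self.full   # pushes into a full FIFO are dropped
        do_pop = pop and not self.empty    # pops from an empty FIFO are ignored
        if do_push:
            self.mem[self.wr] = din
            self.wr = (self.wr + 1) % self.DEPTH
        if do_pop:
            self.rd = (self.rd + 1) % self.DEPTH
        self.count += int(do_push) - int(do_pop)

fifo = Fifo16()
for word in range(16):
    fifo.clock(push=True, din=word)
print(fifo.full, fifo.peek())
```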
The other advantage of distributed RAM is lower latency. The read port on
distributed RAM is asynchronous, which saves a minimum of one clock cycle.
In high-speed designs it's frequently necessary to double-pipeline the
output of a block RAM, i.e. use the output register that's included in the
V4 block RAM plus an additional D flip-flop register to handle the
interconnect delay, so the difference between distributed RAM and block RAM
can be as much as three cycles (although I would generally pipeline the
output of the distributed RAM in those applications too, so the difference
is two cycles).
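The cycle counting in the paragraph above can be written out as a small helper. It is a simplified model under the assumptions stated in the post: an asynchronous distributed RAM read costs 0 cycles, a block RAM read costs 1, the optional Virtex-4 block RAM output register adds 1, and each extra fabric flip-flop adds 1. The function name and parameters are hypothetical.

```python
def read_latency(kind: str, bram_output_reg: bool = False, fabric_regs: int = 0) -> int:
    """Clock edges between presenting a read address and having the data
    in the last register of the chain (simplified cycle count)."""
    if kind == "distributed":
        if bram_output_reg:
            raise ValueError("distributed RAM has no block RAM output register")
        cycles = 0  # asynchronous read port: data follows the address combinationally
    elif kind == "block":
        cycles = 1  # synchronous read: data appears after the first clock edge
        if bram_output_reg:
            cycles += 1  # optional output register inside a Virtex-4 block RAM
    else:
        raise ValueError(f"unknown RAM kind: {kind!r}")
    return cycles + fabric_regs  # each extra fabric flip-flop adds one cycle

block = read_latency("block", bram_output_reg=True, fabric_regs=1)
print(block - read_latency("distributed"))                 # 3
print(block - read_latency("distributed", fabric_regs=1))  # 2
```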
In my experience it has always been pretty obvious which type of memory to
use. Either you need a lot, in which case you use block RAM, or you need a
little, in which case you use distributed RAM. I've never run into a
situation where I needed something intermediate: either 16 (or, rarely, 32)
words is enough, or I need 1K or more.
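A rough cost comparison helps explain why the choice is usually obvious. This sketch assumes Spartan-II figures (the device mentioned earlier in the thread): 4096-bit block RAMs configurable as 4096x1, 2048x2, 1024x4, 512x8 or 256x16, and distributed RAM built from 16x1 LUTs. The LUT count covers storage only and ignores the read-multiplexer logic needed for depths above 16; the function names are hypothetical.

```python
import math

# Assumed Spartan-II figures: (depth, width) shapes of a 4096-bit block RAM.
BLOCK_SHAPES = [(4096, 1), (2048, 2), (1024, 4), (512, 8), (256, 16)]
LUT_DEPTH = 16  # one LUT stores 16 words of 1 bit

def luts_for_distributed(depth: int, width: int) -> int:
    """16x1 LUTs for the storage bits alone (read-mux logic not counted)."""
    return math.ceil(depth / LUT_DEPTH) * width

def block_rams_needed(depth: int, width: int) -> int:
    """Fewest blocks when tiling the memory with a single block shape."""
    return min(math.ceil(depth / d) * math.ceil(width / w) for d, w in BLOCK_SHAPES)

for depth, width in [(16, 8), (32, 8), (256, 16), (1024, 16)]:
    print(f"{depth}x{width}: {luts_for_distributed(depth, width)} LUTs "
          f"or {block_rams_needed(depth, width)} block RAM(s)")
```

A 16-word memory costs one LUT per bit of width, while a 256x16 memory would cost 256 LUTs but fits in a single block RAM, which matches the small-or-large split described above.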
Josh, 6/8/2006 12:38:39 PM
I think everybody asking for advice should always mention the intended
clock rate.
A 5 ns delay is horribly slow in some applications, but blindingly fast
in others.
Most experienced designers are accustomed to pushing the envelope, but
many novices might operate below 50 MHz. My first question is always:
How fast is your design?
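The arithmetic behind "5 ns is slow or fast depending on the clock" is just period versus path delay. This simplified sketch treats the whole path delay as the timing budget, so it ignores clock skew and jitter; the helper names are hypothetical.

```python
def period_ns(f_mhz: float) -> float:
    """Clock period in nanoseconds for a clock frequency in MHz."""
    return 1000.0 / f_mhz

def slack_ns(path_delay_ns: float, f_mhz: float) -> float:
    """Positive slack: the path fits in one clock period; negative: it does not."""
    return period_ns(f_mhz) - path_delay_ns

for f in (50, 100, 200, 400):
    print(f"{f:>3} MHz: period {period_ns(f):.1f} ns, "
          f"slack for a 5 ns path {slack_ns(5, f):+.1f} ns")
```

At 50 MHz a 5 ns path leaves 15 ns of margin; at 400 MHz it misses by 2.5 ns. By the same arithmetic, the 10 ns delay reported earlier in the thread corresponds to roughly 100 MHz.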
Peter Alfke, Xilinx
=================
Josh Rosen wrote:
> Routing delays are the dominant factor in determining the overall
> performance of a design.
Peter, 6/8/2006 5:12:35 PM