Efficient Multi-Ported Memories for FPGAs
Eric LaForest, Greg Steffan
University of Toronto, Computer Engineering Research Group
February 22, 2010
Parallelism in FPGAs
− Larger SoCs on FPGAs → Parallel Systems
− Parallel systems on FPGAs will need:
  − Queueing
  − Data sharing
  − Communication
  − Synchronization
− Boils down to:
  − FIFOs
  − Register files
− We can do all these with multi-ported memories
Multi-Ported Memory
− Existing workarounds are ad hoc, “roll-your-own”, and have limited parallelism.
Conventional Approaches
2W/2R Multi-Ported Memory
− Doesn't exist on FPGAs
− Altera used to have one (Mercury)
Stratix III Building Blocks
− Adaptive Logic Modules (registers, LUTs, adders): flexible, but slow
− Block RAMs: fast, but inflexible
  − M9K (e.g. 32 x 256)
  − M144K (e.g. 32 x 4096)
2W/2R Pure-ALM
− Scales very poorly with memory depth
1W/nR Replication
− Only one write port
− Multiple read ports
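Replication can be illustrated with a small behavioural model. This is only a Python sketch of the idea, not RTL and not the authors' implementation; the class name `ReplicatedMemory` and its methods are hypothetical. It assumes one copy of the memory per read port, with the single write port updating every copy.

```python
class ReplicatedMemory:
    """Behavioural sketch of a 1W/nR memory built by replication (hypothetical names)."""

    def __init__(self, depth, num_read_ports):
        # One full copy of the memory per read port.
        self.replicas = [[0] * depth for _ in range(num_read_ports)]

    def write(self, addr, value):
        # The single write port updates every replica, keeping them identical.
        for replica in self.replicas:
            replica[addr] = value

    def read(self, port, addr):
        # Each read port reads only from its own replica.
        return self.replicas[port][addr]


mem = ReplicatedMemory(depth=4, num_read_ports=2)
mem.write(1, 42)
values = [mem.read(0, 1), mem.read(1, 1)]  # both read ports see the same data
```

The storage cost grows with the number of read ports, and there is still only one write port.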
mW/nR Banking
− Multiple write ports
− Fragmented data
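The fragmentation problem can be seen in a small behavioural model. This Python sketch assumes that each port is tied to its own bank, which is one reading of the slide; the class name `BankedMemory` is hypothetical.

```python
class BankedMemory:
    """Behavioural sketch of banking: one independent bank per port (assumption)."""

    def __init__(self, depth_per_bank, num_banks):
        self.banks = [[0] * depth_per_bank for _ in range(num_banks)]

    def write(self, port, addr, value):
        # A write port can only reach its own bank.
        self.banks[port][addr] = value

    def read(self, port, addr):
        # A read port can only reach its own bank as well.
        return self.banks[port][addr]


mem = BankedMemory(depth_per_bank=4, num_banks=2)
mem.write(0, 1, 42)           # written through port 0
same_bank = mem.read(0, 1)    # visible in bank 0
other_bank = mem.read(1, 1)   # not visible in bank 1: the data is fragmented
```

All ports work in parallel, but a value written through one port is invisible to the others.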
mW/nR “Multipumping”
− Multiple read/write ports
− Divides clock speed
− No fragmentation
− Read/write ordering
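A rough behavioural model shows why multipumping divides the clock speed and why ordering matters. This Python sketch makes two simplifying assumptions that are not taken from the slides: the underlying store performs one access per internal cycle, and all reads in an external cycle happen before the writes. The names `MultipumpedMemory` and `external_clock_mhz` are hypothetical.

```python
class MultipumpedMemory:
    """Behavioural sketch: one shared store, time-multiplexed across ports."""

    def __init__(self, depth, num_write_ports, num_read_ports):
        self.mem = [0] * depth
        self.num_write_ports = num_write_ports
        self.num_read_ports = num_read_ports

    @property
    def pump_factor(self):
        # Assumption: one access per internal cycle, so one internal cycle per port.
        return self.num_write_ports + self.num_read_ports

    def external_cycle(self, reads, writes):
        """reads: list of addresses; writes: list of (address, value) pairs."""
        assert len(reads) <= self.num_read_ports
        assert len(writes) <= self.num_write_ports
        # Assumed ordering: reads first, so they return the pre-write values.
        values = [self.mem[addr] for addr in reads]
        # Writes then happen in port order; a later port wins on a conflict.
        for addr, value in writes:
            self.mem[addr] = value
        return values


def external_clock_mhz(internal_mhz, pump_factor):
    # The externally visible clock is the internal clock divided by the pump factor.
    return internal_mhz / pump_factor


mem = MultipumpedMemory(depth=4, num_write_ports=2, num_read_ports=2)
mem.external_cycle(reads=[], writes=[(1, 42)])
old_values = mem.external_cycle(reads=[1, 1], writes=[(1, 7)])  # reads see 42, not 7
```

Because every port shares one store, there is no fragmentation, but the order in which reads and writes are sequenced within an external cycle changes what a read returns.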
Block RAMs: Simple Dual Port
− One write port, one read port
Block RAMs: True Dual Port
− Two read/write (R/W) ports
“Pure Multipumping”
− Read as banked memory (multiple reads)
“Pure Multipumping”
− Write as replicated memory (avoids fragmentation)
Methodology
− Generate design variations over space
  − Vary # of ports, depth, type of memories
    − 1W/2R to 8W/16R
    − 2 to 256 elements deep
    − Pure-ALM, M9K, MLAB, Multipumped
  − Wrap in testbench for timing and correctness
− Target Quartus 9.0 to Stratix III
  − No synthesis optimizations for speed or area
  − Standard P&R effort (speed, avg. over 10 runs)
− Measure area as Total Equivalent Area
  − Expresses area in a single unit (ALMs)
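The design-space sweep could be enumerated along the following lines. This Python sketch is an illustration only: the slides give the end points (1W/2R to 8W/16R, 2 to 256 elements deep), so the intermediate port configurations and the power-of-two depths are assumptions, as is the idea that every memory type is built for every point. The function names are hypothetical.

```python
from itertools import product

# (write ports, read ports); the intermediate steps are an assumption.
PORT_CONFIGS = [(1, 2), (2, 4), (4, 8), (8, 16)]
# Power-of-two depths from 2 to 256 (assumption).
DEPTHS = [2 ** k for k in range(1, 9)]
# Memory types named on the slide.
MEMORY_TYPES = ["Pure-ALM", "M9K", "MLAB", "Multipumped"]


def design_points():
    """Yield one dictionary per design variation in the assumed sweep."""
    for (w, r), depth, mem_type in product(PORT_CONFIGS, DEPTHS, MEMORY_TYPES):
        yield {"write_ports": w, "read_ports": r, "depth": depth, "type": mem_type}


def average_fmax(fmax_runs_mhz):
    """Average the speed reported by several place-and-route runs."""
    return sum(fmax_runs_mhz) / len(fmax_runs_mhz)


points = list(design_points())
```

Each point would then be wrapped in a testbench and compiled; the speed for a point is the average over its place-and-route runs.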
Conventional Multi-Porting Performance
1W/2R Pure-ALM Area vs. Speed
− Too big and slow!
− [Plot annotations: Faster, Smaller, NiosII/f, 290 MHz, 500 ALMs]
1W/2R Replicated vs. Pure-ALM
1W/2R “Pure Multipumping”
LVT-Based Multi-Ported Memories
LVT-Based Memory
− Begin with one block RAM
− Replicate for two read ports
− Bank for two write ports
− Select bank to read from
− Add bank lookup table
Live Value Table Operation
LVT Operation
− 2W/2R, 4-deep
LVT Operation
− [Diagram: write ports W0 and W1 supply write addresses, read ports R0 and R1 supply read addresses, and the Live Value Table has one entry per address, 0 to 3]
LVT Operation: Write
− W0 writes 42 @ 1, so LVT entry 1 becomes 0
− W1 writes 23 @ 3, so LVT entry 3 becomes 1
− Records which write port last updated a location
LVT Operation: Read
− R0 reads @ 3: LVT entry 3 is 1
− R1 reads @ 1: LVT entry 1 is 0
− Steers read port to correct memory bank
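The write and read steps can be put together in a behavioural model. This is a Python sketch, not RTL and not the authors' code: the class name `LVTMemory` is hypothetical, and it assumes one bank per write port, replicated once per read port, as in the build-up slides. It replays the 2W/2R, 4-deep example with 42 @ 1 and 23 @ 3.

```python
class LVTMemory:
    """Behavioural sketch of an mW/nR memory steered by a Live Value Table."""

    def __init__(self, depth, num_write_ports, num_read_ports):
        # banks[w][r] is the replica of write port w's bank that serves read port r.
        self.banks = [
            [[0] * depth for _ in range(num_read_ports)]
            for _ in range(num_write_ports)
        ]
        # One LVT entry per address: which write port last updated it.
        self.lvt = [0] * depth

    def write(self, write_port, addr, value):
        # The write goes to every replica in this write port's own bank...
        for replica in self.banks[write_port]:
            replica[addr] = value
        # ...and the LVT records that this port holds the live value.
        self.lvt[addr] = write_port

    def read(self, read_port, addr):
        # The LVT steers the read port to the bank holding the live value.
        live_bank = self.lvt[addr]
        return self.banks[live_bank][read_port][addr]


mem = LVTMemory(depth=4, num_write_ports=2, num_read_ports=2)
mem.write(0, 1, 42)   # W0 writes 42 @ 1
mem.write(1, 3, 23)   # W1 writes 23 @ 3
r0 = mem.read(0, 3)   # R0 reads @ 3, steered to W1's bank
r1 = mem.read(1, 1)   # R1 reads @ 1, steered to W0's bank
```

If W1 later writes to address 1, the LVT entry flips to 1 and both read ports follow it, so the stale copy in W0's bank is never read.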
LVT Implementation
− LVT remains practical because it is very narrow
LVT Operation
− Small Pure-ALM memory controlling larger block RAMs
Advantages of LVTs
− LVTs add a layer of indirection
  − Everything operates in parallel
  − Makes banked memory behave as consistent unit
− LVTs are narrow
  − Word width = log2(# of write ports), typically < 4 bits
  − Pure-ALM, but practical size and speed
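The narrowness claim is easy to check with a little arithmetic. In this Python sketch the function names are hypothetical, the log2 is rounded up to a whole number of bits, and the data storage estimate assumes one block RAM replica per (write port, read port) pair.

```python
def lvt_word_width(num_write_ports):
    # Bits needed to name one of the write ports: ceil(log2(m)).
    return (num_write_ports - 1).bit_length()


def lvt_storage_bits(depth, num_write_ports):
    # The LVT has one entry per memory location.
    return depth * lvt_word_width(num_write_ports)


def data_storage_bits(depth, word_width, num_write_ports, num_read_ports):
    # Assumption: one replica of the memory per (write port, read port) pair.
    return depth * word_width * num_write_ports * num_read_ports


# Illustrative configuration: 2W/4R, 256 deep, 32-bit words.
lvt_bits = lvt_storage_bits(256, 2)
data_bits = data_storage_bits(256, 32, 2, 4)
```

For this configuration the LVT needs one bit per location, while the block RAMs it controls hold far more, which is why a Pure-ALM LVT stays practical.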
LVT Performance
2W/4R Pure-ALM
2W/4R LVT-based vs. Pure-ALM
− 412 MHz to 375 MHz
− 84% smaller
− 43% faster
2W/4R Multipumping
− Must be careful about read/write ordering!
Multipumping Performance
2W/4R Multipumping
2W/4R Multipumping
− Pure Multipumping (279 MHz)
4W/8R Multipumping
− Worsens as # of ports increases
2W/4R Multipumping
− 54% slower on average
− 28% smaller on average
− 193 MHz to 174 MHz
Conclusions
− LVT-based memories are faster and smaller than Pure-ALM memories.
− LVT-based memories are faster than pure multipumping, but at a cost in area.
− Pure multipumped memories are better for memories with few ports or low speed.
Future Work
− Pure multipumping for LVT-based memories
  − Build banks with 2W/4R pure multipumping blocks
  − Possible further area improvement
− Relaxing the read/write order for multipumping
  − Allows multiplexing the write ports
  − Leaves designer to watch for WAR violations
Thank You