Efficient Multi-Ported Memories for FPGAs (Eric LaForest, Greg Steffan)



SLIDE 1

Efficient Multi-Ported Memories for FPGAs

Eric LaForest, Greg Steffan
University of Toronto
Computer Engineering Research Group
February 22, 2010

SLIDE 2

Parallelism in FPGAs

• Larger SoCs on FPGAs → Parallel Systems
• Parallel systems on FPGAs will need:
  − Queueing
  − Data sharing
  − Communication
  − Synchronization
• Boils down to:
  − FIFOs
  − Register files

We can do all these with multi-ported memories

SLIDE 3

Multi-Ported Memory


Existing workarounds are ad-hoc, “roll-your-own”, and have limited parallelism.

SLIDE 4

Conventional Approaches

SLIDE 5

2W/2R Multi-Ported Memory

Doesn't exist on FPGAs

Altera used to have one (Mercury)

SLIDE 6

Stratix III Building Blocks

Adaptive Logic Modules (registers, LUTs, adders): flexible, but slow

Block RAMs: fast, but inflexible
− M9K (e.g. 32 x 256)
− M144K (e.g. 32 x 4096)

SLIDE 7

2W/2R Pure-ALM

Scales very poorly with memory depth

SLIDE 8

1W/nR Replication

− Multiple read ports
− Only one write port
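As a rough behavioral sketch of replication (not the authors' hardware description), the Python model below keeps one full copy of the data per read port and broadcasts the single write port to every copy. The class name `ReplicatedMemory` and its methods are hypothetical, chosen only for illustration.

```python
class ReplicatedMemory:
    """Behavioral model of a 1W/nR memory built from n identical replicas."""

    def __init__(self, depth, num_read_ports):
        # One full copy of the data per read port.
        self.replicas = [[0] * depth for _ in range(num_read_ports)]

    def write(self, addr, value):
        # The single write port updates every replica so they stay identical.
        for replica in self.replicas:
            replica[addr] = value

    def read(self, port, addr):
        # Each read port owns a private replica, so reads never conflict.
        return self.replicas[port][addr]


mem = ReplicatedMemory(depth=4, num_read_ports=2)
mem.write(1, 42)
values = [mem.read(port, 1) for port in range(2)]
```

Because every replica shares the same write, adding a second write port would require each replica to accept two writes at once, which is why this scheme stops at one write port.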

SLIDE 9

mW/nR Banking

− Multiple write ports
− Fragmented data
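To make "fragmented data" concrete, here is a minimal Python sketch that assumes one bank per write port, which is how the later LVT slides use banking. The names `BankedMemory`, `write`, and `read` are illustrative assumptions, not taken from the slides.

```python
class BankedMemory:
    """Behavioral model of banking: each write port owns a private bank."""

    def __init__(self, depth, num_write_ports):
        # None marks a location that this bank has never been written to.
        self.banks = [[None] * depth for _ in range(num_write_ports)]

    def write(self, port, addr, value):
        # A write lands only in the bank owned by the issuing port.
        self.banks[port][addr] = value

    def read(self, bank, addr):
        # The reader must already know which bank holds the value it wants.
        return self.banks[bank][addr]


mem = BankedMemory(depth=4, num_write_ports=2)
mem.write(0, 1, 42)
mem.write(1, 3, 23)
seen_in_bank0 = mem.read(0, 3)  # bank 0 never saw the write to address 3
seen_in_bank1 = mem.read(1, 3)
```

Both writes proceed in parallel, but the value at a given address now lives in only one bank, so the memory no longer behaves as a single consistent unit.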

SLIDE 10

mW/nR “Multipumping”

Multiple read/write ports No fragmentation Divides clock speed Read/write ordering

SLIDE 11

Block RAMs: Simple Dual Port

Ports: one read, one write

SLIDE 12

Block RAMs: True Dual Port

Ports: two, each read/write (R/W)

SLIDE 13

“Pure Multipumping”

Read as banked memory (multiple reads)

SLIDE 14

“Pure Multipumping”

Write as replicated memory (avoids fragmentation)

SLIDE 15

Methodology

• Generate design variations over space
  − Vary # of ports, depth, type of memories
    • 1W/2R to 8W/16R
    • 2 to 256 elements deep
    • Pure-ALM, M9K, MLAB, Multipumped
  − Wrap in testbench for timing and correctness
• Target Quartus 9.0 to Stratix III
  − No synthesis optimizations for speed or area
  − Standard P&R effort (speed, avg. over 10 runs)
• Measure area as Total Equivalent Area
  − Expresses area in a single unit (ALMs)
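A sweep like the one described above could be enumerated along these lines. The slide gives only the end points of each range, so the intermediate port configurations (doubling, with twice as many reads as writes) and the power-of-two depths are assumptions, as are the function and field names.

```python
from itertools import product

# Assumed sweep values; the slide states only the end points of each range.
PORT_CONFIGS = [(1, 2), (2, 4), (4, 8), (8, 16)]  # (write ports, read ports)
DEPTHS = [2 ** k for k in range(1, 9)]            # 2, 4, ..., 256 elements
MEMORY_TYPES = ["Pure-ALM", "M9K", "MLAB", "Multipumped"]


def design_space():
    """Yield one dictionary per design variation to generate and test."""
    for (writes, reads), depth, mem_type in product(
        PORT_CONFIGS, DEPTHS, MEMORY_TYPES
    ):
        yield {
            "name": f"{writes}W{reads}R_{depth}deep_{mem_type}",
            "writes": writes,
            "reads": reads,
            "depth": depth,
            "type": mem_type,
        }


points = list(design_space())
```

Each entry would then be wrapped in a testbench and compiled; that step depends on the tool flow and is not sketched here.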

SLIDE 16

Conventional Multi-Porting Performance

SLIDE 17

1W/2R Pure-ALM Area vs. Speed

[Plot: area vs. speed. Labels: NiosII/f, 290 MHz, 500 ALMs; axis directions: Smaller, Faster]

Too big and slow!

SLIDE 18

1W/2R Replicated vs. Pure-ALM

SLIDE 19

1W/2R “Pure Multipumping”

SLIDE 20

LVT-Based Multi-Ported Memories

SLIDE 21

LVT-Based Memory

SLIDE 22

LVT-Based Memory

Begin with one block RAM

SLIDE 23

LVT-Based Memory

Replicate for two read ports

SLIDE 24

LVT-Based Memory

Bank for two write ports

SLIDE 25

LVT-Based Memory

Select bank to read from

SLIDE 26

LVT-Based Memory

Add bank lookup table

SLIDE 27

LVT-Based Memory

SLIDE 28

Live Value Table Operation

SLIDE 29

LVT Operation

2W/2R, 4-deep

SLIDE 30

LVT Operation

[Figure labels: write ports W0, W1; read ports R0, R1; Live Value Table; Write Addresses; Read Addresses]

SLIDE 31

LVT Operation: Write

[Figure labels: write ports W0, W1; read ports R0, R1; writes shown: 42 @ 1 and 23 @ 3]

Records which write port last updated a location

SLIDE 32

LVT Operation: Read

[Figure labels: write ports W0, W1; read ports R0, R1; reads shown: @ 1 and @ 3]

Steers read port to correct memory bank
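The write and read behavior on the last two slides can be sketched as a small Python model of a 2W/2R, 4-deep memory. It assumes that W0 issues the write 42 @ 1 and W1 issues 23 @ 3 (the extracted slide text does not say which port issues which), and that two ports never write the same address in the same cycle. The class name `LVTMemory` is hypothetical.

```python
class LVTMemory:
    """Behavioral model of an LVT-based mW/nR memory."""

    def __init__(self, depth, num_write_ports, num_read_ports):
        # banks[w][r] is the replica of write port w's bank used by read port r.
        self.banks = [
            [[0] * depth for _ in range(num_read_ports)]
            for _ in range(num_write_ports)
        ]
        # Live Value Table: for each address, the port that wrote it last.
        self.lvt = [0] * depth

    def write(self, port, addr, value):
        # Update every replica of this port's bank, then record the writer.
        for replica in self.banks[port]:
            replica[addr] = value
        self.lvt[addr] = port

    def read(self, port, addr):
        # The LVT steers the read to the bank holding the live value.
        live_bank = self.lvt[addr]
        return self.banks[live_bank][port][addr]


mem = LVTMemory(depth=4, num_write_ports=2, num_read_ports=2)
mem.write(0, 1, 42)  # assumed: W0 writes 42 @ 1
mem.write(1, 3, 23)  # assumed: W1 writes 23 @ 3
r0 = mem.read(0, 1)
r1 = mem.read(1, 3)
```

After these two writes the table holds `[0, 0, 0, 1]`: address 3 points at W1's bank and every other address at W0's bank, so either read port finds the most recent value at any address.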

SLIDE 33

LVT Implementation

LVT remains practical because it is very narrow

SLIDE 34

LVT Operation

Small Pure-ALM memory controlling larger block RAMs

SLIDE 35

Advantages of LVTs

• LVTs add a layer of indirection
  − Everything operates in parallel
  − Makes banked memory behave as consistent unit
• LVTs are narrow
  − Word width = log2(# of write ports), typically < 4 bits
  − Pure-ALM, but practical size and speed
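A quick sizing sketch shows why the LVT stays small. It rounds log2 up to a whole number of bits for port counts that are not powers of two, and it counts block RAMs as (write ports) x (read ports), which follows from the bank-then-replicate construction on the earlier slides. The function names are illustrative assumptions.

```python
def lvt_width_bits(num_write_ports):
    """Bits per LVT entry: ceil(log2(number of write ports))."""
    # (m - 1).bit_length() equals ceil(log2(m)) for every integer m >= 1.
    return (num_write_ports - 1).bit_length()


def lvt_total_bits(depth, num_write_ports):
    """Total LVT storage: one narrow entry per memory location."""
    return depth * lvt_width_bits(num_write_ports)


def block_ram_count(num_write_ports, num_read_ports):
    """Assumed count: one bank per write port, replicated per read port."""
    return num_write_ports * num_read_ports


width_2w = lvt_width_bits(2)
width_8w = lvt_width_bits(8)
lvt_bits = lvt_total_bits(depth=256, num_write_ports=2)
data_bits = 256 * 32  # the same depth stored as 32-bit words
```

For a 2W memory that is 256 elements deep, the LVT needs 256 bits, against 8192 bits for 32-bit data at the same depth. Even at 8 write ports each entry is only 3 bits wide, consistent with the "< 4 bits" figure on the slide.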

SLIDE 36

LVT Performance

SLIDE 37

2W/4R Pure-ALM

SLIDE 38

2W/4R LVT-based vs. Pure-ALM

84% smaller, 43% faster

412 MHz to 375 MHz

SLIDE 39

2W/4R Multipumping

Must be careful about read/write ordering!

SLIDE 40

Multipumping Performance

SLIDE 41

2W/4R Multipumping

SLIDE 42

2W/4R Multipumping

Pure Multipumping (279 MHz)

SLIDE 43

4W/8R Multipumping

Worsens as # of ports increases

SLIDE 44

2W/4R Multipumping

28% smaller on average

193 MHz to 174 MHz

54% slower on average

SLIDE 45

Conclusions

• Pure multipumped memories are better for memories with few ports or low speed.
• LVT-based memories are faster and smaller than Pure-ALM memories.
• LVT-based memories are faster than pure multipumping, but at a cost in area.

SLIDE 46

Future Work

• Pure multipumping for LVT-based memories
  − Build banks with 2W/4R pure multipumping blocks
  − Possible further area improvement
• Relaxing the read/write order for multipumping
  − Allows multiplexing the write ports
  − Leaves designer to watch for WAR violations

SLIDE 47