SLIDE 1

Introduction to Field Programmable Gate Arrays

Lecture 2/3

CERN Accelerator School on Digital Signal Processing

Sigtuna, Sweden, 31 May – 9 June 2007

Javier Serrano, CERN AB-CO-HT

SLIDE 2

Outline

Digital Signal Processing using FPGAs

  • Introduction. Why FPGAs for DSP?
  • Fixed point and its subtleties.
  • Doing arithmetic in hardware.
  • Distributed Arithmetic (DA).
  • COordinate Rotation DIgital Computer (CORDIC).

SLIDE 4

Why FPGAs for DSP? (1)

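[Figure: a conventional DSP device (Von Neumann architecture) with a single MAC unit between Data In and Data Out, compared with an FPGA in which the samples held in registers Reg0 … Reg255 are multiplied by the coefficients C0 … C255 in parallel.]

Conventional DSP device: 256 loops needed to process each sample.

FPGA: all 256 MAC operations in 1 clock cycle.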

Reason 1: FPGAs handle high computational workloads
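To make the cycle counts concrete, here is a minimal Python model (an illustration, not a hardware description) of one output of an N-tap FIR filter computed two ways: a single MAC unit looping over the taps, as on a conventional DSP, and a fully parallel datapath that forms all N products at once and sums them in an adder tree. The function names and the "one clock per loop iteration / one clock for the whole parallel datapath" cost model are assumptions made for illustration.

```python
from typing import List, Tuple

def fir_single_mac(coeffs: List[int], delay_line: List[int]) -> Tuple[int, int]:
    """Conventional DSP model: one multiply-accumulate per clock, N clocks per output."""
    acc = 0
    clocks = 0
    for c, x in zip(coeffs, delay_line):
        acc += c * x      # one MAC operation
        clocks += 1       # assumed cost: one clock per loop iteration
    return acc, clocks

def fir_parallel(coeffs: List[int], delay_line: List[int]) -> Tuple[int, int]:
    """FPGA model: one multiplier per tap, all products formed in the same clock and
    reduced by an adder tree (assumed here to complete in one, possibly pipelined, clock)."""
    products = [c * x for c, x in zip(coeffs, delay_line)]
    return sum(products), 1
```

With 256 coefficients both functions return the same sum, but the first reports 256 clocks and the second 1, which is the contrast drawn on the slide.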

SLIDE 5

FPGAs are ideal for multi-channel DSP designs

[Figure: four channels ch1 … ch4 sampled at 20 MHz, each with its own LPF, compared with a single multi-channel LPF processing the four channels time-multiplexed at 80 MHz.]

Many low-sample-rate channels can be multiplexed (e.g. time-division multiplexing, TDM) and processed in the FPGA at a high rate. Interpolation (zero insertion) can also drive sample rates higher.
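A minimal Python sketch of the TDM idea, assuming a simple FIR low-pass filter: one shared filter datapath processes an interleaved stream (ch1, ch2, ch3, ch4, ch1, …) and only the per-channel delay-line state is replicated. The function names and the way state is stored are hypothetical; in hardware the shared datapath would run at 4× the per-channel rate (e.g. 80 MHz for four 20 MHz channels).

```python
from typing import List

def fir(coeffs: List[int], samples: List[int]) -> List[int]:
    """Reference single-channel FIR filter with zero initial state."""
    delay = [0] * len(coeffs)
    out = []
    for s in samples:
        delay = [s] + delay[:-1]                       # shift the new sample in
        out.append(sum(c * d for c, d in zip(coeffs, delay)))
    return out

def tdm_fir(coeffs: List[int], interleaved: List[int], n_channels: int) -> List[int]:
    """One filter shared by n_channels time-multiplexed channels.
    The multipliers/adders are shared; each channel keeps its own delay line."""
    delays = [[0] * len(coeffs) for _ in range(n_channels)]
    out = []
    for k, s in enumerate(interleaved):
        ch = k % n_channels                            # channel selected by a counter
        delays[ch] = [s] + delays[ch][:-1]
        out.append(sum(c * d for c, d in zip(coeffs, delays[ch])))
    return out
```

De-interleaving the output of `tdm_fir` (taking every `n_channels`-th sample) gives exactly what four separate `fir` instances would produce.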

SLIDE 6

Why FPGAs for DSP? (2)

Q = (A × B) + (C × D) + (E × F) + (G × H) can be implemented in parallel:

[Figure: four multipliers form A × B, C × D, E × F and G × H, and a tree of adders sums them into Q.]

Reason 2: Tremendous flexibility

But is this the only way to do it in the FPGA?

SLIDE 7

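[Figure: three implementations of Q. Parallel: four multipliers and an adder tree. Semi-parallel: two multipliers, adders and an accumulator register (D flip-flop) producing Q. Serial: one multiplier, an adder and an accumulator register producing Q.]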

Customize Architectures to Suit Your Ideal Algorithms

FPGAs allow area (cost) / performance trade-offs. Optimized for? Speed (parallel) or cost (serial).
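The trade-off can be sketched in Python by making the number of multipliers a parameter: with as many multipliers as products the sum takes one clock, with one multiplier it takes one clock per product, and anything in between is semi-parallel. The cost model (one clock per group of products, adders and the accumulator register assumed to keep up) and the function name are assumptions for illustration.

```python
from typing import List, Tuple

def sum_of_products(a: List[int], b: List[int], n_mult: int) -> Tuple[int, int]:
    """Q = a[0]*b[0] + a[1]*b[1] + ... using n_mult multipliers per clock.
    n_mult == len(a) models the parallel version, n_mult == 1 the serial one,
    anything in between a semi-parallel version. Returns (Q, clocks)."""
    if n_mult < 1:
        raise ValueError("need at least one multiplier")
    q = 0          # accumulator register (not needed in the fully parallel case)
    clocks = 0
    for start in range(0, len(a), n_mult):
        # Products formed in the same clock by the n_mult multipliers, summed by adders.
        q += sum(x * y for x, y in zip(a[start:start + n_mult], b[start:start + n_mult]))
        clocks += 1
    return q, clocks
```

For Q = (A × B) + (C × D) + (E × F) + (G × H), four multipliers give one clock, two give two clocks and one gives four clocks, always with the same Q.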

SLIDE 8

Why FPGAs for DSP? (3)

[Figure: a DSP card (DSP processors, A/D and D/A converters, DDCs, DUCs, MACs, control, SDRAM, analog front end) and a network card (PowerPC processors, ASSP, SSTL3 translators, quad transceivers, hundreds of termination resistors, SDRAM), compared with a board where a single FPGA integrates the PowerPC cores, MACs, DUCs, DDCs and logic, next to A/D and D/A converters, SDRAM, an ASSP, control (PL4, CORBA) and 3.125 Gbps links.]

Reason 3: Integration simplifies PCBs

SLIDE 10

Unsigned integers: non-negative values only
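For reference, the standard definition: an n-bit unsigned word with bits b_{n-1} … b_0 has the following value and range.

```latex
x = \sum_{i=0}^{n-1} b_i \, 2^{i}, \qquad 0 \le x \le 2^{n}-1
```

For example, 8 bits cover 0 to 255.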

SLIDE 11

2’s complement
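In 2's complement the MSB carries weight −2^(n−1) and every other bit its usual positive weight, so an n-bit word covers −2^(n−1) … 2^(n−1) − 1. A small Python sketch of encoding and decoding (function names are illustrative):

```python
def to_twos_complement(value: int, n_bits: int) -> int:
    """Encode a signed integer as an n-bit 2's complement bit pattern (returned as an unsigned int)."""
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {n_bits} bits")
    return value & ((1 << n_bits) - 1)

def from_twos_complement(pattern: int, n_bits: int) -> int:
    """Decode: subtracting msb * 2^n is the same as giving the MSB weight -2^(n-1)."""
    msb = (pattern >> (n_bits - 1)) & 1
    return pattern - (msb << n_bits)
```

For 8 bits, −1 encodes as 0xFF and the pattern 0x80 decodes to −128.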

SLIDE 12

Fixed point binary numbers

Example: 3 integer bits and 5 fractional bits
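A fixed-point word is just an integer with an implied binary point: the stored integer `raw` represents raw / 2^5 when there are 5 fractional bits. The sketch below assumes, since the slide's figure is not available, that the 3 integer bits include the sign bit of a 2's complement word, giving 8 bits in total, a range of −4 … 4 − 2^−5 and a resolution of 2^−5 = 0.03125. Function names are hypothetical.

```python
def to_fixed(x: float, int_bits: int = 3, frac_bits: int = 5, signed: bool = True) -> int:
    """Quantize x to a fixed-point word; returns the stored integer (value = raw / 2**frac_bits).
    Rounds to nearest (Python's round() breaks ties to even) and raises when out of range."""
    total = int_bits + frac_bits
    raw = round(x * (1 << frac_bits))
    lo = -(1 << (total - 1)) if signed else 0
    hi = (1 << (total - 1)) - 1 if signed else (1 << total) - 1
    if not lo <= raw <= hi:
        raise ValueError(f"{x} is out of range for this format")
    return raw

def from_fixed(raw: int, frac_bits: int = 5) -> float:
    """Value represented by a stored fixed-point integer."""
    return raw / (1 << frac_bits)
```

For instance, 1.5 is stored as 48 (0b00110000) and the largest value, 3.96875, as 127.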

SLIDE 13

Fixed point truncation vs. rounding

Note that in 2’s complement, truncation (simply dropping LSBs) is biased, because it always rounds toward −∞, while rounding to nearest is essentially unbiased.
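The bias can be measured directly. This Python sketch drops 3 LSBs from every 8-bit 2's complement value and averages the error in units of the input LSB: truncation (an arithmetic right shift) averages −3.5, while "add half an LSB, then truncate" averages +0.5. That small residual comes only from ties, which this scheme always rounds up; convergent (round-half-to-even) rounding removes it. Output overflow (e.g. 127 rounding up to 16 in a 5-bit result) is ignored here.

```python
def truncate(x: int, drop: int) -> int:
    """Drop the `drop` LSBs of a 2's complement integer (arithmetic shift: rounds toward -inf)."""
    return x >> drop

def round_half_up(x: int, drop: int) -> int:
    """Add half an output LSB, then truncate: round to nearest, ties toward +inf."""
    return (x + (1 << (drop - 1))) >> drop

def mean_error(quantize, n_bits: int = 8, drop: int = 3) -> float:
    """Average error, in input LSBs, over every n_bits 2's complement input."""
    inputs = range(-(1 << (n_bits - 1)), 1 << (n_bits - 1))
    errors = [(quantize(x, drop) << drop) - x for x in inputs]
    return sum(errors) / len(errors)
```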

SLIDE 15

The Full Adder (FA)
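A full adder adds three bits: sum = a ⊕ b ⊕ c_in and carry-out = majority(a, b, c_in). Chaining n of them, carry-out to carry-in, gives an n-bit ripple-carry adder. A bit-level Python model (names are illustrative):

```python
from typing import Tuple

def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """One-bit full adder: sum = a XOR b XOR cin, carry-out = majority(a, b, cin)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def ripple_carry_add(a: int, b: int, n_bits: int, cin: int = 0) -> Tuple[int, int]:
    """Chain n_bits full adders, LSB first; returns (n-bit sum, final carry-out)."""
    result, carry = 0, cin
    for i in range(n_bits):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry
```

For example, 9 + 8 in 4 bits gives sum 0b0001 with carry-out 1 (17 = 0b10001). The carry rippling through all n stages is what sets the adder's combinational delay.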

SLIDE 16

Add/subtract circuit

S = A + B when Control = ‘0’

S = A − B when Control = ‘1’
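The usual way to build this circuit (assumed here, since the slide's schematic is not available) is a ripple-carry adder in which every B bit passes through an XOR gate driven by Control, and Control also feeds the carry-in. With Control = 1 the adder computes A + (NOT B) + 1, which is A − B in 2's complement. A Python sketch:

```python
def full_adder(a: int, b: int, cin: int):
    """One-bit full adder returning (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def add_sub(a: int, b: int, control: int, n_bits: int) -> int:
    """n-bit adder/subtractor: S = A + B if control == 0, S = A - B if control == 1.
    Each B bit goes through an XOR with control; control is also the carry-in."""
    result, carry = 0, control
    for i in range(n_bits):
        b_i = ((b >> i) & 1) ^ control       # invert B when subtracting
        s, carry = full_adder((a >> i) & 1, b_i, carry)
        result |= s << i
    return result                             # n-bit 2's complement pattern (wraps on overflow)
```

For instance, `add_sub(3, 5, 1, 8)` returns 0xFE, the 8-bit pattern for −2.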

SLIDE 17

Saturation

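You can’t let the data path become arbitrarily wide. Saturation involves overflow detection and a multiplexer that outputs the largest positive or negative representable value instead of the wrapped-around result. It is useful in accumulators (like the one in the PI controller we use in the lab).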
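A Python sketch of a saturating n-bit 2's complement adder. Overflow is detected the classic way: both operands have the same sign but the wrapped result has the opposite sign; the "multiplexer" then selects the clamp value. The function name is illustrative.

```python
def saturating_add(a: int, b: int, n_bits: int) -> int:
    """Add two n-bit 2's complement values; clamp instead of wrapping on overflow."""
    max_val = (1 << (n_bits - 1)) - 1
    min_val = -(1 << (n_bits - 1))
    raw = (a + b) & ((1 << n_bits) - 1)                # what a plain n-bit adder produces
    wrapped = raw - (1 << n_bits) if raw >> (n_bits - 1) else raw
    # Overflow detection: operands share a sign, but the wrapped result has the other sign.
    overflow = (a < 0) == (b < 0) and (wrapped < 0) != (a < 0)
    if not overflow:
        return wrapped
    return max_val if a >= 0 else min_val               # the multiplexer picks the clamp value
```

In an 8-bit accumulator, 100 + 50 returns 127 instead of wrapping to −106.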

SLIDE 18

Multiplication: pencil & paper approach

SLIDE 19

A 4-bit unsigned multiplier using Full Adders and AND gates

Of course, you can use embedded multipliers if your chip has them!
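The pencil-and-paper method maps directly onto hardware: each partial-product bit is an AND gate, and each row of full adders adds one shifted partial product to the running sum. The Python model below uses one common arrangement (ripple-carry rows); the exact wiring on the slide is not recoverable, so treat it as a sketch.

```python
from typing import List, Tuple

def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def array_multiply(a: int, b: int, n_bits: int = 4) -> int:
    """Unsigned n x n array multiplier built only from AND gates and full adders.
    Row j adds the partial product (a AND b_j) to the running sum."""
    a_bits = [(a >> i) & 1 for i in range(n_bits)]
    b_bits = [(b >> j) & 1 for j in range(n_bits)]
    row = [a_bits[i] & b_bits[0] for i in range(n_bits)]   # row 0 needs no adders
    product_bits: List[int] = [row[0]]       # the LSB of each row is final
    running = row[1:] + [0]                  # remaining bits, aligned with the next row
    for j in range(1, n_bits):
        carry = 0
        new_row = []
        for i in range(n_bits):
            pp = a_bits[i] & b_bits[j]       # one AND gate per partial-product bit
            s, carry = full_adder(pp, running[i], carry)
            new_row.append(s)
        product_bits.append(new_row[0])
        running = new_row[1:] + [carry]
    product_bits.extend(running)             # the last row supplies the top n bits
    return sum(bit << k for k, bit in enumerate(product_bits))
```

A 4 × 4 multiplier needs 16 AND gates and 3 rows of 4 full adders, and produces an 8-bit product.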

SLIDE 20

Constant coefficient multipliers using ROM

For “easy” coefficients, there are smarter ways. E.g. to multiply a number A by 31, left-shift A by 5 places then subtract A.
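Both approaches in a short Python sketch: a ROM whose entry x holds coeff × x (so multiplying is a single lookup), and the shift-and-subtract trick for 31 = 32 − 1. The ROM needs 2^(input bits) words, so it suits narrow inputs. Names are illustrative.

```python
def build_rom(coeff: int, in_bits: int) -> list:
    """ROM contents for a constant-coefficient multiplier: entry x holds coeff * x."""
    return [coeff * x for x in range(1 << in_bits)]

def rom_multiply(rom: list, x: int) -> int:
    """One lookup replaces the multiplier."""
    return rom[x]

def times_31(a: int) -> int:
    """31 * A = 32 * A - A: one left shift by 5 places and one subtraction."""
    return (a << 5) - a
```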

SLIDE 21

Division: pencil & paper

  • Uses the add/subtract blocks presented earlier.
  • MSB produced first: this usually means we have to wait for the whole operation to finish before feeding the result to another block.
  • Longer combinational delays than in multiplication: an N by N division will always take longer than an N by N multiplication.
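A Python model of pencil-and-paper (restoring) division for unsigned operands. Each step shifts in the next dividend bit, performs a trial subtraction (the add/subtract block) and uses its sign to decide one quotient bit, MSB first. In an array implementation each step is one row, and each row must wait for the previous row's full carry ripple, which is why division is slower than multiplication. The function name is illustrative.

```python
from typing import Tuple

def restoring_divide(dividend: int, divisor: int, n_bits: int) -> Tuple[int, int]:
    """Unsigned n-bit division, one quotient bit per step, MSB first."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    remainder, quotient = 0, 0
    for i in range(n_bits - 1, -1, -1):            # MSB of the quotient is produced first
        remainder = (remainder << 1) | ((dividend >> i) & 1)   # bring down the next bit
        trial = remainder - divisor                  # trial subtraction
        if trial >= 0:                               # it fits: quotient bit is 1
            remainder = trial
            quotient |= 1 << i
        # otherwise the old remainder is kept ("restored") and the quotient bit is 0
    return quotient, remainder
```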

SLIDE 22

Pipelining the division array

SLIDE 23

Square root

  • Take a division array, cut it in half (diagonally) and you have square root. Square root is therefore faster than division!
  • Although it has less ripple-through, this block suffers from the same problems as the division array.
  • Alternative approach: first guess with a ROM, then use an iterative algorithm such as Newton-Raphson.
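A Python sketch of the ROM-plus-iteration approach. One hardware-friendly variant (chosen here as an assumption, not taken from the slides) iterates on 1/√a, because y ← y(3 − a·y²)/2 needs only multiplications and a shift, then forms √a = a·(1/√a). The input is first scaled by powers of 4 into [1, 4); a hypothetical 16-entry ROM supplies the first guess. Each iteration roughly squares the relative error, so four iterations reach double precision from a seed that is within about 5 %.

```python
import math

ROM_SIZE = 16
STEP = 3.0 / ROM_SIZE            # the ROM covers a in [1, 4) in equal steps

# Hypothetical seed ROM: 1/sqrt of the midpoint of each interval.
RSQRT_ROM = [1.0 / math.sqrt(1.0 + (i + 0.5) * STEP) for i in range(ROM_SIZE)]

def sqrt_nr(a: float, iterations: int = 4) -> float:
    """Square root of a > 0: ROM first guess, then Newton-Raphson on 1/sqrt(a)."""
    if a <= 0:
        raise ValueError("a must be positive")
    scale = 1.0
    while a >= 4.0:               # range reduction: sqrt(4a) = 2 sqrt(a)
        a /= 4.0
        scale *= 2.0
    while a < 1.0:
        a *= 4.0
        scale /= 2.0
    y = RSQRT_ROM[min(int((a - 1.0) / STEP), ROM_SIZE - 1)]   # first guess from the ROM
    for _ in range(iterations):
        y = y * (3.0 - a * y * y) / 2.0     # division-free Newton-Raphson step
    return scale * a * y                    # sqrt(a) = a * (1/sqrt(a))
```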

SLIDE 25

Distributed Arithmetic (DA) 1/2

Digital filtering is about sums of products:

$$y = \sum_{n=0}^{N-1} c[n] \cdot x[n]$$

Let’s assume:

  • c[n] constant (prerequisite to use DA)
  • x[n] input signal, B bits wide

Then:

$$y = \sum_{n=0}^{N-1} c[n] \cdot \left( \sum_{b=0}^{B-1} x_b[n] \cdot 2^{b} \right)$$

where x_b[n] is bit number b of x[n] (either 0 or 1). And after some rearrangement of terms:

$$y = \sum_{b=0}^{B-1} 2^{b} \cdot \left( \sum_{n=0}^{N-1} c[n] \cdot x_b[n] \right)$$

The inner sum depends only on the N bits x_b[0], …, x_b[N-1], so it can be implemented with an N-input LUT (2^N entries).

SLIDE 26

Distributed Arithmetic (DA) 2/2

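$$y = \sum_{b=0}^{B-1} 2^{b} \cdot \left( \sum_{n=0}^{N-1} c[n] \cdot x_b[n] \right)$$

[Figure: one shift register per input x[0] … x[N-1], holding bits x_{B-1}[n] … x_1[n] x_0[n] and shifting out one bit per clock tick; the N bits x_b[0] … x_b[N-1] address a LUT, whose output feeds an adder and a register with a 2^{-1} (shift-right) feedback path, producing y.]

y is generated every B clock ticks. By replicating logic, one can trade off speed against area, up to the limit of one result per clock tick.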
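A bit-serial DA model in Python, assuming unsigned B-bit inputs as in the equations above. The LUT holds, for every N-bit address, the sum of the coefficients whose address bit is 1; each clock tick gathers bit b of all N samples into an address and accumulates the LUT output weighted by 2^b. The hardware shown uses the equivalent right-shift (2^{-1}) accumulator instead of left-shifting the LUT output. For 2's complement inputs, the LUT output for the MSB bit plane would be subtracted rather than added. Function names are illustrative.

```python
from typing import List

def build_da_lut(coeffs: List[int]) -> List[int]:
    """2^N-entry LUT: entry `addr` is the sum of c[n] over every n whose bit n of addr is 1."""
    n_taps = len(coeffs)
    return [sum(c for n, c in enumerate(coeffs) if (addr >> n) & 1)
            for addr in range(1 << n_taps)]

def da_filter_output(lut: List[int], x: List[int], n_bits: int) -> int:
    """y = sum over b of 2^b * LUT[x_b[0..N-1]], one bit plane per clock tick, LSB first."""
    acc = 0
    for b in range(n_bits):                 # B clock ticks per result
        addr = 0
        for n, xn in enumerate(x):          # bit b of every input sample forms the address
            addr |= ((xn >> b) & 1) << n
        acc += lut[addr] << b               # weight the LUT output by 2^b
    return acc
```

No multiplier appears anywhere: the constant coefficients are folded into the LUT contents, which is why DA requires c[n] to be constant.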

SLIDE 28

COrdinate Rotation DIgital Computer

SLIDE 29

Pseudo-rotations
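The standard derivation (the slide's own figure is not available): a rotation by θ can be written with cos θ factored out, and dropping that factor gives a pseudo-rotation, which turns the vector by the same angle but lengthens it by 1/cos θ.

```latex
\begin{aligned}
x' &= x\cos\theta - y\sin\theta = \cos\theta\,(x - y\tan\theta) \\
y' &= y\cos\theta + x\sin\theta = \cos\theta\,(y + x\tan\theta) \\
\hat{x}' &= x - y\tan\theta, \qquad \hat{y}' = y + x\tan\theta \\
\sqrt{\hat{x}'^2 + \hat{y}'^2} &= \frac{\sqrt{x^2 + y^2}}{\cos\theta} = \sqrt{1 + \tan^2\theta}\,\sqrt{x^2 + y^2}
\end{aligned}
```

Choosing angles with tan θ_i = ±2^{-i} turns every multiplication by tan θ_i into a right shift.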

SLIDE 30

Basic CORDIC iterations

SLIDE 31

Angle accumulator
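In the usual formulation, the basic iterations and the angle accumulator z are:

```latex
\begin{aligned}
x_{i+1} &= x_i - d_i\, y_i\, 2^{-i} \\
y_{i+1} &= y_i + d_i\, x_i\, 2^{-i} \\
z_{i+1} &= z_i - d_i \arctan\!\left(2^{-i}\right), \qquad d_i \in \{-1, +1\}
\end{aligned}
```

In rotation mode d_i = sign(z_i), which drives z toward 0; in vectoring mode d_i = −sign(y_i) (for x_i > 0), which drives y toward 0. The constants arctan(2^{-i}) can be held in a small ROM.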

SLIDE 32

The scaling factor
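Each pseudo-rotation lengthens the vector by √(1 + 2^{-2i}), whatever the direction d_i, so the total gain K_n after n iterations is a constant that can be precomputed and compensated once (for example by starting from x_0 = 1/K_n). A short Python computation:

```python
import math

def cordic_gain(n_iterations: int) -> float:
    """Length growth after n circular pseudo-rotations: K_n = prod over i of sqrt(1 + 2^(-2i))."""
    k = 1.0
    for i in range(n_iterations):
        k *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return k
```

K_n converges quickly to about 1.64676, so 1/K ≈ 0.60725.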

SLIDE 33

Rotation Mode

SLIDE 34

Example: calculate sin and cos of 30º
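A floating-point Python sketch of rotation mode applied to this example: start from (1/K, 0) with z = 30°, rotate in the direction that drives z toward 0, and read cos θ from x and sin θ from y. The iteration count and function name are illustrative choices; a hardware version would use fixed-point shifts.

```python
import math

def cordic_rotate(theta: float, n_iterations: int = 32):
    """Circular rotation mode: returns (cos(theta), sin(theta)) for |theta| below about 1.74 rad.
    Starts from x0 = 1/K so that the CORDIC gain cancels out."""
    k = 1.0
    for i in range(n_iterations):
        k *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0 / k, 0.0, theta
    for i in range(n_iterations):
        d = 1.0 if z >= 0 else -1.0            # rotate so as to drive z toward 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)          # angle accumulator
    return x, y
```

`cordic_rotate(math.radians(30))` returns values very close to (0.8660, 0.5000).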

SLIDE 35

Vectoring Mode

Vector magnitude
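In vectoring mode the directions are chosen to drive y to 0. The vector ends up on the x axis with length K·√(x² + y²), and z accumulates the input angle. A floating-point Python sketch, assuming x > 0 (right half-plane) to keep it short:

```python
import math

def cordic_vector(x: float, y: float, n_iterations: int = 32):
    """Circular vectoring mode for x > 0: drives y to 0 and returns (magnitude, angle).
    The raw x output is K * sqrt(x^2 + y^2), so it is divided by the gain K."""
    if x <= 0:
        raise ValueError("this sketch assumes x > 0")
    k = 1.0
    for i in range(n_iterations):
        k *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = 0.0
    for i in range(n_iterations):
        d = -1.0 if y >= 0 else 1.0             # rotate toward the x axis
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)           # accumulates the angle of the input vector
    return x / k, z
```

For the vector (3, 4) this returns a magnitude very close to 5 and an angle very close to atan2(4, 3).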

SLIDE 36

Circular coordinate system

SLIDE 37

Other coordinate systems

SLIDE 38

Generalized CORDIC equations
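The generalized equations are commonly written (following Walther's unified CORDIC) with a coordinate-system parameter m:

```latex
\begin{aligned}
x_{i+1} &= x_i - m\, d_i\, y_i\, 2^{-i} \\
y_{i+1} &= y_i + d_i\, x_i\, 2^{-i} \\
z_{i+1} &= z_i - d_i\, e_i
\end{aligned}
\qquad
e_i = \begin{cases}
\arctan 2^{-i} & m = 1 \ \text{(circular)} \\
2^{-i} & m = 0 \ \text{(linear)} \\
\operatorname{artanh} 2^{-i} & m = -1 \ \text{(hyperbolic)}
\end{cases}
```

In linear mode x stays constant, so rotation mode gives y_n = y_0 + x_0 z_0 (multiplication) and vectoring mode gives z_n = z_0 + y_0/x_0 (division). In hyperbolic mode the iterations start at i = 1, and iterations 4, 13, 40, … must be repeated for convergence.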

SLIDE 39

Summary of CORDIC functions

SLIDE 40

Precision and convergence
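Two quick numbers behind convergence in circular rotation mode: the largest starting angle that can be driven to zero is the sum of all elementary angles arctan(2^{-i}), about 1.743 rad (≈ 99.9°), so wider angles need a prior quadrant correction; and after n iterations the leftover angle is bounded by the last elementary angle, arctan(2^{-(n-1)}), so each iteration adds roughly one bit of angular precision. A Python sketch:

```python
import math

def convergence_limit(n_iterations: int) -> float:
    """Largest |z0| that circular rotation mode can drive to zero: the sum of the elementary angles."""
    return sum(math.atan(2.0 ** -i) for i in range(n_iterations))

def worst_residual_angle(n_iterations: int) -> float:
    """Bound on the angle left over after n iterations: the last elementary angle."""
    return math.atan(2.0 ** -(n_iterations - 1))
```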

SLIDE 41

FPGA implementation
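What makes CORDIC FPGA-friendly is that each iteration needs only adders/subtractors, arithmetic shifts and a small ROM of angles. The Python model below uses integers only, in an assumed format with 16 fractional bits and 16 iterations (one iteration per clock in an iterative design). It is a word-level model, not the bit-serial design of the next slide, and all names are hypothetical.

```python
import math

FRAC_BITS = 16                       # assumed word format: 16 fractional bits
N_ITER = 16                          # one iteration per clock in an iterative design
SCALE = 1 << FRAC_BITS

# Angle ROM: atan(2^-i) quantized to the same fixed-point format.
ATAN_ROM = [round(math.atan(2.0 ** -i) * SCALE) for i in range(N_ITER)]

# 1/K quantized, used as the starting x so no final multiplication is needed.
_K = 1.0
for _i in range(N_ITER):
    _K *= math.sqrt(1.0 + 2.0 ** (-2 * _i))
INV_K = round(SCALE / _K)

def cordic_fixed(theta_fixed: int):
    """Rotation mode using only integer add/subtract and arithmetic shifts.
    theta_fixed is the angle in radians times 2^FRAC_BITS; returns (cos, sin) in the same format."""
    x, y, z = INV_K, 0, theta_fixed
    for i in range(N_ITER):
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_ROM[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_ROM[i]
    return x, y
```

Python's `>>` on negative integers is an arithmetic shift, matching a sign-extending shifter in hardware. Accuracy is limited to about 1e-3 here by the 16 iterations and the quantization of each step.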

SLIDE 42

Iterative bit-serial design

SLIDE 43

Acknowledgements

Many thanks to Jeff Weintraub (Xilinx University Program) and Bob Stewart (University of Strathclyde) for many of these slides.