Power Optimization of FPGA Interconnect Via Circuit and CAD Techniques


Safeen Huda and Jason Anderson, International Symposium on Physical Design, Santa Rosa, CA, April 6, 2016


SLIDE 1

Power Optimization of FPGA Interconnect Via Circuit and CAD Techniques

Safeen Huda and Jason Anderson

International Symposium on Physical Design Santa Rosa, CA, April 6, 2016

SLIDE 2

Motivation

  • FPGA power increasingly critical because of new markets

– Data centers
– Mobile electronics

*Google data centre

SLIDE 3

Motivation

  • FPGAs typically have underutilized wires
  • We ask: Can we take advantage of unused wires?
  • This work: 3 techniques to reduce power w/ unused wires

– Charge recycling (dynamic)
– Effective capacitance reduction (dynamic)
– Pulse-based signalling (static)

SLIDE 4

Dynamic Power Reduction Techniques

SLIDE 5

Motivation

  • Routing power is prime component of FPGA dynamic power

*Figure taken from [Tuan07]

SLIDE 6

Charge Recycling in FPGA Interconnect

SLIDE 7

Dynamic Power in Conv. CCTs

  • Switching from “0” to “1” draws CLVDD² joules

[Figure: circuit schematic; labels VDD, CL; annotation: CLVDD coulombs of charge drawn from supply]

SLIDE 8

Dynamic Power in Conv. CCTs

  • All of the stored energy in CL is dissipated
  • Can we use the energy that is being dissipated?

[Figure: circuit schematic; labels VDD, CL; annotation: CLVDD coulombs of charge dissipated]
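The energy bookkeeping on the two slides above can be checked with a few lines of arithmetic. This is a minimal sketch, not taken from the deck: the function name is illustrative, the 1.0 V supply is an assumed value, and the 200 fF load borrows the figure quoted later for the functional simulation.

```python
def conventional_cycle(c_load, vdd):
    """Charge and energy for one 0 -> 1 -> 0 cycle of a conventional driver."""
    charge_drawn = c_load * vdd              # coulombs pulled from the supply on 0 -> 1
    supply_energy = charge_drawn * vdd       # C_L * VDD^2 joules leave the supply
    stored_energy = 0.5 * c_load * vdd ** 2  # energy on C_L once it reaches VDD
    return {
        "charge_drawn": charge_drawn,
        "supply_energy": supply_energy,
        "lost_while_charging": supply_energy - stored_energy,
        "lost_while_discharging": stored_energy,  # dumped to GND on 1 -> 0
    }


# Assumed values: 200 fF load, 1.0 V supply (the supply voltage is an assumption).
cycle = conventional_cycle(200e-15, 1.0)
print(cycle)
```

Half of the supply energy is lost while charging the load and the other half when the load is discharged, which is the stored energy the charge recycling idea tries to reuse.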

SLIDE 9

Charge Recycling (CR) Concept

  • During “1” → “0” transition, output starts at VDD
  • PDN disconnected, PUN connected

[Figure: Initial Phase; labels VDD, CL, CR]

SLIDE 10

Charge Recycling (CR) Concept

  • PUN, PDN disconnected, CL connected to CR
  • ½ CLVDD coulombs of charge transferred

*Assume CL = CR

[Figure: Charge Recovery Phase; CL and CR both at VDD/2]

SLIDE 11

Charge Recycling (CR) Concept

  • PDN connected, PUN disconnected
  • Output pulled to GND, ½ CLVDD coulombs dissipated

[Figure: Final Phase; labels VDD, VDD/2, CL, CR]

SLIDE 12

Charge Recycling (CR) Concept

  • Output initially at GND during a “0” → “1” transition
  • ½ CLVDD coulombs stored in CR

[Figure: Initial Phase; labels VDD, VDD/2, CL, CR]

SLIDE 13

Charge Recycling (CR) Concept

  • PUN, PDN disconnected, CL connected to CR
  • ¼ CLVDD coulombs of charge transferred to CL

*Assume CL = CR

[Figure: Charge Recycling Phase; CL and CR both at VDD/4]

SLIDE 14

Charge Recycling (CR) Concept

  • ¾ CLVDD coulombs of charge drawn from supply
  • Implies 25% reduction in energy consumption

[Figure: Final Phase; labels VDD, VDD/4, CL, CR]
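The recovery and recycling phases walked through above are plain charge sharing between two capacitors, so they can be replayed numerically. The sketch below is an illustration under the slides' assumption CL = CR, starting from an empty reservoir; the function names are hypothetical. The first cycle reproduces the 25% figure, and repeating the cycle settles near one third, which appears consistent with the 33% theoretical reduction quoted with the simulation results (that link is an inference, not something the deck derives).

```python
def share(v_a, v_b, c_a=1.0, c_b=1.0):
    """Common voltage after two capacitors are connected (charge conservation)."""
    return (c_a * v_a + c_b * v_b) / (c_a + c_b)


def simulate_cr(cycles, vdd=1.0, c_load=1.0, c_res=1.0):
    """Fraction of CL*VDD that the supply does NOT provide on each rising edge."""
    v_res = 0.0  # assumption: reservoir starts empty
    savings = []
    for _ in range(cycles):
        # "1" -> "0": load at VDD shares with the reservoir, then is pulled to GND.
        v_res = share(vdd, v_res, c_load, c_res)
        # "0" -> "1": load at GND shares with the reservoir, then is pulled to VDD.
        v_shared = share(0.0, v_res, c_load, c_res)
        v_res = v_shared
        # The supply only has to lift the load from v_shared to VDD.
        savings.append(v_shared / vdd)
    return savings


savings = simulate_cr(20)
print(savings[0], savings[-1])
```

Because the supply delivers its charge at a fixed VDD, the fraction of charge saved equals the fraction of supply energy saved.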

SLIDE 15

Observations

  • We can reduce power if reservoir capacitors are available
– Use unused wires as reservoirs!
  • CR requires a complex set of steps: area penalty to implement?
  • FPGA routing circuits are big to begin with
– Large routing multiplexers
– Several SRAM cells
– Large output buffers to drive long capacitive wires
  • Incremental area overhead of complex circuitry may not be too bad…

SLIDE 16

CR in FPGAs

  • Designs on FPGAs typically have paths with lots of slack
– Can trade off the delay of these paths for power savings using CR
  • Opportunity: [Anderson09] showed that 75% of switches in a design can be slowed down by 50%
  • Target CR in FPGA routing switches

[Figure: routing switch driving a routing wire; labels Inputs, VIN; callout: target output buffer for charge recovery/recycling]

SLIDE 17

Proposed FPGA Routing Arch.

[Figure: CLBs and a switch block (SB); CR buffers with “friend” conductors (2-way sharing)]

SLIDE 18

CR Routing Buffer

  • CR sets state of buffer – CR mode vs. Normal mode
  • TS sets one of two “friend” buffers in tristate mode

[Figure: CR routing buffer schematic. Blocks: gating circuitry, delay line, CR circuit, two SRAM cells (CR and TS), unused routing conductor with CR = CWIRE. Labels: VIN, VIN_D, DIN, CWire, M9, M10, VDD]

SLIDE 19

Functional Simulation

  • Simulated in ST65 process
  • Approx. 26% power reduction
  • Theoretical reduction of 33%, minus circuit overheads
  • Assuming 200 fF interconnect load

[Figure: simulated waveforms, node voltage [V] (0 to 1.2) vs. time [ns] (35 to 55); traces: Output, Reservoir; annotations: Recovery Phase, Recycling Phase]

SLIDE 20

CAD Tool Support

  • Power can be reduced for a routing switch if:
– 1) “Friend” conductor is unoccupied
– 2) Switch lies along path with sufficient slack
  • To optimize CR in FPGAs, we need CAD which:
– Maximizes the availability of free reservoirs for nets with high activity and sufficient slack
– Optimizes the mode selection of switches
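The two eligibility conditions above translate directly into a filter over routed switches. The sketch below is a deliberately simplified illustration, not the authors' algorithm: the `Switch` record, its field names, and the per-switch slack check are assumptions (in a real flow, slack is shared by every switch on a path, so choices interact).

```python
from dataclasses import dataclass


@dataclass
class Switch:
    name: str
    friend_unoccupied: bool  # condition 1: the "friend" conductor is free
    slack_ns: float          # slack of the path through this switch
    activity: float          # switching activity of the net it drives


def select_cr_switches(switches, cr_delay_penalty_ns):
    """Return switches eligible for CR mode, highest-activity nets first."""
    eligible = [
        s for s in switches
        if s.friend_unoccupied and s.slack_ns >= cr_delay_penalty_ns
    ]
    return sorted(eligible, key=lambda s: s.activity, reverse=True)


# Hypothetical data for illustration only.
switches = [
    Switch("sw0", True, 1.2, 0.30),
    Switch("sw1", False, 2.0, 0.90),  # friend conductor is in use
    Switch("sw2", True, 0.1, 0.50),   # not enough slack
    Switch("sw3", True, 0.8, 0.60),
]
chosen = select_cr_switches(switches, cr_delay_penalty_ns=0.5)
print([s.name for s in chosen])
```

Ordering by activity reflects the slide's emphasis on nets with high activity, since those are where each recycled transition pays off most often.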

SLIDE 21

CAD Flow

  • Conventional VPR 6.0 packing and placement
  • Modified VPR 6.0 router: optimizes availability of free reservoirs
  • Post-routing phase to select operating mode of switches (CR vs. Normal)

[Figure: flow diagram. Packing → Placement → CR-aware Router → Switch Mode Selection; other labels: CIRCUIT, Net Activities, Timing Constraint, % CR-Capable Switches; .net, .route, .place files; fmax, power estimate]

SLIDE 22

Results

  • Arch. with 100% CR-capable switches
  • Best case 1.3% degradation in critical-path (CP) delay
– Due to increased delay of CR-capable switches
  • Extra ~3% power reduction as delay constraints are relaxed

[Figure: results chart; legend: Baseline Architecture, Architecture With CR Capability; annotation: 1.3% Increase in Critical Path]

SLIDE 23

Effective Interconnect Capacitance Reduction

SLIDE 24

VLSI Wire Capacitance

  • Wire capacitance consists of:
– Coupling capacitance (CC): between adjacent wires on same layer
– Plate capacitance (CP): between adjacent wires on different layers
  • Due to aspect ratio of wires, CC is dominant

[Figure: wire cross-section on metal layers M3, M4, M5 showing coupling caps CC and plate caps CP]

SLIDE 25

Wire Capacitance Optimization in ASICs (1)

  • In ASICs, have freedom to optimize wire width and spacing
  • Can optimize wi and si to maximize timing, minimize power
  • Optimize wi and si subject to Σwi + Σsi = W

[Figure: nets i, j, k with wire widths w1, w2, w3 and spacings s1 to s4 within total channel width W]

SLIDE 26

Wire Capacitance Optimization in ASICs (2)

  • If net j is timing/power critical:
– Can increase s2 and s3 to reduce CC
– Reduces capacitance on net j, improves speed and reduces power
  • Can also optimize w1, w2, w3 for speed and power

[Figure: nets i, j, k with wire widths w1, w2, w3 and spacings s2, s3 within total channel width W]

SLIDE 27

In FPGAs?

  • FPGA wiring prefabricated, width and spacing fixed
  • Can’t space used wires apart, unused wires in the way
  • Capacitance on wires in two routing options the same

– Despite the fact that nets i,j,k are now spaced further apart

[Figure: Routing Option 1 and Routing Option 2, each showing nets i, j, k on USED conductors alongside UNUSED conductors]

SLIDE 28

Wire Cap. Optimization (1)

  • What’s the total impedance seen by Routing Conductor 1, looking towards Routing Conductor 2?

[Figure: Routing Conductors 1, 2, 3 with coupling caps CC1, CC2 and plate caps CP; inputs IN1, IN2. Equivalent circuit: CC1 = CC in series with the parallel combination of (CC2 + CP) and REQ, giving ZIN(s)]

SLIDE 29

Wire Cap. Optimization (2)

  • If Req is small, capacitor CC2 + CP is shorted out
  • Impedance looking towards Routing Conductor 2 is the capacitor CC

[Figure: equivalent circuit with REQ shorting out CC2 + CP; ZIN(s) reduces to CC1 = CC]

SLIDE 30

Wire Cap. Optimization (3)

  • If Req is large, we approximate as an open circuit
  • ZIN equal to series combination of CC and CC2 + CP

[Figure: equivalent circuit with REQ treated as open; ZIN(s) is CC1 = CC in series with CC2 + CP]

SLIDE 31

Wire Cap. Optimization (4)

  • Series combinations of capacitors result in reduced capacitance:
– If C1 in series with C2, eq. capacitance Ceq = C1C2/(C1 + C2) < C1
  • Therefore can reduce capacitance if Req is large enough
  • Making Req large is bad…
– buffer delay ~ ReqCwire → increase in Req increases delay
  • What if we made Req large only for unused conductors?
– Would not result in increased delay of used conductors
– Neighbouring used conductors would see benefit of reduced cap.
  • Need to be able to set Req large for unused conductors, but small for used conductors
– Use tri-state buffers!
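The small-Req and large-Req cases discussed above can be compared with the series formula from this slide. A minimal sketch, assuming CC1 = CC2 = CC as in the equivalent-circuit figure and treating Req as either a short or an open; the function names and the example ratio CC/CP = 3 (borrowed from the later results slide) are illustrative choices.

```python
def series(c1, c2):
    """Equivalent capacitance of two capacitors in series."""
    return c1 * c2 / (c1 + c2)


def coupling_seen(cc, cp, neighbour_tristated):
    """Capacitance seen looking from a used wire toward its neighbour.

    Idealised: Req is a short when the neighbour is driven,
    and an open circuit when the neighbour is tristated.
    """
    if not neighbour_tristated:
        return cc                # CC2 + CP shorted out, only CC remains
    return series(cc, cc + cp)   # CC in series with CC2 + CP


# Illustrative values in arbitrary units with CC/CP = 3.
cc, cp = 3.0, 1.0
driven = coupling_seen(cc, cp, neighbour_tristated=False)
floating = coupling_seen(cc, cp, neighbour_tristated=True)
print(driven, floating, 1.0 - floating / driven)
```

This only covers the coupling term toward one neighbour; the wire's total capacitance also includes its own plate capacitance and the coupling to its other neighbour, so the overall saving is smaller than the ratio printed here.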

SLIDE 32

This Work

  • If intermediate wires are tristated, see reduced CC!
  • In this work we tristate unused wires to reduce wire cap
– Proposed a novel, lightweight TSB topology
– Used similar CAD techniques to CR work (won’t cover in this talk)

[Figure: nets i, j, k on USED conductors with tristated UNUSED conductors; nets i and j still see reduced coupling capacitance]

SLIDE 33

Proposed Tristate Buffer

SLIDE 34

Traditional Tri-state Buffers

  • Header transistor M5 cuts off pull-up path to output
  • Unused buffer would have IN at VDD
– M1 pulls gate of M6 to GND
  • Large area cost: M2, M4 and M5 must be big due to stacking

[Figure: traditional tri-state buffer schematic; labels IN, OUT, TS, VDD, M1 to M6]

SLIDE 35

Optimized Headerless TSB

  • No stacking in output stage
  • Leverages fact that unused buffers have their input pulled high (details in paper)

[Figure: headerless TSB schematic; labels IN, OUT, TS, VDD, M1 to M5, M7 to M9]

SLIDE 36

CAD Flow

  • Conventional VPR 6.0 packing and placement
  • Modified VPR 6.0 router: optimizes routing to reduce capacitance of nets
  • Power and speed of a conductor can be optimized if adjacent conductor(s) are unused; similar to CAD flow in CR project
  • User-supplied CC/CP to estimate cap. reduction

[Figure: flow diagram. Packing → Placement → Wire Cap Opt. Router; other labels: CIRCUIT, Net Activities, CC/CP; .net, .route, .place files; fmax, power estimate]

SLIDE 37

Results

  • Dynamic power reduction exceeds 15% for CC/CP ≈ 3
  • Get additional 14.6% leakage power savings from TSB
  • Critical path degradation ~1%
  • Total area overhead ~2.1%

[Figure: Interconnect Power Reduction (axis 0% to 18%) vs. CC/CP (axis 1 to 3)]

SLIDE 38

Static Power Reduction in FPGA Interconnect

SLIDE 39

Routing Leakage Sources

  • Leakage path between rails of config. memory

– Can be minimized by using HVT transistors

[Figure: routing switch schematic; labels S, M1 to M4, VDD, VSS]

SLIDE 40

Routing Leakage Sources

  • Leakage path between rails of routing buffer

[Figure: routing switch schematic; labels S, M1 to M4, VDD, VSS]

SLIDE 41

Routing Leakage Sources

  • Leakage paths between different inputs of routing mux

– Modern archs. have large muxes → many leakage paths

[Figure: routing switch schematic; labels S, M1 to M4, VDD, VSS]

SLIDE 42

This Work

  • Can we shut off routing circuits when not active?
– i.e. not in the process of transmitting data
– Effectively a very aggressive form of power gating
  • Dominant leakage paths in the routing network begin and end at routing buffers
– Leakage between rails of routing buffer
– Leakage paths between input pins of routing muxes originate and terminate at outputs of the buffers driving these input pins
  • Therefore, shut off only the routing buffers when not transmitting data
  • We call this “pulsed-signalling”

SLIDE 43

This work

  • Recall CR operation:
– Output buffer is tristated immediately following input transition
– Wait (while charge is transferred between load and reservoir)
– Output buffer is activated after transfer of charge
  • Pulsed-signalling is the exact opposite
– Output buffer is activated immediately following input transition
– Wait (for transition to be reliably detected by downstream circuits)
– Output buffer is tristated after signal transition
  • Therefore we can use similar circuits as CR work to significantly reduce static power!
– Similar area overhead
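The mirrored orderings above can be contrasted with a small predicate. This is only a behavioural sketch: the mode names, the function, and the use of a single window length `delta_t` for both schemes are illustrative assumptions rather than circuit details taken from the deck.

```python
def buffer_driving(mode, t_since_transition, delta_t):
    """Return True if the main output buffer is driving the wire.

    mode: "NORMAL", "CR" (charge recycling) or "PS" (pulsed signalling).
    t_since_transition: time elapsed since the last input transition.
    delta_t: length of the window that follows each transition.
    """
    in_window = 0.0 <= t_since_transition < delta_t
    if mode == "NORMAL":
        return True           # conventional buffer always drives
    if mode == "CR":
        return not in_window  # tristated first, activated afterwards
    if mode == "PS":
        return in_window      # activated first, tristated afterwards
    raise ValueError(f"unknown mode: {mode}")


for mode in ("NORMAL", "CR", "PS"):
    print(mode, buffer_driving(mode, 0.1, 1.0), buffer_driving(mode, 2.0, 1.0))
```

The two gated modes are logical complements of each other over time, which is why the deck suggests the same kind of delay-line and gating circuitry can serve both.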

SLIDE 44

Low Leakage Buffer Design

  • Main buffer is active for a period of time (Δt seconds) after transition appears at input
  • Main buffer is tristated Δt seconds after transition

[Figure: low-leakage buffer schematic. Blocks: routing multiplexor, gating stage, main buffer stage, routing conductor, coupling cap CC (MIM cap), receiver stage; low-leakage diodes act as keepers. Labels: VIN, VINB, VOUT, VDD, M7 to M12]

SLIDE 45

Low Leakage Buffer Design

  • In quiescent state, voltage of routing wire may drift
– Due primarily to leakage
  • Excessive voltage drift can lead to errors
  • Magnitude of drift controlled by diodes

[Figure: low-leakage buffer schematic (repeated)]

SLIDE 46

Low Leakage Buffer Design

  • Non-“digital” voltage levels may lead to issues in downstream circuits
  • Use capacitive coupling to isolate DC voltage of routing wire from downstream circuits (MIM cap. used as coupling cap)
  • Full latch used as receiver

[Figure: low-leakage buffer schematic (repeated)]

SLIDE 47

Potential Pitfalls?

  • Need to ensure VDTB is less than the switching threshold of downstream latch to prevent crosstalk-induced errors
  • Only operate buffers in this mode if a sufficient number of neighbours are unused

[Figure: a tristated routing driver and an active routing driver; labels VDD, VCPL, VDTB]

SLIDE 48

Robust Operation of Dynamic Gated Buffer

  • Guarantee robust operation of buffer by ensuring a certain number of adjacent wires are unused
– These act as shields
  • Assume noise on routing wires is dominated by wire-to-wire coupling
  • Assume layers above and below routing wires can adequately shield wires from noise sources above/below
  • ALL noise comes from adjacent wires

SLIDE 49

CAD Tool Support

  • Denote worst-case coupling noise on routing conductor i as NMAX(i)
  • NMAX(i) is the sum of coupling noise from all neighbours
– Each neighbour contributes (CC/CT)VDD volts of noise
– CC is the coupling cap. between neighbours, CT is total cap.
  • Let NS = max. noise that the receiver can suppress
  • Receiver must be able to distinguish between noise and data when circuit is subject to 6σ parameter variation
  • Static power reduction possible if NMAX ≤ NS
  • CAD support must:
– Route design to maximize # of conductors with NMAX ≤ NS
– Try to ensure conductors neighbouring used conductors are unused
– Optimize the mode selection of switches
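The noise test above amounts to a one-line sum per conductor. The sketch below illustrates it under stated assumptions: only used neighbours are counted as noise sources (unused ones being treated as shields, per the previous slide), and the capacitance, supply, and NS values are made-up numbers, not data from the deck.

```python
def worst_case_noise(used_neighbours, cc, ct, vdd):
    """NMAX for one conductor: each used neighbour adds (CC/CT)*VDD volts."""
    return used_neighbours * (cc / ct) * vdd


def pulsed_mode_allowed(used_neighbours, cc, ct, vdd, ns):
    """True if the receiver can suppress the worst-case coupling noise."""
    return worst_case_noise(used_neighbours, cc, ct, vdd) <= ns


# Hypothetical numbers: CC is 30% of the total cap, VDD = 1.0 V, NS = 0.35 V.
for n in (0, 1, 2):
    print(n, worst_case_noise(n, 0.3, 1.0, 1.0), pulsed_mode_allowed(n, 0.3, 1.0, 1.0, 0.35))
```

With these example values a conductor tolerates one used neighbour but not two, which is the kind of constraint the router would need to respect when placing nets on adjacent tracks.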

SLIDE 50

CAD Flow

  • Conventional VPR 6.0 packing and placement
  • Modified VPR 6.0 router: optimizes # of routing conductors with NMAX ≤ NS
  • Post-routing phase to select operating mode of switches (PS vs. Normal)

[Figure: flow diagram. Packing → Placement → PS-aware Router → Mode Selection; other labels: CIRCUIT; CC/CP (indicates magnitude of coupling noise); .net, .route, .place files; fmax, power estimate]

SLIDE 51

Methodology

  • Assessed power savings with a combination of MCNC and VTR benchmark circuits
  • Routed circuits at 1.3 x WMIN and WMAX
– WMAX is 1.1x the largest WMIN in the benchmark set
– Setting W = WMAX may be more realistic
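The two channel-width settings above can be derived from per-circuit minimum widths as follows. A small sketch with hypothetical circuit names and WMIN values; rounding up to a whole number of tracks is an assumption, since the deck does not say how fractional widths are handled.

```python
def channel_widths(wmin_by_circuit):
    """Return (per-circuit widths at 1.3 x WMIN, shared WMAX).

    Integer arithmetic with round-up avoids floating-point surprises;
    the round-up policy itself is an assumption.
    """
    scaled = {name: (13 * w + 9) // 10 for name, w in wmin_by_circuit.items()}
    w_max = (11 * max(wmin_by_circuit.values()) + 9) // 10
    return scaled, w_max


# Hypothetical minimum channel widths, for illustration only.
wmin = {"small": 40, "medium": 60, "large": 100}
scaled, w_max = channel_widths(wmin)
print(scaled, w_max)
```

Using one shared WMAX mirrors a real device, where every design is mapped onto the same fixed channel width, so small circuits end up with many more unused conductors than they would at 1.3 x their own WMIN.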

SLIDE 52

Results (W = 1.3 x WMIN)

  • 25% geomean active leakage reduction
  • 30% geomean total leakage reduction

[Figure: chart of power reduction [%] (axis 0 to 35); series: Active Leakage Reduction, Total Leakage Reduction]

SLIDE 53

Results (W = WMAX)

  • At WMAX, the number of routing conductors with unoccupied neighbours increases
  • Increases active leakage reduction
  • Total leakage reduction increase is less pronounced

[Figure: chart of power reduction [%] (axis 0 to 80); series: Active Leakage Reduction, Total Leakage Reduction]

SLIDE 54

Conclusions

  • Interconnect is the prime culprit in FPGA power
  • Presented three approaches to reduce power in FPGA interconnect, all of which leverage unused interconnect resources:
– Charge recycling
– Coupling capacitance reduction
– Pulse-based signalling
  • First two approaches target dynamic power; third approach targets leakage power
  • Future work: assess power benefits when multiple techniques are combined

SLIDE 55

Questions?