Power Optimization of FPGA Interconnect Via Circuit and CAD Techniques
Safeen Huda and Jason Anderson
International Symposium on Physical Design
Santa Rosa, CA, April 6, 2016

Motivation
* Image: Google data centre
• FPGA power is increasingly critical because of new markets
  – Data centers
  – Mobile electronics

Motivation
• FPGAs typically have underutilized wires
• We ask: can we take advantage of unused wires?
• This work: 3 techniques to reduce power with unused wires
  – Charge recycling (dynamic)
  – Effective capacitance reduction (dynamic)
  – Pulse-based signalling (static)

Dynamic Power Reduction Techniques

Motivation
* Figure taken from [Tuan07]
• Routing power is the prime component of FPGA dynamic power

Charge Recycling in FPGA Interconnect

Dynamic Power in Conventional Circuits
[Figure: load C_L charged from 0 to V_DD; C_L·V_DD coulombs of charge drawn from the supply]
• Switching from "0" to "1" draws C_L·V_DD^2 joules from the supply

Dynamic Power in Conventional Circuits
[Figure: load C_L discharged from V_DD to 0; C_L·V_DD coulombs of charge dissipated]
• All of the energy stored in C_L is dissipated
• Can we use the energy that is being dissipated?
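
The energy bookkeeping behind the two slides above can be written out as a short derivation. This is the standard textbook result for charging a capacitor through a resistive pull-up, not something taken from the slides; the subscripted names (supply, stored, pull-up, pull-down) are labels introduced here for illustration.

```latex
\begin{align}
Q_{\text{supply}}  &= C_L V_{DD} \\
E_{\text{supply}}  &= Q_{\text{supply}} \, V_{DD} = C_L V_{DD}^2 \\
E_{\text{stored}}  &= \tfrac{1}{2} C_L V_{DD}^2 \\
E_{\text{pull-up}} &= E_{\text{supply}} - E_{\text{stored}} = \tfrac{1}{2} C_L V_{DD}^2 \\
E_{\text{pull-down}} &= E_{\text{stored}} = \tfrac{1}{2} C_L V_{DD}^2
\end{align}
```

Half of the energy drawn on the "0" to "1" transition is lost in the pull-up network, and the stored half is lost in the pull-down network on the following "1" to "0" transition.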

Charge Recycling (CR) Concept: Initial Phase ("1" → "0" transition)
[Figure: C_L at V_DD; reservoir C_R at 0]
• During a "1" → "0" transition, the output starts at V_DD
• Pull-down network (PDN) disconnected, pull-up network (PUN) connected

Charge Recycling (CR) Concept: Charge Recovery Phase
[Figure: C_L and C_R both at V_DD/2; assumes C_L = C_R]
• PUN and PDN disconnected, C_L connected to C_R
• ½·C_L·V_DD coulombs of charge transferred

Charge Recycling (CR) Concept: Final Phase
[Figure: C_L at 0; C_R at V_DD/2]
• PDN connected, PUN disconnected
• Output pulled to GND, ½·C_L·V_DD coulombs dissipated

Charge Recycling (CR) Concept: Initial Phase ("0" → "1" transition)
[Figure: C_L at 0; C_R at V_DD/2]
• Output initially at GND during a "0" → "1" transition
• ½·C_L·V_DD coulombs stored in C_R

Charge Recycling (CR) Concept: Charge Recycling Phase
[Figure: C_L and C_R both at V_DD/4; assumes C_L = C_R]
• PUN and PDN disconnected, C_L connected to C_R
• ¼·C_L·V_DD coulombs of charge transferred to C_L

Charge Recycling (CR) Concept: Final Phase
[Figure: C_L at V_DD; C_R at V_DD/4]
• ¾·C_L·V_DD coulombs of charge drawn from supply
• Implies 25% reduction in energy consumption
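
The recovery and recycling phases above amount to two charge-sharing steps, which can be checked with a few lines of arithmetic. The sketch below is only an idealized model: it assumes C_L = C_R, an initially empty reservoir, and lossless charge sharing. The function names `share_charge` and `cr_cycle` are made up for this illustration.

```python
def share_charge(v_a, v_b, c_a, c_b):
    """Common voltage after two capacitors are connected (charge is conserved)."""
    return (c_a * v_a + c_b * v_b) / (c_a + c_b)


def cr_cycle(v_dd=1.0, c_l=1.0, c_r=1.0, v_reservoir=0.0):
    """One "1" -> "0" -> "1" cycle.

    Returns (charge drawn from the supply, final reservoir voltage).
    """
    # "1" -> "0", recovery phase: the load (at v_dd) shares charge with the reservoir.
    v_reservoir = share_charge(v_dd, v_reservoir, c_l, c_r)
    # Final phase: the load is pulled to GND; the reservoir keeps its charge.
    v_load = 0.0
    # "0" -> "1", recycling phase: the reservoir shares charge back with the load.
    v_load = share_charge(v_load, v_reservoir, c_l, c_r)
    v_reservoir = v_load
    # Final phase: the supply tops the load up from v_load to v_dd.
    q_supply = c_l * (v_dd - v_load)
    return q_supply, v_reservoir


q_cr, v_left = cr_cycle()
q_conventional = 1.0 * 1.0  # C_L * V_DD with the default unit values
saving = 1.0 - q_cr / q_conventional
print(f"charge from supply: {q_cr:.2f} C_L*V_DD, saving: {saving:.0%}")
```

With the default unit values this prints a supply charge of 0.75 C_L·V_DD and a 25% saving, matching the figures on the slide.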

Observations
• We can reduce power if reservoir capacitors are available
  – Use unused wires as reservoirs!
• CR requires a complex set of steps: is there an area penalty to implement it?
• FPGA routing circuits are big to begin with
  – Large routing multiplexers
  – Several SRAM cells
  – Large output buffers to drive long capacitive wires
• Incremental area overhead of complex circuitry may not be too bad …

CR in FPGAs
• Designs on FPGAs typically have paths with lots of slack
  – Can trade off the delay of these paths for power savings using CR
• Opportunity: [Anderson09] showed that 75% of switches in a design can be slowed down by 50%
• Target CR in FPGA routing switches
[Figure: routing switch with inputs, driving a routing wire; the output buffer is the target for charge recovery/recycling]

Proposed FPGA Routing Arch.
[Figure: four CLBs and a switch block (SB), with CR buffers and "friend" conductors (2-way sharing)]

CR Routing Buffer
[Figure: CR buffer schematic with gating circuitry, a delay line (V_IN to V_IN_D), a CR circuit, SRAM cells for CR and TS, and an unused routing conductor acting as the reservoir (C_R = C_WIRE)]
• CR sets the state of the buffer: CR mode vs. Normal mode
• TS sets one of two "friend" buffers in tristate mode

Functional Simulation
[Figure: node voltage [V] vs. time [ns] for the output and reservoir nodes, with the recovery and recycling phases marked]
• Simulated in ST65 process
• Approx. 26% power reduction
  – Theoretical reduction of 33%, less circuit overheads
• Assuming 200 fF interconnect load
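
One way to see how a theoretical figure of 33% can arise, as opposed to the 25% of a single cycle starting from an empty reservoir, is to repeat the charge-sharing steps while letting the reservoir keep its residual charge. The sketch below assumes exactly that, plus C_L = C_R and lossless sharing. The slides do not spell out this derivation, so treat it as an illustration; `steady_state_saving` is a hypothetical name.

```python
def steady_state_saving(cycles=30, v_dd=1.0, c=1.0):
    """Fractional saving in supply charge after `cycles` full cycles.

    Assumes C_L = C_R = c and a reservoir that is never reset.
    """
    v_res = 0.0
    q_supply = c * v_dd
    for _ in range(cycles):
        # Recovery phase ("1" -> "0"): load at v_dd shares with the reservoir.
        v_res = (v_dd + v_res) / 2.0
        # Recycling phase ("0" -> "1"): reservoir shares with the grounded load.
        v_load = (0.0 + v_res) / 2.0
        v_res = v_load
        # The supply only has to provide the remaining charge.
        q_supply = c * (v_dd - v_load)
    return 1.0 - q_supply / (c * v_dd)


first_cycle = steady_state_saving(cycles=1)
steady = steady_state_saving(cycles=30)
print(f"first cycle: {first_cycle:.3f}, steady state: {steady:.3f}")
```

Under these assumptions the saving starts at 0.25 and converges to 1/3, because the load is pre-charged to V_DD/3 rather than V_DD/4 once the reservoir settles.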

CAD Tool Support
• Power can be reduced for a routing switch if:
  – 1) "Friend" conductor is unoccupied
  – 2) Switch lies along a path with sufficient slack
• To optimize CR in FPGAs, we need CAD which:
  – Maximizes the availability of free reservoirs for nets with high activity and sufficient slack
  – Optimizes the mode selection of switches
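
The two eligibility conditions above can be expressed as a simple per-switch check. The sketch below is a toy illustration, not the authors' CAD algorithm: the `Switch` fields, the per-switch slack value, and the `select_cr_switches` helper are all hypothetical. A real mode-selection pass would have to budget slack across every switch on a path rather than test each switch in isolation.

```python
from dataclasses import dataclass


@dataclass
class Switch:
    name: str              # hypothetical identifier for a routing switch
    friend_free: bool      # condition 1: the "friend" conductor is unoccupied
    slack_ns: float        # slack available to this switch (simplified, per switch)
    cr_penalty_ns: float   # extra delay if the switch runs in CR mode


def can_use_cr(sw: Switch) -> bool:
    """Both conditions from the slide: free reservoir and sufficient slack."""
    return sw.friend_free and sw.slack_ns >= sw.cr_penalty_ns


def select_cr_switches(switches):
    """Names of switches that could be put in CR mode under this toy model."""
    return [sw.name for sw in switches if can_use_cr(sw)]


switches = [
    Switch("sw0", friend_free=True, slack_ns=2.0, cr_penalty_ns=0.5),
    Switch("sw1", friend_free=False, slack_ns=2.0, cr_penalty_ns=0.5),
    Switch("sw2", friend_free=True, slack_ns=0.1, cr_penalty_ns=0.5),
]
selected = select_cr_switches(switches)
print(selected)
```

Only `sw0` satisfies both conditions here: `sw1` has an occupied friend conductor and `sw2` lacks slack.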

CAD Flow
[Figure: flow from CIRCUIT through Packing, Placement, CR-aware Router and Switch Mode Selection; other labelled inputs are net activities, % CR-capable switches and a timing constraint]
• Packing and placement: conventional VPR 6.0 packing and placement
• CR-aware router: modified VPR 6.0 router; optimizes availability of free reservoirs
• Switch mode selection: post-routing phase to select operating mode of switches (CR vs. Normal)
• Outputs: .net, .route and .place files, f_max, power estimate

Results
[Figure: results comparing the baseline architecture with the architecture with CR capability, including the increase in critical path]
• Arch. with 100% CR-capable switches
• Best case: 1.3% degradation in critical path (CP) delay
  – Due to increased delay of CR-capable switches
• Extra ~3% power reduction as delay constraints are relaxed

Effective Interconnect Capacitance Reduction

VLSI Wire Capacitance
[Figure: wires on metal layers M3, M4 and M5, with coupling capacitances C_C and plate capacitances C_P]
• Wire capacitance consists of:
  – Coupling capacitance (C_C): between adjacent wires on the same layer
  – Plate capacitance (C_P): between adjacent wires on different layers
• Due to the aspect ratio of wires, C_C is dominant

Wire Capacitance Optimization in ASICs (1)
[Figure: nets i, j, k with wire widths w_1, w_2, w_3 and spacings s_1 to s_4 inside a total channel width W]
• In ASICs, we have the freedom to optimize wire width and spacing
• Can optimize w_i and s_i to improve timing and minimize power
• Optimize w_i and s_i subject to Σ w_i + Σ s_i = W

Wire Capacitance Optimization in ASICs (2)
[Figure: nets i, j, k with widths w_1, w_2, w_3 and spacings s_2, s_3 inside a total channel width W]
• If net j is timing/power critical:
  – Can increase s_2 and s_3 to reduce C_C
  – Reduces capacitance on net j, improves speed and reduces power
• Can also optimize w_1, w_2, w_3 for speed and power

In FPGAs?
[Figure: Routing Option 1 and Routing Option 2 for nets i, j, k, showing used and unused conductors]
• FPGA wiring is prefabricated; width and spacing are fixed
• Can't space used wires apart: unused wires are in the way
• Capacitance on wires is the same in the two routing options
  – Despite the fact that nets i, j, k are now spaced further apart

Wire Cap. Optimization (1)
[Figure: Routing Conductors 1, 2 and 3 with inputs IN_1 and IN_2, coupling capacitances C_C1 = C_C and C_C2, and plate capacitances C_P; equivalent circuit showing Z_IN(s), C_C1, C_C2 + C_P and R_EQ]
• What's the total impedance seen by Routing Conductor 1, looking towards Routing Conductor 2?

Wire Cap. Optimization (2)
[Figure: equivalent circuit for Z_IN(s) with C_C1 = C_C, C_C2 + C_P and R_EQ]
• If R_EQ is small, capacitor C_C2 + C_P is shorted out
• Impedance looking towards Routing Conductor 2 is that of the capacitor C_C

Wire Cap. Optimization (3)
[Figure: equivalent circuit for Z_IN(s) with C_C1 = C_C, C_C2 + C_P and R_EQ]
• If R_EQ is large, we approximate it as an open circuit
• Z_IN is equal to the series combination of C_C and C_C2 + C_P
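
The two limiting cases above can be checked numerically by evaluating Z_IN at one frequency and converting its reactance back to an equivalent capacitance. The sketch below assumes the equivalent circuit described on these slides: C_C in series with the parallel combination of R_EQ and C_C2 + C_P. The component values and the function names `z_in` and `effective_cap` are illustrative and do not come from the slides.

```python
import math


def z_in(freq_hz, c_c, c_g, r_eq):
    """Impedance of c_c in series with (r_eq in parallel with c_g)."""
    s = 1j * 2.0 * math.pi * freq_hz
    z_neighbour = r_eq / (1.0 + s * r_eq * c_g)  # r_eq || 1/(s*c_g)
    return 1.0 / (s * c_c) + z_neighbour


def effective_cap(freq_hz, c_c, c_g, r_eq):
    """Equivalent series capacitance seen at freq_hz (from the reactance of z_in)."""
    omega = 2.0 * math.pi * freq_hz
    return -1.0 / (omega * z_in(freq_hz, c_c, c_g, r_eq).imag)


C_C = 100e-15   # coupling capacitance to the neighbour (illustrative value)
C_G = 150e-15   # C_C2 + C_P on the neighbouring conductor (illustrative value)
F = 1e9         # evaluation frequency in Hz (illustrative value)

cap_driven = effective_cap(F, C_C, C_G, r_eq=0.0)      # neighbour strongly driven
cap_tristated = effective_cap(F, C_C, C_G, r_eq=1e12)  # neighbour effectively open
cap_series = C_C * C_G / (C_C + C_G)
print(f"driven: {cap_driven * 1e15:.1f} fF, tristated: {cap_tristated * 1e15:.1f} fF")
```

With these values the driven case gives back C_C (100 fF), while the high-resistance case approaches the series combination (60 fF).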

Wire Cap. Optimization (4)
• Series combinations of capacitors result in reduced capacitance:
  – If C_1 is in series with C_2, the equivalent capacitance is C_eq = C_1·C_2 / (C_1 + C_2) < C_1
• Therefore we can reduce capacitance if R_EQ is large enough
• Making R_EQ large is bad …
  – Buffer delay ~ R_EQ·C_wire, so an increase in R_EQ increases delay
• What if we made R_EQ large only for unused conductors?
  – Would not result in increased delay of used conductors
  – Neighbouring used conductors would see the benefit of reduced capacitance
• Need to be able to set R_EQ large for unused conductors, but small for used conductors
  – Use tri-state buffers!

This Work
[Figure: nets i, j, k on used conductors, with tristated unused conductors between them; nets i and j still see reduced coupling capacitance]
• If intermediate wires are tristated, used wires see reduced C_C
• In this work we tristate unused wires to reduce wire capacitance
  – Proposed a novel, lightweight tri-state buffer (TSB) topology
  – Used similar CAD techniques to the CR work (won't cover in this talk)

Proposed Tristate Buffer

Traditional Tri-state Buffers
[Figure: schematic with transistors M1 to M6, inputs IN and TS, output OUT]
• Header transistor M5 cuts off the pull-up path to the output
• Unused buffer would have IN at V_DD
  – M1 pulls the gate of M6 to GND
• Large area cost: M2, M4 and M5 must be big due to stacking

Optimized Headerless TSB
[Figure: schematic with transistors M1 to M5 and M7 to M9, inputs IN and TS, output OUT]
• No stacking in output stage
• Leverages the fact that unused buffers have their input pulled high (details in paper)