Performance/Power Trade-Offs of Bitline Isolation, Se-Hyun Yang and Babak Falsafi (PowerPoint PPT Presentation)



SLIDE 1

Performance/Power Trade-Offs of Bitline Isolation

Se-Hyun Yang and Babak Falsafi

Computer Architecture Lab at Carnegie Mellon, Electrical and Computer Engineering, Carnegie Mellon University

SLIDE 2

High Bitline Discharge in Caches

Deep submicron high-performance caches

  • Use subarrays
  • Precharge entire caches statically
  • No precharging delay exposed

Large discharge from subarrays


SLIDE 3

Bitline Isolation

Stop discharge by cutting off Vdd-bitline path

  • A.k.a. leakage biased bitlines
  • Turn off precharge devices

Need selective mechanisms to control …


SLIDE 4

Per-access Precharging Control

Ideally, the best scheme for energy saving

  • All bitlines isolated initially
  • Precharge only accessed subarrays
  • On-demand wakeup using partial decoding

Can it be done for free? Two concerns: energy cost and timeliness
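The per-access policy above can be sketched in a few lines of Python. This is a behavioral illustration only: the subarray count, the address bit positions, and the class names are invented for the sketch, not taken from the talk.

```python
# Behavioral sketch of per-access precharge control: all bitlines start
# isolated, and only the subarray picked out by a partial decode of the
# address is woken up (precharged) for each access.
# NUM_SUBARRAYS and the bit layout are illustrative assumptions.

NUM_SUBARRAYS = 8          # cache data array split into 8 subarrays

def subarray_of(address: int, offset_bits: int = 5) -> int:
    """Partial decode: a few index bits above the block offset
    identify which subarray an access will touch."""
    return (address >> offset_bits) & (NUM_SUBARRAYS - 1)

class PerAccessPrecharge:
    def __init__(self):
        # All bitlines isolated initially (precharge devices off).
        self.precharged = [False] * NUM_SUBARRAYS
        self.precharge_events = 0

    def access(self, address: int) -> None:
        s = subarray_of(address)
        if not self.precharged[s]:
            self.precharged[s] = True       # on-demand wakeup
            self.precharge_events += 1
        # The read/write proceeds here; afterwards the subarray is
        # isolated again so its bitlines cannot keep discharging.
        self.precharged[s] = False

ctrl = PerAccessPrecharge()
for addr in (0x000, 0x020, 0x020, 0x1040):
    ctrl.access(addr)
print(ctrl.precharge_events)   # one wakeup per access under this policy
```

The energy question on this slide is exactly the cost of each `precharge_events` increment; the timeliness question is whether the wakeup finishes before the wordline is needed.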


SLIDE 5

Contributions

Bitline Isolation

  • Energy: large cost in past technologies, not in future ones

Per-access control viable in the future

  • Performance: On-demand wakeup is late

Early precharging is required

  • Ideal early precharging vs. resizable caches

Large opportunity (74%) for per-access control

SLIDE 6

Methodology

CACTI 3.0 and SPICE simulations

  • 180nm/2V, 130nm/1.7V, 100nm/1.3V, 70nm/1V

Highly modified Wattch 1.0

  • 12 SPEC benchmarks
  • 8-wide issue, 64-entry issue queue with 128-entry active list
  • 32KB 2-way set associative L1 caches
SLIDE 7

Outline

  • Introduction
  • Methodology
  • Energy Overhead
  • Performance Overhead
  • Per-access Vs. Resizable Caches
  • Conclusions
SLIDE 8

Bitline Leakage in SRAM cell

Leakage occurs in all subarrays

Bitline isolation: turn off bitline devices

[Figure: SRAM cell with bitline pair (BL, BL), wordline, and Vdd-connected precharge devices]

SLIDE 9

Sources of Energy Overhead

  • Switching precharge devices
  • Charging up discharged bitlines
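The two overhead sources can be put into a rough back-of-envelope model: recharging a drooped bitline costs roughly C_bitline x Vdd x dV, and toggling the precharge gates costs roughly C_gate x Vdd^2. All capacitance and droop values below are made-up placeholders, not measurements from the paper.

```python
# Rough, illustrative energy model for the two overhead sources:
# (1) switching the precharge devices, (2) charging up discharged bitlines.
# All component values are invented placeholders for the sketch.

def recharge_energy(c_bitline_f: float, vdd: float, swing: float) -> float:
    """Energy to pull a drooped bitline back up: charge C*dV is
    delivered from the Vdd rail, so E = C * Vdd * dV."""
    return c_bitline_f * vdd * swing

def precharge_switch_energy(c_gate_f: float, vdd: float) -> float:
    """Energy to toggle the precharge device gates: ~C_gate * Vdd^2."""
    return c_gate_f * vdd ** 2

vdd = 1.0                 # 70nm operating point from the methodology slide
c_bl = 100e-15            # assumed 100 fF bitline capacitance
c_gate = 5e-15            # assumed 5 fF total precharge-gate capacitance
swing = 0.3 * vdd         # assumed bitline droop while isolated

overhead = recharge_energy(c_bl, vdd, swing) + precharge_switch_energy(c_gate, vdd)
print(f"{overhead * 1e15:.1f} fJ per isolated-subarray wakeup")
```

The model makes the trade-off on the next slide visible: the overhead shrinks with Vdd and with the droop accumulated between accesses, which is why isolation pays off more in future technology generations.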

SLIDE 10

Implications

What affects energy overhead?

  • CMOS technology: Relatively larger wire cap
  • Precharge device size

Resistive load between Vdd and bitlines on cell read vs. fast pull-up

  • Subarray size
  • Discharging time: average cache access interval
SLIDE 11

Energy Overhead: Results

Bitline isolation is energy-effective in the future

[Chart: relative average power (0.5 to 2.0) of static pullup vs. bitline isolation at 180nm, 130nm, 100nm, and 70nm, for 200ns and 400ns intervals between two subarray accesses]

SLIDE 12

Performance Impact

On-demand precharging

  • Precharge only accessed subarrays
  • On-demand wakeup using partial decoding

[Timing diagram: full address decoding overlapped with partial address decoding; bitline precharging triggered by the partial decode, with the question of whether it completes before wordline assertion]

SLIDE 13

Cache Decoder Architecture

[Figure: three-stage cache decoder; 3-to-8 address predecoders in stages 1 and 2 feed four subarray decoders in stage 3]
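A small functional sketch shows how an early 3-to-8 stage of such a decoder yields the "partial decoding" used for wakeup: a few high index bits select one subarray before the full row decode finishes. The bit widths and positions here are assumptions for illustration, not the paper's actual layout.

```python
# Illustrative model of partial address decoding in a staged decoder:
# an early 3-to-8 stage consumes 3 index bits to one-hot select one of
# 8 subarray decoders, well before full row decode completes.
# Bit positions are assumptions for the sketch.

def three_to_eight(bits: int) -> list[int]:
    """One-hot output of a 3-to-8 decoder."""
    assert 0 <= bits < 8
    return [1 if i == bits else 0 for i in range(8)]

def decode(index: int) -> tuple[int, int]:
    """Split a cache index into (subarray select, row within subarray).
    The subarray select is available after the first predecoder stage,
    which is what enables early precharge wakeup."""
    subarray = (index >> 5) & 0x7     # assumed: top 3 index bits
    row = index & 0x1F                # assumed: 5 row bits per subarray
    return subarray, row

sel, row = decode(0b101_01100)
print(three_to_eight(sel), row)   # subarray 5 one-hot selected, row 12
```

The point made on the next slide follows directly: with more subarrays, more bits are needed for the select, pushing the partial decode deeper into the decoding path.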

SLIDE 14

Implications

What affects the delay?

  • Precharging delay

CMOS technology (longer wire delay) and subarray size

  • Partial address decoding

# of subarrays: more bits needed for identifying the subarray

SLIDE 15

Performance Impact: Results

Early precharging is desirable

Subarray size   Feature size (nm)   Stage 3 delay (ns)   Bitline precharge (ns)
1KB 32-row      180                 0.15                 0.39
                130                 0.13                 0.31
                100                 0.09                 0.24
                70                  0.06                 0.16
4KB 128-row     180                 0.18                 0.50
                130                 0.13                 0.36
                100                 0.10                 0.28
                70                  0.07                 0.19

SLIDE 16

Per-Access Vs. Resizable Caches

Resizable caches [Albonesi] [Yang et al.]

  • Monitor/Adapt infrequently
  • Energy/time overhead amortized in large interval

Important in the past, not in the future

  • Possibly suboptimal control

Coarse-grain; less sensitive
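The interval-based control style described on this slide can be sketched as follows; the sizes, thresholds, and interval length are invented for illustration, and the real schemes in [Albonesi] and [Yang et al.] use their own mechanisms and metrics.

```python
# Minimal sketch of coarse-grain, interval-based resizable-cache control:
# monitor behavior over a long interval so the monitoring/resizing
# overhead is amortized, then pick a size for the next interval.
# Sizes, thresholds, and INTERVAL are illustrative assumptions.

SIZES_KB = [8, 16, 32]        # allowed cache configurations
INTERVAL = 100_000            # accesses between resize decisions

class ResizableCache:
    def __init__(self):
        self.size_idx = len(SIZES_KB) - 1   # start at full size
        self.accesses = self.misses = 0

    def record(self, miss: bool) -> None:
        self.accesses += 1
        self.misses += miss
        if self.accesses == INTERVAL:
            self._adapt()

    def _adapt(self) -> None:
        miss_rate = self.misses / self.accesses
        if miss_rate > 0.05 and self.size_idx < len(SIZES_KB) - 1:
            self.size_idx += 1              # too many misses: grow
        elif miss_rate < 0.01 and self.size_idx > 0:
            self.size_idx -= 1              # plenty of slack: shrink
        self.accesses = self.misses = 0     # start a new interval

cache = ResizableCache()
for _ in range(INTERVAL):
    cache.record(miss=False)                # a miss-free interval
print(SIZES_KB[cache.size_idx], "KB")       # control shrinks one step
```

The contrast with per-access control is visible in the structure: decisions happen once per `INTERVAL` rather than once per access, which is cheap but coarse-grain and possibly suboptimal.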

SLIDE 17

[Chart: reduction (%) in bitline discharge, 20 to 100, for ammp, applu, apsi, compress, gcc, ijpeg, m88ksim, su2cor, swim, tomcatv, vortex, vpr, and the average]

Opportunity

  • 74% opportunity for instruction caches
  • 70nm technology
SLIDE 18

Comparison: Resizable Caches

Resizable caches: consistent over technologies

Per-access control: captures the opportunity

[Chart: reduction (%) in bitline discharge, 10 to 70, for perfect prediction vs. resizable cache at 180nm, 130nm, 100nm, and 70nm]

SLIDE 19

Conclusions

  • Smaller energy overhead in the future

Per-access fine control viable in the future

  • On-demand wakeup is late

Early precharging to avoid performance hit

  • 74% opportunity for per-access control for 70nm

Significantly less opportunity in the past

Resizable caches good for all generations

SLIDE 20

For more information

PowerTap Project
http://www.ece.cmu.edu/~powertap

Computer Architecture Lab, Carnegie Mellon University