Performance/Power Trade-Offs of Bitline Isolation, Se-Hyun Yang and Babak Falsafi (PowerPoint PPT Presentation)



SLIDE 1

Performance/Power Trade-Offs of Bitline Isolation

Se-Hyun Yang and Babak Falsafi

Computer Architecture Lab at Carnegie Mellon, Electrical and Computer Engineering, Carnegie Mellon University

SLIDE 2

High Bitline Discharge in Caches

Deep submicron high-performance caches

  • Use subarrays
  • Precharge entire caches statically
  • No precharging delay exposed

Large discharge from subarrays


SLIDE 3

Bitline Isolation

Stop discharge by cutting off Vdd-bitline path

  • A.k.a. leakage biased bitlines
  • Turn off precharge devices

Need selective mechanisms to control …


SLIDE 4

Per-access Precharging Control

Ideally, the best scheme for energy saving

  • All bitlines isolated initially
  • Precharge only accessed subarrays
  • On-demand wakeup using partial decoding

Can it be done for free? Two concerns: energy cost and timeliness
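The per-access policy above can be sketched in a few lines of Python. This is a behavioral illustration only: the subarray count, the address bit positions, and the class names are invented for the sketch, not taken from the talk.

```python
# Behavioral sketch of per-access precharge control: all bitlines start
# isolated, and only the subarray picked out by a partial decode of the
# address is woken up (precharged) for each access.
# NUM_SUBARRAYS and the bit layout are illustrative assumptions.

NUM_SUBARRAYS = 8          # cache data array split into 8 subarrays

def subarray_of(address: int, offset_bits: int = 5) -> int:
    """Partial decode: a few index bits above the block offset
    identify which subarray an access will touch."""
    return (address >> offset_bits) & (NUM_SUBARRAYS - 1)

class PerAccessPrecharge:
    def __init__(self):
        # All bitlines isolated initially (precharge devices off).
        self.precharged = [False] * NUM_SUBARRAYS
        self.precharge_events = 0

    def access(self, address: int) -> None:
        s = subarray_of(address)
        if not self.precharged[s]:
            self.precharged[s] = True       # on-demand wakeup
            self.precharge_events += 1
        # The read/write proceeds here; afterwards the subarray is
        # isolated again so its bitlines cannot keep discharging.
        self.precharged[s] = False

ctrl = PerAccessPrecharge()
for addr in (0x000, 0x020, 0x020, 0x1040):
    ctrl.access(addr)
print(ctrl.precharge_events)   # one wakeup per access under this policy
```

The energy question on this slide is exactly the cost of each `precharge_events` increment; the timeliness question is whether the wakeup finishes before the wordline is needed.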


SLIDE 5

Contributions

Bitline Isolation

  • Energy: large cost in past technologies, not in future ones

Per-access control viable in the future

  • Performance: On-demand wakeup is late

Early precharging is required

  • Ideal early precharging vs. resizable caches

Large opportunity (74%) for per-access control

SLIDE 6

Methodology

CACTI 3.0 and SPICE simulations

  • 180nm/2V, 130nm/1.7V, 100nm/1.3V, 70nm/1V

Highly modified Wattch 1.0

  • 12 SPEC benchmarks
  • 8-wide issue, 64-entry issue queue with 128-entry active list
  • 32KB 2-way set associative L1 caches
SLIDE 7

Outline

  • Introduction
  • Methodology
  • Energy Overhead
  • Performance Overhead
  • Per-access Vs. Resizable Caches
  • Conclusions
SLIDE 8

Bitline Leakage in SRAM cell

Leakage occurs in all subarrays

Bitline isolation: turn off bitline devices

[Figure: SRAM cell with bitline pair (BL, BL), wordline, and Vdd-connected precharge devices]

SLIDE 9

Sources of Energy Overhead

  • Switching precharge devices
  • Charging up discharged bitlines
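The two overhead sources can be put into a rough back-of-envelope model: recharging a drooped bitline costs roughly C_bitline x Vdd x dV, and toggling the precharge gates costs roughly C_gate x Vdd^2. All capacitance and droop values below are made-up placeholders, not measurements from the paper.

```python
# Rough, illustrative energy model for the two overhead sources:
# (1) switching the precharge devices, (2) charging up discharged bitlines.
# All component values are invented placeholders for the sketch.

def recharge_energy(c_bitline_f: float, vdd: float, swing: float) -> float:
    """Energy to pull a drooped bitline back up: charge C*dV is
    delivered from the Vdd rail, so E = C * Vdd * dV."""
    return c_bitline_f * vdd * swing

def precharge_switch_energy(c_gate_f: float, vdd: float) -> float:
    """Energy to toggle the precharge device gates: ~C_gate * Vdd^2."""
    return c_gate_f * vdd ** 2

vdd = 1.0                 # 70nm operating point from the methodology slide
c_bl = 100e-15            # assumed 100 fF bitline capacitance
c_gate = 5e-15            # assumed 5 fF total precharge-gate capacitance
swing = 0.3 * vdd         # assumed bitline droop while isolated

overhead = recharge_energy(c_bl, vdd, swing) + precharge_switch_energy(c_gate, vdd)
print(f"{overhead * 1e15:.1f} fJ per isolated-subarray wakeup")
```

The model makes the trade-off on the next slide visible: the overhead shrinks with Vdd and with the droop accumulated between accesses, which is why isolation pays off more in future technology generations.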

SLIDE 10

Implications

What affects energy overhead?

  • CMOS technology: Relatively larger wire cap
  • Precharge device size

Resistive load between Vdd and bitlines on cell read vs. fast pull-up

  • Subarray size
  • Discharging time: average cache access interval
SLIDE 11

Energy Overhead: Results

Bitline isolation is energy-effective in the future

[Chart: relative average power (0.5 to 2.0) of static pullup vs. bitline isolation at 180nm, 130nm, 100nm, and 70nm, for 200ns and 400ns intervals between two subarray accesses]

SLIDE 12

Performance Impact

On-demand precharging

  • Precharge only accessed subarrays
  • On-demand wakeup using partial decoding

[Timing diagram: full address decoding overlapped with partial address decoding; bitline precharging triggered by the partial decode, with the question of whether it completes before wordline assertion]

SLIDE 13

Cache Decoder Architecture

[Figure: three-stage cache decoder; 3-to-8 address predecoders in stages 1 and 2 feed four subarray decoders in stage 3]
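A small functional sketch shows how an early 3-to-8 stage of such a decoder yields the "partial decoding" used for wakeup: a few high index bits select one subarray before the full row decode finishes. The bit widths and positions here are assumptions for illustration, not the paper's actual layout.

```python
# Illustrative model of partial address decoding in a staged decoder:
# an early 3-to-8 stage consumes 3 index bits to one-hot select one of
# 8 subarray decoders, well before full row decode completes.
# Bit positions are assumptions for the sketch.

def three_to_eight(bits: int) -> list[int]:
    """One-hot output of a 3-to-8 decoder."""
    assert 0 <= bits < 8
    return [1 if i == bits else 0 for i in range(8)]

def decode(index: int) -> tuple[int, int]:
    """Split a cache index into (subarray select, row within subarray).
    The subarray select is available after the first predecoder stage,
    which is what enables early precharge wakeup."""
    subarray = (index >> 5) & 0x7     # assumed: top 3 index bits
    row = index & 0x1F                # assumed: 5 row bits per subarray
    return subarray, row

sel, row = decode(0b101_01100)
print(three_to_eight(sel), row)   # subarray 5 one-hot selected, row 12
```

The point made on the next slide follows directly: with more subarrays, more bits are needed for the select, pushing the partial decode deeper into the decoding path.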

SLIDE 14

Implications

What affects the delay?

  • Precharging delay

CMOS technology (longer wire delay) and subarray size

  • Partial address decoding

# of subarrays: more bits needed for identifying the subarray

SLIDE 15

Performance Impact: Results

Early precharging is desirable

Subarray size   Feature size (nm)   Stage 3 delay (ns)   Bitline precharge (ns)
1KB 32-row      180                 0.15                 0.39
                130                 0.13                 0.31
                100                 0.09                 0.24
                70                  0.06                 0.16
4KB 128-row     180                 0.18                 0.50
                130                 0.13                 0.36
                100                 0.10                 0.28
                70                  0.07                 0.19

SLIDE 16

Per-Access Vs. Resizable Caches

Resizable caches [Albonesi] [Yang et al.]

  • Monitor/Adapt infrequently
  • Energy/time overhead amortized in large interval

Important in the past, not in the future

  • Possibly suboptimal control

Coarse-grain; less sensitive
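The interval-based control style described on this slide can be sketched as follows; the sizes, thresholds, and interval length are invented for illustration, and the real schemes in [Albonesi] and [Yang et al.] use their own mechanisms and metrics.

```python
# Minimal sketch of coarse-grain, interval-based resizable-cache control:
# monitor behavior over a long interval so the monitoring/resizing
# overhead is amortized, then pick a size for the next interval.
# Sizes, thresholds, and INTERVAL are illustrative assumptions.

SIZES_KB = [8, 16, 32]        # allowed cache configurations
INTERVAL = 100_000            # accesses between resize decisions

class ResizableCache:
    def __init__(self):
        self.size_idx = len(SIZES_KB) - 1   # start at full size
        self.accesses = self.misses = 0

    def record(self, miss: bool) -> None:
        self.accesses += 1
        self.misses += miss
        if self.accesses == INTERVAL:
            self._adapt()

    def _adapt(self) -> None:
        miss_rate = self.misses / self.accesses
        if miss_rate > 0.05 and self.size_idx < len(SIZES_KB) - 1:
            self.size_idx += 1              # too many misses: grow
        elif miss_rate < 0.01 and self.size_idx > 0:
            self.size_idx -= 1              # plenty of slack: shrink
        self.accesses = self.misses = 0     # start a new interval

cache = ResizableCache()
for _ in range(INTERVAL):
    cache.record(miss=False)                # a miss-free interval
print(SIZES_KB[cache.size_idx], "KB")       # control shrinks one step
```

The contrast with per-access control is visible in the structure: decisions happen once per `INTERVAL` rather than once per access, which is cheap but coarse-grain and possibly suboptimal.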

SLIDE 17

[Chart: reduction (%) in bitline discharge, 20 to 100, for ammp, applu, apsi, compress, gcc, ijpeg, m88ksim, su2cor, swim, tomcatv, vortex, vpr, and the average]

Opportunity

  • 74% opportunity for instruction caches
  • 70nm technology
SLIDE 18

Comparison: Resizable Caches

Resizable caches: consistent over technologies

Per-access control: captures the opportunity

[Chart: reduction (%) in bitline discharge, 10 to 70, for perfect prediction vs. resizable cache at 180nm, 130nm, 100nm, and 70nm]

SLIDE 19

Conclusions

  • Smaller energy overhead in the future

Per-access fine control viable in the future

  • On-demand wakeup is late

Early precharging to avoid performance hit

  • 74% opportunity for per-access control for 70nm

Significantly less opportunity in the past

Resizable caches good for all generations

SLIDE 20

For more information

PowerTap Project
http://www.ece.cmu.edu/~powertap

Computer Architecture Lab, Carnegie Mellon University