Data Criticality in Network-On-Chip Design
Joshua San Miguel, Natalie Enright Jerger
Network-On-Chip Efficiency
Efficiency is the ability to produce results with the least amount of waste.
- Wasted time
NoC accounts for 30-70% of on-chip data access latency
[Z. Li, HPCA 2009][A. Sharifi, MICRO 2012]
- Wasted energy
NoC accounts for 20-30% of total chip power
[J. D. Owens, IEEE Micro 2007][S. R. Vangal, IEEE JSSC 2008]
[Figure: a load request and the data response that answers it.]
Minimize wasted time
- Deliver data no later than needed
[Figure: a load request for word 0x2 and a data response carrying words 0 through 7.]
Minimize wasted energy
- Deliver data no earlier than needed
Why store data in blocks of multiple words?
- Exploit spatial locality in applications
- Avoid large tag arrays in caches
- Improve row buffer utilization in DRAM
Store data at a coarse granularity, but move data at a fine granularity.
[Figure: arrival-time analogy showing the times 01:00, 03:00, 02:00 and 04:00.]
Arrive no later than needed. But expensive. Wasted money since some arrive too early.
Spend just enough money to arrive both no later and no earlier than needed.
Deliver data both no later and no earlier than needed
- Design for data criticality
Outline
Defining Criticality
- Data Criticality
- Data Liveness
Measuring Criticality
- Energy Wasted
Addressing Criticality
- NoCNoC
Data Criticality
Data Criticality is the promptness with which an application uses a data word after fetching it from memory.
- Critical: used immediately after being fetched.
- Non-critical: used some time later after being fetched.
Data Criticality – blackscholes
for (i++) {
  ... = BlkSchlsEqEuroNoDiv(sptprice[i]);
}
[Figure: the words of one sptprice cache block as i advances from 0 to 15, annotated with criticality.]
Data Criticality – fluidanimate
for (iparNeigh++) {
  if (borderNeigh) {
    pthread_mutex_lock();
    neigh->a[iparNeigh] -= ...;
    pthread_mutex_unlock();
  } else {
    neigh->a[iparNeigh] -= ...;
  }
}
Data Criticality
Data criticality is an inherent consequence of spatial locality and is exhibited by most (if not all) real-world applications.
Examples of non-criticality:
- Long-running code between accesses
- Interference due to thread synchronization
- Dependences from other cache misses
- Preemption by the operating system
Data Criticality vs. Instruction Criticality
[Figure: four load misses (A, B, C and D), shown first under instruction (or packet) criticality and then under data criticality.]
Data Liveness
Data Liveness describes whether or not an application uses a data word at all after fetching it from memory.
- Live-on-arrival (live): used at least once during its cache lifetime.
- Dead-on-arrival (dead): never used during its cache lifetime.
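The live/dead distinction can be illustrated with a small sketch. The function name, the set-of-indices trace format and the example block are hypothetical, chosen only to show the classification.

```python
def classify_liveness(words_in_block, accessed_words):
    """Label each word of a fetched block as 'live' or 'dead'.

    accessed_words holds the indices of words used at least once during the
    block's cache lifetime (a hypothetical trace input).
    """
    return ["live" if w in accessed_words else "dead" for w in range(words_in_block)]


# Hypothetical 16-word block in which only words 0, 3 and 6 were ever used.
labels = classify_liveness(16, {0, 3, 6})
dead_fraction = labels.count("dead") / len(labels)
print(labels[:4], dead_fraction)  # ['live', 'dead', 'dead', 'live'] 0.8125
```

In this made-up block, 13 of the 16 fetched words are dead-on-arrival.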
Data Liveness – fluidanimate
for (iz++)
  for (iy++)
    for (ix++)
      for (j++) {
        if (border(ix)) ... = cell->v[j].x;
        if (border(iy)) ... = cell->v[j].y;
        if (border(iz)) ... = cell->v[j].z;
      }
[Figure: cell->v laid out as repeating x, y, z words.]
Data Liveness
Data liveness measures the degree of spatial locality in an application.
Examples of dead words:
- Unused members of structs
- Irregular or random access patterns
- Heap fragmentation
- Padding between data elements
- Early evictions due to invalidations, cache pressure or poor replacement policies
Measuring Criticality
[Figure: timeline from load miss A, through fetch A[i], to use A[i], with fetch latency and access latency marked.]
$\text{non-criticality} = \dfrac{\text{access latency}}{\text{fetch latency}}$
1x for critical words, >1x for non-critical words
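As a rough illustration of this ratio, the sketch below computes non-criticality from two latencies measured from the load miss. The function name, the cycle units and the example values are assumptions, not taken from the talk.

```python
def non_criticality(fetch_latency, access_latency):
    """Return access latency / fetch latency for one data word.

    Both latencies are measured from the load miss (assumed here to be in
    cycles). A word cannot be used before it arrives, so the ratio is >= 1.
    """
    if fetch_latency <= 0:
        raise ValueError("fetch latency must be positive")
    if access_latency < fetch_latency:
        raise ValueError("a word cannot be accessed before it is fetched")
    return access_latency / fetch_latency


# Hypothetical cycle counts, for illustration only.
critical_word = non_criticality(fetch_latency=40, access_latency=40)
late_word = non_criticality(fetch_latency=40, access_latency=200)
print(critical_word, late_word)  # 1.0 5.0
```

A word used as soon as it arrives scores 1x. The second word sits unused for four more fetch times and scores 5x.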
Full-system simulations:
- FeS2, BookSim, DSENT
- 16 out-of-order (OoO) cores at 2.0 GHz
- 64 kB private L1 per core, 16-word cache blocks
- 16 MB shared distributed L2
Baseline NoC configuration:
- 4 x 4 mesh, 2.0 GHz, 128-bit channels
- X-Y routing, 3-stage router pipeline, six 4-flit virtual channels (VCs) per port
Applications:
- PARSEC and SPLASH-2
[Figure: cumulative % of accessed words vs. access latency / fetch latency (1x to 10x), in four panels:]
- Very low criticality: blackscholes, bodytrack, fluidanimate, streamcluster, swaptions
- Low criticality: barnes, lu_cb, water_nsquared, water_spatial
- High criticality: fft, vips, volrend, x264
- Very high criticality: canneal, cholesky, radiosity, radix
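Each curve in these plots is a cumulative distribution over accessed words. A minimal sketch of how one point on such a curve could be computed is shown below. The per-word ratios are invented for illustration; real values would come from the simulations.

```python
def cumulative_fraction(ratios, threshold):
    """Fraction of accessed words whose access/fetch latency ratio is <= threshold."""
    if not ratios:
        raise ValueError("need at least one accessed word")
    return sum(1 for r in ratios if r <= threshold) / len(ratios)


# Hypothetical per-word ratios (access latency / fetch latency).
ratios = [1.0, 1.0, 1.5, 3.0, 8.0, 12.0, 1.0, 2.0]

# One point per x-axis tick from 1x to 10x, as on the plots.
curve = [(x, cumulative_fraction(ratios, x)) for x in range(1, 11)]
print(curve[0], curve[1], curve[-1])  # (1, 0.375) (2, 0.625) (10, 0.875)
```

A curve that climbs to 100% close to 1x means most words are used right after they arrive. A curve that climbs slowly means many words wait long after arrival.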
Measuring Criticality – Energy Wasted
Estimate energy wasted due to non-criticality
- Model an ideal NoC where for each word:
$\text{fetch latency} \approx \text{access latency}$
e.g., bodytrack:
[Figure: cumulative % of accessed words vs. access latency / fetch latency (1x to 31x), divided among subnets 1 through 6.]
Ranges of access latency / fetch latency and the frequency assigned to each:
- 1x – 1.1x: 2.0 GHz
- 1.1x – 2x: 1.8 GHz
- 2x – 3.5x: 1.0 GHz
- 3.5x – 7x: 571.4 MHz
- 7x – 18x: 285.7 MHz
- 18x – ∞: 111.0 MHz
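The frequencies listed for the six ranges are close to the 2.0 GHz baseline divided by the lower bound of each range (for example, 2.0 GHz / 3.5 ≈ 571.4 MHz). That relationship is inferred from the numbers themselves rather than stated in the talk. The sketch below only checks the arithmetic, and `scaled_frequency_mhz` is a hypothetical helper.

```python
BASE_FREQ_MHZ = 2000.0  # baseline NoC frequency from the simulation setup

# (lower bound of the access/fetch latency range, listed frequency in MHz)
SUBNET_RANGES = [
    (1.0, 2000.0),
    (1.1, 1800.0),
    (2.0, 1000.0),
    (3.5, 571.4),
    (7.0, 285.7),
    (18.0, 111.0),
]


def scaled_frequency_mhz(lower_bound):
    """Assumed rule: slow the subnet down by the lower bound of its range."""
    return BASE_FREQ_MHZ / lower_bound


for lower_bound, listed in SUBNET_RANGES:
    derived = scaled_frequency_mhz(lower_bound)
    print(f"{lower_bound:>4}x  listed {listed:7.1f} MHz  derived {derived:7.1f} MHz")
```

The derived and listed values agree to within about 1%. One reading, offered as an assumption, is that a word with non-criticality of at least k can tolerate a network running roughly k times slower and still arrive before it is needed.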
[Figure: dynamic energy wasted due to non-criticality (axis 0% to 40%) for barnes, blackscholes, bodytrack, canneal, cholesky, fft, fluidanimate, lu_cb, radiosity, radix, streamcluster, swaptions, vips, volrend, water_nsquared, water_spatial, x264 and the geomean.]
[Figure: dynamic energy wasted due to dead words (axis 0% to 100%) for the same benchmarks and the geomean.]
68.8% energy wasted due to non-critical and dead words.
Addressing Criticality
A criticality-aware NoC design needs to:
1. Predict the criticality of a word prior to fetching it.
2. Separate the fetching of words based on their criticality.
3. Reduce energy consumption in fetching low-criticality words.
4. Eliminate the fetching of dead words.
NoCNoC (Non-Critical NoC)
A proof-of-concept, criticality-aware design
NoCNoC
1. Predict the criticality of a word prior to fetching it.
- Binary predictor: either critical or non-critical.
[Figure: on a load miss (requested word 4), the instruction address indexes a prediction table, which yields the prediction vector 0010110100000000 for the requested word; the vector is carried in the L1 request packet.]
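A binary, instruction-indexed predictor of this kind might be sketched as follows. This is an illustration under stated assumptions, not the design from the paper: the table size, the modulo indexing, the last-outcome update rule, always marking the requested word as critical, and printing word 0 leftmost are all hypothetical choices.

```python
WORDS_PER_BLOCK = 16   # 16-word cache blocks, as in the simulated system
TABLE_ENTRIES = 1024   # hypothetical table size


class CriticalityPredictor:
    """Binary per-word criticality predictor indexed by instruction address."""

    def __init__(self):
        # One bit vector per entry; bit w set means "word w predicted critical".
        self.table = [0] * TABLE_ENTRIES

    def _index(self, instruction_address):
        # Hypothetical indexing: low-order bits of the instruction address.
        return instruction_address % TABLE_ENTRIES

    def predict(self, instruction_address, requested_word):
        """Return the prediction vector to place in the L1 request packet."""
        vector = self.table[self._index(instruction_address)]
        # Assumption: the word that caused the miss is always marked critical.
        return vector | (1 << requested_word)

    def train(self, instruction_address, critical_words):
        """Record which words turned out to be critical (last-outcome update)."""
        vector = 0
        for w in critical_words:
            vector |= 1 << w
        self.table[self._index(instruction_address)] = vector


def as_bit_string(vector):
    """Render a vector with word 0 on the left (an assumed bit order)."""
    return "".join("1" if vector >> w & 1 else "0" for w in range(WORDS_PER_BLOCK))


predictor = CriticalityPredictor()
pc = 0x400A10  # hypothetical instruction address of the missing load
before = as_bit_string(predictor.predict(pc, requested_word=4))
predictor.train(pc, critical_words=[2, 4, 5, 7])
after = as_bit_string(predictor.predict(pc, requested_word=4))
print(before)  # 0000100000000000
print(after)   # 0010110100000000
```

The training input was picked so that the second prediction reproduces the example vector from the figure. An untrained entry predicts only the requested word as critical.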
2. Separate the fetching of words based on their criticality.
- Two physical subnetworks: critical and non-critical.
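Given a per-word prediction vector, steering words onto the two subnetworks reduces to a bit test per word. The sketch below assumes that bit w of the vector marks word w as critical. That encoding and the function name are hypothetical.

```python
def split_by_criticality(prediction_vector, words_per_block=16):
    """Split a block's word indices between the two subnetworks.

    Bit w of prediction_vector set means word w is predicted critical
    (an assumed encoding).
    """
    critical, non_critical = [], []
    for w in range(words_per_block):
        (critical if prediction_vector >> w & 1 else non_critical).append(w)
    return critical, non_critical


# Hypothetical prediction: words 2, 4, 5 and 7 are critical.
vector = (1 << 2) | (1 << 4) | (1 << 5) | (1 << 7)
critical, non_critical = split_by_criticality(vector)
print(critical)      # [2, 4, 5, 7]
print(non_critical)  # [0, 1, 3, 6, 8, 9, 10, 11, 12, 13, 14, 15]
```

Words in the first list would travel on the critical subnetwork and the rest on the non-critical one.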
3. Reduce energy consumption in fetching low-criticality words.
- DVFS in non-critical subnetwork.
[Figure: the critical and non-critical subnetworks in four cases: criticality very low, low, high and very high.]
4. Eliminate the fetching of dead words.
- Binary predictor: either live or dead.
[Figure: the instruction address indexes a prediction table.]
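Once a liveness prediction is available, dropping dead words amounts to masking the block before it is fetched. The sketch below is a hypothetical illustration: it assumes that bit w of the vector marks word w as live and that the requested word is always fetched.

```python
def words_to_fetch(liveness_vector, requested_word, words_per_block=16):
    """Return the word indices worth fetching for one load miss.

    Bit w of liveness_vector set means word w is predicted live (an assumed
    encoding). The requested word is always fetched.
    """
    mask = liveness_vector | (1 << requested_word)
    return [w for w in range(words_per_block) if mask >> w & 1]


# Hypothetical prediction: only words 0, 3 and 6 are expected to be used.
live = words_to_fetch((1 << 0) | (1 << 3) | (1 << 6), requested_word=3)
skipped = 16 - len(live)
print(live, skipped)  # [0, 3, 6] 13
```

In this made-up case, 13 of the 16 words never enter the network.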
More details and results in the paper:
- DVFS scheme
- Prediction tables
- Comparison to instruction criticality (Aergia [R. Das, ISCA 2010])
NoCNoC – Prediction Accuracy
[Figure: prediction accuracy (axis 80% to 100%) for criticality and liveness, split into correct, over and under.]
Outperforms prior liveness predictor (70% accuracy) [H. Kim, NOCS 2011]
NoCNoC – Performance and Energy
[Figure: dynamic energy and runtime, normalized to baseline (axis 0.6 to 1.1).]
Conclusion
Define Data Criticality
- Deliver data both no later and no earlier than needed.
Measure Criticality
- 68.8% energy wasted due to non-critical and dead words.
Address Criticality
- NoCNoC, a proof-of-concept, criticality-aware design.