SLIDE 1

Data Criticality in Network-On-Chip Design

Joshua San Miguel Natalie Enright Jerger

SLIDE 2

Network-On-Chip Efficiency

Efficiency is the ability to produce results with the least amount of waste.

  • Wasted time: the NoC accounts for 30-70% of on-chip data access latency [Z. Li, HPCA 2009][A. Sharifi, MICRO 2012]
  • Wasted energy: the NoC accounts for 20-30% of total chip power [J. D. Owens, IEEE Micro 2007][S. R. Vangal, IEEE JSSC 2008]

SLIDES 3-6

Network-On-Chip Efficiency

[Figure: animation of a load request followed by its data response]

Minimize wasted time

  • Deliver data no later than needed

SLIDES 7-10

Network-On-Chip Efficiency

[Figure: a load request labeled 0x2 and a data response made up of words 0 to 7]

Minimize wasted energy

  • Deliver data no earlier than needed

SLIDES 11-12

Network-On-Chip Efficiency

Why store data in blocks of multiple words?

  • Exploit spatial locality in applications
  • Avoid large tag arrays in caches
  • Improve row buffer utilization in DRAM

Store data at a coarse granularity, but move data at a fine granularity.

SLIDES 13-16

Network-On-Chip Efficiency

[Figure: analogy illustration labeled with the times 01:00, 03:00, 02:00 and 04:00]

Arrive no later than needed. But expensive. Wasted money since some arrive too early.

SLIDES 17-23

Network-On-Chip Efficiency

[Figure: the same illustration, labeled 01:00, 03:00, 02:00 and 04:00]

Spend just enough money to arrive both no later and no earlier than needed.

Deliver data both no later and no earlier than needed

  • Design for data criticality

SLIDE 24

Outline

Defining Criticality

  • Data Criticality
  • Data Liveness

Measuring Criticality

  • Energy Wasted

Addressing Criticality

  • NoCNoC

SLIDE 25

Data Criticality

Data Criticality is the promptness with which an application uses a data word after fetching it from memory.

  • Critical: used immediately after being fetched.
  • Non-critical: used some time after being fetched.

SLIDES 26-29

Data Criticality – blackscholes

    for (i++) {
      ... = BlkSchlsEqEuroNoDiv(sptprice[i]);
    }

[Figure: cache block holding sptprice, with words numbered up to 15, shown at i = 0 and at i = 15 and then annotated with per-word criticality]

SLIDE 30

Data Criticality – fluidanimate

    for (iparNeigh++) {
      if (borderNeigh) {
        pthread_mutex_lock();
        neigh->a[iparNeigh] -= ...;
        pthread_mutex_unlock();
      } else {
        neigh->a[iparNeigh] -= ...;
      }
    }

SLIDE 31

Data Criticality

Data criticality is an inherent consequence of spatial locality and is exhibited by most (if not all) real-world applications.

Examples of non-criticality:

  • Long-running code between accesses
  • Interference due to thread synchronization
  • Dependences from other cache misses
  • Preemption by the operating system

SLIDES 32-33

Data Criticality vs. Instruction Criticality

Instruction (or packet) criticality

[Figure: load misses A, B, C and D]

SLIDE 34

Data Criticality vs. Instruction Criticality

Data criticality

[Figure: load misses A, B, C and D]

SLIDE 35

Data Liveness

Data Liveness describes whether or not an application uses a data word at all after fetching it from memory.

  • Live-on-arrival (live): used at least once during its cache lifetime.
  • Dead-on-arrival (dead): never used during its cache lifetime.

SLIDES 36-38

Data Liveness – fluidanimate

    for (iz++)
      for (iy++)
        for (ix++)
          for (j++) {
            if (border(ix)) ... = cell->v[j].x;
            if (border(iy)) ... = cell->v[j].y;
            if (border(iz)) ... = cell->v[j].z;
          }

[Figure: memory layout of cell->v as repeating x, y, z words]

SLIDE 39

Data Liveness

Data liveness measures the degree of spatial locality in an application.

Examples of dead words:

  • Unused members of structs
  • Irregular or random access patterns
  • Heap fragmentation
  • Padding between data elements
  • Early evictions due to invalidations, cache pressure or poor replacement policies
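
The live/dead distinction can be made concrete with a small sketch. It assumes a hypothetical per-block trace listing which word offsets were touched between the block's fill and its eviction; the names `classify_liveness` and `dead_fraction` are illustrative and not part of the authors' tooling. The 16-word block size is the one used in the simulated system.

```python
WORDS_PER_BLOCK = 16  # 16-word cache blocks, as in the simulated system


def classify_liveness(accessed_offsets, words_per_block=WORDS_PER_BLOCK):
    """Return one flag per word: True if live-on-arrival, False if dead-on-arrival.

    accessed_offsets: word offsets touched at least once between the block's
    fill and its eviction (a hypothetical trace format).
    """
    touched = set(accessed_offsets)
    for offset in touched:
        if not 0 <= offset < words_per_block:
            raise ValueError(f"word offset {offset} is outside the block")
    return [offset in touched for offset in range(words_per_block)]


def dead_fraction(liveness):
    """Fraction of the block's words that were fetched but never used."""
    return liveness.count(False) / len(liveness)


# Example: only the x member of each (x, y, z) triple is read, as can happen
# in the fluidanimate loop when only border(ix) holds.
liveness = classify_liveness(range(0, 15, 3))
print(dead_fraction(liveness))
```

In this example five of the sixteen words are live, so most of the block would have crossed the network without ever being used.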

SLIDE 40

Outline

Defining Criticality

  • Data Criticality
  • Data Liveness

Measuring Criticality

  • Energy Wasted

Addressing Criticality

  • NoCNoC

SLIDES 41-43

Measuring Criticality

[Figure: timeline beginning at load miss A, marking the fetch of A[i] and the later use of A[i]; fetch latency runs up to the fetch, access latency runs up to the use]

$$\text{non-criticality} = \frac{\text{access latency}}{\text{fetch latency}}$$

1x for critical words, >1x for non-critical words
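
As a minimal sketch of the metric, the ratio can be computed per word from two timestamps. The sketch assumes both latencies are measured from the load miss, as the timeline suggests; the function names and the 1.1x tolerance in `is_critical` are illustrative assumptions, not values prescribed by the slides.

```python
def non_criticality(fetch_latency, access_latency):
    """Ratio of access latency to fetch latency for one word.

    Both latencies are assumed to start at the load miss: fetch latency ends
    when the word arrives, access latency ends when the word is first used.
    """
    if fetch_latency <= 0:
        raise ValueError("fetch latency must be positive")
    if access_latency < fetch_latency:
        raise ValueError("a word cannot be used before it arrives")
    return access_latency / fetch_latency


def is_critical(fetch_latency, access_latency, tolerance=1.1):
    """Treat a word as critical when its ratio is close to 1x.

    The tolerance is an assumption chosen for illustration.
    """
    return non_criticality(fetch_latency, access_latency) <= tolerance


print(non_criticality(40, 40))   # word used as soon as it arrives
print(non_criticality(40, 200))  # word used long after it arrives
```

A word that arrives after 40 cycles and is first used after 200 cycles has a ratio of 5x: it could have been delivered five times more slowly without delaying the application.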

SLIDE 44

Measuring Criticality

Full-system simulations:

  • FeS2, BookSim, DSENT
  • 16 out-of-order (OoO) cores at 2.0 GHz
  • 64 kB private L1 per core, 16-word cache blocks
  • 16 MB shared distributed L2

Baseline NoC configuration:

  • 4 x 4 mesh, 2.0 GHz, 128-bit channels
  • X-Y routing, 3-stage router pipeline, 6 virtual channels (VCs) per port with 4 flits each

Applications:

  • PARSEC and SPLASH-2

SLIDES 45-48

Measuring Criticality

[Figures: cumulative % of accessed words (0% to 100%) against access latency / fetch latency (1x to 10x), one plot per group of applications]

  • Very low criticality: blackscholes, bodytrack, fluidanimate, streamcluster, swaptions
  • Low criticality: barnes, lu_cb, water_nsquared, water_spatial
  • High criticality: fft, vips, volrend, x264
  • Very high criticality: canneal, cholesky, radiosity, radix
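
The cumulative curves in these plots can be derived from per-word ratios of access latency to fetch latency. The sketch below is one way to do that; `cumulative_share` is a hypothetical helper and the sample ratios are made up for illustration, not measured data.

```python
def cumulative_share(ratios, thresholds=range(1, 11)):
    """For each threshold t, the fraction of accessed words with ratio <= t.

    ratios: access latency / fetch latency, one value per accessed word.
    thresholds: x-axis points, 1x to 10x by default as in the plots.
    """
    ratios = list(ratios)
    if not ratios:
        raise ValueError("need at least one accessed word")
    return {t: sum(r <= t for r in ratios) / len(ratios) for t in thresholds}


# Made-up ratios for five accessed words.
curve = cumulative_share([1.0, 1.0, 2.5, 4.0, 12.0])
for threshold, share in curve.items():
    print(f"{threshold}x: {share:.0%}")
```

A curve that is already near 100% at 1x means almost every word is used as soon as it arrives; a curve that climbs slowly means many words sit unused for a long time after delivery.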

SLIDES 49-51

Measuring Criticality – Energy Wasted

Estimate energy wasted due to non-criticality

  • Model an ideal NoC where for each word:

$$\text{fetch latency} \approx \text{access latency}$$

[Figure: contrast between high criticality and low criticality]

SLIDES 52-54

Measuring Criticality – Energy Wasted

e.g., bodytrack:

[Figure: cumulative % of accessed words (0% to 100%) against access latency / fetch latency (1x to 31x), divided among subnets 1 to 6]

Non-criticality ranges and clock frequencies:

  • 1x – 1.1x, 2.0 GHz
  • 1.1x – 2x, 1.8 GHz
  • 2x – 3.5x, 1.0 GHz
  • 3.5x – 7x, 571.4 MHz
  • 7x – 18x, 285.7 MHz
  • 18x – ∞, 111.0 MHz
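
The listed frequencies are consistent with dividing the 2.0 GHz baseline clock by the lower bound of each range. The sketch below reproduces them to within rounding under that assumption, and under the further assumption that fetch latency scales inversely with clock frequency; the slides do not state the rule explicitly, and `subnet_frequency_ghz` is a hypothetical name.

```python
BASE_FREQUENCY_GHZ = 2.0  # baseline NoC clock in the simulated configuration

# Lower bounds of the six non-criticality ranges shown for bodytrack.
RANGE_LOWER_BOUNDS = [1.0, 1.1, 2.0, 3.5, 7.0, 18.0]


def subnet_frequency_ghz(lower_bound, base_ghz=BASE_FREQUENCY_GHZ):
    """Slowest clock that still delivers every word in the range in time.

    A word with non-criticality r tolerates a fetch that is r times slower,
    so every word in a range is safe at base_ghz / lower_bound (assuming
    latency is inversely proportional to frequency).
    """
    if lower_bound < 1.0:
        raise ValueError("non-criticality is never below 1x")
    return base_ghz / lower_bound


for bound in RANGE_LOWER_BOUNDS:
    mhz = subnet_frequency_ghz(bound) * 1000
    print(f"{bound}x and up: {mhz:.1f} MHz")
```

Using the lower bound is the conservative choice: the least tolerant word in each range still arrives no later than needed.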

SLIDES 55-57

Measuring Criticality – Energy Wasted

Dynamic energy wasted due to non-criticality

[Figure: bar chart of dynamic energy wasted (0% to 40%) for barnes, blackscholes, bodytrack, canneal, cholesky, fft, fluidanimate, lu_cb, radiosity, radix, streamcluster, swaptions, vips, volrend, water_nsquared, water_spatial, x264 and the geomean]

SLIDES 58-59

Measuring Criticality – Energy Wasted

Dynamic energy wasted due to dead words

[Figure: bar chart of dynamic energy wasted (0% to 100%) for the same applications and the geomean]

68.8% energy wasted due to non-critical and dead words.

SLIDE 60

Outline

Defining Criticality

  • Data Criticality
  • Data Liveness

Measuring Criticality

  • Energy Wasted

Addressing Criticality

  • NoCNoC

SLIDE 61

Addressing Criticality

A criticality-aware NoC design needs to:

1. Predict the criticality of a word prior to fetching it.
2. Separate the fetching of words based on their criticality.
3. Reduce energy consumption in fetching low-criticality words.
4. Eliminate the fetching of dead words.

NoCNoC (Non-Critical NoC)

A proof-of-concept, criticality-aware design

SLIDES 62-69

NoCNoC

1. Predict the criticality of a word prior to fetching it.

  • Binary predictor: either critical or non-critical.

[Figure: on a load miss (requested word 4), the instruction address indexes a prediction table; the selected entry gives a 16-bit prediction vector, which is sent in the L1 request packet]

Bit string shown with the instruction address: 000000000000010X101000000000000

Prediction vector (requested word 4): 0010110100000000
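
A minimal sketch of such a predictor follows. The slides show only a table lookup by instruction address that yields a 16-bit vector; the table size, the modulo indexing, the "predict critical until trained" default and the overwrite-on-train rule are all assumptions made for illustration, and `CriticalityPredictor` is a hypothetical name.

```python
WORDS_PER_BLOCK = 16  # one prediction bit per word of a 16-word block


class CriticalityPredictor:
    """Table of per-word prediction bits indexed by load instruction address.

    Sizing, indexing and training policy are assumptions, not the design
    described in the paper.
    """

    def __init__(self, entries=1024):
        self.entries = entries
        # Assumed default: predict every word critical until trained.
        self.table = [[True] * WORDS_PER_BLOCK for _ in range(entries)]

    def _index(self, instruction_address):
        return instruction_address % self.entries

    def predict(self, instruction_address, requested_word):
        """Return the prediction vector to attach to the L1 request packet."""
        vector = list(self.table[self._index(instruction_address)])
        vector[requested_word] = True  # the word that missed is needed now
        return vector

    def train(self, instruction_address, observed_critical):
        """Overwrite the entry with the behaviour observed for a past block."""
        if len(observed_critical) != WORDS_PER_BLOCK:
            raise ValueError("expected one flag per word")
        self.table[self._index(instruction_address)] = list(observed_critical)


predictor = CriticalityPredictor(entries=8)
predictor.train(0x40, [False] * WORDS_PER_BLOCK)
vector = predictor.predict(0x40, requested_word=4)
print("".join("1" if bit else "0" for bit in vector))
```

Forcing the requested word to critical reflects the definition: the load that missed is waiting on that word, so it is needed as soon as it arrives.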

SLIDES 70-72

NoCNoC

2. Separate the fetching of words based on their criticality.

  • Two physical subnetworks: critical and non-critical.

[Figure: the critical and non-critical subnetworks]
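
The split between the two subnetworks can be sketched as a partition of word offsets driven by the prediction vector. The example reuses the vector 0010110100000000 from the predictor slides; reading the bits left to right as word 0 upward is an assumption, and `split_by_criticality` is a hypothetical helper.

```python
def split_by_criticality(prediction_vector):
    """Partition word offsets between the two physical subnetworks.

    prediction_vector: string of '0'/'1', one character per word, read left
    to right as word 0 upward (the bit ordering is an assumption).
    Returns (critical_offsets, non_critical_offsets).
    """
    if set(prediction_vector) - {"0", "1"}:
        raise ValueError("prediction vector must contain only 0 and 1")
    critical = [i for i, bit in enumerate(prediction_vector) if bit == "1"]
    non_critical = [i for i, bit in enumerate(prediction_vector) if bit == "0"]
    return critical, non_critical


critical, non_critical = split_by_criticality("0010110100000000")
print("critical subnetwork:", critical)
print("non-critical subnetwork:", non_critical)
```

Under this bit ordering the requested word 4 lands in the critical set, which matches the expectation that the word that missed is needed immediately.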

SLIDES 73-77

NoCNoC

3. Reduce energy consumption in fetching low-criticality words.

  • DVFS in non-critical subnetwork.

[Figure: the critical subnetwork alongside the non-critical subnetwork, shown in turn for very low, low, high and very high criticality]

SLIDES 78-79

NoCNoC

4. Eliminate the fetching of dead words.

  • Binary predictor: either live or dead.

[Figure: the instruction address indexes a prediction table, with positions labeled up to 15; bit string shown with the instruction address: 000000000000010X101000000000000]
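
Once a liveness vector is available, dropping dead words amounts to filtering the block before it is sent. The sketch below assumes a hypothetical '0'/'1' vector with one character per word and always keeps the requested word; the vector values are made up for illustration.

```python
def words_to_fetch(liveness_vector, requested_word):
    """Offsets worth sending; words predicted dead are never fetched."""
    if not 0 <= requested_word < len(liveness_vector):
        raise ValueError("requested word is outside the block")
    live = [i for i, bit in enumerate(liveness_vector) if bit == "1"]
    if requested_word not in live:
        live.append(requested_word)  # the word that missed is always used
        live.sort()
    return live


def traffic_saved(liveness_vector, requested_word):
    """Fraction of the block's words that no longer cross the network."""
    fetched = words_to_fetch(liveness_vector, requested_word)
    return 1 - len(fetched) / len(liveness_vector)


# Made-up vector: half of the 16 words are predicted live.
print(words_to_fetch("1111000011110000", requested_word=9))
print(traffic_saved("1111000011110000", requested_word=9))
```

A predictor that wrongly marks a live word as dead would force a second fetch later, so the saving reported here is an upper bound for a given vector.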

SLIDE 80

NoCNoC

More details and results in the paper:

  • DVFS scheme
  • Prediction tables
  • Comparison to instruction criticality (Aergia [R. Das, ISCA 2010])

SLIDES 81-82

NoCNoC – Prediction Accuracy

[Figure: prediction accuracy (axis from 80% to 100%) for the criticality and liveness predictors, broken down into correct, over and under]

Outperforms prior liveness predictor (70% accuracy) [H. Kim, NOCS 2011]

SLIDE 83

NoCNoC – Performance and Energy

[Figure: dynamic energy and runtime, normalized to baseline (axis from 0.6 to 1.1)]

SLIDE 84

Conclusion

Define Data Criticality

  • Deliver data both no later and no earlier than needed.

Measure Criticality

  • 68.8% energy wasted due to non-critical and dead words.

Address Criticality

  • NoCNoC, a proof-of-concept, criticality-aware design.
SLIDE 85

Thank you