Exploration of Influence of Program Inputs on CMP Co-Scheduling - - PowerPoint PPT Presentation

▶

Aug 15, 2022 468 likes •732 views

Exploration of Influence of Program Inputs on CMP Co-Scheduling Yunlian Jiang Xipeng Shen Computer Science The College of William and Mary, USA Cache sharing in CMP Commercial CMPs Intel Core 2 Duo E6750 CPU CPU AMD Athlon X2

SLIDE 1

Exploration of Influence

f Program Inputs on

CMP Co-Scheduling

Yunlian Jiang Xipeng Shen

Computer Science The College of William and Mary, USA

SLIDE 2

Cache sharing in CMP

CPU

Shared Cache

CPU

Commercial CMPs

Intel Core 2 Duo E6750 AMD Athlon X2 6400+

SLIDE 3

Cache sharing

 Pros

 Shorten inter-thread communication  Flexible usage of cache

 Cons: causes cache contention

 degrade performance  impair fairness  hurt performance isolation

SLIDE 4

Job co-scheduling

P2 P1 P4 P3 CMP Chip1 CMP Chip2

 To assign jobs to chips in a manner to

minimize contention

 Example

SLIDE 5

P4 P3

Job co-scheduling

P2 P1 P4 P3 CMP Chip1 CMP Chip2 Chip2 P1 P2

 To assign jobs to chips in a manner to

minimize contention

 Example

SLIDE 6

Job co-scheduling

P2 P1 P4 P3 CMP Chip1 CMP Chip2 Chip2 Chip1 P1 P2 P4 P3

 To assign jobs to chips in a manner to

minimize contention

 Example

SLIDE 7

Previous co-scheduling work

 Runtime sampling based

 Online sampling the performance on different

schedules and pick the best

 E.g., [Tullsen+: ASPLOS’00, ….]

 Profiling directed

 Offline profiling to learn program cache behavior  E.g., [Nussbaum+: USENIX’05….]

SLIDE 8

Our focus

 Two factors determining cache contention

 Programs running together  Inputs to the programs

SLIDE 9

Contributions of this work

 Exposing input impact on cache contention  Construction of cross-input predictive models  Evaluation on a proactive co-scheduler

SLIDE 10

Contributions of this work

 Exposing input impact on cache contention  Construction of cross-input predictive models  Evaluation on a proactive co-scheduler

SLIDE 11

Measurement of input impact

 Machine: Intel Xeon dual-core processors  Compiler: gcc4.1  Hardware performance API: PAPI3.5  Experiments

 Measure the performance degradation

 every pair of 12 SPEC CPU2k programs  3 different input sets (test, train, and ref)

SLIDE 12

Metric

 sCPI : Cycles per Instruction (CPI) when running

alone

 cCPI : CPI when co-running with other programs

SLIDE 13

Co-run degradation on different inputs

SLIDE 14

Contributions of this work

 Exposing input impact on cache contention  Construction of cross-input predictive models  Evaluation on a proactive co-scheduler

SLIDE 15

Objective

Predictive model

An arbitrary input Cache behavior CAPS Scheduler Corun schedule

SLIDE 16

Proactive Co-Scheduler: CAPS

SLIDE 17

Single-run behaviors to predict

 Access per Instruction

 Density of memory references in an execution

 Distinct Memory Blocks per Cycle (DPC)

 Aggressiveness of cache contention

 Reuse Signature

DPC = Distinct Blocks per Instruction (DPI) x Instructions per cycle

SLIDE 18

Reuse signature

 Reuse distance



Number of distinct data between data reuse



E.g,



b a a c b

 Reuse signature

 Histogram of reuse distances in an execution  Predictable with over 94% accuracy [Zhong+:TC’07]

SLIDE 19

Construction of predictive models

< I1 B1 > < Ik Bk > < In Bn > … …

Regression Model

Predictive Model

New Input Memory Behavior

SLIDE 20

Regression models

 Linear model

 Least Mean Squares (LMS) method

 Linear function between inputs and outputs

 Non-linear model

 K-Nearest-Neighbor

 Use k similar instances to estimate new output value

 Hybrid method

 Pick the model with minimum training errors for a program

SLIDE 21

Contributions of this work

 Exposing input impact on cache contention  Construction of cross-input predictive models  Evaluation on a proactive co-scheduler

SLIDE 22

Prediction accuracy result

Programs Access per instruction DPI LMS NN Hybrid LMS NN Hybrid ammp 89.58 98.76 98.76 39.83 86.72 86.72 art 98.86 94.25 98.86 98.96 94.25 98.96 bzip 75.79 78.62 78.62 67.69 64.05 67.69 crafty 99.54 99.24 99.54 76.31 72.50 76.31 equake 54.58 54.42 54.58 82.27 82.13 82.27 gap 74.75 79.35 79.35 79.87 78.08 79.87 gzip 82.76 86.98 86.98 77.85 66.47 77.85 mcf 90.25 92.45 92.45 89.73 88.11 89.73 mesa 96.39 96.98 96.98 89.43 93.33 93.33 parser 96.02 98.61 98.61 89.49 70.42 89.49 twolf 97.11 98.10 98.10 52.12 86.75 86.75 vpr 81.50 81.50 81.50 96.30 95.28 96.30 Average Average 86.43 86.43 88.27 88.27 88.69 88.69 78.32 78.32 81.51 81.51 85.44 85.44

SLIDE 23

Effects on Co-Scheduling

0.5 1 1.5 2 2.5

ptimal

CAPS-real CAPS-pred random

Normalized Corun Degradation

SLIDE 24

Conclusion

 Input influence to job co-scheduling

 Co-schedulers should adapt to program inputs

 Cross-input predictive models

 Reasonable accuracy through LMS and NN  Effective in proactive co-scheduling

SLIDE 25

Exploration of Influence

CMP Co-Scheduling

Yunlian Jiang Xipeng Shen

Cache sharing in CMP

Cache sharing

Job co-scheduling

minimize contention

Job co-scheduling

minimize contention

Job co-scheduling

minimize contention

Previous co-scheduling work

Our focus

Contributions of this work

Contributions of this work

Measurement of input impact

Metric

Co-run degradation on different inputs

Contributions of this work

Objective

Predictive model

An arbitrary input Cache behavior CAPS Scheduler Corun schedule

Proactive Co-Scheduler: CAPS

Single-run behaviors to predict

Reuse signature

Construction of predictive models

< I1 B1 > < Ik Bk > < In Bn > … …

Predictive Model

Regression models

Contributions of this work

Prediction accuracy result

Effects on Co-Scheduling

Conclusion

Thanks! Questions?