Lecture 15: OS Noise and Interference
Abhinav Bhatele, Department of Computer Science
High Performance Computing Systems (CMSC714)
Summary of last lecture
Goal of auto-tuning: performance portability
Selecting code variants,
Abhinav Bhatele, CMSC714
[Figure: sampling timeline. Labels: sampling time; t1, t2, t3, tmin; d2, d3]
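The figure labels (t1, t2, t3, tmin, d2, d3) suggest a fixed-work style of noise measurement: time the same unit of work repeatedly, treat the fastest sample as the noise-free time, and read each sample's excess over it as a delay. The following is a minimal sketch under that assumption, not the benchmark used in the lecture. The names `fixed_work`, `sample_work_times`, and `detours` are hypothetical, and reading d_i as t_i - tmin is an interpretation of the labels. A Python interpreter adds jitter of its own, so real measurements of this kind are normally written in C.

```python
import time


def fixed_work(n=20000):
    """Hypothetical fixed unit of work: a simple integer loop."""
    acc = 0
    for i in range(n):
        acc += i * i
    return acc


def sample_work_times(samples=200, n=20000):
    """Time the same work quantum repeatedly; returns durations in seconds."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        fixed_work(n)
        times.append(time.perf_counter() - start)
    return times


def detours(times):
    """Return (t_min, [t_i - t_min]); t_min is assumed to be noise-free."""
    t_min = min(times)
    return t_min, [t - t_min for t in times]


if __name__ == "__main__":
    times = sample_work_times()
    t_min, d = detours(times)
    print(f"t_min = {t_min * 1e6:.1f} us, max detour = {max(d) * 1e6:.1f} us")
```

Running the script on an idle machine and again on a loaded one gives a rough feel for how much the largest delay grows relative to tmin.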
Benchmarks: https://asc.llnl.gov/sequoia/benchmarks/FTQ_summary_v1.1.pdf
[Figure: BG/P - Noise in sequential computation across 8192 cores. x-axis: Core Number; y-axis: Execution time (us); series: Max, Min]
[Figure: XT4 - Noise in sequential computation across 8192 cores. x-axis: Core Number; y-axis: Execution time (us); series: Max, Min]
Hoefler et al.: https://htor.inf.ethz.ch/publications/img/hoefler-noise-sim.pdf
§Department of Computer Science, The University of Arizona
[Figure: Relative Performance over time (Nov 29 to Apr 04); series: MILC, AMG, UMT, miniVite]
The Case of the Missing Supercomputer Performance
think context switches still happen and the CPU time can be handed from the application to system processes within a “computation phase” (p. 7). So, are granularities such as 1ms referring to the running time on a hypothetical noiseless machine and never precise on a real system? Why don’t we measure the “actual” granularities?
coscheduling needs a special kernel module (Sec. 3.3) but no alteration on the system is done here. Does this happen automatically because of the length of the noise and the length of the computations?
kind of systems immune to the types of noise discussed in this paper?
possible to make the identification of the potential causes of suboptimal performance automatic, like in the case of auto-tuning?
There Goes the Neighborhood
performance variability, but is there a way to build a model that can quantify how much each candidate factor affects the messaging rate?
these three systems is maximized. How is it achieved?
jobs with lower continuity are in general more likely to suffer from contention because they usually have to use more links that are shared with other jobs. Therefore, how do we decouple the two factors and conclude that allocation shape is not a major one?
the expected running time of a job, can utilize this kind of information to alleviate the “conflicting router” problem and make a better allocation?
Abhinav Bhatele 5218 Brendan Iribe Center (IRB) / College Park, MD 20742 phone: 301.405.4507 / e-mail: bhatele@cs.umd.edu