Topology-aware OpenMP Process Scheduling
Peter Thoman, Hans Moritsch, and Thomas Fahringer University of Innsbruck (Austria)
Motivation
Hardware Trends
2010-06-15 IWOMP 2010, Topology-aware OpenMP Process Scheduling
[Figure: hardware topology of one socket; labels: socket, core, shared cache, socket memory]
[Figure: the same socket layout repeated for a multi-socket system]
Maximum amount of threads that
[Figure: maximum threadcount (axis 4 to 32) for each benchmark label, from bt.B_130 to gauss.L_40; programs shown: bt, cg, ep, ft, gauss, is, lu, mg, mmul]
[Figure: total execution time (seconds, axis 100 to 800) vs. number of parallel jobs (1 to 8)]
[Figure: total execution time (seconds, axis 500 to 2500) vs. number of parallel jobs (1 to 8)]
OpenMP processes send a request for resources to the server
The request includes scalability information for the region
The process uses the cores indicated by the reply
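The request/reply exchange above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the names `ResourceRequest`, `ResourceServer`, `handle` and `release` are hypothetical, the scalability information is reduced to a single "maximum useful threads" number, and the server runs in-process instead of as a separate daemon.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceRequest:
    region_id: str    # hypothetical label for the parallel region
    max_threads: int  # assumed form of the scalability information


@dataclass
class ResourceServer:
    # Assumed 8-core machine; core ids 0..7 start out free.
    free_cores: set = field(default_factory=lambda: set(range(8)))

    def handle(self, req: ResourceRequest) -> list:
        """Reply with the core ids the requesting process may use.

        The reply is empty when no core is currently free.
        """
        n = min(req.max_threads, len(self.free_cores))
        granted = sorted(self.free_cores)[:n]
        self.free_cores -= set(granted)
        return granted

    def release(self, cores) -> None:
        """Return cores to the free pool when the region has finished."""
        self.free_cores |= set(cores)


server = ResourceServer()
first = server.handle(ResourceRequest("region-a", max_threads=4))
second = server.handle(ResourceRequest("region-b", max_threads=8))
server.release(first)
```

In this sketch the second request asks for 8 threads but only receives the 4 cores still free, which is the point of routing every request through one server that sees the system-wide state.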
Region scalability
Current system-wide load
System topology
loadfactor depends on the number of free cores
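The slide does not give the formula for the loadfactor, so the following is purely a hypothetical sketch of how such a factor could scale a thread count. It assumes the loadfactor is the fraction of cores that are free and that a region falls back to one thread when no core is free; `load_factor` and `threads_to_grant` are illustrative names, not part of the presented system.

```python
import math


def load_factor(free_cores: int, total_cores: int) -> float:
    # Assumed form: fraction of cores that are currently free (0.0 to 1.0).
    return free_cores / total_cores


def threads_to_grant(max_threads: int, free_cores: int, total_cores: int) -> int:
    """Scale the region's maximum useful thread count by the load factor."""
    if free_cores == 0:
        return 1  # assumed fallback: run the region with a single thread
    scaled = math.ceil(max_threads * load_factor(free_cores, total_cores))
    # Never grant more threads than free cores, and never fewer than one.
    return max(1, min(scaled, free_cores))


idle = threads_to_grant(max_threads=16, free_cores=32, total_cores=32)
busy = threads_to_grant(max_threads=16, free_cores=8, total_cores=32)
```

Under these assumptions a region that scales to 16 threads gets all 16 on an idle 32-core machine, but only 4 when three quarters of the cores are busy.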
[Figure: four-socket topology, shown on two consecutive slides; labels per socket: core, shared cache, socket memory]
Total distance: distance from each thread in a team to every other thread
Weighted distance: distances between threads with close ids are weighted higher
Local distance: only counts the distance from each core to the next in sequence
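The three metrics can be made concrete with a small sketch. Several details here are assumptions, since the slide does not specify them: the per-pair distance values (1 for a shared cache, 2 for the same socket, 3 across sockets), the layout of 2 cores per cache and 4 per socket, counting each unordered pair once, and the 1/(id difference) weighting. A team is a list of core ids in which position `i` is the core of thread `i`.

```python
from itertools import combinations

CORES_PER_CACHE = 2   # assumed layout: two cores share a cache
CORES_PER_SOCKET = 4  # assumed layout: four cores per socket


def core_distance(a: int, b: int) -> int:
    """Assumed topology distance between two core ids."""
    if a == b:
        return 0
    if a // CORES_PER_CACHE == b // CORES_PER_CACHE:
        return 1  # same shared cache
    if a // CORES_PER_SOCKET == b // CORES_PER_SOCKET:
        return 2  # same socket, different cache
    return 3      # different sockets


def total_distance(team):
    # Every unordered pair of threads counts once, with equal weight.
    return sum(core_distance(a, b) for a, b in combinations(team, 2))


def weighted_distance(team):
    # Pairs with close thread ids weigh more: weight 1 / (id difference).
    return sum(core_distance(team[i], team[j]) / (j - i)
               for i, j in combinations(range(len(team)), 2))


def local_distance(team):
    # Only consecutive threads in the sequence are compared.
    return sum(core_distance(a, b) for a, b in zip(team, team[1:]))


team = [0, 1, 4, 5]  # two cache-sharing pairs on different sockets
metrics = (total_distance(team), weighted_distance(team), local_distance(team))
```

Local distance touches only `n - 1` pairs instead of all `n(n - 1)/2`, so it is the cheapest of the three to evaluate, at the cost of ignoring how far apart non-adjacent threads end up.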
[Figure: overhead (µs) and target miss rate (axis 0% to 120%) for total distance, weighted distance and local distance]
[Figure: total time (seconds, axis 100 to 1000) for: GOMP, sequential; optimal threadcount, standard OS mapping; our server, no locality information; our server, locality; our server, locality + enhanced clustering]
[Figure: total time (seconds, axis 5000 to 30000) for: GOMP sequential; our server, no locality; our server, locality; our server, locality + clustering]
Topology-aware scheduling (with appropriate thread counts)
OpenMPI used in both cases
[Figure: execution time (seconds, axis 0.1 to 0.7), default vs. topology-aware]
Scalability information
System load
Clustering considerations