The Impact of Network Noise on Large-Scale Communication Performance

Torsten Hoefler, Timo Schneider, and Andrew Lumsdaine. Open Systems Lab, Indiana University, Bloomington, USA. Workshop on Large-Scale Parallel Processing/IPDPS’09, Rome, Italy.


SLIDE 1

The Impact of Network Noise on Large-Scale Communication Performance

Torsten Hoefler, Timo Schneider, and Andrew Lumsdaine

Open Systems Lab Indiana University Bloomington, USA

Workshop on Large-Scale Parallel Processing/IPDPS’09

Rome, Italy

May 29th, 2009

Torsten Hoefler, Timo Schneider, and Andrew Lumsdaine The Impact of Network Noise on Large-Scale Communication Per

SLIDE 2

Motivation

operating system noise is a known phenomenon

local interruptions by daemons, interrupts, ...

not problematic (<2%) for serial applications

noise propagation is problematic

can lower application performance significantly

pure system issues, often “simple” to solve
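Why a small local overhead can still hurt at scale can be illustrated with a back-of-the-envelope sketch. It assumes, which the slide does not state, that each process is interrupted independently with probability `p` during one step and that a synchronizing collective is delayed as soon as one participant is interrupted. The function name and the value 0.02 are hypothetical.

```python
def prob_step_delayed(p: float, n_procs: int) -> float:
    """Probability that at least one of n_procs processes is interrupted
    during a step, assuming independent interruptions (an assumption)."""
    return 1.0 - (1.0 - p) ** n_procs


if __name__ == "__main__":
    # Hypothetical per-process interruption probability of 2% per step.
    for n in (1, 16, 256, 4096):
        print(f"{n:5d} processes: {prob_step_delayed(0.02, n):.3f}")
```

Under these assumptions the chance that a step is delayed approaches 1 as the process count grows, which is one way to read "noise propagation is problematic".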

SLIDE 3

Motivation

effects in the network can cause similar behavior ⇒ network noise (net noise)

management, filesystem, other application, ... traffic

such congestion causes delays

net noise delays optimized communication patterns (collectives)

propagation can lead to delays

applications interfering with themselves is not net noise!

SLIDE 4

Our Approach

OS noise is modelled with statistical or signal processing methods

network noise depends on:

topology and routing

network technology, buffer policies, sizes etc.

number of PEs per endpoint (multicore)

communication pattern of all endpoints

⇒ not as easy to model

approach: benchmark + simulation

SLIDE 5

Target Architecture

complex topologies

network noise can easily be avoided in tori/hypercubes (make sure all allocations are convex sets)

other topologies (fat tree, Kautz) are not as simple

we focus on fat trees

random application/application interaction

most common

also models random filesystem traffic

collective communication patterns (including stencil)

most common in HPC scenarios
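The convexity remark for tori and hypercubes can be illustrated with a simplified sketch. It assumes a 2D mesh without wraparound links and dimension-order (X first, then Y) routing, neither of which is stated on the slide, and the function names are hypothetical. An allocation passes the check when every route between two of its nodes stays inside it, so its traffic never touches nodes outside it.

```python
from itertools import product


def xy_route(src, dst):
    """Nodes visited by X-then-Y routing on a 2D mesh (assumed routing)."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def is_route_closed(allocation):
    """True if every XY route between two allocated nodes stays inside
    the allocation."""
    nodes = set(allocation)
    return all(set(xy_route(a, b)) <= nodes
               for a, b in product(nodes, repeat=2))


if __name__ == "__main__":
    rectangle = [(x, y) for x in range(2) for y in range(3)]
    l_shape = [(0, 0), (1, 0), (1, 1)]
    print(is_route_closed(rectangle), is_route_closed(l_shape))
```

A rectangular block passes, while the L-shaped set fails because the route from (1, 1) to (0, 0) passes through the unallocated node (0, 1).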

SLIDE 6

Benchmark Method

create two random communicators with given ratio and warm them up

synchronize clocks on all ranks and start next step synchronously

MPI_Bcast(..); in one communicator while the other runs a random pattern ⇒ t_pert

synchronize clocks on all ranks and start next step synchronously

MPI_Bcast(..); in the same communicator without the random pattern ⇒ t_nopert

[Figure: ranks 1 to 7 split into two random communicators; timelines for the perturbed (t_pert) and unperturbed (t_nopert) measurement]
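The first step, splitting the ranks into two random groups by a given ratio, can be sketched as follows. This is a minimal illustration rather than the benchmark's code: it assumes that "ratio" means the fraction of ranks placed in the perturbing group, and the function name `split_ranks` is hypothetical. In an MPI program the resulting color would typically be passed to `MPI_Comm_split`.

```python
import random


def split_ranks(n_ranks: int, ratio: float, seed: int = 0) -> list[int]:
    """Assign a color to every rank: 1 = perturbing group, 0 = measured group.

    round(ratio * n_ranks) randomly chosen ranks form the perturbing group
    (assumed meaning of "ratio"). The same seed on all ranks yields the same
    split everywhere.
    """
    n_pert = round(ratio * n_ranks)
    pert = set(random.Random(seed).sample(range(n_ranks), n_pert))
    return [1 if r in pert else 0 for r in range(n_ranks)]


if __name__ == "__main__":
    print(split_ranks(8, 0.5, seed=1))
```

Seeding the generator identically on every rank avoids an extra communication step to agree on the split.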

SLIDE 7

Benchmark Results

[Plot: Time [us] vs. Nodes in collective; series: no perturbation, with perturbation]

boxplot, 32 nodes, MPI_Allreduce (single MPI_DOUBLE)

Open MPI 1.2.8, SDR/IB, 566 node fat tree, FBB

SLIDE 8

Benchmark Results

[Plot: Slowdown relative to unperturbed run [%] vs. Perturbation Ratio; series: Broadcast with 208 nodes, Reduce with 492 nodes, Allreduce with 492 nodes]

different collectives, 128 measurements, average plotted

very high variance (only 128 samples, background load)
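The plotted quantity can be sketched as a ratio of averages. The exact definition is not given on the slide; this sketch assumes that the slowdown is the mean perturbed time expressed as a percentage of the mean unperturbed time, so that 100 means no slowdown. The function name is hypothetical.

```python
from statistics import mean


def slowdown_percent(t_pert, t_nopert):
    """Mean perturbed time as a percentage of the mean unperturbed time.

    100 means no slowdown (assumed definition, not taken from the slides).
    """
    return 100.0 * mean(t_pert) / mean(t_nopert)


if __name__ == "__main__":
    # Made-up timings in microseconds, for illustration only.
    print(slowdown_percent([2.0, 4.0], [1.0, 2.0]))
```

With only 128 samples and high variance, a median-based variant may be more robust than the mean, although the slide states that the average was plotted.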

SLIDE 9

Benchmark Results

[Plot: Slowdown at Ratio of 0.5 [in %] vs. Application Communicator Size]

fixed perturbation ratio (0.5)

slowdown with increasing communicator size

SLIDE 10

Simulation Methodology

needs to consider topology and routing

use IB as a model

simple linear congestion model

we model collective operations as a set of dependencies

(collective) level-wise simulation

[Figure: dependency graph of a collective operation]
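Modelling a collective as a set of dependencies that is processed level by level can be sketched for a binomial-tree broadcast. This is a common textbook construction, not necessarily the one used by the simulator or by any particular MPI library, and the function name is hypothetical.

```python
def binomial_bcast_rounds(n_ranks: int, root: int = 0):
    """Return a list of rounds; each round is a list of (src, dst) messages.

    In round k, every rank that already holds the data sends it to the rank
    2**k positions further (counted relative to the root).
    """
    rounds = []
    dist = 1
    while dist < n_ranks:
        msgs = [((r + root) % n_ranks, (r + dist + root) % n_ranks)
                for r in range(dist) if r + dist < n_ranks]
        rounds.append(msgs)
        dist *= 2
    return rounds


if __name__ == "__main__":
    for level, msgs in enumerate(binomial_bcast_rounds(8)):
        print(level, msgs)
```

Each round is one level of the dependency graph: a message in round k can only start once its sender has received the data in an earlier round.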

SLIDE 11

Simulation Methodology

route every logical link through the network

record congestion on edges

[Figure: fat tree of 4x4 switches connecting endpoints 0 to 15, with routing tables and per-edge congestion counts; collective graph annotated with congestion]

annotate collective graph with maximum path congestion

longest path from any root node is reported
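The steps above can be sketched on a toy network. This is not the ORCS implementation: the two-level tree, the static routing rule (spine chosen as `dst % n_spines`), the treatment of background traffic as active in every round, and all names are assumptions made for illustration.

```python
from collections import Counter


def route(src, dst, hosts_per_leaf=4, n_spines=2):
    """Directed edges used from host src to host dst on a toy two-level tree.

    Hypothetical static routing: the spine switch is dst % n_spines.
    """
    ls, ld = src // hosts_per_leaf, dst // hosts_per_leaf
    if ls == ld:
        return [(f"h{src}", f"l{ls}"), (f"l{ls}", f"h{dst}")]
    spine = f"s{dst % n_spines}"
    return [(f"h{src}", f"l{ls}"), (f"l{ls}", spine),
            (spine, f"l{ld}"), (f"l{ld}", f"h{dst}")]


def simulate(rounds, background=()):
    """Longest congestion-weighted dependency path of a collective.

    rounds: list of rounds, each a list of (src, dst) messages.
    background: (src, dst) pairs assumed to be active during every round.
    """
    ready = Counter()  # time at which each rank holds the data / may send
    for msgs in rounds:
        # Record how many messages of this round cross each edge.
        load = Counter()
        for s, d in list(msgs) + list(background):
            load.update(route(s, d))
        for s, d in msgs:
            # Linear congestion model: cost is the maximum edge load on the path.
            cost = max(load[e] for e in route(s, d))
            # The sender is busy until the message is done; the receiver
            # becomes ready at the same time.
            ready[s] = ready[d] = ready[s] + cost
    return max(ready.values(), default=0)


if __name__ == "__main__":
    bcast = [[(0, 1)], [(0, 2), (1, 3)], [(0, 4), (1, 5), (2, 6), (3, 7)]]
    print(simulate(bcast))
    print(simulate([[(0, 4)]], background=[(1, 6)]))
```

The sketch assumes that no rank both receives and sends within the same round, which holds for tree patterns such as a binomial broadcast.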

SLIDE 12

Simulation Systems

real-world system inputs (IB network maps)

Odin @ IU (128 nodes, FBB fat tree)

CHiC @ TUC (566 nodes, FBB fat tree)

Atlas @ LLNL (1142 nodes, FBB fat tree)

Ranger @ TACC (3908 nodes, FBB fat tree)

TBird @ SNL (4391 nodes, 1/2 BB fat tree)

⇒ your system? Please give us the maps!

SLIDE 13

Simulation Results

[Plot: Slowdown relative to unperturbed run [%] vs. Perturbation Ratio; series: Odin (128), CHiC (566), Atlas (1142), Ranger (3908), TBird (4391)]

binomial tree pattern (small message Bcast, Reduce)

CHiC results reflect microbenchmark accurately!

SLIDE 14

Real Large-Scale Simulation Results

simulated large extended generalized fat trees (XGFT)

24 port crossbars, full bisection bandwidth

fat tree optimized routing (OpenSM)

144 nodes (one level) to 20,736 nodes (three levels)

above: 144 nodes, below: 1152 nodes

SLIDE 15

Simulation Results

[Plot: Slowdown at ratio of 0.5 [in %] vs. Network Size]

perturbation ratio 0.5, tree pattern

logarithmic shape reflects CHiC benchmarks!

SLIDE 16

Conclusions and Future Work

Conclusions

network noise must be considered

significant impact, similar to OS noise

no known real-world analyses yet

network topology and routing are very important

Future Work

good process-to-node mapping could reduce problems

topology-aware communication algorithms

extend analysis to real applications (profiling, tracing)

analyze several network topologies and workarounds

SLIDE 17

Thanks for your attention! Questions?

Download the (research-quality) ORCS simulator at: http://www.unixer.de/ORCS
