

slide-1
SLIDE 1

Martin QUINSON

École Normale Supérieure de Lyon, Laboratoire de l’Informatique du Parallélisme

Automatic discovery of the characteristics and capacities of a distributed computational platform

December 11th 2003

slide-2
SLIDE 2

Introduction to the Grid

Metacomputing: aggregating distributed computers and storage units; the resulting platform is usually called the Grid

  • Very high potential (in power and ease of use)
  • The Grid hardware is already there

Sharing local resources between several organizations ⇒ a WAN constellation of LANs

  • The Grid software infrastructure is only emerging.

Difficulties come from (amongst others):

  • Heterogeneity
  • Resource sharing (⇒ availability variations)
  • Multiple organizations (trust issue)

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 2 / 27 ⊲⊲ |

slide-6
SLIDE 6

Which information for which scheduling?

Randomized scheduling:

  • Tasks list; existing hosts list

Simple scheduling:

  • About tasks: theoretical complexity (like O(n))
  • About hosts: peak performance, or performance on a given benchmark
  • About links: maximal capacities

Current Grid scheduling:

  • About hosts: up/down, CPU and memory load
  • About links: current capacities matrix

Information quality is crucial to scheduling quality

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 3 / 27 ⊲⊲ |

slide-8
SLIDE 8

Overview of this work Our goal: provide the information needed by the scheduler.

  • I. Quantitative knowledge of needs (tasks) and availabilities (servers and network)

NWS + FAST

  • II. Qualitative knowledge of network topology

ENV→ ALNeM

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 4 / 27 ⊲⊲ |

slide-9
SLIDE 9

Overview of this work Our goal: provide the information needed by the scheduler.

  • I. Quantitative knowledge of needs (tasks) and availabilities (servers and network)

NWS + FAST

  • II. Qualitative knowledge of network topology

ENV→ ALNeM

[Figure: servers, clients, and an agent scheduling Task1, Task2, Task3]

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 4 / 27 ⊲⊲ |

slide-10
SLIDE 10

Overview of this work Our goal: provide the information needed by the scheduler.

  • I. Quantitative knowledge of needs (tasks) and availabilities (servers and network)

NWS + FAST

  • II. Qualitative knowledge of network topology

ENV→ ALNeM

[Figure: servers and clients connected through the network]

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 4 / 27 ⊲⊲ |

slide-11
SLIDE 11

Overview of this work Our goal: provide the information needed by the scheduler.

  • I. Quantitative knowledge of needs (tasks) and availabilities (servers and network)

NWS + FAST

  • II. Qualitative knowledge of network topology

ENV→ ALNeM

NWS [RSH99] forecasts:

  • bandwidth, latency, memory, disk space, . . .
  • host load as percentage

[Figure: platform annotated with NWS measurements (available memory per host, bandwidth per link)]

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 4 / 27 ⊲⊲ |

slide-12
SLIDE 12

Overview of this work Our goal: provide the information needed by the scheduler.

  • I. Quantitative knowledge of needs (tasks) and availabilities (servers and network)

NWS + FAST

  • II. Qualitative knowledge of network topology

ENV→ ALNeM

NWS [RSH99] forecasts:

  • bandwidth, latency, memory, disk space, . . .
  • host load as percentage

FAST [Qui02b] provides:

  • Task needs benchmarking: time and memory size (fitted to the host)

⇒ Duration of the task on each server

[Figure: platform annotated with FAST forecasts (memory per host, bandwidth per link)]

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 4 / 27 ⊲⊲ |

slide-13
SLIDE 13

Overview of this work Our goal: provide the information needed by the scheduler.

  • I. Quantitative knowledge of needs (tasks) and availabilities (servers and network)

NWS + FAST

  • II. Qualitative knowledge of network topology

ENV→ ALNeM Motivating example: how to configure NWS?

  • Simplest: measure everything

[Figure: measuring everything — all-to-all tests between servers and clients]

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 4 / 27 ⊲⊲ |

slide-14
SLIDE 14

Overview of this work Our goal: provide the information needed by the scheduler.

  • I. Quantitative knowledge of needs (tasks) and availabilities (servers and network)

NWS + FAST

  • II. Qualitative knowledge of network topology

ENV→ ALNeM Motivating example: how to configure NWS?

  • Simplest: measure everything
  • Better: hierarchical

[Figure: hierarchical organization of the measurements]

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 4 / 27 ⊲⊲ |

slide-15
SLIDE 15

Overview of this work Our goal: provide the information needed by the scheduler.

  • I. Quantitative knowledge of needs (tasks) and availabilities (servers and network)

NWS + FAST

  • II. Qualitative knowledge of network topology

ENV→ ALNeM Motivating example: how to configure NWS?

  • Simplest: measure everything
  • Better: hierarchical

Target:

  • logical topology (end-host)
  • interferences

[Figure: target end-host logical topology of the platform]

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 4 / 27 ⊲⊲ |

slide-16
SLIDE 16

Overview of this work Our goal: provide the information needed by the scheduler.

  • I. Quantitative knowledge of needs (tasks) and availabilities (servers and network)

NWS + FAST

  • II. Qualitative knowledge of network topology

ENV→ ALNeM

ENV [SBW99]:

maps the network without root access

  • Only hierarchical (tree)

[Figure: the reconstructed tree; part of the topology remains unknown (?)]

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 4 / 27 ⊲⊲ |

slide-17
SLIDE 17

Overview of this work Our goal: provide the information needed by the scheduler.

  • I. Quantitative knowledge of needs (tasks) and availabilities (servers and network)

NWS + FAST

  • II. Qualitative knowledge of network topology

ENV→ ALNeM

ENV [SBW99]:

maps the network without root access

  • Only hierarchical (tree)

ALNeM [LQ04]

  • Same approach as ENV, generalized
  • Stronger theoretical foundations

[Figure: reconstructed topology of a large platform (~180 numbered nodes)]

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 4 / 27 ⊲⊲ |

slide-18
SLIDE 18

Overview

  • Introduction
  • NWS: Network Weather Service
  • FAST: Fast Agent’s System Timer
  • ALNeM: Application-Level Network Mapper
  • Conclusion

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 4 / 27 ⊲⊲ |

slide-19
SLIDE 19

The Network Weather Service: presentation

Goal: (Grid) system availabilities measurement and forecasting

Led by Prof. Wolski (UCSB); used by AppLeS, Globus, NetSolve, Ninf, DIET, . . .

Architecture: Distributed system

Sensor: conducts the measurements
Memory: stores the results
Forecaster: statistically forecasts the tendencies
Name server: directory service, like LDAP

[Diagram: sensors, memory, forecaster, and name server]

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 5 / 27 ⊲⊲ |

slide-20
SLIDE 20

The Network Weather Service: presentation

Goal: (Grid) system availabilities measurement and forecasting

Led by Prof. Wolski (UCSB); used by AppLeS, Globus, NetSolve, Ninf, DIET, . . .

Architecture: Distributed system

Sensor: conducts the measurements
Memory: stores the results
Forecaster: statistically forecasts the tendencies
Name server: directory service, like LDAP

[Diagram: in steady state, sensors run regular tests and store the results in the memory; an external source can also feed measurements]

Steady state: regular tests

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 5 / 27 ⊲⊲ |

slide-21
SLIDE 21

The Network Weather Service: presentation

Goal: (Grid) system availabilities measurement and forecasting

Led by Prof. Wolski (UCSB); used by AppLeS, Globus, NetSolve, Ninf, DIET, . . .

Architecture: Distributed system

Sensor: conducts the measurements
Memory: stores the results
Forecaster: statistically forecasts the tendencies
Name server: directory service, like LDAP

[Diagram: a client request reaching the name server and forecaster]

Handling of a request

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 5 / 27 ⊲⊲ |

slide-22
SLIDE 22

The Network Weather Service: presentation

Goal: (Grid) system availabilities measurement and forecasting

Led by Prof. Wolski (UCSB); used by AppLeS, Globus, NetSolve, Ninf, DIET, . . .

Architecture: Distributed system

Sensor: conducts the measurements
Memory: stores the results
Forecaster: statistically forecasts the tendencies
Name server: directory service, like LDAP

[Diagram: the forecaster answering the request from the stored measurements]

Handling of a request

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 5 / 27 ⊲⊲ |

slide-23
SLIDE 23

The Network Weather Service: presentation

Goal: (Grid) system availabilities measurement and forecasting

Led by Prof. Wolski (UCSB); used by AppLeS, Globus, NetSolve, Ninf, DIET, . . .

Architecture: Distributed system

Sensor: conducts the measurements
Memory: stores the results
Forecaster: statistically forecasts the tendencies
Name server: directory service, like LDAP

[Diagram: the answer returned to the client]

Handling of a request

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 5 / 27 ⊲⊲ |

slide-24
SLIDE 24

Measurements and Forecasting

  • Provided metrics:

availableCpu (for an incoming process), currentCpu (for existing processes), bandwidthTcp, latencyTcp (Default: 64Kb in 16Kb messages; buffer=32Kb), connectTimeTcp, freeDisk, freeMemory, . . .

  • Forecasting using statistics

Data = series D1, D2, . . . , Dn−1, Dn; we want Dn+1. Each method is applied to D1, D2, . . . , Dn−1 and predicts Dn; the method that best predicts Dn is selected to predict Dn+1.

Used statistical methods

mean: running or (adaptive) sliding window; median: same; gradient (exponential smoothing): GRAD(t, g) = (1 − g) × GRAD(t − 1, g) + g × value(t); last value.
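To make the selection concrete, here is a minimal sketch (not the actual NWS code; the method set and parameters are illustrative) of applying each method to D1 . . . Dn−1, scoring it on Dn, and using the winner for Dn+1:

    # Illustrative sketch of NWS-style forecaster selection (not NWS itself).
    def running_mean(history):
        return sum(history) / len(history)

    def sliding_mean(history, window=10):
        recent = history[-window:]
        return sum(recent) / len(recent)

    def median(history):
        s = sorted(history)
        m = len(s) // 2
        return s[m] if len(s) % 2 else (s[m - 1] + s[m]) / 2

    def gradient(history, g=0.1):
        # Exponential smoothing: GRAD(t, g) = (1 - g) * GRAD(t - 1, g) + g * value(t)
        est = history[0]
        for v in history[1:]:
            est = (1 - g) * est + g * v
        return est

    def last_value(history):
        return history[-1]

    METHODS = [running_mean, sliding_mean, median, gradient, last_value]

    def forecast(series):
        """Apply every method to D1..Dn-1, keep the one closest to Dn, use it for Dn+1."""
        past, dn = series[:-1], series[-1]
        best = min(METHODS, key=lambda m: abs(m(past) - dn))
        return best(series)

    print(forecast([0.8, 0.7, 0.75, 0.9, 0.85, 0.8]))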

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 6 / 27 ⊲⊲ |

slide-25
SLIDE 25

Conclusion about NWS

Strengths: complete environment; designed for scheduling; statistical forecasting; widely used.
Weaknesses: not easy to extend; sometimes difficult to deploy; TCP only (Myrinet-based networks?).

Related work

NetPerf: HP project to sort network components, no interactivity
GloPerf: Globus moves to NWS
PingER: regular pings between 600 hosts in 72 countries
Iperf: finds out the bandwidth by saturating the link for 30 seconds
RPS: forecasting limited to the CPU load
Performance Co-Pilot (SGI):

  • Same kind of architecture
  • Low level data (/proc) ⇒ not easily usable by a scheduler
  • No forecasting

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 7 / 27 ⊲⊲ |

slide-26
SLIDE 26

Overview

  • Introduction
  • NWS: Network Weather Service
  • FAST: Fast Agent’s System Timer
  • ALNeM: Application-Level Network Mapper
  • Conclusion

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 7 / 27 ⊲⊲ |

slide-27
SLIDE 27

Fast Agent’s System Timer: presentation

Goals:

  • gather routine’s performance on a given host at a given time
  • interactivity, ease of use

Architecture:

[Diagram: the FAST library on top of NWS — needs modeling combined with system availabilities]

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 8 / 27 ⊲⊲ |

slide-28
SLIDE 28

Fast Agent’s System Timer: presentation

Goals:

  • gather routine’s performance on a given host at a given time
  • interactivity, ease of use

Architecture:

[Diagram: at installation time, a benchmarker stores the needs models in LDAP; the FAST library combines them with NWS system availabilities]

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 8 / 27 ⊲⊲ |

slide-29
SLIDE 29

Fast Agent’s System Timer: presentation

Goals:

  • gather routine’s performance on a given host at a given time
  • interactivity, ease of use

Architecture:

[Diagram: at run time, the client application calls the FAST library, which combines the LDAP needs models with NWS system availabilities]

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 8 / 27 ⊲⊲ |

slide-30
SLIDE 30

Routines needs modeling Related Work

  • Elementary operation count: the myth of the constant Mflop/s
  • Analytical model, micro-benchmarking: complex ⇒ interactivity? task description?
  • Probability, Markov: how to instantiate it at a given time?

FAST’s approach

  • Simple (sequential) routines like BLAS

macro-benchmarking: benchmark {task; host} as a whole at installation time (a small sketch follows at the end of this slide)

  • Getting the time: utime + stime, to avoid background load
  • Getting the space: step-by-step execution (like gdb) to track changes and find the peak

⇒ rather long, but done only once

  • Complex routines (ScaLAPACK)

Structural decomposition by source analysis

  • Irregular routines (sparse algebra)

No forecasting ⇒ selection of the fastest host; decomposition to extract simple parts; input of estimators from the application
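A small sketch of the macro-benchmarking idea mentioned above (not the FAST implementation; the routine, the sizes and the cubic fit are assumptions made for the example): time the {task; host} pair as a whole at installation, fit a simple model, and evaluate it cheaply at run time.

    # Sketch of installation-time macro-benchmarking followed by run-time interpolation.
    # Not FAST itself; a dgemm-like cubic cost is only an assumption for the example.
    import os
    import numpy as np

    def cpu_time():
        """utime + stime of the current process, so background load is ignored."""
        t = os.times()
        return t.user + t.system

    def measure_once(routine, a, b):
        t0 = cpu_time()
        routine(a, b)
        return cpu_time() - t0

    def benchmark(routine, sizes, runs=3):
        samples = []
        for n in sizes:
            a, b = np.random.rand(n, n), np.random.rand(n, n)
            best = min(measure_once(routine, a, b) for _ in range(runs))
            samples.append((n, best))
        return samples

    def fit_model(samples, degree=3):
        ns = np.array([n for n, _ in samples], dtype=float)
        ts = np.array([t for _, t in samples])
        return np.polyfit(ns, ts, degree)      # time(n) ~ polynomial in n

    # Installation time: benchmark once, store the coefficients.
    coeffs = fit_model(benchmark(np.dot, [128, 256, 384, 512]))
    # Run time: cheap forecast for any size.
    print("predicted time for n=1024:", np.polyval(coeffs, 1024), "s")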

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 9 / 27 ⊲⊲ |

slide-33
SLIDE 33

Routines needs modeling Related Work

  • Elementary operation count: the myth of the constant Mflop/s
  • Analytical model, micro-benchmarking: complex ⇒ interactivity? task description?
  • Probability, Markov: how to instantiate it at a given time?

FAST’s approach

  • Simple (sequential) routines like BLAS

macro-benchmarking: benchmark {task; host} as a whole at installation time

  • Getting the time: utime + stime, to avoid background load
  • Getting the space: step-by-step execution (like gdb) to track changes and find the peak

⇒ rather long, but done only once

  • Complex routines (ScaLAPACK)

Freddy [CDQF03]: structural decomposition by source analysis (integration underway)

  • Irregular routines (sparse algebra)

No forecasting ⇒ selection of the fastest host; decomposition to extract simple parts; input of estimators from the application

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 9 / 27 ⊲⊲ |

slide-35
SLIDE 35

Quality of the modeling Time modeling

                 dgeadd                dgemm                 dtrsm
                 icluster   paraski    icluster   paraski    icluster   paraski
Maximal error    0.02 s     0.02 s     0.21 s     5.8 s      0.13 s     0.31 s
                 6%         35%        0.3%       4%         10%        16%
Average error    0.006 s    0.007 s    0.025 s    0.03 s     0.02 s     0.08 s
                 4%         6.5%       0.1%       0.1%       5%         7%

dgeadd: matrix addition; dgemm: matrix multiplication; dtrsm: triangular resolution.
icluster: bi-Pentium II, 256 MB, Linux, IMAG (Grenoble). paraski: Pentium III, 256 MB, Linux, IRISA (Rennes).
Network: intra: LAN, 100 Mb/s; inter: VTHD network, 2.5 Gb/s.

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 10 / 27 ⊲⊲ |

slide-36
SLIDE 36

Quality of the modeling Time modeling

                 dgeadd                dgemm                 dtrsm
                 icluster   paraski    icluster   paraski    icluster   paraski
Maximal error    0.02 s     0.02 s     0.21 s     5.8 s      0.13 s     0.31 s
                 6%         35%        0.3%       4%         10%        16%
Average error    0.006 s    0.007 s    0.025 s    0.03 s     0.02 s     0.08 s
                 4%         6.5%       0.1%       0.1%       5%         7%

dgeadd: matrix addition; dgemm: matrix multiplication; dtrsm: triangular resolution.
icluster: bi-Pentium II, 256 MB, Linux, IMAG (Grenoble). paraski: Pentium III, 256 MB, Linux, IRISA (Rennes).
Network: intra: LAN, 100 Mb/s; inter: VTHD network, 2.5 Gb/s.

Space modeling

Almost perfect: Maximal error < 1% ; Average error ≈ 0.1%

Space = code size (constant) + matrix sizes (polynomial)

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 10 / 27 ⊲⊲ |

slide-37
SLIDE 37

Forecasting with background load

dgemm with background load (CPU-intensive process in background).

[Plot: dgemm time vs. matrix size (128–1024), forecasted and measured, on paraski and icluster]

Maximal error: 22%; average error < 10%

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 11 / 27 ⊲⊲ |

slide-38
SLIDE 38

Forecasting of sequence with background load

C = (Cr, Ci) with:
  Cr = Ar × Br − Ai × Bi
  Ci = Ar × Bi + Ai × Br

client/servers over LAN

[Plot: measured vs. forecasted time of the sequence, matrix sizes 128–1024]

Maximal error: 25%; Average error: 13%

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 12 / 27 ⊲⊲ |

slide-39
SLIDE 39

Comparison with NetSolve’s forecaster

[Plot: computation time of dgemm, matrix sizes 128–1024 — NetSolve forecast, measured time, FAST forecast]

[Plot: communication time of dgemm, matrix sizes 128–1152 — NetSolve forecast, measured time, FAST forecast]

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 13 / 27 ⊲⊲ |

slide-40
SLIDE 40

Latency reduction

[Bar chart, time on a log scale: mean latency per request — NWS: µ = 99569 µs; FAST on a cache miss: µ = 100685 µs; FAST on a cache hit: µ = 24 µs]

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 14 / 27 ⊲⊲ |

slide-41
SLIDE 41

Responsiveness improvement

Scheduler / NWS collaboration

[Plot: CPU availability (%) over 3.5 minutes, with idle periods and a task run]

Forecasting:
NWS: out of the box
FAST: {sensors restart + forecaster reset} when the task starts or ends
Theoretical value

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 15 / 27 ⊲⊲ |

slide-42
SLIDE 42

Virtual booking: How does it work?

[Timeline diagram: a task is scheduled — scheduling decision, FAST asks NWS to update (a −1 correction is applied to the sensor value), task started, task ended, NWS updated again]
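A minimal sketch of the virtual-booking mechanism (hypothetical class and method names, not the FAST/NWS API): the forecast is corrected as soon as the scheduling decision is taken, and the correction is dropped once the sensors actually see the task.

    # Hypothetical sketch of virtual booking on top of a CPU-availability forecaster.
    class BookedForecaster:
        def __init__(self, nws_forecast):
            self.nws_forecast = nws_forecast    # callable returning the raw forecasted CPU share
            self.corrections = {}               # task id -> booked CPU share

        def on_scheduling_decision(self, task_id, cpu_share=1.0):
            # The task will run but the sensors have not seen it yet: book it now.
            self.corrections[task_id] = cpu_share

        def on_task_visible(self, task_id):
            # The measurements now include the task: the booking would count it twice.
            self.corrections.pop(task_id, None)

        def availability(self):
            raw = self.nws_forecast()
            return max(0.0, raw - sum(self.corrections.values()))

    f = BookedForecaster(lambda: 0.95)
    f.on_scheduling_decision("dgemm-42")
    print(f.availability())     # ~0.0: the host is already considered busy
    f.on_task_visible("dgemm-42")
    print(f.availability())     # back to the raw forecast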

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 16 / 27 ⊲⊲ |

slide-43
SLIDE 43

Benefits of virtual booking

[Plot — measurements: CPU availability (%) over 3.5 minutes, with idle periods and a running task]

[Plot — forecasting: same scenario]

Forecasting:
NWS: ADAPT_CPU
FAST: ADAPT_CPU + virtual booking + sensors restart + forecaster reset
Theoretical value
(Result of 4 different runs)

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 17 / 27 ⊲⊲ |

slide-44
SLIDE 44

Contributions of FAST

Forecasting with load

[Plot: measured vs. forecasted time of the sequence, matrix sizes 128–1024]

Responsiveness

[Plot: CPU availability (%) over 3.5 minutes, with idle periods and a running task]

Summary

  • Generic benchmarking solution
  • Simple interface to quantitative data
  • Handling of parallel routines currently being integrated
  • Integration: DIET, NetSolve, Grid-TLSE, cichlid
  • 15 000 lines of C code, Linux, Solaris, Tru64
  • 2 journals and 3 conferences/workshops

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 18 / 27 ⊲⊲ |

slide-45
SLIDE 45

Overview

  • Introduction
  • NWS: Network Weather Service
  • FAST: Fast Agent’s System Timer
  • ALNeM: Application-Level Network Mapper
  • Conclusion

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 18 / 27 ⊲⊲ |

slide-46
SLIDE 46

Application-Level Network Mapper

Goal: mapping the network topology
Authors: Arnaud Legrand, Martin Quinson
Motivation: server hosting, simulation, collective communication forecasting
Target application: NWS hosting
Problem: network experiments must not collide (clique concept)
Simplest: one big clique; Better: hierarchical

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 19 / 27 ⊲⊲ |

slide-47
SLIDE 47

Application-Level Network Mapper

Goal: mapping the network topology
Authors: Arnaud Legrand, Martin Quinson
Motivation: server hosting, simulation, collective communication forecasting
Target application: NWS hosting
Problem: network experiments must not collide (clique concept)
Simplest: one big clique; Better: hierarchical

[Figure: all hosts measuring each other in one big clique]

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 19 / 27 ⊲⊲ |

slide-48
SLIDE 48

Application-Level Network Mapper

Goal: mapping the network topology
Authors: Arnaud Legrand, Martin Quinson
Motivation: server hosting, simulation, collective communication forecasting
Target application: NWS hosting
Problem: network experiments must not collide (clique concept)
Simplest: one big clique; Better: hierarchical

[Figure: hierarchical organization of the measurement cliques]

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 19 / 27 ⊲⊲ |

slide-49
SLIDE 49

Application-Level Network Mapper

Goal: mapping the network topology
Authors: Arnaud Legrand, Martin Quinson
Motivation: server hosting, simulation, collective communication forecasting
Focus: discover interferences (limiting common links), not really packet paths

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 19 / 27 ⊲⊲ |

slide-50
SLIDE 50

Application-Level Network Mapper

Goal: mapping the network topology
Authors: Arnaud Legrand, Martin Quinson
Motivation: server hosting, simulation, collective communication forecasting
Focus: discover interferences (limiting common links), not really packet paths

Related work

Method               Restricted    Focus           Routers       Notes
SNMP                 authorized    path            all           passive, dumb routers, LAN
traceroute           ICMP          path            all           level 3 of OSI
pathchar             root          path            all           link bandwidth, slow
Other (tomography)   no            path            din = dout    tree, bipartite [Rabbat03]
ENV                  no            interference    some          tree only

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 19 / 27 ⊲⊲ |

slide-51
SLIDE 51

ALNeM: Notations

Def (non-interference): (ab) ∥ (cd) ⟺ bw_cd(ab) / bw(ab) ≈ 1
Def (interference): (ab) ⋈ (cd) ⟺ bw_cd(ab) / bw(ab) ≈ 0.5
(bw(ab): bandwidth of (ab) measured alone; bw_cd(ab): bandwidth of (ab) measured while (cd) also transfers)

Def (interference matrix): I(V, ⋈)(a, b, c, d) = 1 if (ab) ⋈ (cd), 0 if not.

INTERFERENCEGRAPH: given H and I(H, ⋈), find a graph G(V, E) and the associated routing satisfying:
  H ⊂ V ;  I(H, G) = I(H, ⋈) ;  |V| is minimal.
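As an illustration of these definitions, a sketch of how the interference matrix could be filled from bandwidth measurements (the two measurement functions are assumptions, standing in for the actual probes, which are not shown):

    # Sketch: build I(V)(a, b, c, d) from bandwidth ratios (measurement code assumed elsewhere).
    from itertools import permutations

    def interferes(bw_alone, bw_concurrent, threshold=0.75):
        """Two flows share a limiting link when the concurrent bandwidth drops to ~half."""
        return 1 if bw_concurrent / bw_alone < threshold else 0

    def interference_matrix(hosts, measure_alone, measure_concurrent):
        I = {}
        for a, b in permutations(hosts, 2):
            base = measure_alone(a, b)
            for c, d in permutations(hosts, 2):
                if (a, b) == (c, d):
                    continue
                I[(a, b, c, d)] = interferes(base, measure_concurrent(a, b, c, d))
        return I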

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 20 / 27 ⊲⊲ |

slide-54
SLIDE 54

Mathematical tools

  • Def (total interference): a ⊥ b ⟺ ∀(u, v) ∈ H, (au) ⋈ (bv)

Lemma (separator): ∀a, b ∈ H, a ⊥ b ⟺ ∃ρ ∈ V such that ∀z ∈ H, ρ ∈ (a → z) ∩ (b → z)
(⊥ ⟺ there exists a separator ρ)

Theorem: ⊥ is an equivalence relation (under some assumptions)

Theorem (representativity): let C be an equivalence class under ⊥ (under some assumptions):
∀ρ, σ ∈ C, ∀b, u, v ∈ H, (ρ, u) ⋈ (b, v) ⇔ (σ, u) ⋈ (b, v)
(any member of the class can be interchanged with any other in the interference matrix)

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 21 / 27 ⊲⊲ |

slide-58
SLIDE 58

Algorithm for cliques of trees

Equivalence class ⇒ greedy algorithm eating the leaves

[Figure: example graph on hosts A–I and the tree reconstructed by the algorithm]

Theorem: when |Cinf| = 1, the graph built is a solution.
Theorem: if a tree being a solution exists, then |Cinf| = 1.
Remark: the graph built is optimal (w.r.t. |V|, since V = H).
Theorem: when I contains no interference, the clique of the Ci is a valid solution.
Remark: it is also optimal.

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 22 / 27 ⊲⊲ |

slide-64
SLIDE 64

Extension for cycles

Let a, b be the elements of Ci with the most interferences.
Lemma: there is no solution in which ∃z ∈ H with z ∈ (a → b) ⇒ cut between a and b!

[Figure: a cycle containing a and b]

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 23 / 27 ⊲⊲ |

slide-65
SLIDE 65

Extension for cycles

Let a, b be the elements of Ci with the most interferences.
Lemma: there is no solution in which ∃z ∈ H with z ∈ (a → b) ⇒ cut between a and b!

[Figure: the cycle is cut between a and b, introducing two new nodes α and β]

Finding out how to cut

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 23 / 27 ⊲⊲ |

slide-66
SLIDE 66

Extension for cycles

Let a, b be the elements of Ci with the most interferences.
Lemma: there is no solution in which ∃z ∈ H with z ∈ (a → b) ⇒ cut between a and b!

[Figure: the hosts split into three groups I1, I2, I3 around a and b, with cut points α and β]

Finding out how to cut

I1 = { u ∈ Ci : a ∈ (b → u) and b ∉ (a → u) }
I2 = { u ∈ Ci : a ∉ (b → u) and b ∉ (a → u) }
I3 = { u ∈ Ci : a ∉ (b → u) and b ∈ (a → u) }
I4 = { u ∈ Ci : a ∈ (b → u) and b ∈ (a → u) }
(I1 is the side of a, I3 the side of b, I2 the part in between, and I4 reduces to {a; b})

I4 = {a; b}: the contrary would lead to a contradiction (a host u reached from a through b and from b through a)
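Assuming the definitions above, and taking routes as explicit host sets for readability (the real algorithm works from the interference matrix only), the partition can be sketched as:

    # Sketch of the I1..I4 partition, with routes given as explicit host sets for clarity.
    def partition(Ci, a, b, route):
        """route(x, y) returns the set of nodes traversed from x to y (inclusive)."""
        I1, I2, I3, I4 = set(), set(), set(), set()
        for u in Ci:
            on_a_side = a in route(b, u)      # reaching u from b goes through a
            on_b_side = b in route(a, u)      # reaching u from a goes through b
            if on_a_side and not on_b_side:
                I1.add(u)
            elif not on_a_side and not on_b_side:
                I2.add(u)
            elif not on_a_side and on_b_side:
                I3.add(u)
            else:
                I4.add(u)                     # expected to be exactly {a, b}
        return I1, I2, I3, I4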

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 23 / 27 ⊲⊲ |

slide-67
SLIDE 67

Extension for cycles

Let a, b be the elements of Ci with the most interferences.
Lemma: there is no solution in which ∃z ∈ H with z ∈ (a → b) ⇒ cut between a and b!

[Figure: hosts u and v placed in the groups I1, I2, I3 around a and b, with cut points α and β]

Finding out how to cut

I1 = { u ∈ Ci : a ∈ (b → u) and b ∉ (a → u) }
I2 = { u ∈ Ci : a ∉ (b → u) and b ∉ (a → u) }
I3 = { u ∈ Ci : a ∉ (b → u) and b ∈ (a → u) }
I4 = { u ∈ Ci : a ∈ (b → u) and b ∈ (a → u) }
(I1 is the side of a, I3 the side of b, I2 the part in between, and I4 reduces to {a; b})

[Figure: slice of the interference matrix for b, α, β, a against hosts u and v, whose 0/1 pattern separates I1, I2 and I3]

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 23 / 27 ⊲⊲ |

slide-68
SLIDE 68

Extension for cycles

Let a, b be the elements of Ci with the most interferences.
Lemma: there is no solution in which ∃z ∈ H with z ∈ (a → b) ⇒ cut between a and b!

[Figure: hosts u and v placed in the groups I1, I2, I3 around a and b, with cut points α and β]

Finding out how to cut

I1 = { u ∈ Ci : a ∈ (b → u) and b ∉ (a → u) }
I2 = { u ∈ Ci : a ∉ (b → u) and b ∉ (a → u) }
I3 = { u ∈ Ci : a ∉ (b → u) and b ∈ (a → u) }
I4 = { u ∈ Ci : a ∈ (b → u) and b ∈ (a → u) }
(I1 is the side of a, I3 the side of b, I2 the part in between, and I4 reduces to {a; b})

[Figure: slice of the interference matrix for b, α, β, a against hosts u and v, whose 0/1 pattern separates I1, I2 and I3]

Topological sort on the graph associated to the matrix slice gives I1, I2, I3

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 23 / 27 ⊲⊲ |

slide-69
SLIDE 69

Extension for cycles

Let a, b be the elements of Ci with the most interferences.
Lemma: there is no solution in which ∃z ∈ H with z ∈ (a → b) ⇒ cut between a and b!

[Figure: the three groups I1, I2, I3 around a and b]

Finding out how to cut; how to connect the parts afterward:
First step on I1 → finds 2 classes I1a and I1α; a ∈ I1a.
First step on I3 → finds 2 classes I1b and I1β; b ∈ I1b.

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 23 / 27 ⊲⊲ |

slide-70
SLIDE 70

Extension for cycles

Let a, b be the elements of Ci with the most interferences.
Lemma: there is no solution in which ∃z ∈ H with z ∈ (a → b) ⇒ cut between a and b!

[Figure: the three groups I1, I2, I3 around a and b]

Finding out how to cut; how to connect the parts afterward:
First step on I1 → finds 2 classes I1a and I1α; a ∈ I1a.
First step on I3 → finds 2 classes I1b and I1β; b ∈ I1b.
Reconnect I1a and I1b; reconnect I1α and I1β.

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 23 / 27 ⊲⊲ |

slide-71
SLIDE 71

Extension for cycles

Let a, b be the elements of Ci with the most interferences.
Lemma: there is no solution in which ∃z ∈ H with z ∈ (a → b) ⇒ cut between a and b!

[Figure: the three groups I1, I2, I3 around a and b]

Finding out how to cut; how to connect the parts afterward:
First step on I1 → finds 2 classes I1a and I1α; a ∈ I1a.
First step on I3 → finds 2 classes I1b and I1β; b ∈ I1b.
Reconnect I1a and I1b; reconnect I1α and I1β.
No proof of this yet. . .

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 23 / 27 ⊲⊲ |

slide-72
SLIDE 72

Data collection

Interference measurement between each pair of hosts.

  • Naïve algorithm: N⁴ steps, 30 s per step ⇒ about 50 days for 20 hosts (see the estimate below)
  • Speedups thanks to traceroute or other tomography
  • Independent tests in parallel
  • Validation of information sets
  • Refinement of existing graph?

These deserve more investigation
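A back-of-the-envelope check of the order of magnitude quoted above (values taken from the first bullet):

    # Cost of the naïve measurement campaign: every ordered pair tested against every pair.
    hosts = 20
    seconds_per_step = 30
    steps = hosts ** 4
    days = steps * seconds_per_step / 86400
    print(f"{steps} steps -> {days:.0f} days")   # 160000 steps -> ~56 days (the slide rounds to ~50)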

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 24 / 27 ⊲⊲ |

slide-73
SLIDE 73

Contributions of ALNeM

  • Retrieve the interference-based topology from direct measurements
  • Strong mathematical foundations (optimal for cliques of trees)
  • More generic than ENV (algorithm for cycles)
  • 2 000 lines of C code; one research report
  • Based on GRAS [Quinson03]
  • development on simulator (SimGrid [CLM03]) and immediate deployment
  • target: distributed event-based applications, C language
  • 10 000 lines of C code, Linux, Solaris
  • Submitted to one workshop
[Figure: reconstructed topology of a large platform (~180 numbered nodes)]

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 25 / 27 ⊲⊲ |

slide-75
SLIDE 75

Overview

  • Introduction
  • NWS: Network Weather Service
  • FAST: Fast Agent’s System Timer
  • ALNeM: Application-Level Network Mapper
  • Conclusion

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 25 / 27 ⊲⊲ |

slide-76
SLIDE 76

Conclusion

  • Major issue on the Grid: collecting data (before scheduling)

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 26 / 27 ⊲⊲ |

slide-77
SLIDE 77

Conclusion

  • Major issue on the Grid: collecting data (before scheduling)
  • Gathering quantitative data: NWS + FAST

NWS: System availability

Contributions:
– Lower latency
– Better responsiveness
– Process management
Future work:
– Automatic deployment

FAST: Routine needs

Contributions:
– Generic benchmarking framework
– Unified interface to quantitative data
– Virtual booking
– Integration: DIET, NetSolve, Grid-TLSE
– 2 journals; 3 conferences/workshops
Future work:
– Integration of Freddy
– Irregular routines (sparse algebra)
– New metrics (like I/O)?
– Yet better integration within NWS

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 26 / 27 ⊲⊲ |

slide-80
SLIDE 80

Conclusion

  • Major issue on the Grid: collecting data (before scheduling)
  • Gathering quantitative data: NWS + FAST
  • Gathering qualitative data: ALNeM

ALNeM: Network topology to know about interferences

Contributions:
– Strong mathematical foundations
– Optimal in size for cliques of trees
– Partial cycle handling
– GRAS: application development tool
– Submitted to one workshop
Future work:
– Proof of NP-hardness . . .
– . . . or an exact algorithm
– Experimentation on a real platform
– Optimization of the measurements
– Iterative algorithm (modification detection)
– Integration within NWS
– Hosting of DIET

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 26 / 27 ⊲⊲ |

slide-81
SLIDE 81

Selected publications

Book chapter: 1 national

  • E. Caron, F. Desprez, E. Fleury, F. Lombard, J.-M. Nicod, M. Quinson, and F. Suter. Une approche hiérarchique des serveurs de calculs. In Calcul réparti à grande échelle, Hermès Science, Paris, 2002. ISBN 2-7462-0472-X.

Journals: 2 international (+ 1 submitted), 1 national

  • E. Caron, F. Desprez, M. Quinson, and F. Suter. Performance Evaluation of Linear Algebra Routines for Network Enabled Servers. Parallel Computing, special issue on Clusters and Computational Grids for scientific computing, 2003.
  • F. Desprez, M. Quinson. Dynamic Performance Forecasting for Network-Enabled Servers in a Grid Environment. Submitted to IEEE Transactions on Parallel and Distributed Systems.

Conferences/workshops: 4 international (+ 2 submitted), 2 national

  • Ph. Combes, F. Lombard, M. Quinson, and F. Suter. A Scalable Approach to Network Enabled Servers. Proceedings of the 7th Asian Computing Science Conference, LNCS 2550:110–124, Springer-Verlag, Jan 2002.
  • M. Quinson. Dynamic Performance Forecasting for Network-Enabled Servers in a Metacomputing Environment. International Workshop on Performance Modeling, Evaluation, and Optimization of Parallel and Distributed Systems (PMEO-PDS’02), April 15–19, 2002.
  • A. Legrand, M. Quinson. Automatic deployment of the Network Weather Service using the Effective Network View. Submitted to the Workshop on Grid Benchmarking, associated with IPDPS’04.
  • O. Aumage, A. Legrand, M. Quinson. Reconciling the Grid Reality and Simulation. Submitted to Parallel and Distributed Systems: Testing and Debugging, associated with IPDPS’04.

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 27 / 27 ⊲⊲ |

slide-82
SLIDE 82

Appendix

slide-83
SLIDE 83

Sensor in the middle

[Figure: three hosts A, B, C; NWS tests run on A–B and B–C, and A–C must be deduced (?)]

bw(AC) = min(bw(AB), bw(BC))
lat(AC) = lat(AB) + lat(BC)
Reassembling measurements is a must in hierarchical monitoring.
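A tiny sketch of this composition rule for an arbitrary multi-hop path:

    # Combine per-hop measurements into an end-to-end estimate (rule from the slide).
    def end_to_end(hops):
        """hops: list of (bandwidth, latency) pairs along the path A -> ... -> C."""
        bandwidth = min(bw for bw, _ in hops)
        latency = sum(lat for _, lat in hops)
        return bandwidth, latency

    print(end_to_end([(90e6, 0.002), (10e6, 0.015)]))   # (10 Mb/s, 17 ms)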

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 28 / 27 ⊲⊲ |

slide-84
SLIDE 84

RPC and grid computing: GridRPC

A simple idea: Implement the RPC model over the Grid

  • Remote Procedure Call: run a computation remotely
  • Good and simple paradigm to implement the Grid
  • Some of the functionalities needed:
  • Computation scheduling, data migration
  • Security, fault-tolerance, interoperability, . . .

[Figure: clients (C) and servers (S) connected through an agent]

  • 5 fundamental components: Client, Server, Agent, Monitor, Database

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 29 / 27 ⊲⊲ |

slide-85
SLIDE 85

RPC and grid computing: GridRPC

A simple idea: Implement the RPC model over the Grid

  • Remote Procedure Call: run a computation remotely
  • Good and simple paradigm to implement the Grid
  • Some of the functionalities needed:
  • Computation scheduling, data migration
  • Security, fault-tolerance, interoperability, . . .

[Figure: clients (C) and servers (S) connected through an agent]

  • 5 fundamental components:

Client: several user interfaces which submit the requests to servers
Server
Agent
Monitor
Database

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 29 / 27 ⊲⊲ |

slide-86
SLIDE 86

RPC and grid computing: GridRPC

A simple idea: Implement the RPC model over the Grid

  • Remote Procedure Call: run a computation remotely
  • Good and simple paradigm to implement the Grid
  • Some of the functionalities needed:
  • Computation scheduling, data migration
  • Security, fault-tolerance, interoperability, . . .

[Figure: clients (C) and servers (S) connected through an agent]

  • 5 fundamental components:

Client: several user interfaces which submit the requests to servers
Server: runs software modules to solve clients’ requests
Agent
Monitor
Database

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 29 / 27 ⊲⊲ |

slide-87
SLIDE 87

RPC and grid computing: GridRPC

A simple idea: Implement the RPC model over the Grid

  • Remote Procedure Call: run a computation remotely
  • Good and simple paradigm to implement the Grid
  • Some of the functionalities needed:
  • Computation scheduling, data migration
  • Security, fault-tolerance, interoperability, . . .

[Figure: clients (C) and servers (S) connected through an agent]

  • 5 fundamental components:

Client: several user interfaces which submit the requests to servers
Server: runs software modules to solve clients’ requests
Agent: gets the clients’ requests and schedules them onto the servers
Monitor
Database

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 29 / 27 ⊲⊲ |

slide-88
SLIDE 88

RPC and grid computing: GridRPC

A simple idea: Implement the RPC model over the Grid

  • Remote Procedure Call: run a computation remotely
  • Good and simple paradigm to implement the Grid
  • Some of the functionalities needed:
  • Computation scheduling, data migration
  • Security, fault-tolerance, interoperability, . . .

[Figure: clients (C) and servers (S) connected through an agent]

  • 5 fundamental components:

Client: several user interfaces which submit the requests to servers
Server: runs software modules to solve clients’ requests
Agent: gets the clients’ requests and schedules them onto the servers
Monitor: monitors the current state of the resources
Database

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 29 / 27 ⊲⊲ |

slide-89
SLIDE 89

RPC and grid computing: GridRPC

A simple idea: Implement the RPC model over the Grid

  • Remote Procedure Call: run a computation remotely
  • Good and simple paradigm to implement the Grid
  • Some of the functionalities needed:
  • Computation scheduling, data migration
  • Security, fault-tolerance, interoperability, . . .

[Figure: clients (C) and servers (S) connected through an agent, backed by a database (DB)]

  • 5 fundamental components:

Client: several user interfaces which submit the requests to servers
Server: runs software modules to solve clients’ requests
Agent: gets the clients’ requests and schedules them onto the servers
Monitor: monitors the current state of the resources
Database: contains static and dynamic knowledge about resources

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 29 / 27 ⊲⊲ |

slide-90
SLIDE 90

RPC and grid computing: GridRPC

A simple idea: Implement the RPC model over the Grid

  • Remote Procedure Call: run a computation remotely
  • Good and simple paradigm to implement the Grid
  • Some of the functionalities needed:
  • Computation scheduling, data migration
  • Security, fault-tolerance, interoperability, . . .

[Figure: clients (C) and servers (S) connected through an agent, backed by a database (DB)]

  • 5 fundamental components:

Client: several user interfaces which submit the requests to servers
Server: runs software modules to solve clients’ requests
Agent: gets the clients’ requests and schedules them onto the servers
Monitor: monitors the current state of the resources
Database: contains static and dynamic knowledge about resources

Knowing the platform is crucial for the agent

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 29 / 27 ⊲⊲ |

slide-91
SLIDE 91

Freddy

Time_pdgemm(M, N, K) = (K/R) × time_dgemm + (M × K) τ_p^q + (K × N) τ_q^p + (λ_p^q + λ_q^p) × (K/R)
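To show how the reconstructed model above would be evaluated, a small sketch follows; every numeric constant is a placeholder, not a measured value.

    # Evaluate the pdgemm time model reconstructed above; all constants are placeholders.
    def pdgemm_time(M, N, K, R, time_dgemm, tau_pq, tau_qp, lam_pq, lam_qp):
        compute = (K / R) * time_dgemm
        volume  = (M * K) * tau_pq + (K * N) * tau_qp
        startup = (lam_pq + lam_qp) * (K / R)
        return compute + volume + startup

    print(pdgemm_time(M=1024, N=1024, K=1024, R=64,
                      time_dgemm=0.4, tau_pq=2e-8, tau_qp=2e-8,
                      lam_pq=1e-4, lam_qp=1e-4))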

[Figure: matrices A and B distributed on grids Ga and Gb, with possible virtual grids Gv1 and Gv2; bar chart comparing measured and forecasted multiplication and redistribution times on Ga, Gb, Gv1, Gv2]

  • F. Suter. Parallélisme mixte et prédiction de performances sur réseaux hétérogènes de machines parallèles. PhD thesis, 2002.
  • E. Caron, F. Desprez, M. Quinson, and F. Suter. Performance Evaluation of Linear Algebra Routines for Network Enabled Servers. Parallel Computing, special issue on Clusters and Computational Grids for scientific computing (CCGSC’02), 2003.

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 30 / 27 ⊲⊲ |

slide-92
SLIDE 92

GRAS overview

  • development on simulator (SimGrid [CLM03]) and immediate deployment
  • target: distributed event-based applications, C language
  • 10 000 lines of code, Linux, Solaris
  • Future: message expressivity, even higher performance, interoperability

[Diagram: GRAS as a portability layer — syscall virtualization over reality (TCP on Linux/Solaris) or simulation (SimGrid); grounding features and built-in modules: communications, data representation, messages and callbacks, logs and logs control, locks, data structures, configuration, file and host management, error handling, conditional execution; example uses: bandwidth test, leader election; the simulation side virtualizes expensive code and simulates execution spans]

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 31 / 27 ⊲⊲ |

slide-93
SLIDE 93

Hypothesis on the routing

Hypothesis 1: Routing consistent

  • 1-to-N: no merge after branch
  • N-to-1: no split after join

[Figure: routes from A and B towards C merging without splitting again]

Hypothesis 2: Routing symmetric

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 32 / 27 ⊲⊲ |

slide-94
SLIDE 94

Algorithm for cliques of trees

  • 1. Initialization: i ← 0; Ci ← H; Ei ← ∅; Vi ← ∅
  • 2. Classes lookup: h1, . . . , hp are the classes of ⊥ over Ci; for each j, choose a representative lj ∈ hj; Ci+1 ← {l1, . . . , lp}
  • 3. Graph update: Vi+1 ← Vi; Ei+1 ← Ei; ∀hj ∈ Ci, ∀v ∈ hj, do Ei+1 ← Ei+1 ∪ {(v, lj)} and Vi+1 ← Vi+1 ∪ {v}
  • 4. Interference matrix update: let lα, lβ, lγ, lδ ∈ Ci+1 represent respectively hα, hβ, hγ, hδ; for all mα, mβ, mγ, mδ ∈ Ci with mα ∈ hα, mβ ∈ hβ, mγ ∈ hγ, mδ ∈ hδ: I(Ci+1, ⋈)(lα, lβ, lγ, lδ) = I(Ci, ⋈)(mα, mβ, mγ, mδ)
  • 5. Iterate 2–3 until Ci = Ci+1.
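A compact sketch of this iteration (illustrative only: the ⊥ test is derived directly from an interference matrix indexed by host names, endpoints are skipped, and the assumptions of the theorems are not checked):

    # Sketch of the greedy reconstruction: merge hosts that are totally interfering (⊥),
    # attach each merged host to its class representative, and iterate until stable.
    def totally_interfere(I, hosts, a, b):
        """a ⊥ b  <=>  every flow from a interferes with every flow from b (endpoints skipped)."""
        return all(I.get((a, u, b, v), 0) == 1
                   for u in hosts for v in hosts
                   if u not in (a, b) and v not in (a, b))

    def classes(I, C):
        """Partition C into ⊥-classes (⊥ assumed to be an equivalence relation here)."""
        result = []
        for h in C:
            for cls in result:
                if totally_interfere(I, C, h, cls[0]):
                    cls.append(h)
                    break
            else:
                result.append([h])
        return result

    def build_tree(I, H):
        edges, Ci = set(), list(H)
        while True:
            parts = classes(I, Ci)
            representatives = [p[0] for p in parts]
            for p in parts:
                for v in p[1:]:
                    edges.add((v, p[0]))    # attach each member to its representative
            # Since representatives are hosts, the matrix restricted to them is just I
            # re-indexed (step 4 of the slide is implicit here).
            if representatives == Ci:
                return representatives, edges
            Ci = representatives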

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 33 / 27 ⊲⊲ |

slide-95
SLIDE 95

ALNeM: example of execution

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 34 / 27 ⊲⊲ |

slide-111
SLIDE 111

DIET: Distributed Interactive Engineering Toolbox

Goal : Metacomputing platform (GridRPC model)

  • Complete and ready to use for users
  • Extensible by researchers

Main functionalities :

  • Distributed and hierarchical scheduling;
  • Resources localization ;
  • Data persistence ;
  • Platform monitoring ;

Teams: GRAAL (ENS-Lyon), U. Besançon, INSA-Lyon, LORIA (Nancy), Sun.

Targeted applications: Grid-ASP
  • Digital elevation models (geology – LST, ENS-Lyon)
  • Molecular dynamics (physics – Lyon-I et al.)
  • HSEP (chemistry – SRSMC, Nancy)
  • Circuit simulation (electronics – Ircom)
  • ACI TLSE (sparse matrix expertise – Toulouse)

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 35 / 27 ⊲⊲ |

slide-112
SLIDE 112

DIET: Handling of a request

  • 1. Clients connect to the MA
  • 2. Request transmission to the servers
  • 3. Performance evaluation: FAST (NWS)
  • 4. Back to the MA: distributed scheduling
  • 5. (Broadcast if impossible in the local tree)
  • 6. Result sent back to the client
  • 7. Direct client–server connection

[Diagram: hierarchy of a Master Agent (MA), Local Agents (LA), and servers (S)]
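A sketch of steps 2–4 (illustrative data structures, not the DIET interfaces): each agent asks its children for their best estimate, and the minimum flows back up to the MA.

    # Illustrative bottom-up aggregation of performance estimates in an MA/LA/server tree.
    def best_server(node, request):
        """Return (estimated_time, server_name) for the best server under `node`."""
        if node["kind"] == "server":
            return (node["estimate"](request), node["name"])   # step 3: FAST/NWS evaluation
        return min(best_server(child, request) for child in node["children"])  # step 4

    tree = {"kind": "agent", "children": [
        {"kind": "server", "name": "S1", "estimate": lambda r: 54.0},
        {"kind": "agent", "children": [
            {"kind": "server", "name": "S3", "estimate": lambda r: 120.0},
            {"kind": "server", "name": "S5", "estimate": lambda r: 5.0},
        ]},
    ]}
    print(best_server(tree, request="dgemm"))   # (5.0, 'S5')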

Martin QUINSON December 11th 2003 | ⊳⊳ Slide 36 / 27 ⊲⊲ |
