The Flexlab Approach To Realistic Evaluation of Networked Systems

The Flexlab Approach To Realistic Evaluation of Networked Systems Robert Ricci, Jonathon Duerig, Pramod Sanaga, Daniel Gebhardt, Mike Hibler, Kevin Atkinson, Junxing Zhang, Sneha Kasera, and Jay Lepreau NSDI 2007 April 12, Cambridge, MA


slide-1
SLIDE 1

The Flexlab Approach To Realistic Evaluation of Networked Systems

Robert Ricci, Jonathon Duerig, Pramod Sanaga, Daniel Gebhardt, Mike Hibler, Kevin Atkinson, Junxing Zhang, Sneha Kasera, and Jay Lepreau

NSDI 2007 April 12, Cambridge, MA

slide-2
SLIDE 2

Emulators

[Diagram labels: Application, Emulator Host, Application Traffic, Path Emulator]

Examples: ModelNet and Emulab
The Good: Control, repeatability, wide variety of network conditions
The Bad: Artificial network conditions

2

slide-3
SLIDE 3

Overlay Testbeds

[Diagram labels: Internet, Application Traffic, Application, Overlay Host]

Examples: RON and PlanetLab
The Good: Real network conditions, deployment platform
The Bad: Overloaded, few privileged operations, poor repeatability, hard to develop/debug on

3

slide-4
SLIDE 4

Evaluating Networked Systems: Flexlab

slide-5
SLIDE 5

Goal: Real Internet within Emulator

[Diagram labels: Application, Emulator Host, Application Traffic, Internet Model, Internet]

5

slide-6
SLIDE 6

The Flexlab Approach

Measure

6

slide-7
SLIDE 7

The Flexlab Approach

Measure Model

7

slide-8
SLIDE 8

The Flexlab Approach

Measure Model Emulate

8

slide-10
SLIDE 10

The Flexlab Approach

Measure Model 2 Emulate

10

slide-11
SLIDE 11

Key Points

Software framework for pluggable network models
Application behavior can drive measurements & model in real-time
Application-Centric Internet Modeling
  High fidelity measurement/emulation technique
  Includes new techniques for ABW measurement

11

slide-12
SLIDE 12

More in the Paper

Flexible network measurement system
Network stationarity results
Two straightforward network models
Shared bottleneck analysis
PlanetLab scheduling delay measurements

12

slide-13
SLIDE 13

Flexlab Architecture

slide-14
SLIDE 14

Flexlab: Application

[Diagram labels: Application, Emulab Host, Application Traffic]

14

slide-15
SLIDE 15

Flexlab: Application Monitor

[Diagram labels: Application, Emulab Host, Application Traffic, App Monitor]

15

slide-16
SLIDE 16

Flexlab: Network Model

[Diagram labels: Application, Emulab Host, Application Traffic, App Monitor, Offered Load, Model, Network Model]

16

slide-17
SLIDE 17

Flexlab: Measurement Repo.

[Diagram labels: Application, Emulab Host, Application Traffic, App Monitor, Offered Load, Model, Network Model, Measurement Repository]

17

slide-18
SLIDE 18

Flexlab: Path Emulator

[Diagram labels: Application, Emulab Host, Application Traffic, App Monitor, Offered Load, Model, Network Model, Measurement Repository, Network Characteristics, Path Emulator]
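
One way to picture the pluggable network model in this architecture is as a small plug-in contract: the model receives offered-load reports from the application monitors, may consult a measurement repository, and produces network characteristics for the path emulator. The sketch below is illustrative only. The names `PathParams`, `NetworkModel`, and `StaticModel`, the site names, and the parameter units are all hypothetical and are not taken from Flexlab's code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Tuple

Path = Tuple[str, str]  # (source site, destination site)


@dataclass(frozen=True)
class PathParams:
    """Characteristics handed to the path emulator (hypothetical shape)."""
    bandwidth_kbps: float
    delay_ms: float
    loss_rate: float


class NetworkModel(ABC):
    """Hypothetical plug-in interface: offered load in, path parameters out."""

    @abstractmethod
    def on_offered_load(self, path: Path, bytes_sent: int) -> None:
        """Called when an application monitor reports traffic on a path."""

    @abstractmethod
    def params_for(self, path: Path) -> PathParams:
        """Return the characteristics the path emulator should apply."""


class StaticModel(NetworkModel):
    """Ignores offered load and always returns a stored measurement."""

    def __init__(self, repository: Dict[Path, PathParams]) -> None:
        self._repository = dict(repository)

    def on_offered_load(self, path: Path, bytes_sent: int) -> None:
        pass  # a static model does not react to the application

    def params_for(self, path: Path) -> PathParams:
        return self._repository[path]


# Made-up repository contents, for illustration only.
repo = {("site-a", "site-b"): PathParams(4000.0, 35.0, 0.0)}
model = StaticModel(repo)
model.on_offered_load(("site-a", "site-b"), 1500)
print(model.params_for(("site-a", "site-b")))
```

A model that reacts to the application would override `on_offered_load` to trigger or adjust measurements; the static variant shows only the minimum a plug-in has to provide.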

18

slide-19
SLIDE 19

ACIM: Application-Centric Internet Modeling

slide-20
SLIDE 20

Imagine Ideal Fidelity

[Diagram labels: Application, Emulab Host, Application Traffic, PlanetLab Host, Internet]

20

slide-21
SLIDE 21

ACIM Architecture

[Diagram labels: Emulab (Application, App Monitor, Path Emulator), PlanetLab (Agent, PlanetLab Sliver), Internet, Application Traffic, Measurement Traffic]

21

slide-22
SLIDE 22

ACIM Design Challenges

Determining when to drop packets
Finding relationship between throughput and ABW
Extension to UDP
CPU starvation on PlanetLab
  Host artifacts in throughput
  Packet loss in libpcap

22

slide-23
SLIDE 23

ACIM Path Emulator Parameters

[Diagram labels: Packets enter, Queuing delay, All other delay, Available bandwidth, Packets leave]
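
The three parameters named on this slide (available bandwidth, queuing delay, and all other delay) are enough to describe a simple single-bottleneck path. The following is a minimal sketch of such a path, not the Flexlab path emulator itself. It assumes fixed-size packets, a FIFO queue whose size is expressed as a maximum queuing delay, and tail drop when that limit is exceeded; the function name and arguments are hypothetical.

```python
from typing import List, Optional


def emulate_path(
    arrivals: List[float],     # packet arrival times in seconds, ascending
    packet_bits: int,          # size of every packet (assumed fixed)
    abw_bps: float,            # available bandwidth
    max_queue_delay: float,    # queue size expressed in seconds
    other_delay: float,        # all non-queuing delay, in seconds
) -> List[Optional[float]]:
    """Return each packet's departure time, or None if it is dropped."""
    service_time = packet_bits / abw_bps
    link_free_at = 0.0         # when the bottleneck finishes its current backlog
    departures: List[Optional[float]] = []
    for t in arrivals:
        queue_delay = max(0.0, link_free_at - t)
        if queue_delay > max_queue_delay:
            departures.append(None)  # queue is full: drop the packet
            continue
        link_free_at = t + queue_delay + service_time
        departures.append(link_free_at + other_delay)
    return departures


# Four back-to-back 1500-byte packets on a 1 Mbps path with a 20 ms queue.
departures = emulate_path(
    [0.0, 0.0, 0.0, 0.0],
    packet_bits=12_000,
    abw_bps=1_000_000,
    max_queue_delay=0.02,
    other_delay=0.05,
)
print(departures)
```

In this example the first two packets are delivered and the last two are dropped, because by the time they arrive the backlog already exceeds the 20 ms queue limit.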

23

slide-24
SLIDE 24

All Other Delay

[Diagram labels: Packets enter, Queuing delay, All other delay, Available bandwidth, Packets leave]

Base RTT: Smallest RTT seen recently [Vegas 95]
  Packets saw little or no queueing delay

24

slide-25
SLIDE 25

Packet Loss

[Diagram labels: Packets enter, Queuing delay, All other delay, Available bandwidth, Packets leave]

Caused by full queue at bottleneck link
Difficult to measure directly
So measure queue length in time: Max recent RTT - Base RTT
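
The two RTT-derived quantities from this slide and the previous one can be sketched in a few lines. This is a minimal illustration, assuming that "recent" means a fixed-size sliding window of samples; the class name `RttTracker` and the `window` parameter are hypothetical, and the slides do not say how the window is chosen.

```python
from collections import deque
from typing import Deque


class RttTracker:
    """Keeps recent RTT samples and derives the two delay parameters."""

    def __init__(self, window: int = 100) -> None:
        # `window` (how many samples count as "recent") is an assumption.
        self._samples: Deque[float] = deque(maxlen=window)

    def add(self, rtt_ms: float) -> None:
        self._samples.append(rtt_ms)

    def base_rtt(self) -> float:
        """Smallest recent RTT: a sample that saw little or no queuing."""
        return min(self._samples)

    def queue_length_ms(self) -> float:
        """Bottleneck queue length in time: max recent RTT minus base RTT."""
        return max(self._samples) - self.base_rtt()


tracker = RttTracker()
for sample in (40.0, 42.0, 95.0, 41.0):
    tracker.add(sample)
print(tracker.base_rtt(), tracker.queue_length_ms())  # 40.0 55.0
```

The emulator can then be configured with the base RTT as the non-queuing delay and the difference as the queue size, so that packet loss arises from queue overflow rather than from a separately measured loss rate.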

25

slide-26
SLIDE 26

Throughput and ABW

[Plot axes: Time, Bandwidth, Delay]

26

slide-27
SLIDE 27

Throughput and ABW

[Plot axes: Time, Bandwidth, Delay; series: Offered load, Available bandwidth]

27

slide-28
SLIDE 28

Throughput and ABW

[Plot axes: Time, Bandwidth, Delay; series: Offered load, Available bandwidth, Measured throughput]

28

slide-29
SLIDE 29

Throughput and ABW

[Plot axes: Time, Bandwidth, Delay; series: Offered load, Available bandwidth, Measured throughput, RTT]

29

slide-37
SLIDE 37

Throughput and ABW

If (throughput > last ABW measurement), use new value
Else, look for indications that throughput has reached ABW
  Socket buffer is filling up AND
  Recent RTTs have been increasing
    Using linear regression
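
The decision rule on this slide can be written out as a short function. The sketch below is one possible reading, not the ACIM agent's code. It assumes that "RTTs have been increasing" means a positive least-squares slope over the recent samples, and it takes the socket-buffer condition as a boolean input because the slide does not say how it is detected. The names `slope` and `update_abw` are hypothetical.

```python
from typing import Sequence


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of values against their sample index."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def update_abw(
    last_abw: float,
    throughput: float,
    buffer_filling: bool,
    recent_rtts: Sequence[float],
) -> float:
    """Return the ABW estimate after one throughput measurement."""
    if throughput > last_abw:
        return throughput  # the path carried more than the old estimate
    if buffer_filling and slope(recent_rtts) > 0:
        return throughput  # sender appears to be saturating the path
    return last_abw        # no sign of saturation: keep the old estimate


print(update_abw(5.0, 6.0, False, []))                       # 6.0
print(update_abw(5.0, 3.0, True, [40.0, 45.0, 52.0, 60.0]))  # 3.0
print(update_abw(5.0, 3.0, True, [60.0, 52.0, 45.0, 40.0]))  # 5.0
```

The third case shows why both conditions matter: throughput below the last estimate is not treated as a drop in available bandwidth unless the RTT trend also suggests the bottleneck queue is building.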

37

slide-38
SLIDE 38

ACIM Features

Precise: assesses only relevant parts of the network
Scales in nodes and paths
Complete: automatically captures all relevant network behavior
Simpler to measure e2e effects than find causes
Detects rare and transient effects
Evokes all reactive network behaviors (except content-based)
Rapidly tracks conditions

38

slide-39
SLIDE 39

ACIM Accuracy

Is ACIM path emulation accurate?
Is it accurate at fine granularity?

39

slide-40
SLIDE 40

Methodology

iperf runs in Emulab
Measurement Agent runs on PlanetLab at UT Austin and AT&T Research
We added transient TCP cross traffic between these sites

40

slide-41
SLIDE 41

TCP iperf Throughput

[Plot: Throughput (Mbps) vs. Time (seconds); panels: Flexlab with ACIM, Measurement Agent]

41

slide-43
SLIDE 43

A Real Application

Does ACIM give accurate results for a real, complicated application?

43

slide-44
SLIDE 44

A Real Application

Does ACIM give accurate results for a real, complicated application?
... does PlanetLab?

44

slide-45
SLIDE 45

A Real Application

Does ACIM give accurate results for a real, complicated application?
... does PlanetLab?
Can we discover ground truth?

45

slide-46
SLIDE 46

Methodology: BitTorrent

Two simultaneous instances of reference BitTorrent:
  One on PlanetLab
  One in Flexlab
Eight nodes in US and Europe: One seed, seven clients
We reduced randomness in BT
  ... but some still remains

46

slide-47
SLIDE 47

BitTorrent w/ CPU Reservation

47

slide-48
SLIDE 48

BitTorrent w/ CPU Reservation

[Plot: Throughput (Mbps) vs. Time (seconds)]

PlanetLab: 5.2 Mbps average
Flexlab: 5.4 Mbps average

48

slide-49
SLIDE 49

BitTorrent w/o CPU Reservation

[Plot: Throughput (Mbps) vs. Time (seconds)]

PlanetLab: 2.3 Mbps average
Flexlab: 5.8 Mbps average

49

slide-50
SLIDE 50

BitTorrent Bottom Line

Conclusion: For this experiment, both Flexlab and PlanetLab with CPU reservations give accurate results
  PlanetLab alone does not
  CPU availability on PlanetLab hurts BitTorrent
ACIM reduces host resource needs on PlanetLab for this experiment
  BitTorrent: 36-76% CPU
  ACIM Agent: 2.6% CPU
  Factor of 15-30 CPU
  Factor of 4 memory

50

slide-51
SLIDE 51

The Future?

No need to perfect in PlanetLab:
  Full resource isolation
  Total control over hosts
  Orthogonal control network
... use in the emulators that already have them
Use PlanetLab nodes as NICs
Conserve resources for deployed services with end users

51

slide-52
SLIDE 52

Conclusion

New approach to evaluating networked systems
Separates the network model
  Designed to leverage vibrant measurement and modeling community
Couples an emulator to an overlay testbed
ACIM high fidelity emulation technique
Contact testbed-ops@emulab.net to use

52

slide-53
SLIDE 53

Backup Slides

slide-54
SLIDE 54

Why not just add more nodes to every PlanetLab site?

Remaining problems:
  Poor repeatability
  Hard to develop/debug
  No privileged operations
  Some malicious traffic cannot be tested
Some Flexlab network models reduce network load
Emulab node pool is statistically multiplexed and shared more efficiently than per-site pools
Overload can (will?) still happen with PL's pure shared-host model
Major practical barriers: admin, cost

54

slide-55
SLIDE 55

Flexlab and VINI

Entirely different kinds of realism and control
Flexlab: passes "experiment" traffic over shared path
  Real Internet conditions from other traffic on same path, but app. traffic is not from real users
  Control: of all software
  Environment: friendly local dev. environ, dedicated hosts
VINI: can pass "real traffic" over dedicated link
  Real routing, real neighbor ISPs, potentially traffic from real users, but network resources are not realistic/representative
  Dedicated pipes with dedicated bandwidth, that insulate experiment from normal Internet conditions
  Control: restricted to VINI's APIs (Click, XORP, etc.)
  Environment: distributed environ; shared host resources

55

slide-56
SLIDE 56

Change Point Analysis

Asia to Asia Asia to Commercial Asia to Europe Asia to I2 Commercial to Commercial Commercial to Europe Commercial to I2 I2 to I2 I2 to Europe Europe to Europe 2 2 4 6 20 4 13 4 9 1 2 1 1 0.13% 2.9% 0.5% 0.59% 3.4% 0.02%

  • Path

High Low Change 39% 15% 12%

56

slide-57
SLIDE 57

Simple Static Model

[Diagram labels: Flexmon, All-Sites PlanetLab Measurements, Datapository, Static Network Model, Network Characteristics to Path Emulator]

57

slide-58
SLIDE 58

Simple Dynamic Model

[Diagram labels: Flexmon, All-Sites PlanetLab Measurements, Datapository, Dynamic Network Model, Application Network Model from Monitor, Network Characteristics to Path Emulator]

58

slide-59
SLIDE 59

Flexmon Architecture

[Diagram labels: Manager Client, Manager, Auto-Manager Client, Path Prober, Data Collector, Path Emulators, Datapository; regions: Flexlab, PlanetLab, Emulab]

Shared
Reliable
Safe
Adaptive
Controllable
Accommodates high-performance data retrieval

59

slide-60
SLIDE 60

CPU Starvation on PlanetLab

Host Artifacts
  Long period when agent can't read or write
  Empty socket buffer or full receive window
  Solution: Detect and ignore
Packet loss from libpcap
  Long period without reading libpcap buffer
  Many packets are dropped at once
  Solution: Detect and ignore

60

slide-61
SLIDE 61

Reverse Path Congestion

Can cause ack compression
Throughput Measurement
  Throughput numbers become much noisier
  We abuse the TCP timestamp option
  PlanetLab: homogeneous OS environment
  Extending it would require hacking client
RTT Measurement
  Future work

61

slide-62
SLIDE 62

Initial Conditions

Needed to bootstrap ACIM
  ACIM uses traffic to generate conditions
  But conditions must exist for first traffic
We created a measurement framework
  All pairs of sites are measured
  Put data into measurement repository
  Set initial conditions to latest measurements
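
The bootstrap step described here amounts to selecting the newest stored measurement for each site pair. The sketch below shows that selection over an in-memory list. The `Measurement` record layout, the site names, and the sample values are hypothetical; they are not the schema of the actual measurement repository.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Measurement(NamedTuple):
    src: str
    dst: str
    timestamp: float       # seconds since the epoch
    bandwidth_kbps: float
    rtt_ms: float


def initial_conditions(
    repository: Iterable[Measurement],
) -> Dict[Tuple[str, str], Measurement]:
    """Pick the most recent stored measurement for every (src, dst) pair."""
    latest: Dict[Tuple[str, str], Measurement] = {}
    for m in repository:
        key = (m.src, m.dst)
        if key not in latest or m.timestamp > latest[key].timestamp:
            latest[key] = m
    return latest


# Made-up repository contents, for illustration only.
history = [
    Measurement("site-a", "site-b", 100.0, 3000.0, 40.0),
    Measurement("site-a", "site-b", 200.0, 2500.0, 44.0),
    Measurement("site-b", "site-a", 150.0, 1800.0, 43.0),
]
start = initial_conditions(history)
print(start[("site-a", "site-b")].bandwidth_kbps)  # 2500.0
```

Once the application starts sending, its own traffic supplies fresher conditions and these initial values are superseded.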

62

slide-63
SLIDE 63

Simultaneous TCP iperf

[Plot: Throughput (Mbps) vs. Time (seconds); panels: Flexlab with ACIM, PlanetLab]

63

slide-64
SLIDE 64

Repeatability vs. Fidelity

[Diagram axes: Higher Network Fidelity, More Repeatable; labels: Emulab, Static, Dynamic, PlanetLab, ACIM, General Internet Model]

64

slide-65
SLIDE 65

Throughput and ABW

[Diagram labels: Time, Agent write()s, RTT, Packets On The Wire (Data, ACK), Avail-BW?, Throughput, Application Offered Load]

65

slide-66
SLIDE 66

Currently available for Beta Testing http://www.flux.utah.edu/flexlab

slide-67
SLIDE 67

UDP Streaming Video

[Plot: Throughput (Mbps) vs. Time (seconds)]

67

slide-68
SLIDE 68

Opens Up New Questions

Further validation
Accuracy tests at runtime
  Similar in spirit to Emulab's linktest
Use to compare models
  Find which models are most appropriate for different classes of applications
Replay for ACIM
Study fidelity of different software combinations
  Different TCP implementation or OS in Emulab

68