

SLIDE 1

Introduction Prediction Method Load Shedding Evaluation Conclusions

Load Shedding in Network Monitoring Applications

P. Barlet-Ros1

  • G. Iannaccone2
  • J. Sanjuàs-Cuxart1
  • D. Amores-López1
  • J. Solé-Pareta1

1Technical University of Catalonia (UPC)

Barcelona, Spain {pbarlet, jsanjuas, damores, pareta}@ac.upc.edu

2Intel Research

Berkeley, CA gianluca.iannaccone@intel.com

USENIX Annual Technical Conference, 2007

1 / 18

SLIDE 2

Outline

1. Introduction: Motivation; Case Study: Intel CoMo
2. Prediction Method: Work Hypothesis; Multiple Linear Regression
3. Load Shedding: When, Where and How Much
4. Evaluation and Operational Results: Performance Results; Accuracy Results
5. Conclusions and Future Work

SLIDE 5

Motivation

Building robust network monitoring applications is hard:

  • Unpredictable nature of network traffic: anomalous traffic, extreme data mixes, highly variable data rates
  • Processing requirements have greatly increased in recent years (e.g., intrusion and anomaly detection)

The problem: efficiently handling extreme overload situations, where over-provisioning is not possible.

SLIDE 7

Case Study: Intel CoMo

CoMo (Continuous Monitoring)1

  • Open-source passive monitoring system
  • Fast implementation and deployment of monitoring applications

Traffic queries are defined as plug-in modules written in C:

  • Contain complex computations
  • Stateless filter and measurement interval

Traffic queries are black boxes:

  • Arbitrary computations and data structures
  • Load shedding cannot use knowledge about the queries

1 http://como.sourceforge.net

SLIDE 9

Load Shedding Approach

Working scenario:

  • Monitoring system supporting multiple arbitrary queries
  • Single resource: CPU cycles
  • Approach: real-time modeling of the queries' CPU usage

1. Find the correlation between traffic features and CPU usage (the features are query agnostic, with deterministic worst-case cost)
2. Exploit the correlation to predict CPU load
3. Use the prediction to guide the load shedding procedure

Novelty: no a priori knowledge of the queries is needed, which preserves a high degree of flexibility and increases the possible applications and network scenarios.

SLIDE 10

System Overview

Figure: Prediction and Load Shedding Subsystem

SLIDE 13

Work Hypothesis

Our thesis: the cost of maintaining the data structures needed to execute a query can be modeled by looking at a set of traffic features.

Empirical observation: basic operations on the query state (e.g., creating or updating entries, looking for a valid match) incur different overheads while processing incoming traffic, and the cost of a query is mostly dominated by the overhead of some of these operations.

Our method models a query's cost by considering the right set of traffic features.

SLIDE 14

Traffic Features vs CPU Usage

Figure: CPU usage compared to the number of packets, bytes and 5-tuple flows over time

SLIDE 15

Traffic Features vs CPU Usage

Figure: CPU usage (cycles) versus packets per batch, grouped by the number of new 5-tuple flows (< 500, 500-700, 700-1000, ≥ 1000)

SLIDE 16

Multiple Linear Regression (MLR)

Linear regression model:

    Yi = β0 + β1 X1i + β2 X2i + · · · + βp Xpi + εi,   i = 1, 2, . . . , n

  • Yi: n observations of the response variable (measured CPU cycles)
  • Xji: n observations of the p predictors (traffic features)
  • βj: p regression coefficients (unknown parameters to estimate)
  • εi: n residuals (OLS minimizes the sum of squared errors, SSE)

Feature selection:

  • Variant of the Fast Correlation-Based Filter2 (FCBF)
  • Removes irrelevant and redundant predictors
  • Significantly reduces the cost of the MLR

2 L. Yu and H. Liu. Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution. In Proc. of ICML, 2003.
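As a rough sketch, the MLR above maps directly onto an ordinary least squares solve. The feature values below are synthetic numbers chosen for illustration, not measurements from the talk:

```python
import numpy as np

# Synthetic per-batch measurements: each row is one batch; columns are
# traffic features (packets, bytes, new 5-tuple flows). y holds the
# measured CPU cycles for each batch.
X = np.array([
    [2000.0, 1.0e6, 500.0],
    [2500.0, 1.2e6, 700.0],
    [3000.0, 1.5e6, 900.0],
    [1800.0, 0.9e6, 400.0],
])
y = np.array([1_901_000.0, 2_411_000.0, 2_971_000.0, 1_671_000.0])

# Prepend an intercept column and solve the OLS problem
# min ||y - A beta||^2, which is what the slide's MLR estimates.
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_cycles(features):
    """Predicted CPU cycles for a new batch's feature vector."""
    return float(beta[0] + np.dot(beta[1:], features))
```

In the real system the regression is re-estimated online from a history of (features, measured cycles) pairs rather than from a fixed matrix.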

SLIDE 17

System Overview

Prediction and load shedding subsystem:

1. Each 100 ms of traffic is grouped into a batch of packets
2. The traffic features are efficiently extracted from the batch (multi-resolution bitmaps)
3. The most relevant features are selected (using FCBF) to be used by the MLR
4. The MLR predicts the CPU cycles required by the query to run
5. Load shedding is performed to discard a portion of the batch
6. CPU usage is measured (using the TSC) and fed back to the prediction system
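Step 2 can be illustrated with exact counters. The real system approximates the distinct-flow count with multi-resolution bitmaps, and the `Packet` record below is a hypothetical stand-in for the batch layout, not CoMo's actual data structures:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    # Hypothetical minimal packet record for one 100 ms batch.
    src: str
    dst: str
    proto: int
    sport: int
    dport: int
    length: int

def extract_features(batch):
    """Per-batch traffic features: packet count, byte count and the
    number of distinct 5-tuple flows. The flow count is exact here;
    the system described above approximates it with multi-resolution
    bitmaps to keep the worst-case cost deterministic."""
    flows = set()
    packets = total_bytes = 0
    for p in batch:
        packets += 1
        total_bytes += p.length
        flows.add((p.src, p.dst, p.proto, p.sport, p.dport))
    return {"packets": packets, "bytes": total_bytes,
            "5tuple_flows": len(flows)}
```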

SLIDE 19

Load Shedding

When to shed load: when the prediction exceeds the available cycles

    avail_cycles = (0.1 × CPU frequency) − overhead

  • Corrected according to prediction error and buffer space
  • Overhead is measured using the time-stamp counter (TSC)

How and where to shed load: packet and flow sampling (hash based); the same sampling rate is applied to all queries.

How much load to shed: the maximum sampling rate that keeps CPU usage below avail_cycles

    srate = avail_cycles / pred_cycles
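A minimal sketch of the two computations above. `keep_flow` uses CRC32 as an illustrative hash; the actual hash function used by the system is an assumption here, not something stated on the slide:

```python
import zlib

def sampling_rate(avail_cycles, pred_cycles):
    """Maximum sampling rate that keeps the predicted CPU usage within
    the available budget (srate = avail_cycles / pred_cycles)."""
    if pred_cycles <= avail_cycles:
        return 1.0  # prediction fits in the budget: no shedding needed
    return avail_cycles / pred_cycles

def keep_flow(five_tuple, srate):
    """Hash-based flow sampling: a flow is kept iff its normalized hash
    falls below srate, so every packet of a flow shares the same fate.
    CRC32 is an illustrative choice of hash."""
    h = zlib.crc32(repr(five_tuple).encode()) / 0xFFFFFFFF
    return h < srate
```

Because the decision depends only on the flow's 5-tuple, the sampled traffic preserves whole flows, which is what lets flow-based queries scale their results back up afterwards.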

SLIDE 23

Load Shedding Performance

Figure: Stacked CPU usage under predictive load shedding, 9 am to 5 pm (CoMo cycles, load shedding cycles, query cycles and predicted cycles, against the CPU frequency)

SLIDE 24

Load Shedding Performance

Figure: CDF of the CPU usage per batch (predictive vs. original vs. reactive)

SLIDE 25

Packet Loss

Figure: Link load and packet drops, 9 am to 5 pm: (a) original CoMo (total packets vs. DAG drops); (b) reactive load shedding (total, DAG drops, unsampled); (c) predictive load shedding (total, DAG drops, unsampled)

SLIDE 26

Accuracy Results

Queries estimate their unsampled output by multiplying their results by the inverse of the sampling rate. Errors in the query results (mean ± stdev):

Query                | original       | reactive      | predictive
application (pkts)   | 55.38% ±11.80  | 10.61% ±7.78  | 1.03% ±0.65
application (bytes)  | 55.39% ±11.80  | 11.90% ±8.22  | 1.17% ±0.76
flows                | 38.48% ±902.13 | 12.46% ±7.28  | 2.88% ±3.34
high-watermark       | 8.68% ±8.13    | 8.94% ±9.46   | 2.19% ±2.30
link-count (pkts)    | 55.03% ±11.45  | 9.71% ±8.41   | 0.54% ±0.50
link-count (bytes)   | 55.06% ±11.45  | 10.24% ±8.39  | 0.66% ±0.60
top destinations     | 21.63 ±31.94   | 41.86 ±44.64  | 1.41 ±3.32
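The inverse-sampling-rate correction mentioned above amounts to a one-line scaling. This is a sketch of the idea, not CoMo code:

```python
def estimate_unsampled(sampled_count, srate):
    """Scale a counter measured on sampled traffic back to an estimate
    of the unsampled value by the inverse of the sampling rate."""
    if not 0.0 < srate <= 1.0:
        raise ValueError("sampling rate must be in (0, 1]")
    return sampled_count / srate
```

For example, a packet counter that saw 1,200 packets at srate = 0.25 estimates 4,800 packets on the full stream. The table shows this works well for additive counters, while queries whose output is not a simple count (e.g., high-watermark, top destinations) are more sensitive to sampling.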

SLIDE 28

Conclusions and Future Work

  • Effective load shedding methods are now a basic requirement: data rates, numbers of users and the complexity of analysis methods are all rapidly increasing
  • Our load shedding operates without knowledge of the traffic queries
  • It quickly adapts to overload situations by gracefully degrading accuracy via packet and flow sampling
  • Operational results in a research ISP network show that the system is robust to severe overload and that the impact on the accuracy of the results is minimized

Limitations and future work:

  • Load shedding methods for queries that are not robust against sampling
  • Load shedding strategies that maximize the overall system utility
  • Other system resources (memory, disk bandwidth, storage space)

SLIDE 29

Availability

  • The source code of our load shedding system is publicly available at http://loadshedding.ccaba.upc.edu
  • The CoMo monitoring system is available at http://como.sourceforge.net

Acknowledgments: this work was funded by a University Research Grant awarded by the Intel Research Council and by the Spanish Ministry of Education under contract TEC2005-08051-C03-01. The authors would also like to thank the Supercomputing Center of Catalonia (CESCA) for giving them access to the Catalan RREN.

SLIDE 30

Appendix Backup Slides

Outline

6. Backup Slides: Load Shedding Algorithm; Testbed Scenario; Related Work

SLIDE 31

Load Shedding Algorithm

Load shedding algorithm (simplified version):

    pred_cycles = 0
    foreach qi in Q do
        fi = feature_extraction(bi)
        si = feature_selection(fi, hi)
        pred_cycles += mlr(fi, si, hi)
    if avail_cycles < pred_cycles × (1 + error) then
        foreach qi in Q do
            bi = sampling(bi, qi, srate)
            fi = feature_extraction(bi)
    foreach qi in Q do
        query_cyclesi = run_query(bi, qi, srate)
        hi = update_mlr_history(hi, fi, query_cyclesi)
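A runnable sketch of this control loop, with stub callables standing in for the feature extraction, selection and MLR stages. All names, the per-query dict layout and the deterministic prefix sampling are simplifying assumptions for illustration, not the CoMo implementation:

```python
def shed_and_run(queries, batch, avail_cycles, error=0.1):
    """Simplified predictive load shedding loop: predict each query's
    cost from the batch's features, shrink the batch if the total
    prediction exceeds the cycle budget, then run the queries and feed
    the measured cycles back into each query's history.

    Each query is a dict of callables: "features" (feature extraction,
    with selection folded in), "predict" (the MLR) and "run". The
    sampling here keeps a deterministic prefix of the batch; the real
    system uses hash-based packet and flow sampling instead."""
    preds = [q["predict"](q["features"](batch), q["history"])
             for q in queries]
    pred_cycles = sum(preds)

    srate = 1.0
    if avail_cycles < pred_cycles * (1 + error):
        # Maximum rate that keeps predicted usage within the budget.
        srate = avail_cycles / pred_cycles
        batch = [p for i, p in enumerate(batch)
                 if i / len(batch) < srate]

    for q in queries:
        used = q["run"](batch, srate)
        # Feed back (features, measured cycles) to refit the MLR later.
        q["history"].append((q["features"](batch), used))
    return srate, batch
```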

SLIDE 32

Testbed Scenario

Equipment and network scenario:

  • 2 × Intel Pentium 4 running at 3 GHz
  • 2 × Endace DAG 4.3GE cards
  • 1 Gbps link connecting the Catalan RREN to the Spanish NREN

Executions:

Execution  | Date      | Time    | Link load (Mbps) mean/max/min
predictive | 24/Oct/06 | 9am-5pm | 750.4/973.6/129.0
original   | 25/Oct/06 | 9am-5pm | 719.9/967.5/218.0
reactive   | 05/Dec/06 | 9am-5pm | 403.3/771.6/131.0

Queries (from the standard distribution of CoMo):

Name             | Description
application      | Port-based application classification
counter          | Traffic load in packets and bytes
flows            | Per-flow counters
high-watermark   | High watermark of link utilization
pattern search   | Finds sequences of bytes in the payload
top destinations | List of the top-10 destination IPs
trace            | Full-payload collection

SLIDE 34

Related Work

Network monitoring systems:

  • Only consider a pre-defined set of metrics
  • Filtering, aggregation, sampling, etc.

Data stream management systems:

  • Define a declarative query language (small set of operators)
  • Operators' resource usage is assumed to be known
  • Selectively discard tuples, compute summaries, etc.

Limitations:

  • Restrict the type of metrics and possible uses
  • Assume explicit knowledge of operators' cost and selectivity