SLIDE 1

Using Multi-System Monitoring Time Series to Predict Performance Events

Andreas Schörgenhumer, Mario Kahlhofer, Peter Chalupar, Hanspeter Mössenböck, Paul Grünbacher
09.11.2018

SLIDE 8

Motivation

[Figure: a time series over t; an ML model is trained on past data (Train) and then used to predict (Predict)]

Straightforward:

  • Single system
  • Single component
  • Univariate time series
SLIDE 13

Motivation

  • Multiple, interlinked components
  • Multivariate time series
  • Event to data connection
  • …
  • Multiple systems

[Figure: training (Train) an ML model]

SLIDE 17

Approach

[Figure: pipeline in which multi-system data and configs enter the preprocessing framework, which produces CSVs for ML]

(1) Data
(2) Preprocessing
(3) Prediction

SLIDE 23

(1) Data

[Figure: entity diagram relating System, Host, Service, Disk, and Network Interface with 1 and * multiplicities]

  • System: 250 systems, 20-day export
  • Service: events (service slowdowns)
  • Host: 11 time series (CPU load, memory available, swap available, …)
  • Disk: 13 time series (available, read time, write time, …)
  • Network interface: 10 time series (bytes received, bytes sent, packets dropped, …)
  • ...
  • 1-minute resolution

SLIDE 26

(2) Preprocessing – Framework

  • Input: YAMLs (configurations/configs)
      • Contains all necessary data processing settings
      • Easily changeable due to YAML format
  • Output: CSVs (feature vectors)
      • Portable format, directly useable for ML

[Figure: YAML configs go into the preprocessing framework; CSVs with feature vectors come out]

Example config:

    systems:
      - "sys1"
      - "sys2"
    timeSeries:
      - CPU_LOAD
    from: "2018-01-19 09:00"
    to: "2018-02-02 09:00"
    ...
    leadTime: 0
    observationWindowsBoxes:
      CPU_LOAD:
        - size: 60
          step: 1
          aggregationFunctions:
            - "AVG"
          combinationFunctions:
            - "AVG"
    samplingMode: "PER_EVENT"
    missingDataPointMode: "NAN"
    addAttributes: true
    ...

Example output:

    CPU_LOAD:AVG  System  Label
    0.95          sys1    Event
    0.71          sys2    No event
    0.90          sys2    Event
    0.87          sys2    No event
    0.84          sys1    No event
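As a rough illustration of how one row of such a CSV could be derived, the sketch below computes a single `CPU_LOAD:AVG` value from a 1-minute series. The function name `window_feature`, the dict-based series layout, the sample values, and the exact window boundaries (the window ends `lead_time` minutes before the event) are assumptions made for illustration, not the framework's actual implementation.

```python
import math
from statistics import mean


def window_feature(series, event_minute, size, lead_time=0):
    """Average the one-minute samples in a window that ends `lead_time` minutes before the event.

    `series` maps minute index -> value; missing minutes are simply absent.
    Returns NaN when the window holds no data at all (assumed "NAN"-style behaviour).
    """
    end = event_minute - lead_time   # exclusive upper bound of the window
    start = end - size               # inclusive lower bound of the window
    values = [series[m] for m in range(start, end) if m in series]
    return mean(values) if values else math.nan


# Hypothetical CPU load samples for minutes 0..119 (not real monitoring data)
cpu_load = {m: 0.5 + 0.004 * m for m in range(120)}

# One feature-vector row: 60-minute window, lead time 0, event at minute 120
row = {
    "CPU_LOAD:AVG": round(window_feature(cpu_load, 120, size=60), 3),
    "System": "sys1",
    "Label": "Event",
}
print(row)
```

Raising `lead_time` shifts the whole window further into the past, so the feature only uses data that would have been available that many minutes before the event.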

SLIDE 27

(2) Preprocessing – Config Settings

    Setting                   Example
    Systems                   [sys1, sys2, ...]
    Time series               [Host: CPU_LOAD, Disk: AVAILABLE, ...]
    Time frame                From: 19-01-2018 09:00, To: 02-02-2018 09:00
    Sampling mode             PER_EVENT, SLIDE_THROUGH
    Negative sampling source  NON_EVENT_SERVICES, EVENT_SERVICES, ...
    Lead time                 10 min
    Observation windows       60 min, AVG aggregation, AVG combination
    Missing data mode         DROP, NAN, LAST_VALUE, ...
    Metadata                  System, special attributes, ...
    ...                       ...
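The slides name the missing data modes listed above without defining them. The sketch below shows one plausible reading of `DROP`, `NAN`, and `LAST_VALUE` applied to a single observation window. The function `fill_missing`, the use of `None` to mark a gap, and the semantics of each mode are assumptions, not the framework's documented behaviour.

```python
import math


def fill_missing(window, mode):
    """Handle gaps (marked as None) in a list of samples according to `mode`."""
    if mode == "DROP":
        # Assumed meaning: discard the missing data points
        return [v for v in window if v is not None]
    if mode == "NAN":
        # Assumed meaning: keep the slot but mark it as NaN
        return [math.nan if v is None else v for v in window]
    if mode == "LAST_VALUE":
        # Assumed meaning: carry the last observed value forward;
        # leading gaps stay NaN because nothing has been observed yet
        filled, last = [], math.nan
        for v in window:
            if v is not None:
                last = v
            filled.append(last)
        return filled
    raise ValueError(f"unknown missing data mode: {mode}")


# Hypothetical window with three missing samples
samples = [0.4, None, 0.6, None, None, 0.9]
for mode in ("DROP", "NAN", "LAST_VALUE"):
    print(mode, fill_missing(samples, mode))
```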

SLIDE 28

(2) Preprocessing – Example

    ...
    samplingMode: "PER_EVENT"
    leadTime: 5
    observationWindowsBoxes:
      CPU_LOAD:
        - size: 15
          aggregationFunctions:
            - "MIN"
            - "MAX"
          ...
      DISK_WRITE:
        - size: 30
          aggregationFunctions:
            - "AVG"
            - "MIN"
            - "MAX"
            - "STD_DEV"
          combinationFunctions:
            - "AVG"
            - "MIN"
            - "MAX"
            - "AVG"
      BYTES_SENT:
        - size: 5
          aggregationFunctions:
            - "NONE"
          ...
        - size: 30
          aggregationFunctions:
            - "AVG"
          ...
    ...

[Figure: Service, Host, Disk 1, Disk 2, Disk 3, Network]
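One way to read the `DISK_WRITE` block is as a two-step reduction: each disk's window is first aggregated over time, and the per-disk results are then combined into a single value per host. The sketch below follows that reading. The helper `aggregate_and_combine`, the position-wise pairing of aggregation and combination functions, the use of the population standard deviation for `STD_DEV`, the feature names, and the sample values are all assumptions for illustration.

```python
from statistics import mean, pstdev

# Assumed mapping from config names to reducers (STD_DEV as population std dev)
REDUCERS = {"AVG": mean, "MIN": min, "MAX": max, "STD_DEV": pstdev}


def aggregate_and_combine(per_entity_windows, aggregation, combination):
    """Aggregate each entity's window over time, then combine the per-entity results."""
    per_entity = [REDUCERS[aggregation](window) for window in per_entity_windows]
    return REDUCERS[combination](per_entity)


# Hypothetical DISK_WRITE samples for the three disks of one host
disk_write = [
    [10.0, 12.0, 14.0],  # Disk 1
    [20.0, 20.0, 20.0],  # Disk 2
    [1.0, 5.0, 3.0],     # Disk 3
]

# Pair each aggregation with the combination at the same list position (assumed)
aggregations = ["AVG", "MIN", "MAX", "STD_DEV"]
combinations = ["AVG", "MIN", "MAX", "AVG"]
features = {
    f"DISK_WRITE:{a}:{c}": aggregate_and_combine(disk_write, a, c)
    for a, c in zip(aggregations, combinations)
}
print(features)
```

Under this reading, the number of features per time series does not depend on how many disks a host has, which would keep feature vectors comparable across systems.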


SLIDE 38

(3) Prediction

[Figure: timeline t split into 14d Train and 6d Test, passed through the preprocessing framework to give ~3400 FVs and ~2100 FVs]

  • Data: 20 days, 250 systems, 34 time series
  • ML: Random Forest
  • Lead time: 0
  • Windows:
      • Size: [5, 10, 15, 30, 60]
      • Aggregation: min, max, mean, stddev

    Metric     Value
    Accuracy   0.81
    Recall     0.81
    Precision  0.82
    FPR        0.15
    F1 Score   0.81
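All five reported metrics follow from the four confusion-matrix counts of a binary classifier. The sketch below uses the standard definitions. The function name `classification_metrics` and the counts passed to it are hypothetical and are not taken from the study's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive accuracy, recall, precision, FPR, and F1 from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. both classes occur
    and the model predicts at least one positive.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)          # share of actual events that were predicted
    precision = tp / (tp + fp)       # share of predicted events that were real
    fpr = fp / (fp + tn)             # share of non-events flagged as events
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "Accuracy": accuracy,
        "Recall": recall,
        "Precision": precision,
        "FPR": fpr,
        "F1 Score": f1,
    }


# Hypothetical counts, chosen only to exercise the formulas
metrics = classification_metrics(tp=80, fp=20, tn=90, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```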

SLIDE 41

Future Work

  • More training and testing:
      • Splits
      • Config settings
      • System generalization
  • Other ML models
  • Other events
