SLIDE 1

Self-Monitoring and Self-Adapting Systems

Margo I. Seltzer Harvard University Division of Engineering and Applied Sciences

Self-Monitoring and Self-Adaptation

Assumptions

Performance is important.
People do not really know how to tune applications and systems.
It would be nice to get some help from the system in tuning.
Self-monitoring systems gather information about their own performance.

Outline

Self-Monitoring in VINO.
Processing monitor data.
Adapting to system behavior.
Conclusions.

Self-Monitoring in VINO

Measurement thread periodically collects module statistics.
Generate detailed profiling information.
Capture module inputs (traces) and outputs (logs).
In-situ simulation evaluates competing algorithms and policies.

SLIDE 2

Measurement Thread

[Figure: the VINO kernel contains the file system, the transaction (txn) system, the lock system, and other systems. A measurement thread (graft) calls get_stats on them and produces output data.]
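
The polling loop can be sketched in user-level Python. This is an illustration only, not VINO code: the `Module` class, its `requests` counter, and the `interval` and `duration` parameters are assumptions; only the name `get_stats` comes from the slide.

```python
import threading
import time


class Module:
    """Hypothetical stand-in for a kernel subsystem that exposes counters."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self._counters = {"requests": 0}

    def handle_request(self):
        with self._lock:
            self._counters["requests"] += 1

    def get_stats(self):
        # Return a copy so the caller never sees a half-updated dict.
        with self._lock:
            return dict(self._counters)


def measurement_thread(modules, interval, stop, samples):
    """Sample every module, then sleep; exit once `stop` is set."""
    while True:
        samples.append({m.name: m.get_stats() for m in modules})
        if stop.wait(interval):  # wait() returns True once stop is set
            return


def collect(modules, interval=0.01, duration=0.05):
    """Run the measurement thread for `duration` seconds and return its samples."""
    stop, samples = threading.Event(), []
    worker = threading.Thread(target=measurement_thread,
                              args=(modules, interval, stop, samples))
    worker.start()
    time.sleep(duration)
    stop.set()
    worker.join()
    return samples
```

Each sample is a snapshot of every module's counters, so the list returned by `collect` is the raw material for a time series.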

Generating Traces and Logs

[Figure: Buffer Cache with incoming requests, outputs, and graft points.]

Generating Traces and Logs

[Figure: Buffer Cache with incoming requests and outputs; the grafts record parameters and pass them through.]
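
A record-and-pass-through graft can be sketched as a wrapper installed around a method. The `BufferCache` class, its `read` method, and the `graft_recorder` helper are hypothetical names chosen for illustration; the real graft interface is not shown in the slides.

```python
class BufferCache:
    """Toy cache; the block-number interface is an assumption."""

    def __init__(self):
        self.blocks = {}

    def read(self, block_no):
        hit = block_no in self.blocks
        self.blocks.setdefault(block_no, bytes(512))
        return hit


def graft_recorder(module, method_name, trace, log):
    """Replace module.method_name with a wrapper that records and passes through."""
    original = getattr(module, method_name)

    def wrapper(*args, **kwargs):
        trace.append((method_name, args, kwargs))  # module input (trace)
        result = original(*args, **kwargs)         # behaviour is unchanged
        log.append((method_name, result))          # module output (log)
        return result

    setattr(module, method_name, wrapper)
    return original  # keep a handle so the graft can be removed later
```

Restoring the returned original with `setattr` removes the graft, which matters if tracing is only switched on while a problem is being investigated.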

In-Situ Simulation

[Figure: incoming requests and outputs of the real Buffer Cache, next to a Buffer Cache Simulator that produces simulation results; parameters are passed to both the simulator and the real cache.]
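
One way to picture in-situ simulation: each request is fed to a simulator as well as to the real cache, and only the real cache's answer is used. In this sketch the real policy is LRU and the simulated competitor is FIFO; both policies, the class names, and the `serve` helper are assumptions picked for illustration.

```python
from collections import OrderedDict, deque


class LRUCache:
    """Stand-in for the real buffer cache (least-recently-used eviction)."""

    def __init__(self, capacity):
        self.capacity, self.blocks, self.hits = capacity, OrderedDict(), 0

    def access(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)   # mark as most recently used
            self.hits += 1
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        self.blocks[block] = True
        return False


class FIFOSimulator:
    """Simulated competitor: tracks block identities only, no data."""

    def __init__(self, capacity):
        self.capacity, self.queue, self.resident, self.hits = capacity, deque(), set(), 0

    def access(self, block):
        if block in self.resident:
            self.hits += 1
            return True
        if len(self.queue) >= self.capacity:
            self.resident.discard(self.queue.popleft())  # evict oldest arrival
        self.queue.append(block)
        self.resident.add(block)
        return False


def serve(requests, real, simulator):
    """Pass every request to both the simulator and the real cache."""
    for block in requests:
        simulator.access(block)  # result is only recorded
        real.access(block)       # the real cache still serves the request
    return {"real_hits": real.hits, "simulated_hits": simulator.hits}
```

Because both see the same live request stream, the hit counts are directly comparable without replaying a trace later.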

SLIDE 3

What do we do with Data?

Off-line Analysis
Monitors long-term behavior.
Identifies common usage profiles.
Detects uncommon usage.
Suggests thresholds to online system.
Conducts feasibility evaluations.
Online Analysis
Monitor instantaneous resource utilization.
Maintain efficiency statistics.
Detect dangerous conditions.

Off-line Analysis

Use data from measurement thread to construct time series usage profile.
Conduct variance analysis.
Construct predicted usage profiles.
Determine resource thresholds from predicted profiles.
Notify online system of thresholds.
Evaluate traces and logs; derive new algorithms.
Simulate new algorithms, in situ.
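
The profile-to-threshold steps can be sketched as follows. The slides do not say how thresholds are derived; the rule used here (mean plus `k` standard deviations per time slot) and the names `usage_profile`, `thresholds`, `period`, and `k` are assumptions made for illustration.

```python
from statistics import mean, pstdev


def usage_profile(samples, period):
    """Group (time, value) samples by slot within a repeating period.

    Returns {slot: (mean, standard deviation)}, a simple predicted profile.
    """
    slots = {}
    for t, value in samples:
        slots.setdefault(t % period, []).append(value)
    return {slot: (mean(v), pstdev(v)) for slot, v in sorted(slots.items())}


def thresholds(profile, k=3.0):
    """Assumed rule: flag anything more than k deviations above the slot mean."""
    return {slot: mu + k * sigma for slot, (mu, sigma) in profile.items()}
```

For example, with `period=24` and hourly timestamps, each slot holds the history of one hour of the day, and the resulting dictionary is what would be handed to the online system.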

Online Analysis

Receive threshold and variance information from off-line system.

Maintain dynamic statistics about:
Cache hit rates.
Lock contention.
Disk queue lengths.
Load averages.
Context switch rates.
Detect abnormal behavior.
Dynamically trigger trace generation.
Trigger adaptation heuristics.
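
A minimal sketch of the online side, assuming each statistic is smoothed with an exponentially weighted moving average and compared against a threshold supplied by the off-line system. The `OnlineMonitor` class, the smoothing factor `alpha`, and the convention that a higher value is worse are all assumptions; a hit rate, where lower is worse, would need the comparison reversed.

```python
class OnlineMonitor:
    """Keeps a smoothed value per statistic and flags threshold crossings."""

    def __init__(self, thresholds, alpha=0.5):
        self.thresholds = thresholds  # received from the off-line system
        self.alpha = alpha            # weight given to the newest sample
        self.stats = {}
        self.tracing = set()          # statistics with tracing switched on

    def update(self, name, sample):
        """Fold in one sample; return True if behaviour looks abnormal."""
        previous = self.stats.get(name, sample)
        self.stats[name] = self.alpha * sample + (1 - self.alpha) * previous
        if self.stats[name] > self.thresholds[name]:
            self.tracing.add(name)    # stand-in for triggering trace generation
            return True
        return False
```

Smoothing keeps a single spike from triggering adaptation, at the cost of reacting a few samples late.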

Adaptation Heuristics

Goal: decrease application latency.
Paging
Collect page access trace.
Look for well-known patterns (linear, cyclic, strided).
Look for page access correlation.
Install better prefetching algorithm.
Disk Wait
Similar process to paging.
Replace read-ahead for the application(s).
CPU Hogs
Examine profile output.
Recompile kernel modules in application context.
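
The pattern search in the paging heuristic can be sketched as a classifier over a trace of page numbers. The function name `detect`, the minimum trace length, and the rule that a cycle must appear at least twice in full are assumptions; the slides name the patterns but not the detection method.

```python
def detect(trace):
    """Return (pattern, predicted next page) for a list of page numbers."""
    if len(trace) < 3:
        return ("unknown", None)
    deltas = {b - a for a, b in zip(trace, trace[1:])}
    if len(deltas) == 1:
        stride = deltas.pop()
        if stride == 1:
            return ("linear", trace[-1] + 1)
        if stride != 0:
            return ("strided", trace[-1] + stride)
    # Cyclic: the trace repeats with some period that fits at least twice.
    for period in range(2, len(trace) // 2 + 1):
        if all(trace[i] == trace[i - period] for i in range(period, len(trace))):
            return ("cyclic", trace[-period])
    return ("unknown", None)
```

The predicted page is what a replacement prefetching policy would fetch next; an "unknown" result means the default policy stays in place.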

SLIDE 4

Adaptation (continued)

Interrupt Latency
Measure latency between interrupt arrival and delivery to process/thread.
Look for excessively long intervals or high variance.
Check (fix) scheduling priorities.
Lock Contention
Measure lock wait times.
Decrease lock granularity on highly contested items.
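
Measuring lock wait times can be sketched with a thin wrapper around an ordinary lock. The `TimedLock` class is a hypothetical user-level illustration, not a kernel mechanism from the talk.

```python
import threading
import time


class TimedLock:
    """threading.Lock wrapper that accumulates time spent waiting to acquire."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0
        self.total_wait = 0.0

    def __enter__(self):
        start = time.perf_counter()
        self._lock.acquire()
        # Counters are updated while holding the lock, so updates do not race.
        self.acquisitions += 1
        self.total_wait += time.perf_counter() - start
        return self

    def __exit__(self, *exc_info):
        self._lock.release()

    def mean_wait(self):
        """Average seconds a caller waited before getting the lock."""
        return self.total_wait / self.acquisitions if self.acquisitions else 0.0
```

A lock whose mean wait stays high is a candidate for being split into several locks that each protect a smaller piece of data.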

Conclusions

Self-monitoring is a generally useful idea.
An extensible system just makes it easier.
Automatic adaptation is a cool idea.
Challenging to do it correctly.
An extensible system makes it easier to experiment with this.

SLIDE 5

Self-Monitoring and Self-Adapting Systems

Margo I. Seltzer Harvard University Division of Engineering and Applied Sciences

E V I R TA S

Self-Monitoring and Self-Adaptation

Assumptions

Performance is important.
People do not really know how to tune

applications and systems.

It would be nice to get some help from

the system in tuning.

Self-monitoring systems gather

information about their own performance.

Self-Monitoring and Self-Adaptation

Outline

Self-Monitoring in VINO.
Processing monitor data.
Adapting to system behavior.
Conclusions.

Self-Monitoring and Self-Adaptation

Self-Monitoring in VINO

Measurement thread periodically collects

module statistics.

Generate detailed profiling information.
Capture module inputs (traces) and
utputs (logs).
In-situ simulation evaluates competing

algorithms and policies.

SLIDE 6

Self-Monitoring and Self-Adaptation

Measurement Thread

VINO kernel file system txn system lock system

ther

systems

ther

systems

ther

systems

ther

systems measurement thread (graft) get_stats

utput data

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

utputs

Buffer Cache graft points

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

utputs

Buffer Cache record parameters pass- through

Self-Monitoring and Self-Adaptation

In-Situ Simulation

incoming requests

utputs

Buffer Cache Buffer Cache Simulator simulation results real pass parameters to simulator and

SLIDE 7

Self-Monitoring and Self-Adaptation

What do we do with Data?

Off-line Analysis
Monitors long-term behavior.
Identifies common usage profiles.
Detects uncommon usage.
Suggests thresholds to online system.
Conducts feasibility evaluations.
Online Analysis
Monitor instantaneous resource utilization.
Maintain efficiency statistics.
Detect dangerous conditions.

Self-Monitoring and Self-Adaptation

Off-line Analysis

Use data from measurement thread to

construct time series usage profile.

Conduct variance analysis.
Construct predicted usage profiles.
Determine resource thresholds from

predicted profiles.

Notify online system of thresholds.
Evaluate traces and logs; derive new

algorithms.

Simulate new algorithms, in situ.

Self-Monitoring and Self-Adaptation

Online Analysis

Receive threshold and variance

information from off-line system.

Maintain dynamic statistics about:
Cache hit rates.
Lock contention.
Disk queue lengths.
Load averages.
Context switch rates.
Detect abnormal behavior.
Dynamically trigger trace generation.
Trigger adaptation heuristics.

Self-Monitoring and Self-Adaptation

Adaptation Heuristics

Goal: decrease application latency.
Paging
Collect page access trace.
Look for well-known patterns (linear, cyclic, strided).
Look for page access correlation.
Install better prefetching algorithm.
Disk Wait
Similar process to paging.
Replace read-ahead for the application(s).
CPU Hogs
Examine profile output.
Recompile kernel modules in application context.

SLIDE 8

Self-Monitoring and Self-Adaptation

Adaptation (continued)

Interrupt Latency
Measure latency between interrupt arrival and delivery

to process/thread.

Look for excessively long intervals or high variance.
Check (fix) scheduling priorities.
Lock Contention
Measure lock wait times.
Decrease lock granularity on highly contested items.

Self-Monitoring and Self-Adaptation

Conclusions

Self-monitoring is a generally useful

idea.

An extensible system just makes it

easier.

Automatic adaptation is a cool idea.
Challenging to do it correctly.
An extensible system makes it easier to

experiment with this.

SLIDE 9

Self-Monitoring and Self-Adapting Systems

Margo I. Seltzer Harvard University Division of Engineering and Applied Sciences

E V I R TA S

Self-Monitoring and Self-Adaptation

Assumptions

Performance is important.
People do not really know how to tune

applications and systems.

It would be nice to get some help from

the system in tuning.

Self-monitoring systems gather

information about their own performance.

Self-Monitoring and Self-Adaptation

Outline

Self-Monitoring in VINO.
Processing monitor data.
Adapting to system behavior.
Conclusions.

Self-Monitoring and Self-Adaptation

Self-Monitoring in VINO

Measurement thread periodically collects

module statistics.

Generate detailed profiling information.
Capture module inputs (traces) and
utputs (logs).
In-situ simulation evaluates competing

algorithms and policies.

SLIDE 10

Self-Monitoring and Self-Adaptation

Measurement Thread

VINO kernel file system txn system lock system

ther

systems

ther

systems

ther

systems

ther

systems measurement thread (graft) get_stats

utput data

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

utputs

Buffer Cache graft points

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

utputs

Buffer Cache record parameters pass- through

Self-Monitoring and Self-Adaptation

In-Situ Simulation

incoming requests

utputs

Buffer Cache Buffer Cache Simulator simulation results real pass parameters to simulator and

SLIDE 11

Self-Monitoring and Self-Adaptation

What do we do with Data?

Off-line Analysis
Monitors long-term behavior.
Identifies common usage profiles.
Detects uncommon usage.
Suggests thresholds to online system.
Conducts feasibility evaluations.
Online Analysis
Monitor instantaneous resource utilization.
Maintain efficiency statistics.
Detect dangerous conditions.

Self-Monitoring and Self-Adaptation

Off-line Analysis

Use data from measurement thread to

construct time series usage profile.

Conduct variance analysis.
Construct predicted usage profiles.
Determine resource thresholds from

predicted profiles.

Notify online system of thresholds.
Evaluate traces and logs; derive new

algorithms.

Simulate new algorithms, in situ.

Self-Monitoring and Self-Adaptation

Online Analysis

Receive threshold and variance

information from off-line system.

Maintain dynamic statistics about:
Cache hit rates.
Lock contention.
Disk queue lengths.
Load averages.
Context switch rates.
Detect abnormal behavior.
Dynamically trigger trace generation.
Trigger adaptation heuristics.

Self-Monitoring and Self-Adaptation

Adaptation Heuristics

Goal: decrease application latency.
Paging
Collect page access trace.
Look for well-known patterns (linear, cyclic, strided).
Look for page access correlation.
Install better prefetching algorithm.
Disk Wait
Similar process to paging.
Replace read-ahead for the application(s).
CPU Hogs
Examine profile output.
Recompile kernel modules in application context.

SLIDE 12

Self-Monitoring and Self-Adaptation

Adaptation (continued)

Interrupt Latency
Measure latency between interrupt arrival and delivery

to process/thread.

Look for excessively long intervals or high variance.
Check (fix) scheduling priorities.
Lock Contention
Measure lock wait times.
Decrease lock granularity on highly contested items.

Self-Monitoring and Self-Adaptation

Conclusions

Self-monitoring is a generally useful

idea.

An extensible system just makes it

easier.

Automatic adaptation is a cool idea.
Challenging to do it correctly.
An extensible system makes it easier to

experiment with this.

SLIDE 13

Self-Monitoring and Self-Adapting Systems

Margo I. Seltzer Harvard University Division of Engineering and Applied Sciences

E V I R TA S

Self-Monitoring and Self-Adaptation

Assumptions

Performance is important.
People do not really know how to tune

applications and systems.

It would be nice to get some help from

the system in tuning.

Self-monitoring systems gather

information about their own performance.

Self-Monitoring and Self-Adaptation

Outline

Self-Monitoring in VINO.
Processing monitor data.
Adapting to system behavior.
Conclusions.

Self-Monitoring and Self-Adaptation

Self-Monitoring in VINO

Measurement thread periodically collects

module statistics.

Generate detailed profiling information.
Capture module inputs (traces) and
utputs (logs).
In-situ simulation evaluates competing

algorithms and policies.

SLIDE 14

Self-Monitoring and Self-Adaptation

Measurement Thread

VINO kernel file system txn system lock system

ther

systems

ther

systems

ther

systems

ther

systems measurement thread (graft) get_stats

utput data

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

utputs

Buffer Cache graft points

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

utputs

Buffer Cache record parameters pass- through

Self-Monitoring and Self-Adaptation

In-Situ Simulation

incoming requests

utputs

Buffer Cache Buffer Cache Simulator simulation results real pass parameters to simulator and

SLIDE 15

Self-Monitoring and Self-Adaptation

What do we do with Data?

Off-line Analysis
Monitors long-term behavior.
Identifies common usage profiles.
Detects uncommon usage.
Suggests thresholds to online system.
Conducts feasibility evaluations.
Online Analysis
Monitor instantaneous resource utilization.
Maintain efficiency statistics.
Detect dangerous conditions.

Self-Monitoring and Self-Adaptation

Off-line Analysis

Use data from measurement thread to

construct time series usage profile.

Conduct variance analysis.
Construct predicted usage profiles.
Determine resource thresholds from

predicted profiles.

Notify online system of thresholds.
Evaluate traces and logs; derive new

algorithms.

Simulate new algorithms, in situ.

Self-Monitoring and Self-Adaptation

Online Analysis

Receive threshold and variance

information from off-line system.

Maintain dynamic statistics about:
Cache hit rates.
Lock contention.
Disk queue lengths.
Load averages.
Context switch rates.
Detect abnormal behavior.
Dynamically trigger trace generation.
Trigger adaptation heuristics.

Self-Monitoring and Self-Adaptation

Adaptation Heuristics

Goal: decrease application latency.
Paging
Collect page access trace.
Look for well-known patterns (linear, cyclic, strided).
Look for page access correlation.
Install better prefetching algorithm.
Disk Wait
Similar process to paging.
Replace read-ahead for the application(s).
CPU Hogs
Examine profile output.
Recompile kernel modules in application context.

SLIDE 16

Self-Monitoring and Self-Adaptation

Adaptation (continued)

Interrupt Latency
Measure latency between interrupt arrival and delivery

to process/thread.

Look for excessively long intervals or high variance.
Check (fix) scheduling priorities.
Lock Contention
Measure lock wait times.
Decrease lock granularity on highly contested items.

Self-Monitoring and Self-Adaptation

Conclusions

Self-monitoring is a generally useful

idea.

An extensible system just makes it

easier.

Automatic adaptation is a cool idea.
Challenging to do it correctly.
An extensible system makes it easier to

experiment with this.

SLIDE 17

Self-Monitoring and Self-Adapting Systems

Margo I. Seltzer Harvard University Division of Engineering and Applied Sciences

E V I R TA S

Self-Monitoring and Self-Adaptation

Assumptions

Performance is important.
People do not really know how to tune

applications and systems.

It would be nice to get some help from

the system in tuning.

Self-monitoring systems gather

information about their own performance.

Self-Monitoring and Self-Adaptation

Outline

Self-Monitoring in VINO.
Processing monitor data.
Adapting to system behavior.
Conclusions.

Self-Monitoring and Self-Adaptation

Self-Monitoring in VINO

Measurement thread periodically collects

module statistics.

Generate detailed profiling information.
Capture module inputs (traces) and
utputs (logs).
In-situ simulation evaluates competing

algorithms and policies.

SLIDE 18

Self-Monitoring and Self-Adaptation

Measurement Thread

VINO kernel file system txn system lock system

ther

systems

ther

systems

ther

systems

ther

systems measurement thread (graft) get_stats

utput data

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

utputs

Buffer Cache graft points

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

utputs

Buffer Cache record parameters pass- through

Self-Monitoring and Self-Adaptation

In-Situ Simulation

incoming requests

utputs

Buffer Cache Buffer Cache Simulator simulation results real pass parameters to simulator and

SLIDE 19

Self-Monitoring and Self-Adaptation

What do we do with Data?

Off-line Analysis
Monitors long-term behavior.
Identifies common usage profiles.
Detects uncommon usage.
Suggests thresholds to online system.
Conducts feasibility evaluations.
Online Analysis
Monitor instantaneous resource utilization.
Maintain efficiency statistics.
Detect dangerous conditions.

Self-Monitoring and Self-Adaptation

Off-line Analysis

Use data from measurement thread to

construct time series usage profile.

Conduct variance analysis.
Construct predicted usage profiles.
Determine resource thresholds from

predicted profiles.

Notify online system of thresholds.
Evaluate traces and logs; derive new

algorithms.

Simulate new algorithms, in situ.

Self-Monitoring and Self-Adaptation

Online Analysis

Receive threshold and variance

information from off-line system.

Maintain dynamic statistics about:
Cache hit rates.
Lock contention.
Disk queue lengths.
Load averages.
Context switch rates.
Detect abnormal behavior.
Dynamically trigger trace generation.
Trigger adaptation heuristics.

Self-Monitoring and Self-Adaptation

Adaptation Heuristics

Goal: decrease application latency.
Paging
Collect page access trace.
Look for well-known patterns (linear, cyclic, strided).
Look for page access correlation.
Install better prefetching algorithm.
Disk Wait
Similar process to paging.
Replace read-ahead for the application(s).
CPU Hogs
Examine profile output.
Recompile kernel modules in application context.

SLIDE 20

Self-Monitoring and Self-Adaptation

Adaptation (continued)

Interrupt Latency
Measure latency between interrupt arrival and delivery

to process/thread.

Look for excessively long intervals or high variance.
Check (fix) scheduling priorities.
Lock Contention
Measure lock wait times.
Decrease lock granularity on highly contested items.

Self-Monitoring and Self-Adaptation

Conclusions

Self-monitoring is a generally useful

idea.

An extensible system just makes it

easier.

Automatic adaptation is a cool idea.
Challenging to do it correctly.
An extensible system makes it easier to

experiment with this.

SLIDE 21

Self-Monitoring and Self-Adapting Systems

Margo I. Seltzer Harvard University Division of Engineering and Applied Sciences

E V I R TA S

Self-Monitoring and Self-Adaptation

Assumptions

Performance is important.
People do not really know how to tune

applications and systems.

It would be nice to get some help from

the system in tuning.

Self-monitoring systems gather

information about their own performance.

Self-Monitoring and Self-Adaptation

Outline

Self-Monitoring in VINO.
Processing monitor data.
Adapting to system behavior.
Conclusions.

Self-Monitoring and Self-Adaptation

Self-Monitoring in VINO

Measurement thread periodically collects

module statistics.

Generate detailed profiling information.
Capture module inputs (traces) and
utputs (logs).
In-situ simulation evaluates competing

algorithms and policies.

SLIDE 22

Self-Monitoring and Self-Adaptation

Measurement Thread

VINO kernel file system txn system lock system

ther

systems

ther

systems

ther

systems

ther

systems measurement thread (graft) get_stats

utput data

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

utputs

Buffer Cache graft points

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

utputs

Buffer Cache record parameters pass- through

Self-Monitoring and Self-Adaptation

In-Situ Simulation

incoming requests

utputs

Buffer Cache Buffer Cache Simulator simulation results real pass parameters to simulator and

SLIDE 23

Self-Monitoring and Self-Adaptation

What do we do with Data?

Off-line Analysis
Monitors long-term behavior.
Identifies common usage profiles.
Detects uncommon usage.
Suggests thresholds to online system.
Conducts feasibility evaluations.
Online Analysis
Monitor instantaneous resource utilization.
Maintain efficiency statistics.
Detect dangerous conditions.

Self-Monitoring and Self-Adaptation

Off-line Analysis

Use data from measurement thread to

construct time series usage profile.

Conduct variance analysis.
Construct predicted usage profiles.
Determine resource thresholds from

predicted profiles.

Notify online system of thresholds.
Evaluate traces and logs; derive new

algorithms.

Simulate new algorithms, in situ.

Self-Monitoring and Self-Adaptation

Online Analysis

Receive threshold and variance

information from off-line system.

Maintain dynamic statistics about:
Cache hit rates.
Lock contention.
Disk queue lengths.
Load averages.
Context switch rates.
Detect abnormal behavior.
Dynamically trigger trace generation.
Trigger adaptation heuristics.

Self-Monitoring and Self-Adaptation

Adaptation Heuristics

Goal: decrease application latency.
Paging
Collect page access trace.
Look for well-known patterns (linear, cyclic, strided).
Look for page access correlation.
Install better prefetching algorithm.
Disk Wait
Similar process to paging.
Replace read-ahead for the application(s).
CPU Hogs
Examine profile output.
Recompile kernel modules in application context.

SLIDE 24

Self-Monitoring and Self-Adaptation

Adaptation (continued)

Interrupt Latency
Measure latency between interrupt arrival and delivery

to process/thread.

Look for excessively long intervals or high variance.
Check (fix) scheduling priorities.
Lock Contention
Measure lock wait times.
Decrease lock granularity on highly contested items.

Self-Monitoring and Self-Adaptation

Conclusions

Self-monitoring is a generally useful

idea.

An extensible system just makes it

easier.

Automatic adaptation is a cool idea.
Challenging to do it correctly.
An extensible system makes it easier to

experiment with this.

SLIDE 25

Self-Monitoring and Self-Adapting Systems

Margo I. Seltzer Harvard University Division of Engineering and Applied Sciences

E V I R TA S

Self-Monitoring and Self-Adaptation

Assumptions

Performance is important.
People do not really know how to tune

applications and systems.

It would be nice to get some help from

the system in tuning.

Self-monitoring systems gather

information about their own performance.

Self-Monitoring and Self-Adaptation

Outline

Self-Monitoring in VINO.
Processing monitor data.
Adapting to system behavior.
Conclusions.

Self-Monitoring and Self-Adaptation

Self-Monitoring in VINO

Measurement thread periodically collects

module statistics.

Generate detailed profiling information.
Capture module inputs (traces) and
utputs (logs).
In-situ simulation evaluates competing

algorithms and policies.

SLIDE 26

Self-Monitoring and Self-Adaptation

Measurement Thread

VINO kernel file system txn system lock system

ther

systems

ther

systems

ther

systems

ther

systems measurement thread (graft) get_stats

utput data

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

utputs

Buffer Cache graft points

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

utputs

Buffer Cache record parameters pass- through

Self-Monitoring and Self-Adaptation

In-Situ Simulation

incoming requests

utputs

Buffer Cache Buffer Cache Simulator simulation results real pass parameters to simulator and

SLIDE 27

Self-Monitoring and Self-Adaptation

What do we do with Data?

Off-line Analysis
Monitors long-term behavior.
Identifies common usage profiles.
Detects uncommon usage.
Suggests thresholds to online system.
Conducts feasibility evaluations.
Online Analysis
Monitor instantaneous resource utilization.
Maintain efficiency statistics.
Detect dangerous conditions.

Self-Monitoring and Self-Adaptation

Off-line Analysis

Use data from measurement thread to

construct time series usage profile.

Conduct variance analysis.
Construct predicted usage profiles.
Determine resource thresholds from

predicted profiles.

Notify online system of thresholds.
Evaluate traces and logs; derive new

algorithms.

Simulate new algorithms, in situ.

Self-Monitoring and Self-Adaptation

Online Analysis

Receive threshold and variance

information from off-line system.

Maintain dynamic statistics about:
Cache hit rates.
Lock contention.
Disk queue lengths.
Load averages.
Context switch rates.
Detect abnormal behavior.
Dynamically trigger trace generation.
Trigger adaptation heuristics.

Self-Monitoring and Self-Adaptation

Adaptation Heuristics

Goal: decrease application latency.
Paging
Collect page access trace.
Look for well-known patterns (linear, cyclic, strided).
Look for page access correlation.
Install better prefetching algorithm.
Disk Wait
Similar process to paging.
Replace read-ahead for the application(s).
CPU Hogs
Examine profile output.
Recompile kernel modules in application context.

SLIDE 28

Self-Monitoring and Self-Adaptation

Adaptation (continued)

Interrupt Latency
Measure latency between interrupt arrival and delivery

to process/thread.

Look for excessively long intervals or high variance.
Check (fix) scheduling priorities.
Lock Contention
Measure lock wait times.
Decrease lock granularity on highly contested items.

Self-Monitoring and Self-Adaptation

Conclusions

Self-monitoring is a generally useful

idea.

An extensible system just makes it

easier.

Automatic adaptation is a cool idea.
Challenging to do it correctly.
An extensible system makes it easier to

experiment with this.

SLIDE 29

Self-Monitoring and Self-Adapting Systems

Margo I. Seltzer Harvard University Division of Engineering and Applied Sciences

E V I R TA S

Self-Monitoring and Self-Adaptation

Assumptions

Performance is important.
People do not really know how to tune

applications and systems.

It would be nice to get some help from

the system in tuning.

Self-monitoring systems gather

information about their own performance.

Self-Monitoring and Self-Adaptation

Outline

Self-Monitoring in VINO.
Processing monitor data.
Adapting to system behavior.
Conclusions.

Self-Monitoring and Self-Adaptation

Self-Monitoring in VINO

Measurement thread periodically collects

module statistics.

Generate detailed profiling information.
Capture module inputs (traces) and
utputs (logs).
In-situ simulation evaluates competing

algorithms and policies.

SLIDE 30

Self-Monitoring and Self-Adaptation

Measurement Thread

VINO kernel file system txn system lock system

ther

systems

ther

systems

ther

systems

ther

systems measurement thread (graft) get_stats

utput data

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

utputs

Buffer Cache graft points

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

utputs

Buffer Cache record parameters pass- through

Self-Monitoring and Self-Adaptation

In-Situ Simulation

incoming requests

utputs

Buffer Cache Buffer Cache Simulator simulation results real pass parameters to simulator and

SLIDE 31

Self-Monitoring and Self-Adaptation

What do we do with Data?

Off-line Analysis
Monitors long-term behavior.
Identifies common usage profiles.
Detects uncommon usage.
Suggests thresholds to online system.
Conducts feasibility evaluations.
Online Analysis
Monitor instantaneous resource utilization.
Maintain efficiency statistics.
Detect dangerous conditions.

Self-Monitoring and Self-Adaptation

Off-line Analysis

Use data from measurement thread to

construct time series usage profile.

Conduct variance analysis.
Construct predicted usage profiles.
Determine resource thresholds from

predicted profiles.

Notify online system of thresholds.
Evaluate traces and logs; derive new

algorithms.

Simulate new algorithms, in situ.

Self-Monitoring and Self-Adaptation

Online Analysis

Receive threshold and variance

information from off-line system.

Maintain dynamic statistics about:
Cache hit rates.
Lock contention.
Disk queue lengths.
Load averages.
Context switch rates.
Detect abnormal behavior.
Dynamically trigger trace generation.
Trigger adaptation heuristics.

Self-Monitoring and Self-Adaptation

Adaptation Heuristics

Goal: decrease application latency.
Paging
Collect page access trace.
Look for well-known patterns (linear, cyclic, strided).
Look for page access correlation.
Install better prefetching algorithm.
Disk Wait
Similar process to paging.
Replace read-ahead for the application(s).
CPU Hogs
Examine profile output.
Recompile kernel modules in application context.

SLIDE 32

Self-Monitoring and Self-Adaptation

Adaptation (continued)

Interrupt Latency
Measure latency between interrupt arrival and delivery

to process/thread.

Look for excessively long intervals or high variance.
Check (fix) scheduling priorities.
Lock Contention
Measure lock wait times.
Decrease lock granularity on highly contested items.

Self-Monitoring and Self-Adaptation

Conclusions

Self-monitoring is a generally useful

idea.

An extensible system just makes it

easier.

Automatic adaptation is a cool idea.
Challenging to do it correctly.
An extensible system makes it easier to

experiment with this.

SLIDE 33

Self-Monitoring and Self-Adapting Systems

Margo I. Seltzer Harvard University Division of Engineering and Applied Sciences

E V I R TA S

Self-Monitoring and Self-Adaptation

Assumptions

Performance is important.
People do not really know how to tune

applications and systems.

It would be nice to get some help from

the system in tuning.

Self-monitoring systems gather

information about their own performance.

Self-Monitoring and Self-Adaptation

Outline

Self-Monitoring in VINO.
Processing monitor data.
Adapting to system behavior.
Conclusions.

Self-Monitoring and Self-Adaptation

Self-Monitoring in VINO

Measurement thread periodically collects

module statistics.

Generate detailed profiling information.
Capture module inputs (traces) and
utputs (logs).
In-situ simulation evaluates competing

algorithms and policies.

SLIDE 34

Self-Monitoring and Self-Adaptation

Measurement Thread

VINO kernel file system txn system lock system

ther

systems

ther

systems

ther

systems

ther

systems measurement thread (graft) get_stats

utput data

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

utputs

Buffer Cache graft points

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

utputs

Buffer Cache record parameters pass- through

Self-Monitoring and Self-Adaptation

In-Situ Simulation

incoming requests

utputs

Buffer Cache Buffer Cache Simulator simulation results real pass parameters to simulator and

SLIDE 35

Self-Monitoring and Self-Adaptation

What do we do with Data?

Off-line Analysis
Monitors long-term behavior.
Identifies common usage profiles.
Detects uncommon usage.
Suggests thresholds to online system.
Conducts feasibility evaluations.
Online Analysis
Monitor instantaneous resource utilization.
Maintain efficiency statistics.
Detect dangerous conditions.

Self-Monitoring and Self-Adaptation

Off-line Analysis

Use data from measurement thread to

construct time series usage profile.

Conduct variance analysis.
Construct predicted usage profiles.
Determine resource thresholds from

predicted profiles.

Notify online system of thresholds.
Evaluate traces and logs; derive new

algorithms.

Simulate new algorithms, in situ.

Self-Monitoring and Self-Adaptation

Online Analysis

Receive threshold and variance

information from off-line system.

Maintain dynamic statistics about:
Cache hit rates.
Lock contention.
Disk queue lengths.
Load averages.
Context switch rates.
Detect abnormal behavior.
Dynamically trigger trace generation.
Trigger adaptation heuristics.

Self-Monitoring and Self-Adaptation

Adaptation Heuristics

Goal: decrease application latency.
Paging
Collect page access trace.
Look for well-known patterns (linear, cyclic, strided).
Look for page access correlation.
Install better prefetching algorithm.
Disk Wait
Similar process to paging.
Replace read-ahead for the application(s).
CPU Hogs
Examine profile output.
Recompile kernel modules in application context.
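
As an illustration of the paging heuristic, the sketch below classifies a page access trace as linear, strided, or cyclic. The function name and the exact definitions of each pattern are assumptions chosen for the example; a real detector would need to tolerate noise in the trace.

```python
def classify_trace(pages):
    """Classify a list of page numbers as linear, strided, cyclic, or unknown."""
    if len(pages) < 3:
        return "unknown"
    # Differences between consecutive page numbers.
    strides = {b - a for a, b in zip(pages, pages[1:])}
    if strides == {1}:
        return "linear"
    if len(strides) == 1 and strides != {0}:
        # One constant, non-zero stride (may be negative).
        return "strided"
    # Cyclic: the trace repeats itself with some period of at least 2.
    for period in range(2, len(pages) // 2 + 1):
        if all(pages[i] == pages[i - period] for i in range(period, len(pages))):
            return "cyclic"
    return "unknown"
```

A prefetcher could then be chosen per pattern, for example fetching the next page for a linear trace or the page one stride ahead for a strided trace.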

SLIDE 36

Adaptation (continued)

Interrupt Latency
Measure latency between interrupt arrival and delivery to process/thread.

Look for excessively long intervals or high variance.
Check (fix) scheduling priorities.
Lock Contention
Measure lock wait times.
Decrease lock granularity on highly contested items.
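
Measuring lock wait times can be sketched with a thin wrapper around an ordinary lock. This is a user-level illustration only: the `TimedLock` class and its fields are hypothetical, and a kernel would record these times inside its own lock implementation.

```python
import threading
import time


class TimedLock:
    """A lock that records how long callers wait to acquire it."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0
        self.total_wait = 0.0

    def __enter__(self):
        start = time.perf_counter()
        self._lock.acquire()
        # Counters are updated while holding the lock, so updates do not race.
        self.total_wait += time.perf_counter() - start
        self.acquisitions += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False

    def mean_wait(self):
        """Average wait per acquisition in seconds (0.0 if never acquired)."""
        return self.total_wait / self.acquisitions if self.acquisitions else 0.0


lock = TimedLock()
with lock:
    pass  # critical section
```

A lock whose `mean_wait()` stays high is a candidate for being split into several finer-grained locks.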

Conclusions

Self-monitoring is a generally useful idea.
An extensible system just makes it easier.
Automatic adaptation is a cool idea.
Challenging to do it correctly.
An extensible system makes it easier to experiment with this.

SLIDE 69

Self-Monitoring and Self-Adapting Systems

Margo I. Seltzer Harvard University Division of Engineering and Applied Sciences

E V I R TA S

Self-Monitoring and Self-Adaptation

Assumptions

Performance is important.
People do not really know how to tune

applications and systems.

It would be nice to get some help from

the system in tuning.

Self-monitoring systems gather

information about their own performance.

Self-Monitoring and Self-Adaptation

Outline

Self-Monitoring in VINO.
Processing monitor data.
Adapting to system behavior.
Conclusions.

Self-Monitoring and Self-Adaptation

Self-Monitoring in VINO

Measurement thread periodically collects

module statistics.

Generate detailed profiling information.
Capture module inputs (traces) and
utputs (logs).
In-situ simulation evaluates competing

algorithms and policies.

SLIDE 70

Self-Monitoring and Self-Adaptation

Measurement Thread

VINO kernel file system txn system lock system

ther

systems

ther

systems

ther

systems

ther

systems measurement thread (graft) get_stats

utput data

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

utputs

Buffer Cache graft points

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

utputs

Buffer Cache record parameters pass- through

Self-Monitoring and Self-Adaptation

In-Situ Simulation

incoming requests

utputs

Buffer Cache Buffer Cache Simulator simulation results real pass parameters to simulator and

SLIDE 71

Self-Monitoring and Self-Adaptation

What do we do with Data?

Off-line Analysis
Monitors long-term behavior.
Identifies common usage profiles.
Detects uncommon usage.
Suggests thresholds to online system.
Conducts feasibility evaluations.
Online Analysis
Monitor instantaneous resource utilization.
Maintain efficiency statistics.
Detect dangerous conditions.

Self-Monitoring and Self-Adaptation

Off-line Analysis

Use data from measurement thread to construct time series usage profile.
Conduct variance analysis.
Construct predicted usage profiles.
Determine resource thresholds from predicted profiles.
Notify online system of thresholds.
Evaluate traces and logs; derive new algorithms.
Simulate new algorithms, in situ.
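
The threshold step above can be sketched in a few lines. This is only an illustration, not VINO's method: the slides do not say how thresholds are computed, so the "mean plus k standard deviations" rule, the function name `derive_threshold`, and the sample data are all assumptions.

```python
from statistics import mean, pstdev


def derive_threshold(samples, k=3.0):
    """Return (mean, stddev, threshold) for one metric's time series.

    Assumed rule: threshold = mean + k * stddev. The slides only say that
    thresholds are determined from predicted profiles.
    """
    if not samples:
        raise ValueError("need at least one sample")
    mu = mean(samples)
    sigma = pstdev(samples)  # population standard deviation (variance analysis)
    return mu, sigma, mu + k * sigma


# Hypothetical disk queue lengths collected by the measurement thread.
disk_queue = [2, 3, 2, 4, 3, 2, 3, 5]
mu, sigma, limit = derive_threshold(disk_queue)
```

The resulting `limit` is the kind of value the off-line system could hand to the online system as a threshold.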

Online Analysis

Receive threshold and variance information from off-line system.

Maintain dynamic statistics about:
Cache hit rates.
Lock contention.
Disk queue lengths.
Load averages.
Context switch rates.
Detect abnormal behavior.
Dynamically trigger trace generation.
Trigger adaptation heuristics.
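
A minimal sketch of the online side, assuming a single metric, an exponentially weighted moving average as the "dynamic statistic", and a simple above-threshold test for abnormal behavior. The class `OnlineMonitor`, its fields, and the smoothing factor are hypothetical; the slides do not specify any of them.

```python
class OnlineMonitor:
    """Track one metric and flag it when its smoothed value exceeds a threshold."""

    def __init__(self, threshold, alpha=0.5):
        self.threshold = threshold  # received from the off-line system
        self.alpha = alpha          # assumed smoothing factor
        self.average = None         # running exponentially weighted average
        self.tracing = False        # stand-in for "trace generation is on"

    def observe(self, value):
        """Fold in one sample; return True if behavior now looks abnormal."""
        if self.average is None:
            self.average = float(value)
        else:
            self.average = self.alpha * value + (1 - self.alpha) * self.average
        abnormal = self.average > self.threshold
        if abnormal:
            # Abnormal behavior dynamically triggers trace generation.
            self.tracing = True
        return abnormal


monitor = OnlineMonitor(threshold=6.0)
flags = [monitor.observe(v) for v in [2, 4, 12, 16]]
```

Smoothing keeps a single spike from triggering adaptation; a real monitor would keep one such object per statistic (cache hit rate, lock contention, and so on).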

Adaptation Heuristics

Goal: decrease application latency.
Paging
Collect page access trace.
Look for well-known patterns (linear, cyclic, strided).
Look for page access correlation.
Install better prefetching algorithm.
Disk Wait
Similar process to paging.
Replace read-ahead for the application(s).
CPU Hogs
Examine profile output.
Recompile kernel modules in application context.
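
The pattern search in the paging heuristic might look like the sketch below. It assumes the trace is a plain list of page numbers and uses deliberately simple definitions of linear (constant step of one page), strided (constant step of more than one page), and cyclic (the trace repeats with a fixed period). The function `classify_trace` is hypothetical and is not taken from VINO.

```python
def classify_trace(pages):
    """Classify a page-number trace as 'linear', 'strided', 'cyclic', or 'unknown'."""
    if len(pages) < 3:
        return "unknown"
    deltas = [b - a for a, b in zip(pages, pages[1:])]
    # Constant non-zero step: linear if the step is one page, strided otherwise.
    if deltas[0] != 0 and all(d == deltas[0] for d in deltas):
        return "linear" if abs(deltas[0]) == 1 else "strided"
    # Cyclic: the trace repeats with a period of at most half its length.
    for period in range(2, len(pages) // 2 + 1):
        if all(pages[i] == pages[i - period] for i in range(period, len(pages))):
            return "cyclic"
    return "unknown"


linear = classify_trace([10, 11, 12, 13])
strided = classify_trace([0, 4, 8, 12])
cyclic = classify_trace([1, 2, 3, 1, 2, 3, 1])
other = classify_trace([5, 1, 9, 2])
```

Once a pattern is recognized, the matching prefetching policy could be installed for that application; traces classified as unknown would fall through to the correlation analysis.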

Adaptation (continued)

Interrupt Latency
Measure latency between interrupt arrival and delivery to process/thread.
Look for excessively long intervals or high variance.
Check (fix) scheduling priorities.
Lock Contention
Measure lock wait times.
Decrease lock granularity on highly contested items.

Conclusions

Self-monitoring is a generally useful idea.
An extensible system just makes it easier.
Automatic adaptation is a cool idea.
Challenging to do it correctly.
An extensible system makes it easier to experiment with this.

Self-Monitoring and Self-Adaptation

Conclusions

Self-monitoring is a generally useful idea.
An extensible system just makes it easier.

Automatic adaptation is a cool idea.
Challenging to do it correctly.
An extensible system makes it easier to