Self-Monitoring and Self-Adapting Systems

Margo I. Seltzer Harvard University Division of Engineering and Applied Sciences

Self-Monitoring and Self-Adaptation

Assumptions

  • Performance is important.
  • People do not really know how to tune applications and systems.
  • It would be nice to get some help from the system in tuning.
  • Self-monitoring systems gather information about their own performance.

Outline

  • Self-Monitoring in VINO.
  • Processing monitor data.
  • Adapting to system behavior.
  • Conclusions.

Self-Monitoring in VINO

  • Measurement thread periodically collects module statistics.
  • Generate detailed profiling information.
  • Capture module inputs (traces) and outputs (logs).
  • In-situ simulation evaluates competing algorithms and policies.

Measurement Thread

[Diagram: the VINO kernel contains the file system, txn system, lock system, and other systems. A measurement thread (graft) calls get_stats on them and produces output data.]
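The polling loop in the diagram can be sketched at user level in Python. This is an illustration only, not VINO's kernel code: the `Module` and `MeasurementThread` classes, the `requests` counter, and the sampling parameters are all hypothetical. Only the `get_stats` name comes from the slide.

```python
import threading
import time


class Module:
    """Hypothetical stand-in for a kernel module (file system, lock system, ...)."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self._counters = {"requests": 0}

    def handle_request(self):
        with self._lock:
            self._counters["requests"] += 1

    def get_stats(self):
        # Return a snapshot so the caller never sees a half-updated dict.
        with self._lock:
            return dict(self._counters)


class MeasurementThread(threading.Thread):
    """Periodically polls every module's get_stats() and appends the result."""

    def __init__(self, modules, interval_s, samples):
        super().__init__(daemon=True)
        self.modules = modules
        self.interval_s = interval_s
        self.samples = samples
        self.output = []  # list of (module name, stats snapshot)

    def run(self):
        for _ in range(self.samples):
            for module in self.modules:
                self.output.append((module.name, module.get_stats()))
            time.sleep(self.interval_s)


modules = [Module("file system"), Module("txn system"), Module("lock system")]
modules[0].handle_request()
collector = MeasurementThread(modules, interval_s=0.01, samples=3)
collector.start()
collector.join()
```

Each module keeps its own counters behind its own lock, so the collector only needs a read-only snapshot call and never touches module internals.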

Generating Traces and Logs

[Diagram: incoming requests, the Buffer Cache, and outputs, with graft points marked.]

[Diagram, next build: the same path, with the grafts labeled "record parameters" and "pass-through".]
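A record-and-pass-through graft behaves much like a wrapper around a module's entry point. The Python sketch below assumes a hypothetical `cache_lookup` function standing in for the buffer cache; `graft_recorder` and the in-memory `trace` and `log` lists are also illustrative names, not VINO interfaces.

```python
import functools


def graft_recorder(trace, log):
    """Build a decorator that records a call's parameters and result."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            trace.append((func.__name__, args, kwargs))  # module input (trace)
            result = func(*args, **kwargs)               # unchanged pass-through
            log.append((func.__name__, result))          # module output (log)
            return result

        return wrapper

    return decorator


trace, log = [], []


@graft_recorder(trace, log)
def cache_lookup(block_no):
    # Hypothetical buffer-cache entry point: even blocks count as hits.
    return block_no % 2 == 0


hit = cache_lookup(4)
miss = cache_lookup(7)
```

The wrapped function returns exactly what the original returns, so callers cannot tell that recording is switched on.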

In-Situ Simulation

[Diagram: incoming requests and outputs around the real Buffer Cache, with a Buffer Cache Simulator beside it producing simulation results; one label reads "pass parameters to simulator and real".]
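In-situ simulation can be sketched as feeding every request to both the real module and a simulator of a competing policy, then comparing their statistics. The policies below (LRU for the real cache, FIFO for the simulator), the class names, and the `serve` helper are assumptions chosen for illustration; the slides do not say which policies were compared.

```python
from collections import OrderedDict, deque


class LRUCache:
    """Stands in for the real buffer cache (least-recently-used eviction)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = 0

    def access(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)
            self.hits += 1
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        self.blocks[block] = None
        return False


class FIFOSimulator:
    """Tracks which blocks a FIFO policy would hold; stores no data."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.resident = set()
        self.hits = 0

    def access(self, block):
        if block in self.resident:
            self.hits += 1
            return True
        if len(self.queue) >= self.capacity:
            self.resident.discard(self.queue.popleft())  # evict oldest
        self.queue.append(block)
        self.resident.add(block)
        return False


def serve(requests, real, simulator):
    """Pass each request to both caches; only the real result is returned."""
    outputs = []
    for block in requests:
        simulator.access(block)
        outputs.append(real.access(block))
    return outputs


real, simulator = LRUCache(3), FIFOSimulator(3)
outputs = serve([1, 2, 3, 1, 4, 1, 5, 1], real, simulator)
```

On this request stream the real LRU cache scores 3 hits and the simulated FIFO policy scores 2, so the simulation would argue against switching.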

What do we do with Data?

  • Off-line Analysis
      • Monitors long-term behavior.
      • Identifies common usage profiles.
      • Detects uncommon usage.
      • Suggests thresholds to online system.
      • Conducts feasibility evaluations.
  • Online Analysis
      • Monitor instantaneous resource utilization.
      • Maintain efficiency statistics.
      • Detect dangerous conditions.

Off-line Analysis

  • Use data from measurement thread to construct time series usage profile.
  • Conduct variance analysis.
  • Construct predicted usage profiles.
  • Determine resource thresholds from predicted profiles.
  • Notify online system of thresholds.
  • Evaluate traces and logs; derive new algorithms.
  • Simulate new algorithms, in situ.
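One way to turn a usage time series into a threshold is to take the mean plus a multiple of the standard deviation. The slides do not state how thresholds are derived, so the rule, the `derive_threshold` name, and the default `k` below are assumptions for illustration.

```python
import math
import statistics


def derive_threshold(samples, k=3.0):
    """Summarize a usage time series and suggest an alert threshold.

    Assumed rule: threshold = mean + k * population standard deviation.
    """
    mean = statistics.mean(samples)
    variance = statistics.pvariance(samples)
    return {
        "mean": mean,
        "variance": variance,
        "threshold": mean + k * math.sqrt(variance),
    }


# Hypothetical disk queue lengths sampled by the measurement thread.
profile = derive_threshold([2, 4, 4, 4, 5, 5, 7, 9])
```

For this sample series the mean is 5 and the variance is 4, so with `k = 3` the suggested threshold is 11.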

Online Analysis

  • Receive threshold and variance information from off-line system.
  • Maintain dynamic statistics about:
      • Cache hit rates.
      • Lock contention.
      • Disk queue lengths.
      • Load averages.
      • Context switch rates.
  • Detect abnormal behavior.
  • Dynamically trigger trace generation.
  • Trigger adaptation heuristics.
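The online side can be sketched as a running statistic compared against a threshold supplied from off-line analysis. The exponentially weighted moving average, the `OnlineMonitor` class, and the `on_abnormal` callback are assumptions chosen for illustration; the slides do not specify how the dynamic statistics are maintained.

```python
class OnlineMonitor:
    """Keeps a moving average of one statistic and flags abnormal values."""

    def __init__(self, threshold, alpha=0.5, on_abnormal=None):
        self.threshold = threshold      # supplied by the off-line system
        self.alpha = alpha              # weight given to the newest sample
        self.on_abnormal = on_abnormal or (lambda value: None)
        self.average = None

    def observe(self, sample):
        if self.average is None:
            self.average = float(sample)
        else:
            self.average = self.alpha * sample + (1 - self.alpha) * self.average
        abnormal = self.average > self.threshold
        if abnormal:
            # Stand-in for "trigger trace generation" or an adaptation heuristic.
            self.on_abnormal(self.average)
        return abnormal


triggered = []
monitor = OnlineMonitor(threshold=10.0, alpha=0.5, on_abnormal=triggered.append)
flags = [monitor.observe(queue_len) for queue_len in [4, 6, 30]]
```

Smoothing keeps a single noisy sample from firing the trigger unless it is large enough to pull the average over the threshold, as the third sample does here.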

Adaptation Heuristics

  • Goal: decrease application latency.
  • Paging
      • Collect page access trace.
      • Look for well-known patterns (linear, cyclic, strided).
      • Look for page access correlation.
      • Install better prefetching algorithm.
  • Disk Wait
      • Similar process to paging.
      • Replace read-ahead for the application(s).
  • CPU Hogs
      • Examine profile output.
      • Recompile kernel modules in application context.
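Looking for well-known patterns in a page access trace can be sketched as a small classifier. The definitions used here (linear means a constant stride of 1, strided means any other constant nonzero stride, cyclic means the trace repeats with a fixed period) and the `classify_trace` name are assumptions; the slides name the patterns but do not define them.

```python
def classify_trace(pages):
    """Return (pattern, parameter) for a list of page numbers."""
    if len(pages) < 3:
        return ("unknown", None)

    # A single distinct difference between neighbors means a constant stride.
    strides = {b - a for a, b in zip(pages, pages[1:])}
    if len(strides) == 1:
        stride = next(iter(strides))
        if stride == 1:
            return ("linear", 1)
        if stride != 0:
            return ("strided", stride)

    # Cyclic: smallest period p with pages[i] == pages[i - p] for all i >= p.
    for period in range(1, len(pages) // 2 + 1):
        if all(pages[i] == pages[i - period] for i in range(period, len(pages))):
            return ("cyclic", period)

    return ("unknown", None)


linear = classify_trace([10, 11, 12, 13])
strided = classify_trace([0, 4, 8, 12])
cyclic = classify_trace([1, 2, 3, 1, 2, 3, 1])
unknown = classify_trace([3, 9, 1, 7])
```

A prefetcher installed after classification could use the returned stride or period to predict the next page; that step is omitted here.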

Adaptation (continued)

  • Interrupt Latency
      • Measure latency between interrupt arrival and delivery to process/thread.
      • Look for excessively long intervals or high variance.
      • Check (fix) scheduling priorities.
  • Lock Contention
      • Measure lock wait times.
      • Decrease lock granularity on highly contested items.
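Measuring lock wait times can be sketched with a lock wrapper that times each acquisition. `TimedLock` and its counters are hypothetical and run at user level; they are not part of VINO's lock system.

```python
import threading
import time


class TimedLock:
    """A mutex that records how long callers wait to acquire it."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0
        self.total_wait_s = 0.0

    def __enter__(self):
        start = time.perf_counter()
        self._lock.acquire()
        # Updated while holding the lock, so the counters need no extra guard.
        self.total_wait_s += time.perf_counter() - start
        self.acquisitions += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False

    def mean_wait_s(self):
        if self.acquisitions == 0:
            return 0.0
        return self.total_wait_s / self.acquisitions


lock = TimedLock()
shared = {"count": 0}


def worker():
    for _ in range(50):
        with lock:
            shared["count"] += 1


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A high `mean_wait_s()` on one lock marks it as a candidate for splitting into several locks that each guard a smaller piece of data.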

Conclusions

  • Self-monitoring is a generally useful idea.
  • An extensible system just makes it easier.
  • Automatic adaptation is a cool idea.
  • Challenging to do it correctly.
  • An extensible system makes it easier to experiment with this.

slide-5
SLIDE 5

Self-Monitoring and Self-Adapting Systems

Margo I. Seltzer Harvard University Division of Engineering and Applied Sciences

E V I R TA S

Self-Monitoring and Self-Adaptation

Assumptions

  • Performance is important.
  • People do not really know how to tune

applications and systems.

  • It would be nice to get some help from

the system in tuning.

  • Self-monitoring systems gather

information about their own performance.

Self-Monitoring and Self-Adaptation

Outline

  • Self-Monitoring in VINO.
  • Processing monitor data.
  • Adapting to system behavior.
  • Conclusions.

Self-Monitoring and Self-Adaptation

Self-Monitoring in VINO

  • Measurement thread periodically collects

module statistics.

  • Generate detailed profiling information.
  • Capture module inputs (traces) and
  • utputs (logs).
  • In-situ simulation evaluates competing

algorithms and policies.

slide-6
SLIDE 6

Self-Monitoring and Self-Adaptation

Measurement Thread

VINO kernel file system txn system lock system

  • ther

systems

  • ther

systems

  • ther

systems

  • ther

systems measurement thread (graft) get_stats

  • utput data

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

  • utputs

Buffer Cache graft points

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

  • utputs

Buffer Cache record parameters pass- through

Self-Monitoring and Self-Adaptation

In-Situ Simulation

incoming requests

  • utputs

Buffer Cache Buffer Cache Simulator simulation results real pass parameters to simulator and

slide-7
SLIDE 7

Self-Monitoring and Self-Adaptation

What do we do with Data?

  • Off-line Analysis
  • Monitors long-term behavior.
  • Identifies common usage profiles.
  • Detects uncommon usage.
  • Suggests thresholds to online system.
  • Conducts feasibility evaluations.
  • Online Analysis
  • Monitor instantaneous resource utilization.
  • Maintain efficiency statistics.
  • Detect dangerous conditions.

Self-Monitoring and Self-Adaptation

Off-line Analysis

  • Use data from measurement thread to

construct time series usage profile.

  • Conduct variance analysis.
  • Construct predicted usage profiles.
  • Determine resource thresholds from

predicted profiles.

  • Notify online system of thresholds.
  • Evaluate traces and logs; derive new

algorithms.

  • Simulate new algorithms, in situ.

Self-Monitoring and Self-Adaptation

Online Analysis

  • Receive threshold and variance

information from off-line system.

  • Maintain dynamic statistics about:
  • Cache hit rates.
  • Lock contention.
  • Disk queue lengths.
  • Load averages.
  • Context switch rates.
  • Detect abnormal behavior.
  • Dynamically trigger trace generation.
  • Trigger adaptation heuristics.

Self-Monitoring and Self-Adaptation

Adaptation Heuristics

  • Goal: decrease application latency.
  • Paging
  • Collect page access trace.
  • Look for well-known patterns (linear, cyclic, strided).
  • Look for page access correlation.
  • Install better prefetching algorithm.
  • Disk Wait
  • Similar process to paging.
  • Replace read-ahead for the application(s).
  • CPU Hogs
  • Examine profile output.
  • Recompile kernel modules in application context.
slide-8
SLIDE 8

Self-Monitoring and Self-Adaptation

Adaptation (continued)

  • Interrupt Latency
  • Measure latency between interrupt arrival and delivery

to process/thread.

  • Look for excessively long intervals or high variance.
  • Check (fix) scheduling priorities.
  • Lock Contention
  • Measure lock wait times.
  • Decrease lock granularity on highly contested items.

Self-Monitoring and Self-Adaptation

Conclusions

  • Self-monitoring is a generally useful

idea.

  • An extensible system just makes it

easier.

  • Automatic adaptation is a cool idea.
  • Challenging to do it correctly.
  • An extensible system makes it easier to

experiment with this.

slide-9
SLIDE 9

Self-Monitoring and Self-Adapting Systems

Margo I. Seltzer Harvard University Division of Engineering and Applied Sciences

E V I R TA S

Self-Monitoring and Self-Adaptation

Assumptions

  • Performance is important.
  • People do not really know how to tune

applications and systems.

  • It would be nice to get some help from

the system in tuning.

  • Self-monitoring systems gather

information about their own performance.

Self-Monitoring and Self-Adaptation

Outline

  • Self-Monitoring in VINO.
  • Processing monitor data.
  • Adapting to system behavior.
  • Conclusions.

Self-Monitoring and Self-Adaptation

Self-Monitoring in VINO

  • Measurement thread periodically collects

module statistics.

  • Generate detailed profiling information.
  • Capture module inputs (traces) and
  • utputs (logs).
  • In-situ simulation evaluates competing

algorithms and policies.

slide-10
SLIDE 10

Self-Monitoring and Self-Adaptation

Measurement Thread

VINO kernel file system txn system lock system

  • ther

systems

  • ther

systems

  • ther

systems

  • ther

systems measurement thread (graft) get_stats

  • utput data

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

  • utputs

Buffer Cache graft points

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

  • utputs

Buffer Cache record parameters pass- through

Self-Monitoring and Self-Adaptation

In-Situ Simulation

incoming requests

  • utputs

Buffer Cache Buffer Cache Simulator simulation results real pass parameters to simulator and

slide-11
SLIDE 11

Self-Monitoring and Self-Adaptation

What do we do with Data?

  • Off-line Analysis
  • Monitors long-term behavior.
  • Identifies common usage profiles.
  • Detects uncommon usage.
  • Suggests thresholds to online system.
  • Conducts feasibility evaluations.
  • Online Analysis
  • Monitor instantaneous resource utilization.
  • Maintain efficiency statistics.
  • Detect dangerous conditions.

Self-Monitoring and Self-Adaptation

Off-line Analysis

  • Use data from measurement thread to

construct time series usage profile.

  • Conduct variance analysis.
  • Construct predicted usage profiles.
  • Determine resource thresholds from

predicted profiles.

  • Notify online system of thresholds.
  • Evaluate traces and logs; derive new

algorithms.

  • Simulate new algorithms, in situ.

Self-Monitoring and Self-Adaptation

Online Analysis

  • Receive threshold and variance

information from off-line system.

  • Maintain dynamic statistics about:
  • Cache hit rates.
  • Lock contention.
  • Disk queue lengths.
  • Load averages.
  • Context switch rates.
  • Detect abnormal behavior.
  • Dynamically trigger trace generation.
  • Trigger adaptation heuristics.

Self-Monitoring and Self-Adaptation

Adaptation Heuristics

  • Goal: decrease application latency.
  • Paging
  • Collect page access trace.
  • Look for well-known patterns (linear, cyclic, strided).
  • Look for page access correlation.
  • Install better prefetching algorithm.
  • Disk Wait
  • Similar process to paging.
  • Replace read-ahead for the application(s).
  • CPU Hogs
  • Examine profile output.
  • Recompile kernel modules in application context.
slide-12
SLIDE 12

Self-Monitoring and Self-Adaptation

Adaptation (continued)

  • Interrupt Latency
  • Measure latency between interrupt arrival and delivery

to process/thread.

  • Look for excessively long intervals or high variance.
  • Check (fix) scheduling priorities.
  • Lock Contention
  • Measure lock wait times.
  • Decrease lock granularity on highly contested items.

Self-Monitoring and Self-Adaptation

Conclusions

  • Self-monitoring is a generally useful

idea.

  • An extensible system just makes it

easier.

  • Automatic adaptation is a cool idea.
  • Challenging to do it correctly.
  • An extensible system makes it easier to

experiment with this.

slide-13
SLIDE 13

Self-Monitoring and Self-Adapting Systems

Margo I. Seltzer Harvard University Division of Engineering and Applied Sciences

E V I R TA S

Self-Monitoring and Self-Adaptation

Assumptions

  • Performance is important.
  • People do not really know how to tune

applications and systems.

  • It would be nice to get some help from

the system in tuning.

  • Self-monitoring systems gather

information about their own performance.

Self-Monitoring and Self-Adaptation

Outline

  • Self-Monitoring in VINO.
  • Processing monitor data.
  • Adapting to system behavior.
  • Conclusions.

Self-Monitoring and Self-Adaptation

Self-Monitoring in VINO

  • Measurement thread periodically collects

module statistics.

  • Generate detailed profiling information.
  • Capture module inputs (traces) and
  • utputs (logs).
  • In-situ simulation evaluates competing

algorithms and policies.

slide-14
SLIDE 14

Self-Monitoring and Self-Adaptation

Measurement Thread

VINO kernel file system txn system lock system

  • ther

systems

  • ther

systems

  • ther

systems

  • ther

systems measurement thread (graft) get_stats

  • utput data

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

  • utputs

Buffer Cache graft points

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

  • utputs

Buffer Cache record parameters pass- through

Self-Monitoring and Self-Adaptation

In-Situ Simulation

incoming requests

  • utputs

Buffer Cache Buffer Cache Simulator simulation results real pass parameters to simulator and

slide-15
SLIDE 15

Self-Monitoring and Self-Adaptation

What do we do with Data?

  • Off-line Analysis
  • Monitors long-term behavior.
  • Identifies common usage profiles.
  • Detects uncommon usage.
  • Suggests thresholds to online system.
  • Conducts feasibility evaluations.
  • Online Analysis
  • Monitor instantaneous resource utilization.
  • Maintain efficiency statistics.
  • Detect dangerous conditions.

Self-Monitoring and Self-Adaptation

Off-line Analysis

  • Use data from measurement thread to

construct time series usage profile.

  • Conduct variance analysis.
  • Construct predicted usage profiles.
  • Determine resource thresholds from

predicted profiles.

  • Notify online system of thresholds.
  • Evaluate traces and logs; derive new

algorithms.

  • Simulate new algorithms, in situ.

Self-Monitoring and Self-Adaptation

Online Analysis

  • Receive threshold and variance

information from off-line system.

  • Maintain dynamic statistics about:
  • Cache hit rates.
  • Lock contention.
  • Disk queue lengths.
  • Load averages.
  • Context switch rates.
  • Detect abnormal behavior.
  • Dynamically trigger trace generation.
  • Trigger adaptation heuristics.

Self-Monitoring and Self-Adaptation

Adaptation Heuristics

  • Goal: decrease application latency.
  • Paging
  • Collect page access trace.
  • Look for well-known patterns (linear, cyclic, strided).
  • Look for page access correlation.
  • Install better prefetching algorithm.
  • Disk Wait
  • Similar process to paging.
  • Replace read-ahead for the application(s).
  • CPU Hogs
  • Examine profile output.
  • Recompile kernel modules in application context.
slide-16
SLIDE 16

Self-Monitoring and Self-Adaptation

Adaptation (continued)

  • Interrupt Latency
  • Measure latency between interrupt arrival and delivery

to process/thread.

  • Look for excessively long intervals or high variance.
  • Check (fix) scheduling priorities.
  • Lock Contention
  • Measure lock wait times.
  • Decrease lock granularity on highly contested items.

Self-Monitoring and Self-Adaptation

Conclusions

  • Self-monitoring is a generally useful

idea.

  • An extensible system just makes it

easier.

  • Automatic adaptation is a cool idea.
  • Challenging to do it correctly.
  • An extensible system makes it easier to

experiment with this.

slide-17
SLIDE 17

Self-Monitoring and Self-Adapting Systems

Margo I. Seltzer Harvard University Division of Engineering and Applied Sciences

E V I R TA S

Self-Monitoring and Self-Adaptation

Assumptions

  • Performance is important.
  • People do not really know how to tune

applications and systems.

  • It would be nice to get some help from

the system in tuning.

  • Self-monitoring systems gather

information about their own performance.

Self-Monitoring and Self-Adaptation

Outline

  • Self-Monitoring in VINO.
  • Processing monitor data.
  • Adapting to system behavior.
  • Conclusions.

Self-Monitoring and Self-Adaptation

Self-Monitoring in VINO

  • Measurement thread periodically collects

module statistics.

  • Generate detailed profiling information.
  • Capture module inputs (traces) and
  • utputs (logs).
  • In-situ simulation evaluates competing

algorithms and policies.

slide-18
SLIDE 18

Self-Monitoring and Self-Adaptation

Measurement Thread

VINO kernel file system txn system lock system

  • ther

systems

  • ther

systems

  • ther

systems

  • ther

systems measurement thread (graft) get_stats

  • utput data

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

  • utputs

Buffer Cache graft points

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

  • utputs

Buffer Cache record parameters pass- through

Self-Monitoring and Self-Adaptation

In-Situ Simulation

incoming requests

  • utputs

Buffer Cache Buffer Cache Simulator simulation results real pass parameters to simulator and

slide-19
SLIDE 19

Self-Monitoring and Self-Adaptation

What do we do with Data?

  • Off-line Analysis
  • Monitors long-term behavior.
  • Identifies common usage profiles.
  • Detects uncommon usage.
  • Suggests thresholds to online system.
  • Conducts feasibility evaluations.
  • Online Analysis
  • Monitor instantaneous resource utilization.
  • Maintain efficiency statistics.
  • Detect dangerous conditions.

Self-Monitoring and Self-Adaptation

Off-line Analysis

  • Use data from measurement thread to

construct time series usage profile.

  • Conduct variance analysis.
  • Construct predicted usage profiles.
  • Determine resource thresholds from

predicted profiles.

  • Notify online system of thresholds.
  • Evaluate traces and logs; derive new

algorithms.

  • Simulate new algorithms, in situ.

Self-Monitoring and Self-Adaptation

Online Analysis

  • Receive threshold and variance

information from off-line system.

  • Maintain dynamic statistics about:
  • Cache hit rates.
  • Lock contention.
  • Disk queue lengths.
  • Load averages.
  • Context switch rates.
  • Detect abnormal behavior.
  • Dynamically trigger trace generation.
  • Trigger adaptation heuristics.

Self-Monitoring and Self-Adaptation

Adaptation Heuristics

  • Goal: decrease application latency.
  • Paging
  • Collect page access trace.
  • Look for well-known patterns (linear, cyclic, strided).
  • Look for page access correlation.
  • Install better prefetching algorithm.
  • Disk Wait
  • Similar process to paging.
  • Replace read-ahead for the application(s).
  • CPU Hogs
  • Examine profile output.
  • Recompile kernel modules in application context.
slide-20
SLIDE 20

Self-Monitoring and Self-Adaptation

Adaptation (continued)

  • Interrupt Latency
  • Measure latency between interrupt arrival and delivery

to process/thread.

  • Look for excessively long intervals or high variance.
  • Check (fix) scheduling priorities.
  • Lock Contention
  • Measure lock wait times.
  • Decrease lock granularity on highly contested items.

Self-Monitoring and Self-Adaptation

Conclusions

  • Self-monitoring is a generally useful

idea.

  • An extensible system just makes it

easier.

  • Automatic adaptation is a cool idea.
  • Challenging to do it correctly.
  • An extensible system makes it easier to

experiment with this.

slide-21
SLIDE 21

Self-Monitoring and Self-Adapting Systems

Margo I. Seltzer Harvard University Division of Engineering and Applied Sciences

E V I R TA S

Self-Monitoring and Self-Adaptation

Assumptions

  • Performance is important.
  • People do not really know how to tune

applications and systems.

  • It would be nice to get some help from

the system in tuning.

  • Self-monitoring systems gather

information about their own performance.

Self-Monitoring and Self-Adaptation

Outline

  • Self-Monitoring in VINO.
  • Processing monitor data.
  • Adapting to system behavior.
  • Conclusions.

Self-Monitoring and Self-Adaptation

Self-Monitoring in VINO

  • Measurement thread periodically collects

module statistics.

  • Generate detailed profiling information.
  • Capture module inputs (traces) and
  • utputs (logs).
  • In-situ simulation evaluates competing

algorithms and policies.

slide-22
SLIDE 22

Self-Monitoring and Self-Adaptation

Measurement Thread

VINO kernel file system txn system lock system

  • ther

systems

  • ther

systems

  • ther

systems

  • ther

systems measurement thread (graft) get_stats

  • utput data

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

  • utputs

Buffer Cache graft points

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

  • utputs

Buffer Cache record parameters pass- through

Self-Monitoring and Self-Adaptation

In-Situ Simulation

incoming requests

  • utputs

Buffer Cache Buffer Cache Simulator simulation results real pass parameters to simulator and

slide-23
SLIDE 23

Self-Monitoring and Self-Adaptation

What do we do with Data?

  • Off-line Analysis
  • Monitors long-term behavior.
  • Identifies common usage profiles.
  • Detects uncommon usage.
  • Suggests thresholds to online system.
  • Conducts feasibility evaluations.
  • Online Analysis
  • Monitor instantaneous resource utilization.
  • Maintain efficiency statistics.
  • Detect dangerous conditions.

Self-Monitoring and Self-Adaptation

Off-line Analysis

  • Use data from measurement thread to

construct time series usage profile.

  • Conduct variance analysis.
  • Construct predicted usage profiles.
  • Determine resource thresholds from

predicted profiles.

  • Notify online system of thresholds.
  • Evaluate traces and logs; derive new

algorithms.

  • Simulate new algorithms, in situ.

Self-Monitoring and Self-Adaptation

Online Analysis

  • Receive threshold and variance

information from off-line system.

  • Maintain dynamic statistics about:
  • Cache hit rates.
  • Lock contention.
  • Disk queue lengths.
  • Load averages.
  • Context switch rates.
  • Detect abnormal behavior.
  • Dynamically trigger trace generation.
  • Trigger adaptation heuristics.

Self-Monitoring and Self-Adaptation

Adaptation Heuristics

  • Goal: decrease application latency.
  • Paging
  • Collect page access trace.
  • Look for well-known patterns (linear, cyclic, strided).
  • Look for page access correlation.
  • Install better prefetching algorithm.
  • Disk Wait
  • Similar process to paging.
  • Replace read-ahead for the application(s).
  • CPU Hogs
  • Examine profile output.
  • Recompile kernel modules in application context.
slide-24
SLIDE 24

Self-Monitoring and Self-Adaptation

Adaptation (continued)

  • Interrupt Latency
  • Measure latency between interrupt arrival and delivery

to process/thread.

  • Look for excessively long intervals or high variance.
  • Check (fix) scheduling priorities.
  • Lock Contention
  • Measure lock wait times.
  • Decrease lock granularity on highly contested items.

Self-Monitoring and Self-Adaptation

Conclusions

  • Self-monitoring is a generally useful

idea.

  • An extensible system just makes it

easier.

  • Automatic adaptation is a cool idea.
  • Challenging to do it correctly.
  • An extensible system makes it easier to

experiment with this.

slide-25
SLIDE 25

Self-Monitoring and Self-Adapting Systems

Margo I. Seltzer Harvard University Division of Engineering and Applied Sciences

E V I R TA S

Self-Monitoring and Self-Adaptation

Assumptions

  • Performance is important.
  • People do not really know how to tune

applications and systems.

  • It would be nice to get some help from

the system in tuning.

  • Self-monitoring systems gather

information about their own performance.

Self-Monitoring and Self-Adaptation

Outline

  • Self-Monitoring in VINO.
  • Processing monitor data.
  • Adapting to system behavior.
  • Conclusions.

Self-Monitoring and Self-Adaptation

Self-Monitoring in VINO

  • Measurement thread periodically collects

module statistics.

  • Generate detailed profiling information.
  • Capture module inputs (traces) and
  • utputs (logs).
  • In-situ simulation evaluates competing

algorithms and policies.

slide-26
SLIDE 26

Self-Monitoring and Self-Adaptation

Measurement Thread

VINO kernel file system txn system lock system

  • ther

systems

  • ther

systems

  • ther

systems

  • ther

systems measurement thread (graft) get_stats

  • utput data

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

  • utputs

Buffer Cache graft points

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

  • utputs

Buffer Cache record parameters pass- through

Self-Monitoring and Self-Adaptation

In-Situ Simulation

incoming requests

  • utputs

Buffer Cache Buffer Cache Simulator simulation results real pass parameters to simulator and

slide-27
SLIDE 27

Self-Monitoring and Self-Adaptation

What do we do with Data?

  • Off-line Analysis
  • Monitors long-term behavior.
  • Identifies common usage profiles.
  • Detects uncommon usage.
  • Suggests thresholds to online system.
  • Conducts feasibility evaluations.
  • Online Analysis
  • Monitor instantaneous resource utilization.
  • Maintain efficiency statistics.
  • Detect dangerous conditions.

Self-Monitoring and Self-Adaptation

Off-line Analysis

  • Use data from measurement thread to

construct time series usage profile.

  • Conduct variance analysis.
  • Construct predicted usage profiles.
  • Determine resource thresholds from

predicted profiles.

  • Notify online system of thresholds.
  • Evaluate traces and logs; derive new

algorithms.

  • Simulate new algorithms, in situ.

Self-Monitoring and Self-Adaptation

Online Analysis

  • Receive threshold and variance

information from off-line system.

  • Maintain dynamic statistics about:
  • Cache hit rates.
  • Lock contention.
  • Disk queue lengths.
  • Load averages.
  • Context switch rates.
  • Detect abnormal behavior.
  • Dynamically trigger trace generation.
  • Trigger adaptation heuristics.

Self-Monitoring and Self-Adaptation

Adaptation Heuristics

  • Goal: decrease application latency.
  • Paging
  • Collect page access trace.
  • Look for well-known patterns (linear, cyclic, strided).
  • Look for page access correlation.
  • Install better prefetching algorithm.
  • Disk Wait
  • Similar process to paging.
  • Replace read-ahead for the application(s).
  • CPU Hogs
  • Examine profile output.
  • Recompile kernel modules in application context.
slide-28
SLIDE 28

Self-Monitoring and Self-Adaptation

Adaptation (continued)

  • Interrupt Latency
  • Measure latency between interrupt arrival and delivery

to process/thread.

  • Look for excessively long intervals or high variance.
  • Check (fix) scheduling priorities.
  • Lock Contention
  • Measure lock wait times.
  • Decrease lock granularity on highly contested items.

Self-Monitoring and Self-Adaptation

Conclusions

  • Self-monitoring is a generally useful

idea.

  • An extensible system just makes it

easier.

  • Automatic adaptation is a cool idea.
  • Challenging to do it correctly.
  • An extensible system makes it easier to

experiment with this.

slide-29
SLIDE 29

Self-Monitoring and Self-Adapting Systems

Margo I. Seltzer Harvard University Division of Engineering and Applied Sciences

E V I R TA S

Self-Monitoring and Self-Adaptation

Assumptions

  • Performance is important.
  • People do not really know how to tune

applications and systems.

  • It would be nice to get some help from

the system in tuning.

  • Self-monitoring systems gather

information about their own performance.

Self-Monitoring and Self-Adaptation

Outline

  • Self-Monitoring in VINO.
  • Processing monitor data.
  • Adapting to system behavior.
  • Conclusions.

Self-Monitoring and Self-Adaptation

Self-Monitoring in VINO

  • Measurement thread periodically collects

module statistics.

  • Generate detailed profiling information.
  • Capture module inputs (traces) and
  • utputs (logs).
  • In-situ simulation evaluates competing

algorithms and policies.

slide-30
SLIDE 30

Self-Monitoring and Self-Adaptation

Measurement Thread

VINO kernel file system txn system lock system

  • ther

systems

  • ther

systems

  • ther

systems

  • ther

systems measurement thread (graft) get_stats

  • utput data

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

  • utputs

Buffer Cache graft points

Self-Monitoring and Self-Adaptation

Generating Traces and Logs

incoming requests

  • utputs

Buffer Cache record parameters pass- through

Self-Monitoring and Self-Adaptation

In-Situ Simulation

incoming requests

  • utputs

Buffer Cache Buffer Cache Simulator simulation results real pass parameters to simulator and

slide-31
SLIDE 31

Self-Monitoring and Self-Adaptation

What do we do with Data?

  • Off-line Analysis
  • Monitors long-term behavior.
  • Identifies common usage profiles.
  • Detects uncommon usage.
  • Suggests thresholds to online system.
  • Conducts feasibility evaluations.
  • Online Analysis
  • Monitor instantaneous resource utilization.
  • Maintain efficiency statistics.
  • Detect dangerous conditions.

Self-Monitoring and Self-Adaptation

Off-line Analysis

  • Use data from measurement thread to

construct time series usage profile.

  • Conduct variance analysis.
  • Construct predicted usage profiles.
  • Determine resource thresholds from

predicted profiles.

  • Notify online system of thresholds.
  • Evaluate traces and logs; derive new

algorithms.

  • Simulate new algorithms, in situ.

Self-Monitoring and Self-Adaptation

Online Analysis

  • Receive threshold and variance

information from off-line system.

  • Maintain dynamic statistics about:
  • Cache hit rates.
  • Lock contention.
  • Disk queue lengths.
  • Load averages.
  • Context switch rates.
  • Detect abnormal behavior.
  • Dynamically trigger trace generation.
  • Trigger adaptation heuristics.

Self-Monitoring and Self-Adaptation

Adaptation Heuristics

  • Goal: decrease application latency.
  • Paging
  • Collect page access trace.
  • Look for well-known patterns (linear, cyclic, strided).
  • Look for page access correlation.
  • Install better prefetching algorithm.
  • Disk Wait
  • Similar process to paging.
  • Replace read-ahead for the application(s).
  • CPU Hogs
  • Examine profile output.
  • Recompile kernel modules in application context.
slide-32
SLIDE 32

Self-Monitoring and Self-Adaptation

Adaptation (continued)

  • Interrupt Latency
  • Measure latency between interrupt arrival and delivery

to process/thread.

  • Look for excessively long intervals or high variance.
  • Check (fix) scheduling priorities.
  • Lock Contention
  • Measure lock wait times.
  • Decrease lock granularity on highly contested items.

Self-Monitoring and Self-Adaptation

Conclusions

  • Self-monitoring is a generally useful idea.
  • An extensible system just makes it easier.
  • Automatic adaptation is a cool idea.
  • Challenging to do it correctly.
  • An extensible system makes it easier to experiment with this.

slide-97
SLIDE 97

Self-Monitoring and Self-Adapting Systems

Margo I. Seltzer Harvard University Division of Engineering and Applied Sciences

E V I R TA S

Self-Monitoring and Self-Adaptation

Assumptions

  • Performance is important.
  • People do not really know how to tune

applications and systems.

  • It would be nice to get some help from

the system in tuning.

  • Self-monitoring systems gather

information about their own performance.

Outline

  • Self-Monitoring in VINO.
  • Processing monitor data.
  • Adapting to system behavior.
  • Conclusions.

Self-Monitoring in VINO

  • Measurement thread periodically collects module statistics.
  • Generate detailed profiling information.
  • Capture module inputs (traces) and outputs (logs).
  • In-situ simulation evaluates competing algorithms and policies.

Measurement Thread

[Figure: the VINO kernel with its file system, txn system, lock system, and other systems; a measurement thread (graft) issues get_stats and produces output data.]
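
A minimal Python sketch of a measurement thread follows. It is not VINO code: `Module`, `MeasurementThread`, and `collect_once` are hypothetical names, and only `get_stats` is taken from the figure. The sketch polls each module at a fixed interval and appends a snapshot to an output list.

```python
import threading


class Module:
    """Stand-in for a kernel subsystem that exposes counters."""

    def __init__(self, name):
        self.name = name
        self.counters = {}

    def count(self, event):
        self.counters[event] = self.counters.get(event, 0) + 1

    def get_stats(self):
        return dict(self.counters)  # snapshot, so later updates don't alter it


class MeasurementThread(threading.Thread):
    """Periodically collect get_stats() from every module."""

    def __init__(self, modules, interval_s):
        super().__init__(daemon=True)
        self.modules = modules
        self.interval_s = interval_s
        self.output = []  # one {module name: stats snapshot} per collection
        self._stop_event = threading.Event()

    def collect_once(self):
        self.output.append({m.name: m.get_stats() for m in self.modules})

    def run(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop_event.wait(self.interval_s):
            self.collect_once()

    def stop(self):
        self._stop_event.set()
```

Calling `start()` begins periodic collection and `stop()` ends it; `collect_once()` can also be called directly for a single sample.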

Generating Traces and Logs

[Figure: Buffer Cache with incoming requests, outputs, and graft points.]

Generating Traces and Logs

[Figure: Buffer Cache with incoming requests and outputs; the grafts record parameters and pass them through.]
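
The record-and-pass-through idea can be sketched in Python by wrapping a module's entry point. Everything here is an assumption made for illustration: the toy `BufferCache`, its `read` method, and the helper `record_and_pass_through` are hypothetical, and a Python wrapper only approximates what a graft does inside a kernel.

```python
import functools


def record_and_pass_through(func, trace, log):
    """Wrap func so each call's inputs go to trace and its result to log."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        trace.append((args, kwargs))    # record parameters (the trace)
        result = func(*args, **kwargs)  # pass through to the real module
        log.append(result)              # record the output (the log)
        return result

    return wrapper


class BufferCache:
    """Toy cache: block number -> data, filled on first read."""

    def __init__(self):
        self.blocks = {}

    def read(self, block):
        hit = block in self.blocks
        self.blocks.setdefault(block, f"data-{block}")
        return ("hit" if hit else "miss", self.blocks[block])
```

Installing the wrapper with `cache.read = record_and_pass_through(cache.read, trace, log)` leaves callers unchanged while filling both lists.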

In-Situ Simulation

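[Figure: incoming requests feed the real Buffer Cache, which produces the outputs; parameters are also passed to a Buffer Cache Simulator, which produces simulation results. The label "pass parameters to simulator and" is cut off in the source.]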
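
In-situ simulation can be sketched in Python as a real cache and a simulated cache that receive the same request stream, so that two replacement policies are compared on live input. The `Cache` class, the FIFO and LRU policies, and `run_with_simulator` are assumptions chosen for illustration; the slides do not say which policies VINO compared.

```python
from collections import OrderedDict


class Cache:
    """Fixed-size block cache that counts hits; policy is "fifo" or "lru"."""

    def __init__(self, capacity, policy):
        self.capacity = capacity
        self.policy = policy
        self.blocks = OrderedDict()  # oldest entry first
        self.hits = 0
        self.requests = 0

    def access(self, block):
        self.requests += 1
        if block in self.blocks:
            self.hits += 1
            if self.policy == "lru":
                self.blocks.move_to_end(block)  # refresh recency on a hit
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the oldest entry
        self.blocks[block] = True
        return False


def run_with_simulator(requests, real, simulator):
    """Serve each request from the real cache and mirror it to the simulator."""
    for block in requests:
        real.access(block)       # the real cache produces the actual output
        simulator.access(block)  # the simulator only updates its statistics
    return real.hits, simulator.hits
```

If the simulated policy gets consistently more hits on the live workload, that is evidence for installing it in place of the current one.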

What do we do with Data?

  • Off-line Analysis
      • Monitors long-term behavior.
      • Identifies common usage profiles.
      • Detects uncommon usage.
      • Suggests thresholds to online system.
      • Conducts feasibility evaluations.
  • Online Analysis
      • Monitor instantaneous resource utilization.
      • Maintain efficiency statistics.
      • Detect dangerous conditions.

Off-line Analysis

  • Use data from measurement thread to construct time series usage profile.
  • Conduct variance analysis.
  • Construct predicted usage profiles.
  • Determine resource thresholds from predicted profiles.
  • Notify online system of thresholds.
  • Evaluate traces and logs; derive new algorithms.
  • Simulate new algorithms, in situ.
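
The first four off-line steps can be sketched in Python under simplifying assumptions. The function names are hypothetical, the historical mean and variance stand in for a predicted profile, and the threshold rule (mean plus k standard deviations) is one plausible choice rather than the method used in VINO.

```python
import statistics


def build_profile(samples):
    """Summarize a time series of one resource as (mean, variance)."""
    return statistics.mean(samples), statistics.pvariance(samples)


def threshold_from_profile(mean, variance, k=3.0):
    """Upper threshold: the mean plus k standard deviations."""
    return mean + k * variance ** 0.5
```

For example, disk queue lengths of 2, 4, 4, 4, 5, 5, 7, 9 have mean 5 and variance 4, so with k = 3 the threshold passed to the online system would be 11.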
