A Receding Horizon Approach for the Runtime Management of IaaS Cloud Systems



slide-1
SLIDE 1

A Receding Horizon Approach for the Runtime Management of IaaS Cloud Systems

Danilo Ardagna, Michele Ciavotta ({danilo.ardagna, michele.ciavotta}@polimi.it), Politecnico di Milano
Riccardo Lancellotti (riccardo.lancellotti@unimore.it), Università di Modena e Reggio Emilia

www.modaclouds.eu

slide-2
SLIDE 2

Agenda

∗ Introduction
∗ Problem
  ∗ Problem statement and design assumptions
  ∗ Receding Horizon algorithm
∗ Experimental Analysis
∗ Conclusions

slide-3
SLIDE 3

Introduction

The advent of Cloud Computing has dramatically changed the ICT industry:
∗ Google, Amazon, Microsoft, Salesforce, Oracle, SAP, SoftLayer, Rackspace, etc.
∗ Cost-effective solutions
∗ Computational power
∗ Reliability
∗ Auto-scaling

New business paradigms have appeared on the market:
∗ IaaS, PaaS, SaaS
∗ But also DaaS, BDaaS, HDaaS, etc.

slide-4
SLIDE 4

Introduction: challenges

The growing popularity of Cloud Computing opens new challenges:
∗ Vendor lock-in
∗ Design for Quality of Service (QoS) guarantees
∗ Managing the lifecycle of a Cloud application
∗ Managing Elasticity
∗ Resource Provisioning
∗ Self-adaptation

slide-5
SLIDE 5

Introduction: resource provisioning

Resource Provisioning: a mechanism for leasing and releasing virtual cloud resources to guarantee adequate QoS.

It requires management solutions that support:
∗ Performance prediction
∗ Monitoring of Service Level Agreements (SLAs)
∗ Adaptive reconfiguration actions

Tools currently supplied by IaaS providers are often too basic and inadequate for:
∗ Highly variable workloads
∗ Applications with a dynamic behavior characterized by uncertainty

slide-6
SLIDE 6

Introduction: our approach

Proposal: a fast and effective Capacity Allocation technique
∗ based on the Receding Horizon control strategy
∗ integrated within the MODAClouds runtime platform
∗ that minimizes the execution costs of a Cloud application
∗ while guaranteeing QoS constraints expressed in terms of average response time

slide-7
SLIDE 7

Agenda

∗ Introduction
∗ Problem
  ∗ Problem statement and design assumptions
  ∗ Receding Horizon algorithm
∗ Experimental Analysis
∗ Conclusions

slide-8
SLIDE 8

Problem: design assumptions

Perspective of a Software-as-a-Service (SaaS) provider hosting his/her applications on an Infrastructure-as-a-Service (IaaS) provider.

∗ Applications are single-tier and hosted in virtual machines (VMs) that are dynamically instantiated by the IaaS provider
∗ Each VM hosts a single WS application
∗ Multiple homogeneous VMs implementing the same WS application can run in parallel

slide-9
SLIDE 9

Problem: design assumptions

∗ Each WS class hosted in a VM is modeled as an M/G/1 queue in tandem with a delay center
∗ SLA based on the average response time: every WS class has to provide an average response time lower than a threshold
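A minimal Python sketch of how an M/G/1 queue in tandem with a delay center could be evaluated against a response-time threshold. It assumes the standard Pollaczek-Khinchine formula for the mean M/G/1 waiting time and an even split of the class workload across the VMs. The slide does not show the authors' exact expression, so the function names, parameters, and numbers are illustrative assumptions.

```python
def mg1_response_time(arrival_rate, mean_service, scv_service, delay):
    """Mean response time of an M/G/1 queue followed by a pure delay center.

    scv_service is the squared coefficient of variation of the service time
    (1.0 corresponds to exponential service).
    """
    rho = arrival_rate * mean_service  # utilization of the single server
    if rho >= 1.0:
        raise ValueError("unstable queue: utilization must be below 1")
    # Pollaczek-Khinchine: W = lambda * E[S^2] / (2 * (1 - rho))
    second_moment = (1.0 + scv_service) * mean_service ** 2
    waiting = arrival_rate * second_moment / (2.0 * (1.0 - rho))
    return waiting + mean_service + delay


def meets_sla(total_rate, n_vms, mean_service, scv_service, delay, threshold):
    """True if the average response time stays below the SLA threshold.

    Assumption: the class workload is split evenly across n_vms identical VMs.
    """
    per_vm_rate = total_rate / n_vms
    try:
        response = mg1_response_time(per_vm_rate, mean_service, scv_service, delay)
    except ValueError:
        return False  # a saturated VM cannot meet any finite threshold
    return response < threshold


# Hypothetical numbers: 5 req/s on one VM, 0.1 s mean service, 0.05 s delay.
example = mg1_response_time(5.0, 0.1, 1.0, 0.05)
```

Adding VMs lowers the per-VM arrival rate and therefore the waiting time, which is the lever a capacity allocation policy acts on.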

slide-10
SLIDE 10

Problem: design assumptions

IaaS providers charge software providers on an hourly basis:
∗ reserved VMs (time-unit cost)
∗ on-demand VMs (time-unit cost)

Time management:
∗ Time slots: 5 or 10 min
∗ Time window: 1 to 5 time slots
∗ Charging interval: 60 min
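Because the charging interval (60 min) is longer than a time slot (5 or 10 min), a VM started in one slot is already paid for in the slots that follow. The Python sketch below illustrates that bookkeeping under stated assumptions: a VM started in slot i is paid through slot i + slots_per_interval - 1, and reserved VMs are used before on-demand ones. The rates, the history, and the function names are hypothetical and are not taken from the presentation.

```python
def freely_available(launches, slots_per_interval):
    """VMs already paid for that can still be used in the next time slot.

    launches[i] is the number of VMs started in past slot i (most recent last).
    Assumption: a VM started in slot i is paid for slots
    i .. i + slots_per_interval - 1.
    """
    lookback = slots_per_interval - 1
    if lookback <= 0:
        return 0
    return sum(launches[-lookback:])


def launch_cost(new_vms, reserved_left, reserved_rate, on_demand_rate):
    """Cost of starting new_vms, using reserved VMs first (assumed policy)."""
    reserved = min(new_vms, reserved_left)
    on_demand = new_vms - reserved
    return reserved * reserved_rate + on_demand * on_demand_rate


# 10-minute slots and a 60-minute charging interval give 6 slots per interval.
history = [3, 0, 0, 0, 0, 0, 2]   # hypothetical launch history
free = freely_available(history, slots_per_interval=6)
needed = 4                         # hypothetical VM requirement for the next slot
cost = launch_cost(max(0, needed - free), reserved_left=1,
                   reserved_rate=0.10, on_demand_rate=0.25)
```

In this example the three VMs started seven slots ago have expired, while the two started in the last slot are still free to use, so only two new VMs are charged.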

slide-11
SLIDE 11

Problem: formulation

∗ Time unit costs
∗ Time management
∗ Freely available VMs
∗ Workload prediction
∗ CA plan

slide-12
SLIDE 12

Problem: formulation

The CA problem can be formulated as the minimization of the total cost, subject to the conditions:
∗ Response time
∗ Limited number of reserved VMs
slide-13
SLIDE 13

Receding Horizon Algorithm

[Figure: Receding Horizon controller. Blocks: Clock, Optimizer (Optimization Model, Solve, Optimal solution), Monitoring Platform (Update Model Parameters), IaaS Interface, Cloud Application. Signals: predicted workload $(\hat{\Lambda}_k^1, \ldots, \hat{\Lambda}_k^{n_w})$; first slot configuration $(r_k^1, d_k^1)$.]

In a nutshell, the Capacity Allocation problem is solved for every time slot in the time window, but only the actions concerning the first forthcoming time slot are enacted.
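The control loop can be sketched in Python as follows. This is only an illustration of the receding horizon pattern: `solve_window` is a placeholder that sizes each slot independently from a hypothetical per-VM capacity, and it stands in for the optimization model, which is not reproduced here. The `predict` callback and all numbers are assumptions as well.

```python
import math


def solve_window(predicted_rates, vm_capacity):
    """Placeholder for the optimization model: one VM count per slot.

    Assumption: each VM can serve vm_capacity requests per second.
    """
    return [max(1, math.ceil(rate / vm_capacity)) for rate in predicted_rates]


def receding_horizon(predict, n_slots, window, vm_capacity):
    """Plan over `window` slots at every step, but enact only the first slot."""
    enacted = []
    for now in range(n_slots):
        # Predictions for the forthcoming slot and the rest of the window.
        predicted = [predict(now, now + j) for j in range(window)]
        plan = solve_window(predicted, vm_capacity)
        enacted.append(plan[0])  # later slots are re-planned at the next step
    return enacted


# Hypothetical trace and a predictor that happens to be exact.
trace = [100, 250, 90]


def perfect_predictor(now, target):
    return trace[min(target, len(trace) - 1)]


vms_per_slot = receding_horizon(perfect_predictor, n_slots=3, window=2,
                                vm_capacity=100)
```

With this placeholder solver the slots are independent, so the window length does not change the result. In a model with hourly charging the slots are coupled through the VMs that are already paid for, which is where planning over a longer window can pay off.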

slide-14
SLIDE 14

Receding Horizon Algorithm

slide-15
SLIDE 15

Agenda

∗ Introduction
∗ Problem
  ∗ Problem statement and design assumptions
  ∗ Receding Horizon algorithm
∗ Experimental Analysis
∗ Conclusions

slide-16
SLIDE 16

Experimental Analysis

Scalability:
∗ Large set of randomly generated instances
∗ Daily distribution of requests from real log traces

Comparison with state-of-the-art approaches:
∗ Heuristic
∗ Oracle with perfect knowledge of the future

Time scale analysis:
∗ SLA violations

slide-17
SLIDE 17

Experiment Design

Workload prediction
∗ Incoming workload has been obtained from traces of a very large dynamic web-based system
∗ Different workload for each WS class
∗ Prediction obtained by adding white noise to each sample
∗ Noise proportional to the arrival rate
∗ Inaccuracy increases for time slots further in the future

Performance parameters
∗ Service rate
∗ Queueing delay
∗ Reserved instances

Instance cost
∗ Randomly generated considering prices currently charged by common IaaS providers
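One way to emulate such a predictor is sketched below in Python. It assumes Gaussian white noise whose standard deviation is proportional to the arrival rate and grows linearly with the look-ahead distance. The slide states only that the noise is proportional to the arrival rate and that inaccuracy increases with the time slot, so the linear growth, the `noise_level` parameter, and the clipping at zero are illustrative assumptions.

```python
import random


def noisy_prediction(actual_rates, noise_level, rng):
    """Predicted arrival rate for each look-ahead slot j = 1..len(actual_rates).

    Assumptions: Gaussian noise, standard deviation noise_level * j * rate,
    negative predictions clipped to zero.
    """
    predicted = []
    for j, rate in enumerate(actual_rates, start=1):
        sigma = noise_level * j * rate
        predicted.append(max(0.0, rng.gauss(rate, sigma)))
    return predicted


# Hypothetical usage: a seeded generator keeps experiments repeatable.
rng = random.Random(42)
window_rates = [120.0, 150.0, 180.0]
prediction = noisy_prediction(window_rates, noise_level=0.05, rng=rng)
```

Passing the random generator explicitly makes it easy to average results over several independent executions with different seeds.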

slide-18
SLIDE 18

Experiment Design

Traffic profiles:
∗ Normal workload with low noise
∗ Normal workload with high noise
∗ Spiky workload with low noise
∗ Spiky workload with high noise

The different levels of noise correspond to: [table not recoverable]

slide-19
SLIDE 19

Scalability

The analysis demonstrated that our approach scales almost linearly with respect to the number of request classes. Systems with up to 160 classes and 5 time slots can be solved in less than 200 seconds.

slide-20
SLIDE 20

Costs comparison

[Figure: Cost: Normal traffic, 10-minute time scale, low noise level. Cost [$] (axis from 180 to 530) versus time window T_w (1 to 4) for the S-t Algorithm, Heu (40,50), Heu (50,60), Heu (60,80), and Oracle.]

slide-21
SLIDE 21

Costs comparison

[Figure: Cost: Spiky traffic, 5-minute time scale, low noise level. Cost [$] (axis from 180 to 880) versus time window T_w (1 to 5) for the S-t Algorithm, Heu (40,50), Heu (50,60), Heu (60,80), and Oracle.]

slide-22
SLIDE 22

Time scale analysis

Goal: evaluate the impact of the time scale on the proposed receding horizon algorithm.

Analyses have been supported by a purpose-built discrete event simulator based on the OMNeT++ framework:
∗ able to capture the time-varying performance degradation due to resource contention via Random Environments (REs)

Performance indicators considered:
∗ SLA violations (the percentage of time slots where the average response time exceeds the SLA threshold)
∗ Dropped requests (the percentage of requests dropped as a result of the finite queue length)

[Figure: response time (ms), axis from 0 to 7000, versus time (min).]
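The two performance indicators can be computed from simulation output as in the Python sketch below. The input layout (one average response time per time slot, plus dropped and total request counts) and the sample values are assumptions made for illustration. They are not the simulator's actual output format.

```python
def sla_violation_pct(avg_response_times, threshold):
    """Percentage of time slots whose average response time exceeds the threshold."""
    if not avg_response_times:
        raise ValueError("at least one time slot is required")
    violations = sum(1 for r in avg_response_times if r > threshold)
    return 100.0 * violations / len(avg_response_times)


def dropped_pct(dropped_requests, total_requests):
    """Percentage of requests dropped because the finite queue was full."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return 100.0 * dropped_requests / total_requests


# Hypothetical per-slot averages (seconds) and request counts.
slot_averages = [0.5, 1.2, 0.8, 2.0]
violations = sla_violation_pct(slot_averages, threshold=1.0)
dropped = dropped_pct(dropped_requests=5, total_requests=200)
```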

slide-23
SLIDE 23

Time scale analysis

The values refer to a 24-hour analysis with low noise, averaged over 10 executions. A control time granularity of 5 minutes tends to provide better performance than a granularity of 10 minutes, both in terms of SLA violations and in terms of dropped requests.

slide-24
SLIDE 24

Agenda

∗ Introduction
∗ Problem
  ∗ Problem statement and design assumptions
  ∗ Receding Horizon algorithm
∗ Experimental Analysis
∗ Conclusions

slide-25
SLIDE 25

Conclusions and Future Works

∗ We proposed an optimization approach to achieve fast, scalable and effective capacity allocation based on a fine-grained time scale
∗ Our technique is able to minimize costs more efficiently than the current state of the art
∗ The QoS defined in the SLA is almost always respected (less than 2% and 7 min)

Future works:
∗ Development of an adaptive approach able to switch between different time scales according to the workload conditions
∗ Test on a real prototype environment

slide-26
SLIDE 26

Thank You!

Questions?