
slide-1
SLIDE 1

The Océano Project

Intelligent eUtilities Infrastructure

A Self-Managed Server Farm

Germán Goldszmidt (gsg@us.ibm.com) and The Océano Team

IBM Research

September 2001

Oceano IBM Confidential 1 09/24/01 11:45 AM

slide-2
SLIDE 2

Océano: Presentation Outline

  • Motivation
  • Sample Scenario
  • Architecture
  • Components
  • Status

slide-3
SLIDE 3

Multi-Customer Farms: Today

[Figure: independent islands, one per customer: Macys, SportsWeb, XYZ, WXY.]

Problems

  • Non-shared dedicated hardware for each customer
  • Over provisioning (peak loads 10:1)
  • Lack of rapid response to demand
  • TCO high
  • Administration

slide-4
SLIDE 4

Océano Farms: Future

Virtualize the hardware

Unified management

[Figure: customers SportsWeb, Macys, XYZ and WXY on one shared farm.]

Characteristics

  • Provisioning Platform
  • Shared Infrastructure
      • Isolation for each realm
      • peaks covered (autonomic)
      • rapid allocation of resources
  • Automation
      • reduce administration cost

slide-5
SLIDE 5

Océano Objectives

Efficient infrastructure for eUtilities

  • Multi-customer hosting on a virtualized collection of resources
  • Drive down people management costs via automation
  • Handle spiky workloads; provide capacity on demand

Automated, fast add/remove of [clean, secure] servers, bandwidth, storage

Create Infrastructure SLA (ISLA) contracts

  • support a dynamic resource allocation model
  • ISLA monitoring and enforcement

Scalable and highly available

Technology applies to several environments:

  • NetGen SPs, Enterprises, ...

slide-6
SLIDE 6

Flow of Requests into Server Farm

[Figure: server farm diagram. Labels: Requests, Internet, Router, SND, WES Network Dispatcher, WS, customer domains Dolphins, Whales and Macys, Free servers, Oceano Resource Control, Admin.]

slide-7
SLIDE 7

ISLA Monitoring

[Figure: server farm diagram with monitoring added. Labels: Requests, Internet, Router, SND, WS, M, Macys servers, Free servers, Performance Metrics, SLA Monitor (Yemanja), Escalate Event, Neptune, Oceano Resource Control, Admin.]

slide-8
SLIDE 8

Océano ISLA-based resource reallocation

[Figure: reallocation flow for "Macy's +2". Labels: Resource Allocation (Neptune): Analysis, Decision; Server Manager (e-clams): Selection; Priming (Khnum): Preparation; "Prime Di, Dj for Macy's". The free pool goes from nine Free servers to seven Free plus two Macys.]

slide-9
SLIDE 9

After the addition of 2 servers

[Figure: server farm diagram after reallocation. Labels: Requests, Internet, Router, SND, Network Dispatcher + HarborMaster, WS, customer domains Dolphins, Whales and Macys (with two added Macys servers), Free servers, Oceano Resource Control, Admin.]

slide-10
SLIDE 10

Océano Components

Layers: Policy, Resource Control, Infrastructure

Components (function and code name):

  • ISLA Contracts (Salmon)
  • ISLA Monitor (Yemanja)
  • Event Correlation (Yemanja)
  • Resource Coordination (Neptune)
  • Resource Allocation (Fortuna)
  • Server Management (e-Clams)
  • Priming/FS (Khnum)
  • Traffic Throttling (HarborMaster)
  • GUI (Pismo)
  • HA/HB/Topology/VLANs (GulfStream)
  • App. monitor (Nautilus)
  • Configuration (Kelp)
  • Server Monitors

Existing Components

  • WES Network Dispatcher
  • DB/2
  • AFS

Hardware

  • Netfinity/Linux
  • RS6K/AIX
  • LAN Switch

slide-11
SLIDE 11

Policy Layer Components

Salmon

  • Contract definition, pricing, billing

Yemanja

  • ISLA monitor
  • Problem Determination (event correlation)

Neptune

  • Resource coordination

Fortuna

  • Intelligent Proactive Allocation

slide-12
SLIDE 12

Salmon - Service Agreement Levels for Monitoring Océano coNtracts

ISLA contract definition

ISLA Manager

  • Response Automation
  • Violation Detection
  • Violator, Grace Period, Action/Penalty

Pricing engine:

  • Flat-rate, Usage-based and penalties for violation
  • Standard Equations: Charges: Contract Flat-rate, Usage-Based, Sub Contract Addition, Penalty per Violation
  • Prediction queries
  • Futures and Options

[Figure: numbered flow (steps 1 to 11) between GUI Interface, Contract Builder, DB, ISLA Manager, Pricing Engine, Reports, Contract Evaluator, Monitors and Yemanja, split into Off-line activities and On-line activities.]
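
The charge types named for the pricing engine can be combined in a small sketch. The slide lists the charge types but does not give the equations, so the additive formula below, the `ContractTerms` fields and the function name are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ContractTerms:
    """Hypothetical contract terms; field names are illustrative, not Salmon's."""
    flat_rate: float              # fixed charge per billing period
    usage_rate: float             # charge per server-hour consumed
    penalty_per_violation: float  # amount credited back per ISLA violation


def period_charge(terms: ContractTerms, server_hours: float, violations: int) -> float:
    """Assumed additive model: flat rate plus usage, minus violation penalties."""
    usage_charge = terms.usage_rate * server_hours
    penalties = terms.penalty_per_violation * violations
    return terms.flat_rate + usage_charge - penalties


terms = ContractTerms(flat_rate=1000.0, usage_rate=2.0, penalty_per_violation=50.0)
charge = period_charge(terms, server_hours=300.0, violations=3)
print(charge)  # 1000 + 600 - 150 = 1450.0
```

Sub Contract Addition, Futures and Options are left out because the slide gives no detail on how they are priced.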

slide-13
SLIDE 13

Yemanja - Event Correlation

  • problem determination
  • hierarchical event correlation
      • hardware faults, application faults, ISLA performance violations
  • policy monitoring and violation detection
  • integrate detection with performance monitoring and problem determination
  • automated violation handling
      • alert resource manager (Neptune)
      • open problem records

slide-14
SLIDE 14

Yemanja - Problems to be addressed

Difficult to capture complex problem scenarios

  • Need a system that integrates high-level SLA violations with low-level network monitoring
  • Need a method to propagate problems to all affected systems
      • recognize affected components
      • resist hard coding of dependency information
          • hard to anticipate all affected components
          • component models become large and unmanageable; adding new components can affect preexisting component models
      • cancel dependent problems when the initial problem is fixed

Simultaneous faults

Uncertainty in causal implications

Lost and spurious alarms

Need for integrated testing

Scenario waits

Dynamic system configuration changes

slide-15
SLIDE 15

Yemanja Features

  • SLA violation detection integrated into correlation rules
  • Rules can contain a mix of methods and events
      • This allows for the collection of additional data, or the analysis of state information, before the complete set of required events has arrived
  • Associate and rank rules that represent alternate solutions to the same set of events
  • Events propagate through the abstract component dependency chain using publish-subscribe semantics
  • Built-in problem database
      • canceling the root problem cancels dependent problems automatically
  • Flexible way to collapse multiple events of the same type into a single set-based event specification
      • Can require that some % of resources in a resource-set generate the selected event
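
Two of the features above, the set-based event specification and automatic cancellation of dependent problems, can be sketched in a few lines. This is not Yemanja's rule language or API; `set_event_fires`, `ProblemDB` and the problem names are hypothetical and only illustrate the behavior the slide describes.

```python
from collections import defaultdict


def set_event_fires(resource_set, reporting, min_fraction):
    """True when at least min_fraction of resource_set reported the event."""
    if not resource_set:
        return False
    hits = len(set(resource_set) & set(reporting))
    return hits / len(resource_set) >= min_fraction


class ProblemDB:
    """Toy problem database: cancelling a problem also cancels its dependents."""

    def __init__(self):
        self.open = set()
        self.dependents = defaultdict(set)  # problem -> problems it caused

    def open_problem(self, name, caused_by=None):
        self.open.add(name)
        if caused_by is not None:
            self.dependents[caused_by].add(name)

    def cancel(self, name):
        self.open.discard(name)
        # Recurse so that whole dependency chains are cleared.
        for child in self.dependents.pop(name, set()):
            self.cancel(child)


servers = ["ws1", "ws2", "ws3", "ws4"]
# Three of four servers report the event, and the rule asks for at least 50%.
fired = set_event_fires(servers, reporting=["ws1", "ws3", "ws4"], min_fraction=0.5)

db = ProblemDB()
db.open_problem("switch-down")
db.open_problem("ws1-unreachable", caused_by="switch-down")
db.open_problem("isla-violation", caused_by="ws1-unreachable")
db.cancel("switch-down")
remaining = sorted(db.open)  # the whole chain is gone
```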

slide-16
SLIDE 16

Neptune

Reactive resource allocation

  • plan based
  • allocates servers, bandwidth

Reacts to

  • performance problems
  • component failures
  • ISLA violations

Activated by Yemanja

slide-17
SLIDE 17

Fortuna - Resource Allocation Strategy

Goals:

  • Improve Performance + Maximize Revenue
  • Planned + Reactive

Planned: use prediction of periodic traffic patterns

  • Construct a resource allocation plan (e.g. for the next 24 hours).

Reactive: (de)allocation based on current load

  • Correct the initial plan; give feedback to improve the prediction/analysis.
  • Operate in a fully reactive mode for a new customer or if the system observes unexpected behavior
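
The planned-plus-reactive strategy can be sketched as follows. The slide does not say how Fortuna predicts traffic or corrects the plan, so everything here is an assumption: the prediction is a plain per-hour mean over past days, the reactive step only corrects upward, and `build_plan`, `servers_needed` and `per_server_capacity` are hypothetical names.

```python
import math


def build_plan(history, per_server_capacity):
    """Planned mode (assumed): predict each hour's load as the mean of that
    hour over past days, then convert the load into a server count."""
    hours = len(history[0])
    plan = []
    for h in range(hours):
        mean_load = sum(day[h] for day in history) / len(history)
        plan.append(math.ceil(mean_load / per_server_capacity))
    return plan


def servers_needed(hour, current_load, per_server_capacity, plan=None):
    """Reactive correction (assumed): never allocate less than the current
    load requires. With no plan (a new customer) this is fully reactive."""
    reactive = math.ceil(current_load / per_server_capacity)
    if plan is None:
        return reactive
    return max(plan[hour], reactive)


# Two past "days" of two hours each, to keep the example short.
history = [[100, 400], [140, 360]]
plan = build_plan(history, per_server_capacity=100)      # [2, 4]

spike = servers_needed(0, 350, 100, plan)                # load above plan: 4
quiet = servers_needed(1, 90, 100, plan)                 # plan still holds: 4
new_customer = servers_needed(0, 250, 100)               # no plan: 3
```

A fuller version would also release servers when load stays below the plan and would feed the observed load back into the prediction, as the slide suggests.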

slide-18
SLIDE 18

Preliminary Example of a Layered ISLA

[Figure: a Customer ISLA drawn as layers of server capacity (axis: units of server capacity, ticks 1 to 16). Levels of guarantee: always, strong, weak, best effort. The Océano monitor compares the current load with the required capacity range to answer "How many servers to allocate?"]

slide-19
SLIDE 19

ISLAs and Revenues

Layered ISLA

  • Current state depends on the required server capacity and on state parameters (layer i):
      • max_i servers: the layer's boundary
      • Charge for capacity, time unit: C_i
      • Penalty for a violation in this layer: P_i; its size depends on the level of guarantee

Options

  • Exercised implicitly according to measured load.
  • Price of an option depends on the level of guarantee and the capacity (max_i), and can also depend on the expected usage.
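
The layer parameters max_i, C_i and P_i suggest a simple revenue calculation, sketched below. The slide defines the parameters but not the formulas, so the billing and penalty rules here are assumptions: each allocated server is billed at the rate of the layer it falls in, and a shortfall costs the penalty of the layer the request falls in. The layer values are made up.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    max_servers: int  # max_i: cumulative upper boundary of the layer
    charge: float     # C_i: charge per server per time unit
    penalty: float    # P_i: penalty for a violation in this layer


def layer_of(layers, servers):
    """Index of the layer that a given server count falls in."""
    for i, layer in enumerate(layers):
        if servers <= layer.max_servers:
            return i
    return len(layers) - 1


def charge_for(layers, allocated):
    """Assumed model: each allocated server is billed at its layer's C_i."""
    total, lower = 0.0, 0
    for layer in layers:
        in_layer = max(0, min(allocated, layer.max_servers) - lower)
        total += in_layer * layer.charge
        lower = layer.max_servers
    return total


def penalty_for(layers, requested, allocated):
    """Assumed model: a shortfall costs P_i of the layer the request falls in."""
    if allocated >= requested:
        return 0.0
    return layers[layer_of(layers, requested)].penalty


# Illustrative layers only: a stronger guarantee costs more and is penalized more.
layers = [Layer(4, 10.0, 100.0), Layer(8, 6.0, 40.0), Layer(16, 3.0, 0.0)]

charge = charge_for(layers, allocated=6)                 # 4*10 + 2*6 = 52.0
shortfall = penalty_for(layers, requested=7, allocated=6)  # second layer: 40.0
satisfied = penalty_for(layers, requested=5, allocated=6)  # no violation: 0.0
```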

slide-20
SLIDE 20

Scenario

Active Scenario: Scenario 1: {[Server_Set(4, 4, 2)], 00:00 Dec/01/2000, 23:59 Dec/31/2001, 1}

[Figure: two charts of No. of servers against Time (T1 to T4). One marks the Floor, Guaranteed-Scalability and Ceiling levels; the other compares Servers Requested with Servers Allocated and marks Violation and Over Provisioning.]

Allocation on the 3 levels of guarantee: Resources Requested X Resources Allocated

Definition Level, Monitoring Level, Charging Level

Usage-Based Charge Calculation

slide-21
SLIDE 21

Resource Control Layer

eClams

  • server allocation/reclamation

Khnum

  • application and data priming

HarborMaster

  • bandwidth management (request throttling)

Pismo-Beach

  • GUI

slide-22
SLIDE 22

eClams - Server Pool Management

Functions:

  • (De)allocation and priming support for servers
  • Automatic network installation of OS (e.g. LUI)

Future:

  • heterogeneous server management
  • server specific characteristics
      • server capacity, server attached resources

[Figure: server states Free, OS priming, Appl priming, Operational, Dirty. Components: eClams mgr, eClams agent, Khnum (application priming), Resource Coordinator, Nautilus (monitoring).]
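
The server states named on this slide read naturally as a lifecycle. The slide gives the state names but the transitions are not recoverable from the text, so the order below is an assumption: a Free server is primed with the application, serves traffic, becomes Dirty on reclamation, and gets a fresh OS before returning to the free pool.

```python
from enum import Enum


class State(Enum):
    FREE = "Free"
    APPL_PRIMING = "Appl priming"
    OPERATIONAL = "Operational"
    DIRTY = "Dirty"
    OS_PRIMING = "OS priming"


# Assumed transition order; the slide names the states but not the arrows.
NEXT_STATE = {
    State.FREE: State.APPL_PRIMING,         # allocated to a customer
    State.APPL_PRIMING: State.OPERATIONAL,  # application and data in place
    State.OPERATIONAL: State.DIRTY,         # reclaimed from the customer
    State.DIRTY: State.OS_PRIMING,          # reinstall to get a clean server
    State.OS_PRIMING: State.FREE,           # back in the free pool
}


def advance(state: State) -> State:
    """Move a server to the next state in the assumed lifecycle."""
    return NEXT_STATE[state]


def lifecycle(start: State, steps: int) -> list:
    """States visited when advancing `steps` times from `start`."""
    states = [start]
    for _ in range(steps):
        states.append(advance(states[-1]))
    return states


cycle = [s.value for s in lifecycle(State.FREE, 5)]
```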

slide-23
SLIDE 23

Khnum: App + Data deployment

shared file system

  • AFS-based (prototype)
  • few things are kept on local disk

Cache Pre-loading

  • Removes load from File Server
  • "near local" performance

Multicasting (MTFTP)

  • keeps network utilization low
  • multiple servers at once

Cooperative caching

  • Accessing "neighbor's" memory faster than disk

slide-24
SLIDE 24

HarborMaster - Workload Balancing

  • No need for special hardware
  • Portable extension to WebSphere Edge Server (aka Network Dispatcher)
  • Load Balancing of Requests
  • Automatic Dynamic Reconfiguration of WES
  • Throttle TCP connections
      • drop a percentage of requests
      • use of overflow server
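
The two throttling options, dropping a percentage of requests or sending them to an overflow server, can be sketched as below. This is not HarborMaster's mechanism or the Network Dispatcher API; the `Throttle` class is hypothetical, and it diverts requests deterministically (every n-th one) rather than at random so that the behavior is easy to check.

```python
class Throttle:
    """Diverts a fixed fraction of requests, deterministically."""

    def __init__(self, drop_fraction, overflow_server=None):
        self.drop_fraction = drop_fraction
        self.overflow_server = overflow_server  # None means "drop"
        self._credit = 0.0

    def route(self, request, server):
        """Return the server that should handle the request, or None if dropped.
        The request itself is not inspected in this sketch."""
        self._credit += self.drop_fraction
        if self._credit >= 1.0:
            self._credit -= 1.0
            return self.overflow_server
        return server


# Drop 25% of requests outright.
dropper = Throttle(drop_fraction=0.25)
dropped = [dropper.route(i, "ws1") for i in range(8)].count(None)

# Send the same 25% to an overflow server instead.
diverter = Throttle(drop_fraction=0.25, overflow_server="overflow")
routes = [diverter.route(i, "ws1") for i in range(4)]
```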

slide-25
SLIDE 25

GUI - Pismo Beach

Displays: current status of servers, customer allocations, performance history, significant Océano events, component based tracing information

slide-26
SLIDE 26

Infrastructure Layer Components

GulfStream

  • Infrastructure monitoring
  • Network reconfiguration

Nautilus

  • Application monitoring

Kelp

  • configuration data management

slide-27
SLIDE 27

GulfStream

Function:

  • global topology discovery via adapter discovery and correlation
  • failure detection via periodic heartbeats
  • reporting up/down status of components
  • automatic reconfiguration of VLANs

Future:

  • handling configuration changes (new/changed hardware, network rewiring, maintenance, etc.)
  • Scalability to 1000's of nodes (BlueStream)
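
Failure detection via periodic heartbeats and up/down reporting can be sketched as a timeout check. GulfStream's actual protocol is not described on the slide; the `HeartbeatMonitor` class, the timeout value and the node names are hypothetical, and timestamps are passed in explicitly so the example is deterministic.

```python
class HeartbeatMonitor:
    """Marks a node down when no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # node -> time of most recent heartbeat

    def heartbeat(self, node, now):
        """Record a heartbeat received from `node` at time `now`."""
        self.last_seen[node] = now

    def status(self, now):
        """Report up/down for every node that has ever sent a heartbeat."""
        return {
            node: "up" if now - seen <= self.timeout else "down"
            for node, seen in self.last_seen.items()
        }


monitor = HeartbeatMonitor(timeout=3.0)
monitor.heartbeat("node-a", now=0.0)
monitor.heartbeat("node-b", now=0.0)
monitor.heartbeat("node-a", now=2.0)
monitor.heartbeat("node-a", now=4.0)   # node-b has gone silent

report = monitor.status(now=5.0)
```

In a live system the caller would pass `time.monotonic()` as `now` and poll `status` on a timer.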

slide-28
SLIDE 28

Nautilus

Functions:

  • Application Monitoring
      • Traffic class specific monitoring
      • Response time, data output and request rate
      • Generates threshold based alerts
      • Node metrics correlated with traffic monitoring
  • Content Based Throttling
      • Admission control of requests based on request content
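
Threshold-based alerts on per-traffic-class metrics can be sketched as a comparison of samples against limits. The traffic class names, metric names, limits and the `threshold_alerts` function are hypothetical; the slide only says that Nautilus monitors response time, data output and request rate per traffic class and raises alerts on thresholds.

```python
def threshold_alerts(samples, limits):
    """Return (traffic_class, metric, value, limit) for every metric whose
    sampled value exceeds its configured limit."""
    alerts = []
    for traffic_class, metrics in samples.items():
        for metric, value in metrics.items():
            limit = limits.get(traffic_class, {}).get(metric)
            if limit is not None and value > limit:
                alerts.append((traffic_class, metric, value, limit))
    return alerts


# Illustrative samples and limits only.
samples = {
    "checkout": {"response_time_ms": 850, "request_rate": 40},
    "browse": {"response_time_ms": 120, "request_rate": 300},
}
limits = {
    "checkout": {"response_time_ms": 500},
    "browse": {"response_time_ms": 500, "request_rate": 250},
}

alerts = threshold_alerts(samples, limits)
```

Metrics without a configured limit, such as the checkout request rate here, never raise an alert.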

slide-29
SLIDE 29

Kelp

Functions:

  • configuration database + access methods
  • Dynamic reconfiguration and ISLA data
  • Object based interface to data
      • local cache of data and interrelationships

Future:

  • automated cache updates (trigger driven)
  • additional verification

slide-30
SLIDE 30

Prototype Status (Sept 2001)

Lab:

  • 81 servers: 75 Intel/Linux, 6 RS6K/AIX, Cisco CAT6509

Linux/AIX prototype of Océano supporting:

  • Service level monitoring (simple ISLAs)
  • Autonomic allocation of Linux servers
  • Automatic, scalable priming
  • Bandwidth management
  • Automatic discovery of network connectivity
  • VLAN management
  • Appl. monitoring and content based throttling (Apache)
  • Display of status, history, performance, events

slide-31
SLIDE 31

Backup Foils


slide-32
SLIDE 32

Present and Future

Fixing the Prototype

Customer Pilots

  • FOAK, ...

Product Plans

  • In progress...
  • Raquarium
  • Indian, Pacific, Arctic

eServer

  • eLiza, ...

V2