SLIDE 1

Monitoring and Workflow management in large distributed systems

March 2011

SLIDE 2

The MonALISA Framework

  • MonALISA is a Dynamic, Distributed Service System capable of collecting any type of information from different systems, analyzing it in near real time, and providing support for automated control decisions and global optimization of workflows in complex grid systems.
  • The MonALISA system is designed as an ensemble of autonomous, multi-threaded, self-describing, agent-based subsystems which are registered as dynamic services and are able to collaborate and cooperate in performing a wide range of monitoring tasks. These agents can analyze and process the information in a distributed way and provide optimization decisions in large-scale distributed applications.

Iosif Legrand March 2011

SLIDE 3

The MonALISA Architecture

  • Regional or Global High Level Services, Repositories & Clients
  • Secure and reliable communication; Dynamic load balancing; Scalability & Replication; AAA for Clients
  • Distributed Dynamic Registration and Discovery, based on a lease mechanism and remote events
  • Distributed System for gathering and analyzing information based on mobile agents: Customized aggregation, Triggers, Actions

[Diagram layers: HL services; Proxies; Network of JINI-Lookup Services (Secure & Public); MonALISA services; Agents]

Fully Distributed System with no Single Point of Failure
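
A minimal sketch of the lease idea behind the registration and discovery layer, written in Python: a service stays discoverable only while it keeps renewing its lease. The class name `LeaseRegistry`, its methods, and the explicit `now` argument are illustrative assumptions, not the Jini or MonALISA API.

```python
class LeaseRegistry:
    """Toy lookup service: an entry disappears unless its lease is renewed."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self._expiry = {}  # service name -> absolute expiry time

    def register(self, name, now):
        # Registering and renewing are the same operation here:
        # both push the expiry time forward by one lease period.
        self._expiry[name] = now + self.lease_seconds

    renew = register

    def discover(self, now):
        # Drop expired leases, then return the services still considered alive.
        self._expiry = {n: t for n, t in self._expiry.items() if t > now}
        return sorted(self._expiry)


registry = LeaseRegistry(lease_seconds=30)
registry.register("site-a", now=0)
registry.register("site-b", now=0)
registry.renew("site-a", now=20)      # site-b never renews
print(registry.discover(now=40))
```

With a 30-second lease, `site-b` is no longer returned at t=40 because it never renewed. A crashed service therefore drops out of discovery on its own, without an explicit unregister step.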

SLIDE 4

MonALISA Service & Data Handling

[Diagram labels]

  • Monitoring Modules: Collects any type of information; Dynamic Loading; Push and Pull
  • Data Store; Data Cache Service & DB; Postgres
  • FILTERS / TRIGGERS; AGENTS
  • Predicates & Agents; Data (via ML Proxy)
  • Configuration Control (SSL)
  • Web Service (WSDL, SOAP); WS Clients and service
  • Lookup Service: Registration, Discovery
  • Applications; Clients or Higher Level Services
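
As an illustration of the filter stage in this diagram, the following Python sketch derives a per-cluster average from raw per-node samples. The tuple layout `(cluster, node, parameter, value)` and the function name `cluster_average` are assumptions made for the example; this is not MonALISA's actual filter interface.

```python
from collections import defaultdict
from statistics import mean


def cluster_average(samples, parameter):
    """Aggregate one parameter over all nodes of each cluster.

    samples: iterable of (cluster, node, parameter, value) tuples.
    """
    by_cluster = defaultdict(list)
    for cluster, _node, name, value in samples:
        if name == parameter:
            by_cluster[cluster].append(value)
    return {cluster: mean(values) for cluster, values in by_cluster.items()}


samples = [
    ("A", "n1", "load", 1.0),
    ("A", "n2", "load", 3.0),
    ("B", "n1", "load", 2.0),
    ("A", "n1", "mem", 9.0),   # ignored: different parameter
]
print(cluster_average(samples, "load"))
```

The aggregated value is a new, derived parameter: a client that only needs the cluster-level view can subscribe to it instead of to every node.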

SLIDE 5

Registration / Discovery, Admin Access and AAA for Clients

[Diagram labels]

  • Lookup Service: Discovery; Registration (signed certificate)
  • MonALISA Service: Data; Filters & Agents; Trust keystore
  • Services Proxy Multiplexer: AAA services; Client authentication; Trust keystore
  • Admin: SSL connection
  • Client (other service)
  • Applications

SLIDE 6

Monitoring Grid sites, Running Jobs, Network Traffic, and Connectivity

[Screenshots: TOPOLOGY, JOBS, ACCOUNTING, Running Jobs]

SLIDE 7

Monitoring CMS Jobs Worldwide

  • CMS is using MonALISA and ApMon to monitor all the production and analysis jobs. This information is then used in the CMS dashboard frontend.
  • More than 3 years of continuous operation without any problems
  • Collected ~5×10^10 monitoring values in the last 12 months
  • Rates of up to more than 6000 values per second
  • Lost in UDP: < 5×10^-6

[Plots: Rate of collected monitoring values; Total collected values]

Organize and structure Monitoring Information
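
A quick back-of-the-envelope check of the figures on this slide, in Python. It assumes that "12 months" means 365 days and that the UDP loss figure is a fraction of all values sent; both are assumptions, not statements from the slide.

```python
TOTAL_VALUES = 5 * 10**10            # ~5×10^10 values collected in 12 months
SECONDS_PER_YEAR = 365 * 24 * 3600   # assumption: 12 months = 365 days

# Average collection rate over the whole year.
average_rate = TOTAL_VALUES / SECONDS_PER_YEAR

# Upper bound on lost values if the loss fraction stays below 5×10^-6
# (integer arithmetic avoids floating-point rounding).
max_lost = TOTAL_VALUES * 5 // 10**6

print(f"average rate: {average_rate:.0f} values/s")
print(f"lost values: fewer than {max_lost}")
```

Under these assumptions the yearly average is roughly 1585 values per second, so the quoted peaks of more than 6000 values per second are nearly four times the average, and fewer than 250000 of the 5×10^10 values were lost.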

SLIDE 8

Monitoring CMS Jobs Worldwide

User-level task monitoring

Organize and structure Monitoring Information

SLIDE 9

Monitoring CMS Jobs Worldwide

User-level task monitoring

Organize and structure Monitoring Information

SLIDE 10

Monitoring architecture in ALICE

[Diagram labels]

  • MonALISA services: MonALISA @Site, MonALISA LCG Site, MonALISA @CERN; LCG Tools
  • Components reporting through ApMon: AliEn Job Agents, AliEn CE, AliEn SE, Cluster Monitor, AliEn TQ, AliEn IS, AliEn Optimizers, AliEn Brokers, MySQL Servers, CastorGrid Scripts, API Services
  • MonALISA Repository: Aggregated Data, Long History DB, Alerts, Actions
  • Monitored parameters: rss, vsz, cpu time, run time, job slots, free space, nr. of files, open files, Queued Job Agents, cpu ksi2k, job status, disk used, processes, load, net In/out, jobs status, sockets, migrated mbytes, active sessions, MyProxy status

SLIDE 11

ALICE: Global Views, Status & Jobs

http://pcalimonitor.cern.ch

SLIDE 12

Monitoring in ALICE: jobs, resources, services

SLIDE 13

Monitoring in ALICE: Xrootd servers

Iosif Legrand August 2009

SLIDE 14

Active Available Bandwidth measurements between all the ALICE grid sites

SLIDE 15

Active Available Bandwidth measurements between all the ALICE grid sites (2)

SLIDE 16

Local and Global Decision Framework

Two levels of decisions:

  • local (autonomous),
  • global (correlations).

Actions triggered by:

  • values above/below given thresholds,
  • absence/presence of values,
  • correlations between any values.

Action types:

  • alerts (emails / instant messages / Atom feeds),
  • automatic chart annotations in the repository,
  • running custom code, such as securely ordering ML services to change connectivity (optimize traffic), submit jobs, or (re)start a global service.

[Diagram: Sensors (Traffic, Jobs, Hosts, Apps, Temperature, Humidity, A/C Power) feed the ML Services. Local decisions: actions based on local information. Global decisions, taken by the Global ML Services: actions based on global information.]

SLIDE 17

ALICE: Automatic job submission, Restarting Services

  • The MySQL daemon is automatically restarted when it runs out of memory. Trigger: threshold on VSZ memory usage.
  • The ALICE production jobs queue is kept full by the automatic submission. Trigger: threshold on the number of aliprod waiting jobs.
  • Administrators are kept up to date on the services’ status. Trigger: presence/absence of monitored information.
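
The three kinds of trigger on this slide (value above a threshold, value below a threshold, missing value) can be pictured with a small rule table. This is a hedged Python sketch: the parameter names, the limits 4096 and 100, and the action names are all hypothetical, and the real MonALISA actions are defined in its own configuration files.

```python
def above(threshold):
    """Condition: the parameter is present and larger than the threshold."""
    return lambda values, key: key in values and values[key] > threshold


def below(threshold):
    """Condition: the parameter is present and smaller than the threshold."""
    return lambda values, key: key in values and values[key] < threshold


def absent():
    """Condition: the parameter was not reported at all."""
    return lambda values, key: key not in values


def fired_actions(rules, values):
    """Return the actions whose condition holds for one snapshot of values."""
    return [action for key, condition, action in rules if condition(values, key)]


# Hypothetical parameter names, limits and action names.
RULES = [
    ("mysql_vsz_mb", above(4096), "restart_mysql"),
    ("aliprod_waiting_jobs", below(100), "submit_jobs"),
    ("service_status", absent(), "notify_admins"),
]

snapshot = {"mysql_vsz_mb": 5000, "aliprod_waiting_jobs": 250}
print(fired_actions(RULES, snapshot))
```

For this snapshot the memory rule and the missing-status rule fire, while the queue rule does not because 250 waiting jobs is above the hypothetical limit of 100.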

SLIDE 18

Automatic actions in ALICE

ALICE is using the monitoring information to automatically:

  • resubmit error jobs until a target completion percentage is reached,
  • submit new jobs when necessary (watching the task queue size for each service account): production jobs, and RAW data reconstruction jobs for each pass,
  • restart site services whenever tests of VoBox services fail but the central services are OK,
  • send email notifications / add chart annotations when a problem was not solved by a restart,
  • dynamically modify the DNS aliases of central services for efficient load balancing.

Most of the actions are defined by configuration files of only a few lines.
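
The first two actions on this slide reduce to simple arithmetic on monitored counters. The Python sketch below shows one possible reading; the function names, the integer percentage argument, and the example numbers are assumptions for illustration, not the ALICE implementation.

```python
def jobs_to_resubmit(done, failed, total, target_percent):
    """Failed jobs to resubmit so that completion can reach the target.

    target_percent is an integer percentage, e.g. 95 for 95%.
    """
    # Smallest number of finished jobs that satisfies the target (integer ceiling).
    required = (target_percent * total + 99) // 100
    missing = required - done
    # Never resubmit more jobs than actually failed, and never a negative number.
    return max(0, min(failed, missing))


def jobs_to_submit(waiting, target_queue_size):
    """New jobs needed to keep the waiting queue at its target size."""
    return max(0, target_queue_size - waiting)


print(jobs_to_resubmit(done=90, failed=10, total=100, target_percent=95))
print(jobs_to_submit(waiting=40, target_queue_size=100))
```

In the first call 95 finished jobs are required and 90 are done, so 5 of the 10 failed jobs are resubmitted. In the second call 60 new jobs bring the queue back to its target size.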

SLIDE 19

Monitoring USLHCNet Topology

Topology & Status & Peering

Real Time Topology for L2 Circuits

SLIDE 20

Monitoring Links Availability

Very Reliable Information

Iosif Legrand August 2009

Artur Barczyk, 04/16/2009

[Chart labels]

  • Links: AMS-GVA (GEANT), CHI-NYC (Qwest), GVA – NYC (GC), GVA – NYC (Colt), Ref @ CERN), CHI-GVA (Qwest), CHI-GVA (GC), AMS-NYC (GC)
  • Availability values shown: 99.5%, 97.9%, 99.9%, 96.6%, 99.3%, 98.9%, 99.5%
  • Legend bins: 0-95%, 95-97%, 97-98%, 98-99%, 99-100%, 100%

SLIDE 21

USLHCnet: Accounting for Integrated Traffic

SLIDE 22

ALARMS and Automatic notifications for USLHCnet

SLIDE 23

Monitoring Network Topology (L3), Latency, Routers

Iosif Legrand August 2009

[Panels: NETWORKS, AS, ROUTERS]

Real Time Topology Discovery & Display

SLIDE 24

EVO: Real-Time monitoring for Reflectors and the quality of all possible connections

SLIDE 25

EVO: Creating a Dynamic, Global, Minimum Spanning Tree to optimize the connectivity

A weighted connected graph G = (V,E) with n vertices and m edges. The quality of connectivity between any two reflectors is measured every second. A minimum spanning tree T with additional constraints is built in near real time, minimizing the total weight

$$ w(T) = \sum_{(u,v) \in T} w((u,v)) $$

Resilient Overlay Network that optimizes real-time communication
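
The spanning-tree construction described on this slide can be sketched with Kruskal's algorithm. This Python version builds a plain minimum spanning tree only: the "additional constraints" mentioned on the slide are not specified there and are not modelled, and nothing here is claimed to be the algorithm EVO actually uses. The edge weights in the example are made-up link costs.

```python
def minimum_spanning_tree(n, edges):
    """Kruskal's algorithm.

    n: number of vertices, labelled 0..n-1.
    edges: iterable of (weight, u, v) tuples.
    Returns the list of edges in the minimum spanning tree.
    """
    parent = list(range(n))

    def find(x):
        # Find the set representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # the edge joins two separate components
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


# Four reflectors with hypothetical link costs (lower is better).
links = [(10, 0, 1), (15, 1, 2), (40, 0, 2), (20, 2, 3), (25, 1, 3)]
tree = minimum_spanning_tree(4, links)
print(tree, sum(w for w, _, _ in tree))
```

For these five links the tree keeps the three cheapest edges that do not form a cycle, for a total weight w(T) of 45. Re-running the function on fresh measurements is the simplest way to picture a tree that follows changing link quality.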

SLIDE 26

Dynamic MST to optimize the Connectivity for Reflectors

Frequent measurements of RTT, jitter, traffic and lost packets. The MST is recreated in ~1 s in case of communication problems.

SLIDE 27

EVO: Optimize how clients connect to the system for best performance and load balancing

SLIDE 28

Monitoring the Topology and Optical Power on Fibers for Optical Circuits

[Screenshots: Port power monitoring; Controlling; Glimmerglass Switch Example]

SLIDE 29

“On-Demand”, End to End Optical Path Allocation

[Diagram labels]

  • Hosts A and B, connected through the Internet (Regular IP path) and through an Optical Switch (Active light path)
  • Command: >FDT A/fileX B/path/
  • Status messages: OS path available; Configuring interfaces; Starting Data Transfer
  • MonALISA Service, MonALISA Distributed Service System, OS Agent: Monitor, Control (TL1)
  • LISA Agent, APPLICATION, DATA: Real time monitoring

LISA sets up

  • Network Interfaces
  • TCP stack
  • Kernel parameters
  • Routes

LISA -> APPLICATION: “use eth1.2, …”

CREATES AN END TO END PATH < 1s

Detects errors and automatically recreates the path in less than the TCP timeout

SLIDE 30

Controlling Optical Planes, Automatic Path Recovery

[Map labels: CERN Geneva, CALTECH Pasadena, Starlight, Manlan, USLHCnet, Internet2]

  • “Fiber cut” simulations
  • The traffic moves from one transatlantic line to the other one
  • FDT transfer (CERN – CALTECH) continues uninterrupted
  • TCP fully recovers in ~ 20s

[Plot: FDT Transfer, 200+ MBytes/sec from a 1U node, with 4 fiber cut emulations marked 1 to 4]

SLIDE 31

MonALISA collects any type of monitoring information in distributed systems

The MonALISA package includes:

  • Local host monitoring (CPU, memory, network traffic, processes and sockets in each state, LM sensors, APC UPSs), log files tailing
  • SNMP generic & specific modules
  • Condor, PBS, LSF and SGE (accounting & host monitoring), Ganglia
  • Ping, tracepath, traceroute, pathload and other network-related measurements
  • TL1, Network devices, Ciena, Optical switches
  • Calling external applications/scripts that return the values as output
  • XDR-formatted UDP messages (such as ApMon).

New modules can be easily added by implementing a simple Java interface. Filters can be used to generate new aggregate data. The Service can also react to the monitoring data it receives (actions, alarms). MonALISA can run code as distributed agents for global optimization:

  • Used by EVO to maintain the tree of connections between reflectors
  • On-demand end to end optical paths
  • Controls distributed data transfers
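
To make "XDR-formatted UDP messages" concrete, here is a small Python sketch that packs one monitoring sample using the XDR conventions for strings (4-byte big-endian length, data, zero padding to a multiple of 4) and doubles (8-byte big-endian IEEE 754). The field order cluster, node, parameter name, value is a hypothetical layout chosen for the example; it is not the actual ApMon datagram format, and the host and port in the usage note are placeholders.

```python
import socket
import struct


def xdr_string(text):
    """XDR string: 4-byte big-endian length, bytes, zero padding to 4 bytes."""
    data = text.encode("utf-8")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def xdr_double(value):
    """XDR double: 8-byte big-endian IEEE 754."""
    return struct.pack(">d", value)


def encode_sample(cluster, node, name, value):
    # Hypothetical field order, chosen for illustration only.
    return xdr_string(cluster) + xdr_string(node) + xdr_string(name) + xdr_double(value)


def send_sample(host, port, payload):
    """Fire-and-forget UDP send: no connection and no acknowledgement."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


payload = encode_sample("Farm", "node01", "load", 0.5)
print(len(payload), payload.hex())
```

A caller would then pass the payload to `send_sample("collector.example.org", 9999, payload)`, with the address replaced by a real collector. Because UDP gives no delivery guarantee, the sender never blocks on the monitoring system, at the price of occasional lost values.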

SLIDE 32

MonALISA Summary

Major Communities

  • ALICE, CMS, ATLAS, PANDA, EVO, LCG RUSSIA, OSG, MXG, RoEduNet, USLHCNET, ULTRALIGHT, Enlightened, VRVS

MonALISA Today: Running 24 X 7 at ~360 Sites

  • Collecting ~2 million “persistent” parameters in real time
  • 80 million “volatile” parameters per day
  • Update rate of ~25,000 parameter updates/sec
  • Monitoring:
      • 40,000 computers
      • > 100 WAN Links
      • > 8,000 complete end-to-end network path measurements
      • Tens of Thousands of Grid jobs running concurrently
  • Controls job submission, different central services for the Grid, EVO topology, FDT …
  • The MonALISA repository system serves ~8 million user requests per year.

http://monalisa.caltech.edu