Experience and Prospects for Various Control Strategies for Self-Replicating Multi-Agent Systems

SLIDE 1

Experience and Prospects for Various Control Strategies for Self-Replicating Multi-Agent Systems

J.-P. Briot, Z. Guessoum, S. Aknine, A. Luna-Almeida, N. Faci and M. Gatti

CReSTIC (Centre de Recherche en STIC, Université de Reims)
LIP6 (Laboratoire d'Informatique de Paris 6)
faci@leri.univ-reims.fr

SLIDE 2
Fault-Tolerant MAS

• Large-scale multi-agent systems
  − Physically distributed
  − Dynamic environment (with limited resources)
• Types of failures
  − Software (bugs, deadlocks, ...)
  − Hardware (network links, machines, ...)

» How to avoid failures?

SLIDE 3
Fault Classifications

• Based on how a failed component behaves once it has failed, faults can be classified into four categories: crash, omission, timing, or Byzantine.
  − Crash faults: the component either completely stops operating or never returns to a valid state;
  − Omission faults: the component completely fails to perform its service;
  − Timing faults: the component does not complete its service on time;
  − Byzantine faults: these are faults of an arbitrary nature.
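To make the four categories concrete, the Python sketch below classifies one observed request to a component. It is hypothetical: `classify`, its parameters and the `FaultClass` enum are illustrative, and it assumes the observer somehow knows whether the component is still alive (`alive`). Seen from outside, a crash is often indistinguishable from an omission.

```python
from enum import Enum
from typing import Optional


class FaultClass(Enum):
    CRASH = "crash"
    OMISSION = "omission"
    TIMING = "timing"
    BYZANTINE = "byzantine"


def classify(alive: bool, replied: bool, valid_reply: bool,
             latency: float, deadline: float) -> Optional[FaultClass]:
    """Classify one observed request to a component (hypothetical helper).

    Returns None when the component behaved correctly.
    """
    if not alive:
        return FaultClass.CRASH      # the component has stopped operating
    if not replied:
        return FaultClass.OMISSION   # still running, but the service was not performed
    if not valid_reply:
        return FaultClass.BYZANTINE  # arbitrary output: treated as the most general case
    if latency > deadline:
        return FaultClass.TIMING     # correct output, delivered too late
    return None


# Usage
print(classify(True, True, True, latency=2.5, deadline=1.0))  # FaultClass.TIMING
```

The checks are ordered from the most to the least visible symptom, so a late and wrong answer is reported as Byzantine rather than as a timing fault.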

SLIDE 4
Replication

• Existing solution: replication strategies
  − Replication of data and/or computation is an effective way to achieve fault tolerance in distributed systems.
  − A replicated software component is defined as a software component that possesses a representation on two or more hosts.
• Distributed applications:
  − Small number of components
  − Component criticality is static
  − …

» The number of replicas and the replication strategy are explicitly and statically defined by the designer before run time.
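A minimal sketch of this static setting, assuming active replication and replicas modeled as plain Python callables. `ReplicatedComponent`, and the use of `ConnectionError` to stand for a crashed host, are illustrative choices rather than anything taken from the slides. The replica list is fixed when the object is built, mirroring a designer who chooses the number of replicas before run time.

```python
from typing import Callable, List


class ReplicatedComponent:
    """Active replication with a replica set fixed before run time (illustrative)."""

    def __init__(self, replicas: List[Callable[[str], str]]):
        if len(replicas) < 2:
            raise ValueError("a replicated component needs representations on two or more hosts")
        self.replicas = list(replicas)  # static: never changed after construction

    def request(self, msg: str) -> str:
        results = []
        for replica in self.replicas:
            try:
                results.append(replica(msg))  # every replica processes the request
            except ConnectionError:           # stands for a crashed host
                continue
        if not results:
            raise RuntimeError("all replicas failed")
        return results[0]  # replicas are assumed deterministic, so any answer will do


def healthy(msg: str) -> str:
    return msg.upper()


def crashed(msg: str) -> str:
    raise ConnectionError("host down")


component = ReplicatedComponent([crashed, healthy])
print(component.request("ping"))  # PING
```

The component survives as long as one replica answers; what it cannot do is adapt the replica count when a component turns out to be more critical than planned.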

SLIDE 5
Agent Replication

• Multi-agent application characteristics:
  − adaptive agents,
  − large scale,
  − dynamic and adaptive organizational structures,
  − …
• Criticality (and thus the number of replicas and the replication strategy) cannot be explicitly and statically defined by the designer before run time.

» Automatically and dynamically apply replication mechanisms where (to which agents) and when they are most needed.

SLIDE 6
Dynamic Replication

• DarX: a new replication framework
  − http://www-src.lip6.fr/darx/
  − Large-scale distributed systems
  − Replication mechanisms
    • Several replication strategies (active, passive, hybrid, …)
    • Dynamic replication: dynamically change the number of replicas and the replication strategy
  − Observation mechanisms
  − Fault detection/recovery mechanisms
  − Encapsulation of the system tasks into the replication group
    • Transparency of the replication with respect to the other agents
    • Replication mechanisms are not attached to the DarX servers; they are attached to the replication groups
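The sketch below illustrates what changing the number of replicas and the replication strategy at run time can look like. It is not DarX's API: `ReplicationGroup`, `set_replica_count`, `set_strategy`, `fail_primary`, and the modeling of replica state as a dictionary of counters are all hypothetical. Other agents only call `handle`, which reflects the idea that replication is transparent to them and is managed by the group itself.

```python
from enum import Enum
from typing import Dict, List


class Strategy(Enum):
    ACTIVE = "active"    # every replica executes every request
    PASSIVE = "passive"  # only the primary executes; backups receive its state


class ReplicationGroup:
    """Hypothetical replication group whose size and strategy can change at run time."""

    def __init__(self, strategy: Strategy = Strategy.PASSIVE):
        self.strategy = strategy
        self.replicas: List[Dict[str, int]] = [{}]  # replicas[0] is the primary

    def set_replica_count(self, n: int) -> None:
        if n < 1:
            raise ValueError("a group keeps at least one replica")
        while len(self.replicas) < n:
            self.replicas.append(dict(self.replicas[0]))  # new replica copies the primary's state
        del self.replicas[n:]

    def set_strategy(self, strategy: Strategy) -> None:
        self.strategy = strategy

    def handle(self, key: str) -> int:
        """The only entry point seen by other agents: increments a counter."""
        if self.strategy is Strategy.ACTIVE:
            for state in self.replicas:
                state[key] = state.get(key, 0) + 1
        else:
            primary = self.replicas[0]
            primary[key] = primary.get(key, 0) + 1
            for backup in self.replicas[1:]:  # checkpoint the primary's state
                backup.clear()
                backup.update(primary)
        return self.replicas[0][key]

    def fail_primary(self) -> None:
        """Simulate a crash of the primary: the first backup takes over."""
        if len(self.replicas) < 2:
            raise RuntimeError("no backup left")
        self.replicas.pop(0)


group = ReplicationGroup()
group.handle("x")
group.set_replica_count(3)           # the agent became more critical
group.set_strategy(Strategy.ACTIVE)  # switch strategy at run time
group.handle("x")
group.fail_primary()
print(group.handle("x"), len(group.replicas))  # 3 2
```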

SLIDE 7
Automatic Replication

Adaptive Replication Mechanism
• Which agents need to be replicated, and when?
• What is the number of replicas?
• Where should the replicas be placed?

SLIDE 8
Adaptive Control of Replication

• Hypotheses and principles
  − Automatic mechanisms
  − Some prior inputs from the designer of the application
  − Agents can be either reactive or deliberative
  − Agents can be heterogeneous
  − Agents communicate with some ACL (FIPA, …)
• Agent criticality relies on semantic-level information
  − Roles [Selmas’03] [AAMAS’02]
  − Interdependence graph [AAMAS’04] [Selmas’05]
  − Plans
  − …

SLIDE 9
Interdependence Graph

• The arcs are labeled with any information likely to enable the detection or anticipation of undesirable behaviors.

[Figure: interdependence graph between Agent_i, Agent_j and Agent_k, with weighted arcs (e.g., w12)]
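A minimal data-structure sketch for such a graph, assuming each arc (i, j) stores the weight w_ij together with arbitrary labels (for example, the role being played). The class and method names are hypothetical, and so is the `criticality` aggregation: it treats w_ik as the degree to which agent i depends on agent k and sums these weights over the other agents.

```python
from collections import defaultdict
from typing import Any, Dict


class InterdependenceGraph:
    """Directed graph whose arcs carry a weight w_ij plus arbitrary labels (sketch)."""

    def __init__(self) -> None:
        # arcs[i][j] holds the information attached to the arc between Agent_i and Agent_j
        self.arcs: Dict[str, Dict[str, Dict[str, Any]]] = defaultdict(dict)

    def set_arc(self, i: str, j: str, weight: float, **labels: Any) -> None:
        self.arcs[i][j] = {"weight": weight, **labels}

    def weight(self, i: str, j: str) -> float:
        return self.arcs.get(i, {}).get(j, {}).get("weight", 0.0)

    def criticality(self, k: str) -> float:
        """Hypothetical aggregation: sum of w_ik over all other agents i."""
        return sum(targets[k]["weight"]
                   for i, targets in self.arcs.items()
                   if i != k and k in targets)


g = InterdependenceGraph()
g.set_arc("Agent_i", "Agent_j", 0.5, role="initiator")
g.set_arc("Agent_k", "Agent_j", 0.25)
g.set_arc("Agent_j", "Agent_i", 0.125)
print(g.criticality("Agent_j"))  # 0.75
```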

SLIDE 10
Interdependence Graph

• Interdependence may be defined by considering:
  − $NbM_{ij}$: the number of messages received by Agent_i from Agent_j
  − $NbM(\Delta t) = M_{op}\big(NbM_{1,1}(\Delta t), NbM_{1,2}(\Delta t), \ldots, NbM_{n,n}(\Delta t)\big)$, where $M_{op}$ is the median aggregation operator
  − $\Delta t$: monitoring is activated every $\Delta t$
• A simple adaptation algorithm:
  − $w_{ij}(t_0)$ is initialized by the designer/user
  − $w_{ij}(t + \Delta t) = w_{ij}(t) + \dfrac{NbM_{ij}(\Delta t) - NbM(\Delta t)}{NbM(\Delta t)}$
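The adaptation rule of this slide fits in a few lines of Python, assuming the message counts of one monitoring interval are given as an n × n matrix `counts` (with `counts[i][j]` standing for $NbM_{ij}(\Delta t)$) and the weights as a matrix `w`. Two choices are ours, not the slides': diagonal weights (an agent and itself) are left unchanged, and no update happens when the median is zero, since the rule would divide by zero.

```python
from statistics import median
from typing import List

Matrix = List[List[float]]


def aggregate_nbm(counts: Matrix) -> float:
    """NbM(Δt) = Mop(NbM_11(Δt), NbM_12(Δt), ..., NbM_nn(Δt)) with Mop = median."""
    return median(c for row in counts for c in row)


def adapt_weights(w: Matrix, counts: Matrix) -> Matrix:
    """One monitoring step: w_ij(t + Δt) = w_ij(t) + (NbM_ij(Δt) - NbM(Δt)) / NbM(Δt)."""
    nbm = aggregate_nbm(counts)
    n = len(w)
    new_w = [row[:] for row in w]
    if nbm == 0:
        return new_w  # assumption: no update when the median is zero
    for i in range(n):
        for j in range(n):
            if i != j:  # assumption: self-dependences are left unchanged
                new_w[i][j] = w[i][j] + (counts[i][j] - nbm) / nbm
    return new_w


w0 = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]  # set by the designer at t0
counts = [[0, 4, 2], [6, 0, 2], [2, 2, 0]]               # messages in the last Δt
print(aggregate_nbm(counts))      # 2
print(adapt_weights(w0, counts))  # [[0.0, 2.0, 1.0], [3.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
```

Pairs that exchange more messages than the median see their weight increase; pairs below the median see it decrease, so the weights drift toward the observed interaction pattern.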

SLIDE 11
Multi-Agent Architecture

[Figure: two-level architecture. Agents level: Agent 1 to Agent 4, each paired with a monitor (Monitor 1 to Monitor 4). Observation level: a Host-Monitor on each host (Host_i, Host_j); adaptation algorithm.]

SLIDE 12
Multi-Agent Architecture

• Agent-Monitors
  − observe the domain agents
  − build/update the interdependences of the associated agent
  − control the domain agents
  − …
• Host-Monitors
  − aggregate information and dispatch it back to the agent-monitors
  − manage the resources
  − …
• Domain Agents (agents of the application)
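The division of work between agent-monitors and host-monitors can be sketched as follows, combined with the weight adaptation rule presented earlier in this deck. This is a hypothetical illustration: each agent-monitor counts the messages its agent receives, the host-monitor computes the median over the counts reported by its own agent-monitors only (a simplification of the system-wide median) and dispatches it back, and each agent-monitor then updates its weights. Class and method names are assumptions.

```python
from statistics import median
from typing import Dict, List


class AgentMonitor:
    """Hypothetical agent-monitor attached to one domain agent (Agent_i)."""

    def __init__(self, agent_id: str, initial_weights: Dict[str, float]):
        self.agent_id = agent_id
        self.weights = dict(initial_weights)        # j -> w_ij, set by the designer at t0
        self.counts = {j: 0 for j in self.weights}  # j -> NbM_ij for the current interval

    def observe_message(self, sender: str) -> None:
        """Observe: Agent_i received a message from `sender` (Agent_j)."""
        self.counts[sender] = self.counts.get(sender, 0) + 1
        self.weights.setdefault(sender, 0.0)

    def report(self) -> List[int]:
        return list(self.counts.values())

    def update(self, nbm: float) -> None:
        """Update the interdependences with the value dispatched by the host-monitor."""
        if nbm:
            for j, c in self.counts.items():
                self.weights[j] += (c - nbm) / nbm
        self.counts = {j: 0 for j in self.weights}  # start a new interval


class HostMonitor:
    """Hypothetical host-monitor aggregating the agent-monitors of one host."""

    def __init__(self, monitors: List[AgentMonitor]):
        self.monitors = monitors

    def monitoring_step(self) -> float:
        """Run every Δt: aggregate with the median and dispatch the result back."""
        all_counts = [c for m in self.monitors for c in m.report()]
        nbm = median(all_counts) if all_counts else 0
        for m in self.monitors:
            m.update(nbm)
        return nbm


m1 = AgentMonitor("a1", {"a2": 1.0, "a3": 1.0})
m2 = AgentMonitor("a2", {"a1": 1.0, "a3": 1.0})
for _ in range(3):
    m1.observe_message("a2")
m1.observe_message("a3")
m2.observe_message("a1")
host = HostMonitor([m1, m2])
print(host.monitoring_step(), m1.weights, m2.weights)
# 1.0 {'a2': 3.0, 'a3': 1.0} {'a1': 1.0, 'a3': 0.0}
```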

SLIDE 13
Multi-Agent Architecture

SLIDE 14
Implementation

• DimaX: a fault-tolerant multi-agent platform
  − Various services (naming service, fault detection, replication, …)
  − Agent-monitors and host-monitors
  − …

[Figure: DimaX architecture combining DIMA and DarX, with components Agents, Adaptor, Adaptive Replication Control, Replication, Failure Detection (FD), Observation and Naming/Localization]
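Failure detection is one of the platform services listed above. A common generic technique for detecting crash faults is a heartbeat with a timeout, sketched below; it is not DimaX's actual failure-detection service, and the class name and the `timeout` and `clock` parameters are assumptions. The injectable `clock` makes the detector testable without waiting.

```python
import time
from typing import Callable, Dict, Set


class HeartbeatFailureDetector:
    """Timeout-based crash detector (generic technique, illustrative only)."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout  # max silence (seconds) before an agent is suspected
        self.clock = clock      # injectable so the detector can be tested
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, agent: str) -> None:
        """Record that `agent` showed a sign of life."""
        self.last_seen[agent] = self.clock()

    def suspects(self) -> Set[str]:
        """Agents whose last heartbeat is older than the timeout."""
        now = self.clock()
        return {a for a, t in self.last_seen.items() if now - t > self.timeout}


fake_now = [0.0]
fd = HeartbeatFailureDetector(timeout=2.0, clock=lambda: fake_now[0])
fd.heartbeat("agent-1")
fd.heartbeat("agent-2")
fake_now[0] = 1.5
fd.heartbeat("agent-2")
fake_now[0] = 3.0
print(fd.suspects())  # {'agent-1'}
```

Such a detector only targets crash faults, and in an asynchronous network a timeout can also suspect an agent that is merely slow.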

SLIDE 15
Experiments

• Technical details
  − Multi-agent platform: DIMA (Guessoum & Briot 99)
  − Middleware: DarX (Guessoum et al. 2003)
    • Naming/localization, observation, replication, failure detection
• Example: scheduling meetings. Each agent must:
  − interact with its user to receive meeting requests and the associated information (a title, a description, possible dates, participants, priority, etc.);
  − interact with the other agents of the system to schedule meetings.

SLIDE 16
Experiments

• Robustness
  − 100 agents on 10 machines
  − Failure simulator: randomly stops the thread of an agent
  − Scenario
    • 50 meetings
    • Goal of the MAS: schedule the 50 meetings
  − Rate of successful simulations
    • Number of simulations that did not fail / total number of simulations
  − 3 replication approaches
    • Random
    • Roles
    • Dependences
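The success-rate metric and the three replication approaches can be sketched as below, under stated assumptions: a fixed budget of agents that receive an extra replica, and a precomputed criticality score per agent (from role weights for the "roles" approach, from the interdependence graph for the "dependences" approach). `select_agents` and its parameters are hypothetical; the slides do not say how the replication budget is chosen.

```python
import random
from typing import Dict, Iterable, Optional, Set


def success_rate(outcomes: Iterable[bool]) -> float:
    """Number of simulations that did not fail / total number of simulations."""
    results = list(outcomes)
    if not results:
        raise ValueError("no simulations")
    return sum(results) / len(results)


def select_agents(criticality: Dict[str, float], budget: int, policy: str,
                  rng: Optional[random.Random] = None) -> Set[str]:
    """Hypothetical: pick `budget` agents to receive an extra replica."""
    agents = sorted(criticality)
    if policy == "random":
        rng = rng or random.Random()
        return set(rng.sample(agents, budget))
    if policy in ("roles", "dependences"):
        # the scores come from role weights or from the interdependence graph
        ranked = sorted(agents, key=lambda a: criticality[a], reverse=True)
        return set(ranked[:budget])
    raise ValueError(f"unknown policy: {policy}")


scores = {"a1": 0.1, "a2": 0.9, "a3": 0.5, "a4": 0.3}
print(sorted(select_agents(scores, 2, "dependences")))  # ['a2', 'a3']
print(success_rate([True, True, False, True]))          # 0.75
```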

SLIDE 17
Experiments

• Robustness

SLIDE 18
Conclusions and Future Work

• A new fault-tolerant multi-agent platform (DimaX)
  − Based on DIMA and DarX
  − A new approach to dynamically evaluate the criticality of agents
    • Small applications have been developed (meeting scheduling, …)
• Other categories of faults
  − Timing, Byzantine
• More experiments
  − To validate the proposed approach
  − To better identify:
    • the potential target application domains (load balancing, …)
    • the domains for which the approach is not suited