Multiagent Supervised Training with Agent Hierarchies and Manual Behavior Decomposition - PowerPoint Presentation


SLIDE 1

Multiagent Supervised Training with Agent Hierarchies and Manual Behavior Decomposition

Keith Sullivan Sean Luke

Department of Computer Science, George Mason University Fairfax, VA 22030 USA

SLIDE 2

RoboCup Motivation

SLIDE 3

Motivation for Training

◮ Programming agent behaviors is tedious

◮ Code, test, debug cycles

◮ Changing agent behavior is desirable

◮ Non-programmers (consumers, animators, etc.)

◮ Future tasks, possibly greatly different from original task

◮ Learning from Demonstration (LfD)

◮ Iteratively builds policy from examples (state/action pairs)

◮ Supervised learning

SLIDE 4

Hierarchical Training of Agent Behaviors (HiTAB)

◮ Motivation: Rapidly train complex behaviors with very few examples

◮ Behaviors are automata

◮ Expandable behavior library

◮ Start with atomic behaviors

◮ Iteratively build more complex behaviors via scaffolding

◮ Features describe internal and world conditions

◮ Continuous, toroidal, categorical (boolean)

◮ Behaviors and features are parameterizable
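The parameterization mentioned above can be sketched in a few lines. This is an illustrative Python sketch, not HiTAB's actual implementation; the names (`go_to`, `target`) are hypothetical. A behavior is written against a free variable, which is bound to a concrete value when the behavior is added to the library:

```python
# Sketch of a parameterized behavior: a template takes a free variable
# (`target`) that is bound when the behavior enters the library.
# All names are illustrative, not from the HiTAB codebase.

def go_to(target):
    """Return a concrete behavior that moves an agent toward `target`."""
    def behavior(agent_pos):
        # Move one unit step toward the bound target (1-D for simplicity).
        if agent_pos < target:
            return agent_pos + 1
        if agent_pos > target:
            return agent_pos - 1
        return agent_pos
    return behavior

# Binding the free variable yields distinct library entries.
go_to_can = go_to(target=5)
go_to_deposit = go_to(target=0)

pos = 3
for _ in range(4):
    pos = go_to_can(pos)
print(pos)  # the agent reaches the can at 5 and stays there
```

The same template thus serves many concrete behaviors, which is what keeps the behavior library small.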

SLIDE 5

HiTAB (cont.)

◮ Gathering examples is expensive

◮ Each example is an experiment conducted in real-time

◮ Admission: close to programming by example and far away from machine learning

◮ Limited number of samples, but high dimensional problem!

◮ Behavior decomposition via hierarchical finite automata (HFA)

◮ Per-behavior feature reduction

◮ Learn transition functions → Supervised classification task

◮ C4.5 with probabilistic leaf nodes

◮ Different types of features
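The classification task above can be sketched as follows. The slide specifies C4.5 with probabilistic leaves; as a hedged stand-in, this sketch uses a majority-vote lookup over (current state, feature vector) keys, which is not the paper's algorithm but has the same input/output shape: demonstrated examples in, a transition function out. State and feature names are illustrative.

```python
from collections import Counter, defaultdict

# Each training example pairs the current state and observed features
# with the demonstrated next state. Majority vote over identical
# (state, features) keys stands in for the paper's C4.5 classifier.

def fit_transition(examples):
    votes = defaultdict(Counter)
    for state, features, next_state in examples:
        votes[(state, features)][next_state] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}

def transition(model, state, features):
    # Fall back to staying in the current state for unseen situations.
    return model.get((state, features), state)

# Demonstrations for a can-collecting agent
# (features: can_visible, next_to_can).
examples = [
    ("Looking",   (True,  False), "Acquiring"),
    ("Looking",   (False, False), "Looking"),
    ("Acquiring", (True,  True),  "Grab"),
    ("Acquiring", (False, False), "Looking"),  # lost the can
]
T = fit_transition(examples)
print(transition(T, "Looking", (True, False)))   # Acquiring
print(transition(T, "Acquiring", (True, True)))  # Grab
```

Per-behavior feature reduction corresponds to dropping irrelevant feature columns before fitting, which shrinks the key space and hence the number of examples needed.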

SLIDE 6

Example Behavior

[Figure: (a) a Moore machine for can collecting — states Start, Looking/Looking 2 (turn), Acquiring/Acquiring 2 (forward), and Spinning/Spinning 2 (turn, at desired angle), with transitions such as "can see a can", "next to can → grab, pick random angle", "next to can → release, pick random angle", "lost can", and "can't see can". (b) an HFA composing the macro-behaviors Collect Cans and Run Away with Hide Under Bed, transitioning on "human present", "human absent", and "under bed".]

SLIDE 7

Formal Model

◮ S = {S1, ..., Sn} is the set of states in the automaton. Among other states, there is one start state S1 and zero or more flag states.

◮ B = {B1, ..., Bk} is the set of basic (hard-coded) behaviors.

◮ F = {F1, ..., Fm} is the set of observable features in the environment.

◮ T : F1 × ... × Fm × S → S is the transition function which maps the current state St and the current feature vector ft to a new state St+1.

◮ We generalize the model with free variables (parameters) G1, ..., Gn for basic behaviors and features.
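A minimal executable reading of this model, under illustrative assumptions (one boolean feature, three states, hand-written T rather than a learned one): each state carries a basic behavior from B, and T maps the current feature vector and state to the next state.

```python
# Minimal sketch of the formal model: S (states), B (basic behaviors
# attached to states), and T (transition function over features and
# the current state). All concrete names are illustrative.

S = ["Start", "Forward", "Turn"]          # S1 = "Start" is the start state
B = {"Start": "idle", "Forward": "drive", "Turn": "rotate"}

def T(features, state):
    """Map (f1, ..., fm, St) -> St+1.
    Single boolean feature here: obstacle_ahead."""
    obstacle_ahead = features[0]
    if obstacle_ahead:
        return "Turn"
    return "Forward"

state = S[0]
trace = []
for features in [(False,), (True,), (False,)]:
    state = T(features, state)
    trace.append((state, B[state]))
print(trace)  # [('Forward', 'drive'), ('Turn', 'rotate'), ('Forward', 'drive')]
```

In HiTAB the interesting part is that T is learned from demonstrations rather than hand-written as it is here.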

SLIDE 8

Using HiTAB

◮ Running HiTAB

◮ Begin in start state

◮ Query transition function, transition, perform associated behavior

◮ Training with HiTAB

◮ Alternate training mode and testing mode

◮ Build example database, adding corrections as needed

◮ Trim unused behaviors and features for saving
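The training workflow above can be sketched as an example database where a correction is simply a newer example for the same situation. This is a simplification under stated assumptions: the real system retrains its classifier from the full database rather than using a lookup table, and all names here are illustrative.

```python
# Sketch of HiTAB's train/test alternation: demonstrations fill an
# example database; when testing reveals a mistake, the user adds a
# correction, which here overrides the earlier example for that key.

database = {}

def demonstrate(state, features, next_state):
    database[(state, features)] = next_state

def correct(state, features, next_state):
    # A correction is just a newer example for the same situation.
    database[(state, features)] = next_state

def policy(state, features):
    # Unseen situations default to staying in the current state.
    return database.get((state, features), state)

demonstrate("Looking", ("can_visible",), "Acquiring")
demonstrate("Looking", (), "Looking")
# Testing shows the agent should spin when nothing is visible.
correct("Looking", (), "Spinning")
print(policy("Looking", ()))  # Spinning
```

Trimming for saving would amount to discarding behaviors and features that never appear in the database.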

SLIDE 9

Homogeneous Agent Hierarchy

◮ Problem

◮ Size of learning space grows → number of samples grows

◮ Inverse problem between micro- and macro-level behaviors

◮ Agent hierarchy: tree with coordinator agents as non-leaves and regular agents as leaves

◮ Coordinator agent features: statistical information about subsidiary agents

◮ Agents at same level run same HFA, but might be in different states

◮ Train agents bottom-up
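The coordinator-feature idea above can be sketched as follows (hypothetical feature and state names): a coordinator's HFA never sees its subsidiaries' raw states, only aggregate statistics computed over them.

```python
# Sketch: a coordinator agent observes statistical features computed
# over its subsidiary agents, and its own HFA transitions on those.
# Feature and state names are illustrative.

def coordinator_features(subsidiaries):
    xs = [a["x"] for a in subsidiaries]
    done = sum(a["state"] == "Done" for a in subsidiaries)
    return {
        "mean_x": sum(xs) / len(xs),
        "fraction_done": done / len(subsidiaries),
    }

team = [
    {"x": 0.0, "state": "Done"},
    {"x": 2.0, "state": "Patrol"},
    {"x": 4.0, "state": "Done"},
    {"x": 2.0, "state": "Patrol"},
]
f = coordinator_features(team)
print(f["mean_x"], f["fraction_done"])  # 2.0 0.5
```

Because the coordinator sees only a fixed-size statistical summary regardless of team size, its learning space does not grow with the number of subsidiaries, which is what makes bottom-up training tractable.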

SLIDE 10

Notions of Homogeneity

[Figure: an agent hierarchy — leaf agents run Patrol and Disperse; their coordinators run Collective Patrol and Attack; the root coordinator runs Save Humanity.]

SLIDE 11

Experiments

◮ Simulated box foraging

◮ Known deposit location

◮ Randomly placed boxes

◮ 10 boxes in all experiments

◮ 50 agents: two levels of hierarchy

◮ Teams of 5 agents

◮ Grouped these teams into groups of 5

◮ Boxes require either 5 or 25 agents to pull back

◮ 100 iterations of 100,000 timesteps each

SLIDE 12

Simulation

SLIDE 13

Results

[Plot: mean collected boxes (20–140) vs. timestep (25,000–100,000) for four conditions: Trained Swarm, Trained Groups, Hand-Coded Swarm, and Hand-Coded Groups.]

SLIDE 14

Preliminary Multirobot Work

SLIDE 15

Future Work

◮ Training Multiple Agents

◮ Behavior Bootstrapping

◮ Heterogeneous Groups

◮ Behavior and Capability

◮ Dynamic Hierarchies

◮ Correction of Demonstrator Error