Inferring User Intent for Learning by Observation Kevin R. Dixon - - PowerPoint PPT Presentation



SLIDE 1

Carnegie Mellon

Inferring User Intent for Learning by Observation

Kevin R. Dixon

krd@cs.cmu.edu

Department of Electrical & Computer Engineering Carnegie Mellon University

2004-01-23, Inferring User Intent for LBO – p.1

SLIDE 2

Task Automation

Many users perform their regular tasks manually:
- Reading news
- Watering plants
- Sorting emails
In the robot domain, many industrial tasks are performed manually and not automated at all

SLIDE 3

Task Automation

Many users perform their regular tasks manually:
- Reading news
- Watering plants
- Sorting emails
In the robot domain, many industrial tasks are performed manually and not automated at all
The majority of systems require that users convey knowledge with procedural-programming techniques, which effectively precludes automation in most cases

SLIDE 4

Goal of This Work

Create a system that automates motor-skill tasks by learning from observations of user demonstrations

SLIDE 5

Motor-Skill Tasks

An observable state sequence is sufficient to complete the task
The state sequence maps onto a set of motor commands
The majority of industrial automation involves motor-skill tasks

SLIDE 6

Outline

Background
Modeling the User
Predictive Robot Programming
Hypothesizing about User Actions
Learning By Observation
Conclusions and Open Issues

SLIDE 7

Background

There have been many approaches to improving task automation:
- Iconic programming (Gertz et al., 1995)
- Multi-modal input (Iba et al., 2002)
- Cognitive modeling (Forsythe & Xavier, 2002)
Task automation based on a user demonstration is called Learning By Observation (LBO), also known as Programming by Demonstration or Teaching from Example

SLIDE 8

Learning by Observation

Learning to drive a car (Pomerleau, 1991)
User-interface agents (Cypher, 1993)
Manipulator-robot programming (Asada & Asari, 1988)
Manipulation tasks (Friedrich et al., 1996)
Assembly tasks (Chen & Zelinsky, 2003)

SLIDE 9

Learning by Observation

Learning to drive a car (Pomerleau, 1991)
User-interface agents (Cypher, 1993)
Manipulator-robot programming (Asada & Asari, 1988)
Manipulation tasks (Friedrich et al., 1996)
Assembly tasks (Chen & Zelinsky, 2003)
However, each of these systems imitates the user

SLIDE 10

Problem with Imitation

Initially, the self-driving system could not drive for long distances (Pomerleau, 1991). This is because drivers never show the system how to correct during normal operation (Pomerleau, 1996)

SLIDE 11

Problem with Imitation

Initially, the self-driving system could not drive for long distances (Pomerleau, 1991). This is because drivers never show the system how to correct during normal operation (Pomerleau, 1996) The solution was to infer user intent

SLIDE 12

User Intent

Consider user demonstrations as a sequence of goal-directed actions
We interpret demonstrations by inferring user intent

SLIDE 13

User Intent

Consider user demonstrations as a sequence of goal-directed actions
We interpret demonstrations by inferring user intent
Informally, user intent is the sequence of actions invariant to:
- Changes in the environment
- Human imprecision
- Sensor noise

SLIDE 14

User Intent

Consider user demonstrations as a sequence of goal-directed actions
We interpret demonstrations by inferring user intent
Informally, user intent is the sequence of actions invariant to:
- Changes in the environment
- Human imprecision
- Sensor noise
Even more informally, “what the user would have done”

SLIDE 15

User Intent

Inferring user intent is relatively new:
- Hierarchical search for invariants (Alissandrakis et al., 2002; Billard et al., 2003)
- Finite-state machines (Skubic & Volz, 2000; Nicolescu & Matarić, 2001)

SLIDE 16

Our Approach

Build a statistical model based on user observations

[Diagram: Environment → Sensor → Hypothesis. Environment: ω0, ω1, . . . , ωM; Subgoals: y0, y1, . . . , yn]

Extract a set of subgoals as a function of the environment

SLIDE 17

Our Approach

Hypothesize about “what the user would have done”

[Diagram: Environment → Sensor → Hypothesis. Environment: ω0, ω1, . . . , ωM; Estimated Subgoals: ŷ0, ŷ1, . . . , ŷn]

System provides the sequence of subgoals specifying how to achieve the goal

SLIDE 18

Outline

Background
Modeling the User
Predictive Robot Programming
Hypothesizing about User Actions
Learning By Observation
Conclusions and Open Issues

SLIDE 19

Modeling the User

Due to the low precision of humans, observations can be considered noise-corrupted
There are hidden, or latent, causes of user behavior
There are many possible ways to complete a task
It is difficult to instrument fully realistic environments

SLIDE 20

Modeling the User

Due to the low precision of humans, observations can be considered noise-corrupted
There are hidden, or latent, causes of user behavior
There are many possible ways to complete a task
It is difficult to instrument fully realistic environments
One flexible model of the user is a Continuous-Density Hidden Markov Model (CDHMM)

SLIDE 21

Estimating a CDHMM

CDHMMs have continuous observations
Many researchers have used HMMs to describe human actions (Hannaford & Lee, 1991)
These used fixed-topology models (Hovland et al., 1996)

SLIDE 22

Estimating a CDHMM

CDHMMs have continuous observations
Many researchers have used HMMs to describe human actions (Hannaford & Lee, 1991)
These used fixed-topology models (Hovland et al., 1996)
We do not know a priori what tasks the user will perform
Estimate the structure of the target CDHMM from user observations

SLIDE 23

Structure Estimation in CDHMMs

Fixed-topology HMMs have well-known algorithms, e.g. Baum-Welch and Viterbi (Rabiner, 1989)
Estimating HMM structure is hard (Gillman & Sipser, 1994; Abe & Warmuth, 1992)
Most work focuses on:
- Heuristic methods (Stolcke & Omohundro, 1994)
- Special subclasses (Ron et al., 1998)
- Asymptotic behavior (Ephraim & Merhav, 2002)
These have only been used in theory or on hand-crafted problems, except in speech recognition (Singh et al., 2002)

SLIDE 24

Learning Algorithm Overview

Input: Set of demonstrations
Output: CDHMM describing the demonstrations

SLIDE 25

Learning Algorithm Overview

Input: Set of demonstrations
Output: CDHMM describing the demonstrations
Requirements:
- Continuous-density observations
- CDHMM should be simple
- Low computational complexity
- Correctness

SLIDE 26

Our Approach

Consider each task as a random walk through a target CDHMM
Assign each observation to a node in a graph

[Figure: observations x1, x2, x3 assigned to graph nodes]

Repeatedly merge similar nodes
Use fixed-topology estimation on the resulting structure
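As an illustrative sketch only, the merge step might look like the following. The similarity test here is a simple distance threshold between node means, standing in for the talk's actual CDHMM state-similarity criterion, and the mapping from δ to a distance is an assumption:

```python
import numpy as np

def merge_similar_nodes(observations, delta=0.5):
    """Greedily merge graph nodes whose means are closer than a
    threshold. delta plays the role of the similarity parameter:
    delta -> 0 merges loosely (simpler model), delta -> 1 strictly."""
    # Start with one node per observation; each node keeps its members.
    nodes = [[np.asarray(o, dtype=float)] for o in observations]
    spread = np.ptp(np.asarray(observations, dtype=float), axis=0).max()
    threshold = (1.0 - delta) * spread  # assumed mapping from delta to a distance

    merged = True
    while merged:
        merged = False
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                mi = np.mean(nodes[i], axis=0)
                mj = np.mean(nodes[j], axis=0)
                if np.linalg.norm(mi - mj) < threshold:
                    nodes[i].extend(nodes.pop(j))  # merge node j into node i
                    merged = True
                    break
            if merged:
                break
    return [np.mean(n, axis=0) for n in nodes]
```

After merging, fixed-topology estimation (e.g. Baum-Welch) would run on the resulting structure.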

SLIDE 27

Similarity Merging In Action

[Figure: original graph, one node per observation]

Merging using “loose” and “strict” definitions of similarity

SLIDE 28

Similarity Merging In Action

[Figure: original graph and the smaller graph produced by “loose” merging]

Merging using “loose” and “strict” definitions of similarity

SLIDE 29

Similarity Merging In Action

[Figure: original graph and the graphs produced by “loose” and “strict” merging]

Merging using “loose” and “strict” definitions of similarity

SLIDE 30

Properties of the Algorithm

The algorithm only produces graphs where:
- All similar nodes are merged
- All dissimilar nodes are unmerged
This implies a locally minimal number of nodes, but not necessarily globally minimal
The worst-case computational complexity is quadratic in the number of observations

SLIDE 31

Correctness

The probability of error decreases exponentially as more tasks are added

[Plot: probability of error vs. number of tasks]

However, the theorem deals with estimating individual states, not a sequence of observations

SLIDE 32

Outline

Background
Modeling the User
Predictive Robot Programming
Hypothesizing about User Actions
Learning By Observation
Conclusions and Open Issues

SLIDE 33

Theory Meets Humanity

Properties of the learning algorithm assume ideal conditions
It remains to be seen how the algorithm works on real-world data
Test it in relative isolation with an application: Predictive Robot Programming

SLIDE 34

Predictive Robot Programming

As capabilities increase, robots are performing more sophisticated tasks
Simple tasks take days to program; complex tasks take weeks or months
Significant programming time means that production may have to be halted temporarily
Decreasing programming time will increase the appeal of robotic automation

SLIDE 35

Predictive Robot Programming

Most tasks can be decomposed into simpler subtasks
Subtasks may be repeated many times throughout the program
However, most robot programmers recreate the subtasks from scratch each time

SLIDE 36

Visualizing Similarity

[Plots: end-effector waypoints for Task A and Task B in x, y, z (m), each beginning at “Start”]

Waypoints describe the location and orientation of the end effector
Shown are waypoints from two different subroutines
They are different, but contain a common pattern that is translated and rotated

SLIDE 37

Predictive Robot Programming

Key idea behind Predictive Robot Programming:
- Learn from previous user actions
- Reduce programming time by identifying and completing subtasks automatically
As an analogy, think of word-completion programs

SLIDE 38

Predictive Robot Programming

[Diagram: Subgoals y0, y1, . . . , yn → Hypothesis → Prediction ŷn+1]

User supplies subgoals
The environment is ignored
System only predicts the next subgoal (maximum likelihood)

SLIDE 39

Predictive Robot Programming

We present two sets of results:
- Offline programming: prediction accuracy on real-world data
- Online programming: decrease in programming time in the laboratory

SLIDE 40

Offline-Programming Context

5 arc-welding programs, producing different products
252–1899 waypoints, 16–196 subroutines
The programs took a professional robot programmer over 70 days to create

SLIDE 41

Offline-Programming Methodology

[Flowchart: Initialize CDHMM; for each subroutine, get each waypoint, predict the next waypoint, and if the prediction has sufficient confidence, compute the prediction error; then add the subroutine to the CDHMM]

Repeat this process for each robot program

SLIDE 42

Model Complexity

Quantify similarity by δ ∈ (0, 1]

δ → 0 induces simple CDHMMs
δ → 1 induces complex CDHMMs

[Plots: number of states and per-waypoint running time (s) vs. δ]

SLIDE 43

Modeling the User

[Plot: average median error (m) vs. δ]

When δ is too small, the CDHMM is too simple
When δ is too large, the CDHMM overfits
A value of δ ∈ (0.5, 0.9) induces the “right” complexity

SLIDE 44

Prediction Confidence

Regardless of criterion, a prediction will always exist
Should only suggest the most accurate predictions
Need a causal statistic

SLIDE 45

Prediction Confidence

Regardless of criterion, a prediction will always exist
Should only suggest the most accurate predictions
Need a causal statistic
Compute prediction confidence, φn ∈ [0, 1], based on CDHMM entropy from observing the current task
- High entropy results in low confidence
- Low entropy results in high confidence
Filter predictions based on a confidence threshold
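One way such an entropy-based confidence could be computed (an illustrative normalization; the slides only state that high entropy should yield low confidence):

```python
import numpy as np

def prediction_confidence(state_belief):
    """Map the entropy of the belief over CDHMM states to a
    confidence phi in [0, 1]: a peaked belief gives phi near 1,
    a uniform belief gives phi near 0."""
    p = np.asarray(state_belief, dtype=float)
    p = p / p.sum()                      # normalize to a distribution
    nz = p[p > 0]                        # ignore zero-probability states
    entropy = -np.sum(nz * np.log(nz))   # Shannon entropy
    max_entropy = np.log(len(p))         # entropy of the uniform belief
    return 1.0 - entropy / max_entropy if max_entropy > 0 else 1.0
```

A prediction would then be suggested only when φn exceeds the chosen threshold (0.5 or 0.8 in the experiments reported later).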

SLIDE 46

Prediction Confidence

[Plot: percentage of useful and total predictions vs. confidence threshold]

A useful prediction is within 1 millimeter of the target
Confidence correlates with prediction error at −0.89 (p ≪ 0.01)

SLIDE 47

Temporal Performance

[Plot: median error (m), log scale, vs. waypoint number]

Error generally decreases as more waypoints are added (median: 190 microns, 0.38% Cartesian error)

SLIDE 48

Offline-Programming Results

The majority of predictions are useful
Median errors: 30–200 microns Cartesian error (0.03%–0.2%)
Running time per waypoint: constructing the CDHMM and predicting takes 30 ms

SLIDE 49

Online-Programming Setup

The ultimate criterion is reduction in programming time
We collected 44 robot programs from 3 users in a laboratory setting

SLIDE 50

Online-Programming Results

Prediction Criterion | mean (sec) | std (sec) | change   | Wilcoxon Conf
Baseline             | 292.2      | 78.61     | N/A      | N/A
φn ≥ 0.8             | 193.2      | 32.07     | −33.88%  | 99.95%
φn ≥ 0.5             | 178.0      | 33.39     | −39.08%  | 99.99%

Programming time used to complete the tasks with no predictions, high-confidence predictions, and low-confidence predictions
Statistically significant drop when using predictions
No statistical significance between the two prediction criteria

SLIDE 51

Limitations

Indicating the location of a prediction
Collisions are a problem
A monolithic CDHMM does not represent entire tasks well

SLIDE 52

Outline

Background
Modeling the User
Predictive Robot Programming
Hypothesizing about User Actions
Learning By Observation
Conclusions and Open Issues

SLIDE 53

Shelter from the Storm

The algorithm appears to model user actions well
But these were relatively sheltered conditions
Next, inject more realistic factors

SLIDE 54

Learning By Observation

Allow users to program mobile robots by demonstrating a trajectory
Consider a vacuum-cleaning robot: it is undesirable to require retraining each time furniture is moved
Create a system that automates motor-skill tasks regardless of environment, occlusion, and noise

SLIDE 55

Learning By Observation

Allow users to program mobile robots by demonstrating a trajectory
Consider a vacuum-cleaning robot: it is undesirable to require retraining each time furniture is moved
Create a system that automates motor-skill tasks regardless of environment, occlusion, and noise
Within reason, of course

SLIDE 56

Methodology

[Flowchart: Observe User → Compute Subgoals → Associate Subgoals with Environment → (another demo? if yes, repeat) → Map Demos to Same Environment → Learn from Demonstrations → Perform Task]

In Predictive Robot Programming we had only “Learn from Demos” and “Perform Task”
Now we have added:
- Sensor issues
- Extraction of subgoals
- Environment considerations

SLIDE 57

Agent Orange

ActivMedia Pioneer DX II
SICK scanning laser range finder
Carmen toolkit (Montemerlo et al., 2003)

SLIDE 58

Observing the User

[Figure: map of laboratory and a sample laser scan, with labels: Wall, Chair Legs, Agent Orange, Desk, Human Legs, Computer]

Object occlusion is handled by Kalman filters

SLIDE 59

Hypothesizing about User Actions

Requirements:
- Emulate “what the user would have done” in novel conditions
- Ability to be mapped to different environments
- Incorporate multiple examples
- Facilitate learning
Needed: a method for representing trajectories

SLIDE 60

Representing Trajectories

There has been much work in representing trajectories:
- Cubic splines (Craig, 1989)
- Bézier curves (Hwang et al., 2003)
- Predicate calculus (Nicolescu & Matarić, 2001)
- Nonlinear differential equations (Schaal et al., 2003)

SLIDE 61

Representing Trajectories

There has been much work in representing trajectories:
- Cubic splines (Craig, 1989)
- Bézier curves (Hwang et al., 2003)
- Predicate calculus (Nicolescu & Matarić, 2001)
- Nonlinear differential equations (Schaal et al., 2003)
We want to store user trajectories in a generative manner

SLIDE 62

Representing Trajectories

There has been much work in representing trajectories:
- Cubic splines (Craig, 1989)
- Bézier curves (Hwang et al., 2003)
- Predicate calculus (Nicolescu & Matarić, 2001)
- Nonlinear differential equations (Schaal et al., 2003)
We want to store user trajectories in a generative manner
Our approach: Sequenced Linear Dynamical Systems (LDS)

SLIDE 63

Sequenced LDS

Segment the trajectory at important points, the subgoals
Represent each segment as a single Linear Dynamical System
Reconstruct the trajectory by running the LDS estimates in sequence

SLIDE 64

LDS Example

[Figure: example trajectory, beginning at “Start”]

SLIDE 65

LDS Example

Start

Fit the trajectory with the least-squares LDS
Reproduce the trajectory with the estimated LDS

SLIDE 66

LDS Example

Start

Fit the trajectory with the least-squares LDS
Reproduce the trajectory with the estimated LDS
The response of the LDS to various initial conditions represents a hypothesis of user intent

SLIDE 67

LDS Example

Start

Fit the trajectory with the least-squares LDS
Reproduce the trajectory with the estimated LDS
The response of the LDS to various initial conditions represents a hypothesis of user intent
The response of the LDS to different subgoals also represents a hypothesis of user intent

SLIDE 68

LDS Example

[Figure: two generated trajectories, each beginning at “Start”]

SLIDE 69

The Building Blocks

Suppose we have a trajectory, X = {x0, x1, . . . , xN}
Assume the data are generated according to

xn+1 = R (xn − xN) + xn
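A quick numerical check of this generative model, with a hand-picked stable R rather than one estimated from data:

```python
import numpy as np

# Building-block dynamics: x_{n+1} = R (x_n - x_N) + x_n.
# R is hand-picked here so that (I + R) has eigenvalues inside the
# unit circle, which makes the iteration converge to the terminus x_N.
R = np.array([[-0.2, -0.1],
              [ 0.1, -0.2]])
x_goal = np.array([1.0, 2.0])   # x_N, the trajectory terminus
x = np.array([0.0, 0.0])        # initial condition x_0

for _ in range(200):
    x = R @ (x - x_goal) + x    # iterate the assumed generative model

# x is now numerically at the goal x_N
```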

SLIDE 70

The Building Blocks

Suppose we have a trajectory, X = {x0, x1, . . . , xN}
Assume the data are generated according to

xn+1 = R (xn − xN) + xn

Captures direction, curvature, speed

SLIDE 71

The Building Blocks

Suppose we have a trajectory, X = {x0, x1, . . . , xN}
Assume the data are generated according to

xn+1 = R (xn − xN) + xn

Captures direction, curvature, speed Specifies trajectory terminus

SLIDE 72

LDS Estimation

Assume subgoal is last point in trajectory segment

SLIDE 73

LDS Estimation

Assume the subgoal is the last point in the trajectory segment
The least-squares solution for R is

R̂ = ([x1 · · · xN] − [x0 · · · xN−1]) ([x0 · · · xN−1] − [xN · · · xN])†
   = (X1:N − X0:N−1) (X0:N−1 − ΓN)†

SLIDE 74

LDS Estimation

Assume the subgoal is the last point in the trajectory segment
The least-squares solution for R is

R̂ = ([x1 · · · xN] − [x0 · · · xN−1]) ([x0 · · · xN−1] − [xN · · · xN])†
   = (X1:N − X0:N−1) (X0:N−1 − ΓN)†

Yes, it is a closed-form solution

SLIDE 75

Reconstructing Trajectories

Use the induced control law with x̂0 = x0 and

x̂n+1 = R̂ (x̂n − xN) + x̂n

We guarantee stability under reasonable conditions:
- Bounded trajectories
- Termination at the desired subgoal

SLIDE 76

Improving the Generalization

Learning from more trajectories improves generalization

SLIDE 77

Improving the Generalization

[Plots: two demonstrated trajectories, each beginning at its own “Start”]

Learning from more trajectories improves generalization
Two trajectories with “slight counter-clockwise curvature”

SLIDE 78

Improving the Generalization

[Plots: the two demonstrations and the corresponding generalized trajectories, each beginning at “Start”]

Learning from more trajectories improves generalization
Two trajectories with “slight counter-clockwise curvature”

SLIDE 79

A Single LDS Is Not Enough

A single LDS is an extremely compact representation
But its simplicity is insufficient for LBO tasks

SLIDE 80

A Single LDS Is Not Enough

A single LDS is an extremely compact representation
But its simplicity is insufficient for LBO tasks
Segment complicated trajectories based on predictability

SLIDE 81

A Single LDS Is Not Enough

Use the LDS to predict the next observation and segment the trajectory at poorly predicted points
These points mark subgoals in the trajectory

SLIDE 82

A Single LDS Is Not Enough

Use the LDS to predict the next observation and segment the trajectory at poorly predicted points
These points mark subgoals in the trajectory

SLIDE 83

Trading Simplicity for Accuracy

[Figures: segmentations with prediction-error thresholds 0.4 and 0.8; plots of the number of subgoals and the average trajectory error vs. the prediction-error threshold]

SLIDE 84

Outline

Background
Modeling the User
Predictive Robot Programming
Hypothesizing about User Actions
Learning By Observation
Conclusions and Open Issues

SLIDE 85

The Real World

We now have a model of user actions
We can form hypotheses about user actions under different conditions
We apply these abilities to learning motor skills in mobile robots

SLIDE 86

Observing the User

Place slalom cones in the lab and track the user

SLIDE 87

Learning from the User

Estimating the user trajectory
6 subgoals were extracted
The average error of the estimate is 20 millimeters

SLIDE 88

Learning from the User

10 runs of the robot
5 runs where the robot was “kidnapped”

SLIDE 89

Environment Changes

“What would the user have done?”
Automatically associate subgoals with objects in the environment
LDS estimates automatically adjust their responses

SLIDE 90

Environment Changes

“What would the user have done?”

SLIDE 91

Environment Changes

“What would the user have done?”
We asked the user to perform the same task
The average error of the estimate is 200 millimeters

SLIDE 92

Learning in Different Environments

Can demonstrations from these two environments help in performing in another?

SLIDE 93

Learning in Different Environments

Learning from demonstrations in different environments

SLIDE 94

Learning in Different Environments

Learning from demonstrations in different environments
First, map subgoals to the same environment

SLIDE 95

Learning in Different Environments

Learning from demonstrations in different environments
First, map subgoals to the same environment
Construct a CDHMM describing the subgoals

SLIDE 96

Learning in Different Environments

Learning from demonstrations in different environments
First, map subgoals to the same environment
Construct a CDHMM describing the subgoals
Determine the most-likely sequence of subgoals needed to complete the task

SLIDE 97

Learning in Different Environments

Two demonstrations and corresponding subgoals

SLIDE 98

Learning in Different Environments


CDHMM describing the subgoals with most-likely sequence shown in green

SLIDE 99

Learning in Different Environments

Individually, the average errors are 364 and 236 millimeters
Together, the average error is 189 millimeters

SLIDE 100

LBO Discussion

The computational approach appears viable
It permits learning from:
- Multiple demonstrations
- Demonstrations in different environments
Environment mapping is the weakest aspect: the Hungarian Method requires a bijective mapping
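The bijective environment mapping can be phrased as an assignment problem. A brute-force illustration over permutations follows (the Hungarian Method computes the same optimum in polynomial time, but, as noted, only for bijective mappings; function and variable names are mine):

```python
import numpy as np
from itertools import permutations

def best_bijective_mapping(objects_a, objects_b):
    """Match each object in environment A to exactly one object in
    environment B, minimizing the total distance between matched
    pairs. Returns (mapping, cost), where mapping[i] is the index in
    B assigned to object i of A."""
    A = np.asarray(objects_a, dtype=float)
    B = np.asarray(objects_b, dtype=float)
    assert len(A) == len(B)  # a bijection needs equally many objects
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(B))):
        cost = sum(np.linalg.norm(A[i] - B[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(best_perm), best_cost
```

The equal-count assertion is exactly the limitation mentioned on this slide: environments with unequal or only partially corresponding object sets cannot be matched this way.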

SLIDE 101

Outline

Background
Modeling the User
Predictive Robot Programming
Hypothesizing about User Actions
Learning By Observation
Conclusions and Open Issues

SLIDE 102

Conclusions and Contributions

Developed an algorithm that efficiently estimates the structure of CDHMMs
Created a PRP system that predicts waypoints in manipulator programs, based on previous observations, with high accuracy
Developed an LDS trajectory representation that forms hypotheses of user actions under different conditions
Created a mobile-robot LBO system that learns motor skills by observing user demonstrations

SLIDE 103

Open Issues / Ongoing Work

Extending the computational approach to include higher-level knowledge: computational, symbolic, or hybrid
Incorporating semi-supervised (partially labeled) learning techniques to improve LBO and PRP
Heuristic environment-mapping methods

SLIDE 104

Thank You

SLIDE 105

Extra material

References
Material for clarification
Theorems
Proof sketches
...

SLIDE 106

References

Abe, N., & Warmuth, M. K. (1992). On the computational complexity of approximating probability distributions by probabilistic automata. Machine Learning, 9.

Alissandrakis, A., Nehaniv, C. L., & Dautenhahn, K. (2002). Imitation with ALICE: Learning to imitate corresponding actions across dissimilar embodiments. IEEE Transactions on Systems, Man, and Cybernetics – Part A: Systems and Humans, 32.

Asada, H., & Asari, Y. (1988). The direct teaching of tool manipulation skills via the impedance identification of human motions. Proceedings of the IEEE International Conference on Robotics and Automation.

Billard, A., Epars, Y., Cheng, G., & Schaal, S. (2003). Discovering imitation strategies through categorization of multi-dimensional data. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems.

Chen, J., & Zelinsky, A. (2003). Programming by demonstration: Coping with suboptimal teaching actions. International Journal of Robotics Research, 22.

Craig, J. J. (1989). Introduction to robotics: Mechanics and control (2nd ed.). Addison Wesley.

Cypher, A. (Ed.). (1993). Watch what I do: Programming by demonstration. Cambridge, MA: MIT Press.

Ephraim, Y., & Merhav, N. (2002). Hidden Markov processes. IEEE Transactions on Information Theory, 48.

76-1

slide-107
SLIDE 107

Forsythe, C., & Xavier, P. G. (2002). Human emulation: Progress toward realistic synthetic human agents. Proceedings of the 11th Conference on Computer-Generated Forces and Behavior Representation.

Friedrich, H., Münch, S., Dillmann, R., Bocionek, S., & Sassin, M. (1996). Robot programming by demonstration (RPD): Supporting the induction by human interaction. Machine Learning, 23, 163–189.

Gertz, M. W., Maxion, R. A., & Khosla, P. K. (1995). Visual programming and hypermedia implementation within a distributed laboratory environment. Journal of Intelligent Automation and Soft Computing.

Gillman, D., & Sipser, M. (1994). Inference and minimization of hidden Markov chains. Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory (COLT).

Hannaford, B., & Lee, P. (1991). Hidden Markov model analysis of force/torque information in telemanipulation. International Journal of Robotics Research, 10.

Hovland, G., Sikka, P., & McCarragher, B. (1996). Skill acquisition from human demonstration using a hidden Markov model. Proceedings of the IEEE International Conference on Robotics and Automation.

Hwang, J.-H., Arkin, R. C., & Kwon, D.-S. (2003). Mobile robots at your fingertip: Bezier curve on-line trajectory generation for supervisory control. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems.

Iba, S., Paredis, C. J., & Khosla, P. K. (2002). Interactive multi-modal


slide-108
SLIDE 108

robot programming. Proceedings of the IEEE International Conference on Robotics and Automation.

Montemerlo, M., Roy, N., & Thrun, S. (2003). Perspectives on standardization in mobile robot programming: The Carnegie Mellon navigation (CARMEN) toolkit. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems.

Nicolescu, M. N., & Matarić, M. J. (2001). Learning and interacting in human-robot domains. IEEE Transactions on Systems, Man, and Cybernetics – Part A: Systems and Humans, 31.

Papadimitriou, C. H., & Steiglitz, K. (1998). Combinatorial optimization: Algorithms and complexity. Mineola, New York: Dover Publications. Second edition.

Pomerleau, D. (1991). Efficient training of artificial neural networks for autonomous navigation. Neural Computation, 3, 88–97.

Pomerleau, D. (1996). Neural network vision for robot driving. Early Visual Learning. Oxford University Press.

Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77, 257–286.

Ron, D., Singer, Y., & Tishby, N. (1998). On the learnability and usage of acyclic probabilistic finite automata. Journal of Computer and System Sciences, 56.

Schaal, S., Ijspeert, A., & Billard, A. (2003). Computational approaches to motor learning by imitation. Philosophical Transactions of the Royal Society of London: Series B, Biological Sciences, 358, 537–547.


slide-109
SLIDE 109

Singh, R., Raj, B., & Stern, R. M. (2002). Automatic generation of subword units for speech recognition systems. IEEE Transactions on Speech and Audio Processing, 10, 89–99.

Skubic, M., & Volz, R. A. (2000). Acquiring robust, force-based assembly skills from human demonstration. IEEE Transactions on Robotics and Automation, 16.

Stolcke, A., & Omohundro, S. (1994). Inducing probabilistic grammars by Bayesian model merging. International Conference on Grammatical Inference.


slide-110
SLIDE 110

Carnegie Mellon

Algorithm Learn-Structure

Algorithm Learn-Structure
X = {X^0, X^1, ..., X^M} is the multiset of all observation sequences. ε ≥ 0 is the similarity threshold.

 1: V := ∅, E := ∅
 2: G_X := (V, V^0, E, X, V, f, g)
 3: for all X^i ∈ {X^0, X^1, ..., X^M}
 4:   for all x_n ∈ {x^i_0, x^i_1, ..., x^i_{N_i}}
 5:     ε_min := min_{v_i ∈ V} μ_C(V_{v_i} ∪ {x_n}, V̄_{v_i ∪ x_n})
 6:     if ε_min ≤ ε then
 7:       v_new := arg min_{v_i ∈ V} μ_C(V_{v_i} ∪ {x_n}, V̄_{v_i ∪ x_n})
 8:       V_{v_new} := V_{v_new} ∪ {x_n}
 9:     end if
10:     else if ε_min > ε then
11:       create empty node v_new
12:       V := V ∪ {v_new}
13:       V_{v_new} := {x_n}
14:       g_{v_new} := 0
15:       if n > 0 then
16:         E := E ∪ {e_{prev→new}}
17:         f_{e_{prev→new}} := 0
18:       end if
19:     end else if
20:     if n > 0 then
21:       f_{e_{prev→new}} := f_{e_{prev→new}} + 1
22:     end if
23:     else if n = 0 then
24:       g_{v_new} := g_{v_new} + 1
25:     end else if
26:     v_prev := v_new
27:   end for all
28: end for all

2004-01-23, Inferring User Intent for LBO – p.77
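As a rough illustrative sketch only (not the thesis implementation), the Learn-Structure loop above might look like this in Python; the diagonal weight vector `C_diag`, the helper `mu_c`, and the list-based graph bookkeeping are simplifications of my own:

```python
def mu_c(points, C_diag):
    """Similarity measure: total C-weighted squared deviation of each
    point in `points` from the set's mean (diagonal C for simplicity)."""
    d = len(points[0])
    mean = [sum(p[k] for p in points) / len(points) for k in range(d)]
    return sum(C_diag[k] * (p[k] - mean[k]) ** 2
               for p in points for k in range(d))

def learn_structure(sequences, eps, C_diag):
    """Cluster observations into graph nodes, counting edge traversals
    (f) and sequence-start frequencies (g), as in Learn-Structure."""
    nodes = []          # V_v: observations assigned to each node
    edge_count = {}     # f_e: edge traversal counts
    start_count = {}    # g_v: how often a node begins a sequence
    for seq in sequences:
        prev = None
        for n, x in enumerate(seq):
            # find the most similar existing node, if any
            best, best_cost = None, None
            for i, pts in enumerate(nodes):
                cost = mu_c(pts + [x], C_diag)
                if best_cost is None or cost < best_cost:
                    best, best_cost = i, cost
            if best is not None and best_cost <= eps:
                nodes[best].append(x)       # merge into existing node
                v_new = best
            else:
                nodes.append([x])           # create a new node
                v_new = len(nodes) - 1
                start_count.setdefault(v_new, 0)
            if n > 0:
                key = (prev, v_new)
                edge_count[key] = edge_count.get(key, 0) + 1
            else:
                start_count[v_new] = start_count.get(v_new, 0) + 1
            prev = v_new
    return nodes, edge_count, start_count
```

Two demonstrations of the same task then share nodes wherever their observations fall within the similarity threshold of each other.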

slide-111
SLIDE 111

Carnegie Mellon

Similarity-Measure Merging

Measure of similarity is defined to be

μ_C(V_{v_k}, V̄_{v_k}) = Σ_{x ∈ V_{v_k}} ‖x − V̄_{v_k}‖²_C

where V̄_{v_k} denotes the mean of the observations in V_{v_k}.

Two nodes are considered similar if

μ_C(V_{v_i} ∪ V_{v_j}, V̄_{v_i ∪ v_j}) ≤ ε

2004-01-23, Inferring User Intent for LBO – p.78
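A minimal sketch of the similarity measure and the merge test, assuming a diagonal weight matrix C (function names are my own):

```python
def weighted_sq_dist(x, y, C_diag):
    # ||x - y||^2_C with a diagonal weight matrix C
    return sum(c * (a - b) ** 2 for c, a, b in zip(C_diag, x, y))

def mu_c(points, C_diag):
    # mu_C: total C-weighted squared deviation from the set's mean
    d = len(points[0])
    mean = [sum(p[k] for p in points) / len(points) for k in range(d)]
    return sum(weighted_sq_dist(p, mean, C_diag) for p in points)

def similar(points_i, points_j, eps, C_diag):
    # Merge test: nodes are similar if the pooled scatter stays below eps
    return mu_c(points_i + points_j, C_diag) <= eps
```

The merge test pools both nodes' observations and asks whether the combined cluster is still tight enough around its mean.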

slide-112
SLIDE 112

Carnegie Mellon

Compactness

Definition 1. An observation graph is ε-compact with respect to μ_C if, for any depth n and all distinct nodes v_i, v_j ∈ V^n,

μ_C(V_{v_i}, V̄_{v_i}) ≤ ε; ε < μ_C(V_{v_i} ∪ V_{v_j}, V̄_{v_i ∪ v_j}),

for some ε ≥ 0.

Theorem 1. Learn-Structure only produces ε-compact observation graphs.

Proof uses: ∃ Â ⊆ A s.t. τ < μ_C(Â, V̄_Â) ⟺ τ < μ_C(A, V̄_A)

2004-01-23, Inferring User Intent for LBO – p.79

slide-113
SLIDE 113

Carnegie Mellon

Model Complexity

Similarity threshold ε does not have a real-world analogue
Define a "complexity" parameter, δ ∈ (0, 1]:

Pr{ ‖x − V̄_{v_i}‖²_{Σ⁻¹} ≤ ε } > 1 − δ

[Plot: the χ²_7 cumulative distribution function, relating the threshold ε to the confidence 1 − δ.]

2004-01-23, Inferring User Intent for LBO – p.80
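Assuming Gaussian observation noise, ‖x − V̄‖²_{Σ⁻¹} follows a χ² distribution with d degrees of freedom, so ε can be read off as the (1 − δ)-quantile of that distribution. A stdlib-only Monte Carlo sketch of that mapping (the function name and sampling approach are my own):

```python
import random

def epsilon_for_delta(d, delta, n_samples=50_000, seed=0):
    """Monte Carlo (1 - delta)-quantile of a chi-squared distribution
    with d degrees of freedom, sampled as sums of d squared normals."""
    rng = random.Random(seed)
    samples = sorted(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))
                     for _ in range(n_samples))
    idx = min(int((1.0 - delta) * n_samples), n_samples - 1)
    return samples[idx]
```

For d = 7 and δ = 0.5 this lands near the χ²_7 median; smaller δ (more confidence) demands a larger ε, i.e., coarser clusters.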

slide-114
SLIDE 114

Carnegie Mellon

Correctness

Theorem 2. Let λ be a CDHMM converted by one-shot estimation from Learn-Structure with ε ≥ 0 and M iid tasks generated by some source CDHMM λ*. If the state q* ∈ λ* generates observations with finite first and second moments, then

Pr{ λ →_γ q* ∈ λ* } > [1 − (1 − p*)^M] [1 − tr[C Var(x | q*, λ*)] / (γ − √ε)²],

where γ > √ε, p* = P(c_n = q* | λ*) > 0, and Var(x | q*) is the observation-generation variance from state q*.

Proof uses a multivariate Chebyshev's Inequality:

Pr{ ‖x − y‖_C ≥ ε } ≤ ( tr[C Var(x)] + ‖y − E{x}‖²_C ) / ε²

2004-01-23, Inferring User Intent for LBO – p.81
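A quick scalar sanity check of the Chebyshev-style inequality (C = 1), a sketch of my own rather than anything from the proof:

```python
import random

def chebyshev_check(mean, var, y, eps, n=100_000, seed=1):
    """Empirically compare Pr{|x - y| >= eps} with the bound
    (Var(x) + (y - E[x])^2) / eps^2 for scalar Gaussian x (C = 1)."""
    rng = random.Random(seed)
    sd = var ** 0.5
    hits = sum(abs(rng.gauss(mean, sd) - y) >= eps for _ in range(n))
    empirical = hits / n
    bound = (var + (y - mean) ** 2) / eps ** 2
    return empirical, bound
```

The empirical tail probability always sits below the bound; the bound is loose but requires only first and second moments, which is exactly what Theorem 2 assumes.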

slide-115
SLIDE 115

Carnegie Mellon

Correctness

Pr{ λ →_γ q* ∈ λ* } > [1 − (1 − p*)^M] [1 − tr[C Var(x | q*, λ*)] / (γ − √ε)²]

2004-01-23, Inferring User Intent for LBO – p.82
slide-116
SLIDE 116

Carnegie Mellon

Correctness

Pr{ λ →_γ q* ∈ λ* } > [1 − (1 − p*)^M] [1 − tr[C Var(x | q*, λ*)] / (γ − √ε)²]

A state in the estimated CDHMM is "within" γ of a state in the target CDHMM

2004-01-23, Inferring User Intent for LBO – p.82

slide-117
SLIDE 117

Carnegie Mellon

Correctness

Pr{ λ →_γ q* ∈ λ* } > [1 − (1 − p*)^M] [1 − tr[C Var(x | q*, λ*)] / (γ − √ε)²]

A state in the estimated CDHMM is "within" γ of a state in the target CDHMM
Increases exponentially as more tasks are observed

2004-01-23, Inferring User Intent for LBO – p.82

slide-118
SLIDE 118

Carnegie Mellon

Correctness

Pr{ λ →_γ q* ∈ λ* } > [1 − (1 − p*)^M] [1 − tr[C Var(x | q*, λ*)] / (γ − √ε)²]

A state in the estimated CDHMM is "within" γ of a state in the target CDHMM
Increases exponentially as more tasks are observed
Asymptotes to an "intrinsic error" based on the variability of the target CDHMM

2004-01-23, Inferring User Intent for LBO – p.82

slide-119
SLIDE 119

Carnegie Mellon

Prediction Confidence

Compute prediction confidence, φ_n ∈ [0, 1]:

φ_n = D_KL(c_n ‖ 1/|Q|) / log₂|Q| = 1 − H(c_n) / log₂|Q|

c_n is the current-state random variable conditioned on observing the current task
|Q| is the number of states in the CDHMM

2004-01-23, Inferring User Intent for LBO – p.83
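A minimal sketch of the confidence computation, assuming the posterior state probabilities are given as a list (the function name is my own):

```python
import math

def prediction_confidence(state_probs):
    """phi_n = 1 - H(c_n) / log2|Q|: equals 1 when the state posterior
    is a point mass, 0 when it is uniform over the |Q| states."""
    q = len(state_probs)
    if q < 2:
        return 1.0
    entropy = -sum(p * math.log2(p) for p in state_probs if p > 0)
    return 1.0 - entropy / math.log2(q)
```

Normalizing by log₂|Q| makes the confidence comparable across models with different numbers of states.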

slide-120
SLIDE 120

Carnegie Mellon

Prediction

We use Maximum Likelihood estimators:

x*_n = arg max_{x_n} Σ_j b_j(x_n) ν_n(j)

2004-01-23, Inferring User Intent for LBO – p.84
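An illustrative sketch of this estimator for scalar Gaussian observation densities, maximizing over a finite candidate set; the Gaussian form of b_j and the function names are assumptions of mine:

```python
import math

def predict_next(candidates, state_means, state_vars, nu):
    """Pick the candidate x maximizing sum_j b_j(x) * nu_j, where
    b_j is a scalar Gaussian likelihood for state j and nu_j is the
    probability that state j generates the next observation."""
    def b(x, mean, var):
        return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    return max(candidates,
               key=lambda x: sum(n_j * b(x, m, v)
                                 for n_j, m, v in zip(nu, state_means, state_vars)))
```

The prediction is pulled toward the mean of whichever state is most likely to generate the next observation.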

slide-121
SLIDE 121

Carnegie Mellon

Prediction

We use Maximum Likelihood estimators:

x*_n = arg max_{x_n} Σ_j b_j(x_n) ν_n(j)

Likelihood state j generates x_n

2004-01-23, Inferring User Intent for LBO – p.84

slide-122
SLIDE 122

Carnegie Mellon

Prediction

We use Maximum Likelihood estimators:

x*_n = arg max_{x_n} Σ_j b_j(x_n) ν_n(j)

Likelihood state j generates x_n
Probability that state j generates the next observation

2004-01-23, Inferring User Intent for LBO – p.84

slide-123
SLIDE 123

Carnegie Mellon

Proof of Stability

Let the estimated LDS be

x_{n+1} = R (x_n − x_N) + x_n = (R + I) x_n − R x_N

where

R = ([x_1 ⋯ x_N] − [x_0 ⋯ x_{N−1}]) ([x_0 ⋯ x_{N−1}] − [x_N ⋯ x_N])†
  = (X_{1:N} − X_{0:N−1}) (X_{0:N−1} − Γ_N)†

(† denotes the pseudoinverse of the least-squares fit.)

Theorem 3. The estimated LDS is stable in the sense of Lyapunov about the equilibrium point x_N if the matrix (X_{0:N−1} − Γ_N) has full row rank.

Proof uses the discrete algebraic Lyapunov equation

2004-01-23, Inferring User Intent for LBO – p.85
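A scalar sketch of this estimate, my own simplification of the matrix version: fit R by least squares and check discrete-time stability of (R + I) about the endpoint x_N.

```python
def estimate_lds(xs):
    """Scalar sketch: fit R in x_{n+1} = R (x_n - x_N) + x_n by least
    squares over the demonstrated trajectory `xs`, then check stability."""
    xN = xs[-1]
    num = sum((x1 - x0) * (x0 - xN) for x0, x1 in zip(xs, xs[1:]))
    den = sum((x0 - xN) ** 2 for x0 in xs[:-1])
    R = num / den
    stable = abs(R + 1.0) < 1.0   # eigenvalue of (R + I) inside the unit circle
    return R, stable
```

For a trajectory that geometrically converges toward its endpoint, the fit recovers a contraction and the stability check passes.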

slide-124
SLIDE 124

Carnegie Mellon

Trading Simplicity for Accuracy

[Two plots vs. prediction-error threshold (0.1–1.0): number of subgoals (left) and average trajectory error (right), with example thresholds 0.4, 0.61, and 0.8 marked.]

2004-01-23, Inferring User Intent for LBO – p.86

slide-125
SLIDE 125

Carnegie Mellon

Environment Mappings

[Figure: objects labeled 1–8 on the left, a–h on the right.]

Map each object on the left to one on the right
Consider each object connected to all others with "springs"
Compute the mapping that minimizes the total change in force
Weighted bipartite graph-matching problem
Hungarian Method ⇒ Linear Program (Papadimitriou & Steiglitz, 1998)

2004-01-23, Inferring User Intent for LBO – p.87
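For very small instances, the minimum-cost matching can be found by brute force; a Hungarian-method or LP solver (as cited above) replaces this at realistic sizes. The cost-matrix setup below is a sketch of my own:

```python
from itertools import permutations

def best_mapping(cost):
    """Exhaustive minimum-cost bipartite matching (exact for small n);
    cost[i][j] is the cost of mapping left object i to right object j."""
    n = len(cost)
    best_perm, best_cost = None, None
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if best_cost is None or c < best_cost:
            best_perm, best_cost = list(perm), c
    return best_perm, best_cost
```

Here the spring model would fill `cost[i][j]` with the change in total force induced by mapping object i to object j.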