SLIDE 1 CS 378: Autonomous Intelligent Robotics
Instructor: Jivko Sinapov
http://www.cs.utexas.edu/~jsinapov/teaching/cs378/
SLIDE 2
Multimodal Perception
SLIDE 3
Announcements
Final Project Presentations: Thursday, May 12, 9:00 a.m.-12:00 noon
SLIDE 4 Project Deliverables
- Final Report (6+ pages in PDF)
- Code and Documentation (posted on GitHub)
- Presentation including video and/or demo
SLIDE 5
Multimodal Perception
SLIDE 6
The “5” Senses
SLIDE 7 The “5” Senses
[http://edublog.cmich.edu/meado1bl/files/2013/03/Five-Senses2.jpg]
SLIDE 8 The “5” Senses
[http://edublog.cmich.edu/meado1bl/files/2013/03/Five-Senses2.jpg]
SLIDE 9
SLIDE 10 [http://neurolearning.com/sensoryslides.pdf]
SLIDE 11
SLIDE 12
How are sensory signals from different modalities integrated?
SLIDE 13
SLIDE 14 [Battaglia et al. 2003]
SLIDE 15 Locating the Stimulus Using a Single Modality
Standard Trial / Comparison Trial
Is the stimulus in Trial 2 located to the left or to the right of the stimulus in Trial 1?
SLIDE 16 Locating the Stimulus Using a Single Modality
Standard Trial / Comparison Trial
Is the stimulus in Trial 2 located to the left or to the right of the stimulus in Trial 1?
SLIDE 17
SLIDE 18
SLIDE 19 Multimodal Condition
Standard Trial / Comparison Trial
SLIDE 20
SLIDE 21
SLIDE 23
SLIDE 24
Take-home Message
During integration, sensory modalities are weighted based on their individual reliability
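This weighting rule can be written out as the standard maximum-likelihood cue-integration equations (as in Ernst and Bülthoff 2004; the notation below is mine, not from the slides):

```latex
% Reliability-weighted (maximum-likelihood) integration of a visual and an
% auditory location estimate, \hat{s}_V and \hat{s}_A, with variances
% \sigma_V^2 and \sigma_A^2 (lower variance = higher reliability):
\hat{s}_{VA} = w_V \hat{s}_V + w_A \hat{s}_A, \qquad
w_V = \frac{1/\sigma_V^2}{1/\sigma_V^2 + 1/\sigma_A^2}, \qquad
w_A = \frac{1/\sigma_A^2}{1/\sigma_V^2 + 1/\sigma_A^2}
% The combined estimate is never less reliable than either cue alone:
% \sigma_{VA}^2 = \frac{\sigma_V^2 \sigma_A^2}{\sigma_V^2 + \sigma_A^2} \le \min(\sigma_V^2, \sigma_A^2)
```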
SLIDE 25 Further Reading
- Ernst, Marc O., and Heinrich H. Bülthoff. "Merging the senses into a robust percept." Trends in Cognitive Sciences 8.4 (2004): 162-169.
- Battaglia, Peter W., Robert A. Jacobs, and Richard N. Aslin. "Bayesian integration of visual and auditory signals for spatial localization." JOSA A 20.7 (2003): 1391-1397.
SLIDE 26
Sensory Integration During Speech Perception
SLIDE 27
McGurk Effect
SLIDE 28
McGurk Effect
https://www.youtube.com/watch?v=G-lN8vWm3m0
https://vimeo.com/64888757
SLIDE 29
SLIDE 30 Object Recognition Using Auditory and Proprioceptive Feedback
Sinapov et al. “Interactive Object Recognition using Proprioceptive and Auditory Feedback” International Journal of Robotics Research, Vol. 30, No. 10, September 2011
SLIDE 31 What is Proprioception?
“It is the sense that indicates whether the body is moving with required effort, as well as where the various parts of the body are located in relation to each other.”
SLIDE 32
Why Proprioception?
SLIDE 33
Why Proprioception?
Full vs. Empty
SLIDE 34
Why Proprioception?
Hard vs. Soft
SLIDE 35 Exploratory Behaviors
Lift, Drop, Push, Shake, Crush
SLIDE 36
Objects
SLIDE 37 Sensorimotor Contexts
Behaviors: lift, shake, drop, press, push
Sensory modalities: audio, proprioception
SLIDE 38 Feature Extraction
[Figure: joint-torque signals for joints J1 through J7 plotted over time]
SLIDE 39 Feature Extraction
- Training a self-organizing map (SOM) using sampled joint torques
- Training an SOM using sampled frequency distributions
SLIDE 40 Feature Extraction
- Discretization of joint-torque records using a trained SOM: the output is the sequence of activated SOM nodes over the duration of the interaction
- Discretization of the DFT of a sound using a trained SOM: the output is the sequence of activated SOM nodes over the duration of the sound
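As a rough illustration of this discretization step (a sketch of mine, not the authors' code; the function name and the toy data are made up), each sampled joint-torque vector is mapped to the index of its nearest node in an already-trained SOM codebook, yielding a discrete sequence:

```python
import numpy as np

def discretize_with_som(samples, codebook):
    """Map each sample (e.g., a 7-D joint-torque vector at one time step)
    to the index of the nearest SOM node in the codebook.

    samples  : (T, D) array -- one row per time step
    codebook : (K, D) array -- one row per trained SOM node
    returns  : length-T sequence of activated node indices
    """
    # Squared Euclidean distance from every sample to every node
    dists = ((samples[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

# Toy usage: 50 time steps of 7 joint torques, a 6x6 SOM flattened to 36 nodes
torques = np.random.randn(50, 7)
codebook = np.random.randn(36, 7)   # stands in for a trained SOM
sequence = discretize_with_som(torques, codebook)
print(sequence[:10])  # the first few "torque words" of the discrete sequence
```

The same mapping applied to sampled DFT frames produces the discrete audio sequence.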
SLIDE 41 Recognition Model
[Diagram: the proprioception sequence feeds a proprioceptive recognition model, the audio sequence feeds an auditory recognition model, and the two model outputs are fused by a weighted combination]
SLIDE 42
SLIDE 43
Accuracy vs. Number of Objects
SLIDE 44
Accuracy vs. Number of Behaviors
SLIDE 45
SLIDE 46 Results with a Second Dataset
- Tactile Surface Recognition:
– 5 scratching behaviors
– 2 modalities: vibrotactile and proprioceptive
Artificial Finger Tip
Sinapov et al. “Vibrotactile Recognition and Categorization of Surfaces by a Humanoid Robot” IEEE Transactions on Robotics, Vol. 27, No. 3, pp. 488-497, June 2011
SLIDE 47
Surface Recognition Results
Chance accuracy = 1/20 = 5 %
SLIDE 48
Scaling up: more sensory modalities, objects and behaviors
- Microphones in the head
- Torque sensors in the joints
- ZCam (RGB+D)
- Logitech webcam
- 3-axis accelerometer
SLIDE 49
100 objects
SLIDE 50 Exploratory Behaviors
grasp, lift, hold, shake, drop, tap, poke, push, press
SLIDE 51
Object Exploration Video
SLIDE 52
Object Exploration Video #2
SLIDE 53 Coupling Action and Perception
[Figure: sequence of camera frames over time. Action: poke; Perception: optical flow]
SLIDE 54 Sensorimotor Contexts
Behaviors: look, grasp, lift, hold, shake, drop, tap, poke, push, press
Sensory modalities: audio (DFT), proprioception (joint torques), color, optical flow, SURF, proprioception (finger pos.)
SLIDE 55 Sensorimotor Contexts
Behaviors: look, grasp, lift, hold, shake, drop, tap, poke, push, press
Sensory modalities: audio (DFT), proprioception (joint torques), color, optical flow, SURF, proprioception (finger pos.)
SLIDE 56 Feature Extraction: Proprioception
Joint-torque values for all 7 joints → joint-torque features
SLIDE 57 Feature Extraction: Audio
Audio spectrogram → spectro-temporal features
SLIDE 58 Feature Extraction: Color
Object segmentation → color histogram (4 × 4 × 4 = 64 bins)
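A small illustrative sketch (my own, not from the paper) of how a 4 × 4 × 4 RGB histogram could be computed with NumPy over the segmented object's pixels:

```python
import numpy as np

def color_histogram(pixels, bins_per_channel=4):
    """Compute a normalized RGB color histogram.

    pixels : (N, 3) array of the segmented object's RGB values in [0, 255]
    returns: flattened histogram with bins_per_channel**3 entries (here 64)
    """
    edges = np.linspace(0, 256, bins_per_channel + 1)
    hist, _ = np.histogramdd(pixels, bins=(edges, edges, edges))
    hist = hist.flatten()
    return hist / hist.sum()   # normalize so the features sum to 1

# Toy usage: 500 random "object" pixels
pixels = np.random.randint(0, 256, size=(500, 3))
features = color_histogram(pixels)
print(features.shape)  # (64,)
```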
SLIDE 59 Feature Extraction: Optical Flow
[Figure: optical-flow vectors binned by direction into a histogram (angular bins vs. count)]
SLIDE 60 Feature Extraction: Optical Flow
SLIDE 61
Feature Extraction: SURF
SLIDE 62 Feature Extraction: SURF
Each interest point is described by a 128-dimensional vector
SLIDE 63 Feature Extraction: SURF
[Figure: bag-of-visual-words histogram (count per visual “word”)]
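To make the bag-of-visual-words step concrete (again a sketch under my own assumptions, not the authors' implementation), each SURF descriptor is assigned to its nearest visual word and the assignments are counted:

```python
import numpy as np

def bag_of_words(descriptors, vocabulary):
    """Quantize local descriptors against a visual vocabulary and count hits.

    descriptors : (N, D) array of SURF descriptors from one image
    vocabulary  : (K, D) array of visual words (e.g., k-means centers)
    returns     : length-K histogram of word counts
    """
    dists = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)
    return np.bincount(words, minlength=len(vocabulary))

# Toy usage: 120 descriptors of length 128, a 200-word vocabulary
desc = np.random.randn(120, 128)
vocab = np.random.randn(200, 128)
print(bag_of_words(desc, vocab).shape)  # (200,)
```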
SLIDE 64 Dimensionality of Data
audio (DFT): 100
proprioception (joint torques): 70
Color: 64
Optical flow: 10
SURF: 200
proprioception (finger pos.): 6
SLIDE 65 Data From a Single Exploratory Trial
Behaviors: look, grasp, lift, hold, shake, drop, tap, poke, push, press
Sensory modalities: audio (DFT), proprioception (joint torques), color, optical flow, SURF, proprioception (finger pos.)
SLIDE 66 Data From a Single Exploratory Trial
Behaviors: look, grasp, lift, hold, shake, drop, tap, poke, push, press
Sensory modalities: audio (DFT), proprioception (joint torques), color, optical flow, SURF, proprioception (finger pos.)
× 5 trials per object
SLIDE 67 Overview
[Pipeline: interaction with object → sensorimotor feature extraction → category recognition model → category estimates]
SLIDE 68 Context-specific Category Recognition
Observation from the poke-audio context → M_poke-audio (recognition model for the poke-audio context) → distribution over category labels
SLIDE 69
Context-specific Category Recognition
- The models were implemented with two machine learning algorithms:
– k-Nearest Neighbors (k = 3)
– Support Vector Machine
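A rough sketch of what one such context-specific model could look like, using scikit-learn (the slides do not specify a library; the function name train_context_model and the toy data are mine). The model is trained on feature vectors from a single behavior-modality context and returns a distribution over category labels:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

def train_context_model(X, y, algorithm="svm"):
    """Train a recognition model for one sensorimotor context (e.g., poke-audio).

    X : (N, D) feature vectors from that context
    y : (N,) category labels
    """
    if algorithm == "knn":
        model = KNeighborsClassifier(n_neighbors=3)
    else:
        model = SVC(probability=True)  # enables per-category probability estimates
    return model.fit(X, y)

# Toy usage: 60 poke-audio feature vectors (audio features are 100-D), 3 categories
X = np.random.randn(60, 100)
y = np.random.randint(0, 3, size=60)
m_poke_audio = train_context_model(X, y)
print(m_poke_audio.predict_proba(X[:1]))  # distribution over the 3 category labels
```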
SLIDE 70 Support Vector Machine
- Support Vector Machine: a discriminative learning algorithm
– 1. Finds a maximum-margin hyperplane that separates two classes
– 2. Uses a kernel function to map data points into a feature space in which such a hyperplane exists
[http://www.imtech.res.in/raghava/rbpred/svm.jpg]
SLIDE 71 Combining Model Outputs
[Diagram: context-specific models (M_look-color, M_tap-audio, ..., M_lift-SURF, M_press-prop., ...) each produce category estimates, which are fused by a weighted combination]
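A minimal sketch of the weighted combination (my own illustration; weighting each context by an estimate of its reliability is an assumption consistent with the earlier take-home message, not a detail given on this slide):

```python
import numpy as np

def combine_outputs(distributions, reliabilities):
    """Fuse per-context category distributions with a weighted combination.

    distributions : (C, K) array -- one probability distribution per context
    reliabilities : (C,) array  -- e.g., each context's cross-validated accuracy
    returns       : length-K combined distribution over category labels
    """
    weights = np.asarray(reliabilities, dtype=float)
    weights /= weights.sum()
    combined = weights @ np.asarray(distributions)
    return combined / combined.sum()

# Toy usage: three contexts voting over 4 categories
dists = np.array([[0.70, 0.10, 0.10, 0.10],   # a confident, reliable context
                  [0.40, 0.30, 0.20, 0.10],   # a moderately useful context
                  [0.25, 0.25, 0.25, 0.25]])  # an uninformative context
rel = np.array([0.73, 0.77, 0.05])
print(combine_outputs(dists, rel))
```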
SLIDE 72
Model Evaluation: 5-Fold Cross-Validation
Train Set Test Set
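For concreteness, a brief sketch of 5-fold cross-validation with scikit-learn (illustrative only; the feature dimensions and labels below are toy values):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X = np.random.randn(500, 64)        # e.g., look-color features: 100 objects x 5 trials
y = np.repeat(np.arange(20), 25)    # 20 object categories (toy labels)

# Each of the 5 folds is held out once as the test set while the rest is trained on
scores = cross_val_score(SVC(), X, y, cv=5)
print(scores.mean())  # average recognition accuracy over the 5 folds
```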
SLIDE 73 Recognition Rates (%) with SVM
Behavior   Audio   Proprioception   Color   Optical Flow   SURF   All
look       –       –                58.8    –              58.9   67.7
grasp      45.7    38.7             –       12.2           57.1   65.2
lift       48.1    63.7             –        5.0           65.9   79.0
hold       30.2    43.9             –        5.0           58.1   67.0
shake      49.3    57.7             –       32.8           75.6   76.8
drop       47.9    34.9             –       17.2           57.9   71.0
tap        63.3    50.7             –       26.0           77.3   82.4
push       72.8    69.6             –       26.4           76.8   88.8
poke       65.9    63.9             –       17.8           74.7   85.4
press      62.7    69.7             –       32.4           69.7   77.4
(– = context not available for that behavior)
SLIDE 74
SLIDE 75
SLIDE 76
SLIDE 77
SLIDE 78
SLIDE 79
Distribution of rates over categories
SLIDE 80
Can behaviors be selected actively to minimize exploration time?
SLIDE 81 Active Behavior Selection
- Let p be the robot's current estimate (a distribution over the category labels) and let B be the remaining set of behaviors available to the robot
- For each behavior b in B, estimate how informative applying b is expected to be, given the confusion associated with b's contexts (an example with 3 categories follows on the next slides, and a rough sketch is given below)
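The slides leave the precise selection criterion to the figures that follow, so the sketch below is only one plausible reading (my own construction): choose the behavior whose associated confusion matrix is expected to reduce the entropy of the current category estimate the most.

```python
import numpy as np

def expected_entropy(p, confusion):
    """Expected entropy of the updated estimate after applying one behavior.

    p         : (K,) current distribution over category labels
    confusion : (K, K) row-normalized confusion matrix for the behavior's context,
                confusion[true, predicted] = P(predicted | true)
    """
    h = 0.0
    for true_cat in range(len(p)):
        for pred in range(len(p)):
            # Posterior over categories if the behavior's model predicted `pred`
            post = p * confusion[:, pred]
            if post.sum() == 0:
                continue
            post /= post.sum()
            ent = -(post[post > 0] * np.log(post[post > 0])).sum()
            h += p[true_cat] * confusion[true_cat, pred] * ent
    return h

def select_behavior(p, confusions):
    """Pick the remaining behavior with the lowest expected post-update entropy."""
    return min(confusions, key=lambda b: expected_entropy(p, confusions[b]))

# Toy usage: 3 categories (A, B, C), 2 remaining behaviors
p = np.array([0.5, 0.4, 0.1])
confusions = {
    "B1": np.array([[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]),
    "B2": np.array([[0.4, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]]),
}
print(select_behavior(p, confusions))  # -> "B1" (the less confusable behavior)
```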
SLIDE 82 Example with 3 Categories and 2 Behaviors
[Figure: bar charts over categories A, B, C showing the current estimate and, for each remaining behavior B1 and B2, its associated confusion]
SLIDE 83 Active Behavior Selection: Example
[Figure: bar charts over categories A, B, C showing the current estimate and the confusion associated with each remaining behavior B1 and B2]
SLIDE 84
Active Behavior Selection
SLIDE 85
Active vs. Random Behavior Selection
SLIDE 86
Active vs. Random Behavior Selection
SLIDE 87
Discussion
- What are some of the limitations of the experiment? What are some ways to address them?
- What other possible senses can you think of that would be useful to a robot?
SLIDE 88 References
- Sinapov, J., Bergquist, T., Schenck, C., Ohiri, U., Griffith, S., and Stoytchev, A. (2011). Interactive Object Recognition Using Proprioceptive and Auditory Feedback. International Journal of Robotics Research, Vol. 30, No. 10, pp. 1250-1262.
- Sinapov, J., Schenck, C., Staley, K., Sukhoy, V., and Stoytchev, A. (2014). Grounding Semantic Categories in Behavioral Interactions: Experiments with 100 Objects. Robotics and Autonomous Systems, Vol. 62, No. 5, pp. 632-645.
SLIDE 89
THE END
SLIDE 90