SLIDE 1

CS 378: Autonomous Intelligent Robotics

Instructor: Jivko Sinapov

http://www.cs.utexas.edu/~jsinapov/teaching/cs378/

SLIDE 2

Multimodal Perception

SLIDE 3

Announcements

Final Project Presentation Date: Thursday, May 12, 9:00 a.m. to 12:00 noon

SLIDE 4

Project Deliverables

  • Final Report (6+ pages in PDF)
  • Code and Documentation (posted on GitHub)
  • Presentation including video and/or demo
SLIDE 5

Multi-modal Perception

SLIDE 6

The “5” Senses

SLIDE 7

The “5” Senses

[http://edublog.cmich.edu/meado1bl/files/2013/03/Five-Senses2.jpg]

SLIDE 8

The “5” Senses

[http://edublog.cmich.edu/meado1bl/files/2013/03/Five-Senses2.jpg]

SLIDE 9
SLIDE 10

[http://neurolearning.com/sensoryslides.pdf]

SLIDE 11
SLIDE 12

How are sensory signals from different modalities integrated?

SLIDE 13
SLIDE 14

[Battaglia et al. 2003]

SLIDE 15

Locating the Stimulus Using a Single Modality

Standard Trial vs. Comparison Trial

Is the stimulus in Trial 2 located to the left or to the right of the stimulus in Trial 1?
SLIDE 16

Locating the Stimulus Using a Single Modality

Standard Trial vs. Comparison Trial

Is the stimulus in Trial 2 located to the left or to the right of the stimulus in Trial 1?
SLIDE 17
SLIDE 18
SLIDE 19

Multimodal Condition

Standard Trial vs. Comparison Trial

SLIDE 20
SLIDE 21
SLIDE 22

[Ernst, 2006]

SLIDE 23
SLIDE 24

Take-home Message

During integration, sensory modalities are weighted based on their individual reliability
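A minimal numerical sketch of this idea, in the spirit of the maximum-likelihood cue-combination models from the readings: each modality's weight is proportional to its reliability (the inverse of its noise variance). The function name and the example numbers are illustrative, not taken from the slides.

```python
import numpy as np

def integrate_cues(estimates, variances):
    """Fuse per-modality estimates by inverse-variance (reliability) weighting.

    Each modality i gets weight w_i = (1/var_i) / sum_j (1/var_j), so a noisier
    (less reliable) sense contributes less to the fused estimate.
    """
    estimates = np.asarray(estimates, dtype=float)
    precisions = 1.0 / np.asarray(variances, dtype=float)
    weights = precisions / precisions.sum()
    fused = np.dot(weights, estimates)
    fused_variance = 1.0 / precisions.sum()  # the fused estimate is more reliable than either cue alone
    return fused, fused_variance

# Example: vision localizes the stimulus at 2.0 cm (low noise), audition at 5.0 cm (high noise).
# The fused location is pulled toward the more reliable visual estimate.
print(integrate_cues([2.0, 5.0], [1.0, 4.0]))
```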

SLIDE 25

Further Reading

Ernst, Marc O., and Heinrich H. Bülthoff. "Merging the senses into a robust percept." Trends in Cognitive Sciences 8.4 (2004): 162-169.

Battaglia, Peter W., Robert A. Jacobs, and Richard N. Aslin. "Bayesian integration of visual and auditory signals for spatial localization." JOSA A 20.7 (2003): 1391-1397.

SLIDE 26

Sensory Integration During Speech Perception

SLIDE 27

McGurk Effect

SLIDE 28

McGurk Effect

https://www.youtube.com/watch?v=G-lN8vWm3m0
https://vimeo.com/64888757

SLIDE 29
SLIDE 30

Object Recognition Using Auditory and Proprioceptive Feedback

Sinapov et al. “Interactive Object Recognition using Proprioceptive and Auditory Feedback” International Journal of Robotics Research, Vol. 30, No. 10, September 2011

SLIDE 31

What is Proprioception?

“It is the sense that indicates whether the body is moving with required effort, as well as where the various parts of the body are located in relation to each other.”

– Wikipedia
SLIDE 32

Why Proprioception?

SLIDE 33

Why Proprioception?

Full vs. Empty

SLIDE 34

Why Proprioception?

Hard vs. Soft

SLIDE 35

Exploratory Behaviors

Lift, Drop, Push, Shake, Crush

SLIDE 36

Objects

SLIDE 37

Sensorimotor Contexts

Behaviors: lift, shake, drop, press, push
Sensory modalities: audio, proprioception

SLIDE 38

Feature Extraction

[Figure: joint-torque signals J1 through J7 plotted over time]

SLIDE 39

Feature Extraction

Training a self-organizing map (SOM) using sampled joint torques
Training an SOM using sampled frequency distributions

SLIDE 40

Feature Extraction

Discretization of joint-torque records using a trained SOM: the feature is the sequence of activated SOM nodes over the duration of the interaction.

Discretization of the DFT of a sound using a trained SOM: the feature is the sequence of activated SOM nodes over the duration of the sound.
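A rough sketch of this discretization step using the third-party MiniSom library. The library choice, the map size, and the fake data are assumptions; the slide only specifies that a trained SOM maps each sample to its nearest node and that the feature is the resulting node sequence.

```python
import numpy as np
from minisom import MiniSom  # pip install minisom (assumed library choice)

# Stand-in joint-torque record: 200 time steps x 7 joints (real data comes from the robot log).
torque_record = np.random.rand(200, 7)

# Train a small 2D SOM on individual joint-torque samples.
som = MiniSom(6, 6, 7, sigma=1.0, learning_rate=0.5)
som.train_random(torque_record, num_iteration=5000)

# Discretize: the feature is the sequence of activated (winning) SOM nodes over the interaction.
node_sequence = [som.winner(sample) for sample in torque_record]
print(node_sequence[:5])  # e.g., [(2, 4), (2, 4), (3, 4), ...]
```

The same recipe applies to audio by feeding sampled DFT column vectors instead of joint-torque vectors.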

SLIDE 41

[Diagram: the proprioception sequence feeds a proprioceptive recognition model and the audio sequence feeds an auditory recognition model; their outputs are merged by a weighted combination]

SLIDE 42
SLIDE 43

Accuracy vs. Number of Objects

SLIDE 44

Accuracy vs. Number of Behaviors

SLIDE 45
SLIDE 46

Results with a Second Dataset

  • Tactile Surface Recognition:
    – 5 scratching behaviors
    – 2 modalities: vibrotactile and proprioceptive

Artificial Finger Tip

Sinapov et al. “Vibrotactile Recognition and Categorization of Surfaces by a Humanoid Robot” IEEE Transactions on Robotics, Vol. 27, No. 3, pp. 488-497, June 2011

SLIDE 47

Surface Recognition Results

Chance accuracy = 1/20 = 5 %

SLIDE 48

Scaling up: more sensory modalities, objects and behaviors

Microphones in the head, torque sensors in the joints, ZCam (RGB+D), Logitech webcam, 3-axis accelerometer

SLIDE 49

100 objects

SLIDE 50

Exploratory Behaviors

grasp, lift, hold, shake, drop, tap, poke, push, press

SLIDE 51

Object Exploration Video

SLIDE 52

Object Exploration Video #2

SLIDE 53

Coupling Action and Perception

[Figure: image frames over time during the poke action, paired with the resulting optical-flow fields]

Action: poke. Perception: optical flow.

SLIDE 54

Sensorimotor Contexts

Behaviors: look, grasp, lift, hold, shake, drop, tap, poke, push, press
Sensory modalities: audio (DFT), proprioception (joint torques), Color, Optical flow, SURF, proprioception (finger pos.)

SLIDE 55

Sensorimotor Contexts

Behaviors: look, grasp, lift, hold, shake, drop, tap, poke, push, press
Sensory modalities: audio (DFT), proprioception (joint torques), Color, Optical flow, SURF, proprioception (finger pos.)

SLIDE 56

Feature Extraction: Proprioception

[Figure: joint-torque values for all 7 joints, reduced to joint-torque features]
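A hedged sketch of one simple way to turn the raw 7-joint torque record into a fixed-length feature vector: average each joint's signal within a fixed number of temporal bins. The bin count and the exact reduction are assumptions; the slide only shows raw torques being mapped to features.

```python
import numpy as np

def joint_torque_features(torques, n_bins=10):
    """torques: (T, 7) array of joint-torque samples over one interaction.

    Splits the time axis into n_bins equal bins and averages each joint within
    each bin, giving a (n_bins * 7)-dimensional feature vector.
    """
    bins = np.array_split(torques, n_bins, axis=0)
    return np.concatenate([b.mean(axis=0) for b in bins])

features = joint_torque_features(np.random.rand(300, 7))
print(features.shape)  # (70,) with 10 bins, which would match the 70-D proprioceptive features listed later
```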

SLIDE 57

Feature Extraction: Audio

[Figure: audio spectrogram reduced to spectro-temporal features]
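A sketch of computing spectro-temporal features by pooling a spectrogram onto a coarse frequency-by-time grid. The use of scipy, the grid size, and the mean pooling are illustrative assumptions rather than the exact pipeline on the slide.

```python
import numpy as np
from scipy.signal import spectrogram

def spectro_temporal_features(audio, fs, freq_bins=10, time_bins=10):
    """Compute a spectrogram and pool it into a coarse freq x time grid."""
    _, _, sxx = spectrogram(audio, fs=fs)             # power matrix: (frequencies, times)
    pooled = []
    for f_chunk in np.array_split(sxx, freq_bins, axis=0):
        for t_chunk in np.array_split(f_chunk, time_bins, axis=1):
            pooled.append(t_chunk.mean())
    return np.array(pooled)                           # freq_bins * time_bins values

features = spectro_temporal_features(np.random.randn(44100), fs=44100)
print(features.shape)  # (100,) -- a 10 x 10 grid
```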

SLIDE 58

Feature Extraction: Color

Color Histogram (4 x 4 x 4 = 64 bins)

Object Segmentation
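A minimal sketch of the 4 × 4 × 4 color histogram over the segmented object pixels, using OpenCV. The BGR color space, the normalization, and the stand-in image are assumptions; the slide only gives the bin counts.

```python
import cv2
import numpy as np

def color_histogram(image_bgr, mask=None):
    """4 x 4 x 4 = 64-bin color histogram over the (optionally segmented) object pixels."""
    hist = cv2.calcHist([image_bgr], [0, 1, 2], mask, [4, 4, 4],
                        [0, 256, 0, 256, 0, 256])
    hist = hist.flatten()
    return hist / hist.sum()  # normalize so the feature does not depend on object size

img = np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)  # stand-in for a camera frame
print(color_histogram(img).shape)  # (64,)
```

The mask argument is where an object-segmentation result would plug in, so only object pixels contribute to the histogram.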

SLIDE 59

Feature Extraction: Optical Flow

[Figure: optical-flow vectors binned by angle into a histogram (angular bins vs. count)]
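A sketch of this feature: a histogram of flow directions between consecutive frames, computed here with OpenCV's dense Farneback flow. The bin count and the magnitude weighting are assumptions not stated on the slide.

```python
import cv2
import numpy as np

def flow_angle_histogram(prev_gray, next_gray, n_bins=10):
    """Histogram of dense optical-flow angles, weighted by flow magnitude."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi), weights=mag)
    return hist / (hist.sum() + 1e-9)

prev_f = np.random.randint(0, 256, (120, 160), dtype=np.uint8)  # stand-in frames
next_f = np.random.randint(0, 256, (120, 160), dtype=np.uint8)
print(flow_angle_histogram(prev_f, next_f))  # one count per angular bin
```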

SLIDE 60

Feature Extraction: Optical Flow


SLIDE 61

Feature Extraction: SURF

SLIDE 62

Feature Extraction: SURF

Each interest point is described by a 128-dimensional vector

SLIDE 63

Feature Extraction: SURF

[Figure: histogram of visual-word counts]
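A sketch of turning SURF descriptors into a fixed-length "visual word" histogram by clustering descriptors with k-means. Notes on assumptions: SURF lives in the non-free xfeatures2d module of opencv-contrib and may be unavailable in some builds, the 200-word vocabulary size is assumed, and the clustering choice (scikit-learn's KMeans) is not from the slides.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptor_list, n_words=200):
    """Cluster all training descriptors into a visual vocabulary."""
    return KMeans(n_clusters=n_words, n_init=10).fit(np.vstack(descriptor_list))

def bag_of_words(image_gray, surf, vocab):
    """Histogram of visual-word counts for one image."""
    _, descriptors = surf.detectAndCompute(image_gray, None)
    if descriptors is None:
        return np.zeros(vocab.n_clusters)
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters)
    return hist / hist.sum()

# Requires opencv-contrib-python; extended=True gives 128-D descriptors, as on the earlier slide.
surf = cv2.xfeatures2d.SURF_create(extended=True)
```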

SLIDE 64

Dimensionality of Data

audio (DFT): 100
proprioception (joint torques): 70
Color: 64
Optical flow: 10
SURF: 200
proprioception (finger pos.): 6

SLIDE 65

Data From a Single Exploratory Trial

Behaviors: look, grasp, lift, hold, shake, drop, tap, poke, push, press
Sensory modalities: audio (DFT), proprioception (joint torques), Color, Optical flow, SURF, proprioception (finger pos.)

SLIDE 66

Data From a Single Exploratory Trial

Behaviors: look, grasp, lift, hold, shake, drop, tap, poke, push, press
Sensory modalities: audio (DFT), proprioception (joint torques), Color, Optical flow, SURF, proprioception (finger pos.)

x 5 per object

SLIDE 67

Overview

[Diagram: interaction with an object → sensorimotor feature extraction → category recognition model → category estimates]

SLIDE 68

Context-specific Category Recognition

[Diagram: an observation from the poke-audio context is passed to M_poke-audio, the recognition model for that context, which outputs a distribution over category labels]

SLIDE 69
Context-specific Category Recognition

  • The models were implemented using two machine learning algorithms:
  • K-Nearest Neighbors (k = 3)
  • Support Vector Machine

SLIDE 70

Support Vector Machine

  • Support Vector Machine: a discriminative learning algorithm
  1. Finds the maximum-margin hyperplane that separates two classes
  2. Uses a kernel function to map data points into a feature space in which such a hyperplane exists

[http://www.imtech.res.in/raghava/rbpred/svm.jpg]
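A sketch of one context-specific recognition model with each of the two learners named above, using scikit-learn. The library, the synthetic data, and the feature/category sizes are assumptions; the k = 3 setting is from the slide.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for one sensorimotor context (e.g., poke-audio):
# 200 feature vectors of dimension 100, labeled with one of 20 categories.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))
y = rng.integers(0, 20, size=200)

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
svm = SVC(kernel="rbf", probability=True).fit(X, y)  # probability=True -> distribution over labels

# Each model maps an observation to a distribution over category labels,
# which is what the weighted combination on the next slide consumes.
obs = rng.normal(size=(1, 100))
print(knn.predict_proba(obs).shape, svm.predict_proba(obs).shape)  # (1, n_categories) each
```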

SLIDE 71

Combining Model Outputs

[Diagram: the outputs of the context-specific models (M_look-color, M_tap-audio, M_lift-SURF, M_press-prop., ...) are merged by a weighted combination]
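A sketch of the weighted combination step: each context-specific model contributes its distribution over category labels, weighted by how reliable that context is. The specific weights below are placeholders; using each context's cross-validated accuracy as its weight is an assumption, not something stated on this slide.

```python
import numpy as np

def combine_outputs(distributions, weights):
    """Weighted average of per-context category distributions.

    distributions: dict mapping context name -> probability vector over categories
    weights:       dict mapping context name -> non-negative reliability weight
    """
    contexts = list(distributions)
    stacked = np.vstack([distributions[c] for c in contexts])
    w = np.array([weights[c] for c in contexts], dtype=float)
    combined = (w[:, None] * stacked).sum(axis=0)
    return combined / combined.sum()

dists = {
    "look-color": np.array([0.2, 0.5, 0.3]),
    "tap-audio":  np.array([0.6, 0.3, 0.1]),
}
weights = {"look-color": 0.59, "tap-audio": 0.63}   # e.g., per-context recognition accuracy
print(combine_outputs(dists, weights))              # final distribution over the 3 example categories
```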

SLIDE 72

Model Evaluation: 5-fold Cross-Validation

[Diagram: in each fold, the data are split into a train set and a test set]
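A sketch of the 5-fold cross-validation protocol with scikit-learn on synthetic data. Whether the folds were split by object or by trial is not specified here, so the default splitting below is an assumption.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 100))      # stand-in feature vectors from one sensorimotor context
y = rng.integers(0, 20, size=500)    # stand-in category labels

scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=5)  # train on 4 folds, test on the 5th
print(scores.mean())                                      # average recognition accuracy over the 5 folds
```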

SLIDE 73

Recognition Rates (%) with SVM

Columns: Audio, Proprioception, Color, Optical Flow, SURF, All (not every modality was recorded for every behavior)

look:  58.8  58.9  67.7
grasp: 45.7  38.7  12.2  57.1  65.2
lift:  48.1  63.7   5.0  65.9  79.0
hold:  30.2  43.9   5.0  58.1  67.0
shake: 49.3  57.7  32.8  75.6  76.8
drop:  47.9  34.9  17.2  57.9  71.0
tap:   63.3  50.7  26.0  77.3  82.4
push:  72.8  69.6  26.4  76.8  88.8
poke:  65.9  63.9  17.8  74.7  85.4
press: 62.7  69.7  32.4  69.7  77.4

SLIDE 74
SLIDE 75
SLIDE 76
SLIDE 77
SLIDE 78
SLIDE 79

Distribution of rates over categories

SLIDE 80

Can behaviors be selected actively to minimize exploration time?

SLIDE 81

Active Behavior Selection

  • Let the current estimate be the vector encoding the robot's beliefs over the category labels, and let the remaining behaviors be the set of behaviors still available to the robot
  • For each remaining behavior, use its associated confusion matrix together with the current estimate to predict how informative applying that behavior would be, and select the most informative one (see the sketch below)
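A hedged sketch of this selection rule: score each remaining behavior by the expected chance of a correct recognition, combining the current category estimate with that behavior's row-normalized confusion matrix. The exact scoring formula from the original slide is not reproduced here, so treat this as an assumed variant.

```python
import numpy as np

def select_behavior(current_estimate, confusion_matrices):
    """Pick the remaining behavior expected to be most informative.

    current_estimate:   probability vector over category labels
    confusion_matrices: dict behavior -> row-normalized confusion matrix,
                        where entry [i, i] is P(correct | true category i)
    """
    def expected_accuracy(M):
        return float(np.dot(current_estimate, np.diag(M)))
    return max(confusion_matrices, key=lambda b: expected_accuracy(confusion_matrices[b]))

estimate = np.array([0.7, 0.2, 0.1])   # current belief over categories A, B, C
confusions = {
    "B1": np.array([[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]),
    "B2": np.array([[0.4, 0.3, 0.3], [0.1, 0.85, 0.05], [0.05, 0.05, 0.9]]),
}
print(select_behavior(estimate, confusions))  # B1: most accurate for the currently likely category
```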

SLIDE 82

Example with 3 Categories and 2 Behaviors

[Diagram: the current estimate over categories A, B, C, alongside the remaining behaviors B1 and B2 with their associated confusion matrices over A, B, C]

SLIDE 83

Active Behavior Selection: Example

[Diagram: the current estimate over categories A, B, C, alongside the remaining behaviors B1 and B2 with their associated confusion matrices over A, B, C]

SLIDE 84

Active Behavior Selection

SLIDE 85

Active vs. Random Behavior Selection

SLIDE 86

Active vs. Random Behavior Selection

SLIDE 87

Discussion

What are some of the limitations of the experiment? What are some ways to address them? What other possible senses can you think of that would be useful to a robot?

SLIDE 88

References

Sinapov, J., Bergquist, T., Schenck, C., Ohiri, U., Griffith, S., and Stoytchev, A. (2011). Interactive Object Recognition Using Proprioceptive and Auditory Feedback. International Journal of Robotics Research, Vol. 30, No. 10, pp. 1250-1262.

Sinapov, J., Schenck, C., Staley, K., Sukhoy, V., and Stoytchev, A. (2014). Grounding Semantic Categories in Behavioral Interactions: Experiments with 100 Objects. Robotics and Autonomous Systems, Vol. 62, No. 5, pp. 632-645.
SLIDE 89

THE END

SLIDE 90