Actions in the Eye: Dynamic Gaze Datasets and Learnt Saliency Models for Visual Recognition
Stefan Mathe, Cristian Sminchisescu
Presented by Mit Shah
Motivation: Current Computer Vision
○ Annotations are subjectively defined
○ What happens at intermediate levels of computation?
The human visual system
A person browsing reddit with the F-shaped pattern
○ Inter-observer consistency
○ Bottom-up features
○ Human fixations
○ Models of saliency
○ Uses of saliency maps: action recognition, scene classification, object localization
○ Previous datasets
At most a few hundred videos were recorded under free-viewing conditions
❏ Extended existing large-scale datasets Hollywood-2 and UCF Sports
❏ Dynamic consistency and alignment measures: AOI Markov dynamics and temporal AOI alignment
❏ Trained an end-to-end automatic visual action recognition system
Hollywood-2 Movie Dataset
○ 12 action classes, 69 movies
○ 823/884 train/test split, 487k frames, 20 hours of video
○ Largest and most challenging action dataset
○ Classes include answering phone, driving a car, eating, fighting, etc.
UCF Sports Action Dataset
○ Collected from television channel broadcasts
○ 150 videos covering 9 sports action classes
○ Classes include diving, golf swinging, kicking, etc.
Extending the two data sets
○ 19 human subjects
○ Tasks: action recognition, context recognition, and free viewing
○ Recording environment: SMI iView X HiSpeed 1250 tower-mounted eye tracker
○ Recording protocol: subjects divided into tasks, with many other specifications (timings/durations and breaks)
Action Recognition by Humans
○ Co-occurring actions
○ False positives
○ Mislabeled videos
Are fixations consistent across subjects on a per-frame basis?
○ Repeat nA times: derive saliency maps from SA \ {s} and predict fixations of subject s → nA prediction scores
○ Derive saliency maps from SA and evaluate the average prediction score for each s' in SB → nB prediction scores
○ Compare the two score sets with an independent 2-sample t-test with unequal variances; hypothesis: p-value >= 0.5?
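The group-comparison step above can be sketched as follows; `consistency_test`, the toy score arrays, and their values are illustrative stand-ins (not the authors' code), with only the Welch t-test and the slide's 0.5 cutoff taken from the text:

```python
import numpy as np
from scipy import stats

def consistency_test(scores_a, scores_b, alpha=0.5):
    """Compare two sets of fixation-prediction scores with an
    independent two-sample t-test with unequal variances (Welch's test).
    scores_a: nA leave-one-out scores within group SA.
    scores_b: nB scores for subjects in SB predicted from SA's maps.
    Returns (p_value, consistent) using the slide's 'p-value >= 0.5' criterion."""
    t_stat, p_value = stats.ttest_ind(scores_a, scores_b, equal_var=False)
    return p_value, p_value >= alpha

# Hypothetical AUC-like prediction scores (not real data from the paper).
a = np.array([0.91, 0.88, 0.93, 0.90, 0.89])
b = np.array([0.90, 0.92, 0.87, 0.91])
p, consistent = consistency_test(a, b)
```

A high p-value here means the two groups' prediction scores are statistically indistinguishable, i.e. one group's fixations predict the other's about as well as they predict their own.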
Results
○ AOI Markov dynamics
○ Temporal AOI alignment
○ Start K-means with 1 cluster
○ Successively increase the number of clusters until the sum of squared errors drops below a threshold
○ Link centroids from successive frames into tracks
○ Each resulting track becomes an AOI
○ Each fixation is assigned to the closest AOI at the time of its creation
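The per-frame clustering step above can be sketched as follows; `adaptive_kmeans`, the SSE threshold value, and the synthetic fixation points are hypothetical, and the cross-frame track-linking step is omitted:

```python
import numpy as np
from sklearn.cluster import KMeans

def adaptive_kmeans(points, sse_threshold, max_k=10, seed=0):
    """Cluster one frame's fixation points: start with k=1 clusters and
    increase k until the sum of squared errors (inertia) drops below
    the threshold. Returns (centroids, labels)."""
    for k in range(1, max_k + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(points)
        if km.inertia_ <= sse_threshold:
            break
    return km.cluster_centers_, km.labels_

# Hypothetical fixations drawn around two well-separated image regions.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal((100, 100), 5, (20, 2)),
                 rng.normal((300, 200), 5, (20, 2))])
centers, labels = adaptive_kmeans(pts, sse_threshold=5000)
```

In the full pipeline, centroids found in successive frames would then be linked into tracks, each track becoming one AOI.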
AOI Markov dynamics: given a fixation at AOI "a" at time t-1, model the probability of transitioning to AOI "b" at time t; each subject's scanpath becomes a human fixation string fi
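Estimating those transition probabilities from fixation strings can be sketched as follows; `transition_matrix` and the toy strings are hypothetical illustrations, not the paper's code:

```python
import numpy as np

def transition_matrix(fixation_strings, n_aois):
    """Estimate AOI Markov dynamics P(AOI b at time t | AOI a at time t-1)
    by counting consecutive-fixation transitions over all subjects'
    fixation strings (each string is a sequence of AOI ids)."""
    counts = np.zeros((n_aois, n_aois))
    for s in fixation_strings:
        for a, b in zip(s[:-1], s[1:]):
            counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1  # avoid dividing by zero for absorbing AOIs
    return counts / row_sums

# Hypothetical fixation strings f_i over 3 AOIs (ids 0..2).
strings = [[0, 0, 1, 2], [0, 1, 1, 2]]
P = transition_matrix(strings, n_aois=3)
```

Each row of `P` is the conditional distribution over next AOIs given the current one, so non-empty rows sum to 1.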
Recognition pipeline: interest point operator → descriptor → visual dictionary → classifiers
Interest point operator
○ Input: a video
○ Output: a set of spatio-temporal coordinates
○ Space-time generalization
Descriptor: MBH (motion boundary histograms)
○ Visual dictionary: cluster descriptors into 4000 visual words using K-means
○ Classifiers: RBF-χ² kernel within a Multiple Kernel Learning (MKL) framework
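The dictionary and kernel stages can be sketched as below, using a toy 8-word vocabulary instead of 4000 and random descriptors standing in for MBH features; `bow_histogram` is a hypothetical helper, and the MKL combination stage is omitted:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import chi2_kernel

# Hypothetical local descriptors pooled from training videos.
rng = np.random.default_rng(0)
train_descriptors = rng.random((500, 16))

# Visual dictionary: K-means over descriptors (8 words here for speed).
vocab = KMeans(n_clusters=8, n_init=10, random_state=0).fit(train_descriptors)

def bow_histogram(descriptors, vocab):
    """Quantize one video's descriptors against the visual dictionary
    and return an L1-normalized bag-of-words histogram."""
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / hist.sum()

# Two hypothetical videos with 60 and 80 local descriptors each.
h1 = bow_histogram(rng.random((60, 16)), vocab)
h2 = bow_histogram(rng.random((80, 16)), vocab)

# Chi-square kernel between video histograms, as fed to an SVM/MKL stage.
K = chi2_kernel(np.vstack([h1, h2]))
```

The resulting kernel matrix `K` (one entry per video pair) is what a kernel classifier, or one channel of an MKL combination, would consume.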
Human vs. Computer Vision Operators
○ Low correlation between human fixations and computer vision interest point operators
○ Why?
○ Saliency maps encode only the weak surface structure of fixations (no temporal information)
○ Static features and motion features
○ Evaluation metrics: AUC & spatial KL divergence
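The two evaluation metrics can be sketched as follows; `saliency_auc` and `spatial_kl` are hypothetical helpers, the toy map is synthetic, and the exact KL variant used in the paper may differ from this common formulation:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def saliency_auc(saliency_map, fixation_mask):
    """Score a saliency map by how well its values separate fixated
    pixels (positives) from non-fixated pixels, via ROC AUC."""
    return roc_auc_score(fixation_mask.ravel().astype(int),
                         saliency_map.ravel())

def spatial_kl(p_map, q_map, eps=1e-8):
    """KL divergence between two saliency maps treated as spatial
    probability distributions (smoothed and L1-normalized)."""
    p = p_map.ravel() + eps
    q = q_map.ravel() + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Toy 4x4 example: the saliency map peaks exactly at the fixated pixels.
sal = np.zeros((4, 4))
sal[1, 1] = sal[2, 2] = 1.0
fix = sal > 0
```

A map that ranks every fixated pixel above every non-fixated one scores AUC = 1.0, and a map compared against itself has zero KL divergence.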