

SLIDE 1

Faculty of Informatics Eötvös Loránd University

High Quality Facial Expression Recognition in Video Streams using Shape Related Information only

Laszlo A. Jeni

The University of Tokyo

Daniel Takacs

Realeyes Data Services Ltd

Andras Lorincz

Eotvos Lorand University

SLIDE 2

We are grateful to Jason Saragih for providing his CLM code for our studies

SLIDE 3

Outline

  • Introduction
  • Theory
  • Datasets
  • Experiments
  • Discussion

SLIDE 4

Introduction

  • Goal: recognize discrete facial emotions in video streams.
  • We use precise Constrained Local Model (CLM) based face tracking and shape related information for the emotion classification.
  • High quality classification can be achieved.

SLIDE 5

Outline

  • Introduction
  • Theory
  • Datasets
  • Experiments
  • Discussion

SLIDE 6

Overview of the System

  • We register a 66 point 3D constrained local model (CLM) for the face.
  • In 3D, the CLM estimates the rigid parameters, so we can remove the rigid deformation.
  • We use either AU0 normalization or personal mean shape normalization to remove the personal variation of the face.
  • Finally, a multiclass SVM classifies the emotions.

[Pipeline diagram: Video Stream → Face Detection → 3D Facial Feature Point Registration → Normalization → Normalized 3D Shape → Support Vector Machine → Emotion Label]

SLIDE 7

Constrained Local Model

  • Point Distribution Model (PDM)
  • Where are the landmarks?
  • 3D model
  • Parameters: scale, projection to 2D, rotation (yaw/pitch/roll), mean shape, non-rigid components (PCA), PCA coefficients, translation
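The PDM parameters listed above can be sketched as follows; the array shapes, parameter layout and function name are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def pdm_project(mean_shape, basis, q, scale, R, t):
    """Generate a 2D shape from 3D PDM parameters (illustrative sketch).

    mean_shape : (66, 3) mean 3D shape
    basis      : (66 * 3, k) non-rigid PCA components
    q          : (k,) PCA coefficients
    scale      : global scale factor
    R          : (3, 3) rotation matrix (yaw / pitch / roll)
    t          : (2,) 2D translation
    """
    shape3d = mean_shape + (basis @ q).reshape(-1, 3)  # add non-rigid deformation
    rotated = scale * shape3d @ R.T                    # apply scale and rotation
    return rotated[:, :2] + t                          # weak-perspective projection to 2D
```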


SLIDE 8

Constrained Local Model

  • Local: “local experts” (logit regressors) locate the landmarks.
  • Constrained: the relative position of the landmarks is constrained by the PDM.
  • Optimization problem: find the PDM parameters that maximize the probability that every landmark is in a correct position,

    p(l1 = 1, …, ln = 1 | I) = ∏i p(li = 1 | xi, I),

    where li ∈ {−1, 1} indicates that the i-th marker is (not) in a correct position.
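A local expert of this kind can be sketched as a logistic regressor scored over every candidate position of a search window; the feature layout and names here are placeholder assumptions:

```python
import numpy as np

def response_map(patch_features, w, b):
    """Local expert: logistic regressor giving p(l_i = 1 | x, I), the
    probability that the landmark sits at each candidate position.

    patch_features : (H, W, d) features for each position in the search window
    w, b           : (d,) weights and scalar bias of the trained logit regressor
    """
    scores = patch_features @ w + b        # linear score per candidate position
    return 1.0 / (1.0 + np.exp(-scores))   # logistic link -> (H, W) response map
```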


SLIDE 9

Constrained Local Model

  • Positive examples: from an annotated dataset
  • Negative examples: from the neighborhood of the correct position

[Figure: response map (probability estimates) for one patch at the corner of the eye, with the markers found]

SLIDE 10

Normalization

  • AU0 normalization: the difference between the features of the actual shape and the features of the first (neutral) frame.
  • Personal Mean Shape normalization:
    • AU0 normalization is crucial for facial expression recognition, however it is person dependent and it is not available for a single frame.
    • We assume that we have videos (frame series) about the subject, as in the case of the BU-4DFE, so we can compute the personal mean shape.
    • We found that the mean shape is almost identical to the neutral shape, i.e., to AU0.
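In a sketch, both normalizations reduce to subtracting a per-subject reference shape, assuming a `(T, 66, 3)` array of tracked shapes whose first frame is neutral (the array layout is an assumption):

```python
import numpy as np

def au0_normalize(shapes):
    """AU0 normalization: subtract the first (neutral) frame of the sequence."""
    return shapes - shapes[0]

def mean_shape_normalize(shapes):
    """Personal mean shape normalization: subtract the subject's mean shape,
    computable from any frame series without a known neutral frame."""
    return shapes - shapes.mean(axis=0)
```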


SLIDE 11

SVM Based Classification

  • SVM seeks to minimize the cost function.
  • Multi-class classification:
    • decision surfaces are computed for all class pairs,
    • for k classes one has k(k − 1)/2 decision surfaces,
    • the final decision is made by voting.
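The one-vs-one voting scheme can be sketched with placeholder pairwise decision functions (the `pairwise` dictionary stands in for trained SVM decision surfaces):

```python
from itertools import combinations
from collections import Counter

def ovo_predict(x, pairwise, classes):
    """One-vs-one multi-class decision: each of the k(k-1)/2 pairwise
    decision surfaces votes for one class; the most-voted class wins.

    pairwise : dict mapping a class pair (a, b) to its decision function,
               positive values voting for a, negative for b.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[a if pairwise[(a, b)](x) >= 0 else b] += 1
    return votes.most_common(1)[0][0]
```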


SLIDE 12

Outline

  • Introduction
  • Theory
  • Datasets
  • Experiments
  • Discussion

SLIDE 13

Datasets

Cohn-Kanade Extended (CK+)
  • 2D images of 118 subjects
  • annotated with the seven universal emotions
  • ground truth landmarks
  • AU validated emotion labels

BU-4DFE
  • high-resolution 3D video sequences of 101 subjects
  • six prototypic facial expressions
  • no ground truth landmarks (they were provided by the CLM)
  • posed expressions

SLIDE 14

Outline

  • Introduction
  • Theory
  • Datasets
  • Experiments
  • Discussion

SLIDE 15

CK+ with original landmarks

  • We used the CK+ dataset with the original 68 2D landmarks.
  • We calculated the mean shape using Procrustes’s method.
  • We normalized all shapes by minimizing the Procrustes distance between individual shapes and the mean shape.
  • We compared AU0 normalization with Personal Mean Shape normalization.
  • We trained a multi-class SVM using the leave-one-subject-out cross validation method.
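Minimizing the Procrustes distance to a reference shape has a classical SVD solution; this is a generic sketch, not the authors' code:

```python
import numpy as np

def procrustes_align(shape, reference):
    """Align `shape` (n, 2) to `reference` over translation, scale and
    rotation, minimizing the Procrustes distance."""
    a = shape - shape.mean(axis=0)          # remove translation
    b = reference - reference.mean(axis=0)
    a = a / np.linalg.norm(a)               # remove scale
    b = b / np.linalg.norm(b)
    u, s, vt = np.linalg.svd(a.T @ b)       # orthogonal Procrustes problem
    return s.sum() * a @ (u @ vt)           # optimally scaled, rotated copy
```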

SLIDE 16

CK+ with original landmarks

  • Emotions with large distortions, such as disgust, happiness and surprise, gave rise to nearly 100% classification performance.
  • Even for the worst case (fear), performance was 92%.

[Confusion matrix: AU0 normalization]

SLIDE 17

CK+ with original landmarks

  • Emotions with large distortions, such as disgust, happiness and surprise, gave rise to nearly 100% classification performance.
  • Even for the worst case (fear), performance was 92%.
  • Replacing AU0 normalization by personal mean shape slightly decreases average performance: recognition on the CK+ database drops from 96% to 94.8%.

[Confusion matrices: AU0 normalization vs. Personal Mean Shape normalization]

SLIDE 18

CLM tracked CK+

  • We studied the performance of the multi-class SVM using the CLM method on the CK+ dataset.
  • We tracked facial expressions with the CLM tracker and annotated all image sequences starting from the neutral expression to the peak of the emotion.
  • 3D CLM estimates the rigid and non-rigid transformations:
    • we removed the rigid ones from the faces and
    • projected the frontal view to 2D.
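Removing the rigid transformation and projecting to the frontal view can be sketched as inverting a similarity transform; the parameter layout (tracked = s · X · Rᵀ + t) is an assumption about how the rigid fit is parametrized:

```python
import numpy as np

def frontalize(tracked, R, s, t):
    """Undo the rigid part of the fit, tracked = s * X @ R.T + t, then drop
    the depth coordinate to get the frontal 2D projection."""
    frontal3d = ((tracked - t) / s) @ R   # inverse similarity transform
    return frontal3d[:, :2]               # project frontal view to 2D
```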

SLIDE 19

CLM tracked CK+

  • Classification performance is affected by the imprecision of the CLM tracking.
  • Emotions with large distortions can still be recognized in about 90% of the cases, whereas more subtle emotions are sometimes confused with others.
  • With the Personal Mean Shape normalization, the correct classification percentage rises from 77.57% to 86.82% for the CLM tracked CK+.

[Confusion matrices: AU0 normalization vs. Personal Mean Shape normalization]

SLIDE 20

Comparison of results on CK+

[Table of results; T/S – Texture / Shape information]

SLIDE 21

CLM tracked BU-4DFE (frontal case)

  • We characterized the BU-4DFE database by using the CLM technique:
    • We selected a frame with neutral expression and an apex frame of the same frame series. We used these frames and all frames between them for the evaluations.
    • We applied CLM tracking to the intermediate frames in order, since it is more robust than applying CLM independently to each frame.
    • We removed the rigid transformation after the fit and projected the frontal 3D shapes to 2D.
  • We applied a 6-class multi-class SVM (this database does not contain contempt) and evaluated the classifiers by the leave-one-subject-out method.
  • We compared the normalization using the CLM estimation of the AU0 values with the normalization based on the personal mean shape.
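Leave-one-subject-out evaluation, as a sketch: every fold holds out all frames of one subject, so no subject appears in both train and test (the index-based interface is an illustrative choice):

```python
import numpy as np

def leave_one_subject_out(subject_ids):
    """Yield (train_idx, test_idx) pairs, one fold per subject; each fold
    holds out every frame belonging to that subject."""
    ids = np.asarray(subject_ids)
    for s in np.unique(ids):
        yield np.flatnonzero(ids != s), np.flatnonzero(ids == s)
```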

SLIDE 22

CLM tracked BU-4DFE (frontal case)

  • We found an 8% improvement on average in favor of the mean shape method.

[Confusion matrices: AU0 normalization vs. Personal Mean Shape normalization]

SLIDE 23

CLM tracked BU-4DFE (frontal case)

  • We executed cross evaluations.
  • We used the CK+ as the ground truth, since it seems more precise:
    • the target expression for each sequence is fully FACS coded,
    • emotion labels have been revised and validated, and
    • CK+ utilizes FACS coding based emotion evaluation, which is the preferred method in the literature considered.
  • We note, however, that both the CK+ and the BU-4DFE facial expressions are posed, not spontaneous.

Cross Evaluation: CK+ – BU-4DFE

SLIDE 24

Pose invariance on BU-4DFE

  • Question: CLM’s performance as a function of pose, i.e., pose invariant emotion recognition for situation analysis.
  • We used the BU-4DFE dataset to render 3D faces with the six emotions (anger, disgust, fear, happiness, sadness, and surprise) available in the database.
  • We randomly selected 25 subjects and rendered rotated versions of every emotion.
  • We covered rotation angles between 0 and 44 degrees of anti-clockwise rotation around the yaw axis.

SLIDE 25

Pose invariance on BU-4DFE

  • CLM based classification is robust against large pose variations, including the hard cases like anger.
  • However, misclassification types change as a function of angle.

[Plots: classification of angry, happy and surprised faces as a function of rotation angle]

SLIDE 26

Pose invariance on BU-4DFE

  • As the angle of rotation increases, the error of the landmark position estimation accumulates.
  • It may reach 10 RMSE units on average (a 1 pixel error for all landmarks corresponds to 1 RMSE unit).
  • This error influences emotion recognition, so we need to improve the CLM.
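
The RMSE convention on this slide (a uniform 1 pixel error on every landmark equals 1 RMSE unit) corresponds to the following sketch:

```python
import numpy as np

def landmark_rmse(pred, truth):
    """Root-mean-square landmark error over (n, 2) landmark arrays: a uniform
    1-pixel displacement of every landmark gives exactly 1 RMSE unit."""
    per_point = np.sum((pred - truth) ** 2, axis=-1)  # squared pixel error per landmark
    return np.sqrt(per_point.mean())
```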

SLIDE 27

Outline

  • Introduction
  • Theory
  • Datasets
  • Experiments
  • Discussion

SLIDE 28

Discussion

  • We used a number of methods to study the performance of shape representations for facial expression recognition. In all studies, we applied multi-class SVM classification.
  • We used the CLM method to extract shape data, since it is more precise and may preserve more information than the AAM method.
  • We replaced normalization using an estimation of the AU0 parameters with the personal mean shape, which gave rise to considerable improvements.

SLIDE 29

Thank you for your attention!

We are grateful to Jason Saragih for providing his CLM code for our studies