SLIDE 1

Real-World Facial Expression Recognition (真实世界人脸表情识别)

Weihong Deng (邓伟洪)

Beijing University of Posts and Telecommunications http://whdeng.cn/Emotion/projects.html

SLIDE 2

Outline

01 Introduction & Background
02 Facial Expression Databases
03 Our Works
04 Latest Survey

SLIDE 3

Evolution Creates Facial Expressions

  • Charles Darwin theorized that emotional expression evolved by natural selection.
  • Important for survival: the fear expression directly lets our eyes absorb more light and our lungs take in more air.
  • Improves group fitness: surprise indicates that something new has happened; sadness is a signal to the group that something is wrong.
  • Humans share similar facial muscles.

SLIDE 4

Basic Emotions are Universal

  • Paul Ekman designed the widely acknowledged Facial Action Coding System (FACS).
  • Ekman claimed that basic emotional expressions are in fact universal across cultures, enacted by similar muscle groups.
  • In the 1960s, Ekman identified six core expressions: happiness, fear, surprise, disgust, sadness, anger.

[Figure: Paul Ekman; the common muscle groups]

SLIDE 5

Outline

01 Introduction & Background
02 Facial Expression Databases
03 Our Works
04 Latest Survey

SLIDE 6

Prototype Databases

[Timeline, 2007-2017: JAFFE, Multi-PIE, TFD, MMI, CK+, Oulu-CASIA, ...]

  • Previously widely used facial expression datasets are lab-controlled and small-scale:
    MMI: 2,900 videos, 75 subjects
    JAFFE: 213 images, 10 females
    CK+: 596 videos, 123 subjects
    Oulu-CASIA: 2,880 videos, 80 subjects

SLIDE 7

[Figure: spectrum from posed to spontaneous to micro-expressions]

Acted Facial Expression In The Wild (AFEW)
  • 1,809 videos from movies and TV shows
  • 7 basic facial expressions
  • Three annotators
  • More than 330 subjects, ages 1-77 years

Facial Expression Recognition 2013 (FER-2013)
  • 35,887 images from the Internet
  • 48x48 pixels, grayscale
  • 184 emotion-related keywords
  • 7 basic facial expressions

Micro-Expression Datasets (suppressed emotions, difficult to observe)
  • SMIC
  • CASME, CASME II, CAS(ME)2
  • MEVIEW (Micro-Expressions VIdEos in the Wild)

SLIDE 8

[Figure: spectrum from lab-controlled to movies to in-the-wild]

EmotioNet
  • 1,000,000 images from the Internet
  • 457 concepts of emotion-related keywords
  • 23 basic and compound emotion categories
  • Action Units

AffectNet
  • 1,000,000 images from the Internet
  • 1,250 emotion-related keywords
  • 8 basic emotion categories
  • Valence and arousal
SLIDE 9

Advanced Databases

[Timeline, 2007-2017: JAFFE, Multi-PIE, TFD, MMI, CK+, Oulu-CASIA, FER2013, EmotiW, EmotioNet, RAF-DB, AffectNet, ...]

  • Datasets collected from the real world are more diverse and naturalistic, and most of them contain large-scale samples.
  • Facial expression datasets in-the-wild:
    FER2013: https://github.com/npinto/fer2013
    SFEW, AFEW: https://cs.anu.edu.au/few/
    EmotioNet: http://cbcsl.ece.ohio-state.edu/dbform_emotionet.html
    AffectNet: http://mohammadmahoor.com/affectnet/
    RAF-DB, RAF-ML: http://whdeng.cn/Emotion/projects.html
    Aff-Wild: https://ibug.doc.ic.ac.uk/resources/first-affect-wild-challenge/
    ExpW: http://mmlab.ie.cuhk.edu.hk/projects/socialrelation/index.html

Li, S., & Deng, W. Deep facial expression recognition: A survey. CoRR abs/1804.08348 (2018).

SLIDE 10

Contents

Datasets, from basic to complex: lab-controlled / posed → movies → in-the-wild / spontaneous → micro-expressions

  • Expression Labeling
  • Dataset Bias
  • Latest Survey
  • Discussions

SLIDE 11

Outline

01 Introduction & Background
02 Facial Expression Databases
03 Our Works
04 Latest Survey

SLIDE 12

Two Annotation Challenges

  • Crowdsourcing: 315 volunteers online; each image labeled 40 times; 1,200,000 labels in total.
  • Learning from labels: annotations cover basic, compound, and blended emotions, so each image carries a probability distribution over expression categories.

[Figure: example probability distribution over the expression categories]

SLIDE 13

Real-world Affective Face Database (RAF-DB): Collection

1. Image Collection
  • Source: Flickr (image social network), queried through its API:
    https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key={}&text={}&tags={}&per_page={}&page={}&sort=relevance
  • Emotion-related keywords such as 'smile', 'crying', 'OMG', ...
  • XML response → interpreted into URLs of the images → download
  • 60,000 images downloaded; ~30,000 images retained
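For concreteness, a minimal Python sketch of this collection step follows, assuming a valid Flickr API key. The endpoint and query parameters are the ones listed above; the keyword list, paging value, and the static-photo URL pattern built from the XML attributes are illustrative assumptions, not the exact crawler used for RAF-DB.

```python
# Hedged sketch of the RAF-DB-style collection step, not the original crawler.
import requests
import xml.etree.ElementTree as ET

API_KEY = "YOUR_FLICKR_API_KEY"          # placeholder, assumption
KEYWORDS = ["smile", "crying", "OMG"]    # emotion-related keywords from the slide

def search_photo_urls(keyword, per_page=500, page=1):
    """Query flickr.photos.search and parse the XML response into image URLs."""
    resp = requests.get(
        "https://api.flickr.com/services/rest/",
        params={
            "method": "flickr.photos.search",
            "api_key": API_KEY,
            "text": keyword,
            "tags": keyword,
            "per_page": per_page,
            "page": page,
            "sort": "relevance",
        },
    )
    root = ET.fromstring(resp.content)
    # Build standard Flickr static-photo URLs from the <photo> attributes.
    return [
        f"https://live.staticflickr.com/{p.get('server')}/{p.get('id')}_{p.get('secret')}.jpg"
        for p in root.iter("photo")
    ]

for kw in KEYWORDS:
    for url in search_photo_urls(kw):
        image_bytes = requests.get(url).content  # download step
        # ... save image_bytes to disk for the annotation stage
```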

S. Li, W. Deng, and J. Du, "Reliable crowdsourcing and deep locality preserving learning for expression recognition in the wild," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2584-2593.

SLIDE 14

Real-world Affective Face Database (RAF-DB): Annotation

2. Image Annotation
  • Crowdsourcing: 315 well-trained annotators (volunteers online) were asked to label each facial image with one of the seven basic categories.
  • Each image was annotated independently a sufficient number of times, i.e., around 40 times in our experiment, yielding 1,200,000 labels in total to learn from.

SLIDE 15

Real-world Affective Face Database (RAF-DB): Reliability Estimation

3. Reliability Estimation
  • Filter out noisy annotators and unreliable labels.
  • An Expectation Maximization (EM) framework was used to iteratively optimize and assess each labeler's reliability, converging to an optimal reliability estimate.
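To make the idea concrete, here is a simplified Dawid-Skene-style EM sketch in Python, not the paper's exact formulation: each annotator is modeled by a single scalar reliability, and `labels[i]` is a hypothetical dict mapping annotator id to the label (0-6) given to image i.

```python
# Simplified EM for annotator reliability (Dawid-Skene flavor); an
# illustration of the slide's idea, not the exact model of the CVPR'17 paper.
import numpy as np

def em_reliability(labels, n_classes=7, n_iter=20):
    """labels: list of dicts, one per image, {annotator_id: label in 0..6}."""
    annotators = sorted({a for img in labels for a in img})
    rel = {a: 0.8 for a in annotators}              # initial reliability guess
    posteriors = []
    for _ in range(n_iter):
        # E-step: posterior over each image's true label given reliabilities.
        posteriors = []
        for img in labels:
            logp = np.zeros(n_classes)
            for a, y in img.items():
                for c in range(n_classes):
                    p = rel[a] if y == c else (1.0 - rel[a]) / (n_classes - 1)
                    logp[c] += np.log(p + 1e-12)
            p = np.exp(logp - logp.max())
            posteriors.append(p / p.sum())
        # M-step: reliability = expected fraction of an annotator's correct votes.
        for a in annotators:
            num = den = 0.0
            for img, p in zip(labels, posteriors):
                if a in img:
                    num += p[img[a]]
                    den += 1.0
            rel[a] = num / max(den, 1.0)
    return rel, posteriors
```

Labels from annotators whose estimated reliability falls below a threshold would then be discarded, matching the "filter out unreliable labels" step above.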

SLIDE 16

Real-world Affective Face Database (RAF-DB): Database Statistics

  • 29,672 real-world images;
  • a 7-dimensional expression distribution vector for each image;
  • two different subsets: a single-label subset, including 7 classes of basic emotions, and a two-tab subset, including 12 classes of compound emotions;
  • 5 accurate landmark locations, 37 automatic landmark locations, and race, age-range, and gender attribute annotations per image.
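As a toy illustration of how the ~40 crowdsourced votes per image become the 7-dimensional expression distribution vector mentioned above (the class order and vote counts here are made-up assumptions):

```python
# Illustrative only: turning per-image crowdsourced votes into the
# 7-dimensional expression distribution vector. Class order is an assumption.
import numpy as np

CLASSES = ["surprise", "fear", "disgust", "happiness", "sadness", "anger", "neutral"]
votes = ["happiness"] * 29 + ["surprise"] * 8 + ["neutral"] * 3   # 40 labels
counts = np.array([votes.count(c) for c in CLASSES], dtype=float)
distribution = counts / counts.sum()   # -> [0.2, 0, 0, 0.725, 0, 0, 0.075]
```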

SLIDE 17

Compound Emotions

Background:
1. "Nonverbal communication", M. Anderson, 1987.
2. "Facial expression and emotion", P. Ekman, 1993.
3. "Compound facial expressions of emotion", Martinez et al., PNAS 2014.

S. Du, Y. Tao, and A. M. Martinez, "Compound facial expressions of emotion," Proceedings of the National Academy of Sciences, vol. 111, no. 15, pp. E1454-E1462, 2014:

"While past research had identified facial expressions associated with a single internally felt category (e.g., the facial expression of happiness when we feel joyful), we have recently studied facial expressions observed when people experience compound emotions (e.g., the facial expression of happy surprise when we feel joyful in a surprised way, as, for example, at a surprise birthday party)."

SLIDE 18

Real-world Affective Face Database (RAF-DB)

SLIDE 19

Action Units: RAF-DB is more diverse

[Figure: Action Unit combinations for surprise, fear, joy, anger, disgust, and sadness in CK+ vs. RAF-DB]

S. Li and W. Deng, "Reliable crowdsourcing and deep locality preserving learning for unconstrained facial expression recognition," IEEE Transactions on Image Processing, vol. 28, no. 1, pp. 356-370, Jan 2019.

SLIDE 20

DLP-CNN: Deep Locality-preserving CNN

Architecture: input face images → C R P C R P C R C R P C R C R F R F → separable features, trained jointly with the softmax loss and the locality-preserving loss (weighted by λ) to yield discriminative features.
(C: convolution layer; P: max-pooling layer; R: ReLU layer; F: fully connected layer)

Our goal:

$$\min \sum_{j,k} T_{jk}\,\lVert x_j - x_k \rVert_2^2, \qquad T_{jk} = \begin{cases} 1, & x_k \text{ is among the } k \text{ nearest neighbors of } x_j \text{, or } x_j \text{ is among the } k \text{ nearest neighbors of } x_k \\ 0, & \text{otherwise} \end{cases}$$

Locality-Preserving Loss:

$$\mathcal{L}_{LP} = \frac{1}{2n} \sum_{i=1}^{n} \Bigl\lVert x_i - \frac{1}{k} \sum_{x \in N_k(x_i)} x \Bigr\rVert_2^2$$

where $x_i$ denotes the deep feature of the $i$-th sample and $N_k(x_i)$ its $k$-nearest-neighbor set.
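A hedged PyTorch sketch of the locality-preserving loss as reconstructed above. Computing the k nearest neighbors inside the mini-batch and restricting them to same-class samples are simplifying assumptions of this sketch (and the batch is assumed to contain more than k examples per class); they are not necessarily the paper's exact procedure.

```python
# Sketch of L_LP = 1/(2n) * sum_i ||x_i - c_i||^2, where c_i is the centroid
# of x_i's k nearest (same-class) neighbors. Batch-level kNN is an assumption.
import torch

def lp_loss(features, labels, k=5):
    """features: (n, d) deep features; labels: (n,) expression labels."""
    dist = torch.cdist(features, features)              # pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # same-class mask
    dist = dist.masked_fill(~same, float("inf"))        # restrict kNN to same class
    dist.fill_diagonal_(float("inf"))                   # exclude self-matches
    knn = dist.topk(k, largest=False).indices           # (n, k) neighbor indices
    centroids = features[knn].mean(dim=1)               # (n, d) neighbor centroids
    return 0.5 * ((features - centroids) ** 2).sum(dim=1).mean()
```

Per the architecture above, the total training objective would combine this with the classification term as softmax_loss + λ * lp_loss(features, labels).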

SLIDE 21

DLP-CNN: Deep Locality-preserving CNN

S. Li and W. Deng, "Reliable crowdsourcing and deep locality preserving learning for unconstrained facial expression recognition," IEEE Transactions on Image Processing, vol. 28, no. 1, pp. 356-370, Jan 2019.

SLIDE 22

DLP-CNN: Experiment Results

Table 1. Expression recognition performance of different DCNNs on RAF-DB. The metric is the mean diagonal value of the confusion matrix.

[Table not reproduced; the compared DCNNs are those of [6], [7], and [8].]

6. Simonyan & Zisserman, arXiv:1409.1556 (2014).
7. Krizhevsky et al., NIPS, pp. 1097-1105 (2012).
8. Wen et al., ECCV, pp. 499-515 (2016).

SLIDE 23

DLP-CNN: Experiment Results

Table 2. Comparison of DLP-CNN with other state-of-the-art methods on the CK+, SFEW, and MMI databases. To validate the generalization of our model, the well-trained DLP-CNN was employed as a feature extraction tool without fine-tuning.

[Table not reproduced; the compared methods are [9]-[22].]

9. Zhong et al., CVPR, pp. 2562-2569 (2012).
10. Lv et al., SMARTCOMP, pp. 303-308 (2014).
11. Liu et al., FG, pp. 1-6 (2013).
12. Liu et al., ACCV, pp. 143-157 (2014).
13. Mollahosseini et al., WACV, pp. 1-10 (2016).
14. Liu et al., IEEE TIP, 25(12):5920-5932 (2016).
15. Shojaeilangari et al., IEEE TIP, 24(7):2140-2152 (2015).
16. Eleftheriadis et al., IEEE TIP, 24(1):189-204 (2015).
17. Liu et al., CVPR, pp. 1749-1756 (2014).
18. Ng et al., ICMI, pp. 443-449 (2015).
19. Yu et al., ICMI, pp. 435-442 (2015).
20. Kim et al., ICMI, pp. 427-434 (2015).
21. Jung et al., CVPR, pp. 2983-2991 (2015).
22. Sariyanidi et al., IEEE TIP, 26(4):1965-1978 (2017).

SLIDE 24

Blended Emotions

  • Plutchik's Wheel of Emotions: many emotions are simply a combination of basic emotions, or are derived from one (or more) of them.
  • Real-life emotions are often blended, involving several simultaneous superposed or masked emotions. Many students of emotion have noted that facial expressions may contain more than one message (Ekman & Friesen, 1969; Izard, 1971; Plutchik, 1962; Tomkins, 1963).

SLIDE 25

Real-world Affective Face Multi-Label (RAF-ML): Blended Emotions with Multiple Labels

  • 4,908 images from the Internet
  • 40 independent labelers per image
  • Blended emotions with multiple labels: each image carries an emotion distribution

[Figure: example image with its probability distribution over expression categories]

Li, Shan, and Weihong Deng. "Blended Emotion in-the-Wild: Multi-label Facial Expression Recognition Using Crowdsourced Annotations and Deep Locality Feature Learning." International Journal of Computer Vision (2019): 1-23.

SLIDE 26

DBM-CNN: Deep Bi-Manifold CNN

Notation: $x^f$ denotes samples in the feature manifold; $x^l$ denotes samples in the label manifold.

Our goal:

$$\min \sum_{j,k} \bigl(T_{jk}^{f} + T_{jk}^{l}\bigr)\, \lVert x_j^f - x_k^f \rVert_2^2, \qquad T_{jk}^{*} = \begin{cases} 1, & x_k^{*} \text{ is a } k\text{NN of } x_j^{*} \text{, or vice versa} \\ 0, & \text{otherwise} \end{cases}$$

Bi-Manifold Loss:

$$\mathcal{L}_{BM} = \frac{1}{2n} \sum_{i=1}^{n} \Bigl\lVert 2x_i^f - \frac{1}{k} \sum_{x \in N_k^f(x_i)} x^f - \frac{1}{k} \sum_{x \in N_k^l(x_i)} x^f \Bigr\rVert_2^2$$

[Figure: two-dimensional deep feature embedding by DBM-CNN on RAF-ML]
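Analogously, a minimal PyTorch sketch of the bi-manifold loss reconstructed above, again under the batch-level-kNN assumption: neighbors are found once in feature space ($N_k^f$) and once in label space ($N_k^l$), and both centroids are averaged over deep features.

```python
# Sketch of L_BM = 1/(2n) * sum_i ||2*x_i^f - c_i^feat - c_i^label||^2, with
# c_i^feat / c_i^label the feature-space centroids of x_i's kNN in the feature
# manifold and in the label manifold. Batch-level kNN is an assumption.
import torch

def knn_centroids(space, features, k):
    """Find each sample's kNN in `space`, average the matching rows of `features`."""
    dist = torch.cdist(space, space)
    dist.fill_diagonal_(float("inf"))          # exclude self-matches
    knn = dist.topk(k, largest=False).indices  # (n, k) neighbor indices
    return features[knn].mean(dim=1)           # (n, d) centroids in feature space

def bi_manifold_loss(features, label_dists, k=5):
    """features: (n, d) deep features x^f; label_dists: (n, 7) crowdsourced
    emotion distributions, i.e., the label-manifold coordinates x^l."""
    c_feat = knn_centroids(features, features, k)       # feature-manifold term
    c_label = knn_centroids(label_dists, features, k)   # label-manifold term
    diff = 2.0 * features - c_feat - c_label
    return 0.5 * (diff ** 2).sum(dim=1).mean()
```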

SLIDE 27

DBM-CNN: Deep Bi-Manifold CNN

Smoothness in terms of both face appearance and emotion perception

SLIDE 28

DBM-CNN: Experiment Results

Table 1. Comparison of DBM-CNN with other training models using MLkNN.
*For each evaluation criterion, "↓" indicates that smaller is better, while "↑" indicates that bigger is better. **Bold values indicate the best result in terms of each performance index.

Li, Shan, and Weihong Deng. "Blended Emotion in-the-Wild: Multi-label Facial Expression Recognition Using Crowdsourced Annotations and Deep Locality Feature Learning." International Journal of Computer Vision (2019): 1-23.

SLIDE 29

DBM-CNN: Experiment Results

Table 2. Experimental results comparing features on RAF-ML using different algorithms.

SLIDE 30

DBM-CNN: Experiment Results

Table 2 (continued). Experimental results comparing features on RAF-ML using different algorithms.

SLIDE 31

A Deeper Look at Facial Expression Dataset Bias

Datasets play an important role in the progress of facial expression recognition algorithms, but they may suffer from obvious biases caused by different cultures and collection conditions. Hence, evaluating methods only under an intra-database protocol leaves them lacking generalization capability on unseen samples at test time.

Li, S., & Deng, W. A Deeper Look at Facial Expression Dataset Bias. CoRR abs/1904.11150 (2019).

SLIDE 32

A Deeper Look at Facial Expression Dataset Bias

Capture bias: each dataset tends to have its own preferences during the construction process.

Category bias: annotators of each dataset may perceive the emotion conveyed in an image differently, and many images tend to express more than one emotion, which increases the uncertainty of annotation.

Experiment I: Database Recognition. Experiment II: Cross-dataset Generalization.

Li, S., & Deng, W. A Deeper Look at Facial Expression Dataset Bias. CoRR abs/1904.11150 (2019).

SLIDE 33

ECAN: deep Emotion-Conditional Adaption Network

In traditional MMD, only the marginal distributions are restricted. Since domain invariance does not imply discriminativeness, and class-distribution bias exists across domains, samples in the target domain are still prone to misclassification.

ECAN: We explore the underlying label information of the target data and match both the marginal and the class-conditional distributions to mitigate the discrepancy. With the re-weighted MMD redistributing the class distribution and the class-conditional MMD learning a conditionally invariant transformation, the discriminative separating hyperplane can generalize well to the target data.
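ECAN's basic building block is the maximum mean discrepancy (MMD). Below is a hedged sketch of the standard biased RBF-kernel MMD² estimator; ECAN's class-conditional variant would apply the same estimator per expression class using pseudo-labels on the target domain, with the class re-weighting described above. The fixed bandwidth here is an assumption.

```python
# Standard (biased) RBF-kernel MMD^2 estimator between two feature batches;
# the building block behind marginal/class-conditional distribution matching.
import torch

def rbf_kernel(x, y, sigma=1.0):
    # k(a, b) = exp(-||a - b||^2 / (2 * sigma^2))
    return torch.exp(-torch.cdist(x, y) ** 2 / (2.0 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    """source: (m, d) source-domain features; target: (n, d) target features."""
    k_ss = rbf_kernel(source, source, sigma).mean()
    k_tt = rbf_kernel(target, target, sigma).mean()
    k_st = rbf_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st  # small when the two distributions match
```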

SLIDE 34

Domain Adaptation: From RAF-DB to other datasets

SLIDE 35

Domain Adaptation: Experimental Results

Li, S., & Deng, W. A Deeper Look at Facial Expression Dataset Bias. CoRR abs/1904.11150 (2019).

SLIDE 36

Outline

01 Introduction & Background
02 Facial Expression Databases
03 Our Works
04 Latest Survey

SLIDE 37

Applications

Emotional artificial intelligence can be used for great purposes to help people and humanity:
  • Therapies with autistic children
  • Automatic sensing of behavioral cues of depression
  • Ads and apps for marketing & entertainment
  • Driver monitoring systems for road safety
  • Lie detection and monitoring for suspicious behavior
  • And so on ...

SLIDE 38

Latest Progresses

  • Self-supervised AU detection: Y. Li, J. Zeng, S. Shan, CVPR 2019
  • Micro-expression recognition with small sample size: S. Wang, B. Li, Y. Liu, W. Yan, Y. Chen, X. Fu, Neurocomputing 2018
  • 3D FER: Z. Chen, D. Huang, Y. Wang, L. Chen, ACM MM 2018
  • Facial expression synthesis: L. Song, Z. Lu, R. He, Z. Sun et al., ACM MM 2018
  • Cross-domain color FER: W. Zheng, Y. Zong, X. Zhou, M. Xin, TAC 2018
  • Spatial and temporal patterns for dynamic FER: S. Wang, Z. Zheng, S. Yin, J. Yang et al., TPAMI 2019

SLIDE 39

Latest Survey

For a more detailed and long-term survey, please refer to:

SLIDE 40

[Figure: timeline of datasets and algorithms, 2007-2017.
Datasets: CK+, MMI, FER2013, EmotiW, EmotioNet, RAF-DB, AffectNet.
Algorithms: Zhao et al. [15] (LBP-TOP, SVM); Shan et al. [12] (LBP, AdaBoost); Zhi et al. [19] (NMF); Zhong et al. [20] (sparse learning); Tang [130] (CNN; winner of FER2013); Kahou et al. [57] (CNN, DBN, DAE; winner of EmotiW 2013); Fan et al. [108] (CNN-LSTM, C3D; winner of EmotiW 2016); LP loss, tuplet cluster loss, Island loss, ...; HoloNet, PPDN, IACNN, FaceNet2ExpNet, ...
Reference numbers follow the survey's bibliography.]

Latest Survey

Li, S., & Deng, W. Deep facial expression recognition: A survey. CoRR abs/1804.08348 (2018).

  • Facial Expression Datasets
  • Deep Facial Expression Recognition
  • Pre-processing
  • Face alignment
  • Data augmentation
  • Face normalization
  • Deep networks (CNN, DBN, DAE, RNN and

GAN)

  • Facial expression classification
SLIDE 42

State of the Art

[Figure: state-of-the-art accuracies (50-100%) and performance tendency on different facial expression databases: lab-controlled, real-world, and micro-expression]

Li, S., & Deng, W. Deep facial expression recognition: A survey. CoRR abs/1804.08348 (2018).

SLIDE 43

Additional Related Issues

  • Occlusion and non-frontal head pose
  • FER on infrared data
  • FER on 3D static and dynamic data
  • Facial expression synthesis
  • Visualization techniques
  • Other special problems

Li, S., & Deng, W. Deep facial expression recognition: A survey. CoRR abs/1804.08348 (2018).

SLIDE 44

Future Trends

  • Beyond the six basic emotions:
    Facial Action Coding System (Action Units)
    Compound emotions & blended emotions
    Objective labeling of subjective expressions
  • Multimodal FER:
    Audio and video
    Infrared and thermal images
    Depth information from 3D face models
    Physiological data
  • Challenging variations:
    Head poses and facial occlusions (users are always spontaneous)
    Race/identity dependence
    Cross-dataset FER for generalization

Li, S., & Deng, W. Deep facial expression recognition: A survey. CoRR abs/1804.08348 (2018).

SLIDE 45

Acknowledgements

Collaborator: Shan Li (李珊)

[1] Shan Li, Weihong Deng. Blended Emotion in-the-Wild: Multi-label Facial Expression Recognition Using Crowdsourced Annotations and Deep Locality Feature Learning. International Journal of Computer Vision 127(6-7): 884-906 (2019).
[2] Shan Li, Weihong Deng. Reliable Crowdsourcing and Deep Locality-Preserving Learning for Unconstrained Facial Expression Recognition. IEEE Trans. Image Processing 28(1): 356-370 (2019).
[3] Shan Li, Weihong Deng. A Deeper Look at Facial Expression Dataset Bias. arXiv:1904.11150 (2019).
[4] Shan Li, Weihong Deng. Deep Facial Expression Recognition: A Survey. arXiv:1804.08348v2 (2018).

SLIDE 46

Thanks!