Real-World Facial Expression Recognition
Weihong Deng(邓伟洪)
Beijing University of Posts and Telecommunications http://whdeng.cn/Emotion/projects.html
Outline: 01 Introduction & Background
In fear, for example, our lungs take in more air, and the expression signals to the group that something is wrong. Humans share similar facial muscles.
Basic Expressions and the Facial Action Coding System (FACS)
Paul Ekman found that six basic expressions — happiness, fear, surprise, disgust, sadness, anger — are in fact universal across cultures and are acted by similar muscle groups.
Timeline of FER datasets (2007–2017): JAFFE, Multi-PIE, MMI, CK+, Oulu-CASIA, TFD, …
Early datasets are lab-controlled and small-scale:
JAFFE: 213 images, 10 females
CK+: 596 videos, 123 subjects
MMI: 2900 videos, 75 subjects
Oulu-CASIA: 2880 videos, 80 subjects
In-the-wild datasets: Acted Facial Expression in the Wild (AFEW), Facial Expression Recognition 2013 (FER-2013), Static Facial Expressions in the Wild (SFEW), EmotioNet, AffectNet, RAF-DB, EmotiW, …
Micro-expression datasets: suppressed emotions that are difficult to observe.
Timeline (2007–2017): CK+, MMI, TFD, Multi-PIE, JAFFE, Oulu-CASIA, FER2013, EmotioNet, RAF-DB, AffectNet, EmotiW, …
These newer datasets are more diverse and naturalistic, and most contain large-scale samples.
FER2013: https://github.com/npinto/fer2013
SFEW, AFEW: https://cs.anu.edu.au/few/
EmotioNet: http://cbcsl.ece.ohio-state.edu/dbform_emotionet.html
AffectNet: http://mohammadmahoor.com/affectnet/
RAF-DB, RAF-ML: http://whdeng.cn/Emotion/projects.html
Aff-Wild: https://ibug.doc.ic.ac.uk/resources/first-affect-wild-challenge/
ExpW: http://mmlab.ie.cuhk.edu.hk/projects/socialrelation/index.html
Li, S., & Deng, W. Deep facial expression recognition: A survey. CoRR abs/1804.08348 (2018).
Datasets: from Basic to Complex — posed, lab-controlled → spontaneous, in-the-wild (movies, micro-expression)
Annotation: crowd-sourcing with 315 online volunteers; each image was labelled about 40 times, yielding 1,200,000 labels in total. Learning from labels: basic, compound, and blended emotions.
[Figure: per-image emotion probability distribution estimated from the crowd labels]
Image Collection
Flickr (Image social network)
https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key={}&text={}&tags={}
&per_page={}&page={}&sort=relevance
XML response → interpreted into image URLs → download
Pipeline: keywords such as ‘smile’, ‘crying’, ‘OMG’, … → query Flickr’s API → parse the XML response → image URLs → download. Around 60,000 images were downloaded, of which ~30,000 compose the dataset.
S. Li, W. Deng, and J. Du, “Reliable Crowdsourcing and Deep Locality-Preserving Learning for Expression Recognition in the Wild,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017, pp. 2584–2593.
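As a rough illustration of this collection pipeline, the sketch below builds the flickr.photos.search query shown above and parses the XML response into image URLs. The API key, helper names, and the live.staticflickr.com URL pattern are illustrative assumptions, not details from the talk.

```python
# Sketch of the collection pipeline: build the flickr.photos.search query,
# then interpret the XML response into downloadable image URLs.
import xml.etree.ElementTree as ET

API_URL = "https://api.flickr.com/services/rest/"

def build_query(api_key, keyword, page=1, per_page=100):
    # Mirrors the request template shown on the slide
    return (f"{API_URL}?method=flickr.photos.search&api_key={api_key}"
            f"&text={keyword}&tags={keyword}"
            f"&per_page={per_page}&page={page}&sort=relevance")

def parse_photo_urls(xml_text):
    # Flickr returns <rsp><photos><photo id=... secret=... server=.../>...</photos></rsp>;
    # each <photo> element maps to a static image URL (pattern assumed here)
    root = ET.fromstring(xml_text)
    return ["https://live.staticflickr.com/{server}/{id}_{secret}.jpg"
            .format(server=p.get("server"), id=p.get("id"), secret=p.get("secret"))
            for p in root.iter("photo")]
```

Downloading is then a matter of fetching each returned URL, e.g. with `urllib.request.urlretrieve`.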
Image Annotation by crowd-sourcing: learning from 1,200,000 labels. 315 well-trained online annotators were asked to label each facial image with one emotion category, and each image was annotated independently enough times, i.e., around 40 times in our experiment.
Reliability Estimation: filter out noisy annotators and unreliable labels. An Expectation-Maximization (EM) framework iteratively optimizes and assesses each labeler’s reliability, yielding an optimal reliability estimate.
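The exact EM formulation is in the CVPR 2017 reference; the sketch below only illustrates the general idea (in the spirit of Dawid–Skene): alternately estimate each image’s label posterior from reliability-weighted votes, then re-estimate each annotator’s reliability as agreement with that consensus. All names are illustrative.

```python
# Simplified EM-style reliability estimation for crowdsourced labels.
import numpy as np

def estimate_reliability(votes, n_classes, n_iter=20):
    """votes: list of (image_id, annotator_id, label) triples."""
    n_images = max(v[0] for v in votes) + 1
    n_annot = max(v[1] for v in votes) + 1
    reliability = np.ones(n_annot)            # start by trusting everyone equally
    for _ in range(n_iter):
        # E-step: reliability-weighted vote -> posterior over each image's true label
        post = np.zeros((n_images, n_classes))
        for img, ann, lab in votes:
            post[img, lab] += reliability[ann]
        post /= post.sum(axis=1, keepdims=True)
        # M-step: reliability = expected agreement with the current consensus
        agree = np.zeros(n_annot)
        count = np.zeros(n_annot)
        for img, ann, lab in votes:
            agree[ann] += post[img, lab]
            count[ann] += 1
        reliability = agree / np.maximum(count, 1)
    return reliability, post.argmax(axis=1)
```

Annotators whose estimated reliability falls below a threshold can then be filtered out before the final labels are fixed.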
The dataset includes 12 classes of compound emotions, with additional attribute annotations per image.
1. “Nonverbal communication”, M. Anderson, 1987. 2. “Facial expression and emotion”, P. Ekman, 1993. 3. “Compound facial expressions of emotion”, Martinez et al., PNAS 2014.
While past research had identified facial expressions associated with a single internally felt category (e.g., the facial expression of happiness when we feel joyful), we have recently studied facial expressions observed when people experience compound emotions (e.g., the facial expression of happy surprise when we feel joyful in a surprised way, as, for example, at a surprise birthday party).
[Figure: prototypical Action Unit (AU) combinations for basic and compound expressions — panels: Surprise, Fear, Joy, Anger, Disgust, Sadness]
CK+ vs. RAF-DB
S. Li and W. Deng, “Reliable Crowdsourcing and Deep Locality-Preserving Learning for Unconstrained Facial Expression Recognition,” IEEE Transactions on Image Processing, vol. 28, no. 1, pp. 356–370, Jan 2019.
DLP-CNN architecture (C: convolution layer, P: max-pooling layer, R: ReLU layer, F: fully connected layer):
Input → C R P, C R P, C R, C R P, C R, C R, F R, F
Face images are mapped to separable features under the Softmax Loss and to discriminative features under the Locality-Preserving Loss, with the two losses balanced by the weight λ.
Our goal: min Σ_{i,j} T_{ij} ‖x_i − x_j‖²₂, where T_{ij} = 1 if x_j is among the k nearest neighbors of x_i, or vice versa, and T_{ij} = 0 otherwise.
Locality-Preserving Loss: L_LP = (1/2n) Σ_{i=1}^{n} ‖ x_i − (1/k) Σ_{x ∈ N_k(x_i)} x ‖²₂, where N_k(x_i) denotes the k nearest neighbors of x_i.
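Assuming a mini-batch of deep features with class labels, the locality-preserving loss above can be sketched as follows; restricting neighbors to the same class and the helper names are assumptions of this sketch, not the paper’s exact implementation.

```python
# Sketch of a locality-preserving loss: pull each feature toward the
# mean of its k nearest same-class neighbors in the batch.
import numpy as np

def lp_loss(x, labels, k=3):
    """x: n x d feature matrix; labels: length-n integer class labels."""
    n = len(x)
    loss = 0.0
    for i in range(n):
        same = np.where(labels == labels[i])[0]
        same = same[same != i]                    # same-class candidates, excluding self
        if len(same) == 0:
            continue
        d = np.linalg.norm(x[same] - x[i], axis=1)
        knn = same[np.argsort(d)[:k]]             # k intra-class nearest neighbors
        center = x[knn].mean(axis=0)
        loss += np.sum((x[i] - center) ** 2)      # squared distance to the local center
    return loss / (2 * n)
```

In training this term would be added to the softmax loss with weight λ and back-propagated through the feature extractor.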
Table 1. Expression recognition performance of different DCNNs on RAF. The metric is the mean diagonal value of the confusion matrix.
Table 2. Comparison results of DLP-CNN and other state-of-the-art methods on CK+, SFEW, and MMI, where DLP-CNN is used as a feature extraction tool without fine-tuning.
Plutchik’s Wheel of Emotions
Many emotions are simply a combination of basic emotions or are derived from one (or more) of these basic emotions.
Real-life emotions are often blended and involve several simultaneously superposed expressions. Researchers of emotion have noted that facial expressions may contain more than one message (Ekman & Friesen, 1969; Izard, 1971; Plutchik, 1962; Tomkins, 1963).
Li, Shan, and Weihong Deng. "Blended Emotion in-the-Wild: Multi-label Facial Expression Recognition Using Crowdsourced Annotations and Deep Locality Feature Learning." International Journal of Computer Vision (2019): 1-23.
Blended Emotions With Multiple Labels
Emotion Distribution: multi-label annotations per image.
[Figure: per-image multi-label emotion probability distribution]
Our goal: min Σ_{i,j} (T_{ij}^f + T_{ij}^l) ‖x_i^f − x_j^f‖²₂, where T_{ij}^∗ = 1 if x_j is a k-nearest neighbor of x_i in the ∗ manifold, or vice versa, and 0 otherwise.
Bi-Manifold Loss: L_BM = (1/2n) Σ_{i=1}^{n} ‖ 2x_i^f − (1/k) Σ_{x ∈ N_k^f(x_i)} x^f − (1/k) Σ_{x ∈ N_k^l(x_i)} x^f ‖²₂
x^f: samples in the feature manifold; x^l: samples in the label manifold.
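A small sketch of the bi-manifold idea above: each deep feature is pulled toward the mean of its k nearest neighbors in the feature manifold and toward the mean of the features of its k nearest neighbors in the label (emotion-distribution) manifold. Function names and the brute-force kNN are illustrative assumptions.

```python
# Sketch of a bi-manifold loss over a batch of features and label distributions.
import numpy as np

def knn_indices(points, i, k):
    # Brute-force k nearest neighbors of points[i], excluding the sample itself
    d = np.linalg.norm(points - points[i], axis=1)
    d[i] = np.inf
    return np.argsort(d)[:k]

def bimanifold_loss(feats, label_dists, k=3):
    """feats: n x d deep features; label_dists: n x c emotion distributions."""
    n = len(feats)
    loss = 0.0
    for i in range(n):
        nf = knn_indices(feats, i, k)        # neighbors in the feature manifold
        nl = knn_indices(label_dists, i, k)  # neighbors in the label manifold
        # residual of 2*x_i^f against both local neighborhood means (in feature space)
        resid = 2 * feats[i] - feats[nf].mean(axis=0) - feats[nl].mean(axis=0)
        loss += np.sum(resid ** 2)
    return loss / (2 * n)
```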
Two-dimensional deep feature embedding by DBM-CNN on RAF-ML :
Smoothness in terms of both face appearance and emotion perception
*For each evaluation criterion, “↓” indicates the smaller the better while “↑” indicates the bigger the better. **Bold values indicate the best result in terms of each performance index.
Table 1. Comparison results of DBM-CNN and other training models using MLkNN
Table 2. Experimental results of comparing features on RAF-ML using different algorithms
Table 2. (continued) Experimental results of comparing features on RAF-ML using different algorithms
Li, S., & Deng, W. A Deeper Look at Facial Expression Dataset Bias. CoRR abs/1904.11150 (2019).
Datasets play an important role in the progress of facial expression recognition algorithms, but they may suffer from obvious biases caused by different cultures and collection conditions. Hence, methods evaluated only with an intra-database protocol may lack generalization capability on unseen samples at test time.
Capture Bias:
Each dataset tends to have its own preference during the construction process.
Experiment Ⅰ: Dataset Recognition. Experiment Ⅱ: Cross-dataset Generalization.
Category Bias:
Annotators in each dataset may have different perceptions of the emotion conveyed in images, and many images tend to express more than one expression, which increases the uncertainty of annotation.
In traditional MMD, only the marginal distributions are restricted. Since domain invariance does not imply discriminativeness and class-distribution bias exists across domains, samples in the target domain are still prone to misclassification.
ECAN
We explore the underlying label information of target data, and match both the marginal and class-conditional distributions to mitigate the discrepancy. With the Re-weighted MMD redistributing the class distribution and the class-conditional MMD learning the conditionally invariant transformation, the discriminative separating hyperplane can thus generalize well to the target data.
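A minimal sketch of this idea (not ECAN’s exact formulation): a linear-kernel MMD aligns the marginal feature distributions, and class-conditional MMD terms are computed with target pseudo-labels and re-weighted by the estimated target class distribution. All names are illustrative.

```python
# Sketch of marginal + re-weighted class-conditional distribution matching.
import numpy as np

def mmd(xs, xt):
    # Squared MMD with a linear kernel reduces to the distance between means
    return np.sum((xs.mean(axis=0) - xt.mean(axis=0)) ** 2)

def ecan_style_loss(xs, ys, xt, yt_pseudo, n_classes):
    """xs/xt: source/target features; ys: source labels; yt_pseudo: target pseudo-labels."""
    loss = mmd(xs, xt)                               # marginal alignment
    # Re-weight classes by the estimated target class distribution
    w = np.bincount(yt_pseudo, minlength=n_classes) / len(yt_pseudo)
    for c in range(n_classes):
        s, t = xs[ys == c], xt[yt_pseudo == c]
        if len(s) and len(t):
            loss += w[c] * mmd(s, t)                 # class-conditional alignment
    return loss
```

In training, such a term would be minimized jointly with the source classification loss while the pseudo-labels are refreshed each epoch.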
Emotional Artificial Intelligence can be used for great purposes to help people and humanity.
Ongoing work: cross-domain color FER, facial expression synthesis, self-supervised AU detection, micro-expression recognition with small sample size, 3D FER, and spatial-temporal patterns for dynamic FER (Neurocomputing 2018, ACMMM 2018, TAC 2018, TPAMI 2019).
For more detailed and long-term survey, please refer to:
Timeline of algorithms (2007–2017):
Datasets: CK+, MMI, FER2013, EmotioNet, RAF-DB, AffectNet, EmotiW
Handcrafted features: Zhao et al. [15] (LBP-TOP, SVM), Shan et al. [12] (LBP, AdaBoost), Zhi et al. [19] (NMF), Zhong et al. [20] (sparse learning)
Deep models: Tang [130] (CNN, winner of FER2013), Kahou et al. [57] (CNN, DBN, DAE, winner of EmotiW 2013), Fan et al. [108] (CNN-LSTM, C3D, winner of EmotiW 2016)
Losses: LP loss, tuplet cluster loss, island loss, …
Networks: HoloNet, PPDN, IACNN, FaceNet2ExpNet, …
Li, S., & Deng, W. Deep facial expression recognition: A survey. CoRR abs/1804.08348 (2018).
[Figure: performance tendency — state-of-the-art accuracy (50–100%) on different facial expression databases, grouped into lab-controlled, real-world, and micro-expression]
Li, S., & Deng, W. Deep facial expression recognition: A survey. CoRR abs/1804.08348 (2018).
always be spontaneous)
[1] Shan Li, Weihong Deng. Blended Emotion in-the-Wild: Multi-label Facial Expression Recognition Using Crowdsourced Annotations and Deep Locality Feature Learning. International Journal of Computer Vision 127(6-7): 884-906 (2019).
[2] Shan Li, Weihong Deng. Reliable Crowdsourcing and Deep Locality-Preserving Learning for Unconstrained Facial Expression Recognition. IEEE Transactions on Image Processing 28(1): 356-370 (2019).
[3] Shan Li, Weihong Deng. A Deeper Look at Facial Expression Dataset Bias. arXiv:1904.11150 (2019).
[4] Shan Li, Weihong Deng. Deep Facial Expression Recognition: A Survey. arXiv:1804.08348v2 (2018).