A survey on micro-expression recognition, published in The Visual Computer. It is quite detailed, so here are my reading notes.
Abstract:
Challenges in these areas remain relevant due to the nature of ME’s split-second transition with minute intensity levels.
1 Introduction
Micro-expression (ME) is a transitory motion of the human face that usually lasts between 1/25 and 1/5 s.
In contrast to normal facial expressions, ME is difficult to produce or neutralize intentionally, which makes it effective evidence for lie detection in scenarios where a person has something to lose or gain.
Ekman has categorized human emotions into seven universal emotions, i.e. anger, happiness, sadness, disgust, surprise, fear, and contempt.
Ekman and Friesen have also introduced Facial Action Coding System (FACS) to define facial expressions through action units (AUs).
AU is an observable component of facial movement where distinct facial areas are used to detect fine-grained expression changes on faces.
There are currently a total of 44 AUs that happen independently or simultaneously with other AUs to express an emotion.
The accuracy of human recognition with AU is only about 40% due to short-lived ME occurrences.
Current challenges such as environmental variation, spontaneous subtle motion, and imbalanced datasets greatly impact detection and recognition accuracy.
Environmental variation is the most challenging issue in ME recognition which includes illumination variation and head-pose variation.
The low intensity of subtle and spontaneous facial movement is a major challenge for ME recognition, as it renders emotions indistinguishable to the naked eye.
Although the existing datasets are recommended for evaluating ME recognition systems, their imbalanced data distribution across expressions may bias the results.
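To see why an imbalanced class distribution can bias reported results, here is a small sketch of my own (not from the survey) comparing plain accuracy with the mean per-class recall. The function names and the toy labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall, so every class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Toy, hypothetical labels: nine samples of one class and one of another.
y_true = ["happiness"] * 9 + ["fear"]
# A classifier that always predicts the majority class.
y_pred = ["happiness"] * 10

print(accuracy(y_true, y_pred), macro_recall(y_true, y_pred))
```

On this toy data the always-majority classifier gets an accuracy of 0.9 but a mean per-class recall of only 0.5, which is the kind of bias the sentence above warns about.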
2 Context
2.1 Dataset
2.2 General pipeline
The ME recognition process can be divided into image acquisition, face detection, pre-processing, ME spotting, feature extraction, and ME classification.
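As a rough illustration of how these six stages chain together, here is a minimal sketch. The stage names come from the sentence above; `run_pipeline` and the placeholder stage functions are hypothetical and only show the ordering, not any real implementation.

```python
# Stage names in the order listed in the survey.
STAGES = [
    "image acquisition",
    "face detection",
    "pre-processing",
    "ME spotting",
    "feature extraction",
    "ME classification",
]


def run_pipeline(data, stage_fns):
    """Pass `data` through one callable per stage, in pipeline order.

    `stage_fns` maps each stage name to a function (all hypothetical here).
    Returns the final output and the list of stages that were executed.
    """
    trace = []
    for name in STAGES:
        data = stage_fns[name](data)
        trace.append(name)
    return data, trace


# Placeholder stages that just pass the data through unchanged.
identity_stages = {name: (lambda d: d) for name in STAGES}
output, executed = run_pipeline("raw video", identity_stages)
print(executed)
```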
Several pre-processing steps are implemented to overcome lighting variation or noise attack.
Research on ME spotting from images or videos is a potential future research direction.
2.3 Pre-processing
Pre-processing in ME recognition usually involves face detection, face registration, motion magnification, and temporal normalization.
2.3.1 Face detection and registration
The face registration stage aligns a detected face onto a reference face.
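One simple way to picture registration is a least-squares similarity transform (rotation, uniform scale, translation) that maps detected landmarks onto reference landmarks. The sketch below is my own illustration, assuming 2-D landmark pairs are already available; the survey does not prescribe this particular method, and `fit_similarity` / `apply_similarity` are made-up names. Treating each point as a complex number turns the fit into a one-line formula.

```python
def fit_similarity(src, ref):
    """Least-squares a, b (complex) such that a*z + b approximates w.

    `src` and `ref` are equal-length lists of (x, y) landmark pairs.
    The complex factor `a` encodes rotation and scale, `b` the translation.
    """
    z = [complex(x, y) for x, y in src]
    w = [complex(x, y) for x, y in ref]
    n = len(z)
    z_mean = sum(z) / n
    w_mean = sum(w) / n
    den = sum(abs(zi - z_mean) ** 2 for zi in z)
    if den == 0:
        raise ValueError("source landmarks must not all coincide")
    num = sum((wi - w_mean) * (zi - z_mean).conjugate() for zi, wi in zip(z, w))
    a = num / den
    b = w_mean - a * z_mean
    return a, b


def apply_similarity(points, a, b):
    """Map (x, y) points through the fitted transform."""
    out = []
    for x, y in points:
        w = a * complex(x, y) + b
        out.append((w.real, w.imag))
    return out


# Hypothetical landmarks: the detected face is the reference face
# rotated by 90 degrees, scaled by 2 and shifted by (3, 4).
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
detected = [(3.0, 4.0), (3.0, 6.0), (1.0, 4.0)]
a, b = fit_similarity(detected, reference)
print(apply_similarity(detected, a, b))
```

In a real system the same transform would then be used to warp the whole face image, which needs an image library and is left out here.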
2.3.2 Motion magnification
Motion magnification techniques are introduced to increase distinguishing powers between different motions.
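The basic idea of magnification can be sketched on a single pixel's intensity over time: isolate the temporal change with a band-pass filter, scale it, and add it back. The code below is a simplified illustration of that idea, not the method of any specific paper in the survey. The band-pass is approximated as the difference of two exponential moving averages, and `alpha`, `r_fast` and `r_slow` are assumed parameter names and values.

```python
def magnify_signal(signal, alpha=10.0, r_fast=0.4, r_slow=0.05):
    """Amplify temporal changes in one pixel's intensity series.

    `signal` is a list of intensities over time. The band-passed part is
    the difference between a fast and a slow exponential moving average;
    it is scaled by `alpha` and added back to the original value.
    """
    if not signal:
        return []
    fast = slow = signal[0]
    out = []
    for x in signal:
        fast = r_fast * x + (1 - r_fast) * fast
        slow = r_slow * x + (1 - r_slow) * slow
        band = fast - slow
        out.append(x + alpha * band)
    return out


# A tiny intensity step that would be hard to see without amplification.
series = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(magnify_signal(series))
```

A constant signal passes through unchanged, since both averages stay equal and the band-passed part is zero. A full video pipeline would apply this per pixel, usually on a spatial pyramid rather than on raw intensities.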
2.3.3 Temporal normalization
The temporal interpolation model (TIM) method is commonly used to normalize video lengths.
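To make "normalizing video lengths" concrete, here is a sketch that resamples a clip to a fixed number of frames. Note that this uses plain linear interpolation between neighbouring frames as a stand-in; it is not the TIM algorithm itself. `resample_frames` is a made-up name, and frames are assumed to be flattened lists of pixel values.

```python
def resample_frames(frames, target_len):
    """Resample a frame sequence to `target_len` frames.

    Uses linear interpolation between neighbouring frames, with the first
    and last frames preserved. Each frame is a flat list of numbers.
    """
    n = len(frames)
    if n == 0 or target_len < 1:
        raise ValueError("need at least one frame and target_len >= 1")
    if n == 1 or target_len == 1:
        return [list(frames[0]) for _ in range(target_len)]
    out = []
    for k in range(target_len):
        pos = k * (n - 1) / (target_len - 1)  # fractional source index
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


# Stretch a two-frame, one-pixel clip to five frames.
print(resample_frames([[0.0], [10.0]], 5))
```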
2.4 Classification
Classification normally refers to the categorization of emotions based on the selected input features.
3 Features for ME representation
3.1 Low-level representation
Low-level features are normally represented in the form of descriptors containing a set of visual cues without explicit semantic meaning or knowledge.
In this paper, we briefly describe features in the following families: local binary pattern (LBP), optical flow, gradient based, and their respective variants.
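As a reminder of what the LBP family builds on, here is a sketch of the basic 3x3 LBP operator on a single grayscale image, followed by a 256-bin histogram. This is only the plain 2-D operator, not the variants discussed in the survey; the neighbour ordering and the `>=` comparison are conventions I picked for the example.

```python
# Neighbour offsets (row, col), clockwise starting from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """8-bit LBP code of the interior pixel (r, c) in a list-of-lists image."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        # Set the bit when the neighbour is at least as bright as the centre.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """Histogram of LBP codes over all interior pixels (border skipped)."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


# Toy 3x3 patch: only the top-left neighbour is brighter than the centre.
patch = [
    [9, 1, 1],
    [1, 5, 1],
    [1, 1, 1],
]
print(lbp_code(patch, 1, 1))
```

The histogram, rather than the per-pixel codes, is what would normally be fed to a classifier.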
3.2 Mid-level representation
Low-level features remain inadequate in representing subtle motions due to short duration, low intensity, noise and head-pose changes.
A mid-level feature transforms local features into an image representation for classification, where weights are added to give explicit meaning and knowledge to the local features.
The most common mid-level technique is the bag-of-words (BoW) representation that is commonly used in affect recognition.
To conclude, there are limited mid-level representations proposed to handle ME recognition.
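The bag-of-words idea mentioned above can be sketched as follows: each local descriptor is assigned to its nearest codeword, and the image is represented by the normalized histogram of assignments. This is a generic illustration of mine, assuming a codebook is already given (it would normally be learned by clustering, e.g. k-means); `bow_histogram` and the toy vectors are hypothetical.

```python
def bow_histogram(descriptors, codebook):
    """Normalized histogram of nearest-codeword assignments.

    `descriptors` and `codebook` are lists of equal-dimension vectors.
    Nearest means smallest squared Euclidean distance.
    """
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])),
        )
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Toy 2-D descriptors and a two-word codebook.
codebook = [[0.0, 0.0], [10.0, 10.0]]
local_features = [[1.0, 0.0], [9.0, 9.0], [0.0, 1.0], [11.0, 10.0]]
print(bow_histogram(local_features, codebook))
```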
3.3 High-level representation
A high-level representation can be defined as a set of semantic data that are human interpretable, where the high-level features are a combination of several low-level features.
4 Micro-expression spotting (as I understand it, this means detecting where a micro-expression starts and ends within a video sequence)
ME spotting is a stage where frames containing emotions are detected in time for a given video.
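To make spotting more concrete, here is a rough sketch of one possible scheme: compare each frame's feature vector with the average of the frames `k` steps before and after it, and flag frames whose difference clearly exceeds the rest. The comparison scheme, the threshold rule and the parameters `k` and `p` are all assumptions of this sketch, not something specified in these notes.

```python
def spot_frames(features, k=2, p=0.5):
    """Return indices of frames whose features deviate from their surroundings.

    `features` is a list of per-frame feature vectors. For frame i, the
    difference is the L1 distance between its features and the average of
    frames i-k and i+k. A frame is flagged when its difference exceeds
    mean + p * (max - mean) over the whole clip.
    """
    n = len(features)
    if n == 0:
        return []
    diffs = [0.0] * n
    for i in range(k, n - k):
        avg = [(a + b) / 2 for a, b in zip(features[i - k], features[i + k])]
        diffs[i] = sum(abs(x - y) for x, y in zip(features[i], avg))
    mean = sum(diffs) / n
    threshold = mean + p * (max(diffs) - mean)
    return [i for i, d in enumerate(diffs) if d > threshold]


# Toy clip: eleven one-dimensional frames with a brief change at frame 5.
clip = [[0.0] for _ in range(11)]
clip[5] = [1.0]
print(spot_frames(clip))
```

On the toy clip only frame 5 is flagged. Real spotting methods would additionally group consecutive flagged frames into onset and offset intervals.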
4.1 Appearance-based approach
The appearance-based approach normally refers to feature representations constructed at the pixel level, especially from intensity values.
However, this method is indifferent to non-micro-expression movements, such as eye blinking.
Besides intensity-based features, methods such as 3D gradient histogram descriptor or histogram of oriented gradients are also reported for ME spotting.
4.2 Dynamic approach
The feature is constructed based on non-rigid motion changes of subtle expression where motion changes are extracted for spotting purpose.
4.3 Generic approach
5 Recognition results and discussion
5.1 Result
Leave-one-video-out (LOVO) and leave-one-subject-out (LOSO) cross-validation are the most common methods used in ME recognition performance measurement.
Besides LOVO and LOSO, k-fold validation, repeated random sub-sampling validation, and basic hold-out methods are also used for performance evaluation, with recall and precision graphs reported in other works.
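The leave-one-subject-out protocol mentioned above can be sketched in a few lines: every fold holds out all samples of one subject for testing and trains on the rest, so no subject appears on both sides. `loso_splits` and the subject IDs are made up for illustration.

```python
def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per fold.

    `subject_ids[i]` is the subject that sample i belongs to.
    """
    for held_out in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train, test


# Five hypothetical samples recorded from three subjects.
subjects = ["s1", "s1", "s2", "s3", "s3"]
for held_out, train_idx, test_idx in loso_splits(subjects):
    print(held_out, train_idx, test_idx)
```

LOVO follows the same pattern with one video (sample) held out per fold instead of one subject.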
5.2 Discussion and future recommendation
To overcome head-pose variation, works on face registration align and normalize the faces in all frames to the same position and size.
As far as illumination change is concerned, all algorithms were heavily tested in controlled and even illumination.
Hence, Euler magnification is introduced in the pre-processing stage to amplify low-intensity movements.
In pre-processing, works on magnification are lacking compared to feature extraction and classification.
Studies in ME spotting are limited.
Most of the proposed features are focused on low-level approach with only two existing works on mid-level features.
6 Conclusion
Feature representations are evolving from low-level approaches to mid-level and high-level approaches.