MesoNet: a Compact Facial Video Forgery Detection Network (Paper Reading Notes)

MesoNet: a Compact Facial Video Forgery Detection Network

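This paper is fairly easy to follow; there is nothing particularly complicated in it. The main idea is to detect forgeries from mid-level image features. MesoNet stands for "mesoscopic network", where mesoscopic refers to an intermediate scale between the microscopic and the macroscopic. The paper explains this choice as follows: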

We propose to detect forged videos of faces by placing our method at a mesoscopic level of analysis. Indeed, microscopic analyses based on image noise cannot be applied in a compressed video context where the image noise is strongly degraded. Similarly, at a higher semantic level, human eye struggles to distinguish forged images [21], especially when the image depicts a human face [1, 7]. That is why we propose to adopt an intermediate approach using a deep neural network with a small number of layers.



This paper presents a method to automatically and efficiently detect face tampering in videos, and particularly focuses on two recent techniques used to generate hyperrealistic forged videos: Deepfake and Face2Face.


Thus, this paper follows a deep learning approach and presents two networks, both with a low number of layers to focus on the mesoscopic properties of images. We evaluate those fast networks on both an existing dataset and a dataset we have constituted from online videos. The tests demonstrate a very successful detection rate with more than 98% for Deepfake and 95% for Face2Face.






1.1 Introduction to DeepFake


[Image 1 from the original post]


The process to generate Deepfake images is to gather aligned faces of two different people A and B, then to train an auto-encoder EA to reconstruct the faces of A from the dataset of facial images of A, and an auto-encoder EB to reconstruct the faces of B from the dataset of facial images of B. The trick consists in sharing the weights of the encoding part of the two auto-encoders EA and EB, but keeping their respective decoder separated. Once the optimization is done, any image containing a face of A can be encoded through this shared encoder but decoded with decoder of EB.

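In other words, generating Deepfake images starts by collecting aligned faces of two different people, A and B. An auto-encoder EA is trained to reconstruct the faces of A from A's face images, and an auto-encoder EB is trained to reconstruct the faces of B from B's face images. The trick is that EA and EB share the weights of their encoder while keeping their decoders separate. Once training is done, any face image of A can be encoded with the shared encoder and then decoded with the decoder of EB, which yields the swapped face.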
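The shared-encoder idea can be sketched with a toy model. This is only an illustration under strong assumptions: faces are short vectors, the encoder and decoders are plain linear maps trained by gradient descent on squared error, and the class and method names (`SharedEncoderAutoencoders`, `train_step`, `swap`) are hypothetical. Real Deepfake tools use convolutional networks on aligned face crops.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Small random weights so that training starts near zero output."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class SharedEncoderAutoencoders:
    """Toy linear stand-in for EA and EB: one shared encoder, two decoders."""

    def __init__(self, input_dim, latent_dim, seed=0):
        rng = random.Random(seed)
        self.encoder = random_matrix(latent_dim, input_dim, rng)  # shared
        self.decoders = {
            "A": random_matrix(input_dim, latent_dim, rng),
            "B": random_matrix(input_dim, latent_dim, rng),
        }

    def reconstruct(self, face, identity):
        code = matvec(self.encoder, face)
        return matvec(self.decoders[identity], code)

    def loss(self, face, identity):
        output = self.reconstruct(face, identity)
        return 0.5 * sum((o - x) ** 2 for o, x in zip(output, face))

    def train_step(self, face, identity, lr=0.05):
        """One gradient step on the reconstruction loss for one face."""
        decoder = self.decoders[identity]
        code = matvec(self.encoder, face)
        output = matvec(decoder, code)
        error = [o - x for o, x in zip(output, face)]
        # Gradient w.r.t. the code, computed before the decoder is modified.
        code_grad = [
            sum(decoder[i][j] * error[i] for i in range(len(error)))
            for j in range(len(code))
        ]
        for i, err in enumerate(error):
            for j, c in enumerate(code):
                decoder[i][j] -= lr * err * c
        for j, g in enumerate(code_grad):
            for m, x in enumerate(face):
                self.encoder[j][m] -= lr * g * x

    def swap(self, face, target_identity):
        """Encode with the shared encoder, decode with the other decoder."""
        return self.reconstruct(face, target_identity)


model = SharedEncoderAutoencoders(input_dim=4, latent_dim=2)
face_a = [1.0, 0.0, 1.0, 0.0]  # made-up "face" of A
face_b = [0.0, 1.0, 0.0, 1.0]  # made-up "face" of B
for _ in range(300):
    model.train_step(face_a, "A")
    model.train_step(face_b, "B")
swapped = model.swap(face_a, "B")  # A's input rendered by B's decoder
```

The point of the design is that both identities pass through the same `encoder`, so the latent code is forced to be identity-agnostic, while each decoder specializes in rendering one person.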



Basically, the extraction of faces and their reintegration can fail, especially in the case of face occlusions: some frames can end up with no facial reenactment or with a large blurred area or a doubled facial contour. However, those technical errors can easily be avoided with more advanced networks. More deeply, and this is true for other applications, autoencoders tend to poorly reconstruct fine details because of the compression of the input data on a limited encoding space, the result thus often appears a bit blurry.

1.2 Face2Face

Reenactment methods, like [9], are designed to transfer image facial expression from a source to a target person.


2. Proposed method

2.1 Meso-4


[Image 2 from the original post]
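To get a feel for how compact Meso-4 is, the sketch below counts trainable parameters for an assumed layer layout. The layout is not stated in these notes; it is an assumption based on the commonly described Meso-4 configuration (a 256x256x3 input, four convolution blocks with 8, 8, 16 and 16 filters, kernel sizes 3, 5, 5, 5, batch normalization and max pooling of 2, 2, 2, 4 after each, then dense layers of 16 and 1 units) and should be checked against the paper's architecture figure. The function name `meso4_summary` is hypothetical.

```python
# Assumed Meso-4 layout: (filters, kernel size, max-pool factor) per block.
CONV_BLOCKS = [(8, 3, 2), (8, 5, 2), (16, 5, 2), (16, 5, 4)]
DENSE_UNITS = [16, 1]  # assumed fully connected layers after flattening


def meso4_summary(input_size=256, input_channels=3):
    """Return the final feature map shape and the trainable parameter count."""
    size, channels, params = input_size, input_channels, 0
    for filters, kernel, pool in CONV_BLOCKS:
        # Convolution weights plus one bias per filter.
        params += kernel * kernel * channels * filters + filters
        # Batch normalization: one trainable scale and shift per channel.
        params += 2 * filters
        channels = filters
        # Assumes 'same' padding, so only pooling shrinks the feature map.
        size //= pool
    features = size * size * channels
    for units in DENSE_UNITS:
        params += features * units + units
        features = units
    return {"feature_map": (size, size, channels), "trainable_params": params}


summary = meso4_summary()
print(summary)
```

Under these assumptions the count comes to 27,977 trainable parameters, which is tiny compared with typical image classification networks.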

2.2 MesoInception-4


[Images 3 and 4 from the original post]

3. Experiments


[Images 5, 6, and 7 from the original post]

3.4 Image aggregation


Theoretically speaking, there is no justification for a gain in scores or a confidence interval indicator as frames of a same video are strongly correlated to one another.


In practice, for the viewer comfort, most filmed faces contain a majority of stable clear frames. The effect of punctual movement blur, face occlusion and random misprediction can thus be outweighed by a majority of good predictions on a sample of frames taken from the video.
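The aggregation described above can be sketched as averaging the per-frame scores of a video and thresholding the mean. This is a minimal illustration, assuming each score is the network's estimated probability that a frame is forged; the function name `aggregate_video_prediction` and the 0.5 threshold are assumptions, not taken from the paper.

```python
from statistics import mean


def aggregate_video_prediction(frame_scores, threshold=0.5):
    """Average per-frame forgery scores and classify the whole video.

    frame_scores: assumed probabilities in [0, 1] that each sampled frame
    is forged. Returns (video_score, is_forged).
    """
    if not frame_scores:
        raise ValueError("at least one frame score is required")
    video_score = mean(frame_scores)
    return video_score, video_score >= threshold


# One blurred or occluded frame (0.2) is outvoted by the clear frames.
scores = [0.9, 0.8, 0.2, 0.95, 0.85]
video_score, is_forged = aggregate_video_prediction(scores)
```

Here only four of the five frames would be classified correctly on their own, yet the video-level decision is still correct, which is the practical gain the paper describes.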

[Image 8 from the original post]

3.5. Aggregation on intra-frames


[Image 9 from the original post]

3.6. Intuition behind the network


[Image 10 from the original post]

4. Conclusion

Our experiments show that our method has an average detection rate of 98% for Deepfake videos and 95% for Face2Face videos under real conditions of diffusion on the internet.
