Finding Tiny Faces in the Wild with Generative Adversarial Network


Yancheng Bai, Yongqiang Zhang, Mingli Ding, Bernard Ghanem

Abstract

task: detecting small faces in unconstrained conditions
challenges: small faces lack detailed information and are blurry
solution: directly generate a clear high-resolution face from a blurry small one by adopting a generative adversarial network (GAN)
traditional methods: super-resolve and then refine sequentially, as two separate problems
solution: design a single novel network that handles both jointly
new training losses guide the generator network to recover fine details and push the discriminator network to distinguish real vs. fake and face vs. non-face simultaneously

Introduction

detection of large and medium faces: good
small faces: far from satisfactory
difficulty: small faces lack sufficient detailed information to distinguish them from similar background; modern CNN-based face detectors represent faces with down-sampled convolutional (conv) feature maps of stride 8, 16 or 32, which lose most spatial information and are too coarse to describe small faces
traditional solutions: (a) directly up-sample images with bi-linear interpolation and exhaustively search for faces on the up-sampled images, which increases computation cost and inference time; (b) use intermediate conv feature maps to represent faces at specific scales, but these shallow, fine-grained feature maps lack discriminative power and cause many false positives; neither addresses the other challenges (e.g. blur)
our solution: use a GAN with generator = SRN + RN. The super-resolution network (SRN) up-samples small faces to a fine scale, reducing artifacts and improving the quality of up-sampled images even with large upscaling factors. The refinement network (RN) recovers missing details in the up-sampled images and generates sharp high-resolution images for classification. The discriminator sub-network uses a new loss function that forces it to distinguish real images vs. generated high-resolution images and faces vs. non-faces simultaneously.
contribution:
(1) GAN: generator = SRN + RN, discriminator multi-task
(2) new loss: promote the discriminator network to distinguish the real/fake image and face/non-face simultaneously
(3) state-of-the-art performance

Face Detection

hand-crafted feature based methods: operate at a single scale, which restricts detector performance
CNN-based methods + up-sampling by re-sizing input images to different scales during training and testing: inevitably increases memory and computation costs, and the up-sampled images have large structural distortions
our method: exploits the super-resolution and refinement network to generate clear and fine faces with high resolution

The results almost seem too good... and in some places non-face regions are also detected as faces.

Super-resolution and Refinement Network

the first work to jointly super-resolve and refine small blurry faces in the wild

Generative Adversarial Networks

SRGAN for super-resolution: results are still blurry and lack fine details, especially for low-resolution faces
this work extends the discriminator network to classify fake vs. real and face vs. non-face simultaneously

Proposed Method

GAN

$I^{LR}$: low-resolution face candidates
$I^{HR}$: high-resolution face candidates
$y$: label, 1 for face, 0 for non-face
generator: $G: I^{LR} \mapsto I^{HR}$
discriminator: $D$, distinguishes generated vs. real high-resolution images and faces vs. non-faces jointly

$$\min_{\theta_G}\max_{\theta_D}\; \mathbb{E}_{(I^{HR},y)\sim p(I^{HR},y)}\big[\log D(I^{HR},y;\theta_D)\big] + \mathbb{E}_{(I^{LR},y)\sim p(I^{LR},y)}\big[\log\big(1 - D(G(I^{LR},y;\theta_G);\theta_D)\big)\big]$$
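A minimal sketch (my own PyTorch assumption, not code from the paper) of how this min-max objective is typically optimized with alternating updates. Here `D` has a single real/fake output head; the face/non-face head from the multi-task loss is covered in the loss-function section below. `G`, `D`, the optimizers, and the batch tensors are placeholders:

```python
import torch

def gan_step(G, D, opt_G, opt_D, lr_faces, hr_faces):
    """One alternating update of the min-max objective above.

    lr_faces: low-resolution candidates I^LR
    hr_faces: real high-resolution candidates I^HR
    D(x) is assumed to output a probability in (0, 1) that x is a real HR image.
    """
    # --- discriminator step: maximize log D(I^HR) + log(1 - D(G(I^LR))) ---
    opt_D.zero_grad()
    fake_hr = G(lr_faces).detach()              # block gradients into G
    d_real = D(hr_faces)
    d_fake = D(fake_hr)
    loss_D = -(torch.log(d_real + 1e-8).mean()
               + torch.log(1 - d_fake + 1e-8).mean())
    loss_D.backward()
    opt_D.step()

    # --- generator step: minimize log(1 - D(G(I^LR))), i.e. push D(G) toward 1 ---
    opt_G.zero_grad()
    d_fake = D(G(lr_faces))
    loss_G = torch.log(1 - d_fake + 1e-8).mean()
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```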

Network Architecture

SRN: takes low-resolution images as input and outputs super-resolved images, which are usually still blurry
RN: refines the super-resolved images to recover sharp details
[Figure: network architecture of the generator (SRN + RN) and the discriminator]
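A rough structural sketch of the two-stage generator. The layer configuration (4x up-sampling via transposed convolutions, a small residual refinement body, channel widths) is my assumption for illustration, not the paper's exact architecture:

```python
import torch.nn as nn

class SRN(nn.Module):
    """Super-resolution sub-network: 4x up-sampling of small face crops (illustrative layers)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),  # 2x
            nn.ConvTranspose2d(64, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),  # 4x
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

class RN(nn.Module):
    """Refinement sub-network: same-resolution residual refinement of the SRN output."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)   # residual connection keeps the SRN output as a base

class Generator(nn.Module):
    """Generator G: RN(SRN(I^LR)); also returns the intermediate SRN output for the MSE loss."""
    def __init__(self):
        super().__init__()
        self.srn, self.rn = SRN(), RN()

    def forward(self, lr):
        sr = self.srn(lr)         # blurry super-resolved image G1(I^LR)
        return self.rn(sr), sr    # refined image G2(G1(I^LR)) and intermediate output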

Loss Function

pixel-wise loss (generator): similar to an autoencoder reconstruction loss, $L_{MSE} = \|G_1(I^{LR}) - I^{HR}\|^2 + \|G_2(G_1(I^{LR})) - I^{HR}\|^2$, where $G_1$ and $G_2$ denote the SRN and the RN respectively
adversarial loss (discriminator): $L_{adv} = \log\big(1 - D(G(I^{LR}))\big)$
classification loss: $L_{clc} = \log(\cdot)$ (left incomplete in these notes); why not a softmax loss?
The final loss is a weighted sum of these three losses.
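A hedged sketch of the weighted combination for the generator update, assuming the two-output generator from the architecture sketch above and a two-headed discriminator (real/fake probability and face/non-face probability). The weights `alpha` and `beta` are illustrative assumptions, and the classification term is written as binary cross-entropy since the notes leave its exact form open:

```python
import torch
import torch.nn.functional as F

def generator_loss(G, D, lr, hr, y, alpha=0.001, beta=0.01):
    """Weighted sum of pixel-wise, adversarial and classification losses.

    lr, hr: low-/high-resolution batches; y: float tensor of face (1) / non-face (0) labels.
    alpha / beta are illustrative weights, not values from the paper.
    """
    refined, sr = G(lr)                        # G2(G1(I^LR)) and intermediate G1(I^LR)
    # pixel-wise MSE on both the intermediate and the refined image
    l_mse = F.mse_loss(sr, hr) + F.mse_loss(refined, hr)
    # adversarial and face/non-face terms from the two-headed discriminator
    prob_real, prob_face = D(refined)
    l_adv = torch.log(1 - prob_real + 1e-8).mean()     # minimized => D(G) pushed toward 1
    l_clc = F.binary_cross_entropy(prob_face, y)
    return l_mse + alpha * l_adv + beta * l_clc
```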
I recently did similar work on MNIST myself, just not for a super-resolution task; quite a coincidence!
