Yancheng Bai, Yongqiang Zhang, Mingli Ding, Bernard Ghanem
task: detecting small faces in unconstrained conditions
challenges: small faces lack detailed information and are blurry
solution: directly generate a clear high-resolution face from a blurry small one by adopting a generative adversarial network (GAN).
traditional method: super-resolve and refine sequentially
solution: design a novel network that super-resolves and refines jointly
losses: new training losses guide the generator network to recover fine details and push the discriminator network to distinguish real vs. fake and face vs. non-face simultaneously
large and medium face detection: good
small faces: far from satisfactory
difficulty: small faces lack sufficient detail to be distinguished from similar-looking background; modern CNN-based face detectors represent faces with down-sampled convolutional (conv) feature maps at stride 8, 16 or 32, which lose most spatial information and are too coarse to describe small faces
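To make the stride argument concrete, here is a small sketch that computes how many feature-map cells a square face spans along one side at each stride. The face sizes are illustrative assumptions, not values taken from the paper.

```python
def cells_per_side(face_px, stride):
    """Side length of a face, in feature-map cells, after down-sampling by `stride`."""
    return face_px / stride

# Illustrative face sizes in pixels (assumed, not from the paper).
for face_px in (16, 64, 256):
    row = {stride: cells_per_side(face_px, stride) for stride in (8, 16, 32)}
    print(face_px, row)
```

Under these assumptions a 16-pixel face spans at most two cells per side at stride 8 and less than one cell at stride 32, which is why such maps are too coarse for small faces.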
traditional solutions: (a) directly up-sample images with bilinear interpolation and exhaustively search for faces on the up-sampled images, which increases computation cost and inference time; (b) use intermediate conv feature maps to represent faces at specific scales, but the shallow, fine-grained intermediate maps lack discrimination, which causes many false positives. Neither approach addresses the other challenges, such as blur
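A rough way to see why searching on up-sampled images is expensive: the pixel count grows with the square of the up-sampling factor. The sketch below uses a hypothetical 640x480 input and assumes detector cost scales roughly with the number of pixels processed.

```python
def upsampled_pixels(width, height, factor):
    """Pixel count after up-sampling both sides of an image by `factor`."""
    return (width * factor) * (height * factor)

# Hypothetical 640x480 input; detector cost is assumed to scale
# roughly with the number of pixels it has to process.
base = upsampled_pixels(640, 480, 1)
for factor in (2, 4):
    ratio = upsampled_pixels(640, 480, factor) / base
    print(f"x{factor} up-sampling -> {ratio:.0f}x the pixels")
```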
our solution: use a GAN. generator = SRN + RN. The super-resolution network (SRN) up-samples small faces to a fine scale, reducing artifacts and improving the quality of images up-sampled with a large upscaling factor. The refinement network (RN) recovers missing details in the up-sampled images and generates sharp high-resolution images for classification. The discriminator sub-network uses a new loss function that forces it to decide simultaneously whether an input is a real or a generated high-resolution image and whether it is a face or a non-face.
contribution:
(1) GAN: generator = SRN + RN, discriminator multi-task
(2) new loss: promote the discriminator network to distinguish the real/fake image and face/non-face simultaneously
(3) state-of-the-art performance
hand-crafted feature based methods: use a single scale, which restricts detector performance
CNN-based methods + up-sampling (re-sizing input images to different scales during training and testing): inevitably increases memory and computation costs and generates images with large structural distortions
our method: exploits super-resolution and refinement networks to generate clear, fine, high-resolution faces
The results look almost too good to be true... and in some places non-face regions are also classified as faces.
the first work trying to jointly super-resolve and refine the small blurry faces in the wild
super-resolution (SRGAN): outputs are blurry and lack fine details, especially for low-resolution faces
extend the discriminator network to classify fake vs. real and face vs. non-face simultaneously
$I^{LR}$: low-resolution face candidates
$I^{HR}$: high-resolution face candidates
$y$: label, 1 for face, 0 for non-face
generator: $G: I^{LR} \mapsto I^{HR}$
discriminator: $D$, jointly distinguishes generated vs. true high-resolution images and faces vs. non-faces
SRN: takes low-resolution images as input and outputs super-resolved images, which are usually blurry
RN: refines the super-resolved images
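The two-stage generator can be sketched as a plain function composition, with the RN applied to the SRN output. Both stages below are hypothetical stand-ins (nearest-neighbour up-sampling and a clamp), not the learned networks from the paper, and the factor of 4 is an assumption made for illustration.

```python
def srn(image, factor=4):
    """Stand-in for the SRN: nearest-neighbour up-sampling of a 2-D list of pixels."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def rn(image):
    """Stand-in for the RN: clamps pixels to [0, 1]; a real RN would be a learned CNN."""
    return [[min(1.0, max(0.0, p)) for p in row] for row in image]

def generator(image):
    """Generator = RN applied to the SRN output: super-resolve first, then refine."""
    return rn(srn(image))

low_res = [[0.1, 0.9], [0.5, 1.2]]  # made-up 2x2 input
high_res = generator(low_res)
print(len(high_res), len(high_res[0]))  # 8 8
```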
pixel-wise loss (generator): similar to an autoencoder loss, $L_{MSE} = \|G_1(I^{LR}) - I^{HR}\|^2 + \|G_2(G_1(I^{LR})) - I^{HR}\|^2$, where $G_1$ and $G_2$ denote the SRN and the RN respectively
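A minimal per-sample sketch of this pixel-wise loss, assuming images are 2-D lists of floats and that the norm is a squared L2 distance. The pixel values are made up for illustration.

```python
def squared_l2(a, b):
    """Squared L2 distance between two equally sized 2-D images."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def pixel_loss(sr_out, rn_out, target):
    """Pixel-wise loss for one sample: the SRN output and the RN output
    are each compared with the high-resolution target."""
    return squared_l2(sr_out, target) + squared_l2(rn_out, target)

target = [[0.0, 1.0], [1.0, 0.0]]
sr_out = [[0.5, 0.5], [0.5, 0.5]]   # blurry intermediate SRN output (made-up values)
rn_out = [[0.0, 1.0], [1.0, 0.5]]   # sharper RN output (made-up values)
print(pixel_loss(sr_out, rn_out, target))  # 1.25
```

Penalising the intermediate SRN output as well as the final RN output gives both stages a direct training signal.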
adversarial loss (discriminator): $L_{adv} = \log(1 - D(G(I^{LR})))$
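For a single generated sample this adversarial term can be evaluated directly. The sketch below assumes the discriminator output is a probability strictly between 0 and 1, and the scores are made-up values.

```python
import math

def adversarial_loss(d_fake):
    """log(1 - D(G(I_LR))) for one generated sample; d_fake must lie in (0, 1)."""
    return math.log(1.0 - d_fake)

# Made-up discriminator scores for a generated image.
for d_fake in (0.1, 0.5, 0.9):
    print(d_fake, round(adversarial_loss(d_fake), 3))
```

The value decreases as the discriminator assigns a higher "real" probability to the generated image, so minimising it pushes the generator toward outputs the discriminator accepts as real.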
classification loss: $L_{clc} = \log()$; why not use a softmax loss?
The final loss is a weighted sum of the three losses.
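As a sketch of that weighted sum: the form below gives the pixel-wise term a weight of 1 and scales the other two with trade-off weights `alpha` and `beta`. That form, the parameter names, and every number are assumptions for illustration; these notes do not record the actual weights.

```python
def total_loss(l_mse, l_adv, l_clc, alpha, beta):
    """Weighted sum of the three losses; alpha and beta are hypothetical trade-off weights."""
    return l_mse + alpha * l_adv + beta * l_clc

# All numbers below are placeholders chosen for illustration.
print(total_loss(l_mse=1.25, l_adv=-0.7, l_clc=-0.3, alpha=0.5, beta=0.5))
```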
I recently did similar work on MNIST, though not for a super-resolution task; quite a coincidence!