PatchGAN (pix2pix): Image-to-Image Translation with Conditional Adversarial Networks

Image-to-Image Translation with Conditional Adversarial Networks

Paper: https://arxiv.org/pdf/1611.07004.pdf
Code: https://github.com/affinelayer/Pix2Pix-tensorflow
Tips: A CVPR 2017 paper.
(Reading notes)

1.Main idea

  • Use conditional GANs as a general-purpose solution to image-to-image translation problems.
  • Instead of hand-designing one, learn a loss function to train this mapping.

2.Intro

  • By analogy with language translation, the paper defines the task: we define automatic image-to-image translation as the task of translating one possible representation of a scene into another.
  • Although CNNs already achieve strong results, they still need a hand-designed objective. In other words, we still have to tell the CNN what we wish it to minimize.
    Thanks to GANs, a high-dimensional, structured loss function can instead be learned directly.
  • Most prior work learns structured losses between output and target images; the paper then reviews the development of conditional GANs.

3.Details

  • The objective is close to the original GAN objective, with an L1 loss added (a minimal loss sketch in code follows this list):
    $$\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\left[\|y - G(x,z)\|_1\right]$$
    $$G^{*} = \arg\min_{G} \max_{D}\ \mathcal{L}_{cGAN}(G,D) + \lambda\,\mathcal{L}_{L1}(G)$$
    Note that without the noise $z$ the generator would learn only a deterministic mapping, producing essentially the same output for a given input $x$, which is not good enough.
  • The generator is U-Net-like: an encoder-decoder with skip connections.
    The discriminator effectively models the image as a Markov random field (PatchGAN): instead of judging the whole image at once, it judges it patch by patch and averages the resulting scores. This discriminator tries to classify if each $N \times N$ patch in an image is real or fake.
    This design produces high-quality results while having fewer parameters, running faster, and being applicable to arbitrarily large images.
  • The code, however, looks just like any other GAN implementation, with no explicit patch extraction to be found. The explanation (see the discriminator sketch after this list):
    The difference between a PatchGAN and a regular GAN discriminator is that the regular GAN maps from a 256x256 image to a single scalar output, which signifies "real" or "fake", whereas the PatchGAN maps from 256x256 to an $N \times N$ array of outputs $X$, where each $X_{ij}$ signifies whether the patch $(i,j)$ in the image is real or fake.
    Reference: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/issues/39
    Maybe it would have been better to call it a "Fully Convolutional GAN", as in FCNs; it is the same idea.
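
To make the objective above concrete, here is a minimal PyTorch-style sketch of the generator's combined loss. It is not the authors' code: the names G, D, real_A/real_B and the helper generator_loss are assumptions for illustration, and lambda_L1 = 100 is the weight reported in the paper.

```python
import torch
import torch.nn as nn

# Sketch of the pix2pix generator objective:
#   L(G) = L_cGAN(G, D) + lambda * L_L1(G)
# Assumed: G maps an input image to an output image; D takes the
# (input, output) pair and returns patch logits (see the PatchGAN sketch below).

bce = nn.BCEWithLogitsLoss()   # adversarial term
l1 = nn.L1Loss()               # reconstruction term
lambda_L1 = 100.0              # weight reported in the paper

def generator_loss(G, D, real_A, real_B):
    fake_B = G(real_A)             # G(x, z); the paper injects z via dropout inside G
    pred_fake = D(real_A, fake_B)  # conditional D sees the (x, G(x)) pair
    loss_gan = bce(pred_fake, torch.ones_like(pred_fake))  # try to fool D
    loss_l1 = l1(fake_B, real_B)   # stay close to the ground truth y
    return loss_gan + lambda_L1 * loss_l1
```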

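The "fully convolutional" point can also be shown directly: a PatchGAN discriminator is just a stack of convolutions whose output is an $N \times N$ grid of logits rather than a single scalar. The sketch below follows the common 70x70 PatchGAN layer configuration, but it is an illustrative assumption, not the reference implementation.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Fully convolutional discriminator: maps an (input, output) image pair
    to an N x N grid of logits, one logit per receptive-field patch."""

    def __init__(self, in_channels=6):  # 6 = input image + real/fake image, concatenated
        super().__init__()

        def block(c_in, c_out, stride, norm=True):
            layers = [nn.Conv2d(c_in, c_out, kernel_size=4, stride=stride, padding=1)]
            if norm:
                layers.append(nn.BatchNorm2d(c_out))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers

        self.model = nn.Sequential(
            *block(in_channels, 64, stride=2, norm=False),
            *block(64, 128, stride=2),
            *block(128, 256, stride=2),
            *block(256, 512, stride=1),
            nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),  # 1 logit per patch
        )

    def forward(self, x, y):
        # x: conditioning input image, y: real or generated output image
        return self.model(torch.cat([x, y], dim=1))

# For a 256x256 input pair this yields roughly a 30x30 grid of patch logits;
# the loss averages over the grid, which is why the same discriminator can be
# applied to arbitrarily large images.
```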