Generative Adversarial Text to Image Synthesis [reading notes]

Table of Contents

  • Abstract
  • Introduction
  • Background
    • Generative adversarial networks
    • Deep symmetric structured joint embedding
  • Method

Abstract

In this work, we develop a novel deep architecture and GAN formulation to effectively bridge these advances in text and image modeling, translating visual concepts from characters to pixels.

Introduction

In this work we are interested in translating text in the form of single-sentence human-written descriptions directly into image pixels.
That is: turn a single-sentence text description into image pixels.

Examples:
"this small bird has a short, pointy orange beak and white belly"
"the petals of this flower are pink and the anther are yellow"

Other approaches:
this type of detailed visual information about an object has been captured in attribute representations - distinguishing characteristics of the object category encoded into a vector (Farhadi et al., 2009; Kumar et al., 2009; Parikh & Grauman, 2011; Lampert et al., 2014)

in particular to enable zero-shot visual recognition (Fu et al., 2014; Akata et al., 2015),

and recently for conditional image generation (Yan et al., 2015).

Recently, deep convolutional and recurrent networks for text have yielded highly discriminative and generalizable (in the zero-shot learning sense) text representations learned automatically from words and characters (Reed et al., 2016).

Two sub-problems need to be solved:

  • first, learn a text feature representation that captures the important visual details;
  • second, use these features to synthesize a compelling image that a human might mistake for real

The two sub-problems are natural language representation and image synthesis.
What remains unsolved by these sub-problems alone:
the distribution of images conditioned on a text description is highly multimodal, in the sense that there are very many plausible configurations of pixels that correctly illustrate the description.

However, learning is made practical by the fact that a word or character sequence can be decomposed sequentially according to the chain rule; i.e.
one trains the model to predict the next token conditioned on the image and all previous tokens, which is a more well-defined prediction problem.
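
Written out (standard chain-rule factorization, not a quote from the paper): for an image x and caption tokens t_1, …, t_T,

$$
p(t_1, \dots, t_T \mid x) = \prod_{i=1}^{T} p(t_i \mid x, t_1, \dots, t_{i-1})
$$

so each factor is an ordinary next-token prediction problem.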

Main contribution:
to develop a simple and effective GAN architecture and training strategy that enables compelling text to image synthesis of bird and flower images from human-written descriptions.

Datasets:

  • Caltech-UCSD Birds dataset
  • Oxford-102 Flowers dataset

Each image comes with five text descriptions.

Our model is trained on a subset of training categories, and we demonstrate its performance both on the training set categories and on the testing set, i.e. "zero-shot" text to image synthesis.

Background

Generative adversarial networks

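Background recap (standard formulation from Goodfellow et al., 2014): the generator G and the discriminator D play a minimax game

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
$$

G maps a noise sample z to image space and D outputs the probability that a sample is real; this paper conditions both G and D on a text embedding.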

Deep symmetric structured joint embedding

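The text encoder is pre-trained with the deep symmetric structured joint embedding of Reed et al. (2016). A sketch of that objective as I recall it from the paper (notation reproduced from memory, so check Section 3.2 for exact details): minimize

$$
\frac{1}{N} \sum_{n=1}^{N} \Delta\big(y_n, f_v(v_n)\big) + \Delta\big(y_n, f_t(t_n)\big)
$$

where $\{(v_n, t_n, y_n)\}$ are image/text/label triples and $\Delta$ is the 0-1 loss, with the classifiers defined by

$$
f_v(v) = \arg\max_{y \in \mathcal{Y}} \, \mathbb{E}_{t \sim \mathcal{T}(y)}\big[\phi(v)^{\top} \varphi(t)\big],
\qquad
f_t(t) = \arg\max_{y \in \mathcal{Y}} \, \mathbb{E}_{v \sim \mathcal{V}(y)}\big[\phi(v)^{\top} \varphi(t)\big]
$$

Here $\phi$ is the image encoder and $\varphi$ is the text encoder whose output is later fed to the GAN.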

Method

Our approach is to train a deep convolutional generative adversarial network (DC-GAN) conditioned on text features encoded by a hybrid character-level convolutional-recurrent neural network. Both the generator network G and the discriminator network D perform feed-forward inference conditioned on the text feature.
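
A minimal PyTorch-style sketch of this conditioning scheme (my own illustration, not the authors' code; layer sizes, module names, and the 64x64 output resolution are assumptions): the generator concatenates a projected text embedding with the noise vector, and the discriminator concatenates a spatially replicated copy of the embedding with its convolutional features before producing the final score.

```python
# Illustrative text-conditioned DC-GAN sketch; sizes and names are assumptions,
# not the paper's implementation.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100, txt_dim=1024, proj_dim=128, ngf=64):
        super().__init__()
        # Compress the pretrained text embedding before concatenating with noise.
        self.txt_proj = nn.Sequential(nn.Linear(txt_dim, proj_dim), nn.LeakyReLU(0.2))
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim + proj_dim, ngf * 8, 4, 1, 0), nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1), nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1), nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1), nn.BatchNorm2d(ngf), nn.ReLU(True),
            nn.ConvTranspose2d(ngf, 3, 4, 2, 1), nn.Tanh(),          # 64x64 RGB image
        )

    def forward(self, z, txt_emb):
        cond = self.txt_proj(txt_emb)
        x = torch.cat([z, cond], dim=1).unsqueeze(-1).unsqueeze(-1)  # (B, z+proj, 1, 1)
        return self.net(x)

class Discriminator(nn.Module):
    def __init__(self, txt_dim=1024, proj_dim=128, ndf=64):
        super().__init__()
        self.txt_proj = nn.Sequential(nn.Linear(txt_dim, proj_dim), nn.LeakyReLU(0.2))
        self.conv = nn.Sequential(
            nn.Conv2d(3, ndf, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1), nn.BatchNorm2d(ndf * 2), nn.LeakyReLU(0.2),
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1), nn.BatchNorm2d(ndf * 4), nn.LeakyReLU(0.2),
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1), nn.BatchNorm2d(ndf * 8), nn.LeakyReLU(0.2),
        )
        # After depth-concatenating the replicated text embedding, score real/fake.
        self.out = nn.Sequential(
            nn.Conv2d(ndf * 8 + proj_dim, ndf * 8, 1, 1, 0), nn.LeakyReLU(0.2),
            nn.Conv2d(ndf * 8, 1, 4, 1, 0), nn.Sigmoid(),
        )

    def forward(self, img, txt_emb):
        h = self.conv(img)                                   # (B, ndf*8, 4, 4)
        cond = self.txt_proj(txt_emb)
        cond = cond.unsqueeze(-1).unsqueeze(-1).expand(-1, -1, h.size(2), h.size(3))
        return self.out(torch.cat([h, cond], dim=1)).view(-1)
```

Note that the paper additionally trains the discriminator on (real image, mismatched text) pairs as a third input type (the GAN-CLS matching-aware trick), which this sketch omits.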
