In this work, we develop a novel deep architecture and GAN formulation to effectively bridge these advances in text and image modeling, translating visual concepts from characters to pixels.
In this work we are interested in translating text in the form of single-sentence human-written descriptions directly into image pixels.
Examples:
“this small bird has a short, pointy orange beak and white belly”
“the petals of this flower are pink and the anther are yellow”.
Other approaches:
this type of detailed visual information about an object has been captured in attribute representations - distinguishing characteristics of the object category encoded into a vector (Farhadi et al., 2009; Kumar et al., 2009; Parikh & Grauman, 2011; Lampert et al., 2014)
in particular to enable zero-shot visual recognition (Fu et al., 2014; Akata et al., 2015),
and recently for conditional image generation (Yan et al., 2015).
Recently, deep convolutional and recurrent networks for text have yielded highly discriminative and generalizable (in the zero-shot learning sense) text representations learned automatically from words and characters (Reed et al., 2016).
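As a concrete illustration, here is a minimal sketch of what such a hybrid character-level convolutional-recurrent text encoder could look like (PyTorch; the layer sizes and structure are assumptions for illustration, not the exact architecture of Reed et al., 2016):

```python
import torch
import torch.nn as nn

class CharCNNRNNEncoder(nn.Module):
    """Sketch of a character-level conv-recurrent text encoder (illustrative only)."""
    def __init__(self, vocab_size=70, char_dim=128, hidden_dim=256, embed_dim=1024):
        super().__init__()
        self.char_embed = nn.Embedding(vocab_size, char_dim)
        # Temporal convolutions over the character sequence
        self.conv = nn.Sequential(
            nn.Conv1d(char_dim, hidden_dim, kernel_size=4), nn.ReLU(), nn.MaxPool1d(3),
            nn.Conv1d(hidden_dim, hidden_dim, kernel_size=4), nn.ReLU(), nn.MaxPool1d(3),
        )
        # Recurrent layer over the downsampled sequence, then a projection
        self.rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, embed_dim)

    def forward(self, char_ids):                          # char_ids: (batch, seq_len)
        x = self.char_embed(char_ids).transpose(1, 2)     # (batch, char_dim, seq_len)
        x = self.conv(x).transpose(1, 2)                  # (batch, seq', hidden_dim)
        _, h = self.rnn(x)                                # h: (1, batch, hidden_dim)
        return self.proj(h.squeeze(0))                    # (batch, embed_dim) text feature


# Usage sketch: encode a batch of 8 captions, each 201 characters long
features = CharCNNRNNEncoder()(torch.randint(0, 70, (8, 201)))
```

The resulting fixed-length text feature is what the generation pipeline below conditions on.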
Two sub-problems need to be solved: natural language representation and image synthesis.
What these sub-problems do not solve:
the distribution of images conditioned on a text description is highly multimodal, in the sense that there are very many plausible configurations of pixels that correctly illustrate the description.
However, for the reverse problem of image captioning, learning is made practical by the fact that the word or character sequence can be decomposed sequentially according to the chain rule; i.e.,
one trains the model to predict the next token conditioned on the image and all previous tokens, which is a more well-defined prediction problem.
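To make the contrast concrete, here is a minimal sketch of that next-token captioning objective (hypothetical PyTorch; names and sizes are illustrative, not taken from the paper):

```python
import torch
import torch.nn as nn

# Chain-rule factorization: p(w_1..w_T | image) = prod_t p(w_t | image, w_1..w_{t-1}),
# so training reduces to ordinary next-token classification at every position.
vocab_size, dim = 5000, 256
token_embed = nn.Embedding(vocab_size, dim)
rnn = nn.GRU(dim, dim, batch_first=True)
to_logits = nn.Linear(dim, vocab_size)

def caption_loss(image_feat, tokens):
    """image_feat: (batch, dim) image features; tokens: (batch, T) word indices."""
    h0 = image_feat.unsqueeze(0)              # condition the RNN state on the image
    hidden, _ = rnn(token_embed(tokens[:, :-1]), h0)
    logits = to_logits(hidden)                # predict the next token at each step
    return nn.functional.cross_entropy(
        logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
```

The text-to-image direction has no such well-defined per-step prediction, which is why the multimodality noted above is the harder obstacle.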
Main contribution:
to develop a simple and effective GAN architecture and training strategy that enables compelling text to image synthesis of bird and flower images from human-written descriptions.
Datasets:
Each image is paired with five text descriptions.
Our model is trained on a subset of training categories, and we demonstrate its performance both on the training set categories and on the testing set, i.e. “zero-shot” text to image synthesis.
Our approach is to train a deep convolutional generative adversarial network (DC-GAN) conditioned on text features encoded by a hybrid character-level convolutional-recurrent neural network. Both the generator network G and the discriminator network D perform feed-forward inference conditioned on the text feature.
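A rough sketch of this conditional feed-forward setup (PyTorch; the layer sizes and the exact way the text feature is injected are illustrative assumptions, not the paper's precise configuration). The text feature is projected, concatenated with the noise vector in G, and spatially replicated and fused with the image feature map in D:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """G(z, text): deconvolutional generator conditioned on a text embedding."""
    def __init__(self, z_dim=100, text_dim=1024, proj_dim=128, ngf=64):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, proj_dim)                 # compress the text feature
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim + proj_dim, ngf * 8, 4, 1, 0), nn.BatchNorm2d(ngf * 8), nn.ReLU(),
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1), nn.BatchNorm2d(ngf * 4), nn.ReLU(),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1), nn.BatchNorm2d(ngf * 2), nn.ReLU(),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1), nn.BatchNorm2d(ngf), nn.ReLU(),
            nn.ConvTranspose2d(ngf, 3, 4, 2, 1), nn.Tanh(),            # 64x64 RGB image
        )

    def forward(self, z, text_feat):
        cond = self.text_proj(text_feat)
        x = torch.cat([z, cond], dim=1)[:, :, None, None]              # (batch, z+proj, 1, 1)
        return self.net(x)


class Discriminator(nn.Module):
    """D(image, text): convolutional discriminator; the projected text feature is
    replicated spatially and depth-concatenated with the 4x4 image feature map."""
    def __init__(self, text_dim=1024, proj_dim=128, ndf=64):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, proj_dim)
        self.conv = nn.Sequential(
            nn.Conv2d(3, ndf, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1), nn.BatchNorm2d(ndf * 2), nn.LeakyReLU(0.2),
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1), nn.BatchNorm2d(ndf * 4), nn.LeakyReLU(0.2),
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1), nn.BatchNorm2d(ndf * 8), nn.LeakyReLU(0.2),
        )
        self.head = nn.Conv2d(ndf * 8 + proj_dim, 1, 4, 1, 0)          # real/fake score

    def forward(self, image, text_feat):
        h = self.conv(image)                                           # (batch, ndf*8, 4, 4)
        cond = self.text_proj(text_feat)[:, :, None, None].expand(-1, -1, 4, 4)
        return self.head(torch.cat([h, cond], dim=1)).view(-1)
```

Both networks run in a single forward pass given (z, text) or (image, text), which is the feed-forward conditioning described above.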