[Paper Reading] seada-VQA: Adversarial Data Augmentation That Preserves Semantic Correctness (R. Tang et al., arXiv, 2020)


Paper title: "Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering"


Citation: Ruixue Tang, Chao Ma, Wei Emma Zhang, Qi Wu, and Xiaokang Yang. "Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering". arXiv preprint, arXiv:2007.09592, 2020.



Visual Question Answering (VQA) has achieved great success thanks to the fast development of deep neural networks (DNN). On the other hand, the data augmentation, as one of the major tricks for DNN, has been widely used in many computer vision tasks. However, there are few works studying the data augmentation problem for VQA and none of the existing image based augmentation schemes (such as rotation and flipping) can be directly applied to VQA due to its semantic structure - an <image, question, answer> triplet needs to be maintained correctly. For example, a direction related Question-Answer (QA) pair may not be true if the associated image is rotated or flipped. In this paper, instead of directly manipulating images and questions, we use generated adversarial examples for both images and questions as the augmented data. The augmented examples do not change the visual properties presented in the image as well as the semantic meaning of the question, the correctness of the <image, question, answer> is thus still maintained. We then use adversarial learning to train a classic VQA model (BUTD) with our augmented data. We find that we not only improve the overall performance on VQAv2, but also can withstand adversarial attack effectively, compared to the baseline model. The source code is available at https://github.com/zaynmi/seada-vqa.



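Data augmentation generally falls into two categories: data warping, which transforms the image content (geometric and color transformations, random erasing, adversarial training, style transfer), and oversampling. There is little existing work on data augmentation for VQA, because the augmentation must also keep the <image, question, answer> triplet semantically correct.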
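The direction example from the abstract can be made concrete with a toy sketch. Everything here is hypothetical and for illustration only: the "image" is a small grid with one marked object, and `hflip` and `side_of_object` are made-up helpers, not part of any VQA codebase. It only shows that a horizontal flip can invalidate a stored answer.

```python
def hflip(image):
    """Horizontally flip a 2D grid (list of rows)."""
    return [row[::-1] for row in image]


def side_of_object(image, marker=1):
    """Return 'left' or 'right' depending on where the marked cells lie."""
    width = len(image[0])
    cols = [c for row in image for c, v in enumerate(row) if v == marker]
    center = sum(cols) / len(cols)
    return "left" if center < (width - 1) / 2 else "right"


# Toy triplet: image, question, answer.
image = [
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
]
question = "Which side is the object on?"
answer = "left"

print(side_of_object(image) == answer)         # the triplet holds on the original image
print(side_of_object(hflip(image)) == answer)  # the stored answer no longer matches
```

After the flip, the stored answer no longer describes the image, which is why warping-style augmentation cannot be applied to VQA without also rewriting the QA pair.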

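Earlier related work generates similar questions from a given image and answer, a task known as VQG (Visual Question Generation). I have covered similar papers before (for example, [Paper Reading] Cycle-Consistency for Robust VQA and the VQA-Rephrasings Dataset (M. Shah et al., CVPR, 2019)). However, such methods can produce odd or ungrammatical sentences. The generated data also follow the same distribution as the original dataset, so they do not reduce overfitting.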


- We propose to generate visual and textual adversarial examples to augment the VQA dataset. Our generated data preserve the semantics and explore the learned decision boundary to help improve the model generalization. (In short: an augmentation method based on visual and textual adversarial examples that keeps the semantics correct.)

- We propose an adversarial training scheme that enables VQA models to take advantage of the regularization power of adversarial examples. (In short: a training scheme that lets the VQA model make use of these adversarial examples.)

- We show that the model trained with our method achieves 65.16% accuracy on the clean validation set, beating its vanilla training counterpart by 1.84%. Moreover, the adversarially trained model significantly increases accuracy on adversarial examples by 21.55%. (In short: 65.16% accuracy, a 1.84% gain over the original model.)

1. Related Work




2. Model

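The model architecture is shown below:

[Image 1: model architecture]

The baseline is the BUTD model, with image features extracted by Faster R-CNN. The figure above shows the overall idea: for a given <image, question, answer> (IQA) triplet, a paraphrase of the question is generated and stored first, and adversarial examples are then generated based on that paraphrase.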



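Adversarial example generation: the goal is to add a small perturbation to the input so that the model produces a wrong result. The authors use a gradient-based attack, IFGSM (Iterative Fast Gradient Sign Method), to generate the adversarial examples.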
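The iterative sign-gradient update can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a toy logistic-regression "model" whose input gradient is available in closed form, and the names `ifgsm`, `eps`, `alpha`, and `steps` are chosen here for illustration. The paper attacks a VQA network, where the gradient would come from backpropagation instead.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a toy logistic model (assumption)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss_grad_wrt_input(w, b, x, y):
    """Gradient of binary cross-entropy with respect to the input x."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def ifgsm(w, b, x, y, eps=0.3, alpha=0.1, steps=5):
    """Iterative FGSM: repeated sign-gradient ascent on the loss,
    keeping the result within an L-infinity ball of radius eps around x."""
    x_adv = list(x)
    for _ in range(steps):
        grad = loss_grad_wrt_input(w, b, x_adv, y)
        # Step in the direction that increases the loss.
        x_adv = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        # Clip each coordinate back into [x - eps, x + eps].
        x_adv = [min(max(xa, xo - eps), xo + eps) for xa, xo in zip(x_adv, x)]
    return x_adv


w, b = [2.0, -1.0], 0.0
x, y = [1.0, 0.5], 1
x_adv = ifgsm(w, b, x, y)
print(predict(w, b, x), predict(w, b, x_adv))
```

The clipping step is what keeps the perturbation small: however many iterations are run, no coordinate moves more than `eps` away from the clean input, while the model's confidence in the true label drops.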

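Semantic preservation: the method above cannot be used directly to generate adversarial examples for text, mainly because text is discrete, which also makes the Lp norm an unsuitable measure of the perturbation. In addition, changing even individual words can change the meaning of a sentence. The authors therefore use a sequence-to-sequence paraphrasing model. The model is built on an encoder-decoder Neural Machine Translation (NMT) framework (code: https://github.com/OpenNMT/OpenNMT-py): an RNN encoder encodes the source sentence into a vector, and a conditional RNN decoder generates the target sentence word by word. The model is trained with a softmax loss.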
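For reference, a common way to write such an encoder-decoder objective is given below. The notation is an assumption made here rather than taken from the paper: $x$ is the source question, $y = (y_1, \dots, y_T)$ the target paraphrase, and $o_t$ the decoder's output scores over the vocabulary at step $t$.

```latex
p(y \mid x) = \prod_{t=1}^{T} p(y_t \mid y_{<t}, x),
\qquad
p(y_t \mid y_{<t}, x) = \operatorname{softmax}(o_t)_{y_t},
\qquad
\mathcal{L} = -\sum_{t=1}^{T} \log p(y_t \mid y_{<t}, x)
```

Under this reading, the "softmax loss" is the negative log-likelihood of the reference paraphrase, summed over its words.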


[Image 2]



[Image 3]

3. Experiments


[Image 4]

