Visual Reasoning(2): Inferring and Executing Programs for Visual Reasoning

Inferring and Executing Programs for Visual Reasoning

  • Introduction
  • Methods
    • Training
  • Experiments
  • Conclusion

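Some background reading:
https://zhuanlan.zhihu.com/p/28654835
This work also contributes a new dataset,
the CLEVR-Humans Dataset: using CLEVR's synthetic images, human annotators wrote new questions and answers whose language and logic are more natural.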

Introduction

Motivation
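Earlier VQA models are plain input-output mappings and have no real reasoning ability,
so the paper proposes
a new model for visual question answering that consists of two parts: a program generator and an execution engine.
This breaks with the crude earlier VQA recipe of stacking a CNN and an LSTM.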

Recall that when CLEVR is generated, each question starts from its own functional program; filling in the arguments and running it yields the answer.
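So this work first predicts these programs, then uses the predicted programs to predict the final answer.
(This feels like a model designed entirely around how CLEVR is generated. In essence, people use program logic to create synthetic data, then have an algorithm observe that data and imitate the same programmatic thinking. Real-world scenes cannot necessarily be parsed into such clean logic.)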
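
To make the idea of a functional program concrete, here is a minimal sketch of a CLEVR-style program run on a symbolic scene. The scene, the attribute names, and the three functions (`filter`, `count`, `exist`) are illustrative assumptions, not CLEVR's actual function catalogue or file format.

```python
# Hypothetical toy scene: each object is a dict of attributes (CLEVR-like, not the real format).
SCENE = [
    {"shape": "cube", "color": "red", "size": "large"},
    {"shape": "sphere", "color": "blue", "size": "small"},
    {"shape": "cube", "color": "blue", "size": "small"},
    {"shape": "cylinder", "color": "red", "size": "small"},
]


def filter_attr(objs, key, value):
    """Keep only the objects whose attribute `key` equals `value`."""
    return [o for o in objs if o[key] == value]


def count(objs):
    """Return how many objects are left."""
    return len(objs)


def exist(objs):
    """Return True if at least one object is left."""
    return len(objs) > 0


# Illustrative function vocabulary (an assumption, much smaller than CLEVR's).
FUNCTIONS = {"filter": filter_attr, "count": count, "exist": exist}


def run_program(program, scene):
    """Apply each (function_name, args) step to the result of the previous step."""
    state = list(scene)
    for name, args in program:
        state = FUNCTIONS[name](state, *args)
    return state


# "How many blue cubes are there?"
HOW_MANY_BLUE_CUBES = [
    ("filter", ("color", "blue")),
    ("filter", ("shape", "cube")),
    ("count", ()),
]

# "Are there any red spheres?"
ANY_RED_SPHERES = [
    ("filter", ("color", "red")),
    ("filter", ("shape", "sphere")),
    ("exist", ()),
]

print(run_program(HOW_MANY_BLUE_CUBES, SCENE))
print(run_program(ANY_RED_SPHERES, SCENE))
```

In this toy the program is executed symbolically on a known scene. In the paper the executor has to work from image features instead, which is why each function becomes a learned neural module.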

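There are two ways to train the two parts:
they can be trained separately when ground-truth programs are available, or jointly in an end-to-end fashion.
Training on the intermediate programs from CLEVR's own generation process feels a bit like cheating... It is a tricky move, and it makes the comparison with earlier VQA models somewhat unfair.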

Methods

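The program generator, on the left of the architecture figure, is a seq2seq model that predicts the program.
The execution engine, on the right, is also built entirely from neural networks. It takes the program and the image as input and outputs a probability distribution over all possible answers, so it effectively acts as a classifier.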
[Figure: model overview, with the program generator on the left and the execution engine on the right]
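The execution engine is assembled from a number of modules.
Depending on the program z, different modules are selected and assembled, and the resulting network is executed to produce the answer.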
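
The assembly step can be sketched as follows. This is a toy under stated assumptions: the program arrives as a prefix-order token sequence, every function has a fixed arity, and the "modules" and the final classifier are hand-written stand-ins for what would be small learned networks. The vocabulary (`scene`, `filter_red`, `filter_cube`, `intersect`), the masks, and `SHARPNESS` are all hypothetical.

```python
import math

# Hypothetical mini-vocabulary: number of inputs each function takes.
ARITY = {"scene": 0, "filter_red": 1, "filter_cube": 1, "intersect": 2}


def parse_prefix(tokens):
    """Decode a prefix-order token list into a nested (name, children) tree."""
    def helper(pos):
        name = tokens[pos]
        pos += 1
        children = []
        for _ in range(ARITY[name]):
            child, pos = helper(pos)
            children.append(child)
        return (name, children), pos

    tree, end = helper(0)
    if end != len(tokens):
        raise ValueError("program has trailing tokens")
    return tree


# Stand-in "image features": one attention weight per object slot.
IMAGE_FEATS = [1.0, 1.0, 1.0, 1.0]
RED_MASK = [1.0, 0.0, 0.0, 1.0]    # slots 0 and 3 are red (toy assumption)
CUBE_MASK = [1.0, 0.0, 1.0, 0.0]   # slots 0 and 2 are cubes (toy assumption)

# Stand-in modules; in the real model these would be learned networks.
MODULES = {
    "filter_red": lambda f: [a * m for a, m in zip(f, RED_MASK)],
    "filter_cube": lambda f: [a * m for a, m in zip(f, CUBE_MASK)],
    "intersect": lambda f, g: [min(a, b) for a, b in zip(f, g)],
}


def execute(tree, image_feats):
    """Recursively run the module chosen by each tree node on its children's outputs."""
    name, children = tree
    if name == "scene":
        return list(image_feats)
    inputs = [execute(child, image_feats) for child in children]
    return MODULES[name](*inputs)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


ANSWERS = ["0", "1", "2", "3", "4"]
SHARPNESS = 4.0  # toy constant controlling how peaked the distribution is


def answer_distribution(tokens, image_feats):
    """Assemble modules from the program, run them, and classify the result."""
    feats = execute(parse_prefix(tokens), image_feats)
    logits = [-SHARPNESS * abs(sum(feats) - k) for k in range(len(ANSWERS))]
    return dict(zip(ANSWERS, softmax(logits)))


# Roughly "how many red cubes?" written in prefix order.
PROGRAM = ["intersect", "filter_red", "scene", "filter_cube", "scene"]
probs = answer_distribution(PROGRAM, IMAGE_FEATS)
print(max(probs, key=probs.get))
```

The point of the sketch is that a different program selects a different composition of the same module set, while the output is always a distribution over one fixed answer vocabulary.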

See the Zhihu post for details:
https://zhuanlan.zhihu.com/p/28654835

Training

Training both the program generator and execution engine separately with ground-truth programs naturally works very well. However,
Annotating ground-truth programs for free-form natural language questions is expensive, so in practice we may have few or no ground-truth programs.

Without ground-truth programs, the only option is joint training with REINFORCE.
In practice this performs poorly and is hard to optimize, so the paper also proposes a semi-supervised learning approach:

First, use a small set of ground-truth programs to train the program generator
Then, fix the program generator and train the execution engine using predicted programs on a large dataset of (x,q,a) triples.
Finally, we use REINFORCE to jointly finetune the program generator and execution engine.
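
The REINFORCE update used in the finetuning step can be illustrated with a toy. The sketch below makes strong simplifying assumptions: the program generator is reduced to a categorical distribution over a handful of candidate programs, the execution engine is simulated by a reward of 1 when the sampled program is the "correct" one and 0 otherwise, and a moving-average baseline reduces variance. The function name, the hyperparameters, and the setup are hypothetical and not the paper's implementation.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_toy(correct_idx=2, n_programs=4, steps=500, lr=0.5, seed=0):
    """Toy REINFORCE: learn to sample the one program that earns reward 1."""
    rng = random.Random(seed)
    logits = [0.0] * n_programs   # stand-in for the generator's parameters
    baseline = 0.0                # moving average of past rewards
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a program z from the current policy.
        z = rng.choices(range(n_programs), weights=probs)[0]
        # Simulated execution engine: the answer is right only for one program.
        reward = 1.0 if z == correct_idx else 0.0
        advantage = reward - baseline
        # d log p(z) / d logits[k] = 1[k == z] - probs[k]
        for k in range(n_programs):
            grad_log_p = (1.0 if k == z else 0.0) - probs[k]
            logits[k] += lr * advantage * grad_log_p
        baseline = 0.9 * baseline + 0.1 * reward
    return softmax(logits)


final_probs = reinforce_toy()
print([round(p, 3) for p in final_probs])
```

With only four candidates, random sampling finds the rewarded program quickly. The space of real programs is vastly larger, which gives some intuition for why REINFORCE from scratch is hard to optimize and why pretraining the generator on a few ground-truth programs first helps.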

Experiments

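With only a small number of ground-truth programs, combined with end-to-end reinforcement learning, the model still achieves very good results. This suggests that annotating even a small amount of the intermediate reasoning process can be enough to give a model reasoning ability.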

Conclusion

