Visual reasoning

1. Visual reasoning [from RAVEN, CVPR19]

Early attempts were made in the 1940s-1970s in the field of logic-based AI. Newell argued that one of the potential solutions to AI was “to construct a single program that would take a standard intelligence test” [42]. There were two important trials: (i) Evans presented an AI algorithm that solved a type of geometric analogy task in the Wechsler Adult Intelligence Scale (WAIS) test [10, 11], and (ii) Simon and Kotovsky devised a program that solved Thurstone letter series completion problems [54]. However, these early attempts were heuristic-based with hand-crafted rules, making them difficult to apply to other problems.
The reasoning ability of modern vision systems was first systematically analyzed with the CLEVR dataset [22]. By carefully controlling inductive bias and slicing the vision systems’ reasoning ability into several axes, Johnson et al. successfully identified major drawbacks of existing models. A subsequent work [23] on this dataset achieved good performance by introducing a program generator in a structured space and combining it with a program execution engine. A similar work that also leveraged language-guided structured reasoning was proposed in [18]. Modules with special attention mechanisms were later proposed in an end-to-end manner to solve this visual reasoning task [19, 49, 59]. However, superior performance gains were observed in very recent works [6, 36, 58] that fell back to structured representations by using primitives, dependency trees, or logic. These works also inspire us to incorporate structure information into solving the RPM (Raven's Progressive Matrices) problem.

More generally, Bisk et al. [4] studied visual reasoning in a 3D block world. Perez et al. [46] introduced a conditional layer for visual reasoning. Aditya et al. [1] proposed a probabilistic soft logic in an attention module to increase model interpretability. Barrett et al. [3] measured abstract reasoning in neural networks.

