ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering

  Applications and challenges of VQA:
  VQA is important to many applications, including image retrieval, early education, and navigation for blind people, because it provides user-specific information by understanding both natural language questions and image content. VQA is also a highly challenging problem: the machine must understand natural language queries, extract semantic content from images, and relate the two in a unified framework.
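  This paper proposes question-guided attention maps (QAMs), which are treated as latent information, so no explicit labels are needed for each possible query. A QAM is generated by searching the spatial image feature maps for visual features that are semantically related to the question. The search is carried out by a configurable convolutional neural network, which convolves the image feature maps with configurable convolutional kernels. A configurable convolutional kernel is a special kernel produced by projecting the question embedding into the visual feature space, so its weights change with each question.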
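  The question-guided attention described above can be sketched in plain Python. This is a minimal illustration under assumptions that this summary does not state: the question vector is mapped to the kernel by one linear layer followed by a sigmoid, the result is reshaped into a c × k × k kernel and applied with zero padding, and a softmax over all spatial positions turns the response into the attention map. As in deep learning frameworks, "convolution" here is cross-correlation. The names (`make_kernel`, `question_guided_attention`, `W`, `b`) and all sizes are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_kernel(q, W, b, c, k):
    """Map question vector q (length d) to a c x k x k kernel.

    Assumed form (hypothetical): kernel = reshape(sigmoid(W q + b)),
    where W has c*k*k rows of length d and b has c*k*k entries.
    """
    flat = [sigmoid(sum(w * x for w, x in zip(row, q)) + bias) for row, bias in zip(W, b)]
    # Reshape the flat list into [channel][row][col].
    return [[flat[ch * k * k + r * k: ch * k * k + (r + 1) * k] for r in range(k)]
            for ch in range(c)]


def conv2d_same(fmap, kernel):
    """Zero-padded 'same' 2-D cross-correlation of a c x H x W map with a
    c x k x k kernel (k odd), summed over channels; returns an H x W map."""
    c, H, Wd, k = len(fmap), len(fmap[0]), len(fmap[0][0]), len(kernel[0])
    p = k // 2
    out = [[0.0] * Wd for _ in range(H)]
    for i in range(H):
        for j in range(Wd):
            s = 0.0
            for ch in range(c):
                for u in range(k):
                    for v in range(k):
                        y, x = i + u - p, j + v - p
                        if 0 <= y < H and 0 <= x < Wd:
                            s += fmap[ch][y][x] * kernel[ch][u][v]
            out[i][j] = s
    return out


def spatial_softmax(z):
    """Softmax over all spatial positions, so the attention map sums to 1."""
    m = max(max(row) for row in z)
    e = [[math.exp(v - m) for v in row] for row in z]
    total = sum(map(sum, e))
    return [[v / total for v in row] for row in e]


def question_guided_attention(fmap, q, W, b, k):
    """Return the attention map and the attention-weighted feature map."""
    kernel = make_kernel(q, W, b, len(fmap), k)
    att = spatial_softmax(conv2d_same(fmap, kernel))
    # Scale every channel of the feature map by the same spatial attention weights.
    weighted = [[[fmap[ch][i][j] * att[i][j] for j in range(len(att[0]))]
                 for i in range(len(att))] for ch in range(len(fmap))]
    return att, weighted


# Usage with random toy data: c=2 channels, 4x4 grid, 3x3 kernel, d=5.
random.seed(0)
c, H, Wd, k, d = 2, 4, 4, 3, 5
fmap = [[[random.random() for _ in range(Wd)] for _ in range(H)] for _ in range(c)]
q = [random.random() for _ in range(d)]
W = [[random.gauss(0.0, 0.1) for _ in range(d)] for _ in range(c * k * k)]
b = [0.0] * (c * k * k)
att, weighted = question_guided_attention(fmap, q, W, b, k)
print(round(sum(map(sum, att)), 6))  # prints 1.0
```

  Because the kernel is computed from `q`, two different questions applied to the same feature map yield different attention maps, which is the point of a "configurable" kernel.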
  Framework diagram:
  [Figure 1: ABC-CNN framework]
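  The framework consists of four main modules: image feature extraction, question feature extraction, attention extraction, and answer generation.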
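  The data flow through the four modules can be illustrated with a toy pipeline. Every module below is a hypothetical stand-in chosen only to keep the sketch short and runnable. A raw grayscale grid replaces CNN image features, a hashed bag of words replaces the question encoder, the attention step uses a 1×1 question-generated kernel instead of a larger configurable kernel, and answer generation is a linear classifier over a made-up three-word answer list. Names such as `vqa_pipeline`, `ANSWERS`, and the weights are illustrative, not taken from the paper.

```python
import math

# Hypothetical answer vocabulary, for illustration only.
ANSWERS = ["yes", "no", "two"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def image_features(image):
    """Module 1 (stand-in for a CNN): wrap a 2-D grayscale image as a 1-channel feature map."""
    return [image]


def question_features(question, dim=4):
    """Module 2 (stand-in for a question encoder): deterministic hashed bag of words."""
    vec = [0.0] * dim
    for word in question.lower().split():
        vec[sum(map(ord, word)) % dim] += 1.0
    return vec


def attention(fmap, q):
    """Module 3 (simplified): a 1x1 kernel weight per channel generated from q,
    a softmax over spatial positions, then attention-weighted pooling per channel."""
    kernel = [sigmoid(sum(q) / len(q)) for _ in fmap]
    H, Wd = len(fmap[0]), len(fmap[0][0])
    z = [[sum(kernel[ch] * fmap[ch][i][j] for ch in range(len(fmap))) for j in range(Wd)]
         for i in range(H)]
    m = max(max(row) for row in z)
    e = [[math.exp(v - m) for v in row] for row in z]
    total = sum(map(sum, e))
    att = [[v / total for v in row] for row in e]
    return [sum(att[i][j] * fmap[ch][i][j] for i in range(H) for j in range(Wd))
            for ch in range(len(fmap))]


def generate_answer(pooled, q, weights):
    """Module 4 (stand-in): linear scores over ANSWERS, softmax, then argmax."""
    x = pooled + q  # concatenate attended image features and question features
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [p / total for p in exps]
    best = max(range(len(ANSWERS)), key=lambda i: probs[i])
    return ANSWERS[best], probs


def vqa_pipeline(image, question, weights):
    fmap = image_features(image)      # module 1
    q = question_features(question)   # module 2
    pooled = attention(fmap, q)       # module 3
    return generate_answer(pooled, q, weights)  # module 4


# Made-up inputs: 1 pooled channel + 4 question dims = 5 inputs per answer row.
image = [[0.1, 0.9], [0.2, 0.4]]
weights = [[1.0, 0.5, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.5, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.5, 0.5]]
pred, probs = vqa_pipeline(image, "Is there a dog", weights)
print(pred, [round(p, 3) for p in probs])
```

  Keeping each module as a separate function mirrors the four-part decomposition, so any stand-in can be swapped for a real component without touching the others.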
