Paper notes: Aligning where to see and what to tell: image caption with region-based attention and scene factorization

arXiv:1506.06272v1 [cs.CV] 20 Jun 2015


Abstract:


This paper proposes an image captioning system that exploits the parallel structure between images and sentences.

A key passage, quoted from the original abstract:

In our model, the process of generating the next word, given the previously generated ones, is aligned with the visual perception experience where the attention shifting among the visual regions imposes a thread of visual ordering. This alignment characterizes the flow of “abstract meaning”, encoding what is semantically shared by both the visual scene and the text description. Our system also makes another novel modeling contribution by introducing scene-specific contexts that capture higher-level semantic information encoded in an image. The contexts adapt language models for word generation to specific scene types.

In other words: the process of generating the next word, given the previously generated ones, is aligned with the visual perception experience, in which shifting attention among visual regions imposes an ordering on the visual content. This alignment characterizes the flow of "abstract meaning", encoding what is semantically shared by the visual scene and the text description.
The system's second modeling contribution is scene-specific contexts, which capture higher-level semantic information encoded in the image.
These contexts adapt the language model so that word generation is tailored to specific scene types.
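The excerpt gives no equations, but the two ideas above can be sketched concretely. Below is a minimal numpy illustration, not the paper's actual formulation: `attend` implements a standard additive (Bahdanau-style) attention over region features as a stand-in for the region-based attention, and `scene_factored_logits` mixes per-scene output weights by a scene posterior as one plausible reading of "scene factorization". All names, shapes, and parameters (`W_r`, `W_h`, `v`, `W_out_per_scene`) are assumptions for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend(regions, h, W_r, W_h, v):
    """Additive attention over image regions (illustrative, not the
    paper's exact form).

    regions: (k, d) region feature vectors
    h:       (m,)   previous decoder hidden state
    W_r:     (d, a), W_h: (m, a), v: (a,)  -- assumed projection params
    Returns the context vector (attention-weighted sum of regions)
    and the attention weights over regions.
    """
    # e_i = v · tanh(W_r r_i + W_h h): score each region against the state
    e = np.tanh(regions @ W_r + h @ W_h) @ v      # (k,)
    alpha = softmax(e)                            # attention over regions
    context = alpha @ regions                     # (d,) weighted summary
    return context, alpha

def scene_factored_logits(h, scene_probs, W_out_per_scene):
    """One plausible reading of scene factorization: keep a separate
    output layer per scene type and mix them by the scene posterior.

    h:               (m,) decoder hidden state
    scene_probs:     (s,) posterior over scene types for this image
    W_out_per_scene: list of s matrices, each (m, vocab)
    """
    # Expected word logits under the scene distribution
    return sum(p * (h @ W_s) for p, W_s in zip(scene_probs, W_out_per_scene))
```

A usage sketch: at each decoding step, `attend` picks where to look given the current state, its context vector feeds the next-word prediction, and `scene_factored_logits` biases that prediction toward vocabulary typical of the inferred scene type.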


The rest of the abstract touts the system's experimental results.



Placeholder; notes to be updated.
