(Paper Reading)Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

Introduction

Within our approach, the bottom-up mechanism (based on Faster R-CNN) proposes image regions, each with an associated feature vector, while the top-down mechanism determines feature weightings.
[Figure 1: overview of the combined bottom-up and top-down attention mechanism]
In this paper we propose a combined bottom-up and top-down visual attention mechanism. The bottom-up mechanism proposes a set of salient image regions, with each region represented by a pooled convolutional feature vector. Practically, we implement bottom-up attention using Faster R-CNN [33], which represents a natural expression of a bottom-up attention mechanism.
In essence, this paper decomposes visual attention into two stages: a bottom-up stage that proposes salient image regions, and a top-down stage that weights those regions according to the task at hand.
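To make the top-down stage concrete, below is a minimal NumPy sketch of soft attention over a set of bottom-up region features: each region is scored against a task context vector (e.g., an LSTM hidden state or a question encoding), and the scores are softmax-normalized into weights. The function and parameter names (`top_down_attention`, `W_v`, `W_h`, `w_a`) are illustrative, not the paper's code; parameters that would normally be learned are random here.

```python
import numpy as np

def top_down_attention(V, h, W_v, W_h, w_a):
    """Soft top-down attention over bottom-up region features.

    V   : (k, d_v) region feature vectors from the bottom-up detector
    h   : (d_h,) task context vector
    W_v : (d_v, d_a), W_h : (d_h, d_a), w_a : (d_a,) -- projection
          parameters (learned in practice; random here for illustration)
    Returns the attended feature (d_v,) and the weights (k,).
    """
    # Score each region against the context, then softmax-normalize
    scores = np.tanh(V @ W_v + h @ W_h) @ w_a   # (k,) unnormalized scores
    alpha = np.exp(scores - scores.max())       # subtract max for stability
    alpha /= alpha.sum()                        # weights sum to 1
    return alpha @ V, alpha                     # weighted sum of regions

# Toy usage with random features and parameters
rng = np.random.default_rng(0)
k, d_v, d_h, d_a = 5, 8, 6, 4
V = rng.standard_normal((k, d_v))
h = rng.standard_normal(d_h)
ctx, alpha = top_down_attention(
    V, h,
    rng.standard_normal((d_v, d_a)),
    rng.standard_normal((d_h, d_a)),
    rng.standard_normal(d_a),
)
```

The attended feature `ctx` is a convex combination of the region vectors, which is what lets the downstream captioning or VQA model focus on a few salient regions rather than a uniform grid of CNN cells.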

Method

[Figures 2–4: method diagrams reproduced from the paper]

Conclusion

[Figure 5: concluding slide reproduced from the paper]

Reference

Author slides
