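Visual reasoning is an important research direction that combines computer vision and natural language processing. Below are the related papers and datasets I collected between July 11 and July 22, recorded here for reference.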
[1]CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
[2]A Dataset and Architecture for Visual Reasoning with a Working Memory
[3]GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
[4]FigureQA: An Annotated Figure Dataset for Visual Reasoning
[5]RAVEN: A Dataset for Relational and Analogical Visual Reasoning
[6]TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
https://github.com/YunseokJANG/tgif-qa
[1] Johnson J, Hariharan B, van der Maaten L, et al. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning.
The CLEVR dataset: http://cs.stanford.edu/people/jcjohns/clevr/
[2] Santoro A, Raposo D, Barrett D G T, et al. A simple neural network module for relational reasoning.
[3] Hu R, Andreas J, Rohrbach M, et al. Learning to reason: End-to-end module networks for visual question answering.
[4] Johnson J, Hariharan B, van der Maaten L, et al. Inferring and Executing Programs for Visual Reasoning.
https://github.com/facebookresearch/clevr-iep
[5] Perez E, de Vries H, Strub F, et al. Learning Visual Reasoning Without Strong Priors
[6]A Read-Write Memory Network for Movie Story Understanding(ICCV,2017)
https://github.com/seilna/RWMN
[7]Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning(CVPR,2018)
http://openaccess.thecvf.com/content_cvpr_2018/html/Mascharka_Transparency_by_Design_CVPR_2018_paper.html
https://github.com/davidmascharka/tbd-nets
https://baijiahao.baidu.com/s?id=1595193639060011837&wfr=spider&for=pc
[8]Iterative Visual Reasoning Beyond Convolutions(CVPR,2018)
http://openaccess.thecvf.com/content_cvpr2018/html/Chen_Iterative_Visual_Reasoning_CVPR_2018_paper.html
https://blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/79787737
[9]FiLM: Visual Reasoning with a General Conditioning Layer(AAAI,2018)
[10]Object Level Visual Reasoning in Videos(ECCV,2018)
[1]Compositional Attention Networks for Machine Reasoning(ICLR,2018)
[2]Learning Visual Reasoning Without Strong Priors
[3]Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
[4]Dual Attention Networks for Multimodal Reasoning and Matching(CVPR,2017)
[5]An enhanced SSD with feature fusion and visual reasoning for object detection
[6]Spatial Knowledge Distillation to Aid Visual Reasoning (WACV,2019)
[7]MUREL: Multimodal Relational Reasoning for Visual Question Answering(CVPR,2019)
https://github.com/Cadene/murel.bootstrap.pytorch
[8]Explainable and Explicit Visual Reasoning over Scene Graphs(CVPR,2019)
[1]iVQA: Inverse Visual Question Answering(CVPR,2018)
Study notes: https://blog.csdn.net/jiang6869732/article/details/81392761
[2]Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
[3]Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering(ICCV,2017)
https://github.com/yuzcccc/mfb
[4]Visual Question Answering: A Survey of Methods and Datasets
[5]Image Captioning with Semantic Attention(CVPR, 2016)
https://github.com/siavash9000/im2txt_demo
Download NLTK data: http://www.nltk.org/data.html
Pretrained models: https://github.com/tensorflow/models