Image captioning: classic papers organized by category, with source code where available

Attention-Based Methods
O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. CVPR 2015.
https://github.com/karpathy/neuraltalk
 
K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. ICML 2015.
https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning
https://github.com/yunjey/show-attend-and-tell
 
P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang. Bottom-up and top-down attention for image captioning and visual question answering. CVPR 2018. https://github.com/peteanderson80/bottom-up-attention
 
J. Gu, J. Cai, G. Wang, and T. Chen. Stack-captioning: Coarse-to-fine learning for image captioning. AAAI 2018. 
https://github.com/showkeyjar/chinese_im2text.pytorch
 
L. Huang, W. Wang, J. Chen, and X.-Y. Wei. Attention on attention for image captioning. ICCV, 2019.
https://github.com/husthuaan/AoANet
 
W. Jiang, L. Ma, Y.-G. Jiang, W. Liu, and T. Zhang. Recurrent fusion network for image captioning. ECCV 2018.
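
For orientation, here is a minimal sketch of the additive ("soft") attention step that Show, Attend and Tell-style decoders apply at every word. It is an illustrative PyTorch snippet with assumed tensor shapes and module names, not code from any repository listed above.

```python
import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    """Additive attention over a grid of image region features (illustrative sketch)."""
    def __init__(self, feat_dim=2048, hidden_dim=512, attn_dim=512):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)     # project each region feature
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim) # project the decoder state
        self.score = nn.Linear(attn_dim, 1)                # scalar relevance per region

    def forward(self, regions, hidden):
        # regions: (batch, num_regions, feat_dim), e.g. a 14x14 CNN grid flattened to 196 vectors
        # hidden:  (batch, hidden_dim), the LSTM state before predicting the next word
        scores = self.score(torch.tanh(
            self.feat_proj(regions) + self.hidden_proj(hidden).unsqueeze(1)
        )).squeeze(-1)                                      # (batch, num_regions)
        alpha = torch.softmax(scores, dim=1)                # attention weights over regions
        context = (alpha.unsqueeze(-1) * regions).sum(1)    # (batch, feat_dim) attended feature
        return context, alpha
```

The decoder feeds `context` together with the previous word embedding into its LSTM cell; bottom-up/top-down models keep the same weighting scheme but replace the CNN grid with object-detector region features.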
 
Attention-Based Methods that Consider Spatial and Semantic Relations between Image Elements
Image captioning: Transforming objects into words. NeurIPS 2019. S. Herdade, A. Kappeler, K. Boakye, and J. Soares.
https://github.com/yahoo/object_relation_transformer
 
X-linear attention networks for image captioning. CVPR, 2020. Y. Pan, T. Yao, Y. Li, and T. Mei. https://github.com/Panda-Peter/image-captioning
 
F. Liu, X. Ren, Y. Liu, K. Lei, and X. Sun. Exploring and distilling cross-modal information for image captioning. IJCAI, 2019.
 
Meshed-memory transformer for image captioning. CVPR 2020. M. Cornia, M. Stefanini, L. Baraldi, and R. Cucchiara.
https://github.com/aimagelab/meshed-memory-transformer
 
Oscar: Object semantics aligned pre-training for vision-language tasks. ECCV 2020. X. Li, X. Yin, C. Li, P. Zhang, X. Hu, L. Zhang, L. Wang, H. Hu, L. Dong, F. Wei, et al. https://github.com/microsoft/Oscar
 
Unified vision-language pre-training for image captioning and VQA. AAAI 2020. L. Zhou, H. Palangi, L. Zhang, H. Hu, J. Corso, and J. Gao.
https://github.com/LuoweiZhou/VLP
 
Show, control and tell: A framework for generating controllable and grounded captions. CVPR 2019. M. Cornia, L. Baraldi, and R. Cucchiara. https://github.com/aimagelab/show-control-and-tell
Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning. CVPR 2017. Jiasen Lu, Caiming Xiong, Devi Parikh, et al.
https://github.com/jiasenlu/AdaptiveAttention
Graph-Based Methods for Spatial and Semantic Relations between Image Elements
Auto-encoding scene graphs for image captioning. CVPR, 2019. X. Yang, K. Tang, H. Zhang, and J. Cai.
https://github.com/yangxuntu/SGAE
 
J. Gu, S. Joty, J. Cai, H. Zhao, X. Yang, and G. Wang. Unpaired image captioning via scene graph alignments. ICCV 2019.
 
Yiwu Zhong, Liwei Wang, et al. Comprehensive Image Captioning via Scene Graph Decomposition. ECCV 2020. 
https://github.com/YiwuZhong/Sub-GC
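
To make the graph-based idea concrete, here is a toy message-passing step over detected (subject, predicate, object) triples. The dimensions, names, and update rule are illustrative assumptions, not the actual SGAE or Sub-GC implementations.

```python
import torch
import torch.nn as nn

class SceneGraphEncoder(nn.Module):
    """Toy relational update: each object node aggregates messages from the
    triples it participates in (illustrative sketch, not any paper's code)."""
    def __init__(self, dim=512):
        super().__init__()
        self.as_subject = nn.Linear(2 * dim, dim)  # message when the node is the subject
        self.as_object = nn.Linear(2 * dim, dim)   # message when the node is the object

    def forward(self, obj_feats, rel_feats, triples):
        # obj_feats: (num_objects, dim) detector features for each object node
        # rel_feats: (num_relations, dim) embeddings of predicate labels
        # triples:   list of (subject_idx, relation_idx, object_idx) tuples
        messages = torch.zeros_like(obj_feats)
        counts = obj_feats.new_zeros(obj_feats.size(0), 1)
        for s, r, o in triples:
            rel = rel_feats[r]
            messages[s] = messages[s] + self.as_subject(torch.cat([obj_feats[o], rel]))
            messages[o] = messages[o] + self.as_object(torch.cat([obj_feats[s], rel]))
            counts[s] += 1
            counts[o] += 1
        # residual update, averaging the incoming messages per node
        return torch.relu(obj_feats + messages / counts.clamp(min=1))
```

The updated node features are then fed to an attention-based decoder in place of (or alongside) the raw region features.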
 
Combining Attention-Based Methods and Graph-Based Methods
T. Yao, Y. Pan, Y. Li, and T. Mei. Exploring visual relationship for image captioning. ECCV 2018. https://github.com/airsplay/VisualRelationships
 
S. Chen, Q. Jin, P. Wang, and Q. Wu. Say as you wish: Fine-grained control of image caption generation with abstract scene graphs. CVPR 2020.
https://github.com/cshizhe/asg2cap
 
Convolution-Based Methods
J. Aneja, A. Deshpande, and A. G. Schwing. Convolutional image captioning. CVPR 2018. 
https://github.com/aditya12agd5/convcap
 
Q. Wang and A. B. Chan. CNN+CNN: Convolutional decoders for image captioning. CoRR, 2018. https://github.com/qingzwang/GHA-ImageCaptioning
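
The papers in this section replace the recurrent decoder with masked (causal) convolutions over the word sequence, conditioned on image features. Below is a minimal sketch of one causal-convolution layer, with hypothetical dimensions; it is not taken from either repository above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvLayer(nn.Module):
    """One masked-convolution decoder layer: the output at position t depends
    only on inputs at positions <= t, never on future words (sketch)."""
    def __init__(self, dim=512, kernel_size=5):
        super().__init__()
        self.left_pad = kernel_size - 1          # pad only on the left to keep the conv causal
        self.conv = nn.Conv1d(dim, dim, kernel_size)

    def forward(self, x):
        # x: (batch, seq_len, dim) word embeddings, typically with image features
        # added or concatenated at each position before this layer
        x = x.transpose(1, 2)                    # (batch, dim, seq_len) for Conv1d
        x = F.pad(x, (self.left_pad, 0))         # left padding only; no peeking ahead
        return torch.relu(self.conv(x)).transpose(1, 2)  # back to (batch, seq_len, dim)
```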
 
Unsupervised Methods and Reinforcement Learning
C. Chen, S. Mu, W. Xiao, Z. Ye, L. Wu, and Q. Ju. Improving image captioning with conditional generative adversarial nets. AAAI 2019. https://github.com/Anjaney1999/image-captioning-seqgan
 
X. Liu, H. Li, J. Shao, D. Chen, and X. Wang. Show, tell and discriminate: Image captioning by self-retrieval with partially labeled data. ECCV 2018.
Towards Unsupervised Image Captioning with Shared Multimodal Embeddings. ICCV 2019.
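
Several papers in this and later sections fine-tune the generator with a policy gradient on a non-differentiable metric such as CIDEr (self-critical sequence training). A minimal sketch of that update follows; `cider_score` is a hypothetical callable standing in for whatever metric implementation is used.

```python
import torch

def self_critical_loss(sampled_logprob, sampled_caps, greedy_caps, refs, cider_score):
    """REINFORCE with a greedy-decoding baseline (illustrative sketch).

    sampled_logprob: (batch,) summed log-probabilities of the sampled captions
    cider_score:     hypothetical callable (captions, refs) -> (batch,) reward tensor
    """
    with torch.no_grad():
        reward = cider_score(sampled_caps, refs)    # metric score of sampled captions
        baseline = cider_score(greedy_caps, refs)   # metric score of greedy captions
        advantage = reward - baseline               # positive when sampling beats greedy
    # increase the probability of sampled captions that outscore the greedy baseline
    return -(advantage * sampled_logprob).mean()
```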
 
Generating Multi-Style Captions
SentiCap: Generating Image Descriptions with Sentiments. Alexander Mathews et al. AAAI 2016. (Includes a dataset.)
StyleNet: Generating Attractive Visual Captions with Styles. Chuang Gan et al. CVPR 2017.
“Factual” or “Emotional”: Stylized Image Captioning with Adaptive Learning and Attention. Tianlang Chen et al. CVPR 2018. 
SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text. Mathews A et al. CVPR 2018. 
https://github.com/computationalmedia/semstyle
Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training. ACM MM 2018. 
https://github.com/researchmm/img2poem
Engaging image captioning via personality. K. Shuster, S. Humeau, H. Hu, A. Bordes, and J. Weston. CVPR 2019. 
Mscap: Multi-style image captioning with unpaired stylized text. L. Guo, J. Liu, P. Yao, J. Li, and H. Lu. CVPR 2019. 
Unsupervised Stylish Image Description Generation via Domain Layer Norm. Cheng-Kuan Chen et al. AAAI 2019.
MemCap: Memorizing Style Knowledge for Image Captioning.  Wentian Zhao, et al. AAAI 2020.
Human-like Controllable Image Captioning with Verb-specific Semantic Roles. Long Chen, Zhihong Jiang, Jun Xiao, Wei Liu.  CVPR 2021. 
https://github.com/mad-red/VSR-guided-CIC
3M: Multi-style image caption generation using Multi-modality features under Multi-UPDOWN model. Chengxi Li and Brent Harrison. arXiv 2021.
StyleM: Stylized Metrics for Image Captioning Built with Contrastive N-grams. Chengxi Li and Brent Harrison. arXiv 2022. (Metrics for stylized captions.)
Additional image captioning papers, organized by year and venue
(2015-2020):
https://github.com/zhjohnchan/awesome-image-captioning
 
CVPR 2019:
Unsupervised Image Captioning - Yang F et al, CVPR 2019. https://github.com/fengyang0317/unsupervised_captioning
 
Pointing Novel Objects in Image Captioning - Li Y et al, CVPR 2019.
 
Context and Attribute Grounded Dense Captioning - Yin G et al, CVPR 2019.
 
Look Back and Predict Forward in Image Captioning - Qin Y et al, CVPR 2019.
 
Self-critical n-step Training for Image Captioning - Gao J et al, CVPR 2019.
 
Intention Oriented Image Captions with Guiding Objects - Zheng Y et al, CVPR 2019.
 
Describing like humans: on diversity in image captioning - Wang Q et al, CVPR 2019. https://github.com/qingzwang/DiversityMetrics
 
Adversarial Semantic Alignment for Improved Image Captions - Dognin P et al, CVPR 2019. 
https://github.com/vacancy/SceneGraphParser
 
Fast, Diverse and Accurate Image Captioning Guided By Part-of-Speech - Deshpande A et al, CVPR 2019.
 
Good News, Everyone! Context driven entity-aware captioning for news images - Biten A F et al, CVPR 2019. 
https://github.com/furkanbiten/GoodNews
 
CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection - Zhang L et al, CVPR 2019. 
https://github.com/zhangludl/code-and-dataset-for-CapSal
 
Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning - Kim D et al, CVPR 2019. 
https://github.com/Dong-JinKim/DenseRelationalCaptioning
 
Exact Adversarial Attack to Image Captioning via Structured Output Learning With Latent Variables - Xu Y et al, CVPR 2019.
https://github.com/wubaoyuan/adversarial-attack-to-caption
 
AAAI 2019
Meta Learning for Image Captioning - Li N et al, AAAI 2019.
 
Learning Object Context for Dense Captioning - Li X et al, AAAI 2019. https://github.com/ttengwang/ESGN
 
Hierarchical Attention Network for Image Captioning - Wang W et al, AAAI 2019. https://github.com/ltguo19/VSUA-Captioning
 
Improving Image Captioning with Conditional Generative Adversarial Nets - Chen C et al, AAAI 2019.
https://github.com/Anjaney1999/image-captioning-seqgan
 
ICCV 2019
Hierarchy Parsing for Image Captioning - Yao T et al, ICCV 2019.
 
Entangled Transformer for Image Captioning - Li G et al, ICCV 2019.
 
Reflective Decoding Network for Image Captioning - Ke L et al, ICCV 2019.
https://github.com/researchmm/generate-it
 
Learning to Collocate Neural Modules for Image Captioning - Yang X et al, ICCV 2019.
 
NeurIPS 2019
Adaptively Aligned Image Captioning via Adaptive Attention Time - Huang L et al, NeurIPS 2019. 
https://github.com/husthuaan/AAT 
 
Variational Structured Semantic Inference for Diverse Image Captioning - Chen F et al, NeurIPS 2019.
 
Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations - Liu F et al, NeurIPS 2019. 
https://github.com/fenglinliu98/MIA
 
IJCAI 2019
Image Captioning with Compositional Neural Module Networks - Tian J et al, IJCAI 2019.
 
Exploring and Distilling Cross-Modal Information for Image Captioning - Liu F et al, IJCAI 2019.
 
Swell-and-Shrink: Decomposing Image Captioning by Transformation and Summarization - Wang H et al, IJCAI 2019.
 
HorNet: A Hierarchical Offshoot Recurrent Network for Improving Person Re-ID via Image Captioning - Yan S et al, IJCAI 2019.
 
AAAI 2020
MemCap: Memorizing Style Knowledge for Image Captioning - Zhao et al, AAAI 2020. 
https://github.com/entalent/MemCap
 
Unified Vision-Language Pre-Training for Image Captioning and VQA - Zhou L et al, AAAI 2020. 
https://github.com/LuoweiZhou/VLP
 
Show, Recall, and Tell: Image Captioning with Recall Mechanism - Wang L et al, AAAI 2020.
 
Reinforcing an Image Caption Generator using Off-line Human Feedback - Seo P H et al, AAAI 2020.
 
Interactive Dual Generative Adversarial Networks for Image Captioning - Liu et al, AAAI 2020.
 
Feature Deformation Meta-Networks in Image Captioning of Novel Objects - Cao et al, AAAI 2020.
 
Joint Commonsense and Relation Reasoning for Image and Video Captioning - Hou et al, AAAI 2020.
 
Learning Long- and Short-Term User Literal-Preference with Multimodal Hierarchical Transformer Network for Personalized Image Caption - Zhang et al, AAAI 2020.
 
CVPR 2020
Normalized and Geometry-Aware Self-Attention Network for Image Captioning - Guo L et al, CVPR 2020.
 
Object Relational Graph with Teacher-Recommended Learning for Video Captioning - Zhang Z et al, CVPR 2020.
 
More Grounded Image Captioning by Distilling Image-Text Matching Model - Zhou Y et al, CVPR 2020.
https://github.com/YuanEZhou/Grounded-Image-Captioning
 
Better Captioning with Sequence-Level Exploration - Chen J et al, CVPR 2020.
 
ECCV 2020
Length-Controllable Image Captioning - Deng C et al, ECCV 2020. 
https://github.com/ruotianluo/self-critical.pytorch
 
Captioning Images Taken by People Who Are Blind - Gurari D et al, ECCV 2020.
 
Towards Unique and Informative Captioning of Images - Wang Z et al, ECCV 2020. 
https://github.com/princetonvisualai/SPICE-U
 
Learning Visual Representations with Caption Annotations - Sariyildiz M et al, ECCV 2020. https://github.com/MicPie/clasp
 
SODA: Story Oriented Dense Video Captioning Evaluation Framework - Fujita S et al, ECCV 2020. 
https://github.com/fujiso/SODA
 
TextCaps: a Dataset for Image Captioning with Reading Comprehension - Sidorov O et al, ECCV 2020.
 
Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets - Wang J et al, ECCV 2020.
 
Learning to Generate Grounded Visual Captions without Localization Supervision - Ma C et al, ECCV 2020. 
https://github.com/chihyaoma/cyclical-visual-captioning
 
Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards - Yang X et al, ECCV 2020. 
https://github.com/xuewyang/Fashion_Captioning
 
NeurIPS 2020
Diverse Image Captioning with Context-Object Split Latent Spaces - Mahajan S et al, NeurIPS 2020. 
https://github.com/visinf/cos-cvae
 
RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning - Chiaro R et al, NeurIPS 2020. 
https://github.com/delchiaro/RATT
 
CVPR 2021
Towards Accurate Text-based Image Captioning with Content Diversity Exploration. Guanghui Xu et al. CVPR 2021.
https://github.com/guanghuixu/AnchorCaptioner
 
Image Change Captioning by Learning from an Auxiliary Task.  Mehrdad Hosseinzadeh and Yang Wang. 
 
FAIEr: Fidelity and Adequacy Ensured Image Caption Evaluation.  Sijin Wang et al. 
Improving OCR-based Image Captioning by Incorporating Geometrical Relationship. Jing Wang et al.
 
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans. Dave Zhenyu Chen et al. 
https://github.com/daveredrum/Scan2Cap

2022

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi. ICML 2022. https://github.com/salesforce/BLIP

To be continued.

2022-02-13

by littleoo 
