CVPR 2022 VQA-Related Papers

        The following was compiled in April of this year.

        A keyword search turned up roughly eight VQA-related papers. Two of them study outside-knowledge visual question answering and one studies scene-text visual question answering, all proposing new models. Two others work on the data side, one studies robustness, one investigates backdoor attacks on VQA models, and the last proposes a reasoning strategy for model training.

LaTr: Layout-Aware Transformer for Scene-Text VQA
https://openaccess.thecvf.com/content/CVPR2022/html/Biten_LaTr_Layout-Aware_Transformer_for_Scene-Text_VQA_CVPR_2022_paper.html
        Proposes a multimodal model architecture for scene-text visual question answering (ST-VQA).

SimVQA: Exploring Simulated Environments for Visual Question Answering
https://openaccess.thecvf.com/content/CVPR2022/html/Cascante-Bonilla_SimVQA_Exploring_Simulated_Environments_for_Visual_Question_Answering_CVPR_2022_paper.html
        Uses computer-generated synthetic data to extend existing datasets for VQA.

A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering
https://arxiv.org/abs/2201.05299
        Proposes a Transform-Retrieve-Generate (TRiG) framework for the outside-knowledge VQA (OK-VQA) task.

SwapMix: Diagnosing and Regularizing the Over-reliance on Visual Context in Visual Question Answering
https://openaccess.thecvf.com/content/CVPR2022/html/Gupta_SwapMix_Diagnosing_and_Regularizing_the_Over-Reliance_on_Visual_Context_in_CVPR_2022_paper.html
        Proposes the SwapMix perturbation technique for evaluating VQA model robustness and for data augmentation.
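As a rough illustration of the SwapMix idea — perturbing question-irrelevant visual context and checking whether the model's answer changes — here is a toy sketch. The function name and the class-agnostic swapping are assumptions for illustration only; the paper's actual procedure is more careful about which object features may be swapped in.

```python
import random

def swapmix_perturb(features, relevant_idx, feature_bank, rng=None):
    """Toy SwapMix-style perturbation (illustrative sketch, not the paper's
    exact method): keep features of question-relevant objects fixed and
    replace every other object's feature with one drawn from a bank of
    features taken from other images.

    features:     list of per-object feature vectors for one image
    relevant_idx: set of indices of objects relevant to the question
    feature_bank: pool of object features from other images
    """
    rng = rng or random.Random(0)
    perturbed = []
    for i, feat in enumerate(features):
        if i in relevant_idx:
            perturbed.append(feat)            # relevant object: unchanged
        else:
            perturbed.append(rng.choice(feature_bank))  # context: swapped
    return perturbed

# Usage: run the VQA model on both the original and the perturbed features;
# if the predicted answer flips, the model over-relies on irrelevant context.
feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out = swapmix_perturb(feats, relevant_idx={0}, feature_bank=[[9.0, 9.0]])
# out[0] stays [1.0, 0.0]; out[1] and out[2] come from the feature bank
```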

Dual-Key Multimodal Backdoors for Visual Question Answering
https://openaccess.thecvf.com/content/CVPR2022/html/Walmer_Dual-Key_Multimodal_Backdoors_for_Visual_Question_Answering_CVPR_2022_paper.html
        Studies backdoor attacks on VQA models.

MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering
https://openaccess.thecvf.com/content/CVPR2022/html/Ding_MuKEA_Multimodal_Knowledge_Extraction_and_Accumulation_for_Knowledge-Based_Visual_Question_CVPR_2022_paper.html
        Proposes the MuKEA framework for the outside-knowledge VQA (OK-VQA) task.

Grounding Answers for Visual Questions Asked by Visually Impaired People
https://openaccess.thecvf.com/content/CVPR2022/html/Chen_Grounding_Answers_for_Visual_Questions_Asked_by_Visually_Impaired_People_CVPR_2022_paper.html
        Introduces the VizWiz-VQA-Grounding dataset.

Maintaining Reasoning Consistency in Compositional Visual Question Answering
https://openaccess.thecvf.com/content/CVPR2022/html/Jing_Maintaining_Reasoning_Consistency_in_Compositional_Visual_Question_Answering_CVPR_2022_paper.html
        Proposes a dialog-like reasoning method to maintain reasoning consistency when answering a compositional question and its sub-questions.
