从Transformers学习跨模态编码器表示《LXMERT: Learning Cross-Modality Encoder Representations from Transformers》
目录一、文献摘要介绍二、网络框架介绍三、实验分析四、结论这是视觉问答论文阅读的系列笔记之一,本文有点长,请耐心阅读,定会有收货。如有不足,随时欢迎交流和探讨。一、文献摘要介绍Vision-and-languagereasoningrequiresanunderstandingofvisualconcepts,languagesemantics,and,mostimportantly,thealig