[Paper Notes] Unified Vision-Language Pre-Training for Image Captioning and VQA
This paper presents a unified Vision-Language Pre-training (VLP) model. The model is unified in that (1) it can be fine-tuned for either vision-language generation (e.g., image captioning) or understanding (e.g., visual question ans