Distilled Dual-Encoder Model for Vision-Language Understanding
Zekun Wang†∗, Wenhui Wang‡, Haichao Zhu†, Ming Liu†, Bing Qin†, Furu Wei‡
†Harbin Institute of Technology, Harbin, China
‡Microsoft Research, Beijing, China
{zkwang,hczhu,mliu,qinb}@ir.hit.edu.cn {wenwan,fuwe