(Detailed version) Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
Haoran Wei1∗, Lingyu Kong2∗, Jinyue Chen2, Liang Zhao1, Zheng Ge1†, Jinrong Yang3, Jianjian Sun1, Chunrui Han1, Xiangyu Zhang1
1 MEGVII Technology 2 University of Chinese Academy of Sciences 3 Huazhong University of Science and Techno