ViT (An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale): A Brief Analysis of the Paper's Network Architecture
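Source code author: Adenialzz (this post only records my own understanding of that source code and reflects personal views; if you spot any mistakes, please point them out!)
Paper: https://arxiv.org/abs/2010.11929
Example input parameters used with the source code above: (depth=6,img_size=256,patch_size=32,num_classes=1000,dim=1024,heads=16,mlp_dim=2048,dropout=0.1,embedding_d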
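
To make these parameters concrete, here is a minimal sketch that traces the tensor shapes a ViT would produce with img_size=256, patch_size=32, dim=1024, heads=16, mlp_dim=2048 and num_classes=1000. It is not Adenialzz's code: the helper name `vit_shapes` is hypothetical, and it assumes RGB input (channels=3), a batch size of 1, a prepended [class] token, and a per-head width of dim // heads as in the paper's standard setup (the author's implementation may instead use a separate per-head dimension). The truncated last parameter is not used.

```python
# Hypothetical shape trace for a ViT using the example parameters above.
# Assumptions (not stated in the source): channels=3 (RGB), batch=1,
# a prepended [class] token, and per-head width = dim // heads.

def vit_shapes(img_size=256, patch_size=32, dim=1024, heads=16,
               mlp_dim=2048, num_classes=1000, channels=3, batch=1):
    """Return a dict mapping each ViT stage to the tensor shape it produces."""
    if img_size % patch_size != 0:
        raise ValueError("img_size must be divisible by patch_size")
    if dim % heads != 0:
        raise ValueError("dim must be divisible by heads")

    grid = img_size // patch_size            # patches per side (defaults: 256 / 32 = 8)
    num_patches = grid * grid                # total patches (defaults: 8 * 8 = 64)
    patch_dim = channels * patch_size ** 2   # flattened patch (defaults: 3 * 32 * 32 = 3072)
    seq_len = num_patches + 1                # +1 for the prepended [class] token
    head_dim = dim // heads                  # per-head width (defaults: 1024 / 16 = 64)

    return {
        "input image":        (batch, channels, img_size, img_size),
        "flattened patches":  (batch, num_patches, patch_dim),
        "patch embeddings":   (batch, num_patches, dim),   # linear projection to dim
        "with [class] token": (batch, seq_len, dim),       # position embeddings keep this shape
        "q/k/v per head":     (batch, heads, seq_len, head_dim),
        "attention weights":  (batch, heads, seq_len, seq_len),
        "MLP hidden":         (batch, seq_len, mlp_dim),
        "encoder output":     (batch, seq_len, dim),       # same after every encoder block
        "class-token output": (batch, dim),
        "logits":             (batch, num_classes),
    }


if __name__ == "__main__":
    for stage, shape in vit_shapes().items():
        print(f"{stage:<20} {shape}")
```

Note that depth=6 only stacks six encoder blocks, each of which maps (batch, seq_len, dim) back to the same shape, and dropout=0.1 changes no shapes either, so neither appears in the trace.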