[2101] [ICCV 2021] Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
Content
- Introduction
- Method
  - model architecture
  - token-to-token (T2T)
  - re-structurization
  - soft split
  - T2T module
  - T2T-ViT backbone
  - architecture variants
- Experiment
  - T2T-ViT on ImageNet
  - from CNN to ViT
  - ablation study

Introdu