ViT Exploration Notes (An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale)
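Code / Paper: [vit-pytorch](https://github.com/lucidrains/vit-pytorch/tree/main)

Study references (bilibili videos):
- 11.1 Vision Transformer (ViT) 网络详解 (a detailed walkthrough of the ViT network)
- ViT论文逐段精读【论文精读】 (a paragraph-by-paragraph close reading of the ViT paper)

Starting point

How do we turn an image into a sentence? Cut the input image into patches; each patch is one token (a "word"). Suppose we have a 224x224x3 image,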