[Swin Transformer] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
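1. Motivation
Adapting the Transformer from NLP to computer vision poses two main challenges: the wide variation in the scale of visual content, and the high resolution of image pixels compared with words in text, which leads to a large memory cost. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations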