CVPR 2021 Visual Transformer Paper Collection (with 20 Recommended Must-Read ViT Papers)

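Recently, research interest in Visual Transformers has reached an unprecedented peak. CVPR 2021 alone published more than 40 such papers, with applications spanning image classification, object detection, instance segmentation, semantic segmentation, action recognition, autonomous driving, keypoint matching, object tracking, NAS, low-level vision, HoI, interpretability, layout generation, retrieval, text detection, and more.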

Two representative papers ignited the Transformer boom in the CV community: DETR (object detection) from ECCV 2020 and ViT (image classification) from ICLR 2021.

Contents

CVPR 2021 Visual Transformer Paper Collection

20 Must-Read ViT Papers


CVPR 2021 Visual Transformer Paper Collection

 

 

1. End-to-End Human Pose and Mesh Reconstruction with Transformers

  • Paper: https://arxiv.org/pdf/2012.09760.pdf

  • Code: https://github.com/microsoft/MeshTransformer


2. Temporal-Relational CrossTransformers for Few-Shot Action Recognition

  • Paper: https://arxiv.org/pdf/2101.06184.pdf

  • Code: https://github.com/tobyperrett/trx


3. Kaleido-BERT:Vision-Language Pre-training on Fashion Domain

  • Paper: https://arxiv.org/pdf/2103.16110.pdf

  • Code: https://github.com/mczhuge/Kaleido-BERT


4. HOTR: End-to-End Human-Object Interaction Detection with Transformers

  • Paper: https://arxiv.org/pdf/2104.13682.pdf

  • Code: None


5. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

  • Paper: https://arxiv.org/pdf/2104.09224.pdf

  • Code: https://github.com/autonomousvision/transfuser


6. Pose Recognition with Cascade Transformers

  • Paper: https://arxiv.org/pdf/2104.06976.pdf

  • Code: https://github.com/mlpc-ucsd/PRTR


7. Variational Transformer Networks for Layout Generation

  • Paper: https://arxiv.org/pdf/2104.02416.pdf

  • Code: None


8. LoFTR: Detector-Free Local Feature Matching with Transformers

  • Homepage: https://zju3dv.github.io/loftr/

  • Paper: https://arxiv.org/pdf/2104.00680.pdf

  • Code: https://github.com/zju3dv/LoFTR

  • Chinese write-up: CVPR 2021 | Can sparse textures be matched too? A quick look at LoFTR, a Transformer-based image feature matcher


9. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

  • Paper: https://arxiv.org/pdf/2012.15840.pdf

  • Code: https://github.com/fudan-zvg/SETR

  • Chinese write-up: CVPR 2021 | Transformers take another stronghold! Fudan and others propose SETR: a semantic segmentation network


10. Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers

  • Paper: https://arxiv.org/pdf/2103.16553.pdf

  • Code: None


11. Transformer Tracking

  • Paper: https://arxiv.org/pdf/2103.15436.pdf

  • Code: https://github.com/chenxin-dlut/TransT


 

12. MIST: Multiple Instance Spatial Transformer


  • Paper: https://arxiv.org/pdf/1811.10725.pdf

  • Code: None


13. Multimodal Motion Prediction with Stacked Transformers

  • Paper: https://arxiv.org/pdf/2103.11624.pdf

  • Code: https://decisionforce.github.io/mmTransformer


14. Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning

  • Paper: https://assets.amazon.science/1e/4c/93cb61584c9d959dcdea1bdfda9d/revamping-cross-modal-recipe-retrieval-with-hierarchical-transformers-and-self-supervised-learning.pdf

  • Code: https://github.com/amzn/image-to-recipe-transformers


15. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

  • Paper(Oral): https://arxiv.org/pdf/2103.11681.pdf

  • Code: https://github.com/594422814/TransformerTrack


16. Pre-Trained Image Processing Transformer


  • Paper:  https://arxiv.org/abs/2012.00364

  • Code: None

  • Chinese write-up: CVPR 2021 | Transformers move into low-level vision! Peking University, Huawei, and others propose the pre-trained model IPT


 

17. End-to-End Video Instance Segmentation with Transformers

  • Paper(Oral): https://arxiv.org/pdf/2011.14503.pdf

  • Code: https://github.com/Epiphqny/VisTR

  • Chinese write-up: CVPR 2021 Oral | Another Transformer breakthrough! Meituan and others propose VisTR: a video instance segmentation network


18. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

  • Paper(Oral): https://arxiv.org/pdf/2011.09094.pdf

  • Code: https://github.com/dddzg/up-detr

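  • Chinese write-up: CVPR 2021 Oral | Transformers push ahead again! South China University of Technology and WeChat propose UP-DETR: an unsupervised pre-trained detector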


 

19. End-to-End Human Object Interaction Detection with HOI Transformer

  • Paper: https://arxiv.org/pdf/2103.04503.pdf

  • Code: https://github.com/bbepoch/HoiTransformer


20. Transformer Interpretability Beyond Attention Visualization

  • Paper: https://arxiv.org/pdf/2012.09838.pdf

  • Code: https://github.com/hila-chefer/Transformer-Explainability


 

21. Line Segment Detection Using Transformers without Edges

  • Paper(Oral): https://arxiv.org/abs/2101.01909.pdf

  • Code: None


22. MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers

  • Paper: https://cs.jhu.edu/~alanlab/Pubs21/wang2021max.pdf

  • Code: None


23. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation

  • Paper(Oral): https://arxiv.org/pdf/2101.08833.pdf

  • Code: https://github.com/dukebw/SSTVOS


 

24. Topological Planning With Transformers for Vision-and-Language Navigation


  • Paper: https://arxiv.org/pdf/2012.05292.pdf

  • Code: None


25. Taming Transformers for High-Resolution Image Synthesis

  • Homepage: https://compvis.github.io/taming-transformers/

  • Paper(Oral): https://arxiv.org/pdf/2012.09841.pdf

  • Code: https://github.com/CompVis/taming-transformers


26. Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos

  • Paper(Oral): https://hehefan.github.io/pdfs/p4transformer.pdf

  • Code: None


 

27. General Multi-Label Image Classification With Transformers

  • Paper: https://arxiv.org/pdf/2011.14027.pdf

  • Code: None


28. Bottleneck Transformers for Visual Recognition

  • Paper: https://arxiv.org/pdf/2101.11605.pdf

  • Code: None

  • Chinese write-up: CNN + Transformer! Google proposes BoTNet: a new backbone network that reaches 84.7% accuracy on ImageNet!


29. VLN BERT: A Recurrent Vision-and-Language BERT for Navigation

  • Paper(Oral): https://arxiv.org/pdf/2011.13922.pdf

  • Code: https://github.com/YicongHong/Recurrent-VLN-BERT


30. Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

  • Paper(Oral): https://arxiv.org/pdf/2102.06183.pdf

  • Code: https://github.com/jayleicn/ClipBERT


 

31. Scaling Local Self-Attention For Parameter Efficient Visual Backbones

  • Paper(Oral): https://arxiv.org/pdf/2103.12731.pdf

  • Code: None


 

The following papers have not yet been made publicly available:
 

1. HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers

Paper(Oral): None

Code: https://github.com/dingmyu/HR-NAS

2. Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer

Paper: None

Code: None

3. LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity

Paper: None

Code: None

4. Facial Action Unit Detection With Transformers

Paper: None

Code: None

5. Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition

Paper: None

Code: None

6. Lesion-Aware Transformers for Diabetic Retinopathy Grading

Paper: None

Code: None

7. Adaptive Image Transformer for One-Shot Object Detection

Paper: None

Code: None

8. Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos

Paper: None

Code: None

9. Self-Supervised Video Hashing via Bidirectional Transformers

Paper: None

Code: None

10. Gaussian Context Transformer

Paper: None

Code: None

11. Self-attention based Text Knowledge Mining for Text Detection

Paper: None

Code: https://github.com/CVI-SZU/STKM

12. SSAN: Separable Self-Attention Network for Video Representation Learning

Paper: None

Code: None

 

 

20 Must-Read Recent ViT Papers

Reposted from: https://mp.weixin.qq.com/s/CpmBY2qmvkxLiBmgy_PHJw

