A Curated List of Recent Transformer Papers and Resources for Computer Vision (CV)

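    The Transformer was introduced in the paper "Attention Is All You Need" and is now one of the reference models recommended for Google Cloud TPU. It was first applied to machine translation, where it achieved state-of-the-art results at the time. The Transformer addresses the most criticized weakness of RNNs, slow training, by using self-attention so that all positions in a sequence can be processed in parallel. It can also be stacked to considerable depth, which makes fuller use of the capacity of deep neural networks and improves model accuracy.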
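
    To make the parallelism point concrete, here is a minimal, dependency-free sketch of single-head scaled dot-product self-attention. It is an illustration only, not the implementation from the paper or from any library: the helper names (`softmax`, `matmul`, `transpose`, `self_attention`), the toy input, and the identity projection matrices are all assumptions chosen for readability, and masking, multiple heads, and positional encodings are left out.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x:             n tokens, each a d_model-dimensional vector
    w_q, w_k, w_v: d_model x d_k projection matrices (hypothetical toy weights)
    Returns the attended output (n x d_k) and the attention weights (n x n).
    """
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    # Every row of the score matrix depends only on x, not on the result
    # for a previous token, so all rows could be computed in parallel.
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Toy example: three 2-dimensional tokens and identity projections (assumed values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
```

    Each row of `weights` sums to 1 and describes how strongly one token attends to every token in the sequence. Unlike an RNN, no step waits for the hidden state of the previous token, which is why the computation maps well onto matrix hardware.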

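    This collection gathers the latest papers, code, datasets, and other resources on applying Transformers to computer vision (CV), up to 2021, for anyone who needs them.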

     

    These resources were collected from the web. Source: https://github.com/DirtyHarryLYL/Transformer-in-Vision

Paper List

Survey

    (arXiv 2020.09) Efficient Transformers: A Survey

    

    (arXiv 2021.01) Transformers in Vision: A Survey

    

Recent Papers

    (ICLR'21) UPDET: UNIVERSAL MULTI-AGENT REINFORCEMENT LEARNING VIA POLICY DECOUPLING WITH TRANSFORMERS

    

    (ICLR'21) Deformable DETR: Deformable Transformers for End-to-End Object Detection

    

    (ICLR'21) LAMBDANETWORKS: MODELING LONG-RANGE INTERACTIONS WITHOUT ATTENTION

    

    (ICLR'21) SUPPORT-SET BOTTLENECKS FOR VIDEO-TEXT REPRESENTATION LEARNING

    

    (ICLR'21) COLORIZATION TRANSFORMER

    

    (ECCV'20) Multi-modal Transformer for Video Retrieval

    

    (ECCV'20) Connecting Vision and Language with Localized Narratives

    

    (ECCV'20) DETR: End-to-End Object Detection with Transformers

    

    (CVPR'20) Multi-Modality Cross Attention Network for Image and Sentence Matching

    

    (CVPR'20) Learning Texture Transformer Network for Image Super-Resolution

    

    (CVPR'20) Speech2Action: Cross-modal Supervision for Action Recognition

    

    (ICPR'20) Transformer Encoder Reasoning Network

    

    (EMNLP'19) Effective Use of Transformer Networks for Entity Tracking

    

    (arXiv 2021.02) Is Space-Time Attention All You Need for Video Understanding? 

    

    (arXiv 2021.02) Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling

    

    (arXiv 2021.02) Video Transformer Network

    

    (arXiv 2021.02) Training Vision Transformers for Image Retrieval

    

    (arXiv 2021.02) Relaxed Transformer Decoders for Direct Action Proposal Generation

    

    (arXiv 2021.02) TransReID: Transformer-based Object Re-Identification

    

    (arXiv 2021.02) Improving Visual Reasoning by Exploiting The Knowledge in Texts

    

    (arXiv 2021.01) Fast Convergence of DETR with Spatially Modulated Co-Attention

    

    (arXiv 2021.01) Dual-Level Collaborative Transformer for Image Captioning

    

    (arXiv 2021.01) SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation

    

    (arXiv 2021.01) CPTR: FULL TRANSFORMER NETWORK FOR IMAGE CAPTIONING

    

    (arXiv 2021.01) Trans2Seg: Transparent Object Segmentation with Transformer

    

    (arXiv 2021.01) Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network

    

    (arXiv 2021.01) Trear: Transformer-based RGB-D Egocentric Action Recognition

    

    (arXiv 2021.01) Learn to Dance with AIST++: Music Conditioned 3D Dance Generation

    

    (arXiv 2021.01) Spherical Transformer: Adapting Spherical Signal to CNNs

    

    (arXiv 2021.01) Are We There Yet? Learning to Localize in Embodied Instruction Following

    

    (arXiv 2021.01) VinVL: Making Visual Representations Matter in Vision-Language Models

    

    (arXiv 2021.01) Bottleneck Transformers for Visual Recognition

    

    (arXiv 2021.01) Investigating the Vision Transformer Model for Image Retrieval Tasks

    

    (arXiv 2021.01) ADDRESSING SOME LIMITATIONS OF TRANSFORMERS WITH FEEDBACK MEMORY

    

    (arXiv 2021.01) Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

    

    (arXiv 2021.01) TrackFormer: Multi-Object Tracking with Transformers

    

    (arXiv 2021.01) VisualSparta: Sparse Transformer Fragment-level Matching for Large-scale Text-to-Image Search

    

    (arXiv 2021.01) Line Segment Detection Using Transformers without Edges

    

    (arXiv 2021.01) Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers

    

    (arXiv 2020.12) Accurate Word Representations with Universal Visual Guidance

    

    (arXiv 2020.12) DETR for Pedestrian Detection

    

    (arXiv 2020.12) Transformer Interpretability Beyond Attention Visualization

    

    (arXiv 2020.12) PCT: Point Cloud Transformer

    

    (arXiv 2020.12) TransPose: Towards Explainable Human Pose Estimation by Transformer

    

    (arXiv 2020.12) Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

    

    (arXiv 2020.12) Transformer Guided Geometry Model for Flow-Based Unsupervised Visual Odometry

    

    (arXiv 2020.12) Transformer for Image Quality Assessment

    

    (arXiv 2020.12) TransTrack: Multiple-Object Tracking with Transformer

    

    (arXiv 2020.12) 3D Object Detection with Pointformer

    

    (arXiv 2020.12) Training data-efficient image transformers & distillation through attention

    

    (arXiv 2020.12) Toward Transformer-Based Object Detection

    

    (arXiv 2020.12) SceneFormer: Indoor Scene Generation with Transformers

    

    (arXiv 2020.12) Point Transformer

    

    (arXiv 2020.12) End-to-End Human Pose and Mesh Reconstruction with Transformers

    

    (arXiv 2020.12) Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

    

    (arXiv 2020.12) Pre-Trained Image Processing Transformer

    

    (arXiv 2020.12) Taming Transformers for High-Resolution Image Synthesis

    

    (arXiv 2020.11) End-to-end Lane Shape Prediction with Transformers

    

    (arXiv 2020.11) UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

    

    (arXiv 2020.11) End-to-End Video Instance Segmentation with Transformers

    

    (arXiv 2020.11) Rethinking Transformer-based Set Prediction for Object Detection

    

    (arXiv 2020.11) General Multi-label Image Classification with Transformers, https://arxiv.org/pdf/2011.14027

    

    (arXiv 2020.11) End-to-End Object Detection with Adaptive Clustering Transformer

    

    (arXiv 2020.10) An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    

    (arXiv 2020.07) Oscar: Object-Semantics Aligned Pre-training for Vision-and-Language Tasks

    

    (arXiv 2020.07) Feature Pyramid Transformer

    

    (arXiv 2020.06) Visual Transformers: Token-based Image Representation and Processing for Computer Vision

    

    (arXiv 2019.08) LXMERT: Learning Cross-Modality Encoder Representations from Transformers

    
