A ConvNet for the 2020s
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
Paper: https://arxiv.org/abs/2203.06717
Code: https://github.com/megvii-research/RepLKNet
Code2: https://github.com/DingXiaoH/RepLKNet-pytorch
Chinese explanation: https://mp.weixin.qq.com/s/_qXyIQut-JRW6VvsjaQlFg
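The central trick in RepLKNet is training a very large depthwise kernel (up to 31x31) together with a small parallel depthwise kernel, then folding the small branch into the large one for inference via structural re-parameterization. Below is a minimal sketch of that merge, assuming stride 1, bias-enabled branches, and omitting the BatchNorm fusion the official repos also perform; sizes and names are illustrative, not the repos' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def merge_dw_kernels(large: nn.Conv2d, small: nn.Conv2d) -> nn.Conv2d:
    """Fold a small parallel depthwise conv into a large one
    (RepLKNet-style structural re-parameterization; BatchNorm fusion is omitted)."""
    K, k = large.kernel_size[0], small.kernel_size[0]
    fused = nn.Conv2d(large.in_channels, large.out_channels, K,
                      padding=K // 2, groups=large.groups, bias=True)
    pad = (K - k) // 2
    # Zero-pad the small kernel to K x K and add it to the large kernel.
    fused.weight.data = large.weight.data + F.pad(small.weight.data, [pad] * 4)
    fused.bias.data = large.bias.data + small.bias.data
    return fused

# Sanity check: the merged conv reproduces the sum of the two branches.
x = torch.randn(1, 64, 56, 56)
large = nn.Conv2d(64, 64, 31, padding=15, groups=64)
small = nn.Conv2d(64, 64, 5, padding=2, groups=64)
assert torch.allclose(large(x) + small(x), merge_dw_kernels(large, small)(x), atol=1e-4)
```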
MPViT: Multi-Path Vision Transformer for Dense Prediction
Mobile-Former: Bridging MobileNet and Transformer
MetaFormer is Actually What You Need for Vision
Shunted Self-Attention via Multi-Scale Token Aggregation
HairCLIP: Design Your Hair by Text and Reference Image
Paper: https://arxiv.org/abs/2112.05142
Code: https://github.com/wty-ustc/HairCLIP
PointCLIP: Point Cloud Understanding by CLIP
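Both CLIP-based entries above rely on the same primitive: embed an image (or, for PointCLIP, depth maps projected from a point cloud) and a text prompt with CLIP and compare them by cosine similarity. A minimal sketch with the openai `clip` package follows; the image path and prompts are placeholders, and HairCLIP's StyleGAN mapper on top of these embeddings is not shown.

```python
# pip install git+https://github.com/openai/CLIP.git
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)  # placeholder path
text = clip.tokenize(["a person with curly hair",
                      "a person with straight hair"]).to(device)

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text)
    # Cosine similarity between the image and each prompt.
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    sim = (image_feat @ text_feat.T).softmax(dim=-1)
print(sim)
```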
Blended Diffusion for Text-driven Editing of Natural Images
Paper: https://arxiv.org/abs/2111.14818
Code: https://github.com/omriav/blended-diffusion
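The key step in Blended Diffusion is spatial blending at every reverse step: inside the user mask the sample follows the CLIP-guided diffusion trajectory, while outside the mask it is overwritten by a correspondingly noised copy of the source image. A hedged sketch of just that blending operation; the diffusion model, CLIP guidance, and real noise schedule are assumed to exist elsewhere, and `alphas_cumprod` here is only a toy 1-D tensor of cumulative alphas.

```python
import torch

def blend_step(x_t_gen, source_x0, mask, t, alphas_cumprod):
    """Blend the current generated sample with a noised copy of the source
    image outside the edit mask (mask == 1 marks the region to edit)."""
    a_bar = alphas_cumprod[t]                      # scalar \bar{alpha}_t
    noise = torch.randn_like(source_x0)
    source_x_t = a_bar.sqrt() * source_x0 + (1 - a_bar).sqrt() * noise
    return mask * x_t_gen + (1 - mask) * source_x_t

# Toy shapes only; a real sampler would supply x_t_gen at each step.
x_t_gen = torch.randn(1, 3, 256, 256)
source = torch.randn(1, 3, 256, 256)
mask = torch.zeros(1, 1, 256, 256)
mask[..., 64:192, 64:192] = 1.0
alphas_cumprod = torch.linspace(0.9999, 0.01, 1000)
x_t = blend_step(x_t_gen, source, mask, t=500, alphas_cumprod=alphas_cumprod)
```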
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
Homepage: https://semanticstylegan.github.io/
Paper: https://arxiv.org/abs/2112.02236
Demo: https://semanticstylegan.github.io/videos/demo.mp4
Style Transformer for Image Inversion and Editing
β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
Paper: https://arxiv.org/abs/2203.10209
Code: https://github.com/mxin262/SwinTextSpotter
Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields
Homepage: https://jonbarron.info/mipnerf360/
Paper: https://arxiv.org/abs/2111.12077
Demo: https://youtu.be/YStDS2-Ln1s
Point-NeRF: Point-based Neural Radiance Fields
NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images
Urban Radiance Fields
Homepage: https://urban-radiance-fields.github.io/
Paper: https://arxiv.org/abs/2111.14643
Demo: https://youtu.be/qGlq5DZT6uc
Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation
HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video
Homepage: https://grail.cs.washington.edu/projects/humannerf/
Paper: https://arxiv.org/abs/2201.04127
Demo: https://youtu.be/GM-RoZEymmw
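The NeRF entries above (Mip-NeRF 360, Point-NeRF, NeRF in the Dark, Urban Radiance Fields, Pix2NeRF, HumanNeRF) all share the same volume-rendering backbone: densities and colors sampled along a ray are composited with alpha weights. A minimal NumPy sketch of that compositing step, as generic NeRF math rather than any one paper's implementation:

```python
import numpy as np

def composite(sigmas, colors, deltas):
    """Standard NeRF volume rendering: alpha compositing along one ray.
    sigmas: (N,) densities, colors: (N, 3), deltas: (N,) segment lengths."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                              # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1] + 1e-10]))  # transmittance T_i
    weights = alphas * trans
    rgb = (weights[:, None] * colors).sum(axis=0)                        # expected ray color
    return rgb, weights

# Toy ray with 64 samples.
rng = np.random.default_rng(0)
sigmas = rng.uniform(0.0, 2.0, 64)
colors = rng.uniform(0.0, 1.0, (64, 3))
deltas = np.full(64, 0.05)
rgb, weights = composite(sigmas, colors, deltas)
```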
MPViT: Multi-Path Vision Transformer for Dense Prediction
MetaFormer is Actually What You Need for Vision
Mobile-Former: Bridging MobileNet and Transformer
Shunted Self-Attention via Multi-Scale Token Aggregation
Language-based Video Editing via Multi-Modal Multi-Level Transformer
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
Embracing Single Stride 3D Object Detector with Sparse Transformer
Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
Spatio-temporal Relation Modeling for Few-shot Action Recognition
Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
GroupViT: Semantic Segmentation Emerges from Text Supervision
Homepage: https://jerryxu.net/GroupViT/
Paper: https://arxiv.org/abs/2202.11094
Demo: https://youtu.be/DtJsWIUTW-Y
Restormer: Efficient Transformer for High-Resolution Image Restoration
Splicing ViT Features for Semantic Appearance Transfer
Self-supervised Video Transformer
Homepage: https://kahnchana.github.io/svt/
Paper: https://arxiv.org/abs/2112.01514
Code: https://github.com/kahnchana/svt
Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers
Accelerating DETR Convergence via Semantic-Aligned Matching
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
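DN-DETR feeds noised copies of the ground-truth boxes as extra decoder queries and trains the decoder to denoise them, which stabilizes bipartite matching. Below is a hedged sketch of the box-noising step only, with boxes in normalized (cx, cy, w, h) format; the noise scales are illustrative, and the attention masking between denoising and matching queries is omitted.

```python
import torch

def make_denoising_queries(gt_boxes, box_noise_scale=0.4):
    """Jitter ground-truth boxes (cx, cy, w, h in [0, 1]) to create
    noised queries for a DN-DETR-style denoising task."""
    noised = gt_boxes.clone()
    # Shift centers relative to the box size, rescale w/h multiplicatively.
    offset = (torch.rand_like(gt_boxes[:, :2]) * 2 - 1) * gt_boxes[:, 2:] / 2
    scale = 1 + (torch.rand_like(gt_boxes[:, 2:]) * 2 - 1) * box_noise_scale
    noised[:, :2] = gt_boxes[:, :2] + offset * box_noise_scale
    noised[:, 2:] = gt_boxes[:, 2:] * scale
    return noised.clamp(0.0, 1.0)

gt = torch.tensor([[0.5, 0.5, 0.2, 0.3], [0.3, 0.7, 0.1, 0.1]])
queries = make_denoising_queries(gt)  # same shape as gt, randomly perturbed
```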
Style Transformer for Image Inversion and Editing
MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
Paper: https://arxiv.org/abs/2203.10981
Code: https://github.com/kuanchihhuang/MonoDTR
Mask Transfiner for High-Quality Instance Segmentation
Language as Queries for Referring Video Object Segmentation
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
AdaMixer: A Fast-Converging Query-Based Object Detector
Omni-DETR: Omni-Supervised Object Detection with Transformers
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
Paper: https://arxiv.org/abs/2203.10209
Code: https://github.com/mxin262/SwinTextSpotter
Conditional Prompt Learning for Vision-Language Models
Bridging Video-text Retrieval with Multiple Choice Question
Paper: https://arxiv.org/abs/2201.04850
Code: https://github.com/TencentARC/MCQ
UniVIP: A Unified Framework for Self-Supervised Visual Pre-training
Crafting Better Contrastive Views for Siamese Representation Learning
HCSC: Hierarchical Contrastive Selective Coding
Homepage: https://github.com/gyfastas/HCSC
Paper: https://arxiv.org/abs/2202.00455
Chinese explanation: https://mp.weixin.qq.com/s/jkYR8mYp-e645qk8kfPNKQ
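The self-supervised entries above (UniVIP, Crafting Better Contrastive Views, HCSC) all build on contrastive objectives; HCSC additionally selects positives and negatives with hierarchical prototypes. As a shared reference point, here is a minimal InfoNCE loss over two augmented views, as generic contrastive learning rather than any of these papers' exact objective:

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.2):
    """InfoNCE between two batches of embeddings; z1[i] and z2[i] are
    two augmented views of the same image (positives), all else negatives."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / temperature          # (B, B) cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

loss = info_nce(torch.randn(32, 128), torch.randn(32, 128))
```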
TeachAugment: Data Augmentation Optimization Using Teacher Knowledge
AlignMix: Improving representation by interpolating aligned features
BoxeR: Box-Attention for 2D and 3D Transformers
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
Accelerating DETR Convergence via Semantic-Aligned Matching
Localization Distillation for Dense Object Detection
Focal and Global Knowledge Distillation for Detectors
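The two distillation entries above (Localization Distillation, Focal and Global Knowledge Distillation) specialize the generic teacher-student recipe to detectors. For orientation, a minimal feature-imitation plus KL sketch of that generic recipe; the loss weights and the 1x1 adapter are illustrative, not either paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distill_losses(student_feat, teacher_feat, student_logits, teacher_logits,
                   adapter: nn.Module, T=2.0):
    """Generic detector distillation: imitate teacher features (after a 1x1
    adapter) and match softened classification distributions with KL."""
    feat_loss = F.mse_loss(adapter(student_feat), teacher_feat)
    kd_loss = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                       F.softmax(teacher_logits / T, dim=-1),
                       reduction="batchmean") * T * T
    return feat_loss, kd_loss

adapter = nn.Conv2d(128, 256, kernel_size=1)          # student 128ch -> teacher 256ch
f_s, f_t = torch.randn(2, 128, 32, 32), torch.randn(2, 256, 32, 32)
l_s, l_t = torch.randn(2, 80), torch.randn(2, 80)     # 80-class logits
feat_loss, kd_loss = distill_losses(f_s, f_t, l_s, l_t, adapter)
```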
A Dual Weighting Label Assignment Scheme for Object Detection
AdaMixer: A Fast-Converging Query-Based Object Detector
Omni-DETR: Omni-Supervised Object Detection with Transformers
Correlation-Aware Deep Tracking
TCTrack: Temporal Contexts for Aerial Tracking
Learning of Global Objective for Network Flow in Multi-Object Tracking
Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers
ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation
Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels
GroupViT: Semantic Segmentation Emerges from Text Supervision
Homepage: https://jerryxu.net/GroupViT/
Paper: https://arxiv.org/abs/2202.11094
Demo: https://youtu.be/DtJsWIUTW-Y
BoxeR: Box-Attention for 2D and 3D Transformers
E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation
Mask Transfiner for High-Quality Instance Segmentation
FreeSOLO: Learning to Segment Objects without Annotations
Efficient Video Instance Segmentation via Tracklet Query and Proposal
Learning What Not to Segment: A New Perspective on Few-Shot Segmentation
Self-supervised Video Transformer
Homepage: https://kahnchana.github.io/svt/
Paper: https://arxiv.org/abs/2112.01514
Code: https://github.com/kahnchana/svt
Spatio-temporal Relation Modeling for Few-shot Action Recognition
End-to-End Semi-Supervised Learning for Video Action Detection
Style Transformer for Image Inversion and Editing
Blended Diffusion for Text-driven Editing of Natural Images
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
Homepage: https://semanticstylegan.github.io/
Paper: https://arxiv.org/abs/2112.02236
Demo: https://semanticstylegan.github.io/videos/demo.mp4
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
Restormer: Efficient Transformer for High-Resolution Image Restoration
Learning the Degradation Distribution for Blind Image Super-Resolution
BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment
Learning to Deblur using Light Field Generated and Real Defocus Images
Homepage: http://lyruan.com/Projects/DRBNet/
Paper (Oral): https://arxiv.org/abs/2204.00442
Code: https://github.com/lingyanruan/DRBNet
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
Homepage: https://point-bert.ivg-research.xyz/
Paper: https://arxiv.org/abs/2111.14819
Code: https://github.com/lulutang0608/Point-BERT
A Unified Query-based Paradigm for Point Cloud Understanding
CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
PointCLIP: Point Cloud Understanding by CLIP
BoxeR: Box-Attention for 2D and 3D Transformers
Embracing Single Stride 3D Object Detector with Sparse Transformer
Paper: https://arxiv.org/abs/2112.06375
Code: https://github.com/TuSimple/SST
Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes
MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
Paper: https://arxiv.org/abs/2203.10981
Code: https://github.com/kuanchihhuang/MonoDTR
Scribble-Supervised LiDAR Semantic Segmentation
Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds
PTTR: Relational 3D Point Cloud Object Tracking with Transformer
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
Paper: https://arxiv.org/abs/2111.12707
Code: https://github.com/Vegetebird/MHFormer
Chinese explanation: https://zhuanlan.zhihu.com/p/439459426
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation
MonoScene: Monocular 3D Semantic Scene Completion
BANMo: Building Animatable 3D Neural Models from Many Casual Videos
Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection
NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation
OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion
Toward Practical Self-Supervised Monocular Indoor Depth Estimation
P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior
ACVNet: Attention Concatenation Volume for Accurate and Efficient Stereo Matching
Rethinking Efficient Lane Detection via Curve Modeling
Paper: https://arxiv.org/abs/2203.02431
Code: https://github.com/voldemortX/pytorch-auto-drive
Demo: https://user-images.githubusercontent.com/32259501/148680744-a18793cd-f437-461f-8c3a-b909c9931709.mp4
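The lane-detection entry above regresses a small set of curve parameters per lane (Bézier control points in this paper) instead of dense per-row points. A short sketch of sampling image points from one such cubic curve; the control points below are made up for illustration and are not from the released code.

```python
import numpy as np

def cubic_bezier(control_points, num_samples=72):
    """Sample a cubic Bezier curve from 4 control points of shape (4, 2).
    Curve-based lane detectors regress such control points per lane and
    rasterize them like this for evaluation or visualization."""
    t = np.linspace(0.0, 1.0, num_samples)[:, None]            # (N, 1)
    p0, p1, p2, p3 = control_points
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1 +
            3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)            # (N, 2) x/y points

# Four illustrative control points in normalized image coordinates.
pts = np.array([[0.45, 1.0], [0.48, 0.75], [0.52, 0.5], [0.55, 0.3]])
lane_xy = cubic_bezier(pts)
```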
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
Paper: https://arxiv.org/abs/2203.00867
Code: https://github.com/DQiaole/ZITS_inpainting
Correlation Verification for Image Retrieval
AdaFace: Quality Adaptive Margin for Face Recognition
Leveraging Self-Supervision for Cross-Domain Crowd Counting
BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation
SGTR: End-to-end Scene Graph Generation with Transformer
Language as Queries for Referring Video Object Segmentation
ReSTR: Convolution-free Referring Image Segmentation Using Transformers
StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions
Homepage: https://lukashoel.github.io/stylemesh/
Paper: https://arxiv.org/abs/2112.01530
Code: https://github.com/lukasHoel/stylemesh
Demo: https://www.youtube.com/watch?v=ZqgiTLcNcks
Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon
Weakly Supervised Object Localization as Domain Adaption
Exploiting Temporal Relations on Radar Perception for Autonomous Driving
Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction
Deep Rectangling for Image Stitching: A Learning Baseline
Paper (Oral): https://arxiv.org/abs/2203.03831
Code: https://github.com/nie-lang/DeepRectangling
Dataset: https://github.com/nie-lang/DeepRectangling
Chinese explanation: https://mp.weixin.qq.com/s/lp5AnrtO_9urp-Fv6Z0l2Q
Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings
Collaborative Transformers for Grounded Situation Recognition
Unseen Classes at a Later Time? No Problem
It’s About Time: Analog Clock Reading in the Wild
Toward Practical Self-Supervised Monocular Indoor Depth Estimation
Kubric: A scalable dataset generator
Scribble-Supervised LiDAR Semantic Segmentation
Deep Rectangling for Image Stitching: A Learning Baseline
ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
Homepage: https://ai.stanford.edu/~rhgao/objectfolder2.0/
Paper: https://arxiv.org/abs/2204.02389
Dataset: https://github.com/rhgao/ObjectFolder
Demo: https://youtu.be/e5aToT3LkRA
Language-based Video Editing via Multi-Modal Multi-Level Transformer
It’s About Time: Analog Clock Reading in the Wild
Splicing ViT Features for Semantic Appearance Transfer
Kubric: A scalable dataset generator
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
Balanced MSE for Imbalanced Visual Regression
References
https://github.com/amusi/CVPR2022-Papers-with-Code#3D-Point-Cloud