CVPR 2022 Papers

Table of Contents

    • CVPR 2022 Papers
  • Backbone
  • CLIP
  • GAN
  • NAS
  • OCR
  • NeRF
  • Visual Transformer
    • Backbone
    • Application
  • Vision-Language
  • Self-supervised Learning
  • Data Augmentation
  • Object Detection
  • Visual Tracking
    • Multi-Object Tracking
  • Semantic Segmentation
    • Weakly-Supervised Semantic Segmentation
    • Semi-Supervised Semantic Segmentation
    • Unsupervised Semantic Segmentation
  • Instance Segmentation
    • Self-Supervised Instance Segmentation
    • Video Instance Segmentation
  • Few-Shot Segmentation
  • Video Understanding
    • Action Recognition
    • Action Detection
  • Image Editing
  • Low-level Vision
  • Super-Resolution
    • Image Super-Resolution
    • Video Super-Resolution
  • Deblur
    • Image Deblur
  • 3D Point Cloud
  • 3D Object Detection
  • 3D Semantic Segmentation
  • 3D Object Tracking
  • 3D Human Pose Estimation
  • 3D Semantic Scene Completion
  • 3D Reconstruction
  • Camouflaged Object Detection
  • Depth Estimation
    • Monocular Depth Estimation
  • Stereo Matching
  • Lane Detection
  • Image Inpainting
  • Image Retrieval
  • Face Recognition
  • Crowd Counting
  • Medical Image
  • Scene Graph Generation
  • Referring Video Object Segmentation
  • Style Transfer
  • Adversarial Examples
  • Weakly Supervised Object Localization
  • Radar Object Detection
  • Hyperspectral Image Reconstruction
  • Image Stitching
  • Watermarking
  • Grounded Situation Recognition
  • Zero-shot Learning
  • Datasets
  • New Task
  • Others


Backbone

A ConvNet for the 2020s

  • Paper: https://arxiv.org/abs/2201.03545
  • Code: https://github.com/facebookresearch/ConvNeXt
  • Chinese explanation: https://mp.weixin.qq.com/s/Xg5wPYExnvTqRo6s5-2cAw
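
As a quick orientation for this section, the sketch below runs a pretrained ConvNeXt classifier. It assumes the timm package ships the model under the name convnext_tiny (an assumption about a third-party library, not part of the paper's official release):

```python
# Minimal sketch: classify a dummy image batch with a pretrained ConvNeXt.
# Assumes timm is installed and provides 'convnext_tiny' weights.
import torch
import timm

model = timm.create_model("convnext_tiny", pretrained=True)
model.eval()

x = torch.randn(1, 3, 224, 224)   # dummy ImageNet-sized input
with torch.no_grad():
    logits = model(x)             # (1, 1000) ImageNet class logits
print(logits.argmax(dim=1))
```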

Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs

  • Paper: https://arxiv.org/abs/2203.06717

  • Code: https://github.com/megvii-research/RepLKNet

  • Code2: https://github.com/DingXiaoH/RepLKNet-pytorch

  • Chinese explanation: https://mp.weixin.qq.com/s/_qXyIQut-JRW6VvsjaQlFg

MPViT: Multi-Path Vision Transformer for Dense Prediction

  • Paper: https://arxiv.org/abs/2112.11010
  • Code: https://github.com/youngwanLEE/MPViT
  • Chinese explanation: https://mp.weixin.qq.com/s/Q9-crEOz5IYzZaNoq8oXfg

Mobile-Former: Bridging MobileNet and Transformer

  • Paper: https://arxiv.org/abs/2108.05895
  • Code: None
  • Chinese explanation: https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ

MetaFormer is Actually What You Need for Vision

  • Paper: https://arxiv.org/abs/2111.11418
  • Code: https://github.com/sail-sg/poolformer
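
The paper argues that the general MetaFormer structure, rather than attention specifically, drives performance, and demonstrates this with PoolFormer, which uses plain average pooling as the token mixer. Below is a minimal PyTorch sketch of one such block written from the paper's description; the GroupNorm choice and layer sizes are illustrative, not the authors' exact code:

```python
# Minimal sketch of a PoolFormer block: a MetaFormer whose token mixer
# is average pooling (minus identity) instead of self-attention.
import torch
import torch.nn as nn

class PoolFormerBlock(nn.Module):
    def __init__(self, dim, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, dim)  # LayerNorm-like norm over channels
        self.pool = nn.AvgPool2d(3, stride=1, padding=1, count_include_pad=False)
        self.norm2 = nn.GroupNorm(1, dim)
        self.mlp = nn.Sequential(          # channel MLP via 1x1 convolutions
            nn.Conv2d(dim, dim * mlp_ratio, 1),
            nn.GELU(),
            nn.Conv2d(dim * mlp_ratio, dim, 1),
        )

    def forward(self, x):                  # x: (B, C, H, W)
        y = self.norm1(x)
        x = x + self.pool(y) - y           # token mixing: pooling minus identity
        x = x + self.mlp(self.norm2(x))
        return x

block = PoolFormerBlock(64)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```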

Shunted Self-Attention via Multi-Scale Token Aggregation

  • Paper(Oral): https://arxiv.org/abs/2111.15193
  • Code: https://github.com/OliverRensu/Shunted-Transformer

CLIP

HairCLIP: Design Your Hair by Text and Reference Image

  • Paper: https://arxiv.org/abs/2112.05142

  • Code: https://github.com/wty-ustc/HairCLIP

PointCLIP: Point Cloud Understanding by CLIP

  • Paper: https://arxiv.org/abs/2112.02413
  • Code: https://github.com/ZrrSkywalker/PointCLIP

Blended Diffusion for Text-driven Editing of Natural Images

  • Paper: https://arxiv.org/abs/2111.14818

  • Code: https://github.com/omriav/blended-diffusion
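
All three papers in this section build on CLIP's joint image-text embedding space. As shared background, here is a minimal sketch of scoring text prompts against an image with OpenAI's released clip package; the image path and prompts are illustrative:

```python
# Minimal sketch: CLIP image-text matching, the primitive the methods
# above build on. Assumes OpenAI's clip package is installed
# (pip install git+https://github.com/openai/CLIP).
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)  # hypothetical file
text = clip.tokenize(["a photo of curly hair", "a photo of straight hair"]).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

print(probs)  # how well each prompt matches the image
```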

GAN

SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing

  • Homepage: https://semanticstylegan.github.io/

  • Paper: https://arxiv.org/abs/2112.02236

  • Demo: https://semanticstylegan.github.io/videos/demo.mp4

Style Transformer for Image Inversion and Editing

  • Paper: https://arxiv.org/abs/2203.07932
  • Code: https://github.com/sapphire497/style-transformer

NAS

β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search

  • Paper: https://arxiv.org/abs/2203.01665
  • Code: https://github.com/Sunshine-Ye/Beta-DARTS

ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior

  • Paper: https://arxiv.org/abs/2111.15362
  • Code: None

OCR

SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition

  • Paper: https://arxiv.org/abs/2203.10209

  • Code: https://github.com/mxin262/SwinTextSpotter

NeRF

Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields

  • Homepage: https://jonbarron.info/mipnerf360/

  • Paper: https://arxiv.org/abs/2111.12077

  • Demo: https://youtu.be/YStDS2-Ln1s

Point-NeRF: Point-based Neural Radiance Fields

  • Homepage: https://xharlie.github.io/projects/project_sites/pointnerf/
  • Paper: https://arxiv.org/abs/2201.08845
  • Code: https://github.com/Xharlie/point-nerf

NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images

  • Paper: https://arxiv.org/abs/2111.13679
  • Homepage: https://bmild.github.io/rawnerf/
  • Demo: https://www.youtube.com/watch?v=JtBS4KBcKVc

Urban Radiance Fields

  • Homepage: https://urban-radiance-fields.github.io/

  • Paper: https://arxiv.org/abs/2111.14643

  • Demo: https://youtu.be/qGlq5DZT6uc

Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation

  • Paper: https://arxiv.org/abs/2202.13162
  • Code: https://github.com/HexagonPrime/Pix2NeRF

HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video

  • Homepage: https://grail.cs.washington.edu/projects/humannerf/

  • Paper: https://arxiv.org/abs/2201.04127

  • Demo: https://youtu.be/GM-RoZEymmw
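
For context, every method in this section builds on NeRF's volume rendering: a pixel color is estimated by alpha-compositing sampled densities $\sigma_i$ and colors $\mathbf{c}_i$ along the camera ray,

$$
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i,
\qquad
T_i = \exp\Bigl(-\sum_{j<i} \sigma_j \delta_j\Bigr),
$$

where $\delta_i$ is the spacing between adjacent samples and $T_i$ is the transmittance accumulated in front of sample $i$.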

Visual Transformer

Backbone

MPViT: Multi-Path Vision Transformer for Dense Prediction

  • Paper: https://arxiv.org/abs/2112.11010
  • Code: https://github.com/youngwanLEE/MPViT

MetaFormer is Actually What You Need for Vision

  • Paper: https://arxiv.org/abs/2111.11418
  • Code: https://github.com/sail-sg/poolformer

Mobile-Former: Bridging MobileNet and Transformer

  • Paper: https://arxiv.org/abs/2108.05895
  • Code: None
  • Chinese explanation: https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ

Shunted Self-Attention via Multi-Scale Token Aggregation

  • Paper(Oral): https://arxiv.org/abs/2111.15193
  • Code: https://github.com/OliverRensu/Shunted-Transformer

Application

Language-based Video Editing via Multi-Modal Multi-Level Transformer

  • Paper: https://arxiv.org/abs/2104.01122
  • Code: None

MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video

  • Paper: https://arxiv.org/abs/2203.00859
  • Code: None

Embracing Single Stride 3D Object Detector with Sparse Transformer

  • Paper: https://arxiv.org/abs/2112.06375
  • Code: https://github.com/TuSimple/SST

Multi-class Token Transformer for Weakly Supervised Semantic Segmentation

  • Paper: https://arxiv.org/abs/2203.02891
  • Code: https://github.com/xulianuwa/MCTformer

Spatio-temporal Relation Modeling for Few-shot Action Recognition

  • Paper: https://arxiv.org/abs/2112.05132
  • Code: https://github.com/Anirudh257/strm

Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction

  • Paper: https://arxiv.org/abs/2111.07910
  • Code: https://github.com/caiyuanhao1998/MST

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling

  • Homepage: https://point-bert.ivg-research.xyz/
  • Paper: https://arxiv.org/abs/2111.14819
  • Code: https://github.com/lulutang0608/Point-BERT

GroupViT: Semantic Segmentation Emerges from Text Supervision

  • Homepage: https://jerryxu.net/GroupViT/

  • Paper: https://arxiv.org/abs/2202.11094

  • Demo: https://youtu.be/DtJsWIUTW-Y

Restormer: Efficient Transformer for High-Resolution Image Restoration

  • Paper: https://arxiv.org/abs/2111.09881
  • Code: https://github.com/swz30/Restormer

Splicing ViT Features for Semantic Appearance Transfer

  • Homepage: https://splice-vit.github.io/
  • Paper: https://arxiv.org/abs/2201.00424
  • Code: https://github.com/omerbt/Splice

Self-supervised Video Transformer

  • Homepage: https://kahnchana.github.io/svt/

  • Paper: https://arxiv.org/abs/2112.01514

  • Code: https://github.com/kahnchana/svt

Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers

  • Paper: https://arxiv.org/abs/2203.02664
  • Code: https://github.com/rulixiang/afa

Accelerating DETR Convergence via Semantic-Aligned Matching

  • Paper: https://arxiv.org/abs/2203.06883
  • Code: https://github.com/ZhangGongjie/SAM-DETR

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising

  • Paper: https://arxiv.org/abs/2203.01305
  • Code: https://github.com/FengLi-ust/DN-DETR
  • Chinese explanation: https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w

Style Transformer for Image Inversion and Editing

  • Paper: https://arxiv.org/abs/2203.07932
  • Code: https://github.com/sapphire497/style-transformer

MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer

  • Paper: https://arxiv.org/abs/2203.10981

  • Code: https://github.com/kuanchihhuang/MonoDTR

Mask Transfiner for High-Quality Instance Segmentation

  • Paper: https://arxiv.org/abs/2111.13673
  • Code: https://github.com/SysCV/transfiner

Language as Queries for Referring Video Object Segmentation

  • Paper: https://arxiv.org/abs/2201.00487
  • Code: https://github.com/wjn922/ReferFormer
  • Chinese explanation: https://mp.weixin.qq.com/s/MkQT8QWSYoYVhJ1RSF6oPQ

X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

  • Paper: https://arxiv.org/abs/2203.00843
  • Code: https://github.com/CurryYuan/X-Trans2Cap

AdaMixer: A Fast-Converging Query-Based Object Detector

  • Paper(Oral): https://arxiv.org/abs/2203.16507
  • Code: https://github.com/MCG-NJU/AdaMixer

Omni-DETR: Omni-Supervised Object Detection with Transformers

  • Paper: https://arxiv.org/abs/2203.16089
  • Code: https://github.com/amazon-research/omni-detr

SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition

  • Paper: https://arxiv.org/abs/2203.10209

  • Code: https://github.com/mxin262/SwinTextSpotter

Vision-Language

Conditional Prompt Learning for Vision-Language Models

  • Paper: https://arxiv.org/abs/2203.05557
  • Code: https://github.com/KaiyangZhou/CoOp

Bridging Video-text Retrieval with Multiple Choice Question

  • Paper: https://arxiv.org/abs/2201.04850

  • Code: https://github.com/TencentARC/MCQ

Self-supervised Learning

UniVIP: A Unified Framework for Self-Supervised Visual Pre-training

  • Paper: https://arxiv.org/abs/2203.06965
  • Code: None

Crafting Better Contrastive Views for Siamese Representation Learning

  • Paper: https://arxiv.org/abs/2202.03278
  • Code: https://github.com/xyupeng/ContrastiveCrop
  • Chinese explanation: https://mp.weixin.qq.com/s/VTP9D5f7KG9vg30U9kVI2A

HCSC: Hierarchical Contrastive Selective Coding

  • Homepage: https://github.com/gyfastas/HCSC

  • Paper: https://arxiv.org/abs/2202.00455

  • Chinese explanation: https://mp.weixin.qq.com/s/jkYR8mYp-e645qk8kfPNKQ
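
The contrastive methods above share an InfoNCE-style objective that pulls two views of the same image together while pushing apart views of different images. A minimal sketch follows; the function name and temperature are illustrative, not taken from these papers:

```python
# Minimal InfoNCE contrastive loss over two augmented views of a batch.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (B, D) embeddings of two views of the same B images."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature           # (B, B) cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)       # positives on the diagonal

loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```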

Data Augmentation

TeachAugment: Data Augmentation Optimization Using Teacher Knowledge

  • Paper: https://arxiv.org/abs/2202.12513
  • Code: https://github.com/DensoITLab/TeachAugment

AlignMix: Improving representation by interpolating aligned features

  • Paper: https://arxiv.org/abs/2103.15375
  • Code: None
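
AlignMix belongs to the mixup family of augmentations, adding feature alignment before interpolating. For reference, a minimal sketch of vanilla mixup, which these methods extend (the alpha hyperparameter is illustrative):

```python
# Minimal sketch of vanilla mixup: interpolate a batch with a shuffled
# copy of itself; train on both labels weighted by lam.
import torch

def mixup(x, y, alpha=0.2):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[idx]
    return x_mix, y, y[idx], lam  # loss = lam * ce(out, y) + (1 - lam) * ce(out, y[idx])
```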

Object Detection

BoxeR: Box-Attention for 2D and 3D Transformers

  • Paper: https://arxiv.org/abs/2111.13087
  • Code: https://github.com/kienduynguyen/BoxeR
  • Chinese explanation: https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising

  • Paper: https://arxiv.org/abs/2203.01305
  • Code: https://github.com/FengLi-ust/DN-DETR
  • Chinese explanation: https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w

Accelerating DETR Convergence via Semantic-Aligned Matching

  • Paper: https://arxiv.org/abs/2203.06883
  • Code: https://github.com/ZhangGongjie/SAM-DETR

Localization Distillation for Dense Object Detection

  • Paper: https://arxiv.org/abs/2102.12252
  • Code: https://github.com/HikariTJU/LD
  • Chinese explanation: https://mp.weixin.qq.com/s/dxss8RjJH283h6IbPCT9vg

Focal and Global Knowledge Distillation for Detectors

  • Paper: https://arxiv.org/abs/2111.11837
  • Code: https://github.com/yzd-v/FGD
  • Chinese explanation: https://mp.weixin.qq.com/s/yDkreTudC8JL2V2ETsADwQ
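
This paper and Localization Distillation above both specialize knowledge distillation to detectors. As shared background, a minimal sketch of the classic temperature-scaled logit distillation loss (names and the temperature are illustrative; the papers' actual losses operate on detector-specific targets):

```python
# Minimal sketch of temperature-scaled logit distillation (Hinton et al.).
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, tau=2.0):
    p_teacher = F.softmax(teacher_logits / tau, dim=-1)
    log_p_student = F.log_softmax(student_logits / tau, dim=-1)
    # Scale by tau^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * tau * tau

print(kd_loss(torch.randn(4, 80), torch.randn(4, 80)))
```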

A Dual Weighting Label Assignment Scheme for Object Detection

  • Paper: https://arxiv.org/abs/2203.09730
  • Code: https://github.com/strongwolf/DW

AdaMixer: A Fast-Converging Query-Based Object Detector

  • Paper(Oral): https://arxiv.org/abs/2203.16507
  • Code: https://github.com/MCG-NJU/AdaMixer

Omni-DETR: Omni-Supervised Object Detection with Transformers

  • Paper: https://arxiv.org/abs/2203.16089
  • Code: https://github.com/amazon-research/omni-detr

Visual Tracking

Correlation-Aware Deep Tracking

  • Paper: https://arxiv.org/abs/2203.01666
  • Code: None

TCTrack: Temporal Contexts for Aerial Tracking

  • Paper: https://arxiv.org/abs/2203.01885
  • Code: https://github.com/vision4robotics/TCTrack

Multi-Object Tracking

Learning of Global Objective for Network Flow in Multi-Object Tracking

  • Paper: https://arxiv.org/abs/2203.16210
  • Code: None

Semantic Segmentation

Weakly-Supervised Semantic Segmentation

Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation

  • Paper: https://arxiv.org/abs/2203.00962
  • Code: https://github.com/zhaozhengChen/ReCAM

Multi-class Token Transformer for Weakly Supervised Semantic Segmentation

  • Paper: https://arxiv.org/abs/2203.02891
  • Code: https://github.com/xulianuwa/MCTformer

Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers

  • Paper: https://arxiv.org/abs/2203.02664
  • Code: https://github.com/rulixiang/afa

Semi-Supervised Semantic Segmentation

ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation

  • Paper: https://arxiv.org/abs/2106.05095
  • Code: https://github.com/LiheYoung/ST-PlusPlus
  • Chinese explanation: https://mp.weixin.qq.com/s/knSnlebdtEnmrkChGM_0CA

Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels

  • Homepage: https://haochen-wang409.github.io/U2PL/
  • Paper: https://arxiv.org/abs/2203.03884
  • Code: https://github.com/Haochen-Wang409/U2PL
  • Chinese explanation: https://mp.weixin.qq.com/s/-08olqE7np8A1XQzt6HAgQ

Unsupervised Semantic Segmentation

GroupViT: Semantic Segmentation Emerges from Text Supervision

  • Homepage: https://jerryxu.net/GroupViT/

  • Paper: https://arxiv.org/abs/2202.11094

  • Demo: https://youtu.be/DtJsWIUTW-Y

Instance Segmentation

BoxeR: Box-Attention for 2D and 3D Transformers

  • Paper: https://arxiv.org/abs/2111.13087
  • Code: https://github.com/kienduynguyen/BoxeR
  • Chinese explanation: https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w

E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation

  • Paper: https://arxiv.org/abs/2203.04074
  • Code: https://github.com/zhang-tao-whu/e2ec

Mask Transfiner for High-Quality Instance Segmentation

  • Paper: https://arxiv.org/abs/2111.13673
  • Code: https://github.com/SysCV/transfiner

Self-Supervised Instance Segmentation

FreeSOLO: Learning to Segment Objects without Annotations

  • Paper: https://arxiv.org/abs/2202.12181
  • Code: None

Video Instance Segmentation

Efficient Video Instance Segmentation via Tracklet Query and Proposal

  • Homepage: https://jialianwu.com/projects/EfficientVIS.html
  • Paper: https://arxiv.org/abs/2203.01853
  • Demo: https://youtu.be/sSPMzgtMKCE

Few-Shot Segmentation

Learning What Not to Segment: A New Perspective on Few-Shot Segmentation

  • Paper: https://arxiv.org/abs/2203.07615
  • Code: https://github.com/chunbolang/BAM

Video Understanding

Self-supervised Video Transformer

  • Homepage: https://kahnchana.github.io/svt/

  • Paper: https://arxiv.org/abs/2112.01514

  • Code: https://github.com/kahnchana/svt

Action Recognition

Spatio-temporal Relation Modeling for Few-shot Action Recognition

  • Paper: https://arxiv.org/abs/2112.05132
  • Code: https://github.com/Anirudh257/strm

Action Detection

End-to-End Semi-Supervised Learning for Video Action Detection

  • Paper: https://arxiv.org/abs/2203.04251
  • Code: None

Image Editing

Style Transformer for Image Inversion and Editing

  • Paper: https://arxiv.org/abs/2203.07932
  • Code: https://github.com/sapphire497/style-transformer

Blended Diffusion for Text-driven Editing of Natural Images

  • Paper: https://arxiv.org/abs/2111.14818
  • Code: https://github.com/omriav/blended-diffusion

SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing

  • Homepage: https://semanticstylegan.github.io/

  • Paper: https://arxiv.org/abs/2112.02236

  • Demo: https://semanticstylegan.github.io/videos/demo.mp4

Low-level Vision

ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior

  • Paper: https://arxiv.org/abs/2111.15362
  • Code: None

Restormer: Efficient Transformer for High-Resolution Image Restoration

  • Paper: https://arxiv.org/abs/2111.09881
  • Code: https://github.com/swz30/Restormer

Super-Resolution

Image Super-Resolution

Learning the Degradation Distribution for Blind Image Super-Resolution

  • Paper: https://arxiv.org/abs/2203.04962
  • Code: https://github.com/greatlog/UnpairedSR

Video Super-Resolution

BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment

  • Paper: https://arxiv.org/abs/2104.13371
  • Code: https://github.com/open-mmlab/mmediting
  • Code2: https://github.com/ckkelvinchan/BasicVSR_PlusPlus
  • Chinese explanation: https://mp.weixin.qq.com/s/HZTwYfphixyLHxlbCAxx4g

Deblur

Image Deblur

Learning to Deblur using Light Field Generated and Real Defocus Images

  • Homepage: http://lyruan.com/Projects/DRBNet/

  • Paper(Oral): https://arxiv.org/abs/2204.00442

  • Code: https://github.com/lingyanruan/DRBNet

3D Point Cloud

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling

  • Homepage: https://point-bert.ivg-research.xyz/

  • Paper: https://arxiv.org/abs/2111.14819

  • Code: https://github.com/lulutang0608/Point-BERT

A Unified Query-based Paradigm for Point Cloud Understanding

  • Paper: https://arxiv.org/abs/2203.01252
  • Code: None

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding

  • Paper: https://arxiv.org/abs/2203.00680
  • Code: https://github.com/MohamedAfham/CrossPoint

PointCLIP: Point Cloud Understanding by CLIP

  • Paper: https://arxiv.org/abs/2112.02413
  • Code: https://github.com/ZrrSkywalker/PointCLIP

3D Object Detection

BoxeR: Box-Attention for 2D and 3D Transformers

  • Paper: https://arxiv.org/abs/2111.13087
  • Code: https://github.com/kienduynguyen/BoxeR
  • Chinese explanation: https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w

Embracing Single Stride 3D Object Detector with Sparse Transformer

  • Paper: https://arxiv.org/abs/2112.06375

  • Code: https://github.com/TuSimple/SST

Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes

  • Paper: https://arxiv.org/abs/2011.12001
  • Code: https://github.com/qq456cvb/CanonicalVoting

MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer

  • Paper: https://arxiv.org/abs/2203.10981

  • Code: https://github.com/kuanchihhuang/MonoDTR

3D Semantic Segmentation

Scribble-Supervised LiDAR Semantic Segmentation

  • Paper: https://arxiv.org/abs/2203.08537
  • Dataset: https://github.com/ouenal/scribblekitti

3D Object Tracking

Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds

  • Paper: https://arxiv.org/abs/2203.01730
  • Code: https://github.com/Ghostish/Open3DSOT

PTTR: Relational 3D Point Cloud Object Tracking with Transformer

  • Paper: https://arxiv.org/abs/2112.02857
  • Code: https://github.com/Jasonkks/PTTR

3D Human Pose Estimation

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

  • Paper: https://arxiv.org/abs/2111.12707

  • Code: https://github.com/Vegetebird/MHFormer

  • Chinese explanation: https://zhuanlan.zhihu.com/p/439459426

MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video

  • Paper: https://arxiv.org/abs/2203.00859
  • Code: None

Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation

  • Paper: https://arxiv.org/abs/2203.07697
  • Code: None
  • Chinese explanation: https://mp.weixin.qq.com/s/L_F28IFLXvs5R4V9TTUpRw

3D Semantic Scene Completion

MonoScene: Monocular 3D Semantic Scene Completion

  • Paper: https://arxiv.org/abs/2112.00726
  • Code: https://github.com/cv-rits/MonoScene

3D Reconstruction

BANMo: Building Animatable 3D Neural Models from Many Casual Videos

  • Homepage: https://banmo-www.github.io/
  • Paper: https://arxiv.org/abs/2112.12761
  • Code: https://github.com/facebookresearch/banmo
  • Chinese explanation: https://mp.weixin.qq.com/s/NMHP8-xWwrX40vpGx55Qew

Camouflaged Object Detection

Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection

  • Paper: https://arxiv.org/abs/2203.02688
  • Code: https://github.com/lartpang/ZoomNet

Depth Estimation

Monocular Depth Estimation

NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation

  • Paper: https://arxiv.org/abs/2203.01502
  • Code: None

OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion

  • Paper: https://arxiv.org/abs/2203.00838
  • Code: None

Toward Practical Self-Supervised Monocular Indoor Depth Estimation

  • Paper: https://arxiv.org/abs/2112.02306
  • Code: None

P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior

  • Paper: https://arxiv.org/abs/2204.02091
  • Code: https://github.com/SysCV/P3Depth

Stereo Matching

ACVNet: Attention Concatenation Volume for Accurate and Efficient Stereo Matching

  • Paper: https://arxiv.org/abs/2203.02146
  • Code: https://github.com/gangweiX/ACVNet

Lane Detection

Rethinking Efficient Lane Detection via Curve Modeling

  • Paper: https://arxiv.org/abs/2203.02431

  • Code: https://github.com/voldemortX/pytorch-auto-drive

  • Demo: https://user-images.githubusercontent.com/32259501/148680744-a18793cd-f437-461f-8c3a-b909c9931709.mp4

Image Inpainting

Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding

  • Paper: https://arxiv.org/abs/2203.00867

  • Code: https://github.com/DQiaole/ZITS_inpainting

Image Retrieval

Correlation Verification for Image Retrieval

  • Paper(Oral): https://arxiv.org/abs/2204.01458
  • Code: https://github.com/sungonce/CVNet

Face Recognition

AdaFace: Quality Adaptive Margin for Face Recognition

  • Paper(Oral): https://arxiv.org/abs/2204.00964
  • Code: https://github.com/mk-minchul/AdaFace

Crowd Counting

Leveraging Self-Supervision for Cross-Domain Crowd Counting

  • Paper: https://arxiv.org/abs/2103.16291
  • Code: None

Medical Image

BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation

  • Paper: https://arxiv.org/abs/2203.02533
  • Code: None

Scene Graph Generation

SGTR: End-to-end Scene Graph Generation with Transformer

  • Paper: https://arxiv.org/abs/2112.12970
  • Code: None

Referring Video Object Segmentation

Language as Queries for Referring Video Object Segmentation

  • Paper: https://arxiv.org/abs/2201.00487
  • Code: https://github.com/wjn922/ReferFormer

ReSTR: Convolution-free Referring Image Segmentation Using Transformers

  • Paper: https://arxiv.org/abs/2203.16768
  • Code: None

Style Transfer

StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions

  • Homepage: https://lukashoel.github.io/stylemesh/

  • Paper: https://arxiv.org/abs/2112.01530

  • Code: https://github.com/lukasHoel/stylemesh

  • Demo: https://www.youtube.com/watch?v=ZqgiTLcNcks

Adversarial Examples

Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon

  • Paper: https://arxiv.org/abs/2203.03818
  • Code: https://github.com/hncszyq/ShadowAttack

Weakly Supervised Object Localization

Weakly Supervised Object Localization as Domain Adaption

  • Paper: https://arxiv.org/abs/2203.01714
  • Code: https://github.com/zh460045050/DA-WSOL_CVPR2022

Radar Object Detection

Exploiting Temporal Relations on Radar Perception for Autonomous Driving

  • Paper: https://arxiv.org/abs/2204.01184
  • Code: None

Hyperspectral Image Reconstruction

Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction

  • Paper: https://arxiv.org/abs/2111.07910
  • Code: https://github.com/caiyuanhao1998/MST

Image Stitching

Deep Rectangling for Image Stitching: A Learning Baseline

  • Paper(Oral): https://arxiv.org/abs/2203.03831

  • Code: https://github.com/nie-lang/DeepRectangling

  • Dataset: https://github.com/nie-lang/DeepRectangling

  • Chinese explanation: https://mp.weixin.qq.com/s/lp5AnrtO_9urp-Fv6Z0l2Q

Watermarking

Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings

  • Paper: https://arxiv.org/abs/2104.13450
  • Code: None

Grounded Situation Recognition

Collaborative Transformers for Grounded Situation Recognition

  • Paper: https://arxiv.org/abs/2203.16518
  • Code: https://github.com/jhcho99/CoFormer

Zero-shot Learning

Unseen Classes at a Later Time? No Problem

  • Paper: https://arxiv.org/abs/2203.16517
  • Code: https://github.com/sumitramalagi/Unseen-classes-at-a-later-time

Datasets

It’s About Time: Analog Clock Reading in the Wild

  • Homepage: https://charigyang.github.io/abouttime/
  • Paper: https://arxiv.org/abs/2111.09162
  • Code: https://github.com/charigyang/itsabouttime
  • Demo: https://youtu.be/cbiMACA6dRc

Toward Practical Self-Supervised Monocular Indoor Depth Estimation

  • Paper: https://arxiv.org/abs/2112.02306
  • Code: None

Kubric: A scalable dataset generator

  • Paper: https://arxiv.org/abs/2203.03570
  • Code: https://github.com/google-research/kubric
  • Chinese explanation: https://mp.weixin.qq.com/s/mJ8HzY6C0GifxsErJIS3Mg

Scribble-Supervised LiDAR Semantic Segmentation

  • Paper: https://arxiv.org/abs/2203.08537
  • Dataset: https://github.com/ouenal/scribblekitti

Deep Rectangling for Image Stitching: A Learning Baseline

  • Paper(Oral): https://arxiv.org/abs/2203.03831
  • Code: https://github.com/nie-lang/DeepRectangling
  • Dataset: https://github.com/nie-lang/DeepRectangling
  • Chinese explanation: https://mp.weixin.qq.com/s/lp5AnrtO_9urp-Fv6Z0l2Q

ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer

  • Homepage: https://ai.stanford.edu/~rhgao/objectfolder2.0/

  • Paper: https://arxiv.org/abs/2204.02389

  • Dataset: https://github.com/rhgao/ObjectFolder

  • Demo: https://youtu.be/e5aToT3LkRA

New Task

Language-based Video Editing via Multi-Modal Multi-Level Transformer

  • Paper: https://arxiv.org/abs/2104.01122
  • Code: None

It’s About Time: Analog Clock Reading in the Wild

  • Homepage: https://charigyang.github.io/abouttime/
  • Paper: https://arxiv.org/abs/2111.09162
  • Code: https://github.com/charigyang/itsabouttime
  • Demo: https://youtu.be/cbiMACA6dRc

Splicing ViT Features for Semantic Appearance Transfer

  • Homepage: https://splice-vit.github.io/
  • Paper: https://arxiv.org/abs/2201.00424
  • Code: https://github.com/omerbt/Splice

Others

Kubric: A scalable dataset generator

  • Paper: https://arxiv.org/abs/2203.03570
  • Code: https://github.com/google-research/kubric
  • Chinese explanation: https://mp.weixin.qq.com/s/mJ8HzY6C0GifxsErJIS3Mg

X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

  • Paper: https://arxiv.org/abs/2203.00843
  • Code: https://github.com/CurryYuan/X-Trans2Cap

Balanced MSE for Imbalanced Visual Regression

  • Paper(Oral): https://arxiv.org/abs/2203.16427
  • Code: https://github.com/jiawei-ren/BalancedMSE

References
https://github.com/amusi/CVPR2022-Papers-with-Code#3D-Point-Cloud
