[CVPR 2023] The latest papers are fresh out of the oven, come and dig in...

CVPR 2023: latest information and paper downloads

Official website: https://cvpr.thecvf.com/Conferences/2023

Contents

1. Detection

  • 2D Object Detection
  • Video Object Detection
  • 3D Object Detection
  • Human-Object Interaction (HOI) Detection
  • Camouflaged Object Detection
  • Rotated Object Detection
  • Salient Object Detection
  • Keypoint Detection
  • Lane Detection
  • Edge Detection
  • Vanishing Point Detection
  • Anomaly Detection

2. Segmentation

  • Image Segmentation
  • Panoptic Segmentation
  • Semantic Segmentation
  • Instance Segmentation
  • Superpixel
  • Video Object Segmentation
  • Matting
  • Dense Prediction

3. Image Processing

  • Super Resolution
  • Image Restoration/Image Enhancement/Image Reconstruction
  • Image Shadow Removal/Image Reflection Removal
  • Image Denoising/Deblurring/Deraining/Dehazing
  • Image Editing/Image Inpainting
  • Image Translation
  • Image Quality Assessment
  • Style Transfer
  • Image Registration

4. Video Processing

  • Video Editing
  • Video Generation/Video Synthesis
  • Video Super-Resolution

5. Estimation

  • Optical Flow/Motion Estimation
  • Depth Estimation
  • Human Parsing/Human Pose Estimation
  • Gesture Estimation

6. Image & Video Retrieval/Video Understanding

  • Action/Activity Recognition, Detection, Segmentation, and Localization
  • Person Re-Identification/Detection
  • Image/Video Captioning

7. Face

  • Facial Recognition/Detection
  • Face Generation/Face Synthesis/Face Reconstruction/Face Editing
  • Face Forgery/Face Anti-Spoofing

8. 3D Vision

  • Point Cloud
  • 3D Reconstruction
  • Scene Reconstruction/View Synthesis/Novel View Synthesis

9. Object Tracking

10. Medical Imaging

11. Text Detection/Recognition/Understanding

12. Remote Sensing Image

13. GAN/Generative/Adversarial

14. Image Generation/Image Synthesis

15. Scene Graph

  • Scene Graph Generation
  • Scene Graph Prediction
  • Scene Graph Understanding

16. Visual Localization/Pose Estimation

17. Visual Reasoning/VQA

18. Vision-based Prediction

19. Neural Network Structure Design

  • CNN
  • Transformer
  • Graph Neural Network (GNN)
  • Neural Architecture Search (NAS)
  • MLP

20. Neural Network Interpretability

21. Dataset

22. Data Processing

  • Data Augmentation
  • Normalization/Regularization (Batch Normalization)
  • Image Clustering
  • Image Compression

23. Image Feature Extraction and Matching

24. Visual Representation Learning

25. Model Training/Generalization

  • Noisy Label
  • Long-Tailed Distribution

26. Model Compression

  • Knowledge Distillation
  • Pruning
  • Quantization

27. Model Evaluation

28. Image Classification

29. Image Counting

30. Robotics

31. Semi-supervised/Weakly-supervised/Unsupervised/Self-supervised Learning

32. Multi-Modal Learning

  • Audio-visual Learning
  • Vision-language

33. Active Learning

34. Few-shot/Zero-shot Learning

35. Continual Learning/Life-long Learning

36. Transfer Learning/Domain Adaptation

37. Metric Learning

38. Contrastive Learning

39. Incremental Learning

40. Reinforcement Learning

41. Meta Learning

42. Federated Learning

43. Autonomous Driving

Other



Detection


2D Object Detection

[7]NeRF-RPN: A general framework for object detection in NeRFs

paper

[6]Detecting Everything in the Open World: Towards Universal Object Detection

paper

[5]Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection

paper

[4]CapDet: Unifying Dense Captioning and Open-World Detection Pretraining

paper

[3]Enhanced Training of Query-Based Object Detection via Selective Query Recollection

paper | code

[2]DETRs with Hybrid Matching

paper | code

[1]YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

paper | code



Video Object Detection

[1]SCOTCH and SODA: A Transformer Video Shadow Detection Framework

paper



3D Object Detection

[22]Neural Part Priors: Learning to Optimize Part-Based Object Completion in RGB-D Scans

paper

[21]itKD: Interchange Transfer-based Knowledge Distillation for 3D Object Detection

paper

[20]Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild

paper | code

[19]FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection

paper | code

[18]NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations

paper

[17]Benchmarking Robustness of 3D Object Detection to Common Corruptions in Autonomous Driving

paper

[16]VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking

paper | code

[15]OcTr: Octree-based Transformer for 3D Object Detection

paper

[14]MonoATT: Online Monocular 3D Object Detection with Adaptive Token Transformer

paper

[13]CAPE: Camera View Position Embedding for Multi-View 3D Object Detection

paper | code

[12]Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency

paper

[11]AeDet: Azimuth-invariant Multi-view 3D Object Detection

paper | code

[10]Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection

paper

[9]PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection

paper | code

[8]MSF: Motion-guided Sequential Fusion for Efficient 3D Object Detection from Point Cloud Sequences

paper

[7]Towards Domain Generalization for Multi-view 3D Object Detection in Bird-Eye-View

paper

[6]X3KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection

paper

[5]Virtual Sparse Convolution for Multimodal 3D Object Detection

paper | code

[4]MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection

paper | code

[3]Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection

paper | code

[2]LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion

paper | code

[1]ConQueR: Query Contrast Voxel-DETR for 3D Object Detection

paper | code



Human-Object Interaction (HOI) Detection

[2]Category Query Learning for Human-Object Interaction Classification

paper

[1]Detecting Human-Object Contact in Images

paper



Camouflaged Object Detection



Rotated Object Detection


Salient Object Detection

[2]Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings

paper

[1]Texture-guided Saliency Distilling for Unsupervised Salient Object Detection

paper | code



Keypoint Detection



Lane Detection

[1]BEV-LaneDet: a Simple and Effective 3D Lane Detection Baseline

paper



Edge Detection

[2]The Treasure Beneath Multiple Annotations: An Uncertainty-aware Edge Detector

paper | code

[1]Iterative Next Boundary Detection for Instance Segmentation of Tree Rings in Microscopy Images of Shrub Cross Sections

paper | code



Vanishing Point Detection



Anomaly Detection

[8]SQUID: Deep Feature In-Painting for Unsupervised Anomaly Detection

paper

[7]Normalizing Flow based Feature Synthesis for Outlier-Aware Object Detection

paper

[6]Hierarchical Semantic Contrast for Scene-aware Video Anomaly Detection

paper

[5]DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection

paper

[4]Diversity-Measurable Anomaly Detection

paper

[3]Block Selection Method for Using Feature Norm in Out-of-distribution Detection

paper

[2]Lossy Compression for Robust Unsupervised Time-Series Anomaly Detection

paper

[1]Multimodal Industrial Anomaly Detection via Hybrid Fusion

paper | code



Segmentation


Image Segmentation

[3]Focused and Collaborative Feedback Integration for Interactive Image Segmentation

paper | code

[2]MP-Former: Mask-Piloted Transformer for Image Segmentation

paper | code

[1]Interactive Segmentation as Gaussian Process Classification

paper



Panoptic Segmentation

[2]UniDAformer: Unified Domain Adaptive Panoptic Segmentation Transformer via Hierarchical Mask Calibration

paper

[1]Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

paper



Semantic Segmentation

[20]LaserMix for Semi-Supervised LiDAR Semantic Segmentation

paper | code

[19]Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation

paper | code

[18]Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs

paper | code

[17]Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation

paper | code

[16]Reliability in Semantic Segmentation: Are We on the Right Track?

paper | code

[15]Generative Semantic Segmentation

paper | code

[14]Novel Class Discovery for 3D Point Cloud Semantic Segmentation

paper | code

[13]MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving

paper | code

[12]Side Adapter Network for Open-Vocabulary Semantic Segmentation

paper | code

[11]Multi-view Inverse Rendering for Large-scale Real-world Indoor Scenes

paper

[10]Token Contrast for Weakly-Supervised Semantic Segmentation

paper | code

[9]Delivering Arbitrary-Modal Semantic Segmentation

paper | code

[8]Out-of-Candidate Rectification for Weakly Supervised Semantic Segmentation

paper

[7]Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP

paper | code

[6]Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos

paper | code

[5]SCPNet: Semantic Scene Completion on Point Cloud

paper

[4]On Calibrating Semantic Segmentation Models: Analyses and An Algorithm

paper

[3]Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision

paper

[2]Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation

paper | code

[1]Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation

paper



Instance Segmentation

[7]A Generalized Framework for Video Instance Segmentation

paper | code

[6]FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation

paper

[5]SIM: Semantic-aware Instance Mask Generation for Box-Supervised Instance Segmentation

paper | code

[4]DynaMask: Dynamic Mask Selection for Instance Segmentation

paper | code

[3]Beyond mAP: Towards better evaluation of instance segmentation

paper

[2]ISBNet: a 3D Point Cloud Instance Segmentation Network with Instance-aware Sampling and Box-aware Dynamic Convolution

paper

[1]PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

paper



Superpixel


Video Object Segmentation

[4]Two-shot Video Object Segmentation

paper

[3]Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation

paper

[2]MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation

paper

[1]InstMove: Instance Motion for Object-centric Video Segmentation

paper | code



Matting


Dense Prediction

[2]One-to-Few Label Assignment for End-to-End Dense Detection

paper | code

[1]DejaVu: Conditional Regenerative Learning to Enhance Dense Prediction

paper



Video Processing

[6]A Unified Pyramid Recurrent Network for Video Frame Interpolation

paper

[5]Video Dehazing via a Multi-Range Temporal Alignment Network with Physical Prior

paper | code

[4]Blind Video Deflickering by Neural Filtering with a Flawed Atlas

paper | code

[3]Learning Transferable Spatiotemporal Representations from Natural Script Knowledge

paper | code

[2]UV Volumes for Real-time Rendering of Editable Free-view Human Performance

paper | code

[1]Exploring Discontinuity for Video Frame Interpolation

paper (arXiv:2202.07291)



Video Editing

[3]Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding

paper

[2]Text-Visual Prompting for Efficient 2D Temporal Video Grounding

paper

[1]Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation

paper | code



Video Generation/Video Synthesis

[7]Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers

paper | code

[6]Conditional Image-to-Video Generation with Latent Flow Diffusion Models

paper

[5]3D Cinemagraphy from a Single Image

paper

[4]VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation

paper | code

[3]MOSO: Decomposing MOtion, Scene and Object for Video Prediction

paper | code

[2]SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

paper | code

[1]Video Probabilistic Diffusion Models in Projected Latent Space

paper | project



Video Super-Resolution

[2]Structured Sparsity Learning for Efficient Video Super-Resolution

paper

[1]Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting

paper



Estimation


Optical Flow/Motion Estimation

[2]DistractFlow: Improving Optical Flow Estimation via Realistic Distractions and Pseudo-Labeling

paper

[1]Rethinking Optical Flow from Geometric Matching Consistent Perspective

paper | code




Depth Estimation

[5]SCADE: NeRFs from Space Carving with Ambiguity-Aware Depth Estimates

paper

[4]PlaneDepth: Self-supervised Depth Estimation via Orthogonal Planes

paper | code

[3]HRDFuse: Monocular 360° Depth Estimation by Collaboratively Learning Holistic-with-Regional Depth Distributions

paper

[2]Fully Self-Supervised Depth Estimation from Defocus Clue

paper | code

[1]Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation

paper | code



Human Parsing/Human Pose Estimation

[11]Self-Correctable and Adaptable Inference for Generalizable Human Pose Estimation

paper

[10]3D Human Mesh Estimation from Virtual Markers

paper

[9]Object Pose Estimation with Statistical Guarantees: Conformal Keypoint Detection and Geometric Uncertainty Propagation

paper

[8]Rigidity-Aware Detection for 6D Object Pose Estimation

paper

[7]Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in Video

paper

[6]Markerless Camera-to-Robot Pose Estimation via Self-supervised Sim-to-Real Transfer

paper

[5]TexPose: Neural Texture Learning for Self-Supervised 6D Object Pose Estimation

paper

[4]Trajectory-Aware Body Interaction Transformer for Multi-Person Pose Forecasting

paper

[3]PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation

paper

[2]DistilPose: Tokenized Pose Regression with Heatmap Distillation

paper

[1]Relightable Neural Human Assets from Multi-view Gradient Illuminations

paper



Gesture Estimation

[5]Bringing Inputs to Shared Domains for 3D Interacting Hands Recovery in the Wild

paper

[4]Natural Language-Assisted Sign Language Recognition

paper | code

[3]CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment

paper | code

[2]Diverse 3D Hand Gesture Prediction from Body Dynamics by Bilateral Hand Disentanglement

paper

[1]Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition from Egocentric RGB Videos

paper | code



Image Processing

[3]Exploiting Unlabelled Photos for Stronger Fine-Grained SBIR

paper

[2]PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment

paper

[1]DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation

paper | code


Super Resolution

[6]Activating More Pixels in Image Super-Resolution Transformer

paper | code

[5]Super-Resolution Neural Operator

paper | code

[4]Local Implicit Normalizing Flow for Arbitrary-Scale Image Super-Resolution

paper

[3]Perception-Oriented Single Image Super-Resolution using Optimal Objective Estimation

paper | code

[2]N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution

paper | code

[1]Denoising Diffusion Probabilistic Models for Robust Image Super-Resolution in the Wild

paper | project



Image Restoration/Image Enhancement/Image Reconstruction

[13]CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not

paper

[12]Instant Volumetric Head Avatars

paper

[11]Contrastive Semi-supervised Learning for Underwater Image Restoration via Reliable Bank

paper | code

[10]ACR: Attention Collaboration-based Regressor for Arbitrary Two-Hand Reconstruction

paper | code

[9]Masked Image Modeling with Local Multi-Scale Reconstruction

paper | code

[8]Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective

paper | code

[7]DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration

paper

[6]Robust Unsupervised StyleGAN Image Restoration

paper

[5]Raw Image Reconstruction with Learned Compact Metadata

paper

[4]Efficient and Explicit Modelling of Image Hierarchies for Image Restoration

paper | code

[3]Imagic: Text-Based Real Image Editing with Diffusion Models

paper | project

[2]High-resolution image reconstruction with latent diffusion models from human brain activity

paper | project

[1]Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models

paper



Image Shadow Removal/Image Reflection Removal

[1]LightPainter: Interactive Portrait Relighting with Freehand Scribble

paper


Image Denoising/Deblurring/Deraining/Dehazing

[6]Masked Image Training for Generalizable Deep Image Denoising

paper | code

[5]Learning A Sparse Transformer Network for Effective Image Deraining

paper | code

[4]Uncertainty-Aware Unsupervised Image Deblurring with Deep Residual Prior

paper

[3]Polarized Color Image Denoising using Pocoformer

paper

[2]Blur Interpolation Transformer for Real-World Motion from Blur

paper | code

[1]Structured Kernel Estimation for Photon-Limited Deconvolution

paper | code



Image Editing/Image Inpainting

[6]SIEDOB: Semantic Image Editing by Disentangling Object and Background

paper | code

[5]CoralStyleCLIP: Co-optimized Region and Layer Selection for Image Editing

paper

[4]SmartBrush: Text and Shape Guided Object Inpainting with Diffusion Model

paper

[3]Interactive Cartoonization with Controllable Perceptual Factors

paper

[2]Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint

paper | code

[1]LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data

paper | code



Image Translation



Image Quality Assessment

[2]CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability

paper

[1]Quality-aware Pre-trained Models for Blind Image Quality Assessment

paper


Style Transfer

[3]Neural Preset for Color Style Transfer

paper | code

[2]StyleRF: Zero-shot 3D Style Transfer of Neural Radiance Fields

paper

[1]Fix the Noise: Disentangling Source Feature for Transfer Learning of StyleGAN

paper | code



Image Registration

[1]Indescribable Multi-modal Spatial Evaluator

paper | code



Face



Facial Recognition/Detection

[3]Sibling-Attack: Rethinking Transferable Adversarial Attacks against Face Recognition

paper

[2]Local Region Perception and Relationship Learning Combined with Feature Fusion for Facial Action Unit Detection

paper

[1]Multi Modal Facial Expression Recognition with Transformer-Based Fusion Networks and Dynamic Sampling

paper



Face Generation/Face Synthesis/Face Reconstruction/Face Editing

[7]SunStage: Portrait Reconstruction and Relighting using the Sun as a Light Stage

paper

[6]MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation

paper | code

[5]NeuFace: Realistic 3D Neural Face Rendering from Multi-view Images

paper

[4]Graphics Capsule: Learning Hierarchical 3D Face Representations from 2D Images

paper

[3]Robust Model-based Face Reconstruction through Weakly-Supervised Outlier Segmentation

paper | code

[2]A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images

paper

[1]MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation

paper | code



Face Forgery/Face Anti-Spoofing

[3]Rethinking Domain Generalization for Face Anti-spoofing: Separability and Alignment

paper

[2]Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization

paper | code

[1]Physical-World Optical Adversarial Attacks on 3D Face Recognition

paper



Object Tracking

[6]MotionTrack: Learning Robust Short-term and Long-term Motions for Multi-Object Tracking

paper

[5]Visual Prompt Multi-Modal Tracking

paper | code

[4]Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking

paper | code

[3]Focus On Details: Online Multi-object Tracking with Diverse Fine-grained Representation

paper

[2]Referring Multi-Object Tracking

paper

[1]Simple Cues Lead to a Strong Multi-Object Tracker

paper



Image & Video Retrieval/Video Understanding

[10]Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

paper | code

[9]NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory

paper

[8]Aligning Step-by-Step Instructional Diagrams to Video Demonstrations

paper

[7]Query-Dependent Video Representation for Moment Retrieval and Highlight Detection

paper

[6]Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval

paper | code

[5]Dual-path Adaptation from Image to Video Transformers

paper | code

[4]Data-Free Sketch-Based Image Retrieval

paper

[3]DAA: A Delta Age AdaIN operation for age estimation via binary code transformer

paper

[2]VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval

paper | code

[1]Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval

paper


Action/Activity Recognition, Detection, Segmentation, and Localization

[8]Box-Level Active Detection

paper

[7]Actionlet-Dependent Contrastive Learning for Unsupervised Skeleton-Based Action Recognition

paper

[6]Open Set Action Recognition via Multi-Label Evidential Learning

paper

[5]Video Test-Time Adaptation for Action Recognition

paper

[4]Post-Processing Temporal Action Detection

paper

[3]TriDet: Temporal Action Detection with Relative Boundary Modeling

paper | code

[2]Learning Discriminative Representations for Skeleton Based Action Recognition

paper

[1]Continuous Sign Language Recognition with Correlation Network

paper | code


Person Re-Identification/Detection

[2]TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning with Structure-Trajectory Prompted Reconstruction for Person Re-Identification

paper | code

[1]MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID

paper | code


Image/Video Captioning

[5]Text with Knowledge Graph Augmented Transformer for Video Captioning

paper

[4]Dual-Stream Transformer for Generic Event Boundary Captioning

paper | code

[3]ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing

paper | code

[2]Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training

paper

[1]Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning

paper | code


Medical Imaging

[7]RepMode: Learning to Re-parameterize Diverse Experts for Subcellular Structure Prediction

paper | code

[6]Orthogonal Annotation Benefits Barely-supervised Medical Image Segmentation

paper | code

[5]Task-specific Fine-tuning via Variational Information Bottleneck for Weakly-supervised Pathology Whole Slide Image Classification

paper

[4]Neuron Structure Modeling for Generalizable Remote Physiological Measurement

paper | code

[3]Unsupervised Contour Tracking of Live Cells by Mechanical and Cycle Consistency Losses

paper | code

[2]Deep Feature In-painting for Unsupervised Anomaly Detection in X-ray Images

paper | code

[1]Label-Free Liver Tumor Segmentation

paper | code


Text Detection/Recognition/Understanding

[6]Images Speak in Images: A Generalist Painter for In-Context Visual Learning

paper | code

[5]Context De-confounded Emotion Recognition

paper

[4]Joint Visual Grounding and Tracking with Natural Language Specification

paper

[3]Unifying Vision, Text, and Layout for Universal Document Processing

paper

[2]Improving Table Structure Recognition with Visual-Alignment Sequential Coordinate Modeling

paper

[1]DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

paper | code


Remote Sensing Image


GAN/Generative/Adversarial

[7]Fine-Grained Face Swapping via Regional GAN Inversion

paper

[6]Cross-GAN Auditing: Unsupervised Identification of Attribute Level Similarities and Differences between Pretrained Generative Models

paper

[5]Graph Transformer GANs for Graph-Constrained House Generation

paper

[4]Improving GAN Training via Feature Space Shrinkage

paper | code

[3]Adversarial Attack with Raindrops

paper

[2]T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations

paper | project

[1]Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars

paper | project


Image Generation/Image Synthesis

[22]All are Worth Words: A ViT Backbone for Diffusion Models

paper | code

[21]Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars

paper | code

[20]Shifted Diffusion for Text-to-image Generation

paper | code

[19]Towards Practical Plug-and-Play Diffusion Models

paper

[18]Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis

paper

[17]Wavelet Diffusion Models are fast and scalable Image Generators

paper | code

[16]Learning 3D-aware Image Synthesis with Unknown Pose Distribution

paper

[15]Picture that Sketch: Photorealistic Image Generation from Abstract Sketches

paper

[14]3DQD: Generalized Deep 3D Shape Prior via Part-Discretized Diffusion Process

paper | code

[13]A Dynamic Multi-Scale Voxel Flow Network for Video Prediction

paper | code

[12]Regularized Vector Quantization for Tokenized Image Synthesis

paper

[11]SpaText: Spatio-Textual Representation for Controllable Image Generation

paper

[10]Unifying Layout Generation with a Decoupled Diffusion Model

paper

[9]Scaling up GANs for Text-to-Image Synthesis

paper

[8]Inversion-Based Style Transfer with Diffusion Models

paper | code

[7]Perspective Fields for Single Image Camera Calibration

paper

[6]VGFlow: Visibility guided Flow Network for Human Reposing

paper

[5]DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

paper | code

[4]Progressive Open Space Expansion for Open-Set Model Attribution

paper | code

[3]Person Image Synthesis via Denoising Diffusion Model

paper

[2]Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models

paper

[1]Parallel Diffusion Models of Operator and Image for Blind Inverse Problems

paper


3D Vision

[2]Learning a 3D Morphable Face Reflectance Model from Low-cost Data

paper | code

[1]Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction

paper | code


Point Cloud

[15]CLIP2: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data

paper

[14]Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration

paper | code

[13]Deep Graph-based Spatial Consistency for Robust Non-rigid Point Cloud Registration

paper | code

[12]Controllable Mesh Generation Through Sparse Latent Point Diffusion Models

paper

[11]Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis

paper | code

[10]Rotation-Invariant Transformer for Point Cloud Matching

paper

[9]GraVoS: Voxel Selection for 3D Point-Cloud Detection

paper

[8]DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets

paper | code

[7]PointCert: Point Cloud Classification with Deterministic Certified Robustness Guarantees

paper

[6]ACL-SPC: Adaptive Closed-Loop system for Self-Supervised Point Cloud Completion

paper | code

[5]DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization

paper

[4]Frequency-Modulated Point Cloud Rendering with Easy Editing

paper

[3]Self-Supervised Image-to-Point Distillation via Semantically Tolerant Contrastive Loss

paper

[2]ProxyFormer: Proxy Alignment Assisted Point Cloud Completion with Missing Part Sensitive Transformer

paper | code

[1]Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting

paper | code


3D Reconstruction

[25]HexPlane: A Fast Representation for Dynamic Scenes

paper

[24]Seeing Through the Glass: Neural 3D Reconstruction of Object Inside a Transparent Container

paper

[23]BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects

paper

[22]Structured 3D Features for Reconstructing Controllable Avatars

paper

[21]PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°

paper

[20]Transforming Radiance Field with Lipschitz Network for Photorealistic 3D Scene Stylization

paper

[19]TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision

paper | code

[18]MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

paper | code

[17]PartNeRF: Generating Part-Aware Editable 3D Shapes without 3D Supervision

paper

[16]SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation

paper | code

[15]Masked Wavelet Representation for Compact Neural Radiance Fields

paper

[14]Decoupling Human and Camera Motion from Videos in the Wild

paper

[13]Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction

paper

[12]NEF: Neural Edge Fields for 3D Parametric Curve Reconstruction from Multi-view Images

paper

[11]Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion

paper | code

[10]MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices

paper | code

[9]Unsupervised 3D Shape Reconstruction by Part Retrieval and Assembly

paper

[8]NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction

paper

[7]HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling

paper

[6]MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision

paper

[4]Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness

paper | code

[3]Im2Hands: Learning Attentive Implicit Representation of Interacting Two-Hand Shapes

paper | code

[2]ECON: Explicit Clothed humans Obtained from Normals

paper | code

[1]Structured 3D Features for Reconstructing Relightable and Animatable Avatars

paper | project


Scene Reconstruction/View Synthesis/Novel View Synthesis

[32]Magic3D: High-Resolution Text-to-3D Content Creation

paper

[31]DiffRF: Rendering-Guided 3D Radiance Field Diffusion

paper

[30]Ref-NPR: Reference-Based Non-Photorealistic Radiance Fields for Controllable Scene Stylization

paper | code

[29]Interactive Segmentation of Radiance Fields

paper

[28]MAIR: Multi-view Attention Inverse Rendering with 3D Spatially-Varying Lighting Estimation

paper

[27]GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images

paper

[26]Progressively Optimized Local Radiance Fields for Robust View Synthesis

paper

[25]ABLE-NeRF: Attention-Based Rendering with Learnable Embeddings for Neural Radiance Field

paper

[24]HandNeRF: Neural Radiance Fields for Animatable Interacting Hands

paper

[23]Grid-guided Neural Radiance Fields for Large Urban Scenes

paper

[22]EventNeRF: Neural Radiance Fields from a Single Colour Event Camera

paper

[21]SPARF: Neural Radiance Fields from Sparse and Noisy Poses

paper

[20]RUST: Latent Neural Scene Representations from Unposed Imagery

paper

[19]SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field

paper

[18]ShadowNeuS: Neural SDF Reconstruction by Shadow Ray Supervision

paper | code

[17]Balanced Spherical Grid for Egocentric View Synthesis

paper | code

[16]Semantic Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention

paper

[15]MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures

paper | code

[14]Robust Dynamic Radiance Fields

paper

[13]I2-SDF: Intrinsic Indoor Scene Reconstruction and Editing via Raytracing in Neural SDFs

paper

[12]Learning Detailed Radiance Manifolds for High-Fidelity and 3D-Consistent Portrait Synthesis from Monocular Image

paper

[11]Nerflets: Local Radiance Fields for Efficient Structure-Aware 3D Scene Representation from 2D Supervision

paper

[10]Local-to-Global Registration for Bundle-Adjusting Neural Radiance Fields

paper

[9]DP-NeRF: Deblurred Neural Radiance Field with Physical Scene Priors

paper | code

[8]SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields

paper

[7]3D Video Loops from Asynchronous Input

paper | code

[6]NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer

paper | code

[5]NeRF-Gaze: A Head-Eye Redirection Parametric Model for Gaze Estimation

paper

[4]Renderable Neural Radiance Map for Visual Navigation

paper

[3]Real-Time Neural Light Field on Mobile Devices

paper | project

[2]Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures

paper | code

[1]NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior

paper | project


Model Compression

[1]Neural Video Compression with Diverse Contexts

paper | code


Knowledge Distillation

[3]Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation

paper

[2]Generic-to-Specific Distillation of Masked Autoencoders

paper | code

[1]CLIPPING: Distilling CLIP-based Models for Video-Language Understanding

paper


Pruning

[2]CP3: Channel Pruning Plug-in for Point-based Networks

paper

[1]DepGraph: Towards Any Structural Pruning

paper | code


Quantization

[4]Hard Sample Matters a Lot in Zero-Shot Quantization

paper

[3]Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective

paper

[2]Post-training Quantization on Diffusion Models

paper | code

[1]Adaptive Data-Free Quantization

paper | code


Neural Network Structure Design

[6]LINe: Out-of-Distribution Detection by Leveraging Important Neurons

paper

[5]Towards Scalable Neural Representation for Diverse Videos

paper

[4]Boundary Unlearning

paper

[3]Equiangular Basis Vectors

paper | code

[2]LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs

paper | code

[1]Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks

paper | code


CNN

[5]Randomized Adversarial Training via Taylor Expansion

paper | code

[4]Alias-Free Convnets: Fractional Shift Invariance via Polynomial Activations

paper | code

[3]DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network

paper | code

[2]Demystify Transformers & Convolutions in Modern Image Deep Networks

paper | code

[1]InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

paper | code


Transformer

[15]CAT: LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object Detection

paper | code

[14]Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers

paper

[13]POTTER: Pooling Attention Transformer for Efficient Human Mesh Recovery

paper

[12]FeatER: An Efficient Network for Human Reconstruction via Feature Map-Based TransformER

paper

[11]Spherical Transformer for LiDAR-based 3D Recognition

paper | code

[10]MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models

paper | code

[9]Top-Down Visual Attention from Analysis by Synthesis

paper

[8]BiFormer: Vision Transformer with Bi-Level Routing Attention

paper | code

[7]Making Vision Transformers Efficient from A Token Sparsification View

paper

[6]Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves

paper

[5]Learning Imbalanced Data with Vision Transformers

paper | code

[4]SAP-DETR: Bridging the Gap Between Salient Points and Queries-Based Transformer Detector for Fast Model Convergency

paper

[3]Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers

paper | code

[2]Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR

paper | code

[1]Integrally Pre-Trained Transformer Pyramid Networks

paper | code


Graph Neural Networks (GNN)

[2]Turning Strengths into Weaknesses: A Certified Robustness Inspired Attack Framework against Graph Neural Networks

paper

[1]From Node Interaction to Hop Interaction: New Effective and Scalable Graph Learning Paradigm

paper


Neural Architecture Search (NAS)

[3]Polynomial Implicit Neural Representations For Large Diverse Datasets

paper | code

[2]PA&DA: Jointly Sampling PAth and DAta for Consistent NAS

paper | code

[1]Stitchable Neural Networks

paper | code


MLP

[1]ALOFT: A Lightweight MLP-like Architecture with Dynamic Low-frequency Transform for Domain Generalization

paper | code


MAE

[1]Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders

paper | code


Data Processing

[1]TINC: Tree-structured Implicit Neural Compression

paper | code


Data Augmentation


Normalization/Regularization

[1]Masked Images Are Counterfactual Samples for Robust Fine-tuning

paper


Image Clustering

[1]On the Effects of Self-supervision and Contrastive Alignment in Deep Multi-view Clustering

paper | code


Image Compression

[1]Context-Based Trit-Plane Coding for Progressive Image Compression

paper | code


Model Training/Generalization

[16]Generalist: Decoupling Natural and Robust Generalization

paper

[15]Feature Separation and Recalibration for Adversarial Robustness

paper

[14]Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck

paper

[13]FlexiViT: One Model for All Patch Sizes

paper | code

[12]Robust Generalization against Photon-Limited Corruptions via Worst-Case Sharpness Minimization

paper | code

[11]Improving Generalization with Domain Convex Game

paper

[10]TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization

paper | code

[9]An Extended Study of Human-like Behavior under Adversarial Training

paper

[8]Sharpness-Aware Gradient Matching for Domain Generalization

paper | code

[7]HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining

paper

[6]Universal Instance Perception as Object Discovery and Retrieval

paper | code

[5]Practical Network Acceleration with Tiny Sets

paper | code

[4]Towards Bridging the Performance Gaps of Joint Energy-based Models

paper | code

[3]DropKey

paper

[2]Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization

paper

[1]DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks

paper


Noisy Label

[2]Fine-Grained Classification with Noisy Labels

paper

[1]Combating noisy labels in object detection datasets

paper


Long-Tailed Distribution

[1]Curvature-Balanced Feature Manifold Learning for Long-Tailed Classification

paper


Image Feature Extraction and Matching

[3]Referring Image Matting

paper | code

[2]Iterative Geometry Encoding Volume for Stereo Matching

paper | code

[1]Modality-Agnostic Debiasing for Single Domain Generalization

paper


Visual Representation Learning

[12]Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning

paper

[11]CrOC: Cross-View Online Clustering for Dense Visual Representation Learning

paper | code

[10]Masked Motion Encoding for Self-Supervised Video Representation Learning

paper | code

[9]Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos

paper | code

[8]MARLIN: Masked Autoencoder for facial video Representation LearnINg

paper | code

[7]Hierarchical discriminative learning improves visual representations of biomedical microscopy

paper

[6]Fine-tuned CLIP Models are Efficient Video Learners

paper | code

[5]Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning

paper | code

[4]Open-Set Representation Learning through Combinatorial Embedding

paper

[3]NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction

paper

[2]Stare at What You See: Masked Image Modeling without Reconstruction

paper | code

[1]Switchable Representation Learning Framework with Self-compatibility

paper


Model Evaluation

[2]Physically Adversarial Infrared Patches with Learnable Shapes and Locations

paper

[1]TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets

paper | code


Multi-Modal Learning

[13]CLIP2Scene: Towards Label-Efficient 3D Scene Understanding by CLIP

paper | code

[12]MaPLe: Multi-modal Prompt Learning

paper | code

[11]Decoupled Multimodal Distilling for Emotion Recognition

paper

[10]MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

paper | code

[9]BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency

paper | code

[8]Multimodal Feature Extraction and Attention-based Fusion for Emotion Estimation in Videos

paper | code

[7]Emotional Reaction Intensity Estimation Based on Multimodal Data

paper

[6]Multimodal Feature Extraction and Fusion for Emotional Reaction Intensity Estimation and Expression Classification in Videos with Transformers

paper

[5]Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning

paper

[4]Multimodal Prompting with Missing Modalities for Visual Recognition

paper | code

[3]Align and Attend: Multimodal Summarization with Dual Contrastive Losses

paper | code

[2]Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information

paper | code

[1]Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

paper | code


Audio-visual Learning

[6]Egocentric Audio-Visual Object Localization

paper | code

[5]Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning

paper

[4]Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline

paper

[3]Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring

paper | code

[2]CASP-Net: Rethinking Video Saliency Prediction from an Audio-Visual Consistency Perceptual Perspective

paper | code

[1]A Light Weight Model for Active Speaker Detection

paper | code


Vision-language

[18]MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model

paper | code

[17]Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning

paper

[16]Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

paper | code

[15]Test of Time: Instilling Video-Language Models with a Sense of Time

paper | code

[14]Accelerating Vision-Language Pretraining with Free Language Modeling

paper

[13]Task Residual for Tuning Vision-Language Models

paper | code

[12]MAGVLT: Masked Generative Vision-and-Language Transformer

paper

[11]Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding

paper | code

[10]Lana: A Language-Capable Navigator for Instruction Following and Generation

paper | code

[9]FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks

paper | code

[8]Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding

paper

[7]Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing

paper

[6]Connecting Vision and Language with Video Localized Narratives

paper | code

[5]Policy Adaptation from Foundation Model Feedback

paper

[4]Open-vocabulary Attribute Detection

paper

[3]Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training

paper

[2]Turning a CLIP Model into a Scene Text Detector

paper | code

[1]GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods

paper


Vision-based Prediction

[4]TBP-Former: Learning Temporal Bird’s-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving

paper

[3]Intention-Conditioned Long-Term Human Egocentric Action Forecasting

paper | code

[2]Computational Choreography using Human Motion Synthesis

paper

[1]IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction

paper


Dataset

[15]GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts

paper

[14]ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data

paper

[13]Fantastic Breaks: A Dataset of Paired 3D Scans of Real-World Broken Objects and Their Complete Counterparts

paper

[12]A Bag-of-Prototypes Representation for Dataset-Level Applications

paper

[11]Music-Driven Group Choreography

paper

[10]RaBit: Parametric Modeling of 3D Biped Cartoon Characters with a Topological-consistent Dataset

paper

[9]Backdoor Defense via Adaptively Splitting Poisoned Dataset

paper | code

[8]Learning a Practical SDR-to-HDRTV Up-conversion using New Dataset and Degradation Models

paper | code

[7]SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments

paper | code

[6]A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others

paper | code

[5]MVImgNet: A Large-scale Dataset of Multi-view Images

paper

[4]Spring: A High-Resolution High-Detail Dataset and Benchmark for Scene Flow, Optical Flow and Stereo

paper

[3]CUDA: Convolution-based Unlearnable Datasets

paper

[2]V2V4Real: A Real-world Large-scale Dataset for Vehicle-to-Vehicle Cooperative Perception

paper

[1]Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes

paper


Active Learning


Few-shot Learning/Zero-shot Learning

[8]CF-Font: Content Fusion for Few-shot Font Generation

paper

[7]DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection

paper | code

[6]Hubs and Hyperspheres: Reducing Hubness and Improving Transductive Few-shot Learning with Hyperspherical Embeddings

paper | code

[5]Bi-directional Distribution Alignment for Transductive Zero-Shot Learning

paper | code

[4]Zero-Shot Text-to-Parameter Translation for Game Character Auto-Creation

paper

[3]Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners

paper | code

[2]NIFF: Alleviating Forgetting in Generalized Few-Shot Object Detection via Neural Instance Feature Forging

paper

[1]FreeNeRF: Improving Few-shot Neural Rendering with Free Frequency Regularization

paper | code


Continual Learning/Life-long Learning

[2]Computationally Budgeted Continual Learning: What Does Matter?

paper | code

[1]Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning

paper | code


Scene Graph

[1]Probabilistic Debiasing of Scene Graphs

paper | code


Scene Graph Generation

[1]Prototype-based Embedding Network for Scene Graph Generation

paper


Scene Graph Prediction


Scene Graph Understanding

[2]SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text

paper

[1]PLA: Language-Driven Open-Vocabulary 3D Scene Understanding

paper | code


Visual Localization/Pose Estimation

[5]Human Pose as Compositional Tokens

paper

[4]Data-efficient Large Scale Place Recognition with Graded Similarity Supervision

paper | code

[3]PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers

paper

[2]StructVPR: Distill Structural Knowledge with Weighting Samples for Visual Place Recognition

paper

[1]PyramidFlow: High-Resolution Defect Contrastive Localization using Pyramid Normalizing Flow

paper


Visual Reasoning/Visual Question Answering (VQA)

[7]3D Concept Learning and Reasoning from Multi-View Images

paper

[6]Abstract Visual Reasoning: An Algebraic Approach for Solving Raven’s Progressive Matrices

paper | code

[5]Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning

paper | code

[4]Generative Bias for Robust Visual Question Answering

paper

[3]MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering

paper | code

[2]Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering

paper | code

[1]From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models

paper | code


Image Classification

[4]Rawgment: Noise-Accounted RAW Augmentation Enables Recognition in a Wide Variety of Environments

paper

[3]Semantic Prompt for Few-Shot Image Recognition

paper

[2]Boosting Verified Training for Robust Image Classifications via Abstraction

paper | code

[1]I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification

paper


Transfer Learning/Domain Adaptation

[11]Deep Frequency Filtering for Domain Generalization

paper

[10]Semi-Supervised Domain Adaptation with Source Label Adaptation

paper | code

[9]Unsupervised Continual Semantic Adaptation through Neural Rendering

paper

[8]MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation

paper | code

[7]Patch-Mix Transformer for Unsupervised Domain Adaptation: A Game Perspective

paper

[6]Manipulating Transfer Learning for Property Inference

paper | code

[5]Trainable Projected Gradient Method for Robust Fine-tuning

paper

[4]DA-DETR: Domain Adaptive Detection Transformer with Information Fusion

paper

[3]Instance Relation Graph Guided Source-Free Domain Adaptive Object Detection

paper | code

[2]Guiding Pseudo-labels with Uncertainty Estimation for Source-free Unsupervised Domain Adaptation

paper | code

[1]Adaptive Assignment for Geometry Aware Local Feature Matching

paper


Metric Learning


Contrastive Learning

[8]PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery

paper | code

[7]Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data

paper

[6]Self-Supervised Image-to-Point Distillation via Semantically Tolerant Contrastive Loss

paper

[5]Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation

paper | code

[4]MaskCon: Masked Contrastive Learning for Coarse-Labelled Dataset

paper | code

[3]CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning

paper | code

[2]Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation

paper | code

[1]Twin Contrastive Learning with Noisy Labels

paper | code


Incremental Learning

[2]Class-Incremental Exemplar Compression for Class-Incremental Learning

paper

[1]Dense Network Expansion for Class Incremental Learning

paper


Reinforcement Learning

[3]Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction

paper | code

[2]ProphNet: Efficient Agent-Centric Motion Forecasting with Anchor-Informed Proposals

paper

[1]EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning

paper | code


Meta Learning

[1]A Meta-Learning Approach to Predicting Performance and Data Requirements

paper


Robotics

[2]Efficient Map Sparsification Based on 2D and 3D Discretized Grids

paper

[1]PyPose: A Library for Robot Learning with Physics-based Optimization

paper | code


Semi-supervised/Weakly Supervised/Unsupervised/Self-supervised Learning

[21]Can’t Steal? Cont-Steal! Contrastive Stealing Attacks Against Image Encoders

paper

[20]Conflict-Based Cross-View Consistency for Semi-Supervised Semantic Segmentation

paper | code

[19]ProtoCon: Pseudo-label Refinement via Online Clustering and Prototypical Consistency for Efficient Semi-supervised Learning

paper

[18]Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels

paper

[17]Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching

paper | code

[16]Boosting Semi-Supervised Learning by Exploiting All Unlabeled Data

paper

[15]Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning

paper

[14]Correlational Image Modeling for Self-Supervised Visual Pre-Training

paper

[13]Extracting Class Activation Maps from Non-Discriminative Features as well

paper | code

[12]TeSLA: Test-Time Self-Learning With Automatic Adversarial Augmentation

paper | code

[11]LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding

paper

[10]MixTeacher: Mining Promising Labels with Mixed Scale Teacher for Semi-Supervised Object Detection

paper | code

[9]Semi-supervised Hand Appearance Recovery via Structure Disentanglement and Dual Adversarial Discrimination

paper

[8]Non-Contrastive Unsupervised Learning of Physiological Signals from Video

paper

[7]Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems

paper | code

[6]Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models

paper

[5]The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training

paper | code

[4]Three Guidelines You Should Know for Universally Slimmable Self-Supervised Learning

paper | code

[3]Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors

paper

[2]Siamese Image Modeling for Self-Supervised Vision Representation Learning

paper | code

[1]Cut and Learn for Unsupervised Object Detection and Instance Segmentation

paper | project


Neural Network Interpretability

[3]OCTET: Object-aware Counterfactual Explanations

paper | code

[2]Don’t Lie to Me! Robust and Efficient Explainability with Verified Perturbation Analysis

paper

[1]SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries

paper | code


Image Counting

[1]Zero-shot Object Counting

paper


Federated Learning

[3]Make Landscape Flatter in Differentially Private Federated Learning

paper

[2]STDLens: Model Hijacking-resilient Federated Learning for Object Detection

paper | code

[1]Re-thinking Federated Active Learning based on Inter-class Diversity

paper | code


Autonomous Driving

[1]BEVFormer v2: Adapting Modern Image Backbones to Bird’s-Eye-View Recognition via Perspective Supervision

paper


Other

[57]Level-S2fM: Structure from Motion on Neural Level Set of Implicit Surfaces

paper

[56]FeatureBooster: Boosting Feature Descriptors with a Lightweight Neural Network

paper

[55]ARO-Net: Learning Implicit Fields from Anchored Radial Observations

paper | code

[54]Unknown Sniffer for Object Detection: Don’t Turn a Blind Eye to Unknown Objects

paper

[53]Robust Test-Time Adaptation in Dynamic Scenarios

paper

[52]LayoutFormer++: Conditional Graphic Layout Generation via Constraint Serialization and Decoding Space Restriction

paper

[51]Doubly Right Object Recognition: A Why Prompt for Visual Rationales

paper

[50]CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching

paper

[49]Marching-Primitives: Shape Abstraction from Signed Distance Function

paper

[48]Modeling Inter-Class and Intra-Class Constraints in Novel Class Discovery

paper | code

[47]ActMAD: Activation Matching to Align Distributions for Test-Time-Training

paper | code

[46]Robust Mean Teacher for Continual and Gradual Test-Time Adaptation

paper | code

[45]Planning-oriented Autonomous Driving

paper | code

[44]Explicit Visual Prompting for Low-Level Structure Segmentations

paper | code

[43]Leapfrog Diffusion Model for Stochastic Trajectory Prediction

paper | code

[42]Feature Alignment and Uniformity for Test Time Adaptation

paper

[41]Attribute-preserving Face Dataset Anonymization via Latent Code Optimization

paper | code

[40]Fix the Noise: Disentangling Source Feature for Controllable Domain Translation

paper | code

[39]Effective Ambiguity Attack Against Passport-based DNN Intellectual Property Protection Schemes through Fully Connected Layer Substitution

paper

[38]Visibility Constrained Wide-band Illumination Spectrum Design for Seeing-in-the-Dark

paper | code

[37]Learning a Depth Covariance Function

paper

[36]VecFontSDF: Learning to Reconstruct and Synthesize High-quality Vector Fonts via Signed Distance Functions

paper

[35]Dense Distinct Query for End-to-End Object Detection

paper | code

[34]Facial Affective Analysis based on MAE and Multi-modal Information for 5th ABAW Competition

paper

[33]Partial Network Cloning

paper | code

[32]Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection

paper | code

[31]Adversarial Counterfactual Visual Explanations

paper | code

[30]A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation

paper | code

[29]Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation

paper | code

[28]Skinned Motion Retargeting with Residual Perception of Motion Semantics & Geometry

paper | code

[27]Towards Compositional Adversarial Robustness: Generalizing Adversarial Training to Composite Semantic Perturbations

paper | code

[26]Backdoor Defense via Deconfounded Representation Learning

paper | code

[25]Label Information Bottleneck for Label Enhancement

paper

[24]LayoutDM: Discrete Diffusion Model for Controllable Layout Generation

paper | code

[23]Diversity-Aware Meta Visual Prompting

paper | code

[22]ABAW: Valence-Arousal Estimation, Expression Recognition, Action Unit Detection & Emotional Reaction Intensity Estimation Challenges

paper

[21]Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving

paper

[20]UniHCP: A Unified Model for Human-Centric Perceptions

paper | code

[19]Where We Are and What We’re Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes

paper

[18]Revisiting Rotation Averaging: Uncertainties and Robust Losses

paper | code

[17]3D-Aware Object Goal Navigation via Simultaneous Exploration and Identification

paper

[16]Phase-Shifting Coder: Predicting Accurate Orientation in Oriented Object Detection

paper | code

[15]Understanding and Improving Visual Prompting: A Label-Mapping Perspective

paper | code

[14]vMAP: Vectorised Object Mapping for Neural Field SLAM

paper | code

[13]EcoTTA: Memory-Efficient Continual Test-time Adaptation via Self-distilled Regularization

paper

[12]Upcycling Models under Domain and Category Shift

paper | code

[11]Interventional Bag Multi-Instance Learning On Whole-Slide Pathological Images

paper | code

[10]Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies

paper

[9]Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples

paper | code

[8]Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation

paper

[6]Physical-World Optical Adversarial Attacks on 3D Face Recognition

paper

[5]Improving Cross-Modal Retrieval with Set of Diverse Embeddings

paper

[4]Neural Video Compression with Diverse Contexts

paper | code

[3]Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger

paper

[2]Single Image Backdoor Inversion via Robust Smoothed Classifiers

paper | code

[1]Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision

paper | code



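3. CVPR2023 Paper Interpretation Roundup

1.CVPR2023|Dispelling the misconception about the data-scaling ability of MIM (masked image modeling)!

2.CVPR 2023|A new CLIP-based fine-tuning paradigm! Training speed and performance both reach new highs!

3.CVPR 2023|Zhejiang University proposes PyramidFlow, a fully normalizing-flow model: a new paradigm for high-resolution defect anomaly localization

4.CVPR 2023|Stable Diffusion reconstructs images from the brain's visual signals! "Human schemes and lies no longer exist"

5.CVPR 2023|HKUST's DA-BEV: a new SOTA for 3D object detection and a powerful method for mining depth information

6.CVPR 23|Representation learning surpasses MAE; Google and others propose MAGE: unsupervised image generation beyond Latent Diffusion

7.CVPR2023|Sorry, I'm speeding up! FasterNet: higher FLOPS is the real foundation for being faster and stronger

8.CVPR 2023|Amid the popularity of large models, SN-Net offers a distinctive answer

9.CVPR 2023|iTPNs: self-supervised learning combined with a feature pyramid structure

10.CVPR 2023|SQR: explorations and reflections on training DETR-family object detectors

11.CVPR 2023|A new COCO record of 65.4 mAP! InternImage: injecting new mechanisms, extending DCNv3, and exploring large vision models

12.CVPR 2023|YOLOv7 accepted! After 6 years, the YOLO series returns to CVPR!

13.CVPR 2023|Google proposes Imagic: diffusion models can now edit photos using text alone!

14.CVPR 2023|Lite DETR: 60% less computation! An efficient interleaved multi-scale encoder

15.CVPR 2023|New work from Xiang Bai's team: scene text detection with the help of CLIP

16.CVPR’23|Plug-and-play series! A lightweight, efficient self-attention mechanism helps image restoration networks reach SOTA

17.CVPR 2023|NVIDIA proposes VoxFormer: a new SOTA for monocular 3D semantic scene completion

18.CVPR 2023|EMA-VFI: efficient video frame interpolation that extracts motion and appearance information via inter-frame attention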



Original link: https://github.com/extreme-assistant/CVPR2023-Paper-Code-Interpretation/blob/master/CVPR2023.md?plain=1
