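CVPR 2022 opens on June 22, and the conference accepted 2,067 papers. Because there are so many, this list is split across four posts; click a paper title to open the document.
Part 1, Part 2, Part 3.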
Sparse Fuse Dense: Towards High Quality 3D Detection With Depth Completion [supp]
GIRAFFE HD: A High-Resolution 3D-Aware Generative Model [supp]
InOut: Diverse Image Outpainting via GAN Inversion
PNP: Robust Learning From Noisy Labels by Probabilistic Noise Prediction
Estimating Structural Disparities for Face Models
Revisiting the Transferability of Supervised Pretraining: An MLP Perspective [supp]
Plenoxels: Radiance Fields Without Neural Networks [supp]
What Matters for Meta-Learning Vision Regression Tasks? [supp]
Knowledge-Driven Self-Supervised Representation Learning for Facial Action Unit Recognition
Selective-Supervised Contrastive Learning With Noisy Labels [supp]
Learning Second Order Local Anomaly for General Face Forgery Detection
ADAS: A Direct Adaptation Strategy for Multi-Target Domain Adaptive Semantic Segmentation [supp]
The Devil Is in the Labels: Noisy Label Correction for Robust Scene Graph Generation [supp]
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation [supp]
SimT: Handling Open-Set Noise for Domain Adaptive Semantic Segmentation [supp]
Interspace Pruning: Using Adaptive Filter Representations To Improve Training of Sparse CNNs [supp]
PLAD: Learning To Infer Shape Programs With Pseudo-Labels and Approximate Distributions [supp]
PTTR: Relational 3D Point Cloud Object Tracking With Transformer
Frequency-Driven Imperceptible Adversarial Attack on Semantic Similarity [supp]
ZZ-Net: A Universal Rotation Equivariant Architecture for 2D Point Clouds [supp]
Video Demoireing With Relation-Based Temporal Consistency
Co-Domain Symmetry for Complex-Valued Deep Learning
Industrial Style Transfer With Large-Scale Geometric Warping and Content Preservation [supp]
Modeling Image Composition for Complex Scene Generation [supp]
SS3D: Sparsely-Supervised 3D Object Detection From Point Cloud
Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer [supp]
GRAM: Generative Radiance Manifolds for 3D-Aware Image Generation [supp]
UniVIP: A Unified Framework for Self-Supervised Visual Pre-Training
GraFormer: Graph-Oriented Transformer for 3D Pose Estimation
Decoupling Zero-Shot Semantic Segmentation [supp]
Neural Collaborative Graph Machines for Table Structure Recognition [supp]
Towards Robust Vision Transformer [supp]
DeepCurrents: Learning Implicit Representations of Shapes With Boundaries
Learning Affordance Grounding From Exocentric Images [supp]
Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions [supp]
Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability [supp]
Unknown-Aware Object Detection: Learning What You Don't Know From Videos in the Wild [supp]
Multi-Modal Extreme Classification
IFOR: Iterative Flow Minimization for Robotic Object Rearrangement [supp]
Training-Free Transformer Architecture Search [supp]
Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation [supp]
Non-Isotropy Regularization for Proxy-Based Deep Metric Learning
C2AM: Contrastive Learning of Class-Agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation [supp]
TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation [supp]
3DAC: Learning Attribute Compression for Point Clouds [supp]
Learning a Structured Latent Space for Unsupervised Point Cloud Completion
The Wanderings of Odysseus in 3D Scenes [supp]
Few-Shot Learning With Noisy Labels [supp]
Understanding 3D Object Articulation in Internet Videos
Multi-Level Representation Learning With Semantic Alignment for Referring Video Object Segmentation
Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better Than Dot-Product Self-Attention [supp]
Interactive Image Synthesis With Panoptic Layout Generation [supp]
Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving [supp]
All-in-One Image Restoration for Unknown Corruption [supp]
Syntax-Aware Network for Handwritten Mathematical Expression Recognition [supp]
Sketching Without Worrying: Noise-Tolerant Sketch-Based Image Retrieval [supp]
PUMP: Pyramidal and Uniqueness Matching Priors for Unsupervised Learning of Local Descriptors [supp]
PlanarRecon: Real-Time 3D Plane Detection and Reconstruction From Posed Monocular Videos [supp]
Deep Equilibrium Optical Flow Estimation [supp]
Optimizing Video Prediction via Video Frame Interpolation
Motron: Multimodal Probabilistic Human Motion Forecasting [supp]
Episodic Memory Question Answering [supp]
Continual Stereo Matching of Continuous Driving Scenes With Growing Architecture [supp]
Few-Shot Backdoor Defense Using Shapley Estimation
Cycle-Consistent Counterfactuals by Latent Transformations [supp]
ADeLA: Automatic Dense Labeling With Attention for Viewpoint Shift in Semantic Segmentation [supp]
Joint Hand Motion and Interaction Hotspots Prediction From Egocentric Videos [supp]
Blind Face Restoration via Integrating Face Shape and Generative Priors [supp]
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video [supp]
Safe-Student for Safe Deep Semi-Supervised Learning With Unseen-Class Unlabeled Data [supp]
Learning To Zoom Inside Camera Imaging Pipeline [supp]
High-Fidelity GAN Inversion for Image Attribute Editing [supp]
RCP: Recurrent Closest Point for Point Cloud
gDNA: Towards Generative Detailed Neural Avatars [supp]
A Dual Weighting Label Assignment Scheme for Object Detection
FAM: Visual Explanations for the Feature Representations From Deep Convolutional Networks [supp]
Hyperbolic Vision Transformers: Combining Improvements in Metric Learning [supp]
MaskGIT: Masked Generative Image Transformer [supp]
Revisiting the "Video" in Video-Language Understanding
Local Texture Estimator for Implicit Representation Function
Instance-Aware Dynamic Neural Network Quantization [supp]
When To Prune? A Policy Towards Early Structural Pruning [supp]
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval [supp]
Degree-of-Linear-Polarization-Based Color Constancy [supp]
A Voxel Graph CNN for Object Classification With Event Cameras [supp]
On the Importance of Asymmetry for Siamese Representation Learning [supp]
Probing Representation Forgetting in Supervised and Unsupervised Continual Learning [supp]
ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval
DenseCLIP: Language-Guided Dense Prediction With Context-Aware Prompting [supp]
Exploring Effective Data for Surrogate Training Towards Black-Box Attack [supp]
JRDB-Act: A Large-Scale Dataset for Spatio-Temporal Action, Social Group and Activity Detection [supp]
AR-NeRF: Unsupervised Learning of Depth and Defocus Effects From Natural Images With Aperture Rendering Neural Radiance Fields [supp]
Likert Scoring With Grade Decoupling for Long-Term Action Assessment [supp]
Many-to-Many Splatting for Efficient Video Frame Interpolation
Investigating Top-k White-Box and Transferable Black-Box Attack [supp]
Decoupling and Recoupling Spatiotemporal Representation for RGB-D-Based Motion Recognition [supp]
Learning To Learn by Jointly Optimizing Neural Architecture and Weights [supp]
Attributable Visual Similarity Learning [supp]
A Self-Supervised Descriptor for Image Copy Detection [supp]
DyTox: Transformers for Continual Learning With DYnamic TOken eXpansion [supp]
Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective [supp]
Manifold Learning Benefits GANs [supp]
A Keypoint-Based Global Association Network for Lane Detection
Negative-Aware Attention Framework for Image-Text Matching [supp]
Semantic-Aligned Fusion Transformer for One-Shot Object Detection [supp]
Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning [supp]
Few-Shot Incremental Learning for Label-to-Image Translation [supp]
Discrete Time Convolution for Fast Event-Based Stereo [supp]
An Image Patch Is a Wave: Phase-Aware Vision MLP [supp]
Escaping Data Scarcity for High-Resolution Heterogeneous Face Hallucination [supp]
Visual Acoustic Matching [supp]
Shunted Self-Attention via Multi-Scale Token Aggregation
Shadows Can Be Dangerous: Stealthy and Effective Physical-World Adversarial Attack by Natural Phenomenon
ImplicitAtlas: Learning Deformable Shape Templates in Medical Imaging [supp]
Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression [supp]
3D Photo Stylization: Learning To Generate Stylized Novel Views From a Single Image [supp]
Improving Visual Grounding With Visual-Linguistic Verification and Iterative Reasoning
Contrastive Learning for Space-Time Correspondence via Self-Cycle Consistency
Learning Robust Image-Based Rendering on Sparse Scene Geometry via Depth Completion [supp]
Scale-Equivalent Distillation for Semi-Supervised Object Detection [supp]
Recurrent Variational Network: A Deep Learning Inverse Problem Solver Applied to the Task of Accelerated MRI Reconstruction [supp]
SelfD: Self-Learning Large-Scale Driving Policies From the Web
"The Pedestrian Next to the Lamppost" Adaptive Object Graphs for Better Instantaneous Mapping [supp]
Attribute Group Editing for Reliable Few-Shot Image Generation [supp]
Surpassing the Human Accuracy: Detecting Gallbladder Cancer From USG Images With Curriculum Learning [supp]
CroMo: Cross-Modal Learning for Monocular Depth Estimation [supp]
Self-Supervised Object Detection From Audio-Visual Correspondence
Autofocus for Event Cameras [supp]
Learning Multiple Adverse Weather Removal via Two-Stage Knowledge Learning and Multi-Contrastive Regularization: Toward a Unified Model
Polymorphic-GAN: Generating Aligned Samples Across Multiple Domains With Learned Morph Maps [supp]
Appearance and Structure Aware Robust Deep Visual Graph Matching: Attack, Defense and Beyond [supp]
Super-Fibonacci Spirals: Fast, Low-Discrepancy Sampling of SO(3) [supp]
TrackFormer: Multi-Object Tracking With Transformers [supp]
L-Verse: Bidirectional Generation Between Image and Text [supp]
PanopticDepth: A Unified Framework for Depth-Aware Panoptic Segmentation
3D Shape Reconstruction From 2D Images With Disentangled Attribute Flow
Feature Statistics Mixing Regularization for Generative Adversarial Networks [supp]
Learning To Learn and Remember Super Long Multi-Domain Task Sequence [supp]
OpenTAL: Towards Open Set Temporal Action Localization [supp]
Urban Radiance Fields [supp]
Self-Supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection [supp]
Domain-Agnostic Prior for Transfer Semantic Segmentation
Dynamic Kernel Selection for Improved Generalization and Memory Efficiency in Meta-Learning [supp]
Ego4D: Around the World in 3,000 Hours of Egocentric Video