[CVPR 2022] Paper List and Downloads

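CVPR 2022 will be held on June 22, and the conference accepted 2,067 papers. Because there are so many, the list is split across four posts; click a paper title to get the document.
Part Two, Part Three, Part Four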

1. Part One

Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification [supp]
SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization [supp]
GASP, a Generalized Framework for Agglomerative Clustering of Signed Graphs and Its Application to Instance Segmentation [supp]
Estimating Example Difficulty Using Variance of Gradients [supp]
One Loss for Quantization: Deep Hashing With Discrete Wasserstein Distributional Matching [supp]
Pixel Screening Based Intermediate Correction for Blind Deblurring [supp]
Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast
Controllable Animation of Fluid Elements in Still Images
Holocurtains: Programming Light Curtains via Binary Holography [supp]
Recurrent Dynamic Embedding for Video Object Segmentation [supp]
Deep Hierarchical Semantic Segmentation [supp]
f-SfT: Shape-From-Template With a Physics-Based Deformation Model [supp]
Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism [supp]
DATA: Domain-Aware and Task-Aware Self-Supervised Learning [supp]
TWIST: Two-Way Inter-Label Self-Training for Semi-Supervised 3D Instance Segmentation [supp]
Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection From Point Clouds
Learning Adaptive Warping for Real-World Rolling Shutter Correction [supp]
Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning
Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions [supp]
RIM-Net: Recursive Implicit Fields for Unsupervised Learning of Hierarchical Shape Structures
Do Learned Representations Respect Causal Relationships? [supp]
ZebraPose: Coarse To Fine Surface Encoding for 6DoF Object Pose Estimation [supp]
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic [supp]
Learning To Affiliate: Mutual Centralized Learning for Few-Shot Classification [supp]
CAPRI-Net: Learning Compact CAD Shapes With Adaptive Primitive Assembly [supp]
ATPFL: Automatic Trajectory Prediction Model Design Under Federated Learning Framework
Revisiting Learnable Affines for Batch Norm in Few-Shot Transfer Learning
Bridging the Gap Between Classification and Localization for Weakly Supervised Object Localization [supp]
Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation [supp]
3D Moments From Near-Duplicate Photos
Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization [supp]
Blind2Unblind: Self-Supervised Image Denoising With Visible Blind Spots [supp]
Balanced and Hierarchical Relation Learning for One-Shot Object Detection
End-to-End Generative Pretraining for Multimodal Video Captioning [supp]
Delving Deep Into the Generalization of Vision Transformers Under Distribution Shifts
NICE-SLAM: Neural Implicit Scalable Encoding for SLAM [supp]
HyperDet3D: Learning a Scene-Conditioned 3D Object Detector [supp]
Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion [supp]
CLRNet: Cross Layer Refinement Network for Lane Detection [supp]
Cross-Modal Map Learning for Vision and Language Navigation [supp]
Motion-Aware Contrastive Video Representation Learning via Foreground-Background Merging [supp]
Incremental Transformer Structure Enhanced Image Inpainting With Masking Positional Encoding [supp]
Pointly-Supervised Instance Segmentation [supp]
Cross-Modal Clinical Graph Transformer for Ophthalmic Report Generation
Human-Object Interaction Detection via Disentangled Transformer [supp]
DINE: Domain Adaptation From Single and Multiple Black-Box Predictors
LGT-Net: Indoor Panoramic Room Layout Estimation With Geometry-Aware Transformer Network [supp]
CRIS: CLIP-Driven Referring Image Segmentation
Multi-View Mesh Reconstruction With Neural Deferred Shading [supp]
CVF-SID: Cyclic Multi-Variate Function for Self-Supervised Image Denoising by Disentangling Noise From Image [supp]
Infrared Invisible Clothing: Hiding From Infrared Detectors at Multiple Angles in Real World [supp]
Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation
FaceFormer: Speech-Driven 3D Facial Animation With Transformers [supp]
Exploring Patch-Wise Semantic Relation for Contrastive Learning in Image-to-Image Translation Tasks [supp]
High-Resolution Face Swapping via Latent Semantics Disentanglement [supp]
Searching the Deployable Convolution Neural Networks for GPUs [supp]
Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning [supp]
DeepFake Disrupter: The Detector of DeepFake Is My Friend [supp]
Rotationally Equivariant 3D Object Detection [supp]
Accelerating DETR Convergence via Semantic-Aligned Matching [supp]
Long-Short Temporal Contrastive Learning of Video Transformers
Vision Transformer With Deformable Attention [supp]
Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture [supp]
Deep Vanishing Point Detection: Geometric Priors Make Dataset Variations Vanish [supp]
RM-Depth: Unsupervised Learning of Recurrent Monocular Depth in Dynamic Scenes
LiT: Zero-Shot Transfer With Locked-Image Text Tuning [supp]
Cloning Outfits From Real-World Images to 3D Characters for Generalizable Person Re-Identification [supp]
GeoNeRF: Generalizing NeRF With Geometry Priors [supp]
ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo [supp]
PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose Estimation With Photometrically Challenging Objects [supp]
Neural Compression-Based Feature Learning for Video Restoration [supp]
Expanding Low-Density Latent Regions for Open-Set Object Detection [supp]
Drop the GAN: In Defense of Patches Nearest Neighbors As Single Image Generative Models
Uformer: A General U-Shaped Transformer for Image Restoration [supp]
Exploring Dual-Task Correlation for Pose Guided Person Image Generation
Portrait Eyeglasses and Shadow Removal by Leveraging 3D Synthetic Data [supp]
Neural Rays for Occlusion-Aware Image-Based Rendering [supp]
Modeling 3D Layout for Group Re-Identification
Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity [supp]
SIOD: Single Instance Annotated per Category per Image for Object Detection [supp]
Toward Fast, Flexible, and Robust Low-Light Image Enhancement [supp]
Online Learning of Reusable Abstract Models for Object Goal Navigation
Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos
SimMatch: Semi-Supervised Learning With Similarity Matching
OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks [supp]
HandOccNet: Occlusion-Robust 3D Hand Mesh Estimation Network [supp]
EfficientNeRF: Efficient Neural Radiance Fields [supp]
Quantifying Societal Bias Amplification in Image Captioning [supp]
Modular Action Concept Grounding in Semantic Video Prediction [supp]
StyleSwin: Transformer-Based GAN for High-Resolution Image Generation [supp]
Reinforced Structured State-Evolution for Vision-Language Navigation
Sub-Word Level Lip Reading With Visual Attention
Weakly Supervised High-Fidelity Clothing Model Generation [supp]
Highly-Efficient Incomplete Large-Scale Multi-View Clustering With Consensus Bipartite Graph [supp]
Towards Principled Disentanglement for Domain Generalization [supp]
Discrete Cosine Transform Network for Guided Depth Map Super-Resolution [supp]
Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing [supp]
E2V-SDE: From Asynchronous Events to Fast and Continuous Video Reconstruction via Neural Stochastic Differential Equations [supp]
CoSSL: Co-Learning of Representation and Classifier for Imbalanced Semi-Supervised Learning [supp]
Discovering Objects That Can Move [supp]
Knowledge Mining With Scene Text for Fine-Grained Recognition
Self-Supervised Learning of Object Parts for Semantic Segmentation [supp]
Iterative Corresponding Geometry: Fusing Region and Depth for Highly Efficient 3D Tracking of Textureless Objects [supp]
Single-Photon Structured Light [supp]
Deblurring via Stochastic Refinement [supp]
3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds
TransGeo: Transformer Is All You Need for Cross-View Image Geo-Localization [supp]
R(Det)²: Randomized Decision Routing for Object Detection [supp]
Abandoning the Bayer-Filter To See in the Dark [supp]
SASIC: Stereo Image Compression With Latent Shifts and Stereo Attention [supp]
Exploiting Temporal Relations on Radar Perception for Autonomous Driving [supp]
Multi-Instance Point Cloud Registration by Efficient Correspondence Clustering [supp]
Contrastive Boundary Learning for Point Cloud Segmentation [supp]
Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution [supp]
CVNet: Contour Vibration Network for Building Extraction
Hyperbolic Image Segmentation [supp]
Forward Compatible Training for Large-Scale Embedding Retrieval Systems [supp]
Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval [supp]
Swin Transformer V2: Scaling Up Capacity and Resolution [supp]
Neural Template: Topology-Aware Reconstruction and Disentangled Generation of 3D Meshes [supp]
DEFEAT: Deep Hidden Feature Backdoor Attacks by Imperceptible Perturbation and Latent Representation Constraints
Projective Manifold Gradient Layer for Deep Rotation Regression [supp]
CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation [supp]
Learning To Refactor Action and Co-Occurrence Features for Temporal Action Localization [supp]
It's Time for Artistic Correspondence in Music and Video [supp]
Mixed Differential Privacy in Computer Vision [supp]
AdaFace: Quality Adaptive Margin for Face Recognition [supp]
Learning Soft Estimator of Keypoint Scale and Orientation With Probabilistic Covariant Loss [supp]
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising [supp]
HCSC: Hierarchical Contrastive Selective Coding [supp]
TransRank: Self-Supervised Video Representation Learning via Ranking-Based Transformation Recognition [supp]
KeyTr: Keypoint Transporter for 3D Reconstruction of Deformable Objects in Videos
Invariant Grounding for Video Question Answering [supp]
Prompt Distribution Learning [supp]
RAGO: Recurrent Graph Optimizer for Multiple Rotation Averaging [supp]
Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search [supp]
On Aliased Resizing and Surprising Subtleties in GAN Evaluation
Lepard: Learning Partial Point Cloud Matching in Rigid and Deformable Scenes [supp]
Virtual Elastic Objects [supp]
DiSparse: Disentangled Sparsification for Multitask Model Compression [supp]
Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference [supp]
Opening Up Open World Tracking [supp]
Towards Efficient and Scalable Sharpness-Aware Minimization [supp]
VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention [supp]
Rethinking Deep Face Restoration [supp]
OSSO: Obtaining Skeletal Shape From Outside [supp]
Temporal Alignment Networks for Long-Term Video [supp]
Few-Shot Head Swapping in the Wild [supp]
A Study on the Distribution of Social Biases in Self-Supervised Learning Visual Models [supp]
LAR-SR: A Local Autoregressive Model for Image Super-Resolution [supp]
Bayesian Invariant Risk Minimization [supp]
Democracy Does Matter: Comprehensive Feature Mining for Co-Salient Object Detection [supp]
Alleviating Semantics Distortion in Unsupervised Low-Level Image-to-Image Translation via Structure Consistency Constraint [supp]
Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches [supp]
Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes [supp]
ICON: Implicit Clothed Humans Obtained From Normals [supp]
Comparing Correspondences: Video Prediction With Correspondence-Wise Losses
Uni-Perceiver: Pre-Training Unified Architecture for Generic Perception for Zero-Shot and Few-Shot Tasks [supp]
The Auto Arborist Dataset: A Large-Scale Benchmark for Multiview Urban Forest Monitoring Under Domain Shift [supp]
On the Instability of Relative Pose Estimation and RANSAC's Role [supp]
Shape From Polarization for Complex Scenes in the Wild
Real-Time, Accurate, and Consistent Video Semantic Segmentation via Unsupervised Adaptation and Cross-Unit Deployment on Mobile Device [supp]
SNUG: Self-Supervised Neural Dynamic Garments [supp]
Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation [supp]
Glass Segmentation Using Intensity and Spectral Polarization Cues [supp]
CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
Few Shot Generative Model Adaption via Relaxed Spatial Structural Alignment
Target-Relevant Knowledge Preservation for Multi-Source Domain Adaptive Object Detection [supp]
Pyramid Grafting Network for One-Stage High Resolution Saliency Detection [supp]
A Style-Aware Discriminator for Controllable Image Translation [supp]
Non-Iterative Recovery From Nonlinear Observations Using Generative Models [supp]
Incremental Cross-View Mutual Distillation for Self-Supervised Medical CT Synthesis
Enhancing Adversarial Training With Second-Order Statistics of Weights [supp]
Partially Does It: Towards Scene-Level FG-SBIR With Partial Input [supp]
Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo [supp]
Moving Window Regression: A Novel Approach to Ordinal Regression [supp]
UniCoRN: A Unified Conditional Image Repainting Network
Forecasting Characteristic 3D Poses of Human Actions [supp]
ACPL: Anti-Curriculum Pseudo-Labelling for Semi-Supervised Medical Image Classification [supp]
Learning to Deblur Using Light Field Generated and Real Defocus Images [supp]
Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection [supp]
Safe Self-Refinement for Transformer-Based Domain Adaptation
Density-Preserving Deep Point Cloud Compression [supp]
StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions [supp]
Which Model To Transfer? Finding the Needle in the Growing Haystack [supp]
Fast and Unsupervised Action Boundary Detection for Action Segmentation
Class-Incremental Learning With Strong Pre-Trained Models [supp]
Robust Optimization As Data Augmentation for Large-Scale Graphs [supp]
Robust Structured Declarative Classifiers for 3D Point Clouds: Defending Adversarial Attacks With Implicit Gradients
PhotoScene: Photorealistic Material and Lighting Transfer for Indoor Scenes [supp]
Improving the Transferability of Targeted Adversarial Examples Through Object-Based Diverse Input [supp]
IRON: Inverse Rendering by Optimizing Neural SDFs and Materials From Photometric Images [supp]
ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
Versatile Multi-Modal Pre-Training for Human-Centric Perception
360MonoDepth: High-Resolution 360° Monocular Depth Estimation [supp]
Splicing ViT Features for Semantic Appearance Transfer
Contrastive Regression for Domain Adaptation on Gaze Estimation [supp]
MUSE-VAE: Multi-Scale VAE for Environment-Aware Long Term Trajectory Prediction [supp]
Multi-View Consistent Generative Adversarial Networks for 3D-Aware Image Synthesis [supp]
Putting People in Their Place: Monocular Regression of 3D People in Depth [supp]
POCO: Point Convolution for Surface Reconstruction [supp]
Memory-Augmented Non-Local Attention for Video Super-Resolution [supp]
Neural Texture Extraction and Distribution for Controllable Person Image Synthesis
Classification-Then-Grounding: Reformulating Video Scene Graphs As Temporal Bipartite Graphs [supp]
Transformer-Empowered Multi-Scale Contextual Matching and Aggregation for Multi-Contrast MRI Super-Resolution [supp]
GazeOnce: Real-Time Multi-Person Gaze Estimation [supp]
GateHUB: Gated History Unit With Background Suppression for Online Action Detection [supp]
Few-Shot Font Generation by Learning Fine-Grained Local Styles [supp]
Bridging Video-Text Retrieval With Multiple Choice Questions [supp]
Depth-Aware Generative Adversarial Network for Talking Head Video Generation [supp]
Dual-Path Image Inpainting With Auxiliary GAN Inversion [supp]
DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis
Generative Flows With Invertible Attentions [supp]
Clipped Hyperbolic Classifiers Are Super-Hyperbolic Classifiers [supp]
Estimating Fine-Grained Noise Model via Contrastive Learning
DiffPoseNet: Direct Differentiable Camera Pose Estimation
The Flag Median and FlagIRLS [supp]
Implicit Feature Decoupling With Depthwise Quantization [supp]
Graph-Context Attention Networks for Size-Varied Deep Graph Matching [supp]
FENeRF: Face Editing in Neural Radiance Fields
CoNeRF: Controllable Neural Radiance Fields [supp]
Noise2NoiseFlow: Realistic Camera Noise Modeling Without Clean Images [supp]
ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered Scenes
Remember Intentions: Retrospective-Memory-Based Trajectory Prediction [supp]
Measuring Compositional Consistency for Video Question Answering [supp]
Category Contrast for Unsupervised Domain Adaptation in Visual Tasks [supp]
SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering
UNIST: Unpaired Neural Implicit Shape Translation Network [supp]
Local-Adaptive Face Recognition via Graph-Based Meta-Clustering and Regularized Adaptation [supp]
The DEVIL Is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting [supp]
Mutual Information-Driven Pan-Sharpening
Shifting More Attention to Visual Backbone: Query-Modulated Refinement Networks for End-to-End Visual Grounding [supp]
A Framework for Learning Ante-Hoc Explainable Models via Concepts [supp]
Generating Useful Accident-Prone Driving Scenarios via a Learned Traffic Prior [supp]
FLOAT: Factorized Learning of Object Attributes for Improved Multi-Object Multi-Part Scene Parsing
Efficient Geometry-Aware 3D Generative Adversarial Networks [supp]
DO-GAN: A Double Oracle Framework for Generative Adversarial Networks [supp]
Dancing Under the Stars: Video Denoising in Starlight [supp]
FocusCut: Diving Into a Focus View in Interactive Segmentation
Medial Spectral Coordinates for 3D Shape Analysis
Contextualized Spatio-Temporal Contrastive Learning With Self-Supervision [supp]
Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning [supp]
APES: Articulated Part Extraction From Sprite Sheets [supp]
Dressing in the Wild by Watching Dance Videos [supp]
SPAct: Self-Supervised Privacy Preservation for Action Recognition [supp]
Uni6D: A Unified CNN Framework Without Projection Breakdown for 6D Pose Estimation [supp]
De-Rendering 3D Objects in the Wild [supp]
SPAMs: Structured Implicit Parametric Models [supp]
Global Sensing and Measurements Reuse for Image Compressed Sensing
SeeThroughNet: Resurrection of Auxiliary Loss by Preserving Class Probability Information [supp]
Representing 3D Shapes With Probabilistic Directed Distance Fields [supp]
Learning ABCs: Approximate Bijective Correspondence for Isolating Factors of Variation With Weak Supervision [supp]
ABO: Dataset and Benchmarks for Real-World 3D Object Understanding [supp]
DETReg: Unsupervised Pretraining With Region Priors for Object Detection [supp]
Learning To Restore 3D Face From In-the-Wild Degraded Images [supp]
Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack [supp]
Convolutions for Spatial Interaction Modeling [supp]
MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection [supp]
Salvage of Supervision in Weakly Supervised Object Detection [supp]
Cross-View Transformers for Real-Time Map-View Semantic Segmentation
Distinguishing Unseen From Seen for Generalized Zero-Shot Learning
Online Continual Learning on a Contaminated Data Stream With Blurry Task Boundaries [supp]
Controllable Dynamic Multi-Task Architectures [supp]
Learning To Imagine: Diversify Memory for Incremental Learning Using Unlabeled Data [supp]
SmartAdapt: Multi-Branch Object Detection Framework for Videos on Mobiles [supp]
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks [supp]
Deep Hybrid Models for Out-of-Distribution Detection [supp]
Accelerating Video Object Segmentation With Compressed Video [supp]
Exploring Domain-Invariant Parameters for Source Free Domain Adaptation
FastDOG: Fast Discrete Optimization on GPU [supp]
Fire Together Wire Together: A Dynamic Pruning Approach With Self-Supervised Mask Prediction [supp]
Multi-Source Uncertainty Mining for Deep Unsupervised Saliency Detection
Self-Supervised Equivariant Learning for Oriented Keypoint Detection [supp]
Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation
Focal and Global Knowledge Distillation for Detectors
Learning To Prompt for Continual Learning [supp]
Human Mesh Recovery From Multiple Shots [supp]
Improving Adversarial Transferability via Neuron Attribution-Based Attacks [supp]
Better Trigger Inversion Optimization in Backdoor Scanning [supp]
GANSeg: Learning To Segment by Unsupervised Hierarchical Image Generation [supp]
Dense Learning Based Semi-Supervised Object Detection
Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction [supp]
Convolution of Convolution: Let Kernels Spatially Collaborate
Make It Move: Controllable Image-to-Video Generation With Text Descriptions [supp]
C2AM Loss: Chasing a Better Decision Boundary for Long-Tail Object Detection
Neural Points: Point Cloud Representation With Neural Fields for Arbitrary Upsampling
Distribution Consistent Neural Architecture Search
Video-Text Representation Learning via Differentiable Weak Temporal Alignment [supp]
Bi-Directional Object-Context Prioritization Learning for Saliency Ranking [supp]
FreeSOLO: Learning To Segment Objects Without Annotations [supp]
What Do Navigation Agents Learn About Their Environment? [supp]
Progressive Minimal Path Method With Embedded CNN
FIFO: Learning Fog-Invariant Features for Foggy Scene Segmentation [supp]
3D Human Tongue Reconstruction From Single "In-the-Wild" Images [supp]
Enhancing Adversarial Robustness for Deep Metric Learning [supp]
Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation [supp]
Lite-MDETR: A Lightweight Multi-Modal Detector
CoordGAN: Self-Supervised Dense Correspondences Emerge From GANs [supp]
A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation [supp]
Unsupervised Visual Representation Learning by Online Constrained K-Means [supp]
Neural Point Light Fields [supp]
Vehicle Trajectory Prediction Works, but Not Everywhere [supp]
PSMNet: Position-Aware Stereo Merging Network for Room Layout Estimation [supp]
MonoDTR: Monocular 3D Object Detection With Depth-Aware Transformer [supp]
Learning Graph Regularisation for Guided Super-Resolution [supp]
Instance-Wise Occlusion and Depth Orders in Natural Scenes [supp]
Look for the Change: Learning Object States and State-Modifying Actions From Untrimmed Web Videos [supp]
Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-Shot Learning [supp]
Generalized Category Discovery [supp]
Maximum Consensus by Weighted Influences of Monotone Boolean Functions [supp]
TransforMatcher: Match-to-Match Attention for Semantic Correspondence [supp]
Robust Outlier Detection by De-Biasing VAE Likelihoods [supp]
Contour-Hugging Heatmaps for Landmark Detection [supp]
Voxel Field Fusion for 3D Object Detection
Divide and Conquer: Compositional Experts for Generalized Novel Class Discovery [supp]
Programmatic Concept Learning for Human Motion Description and Synthesis [supp]
Interpretable Part-Whole Hierarchies and Conceptual-Semantic Relationships in Neural Networks [supp]
Fast Algorithm for Low-Rank Tensor Completion in Delay-Embedded Space
Panoptic, Instance and Semantic Relations: A Relational Context Encoder To Enhance Panoptic Segmentation [supp]
Point2Seq: Detecting 3D Objects As Sequences [supp]
Less Is More: Generating Grounded Navigation Instructions From Landmarks [supp]
Task-Adaptive Negative Envision for Few-Shot Open-Set Recognition [supp]
DisARM: Displacement Aware Relation Module for 3D Detection [supp]
ETHSeg: An Amodel Instance Segmentation Network and a Real-World Dataset for X-Ray Waste Inspection [supp]
MixFormer: Mixing Features Across Windows and Dimensions [supp]
Killing Two Birds With One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC
NeRF-Editing: Geometry Editing of Neural Radiance Fields [supp]
Optimal Correction Cost for Object Detection Evaluation [supp]
Contextual Similarity Distillation for Asymmetric Image Retrieval
FineDiving: A Fine-Grained Dataset for Procedure-Aware Action Quality Assessment [supp]
Artistic Style Discovery With Independent Components
HEAT: Holistic Edge Attention Transformer for Structured Reconstruction [supp]
HyperStyle: StyleGAN Inversion With HyperNetworks for Real Image Editing [supp]
DASO: Distribution-Aware Semantics-Oriented Pseudo-Label for Imbalanced Semi-Supervised Learning [supp]
Mobile-Former: Bridging MobileNet and Transformer [supp]
Exploiting Pseudo Labels in a Self-Supervised Learning Framework for Improved Monocular Depth Estimation [supp]
DESTR: Object Detection With Split Transformer [supp]
LTP: Lane-Based Trajectory Prediction for Autonomous Driving [supp]
CycleMix: A Holistic Strategy for Medical Image Segmentation From Scribble Supervision [supp]
VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution [supp]
Towards End-to-End Unified Scene Text Detection and Layout Analysis [supp]
Image Based Reconstruction of Liquids From 2D Surface Detections [supp]
Contextual Outpainting With Object-Level Contrastive Learning [supp]
AP-BSN: Self-Supervised Denoising for Real-World Images via Asymmetric PD and Blind-Spot Network [supp]
AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior [supp]
Depth-Guided Sparse Structure-From-Motion for Movies and TV Shows [supp]
End-to-End Referring Video Object Segmentation With Multimodal Transformers [supp]
Unpaired Cartoon Image Synthesis via Gated Cycle Mapping [supp]
IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo [supp]
Not All Points Are Equal: Learning Highly Efficient Point-Based Detectors for 3D LiDAR Point Clouds [supp]
FedCorr: Multi-Stage Federated Learning for Label Noise Correction [supp]
Detecting Camouflaged Object in Frequency Domain [supp]
RigNeRF: Fully Controllable Neural 3D Portraits
CLIP-Forge: Towards Zero-Shot Text-To-Shape Generation [supp]
Style-Based Global Appearance Flow for Virtual Try-On
Source-Free Object Detection by Learning To Overlook Domain Style
Active Learning for Open-Set Annotation
SceneSqueezer: Learning To Compress Scene for Camera Relocalization [supp]
SelfRecon: Self Reconstruction Your Digital Avatar From Monocular Video
Instance-Dependent Label-Noise Learning With Manifold-Regularized Transition Matrix Estimation
Rethinking the Augmentation Module in Contrastive Learning: Learning Hierarchical Augmentation Invariance With Expanded Views
Self-Supervised Models Are Continual Learners [supp]
Dreaming To Prune Image Deraining Networks [supp]
Equivariant Point Cloud Analysis via Learning Orientations for Message Passing [supp]
When Does Contrastive Visual Representation Learning Work?
One Step at a Time: Long-Horizon Vision-and-Language Navigation With Milestones
Node Representation Learning in Graph via Node-to-Neighbourhood Mutual Information Maximization [supp]
Point Cloud Pre-Training With Natural 3D Structures [supp]
Scene Consistency Representation Learning for Video Scene Segmentation [supp]
Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart [supp]
Exploiting Explainable Metrics for Augmented SGD [supp]
Semi-Supervised Video Semantic Segmentation With Inter-Frame Feature Reconstruction
GenDR: A Generalized Differentiable Renderer [supp]
Improving Neural Implicit Surfaces Geometry With Patch Warping [supp]
XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding [supp]
Amodal Segmentation Through Out-of-Task and Out-of-Distribution Generalization With a Bayesian Model [supp]
How Well Do Sparse ImageNet Models Transfer? [supp]
REX: Reasoning-Aware and Grounded Explanation [supp]
Dynamic Dual-Output Diffusion Models [supp]
StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis [supp]
JoinABLe: Learning Bottom-Up Assembly of Parametric CAD Joints [supp]
CaDeX: Learning Canonical Deformation Coordinate Space for Dynamic Surface Representation via Neural Homeomorphism [supp]
Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes [supp]
V-Doc: Visual Questions Answers With Documents
AEGNN: Asynchronous Event-Based Graph Neural Networks [supp]
Layer-Wised Model Aggregation for Personalized Federated Learning [supp]
Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values [supp]
Style-Structure Disentangled Features and Normalizing Flows for Diverse Icon Colorization [supp]
Object-Aware Video-Language Pre-Training for Retrieval
OSKDet: Orientation-Sensitive Keypoint Localization for Rotated Object Detection
MAT: Mask-Aware Transformer for Large Hole Image Inpainting [supp]
Exploring Geometric Consistency for Monocular 3D Object Detection [supp]
Neural Window Fully-Connected CRFs for Monocular Depth Estimation
CodedVTR: Codebook-Based Sparse Voxel Transformer With Geometric Guidance [supp]
Uncertainty-Aware Deep Multi-View Photometric Stereo [supp]
Coherent Point Drift Revisited for Non-Rigid Shape Matching and Registration
Unleashing Potential of Unsupervised Pre-Training With Intra-Identity Regularization for Person Re-Identification [supp]
Align and Prompt: Video-and-Language Pre-Training With Entity Prompts
A Unified Query-Based Paradigm for Point Cloud Understanding [supp]
It's About Time: Analog Clock Reading in the Wild [supp]
MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens [supp]
Cross Modal Retrieval With Querybank Normalisation [supp]
Contrastive Dual Gating: Learning Sparse Features With Contrastive Learning
Universal Photometric Stereo Network Using Global Lighting Contexts [supp]
Hire-MLP: Vision MLP via Hierarchical Rearrangement [supp]
Ray3D: Ray-Based 3D Human Pose Estimation for Monocular Absolute 3D Localization [supp]
Occluded Human Mesh Recovery [supp]
Multi-Object Tracking Meets Moving UAV
ASM-Loc: Action-Aware Segment Modeling for Weakly-Supervised Temporal Action Localization [supp]
Uncertainty-Guided Probabilistic Transformer for Complex Action Recognition
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs [supp]
End-to-End Multi-Person Pose Estimation With Transformers
REGTR: End-to-End Point Cloud Correspondences With Transformers [supp]
Neural 3D Scene Reconstruction With the Manhattan-World Assumption [supp]
V2C: Visual Voice Cloning [supp]
Revisiting AP Loss for Dense Object Detection: Adaptive Ranking Pair Selection [supp]
3DeformRS: Certifying Spatial Deformations on Point Clouds [supp]
ElePose: Unsupervised 3D Human Pose Estimation by Predicting Camera Elevation and Learning Normalizing Flows on 2D Poses [supp]
MAD: A Scalable Dataset for Language Grounding in Videos From Movie Audio Descriptions [supp]
EvUnroll: Neuromorphic Events Based Rolling Shutter Image Correction [supp]
Gait Recognition in the Wild With Dense 3D Representations and a Benchmark [supp]
ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation via Online Exploration and Synthesis [supp]
Temporal Context Matters: Enhancing Single Image Prediction With Disease Progression Representations [supp]
QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection
IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment [supp]
UniCon: Combating Label Noise Through Uniform Selection and Contrastive Learning [supp]
Learning From All Vehicles [supp]
BEHAVE: Dataset and Method for Tracking Human Object Interactions [supp]
Disentangled3D: Learning a 3D Generative Model With Disentangled Geometry and Appearance From Monocular Images [supp]
Revisiting Random Channel Pruning for Neural Network Compression [supp]
One-Bit Active Query With Contrastive Pairs [supp]
Estimating Egocentric 3D Human Pose in the Wild With External Weak Supervision [supp]
Performance-Aware Mutual Knowledge Distillation for Improving Neural Architecture Search
Does Text Attract Attention on E-Commerce Images: A Novel Saliency Prediction Dataset and Method
Topologically-Aware Deformation Fields for Single-View 3D Reconstruction [supp]
HyperInverter: Improving StyleGAN Inversion via Hypernetwork [supp]
Sparse Non-Local CRF [supp]
Dataset Distillation by Matching Training Trajectories
Towards Driving-Oriented Metric for Lane Detection Models [supp]
EPro-PnP: Generalized End-to-End Probabilistic Perspective-N-Points for Monocular Object Pose Estimation [supp]
Rethinking Reconstruction Autoencoder-Based Out-of-Distribution Detection [supp]
XYDeblur: Divide and Conquer for Single Image Deblurring [supp]
Generating Diverse and Natural 3D Human Motions From Text [supp]
E-CIR: Event-Enhanced Continuous Intensity Recovery
Towards Robust Rain Removal Against Adversarial Attacks: A Comprehensive Benchmark Analysis and Beyond [supp]
STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded Scenes [supp]
Deep Decomposition for Stochastic Normal-Abnormal Transport [supp]
Global Context With Discrete Diffusion in Vector Quantised Modelling for Image Generation [supp]
Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation [supp]
AziNorm: Exploiting the Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception
Towards Multimodal Depth Estimation From Light Fields [supp]
Learning To Recognize Procedural Activities With Distant Supervision [supp]
Multimodal Material Segmentation [supp]
Multi-Frame Self-Supervised Depth With Transformers [supp]
Weakly Supervised Rotation-Invariant Aerial Object Detection Network
Modeling Motion With Multi-Modal Features for Text-Based Video Segmentation [supp]
Surface Reconstruction From Point Clouds by Learning Predictive Context Priors [supp]
Deformable Video Transformer
Self-Supervised Keypoint Discovery in Behavioral Videos [supp]
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes [supp]
DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation [supp]
Connecting the Complementary-View Videos: Joint Camera Identification and Subject Association [supp]
End-to-End Trajectory Distribution Prediction Based on Occupancy Grid Maps [supp]
Fast, Accurate and Memory-Efficient Partial Permutation Synchronization [supp]
Quantization-Aware Deep Optics for Diffractive Snapshot Hyperspectral Imaging [supp]
Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation [supp]
Parametric Scattering Networks [supp]
SketchEdit: Mask-Free Local Image Manipulation With Partial Sketches [supp]
ScaleNet: A Shallow Architecture for Scale Estimation [supp]
E2EC: An End-to-End Contour-Based Method for High-Quality High-Speed Instance Segmentation
Bounded Adversarial Attack on Deep Content Features [supp]
BatchFormer: Learning To Explore Sample Relationships for Robust Representation Learning [supp]
Self-Supervised Image-Specific Prototype Exploration for Weakly Supervised Semantic Segmentation
CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification [supp]
Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations [supp]
Learning Multi-View Aggregation in the Wild for Large-Scale 3D Semantic Segmentation [supp]
ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-Wise Semantic Alignment and Generation [supp]
Improving Video Model Transfer With Dynamic Representation Learning [supp]
PIE-Net: Photometric Invariant Edge Guided Network for Intrinsic Image Decomposition [supp]
Clothes-Changing Person Re-Identification With RGB Modality Only [supp]
Chitransformer: Towards Reliable Stereo From Cues [supp]
Robust Image Forgery Detection Over Online Social Network Shared Images [supp]
QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation [supp]
Physically Disentangled Intra- and Inter-Domain Adaptation for Varicolored Haze Removal [supp]
Modality-Agnostic Learning for Radar-Lidar Fusion in Vehicle Detection
A Re-Balancing Strategy for Class-Imbalanced Classification Based on Instance Difficulty [supp]
Representation Compensation Networks for Continual Semantic Segmentation [supp]
Adaptive Gating for Single-Photon 3D Imaging [supp]
Tracking People by Predicting 3D Appearance, Location and Pose [supp]
Text2Mesh: Text-Driven Neural Stylization for Meshes [supp]
Learning To Solve Hard Minimal Problems [supp]
H4D: Human 4D Modeling by Learning Neural Compositional Representation [supp]
FWD: Real-Time Novel View Synthesis With Forward Warping and Depth [supp]
Non-Generative Generalized Zero-Shot Learning via Task-Correlated Disentanglement and Controllable Samples Synthesis
C-CAM: Causal CAM for Weakly Supervised Semantic Segmentation on Medical Image
Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection [supp]
Forward Compatible Few-Shot Class-Incremental Learning [supp]
BaLeNAS: Differentiable Architecture Search via the Bayesian Learning Rule [supp]
