CVPR2020-Code
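A collection of open-source projects for CVPR 2020 papers. Issues sharing CVPR 2020 open-source projects are welcome.
For papers from other top CV conferences (such as ECCV 2020, CVPR 2019, and ICCV 2019), as well as other high-quality CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision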
CNN
Exploring Self-attention for Image Recognition
Improving Convolutional Networks with Self-Calibrated Convolutions
Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets
Image Classification
Interpretable and Accurate Fine-grained Recognition via Region Grouping
Compositional Convolutional Neural Networks: A Deep Architecture with Innate Robustness to Partial Occlusion
Spatially Attentive Output Layer for Image Classification
Video Classification
SmallBigNet: Integrating Core and Contextual Views for Video Classification
Object Detection
Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax
AugFPN: Improving Multi-scale Feature Learning for Object Detection
Noise-Aware Fully Webly Supervised Object Detection
Learning a Unified Sample Weighting Network for Object Detection
D2Det: Towards High Quality Object Detection and Instance Segmentation
Dynamic Refinement Network for Oriented and Densely Packed Object Detection
Scale-Equalizing Pyramid Convolution for Object Detection
Revisiting the Sibling Head in Object Detector
Detection in Crowded Scenes: One Proposal, Multiple Predictions
Instance-aware, Context-focused, and Memory-efficient Weakly Supervised Object Detection
Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection
BiDet: An Efficient Binarized Object Detector
Harmonizing Transferability and Discriminability for Adapting Object Detectors
CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection
Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection
EfficientDet: Scalable and Efficient Object Detection
3D Object Detection
SESS: Self-Ensembling Semi-Supervised 3D Object Detection
Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection
What You See is What You Get: Exploiting Visibility for 3D Object Detection
Learning Depth-Guided Convolutions for Monocular 3D Object Detection
Structure Aware Single-stage 3D Object Detection from Point Cloud
IDA-3D: Instance-Depth-Aware 3D Object Detection from Stereo Vision for Autonomous Driving
Train in Germany, Test in The USA: Making 3D Object Detectors Generalize
MLCVNet: Multi-Level Context VoteNet for 3D Object Detection
3DSSD: Point-based 3D Single Stage Object Detector
Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation
End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection
DSGN: Deep Stereo Geometry Network for 3D Object Detection
LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention
PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection
Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud
Video Object Detection
Memory Enhanced Global-Local Aggregation for Video Object Detection
Object Tracking
SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking
D3S -- A Discriminative Single Shot Segmentation Tracker
ROAM: Recurrently Optimizing Tracking Model
Siam R-CNN: Visual Tracking by Re-Detection
Cooling-Shrinking Attack: Blinding the Tracker with Imperceptible Noises
High-Performance Long-Term Tracking with Meta-Updater
AutoTrack: Towards High-Performance Visual Tracking for UAV with Automatic Spatio-Temporal Regularization
Probabilistic Regression for Visual Tracking
MAST: A Memory-Augmented Self-supervised Tracker
Siamese Box Adaptive Network for Visual Tracking
Multi-Object Tracking
3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset
Semantic Segmentation
FDA: Fourier Domain Adaptation for Semantic Segmentation
Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation
Single-Stage Semantic Segmentation from Image Labels
Learning Texture Invariant Representation for Domain Adaptation of Semantic Segmentation
MSeg: A Composite Dataset for Multi-domain Semantic Segmentation
CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement
Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision
Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation
Temporally Distributed Networks for Fast Video Segmentation
Context Prior for Scene Segmentation
Strip Pooling: Rethinking Spatial Pooling for Scene Parsing
Cars Can't Fly up in the Sky: Improving Urban-Scene Segmentation via Height-driven Attention Networks
Learning Dynamic Routing for Semantic Segmentation
Instance Segmentation
D2Det: Towards High Quality Object Detection and Instance Segmentation
PolarMask: Single Shot Instance Segmentation with Polar Representation
CenterMask: Real-Time Anchor-Free Instance Segmentation
BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation
Deep Snake for Real-Time Instance Segmentation
Mask Encoding for Single Shot Instance Segmentation
Panoptic Segmentation
Video Panoptic Segmentation
Pixel Consensus Voting for Panoptic Segmentation
BANet: Bidirectional Aggregation Network with Occlusion Handling for Panoptic Segmentation
Video Object Segmentation
A Transductive Approach for Video Object Segmentation
State-Aware Tracker for Real-Time Video Object Segmentation
Learning Fast and Robust Target Models for Video Object Segmentation
Learning Video Object Segmentation from Unlabeled Videos
Superpixel Segmentation
Superpixel Segmentation with Fully Convolutional Networks
Interactive Image Segmentation
Interactive Object Segmentation with Inside-Outside Guidance
NAS
AOWS: Adaptive and optimal network width search with latency constraints
Densely Connected Search Space for More Flexible Neural Architecture Search
MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning
FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions
Neural Architecture Search for Lightweight Non-Local Networks
Rethinking Performance Estimation in Neural Architecture Search
CARS: Continuous Evolution for Efficient Neural Architecture Search
GAN
SEAN: Image Synthesis with Semantic Region-Adaptive Normalization
Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation
Distribution-induced Bidirectional Generative Adversarial Network for Graph Representation Learning
PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer
Semantically Multi-modal Image Synthesis
Unpaired Portrait Drawing Generation via Asymmetric Cycle Mapping
Learning to Cartoonize Using White-box Cartoon Representations
GAN Compression: Efficient Architectures for Interactive Conditional GANs
Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions
Re-ID
High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification
COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification
Transferable, Controllable, and Inconspicuous Adversarial Attacks on Person Re-identification With Deep Mis-Ranking
Pose-guided Visible Part Matching for Occluded Person ReID
Weakly supervised discriminative feature learning with state information for person identification
3D Point Clouds (Classification/Segmentation/Registration, etc.)
3D Point Cloud Convolution
PointASNL: Robust Point Clouds Processing using Nonlocal Neural Networks with Adaptive Sampling
Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds
Grid-GCN for Fast and Scalable Point Cloud Learning
FPConv: Learning Local Flattening for Point Convolution
3D Point Cloud Classification
PointAugment: an Auto-Augmentation Framework for Point Cloud Classification
3D Point Cloud Semantic Segmentation
RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds
Weakly Supervised Semantic Point Cloud Segmentation: Towards 10X Fewer Labels
PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation
Learning to Segment 3D Point Clouds in 2D Image Space
3D Point Cloud Instance Segmentation
PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation
3D Point Cloud Registration
Feature-metric Registration: A Fast Semi-supervised Approach for Robust Point Cloud Registration without Correspondences
D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features
RPM-Net: Robust Point Matching using Learned Features
3D Point Cloud Completion
Cascaded Refinement Network for Point Cloud Completion
3D Point Cloud Object Tracking
P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds
Other
An Efficient PointLSTM for Point Clouds Based Gesture Recognition
Face
Face Recognition
CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition
Learning Meta Face Recognition in Unseen Domains
Face Detection
Face Anti-Spoofing
Searching Central Difference Convolutional Networks for Face Anti-Spoofing
Facial Expression Recognition
Suppressing Uncertainties for Large-Scale Facial Expression Recognition
Face Frontalization
Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images
3D Face Reconstruction
AvatarMe: Realistically Renderable 3D Facial Reconstruction "in-the-wild"
FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction
Human Pose Estimation (2D/3D)
2D Human Pose Estimation
TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting
HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation
The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation
Distribution-Aware Coordinate Representation for Human Pose Estimation
3D Human Pose Estimation
Cascaded Deep Monocular 3D Human Pose Estimation With Evolutionary Training Data
Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach
Bodies at Rest: 3D Human Pose and Shape Estimation from a Pressure Image using Synthetic Data
Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis
Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation
VIBE: Video Inference for Human Body Pose and Shape Estimation
Back to the Future: Joint Aware Temporal Deep Learning 3D Human Pose Estimation
Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS
Human Parsing
Correlating Edge, Pose with Parsing
Scene Text Detection
STEFANN: Scene Text Editor using Font Adaptive Neural Network
ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection
UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World
ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network
Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection
Scene Text Recognition
SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition
UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World
ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network
Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition
Feature (Point) Detection and Description
SuperGlue: Learning Feature Matching with Graph Neural Networks
Super-Resolution
Image Super-Resolution
Closed-Loop Matters: Dual Regression Networks for Single Image Super-Resolution
Learning Texture Transformer Network for Image Super-Resolution
Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining
Structure-Preserving Super Resolution with Gradient Guidance
Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy
Video Super-Resolution
TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution
Space-Time-Aware Multi-Resolution Video Enhancement
Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution
Model Compression/Pruning
DMCP: Differentiable Markov Channel Pruning for Neural Networks
Forward and Backward Information Retention for Accurate Binary Neural Networks
Towards Efficient Model Compression via Learned Global Ranking
HRank: Filter Pruning using High-Rank Feature Map
GAN Compression: Efficient Architectures for Interactive Conditional GANs
Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression
Video Understanding/Action Recognition
Oops! Predicting Unintentional Action in Video
PREDICT & CLUSTER: Unsupervised Skeleton Based Action Recognition
Intra- and Inter-Action Understanding via Temporal Action Parsing
3DV: 3D Dynamic Voxel for Action Recognition in Depth Video
FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding
TEA: Temporal Excitation and Aggregation for Action Recognition
X3D: Expanding Architectures for Efficient Video Recognition
Temporal Pyramid Network for Action Recognition
Skeleton-Based Action Recognition
Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition
Crowd Counting
Depth Estimation
BiFuse: Monocular 360° Depth Estimation via Bi-Projection Fusion
Focus on defocus: bridging the synthetic to real domain gap for depth estimation
Bi3D: Stereo Depth Estimation via Binary Classifications
AANet: Adaptive Aggregation Network for Efficient Stereo Matching
Towards Better Generalization: Joint Depth-Pose Learning without PoseNet
Monocular Depth Estimation
On the uncertainty of self-supervised monocular depth estimation
3D Packing for Self-Supervised Monocular Depth Estimation
Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation
6D Object Pose Estimation
PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation
MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion
EPOS: Estimating 6D Pose of Objects with Symmetries
G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features
Hand Pose Estimation
HOPE-Net: A Graph-based Model for Hand-Object Pose Estimation
Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data
Saliency Detection
JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection
UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders
Denoising
A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising
CycleISP: Real Image Restoration via Improved Data Synthesis
Deraining
Multi-Scale Progressive Fusion Network for Single Image Deraining
Detail-recovery Image Deraining via Context Aggregation Networks
Deblurring
Video Deblurring
Cascaded Deep Video Deblurring Using Temporal Sharpness Prior
Dehazing
Domain Adaptation for Image Dehazing
Multi-Scale Boosted Dehazing Network with Dense Feature Fusion
Feature Point Detection and Description
ASLFeat: Learning Local Features of Accurate Shape and Localization
Visual Question Answering (VQA)
VC R-CNN: Visual Commonsense R-CNN
Video Question Answering (VideoQA)
Hierarchical Conditional Relation Networks for Video Question Answering
Vision-and-Language Navigation
Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training
Video Compression
Learning for Video Compression with Hierarchical Quality and Recurrent Enhancement
Video Frame Interpolation
AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation
FeatureFlow: Robust Video Interpolation via Structure-to-Texture Generation
Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution
Space-Time-Aware Multi-Resolution Video Enhancement
Scene-Adaptive Video Frame Interpolation via Meta-Learning
Softmax Splatting for Video Frame Interpolation
Style Transfer
Diversified Arbitrary Style Transfer via Deep Feature Perturbation
Collaborative Distillation for Ultra-Resolution Universal Style Transfer
Lane Detection
Inter-Region Affinity Distillation for Road Marking Segmentation
Human-Object Interaction (HOI) Detection
PPDM: Parallel Point Detection and Matching for Real-time Human-Object Interaction Detection
Detailed 2D-3D Joint Representation for Human-Object Interaction
Cascaded Human-Object Interaction Recognition
VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions
Trajectory Prediction
The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction
Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction
Motion Prediction
Collaborative Motion Prediction via Neural Motion Message Passing
MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps
Optical Flow Estimation
Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation
Image Retrieval
Evade Deep Image Retrieval by Stashing Private Images in the Hash Space
Virtual Try-On
Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content
HDR
Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline
Adversarial Examples
Enhancing Cross-Task Black-Box Transferability of Adversarial Examples With Dispersion Reduction
Towards Large yet Imperceptible Adversarial Image Perturbations with Perceptual Color Distance
3D Reconstruction
Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild
Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization
Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion
Depth Completion
Uncertainty-Aware CNNs for Depth Completion: Uncertainty from Beginning to End
Semantic Scene Completion
3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior
Image/Video Captioning
Syntax-Aware Action Targeting for Video Captioning
Wireframe Parsing
Holistically-Attracted Wireframe Parser
Datasets
OASIS: A Large-Scale Dataset for Single Image 3D in the Wild
STEFANN: Scene Text Editor using Font Adaptive Neural Network
Interactive Object Segmentation with Inside-Outside Guidance
Video Panoptic Segmentation
FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation
3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset
TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style
Oops! Predicting Unintentional Action in Video
The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction
Open Compound Domain Adaptation
Intra- and Inter-Action Understanding via Temporal Action Parsing
Dynamic Refinement Network for Oriented and Densely Packed Object Detection
COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification
KeypointNet: A Large-scale 3D Keypoint Dataset Aggregated from Numerous Human Annotations
MSeg: A Composite Dataset for Multi-domain Semantic Segmentation
AvatarMe: Realistically Renderable 3D Facial Reconstruction "in-the-wild"
Learning to Autofocus
FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction
Bodies at Rest: 3D Human Pose and Shape Estimation from a Pressure Image using Synthetic Data
FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding
A Local-to-Global Approach to Multi-modal Movie Scene Segmentation
Deep Homography Estimation for Dynamic Scenes
Assessing Image Quality Issues for Real-World Problems
UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World
PANDA: A Gigapixel-level Human-centric Video Dataset
IntrA: 3D Intracranial Aneurysm Dataset for Deep Learning
Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS
Other
CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus
Learning to Learn Single Domain Generalization
Open Compound Domain Adaptation
Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision
QEBA: Query-Efficient Boundary-Based Blackbox Attack
Equalization Loss for Long-Tailed Object Recognition
Instance-aware Image Colorization
Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting
Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching
Epipolar Transformers
Bringing Old Photos Back to Life
MaskFlownet: Asymmetric Feature Matching with Learnable Occlusion Mask
Self-Supervised Viewpoint Learning from Image Collections
Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations
Towards Learning Structure via Consensus for Face Segmentation and Parsing
Plug-and-Play Algorithms for Large-scale Snapshot Compressive Imaging
Lightweight Photometric Stereo for Facial Details Recovery
Footprints and Free Space from a Single Color Image
Self-Supervised Monocular Scene Flow Estimation
Quasi-Newton Solver for Robust Non-Rigid Registration
A Local-to-Global Approach to Multi-modal Movie Scene Segmentation
DeepFLASH: An Efficient Network for Learning-based Medical Image Registration
Self-Supervised Scene De-occlusion
Polarized Reflection Removal with Perfect Alignment in the Wild
Background Matting: The World is Your Green Screen
What Deep CNNs Benefit from Global Covariance Pooling: An Optimization Perspective
Look-into-Object: Self-supervised Structure Modeling for Object Recognition
Video Object Grounding using Semantic Roles in Language Description
Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives
SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization
On Translation Invariance in CNNs: Convolutional Layers can Exploit Absolute Spatial Location
GhostNet: More Features from Cheap Operations
AdderNet: Do We Really Need Multiplications in Deep Learning?
Deep Image Harmonization via Domain Verification
Blurry Video Frame Interpolation
Extremely Dense Point Correspondences using a Learned Feature Descriptor
Filter Grafting for Deep Neural Networks
Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation
Detecting Attended Visual Targets in Video
Deep Image Spatial Transformation for Person Image Generation
Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications
Acceptance Unconfirmed
FADNet: A Fast and Accurate Network for Disparity Estimation