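In this post we sort and summarize the latest ICCV 2021 papers by research direction, covering object detection, image segmentation, object tracking, medical imaging, 3D, model compression, image processing, pose estimation, text detection, and more. We will also publish explainers and technical livestreams on outstanding papers, so stay tuned.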
[5] Active Learning for Deep Object Detection via Probabilistic Modeling
paper:https://arxiv.org/abs/2103.16130
[4] Detecting Invisible People
paper:https://arxiv.org/abs/2012.08419
project:https://www.cs.cmu.edu/~tkhurana/invisible.htm
video:https://youtu.be/StEfnshXrCE
[3] Conditional Variational Capsule Network for Open Set Recognition
paper:https://arxiv.org/abs/2104.09159
code:https://github.com/guglielmocamporese/cvaecaposr
[2] MDETR: Modulated Detection for End-to-End Multi-Modal Understanding(Oral)
paper:https://arxiv.org/pdf/2104.12763
code:https://github.com/ashkamath/mdetr
project:https://ashkamath.github.io/mdetr_page/
colab:https://colab.research.google.com/github/ashkamath/mdetr/blob/colab/notebooks/MDETR_demo.ipynb
[1] DetCo: Unsupervised Contrastive Learning for Object Detection
paper:https://arxiv.org/abs/2102.04803
code:https://github.com/xieenze/DetCo
[2] Labels4Free: Unsupervised Segmentation using StyleGAN
paper:https://arxiv.org/abs/2103.14968
code:https://rameenabdal.github.io/Labels4Free
project:https://rameenabdal.github.io/Labels4Free/
[1] Mining Latent Classes for Few-shot Segmentation(Oral)
paper:https://arxiv.org/abs/2103.15402
code:https://github.com/LiheYoung/MiningFSS
[2] Crossover Learning for Fast Online Video Instance Segmentation
code:https://github.com/hustvl/CrossVIS
[1] Instances as Queries
paper:https://arxiv.org/abs/2105.01928
code:https://github.com/hustvl/QueryInst
[1] Calibrated Adversarial Refinement for Stochastic Semantic Segmentation
paper:https://arxiv.org/abs/2006.13144
code:https://github.com/EliasKassapis/CARSSS
[3] Rethinking Spatial Dimensions of Vision Transformers
paper:https://arxiv.org/abs/2103.16302
code:https://github.com/naver-ai/pit
[2] Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers(Oral)
paper:https://arxiv.org/pdf/2103.15679.pdf
code:https://github.com/hila-chefer/Transformer-MM-Explainability
[1] Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions(Oral)
paper:https://arxiv.org/abs/2102.12122
code:https://github.com/whai362/PVT
Explainer: Pyramid Vision Transformer (PVT): a versatile backbone for dense prediction
[2] Labels4Free: Unsupervised Segmentation using StyleGAN
paper:https://arxiv.org/abs/2103.14968
code:https://rameenabdal.github.io/Labels4Free
project:https://rameenabdal.github.io/Labels4Free/
[1] EigenGAN: Layer-Wise Eigen-Learning for GANs
paper:https://arxiv.org/abs/2104.12476
code:https://github.com/LynnHo/EigenGAN-Tensorflow
[1] Equivariant Imaging: Learning Beyond the Range Space(Oral)
paper:https://arxiv.org/pdf/2103.14756.pdf
[1] Learning for Scale-Arbitrary Super-Resolution from Scale-Specific Networks
paper:https://arxiv.org/abs/2004.03791
code:https://github.com/LongguangWang/ArbSR
[1] Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts(font generation)
paper:https://arxiv.org/abs/2104.00887
code:https://github.com/clovaai/mxfont
[1] HuMoR: 3D Human Motion Model for Robust Pose Estimation(Oral)
paper:https://geometry.stanford.edu/projects/humor/docs/humor.pdf
video:https://youtu.be/5VWirxUHG0Y
project:https://geometry.stanford.edu/projects/humor/
[1] TransReID: Transformer-based Object Re-Identification
paper:https://arxiv.org/abs/2102.04378
code:https://github.com/heshuting555/TransReID
Explainer: An overwhelming advantage from Transformers: Alibaba and Zhejiang University propose TransReID, leading across ReID tasks
[2] TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization
paper:https://arxiv.org/abs/2103.14862
code:https://github.com/vasgaowei/TS-CAM
[1] Boundary-sensitive Pre-training for Temporal Localization in Videos
paper:https://arxiv.org/abs/2011.10830
[1] COTR: Correspondence Transformer for Matching Across Images
paper:https://arxiv.org/abs/2103.14167
[1] MVTN: Multi-View Transformation Network for 3D Shape Recognition
paper:https://arxiv.org/abs/2011.13244
[1] Detecting Invisible People
paper:https://arxiv.org/abs/2012.08419
project:https://www.cs.cmu.edu/~tkhurana/invisible.htm
video:https://youtu.be/StEfnshXrCE
[1] Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data
paper:https://arxiv.org/abs/2103.16607
code:https://github.com/ElementAI/seasonal-contrast
[1] Unconstrained Scene Generation with Locally Conditioned Radiance Fields
paper:https://arxiv.org/abs/2104.00670
[1] Generative Compositional Augmentations for Scene Graph Prediction
paper:https://arxiv.org/abs/2007.05756
code:https://github.com/bknyaz/sgg
[1] MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks
paper:https://arxiv.org/abs/2103.06132
[1] Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning
paper:https://arxiv.org/abs/2101.10030
code:https://github.com/tianyu0207/RTFM
[1] In-Place Scene Labelling and Understanding with Implicit Scene Representation(Oral)
paper:https://arxiv.org/abs/2103.15875
project:https://shuaifengzhi.com/Semantic-NeRF/
[2] Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data
paper:https://arxiv.org/abs/2103.16607
code:https://github.com/ElementAI/seasonal-contrast
[1] Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling
paper:https://arxiv.org/abs/2105.12441
[1] Learning with Memory-based Virtual Classes for Deep Metric Learning
paper:https://arxiv.org/abs/2103.16940
[1] Always Be Dreaming: A New Approach for Data-Free Class-Incremental Learning
paper:https://arxiv.org/abs/2106.09701
code:https://github.com/GT-RIPL/AlwaysBeDreaming-DFCIL
project:https://jamessealesmith.github.io/project/dfcil/
[1] CoMatch: Semi-supervised Learning with Contrastive Graph Regularization
paper:https://arxiv.org/abs/2011.11183
code:https://github.com/salesforce/CoMatch
[1] Active Learning for Deep Object Detection via Probabilistic Modeling
paper:https://arxiv.org/abs/2103.16130
[2] On the hidden treasure of dialog in video question answering
paper:https://arxiv.org/abs/2103.14517
[1] Just Ask: Learning to Answer Questions from Millions of Narrated Videos(Oral)
paper:https://arxiv.org/abs/2012.00451
code:https://github.com/antoyang/just-ask
project:https://antoyang.github.io/just-ask.html
[1] 4DComplete: Non-Rigid Motion Estimation Beyond the Observable Surface(4D reconstruction)
paper:https://arxiv.org/abs/2105.01905
dataset:https://github.com/rabbityl/DeformingThings4D
video:https://youtu.be/QrSsVoTRpWk
Pathdreamer: A World Model for Indoor Navigation(visual navigation)
paper:https://arxiv.org/abs/2105.08756
iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis
paper:https://arxiv.org/abs/2107.02790
code:https://github.com/CompVis/ipoke
project:https://compvis.github.io/ipoke/
Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis
paper:https://arxiv.org/abs/2104.00677
project:https://www.ajayj.com/dietnerf
KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs
paper:https://arxiv.org/abs/2103.13744
code:https://github.com/creiser/kilonerf