Author: Zhu Zheng (朱政)
Original: CV arXiv Daily: Daily Computer Vision Paper Picks (2019/1/23-2019/1/28)
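This series is reposted, with permission, from CV arxiv Daily, the WeChat official account run by computer vision researcher Zhu Zheng. It aims to help readers filter each day's computer vision papers on arXiv. Main areas covered: object detection, image segmentation, single/multi-object tracking, action recognition, human pose estimation and tracking, person re-identification, GANs, architecture search, and more. Feel free to follow; reposts go out on a regular daily schedule.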
[1] Google's paper on self-supervised representation learning
Revisiting Self-Supervised Visual Representation Learning
Paper link: https://arxiv.org/abs/1901.09005
Code: https://github.com/google/revisiting-self-supervised
Abstract: Unsupervised visual representation learning remains a largely unsolved problem in computer vision research. Among a big body of recently proposed approaches for unsupervised learning of visual representations, a class of self-supervised techniques achieves superior performance on many challenging benchmarks. A large number of the pretext tasks for self-supervised learning have been studied, but other important aspects, such as the choice of convolutional neural networks (CNN), have not received equal attention. Therefore, we revisit numerous previously proposed self-supervised models, conduct a thorough large scale study and, as a result, uncover multiple crucial insights. We challenge a number of common practices in self-supervised visual representation learning and observe that standard recipes for CNN design do not always translate to self-supervised representation learning. As part of our study, we drastically boost the performance of previously proposed techniques and outperform previously published state-of-the-art results by a large margin.
[2] ICLR 2019 GAN paper
Diversity-Sensitive Conditional Generative Adversarial Networks
Paper link: https://arxiv.org/abs/1901.09024
Abstract: We propose a simple yet highly effective method that addresses the mode-collapse problem in the Conditional Generative Adversarial Network (cGAN). Although conditional distributions are multi-modal (i.e., having many modes) in practice, most cGAN approaches tend to learn an overly simplified distribution where an input is always mapped to a single output regardless of variations in latent code. To address this issue, we propose to explicitly regularize the generator to produce diverse outputs depending on latent codes. The proposed regularization is simple, general, and can be easily integrated into most conditional GAN objectives. Additionally, explicit regularization on the generator allows our method to control a balance between visual quality and diversity. We demonstrate the effectiveness of our method on three conditional generation tasks: image-to-image translation, image inpainting, and future video prediction. We show that simple addition of our regularization to existing models leads to surprisingly diverse generations, substantially outperforming the previous approaches for multi-modal conditional generation specifically designed in each individual task.
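As a rough illustration of the regularization idea in the abstract above, the sketch below computes a diversity term that rewards a generator whose outputs change when the latent code changes. The distance ratio, the clipping bound `tau`, and the two toy generators are assumptions made for this example and are not necessarily the exact formulation used in the paper.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def diversity_term(generator, condition, z1, z2, tau=10.0):
    """Ratio of output distance to latent distance, clipped at tau (assumed bound).

    A generator that ignores the latent code scores 0; a generator whose
    outputs move with the latent code scores higher.
    """
    out1 = generator(condition, z1)
    out2 = generator(condition, z2)
    latent_gap = l2_distance(z1, z2)
    if latent_gap == 0.0:
        return 0.0
    return min(l2_distance(out1, out2) / latent_gap, tau)

# Hypothetical toy generators standing in for a cGAN generator.
def collapsed_generator(condition, z):
    return [2.0 * c for c in condition]                    # ignores z entirely

def diverse_generator(condition, z):
    return [2.0 * c + zi for c, zi in zip(condition, z)]   # output shifts with z

condition = [0.5, -1.0]
z1, z2 = [0.0, 0.0], [3.0, 4.0]
collapsed_score = diversity_term(collapsed_generator, condition, z1, z2)
diverse_score = diversity_term(diverse_generator, condition, z1, z2)
```

In training, a term like this would be maximized alongside the usual cGAN objective, with a weight that trades visual quality against diversity.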
[3] Paper on Q-learning for Dou Di Zhu from Prof. Lu Cewu (卢策吾) of Shanghai Jiao Tong University
Combinational Q-Learning for Dou Di Zhu
Paper link: https://arxiv.org/abs/1901.08925
Code: https://github.com/qq456cvb/doudizhu-C
Abstract: Deep reinforcement learning (DRL) has gained a lot of attention in recent years, and has been proven to be able to play Atari games and Go at or above human levels. However, those games are assumed to have a small fixed number of actions and could be trained with a simple CNN network. In this paper, we study a special class of popular Asian card games called Dou Di Zhu, in which two adversarial groups of agents must consider numerous card combinations at each time step, leading to a huge number of actions. We propose a novel method to handle combinatorial actions, which we call combinational Q-learning (CQL). We employ a two-stage network to reduce the action space and also leverage order-invariant max-pooling operations to extract relationships between primitive actions. Results show that our method prevails over state-of-the-art methods like naive Q-learning and A3C. We develop an easy-to-use card game environment and train all agents adversarially from scratch, with only knowledge of game rules, and verify that our agents are comparable to humans. Our code to reproduce all reported results will be available online.
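The "order-invariant max-pooling" mentioned in the abstract can be illustrated with a small sketch: taking an element-wise maximum over a set of feature vectors gives the same result no matter how the set is ordered. The per-card embeddings below are made-up numbers, and the function is a generic illustration rather than the network layer used in the paper.

```python
from itertools import permutations

def max_pool_over_set(feature_vectors):
    """Element-wise max over a collection of equal-length feature vectors."""
    return [max(column) for column in zip(*feature_vectors)]

# Hypothetical embeddings for three primitive actions (cards).
card_features = [
    [0.2, 1.5, -0.3],
    [0.9, 0.1, 0.4],
    [-0.5, 0.7, 2.0],
]

pooled = max_pool_over_set(card_features)

# Every ordering of the same cards pools to the same vector.
all_orders_agree = all(
    max_pool_over_set(list(order)) == pooled
    for order in permutations(card_features)
)
```

This property matters for card combinations because a hand is a set: the representation of a combination should not depend on the order in which its cards are listed.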
[4] WACV 2019 paper on 3D point clouds
Dense 3D Point Cloud Reconstruction Using a Deep Pyramid Network
Paper link: https://arxiv.org/abs/1901.08906
Abstract: Reconstructing a high-resolution 3D model of an object is a challenging task in computer vision. Designing scalable and light-weight architectures is crucial while addressing this problem. Existing point-cloud based reconstruction approaches directly predict the entire point cloud in a single stage. Although this technique can handle low-resolution point clouds, it is not a viable solution for generating dense, high-resolution outputs. In this work, we introduce DensePCR, a deep pyramidal network for point cloud reconstruction that hierarchically predicts point clouds of increasing resolution. Towards this end, we propose an architecture that first predicts a low-resolution point cloud, and then hierarchically increases the resolution by aggregating local and global point features to deform a grid. Our method generates point clouds that are accurate, uniform and dense. Through extensive quantitative and qualitative evaluation on synthetic and real datasets, we demonstrate that DensePCR outperforms the existing state-of-the-art point cloud reconstruction works, while also providing a light-weight and scalable architecture for predicting high-resolution outputs.
[5] Paper on Multi-Target Multi-Camera Tracking
Multiple Hypothesis Tracking Algorithm for Multi-Target Multi-Camera Tracking with Disjoint Views
Paper link: https://arxiv.org/abs/1901.08787
Abstract: In this study, a multiple hypothesis tracking (MHT) algorithm for multi-target multi-camera tracking (MCT) with disjoint views is proposed. Our method forms track-hypothesis trees, and each branch of them represents a multi-camera track of a target that may move within a camera as well as move across cameras. Furthermore, multi-target tracking within a camera is performed simultaneously with the tree formation by manipulating a status of each track hypothesis. Each status represents one of three different stages of a multi-camera track: tracking, searching, and end-of-track. The tracking status means targets are tracked by a single camera tracker. In the searching status, the disappeared targets are examined to see whether they reappear in other cameras. The end-of-track status means the target is considered to have exited the camera network due to its lengthy invisibility. These three statuses assist MHT in forming the track-hypothesis trees for multi-camera tracking. Furthermore, we present a gating technique for eliminating unlikely observation-to-track associations. In the experiments, we evaluate the proposed method using two datasets, DukeMTMC and NLPR-MCT, and demonstrate that it outperforms the state-of-the-art method in terms of accuracy. In addition, we show that the proposed method can operate in real-time and online.
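The three statuses described in the abstract can be read as a small state machine. The sketch below follows a single track hypothesis only; the transition rules and the `max_invisible` threshold are assumptions for illustration, and the actual method maintains whole trees of such hypotheses.

```python
from enum import Enum

class Status(Enum):
    TRACKING = "tracking"
    SEARCHING = "searching"
    END_OF_TRACK = "end-of-track"

def next_status(status, observed, frames_invisible, max_invisible=100):
    """Return the next status of one track hypothesis.

    observed: whether the target is matched to an observation in this frame.
    frames_invisible: consecutive frames without a match, including this one.
    max_invisible: assumed threshold for declaring the target gone.
    """
    if status is Status.END_OF_TRACK:
        return Status.END_OF_TRACK      # terminal status
    if observed:
        return Status.TRACKING          # seen in some camera again
    if frames_invisible > max_invisible:
        return Status.END_OF_TRACK      # invisible for too long
    return Status.SEARCHING             # look for a reappearance

def run(observations, max_invisible=100):
    """Apply next_status frame by frame and return the status history."""
    status, invisible, history = Status.TRACKING, 0, []
    for observed in observations:
        invisible = 0 if observed else invisible + 1
        status = next_status(status, observed, invisible, max_invisible)
        history.append(status)
    return history

history = run([True, False, False, True, False, False, False, False],
              max_invisible=3)
```

With `max_invisible=3`, the hypothesis switches to searching while the target is missing, returns to tracking when it reappears, and ends only after four consecutive missed frames.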
[6] One-Class CNN paper
One-Class Convolutional Neural Network
Paper link: https://arxiv.org/abs/1901.08688
Code: github.com/otkupjnoz/oc-cnn
Abstract: We present a novel Convolutional Neural Network (CNN) based approach for one class classification. The idea is to use a zero centered Gaussian noise in the latent space as the pseudo-negative class and train the network using the cross-entropy loss to learn a good representation as well as the decision boundary for the given class. A key feature of the proposed approach is that any pre-trained CNN can be used as the base network for one class classification. The proposed One Class CNN (OC-CNN) is evaluated on the UMDAA-02 Face, Abnormality-1001, and FounderType-200 datasets. These datasets are related to a variety of one class application problems such as user authentication, abnormality detection and novelty detection. Extensive experiments demonstrate that the proposed method achieves significant improvements over the recent state-of-the-art methods. The source code is available at: github.com/otkupjnoz/oc-cnn.
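A minimal sketch of the training setup described in the abstract: latent features of the one known class are labeled positive, zero-centered Gaussian noise vectors are labeled negative, and a classifier is scored with cross-entropy. The feature values, the noise scale `sigma`, and the hand-picked logistic classifier weights are all assumptions for illustration; the paper trains a network on top of a pre-trained CNN.

```python
import math
import random

def make_batch(class_features, sigma=0.1, seed=0):
    """Pair each real feature with a zero-centered Gaussian pseudo-negative."""
    rng = random.Random(seed)
    dim = len(class_features[0])
    negatives = [[rng.gauss(0.0, sigma) for _ in range(dim)]
                 for _ in class_features]
    features = class_features + negatives
    labels = [1] * len(class_features) + [0] * len(negatives)
    return features, labels

def cross_entropy(features, labels, weights, bias):
    """Mean binary cross-entropy of a logistic classifier."""
    total = 0.0
    for x, y in zip(features, labels):
        logit = sum(w * xi for w, xi in zip(weights, x)) + bias
        p = 1.0 / (1.0 + math.exp(-logit))
        p = min(max(p, 1e-12), 1.0 - 1e-12)   # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(features)

# Hypothetical latent features of the single known class.
class_features = [[2.0, 1.8], [2.2, 2.1], [1.9, 2.3]]
features, labels = make_batch(class_features)

# An uninformative classifier versus one that separates class features from noise.
loss_untrained = cross_entropy(features, labels, weights=[0.0, 0.0], bias=0.0)
loss_separating = cross_entropy(features, labels, weights=[3.0, 3.0], bias=-6.0)
```

The separating classifier reaches a much lower loss than the uninformative one, which is the signal a training loop would follow to learn a decision boundary around the known class.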
[7] In Defense of the Triplet Loss paper
In Defense of the Triplet Loss for Visual Recognition
Paper link: https://arxiv.org/abs/1901.08616
Abstract: We employ triplet loss as a space embedding regularizer to boost classification performance. Standard architectures, like ResNet and DenseNet, are extended to support both losses with minimal hyper-parameter tuning. This promotes generality while fine-tuning pretrained networks. Triplet loss is a powerful surrogate for recently proposed embedding regularizers. Yet, it is avoided for its large batch-size requirement and high computational cost. Through our experiments, we re-assess these assumptions. During inference, our network supports both classification and embedding tasks without any computational overhead. Quantitative evaluation highlights how our approach compares favorably to the existing state of the art on multiple fine-grained recognition datasets. Further evaluation on an imbalanced video dataset achieves significant improvement (>7%). Beyond boosting efficiency, triplet loss brings retrieval and interpretability to classification models.
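To make "both losses" concrete, here is a minimal sketch that adds a standard margin-based triplet loss on an embedding to a softmax cross-entropy loss on class logits. The margin, the weighting factor `weight`, and the example vectors are assumptions; how the paper samples triplets and balances the two terms is not shown here.

```python
import math

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on the gap between anchor-positive and anchor-negative distances."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)

def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against an integer class label."""
    peak = max(logits)                      # subtract the max for stability
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[label]

def joint_loss(logits, label, anchor, positive, negative,
               weight=1.0, margin=0.2):
    """Classification loss plus a weighted triplet term on the embedding."""
    return softmax_cross_entropy(logits, label) + weight * triplet_loss(
        anchor, positive, negative, margin)

# Hypothetical embeddings and logits.
anchor, positive, negative = [0.0, 0.0], [0.1, 0.0], [1.0, 0.0]
easy = triplet_loss(anchor, positive, negative)   # negative already far away
hard = triplet_loss(anchor, negative, positive)   # roles swapped: margin violated
total = joint_loss([2.0, 0.0], 0, anchor, negative, positive)
```

The triplet term is zero when same-class embeddings are already closer than different-class ones by the margin, so it only contributes gradient for triplets that violate that ordering.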
Summary of the SiamRPN series of papers
[0] The SiamFC paper. It improves on SINT (Siamese Instance Search for Tracking, CVPR 2016) and is the first paper to propose a fully-convolutional Siamese network architecture for the tracking problem. It can be viewed as a SiamRPN with only one anchor.
Paper title: Fully-convolutional siamese networks for object tracking
Paper link: https://arxiv.org/abs/1606.09549
Project page: https://www.robots.ox.ac.uk/~luca/siamese-fc.html
TensorFlow implementation: https://github.com/torrvision/siamfc-tf
PyTorch implementation: https://github.com/rafellerc/Pytorch-SiamFC
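The core operation in a fully-convolutional Siamese tracker is a cross-correlation between the features of the target template and the features of a larger search region; the peak of the resulting score map gives the target location. The sketch below uses tiny single-channel integer maps as stand-ins, which is an assumption for readability; real trackers correlate multi-channel CNN features.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` (valid positions only); return the score map."""
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    scores = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            row.append(sum(
                search[i + di][j + dj] * template[di][dj]
                for di in range(th) for dj in range(tw)
            ))
        scores.append(row)
    return scores

def argmax_2d(score_map):
    """Return (row, col) of the highest score."""
    return max(
        ((i, j) for i in range(len(score_map))
         for j in range(len(score_map[0]))),
        key=lambda ij: score_map[ij[0]][ij[1]],
    )

# Hypothetical feature maps: the template pattern sits at rows 1-2, cols 2-3.
template = [[1, 2],
            [3, 4]]
search = [[0, 0, 0, 0],
          [0, 0, 1, 2],
          [0, 0, 3, 4],
          [0, 0, 0, 0]]

score_map = cross_correlate(search, template)
best = argmax_2d(score_map)
```

Here the score map is 3x3 and peaks at position (1, 2), the top-left corner of where the template pattern appears in the search region.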
[0.1] The later v2 version, CFNet, replaces the correlation operation with a correlation filter (CF) operation.
Paper title: End-To-End Representation Learning for Correlation Filter Based Tracking
Paper link: http://openaccess.thecvf.com/content_cvpr_2017/html/Valmadre_End-To-End_Representation_Learning_CVPR_2017_paper.html
Project page: http://www.robots.ox.ac.uk/~luca/cfnet.html
MatConvNet implementation: https://github.com/bertinetto/cfnet
Many follow-up works have improved on SiamFC, for example:
[0.2] StructSiam, which considers local structures during tracking
Paper title: Structured Siamese Network for Real-Time Visual Tracking
Paper link: http://openaccess.thecvf.com/content_ECCV_2018/papers/Yunhua_Zhang_Structured_Siamese_Network_ECCV_2018_paper.pdf
[0.3] SiamFC-tri, which introduces a triplet loss into the Siamese tracking network
Paper title: Triplet Loss in Siamese Network for Object Tracking
Paper link: http://openaccess.thecvf.com/content_ECCV_2018/papers/Xingping_Dong_Triplet_Loss_with_ECCV_2018_paper.pdf
[0.4] DSiam, a dynamic Siamese network
Paper title: Learning Dynamic Siamese Network for Visual Object Tracking
Paper link: http://openaccess.thecvf.com/content_ICCV_2017/papers/Guo_Learning_Dynamic_Siamese_ICCV_2017_paper.pdf
Code: https://github.com/tsingqguo/DSiam
[0.5] SA-Siam, a twofold Siamese network
Paper title: A Twofold Siamese Network for Real-Time Object Tracking
Paper link: http://openaccess.thecvf.com/content_cvpr_2018/papers/He_A_Twofold_Siamese_CVPR_2018_paper.pdf
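[1] The SiamRPN paper. It applies anchors at every position of the candidate region and performs classification and regression simultaneously, treating tracking as one-shot local detection.
Paper title: High Performance Visual Tracking with Siamese Region Proposal Network
Paper link: http://openaccess.thecvf.com/content_cvpr_2018/papers/Li_High_Performance_Visual_CVPR_2018_paper.pdf
Project page: http://bo-li.info/SiamRPN/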
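"Applying anchors at every position" can be sketched as follows: each cell of the score map gets one box per aspect ratio, all with the same area, and a tracker would then score and refine every box. The map size, stride, base size, and ratios below are assumed values chosen for illustration, not the settings reported in the paper.

```python
import math

def make_anchors(map_size, stride, base_size, ratios):
    """Return (cx, cy, w, h) anchors centered on every cell of a square score map.

    Each ratio is height/width; the area of every anchor stays base_size**2.
    """
    anchors = []
    offset = -(map_size // 2) * stride      # center the grid on the origin
    for row in range(map_size):
        for col in range(map_size):
            cx = offset + col * stride
            cy = offset + row * stride
            for ratio in ratios:
                w = base_size / math.sqrt(ratio)
                h = base_size * math.sqrt(ratio)
                anchors.append((cx, cy, w, h))
    return anchors

# Assumed values for illustration only.
anchors = make_anchors(map_size=3, stride=8, base_size=64,
                       ratios=[0.5, 1.0, 2.0])
center_square = anchors[13]   # center cell (row 1, col 1), ratio 1.0
```

A 3x3 map with three ratios yields 27 anchors. In an RPN-style tracker, a classification branch assigns each anchor a foreground score and a regression branch predicts offsets that adjust its center and size.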
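[2] DaSiamRPN, a follow-up to the SiamRPN paper. It focuses on the problem of sample imbalance during training, adding more categories of positive samples as well as semantic negative samples.
Paper title: Distractor-aware Siamese Networks for Visual Object Tracking
Paper link: https://arxiv.org/abs/1808.06048
Project page: http://bo-li.info/DaSiamRPN/
Test code: https://github.com/foolwood/DaSiamRPN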
[3] Cascaded SiamRPN, which cascades several RPN modules and also uses features from different layers.
Paper title: Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking
Paper link: https://arxiv.org/abs/1812.06148
[4] SiamMask, which adds a mask branch to the SiamRPN architecture to perform tracking and video segmentation at the same time.
Paper title: Fast Online Object Tracking and Segmentation: A Unifying Approach
Paper link: https://arxiv.org/abs/1812.05050
Project page: http://www.robots.ox.ac.uk/~qwang/SiamMask/
[5] SiamRPN++, a follow-up to the SiamRPN paper. It makes modern networks such as ResNet work for tracking and is SOTA on essentially all datasets.
Paper title: SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks
Paper link: https://arxiv.org/abs/1812.11703
Project page: http://bo-li.info/SiamRPN++/
[6] Deeper and Wider SiamRPN, which makes the network deeper and wider to improve performance, with a focus on the effects of receptive field and padding.
Paper title: Deeper and Wider Siamese Networks for Real-Time Visual Tracking
Paper link: https://arxiv.org/abs/1901.01660
Test code: https://gitlab.com/MSRA_NLPR/deeper_wider_siamese_trackers
[1] Salient object detection paper
Deep Reasoning with Multi-scale Context for Salient Object Detection
Paper link: https://arxiv.org/abs/1901.08362
[2] Survey on anomaly detection in traffic scenes
Anomaly Detection in Road Traffic Using Visual Surveillance: A Survey
Paper link: https://arxiv.org/abs/1901.08292
[3] 3D object detection
3D Backbone Network for 3D Object Detection
Paper link: https://arxiv.org/abs/1901.08373
[4] Semantic segmentation paper
Application of Decision Rules for Handling Class Imbalance in Semantic Segmentation
Paper link: https://arxiv.org/abs/1901.08394
[5] Object detection paper
Object Detection based on Region Decomposition and Assembly
Paper link: https://arxiv.org/abs/1901.08225
[6] Oxford's graph convolutional network paper
Hypergraph Convolution and Hypergraph Attention
Paper link: https://arxiv.org/abs/1901.08150
[1] Technical report on JD's runner-up solution for PoseTrack 2018
A Top-down Approach to Articulated Human Pose Estimation and Tracking
Paper link: https://arxiv.org/abs/1901.07680
[2] Network compression paper submitted to TNNLS
Towards Compact ConvNets via Structure-Sparsity Regularized Filter Pruning
Paper link: https://arxiv.org/abs/1901.07827
Code: https://github.com/ShaohuiLin/SSR
[3] DeepFashion dataset from CUHK & SenseTime
DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images
Paper link: https://arxiv.org/abs/1901.07973
Code: https://github.com/switchablenorms/DeepFashion2
[4] Object detection paper
Bottom-up Object Detection by Grouping Extreme and Center Points
Paper link: https://arxiv.org/abs/1901.08043
Code: https://github.com/xingyizhou/ExtremeNet
[1] SenseTime's paper on its winning entry for the COCO 2018 detection task
Hybrid Task Cascade for Instance Segmentation (winning entry of the COCO 2018 Challenge, object detection task)
https://arxiv.org/abs/1901.07518
[2] Xiaomi's technical report on super-resolution with NAS
Fast, Accurate and Lightweight Super-Resolution with Neural Architecture Search
https://arxiv.org/abs/1901.07261
[3] Object detection paper
Consistent Optimization for Single-Shot Object Detection
https://arxiv.org/abs/1901.06563
[4] SenseTime's paper on imbalanced-data classification
Dynamic Curriculum Learning for Imbalanced Data Classification
https://arxiv.org/abs/1901.06783
[5] Face detection paper
Improved Selective Refinement Network for Face Detection
https://arxiv.org/abs/1901.06651
[6] Megvii's retail product dataset
RPC: A Large-Scale Retail Product Checkout Dataset
https://arxiv.org/abs/1901.07249
[7] Survey on pedestrian attribute recognition
Pedestrian Attribute Recognition: A Survey
https://arxiv.org/abs/1901.07474
Project page: https://sites.google.com/view/ahu-pedestrianattributes/