CVPR 2019 | 14 New Papers Released Today (from Microsoft, SenseTime, Tencent, Stanford, and Others)




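Fourteen new CVPR 2019 papers appeared today. CV Jun has compiled brief information on each, with code links listed where available; interested readers can download the papers for a closer read using the instructions at the end of this article.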

Video Generation from Single Semantic Label Map

Junting Pan, Chengyu Wang, Xu Jia, Jing Shao, Lu Sheng, Junjie Yan, Xiaogang Wang

From SenseTime: a new generation task, generating video from a single semantic label map, a step beyond vid2vid.


This paper proposes the novel task of video generation conditioned on a SINGLE semantic label map, which provides a good balance between flexibility and quality in the generation process. Different from typical end-to-end approaches, which model both scene content and dynamics in a single step, we propose to decompose this difficult task into two sub-problems. As current image generation methods do better than video generation in terms of detail, we synthesize high quality content by only generating the first frame. Then we animate the scene based on its semantic meaning to obtain the temporally coherent video, giving us excellent results overall. We employ a cVAE for predicting optical flow as a beneficial intermediate step to generate a video sequence conditioned on the initial single frame. A semantic label map is integrated into the flow prediction module to achieve major improvements in the image-to-video generation process. Extensive experiments on the Cityscapes dataset show that our method outperforms all competing methods.

https://arxiv.org/abs/1903.04480

https://github.com/junting/seg2vid
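
The abstract describes generating only the first frame and then animating it with predicted optical flow. The cVAE that predicts the flow is beyond a short example, but the flow-to-frame step can be sketched as plain backward warping. This is a minimal, hypothetical sketch (function names are ours), assuming each predicted flow maps a future frame back to the first frame and using nearest-neighbour sampling; the authors' seg2vid code may warp and refine frames differently.

```python
# Hypothetical helpers: a toy version of "animate the first frame with optical flow".
def warp_backward(frame, flow):
    """Backward-warp a frame (H x W list of lists of pixel values).

    flow[y][x] = (dx, dy) means output pixel (x, y) copies the input pixel at
    (x + dx, y + dy). Uses nearest-neighbour sampling and clamps at the borders.
    """
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx = min(max(int(round(x + dx)), 0), w - 1)
            sy = min(max(int(round(y + dy)), 0), h - 1)
            out[y][x] = frame[sy][sx]
    return out


def animate(first_frame, flows):
    """Build a clip from one frame, assuming each flow maps frame t back to frame 0."""
    return [first_frame] + [warp_backward(first_frame, f) for f in flows]


# Usage: shift a 1 x 4 frame one pixel to the right.
clip = animate([[1, 2, 3, 4]], [[[(-1, 0)] * 4]])
print(clip[1])  # [[1, 1, 2, 3]]
```

Backward warping is used here because every output pixel is guaranteed a value, whereas forward splatting can leave holes.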

Refine and Distill: Exploiting Cycle-Inconsistency and Knowledge Distillation for Unsupervised Monocular Depth Estimation

Andrea Pilzer, Stéphane Lathuilière, Nicu Sebe, Elisa Ricci

Refinement and knowledge distillation for unsupervised monocular depth estimation.


Nowadays, the majority of state-of-the-art monocular depth estimation techniques are based on supervised deep learning models. However, collecting RGB images with associated depth maps is a very time-consuming procedure. Therefore, recent works have proposed deep architectures for addressing the monocular depth prediction task as a reconstruction problem, thus avoiding the need to collect ground-truth depth. Following these works, we propose a novel self-supervised deep model for estimating depth maps. Our framework exploits two main strategies: refinement via cycle-inconsistency and distillation. Specifically, first a student network is trained to predict a disparity map so as to recover, from a frame in one camera view, the associated image in the opposite view. Then, a backward cycle network is applied to the generated image to re-synthesize back the input image, estimating the opposite disparity. A third network exploits the inconsistency between the original and the reconstructed input frame in order to output a refined depth map. Finally, knowledge distillation is exploited so as to transfer information from the refinement network to the student. Our extensive experimental evaluation demonstrates the effectiveness of the proposed framework, which outperforms state-of-the-art unsupervised methods on the KITTI benchmark.

https://arxiv.org/abs/1903.04202
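
The student's training signal can be illustrated on a single image row: the left view is re-synthesized by sampling the right view at x minus the predicted disparity, and a distillation term pulls the student's disparity toward the refined network's output. This is a toy sketch under stated assumptions: the disparity sign convention, the L1 losses, and the weight `alpha` are our choices, not necessarily the paper's, and all function names are hypothetical.

```python
def sample_row(row, pos):
    """Linearly interpolate a 1D row at a fractional position, clamped to the row."""
    pos = min(max(pos, 0.0), len(row) - 1.0)
    i = int(pos)
    j = min(i + 1, len(row) - 1)
    t = pos - i
    return (1.0 - t) * row[i] + t * row[j]


def reconstruct_left(right_row, left_disp):
    """Synthesize the left view by sampling the right view at x - d(x) (assumed convention)."""
    return [sample_row(right_row, x - d) for x, d in enumerate(left_disp)]


def l1(a, b):
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def student_loss(left_row, right_row, student_disp, refined_disp, alpha=0.5):
    # Self-supervised term: the synthesized left view should match the real one.
    recon = l1(reconstruct_left(right_row, student_disp), left_row)
    # Distillation term: the refined disparity acts as a fixed teacher target.
    distill = l1(student_disp, refined_disp)
    return recon + alpha * distill


right = [0.0, 10.0, 20.0, 30.0, 40.0]
left = [0.0, 0.0, 10.0, 20.0, 30.0]  # the same row shifted by one pixel
print(student_loss(left, right, [1.0] * 5, [2.0] * 5))  # 0.5
```

Here the student's disparity already reconstructs the left view perfectly, so the remaining loss comes only from disagreeing with the (hypothetical) refined teacher.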

Structured Knowledge Distillation for Semantic Segmentation

Yifan Liu, Ke Chen, Chris Liu, Zengchang Qin, Zhenbo Luo, Jingdong Wang

Structured knowledge distillation for semantic segmentation; see our other article today for a detailed walkthrough.


In this paper, we investigate the knowledge distillation strategy for training small semantic segmentation networks by making use of large networks. We start from the straightforward scheme, pixel-wise distillation, which applies the distillation scheme adopted for image classification and performs knowledge distillation for each pixel separately. We further propose to distill the structured knowledge from large networks to small networks, motivated by the fact that semantic segmentation is a structured prediction problem. We study two structured distillation schemes: (i) pair-wise distillation that distills the pairwise similarities, and (ii) holistic distillation that uses GAN to distill holistic knowledge. The effectiveness of our knowledge distillation approaches is demonstrated by extensive experiments on three scene parsing datasets: Cityscapes, Camvid and ADE20K.

https://arxiv.org/abs/1903.04197
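
Pair-wise distillation compares which spatial locations the teacher and the student consider similar, rather than matching their raw features. A minimal sketch, assuming cosine similarity between per-location feature vectors and a mean squared difference between the two affinity matrices; the function names are hypothetical. Because the affinity matrix is N x N for N locations regardless of channel count, the student can have fewer channels than the teacher.

```python
import math


def cosine_affinity(feats):
    """feats: list of N per-location feature vectors -> N x N cosine-similarity matrix."""
    norms = [math.sqrt(sum(v * v for v in f)) or 1.0 for f in feats]  # guard zero vectors
    n = len(feats)
    return [[sum(a * b for a, b in zip(feats[i], feats[j])) / (norms[i] * norms[j])
             for j in range(n)] for i in range(n)]


def pairwise_distill_loss(student_feats, teacher_feats):
    """Mean squared difference between student and teacher affinity matrices."""
    a_s = cosine_affinity(student_feats)
    a_t = cosine_affinity(teacher_feats)
    n = len(a_s)
    return sum((a_s[i][j] - a_t[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


teacher = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # 3 channels
student = [[2.0, 0.0], [0.0, 3.0]]            # 2 channels, same similarity structure
print(pairwise_distill_loss(student, teacher))  # 0.0
```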

HetConv: Heterogeneous Kernel-Based Convolutions for Deep CNNs

Pravendra Singh, Vinay Kumar Verma, Piyush Rai, Vinay P. Namboodiri

Proposes a new deep learning architecture built on heterogeneous convolution kernels; this feels like it could be very influential!


We present a novel deep learning architecture in which the convolution operation leverages heterogeneous kernels. The proposed HetConv (Heterogeneous Kernel-Based Convolution) reduces the computation (FLOPs) and the number of parameters as compared to the standard convolution operation while still maintaining representational efficiency. To show the effectiveness of our proposed convolution, we present extensive experimental results on standard convolutional neural network (CNN) architectures such as VGG and ResNet. We find that after replacing the standard convolutional filters in these architectures with our proposed HetConv filters, we achieve a 3X to 8X FLOPs-based improvement in speed while still maintaining (and sometimes improving) the accuracy. We also compare our proposed convolutions with group/depthwise convolutions and show that HetConv achieves more FLOPs reduction with significantly higher accuracy.

https://arxiv.org/abs/1903.04120
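
The FLOPs saving can be worked out per layer. Assuming, as we read HetConv, that each filter applies K x K kernels to M/P of its M input channels and 1 x 1 kernels to the remaining channels, its cost relative to a standard convolution is 1/P + (1 - 1/P)/K^2. The helpers below (names hypothetical) check this arithmetic for one layer; the 3X to 8X speedups in the abstract are measured on whole networks.

```python
def standard_conv_flops(d_out, m, n, k):
    """Multiply-accumulates of a standard k x k conv: m in, n out channels, d_out x d_out output."""
    return d_out * d_out * m * n * k * k


def hetconv_flops(d_out, m, n, k, p):
    """HetConv layer as we read it: per filter, m/p channels get k x k kernels, the rest 1 x 1."""
    if m % p:
        raise ValueError("m must be divisible by p")
    fl_k = d_out * d_out * (m // p) * n * k * k  # k x k kernels
    fl_1 = d_out * d_out * (m - m // p) * n      # 1 x 1 kernels
    return fl_k + fl_1


def theoretical_ratio(k, p):
    return 1.0 / p + (1.0 - 1.0 / p) / (k * k)


ratio = hetconv_flops(56, 64, 64, 3, 4) / standard_conv_flops(56, 64, 64, 3)
print(round(ratio, 4), round(theoretical_ratio(3, 4), 4))  # 0.3333 0.3333
```

With K = 3 and P = 4 a single layer needs about a third of the FLOPs; P = 1 recovers a standard convolution.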

Sliced Wasserstein Discrepancy for Unsupervised Domain Adaptation

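Chen-Yu Lee, Tanmay Batra, Mohammad Haris Baig, Daniel Ulbricht

Sliced Wasserstein Discrepancy, a new method for unsupervised domain adaptation that improves results on a variety of computer vision tasks. Looks very promising!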


In this work, we connect two distinct concepts for unsupervised domain adaptation: feature distribution alignment between domains by utilizing the task-specific decision boundary and the Wasserstein metric. Our proposed sliced Wasserstein discrepancy (SWD) is designed to capture the natural notion of dissimilarity between the outputs of task-specific classifiers. It provides a geometrically meaningful guidance to detect target samples that are far from the support of the source and enables efficient distribution alignment in an end-to-end trainable fashion. In the experiments, we validate the effectiveness and genericness of our method on digit and sign recognition, image classification, semantic segmentation, and object detection.

https://arxiv.org/abs/1903.04064
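
The sliced Wasserstein distance between two sets of classifier outputs can be estimated by projecting both sets onto random unit directions, sorting each 1D projection, and averaging the squared differences of the sorted values. A minimal sketch, assuming equal batch sizes and Gaussian-sampled directions; in the paper the discrepancy is computed between the outputs of two task-specific classifiers and used as a training loss, which is not shown here. Function names are hypothetical.

```python
import math
import random


def _unit_vector(dim, rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sliced_wasserstein(p1, p2, num_projections=128, seed=0):
    """Monte Carlo estimate of the squared sliced Wasserstein-2 distance.

    p1, p2: equal-length lists of probability vectors (e.g. softmax outputs).
    """
    if len(p1) != len(p2):
        raise ValueError("this sketch assumes equal batch sizes")
    rng = random.Random(seed)
    dim = len(p1[0])
    total = 0.0
    for _ in range(num_projections):
        theta = _unit_vector(dim, rng)
        # Project to 1D, then sort: sorted matching solves 1D optimal transport.
        a = sorted(sum(x * t for x, t in zip(p, theta)) for p in p1)
        b = sorted(sum(x * t for x, t in zip(p, theta)) for p in p2)
        total += sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)
    return total / num_projections


outputs_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
outputs_b = [[0.1, 0.9], [0.8, 0.2], [0.6, 0.4]]
print(sliced_wasserstein(outputs_a, outputs_b))
```

Sorting makes the measure independent of sample order, so it compares distributions rather than paired predictions.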

Group-wise Correlation Stereo Network

Xiaoyang Guo, Kai Yang, Wukui Yang, Xiaogang Wang, Hongsheng Li

From SenseTime: a new network architecture for stereo vision.


Stereo matching estimates the disparity between a rectified image pair, which is of great importance to depth sensing, autonomous driving, and other related tasks. Previous works built cost volumes with cross-correlation or concatenation of left and right features across all disparity levels, and then a 2D or 3D convolutional neural network is utilized to regress the disparity maps. In this paper, we propose to construct the cost volume by group-wise correlation. The left features and the right features are divided into groups along the channel dimension, and correlation maps are computed among each group to obtain multiple matching cost proposals, which are then packed into a cost volume. Group-wise correlation provides efficient representations for measuring feature similarities and does not lose as much information as full correlation. It also preserves better performance when reducing parameters compared with previous methods. The 3D stacked hourglass network proposed in previous works is improved to boost the performance and decrease the inference computational cost. Experiment results show that our method outperforms previous methods on Scene Flow, KITTI 2012, and KITTI 2015 datasets. The code is available at https://github.com/xy-guo/GwcNet

https://arxiv.org/abs/1903.04025
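
Group-wise correlation as described in the abstract: split each feature vector into groups along the channel dimension and compute one correlation per group, for every candidate disparity. A minimal single-row sketch in plain Python, assuming correlation is the channel-averaged inner product within a group and that out-of-view positions are zero-filled; the function names are hypothetical, and the 3D hourglass network that consumes the volume is not shown.

```python
def groupwise_correlation(f_left, f_right, num_groups):
    """Split C channels into num_groups groups; return one averaged inner product per group."""
    c = len(f_left)
    if c % num_groups:
        raise ValueError("channel count must be divisible by num_groups")
    size = c // num_groups
    return [sum(a * b for a, b in zip(f_left[g * size:(g + 1) * size],
                                      f_right[g * size:(g + 1) * size])) / size
            for g in range(num_groups)]


def build_cost_volume(left_row, right_row, max_disp, num_groups):
    """left_row/right_row: per-pixel feature vectors of one image row.

    Returns volume[d][x] = group-wise correlation of left pixel x with right pixel x - d.
    """
    volume = []
    for d in range(max_disp):
        volume.append([
            groupwise_correlation(left_row[x], right_row[x - d], num_groups)
            if x - d >= 0 else [0.0] * num_groups
            for x in range(len(left_row))
        ])
    return volume


print(groupwise_correlation([1, 2, 3, 4], [1, 1, 2, 2], 2))  # [1.5, 7.0]
```

With num_groups = 1 this reduces to full correlation (a single number per disparity), which is the information loss the abstract contrasts against.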

Deep Robust Subjective Visual Property Prediction in Crowdsourcing

Qianqian Xu, Zhiyong Yang, Yangbangyan Jiang, Xiaochun Cao, Qingming Huang, Yuan Yao

Deep, robust prediction of subjective visual properties from crowdsourced data.


The problem of estimating subjective visual properties (SVP) of images (e.g., shoe A is more comfortable than shoe B) is gaining rising attention. Due to its highly subjective nature, different annotators often exhibit different interpretations of scales when adopting absolute value tests. Therefore, recent investigations turn to collecting pairwise comparisons via crowdsourcing platforms. However, crowdsourcing data usually contains outliers. For this purpose, it is desirable to develop a robust model for learning SVP from crowdsourced noisy annotations. In this paper, we construct a deep SVP prediction model which not only leads to better detection of annotation outliers but also enables learning with extremely sparse annotations. Specifically, we construct a comparison multi-graph based on the collected annotations, where different labeling results correspond to edges with different directions between two vertices. Then, we propose a generalized deep probabilistic framework which consists of an SVP prediction module and an outlier modeling module that work collaboratively and are optimized jointly. Extensive experiments on various benchmark datasets demonstrate that our new approach guarantees promising results.

https://arxiv.org/abs/1903.03956
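
The comparison multi-graph can be built directly from raw pairwise annotations, with one directed edge per judgement. The sketch below then applies a naive majority-vote rule that flags minority-direction edges as candidate outliers. This is a hypothetical baseline for illustration only; the paper instead models outliers jointly with the SVP predictor in a probabilistic framework.

```python
from collections import defaultdict


def build_multigraph(annotations):
    """annotations: iterable of (winner, loser) judgements, one per annotator response.

    Returns {(u, v): count}: the number of directed edges u -> v (u judged "more" than v).
    """
    edges = defaultdict(int)
    for winner, loser in annotations:
        edges[(winner, loser)] += 1
    return dict(edges)


def flag_minority_edges(edges):
    """Naive rule: edges outvoted by the opposite direction are candidate outliers."""
    flagged = []
    for (u, v), count in edges.items():
        if count < edges.get((v, u), 0):
            flagged.append((u, v, count))
    return sorted(flagged)


ann = [("A", "B")] * 4 + [("B", "A")] + [("B", "C")] * 2
print(flag_minority_edges(build_multigraph(ann)))  # [('B', 'A', 1)]
```

A majority vote needs several judgements per pair, which is exactly what sparse annotations lack; this is one motivation for a learned, joint model.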

[Oral] Shape2Motion: Joint Analysis of Motion Parts and Attributes from 3D Shapes

Xiaogang Wang, Bin Zhou, Yahao Shi, Xiaowu Chen, Qinping Zhao, Kai Xu

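Joint analysis of motion parts and their attributes from 3D shapes. Somehow this feels more like computer graphics... A work from Beihang University and the National University of Defense Technology.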


For the task of mobility analysis of 3D shapes, we propose joint analysis for simultaneous motion part segmentation and motion attribute estimation, taking a single 3D model as input. The problem is significantly different from those tackled in the existing works which assume the availability of either a pre-existing shape segmentation or multiple 3D models in different motion states. To that end, we develop Shape2Motion which takes a single 3D point cloud as input, and jointly computes a mobility-oriented segmentation and the associated motion attributes. Shape2Motion consists of two deep neural networks designed for mobility proposal generation and mobility optimization, respectively. The key contribution of these networks is the novel motion-driven features and losses used in both motion part segmentation and motion attribute estimation. This is based on the observation that the movement of a functional part preserves the shape structure. We evaluate Shape2Motion with a newly proposed benchmark for mobility analysis of 3D shapes. Results demonstrate that our method achieves the state-of-the-art performance both in terms of motion part segmentation and motion attribute estimation.

https://arxiv.org/abs/1903.03911

www.kevinkaixu.net/shape2motion.html

Multiview 2D/3D Rigid Registration via a Point-Of-Interest Network for Tracking and Triangulation (POINT^2)

Haofu Liao, Wei-An Lin, Jiarui Zhang, Jingdan Zhang, Jiebo Luo, S. Kevin Zhou

Multiview 2D/3D rigid registration.


We propose to tackle the problem of multiview 2D/3D rigid registration for intervention via a Point-Of-Interest Network for Tracking and Triangulation (POINT^2). POINT^2 learns to establish 2D point-to-point correspondences between the pre- and intra-intervention images by tracking a set of random POIs. The 3D pose of the pre-intervention volume is then estimated through a triangulation layer. In POINT^2, the unified framework of the POI tracker and the triangulation layer enables learning informative 2D features and estimating 3D pose jointly. In contrast to existing approaches, POINT^2 only requires a single forward pass to achieve a reliable 2D/3D registration. As the POI tracker is shift-invariant, POINT^2 is more robust to the initial pose of the 3D pre-intervention image. Extensive experiments on a large-scale clinical cone-beam CT (CBCT) dataset show that the proposed POINT^2 method outperforms the existing learning-based method in terms of accuracy, robustness and running time. Furthermore, when used as an initial pose estimator, our method also improves the robustness and speed of the state-of-the-art optimization-based approaches tenfold.

https://arxiv.org/abs/1903.03896
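
The triangulation step recovers a 3D point from its tracked 2D positions in several views, each of which defines a ray from the X-ray source through the detector. The abstract does not spell out the triangulation layer, so the sketch below uses a generic two-view midpoint method (the point halfway between the closest points of two rays) as a stand-in rather than the authors' formulation; names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def midpoint_triangulate(o1, d1, o2, d2):
    """Return the point halfway between the closest points of rays o1 + s*d1 and o2 + t*d2."""
    w0 = [p - q for p, q in zip(o1, o2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be triangulated")
    # Setting the gradient of |w0 + s*d1 - t*d2|^2 to zero gives these closed forms.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [o + s * k for o, k in zip(o1, d1)]
    p2 = [o + t * k for o, k in zip(o2, d2)]
    return [(u + v) / 2.0 for u, v in zip(p1, p2)]


# Two rays that meet at (1, 2, 3).
print(midpoint_triangulate((0, 0, 0), (1, 2, 3), (5, 2, 3), (-1, 0, 0)))  # [1.0, 2.0, 3.0]
```

The midpoint is used because tracked 2D points are noisy, so back-projected rays rarely intersect exactly.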

Fast Single Image Reflection Suppression via Convex Optimization

Yang Yang, Wenye Ma, Yin Zheng, Jian-Feng Cai, Weiyu Xu

Fast single-image reflection suppression via convex optimization, from Tencent.


Removing undesired reflections from images taken through glass is of great importance in computer vision. It serves as a means to enhance the image quality for aesthetic purposes as well as to preprocess images in machine learning and pattern recognition applications. We propose a convex model to suppress the reflection from a single input image. Our model implies a partial differential equation with gradient thresholding, which is solved efficiently using the Discrete Cosine Transform. Extensive experiments on synthetic and real-world images demonstrate that our approach achieves desirable reflection suppression results and dramatically reduces the execution time compared to the state of the art.

https://arxiv.org/abs/1903.03889

https://github.com/yyhz76/reflectSuppress
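
One way to read the convex model: find T minimizing (λ/2)·||ΔT − div(δ_h(∇Y))||² + (ε/2)·||T − Y||², where δ_h zeroes gradients with magnitude below h (weak gradients are assumed to come from the reflection). With Neumann boundaries the Laplacian is diagonal in the DCT basis, so the minimizer has a closed form per DCT coefficient. This is a 1D toy sketch of that reading, not the authors' implementation (their code is in the repository above); the function names, parameter names, and defaults are assumptions.

```python
import math


def dct2(x):
    """Orthonormal DCT-II, computed naively in O(N^2)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
        out.append(s * (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)))
    return out


def idct2(c):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * c[k] * math.cos(math.pi * k * (i + 0.5) / n)
        out.append(s)
    return out


def suppress_reflection_1d(y, h, lam=1.0, eps=1e-2):
    """Toy 1D version of the assumed model (lam, eps, h are illustrative)."""
    n = len(y)
    # Forward differences; weak gradients (|g| < h) are treated as reflection and zeroed.
    g = [y[i + 1] - y[i] for i in range(n - 1)]
    g = [v if abs(v) >= h else 0.0 for v in g]
    # Divergence of the thresholded gradient (backward difference, Neumann boundaries).
    div = [(g[i] if i < n - 1 else 0.0) - (g[i - 1] if i > 0 else 0.0) for i in range(n)]
    # Eigenvalues of the Neumann Laplacian in the DCT-II basis.
    kappa = [2.0 * math.cos(math.pi * k / n) - 2.0 for k in range(n)]
    p_hat, y_hat = dct2(div), dct2(y)
    # Closed-form minimizer per coefficient: (lam*k*P + eps*Y) / (lam*k^2 + eps).
    t_hat = [(lam * k * p + eps * yy) / (lam * k * k + eps)
             for k, p, yy in zip(kappa, p_hat, y_hat)]
    return idct2(t_hat)


signal = [0.0, 0.1, 0.2, 0.3, 2.3, 2.4, 2.5, 2.6]  # weak ramp plus one strong step
print([round(v, 2) for v in suppress_reflection_1d(signal, h=1.0)])
```

Two sanity properties follow from the formula: with h = 0 nothing is thresholded and T = Y, and the k = 0 coefficient is always unchanged, so the mean intensity is preserved.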

SSN: Learning Sparse Switchable Normalization via SparsestMax

Wenqi Shao, Tianjian Meng, Jingyu Li, Ruimao Zhang, Yudian Li, Xiaogang Wang, Ping Luo

Learning sparse switchable normalization via SparsestMax, from SenseTime.


Normalization methods improve both optimization and generalization of ConvNets. To further boost performance, the recently-proposed switchable normalization (SN) provides a new perspective for deep learning: it learns to select different normalizers for different convolution layers of a ConvNet. However, SN uses softmax function to learn importance ratios to combine normalizers, leading to redundant computations compared to a single normalizer. 

This work addresses this issue by presenting Sparse Switchable Normalization (SSN) where the importance ratios are constrained to be sparse. Unlike ℓ1 and ℓ0 constraints that impose difficulties in optimization, we turn this constrained optimization problem into feed-forward computation by proposing SparsestMax, which is a sparse version of softmax. SSN has several appealing properties. (1) It inherits all benefits from SN such as applicability in various tasks and robustness to a wide range of batch sizes. (2) It is guaranteed to select only one normalizer for each normalization layer, avoiding redundant computations. (3) SSN can be transferred to various tasks in an end-to-end manner. Extensive experiments show that SSN outperforms its counterparts on various challenging benchmarks such as ImageNet, Cityscapes, ADE20K, and Kinetics.

https://arxiv.org/abs/1903.03793
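
SparsestMax builds on sparsemax, the Euclidean projection of a score vector onto the probability simplex, which can output exact zeros where softmax cannot. The sketch below implements plain sparsemax (not SparsestMax, which additionally constrains the output so that it ends up one-hot) and uses it to mix the statistics of candidate normalizers. Using one set of ratios for both mean and variance, and the function names, are simplifying assumptions.

```python
def sparsemax(z):
    """Euclidean projection of the score vector z onto the probability simplex."""
    zs = sorted(z, reverse=True)
    cumsum, k_star, cum_star = 0.0, 0, 0.0
    for k, v in enumerate(zs, start=1):
        cumsum += v
        # The k largest scores stay in the support while 1 + k * z_(k) exceeds their sum.
        if 1.0 + k * v > cumsum:
            k_star, cum_star = k, cumsum
    tau = (cum_star - 1.0) / k_star  # threshold subtracted from every score
    return [max(v - tau, 0.0) for v in z]


def switchable_stats(logits, means, variances):
    """Mix e.g. (BN, IN, LN) statistics with sparse importance ratios."""
    w = sparsemax(logits)
    mu = sum(wi * m for wi, m in zip(w, means))
    var = sum(wi * v for wi, v in zip(w, variances))
    return w, mu, var


print(sparsemax([1.0, 0.5, -1.0]))  # [0.75, 0.25, 0.0]
```

Any normalizer whose ratio is exactly zero never needs its statistics computed, which is the redundancy SSN removes; SparsestMax goes further and guarantees a single survivor.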

Combining 3D Morphable Models: A Large scale Face-and-Head Model

Stylianos Ploumpis, Haoyang Wang, Nick Pears, William A. P. Smith, Stefanos Zafeiriou

Building full head models from images.


Three-dimensional Morphable Models (3DMMs) are powerful statistical tools for representing the 3D surfaces of an object class. In this context, we identify an interesting question that has previously not received research attention: is it possible to combine two or more 3DMMs that (a) are built using different templates that perhaps only partly overlap, (b) have different representation capabilities and (c) are built from different datasets that may not be publicly available? In answering this question, we make two contributions. First, we propose two methods for solving this problem: i. use a regressor to complete missing parts of one model using the other, ii. use the Gaussian Process framework to blend covariance matrices from multiple models. Second, as an example application of our approach, we build a new face-and-head shape model that combines the variability and facial detail of the LSFM with the full head modelling of the LYHM. The resulting combined shape model achieves state-of-the-art performance and outperforms existing head models by a large margin. Finally, as an application experiment, we reconstruct full head representations from single, unconstrained images by utilizing our proposed large-scale model in conjunction with the FaceWarehouse blendshapes for handling expressions.

https://arxiv.org/abs/1903.03785

Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search

Xin Li, Yiming Zhou, Zheng Pan, Jiashi Feng

Partial Order Pruning: achieving the best speed/accuracy trade-off in neural architecture search.


Achieving a good speed and accuracy trade-off on the target platform is very important in deploying deep neural networks. Most existing automatic architecture search approaches only pursue high performance but ignore such an important factor. In this work, we propose an algorithm, "Partial Order Pruning", to prune the architecture search space with a partial order assumption, quickly lift the boundary of the speed/accuracy trade-off on the target platform, and automatically search for the architecture with the best speed and accuracy trade-off. Our algorithm explicitly takes profile information about the inference speed on the target platform into consideration. With the proposed algorithm, we present several "Dongfeng" networks that provide high accuracy and fast inference speed on various application GPU platforms. By further searching the decoder architecture, our DF-Seg real-time segmentation models yield a state-of-the-art speed/accuracy trade-off on both embedded devices and high-end GPUs.

https://arxiv.org/abs/1903.03777

https://github.com/lixincn2015/Partial-Order-Pruning
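
The pruning idea can be illustrated with a toy search space of (depth, width) pairs. Assuming latency never decreases when a network gets deeper or wider (a partial order), once an architecture exceeds the latency budget, every architecture at least as deep and as wide can be skipped without profiling. This is a hypothetical sketch: the latency and accuracy functions are placeholders, and the paper's actual search procedure is more involved.

```python
from itertools import product


def at_most(a, b):
    """Partial order on architectures: a <= b if a is no deeper and no wider than b."""
    return all(x <= y for x, y in zip(a, b))


def partial_order_search(depths, widths, latency_fn, accuracy_fn, budget):
    too_slow = []   # architectures measured to exceed the latency budget
    feasible = {}   # arch -> (latency, accuracy) for architectures within budget
    skipped = 0
    # Lexicographic order visits every arch after all archs that are <= it.
    for arch in sorted(product(depths, widths)):
        if any(at_most(slow, arch) for slow in too_slow):
            skipped += 1  # a no-larger arch is already too slow: prune without profiling
            continue
        latency = latency_fn(arch)
        if latency > budget:
            too_slow.append(arch)
            continue
        feasible[arch] = (latency, accuracy_fn(arch))
    best = max(feasible, key=lambda a: feasible[a][1]) if feasible else None
    return best, feasible, skipped


# Placeholder cost models, for illustration only.
best, feasible, skipped = partial_order_search(
    [1, 2, 3], [1, 2, 3],
    latency_fn=lambda a: a[0] * a[1],
    accuracy_fn=lambda a: a[0] * a[1],
    budget=4,
)
print(best, skipped)  # (2, 2) 1
```

In a real search the savings come from skipping expensive on-device profiling or training runs, and they grow with the size of the search space.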

Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks

Kuan Fang, Alexander Toshev, Li Fei-Fei, Silvio Savarese

From Fei-Fei Li's group.


Many robotic applications require the agent to perform long-horizon tasks in partially observable environments. In such applications, decision making at any step can depend on observations received far in the past. Hence, being able to properly memorize and utilize the long-term history is crucial. In this work, we propose a novel memory-based policy, named Scene Memory Transformer (SMT). The proposed policy embeds and adds each observation to a memory and uses the attention mechanism to exploit spatio-temporal dependencies. This model is generic and can be efficiently trained with reinforcement learning over long episodes. On a range of visual navigation tasks, SMT demonstrates superior performance to existing reactive and memory-based policies by a margin.

https://arxiv.org/abs/1903.03878

https://sites.google.com/view/scene-memory-transformer
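
The core memory operation, adding each observation embedding to memory and reading it back with attention, can be sketched as single-head scaled dot-product attention. A minimal sketch, assuming the stored embeddings serve as both keys and values with no learned projections; SMT itself uses a transformer-style encoder and decoder trained with reinforcement learning, which is not shown, and the class name is hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attend(query, memory):
    """Scaled dot-product attention of one query over stored embeddings (keys = values)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in memory]
    weights = softmax(scores)
    context = [sum(w * mem[i] for w, mem in zip(weights, memory)) for i in range(d)]
    return context, weights


class SceneMemory:
    """Grows by one slot per observation; reading never discards old slots."""

    def __init__(self):
        self.slots = []

    def add(self, embedding):
        self.slots.append(list(embedding))

    def read(self, query):
        return attend(query, self.slots)


memory = SceneMemory()
memory.add([1.0, 0.0])  # embedding of an earlier observation
memory.add([0.0, 1.0])  # embedding of a later observation
context, weights = memory.read([10.0, 0.0])
print([round(w, 3) for w in weights])
```

Because attention can weight any stored slot, an observation from far in the past is as reachable as the most recent one, which is the property the abstract relies on for long-horizon tasks.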

Paper Download

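Reply "cvpr312" in the chat window of the 我爱计算机视觉 (I Love Computer Vision) WeChat official account to receive a Baidu Cloud download link for all of the papers above.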

Join the Discussion Group

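If you follow computer vision and machine learning, you are welcome to join the 52CV group: add 52CV Jun on WeChat and you will be invited into the group.

Please be sure to include the note: 52CV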


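If you prefer QQ, you can join the official 52CV QQ group: 702781905

(We are not online all the time, so please bear with us if your request is not approved right away.)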



