Top Conference Papers of 2018: ECCV 2018

The European Conference on Computer Vision (ECCV) is held every two years and is one of the three top conferences in computer vision (the other two being ICCV and CVPR). Historically, each edition has accepted around 300 papers worldwide, most of them from leading labs and research institutes in the United States and Europe, with mainland China typically contributing 10-20 papers. The acceptance rate of ECCV 2010 was 27%.

Dates: September 8–14, 2018

Location: Munich, Germany

This year the conference received 2,439 submissions and accepted 776 papers (31.8%): 59 orals and 717 posters. ECCV 2018 also hosted 43 workshops and 11 tutorials.

Best Paper Award (one paper)

"Implicit 3D Orientation Learning for 6D Object Detection from RGB Images"

Martin Sundermeyer, Zoltan-Csaba Marton, Maximilian Durner, Manuel Brucker, Rudolph Triebel

【Abstract】We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization.

This so-called Augmented Autoencoder has several advantages over existing methods: it does not require real, pose-annotated training data, generalizes to various test sensors, and inherently handles object and view symmetries. Instead of learning an explicit mapping from input images to object poses, it provides an implicit representation of object orientations defined by samples in a latent space. Experiments on the T-LESS and LineMOD datasets show that our method outperforms similar model-based approaches and competes with state-of-the-art approaches that require real pose-annotated images.

Summary
The paper proposes a real-time, RGB-based pipeline for object detection and 6D pose estimation. Its novel 3D orientation estimator is a variant of the Denoising Autoencoder, trained on simulated views of a 3D model using Domain Randomization. The resulting "Augmented Autoencoder" (AAE) has several advantages over existing methods: it requires no real pose-annotated training data, generalizes to various test sensors, and inherently handles object and view symmetries. Rather than learning an explicit mapping from input images to object poses, it represents object orientations implicitly, as samples in a latent space. Experiments on T-LESS and LineMOD show that it outperforms similar model-based approaches and competes with state-of-the-art methods that require real pose-annotated images.
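
To make the implicit-orientation idea concrete, here is a minimal PyTorch sketch of an Augmented-Autoencoder-style setup. The layer sizes, training tensors, and codebook construction below are illustrative assumptions, not the paper's exact architecture; the two key points are that the reconstruction target is the clean rendering (so augmentations are trained away), and that orientation is recovered at test time by nearest-neighbor lookup in latent space.

```python
# A minimal sketch of the Augmented Autoencoder idea (hypothetical sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AugmentedAutoencoder(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(  # 64x64 RGB crop -> latent code
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, latent_dim),
        )
        self.decoder = nn.Sequential(  # latent code -> reconstructed view
            nn.Linear(latent_dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Training step: the input is an augmented rendering (randomized background,
# lighting, occlusion); the target is the clean rendering of the same pose,
# so the latent code is forced to encode orientation, not nuisance factors.
model = AugmentedAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=2e-4)
augmented = torch.rand(16, 3, 64, 64)  # stand-ins for rendered batches
clean = torch.rand(16, 3, 64, 64)
loss = F.mse_loss(model(augmented), clean)
opt.zero_grad()
loss.backward()
opt.step()

# Test time: orientation is read off implicitly by encoding the detected
# crop and finding the nearest neighbor (cosine similarity) in a codebook
# of latent codes built from renderings with known rotations.
with torch.no_grad():
    codebook = F.normalize(model.encoder(clean), dim=1)       # known poses
    query = F.normalize(model.encoder(augmented[:1]), dim=1)  # test crop
    best_match = torch.argmax(codebook @ query.T)             # pose index
```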

Best Paper Award, Honorable Mention (two papers)

"Group Normalization"

Yuxin Wu, Kaiming He

【Abstract】Batch Normalization (BN) is a milestone technique in the development of deep learning, enabling various networks to train. However, normalizing along the batch dimension introduces problems: BN's error increases rapidly when the batch size becomes smaller, caused by inaccurate batch statistics estimation. This limits BN's usage for training larger models and transferring features to computer vision tasks including detection, segmentation, and video, which require small batches constrained by memory consumption. In this paper, we present Group Normalization (GN) as a simple alternative to BN. GN divides the channels into groups and computes within each group the mean and variance for normalization. GN's computation is independent of batch sizes, and its accuracy is stable in a wide range of batch sizes. On ResNet-50 trained on ImageNet, GN has 10.6% lower error than its BN counterpart when using a batch size of 2; when using typical batch sizes, GN is comparably good with BN and outperforms other normalization variants. Moreover, GN can be naturally transferred from pre-training to fine-tuning. GN can outperform its BN-based counterparts for object detection and segmentation in COCO, and for video classification in Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks. GN can be easily implemented by a few lines of code in modern libraries.

Summary
Batch Normalization (BN) is a milestone technique that made a wide range of deep networks trainable, but normalizing along the batch dimension breaks down for small batches: inaccurate batch statistics make BN's error grow rapidly as the batch size shrinks. This limits BN when training large models or transferring features to vision tasks such as detection, segmentation, and video, where memory constraints force small batches. The paper proposes Group Normalization (GN) as a simple alternative: the channels are divided into groups, and the mean and variance used for normalization are computed within each group. Because the computation is independent of the batch size, GN's accuracy is stable across a wide range of batch sizes. On ResNet-50 trained on ImageNet with a batch size of 2, GN has 10.6% lower error than BN; with typical batch sizes, GN matches BN and outperforms other normalization variants. GN also transfers naturally from pre-training to fine-tuning: it outperforms its BN-based counterparts on COCO object detection and segmentation and on Kinetics video classification, showing it can effectively replace BN across tasks, and it can be implemented in a few lines of code in modern libraries.
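
The abstract's claim that GN takes only a few lines to implement is easy to check. Below is a minimal PyTorch sketch of the grouping-and-normalizing step described above; the function name and defaults are illustrative (in practice one would simply use the built-in torch.nn.GroupNorm):

```python
# A minimal sketch of Group Normalization (illustrative, not the paper's code).
import torch

def group_norm(x, num_groups=32, eps=1e-5, gamma=None, beta=None):
    # x: (N, C, H, W); split the C channels into num_groups groups
    n, c, h, w = x.shape
    x = x.view(n, num_groups, c // num_groups, h, w)
    # mean/variance over each group's channels and all spatial positions:
    # independent of the batch dimension, unlike BatchNorm
    mean = x.mean(dim=(2, 3, 4), keepdim=True)
    var = x.var(dim=(2, 3, 4), keepdim=True, unbiased=False)
    x = (x - mean) / torch.sqrt(var + eps)
    x = x.view(n, c, h, w)
    if gamma is not None:  # optional learnable per-channel scale and shift
        x = x * gamma.view(1, c, 1, 1) + beta.view(1, c, 1, 1)
    return x

x = torch.randn(2, 64, 56, 56)  # works even with a batch size of 2
out = group_norm(x)
# sanity check against PyTorch's built-in implementation
assert torch.allclose(out, torch.nn.functional.group_norm(x, 32), atol=1e-4)
```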

"GANimation: Anatomically-aware Facial Animation from a Single Image"

Albert Pumarola, Antonio Agudo, Aleix M. Martinez, Alberto Sanfeliu, Francesc Moreno-Noguer

【Abstract】Recent advances in Generative Adversarial Networks (GANs) have shown impressive results for the task of facial expression synthesis. The most successful architecture is StarGAN [4], which conditions GANs' generation process with images of a specific domain, namely a set of images of persons sharing the same expression. While effective, this approach can only generate a discrete number of expressions, determined by the content of the dataset. To address this limitation, in this paper we introduce a novel GAN conditioning scheme based on Action Unit (AU) annotations, which describe in a continuous manifold the anatomical facial movements defining a human expression. Our approach allows controlling the magnitude of activation of each AU and combining several of them. Additionally, we propose a fully unsupervised strategy to train the model, which only requires images annotated with their activated AUs, and exploit attention mechanisms that make our network robust to changing backgrounds and lighting conditions. Extensive evaluation shows that our approach goes beyond competing conditional generators, both in its capability to synthesize a much wider range of expressions ruled by anatomically feasible muscle movements, and in its capacity to deal with images in the wild.

Summary
Generative Adversarial Networks (GANs) have recently shown impressive results on facial expression synthesis. The most successful architecture, StarGAN, conditions the generation process on images from a specific domain, i.e., a set of images of different people showing the same expression. While effective, this can only produce a discrete set of expressions, determined by the content of the training data. To remove this limitation, the paper introduces a new GAN conditioning scheme based on Action Unit (AU) annotations, which describe, in a continuous manifold, the anatomical facial movements that define a human expression. The approach can control the activation magnitude of each AU and combine several of them. The authors also propose a fully unsupervised training strategy that only needs images annotated with their activated AUs, and apply an attention mechanism that makes the network robust to changing backgrounds and lighting conditions. Extensive evaluation shows the method clearly surpasses competing conditional generators, both in synthesizing a much wider range of expressions governed by anatomically feasible muscle movements and in handling images in the wild.
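
As a rough illustration of continuous AU conditioning combined with the attention mechanism, here is a minimal PyTorch sketch. The network sizes, the hypothetical 17-dimensional AU vector, and the single-block backbone are assumptions made for brevity; the actual GANimation generator is deeper and is trained with adversarial, cycle-consistency, and AU-regression losses.

```python
# A minimal sketch of AU-conditioned generation with attention blending
# (hypothetical sizes; not the paper's full architecture or losses).
import torch
import torch.nn as nn

class AUConditionedGenerator(nn.Module):
    def __init__(self, num_aus=17):
        super().__init__()
        # the continuous AU activation vector is tiled spatially and
        # concatenated with the RGB input as extra channels
        self.backbone = nn.Sequential(
            nn.Conv2d(3 + num_aus, 64, 7, padding=3), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.color_head = nn.Sequential(nn.Conv2d(64, 3, 7, padding=3), nn.Tanh())
        self.attn_head = nn.Sequential(nn.Conv2d(64, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, img, au):
        n, _, h, w = img.shape
        au_map = au.view(n, -1, 1, 1).expand(n, au.shape[1], h, w)
        feats = self.backbone(torch.cat([img, au_map], dim=1))
        color = self.color_head(feats)  # synthesized expression colors
        attn = self.attn_head(feats)    # where to keep the original pixels
        # attention blending: untouched regions (background, hair) are
        # copied from the input, which helps robustness in the wild
        return attn * img + (1 - attn) * color

gen = AUConditionedGenerator()
img = torch.randn(1, 3, 128, 128)
au = torch.rand(1, 17)       # continuous AU activations, e.g. a "smile" AU
animated = gen(img, au)      # interpolating `au` animates the expression
```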
 
