3D Bounding Box Estimation Using Deep Learning and Geometry

Arsalan Mousavian, Dragomir Anguelov, John Flynn, Jana Kosecka

George Mason University
Zoox, Inc.
https://zoox.com/

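geometry [dʒɪ'ɒmɪtrɪ]: n. geometry; geometric structure
George Mason University: also GMU, Mason, or George Mason
geometric [,dʒɪə'metrɪk]: adj. of geometry; of geometric figures
reason ['riːz(ə)n]: n. reason; rationality; motive. vi. to infer; to persuade. vt. to persuade; to infer; to argue
monocular [mə'nɒkjʊlə]: adj. single-eyed; for use with one eye
localization [,ləʊkəlaɪ'zeɪʃən]: n. localization (determining position); confinement; regionalization
Computer Science: abbreviated CS
Computer Vision: abbreviated CV
headquarter ['hedkwɔːtə]: v. to base (an organization's) headquarters at a place; to set up headquarters
Foster City: a city in California
State of California: also California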

arXiv (archive - the X represents the Greek letter chi [χ]) is a repository of electronic preprints approved for posting after moderation, but not full peer review.

Zoox is a self-driving technology development company, headquartered in Foster City, California.

*Work done as an intern at Zoox, Inc.

Abstract

We present a method for 3D object detection and pose estimation from a single image. In contrast to current techniques that only regress the 3D orientation of an object, our method first regresses relatively stable 3D object properties using a deep convolutional neural network and then combines these estimates with geometric constraints provided by a 2D object bounding box to produce a complete 3D bounding box. The first network output estimates the 3D object orientation using a novel hybrid discrete-continuous loss, which significantly outperforms the L2 loss. The second output regresses the 3D object dimensions, which have relatively little variance compared to alternatives and can often be predicted for many object types. These estimates, combined with the geometric constraints on translation imposed by the 2D bounding box, enable us to recover a stable and accurate 3D object pose. We evaluate our method on the challenging KITTI object detection benchmark [2] both on the official metric of 3D orientation estimation and also on the accuracy of the obtained 3D bounding boxes. Although conceptually simple, our method outperforms more complex and computationally expensive approaches that leverage semantic segmentation, instance level segmentation and flat ground priors [4] and sub-category detection [23][24]. Our discrete-continuous loss also produces state of the art results for 3D viewpoint estimation on the Pascal 3D+ dataset [26].

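constraint [kən'streɪnt]: n. constraint; restriction; awkwardness of manner; compulsion
hybrid ['haɪbrɪd]: n. hybrid; person of mixed descent; mixture. adj. mixed; hybrid
conceptually [kən'sɛptʃʊəli]: adv. in terms of concepts
leverage ['liːv(ə)rɪdʒ; 'lev(ə)rɪdʒ]: n. means; influence; leverage; lever efficiency. v. to make use of; to operate with borrowed capital
semantic [sɪ'mæntɪk]: adj. of meaning; of semantics (same as semantical)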

1. Introduction

3. 3D Bounding Box Estimation

$$
x_{min} = \left( K \, [R \;\; T] \, \begin{bmatrix} d_{x}/2 \\ -d_{y}/2 \\ d_{z}/2 \\ 1 \end{bmatrix} \right)_{x} \tag{2}
$$
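To make Equation (2) concrete, the following is a minimal Python sketch of projecting one 3D box corner into the image and reading off its x coordinate. It rests on several assumptions that the excerpt here does not spell out: K is taken to be a 3x3 camera intrinsics matrix, R a 3x3 rotation, T a 3-vector translation, (d_x, d_y, d_z) the box dimensions, and the subscript x is taken to mean the horizontal pixel coordinate after dividing by the homogeneous (depth) component. The helper names `rotation_y`, `project_point`, and `corner_x`, and all numeric values, are hypothetical and chosen only for illustration.

```python
import math


def rotation_y(yaw):
    """Rotation about the vertical (y) axis by `yaw` radians (assumed convention)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def project_point(K, R, T, X):
    """Apply K [R | T] to a 3D point X in the object frame.

    Returns pixel coordinates (u, v) after dividing by the third
    homogeneous component.
    """
    # Camera-frame coordinates: R X + T
    cam = [sum(R[i][j] * X[j] for j in range(3)) + T[i] for i in range(3)]
    # Homogeneous image coordinates: K (R X + T)
    h = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]
    return h[0] / h[2], h[1] / h[2]


def corner_x(K, R, T, dims):
    """x coordinate of the projected corner [d_x/2, -d_y/2, d_z/2] from Eq. (2)."""
    dx, dy, dz = dims
    corner = [dx / 2.0, -dy / 2.0, dz / 2.0]
    u, _ = project_point(K, R, T, corner)
    return u


# Hypothetical values, for illustration only.
K = [[700.0, 0.0, 600.0],
     [0.0, 700.0, 180.0],
     [0.0, 0.0, 1.0]]
R = rotation_y(0.0)      # object axes aligned with the camera axes
T = [0.0, 0.0, 10.0]     # object center 10 units in front of the camera
dims = (2.0, 1.5, 4.0)   # (d_x, d_y, d_z)

x_min = corner_x(K, R, T, dims)
print(round(x_min, 2))   # 658.33
```

Equation (2) ties this projected x coordinate to the left edge x_min of the 2D box for one particular corner. Which corner touches which side of the 2D box depends on the object's orientation, so the sketch shows only the single correspondence written in the equation.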
