Abstract

Specifically, an orientation-encoding unit is designed to describe eight crucial orientations, and multi-scale representation is achieved by stacking several orientation-encoding units.

1. Introduction

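Among the tasks of 3D object classification, 3D object detection, and 3D semantic segmentation, 3D semantic segmentation, which assigns a semantic label to each point of a point cloud, is comparatively challenging.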

Firstly, the sparseness of point clouds in 3D space makes most spatial operators inefficient.

Moreover, the relationship between points is implicit and difficult to represent because point clouds are unordered and unstructured.

Several approaches have been tried: handcrafted voxel features and 2D CNN features from RGB-D images.

Additionally, there is a dilemma between 2D convolution and 3D convolution: 2D convolution fails to capture 3D geometric information such as normals and shape, while 3D convolution requires heavy computation.

Recently, the PointNet architecture [22] operates directly on point clouds instead of 3D voxel grids or meshes. It not only accelerates computation but also notably improves segmentation performance.

We draw inspiration from the successful feature detection algorithm Scale-Invariant Feature Transform (SIFT) [15], which involves two key properties: scale-awareness and orientation-encoding.

We design a novel module called PointSIFT for 3D understanding that possesses these properties.

The basic building block of our PointSIFT module is an orientation-encoding (OE) unit that convolves the features of the nearest points in 8 orientations.

In comparison to the K-nearest neighbor search in PointNet++ [24], where all K neighbors may fall in one orientation, our OE unit captures information from all orientations. We further stack several OE units in one PointSIFT module to represent different scales.

In order to make the whole architecture scale-aware, we connect these OE units by shortcuts and jointly optimize for adaptive scales.

We further build a hierarchical architecture that recursively applies the PointSIFT module as a local feature descriptor.

Resembling conventional segmentation frameworks in 2D [26] and 3D [24], our two-stage network first downsamples the point cloud for efficient computation and then upsamples to get dense predictions.

3. Problem Statement

4. Our Method

The network follows an encode-decode (downsample-upsample) framework.

In the downsampling stage, we recursively apply our proposed PointSIFT module combined with the set abstraction (SA) module introduced in [24] for hierarchical feature embedding.

In the upsampling stage, dense features are obtained by interleaving the feature propagation (FP) module [24] with the PointSIFT module.

4.1. PointSIFT Module

Given an n * d matrix as input, which describes a point set of size n with a d-dimensional feature for every point, the PointSIFT module outputs an n * d matrix that assigns a new d-dimensional feature to every point.

4.1.1 Orientation-encoding

Local descriptors in previous methods typically apply unordered operations (e.g., max pooling [24, 32]), based on the observation that point clouds are unordered and unstructured.

However, using an ordered operator could be much more informative (max pooling discards all inputs except for the maximum) while still preserving invariance to the order of input points.

One natural ordering for a point cloud is the one induced by the ordering of the three coordinates. This observation leads us to the orientation-encoding (OE) unit, a point-wise local feature descriptor that encodes information from eight orientations.

The first stage of OE embedding is Stacked 8-neighborhood (S8N) Search, which finds the nearest neighbor in each of the eight octants partitioned by the ordering of the three coordinates.

Since distant points provide little information for describing local patterns, when no point exists within the search radius r in some octant, we duplicate p0 as the nearest neighbor of itself.
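The search step described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `s8n_search`, the brute-force distance computation, and the convention of assigning octants by the sign of each coordinate offset (with ties at zero counted as non-negative) are all assumptions made for the example.

```python
import numpy as np

def s8n_search(points, center_idx, radius):
    """For each of the 8 octants around points[center_idx], return the index
    of the nearest point within `radius`, or center_idx if the octant is empty.

    Hypothetical helper for illustration only; brute force over all points.
    """
    p0 = points[center_idx]
    diff = points - p0                       # (n, 3) offsets from the center
    dist = np.linalg.norm(diff, axis=1)      # (n,) Euclidean distances

    # Octant id in 0..7 built from the sign of each offset
    # (assumed convention: x >= 0 adds 4, y >= 0 adds 2, z >= 0 adds 1).
    octant = ((diff[:, 0] >= 0).astype(int) * 4
              + (diff[:, 1] >= 0).astype(int) * 2
              + (diff[:, 2] >= 0).astype(int))

    # Default: duplicate p0 as its own neighbor in every octant.
    neighbors = np.full(8, center_idx, dtype=int)

    valid = dist <= radius
    valid[center_idx] = False                # the center is not its own candidate

    for o in range(8):
        candidates = np.flatnonzero(valid & (octant == o))
        if candidates.size > 0:
            neighbors[o] = candidates[np.argmin(dist[candidates])]
    return neighbors

# Toy point set: two points in the (+,+,+) octant, one in (-,-,-), one far away.
pts = np.array([[0., 0., 0.],
                [1., 1., 1.],
                [2., 2., 2.],
                [-1., -1., -1.],
                [5., -5., 5.]])
idx = s8n_search(pts, center_idx=0, radius=4.0)
```

In this toy example the far point lies outside the radius, so its octant falls back to the center index, and the (+,+,+) octant picks the closer of its two candidates.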

We further process the features of those neighbors, which reside in a 2 * 2 * 2 cube centered at p0, for local pattern description.

Many previous works ignore the structure of the data and apply max pooling to feature vectors along the d dimensions to get new features. However, we believe that ordered operators such as convolution can better exploit the structure of the data. Thus we propose orientation-encoding convolution, a three-stage operator that convolves the 2 * 2 * 2 cube along the X, Y, and Z axes successively.
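To make the three-stage operator concrete, here is a minimal NumPy sketch of convolving a 2 * 2 * 2 * d feature cube along X, then Y, then Z. The kernel shapes `(2, d, d)`, the ReLU nonlinearity after each stage, and the name `oe_convolution` are assumptions made for illustration; the notes above do not specify them, and any normalization is omitted.

```python
import numpy as np

def oe_convolution(cube, w_x, w_y, w_z):
    """Collapse a (2, 2, 2, d) neighbor-feature cube into one d-dim feature.

    cube : (2, 2, 2, d) features of the 8 octant neighbors, indexed (x, y, z).
    w_x, w_y, w_z : (2, d, d) kernels, each spanning the 2 cells along one axis
                    (assumed shapes, for illustration only).
    """
    def relu(t):
        return np.maximum(t, 0.0)

    # Stage 1: convolve along X, (2, 2, 2, d) -> (2, 2, d)
    v_x = relu(np.einsum('xyzc,xco->yzo', cube, w_x))
    # Stage 2: convolve along Y, (2, 2, d) -> (2, d)
    v_xy = relu(np.einsum('yzc,yco->zo', v_x, w_y))
    # Stage 3: convolve along Z, (2, d) -> (d,)
    v_xyz = relu(np.einsum('zc,zco->o', v_xy, w_z))
    return v_xyz

# Tiny check with all-ones inputs and d = 2.
d = 2
cube = np.ones((2, 2, 2, d))
w = np.ones((2, d, d))
feature = oe_convolution(cube, w, w, w)
```

Unlike max pooling, every one of the eight neighbors contributes to the output, and the fixed octant layout gives each neighbor its own kernel weights at each stage.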

4.1.2 Scale-awareness

Scale-awareness is achieved by stacking several orientation-encoding (OE) units in a PointSIFT module.
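One possible reading of stacking OE units and linking them by shortcuts is sketched below: each unit's output is kept, the outputs are concatenated, and a point-wise projection maps the result back to d dimensions so the module still takes an n * d matrix to an n * d matrix. The concatenate-then-project design, the function name, and the stand-in units (simple scalings, not real OE units) are all assumptions made for illustration.

```python
import numpy as np

def stacked_units_with_shortcuts(features, units, w_out):
    """Apply units in sequence and combine all intermediate outputs.

    features : (n, d) per-point features.
    units    : list of callables mapping (n, d) -> (n, d); stand-ins for OE units.
    w_out    : (len(units) * d, d) point-wise projection (assumed design).
    """
    outputs = []
    x = features
    for unit in units:
        x = unit(x)           # each deeper unit sees a larger neighborhood
        outputs.append(x)     # shortcut: keep the feature at this scale
    multi_scale = np.concatenate(outputs, axis=1)   # (n, len(units) * d)
    return multi_scale @ w_out                      # back to (n, d)

# Stand-in "units" that just double their input, to show the data flow.
n, d = 5, 3
units = [lambda f: 2.0 * f] * 3
w_out = np.ones((len(units) * d, d))
result = stacked_units_with_shortcuts(np.ones((n, d)), units, w_out)
```

Because the projection sees the features from every depth at once, training can weight shallow (small-scale) and deep (large-scale) responses per point, which is one way to interpret "jointly optimize for adaptive scales".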

PointSIFT reading notes