• We design a novel deep net architecture suitable for consuming unordered point sets in 3D;
• We show how such a net can be trained to perform 3D shape classification, shape part segmentation and scene semantic parsing tasks;
• We provide thorough empirical and theoretical analysis on the stability and efficiency of our method;
• We illustrate the 3D features computed by the selected neurons in the net and develop intuitive explanations for its performance.
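• The network is highly robust to missing points.
1. Feature extraction: each 3D point is mapped to a higher-dimensional space by a shared MLP, and the per-point features are then aggregated by a symmetric operation (max pooling). Working in a high-dimensional space limits the information lost by pooling, and the symmetric operation provides permutation invariance. A function γ (another MLP) then maps the pooled feature to the output scores.
6. A stack of fully connected layers then produces the output scores, which are used for classification into K classes.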
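The feature-extraction and classification steps above can be illustrated with a minimal NumPy sketch. The layer sizes (3 → 16 → 64), the value K = 10, and the random weights are assumptions made for illustration only; they stand in for trained parameters and are not the sizes used in the paper. The point of the sketch is that max pooling over the point axis makes the result independent of point order.

```python
import numpy as np

def shared_mlp(points, weights, biases):
    """Apply the same small MLP (linear + ReLU) to every point independently."""
    features = points
    for w, b in zip(weights, biases):
        features = np.maximum(features @ w + b, 0.0)
    return features

def pointnet_global_feature(points, weights, biases):
    """Per-point MLP followed by max pooling over the point axis."""
    per_point = shared_mlp(points, weights, biases)  # shape (N, C)
    return per_point.max(axis=0)                     # shape (C,)

rng = np.random.default_rng(0)

# Hypothetical layer sizes; random weights stand in for trained ones.
weights = [rng.normal(size=(3, 16)), rng.normal(size=(16, 64))]
biases = [np.zeros(16), np.zeros(64)]

# Hypothetical classifier gamma: one linear layer to K = 10 class scores.
num_classes = 10
w_cls = rng.normal(size=(64, num_classes))

cloud = rng.normal(size=(128, 3))
shuffled = cloud[rng.permutation(len(cloud))]  # same points, different order

feat_a = pointnet_global_feature(cloud, weights, biases)
feat_b = pointnet_global_feature(shuffled, weights, biases)
scores_a = feat_a @ w_cls
scores_b = feat_b @ w_cls
```

Because the only operation that mixes points is the maximum, `scores_a` and `scores_b` agree up to floating-point tolerance even though the input order differs.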
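PointNet++ is the authors' own upgrade of PointNet. It introduces multi-level (hierarchical) point cloud feature extraction, which addresses PointNet's weakness in capturing local features and improves what the network learns. It also combines features learned at different scales within the same level, which makes the network more robust (in particular, missing points have less effect on the classification result).
2. It supports hierarchical feature learning (multi-level point cloud feature learning) and achieves both translation invariance and permutation invariance.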
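A rough sketch of one hierarchical level may help here. It assumes the sampling and grouping scheme described in the PointNet++ paper (farthest point sampling for centroids, then a ball query around each centroid); the function names, radii, and point counts below are hypothetical. Expressing each neighbourhood relative to its centroid is what makes the local regions independent of a global translation, and grouping at two radii mimics combining several scales at the same level.

```python
import numpy as np

def farthest_point_sampling(points, num_samples):
    """Greedily pick indices of points that are far from those already chosen."""
    chosen = [0]  # start from the first point (an arbitrary choice)
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(num_samples - 1):
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        # Track each point's distance to its nearest chosen centroid.
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

def group_local_regions(points, centroid_idx, radius):
    """For each centroid, return the neighbours within `radius` as offsets from it."""
    groups = []
    for i in centroid_idx:
        offsets = points - points[i]
        mask = np.linalg.norm(offsets, axis=1) <= radius
        groups.append(offsets[mask])
    return groups

rng = np.random.default_rng(0)
# Integer-valued coordinates keep the arithmetic exact for this demonstration.
cloud = rng.integers(-50, 50, size=(200, 3)).astype(float)
shift = np.array([10.0, -4.0, 7.0])

centroids = farthest_point_sampling(cloud, 8)
centroids_shifted = farthest_point_sampling(cloud + shift, 8)

small = group_local_regions(cloud, centroids, radius=10.0)   # fine scale
large = group_local_regions(cloud, centroids, radius=25.0)   # coarse scale
small_shifted = group_local_regions(cloud + shift, centroids_shifted, radius=10.0)
```

In a full network, a small PointNet would be applied to each group and the fine-scale and coarse-scale features would be concatenated; that part is omitted here.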
(1)3D Shape Classification
(2)3D Object Detection and Tracking
(3)3D Point Cloud Segmentation
(4) Hierarchical Data Structure-based Methods
(5)Other Methods
3D object detection methods can be divided into two categories: region proposal-based and single shot methods.
Several milestone methods are presented
1.Region Proposal-based Methods
These methods first propose several possible regions (also called proposals) containing objects, and then extract region-wise features to determine the category label of each proposal.
According to their object proposal generation approach, these methods can further be divided into three categories: multi-view based, segmentation-based and frustum-based methods.
(1)Multi-view based Methods
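These methods generate proposals from different views (for example LiDAR front view, Bird's Eye View (BEV), and image). Their computational cost is usually high.
The classic algorithm in this family is MV3D (2017, from Tsinghua University). It first runs a high-accuracy 2D CNN on the BEV view to generate ROIs, projects these ROIs into the other views, and then combines the proposals from the different views to generate 3D bounding boxes. Although its accuracy is very high, it runs very slowly, and subsequent work in this area has improved it in two directions (mostly aimed at raising detection speed).
First, several methods have been proposed to efficiently fuse the information of different modalities. Second, different methods have been investigated to extract robust representations of the input data.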
(1)MV3D (Multi-view 3D object detection network for autonomous driving)
(2)Joint 3D proposal generation and object detection from view aggregation
(3)Deep continuous fusion for multi-sensor 3D object detection
Multi-task multi-sensor fusion for 3D object detection
(4)PIXOR: Real-time 3D object detection from point clouds
(5)Fast and furious: Real time end-to-end 3D detection, tracking and motion forecasting with a single convolutional net
(6)RT3D: Real-time 3D vehicle detection in lidar point cloud for autonomous driving
(2)Segmentation-based Methods
These methods first leverage existing semantic segmentation techniques to remove most background points, and then generate a large number of high-quality proposals on the foreground points to save computation. These methods achieve higher object recall rates and are more suitable for complicated scenes with highly occluded and crowded objects.
(1)PointRCNN. Specifically, it directly segments 3D point clouds to obtain foreground points and then fuses semantic features and local spatial features to produce high-quality 3D boxes.
(3)STD (Sparse-to-dense 3D object detector for point cloud) associates each point with a spherical anchor and then uses each point's semantic score to remove redundant anchors, which gives the network a higher recall rate.
(3)Frustum-based Methods
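These methods first leverage existing 2D object detectors to generate 2D candidate regions of objects and then extract a 3D frustum proposal for each 2D candidate region. In other words, detection is first run on the 2D image, and the 3D proposal is then generated from the 2D proposal and the camera extrinsic parameters. This approach depends on the accuracy of the 2D detector and does not handle object occlusion well.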
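As a rough illustration of frustum extraction, the sketch below keeps the 3D points whose image projection falls inside a 2D detection box. It assumes a simple pinhole camera and points that have already been transformed into the camera frame (so the extrinsics are taken as already applied); the function name, the intrinsic matrix, and the box coordinates are hypothetical values chosen for the example.

```python
import numpy as np

def points_in_frustum(points_cam, intrinsics, box_2d):
    """Keep points (in the camera frame) whose projection lies inside a 2D box."""
    x_min, y_min, x_max, y_max = box_2d
    z = points_cam[:, 2]
    in_front = z > 0  # points behind the camera cannot be in the frustum

    # Pinhole projection: u = fx * x / z + cx, v = fy * y / z + cy.
    uvw = points_cam @ intrinsics.T
    safe_z = np.where(in_front, z, 1.0)  # avoid dividing by zero or negative depth
    u = uvw[:, 0] / safe_z
    v = uvw[:, 1] / safe_z

    inside = (u >= x_min) & (u <= x_max) & (v >= y_min) & (v <= y_max)
    return points_cam[in_front & inside]

# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

points = np.array([
    [0.0, 0.0, 10.0],   # projects to the image centre
    [5.0, 0.0, 10.0],   # projects far to the right of the box
    [0.0, 0.0, -10.0],  # behind the camera
    [0.2, 0.1, 20.0],   # projects close to the image centre
])

box = (300.0, 220.0, 340.0, 260.0)  # hypothetical 2D detection (x_min, y_min, x_max, y_max)
frustum_points = points_in_frustum(points, K, box)
```

Every point along the viewing rays through the box is kept regardless of depth, which is one reason an occluding object in front of the target also ends up inside the frustum.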
These methods directly predict class probabilities and regress 3D bounding boxes of objects using a single-stage network. They do not need region proposal generation and post-processing. As a result, they can run at a high speed.
According to the type of input data, single shot methods can be divided into three categories: BEV-based, discretization-based and point-based methods.
(1)BEV-based Methods.
These methods mainly take BEV representation as their input.
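A BEV representation can be built in several ways. The sketch below shows one simple variant, assumed here for illustration: the ground plane is divided into square cells, and each cell stores an occupancy flag and the maximum point height. The function name, ranges, and cell size are hypothetical, and real detectors typically use richer per-cell features.

```python
import numpy as np

def bev_occupancy_and_height(points, x_range, y_range, cell_size):
    """Rasterize a point cloud into a top-down occupancy grid and max-height map."""
    nx = int(np.ceil((x_range[1] - x_range[0]) / cell_size))
    ny = int(np.ceil((y_range[1] - y_range[0]) / cell_size))
    occupancy = np.zeros((nx, ny), dtype=bool)
    max_height = np.full((nx, ny), -np.inf)

    # Cell index of every point along x and y.
    ix = np.floor((points[:, 0] - x_range[0]) / cell_size).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / cell_size).astype(int)
    valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    ix, iy, z = ix[valid], iy[valid], points[valid, 2]

    occupancy[ix, iy] = True
    np.maximum.at(max_height, (ix, iy), z)  # unbuffered max, safe with repeated cells
    max_height[~occupancy] = 0.0            # empty cells get a neutral height
    return occupancy, max_height

points = np.array([
    [0.5, 0.5, 1.0],
    [0.6, 0.4, 2.0],    # same cell as the first point, but higher
    [3.2, 1.1, -0.5],
    [10.0, 10.0, 0.0],  # outside the grid, ignored
])
occupancy, max_height = bev_occupancy_and_height(
    points, x_range=(0.0, 4.0), y_range=(0.0, 2.0), cell_size=1.0
)
```

The resulting 2D arrays can be fed to an ordinary 2D CNN, which is what makes this family of methods fast.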
(2)Discretization-based Methods.
These methods convert a point cloud into a regular discrete representation, and then apply CNN to predict both categories and 3D boxes of objects.
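The conversion to a regular discrete representation is commonly done by voxelization. The following is a minimal sketch under that assumption: points are binned into cubic voxels and each occupied voxel is summarized by its point count and centroid. The function name, voxel size, and origin are hypothetical.

```python
import numpy as np

def voxelize(points, voxel_size, origin):
    """Bin points into cubic voxels; return voxel indices, centroids and counts."""
    idx = np.floor((points - origin) / voxel_size).astype(int)
    voxels, inverse, counts = np.unique(
        idx, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)  # guard against shape differences across NumPy versions

    # Sum the points that fall in each voxel, then divide by the count.
    centroids = np.zeros((len(voxels), 3))
    np.add.at(centroids, inverse, points)
    centroids /= counts[:, None]
    return voxels, centroids, counts

points = np.array([
    [0.1, 0.1, 0.1],
    [0.3, 0.2, 0.4],  # same voxel as the first point
    [1.2, 0.1, 0.1],
])
voxels, centroids, counts = voxelize(points, voxel_size=0.5, origin=np.zeros(3))
```

Only occupied voxels are stored here, which reflects the fact that most of a 3D grid built from a LiDAR scan is empty.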
(3)Point-based Methods:
These methods directly take raw point clouds as their inputs.
Given the locations of an object in the first frame, the task of object tracking is to estimate its state in subsequent frames. Since 3D object tracking can use the rich geometric information in point clouds, it is expected to overcome several drawbacks faced by image-based tracking, including occlusion, illumination and scale variation.
Some introductory articles on 3D scene flow estimation: https://zhuanlan.zhihu.com/p/85663856
(6)Just go with the flow
(2)3D object detectors have two drawbacks: they recognize distant objects poorly, and they do not make full use of the texture information in images.
(4) 3D object tracking and scene flow estimation are emerging research topics
According to the segmentation granularity, 3D point cloud segmentation methods can be classified into three categories: semantic segmentation (scene level), instance segmentation (object level) and part segmentation (part level).
1) 3D Semantic Segmentation
There are four paradigms for semantic segmentation: projection-based, discretization-based, point-based, and hybrid methods.
(1)Proposal-based Methods
These methods convert the instance segmentation problem into two sub-tasks: 3D object detection and instance mask prediction.
(2)Proposal-free Methods
These methods usually consider instance segmentation as a subsequent clustering step after semantic segmentation.
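To make the "clustering after semantic segmentation" idea concrete, here is a deliberately simple sketch. It assumes per-point semantic labels are already available and groups points of the same class into instances by spatial proximity. Published proposal-free methods generally cluster in a learned embedding space instead, so the raw-coordinate clustering, the function name, and the radius below are illustrative assumptions only.

```python
import numpy as np

def cluster_instances(points, semantic_labels, radius):
    """Assign instance ids by growing regions over nearby points of the same class."""
    instance_ids = np.full(len(points), -1, dtype=int)
    next_id = 0
    for seed in range(len(points)):
        if instance_ids[seed] != -1:
            continue  # already part of an instance
        instance_ids[seed] = next_id
        stack = [seed]
        while stack:
            i = stack.pop()
            near = np.linalg.norm(points - points[i], axis=1) <= radius
            same_class = semantic_labels == semantic_labels[i]
            unassigned = instance_ids == -1
            for j in np.flatnonzero(near & same_class & unassigned):
                instance_ids[j] = next_id
                stack.append(j)
        next_id += 1
    return instance_ids

points = np.array([
    [0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0],
    [5.0, 0.0, 0.0],
    [5.1, 0.0, 0.0],
    [0.05, 0.0, 0.0],  # close to the first group but a different class
])
semantic_labels = np.array([1, 1, 1, 1, 2])
instance_ids = cluster_instances(points, semantic_labels, radius=0.5)
```

The two well-separated groups of class 1 become separate instances, and the class 2 point forms its own instance even though it lies inside the first group.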
This paper introduces a new neural network architecture, 3D-SIS. The network proposes joint 2D-3D learning for the first time, learning from both geometry and RGB to improve instance segmentation quality. It is also a fully convolutional, end-to-end network, so it can run efficiently in large 3D environments.
input:(1)3D scan geometry features(2)2D RGB input features
output:(1)3D object bounding boxes (2)class labels(3)instance masks
This paper introduces 3D-SIS, a new approach for 3D semantic instance segmentation of RGB-D scans, which is trained in an end-to-end fashion to detect object instances and jointly learn features from RGB and geometry data.
The core idea of the method is to jointly learn features from RGB and geometry data using multi-view RGB-D input recorded with commodity RGB-D sensors, thus enabling accurate instance predictions.
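(1)The geometry of the space is obtained with BundleFusion (represented as voxels, TSDF).
2. 3D object detection with the 3D Detection Backbone
(1)3D convolutions are applied separately to the 3D Geometry and the 3D Color Features, and the resulting features are then fused.
(2)An anchor-based 3D RPN is applied to the fused result from (1) to generate the box locations of the 3D objects.
(3)Using the 3D boxes from (2) together with the 3D features inside each box, a 3D ROI step yields the class of the object in each 3D box, which completes detection and classification.
3. The 3D Mask Backbone assigns a mask to each detected 3D object.
(3)The network structure is not compact enough: two separate 3D CNNs are run on the two inputs, and some of that structure could in theory be shared.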