Pointnet提取的全局特征能够很好地完成分类任务,由于网络将所有的点最大池化为了一个全局特征,因此局部点与点之间的联系并没有被网络学习到,导致网络的输出缺乏点云的局部结构特征,因此PointNet对于场景的分割效果十分一般。在点云分类和物体的Part Segmentation中,这样的问题可以通过中心化物体的坐标轴部分地解决,但在场景分割中很难去解决。
原文地址:https://arxiv.org/abs/1706.02413
因此作者在此基础上又提出了能够实现点云作多层特征提取的Pointnet++网络,网络结构如下:
图片来源:https://arxiv.org/abs/1706.02413
下面介绍上图中的网络设计,传统的CNN在进行特征学习时,使用卷积核作为局部感受野,每层的卷积核共享权值,进过多层的特征学习,最后的输出会包含图像的局部特征信息。通过改变中借鉴CNN的采样思路,采取分层特征学习,即在小区域中使用点云采样+成组+提取局部特征(S+G+P)的方式,包含这三部分的机构称为Set Abstraction。
以2D点图为例,整个SA(Set Abstraction)三步的实现过程表示如下:
图片来源:https://arxiv.org/abs/1706.02413
每层新的中心点都是从上一层抽取的特征子集,中心点的个数就是成组的点集数,随着层数增加,中心点的个数也会逐渐降低,抽取到点云的局部结构特征。
当点云不均匀时,每个子区域中如果在分区的时候使用相同的球半径,会导致部分稀疏区域采样点过小。
文中提出**多尺度成组 (MSG)和多分辨率成组 (MRG)**两种解决办法。
简单概括这两种采样方法:
在该网络中作者使用了对输入点云进行随机采样(丢弃)random input dropout(DP)方法。Dropout的设计本身是为了降低过拟合,增强模型的鲁棒性,结果显示对于分类任务的效果也有不错的提升,作者给了一个对比图:
本文中使用的缩写说明:
def sample_and_group(npoint, radius, nsample, xyz, points, knn=False, use_xyz=True):
'''
输入参数说明:
Input:
npoint: int32,中心点的数量(分组数)
radius: float32,ball quary的球半径大小
nsample: int32,区域内采样到的点数
xyz: (batch_size, ndataset, 3) TF tensor,例如:分类任务起始值(32,1024,3)
points: (batch_size, ndataset, channel) TF tensor, 如果为None则等于xyz
knn: bool, True表示使用KNN方法采样,否则使用球半径搜索
use_xyz: bool, True 表示抽取的局部点的特征与xyz进行concat, 否则不进行,默认为True
输出参数说明:
Output:
new_xyz: (batch_size, npoint, 3) TF tensor
new_points: (batch_size, npoint, nsample, 3+channel) TF tensor,点的特征进行了concat
idx: (batch_size, npoint, nsample) TF tensor, 采样的局部区域内点的索引值
grouped_xyz: (batch_size, npoint, nsample, 3) TF tensor, 通过减去xyz对点进行区域归一化
注:源码中没有tf_ops/grouping和sampling/下没有放编译生成对应的链接库.so文件,可能要重新编译才能执行相应的py脚本
'''
#1.对原始点云输入进行采样和分组
new_xyz = gather_point(xyz, farthest_point_sample(npoint, xyz)) # (batch_size, npoint, 3)
if knn:
_,idx = knn_point(nsample, xyz, new_xyz)
else:
idx, pts_cnt = query_ball_point(radius, nsample, xyz, new_xyz)
grouped_xyz = group_point(xyz, idx) # (batch_size, npoint, nsample, 3)
grouped_xyz -= tf.tile(tf.expand_dims(new_xyz, 2), [1,1,nsample,1]) # translation normalization,减去中心点坐标进行区域坐标归一化
#2.对高层次特征进行分组
if points is not None:
grouped_points = group_point(points, idx) # (batch_size, npoint, nsample, channel)
if use_xyz:
new_points = tf.concat([grouped_xyz, grouped_points], axis=-1) # (batch_size, npoint, nample, 3+channel)
else:
new_points = grouped_points
else:
new_points = grouped_xyz
return new_xyz, new_points, idx, grouped_xyz
#在最后一次SA操作中,需要对全部特征进行采样分组
def sample_and_group_all(xyz, points, use_xyz=True):
'''
#输出变为三个参数,功能同上
Inputs:
xyz: (batch_size, ndataset, 3) TF tensor
points: (batch_size, ndataset, channel) TF tensor
use_xyz: bool
输出:
Outputs:
new_xyz: (batch_size, 1, 3) as (0,0,0)
new_points: (batch_size, 1, ndataset, 3+channel) TF tensor
Note:
等价于sample_and_group(npoint=1, radius=inf)以(0,0,0)为重心
'''
batch_size = xyz.get_shape()[0].value
nsample = xyz.get_shape()[1].value
new_xyz = tf.constant(np.tile(np.array([0,0,0]).reshape((1,1,3)), (batch_size,1,1)),dtype=tf.float32) # (batch_size, 1, 3)
idx = tf.constant(np.tile(np.array(range(nsample)).reshape((1,1,nsample)), (batch_size,1,1)))
grouped_xyz = tf.reshape(xyz, (batch_size, 1, nsample, 3)) # (batch_size, npoint=1, nsample, 3)
if points is not None:
if use_xyz:
new_points = tf.concat([xyz, points], axis=2) # (batch_size, 16, 259)
else:
new_points = points
new_points = tf.expand_dims(new_points, 1) # (batch_size, 1, 16, 259)
else:
new_points = grouped_xyz
return new_xyz, new_points, idx, grouped_xyz
def pointnet_sa_module(xyz, points, npoint, radius, nsample, mlp, mlp2, group_all, is_training, bn_decay, scope, bn=True, pooling='max', knn=False, use_xyz=True, use_nchw=False):
''' PointNet Set Abstraction (SA) Module
Input:
xyz: (batch_size, ndataset, 3) TF tensor
points: (batch_size, ndataset, channel) TF tensor
npoint: int32 -- 最远点采样点数(中心点数/成组数)
radius: float32 -- 局部区域的搜索半径
nsample: int32 -- 每个区域内的采样点数
mlp: list of int32 -- 对每个点进行MLP的网络(输出)大小
mlp2: list of int32 -- 对每个区域进行MLP的网络(输出)大小
group_all: bool -- 如果为True,则重写npoint, radius and nsample为None
use_xyz: bool, True 表示抽取的局部点的特征与xyz进行concat, 否则不进行
use_nchw: bool, True, 使用NCHW点云数据格式进行卷积, 作者指出这样比NHWC格式的计算更快
Return:
new_xyz: (batch_size, npoint, 3) TF tensor
new_points: (batch_size, npoint, mlp[-1] or mlp2[-1]) TF tensor
idx: (batch_size, npoint, nsample) int32 -- 区域索引
'''
data_format = 'NCHW' if use_nchw else 'NHWC'
with tf.variable_scope(scope) as sc:
# Sample and Grouping
if group_all:
nsample = xyz.get_shape()[1].value
new_xyz, new_points, idx, grouped_xyz = sample_and_group_all(xyz, points, use_xyz)
else:
new_xyz, new_points, idx, grouped_xyz = sample_and_group(npoint, radius, nsample, xyz, points, knn, use_xyz)
# Point Feature Embedding
if use_nchw: new_points = tf.transpose(new_points, [0,3,1,2])#nchw->nwch
for i, num_out_channel in enumerate(mlp):
new_points = tf_util.conv2d(new_points, num_out_channel, [1,1],
padding='VALID', stride=[1,1],
bn=bn, is_training=is_training,
scope='conv%d'%(i), bn_decay=bn_decay,
data_format=data_format)
if use_nchw: new_points = tf.transpose(new_points, [0,2,3,1])#nchw->nhwc
"""
省略 some code(区域max pooling)
"""
#针对稀疏点云加入多尺度采样(msg)
def pointnet_sa_module_msg(xyz, points, npoint, radius_list, nsample_list, mlp_list, is_training, bn_decay, scope, bn=True, use_xyz=True, use_nchw=False):
''' PointNet Set Abstraction (SA) module with Multi-Scale Grouping (MSG)
Input:
xyz: (batch_size, ndataset, 3) TF tensor
points: (batch_size, ndataset, channel) TF tensor
npoint: int32 -- #points sampled in farthest point sampling
radius: list of float32 -- search radius in local region
nsample: list of int32 -- how many points in each local region
mlp: list of list of int32 -- output size for MLP on each point
use_xyz: bool, if True concat XYZ with local point features, otherwise just use point features
use_nchw: bool, if True, use NCHW data format for conv2d, which is usually faster than NHWC format
Return:
new_xyz: (batch_size, npoint, 3) TF tensor
new_points: (batch_size, npoint, \sum_k{mlp[k][-1]}) TF tensor
'''
data_format = 'NCHW' if use_nchw else 'NHWC'
with tf.variable_scope(scope) as sc:
new_xyz = gather_point(xyz, farthest_point_sample(npoint, xyz))
new_points_list = []
for i in range(len(radius_list)):
radius = radius_list[i]
nsample = nsample_list[i]
idx, pts_cnt = query_ball_point(radius, nsample, xyz, new_xyz)
grouped_xyz = group_point(xyz, idx)
grouped_xyz -= tf.tile(tf.expand_dims(new_xyz, 2), [1,1,nsample,1])
if points is not None:
grouped_points = group_point(points, idx)
if use_xyz:
grouped_points = tf.concat([grouped_points, grouped_xyz], axis=-1)
else:
grouped_points = grouped_xyz
if use_nchw: grouped_points = tf.transpose(grouped_points, [0,3,1,2])
for j,num_out_channel in enumerate(mlp_list[i]):
grouped_points = tf_util.conv2d(grouped_points, num_out_channel, [1,1],
padding='VALID', stride=[1,1], bn=bn, is_training=is_training,
scope='conv%d_%d'%(i,j), bn_decay=bn_decay)
if use_nchw: grouped_points = tf.transpose(grouped_points, [0,2,3,1])
new_points = tf.reduce_max(grouped_points, axis=[2])
new_points_list.append(new_points)
new_points_concat = tf.concat(new_points_list, axis=-1)
return new_xyz, new_points_concat
def pointnet_fp_module(xyz1, xyz2, points1, points2, mlp, is_training, bn_decay, scope, bn=True):
''' PointNet Feature Propogation (FP) Module
FP层,作用是更新从插值操作和跳层连接合并来的特征
Input:
xyz1: (batch_size, ndataset1, 3) TF tensor
xyz2: (batch_size, ndataset2, 3) TF tensor, sparser than xyz1
points1: (batch_size, ndataset1, nchannel1) TF tensor
points2: (batch_size, ndataset2, nchannel2) TF tensor
mlp: list of int32 --对给个点进行mlp后的输出特征维度大小
Return:
new_points: (batch_size, ndataset1, mlp[-1]) TF tensor
注:这一部分会用到插值模块,源码中带有tf_ops/3d_interpolation/tf_interpolate_so.so文件可以使用,不用重新编译。不同于需要进行编译的grouping和sampling操作。
'''
with tf.variable_scope(scope) as sc:
dist, idx = three_nn(xyz1, xyz2)
dist = tf.maximum(dist, 1e-10)
norm = tf.reduce_sum((1.0/dist),axis=2,keep_dims=True)
norm = tf.tile(norm,[1,1,3])
weight = (1.0/dist) / norm
interpolated_points = three_interpolate(points2, idx, weight)
if points1 is not None:
new_points1 = tf.concat(axis=2, values=[interpolated_points, points1]) # B,ndataset1,nchannel1+nchannel2
else:
new_points1 = interpolated_points
new_points1 = tf.expand_dims(new_points1, 2)
for i, num_out_channel in enumerate(mlp):
new_points1 = tf_util.conv2d(new_points1, num_out_channel, [1,1],
padding='VALID', stride=[1,1],
bn=bn, is_training=is_training,
scope='conv_%d'%(i), bn_decay=bn_decay)
new_points1 = tf.squeeze(new_points1, [2]) # B,ndataset1,mlp[-1]
return new_points1
以上是SA和FP部分的代码实现,接下来对分类任务的代码进行解读。
以最基础的单尺度采样分组设计为例,结合代码了解模型的搭建过程。
def get_model(point_cloud, is_training, bn_decay=None):
""" Classification PointNet, input is BxNx3, output Bx40 """
batch_size = point_cloud.get_shape()[0].value
num_point = point_cloud.get_shape()[1].value
end_points = {}
l0_xyz = point_cloud
l0_points = None
end_points['l0_xyz'] = l0_xyz
# Set abstraction layers
# Note: When using NCHW for layer 2, we see increased GPU memory usage (in TF1.4).
# So we only use NCHW for layer 1 until this issue can be resolved.
"""
调用三次SA模块+三次全连接层+两次dropout=0.5,和PointNet一样,除最后一层外,在所有的全连接层后都会进行批量归一化操作+ReLU操作:
SA(512, 0.2, [64, 64, 128]) → SA(128, 0.4, [128, 128, 256]) → SA([256, 512, 1024]) →
FC(512, 0.5) → FC(256, 0.5) → FC(K)
"""
l1_xyz, l1_points, l1_indices = pointnet_sa_module(l0_xyz, l0_points, npoint=512, radius=0.2, nsample=32, mlp=[64,64,128], mlp2=None, group_all=False, is_training=is_training, bn_decay=bn_decay, scope='layer1', use_nchw=True)
l2_xyz, l2_points, l2_indices = pointnet_sa_module(l1_xyz, l1_points, npoint=128, radius=0.4, nsample=64, mlp=[128,128,256], mlp2=None, group_all=False, is_training=is_training, bn_decay=bn_decay, scope='layer2')
l3_xyz, l3_points, l3_indices = pointnet_sa_module(l2_xyz, l2_points, npoint=None, radius=None, nsample=None, mlp=[256,512,1024], mlp2=None, group_all=True, is_training=is_training, bn_decay=bn_decay, scope='layer3')
# Fully connected layers
net = tf.reshape(l3_points, [batch_size, -1])
net = tf_util.fully_connected(net, 512, bn=True, is_training=is_training, scope='fc1', bn_decay=bn_decay)
net = tf_util.dropout(net, keep_prob=0.5, is_training=is_training, scope='dp1')
net = tf_util.fully_connected(net, 256, bn=True, is_training=is_training, scope='fc2', bn_decay=bn_decay)
net = tf_util.dropout(net, keep_prob=0.5, is_training=is_training, scope='dp2')
net = tf_util.fully_connected(net, 40, activation_fn=None, scope='fc3')
return net, end_points
"""
对于多尺度的分类网络模型(MSG)对应于pointnet2_cls_msg.py,这里的半径和mlp维度都分别转变为向量和数组表示形式,整体的计算过程如下:
SA(512, [0.1, 0.2, 0.4], [[32, 32, 64], [64, 64, 128], [64, 96, 128]]) →
SA(128, [0.2, 0.4, 0.8], [[64, 64, 128], [128, 128, 256], [128, 128, 256]]) →
SA([256, 512, 1024]) → F C(512, 0.5) → F C(256, 0.5) → F C(K)
对于多分辨率分类模型(MRG),作者在附录中只是给出了设计的步骤,实现源码没有给出
"""
文章给出了针对ModelNet40S数据集上的分割模型的效果比较:
相比于Pointnet的结果,Pointnet++在此有小幅度的提升。
对于分割部分,会单独进行一次总结,文中给出的分割效果对比图:
结果显示在场景分割网络中,准确度关系为:MSG+DP > MRG+DP > SSG> PointNet
源码其余部分的介绍不详细展开,根据个人理解将源码的结构与功能设计展示如下:
本文主要结合代码层面总结了pointnet++网络设计以及分类任务的实现。重点理解pointnet++是如何利用set abstraction(SA)这种结构学习到局部结构上的特征,并通过跳步连接和多尺度采样(MSG+DP)来提高模型对点云的分割准确性。可以注意到pointnet++中在特征提取时使用pointnet网络,但是最后的结果的鲁棒性在不添加其他设计的情况下没有原网络好,并且作者没有继续使用T-net进行点云对齐的方法。
博客内容有很多理解不足之处,请多多交流指正,接下来将在此基础上继续进行相关论文的学习。
参考源码地址:
1.原论文实现代码
https://github.com/charlesq34/pointnet2
2.基于pytorch实现:
https://github.com/erikwijmans/Pointnet2_PyTorch
https://github.com/yanx27/Pointnet_Pointnet2_pytorch