PointNet processes the whole point cloud at once and max-pools it into a single global feature, so it ignores local structure. PointNet++ is designed to fix exactly this: it first partitions the cloud into overlapping subsets, applies PointNet to each subset to extract features, then aggregates, repeating hierarchically until a feature for the entire cloud is obtained. In effect, PointNet++ adds a hierarchical processing structure on top of PointNet. The resulting embedded features capture the semantics of the complete cloud and serve both whole-cloud classification (cls) and point-level semantic segmentation (seg).
PointNet++ as a whole has to solve two problems:
- how to partition the full point cloud into subsets
- how to abstract each point subset and extract local features
https://github.com/charlesq34/pointnet2
Code walkthrough
The core files live in the models folder. pointnet_cls_basic.py is the baseline PointNet; pointnet2_cls_ssg.py and pointnet2_cls_msg.py implement the single-scale-grouping and multi-scale-grouping variants, respectively.
Core shared module
Let's start with pointnet_sa_module, the core module shared by cls and seg, defined in ./utils/pointnet_util.py.
pointnet_sa_module (PointNet Set Abstraction Layer)
def pointnet_sa_module(xyz, points, npoint, radius, nsample, mlp, mlp2, group_all, is_training, bn_decay, scope, bn=True, pooling='max', knn=False, use_xyz=True, use_nchw=False):
''' PointNet Set Abstraction (SA) Module
Input:
xyz: (batch_size, ndataset, 3) TF tensor
points: (batch_size, ndataset, channel) TF tensor
npoint: int32 -- #points sampled in farthest point sampling
radius: float32 -- search radius in local region
nsample: int32 -- how many points in each local region
mlp: list of int32 -- output size for MLP on each point
mlp2: list of int32 -- output size for MLP on each region
group_all: bool -- group all points into one PC if set true, OVERRIDE
npoint, radius and nsample settings
use_xyz: bool, if True concat XYZ with local point features, otherwise just use point features
use_nchw: bool, if True, use NCHW data format for conv2d, which is usually faster than NHWC format
Return:
new_xyz: (batch_size, npoint, 3) TF tensor
new_points: (batch_size, npoint, mlp[-1] or mlp2[-1]) TF tensor
idx: (batch_size, npoint, nsample) int32 -- indices for local regions
'''
The docstring above spells out the meaning of each input and output; now let's step through the body.
data_format = 'NCHW' if use_nchw else 'NHWC'
with tf.variable_scope(scope) as sc:
# Sample and Grouping
if group_all:
nsample = xyz.get_shape()[1].value
new_xyz, new_points, idx, grouped_xyz = sample_and_group_all(xyz, points, use_xyz)
else:
new_xyz, new_points, idx, grouped_xyz = sample_and_group(npoint, radius, nsample, xyz, points, knn, use_xyz)
This block uses the boolean group_all flag to decide whether to run sample_and_group_all or sample_and_group.
sample_and_group
def sample_and_group(npoint, radius, nsample, xyz, points, knn=False, use_xyz=True):
'''
Input:
npoint: int32
radius: float32
nsample: int32
xyz: (batch_size, ndataset, 3) TF tensor
points: (batch_size, ndataset, channel) TF tensor, if None will just use xyz as points
knn: bool, if True use kNN instead of radius search
use_xyz: bool, if True concat XYZ with local point features, otherwise just use point features
Output:
new_xyz: (batch_size, npoint, 3) TF tensor
new_points: (batch_size, npoint, nsample, 3+channel) TF tensor
idx: (batch_size, npoint, nsample) TF tensor, indices of local points as in ndataset points
grouped_xyz: (batch_size, npoint, nsample, 3) TF tensor, normalized point XYZs
(subtracted by seed point XYZ) in local regions
'''
    '''
    farthest_point_sample selects npoint points from xyz via FPS and returns
    their indices; gather_point then gathers the corresponding points, giving
    the centroid sub-cloud new_xyz of shape (batch_size, npoint, 3).
    '''
new_xyz = gather_point(xyz, farthest_point_sample(npoint, xyz))
    '''
    Next, find a local-region neighborhood in xyz for every point of new_xyz.
    Depending on the grouping scheme: knn_point returns the nsample points
    nearest in metric space, while query_ball_point is constrained by both
    nsample and radius and returns pts_cnt points per ball (possibly fewer
    than nsample, with padding).
    idx is an int32 tensor of shape (batch_size, npoint, nsample) holding
    indices into the input points xyz.
    '''
if knn:
_,idx = knn_point(nsample, xyz, new_xyz)
else:
idx, pts_cnt = query_ball_point(radius, nsample, xyz, new_xyz)
    '''
    group_point uses the indices in idx to gather the local-region points
    into a (batch_size, npoint, nsample, 3) tensor.
    '''
grouped_xyz = group_point(xyz, idx) # (batch_size, npoint, nsample, 3)
grouped_xyz -= tf.tile(tf.expand_dims(new_xyz, 2), [1,1,nsample,1]) # translation normalization
    '''
    points: (batch_size, ndataset, channel) holds the features of all points
    under consideration; channel is the feature dimension.
    grouped_points: (batch_size, npoint, nsample, channel) gathers those
    features by idx.
    When use_xyz==True, coordinates and features are concatenated into
    (batch_size, npoint, nsample, 3+channel); otherwise only the features
    are output, without coordinates.
    '''
if points is not None:
grouped_points = group_point(points, idx) # (batch_size, npoint, nsample, channel)
if use_xyz:
            new_points = tf.concat([grouped_xyz, grouped_points], axis=-1) # (batch_size, npoint, nsample, 3+channel)
else:
new_points = grouped_points
else:
new_points = grouped_xyz
    '''
    new_xyz: the centroids of the newly formed sub-clouds.
    new_points: per-region data for every participating point -- raw 3D
    coordinates, or features, or coordinates concatenated with features,
    depending on the inputs.
    idx: the indices (into xyz) of every participating point.
    grouped_xyz: (batch_size, npoint, nsample, 3), the translation-normalized
    3D coordinates of the participating points.
    '''
return new_xyz, new_points, idx, grouped_xyz
sample_and_group_all is equivalent to sample_and_group with npoint=1, radius=inf, and (0,0,0) as the single centroid, which is exactly PointNet's original global processing.
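farthest_point_sample, gather_point, query_ball_point and group_point are custom CUDA ops shipped under tf_ops/, so their semantics are easiest to see in plain NumPy. Below is a minimal single-batch sketch for intuition only -- my own illustration, not repo code; in particular, padding short groups by repeating the first neighbor reflects my reading of the ball-query op:
import numpy as np

def farthest_point_sample_np(xyz, npoint):
    """xyz: (N, 3) -> int indices of npoint FPS samples."""
    n = xyz.shape[0]
    idx = np.zeros(npoint, dtype=np.int64)
    dist = np.full(n, np.inf)            # distance to the sampled set so far
    farthest = 0                         # seed point; the choice is arbitrary
    for i in range(npoint):
        idx[i] = farthest
        d = np.sum((xyz - xyz[farthest]) ** 2, axis=1)
        dist = np.minimum(dist, d)       # distance to the nearest sample so far
        farthest = int(np.argmax(dist))  # next: the point farthest from the set
    return idx

def query_ball_point_np(radius, nsample, xyz, new_xyz):
    """For each centroid in new_xyz (M, 3), up to nsample neighbor indices
    within radius; short groups are padded by repeating the first hit."""
    groups = []
    for c in new_xyz:
        hits = np.where(np.sum((xyz - c) ** 2, axis=1) < radius ** 2)[0]
        pad = np.full(nsample, hits[0] if len(hits) else 0)
        pad[:min(nsample, len(hits))] = hits[:nsample]
        groups.append(pad)
    return np.stack(groups)              # (M, nsample)

xyz = np.random.rand(1024, 3).astype(np.float32)
new_xyz = xyz[farthest_point_sample_np(xyz, 512)]   # centroids
idx = query_ball_point_np(0.2, 32, xyz, new_xyz)    # (512, 32)
grouped_xyz = xyz[idx] - new_xyz[:, None, :]        # translation normalization
Back inside pointnet_sa_module, the grouped tensor is pushed through a shared per-point MLP (a stack of 1x1 conv2d layers) and then pooled over each local region: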
# Point Feature Embedding
if use_nchw: new_points = tf.transpose(new_points, [0,3,1,2])
for i, num_out_channel in enumerate(mlp):
new_points = tf_util.conv2d(new_points, num_out_channel, [1,1],
padding='VALID', stride=[1,1],
bn=bn, is_training=is_training,
scope='conv%d'%(i), bn_decay=bn_decay,
data_format=data_format)
if use_nchw: new_points = tf.transpose(new_points, [0,2,3,1])
# Pooling in Local Regions
if pooling=='max':
new_points = tf.reduce_max(new_points, axis=[2], keep_dims=True, name='maxpool')
elif pooling=='avg':
new_points = tf.reduce_mean(new_points, axis=[2], keep_dims=True, name='avgpool')
elif pooling=='weighted_avg':
with tf.variable_scope('weighted_avg'):
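                # exp(-5*d), normalized over the nsample neighbors: a softmax over negative distance, so closer points receive larger weights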
dists = tf.norm(grouped_xyz,axis=-1,ord=2,keep_dims=True)
exp_dists = tf.exp(-dists * 5)
weights = exp_dists/tf.reduce_sum(exp_dists,axis=2,keep_dims=True) # (batch_size, npoint, nsample, 1)
new_points *= weights # (batch_size, npoint, nsample, mlp[-1])
new_points = tf.reduce_sum(new_points, axis=2, keep_dims=True)
elif pooling=='max_and_avg':
max_points = tf.reduce_max(new_points, axis=[2], keep_dims=True, name='maxpool')
avg_points = tf.reduce_mean(new_points, axis=[2], keep_dims=True, name='avgpool')
new_points = tf.concat([avg_points, max_points], axis=-1)
# [Optional] Further Processing
if mlp2 is not None:
if use_nchw: new_points = tf.transpose(new_points, [0,3,1,2])
for i, num_out_channel in enumerate(mlp2):
new_points = tf_util.conv2d(new_points, num_out_channel, [1,1],
padding='VALID', stride=[1,1],
bn=bn, is_training=is_training,
scope='conv_post_%d'%(i), bn_decay=bn_decay,
data_format=data_format)
if use_nchw: new_points = tf.transpose(new_points, [0,2,3,1])
        new_points = tf.squeeze(new_points, [2]) # (batch_size, npoint, mlp[-1] or mlp2[-1])
return new_xyz, new_points, idx
The remaining three blocks (point feature embedding, pooling, optional post-MLP) are fairly straightforward and need little commentary. Note the return values: new_xyz is the coordinates of the new centroid set, new_points is the corresponding features, and idx holds the indices of all points participating in each local region, with shape (batch_size, npoint, nsample).
The classification task
The core SSG model
The SSG code used for cls is as follows:
def get_model(point_cloud, is_training, bn_decay=None):
""" Classification PointNet, input is BxNx3, output Bx40 """
batch_size = point_cloud.get_shape()[0].value
num_point = point_cloud.get_shape()[1].value
end_points = {}
l0_xyz = point_cloud
l0_points = None
end_points['l0_xyz'] = l0_xyz
# Set abstraction layers
# Note: When using NCHW for layer 2, we see increased GPU memory usage (in TF1.4).
# So we only use NCHW for layer 1 until this issue can be resolved.
"""
从原始点云中选出512个点来,每个点在其周围选择至多32个点作为local region。
l1_xyz : (batch_size, 512, 3)
l1_points: (batch_size, 512, 128)
l1_indices:(batch_size, 512, 32)
"""
l1_xyz, l1_points, l1_indices = pointnet_sa_module(l0_xyz, l0_points, npoint=512, radius=0.2, nsample=32, mlp=[64,64,128], mlp2=None, group_all=False, is_training=is_training, bn_decay=bn_decay, scope='layer1', use_nchw=True)
"""
从512个点中选出128个点来,每个点在其周围选择至多64个点作为local region。
l2_xyz : (batch_size, 128, 3)
l2_points: (batch_size, 128, 256)
l2_indices:(batch_size, 128, 64)
"""
l2_xyz, l2_points, l2_indices = pointnet_sa_module(l1_xyz, l1_points, npoint=128, radius=0.4, nsample=64, mlp=[128,128,256], mlp2=None, group_all=False, is_training=is_training, bn_decay=bn_decay, scope='layer2')
"""
从128个点中group all。
l3_xyz : (batch_size, 1, 3)
l3_points: (batch_size, 1, 256)
l3_indices:(batch_size, 1, 128)
"""
l3_xyz, l3_points, l3_indices = pointnet_sa_module(l2_xyz, l2_points, npoint=None, radius=None, nsample=None, mlp=[256,512,1024], mlp2=None, group_all=True, is_training=is_training, bn_decay=bn_decay, scope='layer3')
# Fully connected layers
net = tf.reshape(l3_points, [batch_size, -1])
net = tf_util.fully_connected(net, 512, bn=True, is_training=is_training, scope='fc1', bn_decay=bn_decay)
net = tf_util.dropout(net, keep_prob=0.5, is_training=is_training, scope='dp1')
net = tf_util.fully_connected(net, 256, bn=True, is_training=is_training, scope='fc2', bn_decay=bn_decay)
net = tf_util.dropout(net, keep_prob=0.5, is_training=is_training, scope='dp2')
net = tf_util.fully_connected(net, 40, activation_fn=None, scope='fc3')
return net, end_points
As you can see, the data flows through three pointnet_sa_module stages, and the resulting global feature is fed into fully connected layers for classification.
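As a quick sanity check (each model file in the repo ends with a similar __main__ block), you can build the graph and print the output shape; a minimal sketch, assuming TF 1.x with models/ and utils/ on PYTHONPATH:
import tensorflow as tf  # TF 1.x

with tf.Graph().as_default():
    pc = tf.placeholder(tf.float32, shape=(8, 1024, 3))  # batch of 8 clouds, 1024 points each
    is_training = tf.placeholder(tf.bool, shape=())
    net, end_points = get_model(pc, is_training)
    print(net)  # Tensor of shape (8, 40): one logit per ModelNet40 class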
pointnet_fp_module
def pointnet_fp_module(xyz1, xyz2, points1, points2, mlp, is_training, bn_decay, scope, bn=True):
''' PointNet Feature Propagation (FP) Module
Input:
xyz1: (batch_size, ndataset1, 3) TF tensor
xyz2: (batch_size, ndataset2, 3) TF tensor, sparser than xyz1
points1: (batch_size, ndataset1, nchannel1) TF tensor
points2: (batch_size, ndataset2, nchannel2) TF tensor
mlp: list of int32 -- output size for MLP on each point
Return:
new_points: (batch_size, ndataset1, mlp[-1]) TF tensor
'''
with tf.variable_scope(scope) as sc:
dist, idx = three_nn(xyz1, xyz2)
dist = tf.maximum(dist, 1e-10)
norm = tf.reduce_sum((1.0/dist),axis=2,keep_dims=True)
norm = tf.tile(norm,[1,1,3])
weight = (1.0/dist) / norm
interpolated_points = three_interpolate(points2, idx, weight)
if points1 is not None:
new_points1 = tf.concat(axis=2, values=[interpolated_points, points1]) # B,ndataset1,nchannel1+nchannel2
else:
new_points1 = interpolated_points
new_points1 = tf.expand_dims(new_points1, 2)
for i, num_out_channel in enumerate(mlp):
new_points1 = tf_util.conv2d(new_points1, num_out_channel, [1,1],
padding='VALID', stride=[1,1],
bn=bn, is_training=is_training,
scope='conv_%d'%(i), bn_decay=bn_decay)
new_points1 = tf.squeeze(new_points1, [2]) # B,ndataset1,mlp[-1]
return new_points1
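To close the loop on segmentation: pointnet_fp_module is the decoder-side counterpart of the SA module. three_nn finds, for each point in the denser set xyz1, its three nearest neighbors in the sparser set xyz2, and three_interpolate blends their features with inverse-distance weights; the tf.maximum(dist, 1e-10) clamp guards against division by zero. The interpolated features are optionally concatenated with the skip connection points1 and refined by a unit pointnet of 1x1 convolutions. The interpolation step is simple enough to sketch in NumPy (single batch; my own illustration of what the custom ops compute, and the scheme is the same whether three_nn reports plain or squared distances):
import numpy as np

def three_interpolate_np(xyz1, xyz2, points2):
    """Propagate features points2 (M, C) living on xyz2 (M, 3) onto the
    denser xyz1 (N, 3) by inverse-distance weighting over 3 nearest neighbors."""
    d = np.linalg.norm(xyz1[:, None, :] - xyz2[None, :, :], axis=-1)  # (N, M)
    idx = np.argsort(d, axis=1)[:, :3]                 # 3 nearest neighbors in xyz2
    dist = np.maximum(np.take_along_axis(d, idx, axis=1), 1e-10)
    w = 1.0 / dist
    w = w / w.sum(axis=1, keepdims=True)               # (N, 3), rows sum to 1
    return np.einsum('nk,nkc->nc', w, points2[idx])    # (N, C)

xyz1, xyz2 = np.random.rand(512, 3), np.random.rand(128, 3)
points2 = np.random.rand(128, 256)
print(three_interpolate_np(xyz1, xyz2, points2).shape)  # (512, 256)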