I first came across MinkowskiEngine about two months ago, but did not pay much attention to it at the time. Recently, while reading point cloud papers, I noticed that quite a few of them release source code built on MinkowskiEngine, including PointContrast (ECCV 2020), DGR (Deep Global Registration, CVPR 2020), Learning Multiview 3D Point Cloud Registration (CVPR 2020), and FCGF (Fully Convolutional Geometric Features, ICCV 2019), so I decided to take a closer look at this library.
MinkowskiEngine is an auto-differentiation library for sparse tensors, designed for operations on sparse data in high-dimensional spaces. It supports all common neural network layers, such as convolution, pooling, unpooling, and broadcast operations. On top of MinkowskiEngine one can implement point cloud segmentation, classification, reconstruction, completion, detection, and other tasks.
MinkowskiEngine was introduced in 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks (CVPR 2019); see the project homepage and source code. At the time of writing it has reached v0.5 and is still under active development.
Representing point cloud data
MinkowskiEngine represents a point cloud as two parts: a coordinate matrix $C$ and a feature matrix $F$:

$$C = \left[ \begin{matrix} x_1 & y_1 & z_1 & b_1 \\ \vdots & \vdots & \vdots & \vdots \\ x_N & y_N & z_N & b_N \end{matrix} \right], \quad F = \left[ \begin{matrix} f_1^T \\ \vdots \\ f_N^T \end{matrix} \right]$$

where $(x_i, y_i, z_i)$ is the coordinate of the $i$-th point, $b_i$ indicates which point cloud in the batch $(x_i, y_i, z_i)$ belongs to (MinkowskiEngine also organizes point clouds into batches for training), $N$ is the total number of points in one batch, and $f_i^T$ is the feature of the $i$-th point, which can be 1-dimensional, 3-dimensional, or any other dimensionality.
Compared with a dense 3D grid representation of shape (X, Y, Z, D), this saves memory: since $N \ll XYZ$, we have $N \cdot 4 + N \cdot D \ll X \cdot Y \cdot Z \cdot D$ (where $\ll$ means "much smaller than").
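As a quick sanity check, the sketch below compares the storage cost of the sparse $(C, F)$ representation against a dense grid. The grid resolution $100^3$, $N = 16$ points, and $D = 3$ channels are made-up numbers for illustration, not values fixed by MinkowskiEngine:

```python
# Compare storage cost of sparse (C, F) vs. a dense (X, Y, Z, D) grid.
# All sizes below are illustrative assumptions.
X = Y = Z = 100   # dense grid resolution
D = 3             # feature channels
N = 16            # occupied points (N << X*Y*Z)

dense_entries = X * Y * Z * D      # every grid cell stores D values
sparse_entries = N * 4 + N * D     # coords (x, y, z, b) + features

print(dense_entries)   # 3000000
print(sparse_entries)  # 112
print(dense_entries / sparse_entries)
```

Even for this tiny example the dense grid stores tens of thousands of times more entries, and the gap grows with resolution.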
Standard 3D convolution:

$$\text{x}_u^{\text{out}} = \sum_{\text{i} \in V(K)} W_\text{i}\, \text{x}_{u + \text{i}}^{\text{in}}, \quad \text{for} \; u \in \mathbb{Z}^3$$

Here $u \in \mathbb{Z}^3$ is a 3D coordinate, $K$ is the kernel size, $V(K)$ is the set of offsets in 3D space, and $W_\text{i} \in \mathbb{R}^{N^{\text{out}} \times N^{\text{in}}}$.
Minkowski convolution:

$$\text{x}_u^{\text{out}} = \sum_{\text{i} \in N(u, \mathbb{C}^{\text{in}})} W_\text{i}\, \text{x}_{u + \text{i}}^{\text{in}}, \quad \text{for} \; u \in \mathbb{C}^{\text{out}}$$

Compared with standard 3D convolution, what changes is $u \in \mathbb{C}^{\text{out}}$ and $\text{i} \in N(u, \mathbb{C}^{\text{in}})$. $\mathbb{C}^{\text{in}}$ and $\mathbb{C}^{\text{out}}$ are the predefined input and output coordinate sets of the sparse tensor, and $N(u, \mathbb{C}^{\text{in}}) = \lbrace \text{i} \mid u + \text{i} \in \mathbb{C}^{\text{in}}, \text{i} \in V(K) \rbrace$. So, unlike standard convolution, not every $(x, y, z)$ location produces a convolution output, and not every kernel offset participates in the computation.
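To make the definition of $N(u, \mathbb{C}^{\text{in}})$ concrete, here is a small sketch (plain Python, not MinkowskiEngine code; the input coordinates are made up) that enumerates, for one output coordinate $u$, which kernel offsets actually land on an input coordinate:

```python
import itertools

# Toy input coordinate set C_in (3D, batch index omitted); kernel size K = 3.
C_in = {(0, 0, 0), (1, 0, 0), (0, 2, 0), (1, 1, 1)}
K = 3

# V(K): all offsets in {-1, 0, 1}^3 for a kernel of size 3
V = list(itertools.product(range(-(K // 2), K // 2 + 1), repeat=3))

def neighbor_offsets(u, C_in):
    """N(u, C_in) = {i | u + i in C_in, i in V(K)}"""
    return [i for i in V
            if tuple(uc + ic for uc, ic in zip(u, i)) in C_in]

u = (0, 0, 0)
N = neighbor_offsets(u, C_in)
print(len(V))  # 27 offsets in a dense 3x3x3 kernel
print(N)       # only the offsets hitting an existing input coordinate
```

Only `len(N)` terms enter the convolution sum at `u` instead of all 27, which is exactly the saving the Minkowski formulation exploits.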
Common operations in convolutional networks include Conv, BN, Pooling, FC, transposed Conv, etc. This section experiments with these operations in MinkowskiEngine using two very simple networks, one for classification and one for segmentation. The experimental environment is Ubuntu 14.04, CUDA 10.2, PyTorch 1.5, Python 3.7, MinkowskiEngine v0.5 (the code below may also work in other environments).
For easier inspection, set batch size = 2: the first point cloud P1 has 10 points and the second point cloud P2 has 6 points, and each point carries a 3-dimensional feature. The following code generates P1 and P2 and converts them into MinkowskiEngine's input format.
import numpy as np
import torch.nn as nn
import MinkowskiEngine as ME

def print_title(s, data):
    print('=' * 20, s, '=' * 20)
    print(data)

if __name__ == '__main__':
    # P1: 10 points, P2: 6 points; every point gets a 3-dim all-ones feature
    origin_pc1 = 100 * np.random.uniform(0, 1, (10, 3))
    feat1 = np.ones((10, 3), dtype=np.float32)
    origin_pc2 = 100 * np.random.uniform(0, 1, (6, 3))
    feat2 = np.ones((6, 3), dtype=np.float32)
    print_title('origin_pc1', origin_pc1)
    print_title('origin_pc2', origin_pc2)

    # Quantize the coordinates and collate both clouds into one batch
    coords, feats = ME.utils.sparse_collate([origin_pc1, origin_pc2], [feat1, feat2])
    print_title('coords', coords)
    print_title('feats', feats)

    # Wrap into the sparse tensor format MinkowskiEngine operates on
    input = ME.SparseTensor(feats, coordinates=coords)
    print_title('input', input)
Output of the program above:
==================== origin_pc1 ====================
[[83.28334147 28.87414665 44.48401738]
[43.04924052 34.66068275 28.201644 ]
[22.51394645 53.53203799 25.68239097]
[11.39696393 27.68488056 18.02263419]
[68.04944494 78.4874799 33.54077384]
[83.11021987 95.29080943 72.42599245]
[68.96104764 64.38640545 56.64488121]
[61.26343854 35.13968286 10.67545387]
[95.5847873 56.20865881 5.97082126]
[63.43547357 75.31685552 67.71327187]]
==================== origin_pc2 ====================
[[21.01681082 32.60864402 14.68910937]
[76.90920828 40.72511594 17.21551445]
[67.84378491 80.58219012 43.75387818]
[45.97922404 77.97593435 3.17289328]
[39.91144138 80.02990713 44.97847053]
[ 1.55805162 57.33833007 92.04541106]]
==================== coords ====================
tensor([[ 0, 83, 28, 44],
[ 0, 43, 34, 28],
[ 0, 22, 53, 25],
[ 0, 11, 27, 18],
[ 0, 68, 78, 33],
[ 0, 83, 95, 72],
[ 0, 68, 64, 56],
[ 0, 61, 35, 10],
[ 0, 95, 56, 5],
[ 0, 63, 75, 67],
[ 1, 21, 32, 14],
[ 1, 76, 40, 17],
[ 1, 67, 80, 43],
[ 1, 45, 77, 3],
[ 1, 39, 80, 44],
[ 1, 1, 57, 92]], dtype=torch.int32)
==================== feats ====================
tensor([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
==================== input ====================
SparseTensor(
coordinates=tensor([[ 0, 83, 28, 44],
[ 0, 43, 34, 28],
[ 0, 22, 53, 25],
[ 0, 11, 27, 18],
[ 0, 68, 78, 33],
[ 0, 83, 95, 72],
[ 0, 68, 64, 56],
[ 0, 61, 35, 10],
[ 0, 95, 56, 5],
[ 0, 63, 75, 67],
[ 1, 21, 32, 14],
[ 1, 76, 40, 17],
[ 1, 67, 80, 43],
[ 1, 45, 77, 3],
[ 1, 39, 80, 44],
[ 1, 1, 57, 92]], dtype=torch.int32)
features=tensor([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
coordinate_map_key=coordinate map key:[1, 1, 1]
coordinate_manager=CoordinateMapManagerCPU(
[1, 1, 1]: CoordinateMapCPU:16x4
algorithm=MinkowskiAlgorithm.DEFAULT
)
spatial dimension=3)
From the output we can see that ME.utils.sparse_collate quantizes the coordinates of P1 and P2 (truncating them to integers) and collates them into batch format: a leading 0 marks points belonging to P1 and a leading 1 marks points belonging to P2.
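The coordinate quantization step can be mimicked with plain NumPy. This is only an illustrative sketch of the floor-to-integer conversion and batch-index prefixing, not the actual sparse_collate implementation; the sample coordinates are made up:

```python
import numpy as np

# Illustrative float coordinates (made up), as they look before collation
pc = np.array([[83.28, 28.87, 44.48],
               [43.04, 34.66, 28.20]])

# Floor each coordinate to an integer grid cell, as in the printed `coords`
quantized = np.floor(pc).astype(np.int32)

# Prepend the batch index (0 for the first cloud in the batch)
batch_idx = np.zeros((len(pc), 1), dtype=np.int32)
coords = np.hstack([batch_idx, quantized])
print(coords)
# [[ 0 83 28 44]
#  [ 0 43 34 28]]
```

This matches the pattern visible in the `coords` tensor above: one integer row of (batch, x, y, z) per point.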
ME.SparseTensor converts the collated data into a SparseTensor, the data format MinkowskiEngine requires; a SparseTensor carries both the coordinates and the features. With the input ready, we implement a simple classification network: Conv(3, 64) + BN + ReLU + GlobalPooling + FC(64, 32).
import numpy as np
import torch.nn as nn
import MinkowskiEngine as ME

class ExampleNetwork(ME.MinkowskiNetwork):
    def __init__(self, in_feat, out_feat, D=3):
        super(ExampleNetwork, self).__init__(D)
        # Conv(3, 64) + BN + ReLU
        self.conv = nn.Sequential(
            ME.MinkowskiConvolution(
                in_channels=in_feat,
                out_channels=64,
                kernel_size=3,
                stride=1,
                dilation=1,
                bias=False,
                dimension=D),
            ME.MinkowskiBatchNorm(64),
            ME.MinkowskiReLU())
        # Global average pooling: one feature vector per point cloud
        self.pooling = ME.MinkowskiGlobalPooling(ME.PoolingMode.GLOBAL_AVG_POOLING_KERNEL)
        self.linear = ME.MinkowskiLinear(64, out_feat)

    def forward(self, x):
        out = self.conv(x)
        print('conv: ', out.coordinates.size(), out.features.size())
        out = self.pooling(out)
        print('pooling: ', out.coordinates.size(), out.features.size())
        out = self.linear(out)
        print('linear: ', out.coordinates.size(), out.features.size())
        return out

if __name__ == '__main__':
    origin_pc1 = 100 * np.random.uniform(0, 1, (10, 3))
    feat1 = np.ones((10, 3), dtype=np.float32)
    origin_pc2 = 100 * np.random.uniform(0, 1, (6, 3))
    feat2 = np.ones((6, 3), dtype=np.float32)
    coords, feats = ME.utils.sparse_collate([origin_pc1, origin_pc2], [feat1, feat2])
    input = ME.SparseTensor(feats, coordinates=coords)

    net = ExampleNetwork(in_feat=3, out_feat=32)
    output = net(input)
    for k, v in net.named_parameters():
        print(k, v.size())
The program output is:
conv: torch.Size([16, 4]) torch.Size([16, 64])
pooling: torch.Size([2, 4]) torch.Size([2, 64])
linear: torch.Size([2, 4]) torch.Size([2, 32])
conv.0.kernel torch.Size([27, 3, 64])
conv.1.bn.weight torch.Size([64])
conv.1.bn.bias torch.Size([64])
linear.linear.weight torch.Size([32, 64])
linear.linear.bias torch.Size([32])
From the output above we can see: the stride-1 convolution keeps all 16 input coordinates while expanding the features from 3 to 64 channels; global pooling collapses the 16 points to 2 rows, one per point cloud in the batch; and the convolution kernel has shape [27, 3, 64], i.e. 27 = 3³ spatial offsets, each with its own 3×64 weight matrix.
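The per-cloud global average pooling can be mimicked in plain NumPy (an illustrative sketch with made-up features, not MinkowskiEngine's implementation): rows sharing the same batch index are averaged, yielding one row per point cloud:

```python
import numpy as np

# Made-up per-point data: 5 points with batch index 0, 3 with index 1
batch_idx = np.array([0, 0, 0, 0, 0, 1, 1, 1])
feats = np.arange(8 * 4, dtype=np.float32).reshape(8, 4)  # 8 points, 4 channels

# Average the features over the points of each cloud
pooled = np.stack([feats[batch_idx == b].mean(axis=0)
                   for b in np.unique(batch_idx)])
print(pooled.shape)  # (2, 4): one feature vector per point cloud
```

This is why the pooling output above has exactly 2 rows regardless of how many points each cloud contains.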
The input is again two point clouds P1 and P2, this time with 100 points and 6 points respectively, and the network is Conv(3, 64, stride=2) + BN + ReLU + transposed Conv(64, 4). The code is as follows:
import numpy as np
import torch
import torch.nn as nn
import MinkowskiEngine as ME
import MinkowskiEngine.MinkowskiFunctional as MEF

class ExampleNetwork(ME.MinkowskiNetwork):
    def __init__(self, in_feat, out_feat, D=3):
        super(ExampleNetwork, self).__init__(D)
        # Strided conv downsamples the coordinate map
        self.conv = ME.MinkowskiConvolution(
            in_channels=in_feat,
            out_channels=64,
            kernel_size=3,
            stride=2,
            dilation=1,
            bias=False,
            dimension=D)
        self.bn = ME.MinkowskiBatchNorm(64)
        # Transposed conv with the same stride upsamples back
        self.conv_tr = ME.MinkowskiConvolutionTranspose(
            in_channels=64,
            out_channels=4,
            kernel_size=3,
            stride=2,
            dilation=1,
            bias=False,
            dimension=D)

    def forward(self, x):
        print('input: ', x.coordinates.size(), x.features.size())
        out = self.conv(x)
        print('conv: ', out.coordinates.size(), out.features.size())
        out = self.bn(out)
        print('bn: ', out.coordinates.size(), out.features.size())
        out = MEF.relu(out)
        print('relu: ', out.coordinates.size(), out.features.size())
        out = self.conv_tr(out)
        print('conv_tr', out.coordinates.size(), out.features.size())
        return out

if __name__ == '__main__':
    # P1: 100 points packed into a small 5x5x5 range, P2: 6 points
    origin_pc1 = 5 * np.random.uniform(0, 1, (100, 3))
    feat1 = np.ones((100, 3), dtype=np.float32)
    origin_pc2 = 100 * np.random.uniform(0, 1, (6, 3))
    feat2 = np.ones((6, 3), dtype=np.float32)
    coords, feats = ME.utils.sparse_collate([origin_pc1, origin_pc2], [feat1, feat2])
    input = ME.SparseTensor(feats, coordinates=coords)

    net = ExampleNetwork(in_feat=3, out_feat=32)
    output = net(input)
    # Check whether coordinates / features survive the round trip
    print(torch.equal(input.coordinates, output.coordinates))
    print(torch.equal(input.features, output.features))
The output is:
input: torch.Size([74, 4]) torch.Size([74, 3])
conv: torch.Size([31, 4]) torch.Size([31, 64])
bn: torch.Size([31, 4]) torch.Size([31, 64])
relu: torch.Size([31, 4]) torch.Size([31, 64])
conv_tr torch.Size([74, 4]) torch.Size([74, 4])
True
False
From this experiment we can observe that the first comparison prints True: the input and output coordinates are identical, i.e. a conv with stride 2 followed by a transposed conv with the same stride changes neither the number of points nor their order in the tensor (the features, of course, are transformed, hence the second comparison prints False). Note also that the input has 74 rows rather than 106: P1's 100 points lie in a small 5×5×5 range, so quantization merges points that fall into the same integer cell. This is my first encounter with MinkowskiEngine, so corrections are welcome.