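Many recent 3D detection papers build on PV-RCNN and Voxel R-CNN, so it is time to walk through the source code of both networks.
The codebase is OpenPCDet 0.5.2.
Let's start with the Voxel R-CNN config file.
The VFE, BACKBONE_3D, MAP_TO_BEV, BACKBONE_2D, and DENSE_HEAD modules are the same as in SECOND, which means AnchorHeadSingle is used as the first-stage head. So let's go straight to this DENSE_HEAD.
The AnchorHeadSingle source file is at: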
OpenPCDet/pcdet/models/dense_heads/anchor_head_single.py
First, the initialization.
import numpy as np
import torch.nn as nn

from .anchor_head_template import AnchorHeadTemplate


class AnchorHeadSingle(AnchorHeadTemplate):
    def __init__(self, model_cfg, input_channels, num_class, class_names, grid_size, point_cloud_range,
                 predict_boxes_when_training=True, **kwargs):
        super().__init__(
            model_cfg=model_cfg, num_class=num_class, class_names=class_names, grid_size=grid_size, point_cloud_range=point_cloud_range,
            predict_boxes_when_training=predict_boxes_when_training
        )

        # The template stores one anchor count per class; sum them into a single number.
        self.num_anchors_per_location = sum(self.num_anchors_per_location)

        # 1x1 convs: one score per (anchor, class) and one box code per anchor.
        self.conv_cls = nn.Conv2d(
            input_channels, self.num_anchors_per_location * self.num_class,
            kernel_size=1
        )
        self.conv_box = nn.Conv2d(
            input_channels, self.num_anchors_per_location * self.box_coder.code_size,
            kernel_size=1
        )

        # Optional direction classifier: NUM_DIR_BINS logits per anchor.
        if self.model_cfg.get('USE_DIRECTION_CLASSIFIER', None) is not None:
            self.conv_dir_cls = nn.Conv2d(
                input_channels,
                self.num_anchors_per_location * self.model_cfg.NUM_DIR_BINS,
                kernel_size=1
            )
        else:
            self.conv_dir_cls = None
        self.init_weights()

    def init_weights(self):
        pi = 0.01
        nn.init.constant_(self.conv_cls.bias, -np.log((1 - pi) / pi))
        nn.init.normal_(self.conv_box.weight, mean=0, std=0.001)
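It inherits from AnchorHeadTemplate. The template sets up functionality shared by all anchor heads (for example: generating anchors, and initializing target_assigner and box_coder). TODO: walk through these three modules.
Then comes AnchorHeadSingle's own initialization, which is simple: __init__ only creates a few nn.Conv2d layers for the classification and regression predictions.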
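The bias value set in init_weights looks like the usual focal-loss prior trick: with a bias of -log((1 - pi) / pi), the sigmoid of the initial logit is about pi whenever the weighted input is close to zero, so every anchor starts with roughly a 1% foreground probability. A minimal check in plain Python (the helper names prior_bias and sigmoid are hypothetical and not part of OpenPCDet):

```python
import math


def prior_bias(pi: float) -> float:
    """Return the bias b for which sigmoid(b) equals pi."""
    return -math.log((1.0 - pi) / pi)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


bias = prior_bias(0.01)          # same formula as in init_weights
initial_prob = sigmoid(bias)     # probability implied by the bias alone
print(round(bias, 4), round(initial_prob, 4))
```

Starting from a low foreground probability keeps the very large number of background anchors from dominating the classification loss in the first iterations.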
Print the module to take a look.
(dense_head): AnchorHeadSingle(
  (cls_loss_func): SigmoidFocalClassificationLoss()
  (reg_loss_func): WeightedSmoothL1Loss()
  (dir_loss_func): WeightedCrossEntropyLoss()
  (conv_cls): Conv2d(256, 2, kernel_size=(1, 1), stride=(1, 1))
  (conv_box): Conv2d(256, 14, kernel_size=(1, 1), stride=(1, 1))
  (conv_dir_cls): Conv2d(256, 4, kernel_size=(1, 1), stride=(1, 1))
)
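A note on the in/out channels of these Conv2d layers:
# After initialization there are 2 anchors per feature-map location, i.e. every pixel of the feature map carries 2 anchors.
self.num_anchors_per_location = sum(self.num_anchors_per_location)
conv_cls predicts the classification score of each anchor, turning the 256-channel feature map into 2 channels (2 anchors × 1 class, going by the printed layer).
conv_box regresses the offsets of each anchor (anchor-based methods predict the offset between the anchor and the ground-truth box). The 14 channels are the offsets for 2 anchors, since a 3D bounding box is defined by 7 parameters: x, y, z, dx, dy, dz, heading.
conv_dir_cls predicts the direction (SECOND proposed classifying the direction first and then adding the regressed angle offset). Its 4 channels are 2 anchors × NUM_DIR_BINS, which implies 2 bins here.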
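The output channel counts of the three layers follow directly from the expressions in __init__. The sketch below just redoes that arithmetic; head_out_channels is a hypothetical helper, and num_class=1 and num_dir_bins=2 are inferred from the printed module rather than read from the config:

```python
def head_out_channels(num_anchors_per_location, num_class, code_size, num_dir_bins):
    """Mirror the out_channels expressions used in AnchorHeadSingle.__init__."""
    return {
        'conv_cls': num_anchors_per_location * num_class,
        'conv_box': num_anchors_per_location * code_size,
        'conv_dir_cls': num_anchors_per_location * num_dir_bins,
    }


# Assumed values: 2 anchors per location, 1 class, 7-value box code, 2 direction bins.
channels = head_out_channels(
    num_anchors_per_location=2, num_class=1, code_size=7, num_dir_bins=2
)
print(channels)
```

With these assumed values the result matches the 2, 14, and 4 output channels in the printed module.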
Now the forward pass.
    def forward(self, data_dict):
        spatial_features_2d = data_dict['spatial_features_2d']

        cls_preds = self.conv_cls(spatial_features_2d)
        box_preds = self.conv_box(spatial_features_2d)

        cls_preds = cls_preds.permute(0, 2, 3, 1).contiguous()  # [N, H, W, C]
        box_preds = box_preds.permute(0, 2, 3, 1).contiguous()  # [N, H, W, C]

        self.forward_ret_dict['cls_preds'] = cls_preds
        self.forward_ret_dict['box_preds'] = box_preds

        if self.conv_dir_cls is not None:
            dir_cls_preds = self.conv_dir_cls(spatial_features_2d)
            dir_cls_preds = dir_cls_preds.permute(0, 2, 3, 1).contiguous()
            self.forward_ret_dict['dir_cls_preds'] = dir_cls_preds
        else:
            dir_cls_preds = None

        if self.training:
            targets_dict = self.assign_targets(
                gt_boxes=data_dict['gt_boxes']
            )
            self.forward_ret_dict.update(targets_dict)

        if not self.training or self.predict_boxes_when_training:
            batch_cls_preds, batch_box_preds = self.generate_predicted_boxes(
                batch_size=data_dict['batch_size'],
                cls_preds=cls_preds, box_preds=box_preds, dir_cls_preds=dir_cls_preds
            )
            data_dict['batch_cls_preds'] = batch_cls_preds
            data_dict['batch_box_preds'] = batch_box_preds
            data_dict['cls_preds_normalized'] = False

        return data_dict
First, see what the incoming data_dict contains:
(Pdb) data_dict.keys()
dict_keys(['frame_id', 'gt_boxes', 'points', 'points_before_aug', 'flip_x', 'noise_rot', 'noise_scale', 'calib', 'use_lead_xyz', 'voxels', 'voxel_coords', 'voxel_num_points', 'image_shape', 'batch_size', 'voxel_features', 'encoded_spconv_tensor', 'encoded_spconv_tensor_stride', 'multi_scale_3d_features', 'multi_scale_3d_strides', 'spatial_features', 'spatial_features_stride', 'spatial_features_2d'])
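spatial_features: the feature map produced from the 3D backbone output (collapsed to BEV by MAP_TO_BEV)
spatial_features_stride: the downsampling stride (factor) of the 3D backbone
spatial_features_2d: the feature map output by the 2D backbone (the BEV feature map the head consumes)
Check the shape of this BEV feature map: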
(Pdb) spatial_features_2d.shape
torch.Size([2, 256, 200, 176])
It is a 256-channel 2D feature map, which matches the input channels of the three 1x1 convs described above.
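The 200 × 176 spatial size can be traced back to the voxel grid. The sketch below assumes the common KITTI settings (POINT_CLOUD_RANGE = [0, -40, -3, 70.4, 40, 1], VOXEL_SIZE = [0.05, 0.05, 0.1]) and an overall downsampling stride of 8; none of these values appear in the text above, so treat them as assumptions, and bev_feature_size as a hypothetical helper:

```python
def bev_feature_size(point_cloud_range, voxel_size, stride):
    """Return (H, W) of the BEV map, with H along y and W along x."""
    x_min, y_min, _, x_max, y_max, _ = point_cloud_range
    grid_x = round((x_max - x_min) / voxel_size[0])  # voxels along x
    grid_y = round((y_max - y_min) / voxel_size[1])  # voxels along y
    return grid_y // stride, grid_x // stride


# Assumed KITTI-style config values, not taken from the walkthrough itself.
height, width = bev_feature_size(
    point_cloud_range=[0, -40, -3, 70.4, 40, 1],
    voxel_size=[0.05, 0.05, 0.1],
    stride=8,
)
print(height, width)
```

Under those assumptions the voxel grid is 1408 × 1600 in x and y, and dividing by 8 gives the 176 and 200 seen in spatial_features_2d.shape.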
cls_preds = self.conv_cls(spatial_features_2d)
box_preds = self.conv_box(spatial_features_2d)

cls_preds = cls_preds.permute(0, 2, 3, 1).contiguous()  # [N, H, W, C]
box_preds = box_preds.permute(0, 2, 3, 1).contiguous()  # [N, H, W, C]

self.forward_ret_dict['cls_preds'] = cls_preds
self.forward_ret_dict['box_preds'] = box_preds

if self.conv_dir_cls is not None:
    dir_cls_preds = self.conv_dir_cls(spatial_features_2d)
    dir_cls_preds = dir_cls_preds.permute(0, 2, 3, 1).contiguous()
    self.forward_ret_dict['dir_cls_preds'] = dir_cls_preds
else:
    dir_cls_preds = None
Here are the outputs of the three 1x1 convs.
(Pdb) cls_preds.shape, box_preds.shape, dir_cls_preds.shape
(torch.Size([2, 200, 176, 2]), torch.Size([2, 200, 176, 14]), torch.Size([2, 200, 176, 4]))
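Moving the channel axis to the end means the values belonging to one location sit next to each other, so a later reshape can split them into one row per anchor. A shape-only sketch in plain Python (no tensors involved; permute_shape and flatten_per_anchor are hypothetical helpers that only track shapes):

```python
def permute_shape(shape, order):
    """Shape after a permute with the given axis order."""
    return tuple(shape[i] for i in order)


def flatten_per_anchor(nhwc_shape, values_per_anchor):
    """Shape after reshaping [N, H, W, C] into [N, num_anchors, values_per_anchor]."""
    n, h, w, c = nhwc_shape
    assert c % values_per_anchor == 0, "channels must split evenly across anchors"
    anchors_per_location = c // values_per_anchor
    return (n, h * w * anchors_per_location, values_per_anchor)


box_nchw = (2, 14, 200, 176)                         # conv_box output, [N, C, H, W]
box_nhwc = permute_shape(box_nchw, (0, 2, 3, 1))     # [N, H, W, C]
box_flat = flatten_per_anchor(box_nhwc, 7)           # one 7-value row per anchor
print(box_nhwc, box_flat)
```

The same bookkeeping applied to the classification output with one value per anchor gives a [2, 70400, 1] shape.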
Next, each anchor is matched against the gt boxes (note: the original anchors, not the anchors plus predictions).
if self.training:
    targets_dict = self.assign_targets(
        gt_boxes=data_dict['gt_boxes']
    )
    self.forward_ret_dict.update(targets_dict)
(Pdb) targets_dict['box_cls_labels'].shape
torch.Size([2, 70400])
(Pdb) targets_dict['box_reg_targets'].shape
torch.Size([2, 70400, 7])
(Pdb) targets_dict['reg_weights'].shape
torch.Size([2, 70400])
2 is the batch_size and 70400 is the total number of anchors (200 × 176 × 2).
At this point most values are 0; only anchors matched to a gt box carry values.
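For a matched anchor, the 7 values in box_reg_targets are encoded residuals rather than raw box coordinates. The sketch below follows the residual encoding used by SECOND-style box coders as I understand it (centre offsets normalised by the anchor's BEV diagonal and height, log ratios for sizes, plain difference for heading). encode_residual is a hypothetical helper, and the box values are made up for illustration:

```python
import math


def encode_residual(gt_box, anchor):
    """Encode a gt box (x, y, z, dx, dy, dz, heading) relative to an anchor."""
    xg, yg, zg, dxg, dyg, dzg, rg = gt_box
    xa, ya, za, dxa, dya, dza, ra = anchor
    diagonal = math.sqrt(dxa ** 2 + dya ** 2)  # BEV diagonal of the anchor
    return [
        (xg - xa) / diagonal,
        (yg - ya) / diagonal,
        (zg - za) / dza,
        math.log(dxg / dxa),
        math.log(dyg / dya),
        math.log(dzg / dza),
        rg - ra,
    ]


# Illustrative values only, not taken from a real frame.
anchor = [10.0, 2.0, -1.0, 3.9, 1.6, 1.56, 0.0]
gt_box = [10.5, 2.2, -0.9, 4.2, 1.7, 1.5, 0.1]
target = encode_residual(gt_box, anchor)
print([round(v, 4) for v in target])
```

A gt box identical to its anchor encodes to all zeros, which is why unmatched rows and perfectly aligned rows look the same in the raw tensor.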
(Pdb) targets_dict['box_reg_targets']
tensor([[[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]],

        [[0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         ...,
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.],
         [0., 0., 0., ..., 0., 0., 0.]]], device='cuda:0')
(Pdb) targets_dict['box_cls_labels'].nonzero().shape
torch.Size([508, 2])
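The [508, 2] shape means 508 non-zero labels, each reported as a (batch index, anchor index) pair, out of 2 × 70400 = 140800 anchors (about 0.36%). One caveat, as far as I can tell from the loss code in AnchorHeadTemplate: labels greater than 0 are foreground, 0 is background, and -1 marks anchors that are ignored, so nonzero() counts foreground and ignored anchors together. A small plain-Python sketch with toy labels (summarize_labels is a hypothetical helper, and the list is not real data):

```python
def summarize_labels(labels):
    """Count foreground (> 0), background (== 0), and ignored (< 0) labels."""
    positives = sum(1 for v in labels if v > 0)
    negatives = sum(1 for v in labels if v == 0)
    ignored = sum(1 for v in labels if v < 0)
    return positives, negatives, ignored


toy_labels = [0, 0, 1, -1, 0, 1, 0, -1, -1, 0]   # toy data for illustration
positives, negatives, ignored = summarize_labels(toy_labels)
nonzero_count = positives + ignored               # what a nonzero() count would report
print(positives, negatives, ignored, nonzero_count)
```

To count only matched anchors in the debugger, comparing the labels against 0 with a greater-than test would be the safer check.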
Next, the predicted boxes are generated.
if not self.training or self.predict_boxes_when_training:
    batch_cls_preds, batch_box_preds = self.generate_predicted_boxes(  # not examined yet
        batch_size=data_dict['batch_size'],
        cls_preds=cls_preds, box_preds=box_preds, dir_cls_preds=dir_cls_preds
    )
    data_dict['batch_cls_preds'] = batch_cls_preds
    data_dict['batch_box_preds'] = batch_box_preds  # [2, 70400, 7]
    data_dict['cls_preds_normalized'] = False
(Pdb) batch_cls_preds.shape
torch.Size([2, 70400, 1])
(Pdb) batch_box_preds.shape
torch.Size([2, 70400, 7])
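Although generate_predicted_boxes has not been examined here, the core of turning a 7-value prediction back into a box is most likely the inverse of the residual encoding. The sketch below shows that inverse under this assumption; decode_residual is a hypothetical helper, the numbers are illustrative, and the heading correction from the direction classifier is left out:

```python
import math


def decode_residual(residual, anchor):
    """Turn a predicted residual back into a box (x, y, z, dx, dy, dz, heading)."""
    xt, yt, zt, dxt, dyt, dzt, rt = residual
    xa, ya, za, dxa, dya, dza, ra = anchor
    diagonal = math.sqrt(dxa ** 2 + dya ** 2)  # BEV diagonal of the anchor
    return [
        xt * diagonal + xa,
        yt * diagonal + ya,
        zt * dza + za,
        math.exp(dxt) * dxa,
        math.exp(dyt) * dya,
        math.exp(dzt) * dza,
        rt + ra,
    ]


# Illustrative anchor and prediction, not taken from a real frame.
anchor_box = [10.0, 2.0, -1.0, 3.9, 1.6, 1.56, 0.0]
predicted = [0.1, -0.05, 0.02, 0.0, 0.0, 0.0, 0.3]
decoded = decode_residual(predicted, anchor_box)
print([round(v, 4) for v in decoded])
```

A zero residual decodes to the anchor itself, so an untrained box branch (initialized with a tiny std of 0.001) starts out predicting boxes very close to the anchors.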
That completes the forward pass of the first-stage head.