[3D Object Detection] [Code Reading] VoxelRCNN RPN (Part 1)

Many of the recent 3D detection papers build on PV-RCNN and Voxel R-CNN, so it's time to walk through the source code of these two networks.

Codebase: OpenPCDet 0.5.2

Let's first look at the VoxelRCNN config file.

[Image: the VoxelRCNN config file]
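
The DENSE_HEAD section of that config amounts to roughly the following once loaded into model_cfg (recalled from OpenPCDet's tools/cfgs/kitti_models/voxel_rcnn_car.yaml, so treat the exact values as indicative rather than authoritative):

# Rough sketch of the DENSE_HEAD config as a Python dict; in the repo this
# lives in YAML (tools/cfgs/kitti_models/voxel_rcnn_car.yaml).
DENSE_HEAD = {
    'NAME': 'AnchorHeadSingle',
    'CLASS_AGNOSTIC': False,
    'USE_DIRECTION_CLASSIFIER': True,
    'DIR_OFFSET': 0.78539,          # ~pi/4, used when limiting the heading
    'DIR_LIMIT_OFFSET': 0.0,
    'NUM_DIR_BINS': 2,              # SECOND-style 2-bin direction classifier
    'ANCHOR_GENERATOR_CONFIG': [{
        'class_name': 'Car',
        'anchor_sizes': [[3.9, 1.6, 1.56]],   # (dx, dy, dz) of the Car anchor
        'anchor_rotations': [0, 1.57],        # 2 rotations -> 2 anchors/location
        'anchor_bottom_heights': [-1.78],
        'feature_map_stride': 8,
        'matched_threshold': 0.6,             # IoU above this -> positive anchor
        'unmatched_threshold': 0.45,          # IoU below this -> negative anchor
    }],
}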

VFE, BACKBONE_3D, MAP_TO_BEV, BACKBONE_2D, and DENSE_HEAD are the same as those used in SECOND. In other words, AnchorHeadSingle serves as the first-stage head. Let's go straight to this DENSE_HEAD.

The AnchorHeadSingle source file lives at:

OpenPCDet/pcdet/models/dense_heads/anchor_head_single.py

First, the initialization.

import numpy as np
import torch.nn as nn

from .anchor_head_template import AnchorHeadTemplate


class AnchorHeadSingle(AnchorHeadTemplate):
    def __init__(self, model_cfg, input_channels, num_class, class_names, grid_size, point_cloud_range,
                 predict_boxes_when_training=True, **kwargs):
        super().__init__(
            model_cfg=model_cfg, num_class=num_class, class_names=class_names, grid_size=grid_size, point_cloud_range=point_cloud_range,
            predict_boxes_when_training=predict_boxes_when_training
        )

        self.num_anchors_per_location = sum(self.num_anchors_per_location)

        self.conv_cls = nn.Conv2d(
            input_channels, self.num_anchors_per_location * self.num_class,
            kernel_size=1
        )
        self.conv_box = nn.Conv2d(
            input_channels, self.num_anchors_per_location * self.box_coder.code_size,
            kernel_size=1
        )

        if self.model_cfg.get('USE_DIRECTION_CLASSIFIER', None) is not None:
            self.conv_dir_cls = nn.Conv2d(
                input_channels,
                self.num_anchors_per_location * self.model_cfg.NUM_DIR_BINS,
                kernel_size=1
            )
        else:
            self.conv_dir_cls = None
        self.init_weights()

    def init_weights(self):
        pi = 0.01
        nn.init.constant_(self.conv_cls.bias, -np.log((1 - pi) / pi))
        nn.init.normal_(self.conv_box.weight, mean=0, std=0.001)

It inherits from AnchorHeadTemplate. This template initializes the functionality common to all anchor heads (e.g. anchor generation, the target_assigner, the box_coder). #TODO: an analysis of these three modules
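
Before moving on, here is a minimal sketch of what SECOND-style anchor generation amounts to (illustrative only; the real AnchorGenerator in OpenPCDet handles multiple classes, sizes and alignment options): place one anchor per rotation at every BEV cell. The z center of -1.0 is an assumption derived from the Car anchor's bottom height of -1.78 plus half its height.

import torch

def generate_anchors_sketch(grid_h=200, grid_w=176,
                            pc_range=(0, -40, -3, 70.4, 40, 1),
                            size=(3.9, 1.6, 1.56), rotations=(0.0, 1.57),
                            z_center=-1.0):
    # Anchor center coordinates for every BEV cell.
    x = torch.linspace(pc_range[0], pc_range[3], grid_w)
    y = torch.linspace(pc_range[1], pc_range[4], grid_h)
    yy, xx = torch.meshgrid(y, x, indexing='ij')           # [H, W]
    per_rot = []
    for rot in rotations:
        # One (x, y, z, dx, dy, dz, heading) anchor per cell for this rotation.
        a = torch.stack([xx, yy,
                         torch.full_like(xx, z_center),
                         torch.full_like(xx, size[0]),
                         torch.full_like(xx, size[1]),
                         torch.full_like(xx, size[2]),
                         torch.full_like(xx, rot)], dim=-1)  # [H, W, 7]
        per_rot.append(a)
    return torch.stack(per_rot, dim=2).reshape(-1, 7)       # [H*W*2, 7]

print(generate_anchors_sketch().shape)  # torch.Size([70400, 7])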


Then comes AnchorHeadSingle's own initialization, which is quite simple: as the __init__ shows, it just creates a few nn.Conv2d layers for the classification and regression predictions.

Print the module and take a look:

  (dense_head): AnchorHeadSingle(
    (cls_loss_func): SigmoidFocalClassificationLoss()
    (reg_loss_func): WeightedSmoothL1Loss()
    (dir_loss_func): WeightedCrossEntropyLoss()
    (conv_cls): Conv2d(256, 2, kernel_size=(1, 1), stride=(1, 1))
    (conv_box): Conv2d(256, 14, kernel_size=(1, 1), stride=(1, 1))
    (conv_dir_cls): Conv2d(256, 4, kernel_size=(1, 1), stride=(1, 1))
  ) 

An explanation of the Conv2d in/out channels:

# After initialization, num_anchors_per_location is 2: every BEV feature
# map pixel carries 2 anchors (the Car anchor at 2 rotations).
self.num_anchors_per_location = sum(self.num_anchors_per_location)

conv_cls predicts the classification confidence of each anchor, turning the 256-channel feature map into 2 channels (2 anchors x 1 class, since this config detects Car only).

conv_box regresses each anchor's offsets (anchor-based methods predict the residual between the anchor and the ground-truth box). The 14 channels are the offsets of the 2 anchors: a 3D bounding box is determined by 7 parameters (x, y, z, dx, dy, dz, heading), so 2 x 7 = 14.

conv_dir_cls classifies the heading direction (SECOND proposed first classifying the direction into bins and then adding a residual angle): 2 anchors x NUM_DIR_BINS (2) = 4 channels.
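
The three output widths can be sanity-checked with these numbers (num_class = 1 and NUM_DIR_BINS = 2 are assumptions, but they match the printed 2/14/4 channels):

num_anchors_per_location = 2   # 1 anchor size x 2 rotations
num_class = 1                  # Car only in this config
code_size = 7                  # (x, y, z, dx, dy, dz, heading)
num_dir_bins = 2               # assumed SECOND default

print(num_anchors_per_location * num_class)     # 2  -> conv_cls out_channels
print(num_anchors_per_location * code_size)     # 14 -> conv_box out_channels
print(num_anchors_per_location * num_dir_bins)  # 4  -> conv_dir_cls out_channels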

Now the forward pass:

    def forward(self, data_dict):
        spatial_features_2d = data_dict['spatial_features_2d']

        cls_preds = self.conv_cls(spatial_features_2d)
        box_preds = self.conv_box(spatial_features_2d)

        cls_preds = cls_preds.permute(0, 2, 3, 1).contiguous()  # [N, H, W, C]
        box_preds = box_preds.permute(0, 2, 3, 1).contiguous()  # [N, H, W, C]

        self.forward_ret_dict['cls_preds'] = cls_preds
        self.forward_ret_dict['box_preds'] = box_preds

        if self.conv_dir_cls is not None:
            dir_cls_preds = self.conv_dir_cls(spatial_features_2d)
            dir_cls_preds = dir_cls_preds.permute(0, 2, 3, 1).contiguous()
            self.forward_ret_dict['dir_cls_preds'] = dir_cls_preds
        else:
            dir_cls_preds = None

        if self.training:
            targets_dict = self.assign_targets(
                gt_boxes=data_dict['gt_boxes']
            )
            self.forward_ret_dict.update(targets_dict)

        if not self.training or self.predict_boxes_when_training:
            batch_cls_preds, batch_box_preds = self.generate_predicted_boxes(
                batch_size=data_dict['batch_size'],
                cls_preds=cls_preds, box_preds=box_preds, dir_cls_preds=dir_cls_preds
            )
            data_dict['batch_cls_preds'] = batch_cls_preds
            data_dict['batch_box_preds'] = batch_box_preds
            data_dict['cls_preds_normalized'] = False

        return data_dict

First, let's see what the incoming data_dict contains:

(Pdb) data_dict.keys()
dict_keys(['frame_id', 'gt_boxes', 'points', 'points_before_aug', 'flip_x', 'noise_rot', 'noise_scale', 'calib', 'use_lead_xyz', 'voxels', 'voxel_coords', 'voxel_num_points', 'image_shape', 'batch_size', 'voxel_features', 'encoded_spconv_tensor', 'encoded_spconv_tensor_stride', 'multi_scale_3d_features', 'multi_scale_3d_strides', 'spatial_features', 'spatial_features_stride', 'spatial_features_2d'])

spatial_features : the BEV feature map produced by MAP_TO_BEV, which flattens the 3D backbone output along the height axis

spatial_features_stride : the downsampling stride (factor) of the 3D backbone

spatial_features_2d : the feature map output by the 2D backbone (i.e. the BEV feature map the head consumes)
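
As an aside, MAP_TO_BEV (HeightCompression in OpenPCDet) just densifies the sparse 3D output and folds its depth axis into channels; a minimal sketch (the 128-channel, depth-2 input shape is assumed from the usual KITTI VoxelBackBone8x output):

import torch

# Sketch of HeightCompression (pcdet/models/backbones_2d/map_to_bev):
# densify the sparse 3D tensor, then fold the depth axis D into channels.
def height_compression(encoded_dense: torch.Tensor) -> torch.Tensor:
    N, C, D, H, W = encoded_dense.shape
    return encoded_dense.view(N, C * D, H, W)

# With the assumed KITTI VoxelBackBone8x output (128 channels, depth 2):
bev = height_compression(torch.zeros(2, 128, 2, 200, 176))
print(bev.shape)  # torch.Size([2, 256, 200, 176])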

Let's check the shape of this BEV feature map:

(Pdb) spatial_features_2d.shape
torch.Size([2, 256, 200, 176])
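
Incidentally, the 200 x 176 spatial size can be derived from the voxelization settings (assuming the usual KITTI values in the config: point cloud range [0, -40, -3, 70.4, 40, 1], 0.05 m voxels in x/y, and the 3D backbone's 8x stride):

pc_range = (0.0, -40.0, -3.0, 70.4, 40.0, 1.0)   # x/y/z min, then x/y/z max
voxel_size = (0.05, 0.05, 0.1)
stride = 8                                        # spatial_features_stride

W = (pc_range[3] - pc_range[0]) / voxel_size[0] / stride   # 70.4/0.05/8 = 176
H = (pc_range[4] - pc_range[1]) / voxel_size[1] / stride   # 80/0.05/8   = 200
print(int(H), int(W))   # 200 176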

It is a 256-channel 2D feature map, matching the input channels of the three 1x1 convs described above, and forward applies them to it directly:

        cls_preds = self.conv_cls(spatial_features_2d)
        box_preds = self.conv_box(spatial_features_2d)

        cls_preds = cls_preds.permute(0, 2, 3, 1).contiguous()  # [N, H, W, C]
        box_preds = box_preds.permute(0, 2, 3, 1).contiguous()  # [N, H, W, C]

        self.forward_ret_dict['cls_preds'] = cls_preds
        self.forward_ret_dict['box_preds'] = box_preds

        if self.conv_dir_cls is not None:
            dir_cls_preds = self.conv_dir_cls(spatial_features_2d)
            dir_cls_preds = dir_cls_preds.permute(0, 2, 3, 1).contiguous()
            self.forward_ret_dict['dir_cls_preds'] = dir_cls_preds
        else:
            dir_cls_preds = None

Now look at the outputs of these three 1x1 convs:

(Pdb) cls_preds.shape, box_preds.shape, dir_cls_preds.shape
(torch.Size([2, 200, 176, 2]), torch.Size([2, 200, 176, 14]), torch.Size([2, 200, 176, 4]))

Next, each anchor is matched against the GT boxes (note: the original anchors, not anchors plus predicted offsets).

        if self.training:
            targets_dict = self.assign_targets(
                gt_boxes=data_dict['gt_boxes']
            )
            self.forward_ret_dict.update(targets_dict)

(Pdb) targets_dict['box_cls_labels'].shape
torch.Size([2, 70400])
(Pdb) targets_dict['box_reg_targets'].shape
torch.Size([2, 70400, 7])
(Pdb) targets_dict['reg_weights'].shape
torch.Size([2, 70400]) 

2 is the batch_size; 70400 is the total number of anchors (200 x 176 locations x 2 anchors).

At this point most values are 0; only anchors matched to a GT box get nonzero targets:

(Pdb) targets_dict['box_reg_targets']
tensor([[[0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         ...,
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.]],

        [[0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         ...,
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.]]], device='cuda:0')
(Pdb) targets_dict['box_cls_labels'].nonzero().shape
torch.Size([508, 2])
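
So box_cls_labels has 508 nonzero entries here (foreground anchors, plus any anchors labelled -1 that are ignored in the loss). For the positives, box_reg_targets holds the residuals between the matched GT box and the anchor; a sketch of the encoding in the spirit of OpenPCDet's ResidualCoder (the SECOND encoding), ignoring details such as size clamping in the real coder:

import torch

def encode_residuals(anchors, gt_boxes):
    # Both inputs are [N, 7]: (x, y, z, dx, dy, dz, heading).
    xa, ya, za, dxa, dya, dza, ra = torch.split(anchors, 1, dim=-1)
    xg, yg, zg, dxg, dyg, dzg, rg = torch.split(gt_boxes, 1, dim=-1)
    diagonal = torch.sqrt(dxa ** 2 + dya ** 2)   # BEV diagonal of the anchor
    xt = (xg - xa) / diagonal                    # center offsets, normalized
    yt = (yg - ya) / diagonal
    zt = (zg - za) / dza
    dxt = torch.log(dxg / dxa)                   # log-scale size residuals
    dyt = torch.log(dyg / dya)
    dzt = torch.log(dzg / dza)
    rt = rg - ra                                 # heading residual
    return torch.cat([xt, yt, zt, dxt, dyt, dzt, rt], dim=-1)  # [N, 7]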

Finally, the predicted boxes are generated:

        if not self.training or self.predict_boxes_when_training:
            batch_cls_preds, batch_box_preds = self.generate_predicted_boxes( # not examined in detail yet
                batch_size=data_dict['batch_size'],
                cls_preds=cls_preds, box_preds=box_preds, dir_cls_preds=dir_cls_preds
            )
            data_dict['batch_cls_preds'] = batch_cls_preds
            data_dict['batch_box_preds'] = batch_box_preds # [2, 70400, 7]
            data_dict['cls_preds_normalized'] = False

(Pdb) batch_cls_preds.shape
torch.Size([2, 70400, 1])
(Pdb) batch_box_preds.shape
torch.Size([2, 70400, 7])
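
generate_predicted_boxes decodes all 70400 anchors with the inverse of the residual encoding sketched earlier, then uses dir_cls_preds to resolve the heading's 180-degree ambiguity. A sketch of that direction correction, mirroring the limit_period logic in AnchorHeadTemplate (DIR_OFFSET = 0.78539 and NUM_DIR_BINS = 2 are assumed from the config):

import math
import torch

def apply_direction_bins(heading, dir_cls_preds,
                         dir_offset=0.78539, num_bins=2, dir_limit_offset=0.0):
    # Pick the direction bin with the highest score for every box.
    # heading: [B, 70400], dir_cls_preds: [B, 70400, num_bins]
    dir_labels = torch.max(dir_cls_preds, dim=-1)[1]
    period = 2 * math.pi / num_bins             # pi when num_bins == 2
    # limit_period: wrap (heading - dir_offset) into [0, period)
    val = heading - dir_offset
    dir_rot = val - torch.floor(val / period + dir_limit_offset) * period
    # Re-add the offset and shift the heading by the classified bin.
    return dir_rot + dir_offset + period * dir_labels.to(heading.dtype)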

That wraps up the forward pass of the first-stage head.
