CenterPoint Source Code Walkthrough (Part 2)

Continued from the previous post, CenterPoint Source Code Walkthrough (Part 1).

Contents:
2. Backbone: feature extraction
2.1 voxelize: voxelization
2.2 Voxel encoder: PillarFeatureNet (PFN)
2.3 Middle encoder: PointPillarsScatter
2.4 backbone: SECOND
3. Neck
4. Head and loss
4.1 CenterHead
4.2 loss

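2. Backbone: feature extraction

Reference: PointPillars, a 3D object detection algorithm for LiDAR point clouds

2.1 voxelize: voxelization

The main implementation class is Voxelization, which converts the point cloud into a voxel representation.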

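  • voxels: 30000*20*5, i.e. up to 30000 voxels, up to 20 points per voxel, and 5 values per point
  • coors: voxel coordinates, 30000*3
  • num_points_per_voxel: the number of points in each voxel
These sizes are the values passed in at run time; max_points=35 and max_voxels=20000 in the signature are only defaults.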
    def forward(ctx,
                points,
                voxel_size,
                coors_range,
                max_points=35,
                max_voxels=20000,
                deterministic=True):
        """convert kitti points(N, >=3) to voxels. 
        """
        if max_points == -1 or max_voxels == -1:
            coors = points.new_zeros(size=(points.size(0), 3), dtype=torch.int)
            dynamic_voxelize(points, coors, voxel_size, coors_range, 3)
            return coors
        else:
            voxels = points.new_zeros(
                size=(max_voxels, max_points, points.size(1))) #30000,20,5
            coors = points.new_zeros(size=(max_voxels, 3), dtype=torch.int) #30000,3
            num_points_per_voxel = points.new_zeros(
                size=(max_voxels, ), dtype=torch.int)
            voxel_num = hard_voxelize(points, voxels, coors,
                                      num_points_per_voxel, voxel_size,
                                      coors_range, max_points, max_voxels, 3,
                                      deterministic) # CUDA voxelization op; returned 29249 in this run
            # select the valid voxels, dropping the unused (empty) slots
            voxels_out = voxels[:voxel_num]
            coors_out = coors[:voxel_num]
            num_points_per_voxel_out = num_points_per_voxel[:voxel_num]  # number of points in each voxel
            return voxels_out, coors_out, num_points_per_voxel_out
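
The CUDA op hard_voxelize is not shown in this post. As a rough illustration of what it computes, here is a minimal pure-Python sketch, assuming points are visited in order, that points opening a new voxel are dropped once max_voxels is reached, and that extra points in a full voxel are dropped. The function name hard_voxelize_sketch and the toy inputs are hypothetical; the real op works on tensors and may differ in detail.

```python
import math


def hard_voxelize_sketch(points, voxel_size, coors_range, max_points, max_voxels):
    """Group points into voxels; returns (voxels, coors, num_points_per_voxel).

    points: list of tuples whose first three entries are x, y, z.
    coors are stored as (z_idx, y_idx, x_idx).
    """
    x_min, y_min, z_min, x_max, y_max, z_max = coors_range
    vx, vy, vz = voxel_size
    slot_of = {}  # voxel coordinate -> row in the output lists
    voxels, coors, counts = [], [], []
    for p in points:
        x, y, z = p[0], p[1], p[2]
        # Drop points outside the detection range.
        if not (x_min <= x < x_max and y_min <= y < y_max and z_min <= z < z_max):
            continue
        key = (math.floor((z - z_min) / vz),
               math.floor((y - y_min) / vy),
               math.floor((x - x_min) / vx))
        if key not in slot_of:
            if len(voxels) >= max_voxels:
                continue  # voxel budget exhausted: skip points that open new voxels
            slot_of[key] = len(voxels)
            voxels.append([])
            coors.append(key)
            counts.append(0)
        slot = slot_of[key]
        if counts[slot] < max_points:  # extra points in a full voxel are dropped
            voxels[slot].append(tuple(p))
            counts[slot] += 1
    # Zero-pad every voxel to max_points rows, mirroring the dense tensor.
    dim = len(points[0]) if points else 0
    for v in voxels:
        v.extend([(0.0,) * dim] * (max_points - len(v)))
    return voxels, coors, counts


# Toy input: 5 values per point (x, y, z, r, dt), two points per voxel at most.
demo_points = [
    (0.5, 0.5, 0.0, 0.1, 0.0),
    (0.6, 0.4, 0.0, 0.2, 0.0),
    (0.7, 0.2, 0.0, 0.3, 0.0),  # third point of a full voxel: dropped
    (2.5, 3.5, 0.0, 0.4, 0.0),
    (9.0, 9.0, 0.0, 0.5, 0.0),  # outside the range: dropped
]
demo_voxels, demo_coors, demo_counts = hard_voxelize_sketch(
    demo_points, voxel_size=(1.0, 1.0, 2.0),
    coors_range=(0.0, 0.0, -1.0, 4.0, 4.0, 1.0),
    max_points=2, max_voxels=2)
```

The padded voxels list plays the role of the (max_voxels, max_points, 5) tensor, and the counts list is what lets later stages tell real points from padding.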

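2.2 Voxel encoder: PillarFeatureNet (PFN)

Its main job is to encode the voxelized point cloud and build a dense feature tensor.

Each point of the voxelized cloud from the previous step is encoded as a 10-dimensional vector D = (x, y, z, r, delt_t, xc, yc, zc, xp, yp). Here x, y, z, r and delt_t are the point's three coordinates, its reflectance intensity, and its timestamp offset when multiple sweeps are aggregated; xc, yc, zc are the point's offsets from the arithmetic mean (cluster center) of all points in its pillar; and xp, yp are its offsets from the pillar's geometric center in x and y. This gives a dense (P, N, D) tensor, where P is the number of non-empty pillars and N is the maximum number of points per pillar. The tensor then passes through one or more PFNLayers (linear layer + BatchNorm + ReLU + max pooling): the linear layer maps each point to C channels, giving (P, N, C), and max pooling over the N points of each pillar yields the final (P, C) tensor.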

    def forward(self, features, num_points, coors):
        """Forward function.
        """
        features_ls = [features]
        # Find distance of x, y, and z from cluster center (the mean of the points in each pillar)
        if self._with_cluster_center:
            points_mean = features[:, :, :3].sum(
                dim=1, keepdim=True) / num_points.type_as(features).view(
                    -1, 1, 1)
            f_cluster = features[:, :, :3] - points_mean
            features_ls.append(f_cluster)

        # Find distance of x and y from pillar center (the geometric center of the pillar cell)
        dtype = features.dtype
        if self._with_voxel_center:
            if not self.legacy:
                f_center = torch.zeros_like(features[:, :, :2])
                f_center[:, :, 0] = features[:, :, 0] - (
                    coors[:, 3].to(dtype).unsqueeze(1) * self.vx +
                    self.x_offset)
                f_center[:, :, 1] = features[:, :, 1] - (
                    coors[:, 2].to(dtype).unsqueeze(1) * self.vy +
                    self.y_offset)
            else:
                f_center = features[:, :, :2]
                f_center[:, :, 0] = f_center[:, :, 0] - (
                    coors[:, 3].type_as(features).unsqueeze(1) * self.vx +
                    self.x_offset)
                f_center[:, :, 1] = f_center[:, :, 1] - (
                    coors[:, 2].type_as(features).unsqueeze(1) * self.vy +
                    self.y_offset)
            features_ls.append(f_center)
        
        # Distance from each point to the origin (0, 0, 0)
        if self._with_distance: 
            points_dist = torch.norm(features[:, :, :3], 2, 2, keepdim=True) 
            features_ls.append(points_dist)

        # Combine together feature decorations
        features = torch.cat(features_ls, dim=-1)
        # The feature decorations were calculated without regard to whether
        # pillar was empty. Need to ensure that
        # empty pillars remain set to zeros.
        voxel_count = features.shape[1]
        mask = get_paddings_indicator(num_points, voxel_count, axis=0)
        mask = torch.unsqueeze(mask, -1).type_as(features)
        features *= mask

        for pfn in self.pfn_layers:
            features = pfn(features, num_points)

        return features.squeeze() #[P,C] 27059, 64
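
To make the 10-dimensional decoration concrete, the following is a small sketch for a single pillar in plain Python. It assumes the pillar center is index * voxel size + half a cell + range minimum, which is how x_offset and y_offset are used in the forward function above. The names decorate_pillar and max_pool_over_points are hypothetical, and the learned linear + BatchNorm + ReLU step of PFNLayer is left out on purpose.

```python
def decorate_pillar(points, num_valid, coor_yx, voxel_size, pc_min):
    """Turn the (N, 5) rows of one pillar into (N, 10) decorated rows.

    points:     N rows of (x, y, z, r, dt); rows past num_valid are zero padding.
    num_valid:  number of real points, assumed to be at least 1.
    coor_yx:    (y_idx, x_idx) of the pillar on the BEV grid.
    voxel_size: (vx, vy); pc_min: (x_min, y_min) of the point cloud range.
    """
    vx, vy = voxel_size
    valid = points[:num_valid]
    # Arithmetic mean of the real points in the pillar (the cluster center).
    mean = [sum(p[k] for p in valid) / num_valid for k in range(3)]
    # Geometric center of the pillar cell in metres.
    cx = coor_yx[1] * vx + vx / 2 + pc_min[0]
    cy = coor_yx[0] * vy + vy / 2 + pc_min[1]
    out = []
    for i, p in enumerate(points):
        if i >= num_valid:
            out.append([0.0] * 10)  # padding rows stay zero (the mask step)
            continue
        cluster = [p[k] - mean[k] for k in range(3)]  # xc, yc, zc
        center = [p[0] - cx, p[1] - cy]               # xp, yp
        out.append(list(p) + cluster + center)
    return out


def max_pool_over_points(rows):
    """Reduce (N, C) rows to one C-vector by taking the maximum over the N points."""
    return [max(col) for col in zip(*rows)]


# One pillar at grid cell (y_idx=2, x_idx=1) with 2 real points and 1 padding row.
demo_rows = decorate_pillar(
    points=[(2.5, 4.5, 0.5, 0.3, 0.0), (3.5, 5.5, 1.5, 0.1, 0.0), (0.0,) * 5],
    num_valid=2, coor_yx=(2, 1), voxel_size=(2.0, 2.0), pc_min=(0.0, 0.0))
demo_pooled = max_pool_over_points(demo_rows)
```

In the real network the max pooling is applied to the learned C-channel features, not to the raw decorated values as in this toy call; the point here is only the (N, D) to (D) reduction.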

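2.3 Middle encoder: PointPillarsScatter

Purpose: scatter the learned pillar features [P, C] (transposed to [C, P] inside the loop) back to their grid positions to form a pseudo-image [C, H, W], with H = ny and W = nx.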

    def forward_batch(self, voxel_features, coors, batch_size):
        """Scatter features of single sample.
        """
        # batch_canvas will be the final output.
        batch_canvas = []
        for batch_itt in range(batch_size):
            # Create the canvas for this sample
            canvas = torch.zeros(
                self.in_channels,
                self.nx * self.ny,
                dtype=voxel_features.dtype,
                device=voxel_features.device)

            # Only include non-empty pillars
            batch_mask = coors[:, 0] == batch_itt
            this_coors = coors[batch_mask, :]
            indices = this_coors[:, 2] * self.nx + this_coors[:, 3]
            indices = indices.type(torch.long)
            voxels = voxel_features[batch_mask, :]
            voxels = voxels.t()

            # Now scatter the blob back to the canvas.
            canvas[:, indices] = voxels

            # Append to a list for later stacking.
            batch_canvas.append(canvas)

        # Stack to 3-dim tensor (batch-size, in_channels, nrows*ncols)
        batch_canvas = torch.stack(batch_canvas, 0)

        # Undo the column stacking to final 4-dim tensor
        batch_canvas = batch_canvas.view(batch_size, self.in_channels, self.ny,
                                         self.nx)
        return batch_canvas
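
The core of the scatter is the row-major flat index y * nx + x followed by a reshape. A minimal single-sample sketch with nested lists, assuming each pillar coordinate is given as (y_idx, x_idx); scatter_to_canvas is a hypothetical name:

```python
def scatter_to_canvas(pillar_features, coors_yx, nx, ny):
    """Place P feature vectors of length C onto a zero canvas of shape [C][ny][nx]."""
    channels = len(pillar_features[0])
    flat = [[0.0] * (nx * ny) for _ in range(channels)]  # [C, ny*nx]
    for feat, (y, x) in zip(pillar_features, coors_yx):
        index = y * nx + x  # row-major flat index, as in forward_batch
        for c in range(channels):
            flat[c][index] = feat[c]
    # Undo the flattening: [C, ny*nx] -> [C, ny, nx]
    return [[row[r * nx:(r + 1) * nx] for r in range(ny)] for row in flat]


# Two pillars with 2 channels each on a grid that is 4 cells wide and 3 cells high.
demo_canvas = scatter_to_canvas(
    pillar_features=[[1.0, 2.0], [3.0, 4.0]],
    coors_yx=[(0, 1), (2, 0)],
    nx=4, ny=3)
```

Cells that received no pillar stay at zero, which is why the pseudo-image is mostly empty for sparse LiDAR scans.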

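2.4 backbone: SECOND

Features are extracted with stacks of conv + BN + ReLU triplets. The three stages contain [4, 6, 6] triplets (one strided triplet followed by layer_num stride-1 triplets, which corresponds to layer_nums = [3, 5, 5]), with output channels [64, 128, 256].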

        blocks = []
        for i, layer_num in enumerate(layer_nums):
            block = [
                build_conv_layer(
                    conv_cfg,
                    in_filters[i],
                    out_channels[i],
                    3,
                    stride=layer_strides[i],
                    padding=1),
                build_norm_layer(norm_cfg, out_channels[i])[1],
                nn.ReLU(inplace=True),
            ]
            for j in range(layer_num):
                block.append(
                    build_conv_layer(
                        conv_cfg,
                        out_channels[i],
                        out_channels[i],
                        3,
                        padding=1))
                block.append(build_norm_layer(norm_cfg, out_channels[i])[1])
                block.append(nn.ReLU(inplace=True))

            block = nn.Sequential(*block)
            blocks.append(block)

        self.blocks = nn.ModuleList(blocks)
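
The loop above fixes both the number of triplets and the spatial size of each stage. The sketch below does that bookkeeping, assuming a 512 x 512 pseudo-image and layer_strides = [2, 2, 2]; neither value is stated in this post, so treat them as assumptions chosen to be consistent with the 128 x 128 maps seen later in the head. second_stage_summary is a hypothetical helper.

```python
def second_stage_summary(in_size, layer_nums, layer_strides, out_channels):
    """Return (triplet_count, channels, spatial_size) for each backbone stage."""
    summary = []
    size = in_size
    for num, stride, channels in zip(layer_nums, layer_strides, out_channels):
        # 3x3 conv with padding=1: out = floor((size + 2*1 - 3) / stride) + 1
        size = (size + 2 - 3) // stride + 1
        # One strided triplet plus `num` stride-1 triplets that keep the size.
        summary.append((1 + num, channels, size))
    return summary


# Assumed input size and strides; layer_nums and channels follow the text.
demo_stages = second_stage_summary(
    in_size=512, layer_nums=[3, 5, 5], layer_strides=[2, 2, 2],
    out_channels=[64, 128, 256])
```

Under these assumptions the three stages output 256, 128 and 64 pixel wide maps, which is the multi-scale input the neck has to bring back to one resolution.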

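3. Neck

SECONDFPN processes and fuses the multi-scale features from the backbone. It is again built from conv + BN + ReLU style triplets: each of the three backbone outputs is brought to 128 channels and a common resolution, and the results are concatenated along the channel axis into a [B, C, H, W] tensor with C = 128*3 = 384. The structure is: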

  (pts_neck): SECONDFPN(
    (deblocks): ModuleList(
      (0): Sequential(
        (0): Conv2d(64, 128, kernel_size=(2, 2), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
      )
      (1): Sequential(
        (0): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
      )
      (2): Sequential(
        (0): ConvTranspose2d(256, 128, kernel_size=(2, 2), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
      )
    )
  )
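
The kernel sizes and strides printed above are what align the three scales. A short sketch of the size arithmetic, assuming the backbone stages output 256, 128 and 64 pixel wide maps (an assumption, not stated in this post) and that none of the deblock convolutions use padding; deblock_out_size is a hypothetical helper:

```python
def deblock_out_size(size, kind, kernel, stride):
    """Output width of one deblock layer with no padding."""
    if kind == "conv":
        return (size - kernel) // stride + 1  # ordinary convolution
    return (size - 1) * stride + kernel       # transposed convolution


# (kind, kernel, stride, out_channels) read off the printed structure.
demo_deblocks = [("conv", 2, 2, 128), ("conv", 1, 1, 128), ("deconv", 2, 2, 128)]
demo_in_sizes = [256, 128, 64]  # assumed backbone output widths

demo_out_sizes = [deblock_out_size(s, kind, k, st)
                  for s, (kind, k, st, _) in zip(demo_in_sizes, demo_deblocks)]
demo_out_channels = sum(ch for _, _, _, ch in demo_deblocks)  # channel concat
```

The first branch downsamples by 2, the second keeps its size, and the third upsamples by 2, so all three land on the same grid before concatenation.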

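4. Head and loss

4.1 CenterHead

CenterHead first applies a shared convolution that takes the features from [B, 384, 128, 128] to [B, 64, 128, 128]. It then runs each task head on the result and returns one prediction dictionary per task.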

    def forward(self, feats):
        """Forward pass.
        """
        return multi_apply(self.forward_single, feats)
        
    def forward_single(self, x):
        """Forward function for CenterPoint.
        """
        ret_dicts = []

        x = self.shared_conv(x) # shared conv: a conv + BN + ReLU triplet

        for task in self.task_heads:
            ret_dicts.append(task(x))
        return ret_dicts
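
The return layout of forward explains why the loss code later indexes preds_dict[0]. The sketch below mimics multi_apply as a map over feature levels followed by a transpose; this is my understanding of the helper and should be read as an assumption, and multi_apply_sketch and fake_forward_single are hypothetical stand-ins.

```python
def multi_apply_sketch(func, *args):
    """Apply func to each input, then regroup results by output position."""
    results = map(func, *args)
    return tuple(map(list, zip(*results)))


def fake_forward_single(level_name):
    """Stand-in for forward_single: one result per task for a feature level."""
    return [{"task": task_id, "level": level_name} for task_id in range(6)]


# The neck yields a single feature level, so feats has one element.
demo_preds = multi_apply_sketch(fake_forward_single, ["level0"])
```

With one feature level, the result is a tuple of 6 per-task lists, each holding exactly one dictionary, so the first element of each list is that task's prediction.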

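Each group of classes is one task. Each task has one SeparateHead, and each SeparateHead has 6 output branches. The configuration therefore has 6 tasks, 6 SeparateHeads, and 6*6 = 36 branches in total. The structure of one SeparateHead is shown below; its 6 branches are reg, height, dim, rot and vel (regression) plus heatmap (center classification). After CenterHead, the output is a list with one entry per task, 6 in total.

Note: classes are split into tasks because their sizes differ in the BEV view, e.g. car and pedestrian. pedestrian and traffic_cone have similar BEV footprints, so they share one task.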

(0): SeparateHead(
        (reg): Sequential(
          (0): ConvModule(
            (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (activate): ReLU(inplace=True)
          )
          (1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (height): Sequential(
          (0): ConvModule(
            (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (activate): ReLU(inplace=True)
          )
          (1): Conv2d(64, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (dim): Sequential(
          (0): ConvModule(
            (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (activate): ReLU(inplace=True)
          )
          (1): Conv2d(64, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (rot): Sequential(
          (0): ConvModule(
            (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (activate): ReLU(inplace=True)
          )
          (1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (vel): Sequential(
          (0): ConvModule(
            (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (activate): ReLU(inplace=True)
          )
          (1): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
        (heatmap): Sequential(
          (0): ConvModule(
            (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (activate): ReLU(inplace=True)
          )
          (1): Conv2d(64, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        )
      )
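
The output channel counts in the printout follow directly from the head definition. The sketch below tabulates them, assuming the usual nuScenes grouping of the 10 classes into 6 tasks; only the pedestrian and traffic_cone pairing is stated in this post, so the other groups are an assumption, and head_channels is a hypothetical helper.

```python
# Assumed task grouping (only the last group is confirmed by the text).
demo_tasks = [
    ["car"],
    ["truck", "construction_vehicle"],
    ["bus", "trailer"],
    ["barrier"],
    ["motorcycle", "bicycle"],
    ["pedestrian", "traffic_cone"],
]

# Output channels of the five regression branches, read off the printout.
common_heads = {"reg": 2, "height": 1, "dim": 3, "rot": 2, "vel": 2}


def head_channels(class_names):
    """Branch name -> output channels for one SeparateHead."""
    heads = dict(common_heads)
    heads["heatmap"] = len(class_names)  # one heatmap channel per class
    return heads


demo_total_branches = sum(len(head_channels(t)) for t in demo_tasks)
demo_anno_box_channels = sum(common_heads.values())
```

The five regression branches add up to 10 channels, which matches the 10-dimensional anno_box used as the regression target in the loss.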

4.2 loss

Reference: the loss function of CenterHead

For each task, get_targets uses the ground-truth boxes to build heatmaps, anno_boxes, inds and masks. The four targets are described in the table below:

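Target | Description | Shape | Example values
heatmap | center-point heatmap score | [class_num, 128, 128] | one heatmap per class
anno_box | ground-truth box parameters | [500, 10] | dims 1-2: center offset offset_x, offset_y; dim 3: center height z; dims 4-6: box size box_dim (length, width, height); dims 7-8: rotation sin(α), cos(α); dims 9-10: velocity vx, vy
ind | flat position of the box center on the heatmap | [500] | ind[idx] = y*128 + x
mask | marks valid boxes (1) versus padding (0) | [500] | mask[idx] = 1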
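
The ind target is a row-major flat index, because the loss flattens the prediction from [B, C, H, W] to [B, H*W, C] before gathering. A single-sample sketch with nested lists, where flat_index and gather_feat_sketch are hypothetical names standing in for the target computation and _gather_feat:

```python
def flat_index(x, y, width):
    """Row-major index of heatmap cell (column x, row y)."""
    return y * width + x


def gather_feat_sketch(pred_chw, inds):
    """Pick the C-vector at each flat index from a [C][H][W] prediction."""
    channels = len(pred_chw)
    height, width = len(pred_chw[0]), len(pred_chw[0][0])
    # Same as permuting to [H, W, C] and viewing as [H*W, C].
    flat = [[pred_chw[c][r][col] for c in range(channels)]
            for r in range(height) for col in range(width)]
    return [flat[i] for i in inds]


# Toy prediction with C=2, H=2, W=3 whose values encode their own position.
demo_pred = [[[100 * c + 10 * r + col for col in range(3)] for r in range(2)]
             for c in range(2)]
demo_ind = flat_index(x=2, y=1, width=3)
demo_gathered = gather_feat_sketch(demo_pred, [demo_ind])
```

Only the cells listed in ind contribute to the box regression loss; every other cell of the regression maps is ignored.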

There are two losses: a focal loss on the heatmap and an L1 loss on the box regression.

    def loss(self, gt_bboxes_3d, gt_labels_3d, preds_dicts, **kwargs):
        """Loss function for CenterHead.
        """

        heatmaps, anno_boxes, inds, masks = self.get_targets(
            gt_bboxes_3d, gt_labels_3d)
        loss_dict = dict()
        for task_id, preds_dict in enumerate(preds_dicts):
            # loss1: heatmap focal loss 
            preds_dict[0]['heatmap'] = clip_sigmoid(preds_dict[0]['heatmap'])
            num_pos = heatmaps[task_id].eq(1).float().sum().item()
            loss_heatmap = self.loss_cls(
                preds_dict[0]['heatmap'],
                heatmaps[task_id],
                avg_factor=max(num_pos, 1)) 
            target_box = anno_boxes[task_id]
            # reconstruct the anno_box from multiple reg heads
            preds_dict[0]['anno_box'] = torch.cat(
                (preds_dict[0]['reg'], preds_dict[0]['height'],
                 preds_dict[0]['dim'], preds_dict[0]['rot'],
                 preds_dict[0]['vel']),
                dim=1)

            # Regression loss for dimension, offset, height, rotation
            ind = inds[task_id]
            num = masks[task_id].float().sum()
            pred = preds_dict[0]['anno_box'].permute(0, 2, 3, 1).contiguous()
            pred = pred.view(pred.size(0), -1, pred.size(3))
            pred = self._gather_feat(pred, ind)
            mask = masks[task_id].unsqueeze(2).expand_as(target_box).float()
            isnotnan = (~torch.isnan(target_box)).float()
            mask *= isnotnan

            code_weights = self.train_cfg.get('code_weights', None)
            bbox_weights = mask * mask.new_tensor(code_weights)
            # loss2: bbox loss
            loss_bbox = self.loss_bbox(
                pred, target_box, bbox_weights, avg_factor=(num + 1e-4))
            loss_dict[f'task{task_id}.loss_heatmap'] = loss_heatmap
            loss_dict[f'task{task_id}.loss_bbox'] = loss_bbox
        return loss_dict
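
The two loss terms can be written out in a few lines. The sketch below assumes the heatmap loss is the Gaussian focal loss used by CenterNet-style heads with alpha = 2 and gamma = 4, and that the box loss is a weighted L1 divided by avg_factor; the concrete loss classes and weights come from the config, which this post does not show. Skipping NaN targets outright is a simplification of the isnotnan mask. Both function names are hypothetical.

```python
import math


def gaussian_focal_loss(pred, target, alpha=2.0, gamma=4.0, eps=1e-12):
    """Summed heatmap loss over flat lists of predictions and targets in [0, 1]."""
    total = 0.0
    for p, t in zip(pred, target):
        if t == 1.0:  # positive cell: an object center
            total += -math.log(p + eps) * (1 - p) ** alpha
        else:         # negative cell, down-weighted near centers by (1 - t)^gamma
            total += -math.log(1 - p + eps) * p ** alpha * (1 - t) ** gamma
    return total


def masked_l1(pred, target, mask, code_weights, avg_factor):
    """Weighted L1 over valid boxes, skipping padded rows and NaN targets."""
    total = 0.0
    for p_row, t_row, m in zip(pred, target, mask):
        if not m:
            continue
        for p, t, w in zip(p_row, t_row, code_weights):
            if not math.isnan(t):
                total += w * abs(p - t)
    return total / avg_factor


demo_good = gaussian_focal_loss([0.9, 0.1], [1.0, 0.0])
demo_bad = gaussian_focal_loss([0.1, 0.9], [1.0, 0.0])
demo_near_center = gaussian_focal_loss([0.9], [0.8])
demo_far_away = gaussian_focal_loss([0.9], [0.0])
demo_bbox = masked_l1(
    pred=[[1.0, 2.0], [5.0, 5.0]], target=[[0.5, 2.0], [0.0, 0.0]],
    mask=[1, 0], code_weights=[1.0, 0.2], avg_factor=1.0)
```

A confident wrong prediction far from any center is penalised much more than the same prediction next to a center, which is what lets the heatmap form smooth peaks instead of single hot pixels.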
