Building Faster R-CNN with PyTorch torchvision (Part 4): RoIHeads

After the RPN we have the RPN classification/regression losses and a set of proposal regions. The next stage processes these further, including fine-grained classification of each proposal, a second round of box regression, and operations such as RoI pooling.

Contents

RoIHeads

select_training_samples

ROIAlign pooling

box_head and box_predictor

fastrcnn_loss

postprocess_detections

Summary


RoIHeads

The RPN gives us a certain number of proposals. The next thing to do is to match each proposal to its corresponding ground-truth box and generate the targets used for training, so let's start with select_training_samples.

select_training_samples

select_training_samples performs the following steps:

  1. Match the RPN proposals against the ground truth. After matching, proposals with IoU above high_threshold are positive, proposals below low_threshold are negative, and proposals between the two thresholds are ignored and excluded from the loss.
  2. Sample positives and negatives from the matched proposals, keeping their ratio fixed (default positive:negative = 1:3) and the total count fixed (default 512 per image); see the sampler sketch after this list.
  3. Encode the deltas (dx, dy, dw, dh) between the matched ground-truth boxes and the sampled proposals as regression targets.
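
Step 2 is handled by torchvision's BalancedPositiveNegativeSampler (wrapped here by self.subsample). A minimal sketch of the per-image sampling logic, assuming the default batch_size_per_image=512 and positive_fraction=0.25:

    import torch

    def sample_proposals(labels, batch_size_per_image=512, positive_fraction=0.25):
        # labels: per-image tensor with class id >= 1 for positives,
        # 0 for background and -1 for ignored proposals
        positive = torch.where(labels >= 1)[0]
        negative = torch.where(labels == 0)[0]

        num_pos = min(int(batch_size_per_image * positive_fraction), positive.numel())
        num_neg = min(batch_size_per_image - num_pos, negative.numel())

        # randomly pick the required number of positives and negatives
        perm_pos = torch.randperm(positive.numel())[:num_pos]
        perm_neg = torch.randperm(negative.numel())[:num_neg]
        return torch.cat([positive[perm_pos], negative[perm_neg]])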

Code:

    def select_training_samples(self, proposals, targets):
        self.check_targets(targets)
        gt_boxes = [t["boxes"] for t in targets]
        gt_labels = [t["labels"] for t in targets]

        # append ground-truth bboxes to proposals
        proposals = self.add_gt_proposals(proposals, gt_boxes)

        # same matching as in the RPN: compute the IoU between every proposal and the ground truth
        # matched_idxs stores the index of the matched ground-truth box (unmatched proposals default to index 0)
        # labels stores the class label: background is 0, ignored proposals are -1
        matched_idxs, labels = self.assign_targets_to_proposals(proposals, gt_boxes, gt_labels)
        # subsample picks the positive and negative proposals,
        # keeping the ratio and total number of training proposals fixed
        sampled_inds = self.subsample(labels)
        matched_gt_boxes = []
        num_images = len(proposals)
        # keep only the sampled proposals for each image
        for img_id in range(num_images):
            img_sampled_inds = sampled_inds[img_id]
            proposals[img_id] = proposals[img_id][img_sampled_inds]
            labels[img_id] = labels[img_id][img_sampled_inds]
            matched_idxs[img_id] = matched_idxs[img_id][img_sampled_inds]
            matched_gt_boxes.append(gt_boxes[img_id][matched_idxs[img_id]])
        # encode the deltas (dx, dy, dw, dh) between the matched ground-truth boxes and the proposals
        regression_targets = self.box_coder.encode(matched_gt_boxes, proposals)
        return proposals, matched_idxs, labels, regression_targets
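
For reference, self.box_coder is a BoxCoder (FasterRCNN defaults its weights to (10., 10., 5., 5.) when bbox_reg_weights is not given). A minimal sketch of what encode computes, assuming boxes in (x1, y1, x2, y2) format:

    import torch

    def encode_boxes_sketch(gt, proposals, weights=(10.0, 10.0, 5.0, 5.0)):
        # widths, heights and centers of the proposals
        pw = proposals[:, 2] - proposals[:, 0]
        ph = proposals[:, 3] - proposals[:, 1]
        px = proposals[:, 0] + 0.5 * pw
        py = proposals[:, 1] + 0.5 * ph
        # the same for the matched ground-truth boxes
        gw = gt[:, 2] - gt[:, 0]
        gh = gt[:, 3] - gt[:, 1]
        gx = gt[:, 0] + 0.5 * gw
        gy = gt[:, 1] + 0.5 * gh
        wx, wy, ww, wh = weights
        # the classic Faster R-CNN parameterization
        dx = wx * (gx - px) / pw
        dy = wy * (gy - py) / ph
        dw = ww * torch.log(gw / pw)
        dh = wh * torch.log(gh / ph)
        return torch.stack((dx, dy, dw, dh), dim=1)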

ROIAlign pooling

After the proposals have been filtered, RoIAlign is applied. The features fed into the roi_heads module are the full-size feature maps, so the region corresponding to each proposal has to be cropped out of the matching feature map; this is exactly what RoIPooling and RoIAlign do. The torchvision Faster R-CNN implements the more accurate RoIAlign, via MultiScaleRoIAlign defined in torchvision/ops/poolers.py.

Parameters:

  • featmap_names : the forward input of MultiScaleRoIAlign is an OrderedDict of feature maps; featmap_names specifies which entries of that OrderedDict are used for RoIAlign

  • output_size : spatial size of the feature produced by RoIAlign

  • sampling_ratio : number of sampling points per output bin used for the bilinear interpolation; if <= 0, an adaptive number of points is used

Here is the official usage example of MultiScaleRoIAlign:

    Examples::
        # create the RoIAlign module;
        # ['feat1', 'feat3'] selects which feature maps are used for RoIAlign
        >>> m = torchvision.ops.MultiScaleRoIAlign(['feat1', 'feat3'], 3, 2)
        # i is the OrderedDict of input feature maps fed to RoIAlign
        >>> i = OrderedDict()
        >>> i['feat1'] = torch.rand(1, 5, 64, 64)
        >>> i['feat2'] = torch.rand(1, 5, 32, 32)  # this feature won't be used in the pooling
        >>> i['feat3'] = torch.rand(1, 5, 16, 16)
        >>> # create 6 random boxes as the forward input
        >>> boxes = torch.rand(6, 4) * 256; boxes[:, 2:] += boxes[:, :2]
        >>> image_sizes = [(512, 512)]  # the input image size
        >>> output = m(i, [boxes], image_sizes)
        >>> print(output.shape)
        # each of the 6 boxes yields a 5-channel 3×3 feature cropped from the selected feature maps
        torch.Size([6, 5, 3, 3])

box_head and box_predictor

box_head and box_predictor take the RoIAlign-pooled features through fully connected layers to produce the classification and regression outputs.

    box_features = self.box_roi_pool(features, proposals, image_shapes)
    box_features = self.box_head(box_features)
    class_logits, box_regression = self.box_predictor(box_features)
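
With the default FasterRCNN configuration, box_head is a TwoMLPHead and box_predictor is a FastRCNNPredictor, both defined in torchvision/models/detection/faster_rcnn.py. Slightly simplified, they look like this:

    import torch.nn as nn
    import torch.nn.functional as F

    class TwoMLPHead(nn.Module):
        # two fully connected layers applied to the flattened RoI features
        def __init__(self, in_channels, representation_size):
            super().__init__()
            self.fc6 = nn.Linear(in_channels, representation_size)
            self.fc7 = nn.Linear(representation_size, representation_size)

        def forward(self, x):
            x = x.flatten(start_dim=1)
            x = F.relu(self.fc6(x))
            x = F.relu(self.fc7(x))
            return x

    class FastRCNNPredictor(nn.Module):
        # one linear head for class scores, one for per-class box deltas
        def __init__(self, in_channels, num_classes):
            super().__init__()
            self.cls_score = nn.Linear(in_channels, num_classes)
            self.bbox_pred = nn.Linear(in_channels, num_classes * 4)

        def forward(self, x):
            x = x.flatten(start_dim=1)
            return self.cls_score(x), self.bbox_pred(x)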

fastrcnn_loss

During training we also need to compute the losses. As in the RPN, classification uses cross entropy and regression uses the Smooth L1 loss.
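
A lightly simplified sketch of fastrcnn_loss from roi_heads.py (the normalization of the box loss by the total number of sampled proposals follows the torchvision implementation):

    import torch
    import torch.nn.functional as F

    def fastrcnn_loss(class_logits, box_regression, labels, regression_targets):
        # collapse the per-image lists into single tensors
        labels = torch.cat(labels, dim=0)
        regression_targets = torch.cat(regression_targets, dim=0)

        # classification: cross entropy over all sampled proposals
        classification_loss = F.cross_entropy(class_logits, labels)

        # regression: only positive proposals contribute, and only the
        # 4 deltas predicted for the matched class are supervised
        pos_inds = torch.where(labels > 0)[0]
        labels_pos = labels[pos_inds]
        N = class_logits.shape[0]
        box_regression = box_regression.reshape(N, -1, 4)
        box_loss = F.smooth_l1_loss(
            box_regression[pos_inds, labels_pos],
            regression_targets[pos_inds],
            reduction="sum",
        ) / labels.numel()

        return classification_loss, box_loss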

postprocess_detections

At inference time the raw outputs need to be post-processed instead; the code lives in postprocess_detections in roi_head.py:

    def postprocess_detections(self, class_logits, box_regression, proposals, image_shapes):
        device = class_logits.device
        num_classes = class_logits.shape[-1]
        
        # proposals are the proposal regions produced by the RPN,
        # passed in as a list with one entry per image in the batch
        # boxes_in_image: p_num × 4, where p_num is the number of proposal regions left after sampling
        # boxes_per_image records how many proposals each image contains
        boxes_per_image = [len(boxes_in_image) for boxes_in_image in proposals]
        # decode the network's box_regression against the proposals to get the final box coordinates
        pred_boxes = self.box_coder.decode(box_regression, proposals)

        # pred_scores are the per-class classification scores
        pred_scores = F.softmax(class_logits, -1)

        # split boxes and scores per image
        pred_boxes = pred_boxes.split(boxes_per_image, 0)
        pred_scores = pred_scores.split(boxes_per_image, 0)

        all_boxes = []
        all_scores = []
        all_labels = []
        for boxes, scores, image_shape in zip(pred_boxes, pred_scores, image_shapes):
            boxes = box_ops.clip_boxes_to_image(boxes, image_shape)

            # labels size: num_proposals × num_classes
            # each row holds the class ids 0 .. num_classes - 1
            labels = torch.arange(num_classes, device=device)
            labels = labels.view(1, -1).expand_as(scores)

            # class id 0 is the background; drop it at inference time
            boxes = boxes[:, 1:]
            scores = scores[:, 1:]
            labels = labels[:, 1:]

            # reshape boxes, scores and labels to N×4, N and N respectively,
            # where N = num_proposals × (num_classes - 1),
            # so that every (proposal, class) pair is handled as a separate candidate detection
            boxes = boxes.reshape(-1, 4)
            scores = scores.flatten()
            labels = labels.flatten()

            # keep only detections whose score exceeds the score threshold
            inds = torch.nonzero(scores > self.score_thresh).squeeze(1)
            boxes, scores, labels = boxes[inds], scores[inds], labels[inds]

            # remove boxes whose width or height is smaller than 0.01
            keep = box_ops.remove_small_boxes(boxes, min_size=1e-2)
            boxes, scores, labels = boxes[keep], scores[keep], labels[keep]

            # run NMS on the boxes; since labels is passed in, NMS is applied independently per class
            keep = box_ops.batched_nms(boxes, scores, labels, self.nms_thresh)
            # keep only the top detections_per_img results (default 100)
            keep = keep[:self.detections_per_img]
            boxes, scores, labels = boxes[keep], scores[keep], labels[keep]

            all_boxes.append(boxes)
            all_scores.append(scores)
            all_labels.append(labels)

        return all_boxes, all_scores, all_labels

Summary

At this point, starting from the feature maps and proposals fed into roi_heads, we have obtained the final detections (or, during training, the losses). To summarize, roi_heads does the following:

  1. Randomly subsamples the RPN proposals, keeping the positive:negative ratio fixed (default 1:3) and the total number fixed (default 512 per image)
  2. Crops the feature maps with roi_align to obtain a fixed-size feature for each proposal
  3. Feeds the pooled features through fully connected layers to get classification and regression outputs
  4. Post-processes these outputs: box decoding, filtering out detections with low scores or tiny boxes, and per-class NMS
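
To close, a quick sanity check you can run yourself: torchvision's pre-built fasterrcnn_resnet50_fpn wires the RPN and the RoIHeads discussed in this series together (the input size below is arbitrary):

    import torch
    import torchvision

    # pre-built Faster R-CNN; model.roi_heads is the module walked through in this post
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    model.eval()

    images = [torch.rand(3, 480, 640)]  # a dummy RGB image
    with torch.no_grad():
        detections = model(images)  # list of dicts with 'boxes', 'labels', 'scores'
    print(detections[0]["boxes"].shape)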
