Building Faster R-CNN with PyTorch torchvision (Part 3): Region Proposal Network

The RPN is the signature structure of two-stage detectors, and it is itself a binary-classification object detection network, so within the full Faster R-CNN architecture it covers the use of anchors, regression, classification, and so on. This post goes through it in detail.

The RPN code lives in torchvision/models/detection/rpn.py, which defines three modules: RPNHead, AnchorGenerator, and RegionProposalNetwork.

Contents

AnchorGenerator

RegionProposalNetwork

BoxCoder

encode

decode

filter_proposals

assign_targets_to_anchors

compute_loss

RPNHead


AnchorGenerator

AnchorGenerator is documented as:

Module that generates anchors for a set of feature maps and image sizes.

As the name suggests, the main job of AnchorGenerator is to generate the anchors corresponding to the feature maps.

Input parameters:

  • sizes : the base anchor size used for each feature level
  • aspect_ratios : the anchor aspect ratios

The number of sizes and the number of aspect_ratios determine how many anchors are generated at every position of the feature map.

Also note that AnchorGenerator inherits from nn.Module and therefore has a forward() function, and that forward takes an ImageList as one of its inputs. The ImageList type is defined in torchvision/models/detection/image_list.py.

Here is a simple test example:

import torchvision.models.detection.rpn as rpn
import torchvision.models.detection.image_list as image_list
import torch

# create an AnchorGenerator instance
anchor_generator = rpn.AnchorGenerator()

# build the ImageList
batched_images = torch.Tensor(8,3,640,640)
image_sizes = [(640,640)] * 8
image_list_ = image_list.ImageList(batched_images,image_sizes)

# build the feature maps
feature_maps = [torch.Tensor(8,256,80,80), torch.Tensor(8,256,160,160), torch.Tensor(8,256,320,320)]

# generate the anchors
anchors = anchor_generator(image_list_,feature_maps)

Let's check the generated anchors:

>>> print(type(anchors))
<class 'list'>

>>> for anchor in anchors:
...     print(anchor.shape)
... 
torch.Size([403200, 4])
torch.Size([403200, 4])
torch.Size([403200, 4])
torch.Size([403200, 4])
torch.Size([403200, 4])
torch.Size([403200, 4])
torch.Size([403200, 4])
torch.Size([403200, 4])

From the output we can see:

  1. The return value is a list of Tensors, and the number of elements in the list equals the batch size
  2. Every Tensor in anchors has the same size, torch.Size([403200, 4])

Let's analyze why every tensor in anchors has size 403200 × 4.

The default parameters of AnchorGenerator() are aspect_ratios=(0.5, 1.0, 2.0) and sizes=(128, 256, 512). Since aspect_ratios contains 3 values, 3 anchors are generated at every position of each feature map, giving 80×80×3 + 160×160×3 + 320×320×3 = 403200 anchors in total.

Note that each value in sizes is the base anchor size for the corresponding feature level. In the example, every grid cell of the 80×80 feature map uses a base size of 128, and the aspect_ratios values produce three anchor shapes of roughly 128·√2 × 128/√2, 128 × 128, and 128/√2 × 128·√2 (width × height), all with approximately the same area.
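
To make this concrete, here is a small sketch that mirrors how the per-location anchor templates are computed for one level (assuming torchvision's parameterization, where aspect_ratio = height/width and the area stays close to size²):

import torch

def make_base_anchors(size, aspect_ratios=(0.5, 1.0, 2.0)):
    # simplified sketch of the anchor templates generated for one feature level
    size = torch.tensor(float(size))
    ratios = torch.tensor(aspect_ratios)
    h_ratios = torch.sqrt(ratios)        # sqrt(ratio)
    w_ratios = 1.0 / h_ratios            # 1 / sqrt(ratio)
    ws = w_ratios * size                 # widths:  size / sqrt(ratio)
    hs = h_ratios * size                 # heights: size * sqrt(ratio)
    # (x1, y1, x2, y2) templates centered at the origin
    return torch.stack([-ws, -hs, ws, hs], dim=1) / 2

print(make_base_anchors(128))
# tensor([[-90.5097, -45.2548,  90.5097,  45.2548],    ~181 × 91  (ratio 0.5)
#         [-64.0000, -64.0000,  64.0000,  64.0000],     128 × 128 (ratio 1.0)
#         [-45.2548, -90.5097,  45.2548,  90.5097]])   ~91 × 181  (ratio 2.0)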

RegionProposalNetwork

RegionProposalNetwork is the main body of the RPN. It integrates AnchorGenerator and RPNHead, and its functionality covers anchor generation, matching anchors to ground truth, NMS, computing the regression and classification losses, and so on.

The constructor parameters are as follows (a construction sketch follows the list):

  • anchor_generator : the AnchorGenerator instance

  • head : the module that produces the objectness scores and regression deltas from the features

  • fg_iou_thresh : anchors with an IoU above this threshold are treated as foreground

  • bg_iou_thresh : anchors with an IoU below this threshold are treated as background

  • batch_size_per_image : the number of anchors sampled per image when computing the loss

  • positive_fraction : the maximum fraction of positive anchors among the sampled anchors

  • pre_nms_top_n : the number of top-scoring proposals kept (per feature level) before NMS

  • post_nms_top_n : the number of proposals kept after NMS

  • nms_thresh : the IoU threshold used by NMS
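
For reference, here is a minimal sketch of how these pieces might be wired together; the numeric values mirror the defaults used by torchvision's fasterrcnn_resnet50_fpn and are only illustrative (in particular, pre_nms_top_n and post_nms_top_n are passed as dicts keyed by training/testing mode):

from torchvision.models.detection.rpn import AnchorGenerator, RPNHead, RegionProposalNetwork

# one anchor size per FPN level, three aspect ratios per location
anchor_sizes = ((32,), (64,), (128,), (256,), (512,))
aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
anchor_generator = AnchorGenerator(anchor_sizes, aspect_ratios)

# the head needs the feature channel count and the number of anchors per location
head = RPNHead(256, anchor_generator.num_anchors_per_location()[0])

rpn = RegionProposalNetwork(
    anchor_generator, head,
    fg_iou_thresh=0.7, bg_iou_thresh=0.3,
    batch_size_per_image=256, positive_fraction=0.5,
    pre_nms_top_n=dict(training=2000, testing=1000),
    post_nms_top_n=dict(training=2000, testing=1000),
    nms_thresh=0.7,
)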

Since there are quite a few components, let's start from forward to see how the RPN is implemented. Its definition:

    def forward(self, images, features, targets=None):

        features = list(features.values())
        # pass the features through the head to get the predictions: objectness scores and box offsets
        # objectness / pred_bbox_deltas: lists of per-level tensors of shape [B, A, H, W] and [B, A*4, H, W]
        objectness, pred_bbox_deltas = self.head(features)
        # generate the anchors
        anchors = self.anchor_generator(images, features)

        num_images = len(anchors)
        num_anchors_per_level = [o[0].numel() for o in objectness]
        # after concat_box_prediction_layers, objectness and pred_bbox_deltas are flattened into single tensors
        # objectness: (B × anchors_in_image) × 1
        # pred_bbox_deltas: (B × anchors_in_image) × 4
        objectness, pred_bbox_deltas = \
            concat_box_prediction_layers(objectness, pred_bbox_deltas)
        # decode the deltas into (x1, y1, x2, y2) box coordinates, stored in proposals
        proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)

        proposals = proposals.view(num_images, -1, 4)
        # filter the proposals: sort all proposals by objectness score, then apply NMS
        # returns lists holding the kept boxes and their scores for each image in the batch
        boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)

        losses = {}
        if self.training:
            # during training, compute the IoU between anchors and gt boxes
            # to decide whether each anchor is positive / negative / ignored
            labels, matched_gt_boxes = self.assign_targets_to_anchors(anchors, targets)
            # encode the gt box matched to each (positive) anchor into regression-delta form
            regression_targets = self.box_coder.encode(matched_gt_boxes, anchors)
            # compute the classification and regression losses
            loss_objectness, loss_rpn_box_reg = self.compute_loss(
                objectness, pred_bbox_deltas, labels, regression_targets)
            losses = {
                "loss_objectness": loss_objectness,
                "loss_rpn_box_reg": loss_rpn_box_reg,
            }
        return boxes, losses

 

BoxCoder

BoxCoder is defined in torchvision/models/detection/_utils.py. Its main job is to encode ground-truth box coordinates into regression deltas and to decode deltas back into box coordinates. If a box's coordinates are (x1, y1, x2, y2) and the deltas are (dx, dy, dw, dh), then:

encode converts a bbox into deltas, and decode converts deltas back into a bbox. For the details of this transformation, see any anchor-based detection method, for example the article 一文读懂Faster RCNN.
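
For reference, the standard Faster R-CNN parameterization (working in center/width/height form) looks roughly like the sketch below. This illustrates the math only, not torchvision's exact implementation, which also applies per-coordinate weights and clamps dw/dh before the exp:

import math

# encode: matched gt box + anchor -> deltas (dx, dy, dw, dh)
# (x, y) is the box center, (w, h) its width and height
def encode_one(gt, anchor):
    gx, gy, gw, gh = gt
    ax, ay, aw, ah = anchor
    return ((gx - ax) / aw,       # dx
            (gy - ay) / ah,       # dy
            math.log(gw / aw),    # dw
            math.log(gh / ah))    # dh

# decode is the inverse: deltas + anchor -> predicted box
def decode_one(deltas, anchor):
    dx, dy, dw, dh = deltas
    ax, ay, aw, ah = anchor
    return (dx * aw + ax,         # predicted center x
            dy * ah + ay,         # predicted center y
            math.exp(dw) * aw,    # predicted width
            math.exp(dh) * ah)    # predicted height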

BoxCoder has two important functions: encode() and decode().

encode

encode() converts the four bbox coordinates (x1, y1, x2, y2) into the deltas (dx, dy, dw, dh). Note its signature:

def encode(self, reference_boxes, proposals)

  • reference_boxes : the matched_gt_boxes, i.e. the ground-truth box whose IoU with the anchor is above the threshold
  • proposals : the anchors

decode

decode decodes the pred_bbox_deltas output by the network. It is defined as:

def decode(self, rel_codes, boxes)

  • rel_codes : the results stored in delta form, usually the network output, containing (dx, dy, dw, dh); these need to be decoded into (x1, y1, x2, y2) coordinates
  • boxes : the anchors

Note: boxes must be a list or tuple whose length equals the batch size, and each element is a tensor whose size matches the AnchorGenerator output described earlier. rel_codes can be a list/tuple or the already-concatenated tensor, but its size must be consistent with boxes: as a list/tuple its length is the batch size; in the concatenated-tensor form its first dimension is the total number of anchors in the batch (i.e. B × anchors_in_image), with 4 values per anchor.
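
As a quick shape check, here is a small illustrative usage example; BoxCoder is constructed directly with unit weights, as the RPN does:

import torch
from torchvision.models.detection._utils import BoxCoder

box_coder = BoxCoder(weights=(1.0, 1.0, 1.0, 1.0))

# two images with 5 and 3 anchors respectively, boxes in (x1, y1, x2, y2) form
anchors = [torch.tensor([[0., 0., 10., 10.]]).repeat(5, 1),
           torch.tensor([[5., 5., 20., 20.]]).repeat(3, 1)]
deltas = torch.zeros(8, 4)     # concatenated deltas for all 8 anchors

proposals = box_coder.decode(deltas, anchors)
print(proposals.shape)         # torch.Size([8, 1, 4])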

Let's see how decode is used inside the RPN:

# objectness: (B × anchors_in_image) × 1
# pred_bbox_deltas: (B × anchors_in_image) × 4
objectness, pred_bbox_deltas = \
    concat_box_prediction_layers(objectness, pred_bbox_deltas)
# decode converts the deltas and their anchors into bbox coordinates, stored in proposals
# proposals : (B × anchors_in_image) × 1 × 4
proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
proposals = proposals.view(num_images, -1, 4)

filter_proposals

filter_proposals is defined in rpn.py. After box decoding, the pred_bbox_deltas output by the network have been converted into proposals in bbox coordinate form. Among these predicted proposals, some have very low objectness scores and many overlap each other, so NMS is needed. filter_proposals handles this: it keeps only the highest-scoring proposals, removes degenerate (too small) boxes, and suppresses overlapping ones with NMS. Here is the function with some annotations:

    def filter_proposals(self, proposals, objectness, image_shapes, num_anchors_per_level):
        num_images = proposals.shape[0]
        device = proposals.device
        # do not backprop through objectness
        objectness = objectness.detach()
        objectness = objectness.reshape(num_images, -1)

        levels = [
            torch.full((n,), idx, dtype=torch.int64, device=device)
            for idx, n in enumerate(num_anchors_per_level)
        ]

        levels = torch.cat(levels, 0)
        levels = levels.reshape(1, -1).expand_as(objectness)

        # get the indices of the top_n objectness scores per feature level for each image
        # top_n_idx : batch_size × num_kept
        top_n_idx = self._get_top_n_idx(objectness, num_anchors_per_level)
        batch_idx = torch.arange(num_images, device=device)[:, None]
        # gather the top_n objectness scores for each image using those indices
        objectness = objectness[batch_idx, top_n_idx]
        # levels has the same size as objectness and proposals
        # levels stores the index of the feature level each entry comes from
        levels = levels[batch_idx, top_n_idx]
        # gather the corresponding top_n box predictions for each image
        proposals = proposals[batch_idx, top_n_idx]

        final_boxes = []
        final_scores = []
        # process each image in the batch separately
        for boxes, scores, lvl, img_shape in zip(proposals, objectness, levels, image_shapes):
            # clip box coordinates to the image boundaries
            boxes = box_ops.clip_boxes_to_image(boxes, img_shape)
            # remove boxes smaller than min_size
            keep = box_ops.remove_small_boxes(boxes, self.min_size)
            boxes, scores, lvl = boxes[keep], scores[keep], lvl[keep]
            # per-level non-maximum suppression; returns the indices of the kept boxes, sorted by score
            keep = box_ops.batched_nms(boxes, scores, lvl, self.nms_thresh)
            # keep only the top post_nms_top_n results
            keep = keep[:self.post_nms_top_n]
            boxes, scores = boxes[keep], scores[keep]
            final_boxes.append(boxes)
            final_scores.append(scores)
        return final_boxes, final_scores

assign_targets_to_anchors

During training we need to compute the IoU between the anchors and the ground truth to decide which anchors are foreground and which are background. assign_targets_to_anchors does exactly that. Here is the implementation:

    def assign_targets_to_anchors(self, anchors, targets):
        labels = []
        matched_gt_boxes = []
        for anchors_per_image, targets_per_image in zip(anchors, targets):
            gt_boxes = targets_per_image["boxes"]
            # box_similarity computes the IoU between the gt boxes and the anchors
            # gt_boxes: M × 4
            # anchors_per_image: N × 4
            # returns an M × N matrix whose entries are the IoU of each gt box with each anchor
            match_quality_matrix = self.box_similarity(gt_boxes, anchors_per_image)
            # proposal_matcher matches each anchor to a gt box
            # its input is the M × N IoU matrix match_quality_matrix
            # an anchor whose best IoU over the M gt boxes exceeds high_threshold is positive,
            # and it is assigned the index of the gt box with the highest IoU
            # if all M IoUs are below low_threshold, the anchor is negative and is assigned -1
            # anchors whose best IoU lies between low_threshold and high_threshold are ignored and assigned -2
            # so matched_idxs is an N-element tensor containing gt indices / -1 / -2
            matched_idxs = self.proposal_matcher(match_quality_matrix)
            
            # matched_gt_boxes_per_image: N × 4
            # matched_gt_boxes_per_image stores the gt box assigned to each anchor
            matched_gt_boxes_per_image = gt_boxes[matched_idxs.clamp(min=0)]

            labels_per_image = matched_idxs >= 0
            labels_per_image = labels_per_image.to(dtype=torch.float32)

            # background anchors are labeled 0
            bg_indices = matched_idxs == self.proposal_matcher.BELOW_LOW_THRESHOLD
            labels_per_image[bg_indices] = 0

            # ignored anchors are labeled -1
            inds_to_discard = matched_idxs == self.proposal_matcher.BETWEEN_THRESHOLDS
            labels_per_image[inds_to_discard] = -1

            labels.append(labels_per_image)
            matched_gt_boxes.append(matched_gt_boxes_per_image)
        return labels, matched_gt_boxes

assign_targets_to_anchors returns:

  • labels : a list of Tensors whose length equals the batch size; each tensor has size N, where N is the number of anchors in the image. Each value is 1, 0, or -1: 1 means positive, 0 means negative, and -1 means the anchor is ignored and does not contribute to the loss.
  • matched_gt_boxes : likewise one tensor of size N × 4 per image, where N has the same meaning as in labels.

compute_loss

compute_loss computes the loss between the targets and the network predictions: smooth L1 loss for box regression and binary cross-entropy for classification.
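
The following is a minimal sketch of that loss computation, assuming the labels, regression targets and sampled indices have already been prepared and concatenated across the batch; it is illustrative rather than torchvision's exact code:

import torch
import torch.nn.functional as F

def rpn_loss_sketch(objectness, pred_bbox_deltas, labels, regression_targets,
                    pos_inds, sampled_inds):
    # objectness:         [num_anchors]      raw logits from the head
    # pred_bbox_deltas:   [num_anchors, 4]   predicted (dx, dy, dw, dh)
    # labels:             [num_anchors]      1 = positive, 0 = negative, -1 = ignored
    # regression_targets: [num_anchors, 4]   encoded gt deltas
    # pos_inds / sampled_inds: anchor indices picked by the fg/bg sampler

    # box regression: smooth L1 over the positive anchors only
    box_loss = F.smooth_l1_loss(
        pred_bbox_deltas[pos_inds],
        regression_targets[pos_inds],
        reduction="sum",
    ) / sampled_inds.numel()

    # objectness: binary cross-entropy over the sampled positives and negatives
    cls_loss = F.binary_cross_entropy_with_logits(
        objectness[sampled_inds], labels[sampled_inds].float(),
    )
    return cls_loss, box_loss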

It is worth pointing out that fg_bg_sampler limits how many anchors are used to compute the loss. The approach follows the Faster R-CNN paper:

Instead, we randomly sample 256 anchors in an image to compute the loss function of a mini-batch, where the sampled positive and negative anchors have a ratio of up to 1:1. If there are fewer than 128 positive samples in an image, we pad the mini-batch with negative ones.

In other words, the sampled positive and negative anchors total 256 with a ratio of at most 1:1; if there are fewer than 128 positive anchors, the remainder is filled with negatives.
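
Conceptually, the sampling rule can be sketched as below (illustrative only; torchvision implements it in BalancedPositiveNegativeSampler in _utils.py and returns per-image masks rather than indices):

import torch

def sample_anchors(labels, batch_size_per_image=256, positive_fraction=0.5):
    # labels: [num_anchors] with 1 = positive, 0 = negative, -1 = ignored
    positive = torch.nonzero(labels == 1).squeeze(1)
    negative = torch.nonzero(labels == 0).squeeze(1)

    num_pos = min(int(batch_size_per_image * positive_fraction), positive.numel())
    # pad the mini-batch with negatives if there are not enough positives
    num_neg = min(batch_size_per_image - num_pos, negative.numel())

    pos_idx = positive[torch.randperm(positive.numel())[:num_pos]]
    neg_idx = negative[torch.randperm(negative.numel())[:num_neg]]
    return pos_idx, neg_idx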

RPNHead

RPNHead is fairly simple: it runs each feature map through a convolution and activation and then produces the classification and regression outputs.

class RPNHead(nn.Module):
    def __init__(self, in_channels, num_anchors):
        super(RPNHead, self).__init__()
        self.conv = nn.Conv2d(
            in_channels, in_channels, kernel_size=3, stride=1, padding=1
        )
        self.cls_logits = nn.Conv2d(in_channels, num_anchors, kernel_size=1, stride=1)
        self.bbox_pred = nn.Conv2d(
            in_channels, num_anchors * 4, kernel_size=1, stride=1
        )

        for l in self.children():
            torch.nn.init.normal_(l.weight, std=0.01)
            torch.nn.init.constant_(l.bias, 0)

    def forward(self, x):
        logits = []
        bbox_reg = []
        for feature in x:
            t = F.relu(self.conv(feature))
            logits.append(self.cls_logits(t))
            bbox_reg.append(self.bbox_pred(t))
        return logits, bbox_reg

[Figure 1: RPNHead structure]

You can compare the figure with the code above to trace the structure of RPNHead yourself.
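
A quick, illustrative way to check the output shapes: each feature level yields one objectness map with num_anchors channels and one regression map with num_anchors × 4 channels:

import torch
from torchvision.models.detection.rpn import RPNHead

head = RPNHead(in_channels=256, num_anchors=3)
features = [torch.rand(8, 256, 80, 80), torch.rand(8, 256, 40, 40)]

logits, bbox_reg = head(features)
for l, b in zip(logits, bbox_reg):
    print(l.shape, b.shape)
# torch.Size([8, 3, 80, 80]) torch.Size([8, 12, 80, 80])
# torch.Size([8, 3, 40, 40]) torch.Size([8, 12, 40, 40])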

That wraps up the overall architecture of the RPN; all of the code above can be found in the torchvision source.
