Faster-RCNN for Object Detection with PyTorch

Implementing the Faster-RCNN network with the PyTorch framework

A side-by-side comparison of the PyTorch and Keras implementations of the Faster RCNN network

https://blog.csdn.net/weixin_44791964/article/details/105739918
Code from:
https://github.com/bubbliiiing/faster-rcnn-pytorch

Keras is a highly encapsulated framework; using it largely means calling ready-made modules, which makes it well suited for learning the overall structure of a model.
This post is a set of study notes that builds the Faster-RCNN network while comparing the Keras and PyTorch code side by side.

1 Overview of the Faster-RCNN network

Faster-RCNN is a two-stage object detection method. (Compared with one-stage methods, two-stage detectors achieve higher detection accuracy but run more slowly.)

Faster-RCNN places no restriction on the input image size: the input is first resized so that its shorter side is 600 pixels, preserving the original aspect ratio. The resized image is passed through the backbone to extract a shared feature map. If the feature map is, say, 38×38, it can be viewed as dividing the original image into a 38×38 grid. The RPN (Region Proposal Network) then adjusts the anchors (prior boxes) to produce region proposals.
The anchors are a fixed set of boxes with predefined scales and aspect ratios, placed at every grid cell.
Producing the proposals can be seen as a coarse filtering of the boxes. The proposals are then used to crop the shared feature map, giving local feature maps.
The local feature maps are fed into RoI Pooling, which first resizes them to a common size; classification (cls_pred) and regression (bbox_pred) predictions are then made on them to obtain the final detection results.
[Figure 1: overall structure of the Faster-RCNN network]
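To make the two-stage data flow concrete, here is a minimal sketch of the forward pass described above. faster_rcnn_forward and its arguments are hypothetical names for illustration, not the repo's API; the return values mirror the modules covered in the sections below.

import torch

# a sketch only: backbone, rpn and head stand for the modules explained below
def faster_rcnn_forward(image, backbone, rpn, head):
    features = backbone(image)            # shared feature map, e.g. (1, 1024, 38, 38)
    # stage 1: score and adjust the anchors, keep the best boxes as region proposals
    rpn_locs, rpn_scores, rois, roi_indices, anchors = rpn(features, image.shape[2:])
    # stage 2: crop each proposal from the feature map, then classify and refine it
    roi_cls_locs, roi_scores = head(features, rois, roi_indices)
    return roi_cls_locs, roi_scores, rois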


2 Backbone feature extraction network (ResNet_50)

The torch.nn.Conv2d() function in PyTorch:
https://blog.csdn.net/qq_34243930/article/details/107231539

PyTorch official documentation:

torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1,groups=1, bias=True)

Parameters:
in_channels: number of channels in the input image
out_channels: number of channels in the output feature map
kernel_size: size of the convolution kernel; it can be a single int or a tuple, so the kernel width and height may differ
stride=1: stride of the convolution, default 1
padding=0: zero-padding added to the input borders
dilation=1: dilation factor (atrous/dilated convolution)
groups=1: number of groups for grouped convolution, default 1 (no grouping)
bias=True: adds a learnable bias to the output, default True
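A quick shape check of these parameters; the 7×7, stride-2 configuration below happens to be the one ResNet_50 uses in its first layer:

import torch
import torch.nn as nn

# 3 -> 64 channels, 7x7 kernel, stride 2, padding 3: halves the spatial resolution
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3, bias=False)
x = torch.randn(1, 3, 600, 600)
print(conv(x).shape)   # torch.Size([1, 64, 300, 300])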

Dilated (atrous) convolution: https://blog.csdn.net/qq_34243930/article/details/107231539
[Figure 2: dilated convolution]

The backbone feature extraction network is ResNet_50.
The network can be viewed as a stack of Conv_block and Identity_block modules.
The code of the feature extraction network is as follows:

class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, stride=stride, bias=False) # change
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, # change
                    padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        # downsample is the shortcut (residual) branch
        self.downsample = downsample
        self.stride = stride

The first part, the Bottleneck class, defines the bottleneck structure, made of three convolutions: a 1×1 convolution that reduces the channel count, a 3×3 convolution that extracts features, and a 1×1 convolution that expands the channels again.
downsample denotes the shortcut branch: the difference between Identity_block and Conv_block is whether the shortcut carries convolution and batch normalization operations.

The forward function then calls these layers in turn:

    def forward(self, x):
        residual = x
        # 1x1 convolution
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        # 3x3 convolution
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        # 1x1 convolution
        out = self.conv3(out)
        out = self.bn3(out)
        # check for a shortcut: Conv_block has a convolutional shortcut, Identity_block does not
        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out
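A quick shape check of the block (a sketch; assumes the Bottleneck class above is in scope):

import torch

# with inplanes == planes * expansion and stride 1, no downsample is needed (identity block)
block = Bottleneck(inplanes=256, planes=64)
x = torch.randn(1, 256, 38, 38)
print(block(x).shape)   # torch.Size([1, 256, 38, 38]) -- output channels = planes * 4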

The _make_layer function stacks these blocks into one stage:

    def _make_layer(self, block, planes, blocks, stride=1):
        # shortcut branch
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

Here, a loop stacks one conv_block followed by several identity_blocks. The number of blocks
per stage is defined in the outermost resnet50 function via the list [3, 4, 6, 3], i.e., the four stages of the ResNet stack 3, 4, 6 and 3 blocks respectively (see the shape check after the resnet50 code below).

def resnet50():
    # the second argument lists how many blocks (one conv_block plus identity_blocks) each stage stacks
    model = ResNet(Bottleneck, [3, 4, 6, 3])
    # take the feature extraction half
    # (the backbone is split into two parts)
    features = list([model.conv1, model.bn1, model.relu, model.maxpool, model.layer1, model.layer2, model.layer3])
    # take the classification half
    classifier = list([model.layer4, model.avgpool])
    features = nn.Sequential(*features)
    classifier = nn.Sequential(*classifier)
    return features,classifier
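A quick check of the two halves (a sketch; assumes the classes above are in scope and that math has been imported for the weight initialization):

import torch

features, classifier = resnet50()
x = torch.randn(1, 3, 600, 600)
feat = features(x)                        # the shared feature map used by the RPN
print(feat.shape)                         # torch.Size([1, 1024, 38, 38])
pooled = torch.randn(1, 1024, 14, 14)     # what RoI pooling later hands to the classifier half
print(classifier(pooled).shape)           # torch.Size([1, 2048, 1, 1])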

The stacked convolutions in each stage use different input and output channel counts:

class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000):
        self.inplanes = 64
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                    bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=0, ceil_mode=True) # change
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        
        self.avgpool = nn.AvgPool2d(7)
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

Building ResNet_50 with the Keras framework:

def identity_block(input_tensor, kernel_size, filters, stage, block):

    filters1, filters2, filters3 = filters

    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = Conv2D(filters1, (1, 1), name=conv_name_base + '2a')(input_tensor)
    x = BatchNormalization(name=bn_name_base + '2a')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters2, kernel_size,padding='same', name=conv_name_base + '2b')(x)
    x = BatchNormalization(name=bn_name_base + '2b')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters3, (1, 1), name=conv_name_base + '2c')(x)
    x = BatchNormalization(name=bn_name_base + '2c')(x)

    x = layers.add([x, input_tensor])
    x = Activation('relu')(x)
    return x


def conv_block(input_tensor, kernel_size, filters, stage, block, strides=(2, 2)):

    filters1, filters2, filters3 = filters

    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = Conv2D(filters1, (1, 1), strides=strides,
               name=conv_name_base + '2a')(input_tensor)
    x = BatchNormalization(name=bn_name_base + '2a')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters2, kernel_size, padding='same',
               name=conv_name_base + '2b')(x)
    x = BatchNormalization(name=bn_name_base + '2b')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters3, (1, 1), name=conv_name_base + '2c')(x)
    x = BatchNormalization(name=bn_name_base + '2c')(x)

    shortcut = Conv2D(filters3, (1, 1), strides=strides,
                      name=conv_name_base + '1')(input_tensor)
    shortcut = BatchNormalization(name=bn_name_base + '1')(shortcut)

    x = layers.add([x, shortcut])
    x = Activation('relu')(x)
    return x


def ResNet50(inputs):

    img_input = inputs

    x = ZeroPadding2D((3, 3))(img_input)
    x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1')(x)
    x = BatchNormalization(name='bn_conv1')(x)
    x = Activation('relu')(x)

    x = MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)

    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')

    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='d')

    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f')

    return x

3 Region Proposal Network (RPN)

class RegionProposalNetwork(nn.Module):
    def __init__(
            self, in_channels=512, mid_channels=512, ratios=[0.5, 1, 2],
            anchor_scales=[8, 16, 32], feat_stride=16,
            mode = "training",
    ):
        super(RegionProposalNetwork, self).__init__()
        self.anchor_base = generate_anchor_base(anchor_scales=anchor_scales, ratios=ratios)
        # stride of the feature map, i.e. the total downsampling factor
        self.feat_stride = feat_stride
        self.proposal_layer = ProposalCreator(mode)
        # number of anchors at each grid cell (9 by default)
        n_anchor = self.anchor_base.shape[0]
        # first, a 3x3 convolution
        self.conv1 = nn.Conv2d(in_channels, mid_channels, 3, 1, 1)
        # classification: does each anchor contain an object?
        self.score = nn.Conv2d(mid_channels, n_anchor * 2, 1, 1, 0)
        # regression: offsets used to adjust each anchor
        self.loc = nn.Conv2d(mid_channels, n_anchor * 4, 1, 1, 0)
        normal_init(self.conv1, 0, 0.01)
        normal_init(self.score, 0, 0.01)
        normal_init(self.loc, 0, 0.01)

The feature map obtained from the backbone can be viewed as dividing the original image into a grid of cells.
Anchors are predefined boxes with fixed scales and aspect ratios; by default there are 9 anchors at every grid cell.
self.conv1 = nn.Conv2d(in_channels, mid_channels, 3, 1, 1) then performs a 3×3 convolution for further feature extraction.
It is followed by two 1×1 convolutions:
self.score = nn.Conv2d(mid_channels, n_anchor * 2, 1, 1, 0)
self.loc = nn.Conv2d(mid_channels, n_anchor * 4, 1, 1, 0)
Their output channel counts are n_anchor×2 and n_anchor×4: the former scores whether each anchor contains an object, the latter holds the four adjustment parameters for each anchor.
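A shape check of the two heads (a sketch with 9 anchors per cell on a 38×38 feature map):

import torch
import torch.nn as nn

x = torch.randn(1, 512, 38, 38)         # output of the 3x3 convolution (mid_channels=512)
score = nn.Conv2d(512, 9 * 2, 1)(x)     # (1, 18, 38, 38): object/background per anchor
loc = nn.Conv2d(512, 9 * 4, 1)(x)       # (1, 36, 38, 38): 4 offsets per anchor
print(score.permute(0, 2, 3, 1).reshape(1, -1, 2).shape)   # torch.Size([1, 12996, 2])
print(loc.permute(0, 2, 3, 1).reshape(1, -1, 4).shape)     # torch.Size([1, 12996, 4])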

    def forward(self, x, img_size, scale=1.):
        n, _, hh, ww = x.shape
        # a 3x3 convolution over the shared feature map
        h = F.relu(self.conv1(x))
        # regression prediction
        rpn_locs = self.loc(h)
        rpn_locs = rpn_locs.permute(0, 2, 3, 1).contiguous().view(n, -1, 4)
        # classification prediction
        rpn_scores = self.score(h)
        rpn_scores = rpn_scores.permute(0, 2, 3, 1).contiguous().view(n, -1, 2)
        # softmax to get object/background probabilities
        rpn_softmax_scores = F.softmax(rpn_scores, dim=-1)
        rpn_fg_scores = rpn_softmax_scores[:, :, 1].contiguous()
        rpn_fg_scores = rpn_fg_scores.view(n, -1)
        rpn_scores = rpn_scores.view(n, -1, 2)

        # generate the anchors
        anchor = _enumerate_shifted_anchor(
            np.array(self.anchor_base),
            self.feat_stride, hh, ww)
        rois = list()
        roi_indices = list()
        for i in range(n):
            roi = self.proposal_layer(
                rpn_locs[i].cpu().data.numpy(),
                rpn_fg_scores[i].cpu().data.numpy(),
                anchor, img_size,
                scale=scale)
            batch_index = i * np.ones((len(roi),), dtype=np.int32)
            rois.append(roi)
            roi_indices.append(batch_index)

        rois = np.concatenate(rois, axis=0)
        roi_indices = np.concatenate(roi_indices, axis=0)
        
        return rpn_locs, rpn_scores, rois, roi_indices, anchor

The forward function applies these convolutions to carry out the RPN's task and generate the proposals:
the two 1×1 convolutions produce the classification and regression predictions, and proposal_layer decodes them into region proposals.

The Keras implementation of the RPN plays the same role:
one 3×3 convolution followed by two 1×1 convolutions.
The RPN outputs the classification and regression predictions for the anchors,
i.e., whether each anchor contains an object and the parameters used to adjust it.

def get_rpn(base_layers, num_anchors):
    x = Conv2D(512, (3, 3), padding='same', activation='relu', kernel_initializer='normal', name='rpn_conv1')(base_layers)
    # 1x1 convolution with 9 output channels (one objectness score per anchor)
    x_class = Conv2D(num_anchors, (1, 1), activation='sigmoid', kernel_initializer='uniform', name='rpn_out_class')(x)
    # 1x1 convolution with 36 output channels (4 adjustment parameters per anchor)
    x_regr = Conv2D(num_anchors * 4, (1, 1), activation='linear', kernel_initializer='zero', name='rpn_out_regress')(x)
    
    x_class = Reshape((-1, 1), name="classification")(x_class)
    x_regr = Reshape((-1, 4), name="regression")(x_regr)
    return [x_class, x_regr, base_layers]


def get_classifier(base_layers, input_rois, num_rois, nb_classes=21, trainable=False):
    pooling_regions = 14
    input_shape = (num_rois, 14, 14, 1024)
    # RoI pooling parameter setup
    out_roi_pool = RoiPoolingConv(pooling_regions, num_rois)([base_layers, input_rois])
    out = classifier_layers(out_roi_pool, input_shape=input_shape, trainable=True)
    out = TimeDistributed(Flatten())(out)
    out_class = TimeDistributed(Dense(nb_classes, activation='softmax', kernel_initializer='zero'), name='dense_class_{}'.format(nb_classes))(out)
    out_regr = TimeDistributed(Dense(4 * (nb_classes-1), activation='linear', kernel_initializer='zero'), name='dense_regress_{}'.format(nb_classes))(out)
    return [out_class, out_regr]

4 Adjusting the anchors into proposals

The RPN produces two outputs: the adjustment parameters for the anchors and whether each anchor contains an object.
This step applies those predictions to the anchors to obtain the proposals.
The loc2bbox function converts the RPN output into proposal boxes, i.e., it performs the decoding:

def loc2bbox(src_bbox, loc):
    if src_bbox.shape[0] == 0:
        return np.zeros((0, 4), dtype=loc.dtype)

    src_bbox = src_bbox.astype(src_bbox.dtype, copy=False)
    src_width = src_bbox[:, 2] - src_bbox[:, 0]
    src_height = src_bbox[:, 3] - src_bbox[:, 1]
    src_ctr_x = src_bbox[:, 0] + 0.5 * src_width
    src_ctr_y = src_bbox[:, 1] + 0.5 * src_height

    dx = loc[:, 0::4]
    dy = loc[:, 1::4]
    dw = loc[:, 2::4]
    dh = loc[:, 3::4]

    ctr_x = dx * src_width[:, np.newaxis] + src_ctr_x[:, np.newaxis]
    ctr_y = dy * src_height[:, np.newaxis] + src_ctr_y[:, np.newaxis]
    w = np.exp(dw) * src_width[:, np.newaxis]
    h = np.exp(dh) * src_height[:, np.newaxis]

    dst_bbox = np.zeros(loc.shape, dtype=loc.dtype)
    dst_bbox[:, 0::4] = ctr_x - 0.5 * w
    dst_bbox[:, 1::4] = ctr_y - 0.5 * h
    dst_bbox[:, 2::4] = ctr_x + 0.5 * w
    dst_bbox[:, 3::4] = ctr_y + 0.5 * h
    # the adjusted anchors, i.e. the proposals
    return dst_bbox

The function's arguments are src_bbox (the anchors) and loc (the RPN regression output).
It first extracts each anchor's center coordinates, width and height, and reads the adjustment parameters from the RPN output.
Decoding gives the adjusted anchors' (i.e., the proposals') center coordinates, widths and heights,
which are then converted into top-left / bottom-right corner format.
Each row of roi holds the top-left and bottom-right coordinates of one proposal.
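A worked example of the decoding (assumes loc2bbox as defined above):

import numpy as np

anchor = np.array([[100., 100., 200., 200.]])    # square anchor, width = height = 100
loc = np.array([[0.1, 0.2, np.log(1.5), 0.0]])   # (dx, dy, dw, dh)
print(loc2bbox(anchor, loc))
# the center shifts by (0.1*100, 0.2*100) and the width scales by exp(log 1.5) = 1.5:
# [[ 85. 120. 235. 220.]]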
In Keras:

R = rpn_results[0][:, 2:]
R[:, 0] = np.array(np.round(R[:, 0]*width/self.config.rpn_stride),dtype=np.int32)
R[:, 1] = np.array(np.round(R[:, 1]*height/self.config.rpn_stride),dtype=np.int32)
R[:, 2] = np.array(np.round(R[:, 2]*width/self.config.rpn_stride),dtype=np.int32)
R[:, 3] = np.array(np.round(R[:, 3]*height/self.config.rpn_stride),dtype=np.int32)
R[:, 2] -= R[:, 0]
R[:, 3] -= R[:, 1]

In rpn.py:

        # use slices to clip the proposals so they stay inside the image
        roi[:, slice(0, 4, 2)] = np.clip(roi[:, slice(0, 4, 2)], 0, img_size[1])
        roi[:, slice(1, 4, 2)] = np.clip(roi[:, slice(1, 4, 2)], 0, img_size[0])
        
        # the minimum width/height must not be smaller than 16
        min_size = self.min_size * scale
        # compute widths and heights
        ws = roi[:, 2] - roi[:, 0]
        hs = roi[:, 3] - roi[:, 1]
        # discard proposals that are too small
        keep = np.where((hs >= min_size) & (ws >= min_size))[0]
        roi = roi[keep, :]
        score = score[keep]
        # keep the highest-scoring proposals
        order = score.ravel().argsort()[::-1]
        if n_pre_nms > 0:
            order = order[:n_pre_nms]
        roi = roi[order, :]
        roi = nms(roi,self.nms_thresh)
        roi = torch.Tensor(roi)
        roi = roi[:n_post_nms]
        return roi

Proposals are clipped so they do not extend beyond the image, and, to avoid overly small proposals, only boxes whose width and height are at least 16 pixels are kept.
The proposals with the highest objectness scores are then selected (a numpy sketch of this filtering follows).
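A small numpy sketch of the clipping and size filter, with illustrative values:

import numpy as np

roi = np.array([[-10.,  20., 300., 650.],
                [100., 100., 110., 108.]])
score = np.array([0.9, 0.8])
img_size = (600, 600)                    # (height, width)
roi[:, slice(0, 4, 2)] = np.clip(roi[:, slice(0, 4, 2)], 0, img_size[1])
roi[:, slice(1, 4, 2)] = np.clip(roi[:, slice(1, 4, 2)], 0, img_size[0])
keep = np.where((roi[:, 2] - roi[:, 0] >= 16) & (roi[:, 3] - roi[:, 1] >= 16))[0]
print(roi[keep], score[keep])            # only the first box survives: [[0. 20. 300. 600.]] [0.9]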

5 Cropping the shared feature map with the proposals for RoI Pooling

Each proposal crops a region of the shared feature map, producing local feature maps of different sizes. RoI Pooling then pools each region over a fixed grid so that all of them come out the same size; these pooled features are fed to the head for the classification and regression predictions that yield the final results.
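The repo implements its own RoI pooling layer; torchvision ships an equivalent op, shown here as a sketch of the cropping-and-resizing step:

import torch
from torchvision.ops import roi_pool

feat = torch.randn(1, 1024, 38, 38)                # shared feature map
# rows of (batch_index, x1, y1, x2, y2) in input-image pixel coordinates
rois = torch.tensor([[0.,   0.,   0., 160., 160.],
                     [0., 100., 120., 400., 300.]])
# spatial_scale maps image coordinates onto the 16x-downsampled feature map
pooled = roi_pool(feat, rois, output_size=(14, 14), spatial_scale=1.0 / 16)
print(pooled.shape)                                # torch.Size([2, 1024, 14, 14])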


6 Classification and regression on the RoI-pooled features

    def forward(self, x, rois, roi_indices):
        roi_indices = torch.Tensor(roi_indices).cuda().float()
        rois = torch.Tensor(rois).cuda().float()
        indices_and_rois = torch.cat([roi_indices[:, None], rois], dim=1)

        xy_indices_and_rois = indices_and_rois[:, [0, 1, 2, 3, 4]]
        indices_and_rois =  xy_indices_and_rois.contiguous()
        # crop the shared feature map with the proposals
        pool = self.roi(x, indices_and_rois)
        pool = pool.view(pool.size(0), -1)
        fc7 = self.classifier(pool)
        # regression prediction
        roi_cls_locs = self.cls_loc(fc7)
        # classification prediction
        roi_scores = self.score(fc7)
        return roi_cls_locs, roi_scores

The proposals crop the shared feature map, and the cropped features are used for the regression and classification predictions.
The Keras code is as follows:

    def call(self, x, mask=None):

        assert(len(x) == 2)

        img = x[0]   # the shared feature map
        rois = x[1]  # the proposals

        outputs = []

        for roi_idx in range(self.num_rois):
            # top-left corner coordinates plus width and height of the proposal
            x = rois[0, roi_idx, 0]
            y = rois[0, roi_idx, 1]
            w = rois[0, roi_idx, 2]
            h = rois[0, roi_idx, 3]
            # cast to int
            x = K.cast(x, 'int32')
            y = K.cast(y, 'int32')
            w = K.cast(w, 'int32')
            h = K.cast(h, 'int32')
            # crop the proposal's region from the shared feature map and resize it to self.pool_size x self.pool_size
            rs = tf.image.resize_images(img[:, y:y+h, x:x+w, :], (self.pool_size, self.pool_size))
            outputs.append(rs)

        final_output = K.concatenate(outputs, axis=0)
        final_output = K.reshape(final_output, (1, self.num_rois, self.pool_size, self.pool_size, self.nb_channels))

        final_output = K.permute_dimensions(final_output, (0, 1, 2, 3, 4))

        return final_output

7 Decoding the head outputs into the final prediction boxes

In the DecodeBox function in utils.py, the outputs of the RoI Pooling head are extracted and used to adjust the proposals; this is the second decoding step.

    def forward(self, roi_cls_locs, roi_scores, rois, height, width, score_thresh):

        rois = torch.Tensor(rois)

        roi_cls_loc = (roi_cls_locs * self.std + self.mean)
        roi_cls_loc = roi_cls_loc.view([-1, self.num_classes, 4])
        roi = rois.view((-1, 1, 4)).expand_as(roi_cls_loc)
        cls_bbox = loc2bbox((roi.cpu().detach().numpy()).reshape((-1, 4)),
                            (roi_cls_loc.cpu().detach().numpy()).reshape((-1, 4)))
        cls_bbox = torch.Tensor(cls_bbox)
        # the per-class box position for every proposal
        cls_bbox = cls_bbox.view([-1, (self.num_classes), 4])
        # clip the bounding boxes so they stay inside the image
        cls_bbox[..., 0] = (cls_bbox[..., 0]).clamp(min=0, max=width)
        cls_bbox[..., 2] = (cls_bbox[..., 2]).clamp(min=0, max=width)
        cls_bbox[..., 1] = (cls_bbox[..., 1]).clamp(min=0, max=height)
        cls_bbox[..., 3] = (cls_bbox[..., 3]).clamp(min=0, max=height)

        prob = F.softmax(torch.tensor(roi_scores), dim=1)
        raw_cls_bbox = cls_bbox.cpu().numpy()
        raw_prob = prob.cpu().numpy()

        outputs = []
        # loop over the classes, starting from 1 because index 0 is the background,
        # which only helps during training and is not needed when decoding
        for l in range(1, self.num_classes):
            # for this class, decide which boxes belong to it
            cls_bbox_l = raw_cls_bbox[:, l, :]
            # confidence that the object belongs to this class
            prob_l = raw_prob[:, l]
            # keep boxes whose confidence exceeds the threshold
            mask = prob_l > score_thresh
            # those boxes are assigned to this class
            cls_bbox_l = cls_bbox_l[mask]
            prob_l = prob_l[mask]
            if len(prob_l) == 0:
                continue

            label = np.ones_like(prob_l)*(l-1)
            detections_class = np.concatenate([cls_bbox_l,np.expand_dims(prob_l,axis=-1),np.expand_dims(label,axis=-1)],axis=-1)
            
            prob_l_index = np.argsort(prob_l)[::-1]
            detections_class = detections_class[prob_l_index]
            nms_out = nms(detections_class,0.3)
            if outputs == []:
                outputs = nms_out
            else:
                outputs = np.concatenate([outputs,nms_out], axis=0)
        return outputs

The RoI Pooling head outputs the classification and regression predictions for the proposals.
Those outputs are decoded by calling loc2bbox again, which produces boxes that can be drawn on the image.
Starting from the for loop, every class is visited in turn; for a given class, all proposals are checked, and a proposal is assigned to the class if its confidence for that class exceeds the threshold.
Non-maximum suppression (NMS) then filters the adjusted boxes.
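For reference, a minimal numpy NMS in the spirit of the nms() call above (a sketch, not the repo's implementation):

import numpy as np

def nms_sketch(dets, thresh):
    # dets: (N, 5+) rows of (x1, y1, x2, y2, score, ...), already sorted by score descending
    x1, y1, x2, y2 = dets[:, 0], dets[:, 1], dets[:, 2], dets[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    keep, order = [], np.arange(len(dets))
    while order.size > 0:
        i = order[0]                       # highest-scoring remaining box
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0., xx2 - xx1) * np.maximum(0., yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= thresh]   # drop boxes that overlap the kept box too much
    return dets[keep]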
The final prediction boxes are then drawn onto the image:

        for i, c in enumerate(label):
            predicted_class = self.class_names[int(c)]
            score = conf[i]

            left, top, right, bottom = bbox[i]
            top = top - 5
            left = left - 5
            bottom = bottom + 5
            right = right + 5

            top = max(0, np.floor(top + 0.5).astype('int32'))
            left = max(0, np.floor(left + 0.5).astype('int32'))
            bottom = min(np.shape(image)[0], np.floor(bottom + 0.5).astype('int32'))
            right = min(np.shape(image)[1], np.floor(right + 0.5).astype('int32'))

            # draw the box and its label
            label = '{} {:.2f}'.format(predicted_class, score)
            draw = ImageDraw.Draw(image)
            label_size = draw.textsize(label, font)
            label = label.encode('utf-8')
            print(label)
            
            if top - label_size[1] >= 0:
                text_origin = np.array([left, top - label_size[1]])
            else:
                text_origin = np.array([left, top + 1])

            for i in range(thickness):
                draw.rectangle(
                    [left + i, top + i, right - i, bottom - i],
                    outline=self.colors[int(c)])
            draw.rectangle(
                [tuple(text_origin), tuple(text_origin + label_size)],
                fill=self.colors[int(c)])
            draw.text(text_origin, str(label,'UTF-8'), fill=(0, 0, 0), font=font)
            del draw
        
        print("time:",time.time()-start_time)
        return image

For each prediction box, the top-left and bottom-right coordinates are extracted and a 5-pixel margin is added around the box.
The final prediction boxes are then drawn on the image together with their class labels, giving the final detection result.

* Code for generating the anchors

def generate_anchor_base(base_size=16, ratios=[0.5, 1, 2],
                         anchor_scales=[8, 16, 32]):

    anchor_base = np.zeros((len(ratios) * len(anchor_scales), 4),
                           dtype=np.float32)
    for i in range(len(ratios)):
        for j in range(len(anchor_scales)):
            h = base_size * anchor_scales[j] * np.sqrt(ratios[i])
            w = base_size * anchor_scales[j] * np.sqrt(1. / ratios[i])

            index = i * len(anchor_scales) + j
            anchor_base[index, 0] = - h / 2.
            anchor_base[index, 1] = - w / 2.
            anchor_base[index, 2] = h / 2.
            anchor_base[index, 3] = w / 2.
    return anchor_base

anchor_scales=[8, 16, 32] sets the base sizes of the anchors,
and ratios=[0.5, 1, 2] are the aspect-ratio factors; combining them yields 9 anchors of different widths and heights. In the loop, each anchor's height is base_size × scale × sqrt(ratio) and its width is base_size × scale × sqrt(1/ratio), so every grid point gets three squares, three wide rectangles and three tall rectangles.
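A quick look at the nine base anchor sizes (assumes generate_anchor_base as defined above):

import numpy as np

base = generate_anchor_base()        # shape (9, 4): rows of (-h/2, -w/2, h/2, w/2)
h = base[:, 2] - base[:, 0]
w = base[:, 3] - base[:, 1]
print(np.stack([h, w], axis=1).round())
# ratio 0.5: ~91x181, 181x362, 362x724; ratio 1: 128x128, 256x256, 512x512; ratio 2: the transposed sizes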

The _enumerate_shifted_anchor function then tiles these base anchors over the whole feature map:

def _enumerate_shifted_anchor(anchor_base, feat_stride, height, width):
    # compute the coordinates of every grid point (the anchor centers)
    shift_x = np.arange(0, width * feat_stride, feat_stride)
    shift_y = np.arange(0, height * feat_stride, feat_stride)
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    shift = np.stack((shift_x.ravel(),shift_y.ravel(),
                      shift_x.ravel(),shift_y.ravel(),), axis=1)

    # the 9 base anchors at every grid point
    A = anchor_base.shape[0]
    K = shift.shape[0]
    anchor = anchor_base.reshape((1, A, 4)) + \
             shift.reshape((K, 1, 4))
    # all anchors, flattened into one array
    anchor = anchor.reshape((K * A, 4)).astype(np.float32)
    return anchor
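A shape check of the tiling (assumes both functions above are in scope):

anchor = _enumerate_shifted_anchor(generate_anchor_base(), feat_stride=16, height=38, width=38)
print(anchor.shape)   # (12996, 4): 38 x 38 x 9 anchors for a 600x600 input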

Take a 600×600 input as an example: the feature map is 38×38, so each grid cell covers roughly 16 pixels.
The coordinates of every grid point are computed by:
shift_x = np.arange(0, width * feat_stride, feat_stride)
shift_y = np.arange(0, height * feat_stride, feat_stride)
The figure below plots the nine anchors belonging to one of the grid points:
[Figure 3: the nine anchors at one grid point]


Code download:
Link: https://pan.baidu.com/s/1Rqx16ki4qZnGrVFaPcftKw
Extraction code: dawx

Thanks to:
https://blog.csdn.net/weixin_44791964/article/details/105739918
https://github.com/bubbliiiing/faster-rcnn-pytorch
https://blog.csdn.net/qq_34243930/article/details/107231539
the PyTorch official documentation
