Reference article: https://blog.csdn.net/weixin_44791964/article/details/105739918
Code from: https://github.com/bubbliiiing/faster-rcnn-pytorch
Keras is a highly encapsulated framework; using it mostly amounts to calling ready-made modules. That makes Keras well suited for learning, i.e. for studying the overall structure of a model.
This post is a set of study notes: it compares the Keras and PyTorch implementations of the Faster-RCNN network side by side.
Faster-RCNN is a two-stage object detection method (compared with one-stage methods, two-stage detectors are more accurate but slower).
Faster-RCNN places no restriction on the input image size. The input image is first resized so that its short side is 600 pixels, keeping the original aspect ratio. The resized image is passed through the backbone to extract a shared feature map. If the feature map is, say, 38×38, it can be viewed as dividing the original image into a 38×38 grid. The RPN (Region Proposal Network) then adjusts the anchors (prior boxes) to produce region proposals.
Here the anchors are boxes of predefined scales and aspect ratios, placed at every grid point in advance.
Producing the proposals can be seen as a coarse filtering of boxes. The proposals are then used to crop the shared feature map, yielding local feature maps.
The local feature maps are fed into RoI Pooling, which first resizes them to a common size; classification (cls_pred) and regression (bbox_pred) are then performed on them to obtain the detection results.
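To make this data flow concrete, here is a hedged stand-in sketch (my own toy code, not from the repo): the real backbone, RPN and head come from bubbliiiing/faster-rcnn-pytorch, and the module below is just a placeholder with the right shapes.

import torch
import torch.nn as nn

img = torch.randn(1, 3, 600, 600)  # input resized so its short side is 600
# stand-in for the stride-16 ResNet-50 feature extractor (1024 channels)
backbone = nn.Sequential(nn.AdaptiveAvgPool2d((38, 38)), nn.Conv2d(3, 1024, 1))
feat = backbone(img)               # shared feature map
print(feat.shape)                  # torch.Size([1, 1024, 38, 38])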
torch.nn.Conv2d() in PyTorch
Reference: https://blog.csdn.net/qq_34243930/article/details/107231539
From the PyTorch documentation:
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
Parameters:
in_channels: number of channels of the input image
out_channels: number of channels of the output feature map
kernel_size: size of the convolution kernel; can be a single int or a tuple, so the kernel width and height may differ
stride=1: convolution stride, 1 by default
padding=0: amount of zero-padding
dilation=1: dilation (atrous convolution) factor
groups=1: number of groups for grouped convolution; no grouping by default
bias=True: adds a learnable bias to the output, True by default
Dilated convolution: https://blog.csdn.net/qq_34243930/article/details/107231539
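As a quick check of these parameters (my own example, not from the post), the output size follows H_out = floor((H_in + 2*padding - dilation*(kernel_size-1) - 1)/stride + 1):

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3)
x = torch.randn(1, 3, 600, 600)
print(conv(x).shape)  # torch.Size([1, 64, 300, 300]): floor((600 + 6 - 6 - 1)/2 + 1) = 300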
The backbone feature extraction network is ResNet-50.
This network can be viewed as a stack of Conv_block and Identity_block units.
The code of the feature extraction network is as follows:
import torch.nn as nn

class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, stride=stride, bias=False)  # change
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1,  # change
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        # downsample is the shortcut (residual) branch
        self.downsample = downsample
        self.stride = stride
The first part, class Bottleneck, defines the bottleneck structure, which consists of three convolutions: a 1×1 convolution that reduces the number of channels, a 3×3 convolution that extracts features, and a 1×1 convolution that expands the channels again.
downsample is the shortcut branch: the difference between Identity_block and Conv_block is whether the shortcut carries a convolution and batch normalization.
The forward function then calls each layer in turn:
    def forward(self, x):
        residual = x
        # 1x1 convolution
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        # 3x3 convolution
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        # 1x1 convolution
        out = self.conv3(out)
        out = self.bn3(out)
        # check whether the shortcut has layers: Conv_block does, Identity_block does not
        if self.downsample is not None:
            residual = self.downsample(x)
        out += residual
        out = self.relu(out)
        return out
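A small shape check I added for this block (assuming the Bottleneck class exactly as defined above): a Conv_block-style bottleneck with stride 2 and a convolutional shortcut multiplies planes by 4 and halves the spatial size.

import torch

downsample = nn.Sequential(
    nn.Conv2d(256, 512, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(512),
)
block = Bottleneck(inplanes=256, planes=128, stride=2, downsample=downsample)
print(block(torch.randn(1, 256, 38, 38)).shape)  # torch.Size([1, 512, 19, 19])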
The function _make_layer stacks these blocks:
    def _make_layer(self, block, planes, blocks, stride=1):
        # shortcut branch
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )
        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))
        return nn.Sequential(*layers)
Here the loop stacks one conv_block followed by identity_blocks. The number of blocks to stack, blocks, is set in the outermost function resnet50 as blocks=[3, 4, 6, 3]: the four stages of ResNet-50 stack 3, 4, 6 and 3 bottleneck blocks respectively.
def resnet50():
    # the second argument gives the number of conv_block/identity_block units per stage
    model = ResNet(Bottleneck, [3, 4, 6, 3])
    # take the feature extraction part,
    # splitting the backbone away from the classification head
    features = list([model.conv1, model.bn1, model.relu, model.maxpool, model.layer1, model.layer2, model.layer3])
    # take the classification part
    classifier = list([model.layer4, model.avgpool])
    features = nn.Sequential(*features)
    classifier = nn.Sequential(*classifier)
    return features, classifier
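Note that features stops at layer3, so the shared feature map has stride 16 and 1024 channels; classifier (layer4 plus average pooling) is reused later as the RoI head. A hedged sanity check of the split (assuming the class ResNet shown next):

import torch

features, classifier = resnet50()
feat = features(torch.randn(1, 3, 600, 600))
print(feat.shape)  # torch.Size([1, 1024, 38, 38]): 600/16, rounded up by the ceil_mode pooling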
The convolutions stacked in each stage have different input and output channel counts, as the class ResNet below shows:
import math

class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000):
        self.inplanes = 64
        super(ResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=0, ceil_mode=True)  # change
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AvgPool2d(7)
        self.fc = nn.Linear(512 * block.expansion, num_classes)
        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
The same ResNet-50 backbone built with Keras:
def identity_block(input_tensor, kernel_size, filters, stage, block):
    filters1, filters2, filters3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'
    x = Conv2D(filters1, (1, 1), name=conv_name_base + '2a')(input_tensor)
    x = BatchNormalization(name=bn_name_base + '2a')(x)
    x = Activation('relu')(x)
    x = Conv2D(filters2, kernel_size, padding='same', name=conv_name_base + '2b')(x)
    x = BatchNormalization(name=bn_name_base + '2b')(x)
    x = Activation('relu')(x)
    x = Conv2D(filters3, (1, 1), name=conv_name_base + '2c')(x)
    x = BatchNormalization(name=bn_name_base + '2c')(x)
    x = layers.add([x, input_tensor])
    x = Activation('relu')(x)
    return x
def conv_block(input_tensor, kernel_size, filters, stage, block, strides=(2, 2)):
    filters1, filters2, filters3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'
    x = Conv2D(filters1, (1, 1), strides=strides,
               name=conv_name_base + '2a')(input_tensor)
    x = BatchNormalization(name=bn_name_base + '2a')(x)
    x = Activation('relu')(x)
    x = Conv2D(filters2, kernel_size, padding='same',
               name=conv_name_base + '2b')(x)
    x = BatchNormalization(name=bn_name_base + '2b')(x)
    x = Activation('relu')(x)
    x = Conv2D(filters3, (1, 1), name=conv_name_base + '2c')(x)
    x = BatchNormalization(name=bn_name_base + '2c')(x)
    shortcut = Conv2D(filters3, (1, 1), strides=strides,
                      name=conv_name_base + '1')(input_tensor)
    shortcut = BatchNormalization(name=bn_name_base + '1')(shortcut)
    x = layers.add([x, shortcut])
    x = Activation('relu')(x)
    return x
def ResNet50(inputs):
    img_input = inputs
    x = ZeroPadding2D((3, 3))(img_input)
    x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1')(x)
    x = BatchNormalization(name='bn_conv1')(x)
    x = Activation('relu')(x)
    x = MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)
    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')
    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='d')
    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f')
    return x
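A quick shape check of the Keras backbone (my own snippet; it assumes the usual Keras imports used by the repo, such as Input and Model):

from keras.layers import Input
from keras.models import Model

inputs = Input(shape=(600, 600, 3))
model = Model(inputs, ResNet50(inputs))
print(model.output_shape)  # (None, 38, 38, 1024), matching the PyTorch features output

Next, the PyTorch RegionProposalNetwork: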
class RegionProposalNetwork(nn.Module):
    def __init__(
            self, in_channels=512, mid_channels=512, ratios=[0.5, 1, 2],
            anchor_scales=[8, 16, 32], feat_stride=16,
            mode="training",
    ):
        super(RegionProposalNetwork, self).__init__()
        self.anchor_base = generate_anchor_base(anchor_scales=anchor_scales, ratios=ratios)
        # stride, i.e. the downsampling factor of the backbone
        self.feat_stride = feat_stride
        self.proposal_layer = ProposalCreator(mode)
        # number of default anchors per grid point
        n_anchor = self.anchor_base.shape[0]
        # first a 3x3 convolution
        self.conv1 = nn.Conv2d(in_channels, mid_channels, 3, 1, 1)
        # classification: does each anchor contain an object?
        self.score = nn.Conv2d(mid_channels, n_anchor * 2, 1, 1, 0)
        # regression: offsets that adjust each anchor
        self.loc = nn.Conv2d(mid_channels, n_anchor * 4, 1, 1, 0)
        normal_init(self.conv1, 0, 0.01)
        normal_init(self.score, 0, 0.01)
        normal_init(self.loc, 0, 0.01)
The feature map produced by the backbone can be viewed as dividing the original image into a grid.
The anchors are boxes of predefined scales and aspect ratios drawn at each grid point in advance; here every grid point carries 9 anchors by default.
Then self.conv1 = nn.Conv2d(in_channels, mid_channels, 3, 1, 1) applies a 3×3 convolution for feature extraction.
Two 1×1 convolutions follow:
self.score = nn.Conv2d(mid_channels, n_anchor * 2, 1, 1, 0)
self.loc = nn.Conv2d(mid_channels, n_anchor * 4, 1, 1, 0)
Their output channel counts are n_anchor×2 and n_anchor×4: n_anchor×2 encodes, for every anchor, whether it contains an object, and n_anchor×4 encodes the adjustment parameters of every anchor.
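A shape check of the two heads (my own numbers, for a 38×38 feature map):

import torch
import torch.nn as nn

n_anchor, mid_channels = 9, 512
score = nn.Conv2d(mid_channels, n_anchor * 2, 1, 1, 0)
loc = nn.Conv2d(mid_channels, n_anchor * 4, 1, 1, 0)
h = torch.randn(1, mid_channels, 38, 38)
print(score(h).shape, loc(h).shape)
# torch.Size([1, 18, 38, 38]) torch.Size([1, 36, 38, 38]): 38*38*9 = 12996 anchors in total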
    def forward(self, x, img_size, scale=1.):
        n, _, hh, ww = x.shape
        # a 3x3 convolution over the shared feature map
        h = F.relu(self.conv1(x))
        # regression branch
        rpn_locs = self.loc(h)
        rpn_locs = rpn_locs.permute(0, 2, 3, 1).contiguous().view(n, -1, 4)
        # classification branch
        rpn_scores = self.score(h)
        rpn_scores = rpn_scores.permute(0, 2, 3, 1).contiguous().view(n, -1, 2)
        # softmax over object/background
        rpn_softmax_scores = F.softmax(rpn_scores, dim=-1)
        rpn_fg_scores = rpn_softmax_scores[:, :, 1].contiguous()
        rpn_fg_scores = rpn_fg_scores.view(n, -1)
        rpn_scores = rpn_scores.view(n, -1, 2)
        # generate the anchors
        anchor = _enumerate_shifted_anchor(
            np.array(self.anchor_base),
            self.feat_stride, hh, ww)
        rois = list()
        roi_indices = list()
        for i in range(n):
            roi = self.proposal_layer(
                rpn_locs[i].cpu().data.numpy(),
                rpn_fg_scores[i].cpu().data.numpy(),
                anchor, img_size,
                scale=scale)
            batch_index = i * np.ones((len(roi),), dtype=np.int32)
            rois.append(roi)
            roi_indices.append(batch_index)
        rois = np.concatenate(rois, axis=0)
        roi_indices = np.concatenate(roi_indices, axis=0)
        return rpn_locs, rpn_scores, rois, roi_indices, anchor
The forward function chains these convolutions to carry out the RPN's task: the two 1×1 convolutions produce the classification and regression predictions, and the proposal layer turns them into region proposals.
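The permute/view combination above flattens the per-anchor predictions into one row per anchor; a standalone example of the same reshape (my own snippet):

import torch

n, A, hh, ww = 1, 9, 38, 38
rpn_locs = torch.randn(n, A * 4, hh, ww)
flat = rpn_locs.permute(0, 2, 3, 1).contiguous().view(n, -1, 4)
print(flat.shape)  # torch.Size([1, 12996, 4]): one 4-vector of offsets per anchor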
The Keras implementation of the RPN does the same job: one 3×3 convolution followed by two 1×1 convolutions.
The RPN outputs a classification and a regression prediction for the anchors,
i.e. whether each box contains an object and the adjustment parameters for each anchor.
def get_rpn(base_layers, num_anchors):
    x = Conv2D(512, (3, 3), padding='same', activation='relu', kernel_initializer='normal', name='rpn_conv1')(base_layers)
    # 1x1 convolution with 9 output channels; with a sigmoid it outputs one objectness
    # score per anchor (the PyTorch version uses 2 channels per anchor plus softmax instead)
    x_class = Conv2D(num_anchors, (1, 1), activation='sigmoid', kernel_initializer='uniform', name='rpn_out_class')(x)
    # 1x1 convolution with 36 output channels
    x_regr = Conv2D(num_anchors * 4, (1, 1), activation='linear', kernel_initializer='zero', name='rpn_out_regress')(x)
    x_class = Reshape((-1, 1), name="classification")(x_class)
    x_regr = Reshape((-1, 4), name="regression")(x_regr)
    return [x_class, x_regr, base_layers]
def get_classifier(base_layers, input_rois, num_rois, nb_classes=21, trainable=False):
    pooling_regions = 14
    input_shape = (num_rois, 14, 14, 1024)
    # RoI pooling configuration
    out_roi_pool = RoiPoolingConv(pooling_regions, num_rois)([base_layers, input_rois])
    out = classifier_layers(out_roi_pool, input_shape=input_shape, trainable=True)
    out = TimeDistributed(Flatten())(out)
    out_class = TimeDistributed(Dense(nb_classes, activation='softmax', kernel_initializer='zero'), name='dense_class_{}'.format(nb_classes))(out)
    out_regr = TimeDistributed(Dense(4 * (nb_classes - 1), activation='linear', kernel_initializer='zero'), name='dense_regress_{}'.format(nb_classes))(out)
    return [out_class, out_regr]
The RPN produces two outputs: the anchor adjustment parameters and whether each anchor contains an object.
The next step applies these predictions to the anchors to obtain the proposals.
The function loc2bbox converts the RPN output into proposal boxes, i.e. performs the decoding:
def loc2bbox(src_bbox, loc):
    if src_bbox.shape[0] == 0:
        return np.zeros((0, 4), dtype=loc.dtype)
    src_bbox = src_bbox.astype(src_bbox.dtype, copy=False)
    src_width = src_bbox[:, 2] - src_bbox[:, 0]
    src_height = src_bbox[:, 3] - src_bbox[:, 1]
    src_ctr_x = src_bbox[:, 0] + 0.5 * src_width
    src_ctr_y = src_bbox[:, 1] + 0.5 * src_height
    dx = loc[:, 0::4]
    dy = loc[:, 1::4]
    dw = loc[:, 2::4]
    dh = loc[:, 3::4]
    ctr_x = dx * src_width[:, np.newaxis] + src_ctr_x[:, np.newaxis]
    ctr_y = dy * src_height[:, np.newaxis] + src_ctr_y[:, np.newaxis]
    w = np.exp(dw) * src_width[:, np.newaxis]
    h = np.exp(dh) * src_height[:, np.newaxis]
    dst_bbox = np.zeros(loc.shape, dtype=loc.dtype)
    dst_bbox[:, 0::4] = ctr_x - 0.5 * w
    dst_bbox[:, 1::4] = ctr_y - 0.5 * h
    dst_bbox[:, 2::4] = ctr_x + 0.5 * w
    dst_bbox[:, 3::4] = ctr_y + 0.5 * h
    # the adjusted anchors, i.e. the proposals
    return dst_bbox
Its arguments are src_bbox (the anchors) and loc (the RPN regression output).
The function first computes the center coordinates and width/height of each anchor, then extracts the adjustment parameters from the RPN output.
Decoding yields the center coordinates and width/height of the adjusted anchors (i.e. the proposals),
which are finally converted to top-left/bottom-right format.
roi then holds the top-left and bottom-right coordinates of every proposal.
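A worked decode example (my own numbers): an anchor from (0, 0) to (16, 16) with offsets dx = dy = 0 and dw = dh = log 2 keeps its center and doubles its width and height.

import numpy as np

src_bbox = np.array([[0., 0., 16., 16.]])
loc = np.array([[0., 0., np.log(2.), np.log(2.)]])
print(loc2bbox(src_bbox, loc))  # [[-8. -8. 24. 24.]]: center stays at (8, 8), size 16 -> 32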
In Keras, the corresponding step converts the proposals into feature-map coordinates (and into x, y, w, h format):
R = rpn_results[0][:, 2:]
# map the proposals onto the feature map
R[:, 0] = np.array(np.round(R[:, 0] * width / self.config.rpn_stride), dtype=np.int32)
R[:, 1] = np.array(np.round(R[:, 1] * height / self.config.rpn_stride), dtype=np.int32)
R[:, 2] = np.array(np.round(R[:, 2] * width / self.config.rpn_stride), dtype=np.int32)
R[:, 3] = np.array(np.round(R[:, 3] * height / self.config.rpn_stride), dtype=np.int32)
# convert x2, y2 into width and height
R[:, 2] -= R[:, 0]
R[:, 3] -= R[:, 1]
In rpn.py:
# clip with slices so proposals do not extend past the image border
roi[:, slice(0, 4, 2)] = np.clip(roi[:, slice(0, 4, 2)], 0, img_size[1])
roi[:, slice(1, 4, 2)] = np.clip(roi[:, slice(1, 4, 2)], 0, img_size[0])
# the minimum width/height must not fall below 16
min_size = self.min_size * scale
# compute width and height
ws = roi[:, 2] - roi[:, 0]
hs = roi[:, 3] - roi[:, 1]
# discard proposals that are too small
keep = np.where((hs >= min_size) & (ws >= min_size))[0]
roi = roi[keep, :]
score = score[keep]
# keep the highest-scoring proposals
order = score.ravel().argsort()[::-1]
if n_pre_nms > 0:
    order = order[:n_pre_nms]
roi = roi[order, :]
roi = nms(roi, self.nms_thresh)
roi = torch.Tensor(roi)
roi = roi[:n_post_nms]
return roi
This keeps the proposals inside the image, discards proposals whose width or height is below 16 pixels,
and keeps only the proposals with the highest objectness scores.
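A tiny illustration of the slice-based clipping (my own numbers; img_size is (height, width)):

import numpy as np

roi = np.array([[-10., 5., 620., 590.]])
img_size = (600, 600)
roi[:, slice(0, 4, 2)] = np.clip(roi[:, slice(0, 4, 2)], 0, img_size[1])
roi[:, slice(1, 4, 2)] = np.clip(roi[:, slice(1, 4, 2)], 0, img_size[0])
print(roi)  # [[  0.   5. 600. 590.]]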
The proposals crop the shared feature map into local feature maps of different sizes; region-wise pooling (RoI Pooling) then resizes them to a common size, and classification and regression on the pooled features produce the final predictions.
    def forward(self, x, rois, roi_indices):
        roi_indices = torch.Tensor(roi_indices).cuda().float()
        rois = torch.Tensor(rois).cuda().float()
        indices_and_rois = torch.cat([roi_indices[:, None], rois], dim=1)
        xy_indices_and_rois = indices_and_rois[:, [0, 1, 2, 3, 4]]
        indices_and_rois = xy_indices_and_rois.contiguous()
        # crop the shared feature map with the proposals (RoI pooling)
        pool = self.roi(x, indices_and_rois)
        pool = pool.view(pool.size(0), -1)
        fc7 = self.classifier(pool)
        # regression branch
        roi_cls_locs = self.cls_loc(fc7)
        # classification branch
        roi_scores = self.score(fc7)
        return roi_cls_locs, roi_scores
The proposals crop the shared feature map, and the cropped features feed the regression and classification heads.
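The self.roi call performs the RoI pooling itself; as a standalone sketch (an assumption on my part — the repo wires this up in its own way), torchvision.ops.roi_pool does the equivalent crop-and-pool:

import torch
from torchvision.ops import roi_pool

feat = torch.randn(1, 1024, 38, 38)
# one box per row: (batch_index, x1, y1, x2, y2) in input-image coordinates
rois = torch.tensor([[0., 100., 100., 300., 260.]])
pooled = roi_pool(feat, rois, output_size=(14, 14), spatial_scale=1.0 / 16)
print(pooled.shape)  # torch.Size([1, 1024, 14, 14])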
The Keras code is as follows:
    def call(self, x, mask=None):
        assert len(x) == 2
        img = x[0]   # shared feature map
        rois = x[1]  # proposals
        outputs = []
        for roi_idx in range(self.num_rois):
            # top-left corner plus width and height of the proposal
            x = rois[0, roi_idx, 0]
            y = rois[0, roi_idx, 1]
            w = rois[0, roi_idx, 2]
            h = rois[0, roi_idx, 3]
            # type conversion
            x = K.cast(x, 'int32')
            y = K.cast(y, 'int32')
            w = K.cast(w, 'int32')
            h = K.cast(h, 'int32')
            # crop the proposal from the shared feature map and resize the crop
            # to self.pool_size x self.pool_size
            rs = tf.image.resize_images(img[:, y:y + h, x:x + w, :], (self.pool_size, self.pool_size))
            outputs.append(rs)
        final_output = K.concatenate(outputs, axis=0)
        final_output = K.reshape(final_output, (1, self.num_rois, self.pool_size, self.pool_size, self.nb_channels))
        # an identity permutation, kept from the original code
        final_output = K.permute_dimensions(final_output, (0, 1, 2, 3, 4))
        return final_output
In utils.py, the function DecodeBox takes the output of the RoI-Pooling head and uses it to adjust the proposals, i.e. performs the second decoding step:
    def forward(self, roi_cls_locs, roi_scores, rois, height, width, score_thresh):
        rois = torch.Tensor(rois)
        roi_cls_loc = (roi_cls_locs * self.std + self.mean)
        roi_cls_loc = roi_cls_loc.view([-1, self.num_classes, 4])
        roi = rois.view((-1, 1, 4)).expand_as(roi_cls_loc)
        cls_bbox = loc2bbox((roi.cpu().detach().numpy()).reshape((-1, 4)),
                            (roi_cls_loc.cpu().detach().numpy()).reshape((-1, 4)))
        cls_bbox = torch.Tensor(cls_bbox)
        # one box per proposal per class
        cls_bbox = cls_bbox.view([-1, (self.num_classes), 4])
        # clip the boxes so they do not extend past the image border
        cls_bbox[..., 0] = (cls_bbox[..., 0]).clamp(min=0, max=width)
        cls_bbox[..., 2] = (cls_bbox[..., 2]).clamp(min=0, max=width)
        cls_bbox[..., 1] = (cls_bbox[..., 1]).clamp(min=0, max=height)
        cls_bbox[..., 3] = (cls_bbox[..., 3]).clamp(min=0, max=height)
        prob = F.softmax(torch.tensor(roi_scores), dim=1)
        raw_cls_bbox = cls_bbox.cpu().numpy()
        raw_prob = prob.cpu().numpy()
        outputs = []
        # loop over the classes, starting from 1 because index 0 is the background,
        # which is only needed to help training, not decoding
        for l in range(1, self.num_classes):
            # for each class, take the boxes predicted for it
            cls_bbox_l = raw_cls_bbox[:, l, :]
            # confidence that the object belongs to this class
            prob_l = raw_prob[:, l]
            # if the confidence exceeds the threshold,
            mask = prob_l > score_thresh
            # the object in that proposal is assigned to this class
            cls_bbox_l = cls_bbox_l[mask]
            prob_l = prob_l[mask]
            if len(prob_l) == 0:
                continue
            label = np.ones_like(prob_l) * (l - 1)
            detections_class = np.concatenate([cls_bbox_l, np.expand_dims(prob_l, axis=-1), np.expand_dims(label, axis=-1)], axis=-1)
            prob_l_index = np.argsort(prob_l)[::-1]
            detections_class = detections_class[prob_l_index]
            nms_out = nms(detections_class, 0.3)
            if outputs == []:
                outputs = nms_out
            else:
                outputs = np.concatenate([outputs, nms_out], axis=0)
        return outputs
The RoI-Pooling head outputs a classification and a regression prediction per proposal.
DecodeBox takes this output and calls loc2bbox to decode it into the boxes that can actually be drawn on the image.
The for loop then iterates over the classes; for each class, every proposal whose confidence for that class exceeds the threshold is assigned to it.
Finally, non-maximum suppression (NMS) filters the adjusted boxes.
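The nms() call comes from the repo; for clarity, here is a minimal NumPy sketch of the same greedy IoU-suppression idea (my own code, not the repo's implementation):

import numpy as np

def nms_sketch(dets, thresh):
    # dets: (N, 6) array of x1, y1, x2, y2, score, label, sorted by descending score
    x1, y1, x2, y2 = dets[:, 0], dets[:, 1], dets[:, 2], dets[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = np.arange(len(dets))
    keep = []
    while order.size > 0:
        i = order[0]          # highest-scoring remaining box
        keep.append(i)
        # intersection of box i with every remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0., xx2 - xx1) * np.maximum(0., yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= thresh]  # drop boxes that overlap the kept one too much
    return dets[keep]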
The final predicted boxes are then drawn on the image:
        for i, c in enumerate(label):
            predicted_class = self.class_names[int(c)]
            score = conf[i]
            left, top, right, bottom = bbox[i]
            # leave a 5-pixel margin around the box
            top = top - 5
            left = left - 5
            bottom = bottom + 5
            right = right + 5
            top = max(0, np.floor(top + 0.5).astype('int32'))
            left = max(0, np.floor(left + 0.5).astype('int32'))
            bottom = min(np.shape(image)[0], np.floor(bottom + 0.5).astype('int32'))
            right = min(np.shape(image)[1], np.floor(right + 0.5).astype('int32'))
            # draw the box
            label = '{} {:.2f}'.format(predicted_class, score)
            draw = ImageDraw.Draw(image)
            label_size = draw.textsize(label, font)
            label = label.encode('utf-8')
            print(label)
            if top - label_size[1] >= 0:
                text_origin = np.array([left, top - label_size[1]])
            else:
                text_origin = np.array([left, top + 1])
            for i in range(thickness):
                draw.rectangle(
                    [left + i, top + i, right - i, bottom - i],
                    outline=self.colors[int(c)])
            draw.rectangle(
                [tuple(text_origin), tuple(text_origin + label_size)],
                fill=self.colors[int(c)])
            draw.text(text_origin, str(label, 'UTF-8'), fill=(0, 0, 0), font=font)
            del draw
        print("time:", time.time() - start_time)
        return image
For each predicted box the top-left and bottom-right coordinates are extracted, leaving a 5-pixel margin.
The final boxes are then drawn on the image together with their text labels, which gives the final detection result.
Finally, back to how the anchors are generated; the function generate_anchor_base is defined as follows:
def generate_anchor_base(base_size=16, ratios=[0.5, 1, 2],
                         anchor_scales=[8, 16, 32]):
    anchor_base = np.zeros((len(ratios) * len(anchor_scales), 4),
                           dtype=np.float32)
    for i in range(len(ratios)):
        for j in range(len(anchor_scales)):
            h = base_size * anchor_scales[j] * np.sqrt(ratios[i])
            w = base_size * anchor_scales[j] * np.sqrt(1. / ratios[i])
            index = i * len(anchor_scales) + j
            anchor_base[index, 0] = - h / 2.
            anchor_base[index, 1] = - w / 2.
            anchor_base[index, 2] = h / 2.
            anchor_base[index, 3] = w / 2.
    return anchor_base
anchor_scales=[8, 16, 32] sets the base sizes of the anchors and ratios=[0.5, 1, 2] their aspect ratios; combining the three scales with the three ratios yields 9 anchors of different widths and heights. The loop computes, for every (scale, ratio) pair, h = base_size * scale * sqrt(ratio) and w = base_size * scale * sqrt(1/ratio), so each grid point gets three squares, three wide rectangles and three tall rectangles.
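Printing the nine base anchors confirms this (my own check, assuming generate_anchor_base as defined above):

import numpy as np

base = generate_anchor_base()
hs = base[:, 2] - base[:, 0]
ws = base[:, 3] - base[:, 1]
print(np.round(np.stack([hs, ws], axis=1)))
# ratio 0.5: (91, 181), (181, 362), (362, 724); ratio 1: squares of 128, 256, 512; ratio 2 swaps h and w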
The function _enumerate_shifted_anchor is defined as:
def _enumerate_shifted_anchor(anchor_base, feat_stride, height, width):
    # compute the coordinates of every grid point
    shift_x = np.arange(0, width * feat_stride, feat_stride)
    shift_y = np.arange(0, height * feat_stride, feat_stride)
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    shift = np.stack((shift_x.ravel(), shift_y.ravel(),
                      shift_x.ravel(), shift_y.ravel(),), axis=1)
    # the 9 base anchors at every grid point
    A = anchor_base.shape[0]
    K = shift.shape[0]
    anchor = anchor_base.reshape((1, A, 4)) + \
             shift.reshape((K, 1, 4))
    # all the anchors
    anchor = anchor.reshape((K * A, 4)).astype(np.float32)
    return anchor
Take a 600×600 image as an example: the feature map is 38×38, so each grid cell covers roughly 16 pixels.
The coordinates of every grid point are computed by:
shift_x = np.arange(0, width * feat_stride, feat_stride)
shift_y = np.arange(0, height * feat_stride, feat_stride)
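Enumerating all shifted anchors for this example (my own check, assuming the two functions above) gives 38 × 38 × 9 = 12996 anchors:

anchor = _enumerate_shifted_anchor(generate_anchor_base(), 16, 38, 38)
print(anchor.shape)  # (12996, 4)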
Plotting the nine anchors belonging to a single grid point illustrates the three squares and six rectangles described above (see the figure in the original post).
Code from: https://github.com/bubbliiiing/faster-rcnn-pytorch
Code download:
Link: https://pan.baidu.com/s/1Rqx16ki4qZnGrVFaPcftKw
Extraction code: dawx
Thanks to:
https://blog.csdn.net/weixin_44791964/article/details/105739918
https://github.com/bubbliiiing/faster-rcnn-pytorch
https://blog.csdn.net/qq_34243930/article/details/107231539
PyTorch official documentation