Reimplementing the Faster R-CNN Algorithm

  • Faster R-CNN Implementation Pipeline
  • Code
  • TODO

Reproducing the Faster R-CNN algorithm.
Original Faster R-CNN paper: https://arxiv.org/abs/1506.01497

Faster R-CNN Implementation Pipeline

Faster R-CNN is a member of the two-stage object detection family, which evolved from R-CNN through SPP-Net and Fast R-CNN to Faster R-CNN. Faster R-CNN was the first to extract region proposals with a deep network, making object detection truly end to end. It is the pivotal node of the two-stage line: Mask R-CNN and Cascade R-CNN, which came later, are both built on top of it. Here I implement a simplified Faster R-CNN to strengthen my understanding of it.
When I took part in a Tianchi competition earlier, I used Faster R-CNN with FPN, made some improvements, and achieved decent results, but that work was built on the mmdetection framework, so some details inevitably escaped attention. Implementing Faster R-CNN and FPN from scratch this time gave me a much better grasp of those details, and having implemented Faster R-CNN, I am confident I can produce simplified versions of both two-stage and one-stage detectors. The figure below shows the structure of Faster R-CNN.
(Figure 1: Faster R-CNN architecture)
The implementation of Faster R-CNN breaks down into five stages:

  1. Stage 1: from the input image and the annotated boxes (hereafter ground truth), compute the true labels and offset coordinates of every anchor. These will later be compared with the anchor labels and offsets predicted by the RPN to compute the RPN loss and update the RPN weights.
    Suppose the input image is (800, 800) and VGG16 is used as the feature extractor, downsampling by 16, giving a (50, 50) feature map. Each point of the feature map is mapped back to the original image to generate anchors. With anchor_scale set to (8, 16, 32) and anchor_ratio to (0.5, 1, 2), each position generates 9 anchors, where anchor_scale is the anchor size and anchor_ratio its aspect ratio; note that anchor_scale here is relative to the feature map, so mapping back to the original image requires multiplying by the downsampling factor. With 9 anchors per position, $50 \times 50 \times 9 = 22500$ anchors are generated in total. These anchors are then assigned and sampled: each anchor is assigned to the ground-truth box with which it has the largest IoU (256 are sampled and the rest ignored, i.e. given label -1; positives and negatives, decided by IoU, are sampled at a 1:1 ratio). The offset transform is given by equations 1-4; a worked round-trip example appears right after this list.
    $$dx = (gt_x - anchor_x) / anchor_w \tag{1}$$
    $$dy = (gt_y - anchor_y) / anchor_h \tag{2}$$
    $$dw = \log(gt_w / anchor_w) \tag{3}$$
    $$dh = \log(gt_h / anchor_h) \tag{4}$$
    Here dx, dy, dw, dh are the anchor's offsets relative to its ground-truth box; gt_x, gt_y, gt_w, gt_h are the center coordinates and width/height of the ground-truth box; and anchor_x, anchor_y, anchor_w, anchor_h are the center coordinates and width/height of the anchor. Each anchor's true label (0 or 1) is also derived from its IoU with the ground truth, since the RPN distinguishes only foreground and background. The goal of this stage is to produce the true offsets and labels of all anchors, gt_anchor_locations and gt_anchor_labels, which are used together with the RPN-predicted pred_anchor_locations and pred_anchor_labels to compute the loss.

  2. Stage 2: use the RPN to predict the offsets and labels of all anchors, i.e. pred_anchor_locations and pred_anchor_labels.
    The figure below shows the implementation details of the RPN. In the actual implementation, the feature map from stage 1 is $50 \times 50$ with 512 channels. The RPN consists of a $3 \times 3$ convolution followed by two $1 \times 1$ convolution branches; the $3 \times 3$ convolution uses padding=1, so the feature map size is unchanged, and the two $1 \times 1$ branches predict, for the 9 anchors at each position, the classes and the offset coordinates respectively. The input is therefore the extracted (50, 50, 512) feature map, where 512 is the channel count, and the outputs are a (50, 50, 18) class prediction and a (50, 50, 36) offset prediction. The pred_anchor_labels and pred_anchor_locations produced in this stage are used together with the gt_anchor_labels and gt_anchor_locations from stage 1 to compute the RPN-stage loss.
    (Figure 2: RPN network details)

  3. Stage 3: process the anchors predicted in stage 2. From the dx, dy, dw, dh stored in pred_anchor_locations, combined with the original anchor coordinates, invert the transform to recover the top-left and bottom-right corners (x1, y1, x2, y2) of the boxes the RPN predicts; note that these are positions on the original image. Sort the boxes by score, run NMS on the top 12000, and keep the top 2000 of the survivors. These 2000 boxes are then assigned and sampled against the ground truth: compute their true labels and offsets relative to the ground-truth boxes, assigning by IoU, with boxes whose IoU with a ground-truth box exceeds 0.5 taken as positives; a positive's label is recorded from its matched ground-truth box, so labels here are class indices rather than foreground/background (0, 1). The offset formula is the same as in stage 1. From the assigned boxes, sample 128 with a positive ratio of 0.25. This stage finally produces the 128 gt_roi_labels and gt_roi_locations, computed from the RPN-predicted pred_anchor_locations and pred_anchor_labels together with the ground truth.

  4. Stage 4: stage 2 produced pred_anchor_labels and pred_anchor_locations via the RPN, and stage 3 sampled 128 sample_rois from them and computed their true labels and offsets relative to the ground truth, gt_roi_labels and gt_roi_locations. Stage 4 feeds the sample_rois through a RoI pooling layer to obtain fixed-size $7 \times 7 \times 512$ feature maps, flattens each into a (1, 25088) feature vector, passes it through two fully connected layers to obtain a (1, 4096) feature vector, and finally through two fully connected branches that predict the class (num_class+1) and the offsets ((num_class+1)*4), i.e. pred_roi_labels and pred_roi_locations.

  5. Stage 5: compute the losses from the results of the first four stages. The RPN loss is computed from gt_anchor_labels, gt_anchor_locations, pred_anchor_labels, and pred_anchor_locations; the RoI loss is computed from gt_roi_labels, gt_roi_locations, pred_roi_labels, and pred_roi_locations. Classification losses use cross entropy and regression losses use smooth L1, giving rpn_cls_loss, rpn_loc_loss, roi_cls_loss, and roi_loc_loss. Note that the classification loss is computed over all boxes while the regression loss is only computed over boxes with meaningful sample labels, so when combining them the regression loss is multiplied by 10 (equivalently, the classification loss is divided by 10):
    rpn_loss = rpn_cls_loss/10 + rpn_loc_loss,
    roi_loss = roi_cls_loss/10 + roi_loc_loss,
    total_loss = rpn_loss + roi_loss.
    Finally the weights are updated from the loss. The cross-entropy loss is given in equation 5 and the smooth L1 loss in equation 6; a quick check of equation 6 appears after this list.
    $$L = -\sum_{c=1}^{M} y_c \log(p_c) \tag{5}$$
    $$L = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & |x| \ge 1 \end{cases} \tag{6}$$
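
To make equations 1-4 concrete, here is a minimal sketch (not part of the original implementation, with made-up box values) showing that the stage-1 encoding and the inverse decoding used in stage 3 round-trip exactly:

import numpy as np

# Center-form boxes: (center_x, center_y, w, h); the values are illustrative only
gt = np.array([210.0, 265.0, 380.0, 470.0])
anchor = np.array([200.0, 250.0, 362.0, 512.0])

# Encode (equations 1-4): offsets of the gt relative to the anchor
dx = (gt[0] - anchor[0]) / anchor[2]
dy = (gt[1] - anchor[1]) / anchor[3]
dw = np.log(gt[2] / anchor[2])
dh = np.log(gt[3] / anchor[3])

# Decode (the inverse transform from stage 3): recover the gt from the anchor
x = dx * anchor[2] + anchor[0]
y = dy * anchor[3] + anchor[1]
w = np.exp(dw) * anchor[2]
h = np.exp(dh) * anchor[3]
print(np.allclose([x, y, w, h], gt))  # True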
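
Equation 6 is implemented by hand in the loss code below; as a sanity check, the piecewise form agrees with PyTorch's built-in F.smooth_l1_loss, whose default beta=1.0 matches the breakpoint at |x| = 1:

import torch
import torch.nn.functional as F

x = torch.linspace(-3, 3, 13)
manual = torch.where(x.abs() < 1, 0.5 * x ** 2, x.abs() - 0.5)  # equation 6
builtin = F.smooth_l1_loss(x, torch.zeros_like(x), reduction='none')
print(torch.allclose(manual, builtin))  # True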

Code

Helper module util.py

import numpy as np

def iou(valid_anchors, gt_box):
    # Both inputs are boxes given as top-left and bottom-right corners, shape n*4.
    # Every valid_anchor has an IoU with every gt_box, so the returned ious
    # matrix has shape (len(valid_anchors), len(gt_box)).
    valid_anchors_num = valid_anchors.shape[0]
    gt_box_num = gt_box.shape[0]
    ious = np.zeros((valid_anchors_num, gt_box_num))
    for i, anchor in enumerate(valid_anchors):
        xa1, ya1, xa2, ya2 = anchor
        area1 = (xa2-xa1)*(ya2-ya1)
        for j, bbox in enumerate(gt_box):
            xb1, yb1, xb2, yb2 = bbox
            area2 = (xb2-xb1)*(yb2-yb1)
            xx1 = np.max([xa1, xb1])
            yy1 = np.max([ya1, yb1])
            xx2 = np.min([xa2, xb2])
            yy2 = np.min([ya2, yb2])
            if xx1 < xx2 and yy1 < yy2:
                inter_area = (yy2-yy1)*(xx2-xx1)
                ious[i, j] = inter_area/(area1+area2-inter_area)
    return ious

def nms(bboxes, thre, scores):
    # bboxes: n*4 boxes, thre: IoU threshold, scores: the score of each box.
    # All inputs are numpy arrays.
    # Returns the indices of the boxes kept after NMS.
    x1 = bboxes[:, 0]
    y1 = bboxes[:, 1]
    x2 = bboxes[:, 2]
    y2 = bboxes[:, 3]
    areas = (x2-x1)*(y2-y1)
    order = np.argsort(scores)[::-1]
    keep = []  # indices of the boxes that survive NMS
    while order.size > 0:
        i = order[0]  # index of the highest-scoring remaining box
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0, xx2-xx1)
        h = np.maximum(0, yy2-yy1)
        inter = w*h
        ious = inter/(areas[i]+areas[order[1:]]-inter)
        indexes = np.where(ious < thre)[0]
        order = order[indexes+1]
    return keep
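
A quick usage sketch for the two helpers (the boxes and scores here are toy values, not from the detector):

import numpy as np
import util

boxes = np.array([[0., 0., 10., 10.],
                  [1., 1., 11., 11.],
                  [20., 20., 30., 30.]])
gt = np.array([[0., 0., 10., 10.]])
print(util.iou(boxes, gt))  # (3, 1) matrix; the first entry is 1.0
scores = np.array([0.9, 0.8, 0.7])
print(util.nms(boxes, 0.5, scores))  # keeps indices 0 and 2; box 1 is suppressed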

Main module faster_rcnn.py:

import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import util

'''
Part 1: generate the anchor ground truth from the image-level ground truth; it is
used to compute the loss against what the RPN predicts.
Note that in this part an anchor has only two classes, 0 or 1; -1 means ignore.
The RPN predicts, for the 9 anchors at every feature-map position, a class (0, 1)
and the offsets relative to the gt (dx, dy, dw, dh).
Here we first compute each anchor's assigned true class (0, 1) and its true
offsets relative to the gt (dx, dy, dw, dh), from which the loss is computed.
For an 800*800 image downsampled by 16, the feature map is 50*50 with 9 anchors
per position, i.e. 50*50*9 = 22500 anchors.
For these 22500 anchors, first compute the true classes and offsets, then compare
them with the classes and offsets predicted by the RPN to compute the loss.
256 anchors are sampled in this part, i.e. only 256 of the true anchor labels are
1 or 0; all the others are -1 (ignored).
'''
# Build a dummy image and set its ground-truth boxes and labels
image = torch.zeros((1, 3, 800, 800))
bboxes = torch.Tensor([[20, 30, 400, 500], [300, 400, 500, 600]])
labels = torch.Tensor([6, 8])
sub_sample = 16  # downsampling factor

# Take a VGG16 model and use it to extract features, downsampling by 16
model = torchvision.models.vgg16(pretrained=True)
fe = list(model.features)

backbone = []
img_bak = image.clone()
for i in fe:
    img_bak = i(img_bak)
    if img_bak.shape[2] < 50:
        break
    backbone.append(i)
    out_channels = img_bak.shape[1]
backbone = nn.Sequential(*backbone)
feature_map = backbone(image)
print(backbone)
print(feature_map.shape)  # 50*50

# Generate all anchors for the feature map: the feature map is 50*50, and each
# of its points is mapped back to the original image to produce anchors
size = 800//16
centerX = np.arange(16, (size+1)*16, 16)
centerY = np.arange(16, (size+1)*16, 16)
center_x = centerX - 8
center_y = centerY - 8
print(center_x)
# Anchor parameters; note that the scales are relative to the feature map
anchor_scales = [8, 16, 32]
anchor_ratios = [0.5, 1.0, 2]
anchor_center = np.zeros((size*size, 2))  # 2500*2
# Initialize the anchor centers, 2500 in total
index = 0
for i in range(len(center_x)):
    for j in range(len(center_y)):
        anchor_center[index, 0] = center_x[i]
        anchor_center[index, 1] = center_y[j]
        index += 1
print(anchor_center.shape)

# Generate all anchors: 50*50 positions, 9 anchors each, 4 coordinates (x1,y1,x2,y2) per anchor
anchors = torch.zeros((size*size*9, 4), dtype=torch.float32)
index = 0
for c in anchor_center:
    center_x, center_y = c
    for i in range(len(anchor_scales)):
        for j in range(len(anchor_ratios)):
            # ratio = h/w, so h scales with sqrt(ratio) and w with sqrt(1/ratio)
            h = sub_sample * anchor_scales[i] * np.sqrt(anchor_ratios[j])
            w = sub_sample * anchor_scales[i] * np.sqrt(1. / anchor_ratios[j])
            anchors[index, 0] = center_x - w/2
            anchors[index, 1] = center_y - h/2
            anchors[index, 2] = center_x + w/2
            anchors[index, 3] = center_y + h/2
            index += 1
print(anchors.shape)
print(anchors)

# Get the indices of the valid anchors, i.e. those that do not cross the image boundary
valid_anchors_index = np.where(
    (anchors[:, 0] >= 0) &
    (anchors[:, 1] >= 0) &
    (anchors[:, 2] <= 800) &
    (anchors[:, 3] <= 800)
)[0]
print(valid_anchors_index)
valid_anchors = anchors[valid_anchors_index]  # valid anchors
print(valid_anchors_index.shape)
print(valid_anchors.shape)
# Compute the IoU between every valid anchor and every gt box
ious = util.iou(valid_anchors, bboxes)  # (valid_anchors.shape[0], bboxes.shape[0])
print(ious.shape)
'''
Classify the anchors: the anchor with the largest IoU with a gt box is foreground,
anchors whose max IoU > 0.7 are foreground, everything else is background
'''
gt_maxiou_index = ious.argmax(axis=0)  # per column (per gt box), the index of the best anchor
print(gt_maxiou_index)
anchor_maxiou_index = ious.argmax(axis=1)  # per row (per anchor), the index of the best gt
print(anchor_maxiou_index)
# Take, for each gt, its max IoU over anchors, and for each anchor, its max IoU over gts
gt_maxiou = ious[gt_maxiou_index, np.arange(bboxes.shape[0])]
anchor_maxiou = ious[np.arange(valid_anchors.shape[0]), anchor_maxiou_index]
print(gt_maxiou.shape)
print(anchor_maxiou.shape)
gt_maxiou_index = np.where(ious == gt_maxiou)[0]  # indices of the anchors sharing a gt's max IoU

# Positive/negative settings: IoU > 0.7 is foreground, < 0.3 is background;
# sample 256 anchors with a foreground ratio of 0.5
pos_iou_thre = 0.7
neg_iou_thre = 0.3
pos_ratio = 0.5
n_sample = 256
valid_anchor_labels = np.empty((valid_anchors.shape[0]))
valid_anchor_labels.fill(-1)  # initialize to -1, meaning ignore
valid_anchor_labels[gt_maxiou_index] = 1
valid_anchor_labels[anchor_maxiou >= pos_iou_thre] = 1
valid_anchor_labels[anchor_maxiou < neg_iou_thre] = 0
print(valid_anchor_labels.shape)
# Sample positives and negatives
n_pos = int(n_sample*pos_ratio)
pos_index = np.where(valid_anchor_labels == 1)[0]
if len(pos_index) > n_pos:
    disable_index = np.random.choice(pos_index, size=(len(pos_index)-n_pos), replace=False)
    valid_anchor_labels[disable_index] = -1

# If there are fewer positives than n_pos, top up with extra negatives
n_neg = int(n_sample*(1-pos_ratio))
if len(pos_index) < n_pos:
    n_neg = n_sample - len(pos_index)
neg_index = np.where(valid_anchor_labels == 0)[0]
if len(neg_index) > n_neg:
    disable_index = np.random.choice(neg_index, size=(len(neg_index) - n_neg), replace=False)
    valid_anchor_labels[disable_index] = -1
# Both positives and negatives are now sampled, 256 in total
print(np.sum(valid_anchor_labels == 1))
print(np.sum(valid_anchor_labels == 0))

# Assign offsets (dx, dy, dw, dh) to every anchor: each anchor is matched to the
# gt box with which it has the largest IoU, i.e. the anchor's position relative to that gt
'''
t_x = (x - x_a)/w_a
t_y = (y - y_a)/h_a
t_w = log(w / w_a)
t_h = log(h / h_a)
x, y, w, h are the center coordinates, width, and height of the ground-truth box;
x_a, y_a, w_a, h_a are the center coordinates, width, and height of the anchor box.
'''
anchor_maxiou_gtbox = bboxes[anchor_maxiou_index]
print(anchor_maxiou_gtbox.shape)
w = anchor_maxiou_gtbox[:, 2] - anchor_maxiou_gtbox[:, 0]
h = anchor_maxiou_gtbox[:, 3] - anchor_maxiou_gtbox[:, 1]
x = anchor_maxiou_gtbox[:, 0] + w/2
y = anchor_maxiou_gtbox[:, 1] + h/2
anchor_w = valid_anchors[:, 2] - valid_anchors[:, 0]
anchor_h = valid_anchors[:, 3] - valid_anchors[:, 1]
anchor_x = valid_anchors[:, 0] + anchor_w/2
anchor_y = valid_anchors[:, 1] + anchor_h/2
eps = torch.tensor(1e-10)
anchor_h = np.maximum(anchor_h, eps)  # guard against division by zero
anchor_w = np.maximum(anchor_w, eps)
dx = (x-anchor_x)/anchor_w
dy = (y-anchor_y)/anchor_h
dw = np.log(w/anchor_w)
dh = np.log(h/anchor_h)
anchor_location = np.vstack((dx, dy, dw, dh)).transpose()
print(anchor_location.shape)
# Scatter the valid-anchor labels and offsets back into arrays covering all 22500 anchors
anchor_labels = np.empty((anchors.shape[0]), dtype=np.int32)
anchor_labels.fill(-1)
anchor_locations = np.zeros((anchors.shape[0], 4), dtype=np.float32)
anchor_locations.fill(-1)
anchor_labels[valid_anchors_index] = valid_anchor_labels
anchor_locations[valid_anchors_index] = anchor_location
print(anchor_labels.shape)
print(anchor_locations.shape)
# End of part 1: the true anchor classes and offsets relative to the gt.

'''
Part 2: use the RPN to predict the classes and offsets of the anchors
'''
class RPN(nn.Module):
    def __init__(self):
        super(RPN, self).__init__()
        mid_channels = 512
        in_channels = 512
        self.conv1 = nn.Conv2d(in_channels, mid_channels, 3, 1, 1)
        self.reg_layer = nn.Conv2d(mid_channels, len(anchor_scales)*len(anchor_ratios)*4, 1, 1, 0)
        self.cls_layer = nn.Conv2d(mid_channels, len(anchor_scales)*len(anchor_ratios)*2, 1, 1, 0)
        self.conv1.weight.data.normal_(0, 0.01)
        self.conv1.bias.data.zero_()
        self.reg_layer.weight.data.normal_(0, 0.01)
        self.reg_layer.bias.data.zero_()
        self.cls_layer.weight.data.normal_(0, 0.01)
        self.cls_layer.bias.data.zero_()

    def forward(self, x):
        x = self.conv1(x)
        pred_anchor_location = self.reg_layer(x)
        pred_anchor_cls = self.cls_layer(x)
        return pred_anchor_location, pred_anchor_cls

rpn = RPN()
print(feature_map.shape)
pred_anchor_location, pred_anchor_cls = rpn(feature_map)
print(pred_anchor_location.shape)
print(pred_anchor_cls.shape)
# Reshape (1, 36, 50, 50) -> (1, 22500, 4) and (1, 18, 50, 50) -> (1, 22500, 2)
pred_anchor_location = pred_anchor_location.permute(0, 2, 3, 1).contiguous().view(1, -1, 4)
pred_anchor_cls = pred_anchor_cls.permute(0, 2, 3, 1).contiguous().view(1, -1, 2)
print(pred_anchor_location.shape)
print(pred_anchor_cls.shape)
print(anchor_locations.shape)
print(anchor_labels.shape)
# pred_anchor_location pairs with anchor_locations, and pred_anchor_cls with
# anchor_labels; these pairs are used to compute the RPN loss
# objectness_score stores every anchor's predicted foreground score
objectness_score = pred_anchor_cls.view(1, 50, 50, 9, 2)[:, :, :, :, 1].contiguous().view(1, -1)
# End of part 2: the RPN has predicted classes and offsets for all anchors, to be
# compared with the true classes and offsets from part 1 to compute the rpn loss
'''
Part 3: generate RoIs from the classes and offsets predicted by the RPN, to be
fed to the RoI head for prediction.
For the 22500 predicted anchors, first decode the predicted offsets back to box
coordinates, run NMS on the top n1 boxes by score, then keep the top n2 of the
survivors and pass them to the RoI head.
The RPN outputs offsets of the original anchors relative to the gt.
Part 1 already computed, from the actual gt, the true offsets of the original
anchors relative to the gt (256 valid ones).
The purpose of this part is to produce the boxes that are fed to the RoI head.
'''
nms_thre = 0.7
n_train_pre_nms = 12000
n_train_post_nms = 2000
n_test_pre_nms = 6000
n_test_post_nms = 300
min_size = 16
# First convert the offsets predicted by the RPN into (x1, y1, x2, y2) coordinates
'''
x = (w_a * dx) + ctr_x_a
y = (h_a * dy) + ctr_y_a
w = np.exp(dw) * w_a
h = np.exp(dh) * h_a
i.e. from the original anchor coordinates and the dx, dy, dw, dh produced by the
RPN, invert the transform to recover the predicted gt position
'''
pred_anchor_location_numpy = pred_anchor_location[0].data.numpy()
objectness_score_numpy = objectness_score[0].data.numpy()
anchor_w = anchors[:, 2] - anchors[:, 0]
anchor_h = anchors[:, 3] - anchors[:, 1]
anchor_x = anchors[:, 0] + anchor_w/2
anchor_y = anchors[:, 1] + anchor_h/2
dx = torch.from_numpy(pred_anchor_location_numpy[:, 0])
dy = torch.from_numpy(pred_anchor_location_numpy[:, 1])
dw = torch.from_numpy(pred_anchor_location_numpy[:, 2])
dh = torch.from_numpy(pred_anchor_location_numpy[:, 3])
# Recover the predicted boxes' center_x, center_y, w, h on the original image
pred_gt_center_x = dx*anchor_w + anchor_x
pred_gt_center_y = dy*anchor_h + anchor_y
pred_gt_w = np.exp(dw)*anchor_w
pred_gt_h = np.exp(dh)*anchor_h
print(pred_gt_center_x.shape)
print(pred_gt_center_y.shape)
print(pred_gt_w.shape)
print(pred_gt_h.shape)
# Then convert center_x, center_y, w, h into corner coordinates (x1, y1), (x2, y2)
rois = torch.zeros_like(pred_anchor_location[0])  # (22500, 4)
rois[:, 0] = pred_gt_center_x - pred_gt_w/2
rois[:, 1] = pred_gt_center_y - pred_gt_h/2
rois[:, 2] = pred_gt_center_x + pred_gt_w/2
rois[:, 3] = pred_gt_center_y + pred_gt_h/2
print(rois.shape)
# Clip the boxes to the image, i.e. clamp coordinates that cross the boundary
img_size = (800, 800)
rois[:, 0] = torch.clamp(rois[:, 0], 0, img_size[0])
rois[:, 1] = torch.clamp(rois[:, 1], 0, img_size[1])
rois[:, 2] = torch.clamp(rois[:, 2], 0, img_size[0])
rois[:, 3] = torch.clamp(rois[:, 3], 0, img_size[1])
print(rois)
# Discard predicted boxes whose height or width is smaller than min_size
w = rois[:, 2] - rois[:, 0]
h = rois[:, 3] - rois[:, 1]
keep = np.where((h.numpy() >= min_size) & (w.numpy() >= min_size))[0]
rois = rois[keep, :]
before_scores = objectness_score[0][keep]
before_scores_numpy = before_scores.data.numpy()
print(rois.shape)
print(before_scores.shape)
print(before_scores_numpy.shape)
print(before_scores_numpy.ravel().shape)
# Sort before_scores in descending order, take the top n1 for NMS, then the top
# n2 of the survivors for the RoI head
order = np.argsort(before_scores_numpy)[::-1]
order = order[:n_train_pre_nms]  # 12000
order = torch.from_numpy(order.copy())
rois = rois[order, :]  # 12000*4
scores = before_scores[order]  # 12000
rois_numpy = rois.data.numpy()
scores_numpy = scores.data.numpy()
keep = util.nms(rois_numpy, nms_thre, scores_numpy)
print(len(keep))
keep = keep[:n_train_post_nms]
rois = rois[keep, :]
print(rois.shape)
# The RoIs to feed to the RoI head (the RPN's predicted boxes) are now selected

'''
Part 4: further sample the RoIs produced in part 3. For the proposals passed in
from the RPN, first assign them: compute the IoU of each box with each gt,
sample by IoU, and compute the offset targets.
'''
n_sample = 128
pos_ratio = 0.25
pos_iou_thre = 0.5
neg_iou_thre_hi = 0.5
neg_iou_thre_lo = 0.0
'''
Sampling: for the RoIs produced by the RPN, first compute their actual labels and
offsets relative to the gt, to be compared with the RoI head outputs for the loss
'''
# Compute the IoUs
ious = util.iou(rois, bboxes)  # 2000*2
print(ious)
print(ious.shape)
# For each RoI, get its max IoU and the corresponding gt
gt_argroi = ious.argmax(axis=1)
roi_max_ious = ious.max(axis=1)
gt_roi_label = labels[gt_argroi]  # assign each RoI its true label
# Sample positives
n_pos = n_sample*pos_ratio
pos_index = np.where(roi_max_ious > pos_iou_thre)[0]
pos_roi_this_image = int(min(n_pos, len(pos_index)))
if len(pos_index) > 0:
    pos_index = np.random.choice(pos_index, size=pos_roi_this_image, replace=False)
print(pos_index)
print(len(pos_index))

# Sample negatives
neg_roi_this_image = n_sample - pos_roi_this_image
neg_index = np.where((roi_max_ious < neg_iou_thre_hi) & (roi_max_ious > neg_iou_thre_lo))[0]
neg_roi_this_image = int(min(neg_roi_this_image, len(neg_index)))
if len(neg_index) > 0:
    neg_index = np.random.choice(neg_index, size=neg_roi_this_image, replace=False)
print(neg_index)
print(len(neg_index))
# With positive and negative indices sampled, compute these RoIs' true labels
# and offsets as the RoI-stage ground truth
keep_index = np.append(pos_index, neg_index)
print(keep_index)
sample_rois = rois[keep_index, :]
print(sample_rois.shape)
# Compute the true offsets and true classes of the sampled RoIs
gt_for_sample_rois = bboxes[gt_argroi[keep_index]]  # the gt box matched to each sample_roi
w = sample_rois[:, 2] - sample_rois[:, 0]
h = sample_rois[:, 3] - sample_rois[:, 1]
center_x = sample_rois[:, 0] + w/2
center_y = sample_rois[:, 1] + h/2
gt_w = gt_for_sample_rois[:, 2] - gt_for_sample_rois[:, 0]
gt_h = gt_for_sample_rois[:, 3] - gt_for_sample_rois[:, 1]
gt_center_x = gt_for_sample_rois[:, 0] + gt_w/2
gt_center_y = gt_for_sample_rois[:, 1] + gt_h/2
eps = torch.tensor(1e-10)
h = np.maximum(h, eps)
w = np.maximum(w, eps)
dx = (gt_center_x - center_x)/w
dy = (gt_center_y - center_y)/h
dw = np.log(gt_w/w)
dh = np.log(gt_h/h)
gt_sample_roi_locations = np.vstack((dx, dy, dw, dh)).transpose()
gt_sample_roi_labels = gt_roi_label[keep_index]
gt_sample_roi_labels[pos_roi_this_image:] = 0  # set the negative samples' labels to 0
'''
gt_sample_roi_locations and gt_sample_roi_labels are the RoI-stage ground truth
'''
print(gt_sample_roi_locations)
print(gt_sample_roi_locations.shape)
print(gt_sample_roi_labels.shape)
print(sample_rois)
# gt_sample_roi_locations and gt_sample_roi_labels above are each sample_roi's
# true label and true offsets
# sample_rois will be fed into the RoI head to predict labels and offsets
print(sample_rois.shape)
roi_indexes = torch.zeros((sample_rois.shape[0]), dtype=torch.int32)
print(roi_indexes.shape)
# rois is the input to the RoI head: sample_rois plus an image index, since in
# general one batch holds several images (only one image in this example)

rois = torch.zeros((sample_rois.shape[0], sample_rois.shape[1]+1))
rois[:, 0] = roi_indexes
rois[:, 1:] = sample_rois
print(rois.shape)
print(rois)
'''
The logic here: first prepend a column to sample_rois indicating which image each
RoI belongs to, since in practice one batch may contain several images; this code
passes a single image, so the column is all zeros. Then scale the sample_rois down
by 16 to map them onto the feature map, run RoI pooling, and feed the pooled
features to the RoI head to get the predictions.
'''
size = 7
roi_pooling = nn.AdaptiveMaxPool2d((size, size))
out_put = []  # stores the RoI pooling results
# Downsample by sub_sample to map from the original image onto the feature map
rois[:, 1:].mul_(1.0/16.0)
print(feature_map.shape)
for i in range(rois.shape[0]):
    roi = rois[i]
    img_index = int(roi[0])
    # roi is (index, x1, y1, x2, y2); feature_map is (N, C, H, W), so the rows
    # are indexed by the y coordinates and the columns by the x coordinates
    feature_im = feature_map[img_index, :, int(roi[2]):int(roi[4]), int(roi[1]):int(roi[3])]
    roi_pooling_im = roi_pooling(feature_im)
    out_put.append(roi_pooling_im)
out_put = torch.stack(out_put)
print(out_put.shape)
# out_put holds the sample_rois' feature maps after RoI pooling
out_put_linear = out_put.view(out_put.shape[0], -1)  # everything after this is fully connected
print(out_put_linear.shape)
class ROIHead(nn.Module):
    def __init__(self, num_class):
        super(ROIHead, self).__init__()
        self.num_class = num_class
        self.linear1 = nn.Linear(25088, 4096)
        self.linear2 = nn.Linear(4096, 4096)
        # The input is each RoI's feature after mapping onto the feature map and
        # RoI pooling; predict the class and offsets of the object in each RoI
        self.location = nn.Linear(4096, (num_class+1)*4)  # offsets for every class
        self.score = nn.Linear(4096, (num_class+1))  # score for every class
        self._init_weight()

    def _init_weight(self):
        self.linear1.weight.data.normal_(0, 0.01)
        self.linear1.bias.data.zero_()
        self.linear2.weight.data.normal_(0, 0.01)
        self.linear2.bias.data.zero_()
        self.location.weight.data.normal_(0, 0.01)
        self.location.bias.data.zero_()
        self.score.weight.data.normal_(0, 0.01)
        self.score.bias.data.zero_()

    def forward(self, x):
        x = self.linear1(x)
        x = self.linear2(x)
        pred_roi_locations = self.location(x)  # (num_class+1)*4
        pred_roi_labels = self.score(x)  # num_class+1
        return pred_roi_locations, pred_roi_labels

roihead = ROIHead(num_class=20)
print(out_put_linear.shape)
pred_roi_locations, pred_roi_labels = roihead(out_put_linear)
print(pred_roi_locations.shape)  # (n_sample, (num_class+1)*4)
print(pred_roi_labels.shape)  # (n_sample, (num_class+1))

'''
Part 5: compute the losses, in two halves: the RPN loss and then the RoI loss
'''
# Weight between the classification and regression losses
loss_lambda = 10
print("RPN Loss")
print(anchor_locations.shape)
print(anchor_labels.shape)
print(pred_anchor_location.shape)
print(pred_anchor_cls.shape)
anchor_locations = torch.from_numpy(anchor_locations)
anchor_labels = torch.from_numpy(anchor_labels)
pred_anchor_location = pred_anchor_location[0]
pred_anchor_cls = pred_anchor_cls[0]
print(anchor_locations.shape, anchor_labels.shape, pred_anchor_location.shape, pred_anchor_cls.shape)
# Classification loss: cross entropy, ignoring anchors labeled -1
anchor_labels = anchor_labels.long()
rpn_cls_loss = F.cross_entropy(pred_anchor_cls, anchor_labels, ignore_index=-1)
print(rpn_cls_loss)
# Regression loss: smooth L1, computed only on anchors whose gt label is 1
pos_index = anchor_labels > 0
print(pos_index.shape)
print(pos_index)
mask = pos_index.unsqueeze(1).expand_as(anchor_locations)
print(mask.shape)
print(mask)
# Keep only the locations of positively labeled anchors
anchor_locations = anchor_locations[mask].view(-1, 4)  # (n_pos, 4)
pred_anchor_location = pred_anchor_location[mask].view(-1, 4)  # (n_pos, 4)
x = torch.abs(anchor_locations - pred_anchor_location)
print(x.shape)
rpn_loc_loss = (x < 1).float()*0.5*x**2 + (x >= 1).float()*(x-0.5)
rpn_loc_loss = rpn_loc_loss.sum()  # total regression loss; still to be averaged
print(rpn_loc_loss)
n_reg = (anchor_labels > 0).float().sum()  # number of positives
print(n_reg)
rpn_loc_loss = rpn_loc_loss/n_reg  # average
print(rpn_loc_loss)
rpn_loss = rpn_cls_loss + loss_lambda*rpn_loc_loss
print("rpn loss:{}".format(rpn_loss))

print("RPN Loss Finished")
print("-----------------------------------")
# RoI loss
print("-----------------------------------")
print("ROI Loss")
print(gt_sample_roi_locations.shape)
print(gt_sample_roi_labels.shape)
print(pred_roi_locations.shape)
print(pred_roi_labels.shape)
gt_sample_roi_locations = torch.from_numpy(gt_sample_roi_locations)
gt_sample_roi_labels = gt_sample_roi_labels.long()
# Classification loss
roi_cls_loss = F.cross_entropy(pred_roi_labels, gt_sample_roi_labels, ignore_index=-1)
print(roi_cls_loss)
# Regression loss
pred_roi_locations = pred_roi_locations.view(pred_roi_locations.shape[0], -1, 4)
print(pred_roi_locations.shape)  # 128*21*4
# For each RoI, take the predicted offsets of its gt class
pred_roi_locations = pred_roi_locations[np.arange(0, pred_roi_locations.shape[0]), gt_sample_roi_labels]  # 128*4
print(pred_roi_locations.shape)

# Keep only the positively labeled RoIs and compute their loss
pos_index = gt_sample_roi_labels > 0  # positive labels
mask = pos_index.unsqueeze(1).expand_as(pred_roi_locations)  # mask
print(mask.shape)
pred_roi_locations = pred_roi_locations[mask].view(-1, 4)  # positively labeled predictions
gt_sample_roi_locations = gt_sample_roi_locations[mask].view(-1, 4)  # and the matching gt offsets
print(pred_roi_locations.shape, gt_sample_roi_locations.shape)
x = torch.abs(pred_roi_locations - gt_sample_roi_locations)
roi_loc_loss = (x < 1).float()*0.5*x**2 + (x >= 1).float()*(x-0.5)
roi_loc_loss = roi_loc_loss.sum()
print(roi_loc_loss)
n_reg = (gt_sample_roi_labels > 0).sum()
roi_loc_loss = roi_loc_loss/n_reg
roi_loss = roi_cls_loss + loss_lambda*roi_loc_loss
print(roi_loc_loss)
print("roi_loss: {}".format(roi_loss))
print("ROI Loss Finished")
total_loss = rpn_loss + roi_loss
print("total loss: {}".format(total_loss))
total_loss.backward()
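
The script above stops at total_loss.backward(); the weight update described in stage 5 is not shown. A minimal sketch of how one training step could be wired up, assuming plain SGD over the backbone, RPN, and RoI head (the hyperparameters are illustrative, not from the original post):

import itertools
import torch

# Assumption: backbone, rpn, and roihead are the modules defined above
params = itertools.chain(backbone.parameters(), rpn.parameters(), roihead.parameters())
optimizer = torch.optim.SGD(params, lr=1e-3, momentum=0.9, weight_decay=5e-4)

# One training step:
optimizer.zero_grad()
# ... run the forward passes and loss computation above ...
total_loss.backward()
optimizer.step()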

TODO

Next, I will start looking into deep-learning-based edge detection methods (applied to defect detection).
