Faster R-CNN is a member of the two-stage object detection family, which evolved from R-CNN through SPP-Net and Fast R-CNN to Faster R-CNN. Faster R-CNN was the first to use a deep network to extract candidate regions, making truly end-to-end object detection possible. It is the pivotal node in the two-stage detection line: later models such as Mask R-CNN and Cascade R-CNN are both built on top of it. Here I implement a simplified version of Faster R-CNN to deepen my own understanding of it.
When I took part in a Tianchi competition earlier, I used Faster R-CNN with FPN, made some improvements, and got decent results. But that work was built on top of the mmdetection framework, so some details inevitably went unnoticed. Implementing Faster R-CNN and FPN from scratch this time gave me a much better grasp of those details, and with Faster R-CNN implemented I believe I can now write simplified reproductions of both two-stage and one-stage detectors. The figure below shows the overall structure of Faster R-CNN.
The implementation of Faster R-CNN is divided into five stages.
Stage 1: from the input image and the annotated boxes (referred to as ground truth below), compute the true labels and offset coordinates of the anchors. These targets are later paired with the anchor labels and offsets predicted by the RPN to compute the RPN loss and update the RPN weights.
Suppose the input image is (800, 800) and VGG16 is used as the feature extractor with 16x downsampling, giving a (50, 50) feature map. Each point of the feature map is mapped back to the original image to generate anchors. With anchor_scale set to (8, 16, 32) and anchor_ratio set to (0.5, 1, 2), each position produces 9 anchors, where anchor_scale controls the anchor size and anchor_ratio its aspect ratio. Note that anchor_scale here is defined relative to the feature map, so it must be multiplied by the downsampling factor when mapped back to the original image. With 9 anchors per position, 50 * 50 * 9 = 22500 anchors are generated in total. These anchors are then located and sampled: each anchor is assigned to the ground truth with which it has the highest IoU (256 of them are sampled and the rest are ignored with label -1; the positive:negative ratio is 1:1, decided by IoU). The transform formulas are given in equations 1-4.
$$dx = (gt_x - anchor_x)/anchor_w \tag{1}$$
$$dy = (gt_y - anchor_y)/anchor_h \tag{2}$$
$$dw = \log(gt_w/anchor_w) \tag{3}$$
$$dh = \log(gt_h/anchor_h) \tag{4}$$
Here dx, dy, dw, dh are the offset coordinates of the anchor relative to the ground truth; gt_x, gt_y, gt_w, gt_h are the center coordinates, width, and height of the ground truth box; and anchor_x, anchor_y, anchor_w, anchor_h are the center coordinates, width, and height of the anchor. At the same time, each anchor's true label (0 or 1) is generated from its IoU with the ground truth; the RPN only distinguishes foreground from background. The goal of this stage is to produce the true offsets and labels for all anchors, i.e. gt_anchor_locations and gt_anchor_labels, which are paired with the RPN's pred_anchor_locations and pred_anchor_labels to compute the loss.
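As a compact illustration of equations 1-4, here is a minimal numpy sketch of the encoding (the function name encode_boxes is mine, for illustration; the full pipeline below inlines this computation):

import numpy as np

def encode_boxes(anchors, gts):
    # Encode corner-format (x1, y1, x2, y2) anchors against their matched
    # ground-truth boxes into (dx, dy, dw, dh), per equations (1)-(4)
    aw = anchors[:, 2] - anchors[:, 0]
    ah = anchors[:, 3] - anchors[:, 1]
    ax = anchors[:, 0] + aw / 2
    ay = anchors[:, 1] + ah / 2
    gw = gts[:, 2] - gts[:, 0]
    gh = gts[:, 3] - gts[:, 1]
    gx = gts[:, 0] + gw / 2
    gy = gts[:, 1] + gh / 2
    dx = (gx - ax) / aw
    dy = (gy - ay) / ah
    dw = np.log(gw / aw)
    dh = np.log(gh / ah)
    return np.stack([dx, dy, dw, dh], axis=1)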
Stage 2: the RPN predicts the offset coordinates and labels of all anchors, i.e. pred_anchor_locations and pred_anchor_labels.
Figure 3 shows the implementation details of the RPN. In the actual implementation, the feature map produced in stage 1 is 50 * 50 with 512 channels. The RPN consists of one 3 * 3 convolution followed by two 1 * 1 convolution branches. The 3 * 3 convolution uses padding=1, so the spatial size of the feature map is unchanged; the two 1 * 1 branches predict, for each position, the class and the offset coordinates of its 9 anchors. The input here is therefore the extracted (50, 50, 512) feature map (512 being the channel count), and the outputs are a (50, 50, 18) class prediction and a (50, 50, 36) offset prediction. The pred_anchor_labels and pred_anchor_locations produced in stage 2 are combined with the gt_anchor_labels and gt_anchor_locations computed in stage 1 to compute the RPN-stage loss.
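A quick shape check of this structure (a minimal sketch; the full RPN class appears in the code below):

import torch
import torch.nn as nn

feat = torch.zeros(1, 512, 50, 50)            # backbone output
conv = nn.Conv2d(512, 512, 3, 1, 1)           # 3x3 conv, padding=1: size unchanged
cls_branch = nn.Conv2d(512, 9 * 2, 1, 1, 0)   # 9 anchors x 2 classes per position
reg_branch = nn.Conv2d(512, 9 * 4, 1, 1, 0)   # 9 anchors x 4 offsets per position
x = conv(feat)
print(cls_branch(x).shape)  # torch.Size([1, 18, 50, 50])
print(reg_branch(x).shape)  # torch.Size([1, 36, 50, 50])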
Stage 3: process the anchors predicted in stage 2. From the dx, dy, dw, dh in pred_anchor_locations, combined with the original anchor information, invert the transform to recover the top-left and bottom-right coordinates (x1, y1, x2, y2) of the boxes the RPN predicts. Sort them by score, take the top 12000 for NMS, and keep the top 2000 of the boxes that survive NMS. Note that at this point the boxes are positions on the original image, recovered from pred_anchor_locations. These 2000 boxes are then sampled and located against the ground truth: compute their true labels and offset coordinates relative to the ground truth, using IoU for the assignment. A box whose IoU with a ground truth exceeds 0.5 is a positive sample, and we record the label of that ground truth; labels in this part are class indices, not foreground/background (0, 1). The locating formulas are the same as in stage 1. The located boxes are then sampled, 128 per image with a positive ratio of 0.25. The final output of this stage is the 128 gt_roi_labels and gt_roi_locations computed from the RPN predictions (pred_anchor_locations, pred_anchor_labels) and the ground truth.
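The inverse transform used here can be sketched as follows (decode_boxes is my own name for illustration; the code below inlines this step):

import numpy as np

def decode_boxes(anchors, deltas):
    # Invert equations (1)-(4): map predicted (dx, dy, dw, dh) back to
    # corner-format (x1, y1, x2, y2) boxes on the original image
    aw = anchors[:, 2] - anchors[:, 0]
    ah = anchors[:, 3] - anchors[:, 1]
    ax = anchors[:, 0] + aw / 2
    ay = anchors[:, 1] + ah / 2
    cx = deltas[:, 0] * aw + ax
    cy = deltas[:, 1] * ah + ay
    w = np.exp(deltas[:, 2]) * aw
    h = np.exp(deltas[:, 3]) * ah
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)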
Stage 4: stage 2 produced pred_anchor_labels and pred_anchor_locations through the RPN, and stage 3 sampled 128 sample_rois from them and computed their true labels and offsets relative to the ground truth (gt_roi_labels and gt_roi_locations). In stage 4, the sample_rois are first fed into an RoI pooling layer to obtain fixed-size 7 * 7 * 512 feature maps, which are flattened into (1, 25088) feature vectors, passed through two fully connected layers to obtain a (1, 4096) vector, and finally through two fully connected branches that predict the class (num_class+1) and the offset coordinates ((num_class+1)*4), i.e. pred_roi_labels and pred_roi_locations.
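For reference, torchvision ships an RoI pooling operator that performs this crop-and-pool step in one call; a minimal sketch (the code below instead builds the step by hand with AdaptiveMaxPool2d):

import torch
from torchvision.ops import roi_pool

feature_map = torch.randn(1, 512, 50, 50)
# rois are (batch_index, x1, y1, x2, y2) in original-image coordinates
rois = torch.tensor([[0, 20.0, 30.0, 400.0, 500.0]])
pooled = roi_pool(feature_map, rois, output_size=(7, 7), spatial_scale=1.0 / 16)
print(pooled.shape)              # torch.Size([1, 512, 7, 7])
print(pooled.view(1, -1).shape)  # torch.Size([1, 25088]) -> into the fc layers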
Stage 5: compute the losses from the results of the first four stages. The RPN loss is computed from gt_anchor_labels, gt_anchor_locations, pred_anchor_labels, and pred_anchor_locations; the RoI loss from gt_roi_labels, gt_roi_locations, pred_roi_labels, and pred_roi_locations. Classification losses use cross entropy and regression losses use smooth L1, giving rpn_cls_loss, rpn_loc_loss, roi_cls_loss, and roi_loc_loss. Note that the classification loss is computed over all boxes, while the regression loss is computed only over boxes with meaningful sample labels, so when forming the total loss the regression loss is multiplied by 10 (equivalently, the classification loss is divided by 10), i.e.
rpn_loss = rpn_cls_loss/10 + rpn_loc_loss
roi_loss = roi_cls_loss/10 + roi_loc_loss
total_loss = rpn_loss + roi_loss
The weights are then updated according to the loss. The cross-entropy loss is given in equation 5, and the smooth L1 loss in equation 6.
$$L=-\sum_{c=1}^{M} y_{c} \log \left(p_{c}\right)\tag{5}$$
$$L=\left\{\begin{array}{cc}{0.5 x^{2},} & {|x|<1} \\ {|x|-0.5,} & {|x| \geq 1}\end{array}\right.\tag{6}$$
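A minimal sketch of the two loss terms and the weighting described above (the tensor names are placeholders with made-up shapes, not the variables from the code below):

import torch
import torch.nn.functional as F

def smooth_l1(pred, target):
    # Equation (6), applied elementwise and summed
    x = torch.abs(pred - target)
    return ((x < 1).float() * 0.5 * x ** 2 + (x >= 1).float() * (x - 0.5)).sum()

pred_cls = torch.randn(8, 2)         # placeholder logits
gt_cls = torch.randint(0, 2, (8,))   # placeholder labels
pred_loc = torch.randn(4, 4)         # placeholder positive-sample offsets
gt_loc = torch.randn(4, 4)

cls_loss = F.cross_entropy(pred_cls, gt_cls)                # equation (5)
loc_loss = smooth_l1(pred_loc, gt_loc) / pred_loc.shape[0]  # averaged over positives
loss = cls_loss + 10 * loc_loss  # regression term weighted by 10, as above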
Helper module util.py:
import numpy as np

def iou(valid_anchors, gt_box):
    # Takes two sets of boxes in (x1, y1, x2, y2) corner format, shapes n*4
    # Returns ious of shape (len(valid_anchors), len(gt_box)):
    # one IoU for every (valid_anchor, gt_box) pair
    valid_anchors_num = valid_anchors.shape[0]
    gt_box_num = gt_box.shape[0]
    ious = np.zeros((valid_anchors_num, gt_box_num))
    for i, anchor in enumerate(valid_anchors):
        xa1, ya1, xa2, ya2 = anchor
        area1 = (xa2-xa1)*(ya2-ya1)
        for j, bbox in enumerate(gt_box):
            xb1, yb1, xb2, yb2 = bbox
            area2 = (xb2-xb1)*(yb2-yb1)
            xx1 = np.max([xa1, xb1])
            yy1 = np.max([ya1, yb1])
            xx2 = np.min([xa2, xb2])
            yy2 = np.min([ya2, yb2])
            if(xx1 < xx2 and yy1 < yy2):
                inter_area = (yy2-yy1)*(xx2-xx1)
                ious[i, j] = inter_area/(area1+area2-inter_area)
    return ious

def nms(bboxes, thre, scores):
    # bboxes: n*4 boxes; thre: IoU threshold; scores: score of each box
    # All inputs are numpy arrays
    # Returns the indices of the boxes kept after NMS
    x1 = bboxes[:, 0]
    y1 = bboxes[:, 1]
    x2 = bboxes[:, 2]
    y2 = bboxes[:, 3]
    areas = (x2-x1)*(y2-y1)
    order = np.argsort(scores)[::-1]
    keep = []  # indices of the boxes that survive NMS
    while order.size > 0:
        i = order[0]  # index of the highest remaining score
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0, xx2-xx1)
        h = np.maximum(0, yy2-yy1)
        inter = w*h
        ious = inter/(areas[i]+areas[order[1:]]-inter)
        indexes = np.where(ious < thre)[0]
        order = order[indexes+1]
    return keep
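A quick usage check of the two helpers (the box values are made up for illustration):

import numpy as np
import util

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
gts = np.array([[0, 0, 10, 10]], dtype=float)
print(util.iou(boxes, gts).round(2))  # roughly [[1.0], [0.68], [0.0]]
scores = np.array([0.9, 0.8, 0.7])
print(util.nms(boxes, 0.5, scores))   # [0, 2]: box 1 is suppressed by box 0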
Main module faster_rcnn.py:
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import util

'''
Part 1: generate the anchor ground truth from the image ground truth; the
anchor gt is used to compute the loss against the rois produced by the RPN.
Note: at this stage anchors only have two classes, 0 or 1; -1 means ignored.
The RPN predicts, for the 9 anchors at each feature-map position, a class
(0, 1) and a position relative to the gt (dx, dy, dw, dh).
Here we first compute each anchor's assigned true class (0, 1) and its true
position relative to the gt (dx, dy, dw, dh), which are used to compute the loss.
For an 800*800 image downsampled 16x the feature map is 50*50; with 9 anchors
per position that is 50*50*9 = 22500 anchors.
For these 22500 anchors, first compute the true class and the offsets relative
to the gt, then compare them with the RPN predictions to compute the loss.
256 anchors are sampled in this part, i.e. only 256 of the true anchor labels
are 1 or 0; all the others are -1 (ignored).
'''
# Build a dummy image and set its ground truth boxes and labels
image = torch.zeros((1, 3, 800, 800))
bboxes = torch.Tensor([[20, 30, 400, 500], [300, 400, 500, 600]])
labels = torch.Tensor([6, 8])
sub_sample = 16  # downsampling factor

# Get a VGG16 model and use it to extract features, downsampling 16x
model = torchvision.models.vgg16(pretrained=True)
fe = list(model.features)

backbone = []
img_bak = image.clone()
for i in fe:
    img_bak = i(img_bak)
    if(img_bak.shape[2] < 50):
        break
    backbone.append(i)
out_channels = img_bak.shape[1]
backbone = nn.Sequential(*backbone)
feature_map = backbone(image)
print(backbone)
print(feature_map.shape)  # (1, 512, 50, 50)

# Generate all anchors: map each point of the 50*50 feature map back to the
# original image
size = 800//16
centerX = np.arange(16, (size+1)*16, 16)
centerY = np.arange(16, (size+1)*16, 16)
# print(centerX)
center_x = centerX - 8
center_y = centerY - 8
print(center_x)
# Anchor parameters; note that the scales are relative to the feature map
anchor_scales = [8, 16, 32]
anchor_ratios = [0.5, 1.0, 2]
anchor_center = np.zeros((size*size, 2))  # 2500*2
# Initialize the 2500 anchor centers
index = 0
for i in range(len(center_x)):
    for j in range(len(centerY)):
        anchor_center[index, 0] = center_x[i]
        anchor_center[index, 1] = center_y[j]
        index += 1
print(anchor_center.shape)

# Generate all the anchors
anchors = torch.zeros((size*size*9, 4), dtype=torch.float32)  # 50*50 positions, 9 anchors each, 4 coordinates (x1,y1,x2,y2)
index = 0
for c in anchor_center:
    center_x, center_y = c
    for i in range(len(anchor_scales)):
        for j in range(len(anchor_ratios)):
            h = sub_sample * anchor_scales[i] * np.sqrt(anchor_ratios[j])
            w = sub_sample * anchor_scales[i] * np.sqrt((1. / anchor_ratios[j]))
            anchors[index, 0] = center_x - w/2
            anchors[index, 1] = center_y - h/2
            anchors[index, 2] = center_x + w/2
            anchors[index, 3] = center_y + h/2
            index += 1
print(anchors.shape)
print(anchors)

# Get the indices of the valid anchors, i.e. those that do not cross the image boundary
valid_anchors_index = np.where(
    (anchors[:, 0] >= 0) &
    (anchors[:, 1] >= 0) &
    (anchors[:, 2] <= 800) &
    (anchors[:, 3] <= 800)
)[0]
print(valid_anchors_index)
valid_anchors = anchors[valid_anchors_index]  # the valid anchors
print(valid_anchors_index.shape)
print(valid_anchors.shape)
# Compute the IoU between every valid anchor and every gt
ious = util.iou(valid_anchors, bboxes)  # (valid_anchors.shape[0], bboxes.shape[0])
print(ious.shape)
'''
Start classifying the anchors: the anchor with the highest IoU against a gt is
foreground, anchors with max IoU > 0.7 are foreground, the rest are background.
'''
gt_maxiou_index = ious.argmax(axis=0)  # max over axis 0: for each gt column, the index of its best anchor
print(gt_maxiou_index)
anchor_maxiou_index = ious.argmax(axis=1)  # max over each row: for each anchor, its best gt
print(anchor_maxiou_index)
# Take each gt's highest IoU value and each anchor's highest IoU value
gt_maxiou = ious[gt_maxiou_index, np.arange(bboxes.shape[0])]
anchor_maxiou = ious[np.arange(valid_anchors.shape[0]), anchor_maxiou_index]
print(gt_maxiou.shape)
print(anchor_maxiou.shape)
gt_maxiou_index = np.where(ious == gt_maxiou)[0]  # indices of the anchors that have the highest IoU with some gt

# Positive/negative parameters: IoU >= 0.7 is foreground, < 0.3 is background;
# sample 256 anchors with a foreground ratio of 0.5
pos_iou_thre = 0.7
neg_iou_thre = 0.3
pos_ratio = 0.5
n_sample = 256
valid_anchor_labels = np.empty((valid_anchors.shape[0]))
valid_anchor_labels.fill(-1)  # initialize to -1, i.e. ignored
valid_anchor_labels[gt_maxiou_index] = 1
valid_anchor_labels[anchor_maxiou >= pos_iou_thre] = 1
valid_anchor_labels[anchor_maxiou < neg_iou_thre] = 0
print(valid_anchor_labels.shape)
# Sample the positives and negatives
n_pos = int(n_sample*pos_ratio)  # must be an int: np.random.choice rejects a float size
pos_index = np.where(valid_anchor_labels == 1)[0]
if(len(pos_index) > n_pos):
    disable_index = np.random.choice(pos_index, size=(len(pos_index)-n_pos), replace=False)
    valid_anchor_labels[disable_index] = -1

if(len(pos_index) > n_pos):
    n_neg = int(n_sample*(1-pos_ratio))
else:
    n_neg = n_sample - len(pos_index)
neg_index = np.where(valid_anchor_labels == 0)[0]
if(len(neg_index) > n_neg):
    disable_index = np.random.choice(neg_index, size=(len(neg_index)-n_neg), replace=False)
    valid_anchor_labels[disable_index] = -1
# Both positives and negatives are now sampled, 256 in total
print(np.sum(valid_anchor_labels == 1))
print(np.sum(valid_anchor_labels == 0))

# Assign offsets (dx, dy, dw, dh) to every anchor: each anchor is matched to the
# gt with which it has the highest IoU, giving the anchor's position relative to that gt
'''
t_{x} = (x - x_{a})/w_{a}
t_{y} = (y - y_{a})/h_{a}
t_{w} = log(w / w_a)
t_{h} = log(h / h_a)
x, y, w, h are the center coordinates, width, and height of the ground truth
box; x_a, y_a, w_a, h_a are those of the anchor box.
'''
anchor_maxiou_gtbox = bboxes[anchor_maxiou_index]
print(anchor_maxiou_gtbox.shape)
w = anchor_maxiou_gtbox[:, 2] - anchor_maxiou_gtbox[:, 0]
h = anchor_maxiou_gtbox[:, 3] - anchor_maxiou_gtbox[:, 1]
x = anchor_maxiou_gtbox[:, 0] + w/2
y = anchor_maxiou_gtbox[:, 1] + h/2
anchor_w = valid_anchors[:, 2] - valid_anchors[:, 0]
anchor_h = valid_anchors[:, 3] - valid_anchors[:, 1]
anchor_x = valid_anchors[:, 0] + anchor_w/2
anchor_y = valid_anchors[:, 1] + anchor_h/2
eps = torch.tensor(1e-10)
anchor_h = torch.maximum(anchor_h, eps)  # guard against division by zero
anchor_w = torch.maximum(anchor_w, eps)
dx = (x-anchor_x)/anchor_w
dy = (y-anchor_y)/anchor_h
dw = np.log(w/anchor_w)
dh = np.log(h/anchor_h)
anchor_location = np.vstack((dx, dy, dw, dh)).transpose()
print(anchor_location.shape)
anchor_labels = np.empty((anchors.shape[0]), dtype=np.int32)
anchor_labels.fill(-1)
anchor_locations = np.zeros_like(anchors, dtype=np.float32)
anchor_locations.fill(-1)
anchor_labels[valid_anchors_index] = valid_anchor_labels
anchor_locations[valid_anchors_index] = anchor_location
print(anchor_labels.shape)
print(anchor_locations.shape)
# End of part 1: the true anchor classes and offsets relative to the gt

'''
Part 2: use the RPN to predict each anchor's class and offset coordinates
'''
class RPN(nn.Module):
    def __init__(self):
        super(RPN, self).__init__()
        mid_channels = 512
        in_channels = 512
        self.conv1 = nn.Conv2d(in_channels, mid_channels, 3, 1, 1)
        self.reg_layer = nn.Conv2d(mid_channels, len(anchor_scales)*len(anchor_ratios)*4, 1, 1, 0)
        self.cls_layer = nn.Conv2d(mid_channels, len(anchor_scales)*len(anchor_ratios)*2, 1, 1, 0)
        self.conv1.weight.data.normal_(0, 0.01)
        self.conv1.bias.data.zero_()
        self.reg_layer.weight.data.normal_(0, 0.01)
        self.reg_layer.bias.data.zero_()
        self.cls_layer.weight.data.normal_(0, 0.01)
        self.cls_layer.bias.data.zero_()

    def forward(self, x):
        x = F.relu(self.conv1(x))  # 3x3 conv followed by ReLU, as in the original paper
        pred_anchor_location = self.reg_layer(x)
        pred_anchor_cls = self.cls_layer(x)
        return pred_anchor_location, pred_anchor_cls

rpn = RPN()
print(feature_map.shape)
pred_anchor_location, pred_anchor_cls = rpn(feature_map)
print(pred_anchor_location.shape)
print(pred_anchor_cls.shape)
pred_anchor_location = pred_anchor_location.permute(0, 2, 3, 1).contiguous().view(1, -1, 4)
pred_anchor_cls = pred_anchor_cls.permute(0, 2, 3, 1).contiguous().view(1, -1, 2)
print(pred_anchor_location.shape)
print(pred_anchor_cls.shape)
print(anchor_locations.shape)
print(anchor_labels.shape)
# pred_anchor_location pairs with anchor_locations, pred_anchor_cls with anchor_labels,
# to compute the RPN loss
# objectness_score holds each anchor's predicted foreground score
objectness_score = pred_anchor_cls.view(1, 50, 50, 9, 2)[:, :, :, :, 1].contiguous().view(1, -1)
# End of part 2: the RPN has predicted a class and offsets for every anchor, to be
# compared with the true classes and offsets from part 1 in the RPN loss
'''
Part 3: generate rois from the RPN-predicted classes and offsets and feed them
to the roi head.
For the 22500 anchors the RPN predicted, first recover box coordinates from
the predicted offsets, run NMS on the top n1 boxes by score,
then keep the top n2 of the survivors to pass to the roi head.
The RPN outputs the offsets of the original anchors relative to the gt.
Part 1 already computed the true offsets of the original anchors relative to
the actual gt (256 valid ones).
The purpose of this part is to produce the boxes fed into the roi head.
'''
nms_thre = 0.7
n_train_pre_nms = 12000
n_train_post_nms = 2000
n_test_pre_nms = 6000
n_test_post_nms = 300
min_size = 16
# First convert the RPN-predicted offsets into (x1, y1, x2, y2) coordinates
'''
x = (w_{a} * dx) + ctr_x_{a}
y = (h_{a} * dy) + ctr_y_{a}
w = np.exp(dw) * w_{a}
h = np.exp(dh) * h_{a}
Invert the transform: recover the predicted gt positions from the original
anchor coordinates and the RPN-predicted dx, dy, dw, dh
'''
pred_anchor_location_numpy = pred_anchor_location[0].data.numpy()
objectness_score_numpy = objectness_score[0].data.numpy()
anchor_w = anchors[:, 2] - anchors[:, 0]
anchor_h = anchors[:, 3] - anchors[:, 1]
anchor_x = anchors[:, 0] + anchor_w/2
anchor_y = anchors[:, 1] + anchor_h/2
dx = torch.from_numpy(pred_anchor_location_numpy[:, 0])
dy = torch.from_numpy(pred_anchor_location_numpy[:, 1])
dw = torch.from_numpy(pred_anchor_location_numpy[:, 2])
dh = torch.from_numpy(pred_anchor_location_numpy[:, 3])
# Recover the predicted boxes' center_x, center_y, w, h on the original image
# from the predicted offsets
pred_gt_center_x = dx*anchor_w+anchor_x
pred_gt_center_y = dy*anchor_h+anchor_y
pred_gt_w = torch.exp(dw)*anchor_w
pred_gt_h = torch.exp(dh)*anchor_h
print(pred_gt_center_x.shape)
print(pred_gt_center_y.shape)
print(pred_gt_w.shape)
print(pred_gt_h.shape)
# Convert center_x, center_y, w, h into top-left and bottom-right corners (x1, y1), (x2, y2)
rois = torch.zeros_like(pred_anchor_location[0])  # (22500, 4)
rois[:, 0] = pred_gt_center_x - pred_gt_w/2
rois[:, 1] = pred_gt_center_y - pred_gt_h/2
rois[:, 2] = pred_gt_center_x + pred_gt_w/2
rois[:, 3] = pred_gt_center_y + pred_gt_h/2
print(rois.shape)
# Clip the boxes to the image, i.e. clamp coordinates that cross the boundary
img_size = (800, 800)
rois[:, 0] = torch.clamp(rois[:, 0], 0, img_size[0])
rois[:, 1] = torch.clamp(rois[:, 1], 0, img_size[1])
rois[:, 2] = torch.clamp(rois[:, 2], 0, img_size[0])
rois[:, 3] = torch.clamp(rois[:, 3], 0, img_size[1])
print(rois)
# Remove predicted boxes whose height or width is below min_size
w = rois[:, 2] - rois[:, 0]
h = rois[:, 3] - rois[:, 1]
keep = np.where((h.numpy() >= min_size) & (w.numpy() >= min_size))[0]
rois = rois[keep, :]
before_scores = objectness_score[0][keep]
before_scores_numpy = before_scores.data.numpy()
print(rois.shape)
print(before_scores.shape)
print(before_scores_numpy.shape)
print(before_scores_numpy.ravel().shape)
# Sort before_scores in descending order, keep the top n1 for NMS, then take
# the top n2 of the survivors for the roi head
order = np.argsort(before_scores_numpy)[::-1]
order = order[:n_train_pre_nms]  # 12000
order = torch.from_numpy(order.copy())
rois = rois[order, :]  # 12000*4
scores = before_scores[order]  # 12000
rois_numpy = rois.data.numpy()
scores_numpy = scores.data.numpy()
keep = util.nms(rois_numpy, nms_thre, scores_numpy)
print(len(keep))
keep = keep[:n_train_post_nms]
rois = rois[keep, :]
print(rois.shape)
# The rois to feed into the roi head (the RPN's predicted boxes) are now selected

'''
Part 4: further sample the rois produced in part 3. First locate the boxes the
RPN passed in: compute each box's IoU with each gt, sample by IoU, and assign
offset coordinates.
'''
n_sample = 128
pos_ratio = 0.25
pos_iou_thre = 0.5
neg_iou_thre_hi = 0.5
neg_iou_thre_lo = 0.0
'''
Sampling: for the rois the RPN produced, first compute their actual labels and
offsets relative to the gt, to compare against the roi head outputs in the loss
'''
# Compute the IoUs
ious = util.iou(rois, bboxes)  # 2000*2
print(ious)
print(ious.shape)
# For each roi, get its highest IoU and the corresponding gt
gt_argroi = ious.argmax(axis=1)
roi_max_ious = ious.max(axis=1)
gt_roi_label = labels[gt_argroi]  # assign each roi its true label
# Sample the positives
n_pos = int(n_sample*pos_ratio)
pos_index = np.where(roi_max_ious > pos_iou_thre)[0]
pos_roi_this_image = int(min(n_pos, len(pos_index)))
if len(pos_index) > 0:
    pos_index = np.random.choice(pos_index, size=pos_roi_this_image, replace=False)
print(pos_index)
print(len(pos_index))

neg_roi_this_image = n_sample - pos_roi_this_image
neg_index = np.where((roi_max_ious < neg_iou_thre_hi) & (roi_max_ious > neg_iou_thre_lo))[0]
neg_roi_this_image = int(min(neg_roi_this_image, len(neg_index)))
if len(neg_index) > 0:
    neg_index = np.random.choice(neg_index, size=neg_roi_this_image, replace=False)
print(neg_index)
print(len(neg_index))
# Positive and negative sample indices are selected; now compute these rois'
# true labels and true offsets as the roi ground truth
keep_index = np.append(pos_index, neg_index)
print(keep_index)
sample_rois = rois[keep_index, :]
print(sample_rois.shape)
# Compute the sampled rois' true offsets and true classes
gt_for_sample_rois = bboxes[gt_argroi[keep_index]]  # the gt box matched to each sample_roi
w = sample_rois[:, 2] - sample_rois[:, 0]
h = sample_rois[:, 3] - sample_rois[:, 1]
center_x = sample_rois[:, 0] + w/2
center_y = sample_rois[:, 1] + h/2
gt_w = gt_for_sample_rois[:, 2] - gt_for_sample_rois[:, 0]
gt_h = gt_for_sample_rois[:, 3] - gt_for_sample_rois[:, 1]
gt_center_x = gt_for_sample_rois[:, 0] + gt_w/2  # gt centers come from the gt widths/heights
gt_center_y = gt_for_sample_rois[:, 1] + gt_h/2
eps = torch.tensor(1e-10)
h = torch.maximum(h, eps)  # guard against division by zero
w = torch.maximum(w, eps)
dx = (gt_center_x - center_x)/w
dy = (gt_center_y - center_y)/h
dw = np.log(gt_w/w)
dh = np.log(gt_h/h)
gt_sample_roi_locations = np.vstack((dx, dy, dw, dh)).transpose()
gt_sample_roi_labels = gt_roi_label[keep_index]
gt_sample_roi_labels[pos_roi_this_image:] = 0  # negative samples get label 0
'''
gt_sample_roi_locations and gt_sample_roi_labels are the roi-stage ground truth
'''
print(gt_sample_roi_locations)
print(gt_sample_roi_locations.shape)
print(gt_sample_roi_labels.shape)
print(sample_rois)
# Result: gt_sample_roi_locations and gt_sample_roi_labels are each sample_roi's
# true label and offset coordinates
# sample_rois will be fed into the roi head to predict labels and offsets
print(sample_rois.shape)
roi_indexes = torch.zeros((sample_rois.shape[0]), dtype=torch.int32)
print(roi_indexes.shape)
# rois is the input to the roi head: sample_rois plus one image-index column
# (there is only one image in this example)

rois = torch.zeros((sample_rois.shape[0], sample_rois.shape[1]+1))
rois[:, 0] = roi_indexes
rois[:, 1:] = sample_rois
print(rois.shape)
print(rois)
'''
The logic here: first add one column to sample_rois to mark which image each roi
belongs to, since in practice a batch may contain several images. This code
passes only one image, so the column is all zeros. Then the sample_rois are
downsampled 16x to map onto the feature map, passed through roi pooling, and the
pooled result is fed to the roi head for prediction.
'''
size = 7
roi_pooling = nn.AdaptiveMaxPool2d((size, size))  # pass the output size as a tuple; the second positional argument is return_indices
out_put = []  # stores the roi pooling results
# Downsample by sub_sample to map from the original image onto the feature map
rois[:, 1:].mul_(1.0/16.0)
print(feature_map.shape)
for i in range(rois.shape[0]):
    roi = rois[i]
    img_index = int(roi[0])
    # Crop the roi's region out of the feature map: the H axis is indexed by the
    # y range (roi[2]:roi[4]) and the W axis by the x range (roi[1]:roi[3])
    feature_im = feature_map[img_index, :, int(roi[2]):int(roi[4]), int(roi[1]):int(roi[3])]
    roi_pooling_im = roi_pooling(feature_im)
    out_put.append(roi_pooling_im)
out_put = torch.stack(out_put)
print(out_put.shape)
# out_put holds the sample_rois' feature maps after roi pooling
out_put_linear = out_put.view(out_put.shape[0], -1)  # everything after this is fully connected
print(out_put_linear.shape)
class ROIHead(nn.Module):
    def __init__(self, num_class):
        super(ROIHead, self).__init__()
        self.num_class = num_class
        self.linear1 = nn.Linear(25088, 4096)
        self.linear2 = nn.Linear(4096, 4096)
        # Input: each roi's feature-map crop after roi pooling; predict the class
        # and offset coordinates of the object in each roi
        self.location = nn.Linear(4096, (num_class+1)*4)  # offsets for every class
        self.score = nn.Linear(4096, (num_class+1))  # score for every class
        self._init_weight()

    def _init_weight(self):
        self.linear1.weight.data.normal_(0, 0.01)
        self.linear1.bias.data.zero_()
        self.linear2.weight.data.normal_(0, 0.01)
        self.linear2.bias.data.zero_()
        self.location.weight.data.normal_(0, 0.01)
        self.location.bias.data.zero_()
        self.score.weight.data.normal_(0, 0.01)
        self.score.bias.data.zero_()

    def forward(self, x):
        x = F.relu(self.linear1(x))  # ReLU between the fully connected layers
        x = F.relu(self.linear2(x))
        pred_roi_locations = self.location(x)  # (num_class+1)*4
        pred_roi_labels = self.score(x)  # num_class+1
        return pred_roi_locations, pred_roi_labels

roihead = ROIHead(num_class=20)
print(out_put_linear.shape)
pred_roi_locations, pred_roi_labels = roihead(out_put_linear)
print(pred_roi_locations.shape)  # (n_sample, (num_class+1)*4)
print(pred_roi_labels.shape)  # (n_sample, (num_class+1))

'''
Part 5: compute the losses, in two halves: first the RPN loss, then the roi loss
'''
# Weight between the classification and regression losses
loss_lambda = 10
print("RPN Loss")
print(anchor_locations.shape)
print(anchor_labels.shape)
print(pred_anchor_location.shape)
print(pred_anchor_cls.shape)
anchor_locations = torch.from_numpy(anchor_locations)
anchor_labels = torch.from_numpy(anchor_labels)
pred_anchor_location = pred_anchor_location[0]
pred_anchor_cls = pred_anchor_cls[0]
print(anchor_locations.shape, anchor_labels.shape, pred_anchor_location.shape, pred_anchor_cls.shape)
# Classification loss: cross entropy
anchor_labels = anchor_labels.long()
rpn_cls_loss = F.cross_entropy(pred_anchor_cls, anchor_labels, ignore_index=-1)
print(rpn_cls_loss)
# Regression loss: smooth L1, computed only for anchors whose gt label is 1
pos_index = anchor_labels > 0
print(pos_index.shape)
print(pos_index)
mask = pos_index.unsqueeze(1).expand_as(anchor_locations)
print(mask.shape)
print(mask)
# Take the locations of the positive-labeled anchors and compute the loss
anchor_locations = anchor_locations[mask].view(-1, 4)  # (n_pos, 4)
pred_anchor_location = pred_anchor_location[mask].view(-1, 4)  # (n_pos, 4)
x = torch.abs(anchor_locations - pred_anchor_location)
print(x.shape)
rpn_loc_loss = (x < 1).float()*0.5*x**2 + (x >= 1).float()*(x-0.5)
rpn_loc_loss = rpn_loc_loss.sum()  # sum of the regression loss; still needs averaging
print(rpn_loc_loss)
n_reg = (anchor_labels > 0).float().sum()  # number of positives
print(n_reg)
rpn_loc_loss = rpn_loc_loss/n_reg  # average
print(rpn_loc_loss)
rpn_loss = rpn_cls_loss + loss_lambda*rpn_loc_loss
print("rpn loss:{}".format(rpn_loss))

print("RPN Loss Finished")
print("-----------------------------------")
# Compute the roi loss
print("-----------------------------------")
print("ROI Loss")
print(gt_sample_roi_locations.shape)
print(gt_sample_roi_labels.shape)
print(pred_roi_locations.shape)
print(pred_roi_labels.shape)
gt_sample_roi_locations = torch.from_numpy(gt_sample_roi_locations)
gt_sample_roi_labels = gt_sample_roi_labels.long()
# Classification loss
roi_cls_loss = F.cross_entropy(pred_roi_labels, gt_sample_roi_labels, ignore_index=-1)
print(roi_cls_loss)
# Regression loss
pred_roi_locations = pred_roi_locations.view(pred_roi_locations.shape[0], -1, 4)
print(pred_roi_locations.shape)  # (128, 21, 4)
# For each roi, pick the predicted offsets of its gt class
pred_roi_locations = pred_roi_locations[np.arange(0, pred_roi_locations.shape[0]), gt_sample_roi_labels]  # (128, 4)
print(pred_roi_locations.shape)

# Take the positive-labeled samples and compute their loss
pos_index = gt_sample_roi_labels > 0  # positive labels
mask = pos_index.unsqueeze(1).expand_as(pred_roi_locations)  # mask
print(mask.shape)
pred_roi_locations = pred_roi_locations[mask].view(-1, 4)  # the positive-label part of the predictions
gt_sample_roi_locations = gt_sample_roi_locations[mask].view(-1, 4)  # same for the gt
print(pred_roi_locations.shape, gt_sample_roi_locations.shape)
x = torch.abs(pred_roi_locations - gt_sample_roi_locations)
roi_loc_loss = (x < 1).float()*0.5*x**2 + (x >= 1).float()*(x-0.5)
roi_loc_loss = roi_loc_loss.sum()
print(roi_loc_loss)
n_reg = (gt_sample_roi_labels > 0).sum()
roi_loc_loss = roi_loc_loss/n_reg
roi_loss = roi_cls_loss + loss_lambda*roi_loc_loss
print(roi_loc_loss)
print("roi_loss: {}".format(roi_loss))
print("ROI Loss Finished")
total_loss = rpn_loss+roi_loss
print("total loss: {}".format(total_loss))
total_loss.backward()
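To actually update the weights as described in stage 5, the whole pipeline would be wrapped in a training loop; a minimal sketch of one step (the optimizer choice and hyperparameters are my own assumptions, not from the code above):

params = list(backbone.parameters()) + list(rpn.parameters()) + list(roihead.parameters())
optimizer = torch.optim.SGD(params, lr=1e-3, momentum=0.9, weight_decay=5e-4)

# One training step: run the forward pass and compute total_loss as above, then
optimizer.zero_grad()
total_loss.backward()
optimizer.step()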
Next I plan to look at deep-learning-based edge detection methods (applied to defect detection).