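First, a brief introduction to non-maximum suppression (NMS), whose purpose is to remove redundant detection boxes. Take a single image from the VOC2007 dataset as an example: suppose the green boxes in the figure below are the ground-truth boxes of the cars, while the red boxes are also positive (car) detections. We want the final output to contain only the detections corresponding to the green boxes, so NMS is used to suppress the red ones.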
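Let's first implement this NMS for rectangular boxes with torch (not numpy). We read the image in Figure 1(a) together with its eight rectangular boxes. For convenience, the 8 boxes are ordered as [T, F, T, F, T, F, T, F], so the final detection result should contain only the four T (true) boxes. `boxes` is a 2-D tensor, and `scores` holds the confidence score of each box. To match the [T, F] setup, each T box scores higher than the F box right after it (0.99 vs 0.98, 0.89 vs 0.88, and so on), so NMS suppresses the lower-scoring box of each pair: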
```python
boxes = torch.Tensor([[12, 311, 84, 362], [10, 300, 80, 360],
                      [362, 330, 500, 389], [360, 330, 500, 380],
                      [175, 327, 252, 364], [170, 320, 250, 360],
                      [108, 325, 150, 353], [100, 320, 150, 350]])
scores = torch.Tensor([0.99, 0.98, 0.89, 0.88, 0.79, 0.78, 0.69, 0.68])
```
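The procedure: sort the boxes by confidence score in descending order and keep the box with the highest score. Then compute the IoU between every remaining box and that kept box; if a box's IoU is greater than the threshold thr, the two boxes are considered to overlap and the remaining box is discarded. Repeat the same steps on the boxes that survive until none are left.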
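To make the threshold concrete, the sketch below computes the IoU of each T/F pair in plain Python (no torch). It assumes the same "+1 pixel" convention used for `areas` in the NMS code, i.e. a box [x1, y1, x2, y2] covers (x2 - x1 + 1) * (y2 - y1 + 1) pixels; the helper name `iou_plus1` is hypothetical.

```python
# Boxes as [x1, y1, x2, y2], in the [T, F, T, F, ...] order used above.
boxes = [[12, 311, 84, 362], [10, 300, 80, 360],
         [362, 330, 500, 389], [360, 330, 500, 380],
         [175, 327, 252, 364], [170, 320, 250, 360],
         [108, 325, 150, 353], [100, 320, 150, 350]]


def iou_plus1(a, b):
    """IoU of two boxes, counting pixels inclusively (hypothetical helper)."""
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    # Intersection rectangle; width/height are clamped at 0 when the boxes don't overlap.
    w = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    h = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = w * h
    return inter / (area_a + area_b - inter)


thr = 0.2
pair_ious = {}
for t in range(0, 8, 2):  # T box indices: 0, 2, 4, 6
    f = t + 1             # the F box that follows each T box
    pair_ious[(t, f)] = iou_plus1(boxes[t], boxes[f])
    print(f"IoU(box{t}, box{f}) = {pair_ious[(t, f)]:.3f}, suppressed: {pair_ious[(t, f)] > thr}")
```

Under this convention the four pairs come out at roughly 0.738, 0.840, 0.698 and 0.654, all well above thr = 0.2, so every F box should be suppressed by its T neighbour, while boxes from different pairs do not overlap at all (IoU 0).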
```python
import numpy as np
import torch
import cv2


def box_nms(boxes, scores, thr=0.2):
    '''
    boxes: [N, 4], each row is [x1, y1, x2, y2]
    scores: [N]
    returns: LongTensor with the indices of the kept boxes
    '''
    if boxes.shape[0] == 0:  # no predicted boxes: return an empty index tensor
        return torch.zeros(0, device=boxes.device).long()
    assert boxes.shape[-1] == 4
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)  # areas of the eight boxes: tensor([3796., 4331., ..., 1581.])
    order = scores.sort(0, descending=True)[1]  # indices sorted by descending score: [0, 1, 2, ..., 7]
    keep = []  # indices of the boxes that are finally kept
    while order.numel() > 0:  # loop as long as candidate boxes remain
        if order.numel() == 1:  # only one box left: keep it and stop
            i = order.item()
            keep.append(i)
            break
        else:  # two or more boxes left
            i = order[0].item()
            keep.append(i)  # keep the first box in order, i.e. the highest-scoring one
            xmin = x1[order[1:]].clamp(min=float(x1[i]))
            ymin = y1[order[1:]].clamp(min=float(y1[i]))
            xmax = x2[order[1:]].clamp(max=float(x2[i]))
            ymax = y2[order[1:]].clamp(max=float(y2[i]))
            # intersection areas, computed in parallel; the +1 matches the pixel convention used for areas
            inter = (xmax - xmin + 1).clamp(min=0) * (ymax - ymin + 1).clamp(min=0)
            iou = inter / (areas[i] + areas[order[1:]] - inter)  # IoU with the kept box
            # keep boxes with iou <= thr (as a 1-D index tensor); boxes with iou > thr are suppressed
            idx = (iou <= thr).nonzero().view(-1)
            if idx.numel() == 0:  # every remaining box was suppressed: stop
                break
            order = order[idx + 1]  # otherwise continue with the surviving boxes
    return torch.LongTensor(keep)


if __name__ == '__main__':
    devices = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    img = cv2.imread(r'E:\Z_summary_elllipsenet/000004.jpg')  # only needed for drawing the boxes
    boxes = torch.Tensor([[12, 311, 84, 362], [10, 300, 80, 360],
                          [362, 330, 500, 389], [360, 330, 500, 380],
                          [175, 327, 252, 364], [170, 320, 250, 360],
                          [108, 325, 150, 353], [100, 320, 150, 350]])
    scores = torch.Tensor([0.99, 0.98, 0.89, 0.88, 0.79, 0.78, 0.69, 0.68])
    keep = box_nms(boxes, scores)
    print(boxes[keep])
```
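Take the first iteration as an example. Initially order = [0, 1, 2, 3, 4, 5, 6, 7]. Inside the while loop, keep becomes [0], i.e. the box with the highest confidence score is stored. Next, the IoU between this box and the remaining 7 boxes is computed, so iou has shape [7]. The step `idx = (iou <= thr).nonzero()` then keeps the boxes whose IoU is <= thr and returns their positions in iou: here idx = [1, 2, 3, 4, 5, 6] (position 0 is box 1, which is suppressed). The line `order = order[idx + 1]` then gives order = [2, 3, 4, 5, 6, 7], and the next iteration begins. The +1 is needed because iou was computed against order[1:], which leaves out order[0] (the box just stored in keep); adding 1 maps each position in iou back to the matching position in the original order [0, 1, 2, 3, 4, 5, 6, 7].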
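The index bookkeeping can be checked by replaying the loop without torch. The sketch below is a plain-Python re-implementation (an illustration, not the torch code itself) that records `order` and `idx` in every iteration; `iou_plus1` and `nms_trace` are hypothetical helpers, and the IoU again assumes the +1 pixel convention.

```python
boxes = [[12, 311, 84, 362], [10, 300, 80, 360],
         [362, 330, 500, 389], [360, 330, 500, 380],
         [175, 327, 252, 364], [170, 320, 250, 360],
         [108, 325, 150, 353], [100, 320, 150, 350]]
scores = [0.99, 0.98, 0.89, 0.88, 0.79, 0.78, 0.69, 0.68]


def iou_plus1(a, b):
    """IoU with inclusive pixel counting (hypothetical helper)."""
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    w = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    h = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = w * h
    return inter / (area_a + area_b - inter)


def nms_trace(boxes, scores, thr=0.2):
    """Greedy NMS that also records (order, idx) for every iteration."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep, trace = [], []
    while order:
        i = order[0]
        keep.append(i)  # store the highest-scoring remaining box
        ious = [iou_plus1(boxes[i], boxes[j]) for j in order[1:]]
        # idx holds positions in order[1:], hence the +1 when mapping back to order
        idx = [k for k, v in enumerate(ious) if v <= thr]
        trace.append((list(order), idx))
        order = [order[k + 1] for k in idx]
    return keep, trace


keep, trace = nms_trace(boxes, scores)
for step, (order, idx) in enumerate(trace):
    print(f"iteration {step}: order={order} idx={idx}")
print("final keep:", keep)
```

The first recorded iteration reproduces the walkthrough above (order = [0, ..., 7], idx = [1, 2, 3, 4, 5, 6]), and the loop ends after four iterations with keep = [0, 2, 4, 6], i.e. exactly the four T boxes.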
For comparison, here is the same procedure written in numpy; each row of `dets` stores [x1, y1, x2, y2, score]:

```python
import numpy as np


def py_cpu_nms(dets, thresh=0.2):
    """Pure Python NMS baseline."""
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    scores = dets[:, 4]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]  # highest-scoring remaining box
        keep.append(i)
        # intersection of box i with every remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        ovr = inter / (areas[i] + areas[order[1:]] - inter)  # IoU
        inds = np.where(ovr <= thresh)[0]  # keep boxes with IoU <= thresh
        order = order[inds + 1]  # +1 because ovr was computed on order[1:]
    return keep


if __name__ == '__main__':
    dets = np.array([[12, 311, 84, 362, 0.99], [10, 300, 80, 360, 0.98],
                     [362, 330, 500, 389, 0.89], [360, 330, 500, 380, 0.88],
                     [175, 327, 252, 364, 0.79], [170, 320, 250, 360, 0.78],
                     [108, 325, 150, 353, 0.69], [100, 320, 150, 350, 0.68]])
    keep = py_cpu_nms(dets)
    print(keep)
```
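Consider the following question: if two boxes belong to different classes but overlap heavily, both should be kept in the final output. The NMS above, however, only works within a single class. One option is to wrap it in an outer loop over the classes, but that slows things down; so in practice one extra step is performed before calling the NMS function: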
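```python
import torch

if __name__ == '__main__':
    boxes = torch.Tensor([[12, 311, 84, 362], [10, 300, 80, 360],  # the same eight boxes as before
                          [362, 330, 500, 389], [360, 330, 500, 380],
                          [175, 327, 252, 364], [170, 320, 250, 360],
                          [108, 325, 150, 353], [100, 320, 150, 350]])
    scores = torch.Tensor([0.99, 0.98, 0.89, 0.88, 0.79, 0.78, 0.69, 0.68])
    # The extra step: push boxes of different classes away from each other.
    max_coordinate = boxes.max()  # max_coordinate = 500
    cls = torch.Tensor([1, 2, 1, 1, 1, 1, 1, 1])  # assume the eight boxes belong to two classes, 1 and 2
    offsets = cls.to(boxes) * (max_coordinate + 1)  # per-box offset: cls * (max_coordinate + 1)
    boxes_for_nms = boxes + offsets[:, None]  # add the same offset to all four coordinates of a box
    print('new boxes for different classes:', boxes_for_nms)
    # NMS is then run on boxes_for_nms instead of boxes
```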
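In general the class labels cls are integers such as [1, 2, 3…, 20]. Using cls * (the maximum coordinate in boxes + 1) as an offset and adding it to every box keeps boxes of the same class together (their relative positions, and hence their IoUs, are unchanged) while moving boxes of different classes so far apart that they no longer overlap. Their IoU is then 0, so they can never suppress each other, and a single NMS call handles all classes at once. This relies on the coordinates being non-negative, as image coordinates normally are.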
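The claim can be checked with a small sketch in plain Python (same +1 pixel convention; all helper names are hypothetical). It runs greedy NMS once on the offset boxes and compares the result with the slower alternative mentioned earlier, an explicit outer loop that runs NMS separately for each class.

```python
boxes = [[12, 311, 84, 362], [10, 300, 80, 360],
         [362, 330, 500, 389], [360, 330, 500, 380],
         [175, 327, 252, 364], [170, 320, 250, 360],
         [108, 325, 150, 353], [100, 320, 150, 350]]
scores = [0.99, 0.98, 0.89, 0.88, 0.79, 0.78, 0.69, 0.68]
cls = [1, 2, 1, 1, 1, 1, 1, 1]  # same hypothetical labels as in the snippet above


def iou_plus1(a, b):
    """IoU with inclusive pixel counting (hypothetical helper)."""
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    w = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    h = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = w * h
    return inter / (area_a + area_b - inter)


def greedy_nms(boxes, scores, thr=0.2):
    """Single-class greedy NMS; returns kept indices in descending score order."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    while order:
        i = order[0]
        keep.append(i)
        order = [j for j in order[1:] if iou_plus1(boxes[i], boxes[j]) <= thr]
    return keep


def nms_with_offsets(boxes, scores, cls, thr=0.2):
    """One NMS call after shifting each box by cls * (max_coordinate + 1)."""
    max_coordinate = max(max(b) for b in boxes)
    shifted = [[v + c * (max_coordinate + 1) for v in b] for b, c in zip(boxes, cls)]
    return greedy_nms(shifted, scores, thr)


def nms_per_class(boxes, scores, cls, thr=0.2):
    """Outer loop over classes: run NMS on each class's boxes separately."""
    keep = []
    for c in sorted(set(cls)):
        ids = [k for k in range(len(boxes)) if cls[k] == c]
        kept = greedy_nms([boxes[k] for k in ids], [scores[k] for k in ids], thr)
        keep.extend(ids[k] for k in kept)  # map local indices back to global ones
    return sorted(keep, key=lambda k: scores[k], reverse=True)


print("offset trick:  ", nms_with_offsets(boxes, scores, cls))
print("per-class loop:", nms_per_class(boxes, scores, cls))
```

Both approaches keep boxes [0, 1, 2, 4, 6]. Without the offsets, box 1 would be suppressed by box 0 (IoU about 0.738); once it is labelled class 2 and shifted, it no longer overlaps box 0 and survives.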