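YOLO (You Only Look Once) is an efficient object detection algorithm and a member of the one-stage family. Two-stage detectors are generally slow at inference time, and YOLO addresses this by pioneering the one-stage approach: object classification and object localization are done in a single step. YOLO regresses the position of each bounding box and the class it belongs to directly at the output layer, which is what makes it one-stage.
After two iterations, the latest version of YOLO at the time of writing is YOLOv3. It builds on the first two versions with a number of fine-grained changes that improve results.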
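This post annotates the source code, both to help my own study and to share with anyone who wants to learn along. I am still a student, so if you spot a mistake, please point it out.
The source code referenced in this post: https://github.com/wizyoung/YOLOv3_TensorFlow
File path: YOUR_PATH\YOLOv3_TensorFlow-master\utils\nms_utils.py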
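This file implements non-maximum suppression (NMS). The principle is the same in every implementation, and the process is roughly as follows: sort the boxes by score; take the highest-scoring box and keep it; compute its IoU with every remaining box and discard the boxes whose IoU exceeds the threshold; then repeat on whatever is left until no boxes remain.
Note that NMS operates on the boxes of a single class. If there are multiple classes, each class has to be processed separately.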
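NMS decides whether two boxes overlap too much by looking at their intersection over union (IoU). The following is a minimal sketch of that quantity for a single pair of boxes, assuming the [x1, y1, x2, y2] layout with continuous coordinates. The helper name `iou` is hypothetical and does not appear in the repository.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as [x1, y1, x2, y2] (hypothetical helper)."""
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped at 0 when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) boxes
    return inter / union if union > 0 else 0.0


print(iou([0, 0, 2, 2], [1, 1, 3, 3]))  # intersection 1, union 7
```

With a threshold of 0.5, this pair (IoU of 1/7) would not be treated as duplicates, so both boxes would survive.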
# coding: utf-8

from __future__ import division, print_function

import numpy as np
import tensorflow as tf


def gpu_nms(boxes, scores, num_classes, max_boxes=50, score_thresh=0.5, nms_thresh=0.5):
    """
    Perform NMS on GPU using TensorFlow.

    params:
        boxes: tensor of shape [1, 10647, 4] # 10647=(13*13+26*26+52*52)*3, for input 416*416 image
        scores: tensor of shape [1, 10647, num_classes], score=conf*prob
        num_classes: total number of classes
        max_boxes: integer, maximum number of boxes kept for each class, default is 50
        score_thresh: a box is dropped for class i if its score for class i
                      is below score_thresh
        nms_thresh: real value, "intersection over union" threshold used for NMS filtering
    """
    boxes_list, label_list, score_list = [], [], []
    max_boxes = tf.constant(max_boxes, dtype='int32')

    # since we do nms for a single image, flatten the batch dimension
    boxes = tf.reshape(boxes, [-1, 4])  # '-1' means we don't know the exact number of boxes
    score = tf.reshape(scores, [-1, num_classes])

    # Step 1: Create a filtering mask based on the per-class scores by using the threshold.
    mask = tf.greater_equal(score, tf.constant(score_thresh))
    # Step 2: Do non_max_suppression for each class
    for i in range(num_classes):
        # Step 3: Apply the mask to scores and boxes and pick them out
        filter_boxes = tf.boolean_mask(boxes, mask[:, i])
        filter_score = tf.boolean_mask(score[:, i], mask[:, i])
        nms_indices = tf.image.non_max_suppression(boxes=filter_boxes,
                                                   scores=filter_score,
                                                   max_output_size=max_boxes,
                                                   iou_threshold=nms_thresh, name='nms_indices')
        label_list.append(tf.ones_like(tf.gather(filter_score, nms_indices), 'int32') * i)
        boxes_list.append(tf.gather(filter_boxes, nms_indices))
        score_list.append(tf.gather(filter_score, nms_indices))

    boxes = tf.concat(boxes_list, axis=0)
    score = tf.concat(score_list, axis=0)
    label = tf.concat(label_list, axis=0)

    return boxes, score, label
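The figure 10647 in the docstring of gpu_nms comes from the three detection scales of YOLOv3. A small sketch of the arithmetic, assuming strides of 32, 16 and 8 and 3 anchors per grid cell; the function name `num_predictions` is hypothetical and not part of the repository.

```python
def num_predictions(input_size=416, strides=(32, 16, 8), anchors_per_cell=3):
    """Count the boxes predicted for a square input (hypothetical helper)."""
    # Each stride yields a grid of (input_size // stride)^2 cells,
    # and every cell predicts `anchors_per_cell` boxes.
    cells = sum((input_size // s) ** 2 for s in strides)
    return cells * anchors_per_cell


print(num_predictions(416))  # (13*13 + 26*26 + 52*52) * 3 = 10647
```

This is why boxes and scores are reshaped with -1: the number of candidate boxes changes with the input size.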
def py_nms(boxes, scores, max_boxes=50, iou_thresh=0.5):
    """
    Pure Python NMS baseline.

    Arguments: boxes: shape of [-1, 4], '-1' means the exact number of boxes is not known in advance
               scores: shape of [-1,]
               max_boxes: maximum number of boxes to be selected by non_max_suppression
               iou_thresh: IoU threshold used to decide which boxes to keep
    """
    assert boxes.shape[1] == 4 and len(scores.shape) == 1

    # Compute the area of every box
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)

    # Sort the boxes by score in descending order. argsort returns indices,
    # so `order` holds the indices of the boxes that still need processing.
    order = scores.argsort()[::-1]

    # `keep` stores the indices of the boxes that survive
    keep = []
    # Loop while there are unprocessed boxes
    while order.size > 0:
        # `order` is sorted, so its first entry is the highest-scoring box left
        i = order[0]
        # Keep this box
        keep.append(i)

        # Compute the IoU between this box and all remaining boxes.
        # order[1:] skips the current box and selects everything after it.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        # Intersection width and height, clamped at 0 for non-overlapping boxes.
        # No "+ 1" here, so the intersection uses the same convention as `areas`.
        w = np.maximum(0.0, xx2 - xx1)
        h = np.maximum(0.0, yy2 - yy1)
        inter = w * h
        # IoU = intersection / union
        ovr = inter / (areas[i] + areas[order[1:]] - inter)

        # Select the boxes whose IoU with the current box is at most the threshold.
        # Boxes above the threshold are suppressed by the current box.
        inds = np.where(ovr <= iou_thresh)[0]
        # Update `order` and go to the next round. `inds` is relative to order[1:],
        # hence the "+ 1" to map it back into `order`.
        order = order[inds + 1]

    # Return the indices of at most `max_boxes` boxes
    return keep[:max_boxes]
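To see the greedy loop in action, here is a compact, self-contained restatement run on toy data. It is a sketch under stated assumptions rather than the repository's code: the name `greedy_nms` is hypothetical, boxes are [x1, y1, x2, y2] with continuous coordinates, and the toy boxes and scores are made up for illustration.

```python
import numpy as np


def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS returning the kept indices (hypothetical helper)."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # indices sorted by descending score

    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        # Intersection of box i with every remaining box
        w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        # Only boxes that do not overlap box i too much go to the next round
        order = rest[iou <= iou_thresh]
    return keep


boxes = np.array([[0, 0, 10, 10],     # highest score, kept
                  [1, 1, 11, 11],     # IoU with box 0 is 81/119, suppressed
                  [20, 20, 30, 30]],  # no overlap, kept
                 dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7], dtype=np.float32)

print(greedy_nms(boxes, scores))  # [0, 2]
```

Box 1 overlaps box 0 with an IoU of about 0.68, which is above the 0.5 threshold, so only boxes 0 and 2 remain.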
def cpu_nms(boxes, scores, num_classes, max_boxes=50, score_thresh=0.5, iou_thresh=0.5):
    """
    Perform NMS on CPU.

    Arguments:
        boxes: shape [1, 10647, 4]
        scores: shape [1, 10647, num_classes]
    """
    boxes = boxes.reshape(-1, 4)
    scores = scores.reshape(-1, num_classes)
    # Picked bounding boxes
    picked_boxes, picked_score, picked_label = [], [], []

    for i in range(num_classes):
        indices = np.where(scores[:, i] >= score_thresh)
        filter_boxes = boxes[indices]
        filter_scores = scores[:, i][indices]
        if len(filter_boxes) == 0:
            continue
        # do non_max_suppression on the cpu
        indices = py_nms(filter_boxes, filter_scores,
                         max_boxes=max_boxes, iou_thresh=iou_thresh)
        picked_boxes.append(filter_boxes[indices])
        picked_score.append(filter_scores[indices])
        picked_label.append(np.ones(len(indices), dtype='int32') * i)

    if len(picked_boxes) == 0:
        return None, None, None

    boxes = np.concatenate(picked_boxes, axis=0)
    score = np.concatenate(picked_score, axis=0)
    label = np.concatenate(picked_label, axis=0)

    return boxes, score, label
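The per-class thresholding at the top of the loop in cpu_nms can be looked at in isolation. The sketch below uses made-up scores, and the helper name `candidates_per_class` is hypothetical. It shows only the filtering step, not the suppression that follows.

```python
import numpy as np


def candidates_per_class(scores, score_thresh=0.5):
    """Map each class id to the box indices that pass the score threshold
    (hypothetical helper)."""
    result = {}
    for c in range(scores.shape[1]):
        # Same test as in cpu_nms: compare one class column against the threshold
        idx = np.where(scores[:, c] >= score_thresh)[0]
        result[c] = idx.tolist()
    return result


# Rows are boxes, columns are classes (made-up values)
scores = np.array([[0.9, 0.1],
                   [0.2, 0.8],
                   [0.6, 0.7]])

print(candidates_per_class(scores))  # {0: [0, 2], 1: [1, 2]}
```

Box 2 passes the threshold for both classes, so it enters NMS twice, once per class, and can appear in the final output with two different labels.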