今日来填原来的坑
本次第三周的作业是应用YOLO模型来进行对象检测
与本次项目相关的两篇论文:Redmon et al., 2016 (https://arxiv.org/abs/1506.02640) and Redmon and Farhadi, 2016 (https://arxiv.org/abs/1612.08242).
目录
1、问题总述
2、YOLO
2.1 - Model details
2.2 - Filtering with a threshold on class scores (使用阈值进行过滤)
2.3 - Non-max suppression (非极大值抑制)
2.4 Wrapping up the filtering
3 - Test YOLO pretrained model on images
3.1 - Defining classes, anchors and image shape
3.2 - Loading a pretrained model
3.3 - Convert output of the model to usable bounding box tensors(将模型输出转换为识别框tensor)
3.4 - Filtering boxes (过滤boxes)
3.5 - Run the graph on an image (在图片上运行模型)
本次课程我们将学到:
导入本次需要用到的包,该安装的在anaconda navigator 内搜索安装。
import argparse
import os
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
import scipy.io
import scipy.misc
import numpy as np
import pandas as pd
import PIL
import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Lambda, Conv2D
from keras.models import load_model, Model
from yolo_utils import read_classes, read_anchors, generate_colors, preprocess_image, draw_boxes, scale_boxes
from yad2k.models.keras_yolo import yolo_head, yolo_boxes_to_corners, preprocess_true_boxes, yolo_loss, yolo_body
%matplotlib inline
note : 将keras导入作为后端,若调用keras函数时,则使用 K.funtion(....)
也就是制作训练集,首先要在车顶部安装一个相机 来采集数据,每隔几秒生成一张图片。本次的数据是由drive.ai提供的,
之后是对数据进行标记,将图片中每一辆车用红框标记出来。如下图所示,
因为YOLO模型的训练成本非常高,所以我们使用的是已经过训练的权重 。
YOLO("you only look once"),它只需要一次前向传播进行网络进行预测,在非最大抑制之后,将识别的对象与边界框一起输出。
我们定义5个anchor box ,每个格都会产生5个anchor box 的信息。
为了简单起见,我们将(19,19,5,85)改为(19,19,425)
现在我们可以计算每个分割小方块中的信息,计算概率。
这是一种可视化YOLO预测图像的方法:
以下结果:
另外一种展示YOLO输出的方式是用方框标记识别,不同的颜色表示不同的分类,不同的形状表示不同的ancher。
看到这些复杂重复的输出框之后,使用非极大值抑制是我们要框选的目标更加准确。
使用第一个阈值处理过滤器摆脱分数低的box
- box_confidence: (19×19,5,1) 表示Pc, 每个anchor预测到有对象的分数
- boxes: (19×19,5,4) 表示方框(bx,by,bh,bw)
- box_class_probs: (19×19,5,80) 是哪个类 (c1,c2,…c80)
练习:实现 yolo_filter_boxes()
1、
a = np.random.randn(19*19, 5, 1)
b = np.random.randn(19*19, 5, 80)
c = a * b # shape of c will be (19*19, 5, 80)
2、对每个box
找出最高分的分类(80选1)
得出相应的分数
3、创建一个门槛mask:比如 ([0.9, 0.3, 0.4, 0.5, 0.1] < 0.4) 返回 [False, True, False, False, True] 注意你想保留的boxes应该为true
4、利用 TensorFlow 将 mask 应用到 box_class_scores 上,过滤掉不需要的boxes。
# GRADED FUNCTION: yolo_filter_boxes
def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = .6):
"""Filters YOLO boxes by thresholding on object and class confidence.
Arguments:
box_confidence -- tensor of shape (19, 19, 5, 1)
boxes -- tensor of shape (19, 19, 5, 4)
box_class_probs -- tensor of shape (19, 19, 5, 80)
threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box
Returns:
scores -- tensor of shape (None,), containing the class probability score for selected boxes
boxes -- tensor of shape (None, 4), containing (b_x, b_y, b_h, b_w) coordinates of selected boxes
classes -- tensor of shape (None,), containing the index of the class detected by the selected boxes
Note: "None" is here because you don't know the exact number of selected boxes, as it depends on the threshold.
For example, the actual output size of scores would be (10,) if there are 10 boxes.
"""
### START CODE HERE ### (≈ 1 line)
box_scores = box_confidence*box_class_probs #Pc*80个类的预测分数
### END CODE HERE ###
# Step 2: Find the box_classes thanks to the max box_scores, keep track of the corresponding score 跟踪最大的分数
### START CODE HERE ### (≈ 2 lines)
box_classes = K.argmax(box_scores, axis=-1) #获得最高分数的序号(19,19,5,1)
box_class_scores = K.max(box_scores, axis=-1)#获得最高分数的数值
### END CODE HERE ###
# Step 3: Create a filtering mask based on "box_class_scores" by using "threshold". The mask should have the
# same dimension as box_class_scores, and be True for the boxes you want to keep (with probability >= threshold)
### START CODE HERE ### (≈ 1 line)
filtering_mask = box_class_scores >= threshold # don't use A.eval() >= B 将分数大于阈值的box标记为ture,创造掩码
### END CODE HERE ###
# Step 4: Apply the mask to scores, boxes and classes
### START CODE HERE ### (≈ 3 lines)获得符合mask最高分数,该分数所属对象的边界框,该分数所属对象类别
scores = tf.boolean_mask(box_class_scores, filtering_mask)
boxes = tf.boolean_mask(boxes, filtering_mask)
classes = tf.boolean_mask(box_classes, filtering_mask)
### END CODE HERE ###
return scores, boxes, classes
输入数值练习:
with tf.Session() as test_a:
box_confidence = tf.random_normal([19, 19, 5, 1], mean=1, stddev=4, seed = 1) #指定均值 标准差 随机数种子
boxes = tf.random_normal([19, 19, 5, 4], mean=1, stddev=4, seed = 1)
box_class_probs = tf.random_normal([19, 19, 5, 80], mean=1, stddev=4, seed = 1)
scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = 0.5)
print("scores[2] = " + str(scores[2].eval()))
print("boxes[2] = " + str(boxes[2].eval()))
print("classes[2] = " + str(classes[2].eval()))
print("scores.shape = " + str(scores.shape))
print("boxes.shape = " + str(boxes.shape))
print("classes.shape = " + str(classes.shape))
输出结果:
练习:实现iou()
在这个练习中(仅在这里), 我们使用两角坐标(左上角/右下角)而来表示一个box
计算box面积的方法 (y2 - y1)x(x2 - x1)
你还需要找到相交部分的坐标(xi1, yi1, xi2, yi2)
# GRADED FUNCTION: iou
def iou(box1, box2):
"""Implement the intersection over union (IoU) between box1 and box2
Arguments:
box1 -- first box, list object with coordinates (x1, y1, x2, y2)
box2 -- second box, list object with coordinates (x1, y1, x2, y2)
"""
# Calculate the (y1, x1, y2, x2) coordinates of the intersection of box1 and box2. Calculate its Area.
### START CODE HERE ### (≈ 5 lines)
xi1 = max(box1[0],box2[0])
yi1 = max(box1[1],box2[1])
xi2 = min(box1[2],box2[2])
yi2 = min(box1[3],box2[3])
inter_area = (yi2-yi1)*(xi2-xi1)
### END CODE HERE ###
# Calculate the Union area by using Formula: Union(A,B) = A + B - Inter(A,B)
### START CODE HERE ### (≈ 3 lines)
box1_area = (box1[2]-box1[0])*(box1[3]-box1[1])
box2_area = (box2[2]-box2[0])*(box2[3]-box2[1])
union_area = box1_area + box2_area - inter_area
### END CODE HERE ###
# compute the IoU
### START CODE HERE ### (≈ 1 line)
iou = inter_area / union_area
### END CODE HERE ###
return iou
box1 = (2, 1, 4, 3)
box2 = (1, 2, 3, 4)
print("iou = " + str(iou(box1, box2)))
现在实现非最大抑制
关键步骤为:
1. 选出具有最高分数的box
2. 计算该box和其他box的iou, 删除重叠部分iou大于 iou_threshold 的 box
3. 循环1,2 直到没有满足条件的 boxes
这样将会删除所有有大量重叠覆盖的的 boxes,只留下最优的。
练习:使用 TensorFlow 实现 yolo_non_max_suppression()
TensorFlow有两个内置函数,用于实现非最大抑制(因此实际上不需要使用iou()实现):
# GRADED FUNCTION: yolo_non_max_suppression
def yolo_non_max_suppression(scores, boxes, classes, max_boxes = 10, iou_threshold = 0.5):
"""
Applies Non-max suppression (NMS) to set of boxes
Arguments:
scores -- tensor of shape (None,), output of yolo_filter_boxes()
boxes -- tensor of shape (None, 4), output of yolo_filter_boxes() that have been scaled to the image size (see later)
classes -- tensor of shape (None,), output of yolo_filter_boxes()
max_boxes -- integer, maximum number of predicted boxes you'd like
iou_threshold -- real value, "intersection over union" threshold used for NMS filtering
Returns:
scores -- tensor of shape (, None), predicted score for each box
boxes -- tensor of shape (4, None), predicted box coordinates
classes -- tensor of shape (, None), predicted class for each box
Note: The "None" dimension of the output tensors has obviously to be less than max_boxes. Note also that this
function will transpose the shapes of scores, boxes, classes. This is made for convenience.
"""
max_boxes_tensor = K.variable(max_boxes, dtype='int32') # tensor to be used in tf.image.non_max_suppression()
K.get_session().run(tf.variables_initializer([max_boxes_tensor])) # initialize variable max_boxes_tensor
# Use tf.image.non_max_suppression() to get the list of indices corresponding to boxes you keep
### START CODE HERE ### (≈ 1 line)
nms_indices = tf.image.non_max_suppression(boxes, scores, max_boxes, iou_threshold)
### END CODE HERE ###
# Use K.gather() to select only nms_indices from scores, boxes and classes
### START CODE HERE ### (≈ 3 lines)
scores = K.gather(scores, nms_indices)
boxes = K.gather(boxes, nms_indices)
classes = K.gather(classes, nms_indices)
### END CODE HERE ###
return scores, boxes, classes
with tf.Session() as test_b:
scores = tf.random_normal([54,], mean=1, stddev=4, seed = 1)
boxes = tf.random_normal([54, 4], mean=1, stddev=4, seed = 1)
classes = tf.random_normal([54,], mean=1, stddev=4, seed = 1)
scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes)
print("scores[2] = " + str(scores[2].eval()))
print("boxes[2] = " + str(boxes[2].eval()))
print("classes[2] = " + str(classes[2].eval()))
print("scores.shape = " + str(scores.eval().shape))
print("boxes.shape = " + str(boxes.eval().shape))
print("classes.shape = " + str(classes.eval().shape))
现在是时候实现一个功能了,采用深度神经网络 CNN (19*19*5*85维编码),并使用刚刚实现的所有的过滤box.
练习:实现 yolo_eval()
yolo_eval 方法将YOLO 的输出进行编码并用非最大抑制进行过滤。
表示 box 的方式由好多种,比如左上角/右下角的坐标,比如中心和宽高。YOLO 在运算过程中将灵活转换这些表示方式。
# GRADED FUNCTION: yolo_eval
def yolo_eval(yolo_outputs, image_shape = (720., 1280.), max_boxes=10, score_threshold=.6, iou_threshold=.5):
"""
Converts the output of YOLO encoding (a lot of boxes) to your predicted boxes along with their scores, box coordinates and classes.
Arguments:
yolo_outputs -- output of the encoding model (for image_shape of (608, 608, 3)), contains 4 tensors:
box_confidence: tensor of shape (None, 19, 19, 5, 1)
box_xy: tensor of shape (None, 19, 19, 5, 2)
box_wh: tensor of shape (None, 19, 19, 5, 2)
box_class_probs: tensor of shape (None, 19, 19, 5, 80)
image_shape -- tensor of shape (2,) containing the input shape, in this notebook we use (608., 608.) (has to be float32 dtype)
max_boxes -- integer, maximum number of predicted boxes you'd like
score_threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box
iou_threshold -- real value, "intersection over union" threshold used for NMS filtering
Returns:
scores -- tensor of shape (None, ), predicted score for each box
boxes -- tensor of shape (None, 4), predicted box coordinates
classes -- tensor of shape (None,), predicted class for each box
"""
### START CODE HERE ###
# Retrieve outputs of the YOLO model (≈1 line) 检索YOLO模型的输出
box_confidence, box_xy, box_wh, box_class_probs = yolo_outputs[:]
# Convert boxes to be ready for filtering functions 转换 box 为过滤功能做准备
boxes = yolo_boxes_to_corners(box_xy, box_wh)
# Use one of the functions you've implemented to perform Score-filtering with a threshold of score_threshold (≈1 line) 阈值分数过滤
scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, score_threshold)
# Scale boxes back to original image shape.将框缩放为原始图形形状
boxes = scale_boxes(boxes, image_shape)
# Use one of the functions you've implemented to perform Non-max suppression with a threshold of iou_threshold (≈1 line)执行非最大值抑制
scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes, max_boxes, iou_threshold)
### END CODE HERE ###
return scores, boxes, classes
with tf.Session() as test_b: #随机初始化下大小为(19,19,5,85)的输出向量:
yolo_outputs = (tf.random_normal([19, 19, 5, 1], mean=1, stddev=4, seed = 1),
tf.random_normal([19, 19, 5, 2], mean=1, stddev=4, seed = 1),
tf.random_normal([19, 19, 5, 2], mean=1, stddev=4, seed = 1),
tf.random_normal([19, 19, 5, 80], mean=1, stddev=4, seed = 1))
scores, boxes, classes = yolo_eval(yolo_outputs)
print("scores[2] = " + str(scores[2].eval()))
print("boxes[2] = " + str(boxes[2].eval()))
print("classes[2] = " + str(classes[2].eval()))
print("scores.shape = " + str(scores.eval().shape))
print("boxes.shape = " + str(boxes.eval().shape))
print("classes.shape = " + str(classes.eval().shape))
创建会话来运行计算图(creating a session to start your graph)
sess = K.get_session()
将文件中的信息加载到模型中,并将原为1280*720的文件修改为608*608
class_names = read_classes("model_data/coco_classes.txt")
anchors = read_anchors("model_data/yolo_anchors.txt")
image_shape = (720., 1280.)
yolo.h5生成:
git clone https://github.com/allanzelener/YAD2K.git
cd YAD2K
下载 yolo.weights和yolo.cfg放到文件夹,命令行执行:python yad2k.py yolo.cfg yolo.weights model_data/yolo.h5
下载地址:http://pjreddie.com/media/files/yolo.weights
https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov2.cfg
yolo_model = load_model("model_data/yolo.h5")
yolo_model.summary()
yolo_outputs = yolo_head(yolo_model.output, anchors, len(class_names))
此时已将yolo-output数据添加到图表中,这组4个张量可以用作yolo-eval函数的输入
yolo-output以正确的格式为yolo-model提供了所有的预测boxes.现在要执行过滤并仅选择最佳box,调用 yolo-eval执行此操作。
scores, boxes, classes = yolo_eval(yolo_outputs, image_shape)
步骤:
1. 创建session
2. yolo_model.input 给到 yolo_model 计算输出 yolo_model.output
3. yolo_model.output 给到 yolo_head,转换为 yolo_output
4. yolo_output 经过过滤-yolo_eval,输出预测的接轨:scores, boxes, classes
Exercise: Implement predict()
方法输出:
当模型使用 BatchNorm 时,feed_dict {K.learning_phase(): 0} 中需要多一个占位符 placeholder
def predict(sess, image_file):
"""
Runs the graph stored in "sess" to predict boxes for "image_file". Prints and plots the preditions.
Arguments:
sess -- your tensorflow/Keras session containing the YOLO graph
image_file -- name of an image stored in the "images" folder.
Returns:
out_scores -- tensor of shape (None, ), scores of the predicted boxes
out_boxes -- tensor of shape (None, 4), coordinates of the predicted boxes
out_classes -- tensor of shape (None, ), class index of the predicted boxes
Note: "None" actually represents the number of predicted boxes, it varies between 0 and max_boxes.
"""
# Preprocess your image
image, image_data = preprocess_image("images/" + image_file, model_image_size = (608, 608))
# Run the session with the correct tensors and choose the correct placeholders in the feed_dict.
# You'll need to use feed_dict={yolo_model.input: ... , K.learning_phase(): 0})
### START CODE HERE ### (≈ 1 line)
out_scores, out_boxes, out_classes = sess.run([scores, boxes, classes],feed_dict={yolo_model.input: image_data,K.learning_phase(): 0})
### END CODE HERE ###
# Print predictions info
print('Found {} boxes for {}'.format(len(out_boxes), image_file))
# Generate colors for drawing bounding boxes.
colors = generate_colors(class_names)
# Draw bounding boxes on the image file
draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors)
# Save the predicted bounding box on the image
image.save(os.path.join("out", image_file), quality=90)
# Display the results in the notebook
output_image = scipy.misc.imread(os.path.join("out", image_file))
imshow(output_image)
return out_scores, out_boxes, out_classes
ut_scores, out_boxes, out_classes = predict(sess, "test.jpg")