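Andrew Ng Deep Learning assignment 4-3: YOLO object detection for autonomous driving

Today I'm finally filling in a gap I left earlier.

This week-3 assignment applies the YOLO model to object detection.

The two papers related to this project: Redmon et al., 2016 (https://arxiv.org/abs/1506.02640) and Redmon and Farhadi, 2016 (https://arxiv.org/abs/1612.08242).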

Contents

1. Problem statement

2. YOLO

2.1 - Model details

2.2 - Filtering with a threshold on class scores

2.3 - Non-max suppression

2.4 Wrapping up the filtering

3 - Test YOLO pretrained model on images

3.1 - Defining classes, anchors and image shape

3.2 - Loading a pretrained model

3.3 - Convert output of the model to usable bounding box tensors

3.4 - Filtering boxes

3.5 - Run the graph on an image

In this assignment we will learn to:

  • Use object detection on a car detection dataset
  • Deal with bounding boxes

Import the packages needed for this assignment. Any that are missing can be found and installed through Anaconda Navigator.

import argparse
import os
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
import scipy.io
import scipy.misc
import numpy as np
import pandas as pd
import PIL
import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Lambda, Conv2D
from keras.models import load_model, Model
from yolo_utils import read_classes, read_anchors, generate_colors, preprocess_image, draw_boxes, scale_boxes
from yad2k.models.keras_yolo import yolo_head, yolo_boxes_to_corners, preprocess_true_boxes, yolo_loss, yolo_body

%matplotlib inline

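Note: the Keras backend is imported as K, so Keras backend functions are called as K.function(...).

1. Problem statement

The first step is building the training set. A camera is mounted on the roof of the car to collect data, capturing one image every few seconds. The data for this assignment is provided by drive.ai.

The data is then labeled by drawing a red box around every car in each image, as shown below.

[Image 1]

Because training a YOLO model is computationally very expensive, we use pretrained weights.

2. YOLO

YOLO ("you only look once") needs only a single forward pass through the network to make predictions. After non-max suppression, it outputs the recognized objects together with their bounding boxes.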

2.1 - Model details 

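We use 5 anchor boxes, so every grid cell produces predictions for 5 boxes. Each box is described by 85 numbers: p_c, the 4 box coordinates, and 80 class probabilities.

[Image 2]

For simplicity, we flatten the last two dimensions of the (19, 19, 5, 85) encoding, so the output becomes (19, 19, 425).

[Image 3]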
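The flattening described above can be sketched in NumPy. This is only an illustration on a made-up array (the name `encoding` and its contents are assumptions, not model output); it shows that the reshape keeps every value and where one anchor's 85 numbers end up.

```python
import numpy as np

# Hypothetical stand-in for one image's YOLO encoding:
# 19x19 grid cells, 5 anchor boxes, 85 values per box.
encoding = np.arange(19 * 19 * 5 * 85, dtype=np.float32).reshape(19, 19, 5, 85)

# Flatten the last two dimensions: 5 anchors * 85 values = 425 values per cell.
flat = encoding.reshape(19, 19, 5 * 85)

# The reverse reshape recovers the original layout without losing anything.
restored = flat.reshape(19, 19, 5, 85)

print(flat.shape)                          # (19, 19, 425)
print(np.array_equal(encoding, restored))  # True

# Anchor 2 of cell (3, 7) occupies the slice [2*85 : 3*85] of the flat vector.
print(np.array_equal(flat[3, 7, 2 * 85:3 * 85], encoding[3, 7, 2]))  # True
```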

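Now, for each box in each grid cell, we compute the class scores by multiplying p_c with the class probabilities.

[Image 4]

Here is one way to visualize what YOLO predicts on an image:

  • For each of the 19x19 grid cells, find the maximum probability score (the maximum over both the 5 anchor boxes and the classes).
  • Color the grid cell according to the object it considers most likely.

The result looks like this:

[Image 5]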
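A minimal NumPy sketch of this visualization step, assuming the scores are already available as an array of shape (19, 19, 5, 80). The random `box_scores` below is a placeholder, not real model output.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical scores (p_c times class probabilities), shape (19, 19, 5, 80).
box_scores = rng.random((19, 19, 5, 80))

# Max over the 5 anchors (axis 2): best score per class in each cell.
per_class_best = box_scores.max(axis=2)        # shape (19, 19, 80)

# Most likely class in each cell, and the score that goes with it.
cell_class = per_class_best.argmax(axis=-1)    # shape (19, 19)
cell_score = per_class_best.max(axis=-1)       # shape (19, 19)

print(cell_class.shape, cell_score.shape)      # (19, 19) (19, 19)
```

Passing `cell_class` to an image plotting function such as imshow would then color each grid cell by its most likely class.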

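Another way to visualize YOLO's output is to plot the bounding boxes it predicts: different colors denote different classes, and different shapes correspond to different anchors.

[Image 6]

That still leaves far too many overlapping boxes, so we use non-max suppression to make the final detections more precise:

  • Get rid of boxes with a low score
  • When several boxes overlap and detect the same object, keep only one of them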

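2.2 - Filtering with a threshold on class scores

Apply a first filter by thresholding, to get rid of any box whose score is low.

- box_confidence: shape (19×19, 5, 1), containing p_c, the confidence that each anchor box contains some object
- boxes: shape (19×19, 5, 4), containing the box coordinates (b_x, b_y, b_h, b_w)
- box_class_probs: shape (19×19, 5, 80), containing the class probabilities (c_1, c_2, ..., c_80)

Exercise: implement yolo_filter_boxes()

1. Compute the box scores as the element-wise product of box_confidence and box_class_probs, as in this broadcasting example: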

a = np.random.randn(19*19, 5, 1)
b = np.random.randn(19*19, 5, 80)
c = a * b # shape of c will be (19*19, 5, 80)

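2. For each box, find:
  • the index of the class with the highest score (1 of 80)
  • the corresponding score
3. Create a filtering mask with the threshold. For example, ([0.9, 0.3, 0.4, 0.5, 0.1] < 0.4) returns [False, True, False, False, True]. The mask should be True for the boxes you want to keep.
4. Use TensorFlow to apply the mask to box_class_scores, boxes and box_classes, filtering out the boxes you don't want.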
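Before writing the TensorFlow version, the four steps can be tried out in plain NumPy. This is a sketch under assumptions: the inputs are random values in [0, 1) with the shapes listed above, and the variable `kept_boxes` is a name chosen here for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical inputs with the shapes listed above.
box_confidence = rng.random((19 * 19, 5, 1))
boxes = rng.random((19 * 19, 5, 4))
box_class_probs = rng.random((19 * 19, 5, 80))
threshold = 0.6

# Step 1: broadcasting multiplies p_c into all 80 class probabilities.
box_scores = box_confidence * box_class_probs   # shape (361, 5, 80)

# Step 2: best class index and its score for every box.
box_classes = box_scores.argmax(axis=-1)        # shape (361, 5)
box_class_scores = box_scores.max(axis=-1)      # shape (361, 5)

# Step 3: True for the boxes to keep.
mask = box_class_scores >= threshold

# Step 4: boolean indexing collapses the kept boxes into one leading dimension.
scores = box_class_scores[mask]                 # shape (n,)
kept_boxes = boxes[mask]                        # shape (n, 4)
classes = box_classes[mask]                     # shape (n,)

print(scores.shape, kept_boxes.shape, classes.shape)
```

The number of kept boxes n depends on the threshold, which is why the TensorFlow version reports its output shapes with an unknown dimension.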
 

# GRADED FUNCTION: yolo_filter_boxes

def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = .6):
    """Filters YOLO boxes by thresholding on object and class confidence.
    
    Arguments:
    box_confidence -- tensor of shape (19, 19, 5, 1)
    boxes -- tensor of shape (19, 19, 5, 4)
    box_class_probs -- tensor of shape (19, 19, 5, 80)
    threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box
    
    Returns:
    scores -- tensor of shape (None,), containing the class probability score for selected boxes
    boxes -- tensor of shape (None, 4), containing (b_x, b_y, b_h, b_w) coordinates of selected boxes
    classes -- tensor of shape (None,), containing the index of the class detected by the selected boxes
    
    Note: "None" is here because you don't know the exact number of selected boxes, as it depends on the threshold. 
    For example, the actual output size of scores would be (10,) if there are 10 boxes.
    """
    
    # Step 1: Compute box scores
    ### START CODE HERE ### (≈ 1 line)
    box_scores = box_confidence * box_class_probs  # p_c times the 80 class probabilities
    ### END CODE HERE ###

    # Step 2: Find the box_classes thanks to the max box_scores, keep track of the corresponding score
    ### START CODE HERE ### (≈ 2 lines)
    box_classes = K.argmax(box_scores, axis=-1)  # index of the highest-scoring class, shape (19, 19, 5)
    box_class_scores = K.max(box_scores, axis=-1)  # value of the highest score, shape (19, 19, 5)
    ### END CODE HERE ###

    # Step 3: Create a filtering mask based on "box_class_scores" by using "threshold". The mask should have the
    # same dimension as box_class_scores, and be True for the boxes you want to keep (with probability >= threshold)
    ### START CODE HERE ### (≈ 1 line)
    filtering_mask = box_class_scores >= threshold  # don't use A.eval() >= B; True where the score reaches the threshold
    ### END CODE HERE ###

    # Step 4: Apply the mask to scores, boxes and classes
    ### START CODE HERE ### (≈ 3 lines) keep the scores, boxes and class indices selected by the mask
    scores = tf.boolean_mask(box_class_scores, filtering_mask)
    boxes = tf.boolean_mask(boxes, filtering_mask)
    classes = tf.boolean_mask(box_classes, filtering_mask)
    ### END CODE HERE ###
    
    return scores, boxes, classes

Test with sample values:

with tf.Session() as test_a:
    box_confidence = tf.random_normal([19, 19, 5, 1], mean=1, stddev=4, seed = 1)  # set the mean, standard deviation and random seed
    boxes = tf.random_normal([19, 19, 5, 4], mean=1, stddev=4, seed = 1)
    box_class_probs = tf.random_normal([19, 19, 5, 80], mean=1, stddev=4, seed = 1)
    scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = 0.5)
    print("scores[2] = " + str(scores[2].eval()))
    print("boxes[2] = " + str(boxes[2].eval()))
    print("classes[2] = " + str(classes[2].eval()))
    print("scores.shape = " + str(scores.shape))
    print("boxes.shape = " + str(boxes.shape))
    print("classes.shape = " + str(classes.shape))

Output:

[Image 7]

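2.3 - Non-max suppression

[Image 8]

Exercise: implement iou()
In this exercise only, a box is defined by its two corners (upper left and lower right), (x1, y1, x2, y2), rather than by its midpoint and height/width.
The area of a box is its height (y2 - y1) times its width (x2 - x1).
You also need the coordinates (xi1, yi1, xi2, yi2) of the intersection of the two boxes:

  • xi1 = maximum of the x1 coordinates of the two boxes
  • yi1 = maximum of the y1 coordinates of the two boxes
  • xi2 = minimum of the x2 coordinates of the two boxes
  • yi2 = minimum of the y2 coordinates of the two boxes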
# GRADED FUNCTION: iou

def iou(box1, box2):
    """Implement the intersection over union (IoU) between box1 and box2
    
    Arguments:
    box1 -- first box, list object with coordinates (x1, y1, x2, y2)
    box2 -- second box, list object with coordinates (x1, y1, x2, y2)
    """

    # Calculate the (x1, y1, x2, y2) coordinates of the intersection of box1 and box2. Calculate its Area.
    ### START CODE HERE ### (≈ 5 lines)
    xi1 = max(box1[0], box2[0])
    yi1 = max(box1[1], box2[1])
    xi2 = min(box1[2], box2[2])
    yi2 = min(box1[3], box2[3])
    # Clamp at 0 so that boxes that do not overlap give an intersection area of 0
    inter_area = max(yi2 - yi1, 0) * max(xi2 - xi1, 0)
    ### END CODE HERE ###    

    # Calculate the Union area by using Formula: Union(A,B) = A + B - Inter(A,B)
    ### START CODE HERE ### (≈ 3 lines)
    box1_area = (box1[2]-box1[0])*(box1[3]-box1[1])
    box2_area = (box2[2]-box2[0])*(box2[3]-box2[1])
    union_area = box1_area + box2_area - inter_area
    ### END CODE HERE ###

    # compute the IoU
    ### START CODE HERE ### (≈ 1 line)
    iou = inter_area / union_area
    ### END CODE HERE ###
    return iou
box1 = (2, 1, 4, 3)
box2 = (1, 2, 3, 4) 
print("iou = " + str(iou(box1, box2)))
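As a sanity check, the example boxes can be worked by hand: the intersection is the 1x1 square from (2, 2) to (3, 3), each box has area 4, so the union is 4 + 4 - 1 = 7 and the IoU is 1/7, about 0.143. The sketch below repeats this with a hypothetical helper, iou_checked, that clamps the intersection width and height at zero. Without the clamp, two disjoint boxes can produce a negative width times a negative height, which looks like a positive overlap.

```python
def iou_checked(box1, box2):
    """IoU for boxes given as (x1, y1, x2, y2). Hypothetical helper for illustration."""
    # Width and height of the intersection, clamped at 0 for non-overlapping boxes.
    inter_w = max(min(box1[2], box2[2]) - max(box1[0], box2[0]), 0)
    inter_h = max(min(box1[3], box2[3]) - max(box1[1], box2[1]), 0)
    inter_area = inter_w * inter_h
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter_area / (area1 + area2 - inter_area)

overlapping = iou_checked((2, 1, 4, 3), (1, 2, 3, 4))
disjoint = iou_checked((0, 0, 1, 1), (2, 2, 3, 3))
print(overlapping)  # 1/7, about 0.1429
print(disjoint)     # 0.0
```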

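Now implement non-max suppression.

The key steps are:
1. Select the box that has the highest score.
2. Compute its IoU with every other box, and remove the boxes whose IoU with it is greater than iou_threshold.
3. Repeat steps 1 and 2 with the remaining boxes until none are left to process.

This removes every box that overlaps heavily with a selected box, leaving only the best ones.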
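The loop described above can be sketched in NumPy as follows. This is a minimal greedy version written for illustration (the name nms_sketch is hypothetical, and boxes are assumed to be (x1, y1, x2, y2) corners). It is not the TensorFlow implementation used in the exercise.

```python
import numpy as np

def nms_sketch(boxes, scores, iou_threshold=0.5):
    """Greedy non-max suppression. Returns the indices of kept boxes, best first."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]   # box indices sorted by descending score
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        best = order[0]                # step 1: highest-scoring remaining box
        keep.append(int(best))
        rest = order[1:]
        # Step 2: IoU of the selected box with every remaining box.
        w = np.maximum(np.minimum(boxes[best, 2], boxes[rest, 2])
                       - np.maximum(boxes[best, 0], boxes[rest, 0]), 0)
        h = np.maximum(np.minimum(boxes[best, 3], boxes[rest, 3])
                       - np.maximum(boxes[best, 1], boxes[rest, 1]), 0)
        inter = w * h
        iou = inter / (areas[best] + areas[rest] - inter)
        # Step 3: drop heavy overlaps, then repeat on what is left.
        order = rest[iou <= iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms_sketch(boxes, scores)
print(kept)  # [0, 2]
```

In this example the second box overlaps the first with an IoU of 81/119, about 0.68, so it is suppressed, while the third box does not overlap anything and is kept.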

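Exercise: implement yolo_non_max_suppression() using TensorFlow
TensorFlow and the Keras backend provide two built-in functions that are useful here (so you do not actually need your own iou() implementation):

  • tf.image.non_max_suppression()
  • K.gather()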
# GRADED FUNCTION: yolo_non_max_suppression

def yolo_non_max_suppression(scores, boxes, classes, max_boxes = 10, iou_threshold = 0.5):
    """
    Applies Non-max suppression (NMS) to set of boxes
    
    Arguments:
    scores -- tensor of shape (None,), output of yolo_filter_boxes()
    boxes -- tensor of shape (None, 4), output of yolo_filter_boxes() that have been scaled to the image size (see later)
    classes -- tensor of shape (None,), output of yolo_filter_boxes()
    max_boxes -- integer, maximum number of predicted boxes you'd like
    iou_threshold -- real value, "intersection over union" threshold used for NMS filtering
    
    Returns:
    scores -- tensor of shape (, None), predicted score for each box
    boxes -- tensor of shape (4, None), predicted box coordinates
    classes -- tensor of shape (, None), predicted class for each box
    
    Note: The "None" dimension of the output tensors has obviously to be less than max_boxes. Note also that this
    function will transpose the shapes of scores, boxes, classes. This is made for convenience.
    """
    
    max_boxes_tensor = K.variable(max_boxes, dtype='int32')     # tensor to be used in tf.image.non_max_suppression()
    K.get_session().run(tf.variables_initializer([max_boxes_tensor])) # initialize variable max_boxes_tensor
    
    # Use tf.image.non_max_suppression() to get the list of indices corresponding to boxes you keep
    ### START CODE HERE ### (≈ 1 line)
    nms_indices = tf.image.non_max_suppression(boxes, scores, max_boxes_tensor, iou_threshold=iou_threshold)
    ### END CODE HERE ###

    # Use K.gather() to select only nms_indices from scores, boxes and classes
    ### START CODE HERE ### (≈ 3 lines)
    scores = K.gather(scores, nms_indices)
    boxes = K.gather(boxes, nms_indices)
    classes = K.gather(classes, nms_indices)
    ### END CODE HERE ###
    
    return scores, boxes, classes
with tf.Session() as test_b:
    scores = tf.random_normal([54,], mean=1, stddev=4, seed = 1)
    boxes = tf.random_normal([54, 4], mean=1, stddev=4, seed = 1)
    classes = tf.random_normal([54,], mean=1, stddev=4, seed = 1)
    scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes)
    print("scores[2] = " + str(scores[2].eval()))
    print("boxes[2] = " + str(boxes[2].eval()))
    print("classes[2] = " + str(classes[2].eval()))
    print("scores.shape = " + str(scores.eval().shape))
    print("boxes.shape = " + str(boxes.eval().shape))
    print("classes.shape = " + str(classes.eval().shape))

[Image 9]

 

 2.4 Wrapping up the filtering

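It's time to implement a function that takes the output of the deep CNN (the 19x19x5x85 dimensional encoding) and filters all the boxes using the functions just implemented.

Exercise: implement yolo_eval()

yolo_eval takes the output of the YOLO encoding and filters the boxes using score thresholding and non-max suppression.

There are several ways to represent a box, for example by its upper-left and lower-right corners, or by its center and width/height. YOLO converts between these representations at different points in the pipeline.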
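A conversion between the two representations can be sketched in NumPy. The helper name center_to_corners and the (x1, y1, x2, y2) output order are assumptions made for this illustration. The course helper yolo_boxes_to_corners works on tensors and may order the coordinates differently (for example y before x), so check its source before relying on a particular order.

```python
import numpy as np

def center_to_corners(box_xy, box_wh):
    """Convert (x, y) centers and (w, h) sizes into (x1, y1, x2, y2) corners."""
    box_xy = np.asarray(box_xy, dtype=float)
    box_wh = np.asarray(box_wh, dtype=float)
    mins = box_xy - box_wh / 2.0    # upper-left corner
    maxes = box_xy + box_wh / 2.0   # lower-right corner
    return np.concatenate([mins, maxes], axis=-1)

# One box centered at (0.5, 0.5) with width 0.2 and height 0.4.
corners = center_to_corners([[0.5, 0.5]], [[0.2, 0.4]])
print(corners)  # corners at (0.4, 0.3) and (0.6, 0.7)
```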

 

# GRADED FUNCTION: yolo_eval

def yolo_eval(yolo_outputs, image_shape = (720., 1280.), max_boxes=10, score_threshold=.6, iou_threshold=.5):
    """
    Converts the output of YOLO encoding (a lot of boxes) to your predicted boxes along with their scores, box coordinates and classes.
    
    Arguments:
    yolo_outputs -- output of the encoding model (for image_shape of (608, 608, 3)), contains 4 tensors:
                    box_confidence: tensor of shape (None, 19, 19, 5, 1)
                    box_xy: tensor of shape (None, 19, 19, 5, 2)
                    box_wh: tensor of shape (None, 19, 19, 5, 2)
                    box_class_probs: tensor of shape (None, 19, 19, 5, 80)
    image_shape -- tensor of shape (2,) containing the input shape, in this notebook we use (608., 608.) (has to be float32 dtype)
    max_boxes -- integer, maximum number of predicted boxes you'd like
    score_threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box
    iou_threshold -- real value, "intersection over union" threshold used for NMS filtering
    
    Returns:
    scores -- tensor of shape (None, ), predicted score for each box
    boxes -- tensor of shape (None, 4), predicted box coordinates
    classes -- tensor of shape (None,), predicted class for each box
    """
    
    ### START CODE HERE ###

    # Retrieve outputs of the YOLO model (≈1 line)
    box_confidence, box_xy, box_wh, box_class_probs = yolo_outputs[:]

    # Convert boxes to be ready for filtering functions
    boxes = yolo_boxes_to_corners(box_xy, box_wh)

    # Use one of the functions you've implemented to perform Score-filtering with a threshold of score_threshold (≈1 line)
    scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, score_threshold)

    # Scale boxes back to original image shape.
    boxes = scale_boxes(boxes, image_shape)

    # Use one of the functions you've implemented to perform Non-max suppression with a threshold of iou_threshold (≈1 line)
    scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes, max_boxes, iou_threshold)

    ### END CODE HERE ###
    
    return scores, boxes, classes
with tf.Session() as test_b:  # randomly initialized outputs that together make up a (19, 19, 5, 85) encoding
    yolo_outputs = (tf.random_normal([19, 19, 5, 1], mean=1, stddev=4, seed = 1),
                    tf.random_normal([19, 19, 5, 2], mean=1, stddev=4, seed = 1),
                    tf.random_normal([19, 19, 5, 2], mean=1, stddev=4, seed = 1),
                    tf.random_normal([19, 19, 5, 80], mean=1, stddev=4, seed = 1))
    scores, boxes, classes = yolo_eval(yolo_outputs)
    print("scores[2] = " + str(scores[2].eval()))
    print("boxes[2] = " + str(boxes[2].eval()))
    print("classes[2] = " + str(classes[2].eval()))
    print("scores.shape = " + str(scores.eval().shape))
    print("boxes.shape = " + str(boxes.eval().shape))
    print("classes.shape = " + str(classes.eval().shape))

[Image 10]

Overall steps:

[Image 11]

3 - Test YOLO pretrained model on images

Create a session to run the computation graph:

sess = K.get_session()

3.1 - Defining classes, anchors and image shape 

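Load the class names and anchors from their files. The original images are 1280x720 and are resized to 608x608 before being fed to the model.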

class_names = read_classes("model_data/coco_classes.txt")
anchors = read_anchors("model_data/yolo_anchors.txt")
image_shape = (720., 1280.)    
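The reason image_shape is kept at (720., 1280.) is that predicted boxes have to be mapped back onto the original image for drawing. A rough sketch of that scaling is shown below, assuming boxes are normalized to [0, 1] and ordered (y1, x1, y2, x2). The helper scale_boxes_sketch is hypothetical, and the actual scale_boxes in yolo_utils operates on tensors and may differ in detail.

```python
import numpy as np

def scale_boxes_sketch(boxes, image_shape):
    """Scale boxes normalized to [0, 1], ordered (y1, x1, y2, x2), to pixel coordinates."""
    height, width = image_shape
    # y coordinates scale with the image height, x coordinates with its width.
    image_dims = np.array([height, width, height, width], dtype=float)
    return np.asarray(boxes, dtype=float) * image_dims

pixel_boxes = scale_boxes_sketch([[0.25, 0.5, 0.75, 1.0]], (720., 1280.))
print(pixel_boxes)  # y1=180, x1=640, y2=540, x2=1280
```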

 3.2 - Loading a pretrained model

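Generating yolo.h5:

git clone https://github.com/allanzelener/YAD2K.git

cd YAD2K

Download yolo.weights and the cfg file into this folder, then run on the command line: python yad2k.py yolo.cfg yolo.weights model_data/yolo.h5 (the cfg link below downloads as yolov2.cfg, so rename it to yolo.cfg or pass that file name instead).

Download links: http://pjreddie.com/media/files/yolo.weights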

https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov2.cfg 

yolo_model = load_model("model_data/yolo.h5")
yolo_model.summary()

3.3 - Convert output of the model to usable bounding box tensors

yolo_outputs = yolo_head(yolo_model.output, anchors, len(class_names))

yolo_outputs has now been added to the graph. This set of 4 tensors can be used as input to the yolo_eval function.

3.4 - Filtering boxes

yolo_outputs gives all of yolo_model's predicted boxes in the correct format. Now filter them and keep only the best boxes by calling yolo_eval.

scores, boxes, classes = yolo_eval(yolo_outputs, image_shape)

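3.5 - Run the graph on an image

Steps:
1. Create a session.
2. yolo_model.input is fed to yolo_model, which computes yolo_model.output.
3. yolo_model.output is passed to yolo_head, which converts it into yolo_outputs.
4. yolo_outputs goes through the filtering function yolo_eval, which outputs the predictions: scores, boxes, classes.

Exercise: Implement predict()

predict() calls preprocess_image, which returns:

  • image: a PIL representation of the image, used for drawing boxes
  • image_data: a numpy array representing the image, which is the input to the CNN

When a model uses BatchNorm, as YOLO does, the feed_dict needs one additional placeholder: {K.learning_phase(): 0}.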

def predict(sess, image_file):
    """
    Runs the graph stored in "sess" to predict boxes for "image_file". Prints and plots the predictions.
    
    Arguments:
    sess -- your tensorflow/Keras session containing the YOLO graph
    image_file -- name of an image stored in the "images" folder.
    
    Returns:
    out_scores -- tensor of shape (None, ), scores of the predicted boxes
    out_boxes -- tensor of shape (None, 4), coordinates of the predicted boxes
    out_classes -- tensor of shape (None, ), class index of the predicted boxes
    
    Note: "None" actually represents the number of predicted boxes, it varies between 0 and max_boxes. 
    """

    # Preprocess your image
    image, image_data = preprocess_image("images/" + image_file, model_image_size = (608, 608))

    # Run the session with the correct tensors and choose the correct placeholders in the feed_dict.
    # You'll need to use feed_dict={yolo_model.input: ... , K.learning_phase(): 0})
    ### START CODE HERE ### (≈ 1 line)
    out_scores, out_boxes, out_classes = sess.run([scores, boxes, classes],feed_dict={yolo_model.input: image_data,K.learning_phase(): 0})
    ### END CODE HERE ###

    # Print predictions info
    print('Found {} boxes for {}'.format(len(out_boxes), image_file))
    # Generate colors for drawing bounding boxes.
    colors = generate_colors(class_names)
    # Draw bounding boxes on the image file
    draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors)
    # Save the predicted bounding box on the image
    image.save(os.path.join("out", image_file), quality=90)
    # Display the results in the notebook
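    # Note: scipy.misc.imread was removed in SciPy 1.2; on newer versions plt.imread is one alternative
    output_image = scipy.misc.imread(os.path.join("out", image_file))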
    imshow(output_image)
    
    return out_scores, out_boxes, out_classes
out_scores, out_boxes, out_classes = predict(sess, "test.jpg")

[Image 12]

[Image 13]
