2 - YOLO
2.1 - Model Details
First things to know:
- The input is a batch of images of shape (m, 608, 608, 3)
- The output is a list of bounding boxes along with the recognized classes. Each bounding box is represented by 6 numbers (pc, bx, by, bh, bw, c) as explained above. If you expand c into an 80-dimensional vector, each bounding box is then represented by 85 numbers.
We will use 5 anchor boxes. So you can think of the YOLO architecture as the following: IMAGE (m, 608, 608, 3) -> DEEP CNN -> ENCODING (m, 19, 19, 5, 85).
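In short:
- The input is a batch of images of shape (m, 608, 608, 3), where m is the number of examples and each image is 608x608 pixels with 3 channels (RGB).
- The output is a list of bounding boxes with class labels. Each box is described by 6 numbers: the probability pc that the box contains an object, the center coordinates (bx, by), the height and width (bh, bw), and the class index c. Expanding c into an 80-dimensional vector gives 85 numbers per box.
- We use 5 anchor boxes, so the YOLO architecture ends up producing an encoding of shape (m, 19, 19, 5, 85).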
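The shapes above can be checked with a few lines of NumPy. This is only a sketch: the batch size `m = 2`, the zero-filled stand-in tensor, and the ordering of the 85 numbers as `[pc, bx, by, bh, bw, c_1..c_80]` are assumptions made for illustration, not something the model code is known to guarantee.

```python
import numpy as np

m = 2  # hypothetical batch size

# Stand-in for the CNN output with the anchors flattened: (m, 19, 19, 425)
flat = np.zeros((m, 19, 19, 5 * 85), dtype=np.float32)

# 425 = 5 anchor boxes x 85 numbers per box
encoding = flat.reshape(m, 19, 19, 5, 85)

# Assumed layout of each 85-vector: [pc, bx, by, bh, bw, c_1 .. c_80]
pc = encoding[..., 0:1]          # (m, 19, 19, 5, 1)
box_xyhw = encoding[..., 1:5]    # (m, 19, 19, 5, 4)
class_probs = encoding[..., 5:]  # (m, 19, 19, 5, 80)

print(encoding.shape, pc.shape, box_xyhw.shape, class_probs.shape)
```

Slicing with `0:1` rather than `0` keeps the trailing dimension of size 1, which is what makes the later elementwise product with the class probabilities broadcast cleanly.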
Now, for each box (of each cell) we will compute the following elementwise product and extract a probability that the box contains a certain class.
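In short: multiplying the confidence pc by the 80 class probabilities gives one score per class, which is how likely the algorithm considers that class to be present in the box. The box itself is located by (bx, by, bh, bw), and c identifies the class.
Here's one way to visualize what YOLO is predicting on an image: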
- For each of the 19x19 grid cells, find the maximum of the probability scores (taking a max across both the 5 anchor boxes and across different classes).
- Color that grid cell according to what object that grid cell considers the most likely.
Doing this results in this picture:
Note that this visualization isn't a core part of the YOLO algorithm itself for making predictions; it's just a nice way of visualizing an intermediate result of the algorithm.
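As a rough illustration of the per-cell maximum described above, the following NumPy sketch computes, for each of the 19x19 cells, the best score and the class it belongs to. The random tensors are stand-ins for one image's predictions, and the names `cell_best_score` and `cell_best_class` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for one image's predictions (not real model output)
box_confidence = rng.random((19, 19, 5, 1))
box_class_probs = rng.random((19, 19, 5, 80))

# One score per class for every box: (19, 19, 5, 80)
box_scores = box_confidence * box_class_probs

# Take the max across both the 5 anchors and the 80 classes
# by flattening the last two axes into one of length 400.
flat_scores = box_scores.reshape(19, 19, 5 * 80)
cell_best_score = flat_scores.max(axis=-1)          # (19, 19)

# Flattened index = anchor * 80 + class, so the class is index % 80.
cell_best_class = flat_scores.argmax(axis=-1) % 80  # (19, 19)

print(cell_best_score.shape, cell_best_class.shape)
```

Coloring each cell by `cell_best_class` would give a picture of the kind described; the plotting itself is left out here.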
Another way to visualize YOLO's output is to plot the bounding boxes that it outputs. Doing that results in a visualization like this:
In the figure above, we plotted only boxes that the model had assigned a high probability to, but this is still too many boxes. You'd like to filter the algorithm's output down to a much smaller number of detected objects. To do so, you'll use non-max suppression. Specifically, you'll carry out these steps:
- Get rid of boxes with a low score (meaning, the box is not very confident about detecting a class)
- Select only one box when several boxes overlap with each other and detect the same object.
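To summarize, there are two ways to visualize the predictions:
- First, for each of the 19x19 grid cells, color the cell according to the highest-scoring class among its 5 anchor boxes.
- Second, draw the bounding box of every detected object.
With the second approach there are still too many boxes, even though only the high-probability ones are drawn, so two more steps follow:
- Discard the boxes with a low score.
- When several boxes overlap on the same object, keep only one of them.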
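The two steps can be sketched in plain Python. Several things here are assumptions rather than part of the text above: boxes are given as corners (x1, y1, x2, y2), the overlap test uses the common greedy formulation (keep the highest-scoring box, drop the ones that overlap it too much), class labels are ignored, and the function name `filter_and_suppress` and both threshold defaults are hypothetical.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]), 0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]), 0)
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_and_suppress(boxes, scores, score_threshold=0.6, iou_threshold=0.5):
    """Return the indices of the boxes that survive both steps."""
    # Step 1: get rid of boxes with a low score.
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]

    # Step 2: visit boxes from highest to lowest score and keep a box
    # only if it does not overlap an already kept box too much.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Made-up example: boxes 0 and 1 cover nearly the same area, box 3 scores too low.
boxes = [(0, 0, 2, 2), (0.1, 0.1, 2.1, 2.1), (3, 3, 4, 4), (5, 5, 6, 6)]
scores = [0.9, 0.8, 0.7, 0.2]
kept = filter_and_suppress(boxes, scores)
print(kept)  # [0, 2]
```

Box 1 is dropped because its IoU with box 0 is about 0.82, well above the assumed threshold of 0.5.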
2.2 - Filtering with a threshold on class scores
You are going to apply a first filter by thresholding. You would like to get rid of any box for which the class "score" is less than a chosen threshold.
The model gives you a total of 19x19x5x85 numbers, with each box described by 85 numbers. It'll be convenient to rearrange the (19,19,5,85) (or (19,19,425)) dimensional tensor into the following variables:
- box_confidence: tensor of shape (19 x 19, 5, 1) containing pc (confidence probability that there's some object) for each of the 5 boxes predicted in each of the 19x19 cells.
- boxes: tensor of shape (19 x 19, 5, 4) containing (b_x, b_y, b_h, b_w) for each of the 5 boxes per cell.
- box_class_probs: tensor of shape (19 x 19, 5, 80) containing the detection probabilities (c_1, c_2, ... c_{80}) for each of the 80 classes for each of the 5 boxes per cell.
Exercise: Implement yolo_filter_boxes().
- Compute box scores by doing the elementwise product as described in Figure 4. The following code may help you choose the right operator:
a = np.random.randn(19*19, 5, 1)
b = np.random.randn(19*19, 5, 80)
c = a * b # shape of c will be (19*19, 5, 80)
- For each box, find:
  - the index of the class with the maximum box score (be careful with what axis you choose; consider using axis=-1)
  - the corresponding box score (be careful with what axis you choose; consider using axis=-1)
- Create a mask by using a threshold. As a reminder: ([0.9, 0.3, 0.4, 0.5, 0.1] < 0.4) returns: [False, True, False, False, True]. The mask should be True for the boxes you want to keep.
- Use TensorFlow to apply the mask to box_class_scores, boxes and box_classes to filter out the boxes we don't want. You should be left with just the subset of boxes you want to keep.
Reminder: to call a Keras function, you should use K.function(...).
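In other words:
- box_confidence: the confidence pc for each of the 5 anchor boxes in each of the 19x19 cells.
- boxes: the bounding box (b_x, b_y, b_h, b_w) of each of the 5 anchor boxes in each of the 19x19 cells; the meaning of the 4 parameters was explained earlier.
- box_class_probs: the probabilities of the 80 classes for each of the 19x19x5 anchor boxes.
Implementing yolo_filter_boxes():
- Compute the operation in Figure 4 with a plain multiplication, box_confidence * box_class_probs. The size-1 last dimension of box_confidence is broadcast automatically, so the result has shape (19 x 19, 5, 80), one score per class per box.
- For each of the 19x19x5 anchor boxes, find:
  - the index of the class with the highest score (the maximum over the 80 classes)
  - the score of that class
- Create a mask that is True for the anchor boxes you want to keep.
- Use TensorFlow to apply the mask to box_class_scores, boxes and box_classes, filtering out the unwanted boxes so that only the subset you want to keep remains.
Note: to call a Keras function, use K.function(...).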
# GRADED FUNCTION: yolo_filter_boxes
# Imports assumed from the surrounding notebook (Keras backend on TensorFlow).
import tensorflow as tf
from keras import backend as K

def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = .6):
    """Filters YOLO boxes by thresholding on object and class confidence.
    Arguments:
    box_confidence -- tensor of shape (19, 19, 5, 1)
    boxes -- tensor of shape (19, 19, 5, 4)
    box_class_probs -- tensor of shape (19, 19, 5, 80)
    threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box
    Returns:
    scores -- tensor of shape (None,), containing the class probability score for selected boxes
    boxes -- tensor of shape (None, 4), containing (b_x, b_y, b_h, b_w) coordinates of selected boxes
    classes -- tensor of shape (None,), containing the index of the class detected by the selected boxes
    Note: "None" is here because you don't know the exact number of selected boxes, as it depends on the threshold.
    For example, the actual output size of scores would be (10,) if there are 10 boxes.
    """
    # Step 1: Compute box scores
    ### START CODE HERE ### (≈ 1 line) compute the per-class scores
    box_scores = box_confidence * box_class_probs
    ### END CODE HERE ###
    # Step 2: Find the box_classes thanks to the max box_scores, keep track of the corresponding score
    ### START CODE HERE ### (≈ 2 lines)
    # index of the highest-scoring class for each box, shape (19, 19, 5)
    box_classes = K.argmax(box_scores, axis=-1)
    # value of that highest score for each box, shape (19, 19, 5)
    box_class_scores = K.max(box_scores, axis=-1, keepdims=False)
    ### END CODE HERE ###
    # Step 3: Create a filtering mask based on "box_class_scores" by using "threshold". The mask should have the
    # same dimension as box_class_scores, and be True for the boxes you want to keep (with probability >= threshold)
    ### START CODE HERE ### (≈ 1 line)
    # the mask is True where the best score is at least threshold
    filtering_mask = box_class_scores >= threshold
    ### END CODE HERE ###
    # Step 4: Apply the mask to scores, boxes and classes
    ### START CODE HERE ### (≈ 3 lines) keep the scores, boxes and classes selected by the mask
    scores = tf.boolean_mask(box_class_scores, filtering_mask)
    boxes = tf.boolean_mask(boxes, filtering_mask)
    classes = tf.boolean_mask(box_classes, filtering_mask)
    ### END CODE HERE ###
    return scores, boxes, classes
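To check the logic by hand, it can help to mirror the four steps in NumPy on a tiny made-up input. The sketch below is not the graded TensorFlow function: `filter_boxes_np` is a hypothetical name, boolean indexing stands in for `tf.boolean_mask`, and the 1x1 grid with 2 anchors and 3 classes is chosen only to keep the arithmetic easy to follow.

```python
import numpy as np


def filter_boxes_np(box_confidence, boxes, box_class_probs, threshold=0.6):
    """NumPy analogue of the four filtering steps (illustration only)."""
    box_scores = box_confidence * box_class_probs  # broadcast over classes
    box_classes = box_scores.argmax(axis=-1)       # best class per box
    box_class_scores = box_scores.max(axis=-1)     # its score
    mask = box_class_scores >= threshold           # True = keep
    return box_class_scores[mask], boxes[mask], box_classes[mask]


# Tiny made-up input: 1x1 grid, 2 anchor boxes, 3 classes
box_confidence = np.array([[[[0.9], [0.5]]]])                  # (1, 1, 2, 1)
boxes = np.array([[[[0.5, 0.5, 0.2, 0.2],
                    [0.4, 0.4, 0.1, 0.3]]]])                   # (1, 1, 2, 4)
box_class_probs = np.array([[[[0.1, 0.8, 0.1],
                              [0.6, 0.3, 0.1]]]])              # (1, 1, 2, 3)

scores, kept_boxes, classes = filter_boxes_np(box_confidence, boxes, box_class_probs)
print(scores, kept_boxes.shape, classes)
```

The first box has a best score of 0.9 * 0.8 = 0.72 for class 1 and is kept; the second peaks at 0.5 * 0.6 = 0.3 and falls below the 0.6 threshold, so a single box remains.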
2.3 - Non-max suppression
Even after filtering by thresholding over the class scores, you still end up with a lot of overlapping boxes. A second filter for selecting the right boxes is called non-maximum suppression (NMS).
Non-max suppression uses the very important function called "Intersection over Union", or IoU.
Exercise: Implement iou(). Some hints:
- In this exercise only, we define a box using its two corners (upper left and lower right): (x1, y1, x2, y2) rather than the midpoint and height/width.
- To calculate the area of a rectangle you need to multiply its height (y2 - y1) by its width (x2 - x1).
- You'll also need to find the coordinates (xi1, yi1, xi2, yi2) of the intersection of two boxes. Remember that:
  - xi1 = maximum of the x1 coordinates of the two boxes
  - yi1 = maximum of the y1 coordinates of the two boxes
  - xi2 = minimum of the x2 coordinates of the two boxes
  - yi2 = minimum of the y2 coordinates of the two boxes
- In order to compute the intersection area, you need to make sure the height and width of the intersection are positive, otherwise the intersection area should be zero. Use max(height, 0) and max(width, 0).
In this code, we use the convention that (0,0) is the top-left corner of an image, (1,0) is the upper-right corner, and (1,1) the lower-right corner.
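Andrew Ng covers non-max suppression clearly in the course, so here is only a brief recap:
Even after filtering the class scores with the mask, many overlapping bounding boxes remain (see Figure 7). The next filter used to select the right boxes is called non-maximum suppression (NMS).
NMS relies on a very important function, Intersection over Union (IoU), shown in Figure 8.
Exercise: implement the iou() function.
- In this exercise only, a box is defined by two corners (x1, y1, x2, y2) rather than by its midpoint and width/height.
- The area of a rectangle is its height (y2 - y1) multiplied by its width (x2 - x1).
- In this code, (0,0) is the top-left corner of the image and (1,1) is the lower-right corner.
- You also need the corner coordinates (xi1, yi1, xi2, yi2) of the intersection of the two boxes:
  - xi1 = maximum of the two boxes' x1 coordinates (upper-left corners)
  - yi1 = maximum of the two boxes' y1 coordinates (upper-left corners)
  - xi2 = minimum of the two boxes' x2 coordinates (lower-right corners)
  - yi2 = minimum of the two boxes' y2 coordinates (lower-right corners)
- To compute the intersection area, make sure the width and height of the intersection are positive; otherwise the intersection area is zero. Use max(height, 0) and max(width, 0).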
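Following the hints above step by step, one possible iou() looks like the sketch below. It assumes plain Python tuples in (x1, y1, x2, y2) corner format; the guard against a zero union is an addition for safety and is not part of the hints.

```python
def iou(box1, box2):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    xi1 = max(box1[0], box2[0])
    yi1 = max(box1[1], box2[1])
    xi2 = min(box1[2], box2[2])
    yi2 = min(box1[3], box2[3])

    # Clamp to zero so non-overlapping boxes give an empty intersection
    inter_width = max(xi2 - xi1, 0)
    inter_height = max(yi2 - yi1, 0)
    inter_area = inter_width * inter_height

    # Union = area of box1 + area of box2 - intersection
    box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
    box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union_area = box1_area + box2_area - inter_area

    # Guard against degenerate boxes with zero total area (an added assumption)
    return inter_area / union_area if union_area > 0 else 0.0


overlapping = iou((2, 1, 4, 3), (1, 2, 3, 4))
disjoint = iou((0, 0, 1, 1), (2, 2, 3, 3))
print(overlapping, disjoint)
```

In the first call the two 2x2 boxes share a 1x1 square, so the intersection is 1, the union is 4 + 4 - 1 = 7, and the IoU is 1/7. In the second call the boxes do not touch, the clamped width and height are both 0, and the IoU is 0.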