YOLOv3 Explained (Principles, Architecture, and a Keras Code Walkthrough)

Chinese translation of the YOLOv3 paper: https://zhuanlan.zhihu.com/p/34945787

GitHub repo: https://github.com/qqwweee/keras-yolo3

Basic idea of the algorithm:

First, a feature-extraction network computes a feature map of a certain size from the input image, for example 13*13, and the input image is correspondingly divided into 13*13 grid cells. If the center of a ground-truth object falls inside a grid cell, that cell is responsible for predicting the object. Each grid cell predicts a fixed number of bounding boxes (2 in YOLO v1, 5 in YOLO v2, 3 in YOLO v3, each with a different initial size). Which of these boxes ends up predicting the object? The one whose IOU with the ground truth is the largest. The output feature map therefore has two spatial dimensions matching the extracted features, e.g. 13*13, plus a depth dimension of B*(5+C) (note: in YOLO v1 it was B*5+C), where B is the number of bounding boxes predicted per grid cell (2 in v1, 5 in v2, 3 per scale in v3), C is the number of classes (there is no background class, so C is 20 for the VOC dataset), and 5 covers the 4 box coordinates plus one objectness score.
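For instance, the output depth works out as follows (a quick standalone check, not repo code):

B, C = 3, 20          # YOLO v3: 3 boxes per grid cell; VOC: 20 classes
print(B * (5 + C))    # 75 -> the 13x13 VOC output tensor has shape (13, 13, 75)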
Bounding-box coordinate prediction follows YOLO v2. In short, the equations below decode the boxes: tx, ty, tw, th are the raw model outputs. cx and cy are the coordinates of the grid cell; if a layer's feature map is 13*13, there are 13*13 grid cells, and the cell in row 0, column 1 has cx = 1 and cy = 0 (cx is the column offset, cy the row offset). pw and ph are the width and height of the prior (anchor) box before prediction. bx, by, bw, and bh are the decoded center coordinates and size of the predicted bounding box. The coordinate loss is a squared-error loss.

bx = σ(tx) + cx
by = σ(ty) + cy
bw = pw * e^tw
bh = ph * e^th

(σ is the sigmoid function.)
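As a minimal standalone sketch of these equations (plain NumPy with made-up values; the repo does this inside yolo_head, shown later):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

tx, ty, tw, th = 0.2, -0.1, 0.4, 0.3   # hypothetical raw network outputs
cx, cy = 7, 5                          # column/row offsets of the responsible grid cell
pw, ph = 116, 90                       # one COCO anchor, in pixels of the 416x416 input

bx = (sigmoid(tx) + cx) * (416 / 13)   # box center x in input pixels
by = (sigmoid(ty) + cy) * (416 / 13)   # box center y in input pixels
bw = pw * np.exp(tw)                   # box width in input pixels
bh = ph * np.exp(th)                   # box height in input pixels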

For class prediction, the single-label classification of earlier versions is replaced with multi-label classification: the softmax layer used for single-label multi-class prediction is swapped for independent logistic regressions, one per class. Why make this change? Softmax assumes each image (or object) belongs to exactly one class, but in complex scenes an object can belong to several classes. For example, if your label set contains both woman and person, then an image containing a woman should be detected with both the woman and person labels. That is multi-label classification, so each class gets its own binary classifier. The logistic layer uses the sigmoid function, which squashes its input into the range 0 to 1; if the sigmoid output for a class exceeds 0.5, the box is assigned that class.
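A small illustration of why sigmoid replaces softmax (hypothetical logits, not repo code): softmax forces classes to compete for a single label, while independent sigmoids can fire together:

import numpy as np

logits = np.array([2.0, 1.8, -3.0])             # hypothetical scores for [person, woman, car]
softmax = np.exp(logits) / np.exp(logits).sum()
sigmoid = 1.0 / (1.0 + np.exp(-logits))
print(softmax)  # ~[0.55, 0.45, 0.00] -> probabilities compete, only one label can win
print(sigmoid)  # ~[0.88, 0.86, 0.05] -> person AND woman both exceed 0.5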

YOLO v3 predicts at multiple fused scales. YOLO v2 had a layer called the passthrough layer: assuming the final feature map is 13*13, that layer concatenated the 26*26 feature map of an earlier layer with the current 13*13 one, somewhat like ResNet. This was done to improve accuracy on small objects. YOLO v3 pushes the idea further with FPN-style upsampling and fusion (three scales are fused in the end; the other two are 26*26 and 52*52) and runs detection on feature maps at multiple scales, which noticeably improves small-object detection. As mentioned above, each grid cell in YOLO v3 predicts 3 bounding boxes, which looks like fewer than YOLO v2's 5, but it is not! Because of the multi-scale fusion, the total number of boxes is much larger. For a 416*416 input, compare (13*13 + 26*26 + 52*52)*3 = 10647 boxes with YOLO v2's 13*13*5 = 845; which is larger should be clear.

The initial bounding-box sizes are still obtained with the k-means clustering from YOLO v2. This prior knowledge helps box initialization considerably; after all, while more boxes give some guarantee on accuracy, they cost a lot of speed. The 9 clusters the author obtained on COCO are (10*13), (16*30), (33*23), (30*61), (62*45), (59*119), (116*90), (156*198), (373*326), presumably computed for a 416*416 input size.
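The clustering itself is easy to sketch. Below is an illustrative reimplementation (not the author's script): it assumes boxes is an (N, 2) array of ground-truth widths/heights, uses 1 - IOU as the distance as described in the YOLO v2 paper, and updates centers with the median (a common variant):

import numpy as np

def iou_wh(boxes, centers):
    # IOU between width/height pairs, with all boxes anchored at the origin
    inter = np.minimum(boxes[:, None, 0], centers[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], centers[None, :, 1])
    union = boxes[:, None, 0] * boxes[:, None, 1] + \
            centers[None, :, 0] * centers[None, :, 1] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    boxes = np.asarray(boxes, dtype=np.float64)
    rng = np.random.RandomState(seed)
    centers = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, centers), axis=1)  # nearest center by 1 - IOU
        for c in range(k):
            if np.any(assign == c):
                centers[c] = np.median(boxes[assign == c], axis=0)
    return centers[np.argsort(centers[:, 0] * centers[:, 1])]  # sorted by area, small to large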

The network (Darknet-53) is, on the one hand, essentially fully convolutional (YOLO v2 used pooling layers to downsample feature maps; here they are all replaced by strided convolutions), and on the other hand it introduces residual structure (YOLO v2 was still a straight VGG-like stack, and with too many layers training runs into gradient problems, which is why Darknet-19 stopped at 19 layers; thanks to ResNet-style residual blocks, training deep networks became much easier, so the backbone here grows to 53 layers with a clear accuracy gain). Darknet-53 is only the feature extractor; the source uses just the convolutional layers before the pooling layer for feature extraction, so the multi-scale fusion and prediction branches do not appear in that structure. For details see the config: https://github.com/pjreddie/darknet/blob/master/cfg/yolov3.cfg. The prediction branches are also fully convolutional; their last conv layer has 255 filters for COCO's 80 classes: 3*(80+4+1) = 255, where 3 is the number of bounding boxes per grid cell, 4 is the box coordinates, and 1 is the objectness score. Training still uses YOLO v2's multi-scale training.


 

Figure 1: YOLOv3 network architecture diagram

In the diagram:

DBL: shown at the bottom-left of Figure 1; this is Darknetconv2d_BN_Leaky in the code and the basic building block of yolo_v3: convolution + BN + Leaky ReLU. In v3, BN and Leaky ReLU are inseparable from the convolution layer (except for the final conv layer); together they form the smallest component.

resn: n is a number (res1, res2, …, res8, and so on) indicating how many res_units (residual units) the res_block (residual block) contains. This is yolo_v3's large component; starting with v3, YOLO borrows ResNet's residual structure, which allows a deeper network (from v2's darknet-19, which had no residuals, up to v3's darknet-53). The res_block is illustrated at the bottom-right of Figure 1; its basic component is also the DBL.

concat: tensor concatenation. An intermediate darknet layer is concatenated with the upsampled output of a later layer. Concatenation differs from the residual add: concat grows the tensor's depth, whereas add sums element-wise and leaves the tensor shape unchanged.
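A quick shape check of the two operations (a standalone sketch using the same Keras layers the repo imports):

from keras import backend as K
from keras.layers import Input, Add, Concatenate

a = Input(shape=(26, 26, 256))
b = Input(shape=(26, 26, 256))
print(K.int_shape(Add()([a, b])))          # (None, 26, 26, 256) -> shape unchanged
print(K.int_shape(Concatenate()([a, b])))  # (None, 26, 26, 512) -> depth doubles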
 

The full YOLO v3 network has 252 layers in total, composed as follows:

Add: 23
BatchNormalization: 72
Concatenate: 2
Conv2D: 75
InputLayer: 1
LeakyReLU: 72
UpSampling2D: 2
ZeroPadding2D: 5
Total: 252

 

From the table above: there are 23 add layers (used to build the res_blocks; each res_unit needs one add layer, and 1+2+8+8+4 = 23). The BN and LeakyReLU counts are identical (72 each), which shows up in the network as: every BN layer is followed by a LeakyReLU. Of the 75 conv layers, 72 are followed by the BN+LeakyReLU pair, forming the basic DBL component. Looking at the structure diagram, upsampling and concat each appear twice, matching the table. Each res_block starts with one zero-padding layer, and there are 5 res_blocks, hence 5 ZeroPadding2D layers.
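These counts can be reproduced directly (a small sketch, assuming the yolo_body defined in the code below and a COCO-sized model):

from collections import Counter
from keras.layers import Input

model = yolo_body(Input(shape=(416, 416, 3)), num_anchors=3, num_classes=80)
print(len(model.layers))                                # 252
print(Counter(type(l).__name__ for l in model.layers))  # Conv2D: 75, BatchNormalization: 72, ...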
 

Code walkthrough:

"""YOLO_v3 Model Defined in Keras."""

from functools import wraps

import numpy as np
import tensorflow as tf
from keras import backend as K
from keras.layers import Conv2D, Add, ZeroPadding2D, UpSampling2D, Concatenate, MaxPooling2D
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.normalization import BatchNormalization
from keras.models import Model
from keras.regularizers import l2

from yolo3.utils import compose


@wraps(Conv2D)
def DarknetConv2D(*args, **kwargs):
    """Wrapper to set Darknet parameters for Convolution2D."""
    darknet_conv_kwargs = {'kernel_regularizer': l2(5e-4)}
    darknet_conv_kwargs['padding'] = 'valid' if kwargs.get('strides')==(2,2) else 'same'
    darknet_conv_kwargs.update(kwargs)
    return Conv2D(*args, **darknet_conv_kwargs)

def DarknetConv2D_BN_Leaky(*args, **kwargs):
    """Darknet Convolution2D followed by BatchNormalization and LeakyReLU."""
    no_bias_kwargs = {'use_bias': False}
    no_bias_kwargs.update(kwargs)
    return compose(
        DarknetConv2D(*args, **no_bias_kwargs),
        BatchNormalization(),
        LeakyReLU(alpha=0.1))

def resblock_body(x, num_filters, num_blocks):
    '''A series of resblocks starting with a downsampling Convolution2D'''
    # Darknet uses left and top padding instead of 'same' mode
    x = ZeroPadding2D(((1,0),(1,0)))(x) # Darknet pads 1 pixel on the top and left (instead of 'same' padding) before each downsampling conv
    x = DarknetConv2D_BN_Leaky(num_filters, (3,3), strides=(2,2))(x)
    for i in range(num_blocks):
        y = compose(
                DarknetConv2D_BN_Leaky(num_filters//2, (1,1)),
                DarknetConv2D_BN_Leaky(num_filters, (3,3)))(x)
        x = Add()([x,y]) # residual addition
    return x

def darknet_body(x): # the Darknet-53 backbone (feature extractor)
    '''Darknet body having 52 Convolution2D layers'''
    x = DarknetConv2D_BN_Leaky(32, (3,3))(x)
    # 5 downsampling stages
    x = resblock_body(x, 64, 1)
    x = resblock_body(x, 128, 2)
    x = resblock_body(x, 256, 8)
    x = resblock_body(x, 512, 8)
    x = resblock_body(x, 1024, 4)
    return x

def make_last_layers(x, num_filters, out_filters):
    '''6 Conv2D_BN_Leaky layers followed by a Conv2D_linear layer'''
    x = compose(
            DarknetConv2D_BN_Leaky(num_filters, (1,1)),
            DarknetConv2D_BN_Leaky(num_filters*2, (3,3)),
            DarknetConv2D_BN_Leaky(num_filters, (1,1)),
            DarknetConv2D_BN_Leaky(num_filters*2, (3,3)),
            DarknetConv2D_BN_Leaky(num_filters, (1,1)))(x) # after this the depth is num_filters (512 at the first scale)
    y = compose(
            DarknetConv2D_BN_Leaky(num_filters*2, (3,3)),
            DarknetConv2D(out_filters, (1,1)))(x)
    return x, y


def yolo_body(inputs, num_anchors, num_classes):  # the FPN-like multi-scale head of YOLOv3
    """Create YOLO_V3 model CNN body in Keras."""
    darknet = Model(inputs, darknet_body(inputs))
    x, y1 = make_last_layers(darknet.output, 512, num_anchors*(num_classes+5))
    # one scale per output, so num_anchors = 3 here
    # output of the smallest scale (13x13, responsible for the largest objects)

    x = compose(
            DarknetConv2D_BN_Leaky(256, (1,1)),
            UpSampling2D(2))(x) # depth becomes 256; upsample 13x13 -> 26x26 (for a 416 input)
    x = Concatenate()([x,darknet.layers[152].output]) # spatial size unchanged; depth stacked with the output of darknet.layers[152]
    x, y2 = make_last_layers(x, 256, num_anchors*(num_classes+5))

    x = compose(
            DarknetConv2D_BN_Leaky(128, (1,1)),
            UpSampling2D(2))(x) # depth becomes 128; upsample 26x26 -> 52x52
    x = Concatenate()([x,darknet.layers[92].output]) # depth stacked with the output of darknet.layers[92]
    x, y3 = make_last_layers(x, 128, num_anchors*(num_classes+5))

    return Model(inputs, [y1,y2,y3])
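# Quick shape check (illustrative, not part of the repo): for a 416x416 COCO model the
# three outputs are 13x13, 26x26 and 52x52 with depth 3*(80+5) = 255.
# from keras.layers import Input
# model = yolo_body(Input(shape=(416, 416, 3)), num_anchors=3, num_classes=80)
# print([K.int_shape(y) for y in model.outputs])  # [(None,13,13,255), (None,26,26,255), (None,52,52,255)]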

def tiny_yolo_body(inputs, num_anchors, num_classes):
    '''Create Tiny YOLO_v3 model CNN body in keras.'''
    x1 = compose(
            DarknetConv2D_BN_Leaky(16, (3,3)),
            MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'),
            DarknetConv2D_BN_Leaky(32, (3,3)),
            MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'),
            DarknetConv2D_BN_Leaky(64, (3,3)),
            MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'),
            DarknetConv2D_BN_Leaky(128, (3,3)),
            MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'),
            DarknetConv2D_BN_Leaky(256, (3,3)))(inputs)
    x2 = compose(
            MaxPooling2D(pool_size=(2,2), strides=(2,2), padding='same'),
            DarknetConv2D_BN_Leaky(512, (3,3)),
            MaxPooling2D(pool_size=(2,2), strides=(1,1), padding='same'),
            DarknetConv2D_BN_Leaky(1024, (3,3)),
            DarknetConv2D_BN_Leaky(256, (1,1)))(x1)
    y1 = compose(
            DarknetConv2D_BN_Leaky(512, (3,3)),
            DarknetConv2D(num_anchors*(num_classes+5), (1,1)))(x2)

    x2 = compose(
            DarknetConv2D_BN_Leaky(128, (1,1)),
            UpSampling2D(2))(x2)
    y2 = compose(
            Concatenate(),
            DarknetConv2D_BN_Leaky(256, (3,3)),
            DarknetConv2D(num_anchors*(num_classes+5), (1,1)))([x2,x1])

    return Model(inputs, [y1,y2])


def yolo_head(feats, anchors, num_classes, input_shape, calc_loss=False):

    # NOTE: this function is vectorized and processes a whole batch of images at once
    """Convert final layer features to bounding box parameters."""
    num_anchors = len(anchors)
    # Reshape to batch, height, width, num_anchors, box_params.
    anchors_tensor = K.reshape(K.constant(anchors), [1, 1, 1, num_anchors, 2])
    '''
    the anchors rearranged, e.g. for the 13x13 scale:
    [[[[[116.  90.]
        [156. 198.]
        [373. 326.]]]]]
    '''
    grid_shape = K.shape(feats)[1:3] # height, width
    grid_y = K.tile(K.reshape(K.arange(0, stop=grid_shape[0]), [-1, 1, 1, 1]),
        [1, grid_shape[1], 1, 1]) # shape: [grid_h, grid_w, 1, 1]
    # K.tile() repeats a tensor a given number of times along each axis
    # K.arange(0, stop=13) builds the sequence [0..12]
    grid_x = K.tile(K.reshape(K.arange(0, stop=grid_shape[1]), [1, -1, 1, 1]),
        [grid_shape[0], 1, 1, 1]) # shape: [grid_h, grid_w, 1, 1]
    grid = K.concatenate([grid_x, grid_y]) # shape: [grid_h, grid_w, 1, 2]
                                           # e.g. for the 13x13 output this builds a [13,13,1,2] grid
                                           # holding each cell's (x,y) coordinates from (0,0) to (12,12)
    grid = K.cast(grid, K.dtype(feats))

    feats = K.reshape(
        feats, [-1, grid_shape[0], grid_shape[1], num_anchors, num_classes + 5])

    # Adjust predictions to each spatial grid point and anchor size.
    # convert box_xy, box_wh from raw network outputs to coordinates normalized to the input image (e.g. 416x416)
    box_xy = (K.sigmoid(feats[..., :2]) + grid) / K.cast(grid_shape[::-1], K.dtype(feats)) # x,y normalized by the grid size
    box_wh = K.exp(feats[..., 2:4]) * anchors_tensor / K.cast(input_shape[::-1], K.dtype(feats)) # w,h normalized by the input size
    box_confidence = K.sigmoid(feats[..., 4:5]) # objectness confidence
    box_class_probs = K.sigmoid(feats[..., 5:]) # per-class probabilities (independent sigmoids for multi-label)

    if calc_loss:
        return grid, feats, box_xy, box_wh
    return box_xy, box_wh, box_confidence, box_class_probs
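# Standalone NumPy sketch of the grid construction above (hypothetical 3x3 grid, not repo code):
# import numpy as np
# gy = np.tile(np.arange(3).reshape(-1, 1, 1, 1), [1, 3, 1, 1])
# gx = np.tile(np.arange(3).reshape(1, -1, 1, 1), [3, 1, 1, 1])
# g = np.concatenate([gx, gy], axis=-1)  # shape (3, 3, 1, 2)
# print(g[1, 2, 0])                      # [2 1] -> the cell in row 1, column 2 stores (cx=2, cy=1)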

# convert the boxes back to pixel coordinates of each original (un-letterboxed) image
def yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape):
    '''Get corrected boxes'''
    box_yx = box_xy[..., ::-1]
    box_hw = box_wh[..., ::-1]
    input_shape = K.cast(input_shape, K.dtype(box_yx))
    image_shape = K.cast(image_shape, K.dtype(box_yx))
    new_shape = K.round(image_shape * K.min(input_shape/image_shape))
    offset = (input_shape-new_shape)/2./input_shape
    scale = input_shape/new_shape
    box_yx = (box_yx - offset) * scale
    box_hw *= scale

    box_mins = box_yx - (box_hw / 2.)
    box_maxes = box_yx + (box_hw / 2.)
    boxes =  K.concatenate([
        box_mins[..., 0:1],  # y_min
        box_mins[..., 1:2],  # x_min
        box_maxes[..., 0:1],  # y_max
        box_maxes[..., 1:2]  # x_max
    ])

    # Scale boxes back to original image shape.
    boxes *= K.concatenate([image_shape, image_shape])
    return boxes
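# Worked example of the letterbox correction (hypothetical numbers, not repo code):
# input_shape=(416,416), image_shape=(480,640) -> scale = min(416/480, 416/640) = 0.65,
# new_shape = round((480,640)*0.65) = (312,416), y offset = ((416-312)/2)/416 = 0.125, x offset = 0.
# The function removes this offset, rescales, and multiplies by image_shape, so the returned
# boxes are pixel coordinates in the original 480x640 image.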


def yolo_boxes_and_scores(feats, anchors, num_classes, input_shape, image_shape):
    '''Process Conv layer output'''
    box_xy, box_wh, box_confidence, box_class_probs = yolo_head(feats,
        anchors, num_classes, input_shape)
    boxes = yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape)
    boxes = K.reshape(boxes, [-1, 4])
    box_scores = box_confidence * box_class_probs # final score = objectness confidence * class probability
    box_scores = K.reshape(box_scores, [-1, num_classes])
    return boxes, box_scores


def yolo_eval(yolo_outputs,
              anchors,
              num_classes,
              image_shape,
              max_boxes=20,
              score_threshold=.6,
              iou_threshold=.5):
    """Evaluate YOLO model on given input and return filtered boxes."""
    num_layers = len(yolo_outputs)
    anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]] # default setting
    input_shape = K.shape(yolo_outputs[0])[1:3] * 32 
    boxes = []
    box_scores = [] # shape [num_boxes_at_all_scales, num_classes]: class probability * confidence,
                    # so a box with zero confidence scores 0 for every class
    for l in range(num_layers): # convert each of the three scales into actual box parameters
        _boxes, _box_scores = yolo_boxes_and_scores(yolo_outputs[l],
            anchors[anchor_mask[l]], num_classes, input_shape, image_shape)
        boxes.append(_boxes)
        box_scores.append(_box_scores)
    boxes = K.concatenate(boxes, axis=0)
    box_scores = K.concatenate(box_scores, axis=0)

    mask = box_scores >= score_threshold
    max_boxes_tensor = K.constant(max_boxes, dtype='int32')
    boxes_ = []
    scores_ = []
    classes_ = [] 
    for c in range(num_classes):
        # TODO: use keras backend instead of tf.
        class_boxes = tf.boolean_mask(boxes, mask[:, c]) # first filtering pass: drop boxes whose (confidence * class
                                                         # probability) falls below the threshold, i.e. boxes that
                                                         # almost certainly contain no object
        class_box_scores = tf.boolean_mask(box_scores[:, c], mask[:, c])
        nms_index = tf.image.non_max_suppression(
            class_boxes, class_box_scores, max_boxes_tensor, iou_threshold=iou_threshold) # NMS: keep at most max_boxes (20) per class
        class_boxes = K.gather(class_boxes, nms_index) # gather the boxes kept by NMS
        class_box_scores = K.gather(class_box_scores, nms_index) # gather their scores
        classes = K.ones_like(class_box_scores, 'int32') * c # store the class as an integer id instead of a one-hot '00...1...0' vector
        boxes_.append(class_boxes)
        scores_.append(class_box_scores)
        classes_.append(classes)
    boxes_ = K.concatenate(boxes_, axis=0)
    scores_ = K.concatenate(scores_, axis=0)
    classes_ = K.concatenate(classes_, axis=0)

    return boxes_, scores_, classes_


def preprocess_true_boxes(true_boxes, input_shape, anchors, num_classes):
    '''Preprocess true boxes to training input format

    Parameters
    ----------
    true_boxes: array, shape=(m, T, 5)
        Absolute x_min, y_min, x_max, y_max, class_id relative to input_shape.
    input_shape: array-like, hw, multiples of 32
    anchors: array, shape=(N, 2), wh
    num_classes: integer

    Returns
    -------
    y_true: list of array, shape like yolo_outputs, xywh are reletive value

    '''
    # true_boxes.shape = (num_images, boxes_per_image, 5); the 5 values are
    # x_min, y_min, x_max, y_max plus the class index
    assert (true_boxes[..., 4]<num_classes).all(), 'class id must be less than num_classes'
    num_layers = len(anchors)//3 # default setting
    anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]]

    true_boxes = np.array(true_boxes, dtype='float32')
    input_shape = np.array(input_shape, dtype='int32')
    boxes_xy = (true_boxes[..., 0:2] + true_boxes[..., 2:4]) // 2 # corner coords -> center x,y
    boxes_wh = true_boxes[..., 2:4] - true_boxes[..., 0:2]        # corner coords -> w,h
    true_boxes[..., 0:2] = boxes_xy/input_shape[::-1] # normalize by the input size (wh order)
    true_boxes[..., 2:4] = boxes_wh/input_shape[::-1]

    m = true_boxes.shape[0]
    grid_shapes = [input_shape//{0:32, 1:16, 2:8}[l] for l in range(num_layers)]
    y_true = [np.zeros((m,grid_shapes[l][0],grid_shapes[l][1],len(anchor_mask[l]),5+num_classes),
        dtype='float32') for l in range(num_layers)]

    # center the anchors at the origin so they can be compared with the boxes by IOU
    anchors = np.expand_dims(anchors, 0)
    anchor_maxes = anchors / 2.
    anchor_mins = -anchor_maxes
    valid_mask = boxes_wh[..., 0]>0 # shape (m, T): True for real boxes, False for zero-padded rows

    # each image in the batch is processed separately
    for b in range(m):
        # Discard zero rows.
        wh = boxes_wh[b, valid_mask[b]]
        if len(wh)==0: continue # skip images where every box is invalid (zero width)
        # Expand dim to apply broadcasting.
        wh = np.expand_dims(wh, -2) # shape becomes (num_boxes, 1, 2)
        box_maxes = wh / 2.
        box_mins = -box_maxes # center the boxes at the origin, same as the anchors above, so sizes can be compared

        # compute the IOU between the ground-truth boxes and the anchors
        intersect_mins = np.maximum(box_mins, anchor_mins)
        intersect_maxes = np.minimum(box_maxes, anchor_maxes)
        intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.)
        intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
        box_area = wh[..., 0] * wh[..., 1]
        anchor_area = anchors[..., 0] * anchors[..., 1]
        iou = intersect_area / (box_area + anchor_area - intersect_area)

        # Find best anchor for each true box
        # the best IOU decides which anchor slot of the label this box is written to;
        # the 9 preset anchors compete for each input box
        best_anchor = np.argmax(iou, axis=-1)

        for t, n in enumerate(best_anchor):
            for l in range(num_layers):
                if n in anchor_mask[l]: # which scale does the best anchor belong to?
                    i = np.floor(true_boxes[b,t,0]*grid_shapes[l][1]).astype('int32') # i, j: the grid cell responsible for
                    j = np.floor(true_boxes[b,t,1]*grid_shapes[l][0]).astype('int32') # this box (true x, y times grid_shapes)
                    k = anchor_mask[l].index(n) # which of the three anchor slots of that grid cell
                    c = true_boxes[b,t, 4].astype('int32') # class id
                    # write the label: place each of the T boxes into the
                    # (batch, grid_h, grid_w, 3, 5+num_classes) label tensor of its scale
                    y_true[l][b, j, i, k, 0:4] = true_boxes[b,t, 0:4]
                    y_true[l][b, j, i, k, 4] = 1
                    y_true[l][b, j, i, k, 5+c] = 1 # one-hot class encoding: 1 for the assigned class, 0 elsewhere

    return y_true
    # [(None,13,13,3,5+num_classes),(None,26,26,3,5+num_classes),(None,52,52,3,5+num_classes)]
    # None是不知道会有多少张图在这个尺度
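# Example call (illustrative, with made-up boxes): one 416x416 image, 9 COCO anchors, 20 VOC classes.
# true_boxes = np.array([[[ 50,  60, 200, 180, 11],
#                         [120, 300, 370, 410,  5]]], dtype='float32')  # x_min, y_min, x_max, y_max, class
# anchors = np.array([[10,13],[16,30],[33,23],[30,61],[62,45],[59,119],[116,90],[156,198],[373,326]])
# y_true = preprocess_true_boxes(true_boxes, (416, 416), anchors, 20)
# print([y.shape for y in y_true])  # [(1,13,13,3,25), (1,26,26,3,25), (1,52,52,3,25)]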
 

def box_iou(b1, b2):
    '''Return iou tensor

    Parameters
    ----------
    b1: tensor, shape=(i1,...,iN, 4), xywh
    b2: tensor, shape=(j, 4), xywh

    Returns
    -------
    iou: tensor, shape=(i1,...,iN, j)

    '''

    # Expand dim to apply broadcasting.
    b1 = K.expand_dims(b1, -2)
    b1_xy = b1[..., :2]
    b1_wh = b1[..., 2:4]
    b1_wh_half = b1_wh/2.
    b1_mins = b1_xy - b1_wh_half
    b1_maxes = b1_xy + b1_wh_half

    # Expand dim to apply broadcasting.
    b2 = K.expand_dims(b2, 0)
    b2_xy = b2[..., :2]
    b2_wh = b2[..., 2:4]
    b2_wh_half = b2_wh/2.
    b2_mins = b2_xy - b2_wh_half
    b2_maxes = b2_xy + b2_wh_half

    intersect_mins = K.maximum(b1_mins, b2_mins)
    intersect_maxes = K.minimum(b1_maxes, b2_maxes)
    intersect_wh = K.maximum(intersect_maxes - intersect_mins, 0.)
    intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
    b1_area = b1_wh[..., 0] * b1_wh[..., 1]
    b2_area = b2_wh[..., 0] * b2_wh[..., 1]
    iou = intersect_area / (b1_area + b2_area - intersect_area)

    return iou
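# Sanity check (illustrative): identical boxes give IOU 1, disjoint boxes give IOU 0.
# b1 = K.constant([[0.5, 0.5, 0.2, 0.2]])                          # one box, xywh
# b2 = K.constant([[0.5, 0.5, 0.2, 0.2], [0.9, 0.9, 0.1, 0.1]])    # two boxes, xywh
# print(K.eval(box_iou(b1, b2)))                                   # [[1.0, 0.0]]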


def yolo_loss(args, anchors, num_classes, ignore_thresh=.5, print_loss=False):
    '''Return yolo_loss tensor

    Parameters
    ----------
    yolo_outputs: list of tensor, the output of yolo_body or tiny_yolo_body
    y_true: list of array, the output of preprocess_true_boxes
    anchors: array, shape=(N, 2), wh
    num_classes: integer
    ignore_thresh: float, the iou threshold whether to ignore object confidence loss

    Returns
    -------
    loss: tensor, shape=(1,)

    '''
    num_layers = len(anchors)//3 # default setting
    # args is [*model_body.output, *y_true]
    # model_body.output = [y1,y2,y3], the predictions at the three scales;
    # each y has shape m*grid*grid*num_anchors*(num_classes+5)
    # m = batch_size
    yolo_outputs = args[:num_layers]
    y_true = args[num_layers:]
    anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]]
    input_shape = K.cast(K.shape(yolo_outputs[0])[1:3] * 32, K.dtype(y_true[0]))
    grid_shapes = [K.cast(K.shape(yolo_outputs[l])[1:3], K.dtype(y_true[0])) for l in range(num_layers)]
    # the grid sizes of the three scales (e.g. 13x13, 26x26, 52x52)
    loss = 0
    m = K.shape(yolo_outputs[0])[0] # batch size, tensor
    mf = K.cast(m, K.dtype(yolo_outputs[0]))
    # the loss is computed separately for each of the three scales
    for l in range(num_layers):
        object_mask = y_true[l][..., 4:5] # objectness confidence
        true_class_probs = y_true[l][..., 5:] # class labels
 
        # convert the raw output of this scale into box parameters
        # anchors[anchor_mask[l]]: the anchors assigned to this scale
        # e.g. the smallest scale (13x13) predicts the largest objects:
        '''
        anchors[anchor_mask[0]]
        [[116  90]
        [156 198]
        [373 326]]
        '''
        grid, raw_pred, pred_xy, pred_wh = yolo_head(yolo_outputs[l],
             anchors[anchor_mask[l]], num_classes, input_shape, calc_loss=True)
        pred_box = K.concatenate([pred_xy, pred_wh])

        # Darknet raw box to calculate loss.
        # the inverse of the x,y,w,h decoding formulas above
        raw_true_xy = y_true[l][..., :2]*grid_shapes[l][::-1] - grid # y_true stores normalized values; [..., :2] is x,y
        raw_true_wh = K.log(y_true[l][..., 2:4] / anchors[anchor_mask[l]] * input_shape[::-1])
        # avoid log(0) = -inf: when the objectness is 0, return zeros instead
        # K.switch(cond, then_branch, else_branch); both branches must have the same shape
        raw_true_wh = K.switch(object_mask, raw_true_wh, K.zeros_like(raw_true_wh)) # avoid log(0)=-inf
        box_loss_scale = 2 - y_true[l][...,2:3]*y_true[l][...,3:4] # weight small boxes more heavily

        # Find ignore mask, iterate over each of batch.
        ignore_mask = tf.TensorArray(K.dtype(y_true[0]), size=1, dynamic_size=True)
        object_mask_bool = K.cast(object_mask, 'bool') # boolean mask of the ground-truth objectness
        def loop_body(b, ignore_mask):
            true_box = tf.boolean_mask(y_true[l][b,...,0:4], object_mask_bool[b,...,0]) # b indexes the image; keep only its real boxes
            iou = box_iou(pred_box[b], true_box) # IOU for one image at one scale
            best_iou = K.max(iou, axis=-1) # best IOU of each predicted box against any true box
            # boxes whose best IOU is below ignore_thresh count toward the no-object confidence loss
            ignore_mask = ignore_mask.write(b, K.cast(best_iou<ignore_thresh, K.dtype(true_box)))
            return b+1, ignore_mask
        _, ignore_mask = K.control_flow_ops.while_loop(lambda b,*args: b<m, loop_body, [0, ignore_mask])
        ignore_mask = ignore_mask.stack()
        ignore_mask = K.expand_dims(ignore_mask, -1)

        # K.binary_crossentropy is helpful to avoid exp overflow.
        xy_loss = object_mask * box_loss_scale * K.binary_crossentropy(raw_true_xy, raw_pred[...,0:2], from_logits=True)
        wh_loss = object_mask * box_loss_scale * 0.5 * K.square(raw_true_wh-raw_pred[...,2:4])
        confidence_loss = object_mask * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True)+ \
            (1-object_mask) * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True) * ignore_mask
        class_loss = object_mask * K.binary_crossentropy(true_class_probs, raw_pred[...,5:], from_logits=True)

        xy_loss = K.sum(xy_loss) / mf
        wh_loss = K.sum(wh_loss) / mf
        confidence_loss = K.sum(confidence_loss) / mf
        class_loss = K.sum(class_loss) / mf
        loss += xy_loss + wh_loss + confidence_loss + class_loss
        if print_loss:
            loss = tf.Print(loss, [loss, xy_loss, wh_loss, confidence_loss, class_loss, K.sum(ignore_mask)], message='loss: ')
    return loss

 
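# How train.py in the repo wires this loss into a trainable model (a condensed sketch of its
# create_model(); assumes `anchors` is the (9,2) array loaded from the anchors file):
# from keras.layers import Input, Lambda
# h, w, num_classes = 416, 416, 80
# y_true_input = [Input(shape=(h//s, w//s, 3, num_classes+5)) for s in (32, 16, 8)]
# model_body = yolo_body(Input(shape=(h, w, 3)), 3, num_classes)
# loss_layer = Lambda(yolo_loss, output_shape=(1,), name='yolo_loss',
#                     arguments={'anchors': anchors, 'num_classes': num_classes,
#                                'ignore_thresh': 0.5})([*model_body.output, *y_true_input])
# model = Model([model_body.input, *y_true_input], loss_layer)
# model.compile(optimizer='adam', loss={'yolo_loss': lambda y_true, y_pred: y_pred})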

Reference blogs:

https://blog.csdn.net/yangchengtest/article/details/80664415

https://blog.csdn.net/chandanyan8568/article/details/81089083

https://blog.csdn.net/u014380165/article/details/80202337

 
