TensorFlow 2.0 Implementation of YOLOv3 (Part 2): Network Structure (common.py + backbone.py)

Contents

  • About this article
  • Overall structure
  • common.py
    • Convolutional block
    • Residual block
    • Upsample block
  • backbone.py
    • Darknet53 structure
  • yolov3.py
    • YOLOv3 network
  • Complete code
    • common.py
    • backbone.py

About this article

This series explains the code that malin9402 provides on GitHub. This article covers the overall structure of the YOLOv3 network. The files involved are common.py, backbone.py, and part of yolov3.py.

If you only want to run the code from GitHub, refer to the separate article that explains the YOLOv3 code.

Overall structure

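The figure below shows the overall structure of the YOLOv3 network. A 416×416 input image passes through the Darknet-53 network, which produces 3 branches. Each branch then goes through a series of convolutions, upsampling, and concatenation. The result is three feature maps of different sizes, with shapes [13, 13, 255], [26, 26, 255], and [52, 52, 255] (for 80 classes).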
[Figure 1: Overall structure of the YOLOv3 network]
To implement this network, we first need to build the Convolutional block, the Residual block, the Upsample block, and the Darknet53 structure.

The project splits these pieces across three files:
  • common.py holds the Convolutional block, the Residual block, and the Upsample block.
  • backbone.py holds the Darknet53 structure.
  • yolov3.py holds the final YOLOv3 network.

common.py

Convolutional block

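The Convolutional block is simple. It is an ordinary convolution layer, optionally followed by batch normalization (bn=True) and a LeakyReLU activation with slope 0.1 (activate=True). Its code is: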

import tensorflow as tf


class BatchNormalization(tf.keras.layers.BatchNormalization):
    """
    "Frozen state" and "inference mode" are two separate concepts.
    `layer.trainable = False` is to freeze the layer, so the layer will use
    stored moving `var` and `mean` in the "inference mode", and both `gamma`
    and `beta` will not be updated !
    """
    def call(self, x, training=False):
        if not training:
            training = tf.constant(False)
        # Run in training mode only if training=True AND the layer is trainable
        training = tf.logical_and(training, self.trainable)
        return super().call(x, training)


def convolutional(input_layer, filters_shape, downsample=False, activate=True, bn=True):
    # filters_shape = (kernel_h, kernel_w, in_channels, out_channels)
    if downsample:
        # Pad one row on top and one column on the left, then apply a
        # 'valid' stride-2 convolution to halve the width and height
        input_layer = tf.keras.layers.ZeroPadding2D(((1, 0), (1, 0)))(input_layer)
        padding = 'valid'
        strides = 2
    else:
        strides = 1
        padding = 'same'

    conv = tf.keras.layers.Conv2D(filters=filters_shape[-1], kernel_size=filters_shape[0], strides=strides, padding=padding,
                                  use_bias=not bn, kernel_regularizer=tf.keras.regularizers.l2(0.0005),
                                  kernel_initializer=tf.random_normal_initializer(stddev=0.01),
                                  bias_initializer=tf.constant_initializer(0.))(input_layer)

    # BN supplies its own learned offset (beta), so the conv has no bias when bn=True
    if bn: conv = BatchNormalization()(conv)
    if activate: conv = tf.nn.leaky_relu(conv, alpha=0.1)

    return conv

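The batch normalization layer above follows an approach from YunYang1994. The pattern is uncommon: the layer runs in training mode only when training=True and the layer is trainable. A frozen layer (trainable=False) therefore uses its stored moving mean and variance. Using the built-in high-level API of TensorFlow 2.0 directly should also work. Feeding a 416×416 color image into the Convolutional block changes its shape as follows (assuming 32 filters):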
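[Figure 2: Shape changes through the Convolutional block]
Note: The ZeroPadding2D layer here simply pads zeros around the borders of the image, with padding=((top_pad, bottom_pad), (left_pad, right_pad)). The argument ((1, 0), (1, 0)) used when downsampling adds one row at the top and one column on the left only.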
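You can check the shape change by hand with the plain-Python sketch below, which reproduces the output-size arithmetic behind convolutional(). The function name conv_output_size is hypothetical, and the function builds no TensorFlow layers. It assumes a 3×3 kernel and applies the standard 'valid' formula out = (in - k) // s + 1 after the one-pixel ZeroPadding2D.

```python
def conv_output_size(size, kernel_size=3, downsample=False):
    """Spatial output size of convolutional() (hypothetical helper).

    downsample=False: stride 1 with 'same' padding, so the size is unchanged.
    downsample=True: ZeroPadding2D(((1, 0), (1, 0))) adds one row/column,
    then a 'valid' convolution with stride 2 is applied.
    """
    if not downsample:
        return size  # 'same' padding with stride 1 keeps H and W
    padded = size + 1  # one extra row on top (or column on the left)
    stride = 2
    return (padded - kernel_size) // stride + 1  # 'valid' convolution formula


print(conv_output_size(416))                   # 416
print(conv_output_size(416, downsample=True))  # 208
```

Applying the downsampling branch five times takes 416 to 208, 104, 52, 26, and finally 13, which is where the 13×13 grid comes from.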

Residual block

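The most prominent feature of the residual block is its shortcut connection, somewhat like a short circuit in an electrical circuit. The shortcut mitigates the vanishing-gradient problem that comes with making a network deeper, which makes the network easier to optimize. Through an identity mapping, it opens a direct path from input to output (output = input + F(input)), so the network only has to learn the residual F between them.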

The Residual block is implemented as follows:

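def residual_block(input_layer, input_channel, filter_num1, filter_num2):
    short_cut = input_layer  # identity branch
    # The 1x1 conv maps channels to filter_num1, then the 3x3 conv maps them to filter_num2
    conv = convolutional(input_layer, filters_shape=(1, 1, input_channel, filter_num1))
    conv = convolutional(conv,        filters_shape=(3, 3, filter_num1,   filter_num2))

    # Element-wise addition: requires filter_num2 == input_channel
    residual_output = short_cut + conv
    return residual_output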

Feeding a 416×416×32 feature map into the Residual block changes its shape as follows (assuming filter_num1 = 64 and filter_num2 = 32):
[Figure 3: Shape changes through the Residual block]
As shown, the Residual block leaves the shape of the feature map unchanged. The arguments must also satisfy input_channel = filter_num2. Otherwise the addition fails because the two tensors have different shapes.
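You can check this channel constraint without TensorFlow. The sketch below uses residual_block_shape, a hypothetical helper that is not part of the project. It tracks a (height, width, channels) tuple through the two convolutions of residual_block and raises an error when the shortcut and the conv output have different shapes. For simplicity it treats any mismatch as an error.

```python
def residual_block_shape(input_shape, input_channel, filter_num1, filter_num2):
    """Track a feature-map shape through residual_block (illustrative sketch).

    Both convolutions use stride 1 and 'same' padding, so height and width
    never change; only the channel count changes.
    """
    h, w, c = input_shape
    if c != input_channel:
        raise ValueError(f"feature map has {c} channels, expected {input_channel}")
    # 1x1 conv: (h, w, input_channel) -> (h, w, filter_num1)
    # 3x3 conv: (h, w, filter_num1)   -> (h, w, filter_num2)
    conv_shape = (h, w, filter_num2)
    # short_cut + conv is an element-wise addition, so the shapes must match
    if conv_shape != input_shape:
        raise ValueError(
            f"cannot add shortcut {input_shape} to conv output {conv_shape}: "
            "input_channel must equal filter_num2"
        )
    return conv_shape


print(residual_block_shape((416, 416, 32), 32, 64, 32))  # (416, 416, 32)
```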

Upsample block

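Upsampling ensures that two feature maps have the same width and height when they are concatenated later in the YOLOv3 network. It doubles both the width and the height of the input feature map using nearest-neighbor interpolation.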

def upsample(input_layer):
    return tf.image.resize(input_layer, (input_layer.shape[1] * 2, input_layer.shape[2] * 2), method='nearest')
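For an exact 2× scale factor, nearest-neighbor resizing effectively copies each pixel into a 2×2 block. The plain-Python sketch below illustrates that behavior on a single-channel 2-D list, ignoring the batch and channel dimensions. It is an illustration, not TensorFlow's actual implementation, and upsample_nearest_2x is a hypothetical name.

```python
def upsample_nearest_2x(feature_map):
    """Double the height and width of a 2-D grid by nearest-neighbor copying."""
    out = []
    for row in feature_map:
        doubled_row = [v for v in row for _ in range(2)]  # repeat each column
        out.append(doubled_row)
        out.append(list(doubled_row))  # repeat each row
    return out


print(upsample_nearest_2x([[1, 2],
                           [3, 4]]))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

This is why a 13×13 map becomes 26×26 and a 26×26 map becomes 52×52, matching route_2 and route_1 before concatenation.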

backbone.py

Darknet53 structure

The Darknet53 structure is implemented as follows:

import tensorflow as tf
import core.common as common


def darknet53(input_data):
    # Shapes in the comments assume a 416x416x3 input
    input_data = common.convolutional(input_data, (3, 3,  3,  32))                   # 416x416x32
    input_data = common.convolutional(input_data, (3, 3, 32,  64), downsample=True)  # 208x208x64

    for i in range(1):
        input_data = common.residual_block(input_data,  64,  32, 64)

    input_data = common.convolutional(input_data, (3, 3,  64, 128), downsample=True)  # 104x104x128

    for i in range(2):
        input_data = common.residual_block(input_data, 128,  64, 128)

    input_data = common.convolutional(input_data, (3, 3, 128, 256), downsample=True)  # 52x52x256

    for i in range(8):
        input_data = common.residual_block(input_data, 256, 128, 256)

    route_1 = input_data  # 52x52x256, feeds the small-object branch
    input_data = common.convolutional(input_data, (3, 3, 256, 512), downsample=True)  # 26x26x512

    for i in range(8):
        input_data = common.residual_block(input_data, 512, 256, 512)

    route_2 = input_data  # 26x26x512, feeds the medium-object branch
    input_data = common.convolutional(input_data, (3, 3, 512, 1024), downsample=True)  # 13x13x1024

    for i in range(4):
        input_data = common.residual_block(input_data, 1024, 512, 1024)

    return route_1, route_2, input_data

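For a 416×416 input, darknet53 returns three tensors:
  • route_1 with shape [52, 52, 256];
  • route_2 with shape [26, 26, 512];
  • a final feature map with shape [13, 13, 1024].
The full shape changes through the network are shown in the next section.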
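You can trace these shapes with a few lines of plain Python. The sketch below uses the illustrative names darknet53_shapes and STAGES and reuses the stage layout from the code above. It also counts the convolution layers: 1 initial conv, 5 downsampling convs, and 2 convs per residual block.

```python
# (output channels of the downsampling conv, number of residual blocks),
# copied from the darknet53() function above
STAGES = [(64, 1), (128, 2), (256, 8), (512, 8), (1024, 4)]


def darknet53_shapes(input_size=416):
    """Trace (height, width, channels) through darknet53 without TensorFlow."""
    size, channels = input_size, 32  # first 3x3 conv: 3 -> 32 channels, stride 1
    conv_layers = 1
    stage_outputs = []
    for out_channels, num_blocks in STAGES:
        size = (size + 1 - 3) // 2 + 1  # ZeroPadding2D + 3x3 'valid' conv, stride 2
        channels = out_channels
        conv_layers += 1 + 2 * num_blocks  # residual blocks keep the shape
        stage_outputs.append((size, size, channels))
    route_1, route_2, output = stage_outputs[2], stage_outputs[3], stage_outputs[4]
    return route_1, route_2, output, conv_layers


print(darknet53_shapes())  # ((52, 52, 256), (26, 26, 512), (13, 13, 1024), 52)
```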

yolov3.py

YOLOv3 network

The code below is only part of yolov3.py. The rest is given in another article.

import numpy as np
import tensorflow as tf
import core.utils as utils
import core.common as common
import core.backbone as backbone
from core.config import cfg

NUM_CLASS = len(utils.read_class_names(cfg.YOLO.CLASSES))

def YOLOv3(input_layer):
    route_1, route_2, conv = backbone.darknet53(input_layer)
    conv = common.convolutional(conv, (1, 1, 1024,  512))
    conv = common.convolutional(conv, (3, 3,  512, 1024))
    conv = common.convolutional(conv, (1, 1, 1024,  512))
    conv = common.convolutional(conv, (3, 3,  512, 1024))
    conv = common.convolutional(conv, (1, 1, 1024,  512))
    conv_lobj_branch = common.convolutional(conv, (3, 3, 512, 1024))
    # First output (large objects): the last layer has neither activation nor BN
    conv_lbbox = common.convolutional(conv_lobj_branch, (1, 1, 1024, 3*(NUM_CLASS + 5)), activate=False, bn=False)
    conv = common.convolutional(conv, (1, 1,  512,  256))
    conv = common.upsample(conv)
    # Concatenate along the channel axis: 256 + 512 = 768 channels
    conv = tf.concat([conv, route_2], axis=-1)

    conv = common.convolutional(conv, (1, 1, 768, 256))
    conv = common.convolutional(conv, (3, 3, 256, 512))
    conv = common.convolutional(conv, (1, 1, 512, 256))
    conv = common.convolutional(conv, (3, 3, 256, 512))
    conv = common.convolutional(conv, (1, 1, 512, 256))

    conv_mobj_branch = common.convolutional(conv, (3, 3, 256, 512))
    conv_mbbox = common.convolutional(conv_mobj_branch, (1, 1, 512, 3*(NUM_CLASS + 5)), activate=False, bn=False)

    conv = common.convolutional(conv, (1, 1, 256, 128))
    conv = common.upsample(conv)

    # Concatenate along the channel axis: 128 + 256 = 384 channels
    conv = tf.concat([conv, route_1], axis=-1)

    conv = common.convolutional(conv, (1, 1, 384, 128))
    conv = common.convolutional(conv, (3, 3, 128, 256))
    conv = common.convolutional(conv, (1, 1, 256, 128))
    conv = common.convolutional(conv, (3, 3, 128, 256))
    conv = common.convolutional(conv, (1, 1, 256, 128))

    conv_sobj_branch = common.convolutional(conv, (3, 3, 128, 256))
    conv_sbbox = common.convolutional(conv_sobj_branch, (1, 1, 256, 3*(NUM_CLASS +5)), activate=False, bn=False)
    return [conv_sbbox, conv_mbbox, conv_lbbox]

Feeding a 416×416 color image into the YOLOv3 network changes its shape as follows (assuming 80 classes in total):
[Figure 4: Shape changes through the YOLOv3 network]
Each output has 3×(80+5) = 255 channels. YOLOv3 predicts 3 bounding boxes per grid cell. Each box needs five basic parameters (x, y, w, h, confidence) plus the probabilities of the 80 classes.
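The output shapes follow directly from the three strides (8, 16, 32) and the channel formula. Below is a minimal sketch, assuming a square input whose side is a multiple of 32. The name yolov3_output_shapes is a hypothetical helper.

```python
def yolov3_output_shapes(input_size=416, num_class=80, anchors_per_scale=3):
    """Shapes of [conv_sbbox, conv_mbbox, conv_lbbox] for a square input."""
    channels = anchors_per_scale * (num_class + 5)  # 3 * (80 + 5) = 255
    # Strides 8, 16, 32 correspond to route_1, route_2 and the darknet53 output
    return [(input_size // s, input_size // s, channels) for s in (8, 16, 32)]


print(yolov3_output_shapes())
# [(52, 52, 255), (26, 26, 255), (13, 13, 255)]
```

The same arithmetic explains the input channels of the first convolution after each tf.concat: 256 + 512 = 768 at the 26×26 scale and 128 + 256 = 384 at the 52×52 scale.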

Basic parameters:

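  • x: the x-coordinate of the bounding box center;
  • y: the y-coordinate of the bounding box center;
  • w: the width of the bounding box;
  • h: the height of the bounding box;
  • confidence: the confidence score, i.e. the probability that the box contains an object.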
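The sketch below shows how the 255 values of one grid cell map onto these parameters. It assumes the channels are grouped per box, [box0: x, y, w, h, confidence, p_0..p_79, box1: ..., box2: ...], which is how reshaping the last axis to (3, 85) would interpret them. That layout is an assumption here, since the decoding code is not part of this article. The values are raw network outputs, not decoded coordinates, and split_cell_prediction is a hypothetical name.

```python
NUM_CLASS = 80
ANCHORS_PER_SCALE = 3


def split_cell_prediction(cell_vector, num_class=NUM_CLASS):
    """Split the raw channel vector of one grid cell into per-box fields (sketch)."""
    step = 5 + num_class
    expected = ANCHORS_PER_SCALE * step
    if len(cell_vector) != expected:
        raise ValueError(f"expected {expected} values, got {len(cell_vector)}")
    boxes = []
    for a in range(ANCHORS_PER_SCALE):
        box = cell_vector[a * step:(a + 1) * step]  # assumed per-box grouping
        x, y, w, h, conf = box[:5]
        boxes.append({"x": x, "y": y, "w": w, "h": h,
                      "confidence": conf, "class_probs": box[5:]})
    return boxes


boxes = split_cell_prediction(list(range(255)))
print(len(boxes), boxes[1]["x"], len(boxes[1]["class_probs"]))  # 3 85 80
```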

Complete code

common.py

import tensorflow as tf

class BatchNormalization(tf.keras.layers.BatchNormalization):
    """
    "Frozen state" and "inference mode" are two separate concepts.
    `layer.trainable = False` is to freeze the layer, so the layer will use
    stored moving `var` and `mean` in the "inference mode", and both `gamma`
    and `beta` will not be updated !
    """
    def call(self, x, training=False):
        if not training:
            training = tf.constant(False)
        training = tf.logical_and(training, self.trainable)
        return super().call(x, training)

def convolutional(input_layer, filters_shape, downsample=False, activate=True, bn=True):
    if downsample:
        input_layer = tf.keras.layers.ZeroPadding2D(((1, 0), (1, 0)))(input_layer)
        padding = 'valid'
        strides = 2
    else:
        strides = 1
        padding = 'same'

    conv = tf.keras.layers.Conv2D(filters=filters_shape[-1], kernel_size = filters_shape[0], strides=strides, padding=padding,
                                  use_bias=not bn, kernel_regularizer=tf.keras.regularizers.l2(0.0005),
                                  kernel_initializer=tf.random_normal_initializer(stddev=0.01),
                                  bias_initializer=tf.constant_initializer(0.))(input_layer)

    if bn: conv = BatchNormalization()(conv)
    if activate: conv = tf.nn.leaky_relu(conv, alpha=0.1)

    return conv

def residual_block(input_layer, input_channel, filter_num1, filter_num2):
    short_cut = input_layer
    conv = convolutional(input_layer, filters_shape=(1, 1, input_channel, filter_num1))
    conv = convolutional(conv       , filters_shape=(3, 3, filter_num1,   filter_num2))

    residual_output = short_cut + conv
    return residual_output

def upsample(input_layer):
    return tf.image.resize(input_layer, (input_layer.shape[1] * 2, input_layer.shape[2] * 2), method='nearest')

backbone.py

import tensorflow as tf
import core.common as common


def darknet53(input_data):

    input_data = common.convolutional(input_data, (3, 3,  3,  32))
    input_data = common.convolutional(input_data, (3, 3, 32,  64), downsample=True)

    for i in range(1):
        input_data = common.residual_block(input_data,  64,  32, 64)

    input_data = common.convolutional(input_data, (3, 3,  64, 128), downsample=True)

    for i in range(2):
        input_data = common.residual_block(input_data, 128,  64, 128)

    input_data = common.convolutional(input_data, (3, 3, 128, 256), downsample=True)

    for i in range(8):
        input_data = common.residual_block(input_data, 256, 128, 256)

    route_1 = input_data
    input_data = common.convolutional(input_data, (3, 3, 256, 512), downsample=True)

    for i in range(8):
        input_data = common.residual_block(input_data, 512, 256, 512)

    route_2 = input_data
    input_data = common.convolutional(input_data, (3, 3, 512, 1024), downsample=True)

    for i in range(4):
        input_data = common.residual_block(input_data, 1024, 512, 1024)

    return route_1, route_2, input_data
