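GhostNet, proposed by Huawei's Noah's Ark Lab, is a very interesting network. Let's study it together.
In 2020, Huawei released a new lightweight network named GhostNet.
In well-performing CNN models, redundancy among feature maps plays an important role. The figure shows a visualization of the feature maps from the first residual block of ResNet-50: feeding a single image into the network produces a large number of feature maps.
Pairs of feature maps joined by a small wrench icon are highly similar to each other. This is the feature-map redundancy found in neural networks.
The authors treat similar feature maps as Ghosts of one another, so the network is called GhostNet (just kidding).
In the GhostNet paper, the authors argue that these redundant feature maps can be generated with cheaper operations (Cheap Operations). This reduces the parameter count and speeds up inference while preserving good accuracy.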
https://github.com/bubbliiiing/mobilenet-yolov4-lite-keras
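From the introduction above, the core idea of GhostNet is to generate these redundant feature maps with cheaper operations (Cheap Operations).
In the paper, the authors design a module called the Ghost Module, whose job is to replace an ordinary convolution.
The Ghost Module splits an ordinary convolution into two parts. It first applies an ordinary 1x1 convolution with fewer output channels: where a normal layer would use a 32-channel convolution, for example, this one uses 16 channels. This 1x1 convolution acts as feature integration and produces a condensed version of the input feature layer.
Then a depthwise convolution is applied. It convolves each channel separately, and it is the Cheap Operation mentioned above. It uses the condensed features from the previous step to generate the Ghost feature maps, which are then concatenated with the condensed features along the channel axis.
Viewed as a whole, then, the Ghost Module combines two simple ideas:
1. Use a 1x1 convolution to obtain a condensed set of essential features from the input.
2. Use a depthwise convolution to generate similar feature maps (Ghosts) from the condensed features.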
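To make the saving concrete, here is a rough weight count comparing an ordinary 1x1 convolution with the two steps above. It is only a sketch under stated assumptions: the helper names `conv_params` and `ghost_module_params` are made up for illustration, biases and BatchNormalization parameters are ignored, and the example sizes (40 input channels, 120 output channels, ratio 2, 3x3 depthwise kernel) are chosen arbitrarily.

```python
import math


def conv_params(c_in, c_out, kernel=1):
    """Weights of an ordinary convolution (bias and BatchNorm ignored)."""
    return kernel * kernel * c_in * c_out


def ghost_module_params(c_in, c_out, ratio=2, kernel=1, dw_kernel=3):
    """Weights of a Ghost Module: primary convolution plus depthwise cheap operation."""
    primary_channels = math.ceil(c_out / ratio)
    primary = conv_params(c_in, primary_channels, kernel)
    # Depthwise convolution: one dw_kernel x dw_kernel filter per generated ghost channel.
    ghost_channels = primary_channels * (ratio - 1)
    cheap = dw_kernel * dw_kernel * ghost_channels
    return primary + cheap


ordinary = conv_params(40, 120)
ghost = ghost_module_params(40, 120, ratio=2)
print(ordinary, ghost)  # 4800 2940
```

With ratio 2 the primary convolution costs half of the ordinary one, and the depthwise step adds only a small number of weights on top.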
The Ghost Module is implemented as follows:
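def _ghost_module(inputs, exp, kernel, dw_kernel, ratio, strides=1,
                  padding='same', use_bias=False, relu=True):
    # `math`, the Keras layers and the `slices` helper are imported or defined in the full listing below.
    # Step 1: primary convolution producing ceil(exp / ratio) condensed feature maps.
    output_channels = math.ceil(exp * 1.0 / ratio)
    x = Conv2D(output_channels, kernel, strides=strides, padding=padding, use_bias=use_bias)(inputs)
    x = BatchNormalization()(x)
    if relu:
        x = Activation('relu')(x)

    # Step 2: cheap operation, a depthwise convolution that generates (ratio - 1) ghost maps per channel.
    # It uses stride 1, because any downsampling has already been done by the primary convolution.
    dw = DepthwiseConv2D(dw_kernel, 1, padding=padding, depth_multiplier=ratio-1, use_bias=use_bias)(x)
    dw = BatchNormalization()(dw)
    if relu:
        dw = Activation('relu')(dw)

    # Concatenate both sets of maps and keep the first `exp` channels.
    x = Concatenate(axis=-1)([x, dw])
    x = Lambda(slices, arguments={'n': exp})(x)
    return x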
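The channel arithmetic in this module is easy to miss, so here is a small sketch that follows it step by step. The helper name `ghost_channels` and the example values are hypothetical; only the formulas (`math.ceil(exp / ratio)`, `depth_multiplier=ratio-1`, and the final slice to `exp` channels) come from the code above.

```python
import math


def ghost_channels(exp, ratio):
    """Return (primary, cheap, concatenated, kept) channel counts for a Ghost Module."""
    primary = math.ceil(exp / ratio)   # channels from the primary convolution
    cheap = primary * (ratio - 1)      # DepthwiseConv2D with depth_multiplier=ratio-1
    concatenated = primary + cheap     # after Concatenate(axis=-1)
    kept = min(exp, concatenated)      # the slice keeps the first `exp` channels
    return primary, cheap, concatenated, kept


print(ghost_channels(48, 2))  # (24, 24, 48, 48)
print(ghost_channels(16, 3))  # (6, 12, 18, 16)
```

When `exp` is not a multiple of `ratio`, the concatenation overshoots (18 channels in the second case), which is why the final slice is needed.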
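A Ghost Bottleneck is a bottleneck structure built from Ghost Modules, as shown here.
In essence, it uses Ghost Modules in place of the ordinary convolutions inside the bottleneck.
A Ghost Bottleneck has two parts, a main branch and a residual (shortcut) branch. The part that contains the Ghost Modules is what we call the main branch.
There are two kinds of Ghost Bottleneck, as the figure below shows. When the height and width of the feature layer need to be reduced, we set Stride=2 for the bottleneck, and some extra convolution layers are added inside it. In the main branch, a depthwise convolution with stride 2 is inserted between the two Ghost Modules to downsample the feature layer. In the residual branch, a depthwise convolution with stride 2 followed by an ordinary 1x1 convolution is added as well.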
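The two cases can be summarized with a small shape tracer. This is an illustrative sketch, not part of the original code: the function name `ghost_bottleneck_shape` is hypothetical, and it assumes `padding='same'` everywhere, so a stride-s layer yields ceil(size / s) positions.

```python
import math


def ghost_bottleneck_shape(height, width, in_channels, out_channels, stride):
    """Trace the output shape of a Ghost Bottleneck and report the shortcut type."""
    # With padding='same', a stride-s layer yields ceil(size / s) positions.
    out_h = math.ceil(height / stride)
    out_w = math.ceil(width / stride)
    # The shortcut is a plain identity only when neither size nor channels change.
    if stride == 1 and in_channels == out_channels:
        shortcut = "identity"
    else:
        shortcut = "depthwise conv + 1x1 conv"
    return (out_h, out_w, out_channels), shortcut


print(ghost_bottleneck_shape(112, 112, 16, 16, stride=1))
print(ghost_bottleneck_shape(112, 112, 16, 24, stride=2))
```

Note that in the implementation the convolutional shortcut is also used when the stride is 1 but the channel count changes, since an identity connection could not be added to the main branch in that case.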
The Ghost Bottleneck is implemented as follows:
def _ghost_bottleneck(inputs, output_channel, hidden_channel, kernel, ghost_kernel, strides, ratio, squeeze):
    input_shape = K.int_shape(inputs)  # shape of the input tensor

    # Main branch: first Ghost Module expands to hidden_channel.
    x = _ghost_module(inputs, hidden_channel, [1,1], ghost_kernel, ratio)

    # Optional downsampling between the two Ghost Modules.
    if strides > 1:
        x = DepthwiseConv2D(kernel, strides, padding='same', depth_multiplier=1, use_bias=False)(x)
        x = BatchNormalization()(x)

    # Optional squeeze-and-excitation attention (simplified signature; see the full listing below).
    if squeeze:
        x = _squeeze(x, hidden_channel, 4)

    # Second Ghost Module projects to output_channel, without activation.
    x = _ghost_module(x, output_channel, [1,1], ghost_kernel, ratio, relu=False)

    # Residual branch: identity if the shape is unchanged, otherwise depthwise conv + 1x1 conv.
    if strides == 1 and input_shape[-1] == output_channel:
        res = inputs
    else:
        res = DepthwiseConv2D(kernel, strides=strides, padding='same', depth_multiplier=1, use_bias=False)(inputs)
        res = BatchNormalization()(res)
        res = Conv2D(output_channel, (1, 1), padding='same', strides=(1, 1), use_bias=False)(res)
        res = BatchNormalization()(res)

    x = Add()([res, x])
    return x
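The overall construction of GhostNet is shown in the table:
As you can see, the whole of GhostNet is built from Ghost Bottlenecks.
When an image is fed into GhostNet, it first passes through an ordinary 16-channel 3x3 convolution block with stride 2 (convolution + normalization + activation).
Then the Ghost Bottlenecks are stacked. With them we end up with a 7x7x160 feature layer (for a 224x224x3 input).
Next, a 1x1 convolution block adjusts the number of channels, giving a 7x7x960 feature layer.
After that comes global average pooling, followed by another 1x1 convolution block that adjusts the channels to give a 1x1x1280 feature layer.
Finally, the result is flattened and passed through a fully connected layer for classification.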
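The shapes quoted above can be checked with a few lines of arithmetic. The sketch below is an illustration only: the function name `trace_classifier_shapes` is hypothetical, the (output channels, stride) pairs are read off the backbone listing later in this article, and `'same'` padding is assumed so that each stride-2 layer halves the size (rounding up).

```python
# (output_channels, stride) for each Ghost Bottleneck, read off the backbone listing.
STAGES = [
    (16, 1), (24, 2), (24, 1), (40, 2), (40, 1),
    (80, 2), (80, 1), (80, 1), (80, 1), (112, 1), (112, 1),
    (160, 2), (160, 1), (160, 1), (160, 1), (160, 1),
]


def trace_classifier_shapes(input_size=224):
    """Return (step name, (height, width, channels)) for the main steps of the classifier."""
    shapes = []
    size = -(-input_size // 2)       # 3x3 stem convolution with stride 2 (ceil division)
    shapes.append(("stem", (size, size, 16)))
    channels = 16
    for channels, stride in STAGES:
        size = -(-size // stride)    # each stride-2 bottleneck halves the size
    shapes.append(("bottlenecks", (size, size, channels)))
    shapes.append(("1x1 conv", (size, size, 960)))
    shapes.append(("global average pool", (1, 1, 960)))
    shapes.append(("1x1 conv", (1, 1, 1280)))
    return shapes


for name, shape in trace_classifier_shapes(224):
    print(name, shape)
```

For a 224x224 input the stem gives 112x112, and the four stride-2 bottlenecks bring this down to 7x7 with 160 channels.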
The GhostNet implementation is given below. This code is GhostNet as used in YoloV4, and can serve as a reference:
import math
import warnings

import numpy as np
import tensorflow as tf
from keras import backend as K
from keras.applications.imagenet_utils import decode_predictions
from keras.initializers import random_normal
from keras.layers import (Activation, Add, BatchNormalization, Concatenate,
                          Conv2D, DepthwiseConv2D, GlobalAveragePooling2D,
                          Lambda, Multiply, Reshape)

def slices(dw, n):
    # Keep only the first n channels.
    return dw[:,:,:,:n]

def _make_divisible(v, divisor, min_value=None):
    # Round v to the nearest multiple of divisor, without dropping below 90% of v.
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

def _squeeze(inputs, hidden_channel, ratio, block_id, sub_block_id):
    # Squeeze-and-excitation attention over the channels.
    x = GlobalAveragePooling2D()(inputs)
    x = Reshape((1,1,-1))(x)

    x = Conv2D(_make_divisible(hidden_channel/ratio, 4), (1,1), strides=(1,1), padding='same', kernel_initializer=random_normal(stddev=0.02),
            name="blocks."+str(block_id)+"."+str(sub_block_id)+".se.conv_reduce")(x)
    x = Activation('relu')(x)

    x = Conv2D(hidden_channel, (1,1),strides=(1,1), padding='same', kernel_initializer=random_normal(stddev=0.02),
            name="blocks."+str(block_id)+"."+str(sub_block_id)+".se.conv_expand")(x)
    x = Activation('hard_sigmoid')(x)

    x = Multiply()([inputs, x])  # element-wise product of inputs and x
    return x

def _ghost_module(inputs, exp, ratio, block_id, sub_block_id, part, kernel_size=1, dw_size=3, stride=1, relu=True):
    output_channels = math.ceil(exp * 1.0 / ratio)

    x = Conv2D(output_channels, kernel_size, strides=stride, padding="same", use_bias=False, kernel_initializer=random_normal(stddev=0.02),
            name="blocks."+str(block_id)+"."+str(sub_block_id)+".ghost"+str(part)+".primary_conv.0")(inputs)
    x = BatchNormalization(name="blocks."+str(block_id)+"."+str(sub_block_id)+".ghost"+str(part)+".primary_conv.1")(x)
    if relu:
        x = Activation('relu')(x)

    dw = DepthwiseConv2D(dw_size, 1, padding="same", depth_multiplier=ratio-1, use_bias=False, depthwise_initializer=random_normal(stddev=0.02),
            name="blocks."+str(block_id)+"."+str(sub_block_id)+".ghost"+str(part)+".cheap_operation.0")(x)
    dw = BatchNormalization(name="blocks."+str(block_id)+"."+str(sub_block_id)+".ghost"+str(part)+".cheap_operation.1")(dw)
    if relu:
        dw = Activation('relu')(dw)

    x = Concatenate(axis=-1)([x,dw])
    x = Lambda(slices, arguments={'n':exp})(x)
    return x

def _ghost_bottleneck(inputs, output_channel, hidden_channel, kernel, strides, ratio, squeeze, block_id, sub_block_id):
    input_shape = K.int_shape(inputs)

    x = _ghost_module(inputs, hidden_channel, ratio, block_id, sub_block_id, 1)

    if strides > 1:
        x = DepthwiseConv2D(kernel, strides, padding='same', depth_multiplier=1, use_bias=False, depthwise_initializer=random_normal(stddev=0.02),
                name="blocks."+str(block_id)+"."+str(sub_block_id)+".conv_dw")(x)
        x = BatchNormalization(name="blocks."+str(block_id)+"."+str(sub_block_id)+".bn_dw")(x)

    if squeeze:
        x = _squeeze(x, hidden_channel, 4, block_id, sub_block_id)

    x = _ghost_module(x, output_channel, ratio, block_id, sub_block_id, 2, relu=False)

    if strides == 1 and input_shape[-1] == output_channel:
        res = inputs
    else:
        res = DepthwiseConv2D(kernel, strides=strides, padding='same', depth_multiplier=1, use_bias=False, depthwise_initializer=random_normal(stddev=0.02),
                name="blocks."+str(block_id)+"."+str(sub_block_id)+".shortcut.0")(inputs)
        res = BatchNormalization(name="blocks."+str(block_id)+"."+str(sub_block_id)+".shortcut.1")(res)
        res = Conv2D(output_channel, (1, 1), padding='same', strides=(1, 1), use_bias=False, kernel_initializer=random_normal(stddev=0.02),
                name="blocks."+str(block_id)+"."+str(sub_block_id)+".shortcut.2")(res)
        res = BatchNormalization(name="blocks."+str(block_id)+"."+str(sub_block_id)+".shortcut.3")(res)
    x = Add()([res, x])
    return x

def Ghostnet(inputs):
    # Stem: 3x3 convolution, 16 channels, stride 2.
    x = Conv2D(16, (3, 3), padding="same", strides=(2, 2), use_bias=False, kernel_initializer=random_normal(stddev=0.02), name="conv_stem")(inputs)
    x = BatchNormalization(name="bn1")(x)
    x = Activation('relu')(x)

    x = _ghost_bottleneck(x, 16, 16, (3, 3), strides=1, ratio=2, squeeze=False, block_id=0, sub_block_id=0)

    x = _ghost_bottleneck(x, 24, 48, (3, 3), strides=2, ratio=2, squeeze=False, block_id=1, sub_block_id=0)
    x = _ghost_bottleneck(x, 24, 72, (3, 3), strides=1, ratio=2, squeeze=False, block_id=2, sub_block_id=0)

    x = _ghost_bottleneck(x, 40, 72, (5, 5), strides=2, ratio=2, squeeze=True, block_id=3, sub_block_id=0)
    x = _ghost_bottleneck(x, 40, 120, (5, 5), strides=1, ratio=2, squeeze=True, block_id=4, sub_block_id=0)
    feat1 = x

    x = _ghost_bottleneck(x, 80, 240, (3, 3), strides=2, ratio=2, squeeze=False, block_id=5, sub_block_id=0)
    x = _ghost_bottleneck(x, 80, 200, (3, 3), strides=1, ratio=2, squeeze=False, block_id=6, sub_block_id=0)
    x = _ghost_bottleneck(x, 80, 184, (3, 3), strides=1, ratio=2, squeeze=False, block_id=6, sub_block_id=1)
    x = _ghost_bottleneck(x, 80, 184, (3, 3), strides=1, ratio=2, squeeze=False, block_id=6, sub_block_id=2)
    x = _ghost_bottleneck(x, 112, 480, (3, 3), strides=1, ratio=2, squeeze=True, block_id=6, sub_block_id=3)
    x = _ghost_bottleneck(x, 112, 672, (3, 3), strides=1, ratio=2, squeeze=True, block_id=6, sub_block_id=4)
    feat2 = x

    x = _ghost_bottleneck(x, 160, 672, (5, 5), strides=2, ratio=2, squeeze=True, block_id=7, sub_block_id=0)
    x = _ghost_bottleneck(x, 160, 960, (5, 5), strides=1, ratio=2, squeeze=False, block_id=8, sub_block_id=0)
    x = _ghost_bottleneck(x, 160, 960, (5, 5), strides=1, ratio=2, squeeze=True, block_id=8, sub_block_id=1)
    x = _ghost_bottleneck(x, 160, 960, (5, 5), strides=1, ratio=2, squeeze=False, block_id=8, sub_block_id=2)
    x = _ghost_bottleneck(x, 160, 960, (5, 5), strides=1, ratio=2, squeeze=True, block_id=8, sub_block_id=3)
    feat3 = x

    # Three effective feature layers at strides 8, 16 and 32.
    return feat1,feat2,feat3
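Since GhostNet is a lightweight network, I put it alongside MobileNet as a backbone for YoloV4 feature extraction.
YoloV4 builds its enhanced feature pyramid from three effective feature layers produced by the backbone.
The code above outputs three effective feature layers, which can replace the effective feature layers of YoloV4's original backbone, CSPdarknet53.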
To reduce the parameter count further, we can use depthwise separable convolutions in place of the ordinary convolutions used in YoloV4.
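As a rough illustration of how much this saves, the sketch below counts the weights of an ordinary 3x3 convolution and of a depthwise separable one (a 3x3 depthwise convolution followed by a 1x1 pointwise convolution). The helper names are hypothetical, biases and BatchNormalization parameters are ignored, and the 512 to 1024 channel example is picked to resemble the layers in the detection head.

```python
def standard_conv_params(c_in, c_out, kernel=3):
    """Weights of an ordinary kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(c_in, c_out, kernel=3):
    """Weights of a depthwise separable convolution."""
    depthwise = kernel * kernel * c_in  # one filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes channels
    return depthwise + pointwise


print(standard_conv_params(512, 1024))   # 4718592
print(separable_conv_params(512, 1024))  # 528896
```

For this layer the separable version needs roughly one ninth of the weights.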
The final GhostNet-YoloV4 construction code is as follows:
from functools import wraps

from keras import backend as K
from keras.initializers import random_normal
from keras.layers import (Activation, BatchNormalization, Concatenate, Conv2D,
                          DepthwiseConv2D, Input, Lambda, MaxPooling2D,
                          UpSampling2D)
from keras.models import Model
from keras.regularizers import l2
from utils.utils import compose

from nets.ghostnet import Ghostnet
from nets.mobilenet_v1 import MobileNetV1
from nets.mobilenet_v2 import MobileNetV2
from nets.mobilenet_v3 import MobileNetV3
from nets.yolo_training import yolo_loss


def relu6(x):
    return K.relu(x, max_value=6)

#------------------------------------------------------#
#   Single convolution: DarknetConv2D
#   Uses 'valid' padding when the stride is (2, 2),
#   'same' padding otherwise.
#------------------------------------------------------#
@wraps(Conv2D)
def DarknetConv2D(*args, **kwargs):
    darknet_conv_kwargs = {'kernel_initializer' : random_normal(stddev=0.02), 'kernel_regularizer': l2(5e-4)}
    darknet_conv_kwargs['padding'] = 'valid' if kwargs.get('strides')==(2,2) else 'same'
    darknet_conv_kwargs.update(kwargs)
    return Conv2D(*args, **darknet_conv_kwargs)

#---------------------------------------------------#
#   Convolution block -> convolution + normalization + activation
#   DarknetConv2D + BatchNormalization + Relu6
#---------------------------------------------------#
def DarknetConv2D_BN_Leaky(*args, **kwargs):
    no_bias_kwargs = {'use_bias': False}
    no_bias_kwargs.update(kwargs)
    return compose(
        DarknetConv2D(*args, **no_bias_kwargs),
        BatchNormalization(),
        Activation(relu6))

#---------------------------------------------------#
#   Depthwise separable convolution block
#   DepthwiseConv2D + BatchNormalization + Relu6
#---------------------------------------------------#
def _depthwise_conv_block(inputs, pointwise_conv_filters, alpha = 1,
                          depth_multiplier=1, strides=(1, 1)):
    pointwise_conv_filters = int(pointwise_conv_filters * alpha)

    x = DepthwiseConv2D((3, 3), depthwise_initializer=random_normal(stddev=0.02),
                        padding='same',
                        depth_multiplier=depth_multiplier,
                        strides=strides,
                        use_bias=False)(inputs)

    x = BatchNormalization()(x)
    x = Activation(relu6)(x)

    x = DarknetConv2D(pointwise_conv_filters, (1, 1),
                      padding='same',
                      use_bias=False,
                      strides=(1, 1))(x)
    x = BatchNormalization()(x)
    return Activation(relu6)(x)

#---------------------------------------------------#
#   Five convolutions
#---------------------------------------------------#
def make_five_convs(x, num_filters):
    x = DarknetConv2D_BN_Leaky(num_filters, (1,1))(x)
    x = _depthwise_conv_block(x, num_filters*2,alpha=1)
    x = DarknetConv2D_BN_Leaky(num_filters, (1,1))(x)
    x = _depthwise_conv_block(x, num_filters*2,alpha=1)
    x = DarknetConv2D_BN_Leaky(num_filters, (1,1))(x)
    return x

#---------------------------------------------------#
#   Build the PANet and obtain the predictions
#---------------------------------------------------#
def yolo_body(input_shape, anchors_mask, num_classes, backbone="mobilenetv1", alpha=1):
    inputs      = Input(input_shape)
    #---------------------------------------------------#
    #   Build the backbone model and obtain three
    #   effective feature layers.
    #---------------------------------------------------#
    if backbone=="mobilenetv1":
        #---------------------------------------------------#
        #   52,52,256;26,26,512;13,13,1024
        #---------------------------------------------------#
        feat1,feat2,feat3 = MobileNetV1(inputs, alpha=alpha)
    elif backbone=="mobilenetv2":
        #---------------------------------------------------#
        #   52,52,32;26,26,96;13,13,320
        #---------------------------------------------------#
        feat1,feat2,feat3 = MobileNetV2(inputs, alpha=alpha)
    elif backbone=="mobilenetv3":
        #---------------------------------------------------#
        #   52,52,40;26,26,112;13,13,160
        #---------------------------------------------------#
        feat1,feat2,feat3 = MobileNetV3(inputs, alpha=alpha)
    elif backbone=="ghostnet":
        #---------------------------------------------------#
        #   52,52,40;26,26,112;13,13,160
        #---------------------------------------------------#
        feat1,feat2,feat3 = Ghostnet(inputs)
    else:
        raise ValueError('Unsupported backbone - `{}`, Use mobilenetv1, mobilenetv2, mobilenetv3, ghostnet.'.format(backbone))

    P5 = DarknetConv2D_BN_Leaky(int(512* alpha), (1,1))(feat3)
    P5 = _depthwise_conv_block(P5, int(1024* alpha))
    P5 = DarknetConv2D_BN_Leaky(int(512* alpha), (1,1))(P5)
    # SPP: max pooling at three scales, concatenated with the input.
    maxpool1 = MaxPooling2D(pool_size=(13,13), strides=(1,1), padding='same')(P5)
    maxpool2 = MaxPooling2D(pool_size=(9,9), strides=(1,1), padding='same')(P5)
    maxpool3 = MaxPooling2D(pool_size=(5,5), strides=(1,1), padding='same')(P5)
    P5 = Concatenate()([maxpool1, maxpool2, maxpool3, P5])
    P5 = DarknetConv2D_BN_Leaky(int(512* alpha), (1,1))(P5)
    P5 = _depthwise_conv_block(P5, int(1024* alpha))
    P5 = DarknetConv2D_BN_Leaky(int(512* alpha), (1,1))(P5)

    P5_upsample = compose(DarknetConv2D_BN_Leaky(int(256* alpha), (1,1)), UpSampling2D(2))(P5)

    P4 = DarknetConv2D_BN_Leaky(int(256* alpha), (1,1))(feat2)
    P4 = Concatenate()([P4, P5_upsample])
    P4 = make_five_convs(P4,int(256* alpha))

    P4_upsample = compose(DarknetConv2D_BN_Leaky(int(128* alpha), (1,1)), UpSampling2D(2))(P4)

    P3 = DarknetConv2D_BN_Leaky(int(128* alpha), (1,1))(feat1)
    P3 = Concatenate()([P3, P4_upsample])
    P3 = make_five_convs(P3,int(128* alpha))

    #---------------------------------------------------#
    #   Third feature layer
    #   y3=(batch_size,52,52,3,85)
    #---------------------------------------------------#
    P3_output = _depthwise_conv_block(P3, int(256* alpha))
    P3_output = DarknetConv2D(len(anchors_mask[0])*(num_classes+5), (1,1))(P3_output)

    P3_downsample = _depthwise_conv_block(P3, int(256* alpha), strides=(2,2))
    P4 = Concatenate()([P3_downsample, P4])
    P4 = make_five_convs(P4,int(256* alpha))

    #---------------------------------------------------#
    #   Second feature layer
    #   y2=(batch_size,26,26,3,85)
    #---------------------------------------------------#
    P4_output = _depthwise_conv_block(P4, int(512* alpha))
    P4_output = DarknetConv2D(len(anchors_mask[1])*(num_classes+5), (1,1))(P4_output)

    P4_downsample = _depthwise_conv_block(P4, int(512* alpha), strides=(2,2))
    P5 = Concatenate()([P4_downsample, P5])
    P5 = make_five_convs(P5,int(512* alpha))

    #---------------------------------------------------#
    #   First feature layer
    #   y1=(batch_size,13,13,3,85)
    #---------------------------------------------------#
    P5_output = _depthwise_conv_block(P5, int(1024* alpha))
    P5_output = DarknetConv2D(len(anchors_mask[2])*(num_classes+5), (1,1))(P5_output)

    return Model(inputs, [P5_output, P4_output, P3_output])
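To see what the three outputs look like, the sketch below computes their shapes without building the model. It is only an illustration: the function name `yolo_head_shapes` is hypothetical, the `anchors_mask` value and the 416x416 input with 80 classes are assumed example settings, and the input size is assumed to be a multiple of 32. The strides 32, 16 and 8 correspond to `feat3`, `feat2` and `feat1`.

```python
def yolo_head_shapes(input_size, anchors_mask, num_classes):
    """Return {head name: (grid_h, grid_w, channels)} for the three YOLO outputs."""
    # (name, total downsampling stride, anchors assigned to this head),
    # following the indexing used in yolo_body above.
    heads = [
        ("P5", 32, anchors_mask[2]),
        ("P4", 16, anchors_mask[1]),
        ("P3", 8, anchors_mask[0]),
    ]
    shapes = {}
    for name, stride, mask in heads:
        grid = input_size // stride
        # Each anchor predicts 4 box offsets, 1 objectness score and num_classes scores.
        channels = len(mask) * (num_classes + 5)
        shapes[name] = (grid, grid, channels)
    return shapes


# Assumed example settings: 416x416 input, 80 classes, 3 anchors per head.
example_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
print(yolo_head_shapes(416, example_mask, 80))
```

With these settings each head has 3 x (80 + 5) = 255 channels, which is the flattened form of the `(batch_size, 13, 13, 3, 85)` layout mentioned in the comments.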