The Full Stop of ImageNet: Reproducing EfficientNet in Keras [Part 1: The Main Model]

0. A few words up front

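EfficientNet came out in 2019, and it is now early 2021. Looking back, throughout 2020 no network in the ImageNet space managed to gain an overwhelming advantage over EfficientNet.
That alone shows how formidable it is.
Explanations of EfficientNet online are too numerous to count, yet if you go back through the articles from 2019, few authors really understood it. Most attention stayed on the simple idea of "optimizing model accuracy and efficiency".
Then the source code was released, and people realized how many fine details in the paper needed careful reading.
I won't go over the theory, since a search turns up plenty. I'll highlight a few key points and then go straight to the code.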

1. Joint optimization

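This is the idea that governs all of EfficientNet. I first ran into the notion of "slimming" a network in MobileNet. MobileNet rigorously showed that, with depthwise separable convolution as its basic building block, it can cut the parameter count while reaching almost the same accuracy.
Beyond that, MobileNet also introduced the "width multiplier", which is clearly the "infant version" of the compound scaling method in EfficientNet.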
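To put a number on the parameter savings of depthwise separable convolution mentioned above, here is a minimal sketch that counts weights for one layer. The layer sizes (3x3 kernel, 128 input channels, 256 output channels) are hypothetical and chosen only for illustration; biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
k, c_in, c_out = 3, 128, 256
std = standard_conv_params(k, c_in, c_out)
sep = separable_conv_params(k, c_in, c_out)
print(std, sep, round(sep / std, 3))   # 294912 33920 0.115
```

The ratio works out to 1/c_out + 1/k^2, so for a 3x3 kernel the separable version needs roughly one ninth of the weights.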
Joint optimization works, and works extremely well. This is the theoretical foundation of EfficientNet.
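As a rough illustration of what scaling all three dimensions together means, the sketch below applies the compound scaling rule with the base constants I recall from the EfficientNet paper (alpha = 1.2 for depth, beta = 1.1 for width, gamma = 1.15 for resolution). These constants are an assumption quoted from the paper, not taken from this post's code, and the per-variant coefficients hard-coded at the end of this post are the released values, which do not follow these powers exactly.

```python
# Base constants as reported in the EfficientNet paper (assumption: quoted
# from the paper, not from this post's code).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # depth, width, resolution bases


def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


def flops_factor(phi):
    """Approximate FLOPS growth: depth * width^2 * resolution^2."""
    d, w, r = compound_scale(phi)
    return d * w ** 2 * r ** 2


for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, "
          f"FLOPS x{flops_factor(phi):.2f}")
```

With these constants, alpha * beta^2 * gamma^2 is about 1.92, so each increment of phi roughly doubles the compute budget.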

2. Seven parameter vectors and reinforcement learning

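I am currently working on an SCI journal paper whose core is using the NSGA-II algorithm to search over neural network hyperparameter vectors. When I came up with the idea I honestly had not heard of MnasNet; I just felt that using heuristic algorithms to search neural networks was far too resource-hungry and unlikely to succeed.
Then the EfficientNet-B0 results showed me up: the first set of parameters that compound scaling starts from was, of all things, found by reinforcement learning.
Optimizing a neural network with resource constraints on the target device plus a Pareto solution space sounds like a dream. Yet EfficientNet simply went ahead and delivered a searched baseline plus seven scaled-up variants, which is astonishing.
There is one problem, though. I think this should be "one training set, one set of scaling parameters". With Google doing it in such an elaborate way, I really do wonder how practical this network is to deploy.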
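For readers unfamiliar with the "Pareto solution space" mentioned above, here is a minimal sketch of filtering candidate networks down to their Pareto front on two objectives. It is not the search procedure used by MnasNet, EfficientNet, or NSGA-II; the candidate (accuracy, cost) pairs are made up purely for illustration.

```python
def dominates(a, b):
    """True if a is at least as good as b on both objectives and strictly better on one.

    Each candidate is (accuracy, cost): higher accuracy is better, lower cost is better.
    """
    acc_a, cost_a = a
    acc_b, cost_b = b
    no_worse = acc_a >= acc_b and cost_a <= cost_b
    strictly_better = acc_a > acc_b or cost_a < cost_b
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up (accuracy, cost) pairs, purely for illustration.
candidates = [(0.70, 1.0), (0.75, 2.0), (0.74, 3.0), (0.80, 5.0), (0.78, 6.0)]
front = pareto_front(candidates)
print(front)   # [(0.7, 1.0), (0.75, 2.0), (0.8, 5.0)]
```

Every point left on the front is a different accuracy/cost trade-off; picking one of them is where the resource budget of the target device comes in.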

3. MBConv: the first large-scale integration of SE

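As everyone knows, MobileNet is all about being lightweight, and I had never heard of SE being used this way. Previously, SE was attached at a few nodes as a small add-on. EfficientNet stacks the two together, layer after layer, into a large setting. Annoyingly, it also works very well.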
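To make the squeeze-and-excitation idea concrete before it appears inside MBConv, here is a minimal NumPy sketch of channel gating on a single feature map. The random weights, the 4x4x8 input, and the omission of biases are simplifying assumptions for illustration; the Keras implementation later in this post uses 1x1 convolutions instead of plain matrices.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def swish(x):
    return x * sigmoid(x)


def squeeze_excite(feature_map, w_reduce, w_expand):
    """Channel gating on a single (H, W, C) feature map; biases omitted for brevity."""
    squeezed = feature_map.mean(axis=(0, 1))   # (C,) global average pooling
    hidden = swish(squeezed @ w_reduce)        # (C_reduced,) bottleneck
    gate = sigmoid(hidden @ w_expand)          # (C,) one weight in (0, 1) per channel
    return feature_map * gate                  # broadcast over H and W


rng = np.random.default_rng(0)
h, w, c, ratio = 4, 4, 8, 0.25
c_reduced = max(1, int(c * ratio))
x = rng.normal(size=(h, w, c))
out = squeeze_excite(x,
                     rng.normal(size=(c, c_reduced)),
                     rng.normal(size=(c_reduced, c)))
print(out.shape)   # (4, 4, 8)
```

The output keeps the input shape; each channel is only rescaled by its own gate value, which is what makes SE cheap enough to add to every block.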

4. More on that later

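The more I write, the more I feel I still don't understand it thoroughly. The more you say, the more mistakes you make, so I'll say less for now.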

import tensorflow as tf
import math
NUM_CLASSES = 6

# How the swish activation works (for reference only; the code below uses tf.nn.swish)
'''
def swish(x):
    return x * tf.nn.sigmoid(x)
'''


def round_filters(filters, multiplier):
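    '''
    Width scaling with rounding.
    :param filters: original number of channels
    :param multiplier: width scaling coefficient
    :return: number of channels after scaling
    Properties of this method:
    1. The result is never lower than 8.
    2. The result is never lower than 0.9 times the scaled channel count (filters * multiplier).
    3. The result is always a multiple of the depth divisor (8). Rounding goes to the nearest
       multiple; one extra divisor is added only when rule 2 would otherwise be violated.
    '''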
    depth_divisor = 8
    min_depth = None
    min_depth = min_depth or depth_divisor
    filters = filters * multiplier
    new_filters = max(min_depth, int(filters + depth_divisor / 2) // depth_divisor * depth_divisor)
    if new_filters < 0.9 * filters:
        new_filters += depth_divisor
    return int(new_filters)


def round_repeats(repeats, multiplier):
    '''
    Depth scaling with rounding (always rounds up).
    :param repeats: original depth (number of layers)
    :param multiplier: depth scaling coefficient
    :return: depth after scaling
    '''
    if not multiplier:
        return repeats
    return int(math.ceil(multiplier * repeats))
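A few worked values may make the two rounding rules easier to follow. The block below is a compact, self-contained restatement of the two functions above (same arithmetic, with the divisor exposed as a parameter for illustration), followed by inputs picked to hit each rule.

```python
import math


def round_filters(filters, multiplier, divisor=8):
    """Compact restatement of the width rounding rule above."""
    scaled = filters * multiplier
    rounded = max(divisor, int(scaled + divisor / 2) // divisor * divisor)
    if rounded < 0.9 * scaled:      # never lose more than 10% of the channels
        rounded += divisor
    return int(rounded)


def round_repeats(repeats, multiplier):
    """Compact restatement of the depth rounding rule above."""
    return int(math.ceil(multiplier * repeats))


print(round_filters(32, 1.0))   # 32: already a multiple of 8
print(round_filters(40, 1.4))   # 56: 40 * 1.4 = 56 needs no adjustment
print(round_filters(24, 1.2))   # 32: 28.8 rounds to the nearest multiple of 8
print(round_filters(10, 1.0))   # 16: 8 would fall below 0.9 * 10, so one divisor is added
print(round_repeats(2, 1.1))    # 3: ceil(2.2)
print(round_repeats(4, 3.1))    # 13: ceil(12.4)
```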


'''
SE block
The idea is to use global average pooling to extract per-channel statistics, then fuse the result back into the feature map.
It acts as an attention mechanism that reweights the channels.
=> Basic structure
:input shape = (ImageHeight, ImageWidth, FilterNum)
:layer1 = GlobalAveragePooling(input), shape = (1, 1, FilterNum)
:layer2 = FC(layer1), shape = (1, 1, FilterNum/r), where r is the channel reduction factor (r = 1 / ratio, so 4 in the code below)
:layer3 = Activation(layer2), shape = (1, 1, FilterNum/r); the activation is swish in the code below
:layer4 = FC(layer3), shape = (1, 1, FilterNum); restores the channel count
:output = Sigmoid(layer4) * input, shape = (ImageHeight, ImageWidth, FilterNum); back to the original shape
'''
class SEBlock(tf.keras.layers.Layer):
    def __init__(self, input_channels, ratio=0.25):
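        '''
        Parameters and member variables:
        :param input_channels: number of input channels
        :param ratio: channel reduction ratio applied in the FC layers
        :argument self.num_reduced_filters: number of channels after reduction (at least 1)
        :argument self.pool: global average pooling
        :argument self.reduce_conv and self.expand_conv: the reducing and restoring FC layers, implemented with 1x1 Conv2D
        '''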
        super(SEBlock, self).__init__()
        self.num_reduced_filters = max(1, int(input_channels * ratio))
        self.pool = tf.keras.layers.GlobalAveragePooling2D()
        self.reduce_conv = tf.keras.layers.Conv2D(filters=self.num_reduced_filters,
                                                  kernel_size=(1, 1),
                                                  strides=1,
                                                  padding="same")
        self.expand_conv = tf.keras.layers.Conv2D(filters=input_channels,
                                                  kernel_size=(1, 1),
                                                  strides=1,
                                                  padding="same")

    def call(self, inputs, **kwargs):
        branch = self.pool(inputs)
        branch = tf.expand_dims(input=branch, axis=1)
        branch = tf.expand_dims(input=branch, axis=1)
        branch = self.reduce_conv(branch)
        branch = tf.nn.swish(branch)
        branch = self.expand_conv(branch)
        branch = tf.nn.sigmoid(branch)
        output = inputs * branch
        return output

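'''
The astonishing MBConv block
MBConv brings together several advanced ideas. It contains an SE layer as the short connection and a residual structure as the long connection.
EfficientNet's backbone is built from MBConv blocks. In the paper, after many experiments, the authors found the best way to combine them.
=> Part 1
:input, denoted "x", which is reused by the residual connection
:layer1 = FC(1*1)
:layer2 = BN(layer1)  Batch Normalization, a normalization method for accelerating training
// Honestly I don't know why BN speeds up training. Mechanically it just computes the mean and variance of each channel over the mini-batch, then normalizes every element by its distance from the mean.
// But it works, just like dropout. It works very well.
:layer3 = swish(layer2)
:layer4 = DepthwiseConv2D(layer3)  a conv2d that does not sum over channels
:layer5 = BN(layer4)
:layer6 = swish(layer5)

=> Part 2: SE block

=> Part 3
:layer1 = FC(SE_output)
:layer2 = BN(layer1)
:layer3 = Dropout(layer2)  applied only when the residual connection is used

=> output = F(x) + x (residual equation), used only when stride == 1 and the input and output channel counts match
'''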
class MBConv(tf.keras.layers.Layer):
    def __init__(self, in_channels, out_channels, expansion_factor, stride, k, drop_connect_rate):
        super(MBConv, self).__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.stride = stride
        self.drop_connect_rate = drop_connect_rate
        self.conv1 = tf.keras.layers.Conv2D(filters=in_channels * expansion_factor,
                                            kernel_size=(1, 1),
                                            strides=1,
                                            padding="same")
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.dwconv = tf.keras.layers.DepthwiseConv2D(kernel_size=(k, k),
                                                      strides=stride,
                                                      padding="same")
        self.bn2 = tf.keras.layers.BatchNormalization()
        self.se = SEBlock(input_channels=in_channels * expansion_factor)
        self.conv2 = tf.keras.layers.Conv2D(filters=out_channels,
                                            kernel_size=(1, 1),
                                            strides=1,
                                            padding="same")
        self.bn3 = tf.keras.layers.BatchNormalization()
        self.dropout = tf.keras.layers.Dropout(rate=drop_connect_rate)

    def call(self, inputs, training=None, **kwargs):
        x = self.conv1(inputs)
        x = self.bn1(x, training=training)
        x = tf.nn.swish(x)
        x = self.dwconv(x)
        x = self.bn2(x, training=training)
        x = tf.nn.swish(x)  # activate right after BN, before the SE block
        x = self.se(x)
        x = self.conv2(x)
        x = self.bn3(x, training=training)
        if self.stride == 1 and self.in_channels == self.out_channels:
            if self.drop_connect_rate:
                x = self.dropout(x, training=training)
            x = tf.keras.layers.add([x, inputs])
        return x


def build_mbconv_block(in_channels, out_channels, layers, stride, expansion_factor, k, drop_connect_rate):
    block = tf.keras.Sequential()
    for i in range(layers):
        if i == 0:
            block.add(MBConv(in_channels=in_channels,
                             out_channels=out_channels,
                             expansion_factor=expansion_factor,
                             stride=stride,
                             k=k,
                             drop_connect_rate=drop_connect_rate))
        else:
            block.add(MBConv(in_channels=out_channels,
                             out_channels=out_channels,
                             expansion_factor=expansion_factor,
                             stride=1,
                             k=k,
                             drop_connect_rate=drop_connect_rate))
    return block
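One detail worth flagging: the MBConv class above uses a plain `tf.keras.layers.Dropout`, which zeroes individual activations. As far as I know, the drop connect in the official EfficientNet code instead drops the whole residual branch for randomly chosen samples (stochastic depth). The NumPy sketch below shows that per-sample variant under this assumption; the function name and the all-ones input are hypothetical, for illustration only.

```python
import numpy as np


def drop_connect(residual_branch, drop_rate, rng, training=True):
    """Per-sample stochastic depth on a (batch, H, W, C) residual branch."""
    if not training or drop_rate == 0.0:
        return residual_branch
    keep_prob = 1.0 - drop_rate
    batch = residual_branch.shape[0]
    # One Bernoulli draw per sample, broadcast over H, W and C.
    keep_mask = rng.random(size=(batch, 1, 1, 1)) < keep_prob
    # Rescale kept samples so the expected value matches inference.
    return residual_branch * keep_mask / keep_prob


rng = np.random.default_rng(0)
branch = np.ones((6, 2, 2, 3))
out = drop_connect(branch, drop_rate=0.5, rng=rng)
# Each sample is either all zeros or uniformly scaled by 1 / keep_prob.
print([float(sample.max()) for sample in out])
```

In Keras, a similar per-sample mask should be achievable through the `noise_shape` argument of `Dropout`, though I have not verified that against every Keras version.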


class EfficientNet(tf.keras.Model):
    def __init__(self, width_coefficient, depth_coefficient, dropout_rate, drop_connect_rate=0.2):
        super(EfficientNet, self).__init__()

        self.conv1 = tf.keras.layers.Conv2D(filters=round_filters(32, width_coefficient),
                                            kernel_size=(3, 3),
                                            strides=2,
                                            padding="same")
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.block1 = build_mbconv_block(in_channels=round_filters(32, width_coefficient),
                                         out_channels=round_filters(16, width_coefficient),
                                         layers=round_repeats(1, depth_coefficient),
                                         stride=1,
                                         expansion_factor=1, k=3, drop_connect_rate=drop_connect_rate)
        self.block2 = build_mbconv_block(in_channels=round_filters(16, width_coefficient),
                                         out_channels=round_filters(24, width_coefficient),
                                         layers=round_repeats(2, depth_coefficient),
                                         stride=2,
                                         expansion_factor=6, k=3, drop_connect_rate=drop_connect_rate)
        self.block3 = build_mbconv_block(in_channels=round_filters(24, width_coefficient),
                                         out_channels=round_filters(40, width_coefficient),
                                         layers=round_repeats(2, depth_coefficient),
                                         stride=2,
                                         expansion_factor=6, k=5, drop_connect_rate=drop_connect_rate)
        self.block4 = build_mbconv_block(in_channels=round_filters(40, width_coefficient),
                                         out_channels=round_filters(80, width_coefficient),
                                         layers=round_repeats(3, depth_coefficient),
                                         stride=2,
                                         expansion_factor=6, k=3, drop_connect_rate=drop_connect_rate)
        self.block5 = build_mbconv_block(in_channels=round_filters(80, width_coefficient),
                                         out_channels=round_filters(112, width_coefficient),
                                         layers=round_repeats(3, depth_coefficient),
                                         stride=1,
                                         expansion_factor=6, k=5, drop_connect_rate=drop_connect_rate)
        self.block6 = build_mbconv_block(in_channels=round_filters(112, width_coefficient),
                                         out_channels=round_filters(192, width_coefficient),
                                         layers=round_repeats(4, depth_coefficient),
                                         stride=2,
                                         expansion_factor=6, k=5, drop_connect_rate=drop_connect_rate)
        self.block7 = build_mbconv_block(in_channels=round_filters(192, width_coefficient),
                                         out_channels=round_filters(320, width_coefficient),
                                         layers=round_repeats(1, depth_coefficient),
                                         stride=1,
                                         expansion_factor=6, k=3, drop_connect_rate=drop_connect_rate)

        self.conv2 = tf.keras.layers.Conv2D(filters=round_filters(1280, width_coefficient),
                                            kernel_size=(1, 1),
                                            strides=1,
                                            padding="same")
        self.bn2 = tf.keras.layers.BatchNormalization()
        self.pool = tf.keras.layers.GlobalAveragePooling2D()
        self.dropout = tf.keras.layers.Dropout(rate=dropout_rate)
        self.fc = tf.keras.layers.Dense(units=NUM_CLASSES,
                                        activation=tf.keras.activations.softmax)

    def call(self, inputs, training=None, mask=None):
        x = self.conv1(inputs)
        x = self.bn1(x, training=training)
        x = tf.nn.swish(x)

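        # Pass the training flag explicitly so BN and dropout inside each block see it.
        x = self.block1(x, training=training)
        x = self.block2(x, training=training)
        x = self.block3(x, training=training)
        x = self.block4(x, training=training)
        x = self.block5(x, training=training)
        x = self.block6(x, training=training)
        x = self.block7(x, training=training)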

        x = self.conv2(x)
        x = self.bn2(x, training=training)
        x = tf.nn.swish(x)
        x = self.pool(x)
        x = self.dropout(x, training=training)
        x = self.fc(x)

        return x


def get_efficient_net(width_coefficient, depth_coefficient, resolution, dropout_rate):
    net = EfficientNet(width_coefficient=width_coefficient,
                       depth_coefficient=depth_coefficient,
                       dropout_rate=dropout_rate)
    net.build(input_shape=(None, resolution, resolution, 3))
    net.summary()

    return net


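# Eight network levels (B0 to B7): the larger the index, the larger the model and the higher the accuracy.
# Arguments are (width_coefficient, depth_coefficient, resolution, dropout_rate).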
def efficient_net_b0():
    return get_efficient_net(1.0, 1.0, 224, 0.2)


def efficient_net_b1():
    return get_efficient_net(1.0, 1.1, 240, 0.2)


def efficient_net_b2():
    return get_efficient_net(1.1, 1.2, 260, 0.3)


def efficient_net_b3():
    return get_efficient_net(1.2, 1.4, 300, 0.3)


def efficient_net_b4():
    return get_efficient_net(1.4, 1.8, 380, 0.4)


def efficient_net_b5():
    return get_efficient_net(1.6, 2.2, 456, 0.4)


def efficient_net_b6():
    return get_efficient_net(1.8, 2.6, 528, 0.5)


def efficient_net_b7():
    return get_efficient_net(2.0, 3.1, 600, 0.5)
