This article covers CNNs from a theoretical angle only and provides code for the AlexNet, VGG, and ResNet network structures.
A CNN is made up of an input layer, convolutional layers, pooling layers, and fully connected layers.
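The left side shows an ordinary neural network and the right side a convolutional neural network, both with fairly simple structures. A convolutional neural network extends the basic neural network from one dimension to three, which makes it well suited to operating on images.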
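Suppose we feed a $32 \times 32 \times 3$ image into the CNN. The convolutional layer performs feature extraction: it takes the image as input and outputs a feature map. First we need to set the convolutional layer's parameters, such as the kernel size, the stride, and the padding.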
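The convolution operation works on one window of the image at a time: each pixel in the window is multiplied by the corresponding kernel entry, and the products are summed to give the result. Taking the red box, i.e. the first convolution step, as an example, the result is: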
$$0 \times 1 + 2 \times 0 + 4 \times 1 + 1 \times 0 + 3 \times 1 + 5 \times 0 + 30 \times 1 + 12 \times 0 + 32 \times 1 = 69$$
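To make the multiply-and-sum step concrete, here is a minimal pure-Python sketch. The pixel values and kernel entries are read off the sum above; arranging them row by row into 3×3 grids is an assumption, and `convolve_window` is an illustrative helper name, not something from a library.

```python
# Pixel values inside the red box and the 3x3 kernel, taken from the sum in the text.
# The row-by-row layout of both grids is an assumption.
window = [
    [0, 2, 4],
    [1, 3, 5],
    [30, 12, 32],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]


def convolve_window(window, kernel):
    """Multiply each pixel by the matching kernel entry and sum the products."""
    return sum(
        pixel * weight
        for pixel_row, kernel_row in zip(window, kernel)
        for pixel, weight in zip(pixel_row, kernel_row)
    )


result = convolve_window(window, kernel)
print(result)  # 69
```

Sliding the same window across the image by the stride and repeating this sum at every position produces the full feature map.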
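The figure shows convolution on a single channel. Because our input is a three-channel RGB image, we need three kernels, one per channel: each channel is convolved with its own kernel, and the three results are added together to give the feature map.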
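The per-channel sum can be sketched the same way. In this example the red-channel window and kernel reuse the numbers from the single-channel example; the green and blue windows and kernels are made-up values for illustration only.

```python
def convolve_window(window, kernel):
    """Multiply each pixel by the matching kernel entry and sum the products."""
    return sum(
        pixel * weight
        for pixel_row, kernel_row in zip(window, kernel)
        for pixel, weight in zip(pixel_row, kernel_row)
    )


# One 3x3 window per channel (green and blue values are hypothetical).
windows = {
    "R": [[0, 2, 4], [1, 3, 5], [30, 12, 32]],
    "G": [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
    "B": [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
}
# One 3x3 kernel per channel (green and blue kernels are hypothetical).
kernels = {
    "R": [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
    "G": [[0, 1, 0], [1, 0, 1], [0, 1, 0]],
    "B": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
}

# Convolve each channel with its own kernel, then add the three results.
per_channel = {name: convolve_window(windows[name], kernels[name]) for name in windows}
feature_value = sum(per_channel.values())
print(per_channel)    # {'R': 69, 'G': 4, 'B': 15}
print(feature_value)  # 88
```

The three channel results collapse into a single number, so one set of three kernels yields one feature map regardless of the number of input channels.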
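We can compute the size of the convolution output with the following formula, where $H$ and $W$ are the height and width (subscript 1 for the input, 2 for the output), $F$ is the kernel size ($F_H$ its height, $F_W$ its width), $P$ is the padding, and $S$ is the stride: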
$$
H_2 = \frac{H_1 - F_H + 2P}{S} + 1 \\
W_2 = \frac{W_1 - F_W + 2P}{S} + 1
$$
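The formula is easy to check numerically. The sketch below applies it along one dimension; `conv_output_size` is an illustrative helper name, and rounding down when the division is not exact is an assumption that mirrors what common frameworks do.

```python
def conv_output_size(size_in, kernel, padding=0, stride=1):
    """Output length along one dimension: (size_in - kernel + 2 * padding) / stride + 1.

    Floor division is used when the stride does not divide evenly (an assumption).
    """
    return (size_in - kernel + 2 * padding) // stride + 1


# A 32-pixel side with a 3x3 kernel, no padding, stride 1.
print(conv_output_size(32, 3, padding=0, stride=1))  # 30
# Padding of 1 keeps the size unchanged for a 3x3 kernel at stride 1.
print(conv_output_size(32, 3, padding=1, stride=1))  # 32
# Stride 2 roughly halves the size.
print(conv_output_size(32, 3, padding=1, stride=2))  # 16
```

The same function is applied once for the height and once for the width.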
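The pooling layer downsamples (i.e. compresses) the feature map. There are many kinds of pooling, such as Max Pooling, Min Pooling, and Average Pooling. Here we describe Max Pooling in detail; the others work analogously:
Max Pooling: compresses the values in each sampling window by keeping only the largest one.
Average Pooling: computes the mean of the values inside the window and keeps that value.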
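Both pooling variants differ only in how a window is reduced to one number, which the following pure-Python sketch tries to show. The 4×4 feature map is made up for illustration, and `pool2d` is an illustrative helper, not a library function.

```python
def pool2d(feature_map, size, stride, reduce_fn):
    """Slide a size x size window over the map and reduce each window to one value."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled_row = []
        for c in range(0, cols - size + 1, stride):
            window = [feature_map[r + i][c + j] for i in range(size) for j in range(size)]
            pooled_row.append(reduce_fn(window))
        pooled.append(pooled_row)
    return pooled


def mean(values):
    return sum(values) / len(values)


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]

max_pooled = pool2d(feature_map, size=2, stride=2, reduce_fn=max)
avg_pooled = pool2d(feature_map, size=2, stride=2, reduce_fn=mean)
print(max_pooled)  # [[6, 8], [3, 4]]
print(avg_pooled)  # [[3.75, 5.25], [2.0, 2.0]]
```

Passing `min` as `reduce_fn` would give Min Pooling in the same way.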
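When building a convolutional neural network, an activation function is added after each convolutional layer; deep neural networks generally use the ReLU activation function. Each convolutional layer (conv) or fully connected layer (fc) counts as one layer of the network. Below we take a four-layer neural network as an example.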
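As a small illustration of the two points above, the sketch below defines ReLU and counts the depth of a network by the convention just described. The layer sequence is a hypothetical four-layer network chosen for this example, not necessarily the one the article has in mind.

```python
def relu(values):
    """ReLU keeps positive values and replaces negative ones with zero."""
    return [max(0.0, v) for v in values]


print(relu([-2.0, 0.0, 3.5]))  # [0.0, 0.0, 3.5]

# Hypothetical layer sequence: only conv and fc entries count toward the depth;
# activations and pooling do not.
layers = ["conv", "relu", "pool", "conv", "relu", "pool", "fc", "relu", "fc"]
depth = sum(1 for name in layers if name in ("conv", "fc"))
print(depth)  # 4
```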
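We build these networks with the Keras API in TensorFlow. Only the code is shown here; a follow-up post will explain it. The code below defines the network structures only, so if you are familiar with the TensorFlow training workflow you can try training with these networks. I have tried training all of the code below on datasets that ship with TensorFlow.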
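AlexNet is an early deep convolutional neural network that won the ILSVRC 2012 image classification challenge. It has eight layers: five convolutional layers and three fully connected layers. In the original network the first convolutional layer uses an $11 \times 11$ kernel with a padding of 0. The code below is a scaled-down variant: it uses $3 \times 3$ kernels in every convolutional layer, batch normalization after the first two, 2048-unit dense layers, and a 10-class output.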
import tensorflow as tf


class AlexNet8(tf.keras.Model):
    """Eight-layer AlexNet-style network: five conv layers and three dense layers."""

    def __init__(self):
        super().__init__()
        # Layer 1: conv -> batch norm -> ReLU -> max pool
        self.conv1 = tf.keras.layers.Conv2D(filters=96, kernel_size=(3, 3),
                                            padding='valid', strides=1)
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.activation1 = tf.keras.layers.Activation('relu')
        self.pool1 = tf.keras.layers.MaxPooling2D(pool_size=(3, 3), strides=2)

        # Layer 2: conv -> batch norm -> ReLU -> max pool
        self.conv2 = tf.keras.layers.Conv2D(filters=256, kernel_size=(3, 3),
                                            padding='valid', strides=1)
        self.bn2 = tf.keras.layers.BatchNormalization()
        self.activation2 = tf.keras.layers.Activation('relu')
        self.pool2 = tf.keras.layers.MaxPooling2D(pool_size=(3, 3), strides=2)

        # Layers 3-5: three conv layers with built-in ReLU, then one max pool
        self.conv3 = tf.keras.layers.Conv2D(filters=384, kernel_size=(3, 3),
                                            padding='same', activation='relu',
                                            strides=1)
        self.conv4 = tf.keras.layers.Conv2D(filters=384, kernel_size=(3, 3),
                                            padding='same', activation='relu',
                                            strides=1)
        self.conv5 = tf.keras.layers.Conv2D(filters=256, kernel_size=(3, 3),
                                            padding='same', activation='relu',
                                            strides=1)
        self.pool3 = tf.keras.layers.MaxPooling2D(pool_size=(3, 3), strides=2)

        # Layers 6-8: fully connected classifier with dropout
        self.flatten = tf.keras.layers.Flatten()
        self.dense1 = tf.keras.layers.Dense(2048, activation='relu')
        self.dropout1 = tf.keras.layers.Dropout(0.5)
        self.dense2 = tf.keras.layers.Dense(2048, activation='relu')
        self.dropout2 = tf.keras.layers.Dropout(0.5)
        self.dense3 = tf.keras.layers.Dense(10, activation='softmax')

    def call(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.activation1(x)
        x = self.pool1(x)

        x = self.conv2(x)
        x = self.bn2(x)
        x = self.activation2(x)
        x = self.pool2(x)

        x = self.conv3(x)
        x = self.conv4(x)
        x = self.conv5(x)
        x = self.pool3(x)

        x = self.flatten(x)
        x = self.dense1(x)
        x = self.dropout1(x)
        x = self.dense2(x)
        x = self.dropout2(x)
        y = self.dense3(x)
        return y
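To see how the feature map shrinks through `AlexNet8`, the following pure-Python sketch traces the spatial size layer by layer. It assumes a $32 \times 32 \times 3$ input (the size used in the earlier example; the code itself does not fix an input size) and the usual Keras conventions that `'valid'` padding gives `floor((n - k) / s) + 1` and `'same'` padding gives `ceil(n / s)`. The table of layers is transcribed from the class above; `output_size` is an illustrative helper.

```python
def output_size(size, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer along one dimension."""
    if padding == "same":
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1  # 'valid': no padding


# (name, kernel, stride, padding, filters); filters is None for pooling layers.
ALEXNET8_LAYERS = [
    ("conv1", 3, 1, "valid", 96),
    ("pool1", 3, 2, "valid", None),
    ("conv2", 3, 1, "valid", 256),
    ("pool2", 3, 2, "valid", None),
    ("conv3", 3, 1, "same", 384),
    ("conv4", 3, 1, "same", 384),
    ("conv5", 3, 1, "same", 256),
    ("pool3", 3, 2, "valid", None),
]

size, channels = 32, 3  # assumed input: 32 x 32 x 3
sizes = []
for name, kernel, stride, padding, filters in ALEXNET8_LAYERS:
    size = output_size(size, kernel, stride, padding)
    if filters is not None:
        channels = filters
    sizes.append(size)
    print(f"{name}: {size} x {size} x {channels}")

flattened = size * size * channels
print(flattened)  # 1024 values enter the first dense layer
```

With a larger input the same trace gives a larger flattened vector, which is why the dense layers dominate the parameter count for big images.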
The structure in the figure is VGG16. It has 16 layers in total, 13 convolutional layers and 3 fully connected layers, and every convolution kernel is $3 \times 3$.
import tensorflow as tf


class VGGNet(tf.keras.Model):
    """VGG16-style network: 13 conv layers (each with batch norm and ReLU) and 3 dense layers."""

    def __init__(self):
        super().__init__()
        # Stage 1: two 64-filter conv layers, then pool and dropout
        self.conv1 = tf.keras.layers.Conv2D(filters=64, kernel_size=3, padding='same', strides=1)
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.activation1 = tf.keras.layers.Activation('relu')
        self.conv2 = tf.keras.layers.Conv2D(filters=64, kernel_size=3, padding='same', strides=1)
        self.bn2 = tf.keras.layers.BatchNormalization()
        self.activation2 = tf.keras.layers.Activation('relu')
        self.pool1 = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)
        self.dropout1 = tf.keras.layers.Dropout(0.2)

        # Stage 2: two 128-filter conv layers
        self.conv3 = tf.keras.layers.Conv2D(filters=128, kernel_size=3, padding='same', strides=1)
        self.bn3 = tf.keras.layers.BatchNormalization()
        self.activation3 = tf.keras.layers.Activation('relu')
        self.conv4 = tf.keras.layers.Conv2D(filters=128, kernel_size=3, padding='same', strides=1)
        self.bn4 = tf.keras.layers.BatchNormalization()
        self.activation4 = tf.keras.layers.Activation('relu')
        self.pool2 = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)
        self.dropout2 = tf.keras.layers.Dropout(0.2)

        # Stage 3: three 256-filter conv layers
        self.conv5 = tf.keras.layers.Conv2D(filters=256, kernel_size=3, padding='same', strides=1)
        self.bn5 = tf.keras.layers.BatchNormalization()
        self.activation5 = tf.keras.layers.Activation('relu')
        self.conv6 = tf.keras.layers.Conv2D(filters=256, kernel_size=3, padding='same', strides=1)
        self.bn6 = tf.keras.layers.BatchNormalization()
        self.activation6 = tf.keras.layers.Activation('relu')
        self.conv7 = tf.keras.layers.Conv2D(filters=256, kernel_size=3, padding='same', strides=1)
        self.bn7 = tf.keras.layers.BatchNormalization()
        self.activation7 = tf.keras.layers.Activation('relu')
        self.pool3 = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)
        self.dropout3 = tf.keras.layers.Dropout(0.2)

        # Stage 4: three 512-filter conv layers
        self.conv8 = tf.keras.layers.Conv2D(filters=512, kernel_size=3, padding='same', strides=1)
        self.bn8 = tf.keras.layers.BatchNormalization()
        self.activation8 = tf.keras.layers.Activation('relu')
        self.conv9 = tf.keras.layers.Conv2D(filters=512, kernel_size=3, padding='same')
        self.bn9 = tf.keras.layers.BatchNormalization()
        self.activation9 = tf.keras.layers.Activation('relu')
        self.conv10 = tf.keras.layers.Conv2D(filters=512, kernel_size=3, padding='same', strides=1)
        self.bn10 = tf.keras.layers.BatchNormalization()
        self.activation10 = tf.keras.layers.Activation('relu')
        self.pool4 = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)
        self.dropout4 = tf.keras.layers.Dropout(0.2)

        # Stage 5: three 512-filter conv layers
        self.conv11 = tf.keras.layers.Conv2D(filters=512, kernel_size=3, padding='same', strides=1)
        self.bn11 = tf.keras.layers.BatchNormalization()
        self.activation11 = tf.keras.layers.Activation('relu')
        self.conv12 = tf.keras.layers.Conv2D(filters=512, kernel_size=3, padding='same', strides=1)
        self.bn12 = tf.keras.layers.BatchNormalization()
        self.activation12 = tf.keras.layers.Activation('relu')
        self.conv13 = tf.keras.layers.Conv2D(filters=512, kernel_size=3, padding='same', strides=1)
        self.bn13 = tf.keras.layers.BatchNormalization()
        self.activation13 = tf.keras.layers.Activation('relu')
        self.pool5 = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)
        self.dropout5 = tf.keras.layers.Dropout(0.2)

        # Classifier: three dense layers with dropout in between
        self.flatten = tf.keras.layers.Flatten()
        self.dense1 = tf.keras.layers.Dense(512, activation='relu')
        self.dropout6 = tf.keras.layers.Dropout(0.2)
        self.dense2 = tf.keras.layers.Dense(512, activation='relu')
        self.dropout7 = tf.keras.layers.Dropout(0.2)
        self.dense3 = tf.keras.layers.Dense(10, activation='softmax')

    def call(self, x):
        # Stage 1
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.activation1(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.activation2(x)
        x = self.pool1(x)
        x = self.dropout1(x)

        # Stage 2
        x = self.conv3(x)
        x = self.bn3(x)
        x = self.activation3(x)
        x = self.conv4(x)
        x = self.bn4(x)
        x = self.activation4(x)
        x = self.pool2(x)
        x = self.dropout2(x)

        # Stage 3
        x = self.conv5(x)
        x = self.bn5(x)
        x = self.activation5(x)
        x = self.conv6(x)
        x = self.bn6(x)
        x = self.activation6(x)
        x = self.conv7(x)
        x = self.bn7(x)
        x = self.activation7(x)
        x = self.pool3(x)
        x = self.dropout3(x)

        # Stage 4
        x = self.conv8(x)
        x = self.bn8(x)
        x = self.activation8(x)
        x = self.conv9(x)
        x = self.bn9(x)
        x = self.activation9(x)
        x = self.conv10(x)
        x = self.bn10(x)
        x = self.activation10(x)
        x = self.pool4(x)
        x = self.dropout4(x)

        # Stage 5
        x = self.conv11(x)
        x = self.bn11(x)
        x = self.activation11(x)
        x = self.conv12(x)
        x = self.bn12(x)
        x = self.activation12(x)
        x = self.conv13(x)
        x = self.bn13(x)
        x = self.activation13(x)
        x = self.pool5(x)
        x = self.dropout5(x)

        # Classifier
        x = self.flatten(x)
        x = self.dense1(x)
        x = self.dropout6(x)
        x = self.dense2(x)
        x = self.dropout7(x)
        y = self.dense3(x)
        return y
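The layer count and the shrinking feature map of `VGGNet` can be checked with a short pure-Python sketch. The configuration list is transcribed from the class above; the $32 \times 32 \times 3$ input is an assumption, and the trace relies on `'same'` convolutions keeping the spatial size while each $2 \times 2$ pool with stride 2 halves it.

```python
# Filter counts of the conv layers in order; "pool" marks a 2x2 max pool with stride 2.
VGG_CONFIG = [
    64, 64, "pool",
    128, 128, "pool",
    256, 256, 256, "pool",
    512, 512, 512, "pool",
    512, 512, 512, "pool",
]
NUM_DENSE = 3  # dense1, dense2, dense3

num_conv = sum(1 for item in VGG_CONFIG if item != "pool")
print(num_conv, num_conv + NUM_DENSE)  # 13 16

size, channels = 32, 3  # assumed input: 32 x 32 x 3
for item in VGG_CONFIG:
    if item == "pool":
        size //= 2       # pooling halves height and width
    else:
        channels = item  # 'same' conv keeps the size, changes the channel count

flattened = size * size * channels
print(size, channels, flattened)  # 1 512 512
```

Under that input assumption the five pooling stages reduce the map to a single position, so the first dense layer receives a 512-value vector.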
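When more layers are stacked (beyond roughly 20), accuracy starts to degrade, so plain networks deeper than about 20 layers fail to perform better. ResNet addresses this problem with shortcut connections: the input of a block is added element-wise to the output of that block's convolutional layers, and the sum is passed on to the next layer. The block therefore only has to learn a residual on top of its input, and if the extra layers do not help, it can fall back to passing the input through almost unchanged.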
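The shortcut idea can be sketched without any framework. In the example below, `transform` stands in for the block's convolution and batch-norm layers; both stand-in functions are made up for illustration, and vectors are plain Python lists.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def residual_block(x, transform):
    """Return relu(transform(x) + x): the input is added back to the block's output."""
    fx = transform(x)
    return relu([f + xi for f, xi in zip(fx, x)])


x = [1.0, 2.0, 3.0]

# If the inner layers output zeros, the block simply passes its input through.
print(residual_block(x, lambda v: [0.0 for _ in v]))   # [1.0, 2.0, 3.0]

# Otherwise the inner layers contribute a correction on top of the input.
print(residual_block(x, lambda v: [-0.5 * a for a in v]))  # [0.5, 1.0, 1.5]
```

The addition requires both terms to have the same shape, which is why a block that changes the stride or filter count needs a 1×1 convolution on the shortcut path.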
import tensorflow as tf


class ResnetBlock(tf.keras.Model):
    """Residual block: two 3x3 conv layers plus a shortcut connection."""

    def __init__(self, filters, strides=1, residual_path=False):
        super().__init__()
        self.filters = filters
        self.strides = strides
        self.residual_path = residual_path

        self.c1 = tf.keras.layers.Conv2D(filters, (3, 3), strides=strides, padding='same', use_bias=False)
        self.b1 = tf.keras.layers.BatchNormalization()
        self.a1 = tf.keras.layers.Activation('relu')

        self.c2 = tf.keras.layers.Conv2D(filters, (3, 3), strides=1, padding='same', use_bias=False)
        self.b2 = tf.keras.layers.BatchNormalization()

        # When the main path changes the shape, a 1x1 conv reshapes the shortcut to match
        if residual_path:
            self.down_c1 = tf.keras.layers.Conv2D(filters, (1, 1), strides=strides, padding='same', use_bias=False)
            self.down_b1 = tf.keras.layers.BatchNormalization()

        self.a2 = tf.keras.layers.Activation('relu')

    def call(self, inputs):
        residual = inputs  # identity shortcut by default
        x = self.c1(inputs)
        x = self.b1(x)
        x = self.a1(x)

        x = self.c2(x)
        y = self.b2(x)

        if self.residual_path:
            residual = self.down_c1(inputs)
            residual = self.down_b1(residual)

        # Add the shortcut to the main path, then apply the activation
        out = self.a2(y + residual)
        return out


class ResNet18(tf.keras.Model):
    def __init__(self, block_list, initial_filters=64):
        super().__init__()
        self.num_blocks = len(block_list)
        self.block_list = block_list
        self.out_filters = initial_filters
        self.c1 = tf.keras.layers.Conv2D(self.out_filters, (3, 3), strides=1, padding='same', use_bias=False)
        self.b1 = tf.keras.layers.BatchNormalization()
        self.a1 = tf.keras.layers.Activation('relu')
        self.blocks = tf.keras.models.Sequential()
        # Build the ResNet structure
        for block_id in range(len(block_list)):
            for layer_id in range(block_list[block_id]):
                if block_id != 0 and layer_id == 0:
                    # First block of every stage except the first: downsample
                    block = ResnetBlock(self.out_filters, strides=2, residual_path=True)
                else:
                    block = ResnetBlock(self.out_filters, residual_path=False)
                self.blocks.add(block)
            # Double the number of filters for the next stage
            self.out_filters *= 2
        self.p1 = tf.keras.layers.GlobalAveragePooling2D()
        self.f1 = tf.keras.layers.Dense(10, activation='softmax', kernel_regularizer=tf.keras.regularizers.l2())

    def call(self, inputs):
        x = self.c1(inputs)
        x = self.b1(x)
        x = self.a1(x)
        x = self.blocks(x)
        x = self.p1(x)
        y = self.f1(x)
        return y
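The nested loop in `ResNet18.__init__` is easier to follow when its result is written out. The pure-Python sketch below mirrors that loop and lists the filters, stride, and shortcut type of each block. The argument `[2, 2, 2, 2]` is an assumption (the article does not show how the class is instantiated), chosen because it yields 18 weighted layers under the usual convention of not counting the 1×1 shortcut convolutions; `resnet_plan` is an illustrative helper.

```python
def resnet_plan(block_list, initial_filters=64):
    """Mirror the construction loop: one (filters, stride, uses_1x1_shortcut) entry per block."""
    plan = []
    filters = initial_filters
    for block_id, num_blocks in enumerate(block_list):
        for layer_id in range(num_blocks):
            downsample = block_id != 0 and layer_id == 0
            plan.append((filters, 2 if downsample else 1, downsample))
        filters *= 2  # each stage doubles the filter count
    return plan


plan = resnet_plan([2, 2, 2, 2])  # assumed block_list
for filters, stride, shortcut_conv in plan:
    print(filters, stride, shortcut_conv)

# First conv + two convs per block + final dense layer.
weighted_layers = 1 + 2 * len(plan) + 1
print(weighted_layers)  # 18
```

With this plan, a model would presumably be created as `ResNet18([2, 2, 2, 2])`; other lists give deeper or shallower variants built from the same block.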