In this post we look at what the individual layers of a convolutional neural network do, and use the Caffe deep learning framework to build a model for facial expression recognition.
First, the input layer definition:
input: "data"
input_shape {
dim: 1 # batch size
dim: 1 # number of channels
dim: 224 # height
dim: 28 # width
}
The input layer only declares the shape of the input data; here the input is image data. The first dim: 1 is the batch size, i.e., the number of images fed to the network per training step. Updating the parameters from a mini-batch lets a subset of samples stand in for the distribution of the full dataset, which greatly reduces the computation per step and helps the model generalize. The second dim: 1 is the number of channels; 1 means grayscale input, while 3 would mean color images. The last two dims give the spatial size of each image; Caffe blobs are ordered N×C×H×W, so here each image is 224×28.
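As a quick sanity check, the input shape can be inspected from Python. This is a minimal sketch assuming the network definition from this post has been saved as deploy.prototxt (a hypothetical file name) and that pycaffe is installed:

import caffe

# Load the network in test mode; 'deploy.prototxt' is a placeholder path
net = caffe.Net('deploy.prototxt', caffe.TEST)

# Caffe blobs are ordered (batch, channels, height, width)
print(net.blobs['data'].data.shape)   # expected: (1, 1, 224, 28)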
layer {
name: "data_bn"
type: "BatchNorm"
bottom: "data"
top: "data_bn"
param {
lr_mult: 0.0
}
param {
lr_mult: 0.0
}
param {
lr_mult: 0.0
}
}
Above is the structure and parameters of a BN (BatchNorm) layer. name is a user-chosen name for the layer; type is the layer type, here BatchNorm; bottom: "data" is the layer's input, the data blob from the input layer above; top: "data_bn" is the name of the layer's output blob. A BatchNorm layer in Caffe carries three internal blobs (the running mean, the running variance, and the moving-average factor), and each param block's lr_mult: 0.0 freezes the learning rate for one of them, so these statistics are maintained by running averages rather than updated by gradient descent.
layer {
name: "data_scale"
type: "Scale"
bottom: "data_bn"
top: "data_bn"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 1.0
}
scale_param {
bias_term: true
}
}
In Caffe, the Scale layer complements BatchNorm: after BN has removed the mean and scaled the variance to 1, Scale applies a learned affine transform, data_bn = α × data_bn + β, where the data_bn on the right-hand side is the zero-mean, unit-variance output of the BN layer. This changes the distribution of the data so that training is smoother, gradients are less prone to exploding or vanishing, activations stay away from the saturated regions of the nonlinearity, and the model becomes more robust.
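Written out, BN followed by Scale computes the following, where μ and σ² are the batch statistics, ε is a small constant for numerical stability, and α and β are the learned parameters of the Scale layer:

\[
\hat{x} = \frac{x - \mu}{\sqrt{\sigma^{2} + \epsilon}}, \qquad y = \alpha\,\hat{x} + \beta
\]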
layer {
name: "conv1"
type: "Convolution"
bottom: "data_bn"
top: "conv1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 1.0
}
convolution_param {
num_output: 64
pad: 3
kernel_size: 7
stride: 2
weight_filler {
type: "msra"
variance_norm: FAN_OUT
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
Above is the structure and parameters of a convolution layer in Caffe. num_output: 64 means the layer uses 64 convolution kernels; pad: 3 extends the input by 3 pixels on each side so the kernel slides across the input an integral number of times; kernel_size: 7 means each kernel is 7×7; stride: 2 means the kernel moves 2 pixels per step. weight_filler defines how the kernel weights are initialized: the MSRA (He) method draws them from a zero-mean Gaussian with variance 2/n, where variance_norm: FAN_OUT makes n the fan-out of the layer; this initialization is particularly well suited to the ReLU activation. bias_filler is the initialization of the bias, here the constant 0, so the bias starts out with no effect.
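As a sketch of the arithmetic, here is how these settings determine the conv1 output size and the standard deviation the MSRA filler would use, assuming the 224×28 input declared above:

import math

def conv_out(size, pad, kernel, stride):
    # Caffe convolution output size: floor((i + 2*pad - k) / s) + 1
    return (size + 2 * pad - kernel) // stride + 1

print(conv_out(224, 3, 7, 2))   # height: 112
print(conv_out(28, 3, 7, 2))    # width:  14

# MSRA/He initialization with FAN_OUT: weights ~ N(0, 2/n),
# where n = num_output * kernel_h * kernel_w
n = 64 * 7 * 7
print(math.sqrt(2.0 / n))       # std ~= 0.025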
layer {
name: "conv1_relu"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
The output of a convolution layer is usually followed by an activation layer. Activation functions were loosely inspired by the way neurons in the brain pass information through electrical signals. They come in linear and nonlinear varieties; the ReLU used here is nonlinear, which makes the function the deep network ultimately fits nonlinear as well, better reflecting the structure of the data. ReLU also lessens the impact of vanishing and exploding gradients.
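For reference, ReLU and its derivative are simply:

\[
f(x) = \max(0, x), \qquad f'(x) = \begin{cases} 1 & x > 0 \\ 0 & x \le 0 \end{cases}
\]

The constant unit gradient on the active side is what keeps gradients from shrinking as they pass backward through many layers.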
layer {
name: "conv1_pool"
type: "Pooling"
bottom: "conv1"
top: "conv1_pool"
pooling_param {
kernel_size: 3
stride: 2
}
}
Above is the structure and parameters of a pooling layer. Pooling shrinks the feature maps even as successive convolution layers increase the number of kernels, discarding redundant information and reducing computation. The parameters here are:
kernel_size: 3 means each pooling window covers a 3×3 region;
stride: 2 means the window moves 2 pixels per step.
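One detail worth knowing: unlike convolution, Caffe's pooling layer rounds the output size up. A small sketch, continuing with the 112×14 feature map produced by conv1:

import math

def pool_out(size, kernel, stride, pad=0):
    # Caffe pooling rounds up (ceil), while convolution rounds down (floor)
    return int(math.ceil((size + 2 * pad - kernel) / float(stride))) + 1

print(pool_out(112, 3, 2))   # height: 56
print(pool_out(14, 3, 2))    # width:  7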
layer {
name: "layer_128_1_conv_expand"
type: "Convolution"
bottom: "layer_128_1_bn1"
top: "layer_128_1_conv_expand"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 128
bias_term: false
pad: 0
kernel_size: 1
stride: 2
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
The expansion (projection) layers all use 1×1 kernels, with the stride chosen to fit the situation. A 1×1 convolution aggregates no spatial context, so it leaves the features essentially intact and only adjusts the channel count and, through its stride, the size of the feature map, so that the shortcut can later be added elementwise to the output of another branch.
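A quick shape check makes the point, using the height of the running example (56 after conv1_pool). The residual branch halves the map with its stride-2 3×3 convolution, and the 1×1 stride-2 projection produces exactly the same size, so the later elementwise sum is valid:

def conv_out(size, pad, kernel, stride):
    return (size + 2 * pad - kernel) // stride + 1

# Residual branch: 3x3 stride-2 conv, then 3x3 stride-1 conv
h = conv_out(conv_out(56, 1, 3, 2), 1, 3, 1)   # 56 -> 28 -> 28
# Shortcut branch: 1x1 stride-2 projection
h_expand = conv_out(56, 0, 1, 2)               # 56 -> 28
assert h == h_expand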
layer {
name: "layer_128_1_sum"
type: "Eltwise"
bottom: "layer_128_1_conv2"
bottom: "layer_128_1_conv_expand"
top: "layer_128_1_sum"
}
The Eltwise layer combines the maps of two or more layers into a new map; by default the operation is an elementwise sum. The layer above adds layer_128_1_conv2 and layer_128_1_conv_expand to produce layer_128_1_sum, where layer_128_1_conv_expand is the output of the projection layer described above. If the two inputs were added directly without the projection, their shapes would differ and Caffe would raise an error.
layer {
name: "classifier-Expressions"
type: "InnerProduct"
bottom: "global_pool"
top: "classifier-Expressions"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 1
}
inner_product_param {
num_output: 6
weight_filler {
type: "xavier"
std: 0.02
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
After the convolutions, batch normalization, pooling, and activations, what we have is a feature map in the form of a multi-dimensional array. To finish, it must be flattened into one-dimensional form and fed to a fully connected network; in Caffe the InnerProduct layer does exactly this. Here num_output: 6 produces one score for each expression class.
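A minimal numpy sketch of what InnerProduct does with the global_pool output (the weight values here are illustrative, not the trained ones):

import numpy as np

x = np.random.randn(1, 512, 1, 1)        # global_pool output: one value per channel

flat = x.reshape(1, -1)                  # flatten everything after the batch axis: (1, 512)
W = np.random.randn(6, 512) * 0.02       # stand-in for the xavier-initialized weights
b = np.full(6, 0.2)                      # matches bias_filler value: 0.2
scores = flat @ W.T + b                  # (1, 6): one score per expression class
print(scores.shape)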
layer {
name: "prob"
type: "Softmax"
bottom: "classifier-Expressions"
top: "prob"
}
The output layer uses the softmax function, which turns the scores into probabilities; we take the class with the largest probability as the recognition result. The number of output neurons equals the number of classes being distinguished.
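For a score vector z over the 6 classes, softmax and the final decision are:

\[
p_i = \frac{e^{z_i}}{\sum_{j=1}^{6} e^{z_j}}, \qquad \hat{y} = \arg\max_i \, p_i
\]

For reference, the complete network definition is reproduced below: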
name: "macro_expression_recognition"
################ define data #############
input: "data"
input_shape {
dim: 1 # batch size
dim: 1 # number of channels
dim: 224 # height
dim: 28 # width
}
################# Define net ###########
layer {
name: "data_bn"
type: "BatchNorm"
bottom: "data"
top: "data_bn"
param {
lr_mult: 0.0
}
param {
lr_mult: 0.0
}
param {
lr_mult: 0.0
}
}
layer {
name: "data_scale"
type: "Scale"
bottom: "data_bn"
top: "data_bn"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 1.0
}
scale_param {
bias_term: true
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data_bn"
top: "conv1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 1.0
}
convolution_param {
num_output: 64
pad: 3
kernel_size: 7
stride: 2
weight_filler {
type: "msra"
variance_norm: FAN_OUT
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "conv1_bn"
type: "BatchNorm"
bottom: "conv1"
top: "conv1"
param {
lr_mult: 0.0
}
param {
lr_mult: 0.0
}
param {
lr_mult: 0.0
}
}
layer {
name: "conv1_scale"
type: "Scale"
bottom: "conv1"
top: "conv1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 1.0
}
scale_param {
bias_term: true
}
}
layer {
name: "conv1_relu"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "conv1_pool"
type: "Pooling"
bottom: "conv1"
top: "conv1_pool"
pooling_param {
kernel_size: 3
stride: 2
}
}
layer {
name: "layer_64_1_conv1"
type: "Convolution"
bottom: "conv1_pool"
top: "layer_64_1_conv1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 64
bias_term: false
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "layer_64_1_bn2"
type: "BatchNorm"
bottom: "layer_64_1_conv1"
top: "layer_64_1_conv1"
param {
lr_mult: 0.0
}
param {
lr_mult: 0.0
}
param {
lr_mult: 0.0
}
}
layer {
name: "layer_64_1_scale2"
type: "Scale"
bottom: "layer_64_1_conv1"
top: "layer_64_1_conv1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 1.0
}
scale_param {
bias_term: true
}
}
layer {
name: "layer_64_1_relu2"
type: "ReLU"
bottom: "layer_64_1_conv1"
top: "layer_64_1_conv1"
}
layer {
name: "layer_64_1_conv2"
type: "Convolution"
bottom: "layer_64_1_conv1"
top: "layer_64_1_conv2"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 64
bias_term: false
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "layer_64_1_sum"
type: "Eltwise"
bottom: "layer_64_1_conv2"
bottom: "conv1_pool"
top: "layer_64_1_sum"
}
layer {
name: "layer_128_1_bn1"
type: "BatchNorm"
bottom: "layer_64_1_sum"
top: "layer_128_1_bn1"
param {
lr_mult: 0.0
}
param {
lr_mult: 0.0
}
param {
lr_mult: 0.0
}
}
layer {
name: "layer_128_1_scale1"
type: "Scale"
bottom: "layer_128_1_bn1"
top: "layer_128_1_bn1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 1.0
}
scale_param {
bias_term: true
}
}
layer {
name: "layer_128_1_relu1"
type: "ReLU"
bottom: "layer_128_1_bn1"
top: "layer_128_1_bn1"
}
layer {
name: "layer_128_1_conv1"
type: "Convolution"
bottom: "layer_128_1_bn1"
top: "layer_128_1_conv1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 128
bias_term: false
pad: 1
kernel_size: 3
stride: 2
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "layer_128_1_bn2"
type: "BatchNorm"
bottom: "layer_128_1_conv1"
top: "layer_128_1_conv1"
param {
lr_mult: 0.0
}
param {
lr_mult: 0.0
}
param {
lr_mult: 0.0
}
}
layer {
name: "layer_128_1_scale2"
type: "Scale"
bottom: "layer_128_1_conv1"
top: "layer_128_1_conv1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 1.0
}
scale_param {
bias_term: true
}
}
layer {
name: "layer_128_1_relu2"
type: "ReLU"
bottom: "layer_128_1_conv1"
top: "layer_128_1_conv1"
}
layer {
name: "layer_128_1_conv2"
type: "Convolution"
bottom: "layer_128_1_conv1"
top: "layer_128_1_conv2"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 128
bias_term: false
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "layer_128_1_conv_expand"
type: "Convolution"
bottom: "layer_128_1_bn1"
top: "layer_128_1_conv_expand"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 128
bias_term: false
pad: 0
kernel_size: 1
stride: 2
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "layer_128_1_sum"
type: "Eltwise"
bottom: "layer_128_1_conv2"
bottom: "layer_128_1_conv_expand"
top: "layer_128_1_sum"
}
layer {
name: "layer_256_1_bn1"
type: "BatchNorm"
bottom: "layer_128_1_sum"
top: "layer_256_1_bn1"
param {
lr_mult: 0.0
}
param {
lr_mult: 0.0
}
param {
lr_mult: 0.0
}
}
layer {
name: "layer_256_1_scale1"
type: "Scale"
bottom: "layer_256_1_bn1"
top: "layer_256_1_bn1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 1.0
}
scale_param {
bias_term: true
}
}
layer {
name: "layer_256_1_relu1"
type: "ReLU"
bottom: "layer_256_1_bn1"
top: "layer_256_1_bn1"
}
layer {
name: "layer_256_1_conv1"
type: "Convolution"
bottom: "layer_256_1_bn1"
top: "layer_256_1_conv1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 256
bias_term: false
pad: 1
kernel_size: 3
stride: 2
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "layer_256_1_bn2"
type: "BatchNorm"
bottom: "layer_256_1_conv1"
top: "layer_256_1_conv1"
param {
lr_mult: 0.0
}
param {
lr_mult: 0.0
}
param {
lr_mult: 0.0
}
}
layer {
name: "layer_256_1_scale2"
type: "Scale"
bottom: "layer_256_1_conv1"
top: "layer_256_1_conv1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 1.0
}
scale_param {
bias_term: true
}
}
layer {
name: "layer_256_1_relu2"
type: "ReLU"
bottom: "layer_256_1_conv1"
top: "layer_256_1_conv1"
}
layer {
name: "layer_256_1_conv2"
type: "Convolution"
bottom: "layer_256_1_conv1"
top: "layer_256_1_conv2"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 256
bias_term: false
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "layer_256_1_conv_expand"
type: "Convolution"
bottom: "layer_256_1_bn1"
top: "layer_256_1_conv_expand"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 256
bias_term: false
pad: 0
kernel_size: 1
stride: 2
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "layer_256_1_sum"
type: "Eltwise"
bottom: "layer_256_1_conv2"
bottom: "layer_256_1_conv_expand"
top: "layer_256_1_sum"
}
layer {
name: "layer_512_1_bn1"
type: "BatchNorm"
bottom: "layer_256_1_sum"
top: "layer_512_1_bn1"
param {
lr_mult: 0.0
}
param {
lr_mult: 0.0
}
param {
lr_mult: 0.0
}
}
layer {
name: "layer_512_1_scale1"
type: "Scale"
bottom: "layer_512_1_bn1"
top: "layer_512_1_bn1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 1.0
}
scale_param {
bias_term: true
}
}
layer {
name: "layer_512_1_relu1"
type: "ReLU"
bottom: "layer_512_1_bn1"
top: "layer_512_1_bn1"
}
layer {
name: "layer_512_1_conv1"
type: "Convolution"
bottom: "layer_512_1_bn1"
top: "layer_512_1_conv1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 512
bias_term: false
pad: 1
kernel_size: 3
stride: 2
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "layer_512_1_bn2"
type: "BatchNorm"
bottom: "layer_512_1_conv1"
top: "layer_512_1_conv1"
param {
lr_mult: 0.0
}
param {
lr_mult: 0.0
}
param {
lr_mult: 0.0
}
}
layer {
name: "layer_512_1_scale2"
type: "Scale"
bottom: "layer_512_1_conv1"
top: "layer_512_1_conv1"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 1.0
}
scale_param {
bias_term: true
}
}
layer {
name: "layer_512_1_relu2"
type: "ReLU"
bottom: "layer_512_1_conv1"
top: "layer_512_1_conv1"
}
layer {
name: "layer_512_1_conv2"
type: "Convolution"
bottom: "layer_512_1_conv1"
top: "layer_512_1_conv2"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 512
bias_term: false
pad: 1
kernel_size: 3
stride: 1
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "layer_512_1_conv_expand"
type: "Convolution"
bottom: "layer_512_1_bn1"
top: "layer_512_1_conv_expand"
param {
lr_mult: 1.0
decay_mult: 1.0
}
convolution_param {
num_output: 512
bias_term: false
pad: 0
kernel_size: 1
stride: 2
weight_filler {
type: "msra"
}
bias_filler {
type: "constant"
value: 0.0
}
}
}
layer {
name: "layer_512_1_sum"
type: "Eltwise"
bottom: "layer_512_1_conv2"
bottom: "layer_512_1_conv_expand"
top: "layer_512_1_sum"
}
layer {
name: "last_bn"
type: "BatchNorm"
bottom: "layer_512_1_sum"
top: "layer_512_1_sum"
param {
lr_mult: 0.0
}
param {
lr_mult: 0.0
}
param {
lr_mult: 0.0
}
}
layer {
name: "last_scale"
type: "Scale"
bottom: "layer_512_1_sum"
top: "layer_512_1_sum"
param {
lr_mult: 1.0
decay_mult: 1.0
}
param {
lr_mult: 2.0
decay_mult: 1.0
}
scale_param {
bias_term: true
}
}
layer {
name: "last_relu"
type: "ReLU"
bottom: "layer_512_1_sum"
top: "layer_512_1_sum"
}
layer {
name: "global_pool"
type: "Pooling"
bottom: "layer_512_1_sum"
top: "global_pool"
pooling_param {
pool: AVE
global_pooling: true
}
}
############# self definition ###############
layer {
name: "classifier-Expressions"
type: "InnerProduct"
bottom: "global_pool"
top: "classifier-Expressions"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 1
}
inner_product_param {
num_output: 6
weight_filler {
type: "xavier"
std: 0.02
}
bias_filler {
type: "constant"
value: 0.2
}
}
}
#######################################################
#######################################################
layer {
name: "prob"
type: "Softmax"
bottom: "classifier-Expressions"
top: "prob"
}
A plain convolutional network is simply a stack of "convolution + BN + Scale + activation + pooling" blocks, ending in fully connected and output layers that produce the final recognition result. The network above additionally adds shortcut connections through its Eltwise sum layers, in the style of a residual network.
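To close, here is a minimal inference sketch with pycaffe. The file names deploy.prototxt and expressions.caffemodel are placeholders for the saved network definition and a set of trained weights:

import numpy as np
import caffe

net = caffe.Net('deploy.prototxt', 'expressions.caffemodel', caffe.TEST)

# Placeholder for a preprocessed grayscale face image in N x C x H x W order
img = np.zeros((1, 1, 224, 28), dtype=np.float32)
net.blobs['data'].data[...] = img

out = net.forward()
probs = out['prob'][0]        # 6 class probabilities
print(probs.argmax())         # index of the predicted expression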