A Look at the Layers of Deep Convolutional Neural Networks, with a Caffe Implementation of Expression Recognition

In this post we examine what each layer of a convolutional neural network does under a deep learning framework, and then build a model in the Caffe framework to perform facial expression recognition.

What Each Layer of a Deep Convolutional Network Does

Input layer

First, the structure of the input layer:

input: "data"  
input_shape {  
  dim: 1    # batch size  
  dim: 1    # number of channels  
  dim: 224  # height  
  dim: 28   # width  
}

In the input layer we only need to define the shape of the input data, which here is image data. Caffe orders a blob's dimensions as N×C×H×W. The first dim: 1 is the batch size, i.e., the number of images fed into the network at each training step; updating the parameters from a mini-batch whose statistics stand in for those of the full dataset greatly reduces the computation per step and helps the model generalize. The second dim: 1 is the number of channels: 1 means grayscale input, while 3 would mean color images. dim: 224 and dim: 28 give the height and width of each image, i.e., 224×28.
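A minimal pycaffe sketch (the file names deploy.prototxt and weights.caffemodel are hypothetical stand-ins) of how this 4-D data blob is filled and a forward pass is run:

import numpy as np
import caffe

# Load the network in test mode; the file names here are placeholders.
net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)

# Stand-in for a preprocessed grayscale face of the declared size.
img = np.random.rand(224, 28).astype(np.float32)

# The data blob is laid out as (N, C, H, W); write batch index 0, channel 0.
net.blobs['data'].data[0, 0, :, :] = img
out = net.forward()
print(out['prob'].shape)   # (1, 6): one probability per expression class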

BatchNormalization layer

layer {
  name: "data_bn"
  type: "BatchNorm"
  bottom: "data"
  top: "data_bn"
  param {
    lr_mult: 0.0
  }
  param {
    lr_mult: 0.0
  }
  param {
    lr_mult: 0.0
  }
}

Above are the structure and parameters of the BN layer. name is a name of your choosing for the layer; type is the layer type, here BatchNorm; bottom: "data" is the layer's input, namely the data blob from the input layer above; top: "data_bn" is the layer's output blob, again a name you define. The lr_mult: 0.0 inside each param block sets that parameter's learning-rate multiplier:

  1. Setting lr_mult = x gives the layer an effective learning rate of base_lr * x, where base_lr is defined in solver.prototxt;
  2. lr_mult = 1 means the layer trains at exactly base_lr;
  3. lr_mult = 0 freezes the layer's parameters so they are not trained.
    Why set all of the BN layer's multipliers to 0? The BatchNorm layer stores three blobs: the mean, the variance, and a moving-average scale factor. During training these are computed from the current mini-batch statistics rather than learned by backpropagation, so lr_mult and decay_mult must both be 0 for each of them (Caffe defaults both to 1, and leaving them at 1 would let the solver "update" these statistics through backpropagation, which would be wrong). A numpy sketch of the inference-time computation follows this list.
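As a numpy sketch (not Caffe source; it assumes Caffe's convention of storing the mean and variance pre-multiplied by the accumulated scale factor in the third blob), BatchNorm at inference time computes:

import numpy as np

def bn_inference(x, stored_mean, stored_var, scale_factor, eps=1e-5):
    # Caffe stores mean/var scaled by an accumulated factor (the third blob),
    # so they are unscaled before use.
    s = 0.0 if scale_factor == 0 else 1.0 / scale_factor
    mean = stored_mean * s
    var = stored_var * s
    return (x - mean) / np.sqrt(var + eps)

x = np.random.randn(4, 1, 8, 8)   # an (N, C, H, W) mini-batch
print(bn_inference(x, stored_mean=0.1, stored_var=0.9, scale_factor=1.0).shape)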

Scale layer

layer {
  name: "data_scale"
  type: "Scale"
  bottom: "data_bn"
  top: "data_bn"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 1.0
  }
  scale_param {
    bias_term: true
  }
}

In Caffe, the Scale layer supplies the learnable affine part of batch normalization. The transform is data_bn = α × data_bn + β, where the data_bn on the right-hand side is the zero-mean, unit-variance output of the BN layer. This learned rescaling and shifting restores the representational freedom the normalization removed while keeping activations well-scaled, which helps avoid exploding and vanishing gradients, makes the model more robust, and keeps the activation function away from its saturated regions.
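A per-channel numpy sketch of the transform (the 64-channel shape is illustrative):

import numpy as np

x = np.random.randn(1, 64, 56, 56)   # (N, C, H, W) normalized feature map
alpha = np.ones((1, 64, 1, 1))       # learnable per-channel scale
beta = np.zeros((1, 64, 1, 1))       # learnable per-channel shift (bias_term: true)
y = alpha * x + beta                 # broadcasts over N, H, and W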

Convolutional layer

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data_bn"
  top: "conv1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 1.0
  }
  convolution_param {
    num_output: 64
    pad: 3
    kernel_size: 7
    stride: 2
    weight_filler {
      type: "msra"
      variance_norm: FAN_OUT
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}

Above are the structure and parameters of a convolutional layer in Caffe. num_output: 64 means this layer has 64 convolution kernels; pad: 3 extends the input by 3 pixels on each side so the kernel can slide across it an integral number of times; kernel_size: 7 means each kernel is 7×7; stride: 2 means the kernel moves two pixels per step. weight_filler defines how the kernel weights are initialized; the MSRA (He) method draws from a zero-mean Gaussian with variance 2/n, where variance_norm: FAN_OUT makes n the fan-out, num_output × kernel_h × kernel_w. This initialization is particularly well suited to ReLU activations. bias_filler is the bias initializer; here the bias is the constant 0, i.e., the layer effectively starts without a bias.
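A quick sanity check in plain Python (assuming Caffe's floor-based convolution arithmetic) of the conv1 output size and the MSRA standard deviation:

import math

def conv_out(size, kernel, pad, stride):
    # Caffe uses floor for convolution output sizes.
    return (size + 2 * pad - kernel) // stride + 1

print(conv_out(224, 7, 3, 2))   # 112 (height)
print(conv_out(28, 7, 3, 2))    # 14  (width)

# MSRA / He initialization with FAN_OUT: std = sqrt(2 / fan_out),
# fan_out = num_output * kernel_h * kernel_w = 64 * 7 * 7.
print(math.sqrt(2.0 / (64 * 7 * 7)))   # ~0.025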

ReLU activation layer

layer {
  name: "conv1_relu"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}

A convolutional layer's output is usually followed by an activation layer. Activation functions were inspired by the way neurons in the brain pass information through electrical signals. They come in linear and nonlinear varieties; ReLU, used here, is nonlinear, which lets the deep convolutional network fit a nonlinear function that better reflects the structure and regularities of the data. ReLU also mitigates the vanishing-gradient problem, since its gradient is exactly 1 for every positive input.
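In numpy terms ReLU is a single clamp: negative activations are zeroed and positive ones pass through unchanged:

import numpy as np

relu = lambda x: np.maximum(x, 0.0)
print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))   # [0.  0.  0.  1.5]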

Pooling layer

layer {
  name: "conv1_pool"
  type: "Pooling"
  bottom: "conv1"
  top: "conv1_pool"
  pooling_param {
    kernel_size: 3
    stride: 2
  }
}

Above are the structure and parameters of the pooling layer. Its purpose is to shrink the feature maps as the convolutional layers grow their kernel counts, discarding redundant information and reducing computation. The parameters here are:
kernel_size: 3 means each pooling window covers a 3×3 region;
stride: 2 means the window moves two pixels per step.
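One detail worth checking: Caffe's pooling uses ceil-based output arithmetic (unlike its floor-based convolutions), so with the 112×14 conv1 output computed earlier, conv1_pool works out to 64×56×7:

import math

def pool_out(size, kernel, stride, pad=0):
    # Caffe uses ceil for pooling output sizes.
    return math.ceil((size + 2 * pad - kernel) / stride) + 1

print(pool_out(112, 3, 2))   # 56
print(pool_out(14, 3, 2))    # 7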

Expansion layer

layer {
  name: "layer_128_1_conv_expand"
  type: "Convolution"
  bottom: "layer_128_1_bn1"
  top: "layer_128_1_conv_expand"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  convolution_param {
    num_output: 128
    bias_term: false
    pad: 0
    kernel_size: 1
    stride: 2
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}

The expansion layers all use 1×1 kernels, with the stride chosen case by case. A 1×1 convolution looks at no spatial neighborhood, so it leaves the spatial features essentially untouched; it only reshapes the feature map, here doubling the channel count and, through stride 2, halving the spatial size, so that the shortcut can later be added to the output of the main path.
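A shape-only sketch (the sizes follow from the arithmetic above) of why this projection is needed before the element-wise sum:

def conv_out(size, kernel, pad, stride):
    return (size + 2 * pad - kernel) // stride + 1

c_in, h_in, w_in = 64, 56, 7      # entering the 128 block
# Main path: 3x3, pad 1, stride 2, then 3x3, pad 1, stride 1.
main = (128, conv_out(h_in, 3, 1, 2), conv_out(w_in, 3, 1, 2))     # (128, 28, 4)
# Shortcut path: the 1x1, pad 0, stride 2 projection above.
expand = (128, conv_out(h_in, 1, 0, 2), conv_out(w_in, 1, 0, 2))   # (128, 28, 4)
assert main == expand             # shapes match, so the Eltwise sum is legal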

Eltwise

layer {
  name: "layer_128_1_sum"
  type: "Eltwise"
  bottom: "layer_128_1_conv2"
  bottom: "layer_128_1_conv_expand"
  top: "layer_128_1_sum"
}

The Eltwise layer combines the feature maps of two or more layers element-wise; its default operation is SUM, a plain addition. The layer above adds layer_128_1_conv2 and layer_128_1_conv_expand to produce layer_128_1_sum, where layer_128_1_conv_expand is the output of the expansion layer in the previous step. Adding the shortcut directly, without the expansion, would fail because the two blobs would have different shapes.
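As a numpy sketch, the default SUM is nothing more than:

import numpy as np

a = np.random.randn(1, 128, 28, 4)   # layer_128_1_conv2
b = np.random.randn(1, 128, 28, 4)   # layer_128_1_conv_expand
s = a + b                            # layer_128_1_sum; shapes must match exactly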

Fully connected layer

layer {
  name: "classifier-Expressions"
  type: "InnerProduct"
  bottom: "global_pool"
  top: "classifier-Expressions"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 1
  }
  inner_product_param {
    num_output: 6
    weight_filler {
      type: "xavier"
      std: 0.02
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}

After convolution, batch normalization, pooling, and activation we are left with a multi-dimensional feature map; finally we flatten it into a one-dimensional vector and attach a fully connected network. In Caffe, the InnerProduct layer does exactly this.
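In numpy terms, InnerProduct is a matrix multiply over the flattened input (the xavier-style weight scale and the shapes below mirror the layer definition):

import numpy as np

x = np.random.randn(1, 512, 1, 1)    # global_pool output: one value per channel
W = np.random.randn(6, 512) * 0.02   # Caffe lays weights out as num_output x input_dim
b = np.full(6, 0.2)                  # bias_filler: constant 0.2
scores = W @ x.reshape(-1) + b       # shape (6,): one raw score per expression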

Output layer

layer {  
  name: "prob"  
  type: "Softmax"  
  bottom: "classifier-Expressions"  
  top: "prob"  
}  

The output layer uses the softmax function to express the scores as probabilities; the class with the largest probability is taken as the recognition result. The number of output neurons equals the number of classes you are distinguishing, six here.
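A numerically stable numpy softmax; the predicted expression is the argmax of the resulting distribution:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([1.2, -0.3, 0.5, 2.1, 0.0, -1.0])   # illustrative raw scores
prob = softmax(scores)
print(prob.sum(), prob.argmax())   # probabilities sum to 1; argmax is the class id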

Complete network structure for expression recognition

name: "macro_expression_recognition"

################ define data  #############
input: "data"  
input_shape {  
  dim: 1    # batch size  
  dim: 1    # number of channels  
  dim: 224  # height  
  dim: 28   # width  
}

#################   Define net    ###########
layer {
  name: "data_bn"
  type: "BatchNorm"
  bottom: "data"
  top: "data_bn"
  param {
    lr_mult: 0.0
  }
  param {
    lr_mult: 0.0
  }
  param {
    lr_mult: 0.0
  }
}
layer {
  name: "data_scale"
  type: "Scale"
  bottom: "data_bn"
  top: "data_bn"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 1.0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data_bn"
  top: "conv1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 1.0
  }
  convolution_param {
    num_output: 64
    pad: 3
    kernel_size: 7
    stride: 2
    weight_filler {
      type: "msra"
      variance_norm: FAN_OUT
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "conv1_bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  param {
    lr_mult: 0.0
  }
  param {
    lr_mult: 0.0
  }
  param {
    lr_mult: 0.0
  }
}
layer {
  name: "conv1_scale"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 1.0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "conv1_relu"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "conv1_pool"
  type: "Pooling"
  bottom: "conv1"
  top: "conv1_pool"
  pooling_param {
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "layer_64_1_conv1"
  type: "Convolution"
  bottom: "conv1_pool"
  top: "layer_64_1_conv1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  convolution_param {
    num_output: 64
    bias_term: false
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "layer_64_1_bn2"
  type: "BatchNorm"
  bottom: "layer_64_1_conv1"
  top: "layer_64_1_conv1"
  param {
    lr_mult: 0.0
  }
  param {
    lr_mult: 0.0
  }
  param {
    lr_mult: 0.0
  }
}
layer {
  name: "layer_64_1_scale2"
  type: "Scale"
  bottom: "layer_64_1_conv1"
  top: "layer_64_1_conv1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 1.0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "layer_64_1_relu2"
  type: "ReLU"
  bottom: "layer_64_1_conv1"
  top: "layer_64_1_conv1"
}
layer {
  name: "layer_64_1_conv2"
  type: "Convolution"
  bottom: "layer_64_1_conv1"
  top: "layer_64_1_conv2"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  convolution_param {
    num_output: 64
    bias_term: false
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "layer_64_1_sum"
  type: "Eltwise"
  bottom: "layer_64_1_conv2"
  bottom: "conv1_pool"
  top: "layer_64_1_sum"
}
layer {
  name: "layer_128_1_bn1"
  type: "BatchNorm"
  bottom: "layer_64_1_sum"
  top: "layer_128_1_bn1"
  param {
    lr_mult: 0.0
  }
  param {
    lr_mult: 0.0
  }
  param {
    lr_mult: 0.0
  }
}
layer {
  name: "layer_128_1_scale1"
  type: "Scale"
  bottom: "layer_128_1_bn1"
  top: "layer_128_1_bn1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 1.0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "layer_128_1_relu1"
  type: "ReLU"
  bottom: "layer_128_1_bn1"
  top: "layer_128_1_bn1"
}
layer {
  name: "layer_128_1_conv1"
  type: "Convolution"
  bottom: "layer_128_1_bn1"
  top: "layer_128_1_conv1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  convolution_param {
    num_output: 128
    bias_term: false
    pad: 1
    kernel_size: 3
    stride: 2
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "layer_128_1_bn2"
  type: "BatchNorm"
  bottom: "layer_128_1_conv1"
  top: "layer_128_1_conv1"
  param {
    lr_mult: 0.0
  }
  param {
    lr_mult: 0.0
  }
  param {
    lr_mult: 0.0
  }
}
layer {
  name: "layer_128_1_scale2"
  type: "Scale"
  bottom: "layer_128_1_conv1"
  top: "layer_128_1_conv1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 1.0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "layer_128_1_relu2"
  type: "ReLU"
  bottom: "layer_128_1_conv1"
  top: "layer_128_1_conv1"
}
layer {
  name: "layer_128_1_conv2"
  type: "Convolution"
  bottom: "layer_128_1_conv1"
  top: "layer_128_1_conv2"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  convolution_param {
    num_output: 128
    bias_term: false
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "layer_128_1_conv_expand"
  type: "Convolution"
  bottom: "layer_128_1_bn1"
  top: "layer_128_1_conv_expand"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  convolution_param {
    num_output: 128
    bias_term: false
    pad: 0
    kernel_size: 1
    stride: 2
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "layer_128_1_sum"
  type: "Eltwise"
  bottom: "layer_128_1_conv2"
  bottom: "layer_128_1_conv_expand"
  top: "layer_128_1_sum"
}
layer {
  name: "layer_256_1_bn1"
  type: "BatchNorm"
  bottom: "layer_128_1_sum"
  top: "layer_256_1_bn1"
  param {
    lr_mult: 0.0
  }
  param {
    lr_mult: 0.0
  }
  param {
    lr_mult: 0.0
  }
}
layer {
  name: "layer_256_1_scale1"
  type: "Scale"
  bottom: "layer_256_1_bn1"
  top: "layer_256_1_bn1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 1.0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "layer_256_1_relu1"
  type: "ReLU"
  bottom: "layer_256_1_bn1"
  top: "layer_256_1_bn1"
}
layer {
  name: "layer_256_1_conv1"
  type: "Convolution"
  bottom: "layer_256_1_bn1"
  top: "layer_256_1_conv1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  convolution_param {
    num_output: 256
    bias_term: false
    pad: 1
    kernel_size: 3
    stride: 2
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "layer_256_1_bn2"
  type: "BatchNorm"
  bottom: "layer_256_1_conv1"
  top: "layer_256_1_conv1"
  param {
    lr_mult: 0.0
  }
  param {
    lr_mult: 0.0
  }
  param {
    lr_mult: 0.0
  }
}
layer {
  name: "layer_256_1_scale2"
  type: "Scale"
  bottom: "layer_256_1_conv1"
  top: "layer_256_1_conv1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 1.0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "layer_256_1_relu2"
  type: "ReLU"
  bottom: "layer_256_1_conv1"
  top: "layer_256_1_conv1"
}
layer {
  name: "layer_256_1_conv2"
  type: "Convolution"
  bottom: "layer_256_1_conv1"
  top: "layer_256_1_conv2"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  convolution_param {
    num_output: 256
    bias_term: false
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "layer_256_1_conv_expand"
  type: "Convolution"
  bottom: "layer_256_1_bn1"
  top: "layer_256_1_conv_expand"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  convolution_param {
    num_output: 256
    bias_term: false
    pad: 0
    kernel_size: 1
    stride: 2
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "layer_256_1_sum"
  type: "Eltwise"
  bottom: "layer_256_1_conv2"
  bottom: "layer_256_1_conv_expand"
  top: "layer_256_1_sum"
}
layer {
  name: "layer_512_1_bn1"
  type: "BatchNorm"
  bottom: "layer_256_1_sum"
  top: "layer_512_1_bn1"
  param {
    lr_mult: 0.0
  }
  param {
    lr_mult: 0.0
  }
  param {
    lr_mult: 0.0
  }
}
layer {
  name: "layer_512_1_scale1"
  type: "Scale"
  bottom: "layer_512_1_bn1"
  top: "layer_512_1_bn1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 1.0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "layer_512_1_relu1"
  type: "ReLU"
  bottom: "layer_512_1_bn1"
  top: "layer_512_1_bn1"
}
layer {
  name: "layer_512_1_conv1"
  type: "Convolution"
  bottom: "layer_512_1_bn1"
  top: "layer_512_1_conv1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  convolution_param {
    num_output: 512
    bias_term: false
    pad: 1
    kernel_size: 3
    stride: 2
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "layer_512_1_bn2"
  type: "BatchNorm"
  bottom: "layer_512_1_conv1"
  top: "layer_512_1_conv1"
  param {
    lr_mult: 0.0
  }
  param {
    lr_mult: 0.0
  }
  param {
    lr_mult: 0.0
  }
}
layer {
  name: "layer_512_1_scale2"
  type: "Scale"
  bottom: "layer_512_1_conv1"
  top: "layer_512_1_conv1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 1.0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "layer_512_1_relu2"
  type: "ReLU"
  bottom: "layer_512_1_conv1"
  top: "layer_512_1_conv1"
}
layer {
  name: "layer_512_1_conv2"
  type: "Convolution"
  bottom: "layer_512_1_conv1"
  top: "layer_512_1_conv2"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  convolution_param {
    num_output: 512
    bias_term: false
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "layer_512_1_conv_expand"
  type: "Convolution"
  bottom: "layer_512_1_bn1"
  top: "layer_512_1_conv_expand"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  convolution_param {
    num_output: 512
    bias_term: false
    pad: 0
    kernel_size: 1
    stride: 2
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "layer_512_1_sum"
  type: "Eltwise"
  bottom: "layer_512_1_conv2"
  bottom: "layer_512_1_conv_expand"
  top: "layer_512_1_sum"
}
layer {
  name: "last_bn"
  type: "BatchNorm"
  bottom: "layer_512_1_sum"
  top: "layer_512_1_sum"
  param {
    lr_mult: 0.0
  }
  param {
    lr_mult: 0.0
  }
  param {
    lr_mult: 0.0
  }
}
layer {
  name: "last_scale"
  type: "Scale"
  bottom: "layer_512_1_sum"
  top: "layer_512_1_sum"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 1.0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "last_relu"
  type: "ReLU"
  bottom: "layer_512_1_sum"
  top: "layer_512_1_sum"
}
layer {
  name: "global_pool"
  type: "Pooling"
  bottom: "layer_512_1_sum"
  top: "global_pool"
  pooling_param {
    pool: AVE
    global_pooling: true
  }
}

#############     self definition    ###############
layer {
  name: "classifier-Expressions"
  type: "InnerProduct"
  bottom: "global_pool"
  top: "classifier-Expressions"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 1
  }
  inner_product_param {
    num_output: 6
    weight_filler {
      type: "xavier"
      std: 0.02
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
#######################################################
#######################################################

layer {  
  name: "prob"  
  type: "Softmax"  
  bottom: "classifier-Expressions"  
  top: "prob"  
}  

Summary

An ordinary convolutional neural network is just a stack of "convolution + BN + Scale + activation + pooling" modules, finally cascaded into a fully connected layer and an output layer to produce the recognition result.
