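I. Dataset preparation
II. Data processing
III. Splitting the dataset
IV. Making the labels
V. Generating the LMDB files
VI. Writing the network definition file
VII. Writing the solver (hyperparameter) file
VIII. Training the model
IX. Testing the model
X. Converting the classifier CNN into a fully convolutional network (FCN)
XI.

Goal: given an input image, draw a box around each face in it.
Reference: https://blog.csdn.net/sinat_14916279/article/details/71273892
Note: I don't work on face detection; this is purely a practice project for fun.

I. Dataset preparation

Download a face detection dataset from the web. There are many sources, such as benchmarks and the datasets released with papers; pick one yourself. The data I downloaded has this format:

path/xxx.jpg 60,80,280,320

The four values are the top-left and bottom-right coordinates of the face box. Annotations do not have to be given this way; another common form is the top-left coordinate plus the box height and width.

II. Data processing

We want the model to learn what is a face and what is not, so the model performs classification. The training data should be two-class data: the first class is face, the second class is non-face.

1. Make the positive (face) samples from the raw data.

What is a face? It is the boxed part of the image. The downloaded images do not contain only faces; they show faces together with bodies and background. So we crop the faces out using the annotated coordinates, which gives the positive samples. This is easy to do with OpenCV. After cropping, look through the results and delete the poor ones.

2. Make the negative (non-face) samples from the raw data.

What is a non-face? Everything other than a face. Negative samples are slightly more work.

The first case is a box that contains no face at all.

The second case is a box that overlaps a face only slightly; we can treat this as non-face too.

For this we need a metric: Intersection-over-Union (IoU). It is the overlap ratio between a generated candidate box (candidate bound) and the original annotated box (ground truth bound), i.e. the area of their intersection divided by the area of their union.

So, to make non-face samples, we pick many random boxes in each image. The number of boxes can be anywhere from dozens to hundreds, and the box size can vary randomly too.

For each box, compute its IoU with the face box. If the IoU is below 0.3, we treat the crop as non-face data. If the IoU is above 0.7, it can be treated as face data and added to the face set (this not only enlarges the face set, it also covers occluded and partially visible faces). Boxes with an IoU between 0.3 and 0.7 meet neither criterion. As before, look through the generated crops and delete the poor ones.

If you are worried that faces have slipped into the non-face data and do not want to check by hand, another option is to crop from a different dataset. An object detection dataset that contains no faces, for example, can never yield a face crop no matter how you cut it.

III. Splitting the dataset

Once the positive and negative samples are ready, split them into a training set, a validation set and a test set.

The result of the split:

|----data
|    |----face_train
|    |    |----0      //training positives: 15738 images, faces
|    |    |----1      //training negatives: 15663 images, non-faces
|    |
|    |----face_val
|    |    |----0      //validation positives: 3156 images, faces
|    |    |----1      //validation negatives: 3146 images, non-faces
|    |
|    |----face_test
|         |----0      //test positives: 2079 images, faces
|         |----1      //test negatives: 2066 images, non-faces
|
|---labels            //label files
|
|---bat               //scripts
|
|---lmdb              //converted LMDB data
|
|---model             //training results are saved here
|
|---draw              //visualization plots are saved here

IV. Making the labels

Write a program that generates the labels. It creates face_train.txt and face_val.txt in the labels folder.

V. Generating the LMDB files

For AlexNet and VGGNet the images are usually resized to about 227×227 (227×227 for Caffe's AlexNet, 224×224 for VGGNet). The network in this post takes 48×48 inputs.

The conversion scripts are create_train_lmdb.bat and create_test_lmdb.bat.

After running them, the lmdb folder contains the face_train_lmdb and face_val_lmdb directories, each holding the LMDB files.

VI. Writing the network definition file

//train_val.prototxt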
name: "face_train_val_net"
layer {
  top: "data"
  top: "label"
  name: "data"
  type: "Data"
  data_param {
    source: "F:/deep_learning/face_detection/lmdb/face_train_lmdb"
    backend: LMDB
    batch_size: 128  # the batch size has a large effect on training speed
  }
  transform_param {
    # mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"  # mean subtraction is skipped
    mirror: true
  }
  include: { phase: TRAIN }
}
layer {
  top: "data"
  top: "label"
  name: "data"
  type: "Data"
  data_param {
    source: "F:/deep_learning/face_detection/lmdb/face_val_lmdb"
    backend: LMDB
    batch_size: 64
  }
  transform_param {
    # mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
    mirror: false
  }
  include: { phase: TEST }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 20
    kernel_size: 3
    stride: 1
    pad: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer { name: "relu1" type: "ReLU" bottom: "conv1" top: "conv1" }
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "conv1"
  lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param { pool: MAX kernel_size: 2 stride: 2 }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 40
    kernel_size: 3
    pad: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0.1 }
  }
}
layer { name: "relu2" type: "ReLU" bottom: "conv2" top: "conv2" }
layer {
  name: "norm2"
  type: "LRN"
  bottom: "conv2"
  top: "conv2"
  lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param { pool: MAX kernel_size: 2 stride: 2 }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 60
    kernel_size: 3
    pad: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0.1 }
  }
}
layer { name: "relu3" type: "ReLU" bottom: "conv3" top: "conv3" }
layer {
  name: "norm3"
  type: "LRN"
  bottom: "conv3"
  top: "conv3"
  lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param { pool: MAX kernel_size: 2 stride: 2 }
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "pool3"
  top: "conv4"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 80
    kernel_size: 3
    pad: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0.1 }
  }
}
layer { name: "relu4" type: "ReLU" bottom: "conv4" top: "conv4" }
layer {
  name: "norm4"
  type: "LRN"
  bottom: "conv4"
  top: "conv4"
  lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
  name: "pool4"
  type: "Pooling"
  bottom: "conv4"
  top: "pool4"
  pooling_param { pool: MAX kernel_size: 2 stride: 2 }
}
layer {
  name: "fc5"
  type: "InnerProduct"
  bottom: "pool4"
  top: "fc5"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 160
    weight_filler { type: "gaussian" std: 0.005 }
    bias_filler { type: "constant" value: 0.1 }
  }
}
layer { name: "relu5" type: "ReLU" bottom: "fc5" top: "fc5" }
layer {
  name: "drop5"
  type: "Dropout"
  bottom: "fc5"
  top: "fc5"
  dropout_param { dropout_ratio: 0.5 }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "fc5"
  top: "fc6"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 2  # two classes
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc6"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc6"
  bottom: "label"
  top: "loss"
  loss_weight: 0.5
}
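To see why fc5 ends up with a fixed number of inputs, it helps to trace the feature map size through the four conv/pool stages of the network above for a 48×48 input. The following is a rough sketch, not part of the original project: the helper names are made up for illustration, and the pooling formula assumes Caffe rounds the pooled size up.

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    """Output size of a convolution along one axis (rounds down)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Output size of a pooling layer along one axis.

    Assumption: the size is rounded up, as Caffe's pooling layer does.
    """
    return int(math.ceil((size - kernel) / float(stride))) + 1


def trace_shapes(size=48):
    """Return (channels, height, width) after pool1..pool4."""
    channels = [20, 40, 60, 80]  # num_output of conv1..conv4
    shapes = []
    for c in channels:
        size = conv_out(size, kernel=3, stride=1, pad=1)  # 3x3 conv, pad 1: size unchanged
        size = pool_out(size, kernel=2, stride=2)         # 2x2 max pool, stride 2: size halved
        shapes.append((c, size, size))
    return shapes


shapes = trace_shapes(48)
for name, shape in zip(["pool1", "pool2", "pool3", "pool4"], shapes):
    print(name, shape)

c, h, w = shapes[-1]
fc5_inputs = c * h * w  # number of values flattened into fc5
print("fc5 inputs:", fc5_inputs)
```

With a 48×48 input, pool4 comes out as 80 channels of 3×3, so fc5 sees 720 inputs. This is the number that gets baked into the fully connected layer's weight matrix.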
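The network structure can be drawn as a diagram.

VII. Writing the solver (hyperparameter) file

//solver.prototxt

VIII. Training the model

//train.bat

Question: what does the training time depend on?

1. The network you choose. AlexNet and VGGNet will certainly differ: AlexNet has only 8 layers, and the deeper the network, the longer it takes to train.
2. The size of the input data. 227×227 and 32×32 inputs will certainly differ: a 227×227 image has about 50 times as many pixels, so training is not merely 10 times slower and may be hundreds of times slower.

After a long wait, training finishes and we have the model.

IX. Testing the model

Training uses the training set and the validation set. Testing uses the test set.

1. Write deploy.prototxt based on train_val.prototxt

//deploy.prototxt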
name: "face_net"
layer {
  name: "data"
  type: "Input"
  top: "data"
  # Note: the four dims are batch, channels, height, width (not width, height).
  input_param { shape: { dim: 1 dim: 3 dim: 48 dim: 48 } }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param { num_output: 20 kernel_size: 3 stride: 1 pad: 1 }
}
layer { name: "relu1" type: "ReLU" bottom: "conv1" top: "conv1" }
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "conv1"
  lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param { pool: MAX kernel_size: 2 stride: 2 }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  convolution_param { num_output: 40 kernel_size: 3 pad: 1 }
}
layer { name: "relu2" type: "ReLU" bottom: "conv2" top: "conv2" }
layer {
  name: "norm2"
  type: "LRN"
  bottom: "conv2"
  top: "conv2"
  lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param { pool: MAX kernel_size: 2 stride: 2 }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  convolution_param { num_output: 60 kernel_size: 3 pad: 1 }
}
layer { name: "relu3" type: "ReLU" bottom: "conv3" top: "conv3" }
layer {
  name: "norm3"
  type: "LRN"
  bottom: "conv3"
  top: "conv3"
  lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param { pool: MAX kernel_size: 2 stride: 2 }
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "pool3"
  top: "conv4"
  convolution_param { num_output: 80 kernel_size: 3 pad: 1 }
}
layer { name: "relu4" type: "ReLU" bottom: "conv4" top: "conv4" }
layer {
  name: "norm4"
  type: "LRN"
  bottom: "conv4"
  top: "conv4"
  lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
  name: "pool4"
  type: "Pooling"
  bottom: "conv4"
  top: "pool4"
  pooling_param { pool: MAX kernel_size: 2 stride: 2 }
}
layer {
  name: "fc5"
  type: "InnerProduct"
  bottom: "pool4"
  top: "fc5"
  inner_product_param { num_output: 160 }
}
layer { name: "relu5" type: "ReLU" bottom: "fc5" top: "fc5" }
layer {
  name: "drop5"
  type: "Dropout"
  bottom: "fc5"
  top: "fc5"
  dropout_param { dropout_ratio: 0.5 }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "fc5"
  top: "fc6"
  inner_product_param { num_output: 2 }  # two classes
}
layer { name: "prob" type: "Softmax" bottom: "fc6" top: "prob" }
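Compared with train_val.prototxt, the deploy file drops the data, accuracy and loss layers and ends in a softmax layer named prob, which turns the two fc6 scores into class probabilities. As a small sketch of what that last layer computes (the scores below are made-up numbers, and treating class 0 as "face" assumes the labels follow the folder names 0 and 1 from the dataset split):

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)                                # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


fc6_scores = [2.0, -1.0]       # hypothetical fc6 output for one 48x48 crop
prob = softmax(fc6_scores)
predicted = prob.index(max(prob))  # index of the most likely class

print(prob)
print(predicted)               # 0 would mean "face" under the assumed labelling
```

The test script takes the same argmax over the prob output to get the predicted class.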
2. Testing

Method 1: test a single image

//face_test.py
# -*- coding: utf-8 -*-
import numpy as np
import os
# import sys
# import matplotlib.pyplot as plt
# caffe_root = '/home/caffe'  # Caffe root directory
# sys.path.insert(0, caffe_root + '/python')
import caffe

size = 48
# Image to test
image_file = 'F:/deep_learning/face_detection/data/face_val/0/71_faceimage22831.jpg'
# Deploy file
model_def = 'F:/deep_learning/face_detection/deploy.prototxt'
# Trained model
model_weights = 'F:/deep_learning/face_detection/models/_iter_100000.caffemodel'

# GPU mode
# caffe.set_device(0)
caffe.set_mode_cpu()
net = caffe.Net(model_def, model_weights, caffe.TEST)

# Load a mean file (a fixed value per channel could be used instead)
# mu = np.load('C:/Users/Administrator/Desktop/caffe/python/caffe/imagenet/ilsvrc_2012_mean.npy')  # ships with Caffe
# mu = mu.mean(1).mean(1)  # average over pixels to obtain the mean (BGR) pixel values

# Preprocess the input image
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
# transformer.set_mean('data', mu)  # subtract the mean from each channel
# Images are loaded as H x W x K; Caffe expects K x H x W
transformer.set_transpose('data', (2, 0, 1))  # (48, 48, 3) becomes (3, 48, 48)
# Images are loaded in [0, 1]; Caffe works with [0, 255]
transformer.set_raw_scale('data', 255)  # rescale to [0, 255]
transformer.set_channel_swap('data', (2, 1, 0))  # swap channels from RGB to BGR

# Reshape the input blob to match the deploy file:
# number of images, channels, height, width
net.blobs['data'].reshape(1, 3, size, size)

# load_image always returns an (h, w, 3) RGB float32 image in [0, 1]
image = caffe.io.load_image(image_file)
# Run the loaded image through the transformer defined above
net.blobs['data'].data[...] = transformer.preprocess('data', image)

### perform classification
output = net.forward()  # dict holding the network's output blobs
# print(output)
# 'prob' blob, first image, most probable class; map it to our own class convention
output_prob = output['prob'][0].argmax()
print(output_prob)
# print(output['prob'][0][0])  # or print(output['prob'][0, 1])
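I ran it and got an error. It looks like an environment problem; I'll come back to this once it is solved.

Method 2: test the whole test set.

To be done once Method 1 works.

---------------------------------------------------------------------------------------------------------------------------

The model we trained takes a 48×48 image as input and outputs 0 or 1. In practice, the input image is not necessarily 48×48. Moreover, the input is an image that may contain faces, and a face is only one part of the image.

So although we have a trained model, there is still a gap between it and real use. The main problems are the following.

Problem 1: the input is not a face or non-face crop.

Our model takes a 48×48 image and outputs whether it is a face; it is a two-class classifier. In practice we are given a whole picture and asked to box the faces in it, not given a crop and asked whether it is a face.

Solution: a sliding window. Slide a 48×48 window across the image and decide at each position whether the window contains a face.

Problem 2: faces come in different sizes.

The faces in an image are not necessarily 48×48. Some are very small and some are very large, while our model only recognizes 48×48 faces.

Solution: apply a multi-scale transform. Resize the image to many different sizes, some larger and some smaller, so that in at least one of them a given face is close to 48×48. This is the image pyramid.

Problem 3: input images come in different sizes.

Our model requires a 48×48 input, but the images in the pyramid all have different sizes. The obstacle is that our network has fully connected layers, and a fully connected layer is hard-wired: the features feeding it must have one fixed shape.

A small image and a large image passed through the same convolution and pooling layers produce feature maps of different sizes, and both cannot be connected to the same fully connected layer.

Solution: drop the fully connected layers and convert them into convolutional layers. This makes multi-scale input possible.

How is the conversion done? There is an example on the Caffe website.

X. Converting the classifier CNN into a fully convolutional network (FCN)

1. Change deploy.prototxt into the fully convolutional deploy_full_conv.prototxt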
name: "face_full_conv_net"
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 1 dim: 3 dim: 48 dim: 48 } }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param { num_output: 20 kernel_size: 3 stride: 1 pad: 1 }
}
layer { name: "relu1" type: "ReLU" bottom: "conv1" top: "conv1" }
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "conv1"
  lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param { pool: MAX kernel_size: 2 stride: 2 }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  convolution_param { num_output: 40 kernel_size: 3 pad: 1 }
}
layer { name: "relu2" type: "ReLU" bottom: "conv2" top: "conv2" }
layer {
  name: "norm2"
  type: "LRN"
  bottom: "conv2"
  top: "conv2"
  lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param { pool: MAX kernel_size: 2 stride: 2 }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  convolution_param { num_output: 60 kernel_size: 3 pad: 1 }
}
layer { name: "relu3" type: "ReLU" bottom: "conv3" top: "conv3" }
layer {
  name: "norm3"
  type: "LRN"
  bottom: "conv3"
  top: "conv3"
  lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param { pool: MAX kernel_size: 2 stride: 2 }
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "pool3"
  top: "conv4"
  convolution_param { num_output: 80 kernel_size: 3 pad: 1 }
}
layer { name: "relu4" type: "ReLU" bottom: "conv4" top: "conv4" }
layer {
  name: "norm4"
  type: "LRN"
  bottom: "conv4"
  top: "conv4"
  lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
  name: "pool4"
  type: "Pooling"
  bottom: "conv4"
  top: "pool4"
  pooling_param { pool: MAX kernel_size: 2 stride: 2 }
}
# fc5 (InnerProduct) becomes a 3x3 convolution over pool4
layer {
  name: "fc5-conv"
  type: "Convolution"
  bottom: "pool4"
  top: "fc5-conv"
  convolution_param { num_output: 160 kernel_size: 3 }
}
layer { name: "relu5" type: "ReLU" bottom: "fc5-conv" top: "fc5-conv" }
layer {
  name: "drop5"
  type: "Dropout"
  bottom: "fc5-conv"
  top: "fc5-conv"
  dropout_param { dropout_ratio: 0.5 }
}
# fc6 (InnerProduct) becomes a 1x1 convolution
layer {
  name: "fc6-conv"
  type: "Convolution"
  bottom: "fc5-conv"
  top: "fc6-conv"
  convolution_param { num_output: 2 kernel_size: 1 }
}
layer { name: "prob" type: "Softmax" bottom: "fc6-conv" top: "prob" }
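Once the fully connected layers are convolutions, a larger input produces a whole map of face scores, where each cell plays the role of one 48×48 sliding-window position. Below is a rough sketch of how the score map size, the cell-to-window mapping and the pyramid scales could be worked out. All function names and the `factor` parameter are hypothetical. The stride of 16 comes from the four stride-2 pooling layers, and the pooling size is assumed to round up as in Caffe, so the mapping back to pixels is only approximate for sizes that are not multiples of 16.

```python
import math

STRIDE = 16   # four 2x2, stride-2 pooling layers: 2 ** 4
WINDOW = 48   # input size the classifier was trained on


def score_map_size(size):
    """Side length of the fc6-conv score map for an input side of `size`."""
    for _ in range(4):
        # The 3x3 conv with pad 1 keeps the size; pooling halves it (rounding up assumed).
        size = int(math.ceil(size / 2.0))
    return size - 3 + 1  # fc5-conv: 3x3 kernel, no padding


def cell_to_box(row, col, scale):
    """Map a score-map cell to a box (x1, y1, x2, y2) in the original image.

    `scale` is resized_size / original_size for the pyramid level the cell came from.
    """
    x1 = int(round(col * STRIDE / scale))
    y1 = int(round(row * STRIDE / scale))
    x2 = int(round((col * STRIDE + WINDOW) / scale))
    y2 = int(round((row * STRIDE + WINDOW) / scale))
    return (x1, y1, x2, y2)


def pyramid_scales(height, width, factor=0.7):
    """Scales to resize by, shrinking until the short side drops below WINDOW."""
    scales = []
    scale = 1.0
    while min(height, width) * scale >= WINDOW:
        scales.append(scale)
        scale *= factor
    return scales


print(score_map_size(48))            # a single score, same as the classifier
print(score_map_size(96))            # several window positions per side
print(cell_to_box(1, 2, 0.5))        # window found on a half-size pyramid level
print(pyramid_scales(100, 100, 0.5))
```

A 48×48 input gives a 1×1 score map, matching the original classifier, while a 96×96 input gives a 4×4 map from a single forward pass.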
2. Convert the trained classification caffemodel into a fully convolutional caffemodel that accepts input of any size and outputs a feature map

Write the script convert_full_conv.py:
# -*- coding: utf-8 -*-
# First edit deploy.prototxt by hand into the fully convolutional deploy_full_conv.prototxt,
# paying particular attention to how the fully connected layers become convolution layers.
# This script then converts the trained classification caffemodel into a fully convolutional
# caffemodel that accepts input of any size and outputs a feature map.
import numpy as np
import caffe

model_def = 'F:/deep_learning/face_detection/deploy.prototxt'
model_weights = 'F:/deep_learning/face_detection/models/_iter_100000.caffemodel'

net = caffe.Net(model_def, model_weights, caffe.TEST)
params = ['fc5', 'fc6']
# fc_params = {name: (weights, biases)}
fc_params = {pr: (net.params[pr][0].data, net.params[pr][1].data) for pr in params}
for fc in params:
    print('{} weights are {} dimensional and biases are {} dimensional'.format(
        fc, fc_params[fc][0].shape, fc_params[fc][1].shape))

# Load the fully convolutional network to transplant the parameters.
net_full_conv = caffe.Net('F:/deep_learning/face_detection/deploy_full_conv.prototxt',
                          'F:/deep_learning/face_detection/models/_iter_100000.caffemodel',
                          caffe.TEST)
params_full_conv = ['fc5-conv', 'fc6-conv']
# conv_params = {name: (weights, biases)}
conv_params = {pr: (net_full_conv.params[pr][0].data, net_full_conv.params[pr][1].data)
               for pr in params_full_conv}
for conv in params_full_conv:
    print('{} weights are {} dimensional and biases are {} dimensional'.format(
        conv, conv_params[conv][0].shape, conv_params[conv][1].shape))

# Copy each fully connected layer's parameters into the matching convolution layer.
for pr, pr_conv in zip(params, params_full_conv):
    conv_params[pr_conv][0].flat = fc_params[pr][0].flat  # flat unrolls the arrays
    conv_params[pr_conv][1][...] = fc_params[pr][1]

net_full_conv.save('F:/deep_learning/face_detection/models/_iter_100000_full_conv.caffemodel')
print('success')
Running it produces _iter_100000_full_conv.caffemodel.

XI.