[Tensorflow] Building a simplified LeNet5 to train on and recognize CIFAR-10 images

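Le-Net5 was proposed by Yann LeCun and colleagues in 1998 in the paper Gradient-based learning applied to document recognition, where it was mainly used to recognize MNIST handwritten digits. Here we adapt the network so that it can be trained on, and used to recognize, CIFAR-10 data.

The biggest difference is that the MNIST handwritten digits are single-channel grayscale images (28x28, padded to 32x32 as the network input in the paper), whereas CIFAR-10 images are 32x32 RGB images. How well will the Le-Net5 network do when trained on them? Let's try it and see.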

  • Le-Net5 model structure
  • The CIFAR-10 data
  • Building Le-Net5
  • Training and testing
  • Appendix

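I. Simplified Le-Net5 model structure

The basic structure of the model is shown in the figure below. The original paper handles a great many details; since our main goal here is to learn how to use Tensorflow rather than to study the model itself, we leave some of those details out and use a simplified model.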
[Figure 1: structure of the simplified Le-Net5 model]
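The input is a 32x32 image: single-channel in the original paper, 3-channel RGB here.
Layer 1 is a convolutional layer with a 5x5 filter, stride 1, 3 input channels and 6 output channels. It produces 28x28x6 data and uses a sigmoid activation.
Layer 2 is a pooling layer with a 2x2 filter and stride 2. It produces 14x14x6 data. (In the paper the pooled value is also passed through a sigmoid; the simplified code in this article uses plain average pooling.)
Layer 3 is a convolutional layer, again with a 5x5 filter, stride 1, 6 input channels and 16 output channels. It produces 10x10x16 data and uses a sigmoid activation. (In the original paper this layer is not fully connected to the previous feature maps but uses an asymmetric connection scheme. Because that is too complicated to implement, it is replaced here with full connection.)
Layer 4 is a pooling layer with a 2x2 filter and stride 2. It produces 5x5x16 data, and is handled the same way as layer 2.
Layer 5 is again a convolutional layer, with a 5x5 filter, stride 1, 16 input channels and 120 output channels. It produces 1x1x120 data and uses a tanh activation.
Next comes a flatten layer. Given the size of the previous layer's output, it holds 120 = 1x1x120 values. Because this layer only unrolls the data without changing it, it is usually not counted as an effective layer.
Layer 6 is a fully connected layer that takes the 120 flattened values as input and has 84 nodes.
Layer 7 is also a fully connected layer. It maps the 84 values to 10 outputs, one per class.
Finally, softmax is used to output the prediction. (The original paper uses RBF here rather than softmax.)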
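
The sizes quoted above follow from the usual rule for a window with no padding: output size = (input size - kernel size) / stride + 1. Here is a small sketch that traces them; the function and list names are just for illustration and are not part of the article's code.

```python
def valid_output_size(in_size, ksize, stride):
    """Spatial output size of a convolution or pooling window with no padding."""
    return (in_size - ksize) // stride + 1

# (layer name, kernel size, stride, output channels); pooling keeps the channel count
layers = [
    ("conv_1", 5, 1, 6),
    ("pool_1", 2, 2, 6),
    ("conv_2", 5, 1, 16),
    ("pool_2", 2, 2, 16),
    ("conv_3", 5, 1, 120),
]

def trace_shapes(in_size=32):
    """Return (name, height, width, channels) for every layer, starting from in_size."""
    shapes = []
    size = in_size
    for name, ksize, stride, channels in layers:
        size = valid_output_size(size, ksize, stride)
        shapes.append((name, size, size, channels))
    return shapes

for name, h, w, c in trace_shapes():
    print(name, (h, w, c))
```

Starting from 32, this gives 28, 14, 10, 5 and finally 1, which is why the last convolution leaves exactly 120 values to flatten.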

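II. The CIFAR-10 dataset

The CIFAR-10 dataset contains 60000 32x32 color images in 10 classes, with 6000 images per class. 50000 of the images form the training set and the other 10000 form the test set.
The data is available in three formats: Python, Matlab and binary. Here we use the Python format. To read it, import the pickle module and call pickle.load. Once loaded, each batch file is a dictionary.
Its data entry is a 10000x3072 numpy array of dtype uint8. In the row for each image, the first 1024 bytes are the red channel, the middle 1024 bytes are the green channel, and the last 1024 bytes are the blue channel.
Its labels entry holds 10000 numbers in the range 0-9, each standing for one of the 10 classes.
More information about the CIFAR-10 dataset can be found on the personal website of Alex Krizhevsky, a leading figure in neural networks. http://www.cs.toronto.edu/~kriz/cifar.html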
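
To make the row layout concrete, here is a minimal sketch that turns one 3072-value row into a 32x32x3 image. It uses a synthetic row rather than the real files, so the pixel values (10, 20, 30) are made up for illustration.

```python
import numpy as np

# Synthetic stand-in for one row of the data array: 3072 uint8 values,
# laid out as 1024 red, then 1024 green, then 1024 blue.
row = np.concatenate([
    np.full(1024, 10, dtype=np.uint8),   # red plane
    np.full(1024, 20, dtype=np.uint8),   # green plane
    np.full(1024, 30, dtype=np.uint8),   # blue plane
])

# Split into planes of shape (channel, row, column), then move the channel axis last.
image = row.reshape(3, 32, 32).transpose(1, 2, 0)

print(image.shape)   # (32, 32, 3)
print(image[0, 0])   # [10 20 30]
```

The loader in the appendix uses transpose(0,3,2,1) instead, which yields (column, row, channel). Since the training and test data go through the same transform, the network still sees a consistent layout.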

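III. Building Le-Net5

We build a LeNet5 class to encapsulate the network structure.
The paper mentions that the input data is normalized, with values ranging roughly from -0.1 to 1.175, which speeds up training. We therefore add a normalization step here, using the tensorflow function l2_normalize.
To make the code easier to read, I have wrapped the implementations of the convolutional, pooling and fully connected layers in a module called tf_general. Its code is given in the appendix at the end of the article.
Also, as described in the paper, Le-Net5 uses sigmoid as its activation function rather than ReLU, which is the most common choice today.
The code is as follows: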

import tensorflow as tf
import tf_general as tfg

class LeNet5(object):
    def __init__(self, x, n_class=10, drop_rate=0):
        self.input = x
        self.n_class = n_class
        self.drop_rate = drop_rate
        self._build_net()

    def _build_net(self):
        with tf.name_scope('norm'):    
            self.x_norm = tf.nn.l2_normalize(tf.cast(self.input, tf.float32),axis=1)
        
        with tf.name_scope('conv_1'):
            # pass the activation explicitly; tfg.conv2d defaults to RELU
            self.conv1 = tfg.conv2d(self.x_norm, 5, 1, 6, 'conv1', 'VALID', 'SIGMOID')
            print('conv_1: ', self.conv1.get_shape())
        
        
        with tf.name_scope('pool_1'):
            self.pool1 = tfg.avg_pool(self.conv1, 2, 2, 'pool1', 'VALID')
            print('pool_1: ', self.pool1.get_shape())
        
        
        with tf.name_scope('conv_2'):
            self.conv2 = tfg.conv2d(self.pool1, 5, 1, 16, 'conv2', 'VALID','SIGMOID')
            print('conv_2: ', self.conv2.get_shape())
        
        with tf.name_scope('pool_2'):
            self.pool2 = tfg.avg_pool(self.conv2, 2, 2, 'pool2', 'VALID')
            print('pool_2: ', self.pool2.get_shape())   
            
        with tf.name_scope('conv_3'):
            self.conv3 = tfg.conv2d(self.pool2, 5, 1, 120, 'conv3', 'VALID','TANH')
            print('conv_3:', self.conv3.get_shape())
    
        with tf.name_scope('flat_1'):
            self.flat1, self.flat_dim = tfg.flatten(self.conv3)
            print('flat_1:', self.flat1.get_shape())
        
        with tf.name_scope('fc_2'):
            self.fc2 = tfg.fc_layer(self.flat1, 120, 84, 'fc2')
            print('fc_2 ', self.fc2.get_shape())
        
        with tf.name_scope('fc_3'):
            self.fc3 = tfg.fc_layer(self.fc2, 84, self.n_class, 'fc3')
            print('fc_3: ', self.fc3.get_shape())

        with tf.name_scope('drop_out'):
            self.drop1 = tfg.drop_out(self.fc3, self.drop_rate, 'drop_out')
            print('drop_out: ', self.drop1.get_shape())

        with tf.name_scope('prediction'):
            self.prediction = tf.nn.softmax(self.drop1)
            print('prediction: ', self.prediction.get_shape())
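
To see what the normalization step in _build_net does, here is a NumPy sketch of L2 normalization along one axis. It is my own re-implementation for illustration, not TensorFlow's code, and the epsilon value is an assumption that mirrors the usual guard against division by zero.

```python
import numpy as np

def l2_normalize(x, axis, epsilon=1e-12):
    """NumPy sketch of L2 normalization along one axis."""
    square_sum = np.sum(np.square(x), axis=axis, keepdims=True)
    return x / np.sqrt(np.maximum(square_sum, epsilon))

# A fake batch of two 32x32 RGB images with pixel values in 0..255.
batch = np.random.RandomState(0).randint(0, 256, size=(2, 32, 32, 3)).astype(np.float32)
normed = l2_normalize(batch, axis=1)

# Every slice taken along axis 1 now has unit Euclidean length.
norms = np.sqrt(np.sum(np.square(normed), axis=1))
print(normed.shape, norms.min(), norms.max())
```

Note that this rescales every slice of 32 values to length 1, so non-negative pixel values end up in the range 0 to 1. It is a different scheme from the one in the paper, which shifts and scales the inputs so that their mean is roughly 0 and their variance roughly 1.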

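IV. Training and testing

Before training starts, we define the hyperparameters.
Here we define 6 hyperparameters:

  • the number of training iterations, epoch
  • the number of images per training batch, batch_size
  • the number of images per test batch, test_size
  • the initial learning rate, lr
  • the keep probability for the dropout layer, keep_prob
  • whether to enable image augmentation of the training data, augument: True turns it on, False turns it off. It is on by default.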
FLAGS = tf.flags.FLAGS
try:
    # 'f' absorbs the -f argument that a Jupyter kernel passes on the command line
    tf.flags.DEFINE_string('f', '', 'kernel')
    # hyperparameters
    tf.flags.DEFINE_integer('epoch', 30000, 'epoch')
    tf.flags.DEFINE_integer('batch_size',200, 'batch size')
    tf.flags.DEFINE_integer('test_size', 200, 'test size')
    tf.flags.DEFINE_float('lr', 0.01, 'learning rate')
    tf.flags.DEFINE_float('keep_prob', 0.8, 'keep prob for dropout layer')
    tf.flags.DEFINE_boolean('augument', True, 'if image augmentation is applied')
    # other parameters
    tf.flags.DEFINE_integer('ckpt_frequency', 250, 'frequency to save checkpoint')
    tf.flags.DEFINE_boolean('restore', False, 'restore from checkpoint and run test')
    print('parameters were defined.')
except Exception:
    # defining a flag a second time (e.g. when re-running this cell) raises an error
    print('parameters have been defined.')

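Step 1: define the inputs. Data is supplied by feeding, so we need a placeholder called x with the same structure as the input data, plus a placeholder y_ for the labels.
Note that LeNet5 was originally applied to monochrome images, whereas our data is RGB color, so the input has 3 channels rather than one. We therefore change the shape of the input x from the original [None, 32,32,1] to [None, 32,32,3]. As for how well training works, let's wait and see.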

with tf.name_scope('input'):
    x = tf.placeholder(tf.float32, [None, 32,32,3], name='x_input')
    y_ = tf.placeholder(tf.int64, [None], name='labels')

Step 2: create the LeNet5 object, with x as its input and y as its output, i.e. the prediction.

with tf.name_scope('prediction'):
    le_net5 = LeNet5(x)
    y = le_net5.prediction

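Step 3: compute the cross entropy as the loss function.

with tf.name_scope('cross_entropy'):
    # This op applies softmax internally, so it is given the pre-softmax
    # output (le_net5.drop1) rather than the softmax output y.
    cross_entropy = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=le_net5.drop1,
        labels=y_,
        name="cross_entropy_per_example"))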
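
For readers who want to see what this loss computes, here is a NumPy sketch of sparse softmax cross entropy. It is an illustration written for this article, not TensorFlow's implementation. The TensorFlow op expects unnormalized scores (logits) and applies softmax itself, which the sketch mirrors.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Per-example cross entropy from raw (pre-softmax) scores and integer labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # pick the log-probability of the true class for every example
    return -log_probs[np.arange(len(labels)), labels]

logits = np.array([[2.0, 0.0, 0.0],    # favors class 0, and the label is 0
                   [0.0, 0.0, 0.0]])   # no preference among the 3 classes
labels = np.array([0, 2])
losses = sparse_softmax_cross_entropy(logits, labels)
print(losses, losses.mean())
```

The second example has uniform scores, so its loss is log(3); the first example scores the correct class higher and gets a smaller loss.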

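Step 4: optimize with AdagradOptimizer. This algorithm adapts the learning rate automatically. You only set a global learning rate ϵ, but that is not the actual rate: the actual step size for each parameter is inversely proportional to the square root of the sum of the squares of its past gradients.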

with tf.name_scope('train_step'):
    train_step = tf.train.AdagradOptimizer(FLAGS.lr).minimize(cross_entropy)
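
The Adagrad rule from step 4 can be sketched in a few lines of NumPy. This is the textbook form of the update on a toy loss. The eps constant and the zero initial accumulator are my assumptions for the sketch, and TensorFlow's optimizer may differ in such details.

```python
import numpy as np

def adagrad_step(param, grad, accum, lr=0.01, eps=1e-8):
    """One Adagrad update; returns the new parameter and the new accumulator."""
    accum = accum + grad ** 2                          # running sum of squared gradients
    param = param - lr * grad / (np.sqrt(accum) + eps)
    return param, accum

w = np.array([1.0, -2.0])
accum = np.zeros_like(w)
steps = []
for _ in range(3):
    grad = 2 * w                                       # gradient of the toy loss sum(w**2)
    new_w, accum = adagrad_step(w, grad, accum)
    steps.append(np.abs(new_w - w))                    # how far each parameter moved
    w = new_w
print(steps)
```

Even with a fixed global learning rate, the per-parameter step shrinks from one iteration to the next as the accumulator grows.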

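Step 5: compute the training accuracy.
argmax returns the index of the largest value along the given dimension. y is an Nx10 tensor: the first dimension is the number of training samples in the batch, and the second is the softmax output, 10 probabilities between 0 and 1. We take argmax over the second dimension (index 1) to turn the softmax output into a value from 0 to 9, then use equal to compare it with the labels y_: equal entries give True (1) and unequal entries give False (0). The mean of this result is the fraction of correct predictions.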

prediction =tf.argmax(y, 1)
accuracy = tf.reduce_mean(tf.cast( tf.equal(prediction,y_), tf.float32))
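
The same argmax / equal / mean chain can be tried out in NumPy on a tiny hand-made batch. The probabilities and labels below are invented for illustration.

```python
import numpy as np

# 4 samples, 3 classes: each row is a softmax-style probability vector
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.2, 0.5, 0.3]])
labels = np.array([1, 0, 1, 1])

predicted = np.argmax(probs, axis=1)                          # class index per sample
accuracy_value = np.mean((predicted == labels).astype(np.float32))
print(predicted, accuracy_value)                              # 3 of 4 are correct
```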

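This completes the training graph. Next we read the data and start a session to run it.
During training, after every 250 iterations we save the model and run a test. No optimization is done during testing and no units are dropped. Because the test set is fairly large, we again process it in batches, and print the test accuracy.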

import os
import time

# assumed checkpoint directory; its definition is not shown in the original
ckpt_dir = './ckpt/'
os.makedirs(ckpt_dir, exist_ok=True)

data = cifar10()
if __name__ == '__main__':
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver = tf.train.Saver(max_to_keep=1)
        for i in range(FLAGS.epoch):
            train_image, train_label, _ = data.get_train_batch(FLAGS.batch_size, FLAGS.augument)
            loss, _, accuracy_rate = sess.run([cross_entropy, train_step, accuracy],
                                              feed_dict={x: train_image, y_: train_label})
            if (i+1) % FLAGS.ckpt_frequency == 0:  # save the model and run a test
                saver.save(sess, ckpt_dir+'cifar10_'+str(i+1)+'.ckpt', global_step=i+1)
                acc_accuracy = 0
                n_test_batches = int(10000/FLAGS.test_size)
                for j in range(n_test_batches):
                    test_image, test_label, test_index = data.get_test_batch(FLAGS.test_size)
                    # the graph has no keep_prob placeholder (drop_rate is fixed when
                    # LeNet5 is built), so only x and y_ are fed
                    accuracy_rate, output = sess.run([accuracy, prediction],
                                                     feed_dict={x: test_image, y_: test_label})
                    acc_accuracy += accuracy_rate
                # average the per-batch accuracies over the whole test set
                accuracy_rate = acc_accuracy/n_test_batches
                print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) + ' iter ' + str(i) + ', Test accuracy:' +str(round(accuracy_rate*100,2))+'%')
    tf.reset_default_graph()

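Let's look at the training results. Three independent training runs were made; after 20000 iterations the accuracy was only 36% and had leveled off. This result is clearly far from ideal. Next time, we will look at how to strengthen Le-Net5.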
[Figure 2: training results]

Appendix

1. Understanding the structure of LeNet5 in depth

See the post "Lenet5设计理解——咬文嚼字系列" by EVA HUA.
https://blog.csdn.net/whatwho_518/article/details/79724602

2. General-purpose Tensorflow helper functions

import tensorflow as tf

def get_variable(name, shape, initializer, regularizer=None, dtype='float', trainable=True):
    collections = [tf.GraphKeys.GLOBAL_VARIABLES]
    return tf.get_variable(name,
                           shape=shape,
                           initializer=initializer,
                           regularizer=regularizer,
                           collections=collections,
                           dtype=dtype,
                           trainable=trainable)
    #tf.get_variable_scope().reuse_variables()

def conv2d(x, ksize, stride, filter_out, name, padding='VALID', activate = 'RELU'):
    """ 
    x: input 
    ksize: kernel size 
    stride
    filter_out: filters numbers
    name: name of the calculation
    padding: VALID - no padding, SAME - keep the output size same as input size
    activate: RELU - relu or SIGMOID  -sigmoid, TANH - tanh
    """
    with tf.variable_scope(name):
        #Get input dimension
        filter_in = x.get_shape()[-1]
        
        stddev = 1. / tf.sqrt(tf.cast(filter_out, tf.float32))
        #use random uniform to initialize weight
        weight_initializer = tf.random_uniform_initializer(minval=-stddev, maxval=stddev, dtype=tf.float32)
        #use random uniform to initialize bias
        bias_initializer = tf.random_uniform_initializer(minval=-stddev, maxval=stddev, dtype=tf.float32)
        #kernel shape is [kernel size, kernel size, filter in size, filter out size]
        shape = [ksize, ksize, filter_in, filter_out]
        #set kernel
        kernel = get_variable('kernel', shape, weight_initializer)
        #set bias, bias shape is [filter_out]
        bias = get_variable('bias', [filter_out], bias_initializer)
        #conv2d
        conv = tf.nn.conv2d(x, kernel, [1, stride, stride, 1], padding=padding)
        #add conv result with bias
        out = tf.nn.bias_add(conv, bias)
        #activate           
        if activate == 'SIGMOID':
            out = tf.nn.sigmoid(out)
        elif activate == 'TANH':
            out = tf.nn.tanh(out)
        else:
            out = tf.nn.relu(out)
        return out
    
    
def max_pool(x, ksize, stride, name, padding):
    """ x: input
        ksize: kernel size
        stride: stride
        name: name of the calculation
        padding: VALID - no padding, SAME - keep the output size same as input size
    """
    return tf.nn.max_pool(x, [1, ksize, ksize, 1], [1, stride, stride, 1], name=name, padding=padding)

def avg_pool(x, ksize, stride, name, padding):
    """ average pool
        x: input
        ksize: kernel size
        stride: stride
        name: name of the calculation
        padding: VALID - no padding, SAME - keep the output size same as input size
    """    
    return tf.nn.avg_pool(x, [1, ksize, ksize, 1],[1, stride, stride, 1], name=name, padding=padding)

def flatten(x):
    """Flatten x to shape [batch, dim]; returns the flattened tensor and dim
    """    
    shape = x.get_shape().as_list()
    dim = 1
    for i in range(1, len(shape)):
        dim *= shape[i]
    return tf.reshape(x, [-1, dim]), dim

def fc_layer(x, i_size, o_size, name, activate = 'NONE'):
    """Fully connected layer
        x: input
        i_size: input size
        o_size: output size
        name: name of the calculation
        activate: RELU - relu, SIGMOID - sigmoid, TANH - tanh, NONE - no activation
    """
    with tf.variable_scope(name) as scope:
        w = tf.get_variable('w', shape=[i_size, o_size], dtype='float')
        b = tf.get_variable('b', shape=[o_size], dtype='float')
        out = tf.nn.xw_plus_b(x, w, b, name=scope.name)
        
        #activate
        if activate == 'SIGMOID':
            out = tf.nn.sigmoid(out)
        elif activate == 'TANH':
            out = tf.nn.tanh(out)
        elif activate == 'RELU':
            out = tf.nn.relu(out)
    
        return out

def drop_out(x, rate, name):
    """drop out to prevent overfitting; it should only be used in training, not in testing
        x: input
        rate: probability of dropping each element
        name: name of the calculation
        
    """
    return tf.nn.dropout(x, rate=rate, name=name)

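3. Reading the CIFAR-10 data in Python format

The cifar10 class wraps reading the data files, image augmentation, and returning batches in random order.
Image augmentation comprises two functions: randomly shifting the color tone and randomly flipping the image. When fetching a batch, it is turned on or off by setting the augument parameter to True or False.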

import pickle
import numpy as np
import random
import tensorflow as tf

def load(file_name):
    with open(file_name, 'rb') as fo:
        data = pickle.load(fo, encoding='bytes')
        return data  

class cifar10(object): 
    def __init__(self):
        self.train_indexs = list()
        self.test_indexs = list()
        self.train_images, self.train_labels = self._get_train()
        self.test_images, self.test_labels = self._get_test()
        self.label_dic = {0:'aircraft', 1:'car',2:'bird',3:'cat',4:'deer',5:'dog',6:'frog',7:'horse',8:'ship',9:'truck'}

        
    def _get_train(self):
        train_labels = []        
        data1 = load('cifar-10-batches-py/data_batch_1')
        x1 = np.array(data1[b'data'])
        #x1 = x1.reshape(-1, 3, 32, 32).transpose(0,3,2,1).reshape(-1,32*32*3)
        y1 = data1[b'labels']
        train_data = np.array(x1)
        train_labels = np.array(y1)
        
        data2 = load('cifar-10-batches-py/data_batch_2')
        x2 = np.array(data2[b'data'])
        #x2 = x2.reshape(-1, 3, 32, 32).transpose(0,3,2,1).reshape(-1,32*32*3)
        y2 = data2[b'labels']
        train_data = np.append(train_data, x2)
        train_labels = np.append(train_labels, y2)

        data3 = load('cifar-10-batches-py/data_batch_3')
        x3 = np.array(data3[b'data'])
        #x3 = x3.reshape(-1, 3, 32, 32).transpose(0,3,2,1).reshape(-1,32*32*3)
        y3 = np.array(data3[b'labels']).reshape(10000)
        train_data = np.append(train_data, x3)
        train_labels = np.append(train_labels, y3)

        data4 = load('cifar-10-batches-py/data_batch_4')
        x4 = np.array(data4[b'data'])
        #x4 = x4.reshape(-1, 3, 32, 32).transpose(0,3,2,1).reshape(-1,32*32*3)
        y4 = np.array(data4[b'labels']).reshape(10000)
        train_data = np.append(train_data, x4)
        train_labels = np.append(train_labels, y4)
        
        data5 = load('cifar-10-batches-py/data_batch_5')
        x5 = np.array(data5[b'data'])
        #x5 = x5.reshape(-1, 3, 32, 32).transpose(0,3,2,1).reshape(-1,32*32*3)
        y5 = np.array(data5[b'labels']).reshape(10000)
        train_data = np.append(train_data, x5)
        train_labels = np.append(train_labels, y5)
        
        train_data = train_data.reshape(-1, 3, 32, 32).transpose(0,3,2,1).reshape(-1,32,32,3)
        train_labels = train_labels.astype(np.int64)
        #train_data, train_labels= self._append_distort_images(train_data, train_labels)
        
        #for item in labels:
        #    train_labels.append(item)
        #print('image shape:',np.shape(train_data))
        #print('label shape:',np.shape(train_labels))  
        # fail loudly if the number of images and labels differ
        assert len(train_data) == len(train_labels), \
            'train images ' + str(len(train_data)) + ' does not equal train labels ' + str(len(train_labels))
            
        print('train set length: '+str(len(train_data)))
        return train_data, train_labels
 
    def _get_test(self):
        test_labels = list()
        data1 = load('cifar-10-batches-py/test_batch')
        x = np.array(data1[b'data'])
        x = x.reshape(-1, 3, 32, 32).transpose(0,3,2,1).reshape(-1,32,32,3)
        y = data1[b'labels']
        for item in y:
            test_labels.append(item)
        print('test image shape:',np.shape(x))
        print('test label shape:',np.shape(test_labels))        
        print('test set length: '+str(len(x)))
        return x, test_labels
    
    def _resize(self,image):
        # keep the full 32x32x3 image so that it matches the [None, 32,32,3] placeholder
        resized_image = np.ndarray.reshape(image,(32,32,3))
        #print(resized_image.shape)
        return resized_image
    
    def random_flipper(self,image):
        if random.random() < 0.5:
            # reverse the first axis; a reversed view leaves the stored image untouched
            image = image[::-1]
        return image
        
    def image_distort(self,image):
        
        return image
    
    def random_bright(self, image, delta=32):
        if random.random() < 0.5:
            # one random shift per color channel (R, G, B)
            deltas = np.array([int(random.uniform(-delta, delta)) for _ in range(3)], dtype=np.int16)
            # widen to int16 before adding so that uint8 values cannot wrap around,
            # then clip back into the valid pixel range
            image = np.clip(image.astype(np.int16) + deltas, 0, 255).astype(np.uint8)
        return image
   
    def get_train_batch(self,batch_size=128, augument = True):
        batch_image = list()
        batch_label = list()
        data_index = list()
        i = 0
        while i < batch_size:
            index = random.randint(0, len(self.train_images)-1)
            if not index in self.train_indexs:
                i += 1
                d = self.train_images[index]
                if augument:
                    d = self.random_bright(self.random_flipper(d))
                batch_image.append(self._resize(d))
                batch_label.append(self.train_labels[index])
                self.train_indexs.append(index)
                data_index.append(index)
                if len(self.train_indexs) >=  len(self.train_images):
                    self.train_indexs.clear()
        return batch_image, batch_label, data_index
        
    def get_test_batch(self,batch_size=10000):
        batch_image = list()
        batch_label = list()
        data_index = list()
        i = 0
        while i < batch_size:
            index = random.randint(0, len(self.test_images)-1)
            if not index in self.test_indexs:
                i += 1
                d = self.test_images[index]
                batch_image.append(self._resize(d)) 
                batch_label.append(self.test_labels[index])
                self.test_indexs.append(index)
                data_index.append(index)
                if len(self.test_indexs) >=  len(self.test_images):
                    self.test_indexs.clear()
        return batch_image, batch_label,data_index
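
The batch methods above draw random indices and skip any that were already used, which gets slower as the list of used indices grows. One alternative that also visits each sample at most once per pass is to shuffle the indices once and slice batches off the shuffled order. The sketch below shows that idea; EpochSampler is a hypothetical helper and is not part of the cifar10 class.

```python
import numpy as np

class EpochSampler(object):
    """Hypothetical helper: hands out index batches from a shuffled order."""
    def __init__(self, n_items, seed=None):
        self.n_items = n_items
        self.rng = np.random.RandomState(seed)
        self.order = self.rng.permutation(n_items)
        self.pos = 0

    def next_batch(self, batch_size):
        if self.pos + batch_size > self.n_items:    # not enough left: reshuffle
            self.order = self.rng.permutation(self.n_items)
            self.pos = 0
        batch = self.order[self.pos:self.pos + batch_size]
        self.pos += batch_size
        return batch

sampler = EpochSampler(n_items=10, seed=0)
first_epoch = np.concatenate([sampler.next_batch(5), sampler.next_batch(5)])
print(sorted(first_epoch.tolist()))
```

The returned indices can be used directly on a numpy array, e.g. images[batch]. When the batch size does not divide the data size evenly, this sketch simply drops the leftover samples of that pass.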
