LeNet-5 was proposed by Yann LeCun in the 1998 paper Gradient-Based Learning Applied to Document Recognition, where it was used to recognize MNIST handwritten digits. Here we adapt this network to train on and classify the CIFAR-10 dataset.
The biggest difference is that the MNIST digits LeNet-5 was designed for are 32x32 single-channel images (28x28 digits padded to 32x32), while CIFAR-10 images are 32x32 RGB images. How well will a LeNet-5 style network train on them? Let's find out.
The basic structure of the model is shown in the figure below. The original paper contains a great deal of detail; since our goal here is to learn how to use TensorFlow rather than to study the model itself, some of those details are omitted and a simplified model is used.
The input is a 32x32 image (single-channel in the original paper; 3-channel RGB here).
Layer 1 is a convolutional layer: 5x5 filters, stride 1, 3 input channels, 6 output channels, producing a 28x28x6 output; the activation function is sigmoid.
Layer 2 is a pooling layer: 2x2 filter, stride 2, producing a 14x14x6 output; the activation function is sigmoid.
Layer 3 is another convolutional layer: 5x5 filters, stride 1, 6 input channels, 16 output channels, producing a 10x10x16 output; the activation function is sigmoid. (In the original paper this layer is not fully connected to the previous feature maps but uses an asymmetric connection scheme; because that is complicated to implement, full connection is used here.)
Layer 4 is a pooling layer: 2x2 filter, stride 2, producing a 5x5x16 output; the activation function is sigmoid.
Layer 5 is again a convolutional layer: 5x5 filters, stride 1, 16 input channels, 120 output channels, producing a 1x1x120 output; the activation function is tanh.
Next comes a flatten layer; based on the previous layer's output, its size is 120 (= 1x1x120). Since it only reshapes the data without changing it, it is generally not counted as a separate layer.
Layer 6 is a fully connected layer; it takes the 120 flattened values as input and has 84 nodes.
The last layer is a fully connected layer with 10 nodes whose output is passed through softmax to produce the prediction. (In the original paper this layer uses RBF units rather than softmax.) The arithmetic behind the layer sizes above is sketched below.
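If you want to verify those sizes yourself, a VALID convolution or pooling with kernel k and stride s maps an input of size n to (n - k) / s + 1. A minimal sketch (the valid_out helper is only for illustration, it is not part of the model code):

def valid_out(size, kernel, stride):
    return (size - kernel) // stride + 1

s = 32                   # 32x32 input
s = valid_out(s, 5, 1)   # conv1 5x5, stride 1 -> 28   (28x28x6)
s = valid_out(s, 2, 2)   # pool1 2x2, stride 2 -> 14   (14x14x6)
s = valid_out(s, 5, 1)   # conv2 5x5, stride 1 -> 10   (10x10x16)
s = valid_out(s, 2, 2)   # pool2 2x2, stride 2 -> 5    (5x5x16)
s = valid_out(s, 5, 1)   # conv3 5x5, stride 1 -> 1    (1x1x120)
print(s)                 # prints 1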
The CIFAR-10 dataset contains 60,000 32x32 color images in 10 classes, 6,000 images per class. 50,000 of them form the training set and the other 10,000 form the test set.
The data is available in three formats: python, matlab and binary. Here we use the python format. To read it, import the pickle module and call pickle.load; what comes back is a dictionary.
Its data entry is a 10000x3072 numpy array with dtype uint8. In each image's row, the first 1024 bytes are the red channel, the middle 1024 bytes the green channel, and the last 1024 bytes the blue channel.
Its labels entry is a list of 10000 numbers in the range 0-9, corresponding to the 10 classes.
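A minimal sketch of reading one training batch file in the python format (assuming the extracted files sit in a cifar-10-batches-py/ directory):

import pickle
import numpy as np

with open('cifar-10-batches-py/data_batch_1', 'rb') as fo:
    batch = pickle.load(fo, encoding='bytes')

data = batch[b'data']        # uint8 array of shape (10000, 3072)
labels = batch[b'labels']    # list of 10000 integers in the range 0-9
# each row is 1024 R bytes, then 1024 G bytes, then 1024 B bytes;
# one common way to rearrange it into (height, width, channel) images:
images = np.array(data).reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
print(images.shape)          # (10000, 32, 32, 3)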
More information about the CIFAR-10 dataset can be found on Alex Krizhevsky's website: http://www.cs.toronto.edu/~kriz/cifar.html
We wrap the network structure in a LeNet5 class.
The paper mentions that the input is normalized so that the values fall roughly between -0.1 and 1.175, which speeds up training. We therefore also normalize the input here, using TensorFlow's l2_normalize function.
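For comparison, the paper's scaling of 0-255 pixel values to roughly the range -0.1 to 1.175 can be reproduced with a simple linear map; this snippet only illustrates that scaling (x_raw is a hypothetical raw-pixel placeholder), while the LeNet5 class below uses tf.nn.l2_normalize instead:

import tensorflow as tf

x_raw = tf.placeholder(tf.uint8, [None, 32, 32, 3])             # hypothetical raw-pixel input
x_scaled = tf.cast(x_raw, tf.float32) / 255.0 * 1.275 - 0.1     # 0 -> -0.1, 255 -> 1.175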
To keep the code readable, the implementations of the convolution, pooling and fully connected layers are wrapped in a module called tf_general; its code is given in the appendix at the end of this article.
Also note that, as the paper describes, LeNet-5 uses sigmoid as its activation function rather than the now far more common ReLU.
The code is as follows:
import tensorflow as tf
import tf_general as tfg

class LeNet5(object):
    def __init__(self, x, n_class=10, drop_rate=0):
        self.input = x
        self.n_class = n_class
        self.drop_rate = drop_rate
        self._build_net()

    def _build_net(self):
        with tf.name_scope('norm'):
            self.x_norm = tf.nn.l2_normalize(tf.cast(self.input, tf.float32), axis=1)
        with tf.name_scope('conv_1'):
            self.conv1 = tfg.conv2d(self.x_norm, 5, 1, 6, 'conv1', 'VALID', 'SIGMOID')
            print('conv_1: ', self.conv1.get_shape())
        with tf.name_scope('pool_1'):
            self.pool1 = tfg.avg_pool(self.conv1, 2, 2, 'pool1', 'VALID')
            print('pool_1: ', self.pool1.get_shape())
        with tf.name_scope('conv_2'):
            self.conv2 = tfg.conv2d(self.pool1, 5, 1, 16, 'conv2', 'VALID', 'SIGMOID')
            print('conv_2: ', self.conv2.get_shape())
        with tf.name_scope('pool_2'):
            self.pool2 = tfg.avg_pool(self.conv2, 2, 2, 'pool2', 'VALID')
            print('pool_2: ', self.pool2.get_shape())
        with tf.name_scope('conv_3'):
            self.conv3 = tfg.conv2d(self.pool2, 5, 1, 120, 'conv3', 'VALID', 'TANH')
            print('conv_3:', self.conv3.get_shape())
        with tf.name_scope('flat_1'):
            self.flat1, self.flat_dim = tfg.flatten(self.conv3)
            print('flat_1:', self.flat1.get_shape())
        with tf.name_scope('fc_2'):
            self.fc2 = tfg.fc_layer(self.flat1, 120, 84, 'fc2')
            print('fc_2: ', self.fc2.get_shape())
        with tf.name_scope('fc_3'):
            self.fc3 = tfg.fc_layer(self.fc2, 84, 10, 'fc3')
            print('fc_3: ', self.fc3.get_shape())
        with tf.name_scope('drop_out'):
            self.drop1 = tfg.drop_out(self.fc3, self.drop_rate, 'drop_out')
            print('drop_out: ', self.drop1.get_shape())
        with tf.name_scope('prediction'):
            self.prediction = tf.nn.softmax(self.drop1)
            print('prediction: ', self.prediction.get_shape())
Before training starts we need to define the hyperparameters.
Here we define six hyperparameters, plus a couple of auxiliary parameters.
FLAGS = tf.flags.FLAGS
try:
    tf.flags.DEFINE_string('f', '', 'kernel')
    # hyperparameters
    tf.flags.DEFINE_integer('epoch', 30000, 'epoch')
    tf.flags.DEFINE_integer('batch_size', 200, 'batch size')
    tf.flags.DEFINE_integer('test_size', 200, 'test size')
    tf.flags.DEFINE_float('lr', 0.01, 'learning rate')
    tf.flags.DEFINE_float('keep_prob', 0.8, 'keep prob for drop layer')
    tf.flags.DEFINE_boolean('augument', True, 'if image augmentation is applied')
    # other parameters
    tf.flags.DEFINE_float('ckpt_frequency', 250, 'frequency to save checkpoint')
    tf.flags.DEFINE_boolean('restore', False, 'restore from checkpoint and run test')
    print('parameters were defined.')
except:
    print('parameters have been defined.')
Step 1: define the input. The data is fed in through feed_dict, so we need a placeholder x with the same structure as the input data, and another placeholder y_ for the labels.
Note that LeNet-5 was originally used on single-channel images, while our data is RGB, i.e. 3 input channels instead of 1. We therefore change the shape of x from [None, 32, 32, 1] to [None, 32, 32, 3]. How well this trains remains to be seen.
with tf.name_scope('input'):
    x = tf.placeholder(tf.float32, [None, 32, 32, 3], name='x_input')
    y_ = tf.placeholder(tf.int64, [None], name='labels')
Step 2: create the LeNet5 object, with x as its input and y as its output, i.e. the prediction.
with tf.name_scope('prediction'):
    le_net5 = LeNet5(x)
    y = le_net5.prediction
Step 3: compute the cross entropy as the loss function.
with tf.name_scope('cross_entropy'):
    cross_entropy = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y,
                                                                                  labels=y_,
                                                                                  name="cross_entropy_per_example"))
Step 4: optimize with AdagradOptimizer. This algorithm adapts the learning rate on its own; you only set a global learning rate ϵ, but that is not the rate actually applied: the effective rate for each parameter is inversely proportional to the square root of the accumulated sum of its past squared gradients. A toy sketch of this update rule is given after the code below.
with tf.name_scope('train_step'):
    train_step = tf.train.AdagradOptimizer(FLAGS.lr).minimize(cross_entropy)
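A toy numpy sketch of the Adagrad update rule mentioned above (for illustration only; tf.train.AdagradOptimizer does this internally):

import numpy as np

lr, eps = 0.01, 1e-8
w = np.zeros(3)                 # parameters (toy example)
g_acc = np.zeros(3)             # accumulated squared gradients, one entry per parameter

for step in range(100):
    grad = np.array([0.5, 0.1, -0.3])             # pretend this came from back-propagation
    g_acc += grad ** 2                            # accumulate squared gradients
    w -= lr * grad / (np.sqrt(g_acc) + eps)       # the effective rate shrinks per parameter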
Step 5: compute the training accuracy.
The argmax method returns the index of the largest value along the specified dimension. y is an Nx10 tensor: the first dimension is the number of examples in the batch and the second holds the softmax output, 10 probabilities between 0 and 1. Taking argmax over the second dimension (index 1) turns the softmax output into a class number from 0 to 9. We then compare it with the label y_ using equal, which yields True (1) for a match and False (0) otherwise; taking the mean of this result gives the fraction of correct predictions. A small numpy illustration follows the code below.
prediction = tf.argmax(y, 1)
accuracy = tf.reduce_mean(tf.cast(tf.equal(prediction, y_), tf.float32))
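A quick numpy illustration of the same argmax / equal / mean chain:

import numpy as np

y_example = np.array([[0.1, 0.7, 0.2],            # predicted class 1
                      [0.8, 0.1, 0.1]])           # predicted class 0
labels_example = np.array([1, 2])                 # true classes
pred = np.argmax(y_example, axis=1)               # [1, 0]
acc = np.mean((pred == labels_example).astype(np.float32))
print(acc)                                        # 0.5: one of the two predictions is correct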
At this point the whole training graph is complete; next we read the data, start a session and run the computation.
During training, every 250 iterations we save the model and run a test pass; during testing no optimization is performed and no parameters are dropped. Since the test set is fairly large, we again process it in batches and then print the test accuracy.
import time

data = cifar10()      # the cifar10 class is given in the appendix below
ckpt_dir = 'ckpt/'    # directory for saving checkpoints; change this to your own path

if __name__ == '__main__':
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver = tf.train.Saver(max_to_keep=1)
        for i in range(FLAGS.epoch):
            train_image, train_label, _ = data.get_train_batch(FLAGS.batch_size)
            loss, _, accuracy_rate = sess.run([cross_entropy, train_step, accuracy],
                                              feed_dict={x: train_image, y_: train_label})
            if (i+1) % FLAGS.ckpt_frequency == 0:  # save the model and run a test pass
                saver.save(sess, ckpt_dir+'cifar10_'+str(i+1)+'.ckpt', global_step=i+1)
                acc_accuracy = 0
                for j in range(int(10000/FLAGS.test_size)):
                    test_image, test_label, test_index = data.get_test_batch(FLAGS.test_size)
                    accuracy_rate, output = sess.run([accuracy, prediction],
                                                     feed_dict={x: test_image, y_: test_label})
                    acc_accuracy += accuracy_rate
                accuracy_rate = acc_accuracy/10000*FLAGS.test_size
                print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()) + ' iter ' + str(i+1) +
                      ', Test accuracy: ' + str(round(accuracy_rate*100, 2)) + '%')
    tf.reset_default_graph()
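As an aside, the restore flag defined earlier is not used in the loop above. If you want to load the most recent checkpoint and run only a test pass, a minimal sketch (assuming ckpt_dir still points at the directory used for saving) could look like this:

if FLAGS.restore:
    with tf.Session() as sess:
        saver = tf.train.Saver()
        ckpt = tf.train.latest_checkpoint(ckpt_dir)     # path of the newest checkpoint
        saver.restore(sess, ckpt)
        test_image, test_label, _ = data.get_test_batch(FLAGS.test_size)
        acc = sess.run(accuracy, feed_dict={x: test_image, y_: test_label})
        print('Test accuracy: ' + str(round(acc * 100, 2)) + '%')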
Let's see how the training went. Three independent runs were performed; after 20,000 iterations the accuracy is only about 36% and has flattened out. This is clearly far from ideal. Next time we will look at how to strengthen LeNet-5.
Reference: EVA HUA, "Lenet5设计理解——咬文嚼字系列".
https://blog.csdn.net/whatwho_518/article/details/79724602
Appendix: the code of the tf_general module used above.
import tensorflow as tf

def get_variable(name, shape, initializer, regularizer=None, dtype='float', trainable=True):
    collections = [tf.GraphKeys.GLOBAL_VARIABLES]
    return tf.get_variable(name,
                           shape=shape,
                           initializer=initializer,
                           regularizer=regularizer,
                           collections=collections,
                           dtype=dtype,
                           trainable=trainable)
    #tf.get_variable_scope().reuse_variables()
def conv2d(x, ksize, stride, filter_out, name, padding='VALID', activate='RELU'):
    """
    x: input
    ksize: kernel size
    stride: stride
    filter_out: number of output filters
    name: name of the calculation
    padding: VALID - no padding, SAME - keep the output size same as input size
    activate: RELU - relu, SIGMOID - sigmoid, TANH - tanh
    """
    with tf.variable_scope(name):
        # get the input channel dimension
        filter_in = x.get_shape()[-1]
        stddev = 1. / tf.sqrt(tf.cast(filter_out, tf.float32))
        # use a random uniform distribution to initialize the weights
        weight_initializer = tf.random_uniform_initializer(minval=-stddev, maxval=stddev, dtype=tf.float32)
        # use a random uniform distribution to initialize the bias
        bias_initializer = tf.random_uniform_initializer(minval=-stddev, maxval=stddev, dtype=tf.float32)
        # kernel shape is [kernel size, kernel size, input channels, output channels]
        shape = [ksize, ksize, filter_in, filter_out]
        # create the kernel
        kernel = get_variable('kernel', shape, weight_initializer)
        # create the bias, its shape is [filter_out]
        bias = get_variable('bias', [filter_out], bias_initializer)
        # convolution
        conv = tf.nn.conv2d(x, kernel, [1, stride, stride, 1], padding=padding)
        # add the bias to the convolution result
        out = tf.nn.bias_add(conv, bias)
        # activation
        if activate == 'SIGMOID':
            out = tf.nn.sigmoid(out)
        elif activate == 'TANH':
            out = tf.nn.tanh(out)
        else:
            out = tf.nn.relu(out)
        return out
def max_pool(x, ksize, stride, name, padding):
    """ x: input
    ksize: kernel size
    stride: stride
    name: name of the calculation
    padding: VALID - no padding, SAME - keep the output size same as input size
    """
    return tf.nn.max_pool(x, [1, ksize, ksize, 1], [1, stride, stride, 1], name=name, padding=padding)

def avg_pool(x, ksize, stride, name, padding):
    """ average pooling
    x: input
    ksize: kernel size
    stride: stride
    name: name of the calculation
    padding: VALID - no padding, SAME - keep the output size same as input size
    """
    return tf.nn.avg_pool(x, [1, ksize, ksize, 1], [1, stride, stride, 1], name=name, padding=padding)
def flatten(x):
    """Reshape x into a flat vector (one dimension per example)
    """
    shape = x.get_shape().as_list()
    dim = 1
    for i in range(1, len(shape)):
        dim *= shape[i]
    return tf.reshape(x, [-1, dim]), dim
def fc_layer(x, i_size, o_size, name, activate='NONE'):
    """Fully connected layer
    x: input
    i_size: input size
    o_size: output size
    name: name of the calculation
    activate: RELU - relu, SIGMOID - sigmoid, TANH - tanh, NONE - no activation
    """
    with tf.variable_scope(name) as scope:
        w = tf.get_variable('w', shape=[i_size, o_size], dtype='float')
        b = tf.get_variable('b', shape=[o_size], dtype='float')
        out = tf.nn.xw_plus_b(x, w, b, name=scope.name)
        # activation
        if activate == 'SIGMOID':
            out = tf.nn.sigmoid(out)
        elif activate == 'TANH':
            out = tf.nn.tanh(out)
        elif activate == 'RELU':
            out = tf.nn.relu(out)
        return out
def drop_out(x, rate, name):
    """Dropout to prevent overfitting; it should only be used in training, not in testing
    x: input
    rate: probability of dropping a unit
    name: name of the calculation
    """
    return tf.nn.dropout(x, rate=rate, name=name)
The cifar10 class wraps up reading the data files, image augmentation and returning batches in random order.
The image augmentation consists of randomly shifting the per-channel brightness and randomly flipping the image; when fetching a batch it is switched on or off through the augument parameter (True or False).
import pickle
import numpy as np
import random
import tensorflow as tf

def load(file_name):
    with open(file_name, 'rb') as fo:
        data = pickle.load(fo, encoding='bytes')
    return data
class cifar10(object):
    def __init__(self):
        self.train_indexs = list()
        self.test_indexs = list()
        self.train_images, self.train_labels = self._get_train()
        self.test_images, self.test_labels = self._get_test()
        self.label_dic = {0: 'aircraft', 1: 'car', 2: 'bird', 3: 'cat', 4: 'deer',
                          5: 'dog', 6: 'frog', 7: 'horse', 8: 'ship', 9: 'truck'}
    def _get_train(self):
        train_data = None
        train_labels = None
        # read the five training batch files and concatenate them
        for batch_id in range(1, 6):
            batch = load('cifar-10-batches-py/data_batch_' + str(batch_id))
            x = np.array(batch[b'data'])
            y = np.array(batch[b'labels']).reshape(10000)
            if train_data is None:
                train_data = x
                train_labels = y
            else:
                train_data = np.append(train_data, x)
                train_labels = np.append(train_labels, y)
        # each row holds 1024 R bytes, 1024 G bytes and 1024 B bytes; rearrange into 32x32x3 images
        train_data = train_data.reshape(-1, 3, 32, 32).transpose(0, 3, 2, 1).reshape(-1, 32, 32, 3)
        train_labels = train_labels.astype(np.int64)
        if len(train_data) != len(train_labels):
            raise ValueError('train images ' + str(len(train_data)) + ' does not equal train labels ' + str(len(train_labels)))
        print('train set length: ' + str(len(train_data)))
        return train_data, train_labels
    def _get_test(self):
        test_labels = list()
        data1 = load('cifar-10-batches-py/test_batch')
        x = np.array(data1[b'data'])
        x = x.reshape(-1, 3, 32, 32).transpose(0, 3, 2, 1).reshape(-1, 32, 32, 3)
        y = data1[b'labels']
        for item in y:
            test_labels.append(item)
        print('test image shape:', np.shape(x))
        print('test label shape:', np.shape(test_labels))
        print('test set length: ' + str(len(x)))
        return x, test_labels
    def _resize(self, image):
        # keep the full 32x32x3 image so that it matches the [None, 32, 32, 3] placeholder
        resized_image = np.ndarray.reshape(image, (32, 32, 3))
        return resized_image
    def random_flipper(self, image):
        # flip the image vertically with probability 0.5 by swapping rows
        if random.random() < 0.5:
            swap_time = int(len(image)/2)
            for i in range(swap_time):
                image[[i, len(image)-i-1], :] = image[[len(image)-i-1, i], :]
        return image

    def image_distort(self, image):
        return image
    def random_bright(self, image, delta=32):
        # randomly shift the brightness of each channel by up to +/- delta
        if random.random() < 0.5:
            delta_r = int(random.uniform(-delta, delta))
            delta_g = int(random.uniform(-delta, delta))
            delta_b = int(random.uniform(-delta, delta))
            # channel-first and a signed type to avoid uint8 overflow when adding the deltas
            image = image.transpose(2, 1, 0).astype(np.int32)
            R = image[0] + delta_r
            G = image[1] + delta_g
            B = image[2] + delta_b
            image = np.asarray([R, G, B]).transpose(2, 1, 0)
            image = image.clip(min=0, max=255)
        return image
    def get_train_batch(self, batch_size=128, augument=True):
        batch_image = list()
        batch_label = list()
        data_index = list()
        i = 0
        # sample without repetition; reset once every training image has been used
        while i < batch_size:
            index = random.randint(0, len(self.train_images)-1)
            if not index in self.train_indexs:
                i += 1
                d = self.train_images[index]
                if augument:
                    d = self.random_bright(self.random_flipper(d))
                batch_image.append(self._resize(d))
                batch_label.append(self.train_labels[index])
                self.train_indexs.append(index)
                data_index.append(index)
            if len(self.train_indexs) >= len(self.train_images):
                self.train_indexs.clear()
        return batch_image, batch_label, data_index
    def get_test_batch(self, batch_size=10000):
        batch_image = list()
        batch_label = list()
        data_index = list()
        i = 0
        while i < batch_size:
            index = random.randint(0, len(self.test_images)-1)
            if not index in self.test_indexs:
                i += 1
                d = self.test_images[index]
                batch_image.append(self._resize(d))
                batch_label.append(self.test_labels[index])
                self.test_indexs.append(index)
                data_index.append(index)
            if len(self.test_indexs) >= len(self.test_images):
                self.test_indexs.clear()
        return batch_image, batch_label, data_index
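A minimal usage sketch of the cifar10 class (it assumes the extracted data files sit in a cifar-10-batches-py/ directory next to the script):

data = cifar10()
images, labels, indexes = data.get_train_batch(4, augument=True)
print(np.shape(images), labels)        # (4, 32, 32, 3) and 4 labels in the range 0-9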