TensorFlow MNIST CNN: Advanced Tutorial

2016-11-08 陈伟才 人工智能学堂

In the TensorFlow getting-started tutorial we trained on the MNIST dataset with a Softmax model: a few lines of code reached roughly 92% accuracy. In the previous article we also studied the more complex convolutional neural network (CNN). In this article we use a CNN to learn MNIST, and its accuracy improves considerably over softmax.



A brief recap of MNIST

The MNIST dataset consists of 28*28 grayscale images of the handwritten digits 0 through 9. In the Softmax implementation we learned only two sets of parameters, a weight matrix and a bias vector, which is exactly why the accuracy of the softmax model cannot be pushed much higher.


Weight = tf.Variable(tf.zeros([784, 10]))
bias = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, Weight) + bias)

Implementing a CNN for MNIST

The two most important operations in a CNN are convolution and pooling. Pooling was not covered in the earlier CNN article, so it is explained in detail where it is used below.

Initializing the weight parameters

CNN training involves a large number of weights and biases, so we abstract their initialization into two helper functions, as shown in the code below:

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

Weights are initialized with tf.truncated_normal, whose signature is tf.truncated_normal(shape, mean=0.0, stddev=1.0, dtype=tf.float32, seed=None, name=None). It returns a tensor whose shape is given by the shape argument and whose values are drawn from a truncated normal distribution; the default dtype is tf.float32. For example, weight_variable([5, 5, 1, 32]) returns a 5*5*1*32 tensor variable.

Initializing the bias is much simpler than the weights: it also returns a tensor of the given shape, initialized to the constant 0.1. bias_variable([32]) returns a vector of 32 elements, each equal to 0.1.
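As a quick sanity check (this snippet is an addition here, not part of the original script, and assumes the two helper functions above plus an imported tensorflow), we can create the two variables and inspect their static shapes with get_shape():

W = weight_variable([5, 5, 1, 32])   # 5x5 filters, 1 input channel, 32 output channels
b = bias_variable([32])              # one bias per output channel
print(W.get_shape())  # (5, 5, 1, 32)
print(b.get_shape())  # (32,)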

Convolution and pooling

Convolution and pooling are the two most important stages of a CNN: first convolve, then pool, with several such layers stacked one after another, i.e. convolution -> pooling -> convolution -> pooling -> convolution -> pooling -> ... So we also abstract the convolution and pooling operations into helper functions, as shown below:

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')

The conv2d function computes a 2-D convolution of the input with the filter W. It calls tf.nn.conv2d, whose signature is tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None). Both input and filter are 4-D tensors; strides is a 1-D list of integers with one entry per dimension of input, so it always contains exactly four elements. Here we use a stride of 1 in all four dimensions. padding selects the padding scheme and can be either "VALID" or "SAME".

The VALID and SAME padding schemes are described at https://www.tensorflow.org/versions/r0.11/api_docs/python/nn.html#convolution . With input size W, filter size F, and stride S:

1. padding = VALID

new_height = new_width = ceil((W - F + 1) / S)

2. padding = SAME

new_height = new_width = ceil(W / S)

The number of pixels that must be padded along the height is

pad_needed_height = (new_height - 1) * S + F - W

of which pad_top = floor(pad_needed_height / 2) pixels are added above the input matrix and pad_down = pad_needed_height - pad_top below it. The width is handled the same way: pad_needed_width = (new_width - 1) * S + F - W, pad_left = floor(pad_needed_width / 2), and pad_right = pad_needed_width - pad_left.
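To make the SAME-padding arithmetic concrete, here is a small sketch (added for illustration, not part of the original script) that reproduces the calculation for the first convolution in this article: a 28x28 input, a 5x5 filter, and stride 1.

import math

def same_padding(input_size, filter_size, stride):
    # output size and per-side padding for padding='SAME'
    out = int(math.ceil(input_size / float(stride)))
    pad_needed = max((out - 1) * stride + filter_size - input_size, 0)
    pad_before = pad_needed // 2
    pad_after = pad_needed - pad_before
    return out, pad_before, pad_after

print(same_padding(28, 5, 1))  # (28, 2, 2): output stays 28x28, 2 pixels padded on each side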

The max_pool_2x2 function wraps pooling, using max pooling (each output is the maximum value within its window). It calls tf.nn.max_pool, whose signature is tf.nn.max_pool(value, ksize, strides, padding, data_format='NHWC', name=None).
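As a small illustration (a sketch added here, not from the original script), max pooling a 4x4 single-channel input with a 2x2 window and stride 2 keeps the maximum of each 2x2 block and halves each spatial dimension:

import numpy as np
import tensorflow as tf

x = tf.constant(np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1))  # NHWC layout
pooled = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
with tf.Session() as sess:
    print(sess.run(pooled).reshape(2, 2))  # the 2x2 result: [[5, 7], [13, 15]]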

First convolutional layer

Now we can build the first convolution and pooling layer. The input is a 28*28*1 image (width 28, height 28, depth 1 because the images are grayscale). The first layer's filter is 5*5*1 and it outputs 32 channels, so its parameters are:

W_conv1 = weight_variable([5, 5, 1, 32])

b_conv1 = bias_variable([32])

Because the convolution expects a 4-D tensor as input, we first reshape the input image x into 4 dimensions:

x_image = tf.reshape(x, [-1, 28, 28, 1])

Next we convolve and pool x_image. The activation applied after the convolution is ReLU (rectified linear unit):

h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)

h_pool1 = max_pool_2x2(h_conv1)

Let's work out how large the output is after the first layer. Because we use SAME padding, the formulas above tell us to pad 2 pixels on each side, so the 28*28*1 input is effectively padded to 32*32*1.

The filter size is 5 with stride 1, so 32 - 5 + 1 = 28: the output of the first convolution is 28*28*32. After max pooling it becomes 14*14*32.
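We can confirm this directly from the graph; get_shape() reports the shape TensorFlow infers statically, where "?" is the batch dimension (this check is added here for illustration):

print(h_conv1.get_shape())  # (?, 28, 28, 32)
print(h_pool1.get_shape())  # (?, 14, 14, 32)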

Second convolutional layer

After the first layer the output has 32 channels. In the second layer we build deeper filters: each filter is 5*5 over the 32 input channels, and the layer outputs 64 channels, as shown below:

W_conv2 = weight_variable([5, 5, 32, 64])

b_conv2 = bias_variable([64])

We convolve and pool again in the same way:

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)

h_pool2 = max_pool_2x2(h_conv2)

Let's again compute the output shape after the second convolution and pooling. With SAME padding we once more pad 2 pixels on each side, so the 14*14*32 input to the second layer is padded to 18*18*32. With a filter size of 5 and stride 1, 18 - 5 + 1 = 14, so the second convolution outputs 14*14*64. After max pooling, the output shape is 7*7*64.
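Again, a quick check of the statically inferred shapes (added here for illustration) agrees with the arithmetic:

print(h_conv2.get_shape())  # (?, 14, 14, 64)
print(h_pool2.get_shape())  # (?, 7, 7, 64)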

Fully connected layer

The feature maps are now down to 7x7, and we add a fully connected layer with 1024 neurons to process the whole image. We reshape the pooling-layer output into a batch of vectors, multiply by a weight matrix, add a bias, and apply ReLU, as shown below:

W_fc1 = weight_variable([7 * 7 * 64, 1024])

b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])

h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

After the fully connected layer, the output for each image is a 1-D tensor with 1024 elements.

Dropout

To reduce overfitting, we apply dropout before the output layer. A placeholder holds the probability that a neuron's output is kept during dropout, so we can enable dropout during training and disable it during evaluation. TensorFlow's tf.nn.dropout not only masks neuron outputs but also rescales the surviving outputs automatically, so no extra scaling is needed when using it. As shown below:

keep_prob = tf.placeholder("float")

h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
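To see the automatic rescaling, here is a small sketch (not part of the original script): with keep_prob = 0.5, each surviving activation is divided by 0.5, i.e. doubled, so the expected value of every unit stays the same.

import tensorflow as tf

h = tf.ones([1, 8])
h_drop = tf.nn.dropout(h, keep_prob=0.5)
with tf.Session() as sess:
    print(sess.run(h_drop))  # roughly half the entries are 0.0, the rest are 2.0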

Output layer

Finally we produce the model's output. After the fully connected layer the width is 1024, and since we are learning the probabilities of the ten digits 0 through 9, the final layer has 10 outputs, as shown below:

W_fc2 = weight_variable([1024, 10])

b_fc2 = bias_variable([10])

y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

Training and evaluation

For training and evaluation we use almost the same code as the earlier single-layer Softmax model, with two changes: we use the more sophisticated ADAM optimizer for gradient descent, and we pass the extra keep_prob parameter in feed_dict to control the dropout rate. Every 100 iterations we log the training accuracy.
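The relevant lines look like the following (the complete script is listed at the end of this article): cross-entropy is the loss, AdamOptimizer with a 1e-4 learning rate performs the updates, and keep_prob is 0.5 while training and 1.0 while evaluating.

cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
        print "step %d, training accuracy %g" % (i, train_accuracy)
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})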

Running the script

# ~/tensorflow/bin/python2.7 tf_cnn_mnist.py

Extracting /tmp/MNIST_data/train-images-idx3-ubyte.gz

Extracting /tmp/MNIST_data/train-labels-idx1-ubyte.gz

Extracting /tmp/MNIST_data/t10k-images-idx3-ubyte.gz

Extracting /tmp/MNIST_data/t10k-labels-idx1-ubyte.gz

step 0, training accuracy 0.16

step 100, training accuracy 0.84

step 200, training accuracy 0.9

step 300, training accuracy 0.88

step 400, training accuracy 0.96

step 500, training accuracy 0.92

step 600, training accuracy 0.98

step 700, training accuracy 0.98

step 800, training accuracy 0.88

step 900, training accuracy 1

step 1000, training accuracy 0.96

step 1100, training accuracy 0.96

step 1200, training accuracy 0.96

step 1300, training accuracy 0.98

step 1400, training accuracy 0.94

step 1500, training accuracy 0.98

step 1600, training accuracy 0.98

step 1700, training accuracy 0.94

step 1800, training accuracy 0.94

step 1900, training accuracy 1

step 2000, training accuracy 0.9

step 2100, training accuracy 0.92

step 2200, training accuracy 0.92

step 2300, training accuracy 1

step 2400, training accuracy 0.98

step 2500, training accuracy 0.98

step 2600, training accuracy 1

step 2700, training accuracy 1

step 2800, training accuracy 1

step 2900, training accuracy 0.96

step 3000, training accuracy 0.98

step 3100, training accuracy 0.98

step 3200, training accuracy 0.98

step 3300, training accuracy 1

step 3400, training accuracy 0.98

step 3500, training accuracy 1

step 3600, training accuracy 1

step 3700, training accuracy 0.96

step 3800, training accuracy 1

step 3900, training accuracy 0.98

step 4000, training accuracy 1

step 4100, training accuracy 0.96

step 4200, training accuracy 0.98

step 4300, training accuracy 1

step 4400, training accuracy 0.98

step 4500, training accuracy 0.98

step 4600, training accuracy 1

step 4700, training accuracy 0.98

step 4800, training accuracy 1

step 4900, training accuracy 0.96

step 5000, training accuracy 0.96

step 5100, training accuracy 0.98

step 5200, training accuracy 1

step 5300, training accuracy 1

step 5400, training accuracy 1

step 5500, training accuracy 1

step 5600, training accuracy 0.98

step 5700, training accuracy 1

step 5800, training accuracy 0.96

step 5900, training accuracy 1

step 6000, training accuracy 0.98

step 6100, training accuracy 0.98

step 6200, training accuracy 0.98

step 6300, training accuracy 0.98

step 6400, training accuracy 0.98

step 6500, training accuracy 1

step 6600, training accuracy 1

step 6700, training accuracy 1

step 6800, training accuracy 1

step 6900, training accuracy 1

step 7000, training accuracy 1

step 7100, training accuracy 0.98

step 7200, training accuracy 1

step 7300, training accuracy 0.98

step 7400, training accuracy 1

step 7500, training accuracy 0.98

step 7600, training accuracy 1

step 7700, training accuracy 1

step 7800, training accuracy 0.98

step 7900, training accuracy 1

step 8000, training accuracy 1

step 8100, training accuracy 0.98

step 8200, training accuracy 1

step 8300, training accuracy 1

...

The output above shows that once training gets going, the CNN's training accuracy mostly stays between 98% and 100%, a considerable improvement over the roughly 92% achieved with Softmax.

Complete code

The complete code is available on GitHub at https://github.com/chenweicai/tensorflow-study/blob/master/tf_cnn_mnist.py and is reproduced below:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

mnist = input_data.read_data_sets("/tmp/MNIST_data", one_hot=True)

x = tf.placeholder(tf.float32, [None, 784])

# softmax baseline variables from the earlier tutorial (not used by the CNN below)
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

y_ = tf.placeholder(tf.float32, [None, 10])

# convolution layer 1
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1, 28, 28, 1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# convolution layer 2
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# fully connected layer
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# dropout
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# output layer, softmax
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

# model training
cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())

for i in range(20000):
    batch = mnist.train.next_batch(50)
    if i % 100 == 0:
        # evaluate on the current batch with dropout disabled
        train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
        print "step %d, training accuracy %g" % (i, train_accuracy)
    train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

print "test accuracy %g" % accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})
