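Compared with the results in http://blog.csdn.net/piaoxuezhong/article/details/78916872, the softmax classifier was more accurate than the two-layer neural network. The instructor in the cs231n course mentioned this point: a neural network only shows a clear advantage once it reaches a certain depth and complexity. This post implements a convolutional neural network (CNN) in TensorFlow to see how it performs.
The test data is again the MNIST dataset; see the previous post for details.
The program code comes first, followed by a detailed explanation: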
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import tensorflow as tf
# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# Learning rate, number of training steps, batch size
learning_rate = 0.001
num_steps = 500
batch_size = 128
display_step = 50
# Network Parameters
num_input = 784  # MNIST image size (28*28 pixels)
num_classes = 10  # number of MNIST classes (digits 0-9)
dropout = 0.75  # dropout keep probability (not the drop probability); the training loop below feeds keep_prob=0.8 directly
# tf Graph input
X = tf.placeholder(tf.float32, [None, num_input])
Y = tf.placeholder(tf.float32, [None, num_classes])
keep_prob = tf.placeholder(tf.float32)  # dropout (probability of keeping a unit)
# Conv2D wrapper with bias and ReLU activation
def conv2d(x, W, b, strides=1):
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

def maxpool2d(x, k=2):
    # Max pooling
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                          padding='SAME')
# Create model
def conv_net(x, weights, biases, dropout):
    # MNIST data input is a 1-D vector of 784 features (28*28 pixels)
    # Reshape to match picture format [Height x Width x Channel]
    # Tensor input becomes 4-D: [Batch Size, Height, Width, Channel]
    # -1 means this dimension is inferred from the total size and the other
    # dimensions; here it is the number of images in the batch
    x = tf.reshape(x, shape=[-1, 28, 28, 1])
    # First convolution
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    # Pooling (downsampling)
    conv1 = maxpool2d(conv1, k=2)
    # Second convolution
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    # Pooling (downsampling)
    conv2 = maxpool2d(conv2, k=2)
    # Fully connected layer: first flatten the output of conv2
    fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    # Dropout; the `dropout` argument is the keep probability
    fc1 = tf.nn.dropout(fc1, dropout)
    # Output layer: class scores (logits)
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out
# Weights and biases; the keys of the two dictionaries correspond layer by layer
weights = {
    # Convolution 1: 5x5 conv, 1 input, 32 outputs
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    # Convolution 2: 5x5 conv, 32 inputs, 64 outputs
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    # Fully connected layer: 7*7*64 inputs, 1024 outputs
    'wd1': tf.Variable(tf.random_normal([7*7*64, 1024])),
    # Output layer: 1024 inputs, 10 outputs
    'out': tf.Variable(tf.random_normal([1024, num_classes]))
}
biases = {
    'bc1': tf.Variable(tf.random_normal([32])),
    'bc2': tf.Variable(tf.random_normal([64])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([num_classes]))
}
# Build the network
logits = conv_net(X, weights, biases, keep_prob)
prediction = tf.nn.softmax(logits)
# Define the loss function and the optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)
# Model evaluation
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initialize all variables
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for step in range(1, num_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y, keep_prob: 0.8})
        if step % display_step == 0 or step == 1:
            # Compute the loss and accuracy on the current batch (dropout disabled)
            loss, acc = sess.run([loss_op, accuracy], feed_dict={X: batch_x,
                                                                 Y: batch_y,
                                                                 keep_prob: 1.0})
            print("Step " + str(step) + ", Minibatch Loss= " + \
                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
                  "{:.3f}".format(acc))
    print("Optimization Finished!")
    # Compute the accuracy on the first 256 test images
    print("Testing Accuracy:", \
          sess.run(accuracy, feed_dict={X: mnist.test.images[:256],
                                        Y: mnist.test.labels[:256],
                                        keep_prob: 1.0}))
Output:
Step 1, Minibatch Loss= 62928.7383, Training Accuracy= 0.141
Step 50, Minibatch Loss= 3887.7532, Training Accuracy= 0.773
Step 100, Minibatch Loss= 2378.7659, Training Accuracy= 0.867
Step 150, Minibatch Loss= 932.5720, Training Accuracy= 0.938
Step 200, Minibatch Loss= 1575.1481, Training Accuracy= 0.922
Step 250, Minibatch Loss= 1262.4065, Training Accuracy= 0.945
Step 300, Minibatch Loss= 810.5175, Training Accuracy= 0.930
Step 350, Minibatch Loss= 195.0972, Training Accuracy= 0.961
Step 400, Minibatch Loss= 1277.8181, Training Accuracy= 0.922
Step 450, Minibatch Loss= 301.4168, Training Accuracy= 0.961
Step 500, Minibatch Loss= 362.0416, Training Accuracy= 0.961
Optimization Finished!
Testing Accuracy: 0.984375
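Overall functionality: the CNN applies two rounds of convolution and pooling to the input images, followed by a fully connected layer and finally an output layer.
The input has shape [N, 784], where N is the number of images fed in at once, i.e. the batch size.
To match the input format required by the convolution function, reshape converts the input to [N, 28, 28, 1];
conv2d then performs the convolution, producing [N, 28, 28, 32]; pooling then reduces the shape to [N, 14, 14, 32];
the second convolution and pooling bring the shape to [N, 7, 7, 64];
next comes the fully connected layer, which first reshapes its input to [N, 7*7*64] and produces [N, 1024];
finally, the output layer produces [N, 10]: one score per class for each of the N input images, from which the predicted label is taken.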
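The shape walk-through above can be checked with a few lines of plain Python. This is only a sketch: the helper names `same_out` and `trace_shapes` are made up for illustration, and it relies on the fact that with padding='SAME' the spatial output size is ceil(input size / stride).

```python
import math


def same_out(size, stride):
    """Spatial output size under padding='SAME': ceil(size / stride)."""
    return math.ceil(size / stride)


def trace_shapes(n):
    """Return (stage, shape) pairs for a batch of n MNIST images."""
    shapes = [("input", (n, 784))]
    h = w = 28
    shapes.append(("reshape", (n, h, w, 1)))
    for i, channels in enumerate((32, 64), start=1):
        # Convolution with stride 1 and SAME padding keeps height and width.
        h, w = same_out(h, 1), same_out(w, 1)
        shapes.append((f"conv{i}", (n, h, w, channels)))
        # 2x2 max pooling with stride 2 halves height and width.
        h, w = same_out(h, 2), same_out(w, 2)
        shapes.append((f"pool{i}", (n, h, w, channels)))
    shapes.append(("flatten", (n, h * w * 64)))
    shapes.append(("fc1", (n, 1024)))
    shapes.append(("out", (n, 10)))
    return shapes


for stage, shape in trace_shapes(128):
    print(stage, shape)
```

With N = 128 (the batch size used in the program), the flattened size is 7*7*64 = 3136, which is the first dimension of wd1.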
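Implementation details:
The program defines two dictionaries, weights and biases. The keys wc1, wc2, wd1 and out in weights hold, respectively, the kernels of the two convolutional layers, the weights of the fully connected layer, and the weights of the output layer;
def conv2d(x, W, b, strides=1):... defines the convolution operation;
def maxpool2d(x, k=2):... defines the pooling operation;
def conv_net(x, weights, biases, dropout):... defines the structure of the convolutional model: two rounds of convolution and pooling on the input images, then a fully connected layer, and finally an output layer; see the step-by-step comments in the code for details:
In the statement conv1 = conv2d(x, weights['wc1'], biases['bc1']), the convolutional layer computes 32 feature maps from each 5 * 5 patch. Its weight tensor wc1 has shape [5, 5, 1, 32]: the first two dimensions are the patch size, the third is the number of input channels, and the last is the number of output channels. A bias is added to each output channel.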
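To make the weight shapes concrete, the following sketch counts the trainable parameters implied by the shapes in the weights and biases dictionaries. The helper names `conv_params` and `dense_params` are hypothetical and exist only for this illustration.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Kernel weights plus one bias per output channel."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


def dense_params(n_inputs, n_outputs):
    """Weight matrix plus one bias per output unit."""
    return n_inputs * n_outputs + n_outputs


layer_params = {
    "conv1 (wc1, bc1)": conv_params(5, 5, 1, 32),
    "conv2 (wc2, bc2)": conv_params(5, 5, 32, 64),
    "fc1 (wd1, bd1)": dense_params(7 * 7 * 64, 1024),
    "out (out, out)": dense_params(1024, 10),
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(name, count)
print("total", total_params)
```

Almost all of the parameters sit in the fully connected layer wd1; the two convolutional layers together account for only a small fraction.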
fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
fc1 = tf.nn.relu(fc1)
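These lines implement the fully connected layer. At this point the image size has been reduced to 7 * 7. A fully connected layer with 1024 neurons is added: the output of the last pooling layer is flattened into a one-dimensional vector per image, multiplied by the weights wd1, the bias bd1 is added, and the result is passed through the ReLU activation.
The remaining operations are mostly similar to those in the earlier posts. The two main functions are described below: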
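As a rough NumPy equivalent of the three lines above, plus the dropout step that follows them in conv_net, consider the sketch below. It is an illustration rather than the TensorFlow implementation: the functions `fully_connected` and `dropout`, the random inputs, and the 0.01 weight scale are all assumptions made for the example. The dropout helper mimics the documented behaviour of tf.nn.dropout with a keep probability: kept elements are scaled by 1 / keep_prob and the rest are set to 0.

```python
import numpy as np

rng = np.random.default_rng(0)


def fully_connected(pooled, w, b):
    """Flatten [N, 7, 7, 64] to [N, 3136], apply weights and bias, then ReLU."""
    flat = pooled.reshape(pooled.shape[0], -1)
    return np.maximum(flat @ w + b, 0.0)


def dropout(x, keep_prob, rng):
    """Keep each element with probability keep_prob and rescale the kept ones."""
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)


# Hypothetical inputs standing in for the pooled conv2 output and for wd1/bd1.
pooled = rng.standard_normal((4, 7, 7, 64))
w = rng.standard_normal((7 * 7 * 64, 1024)) * 0.01
b = np.zeros(1024)

fc1 = fully_connected(pooled, w, b)
dropped = dropout(fc1, keep_prob=0.8, rng=rng)
print(fc1.shape, dropped.shape)
```

Feeding keep_prob = 1.0, as the program does during evaluation, leaves the activations unchanged.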
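(1)tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, name=None)
input: the input images to convolve. It must be a 4-D Tensor of shape [batch, in_height, in_width, in_channels], i.e. [number of images in a batch, image height, image width, number of image channels], with type float32 or float64;
filter: the convolution kernel of the CNN. It must be a 4-D Tensor of shape [filter_height, filter_width, in_channels, out_channels], i.e. [kernel height, kernel width, number of input channels, number of kernels]; its third dimension in_channels equals the fourth dimension of input;
strides: the stride of the convolution along each dimension of the input, a 1-D vector of length 4;
padding: a string, either "SAME" or "VALID", which selects the padding scheme used by the convolution;
use_cudnn_on_gpu: bool, whether to use cuDNN acceleration; defaults to true;
The function returns a Tensor, the feature map, of shape [batch, height, width, channels].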
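To show what the arguments mean, here is a deliberately slow NumPy sketch of the same operation. It is not how TensorFlow implements tf.nn.conv2d; the function name `conv2d_naive` is made up, it only handles one stride shared by height and width, and the SAME padding amounts follow the rule of padding the smaller half before and the rest after.

```python
import math

import numpy as np


def conv2d_naive(x, w, stride=1, padding="SAME"):
    """x: [batch, in_h, in_w, in_c]; w: [k_h, k_w, in_c, out_c]."""
    n, in_h, in_w, in_c = x.shape
    k_h, k_w, w_in_c, out_c = w.shape
    assert in_c == w_in_c, "filter in_channels must match input channels"
    if padding == "SAME":
        out_h = math.ceil(in_h / stride)
        out_w = math.ceil(in_w / stride)
        pad_h = max((out_h - 1) * stride + k_h - in_h, 0)
        pad_w = max((out_w - 1) * stride + k_w - in_w, 0)
        # Zero-pad height and width only; batch and channels are untouched.
        x = np.pad(x, ((0, 0),
                       (pad_h // 2, pad_h - pad_h // 2),
                       (pad_w // 2, pad_w - pad_w // 2),
                       (0, 0)))
    elif padding == "VALID":
        out_h = (in_h - k_h) // stride + 1
        out_w = (in_w - k_w) // stride + 1
    else:
        raise ValueError("padding must be 'SAME' or 'VALID'")
    out = np.zeros((n, out_h, out_w, out_c), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i * stride:i * stride + k_h,
                      j * stride:j * stride + k_w, :]
            # Sum over kernel height, kernel width and input channels.
            out[:, i, j, :] = np.tensordot(patch, w, axes=([1, 2, 3], [0, 1, 2]))
    return out


images = np.ones((2, 28, 28, 1))
kernels = np.ones((5, 5, 1, 32))
print(conv2d_naive(images, kernels, padding="SAME").shape)
print(conv2d_naive(images, kernels, padding="VALID").shape)
```

With "SAME" and stride 1 the 28 x 28 size is preserved, which is why the first convolution in the program outputs [N, 28, 28, 32]; with "VALID" a 5 x 5 kernel would shrink it to 24 x 24.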
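(2)tf.nn.max_pool(value, ksize, strides, padding, name=None)
value: the input to pool. A pooling layer usually follows a convolutional layer, so the input is typically a feature map of shape [batch, height, width, channels];
ksize: the size of the pooling window, usually a 4-element vector [1, height, width, 1]; these two dimensions are set to 1 because no pooling is done over batch or channels;
strides: as with convolution, the step of the window along each dimension, usually [1, stride, stride, 1];
padding: as with convolution, either 'VALID' or 'SAME';
The function returns a Tensor of the same type, with shape [batch, height, width, channels].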
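The pooling used in this program (ksize=[1, k, k, 1] with strides=[1, k, k, 1]) can be sketched in NumPy as follows. The function `max_pool_naive` is a hypothetical helper that only covers this non-overlapping case, where the stride equals the window size; padded positions are filled with -inf so that they never win the maximum.

```python
import math

import numpy as np


def max_pool_naive(x, k=2, padding="SAME"):
    """x: [batch, height, width, channels]; window and stride are both k."""
    n, h, w, c = x.shape
    if padding == "SAME":
        out_h, out_w = math.ceil(h / k), math.ceil(w / k)
        pad_h, pad_w = out_h * k - h, out_w * k - w
        x = np.pad(x.astype(float),
                   ((0, 0),
                    (pad_h // 2, pad_h - pad_h // 2),
                    (pad_w // 2, pad_w - pad_w // 2),
                    (0, 0)),
                   constant_values=-np.inf)
    elif padding == "VALID":
        out_h, out_w = h // k, w // k
        # Drop rows and columns that do not fill a complete window.
        x = x[:, :out_h * k, :out_w * k, :]
    else:
        raise ValueError("padding must be 'SAME' or 'VALID'")
    # Split height and width into (blocks, k) and take the max inside each block.
    return x.reshape(n, out_h, k, out_w, k, c).max(axis=(2, 4))


feature_map = np.arange(16, dtype=float).reshape(1, 4, 4, 1)
print(max_pool_naive(feature_map)[0, :, :, 0])
print(max_pool_naive(np.zeros((2, 28, 28, 32))).shape)
```

Each 2 x 2 block is replaced by its largest value, so the height and width are halved while batch and channels stay the same.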
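Judging from the output, this CNN reaches an accuracy of 0.984375 on the first 256 test images (252 of 256 classified correctly), which is better than the earlier softmax classifier and the simple neural network.
References: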
https://www.tensorflow.org/get_started/mnist/pros
http://www.tensorfly.cn/tfdoc/tutorials/mnist_pros.html
http://www.jeyzhang.com/tensorflow-learning-notes-2.html
http://lib.csdn.net/article/aimachinelearning/61475