TensorFlow Study Notes (5): Convolutional Neural Networks (CNN)

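In the results of http://blog.csdn.net/piaoxuezhong/article/details/78916872, the softmax classifier was more accurate than the two-layer neural network. The cs231n course makes the same point: a neural network only shows a clear advantage once it is deep and complex enough. In this post I implement a convolutional neural network (CNN) with TensorFlow and test how well it performs.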

The test data is again the MNIST dataset; see the previous post for details.

Here is the full program first, followed by a detailed explanation:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Written for TensorFlow 1.x (tf.placeholder, tf.Session and
# tensorflow.examples.tutorials were removed in TensorFlow 2.x)
import tensorflow as tf

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Training parameters: learning rate, number of steps, batch size, logging interval
learning_rate = 0.001
num_steps = 500
batch_size = 128
display_step = 50

# Network Parameters
num_input = 784  # MNIST image size: 28*28 pixels, flattened
num_classes = 10  # number of MNIST classes (digits 0-9)
dropout = 0.8  # dropout keep probability during training (probability of keeping a unit)

# tf Graph input
X = tf.placeholder(tf.float32, [None, num_input])
Y = tf.placeholder(tf.float32, [None, num_classes])
keep_prob = tf.placeholder(tf.float32)  # dropout keep probability, fed at run time

# Conv2D wrapper: convolution, bias add, then ReLU activation
def conv2d(x, W, b, strides=1):
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)


def maxpool2d(x, k=2):
    # Max pooling with a k x k window and stride k (halves height and width for k=2)
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                          padding='SAME')


# Create model
def conv_net(x, weights, biases, dropout):
    # MNIST data input is a 1-D vector of 784 features (28*28 pixels)
    # Reshape to match picture format [Height x Width x Channel]
    # Tensor input becomes 4-D: [Batch Size, Height, Width, Channel]
    # -1 lets TensorFlow infer this dimension (the batch size) from the total size
    x = tf.reshape(x, shape=[-1, 28, 28, 1])

    # First convolution layer
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    # Max pooling (downsampling)
    conv1 = maxpool2d(conv1, k=2)

    # Second convolution layer
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    # Max pooling (downsampling)
    conv2 = maxpool2d(conv2, k=2)

    # Fully connected layer: first flatten the conv2 output to [batch, 7*7*64]
    fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    # Apply dropout (the second argument is the keep probability)
    fc1 = tf.nn.dropout(fc1, dropout)

    # Output layer: class scores (logits)
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out

# Weights and biases; each weight tensor has a matching bias
weights = {
    # conv1: 5x5 conv, 1 input, 32 outputs
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    # conv2: 5x5 conv, 32 inputs, 64 outputs
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    # fully connected: 7*7*64 inputs, 1024 outputs
    'wd1': tf.Variable(tf.random_normal([7*7*64, 1024])),
    # output layer: 1024 inputs, 10 outputs (class scores)
    'out': tf.Variable(tf.random_normal([1024, num_classes]))
}

biases = {
    'bc1': tf.Variable(tf.random_normal([32])),
    'bc2': tf.Variable(tf.random_normal([64])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([num_classes]))
}

# Build the model
logits = conv_net(X, weights, biases, keep_prob)
prediction = tf.nn.softmax(logits)

# Define loss and optimizer
# (later 1.x releases recommend tf.nn.softmax_cross_entropy_with_logits_v2)
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)


# Evaluate model
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initialize all variables
init = tf.global_variables_initializer()

with tf.Session() as sess:

    sess.run(init)

    for step in range(1, num_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Train one step with dropout enabled
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y, keep_prob: dropout})
        if step % display_step == 0 or step == 1:
            # Compute minibatch loss and accuracy (dropout disabled: keep_prob = 1.0)
            loss, acc = sess.run([loss_op, accuracy], feed_dict={X: batch_x,
                                                                 Y: batch_y,
                                                                 keep_prob: 1.0})
            print("Step " + str(step) + ", Minibatch Loss= " +
                  "{:.4f}".format(loss) + ", Training Accuracy= " +
                  "{:.3f}".format(acc))

    print("Optimization Finished!")

    # Compute accuracy on the first 256 test images
    print("Testing Accuracy:",
          sess.run(accuracy, feed_dict={X: mnist.test.images[:256],
                                        Y: mnist.test.labels[:256],
                                        keep_prob: 1.0}))
Output:

Step 1, Minibatch Loss= 62928.7383, Training Accuracy= 0.141
Step 50, Minibatch Loss= 3887.7532, Training Accuracy= 0.773
Step 100, Minibatch Loss= 2378.7659, Training Accuracy= 0.867
Step 150, Minibatch Loss= 932.5720, Training Accuracy= 0.938
Step 200, Minibatch Loss= 1575.1481, Training Accuracy= 0.922
Step 250, Minibatch Loss= 1262.4065, Training Accuracy= 0.945
Step 300, Minibatch Loss= 810.5175, Training Accuracy= 0.930
Step 350, Minibatch Loss= 195.0972, Training Accuracy= 0.961
Step 400, Minibatch Loss= 1277.8181, Training Accuracy= 0.922
Step 450, Minibatch Loss= 301.4168, Training Accuracy= 0.961
Step 500, Minibatch Loss= 362.0416, Training Accuracy= 0.961
Optimization Finished!
Testing Accuracy: 0.984375


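What the program does overall: the CNN applies two rounds of convolution and pooling to the input images, followed by a fully connected layer and finally the output layer.
The input has shape [N, 784], where N is the number of images fed in at once (a mini-batch).
To match the input format required by the convolution function, reshape turns it into [N, 28, 28, 1];
conv2d then convolves it, producing [N, 28, 28, 32], and pooling reduces this to [N, 14, 14, 32];
the second convolution and pooling produce [N, 7, 7, 64];
the fully connected layer first flattens its input to [N, 7*7*64] and outputs [N, 1024];
finally the output layer produces [N, 10]: one score (logit) per digit class for each of the N images, and the predicted label is the index of the largest score.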
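The shape changes above can be checked with plain arithmetic. The sketch below is a minimal, TensorFlow-free illustration, assuming TensorFlow's rule that padding='SAME' gives an output size of ceil(input_size / stride); the function names same_out and trace_shapes are hypothetical helpers, not part of the program.

```python
import math


def same_out(size, stride):
    """Spatial output size with padding='SAME' in TensorFlow: ceil(size / stride)."""
    return math.ceil(size / stride)


def trace_shapes(n=1):
    """Return (stage, shape) pairs for each stage of conv_net on a batch of n images."""
    shapes = [("input", [n, 784])]
    h = w = 28
    shapes.append(("reshape", [n, h, w, 1]))
    # conv1: 5x5 kernel, stride 1, SAME padding keeps 28x28; 32 output channels
    h, w = same_out(h, 1), same_out(w, 1)
    shapes.append(("conv1", [n, h, w, 32]))
    # pool1: 2x2 window, stride 2, SAME padding halves height and width
    h, w = same_out(h, 2), same_out(w, 2)
    shapes.append(("pool1", [n, h, w, 32]))
    # conv2: stride 1 keeps 14x14; 64 output channels
    h, w = same_out(h, 1), same_out(w, 1)
    shapes.append(("conv2", [n, h, w, 64]))
    # pool2: halves again to 7x7
    h, w = same_out(h, 2), same_out(w, 2)
    shapes.append(("pool2", [n, h, w, 64]))
    # flatten for the fully connected layer, then the two dense layers
    shapes.append(("flatten", [n, h * w * 64]))
    shapes.append(("fc1", [n, 1024]))
    shapes.append(("out", [n, 10]))
    return shapes


for name, shape in trace_shapes(n=128):
    print(name, shape)
```

With batch_size = 128 the last stages print pool2 [128, 7, 7, 64], flatten [128, 3136] and out [128, 10], matching the shapes described in the text.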

Implementation details:

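The program defines two dictionaries, weights and biases. In weights, the keys wc1, wc2, wd1 and out hold, respectively, the kernels of the two convolution layers, the weight matrix of the fully connected layer and the weight matrix of the output layer; biases holds the matching bias vectors (bc1, bc2, bd1, out).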
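A quick way to see what these dictionaries cost is to count their elements. This minimal sketch copies the shapes from the program (with num_classes = 10); num_elements and the layer pairing list are illustrative names, not part of the original code.

```python
# Shapes copied from the weights / biases dictionaries in the program above
weight_shapes = {
    "wc1": [5, 5, 1, 32],
    "wc2": [5, 5, 32, 64],
    "wd1": [7 * 7 * 64, 1024],
    "out": [1024, 10],
}
bias_shapes = {"bc1": [32], "bc2": [64], "bd1": [1024], "out": [10]}


def num_elements(shape):
    """Number of scalar values in a tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


# Pair each layer's weight key with its bias key
layers = [("conv1", "wc1", "bc1"), ("conv2", "wc2", "bc2"),
          ("fc1", "wd1", "bd1"), ("out", "out", "out")]
per_layer = {name: num_elements(weight_shapes[w]) + num_elements(bias_shapes[b])
             for name, w, b in layers}
total = sum(per_layer.values())
print(per_layer)
print("total trainable parameters:", total)
```

The fully connected layer wd1 alone accounts for about 98% of the roughly 3.27 million trainable parameters, while the two convolution layers together need only about 52 thousand.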

def conv2d(x, W, b, strides=1):... defines the convolution step: convolution with padding='SAME', bias addition, then ReLU;
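To make the conv2d step concrete, here is a minimal pure-Python sketch of the same sequence (convolution, bias add, ReLU) for a single channel and a single kernel. It assumes stride 1 and an odd kernel size, where SAME padding adds (k-1)/2 zeros on each side; conv2d_same_relu is a hypothetical helper, not a TensorFlow function. Like tf.nn.conv2d it computes a cross-correlation, so the kernel is not flipped.

```python
def conv2d_same_relu(image, kernel, bias):
    """Single-channel 2-D convolution, stride 1, 'SAME' zero padding, then bias + ReLU.

    image: H x W list of lists; kernel: k x k list of lists with odd k; bias: a number.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2  # stride 1 and odd k: SAME pads (k - 1) / 2 zeros on every side
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:  # positions outside are zero padding
                        acc += image[y][x] * kernel[di][dj]
            row.append(max(0.0, acc + bias))  # bias add followed by ReLU
        out.append(row)
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
box = [[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]]  # 3x3 all-ones kernel: sums each neighbourhood
result = conv2d_same_relu(image, box, bias=-20.0)
print(result)
```

The output keeps the 3x3 input size, which is what padding='SAME' with stride 1 guarantees; the ReLU zeroes the two corners whose neighbourhood sums fall below 20.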

def maxpool2d(x, k=2):... defines the pooling step: k x k max pooling with stride k;
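The pooling step can be sketched the same way. This minimal version works on one H x W channel with window size and stride both equal to k, as in maxpool2d. It assumes TensorFlow's SAME rule: the output is ceil(H/k) x ceil(W/k), the total padding is split with any extra row or column at the bottom/right, and padded positions are ignored by the max. maxpool2d_same is a hypothetical name.

```python
import math


def maxpool2d_same(image, k=2):
    """k x k max pooling with stride k and 'SAME' padding on one H x W channel."""
    h, w = len(image), len(image[0])
    out_h, out_w = math.ceil(h / k), math.ceil(w / k)
    # Split the total SAME padding; the extra row/column (if any) goes at the bottom/right
    top = (out_h * k - h) // 2
    left = (out_w * k - w) // 2
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            y0, x0 = i * k - top, j * k - left
            window = [image[y][x]
                      for y in range(max(y0, 0), min(y0 + k, h))
                      for x in range(max(x0, 0), min(x0 + k, w))]
            row.append(max(window))  # padded positions never enter the window
        out.append(row)
    return out


pooled_even = maxpool2d_same([[1, 3, 2, 4],
                              [5, 6, 7, 8],
                              [9, 2, 1, 0],
                              [3, 4, 5, 6]])
pooled_odd = maxpool2d_same([[1, 2, 3],
                             [4, 5, 6],
                             [7, 8, 9]])
print(pooled_even)
print(pooled_odd)
```

A 4x4 input becomes 2x2, the same halving that takes 28x28 to 14x14 and 14x14 to 7x7 in the network; an odd-sized 3x3 input still yields 2x2 because SAME rounds up.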

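def conv_net(x, weights, biases, dropout):... defines the structure of the model: two convolution + pooling stages, a fully connected layer with dropout, and then the output layer; see the comments in the code for each step.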

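In the statement conv1 = conv2d(x, weights['wc1'], biases['bc1']), the convolution layer computes 32 feature maps from each 5 * 5 patch. Its weight tensor wc1 has shape [5, 5, 1, 32]: the first two dimensions are the patch size, the third is the number of input channels, and the last is the number of output channels. A bias is then added to each output channel.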
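Written out element by element, the first layer computes the following. This follows the definition in the tf.nn.conv2d documentation (a cross-correlation, no kernel flip) with stride 1; here $\tilde{x}$ denotes the reshaped input after SAME zero padding of 2 pixels on each side (the amount for a 5x5 kernel at stride 1), and $n$ indexes images in the batch.

```latex
\mathrm{conv1}[n, i, j, k] = \mathrm{ReLU}\!\left(
  \sum_{d_i=0}^{4} \sum_{d_j=0}^{4} \sum_{q=0}^{0}
  \tilde{x}[n,\, i + d_i,\, j + d_j,\, q] \, W_{c1}[d_i, d_j, q, k]
  + b_{c1}[k] \right),
\qquad 0 \le i, j < 28, \quad k = 0, \dots, 31
```

Each of the 32 output channels therefore uses its own 5 x 5 x 1 slice of wc1 plus one bias value, i.e. 26 parameters per channel.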

fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
fc1 = tf.nn.relu(fc1)

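These lines implement the fully connected layer. At this point each image has been reduced to 7 * 7 (with 64 channels). A fully connected layer with 1024 neurons is added: the output of the last pooling layer is reshaped into a 1-D vector per image, multiplied by the weight matrix wd1, the bias bd1 is added, and the result goes through the ReLU activation.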
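The flatten-then-multiply step is easy to mirror without TensorFlow. The sketch below is an illustration on a tiny 2 x 2 x 3 stand-in for the real 7 x 7 x 64 feature map; it assumes tf.reshape's row-major order, in which element (h, w, c) of an H x W x C map lands at index (h*W + w)*C + c. flatten_hwc and dense_relu are hypothetical helper names.

```python
def flatten_hwc(t):
    """Flatten an H x W x C nested list in row-major order, as tf.reshape does."""
    return [v for row in t for pixel in row for v in pixel]


def dense_relu(x, W, b):
    """Fully connected layer on one vector: relu(x @ W + b), with W of shape len(x) x len(b)."""
    return [max(0.0, sum(x[i] * W[i][o] for i in range(len(x))) + b[o])
            for o in range(len(b))]


# Tiny stand-in for a pooled feature map: height 2, width 2, 3 channels
height, width, channels = 2, 2, 3
fmap = [[[100 * h + 10 * w + c for c in range(channels)]
         for w in range(width)]
        for h in range(height)]
flat = flatten_hwc(fmap)
print(flat)

# One dense layer with 3 inputs and 2 outputs
y = dense_relu([1.0, 2.0, 3.0], [[1, 0], [0, 1], [1, -1]], [0.5, -10.0])
print(y)
```

In the real network the flattened vector has 7 * 7 * 64 = 3136 entries and wd1 maps it to 1024 outputs; the order of the flattened entries only has to be consistent between training and inference, which tf.reshape guarantees.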

The remaining operations are mostly the same as in the previous posts. Here are notes on the two main functions:

(1)tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, name=None)

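input: the input images to convolve; a 4-D Tensor of shape [batch, in_height, in_width, in_channels], i.e. [number of images in the batch, image height, image width, number of image channels], of type float32 or float64;
filter: the convolution kernel of the CNN; a 4-D Tensor of shape [filter_height, filter_width, in_channels, out_channels], i.e. [kernel height, kernel width, number of input channels, number of kernels]; its third dimension in_channels must equal the fourth dimension of input;
strides: the step of the sliding window along each dimension of input; a 1-D vector of length 4 (for this [batch, height, width, channels] layout, the first and last entries must be 1);
padding: a string, either "SAME" or "VALID", which selects the convolution mode: "SAME" zero-pads the input so that the output size is ceil(input size / stride), while "VALID" adds no padding, so the output shrinks by the kernel size minus one before striding;
use_cudnn_on_gpu: bool, whether to use cuDNN acceleration; defaults to True;
Returns a Tensor, the feature map, of shape [batch, height, width, channels].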
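The difference between the two padding modes shows up directly in the output sizes. This minimal sketch assumes TensorFlow's documented rules, ceil(in / stride) for "SAME" and ceil((in - kernel + 1) / stride) for "VALID", which apply to both tf.nn.conv2d and tf.nn.max_pool; conv_output_size is a hypothetical helper.

```python
import math


def conv_output_size(in_size, kernel, stride, padding):
    """Output length along one spatial axis under TensorFlow's padding rules."""
    if padding == "SAME":
        return math.ceil(in_size / stride)
    if padding == "VALID":
        return math.ceil((in_size - kernel + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


for padding in ("SAME", "VALID"):
    conv = conv_output_size(28, 5, 1, padding)   # 5x5 kernel, stride 1, as in conv1
    pool = conv_output_size(conv, 2, 2, padding)  # 2x2 window, stride 2, as in maxpool2d
    print(padding, conv, pool)
```

With "SAME" the first layer keeps 28 and pooling gives 14, as in the program; with "VALID" the same layer would give 24 and then 12, so the later reshape to 7*7*64 would no longer fit. That is one reason the program uses "SAME" throughout.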

(2)tf.nn.max_pool(value, ksize, strides, padding, name=None)
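value: the input to pool; a pooling layer usually follows a convolution layer, so the input is usually a feature map of shape [batch, height, width, channels];
ksize: the size of the pooling window, a 4-element vector, usually [1, height, width, 1]; pooling is not done across the batch or channel dimensions, so those two entries are 1;
strides: as in convolution, the step of the window along each dimension, usually [1, stride, stride, 1];
padding: as in convolution, either 'VALID' or 'SAME';
Returns a Tensor of the same type, with shape [batch, height, width, channels].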

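According to the output, this CNN reaches an accuracy of 0.984375 on the first 256 test images (252 of 256 correct), better than the earlier softmax classifier and the simple neural network.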

References:

https://www.tensorflow.org/get_started/mnist/pros

http://www.tensorfly.cn/tfdoc/tutorials/mnist_pros.html

http://www.jeyzhang.com/tensorflow-learning-notes-2.html

http://lib.csdn.net/article/aimachinelearning/61475
