This article is for beginners. It walks through code that implements a basic convolutional neural network in TensorFlow and applies it to the MNIST dataset.
import tensorflow as tf
import tensorflow.examples.tutorials.mnist.input_data as input_data
import numpy as np
# Compute loss and accuracy on the test set and print them next to the training metrics
def calc_acc(cnt, train_acc, train_loss):
    sum_loss = 0.0
    sum_acc = 0.0
    for i in range(test_nbatch):
        x_batch,y_batch = get_batch(test_data, test_batch_size, i)
        # keep_prob = 1.0 switches dropout off for evaluation
        feed = {x:x_batch,y:y_batch,keep_prob:1.0,keep_prob2:1.0}
        calc_obj = [loss,acc_num]
        calc_ans = sess.run(calc_obj, feed_dict=feed)
        sum_loss += calc_ans[0]
        sum_acc += calc_ans[1]
    avg_loss = sum_loss / test_nbatch
    avg_acc = sum_acc / test_size
    print("{:0>2}:train-loss:{:.4f},acc:{:.4f} test-loss:{:.4f},acc:{:.4f}".format(cnt,train_loss,train_acc,avg_loss,avg_acc))
# Return the k-th batch as [images, labels]
def get_batch(data, batch_size, k):
    ret = []
    for i in range(2):
        ret.append(data[i][batch_size*k:batch_size*(k+1)])
    return ret
#const and data
image_size = 784
image_len = 28
class_size = 10
epochs = 10
eps = 1e-10
mnist = input_data.read_data_sets("MNIST_DATA/")
train_data = [mnist.train.images,mnist.train.labels]
test_data = [mnist.test.images,mnist.test.labels]
train_batch_size = 50
test_batch_size = 50
train_size = train_data[0].shape[0]
test_size = test_data[0].shape[0]
train_nbatch = train_size // train_batch_size
test_nbatch = test_size // test_batch_size
sess = tf.InteractiveSession()
#input
x = tf.placeholder('float', shape=[None,image_size])
y = tf.placeholder('int32', shape=[None])
x_image = tf.reshape(x, [-1,image_len,image_len,1])
y_label = tf.one_hot(y, class_size)
keep_prob = tf.placeholder('float')
keep_prob2 = tf.placeholder('float')
#conv1
w_conv1 = tf.Variable(tf.truncated_normal([5,5,1,20], stddev=0.01))
b_conv1 = tf.Variable(tf.constant(0.0, shape=[20]))
y_conv1 = tf.nn.relu(tf.nn.conv2d(x_image, w_conv1, strides=[1,1,1,1], padding='SAME') + b_conv1)
y_pool1 = tf.nn.max_pool(y_conv1, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')
#conv2
w_conv2 = tf.Variable(tf.truncated_normal([5,5,20,50], stddev=0.01))
b_conv2 = tf.Variable(tf.constant(0.0, shape=[50]))
y_conv2 = tf.nn.relu(tf.nn.conv2d(y_pool1, w_conv2, strides=[1,1,1,1], padding='SAME') + b_conv2)
y_pool2 = tf.nn.max_pool(y_conv2, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')
#flat
y_flat = tf.reshape(y_pool2, [-1,7*7*50])
y_flat_drop = tf.nn.dropout(y_flat, keep_prob2)
#fc1
w_fc1 = tf.Variable(tf.truncated_normal([7*7*50,500], stddev=0.01))
b_fc1 = tf.Variable(tf.constant(0.0, shape=[500]))
y_fc1 = tf.nn.relu(tf.matmul(y_flat_drop, w_fc1) + b_fc1)
y_fc1_drop = tf.nn.dropout(y_fc1, keep_prob)
#fc2
w_fc2 = tf.Variable(tf.truncated_normal([500,10], stddev=0.01))
b_fc2 = tf.Variable(tf.constant(0.0, shape=[10]))
y_fc2 = tf.nn.softmax(tf.matmul(y_fc1_drop, w_fc2) + b_fc2) + eps
#loss
loss = -tf.reduce_mean(tf.reduce_sum(y_label*tf.log(y_fc2), reduction_indices=[1]))
grad = tf.train.AdadeltaOptimizer(4.0).minimize(loss)
#acc
corr_pred = tf.equal(tf.argmax(y_label,1), tf.argmax(y_fc2,1))
acc_num = tf.reduce_sum(tf.cast(corr_pred, 'float'))
#run
sess.run(tf.global_variables_initializer())
cnt = 0
for i in range(epochs):
    sum_acc = 0.0
    sum_loss = 0.0
    for j in range(train_nbatch):
        x_batch,y_batch = get_batch(train_data, train_batch_size, j)
        feed = {x:x_batch,y:y_batch,keep_prob:0.5,keep_prob2:0.75}
        calc_obj = [grad,loss,acc_num]
        calc_ans = sess.run(calc_obj, feed_dict=feed)
        sum_acc += calc_ans[2]
        sum_loss += calc_ans[1]
    avg_acc = sum_acc / train_size
    avg_loss = sum_loss / train_nbatch
    cnt += 1
    calc_acc(cnt, avg_acc, avg_loss)
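In TensorFlow, the parameters of a fully connected layer are a weight matrix W plus a bias vector b.
The matrix has as many rows as the previous layer has nodes and as many columns as the current layer has nodes. Note that when W is applied to the signal X, the product is written as the matrix multiplication X*W. X has to come first for the shapes to line up with the network formula, because each row of X is one sample.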
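The shape rule for a fully connected layer can be checked with a few lines of NumPy. This is only a sketch: the sizes `batch`, `n_in` and `n_out` are made up for illustration and are not the ones used in the article's network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration
batch, n_in, n_out = 4, 6, 3

X = rng.normal(size=(batch, n_in))              # one sample per row
W = rng.normal(scale=0.01, size=(n_in, n_out))  # rows: previous layer, columns: this layer
b = np.zeros(n_out)                             # one bias per output node

Y = X @ W + b                                   # b is broadcast over the batch rows
print(Y.shape)
```

Writing `W @ X` instead would fail here, since the inner dimensions (3 and 4) do not match.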
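A convolutional layer's parameters also consist of W and b. The difference is that W is no longer 2-dimensional but 4-dimensional; for tf.nn.conv2d the layout is [height,width,in_channel,out_channel].
Textbooks describe a convolutional layer as a set of kernels, each with a width and a height; these correspond to the width and height dimensions above.
Each kernel scans the image and produces one feature map, and these feature maps are passed on to the next layer. The kernels of the next layer therefore face several maps, unlike the first layer, which starts from a single image.
To give all convolutional layers one uniform definition, we introduce the two dimensions in_ch and out_ch: in_ch is the number of maps (channels) the layer takes as input, and out_ch is the number of maps (channels) it outputs.
This yields in_ch*out_ch kernels. Each out_ch owns in_ch kernels; the feature maps they produce are added together element-wise into a single feature map, which is the output of that out_ch.
For a black-and-white photo, the first convolutional layer has a one-channel input. For a color photo, it has a three-channel input (in RGB mode).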
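To make the in_ch*out_ch bookkeeping concrete, here is a deliberately slow NumPy sketch of a multi-channel convolution. It is not how TensorFlow implements `tf.nn.conv2d`; the function name `conv2d_valid` is invented for this example, and it uses stride 1 with no padding for brevity, whereas the article's layers use `padding='SAME'`.

```python
import numpy as np

def conv2d_valid(image, kernels):
    """Naive stride-1 convolution without padding.

    image:   [height, width, in_ch]
    kernels: [k_height, k_width, in_ch, out_ch]
    returns: [height - k_height + 1, width - k_width + 1, out_ch]
    """
    h, w, in_ch = image.shape
    kh, kw, k_in, out_ch = kernels.shape
    assert in_ch == k_in, "kernel in_ch must match the image channels"
    out = np.zeros((h - kh + 1, w - kw + 1, out_ch))
    for o in range(out_ch):
        for c in range(in_ch):
            # One 2-D kernel per (in_ch, out_ch) pair; its feature map
            # is added into output channel o.
            for i in range(out.shape[0]):
                for j in range(out.shape[1]):
                    patch = image[i:i + kh, j:j + kw, c]
                    out[i, j, o] += np.sum(patch * kernels[:, :, c, o])
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8, 20))        # small stand-in for 20 input maps
kernels = rng.normal(size=(5, 5, 20, 50))  # same shape as w_conv2
out = conv2d_valid(image, kernels)
print(out.shape)                                # one output map per out_ch
print(kernels.shape[2] * kernels.shape[3])      # number of 2-D kernels
```

The innermost accumulation is the "add the in_ch feature maps together" step described above.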
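Dropout sets part of the signal to 0. The fraction dropped is determined by the keep_prob parameter (note that keep_prob is actually the fraction kept). Because dropping values weakens the total signal strength, the surviving values are also multiplied by a factor (1/keep_prob) so that the expected total strength stays the same.
Dropout suppresses overfitting, and the effect is most noticeable on fully connected layers.
Dropout is normally used only during training; do not apply it at test time (this is controlled by feeding keep_prob from outside).
As a result, early in training the test accuracy can be higher than the training accuracy, which looks anomalous: the training metrics are measured with dropout active, the test metrics without it.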
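The drop-then-rescale behaviour can be sketched in NumPy as follows. This is an illustration of the idea (often called inverted dropout), not TensorFlow's actual implementation; the helper name `dropout` and the array size are assumptions made for the example.

```python
import numpy as np

def dropout(x, keep_prob, rng):
    """Zero each element with probability 1 - keep_prob and scale the rest by 1 / keep_prob."""
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
x = np.ones(100_000)

dropped = dropout(x, 0.5, rng)
print((dropped == 0).mean())  # fraction dropped, close to 1 - keep_prob
print(dropped.mean())         # close to the original mean thanks to the rescaling

# keep_prob = 1.0 keeps every element, which is what the test feed uses
kept_all = dropout(x, 1.0, rng)
```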
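The flatten step turns each sample's convolutional feature maps into a one-dimensional vector so that they can be fed into the fc layer.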
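The size 7*7*50 used in the reshape follows from the layer settings. A small arithmetic sketch, assuming the usual rule that `padding='SAME'` gives an output size of ceil(input size / stride) (the helper name `same_out` is made up here):

```python
import math

def same_out(size, stride):
    # With padding='SAME' the output size is ceil(input size / stride)
    return math.ceil(size / stride)

side = 28                  # image_len
side = same_out(side, 1)   # conv1, stride 1: size unchanged
side = same_out(side, 2)   # pool1, stride 2
side = same_out(side, 1)   # conv2, stride 1: size unchanged
side = same_out(side, 2)   # pool2, stride 2
channels = 50              # out_channel of conv2

flat_size = side * side * channels
print(side, flat_size)     # prints: 7 2450
```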
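The hidden layers generally use the relu activation function, and the last layer uses softmax.
The loss is generally cross-entropy rather than mean squared error.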
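As a rough NumPy counterpart of the `y_fc2`, `loss` and `acc_num` lines in the listing, the sketch below computes softmax, cross-entropy and the number of correct predictions for two made-up samples. The logits and labels are invented for illustration.

```python
import numpy as np

def softmax(logits):
    # Subtracting the row maximum does not change the result but avoids overflow in exp
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels, num_classes, eps=1e-10):
    one_hot = np.eye(num_classes)[labels]   # same role as tf.one_hot
    # Mean over the batch of -sum(y * log(p)), as in the article's loss
    return -np.mean(np.sum(one_hot * np.log(probs + eps), axis=1))

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])        # hypothetical network outputs
labels = np.array([0, 1])                   # hypothetical true classes

probs = softmax(logits)
loss = cross_entropy(probs, labels, num_classes=3)
correct = int(np.sum(np.argmax(probs, axis=1) == labels))
print(probs.sum(axis=1), loss, correct)
```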
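Different optimizers generally need different learning rates.
Commonly used optimizers and their learning rates are listed below:
| Optimizer | Description | Learning rate |
|---|---|---|
| GradientDescentOptimizer | plain gradient descent | 1.0 |
| AdadeltaOptimizer | gradient descent with the Adadelta algorithm | 1.0 |
| AdamOptimizer | gradient descent with the Adam algorithm | 1e-4 |
These learning rates are only rough orders of magnitude; the best learning rate can only be found by repeated tuning.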
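Why the learning rate needs tuning can be seen even on a toy problem. The sketch below runs plain gradient descent on f(w) = w**2, which is an assumption made purely for illustration and has nothing to do with the MNIST network; the learning rates tried are arbitrary.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Minimize f(w) = w**2 with plain gradient descent and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w        # derivative of w**2
        w = w - lr * grad
    return w

for lr in (0.01, 0.1, 1.1):
    print(lr, gradient_descent(lr))
```

With the smallest rate w has barely moved after 20 steps, with the middle rate it is close to the minimum at 0, and with the largest rate each step overshoots and |w| grows.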
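The initial parameter values must not all be 0, and they must not be too large either; badly chosen initial values can make the network untrainable.
Biases can be set to 0. The other parameters are best drawn with mean 0 and a standard deviation of no more than 0.1, for example from a uniform distribution or a truncated normal distribution.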
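A truncated normal initializer can be approximated in NumPy as shown below. This is a sketch of the idea behind `tf.truncated_normal` (samples more than two standard deviations from the mean are redrawn), not its actual implementation; the helper name `truncated_normal` and the variable names are chosen for this example.

```python
import numpy as np

def truncated_normal(shape, stddev, rng):
    """Normal samples with mean 0; values beyond 2 * stddev are redrawn."""
    out = rng.normal(0.0, stddev, size=shape)
    bad = np.abs(out) > 2 * stddev
    while bad.any():
        out[bad] = rng.normal(0.0, stddev, size=int(bad.sum()))
        bad = np.abs(out) > 2 * stddev
    return out

rng = np.random.default_rng(0)
w = truncated_normal((500, 10), 0.01, rng)  # same shape and stddev as w_fc2
b = np.zeros(10)                            # biases may start at 0

print(w.mean(), np.abs(w).max())
```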
A larger batch uses more memory (GPU memory). A machine often cannot run the whole test set in one pass, so testing should also be done batch by batch.
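The batch-by-batch evaluation pattern can be tried without TensorFlow. In this sketch the data is a tiny made-up array and `fake_predict` is a hypothetical stand-in for the `sess.run` call; only the slicing and the accumulation mirror the article's code.

```python
import numpy as np

def get_batch(data, batch_size, k):
    # Same slicing as the article's get_batch: k-th batch of images and labels
    return [part[batch_size * k: batch_size * (k + 1)] for part in data]

# Hypothetical stand-in data: 10 "images" of 4 pixels each, with labels
images = np.arange(40, dtype=float).reshape(10, 4)
labels = np.arange(10) % 3
data = [images, labels]

batch_size = 5
nbatch = len(images) // batch_size  # samples beyond nbatch * batch_size would be skipped

def fake_predict(x_batch):
    # Placeholder for the real model: always predicts class 0
    return np.zeros(len(x_batch), dtype=int)

num_correct = 0
for k in range(nbatch):
    x_batch, y_batch = get_batch(data, batch_size, k)
    num_correct += int(np.sum(fake_predict(x_batch) == y_batch))

accuracy = num_correct / (nbatch * batch_size)
print(num_correct, accuracy)
```

Summing the count of correct predictions, rather than averaging per-batch accuracies, keeps the result exact. In the article's setup the 10,000 test images divide evenly into batches of 50, so nothing is skipped.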
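1e-10 is forcibly added to the network's output so that the subsequent log operation never evaluates log(0).
For details, see the discussions of nan appearing during TensorFlow training.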
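How a nan can arise, and how the added 1e-10 prevents it, can be reproduced in NumPy. The logits below are invented and deliberately extreme so that one softmax output underflows to exactly 0 in float32.

```python
import numpy as np

# Hypothetical, deliberately extreme logits for one sample
logits = np.array([[100.0, 0.0, -1000.0]], dtype=np.float32)
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

one_hot = np.array([[1.0, 0.0, 0.0]], dtype=np.float32)

with np.errstate(divide="ignore", invalid="ignore"):
    raw_log = np.log(probs)                  # -inf where the probability is exactly 0
    raw_loss = -np.sum(one_hot * raw_log)    # 0 * -inf is nan, so the loss is nan

safe_log = np.log(probs + 1e-10)             # finite everywhere
safe_loss = -np.sum(one_hot * safe_log)

print(raw_loss, safe_loss)
```

Note that the loss becomes nan even though the true class has probability 1, because the zero entries of the one-hot label are multiplied by -inf. TensorFlow 1.x also provides `tf.nn.softmax_cross_entropy_with_logits`, which computes the loss from the logits directly; the article's code does not use it.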
01:train-loss:0.2149,acc:0.9291 test-loss:0.0469,acc:0.9846
02:train-loss:0.0581,acc:0.9822 test-loss:0.0331,acc:0.9892
03:train-loss:0.0417,acc:0.9876 test-loss:0.0250,acc:0.9918
04:train-loss:0.0355,acc:0.9892 test-loss:0.0242,acc:0.9913
05:train-loss:0.0287,acc:0.9913 test-loss:0.0216,acc:0.9924
06:train-loss:0.0250,acc:0.9924 test-loss:0.0212,acc:0.9929
07:train-loss:0.0227,acc:0.9931 test-loss:0.0210,acc:0.9932
08:train-loss:0.0200,acc:0.9936 test-loss:0.0259,acc:0.9919
09:train-loss:0.0181,acc:0.9943 test-loss:0.0218,acc:0.9924
10:train-loss:0.0166,acc:0.9948 test-loss:0.0221,acc:0.9925