This article uses TensorFlow to implement classification of the MNIST handwritten-digit dataset with an RNN (recurrent neural network).
1. Hyperparameter definitions
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Hyperparameter definitions
LEARNING_RATE = 0.001
TRAINING_ITER = 10000
BATCH_SIZE = 128
INPUT_DIMS = 28
HIDDEN_DIMS = 128
CLASSES_NUM = 10
TIME_STEPS = 28
Of these hyperparameters:
INPUT_DIMS : the dimension of the input at each time step; here it is 28, i.e. the number of pixels in one row of the image
HIDDEN_DIMS : the dimension of the state inside the RNN cell, i.e. the number of hidden units
CLASSES_NUM : the number of output classes, 10 in total
TIME_STEPS : the length of the whole sequence, here set to the number of rows in each image (see the small sketch after this list)
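As a small sketch of the data layout (illustration only, with dummy pixel values), each flattened 784-dimensional MNIST image is read as a sequence of TIME_STEPS rows, each an INPUT_DIMS-dimensional vector:

import numpy as np

image = np.zeros(784, dtype=np.float32)            # one flattened MNIST image (dummy values)
sequence = image.reshape(TIME_STEPS, INPUT_DIMS)   # 28 time steps, each a 28-pixel row
print(sequence.shape)                              # -> (28, 28)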
2. Network structure
This part implements the forward pass of the RNN.
# Initialization helpers for weights and biases, reused several times below
def weight_init(shape):
    return tf.Variable(tf.random_normal(shape=shape, stddev=0.1))

def bias_init(shape):  # initialize biases to a small non-zero value
    return tf.Variable(tf.zeros(shape=shape) + 0.01)

# Build the network structure (forward pass)
def rnn_inference(x):
    weight_xa = weight_init([INPUT_DIMS, HIDDEN_DIMS])
    bias_xa = bias_init([HIDDEN_DIMS])
    weight_ay = weight_init([HIDDEN_DIMS, CLASSES_NUM])
    bias_ay = bias_init([CLASSES_NUM])
    # project each 28-dimensional row into the hidden dimension
    x = tf.reshape(x, [-1, INPUT_DIMS])
    x_to_a = tf.matmul(x, weight_xa) + bias_xa
    x_to_a = tf.reshape(x_to_a, [-1, TIME_STEPS, HIDDEN_DIMS])
    rnn_cell = tf.nn.rnn_cell.BasicRNNCell(num_units=HIDDEN_DIMS)
    # state_init = rnn_cell.zero_state()
    # explicit state initialization is optional
    output, last_state = tf.nn.dynamic_rnn(rnn_cell, x_to_a, dtype=tf.float32)
    # take the output of the last time step for classification
    logits = tf.nn.softmax(tf.matmul(output[:, -1, :], weight_ay) + bias_ay, 1)
    return logits
Note that the RNN cell at every time step shares the same weights and biases, much like the convolution kernels in a CNN, so we only need to define weight_xa, bias_xa, weight_ay and bias_ay. From the principle of the recurrent neural network, the state update is:

$a^{\langle t \rangle} = \tanh\left(W_{aa}\, a^{\langle t-1 \rangle} + W_{xa}\, x^{\langle t \rangle} + b_a\right)$

In this example the state dimension is chosen as HIDDEN_DIMS. The recurrent weights $W_{aa}$ and the state update itself are maintained internally by TensorFlow's RNN cell, so we only have to supply the input projection: we first reshape the input x to shape [-1, INPUT_DIMS], multiply it by weight_xa and add bias_xa, and then reshape the result to [-1, TIME_STEPS, HIDDEN_DIMS], which is the input format the RNN cell in TensorFlow expects. For more detail on rnn_cell, dynamic_rnn() and rnn_cell.call(), see the TensorFlow documentation.

The output is:

$\hat{y} = \mathrm{softmax}\left(W_{ay}\, a^{\langle T \rangle} + b_{ay}\right)$

where $a^{\langle T \rangle}$ is the state of the last time step.
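As a rough illustration of what BasicRNNCell and dynamic_rnn do per time step, here is a minimal NumPy sketch under the standard simple-RNN assumption (W_in, W_aa and b_a stand for the cell's internal variables, not names from the code above):

import numpy as np

def rnn_step(x_t, a_prev, W_in, W_aa, b_a):
    # one time step of a simple RNN: a_t = tanh(x_t W_in + a_{t-1} W_aa + b_a)
    return np.tanh(x_t @ W_in + a_prev @ W_aa + b_a)

def unroll(x_seq, W_in, W_aa, b_a):
    # what dynamic_rnn does conceptually: start from a zero state and apply
    # rnn_step along the time axis; x_seq has shape (TIME_STEPS, input_dim)
    a = np.zeros(W_aa.shape[0])
    outputs = []
    for x_t in x_seq:
        a = rnn_step(x_t, a, W_in, W_aa, b_a)
        outputs.append(a)
    return np.stack(outputs), a   # all per-step outputs, and the last state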
3. Training
# Training function
def train():
    # Load the data
    mnist_data = input_data.read_data_sets("MNIST_data", one_hot=True)
    print(mnist_data.train.images.shape)
    # Placeholders for the inputs and the training labels
    x_in = tf.placeholder(dtype=tf.float32, shape=[None, 784], name="x_inputs")
    y_label = tf.placeholder(dtype=tf.float32, shape=[None, CLASSES_NUM], name="y_labels")
    y_out = rnn_inference(x_in)
    # Cross-entropy loss
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_label, logits=y_out))
    correct_pred = tf.equal(tf.argmax(y_label, 1), tf.argmax(y_out, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
    # Optimize with Adam
    optimizer = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE).minimize(loss)
    init_op = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init_op)
        for i in range(TRAINING_ITER):
            batch_x, batch_label = mnist_data.train.next_batch(BATCH_SIZE)
            sess.run(
                optimizer,
                feed_dict={
                    x_in: batch_x,
                    y_label: batch_label
                })
            if i % 100 == 0:
                # loss on the current batch, accuracy on the test set
                print("steps {0} , loss : {1} , accuracy : {2}".format(str(i), str(loss.eval(
                    feed_dict={
                        x_in: batch_x,
                        y_label: batch_label
                    })), str(accuracy.eval(
                    feed_dict={
                        x_in: mnist_data.test.images,
                        y_label: mnist_data.test.labels
                    }
                ))))
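For completeness, the script can then be started with a standard entry point (assuming the imports shown in Section 1):

if __name__ == "__main__":
    train()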
4. Training results
steps 0 , loss : 2.299897 , accuracy : 0.1486
steps 100 , loss : 1.7092526 , accuracy : 0.7546
steps 200 , loss : 1.5534915 , accuracy : 0.9055
steps 300 , loss : 1.5672985 , accuracy : 0.9162
steps 400 , loss : 1.5232389 , accuracy : 0.9295
steps 500 , loss : 1.4930712 , accuracy : 0.9482
steps 600 , loss : 1.520176 , accuracy : 0.9547
steps 700 , loss : 1.497025 , accuracy : 0.9545
steps 800 , loss : 1.4983183 , accuracy : 0.9562
steps 900 , loss : 1.5114753 , accuracy : 0.9644
steps 1000 , loss : 1.498097 , accuracy : 0.9687
steps 1100 , loss : 1.4864463 , accuracy : 0.9672
steps 1200 , loss : 1.5137687 , accuracy : 0.9637
steps 1300 , loss : 1.4695554 , accuracy : 0.9712
steps 1400 , loss : 1.4954376 , accuracy : 0.9706
steps 1500 , loss : 1.4733143 , accuracy : 0.9724
steps 1600 , loss : 1.4818668 , accuracy : 0.9722
steps 1700 , loss : 1.4771671 , accuracy : 0.9669
steps 1800 , loss : 1.4712167 , accuracy : 0.9741
steps 1900 , loss : 1.4700866 , accuracy : 0.9742
steps 2000 , loss : 1.4893398 , accuracy : 0.9721
steps 2100 , loss : 1.48666 , accuracy : 0.9742
steps 2200 , loss : 1.4840587 , accuracy : 0.9748
steps 2300 , loss : 1.488863 , accuracy : 0.9765
steps 2400 , loss : 1.4716485 , accuracy : 0.9735
steps 2500 , loss : 1.4644128 , accuracy : 0.9749
Process finished with exit code -1
5. Conclusion
From the results we can see that convergence is fast: after roughly 1500 gradient-descent steps the test accuracy already reaches 97%. However, even after 10000 iterations the final classification accuracy only reaches about 98%, which still leaves a noticeable gap to the roughly 99.2% achievable with a CNN. Two possible improvements: 1) replace the basic RNN cell with an LSTM (long short-term memory) cell; 2) use a multi-layer RNN, i.e. a deep recurrent neural network (a sketch follows below).
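For the second improvement, several cells can be stacked with tf.nn.rnn_cell.MultiRNNCell. The following is only a hypothetical sketch (deep_rnn_inference and num_layers are illustrative names, and unlike rnn_inference above it feeds the raw 28-pixel rows directly into the first cell instead of projecting them with weight_xa); it was not part of this experiment:

def deep_rnn_inference(x, num_layers=2):
    weight_ay = weight_init([HIDDEN_DIMS, CLASSES_NUM])
    bias_ay = bias_init([CLASSES_NUM])
    # reshape the flat 784-pixel image into a sequence of 28 rows of 28 pixels
    x = tf.reshape(x, [-1, TIME_STEPS, INPUT_DIMS])
    # stack several basic RNN cells into one deep recurrent network
    cells = [tf.nn.rnn_cell.BasicRNNCell(num_units=HIDDEN_DIMS) for _ in range(num_layers)]
    stacked_cell = tf.nn.rnn_cell.MultiRNNCell(cells)
    output, last_state = tf.nn.dynamic_rnn(stacked_cell, x, dtype=tf.float32)
    # classify from the top layer's output at the last time step
    return tf.nn.softmax(tf.matmul(output[:, -1, :], weight_ay) + bias_ay, 1)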
Appendix
Using an LSTM
def lstm_inference(x):
    weight_xa = weight_init([INPUT_DIMS, HIDDEN_DIMS])
    bias_xa = bias_init([HIDDEN_DIMS])
    weight_ay = weight_init([HIDDEN_DIMS, CLASSES_NUM])
    bias_ay = bias_init([CLASSES_NUM])
    x = tf.reshape(x, [-1, INPUT_DIMS])
    x_in = tf.matmul(x, weight_xa) + bias_xa
    x_in = tf.reshape(x_in, [-1, TIME_STEPS, HIDDEN_DIMS])
    # Define the LSTM cell
    lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(HIDDEN_DIMS)
    output, last_state = tf.nn.dynamic_rnn(lstm_cell, x_in, dtype=tf.float32)
    # Take the output of the last time step
    logits = tf.nn.softmax(tf.matmul(output[:, -1, :], weight_ay) + bias_ay, 1)
    return logits
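To try the LSTM version, simply replace the call y_out = rnn_inference(x_in) in train() with y_out = lstm_inference(x_in); the rest of the training code stays unchanged.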
If you spot any mistakes, please point them out. Many thanks!