Next, let's switch directly to a BP network with one hidden layer: 100 hidden nodes, with tanh as the activation function.
The code is as follows:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
# The official MNIST example already handles downloading, splitting, and float conversion of the data; the source can be found in the TensorFlow source tree
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
# limit the fraction of GPU memory this process may use
# without a GPU, simply use sess = tf.Session()
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.95)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
# size of each mini-batch
batch_size = 100
# number of training epochs
train_epoch = 10
# evaluate once every n epochs
test_epoch_n = 1
# total number of batches per epoch
n_batch = mnist.train.num_examples // batch_size
print("batch_size=" + str(batch_size) + " n_batch=" + str(n_batch))
# placeholders defining the inputs and outputs
x = tf.placeholder(tf.float32,[None, 784])
y = tf.placeholder(tf.float32,[None, 10])
# weights initialized from a truncated normal (stddev=0.1), biases initialized to zero
W1 = tf.Variable(tf.truncated_normal([784,100],stddev=0.1))
b1 = tf.Variable(tf.zeros([100]))
L1 = tf.nn.tanh(tf.matmul(x,W1)+b1)
W2 = tf.Variable(tf.truncated_normal([100,10],stddev=0.1))
b2 = tf.Variable(tf.zeros([10]))
L2 = tf.nn.tanh(tf.matmul(L1,W2)+b2)
# the network structure is defined above; softmax turns the output layer into probabilities
prediction = tf.nn.softmax(L2)
# the loss function is cross-entropy (note: softmax_cross_entropy_with_logits expects unscaled logits)
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y,logits=prediction))
# training method:
#train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
train_step = tf.train.AdamOptimizer(1e-2).minimize(cross_entropy)
# initialize all variables in the session
init = tf.global_variables_initializer()
sess.run(init)
MaxACC = 0  # best test accuracy so far
saver = tf.train.Saver()
# train for train_epoch epochs
for epoch in range(train_epoch):
    for batch in range(n_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        sess.run(train_step, feed_dict = {x: batch_xs, y: batch_ys})
    if(0==(epoch%test_epoch_n)):  # evaluate on the test set every test_epoch_n epochs
        # compute accuracy on the test set
        correct_prediction = tf.equal(tf.argmax(prediction,1), tf.argmax(y,1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        now_acc=sess.run(accuracy, feed_dict={x:mnist.test.images, y: mnist.test.labels})
        print('epoch=',epoch,'ACC=',now_acc,'train acc =',sess.run(accuracy, feed_dict={x:mnist.train.images, y: mnist.train.labels}))
        if(now_acc>MaxACC):
            MaxACC = now_acc
            saver.save(sess, "Model/ModelSoftmax.ckpt")
            print('Save model! Now ACC=',MaxACC)
# compute the final accuracy on the test set
correct_prediction = tf.equal(tf.argmax(prediction,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print('Train OK! epoch=',epoch,'ACC=',sess.run(accuracy, feed_dict={x:mnist.test.images, y: mnist.test.labels}))
# close the session
sess.close()
# load the saved model
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.95)
with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
    saver.restore(sess, "./Model/ModelSoftmax.ckpt")  # note the "./" prefix on the path here
    print('Load Model OK!')
    print('ACC=',sess.run(accuracy, feed_dict={x:mnist.test.images, y: mnist.test.labels}))
Final results:
epoch= 0 ACC= 0.9566 train acc = 0.964236
Save model! Now ACC= 0.9566
epoch= 1 ACC= 0.9534 train acc = 0.961218
epoch= 2 ACC= 0.9582 train acc = 0.966436
Save model! Now ACC= 0.9582
epoch= 3 ACC= 0.9558 train acc = 0.964545
epoch= 4 ACC= 0.9573 train acc = 0.9676
epoch= 5 ACC= 0.9572 train acc = 0.965782
epoch= 6 ACC= 0.9605 train acc = 0.970691
Save model! Now ACC= 0.9605
epoch= 7 ACC= 0.9538 train acc = 0.967218
epoch= 8 ACC= 0.9595 train acc = 0.9688
epoch= 9 ACC= 0.9581 train acc = 0.968018
Train OK! epoch= 9 ACC= 0.9581
Experiments:
1) Changed the activation functions to relu+relu: acc=0.5857. Why did it get so much worse? (See the relu sketch after this list.)
2) Changed the activation functions to tanh+relu: acc=0.9582
3) Changed the activation functions to relu+tanh: acc=0.9582
4) Changed the activation functions to tanh+relu: acc=0.9626
5) Changed the activation functions to sigmoid+sigmoid: acc=0.9679
6) Changed the activation functions to sigmoid+relu: acc=0.3853, terrible!
7) Changed the activation functions to sigmoid+tanh: acc=0.7853
8) Changed the bias initialization to
b1 = tf.Variable(tf.zeros([100])+0.1)
With relu+relu activations this gives acc=0.8658, which shows that with relu you must avoid the dead-node problem! (See the relu sketch after this list.)
9) Changed the network to 784->400->100->10 with sigmoid+sigmoid+sigmoid (see the sketch after this list):
ACC=0.9753! The network structure has a very large impact!
10) 784->400->100->10, batch_size = 20:
acc=0.7896, so batch_size also has a large effect
11) 784->400->100->10, batch_size = 200:
acc=0.979
12) Changed the network to 784->400->200->10 with sigmoid+sigmoid+sigmoid:
ACC=0.9772! The more complex the network, the stronger its fitting capacity, but it also becomes more prone to overfitting
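Experiments 1 and 8 only touch the two activation calls and the b1 initialization; the rest of the training script above stays the same. Here is a minimal sketch of the relu+relu variant with the small positive bias from experiment 8. It assumes only b1 gets the +0.1 offset, as in the snippet in that item; whether b2 should be offset as well is not stated in the original.

# relu + relu variant; the small positive bias keeps the hidden relu units from starting out dead
W1 = tf.Variable(tf.truncated_normal([784,100],stddev=0.1))
b1 = tf.Variable(tf.zeros([100])+0.1)
L1 = tf.nn.relu(tf.matmul(x,W1)+b1)
W2 = tf.Variable(tf.truncated_normal([100,10],stddev=0.1))
b2 = tf.Variable(tf.zeros([10]))
L2 = tf.nn.relu(tf.matmul(L1,W2)+b2)
prediction = tf.nn.softmax(L2)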
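Experiments 9 through 12 replace the single hidden layer with two hidden layers. Below is a minimal sketch of the 784->400->100->10 sigmoid variant from experiment 9, assuming the same initialization style as the script above; the W3/b3/L3 names are mine, and the loss, optimizer, and training loop are assumed unchanged.

# two hidden layers: 784 -> 400 -> 100 -> 10, all sigmoid
W1 = tf.Variable(tf.truncated_normal([784,400],stddev=0.1))
b1 = tf.Variable(tf.zeros([400]))
L1 = tf.nn.sigmoid(tf.matmul(x,W1)+b1)
W2 = tf.Variable(tf.truncated_normal([400,100],stddev=0.1))
b2 = tf.Variable(tf.zeros([100]))
L2 = tf.nn.sigmoid(tf.matmul(L1,W2)+b2)
W3 = tf.Variable(tf.truncated_normal([100,10],stddev=0.1))
b3 = tf.Variable(tf.zeros([10]))
L3 = tf.nn.sigmoid(tf.matmul(L2,W3)+b3)
prediction = tf.nn.softmax(L3)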