Just a lab assignment for the "Artificial Intelligence" elective course.
This experiment uses the MNIST dataset, a collection of handwritten-digit images and their corresponding labels. All images are 28x28 pixels and have been preprocessed so that the digit sits at the center of the image. The dataset is stored in binary form: each image is a one-dimensional vector of length 784 (28x28x1, i.e. a single-channel grayscale image 28 pixels on a side), and each label is a one-dimensional vector of length 10 (one-hot encoded).
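For reference, a minimal sketch of how this dataset is commonly loaded in TensorFlow 1.x (the "MNIST_data/" download directory is an assumption, not necessarily the experiment's actual path):

    from tensorflow.examples.tutorials.mnist import input_data

    # one_hot=True returns labels as length-10 one-hot vectors, matching the description above
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
    print(mnist.train.images.shape)  # (55000, 784): flattened 28x28 grayscale images
    print(mnist.train.labels.shape)  # (55000, 10): one-hot labels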
Stratified sampling (also called typed sampling) divides the population into classes and then samples within each class separately. Because of the class division, the sample's class distribution resembles the population's, making it more representative. In this experiment, MNIST contains the 10 digits 0-9, so for stratified sampling the data is first split into 10 classes by digit, and each class is then sampled in the same way.
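To make the procedure concrete, a minimal sketch of stratified sampling by digit class (stratified_sample is a hypothetical helper for illustration; the experiments below rely on sklearn's stratify= option instead):

    import numpy as np

    def stratified_sample(images, labels, fraction):
        """Draw the same fraction from each of the 10 digit classes; labels are one-hot (N, 10)."""
        digits = np.argmax(labels, axis=1)  # recover class ids from one-hot labels
        picked = []
        for d in range(10):
            idx = np.where(digits == d)[0]                                # indices of class d
            idx = np.random.permutation(idx)[:int(len(idx) * fraction)]  # sample within the class
            picked.append(idx)
        picked = np.concatenate(picked)
        return images[picked], labels[picked]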
Usually, we evaluate a neural network's error experimentally. A test set is used to measure the model's ability to classify new samples, and the test error on this set serves as an approximation of the true error. Two common ways to split the data into training and test sets:
Hold-out: the dataset is split directly, by a given ratio, into two mutually exclusive sets. To keep the data distributions as consistent as possible, the split can use stratified sampling so that the class proportions in the training and test sets are similar. Note that if the test set is not distributed evenly over the whole dataset, extra bias can still be introduced, so a single hold-out estimate is often not stable or reliable. In practice, hold-out is typically repeated over several random splits, and the average of the evaluations is reported as the result.
k-fold cross validation: the dataset is first partitioned into k mutually exclusive subsets of similar size, each preserving the data distribution as much as possible (again via stratified sampling). In each round, the union of k-1 subsets serves as the training set and the remaining subset as the test set, yielding k train/test pairs and hence k rounds of training and testing; the mean of the k test results is returned. Clearly, the stability and fidelity of the estimate depend heavily on the value of k. The most common choice is k=10, with 5 and 20 also common. For example, with k=10 on MNIST's 70,000 images, each fold holds 7,000 images and each round trains on the remaining 63,000.
This section describes the program's overall design and the approach to the key steps, mainly:
Build a fully connected neural network with layer sizes 784 -> 128 -> 128 -> 10.
Use the Adam optimizer, with softmax cross-entropy as the loss.
See the code comments for details.
# Build and train the model
def train_and_test(images_train, labels_train, images_test, labels_test, images_validation, labels_validation):
    x = tf.placeholder(tf.float32, [None, 784], name="X")  # input images, flattened to 784
    y = tf.placeholder(tf.float32, [None, 10], name="Y")   # one-hot labels
    h1 = fcn_layer(inputs=x,
                   input_dim=784,
                   output_dim=128,
                   activation=tf.nn.relu)                  # first hidden layer
    h2 = fcn_layer(inputs=h1,
                   input_dim=128,
                   output_dim=128,
                   activation=tf.nn.relu)                  # second hidden layer
    forward = fcn_layer(inputs=h2,
                        input_dim=128,
                        output_dim=10,
                        activation=None)                   # output logits
    pred = tf.nn.softmax(forward)                          # class probabilities
    loss_function = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=forward, labels=y))  # softmax cross-entropy loss
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss_function)  # Adam optimizer
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))  # compare predictions with ground truth
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    # ... (the training loop shown below runs inside this function; it returns the final test-set accuracy)
The fcn_layer helper used above:
def fcn_layer(inputs,            # input tensor
              input_dim,         # number of input neurons
              output_dim,        # number of output neurons
              activation=None):  # activation function
    W = tf.Variable(tf.truncated_normal(
        [input_dim, output_dim], stddev=0.1))  # initialize weights from a truncated normal
    b = tf.Variable(tf.zeros([output_dim]))    # initialize biases to zero
    XWb = tf.matmul(inputs, W) + b             # affine transform
    return XWb if activation is None else activation(XWb)
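The weights are drawn from a truncated normal with a small standard deviation (0.1) so that initial pre-activations stay small, which works well with ReLU; the biases start at zero.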
Training hyperparameters (see the code comments):
train_epochs = 32      # number of training epochs
batch_size = 64        # samples per training step (batch size)
display_step = 4096    # how often to report progress
learning_rate = 0.001  # learning rate
The training loop (inside train_and_test), evaluating on the validation set every display_step steps (see the code comments):
with tf.Session() as sess:
    init = tf.global_variables_initializer()  # variable initializer
    sess.run(init)
    step = 0
    for (batchImages, batchLabels) in batch_iter(images_train, labels_train, batch_size, train_epochs, shuffle=True):
        sess.run(optimizer, feed_dict={x: batchImages, y: batchLabels})  # one optimization step
        if step % display_step == 0:
            loss, acc = sess.run([loss_function, accuracy],
                                 feed_dict={x: images_validation, y: labels_validation})  # validation check
            print(f"step: {step+1} Loss={loss} accuracy={acc}")
        step += 1
Output:
step: 1 Loss=2.238192558288574 accuracy=0.17159998416900635
step: 4097 Loss=0.09725397080183029 accuracy=0.9717997312545776
step: 8193 Loss=0.10235630720853806 accuracy=0.9781997203826904
step: 12289 Loss=0.13071678578853607 accuracy=0.9735997915267944
step: 16385 Loss=0.12960655987262726 accuracy=0.9757996797561646
step: 20481 Loss=0.14140461385250092 accuracy=0.9765996932983398
step: 24577 Loss=0.16358020901679993 accuracy=0.9759997129440308
=== test accuracy: 0.97 ===
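The batch_iter generator used in the training loop is not shown in the report. A minimal sketch of one possible implementation, assuming it yields shuffled mini-batches for the requested number of epochs:

    import numpy as np

    def batch_iter(images, labels, batch_size, num_epochs, shuffle=True):
        """Yield (images, labels) mini-batches over num_epochs passes through the data."""
        n = len(images)
        for _ in range(num_epochs):
            order = np.random.permutation(n) if shuffle else np.arange(n)  # reshuffle each epoch
            for start in range(0, n, batch_size):
                batch = order[start:start + batch_size]
                yield images[batch], labels[batch]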
Implemented with train_test_split from sklearn. The training and test sets are drawn randomly ten times and the resulting accuracies are averaged. See the code comments for details.
# Hold-out method
from sklearn.model_selection import train_test_split

def hold_out(images, labels, train_percentage):
    accu = []
    # draw the train/test split randomly ten times and average the results
    for _ in range(10):
        train_images, test_images, train_labels, test_labels = \
            train_test_split(images,
                             labels,
                             train_size=train_percentage,  # training-set fraction
                             stratify=labels)              # keep the class distribution (stratified sampling)
        accu.append(train_and_test(train_images, train_labels, test_images, test_labels, test_images, test_labels))
    print("hold-out accuracy:", accu)
Implemented with KFold from sklearn. The accuracies obtained from the k different folds are averaged. See the code comments for details.
# k-fold cross validation
from sklearn.model_selection import KFold
import numpy as np

def cross_validation(images, labels, k):
    accu = []
    kf = KFold(n_splits=k, shuffle=True)  # k mutually exclusive folds
    for train_index, test_index in kf.split(images):
        images_train, images_test = images[train_index], images[test_index]
        labels_train, labels_test = labels[train_index], labels[test_index]
        accu.append(train_and_test(images_train, labels_train, images_test, labels_test, images_test, labels_test))
    print("cross-validation accuracy:", np.mean(accu))  # mean over the k test results
This section presents the program's runs, output, and analysis, mainly including:
step: 1 Loss=2.238192558288574 accuracy=0.17159998416900635
step: 4097 Loss=0.09725397080183029 accuracy=0.9717997312545776
step: 8193 Loss=0.10235630720853806 accuracy=0.9781997203826904
step: 12289 Loss=0.13071678578853607 accuracy=0.9735997915267944
step: 16385 Loss=0.12960655987262726 accuracy=0.9757996797561646
step: 20481 Loss=0.14140461385250092 accuracy=0.9765996932983398
step: 24577 Loss=0.16358020901679993 accuracy=0.9759997129440308
=== test accuracy: 0.97 ===
step: 1 Loss=2.5073938369750977 accuracy=0.0729999914765358
step: 4097 Loss=0.27769413590431213 accuracy=0.9217997789382935
step: 8193 Loss=0.26662880182266235 accuracy=0.9259997010231018
step: 12289 Loss=0.263393372297287 accuracy=0.9231997728347778
step: 16385 Loss=0.26742368936538696 accuracy=0.9237997531890869
step: 20481 Loss=0.26651620864868164 accuracy=0.9251997470855713
step: 24577 Loss=0.26798802614212036 accuracy=0.9247996807098389
=== test accuracy: 0.9248 ===
0.92479974
step: 1 Loss=2.4127447605133057 accuracy=0.09719999134540558
step: 4097 Loss=0.08607088774442673 accuracy=0.9745997190475464
step: 8193 Loss=0.07784661650657654 accuracy=0.9785997271537781
step: 12289 Loss=0.095745749771595 accuracy=0.9759998321533203
step: 16385 Loss=0.09472983330488205 accuracy=0.9799997210502625
step: 20481 Loss=0.09713517129421234 accuracy=0.9787996411323547
step: 24577 Loss=0.0993366464972496 accuracy=0.9801996946334839
=== test accuracy: 0.9802 ===
0.98019964
step: 1 Loss=2.238192558288574 accuracy=0.17159998416900635
step: 4097 Loss=0.09725397080183029 accuracy=0.9717997312545776
step: 8193 Loss=0.10235630720853806 accuracy=0.9781997203826904
step: 12289 Loss=0.13071678578853607 accuracy=0.9735997915267944
step: 16385 Loss=0.12960655987262726 accuracy=0.9757996797561646
step: 20481 Loss=0.14140461385250092 accuracy=0.9765996932983398
step: 24577 Loss=0.16358020901679993 accuracy=0.9759997129440308
=== test accuracy: 0.97 ===
From the runs above we can conclude: more hidden layers mean longer training but higher accuracy; the gains, however, diminish as further layers are added.
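For illustration, adding a hidden layer only requires chaining one more fcn_layer call; a sketch of a 784->128->128->128->10 variant (h3 is a hypothetical name, and this is one possible configuration rather than the report's exact setup):

    h3 = fcn_layer(inputs=h2,
                   input_dim=128,
                   output_dim=128,
                   activation=tf.nn.relu)  # extra hidden layer
    forward = fcn_layer(inputs=h3,
                        input_dim=128,
                        output_dim=10,
                        activation=None)   # output layer now reads from h3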
step: 1 Loss=2.300844669342041 accuracy=0.12519998848438263
step: 4097 Loss=0.2754775583744049 accuracy=0.9239997863769531
step: 8193 Loss=0.24036210775375366 accuracy=0.9319997429847717
step: 12289 Loss=0.22833241522312164 accuracy=0.9349997639656067
step: 16385 Loss=0.22694511711597443 accuracy=0.9351996779441833
step: 20481 Loss=0.2160138636827469 accuracy=0.9395997524261475
step: 24577 Loss=0.20927678048610687 accuracy=0.9417997598648071
=== test accuracy: 0.9392 ===
0.93919969
step: 1 Loss=2.302095890045166 accuracy=0.10459998995065689
step: 4097 Loss=0.24206139147281647 accuracy=0.9285997152328491
step: 8193 Loss=0.19353719055652618 accuracy=0.9429997801780701
step: 12289 Loss=0.18354550004005432 accuracy=0.9491997361183167
step: 16385 Loss=0.18149533867835999 accuracy=0.9485996961593628
step: 20481 Loss=0.1877274215221405 accuracy=0.9493997097015381
step: 24577 Loss=0.1913667917251587 accuracy=0.951799750328064
=== test accuracy: 0.9548 ===
0.95479971
step: 1 Loss=2.238192558288574 accuracy=0.17159998416900635
step: 4097 Loss=0.09725397080183029 accuracy=0.9717997312545776
step: 8193 Loss=0.10235630720853806 accuracy=0.9781997203826904
step: 12289 Loss=0.13071678578853607 accuracy=0.9735997915267944
step: 16385 Loss=0.12960655987262726 accuracy=0.9757996797561646
step: 20481 Loss=0.14140461385250092 accuracy=0.9765996932983398
step: 24577 Loss=0.16358020901679993 accuracy=0.9759997129440308
=== test accuracy: 0.97 ===
From the runs above we can conclude:
More hidden-layer nodes mean more parameters (the hidden-to-hidden weight matrix grows quadratically with layer width) and longer training, but higher accuracy.
(Beyond a certain parameter count, however, the accuracy gains level off and overfitting can even set in.)
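As a rough check on the parameter count: the 784->128->128->10 network has 784·128 + 128·128 + 128·10 weights plus 128+128+10 biases, about 118k parameters in total; doubling the hidden width to 256 gives about 269k, with the hidden-to-hidden weight matrix alone quadrupling from 16,384 to 65,536 entries.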
step: 1 Loss=2.238192558288574 accuracy=0.17159998416900635
step: 4097 Loss=0.09725397080183029 accuracy=0.9717997312545776
step: 8193 Loss=0.10235630720853806 accuracy=0.9781997203826904
step: 12289 Loss=0.13071678578853607 accuracy=0.9735997915267944
step: 16385 Loss=0.12960655987262726 accuracy=0.9757996797561646
step: 20481 Loss=0.14140461385250092 accuracy=0.9765996932983398
step: 24577 Loss=0.16358020901679993 accuracy=0.9759997129440308
=== test accuracy: 0.97 ===
step: 1 Loss=2.2310731410980225 accuracy=0.16619999706745148
=== test accuracy: 0.9718 ===
0.97179973
step: 1 Loss=2.2919344902038574 accuracy=0.15059998631477356
=== test accuracy: 0.8782 ===
0.87819982
From the runs above we can conclude: a larger batch size makes training faster (fewer steps per epoch).
But an overly large batch consumes too much GPU memory, possibly overflowing it, and also weakens the stochasticity that benefits stochastic gradient descent.
step: 1 Loss=2.343087673187256 accuracy=0.11779998987913132
step: 129 Loss=0.330795556306839 accuracy=0.9037997722625732
step: 257 Loss=0.24847447872161865 accuracy=0.925399661064148
step: 385 Loss=0.198109969496727 accuracy=0.9409997463226318
step: 513 Loss=0.17987975478172302 accuracy=0.9479997754096985
step: 641 Loss=0.1628917008638382 accuracy=0.9541996717453003
step: 769 Loss=0.14910820126533508 accuracy=0.9547997117042542
=== test accuracy: 0.9526 ===
0.95259976
step: 1 Loss=2.238192558288574 accuracy=0.17159998416900635
step: 4097 Loss=0.09725397080183029 accuracy=0.9717997312545776
step: 8193 Loss=0.10235630720853806 accuracy=0.9781997203826904
step: 12289 Loss=0.13071678578853607 accuracy=0.9735997915267944
step: 16385 Loss=0.12960655987262726 accuracy=0.9757996797561646
step: 20481 Loss=0.14140461385250092 accuracy=0.9765996932983398
step: 24577 Loss=0.16358020901679993 accuracy=0.9759997129440308
=== test accuracy: 0.97 ===
From the runs above we can conclude: the number of epochs controls how much data is processed before training stops. Training should stop only after the model's accuracy has stabilized; stopping earlier leaves accuracy below the achievable value.
step: 1 Loss=2.328317642211914 accuracy=0.09679999947547913
step: 129 Loss=1.5215153694152832 accuracy=0.7011998891830444
step: 257 Loss=0.8109210133552551 accuracy=0.8205997943878174
step: 385 Loss=0.5582057237625122 accuracy=0.863199770450592
step: 513 Loss=0.4527219235897064 accuracy=0.8837997317314148
step: 641 Loss=0.39591166377067566 accuracy=0.8925997614860535
step: 769 Loss=0.3588014245033264 accuracy=0.8997997641563416
=== test accuracy: 0.9004 ===
0.9003998
step: 1 Loss=2.343087673187256 accuracy=0.11779998987913132
step: 129 Loss=0.330795556306839 accuracy=0.9037997722625732
step: 257 Loss=0.24847447872161865 accuracy=0.925399661064148
step: 385 Loss=0.198109969496727 accuracy=0.9409997463226318
step: 513 Loss=0.17987975478172302 accuracy=0.9479997754096985
step: 641 Loss=0.1628917008638382 accuracy=0.9541996717453003
step: 769 Loss=0.14910820126533508 accuracy=0.9547997117042542
=== test accuracy: 0.9526 ===
0.95259976
step: 1 Loss=2.3110170364379883 accuracy=0.2709999680519104
step: 129 Loss=0.2452460527420044 accuracy=0.9277997016906738
step: 257 Loss=0.215981587767601 accuracy=0.9361997246742249
step: 385 Loss=0.21104326844215393 accuracy=0.9363997578620911
step: 513 Loss=0.172766774892807 accuracy=0.9469997882843018
step: 641 Loss=0.14438582956790924 accuracy=0.9573997855186462
step: 769 Loss=0.15849816799163818 accuracy=0.9527996778488159
=== test accuracy: 0.9558 ===
0.95579976
step: 1 Loss=43.770484924316406 accuracy=0.10619999468326569
step: 129 Loss=1.7850791215896606 accuracy=0.2809999883174896
step: 257 Loss=1.7752128839492798 accuracy=0.3105999827384949
step: 385 Loss=1.719871997833252 accuracy=0.3147999942302704
step: 513 Loss=1.6704318523406982 accuracy=0.3511999845504761
step: 641 Loss=1.6277217864990234 accuracy=0.34059998393058777
step: 769 Loss=1.8401107788085938 accuracy=0.2733999788761139
=== test accuracy: 0.2738 ===
0.27379999
From the runs above we can conclude:
A smaller learning rate yields higher final accuracy but slower convergence.
A larger learning rate speeds up learning but is prone to oscillation (at the largest rate tested, the loss stays high and accuracy never recovers).
print("===== hold-out =====")
print("train_percentage: 0.8: ", end='')
hold_out(total_images, total_labels, 0.8)
print("train_percentage: 0.9: ", end='')
hold_out(total_images, total_labels, 0.9)
print("train_percentage: 0.5: ", end='')
hold_out(total_images, total_labels, 0.5)
print("train_percentage: 0.2: ", end='')
hold_out(total_images, total_labels, 0.2)
Results:
===== hold-out =====
train_percentage: 0.8:
hold-out accuracy: [0.97072774, 0.974455, 0.97690958, 0.97781873, 0.97545499, 0.96990955, 0.97654593, 0.97209132, 0.97390956, 0.97772777]
train_percentage: 0.9:
hold-out accuracy: [0.97945446, 0.97327256, 0.97381806, 0.97654533, 0.97981811, 0.97472721, 0.97472715, 0.97327256, 0.97436351, 0.97163624]
train_percentage: 0.5:
hold-out accuracy: [0.97501898, 0.97083724, 0.97414637, 0.97454625, 0.97087359, 0.97469169, 0.97520077, 0.96894628, 0.97367346, 0.97367346]
train_percentage: 0.2:
hold-out accuracy: [0.96202344, 0.95893252, 0.95929617, 0.95911437, 0.95968258, 0.95965987, 0.96059167, 0.96009171, 0.96056885, 0.95806891]
From the results above we can conclude: extreme values of train_percentage make the evaluation less convincing (too small a training set starves the model; too small a test set makes the estimate noisy); a value around 0.8 works best.
print("===== cross-validation =====")
print("k=5: ", end='')
cross_validation(total_images, total_labels, 5)
print("k=10: ", end='')
cross_validation(total_images, total_labels, 10)
print("k=20: ", end='')
cross_validation(total_images, total_labels, 20)
print("k=2: ", end='')
cross_validation(total_images, total_labels, 2)
Results:
===== cross-validation =====
k=5:
cross-validation accuracy: 0.975146
k=10:
cross-validation accuracy: 0.976927
k=20:
cross-validation accuracy: 0.977491
k=2:
cross-validation accuracy: 0.973474
From the results above we can conclude: k-fold cross-validation is somewhat more robust to its parameter k than the hold-out method is to its split ratio.
Using the MNIST handwritten-digit dataset to train a simple digit-recognition neural network as an example, I learned techniques for training fully connected networks with TensorFlow and explored how various hyperparameters affect both the training process and the results.
I also tried the hold-out and k-fold cross-validation model-evaluation methods and explored how their parameters affect the evaluation results.
Installing TensorFlow cleanly with Anaconda (no manual installation of CUDA, cuDNN, etc.):
conda create --name tf_gpu_env python=3.6 anaconda tensorflow-gpu
Reference: "不踩坑:Ubuntu下安装TensorFlow的最简单方法(无需手动安装CUDA和cuDNN)" - 知乎 (zhihu.com)
Fixing an issue encountered when running Jupyter:
Reference: "彻底解决:AttributeError: type object IOLoop has no attribute initialized" - Joyyang_c的博客, CSDN博客