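Contents
1. Preface
1.1 Introduction to the Fashion MNIST dataset
1.2 Introduction to LeNet-5
2. Building a LeNet convolutional network with TF2.0 to classify Fashion MNIST
2.1 Reading the data
2.2 Building the LeNet-5 network structure
2.3 Training
2.4 Validation and testing
2.5 Saving the model
2.6 Prediction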
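This post builds a simple neural network model (LeNet-5) with TensorFlow 2.0 to classify the Fashion MNIST dataset, as a way to get familiar with the overall deep learning workflow and with the TensorFlow 2.0 API.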
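Study notes:
1. Setting up a TensorFlow 2.0 deep learning development environment (Ubuntu/Win10)
2. Project 1: Building a Fashion MNIST classifier with TF2.0 + LeNet-5
3. Project 2: Building a Fashion MNIST classifier with TF2.0 + AlexNet
4. Project 3: Building a Fashion MNIST classifier with TF2.0 + VGG-16
5. Project 4: Building a Fashion MNIST classifier with TF2.0 + ResNet
6. Project 5: Building a Fashion MNIST classifier with TF2.0 + MobileNet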
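Most people who have dabbled in deep learning have come across MNIST; for many, the very first demo is classifying the MNIST handwritten digit dataset. Fashion MNIST is an image dataset designed as a replacement for the MNIST handwritten digits. It is provided by the research division of Zalando (a German fashion technology company) and contains front-view images of 70,000 different products from 10 categories. The size, format, and training/test split of Fashion MNIST are identical to the original MNIST: 60,000 training images and 10,000 test images, each a 28x28x1 grayscale image. Dataset download links:
MNIST: http://yann.lecun.com/exdb/mnist/
Fashion MNIST: https://github.com/zalandoresearch/fashion-mnist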
label | class |
0 | T-shirt/top |
1 | Trouser |
2 | Pullover |
3 | Dress |
4 | Coat |
5 | Sandal |
6 | Shirt |
7 | Sneaker |
8 | Bag |
9 | Ankle boot |
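LeNet-5 is one of the earliest convolutional neural networks, proposed in 1998 (paper link). Computers were slow at the time, so the whole network was designed to be small, with about 60,000 parameters in total. Small as it is, it contains all the commonly used basic operations: convolution layers, pooling, fully connected layers, and activation functions. It proved very effective on handwritten character recognition. Its network structure is shown in the figure below.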
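The whole process can be divided into the following parts:
(1) Reading the data
(2) Building the model
(3) Training
(4) Saving the model
(5) Validation
(6) Prediction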
(1) Reading online: adding the following code to the project downloads the data automatically:
(train_images, train_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()
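(2) Reading local data
Because downloading the data online is slow, it is recommended to download it yourself, save it locally, and load it with numpy (one big advantage of TF2.0: data can be loaded and preprocessed directly with numpy, with no need to convert it to tfrecords files as was common in TF1.x). Once downloaded, the Fashion MNIST dataset consists of the following four files:
Training images (60000): train-images-idx3-ubyte.gz
Training labels (60000): train-labels-idx1-ubyte.gz
Test images (10000): t10k-images-idx3-ubyte.gz
Test labels (10000): t10k-labels-idx1-ubyte.gz
Create mnsit_read.py to read the files; the code is as follows: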
'''
data_path: local path where the dataset is saved
kind: dataset split (training set: kind='train', test set: kind='t10k')
'''
import os
import gzip
import struct
import numpy as np

def load_mnist(data_path, kind='train'):
    '''Load MNIST data from `data_path`'''
    labels_path = os.path.join(data_path, '%s-labels-idx1-ubyte.gz' % kind)
    images_path = os.path.join(data_path, '%s-images-idx3-ubyte.gz' % kind)
    print(images_path, labels_path)
    with gzip.open(labels_path, 'rb') as lbpath:
        data_file = lbpath.read()
    # Read the first 2 big-endian integers (magic number, item count); returns a tuple
    _, nums = struct.unpack_from('>II', data_file, 0)
    labels = np.frombuffer(data_file, dtype=np.uint8, offset=8)
    print("{} labels:{}".format(kind, nums))
    with gzip.open(images_path, 'rb') as imgpath:
        data_file = imgpath.read()
    # Read the first 4 big-endian integers (magic number, item count, rows, columns)
    _, nums, rows, cols = struct.unpack_from('>IIII', data_file, 0)
    images = np.frombuffer(data_file,
                           dtype=np.uint8, offset=16).reshape(len(labels), rows, cols)
    print("{} datas shape:({},{},{})".format(kind, nums, rows, cols))
    return images, labels

def read_mnsit_test():
    import matplotlib.pyplot as plt
    class_names = ['t_shirt_top', 'trouser', 'pullover', 'dress', 'coat', 'sandal', 'shirt', 'sneaker', 'bag', 'ankle_boots']
    data_path = "../data/fashion/"
    x_train, y_train = load_mnist(data_path, 'train')
    x_test, y_test = load_mnist(data_path, 't10k')
    fig = plt.figure("fashion-mnist")
    showCOLS = 5
    showROWS = 5
    # Show the first 25 training images in a 5x5 grid, each labelled with its class name
    for figCnt in range(showCOLS*showROWS):
        fig.add_subplot(showROWS, showCOLS, figCnt+1)
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(x_train[figCnt])
        plt.xlabel(class_names[y_train[figCnt]])
    plt.show()

if __name__ == '__main__':
    read_mnsit_test()
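The header layout that load_mnist relies on can be checked without downloading anything. The following is a minimal sketch and not part of the original code: it builds a tiny gzip-compressed IDX image file in memory and parses it the same way, reading four big-endian unsigned 32-bit integers followed by the raw uint8 pixels. The helper names make_idx_images and parse_idx_images and the 3 fake images of 2x4 pixels are made up for illustration.

```python
import gzip
import io
import struct

import numpy as np


def make_idx_images(images):
    """Serialize a uint8 array of shape (n, rows, cols) as gzip-compressed IDX bytes."""
    n, rows, cols = images.shape
    # 2051 (0x00000803) is the magic number of idx3-ubyte image files
    header = struct.pack('>IIII', 2051, n, rows, cols)
    return gzip.compress(header + images.astype(np.uint8).tobytes())


def parse_idx_images(blob):
    """Parse gzip-compressed IDX image bytes the same way load_mnist does."""
    with gzip.open(io.BytesIO(blob), 'rb') as f:
        data = f.read()
    # The 16-byte header holds: magic number, image count, rows, columns
    magic, n, rows, cols = struct.unpack_from('>IIII', data, 0)
    pixels = np.frombuffer(data, dtype=np.uint8, offset=16)
    return magic, pixels.reshape(n, rows, cols)


# Hypothetical stand-in data: 3 images of 2x4 pixels
fake_images = np.arange(3 * 2 * 4, dtype=np.uint8).reshape(3, 2, 4)
magic, decoded = parse_idx_images(make_idx_images(fake_images))
print(magic, decoded.shape)
```

If the round trip returns the same array, the offsets (8 bytes for label files, 16 bytes for image files) are being applied correctly.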
With the data-reading code in place, the datasets can be obtained with the following calls:
train_datas, train_labels = load_mnist(data_path, 'train')
test_datas, test_labels = load_mnist(data_path, 't10k')
Running the test code displays the following:
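LeNet-5 has 7 layers in total. Its original input is a 32x32 grayscale image, while Fashion MNIST images are 28x28, so a small change is needed (the first convolution layer's padding='valid' is replaced with padding='same'). The layers are described below:
#layer1: convolution layer (kernel size: 5x5, 6 filters, padding='same', strides=1), output: 28x28x6
#layer2: pooling layer (pool size: 2x2, padding='same', strides=2), output: 14x14x6
#layer3: convolution layer (kernel size: 5x5, 16 filters, padding='valid', strides=1), output: 10x10x16
#layer4: pooling layer (pool size: 2x2, padding='same', strides=2), output: 5x5x16
#layer5: convolution layer (kernel size: 5x5, 120 filters, padding='valid', strides=1), output: 1x1x120
#layer6: fully connected layer (output: 1x84)
#layer7: output layer (output: 1x10)
The full LeNet code is as follows (it also adds a pooling layer after layer 5; on a 1x1 feature map this leaves the output shape unchanged, as the summary in 2.3.1 shows):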
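The output sizes listed above follow from the usual padding rules: with padding='same' the output length is ceil(input / stride), and with padding='valid' it is ceil((input - kernel + 1) / stride). As a quick check, here is a small sketch in plain Python; the helper conv_out is hypothetical and only does the arithmetic, it does not build any network.

```python
import math


def conv_out(size, kernel, stride, padding):
    """Output length along one spatial axis for a convolution or pooling layer."""
    if padding == 'same':
        return math.ceil(size / stride)
    if padding == 'valid':
        return math.ceil((size - kernel + 1) / stride)
    raise ValueError("padding must be 'same' or 'valid'")


# (name, kernel, stride, padding) for the five spatial layers listed above
spatial_layers = [
    ('layer1 conv', 5, 1, 'same'),
    ('layer2 pool', 2, 2, 'same'),
    ('layer3 conv', 5, 1, 'valid'),
    ('layer4 pool', 2, 2, 'same'),
    ('layer5 conv', 5, 1, 'valid'),
]

size = 28  # Fashion MNIST images are 28x28
sizes = []
for name, kernel, stride, padding in spatial_layers:
    size = conv_out(size, kernel, stride, padding)
    sizes.append(size)
    print(name, '->', size, 'x', size)
```

The printed sizes go 28, 14, 10, 5, 1, matching the outputs in the layer list.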
from tensorflow import keras
from tensorflow.keras import layers

# x_shape is the shape of the input array after reshaping to (N, 28, 28, 1),
# as done in the prediction code
LetNet5_model = keras.Sequential(
    [
        layers.Conv2D(input_shape=(x_shape[1], x_shape[2], x_shape[3]),
                      filters=6, kernel_size=(5,5), strides=(1,1), padding='same', activation='relu'),
        layers.MaxPool2D(pool_size=(2,2), strides=(2,2), padding='same'),
        layers.Conv2D(filters=16, kernel_size=(5,5), strides=(1,1), padding='valid', activation='relu'),
        layers.MaxPool2D(pool_size=(2,2), strides=(2,2), padding='same'),
        layers.Conv2D(filters=120, kernel_size=(5,5), strides=(1,1), padding='valid', activation='relu'),
        layers.MaxPool2D(pool_size=(2,2), strides=(2,2), padding='same'),
        layers.Flatten(),
        layers.Dense(84, activation='relu'),
        layers.Dense(10, activation='softmax')
    ])
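The model code reads its input shape from x_shape, which is not defined in the snippets shown here. A minimal sketch of one way to obtain it, assuming x_shape is simply the shape of the training images after a channel axis is added (the same reshape the prediction code applies to the test set). An all-zero array stands in for the real data so the snippet runs on its own.

```python
import numpy as np

# Stand-in for the array returned by load_mnist: uint8 images of shape (N, 28, 28)
x_train = np.zeros((8, 28, 28), dtype=np.uint8)

# Add the trailing channel axis that Conv2D expects: (N, 28, 28) -> (N, 28, 28, 1)
x_train = x_train.reshape((-1, 28, 28, 1))

x_shape = x_train.shape
print(x_shape)      # (8, 28, 28, 1)
print(x_shape[1:])  # (28, 28, 1), the per-sample shape passed as input_shape
```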
2.3.1 Model configuration
(1) Optimizer (Adam, with default parameters): keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, amsgrad=False)
(2) Loss function: loss=keras.losses.SparseCategoricalCrossentropy()
Note: with loss=keras.losses.SparseCategoricalCrossentropy(), loading the saved model failed.
After switching to loss='sparse_categorical_crossentropy', the model loaded successfully.
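For intuition about what this loss measures, here is a minimal NumPy sketch. It is not the Keras implementation: for integer labels and a batch of softmax outputs, sparse categorical cross-entropy is the mean negative log of the probability assigned to the true class. The toy labels and probabilities are made up, and the clipping constant eps is an assumption added to avoid log(0).

```python
import numpy as np


def sparse_categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-probability of the true class (illustrative, not Keras code)."""
    y_true = np.asarray(y_true)
    y_prob = np.clip(np.asarray(y_prob, dtype=np.float64), eps, 1.0)
    # Pick, for every sample, the predicted probability of its true class
    picked = y_prob[np.arange(len(y_true)), y_true]
    return float(np.mean(-np.log(picked)))


# Two toy samples with 3 classes each
y_true = [0, 2]
y_prob = [[0.5, 0.25, 0.25],
          [0.1, 0.1, 0.8]]
loss = sparse_categorical_crossentropy(y_true, y_prob)
print(loss)
```

The labels stay as integers (0 to 9 for Fashion MNIST), which is why no one-hot encoding is needed with this loss.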
LetNet5_model.compile(optimizer=keras.optimizers.Adam(),
                      loss='sparse_categorical_crossentropy',  # string form, per the note above on model loading
                      metrics=['accuracy'])
LetNet5_model.summary()  # print the network structure
'''
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 28, 28, 6) 156
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 6) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 10, 10, 16) 2416
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 16) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 1, 1, 120) 48120
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 1, 1, 120) 0
_________________________________________________________________
flatten (Flatten) (None, 120) 0
_________________________________________________________________
dense (Dense) (None, 84) 10164
_________________________________________________________________
dense_1 (Dense) (None, 10) 850
=================================================================
Total params: 61,706
Trainable params: 61,706
Non-trainable params: 0
_________________________________________________________________
'''
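The parameter counts in the summary can be reproduced by hand. A convolution layer has (kernel_h * kernel_w * in_channels + 1) * filters parameters, and a dense layer has (in_units + 1) * out_units, where the +1 is the bias. The sketch below just does that arithmetic; the helper names are hypothetical.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return (in_units + 1) * out_units


counts = {
    'conv2d': conv_params(5, 5, 1, 6),
    'conv2d_1': conv_params(5, 5, 6, 16),
    'conv2d_2': conv_params(5, 5, 16, 120),
    'dense': dense_params(120, 84),
    'dense_1': dense_params(84, 10),
}
total = sum(counts.values())

for name, count in counts.items():
    print(name, count)
print('total', total)
```

The pooling and flatten layers have no trainable parameters, so they contribute 0 and the total comes to the 61,706 shown above.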
2.3.2 Training
history = LetNet5_model.fit(x_train, y_train, batch_size=64, epochs=5, validation_split=0.1)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.legend(['training', 'validation'], loc='upper left')
plt.show()
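As a rough check on what validation_split=0.1 and batch_size=64 imply: Keras holds out the last fraction of the samples, taken before shuffling, as validation data and trains on the rest. The sketch below only reproduces that arithmetic in plain Python, assuming the full 60,000-image training set is passed to fit. The helper name is hypothetical, and the exact rounding inside Keras may differ for sample counts that do not divide evenly.

```python
import math


def split_sizes(num_samples, validation_split, batch_size):
    """Return (training samples, validation samples, training steps per epoch)."""
    num_val = int(round(num_samples * validation_split))
    num_train = num_samples - num_val
    steps_per_epoch = math.ceil(num_train / batch_size)
    return num_train, num_val, steps_per_epoch


num_train, num_val, steps = split_sizes(60000, 0.1, 64)
print(num_train, num_val, steps)  # 54000 6000 844
```

Because the held-out samples are simply the tail of the array, the validation curve plotted above reflects those last 6,000 training images.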
res = LetNet5_model.evaluate(x_test, y_test)
'''
10000/10000 [==============================] - 0s 46us/sample - loss: 0.3347 - accuracy: 0.8819
'''
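The accuracy reported by evaluate is the fraction of samples whose highest-probability class equals the true label. A small NumPy sketch of that computation, using made-up probabilities rather than real model output:

```python
import numpy as np


def accuracy(y_true, y_prob):
    """Fraction of samples whose argmax prediction matches the integer label."""
    predicted = np.argmax(y_prob, axis=1)
    return float(np.mean(predicted == np.asarray(y_true)))


# Four toy samples with 3 classes; the third one is misclassified
y_true = [0, 1, 2, 1]
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.4, 0.3],
                   [0.2, 0.5, 0.3]])
acc = accuracy(y_true, y_prob)
print(acc)  # 0.75
```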
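Running the program above, with less than 1 minute of training, gives an accuracy of about 88%. Of course, we only built a simple model here and did not tune any of the network's hyperparameters, but the example still covers the basic workflow of a complete classification model. From here the network can be optimized further, for example: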
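TensorFlow 2.0 offers several ways to save a model; for details see: https://zhuanlan.zhihu.com/p/59481985
Here the "save the whole model" approach is used:
LetNet5_model.save('LetNet5_model.h5')
Load the model, then run a forward pass on the data to predict the image classes: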
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
print(tf.__version__)
from utils import mnist_reader
labels = ['t_shirt_top', 'trouser', 'pullover', 'dress', 'coat', 'sandal', 'shirt', 'sneaker', 'bag', 'ankle_boots']
DataSetPath = "./data/fashion/"
x_test, y_test = mnist_reader.load_mnist(DataSetPath, 't10k')
x_test = x_test.reshape((-1,28,28,1))
new_model = keras.models.load_model('LetNet5_model.h5')
new_prediction = new_model.predict(x_test)
np.set_printoptions(suppress=True)
# Print the predicted class id and class name for the first 100 test images
for iCnt in range(100):
    class_id = int(np.argmax(new_prediction[iCnt]))  # index of the highest probability
    print(class_id, labels[class_id])
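The per-image loop can also be written as one vectorized step that returns the class name and the model's confidence together. This is a sketch under assumptions: decode_predictions is a hypothetical helper, and the two rows of probabilities are made up to stand in for the output of new_model.predict.

```python
import numpy as np

labels = ['t_shirt_top', 'trouser', 'pullover', 'dress', 'coat',
          'sandal', 'shirt', 'sneaker', 'bag', 'ankle_boots']


def decode_predictions(probabilities, class_names):
    """Return (class_id, class_name, confidence) for each row of softmax output."""
    probabilities = np.asarray(probabilities)
    class_ids = np.argmax(probabilities, axis=1)
    confidences = probabilities[np.arange(len(class_ids)), class_ids]
    return [(int(i), class_names[int(i)], float(c))
            for i, c in zip(class_ids, confidences)]


# Made-up softmax outputs for two images (each row sums to 1)
fake_prediction = np.full((2, 10), 0.05)
fake_prediction[0, 9] = 0.55
fake_prediction[1, 3] = 0.55

decoded = decode_predictions(fake_prediction, labels)
for class_id, name, confidence in decoded:
    print(class_id, name, round(confidence, 2))
```

Keeping the confidence next to the label makes it easier to spot low-confidence predictions, such as confusions between shirt, pullover, and coat.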