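Despite the "Net" in its name, ImageNet is not a deep neural network model but a dataset. Its creation was led by Stanford University professor Fei-Fei Li, and it is one of the most widely used datasets for image classification, detection, and localization. It contains roughly 15 million true-color (three-channel RGB) images in about 22,000 categories.
MNIST is a dataset for handwritten digit recognition. The training set contains 60,000 images with labels and the test set contains 10,000 images with labels. Each image is a 28*28-pixel grayscale picture of a handwritten digit from 0 to 9, white on a black background; pixel values range from 0 to 255, and the larger the value, the whiter the pixel.
This example transfers a VGG16 model trained on ImageNet to the MNIST dataset.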
Import the Keras modules needed to build the network (including VGG16)
import tensorflow
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.layers import Input, Flatten, Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD
Import the MNIST dataset bundled with Keras
from tensorflow.keras.datasets import mnist
Import the image-processing and numerical modules
import cv2
import numpy as np
model_vgg = VGG16(include_top=False, weights='imagenet',
                  input_shape=(224, 224, 3))
model = Flatten(name='flatten')(model_vgg.output)
model = Dense(10, activation='softmax')(model)
model_vgg_mnist = Model(model_vgg.input, model, name='vgg16')
print(model_vgg_mnist.summary())
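Here, include_top=False means that the network structure is transferred into our own model without its top (fully connected) layers. The summary printed is: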
Model: "vgg16"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
_________________________________________________________________
flatten (Flatten) (None, 25088) 0
_________________________________________________________________
dense (Dense) (None, 10) 250890
=================================================================
Total params: 14,965,578
Trainable params: 14,965,578
Non-trainable params: 0
As shown, 14,965,578 parameters need to be trained, which is on the order of ten million.
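The totals in the summary can be checked by hand. The following is a minimal sketch in plain Python (no Keras needed); the layer list is read off the summary above, and the helper names `conv_params` and `dense_params` are illustrative, not part of any library.

```python
# (input_channels, output_channels) for each 3x3 Conv2D layer of the VGG16 base,
# in the order they appear in the summary.
CONV_LAYERS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]


def conv_params(c_in, c_out, kernel=3):
    """kernel*kernel*c_in weights per filter, plus one bias per filter."""
    return (kernel * kernel * c_in + 1) * c_out


def dense_params(n_in, n_out):
    """One weight per input/output pair, plus one bias per output unit."""
    return (n_in + 1) * n_out


base_params = sum(conv_params(c_in, c_out) for c_in, c_out in CONV_LAYERS)
head_params = dense_params(7 * 7 * 512, 10)  # flattened 7x7x512 features -> 10 classes

print(base_params)                # 14714688
print(head_params)                # 250890
print(base_params + head_params)  # 14965578
```

Pooling and Flatten layers have no weights, so only the convolutional and dense layers contribute.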
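Advantage of transferring weights: once the pretrained layers are frozen, their weights do not need to be retrained; only the weights of the newly added output layer need to be trained.
Disadvantage of transferring weights: the existing weights were trained on the ImageNet dataset, whose classes and data distribution are not necessarily similar to those of the dataset used here.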
ishape = 224
model_vgg = VGG16(include_top=False, weights='imagenet', input_shape=(ishape, ishape, 3))
for layer in model_vgg.layers:
    layer.trainable = False
model = Flatten(name='flatten')(model_vgg.output)
model = Dense(10, activation='softmax')(model)
model_vgg_mnist = Model(model_vgg.input, model, name='vgg16_pretrain')
print(model_vgg_mnist.summary())
Model: "vgg16_pretrain"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
_________________________________________________________________
flatten (Flatten) (None, 25088) 0
_________________________________________________________________
dense (Dense) (None, 10) 250890
=================================================================
Total params: 14,965,578
Trainable params: 250,890
Non-trainable params: 14,714,688
_________________________________________________________________
None
Now only 250,890 parameters (on the order of a hundred thousand) need to be trained.
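To put that number in perspective, here is a small sketch that computes how much of the network is actually updated after freezing. The two constants are copied from the summary above; the variable names are illustrative.

```python
total_params = 14_965_578    # "Total params" in the summary
trainable_params = 250_890   # "Trainable params": the new Dense(10) layer only

frozen_params = total_params - trainable_params
trainable_share = trainable_params / total_params

print(frozen_params)             # 14714688, matches "Non-trainable params"
print(f"{trainable_share:.2%}")  # 1.68%
```

Less than 2% of the weights receive gradient updates, although the forward pass through the frozen layers still has to be computed for every batch.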
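Because training at 224*224 takes a long time, this example sets the input image size to (56, 56), that is, the model above is rebuilt with ishape = 56. The summary then becomes: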
Model: "vgg16_pretrain"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 56, 56, 3)] 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 56, 56, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 56, 56, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 28, 28, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 28, 28, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 28, 28, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 14, 14, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 14, 14, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 14, 14, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 14, 14, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 7, 7, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 7, 7, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 7, 7, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 7, 7, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 3, 3, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 3, 3, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 3, 3, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 3, 3, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 1, 1, 512) 0
_________________________________________________________________
flatten (Flatten) (None, 512) 0
_________________________________________________________________
dense (Dense) (None, 10) 5130
=================================================================
Total params: 14,719,818
Trainable params: 5,130
Non-trainable params: 14,714,688
_________________________________________________________________
None
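The drop from 250,890 to 5,130 trainable parameters follows from the five pooling layers, each of which halves the height and width (rounding down). A minimal sketch of that arithmetic, with `flatten_size` as an illustrative helper name:

```python
def flatten_size(side, n_pools=5, channels=512):
    """Number of features left after VGG16's five 2x2 max-pool layers.

    The 'same'-padded convolutions keep the spatial size, so only the
    pooling layers (stride 2, floor division) change it.
    """
    for _ in range(n_pools):
        side //= 2
    return side * side * channels


for side in (224, 56):
    n_features = flatten_size(side)
    head = (n_features + 1) * 10  # Dense(10): weights plus biases
    print(side, n_features, head)
# 224 25088 250890
# 56 512 5130
```

The convolutional base keeps the same 14,714,688 weights for either input size, because convolution kernels do not depend on the image dimensions.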
Preprocess the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train[0])
print(x_train[0].shape) # (28, 28)
Resize each image from (28, 28) to (56, 56). MNIST images are grayscale, so they must also be converted to three-channel data.
ishape = 56
x_train = [cv2.cvtColor(cv2.resize(i, (ishape, ishape)), cv2.COLOR_GRAY2BGR) for i in x_train]
x_train = np.concatenate([arr[np.newaxis] for arr in x_train]).astype("float32")
x_test = [cv2.cvtColor(cv2.resize(i, (ishape, ishape)), cv2.COLOR_GRAY2BGR) for i in x_test]
x_test = np.concatenate([arr[np.newaxis] for arr in x_test]).astype("float32")
print(x_train.shape)  # (60000, 56, 56, 3)
print(x_test.shape)  # (10000, 56, 56, 3)
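For readers without OpenCV installed, the same shape change can be sketched with NumPy alone. This is not equivalent to the code above: it uses nearest-neighbour repetition, whereas cv2.resize interpolates bilinearly by default, so pixel values will differ slightly. The function name `gray_to_rgb_batch` and the random stand-in data are assumptions made for illustration.

```python
import numpy as np


def gray_to_rgb_batch(images, scale=2):
    """Upscale (N, H, W) grayscale images by an integer factor using
    nearest-neighbour repetition, then copy the gray channel three times."""
    upscaled = images.repeat(scale, axis=1).repeat(scale, axis=2)
    rgb = np.repeat(upscaled[..., np.newaxis], 3, axis=-1)
    return rgb.astype("float32")


# Hypothetical stand-in for a few MNIST images (uint8, 28x28).
fake_batch = np.random.randint(0, 256, size=(4, 28, 28), dtype=np.uint8)
out = gray_to_rgb_batch(fake_batch)
print(out.shape)  # (4, 56, 56, 3)
```

Working on the whole batch at once also avoids the per-image Python loop.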
Normalize
x_train /= 255
x_test /= 255
One-hot encode the labels y
def tran_y(y):
    y_ohe = np.zeros(10)
    y_ohe[y] = 1
    return y_ohe
y_train_ohe = np.array([tran_y(y_train[i]) for i in range(len(y_train))])
y_test_ohe = np.array([tran_y(y_test[i]) for i in range(len(y_test))])
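The one-hot step can also be written without a Python loop. A minimal sketch, assuming integer labels in the range 0 to 9; `one_hot` is an illustrative name, not a library function:

```python
import numpy as np


def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) float array with a single 1 per row."""
    labels = np.asarray(labels)
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(num_classes)[labels]


demo = one_hot([5, 0, 9])
print(demo.shape)           # (3, 10)
print(demo.argmax(axis=1))  # [5 0 9]
```

Indexing the identity matrix with the label array produces the same result as calling tran_y on each label in turn.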
Set up TensorBoard
tensorboard = tensorflow.keras.callbacks.TensorBoard(histogram_freq=1)
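Compile and fit the model
# The model must be compiled before fit(). The settings below are assumptions:
# SGD is imported above but the original does not show how it was configured.
model_vgg_mnist.compile(optimizer=SGD(), loss='categorical_crossentropy', metrics=['accuracy'])
model_vgg_mnist.fit(x_train, y_train_ohe, validation_split=0.2, epochs=200, batch_size=128, shuffle=True, callbacks=[tensorboard])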
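It may help to see what validation_split=0.2 and batch_size=128 imply for the amount of work per epoch. Keras takes the validation samples from the end of the arrays before any shuffling; the sketch below only reproduces the resulting counts for this dataset, with illustrative variable names.

```python
import math

n_samples = 60_000       # MNIST training images
validation_split = 0.2
batch_size = 128

n_val = round(n_samples * validation_split)  # held out, never used for weight updates
n_train = n_samples - n_val
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val)    # 48000 12000
print(steps_per_epoch)   # 375
```

Each epoch therefore runs 375 weight updates on 48,000 images and then evaluates on the remaining 12,000.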
Save the model
tensorflow.saved_model.save(model_vgg_mnist, 'tflearn-vgg-mnist')
Get the loss and accuracy
loss1, accuracy1 = model_vgg_mnist.evaluate(x_train, y_train_ohe)
loss2, accuracy2 = model_vgg_mnist.evaluate(x_test, y_test_ohe)
To save time, the author ran only epochs=2 in practice. The results were:
Training set: loss = 0.5910102128982544, accuracy = 0.8633999824523926
Test set: loss = 0.5841793417930603, accuracy = 0.8659999966621399
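As a rough illustration of what these two numbers measure, the sketch below computes them with NumPy on a tiny made-up batch. It assumes categorical cross-entropy as the loss and plain accuracy as the metric, which is the usual pairing for one-hot labels with a softmax output; the function names and the example probabilities are hypothetical.

```python
import numpy as np


def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=1)))


def accuracy(y_true, y_prob):
    """Fraction of samples whose most probable class is the true class."""
    return float(np.mean(y_true.argmax(axis=1) == y_prob.argmax(axis=1)))


# Four made-up samples over three classes.
y_true = np.eye(3)[[0, 1, 2, 1]]
y_prob = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],  # true class is 2, predicted class is 1
    [0.2, 0.5, 0.3],
])

print(accuracy(y_true, y_prob))  # 0.75
print(round(categorical_crossentropy(y_true, y_prob), 4))
```

Loss and accuracy can move somewhat independently: the loss reflects how confident the predictions are, while accuracy only counts whether the top class is right.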
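The saved model files are written to the tflearn-vgg-mnist directory.
A logs folder is also created under the working path. Open cmd
and enter
tensorboard --logdir=C:\Users\ThinkStation\Desktop\logs\train
to view the loss curves and other metrics recorded during training.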