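Image recognition via transfer learning with the VGG16 network

Despite the "Net" in its name, ImageNet is not a deep neural network model but a dataset. Its creation was led by Stanford professor Fei-Fei Li, and it is one of the most widely used datasets for image classification, detection, and localization. It contains roughly 15 million true-color (three-channel RGB) images in about 22,000 categories.

MNIST is a handwritten-digit recognition dataset. The training set contains 60,000 images with labels and the test set contains 10,000. Each image is a 28×28-pixel grayscale picture of a handwritten digit from 0 to 9, white on a black background, with pixel values from 0 to 255; the larger the value, the whiter the pixel.

This example transfers the VGG16 model trained on ImageNet to the MNIST dataset.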

Import the Keras modules for building neural networks

import tensorflow
from tensorflow.keras.layers import Input, Flatten, Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD

Import the VGG16 model

from tensorflow.keras.applications.vgg16 import VGG16

Import the MNIST dataset bundled with Keras

from tensorflow.keras.datasets import mnist

Import the image-processing and numerical modules

import cv2
import numpy as np

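Transferring only the network architecture, not the weights

# weights=None keeps the VGG16 architecture but initializes it randomly;
# weights='imagenet' would load the pretrained weights instead.
model_vgg = VGG16(include_top=False, weights=None,
                  input_shape=(224, 224, 3))
# Add a new classifier head for the 10 MNIST digit classes.
model = Flatten(name='flatten')(model_vgg.output)
model = Dense(10, activation='softmax')(model)
model_vgg_mnist = Model(model_vgg.input, model, name='vgg16')
print(model_vgg_mnist.summary())

Here, include_top=False means that everything except the top fully connected layers is transferred into our own model.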

Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
dense (Dense)                (None, 10)                250890    
=================================================================
Total params: 14,965,578
Trainable params: 14,965,578
Non-trainable params: 0

As the summary shows, 14,965,578 parameters need to be trained, which is on the order of ten million.
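
The totals in the summary can be checked by hand. The following is a minimal sketch in plain Python, assuming 3×3 convolution kernels with one bias per output channel, as in the summary above. The names VGG16_CONV_CHANNELS, conv_params, and dense_params are introduced here for illustration and are not part of Keras.

```python
# (input_channels, output_channels) for the 13 convolution layers of VGG16.
VGG16_CONV_CHANNELS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]


def conv_params(c_in, c_out, kernel=3):
    """Weights of a kernel x kernel convolution plus one bias per output channel."""
    return kernel * kernel * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights of a fully connected layer plus one bias per output unit."""
    return n_in * n_out + n_out


conv_total = sum(conv_params(i, o) for i, o in VGG16_CONV_CHANNELS)
head = dense_params(7 * 7 * 512, 10)  # flatten of (7, 7, 512) into 10 classes

print(conv_total)         # 14714688
print(head)               # 250890
print(conv_total + head)  # 14965578
```

The convolutional part alone accounts for 14,714,688 parameters; the new 10-class Dense layer adds the remaining 250,890.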

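Advantage of transferring the weights: the pretrained network weights do not need to be retrained; only the weights of the newly added output layer have to be trained.
Disadvantage of transferring the weights: the existing weights were trained on the ImageNet dataset, whose classes and data distribution are not necessarily similar to those of the dataset used here.

Transferring both the network architecture and the weights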

ishape = 224
model_vgg = VGG16(include_top=False, weights='imagenet', input_shape=(ishape, ishape, 3))
# Freeze every pretrained layer so that only the new head is trained.
for layer in model_vgg.layers:
    layer.trainable = False
model = Flatten(name='flatten')(model_vgg.output)
model = Dense(10, activation='softmax')(model)
model_vgg_mnist = Model(model_vgg.input, model, name='vgg16_pretrain')
print(model_vgg_mnist.summary())

Model: "vgg16_pretrain"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
dense (Dense)                (None, 10)                250890    
=================================================================
Total params: 14,965,578
Trainable params: 250,890
Non-trainable params: 14,714,688
_________________________________________________________________
None

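Only 250,890 parameters (on the order of a hundred thousand) need to be trained.

Practice

Because training takes a long time, the input image size in this example is set to (56, 56), i.e. the model above is rebuilt with ishape = 56: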

Model: "vgg16_pretrain"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 56, 56, 3)]       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 56, 56, 64)        1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 56, 56, 64)        36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 28, 28, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 28, 28, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 28, 28, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 14, 14, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 14, 14, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 14, 14, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 14, 14, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 7, 7, 256)         0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 7, 7, 512)         1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 7, 7, 512)         2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 7, 7, 512)         2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 3, 3, 512)         0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 3, 3, 512)         2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 3, 3, 512)         2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 3, 3, 512)         2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 1, 1, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 512)               0         
_________________________________________________________________
dense (Dense)                (None, 10)                5130      
=================================================================
Total params: 14,719,818
Trainable params: 5,130
Non-trainable params: 14,714,688
_________________________________________________________________
None
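
The shapes in the two summaries follow from the five 2×2 max-pooling layers, each of which halves the spatial size (rounding down). A small sketch in plain Python, using a hypothetical helper named head_size, shows why a 56×56 input leaves only 512 features and 5,130 trainable parameters:

```python
def head_size(input_size, n_pools=5, channels=512, n_classes=10):
    """Return (flattened feature count, Dense parameter count) for a square input."""
    size = input_size
    for _ in range(n_pools):
        size //= 2  # 2x2 max pooling with stride 2 halves the size, rounding down
    features = size * size * channels
    dense_params = features * n_classes + n_classes
    return features, dense_params


print(head_size(224))  # (25088, 250890)
print(head_size(56))   # (512, 5130)
```

Shrinking the input therefore reduces the trainable head by a factor of about 49, while the frozen convolutional part keeps its 14,714,688 parameters.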

Preprocessing the MNIST dataset

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train[0])
print(x_train[0].shape)  # (28, 28)

Resize each image from (28, 28) to (56, 56). MNIST images are single-channel grayscale, so they also need to be converted to three channels to match the VGG16 input.

ishape = 56
# Resize each image, then copy the gray channel into three channels.
x_train = [cv2.cvtColor(cv2.resize(i, (ishape, ishape)), cv2.COLOR_GRAY2BGR) for i in x_train]
x_train = np.array(x_train).astype("float32")
x_test = [cv2.cvtColor(cv2.resize(i, (ishape, ishape)), cv2.COLOR_GRAY2BGR) for i in x_test]
x_test = np.array(x_test).astype("float32")
print(x_train.shape)  # (60000, 56, 56, 3)
print(x_test.shape)  # (10000, 56, 56, 3)
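
For a grayscale input, cv2.COLOR_GRAY2BGR simply copies the single gray value into all three channels. A minimal NumPy sketch of the same idea on a toy 2×2 image (the array gray is made up for illustration, and the resize step is left out):

```python
import numpy as np

gray = np.array([[0, 128],
                 [255, 64]], dtype=np.uint8)  # toy 2x2 grayscale image

# Add a trailing channel axis and repeat it three times: (2, 2) -> (2, 2, 3).
rgb = np.repeat(gray[..., np.newaxis], 3, axis=-1)

print(rgb.shape)  # (2, 2, 3)
print(rgb[0, 1])  # [128 128 128]
```

The three channels carry identical information; the conversion only exists to satisfy the three-channel input shape that VGG16 expects.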

Normalization

x_train /= 255
x_test /= 255

One-hot encoding the labels y

def tran_y(y):
    # Return a length-10 vector with a 1 at index y.
    y_ohe = np.zeros(10)
    y_ohe[y] = 1
    return y_ohe

y_train_ohe = np.array([tran_y(y_train[i]) for i in range(len(y_train))])
y_test_ohe = np.array([tran_y(y_test[i]) for i in range(len(y_test))])
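
The same encoding can also be written without a Python loop. This is a sketch of one common NumPy idiom, shown on a made-up array of three labels rather than on the real y_train:

```python
import numpy as np

labels = np.array([3, 0, 9])  # toy labels for illustration

# Row i of the 10x10 identity matrix is the one-hot vector for class i,
# so indexing with the label array encodes every label at once.
one_hot = np.eye(10)[labels]

print(one_hot.shape)  # (3, 10)
print(one_hot[0])     # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```

Keras also provides tensorflow.keras.utils.to_categorical for the same purpose.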

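Setting up TensorBoard

tensorboard = tensorflow.keras.callbacks.TensorBoard(histogram_freq=1)

Fitting the model

# compile() must be called before fit(); the loss, optimizer, and metric here are assumed settings.
model_vgg_mnist.compile(loss='categorical_crossentropy', optimizer=SGD(), metrics=['accuracy'])
model_vgg_mnist.fit(x_train, y_train_ohe, validation_split=0.2, epochs=200, batch_size=128, shuffle=True, callbacks=[tensorboard])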

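Saving the model

tensorflow.saved_model.save(model_vgg_mnist, 'tflearn-vgg-mnist')

Getting the loss and accuracy

# Evaluate the Keras model (model_vgg_mnist), not the output tensor `model`.
loss1, accuracy1 = model_vgg_mnist.evaluate(x_train, y_train_ohe)
loss2, accuracy2 = model_vgg_mnist.evaluate(x_test, y_test_ohe)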

To save time, the author ran this with only epochs=2. The results were:
Training set: loss = 0.5910102128982544, accuracy = 0.8633999824523926
Test set: loss = 0.5841793417930603, accuracy = 0.8659999966621399
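
As a reminder of what these two numbers measure, here is a small NumPy sketch of categorical cross-entropy and accuracy on a made-up batch of three softmax outputs. It assumes the model was trained with a categorical cross-entropy loss and is only an illustration, not the code Keras runs internally.

```python
import numpy as np

# Made-up softmax outputs for 3 samples and 3 classes, with one-hot targets.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.4, 0.3]])
targets = np.array([[1, 0, 0],
                    [0, 1, 0],
                    [0, 0, 1]])

# Cross-entropy: mean negative log-probability assigned to the true class.
loss = -np.mean(np.sum(targets * np.log(probs), axis=1))

# Accuracy: fraction of samples whose most probable class is the true class.
accuracy = np.mean(np.argmax(probs, axis=1) == np.argmax(targets, axis=1))

print(loss)
print(accuracy)
```

In this toy batch the third sample is misclassified, so the accuracy is 2/3, and its low probability for the true class dominates the loss.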

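The saved model files are shown below:
[Image: saved model files]
A logs directory is generated under the working path. Open cmd and enter:

tensorboard --logdir=C:\Users\ThinkStation\Desktop\logs\train

to view the loss curves and other metrics recorded during training.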
