【Tool】Keras in Practice II: Image Classification with VGG16

In the previous post we designed a neural network from scratch and trained it for image classification; with only 5,000 training images we reached an accuracy of around 80%. In this post we use VGG16 as our base_model and train on top of it. The keras.applications module ships several pretrained base models that can be used directly for transfer learning. By setting include_top=False we get the base network without its fully connected classifier, and by appending our own custom layers on top we can adapt it to a different classification task.

# finetune from the base model VGG16
base_model = VGG16(include_top=False, weights='imagenet', input_shape=(150, 150, 3))
base_model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 150, 150, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 150, 150, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 150, 150, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 75, 75, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 75, 75, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 75, 75, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 37, 37, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 37, 37, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 37, 37, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 37, 37, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 18, 18, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 18, 18, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 18, 18, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 18, 18, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 9, 9, 512)         0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 9, 9, 512)         2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 9, 9, 512)         2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 9, 9, 512)         2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 4, 4, 512)         0         
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________
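
The summary ends in a 4x4x512 feature map for the 150x150 input, i.e. 8,192 values once flattened. A quick sanity check (my own snippet, not part of the original post):

print(base_model.output_shape)   # (None, 4, 4, 512)
print(4 * 4 * 512)               # 8192 values fed into the dense head added below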

At this point there are two options: use base_model purely as a frozen feature extractor, so that only the fully connected layers we add are trained; or let base_model train as well, in which case we train an end-to-end model. The second option is harder to train, so let's start with the first (a sketch of the second approach is given at the end of the post).

VGG16 as feature extractor

In Keras, the trainable attribute of a layer controls whether its weights are updated during training. The basic code is the same as in the previous post; the difference is how we combine base_model and the newly added layers into our own model.
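
A minimal illustration of the trainable flag (a standalone sketch, separate from the full script below). Note that the flag only takes effect once a model containing these layers is compiled:

from keras.applications.vgg16 import VGG16

base = VGG16(include_top=False, weights='imagenet', input_shape=(150, 150, 3))
for layer in base.layers:
    layer.trainable = False                               # freeze every layer of the base network
print(any(layer.trainable for layer in base.layers))      # False: no weights will be updated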

import os
import numpy as np
from keras.models import Sequential, Model
from keras import layers
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
from keras.applications.vgg16 import VGG16
from keras.utils.np_utils import to_categorical
from scipy.misc import imread, imresize  # imread/imresize require an older SciPy (removed in SciPy >= 1.2)
import matplotlib.pyplot as plt
imgs = []
labels = []
img_shape = (150, 150)
# image generator
files = os.listdir('data/test')
# read 1000 files for the generator
for img_file in files[:1000]:
    img = imread('data/test/' + img_file).astype('float32')
    img = imresize(img, img_shape)
    imgs.append(img)

imgs = np.array(imgs)
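# featurewise_center / featurewise_std_normalization below need per-channel
# statistics of the data, which is why we loaded a sample of images above and
# call fit(imgs) on both generators further down.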
train_gen = ImageDataGenerator(
     # rescale = 1./255,
     featurewise_center=True,
     featurewise_std_normalization=True,
     rotation_range=20,
     width_shift_range=0.2,
     height_shift_range=0.2,
     horizontal_flip=True)
val_gen = ImageDataGenerator(
     # rescale = 1./255,
     featurewise_center=True,
     featurewise_std_normalization=True)

train_gen.fit(imgs)
val_gen.fit(imgs)

# 4,500 training images
train_iter = train_gen.flow_from_directory('data/train', class_mode='binary',
                                            target_size=img_shape, batch_size=16)
# 501 validation images
val_iter = val_gen.flow_from_directory('data/val', class_mode='binary',
                                       target_size=img_shape, batch_size=16)
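# class_mode='binary' yields a single 0/1 label per image (one class per
# subdirectory of data/train and data/val), which matches the single sigmoid
# output unit and the binary_crossentropy loss used below.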

'''
# image generator debug
for x_batch, y_batch in train_iter:
    print(x_batch.shape)
    print(y_batch.shape)
    plt.imshow(x_batch[0])
    plt.show()
'''

# finetune from the base model VGG16
base_model = VGG16(include_top=False, weights='imagenet', input_shape=(150, 150, 3))
base_model.summary()

out = base_model.layers[-1].output
out = layers.Flatten()(out)
out = layers.Dense(1024, activation='relu')(out)
# the flattened feature vector is large, so we add dropout layers to reduce overfitting
out = layers.Dropout(0.5)(out)
out = layers.Dense(512, activation='relu')(out)
out = layers.Dropout(0.3)(out)
out = layers.Dense(1, activation='sigmoid')(out)
tuneModel = Model(inputs=base_model.input, outputs = out)
for layer in tuneModel.layers[:19]: # freeze the 19 VGG16 base layers so they serve only as a feature extractor
    layer.trainable = False
tuneModel.compile(loss='binary_crossentropy', optimizer=optimizers.RMSprop(lr=1e-4),
        metrics=['acc'])

history = tuneModel.fit_generator(
        generator=train_iter,
        steps_per_epoch=100,
        epochs=100,
        validation_data=val_iter,
        validation_steps=32
        )
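# with batch_size=16, steps_per_epoch=100 draws 1,600 of the 4,500 training
# images per epoch; validation_steps=32 covers roughly all 501 validation images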

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1,101)
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'r', label='Validation acc')
plt.legend()
plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.legend()
plt.show()
# output (only the first two epochs shown)
Epoch 1/100
100/100 [==============================] - 677s 7s/step - loss: 0.4214 - acc: 0.8113 - val_loss: 0.1659 - val_acc: 0.9311
Epoch 2/100
100/100 [==============================] - 786s 8s/step - loss: 0.2618 - acc: 0.8900 - val_loss: 0.1576 - val_acc: 0.9351

After just two epochs the validation accuracy already reaches about 93%, which feels almost like magic. When your own data and compute are limited, fine-tuning a pretrained model really is an effective way to boost performance.
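
For completeness, here is a minimal sketch of the second approach mentioned earlier, where part of base_model is trained as well. This is my own illustration rather than code from the original experiment: it reuses tuneModel, train_iter and val_iter from the script above, unfreezes only the last convolutional block (block5) together with the dense head, and recompiles with a smaller learning rate to keep fine-tuning stable; the number of epochs is an arbitrary choice.

# end-to-end fine-tuning sketch: unfreeze block5 and everything after it
set_trainable = False
for layer in tuneModel.layers:
    if layer.name == 'block5_conv1':
        set_trainable = True
    layer.trainable = set_trainable

# recompile so the new trainable flags take effect; a small learning rate
# avoids destroying the pretrained block5 weights early in training
tuneModel.compile(loss='binary_crossentropy',
                  optimizer=optimizers.RMSprop(lr=1e-5),
                  metrics=['acc'])

history = tuneModel.fit_generator(
        generator=train_iter,
        steps_per_epoch=100,
        epochs=20,
        validation_data=val_iter,
        validation_steps=32
        )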
