[Deep Learning Notes 2.2.3] Training AlexNet on 17flowers

Overview

This post walks through classifying the 17flowers dataset with AlexNet. The code is based on reference [1], the 17flowers dataset comes from reference [2], and the pretrained weights bvlc_alexnet.npy come from reference [4].

Experiment 1: fine-tune only the last fully connected layer

The hyperparameter-tuning observations are summarized below:

  1. The initial learning rate cannot exceed 0.0001, otherwise the training loss becomes nan (a numerical-stability sketch follows this list);
  2. With learning_rate_init = 0.0001 and train_layers = ['fc7', 'fc8'], the training loss is also nan;
  3. With learning_rate_init = 0.00001 and train_layers = ['fc7', 'fc8'], the training loss converges;
  4. With train_layers = ['fc6', 'fc7', 'fc8'], the training loss is always nan, no matter how small learning_rate_init is;
  5. With learning_rate_init = 0.0001 and train_layers = ['fc8'], decaying the learning rate dynamically or keeping it fixed makes little difference to the training loss or the accuracy;
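
The nan losses above are consistent with the hand-rolled cross entropy used in the code below: once a softmax probability underflows to 0, tf.log returns -inf and the loss turns into nan. Two common workarounds, shown as a minimal standalone sketch (my own illustration, not part of the referenced code):

import tensorflow as tf

num_classes = 17
logits = tf.placeholder(tf.float32, [None, num_classes])  # stands in for the fc8 output
y = tf.placeholder(tf.float32, [None, num_classes])       # one-hot labels

# Option 1: clip the softmax output so tf.log never sees an exact zero
probs = tf.nn.softmax(logits)
clipped_loss = -tf.reduce_mean(
    tf.reduce_sum(y * tf.log(tf.clip_by_value(probs, 1e-10, 1.0)), axis=1))

# Option 2: the fused op computes softmax + cross entropy in a numerically stable way
fused_loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))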

Code example 1; the full code is in reference [5], alexnet_train1.py:

import os
import codecs
from datetime import datetime
import cv2
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import pickle as pkl
import random

from alexnet import AlexNet

# Learning params
learning_rate_init = 0.0001
training_epoch = 500
batch_size = 128

# Network params
dropout_rate = 0.5
num_classes = 17
input_image_shape = [227, 227, 3]  # width, height, channel

train_layers = ['fc8']

#placeholder for input and dropout rate
x = tf.placeholder(tf.float32, [None, input_image_shape[0], input_image_shape[1], input_image_shape[2]])
y = tf.placeholder(tf.float32, [None, num_classes])
keep_prob = tf.placeholder(tf.float32)
learning_rate_hold = tf.placeholder(tf.float32)

def resize_image(in_image, new_width, new_height, out_image=None, resize_mode=cv2.INTER_CUBIC):
    img = cv2.resize(in_image, (new_width, new_height), interpolation=resize_mode)
    if out_image:
        cv2.imwrite(out_image, img)
    return img

def load_data(datafile, num_class, save=False, save_path='dataset.pkl', shuffle=True):
    fr = codecs.open(datafile, 'r', 'utf-8')
    train_list = fr.readlines()
    if shuffle:
        random.shuffle(train_list)
    labels = []
    images = []
    for line in train_list:
        tmp = line.strip().split(' ')
        fpath = tmp[0]
        img = cv2.imread(fpath)
        img = resize_image(img, input_image_shape[0], input_image_shape[1])
        np_img = np.asarray(img, dtype="float32")
        images.append(np_img)

        index = int(tmp[1])
        label = np.zeros(num_class)
        label[index] = 1
        labels.append(label)
    if save:
        pkl.dump((images, labels), open(save_path, 'wb'))
    fr.close()
    images = np.array(images)
    labels = np.array(labels)
    return images, labels


def train(network, images, labels, model_savepath):
    train_size = int(len(images) * 8.0 / 10.0)
    X_train = images[0:train_size]
    Y_train = labels[0:train_size]
    X_val = images[train_size:]
    Y_val = labels[train_size:]

    # Op for calculating the loss
    with tf.name_scope("cross_ent"):
        # Method 1: hand-rolled cross entropy
        y_output = tf.nn.softmax(network)  # softmax over the last layer, giving per-class probabilities
        cross_entropy = -tf.reduce_sum(y * tf.log(y_output))  # cross entropy between the softmax output and the one-hot labels
        # (reduce_sum here sums over both the batch and class dimensions, so the result is already a scalar)
        loss = tf.reduce_mean(cross_entropy)  # reduce_mean of a scalar is a no-op; kept for readability
        # loss = -tf.reduce_mean(y * tf.log(y_output))  # also works

        # Method 2: use TensorFlow's built-in tf.nn.softmax_cross_entropy_with_logits
        # loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=network, labels=y))

    # Train op
    with tf.name_scope("train"):
        optimizer = tf.train.GradientDescentOptimizer(learning_rate_hold)
        train_op = optimizer.minimize(loss)

        '''
        # This block follows reference [3]; it also trains, but the results were not as good as the code above
        var_list = [v for v in tf.trainable_variables() if v.name.split('/')[0] in train_layers]
        gradients = tf.gradients(ys=loss, xs=var_list)
        gradients = list(zip(gradients, var_list))
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate_init)
        train_op = optimizer.apply_gradients(grads_and_vars=gradients)
        '''

    # Evaluation op: Accuracy of the model
    with tf.name_scope("accuracy"):
        correct_pred = tf.equal(tf.argmax(network, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

    saver = tf.train.Saver()
    init = tf.global_variables_initializer()

    loss_buf = []
    accuracy_buf = []
    with tf.Session() as sess:
        sess.run(init)

        # Load the pretrained weights into the non-trainable layers
        # (model here is the global AlexNet instance created in __main__)
        model.load_initial_weights(sess)

        print("{} Start training...".format(datetime.now()))
        boundaries = [200, 300, 400]
        learning_rates = [learning_rate_init, learning_rate_init / 10.0, learning_rate_init / 100.0,
                          learning_rate_init / 1000.0]

        total_batch = len(X_train) // batch_size
        for step in range(training_epoch):
            print("{} Epoch number: {}".format(datetime.now(), step + 1))
            # note: this creates a new piecewise_constant node every epoch; it works, but the
            # schedule could also be built once outside the loop
            learning_rate = tf.train.piecewise_constant(step, boundaries=boundaries, values=learning_rates)
            learning_rate = sess.run([learning_rate])[0]
            print('learning_rate = ', learning_rate)
            tmp_loss = []
            for i in range(total_batch):
                batch_xs = X_train[i * batch_size:(i + 1) * batch_size]
                batch_ys = Y_train[i * batch_size:(i + 1) * batch_size]
                sess.run(train_op, feed_dict={x: batch_xs, y: batch_ys, keep_prob: dropout_rate,
                                              learning_rate_hold: learning_rate})

                loss_val = sess.run(loss, feed_dict={x: batch_xs, y: batch_ys, keep_prob: dropout_rate})
                tmp_loss.append(loss_val)

                print("step {}, iteration {}, loss {}".format(step, i, loss_val))

            tmp_accuracy = []
            test_batch_size = batch_size
            test_batch = len(X_val) // test_batch_size
            for i in range(test_batch):
                batch_xs = X_val[i * test_batch_size:(i + 1) * test_batch_size]
                batch_ys = Y_val[i * test_batch_size:(i + 1) * test_batch_size]
                # note: dropout is still applied here at evaluation time (keep_prob = dropout_rate);
                # Experiment 2 below turns dropout off during testing
                test_accuracy = sess.run(accuracy, feed_dict={x: batch_xs, y: batch_ys, keep_prob: dropout_rate})

                tmp_accuracy.append(test_accuracy)

            accuracy_ndarray = np.array(tmp_accuracy)
            loss_ndarray = np.array(tmp_loss)
            acc_mean = accuracy_ndarray.mean()
            loss_mean = loss_ndarray.mean()
            accuracy_buf.append(acc_mean)
            loss_buf.append(loss_mean)
            print("step {}, loss {}, testing accuracy {}".format(step, loss_mean, acc_mean))

            if step in boundaries:
                print('{} saving checkpoint of model ...'.format(datetime.now()))
                checkpoint_name = os.path.join(model_savepath, 'model_epoch' + str(step + 1) + '.ckpt')
                saver.save(sess, checkpoint_name)
                print('{} Model checkpoint saved at {}'.format(datetime.now(), checkpoint_name))

        checkpoint_name = os.path.join(model_savepath, 'model_' + str(datetime.now()) + '.ckpt')
        saver.save(sess, checkpoint_name)
        print('{} Model checkpoint saved at {}'.format(datetime.now(), checkpoint_name))

    return accuracy_buf, loss_buf


def save_loss_acc(accuracy_buf, loss_buf, save_csvname):
    # plot the accuracy curve
    accuracy_ndarray = np.array(accuracy_buf)
    accuracy_size = np.arange(len(accuracy_ndarray))
    plt.plot(accuracy_size, accuracy_ndarray, 'b+', label='accuracy')

    loss_ndarray = np.array(loss_buf)
    loss_size = np.arange(len(loss_ndarray))
    plt.plot(loss_size, loss_ndarray, 'r*', label='loss')

    plt.show()

    with open(save_csvname, 'w') as fid:
        for loss, acc in zip(loss_buf, accuracy_buf):
            strText = str(loss) + ',' + str(acc) + '\n'
            fid.write(strText)


if __name__ == '__main__':
    X, Y = load_data('./train.txt', num_classes)
    model = AlexNet(x, keep_prob, num_classes, skip_layer=train_layers)
    model_savepath = './savemodel/train[fc8]-assign'
    accuracy_buf, loss_buf = train(model.fc8, X, Y, model_savepath)
    csv_savepath = os.path.join(model_savepath, 'AlexNet-train[fc8]-assign.csv')
    save_loss_acc(accuracy_buf, loss_buf, csv_savepath)


print('end')

Training/test results:
epoch 0, loss 460.86008, testing accuracy 0.0546875
epoch 1, loss 348.19464, testing accuracy 0.13671875
epoch 2, loss 303.55258, testing accuracy 0.23046875
epoch 3, loss 267.38477, testing accuracy 0.421875

epoch 498, loss 0.010247562, testing accuracy 0.875
epoch 499, loss 0.024827117, testing accuracy 0.85546875

Experiment 2: fine-tune only the last fully connected layer, with dropout during training but not during testing

  Dropout randomly discards neurons with some probability during training. At test time we do not want to drop any neurons, so dropout should be disabled during evaluation. (me: disabling dropout means keeping every neuron, i.e. a keep probability of 1.0; note that in this code the variable dropout_rate is the value fed into the keep_prob placeholder.)
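
Before touching the network definition, a simpler alternative worth noting (my own sketch, not the approach taken in alexnet_train2.py) is to keep the original dropout layers and just feed keep_prob = 1.0 at evaluation time; assuming the dropout helper in alexnet.py wraps tf.nn.dropout, a keep probability of 1.0 makes it an identity:

# same calls as in alexnet_train1.py, only the fed keep_prob value changes
sess.run(train_op, feed_dict={x: batch_xs, y: batch_ys,
                              keep_prob: 0.5,                    # training: apply dropout
                              learning_rate_hold: learning_rate})
test_accuracy = sess.run(accuracy, feed_dict={x: batch_xs, y: batch_ys,
                                              keep_prob: 1.0})   # evaluation: dropout disabled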

The AlexNet code is changed as follows; for details see reference [5], alexnet.py:

class AlexNet(object):
    def __init__(self, x, keep_prob, num_classes, skip_layer,
                 weights_path='DEFAULT', is_train=True):
        ... ...
        self.IS_TRAIN = is_train
        ... ...

    def create(self):
        ... ...
        flattened = tf.reshape(pool5, [-1, flatsize])
        fc6 = fc(flattened, flatsize, 4096, name='fc6')
        fc6 = tf.cond(self.IS_TRAIN, lambda: dropout(fc6, self.KEEP_PROB), lambda: fc6)

        # 7th Layer: FC (w ReLu) -> Dropout
        fc7 = fc(fc6, 4096, 4096, name='fc7')
        fc7 = tf.cond(self.IS_TRAIN, lambda: dropout(fc7, self.KEEP_PROB), lambda: fc7)
        ... ...

The training code is modified as follows; for details see reference [5], alexnet_train2.py:

...
phase_train_hold = tf.placeholder(tf.bool, name='phase_train')

def train(...):
    ...
    with tf.Session() as sess:
        sess.run(train_op, feed_dict={x: batch_xs, y: batch_ys, keep_prob: dropout_rate, learning_rate_hold: learning_rate, phase_train_hold: True})
        loss_val = sess.run(loss, feed_dict={x: batch_xs, y: batch_ys, keep_prob: dropout_rate, phase_train_hold: False})
        test_accuracy = sess.run(accuracy, feed_dict={x: batch_xs, y: batch_ys, keep_prob: dropout_rate, phase_train_hold: False})
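
The excerpt above does not show where phase_train_hold enters the graph; presumably it is passed to the AlexNet constructor as the is_train argument so that the tf.cond branches in create() switch on it. A sketch of that assumed wiring (not quoted from alexnet_train2.py):

# assumed wiring: the boolean placeholder becomes self.IS_TRAIN inside AlexNet,
# so tf.cond enables dropout only when phase_train_hold is fed as True
phase_train_hold = tf.placeholder(tf.bool, name='phase_train')
model = AlexNet(x, keep_prob, num_classes, skip_layer=train_layers,
                is_train=phase_train_hold)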

Training/test results:
epoch 0, loss 347.5379, testing accuracy 0.2109375
epoch 1, loss 225.7894, testing accuracy 0.58984375
epoch 2, loss 198.0277, testing accuracy 0.69140625
epoch 3, loss 93.13631, testing accuracy 0.7421875

epoch 499, loss 0.00022953929, testing accuracy 0.91015625

Experiment 3: fine-tune all layers, with dropout during training but not during testing

Reuse the best model trained so far and fine-tune all layers again on the same dataset.
The training code from Experiment 2 is modified as follows; the full code is in reference [5], alexnet_train3.py:

...
checkpoint_path = './savemodel/train_fc8-trainDrop_testNoDrop/'

def train(...):
    ...
    saver = tf.train.Saver()  # used to save the new model
    # init = tf.global_variables_initializer()

    with tf.Session() as sess:
        # sess.run(init)  # must stay commented out, otherwise the restored weights would be re-initialized

        # model.load_initial_weights(sess)
        restore_saver = tf.train.import_meta_graph(checkpoint_path + 'model_2019-02-25 11:24:58.052133.ckpt.meta')  # restore_saver must not reuse the name of the saver defined above
        restore_saver.restore(sess, tf.train.latest_checkpoint(checkpoint_path))
        ...
        sess.run(train_op, feed_dict={x: batch_xs, y: batch_ys, keep_prob: dropout_rate, learning_rate_hold: learning_rate, phase_train_hold: True})
        loss_val = sess.run(loss, feed_dict={x: batch_xs, y: batch_ys, keep_prob: dropout_rate, phase_train_hold: False})
        test_accuracy = sess.run(accuracy, feed_dict={x: batch_xs, y: batch_ys, keep_prob: dropout_rate, phase_train_hold: False})
...

Summary:

  1. When fine-tuning all layers directly from the bvlc_alexnet.npy pretrained weights, training does not converge and the loss is nan;
  2. Reusing the model from Experiment 2 (or 3) instead clearly improves test accuracy;
  3. When fine-tuning from the Experiment 2/3 model, convergence speed and final results vary slightly between runs even with identical code; (me: this is probably because the dataset is re-split randomly on every run, so the training and test sets differ each time; the gradient-descent trajectories may also differ. A seeding sketch follows this list.)
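
If that run-to-run variation is mainly caused by the random split, it can be removed by fixing the seeds before loading the data (a sketch against the load_data/__main__ code above; the seed value is arbitrary):

import random
import tensorflow as tf

random.seed(17)         # fixes random.shuffle in load_data, so the 80/20 split is repeatable
tf.set_random_seed(17)  # optionally also fix graph-level randomness such as dropout masks

X, Y = load_data('./train.txt', num_classes)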

Training/test results:
epoch 0, loss 267.6259, testing accuracy 0.19140625
epoch 1, loss 390.37134, testing accuracy 0.12109375
epoch 2, loss 357.51944, testing accuracy 0.08984375
epoch 3, loss 341.1049, testing accuracy 0.15625
epoch 4, loss 261.40005, testing accuracy 0.515625
epoch 5, loss 50.218945, testing accuracy 0.92578125
epoch 6, loss 7.988167, testing accuracy 0.94921875

epoch 481, loss 0.00028576463, testing accuracy 0.98046875
epoch 482, loss nan, testing accuracy 0.07421875

Experiment 4: fine-tune all layers, without dropout during either training or testing

Simply comment out the dropout layers in alexnet.py (as sketched below); the training code is the same as alexnet_train3.py.
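
Based on the create() excerpt from Experiment 2, the change presumably amounts to the following (a sketch, not quoted from the repository):

        # fc6/fc7 with the dropout branches removed (sketch)
        flattened = tf.reshape(pool5, [-1, flatsize])
        fc6 = fc(flattened, flatsize, 4096, name='fc6')
        # fc6 = tf.cond(self.IS_TRAIN, lambda: dropout(fc6, self.KEEP_PROB), lambda: fc6)

        fc7 = fc(fc6, 4096, 4096, name='fc7')
        # fc7 = tf.cond(self.IS_TRAIN, lambda: dropout(fc7, self.KEEP_PROB), lambda: fc7)
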
Training/test results:
epoch 0, loss 6.194342, testing accuracy 0.98828125
epoch 1, loss 0.30533525, testing accuracy 0.9921875
epoch 2, loss 0.13662925, testing accuracy 0.98828125

epoch 499, loss 0.0039327564, testing accuracy 0.98828125

Experiment 5: fine-tune all layers, with dropout during training but not testing, using apply_gradients

Modify alexnet_train3.py from Experiment 3 as follows; for details see reference [5], alexnet_train5.py:

def train(...):
    ...
    with tf.name_scope("train"):
        # optimizer = tf.train.GradientDescentOptimizer(learning_rate_hold)
        # train_op = optimizer.minimize(loss)

        var_list = [v for v in tf.trainable_variables() if v.name.split('/')[0] in train_layers]
        gradients = tf.gradients(ys=loss, xs=var_list)
        gradients = list(zip(gradients, var_list))
        # note: this variant uses the fixed learning_rate_init rather than the learning_rate_hold
        # placeholder, so the piecewise learning-rate schedule is not applied here
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate_init)
        train_op = optimizer.apply_gradients(grads_and_vars=gradients)

Training/test results:
epoch 0, loss 11.310171, testing accuracy 0.984375
epoch 1, loss 10.07435, testing accuracy 0.984375

epoch 499, loss 0.0061178566, testing accuracy 0.984375

Finally, the loss and accuracy curves of each experiment are attached.
Only the first 60 loss values are plotted, since the remaining ~440 values are nearly identical.
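
That truncation just amounts to slicing the loss buffer before plotting (a trivial sketch against save_loss_acc above; the cutoff of 60 is the value used for the figures):

cutoff = 60  # later epochs are nearly constant, so plot only the first 60
plt.plot(np.arange(cutoff), np.array(loss_buf[:cutoff]), 'r*', label='loss')
plt.legend()
plt.show()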

Overall summary

  1. When the training set is small, fine-tune only the last fully connected layer first, then continue fine-tuning all layers on top of that result;
  2. If dropout was used during training, it may be worth removing dropout and fine-tuning once more after training finishes; this can further improve accuracy;
  3. For the rest, see the per-experiment summaries above.

References

[1] finetune_alexnet_with_tensorflow
[2] tenosrflow-alexnet
[3] tensorflow-AlexNet -> Finetune.py
[4] AlexNet implementation + weights in TensorFlow
[5] My handml repository
