Python TensorFlow Tutorial (4): Stepping Up the Difficulty! Implementing the AlexNet Model

AlexNet [8] is the origin of deep convolutional neural networks in the modern sense. Introduced in 2012, this classic model won that year's ImageNet competition. Compared with the LeNet-5 of the previous chapter, AlexNet is significantly deeper and has far more parameters.
Outline

  • Processing the Dogs vs. Cats dataset
    • Preparing the dataset
    • Converting to TFRecord format
  • Implementing the AlexNet model
    • Network architecture
    • Training, saving the model, and visualizing the results

Processing the Dogs vs. Cats Dataset

The Dogs vs. Cats dataset comes from a Kaggle competition of the same name: Dogs vs. Cats. It contains 12,500 pictures of cats and 12,500 pictures of dogs, so this is a binary classification problem. It can be downloaded for free from the competition page; if you don't want to register a Kaggle account, Microsoft also hosts a copy of the dataset.
Unlike MNIST, these are real photographs and TensorFlow has no built-in function for reading them, so the data needs some preprocessing first.

Preparing the Dataset

First, AlexNet expects input images of size 227x227x3, so every image in the dataset has to be resized to that resolution. OpenCV is used here:

# Resize every image to 227x227 (3 channels) to match AlexNet's input size
def rebuild(dir):
    for root, dirs, files in os.walk(dir):
        for file in files:
            try:
                filepath = os.path.join(root, file)
                image = cv2.imread(filepath)
                if image is None:          # cv2.imread returns None for files it cannot decode
                    raise IOError
                dim = (227, 227)
                resized = cv2.resize(image, dim)
                print(file)
                path = "D:/cat_and_dog/Cat/" + file
                cv2.imwrite(path, resized)
            except IOError:
                print(filepath)
                os.remove(filepath)        # delete corrupted images
        cv2.waitKey(0)   # wait for a key press

rebuild("D:/PetImages/Cat")

Corrupted files are simply deleted with os.remove().
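If you prefer to inspect unreadable files before deleting anything, here is a minimal sketch (a hypothetical helper, not part of the original post) that only scans a directory and reports what OpenCV fails to decode:

import os
import cv2

# Hypothetical helper (not from the original post): list the files that cv2.imread
# cannot decode, so they can be reviewed before being removed
def find_corrupted(dir):
    bad = []
    for root, dirs, files in os.walk(dir):
        for file in files:
            filepath = os.path.join(root, file)
            if cv2.imread(filepath) is None:   # imread returns None for unreadable files
                bad.append(filepath)
    return bad

# Example: print(find_corrupted("D:/PetImages/Cat"))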

Converting to TFRecord Format

Creating and reading TFRecord files was covered in Part 2 of this series, Python TensorFlow Tutorial (2): Generating and Reading Data in TensorFlow, so it is not repeated here; the source code is given directly.
Getting the image list and labels:

# Build the list of image paths and their labels (Cat -> 0, Dog -> 1)
def get_file(file_dir):
    images = []
    temp = []
    for root, sub_folders, files in os.walk(file_dir):
        for name in files:
            images.append(os.path.join(root, name))

        for name in sub_folders:
            temp.append(os.path.join(root, name))
    labels = []
    for one_folder in temp:
        n_img = len(os.listdir(one_folder))
        letter = one_folder.split("\\")[-1]   # the folder name is the class (Windows path separator)
        if letter == 'Cat':
            labels = np.append(labels, n_img*[0])
        else:
            labels = np.append(labels, n_img*[1])

    # Shuffle image paths and labels together
    temp = np.array([images, labels])
    temp = temp.transpose()
    np.random.shuffle(temp)

    image_list = list(temp[:, 0])
    label_list = list(temp[:, 1])
    label_list = [int(float(i)) for i in label_list]

    return image_list, label_list

imagelist, labellist = get_file("D:/cat_and_dog")

Converting to a TFRecord file:

# Generate the TFRecord file
def int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))

def bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def ToTFRecord(image_list, label_list, save_dir, name):
    filename = os.path.join(save_dir, name + '.tfrecords')
    n_samples = len(image_list)            # use the argument rather than the global list
    writer = tf.python_io.TFRecordWriter(filename)
    print("transform start...")
    for i in np.arange(0, n_samples):
        try:
            image = cv2.imread(image_list[i])
            if image is None:              # skip files OpenCV cannot decode
                raise IOError
            image_raw = image.tostring()
            label = [int(label_list[i])]
            example = tf.train.Example(features=tf.train.Features(feature={
                'label': int64_feature(label),
                'image_raw': bytes_feature(image_raw)
            }))
            writer.write(example.SerializeToString())
        except IOError:
            print('could not read:', image_list[i])
    writer.close()
    print('transform done!')

Here save_dir and name are the output directory and the file name, respectively.
With the TFRecord file generated, we still need a function that reads it back and supplies batches of data:

# Read and decode batches of examples from the TFRecord file
def read_and_decode(tfrecord_file, batch_size):
    filename_queue = tf.train.string_input_producer([tfrecord_file])
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    img_features = tf.parse_single_example(
        serialized_example,
        features={
            'label': tf.FixedLenFeature([], tf.int64),
            'image_raw': tf.FixedLenFeature([], tf.string),
        })
    image = tf.decode_raw(img_features['image_raw'], tf.uint8)
    image = tf.reshape(image, [227, 227, 3])   # images were stored as 227x227x3
    label = tf.cast(img_features['label'], tf.int32)
    image_batch, label_batch = tf.train.shuffle_batch([image, label],
                                                      batch_size=batch_size,
                                                      min_after_dequeue=100,
                                                      num_threads=64,
                                                      capacity=200
                                                      )
    return image_batch, tf.reshape(label_batch, [batch_size])

image_batch, label_batch = read_and_decode('cat_and_dog.tfrecords', 25)

Each time it is evaluated, this pipeline yields a batch of the requested size, which is what the training loop will consume.
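As a quick sanity check, a single batch can be pulled out of the pipeline before any training happens. This is only a sketch, assuming the cat_and_dog.tfrecords file from above exists and read_and_decode has been called as shown:

# Sketch: pull one batch to verify the input pipeline (assumes cat_and_dog.tfrecords exists)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)   # start the threads that fill the queues
    images, labels = sess.run([image_batch, label_batch])
    print(images.shape, labels)                           # expect (25, 227, 227, 3) and 25 labels
    coord.request_stop()
    coord.join(threads)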

Implementing the AlexNet Model

Network Architecture

Before implementing the individual layers, it is a good habit to manage all the parameters in one place:

# Centralized hyper-parameters
learning_rate = 1e-4  # learning rate
training_iters = 200  # number of training iterations
batch_size = 50       # batch size
n_classes = 2         # number of classes
n_fc1 = 4096          # units in the first fully connected layer
n_fc2 = 2048          # units in the second fully connected layer

# Build the model
x = tf.placeholder(tf.float32, [None, 227, 227, 3])
y = tf.placeholder(tf.float32, [None, n_classes])   # one-hot labels (float, as softmax_cross_entropy_with_logits expects)

W_conv = {
    'conv1': tf.Variable(tf.truncated_normal([11, 11, 3, 96], stddev=0.0001)),
    'conv2': tf.Variable(tf.truncated_normal([5, 5, 96, 256], stddev=0.01)),
    'conv3': tf.Variable(tf.truncated_normal([3, 3, 256, 384], stddev=0.01)),
    'conv4': tf.Variable(tf.truncated_normal([3, 3, 384, 384], stddev=0.01)),
    'conv5': tf.Variable(tf.truncated_normal([3, 3, 384, 256], stddev=0.01)),
    'fc1': tf.Variable(tf.truncated_normal([6*6*256, n_fc1], stddev=0.1)),
    'fc2': tf.Variable(tf.truncated_normal([n_fc1, n_fc2], stddev=0.1)),
    'fc3': tf.Variable(tf.truncated_normal([n_fc2, n_classes], stddev=0.1))
}

b_conv = {
    'conv1': tf.Variable(tf.constant(0.0, dtype=tf.float32, shape=[96])),
    'conv2': tf.Variable(tf.constant(0.1, dtype=tf.float32, shape=[256])),
    'conv3': tf.Variable(tf.constant(0.1, dtype=tf.float32, shape=[384])),
    'conv4': tf.Variable(tf.constant(0.1, dtype=tf.float32, shape=[384])),
    'conv5': tf.Variable(tf.constant(0.1, dtype=tf.float32, shape=[256])),
    'fc1': tf.Variable(tf.constant(0.1, dtype=tf.float32, shape=[n_fc1])),
    'fc2': tf.Variable(tf.constant(0.1, dtype=tf.float32, shape=[n_fc2])),
    'fc3': tf.Variable(tf.constant(0.0, dtype=tf.float32, shape=[n_classes]))
}

Now for each layer in turn.
First layer: convolution + pooling

# First convolutional layer
conv1 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(x, W_conv['conv1'], strides=[1, 4, 4, 1], padding='SAME'), b_conv['conv1']))
pool1 = tf.nn.avg_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID')
# LRN (Local Response Normalization)
norm1 = tf.nn.lrn(pool1, 5, bias=1.0, alpha=0.001/9.0, beta=0.75)

AlexNet used LRN (Local Response Normalization) to improve generalization; it has since been largely superseded by batch normalization (BN). The code here tries to stay close to the original model, although note two simplifications: the original AlexNet uses overlapping max pooling rather than the average pooling above, and it applies LRN before, not after, the pooling step.
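For comparison, here is what the batch-normalization alternative could look like with TF1's tf.layers.batch_normalization. This is only a sketch and is not part of the model built in this article; is_training is a hypothetical placeholder:

# Sketch: replacing LRN with batch normalization (not used in this article's model)
is_training = tf.placeholder(tf.bool)   # True while training, False at evaluation time

bn1 = tf.layers.batch_normalization(pool1, training=is_training)

# batch_normalization maintains moving statistics through update ops,
# which have to run together with the optimizer:
# update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
# with tf.control_dependencies(update_ops):
#     optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)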
Second layer: convolution + pooling

# Second convolutional layer
conv2 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(norm1, W_conv['conv2'], strides=[1, 1, 1, 1], padding='SAME'), b_conv['conv2']))
pool2 = tf.nn.avg_pool(conv2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID')
# LRN (Local Response Normalization)
norm2 = tf.nn.lrn(pool2, 5, bias=1.0, alpha=0.001/9.0, beta=0.75)

Third layer: convolution (no pooling)

# Third convolutional layer
conv3 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(norm2, W_conv['conv3'], strides=[1, 1, 1, 1], padding='SAME'), b_conv['conv3']))

Fourth layer: convolution (no pooling)

# Fourth convolutional layer
conv4 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(conv3, W_conv['conv4'], strides=[1, 1, 1, 1], padding='SAME'), b_conv['conv4']))

Fifth layer: convolution + pooling

# Fifth convolutional layer
conv5 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(conv4, W_conv['conv5'], strides=[1, 1, 1, 1], padding='SAME'), b_conv['conv5']))
pool5 = tf.nn.avg_pool(conv5, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID')

The feature maps are then flattened into a vector for the sixth, fully connected layer:

reshape = tf.reshape(pool5, [-1, 6*6*256])   # flatten: 6x6x256 = 9216 features per example

# Sixth layer: fully connected
fc1 = tf.nn.relu(tf.add(tf.matmul(reshape, W_conv['fc1']), b_conv['fc1']))
# dropout
fc1 = tf.nn.dropout(fc1, 0.5)

Dropout is used here to randomly silence part of the neurons and so reduce overfitting.
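A common refinement, not used in the code above, is to feed the keep probability through a placeholder so dropout can be switched off at evaluation time. A minimal sketch (keep_prob and the *_drop names are hypothetical):

# Sketch: run-time controllable dropout via a placeholder (not part of the original script)
keep_prob = tf.placeholder(tf.float32)        # 0.5 while training, 1.0 at evaluation time

fc1_drop = tf.nn.dropout(fc1, keep_prob)      # keep_prob is the probability of keeping a unit
fc2_drop = tf.nn.dropout(fc2, keep_prob)

# Training:   sess.run(optimizer, feed_dict={x: image, y: labels, keep_prob: 0.5})
# Evaluation: sess.run(accuracy,  feed_dict={x: image, y: labels, keep_prob: 1.0})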
Seventh layer: fully connected

# Seventh layer: fully connected
fc2 = tf.nn.relu(tf.add(tf.matmul(fc1, W_conv['fc2']), b_conv['fc2']))
# dropout
fc2 = tf.nn.dropout(fc2, 0.5)

Classification:

# Final fully connected layer producing the class logits
fc3 = tf.add(tf.matmul(fc2, W_conv['fc3']), b_conv['fc3'])

Defining the loss and the accuracy:

# Loss
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=fc3, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

# Accuracy
correct_num = tf.equal(tf.argmax(fc3, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_num, tf.float32))

With that, the entire AlexNet model is in place!
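A quick way to double-check the 6x6x256 flatten size used for fc1 is to print the static shapes of the pooling outputs defined above (a small sketch; with a 227x227x3 input, the expected progression is pool1 -> 28x28x96, pool2 -> 13x13x256, pool5 -> 6x6x256):

# Sketch: print the static shapes of the pooling stages to verify the flatten size
for name, tensor in [('pool1', pool1), ('pool2', pool2), ('pool5', pool5)]:
    print(name, tensor.get_shape().as_list())   # e.g. pool5 -> [None, 6, 6, 256]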

Training, Saving the Model, and Visualizing the Results

TensorFlow provides tf.train.Saver for saving a model's graph and weights.
Now we train the network and save the resulting model:

# Training
def train():
    with tf.Session() as sess:
        sess.run(init)
        save_model = ".//model//AlexNetModel.ckpt"
        train_writer = tf.summary.FileWriter(".//log", sess.graph)   # write the graph for TensorBoard
        saver = tf.train.Saver()
        losslist = []  # records the loss of every step
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)   # start the input queue threads
        step = 0
        for i in range(500):
            step = i
            image, label = sess.run([image_batch, label_batch])
            labels = onehot(label)   # convert integer labels to one-hot vectors
            sess.run(optimizer, feed_dict={x: image, y: labels})
            loss_record = sess.run(loss, feed_dict={x: image, y: labels})
            print(loss_record)
            losslist.append(loss_record)

        saver.save(sess, save_model)   # save the trained weights
        print("model save finished!")
        coord.request_stop()
        coord.join(threads)
    plt.plot(losslist)
    plt.xlabel('iter')
    plt.ylabel('loss')
    plt.tight_layout()
    plt.savefig('cnn-tf-AlexNet.png', dpi=200)

The train() function above uses an onehot helper to convert integer labels into one-hot vectors; its code is as follows:

# Convert integer labels to one-hot vectors
def onehot(label):
    onehot_label = np.zeros([len(label), n_classes])   # width fixed to the number of classes
    for i in range(len(label)):
        onehot_label[i][label[i]] = 1
    return onehot_label

Unlike the previous chapter, both the console output and the chart here track the loss (in general, both the loss and the accuracy are indicators worth watching).
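If you also want to print the accuracy during training (the accuracy op is already defined above), a small addition inside the training loop would do. This is only a sketch, not part of the original train():

# Sketch: also log accuracy inside the training loop of train()
acc_record = sess.run(accuracy, feed_dict={x: image, y: labels})
print("step %d, loss %.4f, accuracy %.4f" % (i, loss_record, acc_record))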
Training result:
(Figure: training loss curve, saved as cnn-tf-AlexNet.png)
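To reuse the trained weights later, the checkpoint can be restored into the same graph. The following is a sketch that assumes the checkpoint written by train() exists at ./model/AlexNetModel.ckpt; the predict helper and the example path are hypothetical:

# Sketch: restore the checkpoint and classify a single image (helper not in the original post)
def predict(image_path):
    img = cv2.imread(image_path)
    img = cv2.resize(img, (227, 227)).astype(np.float32)
    with tf.Session() as sess:
        saver = tf.train.Saver()
        saver.restore(sess, ".//model//AlexNetModel.ckpt")   # load the trained variables
        logits = sess.run(fc3, feed_dict={x: img[np.newaxis, :]})
        # note: dropout is still active in this graph (rate fixed at 0.5), so single predictions are noisy
        return "Cat" if np.argmax(logits) == 0 else "Dog"

# Example: print(predict("D:/cat_and_dog/Cat/1.jpg"))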
The complete code:

import tensorflow as tf
import matplotlib.pyplot as plt
import cv2
import os
import numpy as np

# Centralized hyper-parameters
learning_rate = 1e-4  # learning rate
training_iters = 200  # number of training iterations
batch_size = 50       # batch size
display_step = 5      # logging interval (not used below)
n_classes = 2         # number of classes
n_fc1 = 4096          # units in the first fully connected layer
n_fc2 = 2048          # units in the second fully connected layer

# Build the model
x = tf.placeholder(tf.float32, [None, 227, 227, 3])
y = tf.placeholder(tf.float32, [None, n_classes])   # one-hot labels

W_conv = {
    'conv1': tf.Variable(tf.truncated_normal([11, 11, 3, 96], stddev=0.0001)),
    'conv2': tf.Variable(tf.truncated_normal([5, 5, 96, 256], stddev=0.01)),
    'conv3': tf.Variable(tf.truncated_normal([3, 3, 256, 384], stddev=0.01)),
    'conv4': tf.Variable(tf.truncated_normal([3, 3, 384, 384], stddev=0.01)),
    'conv5': tf.Variable(tf.truncated_normal([3, 3, 384, 256], stddev=0.01)),
    'fc1': tf.Variable(tf.truncated_normal([6*6*256, n_fc1], stddev=0.1)),
    'fc2': tf.Variable(tf.truncated_normal([n_fc1, n_fc2], stddev=0.1)),
    'fc3': tf.Variable(tf.truncated_normal([n_fc2, n_classes], stddev=0.1))
}

b_conv = {
    'conv1': tf.Variable(tf.constant(0.0, dtype=tf.float32, shape=[96])),
    'conv2': tf.Variable(tf.constant(0.1, dtype=tf.float32, shape=[256])),
    'conv3': tf.Variable(tf.constant(0.1, dtype=tf.float32, shape=[384])),
    'conv4': tf.Variable(tf.constant(0.1, dtype=tf.float32, shape=[384])),
    'conv5': tf.Variable(tf.constant(0.1, dtype=tf.float32, shape=[256])),
    'fc1': tf.Variable(tf.constant(0.1, dtype=tf.float32, shape=[n_fc1])),
    'fc2': tf.Variable(tf.constant(0.1, dtype=tf.float32, shape=[n_fc2])),
    'fc3': tf.Variable(tf.constant(0.0, dtype=tf.float32, shape=[n_classes]))
}

# First convolutional layer
conv1 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(x, W_conv['conv1'], strides=[1, 4, 4, 1], padding='SAME'), b_conv['conv1']))
pool1 = tf.nn.avg_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID')
# LRN (Local Response Normalization)
norm1 = tf.nn.lrn(pool1, 5, bias=1.0, alpha=0.001/9.0, beta=0.75)

# Second convolutional layer
conv2 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(norm1, W_conv['conv2'], strides=[1, 1, 1, 1], padding='SAME'), b_conv['conv2']))
pool2 = tf.nn.avg_pool(conv2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID')
# LRN (Local Response Normalization)
norm2 = tf.nn.lrn(pool2, 5, bias=1.0, alpha=0.001/9.0, beta=0.75)

# Third convolutional layer
conv3 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(norm2, W_conv['conv3'], strides=[1, 1, 1, 1], padding='SAME'), b_conv['conv3']))

# Fourth convolutional layer
conv4 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(conv3, W_conv['conv4'], strides=[1, 1, 1, 1], padding='SAME'), b_conv['conv4']))

# Fifth convolutional layer
conv5 = tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(conv4, W_conv['conv5'], strides=[1, 1, 1, 1], padding='SAME'), b_conv['conv5']))
pool5 = tf.nn.avg_pool(conv5, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID')

reshape = tf.reshape(pool5, [-1, 6*6*256])   # flatten: 6x6x256 = 9216 features per example

# Sixth layer: fully connected
fc1 = tf.nn.relu(tf.add(tf.matmul(reshape, W_conv['fc1']), b_conv['fc1']))
# dropout
fc1 = tf.nn.dropout(fc1, 0.5)

# Seventh layer: fully connected
fc2 = tf.nn.relu(tf.add(tf.matmul(fc1, W_conv['fc2']), b_conv['fc2']))
# dropout
fc2 = tf.nn.dropout(fc2, 0.5)

# Final fully connected layer producing the class logits
fc3 = tf.add(tf.matmul(fc2, W_conv['fc3']), b_conv['fc3'])

# Loss
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=fc3, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

# Accuracy
correct_num = tf.equal(tf.argmax(fc3, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_num, tf.float32))

# Initialize variables
init = tf.global_variables_initializer()




# Resize every image to 227x227 to match AlexNet's input size
def rebuild(dir):
    for root, dirs, files in os.walk(dir):
        for file in files:
            try:
                filepath = os.path.join(root, file)
                image = cv2.imread(filepath)
                if image is None:          # cv2.imread returns None for files it cannot decode
                    raise IOError
                dim = (227, 227)
                resized = cv2.resize(image, dim)
                print(file)
                path = "D:/cat_and_dog/Cat/" + file
                cv2.imwrite(path, resized)
            except IOError:
                print(filepath)
                os.remove(filepath)        # delete corrupted images
        cv2.waitKey(0)   # wait for a key press

rebuild("D:/PetImages/Cat")

# Build the list of image paths and their labels (Cat -> 0, Dog -> 1)
def get_file(file_dir):
    images = []
    temp = []
    for root, sub_folders, files in os.walk(file_dir):
        for name in files:
            images.append(os.path.join(root, name))

        for name in sub_folders:
            temp.append(os.path.join(root, name))
    labels = []
    for one_folder in temp:
        n_img = len(os.listdir(one_folder))
        letter = one_folder.split("\\")[-1]   # the folder name is the class (Windows path separator)
        if letter == 'Cat':
            labels = np.append(labels, n_img*[0])
        else:
            labels = np.append(labels, n_img*[1])

    # Shuffle image paths and labels together
    temp = np.array([images, labels])
    temp = temp.transpose()
    np.random.shuffle(temp)

    image_list = list(temp[:, 0])
    label_list = list(temp[:, 1])
    label_list = [int(float(i)) for i in label_list]

    return image_list, label_list

imagelist, labellist = get_file("D:/cat_and_dog")


# Generate the TFRecord file
def int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))

def bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def ToTFRecord(image_list, label_list, save_dir, name):
    filename = os.path.join(save_dir, name + '.tfrecords')
    n_samples = len(image_list)            # use the argument rather than the global list
    writer = tf.python_io.TFRecordWriter(filename)
    print("transform start...")
    for i in np.arange(0, n_samples):
        try:
            image = cv2.imread(image_list[i])
            if image is None:              # skip files OpenCV cannot decode
                raise IOError
            image_raw = image.tostring()
            label = [int(label_list[i])]
            example = tf.train.Example(features=tf.train.Features(feature={
                'label': int64_feature(label),
                'image_raw': bytes_feature(image_raw)
            }))
            writer.write(example.SerializeToString())
        except IOError:
            print('could not read:', image_list[i])
    writer.close()
    print('transform done!')

ToTFRecord(imagelist, labellist, '.', 'cat_and_dog')



# Read and decode batches of examples from the TFRecord file
def read_and_decode(tfrecord_file, batch_size):
    filename_queue = tf.train.string_input_producer([tfrecord_file])
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    img_features = tf.parse_single_example(
        serialized_example,
        features={
            'label': tf.FixedLenFeature([], tf.int64),
            'image_raw': tf.FixedLenFeature([], tf.string),
        })
    image = tf.decode_raw(img_features['image_raw'], tf.uint8)
    image = tf.reshape(image, [227, 227, 3])   # images were stored as 227x227x3
    label = tf.cast(img_features['label'], tf.int32)
    image_batch, label_batch = tf.train.shuffle_batch([image, label],
                                                      batch_size=batch_size,
                                                      min_after_dequeue=100,
                                                      num_threads=64,
                                                      capacity=200
                                                      )
    return image_batch, tf.reshape(label_batch, [batch_size])

image_batch, label_batch = read_and_decode('cat_and_dog.tfrecords', 10)


# Convert integer labels to one-hot vectors
def onehot(label):
    onehot_label = np.zeros([len(label), n_classes])   # width fixed to the number of classes
    for i in range(len(label)):
        onehot_label[i][label[i]] = 1
    return onehot_label


# Training
def train():
    with tf.Session() as sess:
        sess.run(init)
        save_model = ".//model//AlexNetModel.ckpt"
        train_writer = tf.summary.FileWriter(".//log", sess.graph)   # write the graph for TensorBoard
        saver = tf.train.Saver()
        losslist = []  # records the loss of every step
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)   # start the input queue threads
        step = 0
        for i in range(500):
            step = i
            image, label = sess.run([image_batch, label_batch])
            labels = onehot(label)   # convert integer labels to one-hot vectors
            sess.run(optimizer, feed_dict={x: image, y: labels})
            loss_record = sess.run(loss, feed_dict={x: image, y: labels})
            print(loss_record)
            losslist.append(loss_record)

        saver.save(sess, save_model)   # save the trained weights
        print("model save finished!")
        coord.request_stop()
        coord.join(threads)
    plt.plot(losslist)
    plt.xlabel('iter')
    plt.ylabel('loss')
    plt.tight_layout()
    plt.savefig('cnn-tf-AlexNet.png', dpi=200)

train()

There is quite a lot in this chapter, and as the dataset grows and the models get more complex, training takes longer and longer. I'm busy tomorrow, so expect a one-day break in updates ^ ^
P.S. The food at the new school's canteen is really good; looks like these three years are going to be hopelessly fattening...
