The Computer Vision Series study notes collect the code I have written while studying artificial intelligence (computer vision). All code in this series is written in Python 3 and runs on the Anaconda platform; when using the code it is assumed you have already installed the relevant Python libraries, which are not covered further here. All code and materials for this series can be downloaded from my GitHub repository: https://github.com/mcyJacky/DeepLearning-CV. If you find any problems, feel free to point them out.
To apply multi-task learning to image captchas, we first need to generate captcha images. Here we use the captcha generation library; its usage is shown below:
# Captcha generation library
# pip install captcha
import os
import random
import sys
from captcha.image import ImageCaptcha

number = ['0','1','2','3','4','5','6','7','8','9']

def random_captcha_text(char_set=number, captcha_size=4):
    # List of captcha characters
    captcha_text = []
    for i in range(captcha_size):
        # Pick a random character
        c = random.choice(char_set)
        # Append it to the captcha
        captcha_text.append(c)
    return captcha_text

# Generate a captcha image for a random text
def gen_captcha_text_and_image():
    image = ImageCaptcha()
    # Get a randomly generated captcha text
    captcha_text = random_captcha_text()
    # Join the character list into a string
    captcha_text = ''.join(captcha_text)
    # Write the captcha image to a file named after its text
    image.write(captcha_text, 'captcha/images/' + captcha_text + '.jpg')

# Fewer than 6000 unique images will result, since duplicate texts overwrite each other
num = 6000

if __name__ == '__main__':
    # Make sure the output directory exists
    os.makedirs('captcha/images', exist_ok=True)
    for i in range(num):
        gen_captcha_text_and_image()
        sys.stdout.write('\r>> Creating image %d/%d' % (i+1, num))
        sys.stdout.flush()
    sys.stdout.write('\n')
    sys.stdout.flush()
    print("Done generating")
After running the program above, captcha images are generated under './captcha/images/', as shown in Figure 1.1. Each captcha contains 4 digits drawn from 0-9; the image is named after those digits, and that name also serves as the training label.
Unlike the single-output training tasks of earlier posts, a captcha consists of 4 digits, so the model needs 4 separate outputs, i.e. multi-task learning. For example, the captcha label 0782 decomposes into four one-hot encoded labels: label0 for digit 0: 1000000000, label1 for digit 7: 0000000100, label2 for digit 8: 0000000010, label3 for digit 2: 0010000000.
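This label decomposition can be sketched with NumPy (a minimal illustration; the helper name `captcha_to_one_hot` is ours and is not part of the original training code):

```python
import numpy as np

def captcha_to_one_hot(text, num_classes=10):
    """Split a 4-digit captcha string into four one-hot label vectors."""
    labels = np.zeros((len(text), num_classes), dtype=np.int32)
    for i, ch in enumerate(text):
        # Row i is the one-hot vector for the i-th digit
        labels[i, int(ch)] = 1
    return labels

labels = captcha_to_one_hot('0782')
print(labels[0])  # [1 0 0 0 0 0 0 0 0 0]  (label0 for digit 0)
```

In practice the training code below keeps the labels as plain integers and lets `sparse_softmax_cross_entropy` handle the encoding internally.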
Next we train a multi-task model based on the classic AlexNet network (the training procedure is not identical to the original AlexNet). AlexNet was the ImageNet 2012 winner; its structure is shown in Figure 2.1.
The implementation is as follows:
import os
import numpy as np
import tensorflow as tf
import tensorflow.contrib.slim as slim

# Dataset path
dataset_dir = "./captcha/images/"
# Fraction of the data used for testing
num_test = 0.2
# Batch size
batch_size = 32
# Number of epochs
epochs = 100
# Number of classes (4 tasks, 10 classes each)
num_classes = 10
# Learning rate
lr = tf.Variable(0.001, dtype=tf.float32)
# Whether we are in training mode
is_training = tf.placeholder(tf.bool)
# Collect all captcha image paths and labels
def get_filenames_and_classes(dataset_dir):
    photo_filenames = []
    labels = []
    for filename in os.listdir(dataset_dir):
        # Full path of the image file
        path = os.path.join(dataset_dir, filename)
        photo_filenames.append(path)
        # The first 4 characters of the filename are the label digits
        label = filename[0:4]
        num_labels = []
        for i in range(4):
            num_labels.append(int(label[i]))
        labels.append(num_labels)
    return photo_filenames, labels
# Get image paths and labels
photo_filenames, labels = get_filenames_and_classes(dataset_dir)
photo_filenames = np.array(photo_filenames)
labels = np.array(labels)

# Shuffle the data
np.random.seed(10)
shuffle_indices = np.random.permutation(np.arange(len(photo_filenames)))
photo_filenames_shuffled = photo_filenames[shuffle_indices]
labels_shuffled = labels[shuffle_indices]

# Split into training and test sets
test_sample_index = -1 * int(num_test * float(len(photo_filenames)))
x_train, x_test = photo_filenames_shuffled[:test_sample_index], photo_filenames_shuffled[test_sample_index:]
y_train, y_test = labels_shuffled[:test_sample_index], labels_shuffled[test_sample_index:]
# Image preprocessing function
def parse_function(filenames, labels=None):
    image = tf.read_file(filenames)
    # Decode the image
    image = tf.image.decode_jpeg(image, channels=3)
    # Resize to the network input size
    image = tf.image.resize_images(image, [224, 224])
    # Scale pixel values to [-1, 1]
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.subtract(image, 0.5)
    image = tf.multiply(image, 2.0)
    return image, labels
# Model training
# Define two placeholders, for file paths and labels
features_placeholder = tf.placeholder(photo_filenames_shuffled.dtype, [None])
labels_placeholder = tf.placeholder(labels_shuffled.dtype, [None, 4])
# Create a dataset object
dataset = tf.data.Dataset.from_tensor_slices((features_placeholder, labels_placeholder))
# Preprocess each image with parse_function
dataset = dataset.map(parse_function)
# One pass over the data per initialization
dataset = dataset.repeat(1)
# Batch the data
dataset = dataset.batch(batch_size)
# Create an initializable iterator
iterator = dataset.make_initializable_iterator()
# Get one batch of data and labels
data_batch, label_batch = iterator.get_next()
# Define the AlexNet model
def alexnet(inputs, is_training=True):
    with slim.arg_scope([slim.conv2d, slim.fully_connected],
                        activation_fn=tf.nn.relu,
                        weights_initializer=tf.glorot_uniform_initializer(),
                        biases_initializer=tf.constant_initializer(0)):
        net = slim.conv2d(inputs, 64, [11, 11], 4)
        net = slim.max_pool2d(net, [3, 3])
        net = slim.conv2d(net, 192, [5, 5])
        net = slim.max_pool2d(net, [3, 3])
        net = slim.conv2d(net, 384, [3, 3])
        net = slim.conv2d(net, 384, [3, 3])
        net = slim.conv2d(net, 256, [3, 3])
        net = slim.max_pool2d(net, [3, 3])
        # Flatten the feature maps
        net = slim.flatten(net)
        net = slim.fully_connected(net, 1024)
        net = slim.dropout(net, is_training=is_training)
        # One output head per captcha digit (4 tasks)
        net0 = slim.fully_connected(net, num_classes, activation_fn=tf.nn.softmax)
        net1 = slim.fully_connected(net, num_classes, activation_fn=tf.nn.softmax)
        net2 = slim.fully_connected(net, num_classes, activation_fn=tf.nn.softmax)
        net3 = slim.fully_connected(net, num_classes, activation_fn=tf.nn.softmax)
        return net0, net1, net2, net3
# Define the session
with tf.Session() as sess:
    # Build the model on the batch tensors
    logits0, logits1, logits2, logits3 = alexnet(data_batch, is_training)

    # Define the losses
    # sparse_softmax_cross_entropy: integer labels
    # softmax_cross_entropy: one-hot labels
    loss0 = tf.losses.sparse_softmax_cross_entropy(label_batch[:,0], logits0)
    loss1 = tf.losses.sparse_softmax_cross_entropy(label_batch[:,1], logits1)
    loss2 = tf.losses.sparse_softmax_cross_entropy(label_batch[:,2], logits2)
    loss3 = tf.losses.sparse_softmax_cross_entropy(label_batch[:,3], logits3)
    # Average the four task losses
    total_loss = (loss0 + loss1 + loss2 + loss3) / 4.0
    # Optimize total_loss
    optimizer = tf.train.AdamOptimizer(learning_rate=lr).minimize(total_loss)

    # Per-digit accuracy
    correct0 = tf.nn.in_top_k(logits0, label_batch[:,0], 1)
    accuracy0 = tf.reduce_mean(tf.cast(correct0, tf.float32))
    correct1 = tf.nn.in_top_k(logits1, label_batch[:,1], 1)
    accuracy1 = tf.reduce_mean(tf.cast(correct1, tf.float32))
    correct2 = tf.nn.in_top_k(logits2, label_batch[:,2], 1)
    accuracy2 = tf.reduce_mean(tf.cast(correct2, tf.float32))
    correct3 = tf.nn.in_top_k(logits3, label_batch[:,3], 1)
    accuracy3 = tf.reduce_mean(tf.cast(correct3, tf.float32))
    # Whole-captcha accuracy: a sample counts only if all 4 digits are correct
    total_correct = tf.cast(correct0, tf.float32) * tf.cast(correct1, tf.float32) \
                    * tf.cast(correct2, tf.float32) * tf.cast(correct3, tf.float32)
    total_accuracy = tf.reduce_mean(total_correct)

    # Initialize all variables
    sess.run(tf.global_variables_initializer())
    # Saver for checkpointing the model
    saver = tf.train.Saver()

    # Train for `epochs` epochs
    for i in range(epochs):
        if i % 30 == 0:
            # Decay the learning rate
            sess.run(tf.assign(lr, lr / 3))
        # Feed the training set into the iterator
        sess.run(iterator.initializer, feed_dict={
            features_placeholder: x_train,
            labels_placeholder: y_train})
        # Train the model
        while True:
            try:
                sess.run(optimizer, feed_dict={is_training: True})
            except tf.errors.OutOfRangeError:
                # Break once the whole training set has been consumed
                break
        # Feed the test set into the iterator
        sess.run(iterator.initializer, feed_dict={
            features_placeholder: x_test,
            labels_placeholder: y_test})
        # Evaluate on the test set; accumulate per-batch metrics in plain Python
        # lists (adding graph ops per batch via tf.add_to_collection would make
        # the graph grow without bound)
        losses, accs0, accs1, accs2, accs3, total_accs = [], [], [], [], [], []
        while True:
            try:
                # Get accuracies and loss for one batch
                acc0, acc1, acc2, acc3, total_acc, l = \
                    sess.run([accuracy0, accuracy1, accuracy2, accuracy3, total_accuracy, total_loss],
                             feed_dict={is_training: False})
                losses.append(l)
                accs0.append(acc0)
                accs1.append(acc1)
                accs2.append(acc2)
                accs3.append(acc3)
                total_accs.append(total_acc)
            except tf.errors.OutOfRangeError:
                # Average the per-batch metrics and report
                print('%d:loss=%.3f acc0=%.3f acc1=%.3f acc2=%.3f acc3=%.3f total_acc=%.3f' %
                      (i, np.mean(losses), np.mean(accs0), np.mean(accs1),
                       np.mean(accs2), np.mean(accs3), np.mean(total_accs)))
                # Break once the whole test set has been consumed
                break
    # Save the model
    saver.save(sess, 'models/model.ckpt', global_step=epochs)
# Partial output:
# 0:loss=2.303 acc0=0.104 acc1=0.101 acc2=0.094 acc3=0.112 total_acc=0.001
# 1:loss=2.303 acc0=0.111 acc1=0.101 acc2=0.099 acc3=0.102 total_acc=0.001
# 2:loss=2.241 acc0=0.224 acc1=0.179 acc2=0.187 acc3=0.230 total_acc=0.001
# 3:loss=2.181 acc0=0.317 acc1=0.226 acc2=0.245 acc3=0.306 total_acc=0.005
# 4:loss=2.127 acc0=0.400 acc1=0.283 acc2=0.256 acc3=0.366 total_acc=0.014
# ...
# 93:loss=1.509 acc0=0.975 acc1=0.948 acc2=0.922 acc3=0.965 total_acc=0.842
# 94:loss=1.510 acc0=0.976 acc1=0.948 acc2=0.919 acc3=0.966 total_acc=0.838
# 95:loss=1.509 acc0=0.976 acc1=0.948 acc2=0.919 acc3=0.965 total_acc=0.836
# 96:loss=1.509 acc0=0.976 acc1=0.947 acc2=0.922 acc3=0.963 total_acc=0.839
# 97:loss=1.509 acc0=0.976 acc1=0.948 acc2=0.918 acc3=0.965 total_acc=0.840
# 98:loss=1.509 acc0=0.977 acc1=0.948 acc2=0.921 acc3=0.964 total_acc=0.843
# 99:loss=1.508 acc0=0.978 acc1=0.949 acc2=0.923 acc3=0.965 total_acc=0.842
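The total_acc column in the log above is stricter than the per-digit accuracies: a captcha only counts as correct when all four digit predictions match, which is why total_acc starts far lower than acc0-acc3. A minimal NumPy sketch of this aggregation (with made-up correctness flags, not the actual graph tensors):

```python
import numpy as np

# Hypothetical per-digit correctness flags for a batch of 3 captchas,
# one column per digit position (like correct0..correct3 in the graph)
correct = np.array([
    [True,  True,  True,  True],   # all four digits right -> counts toward total_acc
    [True,  False, True,  True],   # one digit wrong -> still credits 3 per-digit heads
    [True,  True,  True,  False],
])

per_digit_acc = correct.mean(axis=0)    # acc0..acc3
total_acc = correct.all(axis=1).mean()  # all four digits must match

print(per_digit_acc, total_acc)
```

This is why total_acc can be only 0.842 even when each individual head is above 0.92: the per-digit errors multiply.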
The program above trains the model and saves the result as checkpoint files. Next, we restore the checkpoint and test the model on captcha images:
import os
import numpy as np
import tensorflow as tf
import tensorflow.contrib.slim as slim
import matplotlib.pyplot as plt
from random import choice
from PIL import Image

# Dataset path
dataset_dir = "./captcha/images/"
# Input placeholder
inputs = tf.placeholder(tf.float32, [1, 224, 224, 3])
# Number of classes
num_classes = 10
# Collect all captcha image paths
def get_filenames(dataset_dir):
    photo_filenames = []
    for filename in os.listdir(dataset_dir):
        # Full path of the image file
        path = os.path.join(dataset_dir, filename)
        photo_filenames.append(path)
    return photo_filenames

# Get image paths
photo_filenames = get_filenames(dataset_dir)
# Define the AlexNet model (same structure as in training)
def alexnet(inputs, is_training=True):
    with slim.arg_scope([slim.conv2d, slim.fully_connected],
                        activation_fn=tf.nn.relu,
                        weights_initializer=tf.glorot_uniform_initializer(),
                        biases_initializer=tf.constant_initializer(0)):
        net = slim.conv2d(inputs, 64, [11, 11], 4)
        net = slim.max_pool2d(net, [3, 3])
        net = slim.conv2d(net, 192, [5, 5])
        net = slim.max_pool2d(net, [3, 3])
        net = slim.conv2d(net, 384, [3, 3])
        net = slim.conv2d(net, 384, [3, 3])
        net = slim.conv2d(net, 256, [3, 3])
        net = slim.max_pool2d(net, [3, 3])
        # Flatten the feature maps
        net = slim.flatten(net)
        net = slim.fully_connected(net, 1024)
        net = slim.dropout(net, is_training=is_training)
        # One output head per captcha digit (4 tasks)
        net0 = slim.fully_connected(net, num_classes, activation_fn=tf.nn.softmax)
        net1 = slim.fully_connected(net, num_classes, activation_fn=tf.nn.softmax)
        net2 = slim.fully_connected(net, num_classes, activation_fn=tf.nn.softmax)
        net3 = slim.fully_connected(net, num_classes, activation_fn=tf.nn.softmax)
        return net0, net1, net2, net3
# Define the session
with tf.Session() as sess:
    # Build the model on the input placeholder
    logits0, logits1, logits2, logits3 = alexnet(inputs, False)
    # Predicted digit for each head
    predict0 = tf.argmax(logits0, 1)
    predict1 = tf.argmax(logits1, 1)
    predict2 = tf.argmax(logits2, 1)
    predict3 = tf.argmax(logits3, 1)
    # Initialize all variables
    sess.run(tf.global_variables_initializer())
    # Saver to restore the trained model
    saver = tf.train.Saver()
    saver.restore(sess, 'models/model.ckpt-100')
    for i in range(10):
        filename = choice(photo_filenames)
        # Read the image
        image = Image.open(filename)
        # Resize to match the model input
        image = image.resize((224, 224))
        image = np.array(image)
        # Preprocess exactly as during training: scale to [-1, 1]
        image_data = image / 255.0
        image_data = image_data - 0.5
        image_data = image_data * 2
        # Add the batch dimension
        image_data = image_data.reshape((1, 224, 224, 3))
        # Run the prediction
        pre0, pre1, pre2, pre3 = sess.run([predict0, predict1, predict2, predict3],
                                          feed_dict={inputs: image_data})
        # Ground-truth label from the filename
        label = filename.split('/')[-1][0:4]
        plt.imshow(image)
        plt.axis('off')
        plt.title('predict:' + str(pre0[0]) + str(pre1[0]) + str(pre2[0]) + str(pre3[0])
                  + '\n' + 'label:' + label)
        plt.show()
Part of the output of this program is shown in Figure 3.1: predict:6321 is the predicted result and label:6321 is the image's true label.
References:
1. 城市数据团 course "AI Engineer", computer vision track
2. deeplearning.ai, Andrew Ng, "Deep Learning Specialization"
3. "Machine Learning" by Zhou Zhihua (周志华)
4. "Deep Learning" by Ian Goodfellow
Reposting notice:
License: free non-commercial reposting, with attribution and a link to the source
Author: mcyJacky
Source: https://blog.csdn.net/mcyJacky