I've recently been learning the TensorFlow framework, and when I got to the chapter on CNNs (convolutional neural networks) I followed the book and built a small CNN-based facial expression recognition project.
My hardware:
CPU: G4560 (yes, in this day and age I'm still on a G4560, can you believe it?)
GPU: GTX 1050 4G
Python version: 3.6; TensorFlow version: 1.5.
I'm using tensorflow-gpu 1.5. While running I hit an error about the runtime library not matching the version the code was compiled against; downgrading cuDNN solved it. Enough small talk, let's get to the point.
The facial expression dataset I use comes from a Kaggle competition: https://inclass.kaggle.com/c/facial-keypoints-detector/data. You can download the dataset from that page, but you need to register a Kaggle account first.
The dataset contains three .csv files: test.csv, train.csv, train_identity.csv.
Each image's expression is encoded as a single digit from 0 to 6: 0 = angry, 1 = disgust, 2 = fear, 3 = happy, 4 = sad, 5 = surprise, 6 = neutral.
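If you want a quick look at the data before training, a few lines of pandas are enough. This is only a sanity-check sketch, assuming the column names 'Emotion' and 'Pixels' (the same ones the helper code below relies on) and that train.csv sits in the EmotionDetector/ data directory:

import pandas as pd

# Each row holds an 'Emotion' label (0-6) and a 'Pixels' string of 48*48 space-separated grayscale values
df = pd.read_csv("EmotionDetector/train.csv")
print(df.columns.tolist())
print(df['Emotion'].value_counts())       # samples per expression class
print(len(df['Pixels'].iloc[0].split()))  # should be 2304 = 48 * 48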
The network architecture I started with (a minimal sketch follows the layer list below) was:
input--->conv1-->pool1-->conv2-->pool2--->fully connected layer--->output layer
input: input layer, 48 × 48 × 1
conv1: convolutional layer, 5 × 5 × 1 × 32
pool1: max pooling layer, 2 × 2
conv2: convolutional layer, 3 × 3 × 32 × 64
pool2: max pooling layer, 2 × 2
fully connected layer: 256 neurons; its input is the 12 × 12 × 64 feature map flattened into a one-dimensional tensor
output layer: 7 neurons, one per expression class
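Here is that minimal sketch, written in the same TensorFlow 1.x style as the full code further down. It is only an illustration of the layer list above, with its own small helpers, not the code I actually trained with:

import tensorflow as tf

def emotion_cnn_v1(x):
    """Sketch of the initial architecture above; x has shape [None, 48, 48, 1]."""
    def conv_relu(inp, shape):
        w = tf.Variable(tf.truncated_normal(shape, stddev=0.02))
        b = tf.Variable(tf.constant(0.0, shape=[shape[-1]]))
        return tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(inp, w, strides=[1, 1, 1, 1], padding="SAME"), b))
    def pool(inp):
        return tf.nn.max_pool(inp, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

    h = pool(conv_relu(x, [5, 5, 1, 32]))    # conv1 + pool1: 48x48x32 -> 24x24x32
    h = pool(conv_relu(h, [3, 3, 32, 64]))   # conv2 + pool2: 24x24x64 -> 12x12x64
    flat = tf.reshape(h, [-1, 12 * 12 * 64])
    w_fc = tf.Variable(tf.truncated_normal([12 * 12 * 64, 256], stddev=0.02))
    b_fc = tf.Variable(tf.constant(0.0, shape=[256]))
    fc = tf.nn.relu(tf.matmul(flat, w_fc) + b_fc)
    w_out = tf.Variable(tf.truncated_normal([256, 7], stddev=0.02))
    b_out = tf.Variable(tf.constant(0.0, shape=[7]))
    return tf.matmul(fc, w_out) + b_out      # logits for the 7 expressions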
Following the book, I started with two convolutional layers, two pooling layers, and two fully connected layers. After training and validation, however, the model's accuracy was not very good, so I changed the architecture and added two more convolutional layers. The idea is borrowed from the VGG architecture, although hardware limits keep it from following VGG exactly. The modified architecture is listed below (a quick check of the flattened feature size follows the list):
input---->conv1---->conv2---->pool1---->conv3----->conv4--->pool2--->fully connected layer--->output layer
input: input layer, 48 × 48 × 1
conv1: convolutional layer, 3 × 3 × 1 × 64
conv2: convolutional layer, 3 × 3 × 64 × 64
pool1: max pooling layer, 2 × 2
conv3: convolutional layer, 3 × 3 × 64 × 128
conv4: convolutional layer, 3 × 3 × 128 × 128
pool2: max pooling layer, 2 × 2
fully connected layer: 256 neurons; its input is the 12 × 12 × 128 feature map flattened into a one-dimensional tensor
output layer: 7 neurons, one per expression class
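As a quick arithmetic check of that 12 × 12 × 128 input: with SAME padding the 3 × 3 convolutions keep the spatial size, and each 2 × 2 max pool halves it.

size = 48
for _ in range(2):        # two 2x2 max-pool layers
    size //= 2
print(size)               # 12
print(size * size * 128)  # 18432 values fed into the fully connected layer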
After validation, the modified model reaches about 90% accuracy, a big improvement over the first version, but still not especially high. With a larger and more diverse training set, tuned hyperparameters, and further changes to the architecture, a better model should be achievable.
import tensorflow as tf
import numpy as np
# import os, sys, inspect
from datetime import datetime
import EmotionDetectorUtils
FLAGS = tf.flags.FLAGS
tf.flags.DEFINE_string("data_dir", "EmotionDetector/", "Path to data files")
tf.flags.DEFINE_string("logs_dir", "logs/EmotionDetector_logs/", "Path to where log files are to be saved")
tf.flags.DEFINE_string("mode", "train", "mode: train (Default)/ test")
BATCH_SIZE = 128
LEARNING_RATE = 0.001       # learning rate
MAX_ITERATIONS = 1001       # maximum number of training steps
REGULARIZATION = 1e-2       # weight of the L2 regularization term
IMAGE_SIZE = 48             # input image size (48 x 48)
NUM_LABELS = 7              # number of output classes
VALIDATION_PERCENT = 0.1    # fraction of training images held out for validation
# Add L2 regularization terms for the given weights and biases to the "losses" collection
def add_to_regularization_loss(W, b):
    tf.add_to_collection("losses", tf.nn.l2_loss(W))
    tf.add_to_collection("losses", tf.nn.l2_loss(b))
# Initialize a weight tensor with the given shape
def weight_variable(shape, stddev=0.02, name=None):
    initial = tf.truncated_normal(shape, stddev=stddev)
    if name is None:
        return tf.Variable(initial)
    else:
        return tf.get_variable(name, initializer=initial)
# Initialize a bias tensor
def bias_variable(shape, name=None):
    initial = tf.constant(0.0, shape=shape)
    if name is None:
        return tf.Variable(initial)
    else:
        return tf.get_variable(name, initializer=initial)
# Basic 2D convolution with stride 1 and SAME padding
def conv2d_basic(x, W, bias):
    conv = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding="SAME")
    return tf.nn.bias_add(conv, bias)
# 2x2 max pooling
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding="SAME")
# The model: conv1 -> conv2 -> pool -> conv3 -> conv4 -> pool -> fc -> output
def emotion_cnn(dataset):
    print("input dataset's shape-->", dataset.shape)
    with tf.name_scope("conv1") as scope:
        tf.summary.histogram("W_conv1", weights['wc1'])
        tf.summary.histogram("b_conv1", biases['bc1'])
        conv_1 = tf.nn.conv2d(dataset, weights['wc1'],
                              strides=[1, 1, 1, 1], padding="SAME")
        print("conv_1's shape--->", conv_1.shape)
        h_conv1 = tf.nn.bias_add(conv_1, biases['bc1'])
        h_1 = tf.nn.relu(h_conv1)
        # h_pool1 = max_pool_2x2(h_1)
        # print("h_pool1 shape-->", h_pool1.shape)
        add_to_regularization_loss(weights['wc1'], biases['bc1'])
    with tf.name_scope("conv2") as scope:
        tf.summary.histogram("W_conv2", weights['wc2'])
        tf.summary.histogram("b_conv2", biases['bc2'])
        # conv_2 = tf.nn.conv2d(h_pool1, weights['wc2'], strides=[1, 1, 1, 1], padding="SAME")
        conv_2 = tf.nn.conv2d(h_1, weights['wc2'], strides=[1, 1, 1, 1], padding="SAME")
        print("conv_2's shape--->", conv_2.shape)
        h_conv2 = tf.nn.bias_add(conv_2, biases['bc2'])
        h_2 = tf.nn.relu(h_conv2)
        h_pool2 = max_pool_2x2(h_2)
        add_to_regularization_loss(weights['wc2'], biases['bc2'])
    with tf.name_scope("conv3") as scope:
        tf.summary.histogram("W_conv3", weights['wc3'])
        tf.summary.histogram("b_conv3", biases['bc3'])
        conv_3 = tf.nn.conv2d(h_pool2, weights['wc3'], strides=[1, 1, 1, 1], padding="SAME")
        print("conv_3 shape-->", conv_3.shape)
        h_conv3 = tf.nn.bias_add(conv_3, biases['bc3'])
        h_3 = tf.nn.relu(h_conv3)
        # h_pool3 = max_pool_2x2(h_3)
        # print("h_pool3 shape-->", h_pool3.shape)
        add_to_regularization_loss(weights['wc3'], biases['bc3'])
    with tf.name_scope("conv4") as scope:
        tf.summary.histogram("W_conv4", weights['wc4'])
        tf.summary.histogram("b_conv4", biases['bc4'])
        # conv_4 = tf.nn.conv2d(h_pool3, weights['wc4'], strides=[1, 1, 1, 1], padding="SAME")
        conv_4 = tf.nn.conv2d(h_3, weights['wc4'], strides=[1, 1, 1, 1], padding="SAME")
        print("conv_4 shape-->", conv_4.shape)
        h_conv4 = tf.nn.bias_add(conv_4, biases['bc4'])
        h_4 = tf.nn.relu(h_conv4)
        h_pool4 = max_pool_2x2(h_4)
        print("h_pool4 shape-->", h_pool4.shape)
        add_to_regularization_loss(weights['wc4'], biases['bc4'])
    with tf.name_scope("fc_1") as scope:
        # Note: the dropout keep probability is hard-coded here, so dropout is also
        # applied at inference time; feeding it through a placeholder would be cleaner.
        prob = 0.5
        image_size = IMAGE_SIZE // 4
        h_flat = tf.reshape(h_pool4, [-1, image_size * image_size * 128])
        print("h_flat shape--->", h_flat.shape)
        tf.summary.histogram("W_fc1", weights['wf1'])
        tf.summary.histogram("b_fc1", biases['bf1'])
        h_fc1 = tf.nn.relu(tf.matmul(h_flat, weights['wf1']) + biases['bf1'])
        print("h_fc1's shape--->", h_fc1.shape)
        h_fc1_dropout = tf.nn.dropout(h_fc1, prob)
        print("h_fc1_dropout shape-->", h_fc1_dropout.shape)
    with tf.name_scope("fc_2") as scope:
        tf.summary.histogram("W_fc2", weights['wf2'])
        tf.summary.histogram("b_fc2", biases['bf2'])
        pred = tf.matmul(h_fc1_dropout, weights['wf2']) + biases['bf2']
        print("pred shape-->", pred.shape)
    return pred
weights = {
    'wc1': weight_variable([3, 3, 1, 64], name="W_conv1"),
    'wc2': weight_variable([3, 3, 64, 64], name="W_conv2"),
    'wc3': weight_variable([3, 3, 64, 128], name="W_conv3"),
    'wc4': weight_variable([3, 3, 128, 128], name="W_conv4"),
    'wf1': weight_variable([(IMAGE_SIZE // 4) * (IMAGE_SIZE // 4) * 128, 256], name="W_fc1"),
    'wf2': weight_variable([256, NUM_LABELS], name="W_fc2")
}
biases = {
    'bc1': bias_variable([64], name="b_conv1"),
    'bc2': bias_variable([64], name="b_conv2"),
    'bc3': bias_variable([128], name="b_conv3"),
    'bc4': bias_variable([128], name="b_conv4"),
    'bf1': bias_variable([256], name="b_fc1"),
    'bf2': bias_variable([NUM_LABELS], name="b_fc2")
}
# Total loss: softmax cross-entropy plus the weighted L2 regularization terms
def loss(pred, label):
    cross_entropy_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=pred, labels=label))
    tf.summary.scalar('Entropy', cross_entropy_loss)
    reg_losses = tf.add_n(tf.get_collection("losses"))
    # tf.summary.scalar('Reg_loss', reg_losses)
    return cross_entropy_loss + REGULARIZATION * reg_losses
# Adam optimizer; LEARNING_RATE matches Adam's default of 0.001
def train(loss, step):
    return tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss, global_step=step)
# Slice out the next batch of BATCH_SIZE images and labels
def get_next_batch(images, labels, step):
    offset = (step * BATCH_SIZE) % (images.shape[0] - BATCH_SIZE)
    batch_images = images[offset: offset + BATCH_SIZE]
    batch_labels = labels[offset: offset + BATCH_SIZE]
    return batch_images, batch_labels
# Entry point
def main(argv=None):
    # Load the data
    train_images, train_labels, valid_images, valid_labels, test_images = EmotionDetectorUtils.read_data(FLAGS.data_dir)
    print("Train size: %s" % train_images.shape[0])
    print('Validation size: %s' % valid_images.shape[0])
    print("Test size: %s" % test_images.shape[0])
    # global_step tracks how many optimization steps have run; trainable=False
    # means TensorFlow will not try to optimize this variable
    global_step = tf.Variable(0, trainable=False)
    dropout_prob = tf.placeholder(tf.float32)  # defined but unused; the keep-prob is hard-coded in emotion_cnn
    # Placeholder for the input images; None means any number of images,
    # each IMAGE_SIZE x IMAGE_SIZE pixels with a single color channel
    input_dataset = tf.placeholder(tf.float32, [None, IMAGE_SIZE, IMAGE_SIZE, 1], name="input")
    # Placeholder for the true labels of the images in input_dataset
    input_labels = tf.placeholder(tf.float32, [None, NUM_LABELS])
    # Network output (logits)
    pred = emotion_cnn(input_dataset)
    # output_pred holds the predicted class probabilities, used for testing and validation
    output_pred = tf.nn.softmax(pred, name="output")
    correct_prediction = tf.equal(tf.argmax(output_pred, 1), tf.argmax(input_labels, 1))
    # tf.cast(x, dtype, name=None) casts the input to dtype
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    tf.summary.scalar("accuracy", accuracy)
    # loss_val is the error between the predictions (pred) and the true labels (input_labels)
    loss_val = loss(pred, input_labels)
    # Build the optimizer op
    train_op = train(loss_val, global_step)
    # summary_op merges all summaries for TensorBoard visualization
    summary_op = tf.summary.merge_all()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        summary_writer = tf.summary.FileWriter(FLAGS.logs_dir, sess.graph_def)
        # saver is used to save and restore the model
        saver = tf.train.Saver()
        ckpt = tf.train.get_checkpoint_state(FLAGS.logs_dir)
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess, ckpt.model_checkpoint_path)
            print("Model Restored!")
        # Start training
        for step in range(MAX_ITERATIONS):
            # Fetch a batch of BATCH_SIZE training samples
            batch_image, batch_label = get_next_batch(train_images, train_labels, step)
            # print("batch image's shape--->", batch_image.shape)
            feed_dict = {input_dataset: batch_image, input_labels: batch_label}
            # Run the optimizer, feeding the placeholders
            sess.run(train_op, feed_dict=feed_dict)
            if step % 10 == 0:
                train_loss, summary_str = sess.run([loss_val, summary_op], feed_dict=feed_dict)
                summary_writer.add_summary(summary_str, global_step=step)
                print("Training Loss: %f" % train_loss)
            # Every 100 steps, evaluate on the validation set and save a checkpoint
            if step % 100 == 0:
                valid_loss = sess.run(loss_val, feed_dict={input_dataset: valid_images, input_labels: valid_labels})
                print("%s Validation Loss: %f" % (datetime.now(), valid_loss))
                print("Accuracy: ", accuracy.eval(feed_dict={input_dataset: valid_images, input_labels: valid_labels}))
                saver.save(sess, FLAGS.logs_dir + 'model.ckpt', global_step=step)

if __name__ == "__main__":
    tf.app.run()
There is also a helper .py file, EmotionDetectorUtils.py:
import pandas as pd
import numpy as np
import os, sys, inspect
from six.moves import cPickle as pickle
import scipy.misc as misc
IMAGE_SIZE = 48
NUM_LABELS = 7
VALIDATION_PERCENT = 0.1 # use 10 percent of training images for validation
IMAGE_LOCATION_NORM = IMAGE_SIZE // 2
np.random.seed(0)
emotion = {0: 'anger', 1: 'disgust',
           2: 'fear', 3: 'happy',
           4: 'sad', 5: 'surprise', 6: 'neutral'}
# Tallies how often each of the 7 expressions is predicted
class testResult:
    def __init__(self):
        self.anger = 0
        self.disgust = 0
        self.fear = 0
        self.happy = 0
        self.sad = 0
        self.surprise = 0
        self.neutral = 0

    def evaluate(self, label):
        if 0 == label:
            self.anger += 1
        if 1 == label:
            self.disgust += 1
        if 2 == label:
            self.fear += 1
        if 3 == label:
            self.happy += 1
        if 4 == label:
            self.sad += 1
        if 5 == label:
            self.surprise += 1
        if 6 == label:
            self.neutral += 1

    def display_result(self, evaluations):
        print("anger = " + str((self.anger / float(evaluations)) * 100) + "%")
        print("disgust = " + str((self.disgust / float(evaluations)) * 100) + "%")
        print("fear = " + str((self.fear / float(evaluations)) * 100) + "%")
        print("happy = " + str((self.happy / float(evaluations)) * 100) + "%")
        print("sad = " + str((self.sad / float(evaluations)) * 100) + "%")
        print("surprise = " + str((self.surprise / float(evaluations)) * 100) + "%")
        print("neutral = " + str((self.neutral / float(evaluations)) * 100) + "%")
# Load the training/validation/test data, caching the parsed arrays in a pickle file
def read_data(data_dir, force=False):
    def create_onehot_label(x):
        label = np.zeros((1, NUM_LABELS), dtype=np.float32)
        label[:, int(x)] = 1
        return label

    pickle_file = os.path.join(data_dir, "EmotionDetectorData.pickle")
    if force or not os.path.exists(pickle_file):
        train_filename = os.path.join(data_dir, "train.csv")
        data_frame = pd.read_csv(train_filename)
        data_frame['Pixels'] = data_frame['Pixels'].apply(lambda x: np.fromstring(x, sep=" ") / 255.0)
        data_frame = data_frame.dropna()
        print("Reading train.csv ...")
        train_images = np.vstack(data_frame['Pixels']).reshape(-1, IMAGE_SIZE, IMAGE_SIZE, 1)
        print(train_images.shape)
        train_labels = np.array(list(map(create_onehot_label, data_frame['Emotion'].values))).reshape(-1, NUM_LABELS)
        print(train_labels.shape)

        # Shuffle, then hold out the first VALIDATION_PERCENT of images for validation
        permutations = np.random.permutation(train_images.shape[0])
        train_images = train_images[permutations]
        train_labels = train_labels[permutations]
        validation_count = int(train_images.shape[0] * VALIDATION_PERCENT)
        validation_images = train_images[:validation_count]
        validation_labels = train_labels[:validation_count]
        train_images = train_images[validation_count:]
        train_labels = train_labels[validation_count:]

        print("Reading test.csv ...")
        test_filename = os.path.join(data_dir, "test.csv")
        data_frame = pd.read_csv(test_filename)
        data_frame['Pixels'] = data_frame['Pixels'].apply(lambda x: np.fromstring(x, sep=" ") / 255.0)
        data_frame = data_frame.dropna()
        test_images = np.vstack(data_frame['Pixels']).reshape(-1, IMAGE_SIZE, IMAGE_SIZE, 1)

        with open(pickle_file, "wb") as file:
            try:
                print('Pickling ...')
                save = {
                    "train_images": train_images,
                    "train_labels": train_labels,
                    "validation_images": validation_images,
                    "validation_labels": validation_labels,
                    "test_images": test_images,
                }
                pickle.dump(save, file, pickle.HIGHEST_PROTOCOL)
            except Exception as e:
                print("Unable to pickle file :/", e)

    with open(pickle_file, "rb") as file:
        save = pickle.load(file)
        train_images = save["train_images"]
        train_labels = save["train_labels"]
        validation_images = save["validation_images"]
        validation_labels = save["validation_labels"]
        test_images = save["test_images"]
    return train_images, train_labels, validation_images, validation_labels, test_images
For images to test the model, you can download photos from the web or use your own. Photos you take yourself or find online are never as clean as the training data, so they contain a lot of noise that hurts accuracy. Running face detection first and feeding only the extracted face region into the network should work better; combining face detection with facial expression recognition might give a 1 + 1 > 2 effect, though that is only a guess (a rough sketch of such a crop follows the test script below).
from scipy import misc
import numpy as np
import matplotlib.cm as cm
import tensorflow as tf
import os, sys, inspect
from datetime import datetime
from matplotlib import pyplot as plt
import matplotlib.image as mpimg
import EmotionDetectorUtils
from EmotionDetectorUtils import testResult
import time
def rbg_to_gray(RGB_JPG):
    """
    Convert an RGB color image to grayscale.
    :param RGB_JPG: input image
    :return: grayscale image
    """
    # print(type(RGB_JPG))
    return np.dot(RGB_JPG[..., :3], [0.299, 0.587, 0.114])
l = []
image_list = []
# Read the test images
img = mpimg.imread('myself.jpg')
img2 = mpimg.imread('author_img.jpg')
img3 = mpimg.imread('test.jpg')
img4 = mpimg.imread('test1.jpg')
img5 = mpimg.imread('test2.jpg')
image_list.append(img)
image_list.append(img2)
image_list.append(img3)
image_list.append(img4)
image_list.append(img5)
for img in image_list:
    # Convert each color image to grayscale
    gray = rbg_to_gray(img)
    l.append(gray)
sess = tf.InteractiveSession()
# Restore the previously saved model
new_saver = tf.train.import_meta_graph('logs/EmotionDetector_logs/model.ckpt-1000.meta')
new_saver.restore(sess, 'logs/EmotionDetector_logs/model.ckpt-1000')
tf.get_default_graph().as_graph_def()
x = sess.graph.get_tensor_by_name("input:0")
y_conv = sess.graph.get_tensor_by_name("output:0")
tResult = testResult()
num_evaluation = 1000
for img_gray in l:
    # Note: np.resize only crops/pads the array, it does not rescale the image,
    # so the input photo should already be (close to) 48 x 48.
    # Divide by 255 so the pixel range matches the training preprocessing in read_data.
    image_test = np.resize(img_gray, (1, 48, 48, 1)) / 255.0
    # Show the image
    plt.imshow(img_gray, cmap=plt.get_cmap('gray'))
    plt.show()
    print("Starting evaluation")
    start_time = time.time()
    for i in range(0, num_evaluation):
        result = sess.run(y_conv, feed_dict={x: image_test})
        label = sess.run(tf.argmax(result, 1))
        label = label[0]
        label = int(label)
        tResult.evaluate(label)
    end_time = time.time()
    tResult.display_result(num_evaluation)
    print("Elapsed time ----> %s seconds" % (end_time - start_time))
If you want a really good facial expression recognition model, you can try enlarging the dataset, improving the network architecture, choosing better hyperparameters, and so on. Before doing any of that, though, it helps to look at the loss curves to judge whether the model is currently in a high-bias or high-variance regime, and then decide what to work on; a minimal plotting sketch follows.
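This is only a sketch, assuming you have recorded lists of training and validation losses (hypothetical names) at matching steps:

import matplotlib.pyplot as plt

def plot_loss_curves(steps, train_losses, valid_losses):
    plt.plot(steps, train_losses, label="training loss")
    plt.plot(steps, valid_losses, label="validation loss")
    plt.xlabel("step")
    plt.ylabel("loss")
    plt.legend()
    plt.show()

# Rough reading: both curves high and close together suggests high bias (underfitting);
# a low training loss with a much higher validation loss suggests high variance (overfitting).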
If anything here is wrong, please point it out, thanks. -- a rookie programmer