I. Main modules
align/: neural networks for face detection and face alignment
facenet: the neural network that maps faces to feature embeddings
util/plot_learning_curves.m: a MATLAB script for plotting the training curves while training the softmax model
II. Scripts under facenet/contributed/:
1. Face clustering based on MTCNN and FaceNet
Code: facenet/contributed/cluster.py (facenet/contributed/clustering.py implements similar functionality, just without the MTCNN detection step)
Main functionality:
① Use MTCNN for face detection, alignment and cropping
② Compute FaceNet embeddings for the cropped faces
③ Cluster the embedding vectors using Euclidean distance
2. Face recognition based on MTCNN and FaceNet (given a single image, determine who the person is)
Code: facenet/contributed/predict.py
Main functionality:
① Use MTCNN for face detection, alignment and cropping
② Compute FaceNet embeddings for the cropped faces
③ Run predict.py to perform face recognition (requires a trained SVM model)
3. Export face embeddings and image labels as numpy arrays
Code: facenet/contributed/export_embeddings.py
Main functionality:
① The input data must already be aligned and cropped
② Outputs embeddings.npy, labels.npy and label_strings.npy
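As a quick illustration (not part of the repository), the exported arrays can be loaded back with NumPy and used in the same spirit as the clustering step above, i.e. by comparing Euclidean distances between embeddings:
import numpy as np
embeddings = np.load('embeddings.npy')        # one embedding vector per image
labels = np.load('labels.npy')                # numeric label per image
label_strings = np.load('label_strings.npy')  # label names
# Euclidean distance between the first two embeddings, the metric used for clustering
dist = np.linalg.norm(embeddings[0] - embeddings[1])
print('distance between image 0 and image 1:', dist)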
Below we introduce a classic face recognition system, Google's FaceNet. The system mainly consists of two parts: MTCNN for face detection and alignment, and FaceNet for mapping faces to feature embeddings.
First download the facenet source code from GitHub: https://github.com/davidsandberg/facenet. After extracting it, the directory looks as shown below:
import tensorflow as tf
import sklearn
import scipy
import cv2
import h5py
import matplotlib
import PIL
import requests
import psutil
If importing any of these packages raises an error, just install the corresponding package.
Install a package individually with pip install <package name>, or install all the dependencies from requirements.txt as follows:
pip install -r requirements.txt
My own setup is PyCharm + Anaconda3-5.2.0-Windows-64 + tensorflow-gpu 1.14 (the facenet code was written against TensorFlow 1.7; keep an eye on the TensorFlow version, because it causes a problem later on).
Add the src folder to the PYTHONPATH environment variable (a temporary environment variable). To set a permanent one, go to Computer → Properties → Advanced system settings → Environment Variables → System variables → Path and add the path there. The environment variable exists so that, when a required module cannot be found in the current directory, the system searches the paths listed in the environment variable. For details on adding environment variables see this blog post:
https://blog.csdn.net/Tona_ZM/article/details/79463284
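For example, the temporary variable can be set in a Windows command prompt like this (adjust the path to wherever your facenet\src directory actually is):
set PYTHONPATH=D:\program\facenet\src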
Downloading the LFW dataset
Next I will explain how to test an already trained model on the LFW (Labeled Faces in the Wild) database, but first a short introduction to the LFW dataset.
The LFW dataset was assembled by the computer vision laboratory at the University of Massachusetts Amherst and is a public benchmark for evaluating face recognition algorithms. It contains 13,233 JPEG images of 5,749 different people, 1,680 of whom appear in more than one image. Every image is 250×250 pixels and is labeled with the person's name. Images are named "lfw/name/name_xxxx.jpg", where "xxxx" is a zero-padded four-digit image index. For example, the tenth image of former US president George W. Bush is "lfw/George_W_Bush/George_W_Bush_0010.jpg".
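As a tiny illustration, the zero-padded file name can be reproduced in Python (the name and index are just the example above):
name, idx = 'George_W_Bush', 10
print('lfw/{0}/{0}_{1:04d}.jpg'.format(name, idx))  # lfw/George_W_Bush/George_W_Bush_0010.jpg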
The dataset can be downloaded from http://vis-www.cs.umass.edu/lfw/lfw.tgz. After downloading, extract it and open one of the folders; it looks like this:
Create a new folder named raw under lfw and move everything in lfw (except raw itself) into the raw folder. As you can see, my lfw dataset sits under a datasets folder, and the datasets folder is at the same level as facenet.
We need to align the dataset used for testing to the same size as the data used to train the model (160×160). The converted dataset is stored in the lfw_mtcnnpy_160 folder. The first step of the processing is to run the MTCNN network for face detection and alignment and to scale the faces to 160×160.
The MTCNN implementation lives mainly in the folder facenet/src/align, whose contents are as follows:
To run face detection and alignment on the LFW database with the script align_dataset_mtcnn.py, open an Anaconda Prompt (or click the Terminal tab at the bottom of PyCharm and run the command there), change to the directory containing facenet and run the following command:
python src/align/align_dataset_mtcnn.py datasets/lfw/raw datasets/lfw/lfw_mtcnnpy_160 --image_size 160 --margin 32 --random_order --gpu_memory_fraction=0.25 # omit the final --gpu_memory_fraction=0.25 if you have not installed the GPU version of TensorFlow
If relative paths cause errors, replace them all with absolute paths and try again.
If you run it from PyCharm, you can set the parameters as follows and then run align_dataset_mtcnn.py directly:
Set Parameters to D:\program\facenet\datasets\lfw\raw D:\program\facenet\datasets\lfw\lfw_mtcnnpy_160 --image_size 160 --margin 32 --random_order --gpu_memory_fraction=0.25
Set Environment Variables to PYTHONUNBUFFERED=1;PYTHONPATH=D:\program\facenet\src
Set Working directory to D:\program\facenet\src
The result of the run is shown below:
This command creates a folder datasets/lfw/lfw_mtcnnpy_160 and stores all the aligned face images in it, with the same directory structure as the original datasets/lfw/raw. The parameters --image_size 160 --margin 32 mean that the face box detected by MTCNN is expanded by a 32-pixel margin (the data used for training is a bit larger than the tight face box) and the crop is then scaled to 160×160. The resulting aligned images are therefore all 160×160 pixels, and the faces have been successfully detected and aligned from the original images.
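In essence, the margin handling boils down to the few lines sketched below (a simplified excerpt in the spirit of the align_dataset_mtcnn.py listing analyzed later; here img is the original image array, det a detected box [x1, y1, x2, y2], and margin is 32):
import numpy as np
from scipy import misc
margin, image_size = 32, 160
img_h, img_w = img.shape[:2]
bb = np.zeros(4, dtype=np.int32)
bb[0] = np.maximum(det[0] - margin / 2, 0)      # expand the box by margin/2 on every side,
bb[1] = np.maximum(det[1] - margin / 2, 0)      # clipping at the image borders
bb[2] = np.minimum(det[2] + margin / 2, img_w)
bb[3] = np.minimum(det[3] + margin / 2, img_h)
cropped = img[bb[1]:bb[3], bb[0]:bb[2], :]
scaled = misc.imresize(cropped, (image_size, image_size), interp='bilinear')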
Problems encountered and solutions
D:\program\facenet\datasets\lfw\raw D:\program\facenet\datasets\lfw\lfw_mtcnnpy_160 --image_size 160 --margin 32 --random_order --gpu_memory_fraction=0.25
Select Run → Edit Configurations and, under Environment variables, set PYTHONUNBUFFERED=1;PYTHONPATH=D:\program\facenet\src as shown in the figure.
2. Running align_dataset_mtcnn.py reports [Errno 13] Permission denied
Solutions:
(1) Check whether the file at the given path exists and whether it is in use. If it exists and is in use, temporarily close the program that is using it.
(2) Change the permissions of cmd and run it as administrator.
(3) Check whether the path being opened is actually a folder.
The suggestions above come from the internet. My error was caused by opening a folder: in Parameters, the LFW dataset path must point to the directory one level above the folders named after each person. I had left out the \raw level. All the person folders live under the raw directory, so if the LFW path in Parameters only goes down to lfw, the program only sees the raw folder beneath it and cannot read the images inside raw (a low-level mistake, let it serve as a warning).
The Parameters are set to D:\program\facenet\datasets\lfw\raw D:\program\facenet\datasets\lfw\lfw_mtcnnpy_160 --image_size 160 --margin 32 --random_order --gpu_memory_fraction=0.25, which are, respectively, the path of the LFW dataset, the output path for the aligned faces, the margin (the box is first expanded by 32 pixels) together with the scaling (to 160×160), and the fraction of GPU memory to use.
Next, let's briefly analyze the source file align_dataset_mtcnn.py. The full source is listed below, and then we walk through the main() function.
For the details of face detection, look at the detect_face() function; for more specifics you can refer to the blog post on the TensorFlow implementation of MTCNN.
"""Performs face alignment and stores face thumbnails in the output directory."""
# MIT License
#
# Copyright (c) 2016 David Sandberg
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from scipy import misc
import sys
import os
import argparse
import tensorflow as tf
import numpy as np
import facenet
import align.detect_face
import random
from time import sleep
'''
Face detection and alignment using the MTCNN network
'''
def main(args):
'''
args:
args: parsed command-line arguments
'''
sleep(random.random())
# Directory where the aligned face images will be stored
output_dir = os.path.expanduser(args.output_dir)
if not os.path.exists(output_dir):
os.makedirs(output_dir)
# Store some git revision info in a text file in the output directory
src_path,_ = os.path.split(os.path.realpath(__file__))
facenet.store_revision_info(src_path, output_dir, ' '.join(sys.argv))
'''1. Load the dataset: get each class (person) name and the absolute paths of all images in that class'''
dataset = facenet.get_dataset(args.input_dir)
print('Creating networks and loading parameters')
'''2. Build the MTCNN network and load its pretrained weights'''
with tf.Graph().as_default():
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=args.gpu_memory_fraction)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options, log_device_placement=False))
with sess.as_default():
pnet, rnet, onet = align.detect_face.create_mtcnn(sess, None)
minsize = 20 # minimum size of face
threshold = [ 0.6, 0.7, 0.7 ] # three steps's threshold
factor = 0.709 # scale factor
# Add a random key to the filename to allow alignment using multiple processes
random_key = np.random.randint(0, high=99999)
bounding_boxes_filename = os.path.join(output_dir, 'bounding_boxes_%05d.txt' % random_key)
'''3. Write the bounding box of each detected face to a record file'''
with open(bounding_boxes_filename, "w") as text_file:
nrof_images_total = 0
nrof_successfully_aligned = 0
if args.random_order:
random.shuffle(dataset)
# Iterate over each person (class) and the absolute paths of their images
for cls in dataset:
# Output folder for this person
output_class_dir = os.path.join(output_dir, cls.name)
if not os.path.exists(output_class_dir):
os.makedirs(output_class_dir)
if args.random_order:
random.shuffle(cls.image_paths)
# Iterate over every image of this person
for image_path in cls.image_paths:
nrof_images_total += 1
filename = os.path.splitext(os.path.split(image_path)[1])[0]
output_filename = os.path.join(output_class_dir, filename+'.png')
print(image_path)
if not os.path.exists(output_filename):
try:
img = misc.imread(image_path)
except (IOError, ValueError, IndexError) as e:
errorMessage = '{}: {}'.format(image_path, e)
print(errorMessage)
else:
if img.ndim<2:
print('Unable to align "%s"' % image_path)
text_file.write('%s\n' % (output_filename))
continue
if img.ndim == 2:
img = facenet.to_rgb(img)
img = img[:,:,0:3]
# Face detection. bounding_boxes has shape [n,5]; the 5 columns are x1, y1, x2, y2, score
# _ holds the facial landmark coordinates, shape [n,10]
bounding_boxes, _ = align.detect_face.detect_face(img, minsize, pnet, rnet, onet, threshold, factor)
# Number of detected faces
nrof_faces = bounding_boxes.shape[0]
if nrof_faces>0:
# [n,4] face bounding boxes
det = bounding_boxes[:,0:4]
# Collect the bounding boxes to keep
det_arr = []
img_size = np.asarray(img.shape)[0:2]
if nrof_faces>1:
# More than one face was detected in this image
if args.detect_multiple_faces:
for i in range(nrof_faces):
det_arr.append(np.squeeze(det[i]))
else:
bounding_box_size = (det[:,2]-det[:,0])*(det[:,3]-det[:,1])
img_center = img_size / 2
offsets = np.vstack([ (det[:,0]+det[:,2])/2-img_center[1], (det[:,1]+det[:,3])/2-img_center[0] ])
offset_dist_squared = np.sum(np.power(offsets,2.0),0)
index = np.argmax(bounding_box_size-offset_dist_squared*2.0) # some extra weight on the centering
det_arr.append(det[index,:])
else:
# Only one face was detected
det_arr.append(np.squeeze(det))
# Process each selected bounding box
for i, det in enumerate(det_arr):
# [4,] expand the bounding box by the margin and crop
det = np.squeeze(det)
bb = np.zeros(4, dtype=np.int32)
bb[0] = np.maximum(det[0]-args.margin/2, 0)
bb[1] = np.maximum(det[1]-args.margin/2, 0)
bb[2] = np.minimum(det[2]+args.margin/2, img_size[1])
bb[3] = np.minimum(det[3]+args.margin/2, img_size[0])
cropped = img[bb[1]:bb[3],bb[0]:bb[2],:]
# Resize to the target size, save the image and record the bounding box position
scaled = misc.imresize(cropped, (args.image_size, args.image_size), interp='bilinear')
nrof_successfully_aligned += 1
filename_base, file_extension = os.path.splitext(output_filename)
if args.detect_multiple_faces:
output_filename_n = "{}_{}{}".format(filename_base, i, file_extension)
else:
output_filename_n = "{}{}".format(filename_base, file_extension)
misc.imsave(output_filename_n, scaled)
text_file.write('%s %d %d %d %d\n' % (output_filename_n, bb[0], bb[1], bb[2], bb[3]))
else:
print('Unable to align "%s"' % image_path)
text_file.write('%s\n' % (output_filename))
print('Total number of images: %d' % nrof_images_total)
print('Number of successfully aligned images: %d' % nrof_successfully_aligned)
def parse_arguments(argv):
'''
Parse command-line arguments
'''
parser = argparse.ArgumentParser()
# Positional arguments: input_dir and output_dir
parser.add_argument('input_dir', type=str, help='Directory with unaligned images.')
parser.add_argument('output_dir', type=str, help='Directory with aligned face thumbnails.')
parser.add_argument('--image_size', type=int,
help='Image size (height, width) in pixels.', default=160)
parser.add_argument('--margin', type=int,
help='Margin for the crop around the bounding box (height, width) in pixels.', default=32)
parser.add_argument('--random_order',
help='Shuffles the order of images to enable alignment using multiple processes.', action='store_true')
parser.add_argument('--gpu_memory_fraction', type=float,
help='Upper bound on the amount of GPU memory that will be used by the process.', default=1.0)
parser.add_argument('--detect_multiple_faces', type=bool,
help='Detect and align multiple faces per image.', default=False)
# Parse the arguments
return parser.parse_args(argv)
if __name__ == '__main__':
main(parse_arguments(sys.argv[1:]))
The original author of the project provides two pretrained models, trained on the CASIA-WebFace and VGGFace2 face databases respectively. They are linked from https://github.com/davidsandberg/facenet:
Note: these two model files are hosted on Google Drive, so special network access may be needed to download them!
The pretrained model used here is the one trained on the VGGFace2 dataset, with the Inception-ResNet v1 convolutional architecture; the trained model reaches an accuracy of about 99.05% on LFW. After downloading the model, extract it into the facenet/models folder (you have to create the models folder yourself). After extraction you get a folder named 20180402-114759 containing four files:
At this point the preparation is essentially done: we have the LFW test dataset, the model and the code, so next we evaluate the accuracy of the model.
Open an Anaconda Prompt and change to the facenet directory (note that it must be the facenet directory. How do you get there? Open the Anaconda Prompt, type activate your_environment to switch to your environment, then proceed as shown in the figure, where D is the drive holding the facenet code),
then run the following command:
python src/validate_on_lfw.py datasets/lfw/lfw_mtcnnpy_160/ models/20180402-114759
Alternatively, after setting up the parameters you can run the validate_on_lfw.py file directly in PyCharm.
"""Validate a face recognizer on the "Labeled Faces in the Wild" dataset (http://vis-www.cs.umass.edu/lfw/).
Embeddings are calculated using the pairs from http://vis-www.cs.umass.edu/lfw/pairs.txt and the ROC curve
is calculated and plotted. Both the model metagraph and the model parameters need to exist
in the same directory, and the metagraph should have the extension '.meta'.
"""
# MIT License
#
# Copyright (c) 2016 David Sandberg
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
import numpy as np
import argparse
import facenet
import lfw
import os
import sys
from tensorflow.python.ops import data_flow_ops
from sklearn import metrics
from scipy.optimize import brentq
from scipy import interpolate
def main(args):
with tf.Graph().as_default():
with tf.Session() as sess:
# Read the file containing the pairs used for testing list
# Each entry is e.g. [Abel_Pacheco 1 4] for a same-person pair or [Ben_Kingsley 1 Daryl_Hannah 1] for a different-person pair
pairs = lfw.read_pairs(os.path.expanduser(args.lfw_pairs))
# Get the paths for the corresponding images
# Get the paths of the test images; actual_issame indicates whether each pair is the same person
paths, actual_issame = lfw.get_paths(os.path.expanduser(args.lfw_dir), pairs)
# Define placeholders
image_paths_placeholder = tf.placeholder(tf.string, shape=(None,1), name='image_paths')
labels_placeholder = tf.placeholder(tf.int32, shape=(None,1), name='labels')
batch_size_placeholder = tf.placeholder(tf.int32, name='batch_size')
control_placeholder = tf.placeholder(tf.int32, shape=(None,1), name='control')
phase_train_placeholder = tf.placeholder(tf.bool, name='phase_train')
# Read the data through a queue-based input pipeline
nrof_preprocess_threads = 4
image_size = (args.image_size, args.image_size)
eval_input_queue = data_flow_ops.FIFOQueue(capacity=2000000,
dtypes=[tf.string, tf.int32, tf.int32],
shapes=[(1,), (1,), (1,)],
shared_name=None, name=None)
eval_enqueue_op = eval_input_queue.enqueue_many([image_paths_placeholder, labels_placeholder, control_placeholder], name='eval_enqueue_op')
image_batch, label_batch = facenet.create_input_pipeline(eval_input_queue, image_size, nrof_preprocess_threads, batch_size_placeholder)
# Load the model
input_map = {'image_batch': image_batch, 'label_batch': label_batch, 'phase_train': phase_train_placeholder}
facenet.load_model(args.model, input_map=input_map)
# Get output tensor
embeddings = tf.get_default_graph().get_tensor_by_name("embeddings:0")
# Create a coordinator to manage the queue-runner threads
coord = tf.train.Coordinator()
tf.train.start_queue_runners(coord=coord, sess=sess)
# Start the evaluation
evaluate(sess, eval_enqueue_op, image_paths_placeholder, labels_placeholder, phase_train_placeholder, batch_size_placeholder, control_placeholder,
embeddings, label_batch, paths, actual_issame, args.lfw_batch_size, args.lfw_nrof_folds, args.distance_metric, args.subtract_mean,
args.use_flipped_images, args.use_fixed_image_standardization)
def evaluate(sess, enqueue_op, image_paths_placeholder, labels_placeholder, phase_train_placeholder, batch_size_placeholder, control_placeholder,
embeddings, labels, image_paths, actual_issame, batch_size, nrof_folds, distance_metric, subtract_mean, use_flipped_images, use_fixed_image_standardization):
# Run forward pass to calculate embeddings
print('Runnning forward pass on LFW images')
# Enqueue one epoch of image paths and labels
nrof_embeddings = len(actual_issame)*2 # nrof_pairs * nrof_images_per_pair
nrof_flips = 2 if use_flipped_images else 1
nrof_images = nrof_embeddings * nrof_flips
labels_array = np.expand_dims(np.arange(0,nrof_images),1)
image_paths_array = np.expand_dims(np.repeat(np.array(image_paths),nrof_flips),1)
control_array = np.zeros_like(labels_array, np.int32)
if use_fixed_image_standardization:
control_array += np.ones_like(labels_array)*facenet.FIXED_STANDARDIZATION
if use_flipped_images:
# Flip every second image
control_array += (labels_array % 2)*facenet.FLIP
sess.run(enqueue_op, {image_paths_placeholder: image_paths_array, labels_placeholder: labels_array, control_placeholder: control_array})
embedding_size = int(embeddings.get_shape()[1])
assert nrof_images % batch_size == 0, 'The number of LFW images must be an integer multiple of the LFW batch size'
nrof_batches = nrof_images // batch_size
emb_array = np.zeros((nrof_images, embedding_size))
lab_array = np.zeros((nrof_images,))
for i in range(nrof_batches):
feed_dict = {phase_train_placeholder:False, batch_size_placeholder:batch_size}
emb, lab = sess.run([embeddings, labels], feed_dict=feed_dict)
lab_array[lab] = lab
emb_array[lab, :] = emb
if i % 10 == 9:
print('.', end='')
sys.stdout.flush()
print('')
embeddings = np.zeros((nrof_embeddings, embedding_size*nrof_flips))
if use_flipped_images:
# Concatenate embeddings for flipped and non flipped version of the images
embeddings[:,:embedding_size] = emb_array[0::2,:]
embeddings[:,embedding_size:] = emb_array[1::2,:]
else:
embeddings = emb_array
assert np.array_equal(lab_array, np.arange(nrof_images))==True, 'Wrong labels used for evaluation, possibly caused by training examples left in the input pipeline'
tpr, fpr, accuracy, val, val_std, far = lfw.evaluate(embeddings, actual_issame, nrof_folds=nrof_folds, distance_metric=distance_metric, subtract_mean=subtract_mean)
print('Accuracy: %2.5f+-%2.5f' % (np.mean(accuracy), np.std(accuracy)))
print('Validation rate: %2.5f+-%2.5f @ FAR=%2.5f' % (val, val_std, far))
auc = metrics.auc(fpr, tpr)
print('Area Under Curve (AUC): %1.3f' % auc)
eer = brentq(lambda x: 1. - x - interpolate.interp1d(fpr, tpr)(x), 0., 1.)
print('Equal Error Rate (EER): %1.3f' % eer)
def parse_arguments(argv):
'''
Parse command-line arguments
'''
parser = argparse.ArgumentParser()
parser.add_argument('lfw_dir', type=str,
help='Path to the data directory containing aligned LFW face patches.')
parser.add_argument('--lfw_batch_size', type=int,
help='Number of images to process in a batch in the LFW test set.', default=100)
parser.add_argument('model', type=str,
help='Could be either a directory containing the meta_file and ckpt_file or a model protobuf (.pb) file')
parser.add_argument('--image_size', type=int,
help='Image size (height, width) in pixels.', default=160)
parser.add_argument('--lfw_pairs', type=str,
help='The file containing the pairs to use for validation.', default='data/pairs.txt')
parser.add_argument('--lfw_nrof_folds', type=int,
help='Number of folds to use for cross validation. Mainly used for testing.', default=10)
parser.add_argument('--distance_metric', type=int,
help='Distance metric 0:euclidian, 1:cosine similarity.', default=0)
parser.add_argument('--use_flipped_images',
help='Concatenates embeddings for the image and its horizontally flipped counterpart.', action='store_true')
parser.add_argument('--subtract_mean',
help='Subtract feature mean before calculating distance.', action='store_true')
parser.add_argument('--use_fixed_image_standardization',
help='Performs fixed standardization of images.', action='store_true')
return parser.parse_args(argv)
if __name__ == '__main__':
main(parse_arguments(sys.argv[1:]))
In practice we are often also interested in applying an existing model to our own images. Below, taking the distance between faces as an example, we show how to apply the model to our own data.
Suppose we have three images stored in the facenet/src directory, named img1.jpg, img2.jpg and img3.jpg. Each contains one person's face, and we want to compute the pairwise distances between them. This is done with facenet/src/compare.py.
Open an Anaconda Prompt, change to the facenet directory (again, the facenet directory), and run the following command:
python src/compare.py src/models/20180402-114759 src/img1.jpg src/img2.jpg src/img3.jpg
However, the output only shows "0 successful operations, 0 derived errors ignored".
Since the installed version is tensorflow-gpu 1.14, my guess was that no GPU memory fraction had been specified, so I tried adding --gpu_memory_fraction=0.25 to the command, i.e.
python src/compare.py src/models/20180402-114759 src/img1.jpg src/img2.jpg src/img3.jpg --gpu_memory_fraction=0.25
The result of the run is shown in the figure below.
Let's also test with images of three different people:
Run the following command:
python src/compare.py src/models/20180402-114759 src/img3.jpg src/img4.jpg src/img5.jpg --gpu_memory_fraction=0.25
The result is shown in the figure.
We find that images of the same person yield a small distance, while images of different people yield a larger one. Normally the distance for the same person should be below 1, and the distance for different people above 1. The results above do not quite follow this, which I think is mostly due to the photos chosen: for testing, try to pick photos in which the face is clear and frontal, and whose distribution matches the training data, which here means preferring photos of Western faces.
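The decision rule described above can be written in a few lines; this is only an illustrative sketch, where emb1 and emb2 stand for two embeddings produced by the model and 1.0 is the informal threshold mentioned above:
import numpy as np
def same_person(emb1, emb2, threshold=1.0):
    # Euclidean distance between two FaceNet embeddings
    dist = np.linalg.norm(emb1 - emb2)
    return dist < threshold, dist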
Running the command python src/compare.py src/models/20180402-114759 src/img6.jpg src/img7.jpg src/img8.jpg --gpu_memory_fraction=0.25 gives the result shown in the figure.
As we can see, this works quite well. So, if we want equally good results on images of Chinese faces, we need to train the model on a dataset of Chinese faces.
Training a new model from scratch requires a very large dataset. Here we take CASIA-WebFace as an example. This dataset can no longer be downloaded from its original address, and it reportedly contains many invalid images, so we use a cleaned-up version. It can be downloaded from Baidu Netdisk (download link; extraction password: 3zbb).
The database has 10,575 classes and 494,414 images. Each class has its own folder containing anywhere from a few to a few dozen face images of the same person. We first use MTCNN to crop the faces out of these photos and then hand them to FaceNet below for training.
After downloading, extract it into the datasets/casia/raw directory, as shown:
Each folder represents one person and holds all of that person's face images. As with the LFW dataset, we first run MTCNN face detection and alignment on the raw images: open an Anaconda Prompt, change to the facenet directory and run the following command:
python src/align/align_dataset_mtcnn.py datasets/casia/raw datasets/casia/casia_mtcnnpy_182 --image_size 182 --margin 44 --random_order --gpu_memory_fraction=0.25
The aligned images are saved under datasets/casia/casia_mtcnnpy_182, each 182×182 pixels. The final network input is 160×160; the reason for generating 182×182 images first is to leave room for the random-crop step of data augmentation: a 160×160 region is randomly cropped out of each 182×182 image before it is fed into the network for training.
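Conceptually the random crop from 182×182 down to 160×160 looks like the NumPy sketch below (just an illustration of the idea; in the actual code the equivalent is done with TensorFlow ops inside facenet.create_input_pipeline when --random_crop is enabled):
import numpy as np
def random_crop(img, crop_size=160):
    # img: H x W x 3 array, e.g. a 182 x 182 aligned face
    h, w = img.shape[:2]
    top = np.random.randint(0, h - crop_size + 1)
    left = np.random.randint(0, w - crop_size + 1)
    return img[top:top + crop_size, left:left + crop_size, :]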
Start training with the following command:
python src/train_softmax.py --logs_base_dir ./logs --models_base_dir ./models --data_dir datasets/casia/casia_mtcnnpy_182 --image_size 160 --model_def models.inception_resnet_v1 --lfw_dir datasets/lfw/lfw_mtcnnpy_160 --optimizer RMSPROP --learning_rate -1 --max_nrof_epochs 80 --keep_probability 0.8 --random_crop --random_flip --learning_rate_schedule_file data/learning_rate_schedule_classifier_casia.txt --weight_decay 5e-5 --center_loss_factor 1e-2 --center_loss_alfa 0.9
The command above has many parameters; let's go through them one by one. First, the script src/train_softmax.py trains the model using a combination of center loss and softmax loss. The parameters we used are as follows:
Besides the parameters used above there are many others; some of the more important ones are introduced below:
Since the CASIA-WebFace dataset is fairly large and training takes a long time, below we train on only a portion of CASIA-WebFace. The output looks like this:
Solution:
1) Specify which GPU to use by adding the following at the top of the script:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
2) Limit the GPU memory available to the script: add the first line near the top of the code and modify the Session statement as in the second line:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
After these changes, run again; the result is as follows:
Here Epoch: [1][683/1000] means we are in epoch 1, at the 683rd training batch of that epoch; the default epoch_size is 1000, i.e. one epoch consists of 1000 batches. Time is the time taken by this step, Lr is the learning rate, Loss is the loss of the current batch, Xent is the softmax loss, RegLoss is the sum of the regularization losses and the center loss, and Cl is the center loss (note that all of these are average losses, i.e. the batch loss divided by batch_size).
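In other words the printed quantities relate roughly as in this pseudo-formula, which mirrors the total-loss construction in the train_softmax.py listing below (all terms are per-batch averages):
# Xent    : softmax cross-entropy averaged over the batch
# RegLoss : sum of the REGULARIZATION_LOSSES collection, i.e. the L2 weight-decay terms
#           plus center_loss_factor * center loss (Cl)
# Loss    : Xent + RegLoss
total_loss = cross_entropy_mean + sum(regularization_losses)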
In the end there was still an error:
In Python 3, iteritems has been renamed to items.
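The fix is mechanical: wherever the offending code iterates a dictionary with iteritems(), change it to items(). For example (some_dict is just a generic placeholder here):
# Python 2 style, no longer valid in Python 3:
#     for key, value in some_dict.iteritems():
# Python 3:
for key, value in some_dict.items():
    print(key, value)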
The generated log files and model files:
Start an Anaconda Prompt, first change to the parent directory of the log folder (this step is required), and then enter the following command:
tensorboard --logdir D:\program\facenet\logs\20200930-082803
Then open a browser and go to http://XiaRedMiG:6006, where XiaRedMiG is the local host name and 6006 is the port. Once it has loaded, click SCALARS; you will see the variable total_loss_1 created in the program. Click it and the following is displayed:
The figure above shows how the loss evolves during training. The horizontal axis is the number of iterations, here about 33k, because I stopped the program after 33 epochs and each epoch runs 1000 batches.
Correspondingly, at the end of each epoch a validation run is performed on the LFW dataset; the accuracy curve looks like this:
On the left there is a smoothing slider that changes the smoothing of the scalar curves on the right; you can also tick "show data download links" and then download the data.
The source code of train_softmax.py is as follows:
"""Training a face recognizer with TensorFlow using softmax cross entropy loss
"""
# MIT License
#
# Copyright (c) 2016 David Sandberg
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from datetime import datetime
import os.path
import time
import sys
import random
import tensorflow as tf
import numpy as np
import importlib
import argparse
import facenet
import lfw
import h5py
import math
import tensorflow.contrib.slim as slim
from tensorflow.python.ops import data_flow_ops
from tensorflow.python.framework import ops
from tensorflow.python.ops import array_ops
def main(args):
# Import the CNN network definition module
network = importlib.import_module(args.model_def)
image_size = (args.image_size, args.image_size)
# Current timestamp
subdir = datetime.strftime(datetime.now(), '%Y%m%d-%H%M%S')
# Log directory path
log_dir = os.path.join(os.path.expanduser(args.logs_base_dir), subdir)
if not os.path.isdir(log_dir): # Create the log directory if it doesn't exist
os.makedirs(log_dir)
# Model directory path
model_dir = os.path.join(os.path.expanduser(args.models_base_dir), subdir)
if not os.path.isdir(model_dir): # Create the model directory if it doesn't exist
os.makedirs(model_dir)
stat_file_name = os.path.join(log_dir, 'stat.h5')
# Write arguments to a text file
facenet.write_arguments_to_file(args, os.path.join(log_dir, 'arguments.txt'))
# Store some git revision info in a text file in the log directory
src_path,_ = os.path.split(os.path.realpath(__file__))
facenet.store_revision_info(src_path, log_dir, ' '.join(sys.argv))
np.random.seed(seed=args.seed)
random.seed(args.seed)
# Prepare the training data: get each class name and the absolute paths of all images in that class
dataset = facenet.get_dataset(args.data_dir)
if args.filter_filename:
dataset = filter_dataset(dataset, os.path.expanduser(args.filter_filename),
args.filter_percentile, args.filter_min_nrof_images_per_class)
if args.validation_set_split_ratio>0.0:
train_set, val_set = facenet.split_dataset(dataset, args.validation_set_split_ratio, args.min_nrof_val_images_per_class, 'SPLIT_IMAGES')
else:
train_set, val_set = dataset, []
# Number of classes; each person is one class
nrof_classes = len(train_set)
print('Model directory: %s' % model_dir)
print('Log directory: %s' % log_dir)
# Was a pretrained model specified?
pretrained_model = None
if args.pretrained_model:
pretrained_model = os.path.expanduser(args.pretrained_model)
print('Pre-trained model: %s' % pretrained_model)
# Was an LFW directory specified? It is used for evaluation
if args.lfw_dir:
print('LFW directory: %s' % args.lfw_dir)
# Read the file containing the pairs used for testing
pairs = lfw.read_pairs(os.path.expanduser(args.lfw_pairs))
# Get the paths for the corresponding images
lfw_paths, actual_issame = lfw.get_paths(os.path.expanduser(args.lfw_dir), pairs)
with tf.Graph().as_default():
tf.set_random_seed(args.seed)
# Global training step
global_step = tf.Variable(0, trainable=False)
# Get a list of image paths and their labels
# image_list: list, one image path per element
# label_list: list, one integer label (0, 1, 2, ...) per element
image_list, label_list = facenet.get_image_paths_and_labels(train_set)
assert len(image_list)>0, 'The training set should not be empty'
val_image_list, val_label_list = facenet.get_image_paths_and_labels(val_set)
# Create a queue that produces indices into the image_list and label_list
labels = ops.convert_to_tensor(label_list, dtype=tf.int32)
# Number of images
range_size = array_ops.shape(labels)[0]
# Create an index queue that produces values from 0 to range_size-1
index_queue = tf.train.range_input_producer(range_size, num_epochs=None,
shuffle=True, seed=None, capacity=32)
# Dequeue args.batch_size*args.epoch_size indices at a time, i.e. one epoch worth of samples
index_dequeue_op = index_queue.dequeue_many(args.batch_size*args.epoch_size, 'index_dequeue')
# Define placeholders
learning_rate_placeholder = tf.placeholder(tf.float32, name='learning_rate')
batch_size_placeholder = tf.placeholder(tf.int32, name='batch_size')
phase_train_placeholder = tf.placeholder(tf.bool, name='phase_train')
image_paths_placeholder = tf.placeholder(tf.string, shape=(None,1), name='image_paths')
labels_placeholder = tf.placeholder(tf.int32, shape=(None,1), name='labels')
control_placeholder = tf.placeholder(tf.int32, shape=(None,1), name='control')
# Create a FIFO queue; each item holds an image path, its label and a control flag; shapes gives the shape of each field
nrof_preprocess_threads = 4
input_queue = data_flow_ops.FIFOQueue(capacity=2000000,
dtypes=[tf.string, tf.int32, tf.int32],
shapes=[(1,), (1,), (1,)],
shared_name=None, name=None)
# Enqueue operation
enqueue_op = input_queue.enqueue_many([image_paths_placeholder, labels_placeholder, control_placeholder], name='enqueue_op')
# Dequeue operation: fetch one batch of preprocessed images and labels per training step
image_batch, label_batch = facenet.create_input_pipeline(input_queue, image_size, nrof_preprocess_threads, batch_size_placeholder)
# Identity copies so the tensors can be looked up by name
image_batch = tf.identity(image_batch, 'image_batch')
image_batch = tf.identity(image_batch, 'input')
label_batch = tf.identity(label_batch, 'label_batch')
print('Number of classes in training set: %d' % nrof_classes)
print('Number of examples in training set: %d' % len(image_list))
print('Number of classes in validation set: %d' % len(val_set))
print('Number of examples in validation set: %d' % len(val_image_list))
print('Building training graph')
# Build the inference graph
# Build the CNN; the bottleneck output prelogits has shape [batch_size, embedding_size]
prelogits, _ = network.inference(image_batch, args.keep_probability,
phase_train=phase_train_placeholder, bottleneck_layer_size=args.embedding_size,
weight_decay=args.weight_decay)
# Logits for every class, shape [batch_size, number of classes]
logits = slim.fully_connected(prelogits, len(train_set), activation_fn=None,
weights_initializer=slim.initializers.xavier_initializer(),
weights_regularizer=slim.l2_regularizer(args.weight_decay),
scope='Logits', reuse=False)
# L2-normalize each row of prelogits to obtain the embeddings
embeddings = tf.nn.l2_normalize(prelogits, 1, 1e-10, name='embeddings')
# Norm for the prelogits
eps = 1e-4
# Take the absolute value of prelogits, compute the p-norm along axis=1, then average
prelogits_norm = tf.reduce_mean(tf.norm(tf.abs(prelogits)+eps, ord=args.prelogits_norm_p, axis=1))
# Add prelogits_norm * args.prelogits_norm_loss_factor to the REGULARIZATION_LOSSES collection
tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, prelogits_norm * args.prelogits_norm_loss_factor)
# Add the center loss to the REGULARIZATION_LOSSES collection
prelogits_center_loss, _ = facenet.center_loss(prelogits, label_batch, args.center_loss_alfa, nrof_classes)
tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, prelogits_center_loss * args.center_loss_factor)
# Exponential decay of the learning rate
learning_rate = tf.train.exponential_decay(learning_rate_placeholder, global_step,
args.learning_rate_decay_epochs*args.epoch_size, args.learning_rate_decay_factor, staircase=True)
tf.summary.scalar('learning_rate', learning_rate)
# Calculate the average cross entropy loss across the batch
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
labels=label_batch, logits=logits, name='cross_entropy_per_example')
cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
# Add the cross-entropy loss to the 'losses' collection
tf.add_to_collection('losses', cross_entropy_mean)
# Compute the accuracy; correct_prediction has shape [batch_size]
correct_prediction = tf.cast(tf.equal(tf.argmax(logits, 1), tf.cast(label_batch, tf.int64)), tf.float32)
accuracy = tf.reduce_mean(correct_prediction)
# Calculate the total losses https://blog.csdn.net/uestc_c2_403/article/details/72415791
regularization_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
total_loss = tf.add_n([cross_entropy_mean] + regularization_losses, name='total_loss')
# Build a Graph that trains the model with one batch of examples and updates the model parameters
train_op = facenet.train(total_loss, global_step, args.optimizer,
learning_rate, args.moving_average_decay, tf.global_variables(), args.log_histograms)
# Create a saver
saver = tf.train.Saver(tf.trainable_variables(), max_to_keep=3)
# Build the summary operation based on the TF collection of Summaries.
summary_op = tf.summary.merge_all()
# Start running operations on the Graph.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=args.gpu_memory_fraction)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options, log_device_placement=False))
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())
summary_writer = tf.summary.FileWriter(log_dir, sess.graph)
# Create a coordinator to manage threads
coord = tf.train.Coordinator()
# Only after start_queue_runners is called do the queues get filled and data get read
tf.train.start_queue_runners(coord=coord, sess=sess)
# Start executing the graph
with sess.as_default():
# Load the pretrained model if one was specified
if pretrained_model:
print('Restoring pretrained model: %s' % pretrained_model)
saver.restore(sess, pretrained_model)
# Training and validation loop
print('Running training')
nrof_steps = args.max_nrof_epochs*args.epoch_size
nrof_val_samples = int(math.ceil(args.max_nrof_epochs / args.validate_every_n_epochs)) # Validate every validate_every_n_epochs as well as in the last epoch
stat = {
'loss': np.zeros((nrof_steps,), np.float32),
'center_loss': np.zeros((nrof_steps,), np.float32),
'reg_loss': np.zeros((nrof_steps,), np.float32),
'xent_loss': np.zeros((nrof_steps,), np.float32),
'prelogits_norm': np.zeros((nrof_steps,), np.float32),
'accuracy': np.zeros((nrof_steps,), np.float32),
'val_loss': np.zeros((nrof_val_samples,), np.float32),
'val_xent_loss': np.zeros((nrof_val_samples,), np.float32),
'val_accuracy': np.zeros((nrof_val_samples,), np.float32),
'lfw_accuracy': np.zeros((args.max_nrof_epochs,), np.float32),
'lfw_valrate': np.zeros((args.max_nrof_epochs,), np.float32),
'learning_rate': np.zeros((args.max_nrof_epochs,), np.float32),
'time_train': np.zeros((args.max_nrof_epochs,), np.float32),
'time_validate': np.zeros((args.max_nrof_epochs,), np.float32),
'time_evaluate': np.zeros((args.max_nrof_epochs,), np.float32),
'prelogits_hist': np.zeros((args.max_nrof_epochs, 1000), np.float32),
}
# Iterate over the epochs
for epoch in range(1,args.max_nrof_epochs+1):
step = sess.run(global_step, feed_dict=None)
# Train for one epoch
t = time.time()
cont = train(args, sess, epoch, image_list, label_list, index_dequeue_op, enqueue_op, image_paths_placeholder, labels_placeholder,
learning_rate_placeholder, phase_train_placeholder, batch_size_placeholder, control_placeholder, global_step,
total_loss, train_op, summary_op, summary_writer, regularization_losses, args.learning_rate_schedule_file,
stat, cross_entropy_mean, accuracy, learning_rate,
prelogits, prelogits_center_loss, args.random_rotate, args.random_crop, args.random_flip, prelogits_norm, args.prelogits_hist_max, args.use_fixed_image_standardization)
stat['time_train'][epoch-1] = time.time() - t
if not cont:
break
t = time.time()
if len(val_image_list)>0 and ((epoch-1) % args.validate_every_n_epochs == args.validate_every_n_epochs-1 or epoch==args.max_nrof_epochs):
validate(args, sess, epoch, val_image_list, val_label_list, enqueue_op, image_paths_placeholder, labels_placeholder, control_placeholder,
phase_train_placeholder, batch_size_placeholder,
stat, total_loss, regularization_losses, cross_entropy_mean, accuracy, args.validate_every_n_epochs, args.use_fixed_image_standardization)
stat['time_validate'][epoch-1] = time.time() - t
# Save variables and the metagraph if it doesn't exist already
save_variables_and_metagraph(sess, saver, summary_writer, model_dir, subdir, epoch)
# Evaluate on LFW
t = time.time()
if args.lfw_dir:
evaluate(sess, enqueue_op, image_paths_placeholder, labels_placeholder, phase_train_placeholder, batch_size_placeholder, control_placeholder,
embeddings, label_batch, lfw_paths, actual_issame, args.lfw_batch_size, args.lfw_nrof_folds, log_dir, step, summary_writer, stat, epoch,
args.lfw_distance_metric, args.lfw_subtract_mean, args.lfw_use_flipped_images, args.use_fixed_image_standardization)
stat['time_evaluate'][epoch-1] = time.time() - t
print('Saving statistics')
with h5py.File(stat_file_name, 'w') as f:
for key, value in stat.items():
f.create_dataset(key, data=value)
return model_dir
def find_threshold(var, percentile):
hist, bin_edges = np.histogram(var, 100)
cdf = np.float32(np.cumsum(hist)) / np.sum(hist)
bin_centers = (bin_edges[:-1]+bin_edges[1:])/2
#plt.plot(bin_centers, cdf)
threshold = np.interp(percentile*0.01, cdf, bin_centers)
return threshold
def filter_dataset(dataset, data_filename, percentile, min_nrof_images_per_class):
with h5py.File(data_filename,'r') as f:
distance_to_center = np.array(f.get('distance_to_center'))
label_list = np.array(f.get('label_list'))
image_list = np.array(f.get('image_list'))
distance_to_center_threshold = find_threshold(distance_to_center, percentile)
indices = np.where(distance_to_center>=distance_to_center_threshold)[0]
filtered_dataset = dataset
removelist = []
for i in indices:
label = label_list[i]
image = image_list[i]
if image in filtered_dataset[label].image_paths:
filtered_dataset[label].image_paths.remove(image)
if len(filtered_dataset[label].image_paths)<min_nrof_images_per_class:
removelist.append(label)
ix = sorted(list(set(removelist)), reverse=True)
for i in ix:
del(filtered_dataset[i])
return filtered_dataset
def train(args, sess, epoch, image_list, label_list, index_dequeue_op, enqueue_op, image_paths_placeholder, labels_placeholder,
learning_rate_placeholder, phase_train_placeholder, batch_size_placeholder, control_placeholder, step,
loss, train_op, summary_op, summary_writer, reg_losses, learning_rate_schedule_file,
stat, cross_entropy_mean, accuracy,
learning_rate, prelogits, prelogits_center_loss, random_rotate, random_crop, random_flip, prelogits_norm, prelogits_hist_max, use_fixed_image_standardization):
batch_number = 0
if args.learning_rate>0.0:
lr = args.learning_rate
else:
lr = facenet.get_learning_rate_from_file(learning_rate_schedule_file, epoch)
if lr<=0:
return False
# One epoch consists of batch_size*epoch_size samples
index_epoch = sess.run(index_dequeue_op)
label_epoch = np.array(label_list)[index_epoch]
image_epoch = np.array(image_list)[index_epoch]
# Enqueue one epoch of image paths and labels
labels_array = np.expand_dims(np.array(label_epoch),1)
image_paths_array = np.expand_dims(np.array(image_epoch),1)
control_value = facenet.RANDOM_ROTATE * random_rotate + facenet.RANDOM_CROP * random_crop + facenet.RANDOM_FLIP * random_flip + facenet.FIXED_STANDARDIZATION * use_fixed_image_standardization
control_array = np.ones_like(labels_array) * control_value
sess.run(enqueue_op, {image_paths_placeholder: image_paths_array, labels_placeholder: labels_array, control_placeholder: control_array})
# Training loop for one epoch
train_time = 0
while batch_number < args.epoch_size:
start_time = time.time()
feed_dict = {learning_rate_placeholder: lr, phase_train_placeholder:True, batch_size_placeholder:args.batch_size}
tensor_list = [loss, train_op, step, reg_losses, prelogits, cross_entropy_mean, learning_rate, prelogits_norm, accuracy, prelogits_center_loss]
if batch_number % 100 == 0:
loss_, _, step_, reg_losses_, prelogits_, cross_entropy_mean_, lr_, prelogits_norm_, accuracy_, center_loss_, summary_str = sess.run(tensor_list + [summary_op], feed_dict=feed_dict)
summary_writer.add_summary(summary_str, global_step=step_)
else:
loss_, _, step_, reg_losses_, prelogits_, cross_entropy_mean_, lr_, prelogits_norm_, accuracy_, center_loss_ = sess.run(tensor_list, feed_dict=feed_dict)
duration = time.time() - start_time
stat['loss'][step_-1] = loss_
stat['center_loss'][step_-1] = center_loss_
stat['reg_loss'][step_-1] = np.sum(reg_losses_)
stat['xent_loss'][step_-1] = cross_entropy_mean_
stat['prelogits_norm'][step_-1] = prelogits_norm_
stat['learning_rate'][epoch-1] = lr_
stat['accuracy'][step_-1] = accuracy_
stat['prelogits_hist'][epoch-1,:] += np.histogram(np.minimum(np.abs(prelogits_), prelogits_hist_max), bins=1000, range=(0.0, prelogits_hist_max))[0]
duration = time.time() - start_time
print('Epoch: [%d][%d/%d]\tTime %.3f\tLoss %2.3f\tXent %2.3f\tRegLoss %2.3f\tAccuracy %2.3f\tLr %2.5f\tCl %2.3f' %
(epoch, batch_number+1, args.epoch_size, duration, loss_, cross_entropy_mean_, np.sum(reg_losses_), accuracy_, lr_, center_loss_))
batch_number += 1
train_time += duration
# Add validation loss and accuracy to summary
summary = tf.Summary()
#pylint: disable=maybe-no-member
summary.value.add(tag='time/total', simple_value=train_time)
summary_writer.add_summary(summary, global_step=step_)
return True
def validate(args, sess, epoch, image_list, label_list, enqueue_op, image_paths_placeholder, labels_placeholder, control_placeholder,
phase_train_placeholder, batch_size_placeholder,
stat, loss, regularization_losses, cross_entropy_mean, accuracy, validate_every_n_epochs, use_fixed_image_standardization):
print('Running forward pass on validation set')
nrof_batches = len(label_list) // args.lfw_batch_size
nrof_images = nrof_batches * args.lfw_batch_size
# Enqueue one epoch of image paths and labels
labels_array = np.expand_dims(np.array(label_list[:nrof_images]),1)
image_paths_array = np.expand_dims(np.array(image_list[:nrof_images]),1)
control_array = np.ones_like(labels_array, np.int32)*facenet.FIXED_STANDARDIZATION * use_fixed_image_standardization
sess.run(enqueue_op, {image_paths_placeholder: image_paths_array, labels_placeholder: labels_array, control_placeholder: control_array})
loss_array = np.zeros((nrof_batches,), np.float32)
xent_array = np.zeros((nrof_batches,), np.float32)
accuracy_array = np.zeros((nrof_batches,), np.float32)
# Training loop
start_time = time.time()
for i in range(nrof_batches):
feed_dict = {phase_train_placeholder:False, batch_size_placeholder:args.lfw_batch_size}
loss_, cross_entropy_mean_, accuracy_ = sess.run([loss, cross_entropy_mean, accuracy], feed_dict=feed_dict)
loss_array[i], xent_array[i], accuracy_array[i] = (loss_, cross_entropy_mean_, accuracy_)
if i % 10 == 9:
print('.', end='')
sys.stdout.flush()
print('')
duration = time.time() - start_time
val_index = (epoch-1)//validate_every_n_epochs
stat['val_loss'][val_index] = np.mean(loss_array)
stat['val_xent_loss'][val_index] = np.mean(xent_array)
stat['val_accuracy'][val_index] = np.mean(accuracy_array)
print('Validation Epoch: %d\tTime %.3f\tLoss %2.3f\tXent %2.3f\tAccuracy %2.3f' %
(epoch, duration, np.mean(loss_array), np.mean(xent_array), np.mean(accuracy_array)))
def evaluate(sess, enqueue_op, image_paths_placeholder, labels_placeholder, phase_train_placeholder, batch_size_placeholder, control_placeholder,
embeddings, labels, image_paths, actual_issame, batch_size, nrof_folds, log_dir, step, summary_writer, stat, epoch, distance_metric, subtract_mean, use_flipped_images, use_fixed_image_standardization):
start_time = time.time()
# Run forward pass to calculate embeddings
print('Runnning forward pass on LFW images')
# Enqueue one epoch of image paths and labels
nrof_embeddings = len(actual_issame)*2 # nrof_pairs * nrof_images_per_pair
nrof_flips = 2 if use_flipped_images else 1
nrof_images = nrof_embeddings * nrof_flips
labels_array = np.expand_dims(np.arange(0,nrof_images),1)
image_paths_array = np.expand_dims(np.repeat(np.array(image_paths),nrof_flips),1)
control_array = np.zeros_like(labels_array, np.int32)
if use_fixed_image_standardization:
control_array += np.ones_like(labels_array)*facenet.FIXED_STANDARDIZATION
if use_flipped_images:
# Flip every second image
control_array += (labels_array % 2)*facenet.FLIP
sess.run(enqueue_op, {image_paths_placeholder: image_paths_array, labels_placeholder: labels_array, control_placeholder: control_array})
embedding_size = int(embeddings.get_shape()[1])
assert nrof_images % batch_size == 0, 'The number of LFW images must be an integer multiple of the LFW batch size'
nrof_batches = nrof_images // batch_size
emb_array = np.zeros((nrof_images, embedding_size))
lab_array = np.zeros((nrof_images,))
for i in range(nrof_batches):
feed_dict = {phase_train_placeholder:False, batch_size_placeholder:batch_size}
emb, lab = sess.run([embeddings, labels], feed_dict=feed_dict)
lab_array[lab] = lab
emb_array[lab, :] = emb
if i % 10 == 9:
print('.', end='')
sys.stdout.flush()
print('')
embeddings = np.zeros((nrof_embeddings, embedding_size*nrof_flips))
if use_flipped_images:
# Concatenate embeddings for flipped and non flipped version of the images
embeddings[:,:embedding_size] = emb_array[0::2,:]
embeddings[:,embedding_size:] = emb_array[1::2,:]
else:
embeddings = emb_array
assert np.array_equal(lab_array, np.arange(nrof_images))==True, 'Wrong labels used for evaluation, possibly caused by training examples left in the input pipeline'
_, _, accuracy, val, val_std, far = lfw.evaluate(embeddings, actual_issame, nrof_folds=nrof_folds, distance_metric=distance_metric, subtract_mean=subtract_mean)
print('Accuracy: %2.5f+-%2.5f' % (np.mean(accuracy), np.std(accuracy)))
print('Validation rate: %2.5f+-%2.5f @ FAR=%2.5f' % (val, val_std, far))
lfw_time = time.time() - start_time
# Add validation loss and accuracy to summary
summary = tf.Summary()
#pylint: disable=maybe-no-member
summary.value.add(tag='lfw/accuracy', simple_value=np.mean(accuracy))
summary.value.add(tag='lfw/val_rate', simple_value=val)
summary.value.add(tag='time/lfw', simple_value=lfw_time)
summary_writer.add_summary(summary, step)
with open(os.path.join(log_dir,'lfw_result.txt'),'at') as f:
f.write('%d\t%.5f\t%.5f\n' % (step, np.mean(accuracy), val))
stat['lfw_accuracy'][epoch-1] = np.mean(accuracy)
stat['lfw_valrate'][epoch-1] = val
def save_variables_and_metagraph(sess, saver, summary_writer, model_dir, model_name, step):
# Save the model checkpoint
print('Saving variables')
start_time = time.time()
checkpoint_path = os.path.join(model_dir, 'model-%s.ckpt' % model_name)
saver.save(sess, checkpoint_path, global_step=step, write_meta_graph=False)
save_time_variables = time.time() - start_time
print('Variables saved in %.2f seconds' % save_time_variables)
metagraph_filename = os.path.join(model_dir, 'model-%s.meta' % model_name)
save_time_metagraph = 0
if not os.path.exists(metagraph_filename):
print('Saving metagraph')
start_time = time.time()
saver.export_meta_graph(metagraph_filename)
save_time_metagraph = time.time() - start_time
print('Metagraph saved in %.2f seconds' % save_time_metagraph)
summary = tf.Summary()
#pylint: disable=maybe-no-member
summary.value.add(tag='time/save_variables', simple_value=save_time_variables)
summary.value.add(tag='time/save_metagraph', simple_value=save_time_metagraph)
summary_writer.add_summary(summary, step)
def parse_arguments(argv):
'''
Parse command-line arguments
'''
parser = argparse.ArgumentParser()
# Directory for log files
parser.add_argument('--logs_base_dir', type=str,
help='Directory where to write event logs.', default='~/logs/facenet')
# Directory for model files
parser.add_argument('--models_base_dir', type=str,
help='Directory where to write trained models and checkpoints.', default='~/models/facenet')
# Fraction of GPU memory the process may use
parser.add_argument('--gpu_memory_fraction', type=float,
help='Upper bound on the amount of GPU memory that will be used by the process.', default=1.0)
# Pretrained model to load before training
parser.add_argument('--pretrained_model', type=str,
help='Load a pretrained model before training starts.')
# Path to the data after MTCNN face detection and alignment
parser.add_argument('--data_dir', type=str,
help='Path to the data directory containing aligned face patches.',
default='~/datasets/casia/casia_maxpy_mtcnnalign_182_160')
# Network architecture
parser.add_argument('--model_def', type=str,
help='Model definition. Points to a module containing the definition of the inference graph.', default='models.inception_resnet_v1')
# Number of training epochs
parser.add_argument('--max_nrof_epochs', type=int,
help='Number of epochs to run.', default=500)
# Batch size
parser.add_argument('--batch_size', type=int,
help='Number of images to process in a batch.', default=90)
# Image size
parser.add_argument('--image_size', type=int,
help='Image size (height, width) in pixels.', default=160)
# Number of batches per epoch
parser.add_argument('--epoch_size', type=int,
help='Number of batches per epoch.', default=1000)
# Dimensionality of the embedding
parser.add_argument('--embedding_size', type=int,
help='Dimensionality of the embedding.', default=128)
# Random cropping
parser.add_argument('--random_crop',
help='Performs random cropping of training images. If false, the center image_size pixels from the training images are used. ' +
'If the size of the images in the data directory is equal to image_size no cropping is performed', action='store_true')
# Random horizontal flipping
parser.add_argument('--random_flip',
help='Performs random horizontal flipping of training images.', action='store_true')
# Random rotation
parser.add_argument('--random_rotate',
help='Performs random rotations of training images.', action='store_true')
parser.add_argument('--use_fixed_image_standardization',
help='Performs fixed standardization of images.', action='store_true')
# Dropout keep probability
parser.add_argument('--keep_probability', type=float,
help='Keep probability of dropout for the fully connected layer(s).', default=1.0)
# L2 weight-decay coefficient
parser.add_argument('--weight_decay', type=float,
help='L2 weight regularization.', default=0.0)
# Weight balancing the center loss against the softmax loss
parser.add_argument('--center_loss_factor', type=float,
help='Center loss factor.', default=0.0)
# Center update rate used by the center loss
parser.add_argument('--center_loss_alfa', type=float,
help='Center update rate for center loss.', default=0.95)
parser.add_argument('--prelogits_norm_loss_factor', type=float,
help='Loss based on the norm of the activations in the prelogits layer.', default=0.0)
parser.add_argument('--prelogits_norm_p', type=float,
help='Norm to use for prelogits norm loss.', default=1.0)
parser.add_argument('--prelogits_hist_max', type=float,
help='The max value for the prelogits histogram.', default=10.0)
# Optimizer
parser.add_argument('--optimizer', type=str, choices=['ADAGRAD', 'ADADELTA', 'ADAM', 'RMSPROP', 'MOM'],
help='The optimization algorithm to use', default='ADAGRAD')
# Initial learning rate
parser.add_argument('--learning_rate', type=float,
help='Initial learning rate. If set to a negative value a learning rate ' +
'schedule can be specified in the file "learning_rate_schedule.txt"', default=0.1)
parser.add_argument('--learning_rate_decay_epochs', type=int,
help='Number of epochs between learning rate decay.', default=100)
parser.add_argument('--learning_rate_decay_factor', type=float,
help='Learning rate decay factor.', default=1.0)
parser.add_argument('--moving_average_decay', type=float,
help='Exponential decay for tracking of training parameters.', default=0.9999)
parser.add_argument('--seed', type=int,
help='Random seed.', default=666)
parser.add_argument('--nrof_preprocess_threads', type=int,
help='Number of preprocessing (data loading and augmentation) threads.', default=4)
parser.add_argument('--log_histograms',
help='Enables logging of weight/bias histograms in tensorboard.', action='store_true')
parser.add_argument('--learning_rate_schedule_file', type=str,
help='File containing the learning rate schedule that is used when learning_rate is set to to -1.', default='data/learning_rate_schedule.txt')
parser.add_argument('--filter_filename', type=str,
help='File containing image data used for dataset filtering', default='')
parser.add_argument('--filter_percentile', type=float,
help='Keep only the percentile images closed to its class center', default=100.0)
parser.add_argument('--filter_min_nrof_images_per_class', type=int,
help='Keep only the classes with this number of examples or more', default=0)
parser.add_argument('--validate_every_n_epochs', type=int,
help='Number of epoch between validation', default=5)
parser.add_argument('--validation_set_split_ratio', type=float,
help='The ratio of the total dataset to use for validation', default=0.0)
parser.add_argument('--min_nrof_val_images_per_class', type=float,
help='Classes with fewer images will be removed from the validation set', default=0)
# Parameters for validation on LFW
parser.add_argument('--lfw_pairs', type=str,
help='The file containing the pairs to use for validation.', default='data/pairs.txt')
# Path to the LFW data after MTCNN face detection and alignment
parser.add_argument('--lfw_dir', type=str,
help='Path to the data directory containing aligned face patches.', default='')
parser.add_argument('--lfw_batch_size', type=int,
help='Number of images to process in a batch in the LFW test set.', default=100)
parser.add_argument('--lfw_nrof_folds', type=int,
help='Number of folds to use for cross validation. Mainly used for testing.', default=10)
parser.add_argument('--lfw_distance_metric', type=int,
help='Type of distance metric to use. 0: Euclidian, 1:Cosine similarity distance.', default=0)
parser.add_argument('--lfw_use_flipped_images',
help='Concatenates embeddings for the image and its horizontally flipped counterpart.', action='store_true')
parser.add_argument('--lfw_subtract_mean',
help='Subtract feature mean before calculating distance.', action='store_true')
return parser.parse_args(argv)
if __name__ == '__main__':
main(parse_arguments(sys.argv[1:]))
In summary, train_softmax.py does the following:
Build the CNN network used to extract face features;
Prepare the training data, loading the dataset through the TensorFlow queue mechanism;
Define the losses of the CNN network: L2 regularization, the center loss and the cross-entropy cost (strictly speaking, the softmax loss);
Train on the dataset and evaluate on LFW.
Sometimes we need to retrain a pretrained model on our own dataset, or we trained a model earlier, feel the number of epochs was not enough and do not want to start from scratch. In that case the previously trained model has to be loaded back in before training, as follows:
Step 1: add the pretrained-model parameter.
In train_tripletloss.py, find the statement that defines the --pretrained_model argument (the parser.add_argument('--pretrained_model', ...) line without a default value)
and change it to:
parser.add_argument('--pretrained_model', type=str,
help='Load a pretrained model before training starts.',default='path of your pretrained model')
Step 2: fix a small bug in the program.
If you only complete step 1, running the program produces an error. After some debugging, it turns out there is a small bug that needs fixing:
Find this line in the code, i.e. the saver.restore(sess, pretrained_model) call inside the "if pretrained_model:" block:
As you can see, the purpose of this code is to reload the model parameters with TensorFlow's saver.restore() whenever the pretrained-model parameter is non-empty, but it raises an error here,
so, imitating the model-loading approach used in compare.py, we change this call to:
facenet.load_model(args.pretrained_model)
Then run the program again; it now executes normally.
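For clarity, here is a sketch of what the restore block ends up looking like after the change (based on the snippet discussed above):
# Before (this is the call that raised the error):
#     if pretrained_model:
#         saver.restore(sess, pretrained_model)
# After, reusing the loader from compare.py:
if pretrained_model:
    print('Restoring pretrained model: %s' % pretrained_model)
    facenet.load_model(args.pretrained_model)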
If you are not sure, take an already trained model, load it and train for one epoch: you will find that the initial loss is very small and that, after one epoch, the model's accuracy is already close to that of the loaded pretrained model, which shows the model was loaded successfully.
References:
[1] Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks
[2] Official code
[3] Another implementation (MXNet)
[4] 《21个项目玩转深度学习》 by 何之源 (part of this article comes from chapter 6 of that book; GitHub repository)
[5] How to build a face recognition framework with OpenFace
[6] How to implement face detection and recognition with the MTCNN and FaceNet models (the principles are explained in fair detail)
[7] A TensorFlow implementation of MTCNN
[8] Face recognition (FaceNet)
[9] [Datasets] FaceDataset: commonly used face databases
[10] https://www.cnblogs.com/zyly/p/9703614.html#_label3
Many other articles also helped a great deal, but after all this time I have forgotten the links; my thanks to all of them here!