Welcome to Assignment 1 of Week 4! In this assignment, you will build a face recognition system. Many of the ideas presented in this notebook are from FaceNet. In the video lectures, we also mentioned DeepFace.
Face recognition problems commonly fall into two categories: Face Verification ("is this the claimed person?") and Face Recognition ("who is this person?").
FaceNet learns a network that encodes a face image into a vector of 128 numbers. By comparing two such vectors, you can then determine whether two pictures are of the same person.
In this assignment, you will: implement the triplet loss function, use a pretrained model to map face images into 128-dimensional encodings, and use these encodings to perform face verification and face recognition.
In this exercise, we will be using a pretrained model which represents ConvNet activations using a "channels first" convention, as opposed to the "channels last" convention used in the previous programming assignments. In other words, a batch of images will be of shape $(m, n_C, n_H, n_W)$ instead of $(m, n_H, n_W, n_C)$. Both of these conventions have a reasonable amount of traction among open-source implementations; there isn't a uniform standard yet within the deep learning community.
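The two layouts differ only in where the channel axis sits; a quick NumPy transpose converts between them (a minimal sketch, with an illustrative batch shape):

```python
import numpy as np

# A batch of 2 images in the "channels last" layout: (m, n_H, n_W, n_C)
batch_last = np.zeros((2, 96, 96, 3))

# Move the channel axis to position 1 to get "channels first": (m, n_C, n_H, n_W)
batch_first = np.transpose(batch_last, (0, 3, 1, 2))

print(batch_first.shape)  # (2, 3, 96, 96)
```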
First, let's load the required packages.
In [1]:
cd /home/kesci/input/deeplearning113246
/home/kesci/input/deeplearning113246
In [2]:
from keras.models import Sequential
from keras.layers import Conv2D, ZeroPadding2D, Activation, Input, concatenate
from keras.models import Model
from keras.layers.normalization import BatchNormalization
from keras.layers.pooling import MaxPooling2D, AveragePooling2D
from keras.layers.merge import Concatenate
from keras.layers.core import Lambda, Flatten, Dense
from keras.initializers import glorot_uniform
from keras.engine.topology import Layer
from keras import backend as K
K.set_image_data_format('channels_first')
import cv2
import os
import numpy as np
from numpy import genfromtxt
import pandas as pd
import tensorflow as tf
from fr_utils import *
from inception_blocks_v2 import *
%matplotlib inline
%load_ext autoreload
%autoreload 2
import sys
np.set_printoptions(threshold=sys.maxsize)
Using TensorFlow backend.
Matplotlib is building the font cache using fc-list. This may take a moment.
In Face Verification, you are given two images and must determine whether they are of the same person. The simplest way to do this is to compare the two images pixel by pixel: if the distance between the raw images is less than a chosen threshold, it may be the same person!
Figure 1
Of course, this algorithm performs really poorly, since the pixel values change dramatically due to variations in lighting, orientation of the person's face, even minor changes in head position, and so on.
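How brittle raw-pixel comparison is can be seen with a small NumPy sketch (purely synthetic data, not part of the assignment): merely brightening the "same face" already produces a large pixel-wise L2 distance.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.uniform(0.0, 1.0, size=(96, 96, 3))        # a synthetic "photo"
same_person_brighter = np.clip(img + 0.2, 0.0, 1.0)  # same "face", only the lighting changed

# The raw-pixel L2 distance is large even though the identity did not change
dist = np.linalg.norm(img - same_person_brighter)
print(dist)
```

A fixed threshold on this distance would reject the brighter copy, which is exactly the failure mode described above.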
You'll see that rather than using the raw image, you can learn an encoding $f(img)$, so that element-wise comparisons of this encoding give more accurate judgments as to whether two pictures are of the same person.
The FaceNet model takes a lot of data and a long time to train. So, following common practice in applied deep learning settings, let's just load weights that someone else has already trained. The network architecture follows the Inception model from Szegedy et al. We have provided an Inception network implementation. You can look at the file inception_blocks_v2.py to see how it is implemented (go to "File->Open..." at the top of the Jupyter notebook).
The key things you need to know are: the network uses $96 \times 96$ RGB images as its input, so an image batch has shape $(m, n_C, n_H, n_W) = (m, 3, 96, 96)$, and it outputs a matrix of shape $(m, 128)$ that encodes each input face image as a 128-dimensional vector.
Run the cell below to create the model for face images.
In [3]:
FRmodel = faceRecoModel(input_shape=(3, 96, 96))
WARNING:tensorflow:From /opt/conda/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
In [4]:
print("Total Params:", FRmodel.count_params())
Total Params: 3743280
Expected output:
Total Params: 3743280
By using a 128-neuron fully connected layer as its last layer, the model ensures that the output is an encoding vector of size 128. You then use the encodings to compare two face images as follows:
Figure 2: By computing a distance between two encodings and thresholding, you can determine whether the two pictures represent the same person
An encoding is a good one if: the encodings of two images of the same person are quite similar to each other, while the encodings of two images of different persons are very different.
The triplet loss function formalizes this: it tries to "push" the encodings of two images of the same person (Anchor and Positive) closer together, while "pulling" the encodings of two images of different persons (Anchor and Negative) further apart.
Figure 3: In the next section, we will call the pictures from left to right: Anchor (A), Positive (P), Negative (N)
For an image $x$, we denote its encoding $f(x)$, where $f$ is the function computed by the neural network.
We will add a normalization step at the end of the model so that $||f(x)||_2 = 1$ (meaning the encoding vector should have norm 1).
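This normalization step can be sketched in NumPy on a toy 2-D vector (illustrative values; the real encodings are 128-dimensional):

```python
import numpy as np

raw_encoding = np.array([3.0, 4.0])  # toy "encoding" before normalization
unit_encoding = raw_encoding / np.linalg.norm(raw_encoding)  # divide by the L2 norm

print(unit_encoding)                  # [0.6 0.8]
print(np.linalg.norm(unit_encoding))  # ~1.0, i.e. the vector now has unit norm
```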
Training will use triplets of images $(A, P, N)$: $A$ is an "Anchor" image of a person, $P$ is a "Positive" image of the same person as the Anchor, and $N$ is a "Negative" image of a different person. These triplets are picked from our training set. We will write $(A^{(i)}, P^{(i)}, N^{(i)})$ to denote the $i$-th training example.
You'd like to make sure that an image $A^{(i)}$ of an individual is closer to the Positive $P^{(i)}$ than to the Negative image $N^{(i)}$ by at least a margin $\alpha$:

$$||f(A^{(i)}) - f(P^{(i)})||_2^2 + \alpha < ||f(A^{(i)}) - f(N^{(i)})||_2^2$$
You would thus like to minimize the following "triplet cost":

$$\mathcal{J} = \sum^{m}_{i=1} \left[ \underbrace{||f(A^{(i)}) - f(P^{(i)})||_2^2}_\text{(1)} - \underbrace{||f(A^{(i)}) - f(N^{(i)})||_2^2}_\text{(2)} + \alpha \right]_+ \tag{3}$$

Here, we use the notation "$[z]_+$" to denote $\max(z, 0)$.
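Formula (3), including the $[z]_+$ hinge, can be checked on toy 3-D encodings with NumPy (all values here are illustrative; the real encodings are 128-dimensional and produced by the network):

```python
import numpy as np

alpha = 0.2
f_A = np.array([0.0, 1.0, 0.0])  # anchor encoding (toy values)
f_P = np.array([0.1, 0.9, 0.0])  # positive: close to the anchor
f_N = np.array([1.0, 0.0, 0.5])  # negative: far from the anchor

pos_dist = np.sum((f_A - f_P) ** 2)  # ||f(A) - f(P)||_2^2, about 0.02
neg_dist = np.sum((f_A - f_N) ** 2)  # ||f(A) - f(N)||_2^2, 2.25
loss = np.maximum(pos_dist - neg_dist + alpha, 0.0)  # [z]_+ clips negative values to 0
print(loss)  # 0.0 -- this triplet already satisfies the margin
```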
Notes:
Most implementations also normalize the encoding vectors to have norm equal to one (i.e., $||f(img)||_2 = 1$).
Exercise: Implement the triplet loss as defined by formula (3). Here are the 4 steps:

1. Compute the distance between the encodings of the "anchor" and the "positive": $||f(A^{(i)}) - f(P^{(i)})||_2^2$
2. Compute the distance between the encodings of the "anchor" and the "negative": $||f(A^{(i)}) - f(N^{(i)})||_2^2$
3. Compute the formula for each training example: $||f(A^{(i)}) - f(P^{(i)})||_2^2 - ||f(A^{(i)}) - f(N^{(i)})||_2^2 + \alpha$
4. Compute the full formula by taking the max with zero and summing over the training examples:

$$\mathcal{J} = \sum^{m}_{i=1} \left[ ||f(A^{(i)}) - f(P^{(i)})||_2^2 - ||f(A^{(i)}) - f(N^{(i)})||_2^2 + \alpha \right]_+ \tag{3}$$
Some useful functions: tf.reduce_sum(), tf.square(), tf.subtract(), tf.add(), tf.maximum().
For steps 1 and 2, you will sum over the entries of $||f(A^{(i)}) - f(P^{(i)})||_2^2$ and $||f(A^{(i)}) - f(N^{(i)})||_2^2$, while for step 4 you will sum over the training examples.
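In Steps 1 and 2, summing over axis=-1 collapses the 128 encoding components and leaves one squared distance per training example, whereas summing over everything collapses the batch to a single scalar. A NumPy sketch with toy data (shapes match the (None, 128) encodings used below):

```python
import numpy as np

rng = np.random.default_rng(1)
anchor = rng.normal(size=(3, 128))    # 3 training examples, 128-D encodings (toy data)
positive = rng.normal(size=(3, 128))

# axis=-1: one squared distance per training example
per_example = np.sum((anchor - positive) ** 2, axis=-1)
print(per_example.shape)  # (3,)

# no axis argument: a single scalar, summed over examples AND components
total = np.sum((anchor - positive) ** 2)
print(np.ndim(total))  # 0
```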
In [5]:
# GRADED FUNCTION: triplet_loss
def triplet_loss(y_true, y_pred, alpha = 0.2):
    """
    Implementation of the triplet loss as defined by formula (3)
    Arguments:
    y_true -- true labels, required when you define a loss in Keras, you don't need it in this function.
    y_pred -- python list containing three objects:
            anchor -- the encodings for the anchor images, of shape (None, 128)
            positive -- the encodings for the positive images, of shape (None, 128)
            negative -- the encodings for the negative images, of shape (None, 128)
    Returns:
    loss -- real number, value of the loss
    """
    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
    ### START CODE HERE ### (≈ 4 lines)
    # Step 1: Compute the (encoding) distance between the anchor and the positive, you will need to sum over axis=-1
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)))  # ,axis=-1
    # Step 2: Compute the (encoding) distance between the anchor and the negative, you will need to sum over axis=-1
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)))
    # Step 3: subtract the two previous distances and add alpha.
    basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), alpha)
    # Step 4: Take the maximum of basic_loss and 0.0. Sum over the training examples.
    loss = tf.reduce_sum(tf.maximum(basic_loss, 0.0))
    ### END CODE HERE ###
    # Note: if you add axis=-1 in Steps 1 and 2, the computed loss matches the
    # expected output below, but it does not pass this assignment's grader.
    return loss
In [6]:
with tf.Session() as test:
    tf.set_random_seed(1)
    y_true = (None, None, None)
    y_pred = (tf.random_normal([3, 128], mean=6, stddev=0.1, seed = 1),
              tf.random_normal([3, 128], mean=1, stddev=1, seed = 1),
              tf.random_normal([3, 128], mean=3, stddev=4, seed = 1))
    loss = triplet_loss(y_true, y_pred)
    print("loss = " + str(loss.eval()))
loss = 350.02716
Expected output:
loss=528.143
FaceNet is trained by minimizing the triplet loss. But since training requires a lot of data and a lot of computation, we won't train it from scratch here. Instead, we load a previously trained model. Load a model using the following cell; this might take a couple of minutes to run.
In [7]:
FRmodel.compile(optimizer = 'adam', loss = triplet_loss, metrics = ['accuracy'])
load_weights_from_FaceNet(FRmodel)
Here are the encoding distances between three example individuals:
Figure 4: Example output of encoding distances between three individuals
Now, let's use this model to perform face verification and face recognition!
Back to the Happy House! (See the L4W2KT assignment for an introduction to the dataset.) Residents have been living blissfully since you implemented happiness recognition for the house in an earlier assignment.
However, several issues keep coming up: the Happy House became so happy that every happy person in the neighborhood came to hang out in your living room. The house got really crowded, which had a negative impact on the residents. All these random happy people were also eating your food.
So, you decide to change the door entry policy and not let random happy people enter anymore, even if they are happy! Instead, you'd like to build a Face Verification system so that only people from a specified list are allowed in. To get admitted, each person has to swipe an ID card (identification card) to trigger the face verification system at the door, which then checks that they are who they claim to be.
Let's build a database containing one encoding vector for each person allowed to enter the Happy House. To generate the encoding we use img_to_encoding(image_path, model), which basically runs the forward propagation of the model on the specified image.
Run the following code to build the database (represented as a python dictionary). This database maps each person's name to a 128-dimensional encoding of their face.
In [8]:
database = {}
database["danielle"] = img_to_encoding("images/danielle.png", FRmodel)
database["younes"] = img_to_encoding("images/younes.jpg", FRmodel)
database["tian"] = img_to_encoding("images/tian.jpg", FRmodel)
database["andrew"] = img_to_encoding("images/andrew.jpg", FRmodel)
database["kian"] = img_to_encoding("images/kian.jpg", FRmodel)
database["dan"] = img_to_encoding("images/dan.jpg", FRmodel)
database["sebastiano"] = img_to_encoding("images/sebastiano.jpg", FRmodel)
database["bertrand"] = img_to_encoding("images/bertrand.jpg", FRmodel)
database["kevin"] = img_to_encoding("images/kevin.jpg", FRmodel)
database["felix"] = img_to_encoding("images/felix.jpg", FRmodel)
database["benoit"] = img_to_encoding("images/benoit.jpg", FRmodel)
database["arnaud"] = img_to_encoding("images/arnaud.jpg", FRmodel)
Now, when someone shows up at your front door and swipes their ID card, you can look up their encoding in the database and use it to check whether the person standing at the front door matches the identity on the card.
Exercise: Implement the verify() function, which checks whether the front-door camera picture (image_path) is actually the claimed person. You will need to go through the following steps:

1. Compute the encoding of the image from image_path.
2. Compute the distance between this encoding and the encoding of the identity's image stored in the database.
3. Open the door if the distance is less than 0.7, else do not open.

As presented above, you should use the L2 distance (np.linalg.norm). (Note: in this implementation, the L2 distance, not the square of the L2 distance, is compared to the threshold 0.7.)
In [9]:
# GRADED FUNCTION: verify
def verify(image_path, identity, database, model):
    """
    Function that verifies if the person on the "image_path" image is "identity".
    Arguments:
    image_path -- path to an image
    identity -- string, name of the person you'd like to verify the identity. Has to be a resident of the Happy house.
    database -- python dictionary mapping names of allowed people's names (strings) to their encodings (vectors).
    model -- your Inception model instance in Keras
    Returns:
    dist -- distance between the image_path and the image of "identity" in the database.
    door_open -- True, if the door should open. False otherwise.
    """
    ### START CODE HERE ###
    # Step 1: Compute the encoding for the image. Use img_to_encoding() see example above. (≈ 1 line)
    encoding = img_to_encoding(image_path, model)
    # Step 2: Compute distance with identity's image (≈ 1 line)
    dist = np.linalg.norm(encoding - database[identity])
    # Step 3: Open the door if dist < 0.7, else don't open (≈ 3 lines)
    if dist < 0.7:
        print("It's " + str(identity) + ", welcome home!")
        door_open = True
    else:
        print("It's not " + str(identity) + ", please go away")
        door_open = False
    ### END CODE HERE ###
    return dist, door_open
Younes is trying to enter the Happy House and the camera takes a picture of him ("images/camera_0.jpg"). Let's run your verification algorithm on this picture:
In [10]:
verify("images/camera_0.jpg", "younes", database, FRmodel)
It's younes, welcome home!
Out[10]:
(0.67100745, True)
Expected output:
It's younes, welcome home!
(0.67100745, True)
Benoit, who broke the aquarium last weekend, has been banned from the house and removed from the database. He stole Kian's ID card and came back to the house to try to present himself as Kian. The front-door camera took a picture of Benoit ("images/camera_2.jpg"). Let's run the verification algorithm to check whether Benoit can enter.
In [11]:
verify("images/camera_2.jpg", "kian", database, FRmodel)
It's not kian, please go away
Out[11]:
(0.8580015, False)
Expected output:
It's not kian, please go away
(0.8580015, False)
Your face verification system is mostly working well. But since Kian got his ID card stolen, when he came back to the house that evening he couldn't get in!
To reduce such shenanigans, you'd like to change your face verification system to a face recognition system. This way, no one has to carry an ID card anymore. An authorized person can just walk up to the house, and the front door will unlock for them!
To do this, you will implement a face recognition system that takes as input an image and figures out whether it is one of the authorized persons. Unlike the previous face verification system, we will no longer get a person's name as an additional input.
Exercise: Implement who_is_it(). You will need to go through the following steps:

1. Compute the target encoding of the image from image_path.
2. Find the encoding from the database that has the smallest distance to the target encoding:
- Initialize the min_dist variable to a large enough number (100). It will help you keep track of the closest encoding to the input's encoding.
- Loop over the database dictionary's names and encodings: for (name, db_enc) in database.items().
- Compute the L2 distance between the target encoding and each encoding from the database; if this distance is smaller than min_dist, set min_dist to it and identity to the name.

In [12]:
# GRADED FUNCTION: who_is_it
def who_is_it(image_path, database, model):
    """
    Implements face recognition for the happy house by finding who is the person on the image_path image.
    Arguments:
    image_path -- path to an image
    database -- database containing image encodings along with the name of the person on the image
    model -- your Inception model instance in Keras
    Returns:
    min_dist -- the minimum distance between image_path encoding and the encodings from the database
    identity -- string, the name prediction for the person on image_path
    """
    ### START CODE HERE ###
    ## Step 1: Compute the target "encoding" for the image. Use img_to_encoding() see example above. ## (≈ 1 line)
    encoding = img_to_encoding(image_path, model)
    ## Step 2: Find the closest encoding ##
    # Initialize "min_dist" to a large value, say 100 (≈1 line)
    min_dist = 100
    # Loop over the database dictionary's names and encodings.
    for (name, db_enc) in database.items():
        # Compute L2 distance between the target "encoding" and the current "db_enc" from the database. (≈ 1 line)
        dist = np.linalg.norm(encoding - db_enc)
        # If this distance is less than the min_dist, then set min_dist to dist, and identity to name. (≈ 3 lines)
        if dist < min_dist:
            min_dist = dist
            identity = name
    ### END CODE HERE ###
    if min_dist > 0.7:
        print("Not in the database.")
    else:
        print("it's " + str(identity) + ", the distance is " + str(min_dist))
    return min_dist, identity
Younes is at the front door and the camera takes a picture of him ("images/camera_0.jpg"). Let's see if your who_is_it() algorithm identifies Younes.
In [13]:
who_is_it("images/camera_0.jpg", database, FRmodel)
it's younes, the distance is 0.67100745
Out[13]:
(0.67100745, 'younes')
Expected output:
it's younes, the distance is 0.67100745
(0.67100745, 'younes')
You can change "camera_0.jpg" (picture of younes) to "camera_1.jpg" (picture of bertrand) and see the result.
Your Happy House is running well. It only lets in authorized persons, and people no longer need to carry an ID card around!
You've now seen how a state-of-the-art face recognition system works.
Although we won't implement it here, there are ways to further improve the algorithm, for example storing several images of each person in the database and comparing a new image against all of them, or cropping the input images to contain just the face so that results are less affected by the background.
What you should remember:

- Face verification solves an easier 1:1 matching problem; face recognition addresses a harder 1:K matching problem.
- The triplet loss is an effective loss function for training a neural network to learn an encoding of a face image.
- The same encoding can be used for verification and recognition. Measuring distances between the encodings of two images allows you to determine whether they are pictures of the same person.

Congratulations on finishing this assignment!