L4W4 Assignment 1: Face Recognition - the Happy House



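  • Face verification: at some airports, for example, a system scans your passport and then confirms that you (the person carrying the passport) are its rightful owner before letting you through customs. A phone that unlocks using your face is another example. This is typically a 1:1 matching problem.
  • Face recognition: for example, the lecture showed a face recognition video of Baidu employees entering the office. This is a 1:K matching problem.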



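  • Implement the triplet loss function
  • Use a pretrained model to map face images into 128-dimensional encodings
  • Use these encodings to perform face verification and face recognition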

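In this exercise, we will use a pretrained model that represents ConvNet activations using the "channels first" convention, rather than the "channels last" convention used in previous programming assignments. In other words, a batch of images will have shape $(m, n_C, n_H, n_W)$ instead of $(m, n_H, n_W, n_C)$. Both conventions have a reasonable amount of traction among open-source implementations, and there is no uniform standard in deep learning.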
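To make the two layouts concrete, here is a minimal NumPy sketch that converts a toy batch from channels-last to channels-first. The array sizes and variable names are illustrative assumptions, not values from the assignment.

```python
import numpy as np

# A toy batch of m=2 "images", 4x5 pixels, 3 channels, in channels-last layout.
batch_last = np.arange(2 * 4 * 5 * 3).reshape(2, 4, 5, 3)   # (m, n_H, n_W, n_C)

# Move the channel axis to position 1 to obtain the channels-first layout.
batch_first = np.transpose(batch_last, (0, 3, 1, 2))        # (m, n_C, n_H, n_W)

print(batch_last.shape)    # (2, 4, 5, 3)
print(batch_first.shape)   # (2, 3, 4, 5)

# The same pixel is reachable under both indexing conventions:
# image 1, row 2, column 3, channel 0.
assert batch_last[1, 2, 3, 0] == batch_first[1, 0, 2, 3]
```

Only the axis order changes; the pixel values themselves are untouched, so the conversion can be undone with `np.transpose(batch_first, (0, 2, 3, 1))`.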


In [1]:

cd /home/kesci/input/deeplearning113246

In [2]:

from keras.models import Sequential
from keras.layers import Conv2D, ZeroPadding2D, Activation, Input, concatenate
from keras.models import Model
from keras.layers.normalization import BatchNormalization
from keras.layers.pooling import MaxPooling2D, AveragePooling2D
from keras.layers.merge import Concatenate
from keras.layers.core import Lambda, Flatten, Dense
from keras.initializers import glorot_uniform
from keras.engine.topology import Layer
from keras import backend as K
import cv2
import os
import numpy as np
from numpy import genfromtxt
import pandas as pd
import tensorflow as tf
from fr_utils import *
from inception_blocks_v2 import *

%matplotlib inline
%load_ext autoreload
%autoreload 2

0 Face Verification


Figure 1


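You will see that, rather than using the raw image, you can work with an encoding $f(img)$, so that an element-wise comparison of encodings gives a more accurate judgment of whether two pictures show the same person.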

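1 Encoding Face Images into a 128-Dimensional Vector

1.1 Using a ConvNet to Compute Encodings

The FaceNet model takes a lot of data and a long time to train. So, following common practice in applied deep learning, we load weights that have already been trained. The network architecture follows the Inception model from Szegedy et al. We have provided an Inception network implementation. You can look at the file inception_blocks.py to see how it is implemented (go to "File->Open…" at the top of the Jupyter notebook).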


  • The network takes 96x96 RGB images as input. Specifically, it takes a face image (or a batch of $m$ face images) as a tensor of shape $(m, n_C, n_H, n_W) = (m, 3, 96, 96)$
  • It outputs a matrix of shape $(m, 128)$ that encodes each input face image into a 128-dimensional vector


In [3]:

FRmodel = faceRecoModel(input_shape=(3, 96, 96))
WARNING:tensorflow:From /opt/conda/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.

In [4]:

print("Total Params:", FRmodel.count_params())
Total Params: 3743280

Figure 2: By computing the distance between two encodings and comparing it with a threshold, you can determine whether two pictures represent the same person


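  • The encodings of two images of the same person are very similar to each other
  • The encodings of two images of different people are clearly different

The triplet loss function formalizes this: it tries to "push" the encodings of two images of the same person (anchor and positive) closer together, while "pulling" the encodings of two images of different people (anchor and negative) further apart.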

Figure 3

1.2 The Triplet Loss

For an image $x$, we denote its encoding by $f(x)$, where $f$ is the function computed by the neural network.

We add a normalization step at the end of the model so that $\lVert f(x) \rVert_2 = 1$ (meaning that the encoding vector has norm 1).
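As an illustration of that normalization step, the sketch below rescales each row of a small matrix to unit Euclidean norm with NumPy. The helper `l2_normalize` and its `eps` guard are hypothetical names introduced for this example; they are not taken from the pretrained model's code.

```python
import numpy as np

def l2_normalize(v, eps=1e-10):
    """Scale each row of v to unit Euclidean norm (hypothetical helper)."""
    norms = np.sqrt(np.sum(np.square(v), axis=-1, keepdims=True))
    # eps guards against division by zero for an all-zero row.
    return v / np.maximum(norms, eps)

raw = np.array([[3.0, 4.0],
                [1.0, 0.0]])
unit = l2_normalize(raw)

print(unit)                            # rows [0.6, 0.8] and [1.0, 0.0]
print(np.linalg.norm(unit, axis=-1))   # every row now has norm 1
```

With unit-norm encodings, the squared distance between any two encodings is bounded by 4, which keeps a fixed margin and a fixed distance threshold on a comparable scale across images.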

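Training uses triplets of images $(A, P, N)$:

  • A is the "anchor" image: a picture of a person.
  • P is the "positive" image: a picture of the same person as the anchor image.
  • N is the "negative" image: a picture of a different person from the anchor image.

These triplets are picked from our training set. We write $(A^{(i)}, P^{(i)}, N^{(i)})$ to denote the $i$-th training example.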

You want to make sure that an image $A^{(i)}$ of an individual is closer to the positive $P^{(i)}$ than to the negative $N^{(i)}$ by at least a margin $\alpha$:
$$\lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2 + \alpha < \lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2$$

You therefore want to minimize the following "triplet cost":
$$\mathcal{J} = \sum_{i=1}^{m} \Big[ \underbrace{\lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2}_{\text{(1)}} - \underbrace{\lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2}_{\text{(2)}} + \alpha \Big]_+ \tag{3}$$

Here, we use the notation "$[z]_+$" to denote $\max(z, 0)$.


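  • Term (1) is the squared distance between the anchor "A" and the positive "P" for a given triplet; you want this to be small.
  • Term (2) is the squared distance between the anchor "A" and the negative "N" for a given triplet; you want this to be relatively large, so it makes sense to have a minus sign in front of it.
  • $\alpha$ is called the margin. It is a hyperparameter that you can tune manually. We will use $\alpha = 0.2$.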
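To see how the margin interacts with the two distance terms, here is a small worked example in plain Python. The function name `hinge_triplet` and the distance values are made up for illustration.

```python
def hinge_triplet(pos_dist, neg_dist, alpha=0.2):
    """[pos_dist - neg_dist + alpha]_+ for one triplet of squared distances."""
    return max(pos_dist - neg_dist + alpha, 0.0)

# Easy triplet: the negative is farther than the positive by more than alpha,
# so 0.3 - 0.9 + 0.2 is negative and the triplet contributes no loss.
easy = hinge_triplet(pos_dist=0.3, neg_dist=0.9)

# Semi-hard triplet: the negative is farther, but by less than the margin,
# so the triplet still contributes roughly 0.5 - 0.6 + 0.2 = 0.1.
semi_hard = hinge_triplet(pos_dist=0.5, neg_dist=0.6)

# Hard triplet: the negative is closer than the positive,
# contributing roughly 0.8 - 0.2 + 0.2 = 0.8.
hard = hinge_triplet(pos_dist=0.8, neg_dist=0.2)

print(easy, semi_hard, hard)
```

Only triplets that violate the margin produce a non-zero term, which is why the cost stops pushing on triplets that are already well separated.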

Most implementations also normalize the encoding vectors so that their norm equals 1 (i.e., $\lVert f(img) \rVert_2 = 1$).


  1. Compute the distance between the encodings of the "anchor" and the "positive": $\lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2$

  2. Compute the distance between the encodings of the "anchor" and the "negative": $\lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2$

  3. Compute the following expression for each training example:
    $\lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2 - \lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2 + \alpha$

  4. Compute the full formula by taking the max with zero and summing over the training examples:

    $$\mathcal{J} = \sum_{i=1}^{m} \Big[ \lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2 - \lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2 + \alpha \Big]_+$$

Some useful functions: tf.reduce_sum(), tf.square(), tf.subtract(), tf.add(), tf.maximum()
For steps 1 and 2, you need to sum over the entries of $\lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2$ and $\lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2$, while for step 4 you need to sum over the training examples.

In [5]:

# GRADED FUNCTION: triplet_loss

def triplet_loss(y_true, y_pred, alpha = 0.2):
    """
    Implementation of the triplet loss as defined by formula (3)

    Arguments:
    y_true -- true labels, required when you define a loss in Keras, you don't need it in this function.
    y_pred -- python list containing three objects:
            anchor -- the encodings for the anchor images, of shape (None, 128)
            positive -- the encodings for the positive images, of shape (None, 128)
            negative -- the encodings for the negative images, of shape (None, 128)

    Returns:
    loss -- real number, value of the loss
    """
    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
    ### START CODE HERE ### (≈ 4 lines)
    # Step 1: Compute the (encoding) distance between the anchor and the positive, you will need to sum over axis=-1
    # NOTE: axis=-1 is deliberately left out here, so the sum runs over every entry of the batch (see the note below).
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)))
    # Step 2: Compute the (encoding) distance between the anchor and the negative, you will need to sum over axis=-1
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)))
    # Step 3: subtract the two previous distances and add alpha.
    basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), alpha)
    # Step 4: Take the maximum of basic_loss and 0.0. Sum over the training examples.
    loss = tf.reduce_sum(tf.maximum(basic_loss, 0.0))
    ### END CODE HERE ###
    # Note: with axis=-1 in steps 1 and 2, the computed loss matches the expected output,
    # but that version did not pass the assignment grader.
    return loss

In [6]:

with tf.Session() as test:
    y_true = (None, None, None)
    y_pred = (tf.random_normal([3, 128], mean=6, stddev=0.1, seed = 1),
              tf.random_normal([3, 128], mean=1, stddev=1, seed = 1),
              tf.random_normal([3, 128], mean=3, stddev=4, seed = 1))
    loss = triplet_loss(y_true, y_pred)
    print("loss = " + str(loss.eval()))
loss = 350.02716
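The comment about axis=-1 inside triplet_loss is worth unpacking. The NumPy sketch below (an illustrative stand-in, not the graded TensorFlow code) computes the loss both ways on a toy batch. The function name `triplet_loss_np`, the `per_example` flag, and the toy encodings are assumptions made for this example.

```python
import numpy as np

def triplet_loss_np(anchor, positive, negative, alpha=0.2, per_example=True):
    """Triplet loss on arrays of shape (m, d).

    per_example=True sums squared differences over axis=-1, as in formula (3).
    per_example=False sums over every entry, as the cell above does.
    """
    axis = -1 if per_example else None
    pos_dist = np.sum(np.square(anchor - positive), axis=axis)
    neg_dist = np.sum(np.square(anchor - negative), axis=axis)
    basic_loss = pos_dist - neg_dist + alpha
    return float(np.sum(np.maximum(basic_loss, 0.0)))

# Two toy triplets with 2-dimensional encodings.
anchor   = np.array([[0.0, 0.0], [0.0, 0.0]])
positive = np.array([[1.0, 0.0], [0.0, 0.0]])
negative = np.array([[0.0, 0.0], [2.0, 0.0]])

# Per example: triplet 0 gives 1 - 0 + 0.2 = 1.2, triplet 1 gives 0 - 4 + 0.2 < 0.
per_example_loss = triplet_loss_np(anchor, positive, negative, per_example=True)

# Whole batch at once: 1 - 4 + 0.2 < 0, so the violation in triplet 0 is hidden.
global_loss = triplet_loss_np(anchor, positive, negative, per_example=False)

print(per_example_loss, global_loss)
```

Because the maximum with zero is applied after the sum, summing over the whole batch lets a well-separated triplet cancel out a violating one. The per-example version matches formula (3) term by term.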


2 Loading the Trained Model


In [7]:

FRmodel.compile(optimizer = 'adam', loss = triplet_loss, metrics = ['accuracy'])


Figure 4


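3 Applying the Model

Back to the Happy House (for an introduction to the dataset, see the L4W2KT assignment)! The residents have been living happily ever since you implemented happiness recognition for the house in an earlier assignment.



3.1 Face Verification

Let's build a database containing one encoding vector for each person allowed to enter the Happy House. We use img_to_encoding(image_path, model) to generate the encodings; it essentially runs the model's forward propagation on the specified image.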


In [8]:

database = {}
database["danielle"] = img_to_encoding("images/danielle.png", FRmodel)
database["younes"] = img_to_encoding("images/younes.jpg", FRmodel)
database["tian"] = img_to_encoding("images/tian.jpg", FRmodel)
database["andrew"] = img_to_encoding("images/andrew.jpg", FRmodel)
database["kian"] = img_to_encoding("images/kian.jpg", FRmodel)
database["dan"] = img_to_encoding("images/dan.jpg", FRmodel)
database["sebastiano"] = img_to_encoding("images/sebastiano.jpg", FRmodel)
database["bertrand"] = img_to_encoding("images/bertrand.jpg", FRmodel)
database["kevin"] = img_to_encoding("images/kevin.jpg", FRmodel)
database["felix"] = img_to_encoding("images/felix.jpg", FRmodel)
database["benoit"] = img_to_encoding("images/benoit.jpg", FRmodel)
database["arnaud"] = img_to_encoding("images/arnaud.jpg", FRmodel)


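Exercise: Implement the verify() function, which checks whether the picture taken by the front-door camera (image_path) really shows the person called "identity". You will have to go through the following steps:

  1. Compute the encoding of the image at image_path
  2. Compute the distance between this encoding and the encoding of the identity image stored in the database
  3. Open the door if the distance is less than 0.7; otherwise do not open it.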


In [9]:


def verify(image_path, identity, database, model):
    """
    Function that verifies if the person on the "image_path" image is "identity".

    Arguments:
    image_path -- path to an image
    identity -- string, name of the person whose identity you'd like to verify. Has to be a resident of the Happy House.
    database -- python dictionary mapping names of allowed people (strings) to their encodings (vectors).
    model -- your Inception model instance in Keras

    Returns:
    dist -- distance between the image_path and the image of "identity" in the database.
    door_open -- True, if the door should open. False otherwise.
    """
    ### START CODE HERE ###

    # Step 1: Compute the encoding for the image. Use img_to_encoding() see example above. (≈ 1 line)
    encoding = img_to_encoding(image_path, model)

    # Step 2: Compute distance with identity's image (≈ 1 line)
    dist = np.linalg.norm(encoding - database[identity])

    # Step 3: Open the door if dist < 0.7, else don't open (≈ 3 lines)
    if dist < 0.7:
        print("It's " + str(identity) + ", welcome home!")
        door_open = True
    else:
        print("It's not " + str(identity) + ", please go away")
        door_open = False

    ### END CODE HERE ###
    return dist, door_open


In [10]:

verify("images/camera_0.jpg", "younes", database, FRmodel)
It's younes, welcome home!


(0.67100745, True)

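Benoit, who broke the aquarium last weekend, has been banned from the house and removed from the database. He stole Kian's ID card and came back to the house to try to pass himself off as Kian. The front-door camera took a picture of Benoit ("images/camera_2.jpg"). Let's run the verification algorithm to check whether Benoit can enter.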

In [11]:

verify("images/camera_2.jpg", "kian", database, FRmodel)
It's not kian, please go away


(0.8580015, False)
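The thresholding logic of verify() can also be tried without the pretrained model. The sketch below uses hand-made 3-dimensional unit vectors in place of real 128-dimensional encodings. The helper `verify_encoding`, the toy database, and the probe vector are all assumptions made for illustration.

```python
import numpy as np

def verify_encoding(encoding, identity, database, threshold=0.7):
    """Return (distance, door_open) for a precomputed encoding (hypothetical helper)."""
    dist = float(np.linalg.norm(encoding - database[identity]))
    return dist, dist < threshold

# Toy database of unit-norm "encodings" (real ones would be 128-dimensional).
toy_db = {
    "kian":   np.array([1.0, 0.0, 0.0]),
    "younes": np.array([0.0, 1.0, 0.0]),
}

# A probe that lies close to younes and far from kian.
probe = np.array([0.0, 0.8, 0.6])

dist_younes, open_younes = verify_encoding(probe, "younes", toy_db)  # sqrt(0.4), door opens
dist_kian, open_kian = verify_encoding(probe, "kian", toy_db)        # sqrt(2.0), door stays shut

print(dist_younes, open_younes)
print(dist_kian, open_kian)
```

The same probe is accepted or rejected depending only on the claimed identity, which is what makes verification a 1:1 comparison.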

3.2 Face Recognition





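  1. Compute the target encoding of the image at image_path
  2. Find the encoding in the database that has the smallest distance to the target encoding.
    - Initialize the min_dist variable to a large enough number (100). It will help you keep track of the encoding closest to the input's encoding.
    - Loop over the names and encodings in the database dictionary. To loop, use for (name, db_enc) in database.items()
    - Compute the L2 distance between the target "encoding" and the current "encoding" from the database.
    - If this distance is less than min_dist, set min_dist to dist and identity to name.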

In [12]:

# GRADED FUNCTION: who_is_it

def who_is_it(image_path, database, model):
    """
    Implements face recognition for the happy house by finding who is the person on the image_path image.

    Arguments:
    image_path -- path to an image
    database -- database containing image encodings along with the name of the person on the image
    model -- your Inception model instance in Keras

    Returns:
    min_dist -- the minimum distance between image_path encoding and the encodings from the database
    identity -- string, the name prediction for the person on image_path
    """
    ### START CODE HERE ###

    ## Step 1: Compute the target "encoding" for the image. Use img_to_encoding() see example above. ## (≈ 1 line)
    encoding = img_to_encoding(image_path, model)

    ## Step 2: Find the closest encoding ##

    # Initialize "min_dist" to a large value, say 100 (≈1 line)
    min_dist = 100

    # Loop over the database dictionary's names and encodings.
    for (name, db_enc) in database.items():

        # Compute L2 distance between the target "encoding" and the current "emb" from the database. (≈ 1 line)
        dist = np.linalg.norm(encoding - db_enc)

        # If this distance is less than the min_dist, then set min_dist to dist, and identity to name. (≈ 3 lines)
        if dist < min_dist:
            min_dist = dist
            identity = name

    ### END CODE HERE ###
    if min_dist > 0.7:
        print("Not in the database.")
    else:
        print("it's " + str(identity) + ", the distance is " + str(min_dist))
    return min_dist, identity


In [13]:

who_is_it("images/camera_0.jpg", database, FRmodel)
it's younes, the distance is 0.67100745


(0.67100745, 'younes')
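The nearest-encoding search can likewise be exercised on toy vectors. In the sketch below, `identify_encoding` is a hypothetical variant of who_is_it() that takes a precomputed encoding and, as a deliberate design difference, returns None for the identity when no database entry is within the threshold.

```python
import numpy as np

def identify_encoding(encoding, database, threshold=0.7):
    """Return (min_dist, identity); identity is None if nobody is close enough."""
    min_dist, identity = float("inf"), None
    for name, db_enc in database.items():
        dist = float(np.linalg.norm(encoding - db_enc))
        if dist < min_dist:
            min_dist, identity = dist, name
    if min_dist > threshold:
        return min_dist, None
    return min_dist, identity

# Toy unit-norm "encodings"; real ones would come from img_to_encoding().
toy_db = {
    "kian":   np.array([1.0, 0.0, 0.0]),
    "younes": np.array([0.0, 1.0, 0.0]),
}

known = identify_encoding(np.array([0.0, 0.8, 0.6]), toy_db)     # closest entry is younes
stranger = identify_encoding(np.array([0.0, 0.0, 1.0]), toy_db)  # nobody within 0.7

print(known)
print(stranger)
```

Starting from `float("inf")` instead of 100 avoids having to guess an upper bound on the distances, and returning None makes the "Not in the database." case explicit to the caller.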

You can change "camera_0.jpg" (a picture of younes) to "camera_1.jpg" (a picture of bertrand) and see the result.




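  • Put more images of each person (taken under different lighting conditions, on different days, and so on) into the database. Then, given a new image, compare the new face with multiple pictures of the person to increase accuracy.
  • Crop the images so that they contain only the face, with less of the "border" region around it. This preprocessing removes some of the irrelevant pixels around the face and also makes the algorithm more robust.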
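One possible way to use several stored images per person is to keep a list of encodings for each name and take the smallest distance to any of them. The sketch below assumes that aggregation rule (a mean distance would be another reasonable choice). The helper name `min_distance_to_person` and the toy vectors are hypothetical.

```python
import numpy as np

def min_distance_to_person(encoding, person_encodings):
    """Smallest L2 distance between encoding and any stored encoding of one person."""
    stacked = np.stack(person_encodings)                  # shape (k, d)
    dists = np.linalg.norm(stacked - encoding, axis=-1)   # one distance per stored image
    return float(np.min(dists))

# Two stored "encodings" of the same person, e.g. under different lighting.
younes_gallery = [
    np.array([0.0, 1.0, 0.0]),
    np.array([0.0, 0.8, 0.6]),
]

probe = np.array([0.0, 0.6, 0.8])

single = min_distance_to_person(probe, younes_gallery[:1])  # sqrt(0.8): above the 0.7 threshold
multi = min_distance_to_person(probe, younes_gallery)       # sqrt(0.08): well below it

print(single, multi)
```

In this toy case the probe would be rejected against the first stored image alone, but accepted once the second image is available.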


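  • Face verification solves the easier 1:1 matching problem; face recognition addresses the harder 1:K matching problem.
  • The triplet loss is an effective loss function for training a neural network to learn encodings of face images.
  • The same encodings can be used for both verification and recognition. Measuring the distance between the encodings of two images lets you determine whether they are pictures of the same person.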



  • Florian Schroff, Dmitry Kalenichenko, James Philbin (2015). FaceNet: A Unified Embedding for Face Recognition and Clustering
  • Yaniv Taigman, Ming Yang, Marc’Aurelio Ranzato, Lior Wolf (2014). DeepFace: Closing the gap to human-level performance in face verification
  • The pretrained model we use is inspired by Victor Sy Wang’s implementation and was loaded using his code: https://github.com/iwantooxxoox/Keras-OpenFace.
  • Our implementation also took a lot of inspiration from the official FaceNet github repository: https://github.com/davidsandberg/facenet
