How To Apply Computer Vision to Build an Emotion-Based Dog Filter in Python 3

The author selected Girls Who Code to receive a donation as part of the Write for DOnations program.

Introduction

Computer vision is a subfield of computer science that aims to extract a higher-order understanding from images and videos. This field includes tasks such as object detection, image restoration (matrix completion), and optical flow. Computer vision powers technologies such as self-driving car prototypes, employee-less grocery stores, fun Snapchat filters, and your mobile device’s face authenticator.

In this tutorial, you will explore computer vision as you use pre-trained models to build a Snapchat-esque dog filter. For those unfamiliar with Snapchat, this filter will detect your face and then superimpose a dog mask on it. You will then train a face-emotion classifier so that the filter can pick dog masks based on emotion, such as a corgi for happy or a pug for sad. Along the way, you will also explore related concepts in both ordinary least squares and computer vision, which will expose you to the fundamentals of machine learning.

As you work through the tutorial, you’ll use OpenCV, a computer-vision library, numpy for linear algebra utilities, and matplotlib for plotting. You’ll also apply the following concepts as you build a computer-vision application:

  • Ordinary least squares as a regression and classification technique.

  • The basics of stochastic gradient neural networks.

While not necessary to complete this tutorial, you’ll find it easier to understand some of the more detailed explanations if you’re familiar with these mathematical concepts:

  • Fundamental linear algebra concepts: scalars, vectors, and matrices.

  • Fundamental calculus: how to take a derivative.

You can find the complete code for this tutorial at https://github.com/do-community/emotion-based-dog-filter.

Let’s get started.

Prerequisites

To complete this tutorial, you will need the following:

  • A local development environment for Python 3 with at least 1GB of RAM. You can follow How To Install and Set Up a Local Programming Environment for Python 3 to configure everything you need.

  • A working webcam to do real-time image detection.

Step 1 — Creating the Project and Installing Dependencies

Let’s create a workspace for this project and install the dependencies we’ll need. We’ll call our workspace DogFilter:

  • mkdir ~/DogFilter

Navigate to the DogFilter directory:

  • cd ~/DogFilter

Then create a new Python virtual environment for the project:

  • python3 -m venv dogfilter

Activate your environment:

  • source dogfilter/bin/activate

The prompt changes, indicating the environment is active. Now install PyTorch, a deep-learning framework for Python that we’ll use in this tutorial. The installation process depends on which operating system you’re using.

On macOS, install PyTorch with the following command:

  • python -m pip install torch==0.4.1 torchvision==0.2.1

On Linux, use the following commands:

  • pip install http://download.pytorch.org/whl/cpu/torch-0.4.1-cp35-cp35m-linux_x86_64.whl

  • pip install torchvision

And for Windows, install PyTorch with these commands:

  • pip install http://download.pytorch.org/whl/cpu/torch-0.4.1-cp35-cp35m-win_amd64.whl

  • pip install torchvision

Now install prepackaged binaries for OpenCV and numpy, which are computer vision and linear algebra libraries, respectively. The former offers utilities such as image rotations, and the latter offers linear algebra utilities such as a matrix inversion.

  • python -m pip install opencv-python==3.4.3.18 numpy==1.14.5

Finally, create a directory for our assets, which will hold the images we’ll use in this tutorial:

  • mkdir assets

With the dependencies installed, let’s build the first version of our filter: a face detector.

Step 2 — Building a Face Detector

Our first objective is to detect all faces in an image. We’ll create a script that accepts a single image and outputs an annotated image with the faces outlined with boxes.

Fortunately, instead of writing our own face detection logic, we can use pre-trained models. We’ll set up a model and then load pre-trained parameters. OpenCV makes this easy by providing both.

OpenCV provides the model parameters in its source code, but we need the absolute path to our locally-installed OpenCV to use these parameters. Since that absolute path may vary, we’ll download our own copy instead and place it in the assets folder:

  • wget -O assets/haarcascade_frontalface_default.xml https://github.com/opencv/opencv/raw/master/data/haarcascades/haarcascade_frontalface_default.xml

The -O option specifies the destination as assets/haarcascade_frontalface_default.xml. The second argument is the source URL.

We’ll detect all faces in the following image from Pexels (CC0, link to original image).

First, download the image. The following command saves the downloaded image as children.png in the assets folder:

  • wget -O assets/children.png https://assets.digitalocean.com/articles/python3_dogfilter/CfoBWbF.png

To check that the detection algorithm works, we will run it on an individual image and save the resulting annotated image to disk. Create an outputs folder for these annotated results.

  • mkdir outputs

Now create a Python script for the face detector. Create the file step_2_face_detect.py using nano or your favorite text editor:

  • nano step_2_face_detect.py

Add the following code to the file. This code imports OpenCV, which contains the image utilities and face classifier. The rest of the code is typical Python program boilerplate.

step_2_face_detect.py
"""Test for face detection"""

import cv2


def main():
    pass

if __name__ == '__main__':
    main()

Now replace pass in the main function with this code which initializes a face classifier using the OpenCV parameters you downloaded to your assets folder:

step_2_face_detect.py
def main():
    # initialize front face classifier
    cascade = cv2.CascadeClassifier("assets/haarcascade_frontalface_default.xml")

Next, add this line to load the image children.png.

step_2_face_detect.py
frame = cv2.imread('assets/children.png')

Then add this code to convert the image to black and white, as the classifier was trained on black-and-white images. To accomplish this, we convert to grayscale and then discretize the histogram:

step_2_face_detect.py
    # Convert to black-and-white
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blackwhite = cv2.equalizeHist(gray)

Then use OpenCV’s detectMultiScale function to detect all faces in the image.

step_2_face_detect.py
    rects = cascade.detectMultiScale(
        blackwhite, scaleFactor=1.3, minNeighbors=4, minSize=(30, 30),
        flags=cv2.CASCADE_SCALE_IMAGE)
  • scaleFactor specifies how much the image is reduced along each dimension.

  • minNeighbors denotes how many neighboring rectangles a candidate rectangle needs in order to be retained.

  • minSize is the minimum allowable detected object size. Objects smaller than this are discarded.

The return type is a list of tuples, where each tuple has four numbers denoting the minimum x, minimum y, width, and height of the rectangle in that order.
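
To make that format concrete, here is a small example of unpacking one detection; the numbers below are invented purely for illustration:

# Hypothetical detection values, for illustration only; real values depend on the image.
example_rects = [(52, 38, 90, 90), (210, 44, 85, 85)]
for x, y, w, h in example_rects:
    print('face at', (x, y), 'with width', w, 'and height', h)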

Iterate over all detected objects and draw them on the image in green using cv2.rectangle:

step_2_face_detect.py
    for x, y, w, h in rects:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
  • The second and third arguments are opposing corners of the rectangle.

  • The fourth argument is the color to use. (0, 255, 0) corresponds to green in OpenCV's BGR color space.

  • The last argument denotes the width of our line.

Finally, write the image with bounding boxes into a new file at outputs/children_detected.png:

step_2_face_detect.py
cv2.imwrite('outputs/children_detected.png', frame)

Your completed script should look like this:

step_2_face_detect.py
"""Tests face detection for a static image."""  

import cv2  


def main():  

    # initialize front face classifier  
    cascade = cv2.CascadeClassifier(  
        "assets/haarcascade_frontalface_default.xml")  

    frame = cv2.imread('assets/children.png')  

    # Convert to black-and-white  
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  
    blackwhite = cv2.equalizeHist(gray)  

    rects = cascade.detectMultiScale(
        blackwhite, scaleFactor=1.3, minNeighbors=4, minSize=(30, 30),
        flags=cv2.CASCADE_SCALE_IMAGE)

    for x, y, w, h in rects:  
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)  

    cv2.imwrite('outputs/children_detected.png', frame)  

if __name__ == '__main__':  
    main()

Save the file and exit your editor. Then run the script:

  • python step_2_face_detect.py

Open outputs/children_detected.png. You’ll see the following image that shows the faces outlined with boxes:

At this point, you have a working face detector. It accepts an image as input and draws bounding boxes around all faces in the image, outputting the annotated image. Now let’s apply this same detection to a live camera feed.

Step 3 — Linking the Camera Feed

The next objective is to link the computer’s camera to the face detector. Instead of detecting faces in a static image, you’ll detect all faces from your computer’s camera. You will collect camera input, detect and annotate all faces, and then display the annotated image back to the user. You’ll continue from the script in Step 2, so start by duplicating that script:

  • cp step_2_face_detect.py step_3_camera_face_detect.py

Then open the new script in your editor:

  • nano step_3_camera_face_detect.py

You will update the main function by using some elements from this test script from the official OpenCV documentation. Start by initializing a VideoCapture object that is set to capture live feed from your computer’s camera. Place this at the start of the main function, before the other code in the function:

step_3_camera_face_detect.py
def main():
    cap = cv2.VideoCapture(0)
    ...

Starting from the line defining frame, indent all of your existing code, placing all of the code in a while loop.

step_3_camera_face_detect.py
    while True:
        frame = cv2.imread('assets/children.png')
        ...
        for x, y, w, h in rects:  
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)  

        cv2.imwrite('outputs/children_detected.png', frame)

Replace the line defining frame at the start of the while loop. Instead of reading from an image on disk, you’re now reading from the camera:

while循环的开始处替换行定义frame 。 现在,您不再需要从磁盘上读取图像,而是从相机读取数据:

step_3_camera_face_detect.py
step_3_camera_face_detect.py
    while True:
        # frame = cv2.imread('assets/children.png') # DELETE ME
        # Capture frame-by-frame
        ret, frame = cap.read()

Replace the line cv2.imwrite(...) at the end of the while loop. Instead of writing an image to disk, you’ll display the annotated image back to the user’s screen:

while循环的末尾替换行cv2.imwrite(...) 。 无需将图像写入磁盘,而是将带注释的图像显示回用户屏幕:

step_3_camera_face_detect.py
step_3_camera_face_detect.py
        cv2.imwrite('outputs/children_detected.png', frame)  # DELETE ME
        # Display the resulting frame
        cv2.imshow('frame', frame)

Also, add some code to watch for keyboard input so you can stop the program. Check if the user hits the q character and, if so, quit the application. Right after cv2.imshow(...) add the following:

step_3_camera_face_detect.py
...
        cv2.imshow('frame', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
...

The line cv2.waitKey(1) halts the program for 1 millisecond so that the captured image can be displayed back to the user.

Finally, release the capture and close all windows. Place this outside of the while loop to end the main function.

step_3_camera_face_detect.py
...

    while True:
    ...


    cap.release()
    cv2.destroyAllWindows()

Your script should look like the following:

step_3_camera_face_detect.py
"""Test for face detection on video camera.

Move your face around and a green box will identify your face.
With the test frame in focus, hit `q` to exit.
Note that typing `q` into your terminal will do nothing.
"""

import cv2


def main():
    cap = cv2.VideoCapture(0)

    # initialize front face classifier
    cascade = cv2.CascadeClassifier(
        "assets/haarcascade_frontalface_default.xml")

    while True:
        # Capture frame-by-frame
        ret, frame = cap.read()

        # Convert to black-and-white
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        blackwhite = cv2.equalizeHist(gray)

        # Detect faces
        rects = cascade.detectMultiScale(
            blackwhite, scaleFactor=1.3, minNeighbors=4, minSize=(30, 30),
            flags=cv2.CASCADE_SCALE_IMAGE)

        # Add all bounding boxes to the image
        for x, y, w, h in rects:
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

        # Display the resulting frame
        cv2.imshow('frame', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()


if __name__ == '__main__':
    main()

Save the file and exit your editor.

Now run the test script.

  • python step_3_camera_face_detect.py

This activates your camera and opens a window displaying your camera’s feed. Your face will be boxed by a green square in real time:

Note: If you find that you have to hold very still for things to work, the lighting in the room may not be adequate. Try moving to a brightly lit room where you and your background have high contrast. Also, avoid bright lights near your head. For example, if you have your back to the sun, this process might not work very well.

Our next objective is to take the detected faces and superimpose dog masks on each one.

Step 4 — Building the Dog Filter

Before we build the filter itself, let’s explore how images are represented numerically. This will give you the background needed to modify images and ultimately apply a dog filter.

Let’s look at an example. We can construct a black-and-white image using numbers, where 0 corresponds to black and 1 corresponds to white.

Focus on the dividing line between 1s and 0s. What shape do you see?

0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0
0 0 0 1 1 1 0 0 0
0 0 1 1 1 1 1 0 0
0 0 0 1 1 1 0 0 0
0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0

The image is a diamond. If we save this matrix of values as an image, we get the following picture:
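
As a minimal sketch, this is how you could render that matrix as a grayscale image yourself with numpy and OpenCV (the output filename here is an arbitrary choice):

import cv2
import numpy as np

# The diamond matrix from above: 0 is black, 1 is white.
diamond = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
])

# Scale the 0-1 values up to 0-255 grayscale intensities and save to disk.
cv2.imwrite('outputs/diamond.png', (diamond * 255).astype(np.uint8))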

We can use any value between 0 and 1, such as 0.1, 0.26, or 0.74391. Numbers closer to 0 are darker and numbers closer to 1 are lighter. This allows us to represent white, black, and any shade of gray. This is great news for us because we can now construct any grayscale image using 0, 1, and any value in between. Consider the following, for example. Can you tell what it is? Again, each number corresponds to the color of a pixel.

1  1  1  1  1  1  1  1  1  1  1  1
1  1  1  1  0  0  0  0  1  1  1  1
1  1  0  0 .4 .4 .4 .4  0  0  1  1
1  0 .4 .4 .5 .4 .4 .4 .4 .4  0  1
1  0 .4 .5 .5 .5 .4 .4 .4 .4  0  1
0 .4 .4 .4 .5 .4 .4 .4 .4 .4 .4  0
0 .4 .4 .4 .4  0  0 .4 .4 .4 .4  0
0  0 .4 .4  0  1 .7  0 .4 .4  0  0
0  1  0  0  0 .7 .7  0  0  0  1  0
1  0  1  1  1  0  0 .7 .7 .4  0  1
1  0 .7  1  1  1 .7 .7 .7 .7  0  1
1  1  0  0 .7 .7 .7 .7  0  0  1  1
1  1  1  1  0  0  0  0  1  1  1  1
1  1  1  1  1  1  1  1  1  1  1  1

Re-rendered as an image, you can now tell that this is, in fact, a Poké Ball:

You’ve now seen how black-and-white and grayscale images are represented numerically. To introduce color, we need a way to encode more information. An image has its height and width expressed as h x w.

In the current grayscale representation, each pixel is one value between 0 and 1. We can equivalently say our image has dimensions h x w x 1. In other words, every (x, y) position in our image has just one value.

For a color representation, we represent the color of each pixel using three values between 0 and 1. One number corresponds to the “degree of red,” one to the “degree of green,” and the last to the “degree of blue.” We call this the RGB color space. This means that for every (x, y) position in our image, we have three values (r, g, b). As a result, our image is now h x w x 3:

Here, each number ranges from 0 to 255 instead of 0 to 1, but the idea is the same. Different combinations of numbers correspond to different colors, such as dark purple (102, 0, 204) or bright orange (255, 153, 51). The takeaways are as follows:

  1. Each image will be represented as a box of numbers that has three dimensions: height, width, and color channels. Manipulating this box of numbers directly is equivalent to manipulating the image.

  2. We can also flatten this box to become just a list of numbers. In this way, our image becomes a vector. Later on, we will refer to images as vectors.
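
As a minimal sketch of these two takeaways (any color image works; the path below just reuses an asset downloaded earlier in this tutorial):

import cv2

# OpenCV loads a color image as a height x width x 3 box of numbers (BGR channel order).
image = cv2.imread('assets/children.png')
print(image.shape)   # (h, w, 3)
print(image[0, 0])   # the three channel values of the top-left pixel, each 0-255

# Flatten the box into a single vector of h * w * 3 numbers.
vector = image.reshape(-1)
print(vector.shape)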

Now that you understand how images are represented numerically, you are well-equipped to begin applying dog masks to faces. To apply a dog mask, you will replace values in the child image with non-white dog mask pixels. To start, you will work with a single image. Download this crop of a face from the image you used in Step 2.

  • wget -O assets/child.png https://assets.digitalocean.com/articles/python3_dogfilter/alXjNK1.png

Additionally, download the following dog mask. The dog masks used in this tutorial are my own drawings, now released to the public domain under a CC0 License.

Download this with wget:

wget下载:

  • wget -O assets/dog.png https://assets.digitalocean.com/articles/python3_dogfilter/ED32BCs.png

Create a new file called step_4_dog_mask_simple.py which will hold the code for the script that applies the dog mask to faces:

  • nano step_4_dog_mask_simple.py

Add the following boilerplate for the Python script and import the OpenCV and numpy libraries:

step_4_dog_mask_simple.py
"""Test for adding dog mask"""

import cv2
import numpy as np


def main():
    pass

if __name__ == '__main__':
    main()

Replace pass in the main function with these two lines which load the original image and the dog mask into memory.

step_4_dog_mask_simple.py
...
def main():
    face = cv2.imread('assets/child.png')
    mask = cv2.imread('assets/dog.png')

Next, fit the dog mask to the child. The logic is more complicated than what we’ve done previously, so we will create a new function called apply_mask to modularize our code. Directly after the two lines that load the images, add this line which invokes the apply_mask function:

step_4_dog_mask_simple.py
...
    face_with_mask = apply_mask(face, mask)

Create a new function called apply_mask and place it above the main function:

step_4_dog_mask_simple.py
...
def apply_mask(face: np.array, mask: np.array) -> np.array:
    """Add the mask to the provided face, and return the face with mask."""
    pass

def main():
...

At this point, your file should look like this:

step_4_dog_mask_simple.py
"""Test for adding dog mask"""

import cv2
import numpy as np


def apply_mask(face: np.array, mask: np.array) -> np.array:
    """Add the mask to the provided face, and return the face with mask."""
    pass


def main():
    face = cv2.imread('assets/child.png')
    mask = cv2.imread('assets/dog.png')
    face_with_mask = apply_mask(face, mask)

if __name__ == '__main__':
    main()

Let’s build out the apply_mask function. Our goal is to apply the mask to the child’s face. However, we need to maintain the aspect ratio for our dog mask. To do so, we need to explicitly compute our dog mask’s final dimensions. Inside the apply_mask function, replace pass with these two lines which extract the height and width of both images:

step_4_dog_mask_simple.py
...
    mask_h, mask_w, _ = mask.shape
    face_h, face_w, _ = face.shape

Next, determine which dimension needs to be “shrunk more.” To be precise, we need the tighter of the two constraints. Add this line to the apply_mask function:

step_4_dog_mask_simple.py
...

    # Resize the mask to fit on face
    factor = min(face_h / mask_h, face_w / mask_w)

Then compute the new shape by adding this code to the function:

step_4_dog_mask_simple.py
...
    new_mask_w = int(factor * mask_w)
    new_mask_h = int(factor * mask_h)
    new_mask_shape = (new_mask_w, new_mask_h)

Here we cast the numbers to integers, as the resize function needs integral dimensions.

Now add this code to resize the dog mask to the new shape:

step_4_dog_mask_simple.py
...

    # Add mask to face - ensure mask is centered
    resized_mask = cv2.resize(mask, new_mask_shape)

Finally, write the image to disk so you can double-check that your resized dog mask is correct after you run the script:

step_4_dog_mask_simple.py
cv2.imwrite('outputs/resized_dog.png', resized_mask)

The completed script should look like this:

step_4_dog_mask_simple.py
"""Test for adding dog mask"""
import cv2
import numpy as np

def apply_mask(face: np.array, mask: np.array) -> np.array:
    """Add the mask to the provided face, and return the face with mask."""
    mask_h, mask_w, _ = mask.shape
    face_h, face_w, _ = face.shape

    # Resize the mask to fit on face
    factor = min(face_h / mask_h, face_w / mask_w)
    new_mask_w = int(factor * mask_w)
    new_mask_h = int(factor * mask_h)
    new_mask_shape = (new_mask_w, new_mask_h)

    # Add mask to face - ensure mask is centered
    resized_mask = cv2.resize(mask, new_mask_shape)
    cv2.imwrite('outputs/resized_dog.png', resized_mask)


def main():
    face = cv2.imread('assets/child.png')
    mask = cv2.imread('assets/dog.png')
    face_with_mask = apply_mask(face, mask)

if __name__ == '__main__':
    main()

Save the file and exit your editor. Run the new script:

  • python step_4_dog_mask_simple.py

Open the image at outputs/resized_dog.png to double-check the mask was resized correctly. It will match the dog mask shown earlier in this section.

outputs/resized_dog.png打开图像,以outputs/resized_dog.png检查遮罩是否已正确调整大小。 它将与本节前面显示的狗口罩匹配。

Now add the dog mask to the child. Open the step_4_dog_mask_simple.py file again and return to the apply_mask function:

  • nano step_4_dog_mask_simple.py

First, remove the line of code that writes the resized mask from the apply_mask function since you no longer need it:

cv2.imwrite('outputs/resized_dog.png', resized_mask)  # delete this line
    ...

In its place, apply your knowledge of image representation from the start of this section to modify the image. Start by making a copy of the child image. Add this line to the apply_mask function:

step_4_dog_mask_simple.py
...
    face_with_mask = face.copy()

Next, find all positions where the dog mask is not white or near white. To do this, check if the pixel value is less than 250 across all color channels, as we’d expect a near-white pixel to be near [255, 255, 255]. Add this code:

step_4_dog_mask_simple.py
...
    non_white_pixels = (resized_mask < 250).all(axis=2)

At this point, the dog image is, at most, as large as the child image. We want to center the dog image on the face, so compute the offset needed to center the dog image by adding this code to apply_mask:

step_4_dog_mask_simple.py
...
    off_h = int((face_h - new_mask_h) / 2)  
    off_w = int((face_w - new_mask_w) / 2)

Copy all non-white pixels from the dog image into the child image. Since the child image may be larger than the dog image, we need to take a subset of the child image:

step_4_dog_mask_simple.py
    face_with_mask[off_h: off_h+new_mask_h, off_w: off_w+new_mask_w][non_white_pixels] = \
            resized_mask[non_white_pixels]

Then return the result:

step_4_dog_mask_simple.py
return face_with_mask

In the main function, add this code to write the result of the apply_mask function to an output image so you can manually double-check the result:

main函数中,添加以下代码以将apply_mask函数的结果写入输出图像,以便您可以手动仔细检查结果:

step_4_dog_mask_simple.py
step_4_dog_mask_simple.py
...
    face_with_mask = apply_mask(face, mask)
    cv2.imwrite('outputs/child_with_dog_mask.png', face_with_mask)

Your completed script will look like the following:

step_4_dog_mask_simple.py
"""Test for adding dog mask"""

import cv2
import numpy as np


def apply_mask(face: np.array, mask: np.array) -> np.array:
    """Add the mask to the provided face, and return the face with mask."""
    mask_h, mask_w, _ = mask.shape
    face_h, face_w, _ = face.shape

    # Resize the mask to fit on face
    factor = min(face_h / mask_h, face_w / mask_w)
    new_mask_w = int(factor * mask_w)
    new_mask_h = int(factor * mask_h)
    new_mask_shape = (new_mask_w, new_mask_h)
    resized_mask = cv2.resize(mask, new_mask_shape)

    # Add mask to face - ensure mask is centered
    face_with_mask = face.copy()
    non_white_pixels = (resized_mask < 250).all(axis=2)
    off_h = int((face_h - new_mask_h) / 2)  
    off_w = int((face_w - new_mask_w) / 2)
    face_with_mask[off_h: off_h+new_mask_h, off_w: off_w+new_mask_w][non_white_pixels] = \
         resized_mask[non_white_pixels]

    return face_with_mask

def main():
    face = cv2.imread('assets/child.png')
    mask = cv2.imread('assets/dog.png')
    face_with_mask = apply_mask(face, mask)
    cv2.imwrite('outputs/child_with_dog_mask.png', face_with_mask)

if __name__ == '__main__':
    main()

Save the script and run it:

  • python step_4_dog_mask_simple.py

You’ll have the following picture of a child with a dog mask in outputs/child_with_dog_mask.png:

You now have a utility that applies dog masks to faces. Now let’s use what you’ve built to add the dog mask in real time.

We’ll pick up from where we left off in Step 3. Copy step_3_camera_face_detect.py to step_4_dog_mask.py.

  • cp step_3_camera_face_detect.py step_4_dog_mask.py

Open your new script.

  • nano step_4_dog_mask.py

First, import the NumPy library at the top of the script:

step_4_dog_mask.py
import numpy as np
...

Then add the apply_mask function from your previous work into this new file above the main function:

step_4_dog_mask.py
def apply_mask(face: np.array, mask: np.array) -> np.array:
    """Add the mask to the provided face, and return the face with mask."""
    mask_h, mask_w, _ = mask.shape
    face_h, face_w, _ = face.shape

    # Resize the mask to fit on face
    factor = min(face_h / mask_h, face_w / mask_w)
    new_mask_w = int(factor * mask_w)
    new_mask_h = int(factor * mask_h)
    new_mask_shape = (new_mask_w, new_mask_h)
    resized_mask = cv2.resize(mask, new_mask_shape)

    # Add mask to face - ensure mask is centered
    face_with_mask = face.copy()
    non_white_pixels = (resized_mask < 250).all(axis=2)
    off_h = int((face_h - new_mask_h) / 2)  
    off_w = int((face_w - new_mask_w) / 2)
    face_with_mask[off_h: off_h+new_mask_h, off_w: off_w+new_mask_w][non_white_pixels] = \
         resized_mask[non_white_pixels]

    return face_with_mask
...

Second, locate this line in the main function:

step_4_dog_mask.py
cap = cv2.VideoCapture(0)

Add this code after that line to load the dog mask:

step_4_dog_mask.py
    cap = cv2.VideoCapture(0)

    # load mask
    mask = cv2.imread('assets/dog.png')
    ...

Next, in the while loop, locate this line:

step_4_dog_mask.py
ret, frame = cap.read()

Add this line after it to extract the image’s height and width:

step_4_dog_mask.py
        ret, frame = cap.read()
        frame_h, frame_w, _ = frame.shape
        ...

Next, delete the line in main that draws bounding boxes. You’ll find this line in the for loop that iterates over detected faces:

step_4_dog_mask.py
        for x, y, w, h in rects:
        ...
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2) # DELETE ME
        ...

In its place, add this code which crops the frame. For aesthetic purposes, we crop an area slightly larger than the face.

step_4_dog_mask.py
        for x, y, w, h in rects:
            # crop a frame slightly larger than the face
            y0, y1 = int(y - 0.25*h), int(y + 0.75*h)
            x0, x1 = x, x + w

Introduce a check in case the detected face is too close to the edge.

step_4_dog_mask.py
            # give up if the cropped frame would be out-of-bounds
            if x0 < 0 or y0 < 0 or x1 > frame_w or y1 > frame_h:
                continue

Finally, insert the face with a mask into the image.

step_4_dog_mask.py
            # apply mask
            frame[y0: y1, x0: x1] = apply_mask(frame[y0: y1, x0: x1], mask)

Verify that your script looks like this:

step_4_dog_mask.py
"""Real-time dog filter

Move your face around and a dog filter will be applied to your face if it is not out-of-bounds. With the test frame in focus, hit `q` to exit. Note that typing `q` into your terminal will do nothing.
"""

import numpy as np
import cv2


def apply_mask(face: np.array, mask: np.array) -> np.array:
    """Add the mask to the provided face, and return the face with mask."""
    mask_h, mask_w, _ = mask.shape
    face_h, face_w, _ = face.shape

    # Resize the mask to fit on face
    factor = min(face_h / mask_h, face_w / mask_w)
    new_mask_w = int(factor * mask_w)
    new_mask_h = int(factor * mask_h)
    new_mask_shape = (new_mask_w, new_mask_h)
    resized_mask = cv2.resize(mask, new_mask_shape)

    # Add mask to face - ensure mask is centered
    face_with_mask = face.copy()
    non_white_pixels = (resized_mask < 250).all(axis=2)
    off_h = int((face_h - new_mask_h) / 2)
    off_w = int((face_w - new_mask_w) / 2)
    face_with_mask[off_h: off_h+new_mask_h, off_w: off_w+new_mask_w][non_white_pixels] = \
         resized_mask[non_white_pixels]

    return face_with_mask

def main():
    cap = cv2.VideoCapture(0)

    # load mask
    mask = cv2.imread('assets/dog.png')

    # initialize front face classifier
    cascade = cv2.CascadeClassifier("assets/haarcascade_frontalface_default.xml")

    while(True):
        # Capture frame-by-frame
        ret, frame = cap.read()
        frame_h, frame_w, _ = frame.shape

        # Convert to black-and-white
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        blackwhite = cv2.equalizeHist(gray)

        # Detect faces
        rects = cascade.detectMultiScale(
            blackwhite, scaleFactor=1.3, minNeighbors=4, minSize=(30, 30),
            flags=cv2.CASCADE_SCALE_IMAGE)

        # Add mask to faces
        for x, y, w, h in rects:
            # crop a frame slightly larger than the face
            y0, y1 = int(y - 0.25*h), int(y + 0.75*h)
            x0, x1 = x, x + w

            # give up if the cropped frame would be out-of-bounds
            if x0 < 0 or y0 < 0 or x1 > frame_w or y1 > frame_h:
                continue

            # apply mask
            frame[y0: y1, x0: x1] = apply_mask(frame[y0: y1, x0: x1], mask)

        # Display the resulting frame
        cv2.imshow('frame', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    # When everything done, release the capture
    cap.release()
    cv2.destroyAllWindows()


if __name__ == '__main__':
    main()

Save the file and exit your editor. Then run the script.

  • python step_4_dog_mask.py

You now have a real-time dog filter running. The script will also work with multiple faces in the picture, so you can get your friends together for some automatic dog-ification.

This concludes our first primary objective in this tutorial, which is to create a Snapchat-esque dog filter. Now let’s use facial expression to determine the dog mask applied to a face.

Step 5 — Build a Basic Face Emotion Classifier using Least Squares

In this section you’ll create an emotion classifier to apply different masks based on displayed emotions. If you smile, the filter will apply a corgi mask. If you frown, it will apply a pug mask. Along the way, you’ll explore the least-squares framework, which is fundamental to understanding and discussing machine learning concepts.

To understand how to process our data and produce predictions, we’ll first briefly explore machine learning models.

We need to ask two questions for each model that we consider. For now, these two questions will be sufficient to differentiate between models:

  1. Input: What information is the model given?

  2. Output: What is the model trying to predict?

At a high-level, the goal is to develop a model for emotion classification. The model is:

  1. Input: given images of faces.

  2. Output: predicts the corresponding emotion.
model: face -> emotion

The approach we’ll use is least squares; we take a set of points, and we find a line of best fit. The line of best fit, shown in the following image, is our model.

Consider the input and output for our line:

  1. Input: given x coordinates.

  2. Output: predicts the corresponding y coordinate.
least squares line: x -> y
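
To make the line-of-best-fit idea concrete, here is a minimal sketch of the closed-form least-squares fit on a handful of made-up (x, y) points (the data values are invented purely for illustration):

import numpy as np

# Made-up points that roughly follow y = 2x + 1 (illustrative only).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix with a column of ones so the fitted line can have an intercept.
A = np.stack([x, np.ones_like(x)], axis=1)

# Closed-form least-squares solution: w = (A^T A)^{-1} A^T y
w = np.linalg.inv(A.T.dot(A)).dot(A.T.dot(y))
print(w)  # approximately [slope, intercept]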

Our input x must represent faces and our output y must represent emotion, in order for us to use least squares for emotion classification:

  • x -> face: Instead of using one number for x, we will use a vector of values for x. Thus, x can represent images of faces. The article Ordinary Least Squares explains why you can use a vector of values for x.

  • y -> emotion: Each emotion will correspond to a number. For example, “angry” is 0, “sad” is 1, and “happy” is 2. In this way, y can represent emotions. However, our line is not constrained to output the y values 0, 1, and 2. It has an infinite number of possible y values–it could be 1.2, 3.5, or 10003.42. How do we translate those y values to integers corresponding to classes? See the article One-Hot Encoding for more detail and explanation.

Armed with this background knowledge, you will build a simple least-squares classifier using vectorized images and one-hot encoded labels. You’ll accomplish this in three steps:

  1. Preprocess the data: As explained at the start of this section, our samples are vectors where each vector encodes an image of a face. Our labels are integers corresponding to an emotion, and we’ll apply one-hot encoding to these labels.

  2. Specify and train the model: Use the closed-form least squares solution, w^* (written out just after this list).

  3. Run a prediction using the model: Take the argmax of Xw^* to obtain predicted emotions.
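
For reference, the closed-form ordinary least-squares solution referred to in step 2, which the training code below evaluates with np.linalg.inv, is:

w^* = (A^TA)^{-1}A^Ty

Here A holds one sample per row and y holds the one-hot labels.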

Let’s get started.

First, set up a directory to contain the data:

  • mkdir data

Then download the data, curated by Pierre-Luc Carrier and Aaron Courville, from a 2013 Face Emotion Classification competition on Kaggle.

  • wget -O data/fer2013.tar https://bitbucket.org/alvinwan/adversarial-examples-in-computer-vision-building-then-fooling/raw/babfe4651f89a398c4b3fdbdd6d7a697c5104cff/fer2013.tar

Navigate to the data directory and unpack the data.

  • cd data

  • tar -xzf fer2013.tar

Now we’ll create a script to run the least-squares model. Navigate to the root of your project:

  • cd ~/DogFilter

Create a new file for the script:

  • nano step_5_ls_simple.py

Add Python boilerplate and import the packages you will need:

step_5_ls_simple.py
"""Train emotion classifier using least squares."""

import numpy as np

def main():
    pass

if __name__ == '__main__':
    main()

Next, load the data into memory. Replace pass in your main function with the following code:

step_5_ls_simple.py
    # load data
    with np.load('data/fer2013_train.npz') as data:
        X_train, Y_train = data['X'], data['Y']

    with np.load('data/fer2013_test.npz') as data:
        X_test, Y_test = data['X'], data['Y']

Now one-hot encode the labels. To do this, construct the identity matrix with numpy and then index into this matrix using our list of labels:

step_5_ls_simple.py
    # one-hot labels
    I = np.eye(6)
    Y_oh_train, Y_oh_test = I[Y_train], I[Y_test]

Here, we use the fact that the i-th row in the identity matrix is all zero, except for the i-th entry. Thus, the i-th row is the one-hot encoding for the label of class i. Additionally, we use numpy’s advanced indexing, where [a, b, c, d][[1, 3]] = [b, d].
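
As a minimal sketch of that indexing trick (the label values here are invented), selecting rows of the identity matrix with a list of integer labels returns one one-hot row per label:

import numpy as np

I = np.eye(6)                 # six classes, as in the code above
labels = np.array([0, 2, 2])  # invented example labels
print(I[labels])
# [[1. 0. 0. 0. 0. 0.]
#  [0. 0. 1. 0. 0. 0.]
#  [0. 0. 1. 0. 0. 0.]]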

Computing (X^TX)^{-1} would take too long on commodity hardware, as X^TX is a 2304x2304 matrix with over four million values, so we’ll reduce this time by selecting only the first 100 features. Add this code:

step_5_ls_simple.py
...
    # select first 100 dimensions
    A_train, A_test = X_train[:, :100], X_test[:, :100]

Next, add this code to evaluate the closed-form least-squares solution:

step_5_ls_simple.py
...
    # train model
    w = np.linalg.inv(A_train.T.dot(A_train)).dot(A_train.T.dot(Y_oh_train))

Then define an evaluation function for training and validation sets. Place this before your main function:

step_5_ls_simple.py
def evaluate(A, Y, w):
    Yhat = np.argmax(A.dot(w), axis=1)
    return np.sum(Yhat == Y) / Y.shape[0]

To estimate labels, we take the inner product with each sample and get the indices of the maximum values using np.argmax. Then we compute the average number of correct classifications. This final number is your accuracy.

Finally, add this code to the end of the main function to compute the training and validation accuracy using the evaluate function you just wrote:

step_5_ls_simple.py
    # evaluate model
    ols_train_accuracy = evaluate(A_train, Y_train, w)
    print('(ols) Train Accuracy:', ols_train_accuracy)
    ols_test_accuracy = evaluate(A_test, Y_test, w)
    print('(ols) Test Accuracy:', ols_test_accuracy)

Double-check that your script matches the following:

step_5_ls_simple.py
"""Train emotion classifier using least squares."""

import numpy as np


def evaluate(A, Y, w):
    Yhat = np.argmax(A.dot(w), axis=1)
    return np.sum(Yhat == Y) / Y.shape[0]

def main():

    # load data
    with np.load('data/fer2013_train.npz') as data:
        X_train, Y_train = data['X'], data['Y']

    with np.load('data/fer2013_test.npz') as data:
        X_test, Y_test = data['X'], data['Y']

    # one-hot labels
    I = np.eye(6)
    Y_oh_train, Y_oh_test = I[Y_train], I[Y_test]

    # select first 100 dimensions
    A_train, A_test = X_train[:, :100], X_test[:, :100]

    # train model
    w = np.linalg.inv(A_train.T.dot(A_train)).dot(A_train.T.dot(Y_oh_train))

    # evaluate model
    ols_train_accuracy = evaluate(A_train, Y_train, w)
    print('(ols) Train Accuracy:', ols_train_accuracy)
    ols_test_accuracy = evaluate(A_test, Y_test, w)
    print('(ols) Test Accuracy:', ols_test_accuracy)


if __name__ == '__main__':
    main()

Save your file, exit your editor, and run the Python script.

  • python step_5_ls_simple.py

You’ll see the following output:

Output
(ols) Train Accuracy: 0.4748918316507146
(ols) Test Accuracy: 0.45280545359202934

Our model gives 47.5% train accuracy. We repeat this on the validation set to obtain 45.3% accuracy. For a three-way classification problem, 45.3% is reasonably above guessing, which is 33%. This is our starting classifier for emotion detection, and in the next step, you’ll build off of this least-squares model to improve accuracy. The higher the accuracy, the more reliably your emotion-based dog filter can find the appropriate dog mask for each detected emotion.

Step 6 — Improving Accuracy by Featurizing the Inputs

We can use a more expressive model to boost accuracy. To accomplish this, we featurize our inputs.

The original image tells us that position (0, 0) is red, (1, 0) is brown, and so on. A featurized image may tell us that there is a dog to the top-left of the image, a person in the middle, etc. Featurization is powerful, but its precise definition is beyond the scope of this tutorial.

We’ll use an approximation for the radial basis function (RBF) kernel, using a random Gaussian matrix. We won’t go into detail in this tutorial. Instead, we’ll treat this as a black box that computes higher-order features for us.

We’ll continue where we left off in the previous step. Copy the previous script so you have a good starting point:

  • cp step_5_ls_simple.py step_6_ls_simple.py

Open the new file in your editor:

  • nano step_6_ls_simple.py

We’ll start by creating the featurizing random matrix. Again, we’ll use only 100 features in our new feature space.

Locate the following line, defining A_train and A_test:

step_6_ls_simple.py
    # select first 100 dimensions
    A_train, A_test = X_train[:, :100], X_test[:, :100]

Directly above this definition for A_train and A_test, add a random feature matrix:

step_6_ls_simple.py
    d = 100
    W = np.random.normal(size=(X_train.shape[1], d))
    # select first 100 dimensions
    A_train, A_test = X_train[:, :100], X_test[:, :100]
    ...

Then replace the definitions for A_train and A_test. We redefine our matrices, called design matrices, using this random featurization.

step_6_ls_simple.py
A_train, A_test = X_train.dot(W), X_test.dot(W)

Save your file and run the script.

  • python step_6_ls_simple.py

You’ll see the following output:

Output
(ols) Train Accuracy: 0.584174642717
(ols) Test Accuracy: 0.584425799685

This featurization now offers 58.4% train accuracy and 58.4% validation accuracy, a 13.1% improvement in validation results. We trimmed the X matrix to be 100 x 100, but the choice of 100 was arbitrary. We could also trim the X matrix to be 1000 x 1000 or 50 x 50. Say the dimension of x is d x d. We can test more values of d by re-trimming X to be d x d and recomputing a new model.
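
One way to run that sweep with the random featurization from this step is sketched below (the particular d values are arbitrary; X_train, X_test, Y_oh_train, Y_train, Y_test, and evaluate are the names already defined in the script):

for d in [50, 100, 500, 1000]:
    W = np.random.normal(size=(X_train.shape[1], d))
    A_train, A_test = X_train.dot(W), X_test.dot(W)
    w = np.linalg.inv(A_train.T.dot(A_train)).dot(A_train.T.dot(Y_oh_train))
    print(d, evaluate(A_train, Y_train, w), evaluate(A_test, Y_test, w))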

Trying more values of d, we find an additional 4.3% improvement in test accuracy to 61.7%. In the following figure, we consider the performance of our new classifier as we vary d. Intuitively, as d increases, the accuracy should also increase, as we use more and more of our original data. Rather than paint a rosy picture, however, the graph exhibits a negative trend:

As we keep more of our data, the gap between the training and validation accuracies increases as well. This is clear evidence of overfitting, where our model is learning representations that are no longer generalizable to all data. To combat overfitting, we’ll regularize our model by penalizing complex models.

We amend our ordinary least-squares objective function with a regularization term, giving us a new objective. Our new objective function is called ridge regression and it looks like this:

min_w |Aw - y|^2 + lambda |w|^2

In this equation, lambda is a tunable hyperparameter. Plug lambda = 0 into the equation and ridge regression becomes least-squares. Plug lambda = infinity into the equation, and you’ll find the best w must now be zero, as any non-zero w incurs infinite loss. As it turns out, this objective yields a closed-form solution as well:

w^* = (A^TA + lambda I)^{-1}A^Ty

Still using the featurized samples, retrain and reevaluate the model once more.

Open step_6_ls_simple.py again in your editor:

  • nano step_6_ls_simple.py

This time, increase the dimensionality of the new feature space to d=1000. Change the value of d from 100 to 1000 as shown in the following code block:

step_6_ls_simple.py
...
    d = 1000
    W = np.random.normal(size=(X_train.shape[1], d))
...

Then apply ridge regression using a regularization of lambda = 10^{10}. Replace the line defining w with the following two lines:

step_6_ls_simple.py
...
    # train model
    I = np.eye(A_train.shape[1])
    w = np.linalg.inv(A_train.T.dot(A_train) + 1e10 * I).dot(A_train.T.dot(Y_oh_train))

Then locate this block:

step_6_ls_simple.py
...
  ols_train_accuracy = evaluate(A_train, Y_train, w)
  print('(ols) Train Accuracy:', ols_train_accuracy)
  ols_test_accuracy = evaluate(A_test, Y_test, w)
  print('(ols) Test Accuracy:', ols_test_accuracy)

Replace it with the following:

step_6_ls_simple.py
...

  print('(ridge) Train Accuracy:', evaluate(A_train, Y_train, w))
  print('(ridge) Test Accuracy:', evaluate(A_test, Y_test, w))

The completed script should look like this:

step_6_ls_simple.py
"""Train emotion classifier using least squares."""

import numpy as np

def evaluate(A, Y, w):
    Yhat = np.argmax(A.dot(w), axis=1)
    return np.sum(Yhat == Y) / Y.shape[0]

def main():
    # load data
    with np.load('data/fer2013_train.npz') as data:
        X_train, Y_train = data['X'], data['Y']

    with np.load('data/fer2013_test.npz') as data:
        X_test, Y_test = data['X'], data['Y']

    # one-hot labels
    I = np.eye(6)
    Y_oh_train, Y_oh_test = I[Y_train], I[Y_test]
    d = 1000
    W = np.random.normal(size=(X_train.shape[1], d))
    # select first 100 dimensions
    A_train, A_test = X_train.dot(W), X_test.dot(W)

    # train model
    I = np.eye(A_train.shape[1])
    w = np.linalg.inv(A_train.T.dot(A_train) + 1e10 * I).dot(A_train.T.dot(Y_oh_train))

    # evaluate model
    print('(ridge) Train Accuracy:', evaluate(A_train, Y_train, w))
    print('(ridge) Test Accuracy:', evaluate(A_test, Y_test, w))

if __name__ == '__main__':
    main()

Save the file, exit your editor, and run the script:

  • python step_6_ls_simple.py

You’ll see the following output:

Output
(ridge) Train Accuracy: 0.651173462698
(ridge) Test Accuracy: 0.622181436812

There’s an additional improvement of 0.4% in validation accuracy to 62.2%, as train accuracy drops to 65.1%. Once again reevaluating across a number of different d, we see a smaller gap between training and validation accuracies for ridge regression. In other words, ridge regression was subject to less overfitting.
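
If you would like to reproduce that comparison yourself, the following rough sketch (not part of the tutorial’s scripts; the helper names are illustrative) sweeps over a few values of d and prints the train/validation gap for plain least squares versus ridge regression. It assumes the X_train, Y_train, Y_oh_train, X_test, and Y_test arrays are already loaded, as in step_6_ls_simple.py:

def fit_ols(A, Y_oh):
    """Ordinary least-squares weights via the pseudoinverse."""
    return np.linalg.pinv(A).dot(Y_oh)

def fit_ridge(A, Y_oh, lam=1e10):
    """Closed-form ridge regression weights."""
    return np.linalg.inv(A.T.dot(A) + lam * np.eye(A.shape[1])).dot(A.T.dot(Y_oh))

def accuracy(A, Y, w):
    return np.mean(np.argmax(A.dot(w), axis=1) == Y)

for d in (100, 300, 1000):
    W = np.random.normal(size=(X_train.shape[1], d))   # random featurization, as in the script
    A_train, A_test = X_train.dot(W), X_test.dot(W)
    for name, fit in (('ols', fit_ols), ('ridge', fit_ridge)):
        w = fit(A_train, Y_oh_train)
        gap = accuracy(A_train, Y_train, w) - accuracy(A_test, Y_test, w)
        print('d=%d %s train-val gap: %.3f' % (d, name, gap))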

With these extra enhancements, the least-squares baseline performs reasonably well. Training and inference together take no more than 20 seconds, even for the best results. In the next section, you’ll explore even more complex models.

Step 7 — Building the Face-Emotion Classifier Using a Convolutional Neural Network in PyTorch

In this section, you’ll build a second emotion classifier using neural networks instead of least squares. Again, our goal is to produce a model that accepts faces as input and outputs an emotion. Eventually, this classifier will then determine which dog mask to apply.

For a brief neural network visualization and introduction, see the article Understanding Neural Networks. Here, we will use a deep-learning library called PyTorch. There are a number of deep-learning libraries in widespread use, and each has various pros and cons. PyTorch is a particularly good place to start. To implement this neural network classifier, we again take three steps, as we did with the least-squares classifier:

  1. Preprocess the data: Apply one-hot encoding and then apply PyTorch abstractions.
  2. Specify and train the model: Set up a neural network using PyTorch layers. Define optimization hyperparameters and run stochastic gradient descent.
  3. Run a prediction using the model: Evaluate the neural network.

Create a new file, named step_7_fer_simple.py:

  • nano step_7_fer_simple.py

Import the necessary utilities and create a Python class that will hold your data. For data processing here, you will create the train and test datasets. To do these, implement PyTorch’s Dataset interface, which lets you load and use PyTorch’s built-in data pipeline for the face-emotion recognition dataset:

step_7_fer_simple.py
from torch.utils.data import Dataset
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
import torch
import cv2
import argparse


class Fer2013Dataset(Dataset):
    """Face Emotion Recognition dataset.

    Utility for loading FER into PyTorch. Dataset curated by Pierre-Luc Carrier
    and Aaron Courville in 2013.

    Each sample is 1 x 1 x 48 x 48, and each label is a scalar.
    """
    pass

Delete the pass placeholder in the Fer2013Dataset class. In its place, add a function that will initialize our data holder:

step_7_fer_simple.py
def __init__(self, path: str):
        """
        Args:
            path: Path to `.np` file containing sample nxd and label nx1
        """
        with np.load(path) as data:
            self._samples = data['X']
            self._labels = data['Y']
        self._samples = self._samples.reshape((-1, 1, 48, 48))

        self.X = Variable(torch.from_numpy(self._samples)).float()
        self.Y = Variable(torch.from_numpy(self._labels)).float()
...

This function starts by loading the samples and labels. Then it wraps the data in PyTorch data structures.

Directly after the __init__ function, add a __len__ function, as this is needed to implement the Dataset interface PyTorch expects:

step_7_fer_simple.py
...
    def __len__(self):
        return len(self._labels)

Finally, add a __getitem__ method, which returns a dictionary containing the sample and the label:

step_7_fer_simple.py
def __getitem__(self, idx):
        return {'image': self._samples[idx], 'label': self._labels[idx]}

Double-check that your file looks like the following:

step_7_fer_simple.py
from torch.utils.data import Dataset
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
import torch
import cv2
import argparse


class Fer2013Dataset(Dataset):
    """Face Emotion Recognition dataset.
    Utility for loading FER into PyTorch. Dataset curated by Pierre-Luc Carrier
    and Aaron Courville in 2013.
    Each sample is 1 x 1 x 48 x 48, and each label is a scalar.
    """

    def __init__(self, path: str):
        """
        Args:
            path: Path to `.np` file containing sample nxd and label nx1
        """
        with np.load(path) as data:
            self._samples = data['X']
            self._labels = data['Y']
        self._samples = self._samples.reshape((-1, 1, 48, 48))

        self.X = Variable(torch.from_numpy(self._samples)).float()
        self.Y = Variable(torch.from_numpy(self._labels)).float()

    def __len__(self):
        return len(self._labels)

    def __getitem__(self, idx):
        return {'image': self._samples[idx], 'label': self._labels[idx]}

Next, load the Fer2013Dataset dataset. Add the following code to the end of your file after the Fer2013Dataset class:

step_7_fer_simple.py
trainset = Fer2013Dataset('data/fer2013_train.npz')
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True)

testset = Fer2013Dataset('data/fer2013_test.npz')
testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False)

This code initializes the dataset using the Fer2013Dataset class you created. Then for the train and validation sets, it wraps the dataset in a DataLoader. This translates the dataset into an iterable to use later.

As a sanity check, verify that the dataset utilities are functioning. Create a sample dataset loader using DataLoader and print the first element of that loader. Add the following to the end of your file:

step_7_fer_simple.py
if __name__ == '__main__':
    loader = torch.utils.data.DataLoader(trainset, batch_size=2, shuffle=False)
    print(next(iter(loader)))

Verify that your completed script looks like this:

step_7_fer_simple.py
from torch.utils.data import Dataset
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
import torch
import cv2
import argparse


class Fer2013Dataset(Dataset):
    """Face Emotion Recognition dataset.
    Utility for loading FER into PyTorch. Dataset curated by Pierre-Luc Carrier
    and Aaron Courville in 2013.
    Each sample is 1 x 1 x 48 x 48, and each label is a scalar.
    """

    def __init__(self, path: str):
        """
        Args:
            path: Path to `.np` file containing sample nxd and label nx1
        """
        with np.load(path) as data:
            self._samples = data['X']
            self._labels = data['Y']
        self._samples = self._samples.reshape((-1, 1, 48, 48))

        self.X = Variable(torch.from_numpy(self._samples)).float()
        self.Y = Variable(torch.from_numpy(self._labels)).float()

    def __len__(self):
        return len(self._labels)

    def __getitem__(self, idx):
        return {'image': self._samples[idx], 'label': self._labels[idx]}

trainset = Fer2013Dataset('data/fer2013_train.npz')
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True)

testset = Fer2013Dataset('data/fer2013_test.npz')
testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False)

if __name__ == '__main__':
    loader = torch.utils.data.DataLoader(trainset, batch_size=2, shuffle=False)
    print(next(iter(loader)))

Exit your editor and run the script.

  • python step_7_fer_simple.py

This outputs the following pair of tensors. Our data pipeline outputs two samples and two labels. This indicates that our data pipeline is up and ready to go:

Output
{'image': 
(0 ,0 ,.,.) = 
   24   32   36  ...  173  172  173
   25   34   29  ...  173  172  173
   26   29   25  ...  172  172  174
      ...         ⋱        ...
  159  185  157  ...  157  156  153
  136  157  187  ...  152  152  150
  145  130  161  ...  142  143  142
     ⋮ 
(1 ,0 ,.,.) = 
   20   17   19  ...  187  176  162
   22   17   17  ...  195  180  171
   17   17   18  ...  203  193  175
      ...         ⋱        ...
    1    1    1  ...  106  115  119
    2    2    1  ...  103  111  119
    2    2    2  ...   99  107  118
[torch.LongTensor of size 2x1x48x48]
, 'label': 
 1
 1
[torch.LongTensor of size 2]
}

Now that you’ve verified that the data pipeline works, return to step_7_fer_simple.py to add the neural network and optimizer. Open step_7_fer_simple.py.

  • nano step_7_fer_simple.py

First, delete the last three lines you added in the previous iteration:

step_7_fer_simple.py
# Delete all three lines
if __name__ == '__main__':
    loader = torch.utils.data.DataLoader(trainset, batch_size=2, shuffle=False)
    print(next(iter(loader)))

In their place, define a PyTorch neural network that includes three convolutional layers, followed by three fully connected layers. Add this to the end of your existing script:

step_7_fer_simple.py
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 6, 3)
        self.conv3 = nn.Conv2d(6, 16, 3)
        self.fc1 = nn.Linear(16 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 48)
        self.fc3 = nn.Linear(48, 3)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = x.view(-1, 16 * 4 * 4)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
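
To see where the 16 * 4 * 4 in fc1 comes from, trace the spatial size through the layers: a 48x48 input becomes 44x44 after the 5x5 convolution and 22x22 after pooling, then 20x20 and 10x10 after the second convolution and pooling, and finally 8x8 and 4x4 after the third, with 16 channels, for 16 * 4 * 4 = 256 values per image. If you want to confirm this yourself, here is a quick optional check (not part of the tutorial’s scripts):

net = Net()
x = Variable(torch.zeros(1, 1, 48, 48))    # dummy batch: one 48x48 grayscale image
x = net.pool(F.relu(net.conv1(x)))         # -> 1 x 6 x 22 x 22
x = net.pool(F.relu(net.conv2(x)))         # -> 1 x 6 x 10 x 10
x = net.pool(F.relu(net.conv3(x)))         # -> 1 x 16 x 4 x 4
print(x.size())                            # confirms the 16 * 4 * 4 flattened size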

Now initialize the neural network, define a loss function, and define optimization hyperparameters by adding the following code to the end of the script:

step_7_fer_simple.py
net = Net().float()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

We’ll train for two epochs. For now, we define an epoch to be an iteration of training where every training sample has been used exactly once.

First, extract image and label from the dataset loader and then wrap each in a PyTorch Variable. Second, run the forward pass and then backpropagate through the loss and neural network. Add the following code to the end of your script to do that:

step_7_fer_simple.py
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs = Variable(data['image'].float())
        labels = Variable(data['label'].long())
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.data[0]
        if i % 100 == 0:
            print('[%d, %5d] loss: %.3f' % (epoch, i, running_loss / (i + 1)))

Your script should now look like this:

step_7_fer_simple.py
from torch.utils.data import Dataset
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
import torch
import cv2
import argparse


class Fer2013Dataset(Dataset):
    """Face Emotion Recognition dataset.

    Utility for loading FER into PyTorch. Dataset curated by Pierre-Luc Carrier
    and Aaron Courville in 2013.

    Each sample is 1 x 1 x 48 x 48, and each label is a scalar.
    """
    def __init__(self, path: str):
        """
        Args:
            path: Path to `.np` file containing sample nxd and label nx1
        """
        with np.load(path) as data:
            self._samples = data['X']
            self._labels = data['Y']
        self._samples = self._samples.reshape((-1, 1, 48, 48))

        self.X = Variable(torch.from_numpy(self._samples)).float()
        self.Y = Variable(torch.from_numpy(self._labels)).float()

    def __len__(self):
        return len(self._labels)


    def __getitem__(self, idx):
        return {'image': self._samples[idx], 'label': self._labels[idx]}


trainset = Fer2013Dataset('data/fer2013_train.npz')
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True)

testset = Fer2013Dataset('data/fer2013_test.npz')
testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False)


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 6, 3)
        self.conv3 = nn.Conv2d(6, 16, 3)
        self.fc1 = nn.Linear(16 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 48)
        self.fc3 = nn.Linear(48, 3)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = x.view(-1, 16 * 4 * 4)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net().float()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)


for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs = Variable(data['image'].float())
        labels = Variable(data['label'].long())
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.data[0]
        if i % 100 == 0:
            print('[%d, %5d] loss: %.3f' % (epoch, i, running_loss / (i + 1)))

Save the file and exit the editor once you’ve verified your code. Then, launch this proof-of-concept training:

  • python step_7_fer_simple.py

You’ll see output similar to the following as the neural network trains:

Output
[0,     0] loss: 1.094
[0,   100] loss: 1.049
[0,   200] loss: 1.009
[0,   300] loss: 0.963
[0,   400] loss: 0.935
[1,     0] loss: 0.760
[1,   100] loss: 0.768
[1,   200] loss: 0.775
[1,   300] loss: 0.776
[1,   400] loss: 0.767

You can then augment this script using a number of other PyTorch utilities to save and load models, output training and validation accuracies, fine-tune a learning-rate schedule, etc. After training for 20 epochs with a learning rate of 0.01 and momentum of 0.9, our neural network attains a 87.9% train accuracy and a 75.5% validation accuracy, a further 6.8% improvement over the most successful least-squares approach thus far at 66.6%. We’ll include these additional bells and whistles in a new script.
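
As a rough illustration of those additions (a sketch only, not the tutorial’s final training script; the schedule values are placeholders), you could wrap the loop above to decay the learning rate each epoch, report accuracies with a helper such as the batch_evaluate function defined in the next file, and checkpoint the best model in the same {'state_dict': ...} format that step_7_fer.py later loads:

best_test_acc = 0.0
for epoch in range(20):
    # illustrative step schedule: halve the learning rate every 5 epochs
    for param_group in optimizer.param_groups:
        param_group['lr'] = 0.01 * (0.5 ** (epoch // 5))

    for i, data in enumerate(trainloader, 0):   # one full pass over the training set
        inputs = Variable(data['image'].float())
        labels = Variable(data['label'].long())
        optimizer.zero_grad()
        loss = criterion(net(inputs), labels)
        loss.backward()
        optimizer.step()

    train_acc = batch_evaluate(net, trainset)   # accuracy helpers as in step_7_fer.py
    test_acc = batch_evaluate(net, testset)
    print('Epoch %d: train %.3f, val %.3f' % (epoch, train_acc, test_acc))

    if test_acc > best_test_acc:                # keep the best checkpoint seen so far
        best_test_acc = test_acc
        torch.save({'state_dict': net.state_dict()}, 'assets/model_best.pth')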

Create a new file to hold the final face emotion detector which your live camera feed will use. This script contains the code above along with a command-line interface and an easy-to-import version of our code that will be used later. Additionally, it contains the hyperparameters tuned in advance, for a model with higher accuracy.

  • nano step_7_fer.py

Start with the following imports. These are the same imports as in the previous file, including OpenCV as cv2:

step_7_fer.py
from torch.utils.data import Dataset
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
import torch
import cv2
import argparse

Directly beneath these imports, reuse your code from step_7_fer_simple.py to define the neural network:

step_7_fer.py
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 6, 3)
        self.conv3 = nn.Conv2d(6, 16, 3)
        self.fc1 = nn.Linear(16 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 48)
        self.fc3 = nn.Linear(48, 3)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = x.view(-1, 16 * 4 * 4)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

Again, reuse the code for the Face Emotion Recognition dataset from step_7_fer_simple.py and add it to this file:

step_7_fer.py
class Fer2013Dataset(Dataset):
    """Face Emotion Recognition dataset.
    Utility for loading FER into PyTorch. Dataset curated by Pierre-Luc Carrier
    and Aaron Courville in 2013.
    Each sample is 1 x 1 x 48 x 48, and each label is a scalar.
    """

    def __init__(self, path: str):
        """
        Args:
            path: Path to `.np` file containing sample nxd and label nx1
        """
        with np.load(path) as data:
            self._samples = data['X']
            self._labels = data['Y']
        self._samples = self._samples.reshape((-1, 1, 48, 48))

        self.X = Variable(torch.from_numpy(self._samples)).float()
        self.Y = Variable(torch.from_numpy(self._labels)).float()

    def __len__(self):
        return len(self._labels)

    def __getitem__(self, idx):
        return {'image': self._samples[idx], 'label': self._labels[idx]}

Next, define a few utilities to evaluate the neural network’s performance. First, add an evaluate function which compares the neural network’s predicted emotions to the true emotions for a batch of images:

step_7_fer.py
def evaluate(outputs: Variable, labels: Variable, normalized: bool=True) -> float:
    """Evaluate neural network outputs against non-one-hotted labels."""
    Y = labels.data.numpy()
    Yhat = np.argmax(outputs.data.numpy(), axis=1)
    denom = Y.shape[0] if normalized else 1
    return float(np.sum(Yhat == Y) / denom)

Then add a function called batch_evaluate which applies the first function to all images:

step_7_fer.py
def batch_evaluate(net: Net, dataset: Dataset, batch_size: int=500) -> float:
    """Evaluate neural network in batches, if dataset is too large."""
    score = 0.0
    n = dataset.X.shape[0]
    for i in range(0, n, batch_size):
        x = dataset.X[i: i + batch_size]
        y = dataset.Y[i: i + batch_size]
        score += evaluate(net(x), y, False)
    return score / n
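
Note how the two helpers interact: batch_evaluate calls evaluate with normalized set to False, so each chunk contributes a raw count of correct predictions rather than a per-chunk accuracy. Dividing the accumulated score by n at the end then yields accuracy over the entire dataset, even when the final chunk contains fewer than batch_size images.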

Now, define a function called get_image_to_emotion_predictor that takes in an image and outputs a predicted emotion, using a pretrained model:

step_7_fer.py
def get_image_to_emotion_predictor(model_path='assets/model_best.pth'):
    """Returns predictor, from image to emotion index."""
    net = Net().float()
    pretrained_model = torch.load(model_path)
    net.load_state_dict(pretrained_model['state_dict'])

    def predictor(image: np.array):
        """Translates images into emotion indices."""
        if image.shape[2] > 1:
            image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        frame = cv2.resize(image, (48, 48)).reshape((1, 1, 48, 48))
        X = Variable(torch.from_numpy(frame)).float()
        return np.argmax(net(X).data.numpy(), axis=1)[0]
    return predictor

Finally, add the following code to define the main function to leverage the other utilities:

step_7_fer.py
def main():
    trainset = Fer2013Dataset('data/fer2013_train.npz')
    testset = Fer2013Dataset('data/fer2013_test.npz')
    net = Net().float()

    pretrained_model = torch.load("assets/model_best.pth")
    net.load_state_dict(pretrained_model['state_dict'])

    train_acc = batch_evaluate(net, trainset, batch_size=500)
    print('Training accuracy: %.3f' % train_acc)
    test_acc = batch_evaluate(net, testset, batch_size=500)
    print('Validation accuracy: %.3f' % test_acc)


if __name__ == '__main__':
    main()

This loads a pretrained neural network and evaluates its performance on the provided Face Emotion Recognition dataset. Specifically, the script outputs accuracy on the images we used for training, as well as a separate set of images we put aside for testing purposes.

Double-check that your file matches the following:

step_7_fer.py
from torch.utils.data import Dataset
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
import torch
import cv2
import argparse

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 6, 3)
        self.conv3 = nn.Conv2d(6, 16, 3)
        self.fc1 = nn.Linear(16 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 48)
        self.fc3 = nn.Linear(48, 3)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = x.view(-1, 16 * 4 * 4)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


class Fer2013Dataset(Dataset):
    """Face Emotion Recognition dataset.
    Utility for loading FER into PyTorch. Dataset curated by Pierre-Luc Carrier
    and Aaron Courville in 2013.
    Each sample is 1 x 1 x 48 x 48, and each label is a scalar.
    """

    def __init__(self, path: str):
        """
        Args:
            path: Path to `.np` file containing sample nxd and label nx1
        """
        with np.load(path) as data:
            self._samples = data['X']
            self._labels = data['Y']
        self._samples = self._samples.reshape((-1, 1, 48, 48))

        self.X = Variable(torch.from_numpy(self._samples)).float()
        self.Y = Variable(torch.from_numpy(self._labels)).float()

    def __len__(self):
        return len(self._labels)

    def __getitem__(self, idx):
        return {'image': self._samples[idx], 'label': self._labels[idx]}


def evaluate(outputs: Variable, labels: Variable, normalized: bool=True) -> float:
    """Evaluate neural network outputs against non-one-hotted labels."""
    Y = labels.data.numpy()
    Yhat = np.argmax(outputs.data.numpy(), axis=1)
    denom = Y.shape[0] if normalized else 1
    return float(np.sum(Yhat == Y) / denom)


def batch_evaluate(net: Net, dataset: Dataset, batch_size: int=500) -> float:
    """Evaluate neural network in batches, if dataset is too large."""
    score = 0.0
    n = dataset.X.shape[0]
    for i in range(0, n, batch_size):
        x = dataset.X[i: i + batch_size]
        y = dataset.Y[i: i + batch_size]
        score += evaluate(net(x), y, False)
    return score / n


def get_image_to_emotion_predictor(model_path='assets/model_best.pth'):
    """Returns predictor, from image to emotion index."""
    net = Net().float()
    pretrained_model = torch.load(model_path)
    net.load_state_dict(pretrained_model['state_dict'])

    def predictor(image: np.array):
        """Translates images into emotion indices."""
        if image.shape[2] > 1:
            image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        frame = cv2.resize(image, (48, 48)).reshape((1, 1, 48, 48))
        X = Variable(torch.from_numpy(frame)).float()
        return np.argmax(net(X).data.numpy(), axis=1)[0]
    return predictor


def main():
    trainset = Fer2013Dataset('data/fer2013_train.npz')
    testset = Fer2013Dataset('data/fer2013_test.npz')
    net = Net().float()

    pretrained_model = torch.load("assets/model_best.pth")
    net.load_state_dict(pretrained_model['state_dict'])

    train_acc = batch_evaluate(net, trainset, batch_size=500)
    print('Training accuracy: %.3f' % train_acc)
    test_acc = batch_evaluate(net, testset, batch_size=500)
    print('Validation accuracy: %.3f' % test_acc)


if __name__ == '__main__':
    main()

Save the file and exit your editor.

As before, with the face detector, download pre-trained model parameters and save them to your assets folder with the following command:

  • wget -O assets/model_best.pth https://github.com/alvinwan/emotion-based-dog-filter/raw/master/src/assets/model_best.pth

Run the script to use and evaluate the pre-trained model:

  • python step_7_fer.py

This will output the following:

Output
Training accuracy: 0.879
Validation accuracy: 0.755

At this point, you’ve built a pretty accurate face-emotion classifier. In essence, our model can correctly disambiguate between faces that are happy, sad, and surprised eight out of ten times. This is a reasonably good model, so you can now move on to using this face-emotion classifier to determine which dog mask to apply to faces.

Step 8 — Finishing the Emotion-Based Dog Filter

Before integrating our brand-new face-emotion classifier, we will need animal masks to pick from. We’ll use a Dalmatian mask and a Sheepdog mask:

Execute these commands to download both masks to your assets folder:

  • wget -O assets/dalmation.png https://assets.digitalocean.com/articles/python3_dogfilter/E9ax7PI.png # dalmation

  • wget -O assets/sheepdog.png https://assets.digitalocean.com/articles/python3_dogfilter/HveFdkg.png # sheepdog

Now let’s use the masks in our filter. Start by duplicating the step_4_dog_mask.py file:

  • cp step_4_dog_mask.py step_8_dog_emotion_mask.py

Open the new Python script.

  • nano step_8_dog_emotion_mask.py

Insert a new line at the top of the script to import the emotion predictor:

step_8_dog_emotion_mask.py
from step_7_fer import get_image_to_emotion_predictor
...

Then, in the main() function, locate this line:

step_8_dog_emotion_mask.py
mask = cv2.imread('assets/dog.png')

Replace it with the following to load the new masks and aggregate all masks into a tuple:

step_8_dog_emotion_mask.py
mask0 = cv2.imread('assets/dog.png')
    mask1 = cv2.imread('assets/dalmation.png')
    mask2 = cv2.imread('assets/sheepdog.png')
    masks = (mask0, mask1, mask2)

Add a line break, and then add this code to create the emotion predictor.

step_8_dog_emotion_mask.py
# get emotion predictor
    predictor = get_image_to_emotion_predictor()

Your main function should now match the following:

step_8_dog_emotion_mask.py
def main():
    cap = cv2.VideoCapture(0)

    # load mask
    mask0 = cv2.imread('assets/dog.png')
    mask1 = cv2.imread('assets/dalmation.png')
    mask2 = cv2.imread('assets/sheepdog.png')
    masks = (mask0, mask1, mask2)

    # get emotion predictor
    predictor = get_image_to_emotion_predictor()

    # initialize front face classifier
    ...

Next, locate these lines:

step_8_dog_emotion_mask.py
# apply mask
            frame[y0: y1, x0: x1] = apply_mask(frame[y0: y1, x0: x1], mask)

Insert the following line below the # apply mask line to select the appropriate mask by using the predictor:

step_8_dog_emotion_mask.py
# apply mask
            mask = masks[predictor(frame[y:y+h, x: x+w])]
            frame[y0: y1, x0: x1] = apply_mask(frame[y0: y1, x0: x1], mask)
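
Here predictor returns an emotion index of 0, 1, or 2 for the detected face crop, and that index selects the corresponding entry in the masks tuple: the original dog mask for a happy face, the Dalmatian for a sad face, and the sheepdog for a surprised face, as described when you try out the filter at the end of this step.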

The completed file should look like this:

step_8_dog_emotion_mask.py
"""Test for face detection"""

from step_7_fer import get_image_to_emotion_predictor
import numpy as np
import cv2

def apply_mask(face: np.array, mask: np.array) -> np.array:
    """Add the mask to the provided face, and return the face with mask."""
    mask_h, mask_w, _ = mask.shape
    face_h, face_w, _ = face.shape

    # Resize the mask to fit on face
    factor = min(face_h / mask_h, face_w / mask_w)
    new_mask_w = int(factor * mask_w)
    new_mask_h = int(factor * mask_h)
    new_mask_shape = (new_mask_w, new_mask_h)
    resized_mask = cv2.resize(mask, new_mask_shape)

    # Add mask to face - ensure mask is centered
    face_with_mask = face.copy()
    non_white_pixels = (resized_mask < 250).all(axis=2)
    off_h = int((face_h - new_mask_h) / 2)
    off_w = int((face_w - new_mask_w) / 2)
    face_with_mask[off_h: off_h+new_mask_h, off_w: off_w+new_mask_w][non_white_pixels] = \
         resized_mask[non_white_pixels]

    return face_with_mask

def main():

    cap = cv2.VideoCapture(0)
    # load mask
    mask0 = cv2.imread('assets/dog.png')
    mask1 = cv2.imread('assets/dalmation.png')
    mask2 = cv2.imread('assets/sheepdog.png')
    masks = (mask0, mask1, mask2)

    # get emotion predictor
    predictor = get_image_to_emotion_predictor()

    # initialize front face classifier
    cascade = cv2.CascadeClassifier("assets/haarcascade_frontalface_default.xml")

    while True:
        # Capture frame-by-frame
        ret, frame = cap.read()
        frame_h, frame_w, _ = frame.shape

        # Convert to black-and-white
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        blackwhite = cv2.equalizeHist(gray)

        rects = cascade.detectMultiScale(
            blackwhite, scaleFactor=1.3, minNeighbors=4, minSize=(30, 30),
            flags=cv2.CASCADE_SCALE_IMAGE)

        for x, y, w, h in rects:
            # crop a frame slightly larger than the face
            y0, y1 = int(y - 0.25*h), int(y + 0.75*h)
            x0, x1 = x, x + w
            # give up if the cropped frame would be out-of-bounds
            if x0 < 0 or y0 < 0 or x1 > frame_w or y1 > frame_h:
                continue
            # apply mask
            mask = masks[predictor(frame[y:y+h, x: x+w])]
            frame[y0: y1, x0: x1] = apply_mask(frame[y0: y1, x0: x1], mask)

        # Display the resulting frame
        cv2.imshow('frame', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()

if __name__ == '__main__':
    main()

Save and exit your editor. Now launch the script:

  • python step_8_dog_emotion_mask.py

Now try it out! Smiling will register as “happy” and show the original dog. A neutral face or a frown will register as “sad” and yield the Dalmatian. A face of “surprise,” with a nice big jaw drop, will yield the sheepdog.

This concludes our emotion-based dog filter and foray into computer vision.

Conclusion

In this tutorial, you built a face detector and dog filter using computer vision and employed machine learning models to apply masks based on detected emotions.

Machine learning is widely applicable. However, it’s up to the practitioner to consider the ethical implications of each application when applying machine learning. The application you built in this tutorial was a fun exercise, but remember that you relied on OpenCV and an existing dataset to identify faces, rather than supplying your own data to train the models. The data and models used have significant impacts on how a program works.

For example, imagine a job search engine where the models were trained with data about candidates, such as race, gender, age, culture, first language, or other factors. And perhaps the developers trained a model that enforces sparsity, which ends up reducing the feature space to a subspace where gender explains most of the variance. As a result, the model influences candidate job searches and even company selection processes based primarily on gender. Now consider more complex situations where the model is less interpretable and you don’t know what a particular feature corresponds to. You can learn more about this in Equality of Opportunity in Machine Learning by Professor Moritz Hardt at UC Berkeley.

There can be an overwhelming magnitude of uncertainty in machine learning. To understand this randomness and complexity, you’ll have to develop both mathematical intuitions and probabilistic thinking skills. As a practitioner, it is up to you to dig into the theoretical underpinnings of machine learning.

Translated from: https://www.digitalocean.com/community/tutorials/how-to-apply-computer-vision-to-build-an-emotion-based-dog-filter-in-python-3
