Python Pose Estimation
Head pose estimation is a challenging problem in computer vision because of the several steps required to solve it. First, we need to locate the face in the frame and then the various facial landmarks. Recognizing a face seems a trivial task these days, and that is true for faces looking straight at the camera. The problem arises when the face is at an angle; on top of that, some facial landmarks are hidden by the movement of the head. After this, we need to convert the points to 3D coordinates to find the inclination. Sounds like a lot of work? Don’t worry, we will go step by step and refer to two great resources that will make our work a lot easier.
Table of Contents
- Requirements
- Face Detection
- Facial Landmark Detection
- Pose Estimation
Requirements
For this project we need OpenCV and TensorFlow, so let’s install them.
# Using pip
pip install opencv-python
pip install tensorflow

# Using conda
conda install -c conda-forge opencv
conda install -c conda-forge tensorflow
Face Detection
Our first step is to find the faces in the image, on which we can then find facial landmarks. For this task, we will use a Caffe model with OpenCV’s DNN module. If you are wondering how it fares against other models such as Haar Cascades or Dlib’s frontal face detector, or you want to know more about it in depth, you can refer to this article:
You can download the required models from my GitHub repository.
import cv2
import numpy as np

# Paths to the Caffe weights and the network definition
modelFile = "models/res10_300x300_ssd_iter_140000.caffemodel"
configFile = "models/deploy.prototxt.txt"
net = cv2.dnn.readNetFromCaffe(configFile, modelFile)

img = cv2.imread('test.jpg')
h, w = img.shape[:2]
# Resize to 300x300 and subtract the per-channel mean values
blob = cv2.dnn.blobFromImage(cv2.resize(img, (300, 300)), 1.0,
                             (300, 300), (104.0, 117.0, 123.0))
net.setInput(blob)
faces = net.forward()

# Draw a box around every detection above the confidence threshold
for i in range(faces.shape[2]):
    confidence = faces[0, 0, i, 2]
    if confidence > 0.5:
        # Box coordinates are normalized to [0, 1]; scale them to pixels
        box = faces[0, 0, i, 3:7] * np.array([w, h, w, h])
        x, y, x1, y1 = box.astype(int).tolist()  # plain Python ints for cv2
        cv2.rectangle(img, (x, y), (x1, y1), (0, 0, 255), 2)
Load the network using cv2.dnn.readNetFromCaffe and pass the model's layers (the .prototxt file) and weights (the .caffemodel file) as its arguments. It performs best on images resized to 300x300.
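To make the layout of the detector output easier to follow, here is a minimal sketch that turns a detections array into pixel boxes without needing the model files. The function name extract_face_boxes and the synthetic fake_detections array are made up for illustration; the only assumption taken from the code above is that the output has shape (1, 1, N, 7), with the confidence at index 2 and the normalized box corners at indices 3 to 6.

```python
import numpy as np

def extract_face_boxes(detections, width, height, threshold=0.5):
    """Convert SSD-style detections of shape (1, 1, N, 7) to pixel boxes.

    Hypothetical helper: each row is assumed to hold the confidence at
    index 2 and normalized (x, y, x1, y1) corners at indices 3..6.
    """
    boxes = []
    scale = np.array([width, height, width, height])
    for i in range(detections.shape[2]):
        confidence = float(detections[0, 0, i, 2])
        if confidence > threshold:
            scaled = detections[0, 0, i, 3:7] * scale
            boxes.append(tuple(int(v) for v in scaled))
    return boxes

# Synthetic stand-in for net.forward(): one confident face, one weak one
fake_detections = np.zeros((1, 1, 2, 7), dtype=np.float32)
fake_detections[0, 0, 0] = [0, 1, 0.9, 0.25, 0.25, 0.75, 0.5]
fake_detections[0, 0, 1] = [0, 1, 0.2, 0.0, 0.0, 1.0, 1.0]

boxes = extract_face_boxes(fake_detections, width=400, height=200)
print(boxes)
```

Only the first row survives the 0.5 threshold, and its corners are scaled by the image width and height.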
Facial Landmark Detection
The most commonly used option is Dlib’s facial landmark detector, which gives us 68 landmarks; however, it does not give good accuracy. Instead, we will use a facial landmark detector provided by Yin Guobing in this GitHub repo. It also gives 68 landmarks, and it is a TensorFlow CNN trained on 5 datasets. The pre-trained model can be found here. The author has also written a series of posts covering the background, dataset, preprocessing, model architecture, training, and deployment, which can be found here. I provide a very brief summary here, but I strongly encourage you to read them.
In the first article of the series, he describes the problem of the stability of facial landmarks in videos and then lists existing solutions such as OpenFace and Dlib’s facial landmark detection, along with the available datasets. The third article is all about preprocessing the data and making it ready to use. The next two articles extract the faces and apply facial landmarks to them, preparing the data for training a CNN and storing it as TFRecord files. In the sixth article, a model is trained using TensorFlow. That article shows how important loss functions are in training: he first used tf.losses.mean_pairwise_squared_error, which uses the relationships between points as the basis for optimization when minimizing the loss, and the model could not generalize well. In contrast, tf.losses.mean_squared_error worked well. In the final article, the model is exported as an API and he shows how to use it in Python.
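To see what "relationships between points" means in practice, the sketch below compares a plain mean squared error with a simplified pairwise squared error in NumPy. This is an illustration only: pairwise_squared_error is a hypothetical stand-in that averages squared differences between per-coordinate errors, and it does not reproduce TensorFlow's exact normalization. One property it shares with a pairwise loss is that a constant shift of every predicted coordinate costs nothing, which may be one reason such a loss pins down absolute landmark positions poorly.

```python
import numpy as np

def mean_squared_error(labels, predictions):
    """Average squared error of each coordinate taken on its own."""
    diff = predictions - labels
    return float(np.mean(diff ** 2))

def pairwise_squared_error(labels, predictions):
    """Simplified pairwise loss (illustrative, not TensorFlow's formula).

    It compares the error of every coordinate with the error of every
    other coordinate, so only relative offsets between points matter.
    """
    diff = (predictions - labels).ravel()
    pair_diff = diff[:, None] - diff[None, :]
    n = diff.size
    return float(np.sum(pair_diff ** 2) / (n * (n - 1)))

labels = np.array([10.0, 20.0, 30.0, 40.0])
shifted = labels + 5.0  # every coordinate is off by the same amount

mse_shifted = mean_squared_error(labels, shifted)
pairwise_shifted = pairwise_squared_error(labels, shifted)
print(mse_shifted, pairwise_shifted)
```

The mean squared error penalizes the uniform shift, while the pairwise version reports no loss at all for it.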
The model takes square boxes of size 128x128 that contain faces and returns 68 facial landmarks. The code provided below is taken from here, and it can also be used to draw 3D annotation boxes on the face. The code is modified to draw facial landmarks on all the faces, unlike the original code, which would draw on only one.
This code will draw facial landmarks on the faces.
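Separately from the author's code, and only to illustrate the square 128x128 input mentioned above, here is a minimal sketch of how a rectangular face box could be grown into a square before cropping. The helper name to_square_box is hypothetical, and the box is assumed to be (left, top, right, bottom) in pixels.

```python
def to_square_box(box):
    """Grow the shorter side of (left, top, right, bottom) into a square.

    Hypothetical helper: the box is expanded symmetrically around its
    centre; an odd difference adds the extra pixel on the right/bottom.
    """
    left, top, right, bottom = box
    width = right - left
    height = bottom - top
    diff = height - width
    delta = abs(diff) // 2

    if diff > 0:       # taller than wide: widen the box
        left -= delta
        right += delta + (diff % 2)
    elif diff < 0:     # wider than tall: make the box taller
        top -= delta
        bottom += delta + (abs(diff) % 2)
    return (left, top, right, bottom)

square = to_square_box((10, 20, 50, 100))
print(square)
```

The result can extend past the image border (note the negative left edge in this example), so a caller would still need to clip or skip such boxes before cropping the face and resizing it to 128x128.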
Using draw_annotation_box(), we can also draw the annotation box, as shown below.
Pose Estimation
This is a great article on Learn OpenCV that explains head pose detection on images, with a lot of maths about converting the points to 3D space and using cv2.solvePnP to find the rotation and translation vectors. A quick read-through of that article will help in understanding the inner workings, so I will describe it only briefly here.
We need six points of the face: the nose tip, the chin, the extreme left and right points of the lips, the left corner of the left eye, and the right corner of the right eye. We take standard 3D coordinates of these facial landmarks and try to estimate the rotation and translation vectors at the nose tip. For an accurate estimate, we need the intrinsic parameters of the camera, such as the focal length, optical center, and radial distortion parameters. We can estimate the first two and assume the last is absent to make our work easier. After obtaining the required vectors, we can project those 3D points onto a 2D surface, which is our image.
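The steps above can be sketched roughly as follows. This is not the author's code: the function names are hypothetical, the 3D coordinates in MODEL_POINTS are commonly used approximate values for a generic head and should be treated as an assumption, and the camera is approximated by taking the focal length equal to the image width, the optical center at the image center, and no lens distortion.

```python
import numpy as np

# Approximate 3D landmark positions on a generic head model, in arbitrary
# units with the nose tip at the origin (assumed values, not measured).
MODEL_POINTS = np.array([
    (0.0, 0.0, 0.0),           # nose tip
    (0.0, -330.0, -65.0),      # chin
    (-225.0, 170.0, -135.0),   # left corner of the left eye
    (225.0, 170.0, -135.0),    # right corner of the right eye
    (-150.0, -150.0, -125.0),  # left corner of the mouth
    (150.0, -150.0, -125.0),   # right corner of the mouth
], dtype=np.float64)

def approximate_camera_matrix(width, height):
    """Estimate intrinsics: focal length ~ image width, center ~ image center."""
    focal_length = float(width)
    return np.array([
        [focal_length, 0.0, width / 2.0],
        [0.0, focal_length, height / 2.0],
        [0.0, 0.0, 1.0],
    ], dtype=np.float64)

def estimate_head_pose(image_points, width, height):
    """Return (rvec, tvec, nose_end_2d) for six 2D landmarks.

    image_points must follow the same order as MODEL_POINTS.
    """
    import cv2  # imported here so the helpers above work without OpenCV

    camera_matrix = approximate_camera_matrix(width, height)
    dist_coeffs = np.zeros((4, 1))  # assumption: no lens distortion
    points_2d = np.asarray(image_points, dtype=np.float64)

    _, rvec, tvec = cv2.solvePnP(MODEL_POINTS, points_2d, camera_matrix,
                                 dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)

    # Project a point in front of the nose back onto the image; the line
    # from the nose tip to this point shows where the head is facing.
    nose_end, _ = cv2.projectPoints(np.array([[0.0, 0.0, 1000.0]]),
                                    rvec, tvec, camera_matrix, dist_coeffs)
    return rvec, tvec, tuple(nose_end.ravel())

camera_matrix = approximate_camera_matrix(640, 480)
print(camera_matrix)
```

Calling estimate_head_pose with the six landmark coordinates picked from the 68 detected points gives the rotation and translation vectors, plus a 2D point that can be joined to the nose tip to draw the pointing direction.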
If we use only the available code and find the angle with the x-axis, we obtain the result shown below.
It works great for recording the head moving up and down, but not for movement to the left or right. So how do we handle that? Above, we saw an annotation box drawn on the face. Perhaps we can use it to measure the left and right movements.
We can take the line midway between the two dark blue lines as our pointer and measure its angle with the y-axis to find the angle of movement.
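Both angle measurements come down to the slope of a line between two image points. A minimal sketch, assuming each point is an (x, y) pixel pair and using hypothetical function names, could look like this:

```python
import math

def angle_with_x_axis(p1, p2):
    """Angle in degrees between the line p1 -> p2 and the x-axis.

    The result is folded into (-90, 90]; a vertical line returns 90.
    """
    dx = p2[0] - p1[0]
    dy = p2[1] - p1[1]
    if dx == 0:
        return 90.0
    return math.degrees(math.atan(dy / dx))

def angle_with_y_axis(p1, p2):
    """Angle in degrees between the line p1 -> p2 and the y-axis.

    The result is folded into (-90, 90]; a horizontal line returns 90.
    """
    dx = p2[0] - p1[0]
    dy = p2[1] - p1[1]
    if dy == 0:
        return 90.0
    return math.degrees(math.atan(dx / dy))

vertical_tilt = angle_with_x_axis((0, 0), (1, 1))
sideways_tilt = angle_with_y_axis((0, 0), (0, 5))
print(vertical_tilt, sideways_tilt)
```

Keep in mind that image coordinates grow downwards, so the sign of each angle is flipped relative to the usual mathematical convention.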
Combining the two, we can detect head movement in any direction. The complete code can also be found here at my GitHub repository, along with various other sub-models for an online proctoring solution.
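One possible way to combine the two angles into a readable result is sketched below. The function name, the 45 degree threshold, the labels, and the sign conventions (positive vertical angle meaning down, positive horizontal angle meaning right) are all assumptions for illustration and would need tuning against real footage.

```python
def head_direction(vertical_angle, horizontal_angle, threshold=45.0):
    """Map two angles in degrees to a coarse direction label.

    Hypothetical helper: the threshold and the sign conventions are
    illustrative assumptions, not values taken from the article.
    """
    parts = []
    if vertical_angle >= threshold:
        parts.append("down")
    elif vertical_angle <= -threshold:
        parts.append("up")

    if horizontal_angle >= threshold:
        parts.append("right")
    elif horizontal_angle <= -threshold:
        parts.append("left")

    return " ".join(parts) if parts else "forward"

label = head_direction(50.0, -60.0)
print(label)
```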
Tested on an i5 processor, the pipeline ran at a healthy 6.76 frames per second even while displaying the image, and the facial landmark detection model took only 0.05 seconds to find the landmarks.
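If you want to take a similar measurement on your own machine, a minimal timing sketch is shown below. The helper measure_fps and the dummy workload are hypothetical; in a real run, process_frame would wrap the face detection, landmark detection, and pose estimation steps for one frame.

```python
import time

def measure_fps(process_frame, frames):
    """Run process_frame on every frame and return frames per second."""
    start = time.perf_counter()
    count = 0
    for frame in frames:
        process_frame(frame)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")

# Dummy workload standing in for the real per-frame pipeline
dummy_frames = [[1, 2, 3]] * 100
fps = measure_fps(lambda frame: sum(frame), dummy_frames)
print(f"{fps:.2f} frames per second")
```

As a rough sanity check on the numbers above, a stage that takes 0.05 seconds per frame would on its own cap the rate at 1 / 0.05 = 20 frames per second, so the remaining time per frame goes to the other stages and to displaying the image.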
Now that we have built a head pose detector, you might want to build an eye gaze tracker; if so, have a look at this article:
Translated from: https://towardsdatascience.com/real-time-head-pose-estimation-in-python-e52db1bc606a