Can Computers Understand Our Emotions?
Facial expression recognition with PyTorch and deep learning, using different image classification techniques.
To answer the above question, we need to understand how computers actually see and build intuitions about what they see. Computers have no eyes, of course, but there are tools and methods that help them “see” things much as humans do. The central component of all computer vision is the image (a video is just a sequence of images). A computer sees an image as a grid of individual pixels, often millions of them, each with color values and sometimes an alpha value. There are many libraries and tools for making computers see; some of the most popular are OpenCV, TensorFlow, PyTorch, MATLAB, SimpleCV, SciPy, and NumPy.
In this blog, I will try to build a deep learning image classification model that can recognise the facial expression on a human face.
Approach:
- Finding the dataset.
- Preparing, cleaning, and pre-processing the dataset.
- Experimenting with different models.
- Comparing the training time and accuracy of all the models.
- Finalizing our model for deployment.
Disclaimer: If you don't have time to read the whole blog, I have prepared a short, clear summary of everything I did. You can jump directly to the Conclusion section to get a rough idea of all my approaches and their limits.
1. Finding/Exploring the dataset:
Since some datasets were too large and others were not easy for a beginner in the DL field to work with, it took me almost a day to find this dataset. It contains 3 columns (emotions, pixels, and Usage) and 35,887 rows. The emotions column contains labels from 0–6 for anger, disgust, fear, happiness, neutral, sadness, and surprise respectively. But I found two minor problems: the images in the pixels column were stored as strings, and the whole dataset was a single .csv file, which is not easy for a beginner to work with. So I created my own dataset from it.
2. Preparation of the dataset:
First of all, we need to import the libraries required to convert a string of pixels into an image in .png format.
Imports:
- The os library helps us create directories for the different class images, move between directories, and create or delete them.
- The pandas and numpy libraries help us work with different file formats and perform certain mathematical operations.
- The PIL library helps us work with images in Python.
- cv2 (OpenCV) is a computer vision library; among other things, it lets us work with the device camera.
2.1 Defining a function to convert a string of pixels to image.png
- Function Definition:
So, with the help of the above function, we just need to provide the label of the image, i.e., 0–6, and it will convert all the pixel strings that match the condition inside the function into images. If you would like to have a look at the entire source code, you can find it here -> convert-pixels-to-images-in-one-go
We have completed about 10% of the work towards our final goal.
3. Experimenting with Image classification models and Techniques:
We will be experimenting with 4 different kinds of models:
- Logistic Regression Approach
- Feed-Forward Neural Network (FNN)
- Convolutional Neural Network (CNN)
- Convolutional NN with Data Augmentation & Regularization Technique
3.1 Logistic Regression Approach for Image Classification
Logistic regression estimates the probability of an event and is used for classification problems. There are three kinds of logistic regression: binary, multinomial, and ordinal. Since we have more than two classes with no natural order, we'll be using multinomial logistic regression. So, let's start by importing the required libraries and packages.
Imports:
So, now let's understand the dataset directory and file structure.
But before moving on to the code, make sure that you click on the New Notebook option on the dataset page, i.e., here. This way, we don't have to download the dataset, since we are working on Kaggle Kernels.
Our dataset contains two directories and one file, with 7 classes: [‘anger’, ‘fear’, ‘surprise’, ‘happiness’, ‘sadness’, ‘neutral’, ‘disgust’].
Can we know the number of images in each class? Yes.
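One simple way to count the images is to list each class folder. This is a sketch that assumes one sub-directory per class under a root folder; the function name is hypothetical.

```python
import os


def count_images_per_class(root):
    """Return {class_name: number_of_files} for each sub-directory of `root`."""
    counts = {}
    for name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, name)
        if os.path.isdir(class_dir):
            counts[name] = len(os.listdir(class_dir))
    return counts
```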
We can see that the happiness class contains the highest number of images and the disgust class the lowest.
Creating the dataset variable and converting each image to a tensor:
Now, after converting each image to a tensor, let's have a look at a single image.
The first line of the output shows that our image is 48×48 pixels and has 3 channels, i.e., RGB. Our images are actually grayscale, but I did not convert them to a single channel, because in some cases images with 3 channels tend to work better with certain models.
But how are we going to visualize the images? Won't it be a problem to view images stored as tensors? Let's visualize them with the help of the plt.imshow function from the matplotlib library.
Function to plot the pixels or tensors:
One thing to note in the above code is .permute(1,2,0). PyTorch image tensors store the channel dimension first, but plt.imshow() requires it to be last, so we use the .permute() function to reorder the dimensions.
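To make the reordering concrete, here is a small sketch on a random stand-in tensor; the helper name is hypothetical.

```python
import torch


def to_channels_last(img):
    """Reorder a (channels, height, width) tensor to (height, width, channels)."""
    return img.permute(1, 2, 0)


img = torch.rand(3, 48, 48)          # stand-in for one image tensor from the dataset
print(img.shape)                     # torch.Size([3, 48, 48])
print(to_channels_last(img).shape)   # torch.Size([48, 48, 3])
# With matplotlib imported as plt: plt.imshow(to_channels_last(img))
```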
Now it's time to view an image:
Preparing the training and test dataset:
We have set aside 10% of the dataset as the validation set. The dataset also contains a separate test set, which we have assigned to test_ds.
Data Loaders: Data loaders help us load the dataset in batches, and they shuffle the training data each time it is fed to the model. Without shuffling, since the images are stored class by class, a particular batch could end up containing only the happiness class or only the sadness class.
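The split and the loaders can be sketched as follows. Random tensors stand in for the real images, and the batch size is an assumed value, not one taken from this post.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Stand-in data: 100 random "images" with labels 0-6 (the real dataset comes from disk).
images = torch.rand(100, 3, 48, 48)
labels = torch.arange(100) % 7
dataset = TensorDataset(images, labels)

val_size = int(0.1 * len(dataset))          # 10% held out for validation
train_size = len(dataset) - val_size
train_ds, val_ds = random_split(dataset, [train_size, val_size])

batch_size = 32                             # assumed value
train_loader = DataLoader(train_ds, batch_size, shuffle=True)   # reshuffled every epoch
val_loader = DataLoader(val_ds, batch_size * 2)                 # no shuffling needed

xb, yb = next(iter(train_loader))
print(xb.shape, yb.shape)
```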
Let us also have a look at a batch of data:
So, now we have done all the tasks that are required before building any model.
Building the Model
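A multinomial logistic regression model in PyTorch is essentially a single linear layer followed by softmax. This is a minimal sketch under that assumption; the class name is hypothetical and this is not necessarily the exact model used in the notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

input_size = 48 * 48 * 3   # flattened 3-channel 48x48 image
num_classes = 7


class LogisticModel(nn.Module):   # hypothetical name
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(input_size, num_classes)

    def forward(self, xb):
        xb = xb.reshape(-1, input_size)   # flatten each image into a vector
        return self.linear(xb)            # raw scores (logits), one per class


logistic_model = LogisticModel()
logits = logistic_model(torch.rand(4, 3, 48, 48))
probs = F.softmax(logits, dim=1)          # softmax turns logits into class probabilities
```

During training, the logits can be passed straight to a cross-entropy loss, which applies the softmax internally.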
Now that we have built our model, we need an accuracy function to check the accuracy while training, an evaluate function to evaluate the model each time we make changes (to find out which combination of hyperparameters gives the best result), and a fit function to do the training, with the ability to change the number of epochs and the lr.
Defining the accuracy, evaluate, and fit functions.
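The three functions might look roughly like this. It is a sketch that assumes cross-entropy loss and plain SGD; the signatures are illustrative and not necessarily those of the original notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def accuracy(outputs, labels):
    """Fraction of rows whose highest-scoring class matches the label."""
    preds = torch.argmax(outputs, dim=1)
    return (preds == labels).float().mean().item()


@torch.no_grad()
def evaluate(model, val_loader):
    """Average loss and accuracy over the validation set, weighted by batch size."""
    model.eval()
    total_loss, total_acc, n = 0.0, 0.0, 0
    for xb, yb in val_loader:
        out = model(xb)
        total_loss += F.cross_entropy(out, yb).item() * len(yb)
        total_acc += accuracy(out, yb) * len(yb)
        n += len(yb)
    return {"val_loss": total_loss / n, "val_acc": total_acc / n}


def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    """Train for `epochs` passes and return the per-epoch validation results."""
    optimizer = opt_func(model.parameters(), lr)
    history = []
    for _ in range(epochs):
        model.train()
        for xb, yb in train_loader:
            loss = F.cross_entropy(model(xb), yb)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        history.append(evaluate(model, val_loader))
    return history


# Tiny smoke run on random data with a flatten + linear model.
data = TensorDataset(torch.rand(64, 3, 48, 48), torch.randint(0, 7, (64,)))
loader = DataLoader(data, batch_size=16)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(48 * 48 * 3, 7))
history = fit(2, 0.01, toy_model, loader, loader)
```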
So far, we have defined our model and some useful functions that we will need to train it and measure its performance. So, before training, let's see how our model performs with the help of the evaluate function we just defined.
We can see that without any training our model gives an accuracy of about 13%, which is roughly what random guessing among 7 classes would give (1/7 ≈ 14%), with a very large validation loss.
We are now ready to start training the model. At first, we set the epoch to 45 and the lr to 0.01, so that the model starts adjusting its weights and biases to fit the image tensors.
We can see that the accuracy fluctuates a lot. That is expected with a relatively high lr: the weights change in large steps, which lets the model make quick early progress. Now it's time to train the model again with epoch: 40 and lr: 0.001.
Our model has managed to reach an accuracy of about 37%. What about training the model for some more epochs and setting the lr to 0.001?
We can see that our model can be pushed to about 38% accuracy.
Finally, we can plot the performance of our model.
Plotting Function:
With that, we are done defining the function to plot. Let's plot and see the performance of our model.
So, we see that after a certain number of epochs the accuracy starts to flatten out, and the loss stops decreasing.
Since we have done the training, let us evaluate the model once again and compare the accuracy and the loss with the first evaluation.
So, we can see that our model has improved from a val_acc of 13% before training to a val_acc of about 39% after training.
Finally, we are in a position to make predictions with our trained model.
Prediction Function:
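A prediction function for a single image can be sketched as below. The function name is hypothetical, and the class list assumes the alphabetical order of the class folders, which matches the label order given earlier.

```python
import torch
import torch.nn as nn

# Assumed class order (alphabetical by folder name).
classes = ["anger", "disgust", "fear", "happiness", "neutral", "sadness", "surprise"]


def predict_image(img, model, classes):
    """Return the class name with the highest score for a single image tensor."""
    xb = img.unsqueeze(0)                 # add a batch dimension: (1, C, H, W)
    with torch.no_grad():
        scores = model(xb)
    _, pred = torch.max(scores, dim=1)    # index of the largest score
    return classes[pred.item()]
```

If the model lives on a GPU, the image tensor has to be moved to the same device before calling it.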
Time to make our first prediction. Let's go!
We got our first prediction right! Let's make one more prediction.
Though our model predicts correctly in some cases, it fails in most.
So, here is a list of the epochs and hyperparameter variations I tried, but I only got up to about 39% accuracy. If you are curious to see the entire source code for the logistic approach, you can find it here: Source Code
Finally, we are done with the logistic approach for classifying our images and managed to get an accuracy of about 39%. Let us try another approach, i.e., a Feed-Forward Neural Network (FNN), and see if we can boost the accuracy further.
3.2 Feed-Forward Neural Network (FNN) approach for Image classification.
Feed-forward neural networks are among the most widely used models in practical applications. They are also known by other names, such as ‘multilayer perceptrons’ (MLP). A feed-forward neural network is a biologically inspired classification algorithm. It consists of a number of simple neuron-like processing units organized in layers, and every unit in a layer is connected to all the units in the previous layer. This is what a feed-forward NN looks like:
Most of the starting code is similar to that of the logistic approach. But since a feed-forward NN involves many more calculations, we will use a GPU to speed them up. For that, we need to introduce some more functions to load our model and each batch of data onto the GPU, and we need to make some changes to the code for the data loaders.
Data Loaders:
Defining the Model:
To improve upon logistic regression, we will implement a neural network. This is where hidden layers come into play: by adding them between the input and the output, our model becomes a neural network, and the number of hidden layers is something we can choose.
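The model class is not reproduced in this text. A minimal sketch of a feed-forward network with a configurable number of hidden layers could look like this; the hidden-layer sizes are assumed values, not the ones from the notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Facial_Recog_Model(nn.Module):
    """Feed-forward network; the hidden-layer sizes here are assumed, not the author's."""

    def __init__(self, in_size, out_size, hidden_sizes=(512, 128)):
        super().__init__()
        sizes = [in_size, *hidden_sizes, out_size]
        # One nn.Linear per consecutive pair of sizes.
        self.layers = nn.ModuleList(
            [nn.Linear(a, b) for a, b in zip(sizes[:-1], sizes[1:])]
        )

    def forward(self, xb):
        out = xb.reshape(xb.size(0), -1)       # flatten each image
        for layer in self.layers[:-1]:
            out = F.relu(layer(out))           # hidden layer + non-linearity
        return self.layers[-1](out)            # output logits, no activation
```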
input_size = 48*48*3
num_classes = 7
We stored the dimensions of our images and the number of output classes. Let's store an instance of our class in the model variable.
model = Facial_Recog_Model(input_size, out_size = num_classes)
So, to use a GPU we need to write some code to load our data onto the GPU. Note that the CUDA backend used here requires an NVIDIA GPU.
Let's define a function that moves our data to the GPU if one is found.
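Helpers of this kind are usually only a few lines long. The sketch below uses hypothetical names and falls back to the CPU when no CUDA device is available.

```python
import torch


def get_default_device():
    """Pick the GPU if one is available, otherwise fall back to the CPU."""
    return torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")


def to_device(data, device):
    """Move a tensor, or a list/tuple of tensors, to the chosen device."""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)


class DeviceDataLoader:
    """Wrap a data loader so every batch is moved to `device` as it is yielded."""

    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        for batch in self.dl:
            yield to_device(batch, self.device)

    def __len__(self):
        return len(self.dl)
```

The model itself is moved the same way, e.g. `to_device(model, device)`, since nn.Module also has a `.to()` method.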
So, now we can move all our data to the GPU.
Defining the training function, i.e., fit:
Now, let's view the architecture of our model and see the number of hidden layers and their inputs and outputs.
Finally, we are in a position to train our model. After training it, let's visualize the performance of our FNN model.
We see that the loss, after decreasing for a certain time, starts increasing. This is due to the overfitting of our model.
Final Evaluation:
We can see that with the logistic approach we attained a maximum accuracy of around 39%, but with the FNN approach we attain an accuracy of about 47%. If you are curious to see the entire source code for the Feed-Forward NN, you can find it here: Source Code FNN
Here is a list of all the experiments I have tried with this model.
Finally, we are done with the Feed-Forward Neural Network approach for classifying our images and managed to boost the accuracy to about 47%. Now, let us try another approach, i.e., a Convolutional Neural Network (CNN), and see if we can boost the accuracy further.
3.3 Convolutional Neural Network (CNN) Approach to Image classification.
A convolutional neural network, or CNN, is a class of deep neural networks most commonly used to analyze visual imagery. Compared to other image classification algorithms, convolutional neural networks need minimal preprocessing, meaning the network learns the filters that are typically hand-engineered in other systems. Because CNNs operate with such independence from human effort, they offer many advantages over alternative algorithms. This is how a CNN model works:
In our previous approach, i.e., FNN, we defined a deep neural network with fully-connected layers using nn.Linear. For this one we will use a convolutional neural network, using the nn.Conv2d class from PyTorch. The 2D convolution is a fairly simple operation at heart: you start with a kernel, which is simply a small matrix of weights. This kernel “slides” over the 2D input data, performing an element-wise multiplication with the part of the input it is currently on, and then summing up the results into a single output pixel.
Most of our code for the CNN will be similar to the FNN code, with some exceptions. For the CNN, we define a function apply_kernel to perform the convolution. So let's define it:
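A plain-loop version of the sliding-kernel operation described above can be sketched as follows, assuming a single-channel 2D input, no padding, and a stride of 1. In a real model, nn.Conv2d does the same job far more efficiently.

```python
import torch


def apply_kernel(image, kernel):
    """Slide `kernel` over a 2D `image` (no padding, stride 1) and sum the products."""
    ri, ci = image.shape
    rk, ck = kernel.shape
    ro, co = ri - rk + 1, ci - ck + 1       # output size
    output = torch.zeros(ro, co)
    for i in range(ro):
        for j in range(co):
            # Element-wise product of the kernel and the patch it currently covers.
            output[i, j] = torch.sum(image[i:i + rk, j:j + ck] * kernel)
    return output


image = torch.tensor([[1., 2., 3.],
                      [4., 5., 6.],
                      [7., 8., 9.]])
kernel = torch.tensor([[1., 0.],
                       [0., 1.]])
print(apply_kernel(image, kernel))   # values: [[6, 8], [12, 14]]
```

The top-left output is 1·1 + 2·0 + 4·0 + 5·1 = 6, and the kernel then moves one pixel at a time to produce the rest.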
Model Definition:
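The exact architecture is in the linked notebook. As an illustration only, a small CNN for 3×48×48 inputs and 7 classes could be put together like this; the number of layers and the channel widths are assumptions.

```python
import torch
import torch.nn as nn

# Layer widths are assumptions; each MaxPool2d halves the spatial size: 48 -> 24 -> 12 -> 6.
cnn_model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2, 2),                                   # 32 x 24 x 24
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2, 2),                                   # 64 x 12 x 12
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2, 2),                                   # 128 x 6 x 6
    nn.Flatten(),
    nn.Linear(128 * 6 * 6, 256), nn.ReLU(),
    nn.Linear(256, 7),                                    # one logit per class
)

out = cnn_model(torch.rand(8, 3, 48, 48))
print(out.shape)   # torch.Size([8, 7])
```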
We can have a look at all the layers and their input and output sizes:
Now, we need to define the fit and evaluate functions. Since they are quite similar to those in the above approach, I will not embed them here. If you are interested in seeing the entire source code, you can find it here: Code
Without any training, let us evaluate our CNN model:
After training for a number of epochs with varying lr, I finally managed to get an accuracy of about 56%.
We can visualize everything we did with the CNN model.
Visualization:
After some time the loss tends to increase, which is a sign of overfitting, but in our case this amount of overfitting is acceptable.
I have tested some variations of the model and tried to push the accuracy a little further. You can have a look at all the different hyperparameters here:
Hence, we can see that with the FNN approach we attained a maximum accuracy of around 47%, but with the CNN approach we attain an accuracy of about 57%.
Now, let us also test our model's predictions with the CNN:
Predictions:
So, for this, first of all, we need to load our test dataset.
Finally, let's evaluate our model after the training:
Finally, we are done with the CNN approach for classifying our images and managed to get an accuracy of about 57%. Let us try one last approach, i.e., a Convolutional NN with Data Augmentation & Regularization Technique, and see if we can boost the accuracy further.
3.4 Convolutional NN with Data Augmentation & Regularization Technique Approach
In this approach, we try pre-trained models with small tweaks to the actual model. These are some of the changes we make in this approach:
Use test set for validation: Instead of setting aside a fraction (e.g. 10%) of the training data for validation, we'll simply use the test set as our validation set. This gives us a little more data to train with. In general, once you have picked the best model architecture and hyperparameters using a fixed validation set, it is a good idea to retrain the same model on the entire dataset to give it a small final boost in performance.
Channel-wise data normalization: We will normalize the image tensors by subtracting the mean and dividing by the standard deviation across each channel. As a result, the mean of the data across each channel is 0 and the standard deviation is 1. Normalizing the data prevents the values from any one channel from disproportionately affecting the losses and gradients during training simply because they have a higher or wider range than the others.
Randomized data augmentations: We will apply randomly chosen transformations while loading images from the training dataset. You can read more about transformations here.
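To make the channel-wise normalization concrete, here is a small sketch on a random stand-in batch; the helper names are hypothetical. In practice the statistics would be computed once over the training set.

```python
import torch


def channel_stats(images):
    """Per-channel mean and standard deviation of a (N, C, H, W) batch."""
    return images.mean(dim=(0, 2, 3)), images.std(dim=(0, 2, 3))


def normalize(images, mean, std):
    """Subtract the channel mean and divide by the channel std."""
    return (images - mean[None, :, None, None]) / std[None, :, None, None]


images = torch.rand(16, 3, 48, 48) * 5 + 2     # stand-in batch with an arbitrary range
mean, std = channel_stats(images)
normed = normalize(images, mean, std)
print(normed.mean(dim=(0, 2, 3)), normed.std(dim=(0, 2, 3)))   # close to 0 and 1
```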
In our case, we are going to use ResNet9, ResNet34, ResNet, and ResNet50 to find out which one fits best.
Most of the code for this approach is similar to the CNN code. The main change we make is here:
Data Transformations:
Also, in this approach, we don't have a separate test_ds; we use the test_ds as val_ds. So we end up with train_ds and val_ds only.
Now, one of the key changes to our CNN model this time is the addition of the residual block, which adds the original input back to the output feature map obtained by passing the input through one or more convolutional layers.
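A residual block of this kind can be sketched in a few lines. The class name, the channel count, and the use of two convolutions are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):   # hypothetical name and layer choice
    def __init__(self, channels):
        super().__init__()
        # padding=1 with a 3x3 kernel keeps the spatial size, so input and output can be added.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)    # add the original input back (skip connection)


block = ResidualBlock(16)
x = torch.rand(2, 16, 48, 48)
print(block(x).shape)   # torch.Size([2, 16, 48, 48])
```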
Model Definition:
For ResNet9:
Now, if we want to work with a pre-trained model, we need to make certain changes to it, like changing the number of inputs and outputs. So, for this, let us define a function:
Training:
Before we train the model, we will make some small changes:
Now that we are done with the model definition, let's evaluate our model before we start the training.
Also, in this technique, we need to set some hyperparameters before the training.
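The post does not list these changes or hyperparameters. Since the learning rates are plotted afterwards, one plausible reading is that the learning rate follows a schedule. The sketch below assumes one-cycle learning rate scheduling, weight decay, and gradient clipping, none of which is confirmed here, and all the values are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(48 * 48 * 3, 7))   # stand-in model
batches = [(torch.rand(8, 3, 48, 48), torch.randint(0, 7, (8,))) for _ in range(5)]

epochs, max_lr, weight_decay, grad_clip = 2, 0.01, 1e-4, 0.1      # assumed values
optimizer = torch.optim.Adam(model.parameters(), max_lr, weight_decay=weight_decay)
sched = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr, epochs=epochs, steps_per_epoch=len(batches))

lrs = []
for _ in range(epochs):
    for xb, yb in batches:
        loss = F.cross_entropy(model(xb), yb)
        loss.backward()
        nn.utils.clip_grad_value_(model.parameters(), grad_clip)   # cap gradient values
        optimizer.step()
        optimizer.zero_grad()
        lrs.append(optimizer.param_groups[0]["lr"])                # record lr for plotting
        sched.step()
```

The recorded `lrs` list has one entry per batch, which is what a learning-rate plot would show: a warm-up towards `max_lr` followed by a gradual decrease.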
After the training, let us visualize the model performance:
Visualization:
We can also plot the learning rates:
So finally, we are done with this model, and we come to the conclusion that it is far better than all the previous models, since it gave higher accuracy in less time. If you are interested in seeing the entire source code: ResNet Code
I have tweaked some of the hyperparameters and experimented with the model, as you can see in this list:
We can see that within “10min 57s” this model reached an accuracy of about 67%, which is the best of all the approaches tried so far. You can find the link to the whole source code for this approach here: ResNet Codes. With this we come to the end of our Facial Expression Recognition Project.
Conclusion:
First, we cleaned the dataset, did some pre-processing, and made our own dataset, which is published on Kaggle.
Dataset: Kaggle Dataset Page
Kaggle: All Notebooks
GitHub Repo: Containing all the code
Then we tried 4 different approaches to solve our problem:
Logistic Regression:
- val_loss: 1.62971
- val_acc: 38% (approx.)
- Notebook: Link
Feed-Forward Neural Network (FNN):
- val_loss: 1.61005
- val_acc: 47% (approx.)
- Notebook: Link
Convolutional Neural Network (CNN):
- val_loss: 1.44248
- val_acc: 57% (approx.)
- train_loss: 0.61862
- Notebook: Link
CNN with Data Augmentation & Regularization Technique:
- val_loss: 0.93374
- val_acc: 65% (approx.)
- train_loss: 0.73313
- Notebook: Link
So, in the end we come to the conclusion that with a CNN plus data augmentation and regularization techniques we can attain a good accuracy score. If we can also figure out some good hyperparameter and transformation combinations, we may be able to attain an accuracy of about 70–75% and even deploy our model to production.
Lastly, I would like to thank Aakash N S Sir for providing us with this course over the past 6 weeks. Before it, I didn't have any idea about Deep Learning, and today I have successfully managed to build my own model.
If you have made it this far, you can suggest some more techniques or approaches that I should try. If you would like to give any feedback, you can connect with me on various platforms:
Kaggle: manishshah120
LinkedIn: manishshah120
Twitter: manishshah120
GitHub: manishshah120
Translated from: https://medium.com/jovianml/can-computers-understand-our-emotions-c296ddf5f23f