How to build a three-layer neural network from scratch

by Daphne Cornelisse

In this post, I will go through the steps required to build a three-layer neural network. I'll work through a problem and explain the process and the most important concepts along the way.

The problem to solve

A farmer in Italy was having a problem with his labelling machine: it mixed up the labels of three wine cultivars. Now he has 178 bottles left, and nobody knows which cultivar each one came from! To help this poor man, we will build a classifier that recognizes the wine cultivar based on 13 attributes of the wine.

The fact that our data is labeled (with one of the three cultivars' labels) makes this a supervised learning problem. Essentially, what we want to do is take our input data (the 178 unclassified wine bottles), put it through our neural network, and then get the right cultivar label for each bottle as the output.

We will train our algorithm to get better and better at predicting (y-hat) which bottle belongs to which label.

Now it is time to start building the neural network!

Approach

Building a neural network is almost like building a very complicated function, or putting together a very difficult recipe. In the beginning, the ingredients or steps you will have to take can seem overwhelming. But if you break everything down and do it step by step, you will be fine.

In short:

  • The input layer (x) consists of 13 neurons, one for each of the 13 wine attributes (the dataset has 178 samples).
  • A1, the first layer, consists of 8 neurons.
  • A2, the second layer, consists of 5 neurons.
  • A3, the third and output layer, consists of 3 neurons.

Step 1: the usual prep

Import all the necessary libraries (NumPy, scikit-learn, pandas) and the dataset, and define X and y.

# Importing all the libraries and the dataset
import pandas as pd
import numpy as np

df = pd.read_csv('../input/W1data.csv')
df.head()

# Package imports

# Matplotlib
import matplotlib
import matplotlib.pyplot as plt

# SciKitLearn is a machine learning utilities library
import sklearn

# The sklearn dataset module helps generating datasets
import sklearn.datasets
import sklearn.linear_model
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import accuracy_score
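
The post says to define X and y, but that part of the code is not shown in this copy. A minimal sketch of what it could look like, assuming the 13 attribute columns come first in W1data.csv and the label lives in a column named 'Cultivar' (both the layout and the column name are assumptions, not confirmed by the post):

# Hypothetical sketch: the exact column layout of W1data.csv is not shown in the post
X = df.iloc[:, :13].values                    # shape (178, 13): one row per bottle

# One-hot encode the three cultivar labels so y has shape (178, 3)
encoder = OneHotEncoder(sparse_output=False)  # use sparse=False on older scikit-learn versions
y = encoder.fit_transform(df[['Cultivar']])
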
Step 2: initialization

Before we can use our weights, we have to initialize them. Because we don’t have values to use for the weights yet, we use random values between 0 and 1.

In Python, the np.random.seed function seeds the random number generator. The "random numbers" it then produces are not truly random. They are pseudorandom, meaning the numbers are generated by a complicated formula that makes them look random. In order to generate a number, the formula takes the previously generated value as its input. If there is no previous value, it often takes the current time as the first value.

That is why we seed the generator — to make sure that we always get the same random numbers. We provide a fixed value that the number generator can start with, which is zero in this case.

np.random.seed(0)
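
The initialisation function itself is not shown in this copy of the post, even though initialise_parameters(...) is called in the training step later on. Below is a minimal sketch consistent with that call. Note that the post describes hidden layers of 8 and 5 neurons while the later call passes a single nn_hdim, so using the same size for both hidden layers here is an assumption:

def initialise_parameters(nn_input_dim, nn_hdim, nn_output_dim):
    # Weights start as random values between 0 and 1, biases start at 0
    np.random.seed(0)
    model = {
        'W1': np.random.rand(nn_input_dim, nn_hdim),
        'b1': np.zeros((1, nn_hdim)),
        'W2': np.random.rand(nn_hdim, nn_hdim),
        'b2': np.zeros((1, nn_hdim)),
        'W3': np.random.rand(nn_hdim, nn_output_dim),
        'b3': np.zeros((1, nn_output_dim)),
    }
    return model
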
Step 3: forward propagation

There are roughly two parts to training a neural network. First, you propagate forward through the NN. That is, you are "making steps" forward and comparing those results with the real values to get the difference between your output and what it should be. You basically see how the NN is doing and find the errors.

After we have initialized the weights with a pseudo-random number, we take a linear step forward. We calculate this by taking the dot product of our input A0 with the randomly initialized weights, and adding a bias. We start with a bias of 0. This is represented as:
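
The formula image from the original post is missing in this copy. Written out to match the forward_prop code below, the first linear step is:

z1 = a0 · W1 + b1

Here a0 is the input matrix, W1 the first weight matrix, and b1 the first bias vector.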

Now we take our z1 (our linear step) and pass it through our first activation function. Activation functions are very important in neural networks. Essentially, they convert an input signal to an output signal — this is why they are also known as Transfer functions. They introduce non-linear properties to our functions by converting the linear input to a non-linear output, making it possible to represent more complex functions.

There are different kinds of activation functions (explained in depth in this article). For this model, we chose to use the tanh activation function for our two hidden layers — A1 and A2 — which gives us an output value between -1 and 1.

Since this is a multi-class classification problem (we have 3 output labels), we will use the softmax function for the output layer — A3 — because this will compute the probabilities for the classes by spitting out a value between 0 and 1.
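
The softmax function itself is never defined in the post's snippets, even though forward_prop below calls it. A common, numerically stable version (the exact form the author used is an assumption) looks like this:

def softmax(z):
    # Subtract the row-wise maximum for numerical stability, then normalise
    # each row so the three class scores sum to 1
    exp_scores = np.exp(z - np.max(z, axis=1, keepdims=True))
    return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)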

By passing z1 through the activation function, we have created our first hidden layer — A1 — which can be used as input for the computation of the next linear step, z2.

In Python, this process looks like this:

# This is the forward propagation function
def forward_prop(model, a0):
    # Load parameters from model
    W1, b1, W2, b2, W3, b3 = model['W1'], model['b1'], model['W2'], model['b2'], model['W3'], model['b3']
    # Do the first linear step
    z1 = a0.dot(W1) + b1
    # Put it through the first activation function
    a1 = np.tanh(z1)
    # Second linear step
    z2 = a1.dot(W2) + b2
    # Put through the second activation function
    a2 = np.tanh(z2)
    # Third linear step
    z3 = a2.dot(W3) + b3
    # For the third linear step we use the softmax function
    a3 = softmax(z3)
    # Store all results in these values
    cache = {'a0': a0, 'z1': z1, 'a1': a1, 'z2': z2, 'a2': a2, 'a3': a3, 'z3': z3}
    return cache

In the end, all our values are stored in the cache.

Step 4: backwards propagation

After we forward propagate through our NN, we backward propagate our error gradient to update our weight parameters. We know our error, and want to minimize it as much as possible.

We do this by taking the derivative of the error function, with respect to the weights (W) of our NN, using gradient descent.

Let's visualize this process with an analogy.

Imagine you went out for a walk in the mountains during the afternoon. But now it's an hour later and you are a bit hungry, so it's time to go home. The only problem is that it is dark and there are many trees, so you can't see either your home or where you are. Oh, and you forgot your phone at home.

But then you remember your house is in a valley, the lowest point in the whole area. So if you just walk down the mountain step by step until you don’t feel any slope, in theory you should arrive at your home.

So there you go, step by step carefully going down. Now think of the mountain as the loss function, and you are the algorithm, trying to find your home (i.e. the lowest point). Every time you take a step downwards, we update your location coordinates (the algorithm updates the parameters).

The loss function is represented by the mountain. To get to a low loss, the algorithm follows the slope — that is the derivative — of the loss function.

When we walk down the mountain, we are updating our location coordinates. The algorithm updates the parameters of the neural network. By getting closer to the minimum point, we are approaching our goal of minimizing our error.

In reality, gradient descent looks more like this:

We always start with calculating the slope of the loss function with respect to z, the slope of the linear step we take.

Notation is as follows: dv is the derivative of the loss function, with respect to a variable v.

Next we calculate the slope of the loss function with respect to our weights and biases. Because this is a 3-layer NN, we will iterate this process for z3, z2, z1 together with W3, W2, W1 and b3, b2, b1, propagating backwards from the output to the input layer.
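
The derivative images from the original post are missing in this copy. Written out to match the backward_prop code below (with ∘ meaning element-wise multiplication, m the number of samples, and dz3 = a3 − y assuming the usual softmax plus cross-entropy pairing), the gradients are:

dz3 = a3 − y
dW3 = (1/m) · a2ᵀ · dz3          db3 = (1/m) · Σ dz3

dz2 = (dz3 · W3ᵀ) ∘ (1 − a2²)
dW2 = (1/m) · a1ᵀ · dz2          db2 = (1/m) · Σ dz2

dz1 = (dz2 · W2ᵀ) ∘ (1 − a1²)
dW1 = (1/m) · a0ᵀ · dz1          db1 = (1/m) · Σ dz1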

This is how this process looks in Python:

# This is the backward propagation function
def backward_prop(model, cache, y):
    # Load parameters from model
    W1, b1, W2, b2, W3, b3 = model['W1'], model['b1'], model['W2'], model['b2'], model['W3'], model['b3']
    # Load forward propagation results
    a0, a1, a2, a3 = cache['a0'], cache['a1'], cache['a2'], cache['a3']
    # Get number of samples
    m = y.shape[0]
    # Calculate loss derivative with respect to the output
    dz3 = loss_derivative(y=y, y_hat=a3)
    # Calculate loss derivative with respect to third layer weights
    dW3 = 1/m * (a2.T).dot(dz3)
    # Calculate loss derivative with respect to third layer bias
    db3 = 1/m * np.sum(dz3, axis=0)
    # Calculate loss derivative with respect to second layer
    dz2 = np.multiply(dz3.dot(W3.T), tanh_derivative(a2))
    # Calculate loss derivative with respect to second layer weights
    dW2 = 1/m * np.dot(a1.T, dz2)
    # Calculate loss derivative with respect to second layer bias
    db2 = 1/m * np.sum(dz2, axis=0)
    # Repeat the same steps for the first layer
    dz1 = np.multiply(dz2.dot(W2.T), tanh_derivative(a1))
    dW1 = 1/m * np.dot(a0.T, dz1)
    db1 = 1/m * np.sum(dz1, axis=0)
    # Store gradients
    grads = {'dW3': dW3, 'db3': db3, 'dW2': dW2, 'db2': db2, 'dW1': dW1, 'db1': db1}
    return grads
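
backward_prop relies on two helper functions that this copy of the post never defines: loss_derivative and tanh_derivative. Sketches that are consistent with how they are used above (a softmax output layer with one-hot labels, and tanh activations whose outputs are stored in the cache) could look like this:

def loss_derivative(y, y_hat):
    # For a softmax output with cross-entropy loss and one-hot labels,
    # the derivative of the loss with respect to z3 reduces to (y_hat - y)
    return y_hat - y

def tanh_derivative(x):
    # x is already the output of tanh (a1 or a2 from the cache),
    # so the derivative is 1 - tanh(z)^2 = 1 - x^2
    return 1 - np.power(x, 2)
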
Step 5: the training phase

In order to reach the optimal weights and biases that will give us the desired output (the three wine cultivars), we will have to train our neural network.

I think this is very intuitive. For almost anything in life, you have to train and practice many times before you are good at it. Likewise, a neural network will have to undergo many epochs or iterations to give us an accurate prediction.

When you are learning anything, let's say you are reading a book, you have a certain pace. This pace should not be too slow, as reading the book will take ages. But it should not be too fast either, since you might miss a very valuable lesson in the book.

In the same way, you have to specify a “learning rate” for the model. The learning rate is the multiplier to update the parameters. It determines how rapidly they can change. If the learning rate is low, training will take longer. However, if the learning rate is too high, we might miss a minimum. The learning rate is expressed as:
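
The formula image is missing from this copy of the post. Using the notation defined in the list below, the parameter update is the usual gradient-descent step:

w := w − a · dL(w)

In other words, each weight moves a small step, scaled by the learning rate a, in the direction that decreases the loss.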

  • := means that this is a definition, not an equation or proven statement.
  • a is the learning rate, called alpha.
  • dL(w) is the derivative of the total loss with respect to our weight w.
  • da is the derivative of alpha.

We chose a learning rate of 0.07 after some experimenting.

# This is what we return at the end
model = initialise_parameters(nn_input_dim=13, nn_hdim=5, nn_output_dim=3)
model = train(model, X, y, learning_rate=0.07, epochs=4500, print_loss=True)
plt.plot(losses)
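
The train function (and the parameter update it performs) is not shown in this copy of the post either. A minimal sketch that ties forward_prop, backward_prop and the learning rate together, with an assumed cross-entropy loss for the losses list that gets plotted, might look like this (in a runnable notebook these definitions would of course come before the call above):

losses = []

def update_parameters(model, grads, learning_rate):
    # Plain gradient-descent step: move each parameter against its gradient
    for key in ['W1', 'b1', 'W2', 'b2', 'W3', 'b3']:
        model[key] -= learning_rate * grads['d' + key]
    return model

def train(model, X, y, learning_rate, epochs, print_loss=False):
    for epoch in range(epochs):
        cache = forward_prop(model, X)             # forward pass
        grads = backward_prop(model, cache, y)     # backward pass
        model = update_parameters(model, grads, learning_rate)
        # Track the cross-entropy loss so it can be plotted afterwards
        loss = -np.sum(y * np.log(cache['a3'] + 1e-8)) / y.shape[0]
        losses.append(loss)
        if print_loss and epoch % 100 == 0:
            print('Loss after epoch %i: %f' % (epoch, loss))
    return model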

Finally, we can plot the training curve. You can plot your accuracy and/or loss to get a nice graph of your prediction accuracy. After 4,500 epochs, our algorithm has an accuracy of 99.4382022472%.

Brief summary

We start by feeding data into the neural network and perform several matrix operations on this input data, layer by layer. For each of our three layers, we take the dot product of the input with the weights and add a bias. Next, we pass this output through an activation function of choice.

The output of this activation function is then used as an input for the following layer to follow the same procedure. This process is iterated three times since we have three layers. Our final output is y-hat, which is the prediction on which wine belongs to which cultivar. This is the end of the forward propagation process.

We then calculate the difference between our prediction (y-hat) and the expected output (y) and use this error value during backpropagation.

During backpropagation, we take our error — the difference between our prediction y-hat and y — and we mathematically push it back through the NN in the other direction. We are learning from our mistakes.

By taking the derivative of the functions we used during the first process, we try to discover what value we should give the weights in order to achieve the best possible prediction. Essentially we want to know what the relationship is between the value of our weight and the error that we get out as the result.

And after many epochs or iterations, the NN has learned to give us more accurate predictions by adapting its parameters to our dataset.

This post was inspired by the week 1 challenge from the Bletchley Machine Learning Bootcamp that started on the 7th of February. In the coming nine weeks, I’m one of 50 students who will go through the fundamentals of Machine Learning. Every week we discuss a different topic and have to submit a challenge, which requires you to really understand the materials.

If you have any questions or suggestions, let me know!

Or if you want to check out the whole code, you can find it here on Kaggle.

Recommended videos to get a deeper understanding on neural networks:

  • 3Blue1Brown’s series on neural networks
  • Siraj Raval’s series on Deep Learning

Translated from: https://www.freecodecamp.org/news/building-a-3-layer-neural-network-from-scratch-99239c4af5d3/
