Building an Artificial Neural Network from Scratch Using Python and NumPy

Necessary Packages

  • matplotlib.pyplot: pyplot is a collection of command-style functions that make matplotlib work like MATLAB.

  • sklearn (scikit-learn): for machine learning utilities.

  • numpy: for dot products, matrix multiplication, and other array operations.

Install these packages using pip:

pip install matplotlib
pip install scikit-learn
pip install numpy

Importing Necessary Functions

  • fetch_openml: for downloading the data.

  • classification_report, confusion_matrix: for checking the accuracy of the model.

  • train_test_split: for splitting the data into a train set and a test set.
# %matplotlib inline
from sklearn.datasets import fetch_openml
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
plt.style.use('ggplot')

Downloading the Data

Here I download the MNIST dataset and normalize it.

What is MNIST?

  • MNIST contains 70,000 images of hand-written single digits between 0 and 9 (60,000 for training and 10,000 for testing). Each image is 28 x 28 pixels in greyscale, with pixel values from 0 to 255.
# Download MNIST and scale pixel values to the [0, 1] range
mnist = fetch_openml('mnist_784', as_frame=False)  # as_frame=False returns numpy arrays (newer scikit-learn versions default to pandas)
X, y = mnist["data"], mnist["target"]
X = X / 255
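
As a quick sanity check (assuming the download succeeded), you can inspect the shapes; the full dataset should contain 70,000 flattened images of 784 pixels each:

print(X.shape, y.shape)  # expected: (70000, 784) (70000,)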

Let's look at some random images with their targets.

fig, axes = plt.subplots(2, 10, figsize=(16, 6))
for i in range(20):
    axes[i // 10, i % 10].imshow(X[i].reshape((28, 28)), cmap='gray')
    axes[i // 10, i % 10].axis('off')
    axes[i // 10, i % 10].set_title(f"target: {y[i]}")
Figure: random images with their targets

Preprocessing

One-hot encoding: each label (0 to 9) is converted into a 10-dimensional vector that is 1 at the label's index and 0 everywhere else.

digits = 10
examples = y.shape[0]
y = y.reshape(1, examples)
# Indexing the 10 x 10 identity matrix by the integer labels selects the corresponding one-hot rows
Y_new = np.eye(digits)[y.astype('int32')]
Y_new = Y_new.T.reshape(digits, examples)
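
To see what this indexing trick does, here is a tiny example (not part of the original notebook) on three made-up labels:

labels = np.array([3, 0, 7])
print(np.eye(10)[labels])
# each row is a 10-dimensional one-hot vector with a 1 at positions 3, 0 and 7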

Splitting the data:

m = 60000  # number of training samples (images)
m_test = X.shape[0] - m  # the remaining 10,000 images are used for testing
# Transpose so that each column is one example (shape: 784 x m)
X_train, X_test = X[:m].T, X[m:].T
Y_train, Y_test = Y_new[:, :m], Y_new[:, m:]
# Shuffle the training set
shuffle_index = np.random.permutation(m)
X_train, Y_train = X_train[:, shuffle_index], Y_train[:, shuffle_index]
print(X_train.shape)
print(Y_train.shape)

Show an image:

i = 12
plt.imshow(X_train[:,i].reshape(28,28), cmap = "gray")
plt.axis("off")
plt.show()
print(Y_train[:, i])  # one-hot target for this image

I have also uploaded a video version of this tutorial on YouTube (in Hindi); you can watch it if you like:

Finally, let's build the ANN

Figure: artificial neural network (ANN) diagram

So here we have:

  • An input node with some inputs (real numbers x1, x2, …, xn), their weights (real numbers w1, w2, …, wn), and a bias (a real number).

  • These parameters (weights and bias) connect to our hidden nodes, where we compute the weighted sum (sigma, or z) of all inputs and their weights, then apply a non-linear activation function (such as sigmoid or tanh); this produces our final output (y).

Now, in our model we have a 28 x 28 pixel image (784 pixels in total). These pixels are our inputs: they go to the input node, then to the hidden node (a single hidden layer), which then generates an output (a single digit between 0 and 9). A minimal sketch of this forward computation is shown below.
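
As an illustration only (the array names and sizes here are made up, not the ones used later in the tutorial), a single layer simply computes a weighted sum plus a bias and passes it through an activation:

# Hypothetical toy example of one layer's forward computation
x = np.random.rand(784, 1)    # one flattened 28 x 28 image as a column vector
W = np.random.randn(64, 784)  # weights of a layer with 64 hidden nodes
b = np.zeros((64, 1))         # bias
z = np.dot(W, x) + b          # weighted sum (sigma / z)
a = 1. / (1. + np.exp(-z))    # sigmoid activation -> the node outputs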

Sigmoid Activation Function

Here our y-hat (the output of a node) is the sigmoid of the weighted input plus the bias: y_hat = sigmoid(w · x + b), where sigmoid(z) = 1 / (1 + e^(−z)).

Implementing the sigmoid activation function:

# activation: sigmoid
def sigmoid(x):
    return 1. / (1. + np.exp(-x))

Cross-entropy Loss (a.k.a. Cost, Error) Function

For n classes and a single sample (i.e. n digits and a single image), the formula is:

L = − Σ_{i=1..n} y_i · log(ŷ_i)

where y_i is the one-hot target and ŷ_i is the predicted probability for class i.

But for n classes and multiple (m) samples (i.e. n digits and m images), the formula averages over the samples:

L = − (1/m) · Σ_{j=1..m} Σ_{i=1..n} y_i^(j) · log(ŷ_i^(j))

Implementing the cross-entropy loss function:

# cross-entropy for our cost function
def compute_multiclass_loss(Y, Y_hat):
    # Y and Y_hat have shape (digits, m): one column per example
    L_sum = np.sum(np.multiply(Y, np.log(Y_hat)))
    m = Y.shape[1]
    L = -(1/m) * L_sum
    return L
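
As a quick, hypothetical sanity check (not in the original notebook), the loss should be near zero when the predictions match the one-hot targets almost perfectly:

Y_demo = np.eye(10)[:, :3]                           # three fake one-hot targets as columns
Y_hat_demo = np.clip(Y_demo, 1e-9, 1 - 1e-9)         # near-perfect predictions (avoid log(0))
print(compute_multiclass_loss(Y_demo, Y_hat_demo))   # prints a value very close to 0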

Back-propagation Using the Gradient Descent Algorithm


Back-propagation is simply a way of propagating the total loss back into the neural network to find out how much of the loss each node is responsible for, and then updating the weights in a way that minimizes the loss: nodes with higher error rates get lower weights, and vice versa.

Gradient Descent

Gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. To find a local minimum of a function using gradient descent, we take steps proportional to the negative of the gradient of the function at the current point.

Formula: new weight = previous weight − learning rate × gradient

Figure: the gradient descent algorithm
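
To make the update rule concrete, here is a tiny, self-contained example (purely illustrative, not part of the original model) that minimizes f(w) = (w − 3)^2 with gradient descent:

w = 0.0                   # initial weight
lr = 0.1                  # learning rate
for step in range(100):
    grad = 2 * (w - 3)    # derivative of (w - 3)^2
    w = w - lr * grad     # new weight = previous weight - learning rate * gradient
print(w)                  # converges towards the minimum at w = 3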

Computing the Gradients
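
The original article shows the gradient derivation as images; summarizing the expressions that the training loop below actually implements (with A2 the softmax output, A1 the hidden activation, and m the number of samples):

dZ2 = A2 − Y
dW2 = (1/m) · dZ2 · A1ᵀ,  db2 = (1/m) · Σ dZ2
dZ1 = (W2ᵀ · dZ2) ⊙ σ(Z1) ⊙ (1 − σ(Z1))
dW1 = (1/m) · dZ1 · Xᵀ,  db1 = (1/m) · Σ dZ1

Here ⊙ is element-wise multiplication, and σ(Z1)(1 − σ(Z1)) is the derivative of the sigmoid.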

Finally, let's implement it and train our model

n_x = X_train.shape[0]  # input size: 784 pixels
n_h = 64                # hidden layer size
digits = 10             # output classes
learning_rate = 1
epochs = 2000

Initializing the weights and biases:

W1 = np.random.randn(n_h, n_x)     # hidden-layer weights, initialized randomly
b1 = np.zeros((n_h, 1))            # hidden-layer bias
W2 = np.random.randn(digits, n_h)  # output-layer weights
b2 = np.zeros((digits, 1))         # output-layer bias
X = X_train
Y = Y_train

Now, training starts:

for i in range(epochs):
    # Forward pass
    Z1 = np.matmul(W1, X) + b1
    A1 = sigmoid(Z1)
    Z2 = np.matmul(W2, A1) + b2
    A2 = np.exp(Z2) / np.sum(np.exp(Z2), axis=0)  # softmax over the 10 classes
    cost = compute_multiclass_loss(Y, A2)

    # Backward pass (gradients as summarized above)
    dZ2 = A2 - Y
    dW2 = (1./m) * np.matmul(dZ2, A1.T)
    db2 = (1./m) * np.sum(dZ2, axis=1, keepdims=True)
    dA1 = np.matmul(W2.T, dZ2)
    dZ1 = dA1 * sigmoid(Z1) * (1 - sigmoid(Z1))  # sigmoid derivative
    dW1 = (1./m) * np.matmul(dZ1, X.T)
    db1 = (1./m) * np.sum(dZ1, axis=1, keepdims=True)

    # Gradient descent update
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1

    if (i % 100 == 0):
        print("Epoch", i, "cost: ", cost)

print("Final cost:", cost)
Figure: model loss (cost) over epochs

Generating our predictions and checking accuracy:

# Forward pass on the test set
Z1 = np.matmul(W1, X_test) + b1
A1 = sigmoid(Z1)
Z2 = np.matmul(W2, A1) + b2
A2 = np.exp(Z2) / np.sum(np.exp(Z2), axis=0)
predictions = np.argmax(A2, axis=0)
labels = np.argmax(Y_test, axis=0)
print(confusion_matrix(predictions, labels))
print(classification_report(predictions, labels))
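
If you just want a single accuracy number instead of the full report, a one-line check (not in the original notebook) is:

print(np.mean(predictions == labels))  # fraction of correctly classified test images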

Okay, we got about 92% accuracy, which is pretty good.

Code for this tutorial: https://github.com/madhav727/deep-learning-tutorial/blob/master/ann_scratch.ipynb

Translated from: https://medium.com/analytics-vidhya/artificial-neural-network-from-scratch-using-python-numpy-580e9bacd67c
