Copyright © Microsoft Corporation. All rights reserved. Subject to the License terms.
The input layer has two features, $x_1$ and $x_2$:

$$
X=\begin{pmatrix} x_1 & x_2 \end{pmatrix}
$$

The hidden layer's 2×3 weight matrix W1:

$$
W1=\begin{pmatrix}
w^1_{11} & w^1_{12} & w^1_{13} \\
w^1_{21} & w^1_{22} & w^1_{23}
\end{pmatrix}
$$
The hidden layer's 1×3 bias matrix B1:
$$ B1=\begin{pmatrix} b^1_1 & b^1_2 & b^1_3 \end{pmatrix} $$
The hidden layer consists of three neurons.

The output layer's 3×3 weight matrix W2:

$$
W2=\begin{pmatrix}
w^2_{11} & w^2_{12} & w^2_{13} \\
w^2_{21} & w^2_{22} & w^2_{23} \\
w^2_{31} & w^2_{32} & w^2_{33}
\end{pmatrix}
$$

The output layer's 1×3 bias matrix B2:
$$ B2=\begin{pmatrix} b^2_1 & b^2_2 & b^2_3 \end{pmatrix} $$
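To make these shapes concrete, here is a minimal initialization sketch in numpy. It is illustrative only: the real parameters are created inside the HelperClass2 classes when InitialMethod.Xavier is selected, and the `xavier` helper below is a hypothetical stand-in.

```Python
import numpy as np

num_input, num_hidden, num_output = 2, 3, 3

def xavier(fan_in, fan_out):
    # Xavier/Glorot uniform initialization: limit = sqrt(6 / (fan_in + fan_out))
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=(fan_in, fan_out))

W1 = xavier(num_input, num_hidden)    # 2x3 hidden-layer weights
B1 = np.zeros((1, num_hidden))        # 1x3 hidden-layer bias
W2 = xavier(num_hidden, num_output)   # 3x3 output-layer weights
B2 = np.zeros((1, num_output))        # 1x3 output-layer bias
```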
The linear computation of the hidden layer:

$$ z^1_1 = x_1 w^1_{11} + x_2 w^1_{21} + b^1_1 $$

$$ z^1_2 = x_1 w^1_{12} + x_2 w^1_{22} + b^1_2 $$

$$ z^1_3 = x_1 w^1_{13} + x_2 w^1_{23} + b^1_3 $$

$$ Z1 = X \cdot W1 + B1 $$
The hidden layer's activation function is Sigmoid:

$$ a^1_1 = Sigmoid(z^1_1) $$

$$ a^1_2 = Sigmoid(z^1_2) $$

$$ a^1_3 = Sigmoid(z^1_3) $$

$$ A1 = Sigmoid(Z1) $$
The linear computation of the output layer:

$$ z^2_1 = a^1_1 w^2_{11} + a^1_2 w^2_{21} + a^1_3 w^2_{31} + b^2_1 $$

$$ z^2_2 = a^1_1 w^2_{12} + a^1_2 w^2_{22} + a^1_3 w^2_{32} + b^2_2 $$

$$ z^2_3 = a^1_1 w^2_{13} + a^1_2 w^2_{23} + a^1_3 w^2_{33} + b^2_3 $$

$$ Z2 = A1 \cdot W2 + B2 $$
The output layer's classification function is Softmax:

$$ a^2_1 = {e^{z^2_1} \over e^{z^2_1} + e^{z^2_2} + e^{z^2_3}} $$

$$ a^2_2 = {e^{z^2_2} \over e^{z^2_1} + e^{z^2_2} + e^{z^2_3}} $$

$$ a^2_3 = {e^{z^2_3} \over e^{z^2_1} + e^{z^2_2} + e^{z^2_3}} $$

$$ A2 = Softmax(Z2) $$
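A minimal numpy sketch of this forward pass, assuming X holds one sample per row; the row-wise max subtraction in `softmax` is a standard numerical-stability trick I have added, and it does not change the result:

```Python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # subtract the row-wise max for numerical stability
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

def forward(X, W1, B1, W2, B2):
    Z1 = np.dot(X, W1) + B1    # (m,2)·(2,3) -> (m,3)
    A1 = sigmoid(Z1)           # (m,3)
    Z2 = np.dot(A1, W2) + B2   # (m,3)·(3,3) -> (m,3)
    A2 = softmax(Z2)           # (m,3), each row sums to 1
    return Z1, A1, Z2, A2
```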
We use the multi-class cross-entropy loss function:

$$ loss = -(y_1 \ln a^2_1 + y_2 \ln a^2_2 + y_3 \ln a^2_3) $$

$$ J(w,b) = -{1 \over m} \sum^m_{i=1} \sum^n_{j=1} y_{ij} \ln (a^2_{ij}) $$

where m is the number of samples and n is the number of classes.
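In numpy the batch loss $J(w,b)$ is essentially one line. A sketch, assuming Y is the one-hot label matrix of shape (m, n); the small constant inside the log is my addition to guard against $\ln 0$:

```Python
import numpy as np

def cross_entropy(A2, Y):
    m = Y.shape[0]
    # mean over samples of -sum_j y_ij * ln(a_ij)
    return -np.sum(Y * np.log(A2 + 1e-12)) / m
```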
Section 7.1 covered the backpropagation derivation for Softmax paired with multi-class cross-entropy; the end result is a very simple subtraction:
$$ {\partial loss \over \partial Z2}=A2-y => dZ2 $$
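This shortcut is easy to verify numerically. The sketch below (all names are illustrative) compares the analytic gradient $A2-y$ against a central finite-difference estimate of the loss for a single sample:

```Python
import numpy as np

z = np.array([1.0, 2.0, 0.5])   # logits for one sample
y = np.array([0.0, 1.0, 0.0])   # one-hot label

def loss_of(z):
    a = np.exp(z - z.max())
    a /= a.sum()                # softmax
    return -np.sum(y * np.log(a))

a = np.exp(z - z.max())
a /= a.sum()
analytic = a - y                # the shortcut: dZ2 = A2 - y

eps = 1e-6
numeric = np.array([
    (loss_of(z + eps * np.eye(3)[k]) - loss_of(z - eps * np.eye(3)[k])) / (2 * eps)
    for k in range(3)
])
print(np.allclose(analytic, numeric, atol=1e-6))   # True
```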
Propagating backward from Z2 is exactly the same as in Section 10.2, so we take the conclusions directly:
$$ {\partial loss \over \partial W2}=A1^T \cdot dZ2 => dW2 $$

$$ {\partial{loss} \over \partial{B2}}=dZ2 => dB2 $$

$$ {\partial A1 \over \partial Z1}=A1 \odot (1-A1) => dA1 $$

$$ {\partial loss \over \partial Z1}=dZ2 \cdot W2^T \odot dA1 => dZ1 $$

$$ dW1=X^T \cdot dZ1 $$

$$ dB1=dZ1 $$
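Together, these six formulas give one backward/update step. A sketch, assuming the batch gradients are averaged over the m samples (the 1/m matches the definition of J above; summing dZ over the batch axis is the batch form of dB = dZ):

```Python
import numpy as np

def backward(X, Y, A1, A2, W2):
    m = X.shape[0]
    dZ2 = A2 - Y                                   # Softmax + cross-entropy shortcut
    dW2 = np.dot(A1.T, dZ2) / m                    # (3,m)·(m,3) -> (3,3)
    dB2 = np.sum(dZ2, axis=0, keepdims=True) / m   # (1,3)
    dA1 = A1 * (1 - A1)                            # Sigmoid derivative
    dZ1 = np.dot(dZ2, W2.T) * dA1                  # (m,3)·(3,3), elementwise * (m,3)
    dW1 = np.dot(X.T, dZ1) / m                     # (2,m)·(m,3) -> (2,3)
    dB1 = np.sum(dZ1, axis=0, keepdims=True) / m   # (1,3)
    return dW1, dB1, dW2, dB2

def update(W1, B1, W2, B2, grads, eta=0.1):
    # plain gradient descent with learning rate eta
    dW1, dB1, dW2, dB2 = grads
    return W1 - eta * dW1, B1 - eta * dB1, W2 - eta * dW2, B2 - eta * dB2
```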
Most of the code is implemented in the base classes in the HelperClass2 directory; only the main procedure is shown here:
```Python
import numpy as np
import matplotlib.pyplot as plt

from HelperClass2.DataReader import *
from HelperClass2.HyperParameters2 import *
from HelperClass2.NeuralNet2 import *
from HelperClass2.Visualizer import *

train_data_name = "../../Data/ch11.train.npz"
test_data_name = "../../Data/ch11.test.npz"

if __name__ == '__main__':
    # read data
    dataReader = DataReader(train_data_name, test_data_name)
    dataReader.ReadData()
    dataReader.NormalizeY(YNormalizationMethod.MultipleClassifier, base=1)
    # show source data
    fig = plt.figure(figsize=(6,6))
    ShowDataByOneHot2D(dataReader.XTrainRaw[:,0], dataReader.XTrainRaw[:,1], dataReader.YTrain)
    plt.show()
    # data operation
    dataReader.NormalizeX()
    dataReader.Shuffle()
    dataReader.GenerateValidationSet()
    # construct hyper-parameters
    n_input = dataReader.num_feature
    n_hidden = 3
    n_output = dataReader.num_category
    eta, batch_size, max_epoch = 0.1, 10, 5000
    eps = 0.1
    hp = HyperParameters2(n_input, n_hidden, n_output, eta, max_epoch, batch_size, eps, NetType.MultipleClassifier, InitialMethod.Xavier)
    # create net and train
    net = NeuralNet2(hp, "Bank_233")
    net.train(dataReader, 100, True)
    net.ShowTrainingTrace()
    # show result
    fig = plt.figure(figsize=(6,6))
    ShowDataByOneHot2D(dataReader.XTrain[:,0], dataReader.XTrain[:,1], dataReader.YTrain)
    ShowClassificationResult25D(net, 50)
    plt.show()
```
Process description:

First, the original sample distribution is displayed:

The loss function history:

Training ran for the full 5000 epochs without reaching the stopping condition of loss less than 0.1.

The classification result:

Because the accuracy requirement was not met, the classification is only mediocre. In the result plot the outer circle is fitted reasonably well, but the inner square is still far off.

Printed output:
```
......
epoch=4999, total_iteration=449999
loss_train=0.225935, accuracy_train=0.800000
loss_valid=0.137970, accuracy_valid=0.960000
W= [[ -8.30315494   9.98115605   0.97148346]
 [ -5.84460922  -4.09908698 -11.18484376]]
B= [[ 4.85763475 -5.61827538  7.94815347]]
W= [[-32.28586038  -8.60177788  41.51614172]
 [-33.68897413  -7.93266621  42.09333288]
 [ 34.16449693   7.93537692 -41.19340947]]
B= [[-11.11937314   3.45172617   7.66764697]]
testing...
0.952
```
The final test classification accuracy is 0.952.
Code location: ch11, Level1