1. Training data
The MNIST dataset:
Training set (train): 50,000 images, used for training
Validation set (validation): 10,000 images, used for self-checking during training
Test set (test): 10,000 images, used for the final evaluation (a loading sketch follows below)
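For reference, here is a minimal sketch of loading the three splits. It assumes the companion mnist_loader module that ships with the book's code repository; that module and its load_data_wrapper function are assumptions here, not something defined in these notes.
import mnist_loader  # assumed helper module from the book's code repo

training_data, validation_data, test_data = mnist_loader.load_data_wrapper()

# Wrap in list() in case the loader returns iterators rather than lists.
training_data = list(training_data)
validation_data = list(validation_data)
test_data = list(test_data)
print(len(training_data))    # 50000
print(len(validation_data))  # 10000
print(len(test_data))        # 10000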
2.神经网络初始化
class Network(object):
    def __init__(self, sizes):
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]
Explanation:
__init__: runs whenever the class is instantiated; it is Python's constructor, much like a class constructor in Java.
self: refers to the current instance, similar to this in Java.
sizes: how many layers the network has and how many neurons are in each layer, e.g. net = Network([2, 3, 1]) builds a network with 2 neurons in the first layer, 3 in the second and 1 in the third.
num_layers = len(sizes): the number of layers in the network.
biases: bias initialization; the values are drawn from a standard normal (Gaussian) distribution by np.random.randn, and every neuron outside the input layer gets one bias.
weights: weight initialization, also drawn from a standard normal distribution; each connection (arrow) between two neurons has its own weight.
randn: random samples from the standard normal distribution.
To make this easier to follow, run the following lines on their own:
sizes=[2,3,1]
bias=[np.random.randn(y, 1) for y in sizes[1:]]
print(bias)
Output: a list containing two arrays, one of shape 3×1 and one of shape 1×1. np.random.randn(y, 1) returns a y-row, 1-column array whose entries are drawn at random from a standard normal (Gaussian) distribution.
[array([[-0.2310922 ],
[-0.33350782],
[ 0.88558646]]), array([[ 1.51042319]])]
Process finished with exit code 0
sizes[1:]: every element of sizes except the first one.
sizes[:-1]: every element of sizes except the last one.
for x, y in zip(sizes[:-1], sizes[1:]): on each iteration x and y take one value from each of the two zipped lists, so every (x, y) pair is (size of one layer, size of the next layer).
net.weights[1]: stores the weights connecting the second and third layers; a quick check of the shapes is shown below.
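A small check of how zip pairs the layer sizes and what shapes the resulting weight matrices have (only shapes are printed, since the actual entries are random):
import numpy as np

sizes = [2, 3, 1]
print(list(zip(sizes[:-1], sizes[1:])))  # [(2, 3), (3, 1)]

weights = [np.random.randn(y, x)
           for x, y in zip(sizes[:-1], sizes[1:])]
print([w.shape for w in weights])        # [(3, 2), (1, 3)]
print(weights[1].shape)                  # (1, 3): weights between layers 2 and 3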
import numpy as np

class Network(object):
    def __init__(self, sizes):
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]
net = Network([2,3,1])
print(net.num_layers)
print(net.sizes)
print("偏移量:")
print(net.biases)
print("权重:")
print(net.weights)
Output:
3
[2, 3, 1]
biases:
[array([[ 0.72072723],
[ 1.02129651],
[ 0.0451003 ]]), array([[ 0.89568534]])]
weights:
[array([[ 0.35048635, 1.582825 ],
[-0.6184383 , 1.03039687],
[-1.22620262, -0.48511089]]), array([[ 1.51702976, -0.59924277, 0.07869854]])]
Process finished with exit code 0
3. Feedforward
Define the network's forward-propagation method, feedforward:
def feedforward(self, a):
    """Return the output of the network if ``a`` is input."""
    for b, w in zip(self.biases, self.weights):
        a = sigmoid(np.dot(w, a)+b)
    return a
Explanation:
np.dot(w, a): here w is a weight matrix and a is a column vector, so this is a matrix-vector product; each layer computes a = sigmoid(w·a + b). The sigmoid helper the loop calls is not shown in the snippet, so it is sketched below.
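The book defines sigmoid as the standard logistic function. A minimal sketch of it plus a quick usage check, assuming the imports and the Network class from section 2 and that feedforward has been added as a method of Network (the input a below is made up for illustration):
def sigmoid(z):
    """The standard logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

net = Network([2, 3, 1])
a = np.random.randn(2, 1)        # a made-up 2-dimensional input column vector
print(net.feedforward(a).shape)  # (1, 1): a single output neuron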
4. Stochastic gradient descent
def SGD(self, training_data, epochs, mini_batch_size, eta,
        test_data=None):
    """Train the neural network using mini-batch stochastic
    gradient descent. The ``training_data`` is a list of tuples
    ``(x, y)`` representing the training inputs and the desired
    outputs. The other non-optional parameters are
    self-explanatory. If ``test_data`` is provided then the
    network will be evaluated against the test data after each
    epoch, and partial progress printed out. This is useful for
    tracking progress, but slows things down substantially."""
    if test_data: n_test = len(test_data)
    n = len(training_data)
    for j in range(epochs):
        # reshuffle the training data at the start of each epoch
        # (needs the standard-library random module: import random)
        random.shuffle(training_data)
        mini_batches = [
            training_data[k:k + mini_batch_size]
            for k in range(0, n, mini_batch_size)]
        for mini_batch in mini_batches:
            self.update_mini_batch(mini_batch, eta)
        if test_data:
            print("Epoch {0}: {1} / {2}".format(
                j, self.evaluate(test_data), n_test))
        else:
            print("Epoch {0} complete".format(j))
Explanation:
training_data: a list of tuples, one tuple per training example (x, y), where x is the input and y is the desired output. For the handwritten-digit images, x is a 784-dimensional vector and y is a 10-dimensional vector.
epochs: the number of training epochs, chosen from prior experience with the network and the data.
mini_batch_size: the number of examples in each mini-batch.
eta: the learning rate.
test_data=None: the test set, empty by default.
n_test: the size of the test set, i.e. how many images it contains.
n: the size of the training set.
j: the index of the current epoch.
range(epochs): the integers 0 to epochs-1.
shuffle: randomly reorders the training data.
for k in range(0, n, mini_batch_size): k runs from 0 to n in steps of mini_batch_size, e.g. with mini_batch_size = 100 the slices training_data[k:k + mini_batch_size] cover 0~100, 100~200, 200~300, ...; see the sketch below.
The placeholders {0}, {1}, {2} in "Epoch {0}: {1} / {2}" correspond to j, self.evaluate(test_data) and n_test respectively.
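A small check of how the slicing chops the data into mini-batches, followed (in a comment) by the kind of call that starts training; the hyperparameters 30, 10 and 3.0 are only an illustrative choice, not values fixed by these notes:
data = list(range(10))
mini_batch_size = 3
batches = [data[k:k + mini_batch_size]
           for k in range(0, len(data), mini_batch_size)]
print(batches)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

# An illustrative training call:
# net.SGD(training_data, 30, 10, 3.0, test_data=test_data)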
5. Updating the weights and biases
def update_mini_batch(self, mini_batch, eta):
    """Update the network's weights and biases by applying
    gradient descent using backpropagation to a single mini batch.
    The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``
    is the learning rate."""
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    for x, y in mini_batch:
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    self.weights = [w-(eta/len(mini_batch))*nw
                    for w, nw in zip(self.weights, nabla_w)]
    self.biases = [b-(eta/len(mini_batch))*nb
                   for b, nb in zip(self.biases, nabla_b)]
Explanation:
nabla_b, nabla_w: two new lists of zero matrices with exactly the same shapes as biases and weights.
backprop(): the backpropagation routine, a fast way of computing the partial derivatives.
delta_nabla_b, delta_nabla_w = self.backprop(x, y): for a single example (x, y), compute the partial derivatives of the cost with respect to every bias and weight.
nabla_b, nabla_w: accumulate the delta_nabla_b, delta_nabla_w obtained from every pair (x, y) in the mini-batch.
eta: the learning rate.
The last two statements perform the update of the weights and biases, as illustrated below.
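In other words, each weight and bias is moved against the average gradient over the mini-batch, scaled by the learning rate eta. A toy numeric sketch of the same update on a single 1×1 weight (all numbers are made up for illustration):
import numpy as np

eta = 3.0
mini_batch_size = 2
w = np.array([[0.5]])                            # current weight (made-up value)
grads = [np.array([[0.2]]), np.array([[0.4]])]   # per-example gradients (made up)

nabla_w = sum(grads)                             # accumulated gradient: [[0.6]]
w = w - (eta / mini_batch_size) * nabla_w        # 0.5 - 1.5 * 0.6 = -0.4
print(w)                                         # [[-0.4]]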