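Note: Python's built-in zip() function takes iterables as arguments and packs their corresponding elements into tuples. In Python 3 it returns a lazy iterator over those tuples (Python 2 returned a list), so wrap it in list() to see the contents.
zip syntax: zip(*iterables)
Example:
>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> c = [4, 5, 6, 7, 8]
>>> zipped = list(zip(a, b))  # pack into a list of tuples
>>> zipped
[(1, 4), (2, 5), (3, 6)]
>>> list(zip(a, c))  # the result is as long as the shortest input
[(1, 4), (2, 5), (3, 6)]
>>> list(zip(*zipped))  # the inverse of zip: *zipped "unzips" the pairs back into rows
[(1, 2, 3), (4, 5, 6)]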
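The network code later in this post relies on zip to pair up adjacent layer sizes. The short sketch below shows that pairing, and also that a zip object can only be consumed once. The variable names are illustrative only.

```python
# Pair each layer's input size with its output size, as the weight
# initialisation in the Network class does.
sizes = [784, 30, 10]
pairs = zip(sizes[:-1], sizes[1:])

first_pass = list(pairs)    # consumes the iterator
second_pass = list(pairs)   # the iterator is now exhausted

print(first_pass)           # [(784, 30), (30, 10)]
print(second_pass)          # []

# "Unzipping" gives back the input sizes and the output sizes.
ch_in, ch_out = zip(*first_pass)
print(ch_in, ch_out)        # (784, 30) (30, 10)
```

This is why the loader code wraps every zip call in list(): the resulting data is iterated over many times during training.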
# mnist_loader.py
"""
A library to load the MNIST image data. For details of the data
structures that are returned, see the doc strings for ``load_data``
and ``load_data_wrapper``. In practice, ``load_data_wrapper`` is the
function usually called by our neural network code.
"""
#### Libraries
# Standard library
import pickle
import gzip
# Third-party libraries
import numpy as np
def load_data():
    """Return the MNIST data as a tuple containing the training data,
    the validation data, and the test data.

    The ``training_data`` is returned as a tuple with two entries.
    The first entry contains the actual training images. This is a
    numpy ndarray with 50,000 entries. Each entry is, in turn, a
    numpy ndarray with 784 values, representing the 28 * 28 = 784
    pixels in a single MNIST image.

    The second entry in the ``training_data`` tuple is a numpy ndarray
    containing 50,000 entries. Those entries are just the digit
    values (0...9) for the corresponding images contained in the first
    entry of the tuple.

    The ``validation_data`` and ``test_data`` are similar, except
    each contains only 10,000 images.

    This is a nice data format, but for use in neural networks it's
    helpful to modify the format of the ``training_data`` a little.
    That's done in the wrapper function ``load_data_wrapper()``, see
    below.
    """
    # encoding='latin1' lets Python 3 read NumPy arrays pickled under Python 2.
    # The with-block closes the file once the data has been loaded.
    with gzip.open('data/mnist.pkl.gz', 'rb') as f:
        training_data, validation_data, test_data = pickle.load(f, encoding='latin1')
    return (training_data, validation_data, test_data)
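To see the gzip-plus-pickle pattern used by ``load_data`` without downloading MNIST, the sketch below writes a tiny stand-in file with the same tuple layout and reads it back. The helper names ``make_fake_mnist`` and ``load_pickled_splits``, the file name, and the random contents are all assumptions made for illustration; only the layout (three ``(images, labels)`` pairs) follows the docstring above.

```python
import gzip
import os
import pickle
import tempfile

import numpy as np


def make_fake_mnist(path, n_train=5, n_other=2, seed=0):
    """Write random data laid out like the MNIST pickle (hypothetical helper)."""
    rng = np.random.default_rng(seed)

    def split(n):
        images = rng.random((n, 784)).astype(np.float32)  # n flattened 28x28 images
        labels = rng.integers(0, 10, size=n)              # n digit labels in 0...9
        return (images, labels)

    with gzip.open(path, 'wb') as f:
        pickle.dump((split(n_train), split(n_other), split(n_other)), f)


def load_pickled_splits(path):
    """Read the three splits back, using the same calls as load_data."""
    with gzip.open(path, 'rb') as f:
        return pickle.load(f, encoding='latin1')


fake_path = os.path.join(tempfile.mkdtemp(), 'fake_mnist.pkl.gz')
make_fake_mnist(fake_path)
fake_train, fake_val, fake_test = load_pickled_splits(fake_path)
print(fake_train[0].shape, fake_train[1].shape)  # (5, 784) (5,)
```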
def load_data_wrapper():
    """Return a tuple containing ``(training_data, validation_data,
    test_data)``. Based on ``load_data``, but the format is more
    convenient for use in our implementation of neural networks.

    In particular, ``training_data`` is a list containing 50,000
    2-tuples ``(x, y)``. ``x`` is a 784-dimensional numpy.ndarray
    containing the input image. ``y`` is a 10-dimensional
    numpy.ndarray representing the unit vector corresponding to the
    correct digit for ``x``.

    ``validation_data`` and ``test_data`` are lists containing 10,000
    2-tuples ``(x, y)``. In each case, ``x`` is a 784-dimensional
    numpy.ndarray containing the input image, and ``y`` is the
    corresponding classification, i.e., the digit values (integers)
    corresponding to ``x``.

    Obviously, this means we're using slightly different formats for
    the training data and the validation / test data. These formats
    turn out to be the most convenient for use in our neural network
    code.
    """
    tr_d, va_d, te_d = load_data()
    # Reshape every image into a (784, 1) column vector.
    training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
    # Training labels become one-hot (10, 1) column vectors.
    training_results = [vectorized_result(y) for y in tr_d[1]]
    training_data = list(zip(training_inputs, training_results))
    # Validation and test labels stay as plain integers.
    validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]]
    validation_data = list(zip(validation_inputs, va_d[1]))
    test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]]
    test_data = list(zip(test_inputs, te_d[1]))
    return (training_data, validation_data, test_data)
def vectorized_result(j):
    """Return a 10-dimensional unit vector with a 1.0 in the jth
    position and zeroes elsewhere. This is used to convert a digit
    (0...9) into a corresponding desired output from the neural
    network.

    :param j: the digit label, an integer in 0...9
    """
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e
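As a quick illustration of the two label formats described above, the sketch below one-hot encodes a digit with ``vectorized_result`` (copied from the loader so the block runs on its own) and recovers the digit with ``np.argmax``, which is the same operation the network later uses to turn an output vector into a predicted class. The variable names are illustrative.

```python
import numpy as np


def vectorized_result(j):
    """Same function as in the loader, repeated so this block is self-contained."""
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e


label = 3
target = vectorized_result(label)      # training-style label: (10, 1) one-hot column
recovered = int(np.argmax(target))     # test-style label: a plain integer

print(target.ravel())                  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
print(target.shape, recovered)         # (10, 1) 3
```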
# network.py
"""
A module to implement the stochastic gradient descent learning
algorithm for a feedforward neural network. Gradients are calculated
using backpropagation. Note that I have focused on making the code
simple, easily readable, and easily modifiable. It is not optimized,
and omits many desirable features.
"""
#### Libraries
# Standard library
import random
# Third-party libraries
import numpy as np
#### Miscellaneous functions
def sigmoid(z):
    """The sigmoid function."""
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z) * (1 - sigmoid(z))
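One way to convince yourself that ``sigmoid_prime`` really is the derivative of ``sigmoid`` is to compare it with a central finite difference. The following is a minimal sketch of that check; the sample points and the step size ``eps`` are arbitrary choices, not values from the original code.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))


z = np.linspace(-5.0, 5.0, 11)   # a few sample points
eps = 1e-6                       # finite-difference step (arbitrary small value)

# Central difference approximation of d(sigmoid)/dz.
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
max_diff = np.max(np.abs(numeric - sigmoid_prime(z)))

print("max difference:", max_diff)
print("derivative at 0:", sigmoid_prime(0.0))  # 0.25, the maximum slope
```

The derivative peaks at 0.25 and shrinks towards zero for large positive or negative inputs.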
# Define a class that represents the (3-layer) neural network structure.
class Network:

    def __init__(self, sizes):
        """The list ``sizes`` contains the number of neurons in the
        respective layers of the network. For example, if the list
        was [2, 3, 1] then it would be a three-layer network, with the
        first layer containing 2 neurons, the second layer 3 neurons,
        and the third layer 1 neuron. The biases and weights for the
        network are initialized randomly, using a Gaussian
        distribution with mean 0, and variance 1. Note that the first
        layer is assumed to be an input layer, and by convention we
        won't set any biases for those neurons, since biases are only
        ever used in computing the outputs from later layers.

        :param sizes: a list such as [784, 30, 10]; each element is the
            number of neurons in the corresponding layer
        """
        self.num_layers = len(sizes)
        # sizes: [784, 30, 10]
        self.sizes = sizes
        # b: [ch_out, 1] biases, one column vector per non-input layer
        self.biases = [np.random.randn(ch_out, 1) for ch_out in sizes[1:]]
        # w: [ch_out, ch_in] weights, one matrix per pair of adjacent layers
        self.weights = [np.random.randn(ch_out, ch_in) for ch_in, ch_out in zip(sizes[:-1], sizes[1:])]
    def forward(self, x):
        """Return the output of the network if ``x`` is input.

        :param x: [784, 1], the input column vector
        :return: [10, 1]
        """
        for b, w in zip(self.biases, self.weights):
            # [30, 784] @ [784, 1] => [30, 1] + [30, 1] => [30, 1]
            # [10, 30] @ [30, 1] + [10, 1] => [10, 1]
            z = np.dot(w, x) + b
            # [30, 1]
            # [10, 1]
            x = sigmoid(z)
        return x
    def train(self, training_data, epochs, batchsz, lr, test_data=None):
        """Train the neural network using mini-batch stochastic gradient descent.
        The ``training_data`` is a list of tuples
        ``(x, y)`` representing the training inputs and the desired
        outputs. The other non-optional parameters are self-explanatory. If
        ``test_data`` is provided then the network will be evaluated against
        the test data after each epoch, and partial progress printed out.
        This is useful for tracking progress, but slows things down substantially.
        """
        if test_data:
            n_test = len(test_data)
        n = len(training_data)
        for j in range(epochs):
            # Reshuffle every epoch so the mini-batches differ between epochs.
            random.shuffle(training_data)
            mini_batches = [training_data[k:k+batchsz] for k in range(0, n, batchsz)]
            # one parameter update per mini-batch
            for mini_batch in mini_batches:
                loss = self.update_mini_batch(mini_batch, lr)  # returns the mean loss of this mini-batch
            if test_data:
                # ``loss`` here is the loss of the last mini-batch of the epoch
                print("Epoch {0}: {1} / {2}".format(j, self.evaluate(test_data), n_test), 'loss: ', loss)
            print("Epoch {0} complete".format(j))
    def update_mini_batch(self, batch, lr):
        """Update the network's weights and biases by applying
        gradient descent using backpropagation to a single mini batch.
        The ``batch`` is a list of tuples ``(x, y)``, and ``lr``
        is the learning rate. Returns the mean loss over the batch.
        """
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        loss = 0  # accumulated loss
        # for every sample in current batch
        for x, y in batch:
            # list of every w, b gradient
            delta_nabla_b, delta_nabla_w, loss_ = self.backprop(x, y)  # gradients for this sample
            # A single sample yields e.g. [w1, w2, w3]; with several samples we
            # accumulate the entries position by position and average them afterwards.
            nabla_b = [accu + cur for accu, cur in zip(nabla_b, delta_nabla_b)]
            # cur is the current gradient, accu the running sum; add them position by position
            nabla_w = [accu + cur for accu, cur in zip(nabla_w, delta_nabla_w)]
            loss += loss_
        # Average the gradients of w and b. The division is element-wise,
        # matching the element-wise accumulation above.
        nabla_w = [w / len(batch) for w in nabla_w]
        nabla_b = [b / len(batch) for b in nabla_b]
        # Update the weights w and biases b with a gradient descent step:
        # w = w - lr * nabla_w
        self.weights = [w - lr * nabla for w, nabla in zip(self.weights, nabla_w)]
        self.biases = [b - lr * nabla for b, nabla in zip(self.biases, nabla_b)]
        loss = loss / len(batch)  # mean loss over the batch
        return loss
    def backprop(self, x, y):
        """Return a tuple ``(nabla_b, nabla_w, loss)``. ``nabla_b`` and
        ``nabla_w`` represent the gradient for the cost function C_x; they
        are layer-by-layer lists of numpy arrays, similar
        to ``self.biases`` and ``self.weights``. ``loss`` is the loss
        value for this sample.

        :param x: [784, 1]
        :param y: [10, 1], one-hot encoding
        """
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # 1. forward
        # Why does backpropagation still need a forward pass? Because we have to
        # record every layer's z and activation so the gradients can be computed.
        # Why keep a separate forward() method as well? It is convenient for
        # prediction, since testing does not need the backward pass.
        activation = x
        activations = [x]  # list to store all the activations, layer by layer
        # w*x = z => sigmoid => x/activation
        zs = []  # list to store all the z vectors, layer by layer
        for b, w in zip(self.biases, self.weights):
            # https://stackoverflow.com/questions/34142485/difference-between-numpy-dot-and-python-3-5-matrix-multiplication
            # np.dot / np.matmul (the @ operator) multiply matrices; * is element-wise
            z = np.dot(w, activation) + b
            activation = sigmoid(z)
            # record z and the activation of this layer for the backward pass
            zs.append(z)
            activations.append(activation)
        # Value of the loss function: sum of squared errors on the output layer.
        # The gradients below are those of half this value, 0.5 * sum((a - y)^2),
        # so no factor of 2 appears in delta.
        loss = np.power(activations[-1] - y, 2).sum()
        # 2. backward pass
        # Reference formula for the last layer: (Ok - tk) * (1 - Ok) * Ok
        # 2.1 compute gradient on output layer
        # [10, 1] * [10, 1] => [10, 1]
        # Equivalent form: delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
        delta = activations[-1] * (1 - activations[-1]) * (activations[-1] - y)
        nabla_b[-1] = delta
        # delta: [10, 1]
        # activations[-2]: [30, 1]
        # [10, 1] @ [1, 30] => [10, 30]
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        # Note that the variable l in the loop below is used a little
        # differently to the notation in Chapter 2 of the book. Here,
        # l = 1 means the last layer of neurons, l = 2 is the
        # second-last layer, and so on. It's a renumbering of the
        # scheme in the book, used here to take advantage of the fact
        # that Python can use negative indices in lists.
        # 2.2 compute hidden gradient
        for l in range(2, self.num_layers):
            # [30, 1]
            z = zs[-l]
            sp = sigmoid_prime(z)
            # formula for delta_j
            # [10, 30].T @ [10, 1] => [30, 10] @ [10, 1] => [30, 1] * [30, 1] => [30, 1]
            delta = np.dot(self.weights[-l + 1].transpose(), delta) * sp
            nabla_b[-l] = delta
            # [30, 1] @ [784, 1].T => [30, 784] (matrix product)
            nabla_w[-l] = np.dot(delta, activations[-l - 1].transpose())
        return (nabla_b, nabla_w, loss)
    def evaluate(self, test_data):
        """Return the number of test inputs for which the neural
        network outputs the correct result. Note that the neural
        network's output is assumed to be the index of whichever
        neuron in the final layer has the highest activation.

        :param test_data: list of (x, y) tuples, where y is an integer label
        """
        # The parentheses around x, y in the loop target are optional.
        test_results = [(np.argmax(self.forward(x)), y) for (x, y) in test_data]
        correct = sum(int(pred == y) for pred, y in test_results)
        return correct
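A common sanity check for a hand-written ``backprop`` is to compare its gradients with central finite differences on a very small network. The sketch below re-implements the same steps as standalone functions so that it runs on its own; the names ``forward_cost``, ``backprop_grads`` and ``numeric_grad``, the layer sizes and the step ``eps`` are assumptions made for illustration. Note that the cost being differentiated is half the squared error, which is what the ``delta`` formula in ``backprop`` corresponds to, not the plain sum of squares reported as ``loss``.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward_cost(weights, biases, x, y):
    """Half squared error 0.5 * ||a_L - y||^2 for a single sample."""
    a = x
    for w, b in zip(weights, biases):
        a = sigmoid(np.dot(w, a) + b)
    return 0.5 * np.sum((a - y) ** 2)


def backprop_grads(weights, biases, x, y):
    """Analytic gradients of forward_cost, following the same steps as backprop."""
    activations, zs = [x], []
    a = x
    for w, b in zip(weights, biases):
        z = np.dot(w, a) + b
        a = sigmoid(z)
        zs.append(z)
        activations.append(a)
    nabla_b = [None] * len(biases)
    nabla_w = [None] * len(weights)
    # Output layer: (a - y) * a * (1 - a)
    delta = (activations[-1] - y) * activations[-1] * (1 - activations[-1])
    nabla_b[-1] = delta
    nabla_w[-1] = np.dot(delta, activations[-2].T)
    # Hidden layers, walking backwards with negative indices.
    for l in range(2, len(weights) + 1):
        s = sigmoid(zs[-l])
        delta = np.dot(weights[-l + 1].T, delta) * s * (1 - s)
        nabla_b[-l] = delta
        nabla_w[-l] = np.dot(delta, activations[-l - 1].T)
    return nabla_b, nabla_w


def numeric_grad(weights, biases, x, y, params, eps=1e-5):
    """Central-difference gradient for each array in params (perturbed in place)."""
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            c_plus = forward_cost(weights, biases, x, y)
            p[idx] = old - eps
            c_minus = forward_cost(weights, biases, x, y)
            p[idx] = old  # restore the original value
            g[idx] = (c_plus - c_minus) / (2 * eps)
        grads.append(g)
    return grads


rng = np.random.default_rng(0)
sizes = [4, 3, 2]  # a tiny network keeps the check fast
weights = [rng.standard_normal((o, i)) for i, o in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal((o, 1)) for o in sizes[1:]]
x = rng.standard_normal((4, 1))
y = np.array([[1.0], [0.0]])

nabla_b, nabla_w = backprop_grads(weights, biases, x, y)
num_w = numeric_grad(weights, biases, x, y, weights)
num_b = numeric_grad(weights, biases, x, y, biases)
max_err = max(np.max(np.abs(a - n)) for a, n in zip(nabla_w + nabla_b, num_w + num_b))
print("max abs difference:", max_err)
```

If the two sets of gradients disagree by more than rounding error, the usual suspects are a missing ``append`` in the forward pass or a shape mismatch in one of the matrix products.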
def main():
    import mnist_loader
    # Loading the MNIST data
    training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
    print(len(training_data), training_data[0][0].shape, training_data[0][1].shape)
    print(len(test_data), test_data[0][0].shape, test_data[0][1].shape)
    # Set up a Network with 30 hidden neurons
    net = Network([784, 30, 10])
    # Use stochastic gradient descent to learn from the MNIST training_data over
    # 500 epochs, with a mini-batch size of 10, and a learning rate of η = 0.1
    net.train(training_data, 500, 10, 0.1, test_data=test_data)


if __name__ == '__main__':
    main()

Output:
C:\Anaconda3\envs\tf2\python.exe E:/Codes/MyCodes/TF2/TF2_5/network.py
50000 (784, 1) (10, 1)
10000 (784, 1) ()
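
The ``train`` method above builds its mini-batches by slicing the training list in steps of ``batchsz``. The sketch below isolates that slicing on a toy list to show that the final batch is simply shorter when the data size is not a multiple of the batch size. The helper name ``make_mini_batches`` and the toy data are illustrative, not part of the original code.

```python
def make_mini_batches(data, batch_size):
    """Split data into consecutive slices of at most batch_size items."""
    return [data[k:k + batch_size] for k in range(0, len(data), batch_size)]


samples = list(range(25))            # stand-in for 25 (x, y) training pairs
batches = make_mini_batches(samples, 10)
batch_sizes = [len(b) for b in batches]

print(batch_sizes)                   # [10, 10, 5]
print(batches[-1])                   # [20, 21, 22, 23, 24]
```

With 50,000 training samples and a batch size of 10, as in ``main``, this gives 5,000 parameter updates per epoch.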
