In an earlier note, 《机器学习笔记017 | 图片中的数字是怎么被识别出来的》 (Machine Learning Notes 017 | How the digits in an image get recognized), the parameters used to recognize the images were given ready-made.
If you have read my previous notes, you should have a rough idea that those parameters are obtained by iterating with the back propagation algorithm.
This note implements the core of the back propagation algorithm in Python; for the rest of the implementation, click "Read the original" (阅读原文).
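The code below relies on NumPy, a sigmoid activation, and (in the gradient section) a getArray helper that wraps a 1-D row slice into a 2-D array; these are defined in the full source linked above. A minimal sketch of what they could look like, under those assumptions:

import numpy as np

# Logistic activation used by the cost and gradient functions below
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed helper: wrap a 1-D slice such as a1[t, :] into a 2-D row array,
# so that .T turns it into a column vector (the real definition is in the full source)
def getArray(v):
    return np.array(v, ndmin=2)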
1 Cost Function
# Cost function
def nnCostFunction(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, mylambda):
    # The parameters arrive as a single flat array, so split them back into the two weight matrices
    Theta1 = nn_params[0:hidden_layer_size * (input_layer_size + 1)].reshape(hidden_layer_size, (input_layer_size + 1))
    Theta2 = nn_params[(hidden_layer_size * (input_layer_size + 1)):].reshape(num_labels, (hidden_layer_size + 1))
    # Number of training examples
    m = X.shape[0]
    # One-hot encode the labels: y_l[i, c] = 1 when y[i] == c + 1
    y_l = np.zeros(m * num_labels).reshape(m, num_labels)
    for c in range(num_labels):
        y_l[:, c] = ((y == (c + 1)) + 0).T[0]
    # Forward pass: layer 2
    a1 = np.c_[np.ones(m).reshape(m, 1), X]
    z2 = a1.dot(Theta1.T)
    a2 = sigmoid(z2)
    # Forward pass: layer 3 (the output layer)
    a2 = np.c_[np.ones(m).reshape(m, 1), a2]
    z3 = a2.dot(Theta2.T)
    a3 = sigmoid(z3)
    h = a3
    # Drop the bias column of each matrix before regularization
    tempTheta1 = Theta1[:, 1:]
    tempTheta2 = Theta2[:, 1:]
    tempTheta = np.append(tempTheta1.flatten(), tempTheta2.flatten())
    # Regularized cross-entropy cost
    J = (1.0 / m * np.sum((-y_l * np.log(h) - (1 - y_l) * np.log(1 - h)).flatten())) + 1.0 * mylambda / 2 / m * np.sum(tempTheta ** 2)
    return J
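A quick way to sanity-check nnCostFunction is to call it on a tiny, randomly generated problem. The sizes and data below are made up purely for illustration (and assume the sigmoid helper sketched earlier):

# Hypothetical toy problem, just to exercise the function: 3 inputs, 5 hidden units, 3 classes, 10 examples
toy_in, toy_hid, toy_labels, toy_m = 3, 5, 3, 10
toy_X = np.random.rand(toy_m, toy_in)
toy_y = np.random.randint(1, toy_labels + 1, size=(toy_m, 1))
toy_params = np.random.rand(toy_hid * (toy_in + 1) + toy_labels * (toy_hid + 1)) * 0.24 - 0.12
print(nnCostFunction(toy_params, toy_in, toy_hid, toy_labels, toy_X, toy_y, 1.0))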
2 Gradient
# Gradient, computed with back propagation
def nnGradient(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, mylambda):
    # The parameters arrive as a single flat array, so split them back into the two weight matrices
    Theta1 = nn_params[0:hidden_layer_size * (input_layer_size + 1)].reshape(hidden_layer_size, (input_layer_size + 1))
    Theta2 = nn_params[(hidden_layer_size * (input_layer_size + 1)):].reshape(num_labels, (hidden_layer_size + 1))
    Theta1_grad = np.zeros(Theta1.shape)
    Theta2_grad = np.zeros(Theta2.shape)
    # Number of training examples
    m = X.shape[0]
    # One-hot encode the labels
    y_l = np.zeros(m * num_labels).reshape(m, num_labels)
    for c in range(num_labels):
        y_l[:, c] = ((y == (c + 1)) + 0).T[0]
    # Forward pass: layer 2
    a1 = np.c_[np.ones(m).reshape(m, 1), X]
    z2 = a1.dot(Theta1.T)
    a2 = sigmoid(z2)
    # Forward pass: layer 3 (the output layer)
    a2 = np.c_[np.ones(m).reshape(m, 1), a2]
    z3 = a2.dot(Theta2.T)
    a3 = sigmoid(z3)
    h = a3
    # Accumulate the gradient example by example
    for t in range(m):
        tmp_a1 = getArray(a1[t, :]).T
        tmp_a2 = getArray(a2[t, :]).T
        tmp_a3 = getArray(a3[t, :]).T
        tmp_y = getArray(y_l[t, :]).T
        # Output-layer error
        tmp_delta3 = tmp_a3 - tmp_y
        # Hidden-layer error (still carries a bias row, which is dropped below)
        tmp_delta2 = Theta2.T.dot(tmp_delta3) * tmp_a2 * (1 - tmp_a2)
        Theta2_grad = Theta2_grad + (tmp_delta3).dot(tmp_a2.T)
        Theta1_grad = Theta1_grad + (tmp_delta2.dot(tmp_a1.T))[1:, :]
    # Add the regularization term (the bias columns are not regularized)
    Theta1_grad[:, 1:] = Theta1_grad[:, 1:] + mylambda * Theta1[:, 1:]
    Theta2_grad[:, 1:] = Theta2_grad[:, 1:] + mylambda * Theta2[:, 1:]
    # Average over the training set
    Theta1_grad = Theta1_grad / m
    Theta2_grad = Theta2_grad / m
    # Unroll the two gradient matrices into a single flat array
    grad = np.append(Theta1_grad.flatten(), Theta2_grad.flatten())
    return grad
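The per-example loop above follows the math step by step, but in NumPy the same accumulation can be written as a couple of matrix products over the whole training set. The vectorized variant below is my own rewrite rather than part of the original post, so treat it as a cross-check against nnGradient:

def nnGradientVectorized(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, mylambda):
    # Same unrolling, label encoding and forward pass as nnGradient
    Theta1 = nn_params[0:hidden_layer_size * (input_layer_size + 1)].reshape(hidden_layer_size, input_layer_size + 1)
    Theta2 = nn_params[hidden_layer_size * (input_layer_size + 1):].reshape(num_labels, hidden_layer_size + 1)
    m = X.shape[0]
    y_l = np.zeros((m, num_labels))
    for c in range(num_labels):
        y_l[:, c] = ((y == (c + 1)) + 0).T[0]
    a1 = np.c_[np.ones((m, 1)), X]
    a2 = np.c_[np.ones((m, 1)), sigmoid(a1.dot(Theta1.T))]
    a3 = sigmoid(a2.dot(Theta2.T))
    # Errors for all examples at once (one row per example)
    delta3 = a3 - y_l                                                 # (m, num_labels)
    delta2 = delta3.dot(Theta2)[:, 1:] * a2[:, 1:] * (1 - a2[:, 1:])  # (m, hidden)
    # Accumulated gradients, then regularization and averaging as above
    Theta1_grad = delta2.T.dot(a1)
    Theta2_grad = delta3.T.dot(a2)
    Theta1_grad[:, 1:] += mylambda * Theta1[:, 1:]
    Theta2_grad[:, 1:] += mylambda * Theta2[:, 1:]
    return np.append(Theta1_grad.flatten(), Theta2_grad.flatten()) / m

On a small random input, np.allclose(nnGradient(...), nnGradientVectorized(...)) should hold if both are correct.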
3 Parameter Initialization
# Randomly initialize the weights of one layer to break symmetry
def randInitializeWeights(L_in, L_out):
    epsilon_init = 0.12
    W = np.random.rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init
    return W

print('\nInitializing the neural network parameters...')
initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size)
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels)
# Unroll the two matrices into a single flat parameter array
initial_nn_params = np.append(initial_Theta1.flatten(), initial_Theta2.flatten())
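The hard-coded 0.12 matches a common heuristic for the initialization range, epsilon_init = sqrt(6) / sqrt(L_in + L_out). The layer sizes below are only an assumed example (a 400-pixel input layer and 25 hidden units), not taken from this post:

# Assumed example sizes: sqrt(6 / (400 + 25)) ≈ 0.118, which rounds to the 0.12 used above
epsilon_example = np.sqrt(6.0 / (400 + 25))
print(epsilon_example)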
4 Calling the Optimizer
from scipy import optimize

print('\nTraining the neural network... ')
opts = {'maxiter': 50}
mylambda = 1
result = optimize.minimize(nnCostFunction, initial_nn_params,
                           args=(input_layer_size, hidden_layer_size, num_labels, X, y, mylambda),
                           method='CG', jac=nnGradient, tol=None, callback=None, options=opts)
nn_params = result.x
print("\nFinal value of the cost function:\n%s" % result.fun)
print("\nOptimized parameters theta:\n%s" % result.x)
# Split the optimized flat array back into the two weight matrices
Theta1 = nn_params[0:hidden_layer_size * (input_layer_size + 1)].reshape(hidden_layer_size, (input_layer_size + 1))
Theta2 = nn_params[(hidden_layer_size * (input_layer_size + 1)):].reshape(num_labels, (hidden_layer_size + 1))
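With the trained Theta1 and Theta2, recognizing a digit is just one more forward pass followed by an argmax over the output layer. The predict helper below is my own sketch, not part of the original post, and assumes the same sigmoid as above:

def predict(Theta1, Theta2, X):
    # Forward pass identical to the one inside nnCostFunction
    m = X.shape[0]
    a1 = np.c_[np.ones((m, 1)), X]
    a2 = np.c_[np.ones((m, 1)), sigmoid(a1.dot(Theta1.T))]
    a3 = sigmoid(a2.dot(Theta2.T))
    # Labels run from 1 to num_labels, so shift the zero-based argmax by one
    return np.argmax(a3, axis=1) + 1

pred = predict(Theta1, Theta2, X)
print("Training accuracy: %s%%" % (np.mean((pred == y.flatten()) + 0) * 100))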
This article was reposted from the WeChat public account 止一之路.