Machine Learning Notes 021 | Implementing Backpropagation in Code


In the earlier note "Machine Learning Notes 017 | How the Digits in an Image Are Recognized", the parameters used to recognize the images were all given in advance.

If you have read my previous notes, you will know that these parameters are obtained iteratively by the backpropagation algorithm.

This note implements the core of the backpropagation (Back Propagation) algorithm; the code below is Python. For the rest of the implementation, see the original article.
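
The snippets below assume the following imports and helpers. The sigmoid and getArray functions are used by the code but not defined in this note, so the definitions here are a minimal sketch of what the code expects; likewise, X, y, input_layer_size, hidden_layer_size, and num_labels are assumed to be loaded beforehand from the digit data set.

import numpy as np
from scipy import optimize

# Logistic (sigmoid) activation, applied element-wise
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Promote a 1-D row slice to a 2-D (1, n) array, so that .T yields a column vector
def getArray(v):
    return np.atleast_2d(v)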

1 Cost Function

# Cost function
def nnCostFunction(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, mylambda):
    # The parameters arrive as a single flat array, so unroll them
    # back into the two weight matrices
    Theta1 = nn_params[0:hidden_layer_size * (input_layer_size + 1)].reshape(hidden_layer_size, (input_layer_size + 1))
    Theta2 = nn_params[(hidden_layer_size * (input_layer_size + 1)):].reshape(num_labels, (hidden_layer_size + 1))
    
    # Number of training examples
    m = X.shape[0]

    # Vectorize the labels: row i becomes a one-hot vector with a 1 in the
    # column for class y[i] (y is an m x 1 column of 1-based labels)
    y_l = np.zeros((m, num_labels))
    for c in range(num_labels):
        y_l[:, c] = ((y == (c + 1)) + 0).T[0]
        
    # Forward propagation: hidden layer (prepend the bias column to the input)
    a1 = np.c_[np.ones((m, 1)), X]
    z2 = a1.dot(Theta1.T)
    a2 = sigmoid(z2)

    # Forward propagation: output layer (prepend the bias column to a2)
    a2 = np.c_[np.ones((m, 1)), a2]
    z3 = a2.dot(Theta2.T)
    a3 = sigmoid(z3)
    
    h = a3

    # Drop the bias column from each weight matrix: bias terms are not regularized
    tempTheta1 = Theta1[:, 1:]
    tempTheta2 = Theta2[:, 1:]

    tempTheta = np.append(tempTheta1.flatten(), tempTheta2.flatten())
    
    # Regularized cross-entropy cost
    J = (1.0 / m) * np.sum(-y_l * np.log(h) - (1 - y_l) * np.log(1 - h)) \
        + 1.0 * mylambda / 2 / m * np.sum(tempTheta ** 2)

    return J
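
For reference, this is the regularized cross-entropy cost that the code computes; the bias columns are excluded from the penalty term, which is exactly what stripping the first column out into tempTheta achieves:

J(\Theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ -y_k^{(i)} \log h_k^{(i)} - \left(1 - y_k^{(i)}\right) \log\left(1 - h_k^{(i)}\right) \right] + \frac{\lambda}{2m} \sum_{l,j,k} \left(\Theta_{j,k}^{(l)}\right)^2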

2 Gradient

# Gradient
def nnGradient(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, mylambda):
    # The parameters arrive as a single flat array, so unroll them
    # back into the two weight matrices
    Theta1 = nn_params[0:hidden_layer_size * (input_layer_size + 1)].reshape(hidden_layer_size, (input_layer_size + 1))
    Theta2 = nn_params[(hidden_layer_size * (input_layer_size + 1)):].reshape(num_labels, (hidden_layer_size + 1))
    
    # Gradient accumulators, one per weight matrix
    Theta1_grad = np.zeros(Theta1.shape)
    Theta2_grad = np.zeros(Theta2.shape)
    
    # Number of training examples
    m = X.shape[0]
    
    # Vectorize the labels into one-hot rows (labels are 1-based)
    y_l = np.zeros((m, num_labels))
    for c in range(num_labels):
        y_l[:, c] = ((y == (c + 1)) + 0).T[0]
        
    # Forward propagation: hidden layer (prepend the bias column to the input)
    a1 = np.c_[np.ones((m, 1)), X]
    z2 = a1.dot(Theta1.T)
    a2 = sigmoid(z2)

    # Forward propagation: output layer (prepend the bias column to a2)
    a2 = np.c_[np.ones((m, 1)), a2]
    z3 = a2.dot(Theta2.T)
    a3 = sigmoid(z3)
    
    # Accumulate the gradient one training example at a time (backpropagation proper)
    for t in range(m):
        # Column vectors for this example's activations and label
        tmp_a1 = getArray(a1[t, :]).T
        tmp_a2 = getArray(a2[t, :]).T
        tmp_a3 = getArray(a3[t, :]).T

        tmp_y = getArray(y_l[t, :]).T

        # Output-layer error, then propagate it back through Theta2;
        # a2*(1-a2) is the sigmoid gradient (it is 0 for the bias entry)
        tmp_delta3 = tmp_a3 - tmp_y
        tmp_delta2 = Theta2.T.dot(tmp_delta3) * tmp_a2 * (1 - tmp_a2)

        Theta2_grad = Theta2_grad + tmp_delta3.dot(tmp_a2.T)
        # Drop the bias row of delta2's contribution before accumulating
        Theta1_grad = Theta1_grad + (tmp_delta2.dot(tmp_a1.T))[1:, :]
     
    # Add the regularization term (bias columns are not regularized)
    Theta1_grad[:, 1:] = Theta1_grad[:, 1:] + mylambda * Theta1[:, 1:]
    Theta2_grad[:, 1:] = Theta2_grad[:, 1:] + mylambda * Theta2[:, 1:]

    # Average over the m training examples
    Theta1_grad = Theta1_grad / m
    Theta2_grad = Theta2_grad / m
                              
    # Unroll both gradient matrices into a single flat array
    grad = np.append(Theta1_grad.flatten(), Theta2_grad.flatten())

    return grad
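
Before using nnGradient for training, it is worth checking it against a numerical approximation on a tiny network. The sketch below is my own addition, not part of the original note; the helper name checkGradient and the small layer sizes are purely illustrative.

# Compare the analytic gradient with a central-difference approximation
# on a small random network; sizes are kept tiny so the loop stays fast
def checkGradient(mylambda=0):
    il, hl, nl, m = 3, 5, 3, 5
    Theta1 = np.random.rand(hl, il + 1) * 0.24 - 0.12
    Theta2 = np.random.rand(nl, hl + 1) * 0.24 - 0.12
    X = np.random.rand(m, il)
    y = (np.arange(m) % nl + 1).reshape(m, 1)   # labels 1..nl as a column vector
    theta = np.append(Theta1.flatten(), Theta2.flatten())
    args = (il, hl, nl, X, y, mylambda)

    grad = nnGradient(theta, *args)
    num_grad = np.zeros(theta.size)
    eps = 1e-4
    for i in range(theta.size):
        step = np.zeros(theta.size)
        step[i] = eps
        num_grad[i] = (nnCostFunction(theta + step, *args)
                       - nnCostFunction(theta - step, *args)) / (2 * eps)

    # If the gradient is correct, this relative difference is tiny (around 1e-9)
    print(np.linalg.norm(num_grad - grad) / np.linalg.norm(num_grad + grad))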

3 Parameter Initialization

If every weight started at zero, all hidden units would compute the same function and stay identical through gradient descent, so the weights are instead drawn uniformly from a small interval around zero to break the symmetry.

# Randomly initialize the weights of a layer with L_in inputs and L_out units
def randInitializeWeights(L_in, L_out):
    epsilon_init = 0.12
    W = np.random.rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init
    
    return W

print('\nInitializing neural network parameters...')

initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size)
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels)

# Unroll both matrices into a single flat parameter array for the optimizer
initial_nn_params = np.append(initial_Theta1.flatten(), initial_Theta2.flatten())

4 Running the Optimizer

print('\nTraining the neural network... ')

opts = {'maxiter': 50}
mylambda = 1
# Minimize the cost with conjugate gradient, supplying our analytic gradient
result = optimize.minimize(nnCostFunction, initial_nn_params,
                           args=(input_layer_size, hidden_layer_size, num_labels, X, y, mylambda),
                           method='CG', jac=nnGradient, tol=None, callback=None, options=opts)
nn_params = result.x
print("\nFinal cost:\n%s" % result.fun)
print("\nComputed parameters theta:\n%s" % result.x)

# Unroll the optimized parameters back into the two weight matrices
Theta1 = nn_params[0:hidden_layer_size * (input_layer_size + 1)].reshape(hidden_layer_size, (input_layer_size + 1))
Theta2 = nn_params[(hidden_layer_size * (input_layer_size + 1)):].reshape(num_labels, (hidden_layer_size + 1))
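
With the trained weights you can run a forward pass to classify the inputs. The predict helper below is a sketch I am adding for illustration; it is not part of the original note.

# Forward propagation: return the predicted label (1..num_labels) for each row of X
def predict(Theta1, Theta2, X):
    m = X.shape[0]
    a1 = np.c_[np.ones((m, 1)), X]
    a2 = sigmoid(a1.dot(Theta1.T))
    a2 = np.c_[np.ones((m, 1)), a2]
    a3 = sigmoid(a2.dot(Theta2.T))
    return np.argmax(a3, axis=1) + 1   # +1 because the labels are 1-based

pred = predict(Theta1, Theta2, X)
print("\nTraining accuracy: %f%%" % (np.mean(pred == y.flatten()) * 100))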

This article is reposted from the WeChat public account: 止一之路.

