UFLDL New Tutorial and Programming Exercises (6): Multi-Layer Neural Network

UFLDL is an early deep learning introduction written by Andrew Ng's team. Its pacing of theory followed by exercises is excellent: after each theory section you can't wait to start coding, because the code framework is already laid out for you with detailed comments, so you only need to implement the core pieces yourself. It is very easy to get started.

I could not find a Chinese translation of this part of the new tutorial, so I am writing it up now while it is still fresh in my mind.

Section six is: Multi-Layer Neural Network.
A multi-layer neural network is essentially a network built from fully connected layers; convolutional neural networks (which feel considerably more involved than fully connected layers) come later in the tutorial.
Below is a diagram of a neural network model:
It has an input layer, an output layer, a hidden layer in between, and bias units: 3 input neurons, 3 hidden neurons, and 1 output neuron.

Figure: neural network model

The tutorial defines quite a lot of notation, which I will not repeat here. A neural network model mainly involves two steps:

  • forward propagation
  • backpropagation

Forward propagation: the inputs of each layer are multiplied by the weights and a bias is added; the result goes through the activation function and the output is passed on to the next layer, as described by these formulas:
\begin{aligned} a_{1}^{(2)} &=f\left(W_{11}^{(1)} x_{1}+W_{12}^{(1)} x_{2}+W_{13}^{(1)} x_{3}+b_{1}^{(1)}\right) \\ a_{2}^{(2)} &=f\left(W_{21}^{(1)} x_{1}+W_{22}^{(1)} x_{2}+W_{23}^{(1)} x_{3}+b_{2}^{(1)}\right) \\ a_{3}^{(2)} &=f\left(W_{31}^{(1)} x_{1}+W_{32}^{(1)} x_{2}+W_{33}^{(1)} x_{3}+b_{3}^{(1)}\right) \\ h_{W, b}(x) &=a_{1}^{(3)}=f\left(W_{11}^{(2)} a_{1}^{(2)}+W_{12}^{(2)} a_{2}^{(2)}+W_{13}^{(2)} a_{3}^{(2)}+b_{1}^{(2)}\right) \end{aligned}
A slightly more compact way to write this is:

z_{i}^{(2)}=\sum_{j=1}^{3} W_{i j}^{(1)} x_{j}+b_{i}^{(1)}, \qquad a_{i}^{(2)}=f\left(z_{i}^{(2)}\right)

We compute and keep the values of these intermediate nodes because backpropagation will need them. In vectorized form:

z^{(l+1)}=W^{(l)} a^{(l)}+b^{(l)}, \qquad a^{(l+1)}=f\left(z^{(l+1)}\right)
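To make the vectorized step concrete, here is a minimal MATLAB sketch of one forward step over a whole batch of examples (W1, b1 and X are hypothetical stand-ins for one layer's weight matrix, bias vector, and input matrix with one example per column); bsxfun broadcasts the bias across all columns, exactly as the full code below does:

Z2 = bsxfun(@plus, W1 * X, b1);  % weighted sums plus bias, broadcast across every example (column)
A2 = 1 ./ (1 + exp(-Z2));        % element-wise sigmoid activation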

Backpropagation: starting from the output error, we work backwards to compute the gradient of the loss with respect to every parameter. The exercise uses the softmax (cross-entropy) loss, whereas the previous theory post used the sum-of-squares error, so I will follow the exercise here.
Our loss is now:
J(\theta)=-\left[\sum_{i=1}^{m} \sum_{k=1}^{K} 1\left\{y^{(i)}=k\right\} \log \frac{\exp \left(\theta^{(k) \top} h_{W, b}\left(x^{(i)}\right)\right)}{\sum_{j=1}^{K} \exp \left(\theta^{(j) \top} h_{W, b}\left(x^{(i)}\right)\right)}\right]
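One practical point the formula hides: exponentiating large pre-activations can overflow, so implementations often subtract each example's maximum before taking exp, which does not change the probabilities. A minimal sketch, assuming Z is the (numClasses x m) output-layer pre-activation matrix and labels holds the class indices:

Zs = bsxfun(@minus, Z, max(Z, [], 1));           % subtract each column's max for numerical stability
P  = bsxfun(@rdivide, exp(Zs), sum(exp(Zs), 1)); % softmax probabilities, one column per example
idx = sub2ind(size(P), labels', 1:size(P, 2));   % linear indices of each example's true class
ceCost = -sum(log(P(idx)));                      % cross-entropy summed over the examples

The full listing later in this post applies exp directly, which is fine for this exercise, but the max-subtraction trick is the safer habit.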
The error of the output layer is:

\delta^{(n_{l})}=\hat{y}-y

where \hat{y} is the vector of softmax probabilities (pred_prob in the code) and y is the one-hot encoding of the label.

Once we have the error of the last layer, the errors of all earlier layers follow from this recursion (vectorized form):

\delta^{(l)}=\left(\left(W^{(l)}\right)^{\top} \delta^{(l+1)}\right) \circ f^{\prime}\left(z^{(l)}\right), \qquad f^{\prime}\left(z^{(l)}\right)=a^{(l)} \circ\left(1-a^{(l)}\right) \text{ for the sigmoid}

From the errors we then get the per-example gradients:

\nabla_{W^{(l)}} J=\delta^{(l+1)}\left(a^{(l)}\right)^{\top}, \qquad \nabla_{b^{(l)}} J=\delta^{(l+1)}
Once the gradients are available, the parameters can be updated with gradient descent. In practice we only need to implement the cost and the gradient correctly, and the optimizer provided by the exercise framework handles the rest. In the code below, the cost also includes an L2 regularization term on the weights, also known as weight decay.
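Written out with the weight-decay term, the objective the code below minimizes and the gradient it has to return for the two to stay consistent (note the 1/m on the cross-entropy part, which the backprop code mirrors) are:

J(W, b)=\frac{1}{m} J_{\mathrm{CE}}(W, b)+\frac{\lambda}{2} \sum_{l}\left\|W^{(l)}\right\|_{F}^{2}, \qquad \nabla_{W^{(l)}} J=\frac{1}{m} \Delta^{(l+1)}\left(A^{(l)}\right)^{\top}+\lambda W^{(l)}, \qquad \nabla_{b^{(l)}} J=\frac{1}{m} \sum_{i=1}^{m} \delta^{(l+1)}\left(x^{(i)}\right)

where \Delta^{(l+1)} and A^{(l)} stack the per-example deltas and activations as columns, which is exactly what the matrix products in the code compute.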
Below is the code for supervised_dnn_cost.m:

function [ cost, grad, pred_prob] = supervised_dnn_cost( theta, ei, data, labels, pred_only)
%   [~, ~, pred] = supervised_dnn_cost( opt_params, ei, data_train, [], true);
%   SPNETCOSTSLAVE Slave cost function for simple phone net
%   Does all the work of cost / gradient computation
%   Returns cost broken into cross-entropy, weight norm, and prox reg
%        components (ceCost, wCost, pCost)

%% default values
po = false;
if exist('pred_only','var')
  po = pred_only;
end;

%% reshape into network
stack = params2stack(theta, ei);  % theta here is opt_params; stack is a cell array (size (2,1) in this exercise) holding each layer's parameters W and b
numHidden = numel(ei.layer_sizes) - 1; % number of hidden layers; 1 in this exercise
hAct = cell(numHidden+1, 1); % hAct records each layer's pre-activation and activation (input layer excluded)
gradStack = cell(numHidden+1, 1); % gradStack records the gradients computed by backprop
m = size(data,2); % number of examples
%% forward prop
%%% YOUR CODE HERE %%%
for l = 1:numHidden+1
    if(l == 1)
        hAct{l}.z = stack{l}.W*data;  % the first hidden layer takes the training data as its input
    else
        hAct{l}.z = stack{l}.W*hAct{l-1}.a; % layer l's input is layer l-1's output
    end
    hAct{l}.z = bsxfun(@plus,hAct{l}.z,stack{l}.b); % add the bias to every example's pre-activation
    hAct{l}.a = sigmoid(hAct{l}.z); % apply the activation function
end

%% return here if only predictions desired.
if po
  cost = -1; ceCost = -1; wCost = -1; numCorrect = -1;
  grad = [];  
  mat_e = exp(hAct{numHidden+1}.z); 
  pred_prob = bsxfun(@rdivide,mat_e,sum(mat_e,1));
  return;
end;

%% compute cost
%%% YOUR CODE HERE %%%
mat_e = exp(hAct{numHidden+1}.z); % softmax numerator over the output layer's pre-activation z, size (numClasses, m), e.g. (10, 60000) on the full training set
pred_prob = bsxfun(@rdivide,mat_e,sum(mat_e,1)); % class probabilities, one column per example
I = sub2ind(size(pred_prob),labels',1:size(pred_prob,2)); % linear indices of each example's true class
ceCost = -sum(log(pred_prob(I))); % cross-entropy summed over all examples
%% compute gradients using backpropagation
%%% YOUR CODE HERE %%%
groundTruth = zeros(size(pred_prob)); % one-hot encoding of the labels, one column per example
groundTruth(I) = 1;
for l = numHidden+1:-1:1 
    if(l == numHidden+1)
        hAct{l}.delta = -(groundTruth - pred_prob);  % the output layer uses the softmax loss, so its delta differs from the squared-error case; the other layers are unchanged
    else
        hAct{l}.delta = (stack{l+1}.W'* hAct{l+1}.delta) .* (hAct{l}.a .*(1- hAct{l}.a)); % propagate the error back through the sigmoid derivative a.*(1-a)
    end
    
    if(l == 1)
        gradStack{l}.W = (1/m) * hAct{l}.delta*data'; % hAct{0}.a is just the input data; the 1/m matches the (1/m)*ceCost term in the cost below
        gradStack{l}.b = (1/m) * sum(hAct{l}.delta,2);
    else
        gradStack{l}.W = (1/m) * hAct{l}.delta*hAct{l-1}.a';
        gradStack{l}.b = (1/m) * sum(hAct{l}.delta,2);
    end
end

%% compute weight penalty cost and gradient for non-bias terms
%%% YOUR CODE HERE %%%
wCost = 0;
for l = 1:numHidden+1
    wCost = wCost + sum(sum(stack{l}.W.^2));   % accumulated squared norm of the weight matrices (the regularization term)
end
cost = (1/m)*ceCost + .5 * ei.lambda * wCost; % average cross-entropy plus the weight-decay penalty

% Computing the gradient of the weight decay.
for l = numHidden+1: -1 : 1
    gradStack{l}.W = gradStack{l}.W + ei.lambda * stack{l}.W;
end

%% reshape gradients into vector
[grad] = stack2params(gradStack);
end
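The forward pass above calls a sigmoid helper. If your copy of the starter code does not already define one (this is an assumption about your setup, not part of the original listing), a minimal local function appended after the end of supervised_dnn_cost works:

function f = sigmoid(z)
% Element-wise logistic sigmoid: f(z) = 1 ./ (1 + exp(-z)).
  f = 1 ./ (1 + exp(-z));
end

Before training on the full data set it is also worth comparing the analytic gradient against a finite-difference approximation on a tiny network. The exercise ships its own checker, but a generic stand-alone sketch (costFun is a hypothetical handle returning [cost, grad], e.g. built from supervised_dnn_cost with a small ei) looks like this:

epsilon = 1e-4;
[~, grad] = costFun(theta);                  % analytic gradient from backprop
numGrad = zeros(size(theta));
for i = 1:numel(theta)
    e_i = zeros(size(theta)); e_i(i) = epsilon;
    numGrad(i) = (costFun(theta + e_i) - costFun(theta - e_i)) / (2 * epsilon); % two-sided difference
end
fprintf('max abs difference: %g\n', max(abs(numGrad - grad)));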

Running results:
I ran this on a machine with an 8th-generation i7 CPU; a slower CPU would simply take longer:

Figure: multi-layer neural network running results

This matches the tutorial quite well. Since the cost includes the regularization term, overfitting is not too severe, and the network reaches 97% accuracy on the test set as well:
Train and test various network architectures. You should be able to achieve 100% training set accuracy with a single hidden layer of 256 hidden units.
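For anyone who wants to try other architectures: the network shape and the weight-decay strength are read from the ei struct that the exercise's run_train.m builds and passes into supervised_dnn_cost. A sketch of the kind of settings used here; layer_sizes and lambda are the fields the cost code above actually reads, while the other field names are assumptions about the starter script:

ei = struct();
ei.input_dim   = 784;                   % 28x28 MNIST images (assumed field name)
ei.output_dim  = 10;                    % ten digit classes (assumed field name)
ei.layer_sizes = [256, ei.output_dim];  % one hidden layer of 256 units, then the output layer
ei.lambda      = 1e-4;                  % weight-decay strength; illustrative value, pick to taste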
Reference: https://blog.csdn.net/lingerlanlan/article/details/38464317
If anything here is misunderstood, please point it out; if you have better ideas, feel free to discuss them in the comments below!
