下载地址:DeepLearningToolBox
学习RBM代码之前,需要一些基本的RBM的知识。
网上有很多参考资料,我借鉴一篇写的很好的系列文章,看下来也差不多能看懂了,博客地址:http://blog.csdn.net/itplus/article/details/19168937
目录如下
(一)预备知识
(二)网络结构
(三)能量函数和概率分布
(四)对数似然函数
(五)梯度计算公式 (六)对比散度算法
(七)RBM 训练算法
(八)RBM 的评估
通过学习上面的系列文章,基本上理解了RBM的原理,接下来动手学习toolbox中对应的代码部分。本文参考诸多博客,感谢原文作者。
% Version 1.000 % % Code provided by Ruslan Salakhutdinov and Geoff Hinton % % Permission is granted for anyone to copy, use, modify, or distribute this % program and accompanying programs and documents for any purpose, provided % this copyright notice is retained and prominently displayed, along with % a note saying that the original programs are available from our % web page. % The programs and documents are distributed without any warranty, express or % implied. As the programs were written for research purposes only, they have % not been tested to the degree that would be advisable in any important % application. All use of these programs is entirely at the user's own risk. % This program pretrains a deep autoencoder for MNIST dataset % You can set the maximum number of epochs for pretraining each layer % and you can set the architecture of the multilayer net. clc clear all close all maxepoch=10; %In the Science paper we use maxepoch=50, but it works just fine. numhid=1000; numpen=500; numpen2=250; numopen=30; fprintf(1,'Converting Raw files into Matlab format \n'); converter; % 转换数据为matlab的格式 fprintf(1,'Pretraining a deep autoencoder. \n'); fprintf(1,'The Science paper used 50 epochs. This uses %3i \n', maxepoch); makebatches; [numcases numdims numbatches]=size(batchdata); fprintf(1,'Pretraining Layer 1 with RBM: %d-%d \n',numdims,numhid); restart=1; rbm; hidrecbiases=hidbiases; %hidbiases为隐含层的偏置值 save mnistvh vishid hidrecbiases visbiases; %保存每层的变量,分别为权值,隐含层偏置值,可视层偏置值 fprintf(1,'\nPretraining Layer 2 with RBM: %d-%d \n',numhid,numpen); batchdata=batchposhidprobs; %batchposhidprobs为第一个rbm的输出概率值 numhid=numpen; restart=1; rbm; hidpen=vishid; penrecbiases=hidbiases; hidgenbiases=visbiases; save mnisthp hidpen penrecbiases hidgenbiases; %mnisthp为所保存的文件名 fprintf(1,'\nPretraining Layer 3 with RBM: %d-%d \n',numpen,numpen2); batchdata=batchposhidprobs; numhid=numpen2; restart=1; rbm; hidpen2=vishid; penrecbiases2=hidbiases; hidgenbiases2=visbiases; save mnisthp2 hidpen2 penrecbiases2 hidgenbiases2; fprintf(1,'\nPretraining Layer 4 with RBM: %d-%d \n',numpen2,numopen); batchdata=batchposhidprobs; numhid=numopen; restart=1; rbmhidlinear; hidtop=vishid; toprecbiases=hidbiases; topgenbiases=visbiases; save mnistpo hidtop toprecbiases topgenbiases; backprop; %Finetune
本次是训练4个隐含层的autoencoder深度网络结构,输入层维度为784维,4个隐含层维度分别为1000,500,250,30。整个网络权值的获得流程梳理如下:
% Version 1.000 % % Code provided by Geoff Hinton and Ruslan Salakhutdinov % % Permission is granted for anyone to copy, use, modify, or distribute this % program and accompanying programs and documents for any purpose, provided % this copyright notice is retained and prominently displayed, along with % a note saying that the original programs are available from our % web page. % The programs and documents are distributed without any warranty, express or % implied. As the programs were written for research purposes only, they have % not been tested to the degree that would be advisable in any important % application. All use of these programs is entirely at the user's own risk. % This program trains Restricted Boltzmann Machine in which % visible, binary, stochastic pixels are connected to % hidden, binary, stochastic feature detectors using symmetrically % weighted connections. Learning is done with 1-step Contrastive Divergence. % The program assumes that the following variables are set externally: % maxepoch -- maximum number of epochs % numhid -- number of hidden units % batchdata -- the data that is divided into batches (numcases numdims numbatches) % restart -- set to 1 if learning starts from beginning epsilonw = 0.1; % Learning rate for weights epsilonvb = 0.1; % Learning rate for biases of visible units epsilonhb = 0.1; % Learning rate for biases of hidden units weightcost = 0.0002; initialmomentum = 0.5; finalmomentum = 0.9; %由此可见这里隐含层和可视层的偏置值不是共用的,当然了,其权值是共用的 [numcases numdims numbatches]=size(batchdata);%[100,784,600] if restart ==1, restart=0; epoch=1; % Initializing symmetric weights and biases. vishid = 0.1*randn(numdims, numhid);%权值初始值随便给,784*1000 hidbiases = zeros(1,numhid); visbiases = zeros(1,numdims); poshidprobs = zeros(numcases,numhid); %100*1000,单个batch正向传播时隐含层的输出概率 neghidprobs = zeros(numcases,numhid); posprods = zeros(numdims,numhid); negprods = zeros(numdims,numhid); vishidinc = zeros(numdims,numhid); hidbiasinc = zeros(1,numhid); visbiasinc = zeros(1,numdims); batchposhidprobs=zeros(numcases,numhid,numbatches); % 整个数据正向传播时隐含层的输出概率 end for epoch = epoch:maxepoch, fprintf(1,'epoch %d\r',epoch); errsum=0; for batch = 1:numbatches, %每次迭代都有遍历所有的batch fprintf(1,'epoch %d batch %d\r',epoch,batch); %%%%%%%%% START POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% data = batchdata(:,:,batch); % 每次迭代都需要取出一个batch的数据,每一行代表一个样本值(这里的数据是double的,不是01的,严格的说后面应将其01化) poshidprobs = 1./(1 + exp(-data*vishid - repmat(hidbiases,numcases,1))); % 样本正向传播时隐含层节点的输出概率 batchposhidprobs(:,:,batch)=poshidprobs; posprods = data' * poshidprobs; %784*1000,这个是求系统的能量值用的,矩阵中每个元素表示对应的可视层节点和隐含层节点的乘积(包含此次样本的数据对应值的累加) poshidact = sum(poshidprobs);%针对样本值进行求和 posvisact = sum(data); %%%%%%%%% END OF POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% poshidstates = poshidprobs > rand(numcases,numhid); %将隐含层数据01化(此步骤在posprods之后进行),按照概率值大小来判定. %%%%%%%%% START NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% negdata = 1./(1 + exp(-poshidstates*vishid' - repmat(visbiases,numcases,1)));% 反向进行时的可视层数据 neghidprobs = 1./(1 + exp(-negdata*vishid - repmat(hidbiases,numcases,1))); % 反向进行后又马上正向传播的隐含层概率值 negprods = negdata'*neghidprobs;% 同理也是计算能量值用的,784*1000 neghidact = sum(neghidprobs); negvisact = sum(negdata); %%%%%%%%% END OF NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% err= sum(sum( (data-negdata).^2 ));% 重构后的差值 errsum = err + errsum; if epoch>5, momentum=finalmomentum; %momentum为保持上一次权值更新增量的比例,如果迭代次数越少,则这个比例值可以稍微大一点 else momentum=initialmomentum; end; %%%%%%%%% UPDATE WEIGHTS AND BIASES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% vishidinc = momentum*vishidinc + ... epsilonw*( (posprods-negprods)/numcases - weightcost*vishid); visbiasinc = momentum*visbiasinc + (epsilonvb/numcases)*(posvisact-negvisact); hidbiasinc = momentum*hidbiasinc + (epsilonhb/numcases)*(poshidact-neghidact); vishid = vishid + vishidinc; visbiases = visbiases + visbiasinc; hidbiases = hidbiases + hidbiasinc; %%%%%%%%%%%%%%%% END OF UPDATES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% end fprintf(1, 'epoch %4i error %6.1f \n', epoch, errsum); end;
偏置值的优化步骤:
当然了,权值更新和偏置值更新每次迭代都是同时进行的,所以应该是同时收敛的。并且在权值更新公式也可以稍微作下变形,比如加入momentum变量,即本次权值更新的增量会保留一部分上次更新权值的增量值。
% Version 1.000 % % Code provided by Ruslan Salakhutdinov and Geoff Hinton % % Permission is granted for anyone to copy, use, modify, or distribute this % program and accompanying programs and documents for any purpose, provided % this copyright notice is retained and prominently displayed, along with % a note saying that the original programs are available from our % web page. % The programs and documents are distributed without any warranty, express or % implied. As the programs were written for research purposes only, they have % not been tested to the degree that would be advisable in any important % application. All use of these programs is entirely at the user's own risk. % This program fine-tunes an autoencoder with backpropagation. % Weights of the autoencoder are going to be saved in mnist_weights.mat % and trainig and test reconstruction errors in mnist_error.mat % You can also set maxepoch, default value is 200 as in our paper. maxepoch=200; fprintf(1,'\nFine-tuning deep autoencoder by minimizing cross entropy error. \n');%其微调通过最小化交叉熵来实现 fprintf(1,'60 batches of 1000 cases each. \n'); load mnistvh % 分别load4个rbm的参数 load mnisthp load mnisthp2 load mnistpo makebatches; [numcases numdims numbatches]=size(batchdata); N=numcases; %%%% PREINITIALIZE WEIGHTS OF THE AUTOENCODER %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% w1=[vishid; hidrecbiases]; %分别装载每层的权值和偏置值,将它们作为一个整体 w2=[hidpen; penrecbiases]; w3=[hidpen2; penrecbiases2]; w4=[hidtop; toprecbiases]; w5=[hidtop'; topgenbiases]; w6=[hidpen2'; hidgenbiases2]; w7=[hidpen'; hidgenbiases]; w8=[vishid'; visbiases]; %%%%%%%%%% END OF PREINITIALIZATIO OF WEIGHTS %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% l1=size(w1,1)-1;%每个网络层中节点的个数 l2=size(w2,1)-1; l3=size(w3,1)-1; l4=size(w4,1)-1; l5=size(w5,1)-1; l6=size(w6,1)-1; l7=size(w7,1)-1; l8=size(w8,1)-1; l9=l1; %输出层节点和输入层的一样 test_err=[]; train_err=[]; for epoch = 1:maxepoch %%%%%%%%%%%%%%%%%%%% COMPUTE TRAINING RECONSTRUCTION ERROR %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% err=0; [numcases numdims numbatches]=size(batchdata); N=numcases; for batch = 1:numbatches data = [batchdata(:,:,batch)]; data = [data ones(N,1)]; % b补上一维,因为有偏置项 w1probs = 1./(1 + exp(-data*w1)); w1probs = [w1probs ones(N,1)];; %正向传播,计算每一层的输出,且同时在输出上增加一维(值为常量1) w2probs = 1./(1 + exp(-w1probs*w2)); w2probs = [w2probs ones(N,1)]; w3probs = 1./(1 + exp(-w2probs*w3)); w3probs = [w3probs ones(N,1)]; w4probs = w3probs*w4; w4probs = [w4probs ones(N,1)]; w5probs = 1./(1 + exp(-w4probs*w5)); w5probs = [w5probs ones(N,1)]; w6probs = 1./(1 + exp(-w5probs*w6)); w6probs = [w6probs ones(N,1)]; w7probs = 1./(1 + exp(-w6probs*w7)); w7probs = [w7probs ones(N,1)]; dataout = 1./(1 + exp(-w7probs*w8)); err= err + 1/N*sum(sum( (data(:,1:end-1)-dataout).^2 )); %重构的误差值 end train_err(epoch)=err/numbatches; %总的误差值(训练样本上) %%%%%%%%%%%%%% END OF COMPUTING TRAINING RECONSTRUCTION ERROR %%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%% DISPLAY FIGURE TOP ROW REAL DATA BOTTOM ROW RECONSTRUCTIONS %%%%%%%%%%%%%%%%%%%%%%%%% fprintf(1,'Displaying in figure 1: Top row - real data, Bottom row -- reconstructions \n'); output=[]; for ii=1:15 output = [output data(ii,1:end-1)' dataout(ii,:)']; %output为15(因为是显示15个数字)组,每组2列,分别为理论值和重构值 end if epoch==1 close all figure('Position',[100,600,1000,200]); else figure(1) end mnistdisp(output); drawnow; %%%%%%%%%%%%%%%%%%%% COMPUTE TEST RECONSTRUCTION ERROR %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% [testnumcases testnumdims testnumbatches]=size(testbatchdata); N=testnumcases; err=0; for batch = 1:testnumbatches data = [testbatchdata(:,:,batch)]; data = [data ones(N,1)]; w1probs = 1./(1 + exp(-data*w1)); w1probs = [w1probs ones(N,1)]; w2probs = 1./(1 + exp(-w1probs*w2)); w2probs = [w2probs ones(N,1)]; w3probs = 1./(1 + exp(-w2probs*w3)); w3probs = [w3probs ones(N,1)]; w4probs = w3probs*w4; w4probs = [w4probs ones(N,1)]; w5probs = 1./(1 + exp(-w4probs*w5)); w5probs = [w5probs ones(N,1)]; w6probs = 1./(1 + exp(-w5probs*w6)); w6probs = [w6probs ones(N,1)]; w7probs = 1./(1 + exp(-w6probs*w7)); w7probs = [w7probs ones(N,1)]; dataout = 1./(1 + exp(-w7probs*w8)); err = err + 1/N*sum(sum( (data(:,1:end-1)-dataout).^2 )); end test_err(epoch)=err/testnumbatches; fprintf(1,'Before epoch %d Train squared error: %6.3f Test squared error: %6.3f \t \t \n',epoch,train_err(epoch),test_err(epoch)); %%%%%%%%%%%%%% END OF COMPUTING TEST RECONSTRUCTION ERROR %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% tt=0; for batch = 1:numbatches/10 %测试样本numbatches是100 fprintf(1,'epoch %d batch %d\r',epoch,batch); %%%%%%%%%%% COMBINE 10 MINIBATCHES INTO 1 LARGER MINIBATCH %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% tt=tt+1; data=[]; for kk=1:10 data=[data batchdata(:,:,(tt-1)*10+kk)]; end %%%%%%%%%%%%%%% PERFORM CONJUGATE GRADIENT WITH 3 LINESEARCHES %%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%共轭梯度线性搜索 max_iter=3; VV = [w1(:)' w2(:)' w3(:)' w4(:)' w5(:)' w6(:)' w7(:)' w8(:)']';; % 把所有权值(已经包括了偏置值)变成一个大的列向量 Dim = [l1; l2; l3; l4; l5; l6; l7; l8; l9]; %每层网络对应节点的个数(不包括偏置值) [X, fX] = minimize(VV,'CG_MNIST',max_iter,Dim,data);%该函数时使用共轭梯度的方法来对参数X进行优化 w1 = reshape(X(1:(l1+1)*l2),l1+1,l2); xxx = (l1+1)*l2; w2 = reshape(X(xxx+1:xxx+(l2+1)*l3),l2+1,l3); xxx = xxx+(l2+1)*l3; w3 = reshape(X(xxx+1:xxx+(l3+1)*l4),l3+1,l4); xxx = xxx+(l3+1)*l4; w4 = reshape(X(xxx+1:xxx+(l4+1)*l5),l4+1,l5); xxx = xxx+(l4+1)*l5; w5 = reshape(X(xxx+1:xxx+(l5+1)*l6),l5+1,l6); xxx = xxx+(l5+1)*l6; w6 = reshape(X(xxx+1:xxx+(l6+1)*l7),l6+1,l7); xxx = xxx+(l6+1)*l7; w7 = reshape(X(xxx+1:xxx+(l7+1)*l8),l7+1,l8); xxx = xxx+(l7+1)*l8; w8 = reshape(X(xxx+1:xxx+(l8+1)*l9),l8+1,l9); %依次重新赋值为优化后的参数 %%%%%%%%%%%%%%% END OF CONJUGATE GRADIENT WITH 3 LINESEARCHES %%%%%%%%%%%%%%%%%%%%%%%%%%%%% end save mnist_weights w1 w2 w3 w4 w5 w6 w7 w8 %前面一个是文件名 save mnist_error test_err train_err; end
函数CG_MNIST形式如下:
function [f, df] = CG_MNIST(VV,Dim,XX);
该函数实现的功能是计算网络代价函数值f,以及f对网络中各个参数值的偏导数df,权值和偏置值是同时处理。其中参数VV为网络中所有参数构成的列向量,参数Dim为每层网络的节点数构成的向量,XX为训练样本集合。f和df分别表示网络的代价函数和偏导函数值。
[X, fX, i] = minimize(X, f, length, P1, P2, P3, ... )
该函数时使用共轭梯度的方法来对参数X进行优化,所以X是网络的参数值,为一个列向量。f是一个函数的名称,它主要是用来计算网络中的代价函数以及代价函数对各个参数X的偏导函数,f的参数值分别为X,以及minimize函数后面的P1,P2,P3,…使用共轭梯度法进行优化的最大线性搜索长度为length。返回值X为找到的最优参数,fX为在此最优参数X下的代价函数,i为线性搜索的长度(即迭代的次数)。
本文参考如下博客:
http://www.cnblogs.com/tornadomeet/archive/2013/04/30/3052349.html