weixin_30920513

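Deep Learning 16: Dimensionality Reduction with Autoencoders, Notes on the Paper "Reducing the Dimensionality of Data with Neural Networks"...

Preface

The paper "Reducing the Dimensionality of Data with Neural Networks" was published in Science in 2006 by Geoffrey Hinton, one of the founders of deep learning, together with Ruslan Salakhutdinov. It is often credited with opening the modern deep learning era.

Notes

Abstract: High-dimensional data can be converted to low-dimensional codes by training a multilayer neural network with a small central layer to reconstruct the high-dimensional input vectors; such a network is called an autoencoder. Gradient descent can be used to fine-tune the weights of an autoencoder, but it works well only if the initial weights are close to a good solution; otherwise it easily gets stuck in a poor local optimum. The paper describes an effective way of initializing the weights that allows deep autoencoders to learn low-dimensional codes that work much better than principal component analysis (PCA) as a tool for reducing the dimensionality of data.

Content: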

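Dimensionality reduction facilitates classification, visualization, communication, and storage of high-dimensional data. A simple and widely used method is PCA, which finds the directions of greatest variance in the data set and represents each data point by its coordinates along each of these directions. The paper describes a nonlinear generalization of PCA: an adaptive, multilayer "encoder" network transforms the high-dimensional data into a low-dimensional code, and a similar "decoder" network recovers the data from the code.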
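To make the PCA baseline concrete, here is a minimal MATLAB sketch (not part of the original code) that centers the data, projects it onto the top principal directions obtained from the SVD, and reconstructs it from the low-dimensional codes. The toy matrix X and the code size k are assumptions chosen only for illustration.

```matlab
% Minimal PCA sketch on assumed toy data: center, project onto the top-k
% directions of greatest variance, then reconstruct from the k-dim codes.
X = [2.0 0.1; 4.1 -0.1; 5.9 0.2; 8.0 -0.2; 10.0 0.0];  % 5 samples x 2 dims (toy)
k = 1;                                            % code dimension (assumed)
mu = mean(X, 1);                                  % per-dimension mean
Xc = X - repmat(mu, size(X, 1), 1);               % centered data
[~, ~, V] = svd(Xc, 'econ');                      % columns of V: principal directions
P = V(:, 1:k);                                    % top-k directions
codes = Xc * P;                                   % k-dimensional codes
Xrec = codes * P' + repmat(mu, size(X, 1), 1);    % linear reconstruction
mse = mean(sum((X - Xrec).^2, 2));                % squared error per sample
fprintf('PCA squared reconstruction error per sample: %.4f\n', mse);
```

The autoencoder replaces the two linear maps (Xc*P for encoding, codes*P' for decoding) with nonlinear multilayer networks.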

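The weights of the two networks are first initialized randomly and then trained together by minimizing the discrepancy between the original data and its reconstruction. The required gradients are obtained with the chain rule by backpropagating error derivatives first through the decoder network and then through the encoder network. The whole system is called an autoencoder; see Fig. 1.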
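The training scheme just described can be sketched on a toy problem. This is a minimal MATLAB illustration, not the authors' code: the 4-2-4 architecture, the squared-error loss, the data, and the learning rate are all assumptions. It backpropagates the error derivative first through the decoder (W2) and then through the encoder (W1), and checks one encoder gradient entry against a central finite difference.

```matlab
% Minimal sketch (toy sizes, data and learning rate are assumptions): a
% 4-2-4 autoencoder with logistic units trained on squared error. The
% error derivative goes first through the decoder (W2), then the encoder (W1).
sigm = @(z) 1 ./ (1 + exp(-z));
X  = [0.9 0.1 0.8 0.2; 0.1 0.9 0.2 0.7; 0.8 0.2 0.9 0.1];  % 3 samples x 4 dims
N  = size(X, 1);
W1 = 0.1 * randn(5, 2);      % encoder: 4 inputs + bias row -> 2 code units
W2 = 0.1 * randn(3, 4);      % decoder: 2 code units + bias row -> 4 outputs
loss = @(W1, W2) 0.5 / N * sum(sum( ...
    (sigm([sigm([X ones(N, 1)] * W1) ones(N, 1)] * W2) - X).^2));
L0 = loss(W1, W2);
lr = 1;
for it = 1:500
  Xb = [X ones(N, 1)];
  H  = sigm(Xb * W1);  Hb = [H ones(N, 1)];   % codes (+ bias column)
  Y  = sigm(Hb * W2);                          % reconstructions
  dY  = (Y - X) / N .* Y .* (1 - Y);           % delta at the decoder output
  gW2 = Hb' * dY;                              % decoder gradient
  dH  = (dY * W2') .* Hb .* (1 - Hb);          % back through the decoder
  gW1 = Xb' * dH(:, 1:end-1);                  % encoder gradient (bias column dropped)
  if it == 1                                   % finite-difference check of one weight
    e = 1e-6;
    Wp = W1; Wp(2, 1) = Wp(2, 1) + e;
    Wm = W1; Wm(2, 1) = Wm(2, 1) - e;
    gradErr = abs(gW1(2, 1) - (loss(Wp, W2) - loss(Wm, W2)) / (2 * e));
  end
  W1 = W1 - lr * gW1;
  W2 = W2 - lr * gW2;
end
fprintf('loss %.4f -> %.4f, gradient check error %.2e\n', L0, loss(W1, W2), gradErr);
```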

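Fig. 1. Pretraining consists of learning a stack of RBMs, each having only one layer of feature detectors. The learned feature activations of one RBM are used as the data for training the next RBM in the stack. After pretraining, the RBMs are "unrolled" to create a deep autoencoder, which is then fine-tuned by backpropagating error derivatives.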

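Optimizing the weights of a nonlinear autoencoder with multiple hidden layers (2 to 4) is difficult. With large initial weights, autoencoders typically get stuck in poor local minima; with small initial weights, the gradients in the early layers are tiny, which makes it infeasible to train autoencoders with many hidden layers. If the initial weights are close to a good solution, gradient descent works well, but finding such initial weights requires a very different type of algorithm, one that learns one layer of features at a time. This "pretraining" procedure yields initial weights close to a good solution. The paper introduces pretraining for binary data, then generalizes it to real-valued data and shows that it works well on a variety of data sets.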
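The small-weight problem can be seen numerically. The following MATLAB sketch (the layer sizes, the 0.1 weight scale and the target are assumptions, not values from the paper) backpropagates an output error through six logistic layers with small random weights and records the size of each layer's weight gradient. The gradients for the bottom layers come out much smaller than those for the top layers.

```matlab
% Minimal sketch (sizes, init scale and target are assumptions): with small
% initial weights, error derivatives shrink as they are backpropagated
% through a stack of logistic layers, so the early layers barely learn.
sigm = @(z) 1 ./ (1 + exp(-z));
n = 20; numLayers = 6; scale = 0.1;
W = cell(1, numLayers);
a = cell(1, numLayers + 1);
a{1} = rand(1, n);                         % one input vector
for l = 1:numLayers
  W{l} = scale * randn(n, n);              % small random initial weights
  a{l+1} = sigm(a{l} * W{l});              % forward pass
end
target = ones(1, n);                       % arbitrary target for the demo
delta = (a{end} - target) .* a{end} .* (1 - a{end});   % output delta, squared error
gradNorm = zeros(1, numLayers);
for l = numLayers:-1:1
  gradNorm(l) = norm(a{l}' * delta, 'fro');             % size of dE/dW{l}
  if l > 1
    delta = (delta * W{l}') .* a{l} .* (1 - a{l});       % backpropagate one layer
  end
end
disp(gradNorm)   % grows from layer 1 (bottom) to layer 6 (top)
```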

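An ensemble of binary vectors (e.g., images) can be modeled with a two-layer network called a restricted Boltzmann machine (RBM) [5][6], in which stochastic binary pixels are connected to stochastic binary feature detectors through symmetrically weighted connections. The pixels correspond to the "visible" units of the RBM because their states are observed; the feature detectors correspond to the "hidden" units. A joint configuration (v, h) of the visible and hidden units has an energy [7] given by:

$$E(\mathbf{v},\mathbf{h}) = -\sum_{i \in \text{pixels}} b_i v_i - \sum_{j \in \text{features}} b_j h_j - \sum_{i,j} v_i h_j w_{ij}$$

where v_i and h_j are the binary states of visible unit i and hidden unit j, b_i and b_j are their biases, and w_ij is the weight between them. The network assigns a probability to every possible image via this energy function; see [8] for details. The units use the logistic (sigmoid) function: given an image, each hidden unit j is turned on with probability σ(b_j + Σ_i v_i w_ij), where σ(x) = 1/(1 + exp(−x)), and a reconstruction is produced by turning each v_i on with probability σ(b_i + Σ_j h_j w_ij). The probability of a training image (binary, for now) can be raised by adjusting the weights and biases to lower the energy of that image and raise the energy of the reconstructions the network prefers. The weight update rule is:

$$\Delta w_{ij} = \varepsilon\left(\langle v_i h_j\rangle_{\text{data}} - \langle v_i h_j\rangle_{\text{recon}}\right)$$

where ε is the learning rate, ⟨v_i h_j⟩_data is the fraction of times pixel i and feature detector j are on together when the feature detectors are driven by data, and ⟨v_i h_j⟩_recon is the corresponding fraction for reconstructions. A simplified version of the same rule is used for the biases.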
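As a sanity check of how the energy function defines the conditional probabilities, the following MATLAB sketch (toy parameter values, assumed for illustration) enumerates all hidden configurations of a 3-visible, 2-hidden RBM, normalizes exp(−E(v,h)) to get p(h|v), and compares the marginals with the logistic formula σ(b_j + Σ_i v_i w_ij).

```matlab
% Minimal sketch (toy parameter values are assumptions): for a 3-visible,
% 2-hidden RBM, compute p(h | v) by brute force from the energy E(v,h) and
% compare it with p(h_j = 1 | v) = sigma(b_j + sum_i v_i w_ij).
sigm = @(z) 1 ./ (1 + exp(-z));
W  = [0.5 -1.0; 0.3 0.8; -0.7 0.2];    % w_ij: 3 visible x 2 hidden
bv = [0.1 -0.2 0.3];                   % visible biases b_i
bh = [-0.4 0.6];                       % hidden biases b_j
E  = @(v, h) -bv * v' - bh * h' - v * W * h';   % energy of a joint configuration
v  = [1 0 1];                          % a fixed binary visible vector
H  = [0 0; 0 1; 1 0; 1 1];             % all 4 hidden configurations
p  = zeros(4, 1);
for k = 1:4
  p(k) = exp(-E(v, H(k, :)));          % unnormalized p(v, h)
end
p = p / sum(p);                        % p(h | v)
pOnBrute = p' * H;                     % marginal p(h_j = 1 | v)
pOnSigm  = sigm(bh + v * W);           % closed-form conditional
disp([pOnBrute; pOnSigm])
```

The two rows agree because, given v, the hidden units of an RBM are conditionally independent.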

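A single layer of binary features is not enough to model the structure of a large data set, so a multilayer network is used: after one layer of feature detectors has been learned, their activities, when driven by the data, are used as the data for learning a second layer. Adding an extra layer in this way always improves a lower bound on the log probability that the model assigns to the training data, provided the number of feature detectors per layer does not decrease and their weights are initialized correctly. Each layer of features captures strong, high-order correlations between the activities of the units in the layer below.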
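Greedy layer-wise stacking can be sketched in a few lines. This is a simplified MATLAB illustration, not the original rbm.m: the toy data, layer sizes, learning rate and number of epochs are assumptions, and it uses full-batch CD-1 without momentum or weight decay. After each RBM is trained, its hidden probabilities become the data for the next one.

```matlab
% Minimal sketch (toy data, layer sizes, learning rate and epochs are
% assumptions): greedy layer-wise pretraining with CD-1. Each RBM's hidden
% probabilities, driven by the data, become the training data of the next RBM.
sigm = @(z) 1 ./ (1 + exp(-z));
patterns = [ones(1, 8) zeros(1, 8); zeros(1, 8) ones(1, 8)];
data = patterns(1 + mod(0:199, 2), :);     % 200 binary vectors, 2 prototypes
flip = rand(size(data)) < 0.05;
data(flip) = 1 - data(flip);               % 5% pixel noise
layerSizes = [16 8 4];                     % visible -> hidden 1 -> hidden 2
epsilon = 0.1; numEpochs = 50;
weights = cell(1, numel(layerSizes) - 1);
layerInput = data;
for L = 1:numel(layerSizes) - 1
  nv = layerSizes(L); nh = layerSizes(L + 1); N = size(layerInput, 1);
  W = 0.1 * randn(nv, nh); bv = zeros(1, nv); bh = zeros(1, nh);
  for epoch = 1:numEpochs                  % full-batch CD-1 for simplicity
    v0  = layerInput;
    ph0 = sigm(v0 * W + repmat(bh, N, 1));     % p(h = 1 | v0)
    h0  = double(ph0 > rand(N, nh));            % sampled binary hidden states
    pv1 = sigm(h0 * W' + repmat(bv, N, 1));    % reconstruction (probabilities)
    ph1 = sigm(pv1 * W + repmat(bh, N, 1));    % p(h = 1 | v1)
    W  = W  + epsilon * (v0' * ph0 - pv1' * ph1) / N;
    bv = bv + epsilon * mean(v0 - pv1);
    bh = bh + epsilon * mean(ph0 - ph1);
  end
  weights{L} = [W; bh];                         % same [W; b] layout as backprop.m
  layerInput = sigm(layerInput * W + repmat(bh, N, 1));  % data for the next RBM
end
codes = layerInput;
```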

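After pretraining, the stacked RBMs are "unrolled" into an encoder network and a decoder network that initially use the same weights (the decoder uses their transposes). The whole network is then fine-tuned by backpropagation, using the original data as the target output; in this stage the stochastic activities are replaced by deterministic, real-valued probabilities.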
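The unrolling step can be sketched as follows. The random matrices stand in for pretrained RBM parameters and the layer sizes are assumptions; the layout (each layer stored as [W; b] with the bias in the last row, and a linear code layer) follows backprop.m.

```matlab
% Minimal sketch (toy sizes; random stand-ins replace real pretrained RBM
% parameters): unroll a stack of RBMs into an encoder that uses [W; b_hidden]
% and a decoder that uses [W'; b_visible] in reverse order, as backprop.m does.
sigm = @(z) 1 ./ (1 + exp(-z));
sizes = [6 4 2];                            % visible -> hidden -> code (assumed)
numRbm = numel(sizes) - 1;
W = cell(1, numRbm); bh = cell(1, numRbm); bv = cell(1, numRbm);
for k = 1:numRbm
  W{k}  = 0.1 * randn(sizes(k), sizes(k + 1));
  bh{k} = zeros(1, sizes(k + 1));
  bv{k} = zeros(1, sizes(k));
end
net = cell(1, 2 * numRbm);
for k = 1:numRbm
  net{k} = [W{k}; bh{k}];                   % encoder layer k
  net{2 * numRbm - k + 1} = [W{k}'; bv{k}]; % mirrored decoder layer
end
X = rand(5, sizes(1));                      % 5 toy input vectors
a = X;
for k = 1:2 * numRbm
  z = [a ones(size(a, 1), 1)] * net{k};     % bias via a column of ones
  if k == numRbm
    a = z;                                  % linear code layer
  else
    a = sigm(z);                            % logistic layers
  end
end
recon = a;                                  % reconstruction before fine-tuning
```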

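For continuous data, the hidden units of the first RBM remain binary, but its visible units are replaced by linear units with Gaussian noise. If this noise has unit variance, the stochastic update rule for the hidden units stays the same, and visible unit i is updated by sampling from a unit-variance Gaussian with mean b_i + Σ_j h_j w_ij.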
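A minimal MATLAB sketch of this Gaussian visible-unit update (the weights, biases and hidden states are made-up toy values): each visible unit is sampled from a unit-variance Gaussian whose mean is b_i + Σ_j h_j w_ij, and the sample mean approaches that value.

```matlab
% Minimal sketch (toy values are assumptions): sampling linear visible units
% with unit-variance Gaussian noise, v_i ~ N(b_i + sum_j h_j w_ij, 1).
W  = [0.5 -0.2; 0.1 0.4; -0.3 0.2];   % w_ij: 3 visible x 2 hidden
bv = [0.2 -0.1 0.0];                  % visible biases b_i
h  = [1 0];                           % binary hidden states
mu = bv + h * W';                     % Gaussian means: [0.7 0 -0.3]
numSamples = 100000;
v = repmat(mu, numSamples, 1) + randn(numSamples, 3);   % sampled visible states
fprintf('mean %s, sample mean %s\n', mat2str(mu, 3), mat2str(mean(v), 3));
```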

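In the experiments, the visible units of every RBM had real-valued activities in [0, 1]. When training the higher-level RBMs, the visible units were set to the activation probabilities of the hidden units of the previous RBM, while the hidden units of every RBM except the top one had stochastic binary values. The hidden units of the top RBM had stochastic real-valued states drawn from a unit-variance Gaussian whose mean was determined by the input from that RBM's logistic visible units. This lets the low-dimensional codes make good use of continuous variables and makes comparison with PCA easier. Details of pretraining and fine-tuning are given in [8].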
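The top RBM's real-valued hidden units can be sketched like this. It is a reduced version of the sampling step in rbmhidlinear.m, with assumed toy sizes and random parameters: the hidden input is used directly as the Gaussian mean, with no logistic function, while the visible units stay logistic.

```matlab
% Minimal sketch (toy sizes and random parameters are assumptions) of the
% top RBM's sampling step, mirroring rbmhidlinear.m: the hidden units are
% linear, so their input W'v + b_j is the mean of a unit-variance Gaussian.
sigm = @(z) 1 ./ (1 + exp(-z));
numcases = 4; numdims = 6; numhid = 2;
data = rand(numcases, numdims);                 % real-valued activities in [0,1]
vishid = 0.1 * randn(numdims, numhid);
hidbiases = zeros(1, numhid); visbiases = zeros(1, numdims);
hidmean   = data * vishid + repmat(hidbiases, numcases, 1);  % no logistic
hidstates = hidmean + randn(numcases, numhid);              % real-valued sample
negdata   = sigm(hidstates * vishid' + repmat(visbiases, numcases, 1)); % logistic visibles
neghidmean = negdata * vishid + repmat(hidbiases, numcases, 1);
```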

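The cross-entropy error is:

$$-\sum_i p_i \log \hat{p}_i - \sum_i (1 - p_i) \log(1 - \hat{p}_i)$$

where p_i is the intensity of pixel i and p̂_i is the intensity of its reconstruction. CG_MNIST.m uses this error averaged over the samples in a batch.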
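The cross-entropy error has a simple gradient with respect to the input of the logistic output units: (p̂ − p)/N when averaged over N samples, which is the IO term in CG_MNIST.m. The MATLAB sketch below (the toy targets P and pre-sigmoid inputs Z are assumptions) computes the averaged error and compares this analytic gradient with central finite differences.

```matlab
% Minimal sketch (toy targets and inputs are assumptions): averaged
% cross-entropy of logistic outputs and its gradient w.r.t. the pre-sigmoid input.
sigm = @(z) 1 ./ (1 + exp(-z));
P = [0.0 0.5 1.0; 0.2 0.9 0.4];        % target pixel intensities p_i in [0,1]
Z = [0.3 -1.2 2.0; -0.5 1.5 0.1];      % pre-sigmoid inputs of the output layer
N = size(P, 1);
xent = @(Z) -1/N * sum(sum(P .* log(sigm(Z)) + (1 - P) .* log(1 - sigm(Z))));
G = (sigm(Z) - P) / N;                 % analytic gradient (same form as IO)
e = 1e-6; Gnum = zeros(size(Z));
for k = 1:numel(Z)                     % central finite differences
  Zp = Z; Zp(k) = Zp(k) + e;
  Zm = Z; Zm(k) = Zm(k) - e;
  Gnum(k) = (xent(Zp) - xent(Zm)) / (2 * e);
end
fprintf('cross-entropy %.4f, max gradient mismatch %.2e\n', xent(Z), max(abs(G(:) - Gnum(:))));
```

The logistic derivative p̂(1 − p̂) cancels against the denominators of the cross-entropy derivative, which is why the output delta has no sigmoid-derivative factor.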

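Next, a series of experiments was carried out.

Experiments

Notes on the experimental setup

1. Experiment code: http://www.cs.toronto.edu/~hinton/MatlabForSciencePaper.html

2. CG_MNIST.m uses the backpropagation algorithm to compute the partial derivatives df of every layer; see "http://ufldl.stanford.edu/wiki/index.php/用反向传导思想求导"

3. Some MATLAB functions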

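rem and mod:

Reference: "The difference between modulo (mod) and remainder (rem): MATLAB study notes"

Both mod and rem return a remainder (the modulo operation is often also called the remainder operation). The only difference between them: when x and y have the same sign, the two functions give the same result; when the signs differ, the result of rem has the sign of x, while the result of mod has the sign of y. This is because rem rounds the quotient with fix (toward zero), whereas mod rounds it with floor (toward negative infinity). For y ≠ 0, rem(x,y) returns x - n.*y with n = fix(x./y), and mod(x,y) returns x - n.*y with n = floor(x./y).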
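A quick MATLAB check of these rules on the four sign combinations (the values are chosen only for illustration):

```matlab
% rem vs mod on the four sign combinations of x and y
x = [7 -7 7 -7];
y = [3 3 -3 -3];
r = rem(x, y)   % sign of x: [1 -1 1 -1]
m = mod(x, y)   % sign of y: [1 2 -2 -1]
```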

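4. Function descriptions

converter.m:

Converts the sample sets from the raw .ubyte (IDX) format to .ascii text files, and then to .mat files.

makebatches.m:

Turns the 2-D data set (one sample per row) into a 3-D array, because the data is split into mini-batches: the third dimension indexes the batch.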
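A minimal MATLAB sketch of what makebatches.m does, on an assumed 12-by-3 toy matrix instead of MNIST: shuffle the rows, cut them into batches along a third dimension, and recover the shuffled 2-D matrix again. The vectorized reshape/permute version is an alternative added for illustration, not part of the original code.

```matlab
% Minimal sketch (toy sizes are assumptions) of what makebatches.m does:
% shuffle the rows of an N-by-D matrix and cut them into a
% batchsize-by-D-by-numbatches array.
totnum = 12; numdims = 3; batchsize = 4;
digitdata = reshape(1:totnum * numdims, totnum, numdims);   % one sample per row
numbatches = totnum / batchsize;
randomorder = randperm(totnum);
batchdata = zeros(batchsize, numdims, numbatches);
for b = 1:numbatches                          % same loop as makebatches.m
  batchdata(:, :, b) = digitdata(randomorder(1 + (b - 1) * batchsize : b * batchsize), :);
end
% Equivalent vectorized version, and the inverse operation
shuffled = digitdata(randomorder, :);
batchdata2 = permute(reshape(shuffled', numdims, batchsize, numbatches), [2 1 3]);
flat = reshape(permute(batchdata, [1 3 2]), totnum, numdims);   % back to N-by-D (shuffled order)
```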

function [f, df] = CG_MNIST(VV,Dim,XX);

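This function computes the network's cost f and the partial derivatives df of f with respect to every network parameter; weights and biases are handled together. VV is a column vector containing all network parameters, Dim is a vector with the number of nodes in each layer, and XX is the set of training samples. f and df are the cost value and its gradient.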
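The packing of all weight matrices into the single column vector VV (in backprop.m), and the unpacking inside CG_MNIST.m, can be sketched as follows. The layer sizes in Dim are toy assumptions; each w{k} has Dim(k)+1 rows because its last row holds the biases.

```matlab
% Minimal sketch (toy layer sizes are assumptions): pack the weight matrices
% w{k} = [W_k; b_k] into one column vector VV, as backprop.m does, and unpack
% it with a running offset, as the xxx variable does in CG_MNIST.m.
Dim = [4; 3; 2; 3; 4];                     % nodes per layer, biases not counted
numW = numel(Dim) - 1;
w = cell(1, numW);
for k = 1:numW
  w{k} = randn(Dim(k) + 1, Dim(k + 1));    % last row holds the biases
end
VV = [];
for k = 1:numW
  VV = [VV; w{k}(:)];                      % pack column by column
end
u = cell(1, numW); offset = 0;
for k = 1:numW
  len = (Dim(k) + 1) * Dim(k + 1);
  u{k} = reshape(VV(offset + 1 : offset + len), Dim(k) + 1, Dim(k + 1));
  offset = offset + len;                   % elements of VV consumed so far
end
```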

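The conjugate gradient optimizer has the form:

[X, fX, i] = minimize(X, f, length, P1, P2, P3, ... )

It optimizes the parameters X by conjugate gradients, so X is the network's parameter vector, a column vector. f is the name of a function that computes the network's cost and its partial derivatives with respect to X; f is called with X followed by the extra arguments P1, P2, P3, ... given to minimize. When length is positive, it is the maximum number of line searches. The outputs are X, the optimized parameters; fX, the vector of cost values recorded as the optimization progresses; and i, the number of line searches (iterations) used.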
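To illustrate the calling convention and the idea behind minimize.m, here is a simplified MATLAB sketch. It is not the minimize.m used here, which has a more careful line search; this version uses Polak-Ribiere conjugate gradients with a plain secant line search, and the quadratic test problem (A, b) is an assumption. The objective returns [f, df] through deal, and the extra arguments are passed along the way minimize passes P1, P2, ... to f.

```matlab
% Simplified sketch (not the real minimize.m): Polak-Ribiere conjugate
% gradients with a secant line search, calling the objective the way
% minimize.m does, [f, df] = feval(f, X, P1, P2, ...).
A = [4 1 0; 1 3 1; 0 1 2]; b = [1; 2; 3];                   % assumed test problem
fun = @(x, A, b) deal(0.5 * x' * A * x - b' * x, A * x - b); % returns [f, df]
extra = {A, b};                                             % plays the role of P1, P2
x = zeros(3, 1);
[fx, g] = feval(fun, x, extra{:});
d = -g;                                                     % start with steepest descent
for iter = 1:20                                             % at most 20 line searches
  t0 = 0; s0 = g' * d;                                      % slope phi'(0) along d
  t1 = 1; [~, g1] = feval(fun, x + t1 * d, extra{:}); s1 = g1' * d;
  for ls = 1:20                                             % secant steps on phi'(t) = 0
    if abs(s1) < 1e-12 || s1 == s0, break; end
    t2 = t1 - s1 * (t1 - t0) / (s1 - s0);
    t0 = t1; s0 = s1; t1 = t2;
    [~, g1] = feval(fun, x + t1 * d, extra{:}); s1 = g1' * d;
  end
  x = x + t1 * d;                                           % accept the step
  if norm(g1) < 1e-10, break; end
  beta = max(0, g1' * (g1 - g) / (g' * g));                 % Polak-Ribiere, clipped at 0
  d = -g1 + beta * d;                                       % new conjugate direction
  g = g1;
end
[fx, ~] = feval(fun, x, extra{:});
fprintf('f = %.6f after %d line searches\n', fx, iter);
```

On a quadratic the secant step lands exactly on the line minimum, so this converges in about as many line searches as there are variables; for a neural network cost, a safeguarded line search like the one in minimize.m is needed.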

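Questions

1. In rbm.m the code sets v1 = p(v1|h0) directly. Strictly, v1 should be obtained by comparing p(v1|h0) with uniform random numbers, i.e. by binarizing it, but the code never binarizes p(v1|h0). Why?

A plausible answer: Hinton's "A Practical Guide to Training Restricted Boltzmann Machines" notes that it is common to use the probabilities instead of sampled binary values for the reconstruction, because this reduces sampling noise and allows faster learning. In addition, as described above, the visible units in these experiments have real-valued activities in [0, 1] rather than binary states.

2. In rbmhidlinear.m, the pretraining code for the 4th RBM, there is this line: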

poshidprobs = (data*vishid) + repmat(hidbiases,numcases,1);

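That is, p(h_j=1|v0) = Σ_i v0_i w_ij + b_j. Why?

Answer: because the output-layer neurons (the hidden units of the 4th RBM) use the identity activation f(x)=x instead of the logistic function. These hidden units are linear units with unit-variance Gaussian noise, so poshidprobs holds the mean of each hidden unit rather than a probability.

3. In backprop.m, which unrolls the 4 RBMs, connects them, and fine-tunes the whole model on the training data, there is this line: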

w4probs = w3probs*w4; w4probs = [w4probs ones(N,1)];

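Why?

Answer: before the 4 RBMs were unrolled, the output neurons (the hidden units of the 4th RBM) already used the identity activation f(x)=x rather than the logistic function. So after the 4 RBMs are unrolled and connected into a 9-layer network, the neurons of its 5th layer still use f(x)=x, which is why no logistic function is applied to w3probs*w4. The appended column ones(N,1) is the constant input that multiplies the bias row stored as the last row of the next weight matrix, w5.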
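The role of the appended column of ones can be checked directly. The toy activations, weights and biases below are assumptions; the point is that [a ones(N,1)]*[W; b] equals a*W with the bias added to every row.

```matlab
% Illustration with assumed toy values: appending a column of ones to the
% activations lets the bias row of [W; b] act as the bias term.
a = [0.2 0.7 0.1; 0.9 0.4 0.5];      % 2 samples x 3 units
W = [1 -1; 0.5 2; -0.3 0.4];         % 3 x 2 weights
b = [0.1 -0.2];                      % 1 x 2 biases
N = size(a, 1);
out1 = [a ones(N, 1)] * [W; b];      % augmented form used in backprop.m
out2 = a * W + repmat(b, N, 1);      % explicit weights-plus-bias form
disp(out1 - out2)                    % zero up to rounding
```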

That is, in the figure below, the neurons of the 30-node layer use the activation f(x)=x.

4. This line in backprop.m:

dataout = 1./(1 + exp(-w7probs*w8));

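Here dataout is the output probability of the output layer, yet in the code that follows it is used as the output, i.e. reconstructed, data. Why?

Answer: I am not certain of the reason, but from the code one can infer that output probability = reconstructed data. A likely explanation: the pixel intensities are scaled to [0, 1] and treated as the probabilities that binary pixels are on, and during fine-tuning the stochastic activities are replaced by deterministic real-valued probabilities. The logistic output p̂_i is therefore itself the reconstructed intensity, which the cross-entropy error compares with p_i.

Experimental procedure

1. Load the data set and convert it to .mat format (converter.m);

2. Pretrain the 4 RBMs one after another, using the hidden activation probabilities of each RBM as the input of the next (rbm.m; the 4th RBM uses rbmhidlinear.m);

3. Unroll the 4 RBMs into the "Unrolling" network of Fig. 1, and compute its cost function and the partial derivatives with respect to every weight (backprop.m and CG_MNIST.m);

4. Minimize the cost with conjugate gradients (minimize.m).

Experimental results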

Train squared error: 4.318

Test squared error: 4.520

Code

mnistdeepauto.m

% Version 1.000
%
% Code provided by Ruslan Salakhutdinov and Geoff Hinton  
%
% Permission is granted for anyone to copy, use, modify, or distribute this
% program and accompanying programs and documents for any purpose, provided
% this copyright notice is retained and prominently displayed, along with
% a note saying that the original programs are available from our 
% web page. 
% The programs and documents are distributed without any warranty, express or
% implied.  As the programs were written for research purposes only, they have
% not been tested to the degree that would be advisable in any important
% application.  All use of these programs is entirely at the user's own risk.


% This program pretrains a deep autoencoder for MNIST dataset
% You can set the maximum number of epochs for pretraining each layer
% and you can set the architecture of the multilayer net.

clear all
close all

maxepoch=10; % maximum number of epochs. In the Science paper we use maxepoch=50, but it works just fine. 
numhid=1000; numpen=500; numpen2=250; numopen=30;

fprintf(1,'Converting Raw files into Matlab format \n');
converter; % convert the test and training sets to .mat format

fprintf(1,'Pretraining a deep autoencoder. \n');
fprintf(1,'The Science paper used 50 epochs. This uses %3i \n', maxepoch);

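makebatches; % split the data and labels into mini-batches for batch-wise processing; the data set is large and mini-batches speed up learning
[numcases numdims numbatches]=size(batchdata); % size of the batched training data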

fprintf(1,'Pretraining Layer 1 with RBM: %d-%d \n',numdims,numhid);
restart=1;
rbm;             % pretrain the 1st RBM
hidrecbiases=hidbiases; % hidden biases of the 1st RBM
save mnistvh vishid hidrecbiases visbiases; % save the 1st RBM's weights, hidden biases and visible biases to mnistvh.mat

fprintf(1,'\nPretraining Layer 2 with RBM: %d-%d \n',numhid,numpen);
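batchdata=batchposhidprobs; % hidden probabilities p(h=1|v0) of the 1st RBM recorded in its last epoch (probabilities, not binarized states) become the training data of the 2nd RBM
numhid=numpen; % number of hidden units of the 2nd RBM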
restart=1;
rbm;       % pretrain the 2nd RBM
hidpen=vishid; penrecbiases=hidbiases; hidgenbiases=visbiases;
save mnisthp hidpen penrecbiases hidgenbiases; % save the 2nd RBM's weights, hidden biases and visible biases to mnisthp.mat

fprintf(1,'\nPretraining Layer 3 with RBM: %d-%d \n',numpen,numpen2);
batchdata=batchposhidprobs; % hidden probabilities of the 2nd RBM (probabilities, not binarized states) become the training data of the 3rd RBM
numhid=numpen2;
restart=1;
rbm;       % pretrain the 3rd RBM
hidpen2=vishid; penrecbiases2=hidbiases; hidgenbiases2=visbiases;
save mnisthp2 hidpen2 penrecbiases2 hidgenbiases2; % save the 3rd RBM's weights, hidden biases and visible biases to mnisthp2.mat

fprintf(1,'\nPretraining Layer 4 with RBM: %d-%d \n',numpen2,numopen);
batchdata=batchposhidprobs; % hidden probabilities of the 3rd RBM become the training data of the 4th RBM
numhid=numopen; 
restart=1;
rbmhidlinear;      % pretrain the 4th RBM; note that its hidden (output) units are linear, f(x)=x, instead of logistic
hidtop=vishid; toprecbiases=hidbiases; topgenbiases=visbiases;
save mnistpo hidtop toprecbiases topgenbiases; % save the 4th RBM's weights, hidden biases and visible biases to mnistpo.mat

backprop; % unroll the 4 RBMs into one deep autoencoder and fine-tune the whole model on the training data

converter.m

% Version 1.000
% Purpose: convert the test and training sets to .mat format
% Resulting test sets: test0.mat ... test9.mat
% Resulting training sets: digit0.mat ... digit9.mat
% Code provided by Ruslan Salakhutdinov and Geoff Hinton
%
% Permission is granted for anyone to copy, use, modify, or distribute this
% program and accompanying programs and documents for any purpose, provided
% this copyright notice is retained and prominently displayed, along with
% a note saying that the original programs are available from our
% web page.
% The programs and documents are distributed without any warranty, express or
% implied.  As the programs were written for research purposes only, they have
% not been tested to the degree that would be advisable in any important
% application.  All use of these programs is entirely at the user's own risk.

% This program reads raw MNIST files available at 
% http://yann.lecun.com/exdb/mnist/ 
% and converts them to files in matlab format 
% Before using this program you first need to download files:
% train-images-idx3-ubyte.gz train-labels-idx1-ubyte.gz 
% t10k-images-idx3-ubyte.gz t10k-labels-idx1-ubyte.gz
% and gunzip them. You need to allocate some space for this.  

% This program was originally written by Yee Whye Teh 

%% Work with test files first
fprintf(1,'You first need to download files:\n train-images-idx3-ubyte.gz\n train-labels-idx1-ubyte.gz\n t10k-images-idx3-ubyte.gz\n t10k-labels-idx1-ubyte.gz\n from http://yann.lecun.com/exdb/mnist/\n and gunzip them \n'); 

f = fopen('t10k-images.idx3-ubyte','r');
[a,count] = fread(f,4,'int32');
  
g = fopen('t10k-labels.idx1-ubyte','r');
[l,count] = fread(g,2,'int32');

fprintf(1,'Starting to convert Test MNIST images (prints 10 dots) \n'); 
n = 1000;

Df = cell(1,10);
for d=0:9,
  Df{d+1} = fopen(['test' num2str(d) '.ascii'],'w');
end;
  
for i=1:10,
  fprintf('.');
  rawimages = fread(f,28*28*n,'uchar');
  rawlabels = fread(g,n,'uchar');
  rawimages = reshape(rawimages,28*28,n);

  for j=1:n,
    fprintf(Df{rawlabels(j)+1},'%3d ',rawimages(:,j));
    fprintf(Df{rawlabels(j)+1},'\n');
  end;
end;

fprintf(1,'\n');
for d=0:9,
  fclose(Df{d+1});
  D = load(['test' num2str(d) '.ascii'],'-ascii'); % test<d>.ascii was created by the fopen(...,'w') calls above and filled by the conversion loop
  fprintf('%5d Digits of class %d\n',size(D,1),d);
  save(['test' num2str(d) '.mat'],'D','-mat');
end;


%% Work with training files second
f = fopen('train-images.idx3-ubyte','r');
[a,count] = fread(f,4,'int32');

g = fopen('train-labels.idx1-ubyte','r');
[l,count] = fread(g,2,'int32');

fprintf(1,'Starting to convert Training MNIST images (prints 60 dots)\n'); 
n = 1000;

Df = cell(1,10);
for d=0:9,
  Df{d+1} = fopen(['digit' num2str(d) '.ascii'],'w');
end;

for i=1:60,
  fprintf('.');
  rawimages = fread(f,28*28*n,'uchar');
  rawlabels = fread(g,n,'uchar');
  rawimages = reshape(rawimages,28*28,n);

  for j=1:n,
    fprintf(Df{rawlabels(j)+1},'%3d ',rawimages(:,j));
    fprintf(Df{rawlabels(j)+1},'\n');
  end;
end;

fprintf(1,'\n');
for d=0:9,
  fclose(Df{d+1});
  D = load(['digit' num2str(d) '.ascii'],'-ascii');
  fprintf('%5d Digits of class %d\n',size(D,1),d);
  save(['digit' num2str(d) '.mat'],'D','-mat');
end;

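delete('*.ascii'); % remove the temporary .ascii files (the original dos('rm *.ascii') requires an rm command on the system)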

makebatches.m

% Version 1.000
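% Purpose: split the data sets and their labels into mini-batches for batch-wise processing; the data set is large and mini-batches speed up learning
% Batched training data and labels: batchdata, batchtargets
% Batched test data and labels: testbatchdata, testbatchtargets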

% Code provided by Ruslan Salakhutdinov and Geoff Hinton
%
% Permission is granted for anyone to copy, use, modify, or distribute this
% program and accompanying programs and documents for any purpose, provided
% this copyright notice is retained and prominently displayed, along with
% a note saying that the original programs are available from our
% web page.
% The programs and documents are distributed without any warranty, express or
% implied.  As the programs were written for research purposes only, they have
% not been tested to the degree that would be advisable in any important
% application.  All use of these programs is entirely at the user's own risk.

%% Batch the training set
digitdata=[]; % training data
targets=[];   % one-hot labels of the training data
load digit0; digitdata = [digitdata; D]; targets = [targets; repmat([1 0 0 0 0 0 0 0 0 0], size(D,1), 1)];  
load digit1; digitdata = [digitdata; D]; targets = [targets; repmat([0 1 0 0 0 0 0 0 0 0], size(D,1), 1)];
load digit2; digitdata = [digitdata; D]; targets = [targets; repmat([0 0 1 0 0 0 0 0 0 0], size(D,1), 1)]; 
load digit3; digitdata = [digitdata; D]; targets = [targets; repmat([0 0 0 1 0 0 0 0 0 0], size(D,1), 1)];
load digit4; digitdata = [digitdata; D]; targets = [targets; repmat([0 0 0 0 1 0 0 0 0 0], size(D,1), 1)]; 
load digit5; digitdata = [digitdata; D]; targets = [targets; repmat([0 0 0 0 0 1 0 0 0 0], size(D,1), 1)];
load digit6; digitdata = [digitdata; D]; targets = [targets; repmat([0 0 0 0 0 0 1 0 0 0], size(D,1), 1)];
load digit7; digitdata = [digitdata; D]; targets = [targets; repmat([0 0 0 0 0 0 0 1 0 0], size(D,1), 1)];
load digit8; digitdata = [digitdata; D]; targets = [targets; repmat([0 0 0 0 0 0 0 0 1 0], size(D,1), 1)];
load digit9; digitdata = [digitdata; D]; targets = [targets; repmat([0 0 0 0 0 0 0 0 0 1], size(D,1), 1)];
digitdata = digitdata/255; % scale pixel values to [0,1]

totnum=size(digitdata,1); % number of training samples: 60000
fprintf(1, 'Size of the training dataset= %5d \n', totnum);

rand('state',0); %so we know the permutation of the training data
randomorder=randperm(totnum); % random permutation of 1..totnum

numbatches=totnum/100;          % number of batches: 600
numdims  =  size(digitdata,2);  % dimension of each training sample: 784
batchsize = 100;                % samples per batch: 100
batchdata = zeros(batchsize, numdims, numbatches);
batchtargets = zeros(batchsize, 10, numbatches);

for b=1:numbatches
  batchdata(:,:,b) = digitdata(randomorder(1+(b-1)*batchsize:b*batchsize), :);
  batchtargets(:,:,b) = targets(randomorder(1+(b-1)*batchsize:b*batchsize), :);
end;
clear digitdata targets;

%% Batch the test set
digitdata=[];
targets=[];
load test0; digitdata = [digitdata; D]; targets = [targets; repmat([1 0 0 0 0 0 0 0 0 0], size(D,1), 1)]; 
load test1; digitdata = [digitdata; D]; targets = [targets; repmat([0 1 0 0 0 0 0 0 0 0], size(D,1), 1)]; 
load test2; digitdata = [digitdata; D]; targets = [targets; repmat([0 0 1 0 0 0 0 0 0 0], size(D,1), 1)];
load test3; digitdata = [digitdata; D]; targets = [targets; repmat([0 0 0 1 0 0 0 0 0 0], size(D,1), 1)];
load test4; digitdata = [digitdata; D]; targets = [targets; repmat([0 0 0 0 1 0 0 0 0 0], size(D,1), 1)];
load test5; digitdata = [digitdata; D]; targets = [targets; repmat([0 0 0 0 0 1 0 0 0 0], size(D,1), 1)];
load test6; digitdata = [digitdata; D]; targets = [targets; repmat([0 0 0 0 0 0 1 0 0 0], size(D,1), 1)];
load test7; digitdata = [digitdata; D]; targets = [targets; repmat([0 0 0 0 0 0 0 1 0 0], size(D,1), 1)];
load test8; digitdata = [digitdata; D]; targets = [targets; repmat([0 0 0 0 0 0 0 0 1 0], size(D,1), 1)];
load test9; digitdata = [digitdata; D]; targets = [targets; repmat([0 0 0 0 0 0 0 0 0 1], size(D,1), 1)];
digitdata = digitdata/255;

totnum=size(digitdata,1);
fprintf(1, 'Size of the test dataset= %5d \n', totnum);

rand('state',0); %so we know the permutation of the training data
randomorder=randperm(totnum);

numbatches=totnum/100;
numdims  =  size(digitdata,2);
batchsize = 100;
testbatchdata = zeros(batchsize, numdims, numbatches);
testbatchtargets = zeros(batchsize, 10, numbatches);

for b=1:numbatches
  testbatchdata(:,:,b) = digitdata(randomorder(1+(b-1)*batchsize:b*batchsize), :);
  testbatchtargets(:,:,b) = targets(randomorder(1+(b-1)*batchsize:b*batchsize), :);
end;
clear digitdata targets;


%%% Reset random seeds 
rand('state',sum(100*clock)); 
randn('state',sum(100*clock));

rbm.m

% Version 1.000 
% Purpose: train an RBM with 1-step contrastive divergence (CD-1)
% Code provided by Geoff Hinton and Ruslan Salakhutdinov 
%
% Permission is granted for anyone to copy, use, modify, or distribute this
% program and accompanying programs and documents for any purpose, provided
% this copyright notice is retained and prominently displayed, along with
% a note saying that the original programs are available from our
% web page.
% The programs and documents are distributed without any warranty, express or
% implied.  As the programs were written for research purposes only, they have
% not been tested to the degree that would be advisable in any important
% application.  All use of these programs is entirely at the user's own risk.

% This program trains Restricted Boltzmann Machine in which
% visible, binary, stochastic pixels are connected to
% hidden, binary, stochastic feature detectors using symmetrically
% weighted connections. Learning is done with 1-step Contrastive Divergence.   
% The program assumes that the following variables are set externally:
% maxepoch  -- maximum number of epochs
% numhid    -- number of hidden units 
% batchdata -- the data that is divided into batches (numcases numdims numbatches)
% restart   -- set to 1 if learning starts from beginning 

epsilonw      = 0.1;   % Learning rate for weights 
epsilonvb     = 0.1;   % Learning rate for biases of visible units 
epsilonhb     = 0.1;   % Learning rate for biases of hidden units 
weightcost  = 0.0002;  % weight decay (L2 penalty) to reduce overfitting; see the article "Restricted Boltzmann Machine (RBM)"
initialmomentum  = 0.5;% momentum for the first 5 epochs; momentum trades convergence speed against instability of the updates
finalmomentum    = 0.9;% momentum after epoch 5

[numcases numdims numbatches]=size(batchdata); % [samples per batch, dimensions per sample, number of batches]

if restart ==1,
  restart=0;
  epoch=1;

% Initializing symmetric weights and biases. 
  vishid     = 0.1*randn(numdims, numhid); % weights W_ij (visible unit i, hidden unit j)
  hidbiases  = zeros(1,numhid);            % hidden biases b_j
  visbiases  = zeros(1,numdims);           % visible biases b_i

  poshidprobs = zeros(numcases,numhid); % e.g. 100x1000 for the 1st RBM: hidden probabilities p(h=1|v0) of one batch in the positive phase
  neghidprobs = zeros(numcases,numhid); % hidden probabilities p(h=1|v1) in the negative phase
  posprods    = zeros(numdims,numhid);  % v0'*p(h=1|v0), used in the weight update
  negprods    = zeros(numdims,numhid);  % v1'*p(h=1|v1), used in the weight update
  vishidinc  = zeros(numdims,numhid);   % weight increment delta W_ij
  hidbiasinc = zeros(1,numhid);         % hidden-bias increment delta b_j
  visbiasinc = zeros(1,numdims);        % visible-bias increment delta b_i
  batchposhidprobs=zeros(numcases,numhid,numbatches); % positive-phase hidden probabilities for the whole data set
end

for epoch = epoch:maxepoch,
 fprintf(1,'epoch %d\r',epoch); 
 errsum=0;
 for batch = 1:numbatches,
 fprintf(1,'epoch %d batch %d\r',epoch,batch); 

%%%%%%%%% START POSITIVE PHASE (read alongside the note "Deep learning notes - RBM") %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  data = batchdata(:,:,batch); % visible data v0: one batch per iteration, one sample per row (real values in [0,1], not binarized)
  poshidprobs = 1./(1 + exp(-data*vishid - repmat(hidbiases,numcases,1))); % positive phase: p(h_j=1|v0) for every sample and hidden unit
  batchposhidprobs(:,:,batch)=poshidprobs;
  posprods    = data' * poshidprobs; % v0'*p(h=1|v0), used in the weight update
  poshidact   = sum(poshidprobs);    % sum of p(h_j=1|v0), used in the hidden-bias update
  posvisact = sum(data);             % sum of v0, used in the visible-bias update

%%%%%%%%% END OF POSITIVE PHASE  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  poshidstates = poshidprobs > rand(numcases,numhid); % binary hidden states h0: unit j is on when p(h_j=1|v0) exceeds a uniform random number (done after posprods is computed)
                                                      % rand(m,n) returns an m-by-n matrix of uniform random numbers in (0,1)
  
%%%%%%%%% START NEGATIVE PHASE  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  negdata = 1./(1 + exp(-poshidstates*vishid' - repmat(visbiases,numcases,1))); % reconstruction v1; strictly this is p(v1=1|h0), used directly instead of a binary sample (see Question 1)
  neghidprobs = 1./(1 + exp(-negdata*vishid - repmat(hidbiases,numcases,1))); % hidden probabilities after the reconstruction: p(h_j=1|v1)
  negprods  = negdata'*neghidprobs; % v1'*p(h=1|v1), used in the weight update
  neghidact = sum(neghidprobs);     % sum of p(h_j=1|v1), used in the hidden-bias update
  negvisact = sum(negdata);         % sum of v1, used in the visible-bias update

%%%%%%%%% END OF NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  err= sum(sum( (data-negdata).^2 ));
  errsum = err + errsum;

   if epoch>5,
     momentum=finalmomentum;   % 0.9; momentum is the fraction of the previous increment that is kept
   else
     momentum=initialmomentum; % 0.5 during the first 5 epochs
   end;

%%%%%%%%% UPDATE WEIGHTS AND BIASES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
    vishidinc = momentum*vishidinc + ...  % weight increment delta W_ij
                epsilonw*( (posprods-negprods)/numcases - weightcost*vishid); % (posprods-negprods)/numcases: CD-1 estimate of the gradient; weightcost*vishid: weight decay against overfitting
    visbiasinc = momentum*visbiasinc + (epsilonvb/numcases)*(posvisact-negvisact); % visible-bias increment delta b_i
    hidbiasinc = momentum*hidbiasinc + (epsilonhb/numcases)*(poshidact-neghidact); % hidden-bias increment delta b_j

    vishid = vishid + vishidinc;
    visbiases = visbiases + visbiasinc;
    hidbiases = hidbiases + hidbiasinc;

%%%%%%%%%%%%%%%% END OF UPDATES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

  end
  fprintf(1, 'epoch %4i error %6.1f  \n', epoch, errsum); 
end;

rbmhidlinear.m

% Version 1.000
% Purpose: train the top RBM
% Its hidden (output) units are linear, f(x)=x, with Gaussian noise instead of logistic units, hence the name rbmhidlinear.m
% Code provided by Ruslan Salakhutdinov and Geoff Hinton
%
% Permission is granted for anyone to copy, use, modify, or distribute this
% program and accompanying programs and documents for any purpose, provided
% this copyright notice is retained and prominently displayed, along with
% a note saying that the original programs are available from our
% web page.
% The programs and documents are distributed without any warranty, express or
% implied.  As the programs were written for research purposes only, they have
% not been tested to the degree that would be advisable in any important
% application.  All use of these programs is entirely at the user's own risk.

% This program trains Restricted Boltzmann Machine in which
% visible, binary, stochastic pixels are connected to
% hidden, stochastic real-valued feature detectors drawn from a unit
% variance Gaussian whose mean is determined by the input from 
% the logistic visible units. Learning is done with 1-step Contrastive Divergence.
% The program assumes that the following variables are set externally:
% maxepoch  -- maximum number of epochs
% numhid    -- number of hidden units
% batchdata -- the data that is divided into batches (numcases numdims numbatches)
% restart   -- set to 1 if learning starts from beginning

epsilonw      = 0.001; % Learning rate for weights 
epsilonvb     = 0.001; % Learning rate for biases of visible units
epsilonhb     = 0.001; % Learning rate for biases of hidden units 
weightcost  = 0.0002;  
initialmomentum  = 0.5;
finalmomentum    = 0.9;


[numcases numdims numbatches]=size(batchdata);

if restart ==1,
  restart=0;
  epoch=1;

% Initializing symmetric weights and biases.
  vishid     = 0.1*randn(numdims, numhid);
  hidbiases  = zeros(1,numhid);
  visbiases  = zeros(1,numdims);


  poshidprobs = zeros(numcases,numhid);
  neghidprobs = zeros(numcases,numhid);
  posprods    = zeros(numdims,numhid);
  negprods    = zeros(numdims,numhid);
  vishidinc  = zeros(numdims,numhid);
  hidbiasinc = zeros(1,numhid);
  visbiasinc = zeros(1,numdims);
  sigmainc = zeros(1,numhid);
  batchposhidprobs=zeros(numcases,numhid,numbatches);
end

for epoch = epoch:maxepoch,
 fprintf(1,'epoch %d\r',epoch); 
 errsum=0;

 for batch = 1:numbatches,
 fprintf(1,'epoch %d batch %d\r',epoch,batch);

%%%%%%%%% START POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  data = batchdata(:,:,batch);
  poshidprobs =  (data*vishid) + repmat(hidbiases,numcases,1); % positive phase: mean of each linear hidden unit given v0, sum_i v0_i*W_ij + b_j
                                                               % (not a probability: the hidden units use f(x)=x instead of the logistic function)
  batchposhidprobs(:,:,batch)=poshidprobs;
  posprods    = data' * poshidprobs;
  poshidact   = sum(poshidprobs);
  posvisact = sum(data);
  
%%%%%%%%% END OF POSITIVE PHASE  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
poshidstates = poshidprobs+randn(numcases,numhid); % h0: real-valued sample, the mean plus unit-variance Gaussian noise (not binarized)

%%%%%%%%% START NEGATIVE PHASE  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  negdata = 1./(1 + exp(-poshidstates*vishid' - repmat(visbiases,numcases,1))); % v1 = p(v1=1|h0): logistic visible units, probabilities used directly
  neghidprobs = (negdata*vishid) + repmat(hidbiases,numcases,1); % negative phase: mean of each linear hidden unit given v1, sum_i v1_i*W_ij + b_j
  negprods  = negdata'*neghidprobs;
  neghidact = sum(neghidprobs);
  negvisact = sum(negdata); 

%%%%%%%%% END OF NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


  err= sum(sum( (data-negdata).^2 )); 
  errsum = err + errsum;
   if epoch>5,
     momentum=finalmomentum;
   else
     momentum=initialmomentum;
   end;

%%%%%%%%% UPDATE WEIGHTS AND BIASES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    vishidinc = momentum*vishidinc + ...
                epsilonw*( (posprods-negprods)/numcases - weightcost*vishid);
    visbiasinc = momentum*visbiasinc + (epsilonvb/numcases)*(posvisact-negvisact);
    hidbiasinc = momentum*hidbiasinc + (epsilonhb/numcases)*(poshidact-neghidact);
    vishid = vishid + vishidinc;
    visbiases = visbiases + visbiasinc;
    hidbiases = hidbiases + hidbiasinc;

%%%%%%%%%%%%%%%% END OF UPDATES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 end
fprintf(1, 'epoch %4i error %f \n', epoch, errsum);

end

backprop.m

% Version 1.000
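% Purpose: unroll the 4 RBMs into one network and fine-tune the whole model on the training data; this corresponds to the "Unrolling" part of Fig. 1 in the paper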
% Code provided by Ruslan Salakhutdinov and Geoff Hinton
%
% Permission is granted for anyone to copy, use, modify, or distribute this
% program and accompanying programs and documents for any purpose, provided
% this copyright notice is retained and prominently displayed, along with
% a note saying that the original programs are available from our
% web page.
% The programs and documents are distributed without any warranty, express or
% implied.  As the programs were written for research purposes only, they have
% not been tested to the degree that would be advisable in any important
% application.  All use of these programs is entirely at the user's own risk.

% This program fine-tunes an autoencoder with backpropagation.
% Weights of the autoencoder are going to be saved in mnist_weights.mat
% and training and test reconstruction errors in mnist_error.mat
% You can also set maxepoch, default value is 200 as in our paper.  

maxepoch=200;
fprintf(1,'\nFine-tuning deep autoencoder by minimizing cross entropy error. \n');
fprintf(1,'60 batches of 1000 cases each. \n'); % 60 batches with 1000 samples each

load mnistvh
load mnisthp
load mnisthp2
load mnistpo 

makebatches;
[numcases numdims numbatches]=size(batchdata);
N=numcases; 

%%%% PREINITIALIZE WEIGHTS OF THE AUTOENCODER %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
w1=[vishid; hidrecbiases];    % [W1;b1]: each layer's weights and biases are stored together in one matrix
w2=[hidpen; penrecbiases];    % [W2;b2]
w3=[hidpen2; penrecbiases2];  % [W3;b3]
w4=[hidtop; toprecbiases];    % [W4;b4]
w5=[hidtop'; topgenbiases];   % [W4';v4]
w6=[hidpen2'; hidgenbiases2]; % [W3';v3]
w7=[hidpen'; hidgenbiases];   % [W2';v2]
w8=[vishid'; visbiases];      % [W1';v1]

%%%%%%%%%% END OF PREINITIALIZATION OF WEIGHTS  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

l1=size(w1,1)-1; % number of nodes in each layer (the extra row of each matrix is the bias)
l2=size(w2,1)-1;
l3=size(w3,1)-1;
l4=size(w4,1)-1;
l5=size(w5,1)-1;
l6=size(w6,1)-1;
l7=size(w7,1)-1;
l8=size(w8,1)-1;
l9=l1;           % the output layer has as many nodes as the input layer
test_err=[];
train_err=[];


for epoch = 1:maxepoch

%%  %%%%%%%%%%%%%%%%%% COMPUTE TRAINING RECONSTRUCTION ERROR %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
err=0; 
[numcases numdims numbatches]=size(batchdata);
N=numcases;
 for batch = 1:numbatches
  data = [batchdata(:,:,batch)];
  data = [data ones(N,1)];
  w1probs = 1./(1 + exp(-data*w1)); w1probs = [w1probs  ones(N,1)]; % forward pass: compute each layer's activation probabilities and append a constant column of ones (the bias input)
  w2probs = 1./(1 + exp(-w1probs*w2)); w2probs = [w2probs ones(N,1)];
  w3probs = 1./(1 + exp(-w2probs*w3)); w3probs = [w3probs  ones(N,1)];
  w4probs = w3probs*w4; w4probs = [w4probs  ones(N,1)];
  w5probs = 1./(1 + exp(-w4probs*w5)); w5probs = [w5probs  ones(N,1)];
  w6probs = 1./(1 + exp(-w5probs*w6)); w6probs = [w6probs  ones(N,1)];
  w7probs = 1./(1 + exp(-w6probs*w7)); w7probs = [w7probs  ones(N,1)];
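  dataout = 1./(1 + exp(-w7probs*w8)); % output-layer probabilities, used as the reconstructed data
  err= err +  1/N*sum(sum( (data(:,1:end-1)-dataout).^2 )); % squared reconstruction error per sample, averaged over this batch
  end
 train_err(epoch)=err/numbatches; % squared reconstruction error per sample, averaged over all batches in this epoch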

%%%%%%%%%%%%%% END OF COMPUTING TRAINING RECONSTRUCTION ERROR %%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%  %% DISPLAY FIGURE TOP ROW REAL DATA BOTTOM ROW RECONSTRUCTIONS %%%%%%%%%%%%%%%%%%%%%%%%%
fprintf(1,'Displaying in figure 1: Top row - real data, Bottom row -- reconstructions \n');
output=[];
 for ii=1:15
  output = [output data(ii,1:end-1)' dataout(ii,:)']; % 15 digits are shown; each contributes two columns: the original and its reconstruction
 end
   if epoch==1 
   close all 
   figure('Position',[100,600,1000,200]);
   else 
   figure(1)
   end 
   mnistdisp(output); % display the images
   drawnow;           % refresh the figure

%% %%%%%%%%%%%%%%%%%% COMPUTE TEST RECONSTRUCTION ERROR %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[testnumcases testnumdims testnumbatches]=size(testbatchdata); % [100 784 100]: 100 test batches of 100 samples each, every sample of dimension 784
N=testnumcases;
err=0;
for batch = 1:testnumbatches
  data = [testbatchdata(:,:,batch)];
  data = [data ones(N,1)];
  w1probs = 1./(1 + exp(-data*w1)); w1probs = [w1probs  ones(N,1)];
  w2probs = 1./(1 + exp(-w1probs*w2)); w2probs = [w2probs ones(N,1)];
  w3probs = 1./(1 + exp(-w2probs*w3)); w3probs = [w3probs  ones(N,1)];
  w4probs = w3probs*w4; w4probs = [w4probs  ones(N,1)];
  w5probs = 1./(1 + exp(-w4probs*w5)); w5probs = [w5probs  ones(N,1)];
  w6probs = 1./(1 + exp(-w5probs*w6)); w6probs = [w6probs  ones(N,1)];
  w7probs = 1./(1 + exp(-w6probs*w7)); w7probs = [w7probs  ones(N,1)];
  dataout = 1./(1 + exp(-w7probs*w8));
  err = err +  1/N*sum(sum( (data(:,1:end-1)-dataout).^2 ));
  end
 test_err(epoch)=err/testnumbatches;
 fprintf(1,'Before epoch %d Train squared error: %6.3f Test squared error: %6.3f \t \t \n',epoch,train_err(epoch),test_err(epoch));

%%%%%%%%%%%%%% END OF COMPUTING TEST RECONSTRUCTION ERROR %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% 
 tt=0;
 for batch = 1:numbatches/10          % the 600 training batches of 100 samples are combined into 60 batches of 1000 samples
 fprintf(1,'epoch %d batch %d\r',epoch,batch);

%%%%%%%%%%% COMBINE 10 MINIBATCHES INTO 1 LARGER MINIBATCH %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 tt=tt+1; 
 data=[];
 for kk=1:10
  data=[data 
        batchdata(:,:,(tt-1)*10+kk)]; % stack 10 mini-batches: 60 batches of 1000 samples in total
 end 

%%%%%%%%%%%%%%% PERFORM CONJUGATE GRADIENT WITH 3 LINESEARCHES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  max_iter=3;
  VV = [w1(:)' w2(:)' w3(:)' w4(:)' w5(:)' w6(:)' w7(:)' w8(:)']'; % pack all weights (biases included) into one long column vector
  Dim = [l1; l2; l3; l4; l5; l6; l7; l8; l9]; % number of nodes in each layer (biases not counted)

  [X, fX] = minimize(VV,'CG_MNIST',max_iter,Dim,data); % X: the weights after 3 line searches, as a column vector

  w1 = reshape(X(1:(l1+1)*l2),l1+1,l2);
  xxx = (l1+1)*l2;
  w2 = reshape(X(xxx+1:xxx+(l2+1)*l3),l2+1,l3);
  xxx = xxx+(l2+1)*l3;
  w3 = reshape(X(xxx+1:xxx+(l3+1)*l4),l3+1,l4);
  xxx = xxx+(l3+1)*l4;
  w4 = reshape(X(xxx+1:xxx+(l4+1)*l5),l4+1,l5);
  xxx = xxx+(l4+1)*l5;
  w5 = reshape(X(xxx+1:xxx+(l5+1)*l6),l5+1,l6);
  xxx = xxx+(l5+1)*l6;
  w6 = reshape(X(xxx+1:xxx+(l6+1)*l7),l6+1,l7);
  xxx = xxx+(l6+1)*l7;
  w7 = reshape(X(xxx+1:xxx+(l7+1)*l8),l7+1,l8);
  xxx = xxx+(l7+1)*l8;
  w8 = reshape(X(xxx+1:xxx+(l8+1)*l9),l8+1,l9); % unpack the optimized parameters back into w1..w8

%%%%%%%%%%%%%%% END OF CONJUGATE GRADIENT WITH 3 LINESEARCHES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 end

 save mnist_weights w1 w2 w3 w4 w5 w6 w7 w8 
 save mnist_error test_err train_err;

end

CG_MNIST.m

% Version 1.000
% Computes the cost function and its partial derivatives with respect to all weights
% Code provided by Ruslan Salakhutdinov and Geoff Hinton
%
% Permission is granted for anyone to copy, use, modify, or distribute this
% program and accompanying programs and documents for any purpose, provided
% this copyright notice is retained and prominently displayed, along with
% a note saying that the original programs are available from our
% web page.
% The programs and documents are distributed without any warranty, express or
% implied.  As the programs were written for research purposes only, they have
% not been tested to the degree that would be advisable in any important
% application.  All use of these programs is entirely at the user's own risk.

function [f, df] = CG_MNIST(VV,Dim,XX)
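% VV: all weights (biases included) as one long column vector
% Dim: number of nodes in each layer
% XX: training samples
% f : cost, i.e. the cross-entropy error
% df: partial derivatives of the cost with respect to every weight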


l1 = Dim(1); % number of nodes in each layer (biases not counted)
l2 = Dim(2);
l3 = Dim(3);
l4= Dim(4);
l5= Dim(5);
l6= Dim(6);
l7= Dim(7);
l8= Dim(8);
l9= Dim(9);
N = size(XX,1); % number of samples

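% Do deconversion: reshape VV back into the weight matrices
 w1 = reshape(VV(1:(l1+1)*l2),l1+1,l2); % VV is a long column vector of weights and biases; each slice includes its bias row
 xxx = (l1+1)*l2; % xxx: number of elements of VV consumed so far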
 w2 = reshape(VV(xxx+1:xxx+(l2+1)*l3),l2+1,l3);
 xxx = xxx+(l2+1)*l3;
 w3 = reshape(VV(xxx+1:xxx+(l3+1)*l4),l3+1,l4);
 xxx = xxx+(l3+1)*l4;
 w4 = reshape(VV(xxx+1:xxx+(l4+1)*l5),l4+1,l5);
 xxx = xxx+(l4+1)*l5;
 w5 = reshape(VV(xxx+1:xxx+(l5+1)*l6),l5+1,l6);
 xxx = xxx+(l5+1)*l6;
 w6 = reshape(VV(xxx+1:xxx+(l6+1)*l7),l6+1,l7);
 xxx = xxx+(l6+1)*l7;
 w7 = reshape(VV(xxx+1:xxx+(l7+1)*l8),l7+1,l8);
 xxx = xxx+(l7+1)*l8;
 w8 = reshape(VV(xxx+1:xxx+(l8+1)*l9),l8+1,l9);


  XX = [XX ones(N,1)]; % training samples plus a column of ones, so they can be multiplied by w1 (bias term)
  w1probs = 1./(1 + exp(-XX*w1)); w1probs = [w1probs  ones(N,1)];
  w2probs = 1./(1 + exp(-w1probs*w2)); w2probs = [w2probs ones(N,1)];
  w3probs = 1./(1 + exp(-w2probs*w3)); w3probs = [w3probs  ones(N,1)];
  w4probs = w3probs*w4; w4probs = [w4probs  ones(N,1)]; % layer 5 (the code layer) uses a linear (identity) activation, not the logistic function
  w5probs = 1./(1 + exp(-w4probs*w5)); w5probs = [w5probs  ones(N,1)];
  w6probs = 1./(1 + exp(-w5probs*w6)); w6probs = [w6probs  ones(N,1)];
  w7probs = 1./(1 + exp(-w6probs*w7)); w7probs = [w7probs  ones(N,1)];
  XXout = 1./(1 + exp(-w7probs*w8)); % output-layer probabilities, i.e. the reconstructed data

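f = -1/N*sum(sum( XX(:,1:end-1).*log(XXout) + (1-XX(:,1:end-1)).*log(1-XXout))); % cost: the cross-entropy error (its derivation is on the last page of the paper)
IO = 1/N*(XXout-XX(:,1:end-1)); % reconstruction error
%% % Use backpropagation to compute the partial derivatives df of every layer; see "http://ufldl.stanford.edu/wiki/index.php/用反向传导思想求导"
Ix8=IO; % the output-layer "residual" (error term)
dw8 =  w7probs'*Ix8; % gradient of the output-layer weights via backpropagation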

Ix7 = (Ix8*w8').*w7probs.*(1-w7probs); % layer-7 "residual"
Ix7 = Ix7(:,1:end-1); % drop the bias column
dw7 =  w6probs'*Ix7;  % layer-7 gradient

Ix6 = (Ix7*w7').*w6probs.*(1-w6probs); 
Ix6 = Ix6(:,1:end-1);
dw6 =  w5probs'*Ix6;

Ix5 = (Ix6*w6').*w5probs.*(1-w5probs); 
Ix5 = Ix5(:,1:end-1);
dw5 =  w4probs'*Ix5;

Ix4 = (Ix5*w5'); % no w4probs.*(1-w4probs) factor here: the code layer is linear
Ix4 = Ix4(:,1:end-1);
dw4 =  w3probs'*Ix4;

Ix3 = (Ix4*w4').*w3probs.*(1-w3probs); 
Ix3 = Ix3(:,1:end-1);
dw3 =  w2probs'*Ix3;

Ix2 = (Ix3*w3').*w2probs.*(1-w2probs); 
Ix2 = Ix2(:,1:end-1);
dw2 =  w1probs'*Ix2;

Ix1 = (Ix2*w2').*w1probs.*(1-w1probs); 
Ix1 = Ix1(:,1:end-1);
dw1 =  XX'*Ix1;

df = [dw1(:)' dw2(:)' dw3(:)' dw4(:)' dw5(:)' dw6(:)'  dw7(:)'  dw8(:)'  ]'; % partial derivatives of the network cost, packed in the same order as VV
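CG_MNIST.m uses two shortcuts. First, for a logistic output p = sigmoid(z) under the cross-entropy cost, the derivative with respect to the pre-activation z simplifies to (p - x)/N. This is why `IO` needs no sigmoid-derivative factor. Second, the linear code layer contributes no `a.*(1-a)` factor (the `Ix4` line). The following Python sketch checks both shortcuts against finite differences, in the spirit of the `checkgrad` routine that minimize.m mentions. It uses a hypothetical toy 3-2-1-2-3 autoencoder and loops over one sample at a time instead of using matrix products. The function names are illustrative, not from the original code.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, weights, linear):
    """Activations of every layer, each with a trailing 1 for the bias row.

    weights[k] has (in + 1) rows and `out` columns; linear[k] says whether the
    layer it feeds is linear (like the code layer w4probs) or logistic.
    """
    acts = [list(x) + [1.0]]
    for w, is_linear in zip(weights, linear):
        prev = acts[-1]
        z = [sum(prev[r] * w[r][c] for r in range(len(prev))) for c in range(len(w[0]))]
        a = z if is_linear else [sigmoid(v) for v in z]
        acts.append(a + [1.0])
    return acts


def cost_and_grad(data, weights, linear):
    """Cross-entropy averaged over samples, and its gradient, as in CG_MNIST.m."""
    n = len(data)
    f = 0.0
    grads = [[[0.0] * len(w[0]) for _ in w] for w in weights]
    for x in data:
        acts = forward(x, weights, linear)
        p = acts[-1][:-1]  # reconstruction (drop the trailing bias 1)
        f -= sum(xi * math.log(pi) + (1 - xi) * math.log(1 - pi) for xi, pi in zip(x, p)) / n
        delta = [(pi - xi) / n for xi, pi in zip(x, p)]  # IO: output "residual"
        for k in range(len(weights) - 1, -1, -1):
            a_in = acts[k]
            for r in range(len(a_in)):
                for c in range(len(delta)):
                    grads[k][r][c] += a_in[r] * delta[c]  # dw_k = a_in' * delta
            if k == 0:
                break
            w = weights[k]
            # propagate through w_k, skipping the bias row (Ix(:,1:end-1) in MATLAB)
            back = [sum(w[r][c] * delta[c] for c in range(len(delta))) for r in range(len(w) - 1)]
            if linear[k - 1]:
                delta = back  # linear layer: no sigmoid-derivative factor
            else:
                delta = [b * a * (1 - a) for b, a in zip(back, a_in[:-1])]
    return f, grads


def gradient_check(seed=0, eps=1e-6):
    """Largest gap between backprop and central-difference derivatives."""
    rng = random.Random(seed)
    dims = [3, 2, 1, 2, 3]
    linear = [False, True, False, False]  # the 1-unit middle layer is linear
    weights = [[[rng.uniform(-0.5, 0.5) for _ in range(b)] for _ in range(a + 1)]
               for a, b in zip(dims[:-1], dims[1:])]
    data = [[rng.random() for _ in range(dims[0])] for _ in range(4)]
    _, grads = cost_and_grad(data, weights, linear)
    worst = 0.0
    for k, w in enumerate(weights):
        for r in range(len(w)):
            for c in range(len(w[0])):
                old = w[r][c]
                w[r][c] = old + eps
                f_up, _ = cost_and_grad(data, weights, linear)
                w[r][c] = old - eps
                f_down, _ = cost_and_grad(data, weights, linear)
                w[r][c] = old
                worst = max(worst, abs((f_up - f_down) / (2 * eps) - grads[k][r][c]))
    return worst


if __name__ == "__main__":
    print(f"max |analytic - numeric| = {gradient_check():.2e}")
```

A small maximum gap means that the (p - x)/N output term and the missing factor at the linear layer are consistent with the cost.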

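The MATLAB gradient above is only useful if its inputs are scaled the way the cost expects. CG_MNIST.m evaluates `log(XXout)` and `log(1-XXout)` against targets `XX(:,1:end-1)`, so every target pixel must lie in [0, 1] for the cross-entropy to be a proper (non-negative, minimised at p = x) loss. The following Python sketch is a hypothetical input check, not part of the original code. It assumes, as the notes state, that each visible unit carries a real [0, 1] activation. It confirms that targets are in range and shows that the per-sample cross-entropy is smallest when the reconstruction equals the target.

```python
import math


def check_targets(rows):
    """Raise if any target pixel lies outside [0, 1] (hypothetical input check)."""
    for i, row in enumerate(rows):
        for j, x in enumerate(row):
            if not 0.0 <= x <= 1.0:
                raise ValueError(f"target ({i}, {j}) = {x} is outside [0, 1]")
    return True


def bernoulli_ce(x, p):
    """Cross-entropy -(x log p + (1 - x) log(1 - p)) for one pixel, 0 < p < 1."""
    return -(x * math.log(p) + (1 - x) * math.log(1 - p))


if __name__ == "__main__":
    print(check_targets([[0.0, 0.5, 1.0]]))
    # for a fixed target x = 0.3 the loss is lowest when the reconstruction p = 0.3
    print(min((bernoulli_ce(0.3, p / 100), p / 100) for p in range(1, 100))[1])
```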
minimize.m

function [X, fX, i] = minimize(X, f, length, varargin)
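% Purpose: optimize the objective function with the conjugate gradient method
% Minimize a differentiable multivariate function. 

% X in the output [X, fX, i]: the weight parameters after optimization with 3 line searches, as a column vector

% X in minimize(X, f, length, varargin): the quantity being optimized, i.e. the weights
% f in minimize(X, f, length, varargin): name of the cost function
% length in minimize(X, f, length, varargin): number of line searches
% varargin in minimize(X, f, length, varargin): the layer sizes Dim and the training data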

% Usage: [X, fX, i] = minimize(X, f, length, P1, P2, P3, ... )
%
% where the starting point is given by "X" (D by 1), and the function named in
% the string "f", must return a function value and a vector of partial
% derivatives of f wrt X, the "length" gives the length of the run: if it is
% positive, it gives the maximum number of line searches, if negative its
% absolute gives the maximum allowed number of function evaluations. You can
% (optionally) give "length" a second component, which will indicate the
% reduction in function value to be expected in the first line-search (defaults
% to 1.0). The parameters P1, P2, P3, ... are passed on to the function f.
%
% The function returns when either its length is up, or if no further progress
% can be made (ie, we are at a (local) minimum, or so close that due to
% numerical problems, we cannot get any closer). NOTE: If the function
% terminates within a few iterations, it could be an indication that the
% function values and derivatives are not consistent (ie, there may be a bug in
% the implementation of your "f" function). The function returns the found
% solution "X", a vector of function values "fX" indicating the progress made
% and "i" the number of iterations (line searches or function evaluations,
% depending on the sign of "length") used.
%
% The Polack-Ribiere flavour of conjugate gradients is used to compute search
% directions, and a line search using quadratic and cubic polynomial
% approximations and the Wolfe-Powell stopping criteria is used together with
% the slope ratio method for guessing initial step sizes. Additionally a bunch
% of checks are made to make sure that exploration is taking place and that
% extrapolation will not be unboundedly large.
%
% See also: checkgrad 
%
% Copyright (C) 2001 - 2006 by Carl Edward Rasmussen (2006-09-08).

INT = 0.1;    % don't reevaluate within 0.1 of the limit of the current bracket
EXT = 3.0;                  % extrapolate maximum 3 times the current step-size
MAX = 20;                         % max 20 function evaluations per line search
RATIO = 10;                                       % maximum allowed slope ratio
SIG = 0.1; RHO = SIG/2; % SIG and RHO are the constants controlling the Wolfe-
% Powell conditions. SIG is the maximum allowed absolute ratio between
% previous and new slopes (derivatives in the search direction), thus setting
% SIG to low (positive) values forces higher precision in the line-searches.
% RHO is the minimum allowed fraction of the expected decrease (from the slope at the
% initial point in the linesearch). Constants must satisfy 0 < RHO < SIG < 1.
% Tuning of SIG (depending on the nature of the function to be optimized) may
% speed up the minimization; it is probably not worth playing much with RHO.

% The code falls naturally into 3 parts, after the initial line search is
% started in the direction of steepest descent. 1) we first enter a while loop
% which uses point 1 (p1) and (p2) to compute an extrapolation (p3), until we
% have extrapolated far enough (Wolfe-Powell conditions). 2) if necessary, we
% enter the second loop which takes p2, p3 and p4 chooses the subinterval
% containing a (local) minimum, and interpolates it, until an acceptable point
% is found (Wolfe-Powell conditions). Note, that points are always maintained
% in order p0 <= p1 <= p2 < p3 < p4. 3) compute a new search direction using
% conjugate gradients (Polack-Ribiere flavour), or revert to steepest if there
% was a problem in the previous line-search. Return the best value so far, if
% two consecutive line-searches fail, or whenever we run out of function
% evaluations or line-searches. During extrapolation, the "f" function may fail
% either with an error or returning Nan or Inf, and minimize should handle this
% gracefully.

if max(size(length)) == 2, red=length(2); length=length(1); else red=1; end
if length>0, S='Linesearch'; else S='Function evaluation'; end 

i = 0;                                            % zero the run length counter
ls_failed = 0;                             % no previous line search has failed
[f0 df0] = feval(f, X, varargin{:});          % get function value and gradient
fX = f0;
i = i + (length<0);                                            % count epochs?!
s = -df0; d0 = -s'*s;           % initial search direction (steepest) and slope
x3 = red/(1-d0);                                % initial step is red/(|s|^2+1)

while i < abs(length)                                      % while not finished
  i = i + (length>0);                                      % count iterations?!

  X0 = X; F0 = f0; dF0 = df0;                   % make a copy of current values
  if length>0, M = MAX; else M = min(MAX, -length-i); end

  while 1                             % keep extrapolating as long as necessary
    x2 = 0; f2 = f0; d2 = d0; f3 = f0; df3 = df0;
    success = 0;
    while ~success && M > 0
      try
        M = M - 1; i = i + (length<0);                         % count epochs?!
        [f3 df3] = feval(f, X+x3*s, varargin{:});
        if isnan(f3) || isinf(f3) || any(isnan(df3)+isinf(df3)), error(''), end
        success = 1;
      catch                               % catch any error which occurred in f
        x3 = (x2+x3)/2;                                  % bisect and try again
      end
    end
    if f3 < F0, X0 = X+x3*s; F0 = f3; dF0 = df3; end         % keep best values
    d3 = df3'*s;                                                    % new slope
    if d3 > SIG*d0 || f3 > f0+x3*RHO*d0 || M == 0  % are we done extrapolating?
      break
    end
    x1 = x2; f1 = f2; d1 = d2;                        % move point 2 to point 1
    x2 = x3; f2 = f3; d2 = d3;                        % move point 3 to point 2
    A = 6*(f1-f2)+3*(d2+d1)*(x2-x1);                 % make cubic extrapolation
    B = 3*(f2-f1)-(2*d1+d2)*(x2-x1);
    x3 = x1-d1*(x2-x1)^2/(B+sqrt(B*B-A*d1*(x2-x1))); % num. error possible, ok!
    if ~isreal(x3) || isnan(x3) || isinf(x3) || x3 < 0 % num prob | wrong sign?
      x3 = x2*EXT;                                 % extrapolate maximum amount
    elseif x3 > x2*EXT                  % new point beyond extrapolation limit?
      x3 = x2*EXT;                                 % extrapolate maximum amount
    elseif x3 < x2+INT*(x2-x1)         % new point too close to previous point?
      x3 = x2+INT*(x2-x1);
    end
  end                                                       % end extrapolation

  while (abs(d3) > -SIG*d0 || f3 > f0+x3*RHO*d0) && M > 0  % keep interpolating
    if d3 > 0 || f3 > f0+x3*RHO*d0                         % choose subinterval
      x4 = x3; f4 = f3; d4 = d3;                      % move point 3 to point 4
    else
      x2 = x3; f2 = f3; d2 = d3;                      % move point 3 to point 2
    end
    if f4 > f0           
      x3 = x2-(0.5*d2*(x4-x2)^2)/(f4-f2-d2*(x4-x2));  % quadratic interpolation
    else
      A = 6*(f2-f4)/(x4-x2)+3*(d4+d2);                    % cubic interpolation
      B = 3*(f4-f2)-(2*d2+d4)*(x4-x2);
      x3 = x2+(sqrt(B*B-A*d2*(x4-x2)^2)-B)/A;        % num. error possible, ok!
    end
    if isnan(x3) || isinf(x3)
      x3 = (x2+x4)/2;               % if we had a numerical problem then bisect
    end
    x3 = max(min(x3, x4-INT*(x4-x2)),x2+INT*(x4-x2));  % don't accept too close
    [f3 df3] = feval(f, X+x3*s, varargin{:});
    if f3 < F0, X0 = X+x3*s; F0 = f3; dF0 = df3; end         % keep best values
    M = M - 1; i = i + (length<0);                             % count epochs?!
    d3 = df3'*s;                                                    % new slope
  end                                                       % end interpolation

  if abs(d3) < -SIG*d0 && f3 < f0+x3*RHO*d0          % if line search succeeded
    X = X+x3*s; f0 = f3; fX = [fX' f0]';                     % update variables
    fprintf('%s %6i;  Value %4.6e\r', S, i, f0);
    s = (df3'*df3-df0'*df3)/(df0'*df0)*s - df3;   % Polack-Ribiere CG direction
    df0 = df3;                                               % swap derivatives
    d3 = d0; d0 = df0'*s;
    if d0 > 0                                      % new slope must be negative
      s = -df0; d0 = -s'*s;                  % otherwise use steepest direction
    end
    x3 = x3 * min(RATIO, d3/(d0-realmin));          % slope ratio but max RATIO
    ls_failed = 0;                              % this line search did not fail
  else
    X = X0; f0 = F0; df0 = dF0;                     % restore best point so far
    if ls_failed || i > abs(length)         % line search failed twice in a row
      break;                             % or we ran out of time, so we give up
    end
    s = -df0; d0 = -s'*s;                                        % try steepest
    x3 = 1/(1-d0);                     
    ls_failed = 1;                                    % this line search failed
  end
end
fprintf('\n');
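minimize.m builds each new search direction with the Polak-Ribiere rule on the line marked "Polack-Ribiere CG direction": s <- ((g1'g1 - g0'g1)/(g0'g0))*s - g1. It falls back to steepest descent whenever the new slope g1's is positive. The following Python sketch keeps only that direction update. To stay short, it replaces Rasmussen's Wolfe-Powell line search with the exact step that is available on a quadratic f(x) = 0.5*x'Ax - b'x. This is a deliberate simplification, not how minimize.m chooses step sizes. The matrix `A` and the vector `b` in the example are arbitrary illustrative values.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def pr_cg_quadratic(A, b, x, iters=10, tol=1e-12):
    """Minimise 0.5*x'Ax - b'x (A symmetric positive definite) with Polak-Ribiere CG.

    An exact line search replaces the cubic/quadratic Wolfe-Powell search of minimize.m.
    """
    g = [gi - bi for gi, bi in zip(matvec(A, x), b)]   # gradient Ax - b
    s = [-gi for gi in g]                               # first direction: steepest descent
    for _ in range(iters):
        if dot(g, g) < tol:
            break
        As = matvec(A, s)
        step = -dot(g, s) / dot(s, As)                  # exact minimiser along s
        x = [xi + step * si for xi, si in zip(x, s)]
        g_new = [gi - bi for gi, bi in zip(matvec(A, x), b)]
        beta = (dot(g_new, g_new) - dot(g, g_new)) / dot(g, g)  # Polak-Ribiere coefficient
        s = [beta * si - gi for si, gi in zip(s, g_new)]
        if dot(g_new, s) > 0:                           # not a descent direction: reset
            s = [-gi for gi in g_new]
        g = g_new
    return x


if __name__ == "__main__":
    print(pr_cg_quadratic([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [0.0, 0.0]))
```

For this 2x2 system, the exact solution is [1/11, 7/11]. With exact line searches, conjugate gradients reach it in two iterations, up to rounding.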

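Inside its interpolation loop, minimize.m fits a cubic through two bracketing points, (x2, f2, d2) and (x4, f4, d4), that is, values plus slopes along the search direction. It then moves to the minimiser of that cubic with x3 = x2+(sqrt(B*B-A*d2*(x4-x2)^2)-B)/A. A trial step is accepted when it passes the Wolfe-Powell test abs(d3) < -SIG*d0 && f3 < f0+x3*RHO*d0. The following Python sketch reproduces just these two formulas; the function names are hypothetical. For simplicity, degenerate fits fall back to bisection instead of reproducing minimize.m's NaN/Inf handling, and the quadratic branch used when f4 > f0 is omitted. The example uses f(x) = x^3 - 3x, whose minimiser for x > 0 is x = 1, so the cubic fit should land on it exactly.

```python
import math


def cubic_min(x2, f2, d2, x4, f4, d4, int_frac=0.1):
    """Minimiser of the cubic matching value and slope at x2 and x4 (same A, B as minimize.m)."""
    h = x4 - x2
    A = 6 * (f2 - f4) / h + 3 * (d4 + d2)
    B = 3 * (f4 - f2) - (2 * d2 + d4) * h
    disc = B * B - A * d2 * h * h
    if A == 0 or disc < 0:            # degenerate fit: bisect (a simplification)
        x3 = (x2 + x4) / 2
    else:
        x3 = x2 + (math.sqrt(disc) - B) / A
    # like minimize.m (INT = 0.1), keep x3 away from both ends of the bracket
    return max(min(x3, x4 - int_frac * h), x2 + int_frac * h)


def wolfe_ok(f0, d0, x3, f3, d3, sig=0.1, rho=0.05):
    """Wolfe-Powell acceptance test of minimize.m (SIG = 0.1, RHO = SIG/2)."""
    return abs(d3) < -sig * d0 and f3 < f0 + x3 * rho * d0


def f(x):
    return x ** 3 - 3 * x


def df(x):
    return 3 * x ** 2 - 3


if __name__ == "__main__":
    x3 = cubic_min(0.0, f(0.0), df(0.0), 2.0, f(2.0), df(2.0))
    print(x3, wolfe_ok(f(0.0), df(0.0), x3, f(x3), df(x3)))
```

Because the test function is itself a cubic, the fit is exact. The returned step therefore satisfies both Wolfe-Powell conditions, whereas a short step such as x = 0.1 fails the slope condition.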
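References

Deep learning: Part 35 (Exercise: dimensionality reduction of data with a neural network);

Deep learning: Part 34 (Dimensionality reduction of data with a neural network);

Reducing the dimensionality of data with neural networks;

Supporting online material (PDF).

Reposted from: https://www.cnblogs.com/dmzhuo/p/5072808.html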
