This exercise is again recognition of the handwritten digits 0-9, but with a more complex model than before: several hidden layers learn features from the raw input that better characterize the data, and the learned features are then fed into softmax regression for classification. As expected, the deeper (more complex) network achieves a higher classification accuracy, but training and testing the model also take longer.
Most of the time is spent learning the new features, followed by fine-tuning, which is the core step of deep learning.
Tips:
The training set is fairly large, so debugging directly on the whole set takes a long time and makes debugging very inefficient. My approach is to take only 1000 examples from the full training set and to set the maximum number of iterations for the sparse autoencoder training to 100 (normally 400). This speeds up debugging considerably; once the program runs correctly, change the code back. A sketch of the iteration-count change follows the data-subset snippet below.
Here is the code for shrinking the training data. It is very simple; just note that trainLabels is a column vector.
trainData = trainData(:,1:1000);
trainLabels = trainLabels(1:1000,:);
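For the iteration count, a minimal sketch of the corresponding change, assuming the standard minFunc options used in stackedAEExercise.m (the exact variable names may differ in your copy):
options.Method  = 'lbfgs';
options.maxIter = 100;    % normally 400; restore this before the final run
options.display = 'on';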
One curious observation: with only 1000 training examples and the sparse autoencoders trained for at most 100 iterations, the final recognition accuracy is still quite high, above 80%. To some extent this shows that deep learning algorithms can perform well even without much training data, which speaks to how powerful they are.
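For reference, the accuracy is obtained by predicting on the test set and comparing with the labels. A minimal sketch, assuming stackedAEPredict from the exercise's starter code and the variable names used in stackedAEExercise.m (e.g. stackedAEOptTheta for the fine-tuned parameters):
[pred] = stackedAEPredict(stackedAEOptTheta, inputSize, hiddenSize, ...
                          numClasses, netconfig, testData);
acc = mean(testLabels(:) == pred(:));
fprintf('Test accuracy after fine-tuning: %0.3f%%\n', acc * 100);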
function [ cost, grad ] = stackedAECost(theta, inputSize, hiddenSize, ...
numClasses, netconfig, ...
lambda, data, labels)
% stackedAECost: Takes a trained softmaxTheta and a training data set with labels,
% and returns cost and gradient using a stacked autoencoder model. Used for
% finetuning.
% theta: trained weights from the autoencoder
% inputSize: the number of input units
% hiddenSize: the number of hidden units *at the 2nd layer*
% numClasses: the number of categories
% netconfig: the network configuration of the stack
% lambda: the weight regularization penalty
% data: Our matrix containing the training data as columns. So, data(:,i) is the i-th training example.
% labels: A vector containing labels, where labels(i) is the label for the
% i-th training example
%% Unroll softmaxTheta parameter
% We first extract the part which computes the softmax gradient
softmaxTheta = reshape(theta(1:hiddenSize*numClasses), numClasses, hiddenSize);
% Extract out the "stack"
stack = params2stack(theta(hiddenSize*numClasses+1:end), netconfig);
% You will need to compute the following gradients
softmaxThetaGrad = zeros(size(softmaxTheta));
stackgrad = cell(size(stack));
for d = 1:numel(stack)
stackgrad{d}.w = zeros(size(stack{d}.w));
stackgrad{d}.b = zeros(size(stack{d}.b));
end
cost = 0; % You need to compute this
% You might find these variables useful
M = size(data, 2);
groundTruth = full(sparse(labels, 1:M, 1));
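% groundTruth(c, i) is 1 when example i has label c and 0 otherwise
% (a numClasses x M matrix, provided every class appears in labels).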
%% --------------------------- YOUR CODE HERE -----------------------------
% Instructions: Compute the cost function and gradient vector for
% the stacked autoencoder.
%
% You are given a stack variable which is a cell-array of
% the weights and biases for every layer. In particular, you
% can refer to the weights of Layer d, using stack{d}.w and
% the biases using stack{d}.b . To get the total number of
% layers, you can use numel(stack).
%
% The last layer of the network is connected to the softmax
% classification layer, softmaxTheta.
%
% You should compute the gradients for the softmaxTheta,
% storing that in softmaxThetaGrad. Similarly, you should
% compute the gradients for each layer in the stack, storing
% the gradients in stackgrad{d}.w and stackgrad{d}.b
% Note that the size of the matrices in stackgrad should
% match exactly that of the size of the matrices in stack.
%
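% Forward pass: propagate the data through every layer of the stack,
% caching pre-activations z{l} and activations a{l} for back-propagation.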
depth = numel(stack);
z = cell(depth+1,1);
a = cell(depth+1,1);
a{1} = data;
for layer = 1:depth
z{layer+1} = bsxfun(@plus,stack{layer}.w * a{layer}, stack{layer}.b);
a{layer+1} = sigmoid(z{layer+1});
end
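% Softmax layer on top of the deepest activations: subtract the column-wise
% max before exponentiating (numerical stability), then normalize to probabilities.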
P = softmaxTheta * a{depth+1};
P = bsxfun(@minus, P, max(P, [], 1));
P = exp(P);
p = bsxfun(@rdivide, P, sum(P, 1));
cost = -1/M * groundTruth(:)' * log(p(:)) + lambda /2 * softmaxTheta(:)'* softmaxTheta(:);
softmaxThetaGrad = -1/M * (groundTruth - p)*a{depth+1}' + lambda * softmaxTheta;
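% Back-propagation: the error entering the top stack layer comes from the
% softmax layer; at each layer, multiply by the sigmoid derivative a .* (1 - a).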
delta = cell(depth+1, 1);
delta{depth+1} = -(softmaxTheta' * (groundTruth - p)) .* a{depth+1} .* (1 - a{depth+1});
for layer = (depth:-1:2)
delta{layer} = (stack{layer}.w' * delta{layer+1}) .* a{layer} .* (1-a{layer});
end
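% Gradients for each stack layer, averaged over the M training examples.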
for layer = (depth:-1:1)
stackgrad{layer}.w = (1/M) * delta{layer+1} * a{layer}';
stackgrad{layer}.b = (1/M) * sum(delta{layer+1}, 2);
end
% -------------------------------------------------------------------------
%% Roll gradient vector
grad = [softmaxThetaGrad(:) ; stack2params(stackgrad)];
end
% You might find this useful
function sigm = sigmoid(x)
sigm = 1 ./ (1 + exp(-x));
end
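Before fine-tuning on the full data set, it is worth checking the analytic gradient above against a numerical gradient on a tiny random network. Below is a minimal sketch to run as a separate script, assuming computeNumericalGradient and stack2params from the earlier UFLDL exercises are on the path; the layer sizes and lambda here are arbitrary debug values:
inputSize  = 4;
hiddenSize = 5;
numClasses = 3;
lambda     = 1e-4;
M          = 12;
data   = randn(inputSize, M);
labels = mod((0:M-1)', numClasses) + 1;   % column vector; every class appears
% Build a small two-layer stack and pack it into a parameter vector.
stack = cell(2, 1);
stack{1}.w = 0.1 * randn(hiddenSize, inputSize);
stack{1}.b = zeros(hiddenSize, 1);
stack{2}.w = 0.1 * randn(hiddenSize, hiddenSize);
stack{2}.b = zeros(hiddenSize, 1);
[stackparams, netconfig] = stack2params(stack);
theta = [0.005 * randn(hiddenSize * numClasses, 1); stackparams];
% Compare analytic and numerical gradients; the relative difference should be tiny.
[cost, grad] = stackedAECost(theta, inputSize, hiddenSize, numClasses, ...
                             netconfig, lambda, data, labels);
numgrad = computeNumericalGradient(@(t) stackedAECost(t, inputSize, hiddenSize, ...
                             numClasses, netconfig, lambda, data, labels), theta);
disp(norm(numgrad - grad) / norm(numgrad + grad));   % should be on the order of 1e-9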