UFLDL New Tutorial and Programming Exercises (5): Softmax Regression (vectorized)

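UFLDL is an early introductory deep learning course written by Andrew Ng's team. Its rhythm of theory followed by exercises works very well. Every time, I want to get through the theory quickly so I can start on the exercise. The whole code framework is already set up with detailed comments, so we only need to write a small core part. It is easy to get started!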

I couldn't find a Chinese translation of this part of the new version of the tutorial. -_-

Section 5 is Softmax Regression. Below I'll provide both vectorized and non-vectorized code.

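Softmax regression is a generalization of logistic regression. Put the other way around, logistic regression is the special case of softmax regression with K = 2. Logistic regression is usually used as a binary classifier, and softmax regression as a multi-class classifier (don't ask why it is still called "Regression").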
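Here is a quick numeric check of the K = 2 case, as my own illustration with arbitrary values. If the second class's parameter vector is fixed at 0, the softmax probability of class 1 is exp(z)/(exp(z)+1). That is exactly the logistic function 1/(1+exp(-z)).

```matlab
% Sketch: with K = 2 and theta^(2) fixed to 0, softmax reduces to the sigmoid.
% x and theta1 are arbitrary illustrative values.
x = [1; -2; 0.5];
theta1 = [0.3; 0.1; -0.7];
z = theta1' * x;                          % score of class 1 (class 2 score is 0)
p_softmax = exp(z) / (exp(z) + exp(0));   % softmax probability of class 1
p_sigmoid = 1 / (1 + exp(-z));            % logistic regression hypothesis
fprintf('softmax: %.6f  sigmoid: %.6f\n', p_softmax, p_sigmoid);
```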

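Here is the hypothesis of logistic regression from before:
$h_{\theta}(x)=\frac{1}{1+\exp\left(-\theta^{\top} x\right)}$
and its cost function:
$J(\theta)=-\sum_{i=1}^{m}\left[y^{(i)} \log h_{\theta}\left(x^{(i)}\right)+\left(1-y^{(i)}\right) \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)\right]$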

The hypothesis of softmax regression is now:
$h_{\theta}(x)=\left[\begin{array}{c}{P(y=1 | x ; \theta)} \\ {P(y=2 | x ; \theta)} \\ {\vdots} \\ {P(y=K | x ; \theta)}\end{array}\right]=\frac{1}{\sum_{j=1}^{K} \exp \left(\theta^{(j) \top} x\right)}\left[\begin{array}{c}{\exp \left(\theta^{(1) \top} x\right)} \\ {\exp \left(\theta^{(2) \top} x\right)} \\ {\vdots} \\ {\exp \left(\theta^{(K) \top} x\right)}\end{array}\right]$
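and the corresponding cost function:
$J(\theta)=-\left[\sum_{i=1}^{m} \sum_{k=1}^{K} 1\left\{y^{(i)}=k\right\} \log \frac{\exp \left(\theta^{(k) \top} x^{(i)}\right)}{\sum_{j=1}^{K} \exp \left(\theta^{(j) \top} x^{(i)}\right)}\right]$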

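Intuitively, each class's exponentiated score divided by the sum of all of them acts as a "probability". For the probabilistic interpretation, see the CS229 notes. The $1\{\cdot\}$ in the cost function is the indicator function (I had been calling it a "switch function"). It equals 1 when the statement inside is true and 0 when it is false. When you write the vectorized code, you will find that the indicator matrix Y is fairly sparse: each example contributes exactly one nonzero entry.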
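The sketch below makes the "share of the total score" and the indicator concrete. It uses made-up θ, X and labels and works as follows:

- It computes the probabilities P column by column. It first subtracts each column's maximum score, a common trick that leaves the probabilities unchanged but avoids overflow in exp.
- It builds the indicator matrix with `sparse`.
- It evaluates the cost.

Note the orientation: here Y is K × m, whereas the exercise code builds an m × K version.

```matlab
% Sketch: softmax probabilities, the indicator matrix and the cost for a tiny,
% made-up problem (n = 2 features, K = 3 classes, m = 4 examples).
theta = [ 0.2 -0.5  1.0;
          0.7  0.3 -0.4];                 % column k is theta^(k)
X = [ 1.5 -1.0  0.3  2.0;
     -1.0  0.5  1.2 -0.7];                % column i is x^(i)
y = [3 1 2 3];                            % label of each example
K = size(theta, 2);
m = size(X, 2);
S = theta' * X;                           % S(k,i) = theta^(k)' * x^(i)
S = bsxfun(@minus, S, max(S, [], 1));     % subtract each column's max: same P, no overflow
P = bsxfun(@rdivide, exp(S), sum(exp(S), 1));  % P(k,i) = P(y = k | x^(i); theta)
Y = full(sparse(y, 1:m, 1, K, m));        % Y(k,i) = 1{y(i) = k}: one 1 per column
J = -sum(sum(Y .* log(P)));               % only the true-class entries contribute
disp(sum(P, 1));                          % every column of P sums to 1
fprintf('nonzeros in Y: %d of %d, J = %.4f\n', nnz(Y), numel(Y), J);
```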

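The gradient of softmax regression is as follows (I haven't yet worked out how it is derived; if you know, please let me know):
$\nabla_{\theta^{(k)}} J(\theta)=-\sum_{i=1}^{m}\left[x^{(i)}\left(1\left\{y^{(i)}=k\right\}-P\left(y^{(i)}=k \mid x^{(i)} ; \theta\right)\right)\right]$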
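Here is one way to derive it, sketched in my own notation. Write $z_k^{(i)}=\theta^{(k)\top}x^{(i)}$ and $p_k^{(i)}=P(y^{(i)}=k\mid x^{(i)};\theta)$. The steps are:

1. Differentiate the log-softmax with respect to the scores.
2. Use the fact that exactly one indicator $1\{y^{(i)}=k\}$ per example equals 1.
3. Apply the chain rule through $z_l^{(i)}$.

$$
\begin{aligned}
\log p_k^{(i)} &= z_k^{(i)} - \log \sum_{j=1}^{K} \exp\left(z_j^{(i)}\right),
\qquad
\frac{\partial \log p_k^{(i)}}{\partial z_l^{(i)}} = 1\{k=l\} - p_l^{(i)} \\
\frac{\partial J}{\partial z_l^{(i)}} &= -\sum_{k=1}^{K} 1\left\{y^{(i)}=k\right\}\left(1\{k=l\} - p_l^{(i)}\right)
= -\left(1\left\{y^{(i)}=l\right\} - p_l^{(i)}\right) \\
\nabla_{\theta^{(l)}} J(\theta) &= \sum_{i=1}^{m} \frac{\partial J}{\partial z_l^{(i)}}\,\frac{\partial z_l^{(i)}}{\partial \theta^{(l)}}
= -\sum_{i=1}^{m} x^{(i)}\left(1\left\{y^{(i)}=l\right\} - p_l^{(i)}\right)
\end{aligned}
$$

Now stack the K gradients as columns, with X the n × m data matrix, Y the K × m indicator matrix and P the K × m probability matrix. This gives $\nabla_{\theta} J = -X\,(Y-P)^{\top}$. Up to transposing Y and dropping the K-th class, this is what the vectorized code computes.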

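Unlike before, when the gradient with respect to $\theta$ was a vector, it is now an N × K matrix, where N is the number of features and K the number of classes. In the exercise code $\theta^{(K)}$ is fixed at 0, so both θ and the gradient are N × (K-1). By the same reasoning, binary logistic regression should also have a matrix gradient. However, the parameterization is redundant (the tutorial discusses this too), so one dimension can be dropped, leaving a vector. That is my own understanding.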
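The sketch below illustrates that redundancy, using my own made-up data. It subtracts the same vector ψ = θ^(K) from every θ^(k). The probabilities do not change, and the last column of θ becomes exactly 0. That is why the exercise can store θ as n × (num_classes-1). For K = 2 it leaves a single vector, which is logistic regression.

```matlab
% Sketch: subtracting the same vector psi from every theta^(k) leaves the
% softmax probabilities unchanged, so theta^(K) can be fixed at 0.
X = reshape(sin(1:8), 2, 4);          % n = 2 features, m = 4 examples (illustrative)
theta = [0.5 -1.2  0.3;
         0.8  0.1 -0.6];              % n x K, K = 3
softmax_cols = @(S) bsxfun(@rdivide, exp(S), sum(exp(S), 1));
P1 = softmax_cols(theta' * X);
psi = theta(:, end);                  % choose psi = theta^(K)
theta0 = bsxfun(@minus, theta, psi);  % last column becomes exactly 0
P2 = softmax_cols(theta0' * X);
disp(max(abs(P1(:) - P2(:))));        % tiny (floating-point level)
```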

Talk is cheap, give me the code!
One new function I learned here is sub2ind.
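Here is a tiny illustration of what sub2ind does in the line `I = sub2ind(size(C),y,1:size(C,2))`. It converts each (row, column) pair (y(i), i) into a column-major linear index. C(I) then picks out the entry for each example's true class without a loop. The matrix C and labels y here are made-up values.

```matlab
% Sketch: sub2ind turns (row, column) pairs into linear indices, so
% C(I) picks C(y(i), i) for every example i in one step.
C = [11 12 13 14;
     21 22 23 24;
     31 32 33 34];        % K = 3 rows (classes), m = 4 columns (examples)
y = [2 1 3 2];            % label of each example (row vector, like the exercise)
I = sub2ind(size(C), y, 1:size(C, 2));
picked = C(I);            % [21 12 33 24]
% Equivalent loop, for comparison
picked_loop = zeros(1, numel(y));
for i = 1:numel(y)
    picked_loop(i) = C(y(i), i);
end
disp(picked);
```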

```matlab
function [f,g] = softmax_regression_vec(theta, X,y)
  %
  % Arguments:
  %   theta - A vector containing the parameter values to optimize.
  %       In minFunc, theta is reshaped to a long vector.  So we need to
  %       resize it to an n-by-(num_classes-1) matrix.
  %       Recall that we assume theta(:,num_classes) = 0.
  %
  %   X - The examples stored in a matrix.
  %       X(i,j) is the i'th coordinate of the j'th example.
  %   y - The label for each example.  y(j) is the j'th example's label.
  %
  m=size(X,2);
  n=size(X,1);

  % theta is a vector;  need to reshape to n x (num_classes-1).
  theta=reshape(theta, n, []);
  num_classes=size(theta,2)+1;
  
  % initialize objective value and gradient.
%   f = 0;
%   g = zeros(size(theta));

  %
  % TODO:  Compute the softmax objective function and gradient using vectorized code.
  %        Store the objective function value in 'f', and the gradient in 'g'.
  %        Before returning g, make sure you form it back into a vector with g=g(:);
  %
%%% YOUR CODE HERE %%%
  A = exp([theta' * X;zeros(1,m)]);  % num_classes x m; last row is exp(0) = 1 (theta^(K) = 0)
  B = bsxfun(@rdivide, A, sum(A));   % normalize each column: B(k,i) = P(y = k | x^(i))
  C = log(B);
  I = sub2ind(size(C),y,1:size(C,2)); % linear indices of C(y(i), i)
  f = (-1) * sum(C(I));
  
  %%%%%%% calculate g %%%%%%%%%%%%
  % Y(i,k) = 1 if y(i) == k, else 0 (m x num_classes indicator matrix)
  Y = repmat(y',1,num_classes);
  for i=1:num_classes
      Y(Y(:,i)~=i,i) = 0;
  end
  Y(Y~=0)=1;
  % Drop the last column of Y and the last row of B, because theta has only num_classes-1 columns
  g = (-1) * X * (Y(:,1:(size(Y,2)-1))-B(1:(size(B,1)-1),:)');
  %%% Someone else's version. Both give the same result; the main difference is how the
  %%% sparse indicator matrix is built, and theirs is slightly faster.
  %%% With num_classes this small, mine took 0.014272 s and theirs 0.014249 s. %%%
%   h = theta'*X; % h(k,i): k-th theta, i-th example
%   a = exp(h);
%   a = [a;ones(1,size(a,2))]; % append one row: exp(0) = 1 for the last class
%   p = bsxfun(@rdivide,a,sum(a));
%   c = log(p); % natural log: the original used log2, which scales f by 1/log(2) so f no longer matches g
%   i = sub2ind(size(c), y,[1:size(c,2)]);
%   values = c(i);
%   f = -sum(values);

%   d = full(sparse(1:m,y,1,m,num_classes)); % explicit size, in case the largest label is absent
%   d = d(:,1:(size(d,2)-1)); % drop the last column
%   p = p(1:(size(p,1)-1),:); % drop the last row
%   g = X*(p'-d); % the original used Octave-only '.-'; plain '-' also works in MATLAB

  g=g(:); % make gradient a vector for minFunc
```
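The loop form is easier to check by eye. Here is a non-vectorized version of the same objective and gradient, compared against the vectorized formulas on small made-up data. This is my own sketch for checking, not the exercise's reference code. Like the alternative version in the comments, it builds the indicator matrix with `sparse`.

```matlab
% Sketch: loop-based objective and gradient vs. the vectorized formulas.
% n, m, num_classes, X, y and theta are illustrative values.
n = 3; m = 5; num_classes = 4;
X = reshape(cos(1:n*m), n, m);
y = [1 4 2 4 3];
theta = reshape(sin(1:n*(num_classes-1)), n, num_classes-1); % last class fixed at 0

% --- vectorized (same formulas as softmax_regression_vec) ---
A = exp([theta' * X; zeros(1, m)]);
P = bsxfun(@rdivide, A, sum(A, 1));          % num_classes x m probabilities
f_vec = -sum(log(P(sub2ind(size(P), y, 1:m))));
Y = full(sparse(1:m, y, 1, m, num_classes)); % m x num_classes indicator matrix
g_vec = -X * (Y(:, 1:end-1) - P(1:end-1, :)');

% --- non-vectorized: one example and one class at a time ---
f_loop = 0;
g_loop = zeros(n, num_classes - 1);
for i = 1:m
    scores = [theta' * X(:, i); 0];          % the K-th class score is 0
    p = exp(scores) / sum(exp(scores));
    f_loop = f_loop - log(p(y(i)));
    for k = 1:num_classes-1
        g_loop(:, k) = g_loop(:, k) - X(:, i) * ((y(i) == k) - p(k));
    end
end
fprintf('f difference: %g, max g difference: %g\n', ...
        abs(f_vec - f_loop), max(abs(g_vec(:) - g_loop(:))));
```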

Results:

[Figure: softmax regression, vectorized: output of the run]

After writing it, I was worried I had made a mistake, so I compared it with someone else's implementation. Reference: https://blog.csdn.net/lingerlanlan/article/details/38425929
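Comparing with another implementation is one check. A finite-difference gradient check is a quick self-test that needs no second implementation. The sketch below is self-contained and uses made-up data and a step size h = 1e-5, both my own choices. It compares the analytic gradient with central differences of the objective. The parameters are passed as one long vector, the way minFunc passes them.

```matlab
% Sketch: central finite-difference check of the softmax gradient.
% Data, theta and the step size h are illustrative assumptions.
n = 3; m = 6; num_classes = 3;
X = reshape(sin(0.7 * (1:n*m)), n, m);
y = [1 2 3 3 1 2];
Y = full(sparse(y, 1:m, 1, num_classes, m));   % num_classes x m indicator

% Probabilities for a long parameter vector t (reshaped to n x (num_classes-1))
probs = @(t) bsxfun(@rdivide, exp([reshape(t, n, [])' * X; zeros(1, m)]), ...
                    sum(exp([reshape(t, n, [])' * X; zeros(1, m)]), 1));
objective = @(t) -sum(sum(Y .* log(probs(t))));

theta = 0.1 * cos(1:n*(num_classes-1))';       % column vector
P = probs(theta);
g = -X * (Y(1:end-1, :) - P(1:end-1, :))';     % analytic gradient, n x (num_classes-1)
g = g(:);

h = 1e-5;
g_num = zeros(size(theta));
for j = 1:numel(theta)
    e = zeros(size(theta));
    e(j) = h;
    g_num(j) = (objective(theta + e) - objective(theta - e)) / (2 * h);
end
fprintf('max |analytic - numeric| = %g\n', max(abs(g - g_num)));
```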

If I have misunderstood anything, please point it out. If you have better ideas, feel free to discuss them in the comments below!
