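One approach to multi-class classification is one-vs-all logistic regression: train a separate hypothesis function for each class, then at prediction time compute the probability of each class and take the largest.
The one-vs-all cost function and gradient are exactly the same as last week's, with regularization: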
lrCostFunction.m
function [J, grad] = lrCostFunction(theta, X, y, lambda)
% Initialize some useful values
m = length(y); % number of training examples
J = 0;
grad = zeros(size(theta));
n1 = length(theta);
% X size: m * (n+1), theta size: (n+1) * 1, h size: m * 1
% for every example i compute hi
h = sigmoid(X * theta);
% regularization term (the bias theta(1) is excluded)
Jbias = lambda / (2 * m) * sum(theta(2:n1) .^ 2);
% y size: m * 1, log(h) size: m * 1
% for every example i compute yi*log(hi) and (1-yi)*log(1-hi)
J = -1 / m * (y' * log(h) + (1 - y)' * log(1 - h)) + Jbias;
% when j = 0 (index = 1): the bias term is not regularized
grad(1) = X(:, 1)' * (h - y) / m;
% when j > 0 (index > 1)
Tbias = lambda * theta(2:n1) / m;
grad(2:n1) = (X(:, 2:n1)' * (h - y)) / m + Tbias;
% =============================================================
grad = grad(:);
end
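One way to sanity-check a cost/gradient pair like the one above is to compare the analytic gradient with a central-difference approximation. The following is only a minimal sketch: the data, theta, lambda and the step size eps_step are made-up values for illustration, and the cost is re-stated inline as an anonymous function so that the snippet runs on its own.

```octave
% Hypothetical tiny data set, chosen only for illustration
X = [1 0.5 1.2; 1 -1.0 0.3; 1 2.0 -0.7; 1 0.1 0.9];  % first column is the bias
y = [1; 0; 1; 0];
theta = [0.1; -0.2; 0.3];
lambda = 1;
m = length(y);

sigmoid = @(z) 1 ./ (1 + exp(-z));

% Regularized cost; the bias theta(1) is excluded from the penalty
cost = @(t) -1 / m * (y' * log(sigmoid(X * t)) + (1 - y)' * log(1 - sigmoid(X * t))) ...
            + lambda / (2 * m) * sum(t(2:end) .^ 2);

% Analytic gradient in fully vectorized form
mask = [0; ones(numel(theta) - 1, 1)];   % zeroes out the bias penalty
grad = X' * (sigmoid(X * theta) - y) / m + lambda / m * (mask .* theta);

% Central-difference approximation of the same gradient
eps_step = 1e-4;
num_grad = zeros(size(theta));
for j = 1:numel(theta)
  e = zeros(size(theta));
  e(j) = eps_step;
  num_grad(j) = (cost(theta + e) - cost(theta - e)) / (2 * eps_step);
end

% Largest absolute difference; it should be very small if the two agree
disp(max(abs(grad - num_grad)));
```

If the difference is not tiny, the usual suspects are a regularized bias term or a missing 1/m factor.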
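After the optimization for each class finishes (the code below uses fmincg), save that class's theta as a row of all_theta: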
oneVsAll.m
function [all_theta] = oneVsAll(X, y, num_labels, lambda)
% Some useful variables
m = size(X, 1);
n = size(X, 2);
all_theta = zeros(num_labels, n + 1);
% Add ones to the X data matrix
X = [ones(m, 1) X];
options = optimset('GradObj', 'on', 'MaxIter', 50);
for c = 1:num_labels
    initial_theta = zeros(n + 1, 1);
    % (y == c) turns the labels into a 0/1 target for class c
    theta = fmincg(@(t)(lrCostFunction(t, X, (y == c), lambda)), initial_theta, options);
    all_theta(c, :) = theta;
end
% =========================================================================
end
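The key trick in the loop above is the expression (y == c), which relabels the data as a binary problem for class c. A small sketch with made-up labels (the values of y and num_labels here are hypothetical) shows what each classifier is actually trained on:

```octave
y = [1; 3; 2; 3; 1];          % hypothetical labels for 5 examples
num_labels = 3;

% Column c holds the 0/1 targets that classifier c would be trained on
Y = zeros(numel(y), num_labels);
for c = 1:num_labels
  Y(:, c) = (y == c);
end

disp(Y);
```

Each row of Y contains exactly one 1, in the column of that example's true class.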
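For each example, use the per-class theta obtained above to compute the probability of belonging to each class, and take the class with the highest probability. Note that [cls, index] = max(A, [], 2) returns the index of the maximum value in each row of the matrix: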
predictOneVsAll.m
function p = predictOneVsAll(all_theta, X)
m = size(X, 1);
num_labels = size(all_theta, 1);
% You need to return the following variables correctly
p = zeros(size(X, 1), 1);
% Add ones to the X data matrix
X = [ones(m, 1) X];
% X size: m * (n+1), all_theta size: c * (n+1)
% for each example, compute 10 classes hypothesis, m * c
all_h = sigmoid(X * all_theta');
% for each example, get the max hypothesis index
[cls, index] = max(all_h, [], 2);
p = index;
end
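To see what the row-wise max returns, here is a small sketch on a made-up probability matrix (the numbers are hypothetical, with 3 examples and 3 classes):

```octave
% Each row: one example; each column: the hypothesis for one class
all_h = [0.10 0.80 0.30;
         0.90 0.20 0.40;
         0.05 0.15 0.70];

% The third argument 2 means "take the max along each row"
[cls, index] = max(all_h, [], 2);

disp(cls');     % the winning probability per example
disp(index');   % the predicted class per example
```

Here cls holds the largest value in each row and index holds its column, which is the predicted label because class c is stored in column c.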
Another approach to multi-class classification is a neural network. For now only forward propagation is needed:
predict.m
function p = predict(Theta1, Theta2, X)
% Useful values
m = size(X, 1);
num_labels = size(Theta2, 1);
p = zeros(size(X, 1), 1);
% Add ones to the X data matrix
% X size: m * 401
X = [ones(m, 1) X];
% Theta1 size: 25 * 401, => m * 25
A1 = sigmoid(X * Theta1');
m1 = size(A1, 1);
% A1 size: m * 26
A1 = [ones(m1, 1) A1];
% Theta2 size: 10 * 26, => m * 10
A2 = sigmoid(A1 * Theta2');
% for each example, get the max hypothesis index
[cls, index] = max(A2, [], 2);
p = index;
end
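The same forward pass can be tried on a toy network to check the matrix shapes. This is only a sketch under stated assumptions: the weights below are hand-picked for illustration (hidden units acting roughly as AND and NOR, so that class 1 means "both inputs equal"), and they are not the trained Theta1 and Theta2 used by predict.m.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

X = [0 0; 0 1; 1 0; 1 1];            % 4 examples, 2 features (hypothetical)
Theta1 = [-30 20 20; 10 -20 -20];    % hidden layer: 2 units x (2 + 1) inputs
Theta2 = [-10 20 20; 10 -20 -20];    % output layer: 2 classes x (2 + 1) inputs

m = size(X, 1);
% Hidden activations, with a bias column prepended: m x 3
A1 = [ones(m, 1) sigmoid([ones(m, 1) X] * Theta1')];
% Output activations: m x 2
A2 = sigmoid(A1 * Theta2');
% Predicted class is the column with the largest activation in each row
[cls, p] = max(A2, [], 2);

disp(p');
```

With these weights the first and last examples fall into class 1 and the two middle ones into class 2.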