Multi-class training and classification with LDA (linear discriminant analysis)

  This post uses LDA as the classifier and runs the experiments in MATLAB.

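  The projection matrix W is built following the classical theory of LDA, as in the LDA function below, which also returns the mean of each class after projection to (k-1) dimensions.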
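For reference, the classical formulation can be written out as follows. This is the textbook form rather than anything specific to this post: samples x are treated as column vectors here (the code stores them as rows), C_i is the set of samples in class i, n_i its size, \mu_i its mean, and \mu the mean of all samples.

```latex
S_W = \sum_{i=1}^{k} \sum_{x \in C_i} (x - \mu_i)(x - \mu_i)^{T},
\qquad
S_B = \sum_{i=1}^{k} n_i \, (\mu_i - \mu)(\mu_i - \mu)^{T}

W = \arg\max_{W} \frac{\left| W^{T} S_B W \right|}{\left| W^{T} S_W W \right|},
\qquad
S_B \, w_j = \lambda_j \, S_W \, w_j, \quad j = 1, \dots, k-1
```

Because S_B is a sum of k rank-one terms whose weighted vectors n_i(\mu_i - \mu) add up to zero, its rank is at most k-1, which is why the projection has at most (k-1) useful dimensions.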

The code of LDA.m is as follows:

function [W,centers] = LDA(Input,Target)
% Input: n * d matrix, each row is a sample
% Target: n * 1 matrix, each entry is the class label of the matching row
% W: d * (k - 1) matrix that projects samples to (k - 1) dimensions
% centers: k * (k - 1) matrix, the mean of each class after projection

% Initialization
[n, dim] = size(Input);
ClassLabel = unique(Target);
k = length(ClassLabel);

nGroup = NaN(k,1);        % number of samples in each class
GroupMean = NaN(k,dim);   % mean of each class
W = NaN(dim,k-1);         % the final projection matrix
centers = zeros(k,k-1);   % class means after projection
SB = zeros(dim,dim);      % between-class scatter matrix
SW = zeros(dim,dim);      % within-class scatter matrix

% Compute the within-class and between-class scatter matrices
for i = 1:k
    group = (Target == ClassLabel(i));
    nGroup(i) = sum(double(group));
    GroupMean(i,:) = mean(Input(group,:),1);   % dim 1, so a single-sample class still gives a row
    tmp = zeros(dim,dim);
    for j = 1:n
        if group(j) == 1
            t = Input(j,:) - GroupMean(i,:);
            tmp = tmp + t'*t;
        end
    end
    SW = SW + tmp;
end
m = mean(Input,1);   % overall sample mean, consistent with the nGroup(i) weights below
for i = 1:k
    tmp = GroupMean(i,:) - m;
    SB = SB + nGroup(i)*tmp'*tmp;
end

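% % W consists of the eigenvectors of inv(SW)*SB that belong to the k - 1 largest eigenvalues
% [evec,evals] = eig(SW\SB);
% [~,order] = sort(real(diag(evals)),'descend');
% W = real(evec(:,order(1:k-1)));

% The same directions can also be obtained through SVDs
% (the SVD of K = (Hb,Hw)' can be reduced to an SVD of Ht; P is then recovered from K, U and sigmak)
% [P,sigmak,U] = svd(K,'econ'); => [U,sigmak,V] = svd(Ht,0);
% Here SW is whitened on its range and SB is then diagonalized in the whitened coordinates
[U,sigmak,V] = svd(SW,0);
t = rank(SW);
R = sqrt(sigmak(1:t,1:t));                 % SW = (U*R)*(U*R)' on its range
P = R\(U(:,1:t)'*SB*U(:,1:t))/R;           % SB in coordinates where SW is the identity
[Q,sigmaa,Wt] = svd(P);                    % columns ordered by decreasing between-class scatter
Y = U(:,1:t)*(R\Wt);
W = Y(:,1:k-1);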

% Compute the class centers after projection
for i = 1:k
    group = (Target == ClassLabel(i));
    centers(i,:) = mean(Input(group,:)*W,1);
end
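The eigenvalue formulation mentioned in the commented-out part of LDA.m (W built from the eigenvectors of inv(SW)*SB with the k-1 largest eigenvalues) can be tried on a small example. The sketch below is only an illustration under assumptions: the toy data, and the names offsets, mu, X, y, labels and lambda, are made up here, and SW is assumed to be nonsingular, which holds for this data but not for every dataset.

```matlab
% Toy data (made up for illustration): 3 classes in 4 dimensions, 8 samples each
offsets = 0.5*[eye(4); -eye(4)];           % zero-mean spread around each class mean
mu = [0 0 0 0; 3 0 0 0; 0 3 0 0];          % class means
X = []; y = [];
for c = 1:3
    X = [X; repmat(mu(c,:),8,1) + offsets];
    y = [y; c*ones(8,1)];
end

[n,dim] = size(X);
labels = unique(y);
k = length(labels);
m = mean(X,1);                             % overall mean of all samples
SW = zeros(dim); SB = zeros(dim);
for c = 1:k
    Xc = X(y == labels(c),:);
    mc = mean(Xc,1);
    D = Xc - repmat(mc,size(Xc,1),1);      % samples centered on their class mean
    SW = SW + D'*D;                        % within-class scatter
    SB = SB + size(Xc,1)*(mc - m)'*(mc - m);   % between-class scatter
end

% Eigenvectors of inv(SW)*SB, sorted by decreasing eigenvalue
[evec,evals] = eig(SW\SB);
[lambda,order] = sort(real(diag(evals)),'descend');
W = real(evec(:,order(1:k-1)));            % d * (k - 1) projection matrix
lambda = lambda(1:k-1);                    % the k - 1 largest eigenvalues
```

For this data SW is 1.5 times the identity, so the two leading eigenvalues come out as 48 and 16, and the columns of W span the plane that contains the three class means.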

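  Because LDA is used here as a binary classifier, it has to be extended to multi-class problems. Two common approaches are one-vs-all, which trains k classifiers (I am not sure how their outputs should be combined), and one-vs-one, which trains a classifier for every pair of classes and yields k(k-1)/2 binary classifiers. This post uses the latter to train on the samples and obtain the model. In the code, model is a struct array.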
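To make the k(k-1)/2 count concrete, the pairs can be enumerated directly. This is a small sketch with a hypothetical k = 4; the names pairs and numModels are only for illustration.

```matlab
k = 4;                       % hypothetical number of classes
pairs = nchoosek(1:k,2);     % each row is one (a,b) pair with a < b
numModels = size(pairs,1);   % equals k*(k-1)/2, i.e. 6 for k = 4
```

The rows come out in the same order as a nested loop over a = 1:k-1 and b = a+1:k, so row t corresponds to the t-th pairwise classifier.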

The training function, LDATraining.m:

function [model,k,ClassLabel] = LDATraining(input,target)
% input: n * d matrix, representing samples
% target: n * 1 matrix, class label
% model: struct array (see the code below)
% k: the total number of classes
% ClassLabel: the label of each class
model = struct;
[n, dim] = size(input);
ClassLabel = unique(target);
k = length(ClassLabel);

t = 1;
for i = 1:k-1
    for j = i+1:k
        % one binary classifier for the pair of classes (a,b)
        model(t).a = i;
        model(t).b = j;
        g1 = (target == ClassLabel(i));
        g2 = (target == ClassLabel(j));
        tmp1 = input(g1,:);
        tmp2 = input(g2,:);
        in = [tmp1;tmp2];
        % relabel the pair: 0 for class a, 1 for class b
        out = ones(size(in,1),1);
        out(1:size(tmp1,1)) = 0;
        % tmp3 = target(g1);
        % tmp4 = target(g2);
        % tmp3 = repmat(tmp3,length(tmp3),1);
        % tmp4 = repmat(tmp4,length(tmp4),1);
        % out = [tmp3;tmp4];
        [w, m] = LDA(in,out);
        model(t).W = w;       % d * 1 projection for this pair
        model(t).means = m;   % projected means: row 1 for class a, row 2 for class b
        t = t + 1;
    end
end

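  At prediction time, the models produced by training are used to make k(k-1)/2 pairwise predictions, and the class that is chosen most often becomes the final prediction. For each binary classifier, the test sample is projected with W and assigned to whichever of the two classes has the closer projected mean (I do not know whether there is a better way).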
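The voting step can be sketched on its own. The pairwise winners below are made up for illustration, and accumarray is just one convenient way to count them; LDATesting accumulates the same counts in its matrix s.

```matlab
% Hypothetical pairwise decisions for one sample with k = 3 classes
k = 3;
pairs   = [1 2; 1 3; 2 3];   % the (a,b) pairs, in training order
winners = [2; 3; 2];         % made-up winner of each pairwise classifier
votes = accumarray(winners, 1, [k 1])';   % number of votes per class
[maxVotes, pos] = max(votes);             % on a tie, max returns the lowest class index
```

Here class 2 wins with two of the three votes. With an odd or even number of classifiers ties are still possible, and both this sketch and the loop in LDATesting resolve them in favor of the lowest class index.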

The prediction function, LDATesting.m:

function target = LDATesting(input,k,model,ClassLabel)
% input: n * d matrix, representing samples
% target: n * 1 matrix, the predicted class labels
% model: struct array produced by LDATraining
% k: the total number of classes
% ClassLabel: the label of each class
[n, dim] = size(input);
s = zeros(n,k);          % s(i,c): number of votes sample i gives to class c
target = zeros(n,1);

for j = 1:k*(k-1)/2
    a = model(j).a;
    b = model(j).b;
    w = model(j).W;
    m = model(j).means;
    for i = 1:n
        sample = input(i,:);
        tmp = sample*w;
        % vote for the class whose projected mean is closer
        if norm(tmp - m(1,:)) < norm(tmp - m(2,:))
            s(i,a) = s(i,a) + 1;
        else
            s(i,b) = s(i,b) + 1;
        end
    end
end
for i = 1:n
    % pick the class with the most votes (the first one on a tie)
    pos = 1;
    maxV = 0;
    for j = 1:k
        if s(i,j) > maxV
            maxV = s(i,j);
            pos = j;
        end
    end
    target(i) = ClassLabel(pos);
end
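For a pairwise model the projection is one-dimensional, so the nearest-mean comparison used in LDATesting amounts to thresholding at the midpoint between the two projected means. The sketch below checks this on made-up numbers; z, nearA and thresholdA are names introduced only for this illustration.

```matlab
% Made-up 1-D projections and projected class means for one pairwise model
z = [-1.2; 0.1; 0.4; 2.0];   % projected samples (sample * w)
m = [0; 1];                  % projected means of class a and class b

nearA = abs(z - m(1)) < abs(z - m(2));          % nearest-mean rule
thresholdA = (z < mean(m)) == (m(1) < m(2));    % same rule as a midpoint threshold
```

Both vectors mark the first three samples as class a. Seen this way, one possible refinement is to move the threshold away from the midpoint, for example to account for unequal class sizes; whether that would help on a given dataset is not something this post tests.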

Example usage:

  
    
function target = test(in,out,t)
[model,k,ClassLabel] = LDATraining(in,out);
target = LDATesting(t,k,model,ClassLabel);

  I tested this on the USPS dataset and the results were poor: the accuracy was only about 39%, whereas KNN reaches about 90% on the same dataset. Embarrassing!
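An accuracy figure of this kind is the fraction of test samples whose predicted label matches the true label. A minimal sketch, with made-up label vectors standing in for the real test labels and for the output of test(...):

```matlab
truth = [1; 2; 3; 1; 2];          % made-up ground-truth labels
pred  = [1; 2; 1; 1; 3];          % made-up predictions
accuracy = mean(pred == truth);   % fraction of correct predictions, here 3 out of 5
```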
