function rate = KNN(Train_data, Train_label, Test_data, Test_label, k, Distance_mark)
% K-Nearest-Neighbor classifier (K-NN classifier)
%Input:
%     Train_data, Test_data are the training data set and the test data
%     set, respectively. (Each row is a data point.)
%     Train_label, Test_label are column vectors. They are the labels of
%     the training data set and the test data set, respectively.
%     k is the number of nearest neighbors.
%     Distance_mark           :   ['Euclidean', 'L2' | 'L1' | 'Cos']
%     'Cos' represents cosine distance.
%Output:
%     rate: Accuracy of the K-NN classifier
%
%    Examples:
%      
% %Classification problem with three classes
% A = rand(50,300);
% B = rand(50,300)+2;
% C = rand(50,300)+3;
% % label vector for the three classes
% gnd = [ones(300,1);2*ones(300,1);3*ones(300,1)];
% fea = [A B C]';
% trainIdx = [1:150,301:450,601:750]';
% testIdx = [151:300,451:600,751:900]';
% fea_Train = fea(trainIdx,:);
% gnd_Train = gnd(trainIdx);
% fea_Test = fea(testIdx,:);
% gnd_Test = gnd(testIdx);
% rate = KNN(fea_Train,gnd_Train,fea_Test,gnd_Test,1)
%
%
%
%Reference:
%
% If you use my MATLAB code, we would appreciate it very much if you could cite the following papers:
% Jie Gui, Tongliang Liu, Dacheng Tao, Zhenan Sun, Tieniu Tan, "Representative Vector Machines: A unified framework for classical classifiers", IEEE Transactions on Cybernetics (Accepted).
% Jie Gui et al., "Group sparse multiview patch alignment framework with view consistency for image classification", IEEE Transactions on Image Processing, vol. 23, no. 7, pp. 3126-3137, 2014.
% Jie Gui et al., "How to estimate the regularization parameter for spectral regression discriminant analysis and its kernel version?", IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 2, pp. 211-223, 2014.
% Jie Gui, Zhenan Sun, Wei Jia, Rongxiang Hu, Yingke Lei and Shuiwang Ji, "Discriminant Sparse Neighborhood Preserving Embedding for Face Recognition", Pattern Recognition, vol. 45, no. 8, pp. 2884-2893, 2012.
% Jie Gui, Wei Jia, Ling Zhu, Shuling Wang and Deshuang Huang, "Locality Preserving Discriminant Projections for Face and Palmprint Recognition", Neurocomputing, vol. 73, no. 13-15, pp. 2696-2707, 2010.
% Jie Gui et al., "Semi-supervised learning with local and global consistency", International Journal of Computer Mathematics (Accepted).
% Jie Gui, Shu-Lin Wang, and Ying-ke Lei, "Multi-step Dimensionality Reduction and Semi-Supervised Graph-Based Tumor Classification Using Gene Expression Data", Artificial Intelligence in Medicine, vol. 50, no. 3, pp. 181-191, 2010.
    
%This code was written by Gui Jie on the evening of 2009/03/11.
%If you find any bugs in the code, feel free to contact me.
if nargin < 5
    error('Not enough arguments!');
elseif nargin < 6
    Distance_mark = 'L2';
end
 
n         = size(Test_data, 1);  % number of test samples
train_num = size(Train_data, 1); % number of training samples
% Normalize each feature to have zero mean and unit variance.
% If you need the following four lines, you can uncomment them.
% M          = mean(Train_data); % mean & std of the training data set
% S          = std(Train_data);
% Train_data = (Train_data - ones(train_num, 1) * M) ./ (ones(train_num, 1) * S); % normalize training data set
% Test_data  = (Test_data - ones(n, 1) * M) ./ (ones(n, 1) * S); % normalize test data set
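% (Note: if the Statistics Toolbox is available, an equivalent alternative --
% an assumption on my part, not part of the original code -- would be
% [Train_data, M, S] = zscore(Train_data); and then normalizing Test_data
% with the same M and S exactly as in the commented line above.)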
U        = unique(Train_label); % class labels
nclasses = length(U);           % number of classes
Result   = zeros(n, 1);
Count    = zeros(nclasses, 1);
dist     = zeros(train_num, 1);
for i = 1:n
    % compute distances between the test point and all training points,
    % then sort them
    test = Test_data(i,:);
    for j = 1:train_num
        train = Train_data(j,:);
        V = test - train;
        switch Distance_mark
            case {'Euclidean', 'L2'}
                dist(j,1) = norm(V, 2); % Euclidean (L2) distance
            case 'L1'
                dist(j,1) = norm(V, 1); % L1 distance
            case 'Cos'
                % cosine distance: the angle (in [0, pi]) between the two vectors
                dist(j,1) = acos(test * train' / (norm(test, 2) * norm(train, 2)));
            otherwise
                dist(j,1) = norm(V, 2); % default: Euclidean distance
        end
    end
    [~, Inds] = sort(dist); % ascending, so Inds(1:k) index the k nearest neighbors
    % compute the class labels of the k nearest samples
    Count(:) = 0;
    for j = 1:k
        ind        = find(Train_label(Inds(j)) == U); % find the label of the j-th nearest neighbor
        Count(ind) = Count(ind) + 1;
    end % Count: how many of the k nearest neighbors fall in each class
    
    % determine the class of the data sample
    [~, ind]  = max(Count);
    Result(i) = U(ind); % map the winning index back to the actual class label
end
correctnumbers = sum(Result == Test_label);
rate = correctnumbers / n;






-------------------------------------------------- Above is the code ---------------------------------------------------------------------

The difference between cosine distance and cosine similarity

Cosine similarity is the basic measure most commonly used in traditional document classification to measure the distance between documents. It measures the distance between two d-dimensional vectors by the difference in their angle, giving a value between 0 and 1: the closer the angle between the two vectors, the closer the computed value is to 1; otherwise, it approaches 0. Suppose there are two points a = [a1, a2, ..., ad] and b = [b1, b2, ..., bd] in d-dimensional space; then their cosine similarity can be written as:
cosineSimilarity(a,b) = dot(a,b) / (norm(a)*norm(b))   [I think this should be called cosineSimilarity, not cosineDistance: the larger the similarity, the smaller the distance should be. For example, when the angle between a and b is 0, they are most similar, so the similarity is largest and the distance is smallest.]
dot(a,b) denotes the inner product of a and b. Since the inner product is defined as a·b = |a| × |b| × cosθ (in general, θ∈[0,π], http://baike.baidu.com/view/1485493.htm ), this quantity is not confined to 0 ~ 1 but lies between -1 and 1. There are two ways to turn it into a distance:
(1) My code above is correct: it uses acos to convert this cosine into an angle in [0, π]. The value need not be restricted to 0 ~ 1; my code converts it to [0, π], where a larger value means a larger distance;
(2) cosineDistance(a,b) = 1 - cosineSimilarity(a,b) = 1 - dot(a,b) / (norm(a)*norm(b)). The range of cosineDistance is then [0, 2].
Example:
a=[1 1 1]; b=[1 0 0];
cosineSimilarity = dot(a,b) / (norm(a)*norm(b))
cosineSimilarity =
    0.5774   [ http://neural.cs.nthu.edu.tw/jang/books/dcpr/doc/02%E8%B7%9D%E9%9B%A2%E8%88%87%E7%9B%B8%E4%BC%BC%E5%BA%A6.pdf , already saved on my computer: 距离与相似度.pdf]
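
Continuing the same example, the two distance variants described above can be computed as follows (a minimal sketch; the variable names cosSim, angularDist and cosDist are illustrative, not from the original code):

a = [1 1 1]; b = [1 0 0];
cosSim      = dot(a,b) / (norm(a)*norm(b)); % cosine similarity, approx. 0.5774
angularDist = acos(cosSim);                 % option (1): angle in [0, pi], approx. 0.9553 rad
cosDist     = 1 - cosSim;                   % option (2): 1 - similarity, in [0, 2], approx. 0.4226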

(1) My junior colleague Lin Zhu said that replacing the loops with a distance-matrix computation would save time, because loops in MATLAB are slow; but for large sample sets you still have to use loops, otherwise you run out of memory. I remember that jinsong he, who taught a course I took, also provided a KNN code, and his was likewise implemented with loops. MATLAB has the built-in function knnclassify; the code SPP_1NN.m from the paper "Sparsity preserving projections" uses that function. On ASLAN, my KNN and knnclassify give exactly the same recognition rate.
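
A minimal sketch of the distance-matrix idea mentioned above (my guess at what was suggested, not Lin Zhu's actual code), computing all pairwise squared Euclidean distances at once via ||x - y||^2 = ||x||^2 + ||y||^2 - 2*x*y':

D = bsxfun(@plus, sum(Test_data.^2, 2), sum(Train_data.^2, 2)') ...
    - 2 * Test_data * Train_data'; % n-by-train_num matrix of squared distances
[~, Inds] = sort(D, 2);            % Inds(i, 1:k) are the k nearest training indices for test point i

With the Statistics Toolbox, pdist2(Test_data, Train_data) computes the same pairwise distances directly.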
(2) EXTREMELY IMPORTANT NOTE: in the fourth-to-last line of the program, do not use Result(i) = ind; That is fine for data sets such as Yale whose labels are simply 1, 2, 3, ..., but it fails for binary classification with labels 1 and -1. SRC_QC and SRC_QC2 have the same issue: in the third-to-last line you must not use Result(i) = index, but Result(i) = classLabel(index); Originally only this one place was fixed; in fact, line 50 of SRC_QC2 and line 42 of SRC_QC also need ii changed to classLabel(ii). It was precisely this mistake that produced the wrong conclusion that SRC has a 50% error rate with zero variance on ASLAN. The corrected SRC_QC2 and SRC_QC programs are in the ASLAN directory.
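
A tiny sketch of why the mapping through U (or classLabel) matters, with hypothetical binary labels:

Train_label = [-1; -1; 1];  % binary labels
U = unique(Train_label);    % U = [-1; 1]
ind = 1;                    % suppose the class stored at U(1) wins the vote
% Wrong:  Result(i) = ind;    would record 1, but the winning label is -1
% Right:  Result(i) = U(ind); correctly records -1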