写了个程序 来选取SVM中参数 c和g的最佳值.
[写这个的目的是方便大家用这个小程序直接来寻找 c和g的最佳值,不用再另外编写东西了. ]
其实原本libsvm C语言版本中有相应的子程序可以找到最佳的c和g,需装载python语言然后用py 那个画图 就可以找到最佳的c和g,我写了个matlab版本的.算是弥补了libsvm在matlab版本下的空缺.
测试数据 还是我视频 里的wine data.
寻找最佳c和g的思想仍然是让c和g在一定的范围里跑(比如 c = 2^(-5),2^(-4),...,2^(5),g = 2^(-5),2^(-4),...,2^(5)),然后用cross validation的想法找到是的准确率最高的c和g,在这里我做了一点修改(纯粹是个人的一点小经验和想法),我改进的是: 因为会有不同的c和g都对应最高的的准确率,我把具有最小c的那组c和g认为是最佳的c和g,因为惩罚参数不能设置 太高,很高的惩罚参数能使得validation数据的准确率提高,但过高的惩罚参数c会造成过学习状态,反正从我用SVM到现在,往往都是惩罚参数c过高会导致最终测试集合的准确率并不是很理想..
在使用这个程序时也有小技巧,可以先大范围粗糙的找 比较理想的c和g,然后再细范围找更加理想的c和g.
比如首先让 c = 2^(-5),2^(-4),...,2^(5),g = 2^(-5),2^(-4),...,2^(5)在这个范围找比较理想的c和g,如图:
======
======
此时bestc = 0.5,bestg=1,bestacc = 98.8764[cross validation 的准确率]
最终测试集合的准确率 Accuracy = 96.6292% (86/89) (classification)
======
此时看到可以把c和g的范围缩小.还有步进的大小也可以缩小(程序里都有参数可以自己调节,也有默认值可不调节).
让 c = 2^(-2),2^(-1.5),...,2^(4),g = 2^(-4),2^(-3.5),...,2^(4)在这个范围找比较理想的c和g,如图:
=============
===============
此时bestc = 0.3536,bestg=0.7017,bestacc = 98.8764[cross validation 的准确率]
最终测试集合的准确率 Accuracy = 96.6292% (86/89) (classification)
===================
上面第二个的测试的代码
:
load wine_SVM;
train_wine = [wine(1:30,:);wine(60:95,:);wine(131:153,:)];
train_wine_labels = [wine_labels(1:30);wine_labels(60:95);wine_labels(131:153)];
test_wine = [wine(31:59,:);wine(96:130,:);wine(154:178,:)];
test_wine_labels = [wine_labels(31:59);wine_labels(96:130);wine_labels(154:178)];
[train_wine,pstrain] = mapminmax(train_wine');
pstrain.ymin = 0;
pstrain.ymax = 1;
[train_wine,pstrain] = mapminmax(train_wine,pstrain);
[test_wine,pstest] = mapminmax(test_wine');
pstest.ymin = 0;
pstest.ymax = 1;
[test_wine,pstest] = mapminmax(test_wine,pstest);
train_wine = train_wine';
test_wine = test_wine';
[bestacc,bestc,bestg] = SVMcg(train_wine_labels,train_wine,-2,4,-4,4,3,0.5,0.5,0.9);
cmd = ['-c ',num2str(bestc),' -g ',num2str(bestg)];
model = svmtrain(train_wine_labels,train_wine,cmd);
[pre,acc] = svmpredict(test_wine_labels,test_wine,model);
============我写的那个选取SVM中参数c和g的最佳值.的程序的代码 SVMcg.m====================
function [bestacc,bestc,bestg] = SVMcg(train_label,train,cmin,cmax,gmin,gmax,v,cstep,gstep,accstep)
%SVMcg cross validation by faruto
%Email:[email protected] QQ:516667408 http://blog.sina.com.cn/faruto BNU
%last modified 2009.8.23
%Super Moderator @ www.ilovematlab.cn
%% about the parameters of SVMcg
if nargin < 10
accstep = 1.5;
end
if nargin < 8
accstep = 1.5;
cstep = 1;
gstep = 1;
end
if nargin < 7
accstep = 1.5;
v = 3;
cstep = 1;
gstep = 1;
end
if nargin < 6
accstep = 1.5;
v = 3;
cstep = 1;
gstep = 1;
gmax = 5;
end
if nargin < 5
accstep = 1.5;
v = 3;
cstep = 1;
gstep = 1;
gmax = 5;
gmin = -5;
end
if nargin < 4
accstep = 1.5;
v = 3;
cstep = 1;
gstep = 1;
gmax = 5;
gmin = -5;
cmax = 5;
end
if nargin < 3
accstep = 1.5;
v = 3;
cstep = 1;
gstep = 1;
gmax = 5;
gmin = -5;
cmax = 5;
cmin = -5;
end
%% X:c Y:g cg:acc
[X,Y] = meshgrid(cmin:cstep:cmax,gmin:gstep:gmax);
[m,n] = size(X);
cg = zeros(m,n);
%% record acc with different c & g,and find the bestacc with the smallest c
bestc = 0;
bestg = 0;
bestacc = 0;
basenum = 2;
for i = 1:m
for j = 1:n
cmd = ['-v ',num2str(v),' -c ',num2str( basenum^X(i,j) ),' -g ',num2str( basenum^Y(i,j) )];
cg(i,j) = svmtrain(train_label, train, cmd);
if cg(i,j) > bestacc
bestacc = cg(i,j);
bestc = basenum^X(i,j);
bestg = basenum^Y(i,j);
end
if ( cg(i,j) == bestacc && bestc > basenum^X(i,j) )
bestacc = cg(i,j);
bestc = basenum^X(i,j);
bestg = basenum^Y(i,j);
end
end
end
%% to draw the acc with different c & g
[C,h] = contour(X,Y,cg,60:accstep:100);
clabel(C,h,'FontSize',10,'Color','r');
xlabel('log2c','FontSize',10);
ylabel('log2g','FontSize',10);
grid on;
=====================================
这样那个libsvm-matlab工具箱 我就有了自己的一个升级版本的了.大家可以把这个SVMcg.m加进去 一起用了...
里面有SVMcg.m使用说明.如下:
[bestacc,bestc,bestg] = SVMcg(train_label,train,cmin,cmax,gmin,gmax,v,cstep,gstep,accstep)
train_label:训练 集标签.要求与libsvm工具箱中要求一致.
train:训练集.要求与libsvm工具箱中要求一致.
cmin:惩罚参数c的变化范围的最小值(取以2为底的对数后),即 c_min = 2^(cmin).默认为 -5
cmax:惩罚参数c的变化范围的最大值(取以2为底的对数后),即 c_max = 2^(cmax).默认为 5
gmin:参数g的变化范围的最小值(取以2为底的对数后),即 g_min = 2^(gmin).默认为 -5
gmax:参数g的变化范围的最小值(取以2为底的对数后),即 g_min = 2^(gmax).默认为 5
v:cross validation的参数,即给测试集分为几部分进行cross validation.默认为 3
cstep:参数c步进的大小.默认为 1
gstep:参数g步进的大小.默认为 1
accstep:最后显示准确率图时的步进大小. 默认为 1.5
[上面这些参数大家可以更改以期达到最佳效果,也可不改用默认值]
====================