options:
-s svm_type : set type of SVM (default 0)
0 -- C-SVC
1 -- nu-SVC
2 -- one-class SVM
3 -- epsilon-SVR
4 -- nu-SVR
-t kernel_type : set type of kernel function
(default 2)
0 -- linear: u'*v
1 -- polynomial: (gamma*u'*v + coef0)^degree
2 -- radial basis function: exp(-gamma*|u-v|^2)
3 -- sigmoid: tanh(gamma*u'*v + coef0)
4 -- precomputed kernel (kernel values in training_set_file)
-d degree : set degree in kernel function (default 3)
-g gamma : set gamma in kernel function (default 1/num_features)
-r coef0 : set coef0 in kernel function (default 0)
-c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)
-n nu : set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)
-p epsilon : set the epsilon in loss function of epsilon-SVR (default 0.1)
-m cachesize : set cache memory size in MB (default 100)
-e epsilon : set tolerance of termination criterion (default 0.001)
-h shrinking : whether to use the shrinking heuristics, 0 or 1 (default 1)
-b probability_estimates : whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)
-wi weight : set the parameter C of class i to weight*C, for C-SVC (default 1)
-v n: n-fold cross validation mode
-q : quiet mode (no outputs)
>>> model = svm_train(prob, param)
>>> model = svm_load_model('model_file_name') #从保存的model中获取模型
from svm import *
from svmutil import *
#构造svm训练数据和svm参数(包括核函数和交叉验证)
data = svm_problem([1,-1] ,[[1,0,1] , [-1,0,-1]] ) #元组第一个列表表示分类种类,第二个列表表示数据
# param = svm_parameter(kernel_type = 'LINEAR' ,C = 10) #使用线性核函数,交叉验证用10
#对svm模型训练
'''
参数-t表示kernel函数,-c表示交叉验证用多少
-t kernel_type : set type of kernel function (default 2)
0 -- linear: u'*v
1 -- polynomial: (gamma*u'*v + coef0)^degree
2 -- radial basis function: exp(-gamma*|u-v|^2)
'''
param = svm_parameter('-c 10 -h 0') #默认选择RBF核函数
model = svm_train(data, param)
#测试
svm_predict([1],[[1,1,1]],model) #predict有三个参数,第一个参数是你预测的类型,第二个是你输入要预测的数据,最后一个参数就是训练模型
结果是:
optimization finished, #iter = 1
nu = 0.107467
obj = -1.074672, rho = 0.000000
nSV = 2, nBSV = 0
Total nSV = 2
Accuracy = 100% (1/1) (classification)
(5)另外有一定需要特别注意的是,书本上的这种写法已经不合适了:
>>> newrow=[28.0,-1,-1,26.0,-1,1,2,0.8] # Man doesn't want children, woman does
>>> m.predict(scalef(newrow))
可以更新为:
>>>newrow=[(28.0,-1,-1,26.0,-1,1,2,0.8)]
#注意里面多了一个元组符号'()'
>>>svm_predict([0]*len(newrow),newrow,
m) #注意m代表svm_train出来的模型,第一个参数的解释如下:
a list/tuple of l true labels (type must be int/double). It is used for calculating the accuracy. Use [0]*len(x) if true labels are unavailable. 即第一个参数表示你对newrow的预测值。
5.如果你下载了svm3.x版本,就需要详细看下载包里面的README文件,里面有提到各种函数的用法,但解释感觉不全面;
6.另外书中第9章还有一些错误如下:
def scaledata(rows):
low=[999999999.0]*len(rows[0].data)
high=[-999999999.0]*len(rows[0].data)
# Find the lowest and highest values
for row in rows:
d=row.data
for i in range(len(d)):
if d[i]<low[i]: low[i]=d[i]
if d[i]>high[i]: high[i]=d[i]
# Create a function that scales data
def scaleinput(d):
return [(d.data[i]-low[i])/(high[i]-low[i])for i in range(len(low))] #可能出错(1)(2)
# Scale all the data
newrows=[matchrow(scaleinput(row.data)+[row.match])for row in rows]
# Return the new data and the function
return newrows,scaleinput
可能出错(1):如果使用作者前面计算位置距离的函数milesdistance():
def milesdistance(a1,a2):
return 0
分母则会为0出错,我的做法如下:1.产生[0,1]随机数;2.分母另外加上0.000000001;但使用Yahoo来获取距离是可以的!
出错的地方(2):d.data[i]出错,应该更改为d[i]
还有附录B中计算点积的公式有误:
def veclength(a):
return sum([a[i] for i in range(len(a))])**.5
一个多维向量的模应该为a[i]**2而非a[i];