The code can be downloaded from GitHub: code download
The support vector machine (SVM), a case where theory preceded practice, was first proposed by Corinna Cortes and Vapnik in 1995 and has achieved very good results.
SVM problems split into the linearly separable case (further divided into hard margin and soft margin) and the linearly non-separable case; a non-separable problem can be mapped into a feature space via a kernel function, where it becomes linearly separable.
Model: $f(x) = \operatorname{sign}(w \cdot x + b)$
Functional margin (of a sample point): $\hat\gamma_i = y_i(w \cdot x_i + b)$
Functional margin (of the dataset $T$): $\hat\gamma = \min\limits_{i=1,\dots,N} \hat\gamma_i$
The functional margin expresses how confident the classification is.
Geometric margin: $\gamma_i = y_i\left(\frac{w}{\|w\|} \cdot x_i + \frac{b}{\|w\|}\right)$
The geometric margin here is in fact just the distance from a point to the separating line.
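To make the two notions concrete, here is a tiny sketch (the numbers are made up and not from the code later in this post) that computes both margins for a toy separating line:

import numpy as np

w = np.array([1.0, -1.0]) #a toy hyperplane w.x + b = 0
b = 0.5
X = np.array([[2.0, 0.0], [0.0, 2.0]]) #two sample points
y = np.array([1, -1]) #their labels

functional = y * (np.dot(X, w) + b) #functional margin of each point
geometric = functional / np.linalg.norm(w) #geometric margin = distance to the line
print(functional, geometric)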
From this we obtain
$\max\limits_{w,b} \frac{\hat\gamma}{\|w\|}$
$\text{s.t.}\quad y_i(w \cdot x_i + b) \ge \hat\gamma,\; i = 1,2,\dots,N$
Since rescaling $w$ and $b$ does not change the problem, we can fix $\hat\gamma = 1$, which gives the equivalent problem
$\min\limits_{w,b} \frac{1}{2}\|w\|^2$
$\text{s.t.}\quad y_i(w \cdot x_i + b) - 1 \ge 0,\; i = 1,2,\dots,N$
Construct the Lagrangian: $L(w,b,\alpha) = \frac{1}{2}\|w\|^2 - \sum\limits_{i=1}^N \alpha_i y_i(w \cdot x_i + b) + \sum\limits_{i=1}^N \alpha_i$
Setting the derivative with respect to $w$ to zero gives $w^* = \sum\limits_{i=1}^N \alpha_i y_i x_i$, and setting the derivative with respect to $b$ to zero gives $\sum\limits_{i=1}^N \alpha_i y_i = 0$.
Substituting $w^*$ into $y_j(w^* \cdot x_j + b^*) - 1 = 0$ (which holds for any support vector $x_j$)
yields $b^* = \frac{1}{y_j} - \sum\limits_{i=1}^N \alpha_i y_i (x_i \cdot x_j) = y_j - \sum\limits_{i=1}^N \alpha_i y_i (x_i \cdot x_j)$, since $y_j \in \{+1,-1\}$ implies $1/y_j = y_j$.
Our goal is $\max\limits_\alpha \min\limits_{w,b} L(w,b,\alpha)$,
which expressed in its dual form finally becomes
$\min\limits_\alpha \frac{1}{2}\sum\limits_{i=1}^N \sum\limits_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum\limits_{i=1}^N \alpha_i$
$\text{s.t.}\quad \sum\limits_{i=1}^N \alpha_i y_i = 0$
$\alpha_i \ge 0,\; i = 1,2,\dots,N$
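Before turning to SMO, note that this dual is an ordinary quadratic program, so for a small dataset it can be handed to an off-the-shelf QP solver. A minimal sketch, assuming the third-party cvxopt package is installed (the post's own code does not use it):

import numpy as np
from cvxopt import matrix, solvers

def dual_qp(X, y): #solve the hard-margin dual as a QP
    N = len(y)
    K = np.dot(X, np.transpose(X)) #Gram matrix (x_i . x_j)
    P = matrix(np.outer(y, y) * K) #P_ij = y_i y_j (x_i . x_j)
    q = matrix(-np.ones(N)) #the -sum(alpha_i) term
    G = matrix(-np.eye(N)) #-alpha_i <= 0, i.e. alpha_i >= 0
    h = matrix(np.zeros(N))
    A = matrix(y.astype(float)).T #1 x N row: sum(alpha_i y_i) = 0
    b = matrix(0.0)
    sol = solvers.qp(P, q, G, h, A, b)
    return np.ravel(sol['x']) #the optimal alphas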
But real data inevitably has flaws (for instance, data points that fall inside the margin). If we forbid any point from lying inside the margin, it can become impossible to find a reasonable margin to separate the data, which is where soft-margin maximization comes in. $C$ acts as a penalty parameter: a large $C$ increases the penalty on misclassification, and vice versa.
$\min\limits_\alpha \frac{1}{2}\sum\limits_{i=1}^N \sum\limits_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum\limits_{i=1}^N \alpha_i$
$\text{s.t.}\quad \sum\limits_{i=1}^N \alpha_i y_i = 0$
$0 \le \alpha_i \le C,\; i = 1,2,\dots,N$
As you can see, the soft-margin dual differs mainly in the range allowed for $\alpha$: each $\alpha_i$ is now bounded above by $C$.
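For a feel of how $C$ behaves, here is a small sketch using scikit-learn's SVC (scikit-learn is not used by the code in this post; the toy data is made up):

import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 2.5], [8.0, 1.0], [7.0, -1.0]])
y = np.array([-1, -1, 1, 1])

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    #a larger C punishes margin violations harder, shrinking the margin
    print(C, clf.coef_, clf.intercept_)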
At this stage we need to solve for $\alpha$ to get $w$; the usual choice for finding $\alpha$ is the SMO algorithm.
《统计学习方法》 (Statistical Learning Methods) gives a dedicated introduction to this part starting at Section 7.4, the sequential minimal optimization algorithm, on P124.
The algorithm's main purpose is to find $\alpha$; once we have $\alpha$ we can obtain $w$ and $b$.
SMO proceeds in two steps (P128 gives the details; here is a brief description). Step one: find the first $\alpha$, which can be any one that violates the KKT conditions, and optimize it. Step two: find the one that maximizes $\left|E_1 - E_2\right|$; this can be decided by inspecting $E_1$: if $E_1$ is positive, choose the smallest $E_i$ as $E_2$, and vice versa.
for i in range(N): #outer loop
    alpha1old = alpha[i].copy() #alpha before the update, i.e. alpha_old
    y1 = labels[i] #y1
    data1 = dataSet[i]
    g1 = self.calcG(alpha, labels, data1, dataSet, b)
    alphaIndex1 = -1 #index of an alpha violating the KKT conditions
    if alpha1old == 0: #check the KKT conditions (7.111)
        if y1 * g1 < 1:
            alphaIndex1 = i
            flag = True
    if alpha1old > 0 and alpha1old < C: #(7.112); in practice a tolerance is used instead of exact equality
        if y1 * g1 != 1:
            alphaIndex1 = i
            flag = True
    if alpha1old == C: #(7.113); the condition is y*g <= 1, so it is violated when y*g > 1
        if y1 * g1 > 1:
            alphaIndex1 = i
            flag = True
    if alphaIndex1 == -1: #the KKT conditions hold; continue the loop to find alpha1
        continue
    E1 = g1 - y1 #(7.105)
So first there is an outer loop that walks through every $\alpha$ to find the first one, $\alpha_1^{old}$. The book says to pick the sample point that violates the KKT conditions most severely; here we simplify and take any violator, and compute its $E_1$.
$E$ is computed as $E_i = g(x_i) - y_i = \left(\left(\sum\limits_{j=1}^N \alpha_j y_j K(x_j, x_i)\right) + b\right) - y_i$.
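As an aside, with a linear kernel all the $E_i$ can also be computed at once in vectorized form; a sketch equivalent to looping over calcG (the helper name calc_all_E is mine; alpha, y, X are assumed to be numpy arrays):

import numpy as np

def calc_all_E(alpha, y, X, b): #vectorized E_i for a linear kernel
    K = np.dot(X, np.transpose(X)) #Gram matrix of inner products (x_j . x_i)
    g = np.dot(K, alpha * y) + b #g(x_i) for every sample at once
    return g - y #E_i = g(x_i) - y_i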
for j in range(N): #inner loop
    if i != j: #j == i cannot be chosen
        yj = labels[j]
        gj = self.calcG(alpha, labels, dataSet[j], dataSet, b)
        Ej = gj - yj
        if E1 > 0: #we want the smallest E2
            if Ej < selectedE2:
                selectedE2 = Ej
                alphaIndex2 = j
        else:
            if Ej > selectedE2:
                selectedE2 = Ej
                alphaIndex2 = j
The code above selects the second multiplier, $\alpha_2^{old}$.
L = 0 #see the last two paragraphs of P126
H = 0
y2 = labels[alphaIndex2]
alpha2old = alpha[alphaIndex2].copy()
data2 = dataSet[alphaIndex2]
E2 = selectedE2
if y1 == y2: #alpha2 must stay within [L, H]
    L = max(0, alpha2old + alpha1old - C)
    H = min(C, alpha2old + alpha1old)
else:
    L = max(0, alpha2old - alpha1old)
    H = min(C, C + alpha2old - alpha1old)
eta = self.calcK(data1, data1) + self.calcK(data2, data2) - 2 * self.calcK(data1, data2) #(7.107)
if eta <= 0: #guard against division by zero for degenerate pairs
    continue
alpha2new = alpha2old + y2 * (E1 - E2) / eta #(7.106), unclipped
if alpha2new > H: #(7.108)
    alpha[alphaIndex2] = H
elif alpha2new < L:
    alpha[alphaIndex2] = L
else:
    alpha[alphaIndex2] = alpha2new
alpha2new = alpha[alphaIndex2] #use the clipped value from here on
alpha1new = alpha1old + y1 * y2 * (alpha2old - alpha2new) #(7.109)
alpha[alphaIndex1] = alpha1new
OK, with the two $\alpha$s in hand, formula (7.106), $\alpha_2^{new,unc} = \alpha_2^{old} + \frac{y_2(E_1 - E_2)}{\eta}$, where $\eta = K_{11} + K_{22} - 2K_{12}$ (7.107), gives an unclipped candidate $\alpha$; because $\alpha$ is constrained to a range, this candidate still needs to be processed.
The processing rule is (7.108): $\alpha_2^{new} = \begin{cases} H, & \alpha_2^{new,unc} > H \\ \alpha_2^{new,unc}, & L \le \alpha_2^{new,unc} \le H \\ L, & \alpha_2^{new,unc} < L \end{cases}$
With $\alpha_2^{new}$ in hand, $\alpha_1^{new} = \alpha_1^{old} + y_1 y_2\left(\alpha_2^{old} - \alpha_2^{new}\right)$ (7.109).
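Putting (7.106) through (7.109) together, a standalone sketch of the pair update (the function name and argument names are mine, not from the class in this post):

import numpy as np

def update_pair(a1_old, a2_old, y1, y2, E1, E2, K11, K22, K12, L, H):
    eta = K11 + K22 - 2 * K12 #(7.107)
    a2_unc = a2_old + y2 * (E1 - E2) / eta #(7.106), unclipped
    a2_new = float(np.clip(a2_unc, L, H)) #(7.108), clip to the box [L, H]
    a1_new = a1_old + y1 * y2 * (a2_old - a2_new) #(7.109)
    return a1_new, a2_new
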
b1new = -E1 - y1 * self.calcK(data1, data1) * (alpha1new - alpha1old) - y2 * self.calcK(data2, data1) * (alpha2new - alpha2old) + b #(7.115)
b2new = -E2 - y1 * self.calcK(data1, data2) * (alpha1new - alpha1old) - y2 * self.calcK(data2, data2) * (alpha2new - alpha2old) + b #(7.116)
if alpha1new > 0 and alpha1new < C:
    b = b1new
elif alpha2new > 0 and alpha2new < C:
    b = b2new
else:
    b = (b1new + b2new) / 2
Meanwhile, since $b$ is needed during the computation, it has to be updated along the way.
The formulas are (7.115) and (7.116) on P130:
$b_1^{new} = -E_1 - y_1 K_{11}\left(\alpha_1^{new} - \alpha_1^{old}\right) - y_2 K_{21}\left(\alpha_2^{new} - \alpha_2^{old}\right) + b^{old}$
$b_2^{new} = -E_2 - y_1 K_{12}\left(\alpha_1^{new} - \alpha_1^{old}\right) - y_2 K_{22}\left(\alpha_2^{new} - \alpha_2^{old}\right) + b^{old}$
Then, if $\alpha_1^{new}$ and $\alpha_2^{new}$ both satisfy $0 < \alpha^{new} < C$, we have $b_1^{new} = b_2^{new}$ and either can serve as $b$; if neither does, take the midpoint of the two values.
weights = np.dot(np.multiply(alpha, labels), dataSet) #weights
At this point all the $\alpha$s have been found, and we obtain $w$ from $w^* = \sum\limits_{i=1}^N \alpha_i y_i x_i$; $b$ is simply the most recent $b^{new}$.
A small aside: $b$ can also be recovered, after each pair of $\alpha$s is obtained, via $b^* = y_j - \sum\limits_{i=1}^N \alpha_i y_i (x_i \cdot x_j)$ for any support vector $x_j$, as sketched below.
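A sketch of that aside (recover_b is my helper name, not part of the original code; it assumes numpy arrays and that at least one multiplier lies strictly between 0 and C):

import numpy as np

def recover_b(alpha, y, X, C):
    j = next(i for i in range(len(alpha)) if 0 < alpha[i] < C) #any support vector index
    return y[j] - np.sum(alpha * y * np.dot(X, X[j])) #b* = y_j - sum(alpha_i y_i x_i.x_j)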
This code runs under Python 3.
import numpy as np

def loadDataSet(): #load the data file
    data = list()
    labels = list()
    with open('testSet.txt') as f:
        lines = f.readlines()
        for line in lines:
            line = line.rstrip().split('\t')
            data.append([float(line[0]), float(line[1])])
            labels.append(float(line[-1]))
    return data, labels

def sign(x): #sign function
    if x >= 0:
        return 1
    else:
        return -1
class SVM:
    def train(self, dataSet, labels): #train and return the weights and the bias
        b = 0 #bias
        C = 1 #penalty parameter
        flag = True #tracks whether any point still violates the KKT conditions
        maxIter = 100 #maximum number of iterations
        iter = 0
        N = len(dataSet) #number of rows in the data
        M = len(dataSet[0]) #number of columns, i.e. the dimension
        alpha = np.zeros(N)
        while iter < maxIter:
            print(iter)
            iter += 1
            flag = False
            for i in range(N): #outer loop
                alpha1old = alpha[i].copy() #alpha before the update, i.e. alpha_old
                y1 = labels[i] #y1
                data1 = dataSet[i]
                g1 = self.calcG(alpha, labels, data1, dataSet, b)
                alphaIndex1 = -1 #index of an alpha violating the KKT conditions
                if alpha1old == 0: #check the KKT conditions (7.111)
                    if y1 * g1 < 1:
                        alphaIndex1 = i
                        flag = True
                if alpha1old > 0 and alpha1old < C: #(7.112); in practice a tolerance is used instead of exact equality
                    if y1 * g1 != 1:
                        alphaIndex1 = i
                        flag = True
                if alpha1old == C: #(7.113); the condition is y*g <= 1, so it is violated when y*g > 1
                    if y1 * g1 > 1:
                        alphaIndex1 = i
                        flag = True
                if alphaIndex1 == -1: #the KKT conditions hold; continue the loop to find alpha1
                    continue
                E1 = g1 - y1 #(7.105)
                alphaIndex2 = -1
                if E1 > 0: #if positive, look for the smallest E2, and vice versa
                    selectedE2 = np.inf
                else:
                    selectedE2 = -np.inf
                for j in range(N): #inner loop
                    if i != j: #j == i cannot be chosen
                        yj = labels[j]
                        gj = self.calcG(alpha, labels, dataSet[j], dataSet, b)
                        Ej = gj - yj
                        if E1 > 0: #we want the smallest E2
                            if Ej < selectedE2:
                                selectedE2 = Ej
                                alphaIndex2 = j
                        else:
                            if Ej > selectedE2:
                                selectedE2 = Ej
                                alphaIndex2 = j
                '''
                alpha2 has been selected at this point
                '''
                L = 0 #see the last two paragraphs of P126
                H = 0
                y2 = labels[alphaIndex2]
                alpha2old = alpha[alphaIndex2].copy()
                data2 = dataSet[alphaIndex2]
                E2 = selectedE2
                if y1 == y2: #alpha2 must stay within [L, H]
                    L = max(0, alpha2old + alpha1old - C)
                    H = min(C, alpha2old + alpha1old)
                else:
                    L = max(0, alpha2old - alpha1old)
                    H = min(C, C + alpha2old - alpha1old)
                eta = self.calcK(data1, data1) + self.calcK(data2, data2) - 2 * self.calcK(data1, data2) #(7.107)
                if eta <= 0: #guard against division by zero for degenerate pairs
                    continue
                alpha2new = alpha2old + y2 * (E1 - E2) / eta #(7.106), unclipped
                if alpha2new > H: #(7.108)
                    alpha[alphaIndex2] = H
                elif alpha2new < L:
                    alpha[alphaIndex2] = L
                else:
                    alpha[alphaIndex2] = alpha2new
                alpha2new = alpha[alphaIndex2] #use the clipped value from here on
                alpha1new = alpha1old + y1 * y2 * (alpha2old - alpha2new) #(7.109)
                alpha[alphaIndex1] = alpha1new
                b1new = -E1 - y1 * self.calcK(data1, data1) * (alpha1new - alpha1old) - y2 * self.calcK(data2, data1) * (alpha2new - alpha2old) + b #(7.115)
                b2new = -E2 - y1 * self.calcK(data1, data2) * (alpha1new - alpha1old) - y2 * self.calcK(data2, data2) * (alpha2new - alpha2old) + b #(7.116)
                if alpha1new > 0 and alpha1new < C:
                    b = b1new
                elif alpha2new > 0 and alpha2new < C:
                    b = b2new
                else:
                    b = (b1new + b2new) / 2
            if not flag: #every alpha satisfies the KKT conditions; stop early
                break
        print(alpha)
        weights = np.dot(np.multiply(alpha, labels), dataSet) #weights
        return weights, b
    def calcK(self, data1, data2): #linear kernel: returns the inner product
        return np.dot(data1, data2)

    def calcG(self, alpha, labels, data, dataSet, b): #compute g(x)
        sum = 0
        for j in range(len(alpha)):
            sum += alpha[j] * labels[j] * self.calcK(data, dataSet[j]) #the sum in g(x)
        return sum + b
if __name__ == '__main__':
    dataSet, labels = loadDataSet()
    svm = SVM()
    weights, b = svm.train(dataSet, labels)
    print(weights, b)
    x = [1, 2]
    f = sign(np.dot(weights, x) + b)
    print(f)
Below is the dataset; save it as testSet.txt in the same directory.
3.542485 1.977398 -1
3.018896 2.556416 -1
7.551510 -1.580030 1
2.114999 -0.004466 -1
8.127113 1.274372 1
7.108772 -0.986906 1
8.610639 2.046708 1
2.326297 0.265213 -1
3.634009 1.730537 -1
0.341367 -0.894998 -1
3.125951 0.293251 -1
2.123252 -0.783563 -1
0.887835 -2.797792 -1
7.139979 -2.329896 1
1.696414 -1.212496 -1
8.117032 0.623493 1
8.497162 -0.266649 1
4.658191 3.507396 -1
8.197181 1.545132 1
1.208047 0.213100 -1
1.928486 -0.321870 -1
2.175808 -0.014527 -1
7.886608 0.461755 1
3.223038 -0.552392 -1
3.628502 2.190585 -1
7.407860 -0.121961 1
7.286357 0.251077 1
2.301095 -0.533988 -1
-0.232542 -0.547690 -1
3.457096 -0.082216 -1
3.023938 -0.057392 -1
8.015003 0.885325 1
8.991748 0.923154 1
7.916831 -1.781735 1
7.616862 -0.217958 1
2.450939 0.744967 -1
7.270337 -2.507834 1
1.749721 -0.961902 -1
1.803111 -0.176349 -1
8.804461 3.044301 1
1.231257 -0.568573 -1
2.074915 1.410550 -1
-0.743036 -1.736103 -1
3.536555 3.964960 -1
8.410143 0.025606 1
7.382988 -0.478764 1
6.960661 -0.245353 1
8.234460 0.701868 1
8.168618 -0.903835 1
1.534187 -0.622492 -1
9.229518 2.066088 1
7.886242 0.191813 1
2.893743 -1.643468 -1
1.870457 -1.040420 -1
5.286862 -2.358286 1
6.080573 0.418886 1
2.544314 1.714165 -1
6.016004 -3.753712 1
0.926310 -0.564359 -1
0.870296 -0.109952 -1
2.369345 1.375695 -1
1.363782 -0.254082 -1
7.279460 -0.189572 1
1.896005 0.515080 -1
8.102154 -0.603875 1
2.529893 0.662657 -1
1.963874 -0.365233 -1
8.132048 0.785914 1
8.245938 0.372366 1
6.543888 0.433164 1
-0.236713 -5.766721 -1
8.112593 0.295839 1
9.803425 1.495167 1
1.497407 -0.552916 -1
1.336267 -1.632889 -1
9.205805 -0.586480 1
1.966279 -1.840439 -1
8.398012 1.584918 1
7.239953 -1.764292 1
7.556201 0.241185 1
9.015509 0.345019 1
8.266085 -0.230977 1
8.545620 2.788799 1
9.295969 1.346332 1
2.404234 0.570278 -1
2.037772 0.021919 -1
1.727631 -0.453143 -1
1.979395 -0.050773 -1
8.092288 -1.372433 1
1.667645 0.239204 -1
9.854303 1.365116 1
7.921057 -1.327587 1
8.500757 1.492372 1
1.339746 -0.291183 -1
3.107511 0.758367 -1
2.609525 0.902979 -1
3.263585 1.367898 -1
2.912122 -0.202359 -1
1.731786 0.589096 -1
2.387003 1.573131 -1