【统计学习方法】Support Vector Machine (SVM): A Python Implementation

Preface

The code can be downloaded from GitHub.
The support vector machine is a case of theory preceding practice: it was first proposed by Corinna Cortes and Vapnik in 1995 and has achieved excellent results.
SVM problems split into the linearly separable case (itself divided into hard margin and soft margin) and the linearly non-separable case; the latter can be handled by a kernel function that maps the data into a feature space where it becomes linearly separable, as sketched below.
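As a taste of what a kernel looks like in code (an illustrative addition; the implementation later in this post uses a plain linear kernel), a Gaussian (RBF) kernel fits in a few lines and could be swapped in for the linear calcK used below:

import numpy as np

def rbf_kernel(x1, x2, gamma=0.5):   # hypothetical helper, not used in the implementation below
    # K(x1, x2) = exp(-gamma * ||x1 - x2||^2), a common kernel for non-separable data
    diff = np.asarray(x1) - np.asarray(x2)
    return np.exp(-gamma * np.dot(diff, diff))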

Linearly Separable Support Vector Machine

Model: $f(x) = \operatorname{sign}(w \cdot x + b)$
Functional margin (of a sample point): $\hat\gamma_i = y_i(w \cdot x_i + b)$
Functional margin (of the training set $T$): $\hat\gamma = \min_{i=1,\dots,N} \hat\gamma_i$
The functional margin expresses how confident a classification is.
Geometric margin: $\gamma_i = y_i\left(\frac{w}{\|w\|} \cdot x_i + \frac{b}{\|w\|}\right)$
The geometric margin here is simply the distance from the point to the separating hyperplane.
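A quick numeric illustration (an addition; the hyperplane and sample below are made up):

import numpy as np

w, b = np.array([1.0, -1.0]), 0.5            # hypothetical hyperplane parameters
x, y = np.array([3.0, 1.0]), 1               # hypothetical sample point and label
functional = y * (np.dot(w, x) + b)          # functional margin: 2.5
geometric = functional / np.linalg.norm(w)   # geometric margin ~1.77, the point-to-hyperplane distance
print(functional, geometric)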
From this we obtain $\max\limits_{w,b} \frac{\hat\gamma}{\|w\|}$
$\text{s.t.}\ y_i(w \cdot x_i + b) \ge \hat\gamma,\ i = 1,2,\dots,N$
Since rescaling $w$ and $b$ leaves the problem unchanged, we may fix $\hat\gamma = 1$, which makes this equivalent to
$\min\limits_{w,b} \frac{1}{2}\|w\|^2$
$\text{s.t.}\ y_i(w \cdot x_i + b) - 1 \ge 0,\ i = 1,2,\dots,N$
Construct the Lagrangian: $L(w,b,\alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^N \alpha_i y_i(w \cdot x_i + b) + \sum_{i=1}^N \alpha_i$
Setting the derivatives with respect to $w$ and $b$ to zero gives $w^* = \sum_{i=1}^N \alpha_i y_i x_i$
Substituting $w^*$ into $y_j(w^* \cdot x_j + b^*) - 1 = 0$, which holds for any support vector $x_j$,
gives $b^* = \frac{1}{y_j} - \sum_{i=1}^N \alpha_i y_i (x_i \cdot x_j) = y_j - \sum_{i=1}^N \alpha_i y_i (x_i \cdot x_j)$, since $y_j \in \{-1, +1\}$ implies $\frac{1}{y_j} = y_j$.
Our goal is $\max\limits_\alpha \min\limits_{w,b} L(w,b,\alpha)$
Rewriting this in its dual representation, the final dual form is
$\min\limits_\alpha \frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^N \alpha_i$
$\text{s.t.}\ \sum_{i=1}^N \alpha_i y_i = 0$
$\alpha_i \ge 0,\ i = 1,2,\dots,N$
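To make the dual concrete, here is a minimal sketch (an addition, not from the original post) that solves it for a hand-made three-point dataset with scipy's SLSQP solver:

import numpy as np
from scipy.optimize import minimize

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])   # toy points, made up for illustration
y = np.array([1.0, 1.0, -1.0])
Q = (y[:, None] * X) @ (y[:, None] * X).T            # Q[i][j] = y_i y_j (x_i . x_j)

def dual(a):    # (1/2) a^T Q a - sum(a)
    return 0.5 * a @ Q @ a - a.sum()

res = minimize(dual, np.zeros(len(y)),
               bounds=[(0, None)] * len(y),                         # alpha_i >= 0
               constraints={'type': 'eq', 'fun': lambda a: a @ y})  # sum_i alpha_i y_i = 0
w = (res.x * y) @ X                                  # w* = sum_i alpha_i y_i x_i
j = res.x.argmax()                                   # index of one support vector
b = y[j] - np.sum(res.x * y * (X @ X[j]))            # b* from the formula above
print(w, b)    # roughly w = [0.5, 0.5], b = -2 for this toy set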
Real data is rarely this clean, however: some points may fall inside the margin. If we forbid any point from lying within the margin, a workable separating margin can be impossible to find; hence soft-margin maximization, with $C$ as a penalty parameter. A larger $C$ penalizes misclassification more heavily, and a smaller $C$ less so.
$\min\limits_\alpha \frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^N \alpha_i$
$\text{s.t.}\ \sum_{i=1}^N \alpha_i y_i = 0$
$0 \le \alpha_i \le C,\ i = 1,2,\dots,N$
As you can see, the soft-margin dual differs mainly in the range allowed for $\alpha$, which is now bounded above by $C$.
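For intuition about $C$ (a side note; the implementation below always uses $C = 1$), scikit-learn's SVC exposes the same penalty parameter, and with a linear kernel you can watch the margin shrink as $C$ grows:

import numpy as np
from sklearn.svm import SVC

# Made-up, slightly overlapping toy data.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.55], [0.5, 0.45]])
y = np.array([-1, -1, 1, 1, -1, 1])
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    width = 2 / np.linalg.norm(clf.coef_)    # margin width is 2 / ||w||
    print(C, width, clf.n_support_.sum())    # larger C: narrower margin, fewer violations tolerated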
At this stage we need to solve for $\alpha$ in order to obtain $w$, and the usual way to do that is the SMO algorithm.
Section 7.4 of《统计学习方法》(P124), on sequential minimal optimization, is devoted entirely to this part.
The algorithm's whole purpose is to find $\alpha$; once we have $\alpha$ we can compute $w$ and $b$.
SMO works in two steps (P128 has the details; here is a short description). Step one: find the first variable $\alpha_1$ by locating one that violates the KKT conditions, and optimize it. Step two: find the value that maximizes $|E_1 - E_2|$. This can be decided by inspecting $E_1$: if $E_1$ is positive, choose the smallest $E_i$ as $E_2$, and vice versa.

            for i in range(N):  # outer loop
                alpha1old = alpha[i].copy()   # alpha before the update, i.e. alpha_old
                y1 = labels[i]
                data1 = dataSet[i]
                g1 = self.calcG(alpha, labels, data1, dataSet, b)
                alphaIndex1 = -1    # index of an alpha that violates the KKT conditions
                if alpha1old == 0:   # KKT check (7.111)
                    if y1 * g1 < 1:
                        alphaIndex1 = i
                        flag = True
                if alpha1old > 0 and alpha[i] < C:   # (7.112)
                    if y1 * g1 != 1:
                        alphaIndex1 = i
                        flag = True
                if alpha1old == C:   # (7.113)
                    if y1 * g1 <= 1:
                        alphaIndex1 = i
                        flag = True
                if alphaIndex1 == -1:   # KKT conditions hold, so keep looking for alpha1
                    continue
                E1 = g1 - y1    # (7.105)

The outer loop walks over every $\alpha$ to find the first variable $\alpha_1^{old}$. The book says to select the sample point that violates the KKT conditions most severely; here we simplify and take any violator, and also compute $E_1$.
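For reference, the KKT conditions being checked, (7.111) through (7.113), are: $\alpha_i = 0 \Leftrightarrow y_i g(x_i) \ge 1$; $0 < \alpha_i < C \Leftrightarrow y_i g(x_i) = 1$; and $\alpha_i = C \Leftrightarrow y_i g(x_i) \le 1$.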
$E$ is computed as $E_i = g(x_i) - y_i = \left(\sum_{j=1}^N \alpha_j y_j K(x_j, x_i) + b\right) - y_i$

                for j in range(N): # inner loop
                    if i != j:  # j must differ from i
                        yj = labels[j]
                        gj = self.calcG(alpha, labels, dataSet[j], dataSet, b)
                        Ej = gj - yj
                        if E1 > 0:  # positive E1, so pick the smallest Ej as E2
                            if Ej < selectedE2:
                                selectedE2 = Ej
                                alphaIndex2 = j
                        else:       # negative E1, so pick the largest Ej
                            if Ej > selectedE2:
                                selectedE2 = Ej
                                alphaIndex2 = j

The code above selects the second variable, $\alpha_2^{old}$.

                y2 = labels[alphaIndex2]
                alpha2old = alpha[alphaIndex2].copy()
                data2 = dataSet[alphaIndex2]
                E2 = selectedE2
                if y1 == y2:    # alpha2 must stay within [L, H]; see the last two paragraphs of P126
                    L = max(0, alpha2old + alpha1old - C)
                    H = min(C, alpha2old + alpha1old)
                else:
                    L = max(0, alpha2old - alpha1old)
                    H = min(C, C + alpha2old - alpha1old)
                eta = self.calcK(data1, data1) + self.calcK(data2, data2) \
                      - 2 * self.calcK(data1, data2)    # eta = K11 + K22 - 2 * K12  (7.107)
                alpha2new = alpha2old + y2 * (E1 - E2) / eta    # unclipped solution (7.106)
                if alpha2new > H:   # clip to [L, H]  (7.108)
                    alpha2new = H
                elif alpha2new < L:
                    alpha2new = L
                alpha[alphaIndex2] = alpha2new
                alpha1new = alpha1old + y1 * y2 * (alpha2old - alpha2new)   # (7.109)
                alpha[alphaIndex1] = alpha1new

OK, with the two $\alpha$'s chosen, formula (7.106), $\alpha_2^{new,unc} = \alpha_2^{old} + \frac{y_2(E_1 - E_2)}{\eta}$ with $\eta = K_{11} + K_{22} - 2K_{12}$ (7.107), produces an unprocessed candidate $\alpha$. Because this $\alpha$ has a restricted range, it still has to be clipped.
The clipping rule (7.108) is: $\alpha_2^{new} = \begin{cases} H, & \alpha_2^{new,unc} > H \\ \alpha_2^{new,unc}, & L \le \alpha_2^{new,unc} \le H \\ L, & \alpha_2^{new,unc} < L \end{cases}$
With $\alpha_2^{new}$ in hand, $\alpha_1^{new} = \alpha_1^{old} + y_1 y_2\left(\alpha_2^{old} - \alpha_2^{new}\right)$ (7.109).
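The clipping in (7.108) is just a projection onto the interval $[L, H]$; a one-line helper (hypothetical, not part of the original code) makes that explicit:

def clip_alpha(alpha_unc, L, H):
    # (7.108): project the unconstrained solution back onto the box [L, H]
    return min(max(alpha_unc, L), H)

print(clip_alpha(1.7, 0.0, 1.0))   # 1.0, clipped down to H
print(clip_alpha(0.4, 0.0, 1.0))   # 0.4, already inside [L, H]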

                b1new = -E1 - y1 * self.calcK(data1, data1) * (alpha1new - alpha1old) \
                        - y2 * self.calcK(data2, data1) * (alpha2new - alpha2old) + b   # (7.115)
                b2new = -E2 - y1 * self.calcK(data1, data2) * (alpha1new - alpha1old) \
                        - y2 * self.calcK(data2, data2) * (alpha2new - alpha2old) + b   # (7.116)
                if 0 < alpha1new < C:
                    b = b1new
                elif 0 < alpha2new < C:
                    b = b2new
                else:
                    b = (b1new + b2new) / 2

Meanwhile, because $b$ is needed during the computation, it must be updated along the way.
The formulas are (7.115) and (7.116) on P130: $b_1^{new} = -E_1 - y_1 K_{11}\left(\alpha_1^{new} - \alpha_1^{old}\right) - y_2 K_{21}\left(\alpha_2^{new} - \alpha_2^{old}\right) + b^{old}$ and $b_2^{new} = -E_2 - y_1 K_{12}\left(\alpha_1^{new} - \alpha_1^{old}\right) - y_2 K_{22}\left(\alpha_2^{new} - \alpha_2^{old}\right) + b^{old}$.
If $\alpha_1^{new}$ and $\alpha_2^{new}$ both satisfy $0 < \alpha^{new} < C$, then $b_1^{new} = b_2^{new}$ and either one can serve as $b$; otherwise, take the midpoint of the two.

weights = np.dot(np.multiply(alpha, labels), dataSet)   # weights

At this point every $\alpha$ has been found, and $w^* = \sum_{i=1}^N \alpha_i y_i x_i$ gives us $w$; $b$ is simply the final $b^{new}$.
A small aside: $b$ can also be recomputed after each pair of $\alpha$ updates via $b^* = y_j - \sum_{i=1}^N \alpha_i y_i (x_i \cdot x_j)$, taking any support vector as $x_j$.
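A sketch of that alternative (hypothetical, not part of the implementation below), assuming alpha, labels, and dataSet are numpy arrays:

import numpy as np

def bias_from_support_vector(alpha, labels, dataSet, C=1.0):
    # b* = y_j - sum_i alpha_i y_i (x_i . x_j), for any j with 0 < alpha_j < C
    j = np.where((alpha > 0) & (alpha < C))[0][0]   # index of one support vector
    return labels[j] - np.sum(alpha * labels * np.dot(dataSet, dataSet[j]))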
The following code runs under Python 3.

import numpy as np

def loadDataSet():  # load the dataset from a file
    data = list()
    labels = list()
    with open('testSet.txt') as f:
        lines = f.readlines()
        for line in lines:
            line = line.rstrip().split('\t')
            data.append([float(line[0]), float(line[1])])
            labels.append(float(line[-1]))
    return data, labels

def sign(x):    # sign function
    if x >= 0:
        return 1
    else:
        return -1

class SVM:
    def train(self, dataSet, labels):   # train, returning the weights and the bias
        b = 0   # bias
        C = 1   # penalty parameter
        flag = True    # records whether any point still violates the KKT conditions
        maxIter = 100  # maximum number of iterations
        iter = 0
        N = len(dataSet)   # number of rows (samples)
        M = len(dataSet[0])    # number of columns (dimensions)
        alpha = np.zeros(N)
        while iter < maxIter:
            print(iter)
            iter += 1
            flag = False
            for i in range(N):  # outer loop
                alpha1old = alpha[i].copy()   # alpha before the update, i.e. alpha_old
                y1 = labels[i]
                data1 = dataSet[i]
                g1 = self.calcG(alpha, labels, data1, dataSet, b)
                alphaIndex1 = -1    # index of an alpha that violates the KKT conditions
                if alpha1old == 0:   # KKT check (7.111)
                    if y1 * g1 < 1:
                        alphaIndex1 = i
                        flag = True
                if alpha1old > 0 and alpha[i] < C:   # (7.112)
                    if y1 * g1 != 1:
                        alphaIndex1 = i
                        flag = True
                if alpha1old == C:   # (7.113)
                    if y1 * g1 <= 1:
                        alphaIndex1 = i
                        flag = True
                if alphaIndex1 == -1:   # KKT conditions hold, so keep looking for alpha1
                    continue
                E1 = g1 - y1    # (7.105)
                alphaIndex2 = -1
                if E1 > 0: # positive E1: look for the smallest E2; negative: the largest
                    selectedE2 = np.inf
                else:
                    selectedE2 = -np.inf
                for j in range(N): # inner loop
                    if i != j:  # j must differ from i
                        yj = labels[j]
                        gj = self.calcG(alpha, labels, dataSet[j], dataSet, b)
                        Ej = gj - yj
                        if E1 > 0:  # positive E1, so pick the smallest Ej as E2
                            if Ej < selectedE2:
                                selectedE2 = Ej
                                alphaIndex2 = j
                        else:       # negative E1, so pick the largest Ej
                            if Ej > selectedE2:
                                selectedE2 = Ej
                                alphaIndex2 = j
                # alpha2 has now been selected
                y2 = labels[alphaIndex2]
                alpha2old = alpha[alphaIndex2].copy()
                data2 = dataSet[alphaIndex2]
                E2 = selectedE2
                if y1 == y2:    # alpha2 must stay within [L, H]; see the last two paragraphs of P126
                    L = max(0, alpha2old + alpha1old - C)
                    H = min(C, alpha2old + alpha1old)
                else:
                    L = max(0, alpha2old - alpha1old)
                    H = min(C, C + alpha2old - alpha1old)
                eta = self.calcK(data1, data1) + self.calcK(data2, data2) \
                      - 2 * self.calcK(data1, data2)    # eta = K11 + K22 - 2 * K12  (7.107)
                alpha2new = alpha2old + y2 * (E1 - E2) / eta    # unclipped solution (7.106)
                if alpha2new > H:   # clip to [L, H]  (7.108)
                    alpha2new = H
                elif alpha2new < L:
                    alpha2new = L
                alpha[alphaIndex2] = alpha2new
                alpha1new = alpha1old + y1 * y2 * (alpha2old - alpha2new)   # (7.109)
                alpha[alphaIndex1] = alpha1new
                b1new = -E1 - y1 * self.calcK(data1, data1) * (alpha1new - alpha1old) \
                        - y2 * self.calcK(data2, data1) * (alpha2new - alpha2old) + b   # (7.115)
                b2new = -E2 - y1 * self.calcK(data1, data2) * (alpha1new - alpha1old) \
                        - y2 * self.calcK(data2, data2) * (alpha2new - alpha2old) + b   # (7.116)
                if 0 < alpha1new < C:
                    b = b1new
                elif 0 < alpha2new < C:
                    b = b2new
                else:
                    b = (b1new + b2new) / 2
            if not flag:    # no KKT violations found in this pass, so training has converged
                break
        print(alpha)
        weights = np.dot(np.multiply(alpha, labels), dataSet)   # weights
        return weights, b

    def calcK(self, data1, data2):  # linear kernel: returns the inner product
        return np.dot(data1, data2)

    def calcG(self, alpha, labels, data, dataSet, b):   # computes g(x)
        total = 0
        for j in range(len(alpha)):
            total += alpha[j] * labels[j] * self.calcK(data, dataSet[j])   # accumulate g(x)
        return total + b

if __name__ == '__main__':
    dataSet, labels = loadDataSet()
    svm = SVM()
    weights, b = svm.train(dataSet, labels)
    print(weights, b)
    x = [1, 2]
    f = sign(np.dot(weights, x) + b)
    print(f)
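As a quick sanity check (an addition, not in the original post), the following lines, appended at the end of the __main__ block above, classify the training set with the learned parameters and print the accuracy:

    correct = sum(sign(np.dot(weights, xi) + b) == yi
                  for xi, yi in zip(dataSet, labels))
    print('training accuracy:', correct / len(labels))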

Below is the dataset; save it as testSet.txt in the same directory as the script.

3.542485	1.977398	-1
3.018896	2.556416	-1
7.551510	-1.580030	1
2.114999	-0.004466	-1
8.127113	1.274372	1
7.108772	-0.986906	1
8.610639	2.046708	1
2.326297	0.265213	-1
3.634009	1.730537	-1
0.341367	-0.894998	-1
3.125951	0.293251	-1
2.123252	-0.783563	-1
0.887835	-2.797792	-1
7.139979	-2.329896	1
1.696414	-1.212496	-1
8.117032	0.623493	1
8.497162	-0.266649	1
4.658191	3.507396	-1
8.197181	1.545132	1
1.208047	0.213100	-1
1.928486	-0.321870	-1
2.175808	-0.014527	-1
7.886608	0.461755	1
3.223038	-0.552392	-1
3.628502	2.190585	-1
7.407860	-0.121961	1
7.286357	0.251077	1
2.301095	-0.533988	-1
-0.232542	-0.547690	-1
3.457096	-0.082216	-1
3.023938	-0.057392	-1
8.015003	0.885325	1
8.991748	0.923154	1
7.916831	-1.781735	1
7.616862	-0.217958	1
2.450939	0.744967	-1
7.270337	-2.507834	1
1.749721	-0.961902	-1
1.803111	-0.176349	-1
8.804461	3.044301	1
1.231257	-0.568573	-1
2.074915	1.410550	-1
-0.743036	-1.736103	-1
3.536555	3.964960	-1
8.410143	0.025606	1
7.382988	-0.478764	1
6.960661	-0.245353	1
8.234460	0.701868	1
8.168618	-0.903835	1
1.534187	-0.622492	-1
9.229518	2.066088	1
7.886242	0.191813	1
2.893743	-1.643468	-1
1.870457	-1.040420	-1
5.286862	-2.358286	1
6.080573	0.418886	1
2.544314	1.714165	-1
6.016004	-3.753712	1
0.926310	-0.564359	-1
0.870296	-0.109952	-1
2.369345	1.375695	-1
1.363782	-0.254082	-1
7.279460	-0.189572	1
1.896005	0.515080	-1
8.102154	-0.603875	1
2.529893	0.662657	-1
1.963874	-0.365233	-1
8.132048	0.785914	1
8.245938	0.372366	1
6.543888	0.433164	1
-0.236713	-5.766721	-1
8.112593	0.295839	1
9.803425	1.495167	1
1.497407	-0.552916	-1
1.336267	-1.632889	-1
9.205805	-0.586480	1
1.966279	-1.840439	-1
8.398012	1.584918	1
7.239953	-1.764292	1
7.556201	0.241185	1
9.015509	0.345019	1
8.266085	-0.230977	1
8.545620	2.788799	1
9.295969	1.346332	1
2.404234	0.570278	-1
2.037772	0.021919	-1
1.727631	-0.453143	-1
1.979395	-0.050773	-1
8.092288	-1.372433	1
1.667645	0.239204	-1
9.854303	1.365116	1
7.921057	-1.327587	1
8.500757	1.492372	1
1.339746	-0.291183	-1
3.107511	0.758367	-1
2.609525	0.902979	-1
3.263585	1.367898	-1
2.912122	-0.202359	-1
1.731786	0.589096	-1
2.387003	1.573131	-1
