The hard-margin support vector machine is formulated as:
$$
\begin{aligned}
& \min_{\boldsymbol{w}, b}\ \frac{1}{2}\|\boldsymbol{w}\|^{2} \\
& \text{s.t.}\quad y_{i}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}+b\right) \geqslant 1, \quad i=1,2, \ldots, m
\end{aligned}
$$
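In practice, however, the training samples may not be perfectly linearly separable, as shown in the figure below.
That is, some samples fail to satisfy the constraint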
$$
y_{i}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}+b\right) \geqslant 1
$$
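We therefore introduce slack variables $\xi_i \ge 0$, so that while the margin is maximized, the samples violating the constraint are kept as few as possible. The penalty parameter $C > 0$ controls this trade-off: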
$$
\begin{aligned}
& \min_{\boldsymbol{w}, b, \boldsymbol{\xi}} \ \frac{1}{2} \|\boldsymbol{w}\|^{2} + C \sum_{i=1}^{m} \xi_i \\
& \text{s.t.}\quad y_{i}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}+b\right) \geqslant 1 - \xi_i, \\
& \phantom{\text{s.t.}\quad} \xi_i \ge 0, \quad i=1,2,\ldots,m
\end{aligned}
$$
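For a fixed $(\boldsymbol{w}, b)$, the smallest slack that satisfies both constraints is $\xi_i = \max\left(0,\ 1 - y_i(\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}_i + b)\right)$. The following is a minimal sketch in plain Python that evaluates the soft-margin objective this way. The helper names `dot`, `slacks`, and `primal_objective`, as well as the toy data, are illustrative assumptions and not part of the original derivation.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def slacks(w, b, X, y):
    """Smallest feasible slack per sample: max(0, 1 - y_i (w.x_i + b))."""
    return [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]


def primal_objective(w, b, X, y, C):
    """Soft-margin objective: 0.5 * ||w||^2 + C * sum of slacks."""
    return 0.5 * dot(w, w) + C * sum(slacks(w, b, X, y))


# Toy data (assumed for illustration): the third sample lies on the wrong side.
X = [[2.0, 2.0], [-2.0, -2.0], [0.5, 0.5]]
y = [1, -1, -1]
w, b = [0.5, 0.5], 0.0

xi = slacks(w, b, X, y)                      # only the third slack is nonzero
value = primal_objective(w, b, X, y, C=1.0)  # 0.5 * 0.5 + 1.0 * 1.5
```

A larger `C` makes each unit of slack more expensive, pushing the solution toward the hard-margin case; a smaller `C` tolerates more violations in exchange for a wider margin.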
Applying the method of Lagrange multipliers gives the Lagrangian:
$$
\begin{aligned}
L(\boldsymbol{w}, b, \boldsymbol{\alpha}, \boldsymbol{\xi}, \boldsymbol{\mu})=& \frac{1}{2}\|\boldsymbol{w}\|^{2}+C \sum_{i=1}^{m} \xi_{i} \\
&+\sum_{i=1}^{m} \alpha_{i}\left(1-\xi_{i}-y_{i}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}+b\right)\right)-\sum_{i=1}^{m} \mu_{i} \xi_{i}
\end{aligned}
$$
where $\alpha_i \ge 0$ and $\mu_i \ge 0$ are the Lagrange multipliers.
The original problem is then equivalent to:
$$
\min_{\boldsymbol{w}, b, \boldsymbol{\xi}} \ \max_{\boldsymbol{\alpha} \ge 0,\ \boldsymbol{\mu} \ge 0} \ L(\boldsymbol{w}, b, \boldsymbol{\alpha}, \boldsymbol{\xi}, \boldsymbol{\mu})
$$
Its dual problem is:
$$
\max_{\boldsymbol{\alpha} \ge 0,\ \boldsymbol{\mu} \ge 0} \ \min_{\boldsymbol{w}, b, \boldsymbol{\xi}} \ L(\boldsymbol{w}, b, \boldsymbol{\alpha}, \boldsymbol{\xi}, \boldsymbol{\mu})
$$
To solve the inner problem $\min_{\boldsymbol{w}, b, \boldsymbol{\xi}} L(\boldsymbol{w}, b, \boldsymbol{\alpha}, \boldsymbol{\xi}, \boldsymbol{\mu})$, take the partial derivatives of $L(\boldsymbol{w}, b, \boldsymbol{\alpha}, \boldsymbol{\xi}, \boldsymbol{\mu})$ with respect to $\boldsymbol{w}$, $b$, and $\xi_i$:
$$
\begin{aligned}
& \frac{\partial L(\boldsymbol{w}, b, \boldsymbol{\alpha},\boldsymbol{\xi}, \boldsymbol{\mu})}{\partial \boldsymbol{w}} = \boldsymbol{w} - \sum_{i=1}^{m} \alpha_i y_i \boldsymbol{x}_i \\
& \frac{\partial L(\boldsymbol{w}, b, \boldsymbol{\alpha},\boldsymbol{\xi}, \boldsymbol{\mu})}{\partial b} = \frac{\partial \sum_{i=1}^m \alpha_i (-y_i)b}{\partial b} = - \sum_{i=1}^m \alpha_i y_i \\
& \frac{\partial L(\boldsymbol{w}, b, \boldsymbol{\alpha},\boldsymbol{\xi}, \boldsymbol{\mu})}{\partial \xi_i} = C - \alpha_i - \mu_i
\end{aligned}
$$
Setting the partial derivatives to zero yields:
$$
\begin{aligned}
& \boldsymbol{w} = \sum_{i=1}^{m} \alpha_i y_i \boldsymbol{x}_i, \\
& 0 = \sum_{i=1}^m \alpha_i y_i, \\
& C = \alpha_i + \mu_i
\end{aligned}
$$
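Substituting these into the Lagrangian eliminates $\boldsymbol{w}$, $b$, and $\boldsymbol{\xi}$. Since $\mu_i = C - \alpha_i \ge 0$, the multipliers $\mu_i$ can be eliminated as well, leaving the box constraint $0 \le \alpha_i \le C$. The dual problem becomes: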
$$
\begin{aligned}
\max _{\boldsymbol{\alpha}}\ & \sum_{i=1}^{m} \alpha_{i}-\frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_{i} \alpha_{j} y_{i} y_{j} \boldsymbol{x}_{i}^{\mathrm{T}} \boldsymbol{x}_{j} \\
\text{s.t.}\ & \sum_{i=1}^{m} \alpha_{i} y_{i}=0, \\
& 0 \leqslant \alpha_{i} \leqslant C, \quad i=1,2, \ldots, m
\end{aligned}
$$
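The substitution step can be written out term by term. The block below is a sketch of that algebra; it only regroups the Lagrangian and applies the three stationarity conditions $\boldsymbol{w} = \sum_i \alpha_i y_i \boldsymbol{x}_i$, $\sum_i \alpha_i y_i = 0$, and $C = \alpha_i + \mu_i$, with no additional assumptions.

```latex
\begin{aligned}
L &= \frac{1}{2}\|\boldsymbol{w}\|^{2}
     - \boldsymbol{w}^{\mathrm{T}} \sum_{i=1}^{m} \alpha_i y_i \boldsymbol{x}_i
     - b \sum_{i=1}^{m} \alpha_i y_i
     + \sum_{i=1}^{m} \alpha_i
     + \sum_{i=1}^{m} \left(C - \alpha_i - \mu_i\right) \xi_i \\
  &= \frac{1}{2}\|\boldsymbol{w}\|^{2} - \|\boldsymbol{w}\|^{2} - b \cdot 0
     + \sum_{i=1}^{m} \alpha_i + \sum_{i=1}^{m} 0 \cdot \xi_i \\
  &= \sum_{i=1}^{m} \alpha_i - \frac{1}{2}\|\boldsymbol{w}\|^{2} \\
  &= \sum_{i=1}^{m} \alpha_i
     - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m}
       \alpha_i \alpha_j y_i y_j \boldsymbol{x}_i^{\mathrm{T}} \boldsymbol{x}_j
\end{aligned}
```

The objective is identical to the hard-margin dual; the slack variables leave their trace only in the upper bound $\alpha_i \leqslant C$.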
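For the soft-margin support vector machine, writing $f(\boldsymbol{x}) = \boldsymbol{w}^{\mathrm{T}} \boldsymbol{x} + b$, the KKT conditions require: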
$$
\left\{\begin{array}{l}
\alpha_{i} \geqslant 0, \quad \mu_{i} \geqslant 0 \\
y_{i} f\left(\boldsymbol{x}_{i}\right)-1+\xi_{i} \geqslant 0 \\
\alpha_{i}\left(y_{i} f\left(\boldsymbol{x}_{i}\right)-1+\xi_{i}\right) = 0 \\
\xi_{i} \geqslant 0, \quad \mu_{i} \xi_{i} = 0
\end{array}\right.
$$
From these conditions, for every training sample $(\boldsymbol{x}_i, y_i)$ we have either $\alpha_i = 0$ or $y_i f(\boldsymbol{x}_i) = 1 - \xi_i$.
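Combining the KKT conditions with $C = \alpha_i + \mu_i$ separates the samples into three cases: $\alpha_i = 0$ (the sample has no influence on $f$), $0 < \alpha_i < C$ (then $\mu_i > 0$, so $\xi_i = 0$ and the sample lies exactly on the margin boundary), and $\alpha_i = C$ (then $\mu_i = 0$, so $\xi_i$ may be positive). A minimal sketch of this case analysis follows; the function name `classify_sample`, the returned labels, and the tolerance `tol` are illustrative assumptions.

```python
def classify_sample(alpha_i, C, tol=1e-8):
    """Label a sample by its dual variable, following the KKT case analysis.

    tol is an assumed numerical tolerance for comparing floats with 0 and C.
    """
    if alpha_i <= tol:
        # alpha_i = 0: the sample does not contribute to w.
        return "non-support vector"
    if alpha_i < C - tol:
        # 0 < alpha_i < C: mu_i > 0 forces xi_i = 0, so y_i f(x_i) = 1.
        return "support vector on the margin"
    # alpha_i = C: mu_i = 0, so xi_i may be positive (inside the margin
    # if xi_i <= 1, misclassified if xi_i > 1).
    return "support vector with possible slack"


labels = [classify_sample(a, C=1.0) for a in (0.0, 0.25, 1.0)]
```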
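Samples with $\alpha_i = 0$ contribute nothing to $\boldsymbol{w}$, so the final model depends only on the support vectors, that is, the samples with $\alpha_i > 0$. From the stationarity condition above: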
$$
\boldsymbol{w} = \sum_{i=1}^{m} \alpha_{i} y_{i} \boldsymbol{x}_{i}
$$
Let the index set of all support vectors be $\mathbf{S} = \{i \mid \alpha_i > 0,\ i = 1,2,\ldots,m \}$. For any support vector $(\boldsymbol{x}_s, y_s)$ we have $y_s f(\boldsymbol{x}_s) = 1 - \xi_s$, and therefore
$$
y_s\left(\sum_{i \in \mathbf{S}} \alpha_i y_i \boldsymbol{x}_{i}^{\mathrm{T}} \boldsymbol{x}_{s} + b\right) = 1 - \xi_s
$$
In principle, $b$ can be obtained from any single support vector, but a more robust approach is to average over all support vectors:
$$
b = \frac{1}{|\mathbf{S}|} \sum_{s \in \mathbf{S}}\left(\frac{1 - \xi_s}{y_s} - \sum_{i \in \mathbf{S}} \alpha_i y_i \boldsymbol{x}_{i}^{\mathrm{T}} \boldsymbol{x}_{s}\right)
$$
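The recovery of $\boldsymbol{w}$ and $b$ from a dual solution can be sketched in plain Python. Note one deliberate departure from the formula above: the slack $\xi_s$ of a support vector with $\alpha_s = C$ is not known before $b$ is, so this sketch averages only over support vectors with $0 < \alpha_s < C$, for which the KKT conditions give $\xi_s = 0$. It also uses $1 / y_s = y_s$, which holds because $y_s \in \{-1, +1\}$. The function name `recover_w_b`, the tolerance `tol`, and the toy dual solution are illustrative assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def recover_w_b(alpha, X, y, C, tol=1e-8):
    """Recover (w, b) from dual variables alpha.

    Assumption of this sketch: b is averaged over support vectors with
    0 < alpha_s < C only, where xi_s = 0, so b = y_s - w.x_s.
    """
    support = [i for i, a in enumerate(alpha) if a > tol]
    dim = len(X[0])
    # w = sum over support vectors of alpha_i * y_i * x_i
    w = [sum(alpha[i] * y[i] * X[i][k] for i in support) for k in range(dim)]
    free = [s for s in support if alpha[s] < C - tol]
    if not free:
        raise ValueError("no support vector with 0 < alpha < C")
    b = sum(y[s] - dot(w, X[s]) for s in free) / len(free)
    return w, b


# Toy problem (assumed): two symmetric margin points plus one far-away point.
X = [[1.0, 1.0], [-1.0, -1.0], [3.0, 3.0]]
y = [1, -1, 1]
alpha = [0.25, 0.25, 0.0]  # satisfies sum(alpha_i * y_i) = 0 and 0 <= alpha_i <= C

w, b = recover_w_b(alpha, X, y, C=1.0)
```

For this toy problem both free support vectors satisfy $y_s f(\boldsymbol{x}_s) = 1$ with the recovered parameters, and the third sample, with $\alpha = 0$, lies outside the margin on the correct side.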