Support Vector Machine (SVM) Derivation: Soft-margin


Problem Statement

The hard-margin support vector machine is formulated as:

$$
\begin{aligned}
& \min_{\boldsymbol{w}, b}\ \frac{1}{2}\|\boldsymbol{w}\|^{2} \\
& \ \text{s.t.}\ \ y_{i}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}+b\right) \geqslant 1, \quad i=1,2,\ldots,m
\end{aligned}
$$
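As a quick sanity check, the hard-margin constraint can be tested numerically. The sketch below (Python/NumPy, with a made-up two-point dataset) verifies whether a candidate $(\boldsymbol{w}, b)$ satisfies $y_i(\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}_i + b) \geqslant 1$ for every sample:

```python
import numpy as np

def satisfies_hard_margin(w, b, X, y):
    """Check the hard-margin constraints y_i * (w.x_i + b) >= 1 for all i."""
    return bool(np.all(y * (X @ w + b) >= 1.0))

# made-up toy data: one point per class
X = np.array([[2.0, 2.0], [-2.0, -2.0]])
y = np.array([1.0, -1.0])

print(satisfies_hard_margin(np.array([0.25, 0.25]), 0.0, X, y))  # True
print(satisfies_hard_margin(np.zeros(2), 0.0, X, y))             # False
```

A hyperplane that merely separates the data is not enough; the functional margin $y_i(\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}_i + b)$ must reach at least 1 on every sample.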

In practice, however, training samples are often not fully linearly separable, as shown in the figure below.
(Figure: training samples that are not linearly separable)
That is, some samples fail to satisfy the constraint

$$
y_{i}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}+b\right) \geqslant 1
$$

We therefore introduce slack variables $\xi_i \geqslant 0$, so that the margin is maximized while keeping the number of constraint-violating samples as small as possible:

$$
\begin{aligned}
\min_{\boldsymbol{w}, b, \boldsymbol{\xi}}\ & \frac{1}{2}\|\boldsymbol{w}\|^{2} + C \sum_{i=1}^{m} \xi_i \\
\text{s.t.}\ & y_{i}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}+b\right) \geqslant 1 - \xi_i, \\
& \xi_i \geqslant 0, \quad i=1,2,\ldots,m
\end{aligned}
$$
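To see what this objective measures: for a fixed $(\boldsymbol{w}, b)$, the smallest feasible slack for each sample is $\xi_i = \max\left(0,\ 1 - y_i(\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}_i + b)\right)$, i.e. the hinge loss. A minimal Python/NumPy sketch (toy data is made up):

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Soft-margin primal objective 0.5*||w||^2 + C * sum(xi),
    where each optimal slack is xi_i = max(0, 1 - y_i * (w.x_i + b))."""
    margins = y * (X @ w + b)
    slacks = np.maximum(0.0, 1.0 - margins)  # smallest xi_i satisfying both constraints
    return 0.5 * np.dot(w, w) + C * slacks.sum()

# made-up toy data: two points per class
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# all margins >= 1 here, so every slack is zero and only ||w||^2/2 remains
print(soft_margin_objective(np.array([0.5, 0.0]), 0.0, X, y, C=1.0))  # 0.125
```

Larger $C$ penalizes constraint violations more heavily, pushing the solution toward the hard-margin behavior; smaller $C$ tolerates more violations in exchange for a wider margin.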

The Lagrangian Dual Problem

Applying the method of Lagrange multipliers gives the Lagrangian:

$$
\begin{aligned}
L(\boldsymbol{w}, b, \boldsymbol{\alpha}, \boldsymbol{\xi}, \boldsymbol{\mu}) =\ & \frac{1}{2}\|\boldsymbol{w}\|^{2}+C \sum_{i=1}^{m} \xi_{i} \\
& +\sum_{i=1}^{m} \alpha_{i}\left(1-\xi_{i}-y_{i}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}+b\right)\right)-\sum_{i=1}^{m} \mu_{i} \xi_{i}
\end{aligned}
$$

where $\alpha_i \geqslant 0$ and $\mu_i \geqslant 0$ are the Lagrange multipliers.

The original problem is then equivalent to:

$$
\min_{\boldsymbol{w}, b, \boldsymbol{\xi}} \max_{\boldsymbol{\alpha} \geqslant 0,\, \boldsymbol{\mu} \geqslant 0} L(\boldsymbol{w}, b, \boldsymbol{\alpha}, \boldsymbol{\xi}, \boldsymbol{\mu})
$$

Its dual problem is:

$$
\max_{\boldsymbol{\alpha} \geqslant 0,\, \boldsymbol{\mu} \geqslant 0} \min_{\boldsymbol{w}, b, \boldsymbol{\xi}} L(\boldsymbol{w}, b, \boldsymbol{\alpha}, \boldsymbol{\xi}, \boldsymbol{\mu})
$$

To solve the inner minimization $\min_{\boldsymbol{w}, b, \boldsymbol{\xi}} L(\boldsymbol{w}, b, \boldsymbol{\alpha}, \boldsymbol{\xi}, \boldsymbol{\mu})$, take the partial derivatives of $L$ with respect to $\boldsymbol{w}$, $b$, and $\boldsymbol{\xi}$:

$$
\begin{aligned}
\frac{\partial L}{\partial \boldsymbol{w}} &= \boldsymbol{w} - \sum_{i=1}^{m} \alpha_i y_i \boldsymbol{x}_i \\
\frac{\partial L}{\partial b} &= \frac{\partial}{\partial b} \sum_{i=1}^m \alpha_i (-y_i) b = - \sum_{i=1}^m \alpha_i y_i \\
\frac{\partial L}{\partial \xi_i} &= C - \alpha_i - \mu_i
\end{aligned}
$$

Setting these partial derivatives to zero yields:

$$
\begin{aligned}
\boldsymbol{w} &= \sum_{i=1}^{m} \alpha_i y_i \boldsymbol{x}_i, \\
0 &= \sum_{i=1}^m \alpha_i y_i, \\
C &= \alpha_i + \mu_i
\end{aligned}
$$

Substituting these back into the Lagrangian, the dual problem becomes:

$$
\begin{aligned}
\max_{\boldsymbol{\alpha}}\ & \sum_{i=1}^{m} \alpha_{i}-\frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_{i} \alpha_{j} y_{i} y_{j} \boldsymbol{x}_{i}^{\mathrm{T}} \boldsymbol{x}_{j} \\
\text{s.t.}\ & \sum_{i=1}^{m} \alpha_{i} y_{i}=0, \\
& 0 \leqslant \alpha_{i} \leqslant C, \quad i=1,2,\ldots,m
\end{aligned}
$$
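This dual is a box-constrained quadratic program in $\boldsymbol{\alpha}$. Production SVMs use dedicated solvers (e.g. SMO), but as an illustrative sketch the same problem can be handed to a general-purpose solver; the code below uses SciPy's SLSQP on a made-up, linearly separable toy dataset (all data and parameter choices are assumptions for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# made-up toy data: two points per class along the line y = x
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m, C = len(y), 1.0

# Q_ij = y_i y_j x_i . x_j
Q = (y[:, None] * X) @ (y[:, None] * X).T

def neg_dual(alpha):
    # negate the dual objective, since scipy minimizes
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

cons = {"type": "eq", "fun": lambda a: a @ y}  # sum_i alpha_i y_i = 0
bounds = [(0.0, C)] * m                        # 0 <= alpha_i <= C
res = minimize(neg_dual, np.zeros(m), bounds=bounds, constraints=cons)

alpha = res.x
w = (alpha * y) @ X  # recover w = sum_i alpha_i y_i x_i
print(w)
```

On this dataset the two inner points dominate and the solver recovers $\boldsymbol{w} \approx (0.25, 0.25)$, matching the hard-margin solution since no $\alpha_i$ hits the upper bound $C$.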

KKT Conditions

For the soft-margin support vector machine, writing $f(\boldsymbol{x}) = \boldsymbol{w}^{\mathrm{T}}\boldsymbol{x} + b$, the KKT conditions require:

$$
\left\{\begin{array}{l}
\alpha_{i} \geqslant 0, \quad \mu_{i} \geqslant 0 \\
y_{i} f\left(\boldsymbol{x}_{i}\right)-1+\xi_{i} \geqslant 0 \\
\alpha_{i}\left(y_{i} f\left(\boldsymbol{x}_{i}\right)-1+\xi_{i}\right) = 0 \\
\xi_{i} \geqslant 0, \quad \mu_{i} \xi_{i} = 0
\end{array}\right.
$$

From these conditions, for every sample $(\boldsymbol{x}_i, y_i)$, either $\alpha_i = 0$ or $y_i f(\boldsymbol{x}_i) = 1 - \xi_i$:

  • If $\alpha_i = 0$, the sample has no influence on the model.
  • If $y_i f(\boldsymbol{x}_i) = 1 - \xi_i$, the sample is a support vector. From the dual constraints:
    • If $\alpha_i < C$, then $\mu_i > 0$ and hence $\xi_i = 0$: the sample lies exactly on the maximum-margin boundary.
    • If $\alpha_i = C$, then $\mu_i = 0$: if $\xi_i \leqslant 1$ the sample lies inside the margin, and if $\xi_i > 1$ it is misclassified.
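The case analysis above maps directly onto the multiplier values. As a small sketch (Python, with a hypothetical `alpha` vector and a numerical tolerance for the comparisons), each sample can be categorized from its $\alpha_i$ alone:

```python
import numpy as np

def categorize(alpha, C, tol=1e-8):
    """Classify each training sample by its multiplier alpha_i:
    alpha_i == 0     -> not a support vector (no influence on the model)
    0 < alpha_i < C  -> on-margin support vector (xi_i = 0)
    alpha_i == C     -> bound SV: inside the margin, or misclassified if xi_i > 1
    """
    cats = []
    for a in alpha:
        if a <= tol:
            cats.append("non-SV")
        elif a >= C - tol:
            cats.append("bound SV (alpha=C)")
        else:
            cats.append("margin SV")
    return cats

# hypothetical multipliers for three samples
print(categorize(np.array([0.0, 0.3, 1.0]), C=1.0))
# ['non-SV', 'margin SV', 'bound SV (alpha=C)']
```

Note that $\alpha_i = C$ alone does not distinguish "inside the margin" from "misclassified"; that distinction additionally requires $\xi_i$ (or equivalently the value of $y_i f(\boldsymbol{x}_i)$).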

Hence the final model depends only on the support vectors. From the equations above:

$$
\boldsymbol{w} = \sum_{i=1}^{m} \alpha_{i} y_{i} \boldsymbol{x}_{i}
$$

Let $S = \{i \mid \alpha_i > 0,\ i=1,2,\ldots,m\}$ be the index set of all support vectors. For any support vector $(\boldsymbol{x}_s, y_s)$ we have $y_s f(\boldsymbol{x}_s) = 1 - \xi_s$, so

$$
y_s\left(\sum_{i \in S} \alpha_i y_i \boldsymbol{x}_{i}^{\mathrm{T}} \boldsymbol{x}_{s} + b\right) = 1 - \xi_s
$$

In principle $b$ can be recovered from any single support vector, but a more robust choice is to average over all of them:

$$
b = \frac{1}{|S|} \sum_{s \in S}\left(\frac{1 - \xi_s}{y_s} - \sum_{i \in S} \alpha_i y_i \boldsymbol{x}_{i}^{\mathrm{T}} \boldsymbol{x}_{s}\right)
$$
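A practical detail: the dual solution gives $\boldsymbol{\alpha}$ but not $\boldsymbol{\xi}$ directly, so implementations typically average only over the on-margin support vectors ($0 < \alpha_s < C$), for which $\xi_s = 0$ is guaranteed by the KKT analysis above. A Python/NumPy sketch of this variant, using hypothetical multipliers for a made-up toy dataset (and the fact that $1/y_s = y_s$ for $y_s \in \{-1, +1\}$):

```python
import numpy as np

def intercept_from_alphas(alpha, X, y, C, tol=1e-8):
    """Recover b by averaging over on-margin support vectors
    (0 < alpha_s < C, hence xi_s = 0 and y_s f(x_s) = 1)."""
    w = (alpha * y) @ X                             # w = sum_i alpha_i y_i x_i
    margin = (alpha > tol) & (alpha < C - tol)      # on-margin SV mask
    # (1 - xi_s)/y_s with xi_s = 0 reduces to 1/y_s = y_s for labels in {-1, +1}
    return np.mean(y[margin] - X[margin] @ w)

# made-up toy data with hypothetical multipliers alpha = [1/16, 0, 1/16, 0]
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha = np.array([1/16, 0.0, 1/16, 0.0])
print(intercept_from_alphas(alpha, X, y, C=1.0))  # 0.0 (data is symmetric)
```

Averaging damps the effect of numerical noise in any single multiplier, which is why it is preferred over solving for $b$ from one arbitrary support vector.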
