A Hand-Written Derivation of the SVM Formulas and the Full Solution via the Dual Algorithm


Video link
The following are my notes from watching the video, covering the derivation of the SVM formulas and the full solution process:

  1. The three pillars of SVM: margin, duality, and kernel functions.
  2. SVM comes in three forms: hard-margin SVM, soft-margin SVM, and kernel SVM.

Derivation:

$$\left\{\begin{aligned} & \max_{w,b}\ \mathrm{margin}(w, b) \\ & \mathrm{s.t.}\quad y_i(w^T x_i + b) > 0,\quad i = 1, 2, \dots, N \end{aligned}\right.$$

Since $y_i = -1$ or $+1$, and for a correctly classified point $y_i(w^T x_i + b) = |w^T x_i + b|$, the point-to-hyperplane distance formula gives:
$$\begin{aligned} \max\ \mathrm{margin}(w, b) &= \max_{w,b}\ \min_{x_i}\ \mathrm{distance}(w, b, x_i) \\ &= \max_{w,b}\ \min_{x_i}\ \frac{1}{\|w\|}\,|w^T x_i + b| \\ &= \max_{w,b}\ \frac{1}{\|w\|}\ \min_{x_i}\ |w^T x_i + b| \\ &= \max_{w,b}\ \frac{1}{\|w\|}\ \min_{x_i}\ y_i(w^T x_i + b) \end{aligned}$$
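To make the margin concrete, here is a minimal NumPy sketch (the toy data and the values of $w$ and $b$ are made up for illustration) that evaluates $\min_{x_i} y_i(w^T x_i + b) / \|w\|$:

```python
import numpy as np

# Toy linearly separable data; w and b are made-up values for illustration.
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
w = np.array([0.5, 0.5])
b = -2.0

# Functional margins y_i (w^T x_i + b); all positive means every point
# lies on the correct side of the hyperplane.
functional = y * (X @ w + b)

# Geometric margin: distance from the closest point to the hyperplane,
# i.e. min_i y_i (w^T x_i + b) / ||w||.
geometric = functional.min() / np.linalg.norm(w)
print(functional, geometric)  # [1.  1.5 1. ] 1.414...
```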

Since rescaling $w$ and $b$ by a common factor does not change the hyperplane $w^T x + b = 0$, we are free to fix the scale by setting $\min_{x_i} y_i(w^T x_i + b) = 1$. The problem then simplifies to:
$$\max\ \mathrm{margin}(w, b) = \max_{w,b} \frac{1}{\|w\|} \iff \min_{w,b} \|w\| \iff \min_{w,b} \frac{1}{2}\|w\|^2$$

The factor $\frac{1}{2}$ is added only for convenience when differentiating and does not affect the minimizer. Here $\|w\|$ denotes the 2-norm of $w$, so $\|w\|^2 = w^T w$.
So the SVM optimization problem can finally be expressed as:
$$\left\{\begin{aligned} & \min_{w,b}\ \frac{1}{2} w^T w \\ & \mathrm{s.t.}\quad \min_{x_i}\ y_i(w^T x_i + b) = 1 \end{aligned}\right.$$

which is equivalent to (at the optimum the inequality is tight for at least one $i$, so relaxing the equality does not change the solution):
$$\left\{\begin{aligned} & \min_{w,b}\ \frac{1}{2} w^T w \\ & \mathrm{s.t.}\quad y_i(w^T x_i + b) \geq 1 \end{aligned}\right.$$

or, rearranging the constraint:
$$\left\{\begin{aligned} & \min_{w,b}\ \frac{1}{2} w^T w \\ & \mathrm{s.t.}\quad 1 - y_i(w^T x_i + b) \leq 0 \end{aligned}\right.$$
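As a sanity check on this final form, the primal problem can be handed directly to a generic convex solver. A minimal sketch, assuming the cvxpy package is available (the toy data is the same made-up example as above):

```python
import numpy as np
import cvxpy as cp

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()

# min (1/2) w^T w   s.t.  y_i (w^T x_i + b) >= 1
problem = cp.Problem(
    cp.Minimize(0.5 * cp.sum_squares(w)),
    [cp.multiply(y, X @ w + b) >= 1],
)
problem.solve()
print(w.value, b.value)  # expected: roughly [0.5 0.5] -2.0
```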

Solution

The above is a constrained convex quadratic optimization problem. Using Lagrange multipliers, we can turn it into an unconstrained min-max primal problem.
Because the problem is convex quadratic, strong duality holds: the optimal value of the dual problem equals that of the primal problem, and the KKT conditions hold at the optimum. We can therefore differentiate the Lagrangian and use the KKT conditions to find the optimal solution.

The constrained problem:
$$\left\{\begin{aligned} & \min_{w,b}\ \frac{1}{2} w^T w \\ & \mathrm{s.t.}\quad 1 - y_i(w^T x_i + b) \leq 0 \end{aligned}\right.$$

Converting to an unconstrained problem via the Lagrangian: introduce multipliers $\lambda_1, \lambda_2, \dots, \lambda_N$ with $\lambda_i \geq 0$:
$$L(w, b, \lambda) = \frac{1}{2} w^T w + \sum_{i=1}^N \lambda_i \left(1 - y_i(w^T x_i + b)\right)$$
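As a quick sketch, this Lagrangian transcribes directly into NumPy (the example values are made up for illustration):

```python
import numpy as np

def lagrangian(w, b, lam, X, y):
    # L(w, b, lam) = (1/2) w^T w + sum_i lam_i * (1 - y_i (w^T x_i + b))
    return 0.5 * w @ w + lam @ (1.0 - y * (X @ w + b))

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
print(lagrangian(np.array([0.5, 0.5]), -2.0, np.array([0.25, 0.0, 0.25]), X, y))
# 0.25: with these (made-up but in fact optimal) values, every penalty term
# vanishes and L equals the primal objective (1/2) w^T w.
```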

The constrained problem is then equivalent to the following unconstrained min-max problem:
$$\left\{\begin{aligned} & \min_{w,b}\ \max_{\lambda}\ L(w, b, \lambda) \\ & \mathrm{s.t.}\quad \lambda_i \geq 0 \end{aligned}\right.$$

By duality, this min-max primal problem can be converted into its dual, a max-min problem:
$$\left\{\begin{aligned} & \max_{\lambda}\ \min_{w,b}\ L(w, b, \lambda) \\ & \mathrm{s.t.}\quad \lambda_i \geq 0 \end{aligned}\right.$$
(1) First solve $\min\limits_{w,b} L(w, b, \lambda)$.
Setting the partial derivatives of $L(w, b, \lambda)$ with respect to $w$ and $b$ to zero yields:
$$\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^N \lambda_i y_i x_i$$
$$\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^N \lambda_i y_i = 0$$

Substituting these back into the Lagrangian $L(w, b, \lambda)$ gives:

$$\min_{w,b} L(w, b, \lambda) = -\frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \lambda_i \lambda_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^N \lambda_i$$
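In code, the double sum vectorizes via the Gram matrix $K_{ij} = x_i \cdot x_j$: the expression equals $-\frac{1}{2}\lambda^T \big((yy^T) \circ K\big)\lambda + \sum_i \lambda_i$. A minimal NumPy sketch (the data and the feasible $\lambda$ values are made up for illustration):

```python
import numpy as np

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
lam = np.array([0.25, 0.0, 0.25])  # feasible: lam >= 0 and sum(lam * y) == 0

K = X @ X.T                          # Gram matrix, K_ij = x_i . x_j
Q = (y[:, None] * y[None, :]) * K    # Q_ij = y_i y_j (x_i . x_j)

# min_{w,b} L(w, b, lam) = -(1/2) lam^T Q lam + sum(lam)
dual_value = -0.5 * lam @ Q @ lam + lam.sum()
print(dual_value)  # 0.25
```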

(2) Then solve $\max\limits_{\lambda} \min\limits_{w,b} L(w, b, \lambda)$, which after the substitution becomes:
$$\begin{aligned} \max_{\lambda}\quad & -\frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \lambda_i \lambda_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^N \lambda_i \\ \mathrm{s.t.}\quad & \sum_{i=1}^N \lambda_i y_i = 0 \\ & \lambda_i \geq 0,\quad i = 1, 2, \dots, N \end{aligned}$$

Negating the objective converts the maximization into a minimization, so the dual problem of the original problem can finally be written as:
$$\begin{aligned} \min_{\lambda}\quad & \frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \lambda_i \lambda_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^N \lambda_i \\ \mathrm{s.t.}\quad & \sum_{i=1}^N \lambda_i y_i = 0 \\ & \lambda_i \geq 0,\quad i = 1, 2, \dots, N \end{aligned}$$
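The dual is itself a small quadratic program. A minimal sketch, again assuming cvxpy, that uses the identity $\sum_{i,j}\lambda_i\lambda_j y_i y_j(x_i \cdot x_j) = \big\|\sum_i \lambda_i y_i x_i\big\|^2$ to keep the objective in a form the solver accepts:

```python
import numpy as np
import cvxpy as cp

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
N = len(y)

lam = cp.Variable(N)

# (1/2) sum_ij lam_i lam_j y_i y_j (x_i . x_j) - sum_i lam_i,
# rewritten as (1/2) || sum_i lam_i y_i x_i ||^2 - sum_i lam_i.
objective = cp.Minimize(0.5 * cp.sum_squares(X.T @ cp.multiply(lam, y)) - cp.sum(lam))
constraints = [y @ lam == 0, lam >= 0]
cp.Problem(objective, constraints).solve()
print(lam.value)  # expected: roughly [0.25 0.   0.25]
```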

Suppose the solution of this dual problem in $\lambda$ is
$$\lambda^* = (\lambda_1^*, \lambda_2^*, \dots, \lambda_N^*)^T$$
Then $w^*$ and $b^*$ can be recovered from $\lambda^*$.

Note: for the proof and the underlying theorem, see page 105 of 《统计学习方法》 (Statistical Learning Methods, 2nd edition); the proof is not reproduced here.
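For completeness, the KKT conditions invoked here are stationarity, primal feasibility, dual feasibility, and complementary slackness:
$$\left\{\begin{aligned} & \frac{\partial L}{\partial w} = 0, \quad \frac{\partial L}{\partial b} = 0 \\ & 1 - y_i(w^T x_i + b) \leq 0 \\ & \lambda_i \geq 0 \\ & \lambda_i \left(1 - y_i(w^T x_i + b)\right) = 0, \quad i = 1, 2, \dots, N \end{aligned}\right.$$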

From the KKT conditions we obtain:
$$w^* = \sum_{i=1}^N \lambda_i^* y_i x_i$$
$$b^* = y_j - \sum_{i=1}^N \lambda_i^* y_i (x_i \cdot x_j)$$
where $j$ is the index of any support vector, i.e. any $j$ with $\lambda_j^* > 0$.
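A minimal NumPy sketch of this recovery step (the $\lambda^*$ values are the dual solution of the toy problem above):

```python
import numpy as np

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
lam_star = np.array([0.25, 0.0, 0.25])  # dual optimum for this toy data

# w* = sum_i lambda_i* y_i x_i
w_star = (lam_star * y) @ X

# b* = y_j - sum_i lambda_i* y_i (x_i . x_j),
# where j is any support vector index (lambda_j* > 0).
j = int(np.argmax(lam_star > 0))
b_star = y[j] - (lam_star * y) @ (X @ X[j])
print(w_star, b_star)  # expected: [0.5 0.5] -2.0
```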

The separating hyperplane can therefore be written as:
$$\sum_{i=1}^N \lambda_i^* y_i (x \cdot x_i) + b^* = 0$$

and the classification decision function is:
$$f(x) = \mathrm{sign}\left(\sum_{i=1}^N \lambda_i^* y_i (x \cdot x_i) + b^*\right)$$

Note: here $x$ is a test input. The decision function depends on $x$ only through its inner products with the training samples $x_i$, which is precisely what makes the kernel trick possible.
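A minimal sketch of this decision function on the same toy data, computing $f(x)$ purely from inner products:

```python
import numpy as np

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
lam_star = np.array([0.25, 0.0, 0.25])  # dual optimum for this toy data
b_star = -2.0

def f(x):
    # f(x) = sign( sum_i lambda_i* y_i (x . x_i) + b* ):
    # x enters only through the inner products X @ x.
    return int(np.sign((lam_star * y) @ (X @ x) + b_star))

print(f(np.array([4.0, 4.0])), f(np.array([0.0, 0.0])))  # 1 -1
```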
