[Whiteboard Derivation Series Notes] Support Vector Machine - Hard-Margin SVM - Model Solution - Deriving the Dual Problem & the KKT Conditions

$$\left\{\begin{aligned}&\min_{\omega,b}\ \frac{1}{2}\omega^{T}\omega\\&\text{s.t. }y_{i}(\omega^{T}x_{i}+b)\geq 1\Leftrightarrow 1-y_{i}(\omega^{T}x_{i}+b)\leq 0,\ \underbrace{i=1,2,\cdots,N}_{N\text{ constraints}}\end{aligned}\right.$$
Construct the Lagrangian:
$$L(\omega,b,\lambda)=\frac{1}{2}\omega^{T}\omega+\sum_{i=1}^{N}\lambda_{i}\left[1-y_{i}(\omega^{T}x_{i}+b)\right]$$
Note that the $\lambda$ in the argument list of $L$ is the $N\times 1$ vector $(\lambda_{1},\cdots,\lambda_{N})^{T}$, while each $\lambda_{i}$ on the right-hand side is a $1\times 1$ scalar.

The Lagrange multiplier method itself will be explained in detail in a later article.

For this problem, if $1-y_{i}(\omega^{T}x_{i}+b)>0$ for some $i$, then letting $\lambda_{i}\to\infty$ gives
$$\max_{\lambda}L(\omega,b,\lambda)=\frac{1}{2}\omega^{T}\omega+\infty=\infty$$
If instead $1-y_{i}(\omega^{T}x_{i}+b)\leq 0$ for all $i$, the maximum over $\lambda_{i}\geq 0$ is attained at $\lambda_{i}=0$:
$$\max_{\lambda}L(\omega,b,\lambda)=\frac{1}{2}\omega^{T}\omega+0=\frac{1}{2}\omega^{T}\omega$$
Therefore, since the outer minimization never selects the $\infty$ branch,
$$\min_{\omega,b}\max_{\lambda}L(\omega,b,\lambda)=\min_{\omega,b}\frac{1}{2}\omega^{T}\omega$$
Hence the unconstrained form of the problem is
$$\min_{\omega,b}\max_{\lambda}L(\omega,b,\lambda),\quad \text{s.t. }\lambda_{i}\geq 0$$

"Unconstrained" here refers to the constraints on $\omega$ (this $\omega$ plays the role of $x$ in the generic template). Originally $1-y_{i}(\omega^{T}x_{i}+b)\leq 0$ was a constraint on $\omega$; the Lagrangian converts it into $\lambda_{i}\geq 0$, which is a constraint on $\lambda_{i}$ rather than on $\omega$, hence the name "unconstrained form".
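A quick numeric sanity check of this penalty interpretation (a minimal sketch; the sample point and the two candidate $(\omega,b)$ pairs below are made up purely for illustration):

```python
import numpy as np

# One made-up sample; w_ok satisfies y_i(w^T x_i + b) >= 1, w_bad violates it
x_i, y_i = np.array([1.0, 1.0]), 1.0
w_ok, b_ok = np.array([1.0, 1.0]), 0.0
w_bad, b_bad = np.array([0.1, 0.1]), 0.0

def L(w, b, lam):
    slack = 1 - y_i * (w @ x_i + b)   # the bracketed constraint term
    return 0.5 * w @ w + lam * slack

for lam in [0.0, 1.0, 100.0]:
    print(lam, L(w_ok, b_ok, lam), L(w_bad, b_bad, lam))
# For the violating pair, L grows without bound as lam increases;
# for the feasible pair, slack <= 0, so the max over lam >= 0 sits at lam = 0.
```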

Since the objective is convex and the inequality constraints are affine, strong duality holds (a relaxed form of Slater's condition), so the dual problem is equivalent to the primal. The dual form of the problem is therefore
$$\max_{\lambda}\min_{\omega,b}L(\omega,b,\lambda),\quad \text{s.t. }\lambda_{i}\geq 0$$
First consider $\min\limits_{\omega,b}L(\omega,b,\lambda)$. For $b$:
$$\begin{aligned}\frac{\partial L}{\partial b}&=\frac{\partial}{\partial b}\left[\sum_{i=1}^{N}\lambda_{i}-\sum_{i=1}^{N}\lambda_{i}y_{i}(\omega^{T}x_{i}+b)\right]\\&=\frac{\partial}{\partial b}\left(-\sum_{i=1}^{N}\lambda_{i}y_{i}b\right)\\&=-\sum_{i=1}^{N}\lambda_{i}y_{i}=0\end{aligned}$$
Substituting $\sum\limits_{i=1}^{N}\lambda_{i}y_{i}=0$ into $L(\omega,b,\lambda)$:
$$\begin{aligned}L(\omega,b,\lambda)&=\frac{1}{2}\omega^{T}\omega+\sum_{i=1}^{N}\lambda_{i}-\sum_{i=1}^{N}\lambda_{i}y_{i}(\omega^{T}x_{i}+b)\\&=\frac{1}{2}\omega^{T}\omega+\sum_{i=1}^{N}\lambda_{i}-\sum_{i=1}^{N}\lambda_{i}y_{i}\omega^{T}x_{i}-\sum_{i=1}^{N}\lambda_{i}y_{i}b\\&=\frac{1}{2}\omega^{T}\omega+\sum_{i=1}^{N}\lambda_{i}-\sum_{i=1}^{N}\lambda_{i}y_{i}\omega^{T}x_{i}\end{aligned}$$
For $\omega$:
$$\begin{aligned}\frac{\partial L}{\partial \omega}&=\frac{1}{2}\cdot 2\omega-\sum_{i=1}^{N}\lambda_{i}y_{i}x_{i}=0\\\omega&=\sum_{i=1}^{N}\lambda_{i}y_{i}x_{i}\end{aligned}$$
Substituting this into $L(\omega,b,\lambda)$:
$$\begin{aligned}L(\omega,b,\lambda)&=\frac{1}{2}\underbrace{\left(\sum_{i=1}^{N}\lambda_{i}y_{i}x_{i}\right)^{T}\left(\sum_{j=1}^{N}\lambda_{j}y_{j}x_{j}\right)}_{\in\mathbb{R}}-\underbrace{\sum_{i=1}^{N}\lambda_{i}y_{i}\left(\sum_{j=1}^{N}\lambda_{j}y_{j}x_{j}\right)^{T}x_{i}}_{\in\mathbb{R}}+\sum_{i=1}^{N}\lambda_{i}\\&=-\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\lambda_{i}\lambda_{j}y_{i}y_{j}x_{i}^{T}x_{j}+\sum_{i=1}^{N}\lambda_{i}\end{aligned}$$
The problem is therefore transformed into
$$\max_{\lambda}\ -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\lambda_{i}\lambda_{j}y_{i}y_{j}x_{i}^{T}x_{j}+\sum_{i=1}^{N}\lambda_{i},\quad \text{s.t. }\lambda_{i}\geq 0,\ \sum_{i=1}^{N}\lambda_{i}y_{i}=0$$
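As a numeric sanity check of this substitution (a minimal sketch; the random data, seed, and shapes are all assumptions), $L(\omega(\lambda),b,\lambda)$ should equal the dual objective for any $\lambda$ with $\sum_{i}\lambda_{i}y_{i}=0$:

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary assumed seed
N, d = 5, 3
X = rng.normal(size=(N, d))         # rows are x_i
y = rng.choice([-1.0, 1.0], size=N)

# Any lam with sum_i lam_i y_i = 0 (project out the y-direction)
lam = rng.random(N)
lam -= y * (lam @ y) / (y @ y)

w = (lam * y) @ X                   # stationarity: w = sum_i lam_i y_i x_i
b = rng.normal()                    # b drops out because lam @ y = 0

L_val = 0.5 * w @ w + lam @ (1 - y * (X @ w + b))

Yx = y[:, None] * X                 # rows y_i x_i
dual = lam.sum() - 0.5 * lam @ (Yx @ Yx.T) @ lam
print(np.isclose(L_val, dual))      # True
```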
The KKT conditions of this optimization problem are as follows (strong duality between the primal and dual problems $\Leftrightarrow$ the KKT conditions hold):
$$\left\{\begin{aligned}&\frac{\partial L}{\partial \omega}=0,\ \frac{\partial L}{\partial b}=0\\&\lambda_{i}\left[1-y_{i}(\omega^{T}x_{i}+b)\right]=0\\&\lambda_{i}\geq 0\\&1-y_{i}(\omega^{T}x_{i}+b)\leq 0\end{aligned}\right.$$
The condition $\lambda_{i}\left[1-y_{i}(\omega^{T}x_{i}+b)\right]=0$ is called complementary slackness: for support vectors $y_{i}(\omega^{T}x_{i}+b)=1$, and for every other data point $\lambda_{i}=0$ (by the construction of the Lagrangian), so at least one of the two factors is always $0$.
From $\frac{\partial L}{\partial \omega}=0$ above we obtain
$$\omega^{*}=\sum_{i=1}^{N}\lambda_{i}y_{i}x_{i}$$

where the $\lambda_{i}$ are obtained by solving the dual problem
$$\max_{\lambda}\ -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\lambda_{i}\lambda_{j}y_{i}y_{j}x_{i}^{T}x_{j}+\sum_{i=1}^{N}\lambda_{i},\quad \text{s.t. }\lambda_{i}\geq 0,\ \sum_{i=1}^{N}\lambda_{i}y_{i}=0$$
This is a quadratic program in $\lambda$; the full solution procedure is not derived in these notes, but in practice it is handled by dedicated algorithms such as SMO (sequential minimal optimization). A generic-solver illustration follows below.
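Purely as an illustration (not the SMO algorithm SVM libraries actually use), here is a minimal sketch that hands this dual QP to a generic solver; the toy data and the choice of `scipy.optimize.minimize` with SLSQP are my own assumptions:

```python
import numpy as np
from scipy.optimize import minimize

# Assumed toy linearly separable data: x_i in R^2, y_i in {+1, -1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
N = len(y)

Yx = y[:, None] * X
Q = Yx @ Yx.T                       # Q[i, j] = y_i y_j x_i^T x_j

def neg_dual(lam):                  # scipy minimizes, so negate the dual
    return 0.5 * lam @ Q @ lam - lam.sum()

res = minimize(
    neg_dual,
    x0=np.zeros(N),
    method="SLSQP",
    bounds=[(0.0, None)] * N,                                # lambda_i >= 0
    constraints={"type": "eq", "fun": lambda lam: lam @ y},  # sum_i lam_i y_i = 0
)
lam = res.x
print("lambda* =", lam.round(4))    # nonzero entries mark the support vectors
```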

For $b^{*}$, we assume
$$\exists\,(x_{k},y_{k}),\quad \text{s.t. }1-y_{k}(\omega^{T}x_{k}+b)=0$$
Clearly such an $(x_{k},y_{k})$ is exactly a so-called support vector (its existence follows from the rescaling $\omega^{T}=\frac{\hat{\omega}^{T}}{a},\ b=\frac{\hat{b}}{a}$ set up at the very beginning, which makes the points closest to the hyperplane satisfy the constraint with equality). Therefore
$$\begin{aligned}y_{k}(\omega^{T}x_{k}+b)&=1\\y_{k}^{2}(\omega^{T}x_{k}+b)&=y_{k}\qquad\left(y_{k}\in\{+1,-1\},\ y_{k}^{2}=1\right)\\b^{*}&=y_{k}-\omega^{T}x_{k}=y_{k}-\sum_{i=1}^{N}\lambda_{i}y_{i}x_{i}^{T}x_{k}\end{aligned}$$

Here $(x_{k},y_{k})$ is any sample whose multiplier is strictly positive, $\lambda_{k}>0$, i.e. a support vector; the $\lambda_{i}$ themselves come from the dual problem as noted above. A sketch of the recovery step follows.
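Continuing the solver sketch above (same assumed toy data and the `lam` it returned), recovering $\omega^{*}$ and $b^{*}$ and spot-checking complementary slackness might look like:

```python
# Recover w* = sum_i lam_i y_i x_i
w_star = (lam * y) @ X

# Pick a support vector: any index with lam_k clearly above zero
k = int(np.argmax(lam))
b_star = y[k] - w_star @ X[k]
print("w* =", w_star.round(4), " b* =", round(b_star, 4))

# Complementary slackness: lam_i * (1 - y_i (w^T x_i + b)) ~ 0 for all i
slack = 1 - y * (X @ w_star + b_star)
print("lam_i * slack_i =", (lam * slack).round(6))
```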
