Proof of the Strong Duality Theorem


These notes are compiled from the following reference:
https://www.bilibili.com/video/BV1dJ411B7gh?p=11

1. Preliminaries

Definition 1 (convex set): A set $D$ is convex if, for any two points $x_1, x_2 \in D$ and any $0 \leq \lambda \leq 1$,
$$x=\lambda x_{1}+(1-\lambda) x_{2} \in D \tag{1}$$
Examples of convex sets are shown below:

[Figure 1: examples of convex sets]
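Condition (1) can also be probed numerically. The following is a minimal Python sketch, assuming two illustrative planar sets (a disk, which is convex, and an annulus, which is not); the function names and the sampling scheme are hypothetical and not part of the original notes. Random search can only exhibit a counterexample to (1); it can never prove that a set is convex.

```python
import random

def in_disk(p, r=1.0):
    # closed disk of radius r centred at the origin: a convex set
    return p[0] ** 2 + p[1] ** 2 <= r ** 2 + 1e-12  # small tolerance for rounding

def in_annulus(p, r_in=0.5, r_out=1.0):
    # ring r_in <= |p| <= r_out: not convex, because of the hole
    d2 = p[0] ** 2 + p[1] ** 2
    return r_in ** 2 <= d2 <= r_out ** 2

def sample_square(rng):
    # uniform sample from the square [-1, 1] x [-1, 1]
    return (rng.uniform(-1, 1), rng.uniform(-1, 1))

def find_violation(member, sample, trials=10_000, seed=0):
    """Look for x1, x2 in the set and lam in [0, 1] such that
    lam * x1 + (1 - lam) * x2 is NOT in the set, i.e. a violation of (1)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1, x2 = sample(rng), sample(rng)
        if not (member(x1) and member(x2)):
            continue  # both endpoints must belong to the set
        lam = rng.random()
        x = (lam * x1[0] + (1 - lam) * x2[0], lam * x1[1] + (1 - lam) * x2[1])
        if not member(x):
            return x1, x2, lam
    return None

print(find_violation(in_disk, sample_square))                 # None
print(find_violation(in_annulus, sample_square) is not None)  # True
```

For the annulus, `find_violation` returns a triple `(x1, x2, lam)` because some segments between two points of the ring pass through the hole.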

Theorem 1 (separating hyperplane theorem): Let $C$ and $D$ be two disjoint convex sets, i.e. $C \cap D=\emptyset$. Then there exist a vector $a \neq 0$ and a constant $b$ such that
$$\left\{\begin{array}{ll} a^{\mathrm{T}} x \leq b & \forall x \in C \\ a^{\mathrm{T}} x \geq b & \forall x \in D \end{array}\right. \tag{2}$$
[Figure 2: a hyperplane $a^{\mathrm{T}}x=b$ separating $C$ and $D$]

Proof of Theorem 1:

Definition 2: The distance between the sets $C$ and $D$ is defined (here as a squared distance) by
$$\operatorname{dist}(C, D)=\inf _{u \in C, v \in D}\|u-v\|^{2} \tag{3}$$
Assume that $c \in C$ and $d \in D$ attain this minimum, i.e. $\operatorname{dist}(C, D)=\|c-d\|^{2}$ (this holds, for example, when $C$ and $D$ are closed and one of them is bounded). Since $C \cap D=\emptyset$, we have $c \neq d$. Let $a=c-d$ and $b=\frac{\|c\|^{2}-\|d\|^{2}}{2}$; the hyperplane $(c-d)^{\mathrm{T}} x-\frac{\|c\|^{2}-\|d\|^{2}}{2}=0$ is exactly the perpendicular bisector of the segment joining $c$ and $d$. We now show: ① for every $u \in C$, $a^{\mathrm{T}} u-b \geq 0$; ② for every $v \in D$, $a^{\mathrm{T}} v-b \leq 0$. This places $C$ and $D$ on opposite sides of the hyperplane; replacing $(a, b)$ with $(-a, -b)$ gives exactly the form (2).

Proof by contradiction: suppose there is some $u \in C$ such that
$$\begin{array}{l} a^{\mathrm{T}} u-b<0 \\ (c-d)^{\mathrm{T}} u-\frac{\|c\|^{2}-\|d\|^{2}}{2}<0 \\ (c-d)^{\mathrm{T}}\left(u-\frac{1}{2}(c+d)\right)<0 \\ (c-d)^{\mathrm{T}}\left((u-c)+\frac{1}{2}(c-d)\right)<0 \\ (c-d)^{\mathrm{T}}(u-c)+\frac{1}{2}\|c-d\|^{2}<0 \end{array}$$
Since $\frac{1}{2}\|c-d\|^{2}>0$, it follows that $(c-d)^{\mathrm{T}}(u-c)<0$. Now take a point $p$ on the segment between $u$ and $c$, i.e. $p=\lambda u+(1-\lambda) c$ with $0 \leq \lambda \leq 1$. Because $C$ is convex, $p \in C$. Compute $\|p-d\|^{2}$:
$$\begin{aligned}\|p-d\|^{2} &=\|\lambda u+(1-\lambda) c-d\|^{2} \\ &=\|(c-d)+\lambda(u-c)\|^{2} \\ &=\|c-d\|^{2}+2 \lambda(c-d)^{\mathrm{T}}(u-c)+\lambda^{2}\|u-c\|^{2} \\ &=\|c-d\|^{2}+\lambda\left[2(c-d)^{\mathrm{T}}(u-c)+\lambda\|u-c\|^{2}\right] \end{aligned}$$
Since $(c-d)^{\mathrm{T}}(u-c)<0$ (which also implies $u \neq c$), the bracket is negative whenever $\lambda$ is a small enough positive number, namely
$$0<\lambda<-\frac{2(c-d)^{\mathrm{T}}(u-c)}{\|u-c\|^{2}} \tag{4}$$
For such $\lambda$ (also taking $\lambda \leq 1$) we get $\|p-d\|^{2}<\|c-d\|^{2}$ with $p \in C$, which contradicts the assumption that $c$ and $d$ attain the minimum distance (3). This proves ①; ② is proved in the same way. $\blacksquare$

Theorem 2: Let $c$ be a nonzero vector, i.e. $\|c\|^{2}>0$. Then for any $\varepsilon>0$ there exists a vector $x$ such that ① $\|x\|^{2} \leq \varepsilon$; ② $c^{\mathrm{T}} x>0$.

Proof of Theorem 2: Take $x=\frac{\sqrt{\varepsilon}}{\|c\|} c$. Then $\|x\|^{2}=\varepsilon$ and $c^{\mathrm{T}} x=\sqrt{\varepsilon}\,\|c\|>0$. Likewise, $-x$ satisfies ① $\|-x\|^{2} \leq \varepsilon$ and ② $c^{\mathrm{T}}(-x)<0$; this second form is the one used in Section 3. $\blacksquare$
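The vector in Theorem 2 is easy to build explicitly: scaling $c$ by $\pm\sqrt{\varepsilon}/\|c\|$ gives $\|x\|^{2}=\varepsilon$ exactly. This short sketch (helper name and test values are illustrative assumptions) constructs both signs and checks the two properties.

```python
import math

def small_step(c, eps, sign=1):
    """Return x with ||x||^2 = eps and sign * c^T x > 0, following the proof of Theorem 2."""
    norm_c = math.sqrt(sum(ci * ci for ci in c))
    if norm_c == 0:
        raise ValueError("c must be non-zero")
    scale = sign * math.sqrt(eps) / norm_c  # x = sign * (sqrt(eps) / ||c||) * c
    return [scale * ci for ci in c]

c = [3.0, -4.0]                       # ||c|| = 5
x = small_step(c, 1e-4)               # c^T x = sqrt(1e-4) * 5 = 0.05
y = small_step(c, 1e-4, sign=-1)      # the "-x" version: c^T y = -0.05
print(sum(xi * xi for xi in x))                  # approximately 1e-4
print(sum(ci * xi for ci, xi in zip(c, x)))      # approximately 0.05
print(sum(ci * yi for ci, yi in zip(c, y)))      # approximately -0.05
```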

2. The Dual Problem

Primal problem:
$$\begin{aligned} &\min _{w} f(w)\\ \text { s.t. }& g_{i}(w) \leq 0, \quad i=1,2, \dots, K\\ &h_{j}(w)=0, \quad j=1,2, \dots, M \end{aligned} \tag{5}$$
Dual problem:

First define the Lagrangian:

$$\mathcal{L}(w, \alpha, \beta)=f(w)+\sum_{i=1}^{K} \alpha_{i} g_{i}(w)+\sum_{j=1}^{M} \beta_{j} h_{j}(w) \tag{6}$$
From the Lagrangian we obtain the dual problem:
$$\begin{aligned} \max _{\alpha, \beta}\ & \theta(\alpha, \beta)=\inf _{w} \mathcal{L}(w, \alpha, \beta) \\ \text { s.t. }\ & \alpha_{i} \geq 0, \quad i=1,2, \dots, K \end{aligned} \tag{7}$$
Theorem 3 (weak duality): If $w^{*}$ is a solution of the primal problem and $\left(\alpha^{*}, \beta^{*}\right)$ is a solution of the dual problem, then
$$\theta\left(\alpha^{*}, \beta^{*}\right) \leq f\left(w^{*}\right) \tag{8}$$
Proof of Theorem 3:
$$\begin{aligned} \theta\left(\alpha^{*}, \beta^{*}\right) &=\inf _{w} \mathcal{L}\left(w, \alpha^{*}, \beta^{*}\right) \\ & \leq \mathcal{L}\left(w^{*}, \alpha^{*}, \beta^{*}\right) \\ &=f\left(w^{*}\right)+\sum_{i=1}^{K} \alpha_{i}^{*} g_{i}\left(w^{*}\right)+\sum_{j=1}^{M} \beta_{j}^{*} h_{j}\left(w^{*}\right) \\ & \leq f\left(w^{*}\right) \end{aligned}$$
The first inequality holds because an infimum is no larger than the value at $w^{*}$. The last holds because $w^{*}$ is feasible ($g_{i}(w^{*}) \leq 0$, $h_{j}(w^{*})=0$) and $\alpha_{i}^{*} \geq 0$, so every term of the first sum is $\leq 0$ and every term of the second sum is $0$. $\blacksquare$
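Weak duality (8) can be checked on a small example with both kinds of constraints. The sketch below assumes the hypothetical problem $\min\ w_1^2+w_2^2$ subject to $-w_1 \leq 0$ and $w_1+w_2-1=0$, whose solution is $w^*=(\frac12,\frac12)$ with $f(w^*)=\frac12$. Because the Lagrangian is a convex quadratic in $w$, setting its gradient to zero gives $\theta$ in closed form. The proof above only uses $\alpha \geq 0$, so $\theta(\alpha,\beta) \leq f(w^*)$ should hold for every dual-feasible pair, not just the optimal one; the code samples random feasible pairs.

```python
import random

# Hypothetical example: minimise f(w) = w1^2 + w2^2
# subject to g(w) = -w1 <= 0 and h(w) = w1 + w2 - 1 = 0.
# The primal optimum is w* = (1/2, 1/2) with f(w*) = 1/2.
F_STAR = 0.5

def theta(alpha, beta):
    # L = w1^2 + (beta - alpha) * w1 + w2^2 + beta * w2 - beta is minimised at
    # w1 = (alpha - beta) / 2, w2 = -beta / 2, which gives this closed form:
    return -((beta - alpha) ** 2) / 4 - beta ** 2 / 4 - beta

rng = random.Random(0)
samples = [(rng.uniform(0, 5), rng.uniform(-5, 5)) for _ in range(10_000)]  # alpha >= 0
print(all(theta(a, b) <= F_STAR + 1e-12 for a, b in samples))  # True: weak duality (8)
print(theta(0.0, -1.0))                                        # 0.5: the bound is attained
```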

Definition 3 (convex function): $f(w)$ is convex if, for all $w_{1}, w_{2}$ and all $\lambda \in[0,1]$,
$$f\left(\lambda w_{1}+(1-\lambda) w_{2}\right) \leq \lambda f\left(w_{1}\right)+(1-\lambda) f\left(w_{2}\right) \tag{9}$$
[Figure 3: illustration of a convex function]
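Inequality (9) can be probed the same way as (1). This sketch assumes two illustrative functions on $\mathbb{R}^2$: $\|w\|^2$, which is convex, and $-w_1^2+w_2^2$, which is not. As with sets, random sampling can only find a violation of (9), not prove convexity.

```python
import random

def convex_f(w):
    # ||w||^2: a convex function
    return w[0] ** 2 + w[1] ** 2

def nonconvex_f(w):
    # -w1^2 + w2^2: not convex (concave along the w1 direction)
    return -w[0] ** 2 + w[1] ** 2

def violates_9(f, trials=10_000, seed=0, tol=1e-12):
    """Search for w1, w2, lam that break inequality (9); return True if one is found."""
    rng = random.Random(seed)
    for _ in range(trials):
        w1 = (rng.uniform(-3, 3), rng.uniform(-3, 3))
        w2 = (rng.uniform(-3, 3), rng.uniform(-3, 3))
        lam = rng.random()
        mid = (lam * w1[0] + (1 - lam) * w2[0], lam * w1[1] + (1 - lam) * w2[1])
        if f(mid) > lam * f(w1) + (1 - lam) * f(w2) + tol:
            return True
    return False

print(violates_9(convex_f))     # False
print(violates_9(nonconvex_f))  # True
```

The tolerance `tol` keeps floating-point rounding from being mistaken for a violation.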

3. Proof of Strong Duality

Theorem 4 (strong duality theorem): Suppose $f(w)$, $g_{i}(w)$, $h_{j}(w)$ satisfy:

① $f(w)$ is convex;

② every $g_{i}(w)$ is convex;

③ every $h_{j}(w)$ is affine, i.e. $h_{j}(w)=c_{j}^{\mathrm{T}} w+d_{j}$;

④ Slater's condition: there exists a $w \in D$ such that $g_i(w)<0$ for all $i$ and $h_j(w)=0$ for all $j$;

⑤ the domain $D$ of $w$ is open, i.e. for every $w \in D$ there is a neighborhood $N(w, \varepsilon) \subseteq D$;

⑥ the domain $D$ of $w$ is convex.

Then $f\left(w^{*}\right)=\theta\left(\alpha^{*}, \beta^{*}\right)$.
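Among the hypotheses, Slater's condition ④ is the one most often checked by hand: it suffices to exhibit a single strictly feasible point. The sketch below (hypothetical helper and examples) checks a candidate point, and then shows, with the problem $\min_w w$ subject to $w^2 \leq 0$, what can go wrong without a Slater point: the only feasible point is $w=0$, the dual function is $\theta(\alpha)=\inf_w (w+\alpha w^2)=-\frac{1}{4\alpha}$ for $\alpha>0$, and it approaches the primal value $0$ without ever reaching it, so no dual solution $(\alpha^*,\beta^*)$ exists. The small grid only illustrates; $w^2<0$ has no solution at all.

```python
def satisfies_slater_at(w, ineqs, eqs, tol=1e-12):
    """Condition 4 at a candidate w: every g_i(w) < 0 and every h_j(w) = 0 (up to tol)."""
    return all(g(w) < 0 for g in ineqs) and all(abs(h(w)) <= tol for h in eqs)

# Hypothetical problem with a Slater point: g(w) = -w1 <= 0, h(w) = w1 + w2 - 1 = 0.
def g1(w):
    return -w[0]

def h1(w):
    return w[0] + w[1] - 1

print(satisfies_slater_at((0.5, 0.5), [g1], [h1]))  # True

# Hypothetical problem without one: g(w) = w^2 <= 0 holds only at w = 0, where g(w) = 0, not < 0.
def g2(w):
    return w[0] ** 2

print(any(satisfies_slater_at((k / 10,), [g2], []) for k in range(-10, 11)))  # False

# With f(w) = w, the dual function of the second problem is theta(alpha) = -1 / (4 * alpha)
# for alpha > 0 (and -inf at alpha = 0): it tends to the primal value 0 but never reaches it.
def theta2(alpha):
    return -1.0 / (4.0 * alpha)

print([theta2(a) for a in (1.0, 10.0, 100.0)])  # [-0.25, -0.025, -0.0025]
```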

Proof of the strong duality theorem:

Construct the set
$$A=\left\{(u, v, t) \mid \exists w \in D \text{ such that } g_{i}(w) \leq u_{i}\ (i=1, \dots, K),\ h_{j}(w)=v_{j}\ (j=1, \dots, M),\ f(w) \leq t\right\} \tag{10}$$
and define
$$g(w)=\left[\begin{array}{c} g_{1}(w) \\ g_{2}(w) \\ \vdots \\ g_{K}(w) \end{array}\right], \quad h(w)=\left[\begin{array}{c} h_{1}(w) \\ h_{2}(w) \\ \vdots \\ h_{M}(w) \end{array}\right] \tag{11}$$
Note: ① if $w \in D$, then $(g(w), h(w), f(w)) \in A$ (take every inequality in the definition with equality); ② if $w \in D$, then $(u, h(w), t) \in A$ for any $u \succcurlyeq g(w)$ and $t \geq f(w)$, so the components $u_i$ and $t$ can be made arbitrarily large (informally, $(+\infty, h(w),+\infty) \in A$).

Lemma 1: If $D$ is convex, the $g_{i}(w)$ $(i=1,2, \dots, K)$ are convex, the $h_{j}(w)$ are affine, i.e. $h_{j}(w)=c_{j}^{\mathrm{T}} w+d_{j}$, and $f(w)$ is convex, then $A$ is convex.

Proof:

Let $\left(u_{1}, v_{1}, t_{1}\right),\left(u_{2}, v_{2}, t_{2}\right) \in A$. We must show that for $0 \leq \lambda \leq 1$,
$$\left(\lambda u_{1}+(1-\lambda) u_{2}, \lambda v_{1}+(1-\lambda) v_{2}, \lambda t_{1}+(1-\lambda) t_{2}\right) \in A \tag{12}$$
① Since $\left(u_{1}, v_{1}, t_{1}\right) \in A$, there exists $w_{1} \in D$ with $g_{i}\left(w_{1}\right) \leq u_{1,i}$, $h_{j}\left(w_{1}\right)=v_{1,j}$, $f\left(w_{1}\right) \leq t_{1}$; likewise, since $\left(u_{2}, v_{2}, t_{2}\right) \in A$, there exists $w_{2} \in D$ with $g_{i}\left(w_{2}\right) \leq u_{2,i}$, $h_{j}\left(w_{2}\right)=v_{2,j}$, $f\left(w_{2}\right) \leq t_{2}$.

② Let $w^{\prime}=\lambda w_{1}+(1-\lambda) w_{2}$. Since $D$ is convex, $w^{\prime} \in D$. Since each $g_i(w)$ is convex, $g_{i}\left(w^{\prime}\right) \leq \lambda g_{i}\left(w_{1}\right)+(1-\lambda) g_{i}\left(w_{2}\right) \leq \lambda u_{1, i}+(1-\lambda) u_{2, i}$; similarly, the convexity of $f$ gives $f\left(w^{\prime}\right) \leq \lambda t_{1}+(1-\lambda) t_{2}$.

③ Since each $h_j(w)$ is affine,
$$\begin{aligned} h_{j}\left(w^{\prime}\right) &=c_{j}^{\mathrm{T}} w^{\prime}+d_{j} \\ &=\lambda\left(c_{j}^{\mathrm{T}} w_{1}+d_{j}\right)+(1-\lambda)\left(c_{j}^{\mathrm{T}} w_{2}+d_{j}\right) \\ &=\lambda h_{j}\left(w_{1}\right)+(1-\lambda) h_{j}\left(w_{2}\right) \\ &=\lambda v_{1, j}+(1-\lambda) v_{2, j} \end{aligned}$$

By ①②③, $w^{\prime}$ witnesses that the point in (12) lies in $A$. This proves Lemma 1. $\blacksquare$
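The proof of Lemma 1 is constructive: the witness for the convex combination is $w'=\lambda w_1+(1-\lambda)w_2$. The sketch below checks this on random points of $A$ for a hypothetical instance with $D=\mathbb{R}^2$, $f(w)=w_1^2+w_2^2$, $g_1(w)=-w_1$, and $h_1(w)=w_1+w_2-1$; a tolerance absorbs floating-point rounding in the equality constraint.

```python
import random

# Hypothetical instance satisfying the lemma's hypotheses (D = R^2).
def f(w):
    return w[0] ** 2 + w[1] ** 2      # convex

def g(w):
    return [-w[0]]                    # convex (here affine), K = 1

def h(w):
    return [w[0] + w[1] - 1]          # affine, M = 1

def witnesses(w, point, tol=1e-9):
    """Check that w certifies point = (u, v, t) in A, per definition (10)."""
    u, v, t = point
    return (all(gi <= ui + tol for gi, ui in zip(g(w), u))
            and all(abs(hj - vj) <= tol for hj, vj in zip(h(w), v))
            and f(w) <= t + tol)

rng = random.Random(0)

def random_point_of_A():
    # pick w, then loosen the inequalities by random non-negative slack
    w = (rng.uniform(-2, 2), rng.uniform(-2, 2))
    u = [gi + rng.random() for gi in g(w)]
    t = f(w) + rng.random()
    return w, (u, h(w), t)

ok = True
for _ in range(1000):
    (w1, p1), (w2, p2) = random_point_of_A(), random_point_of_A()
    lam = rng.random()
    mix = ([lam * x + (1 - lam) * y for x, y in zip(p1[0], p2[0])],
           [lam * x + (1 - lam) * y for x, y in zip(p1[1], p2[1])],
           lam * p1[2] + (1 - lam) * p2[2])
    # the witness from the proof: w' = lam * w1 + (1 - lam) * w2
    w_prime = (lam * w1[0] + (1 - lam) * w2[0], lam * w1[1] + (1 - lam) * w2[1])
    ok = ok and witnesses(w_prime, mix)
print(ok)  # True
```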

By definition (10), the optimal value of the primal problem is
$$f\left(w^{*}\right)=\min _{(0,0, t) \in A} t \tag{13}$$
Define another set $B=\left\{(0,0, s) \mid s<f\left(w^{*}\right)\right\}$. $B$ is convex (it is a ray), and $A \cap B=\emptyset$: if some $(0,0,s) \in A$ had $s<f(w^{*})$, there would be a feasible $w$ with $f(w) \leq s<f(w^{*})$, contradicting the optimality of $w^{*}$.
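For a concrete picture of $A$, (13), and $B$, take the hypothetical one-dimensional problem $\min_w w^2$ subject to $1-w \leq 0$ (so $K=1$, $M=0$, $D=\mathbb{R}$, $w^*=1$, and there is no $v$ component). For a fixed $u$, the smallest admissible $t$ is $\min_{w \geq 1-u} w^2$, which gives:

```latex
\begin{aligned}
A &= \{(u,t) \mid \exists w \in \mathbb{R}:\ 1-w \le u,\ w^{2} \le t\}
   = \{(u,t) \mid t \ge (\max\{1-u,\,0\})^{2}\}, \\
(0,t) \in A &\iff t \ge 1
  \quad\Longrightarrow\quad f(w^{*}) = \min_{(0,t)\in A} t = 1, \\
B &= \{(0,s) \mid s < 1\}, \qquad A \cap B = \emptyset .
\end{aligned}
```

In this example $A$ is the region above a convex curve, in line with Lemma 1, and $B$ is the part of the $t$-axis strictly below $A$.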

By Theorem 1 (the separating hyperplane theorem), there exist $(\alpha, \beta, \eta) \neq 0$ and $b$ such that: ① if $(u, v, t) \in A$, then $\alpha^{\mathrm{T}} u+\beta^{\mathrm{T}} v+\eta t \geq b$; ② if $(u, v, t) \in B$, then $\alpha^{\mathrm{T}} u+\beta^{\mathrm{T}} v+\eta t \leq b$. Since every point of $B$ has $u=0$ and $v=0$, ② reduces to $\eta t \leq b$.

Lemma 2: If $\alpha^{\mathrm{T}} u+\beta^{\mathrm{T}} v+\eta t \geq b$ for all $(u, v, t) \in A$, then
$$\alpha=\left[\alpha_{1}, \alpha_{2}, \ldots, \alpha_{K}\right]^{\mathrm{T}} \succcurlyeq 0, \quad \eta \geq 0 \tag{14}$$
Proof:

Suppose some $\alpha_{i}<0$. By note ② above, we can fix a point of $A$ and let its component $u_{i} \to +\infty$ while the point stays in $A$; then $\alpha^{\mathrm{T}} u+\beta^{\mathrm{T}} v+\eta t \to -\infty$, contradicting $\alpha^{\mathrm{T}} u+\beta^{\mathrm{T}} v+\eta t \geq b$. The same argument with $t \to +\infty$ shows $\eta \geq 0$. $\blacksquare$

From the definition of $A$ and ① (applied to the point $(g(w), h(w), f(w)) \in A$), for all $w \in D$, $\sum_{i=1}^{K} \alpha_{i} g_{i}(w)+\sum_{j=1}^{M} \beta_{j} h_{j}(w)+\eta f(w) \geq b$. From the definition of $B$ and ②, $\eta s \leq b$ for every $s<f(w^{*})$; letting $s \to f(w^{*})$ gives $\eta f\left(w^{*}\right) \leq b$. Hence
$$\sum_{i=1}^{K} \alpha_{i} g_{i}(w)+\sum_{j=1}^{M} \beta_{j} h_{j}(w)+\eta f(w) \geq b \geq \eta f\left(w^{*}\right) \tag{15}$$
We now consider two cases.

Case 1: $\eta \neq 0$. By Lemma 2, $\eta>0$, so dividing (15) by $\eta$ gives
$$f\left(w^{*}\right) \leq \sum_{i=1}^{K} \frac{\alpha_{i}}{\eta} g_{i}(w)+\sum_{j=1}^{M} \frac{\beta_{j}}{\eta} h_{j}(w)+f(w)=\mathcal{L}\left(w, \frac{\alpha}{\eta}, \frac{\beta}{\eta}\right) \tag{16}$$
Since $w \in D$ is arbitrary,
$$f\left(w^{*}\right) \leq \inf _{w} \mathcal{L}\left(w, \frac{\alpha}{\eta}, \frac{\beta}{\eta}\right)=\theta\left(\frac{\alpha}{\eta}, \frac{\beta}{\eta}\right) \tag{17}$$
Since $\alpha \succcurlyeq 0$ and $\eta>0$, we have $\frac{\alpha}{\eta} \succcurlyeq 0$, so $\left(\frac{\alpha}{\eta}, \frac{\beta}{\eta}\right)$ is feasible for the dual problem. Because $\left(\alpha^{*}, \beta^{*}\right)$ maximizes $\theta$ over the feasible set,
$$f\left(w^{*}\right) \leq \theta\left(\frac{\alpha}{\eta}, \frac{\beta}{\eta}\right) \leq \theta\left(\alpha^{*}, \beta^{*}\right) \tag{18}$$
Combined with Theorem 3, $\theta\left(\alpha^{*}, \beta^{*}\right) \leq f\left(w^{*}\right)$, so $f\left(w^{*}\right)=\theta\left(\alpha^{*}, \beta^{*}\right)$, as claimed.
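Case 1 can be traced on the hypothetical toy problem $\min_w w^2$ subject to $1-w \leq 0$ (so $w^*=1$, $f(w^*)=1$). There, $(\alpha,\eta,b)=(2,1,1)$ is a valid separating hyperplane, because for any point of $A$, $2u+t \geq 2(1-w)+w^2=(w-1)^2+1 \geq 1$, while every point $(0,s)$ of $B$ has $s<1$. The recovered multiplier $\alpha/\eta=2$ is exactly the maximizer of the dual function $\theta(\alpha)=\alpha-\alpha^2/4$ of this problem. The sketch below uses these hand-derived values and random sampling as a spot check only.

```python
import random

# Hypothetical toy problem: minimise f(w) = w^2 subject to g(w) = 1 - w <= 0.
# Here w* = 1, f(w*) = 1, and A = {(u, t) : exists w with g(w) <= u, f(w) <= t}.
def f(w):
    return w * w

def g(w):
    return 1.0 - w

alpha, eta, b = 2.0, 1.0, 1.0   # hand-derived hyperplane alpha*u + eta*t = b

rng = random.Random(0)

def random_point_of_A():
    # choose any w, then any u >= g(w) and t >= f(w), as in definition (10)
    w = rng.uniform(-5, 5)
    return g(w) + rng.expovariate(1.0), f(w) + rng.expovariate(1.0)

# ①: alpha*u + eta*t >= b on A (small tolerance for rounding)
on_A = all(alpha * u + eta * t >= b - 1e-12
           for u, t in (random_point_of_A() for _ in range(10_000)))
# ②: eta*s <= b on the points (0, s) of B, i.e. s < f(w*) = 1
on_B = all(eta * s <= b for s in (1.0 - k / 100 for k in range(1, 300)))
print(on_A, on_B, alpha / eta)   # True True 2.0
```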

Case 2: $\eta=0$. Then (15) gives, for all $w \in D$ (note that $b \geq \eta f(w^{*})=0$),
$$\sum_{i=1}^{K} \alpha_{i} g_{i}(w)+\sum_{j=1}^{M} \beta_{j} h_{j}(w) \geq 0 \tag{19}$$
By condition ④ of Theorem 4 (Slater's condition), there is a $\tilde{w} \in D$ with $g_{i}(\tilde{w})<0$ for all $i$ and $h_{j}(\tilde{w})=0$ for all $j$. Substituting $\tilde{w}$ into (19) gives $\sum_{i=1}^{K} \alpha_{i} g_{i}(\tilde{w}) \geq 0$; since every $\alpha_i \geq 0$ (Lemma 2) and every $g_i(\tilde{w})<0$, this forces $\alpha_{i}=0$ for all $i$. Formula (19) therefore becomes
$$\sum_{j=1}^{M} \beta_{j} h_{j}(w) \geq 0, \text{ or equivalently } \beta^{\mathrm{T}} h(w) \geq 0 \tag{20}$$
By condition ③ of Theorem 4, $h(w)=c w+d$, where $c$ is the matrix whose $j$-th row is $c_{j}^{\mathrm{T}}$ and $d=[d_1, \ldots, d_M]^{\mathrm{T}}$. Substituting gives
$$\begin{array}{l} \beta^{\mathrm{T}} h(w) \geq 0 \\ \beta^{\mathrm{T}} c w+\beta^{\mathrm{T}} d \geq 0 \end{array} \tag{21}$$
Let $P=\beta^{\mathrm{T}} c$ (a row vector) and $q=\beta^{\mathrm{T}} d$ (a scalar). Then (21) can be rewritten as
$$P w+q \geq 0 \tag{22}$$
Note that (22) holds for every $w \in D$. At the Slater point, $c \tilde{w}+d=h(\tilde{w})=0$, hence $P \tilde{w}+q=0$.

We now show that there is a point $w^{\prime}=\tilde{w}+\Delta w$, with $\Delta w$ in a neighborhood $N(0, \varepsilon)$ of the origin, such that $P w^{\prime}+q<0$.

Proof: By Theorem 1 the vector $(\alpha, \beta, \eta)$ is nonzero; since $\alpha=0$ and $\eta=0$, we must have $\beta \neq 0$. Assuming, as is standard, that the equality constraints are linearly independent (the rows of $c$ are linearly independent), this gives $P=\beta^{\mathrm{T}} c \neq 0$. By Theorem 2 (applied to $P^{\mathrm{T}}$ with $\varepsilon/2$, using the vector $-x$), there is a $\Delta w$ with $\|\Delta w\|^{2} \leq \varepsilon/2<\varepsilon$ and $P \Delta w<0$. Hence $w^{\prime}=\tilde{w}+\Delta w \in N(\tilde{w}, \varepsilon)$.

By condition ⑤ of Theorem 4 ($D$ is open), $w^{\prime} \in D$ once $\varepsilon$ is small enough. At the same time,
$$\begin{aligned} P w^{\prime}+q &=P(\tilde{w}+\Delta w)+q \\ &=(P \tilde{w}+q)+P \Delta w \\ &=P \Delta w<0 \end{aligned} \tag{23}$$
This contradicts (22), so Case 2 cannot occur.
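The mechanism behind (23) is that an affine function with a nonzero linear part that vanishes at a point takes negative values arbitrarily close to that point. The sketch below uses made-up values for $P$, $q$, and a point $\tilde{w}$ with $P\tilde{w}+q=0$ (assumptions for illustration only) and builds $\Delta w$ as in the proof of Theorem 2.

```python
import math

# Hypothetical data: P = beta^T c (assumed non-zero) and q = beta^T d in R^2.
P = [1.0, -2.0]
q = 3.0
w_tilde = [1.0, 2.0]     # a point with P w_tilde + q = 1 - 4 + 3 = 0

def affine(w):
    # P w + q
    return sum(p * wi for p, wi in zip(P, w)) + q

eps = 1e-6
norm_P = math.sqrt(sum(p * p for p in P))
# Theorem 2, second form: a step with ||dw||^2 = eps and P dw < 0
dw = [-math.sqrt(eps) / norm_P * p for p in P]
w_prime = [wi + di for wi, di in zip(w_tilde, dw)]

print(affine(w_tilde))   # 0.0
print(affine(w_prime))   # negative, about -sqrt(eps) * ||P|| = -0.002236..., contradicting (22)
```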

This completes the proof of Theorem 4 (strong duality). $\blacksquare$
