This post collects my notes on the following reference material:
https://www.bilibili.com/video/BV1dJ411B7gh?p=11
Definition 1 (convex set): A point set $D$ is convex if, for any two points $x_1, x_2 \in D$ and any $0 \leq \lambda \leq 1$, we have:
$$x=\lambda x_{1}+(1-\lambda) x_{2} \in D \tag{1}$$
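As a quick numerical illustration of equation (1), the sketch below samples pairs of points from the closed unit disk and checks that their convex combinations stay inside it. The choice of the unit disk and the helper names (`in_unit_disk`, `convex_combination`, `sample_in_disk`) are assumptions made for this example only; a sampling check can suggest convexity but does not prove it.

```python
import random

def in_unit_disk(p, tol=1e-12):
    """Membership test for the closed unit disk D = {x : ||x||^2 <= 1}."""
    return p[0] ** 2 + p[1] ** 2 <= 1.0 + tol

def convex_combination(x1, x2, lam):
    """Return lam * x1 + (1 - lam) * x2, the point x in equation (1)."""
    return tuple(lam * a + (1.0 - lam) * b for a, b in zip(x1, x2))

def sample_in_disk(rng):
    """Rejection-sample a point of the unit disk from the enclosing square."""
    while True:
        p = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        if in_unit_disk(p):
            return p

rng = random.Random(0)
all_inside = True
for _ in range(1000):
    x1, x2 = sample_in_disk(rng), sample_in_disk(rng)
    lam = rng.random()  # lam in [0, 1)
    if not in_unit_disk(convex_combination(x1, x2, lam)):
        all_inside = False

print(all_inside)
```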
The following are examples of convex sets: (figure: examples of convex sets)
Theorem 1 (separating hyperplane theorem): Let $C$ and $D$ be two disjoint convex sets, i.e. $C \cap D=\emptyset$. Then there exist a vector $a \neq 0$ and a constant $b$ such that
$$\left\{\begin{array}{ll} a^{\mathrm{T}} x \leq b & \forall x \in C \\ a^{\mathrm{T}} x \geq b & \forall x \in D \end{array}\right. \tag{2}$$
Proof of Theorem 1:
Definition 2: The (squared) distance between the point sets $C$ and $D$ is defined as:
$$\operatorname{dist}(C, D)=\inf _{u \in C, v \in D}\|u-v\|^{2} \tag{3}$$
Assume that $c \in C$ and $d \in D$ attain this minimum distance, i.e. $\operatorname{dist}(C, D)=\|c-d\|^{2}$. Let $a=c-d$ and $b=\frac{\|c\|^{2}-\|d\|^{2}}{2}$ (in fact, $(c-d)^{\mathrm{T}} x-\frac{\|c\|^{2}-\|d\|^{2}}{2}=0$ is the perpendicular bisector hyperplane of the segment joining $c$ and $d$). We now prove: ① for every $u \in C$, $a^{\mathrm{T}} u-b \geq 0$; ② for every $v \in D$, $a^{\mathrm{T}} v-b \leq 0$. Replacing $(a, b)$ by $(-a,-b)$ then gives exactly the form in (2).
Proof by contradiction: suppose there exists a $u \in C$ such that
$$\begin{array}{l} a^{\mathrm{T}} u-b < 0 \\ (c-d)^{\mathrm{T}} u-\frac{\|c\|^{2}-\|d\|^{2}}{2} < 0 \\ (c-d)^{\mathrm{T}}\left(u-\frac{1}{2}(c+d)\right) < 0 \\ (c-d)^{\mathrm{T}}\left((u-c)+\frac{1}{2}(c-d)\right) < 0 \\ (c-d)^{\mathrm{T}}(u-c)+\frac{1}{2}\|c-d\|^{2} < 0 \end{array}$$
Since $\frac{1}{2}\|c-d\|^{2} \geq 0$, it follows that $(c-d)^{\mathrm{T}}(u-c) < -\frac{1}{2}\|c-d\|^{2} \leq 0$. Now take a point $p$ on the segment between $u$ and $c$, i.e. $p=\lambda u+(1-\lambda) c$ with $0 \leq \lambda \leq 1$. Because $C$ is convex, $p \in C$. We now compute $\|p-d\|^{2}$:
$$\begin{aligned}\|p-d\|^{2} &=\|\lambda u+(1-\lambda) c-d\|^{2} \\ &=\|(c-d)+\lambda(u-c)\|^{2} \\ &=\|c-d\|^{2}+2 \lambda(c-d)^{\mathrm{T}}(u-c)+\lambda^{2}\|u-c\|^{2} \\ &=\|c-d\|^{2}+\lambda\left[2(c-d)^{\mathrm{T}}(u-c)+\lambda\|u-c\|^{2}\right] \end{aligned}$$
Since $(c-d)^{\mathrm{T}}(u-c) < 0$ (which in particular forces $u \neq c$), choose $\lambda$ to be a sufficiently small positive number, namely $0 < \lambda \leq 1$ with
$$\lambda < -\frac{2(c-d)^{\mathrm{T}}(u-c)}{\|u-c\|^{2}} \tag{4}$$
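Then the bracketed term in the last line of the expansion is negative, so $\|p-d\|^{2} < \|c-d\|^{2}$ with $p \in C$. This contradicts the assumption that $c$ and $d$ attain the minimum distance of Definition 2, so ① holds. The proof of ② is analogous. $\blacksquare$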
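The construction in the proof can be tried on a small example. The sketch below assumes two hypothetical sets, the unit disk $C$ centered at the origin and the unit disk $D$ centered at $(4, 0)$, whose closest points are $c=(1,0)$ and $d=(3,0)$. It builds $a=c-d$ and $b=\frac{\|c\|^{2}-\|d\|^{2}}{2}$ as in the proof and checks the signs of $a^{\mathrm{T}} x-b$ on random samples; the sets and helper names are illustrative assumptions, not part of the original notes.

```python
import math
import random

def dot(p, q):
    """Euclidean inner product of two equal-length tuples."""
    return sum(pi * qi for pi, qi in zip(p, q))

def sample_disk(center, radius, rng):
    """Sample a point of the disk with the given center and radius."""
    r = radius * math.sqrt(rng.random())
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return (center[0] + r * math.cos(phi), center[1] + r * math.sin(phi))

# Assumed example: C = unit disk at (0, 0), D = unit disk at (4, 0).
c = (1.0, 0.0)  # point of C closest to D
d = (3.0, 0.0)  # point of D closest to C

a = tuple(ci - di for ci, di in zip(c, d))  # a = c - d
b = (dot(c, c) - dot(d, d)) / 2.0           # b = (||c||^2 - ||d||^2) / 2

rng = random.Random(1)
min_over_C = min(dot(a, sample_disk((0.0, 0.0), 1.0, rng)) - b for _ in range(2000))
max_over_D = max(dot(a, sample_disk((4.0, 0.0), 1.0, rng)) - b for _ in range(2000))

print(a, b)
print(min_over_C >= 0.0, max_over_D <= 0.0)
```

Here the hyperplane $a^{\mathrm{T}} x=b$ is the vertical line $x_1=2$, halfway between the two disks, which matches the "perpendicular bisector" picture in the proof.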
Theorem 2: If $c$ is a nonzero vector, i.e. $\|c\|^{2} > 0$, then for any $\varepsilon > 0$ there exists a vector $x$ satisfying: ① $\|x\|^{2} \leq \varepsilon$; ② $c^{\mathrm{T}} x > 0$.
Proof of Theorem 2: Take $x=\frac{\sqrt{\varepsilon}}{\|c\|} c$. Then $\|x\|^{2}=\varepsilon$ and $c^{\mathrm{T}} x=\sqrt{\varepsilon}\,\|c\| > 0$. Likewise, the vector $-x$ satisfies ① $\|-x\|^{2} \leq \varepsilon$ and ② $c^{\mathrm{T}}(-x) < 0$, which is the form used later in the proof of Theorem 4. $\blacksquare$
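A minimal sketch of this construction, assuming the scaling $x=\frac{\sqrt{\varepsilon}}{\|c\|} c$ (chosen so that $\|x\|^{2}$ equals $\varepsilon$ exactly); the function name and the sample values of `c` and `eps` are hypothetical:

```python
import math

def small_vector_with_positive_inner_product(c, eps):
    """Return x = (sqrt(eps) / ||c||) * c, so that ||x||^2 = eps and c^T x > 0."""
    norm_c = math.sqrt(sum(ci * ci for ci in c))
    if norm_c == 0.0:
        raise ValueError("c must be a nonzero vector")
    scale = math.sqrt(eps) / norm_c
    return [scale * ci for ci in c]

c = [3.0, -4.0]  # example nonzero vector with ||c|| = 5
eps = 1e-4
x = small_vector_with_positive_inner_product(c, eps)

norm_sq = sum(xi * xi for xi in x)             # should be close to eps
inner = sum(ci * xi for ci, xi in zip(c, x))   # should be positive
print(norm_sq, inner)
```

Negating `x` gives a vector of the same length whose inner product with `c` is negative.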
Primal problem:
$$\begin{aligned} &\min _{w} f(w)\\ \text { s.t. }\ & g_{i}(w) \leq 0, \quad i=1,2, \dots, K\\ &h_{j}(w)=0, \quad j=1,2, \dots, M \end{aligned} \tag{5}$$
Dual problem:
First define the Lagrangian
$$\mathcal{L}(w, \alpha, \beta)=f(w)+\sum_{i=1}^{K} \alpha_{i} g_{i}(w)+\sum_{j=1}^{M} \beta_{j} h_{j}(w) \tag{6}$$
The dual problem is then derived from the Lagrangian:
$$\begin{aligned} \max _{\alpha, \beta}\ & \theta(\alpha, \beta)=\inf _{w} \mathcal{L}(w, \alpha, \beta) \\ \text { s.t. }\ & \alpha_{i} \geq 0, \quad i=1,2, \dots, K \end{aligned} \tag{7}$$
Theorem 3 (weak duality): If $w^{*}$ is a solution of the primal problem and $\left(\alpha^{*}, \beta^{*}\right)$ is a solution of the dual problem, then:
$$\theta\left(\alpha^{*}, \beta^{*}\right) \leq f\left(w^{*}\right) \tag{8}$$
Proof of Theorem 3:
$$\begin{aligned} \theta\left(\alpha^{*}, \beta^{*}\right) &=\inf _{w} \mathcal{L}\left(w, \alpha^{*}, \beta^{*}\right) \\ & \leq \mathcal{L}\left(w^{*}, \alpha^{*}, \beta^{*}\right) \\ &=f\left(w^{*}\right)+\sum_{i=1}^{K} \alpha_{i}^{*} g_{i}\left(w^{*}\right)+\sum_{j=1}^{M} \beta_{j}^{*} h_{j}\left(w^{*}\right) \\ & \leq f\left(w^{*}\right) \end{aligned}$$
The last inequality holds because $w^{*}$ is feasible for the primal problem, so $g_{i}\left(w^{*}\right) \leq 0$ and $h_{j}\left(w^{*}\right)=0$, while $\alpha_{i}^{*} \geq 0$. $\blacksquare$
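To see the inequality $\theta(\alpha, \beta) \leq f(w)$ on a concrete case, the sketch below uses a hypothetical one-dimensional toy problem that is not from the original notes: minimize $f(w)=w^{2}$ subject to $g(w)=1-w \leq 0$, with no equality constraints. For this problem the Lagrangian $w^{2}+\alpha(1-w)$ is minimized at $w=\alpha / 2$, so the dual function has the closed form $\theta(\alpha)=\alpha-\alpha^{2} / 4$.

```python
def f(w):
    """Objective of the assumed toy problem."""
    return w * w

def g(w):
    """Inequality constraint g(w) <= 0, i.e. w >= 1."""
    return 1.0 - w

def lagrangian(w, alpha):
    return f(w) + alpha * g(w)

def theta(alpha):
    """Dual function: dL/dw = 2w - alpha = 0 gives the minimizer w = alpha / 2."""
    return lagrangian(alpha / 2.0, alpha)

alphas = [0.1 * k for k in range(61)]             # dual-feasible values, alpha >= 0
feasible_ws = [1.0 + 0.1 * k for k in range(41)]  # primal-feasible values, w >= 1

# Weak duality: every dual value is a lower bound on every feasible primal value.
weak_duality_holds = all(
    theta(al) <= f(w) + 1e-12 for al in alphas for w in feasible_ws
)
print(weak_duality_holds)
```

The check compares every dual-feasible $\alpha$ on the grid with every primal-feasible $w$ on the grid, which is stronger than comparing only the two optima.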
Definition 3 (convex function): $f(w)$ is a convex function if, for all $w_{1}, w_{2}$ and all $\lambda \in[0,1]$:
$$f\left(\lambda w_{1}+(1-\lambda) w_{2}\right) \leq \lambda f\left(w_{1}\right)+(1-\lambda) f\left(w_{2}\right) \tag{9}$$
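Inequality (9) is easy to probe numerically. The sketch below checks it on random triples for $f(w)=w^{2}$, and shows one triple where it fails for the concave function $-w^{2}$. The helper name and the test functions are choices made for this illustration; passing a sampled check is evidence of convexity, not a proof.

```python
import random

def satisfies_convexity_inequality(f, w1, w2, lam, tol=1e-9):
    """Check inequality (9) at a single triple (w1, w2, lam)."""
    lhs = f(lam * w1 + (1.0 - lam) * w2)
    rhs = lam * f(w1) + (1.0 - lam) * f(w2)
    return lhs <= rhs + tol

rng = random.Random(0)
triples = [(rng.uniform(-5, 5), rng.uniform(-5, 5), rng.random()) for _ in range(1000)]

# f(w) = w^2 is convex, so (9) should hold at every sampled triple.
square_ok = all(satisfies_convexity_inequality(lambda w: w * w, *t) for t in triples)

# f(w) = -w^2 is concave: at w1 = -1, w2 = 1, lam = 0.5 we get f(0) = 0 > -1.
concave_ok = satisfies_convexity_inequality(lambda w: -w * w, -1.0, 1.0, 0.5)

print(square_ok, concave_ok)  # prints: True False
```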
Theorem 4 (strong duality theorem): Suppose $f(w)$, $g_{i}(w)$ and $h_{j}(w)$ satisfy:
① $f(w)$ is a convex function;
② each $g_{i}(w)$ is a convex function;
③ each $h_{j}(w)$ is an affine function, i.e. $h_{j}(w)=c_{j}^{\mathrm{T}} w+d_{j}$;
④ Slater's condition: there exists a $w$ such that $g_{i}(w) < 0$ for all $i$ and $h_{j}(w)=0$ for all $j$;
⑤ the domain $D$ of $w$ is an open set, i.e. for every $w \in D$ there is a neighborhood $N(w, \varepsilon) \subseteq D$;
⑥ the domain $D$ of $w$ is a convex set.
Then: $f\left(w^{*}\right)=\theta\left(\alpha^{*}, \beta^{*}\right)$.
Proof of the strong duality theorem:
Construct the point set:
$$A=\left\{(u, v, t) \mid \exists w \in D \text { such that } g_{i}(w) \leq u_{i},\ h_{j}(w)=v_{j},\ f(w) \leq t\right\} \tag{10}$$
Define:
$$g(w)=\left[\begin{array}{c} g_{1}(w) \\ g_{2}(w) \\ \vdots \\ g_{K}(w) \end{array}\right], \quad h(w)=\left[\begin{array}{c} h_{1}(w) \\ h_{2}(w) \\ \vdots \\ h_{M}(w) \end{array}\right] \tag{11}$$
Note: ① if $w \in D$, then $(g(w), h(w), f(w)) \in A$ (the conditions in the definition hold with equality); ② if $w \in D$, then $(u, h(w), t) \in A$ for every $u \succcurlyeq g(w)$ and every $t \geq f(w)$, so the components of $u$ and $t$ may be taken arbitrarily large (informally, $(+\infty, h(w),+\infty) \in A$).
Lemma 1: If $D$ is a convex set, each $g_{i}(w)$ is a convex function $(i=1,2, \dots, K)$, each $h_{j}(w)$ is an affine function, i.e. $h_{j}(w)=c_{j}^{\mathrm{T}} w+d_{j}$, and $f(w)$ is a convex function, then $A$ is a convex set.
Proof:
Let $\left(u_{1}, v_{1}, t_{1}\right),\left(u_{2}, v_{2}, t_{2}\right) \in A$. We must show that for $0 \leq \lambda \leq 1$,
$$\left(\lambda u_{1}+(1-\lambda) u_{2},\ \lambda v_{1}+(1-\lambda) v_{2},\ \lambda t_{1}+(1-\lambda) t_{2}\right) \in A \tag{12}$$
① Since $\left(u_{1}, v_{1}, t_{1}\right) \in A$, there exists $w_{1} \in D$ such that $g_{i}\left(w_{1}\right) \leq u_{1, i}$, $h_{j}\left(w_{1}\right)=v_{1, j}$ and $f\left(w_{1}\right) \leq t_{1}$; likewise, since $\left(u_{2}, v_{2}, t_{2}\right) \in A$, there exists $w_{2} \in D$ such that $g_{i}\left(w_{2}\right) \leq u_{2, i}$, $h_{j}\left(w_{2}\right)=v_{2, j}$ and $f\left(w_{2}\right) \leq t_{2}$.
② Let $w^{\prime}=\lambda w_{1}+(1-\lambda) w_{2}$. Since $D$ is convex, $w^{\prime} \in D$. Because $g_{i}(w)$ is convex, $g_{i}\left(w^{\prime}\right) \leq \lambda g_{i}\left(w_{1}\right)+(1-\lambda) g_{i}\left(w_{2}\right) \leq \lambda u_{1, i}+(1-\lambda) u_{2, i}$; similarly, $f\left(w^{\prime}\right) \leq \lambda f\left(w_{1}\right)+(1-\lambda) f\left(w_{2}\right) \leq \lambda t_{1}+(1-\lambda) t_{2}$.
③ Since $h_{j}$ is affine,
$$\begin{aligned} h_{j}\left(w^{\prime}\right) &=c_{j}^{\mathrm{T}} w^{\prime}+d_{j} \\ &=\lambda\left(c_{j}^{\mathrm{T}} w_{1}+d_{j}\right)+(1-\lambda)\left(c_{j}^{\mathrm{T}} w_{2}+d_{j}\right) \\ &=\lambda h_{j}\left(w_{1}\right)+(1-\lambda) h_{j}\left(w_{2}\right) \\ &=\lambda v_{1, j}+(1-\lambda) v_{2, j} \end{aligned}$$
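Combining ①, ② and ③, the point $w^{\prime}$ witnesses that the convex combination in (12) belongs to $A$, which proves Lemma 1. $\blacksquare$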
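Lemma 1 can be sanity-checked on a small instance. The sketch below assumes a hypothetical toy problem with $f(w)=w^{2}$, a single constraint $g(w)=1-w$ and no equality constraints, so that $A$ consists of pairs $(u, t)$. For this problem, $(u, t) \in A$ exactly when $t \geq \max (1-u, 0)^{2}$, because the smallest value of $w^{2}$ subject to $w \geq 1-u$ is $\max (1-u, 0)^{2}$. The function names are illustrative.

```python
import random

def in_A(u, t, tol=1e-9):
    """Membership in A for the assumed toy problem f(w) = w^2, g(w) = 1 - w.

    (u, t) is in A iff some w satisfies 1 - w <= u and w^2 <= t,
    which is equivalent to t >= max(1 - u, 0)^2.
    """
    return t + tol >= max(1.0 - u, 0.0) ** 2

def sample_point_of_A(rng):
    """Pick a w, then relax both inequalities by nonnegative slack."""
    w = rng.uniform(-3.0, 3.0)
    return (1.0 - w + rng.uniform(0.0, 2.0), w * w + rng.uniform(0.0, 2.0))

rng = random.Random(0)
convex_on_samples = True
for _ in range(2000):
    (u1, t1), (u2, t2) = sample_point_of_A(rng), sample_point_of_A(rng)
    lam = rng.random()
    u = lam * u1 + (1.0 - lam) * u2
    t = lam * t1 + (1.0 - lam) * t2
    if not in_A(u, t):  # the combination in (12) should stay in A
        convex_on_samples = False

print(convex_on_samples)
```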
By the definition in (10), the optimal value of the primal problem is
$$f\left(w^{*}\right)=\min _{(0,0, t) \in A} t \tag{13}$$
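Define another point set $B=\left\{(0,0, s) \mid s < f\left(w^{*}\right)\right\}$. $B$ is convex, and by (13) no point $(0,0, s)$ with $s < f\left(w^{*}\right)$ lies in $A$, so $A \cap B=\emptyset$.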
By Theorem 1 (the separating hyperplane theorem), there exist $(\alpha, \beta, \eta) \neq 0$ and a constant $b$ such that: ① if $(u, v, t) \in A$, then $\alpha^{\mathrm{T}} u+\beta^{\mathrm{T}} v+\eta t \geq b$; ② if $(u, v, t) \in B$, then $\alpha^{\mathrm{T}} u+\beta^{\mathrm{T}} v+\eta t \leq b$. Since $u=0$ and $v=0$ on $B$, ② reduces to $\eta t \leq b$.
Lemma 2: If $\alpha^{\mathrm{T}} u+\beta^{\mathrm{T}} v+\eta t \geq b$ holds for all $(u, v, t) \in A$, then
$$\alpha=\left[\alpha_{1}, \alpha_{2}, \ldots, \alpha_{K}\right] \succcurlyeq 0, \quad \eta \geq 0 \tag{14}$$
Proof:
Suppose some $\alpha_{i} < 0$. We may let the corresponding $u_{i} \rightarrow+\infty$; the point $(u, v, t)$ still belongs to $A$, but $\alpha^{\mathrm{T}} u+\beta^{\mathrm{T}} v+\eta t \rightarrow-\infty$, which contradicts $\alpha^{\mathrm{T}} u+\beta^{\mathrm{T}} v+\eta t \geq b$. The same argument with $t \rightarrow+\infty$ shows that $\eta \geq 0$.
By the definition of $A$ and ①, for every $w \in D$ we have $\sum_{i=1}^{K} \alpha_{i} g_{i}(w)+\sum_{j=1}^{M} \beta_{j} h_{j}(w)+\eta f(w) \geq b$. By the definition of $B$ and ②, $\eta s$ is bounded above by $b$ for every $s < f\left(w^{*}\right)$, and letting $s \rightarrow f\left(w^{*}\right)$ gives $\eta f\left(w^{*}\right) \leq b$. Therefore:
$$\sum_{i=1}^{K} \alpha_{i} g_{i}(w)+\sum_{j=1}^{M} \beta_{j} h_{j}(w)+\eta f(w) \geq b \geq \eta f\left(w^{*}\right) \tag{15}$$
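For a concrete feel for inequality (15), consider a hypothetical toy problem that is not part of the original notes: minimize $f(w)=w^{2}$ subject to $g(w)=1-w \leq 0$, with no equality constraints (so $\beta$ is absent) and $w^{*}=1$. The values $\alpha=2$, $\eta=1$, $b=1$ below are assumed for this example; with them the left-hand side of (15) equals $(w-1)^{2}+1$. The sketch checks both inequalities of (15) on a grid.

```python
def f(w):
    """Objective of the assumed toy problem."""
    return w * w

def g(w):
    """Inequality constraint g(w) <= 0, i.e. w >= 1."""
    return 1.0 - w

# Assumed hyperplane parameters for this example (no beta term).
alpha, eta, b = 2.0, 1.0, 1.0
w_star = 1.0  # primal minimizer: the smallest w^2 with w >= 1

ws = [-5.0 + 0.01 * k for k in range(1001)]  # grid on [-5, 5]
lhs_min = min(alpha * g(w) + eta * f(w) for w in ws)

left_ok = lhs_min >= b - 1e-9     # alpha * g(w) + eta * f(w) >= b on the grid
right_ok = b >= eta * f(w_star)   # b >= eta * f(w*)
print(left_ok, right_ok)
```

In this example $\eta > 0$, so it falls under Case 1 of the discussion that follows.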
We now consider two cases.
Case 1: $\eta \neq 0$, so $\eta > 0$ by Lemma 2. Dividing (15) by $\eta$ gives
$$f\left(w^{*}\right) \leq \sum_{i=1}^{K} \frac{\alpha_{i}}{\eta} g_{i}(w)+\sum_{j=1}^{M} \frac{\beta_{j}}{\eta} h_{j}(w)+f(w)=\mathcal{L}\left(w, \frac{\alpha}{\eta}, \frac{\beta}{\eta}\right) \tag{16}$$
Since $w$ is arbitrary, we have
$$f\left(w^{*}\right) \leq \inf _{w} \mathcal{L}\left(w, \frac{\alpha}{\eta}, \frac{\beta}{\eta}\right)=\theta\left(\frac{\alpha}{\eta}, \frac{\beta}{\eta}\right) \tag{17}$$
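Since $\alpha \succcurlyeq 0$ and $\eta > 0$, we have $\frac{\alpha}{\eta} \succcurlyeq 0$, so $\left(\frac{\alpha}{\eta}, \frac{\beta}{\eta}\right)$ satisfies the constraints of the dual problem. Because $\left(\alpha^{*}, \beta^{*}\right)$ maximizes $\theta$ over all dual-feasible points, it follows that:
$$f\left(w^{*}\right) \leq \theta\left(\frac{\alpha}{\eta}, \frac{\beta}{\eta}\right) \leq \theta\left(\alpha^{*}, \beta^{*}\right) \tag{18}$$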
Combining this with Theorem 3, which gives $\theta\left(\alpha^{*}, \beta^{*}\right) \leq f\left(w^{*}\right)$, we obtain $f\left(w^{*}\right)=\theta\left(\alpha^{*}, \beta^{*}\right)$, as claimed.
Case 2: $\eta=0$. Then (15) reduces to the statement that, for every $w \in D$,
$$\sum_{i=1}^{K} \alpha_{i} g_{i}(w)+\sum_{j=1}^{M} \beta_{j} h_{j}(w) \geq 0 \tag{19}$$
By condition ④ of Theorem 4 (Slater's condition), there exists a $w$ with $g_{i}(w) < 0$ and $h_{j}(w)=0$. At this $w$, (19) reads $\sum_{i=1}^{K} \alpha_{i} g_{i}(w) \geq 0$; since $\alpha_{i} \geq 0$ and $g_{i}(w) < 0$, this forces $\alpha_{i}=0$ for every $i$. Hence (19) becomes
$$\sum_{j=1}^{M} \beta_{j} h_{j}(w) \geq 0, \text { or equivalently } \beta^{\mathrm{T}} h(w) \geq 0 \tag{20}$$
By condition ③ of Theorem 4, $h(w)=c w+d$, where the rows of the matrix $c$ are the $c_{j}^{\mathrm{T}}$ and the vector $d$ collects the constants. Substituting gives:
$$\begin{array}{l} \beta^{\mathrm{T}} h(w) \geq 0 \\ \beta^{\mathrm{T}} c w+\beta^{\mathrm{T}} d \geq 0 \end{array} \tag{21}$$
Let $P=\beta^{\mathrm{T}} c$ and $q=\beta^{\mathrm{T}} d$; then (21) can be rewritten as:
$$P w+q \geq 0 \tag{22}$$
Note that (22) holds for all $w \in D$. By condition ④ (Slater's condition), there exists a $w$ such that $c w+d=0$, and hence $P w+q=0$.
We now show that there exists a $w^{\prime}=w+\Delta w$, with $\Delta w$ in a neighborhood $N(0, \varepsilon)$ of the origin, such that $P w^{\prime}+q < 0$.
Proof: By Theorem 1, $(\alpha, \beta, \eta) \neq 0$; since $\alpha=0$ and $\eta=0$ in this case, we must have $\beta \neq 0$. It follows that $P=\beta^{\mathrm{T}} c \neq 0$ (this step implicitly requires the rows of $c$ to be linearly independent). By Theorem 2, applied to the vector $-P^{\mathrm{T}}$, there exists a $\Delta w$ with $\|\Delta w\|^{2} \leq \varepsilon$ and $P \Delta w < 0$. Hence $w^{\prime}=w+\Delta w \in N(w, \varepsilon)$.
By condition ⑤ of Theorem 4, $\varepsilon$ can be chosen small enough that $w^{\prime} \in D$. Moreover,
$$\begin{aligned} P w^{\prime}+q &=P(w+\Delta w)+q \\ &=(P w+q)+P \Delta w \\ &=P \Delta w < 0 \end{aligned} \tag{23}$$
This contradicts (22), so Case 2 cannot occur.
This completes the proof of Theorem 4, the strong duality theorem. $\blacksquare$
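As a closing illustration of $f\left(w^{*}\right)=\theta\left(\alpha^{*}, \beta^{*}\right)$, the sketch below uses a hypothetical toy problem that is not from the original notes: minimize $w^{2}$ subject to $1-w \leq 0$. The objective and constraint are convex, the domain is all of $\mathbb{R}$ (open and convex), and $w=2$ gives $1-w < 0$, so Slater's condition holds. The primal optimum is $f\left(w^{*}\right)=1$ at $w^{*}=1$, and the dual function is $\theta(\alpha)=\alpha-\alpha^{2} / 4$. The grid search is only a rough numerical stand-in for the maximization in (7).

```python
def theta(alpha):
    """Dual function of: minimize w^2 subject to 1 - w <= 0.

    L(w, alpha) = w^2 + alpha * (1 - w) is minimized over w at w = alpha / 2,
    which gives theta(alpha) = alpha - alpha^2 / 4.
    """
    return alpha - alpha * alpha / 4.0

primal_optimum = 1.0  # f(w*) with w* = 1

alphas = [0.001 * k for k in range(5001)]  # grid over the dual-feasible range [0, 5]
best_alpha = max(alphas, key=theta)        # grid maximizer of the dual function
dual_optimum = theta(best_alpha)

print(best_alpha, dual_optimum)
print(abs(dual_optimum - primal_optimum) < 1e-9)
```

On this example the dual optimum is attained near $\alpha=2$ and matches the primal optimum, with no duality gap.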