The method of Lagrange multipliers is a technique for finding the extrema of a multivariate function subject to a set of constraints. By introducing Lagrange multipliers, a constrained optimization problem with $d$ variables and $k$ constraints can be converted into an unconstrained optimization problem in $d + k$ variables.
Suppose $\mathbf x$ is a $d$-dimensional vector. We seek a value $\mathbf x^*$ that minimizes the objective $f(\mathbf x)$ subject to $m$ inequality constraints and $n$ equality constraints, over a nonempty feasible region $\mathbb D \subset \mathbb R^d$:
$$
\min_{\mathbf x}\, f(\mathbf x) \\
\text{s.t.}\ \ h_i(\mathbf x) \leqslant 0 \quad (i = 1, 2, \dots, m) \\
\qquad\ g_j(\mathbf x) = 0 \quad (j = 1, 2, \dots, n) \tag{1}
$$
Introducing Lagrange multipliers $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_m)^T$ and $\beta = (\beta_1, \beta_2, \dots, \beta_n)^T$, the corresponding Lagrangian is
$$
L(\mathbf x, \alpha, \beta) = f(\mathbf x) + \sum_{i=1}^{m} \alpha_i h_i(\mathbf x) + \sum_{j=1}^{n} \beta_j g_j(\mathbf x) \tag{2}
$$
Here $\mathbf x = (x_1, x_2, \dots, x_d)^T \in \mathbb R^d$ and $\alpha_i \geqslant 0$.
Assume $f(\mathbf x)$, $h_i(\mathbf x)$, and $g_j(\mathbf x)$ are continuously differentiable functions defined on $\mathbb R^d$.
$L(\mathbf x, \alpha, \beta)$ is a multivariate nonlinear function of $\mathbf x$, $\alpha$, and $\beta$.
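The construction in Eq. (2) can be sketched numerically. The toy problem below is my own illustration, not from the text: minimize $f(\mathbf x) = x_1^2 + x_2^2$ with one inequality constraint $h$ and one equality constraint $g$.

```python
# A minimal sketch of the Lagrangian in Eq. (2) on an assumed toy problem:
#   minimize f(x) = x1^2 + x2^2
#   s.t.     h(x) = 1 - x1 - x2 <= 0
#            g(x) = x1 - x2     =  0

def f(x):            # objective
    return x[0] ** 2 + x[1] ** 2

def h(x):            # inequality constraint, h(x) <= 0
    return 1 - x[0] - x[1]

def g(x):            # equality constraint, g(x) = 0
    return x[0] - x[1]

def L(x, alpha, beta):
    """Lagrangian L(x, alpha, beta) = f + alpha*h + beta*g, as in Eq. (2)."""
    return f(x) + alpha * h(x) + beta * g(x)

# At a feasible point both constraint terms vanish, so L reduces to f:
print(L((0.5, 0.5), alpha=1.0, beta=0.0))   # 0.5, equal to f((0.5, 0.5))
```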
Define the function:
$$
\theta_P(\mathbf x) = \max_{\alpha, \beta:\, \alpha_i \geqslant 0} L(\mathbf x, \alpha, \beta) \tag{3}
$$
where the subscript $P$ denotes the primal problem. Then:
$$
\theta_P(\mathbf x) = \begin{cases} f(\mathbf x), & \text{if } \mathbf x \text{ satisfies the primal constraints} \\ +\infty, & \text{otherwise} \end{cases} \tag{4}
$$
If $\mathbf x$ satisfies the primal constraints, then the $g_j$ terms vanish (since $g_j(\mathbf x) = 0$) and it is easy to show that $L(\mathbf x, \alpha, \beta) = f(\mathbf x) + \sum_{i=1}^{m} \alpha_i h_i(\mathbf x) \leqslant f(\mathbf x)$, with equality when $\alpha_i = 0$; hence the maximum over $\alpha, \beta$ equals $f(\mathbf x)$.
If $\mathbf x$ does not satisfy the primal constraints:
If some constraint $h_i(\mathbf x) \leqslant 0$ is violated, i.e. $h_i(\mathbf x) > 0$, then letting $\alpha_i \to \infty$ gives:
$$
L(\mathbf x, \alpha, \beta) = f(\mathbf x) + \sum_{i=1}^{m} \alpha_i h_i(\mathbf x) \to \infty
$$
If some constraint $g_j(\mathbf x) = 0$ is violated, i.e. $g_j(\mathbf x) \neq 0$, then choosing $\beta_j$ so that $\beta_j g_j(\mathbf x) \to \infty$ gives:
$$
L(\mathbf x, \alpha, \beta) = f(\mathbf x) + \sum_{i=1}^{m} \alpha_i h_i(\mathbf x) + \sum_{j=1}^{n} \beta_j g_j(\mathbf x) \to \infty
$$
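The two cases of Eq. (4) can be seen numerically. The one-dimensional problem below is an assumed illustration (not from the text): at a feasible point the maximum over $\alpha \geqslant 0$ is $f(\mathbf x)$, while at an infeasible point $L$ grows without bound with $\alpha$.

```python
# Numeric illustration of Eq. (4) on an assumed toy problem:
#   minimize f(x) = x^2  s.t.  h(x) = 1 - x <= 0   (i.e. x >= 1)

def f(x):
    return x * x

def h(x):
    return 1 - x

def L(x, alpha):
    return f(x) + alpha * h(x)

feasible, infeasible = 2.0, 0.0      # h(2) = -1 <= 0,  h(0) = 1 > 0
alphas = [0.0, 1.0, 10.0, 100.0]

# Feasible x: alpha*h(x) <= 0, so the max over alpha is at alpha = 0.
print(max(L(feasible, a) for a in alphas))        # 4.0, which is f(2)

# Infeasible x: h(x) > 0, so L grows without bound as alpha increases.
print([L(infeasible, a) for a in alphas])         # [0.0, 1.0, 10.0, 100.0]
```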
Consider the minimization problem:
$$
\min_{\mathbf x}\, \theta_P(\mathbf x) = \min_{\mathbf x}\, \max_{\alpha, \beta:\, \alpha_i \geqslant 0} L(\mathbf x, \alpha, \beta) \tag{5}
$$
This problem is equivalent to the original optimization problem, i.e. they have the same solutions. Define the optimal value of the primal problem as $p^* = \min_{\mathbf x}\, \theta_P(\mathbf x)$.
Define $\theta_D(\alpha, \beta) = \min_{\mathbf x} L(\mathbf x, \alpha, \beta)$, and consider maximizing $\theta_D(\alpha, \beta)$, i.e.:
$$
\max_{\alpha, \beta:\, \alpha_i \geqslant 0}\, \theta_D(\alpha, \beta) = \max_{\alpha, \beta:\, \alpha_i \geqslant 0}\, \min_{\mathbf x}\, L(\mathbf x, \alpha, \beta) \tag{6}
$$
The problem $\max_{\alpha, \beta:\, \alpha_i \geqslant 0}\, \min_{\mathbf x}\, L(\mathbf x, \alpha, \beta)$ is called the max-min problem of the generalized Lagrangian. It can be expressed as the constrained optimization problem:
$$
\max_{\alpha, \beta:\, \alpha_i \geqslant 0}\, \theta_D(\alpha, \beta) = \max_{\alpha, \beta:\, \alpha_i \geqslant 0}\, \min_{\mathbf x}\, L(\mathbf x, \alpha, \beta) \\
\text{s.t.}\ \ \alpha_i \geqslant 0, \quad i = 1, 2, \dots, m \tag{7}
$$
which is called the dual problem of the primal problem.
For convenience, define the optimal value of the dual problem as $d^* = \max_{\alpha, \beta:\, \alpha_i \geqslant 0}\, \theta_D(\alpha, \beta)$.
Theorem 1: if both the primal problem and the dual problem have optimal values, then:
$$
d^* = \max_{\alpha, \beta:\, \alpha_i \geqslant 0}\, \min_{\mathbf x}\, L(\mathbf x, \alpha, \beta) \leqslant \min_{\mathbf x}\, \max_{\alpha, \beta:\, \alpha_i \geqslant 0}\, L(\mathbf x, \alpha, \beta) = p^* \tag{8}
$$
Corollary 1: let $\mathbf x^*$ be a feasible solution of the primal problem with $\theta_P(\mathbf x^*) = p^*$, and let $\alpha^*, \beta^*$ be a feasible solution of the dual problem with $\theta_D(\alpha^*, \beta^*) = d^*$.
If $p^* = d^*$, then $\mathbf x^*$ and $(\alpha^*, \beta^*)$ are optimal solutions of the primal and dual problems, respectively.
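The weak-duality inequality (8) can be checked on a grid. The toy problem below is my own assumption, not from the text: minimize $f(x) = x^2$ subject to $1 - x \leqslant 0$. Here $\theta_D(\alpha) = \alpha - \alpha^2/4$, maximized at $\alpha = 2$, so $d^* = p^* = 1$. Note the grid over $\alpha$ is bounded, so the inner maximum only approximates $\theta_P$.

```python
# Grid-based sanity check of weak duality (Eq. (8)) on an assumed toy
# problem: min f(x) = x^2  s.t.  h(x) = 1 - x <= 0.

def L(x, a):
    return x * x + a * (1 - x)

xs = [i / 100 for i in range(-300, 301)]      # grid over x in [-3, 3]
alphas = [i / 100 for i in range(0, 401)]     # grid over alpha in [0, 4]

d_star = max(min(L(x, a) for x in xs) for a in alphas)   # max-min (dual)
p_star = min(max(L(x, a) for a in alphas) for x in xs)   # min-max (primal)

assert d_star <= p_star + 1e-9    # weak duality: max-min <= min-max
print(round(d_star, 2), round(p_star, 2))                # 1.0 1.0
```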
Theorem 2: suppose $f(\mathbf x)$ and $h_i(\mathbf x)$ are convex functions and $g_j(\mathbf x)$ is affine, and suppose the inequality constraints $h_i(\mathbf x)$ are strictly feasible, i.e. there exists $\mathbf x$ with $h_i(\mathbf x) < 0$ for all $i$. Then there exist $\mathbf x^*, \alpha^*, \beta^*$ such that $\mathbf x^*$ is a solution of the primal problem $\min_{\mathbf x}\, \theta_P(\mathbf x)$, $(\alpha^*, \beta^*)$ is a solution of the dual problem $\max_{\alpha, \beta:\, \alpha_i \geqslant 0}\, \theta_D(\alpha, \beta)$, and $p^* = d^* = L(\mathbf x^*, \alpha^*, \beta^*)$.
Theorem 3: under the same assumptions as Theorem 2 ($f$ and $h_i$ convex, $g_j$ affine, inequality constraints strictly feasible), $\mathbf x^*$ is a solution of the primal problem and $(\alpha^*, \beta^*)$ is a solution of the dual problem if and only if $\mathbf x^*, \alpha^*, \beta^*$ satisfy the following **Karush-Kuhn-Tucker (KKT)** conditions:
$$
\nabla_{\mathbf x} L(\mathbf x^*, \alpha^*, \beta^*) = 0 \\
\alpha_i^* h_i(\mathbf x^*) = 0, \quad i = 1, 2, \dots, m \\
h_i(\mathbf x^*) \leqslant 0, \quad i = 1, 2, \dots, m \\
\alpha_i^* \geqslant 0, \quad i = 1, 2, \dots, m \\
g_j(\mathbf x^*) = 0, \quad j = 1, 2, \dots, n
$$
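The KKT conditions can be verified numerically at a known optimum. The toy problem below is an assumption of mine, not from the text: minimize $x_1^2 + x_2^2$ subject to $1 - x_1 - x_2 \leqslant 0$ and $x_1 - x_2 = 0$, whose optimum is $\mathbf x^* = (1/2, 1/2)$ with $\alpha^* = 1$, $\beta^* = 0$.

```python
# Checking the five KKT conditions at the optimum of an assumed toy problem:
#   min x1^2 + x2^2  s.t.  1 - x1 - x2 <= 0,  x1 - x2 = 0

def grad_L(x, alpha, beta):
    # gradient in x of L = x1^2 + x2^2 + alpha*(1 - x1 - x2) + beta*(x1 - x2)
    return (2 * x[0] - alpha + beta, 2 * x[1] - alpha - beta)

x_star, a_star, b_star = (0.5, 0.5), 1.0, 0.0
h = 1 - x_star[0] - x_star[1]     # inequality constraint value at x*
g = x_star[0] - x_star[1]         # equality constraint value at x*

assert grad_L(x_star, a_star, b_star) == (0.0, 0.0)   # stationarity
assert h <= 0                                         # primal feasibility (ineq.)
assert g == 0                                         # primal feasibility (eq.)
assert a_star >= 0                                    # dual feasibility
assert a_star * h == 0                                # complementary slackness
print("KKT conditions hold at x* =", x_star)
```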
Affine function: an affine function is a function built from a polynomial of degree at most 1.
Its general form is $f(\mathbf x) = \mathbf A \mathbf x + \mathbf b$, where $\mathbf A$ is an $m \times n$ matrix, $\mathbf x$ is an $n$-dimensional column vector, and $\mathbf b$ is an $m$-dimensional column vector; it represents a linear map from $n$-dimensional space to $m$-dimensional space, followed by a translation.
Convex function: let $f$ be a function defined on a convex set $\mathcal{X}$. If for any two points $\mathbf x_1, \mathbf x_2 \in \mathcal{X}$ and any real $\lambda \in (0, 1)$ we have $f(\lambda \mathbf x_1 + (1-\lambda)\mathbf x_2) \leqslant \lambda f(\mathbf x_1) + (1-\lambda) f(\mathbf x_2)$, then $f$ is called a convex function on $\mathcal{X}$.
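The convexity inequality above can be spot-checked for a familiar convex function. The choice $f(x) = x^2$ and the sample points are my own illustration.

```python
# Spot-checking the convexity inequality for the convex function f(x) = x^2:
# f(lam*x1 + (1-lam)*x2) <= lam*f(x1) + (1-lam)*f(x2) for lam in (0, 1).

def f(x):
    return x * x

x1, x2 = -1.0, 3.0
for lam in [0.1, 0.25, 0.5, 0.75, 0.9]:
    lhs = f(lam * x1 + (1 - lam) * x2)           # value at the mixture
    rhs = lam * f(x1) + (1 - lam) * f(x2)        # mixture of the values
    assert lhs <= rhs                            # the chord lies above the graph
print("convexity inequality verified")
```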