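Unconstrained optimization, $x \in \mathbb{R}^{N}$

$$\min_{x} f(x)$$

When $f$ has a closed-form expression, Fermat's theorem says that a minimizer must be a stationary point, so we differentiate and set the gradient to zero: $\nabla_{x} f(x)=0$.

When no closed-form expression is available, iterative methods such as gradient descent or Newton's method approach a minimizer step by step by moving along a descent direction (for gradient descent, the negative gradient).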
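As a minimal sketch of the iterative approach, the snippet below runs fixed-step gradient descent. The test function $f(x)=(x_1-1)^2+2(x_2+3)^2$, the step size, and the stopping tolerance are illustrative assumptions, not taken from the reference.

```python
import numpy as np

def gradient_descent(grad, x0, step=0.1, tol=1e-8, max_iter=10_000):
    """Fixed-step gradient descent; stops once the gradient norm drops below tol."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - step * g  # move along the negative gradient
    return x

# Hypothetical test function f(x) = (x1 - 1)^2 + 2 (x2 + 3)^2, minimized at (1, -3).
def grad_f(x):
    return np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 3.0)])

x_min = gradient_descent(grad_f, x0=[0.0, 0.0])
print(x_min)  # close to the stationary point (1, -3)
```

A fixed step only works when it is small enough relative to the curvature of $f$; practical codes usually pick the step by a line search.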
Next consider the equality-constrained problem

$$\begin{aligned} \min\ & f(x) \\ \text{s.t.}\ & h_{i}(x)=0,\quad i=1,2,\cdots,l \end{aligned} \qquad (1)$$

Theorem 1-1 (Lagrange's theorem, i.e. the KKT conditions for equality constraints). Let $x^*$ be a local minimizer of problem $(1)$, and suppose $f(x)$ and $h_i(x)$ $(i=1,2,\dots,l)$ are continuously differentiable in a neighborhood of $x^*$. If the vectors $\nabla h_{i}(x^{*})$ $(i=1,2,\dots,l)$ are linearly independent, then there exists a multiplier vector $\lambda^{*}=(\lambda_{1}^{*}, \lambda_{2}^{*}, \cdots, \lambda_{l}^{*})^{T}$ such that $\nabla_{x} L(x^{*}, \lambda^{*})=0$, where $L(x,\lambda)=f(x)-\sum_{i=1}^{l}\lambda_i h_i(x)$ is the Lagrangian. That is,

$$\nabla f(x^{*})-\sum_{i=1}^{l} \lambda_{i}^{*} \nabla h_{i}(x^{*})=0$$

Theorem 1-2. For the equality-constrained problem $(1)$, suppose $f(x)$ and $h_i(x)$ $(i=1,2,\dots,l)$ are twice continuously differentiable and there exists $(x^{*}, \lambda^{*}) \in \mathbb{R}^{n} \times \mathbb{R}^{l}$ such that $\nabla L(x^{*}, \lambda^{*})=0$. If for every $d \in \mathbb{R}^{n}$ with $d \neq 0$ and $\nabla h_{i}(x^{*})^{T}d=0$ $(i=1,2,\dots,l)$ we have

$$d^{T} \nabla_{xx}^{2} L(x^{*}, \lambda^{*})\, d>0,$$

then $x^*$ is a strict local minimizer of problem $(1)$.
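To make the two theorems concrete, here is a small sketch on a hypothetical example (not from the reference): minimize $f(x)=x_1^2+x_2^2$ subject to $h(x)=x_1+x_2-1=0$. The stationarity condition $\nabla f-\lambda\nabla h=0$ together with $h(x)=0$ is a linear system in $(x_1,x_2,\lambda)$, and the second-order condition only has to be checked on directions $d$ with $\nabla h^T d=0$.

```python
import numpy as np

# Unknowns (x1, x2, lam). The rows encode
#   2*x1 - lam = 0,   2*x2 - lam = 0,   x1 + x2 = 1.
K = np.array([[2.0, 0.0, -1.0],
              [0.0, 2.0, -1.0],
              [1.0, 1.0,  0.0]])
rhs = np.array([0.0, 0.0, 1.0])
x1, x2, lam = np.linalg.solve(K, rhs)

# Second-order check (Theorem 1-2): the Hessian of L in x is 2*I here because h is linear.
hess_L = 2.0 * np.eye(2)
d = np.array([1.0, -1.0])   # spans {d : grad_h^T d = 0} with grad_h = (1, 1)
curvature = d @ hess_L @ d  # must be positive for a strict local minimizer

print(x1, x2, lam, curvature)
```

The solve gives $x^*=(1/2,1/2)$ with $\lambda^*=1$, and the curvature along the constraint is positive, so Theorem 1-2 certifies a strict local minimizer.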
Now consider the inequality-constrained problem

$$\begin{aligned} \min\ & f(x) \\ \text{s.t.}\ & g_{i}(x) \geq 0,\quad i=1,2,\cdots,m \end{aligned} \qquad (2)$$

Denote the feasible region by $\mathcal{D}=\{x \in \mathbb{R}^{n} \mid g_{i}(x) \geq 0,\ i=1,2,\cdots,m\}$ and the index set by $I=\{1, \cdots, m\}$.

Optimality conditions for inequality-constrained problems rely on the notions of active and inactive constraints. At a feasible point $\overline{x} \in \mathcal{D}$ two cases can occur: some constraint functions satisfy $g_{i}(\overline{x})=0$, while others satisfy $g_{i}(\overline{x})>0$. In the latter case, as long as $g_i$ is continuous, $g_{i}(x)>0$ still holds throughout some neighborhood of $\overline{x}$, whereas the former case has no such property. The two cases therefore need to be distinguished.
Definition 1. If a feasible point $\overline{x} \in \mathcal{D}$ of problem $(2)$ satisfies $g_{i}(\overline{x})=0$, then the inequality constraint $g_{i}(x) \geq 0$ is called an active constraint at $\overline{x}$. If instead $g_{i}(\overline{x})>0$, then $g_{i}(x) \geq 0$ is called an inactive constraint at $\overline{x}$. The set

$$I(\overline{x})=\{i : g_{i}(\overline{x})=0\}$$

is called the index set of active constraints at $\overline{x}$, or simply the active set at $\overline{x}$.
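A minimal sketch of Definition 1 in code: given the constraint functions and a feasible point, collect the indices whose constraint value is zero up to a tolerance. The helper name `active_set`, the tolerance, and the three sample constraints are assumptions made for illustration.

```python
def active_set(constraints, x, tol=1e-8):
    """Return the 1-based indices i with |g_i(x)| <= tol at a feasible point x."""
    values = [g(x) for g in constraints]
    if any(v < -tol for v in values):
        raise ValueError("x is not feasible: some g_i(x) < 0")
    return [i for i, v in enumerate(values, start=1) if abs(v) <= tol]

# Hypothetical constraints g_1 = x1 >= 0, g_2 = x2 >= 0, g_3 = 1 - x1 - x2 >= 0.
constraints = [
    lambda x: x[0],
    lambda x: x[1],
    lambda x: 1.0 - x[0] - x[1],
]

print(active_set(constraints, (0.0, 0.5)))    # only g_1 is active
print(active_set(constraints, (0.0, 1.0)))    # g_1 and g_3 are active
print(active_set(constraints, (0.25, 0.25)))  # interior point, nothing is active
```

In floating-point arithmetic an exact test `g_i(x) == 0` is rarely meaningful, which is why the sketch compares against a tolerance.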
The following two lemmas are the basis for studying optimality conditions of inequality-constrained problems.

Lemma 1 (Farkas' lemma). Let $a, b_{i} \in \mathbb{R}^{n}$ $(i=1,2,\dots,r)$. Then the system of linear inequalities

$$b_{i}^{T} d \geq 0, \quad i=1, \cdots, r, \quad d \in \mathbb{R}^{n}$$

is compatible with the inequality $a^{T} d \geq 0$ (that is, every $d$ satisfying the system also satisfies $a^{T} d \geq 0$) if and only if there exist nonnegative real numbers $\alpha_{1}, \cdots, \alpha_{r}$ such that

$$a=\sum_{i=1}^{r} \alpha_{i} b_{i}$$
Gordan's lemma can be regarded as a corollary of Farkas' lemma.

Lemma 2 (Gordan's lemma). Let $b_{i} \in \mathbb{R}^{n}$ $(i=1, \cdots, r)$. The system of strict linear inequalities

$$b_{i}^{T} d < 0, \quad i=1, \cdots, r, \quad d \in \mathbb{R}^{n}$$

has no solution if and only if the $b_{i}$ $(i=1, \cdots, r)$ are linearly dependent with nonnegative coefficients, that is, there exist nonnegative real numbers $\alpha_{i}$ $(i=1, \cdots, r)$, not all zero, such that

$$\sum_{i=1}^{r} \alpha_{i} b_{i}=0$$

(The inequalities must be strict: with $b_i^T d \geq 0$ the system would always have the solution $d=0$.)
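Farkas' lemma can be sanity-checked numerically on a tiny example. The sketch below assumes $r=n=2$ with linearly independent $b_1=(1,0)$, $b_2=(1,1)$, so that the coefficients $\alpha$ are unique and can be obtained with a plain linear solve; all data are hypothetical and a general implementation would need a nonnegative least-squares or LP solver instead.

```python
import numpy as np

# Hypothetical data: row i of B is b_i; the rows are linearly independent.
B = np.array([[1.0, 0.0],
              [1.0, 1.0]])

def cone_coefficients(a):
    """Solve B^T alpha = a, i.e. a = sum_i alpha_i b_i (unique because B is invertible)."""
    return np.linalg.solve(B.T, np.asarray(a, dtype=float))

# Random directions d that satisfy b_i^T d >= 0 for all i.
rng = np.random.default_rng(0)
D = rng.normal(size=(5000, 2))
D = D[(D @ B.T >= 0).all(axis=1)]

# Case 1: a lies in the cone generated by b_1, b_2, so a^T d >= 0 on all sampled d.
a_in = np.array([3.0, 1.0])
alpha_in = cone_coefficients(a_in)
holds_in = bool((D @ a_in >= -1e-12).all())

# Case 2: a needs a negative coefficient, and a direction d violating a^T d >= 0 exists.
a_out = np.array([0.0, -1.0])
alpha_out = cone_coefficients(a_out)
d_witness = np.array([0.0, 1.0])  # b_1^T d = 0, b_2^T d = 1, but a_out^T d = -1

print(alpha_in, holds_in, alpha_out, a_out @ d_witness)
```

The sampling only illustrates the "if" direction; it is not a proof, and the witness direction in the second case was found by hand.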
The next lemma can be viewed as a geometric optimality condition.

Lemma 3. Let $x^*$ be a local minimizer of the inequality-constrained problem $(2)$. Suppose $f(x)$ and $g_{i}(x)$ $(i \in I(x^{*}))$ are differentiable at $x^*$, and $g_{i}(x)$ $(i \in I \backslash I(x^{*}))$ are continuous at $x^*$. Then the set of feasible directions $\mathcal{F}$ and the set of descent directions $\mathcal{S}$ of problem $(2)$ have empty intersection, $\mathcal{F} \cap \mathcal{S}=\emptyset$, where

$$\mathcal{F}=\{d \in \mathbb{R}^{n} \mid \nabla g_{i}(x^{*})^{T} d>0,\ i \in I(x^{*})\}, \quad \mathcal{S}=\{d \in \mathbb{R}^{n} \mid \nabla f(x^{*})^{T} d<0\}$$

The first-order necessary condition for the inequality-constrained problem is the well-known KKT condition.
Theorem 2-1 (KKT conditions). Let $x^*$ be a local minimizer of the inequality-constrained problem $(2)$ with active set $I(x^{*})=\{i \mid g_{i}(x^{*})=0,\ i=1, \cdots, m\}$, and suppose $f(x)$ and $g_i(x)$ $(i=1,\cdots,m)$ are differentiable at $x^*$. If the vectors $\nabla g_{i}(x^{*})$ $(i \in I(x^{*}))$ are linearly independent, then there exists a vector $\lambda^{*}=(\lambda_{1}^{*}, \cdots, \lambda_{m}^{*})^{T}$ such that

$$\left\{\begin{array}{l}{\nabla f(x^{*})-\sum_{i=1}^{m} \lambda_{i}^{*} \nabla g_{i}(x^{*})=0} \\ {g_{i}(x^{*}) \geq 0,\quad \lambda_{i}^{*} \geq 0,\quad \lambda_{i}^{*} g_{i}(x^{*})=0,\quad i=1, \cdots, m}\end{array}\right.$$

Proof. Since $x^*$ is a local minimizer of problem $(2)$, Lemma 3 shows that there is no $d \in \mathbb{R}^{n}$ such that

$$\nabla f(x^{*})^{T} d<0, \quad \nabla g_{i}(x^{*})^{T} d>0, \quad i \in I(x^{*})$$

In other words, the system of linear inequalities

$$\nabla f(x^{*})^{T} d<0, \quad -\nabla g_{i}(x^{*})^{T} d<0, \quad i \in I(x^{*})$$

has no solution. By Gordan's lemma there exist nonnegative real numbers $\mu_{0} \geq 0$ and $\mu_{i} \geq 0$ $(i\in I(x^*))$, not all zero, such that

$$\mu_{0} \nabla f(x^{*})-\sum_{i \in I(x^{*})} \mu_{i} \nabla g_{i}(x^{*})=0$$

It is easy to see that $\mu_{0} \neq 0$. Indeed, if $\mu_{0}=0$, then $\sum_{i \in I(x^{*})} \mu_{i} \nabla g_{i}(x^{*})=0$ with the $\mu_i$ not all zero, so the vectors $\nabla g_{i}(x^{*})$ $(i \in I(x^{*}))$ would be linearly dependent, contradicting the assumption. Hence $\mu_{0}>0$, and we may set

$$\lambda_{i}^{*}=\frac{\mu_{i}}{\mu_{0}}, \quad i \in I(x^{*}) ; \qquad \lambda_{i}^{*}=0, \quad i \in I \backslash I(x^{*})$$

This gives

$$\nabla f(x^{*})-\sum_{i=1}^{m} \lambda_{i}^{*} \nabla g_{i}(x^{*})=0$$

and

$$g_{i}(x^{*}) \geq 0, \quad \lambda_{i}^{*} \geq 0, \quad \lambda_{i}^{*} g_{i}(x^{*})=0, \quad i=1, \cdots, m$$

which proves the theorem.
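The construction in the proof (zero multipliers on inactive constraints, multipliers on the active ones chosen so that stationarity holds) can be sketched numerically. The example problem below is hypothetical: $f(x)=(x_1-2)^2+(x_2-2)^2$ with $g_1=2-x_1-x_2\ge 0$, $g_2=x_1\ge 0$, $g_3=x_2\ge 0$, and candidate point $x^*=(1,1)$. The helper `kkt_multipliers` is an illustrative name, and it fits the active multipliers by least squares, which is one possible choice.

```python
import numpy as np

def kkt_multipliers(grad_f, g_vals, g_grads, tol=1e-8):
    """Zero multipliers for inactive constraints; least-squares fit of
    grad_f = sum_i lam_i * grad_g_i over the active ones."""
    g_vals = np.asarray(g_vals, dtype=float)
    g_grads = np.asarray(g_grads, dtype=float)
    active = np.abs(g_vals) <= tol
    lam = np.zeros(len(g_vals))
    if active.any():
        lam[active] = np.linalg.lstsq(g_grads[active].T, grad_f, rcond=None)[0]
    return lam

x_star = np.array([1.0, 1.0])                      # candidate point (assumed)
grad_f = 2.0 * (x_star - 2.0)                      # gradient of f at x_star
g_vals = np.array([2.0 - x_star.sum(), x_star[0], x_star[1]])
g_grads = np.array([[-1.0, -1.0], [1.0, 0.0], [0.0, 1.0]])  # row i is grad g_i

lam = kkt_multipliers(grad_f, g_vals, g_grads)
stationarity = grad_f - g_grads.T @ lam
is_kkt = bool(
    np.allclose(stationarity, 0.0)
    and (g_vals >= -1e-8).all()
    and (lam >= -1e-8).all()
    and np.allclose(lam * g_vals, 0.0)
)
print(lam, is_kkt)
```

Only $g_1$ is active at $x^*$, and its gradient is trivially linearly independent, so the multiplier is unique; here it comes out as $\lambda_1^*=2$ with $\lambda_2^*=\lambda_3^*=0$.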
Finally consider the general constrained problem

$$\begin{aligned} \min\ & f(x) \\ \text{s.t.}\ & h_{i}(x)=0,\quad i=1,2,\cdots,l \\ & g_{i}(x) \geq 0,\quad i=1,2,\cdots,m \end{aligned} \qquad (3)$$

Denote the index sets by $E=\{1, \cdots, l\}$ and $I=\{1, \cdots, m\}$, and the feasible region by

$$\mathcal{D}=\{x \in \mathbb{R}^{n} \mid h_{i}(x)=0,\ i \in E,\ g_{i}(x) \geq 0,\ i \in I\}$$

Combining the KKT conditions for equality constraints with those for inequality constraints yields the first-order necessary KKT conditions for the general problem $(3)$.
KKT first-order necessary conditions. Let $x^*$ be a local minimizer of the general constrained problem $(3)$, with active constraint set at $x^*$

$$S(x^{*})=E \cup I(x^{*})=E \cup\{i \mid g_{i}(x^{*})=0,\ i \in I\}$$

and suppose $f(x)$, $h_{i}(x)$ $(i \in E)$ and $g_{i}(x)$ $(i \in I)$ are differentiable at $x^*$. If the vectors $\nabla h_{i}(x^{*})$ $(i\in E)$ and $\nabla g_{i}(x^{*})$ $(i \in I(x^{*}))$ are linearly independent, then there exists a vector $(\mu^{*}, \lambda^{*}) \in \mathbb{R}^{l} \times \mathbb{R}^{m}$, with $\mu^{*}=(\mu_{1}^{*}, \cdots, \mu_{l}^{*})^{T}$ and $\lambda^{*}=(\lambda_{1}^{*}, \cdots, \lambda_{m}^{*})^{T}$, such that

$$\left\{\begin{array}{l}{\nabla f(x^{*})-\sum_{i=1}^{l} \mu_{i}^{*} \nabla h_{i}(x^{*})-\sum_{i=1}^{m} \lambda_{i}^{*} \nabla g_{i}(x^{*})=0} \\ {h_{i}(x^{*})=0,\quad i \in E} \\ {g_{i}(x^{*}) \geq 0, \quad \lambda_{i}^{*} \geq 0, \quad \lambda_{i}^{*} g_{i}(x^{*})=0,\quad i \in I}\end{array}\right. \qquad (3.1)$$
The system $(3.1)$ is called the KKT conditions. A point $x^*$ satisfying them is called a KKT point, and $(x^{*},(\mu^{*}, \lambda^{*}))$ is called a KKT pair, where $(\mu^{*}, \lambda^{*})$ are the Lagrange multipliers of the problem. The terms KKT point, KKT pair and KKT conditions are often used interchangeably.

The condition $\lambda_{i}^{*} g_{i}(x^{*})=0$ $(i \in I)$ is called the complementary slackness condition. It means that at least one of $\lambda_{i}^{*}$ and $g_{i}(x^{*})$ must be $0$. If exactly one of the two is $0$ and the other is strictly positive, the strict complementary slackness condition is said to hold.
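The distinction between complementary slackness and strict complementary slackness is easy to express as a small check. The function name, the three labels it returns, and the sample numbers are assumptions chosen for illustration.

```python
def classify_complementarity(lam, g_vals, tol=1e-8):
    """Label each pair (lam_i, g_i) as 'strict', 'degenerate' or 'violated'.

    'strict'     : exactly one of lam_i, g_i is zero and the other is positive
    'degenerate' : lam_i = g_i = 0 (complementary, but not strictly so)
    'violated'   : a negative value or lam_i * g_i != 0
    """
    labels = []
    for l, g in zip(lam, g_vals):
        if l < -tol or g < -tol or abs(l * g) > tol:
            labels.append("violated")
        elif l > tol or g > tol:
            labels.append("strict")
        else:
            labels.append("degenerate")
    return labels

# Hypothetical multipliers and constraint values at some candidate point.
labels = classify_complementarity([2.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(labels)
```

In this sample the third pair has $\lambda_3=g_3=0$: the constraint is active but its multiplier vanishes, so complementary slackness holds while strict complementarity fails.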
The Lagrangian of problem $(3)$ is defined as

$$L(x, \mu, \lambda)=f(x)-\sum_{i=1}^{l} \mu_{i} h_{i}(x)-\sum_{i=1}^{m} \lambda_{i} g_{i}(x)$$

Its gradient and Hessian matrix with respect to $x$ are

$$\nabla_{x} L(x, \mu, \lambda)=\nabla f(x)-\sum_{i=1}^{l} \mu_{i} \nabla h_{i}(x)-\sum_{i=1}^{m} \lambda_{i} \nabla g_{i}(x)$$

$$\nabla_{xx}^{2} L(x, \mu, \lambda)=\nabla^{2} f(x)-\sum_{i=1}^{l} \mu_{i} \nabla^{2} h_{i}(x)-\sum_{i=1}^{m} \lambda_{i} \nabla^{2} g_{i}(x)$$

The second-order sufficient condition for problem $(3)$ is as follows.
Theorem 3-2. For the constrained problem $(3)$, suppose $f(x)$, $g_{i}(x)$ $(i \in I)$ and $h_{i}(x)$ $(i \in E)$ are twice continuously differentiable, the active constraint set is $S(x^{*})=E \cup I(x^{*})=E \cup\{i \mid g_{i}(x^{*})=0,\ i \in I\}$, and $(x^{*},(\mu^{*}, \lambda^{*}))$ is a KKT pair of the problem. If for every $d \in \mathbb{R}^{n}$ with $d \neq 0$, $\nabla g_{i}(x^{*})^{T} d=0$ $(i \in I(x^{*}))$ and $\nabla h_{i}(x^{*})^{T} d=0$ $(i \in E)$ we have

$$d^{T} \nabla_{xx}^{2} L(x^{*}, \mu^{*}, \lambda^{*})\, d>0,$$

then $x^*$ is a strict local minimizer of problem $(3)$.

Note that when inequality constraints are active, this subspace form of the condition relies on strict complementarity ($\lambda_i^* > 0$ for every $i \in I(x^*)$); without it the curvature condition has to hold on the larger critical cone. For instance, $\min\, -x^2$ s.t. $x \geq 0$ has the KKT pair $x^*=0$, $\lambda^*=0$, for which the subspace above is $\{0\}$, yet $x^*=0$ is not a local minimizer.
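The curvature condition of Theorem 3-2 can be tested numerically by restricting the Hessian of the Lagrangian to the null space of the active constraint gradients. The sketch below uses a hypothetical equality-constrained example, $\min -x_1x_2$ s.t. $x_1+x_2-2=0$, whose KKT pair is $x^*=(1,1)$, $\mu^*=-1$; the helper names and the SVD-based null-space computation are illustrative choices.

```python
import numpy as np

def null_space_basis(A, tol=1e-10):
    """Orthonormal basis (as columns) of {d : A d = 0}, computed from the SVD of A."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    _, s, vt = np.linalg.svd(A)
    rank = int((s > tol).sum())
    return vt[rank:].T

def reduced_hessian_eigenvalues(hess_L, active_grads):
    """Eigenvalues of Z^T H Z, where the columns of Z span the null space
    of the active constraint gradients (stacked as rows)."""
    Z = null_space_basis(active_grads)
    return np.linalg.eigvalsh(Z.T @ hess_L @ Z)

# Hessian of L at (x*, mu*): h is linear, so only the Hessian of f = -x1*x2 remains.
hess_L = np.array([[0.0, -1.0],
                   [-1.0, 0.0]])
active_grads = np.array([[1.0, 1.0]])  # gradient of h(x) = x1 + x2 - 2

full_eigs = np.linalg.eigvalsh(hess_L)
reduced_eigs = reduced_hessian_eigenvalues(hess_L, active_grads)
print(full_eigs, reduced_eigs)
```

The full Hessian is indefinite, but its restriction to the directions allowed by the constraint is positive, so the sufficient condition holds at $x^*=(1,1)$. Positive definiteness on all of $\mathbb{R}^n$ is not required.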
In general, a KKT point of problem $(3)$ is not necessarily a local minimizer. If the problem is a convex optimization problem as defined below, however, every KKT point is a global minimizer, so that (whenever a constraint qualification guarantees that minimizers are KKT points) KKT points, local minimizers and global minimizers coincide.

Definition 2. Consider the constrained optimization problem

$$\begin{aligned} \min\ & f(x), \quad x \in \mathbb{R}^{n} \\ \text{s.t.}\ & h_{i}(x)=0,\quad i=1,2,\cdots,l \\ & g_{i}(x) \geq 0,\quad i=1,2,\cdots,m \end{aligned}$$

If $f(x)$ is a convex function, $h_{i}(x)$ $(i=1, \cdots, l)$ are linear functions and $g_{i}(x)$ $(i=1, \cdots, m)$ are concave functions (that is, $-g_i(x)$ is convex), then the problem is called a convex optimization problem.
Theorem 3-3. Let $(x^{*}, \mu^{*}, \lambda^{*})$ be a KKT point of the convex optimization problem. Then $x^*$ is a global minimizer of the problem.

Proof. For a convex optimization problem the Lagrangian

$$L(x, \mu^{*}, \lambda^{*})=f(x)-\sum_{i=1}^{l} \mu_{i}^{*} h_{i}(x)-\sum_{i=1}^{m} \lambda_{i}^{*} g_{i}(x)$$

is convex in $x$, because $f$ is convex, each $h_i$ is linear, and each $-\lambda_i^* g_i$ is convex ($\lambda_i^* \geq 0$ and $g_i$ is concave). Hence for every feasible point $x$,

$$\begin{aligned} f(x) & \geq f(x)-\sum_{i=1}^{l} \mu_{i}^{*} h_{i}(x)-\sum_{i=1}^{m} \lambda_{i}^{*} g_{i}(x) \\ &=L(x, \mu^{*}, \lambda^{*}) \\ & \geq L(x^{*}, \mu^{*}, \lambda^{*})+\nabla_{x} L(x^{*}, \mu^{*}, \lambda^{*})^{T}(x-x^{*}) \\ &=L(x^{*}, \mu^{*}, \lambda^{*})=f(x^{*}) \end{aligned}$$

The first inequality uses $h_i(x)=0$ and $\lambda_i^* g_i(x) \geq 0$ at a feasible $x$; the second is the gradient inequality for convex functions; the last line uses $\nabla_x L(x^*,\mu^*,\lambda^*)=0$, $h_i(x^*)=0$ and complementary slackness. Therefore $x^*$ is a global minimizer of the problem.
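The chain of inequalities in the proof can be checked numerically on a small convex example. The problem below is hypothetical: $f(x)=(x_1-2)^2+(x_2-2)^2$ with linear (hence concave) constraints $g_1=2-x_1-x_2\ge 0$, $g_2=x_1\ge 0$, $g_3=x_2\ge 0$, whose KKT point is $x^*=(1,1)$ with $\lambda^*=(2,0,0)$. Random sampling is only a sanity check, not a proof.

```python
import numpy as np

def f(x):
    return ((x - 2.0) ** 2).sum(axis=-1)

def g1(x):
    return 2.0 - x[..., 0] - x[..., 1]

x_star = np.array([1.0, 1.0])
lam1 = 2.0  # multiplier of g1; the multipliers of g2 and g3 are 0

def lagrangian(x):
    """L(x, lam*) for this example; the terms with zero multipliers drop out."""
    return f(x) - lam1 * g1(x)

# Sample points with x1, x2 in [0, 2] (so g2, g3 >= 0) and keep those with g1 >= 0.
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 2.0, size=(5000, 2))
feasible = pts[g1(pts) >= 0.0]

# f(x) >= L(x, lam*) >= L(x*, lam*) = f(x*) on every sampled feasible point.
chain_ok = bool(
    (f(feasible) >= lagrangian(feasible) - 1e-12).all()
    and (lagrangian(feasible) >= lagrangian(x_star) - 1e-12).all()
)
global_ok = bool((f(feasible) >= f(x_star) - 1e-12).all())
print(chain_ok, global_ok)
```

Here $L(x,\lambda^*)=(x_1-1)^2+(x_2-1)^2+2$, which makes the middle inequality visible by inspection: the Lagrangian is a convex function minimized exactly at $x^*$.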
Reference: Ma Changfeng (马昌凤), 《最优化方法及其Matlab程序设计》 (Optimization Methods and Their Matlab Programming).