Standard form of an optimization problem (not necessarily convex):
$$
\begin{aligned}
p^{\star}=\min\ & f_0(\mathbf{x}) \\
\text{s.t.}\ & f_i(\mathbf{x}) \leqslant 0,\quad i=1, \ldots, m \\
& h_i(\mathbf{x})=0,\quad i=1, \ldots, p
\end{aligned}
$$
where the domain is $\mathcal{D}=\left(\bigcap_{i=0}^m \operatorname{dom} f_i\right) \cap\left(\bigcap_{i=1}^p \operatorname{dom} h_i\right)$.
This problem is called the primal problem, and the unknown $\mathbf{x}$ is called the primal variable.
The Lagrangian is defined as:
$$
\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\nu})=f_0(\mathbf{x})+\sum_{i=1}^m \lambda_i f_i(\mathbf{x})+\sum_{i=1}^p \nu_i h_i(\mathbf{x})
$$
Its domain is $\operatorname{dom} \mathcal{L}=\mathcal{D} \times \mathbb{R}^m \times \mathbb{R}^p$. The vectors $\boldsymbol{\lambda}=\left[\lambda_1, \ldots, \lambda_m\right]^{\mathrm{T}}$ and $\boldsymbol{\nu}=\left[\nu_1, \ldots, \nu_p\right]^{\mathrm{T}}$ are called the dual variables or Lagrange multipliers.
The Lagrange dual function (or simply the dual function) is defined as:
$$
g(\boldsymbol{\lambda}, \boldsymbol{\nu}) \triangleq \inf_{\mathbf{x} \in \mathcal{D}} \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\nu})
$$
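Note: the dual function $g$ is concave even when the primal problem is not convex. For each fixed $\mathbf{x}$, $\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\nu})$ is an affine (hence concave) function of $(\boldsymbol{\lambda}, \boldsymbol{\nu})$. The function $g$ is the pointwise infimum of this family, so it is concave. (An epigraph argument proves this: $-g$ is the pointwise supremum of the affine functions $-\mathcal{L}(\mathbf{x}, \cdot, \cdot)$. Its epigraph is therefore an intersection of convex sets, which is convex.)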
Its domain is $\operatorname{dom} g=\{(\boldsymbol{\lambda}, \boldsymbol{\nu}) \mid g(\boldsymbol{\lambda}, \boldsymbol{\nu})>-\infty\}$.
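The concavity claim can be checked numerically on a hypothetical nonconvex example that is not in the text: minimize $f_0(x)=x^4-3x^2$ subject to $f_1(x)=x-1\leqslant 0$. The sketch below makes one simplifying assumption: it replaces $\inf_x$ with a minimum over a finite grid of $x$ values. That grid minimum is still a pointwise minimum of affine functions of $\lambda$, so midpoint concavity should hold up to rounding.

```python
# Hypothetical toy problem (not from the text):
#   minimize f0(x) = x**4 - 3*x**2   (nonconvex)
#   subject to f1(x) = x - 1 <= 0
# inf over x is approximated by a min over a finite grid (an assumption).

X_GRID = [-3 + 6 * k / 2000 for k in range(2001)]  # x in [-3, 3]


def f0(x):
    return x**4 - 3 * x**2


def f1(x):
    return x - 1


def lagrangian(x, lam):
    return f0(x) + lam * f1(x)


def dual(lam):
    # For fixed x, lagrangian(x, lam) is affine in lam, so this grid
    # minimum is a pointwise minimum of affine functions of lam.
    return min(lagrangian(x, lam) for x in X_GRID)


def midpoint_concave(a, b, tol=1e-9):
    # A concave g satisfies g((a + b) / 2) >= (g(a) + g(b)) / 2.
    return dual((a + b) / 2) >= (dual(a) + dual(b)) / 2 - tol


pairs = [(0.0, 4.0), (0.5, 1.5), (1.0, 10.0)]
print(all(midpoint_concave(a, b) for a, b in pairs))
```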
Lower-bound property: for any $\boldsymbol{\lambda} \succeq \mathbf{0}$ and any $\boldsymbol{\nu}$,

$$
g(\boldsymbol{\lambda}, \boldsymbol{\nu}) \leqslant p^{\star}.
$$
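Proof: Let $\tilde{\mathbf{x}}$ be a feasible point of the primal problem, and let $\boldsymbol{\lambda} \succeq \mathbf{0}$. Since $f_i(\tilde{\mathbf{x}}) \leqslant 0$ and $h_i(\tilde{\mathbf{x}})=0$, we have: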
$$
\mathcal{L}(\tilde{\mathbf{x}}, \boldsymbol{\lambda}, \boldsymbol{\nu})=f_0(\tilde{\mathbf{x}})+\sum_{i=1}^m \lambda_i f_i(\tilde{\mathbf{x}})+\sum_{i=1}^p \nu_i h_i(\tilde{\mathbf{x}}) \leqslant f_0(\tilde{\mathbf{x}})
$$
Hence, for every feasible $\tilde{\mathbf{x}}$:

$$
f_0(\tilde{\mathbf{x}}) \geqslant \mathcal{L}(\tilde{\mathbf{x}}, \boldsymbol{\lambda}, \boldsymbol{\nu}) \geqslant \inf_{\mathbf{x} \in \mathcal{D}} \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\nu})=g(\boldsymbol{\lambda}, \boldsymbol{\nu})
$$

Taking the infimum of the left-hand side over all feasible $\tilde{\mathbf{x}}$ gives $p^{\star} \geqslant g(\boldsymbol{\lambda}, \boldsymbol{\nu})$.
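Here is a quick numerical check of weak duality on a hypothetical one-dimensional problem that is not in the text: minimize $x^2$ subject to $1-x\leqslant 0$. Its optimum is $p^\star=1$, attained at $x=1$. The infimum has the closed form $g(\lambda)=\lambda-\lambda^2/4$. The sketch checks two things on a grid of $\lambda\geqslant 0$: the chain of inequalities from the proof, and the bound $g(\lambda)\leqslant p^\star$.

```python
# Hypothetical toy problem (not from the text):
#   minimize f0(x) = x**2  subject to f1(x) = 1 - x <= 0
# Its optimal value is p* = 1, attained at x = 1.
P_STAR = 1.0


def f0(x):
    return x * x


def f1(x):
    return 1 - x


def lagrangian(x, lam):
    return f0(x) + lam * f1(x)


def dual(lam):
    # dL/dx = 2x - lam = 0 gives the minimizer x = lam / 2,
    # so g(lam) = lam - lam**2 / 4.
    return lagrangian(lam / 2, lam)


lams = [0.1 * k for k in range(101)]          # lambda in [0, 10]
feasible = [1 + 0.25 * k for k in range(20)]  # feasible points, x >= 1

# Chain from the proof: f0(x~) >= L(x~, lam) >= g(lam) for feasible x~, lam >= 0.
chain_ok = all(
    f0(x) >= lagrangian(x, l) >= dual(l) - 1e-12 for x in feasible for l in lams
)
# Weak duality: g(lam) <= p* for every lam >= 0.
bound_ok = all(dual(l) <= P_STAR + 1e-12 for l in lams)
print(chain_ok, bound_ok)
```

For this problem $p^\star-g(\lambda)=(1-\lambda/2)^2\geqslant 0$, so the bound is tight exactly at $\lambda=2$.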
The conjugate of a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is the function $f^*: \mathbb{R}^n \rightarrow \mathbb{R}$ defined by:
$$
f^*(\mathbf{y})=\sup_{\mathbf{x} \in \operatorname{dom} f}\left(\mathbf{y}^{\mathrm{T}} \mathbf{x}-f(\mathbf{x})\right)
$$
Its domain is $\operatorname{dom} f^*=\left\{\mathbf{y} \mid f^*(\mathbf{y})<\infty\right\}$.
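In one dimension, the definition of $f^*$ can be evaluated by brute force. The helper `conjugate_on_grid` below is hypothetical and not from the text. It replaces the supremum with a maximum over a finite grid. As a result it slightly underestimates $f^*(y)$, and it is only meaningful when the maximizer lies inside the grid. The test function is $f(x)=x^2/2$, whose conjugate is the standard $f^*(y)=y^2/2$.

```python
def conjugate_on_grid(f, xs):
    """Hypothetical helper: approximate f*(y) = sup_x (y*x - f(x))
    by a maximum over the finite grid xs (1-D, brute force)."""
    pairs = [(x, f(x)) for x in xs]

    def f_star(y):
        return max(y * x - fx for x, fx in pairs)

    return f_star


def f(x):
    return 0.5 * x * x  # f(x) = x^2 / 2, whose conjugate is y^2 / 2


xs = [-10 + k / 200 for k in range(4001)]  # grid on [-10, 10], step 0.005
f_star = conjugate_on_grid(f, xs)

for y in (-3.0, 0.0, 1.5, 4.0):
    print(y, f_star(y), 0.5 * y * y)
```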
Some important properties of the conjugate function:
Conjugates of some simple convex functions:
Some important corollaries:
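The proof of $f^{**}=f$ is also easy. Assume $f$ is convex and differentiable, and parametrize $\operatorname{dom} f^*$ by gradients: $\mathbf{y}=\nabla f\left(\mathbf{x}_0\right) \in \operatorname{dom} f^*$ with $\mathbf{x}_0 \in \operatorname{dom} f$. By convexity, $\mathbf{x}_0$ maximizes $\mathbf{y}^{\mathrm{T}}\mathbf{x}-f(\mathbf{x})$, so $f^*(\mathbf{y})=\nabla f(\mathbf{x}_0)^{\mathrm{T}}\mathbf{x}_0-f(\mathbf{x}_0)$. The conjugate of $f^*$ is then: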
$$
\begin{aligned}
f^{**}(\mathbf{x}) &=\sup_{\mathbf{y} \in \operatorname{dom} f^*}\left\{\mathbf{x}^{\mathrm{T}} \mathbf{y}-f^*(\mathbf{y})\right\} \\
&=\sup_{\mathbf{x}_0 \in \operatorname{dom} f}\left\{\mathbf{x}^{\mathrm{T}} \nabla f(\mathbf{x}_0)-\nabla f(\mathbf{x}_0)^{\mathrm{T}} \mathbf{x}_0+f(\mathbf{x}_0)\right\} \\
&=\sup_{\mathbf{x}_0 \in \operatorname{dom} f}\left\{f(\mathbf{x}_0)+\nabla f(\mathbf{x}_0)^{\mathrm{T}}(\mathbf{x}-\mathbf{x}_0)\right\} \\
&=f(\mathbf{x})
\end{aligned}
$$

The last equality holds because, for a convex differentiable $f$ and $\mathbf{x} \in \operatorname{dom} f$, $f(\mathbf{x}) \geqslant f(\mathbf{x}_0)+\nabla f(\mathbf{x}_0)^{\mathrm{T}}(\mathbf{x}-\mathbf{x}_0)$ for every $\mathbf{x}_0 \in \operatorname{dom} f$. Equality holds at $\mathbf{x}_0=\mathbf{x}$.
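The biconjugate argument can be checked numerically by applying a grid-based conjugate twice. This sketch rests on stated assumptions. The test function $f(x)=x^4/4$ is a hypothetical choice; its conjugate is $f^*(y)=\tfrac34|y|^{4/3}$. The helper `conjugate_on_grid` is illustrative only. Both suprema are approximated on finite grids, chosen wide enough to contain the maximizers for the test points.

```python
def conjugate_on_grid(f, grid):
    """Hypothetical helper: s -> max over t in grid of (s*t - f(t)),
    a brute-force approximation of the conjugate f*."""
    pts = [(t, f(t)) for t in grid]
    return lambda s: max(s * t - ft for t, ft in pts)


def f(x):
    return x**4 / 4  # convex and differentiable


x_grid = [-3 + k / 100 for k in range(601)]   # x in [-3, 3]
y_grid = [-8 + k / 100 for k in range(1601)]  # y in [-8, 8]; covers f'(x) = x**3 for |x| <= 2

f_star = conjugate_on_grid(f, x_grid)            # approximates (3/4)|y|**(4/3)
f_star_star = conjugate_on_grid(f_star, y_grid)  # approximates f**(x)

for x in (-1.5, -0.5, 0.0, 0.7, 1.5):
    print(x, f_star_star(x), f(x))
```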
Consider the following optimization problem with an affine equality constraint:
$$
\begin{aligned}
&\min\ f_0(\mathbf{x}) \\
&\text{s.t. } \mathbf{x}=\mathbf{0}
\end{aligned}
$$
Its Lagrangian is $\mathcal{L}(\mathbf{x}, \boldsymbol{\nu})=f_0(\mathbf{x})+\boldsymbol{\nu}^{\mathrm{T}} \mathbf{x}$, and the corresponding dual function is:
$$
\begin{aligned}
g(\boldsymbol{\nu}) &=\inf_{\mathbf{x}}\left\{f_0(\mathbf{x})+\boldsymbol{\nu}^{\mathrm{T}} \mathbf{x}\right\} \\
&=-\sup_{\mathbf{x}}\left\{(-\boldsymbol{\nu})^{\mathrm{T}} \mathbf{x}-f_0(\mathbf{x})\right\}=-f_0^*(-\boldsymbol{\nu})
\end{aligned}
$$
Here $\operatorname{dom} g=-\operatorname{dom} f_0^*$. The conjugate $f_0^*$ is a pointwise supremum of affine functions, so it is convex, and therefore $g(\boldsymbol{\nu})$ is concave.
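As a concrete instance of $g(\boldsymbol{\nu})=-f_0^*(-\boldsymbol{\nu})$, take the hypothetical scalar objective $f_0(x)=\mathrm{e}^x$, which is not in the text. Its conjugate is the standard one:

- $f_0^*(y)=y\log y-y$ for $y>0$,
- $f_0^*(0)=0$,
- $f_0^*(y)=+\infty$ for $y<0$.

So $\operatorname{dom} g=-\operatorname{dom} f_0^*=(-\infty,0]$. The sketch compares the conjugate formula with a brute-force grid minimum of $\mathcal{L}(x,\nu)=\mathrm{e}^x+\nu x$. The grid range is an assumption chosen to contain the minimizer $x=\log(-\nu)$.

```python
import math


def f0(x):
    return math.exp(x)


def f0_conj(y):
    # Conjugate of exp: y*log(y) - y for y > 0, 0 at y = 0, +inf for y < 0.
    if y < 0:
        return math.inf
    if y == 0:
        return 0.0
    return y * math.log(y) - y


def g(nu):
    # Dual of "minimize exp(x) s.t. x = 0" via g(nu) = -f0*(-nu).
    return -f0_conj(-nu)


def g_grid(nu, lo=-20.0, hi=20.0, n=40001):
    # Brute-force min of L(x, nu) = exp(x) + nu*x over a grid on [lo, hi].
    step = (hi - lo) / (n - 1)
    return min(f0(lo + k * step) + nu * (lo + k * step) for k in range(n))


for nu in (-3.0, -1.0, -0.2):
    print(nu, g(nu), g_grid(nu))

# For nu > 0, nu*x -> -inf as x -> -inf, so g(nu) = -inf (nu lies outside dom g).
print(g(0.5))
```

Here $p^\star=f_0(0)=1$ and $g(-1)=1$, so for this example the lower bound is attained.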
Now consider a more general problem with linear inequality and equality constraints:
$$
\begin{array}{ll}
\min & f_0(\mathbf{x}) \\
\text{s.t.} & \mathbf{A x} \preceq \mathbf{b},\ \mathbf{C x}=\mathbf{d}
\end{array}
$$
In this case:
$$
\begin{aligned}
g(\boldsymbol{\lambda}, \boldsymbol{\nu}) &=\inf_{\mathbf{x}}\left\{f_0(\mathbf{x})+\boldsymbol{\lambda}^{\mathrm{T}}(\mathbf{A} \mathbf{x}-\mathbf{b})+\boldsymbol{\nu}^{\mathrm{T}}(\mathbf{C} \mathbf{x}-\mathbf{d})\right\} \\
&=-\mathbf{b}^{\mathrm{T}} \boldsymbol{\lambda}-\mathbf{d}^{\mathrm{T}} \boldsymbol{\nu}+\inf_{\mathbf{x}}\left\{f_0(\mathbf{x})+\left(\mathbf{A}^{\mathrm{T}} \boldsymbol{\lambda}+\mathbf{C}^{\mathrm{T}} \boldsymbol{\nu}\right)^{\mathrm{T}} \mathbf{x}\right\} \\
&=-\mathbf{b}^{\mathrm{T}} \boldsymbol{\lambda}-\mathbf{d}^{\mathrm{T}} \boldsymbol{\nu}-\sup_{\mathbf{x}}\left\{-\left(\mathbf{A}^{\mathrm{T}} \boldsymbol{\lambda}+\mathbf{C}^{\mathrm{T}} \boldsymbol{\nu}\right)^{\mathrm{T}} \mathbf{x}-f_0(\mathbf{x})\right\} \\
&=-\mathbf{b}^{\mathrm{T}} \boldsymbol{\lambda}-\mathbf{d}^{\mathrm{T}} \boldsymbol{\nu}-f_0^*\left(-\mathbf{A}^{\mathrm{T}} \boldsymbol{\lambda}-\mathbf{C}^{\mathrm{T}} \boldsymbol{\nu}\right)
\end{aligned}
$$
Here $\operatorname{dom} g=\left\{(\boldsymbol{\lambda}, \boldsymbol{\nu}) \mid -\left(\mathbf{A}^{\mathrm{T}} \boldsymbol{\lambda}+\mathbf{C}^{\mathrm{T}} \boldsymbol{\nu}\right) \in \operatorname{dom} f_0^*\right\}$.
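The general formula can be checked the same way. The sketch below uses hypothetical data that are not in the text:

- $f_0(\mathbf{x})=\tfrac12\|\mathbf{x}\|_2^2$ on $\mathbb{R}^2$, whose conjugate is $f_0^*(\mathbf{y})=\tfrac12\|\mathbf{y}\|_2^2$,
- one inequality row in $\mathbf{A}$,
- one equality row in $\mathbf{C}$.

It compares $g$ from the conjugate formula with a slow brute-force minimum of the Lagrangian over a bounded grid. This comparison is only valid while the minimizer $\mathbf{x}=-(\mathbf{A}^{\mathrm{T}}\boldsymbol{\lambda}+\mathbf{C}^{\mathrm{T}}\boldsymbol{\nu})$ stays inside the grid.

```python
# Hypothetical data (not from the text): f0(x) = 0.5*||x||^2 on R^2,
# with conjugate f0*(y) = 0.5*||y||^2.
A = [[1.0, 2.0]]   # inequality constraints: A x <= b
b = [1.0]
C = [[1.0, -1.0]]  # equality constraints: C x = d
d = [0.5]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def mat_t_vec(M, v):
    # M^T v, with M stored as a list of rows.
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]


def f0(x):
    return 0.5 * dot(x, x)


def f0_conj(y):
    return 0.5 * dot(y, y)


def g_formula(lam, nu):
    # g = -b^T lam - d^T nu - f0*(-A^T lam - C^T nu)
    s = [p + q for p, q in zip(mat_t_vec(A, lam), mat_t_vec(C, nu))]
    return -dot(b, lam) - dot(d, nu) - f0_conj([-t for t in s])


def g_grid(lam, nu, w=5.0, n=161):
    # Brute-force min of the Lagrangian over an n-by-n grid on [-w, w]^2.
    step = 2 * w / (n - 1)
    best = float("inf")
    for i in range(n):
        for j in range(n):
            x = [-w + i * step, -w + j * step]
            ineq = [dot(row, x) - bi for row, bi in zip(A, b)]
            eq = [dot(row, x) - di for row, di in zip(C, d)]
            best = min(best, f0(x) + dot(lam, ineq) + dot(nu, eq))
    return best


cases = [([0.5], [1.0]), ([2.0], [-1.0]), ([0.0], [0.0])]  # lam >= 0 in every case
for lam, nu in cases:
    print(lam, nu, g_formula(lam, nu), g_grid(lam, nu))
```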
For example, consider the maximum entropy problem:
$$
\begin{array}{lcl}
\max\limits_{\mathbf{x}}\ \sum_{i=1}^n x_i \log \frac{1}{x_i} & \equiv & \min\limits_{\mathbf{x}}\ f_0(\mathbf{x}) \triangleq \sum_{i=1}^n x_i \log x_i \\
\text{s.t. } \mathbf{x} \in \mathbb{R}_{+}^n,\ \mathbf{1}_n^{\mathrm{T}} \mathbf{x}=1 & & \text{s.t. } \mathbf{x} \in \mathbb{R}_{+}^n,\ \mathbf{1}_n^{\mathrm{T}} \mathbf{x}=1
\end{array}
$$
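The conjugates of common convex functions given above apply here, and $f_0$ is separable across coordinates. Together these give: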
$$
\begin{aligned}
f_0^*(\mathbf{y}) &=\sup_{\mathbf{x} \in \mathbb{R}_{+}^n}\left\{\mathbf{y}^{\mathrm{T}} \mathbf{x}-f_0(\mathbf{x})\right\} \\
&=\sum_{i=1}^n \sup_{x_i \in \mathbb{R}_{+}}\left\{y_i x_i-x_i \log x_i\right\}=\sum_{i=1}^n \mathrm{e}^{y_i-1}
\end{aligned}
$$
Here $\operatorname{dom} f_0^*=\mathbb{R}^n$. Setting $\mathbf{A}=-\mathbf{I}_n$, $\mathbf{b}=\mathbf{0}$, $\mathbf{C}=\mathbf{1}_n^{\mathrm{T}}$, and $d=1$ (so $\nu$ is a scalar), we obtain:
$$
\begin{aligned}
g(\boldsymbol{\lambda}, \nu) &=-\nu-f_0^*\left(\boldsymbol{\lambda}-\mathbf{1}_n \nu\right) \\
&=-\nu-\sum_{i=1}^n \mathrm{e}^{\lambda_i-\nu-1}=-\nu-\mathrm{e}^{-\nu-1} \sum_{i=1}^n \mathrm{e}^{\lambda_i}
\end{aligned}
$$
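The two closed forms above can be sanity-checked numerically. The helper names in the sketch below are hypothetical. It does three things:

1. It compares the scalar conjugate of $x\log x$ with $\mathrm{e}^{y-1}$ using a grid search. This assumes the maximizer $x=\mathrm{e}^{y-1}$ lies inside the grid.
2. It checks $g(\boldsymbol{\lambda},\nu)\leqslant p^\star$ for a few $\boldsymbol{\lambda}\succeq\mathbf{0}$.
3. It uses the known fact $p^\star=-\log n$, attained at the uniform distribution, to show that maximizing $g$ recovers $p^\star$ for this problem.

```python
import math


def neg_entropy(x):
    # x*log(x) for x >= 0, with the convention 0*log(0) = 0.
    return 0.0 if x == 0 else x * math.log(x)


def conj_scalar_grid(y, hi=20.0, n=50001):
    # Brute-force sup over x in [0, hi] of y*x - x*log(x); should match exp(y - 1).
    step = hi / (n - 1)
    return max(y * (k * step) - neg_entropy(k * step) for k in range(n))


def g(lam, nu):
    # Dual function derived above (A = -I, b = 0, C = 1^T, d = 1).
    return -nu - math.exp(-nu - 1) * sum(math.exp(l) for l in lam)


n = 4
p_star = -math.log(n)  # optimal f0, attained at the uniform point x_i = 1/n

# 1) Scalar conjugate of x*log(x) versus the closed form exp(y - 1).
for y in (-2.0, 0.0, 1.0, 2.5):
    print(y, conj_scalar_grid(y), math.exp(y - 1))

# 2) Weak duality: g(lam, nu) <= p* for lam >= 0 and any nu.
samples = [([0.1 * i for i in range(n)], nu) for nu in (-2.0, -0.5, 0.0, 0.5, 2.0)]
print(all(g(lam, nu) <= p_star + 1e-12 for lam, nu in samples))

# 3) For lam >= 0 each exp(lam_i) >= 1 multiplies the negative factor -exp(-nu - 1),
#    so lam = 0 maximizes g; d/dnu g(0, nu) = -1 + n*exp(-nu - 1) = 0 gives nu = log(n) - 1.
nu_star = math.log(n) - 1
print(g([0.0] * n, nu_star), p_star)  # equal up to rounding: the bound is tight here
```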
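Summary: when the constraints are linear, the Lagrange dual function of an optimization problem (not necessarily convex) can be obtained directly from the conjugate of the objective.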