Lagrange duality. The primal optimization problem is
$$
\begin{aligned} &\min &f_0(x) \\ &\text{s.t.} &f_i(x) \le 0\\ &&h_i(x) = 0\end{aligned}
$$
The dual function is
$$
g(\lambda,\nu)=\inf_{x\in \mathcal{D}} L(x,\lambda,\nu)=\inf_{x\in \mathcal{D}} \left(f_0(x) +\sum \lambda_i f_i(x) +\sum \nu_i h_i(x) \right)
$$
The $\inf$ may take the value $-\infty$.
Define $\operatorname{dom} g=\{(\lambda,\nu)\mid g(\lambda,\nu) \gt -\infty \}$.
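To make the definition concrete, here is a minimal sketch that evaluates the dual function of a toy problem that is not from the original notes: $\min x^2$ subject to $1 - x \le 0$. The function names, the grid, and the problem itself are illustrative assumptions. For this problem the Lagrangian is minimized at $x=\lambda/2$, which gives $g(\lambda)=\lambda-\lambda^2/4$, and the code compares that closed form with a brute-force infimum over a grid.

```python
import numpy as np

# Toy problem (an assumption for illustration): min x^2  s.t.  1 - x <= 0
GRID = np.linspace(-10.0, 10.0, 200001)  # candidate x values, step 1e-4

def lagrangian(x, lam):
    # L(x, lam) = f0(x) + lam * f1(x) with f0(x) = x^2 and f1(x) = 1 - x
    return x**2 + lam * (1.0 - x)

def dual_numeric(lam):
    # Approximate g(lam) = inf_x L(x, lam) by the minimum over the grid
    return float(np.min(lagrangian(GRID, lam)))

def dual_closed_form(lam):
    # Setting dL/dx = 2x - lam = 0 gives x = lam / 2, so g(lam) = lam - lam^2 / 4
    return lam - lam**2 / 4.0

for lam in [0.0, 1.0, 2.0, 4.0]:
    print(lam, dual_numeric(lam), dual_closed_form(lam))
```

The grid minimum only approximates the infimum. It works here because the minimizer $\lambda/2$ lies well inside the grid for the values tested.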
Weak duality: let $p^\star$ be the optimal value of the primal problem. For any $\lambda \succeq 0$ and any $\nu$,
$$
g(\lambda,\nu) \le p^\star
$$
$$
d^\star=\sup g(\lambda,\nu)\le p^\star
$$
$p^\star - d^\star$ is called the duality gap. If the duality gap is 0, strong duality is said to hold.
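As a quick numerical check of weak duality, the sketch below uses an assumed toy problem, $\min x^2$ subject to $1 - x \le 0$, which is not from the original notes. Its primal optimum is $p^\star = 1$ at $x = 1$, and its dual function is $g(\lambda)=\lambda-\lambda^2/4$. The code scans $\lambda \ge 0$, confirms that every dual value is a lower bound on $p^\star$, and estimates $d^\star$.

```python
import numpy as np

def g(lam):
    # Dual function of the toy problem min x^2 s.t. 1 - x <= 0
    # (closed form obtained by minimizing x^2 + lam * (1 - x) over x)
    return lam - lam**2 / 4.0

p_star = 1.0                         # primal optimum, attained at x = 1
lams = np.linspace(0.0, 10.0, 1001)  # dual-feasible multipliers, lam >= 0
vals = g(lams)

# Weak duality: every dual value is a lower bound on p_star
weak_duality_holds = bool(np.all(vals <= p_star + 1e-12))

d_star = float(vals.max())                # best lower bound found on the grid
lam_star = float(lams[vals.argmax()])     # multiplier that attains it
gap = p_star - d_star                     # duality gap

print(weak_duality_holds, lam_star, d_star, gap)
```

This toy problem is convex and strictly feasible (for example $x = 2$), so the gap comes out as zero up to rounding.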
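There are many criteria that guarantee strong duality. A simple one is Slater's condition (the linked page has a proof).
For a geometric interpretation, see this Zhihu answer.
It explains intuitively that when the problem is convex, the projection of its epigraph is convex, and a convex set has a supporting hyperplane at every boundary point. However, I do not see why strong duality holds in most cases for convex problems (it does not seem to follow from this picture alone). The missing piece is presumably that the supporting hyperplane may be vertical, which is exactly what a constraint qualification such as Slater's condition rules out.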
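If strong duality holds (whether or not the primal is convex), and $x^\star$ and $(\lambda^\star,\nu^\star)$ are primal and dual optimal, then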
$$
\begin{aligned}f_0(x^\star)&=g(\lambda^\star,\nu^\star)\\ &=\inf \left(f_0(x) +\sum \lambda_i^\star f_i(x) +\sum \nu_i^\star h_i(x) \right)\\ &\le f_0(x^\star) +\sum \lambda_i^\star f_i(x^\star) +\sum \nu_i^\star h_i(x^\star)\\ &\le f_0(x^\star) \end{aligned}
$$
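The two ends of the chain are equal, so both inequalities hold with equality. In particular $\sum \lambda_i^\star f_i(x^\star)=0$, and since every term is nonpositive ($\lambda_i^\star \ge 0$, $f_i(x^\star)\le 0$), each $\lambda_i^\star f_i(x^\star)=0$. This equality is called complementary slackness.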
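Combining complementary slackness, the gradient (stationarity) condition at the optimum, and the constraints of the problem gives the KKT conditions. When strong duality holds and the functions are differentiable, they are necessary conditions for a primal-dual optimal pair: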
$$
\begin{aligned}
&f_i(x^\star)\le 0\\
&h_i(x^\star)=0\\
&\lambda_i^\star \ge 0\\
&\lambda_i^\star f_i(x^\star)=0\\
&\nabla f_0(x^\star) +\sum \lambda_i^\star \nabla f_i(x^\star) +\sum \nu_i^\star \nabla h_i(x^\star)=0
\end{aligned}
$$
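If the primal problem is convex, the KKT conditions are also sufficient: any $x^\star$ and $(\lambda^\star,\nu^\star)$ that satisfy them are primal and dual optimal, and strong duality holds.
This is because $L(x,\lambda^\star,\nu^\star)$ is convex in $x$, so the stationarity condition means $x^\star$ minimizes it, and therefore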
$$
\begin{aligned}g(\lambda^\star,\nu^\star)&=L(x^\star,\lambda^\star,\nu^\star)\\ &= f_0(x^\star) +\sum \lambda_i^\star f_i(x^\star) +\sum \nu_i^\star h_i(x^\star)\\ &= f_0(x^\star) \end{aligned}
$$
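The KKT conditions can be checked mechanically for a candidate pair. The sketch below does this for an assumed convex toy problem that is not from the original notes: $\min x_1^2+x_2^2$ subject to $1-x_1-x_2\le 0$ and $x_1-2\le 0$. The candidate solution is $x^\star=(0.5,0.5)$, $\lambda^\star=(1,0)$. All function names are hypothetical. The second constraint is inactive at $x^\star$, so complementary slackness forces its multiplier to be zero.

```python
import numpy as np

# Assumed toy problem: min x1^2 + x2^2  s.t.  1 - x1 - x2 <= 0,  x1 - 2 <= 0
def grad_f0(x):
    return np.array([2.0 * x[0], 2.0 * x[1]])

def f_ineq(x):
    # Values of the inequality constraints f_1(x), f_2(x)
    return np.array([1.0 - x[0] - x[1], x[0] - 2.0])

def jac_ineq(x):
    # Row i holds the gradient of f_i (both constraints are affine here)
    return np.array([[-1.0, -1.0], [1.0, 0.0]])

def kkt_check(x, lam, tol=1e-9):
    f = f_ineq(x)
    stationarity = grad_f0(x) + jac_ineq(x).T @ lam
    return {
        "primal_feasible": bool(np.all(f <= tol)),
        "dual_feasible": bool(np.all(lam >= -tol)),
        "complementary_slackness": bool(np.all(np.abs(lam * f) <= tol)),
        "stationarity": bool(np.linalg.norm(stationarity) <= tol),
    }

x_star = np.array([0.5, 0.5])
lam_star = np.array([1.0, 0.0])   # multiplier of the inactive constraint is 0
report = kkt_check(x_star, lam_star)
print(report)

# A feasible but non-optimal point fails slackness and stationarity
bad_report = kkt_check(np.array([1.0, 1.0]), lam_star)
print(bad_report)
```

Because the toy problem is convex, passing all four checks certifies optimality: $f_0(x^\star)=0.5$, which matches the dual value $g(\lambda^\star)$.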
A convex optimization example where strong duality fails, with a geometric explanation:
$$
\begin{aligned} &\min &e^{-x} \\ &\text{s.t.} &x^2/y \le 0\end{aligned}
$$
The domain is $\mathcal{D}=\{(x,y)\mid y>0\}$.
$$
L(x,y,\lambda)=e^{-x}+\lambda x^2/y
$$
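$p^\star=1$ and $d^\star=0$, so the duality gap is 1.
The constraint forces $x=0$, so the feasible set reduces to the single value $x=0$ (with any $y>0$), where the objective equals 1. Meanwhile $g(\lambda)=0$ for every $\lambda\ge 0$, so the line $\lambda u + t=g(\lambda)$ does not touch the epigraph at $(0,p^\star)$ and is not a supporting hyperplane there.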
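The value $g(\lambda)=0$ can be seen numerically. For $\lambda\ge 0$ both terms of $L(x,y,\lambda)=e^{-x}+\lambda x^2/y$ are nonnegative, so $g(\lambda)\ge 0$. The sketch below evaluates $L$ along the path $x=s$, $y=s^3$, where it equals $e^{-s}+\lambda/s$ and tends to 0. The path and the value $\lambda=5$ are illustrative choices, not part of the original notes.

```python
import math

def lagrangian(x, y, lam):
    # L(x, y, lam) = exp(-x) + lam * x^2 / y, defined on the domain y > 0
    return math.exp(-x) + lam * x**2 / y

lam = 5.0  # any lam >= 0 gives the same limit

# Along x = s, y = s^3 the Lagrangian equals exp(-s) + lam / s
path_values = [lagrangian(s, s**3, lam) for s in (1e1, 1e2, 1e3, 1e4)]
print(path_values)  # positive values shrinking toward 0, so g(lam) = 0

# The only feasible x is 0 (x^2 / y <= 0 with y > 0), so p_star = exp(-0)
p_star = math.exp(-0.0)
print(p_star)
```

The infimum 0 is approached but never attained. This is consistent with a dual optimum $d^\star=0$ lying strictly below $p^\star=1$.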