Given the minimization problem
$$\min_{x\in\mathbf{R}^n} f(x),\quad\text{subject to}\quad h(x)\leq 0,\; l(x)=0.$$
To keep the treatment simple and make the essence of the problem easier to see, we assume here that there is only a single constraint function of each type.
The Lagrange dual function is defined as
$$g(u,v)=\min_{x\in\mathbf{R}^n}L(x,u,v)=\min_{x\in\mathbf{R}^n}\{f(x)+u\,h(x)+v\,l(x)\},$$
and the dual problem of the original (primal) problem is
$$\max_{u,v}\, g(u,v),\quad\text{subject to}\quad u\ge 0.$$
Only the multiplier $u$ of the inequality constraint must be nonnegative; the multiplier $v$ of the equality constraint is unrestricted in sign.

Properties:
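(1) The dual problem is a convex problem. Regardless of whether the original function $f$ is convex, the dual function $g$ is concave, because it is a pointwise minimum of functions that are affine in $(u,v)$. Maximizing a concave function over the dual feasible set is a convex optimization problem.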
(2) Weak duality: if $f^*$ and $g^*$ are the optimal values of the primal and dual problems respectively, then $g^*\leq f^*$.
Indeed, let $(u^*,v^*)$ be an optimal solution of the dual problem and let $C=\{x : h(x)\leq 0,\ l(x)=0\}$ be the primal feasible set. Then
$$g^*=g(u^*,v^*)=\min_{x\in\mathbf{R}^n}L(x,u^*,v^*)\leq \min_{x\in C}L(x,u^*,v^*)\leq \min_{x\in C} f(x)=f^*.$$
The first $\leq$ holds because the minimum is taken over a smaller set; the second follows from the constraints of the primal problem: for $x\in C$ we have $u^*h(x)\leq 0$ and $v^*l(x)=0$, so $L(x,u^*,v^*)\leq f(x)$.
(3) Strong duality: suppose the primal problem is convex and there exists a strictly feasible point, that is, an $x'$ such that $h(x')<0$ and $l(x')=0$ (Slater's condition). Then $f^*=g^*$. How to prove this?
(4) Given a feasible point $x$ of the primal problem and a feasible point $(u,v)$ of the dual problem, define the duality gap as
$$D(x,u,v)=f(x)-g(u,v).$$
If $D(x,u,v)=0$, then $x$ is an optimal solution of the primal problem and $(u,v)$ is an optimal solution of the dual problem.
Proof. Since $g(u,v)\leq g^*\leq f^*\leq f(x)$, we have
$$0\leq f(x)-f^*\leq f(x)-g^*\leq f(x)-g(u,v)=D(x,u,v)=0,$$
so $f(x)=f^*$. In the same way, $0\leq g^*-g(u,v)\leq f(x)-g(u,v)=0$, so $g(u,v)=g^*$. $\Box$
Remark. The duality gap $D(x,u,v)$ can be used as a stopping criterion for iterative optimization algorithms: if $D(x,u,v)<\epsilon$, then $f(x)-f^*\leq D(x,u,v)<\epsilon$.
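The properties above can be checked numerically on a toy problem. The sketch below is only an illustration under assumptions that are not in the original text: it uses the hypothetical problem $\min x^2$ subject to $1-x\leq 0$ (no equality constraint), for which the dual function has the closed form $g(u)=u-u^2/4$, and the helper names `f`, `h`, `g` and `duality_gap` are made up for this example.

```python
# Toy problem (an assumption for illustration): minimize x^2 subject to 1 - x <= 0.
# The primal optimum is x* = 1 with f* = 1.

def f(x):
    return x * x

def h(x):
    # Inequality constraint h(x) <= 0, i.e. x >= 1.
    return 1.0 - x

def g(u):
    # L(x, u) = x^2 + u * (1 - x) is minimized over x at x = u / 2,
    # which gives the dual function g(u) = u - u^2 / 4.
    return u - u * u / 4.0

def duality_gap(x, u):
    # D(x, u) = f(x) - g(u) for a primal feasible x and a dual feasible u >= 0.
    return f(x) - g(u)

f_star = f(1.0)

# Weak duality: g(u) <= f* for every dual feasible u on a grid.
us = [k / 10.0 for k in range(0, 61)]
weak_duality_holds = all(g(u) <= f_star + 1e-12 for u in us)
g_star = max(g(u) for u in us)

# At the optimal pair (x*, u*) = (1, 2) the gap vanishes.
gap_at_optimum = duality_gap(1.0, 2.0)

# For a non-optimal feasible pair the gap bounds the suboptimality f(x) - f*.
x_feas, u_feas = 1.05, 1.8
gap = duality_gap(x_feas, u_feas)
suboptimality = f(x_feas) - f_star
```

In this example the grid maximum of `g` coincides with `f_star`, which is consistent with strong duality for a convex problem with a strictly feasible point, and `suboptimality` never exceeds `gap`, which is what makes the gap usable as a stopping test.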
Here we use the general form
$$\min_{x\in\mathbf{R}^n} f(x)$$
$$\text{subject to}\quad h_i(x)\leq 0,\; i=1,\cdots,s,\qquad l_j(x)=0,\; j=1,\cdots,t.$$
Define the Lagrange function:
$$\mathcal{L}(x,u,v)=f(x)+u^Th(x)+v^Tl(x),$$
where $u\in\mathbf{R}^s_{+}$, $v\in\mathbf{R}^t$, $h=(h_1,\cdots,h_s)$, $l=(l_1,\cdots,l_t)$.
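Theorem. Suppose the problem is convex, the functions involved are differentiable, and strong duality holds with the dual optimum attained (for example, under the strict feasibility condition of property (3)). Then $x^*$ is an optimal solution of the primal problem if and only if there exist $u^*,v^*$ (not necessarily unique) such that the KKT conditions hold:
$$\nabla f(x^*)+\sum_{i=1}^{s}u_i^*\nabla h_i(x^*)+\sum_{j=1}^{t}v_j^*\nabla l_j(x^*)=0\quad\text{(stationarity)},$$
$$u_i^*h_i(x^*)=0,\; i=1,\cdots,s\quad\text{(complementary slackness)},$$
$$h_i(x^*)\leq 0,\; i=1,\cdots,s,\qquad l_j(x^*)=0,\; j=1,\cdots,t\quad\text{(primal feasibility)},$$
$$u_i^*\geq 0,\; i=1,\cdots,s\quad\text{(dual feasibility)}.$$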
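As a concrete illustration, the four KKT conditions (stationarity, complementary slackness, primal feasibility, dual feasibility) can be evaluated as residuals at a candidate point. The sketch below assumes a hypothetical convex problem that is not in the original text, $\min x_1^2+x_2^2$ subject to $1.5-x_1\leq 0$ and $x_1+x_2-2=0$; the function name `kkt_residuals` is likewise made up for this example.

```python
# Hypothetical problem (an assumption for illustration):
#   minimize x1^2 + x2^2  subject to  1.5 - x1 <= 0,  x1 + x2 - 2 = 0.

def grad_f(x):
    return [2.0 * x[0], 2.0 * x[1]]

def h(x):
    return 1.5 - x[0]

def grad_h(x):
    return [-1.0, 0.0]

def l(x):
    return x[0] + x[1] - 2.0

def grad_l(x):
    return [1.0, 1.0]

def kkt_residuals(x, u, v):
    """Return how strongly each KKT condition is violated at (x, u, v)."""
    gf, gh, gl = grad_f(x), grad_h(x), grad_l(x)
    return {
        # grad f + u * grad h + v * grad l = 0
        "stationarity": max(abs(gf[i] + u * gh[i] + v * gl[i]) for i in range(2)),
        # u * h(x) = 0
        "complementary_slackness": abs(u * h(x)),
        # h(x) <= 0 and l(x) = 0
        "primal_feasibility": max(max(h(x), 0.0), abs(l(x))),
        # u >= 0 (v is free in sign)
        "dual_feasibility": max(-u, 0.0),
    }

# Candidate optimum with its multipliers; note that v is negative here.
x_star, u_star, v_star = [1.5, 0.5], 2.0, -1.0
residuals = kkt_residuals(x_star, u_star, v_star)
is_kkt_point = all(r < 1e-9 for r in residuals.values())

# A feasible but non-optimal point fails stationarity for these multipliers.
other = kkt_residuals([2.0, 0.0], 0.0, 0.0)
```

The negative value of `v_star` shows why only the inequality multiplier needs a sign restriction.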
Example. P8 in Robert P. Rooderkerk, Harald J. van Heerde, Robust Optimization of the 0-1 Knapsack Problem: Balancing Risk and Return in Assortment Optimization, European Journal of Operational Research.
$(\text{RAO}_{\text{Robust}})$
$$\max_{x}\min_{p\in U}\sum_{k=1}^{n}p_kx_k$$
$$\text{subject to}\quad \sum_{k=1}^{n}w_kx_k\leq c,\quad x_k\in\{0,1\},\quad k=1,\cdots,n$$
with the uncertainty set defined as follows:
$$U=\left\{p \mid (p-\bar{p})^T\Theta^{-1}(p-\bar{p})\leq r^2\right\}$$
where $\Theta$ is a symmetric positive definite matrix with elements $\theta_{kk'}$. Then $(\text{RAO}_{\text{Robust}})$ can be rewritten as follows:
$$\max_{x}\sum_{k=1}^{n}\bar{p}_kx_k-r\sqrt{\sum_{k=1}^{n}\sum_{k'=1}^{n}\theta_{kk'}x_kx_{k'}}$$
$$\text{subject to}\quad \sum_{k=1}^{n}w_kx_k\leq c,\quad x_k\in\{0,1\},\quad k=1,\cdots,n$$
Proof. Write $p=\bar{p}+\tilde{p}$. Since $\bar{p}^Tx$ does not depend on $p$, we only need to consider the perturbation part $\min_{p\in U}\sum_{k=1}^{n}\tilde{p}_kx_k$. Define
$$f(\tilde{p})=\tilde{p}^Tx,\qquad h(\tilde{p})=\tilde{p}^T\Theta^{-1}\tilde{p}-r^2$$
and then the Lagrangian:
$$\mathcal{L}(\tilde{p},u)=\tilde{p}^Tx+u\left(\tilde{p}^T\Theta^{-1}\tilde{p}-r^2\right)$$
Using the KKT conditions, if $\tilde{p}^*,u^*$ are the optimal solutions, we have
$$\nabla_{\tilde{p}}\mathcal{L}(\tilde{p}^*,u^*)=x+2u^*\Theta^{-1}\tilde{p}^*=0\tag{1}$$
$$\tilde{p}^{*T}\Theta^{-1}\tilde{p}^*=r^2\tag{2}$$
For $x\neq 0$, (1) forces $u^*>0$, so the constraint is active, which is (2). From (1), we conclude $\tilde{p}^*=-\frac{1}{2u^*}\Theta x$. Substituting into (2) gives $\left(\frac{1}{2u^*}\right)^2x^T\Theta x=r^2$, that is, $\frac{1}{2u^*}=\sqrt{\frac{r^2}{x^T\Theta x}}$, so
$$\min_{p\in U}\sum_{k=1}^{n}\tilde{p}_kx_k=x^T\tilde{p}^*=-\sqrt{\frac{r^2}{x^T\Theta x}}\,x^T\Theta x=-r\sqrt{x^T\Theta x}$$
i.e.
$$\min_{p\in U}\sum_{k=1}^{n}p_kx_k=\sum_{k=1}^{n}\bar{p}_kx_k-r\sqrt{\sum_{k=1}^{n}\sum_{k'=1}^{n}\theta_{kk'}x_kx_{k'}}.\qquad\Box$$
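The closed form $-r\sqrt{x^T\Theta x}$ for the perturbation part can be sanity-checked numerically. The following is a minimal sketch, not part of the cited paper: the matrix `A`, the selection `x` and the radius `r` are made-up data, and $\Theta$ is built as $AA^T$ so that it is symmetric positive definite and points $\tilde{p}=rAz$ with $\|z\|=1$ lie on the boundary of the ellipsoid without needing $\Theta^{-1}$.

```python
import math
import random

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Hypothetical data (assumptions for illustration only).
A = [[1.0, 0.0, 0.0],
     [0.5, 1.2, 0.0],
     [-0.3, 0.4, 0.8]]
n = 3
# Theta = A A^T is symmetric positive definite because A is invertible.
theta = [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(n)]
         for i in range(n)]
x = [1, 0, 1]   # a 0-1 selection vector
r = 0.7         # radius of the uncertainty set

# Closed form derived from the KKT conditions: -r * sqrt(x^T Theta x).
closed_form = -r * math.sqrt(dot(x, matvec(theta, x)))

def sample_perturbation(rng):
    # p~ = r * A z with ||z|| = 1 satisfies p~^T Theta^{-1} p~ = r^2.
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(dot(z, z))
    z = [zi / norm for zi in z]
    return [r * c for c in matvec(A, z)]

# Random search over the boundary of the ellipsoid never beats the closed form.
rng = random.Random(0)
sampled_min = min(dot(x, sample_perturbation(rng)) for _ in range(20000))

# The KKT minimizer p~* = -r * Theta x / sqrt(x^T Theta x) attains the bound.
scale = r / math.sqrt(dot(x, matvec(theta, x)))
p_star = [-scale * c for c in matvec(theta, x)]
attained = dot(x, p_star)
```

Sampling only the boundary is enough here because a linear objective attains its minimum over an ellipsoid on the boundary; `sampled_min` approaches `closed_form` from above, and `attained` matches it up to rounding.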