Proof that f(X) = ln(e^x1 + e^x2 + ... + e^xn) is convex

Claim: $f(X)=\ln(e^{x_1}+e^{x_2}+\cdots+e^{x_n})$ is a convex function on $\mathbb{R}^n$.

Method 1: Proof from the definition

Let $X=(x_1,\ldots,x_n)^T$ and $Y=(y_1,\ldots,y_n)^T$ be two vectors in $\mathbb{R}^n$, and let $0\le a\le 1$. Then
$$
\begin{aligned}
f(aX+(1-a)Y) &= \ln\left(e^{ax_1+(1-a)y_1}+e^{ax_2+(1-a)y_2}+\cdots+e^{ax_n+(1-a)y_n}\right)\\
&= \ln\left(e^{ax_1}\cdot e^{(1-a)y_1}+e^{ax_2}\cdot e^{(1-a)y_2}+\cdots+e^{ax_n}\cdot e^{(1-a)y_n}\right)\\
&\le \ln\left(\left(e^{x_1}+e^{x_2}+\cdots+e^{x_n}\right)^a\times\left(e^{y_1}+e^{y_2}+\cdots+e^{y_n}\right)^{1-a}\right)\\
&= af(X)+(1-a)f(Y)
\end{aligned}
$$
The inequality step follows from Hölder's inequality.

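Hölder's inequality:

$$
U^TV\le\|U\|_q\,\|V\|_p,\qquad \frac{1}{q}+\frac{1}{p}=1,\quad p,q>1
$$

Here $U$ and $V$ are generic vectors, named differently from the points $X$ and $Y$ of the proof to avoid a clash. For $0<a<1$, set:

  1. $U=(e^{ax_1},e^{ax_2},\ldots,e^{ax_n})^T$,
  2. $V=(e^{(1-a)y_1},e^{(1-a)y_2},\ldots,e^{(1-a)y_n})^T$,
  3. $q=1/a$,
  4. $p=1/(1-a)$.

Then $U^TV$ is the sum inside the logarithm on the left-hand side, while $\|U\|_q=\left(\sum_{i=1}^n e^{x_i}\right)^a$ and $\|V\|_p=\left(\sum_{i=1}^n e^{y_i}\right)^{1-a}$ give the right-hand side, which are exactly the two sides of the inequality in the proof above. For $a=0$ or $a=1$ the inequality holds trivially with equality.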
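As a numerical sanity check of the definition-based argument, the following minimal sketch samples random points and verifies that $af(X)+(1-a)f(Y)-f(aX+(1-a)Y)$ is never negative beyond rounding error. The function names (`f`, `convexity_gap`, `min_gap`), the sampling range, and the trial count are illustrative assumptions and are not part of the original proof. A check like this cannot replace the proof; it only guards against algebra slips.

```python
import math
import random


def f(x):
    """log-sum-exp: ln(e^{x_1} + ... + e^{x_n}), shifted by max(x) for stability."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))


def convexity_gap(x, y, a):
    """Return a*f(x) + (1-a)*f(y) - f(a*x + (1-a)*y).

    The value is non-negative exactly when the convexity inequality holds.
    """
    mix = [a * xi + (1 - a) * yi for xi, yi in zip(x, y)]
    return a * f(x) + (1 - a) * f(y) - f(mix)


def min_gap(trials=1000, n=5, seed=0):
    """Smallest gap seen over random X, Y in [-5, 5]^n and random a in [0, 1)."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        x = [rng.uniform(-5, 5) for _ in range(n)]
        y = [rng.uniform(-5, 5) for _ in range(n)]
        a = rng.random()
        worst = min(worst, convexity_gap(x, y, a))
    return worst
```

Calling `min_gap()` should return a value that is non-negative up to floating-point rounding. The gap is zero when `x == y`, since both sides of the inequality then coincide.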

Method 2: Proof via the second-order necessary and sufficient condition for convexity

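Let $g(X)=e^{x_1}+e^{x_2}+\cdots+e^{x_n}$, let $Z=(e^{x_1},e^{x_2},\ldots,e^{x_n})^T$ with components $z_i=e^{x_i}$, and let $Y=(y_1,y_2,\ldots,y_n)^T$ be an arbitrary vector in $\mathbb{R}^n$. Then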
$$
\begin{aligned}
\nabla f(X) &= (e^{x_1},e^{x_2},\cdots,e^{x_n})^T/g(X) = Z/g(X)\\
\nabla^2f(X) &= \frac{\partial^2f(X)}{\partial X\,\partial X^T}\\
&= \frac{\partial\,(Z/g(X))}{\partial X^T}\\
&= \left((\nabla Z)^Tg(X)-Z\,\nabla^Tg(X)\right)/g^2(X)\\
&= \left\{\operatorname{diag}[Z]\,g(X)-ZZ^T\right\}/g^2(X)\\
Y^T\nabla^2f(X)Y &= \frac{1}{g^2(X)}\left(Y^T\operatorname{diag}[Z]\,Y\,g(X)-Y^T(ZZ^T)Y\right)\\
&= \frac{1}{g^2(X)}\left[(e^{x_1}+e^{x_2}+\cdots+e^{x_n})\left(\sum_{i=1}^nz_iy_i^2\right)-\left(\sum_{i=1}^nz_iy_i\right)^2\right]\\
&= \frac{1}{g^2(X)}\left(\sum_{i=1}^nz_i\sum_{i=1}^nz_iy_i^2-\left(\sum_{i=1}^n\sqrt{z_i}\,\sqrt{z_i}\,y_i\right)^2\right)\\
&\ge 0\\
\nabla^2f(X) &\succeq 0
\end{aligned}
$$
Since the Hessian is positive semidefinite at every $X$, $f(X)$ is convex.
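The gradient and Hessian formulas can be cross-checked numerically. The sketch below evaluates $\nabla f(X)=Z/g(X)$ and $\nabla^2f(X)=\{\operatorname{diag}[Z]\,g(X)-ZZ^T\}/g^2(X)$ and compares the Hessian against central finite differences of the gradient. The helper names (`gradient`, `hessian`, `hessian_fd`, `max_abs_diff`) and the step size `h` are assumptions made for illustration only.

```python
import math


def gradient(x):
    """Gradient of f(X) = ln(sum_i e^{x_i}), i.e. Z / g(X)."""
    m = max(x)  # the shift rescales Z and g by the same factor, so Z/g is unchanged
    z = [math.exp(xi - m) for xi in x]
    g = sum(z)
    return [zi / g for zi in z]


def hessian(x):
    """Hessian (diag[Z] g - Z Z^T) / g^2, written as diag[p] - p p^T with p = Z/g."""
    p = gradient(x)
    n = len(x)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(n)]
            for i in range(n)]


def hessian_fd(x, h=1e-5):
    """Approximate the Hessian by central differences of the gradient."""
    n = len(x)
    result = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        gp, gm = gradient(xp), gradient(xm)
        for i in range(n):
            result[i][j] = (gp[i] - gm[i]) / (2 * h)
    return result


def max_abs_diff(a, b):
    """Largest entrywise absolute difference between two square matrices."""
    n = len(a)
    return max(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n))
```

For a test point such as `x = [0.3, -1.2, 2.0]`, `max_abs_diff(hessian(x), hessian_fd(x))` should be very small. Each row of the Hessian also sums to zero, which matches the fact that adding the same constant to every $x_i$ shifts $f$ by that constant without changing its gradient.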

The inequality here follows from the Cauchy-Schwarz inequality:

$$
|U^TV|\le\|U\|\,\|V\|
$$

Here $U$ and $V$ are generic vectors, named differently from $X$ and $Y$ above to avoid a clash. Set:

  1. $U=(\sqrt{z_1},\sqrt{z_2},\cdots,\sqrt{z_n})^T$
  2. $V=(\sqrt{z_1}\,y_1,\sqrt{z_2}\,y_2,\cdots,\sqrt{z_n}\,y_n)^T$
  3. $(U^TV)^2\le\|U\|^2\|V\|^2$, that is, $\left(\sum_{i=1}^nz_iy_i\right)^2\le\sum_{i=1}^nz_i\sum_{i=1}^nz_iy_i^2$
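The Cauchy step can also be probed numerically. This small sketch evaluates the closed form $\left(\sum_i z_i\sum_i z_iy_i^2-(\sum_i z_iy_i)^2\right)/g^2(X)$ for random $X$ and $Y$ and records the smallest value seen. The names `quadratic_form` and `min_quadratic_form`, along with the sampling ranges, are hypothetical choices for illustration.

```python
import math
import random


def quadratic_form(x, y):
    """Y^T Hessian(X) Y = (sum z_i * sum z_i y_i^2 - (sum z_i y_i)^2) / g^2."""
    m = max(x)  # shifting by max(x) rescales numerator and g^2 by the same factor
    z = [math.exp(xi - m) for xi in x]
    g = sum(z)
    s1 = sum(zi * yi * yi for zi, yi in zip(z, y))
    s2 = sum(zi * yi for zi, yi in zip(z, y))
    return (g * s1 - s2 * s2) / g ** 2


def min_quadratic_form(trials=1000, n=4, seed=0):
    """Smallest value of the quadratic form over random X and Y."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        x = [rng.uniform(-5, 5) for _ in range(n)]
        y = [rng.uniform(-3, 3) for _ in range(n)]
        worst = min(worst, quadratic_form(x, y))
    return worst
```

The minimum should be non-negative up to rounding. When all $y_i$ are equal the form is zero, which is the equality case of the Cauchy-Schwarz inequality (the two vectors are then proportional). This shows that the Hessian is positive semidefinite but not positive definite.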
