Index of all notes (with links to the videos): 中科大-凸优化 (USTC Convex Optimization)
$$f(x)=\log\left(e^{x_1}+\cdots+e^{x_n}\right),\qquad x\in\mathbb{R}^n$$
$$\max\{x_1,\cdots,x_n\}\le f(x)\le\max\{x_1,\cdots,x_n\}+\log n$$
because $e^{\max_i x_i}\le\sum_i e^{x_i}\le n\,e^{\max_i x_i}$. So log-sum-exp is a smooth approximation of $\max$ that overestimates it by at most $\log n$.
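As a quick numerical sanity check of these bounds, here is a minimal Python sketch. The helper names `logsumexp` and `check_bounds` are illustrative, not from any library. Shifting by the maximum is a standard way to evaluate log-sum-exp without overflow, and it is itself the identity $f(x)=\max_i x_i+\log\sum_i e^{x_i-\max_j x_j}$.

```python
import math


def logsumexp(xs):
    """Numerically stable log(e^{x_1} + ... + e^{x_n})."""
    m = max(xs)
    # Shifting by the max makes the largest term e^0 = 1, so exp() cannot overflow.
    return m + math.log(sum(math.exp(x - m) for x in xs))


def check_bounds(xs):
    """Return (max x_i, f(x), max x_i + log n) for the bounds stated above."""
    lower = max(xs)
    upper = lower + math.log(len(xs))
    return lower, logsumexp(xs), upper


for xs in ([1.0, 2.0, 3.0], [0.0, 0.0, 0.0, 0.0], [1000.0, -5.0]):
    print(xs, check_bounds(xs))
```

The upper bound is attained when all $x_i$ are equal, and the lower bound when $n=1$. A naive `math.log(sum(math.exp(x) for x in xs))` would raise `OverflowError` on the last input.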
$$\frac{\partial f}{\partial x_i}=\frac{e^{x_i}}{e^{x_1}+\cdots+e^{x_n}},\qquad H=\big[H_{ij}\big]$$
For $i\neq j$:
$$\frac{\partial^2 f}{\partial x_i\,\partial x_j}=\frac{-e^{x_i}e^{x_j}}{(e^{x_1}+\cdots+e^{x_n})^2}=\frac{-e^{x_i}e^{x_j}}{(1^Tz)^2}$$
For $i=j$:
$$\frac{\partial^2 f}{\partial x_i^2}=\frac{-e^{x_i}e^{x_i}+e^{x_i}(e^{x_1}+\cdots+e^{x_n})}{(e^{x_1}+\cdots+e^{x_n})^2}=\frac{-e^{x_i}e^{x_i}+e^{x_i}\,1^Tz}{(1^Tz)^2}$$
where $z=[e^{x_1},\cdots,e^{x_n}]^T$ and $1$ denotes the all-ones vector, so $1^Tz=e^{x_1}+\cdots+e^{x_n}$.
$$H=\underbrace{\frac{1}{(1^Tz)^2}}_{>0}\;\underbrace{\left((1^Tz)\,\mathrm{diag}\{z\}-zz^T\right)}_{K\in\mathbb{R}^{n\times n}}$$
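The closed-form Hessian can be cross-checked against central finite differences of the gradient. Below is a minimal pure-Python sketch, assuming small dense inputs; `softmax`, `hessian_formula` and `hessian_fd` are illustrative names. Dividing through by $(1^Tz)^2$ gives the equivalent form $H=\mathrm{diag}(p)-pp^T$ with $p=z/(1^Tz)$, which is what the code evaluates.

```python
import math


def softmax(x):
    """Gradient of log-sum-exp: p_i = e^{x_i} / (1^T z), computed stably."""
    m = max(x)
    z = [math.exp(xi - m) for xi in x]  # z scaled by e^{-m}; the ratios p_i are unchanged
    s = sum(z)
    return [zi / s for zi in z]


def hessian_formula(x):
    """H = ((1^T z) diag(z) - z z^T) / (1^T z)^2 = diag(p) - p p^T."""
    p = softmax(x)
    n = len(x)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(n)]
            for i in range(n)]


def hessian_fd(x, h=1e-5):
    """Central differences of the gradient: column j ~ (grad f(x+h e_j) - grad f(x-h e_j)) / (2h)."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp = list(x)
        xp[j] += h
        xm = list(x)
        xm[j] -= h
        gp, gm = softmax(xp), softmax(xm)
        for i in range(n):
            H[i][j] = (gp[i] - gm[i]) / (2 * h)
    return H


x = [0.3, -1.2, 2.0, 0.0]
A, B = hessian_formula(x), hessian_fd(x)
print(max(abs(A[i][j] - B[i][j]) for i in range(len(x)) for j in range(len(x))))
```

Each row of $H$ sums to zero: $f(x+t\,1)=f(x)+t$, so $f$ is affine along the all-ones direction and $H1=0$. This also shows $H$ is only positive semidefinite, never positive definite.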
Since the scalar factor in front is positive, $H\succeq0$ if and only if $K\succeq0$, so it suffices to show
$$\forall v\in\mathbb{R}^n:\quad v^TKv\ge0$$
$$\begin{aligned}v^TKv&=(1^Tz)\,v^T\mathrm{diag}\{z\}\,v-v^Tzz^Tv\\&=\underbrace{\Big(\sum_i z_i\Big)}_{b^Tb}\underbrace{\Big(\sum_i v_i^2z_i\Big)}_{a^Ta}-\underbrace{\Big(\sum_i v_iz_i\Big)^2}_{(a^Tb)^2}\end{aligned}$$
with $a_i=v_i\sqrt{z_i}$ and $b_i=\sqrt{z_i}$ (well defined because $z_i=e^{x_i}>0$):
$$v^TKv=(b^Tb)(a^Ta)-(a^Tb)^2\ge0$$
by the Cauchy-Schwarz inequality.
$\Rightarrow$ $H\succeq0$ everywhere, so log-sum-exp is a convex function.
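To see the Cauchy-Schwarz step concretely, the sketch below evaluates $v^TKv$ once from the explicit matrix $K$ and once as $(b^Tb)(a^Ta)-(a^Tb)^2$, over random $z>0$ and $v$. It is an empirical check, assuming random sampling is representative, not a proof; the function names are hypothetical.

```python
import math
import random


def quad_form_K(z, v):
    """v^T K v with K = (1^T z) diag(z) - z z^T built explicitly as a matrix."""
    n = len(z)
    s = sum(z)
    K = [[(s * z[i] if i == j else 0.0) - z[i] * z[j] for j in range(n)]
         for i in range(n)]
    return sum(v[i] * K[i][j] * v[j] for i in range(n) for j in range(n))


def quad_form_cs(z, v):
    """The same value as (b^T b)(a^T a) - (a^T b)^2 with a_i = v_i sqrt(z_i), b_i = sqrt(z_i)."""
    a = [vi * math.sqrt(zi) for vi, zi in zip(v, z)]
    b = [math.sqrt(zi) for zi in z]

    def dot(u, w):
        return sum(ui * wi for ui, wi in zip(u, w))

    return dot(b, b) * dot(a, a) - dot(a, b) ** 2


def worst_case(trials=1000, seed=0):
    """Sample z > 0 (as z_i = e^{x_i}) and v at random; return (max |difference|, min v^T K v)."""
    rng = random.Random(seed)
    max_diff, min_q = 0.0, float("inf")
    for _ in range(trials):
        n = rng.randint(1, 6)
        z = [math.exp(rng.uniform(-3, 3)) for _ in range(n)]
        v = [rng.uniform(-2, 2) for _ in range(n)]
        q1, q2 = quad_form_K(z, v), quad_form_cs(z, v)
        max_diff = max(max_diff, abs(q1 - q2))
        min_q = min(min_q, q1)
    return max_diff, min_q


print(worst_case())
```

Equality in Cauchy-Schwarz holds exactly when $a$ is parallel to $b$, i.e. when $v$ is a constant vector; for example `quad_form_K([1.0, 2.0, 3.0], [5.0, 5.0, 5.0])` evaluates to 0.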
$$f(x)=(x_1\cdot\ldots\cdot x_n)^{\frac1n},\qquad x\in\mathbb{R}^n_{++}$$
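This (the geometric mean) is a concave function. Every component is restricted to be positive here mainly so that we do not have to deal with complex values.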
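The notes state concavity of the geometric mean without proof. As a hedged numerical illustration (not a proof), the sketch below samples points in $\mathbb{R}^n_{++}$ and checks that $f(\theta x+(1-\theta)y)-[\theta f(x)+(1-\theta)f(y)]$ is never negative. Computing the mean as $\exp\big(\frac1n\sum_i\log x_i\big)$ is an implementation choice to avoid overflowing the raw product; the helper names are illustrative.

```python
import math
import random


def geo_mean(x):
    """(x_1 * ... * x_n)^(1/n) for x in R^n_{++}, computed as exp(mean(log x_i))."""
    if any(xi <= 0 for xi in x):
        raise ValueError("geo_mean is defined here only for strictly positive entries")
    return math.exp(sum(math.log(xi) for xi in x) / len(x))


def concavity_gap(x, y, theta):
    """f(theta x + (1-theta) y) - [theta f(x) + (1-theta) f(y)]; concavity means this is >= 0."""
    mix = [theta * xi + (1 - theta) * yi for xi, yi in zip(x, y)]
    return geo_mean(mix) - (theta * geo_mean(x) + (1 - theta) * geo_mean(y))


def min_gap(trials=2000, seed=0):
    """Smallest concavity gap observed over random x, y in R^n_{++} and theta in [0, 1)."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        n = rng.randint(1, 5)
        x = [rng.uniform(0.01, 10) for _ in range(n)]
        y = [rng.uniform(0.01, 10) for _ in range(n)]
        worst = min(worst, concavity_gap(x, y, rng.random()))
    return worst


print(min_gap())
```

Only tiny negative values at the level of floating-point rounding (for example when $n=1$, where $f$ is linear) should ever appear.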
$$f(X)=\log\det X,\qquad \mathrm{dom}\,f=S_{++}^n$$
When $n=1$, $f(x)=\log x$, which is concave;
When $n>1$, restrict $f$ to an arbitrary line: take any $z\in S_{++}^n$, any direction $v\in\mathbb{R}^{n\times n}$, and $t\in\mathbb{R}$ with
$$z+tv\in S_{++}^n=\mathrm{dom}\,f.$$
Since $z+tv$ must be symmetric, $v\in S^n$. Then $f$ is concave if and only if $g(t)=f(z+tv)$ is concave in $t$ for every such $z$ and $v$.
$$\begin{aligned}g(t)=f(z+tv)&=\log\det(z+tv)\\&=\log\det\left\{z^{\frac12}\left(I+t\,z^{-\frac12}vz^{-\frac12}\right)z^{\frac12}\right\}\\&=\log\det z+\log\det\left(I+t\,z^{-\frac12}vz^{-\frac12}\right)\\&=\log\det z+\sum_{i=1}^n\log(1+t\lambda_i)\end{aligned}$$
where $\lambda_i$ are the eigenvalues of the symmetric matrix $z^{-\frac12}vz^{-\frac12}$. The last step follows from its eigendecomposition
$$z^{-\frac12}vz^{-\frac12}=Q\Lambda Q^T,\qquad QQ^T=I$$
$$\begin{aligned}\det\left(I+t\,z^{-\frac12}vz^{-\frac12}\right)&=\det\left(QQ^T+tQ\Lambda Q^T\right)\\&=\det(Q)\det(I+t\Lambda)\det(Q^T)\\&=\underbrace{\det(QQ^T)}_{=\det I_n=1}\,\det(I+t\Lambda)=\prod_{i=1}^n(1+t\lambda_i)\end{aligned}$$
Moreover, $z+tv\succ0$ is equivalent to $I+t\,z^{-\frac12}vz^{-\frac12}\succ0$, so $1+t\lambda_i>0$ for all $i$ on the domain of $g$.
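The reduction of $\log\det(z+tv)$ to $\log\det z+\sum_i\log(1+t\lambda_i)$, with $\lambda_i$ the eigenvalues of $z^{-\frac12}vz^{-\frac12}$, can be checked numerically. This is a minimal sketch assuming NumPy is available; the random SPD generator $AA^T+nI$ and all function names are assumptions made for illustration. The values of $t$ are kept in a range where $1+t\lambda_i\ge0.5$, so $z+tv$ stays in $S_{++}^n$.

```python
import numpy as np


def random_spd(n, rng):
    """Random symmetric positive definite matrix: A A^T + n I (an arbitrary generator choice)."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)


def inv_sqrt_spd(z):
    """z^{-1/2} for symmetric positive definite z, via the eigendecomposition z = U diag(w) U^T."""
    w, U = np.linalg.eigh(z)
    return U @ np.diag(1.0 / np.sqrt(w)) @ U.T


def g_direct(z, v, t):
    """g(t) = log det(z + t v), evaluated with slogdet."""
    sign, logdet = np.linalg.slogdet(z + t * v)
    if sign <= 0:
        raise ValueError("det(z + t v) <= 0, so z + t v is outside S^n_{++}")
    return logdet


def g_eig(z, v, t):
    """log det z + sum_i log(1 + t lambda_i), lambda_i = eigenvalues of z^{-1/2} v z^{-1/2}."""
    r = inv_sqrt_spd(z)
    lam = np.linalg.eigvalsh(r @ v @ r)
    return np.linalg.slogdet(z)[1] + np.sum(np.log1p(t * lam))


def max_discrepancy(n=4, seed=0):
    """Largest |g_direct - g_eig| over a few t values on one random line."""
    rng = np.random.default_rng(seed)
    z = random_spd(n, rng)
    B = rng.standard_normal((n, n))
    v = (B + B.T) / 2  # the direction must be symmetric
    r = inv_sqrt_spd(z)
    lam = np.linalg.eigvalsh(r @ v @ r)
    t_max = 0.5 / np.max(np.abs(lam))  # guarantees 1 + t*lambda_i >= 0.5 for |t| <= t_max
    return max(abs(g_direct(z, v, t) - g_eig(z, v, t))
               for t in np.linspace(-t_max, t_max, 7))


print(max_discrepancy())
```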
$$g'(t)=\sum_i\frac{\lambda_i}{1+t\lambda_i},\qquad g''(t)=\sum_i\frac{-\lambda_i^2}{(1+t\lambda_i)^2}\le0$$
So $g$ is concave along every line, and therefore $\log\det$ is concave on $S_{++}^n$.
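Similarly, the closed form for $g''(t)$ can be compared with a central second difference of $t\mapsto\log\det(z+tv)$. This is another NumPy sketch under the same kind of assumptions (random SPD $z$, random symmetric $v$, hypothetical helper names). It reports the largest discrepancy and the largest value of $g''$, which should be non-positive.

```python
import numpy as np


def line_eigs(z, v):
    """Eigenvalues lambda_i of z^{-1/2} v z^{-1/2} (z symmetric positive definite, v symmetric)."""
    w, U = np.linalg.eigh(z)
    r = U @ np.diag(w ** -0.5) @ U.T
    return np.linalg.eigvalsh(r @ v @ r)


def g(z, v, t):
    """g(t) = log det(z + t v); only meaningful while z + t v stays positive definite."""
    return np.linalg.slogdet(z + t * v)[1]


def g2_formula(lam, t):
    """g''(t) = -sum_i lambda_i^2 / (1 + t lambda_i)^2."""
    return -np.sum(lam ** 2 / (1 + t * lam) ** 2)


def g2_fd(z, v, t, h=1e-4):
    """Central second difference (g(t+h) - 2 g(t) + g(t-h)) / h^2."""
    return (g(z, v, t + h) - 2 * g(z, v, t) + g(z, v, t - h)) / h ** 2


def check(n=4, seed=0):
    """Return (max |formula - finite difference|, max g''(t)) over a few t on one random line."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    z = A @ A.T + n * np.eye(n)  # symmetric positive definite base point
    B = rng.standard_normal((n, n))
    v = (B + B.T) / 2  # symmetric direction
    lam = line_eigs(z, v)
    t_max = 0.5 / np.max(np.abs(lam))  # keeps 1 + t*lambda_i >= 0.5, so z + t v stays in S^n_{++}
    errs, g2s = [], []
    for t in np.linspace(-t_max, t_max, 5):
        g2s.append(g2_formula(lam, t))
        errs.append(abs(g2s[-1] - g2_fd(z, v, t)))
    return max(errs), max(g2s)


print(check())
```

The finite-difference step `h=1e-4` is a compromise between truncation error and rounding error, so agreement is only expected to a few decimal places.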
Next chapter: 中科大-凸优化 笔记(lec14)-保凸运算 (USTC Convex Optimization notes, lec14: operations that preserve convexity)