Determining the convexity of a function of two variables:
Let $f(x,y)$ have continuous second-order partial derivatives on a region $D$, and write $A=f_{xx}^{''}(x,y)$, $B=f_{xy}^{''}(x,y)$, $C=f_{yy}^{''}(x,y)$. Then:
$\qquad$(1) If $A>0$ and $AC-B^2\geq0$ hold everywhere on $D$ $\Longrightarrow$ $f$ is a convex function.
$\qquad$(2) If $A<0$ and $AC-B^2\geq0$ hold everywhere on $D$ $\Longrightarrow$ $f$ is a concave function.
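Note: "convex" here means convex downward (bowl-shaped). Some Chinese calculus textbooks call this shape "凹函数" (concave); machine learning follows the international naming, in which it is called convex.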
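The criterion above can be sketched as a small check. This is only an illustration, assuming the second partial derivatives are constant on $D$ (as they are for quadratic functions); the function name `classify_by_second_derivatives` and the three test functions are made up for this example. For the concave case the sketch uses the negative-semidefinite Hessian condition $A<0$, $AC-B^2\geq0$.

```python
def classify_by_second_derivatives(A, B, C):
    """Classify f from constant second partials A = f_xx, B = f_xy, C = f_yy.

    Hypothetical helper: returns "convex", "concave", or "inconclusive".
    """
    det = A * C - B * B  # determinant of the Hessian [[A, B], [B, C]]
    if A > 0 and det >= 0:
        return "convex"
    if A < 0 and det >= 0:
        return "concave"
    return "inconclusive"


# f(x, y) = x^2 + x*y + y^2  ->  A = 2, B = 1, C = 2
print(classify_by_second_derivatives(2, 1, 2))
# f(x, y) = -x^2 - y^2       ->  A = -2, B = 0, C = -2
print(classify_by_second_derivatives(-2, 0, -2))
# f(x, y) = x^2 - y^2 (a saddle) ->  A = 2, B = 0, C = -2
print(classify_by_second_derivatives(2, 0, -2))
```

The saddle example is reported as "inconclusive" because the test only gives sufficient conditions; it does not try to classify every function.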
Extrema of convex (or concave) functions of two variables:
Let $f(x,y)$ be a convex (or concave) function with continuous partial derivatives on an open region $D$, and let $(x_0,y_0)\in{D}$ satisfy $f_{x}^{'}(x_0,y_0)=0$ and $f_{y}^{'}(x_0,y_0)=0$. Then $f(x_0,y_0)$ is the minimum (or maximum) of $f(x,y)$ on $D$.
The function under consideration here is:
$$E(w,b)=\sum_{i=1}^{m}(y_i-wx_i-b)^2\tag{Eq. 1}$$
Taking the partial derivatives of $E(w,b)$ with respect to $w$ and $b$ gives:
$$\cfrac{\partial{E(w,b)}}{\partial{w}}=2\left(w\cdot\sum_{i=1}^{m}x_i^2-\sum_{i=1}^{m}(y_i-b)x_i\right)\tag{Eq. 2}$$
$$\cfrac{\partial{E(w,b)}}{\partial{b}}=2\left(mb-\sum_{i=1}^{m}(y_i-wx_i)\right)\tag{Eq. 3}$$
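The two partial derivatives can be sanity-checked numerically. The sketch below compares the closed forms against central finite differences; the data points, the evaluation point `(w, b)`, and the helper names `loss`, `grad`, and `numeric_grad` are all made up for illustration.

```python
def loss(w, b, xs, ys):
    """Squared-error loss E(w, b) = sum_i (y_i - w*x_i - b)^2."""
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


def grad(w, b, xs, ys):
    """Closed-form partial derivatives of E with respect to w and b."""
    m = len(xs)
    dw = 2 * (w * sum(x * x for x in xs)
              - sum((y - b) * x for x, y in zip(xs, ys)))
    db = 2 * (m * b - sum(y - w * x for x, y in zip(xs, ys)))
    return dw, db


def numeric_grad(w, b, xs, ys, h=1e-6):
    """Central finite-difference approximation of the same derivatives."""
    dw = (loss(w + h, b, xs, ys) - loss(w - h, b, xs, ys)) / (2 * h)
    db = (loss(w, b + h, xs, ys) - loss(w, b - h, xs, ys)) / (2 * h)
    return dw, db


# Made-up sample data, used only for this check.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
analytic = grad(0.5, -1.0, xs, ys)
numeric = numeric_grad(0.5, -1.0, xs, ys)
print(analytic, numeric)
```

The two pairs should agree up to floating-point rounding, since central differences are exact for a quadratic apart from rounding error.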
Starting from (Eq. 2):
$$\begin{aligned}\cfrac{\partial^{2}E(w,b)}{\partial{w^2}}&=\cfrac{\partial}{\partial{w}}\left(\cfrac{\partial{E(w,b)}}{\partial{w}}\right)=\cfrac{\partial}{\partial{w}}\left(2\left(w\cdot\sum_{i=1}^{m}x_i^2-\sum_{i=1}^{m}(y_i-b)x_i\right)\right)\\&=\cfrac{\partial}{\partial{w}}\left(2w\cdot\sum_{i=1}^{m}x_i^2\right)=2\sum_{i=1}^{m}x_i^2\end{aligned}\tag{Eq. 4}$$
$\Longrightarrow{A=f_{xx}^{''}(x,y)}=2\sum_{i=1}^{m}x_i^2$ (with $E$, $w$, $b$ in the roles of $f$, $x$, $y$)
$$\begin{aligned}\cfrac{\partial^{2}E(w,b)}{\partial{w}\partial{b}}&=\cfrac{\partial}{\partial{b}}\left(\cfrac{\partial{E(w,b)}}{\partial{w}}\right)=\cfrac{\partial}{\partial{b}}\left(2\left(w\cdot\sum_{i=1}^{m}x_i^2-\sum_{i=1}^{m}(y_i-b)x_i\right)\right)\\&=\cfrac{\partial}{\partial{b}}\left(-2\sum_{i=1}^{m}(y_i-b)x_i\right)=2\sum_{i=1}^{m}x_i\end{aligned}\tag{Eq. 5}$$
$\Longrightarrow{B=f_{xy}^{''}(x,y)}=2\sum_{i=1}^{m}x_i$
Starting from (Eq. 3):
$$\cfrac{\partial^2E{(w,b)}}{\partial{b^2}}=\cfrac{\partial}{\partial{b}}\left(\cfrac{\partial{E(w,b)}}{\partial{b}}\right)=\cfrac{\partial}{\partial{b}}\left(2\left(mb-\sum_{i=1}^{m}(y_i-wx_i)\right)\right)=2m\tag{Eq. 6}$$
$\Longrightarrow{C=f_{yy}^{''}(x,y)}=2m$
$$\begin{aligned}AC-B^2&=4m\sum_{i=1}^{m}x_i^2-\left[2\sum_{i=1}^{m}x_i\right]^2=4m\sum_{i=1}^{m}x_i^2-4m\cdot\cfrac{1}{m}\sum_{i=1}^{m}x_i\cdot\sum_{i=1}^{m}x_i=4m\left(\sum_{i=1}^{m}x_i^2-\sum_{i=1}^{m}x_i{\bar{x}}\right)\\&=4m\sum_{i=1}^{m}(x_i^2-x_i\bar{x}-x_i\bar{x}+x_i\bar{x})=4m\sum_{i=1}^{m}(x_i^2-2x_i\bar{x}+\bar{x}^2)=4m\sum_{i=1}^{m}(x_i-\bar{x})^2\geq0\end{aligned}\tag{Eq. 7}$$
Note: the derivation above uses the substitution $\sum_{i=1}^{m}x_i\bar{x}=\bar{x}\cdot{m}\cdot\cfrac{1}{m}\sum_{i=1}^{m}x_i=m\bar{x}^2=\sum_{i=1}^{m}\bar{x}^2$,
where $\bar{x}=\cfrac{1}{m}\sum_{i=1}^{m}x_i$ is the mean of the $x_i$.
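As a quick numerical check of the algebra, the sketch below computes $A$, $B$, $C$ from their closed forms and compares $AC-B^2$ with $4m\sum_{i=1}^{m}(x_i-\bar{x})^2$. The sample values and the helper names `hessian_entries` and `determinant_via_deviations` are made up for illustration.

```python
def hessian_entries(xs):
    """Return A, B, C for E(w, b), using the closed forms derived above."""
    m = len(xs)
    A = 2 * sum(x * x for x in xs)  # second derivative in w
    B = 2 * sum(xs)                 # mixed derivative
    C = 2 * m                       # second derivative in b
    return A, B, C


def determinant_via_deviations(xs):
    """Return 4m * sum_i (x_i - mean)^2, the final form of AC - B^2."""
    m = len(xs)
    mean = sum(xs) / m
    return 4 * m * sum((x - mean) ** 2 for x in xs)


# Made-up sample values, used only for this check.
xs = [1.0, 2.0, 3.0, 4.0]
A, B, C = hessian_entries(xs)
print(A * C - B * B, determinant_via_deviations(xs))
```

Both printed numbers should match, and neither can be negative because the second form is a sum of squares.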
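Together with $A=2\sum_{i=1}^{m}x_i^2>0$ (which holds unless every $x_i$ is zero), this proves that $E(w,b)$ is a convex function, so convex optimization methods can be applied to it.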
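Because $E(w,b)$ is convex, a point where both partial derivatives vanish is a global minimum. The sketch below illustrates this. The closed-form solution is not stated in the text above: it is obtained here by setting the two first-order partial derivatives to zero and solving the resulting 2×2 linear system, which assumes the $x_i$ are not all equal. The data and the names `loss` and `stationary_point` are made up for illustration.

```python
def loss(w, b, xs, ys):
    """Squared-error loss E(w, b) = sum_i (y_i - w*x_i - b)^2."""
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


def stationary_point(xs, ys):
    """Solve dE/dw = 0 and dE/db = 0 for (w, b)."""
    m = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = m * sxx - sx * sx  # equals (AC - B^2) / 4
    if det == 0:
        raise ValueError("all x_i are equal; the stationary point is not unique")
    w = (m * sxy - sx * sy) / det
    b = (sy - w * sx) / m
    return w, b


# Made-up sample data, used only for this check.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
w0, b0 = stationary_point(xs, ys)
best = loss(w0, b0, xs, ys)

# Moving away from the stationary point should never lower the loss.
offsets = [(-0.1, 0.0), (0.1, 0.0), (0.0, -0.1), (0.0, 0.1), (0.1, -0.1)]
no_lower = all(loss(w0 + dw, b0 + db, xs, ys) >= best for dw, db in offsets)
print(w0, b0, no_lower)
```

Probing a handful of offsets is not a proof; it only illustrates what the convexity argument guarantees for every point in the plane.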