Determining the convexity of a bivariate function


Convexity test for bivariate functions
Let $f(x,y)$ have continuous second-order partial derivatives on a convex region $D$, and write
$$A=f_{xx}''(x,y),\quad B=f_{xy}''(x,y),\quad C=f_{yy}''(x,y).$$
Then:
(1) If $A>0$ and $AC-B^2\geq0$ everywhere on $D$, then $f$ is convex;
(2) If $A<0$ and $AC-B^2\geq0$ everywhere on $D$, then $f$ is concave.
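Note: "convex" here means convex downward (bowl-shaped, like $x^2$). Some Chinese calculus textbooks call this a "concave function"; machine learning follows the convention of the English-language literature, where it is called convex.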
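As a numerical sanity check of the two criteria above, the Python sketch below estimates $A$, $B$, $C$ with central finite differences at a grid of sample points and tests the sign conditions at each one. The helper names `hessian_entries` and `classify`, the step `h`, the tolerance `tol` and the test functions are illustrative assumptions. Sampling finitely many points cannot prove convexity on all of $D$, and because both criteria are only sufficient conditions, "inconclusive" means only that neither one was satisfied at every sample.

```python
def hessian_entries(f, x, y, h=1e-3):
    """Approximate A = f_xx, B = f_xy, C = f_yy at (x, y) by central differences."""
    A = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    C = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    B = (f(x + h, y + h) - f(x + h, y - h)
         - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return A, B, C


def classify(f, points, tol=1e-6):
    """Apply the convex / concave sufficient conditions at every sample point."""
    convex = concave = True
    for x, y in points:
        A, B, C = hessian_entries(f, x, y)
        det = A * C - B * B
        convex = convex and A > tol and det >= -tol
        concave = concave and A < -tol and det >= -tol
    if convex:
        return "convex"
    if concave:
        return "concave"
    return "inconclusive"


grid = [(i * 0.5, j * 0.5) for i in range(-4, 5) for j in range(-4, 5)]
print(classify(lambda x, y: x**2 + y**2, grid))             # convex
print(classify(lambda x, y: -(x**2 + x * y + y**2), grid))  # concave
print(classify(lambda x, y: x**2 - y**2, grid))             # inconclusive (saddle)
```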
Extrema of bivariate convex and concave functions
Let $f(x,y)$ be a convex (or concave) function with continuous partial derivatives on an open region $D$, and let $(x_0,y_0)\in D$ satisfy $f_x'(x_0,y_0)=0$ and $f_y'(x_0,y_0)=0$. Then $f(x_0,y_0)$ is the minimum (or maximum) of $f(x,y)$ on $D$.
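A minimal Python illustration of this result, assuming a hypothetical quadratic $f(x,y)=x^2+xy+y^2-3x$ on $D=\mathbb{R}^2$: here $A=2$, $B=1$, $C=2$, so $A>0$ and $AC-B^2=3\geq0$, and $f$ is convex by criterion (1). Setting $f_x'=2x+y-3$ and $f_y'=x+2y$ to zero gives a 2-by-2 linear system; the sketch solves it with Cramer's rule and then spot-checks on a grid that no sampled point has a smaller value than the stationary point.

```python
def f(x, y):
    """Hypothetical convex quadratic used only for illustration."""
    return x**2 + x * y + y**2 - 3 * x


# f_x = 2x + y - 3 = 0 and f_y = x + 2y = 0,
# written as a11*x + a12*y = r1 and a21*x + a22*y = r2.
a11, a12, r1 = 2.0, 1.0, 3.0
a21, a22, r2 = 1.0, 2.0, 0.0
det = a11 * a22 - a12 * a21  # 3.0, nonzero, so the stationary point is unique
x0 = (r1 * a22 - a12 * r2) / det
y0 = (a11 * r2 - r1 * a21) / det

# Spot-check on a grid over [-5, 5] x [-5, 5]; the tolerance absorbs rounding.
grid_min = min(f(i * 0.1, j * 0.1) for i in range(-50, 51) for j in range(-50, 51))
print(x0, y0, f(x0, y0))              # 2.0 -1.0 -3.0
print(grid_min >= f(x0, y0) - 1e-12)  # True
```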

Here the function under study is:
$$E(w,b)=\sum_{i=1}^{m}(y_i-wx_i-b)^2\tag{1}$$
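In one-feature linear regression, $E(w,b)$ is the sum of squared errors of the line $y=wx+b$ over the $m$ samples. The short Python sketch below evaluates it directly; the function name `loss` and the three data points are hypothetical.

```python
def loss(w, b, xs, ys):
    """E(w, b) = sum_i (y_i - w*x_i - b)^2, the sum of squared errors."""
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0]  # hypothetical inputs x_i
ys = [2.0, 4.0, 5.0]  # hypothetical targets y_i
print(loss(1.0, 0.0, xs, ys))  # 9.0  (residuals 1, 2, 2)
print(loss(2.0, 0.0, xs, ys))  # 1.0  (residuals 0, 0, -1)
```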
Taking the partial derivatives of $E(w,b)$ with respect to $w$ and $b$ gives:
$$\frac{\partial E(w,b)}{\partial w}=2\left(w\sum_{i=1}^{m}x_i^2-\sum_{i=1}^{m}(y_i-b)x_i\right)\tag{2}$$
$$\frac{\partial E(w,b)}{\partial b}=2\left(mb-\sum_{i=1}^{m}(y_i-wx_i)\right)\tag{3}$$
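To guard against algebra slips, Eqs. (2) and (3) can be compared with central finite differences of $E$. In this Python sketch `grad` transcribes the two formulas and `numeric_grad` is an independent approximation; the names, the step `h` and the data are assumptions made for illustration.

```python
def loss(w, b, xs, ys):
    """E(w, b) from Eq. (1)."""
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


def grad(w, b, xs, ys):
    """Analytic partial derivatives, transcribed from Eq. (2) and Eq. (3)."""
    m = len(xs)
    dw = 2 * (w * sum(x * x for x in xs) - sum((y - b) * x for x, y in zip(xs, ys)))
    db = 2 * (m * b - sum(y - w * x for x, y in zip(xs, ys)))
    return dw, db


def numeric_grad(w, b, xs, ys, h=1e-5):
    """Central-difference approximation, used only as a cross-check."""
    dw = (loss(w + h, b, xs, ys) - loss(w - h, b, xs, ys)) / (2 * h)
    db = (loss(w, b + h, xs, ys) - loss(w, b - h, xs, ys)) / (2 * h)
    return dw, db


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 5.0]  # hypothetical data
print(grad(1.0, 0.0, xs, ys))          # (-22.0, -10.0)
print(numeric_grad(1.0, 0.0, xs, ys))  # agrees up to rounding
```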
Starting from Eq. (2), and noting that $\sum_{i=1}^{m}(y_i-b)x_i$ does not involve $w$:
$$\begin{aligned}\frac{\partial^2E(w,b)}{\partial w^2}&=\frac{\partial}{\partial w}\left(\frac{\partial E(w,b)}{\partial w}\right)=\frac{\partial}{\partial w}\left[2\left(w\sum_{i=1}^{m}x_i^2-\sum_{i=1}^{m}(y_i-b)x_i\right)\right]\\&=\frac{\partial}{\partial w}\left(2w\sum_{i=1}^{m}x_i^2\right)=2\sum_{i=1}^{m}x_i^2\end{aligned}\tag{4}$$
$$\Longrightarrow A=\frac{\partial^2E(w,b)}{\partial w^2}=2\sum_{i=1}^{m}x_i^2$$
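Eq. (4) says $A$ depends only on the $x_i$, not on $w$ or $b$. The Python sketch below checks this on hypothetical data by estimating $\partial^2E/\partial w^2$ with a central second difference at several points $(w,b)$. Because $E$ is quadratic in $w$, the second difference is exact for any step size up to floating-point rounding, so a fairly large `h` is used; the helper names are illustrative.

```python
def loss(w, b, xs, ys):
    """E(w, b) from Eq. (1)."""
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


def d2_dw2(w, b, xs, ys, h=0.1):
    """Central second difference of E in w.

    E is quadratic in w, so this equals the exact second derivative
    for any h, up to floating-point rounding.
    """
    return (loss(w + h, b, xs, ys) - 2 * loss(w, b, xs, ys) + loss(w - h, b, xs, ys)) / h**2


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 5.0]  # hypothetical data
A = 2 * sum(x * x for x in xs)  # Eq. (4): 2 * (1 + 4 + 9) = 28
for w, b in [(0.0, 0.0), (1.0, -2.0), (5.0, 3.0)]:
    print(w, b, d2_dw2(w, b, xs, ys), A)  # the estimate stays about 28 at every point
```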
Differentiating Eq. (2) with respect to $b$ instead (only the term $-\sum_{i=1}^{m}(y_i-b)x_i$ depends on $b$):
$$\begin{aligned}\frac{\partial^2E(w,b)}{\partial w\,\partial b}&=\frac{\partial}{\partial b}\left(\frac{\partial E(w,b)}{\partial w}\right)=\frac{\partial}{\partial b}\left[2\left(w\sum_{i=1}^{m}x_i^2-\sum_{i=1}^{m}(y_i-b)x_i\right)\right]\\&=\frac{\partial}{\partial b}\left(-2\sum_{i=1}^{m}(y_i-b)x_i\right)=2\sum_{i=1}^{m}x_i\end{aligned}\tag{5}$$
$$\Longrightarrow B=\frac{\partial^2E(w,b)}{\partial w\,\partial b}=2\sum_{i=1}^{m}x_i$$
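Eq. (5) differentiates Eq. (2) with respect to $b$. Differentiating Eq. (3) with respect to $w$ instead gives the same value, $2\sum_{i=1}^{m}x_i$, as expected when the second partial derivatives are continuous. This Python sketch checks both orders numerically on hypothetical data; `grad` and `mixed_partials` are illustrative names, not from any library.

```python
def grad(w, b, xs, ys):
    """Partial derivatives of E, transcribed from Eq. (2) and Eq. (3)."""
    m = len(xs)
    dw = 2 * (w * sum(x * x for x in xs) - sum((y - b) * x for x, y in zip(xs, ys)))
    db = 2 * (m * b - sum(y - w * x for x, y in zip(xs, ys)))
    return dw, db


def mixed_partials(w, b, xs, ys, h=1e-4):
    """Differentiate Eq. (2) in b and Eq. (3) in w by central differences."""
    d_dw_db = (grad(w, b + h, xs, ys)[0] - grad(w, b - h, xs, ys)[0]) / (2 * h)
    d_db_dw = (grad(w + h, b, xs, ys)[1] - grad(w - h, b, xs, ys)[1]) / (2 * h)
    return d_dw_db, d_db_dw


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 5.0]  # hypothetical data
B = 2 * sum(xs)  # Eq. (5): 2 * (1 + 2 + 3) = 12
print(B, mixed_partials(0.7, -1.2, xs, ys))  # both estimates close to 12
```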
Starting from Eq. (3):
$$\frac{\partial^2E(w,b)}{\partial b^2}=\frac{\partial}{\partial b}\left(\frac{\partial E(w,b)}{\partial b}\right)=\frac{\partial}{\partial b}\left[2\left(mb-\sum_{i=1}^{m}(y_i-wx_i)\right)\right]=2m\tag{6}$$
$$\Longrightarrow C=\frac{\partial^2E(w,b)}{\partial b^2}=2m$$
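Eq. (6) shows that $C=2m$ depends only on the number of samples. The Python sketch below checks this by estimating $\partial^2E/\partial b^2$ for two unrelated hypothetical target sets with the same $x_i$; `hessian` collects the analytic $(A,B,C)$ from Eqs. (4) to (6). Names and data are assumptions for illustration.

```python
def loss(w, b, xs, ys):
    """E(w, b) from Eq. (1)."""
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


def d2_db2(w, b, xs, ys, h=0.1):
    """Central second difference of E in b (exact up to rounding, since E is quadratic in b)."""
    return (loss(w, b + h, xs, ys) - 2 * loss(w, b, xs, ys) + loss(w, b - h, xs, ys)) / h**2


def hessian(xs):
    """Analytic (A, B, C) from Eqs. (4) to (6); only the x_i and m appear."""
    m = len(xs)
    return 2 * sum(x * x for x in xs), 2 * sum(xs), 2 * m


xs = [1.0, 2.0, 3.0]  # hypothetical inputs, m = 3
for ys in ([2.0, 4.0, 5.0], [-7.0, 0.5, 100.0]):  # two hypothetical target sets
    print(d2_db2(0.3, 1.1, xs, ys), hessian(xs)[2])  # both about 6 = 2m
```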
$$\begin{aligned}AC-B^2&=4m\sum_{i=1}^{m}x_i^2-\left(2\sum_{i=1}^{m}x_i\right)^2=4m\sum_{i=1}^{m}x_i^2-4m\cdot\frac{1}{m}\sum_{i=1}^{m}x_i\cdot\sum_{i=1}^{m}x_i=4m\left(\sum_{i=1}^{m}x_i^2-\sum_{i=1}^{m}x_i\bar{x}\right)\\&=4m\sum_{i=1}^{m}\left(x_i^2-x_i\bar{x}-x_i\bar{x}+x_i\bar{x}\right)=4m\sum_{i=1}^{m}\left(x_i^2-2x_i\bar{x}+\bar{x}^2\right)=4m\sum_{i=1}^{m}(x_i-\bar{x})^2\geq0\end{aligned}\tag{7}$$
Note: the substitution used above, which turns the last $+x_i\bar{x}$ term into $\bar{x}^2$, is
$$\sum_{i=1}^{m}x_i\bar{x}=\bar{x}\cdot m\cdot\frac{1}{m}\sum_{i=1}^{m}x_i=m\bar{x}^2=\sum_{i=1}^{m}\bar{x}^2,$$
together with $\frac{1}{m}\sum_{i=1}^{m}x_i=\bar{x}$.
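Identity (7) can be spot-checked numerically: the Python sketch below compares $AC-B^2$, built from Eqs. (4) to (6), with $4m\sum_{i=1}^{m}(x_i-\bar{x})^2$ on randomly generated inputs (fixed seed), allowing for floating-point rounding. It also shows the equality case: when all $x_i$ are equal, $AC-B^2=0$. The function names and the random test setup are hypothetical.

```python
import random


def det_from_hessian(xs):
    """AC - B^2 with A, B, C taken from Eqs. (4) to (6)."""
    m = len(xs)
    A, B, C = 2 * sum(x * x for x in xs), 2 * sum(xs), 2 * m
    return A * C - B * B


def det_from_spread(xs):
    """Right-hand side of Eq. (7): 4m * sum (x_i - xbar)^2."""
    m = len(xs)
    xbar = sum(xs) / m
    return 4 * m * sum((x - xbar) ** 2 for x in xs)


random.seed(0)  # fixed seed so the check is reproducible
for _ in range(100):
    xs = [random.uniform(-10, 10) for _ in range(random.randint(1, 20))]
    lhs, rhs = det_from_hessian(xs), det_from_spread(xs)
    # The two sides agree up to floating-point rounding, and the spread is never negative.
    assert abs(lhs - rhs) <= 1e-9 * max(1.0, abs(rhs))
    assert rhs >= 0

print(det_from_hessian([3.0, 3.0, 3.0]))  # 0.0: all x_i equal
```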
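Since $A=2\sum_{i=1}^{m}x_i^2>0$ (whenever the $x_i$ are not all zero) and, by Eq. (7), $AC-B^2\geq0$ everywhere, criterion (1) shows that $E(w,b)$ is convex on $\mathbb{R}^2$. Convex optimization therefore applies: by the extremum result above, any point where Eqs. (2) and (3) both vanish is a global minimum of $E(w,b)$.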
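Following the extremum result, the minimizer can be found by setting Eqs. (2) and (3) to zero. Eq. (3) gives $b=\bar{y}-w\bar{x}$ with $\bar{y}=\frac{1}{m}\sum_{i=1}^{m}y_i$; substituting this into Eq. (2) gives
$$w=\frac{\sum_{i=1}^{m}y_i(x_i-\bar{x})}{\sum_{i=1}^{m}x_i^2-\frac{1}{m}\left(\sum_{i=1}^{m}x_i\right)^2}.$$
The Python sketch below implements this on hypothetical data and checks that nearby points do not have a lower loss. The names `fit` and `loss` are illustrative. It treats the case where all $x_i$ are equal separately: then $AC-B^2=0$, the denominator vanishes, and $(w,b)$ is not unique.

```python
def loss(w, b, xs, ys):
    """E(w, b) from Eq. (1)."""
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


def fit(xs, ys):
    """Closed-form (w, b) that makes Eq. (2) and Eq. (3) vanish."""
    if len(set(xs)) == 1:
        # All x_i equal: AC - B^2 = 0 and a whole line of (w, b) minimizes E.
        raise ValueError("all x_i are equal; w and b are not uniquely determined")
    m = len(xs)
    xbar = sum(xs) / m
    ybar = sum(ys) / m
    # Same value as sum x_i^2 - (sum x_i)^2 / m, computed in the more stable form.
    denom = sum((x - xbar) ** 2 for x in xs)
    w = sum(y * (x - xbar) for x, y in zip(xs, ys)) / denom
    b = ybar - w * xbar
    return w, b


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 5.0]  # hypothetical data
w, b = fit(xs, ys)
print(w, b)  # w = 1.5, b close to 2/3

# Convexity check: stepping away from (w, b) should never lower E.
E0 = loss(w, b, xs, ys)
steps = (-0.1, 0.0, 0.1)
print(all(loss(w + dw, b + db, xs, ys) >= E0 for dw in steps for db in steps))  # True
```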
