Convex Optimization: Convex Sets and Convex Functions

1. Mathematical Programming
  Finding the best element in a set of feasible solutions is called mathematical programming, or optimization. Such a problem can be written as
$$\begin{aligned} & \text{minimize}\ f_0(\bm{x}) \\ & \text{subject to}\ f_i(\bm{x}) \le b_i, \quad i = 1, 2, \dots, m \end{aligned}$$
where $\bm{x} = [x_1, \dots, x_n]^T$ is the optimization variable, $f_0: \bm{R}^n \rightarrow \bm{R}$ is the objective function, and $f_i: \bm{R}^n \rightarrow \bm{R}$ are the inequality constraint functions. A point $\bm{x}^*$ is optimal if it is feasible and
$$\forall \bm{z} \in \{\bm{z} \mid f_i(\bm{z}) \le b_i,\ i = 1, \dots, m\}, \quad f_0(\bm{z}) \ge f_0(\bm{x}^*).$$
  As an example from image processing, suppose the observed image $I_0(x,y)$ is corrupted by noise and we want to recover an image $I(x,y)$. Using the prior knowledge that images are piecewise smooth, we consider the TV norm
$$\|I\|_{TV} = \sum_y\sum_x \left[(I(x,y) - I(x,y - 1))^2 + (I(x,y) - I(x - 1,y))^2\right]^{1/2},$$
that is, at every pixel the square root of the sum of squared differences in the two directions, summed over the image. For natural images the TV norm is usually small, so recovery can be posed as the optimization problem
$$\text{minimize}\ \|I\|_{TV} + \lambda\|I - I_0\|^2_F,$$
which keeps the recovered image smooth while keeping it close to the noisy image.
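To make the denoising objective concrete, the following minimal sketch evaluates $\|I\|_{TV} + \lambda\|I - I_0\|_F^2$ for images stored as nested Python lists. The names `tv_norm` and `denoise_objective` are hypothetical, and skipping the first row and column (where the backward differences are undefined) is an assumption of this example, not something the text specifies.

```python
import math

def tv_norm(img):
    """Isotropic TV norm of a 2-D image given as a list of rows.

    Assumption: pixels in the first row/column are skipped because the
    backward differences I(x,y)-I(x,y-1) and I(x,y)-I(x-1,y) need a neighbour.
    """
    total = 0.0
    for y in range(1, len(img)):
        for x in range(1, len(img[0])):
            dy = img[y][x] - img[y - 1][x]  # difference along y
            dx = img[y][x] - img[y][x - 1]  # difference along x
            total += math.sqrt(dx * dx + dy * dy)
    return total

def denoise_objective(img, noisy, lam):
    """TV norm of img plus lam times the squared Frobenius distance to noisy."""
    fidelity = sum(
        (a - b) ** 2
        for row_a, row_b in zip(img, noisy)
        for a, b in zip(row_a, row_b)
    )
    return tv_norm(img) + lam * fidelity

flat = [[1.0, 1.0, 1.0] for _ in range(3)]   # constant image: no variation
step = [[0.0, 0.0, 1.0] for _ in range(3)]   # vertical edge
```

A constant image has zero TV norm, so for `flat` only the data-fidelity term contributes; an actual solver would minimize this objective over `img`, which is beyond the scope of the sketch.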
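  Mathematical programs can be classified from several angles. Linear versus nonlinear programming is decided by whether the objective and constraints are linear. Convex versus nonconvex optimization is decided by the convexity of the objective and constraints; the two classes differ in an essential way, and linear programming is a typical convex problem. Smooth versus nonsmooth optimization is decided by the differentiability of the objective. Continuous versus discrete optimization is decided by the feasible region; discrete problems are in general nonconvex. Single-objective versus multi-objective problems are distinguished by the number of objective functions.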


2. Affine Sets
  Take two distinct points $\bm{x_1}, \bm{x_2} \in \bm{R}^n$. With a parameter $\theta \in \bm{R}$, the line through them is
$$\bm{y} = \theta\bm{x_1} + (1 - \theta)\bm{x_2}, \quad \theta \in \bm{R},$$
and the line segment between them is
$$\bm{y} = \theta\bm{x_1} + (1 - \theta)\bm{x_2}, \quad \theta \in [0,1].$$
A set $\bm{C}$ is called an affine set if, for all $x_1, x_2 \in \bm{C}$, the line through $x_1$ and $x_2$ also lies in $\bm{C}$. The definition extends to any number of points: an affine set contains every combination $\theta_1 x_1 + \dots + \theta_k x_k$ of its points with $\theta_1 + \dots + \theta_k = 1$.
  We first examine a basic property. Let $\bm{C}$ be an affine set, fix any $x_0 \in \bm{C}$, and define
$$\bm{V} = \bm{C} - x_0 = \{x - x_0 \mid x \in \bm{C}\}.$$
$\bm{V}$ is the subspace associated with $\bm{C}$, and it is itself an affine set. To see that it is a subspace, take $v_1, v_2 \in \bm{V}$ and $a, b \in \bm{R}$ and check whether $av_1 + bv_2 + x_0$ belongs to $\bm{C}$:
$$av_1 + bv_2 + x_0 = a(v_1 + x_0) + b(v_2 + x_0) + (1 - a - b) x_0.$$
Here $v_1 + x_0 \in \bm{C}$, $v_2 + x_0 \in \bm{C}$ and $x_0 \in \bm{C}$, and the coefficients $a$, $b$, $1 - a - b$ sum to one, so $av_1 + bv_2 + x_0 \in \bm{C}$, i.e. $av_1 + bv_2 \in \bm{V}$. In other words,
$$\forall v_1, v_2 \in \bm{V},\ \forall a, b \in \bm{R}: \quad av_1 + bv_2 + x_0 \in \bm{C}.$$
Geometrically, $\bm{C}$ is a line, plane or hyperplane in arbitrary position, and $\bm{V}$ is the parallel copy of $\bm{C}$ that passes through the origin.
  Consider $\bm{C} = \{\bm{X} \mid \bm{A}\bm{X} = \bm{b}\}$ and any $\bm{X}_1, \bm{X}_2 \in \bm{C}$, so that
$$\bm{A}\bm{X}_1 = \bm{b}, \qquad \bm{A}\bm{X}_2 = \bm{b}.$$
For any $\theta \in \bm{R}$, multiplying the two equations by $\theta$ and $1 - \theta$ gives
$$\theta\bm{A}\bm{X}_1 = \theta\bm{b}, \qquad (1 - \theta)\bm{A}\bm{X}_2 = (1 - \theta)\bm{b},$$
and adding them yields
$$\theta\bm{A}\bm{X}_1 + (1 - \theta)\bm{A}\bm{X}_2 = \bm{A}(\theta\bm{X}_1 + (1 - \theta)\bm{X}_2) = \bm{b}.$$
Hence $\theta\bm{X}_1 + (1 - \theta)\bm{X}_2 \in \bm{C}$, so the solution set of a system of linear equations is an affine set. Its associated subspace, for a fixed $\bm{X}_0$ with $\bm{A}\bm{X}_0 = \bm{b}$, is
$$\bm{V} = \{\bm{X} - \bm{X}_0 \mid \bm{A}\bm{X} = \bm{b}\} = \{\bm{X} - \bm{X}_0 \mid \bm{A}(\bm{X} - \bm{X}_0) = \bm{0}\}.$$
Writing $\bm{Y} = \bm{X} - \bm{X}_0$ gives $\bm{V} = \{\bm{Y} \mid \bm{A}\bm{Y} = \bm{0}\}$, the null space of $\bm{A}$. So in higher dimensions it is still true that $\bm{V}$ is parallel to $\bm{C}$ and passes through the origin.
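The argument can be checked numerically. The sketch below takes two solutions of a small system $\bm{A}\bm{X} = \bm{b}$ and forms affine combinations with $\theta$ outside $[0, 1]$ as well as inside. The helper names `matvec` and `affine_combination` and the particular matrix are assumptions chosen for illustration.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def affine_combination(x1, x2, theta):
    """theta * x1 + (1 - theta) * x2, with theta any real number."""
    return [theta * a + (1 - theta) * b for a, b in zip(x1, x2)]

# Hypothetical system: one equation x1 + x2 + x3 = 3 in three unknowns.
A = [[1.0, 1.0, 1.0]]
b = [3.0]
x1 = [3.0, 0.0, 0.0]   # a solution
x2 = [0.0, 1.0, 2.0]   # another solution

# Residuals A x - b for several affine combinations of the two solutions.
residuals = {
    theta: [r - bi for r, bi in zip(matvec(A, affine_combination(x1, x2, theta)), b)]
    for theta in (-2.0, 0.25, 5.0)
}

# The difference of two solutions lies in the null space of A.
difference = [a - c for a, c in zip(x1, x2)]
```

All residuals vanish, and `matvec(A, difference)` is the zero vector, matching the description of $\bm{V}$ as the null space of $\bm{A}$.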
  For an arbitrary set $\bm{C}$, the smallest affine set containing it is its affine hull,
$$\mathrm{aff}\ \bm{C} = \{ \theta_1\bm{X}_1 + \dots + \theta_k\bm{X}_k \mid \bm{X}_1, \dots, \bm{X}_k \in \bm{C},\ \theta_1 + \dots + \theta_k = 1 \}.$$


3. Convex Sets
  A set $\bm{C}$ is called convex if, for all $x_1, x_2 \in \bm{C}$, the line segment between $x_1$ and $x_2$ also lies in $\bm{C}$. The definition extends to convex combinations of any number of points. Every affine set is a convex set.
  For an arbitrary set $\bm{C}$, the smallest convex set containing it is its convex hull,
$$\mathrm{Conv}\ \bm{C} = \{ \theta_1\bm{X}_1 + \dots + \theta_k\bm{X}_k \mid \bm{X}_1, \dots, \bm{X}_k \in \bm{C},\ \theta_1, \dots, \theta_k \in [0, 1],\ \theta_1 + \dots + \theta_k = 1 \}.$$
  A set $\bm{C}$ is called a cone if for every $\bm{x} \in \bm{C}$ and every $\theta \ge 0$ we have $\theta\bm{x} \in \bm{C}$; a nonempty cone always contains the origin. A set $\bm{C}$ is called a convex cone if for all $\bm{x}_1, \bm{x}_2 \in \bm{C}$ and all $\theta_1, \theta_2 \ge 0$ we have $\theta_1\bm{x}_1 + \theta_2\bm{x}_2 \in \bm{C}$. For an arbitrary set $\bm{C}$ the conic hull is
$$\{ \theta_1\bm{X}_1 + \dots + \theta_k\bm{X}_k \mid \bm{X}_1, \dots, \bm{X}_k \in \bm{C},\ \theta_1, \dots, \theta_k \ge 0\}.$$
  Some simple special cases: a single point is affine and convex, but it is a convex cone only if it is the origin; the empty set is affine, convex and a convex cone (the conditions hold vacuously); the space $\bm{R}^n$ is affine, convex and a convex cone; every subspace of $\bm{R}^n$ is affine, convex and a convex cone; every line is affine and convex, and a line through the origin is a convex cone; every line segment is convex, is affine only when it degenerates to a point, and is a convex cone only when that point is the origin; every ray is convex but not affine, and a ray that starts at the origin is a convex cone.
  We now turn to more involved examples.
  A hyperplane is $\{\bm{x} \mid \bm{a}^T\bm{x} = b\}$ with $\bm{a}, \bm{x} \in \bm{R}^n$, $\bm{a} \ne \bm{0}$, $b \in \bm{R}$; in low dimensions it is a line or a plane. A hyperplane is affine and convex, and when it passes through the origin ($b = 0$) it is a subspace and therefore a convex cone. A halfspace $\{\bm{x} \mid \bm{a}^T\bm{x} > b\}$ or $\{\bm{x} \mid \bm{a}^T\bm{x} \le b\}$ (same conditions on $\bm{a}$, $b$) is convex but not affine; the closed halfspace is a convex cone when its boundary passes through the origin.
  A ball is $\{\bm{x} \mid \|\bm{x} - \bm{x}_c\|_2 \le r\}$ with $\bm{x}_c \in \bm{R}^n$; in low dimensions it is a disk or a solid sphere. It is convex; it is affine only when it degenerates to a point, and a convex cone only when that point is the origin. To prove convexity, take $B(\bm{x}_c, r) = \{\bm{x} \mid \|\bm{x} - \bm{x}_c\|_2 \le r\}$ and any $\bm{x}_1, \bm{x}_2 \in B$, so $\|\bm{x}_1 - \bm{x}_c\|_2 \le r$ and $\|\bm{x}_2 - \bm{x}_c\|_2 \le r$. For $\theta \in [0, 1]$,
$$\begin{aligned} &\|\theta\bm{x}_1 + (1 - \theta)\bm{x}_2 - \bm{x}_c\|_2 \\ =\ & \|\theta(\bm{x}_1 - \bm{x}_c) + (1 - \theta)(\bm{x}_2 - \bm{x}_c)\|_2 \\ \le\ & \|\theta(\bm{x}_1 - \bm{x}_c)\|_2 + \|(1 - \theta)(\bm{x}_2 - \bm{x}_c)\|_2 \\ =\ & \theta\|\bm{x}_1 - \bm{x}_c\|_2 + (1 - \theta)\|\bm{x}_2 - \bm{x}_c\|_2 \\ \le\ & \theta r + (1 - \theta) r = r, \end{aligned}$$
so every convex combination of points of the ball stays in the ball, and the ball is convex.
  An ellipsoid is $\{\bm{x} \mid (\bm{x} - \bm{x}_c)^T\bm{P}^{-1}(\bm{x} - \bm{x}_c) \le 1\}$ with $\bm{x}_c \in \bm{R}^n$ and $\bm{P} \in \bm{S}^n_{++}$, where $\bm{S}^n_{++}$ denotes the $n \times n$ symmetric positive definite matrices; $\bm{P}$ determines the semi-axes, whose lengths are the square roots of its eigenvalues. For example, the ellipsoid $\{\bm{x} \mid (\bm{x} - \bm{x}_c)^T\left( \begin{matrix}4 & 0 \\ 0 & 1 \end{matrix} \right)^{-1}(\bm{x} - \bm{x}_c) \le 1\}$ with $\bm{x}_c = \bm{0}$ expands to $\{(x_1, x_2) \mid x_1^2/4 + x_2^2 \le 1\}$. An ellipsoid is convex.
  A polyhedron is $\{\bm{x} \mid \bm{a}_j^T\bm{x} \le b_j,\ j = 1, \dots, m,\ \bm{c}_j^T\bm{x} = d_j,\ j = 1, \dots, p\}$. It may be unbounded, and it is convex.
  A simplex is built from $k+1$ points $\bm{v}_0, \dots, \bm{v}_k \in \bm{R}^n$ such that $\bm{v}_1 - \bm{v}_0, \dots, \bm{v}_k - \bm{v}_0$ are linearly independent. The simplex associated with these points is
$$\mathrm{Conv}\{\bm{v}_0, \dots, \bm{v}_k\} = \{\theta_0\bm{v}_0 + \dots + \theta_k\bm{v}_k \mid \bm{\theta} \ge 0,\ \bm{1}^T\bm{\theta} = 1\}.$$
In two dimensions, $k = 1$ gives a line segment and $k = 2$ a triangle; for $k \ge 3$ the differences $\bm{v}_i - \bm{v}_0$ cannot be linearly independent. In three dimensions a simplex is a line segment, a triangle or a tetrahedron.
  Every simplex is a polyhedron. To prove this, let $C$ be the simplex and $\bm{x} \in C$, so $\bm{x} = \theta_0\bm{v}_0 + \dots + \theta_k\bm{v}_k$ with $\bm{\theta} \ge 0$, $\bm{1}^T\bm{\theta} = 1$, and $\bm{v}_1 - \bm{v}_0, \dots, \bm{v}_k - \bm{v}_0$ linearly independent. Set $\bm{y} = (\theta_1, \dots, \theta_k)^T$ and $\bm{B} = (\bm{v}_1 - \bm{v}_0, \dots, \bm{v}_k - \bm{v}_0) \in \bm{R}^{n \times k}$. Then $\bm{1}^T\bm{y} \le 1$, $\bm{y} \ge 0$, and using $\theta_0 = 1 - \bm{1}^T\bm{y}$,
$$\begin{aligned} \bm{x} &= \theta_0\bm{v}_0 + \dots + \theta_k\bm{v}_k \\ &= \bm{v}_0 + \theta_1(\bm{v}_1 - \bm{v}_0) + \dots + \theta_k(\bm{v}_k - \bm{v}_0) \\ &= \bm{v}_0 + \bm{B}\bm{y}. \end{aligned}$$
Since $\mathrm{Rank}(\bm{B}) = k$ with $k \le n$, there is a nonsingular matrix $\bm{A} = \left( \begin{matrix}\bm{A}_1 \\ \bm{A}_2\end{matrix} \right) \in \bm{R}^{n \times n}$ such that
$$\bm{A}\bm{B} = \left( \begin{matrix}\bm{A}_1 \\ \bm{A}_2\end{matrix} \right)\bm{B} = \left( \begin{matrix}\bm{I}_k \\ \bm{0}\end{matrix} \right).$$
Multiplying $\bm{x} = \bm{v}_0 + \bm{B}\bm{y}$ by $\bm{A}$ gives $\bm{A}\bm{x} = \bm{A}\bm{v}_0 + \bm{A}\bm{B}\bm{y}$, which splits into
$$\bm{A}_1\bm{x} = \bm{A}_1\bm{v}_0 + \bm{y}, \qquad \bm{A}_2\bm{x} = \bm{A}_2\bm{v}_0.$$
Substituting $\bm{y} = \bm{A}_1\bm{x} - \bm{A}_1\bm{v}_0$ into $\bm{y} \ge 0$ and $\bm{1}^T\bm{y} \le 1$ gives
$$\bm{A}_1\bm{x} \ge \bm{A}_1\bm{v}_0, \qquad \bm{1}^T\bm{A}_1\bm{x} \le \bm{1}^T\bm{A}_1\bm{v}_0 + 1, \qquad \bm{A}_2\bm{x} = \bm{A}_2\bm{v}_0,$$
a finite set of linear inequalities and equalities in $\bm{x}$, which proves the claim.
  Finally, consider the set of symmetric matrices $\bm{S}^n$, the set of symmetric positive semidefinite matrices $\bm{S}^n_+$, and the set of symmetric positive definite matrices $\bm{S}^n_{++}$. We show that $\bm{S}^n_+$ is convex and a convex cone. Take any $\theta_1, \theta_2 \ge 0$ and $\bm{A}, \bm{B} \in \bm{S}^n_+$. For every $\bm{X} \in \bm{R}^n$ we have $\bm{X}^T\bm{A}\bm{X} \ge 0$ and $\bm{X}^T\bm{B}\bm{X} \ge 0$, hence
$$\bm{X}^T(\theta_1\bm{A} + \theta_2\bm{B})\bm{X} = \theta_1\bm{X}^T\bm{A}\bm{X} + \theta_2\bm{X}^T\bm{B}\bm{X} \ge 0,$$
so $\bm{S}^n_+$ is a convex cone. The set $\bm{S}^n_{++}$ is convex but not a convex cone, because it does not contain the zero matrix.
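As a sanity check on the ball argument, the sketch below samples pairs of points from a ball and confirms that random convex combinations stay inside. It is an empirical check under stated assumptions (rejection sampling, a small floating-point tolerance, hypothetical function names), not a proof.

```python
import math
import random

def in_ball(x, center, r, tol=1e-12):
    """True if ||x - center||_2 <= r, up to a small floating-point tolerance."""
    return math.dist(x, center) <= r + tol

def random_point_in_ball(center, r, rng):
    """Rejection sampling: draw from the bounding cube until the point is in the ball."""
    while True:
        p = [c + rng.uniform(-r, r) for c in center]
        if in_ball(p, center, r, tol=0.0):
            return p

def check_ball_convexity(center, r, trials=1000, seed=0):
    """Return False as soon as a convex combination of two ball points leaves the ball."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1 = random_point_in_ball(center, r, rng)
        x2 = random_point_in_ball(center, r, rng)
        theta = rng.random()  # theta in [0, 1)
        z = [theta * a + (1 - theta) * b for a, b in zip(x1, x2)]
        if not in_ball(z, center, r):
            return False
    return True
```

Calling `check_ball_convexity([1.0, -2.0, 0.5], 2.0)` should report no violation, in line with the triangle-inequality proof.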


4. Operations That Preserve Convexity of Sets
  If $S_1, S_2$ are convex sets, then $S_1 \cap S_2$ is convex. The result extends to the intersection of any number of convex sets.
  A function $f(\bm{x}) = \bm{A}\bm{x} + \bm{b}$ with $\bm{A} \in \bm{R}^{m \times n}$, $\bm{b} \in \bm{R}^m$ is an affine function $f: \bm{R}^n \rightarrow \bm{R}^m$. If $S \subseteq \bm{R}^n$ is convex and $f$ is affine, then the image $f(S) = \{f(\bm{x}) \mid \bm{x} \in S\}$ is convex. The inverse image of a convex set under an affine function, $f^{-1}(S) = \{\bm{x} \mid f(\bm{x}) \in S\}$, is convex as well.
  If $S_1, S_2$ are convex sets, then the sum $\{x+y \mid x \in S_1, y \in S_2\}$ is convex, and the Cartesian product $\{(x, y) \mid x \in S_1, y \in S_2\}$ is convex.
  Consider the linear matrix inequality (LMI)
$$A(\bm{x}) = x_1\bm{A}_1 + \dots + x_n\bm{A}_n \preceq \bm{B}, \quad \bm{B}, \bm{A}_i \in \bm{S}^m,\ x_i \in \bm{R},$$
where $A(\bm{x}) \preceq \bm{B}$ means that $A(\bm{x}) - \bm{B}$ is negative semidefinite. The set $\{\bm{x} \mid A(\bm{x}) \preceq \bm{B}\}$ is convex. To see this, take the affine map $f(\bm{x}) = \bm{B} - A(\bm{x})$. Since the symmetric positive semidefinite matrices form a convex cone, the inverse image $f^{-1}(\bm{S}^m_+) = \{\bm{x} \mid \bm{B} - A(\bm{x}) \succeq 0\}$ is convex, so the solution set of an LMI is convex.
  The function $p(\bm{z}, t) = \bm{z}/t$ with $\bm{z} \in \bm{R}^n$, $t \in R_{++}$ is called the perspective function. If $C \subseteq \bm{R}^n \times R_{++}$ is a convex set, then its image $p(C)$ is convex. To see this, take two points $\bm{x} = (\bm{x}', x_{n+1})$ and $\bm{y} = (\bm{y}', y_{n+1})$ in $\bm{R}^{n+1}$ with $x_{n+1}, y_{n+1} > 0$. The segment between them is $\theta\bm{x} + (1 - \theta)\bm{y}$, $\theta \in [0, 1]$, and its image is
$$\begin{aligned}p(\theta\bm{x} + (1 - \theta)\bm{y}) &= \frac{\theta\bm{x}' + (1 - \theta)\bm{y}'}{\theta x_{n+1} + (1 - \theta)y_{n+1}} \\&= \frac{\theta x_{n+1}}{\theta x_{n+1} + (1 - \theta)y_{n+1}} \cdot \frac{\bm{x}'}{x_{n+1}} + \frac{(1 - \theta)y_{n+1}}{\theta x_{n+1} + (1 - \theta)y_{n+1}} \cdot \frac{\bm{y}'}{y_{n+1}} \\&= \mu\, p(\bm{x}', x_{n+1}) + (1 - \mu)\, p(\bm{y}', y_{n+1}), \end{aligned}$$
where $\mu = \theta x_{n+1}/(\theta x_{n+1} + (1 - \theta)y_{n+1}) \in [0, 1]$. The result is a convex combination of $p(\bm{x})$ and $p(\bm{y})$, so segments are mapped to segments. The inverse image under the perspective function, $p^{-1}(C) = \{(\bm{x}, t)\in \bm{R}^{n+1} \mid \bm{x}/t \in C,\ t>0\}$, is also convex when $C$ is convex.
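The identity $p(\theta\bm{x} + (1 - \theta)\bm{y}) = \mu\, p(\bm{x}) + (1 - \mu)\, p(\bm{y})$ can be verified on concrete numbers. In the sketch below the helper names `perspective` and `mix` and the sample points are assumptions for illustration; the last coordinate of each point plays the role of $t$.

```python
def perspective(point):
    """p(z, t) = z / t for point = (z_1, ..., z_n, t) with t > 0."""
    *z, t = point
    if t <= 0:
        raise ValueError("the perspective function requires t > 0")
    return [zi / t for zi in z]

def mix(x, y, theta):
    """Convex combination theta * x + (1 - theta) * y."""
    return [theta * a + (1 - theta) * b for a, b in zip(x, y)]

# Two hypothetical points in R^3 whose last coordinate is positive.
x = [2.0, 4.0, 2.0]
y = [3.0, -3.0, 1.0]
theta = 0.3

# Weight mu from the derivation: theta * x_{n+1} / (theta * x_{n+1} + (1 - theta) * y_{n+1}).
mu = theta * x[-1] / (theta * x[-1] + (1 - theta) * y[-1])

lhs = perspective(mix(x, y, theta))               # image of a point on the segment
rhs = mix(perspective(x), perspective(y), mu)     # point on the image segment
```

The two vectors `lhs` and `rhs` agree up to rounding, and `mu` lies in $[0, 1]$, so the image of the segment is again a segment.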
  Consider the affine function
$$g(\bm{x}) = \left( \begin{matrix}\bm{A} \\ \bm{c}^T\end{matrix} \right)\bm{x} + \left( \begin{matrix}\bm{b} \\ d\end{matrix} \right), \quad \bm{A}\in\bm{R}^{m \times n},\ \bm{c}\in\bm{R}^{n},\ \bm{b}\in\bm{R}^{m},\ d\in R,$$
and the perspective function $p:\bm{R}^{m+1}\rightarrow \bm{R}^{m}$. The linear-fractional function is the composition $f = p \circ g$, that is,
$$f(\bm{x}) = \frac{\bm{A}\bm{x} + \bm{b}}{\bm{c}^T\bm{x} + d}, \quad \mathrm{dom}\ f=\{\bm{x} \mid \bm{c}^T\bm{x} + d>0\}.$$
The image of any convex set under a linear-fractional function is convex. As an example, consider the conditional probability of two random variables $u \in \{1, \dots, n\}$ and $v \in \{1, \dots, m\}$ with joint probabilities $p_{ij} = P(u = i, v = j)$ and conditional probabilities $f_{i|j} = P(u = i \mid v = j)$. Then
$$f_{i|j} = \frac{p_{ij}}{\sum_{k} p_{kj}},$$
which is a linear-fractional mapping of the joint probabilities.
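The conditional-probability example is easy to reproduce. The following sketch normalizes one column of a joint probability table; the function name `conditional_given_column` and the table `P` are hypothetical, chosen only to illustrate the formula $f_{i|j} = p_{ij} / \sum_k p_{kj}$.

```python
def conditional_given_column(P, j):
    """Return [P(u = i | v = j) for all i] from a joint table P[i][j] = P(u = i, v = j)."""
    column = [row[j] for row in P]
    total = sum(column)                 # P(v = j), the denominator c^T x + d
    if total <= 0:
        raise ValueError("conditioning event has zero probability")
    return [p / total for p in column]  # numerator is linear in P, denominator too

# Hypothetical joint distribution of u in {1, 2} and v in {1, 2}.
P = [[0.1, 0.2],
     [0.3, 0.4]]

cond_v1 = conditional_given_column(P, 0)
cond_v2 = conditional_given_column(P, 1)
```

Each result sums to one, and both numerator and denominator are linear in the entries of `P`, which is exactly the linear-fractional structure described above.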


5. Convex Functions
  A function $f:\bm{R}^n\rightarrow R$ is called convex if $\mathrm{dom}\ f$ is a convex set and for all $\bm{x}, \bm{y} \in \mathrm{dom}\ f$ and $0 \le \theta \le 1$,
$$f(\theta\bm{x} + (1-\theta)\bm{y}) \le \theta f(\bm{x})+(1-\theta)f(\bm{y}).$$
If the inequality holds strictly whenever $\bm{x} \ne \bm{y}$ and $0 < \theta < 1$, then $f$ is strictly convex. If $f$ is convex, then $-f$ is concave.
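The defining inequality suggests a simple randomized test: sample $x$, $y$ and $\theta$ and look for a violation. The sketch below does this for functions of one variable. The name `violates_convexity` and the tolerance are assumptions of this example; finding no violation is only evidence of convexity on the sampled interval, while a returned witness is a genuine counterexample up to the tolerance.

```python
import random

def violates_convexity(f, lo, hi, trials=2000, seed=0, tol=1e-9):
    """Search for (x, y, theta) with f(theta*x + (1-theta)*y) > theta*f(x) + (1-theta)*f(y).

    Returns the first witness found, or None if every sampled triple
    satisfies the convexity inequality up to tol.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        theta = rng.random()
        lhs = f(theta * x + (1 - theta) * y)
        rhs = theta * f(x) + (1 - theta) * f(y)
        if lhs > rhs + tol:
            return (x, y, theta)
    return None
```

For instance, `violates_convexity(lambda x: x * x, -5.0, 5.0)` finds nothing, whereas the concave function `lambda x: -x * x` yields a witness almost immediately.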
  For any convex function $f$, consider a line $\bm{x} + t\bm{v}$ through a point $\bm{x} \in \mathrm{dom}\ f$. The restriction $g(t) = f(\bm{x} + t\bm{v})$ is then convex in $t$. Conversely, if every such restriction is convex, then $f$ is convex, so convexity can be checked by restricting the function to lines.
  Any convex function $f$ can be extended to
$$g(\bm{x}) = \begin{cases} f(\bm{x}), & \bm{x} \in \mathrm{dom}\ f \\ \infty, & \bm{x} \notin \mathrm{dom}\ f \end{cases}$$
and the extended function $g$ is still convex.
  Next, the first-order condition. If $f:\bm{R}^n\rightarrow R$ is differentiable, i.e. the gradient $\nabla f$ exists everywhere on $\mathrm{dom}\ f$, then $f$ is convex if and only if $\mathrm{dom}\ f$ is convex and
$$f(\bm{y}) \ge f(\bm{x}) + \nabla f(\bm{x})^T(\bm{y} - \bm{x}), \quad \forall \bm{x}, \bm{y} \in \mathrm{dom}\ f.$$
This is an important property. If there is a point $\bm{x}$ with $\nabla f(\bm{x}) = \bm{0}$, the inequality becomes $f(\bm{y}) \ge f(\bm{x})$ for all $\bm{y} \in \mathrm{dom}\ f$, so $\bm{x}$ is a global minimizer. This is a central idea of convex optimization.
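To illustrate the first-order condition, the sketch below uses the hypothetical convex function $f(x, y) = x^2 + xy + y^2$ (chosen for this example, not taken from the text) and evaluates the gap between $f$ and its tangent plane. The gap should be nonnegative everywhere, and the stationary point at the origin should be a global minimizer.

```python
import random

def f(p):
    """Example convex function f(x, y) = x^2 + x*y + y^2 (an assumption of this sketch)."""
    x, y = p
    return x * x + x * y + y * y

def grad_f(p):
    """Gradient of f: (2x + y, x + 2y)."""
    x, y = p
    return (2 * x + y, x + 2 * y)

def tangent_gap(x, y):
    """f(y) - [f(x) + grad f(x)^T (y - x)]; nonnegative for a convex f."""
    g = grad_f(x)
    linear = sum(gi * (yi - xi) for gi, xi, yi in zip(g, x, y))
    return f(y) - f(x) - linear

rng = random.Random(0)
samples = [
    ((rng.uniform(-3, 3), rng.uniform(-3, 3)),
     (rng.uniform(-3, 3), rng.uniform(-3, 3)))
    for _ in range(500)
]
smallest_gap = min(tangent_gap(x, y) for x, y in samples)
```

Since the gradient vanishes at the origin, the condition reduces there to $f(\bm{y}) \ge f(\bm{0}) = 0$ for every $\bm{y}$.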
  Next, the second-order condition. If $f:\bm{R}^n\rightarrow R$ is twice differentiable, then $f$ is convex if and only if $\mathrm{dom}\ f$ is convex and
$$\nabla^2 f(\bm{x}) \succeq 0, \quad \forall \bm{x}\in \mathrm{dom}\ f,$$
where $\nabla^2 f(\bm{x})$ is the Hessian matrix.
  Consider a quadratic function $f:\bm{R}^n\rightarrow R$ of the form
$$f(\bm{x}) = \bm{x}^T\bm{P}\bm{x}/2 + \bm{q}^T\bm{x} + r, \quad \bm{P} \in \bm{S}^n,\ \bm{q} \in \bm{R}^n,\ r \in R.$$
Its convexity is decided by its Hessian $\nabla^2 f(\bm{x}) = \bm{P}$: the function is convex if and only if $\bm{P} \succeq 0$.
  Consider an affine function $f(\bm{x}) = \bm{A}\bm{x} + \bm{b}$. Its Hessian is $\nabla^2 f(\bm{x}) = \bm{0}$, so it is both convex and concave.
  Consider the exponential function $f(x) = e^{ax}$, $x \in R$. Its second derivative is $f''(x) = a^2e^{ax} \ge 0$, so it is convex for every $a$.
  Consider the power function $f(x) = x^a$, $x \in R_{++}$. Its second derivative is $f''(x) = a(a-1)x^{a-2}$, so it is convex when $a \ge 1$ or $a \le 0$, and concave when $0 \le a \le 1$.
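For the power function the sign of $a(a-1)$ settles everything on $x > 0$. The sketch below encodes that rule and cross-checks it with a midpoint comparison; the function names are hypothetical and the midpoint test is only a spot check at two points.

```python
def power_curvature(a):
    """Classify x**a on x > 0 from the sign of a * (a - 1), the factor in f''(x)."""
    s = a * (a - 1)
    if s > 0:
        return "convex"
    if s < 0:
        return "concave"
    return "affine"  # a == 0 (constant) or a == 1 (identity)

def midpoint_gap(a, x, y):
    """Chord value minus function value at the midpoint of x and y (both > 0).

    Positive when the chord lies above the graph there, negative when below.
    """
    return 0.5 * (x ** a + y ** a) - (0.5 * (x + y)) ** a
```

For example, `power_curvature(0.5)` reports a concave function, and `midpoint_gap(0.5, 1.0, 4.0)` is negative, consistent with the square root bending below its chords being false and above them being true.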
  Consider the negative entropy $f(x) = x\log x$, $x \in R_{++}$. Its second derivative is $1/x > 0$, so it is strictly convex.
  Consider a norm $p(\bm{x})$, $\bm{x} \in \bm{R}^n$, which satisfies
$$p(a\bm{x}) = |a|\,p(\bm{x}), \qquad p(\bm{x} + \bm{y}) \le p(\bm{x}) + p(\bm{y}), \qquad p(\bm{x}) = 0 \iff \bm{x} = \bm{0}.$$
To check convexity, take any $\bm{x}, \bm{y} \in \bm{R}^n$ and $\theta \in[0, 1]$:
$$\begin{aligned} p(\theta\bm{x} + (1 - \theta)\bm{y}) &\le p(\theta\bm{x}) + p((1 - \theta)\bm{y}) \\ &= \theta p(\bm{x}) + (1 - \theta)p(\bm{y}), \end{aligned}$$
so every norm is convex. By contrast, the zero "norm" $\|\bm{x}\|_0$, the number of nonzero entries $x_i \ne 0$ of $\bm{x}$, is not a norm (it is not homogeneous) and is not convex.
  Consider the max function $f(\bm{x}) = \max\{x_1, \dots, x_n\}$, $\bm{x} \in \bm{R}^n$. For any $\bm{x}, \bm{y} \in \bm{R}^n$ and $\theta \in[0, 1]$,
$$\begin{aligned} f(\theta\bm{x} + (1 - \theta)\bm{y}) &= \max_i\{\theta x_i + (1 - \theta)y_i\} \\ &\le \theta \max_i\{x_i\} + (1 - \theta) \max_i\{y_i\}, \end{aligned}$$
so the max function is convex. It is not differentiable, however, and a smooth analytic approximation is often used instead:
$$f(\bm{x}) = \log(e^{x_1} + \dots + e^{x_n}), \quad \bm{x} \in \bm{R}^n, \qquad \max_i\{x_i\} \le f(\bm{x}) \le \max_i\{x_i\} + \log n.$$
Its first and second derivatives are
$$\begin{aligned} \frac{\partial f}{\partial x_i} &= \frac{e^{x_i}}{\sum_k e^{x_k}}, \qquad \bm{H}_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}, \\ \frac{\partial^2 f}{\partial x_i \partial x_j} &= \frac{-e^{x_i}e^{x_j}}{(\sum_k e^{x_k})^2}, \quad i \ne j, \\ \frac{\partial^2 f}{\partial x_i^2} &= \frac{-e^{x_i}e^{x_i} + e^{x_i}\sum_k e^{x_k}}{(\sum_k e^{x_k})^2}. \end{aligned}$$
With $\bm{z} = (e^{x_1}, \dots, e^{x_n})^T$ the Hessian is
$$\bm{H} = \frac{1}{(\bm{1}^T\bm{z})^2}\left[(\bm{1}^T\bm{z})\,\mathrm{diag}(\bm{z}) - \bm{z}\bm{z}^T\right].$$
To show that $\bm{H}$ is positive semidefinite we need $\bm{V}^T\bm{H}\bm{V} \ge 0$ for all $\bm{V} \in \bm{R}^n$. Writing the positive factor as $k = 1/(\bm{1}^T\bm{z})^2 > 0$,
$$\begin{aligned}\bm{V}^T\bm{H}\bm{V} &= k\left[(\bm{1}^T\bm{z})\bm{V}^T\mathrm{diag}(\bm{z})\bm{V} - \bm{V}^T\bm{z}\bm{z}^T\bm{V}\right] \\&= k\left[\sum_i z_i\sum_i v_i^2z_i - \Big(\sum_i v_iz_i\Big)^2\right]. \end{aligned}$$
Let $a_i = v_i z_i^{1/2}$ and $b_i = z_i^{1/2}$. Then
$$\bm{V}^T\bm{H}\bm{V} = k\left[(\bm{b}^T\bm{b})(\bm{a}^T\bm{a}) - (\bm{a}^T\bm{b})^2\right],$$
which is nonnegative by the Cauchy-Schwarz inequality. Hence the smooth approximation of the max function (log-sum-exp) is convex.
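The bounds and the Hessian formula for log-sum-exp can be checked numerically. The sketch below is one possible implementation under stated assumptions: it subtracts $\max_i x_i$ before exponentiating (a standard stabilization that the text does not mention) and writes the Hessian in the equivalent form $\mathrm{diag}(\bm{p}) - \bm{p}\bm{p}^T$ with $\bm{p} = \bm{z}/(\bm{1}^T\bm{z})$. All function names are hypothetical.

```python
import math

def log_sum_exp(x):
    """log(sum_i exp(x_i)), shifted by max(x) to avoid overflow."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))

def lse_hessian(x):
    """Hessian diag(p) - p p^T, where p = z / (1^T z) and z_i = exp(x_i)."""
    m = max(x)
    z = [math.exp(xi - m) for xi in x]   # a common shift cancels in p
    s = sum(z)
    p = [zi / s for zi in z]
    n = len(x)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(n)]
            for i in range(n)]

def quad_form(H, v):
    """v^T H v for a matrix stored as a list of rows."""
    n = len(v)
    return sum(v[i] * H[i][j] * v[j] for i in range(n) for j in range(n))

x = [0.5, -1.0, 2.0]
value = log_sum_exp(x)
H = lse_hessian(x)
```

For this `x`, `value` lies between `max(x)` and `max(x) + log(3)`, and `quad_form(H, v)` is nonnegative for any test vector `v`; it is zero along the all-ones direction, which shows that the Hessian is positive semidefinite but not positive definite.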


6. Operations That Preserve Convexity of Functions
  If $f_1, \dots, f_m$ are convex functions, then any nonnegative weighted sum $f = \sum_i w_if_i$, $w_i \ge 0$, is convex. This extends to the continuous case: if $f(x, y)$ is convex in $x$ for every $y \in A$ and $w(y) \ge 0$, then $g(x) = \int_{y\in A}w(y)f(x, y)\,dy$ is convex.
  Let $f:\bm{R}^n \rightarrow R$, $\bm{A} \in \bm{R}^{n \times m}$, $\bm{b} \in \bm{R}^n$, and define $g(\bm{x}) = f(\bm{A}\bm{x} + \bm{b})$. If $f$ is convex, then $g$ is convex. Indeed, for any $\bm{x}, \bm{y} \in \mathrm{dom}\ g$ and $0 \le \theta \le 1$,
$$\begin{aligned} g(\theta\bm{x} + (1 - \theta)\bm{y}) &= f(\theta\bm{A}\bm{x} + (1 - \theta)\bm{A}\bm{y} + \bm{b}) \\&= f(\theta(\bm{A}\bm{x} + \bm{b}) + (1 - \theta)(\bm{A}\bm{y} + \bm{b})) \\ &\le\theta f(\bm{A}\bm{x} + \bm{b}) + (1 - \theta)f(\bm{A}\bm{y} + \bm{b}) \\&= \theta g(\bm{x})+(1 - \theta)g(\bm{y}). \end{aligned}$$
Here the affine map is applied first and the function second. In the opposite order, take $f_i:\bm{R}^n \rightarrow R$ for $i = 1, \dots, n$, $\bm{A} \in \bm{R}^{1 \times n}$, $b \in R$, and define $g(\bm{x}) = \bm{A}(f_1(\bm{x}), \dots, f_n(\bm{x}))^T+b$. If all entries of $\bm{A}$ are nonnegative, this is a nonnegative weighted sum of the $f_i$ plus a constant, and it is convex when the $f_i$ are convex.
  Consider the pointwise maximum of two functions. If $f_1$ and $f_2$ are convex, then $f(\bm{x}) = \max\{f_1(\bm{x}), f_2(\bm{x})\}$ is convex. For any $\bm{x}, \bm{y} \in \mathrm{dom}\ f$ and $0 \le \theta \le 1$,
$$\begin{aligned} f(\theta\bm{x} + (1 - \theta)\bm{y}) &= \max\{f_1(\theta\bm{x} + (1 - \theta)\bm{y}), f_2(\theta\bm{x} + (1 - \theta)\bm{y})\} \\&\le \max\{\theta f_1(\bm{x}) + (1 - \theta)f_1(\bm{y}), \theta f_2(\bm{x}) + (1 - \theta)f_2(\bm{y})\} \\&\le \max\{\theta f_1(\bm{x}), \theta f_2(\bm{x})\} + \max\{(1 - \theta)f_1(\bm{y}), (1 - \theta)f_2(\bm{y})\} \\&= \theta f(\bm{x}) + (1 - \theta)f(\bm{y}). \end{aligned}$$
  Consider the composition of functions. Let $h:\bm{R}^k \rightarrow R$ and $g:\bm{R}^n\rightarrow\bm{R}^k$. Their composition is $f=h \circ g:\bm{R}^n\rightarrow R$ with domain $\mathrm{dom}\ f = \{\bm{x}\in \mathrm{dom}\ g \mid g(\bm{x}) \in \mathrm{dom}\ h\}$. To study convexity, start with the scalar case $k = n = 1$, with $h$ and $g$ twice differentiable and defined on all of $R$. The second derivative of $f(x) = h(g(x))$ is
$$f''(x) = h''(g(x))\,g'(x)^2 + h'(g(x))\,g''(x).$$
It follows that $f$ is convex if $h$ is convex and nondecreasing and $g$ is convex, or if $h$ is convex and nonincreasing and $g$ is concave. In the more general cases (higher dimension, domains smaller than the whole real line, or functions that are not twice differentiable), the same conclusions are obtained using the Hessian, the extended-value extension, and the original definition of convexity, respectively.
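The chain-rule formula for $f''$ can be checked on a concrete composition. The sketch below uses $h(u) = e^u$ (convex, nondecreasing) and $g(x) = x^2$ (convex), both chosen for this example, and compares the formula with a central finite difference. The function names and the step size `eps` are assumptions.

```python
import math

def chain_second_derivative(h1, h2, g, g1, g2, x):
    """f''(x) for f = h(g(x)), given h' (h1), h'' (h2), g, g' (g1) and g'' (g2)."""
    return h2(g(x)) * g1(x) ** 2 + h1(g(x)) * g2(x)

def fd_second_derivative(f, x, eps=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + eps) - 2.0 * f(x) + f(x - eps)) / (eps * eps)

def f_example(x):
    """f(x) = exp(x^2), i.e. h(u) = exp(u) composed with g(x) = x^2."""
    return math.exp(x * x)

def f_example_second(x):
    """Second derivative of f_example via the chain-rule formula."""
    return chain_second_derivative(
        math.exp, math.exp,   # h' and h'' of exp are both exp
        lambda t: t * t,      # g
        lambda t: 2.0 * t,    # g'
        lambda t: 2.0,        # g''
        x,
    )
```

Both terms of the formula are nonnegative here, because $h'' \ge 0$, $h' \ge 0$ and $g'' \ge 0$, so `f_example_second` is positive at every point, in agreement with the composition rule.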
  Given $f:\bm{R}^n \rightarrow R$, define its perspective $g:\bm{R}^n \times R_{++}\rightarrow R$ by
$$g(\bm{x},t) = tf(\bm{x}/t), \quad \mathrm{dom}\ g = \{(\bm{x}, t) \mid t >0,\ \bm{x}/t \in \mathrm{dom}\ f\}.$$
If $f$ is convex, then $g$ is convex.
  Consider the negative logarithm $f(x) = -\log x$, which is convex. Its perspective $g(x, t) = t\log(t/x)$ is therefore convex as well. For $\bm{u}, \bm{v} \in \bm{R}_{++}^n$, the function $g(\bm{u}, \bm{v}) = \sum_i u_i\log(u_i/v_i)$ is also convex, since it is a sum of convex functions. The KL divergence
$$D_{KL}(\bm{u}, \bm{v}) = \sum_i \left(u_i\log(u_i/v_i)-u_i + v_i\right)$$
is convex too, and it is a Bregman divergence. For a convex differentiable function $f:\bm{R}^n \rightarrow R$, the Bregman divergence is
$$D_B(\bm{u}, \bm{v}) = f(\bm{u}) - f(\bm{v}) - \nabla f(\bm{v})^T(\bm{u}-\bm{v}).$$
With $f(\bm{u}) = \sum_i u_i\log u_i - \sum_i u_i$ it reduces to the KL divergence. Note that a Bregman divergence is not jointly convex in $(\bm{u}, \bm{v})$ in general, so the convexity of the KL divergence is obtained from the perspective argument above rather than from its Bregman form.
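The claim that the KL divergence is the Bregman divergence of $f(\bm{u}) = \sum_i u_i\log u_i - \sum_i u_i$ can be confirmed numerically. In the sketch below the function names and the test vectors are assumptions; the gradient used is $\nabla f(\bm{v})_i = \log v_i$, which follows from differentiating $f$.

```python
import math

def kl_divergence(u, v):
    """Generalized KL divergence: sum_i (u_i * log(u_i / v_i) - u_i + v_i), for u, v > 0."""
    return sum(ui * math.log(ui / vi) - ui + vi for ui, vi in zip(u, v))

def neg_entropy(u):
    """f(u) = sum_i u_i * log(u_i) - sum_i u_i."""
    return sum(ui * math.log(ui) - ui for ui in u)

def grad_neg_entropy(u):
    """Gradient of neg_entropy: d/du_i (u_i log u_i - u_i) = log u_i."""
    return [math.log(ui) for ui in u]

def bregman(f, grad_f, u, v):
    """Bregman divergence f(u) - f(v) - grad f(v)^T (u - v)."""
    g = grad_f(v)
    return f(u) - f(v) - sum(gi * (ui - vi) for gi, ui, vi in zip(g, u, v))

u = [0.2, 0.5, 1.3]
v = [0.7, 0.1, 0.9]
kl_value = kl_divergence(u, v)
bregman_value = bregman(neg_entropy, grad_neg_entropy, u, v)
```

The two values coincide up to rounding, and the divergence is zero when both arguments are equal.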


7. Quasiconvex Functions
  For a function $f:\bm{R}^n \rightarrow R$, the $\alpha$-sublevel set is defined as
$$C_\alpha=\{\bm{x} \in \mathrm{dom}\ f \mid f(\bm{x}) \le \alpha\}.$$
All sublevel sets of a convex function are convex: for any $\bm{x}, \bm{y} \in C_\alpha$, i.e. $f(\bm{x}) \le \alpha$ and $f(\bm{y}) \le \alpha$, and any $\theta \in [0, 1]$,
$$\begin{aligned} f(\theta\bm{x} + (1 - \theta)\bm{y}) &\le \theta f(\bm{x}) + (1 - \theta)f(\bm{y}) \\&\le \alpha, \end{aligned}$$
and this holds for every $\alpha$. The converse is not true.
  To see what sublevel sets mean, take a convex function $f:\bm{R}^2 \rightarrow R$ and project its graph onto the plane of the variables. As $\alpha$ increases, the sublevel sets form a nested family of convex sets that never shrinks. The same holds in higher dimensions.
  A function whose domain and sublevel sets are all convex is called quasiconvex, even when the function itself is not convex. Every convex function is quasiconvex, but not conversely; a quasiconvex function may even be concave, for example $\log x$ on $R_{++}$. Quasiconvex functions are also called unimodal functions, and in general methods from convex optimization can be adapted to them. In mathematical terms, $f$ is quasiconvex if $\mathrm{dom}\ f$ is convex and, for all $\bm{x}, \bm{y} \in \mathrm{dom}\ f$ and $\theta \in [0, 1]$,
$$\max\{f(\bm{x}), f(\bm{y})\} \ge f(\theta\bm{x} + (1 - \theta)\bm{y}).$$
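A small experiment makes the gap between convex and quasiconvex visible. The sketch below samples the quasiconvexity inequality for $f(x) = \sqrt{|x|}$, an example chosen here because it is quasiconvex but not convex. The function names and tolerance are assumptions, and passing the sampled test is evidence rather than proof.

```python
import math
import random

def is_quasiconvex_on_samples(f, lo, hi, trials=2000, seed=0, tol=1e-12):
    """Check max{f(x), f(y)} >= f(theta*x + (1-theta)*y) on random samples from [lo, hi]."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        theta = rng.random()
        if f(theta * x + (1 - theta) * y) > max(f(x), f(y)) + tol:
            return False
    return True

def sqrt_abs(x):
    """f(x) = sqrt(|x|): its sublevel sets are intervals [-a^2, a^2], hence convex."""
    return math.sqrt(abs(x))
```

The sampled inequality holds for `sqrt_abs` on $[-4, 4]$, yet the function is not convex: at the midpoint of $0$ and $2$ its value $1$ exceeds the chord value $\sqrt{2}/2$. A function such as $-x^2$ fails the sampled test, since its sublevel sets are not intervals.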
  If a quasiconvex function $f$ is differentiable, then for all $\bm{x}, \bm{y} \in \mathrm{dom}\ f$, $f(\bm{y}) \le f(\bm{x})$ implies $\nabla f(\bm{x})^T(\bm{y} - \bm{x}) \le 0$.
  If a quasiconvex function $f$ is twice differentiable, then for all $\bm{x} \in \mathrm{dom}\ f$ and $\bm{y} \in \bm{R}^n$, $\bm{y}^T\nabla f(\bm{x}) = 0$ implies $\bm{y}^T\nabla^2 f(\bm{x})\bm{y} \ge 0$.
