Convex Optimization Note 1

This post contains notes on chapters 2 and 3 and Appendix A of "Convex Optimization".

1. Convex Set

1.1 Affine and convex sets:

1) $C = V + x_0 = \{x + x_0 \mid x \in V\}$: an affine set can be viewed as a subspace $V$ shifted by a point $x_0$. This is analogous to the general solution of $Ax = b$ being the nullspace of $A$ plus a particular solution.
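
The "particular solution plus nullspace" picture can be checked numerically. The following is a minimal sketch; the system $Ax = b$, the particular solution and the nullspace vector are hypothetical values chosen only for illustration.

```python
# Hypothetical 2x3 system A x = b, chosen only for illustration.
A = [[1.0, 1.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [6.0, 5.0]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

x_particular = [1.0, 5.0, 0.0]   # one solution of A x = b
null_vector = [1.0, -2.0, 1.0]   # spans the nullspace of A

def solution(t):
    """Point x_particular + t * null_vector of the affine solution set."""
    return [p + t * n for p, n in zip(x_particular, null_vector)]

for t in (-2.0, 0.0, 3.5):
    # Every point of the shifted subspace still solves A x = b.
    print(t, matvec(A, solution(t)))
```

Each printed product should reproduce $b$, whatever value of `t` is used.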

2) Affine dimension and relative interior: the dimension of a set is the dimension of its affine hull, and the relative interior is its interior relative to that affine hull.

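3) Convex combinations extend to the infinite case: if $C$ is convex and $p$ is a probability density supported on $C$ (so $p(x) \ge 0$ on $C$ and $\int_C p(x)\,dx = 1$), then $\int_C p(x)\,x\,dx \in C$, provided the integral exists.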

4) Cones: if $x \in C$, then $\theta x \in C$ for all $\theta > 0$. A cone that is also convex is a convex cone.

1.2 Some examples

1) Euclidean balls and ellipsoids:

$\{x \mid (x - x_c)^T P^{-1} (x - x_c) \le 1\}$ with $P \succ 0$, or equivalently
$\{x_c + Au \mid \|u\|_2 \le 1\}$ with
$A = P^{1/2}$
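
The equivalence of the two ellipsoid descriptions can be illustrated numerically. This is a rough sketch that assumes a diagonal $P$ (so that $P^{-1}$ and $P^{1/2}$ can be written down directly); the matrix and the centre are made-up example values.

```python
import math
import random

# Assumed example data: a diagonal P, so P^{-1} and P^{1/2} are easy to write down.
p_diag = [4.0, 9.0]                      # P = diag(4, 9), positive definite
a_diag = [math.sqrt(p) for p in p_diag]  # A = P^{1/2} = diag(2, 3)
x_c = [1.0, -1.0]                        # centre of the ellipsoid

def quad_form(x):
    """(x - x_c)^T P^{-1} (x - x_c) for diagonal P."""
    return sum((xi - ci) ** 2 / p for xi, ci, p in zip(x, x_c, p_diag))

def image_of_ball_point(u):
    """x_c + A u for diagonal A."""
    return [ci + a * ui for ci, a, ui in zip(x_c, a_diag, u)]

rng = random.Random(0)
worst = 0.0
for _ in range(1000):
    u = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    norm_u = math.hypot(*u)
    if norm_u > 1.0:                     # keep only points of the unit ball
        continue
    x = image_of_ball_point(u)
    # For this construction the quadratic form equals ||u||_2^2, hence is <= 1.
    worst = max(worst, abs(quad_form(x) - norm_u ** 2))
print("largest deviation:", worst)
```

The deviation should be at the level of floating-point rounding, which is what the identity $(Au)^T P^{-1} (Au) = u^T u$ predicts.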

2) Norm cones: $\{(x, t) \mid \|x\| \le t\} \subseteq \mathbf{R}^{n+1}$

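3) Polyhedra, with simplexes as a special case: the convex hull of $k + 1$ affinely independent points is a $k$-dimensional simplex,

for example the unit simplex $\mathbf{conv}\{0, e_1, \dots, e_n\}$ and the probability simplex $\mathbf{conv}\{e_1, \dots, e_n\}$.

A polyhedron can be represented in two ways: as a convex hull (of its vertices, when it is bounded) or as the solution set of finitely many linear inequalities and equalities.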

4) Positive semidefinite cone $\mathbf{S}^n_+$

1.3 Operations that preserve convexity

1) Intersection

Positive semidefinite cone: $\mathbf{S}^n_+ = \bigcap_{z \ne 0} \{X \in \mathbf{S}^n \mid z^T X z \ge 0\}$

$S = \{x \in \mathbf{R}^m \mid |p(t)| \le 1 \text{ for } |t| \le \pi/3\}$, where $p(t) = \sum_{k=1}^m x_k \cos kt$

Every closed convex set can be expressed as the intersection of (possibly infinitely many) halfspaces.
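
The trigonometric-polynomial set is an intersection of infinitely many slabs, one for each $t$, and a grid over $t$ gives an approximate membership test. The sketch below is only illustrative: the grid size, the tolerance and the test points are assumptions, and a grid can miss violations between grid points.

```python
import math

def p(x, t):
    """Trigonometric polynomial p(t) = sum_k x_k cos(k t), k = 1..m."""
    return sum(xk * math.cos((k + 1) * t) for k, xk in enumerate(x))

def in_S(x, grid=2001):
    """Approximate membership test: |p(t)| <= 1 on a grid of |t| <= pi/3."""
    ts = [-math.pi / 3 + 2 * math.pi / 3 * i / (grid - 1) for i in range(grid)]
    return all(abs(p(x, t)) <= 1.0 + 1e-12 for t in ts)

x1 = [1.0, 0.0]      # p(t) = cos t
x2 = [0.0, -1.0]     # p(t) = -cos 2t
mid = [(a + b) / 2 for a, b in zip(x1, x2)]

# Two members of S and their midpoint, which convexity says is also a member.
print(in_S(x1), in_S(x2), in_S(mid))
# p(t) = 2 cos t exceeds 1 near t = 0, so this point is outside S.
print(in_S([2.0, 0.0]))
```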

2) Affine functions: both the image and the inverse image of a convex set under an affine function are convex.

Polyhedra

Solution set of a linear matrix inequality: $A(x) = x_1 A_1 + \dots + x_n A_n \preceq B$

Hyperbolic cone: $\{x \mid x^T P x \le (c^T x)^2,\ c^T x \ge 0\}$ with $P \succeq 0$ is the inverse image of the second-order cone $\{(z, t) \mid z^T z \le t^2,\ t \ge 0\}$ under the affine function $f(x) = (P^{1/2} x, c^T x)$.

3) Perspective functions

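$P(z, t) = z/t$ with $\mathbf{dom}\,P = \mathbf{R}^n \times \mathbf{R}_{++}$. Both the image and the inverse image of a convex set under this function are convex.

Conditional probability: the joint probabilities lie on the probability simplex, and conditioning only divides by a partial sum, so it can be viewed as a linear-fractional function. Hence a convex set of joint probabilities is mapped to a convex set of conditional probabilities.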
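
To make the linear-fractional structure concrete, here is a small sketch of conditioning on a column of a joint probability table. The function name `conditional` and the 2x2 table are illustrative assumptions, not something taken from the book.

```python
def conditional(joint, j):
    """P(u = i | v = j) from a joint table joint[i][j].

    Numerator and denominator are both linear in the joint probabilities,
    so the map joint -> conditional is linear-fractional.
    """
    column_sum = sum(row[j] for row in joint)   # denominator: partial sum
    if column_sum <= 0:
        raise ValueError("conditioning event has zero probability")
    return [row[j] / column_sum for row in joint]

# Hypothetical joint distribution of (u, v); entries sum to 1.
joint = [[0.10, 0.30],
         [0.20, 0.40]]
print(conditional(joint, 0))
print(conditional(joint, 1))
```

Each printed vector is itself a point of a probability simplex, since its entries are nonnegative and sum to one.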

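1.4 Separating and supporting hyperplane theorems

1) Separating hyperplane theorem: any two disjoint nonempty convex sets can be separated by a hyperplane.

Proof sketch (assuming the distance between the two sets is positive and attained): take the closest pair of points, one in each set, and the hyperplane through the midpoint of the segment joining them, perpendicular to that segment. This hyperplane separates the two sets. If it did not (the affine function $a^T x - b$ defining it had the wrong sign at some point of one set), moving from the closest point toward that point would give a strictly closer point in the same convex set; this is exactly what the derivative of the Euclidean distance shows.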
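
The construction in the proof can be tried on a concrete case. The sketch below assumes two disjoint discs in the plane (made-up centres and radii), for which the closest pair of points lies on the segment joining the centres, and checks the sign of $f(x) = a^T x - b$ on random samples.

```python
import math
import random

# Assumed example: two disjoint discs in the plane (centre, radius).
center_c, radius_c = (0.0, 0.0), 1.0
center_d, radius_d = (4.0, 3.0), 2.0

# For two discs, the closest pair of points lies on the segment joining the centres.
gap = math.hypot(center_d[0] - center_c[0], center_d[1] - center_c[1])
direction = [(q - p) / gap for p, q in zip(center_c, center_d)]
c = [p + radius_c * e for p, e in zip(center_c, direction)]
d = [q - radius_d * e for q, e in zip(center_d, direction)]

# Hyperplane through the midpoint of [c, d], normal to d - c:
# f(x) = a^T x - b with a = d - c and b = (||d||^2 - ||c||^2) / 2.
a = [di - ci for ci, di in zip(c, d)]
b = (sum(di * di for di in d) - sum(ci * ci for ci in c)) / 2

def f(x):
    """Affine function whose zero set is the separating hyperplane."""
    return sum(ai * xi for ai, xi in zip(a, x)) - b

def sample_disc(center, radius, rng):
    """Random point of the closed disc, via a random angle and radius."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    r = radius * rng.random()
    return [center[0] + r * math.cos(angle), center[1] + r * math.sin(angle)]

rng = random.Random(1)
max_on_c = max(f(sample_disc(center_c, radius_c, rng)) for _ in range(2000))
min_on_d = min(f(sample_disc(center_d, radius_d, rng)) for _ in range(2000))
print(max_on_c, min_on_d)   # f should be negative on the first disc, positive on the second
```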

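2) Strict separation:

Two disjoint convex sets cannot always be strictly separated.

A closed convex set and a point outside it can be strictly separated, which shows that every closed convex set is the intersection of all halfspaces that contain it.

3) Converse: for two convex sets, at least one of which is open, the existence of a separating hyperplane implies that they are disjoint.

4) The supporting hyperplane theorem can be proved by separating $\mathbf{int}\,C$ from the boundary point $P$.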

2. Mathematical background (Appendix A)

1) Norms
Vector norms: the $P$-quadratic norm $\|x\|_P = (x^T P x)^{1/2}$ with $P \succ 0$
Matrix norms:
sum-absolute-value and maximum-absolute-value norms
operator norms $\|X\|_{a,b} = \sup\{\|Xu\|_a \mid \|u\|_b \le 1\}$
Induced operator norms: $\ell_2$ gives the spectral norm, equal to the largest singular value; $\ell_1$ gives the max-column-sum norm; $\ell_\infty$ gives the max-row-sum norm.
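
The column-sum and row-sum formulas can be compared against the definition of the operator norm by random sampling. This is only a sanity check on an arbitrary example matrix: sampling gives a lower bound on the supremum, not its exact value.

```python
import random

def norm1(v):
    return sum(abs(t) for t in v)

def norm_inf(v):
    return max(abs(t) for t in v)

def matvec(M, v):
    return [sum(m * t for m, t in zip(row, v)) for row in M]

def max_column_sum(M):
    """Operator norm induced by the l1 norm."""
    return max(sum(abs(row[j]) for row in M) for j in range(len(M[0])))

def max_row_sum(M):
    """Operator norm induced by the l_infinity norm."""
    return max(sum(abs(m) for m in row) for row in M)

X = [[1.0, -2.0, 0.5],
     [3.0, 0.0, -1.0]]     # arbitrary example matrix

rng = random.Random(0)
best1 = bestinf = 0.0
for _ in range(5000):
    u = [rng.uniform(-1, 1) for _ in range(3)]
    best1 = max(best1, norm1(matvec(X, u)) / norm1(u))
    bestinf = max(bestinf, norm_inf(matvec(X, u)) / norm_inf(u))

# The sampled ratios should stay at or below the closed-form norms (up to rounding).
print(max_column_sum(X), best1)
print(max_row_sum(X), bestinf)
```

The bounds are attained at simple vectors: a coordinate vector for the $\ell_1$ case and a sign vector for the $\ell_\infty$ case.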

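2) Equivalence of norms: every norm on $\mathbf{R}^n$ is equivalent to some quadratic norm, in the sense that there is a $P \succ 0$ with $\|x\|_P \le \|x\| \le \sqrt{n}\,\|x\|_P$

3) Dual norm: $\|z\|_* = \sup\{z^T x \mid \|x\| \le 1\}$, so that
$z^T x \le \|x\|\,\|z\|_*$
The $\ell_2$ norm is its own dual, $\ell_1$ and $\ell_\infty$ are dual to each other, and $\ell_p$ and $\ell_q$ are dual when $1/p + 1/q = 1$.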
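
As a small illustration of the $\ell_1$ / $\ell_\infty$ pairing, the supremum defining the dual of the $\ell_1$ norm is attained at a signed coordinate vector. The helper name `dual_of_l1` and the vector `z` below are hypothetical.

```python
def dual_of_l1(z):
    """Evaluate sup{ z^T x : ||x||_1 <= 1 } at a signed coordinate vector.

    Putting all the weight on the largest |z_i| attains the supremum,
    which is why the dual of the l1 norm is the l_infinity norm.
    """
    k = max(range(len(z)), key=lambda i: abs(z[i]))
    x = [0.0] * len(z)
    x[k] = 1.0 if z[k] >= 0 else -1.0
    return sum(zi * xi for zi, xi in zip(z, x)), x

z = [0.5, -3.0, 2.0]
value, maximizer = dual_of_l1(z)
print(value, maximizer)
```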

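4) Definitions of closed sets, open sets, and the boundary

5) Closed function: all sublevel sets $\{x \in \mathbf{dom}\,f \mid f(x) \le \alpha\}$ are closed.
If $f$ is continuous and $\mathbf{dom}\,f$ is closed, then $f$ is closed.
If $f$ is continuous and $\mathbf{dom}\,f$ is open, then $f$ is closed if and only if $f(x) \to \infty$ as $x$ approaches the boundary of $\mathbf{dom}\,f$.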

6) $\log\det(I + X^{-1/2} \Delta X X^{-1/2}) = \sum_{i=1}^n \log(1 + \lambda_i)$, where $\lambda_i$ are the eigenvalues of $X^{-1/2} \Delta X X^{-1/2}$
$\nabla \log\det(X) = X^{-1}$
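
The gradient formula follows from a first-order expansion of the eigenvalue identity. A short derivation sketch, assuming $X \succ 0$ and a small symmetric perturbation $\Delta X$, with $\lambda_i$ the eigenvalues of $X^{-1/2} \Delta X X^{-1/2}$:

```latex
\begin{aligned}
\log\det(X + \Delta X)
  &= \log\det\!\bigl(X^{1/2}\,(I + X^{-1/2}\,\Delta X\,X^{-1/2})\,X^{1/2}\bigr) \\
  &= \log\det X + \sum_{i=1}^n \log(1 + \lambda_i) \\
  &\approx \log\det X + \sum_{i=1}^n \lambda_i
   = \log\det X + \mathbf{tr}\bigl(X^{-1}\,\Delta X\bigr).
\end{aligned}
```

Reading off the linear term in $\Delta X$ gives $\nabla \log\det(X) = X^{-1}$.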

7) $\mathbf{cond}(A) = \|A\|_2 \|A^{-1}\|_2 = \sigma_{\max}(A) / \sigma_{\min}(A)$

8) Pseudo-inverse:
$A^\dagger b$ is the minimum-norm solution of: minimize $\|Ax - b\|_2^2$
It also appears in the minimum of a general convex quadratic function whose quadratic term may be singular.

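9) Schur complement: for $X = \begin{bmatrix} A & B \\ B^T & C \end{bmatrix}$ with $\det A \ne 0$, $S = C - B^T A^{-1} B$
$\det X = \det A \det S$
The inverse of $X$ can be written in terms of the inverse of $S$.
$\inf_u \begin{bmatrix} u \\ v \end{bmatrix}^T \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = v^T S v$ when $A \succ 0$
$X \succ 0$ if and only if $A \succ 0$ and $S \succ 0$; if $A \succ 0$, then $X \succeq 0$ if and only if $S \succeq 0$.
When $A$ is singular, the Schur complement can be expressed using the pseudo-inverse of $A$.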
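
The determinant and partial-minimization identities are easy to check in the simplest case, where the blocks are scalars. The values of `a`, `b`, `c` and `v` below are arbitrary assumptions with `a > 0`.

```python
# Assumed scalar blocks, so X is the 2x2 matrix [[a, b], [b, c]].
a, b, c = 2.0, 1.0, 3.0
s = c - b * b / a                      # Schur complement of a in X

det_x = a * c - b * b
print(det_x, a * s)                    # det X = det A * det S

def quad(u, v):
    """Quadratic form [u v] X [u v]^T."""
    return a * u * u + 2 * b * u * v + c * v * v

v = 1.5
u_star = -b * v / a                    # minimiser over u (valid because a > 0)
print(quad(u_star, v), s * v * v)      # inf_u quad(u, v) = v^T S v
```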

3. Convex function

3.1 basics

1) Restriction to a line: $f$ is convex if and only if its restriction to every line is convex. Extended-value extension.

2) First-order condition: $f(y) \ge f(x) + \nabla f(x)^T (y - x)$

3) Second-order condition: $\nabla^2 f(x) \succeq 0$

4) Sublevel sets of convex functions are convex sets; the converse is not true.

5) The epigraph is convex if and only if the function is convex.
The supporting hyperplane of the epigraph at $(x, f(x))$ has normal $(\nabla f(x), -1)$.

6) $f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y)$ extends to $f(\mathbf{E}\,x) \le \mathbf{E}\,f(x)$, known as Jensen's inequality.
It can be used to prove:
$\sqrt{ab} \le (a + b)/2$ for $a, b \ge 0$
Hölder's inequality: $\sum_{i=1}^n x_i y_i \le \left(\sum_{i=1}^n |x_i|^p\right)^{1/p} \left(\sum_{i=1}^n |y_i|^q\right)^{1/q}$, where $1/p + 1/q = 1$ and $p > 1$
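
Both consequences of Jensen's inequality can be spot-checked on random data. The sketch below is a numerical sanity check, not a proof; the choice $p = 3$ and the sampling ranges are arbitrary.

```python
import math
import random

def holder_gap(x, y, p):
    """Right-hand side minus left-hand side of Holder's inequality."""
    q = p / (p - 1.0)                               # conjugate exponent, 1/p + 1/q = 1
    lhs = sum(xi * yi for xi, yi in zip(x, y))
    rhs = (sum(abs(xi) ** p for xi in x) ** (1 / p)
           * sum(abs(yi) ** q for yi in y) ** (1 / q))
    return rhs - lhs

rng = random.Random(0)
min_amgm = min_holder = float("inf")
for _ in range(2000):
    a, b = rng.uniform(0, 10), rng.uniform(0, 10)
    min_amgm = min(min_amgm, (a + b) / 2 - math.sqrt(a * b))
    x = [rng.uniform(-1, 1) for _ in range(5)]
    y = [rng.uniform(-1, 1) for _ in range(5)]
    min_holder = min(min_holder, holder_gap(x, y, p=3.0))

# Both gaps should stay nonnegative, up to floating-point rounding.
print(min_amgm, min_holder)
```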

7) Examples
Quadratic-over-linear: $f(x, y) = x^2 / y$ with $y > 0$
Log-sum-exp: $f(x) = \log(e^{x_1} + \dots + e^{x_n})$; convexity follows from the second derivative and the Cauchy-Schwarz inequality.
Geometric mean: $f(x) = \left(\prod_{i=1}^n x_i\right)^{1/n}$; likewise, the second derivative and the Cauchy-Schwarz inequality show that it is concave.
Log-determinant: $f(X) = \log\det X$ with $\mathbf{dom}\,f = \mathbf{S}^n_{++}$; restricting to a line and differentiating shows that it is concave.
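
For log-sum-exp, convexity can be spot-checked directly from the definition. The sketch below uses the usual shift by the maximum entry to avoid overflow (an implementation choice, not something from the notes) and tests the convexity inequality on random pairs.

```python
import math
import random

def log_sum_exp(x):
    """log(sum_i exp(x_i)), shifted by max(x) to avoid overflow."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))

rng = random.Random(0)
worst = float("inf")
for _ in range(2000):
    x = [rng.uniform(-5, 5) for _ in range(4)]
    y = [rng.uniform(-5, 5) for _ in range(4)]
    theta = rng.random()
    z = [theta * xi + (1 - theta) * yi for xi, yi in zip(x, y)]
    # Convexity: f(theta x + (1 - theta) y) <= theta f(x) + (1 - theta) f(y).
    gap = theta * log_sum_exp(x) + (1 - theta) * log_sum_exp(y) - log_sum_exp(z)
    worst = min(worst, gap)
print(worst)   # convexity predicts a nonnegative gap, up to rounding
```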

3.2 operations that preserve convexity

1) Nonnegative weighted sums, which extend to infinite sums and integrals

2) Composition with an affine mapping: $f(Ax + b)$

3) Pointwise maximum: $f(x) = \max(f_1(x), \dots, f_n(x))$, which extends to the supremum over an infinite set, $g(x) = \sup_y f(x, y)$, where $f(\cdot, y)$ is convex for each fixed $y$
Sum of the $r$ largest components
Support function of a set (any set, not necessarily convex): $f(x) = \sup\{x^T y \mid y \in C\}$
Distance to the farthest point of a set: $f(x) = \sup_{y \in C} \|x - y\|$
Maximum eigenvalue of a symmetric matrix: $f(X) = \sup\{y^T X y \mid \|y\|_2 = 1\}$
Operator norm: see 2. Mathematical background
Almost every convex function is the pointwise supremum of its affine under-estimators (take the supporting hyperplane at every point); this holds, for example, when $\mathbf{dom}\,f = \mathbf{R}^n$.
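
The sum of the $r$ largest components shows the pointwise-maximum rule at work: it equals the maximum of the linear functions $x \mapsto x_{i_1} + \dots + x_{i_r}$ over all index sets of size $r$. A brute-force sketch with made-up vectors:

```python
from itertools import combinations

def sum_r_largest(x, r):
    """Sum of the r largest components, via sorting."""
    return sum(sorted(x, reverse=True)[:r])

def max_over_subsets(x, r):
    """Same value, written as a pointwise maximum of linear functions of x."""
    return max(sum(x[i] for i in idx) for idx in combinations(range(len(x)), r))

x = [3.0, -1.0, 4.0, 1.5]
y = [0.0, 2.0, -2.0, 5.0]
mid = [(a + b) / 2 for a, b in zip(x, y)]

# The two definitions agree.
print(sum_r_largest(x, 2), max_over_subsets(x, 2))
# Midpoint convexity: the value at the midpoint is at most the average of the values.
print(sum_r_largest(mid, 2), (sum_r_largest(x, 2) + sum_r_largest(y, 2)) / 2)
```

The brute-force version enumerates all index sets and is only meant to mirror the definition; sorting is the practical way to compute the value.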

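4) Composition
The rules can be read off from the second-derivative formula for $f = h \circ g$: $f''(x) = h''(g(x))\,g'(x)^2 + h'(g(x))\,g''(x)$
In general, twice differentiability is not needed; it is enough that the extended-value extension $\tilde h$ of $h$ is nondecreasing or nonincreasing.
Monotonicity of the extended-value extension restricts the domain of $h$: for convex $h$ with $\tilde h$ nondecreasing, $\mathbf{dom}\,h$ must extend to $-\infty$.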
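
As a worked reading of the scalar composition rule, assuming $h$ and $g$ are twice differentiable and $f = h \circ g$, the sign of each factor decides the sign of $f''$. This is only a sketch of the sign bookkeeping, not a proof of the general case:

```latex
\begin{aligned}
f''(x) &= \underbrace{h''(g(x))}_{\ge 0}\,\underbrace{g'(x)^2}_{\ge 0}
        + \underbrace{h'(g(x))}_{\ge 0}\,\underbrace{g''(x)}_{\ge 0} \;\ge\; 0
  &&\text{($h$ convex and nondecreasing, $g$ convex)} \\
f''(x) &= \underbrace{h''(g(x))}_{\ge 0}\,\underbrace{g'(x)^2}_{\ge 0}
        + \underbrace{h'(g(x))}_{\le 0}\,\underbrace{g''(x)}_{\le 0} \;\ge\; 0
  &&\text{($h$ convex and nonincreasing, $g$ concave)}
\end{aligned}
```

For instance, $h(z) = e^z$ is convex and nondecreasing, so $e^{g(x)}$ is convex whenever $g$ is convex; $h(z) = 1/z$ on $z > 0$ is convex and nonincreasing, so $1/g(x)$ is convex whenever $g$ is concave and positive.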

5) Minimization: if $f$ is convex in $(x, y)$ and $C$ is a nonempty convex set, then $g(x) = \inf_{y \in C} f(x, y)$ is convex (provided $g(x) > -\infty$).
Distance to a convex set
$g(x) = \inf\{h(y) \mid Ay = x\}$ with $h$ convex

6) Perspective of a function: $g(x, t) = t f(x/t)$ with $t > 0$; convexity can be proved via the epigraph.
$g(x, t) = x^T x / t$, from $f(x) = x^T x$
$g(x, t) = -t \log(x/t) = t \log t - t \log x$, from $f(x) = -\log x$
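
The perspective operation is simple to express as a higher-order function. The sketch below (the helper name `perspective` is an assumption) builds $g(x, t) = t f(x/t)$ for the scalar case $f(x) = x^2$, which gives $x^2/t$, and spot-checks joint convexity in $(x, t)$ on random pairs.

```python
import random

def perspective(f):
    """Return g(x, t) = t * f(x / t), defined for t > 0."""
    def g(x, t):
        if t <= 0:
            raise ValueError("perspective requires t > 0")
        return t * f(x / t)
    return g

g = perspective(lambda x: x * x)     # gives g(x, t) = x^2 / t

rng = random.Random(0)
worst = float("inf")
for _ in range(2000):
    x1, t1 = rng.uniform(-3, 3), rng.uniform(0.1, 3)
    x2, t2 = rng.uniform(-3, 3), rng.uniform(0.1, 3)
    th = rng.random()
    lhs = g(th * x1 + (1 - th) * x2, th * t1 + (1 - th) * t2)
    rhs = th * g(x1, t1) + (1 - th) * g(x2, t2)
    worst = min(worst, rhs - lhs)    # joint convexity predicts rhs - lhs >= 0

print(g(3.0, 2.0), worst)
```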

3.3 conjugate function

1) $f^*(y) = \sup_{x \in \mathbf{dom}\,f} (y^T x - f(x))$

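2) Affine, $f(x) = ax + b$: $f^*(y) = -b$ with $\mathbf{dom}\,f^* = \{a\}$
Negative logarithm, $f(x) = -\log x$: $f^*(y) = -\log(-y) - 1$ for $y < 0$
Exponential, $f(x) = e^x$: $f^*(y) = y \log y - y$ for $y \ge 0$
Negative entropy, $f(x) = x \log x$: $f^*(y) = e^{y - 1}$ for $y \in \mathbf{R}$
Inverse, $f(x) = 1/x$ on $\mathbf{R}_{++}$: $f^*(y) = -2(-y)^{1/2}$ for $y \le 0$
Strictly convex quadratic function, $f(x) = \frac{1}{2} x^T Q x$ with $Q \succ 0$: $f^*(y) = \frac{1}{2} y^T Q^{-1} y$
Log-determinant, $f(X) = \log\det X^{-1}$ on $\mathbf{S}^n_{++}$: $f^*(Y) = \log\det(-Y)^{-1} - n$ with $\mathbf{dom}\,f^* = -\mathbf{S}^n_{++}$
Indicator function of a set: its conjugate is the support function of the set.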
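
Some of the scalar closed forms can be compared with a brute-force evaluation of the supremum. The sketch below approximates $f^*(y) = \sup_x (yx - f(x))$ by a maximum over a uniform grid; the grid bounds, the number of grid points and the test value of $y$ are assumptions, and the grid maximum slightly underestimates the true supremum.

```python
import math

def conjugate_on_grid(f, y, lo, hi, steps=20001):
    """Approximate f*(y) = sup_x (y*x - f(x)) by a maximum over a uniform grid of [lo, hi]."""
    best = -math.inf
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        best = max(best, y * x - f(x))
    return best

y = 2.0
# f(x) = exp(x): closed form f*(y) = y log y - y for y > 0
print(conjugate_on_grid(math.exp, y, -10.0, 10.0), y * math.log(y) - y)
# f(x) = x log x on x > 0: closed form f*(y) = exp(y - 1)
print(conjugate_on_grid(lambda x: x * math.log(x), y, 1e-9, 20.0), math.exp(y - 1))
# f(x) = -log x on x > 0: closed form f*(y) = -log(-y) - 1 for y < 0
print(conjugate_on_grid(lambda x: -math.log(x), -y, 1e-9, 20.0), -math.log(y) - 1)
```

Each printed pair should agree to several decimal places, with the grid value never above the closed form by more than rounding error.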

3) Fenchel's inequality: $f(x) + f^*(y) \ge x^T y$

4) If $f$ is convex and closed, then $f^{**} = f$; without these assumptions the identity does not hold in general.

5) Scaling and composition with an affine transformation
Sums of independent functions: if $f(u, v) = f_1(u) + f_2(v)$, then $f^*(w, z) = f_1^*(w) + f_2^*(z)$
