These are notes on Chapters 2 and 3 and Appendix A of "Convex Optimization".
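1) Affine set: $C = V + x_0 = \{x + x_0 \mid x \in V\}$. An affine set can be viewed as a subspace $V$ shifted by a point $x_0$. Similar to
2) Affine dimension and relative interior: the dimension of a set is measured in its affine hull, and the relative interior is the interior taken relative to that affine hull.
3) Convex combination: can be generalized to the infinite case:
4) Cones: if $x \in C$, then $\theta x \in C$ for all $\theta > 0$ -> convex cone (a cone that is also convex)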
1) Euclidean balls and ellipsoids:
2) Norm cones: $\{(x,t) \mid \|x\| \le t\} \subset \mathbf{R}^{n+1}$
3) Polyhedra -> simplex (the convex hull of $k+1$ affinely independent points is a $k$-dimensional simplex),
such as the unit simplex $\operatorname{conv}\{0, e_1, \dots, e_n\}$ and the probability simplex $\operatorname{conv}\{e_1, \dots, e_n\}$.
Polyhedra can be represented in two ways: as a convex hull or as a set of linear inequalities.
4) Positive semidefinite cone $\mathbf{S}^n_+$
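As a quick numerical illustration of the norm cone in item 2, the following sketch checks that convex combinations and positive scalings of cone members stay in the cone. The helper names `in_norm_cone` and `combine`, the Euclidean norm, and the sample points are assumptions chosen for illustration; a finite check like this is a sanity test, not a proof.

```python
import math

def in_norm_cone(x, t, tol=1e-12):
    """Return True if (x, t) lies in the Euclidean norm cone {(x, t) : ||x||_2 <= t}."""
    return math.sqrt(sum(xi * xi for xi in x)) <= t + tol

def combine(p, q, theta):
    """Convex combination theta*p + (1 - theta)*q of two points p = (x, t), q = (x, t)."""
    (x1, t1), (x2, t2) = p, q
    x = [theta * a + (1 - theta) * b for a, b in zip(x1, x2)]
    return x, theta * t1 + (1 - theta) * t2

p = ([3.0, 4.0], 5.0)    # on the boundary of the cone: ||(3, 4)||_2 = 5
q = ([-1.0, 0.5], 2.0)   # strictly inside the cone

# Points on the segment between p and q (theta = 0, 0.1, ..., 1).
mixes = [combine(p, q, k / 10) for k in range(11)]
all_inside = all(in_norm_cone(x, t) for x, t in mixes)

# Positive scaling of a cone member stays in the cone.
scaled_inside = in_norm_cone([2.5 * v for v in p[0]], 2.5 * p[1])
```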
1) Intersection
Positive semidefinite cone: $\mathbf{S}^n_+ = \bigcap_{z \ne 0} \{X \in \mathbf{S}^n \mid z^T X z \ge 0\}$
$S = \{x \in \mathbf{R}^m \mid |p(t)| \le 1 \text{ for } |t| \le \pi/3\}$, where $p(t) = \sum_{k=1}^m x_k \cos kt$
Every closed convex set can be expressed as the intersection of (possibly infinitely many) halfspaces.
2) Affine functions: the image and the inverse image of a convex set under an affine function are convex.
Polyhedra
Solution set of a linear matrix inequality: $A(x) = x_1 A_1 + \dots + x_n A_n \preceq B$
Hyperbolic cone: $\{x \mid x^T P x \le (c^T x)^2,\ c^T x \ge 0\}$ with $P \in \mathbf{S}^n_+$ is the inverse image of the second-order cone $\{(z,t) \mid z^T z \le t^2,\ t \ge 0\}$ under the affine function $f(x) = (P^{1/2} x, c^T x)$.
3) Perspective functions
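$P(z,t) = z/t$ with $\operatorname{dom} P = \mathbf{R}^n \times \mathbf{R}_{++}$. The image and the inverse image of a convex set under this function are convex.
Conditional probability: the joint probabilities lie on the probability simplex, and conditioning only divides by a partial sum, so it can be viewed as a linear-fractional function. The conditional probabilities obtained from a convex set of joint distributions therefore also form a convex set.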
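The conditional-probability remark can be checked numerically. The sketch below uses two small hypothetical joint distributions `p` and `q` (rows indexed by $i$, columns by $j$) and verifies that conditioning a mixture of the two on column $j$ gives a point on the segment between the two conditionals, which is the behaviour expected of a linear-fractional map. The function names and the numbers are illustrative assumptions.

```python
def conditional(joint, j):
    """Condition on column j: f_i = p_ij / sum_k p_kj."""
    col = [row[j] for row in joint]
    total = sum(col)
    return [v / total for v in col]

def column_mass(joint, j):
    """Marginal probability of column j."""
    return sum(row[j] for row in joint)

def mix(p, q, theta):
    """Entrywise convex combination theta*p + (1 - theta)*q of two joint tables."""
    return [[theta * a + (1 - theta) * b for a, b in zip(rp, rq)]
            for rp, rq in zip(p, q)]

p = [[0.10, 0.20], [0.30, 0.40]]   # hypothetical joint distribution
q = [[0.25, 0.25], [0.05, 0.45]]   # another hypothetical joint distribution
theta, j = 0.3, 0

# Conditional of the mixture.
lhs = conditional(mix(p, q, theta), j)

# The same point written as a convex combination of the two conditionals.
pj, qj = column_mass(p, j), column_mass(q, j)
mu = theta * pj / (theta * pj + (1 - theta) * qj)
rhs = [mu * a + (1 - mu) * b
       for a, b in zip(conditional(p, j), conditional(q, j))]
```

Note that the mixing weight changes from `theta` to `mu`: segments are mapped to segments, but not linearly.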
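1) Separating hyperplane theorem: any two disjoint nonempty convex sets can be separated by a hyperplane.
Proof sketch (for the case where the distance between the sets is attained): take the closest pair of points, one in each set, and the hyperplane through the midpoint of the segment joining them, perpendicular to that segment; it separates the two sets. By contradiction, if it did not (some point of a set gives the affine function $a^T x - b$ the wrong sign), then moving from the closest point toward that point would give a strictly closer point of the convex set; the sign condition is exactly the sign of the derivative of the Euclidean distance along that direction.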
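2) Strict separation:
Two disjoint convex sets cannot always be strictly separated.
A closed convex set and a point outside it can be strictly separated, which shows that every closed convex set is the intersection of all halfspaces that contain it.
3) Converse: for two convex sets, at least one of which is open, the existence of a separating hyperplane implies that they are disjoint.
4) The supporting hyperplane theorem can be proved by separating $\operatorname{int} C$ from the point $P$.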
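The construction in the proof of item 1 (closest points, midpoint, perpendicular hyperplane) can be sketched for the simple case of two disjoint Euclidean balls, where the closest points are known in closed form. The function names, the ball parameters, and the rejection sampler are assumptions made for this illustration.

```python
import math
import random

def separating_hyperplane(c1, r1, c2, r2):
    """Return (a, b) with a^T x < b on ball 1 and a^T x > b on ball 2."""
    diff = [v - u for u, v in zip(c1, c2)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist <= r1 + r2:
        raise ValueError("the balls are not disjoint")
    unit = [d / dist for d in diff]
    near1 = [u + r1 * e for u, e in zip(c1, unit)]   # point of ball 1 closest to ball 2
    near2 = [v - r2 * e for v, e in zip(c2, unit)]   # point of ball 2 closest to ball 1
    mid = [(s + t) / 2 for s, t in zip(near1, near2)]
    a = [t - s for s, t in zip(near1, near2)]        # normal: direction of the connecting segment
    b = sum(ai * mi for ai, mi in zip(a, mid))       # the hyperplane passes through the midpoint
    return a, b

def sample_ball(c, r, rng):
    """Rejection-sample one point from the ball with center c and radius r."""
    while True:
        offset = [rng.uniform(-r, r) for _ in c]
        if sum(o * o for o in offset) <= r * r:
            return [ci + oi for ci, oi in zip(c, offset)]

rng = random.Random(0)
c1, r1, c2, r2 = [0.0, 0.0], 1.0, [3.0, 1.0], 1.5
a, b = separating_hyperplane(c1, r1, c2, r2)

# Evaluate the affine function a^T x - b on samples from each ball.
side1 = [sum(ai * xi for ai, xi in zip(a, sample_ball(c1, r1, rng))) - b
         for _ in range(500)]
side2 = [sum(ai * xi for ai, xi in zip(a, sample_ball(c2, r2, rng))) - b
         for _ in range(500)]
```

Every value in `side1` should be negative and every value in `side2` positive, matching the sign argument in the proof.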
1) Norms
Vector norms: $P$-quadratic norm $\|x\|_P = (x^T P x)^{1/2}$
Matrix norms:
sum-absolute-value / maximum-absolute-value
operator norms: $\|X\|_{a,b} = \sup\{\|Xu\|_a \mid \|u\|_b \le 1\}$
Norms induced as operator norms: $\ell_2$ gives the spectral norm (the largest singular value), $\ell_1$ gives the max-column-sum norm, and $\ell_\infty$ gives the max-row-sum norm.
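2) Equivalence of norms: every norm on $\mathbf{R}^n$ is equivalent to some quadratic norm, with $\|x\|_P \le \|x\| \le \sqrt{n}\,\|x\|_P$
3) Dual norm: $\|z\|_* = \sup\{z^T x \mid \|x\| \le 1\}$
$z^T x \le \|x\|\,\|z\|_*$
The $\ell_2$ norm is its own dual, $\ell_1$ and $\ell_\infty$ are dual to each other, and $\ell_p$ and $\ell_q$ are dual when $1/p + 1/q = 1$.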
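A small numerical sketch of two of the facts above: the operator norms induced by $\ell_1$ and $\ell_\infty$ equal the maximum column sum and the maximum row sum, and $z^T x \le \|x\|_1 \|z\|_\infty$ for the dual pair $\ell_1$/$\ell_\infty$. The matrix `X`, the helper names, and the random sampling are assumptions chosen for illustration.

```python
import random

def norm1(v):
    """l1 norm of a vector."""
    return sum(abs(t) for t in v)

def norm_inf(v):
    """l-infinity norm of a vector."""
    return max(abs(t) for t in v)

def matvec(X, u):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, u)) for row in X]

X = [[1.0, -2.0, 0.5],
     [3.0, 0.0, -2.0]]
max_col_sum = max(sum(abs(row[j]) for row in X) for j in range(3))
max_row_sum = max(sum(abs(v) for v in row) for row in X)

rng = random.Random(1)
ratios_1, ratios_inf, dual_gaps = [], [], []
for _ in range(2000):
    u = [rng.uniform(-1, 1) for _ in range(3)]
    z = [rng.uniform(-1, 1) for _ in range(3)]
    Xu = matvec(X, u)
    ratios_1.append(norm1(Xu) / norm1(u))          # bounded by the max column sum
    ratios_inf.append(norm_inf(Xu) / norm_inf(u))  # bounded by the max row sum
    # Gap in z^T u <= ||u||_1 * ||z||_inf; should never be negative.
    dual_gaps.append(norm1(u) * norm_inf(z) - sum(a * b for a, b in zip(z, u)))

# The suprema are attained at a unit vector and at a sign vector, respectively.
attained_1 = norm1(matvec(X, [1.0, 0.0, 0.0]))
attained_inf = norm_inf(matvec(X, [1.0, 1.0, -1.0]))
```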
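4) Definitions of closed and open sets and of the boundary
5) Closed function: all sublevel sets $\{x \in \operatorname{dom} f \mid f(x) \le \alpha\}$ are closed sets.
If $f$ is continuous and $\operatorname{dom} f$ is closed, then $f$ is closed.
If $f$ is continuous and $\operatorname{dom} f$ is open, then $f$ is closed if and only if $f$ tends to $\infty$ along every sequence converging to a boundary point of $\operatorname{dom} f$.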
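6) $\log\det(X + \Delta X) = \log\det X + \log\det(I + X^{-1/2}\Delta X X^{-1/2}) = \log\det X + \sum_{i=1}^n \log(1+\lambda_i)$, where $\lambda_i$ are the eigenvalues of $X^{-1/2}\Delta X X^{-1/2}$.
To first order, $\log(1+\lambda_i) \approx \lambda_i$ and $\sum_i \lambda_i = \operatorname{tr}(X^{-1}\Delta X)$, so $\nabla\log\det X = X^{-1}$.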
7) Condition number: $\operatorname{cond}(A) = \|A\|_2\,\|A^{-1}\|_2 = \sigma_{\max}(A)/\sigma_{\min}(A)$
8) Pseudo-inverse:
$A^\dagger b$ is the minimum-norm solution of minimize $\|Ax - b\|_2^2$
minimum of a general quadratic function
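9) Schur complement: $X = \begin{bmatrix} A & B \\ B^T & C \end{bmatrix}$, $S = C - B^T A^{-1} B$
$\det X = \det A \det S$
The inverse of $X$ can be expressed in terms of the inverse of $S$.
$\inf_u \begin{bmatrix} u \\ v \end{bmatrix}^T \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = v^T S v$ (for $A \succ 0$)
$X \succ 0 \iff A \succ 0$ and $S \succ 0$; if $A \succ 0$, then $X \succeq 0 \iff S \succeq 0$.
When $A$ is singular, the Schur complement can be expressed using the pseudo-inverse of $A$.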
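The Schur complement identities in item 9 can be checked on a small hand-sized example. The sketch below assumes a $2 \times 2$ block $A \succ 0$, a single-column $B$, and a scalar $C$, so that everything can be written with closed-form $2 \times 2$ and $3 \times 3$ formulas; the numbers and helper names are illustrative assumptions.

```python
def inv2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[4.0, 1.0], [1.0, 3.0]]   # positive definite block
B = [2.0, -1.0]                # a single column
C = 5.0                        # scalar block

Ainv = inv2(A)
AinvB = [sum(Ainv[i][k] * B[k] for k in range(2)) for i in range(2)]
S = C - sum(B[i] * AinvB[i] for i in range(2))   # S = C - B^T A^{-1} B

X = [[A[0][0], A[0][1], B[0]],
     [A[1][0], A[1][1], B[1]],
     [B[0],    B[1],    C]]
detA = A[0][0] * A[1][1] - A[0][1] * A[1][0]

def quad(u, v):
    """Quadratic form [u; v]^T X [u; v]."""
    w = [u[0], u[1], v]
    return sum(w[i] * X[i][j] * w[j] for i in range(3) for j in range(3))

v = 1.7
u_star = [-t * v for t in AinvB]                 # minimizer u = -A^{-1} B v
perturbed = quad([u_star[0] + 0.1, u_star[1] - 0.2], v)
```

Here `det3(X)` should agree with `detA * S`, and `quad(u_star, v)` with `v * v * S`, while any perturbation of `u_star` increases the quadratic form.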
1) A function is convex if and only if its restriction to every line is convex; extended-value extension
2) First-order condition: $f(y) \ge f(x) + \nabla f(x)^T (y - x)$
3) Second-order condition: $\nabla^2 f(x) \succeq 0$
4) Sublevel sets of a convex function are convex sets; the converse is not true.
5) The epigraph is convex $\iff$ the function is convex.
The supporting hyperplane of the epigraph at $(x, f(x))$ has normal $(\nabla f(x), -1)$.
6) $f(\theta x + (1-\theta) y) \le \theta f(x) + (1-\theta) f(y)$ generalizes to $f(\mathbf{E}\,x) \le \mathbf{E}\,f(x)$, known as Jensen's inequality.
It can be used to prove:
$\sqrt{ab} \le (a+b)/2$ for $a, b \ge 0$
Hölder's inequality: $\sum_{i=1}^n x_i y_i \le \left(\sum_{i=1}^n |x_i|^p\right)^{1/p} \left(\sum_{i=1}^n |y_i|^q\right)^{1/q}$, where $1/p + 1/q = 1$
7) Examples
$f(x,y) = x^2/y$ with $y > 0$
Log-sum-exp: $f(x) = \log(e^{x_1} + \dots + e^{x_n})$; convexity follows by computing the Hessian and applying the Cauchy-Schwarz inequality.
Geometric mean: $f(x) = \left(\prod_{i=1}^n x_i\right)^{1/n}$; computing the Hessian and applying the Cauchy-Schwarz inequality again shows that it is concave.
Log-determinant: $f(X) = \log\det X$, $\operatorname{dom} f = \mathbf{S}^n_{++}$; concavity follows by restricting to a line and differentiating.
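The convexity of log-sum-exp and the concavity of the geometric mean can be sanity-checked with the midpoint form of the defining inequality. This sketch samples random pairs of points and records the gap $\tfrac12(f(x)+f(y)) - f(\tfrac12(x+y))$, which should be nonnegative for a convex function and nonpositive for a concave one. The sampling ranges and helper names are assumptions, and random sampling only illustrates the inequality.

```python
import math
import random

def log_sum_exp(x):
    """Numerically stable log(exp(x_1) + ... + exp(x_n))."""
    m = max(x)
    return m + math.log(sum(math.exp(v - m) for v in x))

def geometric_mean(x):
    """Geometric mean of positive numbers, computed through logarithms."""
    return math.exp(sum(math.log(v) for v in x) / len(x))

def midpoint_gap(f, x, y):
    """(f(x) + f(y)) / 2 - f((x + y) / 2): >= 0 if f is convex, <= 0 if concave."""
    mid = [(a + b) / 2 for a, b in zip(x, y)]
    return (f(x) + f(y)) / 2 - f(mid)

rng = random.Random(2)
lse_gaps, gm_gaps = [], []
for _ in range(1000):
    x = [rng.uniform(-5, 5) for _ in range(4)]
    y = [rng.uniform(-5, 5) for _ in range(4)]
    lse_gaps.append(midpoint_gap(log_sum_exp, x, y))
    xp = [rng.uniform(0.1, 5) for _ in range(4)]   # geometric mean needs positive entries
    yp = [rng.uniform(0.1, 5) for _ in range(4)]
    gm_gaps.append(midpoint_gap(geometric_mean, xp, yp))
```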
1) Nonnegative weighted sum -> extends to infinite sums
2) Composition with an affine mapping: $f(Ax + b)$
3) Pointwise maximum: $f(x) = \max\{f_1(x), \dots, f_n(x)\}$ -> over an infinite set: $g(x) = \sup_y f(x,y)$, where $f(x,y)$ is convex in $x$ for each fixed $y$
Sum of the $r$ largest components
Support function of a set (any set, not necessarily convex): $f(x) = \sup\{x^T y \mid y \in C\}$
Distance to the farthest point of a set: $f(x) = \sup_{y \in C} \|x - y\|$
Maximum eigenvalue of a symmetric matrix: $f(X) = \sup\{y^T X y \mid \|y\|_2 = 1\}$
Operator norm: see section 2, background
Every convex function with $\operatorname{dom} f = \mathbf{R}^n$ is the supremum of all its affine underestimators (take a supporting hyperplane at every point).
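The "sum of the $r$ largest components" example can be made concrete: it equals the maximum of the linear functions $x_{i_1} + \dots + x_{i_r}$ over all index subsets of size $r$, which is why it is convex. The sketch below compares the sorting-based value with the brute-force maximum over subsets; the function names and test vectors are illustrative assumptions.

```python
import random
from itertools import combinations

def sum_r_largest(x, r):
    """Sum of the r largest components, computed by sorting."""
    return sum(sorted(x, reverse=True)[:r])

def max_over_subsets(x, r):
    """The same value as a pointwise maximum of linear functions x -> sum_{i in I} x_i."""
    return max(sum(x[i] for i in idx) for idx in combinations(range(len(x)), r))

x0 = [3.0, -1.0, 4.0, 1.5]
by_sorting = sum_r_largest(x0, 2)
by_maximum = max_over_subsets(x0, 2)

# Midpoint convexity check on random pairs of vectors.
rng = random.Random(3)
gaps = []
for _ in range(500):
    x = [rng.uniform(-3, 3) for _ in range(5)]
    y = [rng.uniform(-3, 3) for _ in range(5)]
    mid = [(a + b) / 2 for a, b in zip(x, y)]
    gaps.append((sum_r_largest(x, 3) + sum_r_largest(y, 3)) / 2 - sum_r_largest(mid, 3))
```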
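4) Composition
For $f(x) = h(g(x))$ the rules follow from the second derivative: $f''(x) = h''(g(x))\,g'(x)^2 + h'(g(x))\,g''(x)$
The general result does not require twice differentiability; it only requires the extended-value extension $\tilde h$ of $h$ to be nondecreasing or nonincreasing.
Monotonicity of the extended-value extension constrains the domain of $h$: the domain must extend to infinity in one direction.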
5) Minimization: if $f$ is convex in $(x,y)$ and $C$ is a nonempty convex set, then $g(x) = \inf_{y \in C} f(x,y)$ is convex
Distance to a convex set
$g(x) = \inf\{h(y) \mid Ay = x\}$
6) Perspective of a function: $g(x,t) = t f(x/t)$ with $t > 0$; convexity can be proved via the epigraph
$g(x,t) = x^T x / t$
$g(x,t) = -t\log(x/t) = t\log t - t\log x$
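The two perspective examples above can be reproduced with a small helper that builds $g(x,t) = t f(x/t)$ from $f$. The helper name `perspective`, the sample points, and the random midpoint check of joint convexity in $(x,t)$ are assumptions made for illustration.

```python
import math
import random

def perspective(f):
    """Return g(x, t) = t * f(x / t), defined for t > 0."""
    def g(x, t):
        if t <= 0:
            raise ValueError("the perspective is only defined for t > 0")
        return t * f([xi / t for xi in x])
    return g

def sq_norm(x):
    """f(x) = x^T x."""
    return sum(v * v for v in x)

g = perspective(sq_norm)                        # g(x, t) = x^T x / t
h = perspective(lambda x: -math.log(x[0]))      # h(x, t) = -t log(x / t), scalar x > 0

value_g = g([3.0, 4.0], 2.0)                    # (9 + 16) / 2
value_h = h([2.0], 3.0)
closed_form_h = 3.0 * math.log(3.0) - 3.0 * math.log(2.0)   # t log t - t log x

# Midpoint check of joint convexity of g in (x, t).
rng = random.Random(4)
gaps = []
for _ in range(1000):
    x1 = [rng.uniform(-3, 3) for _ in range(2)]
    x2 = [rng.uniform(-3, 3) for _ in range(2)]
    t1, t2 = rng.uniform(0.1, 5), rng.uniform(0.1, 5)
    xm = [(a + b) / 2 for a, b in zip(x1, x2)]
    gaps.append((g(x1, t1) + g(x2, t2)) / 2 - g(xm, (t1 + t2) / 2))
```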
1) $f^*(y) = \sup_{x \in \operatorname{dom} f} (y^T x - f(x))$
2) Affine: $f(x) = ax + b$, $f^*(y) = -b$ with $\operatorname{dom} f^* = \{a\}$
Negative logarithm: $f(x) = -\log x$, $f^*(y) = -\log(-y) - 1$ for $y < 0$
Exponential: $f(x) = e^x$, $f^*(y) = y\log y - y$ for $y \ge 0$
Negative entropy: $f(x) = x\log x$, $f^*(y) = e^{y-1}$ for $y \in \mathbf{R}$
Inverse: $f(x) = 1/x$ on $\mathbf{R}_{++}$, $f^*(y) = -2(-y)^{1/2}$ for $y \le 0$
Strictly convex quadratic function: $f(x) = \frac{1}{2} x^T Q x$ with $Q \succ 0$, $f^*(y) = \frac{1}{2} y^T Q^{-1} y$
Log-determinant: $f(X) = \log\det X^{-1}$ on $\mathbf{S}^n_{++}$, $f^*(Y) = \log\det(-Y)^{-1} - n$
Indicator function: its conjugate is the support function of the set
3) Fenchel's inequality: $f(x) + f^*(y) \ge x^T y$
4) If $f$ is convex and closed, then $f^{**} = f$; without this assumption the identity does not hold in general.
5) Scaling and affine transformation
Sum of independent functions: if $f(u,v) = f_1(u) + f_2(v)$, then $f^*(w,z) = f_1^*(w) + f_2^*(z)$
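Some of the scalar conjugates listed in item 2 can be checked numerically by approximating the supremum in the definition with a grid search. The `conjugate` helper, the grid bounds, and the test values of $y$ are assumptions of this sketch; the grid only approximates the supremum, so the comparison uses a tolerance.

```python
import math

def conjugate(f, y, lo, hi, n=200_001):
    """Approximate f*(y) = sup_x (y*x - f(x)) by a grid search over [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return max(y * x - f(x) for x in (lo + k * step for k in range(n)))

def neg_entropy(x):
    """f(x) = x log x for x > 0."""
    return x * math.log(x)

def neg_log(x):
    """f(x) = -log x for x > 0."""
    return -math.log(x)

# Negative entropy: f*(y) = exp(y - 1).
approx_entropy = conjugate(neg_entropy, 0.5, 1e-9, 10.0)
exact_entropy = math.exp(0.5 - 1.0)

# Negative logarithm: f*(y) = -log(-y) - 1 for y < 0.
approx_log = conjugate(neg_log, -2.0, 1e-9, 10.0)
exact_log = -math.log(2.0) - 1.0

# Exponential: f*(y) = y log y - y for y > 0.
approx_exp = conjugate(math.exp, 2.0, -10.0, 10.0)
exact_exp = 2.0 * math.log(2.0) - 2.0

# Fenchel's inequality f(x) + f*(y) >= x*y for negative entropy, on a small grid.
fenchel_gaps = [neg_entropy(x) + math.exp(y - 1.0) - x * y
                for x in (0.1 * i for i in range(1, 50))
                for y in (-2.0 + 0.1 * j for j in range(50))]
```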