Convex and Non-convex Optimization Problems in Machine Learning

Question (145): Among the optimization problems that arise in machine learning, which are convex and which are non-convex? Give one example of each.

  • Definition of convex optimization

  • Convex optimization problems

  • Non-convex optimization problems

  • Definition of convex optimization: the defining formula (given below this list), geometric insight

  • Convex optimization problem: logistic regression; convexity verified via the positive semi-definiteness of the Hessian matrix; a local optimum is equivalent to a global optimum

  • Non-convex optimization problem: PCA; how PCA is solved
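The defining formula mentioned above is the standard one: a function $f$ is convex if

$$
f\bigl(\lambda x + (1-\lambda)\, y\bigr) \le \lambda f(x) + (1-\lambda) f(y), \qquad \forall\, x, y \in \operatorname{dom} f,\ \lambda \in [0,1],
$$

and a convex optimization problem minimises a convex objective over a convex feasible set. Geometrically, the chord between any two points on the graph of $f$ lies on or above the graph, which is why any local minimum must also be a global minimum.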

Convex Optimization Problems

Logistic Regression

$$
L_i(\theta) = \log\bigl(1+\exp(-y_i \theta^T x_i)\bigr)
$$

Derivation of the loss function
Logistic regression model:

$$
\log \frac{p}{1-p}=\theta^T x \;\Rightarrow\; p = \frac{\exp(\theta^T x)}{1+\exp(\theta^T x)}
$$

Maximising the likelihood is equivalent to minimising the negative log-likelihood:

$$
\max\ \text{likelihood} \;\Longleftrightarrow\; \min\bigl(-\log \text{likelihood}\bigr) =: \min L(x,y;\theta)
$$

$$
\begin{aligned}
L &= - \bigl(y \log p + (1-y) \log (1-p)\bigr) \\
&= - y \log \frac{1}{1+\exp(-\theta^T x)} - (1-y) \log \frac{1}{1+\exp(\theta^T x)}\\
&= y \log \bigl(1+\exp(-\theta^T x)\bigr) + (1-y) \log \bigl(1+\exp(\theta^T x)\bigr)\\
&= \log \bigl(1+\exp(-\tilde{y}\, \theta^T x)\bigr),
\end{aligned}
$$

where $y \in \{0,1\}$, $p = P(Y=1 \mid X=x)$, and the last line rewrites the label as $\tilde{y} = 2y-1 \in \{-1,+1\}$, matching the per-sample loss $L_i(\theta)$ stated above.
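Convexity can be checked via the Hessian, as noted in the outline: writing $\sigma(z) = 1/(1+e^{-z})$ and using labels $y_i \in \{-1,+1\}$, a direct computation gives

$$
\nabla_\theta L_i(\theta) = -\bigl(1-\sigma(y_i \theta^T x_i)\bigr)\, y_i x_i,
\qquad
\nabla_\theta^2 L_i(\theta) = \sigma(y_i \theta^T x_i)\bigl(1-\sigma(y_i \theta^T x_i)\bigr)\, x_i x_i^T \succeq 0,
$$

since $y_i^2 = 1$, $\sigma(1-\sigma) \ge 0$, and $x_i x_i^T$ is positive semi-definite. The total loss $\sum_i L_i(\theta)$ is a sum of convex functions and therefore convex, so every local optimum is a global optimum.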

Other examples: SVM, linear regression

Non-convex Optimization Problems

PCA

$$
\min_{V:\,VV^T = I_k} L(V)= \bigl\| X-V^T V X\bigr\|_F^2
$$

(minimise the reconstruction error)

Formulation from the perspective of maximising the variance:
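For centred data $X$ (each feature has zero mean), the same principal directions are obtained by maximising the variance retained by the projection; a standard statement of this formulation is

$$
\max_{V:\,VV^T = I_k} \ \bigl\| V X \bigr\|_F^2 = \operatorname{tr}\bigl(V X X^T V^T\bigr),
$$

which is equivalent to the reconstruction-error form above because $\| X - V^T V X \|_F^2 = \|X\|_F^2 - \|VX\|_F^2$ when $VV^T = I_k$.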

Verifying that this objective is non-convex: check it against the definition.
If $V^\ast$ is a minimizer, then $-V^\ast$ is also a minimizer, since $L(V^\ast)=L(-V^\ast)$. However,

$$
\begin{aligned}
L\Bigl(\tfrac{1}{2} V^\ast + \tfrac{1}{2} (-V^\ast) \Bigr)=L(0)&=\|X\|_F^2 \\
&> \bigl\| X-V^{\ast T} V^\ast X\bigr\|_F^2=\tfrac{1}{2} L(V^\ast) + \tfrac{1}{2} L(-V^\ast),
\end{aligned}
$$

so the midpoint condition for convexity fails and $L$ is non-convex.
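A quick numerical confirmation of this midpoint argument (a minimal NumPy sketch; the dimensions and random data are arbitrary illustration choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 5, 100, 2                     # feature dim, samples, target dim (arbitrary)
X = rng.standard_normal((d, n))

def L(V):
    """Reconstruction error ||X - V^T V X||_F^2."""
    return np.linalg.norm(X - V.T @ V @ X, ord="fro") ** 2

# V*: rows are the top-k left singular vectors of X (the PCA solution).
U, _, _ = np.linalg.svd(X, full_matrices=False)
V_star = U[:, :k].T

lhs = L(0.5 * V_star + 0.5 * (-V_star))   # = L(0) = ||X||_F^2
rhs = 0.5 * L(V_star) + 0.5 * L(-V_star)  # = L(V*)
print(lhs > rhs)                          # True: the convexity inequality is violated
```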

Solution: SVD
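A minimal sketch of the SVD-based solution (the helper name `pca_svd` is an illustrative choice, not a library function):

```python
import numpy as np

def pca_svd(X, k):
    """PCA via SVD. X has shape (d, n), one sample per column.
    Returns V of shape (k, d) whose rows are the top-k principal directions."""
    Xc = X - X.mean(axis=1, keepdims=True)        # centre each feature
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k].T                              # top-k left singular vectors

# Usage: project the centred data onto the principal directions.
X = np.random.default_rng(1).standard_normal((5, 100))
V = pca_svd(X, k=2)
Z = V @ (X - X.mean(axis=1, keepdims=True))        # (k, n) low-dimensional representation
```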

Other examples: low-rank models (e.g. matrix decomposition), deep neural networks

References:

  1. 《百面机器学习》
