Question 145: Among the optimization problems in machine learning, which are convex and which are non-convex? Give an example of each.
Definition of convex optimization
Convex optimization problems
Non-convex optimization problems
Definition of convex optimization: the defining inequality and its geometric meaning (restated below)
Convex optimization problems: logistic regression; convexity verified via the positive semi-definiteness of the Hessian matrix; a local optimum is therefore also a global optimum
Non-convex optimization problems: PCA; how PCA is solved despite non-convexity
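For reference, the definition used throughout this answer: a function $L$ is convex if its domain is a convex set and, for all points $x, y$ in the domain and any $\lambda \in [0,1]$,

$$L\bigl(\lambda x + (1-\lambda) y\bigr) \le \lambda L(x) + (1-\lambda) L(y).$$

Geometrically, the chord joining any two points on the graph of $L$ never dips below the graph.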
$$L_i(\theta) = \log\bigl(1+\exp(-y_i\,\theta^T x_i)\bigr)$$
Derivation of the loss function
logistic regression model:
$$\log \frac{p}{1-p} = \theta^T x \;\Rightarrow\; p = \frac{\exp(\theta^T x)}{1+\exp(\theta^T x)}$$
Maximising the likelihood is equivalent to minimising the negative log-likelihood, which we take as the loss:

$$\max_\theta\,\text{likelihood} \;\Leftrightarrow\; \min_\theta\,\bigl(-\log \text{likelihood}\bigr) =: \min_\theta L(x, y; \theta)$$
$$\begin{aligned} L &= -\bigl(y \log p + (1-y) \log (1-p)\bigr) \\ &= -y \log \frac{1}{1+\exp(-\theta^T x)} - (1-y) \log \frac{1}{1+\exp(\theta^T x)} \\ &= y \log \bigl(1+\exp(-\theta^T x)\bigr) + (1-y) \log \bigl(1+\exp(\theta^T x)\bigr), \end{aligned}$$

where $y \in \{0,1\}$ and $p = P(Y=1 \mid X=x)$. Substituting the signed label $\tilde y = 2y - 1 \in \{-1,+1\}$ collapses the two cases into the single expression $L = \log\bigl(1+\exp(-\tilde y\,\theta^T x)\bigr)$, which matches $L_i(\theta)$ above. (The compact form requires $\pm 1$ labels; plugging $y = 0$ into it directly would give the constant $\log 2$ rather than the correct loss.)
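The convexity claim can be checked with the Hessian test from the outline. For a single sample with signed label $\tilde y \in \{-1,+1\}$ and sigmoid $\sigma(t) = 1/(1+e^{-t})$, differentiating $L(\theta) = \log\bigl(1+\exp(-\tilde y\,\theta^T x)\bigr)$ twice gives

$$\nabla^2_\theta L = \sigma(\tilde y\,\theta^T x)\bigl(1-\sigma(\tilde y\,\theta^T x)\bigr)\,x x^T \succeq 0,$$

a non-negative multiple of the positive semi-definite rank-one matrix $x x^T$. Summing over samples preserves positive semi-definiteness, so the full loss is convex and every local minimum is global.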
Other convex examples: SVM, linear regression.
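For instance, the least-squares objective of linear regression, with design matrix $X$, passes the same Hessian test:

$$L(\theta) = \|X\theta - y\|_2^2, \qquad \nabla^2_\theta L = 2\,X^T X \succeq 0.$$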
$$\min_{V V^T = I_k} L(V) = \|X - V^T V X\|_F^2,$$

where the rows of $V \in \mathbb{R}^{k \times d}$ are orthonormal projection directions and the columns of $X \in \mathbb{R}^{d \times n}$ are the centred data points
(minimise the reconstruction error)
An equivalent formulation maximises the projected variance, as shown below.
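The two views agree because, for centred data and $V V^T = I_k$, the total energy splits into retained and discarded parts:

$$\|X\|_F^2 = \|V X\|_F^2 + \|X - V^T V X\|_F^2 \;\Rightarrow\; \min_{V V^T = I_k} \|X - V^T V X\|_F^2 \;\Leftrightarrow\; \max_{V V^T = I_k} \|V X\|_F^2,$$

and $\|V X\|_F^2 = \operatorname{tr}(V X X^T V^T)$ is, up to a factor of $n$, the variance retained by the projection.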
Verifying that this objective is non-convex: check it against the definition.
If $V^\ast$ is a minimiser, then $-V^\ast$ is also a minimiser, since $L(V^\ast) = L(-V^\ast)$.
$$\begin{aligned} L\Bigl(\tfrac{1}{2} V^\ast + \tfrac{1}{2}(-V^\ast)\Bigr) = L(0) &= \|X\|_F^2 \\ &> \|X - V^{\ast T} V^\ast X\|_F^2 = \tfrac{1}{2} L(V^\ast) + \tfrac{1}{2} L(-V^\ast), \end{aligned}$$

where the inequality is strict whenever $V^\ast X \neq 0$. The midpoint of two minimisers thus has a strictly larger objective value, violating the convexity definition.
How to solve it: SVD. Despite the non-convexity, PCA has a closed-form solution; the top-$k$ left singular vectors of the centred $X$ give the optimal $V^\ast$.
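A minimal numerical sketch of both points, assuming NumPy and the $d \times n$ centred data layout above; the helper names (`pca_directions`, `recon_error`) are illustrative, not from the original text:

```python
import numpy as np

def recon_error(V, X):
    """Reconstruction error L(V) = ||X - V^T V X||_F^2."""
    return np.linalg.norm(X - V.T @ V @ X, ord="fro") ** 2

def pca_directions(X, k):
    """Top-k principal directions via SVD; rows of the result are orthonormal."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)  # X = U S W^T
    return U[:, :k].T  # shape (k, d)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 100))
X -= X.mean(axis=1, keepdims=True)  # centre the data

V_star = pca_directions(X, k=2)

# Non-convexity check: the midpoint of V* and -V* is the zero matrix,
# whose error ||X||_F^2 exceeds the average of the two endpoint values.
mid = 0.5 * V_star + 0.5 * (-V_star)
assert recon_error(mid, X) > 0.5 * (recon_error(V_star, X) + recon_error(-V_star, X))
```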
Other non-convex examples: low-rank models (e.g. matrix factorization), deep neural networks.