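When optimizing linear models we often run into L1 regularization. Because the L1 norm is not differentiable at zero, finding the optimal solution is not straightforward.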
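Consider the simplest linear model, with squared error as the loss function. The optimization objective is then:
$$\min_w \sum_{i=1}^m (y_i - w^T x_i)^2 \tag{1}$$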
With L1 regularization it becomes:
$$\min_w \sum_{i=1}^m (y_i - w^T x_i)^2 + \lambda\|w\|_1 \tag{2}$$
This objective is also known as LASSO regression.
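To make the objective concrete, here is a minimal NumPy sketch that evaluates it. It assumes the loss is the plain sum of squared residuals plus $\lambda$ times the L1 norm of the weights, with no extra scaling factor; the function name `lasso_objective` and the toy data are made up for illustration.

```python
import numpy as np

def lasso_objective(w, X, y, lam):
    """Squared-error loss plus an L1 penalty on the weights (assumed form)."""
    residual = y - X @ w                      # per-sample errors y_i - w^T x_i
    return np.sum(residual ** 2) + lam * np.sum(np.abs(w))

# Tiny made-up data set: y is reproduced exactly by w = (1, 2).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 2.0])

# The residual is zero here, so only the penalty 0.5 * (|1| + |2|) remains.
print(lasso_objective(w, X, y, lam=0.5))  # 1.5
```

Even with a perfect fit the objective is nonzero: the penalty term always pulls the weights toward zero.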
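Treat (1) as a function of $w$. In the derivation below this function is written $f(x)$, with $x$ standing for the optimization variable.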
Suppose $f(x)$ is differentiable, $\nabla f$ is itself differentiable, and $\nabla f$ satisfies an $L$-Lipschitz condition, i.e. there exists a constant $L>0$ such that
$$\|\nabla f(x') - \nabla f(x)\|_2 \leq L\|x'-x\|_2, \quad \forall (x,x')$$
Equivalently, for $x' \neq x$,
$$\frac{\|\nabla f(x') - \nabla f(x)\|_2}{\|x'-x\|_2} \leq L, \quad \forall (x,x')$$
By the definition of the derivative (letting $x' \to x$), the second derivative obeys the same bound:
$$\|\nabla^2 f(x)\| \leq L$$
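For the squared-error loss the constant $L$ can be computed explicitly. The sketch below assumes $f(w)=\|Xw-y\|_2^2$, so that $\nabla f(w)=2X^T(Xw-y)$, the Hessian is the constant matrix $2X^TX$, and $L$ is twice the largest eigenvalue of $X^TX$. The helper names and the random data are illustrative only.

```python
import numpy as np

def grad_f(w, X, y):
    """Gradient of f(w) = ||Xw - y||^2."""
    return 2.0 * X.T @ (X @ w - y)

def lipschitz_constant(X):
    """Largest eigenvalue of the (constant) Hessian 2 X^T X."""
    return 2.0 * np.linalg.eigvalsh(X.T @ X).max()

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = rng.normal(size=20)
L = lipschitz_constant(X)

# Empirical check: the gradient-difference ratio should never exceed L.
worst_ratio = 0.0
for _ in range(100):
    a, b = rng.normal(size=5), rng.normal(size=5)
    ratio = np.linalg.norm(grad_f(a, X, y) - grad_f(b, X, y)) / np.linalg.norm(a - b)
    worst_ratio = max(worst_ratio, ratio)

print(worst_ratio <= L * (1 + 1e-9))
```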
We can therefore take a second-order Taylor expansion of the objective around $x_k$:
$$
\begin{aligned}
\hat{f}(x) &= f(x_k)+\nabla f(x_k)^T(x-x_k)+\frac{1}{2}(x-x_k)^T\nabla^2 f(x_k)(x-x_k) \\
&\leq f(x_k)+\nabla f(x_k)^T(x-x_k)+\frac{L}{2}(x-x_k)^T(x-x_k) \\
&= f(x_k)+\frac{L}{2}\left((x-x_k)^T(x-x_k)+\frac{2}{L}\nabla f(x_k)^T(x-x_k)+\frac{1}{L^2}\|\nabla f(x_k)\|_2^2\right)-\frac{1}{2L}\|\nabla f(x_k)\|_2^2 \\
&= \frac{L}{2}\left\|x-\left(x_k-\frac{1}{L}\nabla f(x_k)\right)\right\|_2^2+const
\end{aligned}
$$
where $const=f(x_k)-\frac{1}{2L}\|\nabla f(x_k)\|_2^2$ does not depend on $x$.
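The completing-the-square step is easy to get wrong, so here is a small numerical sanity check. It is only a sketch: it assumes the squared-error loss $f(w)=\|Xw-y\|_2^2$ with $L$ taken as twice the largest eigenvalue of $X^TX$, and all names and data are made up. It checks that the expanded and completed-square forms of the upper bound agree, and that the bound really sits above $f$.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
y = rng.normal(size=30)

def f(w):
    return float(np.sum((X @ w - y) ** 2))

def grad_f(w):
    return 2.0 * X.T @ (X @ w - y)

L = 2.0 * np.linalg.eigvalsh(X.T @ X).max()   # bound on the Hessian 2 X^T X
x_k = rng.normal(size=4)                      # current iterate
g_k = grad_f(x_k)
z = x_k - g_k / L                             # minimizer of the quadratic bound
const = f(x_k) - g_k @ g_k / (2.0 * L)

def bound_expanded(x):
    d = x - x_k
    return f(x_k) + g_k @ d + 0.5 * L * (d @ d)

def bound_square(x):
    return 0.5 * L * np.sum((x - z) ** 2) + const

for _ in range(5):
    x = rng.normal(size=4)
    assert np.isclose(bound_expanded(x), bound_square(x))  # same function
    assert f(x) <= bound_expanded(x) + 1e-9                # it majorizes f
```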
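Clearly $\hat{f}(x)$ attains its minimum at $z=x_k-\frac{1}{L}\nabla f(x_k)$, which is a gradient descent step with step size $\frac{1}{L}$. Substituting into (2), the optimization objective becomes:
$$\min_x \frac{L}{2}\|x-z\|_2^2+\lambda\|x\|_1 \tag{3}$$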
First let us look at the monotonicity of (3). To make this more intuitive, we rewrite it as a sum over components:
$$\frac{L}{2}\|x-z\|_2^2+\lambda\|x\|_1 = \sum_{i=1}^d\left(\frac{L}{2}(x_i-z_i)^2+\lambda|x_i|\right)$$
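The terms do not interact, so consider a single component $x_i$ of the vector $x$:
$$g(x_i)=\frac{L}{2}(x_i-z_i)^2+\lambda|x_i| \tag{4}$$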
Differentiating (4):
$$g'(x_i)=L(x_i-z_i)+\lambda\,\mathrm{sgn}(x_i),$$
where $\mathrm{sgn}(\cdot)$ is the sign function, $\mathrm{sgn}(x_i)=\begin{cases}1, & x_i>0 \\ -1, & x_i<0\end{cases}$. Note that $g$ is not differentiable at $x_i=0$.
Set $g'(x_i)=L(x_i-z_i)+\lambda\,\mathrm{sgn}(x_i)=0$ and analyze the three possible cases for $x_i$:
Case $x_i>0$:
$$L(x_i-z_i)+\lambda=0$$
$$x_i=z_i-\frac{\lambda}{L}$$
Since $x_i>0$, this requires $z_i>\frac{\lambda}{L}$.
Case $x_i<0$:
$$L(x_i-z_i)-\lambda=0$$
$$x_i=z_i+\frac{\lambda}{L}$$
Since $x_i<0$, this requires $z_i<-\frac{\lambda}{L}$.
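Case $x_i=0$:
$g(x_i)=\frac{L}{2}z_i^2$ is a constant. This is the minimizer whenever neither case above applies, i.e. when $|z_i|\leq\frac{\lambda}{L}$: then $g'(x_i)<0$ for every $x_i<0$ and $g'(x_i)>0$ for every $x_i>0$, so $g$ decreases up to $0$ and increases after it.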
Putting the cases together:
$$x_i=\begin{cases}z_i-\frac{\lambda}{L}, & z_i>\frac{\lambda}{L} \\ 0, & |z_i|\leq\frac{\lambda}{L} \\ z_i+\frac{\lambda}{L}, & z_i<-\frac{\lambda}{L}\end{cases}$$
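The closed form can be checked against a brute-force search. The sketch below uses plain Python only; `soft_threshold`, `g` and `grid_argmin` are hypothetical helper names, and the values of $\lambda$, $L$ and $z_i$ are arbitrary. It compares the piecewise formula with the grid point that minimizes $g(x_i)=\frac{L}{2}(x_i-z_i)^2+\lambda|x_i|$.

```python
def soft_threshold(z_i, lam, L):
    """Piecewise closed form for the minimizer of g."""
    t = lam / L
    if z_i > t:
        return z_i - t
    if z_i < -t:
        return z_i + t
    return 0.0

def g(x_i, z_i, lam, L):
    """Scalar objective: quadratic term plus absolute-value penalty."""
    return 0.5 * L * (x_i - z_i) ** 2 + lam * abs(x_i)

def grid_argmin(z_i, lam, L, lo=-5.0, hi=5.0, steps=10001):
    """Brute-force minimizer of g over an evenly spaced grid on [lo, hi]."""
    best_x, best_val = lo, g(lo, z_i, lam, L)
    for k in range(steps):
        x = lo + (hi - lo) * k / (steps - 1)
        val = g(x, z_i, lam, L)
        if val < best_val:
            best_x, best_val = x, val
    return best_x

lam, L = 1.0, 2.0  # threshold lam / L = 0.5
for z_i in (2.0, 0.3, -0.5, -1.7):
    closed = soft_threshold(z_i, lam, L)
    brute = grid_argmin(z_i, lam, L)
    print(z_i, closed, round(brute, 3))
```

The two columns should agree up to the grid spacing. Values of $z_i$ inside $[-0.5, 0.5]$ are mapped to exactly zero, which is where the sparsity of LASSO solutions comes from.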
Code:
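def prox_l1(z, lambda_L):
    """Soft-thresholding of z with threshold lambda_L = lambda / L.

    z must support boolean-mask assignment (e.g. a NumPy array or PyTorch tensor).
    """
    # Positive part of z - lambda_L: nonzero only where z > lambda_L.
    x = z - lambda_L
    x[x < 0] = 0
    # Positive part of -z - lambda_L: nonzero only where z < -lambda_L.
    y = -z - lambda_L
    y[y < 0] = 0
    # At most one of x, y is nonzero per entry, so x - y is the thresholded value.
    return x - y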
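Putting the pieces together, a proximal gradient loop alternates the gradient step $z=x_k-\frac{1}{L}\nabla f(x_k)$ with the soft-thresholding step. The following is a minimal sketch rather than a tuned implementation. It assumes the squared-error loss $f(w)=\|Xw-y\|_2^2$, takes $L$ as twice the largest eigenvalue of $X^TX$, and uses a fixed number of iterations. The names `soft_threshold` and `proximal_gradient_lasso` and the synthetic data are illustrative; `soft_threshold` is an equivalent vectorized form of the thresholding rule.

```python
import numpy as np

def soft_threshold(z, t):
    """Vectorized soft-thresholding: shrink each entry toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def proximal_gradient_lasso(X, y, lam, n_iter=500):
    """Minimize ||Xw - y||^2 + lam * ||w||_1 with a fixed step size 1 / L."""
    L = 2.0 * np.linalg.eigvalsh(X.T @ X).max()  # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ w - y)           # gradient of the smooth part
        z = w - grad / L                         # gradient step
        w = soft_threshold(z, lam / L)           # proximal step for the L1 term
    return w

# Synthetic data: only coefficients 1 and 4 are truly nonzero.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
w_true = np.zeros(10)
w_true[[1, 4]] = [3.0, -2.0]
y = X @ w_true + 0.01 * rng.normal(size=100)

w_hat = proximal_gradient_lasso(X, y, lam=5.0)
print(np.round(w_hat, 2))
```

On data like this the estimate should land close to `w_true`, with the two nonzero coefficients shrunk slightly toward zero by the penalty.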