Block coordinate descent for group lasso

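Optimization objective:

$$\min_x \; \frac{1}{2}\|Ax-b\|_2^2 + \lambda \sum_{j=1}^{J} \|x_j\|_2$$

where $A = [A_1, \dots, A_J] \in \mathbb{R}^{n \times m}$ is split into column blocks and $x = (x_1, \dots, x_J) \in \mathbb{R}^m$ is split into matching sub-vectors, so that $Ax = \sum_j A_j x_j$.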

Fix a block index $k$ and treat all other blocks as constants. The objective can then be rewritten as:

$$
\begin{aligned}
&\frac{1}{2}\|Ax-b\|_2^2 + \lambda \sum_{j=1}^{J} \|x_j\|_2 \\
&= \frac{1}{2}\Big(\sum_i \sum_j x_i^T A_i^T A_j x_j - 2 b^T \sum_i A_i x_i + b^T b\Big) + \lambda \sum_{j=1}^{J} \|x_j\|_2 \\
&= \frac{1}{2} x_k^T A_k^T A_k x_k + \Big(\sum_{i \ne k} x_i^T A_i^T - b^T\Big) A_k x_k + \lambda \|x_k\|_2 + C_k \\
&= \frac{1}{2} x_k^T M_k x_k + p_k^T x_k + \lambda \|x_k\|_2 + C_k
\end{aligned}
$$

where $C_k$ collects every term that does not depend on $x_k$ (including $\frac{b^T b}{2}$). Each cross term $x_i^T A_i^T A_k x_k$ with $i \ne k$ occurs twice in the double sum, once as $(i,k)$ and once as $(k,i)$, which cancels the factor $\frac{1}{2}$.

Here $M_k = A_k^T A_k$ and $p_k^T = \big(\sum_{i \ne k} x_i^T A_i^T - b^T\big) A_k$, that is, $p_k = A_k^T\big(\sum_{i \ne k} A_i x_i - b\big)$.

In block coordinate descent, each step fixes one index $k$ and solves the following subproblem:

$$\min_{x_k} \; \frac{1}{2} x_k^T M_k x_k + p_k^T x_k + \lambda \|x_k\|_2$$
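As a sanity check on the decomposition, here is a minimal NumPy sketch. The problem sizes, the random data, and the helper names `full_objective`, `block_terms`, and `sub_objective` are all illustrative assumptions, not part of the original derivation. It verifies that changing only $x_k$ changes the full objective and the subproblem objective by the same amount.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sizes, lam = 20, [3, 2, 4], 0.7                      # hypothetical sizes
blocks = [rng.standard_normal((n, s)) for s in sizes]   # A_1, ..., A_J
b = rng.standard_normal(n)
xs = [rng.standard_normal(s) for s in sizes]            # x_1, ..., x_J

def full_objective(xs):
    # 0.5 * ||A x - b||^2 + lam * sum_j ||x_j||_2
    r = sum(Aj @ xj for Aj, xj in zip(blocks, xs)) - b
    return 0.5 * r @ r + lam * sum(np.linalg.norm(xj) for xj in xs)

def block_terms(k, xs):
    # M_k = A_k^T A_k and p_k = A_k^T (sum_{i != k} A_i x_i - b)
    Ak = blocks[k]
    others = sum(blocks[i] @ xs[i] for i in range(len(blocks)) if i != k)
    return Ak.T @ Ak, Ak.T @ (others - b)

def sub_objective(xk, M, p):
    return 0.5 * xk @ M @ xk + p @ xk + lam * np.linalg.norm(xk)

k = 1
M, p = block_terms(k, xs)
new_xk = rng.standard_normal(sizes[k])
xs_new = list(xs)
xs_new[k] = new_xk

# The two objectives differ only by a constant in x_k, so their changes agree.
gap = (full_objective(xs_new) - full_objective(xs)) - (
    sub_objective(new_xk, M, p) - sub_objective(xs[k], M, p))
```

If the decomposition is right, `gap` should be zero up to floating-point rounding.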

By the first-order optimality condition:

$$M_k x_k + p_k + \lambda\, g(x_k) = 0$$

where $g(x_k)$ denotes a subgradient of $\|x\|_2$ at $x_k$.

When $x_k \ne 0$, $g(x_k) = \frac{x_k}{\|x_k\|_2}$ is a subgradient of $\|x\|_2$ at $x_k$.

To see this, the definition of a subgradient requires that, for all $y$,

$$\|y\|_2 \ge \|x_k\|_2 + \frac{x_k^T}{\|x_k\|_2}(y - x_k) = \|x_k\|_2 + \frac{y^T x_k}{\|x_k\|_2} - \frac{x_k^T x_k}{\|x_k\|_2} = \|y\|_2 \cos\theta$$

where $\theta$ is the angle between $y$ and $x_k$. Since $\cos\theta \le 1$, the inequality always holds, so $g(x_k) = \frac{x_k}{\|x_k\|_2}$ is a subgradient of $\|x\|_2$ at $x_k$.
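The inequality can also be spot-checked numerically. The sketch below uses an arbitrary dimension and random test points, both chosen only for illustration, and computes the smallest slack $\|y\|_2 - \big(\|x_k\|_2 + g^T (y - x_k)\big)$ over many random $y$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(5)           # a nonzero point playing the role of x_k
g = x / np.linalg.norm(x)            # candidate subgradient x_k / ||x_k||_2

# Slack of the subgradient inequality for 1000 random points y.
ys = 10.0 * rng.standard_normal((1000, 5))
slack = np.linalg.norm(ys, axis=1) - (np.linalg.norm(x) + (ys - x) @ g)
min_slack = slack.min()
```

A nonnegative `min_slack` (up to rounding) is consistent with the argument above; it is of course not a proof.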

Now consider when $x_k = 0$ is optimal:

  1. (Sufficiency) If $\|p_k\|_2 \le \lambda$, then $p_k^T x_k + \lambda \|x_k\|_2 \ge 0$ for every $x_k$. Because $M_k = A_k^T A_k$ is positive semidefinite, the subproblem objective is then nonnegative everywhere, and it equals $0$ at $x_k = 0$, so $x_k = 0$ is optimal.

    To verify the inequality, take any $x_k \ne 0$ and apply the Cauchy-Schwarz inequality:

    $$p_k^T x_k + \lambda \|x_k\|_2 = \Big(\frac{p_k^T x_k}{\|x_k\|_2} + \lambda\Big)\|x_k\|_2 \ge \big(-\|p_k\|_2 + \lambda\big)\|x_k\|_2 \ge 0$$

    so $\|p_k\|_2 \le \lambda$ is a sufficient condition.

  2. (Necessity) If $x_k = 0$ is optimal, then by the first-order optimality condition there exists a subgradient $g_0$ of $\|x_k\|_2$ at $x_k = 0$ such that

    $$M_k x_k + p_k + \lambda g_0 = 0$$

    Since $x_k = 0$ here, this reduces to $p_k + \lambda g_0 = 0$, so $\|p_k\|_2 = \lambda \|g_0\|_2 \le \lambda$.

    It remains to show that $\|g_0\|_2 \le 1$. By definition, a subgradient of $\|x_k\|_2$ at $x_k = 0$ satisfies

    $$\|x_k\|_2 \ge \|0\|_2 + g_0^T (x_k - 0) \quad \forall x_k$$

    that is,

    $$\|x_k\|_2 \ge g_0^T x_k = \|g_0\|_2 \|x_k\|_2 \cos\theta \quad \forall x_k$$

    Choosing $x_k = g_0$ (so that $\cos\theta = 1$) gives $\|g_0\|_2 \ge \|g_0\|_2^2$, hence $\|g_0\|_2 \le 1$.

In summary, $x_k = 0$ is optimal if and only if $\|p_k\|_2 \le \lambda$.
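A small numerical illustration of this threshold, on synthetic data with an arbitrary $\lambda$ (both are assumptions made only for the example): when $\|p_k\|_2 \le \lambda$ the subproblem objective stays nonnegative at sampled points, and when $\|p_k\|_2 > \lambda$ a suitable step along $-p_k$ drops below the value $0$ attained at $x_k = 0$.

```python
import numpy as np

rng = np.random.default_rng(2)
A_k = rng.standard_normal((10, 3))
M = A_k.T @ A_k                        # positive semidefinite by construction
lam = 1.0
direction = rng.standard_normal(3)
direction /= np.linalg.norm(direction)

def f(x, p):
    # Subproblem objective 0.5 x^T M x + p^T x + lam ||x||_2
    return 0.5 * x @ M @ x + p @ x + lam * np.linalg.norm(x)

# Case ||p|| <= lam: no sampled point should beat f(0) = 0.
p_small = 0.9 * lam * direction
min_f_small = min(f(x, p_small) for x in rng.standard_normal((2000, 3)))

# Case ||p|| > lam: minimize f along the ray x = -t p, t > 0.
p_big = 1.5 * lam * direction
norm_p = np.linalg.norm(p_big)
t = norm_p * (norm_p - lam) / (p_big @ M @ p_big)
f_big = f(-t * p_big, p_big)           # strictly below f(0) = 0
```

The step size `t` is the exact minimizer of $f(-t\,p_k)$ over $t > 0$, which is why `f_big` comes out negative whenever $\|p_k\|_2 > \lambda$ and $p_k^T M_k p_k > 0$.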

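Therefore, when $x_k \ne 0$ (that is, when $\|p_k\|_2 > \lambda$), the first-order optimality condition gives

$$M_k x_k + p_k + \lambda \frac{x_k}{\|x_k\|_2} = 0$$

that is,

$$x_k = -\Big(M_k + \frac{\lambda}{\|x_k\|_2} I\Big)^{-1} p_k$$

Note that $\|x_k\|_2$ also appears on the right-hand side, so this is an implicit equation in $x_k$ rather than a closed-form solution, and it has to be solved numerically.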
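One possible way to solve the nonzero case numerically is sketched below; this is an assumed approach, not something the derivation prescribes. Writing $t = \|x_k\|_2$, the condition $M_k x_k + p_k + \lambda x_k / \|x_k\|_2 = 0$ gives $x(t) = -(M_k + \frac{\lambda}{t} I)^{-1} p_k$, and the ratio $\|x(t)\|_2 / t$ is decreasing in $t$, so the scalar equation $\|x(t)\|_2 = t$ can be solved by bisection. The function name `solve_nonzero_block` and its tolerances are hypothetical.

```python
import numpy as np

def solve_nonzero_block(M, p, lam, tol=1e-12, max_iter=200):
    """Solve M x + p + lam * x / ||x|| = 0, assuming ||p|| > lam.

    Also assumes p lies in the range of M (true when p = A_k^T r and
    M = A_k^T A_k), so that the bracket search below terminates.
    """
    I = np.eye(len(p))

    def x_of(t):
        # x(t) = -(M + (lam / t) I)^{-1} p
        return -np.linalg.solve(M + (lam / t) * I, p)

    lo, hi = 0.0, 1.0
    while np.linalg.norm(x_of(hi)) > hi:    # grow hi until ||x(hi)|| <= hi
        hi *= 2.0
    for _ in range(max_iter):               # bisection on t = ||x||
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(x_of(mid)) > mid:
            lo = mid                         # root lies above mid
        else:
            hi = mid                         # root lies at or below mid
        if hi - lo <= tol * hi:
            break
    return x_of(hi)

# Usage on synthetic data (all values here are illustrative).
rng = np.random.default_rng(3)
A_k = rng.standard_normal((12, 4))
M = A_k.T @ A_k
p = A_k.T @ rng.standard_normal(12)
lam = 0.5 * np.linalg.norm(p)               # guarantees ||p|| > lam
x = solve_nonzero_block(M, p, lam)
residual = M @ x + p + lam * x / np.linalg.norm(x)
```

The norm of `residual` measures how well the optimality condition is satisfied. A fixed-point iteration on the same equation or a library root finder would be alternative choices.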

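The block coordinate descent algorithm is as follows:

  1. Pick any block index $k$.
  2. Compute $p_k$.
  3. If $\|p_k\|_2 \le \lambda$, set $x_k = 0$; otherwise solve $x_k = -\big(M_k + \frac{\lambda}{\|x_k\|_2} I\big)^{-1} p_k$ for $x_k$.

Repeat these steps over the blocks until $x$ stops changing.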
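The steps above can be put together as in the following sketch. Several choices here are assumptions rather than part of the algorithm as stated: blocks are visited cyclically, the nonzero case (the condition $M_k x_k + p_k + \lambda x_k / \|x_k\|_2 = 0$) is solved by bisection on $t = \|x_k\|_2$, and the names `solve_block` and `group_lasso_bcd`, the stopping rule, and the test data are all hypothetical.

```python
import numpy as np

def solve_block(M, p, lam, tol=1e-12):
    """Minimize 0.5 x^T M x + p^T x + lam ||x||_2 over a single block."""
    if np.linalg.norm(p) <= lam:             # zero is optimal
        return np.zeros_like(p)
    I = np.eye(len(p))

    def x_of(t):
        return -np.linalg.solve(M + (lam / t) * I, p)

    lo, hi = 0.0, 1.0
    while np.linalg.norm(x_of(hi)) > hi:     # bracket the root of ||x(t)|| = t
        hi *= 2.0
    for _ in range(200):                     # bisection
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(x_of(mid)) > mid:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * hi:
            break
    return x_of(hi)

def group_lasso_bcd(blocks, b, lam, n_sweeps=200, tol=1e-10):
    xs = [np.zeros(Ak.shape[1]) for Ak in blocks]
    Ms = [Ak.T @ Ak for Ak in blocks]
    r = -b.astype(float)                     # residual A x - b at x = 0
    for _ in range(n_sweeps):
        max_change = 0.0
        for k, Ak in enumerate(blocks):      # step 1: cyclic choice of k
            r_others = r - Ak @ xs[k]        # sum_{i != k} A_i x_i - b
            p = Ak.T @ r_others              # step 2: p_k
            new_xk = solve_block(Ms[k], p, lam)   # step 3
            max_change = max(max_change, np.linalg.norm(new_xk - xs[k]))
            xs[k] = new_xk
            r = r_others + Ak @ new_xk
        if max_change < tol:
            break
    return xs

# Synthetic example: only the first block carries signal.
rng = np.random.default_rng(4)
sizes = [3, 4, 3]
blocks = [rng.standard_normal((50, s)) for s in sizes]
b = blocks[0] @ np.array([2.0, -1.0, 1.5]) + 0.01 * rng.standard_normal(50)
lam = 5.0
xs = group_lasso_bcd(blocks, b, lam)

# Check the optimality conditions of the full problem at the result.
r = sum(Ak @ xk for Ak, xk in zip(blocks, xs)) - b

def kkt_violation(Ak, xk):
    grad = Ak.T @ r
    if np.any(xk):
        return np.linalg.norm(grad + lam * xk / np.linalg.norm(xk))
    return max(0.0, np.linalg.norm(grad) - lam)

worst = max(kkt_violation(Ak, xk) for Ak, xk in zip(blocks, xs))
```

Keeping the running residual `r` avoids recomputing $\sum_{i \ne k} A_i x_i$ from scratch for every block. `worst` reports the largest violation of the per-block optimality conditions derived earlier.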
