牛顿法与拟牛顿法

文章目录

    • 牛顿法&拟牛顿法
      • 1 牛顿法
      • 2 拟牛顿法
        • 2.1 对称秩1校正
        • 2.2 DFP
        • 2.3 BFGS

牛顿法&拟牛顿法

设无约束优化问题:
min ⁡ f ( x ) ,   x ∈ R n \min f(x),{\kern 1pt} \,x \in {R^n} minf(x),xRn

1 牛顿法

基本思想,通过泰勒二阶展开,通过对泰勒展开求导,并令其等于0,从而求得极小值。

f ( x ) f(x) f(x) x k x_k xk处进行泰勒展开:
f ( x ) ≈ f ( x k ) + Δ f ( x k ) ( x − x k ) + 1 2 ( x − x k ) T Δ 2 f ( x k ) ( x − x k ) + 0 ( ∣ ∣ x − x k ∣ ∣ 2 ) f(x) \approx f({x_k}) + {\rm{\Delta }}f({x_k})(x - {x_k}) + \frac{1}{2}{\left( {x - {x_k}} \right)^T}{{\rm{\Delta }}^2}f({x_k})(x - {x_k}) + 0(||x - {x_k}|{|^2}) f(x)f(xk)+Δf(xk)(xxk)+21(xxk)TΔ2f(xk)(xxk)+0(∣∣xxk2)

对泰勒式及进行求导:
∇ f ( x ) ≈ ∇ f ( x k ) + ∇ 2 f ( x k ) ( x − x k ) \nabla f(x) \approx \nabla f({x_k}) + {\nabla ^2}f({x_k})(x - {x_k}) f(x)f(xk)+2f(xk)(xxk)

令上次为0,便可求得极小值位置:
∇ f ( x k ) = − ∇ 2 f ( x k ) ( x − x k ) \nabla f({x_k}){\rm{ = }} - {\nabla ^2}f({x_k})(x - {x_k}) f(xk)=2f(xk)(xxk)

得:
x = x k − ∇ 2 f ( x k ) − 1 ∇ f ( x k ) x{\rm{ = }}{x_k} - {\nabla ^2}f{({x_k})^{{\rm{ - 1}}}}\nabla f({x_k}) x=xk2f(xk)1f(xk)
我们便可得下降到最小值的方向: − ∇ 2 f ( x k ) − 1 ∇ f ( x k ) -{\nabla ^2}f{({x_k})^{{\rm{ - 1}}}}\nabla f({x_k}) 2f(xk)1f(xk)

牛顿法流程:
设定初始点: x 0 x_0 x0,迭代步数 k = 0 k=0 k=0
开始迭代:
1)计算梯度,Hessian矩阵,从而得到牛顿迭代方向:
∇ f ( x k ) 2 ,    f ( x k ) ,    d k = − ∇ 2 f ( x k ) − 1 ∇ f ( x k ) \nabla f{({x_k})^2},\ \ f({x_k}),\ \ {d_k} = - {\nabla ^2}f{({x_k})^{{\rm{ - 1}}}}\nabla f({x_k}) f(xk)2,  f(xk),  dk=2f(xk)1f(xk)

2)判断当前迭代步长是否足够小或达到迭代次数,是则停止,否,继续下一步

3)更新:
x = x k + d k k + + \begin{array}{l} x{\rm{ = }}{x_k} + {d_k}\\ k ++ \end{array} x=xk+dkk++
转步骤1)

Tips:
此外,牛顿法,是局部收敛的。因为是基于求解极值的方法,因而会局限于局部极小值,因而要求初始点选取在全局最优解附近。
牛顿法有一次收敛性。但由于 ∇ 2 f ( x k ) {\nabla ^2}f({x_k}) 2f(xk)不能保证一定正定,会出现秩为0的情况,此时,牛顿法会失效。因而基于修正hessian矩阵的算法出现,如LM算法。

2 拟牛顿法

拟牛顿法是一种基于牛顿法的变形,提供一种思想,包含多种解决方案。
基本思想:牛顿法虽收敛速度快,但要计算Hessian矩阵及其逆矩阵,且Hessian矩阵不一定正定,为了减少计算量,拟牛顿法被提出,其基本思想是使用目标函数的一阶信息近似牛顿法的Hessian矩阵。

拟牛顿法的条件:使用k时刻的一阶二阶近似出 k + 1 k+1 k+1时刻的二阶Hessian矩阵。
我们将目标函数 f ( x ) f(x) f(x) x k + 1 x_{k+1} xk+1处进行泰勒展开:
f ( x ) ≈ f ( x k + 1 ) + Δ f ( x k + 1 ) ( x − x k + 1 ) + 1 2 ( x − x k + 1 ) T Δ 2 f ( x k + 1 ) ( x − x k + 1 ) + 0 ( ∣ ∣ x − x k + ! ∣ ∣ 2 ) f(x) \approx f({x_{k + 1}}) + {\rm{\Delta }}f({x_{k + 1}})(x - {x_{k + 1}}) + \frac{1}{2}{\left( {x - {x_{k + 1}}} \right)^T}{{\rm{\Delta }}^2}f({x_{k + 1}})(x - {x_{k + 1}}) + 0(||x - {x_{k + !}}|{|^2}) f(x)f(xk+1)+Δf(xk+1)(xxk+1)+21(xxk+1)TΔ2f(xk+1)(xxk+1)+0(∣∣xxk+!2)

对泰勒式及进行求导:
∇ f ( x ) ≈ ∇ f ( x k + 1 ) + ∇ 2 f ( x k + 1 ) ( x − x k + 1 ) \nabla f(x) \approx \nabla f({x_{k + 1}}) + {\nabla ^2}f({x_{k + 1}})(x - {x_{k + 1}}) f(x)f(xk+1)+2f(xk+1)(xxk+1)

x = x k ,   g k = ∇ f ( x k + 1 ) x = {x_k},\,{g_k} = \nabla f({x_{k + 1}}) x=xk,gk=f(xk+1),则得:
g k − g k + 1 ≈ ∇ 2 f ( x k + 1 ) ( x k − x k + 1 ) {g_k} - {g_{k + 1}} \approx {\nabla ^2}f({x_{k + 1}})({x_k} - {x_{k + 1}}) gkgk+12f(xk+1)(xkxk+1)

p k = g k − g k + 1 ,    q k = x k − x k + 1 ,    H k + 1 = ∇ 2 f ( x k + 1 ) {p_k} = {g_k} - {g_{k + 1}},\,\,{q_k} = {x_k} - {x_{k + 1}},\,\,{H_{k + 1}} = {\nabla ^2}f({x_{k + 1}}) pk=gkgk+1,qk=xkxk+1,Hk+1=2f(xk+1),则得拟牛顿法的条件:
p k ≈ H k + 1 q k {p_k} \approx {H_{k + 1}}{q_k} pkHk+1qk

Tips:
关于这个公式的理解,可以将x想为时间t,即,两帧(两次迭代之间 k − > k + 1 k->k+1 k>k+1)的速度变化,等于 k + 1 k+1 k+1时刻的加速度*两帧之间的时差。
此外,如果我们对目标函数在 x k x_k xk处展开,再将 x k + 1 x_{k+1} xk+1代入,可以得到 。这里,引入思考,泰勒展开式只保证在展开点的一定邻域内有效,当 x k x_k xk x k + 1 x_{k+1} xk+1相差较大时,以上拟牛顿法的条件是否也有一定误差。

牛顿法是使 H k + 1 = H k + Δ H k {H_{k + 1}} = {H_k} + \Delta {H_k} Hk+1=Hk+ΔHk来拟合 H k + 1 {H_{k + 1}} Hk+1

2.1 对称秩1校正

这个思想是,保证 的半正定及对称性,我们令:
Δ H k = v k v k T , v k ≠ 0 \Delta {H_k} = {v_k}v_k^T,{v_k} \ne 0 ΔHk=vkvkTvk=0

由于 v k v k T {v_k}v_k^T vkvkT使得 Δ H k \Delta {H_k} ΔHk的秩为1,故而此方法名为秩1校正。
代入拟牛顿法条件
p k ≈ H k + 1 q k {p_k} \approx {H_{k + 1}}{q_k} pkHk+1qk p k ≈ H k + 1 q k = ( H k + Δ H k ) q k p k = ( H k + v k v k T ) q k = H k q k + v k v k T q k \begin{array}{l} {p_k} \approx {H_{k + 1}}{q_k} = \left( {{H_k} + \Delta {H_k}} \right){q_k}\\ {p_k} = \left( {{H_k} + {v_k}v_k^T} \right){q_k} = {H_k}{q_k} + {v_k}v_k^T{q_k} \end{array} pkHk+1qk=(Hk+ΔHk)qkpk=(Hk+vkvkT)qk=Hkqk+vkvkTqk

从而得: v k v k T q k = p k − H k q k {v_k}v_k^T{q_k} = {p_k} - {H_k}{q_k} vkvkTqk=pkHkqk
由于 q k {q_k} qk为向量,上式可以看成 A x = b Ax=b Ax=b A A A为对称对阵,求 A A A的问题。
等式两侧都乘以 ( p k − H k q k ) T {\left( {{p_k} - {H_k}{q_k}} \right)^T} (pkHkqk)T,得:
v k v k T q k ( p k − H k q k ) T = ( p k − H k q k ) ( p k − H k q k ) T {v_k}v_k^T{q_k}{\left( {{p_k} - {H_k}{q_k}} \right)^T} = \left( {{p_k} - {H_k}{q_k}} \right){\left( {{p_k} - {H_k}{q_k}} \right)^T} vkvkTqk(pkHkqk)T=(pkHkqk)(pkHkqk)T
进而得:
v k v k T = ( p k − H k q k ) ( p k − H k q k ) T q k ( p k − H k q k ) T {v_k}v_k^T = \frac{{\left( {{p_k} - {H_k}{q_k}} \right){{\left( {{p_k} - {H_k}{q_k}} \right)}^T}}}{{{q_k}{{\left( {{p_k} - {H_k}{q_k}} \right)}^T}}} vkvkT=qk(pkHkqk)T(pkHkqk)(pkHkqk)T

因而我们可以发现,秩1校正成立的条件 q k ( p k − H k q k ) T > 0 {q_k}{\left( {{p_k} - {H_k}{q_k}} \right)^T}>0 qk(pkHkqk)T>0

2.2 DFP

基本思想,我们可以从秩1校正中发现 Δ H k \Delta {H_k} ΔHk的是由 p k p_k pk H k q k H_kq_k Hkqk组成的,因而我么可以设计以下方法来完成校正:a和b为系数。
Δ H k = a ( p k p k T ) + b ( H k q k ) ( H k q k ) T = a ( P k P k T ) + b H k q k q k T H k T {\rm{\Delta }}{H_k} = a\left( {{p_k}p_k^T} \right) + b({H_k}{q_k}){({H_k}{q_k})^T} = a\left( {{P_k}P_k^T} \right) + b{H_k}{q_k}q_k^TH_k^T ΔHk=a(pkpkT)+b(Hkqk)(Hkqk)T=a(PkPkT)+bHkqkqkTHkT

代入拟牛顿法条件
p k ≈ H k + 1 q k {p_k} \approx {H_{k + 1}}{q_k} pkHk+1qk p k ≈ H k + 1 q k = ( H k + Δ H k ) q k p k = ( H k + a ( p k p k T ) + b H k q k q k T H k T ) q k = H k q k + a p k p k T q k + b H k q k q k T H k T q k \begin{array}{l} {p_k} \approx {H_{k + 1}}{q_k} = \left( {{H_k} + \Delta {H_k}} \right){q_k}\\ {p_k} = \left( {{H_k} + a\left( {{p_k}p_k^T} \right) + b{H_k}{q_k}q_k^TH_k^T} \right){q_k} = {H_k}{q_k} + a{p_k}p_k^T{q_k} + b{H_k}{q_k}q_k^TH_k^T{q_k} \end{array} pkHk+1qk=(Hk+ΔHk)qkpk=(Hk+a(pkpkT)+bHkqkqkTHkT)qk=Hkqk+apkpkTqk+bHkqkqkTHkTqk

目标转为求系数a和b。根据乘法的结合率,我们可以知道: p k T q k p_k^T{q_k} pkTqk q k T H k T q k q_k^TH_k^T{q_k} qkTHkTqk为常数,因此,可得:
( H k + b q k T H k T q k H k ) q k + ( a p k T q k − 1 ) p k = 0 ⇒ ( 1 + b q k T H k T q k ) H k q k + ( a p k T q k − 1 ) p k = 0 \begin{array}{l} \left( {{H_k} + bq_k^TH_k^T{q_k}{H_k}} \right){q_k} + \left( {ap_k^T{q_k} - 1} \right){p_k} = 0\\ \Rightarrow \left( {1 + bq_k^TH_k^T{q_k}} \right){H_k}{q_k} + \left( {ap_k^T{q_k} - 1} \right){p_k} = 0 \end{array} (Hk+bqkTHkTqkHk)qk+(apkTqk1)pk=0(1+bqkTHkTqk)Hkqk+(apkTqk1)pk=0
p k p_k pk H k q k H_kq_k Hkqk线性无关,则要使上式 = 0 =0 =0成立,则需:
{ 1 + b q k T H k T q k = 0 a p k T q k − 1 = 0 ⇒ { a = 1 p k T q k b = − 1 q k T H k q k \left\{ {\begin{array}{c} {1 + bq_k^TH_k^T{q_k} = 0}\\ {ap_k^T{q_k} - 1 = 0} \end{array}} \right. \Rightarrow \left\{ {\begin{array}{c} {a = \frac{1}{{p_k^T{q_k}}}}\\ {b = - \frac{1}{{q_k^T{H_k}{q_k}}}} \end{array}} \right. {1+bqkTHkTqk=0apkTqk1=0{a=pkTqk1b=qkTHkqk1
从而可得:
Δ H k = p k p k T p k T q k − H k q k ⋅ q k T H k q k T H k q k {\rm{\Delta }}{H_k} = \frac{{{p_k}p_k^T}}{{p_k^T{q_k}}} - \frac{{{H_k}{q_k} \cdot q_k^T{H_k}}}{{q_k^T{H_k}{q_k}}} ΔHk=pkTqkpkpkTqkTHkqkHkqkqkTHk

DFP流程:
设定初始点: x 0 x_0 x0,迭代步数 k = 0 k=0 k=0, (单位矩阵)
开始迭代:
1)计算梯度,Hessian矩阵,从而得到牛顿迭代方向:
∇ f ( x k ) ,    ∇ 2 f ( x k ) ,    d k = − H − 1 ∇ f ( x k ) \nabla f({x_k}),\ \ {\nabla ^2}f({x_k}),\ \ {d_k} = - {{\rm{H}}^{{\rm{ - 1}}}}\nabla f({x_k}) f(xk),  2f(xk),  dk=H1f(xk)
2)判断当前迭代步长是否足够小或达到迭代次数,是则停止,否,继续下一步

3)更新:
x = x k + d k ;   k + + ; g k + 1 = Δ f ( x k + 1 ) ,    p k = x k + 1 − x k ,    q k = g k + 1 − g k ; H k + 1 = H k + p k p k T p k T q k − H k q k ⋅ q k T H k q k T H k q k \begin{array}{l} x{\rm{ = }}{x_k} + {d_k};\ k + + ;\\ {g_{k + 1}} = {\rm{\Delta }}f({x_{k + 1}}),\ \ {p_k} = {x_{k + 1}} - {x_k},\ \ {q_k} = {g_{k + 1}} - {g_k};\\ {H_{k + 1}} = {H_k} + \frac{{{p_k}p_k^T}}{{p_k^T{q_k}}} - \frac{{{H_k}{q_k} \cdot q_k^T{H_k}}}{{q_k^T{H_k}{q_k}}} \end{array} x=xk+dk; k++;gk+1=Δf(xk+1),  pk=xk+1xk,  qk=gk+1gk;Hk+1=Hk+pkTqkpkpkTqkTHkqkHkqkqkTHk
转步骤1)
示例代码:

#include 
#include 

using namespace std;

// 目标函数
double objectiveFunction(double x)
{
    return pow(x, 2) - 4 * x + 4;
}

// 目标函数的一阶导数
double derivativeFunction(double x)
{
    return 2 * x - 4;
}

// 拟牛顿法(BFGS算法)求解函数
double bfgsMethod(double initialGuess, double tolerance)
{
    double x = initialGuess;
    double dx = 0.0;
    double H = 1.0; // 初始Hessian逆矩阵
    float s = 0.1;
    do {
        double g = derivativeFunction(x);
        double step = -g / H;
        x = x + s*step;

        double dg = derivativeFunction(x) - g;
        
        // 更新Hessian逆矩阵
        double deltaX = step;
        double deltaY = dg;
        H += (deltaX * deltaX) / (deltaX * deltaY) - (H * deltaY * deltaY * H) / (deltaY * H * deltaY);

        dx = abs(step);
    } while (dx > tolerance);

    return x;
}

int main()
{
    double initialGuess = 1.0; // 初始猜测值
    double tolerance = 0.0001; // 精度

    double result = bfgsMethod(initialGuess, tolerance);

    cout << "Solution: " << result << endl;

    return 0;
}
2.3 BFGS

基本思想:由于DFP需要求解Hessian阵的逆,因此,BFGS直接拟合Hessian阵的逆来完成迭代。
N k − N k + 1 ≈ ∇ 2 f ( x k + 1 ) − 1 ( g k − g k + 1 ) .     ↦ g k − g k ≈ ∇ 2 f ( x k + 1 ) ( x k − 4 k + 1 ) {N_k} - {N_{k + 1}} \approx {\nabla ^2}f{({x_{k + 1}})^{ - 1}}({g_k} - {g_{k + 1}}).\,\,\,\quad \mapsto \quad {g_k} - {g_k} \approx {\nabla ^2}f({x_{k + 1}})({x_k} - 4k + 1) NkNk+12f(xk+1)1(gkgk+1).gkgk2f(xk+1)(xk4k+1)
p k ≈ H k + 1 q k   ↦ q k ≈ B k + 1 p k {p_k} \approx {H_{k + 1}}{q_k}\,\quad \mapsto \quad {q_k} \approx {B_{k + 1}}{p_k} pkHk+1qkqkBk+1pk
Δ B k \Delta {B_k} ΔBk的思路,即可以把 Δ H k = p k p k T p k T q k − H k q k ⋅ q k T H k q k T H k q k {\rm{\Delta }}{H_k} = \frac{{{p_k}p_k^T}}{{p_k^T{q_k}}} - \frac{{{H_k}{q_k} \cdot q_k^T{H_k}}}{{q_k^T{H_k}{q_k}}} ΔHk=pkTqkpkpkTqkTHkqkHkqkqkTHk中的 p k {p_k} pk换成 q k q_k qk,将 H k q k {H_k}{q_k} Hkqk替换为 B k p k {B_k}{p_k} Bkpk,最后得:
Δ B k = ( I − q k p k T q k T p k ) B k ( I − p k q k T q k T p k ) + q k q k T q k T p k T {\rm{\Delta }}{B_k} = (I - \frac{{{q_k}p_k^T}}{{q_k^T{p_k}}}){B_k}(I - \frac{{{p_k}q_k^T}}{{q_k^T{p_k}}}) + \frac{{{q_k}q_k^T}}{{q_k^Tp_k^T}} ΔBk=(IqkTpkqkpkT)Bk(IqkTpkpkqkT)+qkTpkTqkqkT
示例代码

#include 
#include 

using namespace std;

// 目标函数
double objectiveFunction(double x)
{
    return pow(x, 2) - 4 * x + 4;
}

// 目标函数的一阶导数
double derivativeFunction(double x)
{
    return 2 * x - 4;
}

// 拟牛顿法(BFGS算法)求解函数
double bfgsMethod(double initialGuess, double tolerance)
{
    double x = initialGuess;
    double dx = 0.0;
     double B = 1.0; // 初始Hessian逆矩阵
     float s = 0.1;
     do {
         double g = derivativeFunction(x);
         double step = -B*g;
         x = x + s*step;

         double dg = derivativeFunction(x) - g;

         // 更新Hessian逆矩阵
         double deltaX = step;
         double deltaY = dg;
         B += (1-(deltaY * deltaX) / (deltaY * deltaX)) * B *
              (1-(deltaY * deltaX) / (deltaY * deltaX))
            + (deltaY * deltaY) / (deltaY * deltaX);

        dx = abs(step);
    } while (dx > tolerance);

    return x;
}

int main()
{
    double initialGuess = 1.0; // 初始猜测值
    double tolerance = 0.0001; // 精度

    double result = bfgsMethod(initialGuess, tolerance);

    cout << "Solution: " << result << endl;

    return 0;
}

你可能感兴趣的:(linux,运维,服务器)