Steepest Descent and Conjugate Gradient Methods

Objective problem:

\min~ f = x_1^2+10x_2^2.

Initialization: x_0=[5,\,5]^\text{T},~k=0.

\nabla f = \binom{2x_1}{20x_2}
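As a quick numerical check of the objective and its gradient, a minimal sketch (Python is used here purely for illustration; the function names are my own):

```python
import numpy as np

# Objective f(x) = x1^2 + 10*x2^2 and its gradient (2*x1, 20*x2)^T.
def f(x):
    return x[0]**2 + 10 * x[1]**2

def grad_f(x):
    return np.array([2 * x[0], 20 * x[1]])

x0 = np.array([5.0, 5.0])
print(f(x0))        # 275.0
print(grad_f(x0))   # [ 10. 100.]
```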

On the first iteration (k = 0), the conjugate gradient method coincides with steepest descent (here \nabla f_0 = [10,\,100]^\text{T}):

x_1 = x_0-\alpha_0\nabla f_0=\begin{bmatrix} 5-10\alpha_0\\ 5-100\alpha_0 \end{bmatrix}

Use Newton's method to search for a suitable step size at k = 0:

f(\alpha)=(5-10\alpha)^2+10(5-100\alpha)^2.

\alpha_0=1,\quad m=0.

\dot f(\alpha)=2(5-10\alpha)(-10)+20(5-100\alpha)(-100)

        =-100+200\alpha-10000+200000\alpha = -10100+200200\alpha

\ddot f(\alpha)=200200

\alpha_1 = \alpha_0 - \frac{\dot f(\alpha_0)}{\ddot f(\alpha_0)}=1-\frac{190100}{200200}=0.0505.

|\alpha_1 - \alpha_0|>10^{-3},~\text{not converged}.

m = m+1=1.

\alpha_2 = \alpha_1- \frac{\dot f(\alpha_1)}{\ddot f(\alpha_1)}=0.0505-\frac{10.1}{200200}\approx 0.0505.

|\alpha_2 - \alpha_1|<10^{-3},~\text{converged}.
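The inner Newton search above can be written as a small routine. This is a sketch under the text's setup (initial guess α = 1, tolerance 10⁻³); `newton_linesearch` is an illustrative name, and the Hessian A of f supplies φ''(α):

```python
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 20.0]])   # Hessian of f = x1^2 + 10*x2^2

def grad_f(x):
    return A @ x                           # gradient (2*x1, 20*x2)^T

def newton_linesearch(x, d, alpha=1.0, tol=1e-3, max_iter=50):
    """Newton iteration on phi(a) = f(x + a*d): a <- a - phi'(a)/phi''(a)."""
    for _ in range(max_iter):
        dphi = grad_f(x + alpha * d) @ d   # phi'(a) = ∇f(x + a d) · d
        ddphi = d @ A @ d                  # phi''(a), constant for a quadratic
        alpha_new = alpha - dphi / ddphi
        if abs(alpha_new - alpha) < tol:
            return alpha_new
        alpha = alpha_new
    return alpha

x0 = np.array([5.0, 5.0])
d0 = -grad_f(x0)                           # steepest-descent direction (-10, -100)
alpha = newton_linesearch(x0, d0)
print(f"{alpha:.6f}")                      # prints 0.050450; the text rounds this to 0.0505
```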

Substituting the converged step size \alpha_0 = 0.0505 into the k = 0 update gives the steepest-descent iterate:

x_1 = x_0-\alpha_0\nabla f_0=\begin{bmatrix} 5-10\alpha_0\\ 5-100\alpha_0 \end{bmatrix}=\begin{bmatrix} 5-10\times 0.0505\\ 5-100\times 0.0505\end{bmatrix}=\begin{bmatrix} 4.495\\ -0.05 \end{bmatrix}

||x_1-x_0||=||\begin{bmatrix} 4.495-5\\ -0.05-5 \end{bmatrix}||=||\begin{bmatrix} -0.505\\ -5.05 \end{bmatrix}||=5.0752>10^{-3}.~(\text{not converged})

k = k+1=1.

\nabla f_1 = \binom{2\times 4.495}{20\times(-0.05)}=\binom{8.99}{-1},\qquad x_2 = x_1-\alpha_1\nabla f_1=\begin{bmatrix} 4.495-8.99\alpha_1\\ -0.05+\alpha_1 \end{bmatrix}

Again use Newton's method to search for a suitable step size at k = 1:

f(\alpha)=(4.495-8.99\alpha)^2+10(-0.05+\alpha)^2.

\alpha_0=1,\quad m=0.

\dot f(\alpha)=2(4.495-8.99\alpha)(-8.99)+20(-0.05+\alpha)=181.64\alpha-81.82

\ddot f(\alpha)=181.64.

\alpha_1 = \alpha_0 - \frac{\dot f(\alpha_0)}{\ddot f(\alpha_0)}=1-\frac{99.82}{181.64}=0.4505.

|\alpha_1 - \alpha_0|>10^{-3},~\text{not converged}.

m = m+1=1.

\alpha_2 = \alpha_1- \frac{\dot f(\alpha_1)}{\ddot f(\alpha_1)}=0.4505-\frac{0.0088}{181.64}\approx 0.4505.

|\alpha_2 - \alpha_1|<10^{-3},~\text{converged}.

Substituting the converged step size \alpha_1 = 0.4505 into the update and iterating with steepest descent:

x_2 = x_1-\alpha_1\nabla f_1=\begin{bmatrix} 4.495-8.99\alpha_1\\ -0.05+\alpha_1 \end{bmatrix}

     =\begin{bmatrix} 4.495-8.99\times 0.4505\\ -0.05+ 0.4505\end{bmatrix}=\begin{bmatrix} 0.4450\\ 0.4005 \end{bmatrix}

…… Repeat the steepest-descent update in this way for many more iterations until convergence.
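The repeated steepest-descent iterations can be sketched as a loop. One assumption: since f is quadratic, the closed-form minimizing step α = (gᵀg)/(gᵀAg) is used in place of the inner Newton search; for this f both yield the same step size:

```python
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 20.0]])    # Hessian of f = x1^2 + 10*x2^2

def steepest_descent(x, tol=1e-3, max_iter=1000):
    for k in range(max_iter):
        g = A @ x                          # gradient (2*x1, 20*x2)^T
        alpha = (g @ g) / (g @ A @ g)      # exact line search for a quadratic
        x_new = x - alpha * g
        if np.linalg.norm(x_new - x) < tol:
            return x_new, k + 1
        x = x_new
    return x, max_iter

x_star, iters = steepest_descent(np.array([5.0, 5.0]))
print(x_star, iters)   # zig-zags toward the minimizer (0, 0) over many iterations
```

The many iterations needed here, compared with the conjugate gradient run below, illustrate the slow zig-zag behavior of steepest descent on ill-conditioned quadratics.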


Now instead iterate at k = 1 with the conjugate gradient method:

x_2 = x_1 + \alpha_1 d_1

\nabla f_1\cdot \nabla f_1 = 81.82

\nabla f_0\cdot \nabla f_0 = 10100

d_1 = -\nabla f_1+\frac{\nabla f_1\cdot \nabla f_1}{\nabla f_0\cdot \nabla f_0}\cdot d_0=\begin{bmatrix} -8.99+\frac{81.82}{10100}(-10)\\ 1+\frac{81.82}{10100}(-100) \end{bmatrix}=\begin{bmatrix} -9.0710\\ 0.1899 \end{bmatrix}
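The coefficient multiplying d_0 above is the Fletcher–Reeves choice \beta_1 = (\nabla f_1\cdot \nabla f_1)/(\nabla f_0\cdot \nabla f_0). A quick numerical check:

```python
import numpy as np

g0 = np.array([10.0, 100.0])    # ∇f at x0 = (5, 5)
g1 = np.array([8.99, -1.0])     # ∇f at x1 = (4.495, -0.05)
d0 = -g0                        # first CG direction = steepest-descent direction

beta = (g1 @ g1) / (g0 @ g0)    # Fletcher-Reeves coefficient ≈ 81.82/10100
d1 = -g1 + beta * d0            # new conjugate direction
print(beta, d1)                 # beta ≈ 0.0081, d1 ≈ [-9.0710, 0.1899]
```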

 x_2 = x_1+\alpha_1 d_1=\begin{bmatrix} 4.495-9.071\alpha_1\\ -0.05+0.1899\alpha_1 \end{bmatrix}

Use Newton's method to search for a suitable step size at k = 1:

f(\alpha)=(4.495-9.071\alpha)^2+10(-0.05+0.1899\alpha)^2.

\alpha_0=1,\quad m=0.

\dot f(\alpha)=2(4.495-9.071\alpha)(-9.071)+20(-0.05+0.1899\alpha)(0.1899)

         =165.2873\alpha-81.7382

\ddot f(\alpha)=165.2873

\alpha_1 = \alpha_0 - \frac{\dot f(\alpha_0)}{\ddot f(\alpha_0)}=1-\frac{83.5491}{165.2873}=0.4945.

|\alpha_1 - \alpha_0|>10^{-3},~\text{not converged}.

m = m+1=1.

\alpha_2 = \alpha_1- \frac{\dot f(\alpha_1)}{\ddot f(\alpha_1)}=0.4945-\frac{-0.0036}{165.2873}\approx 0.4945.

|\alpha_2 - \alpha_1|<10^{-3},~\text{converged}.

Substituting the converged step size \alpha_1 = 0.4945 into the update and iterating with the conjugate gradient method:

x_2 = x_1+\alpha_1 d_1=\begin{bmatrix} 4.495-9.071\alpha_1\\ -0.05+0.1899\alpha_1 \end{bmatrix}

     =\begin{bmatrix} 4.495-9.071\times 0.4945\\ -0.05+ 0.1899\times 0.4945\end{bmatrix}=\begin{bmatrix} 0.0094\\ 0.0439 \end{bmatrix}

||x_2||\approx 0.045,~\text{i.e. }x_2~\text{is already close to the minimizer}~[0,\,0]^\text{T}.

…… Further iterations of the conjugate gradient method proceed in the same way.
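The full conjugate-gradient iteration can likewise be sketched, again using the closed-form exact line search for a quadratic. With exact line searches, CG minimizes an n-variable quadratic in at most n steps, so this 2-variable problem finishes in 2 iterations:

```python
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 20.0]])    # Hessian of f = x1^2 + 10*x2^2

def conjugate_gradient(x, tol=1e-8, max_iter=50):
    g = A @ x                              # initial gradient
    d = -g                                 # first direction = steepest descent
    for k in range(max_iter):
        alpha = -(g @ d) / (d @ A @ d)     # exact line search along d
        x = x + alpha * d
        g_new = A @ x
        if np.linalg.norm(g_new) < tol:
            return x, k + 1
        beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves coefficient
        d = -g_new + beta * d              # new conjugate direction
        g = g_new
    return x, max_iter

x_star, iters = conjugate_gradient(np.array([5.0, 5.0]))
print(x_star, iters)   # reaches (0, 0) in 2 iterations
```

The values in the walkthrough above differ slightly from this run only because the text rounds each intermediate quantity to four digits.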
