Let the sequence $\left\lbrace \boldsymbol{x}_k \right\rbrace$ converge to $\boldsymbol{x}^*$, and let $\gamma \ge 1$ and $\beta > 0$. If the limit $$\lim_{k \to \infty} \dfrac{\left\Vert \boldsymbol{x}_{k+1} - \boldsymbol{x}^* \right\Vert}{\left\Vert \boldsymbol{x}_k - \boldsymbol{x}^* \right\Vert^\gamma} = \beta$$ exists, the sequence $\left\lbrace \boldsymbol{x}_k \right\rbrace$ is said to converge with order $\gamma$.
If $\gamma = 1$ (with $0 < \beta < 1$), the sequence $\left\lbrace \boldsymbol{x}_k \right\rbrace$ is said to converge linearly;
if $\gamma = 2$, the sequence $\left\lbrace \boldsymbol{x}_k \right\rbrace$ is said to converge quadratically;
if $1 < \gamma < 2$, the sequence $\left\lbrace \boldsymbol{x}_k \right\rbrace$ is said to converge superlinearly.
[Example 1] Let $x_n = \left( \dfrac{1}{n} \right)^n$. Determine the rate of convergence of the sequence $\left\lbrace x_n \right\rbrace$.
[Solution] The sequence $\left\lbrace x_n \right\rbrace$ converges to $$x^* = \lim_{n \to \infty} \left( \dfrac{1}{n} \right)^n = \lim_{n \to \infty} e^{-n \ln n} = 0.$$ For $\gamma \ge 1$ consider the ratio $$\dfrac{x_{n+1}}{x_n^\gamma} = \dfrac{n^{\gamma n}}{(n+1)^{n+1}} = e^{\gamma n \ln n - (n+1) \ln (n+1)}.$$ When $\gamma = 1$, $$\dfrac{x_{n+1}}{x_n} = \left( \dfrac{n}{n+1} \right)^n \cdot \dfrac{1}{n+1} \to e^{-1} \cdot 0 = 0.$$ When $\gamma > 1$, the exponent equals $(\gamma - 1) n \ln n - n \ln \left( 1 + \dfrac{1}{n} \right) - \ln (n+1)$, which tends to $+\infty$, so the ratio diverges. Hence no $\gamma \ge 1$ gives a finite limit $\beta > 0$. Because the ratio for $\gamma = 1$ tends to $0$, $\left\lbrace x_n \right\rbrace$ converges faster than any linearly convergent sequence (it is superlinear in the sense that $x_{n+1} / x_n \to 0$), but it has no definite order $\gamma$ under the definition above.
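The limit in Example 1 can also be checked numerically. The sketch below is only an illustration: the helper name `logOrderRatio` is hypothetical, and it returns the logarithm of the ratio $x_{n+1} / x_n^\gamma$ rather than the ratio itself so that large $n$ does not overflow.

```cpp
#include <cassert>
#include <cmath>

typedef double var;

// Hypothetical helper: returns ln( x_{n+1} / x_n^order ) for x_n = (1/n)^n,
// that is, order * n * ln(n) - (n + 1) * ln(n + 1).
// Working with the logarithm avoids overflow and underflow for large n.
var logOrderRatio(int n, var order)
{
    assert(n >= 1);
    return order * n * std::log((var)n) - (n + 1) * std::log((var)(n + 1));
}
```

Evaluating `logOrderRatio(n, 1.0)` for growing `n` gives increasingly negative values (the ratio itself shrinks toward 0), while `logOrderRatio(n, 1.1)` eventually becomes large and positive.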
Let $\boldsymbol{A}$ be a symmetric matrix of order $n$. If the $n$-dimensional column vectors $\boldsymbol{p}$ and $\boldsymbol{q}$ satisfy $\boldsymbol{p}^{\rm T} \boldsymbol{A} \boldsymbol{q} = 0$, then $\boldsymbol{p}$ and $\boldsymbol{q}$ are said to be conjugate with respect to $\boldsymbol{A}$.
If the nonzero $n$-dimensional column vectors $\boldsymbol{p}_1, \boldsymbol{p}_2, \cdots, \boldsymbol{p}_m$ satisfy $\boldsymbol{p}_i^{\rm T} \boldsymbol{A} \boldsymbol{p}_j = 0$ for all $i \ne j$, then the set $\boldsymbol{p}_1, \boldsymbol{p}_2, \cdots, \boldsymbol{p}_m$ is said to be conjugate with respect to $\boldsymbol{A}$; it is also called a set of conjugate directions of $\boldsymbol{A}$.
If $\boldsymbol{A}$ is positive definite, the following two conclusions hold:
Let $\boldsymbol{A}$ be a positive definite matrix of order $n$, and let the $n$-dimensional vectors $\boldsymbol{p}_1, \boldsymbol{p}_2, \cdots, \boldsymbol{p}_n$ be linearly independent. The following procedure generates a set of directions conjugate with respect to $\boldsymbol{A}$: $$\begin{cases} \boldsymbol{q}_1 = \boldsymbol{p}_1 \\ \boldsymbol{q}_2 = \boldsymbol{p}_2 - \dfrac{\boldsymbol{p}_2^{\rm T} \boldsymbol{A} \boldsymbol{q}_1}{\boldsymbol{q}_1^{\rm T} \boldsymbol{A} \boldsymbol{q}_1} \boldsymbol{q}_1 \\ \vdots \\ \boldsymbol{q}_n = \boldsymbol{p}_n - \dfrac{\boldsymbol{p}_n^{\rm T} \boldsymbol{A} \boldsymbol{q}_1}{\boldsymbol{q}_1^{\rm T} \boldsymbol{A} \boldsymbol{q}_1} \boldsymbol{q}_1 - \dfrac{\boldsymbol{p}_n^{\rm T} \boldsymbol{A} \boldsymbol{q}_2}{\boldsymbol{q}_2^{\rm T} \boldsymbol{A} \boldsymbol{q}_2} \boldsymbol{q}_2 - \cdots - \dfrac{\boldsymbol{p}_n^{\rm T} \boldsymbol{A} \boldsymbol{q}_{n-1}}{\boldsymbol{q}_{n-1}^{\rm T} \boldsymbol{A} \boldsymbol{q}_{n-1}} \boldsymbol{q}_{n-1} \end{cases}$$ The general term can be written as $$\boldsymbol{q}_i = \boldsymbol{p}_i - \sum_{j=1}^{i-1} \dfrac{\boldsymbol{p}_i^{\rm T} \boldsymbol{A} \boldsymbol{q}_j}{\boldsymbol{q}_j^{\rm T} \boldsymbol{A} \boldsymbol{q}_j} \boldsymbol{q}_j$$
[Example 2] Find a set of conjugate directions of the matrix $\boldsymbol{M} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 0 \\ 1 & 0 & 3 \end{bmatrix}$.
[Solution] The matrix $\boldsymbol{M}$ is positive definite (its leading principal minors are $1$, $1$, $1$). Applying the procedure to the standard basis $\boldsymbol{e}_1, \boldsymbol{e}_2, \boldsymbol{e}_3$ gives $$\begin{cases} \boldsymbol{a} = \boldsymbol{e}_1 = (1, 0, 0)^{\rm T} \\ \boldsymbol{b} = \boldsymbol{e}_2 - \dfrac{\boldsymbol{e}_2^{\rm T} \boldsymbol{M} \boldsymbol{a}}{\boldsymbol{a}^{\rm T} \boldsymbol{M} \boldsymbol{a}} \boldsymbol{a} = (-1, 1, 0)^{\rm T} \\ \boldsymbol{c} = \boldsymbol{e}_3 - \dfrac{\boldsymbol{e}_3^{\rm T} \boldsymbol{M} \boldsymbol{a}}{\boldsymbol{a}^{\rm T} \boldsymbol{M} \boldsymbol{a}} \boldsymbol{a} - \dfrac{\boldsymbol{e}_3^{\rm T} \boldsymbol{M} \boldsymbol{b}}{\boldsymbol{b}^{\rm T} \boldsymbol{M} \boldsymbol{b}} \boldsymbol{b} = (-2, 1, 1)^{\rm T} \end{cases}$$ so $\boldsymbol{a}, \boldsymbol{b}, \boldsymbol{c}$ form a set of conjugate directions of $\boldsymbol{M}$.
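The conjugation procedure can be sketched in code as follows. This is a minimal illustration for a fixed dimension `N = 3`; the names `bilinear` and `conjugate` are hypothetical, and vectors are stored as matrix rows for convenience.

```cpp
#include <cassert>

typedef double var;
const int N = 3;

// Returns u^T A v for N-dimensional vectors u and v.
var bilinear(const var u[N], const var A[N][N], const var v[N])
{
    var s = 0;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            s += u[i] * A[i][j] * v[j];
    return s;
}

// Turns the linearly independent rows p[0..N-1] into rows q[0..N-1]
// that are mutually conjugate with respect to the positive definite matrix A.
void conjugate(const var A[N][N], const var p[N][N], var q[N][N])
{
    for (int i = 0; i < N; ++i) {
        for (int k = 0; k < N; ++k)
            q[i][k] = p[i][k];
        for (int j = 0; j < i; ++j) {
            var denom = bilinear(q[j], A, q[j]);
            assert(denom > 0);  // holds when A is positive definite and q[j] != 0
            var c = bilinear(p[i], A, q[j]) / denom;
            for (int k = 0; k < N; ++k)
                q[i][k] -= c * q[j][k];
        }
    }
}
```

Called with the matrix of Example 2 and the identity matrix as `p`, the rows of `q` come out as $(1, 0, 0)$, $(-1, 1, 0)$ and $(-2, 1, 1)$.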
The forward-backward (advance-retreat) method determines the initial search interval. The algorithm is described below in C/C++:
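typedef double var;
// Input: objective function f, starting point x0, initial step size step
// Output: search interval [*a, *b]
void forwardBackward(var (*f)(var), var x0, var step, var *a, var *b)
{
    *b = x0 + step;
    if (f(*b) > f(x0))      // f increases in this direction: search the other way
        step *= -1;
    while (1) {
        *a = x0 + step;
        if (f(*a) > f(x0))  // f rises again: the minimizer is bracketed
            break;
        *b = x0;            // keep the previous point as the far endpoint
        x0 = *a;
        step *= 2;          // double the step size
    }
    if (*a > *b) {          // return the endpoints in increasing order
        var t = *a;
        *a = *b;
        *b = t;
    }
}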
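The idea of the bisection method: at each step take the two endpoints and the midpoint, and use the sign of the derivative at the midpoint to choose the interval for the next iteration. The algorithm is described below in C/C++: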
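typedef double var;
// Input: derivative df of the objective function, initial interval [a, b], tolerance e
// Output: approximate minimizer x
var equalizationSplit(var (*df)(var), var a, var b, var e)
{
    var d, x;
    while (b - a > e) {
        d = df(x = (a + b) / 2);
        if (d > 0)
            b = x;  // f is increasing at x: the minimizer lies in [a, x]
        else
            a = x;  // f is not increasing at x: the minimizer lies in [x, b]
    }
    return (a + b) / 2;  // midpoint of the final interval
}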
[Example 3] Use the bisection method to find the minimizer of $f(x) = e^x - 2x - 1$ on the search interval $[-1, 1]$; three iterations suffice.
[Solution] The derivative of $f(x)$ is $f'(x) = e^x - 2$.
Iteration 1: $a = -1$, $b = 1$, $\dfrac{a+b}{2} = 0$, $f'(0) = -1 < 0$, update $a = 0$;
Iteration 2: $a = 0$, $b = 1$, $\dfrac{a+b}{2} = \dfrac{1}{2}$, $f'\left(\dfrac{1}{2}\right) = \sqrt{e} - 2 < 0$, update $a = \dfrac{1}{2}$;
Iteration 3: $a = \dfrac{1}{2}$, $b = 1$, $\dfrac{a+b}{2} = \dfrac{3}{4}$, $f'\left(\dfrac{3}{4}\right) = \sqrt[4]{e^3} - 2 > 0$, update $b = \dfrac{3}{4}$.
So the minimizer is approximately $\dfrac{a+b}{2} = 0.625$.
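The three iterations of Example 3 can be reproduced with a short sketch. The function `bisectFixed` is a hypothetical variant of the bisection routine that stops after a fixed number of iterations instead of testing a tolerance; `dfExample3` is the derivative $f'(x) = e^x - 2$.

```cpp
#include <cassert>
#include <cmath>

typedef double var;

// Derivative of the objective function from Example 3: f'(x) = e^x - 2.
var dfExample3(var x)
{
    return std::exp(x) - 2;
}

// Hypothetical variant of the bisection routine: runs a fixed number of
// iterations and returns the midpoint of the final interval.
var bisectFixed(var (*df)(var), var a, var b, int iterations)
{
    assert(a < b && iterations >= 0);
    for (int k = 0; k < iterations; ++k) {
        var x = (a + b) / 2;
        if (df(x) > 0)
            b = x;  // the minimizer lies to the left of x
        else
            a = x;  // the minimizer lies to the right of x
    }
    return (a + b) / 2;
}
```

`bisectFixed(dfExample3, -1, 1, 3)` returns `0.625`; with more iterations the result approaches the exact minimizer $\ln 2 \approx 0.693$.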
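The idea of the 0.618 method (golden-section search): at each step compare the function values at two interior points and discard a fraction $0.382$ of the interval, so that the interval shrinks to $0.618$ of its previous length. The algorithm is described below in C/C++: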
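typedef double var;
// Input: objective function f, initial interval [a, b], tolerance e
// Output: approximate minimizer x
var goldenSection(var (*f)(var), var a, var b, var e)
{
    var len = b - a;
    // (sqrt(5) - 1) / 2, usually rounded to 0.618; the full value keeps the
    // reused interior points consistent over many iterations
    const var r = 0.6180339887498949;
    var left = b - r * len;
    var right = a + r * len;
    while (len > e) {
        if (f(left) > f(right)) {
            a = left;            // discard [a, left]
            len = b - a;
            left = right;        // the old right point becomes the new left point
            right = a + r * len;
        } else {
            b = right;           // discard [right, b]
            len = b - a;
            right = left;        // the old left point becomes the new right point
            left = b - r * len;
        }
    }
    return (a + b) / 2;
}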
The idea of Newton's method: approximate $f(x)$ by its second-order Taylor expansion at the current point and minimize that approximation. The iteration formula is $$x_{i+1} = x_i - \dfrac{f'(x_i)}{f''(x_i)}$$
[Example 4] Use Newton's method to find the minimizer of $f(x) = 3x^4 - 16x^3 + 30x^2 - 24x + 8$, starting from $x_0 = 3$; three iterations suffice.
[Solution] Let $\varphi(x) = \dfrac{f'(x)}{f''(x)} = \dfrac{x^3 - 4x^2 + 5x - 2}{3x^2 - 8x + 5}$.
Iteration 1: $x_1 = x_0 - \varphi(x_0) = 3 - \varphi(3) = \dfrac{5}{2}$;
Iteration 2: $x_2 = x_1 - \varphi(x_1) = \dfrac{5}{2} - \varphi\left(\dfrac{5}{2}\right) = \dfrac{11}{5}$;
Iteration 3: $x_3 = x_2 - \varphi(x_2) = \dfrac{11}{5} - \varphi\left(\dfrac{11}{5}\right) = \dfrac{41}{20}$.
So the minimizer is approximately $2.05$.
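The Newton iteration of Example 4 can be sketched as below. The names `df4`, `d2f4` and `newton1d` are hypothetical; the derivatives are written out by hand for this particular $f$, and the loop runs a fixed number of steps rather than testing convergence.

```cpp
#include <cassert>
#include <cmath>

typedef double var;

// First and second derivatives of f(x) = 3x^4 - 16x^3 + 30x^2 - 24x + 8.
var df4(var x)  { return 12 * x * x * x - 48 * x * x + 60 * x - 24; }
var d2f4(var x) { return 36 * x * x - 96 * x + 60; }

// Runs a fixed number of Newton steps x <- x - f'(x) / f''(x).
var newton1d(var (*df)(var), var (*d2f)(var), var x, int iterations)
{
    for (int k = 0; k < iterations; ++k) {
        var curvature = d2f(x);
        assert(curvature != 0);  // the step is undefined when f''(x) = 0
        x -= df(x) / curvature;
    }
    return x;
}
```

`newton1d(df4, d2f4, 3, 3)` gives approximately `2.05`, matching $x_3 = \dfrac{41}{20}$; further iterations approach the stationary point $x = 2$.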
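The parabola method is similar in spirit to Newton's method: both fit the objective function with a quadratic. Newton's method builds the quadratic from the function value, first derivative and second derivative at a single point, whereas the parabola method builds it from the function values at three points.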
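One step of the parabola method amounts to finding the vertex of the parabola through three points. The sketch below uses the standard closed-form expression for that vertex; the function name `parabolaVertex` is hypothetical, and the code does not check whether the fitted parabola opens upward (if it opens downward, the vertex is a maximum).

```cpp
#include <cassert>
#include <cmath>

typedef double var;

// Returns the abscissa of the vertex of the parabola through
// (x1, f1), (x2, f2), (x3, f3).
var parabolaVertex(var x1, var f1, var x2, var f2, var x3, var f3)
{
    var num = (x2 - x1) * (x2 - x1) * (f2 - f3) - (x2 - x3) * (x2 - x3) * (f2 - f1);
    var den = (x2 - x1) * (f2 - f3) - (x2 - x3) * (f2 - f1);
    assert(den != 0);  // den == 0 means the three points are collinear
    return x2 - 0.5 * num / den;
}
```

For an exactly quadratic function a single step recovers the minimizer: with $f(x) = (x - 2)^2 + 1$ sampled at $x = 0, 1, 4$, `parabolaVertex(0, 5, 1, 2, 4, 5)` returns `2`.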
For the one-dimensional search problem $\min\limits_{\alpha} f(\boldsymbol{x} + \alpha \boldsymbol{d})$, an inexact line search only requires:
For the unconstrained multivariate optimization problem $\min f(\boldsymbol{x})$, the update formula of descent algorithms can be written in the unified form $$\boldsymbol{x}_{i+1} = \boldsymbol{x}_i - \alpha_i \boldsymbol{H}_i \nabla f(\boldsymbol{x}_i)$$ with descent direction $$\boldsymbol{d}_i = -\boldsymbol{H}_i \nabla f(\boldsymbol{x}_i)$$ and termination condition $\boldsymbol{d}_i = \boldsymbol{0}$.
Steepest descent: $\boldsymbol{H}_i \equiv \boldsymbol{E}$ (the identity matrix), with $\alpha_i$ determined by a one-dimensional search;
Newton's method: $\boldsymbol{H}_i = \left( \nabla^2 f(\boldsymbol{x}_i) \right)^{-1}$, $\alpha_i \equiv 1$;
Damped Newton's method: $\boldsymbol{H}_i = \left( \nabla^2 f(\boldsymbol{x}_i) \right)^{-1}$, with $\alpha_i$ determined by a one-dimensional search.
[Example 5] Use the damped Newton's method to find the minimum of $f(x_1, x_2) = 4x_1^2 + x_2^2 - 8x_1 - 4x_2$, starting from the origin.
[Solution] $\nabla f(x_1, x_2) = \begin{bmatrix} 8x_1 - 8 \\ 2x_2 - 4 \end{bmatrix}$, and $\nabla^2 f(x_1, x_2) = \begin{bmatrix} 8 & 0 \\ 0 & 2 \end{bmatrix}$ is positive definite, so $\boldsymbol{H} = \left[ \nabla^2 f(x_1, x_2) \right]^{-1} = \dfrac{1}{8} \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix}$.
$\boldsymbol{x}_0 = [0, 0]^{\rm T}$, $\nabla f(\boldsymbol{x}_0) = [-8, -4]^{\rm T}$, $\boldsymbol{H} \nabla f(\boldsymbol{x}_0) = [-1, -2]^{\rm T}$,
solve the one-dimensional search problem $\min f(\boldsymbol{x}_0 - \alpha_0 \boldsymbol{H} \nabla f(\boldsymbol{x}_0)) = 8\alpha_0^2 - 16\alpha_0$, which gives $\alpha_0 = 1$,
update $\boldsymbol{x}_1 = \boldsymbol{x}_0 - \alpha_0 \boldsymbol{H} \nabla f(\boldsymbol{x}_0) = [1, 2]^{\rm T}$,
and since $\nabla f(\boldsymbol{x}_1) = [0, 0]^{\rm T}$, the point $[1, 2]^{\rm T}$ is the minimizer and the minimum is $f(1, 2) = -8$.
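The single step of Example 5 can be traced in code. The sketch below is specific to this quadratic objective: the function name `dampedNewtonStep` is hypothetical, the Hessian $\mathrm{diag}(8, 2)$ is hard-coded, and the one-dimensional search is replaced by the closed-form exact step for a quadratic, $\alpha = -\dfrac{\nabla f^{\rm T} \boldsymbol{d}}{\boldsymbol{d}^{\rm T} \nabla^2 f \, \boldsymbol{d}}$, which is an assumption that only holds for quadratic functions.

```cpp
#include <cassert>

typedef double var;

// One damped Newton step for f(x1, x2) = 4*x1^2 + x2^2 - 8*x1 - 4*x2.
// The Hessian is the constant matrix diag(8, 2), so its inverse is diag(1/8, 1/2).
// x is updated in place; the step length alpha is returned.
var dampedNewtonStep(var x[2])
{
    assert(x != nullptr);
    const var hess[2] = {8, 2};                     // diagonal of the Hessian
    var g[2] = {8 * x[0] - 8, 2 * x[1] - 4};        // gradient at x
    var d[2] = {-g[0] / hess[0], -g[1] / hess[1]};  // d = -H * gradient
    // Exact line search for a quadratic: alpha = -(g^T d) / (d^T Hess d).
    var gd = g[0] * d[0] + g[1] * d[1];
    var dQd = hess[0] * d[0] * d[0] + hess[1] * d[1] * d[1];
    if (dQd == 0)
        return 0;                                   // zero gradient: already optimal
    var alpha = -gd / dQd;
    x[0] += alpha * d[0];
    x[1] += alpha * d[1];
    return alpha;
}
```

Starting from the origin, one call returns $\alpha_0 = 1$ and moves `x` to $(1, 2)$; a second call finds a zero gradient and leaves `x` unchanged.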
Let $$\begin{cases} \Delta \boldsymbol{x}_i = \boldsymbol{x}_{i+1} - \boldsymbol{x}_i \\ \Delta \boldsymbol{y}_i = \nabla f(\boldsymbol{x}_{i+1}) - \nabla f(\boldsymbol{x}_i) \end{cases}$$ and let $$\boldsymbol{v}_i = \dfrac{\Delta \boldsymbol{x}_i}{\Delta \boldsymbol{x}_i^{\rm T} \Delta \boldsymbol{y}_i} - \dfrac{\boldsymbol{H}_i \Delta \boldsymbol{y}_i}{\Delta \boldsymbol{y}_i^{\rm T} \boldsymbol{H}_i \Delta \boldsymbol{y}_i}$$ The quasi-Newton update formula for $\boldsymbol{H}_i$ is $$\boldsymbol{H}_{i+1} = \boldsymbol{H}_i + \dfrac{\Delta \boldsymbol{x}_i \Delta \boldsymbol{x}_i^{\rm T}}{\Delta \boldsymbol{x}_i^{\rm T} \Delta \boldsymbol{y}_i} - \dfrac{\boldsymbol{H}_i \Delta \boldsymbol{y}_i \Delta \boldsymbol{y}_i^{\rm T} \boldsymbol{H}_i}{\Delta \boldsymbol{y}_i^{\rm T} \boldsymbol{H}_i \Delta \boldsymbol{y}_i} + \rho \, \Delta \boldsymbol{y}_i^{\rm T} \boldsymbol{H}_i \Delta \boldsymbol{y}_i \, \boldsymbol{v}_i \boldsymbol{v}_i^{\rm T}$$ The initial matrix is $\boldsymbol{H}_0 = \boldsymbol{E}$, and $\alpha_i$ is determined by a one-dimensional search.
When $\rho = 0$, this is the DFP algorithm;
when $\rho = 1$, this is the BFGS algorithm.
[Example 6] Use the DFP algorithm to find the minimum of $f(x_1, x_2) = x_1^2 + 3x_2^2 + 2x_1x_2 - x_1 + x_2$, starting from the origin.
[Solution] $\nabla f(x_1, x_2) = [2x_1 + 2x_2 - 1, 2x_1 + 6x_2 + 1]^{\rm T}$
Iteration 1:
$\boldsymbol{H}_0 = \boldsymbol{E}$, $\nabla f(\boldsymbol{x}_0) = [-1, 1]^{\rm T}$, $\boldsymbol{H}_0 \nabla f(\boldsymbol{x}_0) = [-1, 1]^{\rm T}$,
solve the one-dimensional search problem $\min f(\boldsymbol{x}_0 - \alpha_0 \boldsymbol{H}_0 \nabla f(\boldsymbol{x}_0)) = 2\alpha_0^2 - 2\alpha_0$, which gives $\alpha_0 = \dfrac{1}{2}$,
update $\boldsymbol{x}_1 = \left[ \dfrac{1}{2}, -\dfrac{1}{2} \right]^{\rm T}$, $\nabla f(\boldsymbol{x}_1) = [-1, -1]^{\rm T} \ne \boldsymbol{0}$;
Iteration 2:
$\Delta \boldsymbol{x}_0 = \left[ \dfrac{1}{2}, -\dfrac{1}{2} \right]^{\rm T}$, $\Delta \boldsymbol{y}_0 = \nabla f(\boldsymbol{x}_1) - \nabla f(\boldsymbol{x}_0) = [0, -2]^{\rm T}$,
$\boldsymbol{H}_1 = \boldsymbol{H}_0 + \dfrac{\Delta \boldsymbol{x}_0 \Delta \boldsymbol{x}_0^{\rm T}}{\Delta \boldsymbol{x}_0^{\rm T} \Delta \boldsymbol{y}_0} - \dfrac{\boldsymbol{H}_0 \Delta \boldsymbol{y}_0 \Delta \boldsymbol{y}_0^{\rm T} \boldsymbol{H}_0}{\Delta \boldsymbol{y}_0^{\rm T} \boldsymbol{H}_0 \Delta \boldsymbol{y}_0} = \dfrac{1}{4} \begin{bmatrix} 5 & -1 \\ -1 & 1 \end{bmatrix}$, $\boldsymbol{H}_1 \nabla f(\boldsymbol{x}_1) = [-1, 0]^{\rm T}$,
solve the one-dimensional search problem $\min f(\boldsymbol{x}_1 - \alpha_1 \boldsymbol{H}_1 \nabla f(\boldsymbol{x}_1)) = \alpha_1^2 - \alpha_1 - \dfrac{1}{2}$, which gives $\alpha_1 = \dfrac{1}{2}$,
update $\boldsymbol{x}_2 = \left[ 1, -\dfrac{1}{2} \right]^{\rm T}$, $\nabla f(\boldsymbol{x}_2) = [0, 0]^{\rm T}$, stop.
So the minimizer is $\left[ 1, -\dfrac{1}{2} \right]^{\rm T}$ and the minimum is $f\left( 1, -\dfrac{1}{2} \right) = -\dfrac{3}{4}$.
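The two DFP iterations of Example 6 can be replayed with the following sketch. It is tailored to this example: the names `grad6` and `dfp6` are hypothetical, the constant Hessian $\begin{bmatrix} 2 & 2 \\ 2 & 6 \end{bmatrix}$ is used only to compute the exact step length for a quadratic (standing in for the one-dimensional search), and no safeguard is included for a vanishing denominator in the update.

```cpp
#include <cassert>
#include <cmath>

typedef double var;

// Gradient of f(x1, x2) = x1^2 + 3*x2^2 + 2*x1*x2 - x1 + x2.
void grad6(const var x[2], var g[2])
{
    g[0] = 2 * x[0] + 2 * x[1] - 1;
    g[1] = 2 * x[0] + 6 * x[1] + 1;
}

// Minimizes the quadratic of Example 6 with DFP, starting from x and H = E.
// The exact line search uses the constant Hessian Q = [[2, 2], [2, 6]].
// Returns the number of iterations performed.
int dfp6(var x[2], var eps, int maxIter)
{
    assert(x != nullptr && eps > 0);
    const var Q[2][2] = {{2, 2}, {2, 6}};
    var H[2][2] = {{1, 0}, {0, 1}};
    var g[2];
    grad6(x, g);
    int k = 0;
    while (k < maxIter && std::sqrt(g[0] * g[0] + g[1] * g[1]) > eps) {
        // Search direction d = -H * g.
        var d[2] = {-(H[0][0] * g[0] + H[0][1] * g[1]),
                    -(H[1][0] * g[0] + H[1][1] * g[1])};
        // Exact step for a quadratic: alpha = -(g^T d) / (d^T Q d).
        var Qd[2] = {Q[0][0] * d[0] + Q[0][1] * d[1],
                     Q[1][0] * d[0] + Q[1][1] * d[1]};
        var alpha = -(g[0] * d[0] + g[1] * d[1]) / (d[0] * Qd[0] + d[1] * Qd[1]);
        var dx[2] = {alpha * d[0], alpha * d[1]};
        x[0] += dx[0];
        x[1] += dx[1];
        var gNew[2];
        grad6(x, gNew);
        var dy[2] = {gNew[0] - g[0], gNew[1] - g[1]};
        // DFP update: H += dx dx^T / (dx^T dy) - (H dy)(H dy)^T / (dy^T H dy).
        var Hdy[2] = {H[0][0] * dy[0] + H[0][1] * dy[1],
                      H[1][0] * dy[0] + H[1][1] * dy[1]};
        var dxdy = dx[0] * dy[0] + dx[1] * dy[1];
        var dyHdy = dy[0] * Hdy[0] + dy[1] * Hdy[1];
        for (int i = 0; i < 2; ++i)
            for (int j = 0; j < 2; ++j)
                H[i][j] += dx[i] * dx[j] / dxdy - Hdy[i] * Hdy[j] / dyHdy;
        g[0] = gNew[0];
        g[1] = gNew[1];
        ++k;
    }
    return k;
}
```

Starting from the origin, the routine stops after two iterations at $\left( 1, -\dfrac{1}{2} \right)$, in line with the finite-termination behaviour of quasi-Newton methods with exact line search on a two-variable quadratic.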
The update formula of the conjugate gradient method is again $$\boldsymbol{x}_{i+1} = \boldsymbol{x}_i - \alpha_i \boldsymbol{H}_i \nabla f(\boldsymbol{x}_i) = \boldsymbol{x}_i + \alpha_i \boldsymbol{d}_i$$ where $\alpha_i$ is determined by a one-dimensional search. The descent direction is updated by $$\boldsymbol{d}_{i+1} = \dfrac{\left[ \nabla f(\boldsymbol{x}_{i+1}) \right]^{\rm T} \nabla f(\boldsymbol{x}_{i+1})}{\left[ \nabla f(\boldsymbol{x}_i) \right]^{\rm T} \nabla f(\boldsymbol{x}_i)} \boldsymbol{d}_i - \nabla f(\boldsymbol{x}_{i+1})$$ and the initial descent direction is the negative gradient, $\boldsymbol{d}_0 = -\nabla f(\boldsymbol{x}_0)$.
[Example 7] Use the conjugate gradient method to find the minimum of $f(x_1, x_2) = x_1^2 + x_2^2 - x_1x_2 - 3x_1 + 3$, starting from the origin.
[Solution] $\nabla f(x_1, x_2) = [2x_1 - x_2 - 3, 2x_2 - x_1]^{\rm T}$
Iteration 1:
$\boldsymbol{d}_0 = -\nabla f(0, 0) = [3, 0]^{\rm T}$,
solve the one-dimensional search problem $\min f(\boldsymbol{x}_0 + \alpha_0 \boldsymbol{d}_0) = 9\alpha_0^2 - 9\alpha_0 + 3$, which gives $\alpha_0 = \dfrac{1}{2}$,
update $\boldsymbol{x}_1 = \boldsymbol{x}_0 + \alpha_0 \boldsymbol{d}_0 = \left[ \dfrac{3}{2}, 0 \right]^{\rm T}$, $\nabla f(\boldsymbol{x}_1) = \left[ 0, -\dfrac{3}{2} \right]^{\rm T} \ne \boldsymbol{0}$;
Iteration 2:
$\boldsymbol{d}_1 = \dfrac{\left[ \nabla f(\boldsymbol{x}_1) \right]^{\rm T} \nabla f(\boldsymbol{x}_1)}{\left[ \nabla f(\boldsymbol{x}_0) \right]^{\rm T} \nabla f(\boldsymbol{x}_0)} \boldsymbol{d}_0 - \nabla f(\boldsymbol{x}_1) = \left[ \dfrac{3}{4}, \dfrac{3}{2} \right]^{\rm T}$,
solve the one-dimensional search problem $\min f(\boldsymbol{x}_1 + \alpha_1 \boldsymbol{d}_1) = \dfrac{27}{16}\alpha_1^2 - \dfrac{9}{4}\alpha_1 + \dfrac{3}{4}$, which gives $\alpha_1 = \dfrac{2}{3}$,
update $\boldsymbol{x}_2 = \boldsymbol{x}_1 + \alpha_1 \boldsymbol{d}_1 = [2, 1]^{\rm T}$, $\nabla f(\boldsymbol{x}_2) = [0, 0]^{\rm T}$, stop.
So the minimizer is $[2, 1]^{\rm T}$ and the minimum is $f(2, 1) = 0$.
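The conjugate gradient iterations of Example 7 can be sketched as follows. The names `grad7` and `conjugateGradient7` are hypothetical, the direction update is the ratio-of-squared-gradient-norms rule stated above (commonly known as the Fletcher-Reeves formula), and the one-dimensional search is replaced by the exact step for a quadratic using the constant Hessian $\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$, which is an assumption specific to this example.

```cpp
#include <cassert>
#include <cmath>

typedef double var;

// Gradient of f(x1, x2) = x1^2 + x2^2 - x1*x2 - 3*x1 + 3.
void grad7(const var x[2], var g[2])
{
    g[0] = 2 * x[0] - x[1] - 3;
    g[1] = 2 * x[1] - x[0];
}

// Conjugate gradient on the quadratic of Example 7.
// The exact line search uses the constant Hessian Q = [[2, -1], [-1, 2]].
// Returns the number of iterations performed.
int conjugateGradient7(var x[2], var eps, int maxIter)
{
    assert(x != nullptr && eps > 0);
    const var Q[2][2] = {{2, -1}, {-1, 2}};
    var g[2];
    grad7(x, g);
    var d[2] = {-g[0], -g[1]};           // d_0 = -gradient
    var gg = g[0] * g[0] + g[1] * g[1];  // squared gradient norm
    int k = 0;
    while (k < maxIter && std::sqrt(gg) > eps) {
        var Qd[2] = {Q[0][0] * d[0] + Q[0][1] * d[1],
                     Q[1][0] * d[0] + Q[1][1] * d[1]};
        // Exact step for a quadratic: alpha = -(g^T d) / (d^T Q d).
        var alpha = -(g[0] * d[0] + g[1] * d[1]) / (d[0] * Qd[0] + d[1] * Qd[1]);
        x[0] += alpha * d[0];
        x[1] += alpha * d[1];
        grad7(x, g);
        var ggNew = g[0] * g[0] + g[1] * g[1];
        var beta = ggNew / gg;           // ratio of squared gradient norms
        d[0] = beta * d[0] - g[0];
        d[1] = beta * d[1] - g[1];
        gg = ggNew;
        ++k;
    }
    return k;
}
```

Starting from the origin with a tolerance such as `1e-9`, the routine stops after two iterations near $(2, 1)$, matching the hand computation.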