[Optimization Methods Study Notes] Chapter 2: Unconstrained Optimization

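Chapter contents

  • 1. Convergence rate of a sequence
  • 2. Conjugate directions
    • 2.1 Conjugacy and conjugate direction sets
    • 2.2 Properties of conjugate direction sets
    • 2.3 Constructing a conjugate direction set
  • 3. One-dimensional search
    • 3.1 Advance-and-retreat method
    • 3.2 Exact one-dimensional search
      • 3.2.1 Bisection method
      • 3.2.2 Golden-section method (0.618 method)
      • 3.2.3 Newton's method
      • 3.2.4 Parabolic interpolation method
    • 3.3 Inexact one-dimensional search
  • 4. Descent algorithms for multivariate functions
    • 4.1 Steepest descent, Newton's method, and the damped Newton method
    • 4.2 Quasi-Newton methods (variable metric methods)
    • 4.3 Conjugate gradient method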

1. Convergence rate of a sequence

Suppose the sequence $\left\lbrace \boldsymbol{x}_k \right\rbrace$ converges to $\boldsymbol{x}^*$, and let $\gamma \ge 1$ and $\beta > 0$. If
$$\lim_{k \to \infty} \dfrac{\left\Vert \boldsymbol{x}_{k+1} - \boldsymbol{x}^* \right\Vert}{\left\Vert \boldsymbol{x}_k - \boldsymbol{x}^* \right\Vert^\gamma} = \beta$$
then the sequence $\left\lbrace \boldsymbol{x}_k \right\rbrace$ is said to converge with order $\gamma$.
If $\gamma = 1$ (with $0 < \beta < 1$), the sequence $\left\lbrace \boldsymbol{x}_k \right\rbrace$ is called linearly convergent;
If $\gamma = 2$, the sequence $\left\lbrace \boldsymbol{x}_k \right\rbrace$ is called quadratically convergent (second-order convergent);
If $1 < \gamma < 2$, the sequence $\left\lbrace \boldsymbol{x}_k \right\rbrace$ is called superlinearly convergent. More generally, a sequence is also called superlinearly convergent whenever the ratio above with $\gamma = 1$ tends to $0$.

[Example 1] Let $x_n = \left( \dfrac{1}{n} \right)^n$. Determine the convergence rate of the sequence $\left\lbrace x_n \right\rbrace$.
[Solution] The sequence $\left\lbrace x_n \right\rbrace$ converges to
$$x^* = \lim_{n \to \infty} \left( \dfrac{1}{n} \right)^n = \lim_{n \to \infty} e^{-n \ln n} = 0.$$
For any $\gamma \ge 1$,
$$\dfrac{x_{n+1}}{x_n^\gamma} = \dfrac{n^{\gamma n}}{(n+1)^{n+1}} = \dfrac{e^{\gamma n \ln n}}{e^{(n+1) \ln (n+1)}}.$$
For $\gamma = 1$ this equals $\left( \dfrac{n}{n+1} \right)^n \cdot \dfrac{1}{n+1} \to e^{-1} \cdot 0 = 0$. For $\gamma > 1$ the exponent $\gamma n \ln n - (n+1) \ln (n+1) = (\gamma - 1) n \ln n - n \ln \left( 1 + \dfrac{1}{n} \right) - \ln (n+1)$ tends to $+\infty$, so the ratio diverges. Hence no $\gamma \ge 1$ gives a finite limit $\beta > 0$: the sequence $\left\lbrace x_n \right\rbrace$ converges faster than any linearly convergent sequence (it is superlinearly convergent in the sense that the ratio with $\gamma = 1$ tends to $0$), but it does not attain any order $\gamma > 1$.

2. Conjugate directions

2.1 Conjugacy and conjugate direction sets

Let $\boldsymbol{A}$ be an $n \times n$ symmetric matrix. If the $n$-dimensional column vectors $\boldsymbol{p}$ and $\boldsymbol{q}$ satisfy $\boldsymbol{p}^{\rm T} \boldsymbol{A} \boldsymbol{q} = 0$, then $\boldsymbol{p}$ and $\boldsymbol{q}$ are said to be conjugate with respect to the matrix $\boldsymbol{A}$.
If the nonzero $n$-dimensional column vectors $\boldsymbol{p}_1, \boldsymbol{p}_2, \cdots, \boldsymbol{p}_m$ satisfy $\boldsymbol{p}_i^{\rm T} \boldsymbol{A} \boldsymbol{p}_j = 0$ for all $i \ne j$, then the set $\boldsymbol{p}_1, \boldsymbol{p}_2, \cdots, \boldsymbol{p}_m$ is said to be conjugate with respect to $\boldsymbol{A}$, and is also called a conjugate direction set of $\boldsymbol{A}$.

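2.2 Properties of conjugate direction sets

Let $\boldsymbol{A}$ be an $n \times n$ positive definite matrix. Then the following two statements hold:

  1. Every conjugate direction set of $\boldsymbol{A}$ is linearly independent;
  2. If $\boldsymbol{p}_1, \boldsymbol{p}_2, \cdots, \boldsymbol{p}_n$ is a conjugate direction set of $\boldsymbol{A}$ consisting of $n$ vectors, and a vector $\boldsymbol{q}$ satisfies $\boldsymbol{p}_i^{\rm T} \boldsymbol{A} \boldsymbol{q} = 0$ for all $i \in \lbrace 1, 2, \cdots, n \rbrace$, then $\boldsymbol{q} = \boldsymbol{0}$. (With fewer than $n$ vectors this can fail, since a nonzero $\boldsymbol{q}$ conjugate to all of them then exists.)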

2.3 Constructing a conjugate direction set

Let $\boldsymbol{A}$ be an $n \times n$ positive definite matrix and let the $n$-dimensional vectors $\boldsymbol{p}_1, \boldsymbol{p}_2, \cdots, \boldsymbol{p}_n$ be linearly independent. The following procedure generates a conjugate direction set with respect to $\boldsymbol{A}$:
$$\begin{cases} \boldsymbol{q}_1 = \boldsymbol{p}_1 \\ \boldsymbol{q}_2 = \boldsymbol{p}_2 - \dfrac{\boldsymbol{p}_2^{\rm T} \boldsymbol{A} \boldsymbol{q}_1}{\boldsymbol{q}_1^{\rm T} \boldsymbol{A} \boldsymbol{q}_1} \boldsymbol{q}_1 \\ \vdots \\ \boldsymbol{q}_n = \boldsymbol{p}_n - \dfrac{\boldsymbol{p}_n^{\rm T} \boldsymbol{A} \boldsymbol{q}_1}{\boldsymbol{q}_1^{\rm T} \boldsymbol{A} \boldsymbol{q}_1} \boldsymbol{q}_1 - \dfrac{\boldsymbol{p}_n^{\rm T} \boldsymbol{A} \boldsymbol{q}_2}{\boldsymbol{q}_2^{\rm T} \boldsymbol{A} \boldsymbol{q}_2} \boldsymbol{q}_2 - \cdots - \dfrac{\boldsymbol{p}_n^{\rm T} \boldsymbol{A} \boldsymbol{q}_{n-1}}{\boldsymbol{q}_{n-1}^{\rm T} \boldsymbol{A} \boldsymbol{q}_{n-1}} \boldsymbol{q}_{n-1} \end{cases}$$
The general term can be written as
$$\boldsymbol{q}_i = \boldsymbol{p}_i - \sum_{j=1}^{i-1} \dfrac{\boldsymbol{p}_i^{\rm T} \boldsymbol{A} \boldsymbol{q}_j}{\boldsymbol{q}_j^{\rm T} \boldsymbol{A} \boldsymbol{q}_j} \boldsymbol{q}_j$$
[Example 2] Find a conjugate direction set of the matrix $\boldsymbol{M} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 0 \\ 1 & 0 & 3 \end{bmatrix}$.
[Solution] The matrix $\boldsymbol{M}$ is positive definite (its leading principal minors are $1$, $1$, $1$). Starting from the standard basis $\boldsymbol{e}_1, \boldsymbol{e}_2, \boldsymbol{e}_3$,
$$\begin{cases} \boldsymbol{a} = \boldsymbol{e}_1 = (1, 0, 0)^{\rm T} \\ \boldsymbol{b} = \boldsymbol{e}_2 - \dfrac{\boldsymbol{e}_2^{\rm T} \boldsymbol{M} \boldsymbol{a}}{\boldsymbol{a}^{\rm T} \boldsymbol{M} \boldsymbol{a}} \boldsymbol{a} = (-1, 1, 0)^{\rm T} \\ \boldsymbol{c} = \boldsymbol{e}_3 - \dfrac{\boldsymbol{e}_3^{\rm T} \boldsymbol{M} \boldsymbol{a}}{\boldsymbol{a}^{\rm T} \boldsymbol{M} \boldsymbol{a}} \boldsymbol{a} - \dfrac{\boldsymbol{e}_3^{\rm T} \boldsymbol{M} \boldsymbol{b}}{\boldsymbol{b}^{\rm T} \boldsymbol{M} \boldsymbol{b}} \boldsymbol{b} = (-2, 1, 1)^{\rm T} \end{cases}$$
so $\boldsymbol{a}, \boldsymbol{b}, \boldsymbol{c}$ form a conjugate direction set of $\boldsymbol{M}$.
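
The construction in section 2.3 can be sketched in C as follows, here fixed to dimension 3 so that it can reproduce Example 2. The names `bilinear` and `conjugate_directions` and the row-wise storage of the vectors are assumptions made for this illustration. The code assumes that the matrix is positive definite and that the input rows are linearly independent, and it checks neither.

```c
#include <stdio.h>

#define N 3

/* Returns u^T A v for N-dimensional vectors u, v and an N x N matrix A. */
double bilinear(double u[N], double A[N][N], double v[N])
{
	double s = 0.0;
	for (int i = 0; i < N; i++)
		for (int j = 0; j < N; j++)
			s += u[i] * A[i][j] * v[j];
	return s;
}

/* Row i of P is the input vector p_i; row i of Q receives q_i, where
   q_i = p_i - sum_{j<i} (p_i^T A q_j) / (q_j^T A q_j) * q_j. */
void conjugate_directions(double A[N][N], double P[N][N], double Q[N][N])
{
	for (int i = 0; i < N; i++) {
		for (int k = 0; k < N; k++)
			Q[i][k] = P[i][k];
		for (int j = 0; j < i; j++) {
			double c = bilinear(P[i], A, Q[j]) / bilinear(Q[j], A, Q[j]);
			for (int k = 0; k < N; k++)
				Q[i][k] -= c * Q[j][k];
		}
	}
}
```

With `A` set to the matrix of Example 2 and `P` set to the identity, the rows of `Q` come out as (1, 0, 0), (-1, 1, 0) and (-2, 1, 1).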

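3. One-dimensional search

3.1 Advance-and-retreat method

The advance-and-retreat method is used to determine the initial search interval. The algorithm is described below in C/C++: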

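typedef double var;
// Input: objective function f, starting point x0, initial step size step
// Output: search interval [a, b] with a <= b
void forwardBackward(var(*f)(var), var x0, var step, var *a, var *b)
{
	*b = x0 + step;
	if(f(*b) > f(x0))	// the first step goes uphill: search the other way
		step *= -1;
	while(1){
		*a = x0 + step;
		if(f(*a) > f(x0))	// f rises again, so the interval is found
			break;
		*b = x0;
		x0 = *a;
		step *= 2;	// still going downhill: double the step
	}
	if(*a > *b){	// order the endpoints so that a <= b
		var t = *a;
		*a = *b;
		*b = t;
	}
}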

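3.2 Exact one-dimensional search

3.2.1 Bisection method

The idea of the bisection method is to take the midpoint of the current interval at each step and use the sign of the derivative there to decide which half to keep for the next iteration. The algorithm is described below in C/C++: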

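typedef double var;
// Input: derivative df of the objective function, initial interval [a, b], tolerance e
// Output: approximate minimizer x
var equalizationSplit(var(*df)(var), var a, var b, var e)
{
	var d, x;
	while(b - a > e){
		x = (a + b) / 2;
		d = df(x);
		if(d > 0)	// f is increasing at x: keep the left half
			b = x;
		else	// otherwise keep the right half
			a = x;
	}
	return (a + b) / 2;	// midpoint of the final interval
}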

[Example 3] Use the bisection method to find the minimizer of $f(x) = e^x - 2x - 1$ on the search interval $[-1, 1]$; $3$ iterations suffice.

[Solution] The derivative of $f(x)$ is $f'(x) = e^x - 2$.

Iteration $1$: $a = -1$, $b = 1$, $\dfrac{a+b}{2} = 0$, $f'(0) = -1 < 0$, update $a = 0$;

Iteration $2$: $a = 0$, $b = 1$, $\dfrac{a+b}{2} = \dfrac{1}{2}$, $f'\left( \dfrac{1}{2} \right) = \sqrt{e} - 2 < 0$, update $a = \dfrac{1}{2}$;

Iteration $3$: $a = \dfrac{1}{2}$, $b = 1$, $\dfrac{a+b}{2} = \dfrac{3}{4}$, $f'\left( \dfrac{3}{4} \right) = \sqrt[4]{e^3} - 2 > 0$, update $b = \dfrac{3}{4}$.

So the minimizer is approximately $\dfrac{a+b}{2} = 0.625$ (the exact minimizer is $\ln 2 \approx 0.693$).

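3.2.2 Golden-section method (0.618 method)

The idea of the 0.618 method is to compare the function values at two interior points at each step and discard a fraction of about $0.382$ of the interval, so that the interval shrinks to about $0.618$ of its previous length. The algorithm is described below in C/C++: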

typedef double var;
// Input: objective function f, initial interval [a, b], tolerance e
// Output: approximate minimizer x
var goldenSection(var(*f)(var), var a, var b, var e)
{
	var len = b - a;
	const var r = 0.6180339887498949;	// (sqrt(5) - 1) / 2, the "0.618" ratio
	var left = b - r * len;
	var right = a + r * len;
	while(len > e){
		if(f(left) > f(right)){	// the minimizer lies in [left, b]
			a = left;
			len = b - a;
			left = right;
			right = a + r * len;
		}else{	// the minimizer lies in [a, right]
			b = right;
			len = b - a;
			right = left;
			left = b - r * len;
		}
	}
	return (a + b) / 2;
}

3.2.3 Newton's method

The idea of Newton's method is to approximate $f(x)$ by its second-order Taylor expansion at the current point. The iteration formula is
$$x_{i+1} = x_i - \dfrac{f'(x_i)}{f''(x_i)}$$
[Example 4] Use Newton's method to find the minimizer of $f(x) = 3x^4 - 16x^3 + 30x^2 - 24x + 8$, starting from $x_0 = 3$; $3$ iterations suffice.

[Solution] Let $\varphi(x) = \dfrac{f'(x)}{f''(x)} = \dfrac{x^3 - 4x^2 + 5x - 2}{3x^2 - 8x + 5}$.

Iteration $1$: $x_1 = x_0 - \varphi(x_0) = 3 - \varphi(3) = \dfrac{5}{2}$;

Iteration $2$: $x_2 = x_1 - \varphi(x_1) = \dfrac{5}{2} - \varphi\left( \dfrac{5}{2} \right) = \dfrac{11}{5}$;

Iteration $3$: $x_3 = x_2 - \varphi(x_2) = \dfrac{11}{5} - \varphi\left( \dfrac{11}{5} \right) = \dfrac{41}{20}$.

So the minimizer is approximately $2.05$ (the exact minimizer is $x = 2$, since $f'(x) = 12(x-1)^2(x-2)$).
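
The iteration of Example 4 can be replayed with a few lines of C. This is only a sketch: the function names are hypothetical, the derivatives are hard-coded for this particular $f$, and no safeguard is included for the case $f''(x) = 0$.

```c
#include <math.h>

/* First and second derivatives of f(x) = 3x^4 - 16x^3 + 30x^2 - 24x + 8. */
double d1_example4(double x)
{
	return 12.0 * x * x * x - 48.0 * x * x + 60.0 * x - 24.0;
}

double d2_example4(double x)
{
	return 36.0 * x * x - 96.0 * x + 60.0;
}

/* Apply `steps` Newton updates x <- x - f'(x) / f''(x), starting from x0.
   Assumes f''(x) stays nonzero along the way (not checked here). */
double newton_steps(double (*d1)(double), double (*d2)(double), double x0, int steps)
{
	double x = x0;
	for (int k = 0; k < steps; k++)
		x -= d1(x) / d2(x);
	return x;
}
```

Starting from `x0 = 3.0`, one, two and three steps give approximately 2.5, 2.2 and 2.05, in agreement with the fractions above.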

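3.2.4 Parabolic interpolation method

The idea of the parabolic interpolation method is similar to that of Newton's method: both fit a quadratic function to the objective. Newton's method builds the quadratic from the function value and the first and second derivatives at a single point, whereas the parabolic method builds it from the function values at three points.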
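
The notes give no formula for this method, so the following is a sketch of a single step using the standard closed form for the vertex of the parabola through three points. The function name `parabola_vertex` is an assumption. The vertex is a minimizer of the fitted parabola only when the parabola opens upward, which the code does not check.

```c
#include <math.h>

/* One parabolic-interpolation step: returns the abscissa of the vertex of
   the parabola through (a, fa), (b, fb), (c, fc). The three points must not
   be collinear, otherwise the denominator below is zero. */
double parabola_vertex(double a, double fa, double b, double fb, double c, double fc)
{
	double p = (b - a) * (fb - fc);
	double q = (b - c) * (fb - fa);
	return b - 0.5 * ((b - a) * p - (b - c) * q) / (p - q);
}
```

For a function that is itself quadratic, one step is exact: with $f(x) = (x-2)^2$ sampled at $0$, $1$ and $3$, the call `parabola_vertex(0, 4, 1, 1, 3, 1)` returns 2.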

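3.3 Inexact one-dimensional search

For the one-dimensional search problem $\underset{\alpha}{\min} f(\boldsymbol{x} + \alpha \boldsymbol{d})$, an inexact one-dimensional search only requires that:

  1. the function value at the next iterate is smaller than at the current one;
  2. the directional derivative at the next iterate is larger than at the current one.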
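
One common way to make these two requirements precise is the pair of Wolfe conditions (sufficient decrease and curvature). The notes do not name a specific rule, so treat the formulation below, and the constants $c_1$ and $c_2$ in it, as an assumption on my part:

```latex
f(\boldsymbol{x} + \alpha \boldsymbol{d}) \le f(\boldsymbol{x}) + c_1 \alpha \nabla f(\boldsymbol{x})^{\rm T} \boldsymbol{d},
\qquad
\nabla f(\boldsymbol{x} + \alpha \boldsymbol{d})^{\rm T} \boldsymbol{d} \ge c_2 \nabla f(\boldsymbol{x})^{\rm T} \boldsymbol{d},
\qquad
0 < c_1 < c_2 < 1
```

Here $\boldsymbol{d}$ is a descent direction, so $\nabla f(\boldsymbol{x})^{\rm T} \boldsymbol{d} < 0$: the first inequality forces the function value to drop, and the second forces the directional derivative to increase toward zero.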

4. Descent algorithms for multivariate functions

For the unconstrained multivariate optimization problem
$$\min f(\boldsymbol{x})$$
the update formula of a descent algorithm can be written in the unified form
$$\boldsymbol{x}_{i+1} = \boldsymbol{x}_i - \alpha_i \boldsymbol{H}_i \nabla f(\boldsymbol{x}_i)$$
The descent direction is
$$\boldsymbol{d}_i = -\boldsymbol{H}_i \nabla f(\boldsymbol{x}_i)$$
and the termination condition is $\boldsymbol{d}_i = \boldsymbol{0}$.

4.1 Steepest descent, Newton's method, and the damped Newton method

Steepest descent: $\boldsymbol{H}_i \equiv \boldsymbol{E}$ (the identity matrix), with $\alpha_i$ determined by a one-dimensional search;

Newton's method: $\boldsymbol{H}_i = \left( \nabla^2 f(\boldsymbol{x}_i) \right)^{-1}$, with $\alpha_i \equiv 1$;

Damped Newton method: $\boldsymbol{H}_i = \left( \nabla^2 f(\boldsymbol{x}_i) \right)^{-1}$, with $\alpha_i$ determined by a one-dimensional search.

[Example 5] Use the damped Newton method to find the minimum of $f(x_1, x_2) = 4x_1^2 + x_2^2 - 8x_1 - 4x_2$, starting from the origin.

[Solution] $\nabla f(x_1, x_2) = \begin{bmatrix} 8x_1 - 8 \\ 2x_2 - 4 \end{bmatrix}$, and $\nabla^2 f(x_1, x_2) = \begin{bmatrix} 8 & 0 \\ 0 & 2 \end{bmatrix}$ is positive definite, so $\boldsymbol{H} = \left[ \nabla^2 f(x_1, x_2) \right]^{-1} \equiv \dfrac{1}{8} \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix}$.

$\boldsymbol{x}_0 = [0, 0]^{\rm T}$, $\boldsymbol{H} \nabla f(\boldsymbol{x}_0) = [-1, -2]^{\rm T}$,

Solving the one-dimensional search problem $\underset{\alpha_0}{\min} f(\boldsymbol{x}_0 - \alpha_0 \boldsymbol{H} \nabla f(\boldsymbol{x}_0)) = \underset{\alpha_0}{\min} \left( 8\alpha_0^2 - 16\alpha_0 \right)$ gives $\alpha_0 = 1$,

Update $\boldsymbol{x}_1 = \boldsymbol{x}_0 - \alpha_0 \boldsymbol{H} \nabla f(\boldsymbol{x}_0) = [1, 2]^{\rm T}$,

Since $\nabla f(\boldsymbol{x}_1) = [0, 0]^{\rm T}$, the point $[1, 2]^{\rm T}$ is the minimizer, and the minimum is $f(1, 2) = -8$.
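
For a quadratic objective the damped Newton step can be written out explicitly. The sketch below treats $f(\boldsymbol{x}) = \frac{1}{2} \boldsymbol{x}^{\rm T} \boldsymbol{A} \boldsymbol{x} + \boldsymbol{b}^{\rm T} \boldsymbol{x}$ in two variables, so that Example 5 corresponds to $\boldsymbol{A} = \mathrm{diag}(8, 2)$ and $\boldsymbol{b} = [-8, -4]^{\rm T}$. The function name, the quadratic form, and the closed-form step length $\alpha = -\boldsymbol{g}^{\rm T} \boldsymbol{d} / (\boldsymbol{d}^{\rm T} \boldsymbol{A} \boldsymbol{d})$ are assumptions of this illustration, not part of the original notes.

```c
#include <math.h>

/* One damped Newton step for f(x) = 0.5 x^T A x + b^T x in two variables,
   with A symmetric positive definite and a nonzero gradient at x.
   Overwrites x with the new point and returns the exact step length. */
double damped_newton_step(double A[2][2], const double b[2], double x[2])
{
	/* gradient g = A x + b */
	double g0 = A[0][0] * x[0] + A[0][1] * x[1] + b[0];
	double g1 = A[1][0] * x[0] + A[1][1] * x[1] + b[1];
	/* Newton direction d = -A^{-1} g, using the explicit 2x2 inverse */
	double det = A[0][0] * A[1][1] - A[0][1] * A[1][0];
	double d0 = -(A[1][1] * g0 - A[0][1] * g1) / det;
	double d1 = -(A[0][0] * g1 - A[1][0] * g0) / det;
	/* exact minimizer of f(x + alpha d): alpha = -(g^T d) / (d^T A d) */
	double gd = g0 * d0 + g1 * d1;
	double dAd = d0 * (A[0][0] * d0 + A[0][1] * d1)
	           + d1 * (A[1][0] * d0 + A[1][1] * d1);
	double alpha = -gd / dAd;
	x[0] += alpha * d0;
	x[1] += alpha * d1;
	return alpha;
}
```

On the data of Example 5 the function returns a step length of 1 and moves the origin to (1, 2), which illustrates that for a quadratic objective the damped and the plain Newton step coincide.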

4.2 Quasi-Newton methods (variable metric methods)

Let
$$\begin{cases} \Delta \boldsymbol{x}_i = \boldsymbol{x}_{i+1} - \boldsymbol{x}_i \\ \Delta \boldsymbol{y}_i = \nabla f(\boldsymbol{x}_{i+1}) - \nabla f(\boldsymbol{x}_i) \end{cases}$$
and let
$$\boldsymbol{v}_i = \dfrac{\Delta \boldsymbol{x}_i}{\Delta \boldsymbol{x}_i^{\rm T} \Delta \boldsymbol{y}_i} - \dfrac{\boldsymbol{H}_i \Delta \boldsymbol{y}_i}{\Delta \boldsymbol{y}_i^{\rm T} \boldsymbol{H}_i \Delta \boldsymbol{y}_i}$$
The update formula for $\boldsymbol{H}_i$ in a quasi-Newton method is
$$\boldsymbol{H}_{i+1} = \boldsymbol{H}_i + \dfrac{\Delta \boldsymbol{x}_i \Delta \boldsymbol{x}_i^{\rm T}}{\Delta \boldsymbol{x}_i^{\rm T} \Delta \boldsymbol{y}_i} - \dfrac{\boldsymbol{H}_i \Delta \boldsymbol{y}_i \Delta \boldsymbol{y}_i^{\rm T} \boldsymbol{H}_i}{\Delta \boldsymbol{y}_i^{\rm T} \boldsymbol{H}_i \Delta \boldsymbol{y}_i} + \rho \, \Delta \boldsymbol{y}_i^{\rm T} \boldsymbol{H}_i \Delta \boldsymbol{y}_i \, \boldsymbol{v}_i \boldsymbol{v}_i^{\rm T}$$
The initial matrix is $\boldsymbol{H}_0 = \boldsymbol{E}$, and $\alpha_i$ is determined by a one-dimensional search.
When $\rho = 0$, this is the DFP algorithm;
When $\rho = 1$, this is the BFGS algorithm.

[Example 6] Use the DFP algorithm to find the minimum of $f(x_1, x_2) = x_1^2 + 3x_2^2 + 2x_1x_2 - x_1 + x_2$, starting from the origin.

[Solution] $\nabla f(x_1, x_2) = [2x_1 + 2x_2 - 1, 2x_1 + 6x_2 + 1]^{\rm T}$.

Iteration $1$:

$\boldsymbol{H}_0 = \boldsymbol{E}$, $\nabla f(\boldsymbol{x}_0) = [-1, 1]^{\rm T}$, $\boldsymbol{H}_0 \nabla f(\boldsymbol{x}_0) = [-1, 1]^{\rm T}$,

Solving the one-dimensional search problem $\underset{\alpha_0}{\min} f(\boldsymbol{x}_0 - \alpha_0 \boldsymbol{H}_0 \nabla f(\boldsymbol{x}_0)) = \underset{\alpha_0}{\min} \left( 2\alpha_0^2 - 2\alpha_0 \right)$ gives $\alpha_0 = \dfrac{1}{2}$,

Update $\boldsymbol{x}_1 = \left[ \dfrac{1}{2}, -\dfrac{1}{2} \right]^{\rm T}$, $\nabla f(\boldsymbol{x}_1) = [-1, -1]^{\rm T} \ne \boldsymbol{0}$.

Iteration $2$:

$\Delta \boldsymbol{x}_0 = \left[ \dfrac{1}{2}, -\dfrac{1}{2} \right]^{\rm T}$, $\Delta \boldsymbol{y}_0 = \nabla f(\boldsymbol{x}_1) - \nabla f(\boldsymbol{x}_0) = [0, -2]^{\rm T}$,

$\boldsymbol{H}_1 = \boldsymbol{H}_0 + \dfrac{\Delta \boldsymbol{x}_0 \Delta \boldsymbol{x}_0^{\rm T}}{\Delta \boldsymbol{x}_0^{\rm T} \Delta \boldsymbol{y}_0} - \dfrac{\boldsymbol{H}_0 \Delta \boldsymbol{y}_0 \Delta \boldsymbol{y}_0^{\rm T} \boldsymbol{H}_0}{\Delta \boldsymbol{y}_0^{\rm T} \boldsymbol{H}_0 \Delta \boldsymbol{y}_0} = \dfrac{1}{4} \begin{bmatrix} 5 & -1 \\ -1 & 1 \end{bmatrix}$, $\boldsymbol{H}_1 \nabla f(\boldsymbol{x}_1) = [-1, 0]^{\rm T}$,

Solving the one-dimensional search problem $\underset{\alpha_1}{\min} f(\boldsymbol{x}_1 - \alpha_1 \boldsymbol{H}_1 \nabla f(\boldsymbol{x}_1)) = \underset{\alpha_1}{\min} \left( \alpha_1^2 - \alpha_1 - \dfrac{1}{2} \right)$ gives $\alpha_1 = \dfrac{1}{2}$,

Update $\boldsymbol{x}_2 = \left[ 1, -\dfrac{1}{2} \right]^{\rm T}$, $\nabla f(\boldsymbol{x}_2) = [0, 0]^{\rm T}$, stop.

So the minimizer is $\left[ 1, -\dfrac{1}{2} \right]^{\rm T}$, and the minimum is $f\left( 1, -\dfrac{1}{2} \right) = -\dfrac{3}{4}$.
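
The matrix update used in Example 6 is the $\rho = 0$ case of the formula in section 4.2. A minimal C sketch for the two-dimensional case might look as follows. The name `dfp_update` is hypothetical, and the code assumes that $\boldsymbol{H}$ is symmetric and that both denominators are nonzero.

```c
/* DFP update of a 2x2 inverse-Hessian approximation H (the rho = 0 case):
   H <- H + dx dx^T / (dx^T dy) - (H dy)(H dy)^T / (dy^T H dy).
   Because H is assumed symmetric, H dy dy^T H equals (H dy)(H dy)^T. */
void dfp_update(double H[2][2], const double dx[2], const double dy[2])
{
	double Hy[2] = { H[0][0] * dy[0] + H[0][1] * dy[1],
	                 H[1][0] * dy[0] + H[1][1] * dy[1] };
	double xy = dx[0] * dy[0] + dx[1] * dy[1];	/* dx^T dy */
	double yHy = dy[0] * Hy[0] + dy[1] * Hy[1];	/* dy^T H dy */
	for (int r = 0; r < 2; r++)
		for (int c = 0; c < 2; c++)
			H[r][c] += dx[r] * dx[c] / xy - Hy[r] * Hy[c] / yHy;
}
```

Starting from the identity with $\Delta \boldsymbol{x}_0 = [1/2, -1/2]^{\rm T}$ and $\Delta \boldsymbol{y}_0 = \nabla f(\boldsymbol{x}_1) - \nabla f(\boldsymbol{x}_0) = [0, -2]^{\rm T}$, computed from the gradients in Example 6, the update produces the matrix with rows (1.25, -0.25) and (-0.25, 0.25), that is, $\boldsymbol{H}_1$ from the example.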

4.3 Conjugate gradient method

The update formula of the conjugate gradient method is still
$$\boldsymbol{x}_{i+1} = \boldsymbol{x}_i - \alpha_i \boldsymbol{H}_i \nabla f(\boldsymbol{x}_i) = \boldsymbol{x}_i + \alpha_i \boldsymbol{d}_i$$
where $\alpha_i$ is determined by a one-dimensional search. The descent direction is updated by
$$\boldsymbol{d}_{i+1} = \dfrac{\left[ \nabla f(\boldsymbol{x}_{i+1}) \right]^{\rm T} \nabla f(\boldsymbol{x}_{i+1})}{\left[ \nabla f(\boldsymbol{x}_i) \right]^{\rm T} \nabla f(\boldsymbol{x}_i)} \boldsymbol{d}_i - \nabla f(\boldsymbol{x}_{i+1})$$
and the initial descent direction is the negative gradient, $\boldsymbol{d}_0 = -\nabla f(\boldsymbol{x}_0)$.

[Example 7] Use the conjugate gradient method to find the minimum of $f(x_1, x_2) = x_1^2 + x_2^2 - x_1x_2 - 3x_1 + 3$, starting from the origin.

[Solution] $\nabla f(x_1, x_2) = [2x_1 - x_2 - 3, 2x_2 - x_1]^{\rm T}$.

Iteration $1$:

$\boldsymbol{d}_0 = -\nabla f(0, 0) = [3, 0]^{\rm T}$,

Solving the one-dimensional search problem $\underset{\alpha_0}{\min} f(\boldsymbol{x}_0 + \alpha_0 \boldsymbol{d}_0) = \underset{\alpha_0}{\min} \left( 9\alpha_0^2 - 9\alpha_0 + 3 \right)$ gives $\alpha_0 = \dfrac{1}{2}$,

Update $\boldsymbol{x}_1 = \boldsymbol{x}_0 + \alpha_0 \boldsymbol{d}_0 = \left[ \dfrac{3}{2}, 0 \right]^{\rm T}$, $\nabla f(\boldsymbol{x}_1) = \left[ 0, -\dfrac{3}{2} \right]^{\rm T} \ne \boldsymbol{0}$.

Iteration $2$:

$\boldsymbol{d}_1 = \dfrac{\left[ \nabla f(\boldsymbol{x}_1) \right]^{\rm T} \nabla f(\boldsymbol{x}_1)}{\left[ \nabla f(\boldsymbol{x}_0) \right]^{\rm T} \nabla f(\boldsymbol{x}_0)} \boldsymbol{d}_0 - \nabla f(\boldsymbol{x}_1) = \left[ \dfrac{3}{4}, \dfrac{3}{2} \right]^{\rm T}$,

Solving the one-dimensional search problem $\underset{\alpha_1}{\min} f(\boldsymbol{x}_1 + \alpha_1 \boldsymbol{d}_1) = \underset{\alpha_1}{\min} \left( \dfrac{27}{16}\alpha_1^2 - \dfrac{9}{4}\alpha_1 + \dfrac{3}{4} \right)$ gives $\alpha_1 = \dfrac{2}{3}$,

Update $\boldsymbol{x}_2 = \boldsymbol{x}_1 + \alpha_1 \boldsymbol{d}_1 = [2, 1]^{\rm T}$, $\nabla f(\boldsymbol{x}_2) = [0, 0]^{\rm T}$, stop.

So the minimizer is $[2, 1]^{\rm T}$, and the minimum is $f(2, 1) = 0$.
