$$\min f(x)=\frac{1}{2}\sum_{i=1}^{m}r_i^2(x)=\frac{1}{2}r(x)^Tr(x),\qquad x\in\mathbb{R}^n,\ m\geqslant n\tag{1}$$
Here $r(x)=(r_1(x),r_2(x),\cdots,r_m(x))^T$ is called the residual function, and its value at a point $x$ is called the residual. If every $r_i(x)$ is linear, problem (1) is a linear least squares problem. If at least one $r_i(x)$ is nonlinear, problem (1) is a nonlinear least squares problem.
Let $J(x)$ be the Jacobian matrix of $r(x)$:
$$J(x)=\frac{\partial r}{\partial x}=\left[\nabla r_1(x),\ldots,\nabla r_m(x)\right]^T\in\mathbb{R}^{m\times n}\tag{2}$$
The gradient of $f(x)$ is then
$$g(x)=\nabla f(x)=\sum_{i=1}^{m}r_i(x)\nabla r_i(x)=J(x)^Tr(x)\tag{3}$$
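Formula (3) can be checked numerically by comparing $J(x)^Tr(x)$ with a central-difference gradient of $f$. The sketch below does this for a hypothetical exponential model $r_i(x)=x_1e^{x_2t_i}-y_i$. The data `T`, `Y` and the helper names are illustrative assumptions and do not come from the original text.

```python
import math

# Hypothetical test problem (illustration only): r_i(x) = x1 * exp(x2 * t_i) - y_i
T = [0.0, 1.0, 2.0, 3.0]
Y = [2.0, 1.3, 0.7, 0.5]

def residuals(x):
    return [x[0] * math.exp(x[1] * t) - y for t, y in zip(T, Y)]

def jacobian(x):
    # Row i is grad r_i(x)^T = [dr_i/dx1, dr_i/dx2], as in (2)
    return [[math.exp(x[1] * t), x[0] * t * math.exp(x[1] * t)] for t in T]

def f(x):
    # Objective (1): f(x) = 0.5 * r(x)^T r(x)
    return 0.5 * sum(ri * ri for ri in residuals(x))

def gradient(x):
    # Formula (3): g(x) = J(x)^T r(x)
    r, J = residuals(x), jacobian(x)
    return [sum(J[i][j] * r[i] for i in range(len(r))) for j in range(len(x))]

def fd_gradient(x, h=1e-6):
    # Central differences of f, used only as an independent check of (3)
    g = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

x = [1.5, -0.3]
print(gradient(x))
print(fd_gradient(x))
```

The two printed vectors should agree to about six or more significant digits. Any remaining difference is finite-difference error.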
The Hessian matrix of $f(x)$ is
$$\begin{aligned}G(x)=\nabla^2 f(x)&=\sum_{i=1}^{m}\nabla r_i(x)\nabla r_i(x)^T+\sum_{i=1}^{m}r_i(x)\nabla^2 r_i(x)\\&=J(x)^TJ(x)+S(x)\end{aligned}\tag{4}$$
where
$$S(x)=\sum_{i=1}^{m}r_i(x)\nabla^2 r_i(x)\tag{5}$$
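The split (4) of the Hessian into $J^TJ$ and $S$ can also be verified numerically. The sketch reuses the hypothetical exponential model and builds $S(x)$ from (5), using the analytic Hessian of each $r_i$. It then compares $J^TJ+S$ with a finite-difference Hessian computed from (3). The data and function names are illustrative assumptions.

```python
import math

# Same hypothetical test problem: r_i(x) = x1 * exp(x2 * t_i) - y_i
T = [0.0, 1.0, 2.0, 3.0]
Y = [2.0, 1.3, 0.7, 0.5]

def residuals(x):
    return [x[0] * math.exp(x[1] * t) - y for t, y in zip(T, Y)]

def jacobian(x):
    return [[math.exp(x[1] * t), x[0] * t * math.exp(x[1] * t)] for t in T]

def gradient(x):
    # (3): g = J^T r
    r, J = residuals(x), jacobian(x)
    return [sum(J[i][j] * r[i] for i in range(len(T))) for j in range(2)]

def hessian_parts(x):
    # The two pieces of (4): J^T J, and S = sum_i r_i * Hess(r_i) from (5)
    r, J = residuals(x), jacobian(x)
    JTJ = [[sum(J[i][a] * J[i][b] for i in range(len(T))) for b in range(2)]
           for a in range(2)]
    S = [[0.0, 0.0], [0.0, 0.0]]
    for ri, t in zip(r, T):
        e = math.exp(x[1] * t)
        H = [[0.0, t * e], [t * e, x[0] * t * t * e]]  # analytic Hessian of r_i
        for a in range(2):
            for b in range(2):
                S[a][b] += ri * H[a][b]
    return JTJ, S

def fd_hessian(x, h=1e-5):
    # Column j = (g(x + h e_j) - g(x - h e_j)) / (2h): an independent check of (4)
    cols = []
    for j in range(2):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        gp, gm = gradient(xp), gradient(xm)
        cols.append([(gp[i] - gm[i]) / (2 * h) for i in range(2)])
    return [[cols[j][i] for j in range(2)] for i in range(2)]

x = [1.5, -0.3]
JTJ, S = hessian_parts(x)
G = [[JTJ[a][b] + S[a][b] for b in range(2)] for a in range(2)]
print(G)
print(fd_hessian(x))
```

Because $S$ weights each $\nabla^2 r_i$ by $r_i$, it vanishes whenever all residuals are zero. This is the observation behind the small-residual/large-residual distinction that follows.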
For convenience, let $x^*$ denote a solution of (1). We use the following notation:
$$J^*=J(x^*),\quad J_k=J(x_k),\qquad S^*=S(x^*),\quad S_k=S(x_k)$$
Similarly, $r_k=r(x_k)$, $g_k=g(x_k)$, $G_k=G(x_k)$ and $g^*=g(x^*)$.
At $x^*$, the size of $\|S^*\|$ depends on the residuals and on how nonlinear the problem is. For zero-residual or linear least squares problems, $\|S^*\|=0$. As the residuals grow, or as the functions $r_i(x)\ (i=1,\cdots,m)$ become more nonlinear, $\|S^*\|$ grows. Algorithms are accordingly divided into small-residual and large-residual algorithms. Small-residual algorithms handle problems where $\|S^*\|$ is zero or not too large, and large-residual algorithms handle problems where $\|S^*\|$ is large.
Newton's method expands $f$ in a second-order Taylor series about the current iterate $x_k$:
$$f(x)=f(x_k)+\nabla f(x_k)^T(x-x_k)+\frac{1}{2}(x-x_k)^T\nabla^2 f(x_k)(x-x_k)+o\left(\|x-x_k\|^2\right)$$
Dropping the remainder gives a local quadratic approximation of $f$:
$$q(x)=f(x_k)+\nabla f(x_k)^T(x-x_k)+\frac{1}{2}(x-x_k)^T\nabla^2 f(x_k)(x-x_k)$$
When $\nabla^2 f(x_k)$ is positive definite, the minimizer of this quadratic is found by setting its gradient to zero:
$$\nabla q(x)=\nabla f(x_k)+\nabla^2 f(x_k)(x-x_k)=0$$
Let $d=x-x_k$ denote the step. Substituting (3) and (4) for $\nabla f(x_k)$ and $\nabla^2 f(x_k)$ gives the Newton equation
$$\left(J_k^TJ_k+S_k\right)d=-J_k^Tr_k\tag{6}$$
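Newton's method keeps $S_k$ in (6). The minimal sketch below applies it to a hypothetical one-dimensional problem ($m=2$, $n=1$) with $r(x)=(x+1,\ \lambda x^2+x-1)^T$, where $\lambda$ (`lam` in the code) is an illustrative parameter. Here $r_1''=0$ and $r_2''=2\lambda$, so $S(x)=2\lambda r_2(x)$. The point $x^*=0$ is stationary for every $\lambda$ and has nonzero residual $r(0)=(1,-1)^T$. With $\lambda=0.5$, Newton's method still converges quadratically because it includes $S_k$.

```python
# Hypothetical 1-D example (m = 2, n = 1): r(x) = (x + 1, lam*x^2 + x - 1).
# x* = 0 is a stationary point for every lam, with nonzero residual r(0) = (1, -1).

def newton_step(x, lam):
    r = [x + 1.0, lam * x * x + x - 1.0]
    J = [1.0, 2.0 * lam * x + 1.0]           # Jacobian (2), an m x 1 column here
    S = r[1] * 2.0 * lam                     # (5): r_1'' = 0 and r_2'' = 2*lam
    g = J[0] * r[0] + J[1] * r[1]            # (3): J^T r
    return -g / (J[0] ** 2 + J[1] ** 2 + S)  # Newton equation (6), solved for d

x, lam = 0.1, 0.5
for k in range(6):
    x += newton_step(x, lam)
    print(k + 1, x)
```

If the denominator is replaced by $J^TJ$ alone, this becomes a Gauss-Newton step. On this problem that step converges only linearly.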
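For least squares problems, the drawback of Newton's method is that every iteration needs $S_k$. That means computing the $m$ symmetric $n\times n$ matrices $\nabla^2 r_i(x_k)$, which is clearly a heavy burden for an algorithm. There are two remedies: ignore $S_k$ in the Newton equation, or approximate $S_k$ using first-derivative information. Ignoring $S_k$ is reasonable only when the $r_i(x)$ are close to zero or close to linear, because then $S_k=\sum_{i}r_i(x_k)\nabla^2 r_i(x_k)$ is small. This leads to the small-residual algorithms described next.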
Dropping $S_k$ from the Newton equation (6) gives the Gauss-Newton (GN) method. The method can also be derived another way. Linearizing the residual function at $x_k$, $r(x_k+d)\approx r_k+J_kd$, yields a linear least squares problem in $d$:
$$\min_{d\in\mathbb{R}^n}q_k(d)=\frac{1}{2}\|r_k+J_kd\|_2^2\tag{7}$$
where
$$\begin{aligned}q_k(d)&=\frac{1}{2}(J_kd+r_k)^T(J_kd+r_k)\\&=\frac{1}{2}d^TJ_k^TJ_kd+d^T\left(J_k^Tr_k\right)+\frac{1}{2}r_k^Tr_k\end{aligned}\tag{8}$$
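Here $q_k(d)$ is a quadratic approximation of $f(x_k+d)$. It differs from the second-order Taylor approximation of $f(x_k+d)$ only in the quadratic term. The Taylor model has $\frac{1}{2}d^T(J_k^TJ_k+S_k)d$, while $q_k$ has $\frac{1}{2}d^TJ_k^TJ_kd$, so $S_k$ is missing.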
The minimizer $d_k$ of problem (7) satisfies
$$J_k^TJ_kd_k=-J_k^Tr_k\tag{9}$$
Equation (9) is called the Gauss-Newton equation, and the direction $d_k$ obtained from (9) is called the Gauss-Newton direction.
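The sketch below computes a Gauss-Newton direction by solving (9) and checks that it minimizes the linearized model (7). At $d_k$, the gradient of $q_k$, namely $J_k^T(r_k+J_kd_k)$, should vanish. The exponential model, its data, and the 2x2 Cramer's-rule solver are illustrative assumptions chosen to keep the example dependency-free.

```python
import math

# Hypothetical test problem: r_i(x) = x1 * exp(x2 * t_i) - y_i (made-up data)
T = [0.0, 1.0, 2.0, 3.0]
Y = [2.0, 1.3, 0.7, 0.5]

def residuals(x):
    return [x[0] * math.exp(x[1] * t) - y for t, y in zip(T, Y)]

def jacobian(x):
    return [[math.exp(x[1] * t), x[0] * t * math.exp(x[1] * t)] for t in T]

def solve2(A, b):
    # Cramer's rule for a 2x2 system A d = b; assumes A is nonsingular
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

def gauss_newton_direction(x):
    # Solve the Gauss-Newton equation (9): J^T J d = -J^T r
    r, J = residuals(x), jacobian(x)
    JTJ = [[sum(Ji[a] * Ji[b] for Ji in J) for b in range(2)] for a in range(2)]
    JTr = [sum(Ji[a] * ri for Ji, ri in zip(J, r)) for a in range(2)]
    return solve2(JTJ, [-JTr[0], -JTr[1]])

x = [1.5, -0.3]
d = gauss_newton_direction(x)
# Gradient of q_k from (8) at d: J^T (r + J d); it vanishes at the minimizer of (7)
r, J = residuals(x), jacobian(x)
lin = [ri + Ji[0] * d[0] + Ji[1] * d[1] for ri, Ji in zip(r, J)]
grad_q = [sum(Ji[a] * li for Ji, li in zip(J, lin)) for a in range(2)]
print(d, grad_q)
```

Forming $J^TJ$ explicitly squares the condition number. For ill-conditioned $J_k$, solving (7) with a QR factorization is usually preferred. The normal equations are used here only because they match (9) directly.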
The Gauss-Newton algorithm for least squares problems is as follows.
Algorithm 1 (Gauss-Newton method for least squares problems)
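The basic Gauss-Newton method is the Gauss-Newton method with step size $\alpha_k=1$ in the update $x_{k+1}=x_k+\alpha_kd_k$. The Gauss-Newton method with a line search is called the damped Gauss-Newton method.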
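The loop below is a minimal sketch of both variants: basic ($\alpha_k=1$) and damped. Several parts are assumptions, not given in the text: the gradient-norm stopping test, the tolerance, Armijo backtracking as the line search, the hypothetical exponential test problem, and all helper names.

```python
import math

# Hypothetical test problem: r_i(x) = x1 * exp(x2 * t_i) - y_i (made-up data)
T = [0.0, 1.0, 2.0, 3.0]
Y = [2.0, 1.3, 0.7, 0.5]

def residuals(x):
    return [x[0] * math.exp(x[1] * t) - y for t, y in zip(T, Y)]

def jacobian(x):
    return [[math.exp(x[1] * t), x[0] * t * math.exp(x[1] * t)] for t in T]

def f(x):
    return 0.5 * sum(ri * ri for ri in residuals(x))

def solve2(A, b):
    # Cramer's rule for a 2x2 system A d = b; assumes A is nonsingular
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

def gauss_newton(x, damped=True, tol=1e-6, max_iter=100, c1=1e-4):
    """Gauss-Newton iteration x_{k+1} = x_k + alpha_k * d_k.

    damped=False gives the basic method (alpha_k = 1). damped=True uses Armijo
    backtracking, which is one possible line search (an assumption here).
    Returns the final point and the number of iterations performed.
    """
    for k in range(max_iter):
        r, J = residuals(x), jacobian(x)
        g = [sum(Ji[a] * ri for Ji, ri in zip(J, r)) for a in range(2)]  # (3)
        if math.hypot(g[0], g[1]) < tol:  # stop when the gradient is small
            return x, k
        JTJ = [[sum(Ji[a] * Ji[b] for Ji in J) for b in range(2)] for a in range(2)]
        d = solve2(JTJ, [-g[0], -g[1]])  # Gauss-Newton equation (9)
        alpha = 1.0
        if damped:
            fx = f(x)
            slope = g[0] * d[0] + g[1] * d[1]  # negative: d is a descent direction
            while alpha > 1e-10 and (
                f([x[0] + alpha * d[0], x[1] + alpha * d[1]]) > fx + c1 * alpha * slope
            ):
                alpha *= 0.5
        x = [x[0] + alpha * d[0], x[1] + alpha * d[1]]
    return x, max_iter

x_opt, iters = gauss_newton([1.0, 0.0])
print(x_opt, iters)
```

Calling `gauss_newton([1.0, 0.0], damped=False)` runs the basic method. On this small-residual problem, the full step already satisfies the Armijo test, so both variants follow the same path.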
An advantage of the Gauss-Newton method is that it does not require second derivatives of $r(x)$. Moreover, from (3) and (9),
$$d_k^Tg_k=d_k^TJ_k^Tr_k=-d_k^TJ_k^TJ_kd_k=-\|J_kd_k\|^2$$
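Hence, when $J_k$ has full column rank and $g_k\neq 0$, $d_k$ is a descent direction. Full column rank makes $J_k^TJ_k$ nonsingular, so by (9) $g_k\neq 0$ implies $d_k\neq 0$. Then $J_kd_k\neq 0$, and therefore $d_k^Tg_k<0$.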
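The identity $d_k^Tg_k=-\|J_kd_k\|^2$ can be confirmed numerically. The sketch below computes $g_k$ and the Gauss-Newton direction for the same hypothetical exponential model, then compares the two sides. The data and the explicit 2x2 Cramer's-rule solve are illustrative assumptions.

```python
import math

# Hypothetical test problem (illustration only): r_i(x) = x1 * exp(x2 * t_i) - y_i
T = [0.0, 1.0, 2.0, 3.0]
Y = [2.0, 1.3, 0.7, 0.5]

def residuals(x):
    return [x[0] * math.exp(x[1] * t) - y for t, y in zip(T, Y)]

def jacobian(x):
    return [[math.exp(x[1] * t), x[0] * t * math.exp(x[1] * t)] for t in T]

x = [1.5, -0.3]
r, J = residuals(x), jacobian(x)
g = [sum(Ji[a] * ri for Ji, ri in zip(J, r)) for a in range(2)]           # (3)
JTJ = [[sum(Ji[a] * Ji[b] for Ji in J) for b in range(2)] for a in range(2)]
det = JTJ[0][0] * JTJ[1][1] - JTJ[0][1] * JTJ[1][0]  # nonzero: J has full column rank
d = [(-g[0] * JTJ[1][1] + JTJ[0][1] * g[1]) / det,   # Cramer's rule on (9)
     (-JTJ[0][0] * g[1] + g[0] * JTJ[1][0]) / det]

slope = g[0] * d[0] + g[1] * d[1]                    # d_k^T g_k
Jd = [Ji[0] * d[0] + Ji[1] * d[1] for Ji in J]       # J_k d_k
print(slope, -sum(v * v for v in Jd))                # equal, and negative
```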
Theorem 2 (Local convergence of the basic Gauss-Newton method)
Let $r_i(x)\in C^2\ (i=1,\cdots,m)$, let $x^*$ be an optimal solution of the least squares problem (1), and let $J^{*T}J^*$ be positive definite. Suppose the sequence $\{x_k\}$ generated by the basic Gauss-Newton method converges to $x^*$. If $G(x)$ and $J(x)^TJ(x)$ are Lipschitz continuous in a neighborhood of $x^*$, then
$$\|h_{k+1}\|\leqslant\left\|\left(J^{*T}J^*\right)^{-1}\right\|\,\|S^*\|\,\|h_k\|+O\left(\|h_k\|^2\right)$$
where $h_k=x_k-x^*$.
Proof.
Since $f\in C^2$ and $G(x)$ is Lipschitz continuous in a neighborhood of $x^*$, the argument used for the convergence theorem of Newton's method shows the following. When $x_k$ is sufficiently close to $x^*$,
$$g(x_k+d)=g_k+G_kd+O\left(\|d\|^2\right)$$
Setting $d=-h_k$, and using $g^*=0$ because $x^*$ is a minimizer, gives
$$0=g^*=g_k-G_kh_k+O\left(\|h_k\|^2\right)$$
Substituting (3) and (4) into this equation gives
$$J_k^Tr_k-\left(J_k^TJ_k+S_k\right)h_k+O\left(\|h_k\|^2\right)=0\tag{10}$$
Since $J^{*T}J^*$ is positive definite, $J_k^TJ_k$ is also positive definite when $x_k$ is sufficiently close to $x^*$. Multiply (10) on the left by $(J_k^TJ_k)^{-1}$ and use (9), which gives $(J_k^TJ_k)^{-1}J_k^Tr_k=-d_k$. This yields
$$-d_k-h_k-\left(J_k^TJ_k\right)^{-1}S_kh_k+O\left(\|h_k\|^2\right)=0$$
For the basic method, $x_{k+1}=x_k+d_k$, so
$$d_k+h_k=x_{k+1}-x_k+x_k-x^*=h_{k+1}$$
Therefore
$$\begin{aligned}h_{k+1}&=-\left(J_k^TJ_k\right)^{-1}S_kh_k+O\left(\|h_k\|^2\right)\\ \|h_{k+1}\|&\leqslant\left\|\left(J_k^TJ_k\right)^{-1}S_k\right\|\|h_k\|+O\left(\|h_k\|^2\right)\\&\leqslant\left\|\left(J_k^TJ_k\right)^{-1}S_k-\left(J^{*T}J^*\right)^{-1}S^*\right\|\|h_k\|+\left\|\left(J^{*T}J^*\right)^{-1}\right\|\|S^*\|\|h_k\|+O\left(\|h_k\|^2\right)\end{aligned}\tag{11}$$
Next we show that $S(x)$ and $(J(x)^TJ(x))^{-1}$ are Lipschitz continuous in a neighborhood of $x^*$. For any matrix function $A(x)$, we write $A_x=A(x)$. Since $G_x$ and $J_x^TJ_x$ are Lipschitz continuous in a neighborhood of $x^*$, there exist $\beta,\gamma>0$ such that for any two points $x,y$ in that neighborhood
$$\begin{aligned}\|G(x)-G(y)\|&\leqslant\beta\|x-y\|\\ \|J(x)^TJ(x)-J(y)^TJ(y)\|&\leqslant\gamma\|x-y\|\end{aligned}$$
Since $S(x)=G(x)-J(x)^TJ(x)$ by (4), it follows that
$$\begin{aligned}\|S(x)-S(y)\|&=\|G(x)-G(y)-J(x)^TJ(x)+J(y)^TJ(y)\|\\&\leqslant\|G(x)-G(y)\|+\|J(x)^TJ(x)-J(y)^TJ(y)\|\\&\leqslant(\beta+\gamma)\|x-y\|\end{aligned}$$
Because $J^{*T}J^*$ is positive definite, continuity gives $\xi>0$ such that $\|(J_x^TJ_x)^{-1}\|\leqslant\xi$ for every $x$ in a (possibly smaller) neighborhood of $x^*$. Hence, for any $x,y$ in that neighborhood,
$$\begin{aligned}\left\|\left(J_x^TJ_x\right)^{-1}-\left(J_y^TJ_y\right)^{-1}\right\|&=\left\|\left(J_x^TJ_x\right)^{-1}\left(J_y^TJ_y-J_x^TJ_x\right)\left(J_y^TJ_y\right)^{-1}\right\|\\&\leqslant\left\|\left(J_x^TJ_x\right)^{-1}\right\|\left\|\left(J_y^TJ_y\right)^{-1}\right\|\left\|J_y^TJ_y-J_x^TJ_x\right\|\\&\leqslant\gamma\xi^2\|x-y\|\end{aligned}$$
So $S_x$ and $(J_x^TJ_x)^{-1}$ are also Lipschitz continuous in a neighborhood of $x^*$.
When $x_k$ is sufficiently close to $x^*$,
$$\begin{aligned}&\left\|\left(J_k^TJ_k\right)^{-1}S_k-\left(J^{*T}J^*\right)^{-1}S^*\right\|\\&\leqslant\left\|\left(J_k^TJ_k\right)^{-1}S_k-\left(J_k^TJ_k\right)^{-1}S^*\right\|+\left\|\left(J_k^TJ_k\right)^{-1}S^*-\left(J^{*T}J^*\right)^{-1}S^*\right\|\\&\leqslant(\beta+\gamma)\left\|\left(J_k^TJ_k\right)^{-1}\right\|\|h_k\|+\gamma\xi^2\|S^*\|\|h_k\|\\&\leqslant\left((\beta+\gamma)\xi+\gamma\xi^2\|S^*\|\right)\|h_k\|\end{aligned}$$
Therefore
$$\left\|\left(J_k^TJ_k\right)^{-1}S_k-\left(J^{*T}J^*\right)^{-1}S^*\right\|\|h_k\|\leqslant\left((\beta+\gamma)\xi+\gamma\xi^2\|S^*\|\right)\|h_k\|^2$$
Substituting this into (11) gives
$$\|h_{k+1}\|\leqslant\left\|\left(J^{*T}J^*\right)^{-1}\right\|\,\|S^*\|\,\|h_k\|+O\left(\|h_k\|^2\right)$$
which proves the theorem.
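The theorem shows that if $x_k\to x^*$, the convergence rate of the basic Gauss-Newton method falls into one of two cases:

1. If $S^*=0$ (for example, a zero-residual or linear least squares problem), the bound reduces to $\|h_{k+1}\|=O(\|h_k\|^2)$, so the convergence is quadratic.
2. If $S^*\neq 0$ but $\|(J^{*T}J^*)^{-1}\|\,\|S^*\|<1$, the convergence is at least linear, with rate constant at most $\|(J^{*T}J^*)^{-1}\|\,\|S^*\|$.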
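The convergence rate of the basic Gauss-Newton method therefore depends on the size of the residual at $x^*$ and on how close the residual function is to linear. The smaller the residual, or the closer the residual function is to linear, the faster the method converges. Otherwise it converges more slowly. For problems with very large residuals or strongly nonlinear residual functions, it may not converge at all.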
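The two cases of Theorem 2 can be seen on two hypothetical one-dimensional problems. With $n=1$, equation (9) reduces to a scalar division.

- For $r(x)=(x-1,\ x^2-1)^T$, the solution $x^*=1$ has zero residual, so $S^*=0$.
- For $r(x)=(x+1,\ \lambda x^2+x-1)^T$, the stationary point $x^*=0$ has $r(x^*)=(1,-1)^T$, $J^{*T}J^*=2$ and $S^*=-2\lambda$. The constant $\|(J^{*T}J^*)^{-1}\|\,\|S^*\|$ in the theorem is therefore $|\lambda|$.

The sketch runs the basic Gauss-Newton method on both problems. The problem choices and helper names are illustrative.

```python
# Hypothetical 1-D examples (n = 1) illustrating the two cases of Theorem 2.

def gn_step(r, J):
    # Basic Gauss-Newton step from (9) with n = 1: d = -(J^T r) / (J^T J)
    return -sum(Ji * ri for Ji, ri in zip(J, r)) / sum(Ji * Ji for Ji in J)

def zero_residual(x):
    # r(x) = (x - 1, x^2 - 1): x* = 1 and r(x*) = 0, so S* = 0
    return [x - 1.0, x * x - 1.0], [1.0, 2.0 * x]

def nonzero_residual(x, lam=0.5):
    # r(x) = (x + 1, lam*x^2 + x - 1): x* = 0 and r(x*) = (1, -1),
    # J*^T J* = 2 and S* = -2*lam, so ||(J*^T J*)^{-1}|| ||S*|| = |lam|
    return [x + 1.0, lam * x * x + x - 1.0], [1.0, 2.0 * lam * x + 1.0]

def errors(problem, x0, x_star, iters):
    # Run the basic method (alpha_k = 1) and record |x_k - x*| for k = 0..iters
    x, errs = x0, [abs(x0 - x_star)]
    for _ in range(iters):
        r, J = problem(x)
        x += gn_step(r, J)
        errs.append(abs(x - x_star))
    return errs

e0 = errors(zero_residual, 2.0, 1.0, 6)
e1 = errors(nonzero_residual, 0.1, 0.0, 12)
print("zero residual:   ", e0)
print("nonzero residual:", [e1[k + 1] / e1[k] for k in range(len(e1) - 1)])
```

In the first run the error roughly squares at every step. In the second, the ratios of successive errors settle near $\lambda=0.5$, matching the linear rate predicted by the bound. For this example $f''(0)=J^{*T}J^*+S^*=2-2\lambda$. Pushing $\lambda$ toward 1 therefore both slows Gauss-Newton and flattens the minimum.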