Mathematical Derivation of the Least Squares Method

Problem Definition

$$Ax = b \tag{1}$$
In practice, this system may have no exact solution. In that case we look for an approximate solution $x^*$ that makes $Ax^*$ as close as possible to $b$. Here $A \in R^{k \times n}$ and $b \in R^k$ are known, and $x \in R^n$ is the unknown. Note that we assume $k \geq n$ and that $A^TA \in R^{n \times n}$ is invertible.
This problem has a well-known closed-form solution: $x^* = (A^TA)^{-1}A^Tb$.
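
As a quick illustration of the closed-form solution, here is a minimal NumPy sketch. The data (`t`, `b`) and the helper name `least_squares_normal_eq` are hypothetical and chosen only for this example; the result is compared against NumPy's own `np.linalg.lstsq`.

```python
import numpy as np

def least_squares_normal_eq(A, b):
    """Minimize ||Ax - b||^2 by solving the normal equations A^T A x = A^T b.

    Assumes A^T A is invertible (A has full column rank).
    """
    # Solving the linear system avoids forming (A^T A)^{-1} explicitly.
    return np.linalg.solve(A.T @ A, A.T @ b)

# Hypothetical data: fit b ~ c0 + c1 * t to four points (k = 4, n = 2).
# The points do not lie on one line, so Ax = b has no exact solution.
t = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 2.0, 4.0])
A = np.column_stack([np.ones_like(t), t])

x_star = least_squares_normal_eq(A, b)          # approximately [0.9, 0.9]
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)   # library reference solution

print(x_star, x_ref)
```

For ill-conditioned $A$, forming $A^TA$ squares the condition number, so a QR- or SVD-based routine such as `np.linalg.lstsq` is usually the safer choice in practice.
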
A brief derivation of this solution follows:

Method 1:

$$Ax \approx b \;\Rightarrow\; x^* = \mathop{\arg\min}_x f(x), \quad f(x) = \|Ax-b\|^2 \tag{2}$$
Expanding $f(x)$ from (2):
$$
\begin{array}{lll}
f(x) &=& \|Ax - b\|^2 \\
&=& (Ax - b)^T (Ax - b) \\
&=& (x^TA^T - b^T) (Ax - b) \\
&=& x^TA^TAx - x^TA^Tb - b^TAx + b^Tb \\
&=& x^TA^TAx - 2b^TAx + b^Tb \\
&=& \underbrace{x^T(A^TA)^{1/2}}_{y^T} \underbrace{(A^TA)^{1/2}x}_{y} - 2b^TA(A^TA)^{-1/2} \underbrace{(A^TA)^{1/2}x}_{y} + b^Tb \\
&=& y^T y - 2 b^T A (A^T A)^{-1/2} y + b^T b \\
&=& y^Ty - 2 \left((A^T A)^{-1/2}A^T b\right)^T y + \left((A^T A)^{-1/2}A^T b\right)^T \left((A^T A)^{-1/2}A^T b\right) \\
&& {} + \underbrace{b^Tb - \left((A^T A)^{-1/2}A^T b\right)^T \left((A^T A)^{-1/2}A^T b\right)}_{d} \\
&=& \left(y - (A^T A)^{-1/2} A^T b\right)^T \left(y - (A^T A)^{-1/2} A^T b\right) + d \\
&=& \left\|y - (A^T A)^{-1/2} A^T b\right\|^2 + d
\end{array} \tag{3}
$$
From (3), since a squared norm is nonnegative and $d$ does not depend on $x$, $f(x)$ attains its minimum value $d$ when $y=(A^T A)^{-1/2} A^T b$:
$$
\begin{array}{lrll}
& y &=& (A^T A)^{-1/2} A^T b \\
\Rightarrow & (A^TA)^{1/2}x &=& (A^T A)^{-1/2} A^T b \\
\Rightarrow & x &=& (A^T A)^{-1} A^T b
\end{array} \tag{4}
$$
That is, $f(x)$ attains its minimum value $d$ when $x = (A^T A)^{-1} A^T b$. Therefore $x = (A^T A)^{-1} A^T b$ is the best approximate (least-squares) solution of (1).
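
The completing-the-square identity in (3) can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: the random test matrices are hypothetical, and $(A^TA)^{1/2}$ is built from an eigendecomposition with `np.linalg.eigh`, which is one of several possible ways to form a symmetric square root.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))   # k = 6, n = 3 (hypothetical test data)
b = rng.standard_normal(6)

# Symmetric square root of A^T A via its eigendecomposition.
# Assumes A has full column rank, so all eigenvalues w are positive.
w, V = np.linalg.eigh(A.T @ A)
S = V @ np.diag(np.sqrt(w)) @ V.T            # (A^T A)^{1/2}
S_inv = V @ np.diag(1.0 / np.sqrt(w)) @ V.T  # (A^T A)^{-1/2}

c = S_inv @ A.T @ b    # the vector (A^T A)^{-1/2} A^T b
d = b @ b - c @ c      # the constant d from (3)

def f(x):
    """Objective f(x) = ||Ax - b||^2."""
    r = A @ x - b
    return r @ r

# Compare both sides of (3) at an arbitrary point x, with y = (A^T A)^{1/2} x.
x = rng.standard_normal(3)
y = S @ x
lhs = f(x)
rhs = (y - c) @ (y - c) + d

# The minimizer from (4) should reach the value d.
x_star = np.linalg.solve(A.T @ A, A.T @ b)
print(lhs, rhs, f(x_star), d)
```
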

Method 2:

$$
\begin{array}{lll}
f(x) &=& \|Ax - b\|^2 \\
&=& (Ax - b)^T (Ax - b) \\
\displaystyle\frac{\partial f(x)}{\partial x} &=& A^T (Ax-b) + \left((Ax-b)^T A\right)^T = 2 A^T(Ax - b)
\end{array} \tag{5}
$$
Setting $\displaystyle\frac{\partial f(x)}{\partial x} = 0$ gives $A^TAx = A^Tb$. If $A^TA$ has full rank (equivalently, is positive definite), that is, if $A^TA$ is invertible, then:
$$A^TA x = A^T b \;\Rightarrow\; x = (A^TA)^{-1}A^Tb$$
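
The gradient $2A^T(Ax-b)$ of $f$ with respect to $x$ used in Method 2 can be checked against a central finite difference. This is a minimal sketch; the function names (`f`, `grad_f`, `numerical_grad`), the step size `h`, and the random test data are all assumptions made for illustration.

```python
import numpy as np

def f(A, b, x):
    """Objective f(x) = ||Ax - b||^2."""
    r = A @ x - b
    return r @ r

def grad_f(A, b, x):
    """Analytic gradient of f with respect to x: 2 A^T (Ax - b)."""
    return 2.0 * A.T @ (A @ x - b)

def numerical_grad(A, b, x, h=1e-6):
    """Central finite-difference approximation of the gradient of f."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(A, b, x + e) - f(A, b, x - e)) / (2.0 * h)
    return g

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))   # hypothetical test data, k = 6, n = 3
b = rng.standard_normal(6)
x = rng.standard_normal(3)

g_analytic = grad_f(A, b, x)
g_numeric = numerical_grad(A, b, x)

# At the least-squares solution the gradient should vanish.
x_star = np.linalg.solve(A.T @ A, A.T @ b)
g_at_star = grad_f(A, b, x_star)
print(g_analytic, g_numeric, g_at_star)
```
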
