Deriving the Solution of the Normal Equation

1. Preliminaries

1.1 Matrix transpose rules and derivative rules

1.1.1 Transpose rules:

  • $(mA)^T = mA^T$, where $m$ is a constant
  • $(A+B)^T = A^T + B^T$
  • $(AB)^T = B^TA^T$
  • $(A^T)^T = A$
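These identities are easy to spot-check numerically. The sketch below assumes NumPy and uses random square matrices (the values, the seed, and the scalar `m = 2.5` are arbitrary choices for illustration, not from the derivation). It also shows that the order in $(AB)^T = B^TA^T$ matters.

```python
import numpy as np

# Arbitrary random 3x3 matrices and an arbitrary scalar constant m
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
m = 2.5

checks = {
    "(mA)^T = mA^T": np.allclose((m * A).T, m * A.T),
    "(A+B)^T = A^T + B^T": np.allclose((A + B).T, A.T + B.T),
    "(AB)^T = B^T A^T": np.allclose((A @ B).T, B.T @ A.T),
    "(A^T)^T = A": np.allclose(A.T.T, A),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")

# Reversing the order matters: A^T B^T is generally a different matrix
print("(AB)^T == A^T B^T ?", np.allclose((A @ B).T, A.T @ B.T))
```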

1.1.2 Derivative rules (denominator layout: $X$ is a column vector, $A$ is a constant):

$\frac{\partial X^T}{\partial X} = I$ (the result is the identity matrix)
$\frac{\partial (X^TA)}{\partial X} = A$
$\frac{\partial (AX^T)}{\partial X} = A$
$\frac{\partial (AX)}{\partial X} = A^T$
$\frac{\partial (XA)}{\partial X} = A^T$
$\frac{\partial (X^TAX)}{\partial X} = (A + A^T)X$, for any square $A$ (not necessarily symmetric)
$\frac{\partial (X^TAX)}{\partial X} = 2AX$, when $A$ is symmetric ($A = A^T$)
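The rules used later in the derivation (the second, fourth, and the two quadratic-form rules) can be checked with central finite differences. This is a sketch assuming NumPy; the helper `jacobian_denominator` and all test values are hypothetical, and vectors are stored as 1-D arrays. `M` plays the role of $A$ in $\frac{\partial (AX)}{\partial X} = A^T$, since that rule allows a non-square matrix.

```python
import numpy as np

def jacobian_denominator(f, x, eps=1e-6):
    """Central-difference derivative of f at x in denominator layout.

    Row i holds df/dx_i, so a scalar f yields a gradient of shape (n,)
    and a vector f with m outputs yields an (n, m) matrix.
    """
    rows = []
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        rows.append((f(x + e) - f(x - e)) / (2 * eps))
    return np.array(rows)

rng = np.random.default_rng(0)
n = 3
x = rng.standard_normal(n)        # the variable X
a = rng.standard_normal(n)        # constant vector for X^T a
M = rng.standard_normal((2, n))   # constant 2 x n matrix for M X
A = rng.standard_normal((n, n))   # square, generally NOT symmetric
S = A + A.T                       # a symmetric matrix

checks = {
    "d(X^T a)/dX = a": np.allclose(jacobian_denominator(lambda v: v @ a, x), a),
    "d(MX)/dX = M^T": np.allclose(jacobian_denominator(lambda v: M @ v, x), M.T),
    "d(X^T A X)/dX = (A + A^T)X": np.allclose(
        jacobian_denominator(lambda v: v @ A @ v, x), (A + A.T) @ x),
    "d(X^T S X)/dX = 2SX": np.allclose(
        jacobian_denominator(lambda v: v @ S @ v, x), 2 * S @ x),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Central differences are exact for linear and quadratic functions up to rounding error, which is why a plain `np.allclose` comparison is enough here.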

2. Derivation

2.1 Deriving $\theta$ for the normal equation

  1. Expand using the matrix multiplication rules
    $J(\theta) = \frac{1}{2}(X\theta - y)^T(X\theta - y)$
    $J(\theta) = \frac{1}{2}(\theta^T X^T - y^T)(X\theta - y)$ // by the transpose rules, $(X\theta)^T = \theta^T X^T$
    $J(\theta) = \frac{1}{2}(\theta^T X^T X\theta - \theta^T X^T y - y^T X\theta + y^T y)$
  2. Differentiate ($X$ and $y$ are known; $\theta$ is the variable)
    $J'(\theta) = \frac{1}{2}(\theta^T X^T X\theta - \theta^T X^T y - y^T X\theta + y^T y)'$
  3. Apply the derivative rules above ($\theta$ is the variable)
    // Move the derivative inside the parentheses, term by term
    $J'(\theta) = \frac{1}{2}\left(\frac{\partial(\theta^T X^T X\theta)}{\partial\theta} - \frac{\partial(\theta^T X^T y)}{\partial\theta} - \frac{\partial(y^T X\theta)}{\partial\theta} + \frac{\partial(y^T y)}{\partial\theta}\right)$
    // Quadratic-form rule with $A = X^TX$; $\partial(X^TA)/\partial X = A$ with $A = X^Ty$; $\partial(AX)/\partial X = A^T$ with $A = y^TX$; $y^Ty$ does not depend on $\theta$
    $J'(\theta) = \frac{1}{2}\left((X^TX + (X^TX)^T)\theta - X^Ty - (y^TX)^T + 0\right)$
    $J'(\theta) = \frac{1}{2}(X^TX\theta + X^TX\theta - X^Ty - X^Ty)$ // $(X^TX)^T = X^TX$ and $(y^TX)^T = X^Ty$
    $J'(\theta) = \frac{1}{2}(2X^TX\theta - 2X^Ty)$
    $J'(\theta) = X^TX\theta - X^Ty$
    $J'(\theta) = X^T(X\theta - y)$ // distributive law of matrix multiplication
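  4. Set the derivative $J'(\theta) = 0$. (Why zero? $J(\theta)$ is a sum of squares, i.e. a quadratic function of $\theta$ whose second derivative $X^TX$ is positive semidefinite, so $J$ is convex. For a convex function, any point where the derivative is zero is a global minimum.)
    $0 = X^TX\theta - X^Ty$
    $X^TX\theta = X^Ty$
    // At this point it is tempting to write $\theta = \frac{y}{X}$, but that is wrong: matrices have no division, so we multiply by an inverse matrix instead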
  5. Multiply both sides by the inverse (this requires $X^TX$ to be invertible, i.e. $X$ has full column rank)
    $(X^TX)^{-1}X^TX\theta = (X^TX)^{-1}X^Ty$
    $I\theta = (X^TX)^{-1}X^Ty$
    $\theta = (X^TX)^{-1}X^Ty$
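Each step above can be sanity-checked numerically on random data. This is a minimal sketch assuming NumPy; the synthetic data and the helper names `J`, `J_expanded`, `grad`, and `numerical_grad` are hypothetical, and $y$ and $\theta$ are stored as 1-D arrays rather than column vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 3))   # 20 synthetic samples, 3 features
y = rng.standard_normal(20)        # synthetic targets
theta = rng.standard_normal(3)     # an arbitrary point at which to test

def J(t):
    """Compact cost: 1/2 (X t - y)^T (X t - y)."""
    r = X @ t - y
    return 0.5 * r @ r

def J_expanded(t):
    """Step 1 expansion: 1/2 (t^T X^T X t - t^T X^T y - y^T X t + y^T y)."""
    return 0.5 * (t @ X.T @ X @ t - t @ X.T @ y - y @ X @ t + y @ y)

def grad(t):
    """Step 3 result: X^T (X t - y)."""
    return X.T @ (X @ t - y)

def numerical_grad(f, t, eps=1e-6):
    """Central-difference gradient, used only as an independent check."""
    g = np.zeros_like(t)
    for i in range(t.size):
        e = np.zeros_like(t)
        e[i] = eps
        g[i] = (f(t + e) - f(t - e)) / (2 * eps)
    return g

# Step 1: the expansion equals the original cost
print("expansion matches:", np.isclose(J(theta), J_expanded(theta)))
# Step 3: the analytic gradient matches finite differences
print("gradient matches:", np.allclose(grad(theta), numerical_grad(J, theta), atol=1e-5))
# Step 4: X^T X is positive semidefinite (all eigenvalues >= 0), so J is convex
print("X^T X is PSD:", np.all(np.linalg.eigvalsh(X.T @ X) >= -1e-10))
# Step 5: theta = (X^T X)^{-1} X^T y makes the gradient vanish
theta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
print("gradient at theta_hat ~ 0:", np.allclose(grad(theta_hat), 0.0, atol=1e-8))
```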

2.2 And that's it: the normal equation is fully derived. Done!

Formula:
$\theta = (X^TX)^{-1}X^Ty$
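As a usage sketch, assuming NumPy, the formula can fit a straight line to a few noise-free points generated from $y = 2 + 3x$ (the data and the helper name `normal_equation` are hypothetical). The sketch evaluates the same formula by solving the linear system $X^TX\theta = X^Ty$ with `np.linalg.solve` rather than forming the inverse explicitly, which is usually the numerically preferable route, and cross-checks the result against `np.linalg.lstsq`.

```python
import numpy as np

def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y; assumes X has full column rank."""
    # Solving the system avoids computing (X^T X)^{-1} explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Synthetic, noise-free data from y = 2 + 3x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x
X = np.column_stack([np.ones_like(x), x])   # bias column of ones + feature

theta = normal_equation(X, y)
print(theta)  # approximately [2. 3.]

# Cross-check with NumPy's least-squares solver
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(theta, theta_lstsq))
```

Prepending a column of ones folds the intercept into $\theta$. If $X$ does not have full column rank, $X^TX$ is singular and `np.linalg.solve` may raise `LinAlgError`; `np.linalg.lstsq` still returns a least-squares solution in that case.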
