The Mathematics of Backpropagation in Deep Learning

The mathematical computation process of backpropagation:

  • 1 Compute the Jacobian matrix of Y with respect to X
  • 2 Compute the sum of partial derivatives for each component / the accumulation of v's projections along each direction
  • 3 Determine the final expression for the gradient components
  • 4 y.backward(v): the computation depends on whether the argument v is passed

====================================================================================
Assume:
X = (x1, x2, x3)
Y = 2*X
  = (2*x1, 2*x2, 2*x3) = (y1, y2, y3)
In other words:
y1 = f1(x1, x2, x3) = 2*x1
y2 = f2(x1, x2, x3) = 2*x2
y3 = f3(x1, x2, x3) = 2*x3
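
As a minimal PyTorch sketch of this setup (the concrete values 1, 2, 3 are arbitrary and chosen only for illustration):

```python
import torch

# X = (x1, x2, x3); requires_grad=True so autograd tracks operations on it
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Y = 2 * X = (y1, y2, y3)
y = 2 * x
print(y)  # tensor([2., 4., 6.], grad_fn=<MulBackward0>)
```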
---------------------------------------------------------------------------------------------------

1 Compute the Jacobian matrix of Y with respect to X

\begin{equation*}
J=
\begin{pmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \dfrac{\partial y_1}{\partial x_3} \\[2.5ex]
\dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} & \dfrac{\partial y_2}{\partial x_3} \\[2.5ex]
\dfrac{\partial y_3}{\partial x_1} & \dfrac{\partial y_3}{\partial x_2} & \dfrac{\partial y_3}{\partial x_3}
\end{pmatrix}
\end{equation*}
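
As a quick sanity check (a sketch, using the arbitrary values from above), the full Jacobian can be obtained with `torch.autograd.functional.jacobian`; for Y = 2*X it is simply 2 times the identity matrix:

```python
import torch
from torch.autograd.functional import jacobian

# J[j][i] = ∂y_j / ∂x_i for the function Y = 2 * X
J = jacobian(lambda x: 2 * x, torch.tensor([1.0, 2.0, 3.0]))
print(J)
# tensor([[2., 0., 0.],
#         [0., 2., 0.],
#         [0., 0., 2.]])
```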

2 Compute the sum of partial derivatives for each component / the accumulation of v's projections along each direction

\dfrac{dY}{dx_1} = \dfrac{\partial y_1}{\partial x_1} + \dfrac{\partial y_2}{\partial x_1} + \dfrac{\partial y_3}{\partial x_1} \\[2.5ex]
\dfrac{dY}{dx_2} = \dfrac{\partial y_1}{\partial x_2} + \dfrac{\partial y_2}{\partial x_2} + \dfrac{\partial y_3}{\partial x_2} \\[2.5ex]
\dfrac{dY}{dx_3} = \dfrac{\partial y_1}{\partial x_3} + \dfrac{\partial y_2}{\partial x_3} + \dfrac{\partial y_3}{\partial x_3}
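
These per-component sums are exactly the column sums of the Jacobian above, which can be checked directly (continuing the same sketch):

```python
import torch
from torch.autograd.functional import jacobian

J = jacobian(lambda x: 2 * x, torch.tensor([1.0, 2.0, 3.0]))

# dY/dx_i = ∂y1/∂x_i + ∂y2/∂x_i + ∂y3/∂x_i  ->  sum each column of J
col_sums = J.sum(dim=0)
print(col_sums)  # tensor([2., 2., 2.])
```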

3 Determine the final expression for the gradient components

\dfrac{dY}{dX} =
\begin{pmatrix}
\dfrac{dY}{dx_1} & \dfrac{dY}{dx_2} & \dfrac{dY}{dx_3}
\end{pmatrix}
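
In PyTorch, this row vector is what ends up in `x.grad` when `backward` is called with an all-ones vector, which corresponds to v = (1, 1, 1) (a sketch with the same arbitrary values):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = 2 * x

# Passing an all-ones v accumulates the column sums dY/dx_i into x.grad
y.backward(torch.ones_like(y))
print(x.grad)  # tensor([2., 2., 2.])
```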

4 y.backward(v): the computation depends on whether the argument v is passed

If v = (m, n, q), then in the partial-derivative sums each partial derivative is multiplied by the corresponding component (projection value) of v.
For example, if v = (1, 2, 3), then in each sum the terms coming from y1, y2, y3 are weighted by 1, 2, 3 respectively.
Take the x1 component as an example:
\dfrac{dY}{dx_1} = 1*\dfrac{\partial y_1}{\partial x_1} + 2*\dfrac{\partial y_2}{\partial x_1} + 3*\dfrac{\partial y_3}{\partial x_1}
More generally:
\dfrac{dY}{dx_i} = m*\dfrac{\partial y_1}{\partial x_i} + n*\dfrac{\partial y_2}{\partial x_i} + q*\dfrac{\partial y_3}{\partial x_i}

So when v = (1, 1, 1), the result is the same with or without the projection weights.
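
A minimal sketch of the weighted case, with the hypothetical v = (1, 2, 3):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = 2 * x

# v = (m, n, q) = (1, 2, 3): x.grad_i = m*∂y1/∂x_i + n*∂y2/∂x_i + q*∂y3/∂x_i
v = torch.tensor([1.0, 2.0, 3.0])
y.backward(v)   # computes the vector-Jacobian product v^T · J
print(x.grad)   # tensor([2., 4., 6.])
```

Note that when y is non-scalar, calling `y.backward()` without v raises an error; v can be omitted only for a scalar output, in which case it defaults to 1.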

===================================

If anything here is wrong, please point it out in the comments; feedback is welcome.
