The Matrix Calculus You Need for Deep Learning

This article introduces the matrix calculus used in deep learning, to help us understand deep learning better.

The material here comes from https://explained.ai/matrix-calculus/index.html. Thanks to the original authors; I am only the messenger here. If you find any problems, please let me know. Thanks!


Contents

Introduction: a review of basic calculus

Vector calculus and partial derivatives

Matrix calculus

The Jacobian matrix

Derivatives of element-wise binary operations on vectors

The chain rule

1. Single-variable chain rule

2. Single-variable total-derivative chain rule

3. Vector chain rule


Introduction: a review of basic calculus

First, let's review the basic rules of scalar calculus.

Rule f(x) Derivative with respect to x Example
Constant c 0 \frac{d}{dx}99=0
Multiplication by constant cf c\frac{df}{dx} \frac{d}{dx}3x=3
Power rule x^{n} nx^{n-1} \frac{d}{dx}x^{3}=3x^{2}
Sum rule f+g \frac{df}{dx}+\frac{dg}{dx} \frac{d}{dx}(x^{2}+3x)=2x+3
Difference rule f-g \frac{df}{dx}-\frac{dg}{dx} \frac{d}{dx}(x^{2}-3x)=2x-3
Product rule fg f\frac{dg}{dx}+g\frac{df}{dx} \frac{d}{dx}(x^{2}x)=2x*x+x^{2}*1=3x^{2}
Chain rule f(g(x)) \frac{df}{du}\frac{du}{dx},u=g(x) \frac{d}{dx}ln(x^{2})=\frac{1}{x^{2}}2x=\frac{2}{x}

For f(x), we often write f' or f'(x) to denote \frac{d}{dx}f(x).
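These rules are easy to sanity-check numerically with a central-difference approximation. Here is a minimal sketch in Python (the helper name `deriv` and the test points are my own, not from the article):

```python
import math

def deriv(f, x, h=1e-6):
    # Central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

# Product rule: d/dx (x^2 * x) = 3x^2, checked at x = 2
assert abs(deriv(lambda x: x**2 * x, 2.0) - 3 * 2.0**2) < 1e-4

# Chain rule: d/dx ln(x^2) = 2/x, checked at x = 3
assert abs(deriv(lambda x: math.log(x**2), 3.0) - 2 / 3.0) < 1e-4
```

The same helper works for any of the rules in the table above.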


Vector calculus and partial derivatives

Functions in neural networks are rarely single-variable functions f(x); here we look at multivariable functions such as f(x,y).

For partial derivatives, we write \frac{\partial }{\partial x} instead of \frac{d}{dx}.

Example:

    Suppose f(x,y)=3x^{2}y. Then:

    \frac{\partial }{\partial x}3x^{2}y=3y\frac{\partial }{\partial x}x^{2}=3y2x=6xy

    \frac{\partial }{\partial y}3x^{2}y=3x^{2}\frac{\partial }{\partial y}y=3x^{2}

We call the following vector the gradient of f(x,y):

{\color{Blue} \nabla f(x,y)=[\frac{\partial f(x,y)}{\partial x},\frac{\partial f(x,y)}{\partial y}]=[6xy,3x^{2}]}

So the gradient of a function is simply the vector of its partial derivatives. The gradient belongs to vector calculus, which deals with functions that map n scalar parameters to a single scalar. Now let's consider the derivatives of multiple functions simultaneously.
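The gradient above can be checked numerically by taking a central difference along each coordinate while holding the other fixed. A small sketch (the helper `partial` and the evaluation point are my own):

```python
def partial(f, point, i, h=1e-6):
    # Central difference along coordinate i; the other coordinates are held fixed
    hi = list(point); hi[i] += h
    lo = list(point); lo[i] -= h
    return (f(*hi) - f(*lo)) / (2 * h)

f = lambda x, y: 3 * x**2 * y

x, y = 2.0, 5.0
grad = [partial(f, (x, y), 0), partial(f, (x, y), 1)]

assert abs(grad[0] - 6 * x * y) < 1e-3   # df/dx = 6xy
assert abs(grad[1] - 3 * x**2) < 1e-3    # df/dy = 3x^2
```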


Matrix calculus

Suppose f(x,y)=3x^{2}y and g(x,y)=2x+y^{8}.

Their gradients are \nabla f(x,y)=[6xy,3x^{2}] and \nabla g(x,y)=[2,8y^{7}].

A gradient vector collects all the partial derivatives of a single scalar-valued function. If we have two functions, we can organize their gradients into a matrix by stacking them.
When we do so, we get the Jacobian matrix (or simply the Jacobian), whose rows are the gradients:

J=\begin{bmatrix} \nabla f(x,y)\\ \nabla g(x,y) \end{bmatrix}=\begin{bmatrix} \frac{\partial f(x,y)}{\partial x} & \frac{\partial f(x,y)}{\partial y}\\ \frac{\partial g(x,y)}{\partial x} & \frac{\partial g(x,y)}{\partial y} \end{bmatrix}=\begin{bmatrix} 6xy & 3x^{2}\\ 2 & 8y^{7} \end{bmatrix}
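Stacking numerical gradients row by row reproduces this Jacobian; a quick sketch, with my own helper and evaluation point:

```python
def partial(f, point, i, h=1e-6):
    # Central-difference partial derivative along coordinate i
    hi = list(point); hi[i] += h
    lo = list(point); lo[i] -= h
    return (f(*hi) - f(*lo)) / (2 * h)

f = lambda x, y: 3 * x**2 * y
g = lambda x, y: 2 * x + y**8

x, y = 2.0, 1.5
# Stack the two gradients as rows to form the Jacobian
J = [[partial(fn, (x, y), j) for j in range(2)] for fn in (f, g)]
expected = [[6 * x * y, 3 * x**2],
            [2.0, 8 * y**7]]

for row, exp_row in zip(J, expected):
    for a, b in zip(row, exp_row):
        assert abs(a - b) < 1e-2
```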

 

The Jacobian matrix

Above we saw a special case of a Jacobian matrix; let's now look at what a Jacobian matrix is in general.

Consider a multivariable function f(x_{1},x_{2},x_{3})\Rightarrow f(\boldsymbol{x}), where \boldsymbol{x} denotes a vector:

x=\begin{bmatrix} x_{1}\\ x_{2}\\ \vdots \\ x_{n} \end{bmatrix}

We can also combine multiple scalar-valued functions into a vector, just as we did with the parameters. Let \boldsymbol{y}= \boldsymbol{f}(\boldsymbol{x}) be a vector of m scalar-valued functions, each taking the n-vector \boldsymbol{x} as its parameter, so \boldsymbol{y}= \boldsymbol{f}(\boldsymbol{x}) expands to:

\begin{matrix} y_{1}= f_{1}(\boldsymbol{x})\\ y_{2}= f_{2}(\boldsymbol{x})\\ \vdots \\ y_{m}= f_{m}(\boldsymbol{x}) \end{matrix}

For example, we can represent f(x,y)=3x^{2}y and g(x,y)=2x+y^{8} as:

\begin{matrix} y_{1}= f_{1}(\boldsymbol{x})=3x_{1}^{2}x_{2}\\ y_{2}= f_{2}(\boldsymbol{x})=2x_{1}+x_{2}^{8}\end{matrix}

where x_{1} and x_{2} stand for x and y, respectively.

 

Let's start with the case m=n, for example the identity function \boldsymbol{y}=\boldsymbol{f}(\boldsymbol{x})=\boldsymbol{x}:

\begin{matrix} y_{1}= f_{1}(\boldsymbol{x})=x_{1}\\ y_{2}= f_{2}(\boldsymbol{x})=x_{2}\\ \vdots \\ y_{n}= f_{n}(\boldsymbol{x})=x_{n} \end{matrix}

So we have m=n functions and parameters. Generally speaking, the Jacobian matrix collects all m*n possible partial derivatives (some references use the transpose of the matrix below; that is just a different layout convention):

{\color{Blue} \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}}= \begin{bmatrix} \nabla f_{1}(\boldsymbol{x})\\ \nabla f_{2}(\boldsymbol{x})\\ \vdots \\ \nabla f_{m}(\boldsymbol{x}) \end{bmatrix}=\begin{bmatrix} \frac{\partial }{\partial \boldsymbol{x}} f_{1}(\boldsymbol{x})\\ \frac{\partial }{\partial \boldsymbol{x}} f_{2}(\boldsymbol{x})\\ \vdots \\ \frac{\partial }{\partial \boldsymbol{x}} f_{m}(\boldsymbol{x}) \end{bmatrix}=\begin{bmatrix} \frac{\partial }{\partial x_{1}} f_{1}(\boldsymbol{x})& \cdots & \frac{\partial }{\partial x_{n}} f_{1}(\boldsymbol{x}) \\ \frac{\partial }{\partial x_{1}} f_{2}(\boldsymbol{x})& \cdots & \frac{\partial }{\partial x_{n}} f_{2}(\boldsymbol{x}) \\ \vdots \\ \frac{\partial }{\partial x_{1}} f_{m}(\boldsymbol{x})& \cdots & \frac{\partial }{\partial x_{n}} f_{m}(\boldsymbol{x}) \end{bmatrix}}

where each \frac{\partial }{\partial \boldsymbol{x}} f_{i}(\boldsymbol{x}) is an n-dimensional horizontal (row) vector.

(The original article includes a figure that uses nested boxes to visualize the dimensions of these partial derivatives; it is omitted here.)

For \boldsymbol{f}(\boldsymbol{x})=\boldsymbol{x}, with f_{i}(\boldsymbol{x})=x_{i} and m=n:

\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}}= \begin{bmatrix} \nabla f_{1}(\boldsymbol{x})\\ \nabla f_{2}(\boldsymbol{x})\\ \vdots \\ \nabla f_{m}(\boldsymbol{x}) \end{bmatrix}=\begin{bmatrix} \frac{\partial }{\partial x_{1}} f_{1}(\boldsymbol{x})& \cdots & \frac{\partial }{\partial x_{n}} f_{1}(\boldsymbol{x}) \\ \frac{\partial }{\partial x_{1}} f_{2}(\boldsymbol{x})& \cdots & \frac{\partial }{\partial x_{n}} f_{2}(\boldsymbol{x}) \\ \vdots \\ \frac{\partial }{\partial x_{1}} f_{m}(\boldsymbol{x})& \cdots & \frac{\partial }{\partial x_{n}} f_{m}(\boldsymbol{x}) \end{bmatrix}

=\begin{bmatrix} \frac{\partial }{\partial x_{1}} x_{1}& \cdots & \frac{\partial }{\partial x_{n}} x_{1} \\ \frac{\partial }{\partial x_{1}} x_{2}& \cdots & \frac{\partial }{\partial x_{n}} x_{2} \\ \vdots \\ \frac{\partial }{\partial x_{1}} x_{n}& \cdots & \frac{\partial }{\partial x_{n}} x_{n}\end{bmatrix}=\begin{bmatrix} \frac{\partial }{\partial x_{1}} x_{1}& \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0& \cdots & \frac{\partial }{\partial x_{n}} x_{n}\end{bmatrix}

=\begin{bmatrix} 1 & 0 & \cdots & 0\\ 0& 1& \cdots & 0\\ & & \ddots & \\ 0 & 0 & \cdots &1 \end{bmatrix}=I

 

Derivatives of element-wise binary operations on vectors

The name may sound convoluted, but it simply means combining the first element of one vector with the first element of another, the second element with the second, and so on.

For example, \boldsymbol{w}+\boldsymbol{x}= \begin{bmatrix} w_{1}\\ w_{2}\\ \vdots \\ w_{n} \end{bmatrix}+ \begin{bmatrix} x_{1}\\ x_{2}\\ \vdots \\ x_{n} \end{bmatrix}= \begin{bmatrix} w_{1}+x_{1}\\ w_{2}+x_{2}\\ \vdots \\ w_{n}+x_{n} \end{bmatrix} is an element-wise operation.

Write a generic element-wise binary operation as y=\boldsymbol{f}(\boldsymbol{w})\bigcirc \boldsymbol{g}(\boldsymbol{x}), where m=n=|y|=|w|=|x| and \bigcirc denotes any element-wise operator (such as +), not function composition. Expanding y=\boldsymbol{f}(\boldsymbol{w})\bigcirc \boldsymbol{g}(\boldsymbol{x}) gives:

\begin{bmatrix} y_{1}\\ y_{2}\\ \vdots \\ y_{n} \end{bmatrix}= \begin{bmatrix} f_{1}(\boldsymbol{w})\bigcirc g_{1}(\boldsymbol{x})\\ f_{2}(\boldsymbol{w})\bigcirc g_{2}(\boldsymbol{x})\\ \vdots \\ f_{n}(\boldsymbol{w})\bigcirc g_{n}(\boldsymbol{x}) \end{bmatrix}

Using the Jacobian definition above, we get the partial derivatives:

{\color{Blue} \boldsymbol{J}_{\boldsymbol{w}}=\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{w}}= \begin{bmatrix} \frac{\partial}{\partial w_{1}}(f_{1}(\boldsymbol{w})\bigcirc g_{1}(\boldsymbol{x}))& \frac{\partial}{\partial w_{2}}(f_{1}(\boldsymbol{w})\bigcirc g_{1}(\boldsymbol{x})) &\cdots & \frac{\partial}{\partial w_{n}}(f_{1}(\boldsymbol{w})\bigcirc g_{1}(\boldsymbol{x}))\\ \frac{\partial}{\partial w_{1}}(f_{2}(\boldsymbol{w})\bigcirc g_{2}(\boldsymbol{x}))& \frac{\partial}{\partial w_{2}}(f_{2}(\boldsymbol{w})\bigcirc g_{2}(\boldsymbol{x})) &\cdots & \frac{\partial}{\partial w_{n}}(f_{2}(\boldsymbol{w})\bigcirc g_{2}(\boldsymbol{x}))\\ & \cdots & & \\ \frac{\partial}{\partial w_{1}}(f_{n}(\boldsymbol{w})\bigcirc g_{n}(\boldsymbol{x}))& \frac{\partial}{\partial w_{2}}(f_{n}(\boldsymbol{w})\bigcirc g_{n}(\boldsymbol{x})) &\cdots & \frac{\partial}{\partial w_{n}}(f_{n}(\boldsymbol{w})\bigcirc g_{n}(\boldsymbol{x})) \end{bmatrix}}

{\color{Blue} \boldsymbol{J}_{\boldsymbol{x}}=\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}}= \begin{bmatrix} \frac{\partial}{\partial x_{1}}(f_{1}(\boldsymbol{w})\bigcirc g_{1}(\boldsymbol{x}))& \frac{\partial}{\partial x_{2}}(f_{1}(\boldsymbol{w})\bigcirc g_{1}(\boldsymbol{x})) &\cdots & \frac{\partial}{\partial x_{n}}(f_{1}(\boldsymbol{w})\bigcirc g_{1}(\boldsymbol{x}))\\ \frac{\partial}{\partial x_{1}}(f_{2}(\boldsymbol{w})\bigcirc g_{2}(\boldsymbol{x}))& \frac{\partial}{\partial x_{2}}(f_{2}(\boldsymbol{w})\bigcirc g_{2}(\boldsymbol{x})) &\cdots & \frac{\partial}{\partial x_{n}}(f_{2}(\boldsymbol{w})\bigcirc g_{2}(\boldsymbol{x}))\\ & \cdots & & \\ \frac{\partial}{\partial x_{1}}(f_{n}(\boldsymbol{w})\bigcirc g_{n}(\boldsymbol{x}))& \frac{\partial}{\partial x_{2}}(f_{n}(\boldsymbol{w})\bigcirc g_{n}(\boldsymbol{x})) &\cdots & \frac{\partial}{\partial x_{n}}(f_{n}(\boldsymbol{w})\bigcirc g_{n}(\boldsymbol{x})) \end{bmatrix}}
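For a concrete case like \boldsymbol{y}=\boldsymbol{w}+\boldsymbol{x} (where f and g are identity functions), the off-diagonal entries of both Jacobians vanish and the diagonal entries are 1, so both Jacobians are the identity matrix. A numerical sketch, assuming a simple finite-difference Jacobian helper of my own:

```python
def jacobian(fn, point, h=1e-6):
    # Numerical Jacobian of a vector function fn: R^n -> R^n
    n = len(point)
    J = []
    for i in range(n):
        row = []
        for j in range(n):
            hi = list(point); hi[j] += h
            lo = list(point); lo[j] -= h
            row.append((fn(hi)[i] - fn(lo)[i]) / (2 * h))
        J.append(row)
    return J

w = [1.0, 2.0, 3.0]
x = [4.0, 5.0, 6.0]

# y = w + x, viewed as a function of w with x held fixed
add = lambda ww: [wi + xi for wi, xi in zip(ww, x)]
J_w = jacobian(add, w)

# J_w should be the 3x3 identity matrix
for i in range(3):
    for j in range(3):
        assert abs(J_w[i][j] - (1.0 if i == j else 0.0)) < 1e-6
```

The same holds for J_x by symmetry, since y depends on x the same way it depends on w.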

 

The chain rule

The rules above are not enough to compute the derivatives of more complicated, nested functions; for those we use the chain rule.

1. Single-variable chain rule

If y=f(g(x)) and we let u=g(x), then:

{\color{Blue} \frac{dy}{dx}=\frac{dy}{du}\frac{du}{dx}}

Let's try computing the derivative of y=f(g(x))=sin(x^{2})!

1. Introduce an intermediate variable, u =x^{2} (shorthand for u (x)=x^{2}):

    u =x^{2}

    y=f(u)=sin(u)

2. Compute the derivatives.

    \frac{du}{dx}=2x

    \frac{dy}{du}=cos(u)

3. Combine.

    \frac{dy}{dx}=\frac{dy}{du}\frac{du}{dx}=cos(u)2x

4. Substitute.

    \frac{dy}{dx}=\frac{dy}{du}\frac{du}{dx}=cos(x^{2})2x=2xcos(x^{2})
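The result 2xcos(x^{2}) can be confirmed against a central-difference approximation; a minimal sketch with my own helper and test point:

```python
import math

def deriv(f, x, h=1e-6):
    # Central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.3
numeric = deriv(lambda t: math.sin(t**2), x)
analytic = 2 * x * math.cos(x**2)  # the chain-rule result above

assert abs(numeric - analytic) < 1e-4
```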

As you can see, the procedure is very simple. Next, let's compute the derivative of a more complicated function, y=f(x)=f_{4}(f_{3}(f_{2}(f_{1}(x))))=ln(sin(x^{3})^2):

1. Introduce intermediate variables:

    u_{1} =f_{1}(x)=x^{3}

    u_{2} =f_{2}(u_{1})=sin(u_{1})

    u_{3} =f_{3}(u_2)=u_{2}^{2}

    u_4=f_4(u_3)=ln(u_3)(y=u_4)

2. Compute the derivatives.

    \frac{du_1}{dx}=\frac{d}{dx}x^{3}=3x^2

    \frac{du_2}{du_1}=\frac{d}{du_1}sin(u_1)=cos(u_1)

    \frac{du_3}{du_2}=\frac{d}{du_2}u_2^{2}=2u_2

    \frac{du_4}{du_3}=\frac{d}{du_3}ln(u_3)=\frac{1}{u_3}

3. Combine.

    \frac{dy}{dx}=\frac{du_4}{dx}=\frac{du_4}{du_3}\frac{du_3}{du_2}\frac{du_2}{du_1}\frac{du_1}{dx}=\frac{1}{u_3}2u_2cos(u_1)3x^2=\frac{6u_2x^2cos(u_1)}{u_3}

4. Substitute.

    \frac{dy}{dx}=\frac{6u_2x^2cos(u_1)}{u_3}=\frac{6u_2x^2cos(u_1)}{u_2^2}=\frac{6x^2cos(u_1)}{u_2}=\frac{6x^2cos(x^3)}{sin(x^3)}
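The final expression can again be checked numerically; a sketch with my own helper and test point:

```python
import math

def deriv(f, x, h=1e-6):
    # Central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.1
numeric = deriv(lambda t: math.log(math.sin(t**3)**2), x)
analytic = 6 * x**2 * math.cos(x**3) / math.sin(x**3)  # result derived above

assert abs(numeric - analytic) < 1e-3
```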

2. Single-variable total-derivative chain rule

The single-variable chain rule above has limited applicability because all intermediate variables must be functions of a single variable. It does, however, demonstrate the core mechanism of the chain rule: multiplying together the derivatives of all the intermediate subexpressions. To handle more general expressions such as y=x+x^2, we need to extend the basic chain rule.

Let:

    u_1(x)=x^2

    u_2(x,u_1)=x+u_1    (y=f(x)=u_2(x,u_1))

Let's see what happens if we apply the old rule. With \frac{du_2}{du_1}=0+1=1 and \frac{du_1}{dx}=2x, the chain rule we learned earlier gives \frac{dy}{dx}=\frac{du_2}{dx}=\frac{du_2}{du_1}\frac{du_1}{dx}=2x, which clearly differs from the correct answer 1+2x.

Since u_2(x,u_1)=x+u_1 has more than one parameter, partial derivatives come into play. Let's blindly insert partial derivatives and see what we get:

    \frac{\partial u_1(x)}{\partial x}=2x

    \frac{\partial u_2(x,u_1)}{\partial u_1}=\frac{\partial }{\partial u_1}(x+u_1)=0+1=1

   {\color{Red} \frac{\partial u_2(x,u_1)}{\partial x}=\frac{\partial }{\partial x}(x+u_1)=1+0=1}

Something is wrong! The computation of {\color{Red} \frac{\partial u_2(x,u_1)}{\partial x}} is incorrect because it violates a key assumption of partial derivatives: when differentiating with respect to x, the other variables must be independent of x. But here u_1(x)=x^2 clearly depends on x.

The correct total-derivative rule is:

\frac{dy}{dx}=\frac{\partial f(x)}{\partial x}=\frac{\partial u_2(x,u_1)}{\partial x}=\frac{\partial u_2}{\partial x}\frac{\partial x}{\partial x}+\frac{\partial u_2}{\partial u_1}\frac{\partial u_1}{\partial x}=\frac{\partial u_2}{\partial x}+\frac{\partial u_2}{\partial u_1}\frac{\partial u_1}{\partial x}

Using this formula, we obtain the correct answer:

\frac{dy}{dx}=\frac{\partial u_2}{\partial x}+\frac{\partial u_2}{\partial u_1}\frac{\partial u_1}{\partial x}=1+1*2x=1+2x

We call the following formula the single-variable total-derivative chain rule:

{\color{Blue} \frac{\partial f(x,u_1,u_2,\cdots ,u_n)}{\partial x}=\frac{\partial f}{\partial x}+\frac{\partial f}{\partial u_1}\frac{\partial u_1}{\partial x}+\frac{\partial f}{\partial u_2}\frac{\partial u_2}{\partial x}+\cdots +\frac{\partial f}{\partial u_n}\frac{\partial u_n}{\partial x}=\frac{\partial f}{\partial x}+\sum_{i=1}^{n}\frac{\partial f}{\partial u_i}\frac{\partial u_i}{\partial x}}

The total derivative assumes all variables may depend on each other, whereas a partial derivative treats all variables other than x as constants.

Example

    Consider a nested subexpression such as f(x)=sin(x+x^2). We introduce three intermediate variables:

    u_1(x)=x^2

    u_2(x,u_1)=x+u_1

   u_3(u_2)=sin(u_2)    (y=f(x)=u_3(u_2))

    The partial derivatives are:

    \frac{\partial u_1}{\partial x}=2x

    \frac{\partial u_2}{\partial x}=\frac{\partial }{\partial x}(x+u_1)+\frac{\partial u_2}{\partial u_1}\frac{\partial u_1}{\partial x}=1+1*2x=1+2x

    \frac{\partial f(x)}{\partial x}=\frac{\partial u_3}{\partial x}+\frac{\partial u_3}{\partial u_2}\frac{\partial u_2}{\partial x}=0+cos(u_2)\frac{\partial u_2}{\partial x}=cos(x+x^2)(1+2x)
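The result cos(x+x^2)(1+2x) can be confirmed numerically; a sketch with my own helper and test point:

```python
import math

def deriv(f, x, h=1e-6):
    # Central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.8
numeric = deriv(lambda t: math.sin(t + t**2), x)
analytic = math.cos(x + x**2) * (1 + 2 * x)  # total-derivative result above

assert abs(numeric - analytic) < 1e-4
```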

Example

    Consider y=x*x^2. We introduce two intermediate variables:

    u_1(x)=x^2

    u_2(x,u_1)=xu_1    (y=f(x)=u_2(x,u_1))

    The partial derivatives are:

    \frac{\partial u_1}{\partial x}=2x

    \frac{\partial u_2}{\partial x}=\frac{\partial }{\partial x}(xu_1)+\frac{\partial u_2}{\partial u_1}\frac{\partial u_1}{\partial x}=u_1+x*2x=x^2+2x^2=3x^2
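Since y=x*x^2=x^3, the answer 3x^2 matches the power rule, and both the numerical derivative and the total-derivative decomposition agree. A sketch with my own helper and test point:

```python
def deriv(f, x, h=1e-6):
    # Central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.7
numeric = deriv(lambda t: t * t**2, x)   # y = x * x^2 = x^3
assert abs(numeric - 3 * x**2) < 1e-3

# The total-derivative decomposition gives the same thing:
u1 = x**2
du2_dx_direct = u1          # d(x*u1)/dx with u1 held fixed
du2_du1 = x                 # d(x*u1)/du1
du1_dx = 2 * x
assert abs(du2_dx_direct + du2_du1 * du1_dx - 3 * x**2) < 1e-9
```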

3. Vector chain rule

Now that we have a good handle on the total-derivative chain rule, we are ready to discuss the vector chain rule.

Suppose \boldsymbol{y}=\boldsymbol{f}(x); expanded:

\begin{bmatrix} y_1(x)\\ y_2(x) \end{bmatrix}= \begin{bmatrix} f_1(x)\\ f_2(x) \end{bmatrix}= \begin{bmatrix} ln(x^2)\\ sin(3x) \end{bmatrix}

Let's introduce two intermediate variables, g_1 and g_2, one for each f_i, so that \boldsymbol{y}=\boldsymbol{f}(\boldsymbol{g}(x)):

\begin{bmatrix} g_1(x)\\ g_2(x) \end{bmatrix}= \begin{bmatrix} x^2\\ 3x \end{bmatrix}

\begin{bmatrix} f_1(\boldsymbol{g})\\ f_2(\boldsymbol{g}) \end{bmatrix}= \begin{bmatrix} ln(g_1)\\ sin(g_2) \end{bmatrix}

The derivative of \boldsymbol{y} is a column vector, each element of which is computed with the single-variable total-derivative chain rule:

\frac{\partial \boldsymbol{y}}{\partial x}=\begin{bmatrix} \frac{\partial f_1(\boldsymbol{g})}{\partial x}\\ \frac{\partial f_2(\boldsymbol{g})}{\partial x} \end{bmatrix} =\begin{bmatrix} \frac{\partial f_1}{\partial g_1}\frac{\partial g_1}{\partial x}+\frac{\partial f_1}{\partial g_2}\frac{\partial g_2}{\partial x}\\ \frac{\partial f_2}{\partial g_1}\frac{\partial g_1}{\partial x}+\frac{\partial f_2}{\partial g_2}\frac{\partial g_2}{\partial x} \end{bmatrix}=\begin{bmatrix} \frac{1}{g_1}2x+0\\ 0+cos(g_2)3 \end{bmatrix}=\begin{bmatrix} \frac{2x}{x^2}\\ 3cos(3x) \end{bmatrix}=\begin{bmatrix} \frac{2}{x}\\ 3cos(3x) \end{bmatrix}

If we separate the \frac{\partial f_i}{\partial g_j} terms from the \frac{\partial g_j}{\partial x} terms, pulling the \frac{\partial g_j}{\partial x} terms out into their own vector, we get a matrix-vector product:

\begin{bmatrix} \frac{\partial f_1}{\partial g_1} & \frac{\partial f_1}{\partial g_2}\\ \frac{\partial f_2}{\partial g_1} & \frac{\partial f_2}{\partial g_2} \end{bmatrix} \begin{bmatrix} \frac{\partial g_1}{\partial x}\\ \frac{\partial g_2}{\partial x} \end{bmatrix}= \frac{\partial \boldsymbol{f}}{\partial \boldsymbol{g}}\frac{\partial \boldsymbol{g}}{\partial x}

This means the Jacobian is the product of two other Jacobians, which is pretty cool! Let's check the result:

\frac{\partial \boldsymbol{f}}{\partial \boldsymbol{g}}\frac{\partial \boldsymbol{g}}{\partial x}=\begin{bmatrix} \frac{1}{g_1} & 0\\ 0 & cos(g_2) \end{bmatrix} \begin{bmatrix} 2x\\ 3 \end{bmatrix} =\begin{bmatrix} \frac{1}{g_1}2x+0\\ 0+cos(g_2)3 \end{bmatrix} =\begin{bmatrix} \frac{2}{x}\\ 3cos(3x) \end{bmatrix}

We get the same answer!
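The agreement between the direct derivatives and the Jacobian product can also be checked numerically; a sketch with my own helper and test point:

```python
import math

def deriv(f, x, h=1e-6):
    # Central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.7

# Direct derivatives of y1 = ln(x^2) and y2 = sin(3x)
direct = [deriv(lambda t: math.log(t**2), x),
          deriv(lambda t: math.sin(3 * t), x)]

# Jacobian product (df/dg)(dg/dx)
g1, g2 = x**2, 3 * x
J_fg = [[1 / g1, 0.0],
        [0.0, math.cos(g2)]]
J_gx = [2 * x, 3.0]
chain = [J_fg[i][0] * J_gx[0] + J_fg[i][1] * J_gx[1] for i in range(2)]

assert abs(chain[0] - direct[0]) < 1e-4   # 2/x
assert abs(chain[1] - direct[1]) < 1e-4   # 3cos(3x)
```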

So the vector chain rule is:

{\color{Blue} \frac{\partial }{\partial x}\boldsymbol{f}(\boldsymbol{g}(x))=\frac{\partial \boldsymbol{f}}{\partial \boldsymbol{g}}\frac{\partial \boldsymbol{g}}{\partial x}}

Compare this with the single-variable chain rule; the form is essentially identical.

\frac{\partial }{\partial x}f(g(x))=\frac{\partial f}{\partial g}\frac{\partial g}{\partial x}

To make the formula work for a vector parameter \boldsymbol{x}, we only need to replace x with \boldsymbol{x}:

{\color{Blue} \frac{\partial }{\partial \boldsymbol{x}}\boldsymbol{f}(\boldsymbol{g}(\boldsymbol{x}))=\frac{\partial \boldsymbol{f}}{\partial \boldsymbol{g}}\frac{\partial \boldsymbol{g}}{\partial \boldsymbol{x}}}

{\color{Blue} \frac{\partial }{\partial \boldsymbol{x}}\boldsymbol{f}(\boldsymbol{g}(\boldsymbol{x}))= \begin{bmatrix} \frac{\partial f_1}{\partial g_1}& \frac{\partial f_1}{\partial g_2} & \cdots & \frac{\partial f_1}{\partial g_k} \\ \frac{\partial f_2}{\partial g_1}& \frac{\partial f_2}{\partial g_2} & \cdots & \frac{\partial f_2}{\partial g_k} \\ & & \ddots & \\ \frac{\partial f_m}{\partial g_1}& \frac{\partial f_m}{\partial g_2} & \cdots & \frac{\partial f_m}{\partial g_k} \end{bmatrix}\begin{bmatrix} \frac{\partial g_1}{\partial x_1}& \frac{\partial g_1}{\partial x_2} & \cdots & \frac{\partial g_1}{\partial x_n} \\ \frac{\partial g_2}{\partial x_1}& \frac{\partial g_2}{\partial x_2} & \cdots & \frac{\partial g_2}{\partial x_n} \\ & & \ddots & \\ \frac{\partial g_k}{\partial x_1}& \frac{\partial g_k}{\partial x_2} & \cdots & \frac{\partial g_k}{\partial x_n} \end{bmatrix}}

where |f|=m, |x|=n, and |g|=k. The final matrix has size m*n (an m*k matrix times a k*n matrix).