【深度学习】动手学深度学习(PyTorch版)李沐 2.4.3 梯度【公式推导】

2.4.3. 梯度

  我们可以连接一个多元函数对其所有变量的偏导数,以得到该函数的梯度(gradient)向量。 具体而言,设函数 f : R n → R f:\mathbb{R}^{n}\to\mathbb{R} f:RnR的输入是一个 n n n维向量 x ⃗ = [ x 1 x 2 ⋅ ⋅ ⋅ x n ] \vec x=\begin{bmatrix} x_1\\x_2\\···\\x_n\end{bmatrix} x = x1x2⋅⋅⋅xn ,输出是一个标量。 函数 f ( x ⃗ ) f(\vec x) f(x )相对于 x ⃗ \vec x x 的梯度是一个包含 n n n个偏导数的向量:
∇ x ⃗ f ( x ⃗ ) = [ ∂ f ( x ⃗ ) ∂ x 1 ∂ f ( x ⃗ ) ∂ x 2 ⋅ ⋅ ⋅ ∂ f ( x ⃗ ) ∂ x n ] \nabla_{\vec x} f(\vec x) = \begin{bmatrix}\frac{\partial f(\vec x)}{\partial x_1}\\\frac{\partial f(\vec x)}{\partial x_2}\\···\\ \frac{\partial f(\vec x)}{\partial x_n}\end{bmatrix} x f(x )= x1f(x )x2f(x )⋅⋅⋅xnf(x )
其中 ∇ x ⃗ f ( x ⃗ ) \nabla_{\vec x} f(\vec x) x f(x )通常在没有歧义时被 ∇ f ( x ⃗ ) \nabla f(\vec x) f(x )取代。


假设 x ⃗ \vec x x n n n维向量,在微分多元函数时经常使用以下规则:

一、对于所有 A ∈ R m × n A \in \mathbb{R^{m\times n}} ARm×n,都有 ∇ x ⃗ A x ⃗ = A ⊤ \nabla_{\vec x} A\vec x = A^\top x Ax =A

证明:设 A ( m , n ) A_{(m,n)} A(m,n) = [ a 1 , 1 a 1 , 2 ⋅ ⋅ ⋅ a 1 , n a 2 , 1 a 2 , 2 ⋅ ⋅ ⋅ a 2 , n ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ a m , 1 a m , 2 ⋅ ⋅ ⋅ a m , n ] \begin{bmatrix} a_{1,1}&a_{1,2}&···&a_{1,n} \\ a_{2,1}&a_{2,2}&···&a_{2,n} \\ ··· & ··· & ··· & ··· \\ a_{m,1} & a_{m,2} &···&a_{m,n} \end{bmatrix} a1,1a2,1⋅⋅⋅am,1a1,2a2,2⋅⋅⋅am,2⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅a1,na2,n⋅⋅⋅am,n
A x ⃗ ( m , 1 ) A\vec x_{(m,1)} Ax (m,1) = [ a 1 , 1 x 1 + a 1 , 2 x 2 + ⋅ ⋅ ⋅ + a 1 , n x n a 2 , 1 x 1 + a 2 , 2 x 2 + ⋅ ⋅ ⋅ + a 2 , n x n ⋅ ⋅ ⋅ a m , 1 x 1 + a m , 2 x 2 + ⋅ ⋅ ⋅ + a m , n x n ] \begin{bmatrix} a_{1,1}x_1+a_{1,2}x_2+···+a_{1,n}x_n \\ a_{2,1}x_1+a_{2,2}x_2+···+a_{2,n}x_n \\ ··· \\ a_{m,1}x_1+a_{m,2}x_2+···+a_{m,n}x_n \end{bmatrix} a1,1x1+a1,2x2+⋅⋅⋅+a1,nxna2,1x1+a2,2x2+⋅⋅⋅+a2,nxn⋅⋅⋅am,1x1+am,2x2+⋅⋅⋅+am,nxn ,
∇ x ⃗ A x ⃗ \nabla_{\vec x}A\vec x x Ax = [ ∂ A x ⃗ ∂ x 1 ∂ A x ⃗ ∂ x 2 ⋅ ⋅ ⋅ ∂ A x ⃗ ∂ x n ] \begin{bmatrix}\frac{\partial A\vec x}{\partial x_1}\\\frac{\partial A\vec x}{\partial x_2}\\···\\ \frac{\partial A\vec x}{\partial x_n}\end{bmatrix} x1Ax x2Ax ⋅⋅⋅xnAx
= [ ∂ a 1 , 1 x 1 + a 1 , 2 x 2 + ⋅ ⋅ ⋅ + a 1 , n x n ∂ x 1 ∂ a 2 , 1 x 1 + a 2 , 2 x 2 + ⋅ ⋅ ⋅ + a 2 , n x n ∂ x 1 ⋅ ⋅ ⋅ ∂ a m , 1 x 1 + a m , 2 x 2 + ⋅ ⋅ ⋅ + a m , n x n ∂ x 1 ∂ a 1 , 1 x 1 + a 1 , 2 x 2 + ⋅ ⋅ ⋅ + a 1 , n x n ∂ x 2 ∂ a 2 , 1 x 1 + a 2 , 2 x 2 + ⋅ ⋅ ⋅ + a 2 , n x n ∂ x 2 ⋅ ⋅ ⋅ ∂ a m , 1 x 1 + a m , 2 x 2 + ⋅ ⋅ ⋅ + a m , n x n ∂ x 2 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ∂ a 1 , 1 x 1 + a 1 , 2 x 2 + ⋅ ⋅ ⋅ + a 1 , n x n ∂ x n ∂ a 2 , 1 x 1 + a 2 , 2 x 2 + ⋅ ⋅ ⋅ + a 2 , n x n ∂ x n ⋅ ⋅ ⋅ ∂ a m , 1 x 1 + a m , 2 x 2 + ⋅ ⋅ ⋅ + a m , n x n ∂ x n ] \begin{bmatrix}\frac{\partial a_{1,1}x_1+a_{1,2}x_2+···+a_{1,n}x_n}{\partial x_1}& \frac{\partial a_{2,1}x_1+a_{2,2}x_2+···+a_{2,n}x_n}{\partial x_1}&···&\frac{\partial a_{m,1}x_1+a_{m,2}x_2+···+a_{m,n}x_n}{\partial x_1}\\ \frac{\partial a_{1,1}x_1+a_{1,2}x_2+···+a_{1,n}x_n}{\partial x_2}& \frac{\partial a_{2,1}x_1+a_{2,2}x_2+···+a_{2,n}x_n}{\partial x_2}&···&\frac{\partial a_{m,1}x_1+a_{m,2}x_2+···+a_{m,n}x_n}{\partial x_2}\\ ···&···&···&···\\ \frac{\partial a_{1,1}x_1+a_{1,2}x_2+···+a_{1,n}x_n}{\partial x_n}& \frac{\partial a_{2,1}x_1+a_{2,2}x_2+···+a_{2,n}x_n}{\partial x_n}&···&\frac{\partial a_{m,1}x_1+a_{m,2}x_2+···+a_{m,n}x_n}{\partial x_n}\end{bmatrix} x1a1,1x1+a1,2x2+⋅⋅⋅+a1,nxnx2a1,1x1+a1,2x2+⋅⋅⋅+a1,nxn⋅⋅⋅xna1,1x1+a1,2x2+⋅⋅⋅+a1,nxnx1a2,1x1+a2,2x2+⋅⋅⋅+a2,nxnx2a2,1x1+a2,2x2+⋅⋅⋅+a2,nxn⋅⋅⋅xna2,1x1+a2,2x2+⋅⋅⋅+a2,nxn⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅x1am,1x1+am,2x2+⋅⋅⋅+am,nxnx2am,1x1+am,2x2+⋅⋅⋅+am,nxn⋅⋅⋅xnam,1x1+am,2x2+⋅⋅⋅+am,nxn
= [ a 1 , 1 a 2 , 1 ⋅ ⋅ ⋅ a m , 1 a 1 , 2 a 2 , 2 ⋅ ⋅ ⋅ a m , 2 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ a 1 , n a 2 , n ⋅ ⋅ ⋅ a m , n ] \begin{bmatrix} a_{1,1} & a_{2,1} & ··· & a_{m,1}\\ a_{1,2} & a_{2,2} & ··· & a_{m,2} \\ ···&···&···&··· \\ a_{1,n}&a_{2,n}&···&a_{m,n} \end{bmatrix} a1,1a1,2⋅⋅⋅a1,na2,1a2,2⋅⋅⋅a2,n⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅am,1am,2⋅⋅⋅am,n = A ⊤ A^\top A

二、对于所有 A ∈ R n × m A \in \mathbb{R^{n\times m}} ARn×m,都有 ∇ x ⃗ x ⃗ ⊤ A = A \nabla_{\vec x} \vec x^\top A = A x x A=A

证明:设 A ( n , m ) A_{(n,m)} A(n,m)= [ a 1 , 1 a 1 , 2 ⋅ ⋅ ⋅ a 1 , m a 2 , 1 a 2 , 2 ⋅ ⋅ ⋅ a 2 , m ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ a n , 1 a n , 2 ⋅ ⋅ ⋅ a n , m ] \begin{bmatrix} a_{1,1}&a_{1,2}&···&a_{1,m} \\ a_{2,1}&a_{2,2}&···&a_{2,m} \\ ··· & ··· & ··· & ··· \\ a_{n,1} & a_{n,2} &···&a_{n,m} \end{bmatrix} a1,1a2,1⋅⋅⋅an,1a1,2a2,2⋅⋅⋅an,2⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅a1,ma2,m⋅⋅⋅an,m
x ⃗ ⊤ A \vec x^\top A x A=
[ a 1 , 1 x 1 + a 2 , 1 x 2 + ⋅ ⋅ ⋅ + a n , 1 x n a 1 , 2 x 1 + a 2 , 2 x 2 + ⋅ ⋅ ⋅ + a n , 2 x n ⋅ ⋅ ⋅ a 1 , m x 1 + a 2 , m x 2 + ⋅ ⋅ ⋅ + a n , m x n ] \begin{bmatrix} a_{1,1}x_1+a_{2,1}x_2+···+a_{n,1}x_n & a_{1,2}x_1+a_{2,2}x_2+···+a_{n,2}x_n & ···&a_{1,m}x_1+a_{2,m}x_2+···+a_{n,m}x_n \end{bmatrix} [a1,1x1+a2,1x2+⋅⋅⋅+an,1xna1,2x1+a2,2x2+⋅⋅⋅+an,2xn⋅⋅⋅a1,mx1+a2,mx2+⋅⋅⋅+an,mxn],
∇ x ⃗ x ⃗ ⊤ A \nabla_{\vec x}\vec x^\top A x x A= [ ∂ x ⃗ ⊤ A ∂ x 1 ∂ x ⃗ ⊤ A ∂ x 2 ⋅ ⋅ ⋅ ∂ x ⃗ ⊤ A ∂ x n ] \begin{bmatrix}\frac{\partial \vec x^\top A}{\partial x_1}\\\frac{\partial \vec x^\top A}{\partial x_2}\\···\\ \frac{\partial \vec x^\top A}{\partial x_n}\end{bmatrix} x1x Ax2x A⋅⋅⋅xnx A
= [ ∂ a 1 , 1 x 1 + a 2 , 1 x 2 + ⋅ ⋅ ⋅ + a n , 1 x n ∂ x 1 ∂ a 1 , 2 x 1 + a 2 , 2 x 2 + ⋅ ⋅ ⋅ + a n , 2 x n ∂ x 1 ⋅ ⋅ ⋅ ∂ a 1 , m x 1 + a 2 , m x 2 + ⋅ ⋅ ⋅ + a n , m x n ∂ x 1 ∂ a 1 , 1 x 1 + a 2 , 1 x 2 + ⋅ ⋅ ⋅ + a n , 1 x n ∂ x 2 ∂ a 1 , 2 x 1 + a 2 , 2 x 2 + ⋅ ⋅ ⋅ + a n , 2 x n ∂ x 2 ⋅ ⋅ ⋅ ∂ a 1 , m x 1 + a 2 , m x 2 + ⋅ ⋅ ⋅ + a n , m x n ∂ x 2 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ∂ a 1 , 1 x 1 + a 2 , 1 x 2 + ⋅ ⋅ ⋅ + a n , 1 x n ∂ x n ∂ a 1 , 2 x 1 + a 2 , 2 x 2 + ⋅ ⋅ ⋅ + a n , 2 x n ∂ x n ⋅ ⋅ ⋅ ∂ a 1 , m x 1 + a 2 , m x 2 + ⋅ ⋅ ⋅ + a n , m x n ∂ x n ] \begin{bmatrix}\frac{\partial a_{1,1}x_1+a_{2,1}x_2+···+a_{n,1}x_n}{\partial x_1}& \frac{\partial a_{1,2}x_1+a_{2,2}x_2+···+a_{n,2}x_n}{\partial x_1}&···&\frac{\partial a_{1,m}x_1+a_{2,m}x_2+···+a_{n,m}x_n}{\partial x_1}\\ \frac{\partial a_{1,1}x_1+a_{2,1}x_2+···+a_{n,1}x_n}{\partial x_2}& \frac{\partial a_{1,2}x_1+a_{2,2}x_2+···+a_{n,2}x_n}{\partial x_2}&···&\frac{\partial a_{1,m}x_1+a_{2,m}x_2+···+a_{n,m}x_n}{\partial x_2}\\ ···&···&···&···\\ \frac{\partial a_{1,1}x_1+a_{2,1}x_2+···+a_{n,1}x_n}{\partial x_n}& \frac{\partial a_{1,2}x_1+a_{2,2}x_2+···+a_{n,2}x_n}{\partial x_n}&···&\frac{\partial a_{1,m}x_1+a_{2,m}x_2+···+a_{n,m}x_n}{\partial x_n}\end{bmatrix} x1a1,1x1+a2,1x2+⋅⋅⋅+an,1xnx2a1,1x1+a2,1x2+⋅⋅⋅+an,1xn⋅⋅⋅xna1,1x1+a2,1x2+⋅⋅⋅+an,1xnx1a1,2x1+a2,2x2+⋅⋅⋅+an,2xnx2a1,2x1+a2,2x2+⋅⋅⋅+an,2xn⋅⋅⋅xna1,2x1+a2,2x2+⋅⋅⋅+an,2xn⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅x1a1,mx1+a2,mx2+⋅⋅⋅+an,mxnx2a1,mx1+a2,mx2+⋅⋅⋅+an,mxn⋅⋅⋅xna1,mx1+a2,mx2+⋅⋅⋅+an,mxn
= [ a 1 , 1 a 1 , 2 ⋅ ⋅ ⋅ a 1 , m a 2 , 1 a 2 , 2 ⋅ ⋅ ⋅ a 2 , m ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ a n , 1 a n , 2 ⋅ ⋅ ⋅ a n , m ] \begin{bmatrix} a_{1,1} & a_{1,2}&···&a_{1,m}\\ a_{2,1}&a_{2,2}&···&a_{2,m} \\ ···&···&···&···\\ a_{n,1}&a_{n,2}&···&a_{n,m} \end{bmatrix} a1,1a2,1⋅⋅⋅an,1a1,2a2,2⋅⋅⋅an,2⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅a1,ma2,m⋅⋅⋅an,m = A A A

三、对于所有 A ∈ R n × n A \in \mathbb{R^{n\times n}} ARn×n,都有 ∇ x ⃗ x ⃗ ⊤ A x ⃗ = ( A + A ⊤ ) x ⃗ \nabla_{\vec x} \vec x^\top A \vec x = (A+A^\top)\vec x x x Ax =(A+A)x

证明:设 A ( n , n ) A_{(n,n)} A(n,n)= [ a 1 , 1 a 1 , 2 ⋅ ⋅ ⋅ a 1 , n a 2 , 1 a 2 , 2 ⋅ ⋅ ⋅ a 2 , n ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ a n , 1 a n , 2 ⋅ ⋅ ⋅ a n , n ] \begin{bmatrix} a_{1,1}&a_{1,2}&···&a_{1,n} \\ a_{2,1}&a_{2,2}&···&a_{2,n} \\ ··· & ··· & ··· & ··· \\ a_{n,1} & a_{n,2} &···&a_{n,n} \end{bmatrix} a1,1a2,1⋅⋅⋅an,1a1,2a2,2⋅⋅⋅an,2⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅a1,na2,n⋅⋅⋅an,n
x ⃗ ⊤ A \vec x^\top A x A= [ a 1 , 1 x 1 + a 2 , 1 x 2 + ⋅ ⋅ ⋅ + a n , 1 x n a 1 , 2 x 1 + a 2 , 2 x 2 + ⋅ ⋅ ⋅ + a n , 2 x n ⋅ ⋅ ⋅ a 1 , n x 1 + a 2 , n x 2 + ⋅ ⋅ ⋅ + a n , n x n ] \begin{bmatrix} a_{1,1}x_1+a_{2,1}x_2+···+a_{n,1}x_n & a_{1,2}x_1+a_{2,2}x_2+···+a_{n,2}x_n & ···&a_{1,n}x_1+a_{2,n}x_2+···+a_{n,n}x_n \end{bmatrix} [a1,1x1+a2,1x2+⋅⋅⋅+an,1xna1,2x1+a2,2x2+⋅⋅⋅+an,2xn⋅⋅⋅a1,nx1+a2,nx2+⋅⋅⋅+an,nxn],
x ⃗ ⊤ A x ⃗ \vec x^\top A \vec x x Ax = [ ∑ i = 1 n ∑ j = 1 n ( a i , j x i x j ) ] \begin{bmatrix} \sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} (a_{i,j}x_ix_j) \end{bmatrix} [i=1nj=1n(ai,jxixj)],
∇ x ⃗ x ⃗ ⊤ A x ⃗ \nabla_{\vec x}\vec x^\top A \vec x x x Ax = [ ∂ ∑ i = 1 n ∑ j = 1 n ( a i , j x i x j ) ∂ x 1 ∂ ∑ i = 1 n ∑ j = 1 n ( a i , j x i x j ) ∂ x 2 ⋅ ⋅ ⋅ ∂ ∑ i = 1 n ∑ j = 1 n ( a i , j x i x j ) ∂ x n ] \begin{bmatrix} \frac{\partial \sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} (a_{i,j}x_ix_j)}{\partial x_1} \\ \frac{\partial \sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} (a_{i,j}x_ix_j)}{\partial x_2} \\ ···\\ \frac{\partial \sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} (a_{i,j}x_ix_j)}{\partial x_n} \end{bmatrix} x1i=1nj=1n(ai,jxixj)x2i=1nj=1n(ai,jxixj)⋅⋅⋅xni=1nj=1n(ai,jxixj) = [ ∑ i = 1 n ( a i , 1 + a 1 , i ) x i ∑ i = 1 n ( a i , 2 + a 2 , i ) x i ⋅ ⋅ ⋅ ∑ i = 1 n ( a i , n + a n , i ) x i ] \begin{bmatrix} \sum\limits_{i=1}^{n}(a_{i,1}+a_{1,i})x_i \\ \sum\limits_{i=1}^{n}(a_{i,2}+a_{2,i})x_i \\ ···\\ \sum\limits_{i=1}^{n}(a_{i,n}+a_{n,i})x_i \\ \end{bmatrix} i=1n(ai,1+a1,i)xii=1n(ai,2+a2,i)xi⋅⋅⋅i=1n(ai,n+an,i)xi
= [ 2 a 1 , 1 a 1 , 2 + a 2 , 1 ⋅ ⋅ ⋅ a 1 , n + a n , 1 a 2 , 1 + a 1 , 2 2 a 2 , 2 ⋅ ⋅ ⋅ a 2 , n + a n , 2 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ a n , 1 + a 1 , n a n , 2 + a 2 , n ⋅ ⋅ ⋅ 2 a n , n ] [ x 1 x 2 ⋅ ⋅ ⋅ x n ] \begin{bmatrix} 2a_{1,1} & a_{1,2}+a_{2,1} & ···&a_{1,n}+a_{n,1} \\ a_{2,1}+a_{1,2} & 2a_{2,2} & ···&a_{2,n}+a_{n,2} \\ ···&···&···&···\\ a_{n,1}+a_{1,n} & a_{n,2}+a_{2,n} & ···&2a_{n,n} \\ \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ ···\\ x_n \end{bmatrix} 2a1,1a2,1+a1,2⋅⋅⋅an,1+a1,na1,2+a2,12a2,2⋅⋅⋅an,2+a2,n⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅a1,n+an,1a2,n+an,2⋅⋅⋅2an,n x1x2⋅⋅⋅xn = ( A + A ⊤ ) x ⃗ (A+A^\top)\vec x (A+A)x

四、 ∇ x ⃗ ∥ x ∥ 2 = ∇ x ⃗ x ⃗ ⊤ x ⃗ = 2 x ⃗ \nabla_{\vec x} \Vert x \Vert ^2=\nabla_{\vec x}\vec x^\top\vec x = 2\vec x x x2=x x x =2x

证明: ∇ x ⃗ ∥ x ∥ 2 \nabla_{\vec x}\Vert x \Vert ^2 x x2= ∇ x ⃗ x 1 2 + x 2 2 + ⋅ ⋅ ⋅ + x n n 2 \nabla_{\vec x}\sqrt{x_1^2+x_2^2+···+x_n^n}^2 x x12+x22+⋅⋅⋅+xnn 2= ∇ x ⃗ x 1 2 + x 2 2 + ⋅ ⋅ ⋅ + x n n \nabla_{\vec x}x_1^2+x_2^2+···+x_n^n x x12+x22+⋅⋅⋅+xnn= ∇ x ⃗ x ⊤ x \nabla_{\vec x}x^\top x x xx
∇ x ⃗ ∥ x ∥ 2 \nabla_{\vec x}\Vert x \Vert ^2 x x2= ∇ x ⃗ x 1 2 + x 2 2 + ⋅ ⋅ ⋅ + x n n 2 \nabla_{\vec x}\sqrt{x_1^2+x_2^2+···+x_n^n}^2 x x12+x22+⋅⋅⋅+xnn 2= ∇ x ⃗ x 1 2 + x 2 2 + ⋅ ⋅ ⋅ + x n n \nabla_{\vec x}x_1^2+x_2^2+···+x_n^n x x12+x22+⋅⋅⋅+xnn= [ 2 x 1 2 x 2 ⋅ ⋅ ⋅ 2 x n ] \begin{bmatrix} 2x_1\\ 2x_2\\ ···\\ 2x_n \end{bmatrix} 2x12x2⋅⋅⋅2xn = 2 x 2x 2x

  同样,对于任何矩阵 X X X,都有 ∇ X ∥ X ∥ F 2 = 2 X \nabla_X \Vert X \Vert_F^2=2X XXF2=2X。正如我们之后将看到的,梯度对于设计深度学习中的优化算法有很大用处。

五、对于任何矩阵 X X X,都有 ∇ X ∥ X ∥ F 2 = 2 X \nabla_X \Vert X \Vert_F^2=2X XXF2=2X

证明:设 X X X m × n m\times n m×n的矩阵, X = [ x 1 , 1 x 1 , 2 ⋅ ⋅ ⋅ x 1 , n x 2 , 1 x 2 , 2 ⋅ ⋅ ⋅ x 2 , n ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ x m , 1 x m , 2 ⋅ ⋅ ⋅ x m , n ] X = \begin{bmatrix} x_{1,1}& x_{1,2}&···&x_{1,n}\\ x_{2,1}& x_{2,2}&···&x_{2,n}\\ ···&···&···&···\\ x_{m,1}& x_{m,2}&···&x_{m,n}\\ \end{bmatrix} X= x1,1x2,1⋅⋅⋅xm,1x1,2x2,2⋅⋅⋅xm,2⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅x1,nx2,n⋅⋅⋅xm,n
∥ X ∥ F 2 \Vert X \Vert_F^2 XF2= ∑ i = 1 m ∑ j = 1 n x i , j 2 2 \sqrt{\sum\limits_{i=1}^{m}\sum\limits_{j=1}^n x_{i,j}^2}^2 i=1mj=1nxi,j2 2= ∑ i = 1 m ∑ j = 1 n x i , j 2 \sum\limits_{i=1}^{m}\sum\limits_{j=1}^n x_{i,j}^2 i=1mj=1nxi,j2
∇ X ∥ X ∥ F 2 \nabla_X \Vert X \Vert_F^2 XXF2= [ 2 x 1 , 1 2 x 1 , 2 ⋅ ⋅ ⋅ 2 x 1 , n 2 x 2 , 1 2 x 2 , 2 ⋅ ⋅ ⋅ 2 x 2 , n ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 2 x m , 1 2 x m , 2 ⋅ ⋅ ⋅ 2 x m , n ] \begin{bmatrix} 2x_{1,1}& 2x_{1,2}&···&2x_{1,n}\\ 2x_{2,1}& 2x_{2,2}&···&2x_{2,n}\\ ···&···&···&···\\ 2x_{m,1}& 2x_{m,2}&···&2x_{m,n}\\ \end{bmatrix} 2x1,12x2,1⋅⋅⋅2xm,12x1,22x2,2⋅⋅⋅2xm,2⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅2x1,n2x2,n⋅⋅⋅2xm,n = 2 X 2X 2X

初看公式时没看懂,所以自己推了一遍加深印象,以上内容为推导过程,有问题欢迎讨论

你可能感兴趣的:(深度学习,人工智能)