Matrix Calculus (Differentiating a Scalar with Respect to a Vector)

Table of Contents

  • Preface
  • I. Scalar-by-Vector Differentiation
  • II. Examples
    • 1. $y = w^T x$
    • 2. $y = x^T w$
    • 3. $y = x^T A_{n \times n} x$
  • Summary


Preface

Study notes on matrix differentiation.


I. Scalar-by-Vector Differentiation

Differentiating a scalar with respect to a vector means taking the partial derivative of the scalar with respect to each element of the vector, then arranging the results into a vector of the same shape as that vector. That is:

$$\frac{\partial y}{\partial \vec{x}} = \left(\frac{\partial y}{\partial x_1}, \frac{\partial y}{\partial x_2}, \dots, \frac{\partial y}{\partial x_n}\right)^T$$

where $y$ is a scalar and $x = (x_1, x_2, \dots, x_n)^T$ is an $n$-dimensional vector.
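This definition is easy to check numerically. Below is a minimal sketch using NumPy; the helper `num_grad` and the test function are my own illustrative additions, not part of the original article. It approximates each partial derivative with a central difference and stacks the results into a vector.

```python
import numpy as np

def num_grad(f, x, eps=1e-6):
    """Central-difference approximation of the gradient of a scalar function f."""
    return np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(len(x))])

# Example: y = x1^2 + 2*x2 + x3^3, so dy/dx = (2*x1, 2, 3*x3^2)^T
f = lambda x: x[0]**2 + 2 * x[1] + x[2]**3
x = np.array([1.0, -2.0, 0.5])
print(num_grad(f, x))  # ≈ [2.0, 2.0, 0.75]
```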

II. Examples

1. $y = w^T x$

This weighted-sum form is common in signal processing.
Expanding the product gives:

$$y = w_1 x_1 + w_2 x_2 + \dots + w_n x_n$$

Then, by the definition above,

$$\frac{\partial y}{\partial \vec{x}} = \left(\frac{\partial (w_1 x_1 + w_2 x_2 + \dots + w_n x_n)}{\partial x_1}, \frac{\partial (w_1 x_1 + w_2 x_2 + \dots + w_n x_n)}{\partial x_2}, \dots, \frac{\partial (w_1 x_1 + w_2 x_2 + \dots + w_n x_n)}{\partial x_n}\right)^T$$

and clearly

$$\frac{\partial y}{\partial \vec{x}} = (w_1, w_2, \dots, w_n)^T = \vec{w}$$

This gives us the derivative for the first form.

2. $y = x^T w$

Likewise, $y = w_1 x_1 + w_2 x_2 + \dots + w_n x_n$, so this case gives the same result as the previous one:

$$\frac{\partial y}{\partial \vec{x}} = (w_1, w_2, \dots, w_n)^T = \vec{w}$$

A numerical check covering both forms follows below.
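Since $w^T x$ and $x^T w$ are the same scalar, one finite-difference check covers both forms. This is a sketch with random test vectors; `num_grad` is the same illustrative helper as before, repeated so the snippet runs on its own.

```python
import numpy as np

def num_grad(f, x, eps=1e-6):
    """Central-difference approximation of the gradient of a scalar function f."""
    return np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(len(x))])

rng = np.random.default_rng(0)
w = rng.standard_normal(5)
x = rng.standard_normal(5)

# y = w^T x and y = x^T w are the same scalar, so both gradients equal w.
print(np.allclose(num_grad(lambda v: w @ v, x), w, atol=1e-6))  # True
print(np.allclose(num_grad(lambda v: v @ w, x), w, atol=1e-6))  # True
```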

3. $y = x^T A_{n \times n} x$

This quadratic form is also quite common. Writing it out term by term gives

$$\begin{aligned} y = {} & a_{11}x_1^2 + a_{12}x_1x_2 + \dots + a_{1n}x_1x_n + {} \\ & a_{21}x_2x_1 + a_{22}x_2^2 + \dots + a_{2n}x_2x_n + {} \\ & \;\vdots \\ & a_{n1}x_nx_1 + a_{n2}x_nx_2 + \dots + a_{nn}x_n^2 \end{aligned}$$

Differentiating with respect to each component:

$$\frac{\partial y}{\partial x_1} = 2a_{11}x_1 + (a_{12}+a_{21})x_2 + \dots + (a_{1n}+a_{n1})x_n$$

$$\frac{\partial y}{\partial x_2} = (a_{12}+a_{21})x_1 + 2a_{22}x_2 + \dots + (a_{2n}+a_{n2})x_n$$

$$\vdots$$

$$\frac{\partial y}{\partial x_n} = (a_{1n}+a_{n1})x_1 + (a_{2n}+a_{n2})x_2 + \dots + 2a_{nn}x_n$$
Therefore,

$$\frac{\partial y}{\partial \vec{x}} = \begin{pmatrix} 2a_{11} & a_{12}+a_{21} & \dots & a_{1n}+a_{n1} \\ a_{12}+a_{21} & 2a_{22} & \dots & a_{2n}+a_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n}+a_{n1} & a_{2n}+a_{n2} & \dots & 2a_{nn} \end{pmatrix} \vec{x}$$
In fact,

$$\begin{pmatrix} 2a_{11} & a_{12}+a_{21} & \dots & a_{1n}+a_{n1} \\ a_{12}+a_{21} & 2a_{22} & \dots & a_{2n}+a_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n}+a_{n1} & a_{2n}+a_{n2} & \dots & 2a_{nn} \end{pmatrix} = \begin{pmatrix} a_{11}+a_{11} & a_{12}+a_{21} & \dots & a_{1n}+a_{n1} \\ a_{12}+a_{21} & a_{22}+a_{22} & \dots & a_{2n}+a_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n}+a_{n1} & a_{2n}+a_{n2} & \dots & a_{nn}+a_{nn} \end{pmatrix} = A^T + A$$
Hence we obtain $\frac{\partial y}{\partial \vec{x}} = (A^T + A)\vec{x}$.
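A quick check with a deliberately non-symmetric random $A$ (a sketch, with `num_grad` repeated for self-containment) confirms that the gradient is $(A^T + A)x$ rather than $2Ax$:

```python
import numpy as np

def num_grad(f, x, eps=1e-6):
    """Central-difference approximation of the gradient of a scalar function f."""
    return np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(len(x))])

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))    # non-symmetric in general
x = rng.standard_normal(4)

g_num = num_grad(lambda v: v @ A @ v, x)
print(np.allclose(g_num, (A.T + A) @ x, atol=1e-5))  # True
print(np.allclose(g_num, 2 * A @ x, atol=1e-5))      # False: 2Ax only holds for symmetric A
```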

Let us revisit a derivative from the previous article:

$$L(w) = w^T\tilde{R}w + \lambda[w^Ta(\theta_d) - 1]$$

where $L$ is a scalar, $w = (w_1, w_2, \dots, w_n)^T$, and $\tilde{R}$ is a real symmetric matrix.
We want $\frac{\partial L(w)}{\partial w}$.
Split it into two parts. Since $\tilde{R}$ is symmetric, $\tilde{R}^T = \tilde{R}$, so

$$\frac{\partial (w^T\tilde{R}w)}{\partial w} = (\tilde{R}^T + \tilde{R})w = 2\tilde{R}w$$

$$\frac{\partial (w^Ta(\theta_d) - 1)}{\partial w} = a(\theta_d)$$

Hence the final result is

$$\frac{\partial L(w)}{\partial w} = 2\tilde{R}w + \lambda a(\theta_d)$$
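Putting the two pieces together numerically (a sketch with made-up stand-ins: a random symmetric matrix `R` for $\tilde{R}$, a random vector `a` for $a(\theta_d)$, and an arbitrary `lam` for $\lambda$):

```python
import numpy as np

def num_grad(f, x, eps=1e-6):
    """Central-difference approximation of the gradient of a scalar function f."""
    return np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(len(x))])

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
R = (B + B.T) / 2                 # random real symmetric matrix, standing in for R~
a = rng.standard_normal(4)        # stand-in for the steering vector a(theta_d)
lam = 0.7                         # stand-in for the multiplier lambda
w = rng.standard_normal(4)

L = lambda v: v @ R @ v + lam * (v @ a - 1)
print(np.allclose(num_grad(L, w), 2 * R @ w + lam * a, atol=1e-5))  # True
```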


Summary

This post covered the most common cases of differentiating a scalar with respect to a vector; such derivatives come up all the time in digital signal processing and deep learning. More to follow when time permits.
