An Example of Differentiating a Frobenius Norm with Respect to a Matrix

==PART1 ==

1. Problem

How do we solve for the unknown parameter $W$ in the following expression, that is, what is the partial derivative of the objective with respect to $W$?
$$\min f(W)=\min_W \|(XW \circ \breve{D})B\|_F^2 \tag{0}$$
Note that only $W$ is an unknown parameter, and $X \in R^{n \times m}$, $W \in R^{m \times c}$, $\breve{D} \in R^{n \times c}$, $B \in R^{c \times c}$.
Here $\circ$ denotes the Hadamard product, i.e. the element-wise product of two matrices (`.*` in MATLAB).

2. Derivation

Let $S=XW \circ \breve{D}$. Then
$$f(W)=\|(XW \circ \breve{D})B\|_F^2 = \|SB\|_F^2 = \operatorname{tr}(SBB^TS^T) \tag{1}$$
For generic matrices $X$ and $B$ of compatible sizes we have the identity
$$\frac{\partial \operatorname{tr}(XBX^T)}{\partial X}=XB^T + XB \tag{2}$$
Applying it with $S$ in place of $X$ and the symmetric matrix $BB^T$ in place of $B$ gives
$$\frac{\partial f}{\partial S}=SBB^T + SBB^T=2SBB^T \tag{3}$$
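Equations (1) and (3) can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy and small random matrices; the sizes, the seed, and the helper name `g` are illustrative choices, not part of the original derivation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, c = 4, 3                      # arbitrary small sizes (assumption)
S = rng.standard_normal((n, c))
B = rng.standard_normal((c, c))

def g(S_mat):
    """||S B||_F^2 as a function of S (B is held fixed)."""
    return np.linalg.norm(S_mat @ B, "fro") ** 2

# Equation (1): the squared Frobenius norm equals tr(S B B^T S^T).
trace_form = np.trace(S @ B @ B.T @ S.T)

# Equation (3): closed-form gradient with respect to S.
grad_closed = 2.0 * S @ B @ B.T

# Central finite differences, one entry of S at a time.
eps = 1e-5
grad_num = np.zeros_like(S)
for idx in np.ndindex(*S.shape):
    E = np.zeros_like(S)
    E[idx] = eps
    grad_num[idx] = (g(S + E) - g(S - E)) / (2 * eps)

norm_gap = abs(g(S) - trace_form)
grad_gap = np.max(np.abs(grad_closed - grad_num))
print("norm vs trace gap:", norm_gap)
print("max gradient gap:", grad_gap)
```

Both gaps should be at the level of floating-point rounding, because the function is quadratic in $S$ and central differences are then exact up to rounding.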
Continuing: because $\breve{D}$ is constant, $dS = d(XW)\circ\breve{D} = (X\,dW)\circ\breve{D}$, and the identity $\operatorname{tr}(A^T(B\circ C))=\operatorname{tr}((A\circ B)^TC)$ lets us move $\breve{D}$ onto $\frac{\partial f}{\partial S}$:
$$
\begin{aligned}
df &= \operatorname{tr}\left[\left(\frac{\partial f}{\partial S}\right)^T dS\right] = \operatorname{tr}\left[\left(\frac{\partial f}{\partial S}\right)^T d(XW \circ \breve{D})\right] \\
&= \operatorname{tr}\left[\left(\frac{\partial f}{\partial S}\right)^T \left(d(XW)\circ\breve{D}\right)\right] \\
&= \operatorname{tr}\left[\left(\frac{\partial f}{\partial S}\right)^T \left(X\,dW\circ\breve{D}\right)\right] \\
&= \operatorname{tr}\left[\left(\frac{\partial f}{\partial S}\right)^T \left(\breve{D} \circ X\,dW\right)\right] \\
&= \operatorname{tr}\left[\left(\frac{\partial f}{\partial S} \circ \breve{D}\right)^T X\,dW\right] \\
&= \operatorname{tr}\left[\left(X^T\left(\frac{\partial f}{\partial S} \circ \breve{D}\right)\right)^T dW\right]
\end{aligned}
$$
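The step that moves $\breve{D}$ from the differential onto $\frac{\partial f}{\partial S}$ relies on the fact that $\operatorname{tr}(A^T(B\circ C))$ and $\operatorname{tr}((A\circ B)^TC)$ both equal $\sum_{ij}A_{ij}B_{ij}C_{ij}$. A small sketch of that identity, assuming NumPy and random matrices of one common (arbitrary) shape:

```python
import numpy as np

rng = np.random.default_rng(2)
# Three random matrices of the same shape; names are illustrative only.
A, D, C = (rng.standard_normal((4, 3)) for _ in range(3))

lhs = np.trace(A.T @ (D * C))      # tr(A^T (D ∘ C))
rhs = np.trace((A * D).T @ C)      # tr((A ∘ D)^T C)
elementwise = np.sum(A * D * C)    # sum_ij A_ij D_ij C_ij

print(lhs, rhs, elementwise)
```

All three numbers should agree up to rounding, which is why the Hadamard factor can be attached to either side of the trace inner product.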
Therefore:
$$
\begin{aligned}
\frac{\partial f}{\partial W} &= X^T\left(\frac{\partial f}{\partial S} \circ \breve{D}\right) = X^T\left[(2SBB^T) \circ \breve{D}\right] \\
&= 2X^T\left[\left((XW \circ \breve{D})BB^T\right)\circ \breve{D}\right]
\end{aligned}
$$
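The final gradient can be compared against finite differences of the objective in (0). This is a minimal sketch, assuming NumPy; the function names `f`, `grad_f`, `numerical_grad`, the matrix sizes, and the random data are all hypothetical choices made for illustration.

```python
import numpy as np

def f(W, X, D, B):
    """Objective ||((X W) ∘ D) B||_F^2 from equation (0)."""
    S = (X @ W) * D                       # Hadamard product with D
    return np.sum((S @ B) ** 2)

def grad_f(W, X, D, B):
    """Closed-form gradient X^T [ (2 S B B^T) ∘ D ]."""
    S = (X @ W) * D
    return X.T @ ((2.0 * S @ B @ B.T) * D)

def numerical_grad(func, W, eps=1e-5):
    """Central-difference approximation of d func / dW, entry by entry."""
    G = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        E = np.zeros_like(W)
        E[idx] = eps
        G[idx] = (func(W + E) - func(W - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
n, m, c = 5, 4, 3                         # arbitrary small sizes (assumption)
X = rng.standard_normal((n, m))
W = rng.standard_normal((m, c))
D = rng.standard_normal((n, c))
B = rng.standard_normal((c, c))

G_closed = grad_f(W, X, D, B)
G_num = numerical_grad(lambda V: f(V, X, D, B), W)
max_err = np.max(np.abs(G_closed - G_num))
print("max abs difference:", max_err)
```

A difference near rounding level supports the closed form; a check like this on random data is evidence, not a proof.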

3. Notes

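  • This post covers the derivative of a scalar with respect to a matrix; the derivative of a matrix with respect to a matrix is harder.
  • The derivation here follows the "matrix differentiation technique" (矩阵求导术); for a more detailed explanation, see the related material at https://blog.csdn.net/lgl123ok/article/details/120780368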
  • Key formulas: [Figures 1-3: formula images from the original post]

==PART2 ==

Problem 2: What is the partial derivative of $\|XW-Z\|_F^2$ with respect to $W$?
Answer: Let $f(W)=\|XW-Z\|_F^2$ and $S=XW-Z$. Then
$$\frac{\partial f}{\partial S}=2S=2(XW-Z)$$

$$
\begin{aligned}
df &= \operatorname{tr}\left[\left(\frac{\partial f}{\partial S}\right)^T dS\right] = \operatorname{tr}\left[\left(\frac{\partial f}{\partial S}\right)^T d(XW - Z)\right] \\
&= \operatorname{tr}\left[\left(\frac{\partial f}{\partial S}\right)^T (X\,dW)\right] \\
&= \operatorname{tr}\left[\left(X^T\frac{\partial f}{\partial S}\right)^T dW\right]
\end{aligned}
$$
Therefore:
$$\frac{\partial f}{\partial W}=X^T\frac{\partial f}{\partial S}=X^T[2(XW-Z)]=2X^T(XW-Z)$$
This result is correct; see Eq. (11) of "Multi-label feature selection via manifold regularization and dependence maximization".
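As a quick numerical cross-check of $2X^T(XW-Z)$, the sketch below compares it with finite differences and also evaluates it at the least-squares solution, where setting the gradient to zero gives $X^TXW=X^TZ$. It assumes NumPy; the names `f`, `grad_f`, `W_star` and the random data are illustrative, not taken from the original post.

```python
import numpy as np

def f(W, X, Z):
    """Objective ||X W - Z||_F^2."""
    return np.sum((X @ W - Z) ** 2)

def grad_f(W, X, Z):
    """Closed-form gradient 2 X^T (X W - Z)."""
    return 2.0 * X.T @ (X @ W - Z)

rng = np.random.default_rng(3)
n, m, c = 6, 4, 3                 # arbitrary small sizes (assumption)
X = rng.standard_normal((n, m))
Z = rng.standard_normal((n, c))
W = rng.standard_normal((m, c))

# Central finite differences, one entry of W at a time.
eps = 1e-5
G_num = np.zeros_like(W)
for idx in np.ndindex(*W.shape):
    E = np.zeros_like(W)
    E[idx] = eps
    G_num[idx] = (f(W + E, X, Z) - f(W - E, X, Z)) / (2 * eps)

max_err = np.max(np.abs(grad_f(W, X, Z) - G_num))

# At the least-squares solution the gradient should vanish.
W_star = np.linalg.lstsq(X, Z, rcond=None)[0]
grad_at_star = np.max(np.abs(grad_f(W_star, X, Z)))

print("max abs difference:", max_err)
print("max |gradient| at least-squares solution:", grad_at_star)
```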


==PART3 ==

Problem 3: What is the partial derivative of $\|Q- W \circ A + \frac{1}{\rho}\Gamma_1\|_F^2$ with respect to $W$? Here $\circ$ is the element-wise product.
Answer: Let $f(W)=\|Q- W \circ A + \frac{1}{\rho}\Gamma_1\|_F^2$ and $S=Q- W \circ A + \frac{1}{\rho}\Gamma_1$. Then
$$\frac{\partial f}{\partial S}=2S=2\left(Q- W \circ A + \frac{1}{\rho}\Gamma_1\right)$$

$$
\begin{aligned}
df &= \operatorname{tr}\left[\left(\frac{\partial f}{\partial S}\right)^T dS\right] = \operatorname{tr}\left[\left(\frac{\partial f}{\partial S}\right)^T d\left(Q- W \circ A + \frac{1}{\rho}\Gamma_1\right)\right] \\
&= \operatorname{tr}\left[\left(\frac{\partial f}{\partial S}\right)^T (-A \circ dW)\right] \\
&= \operatorname{tr}\left[\left(\frac{\partial f}{\partial S} \circ (-A)\right)^T dW\right]
\end{aligned}
$$
Therefore:
$$
\begin{aligned}
\frac{\partial f}{\partial W} &= \frac{\partial f}{\partial S} \circ (-A) = 2\left(Q- W \circ A + \frac{1}{\rho}\Gamma_1\right) \circ (-A) \\
&= -2\left(Q- W \circ A + \frac{1}{\rho}\Gamma_1\right) \circ A
\end{aligned}
$$

Treat this last result as a rough reference only.
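One way to gain some confidence in the Problem 3 formula is a finite-difference comparison on random data. The sketch below assumes NumPy, treats $\rho$ as a positive scalar, and uses hypothetical names (`f`, `grad_f`, `Gamma1`) and arbitrary sizes.

```python
import numpy as np

def f(W, Q, A, Gamma1, rho):
    """Objective ||Q - W ∘ A + Gamma1 / rho||_F^2."""
    return np.sum((Q - W * A + Gamma1 / rho) ** 2)

def grad_f(W, Q, A, Gamma1, rho):
    """Closed-form gradient -2 (Q - W ∘ A + Gamma1 / rho) ∘ A."""
    return -2.0 * (Q - W * A + Gamma1 / rho) * A

rng = np.random.default_rng(4)
shape = (4, 3)                    # all matrices share one shape (assumption)
Q = rng.standard_normal(shape)
A = rng.standard_normal(shape)
Gamma1 = rng.standard_normal(shape)
W = rng.standard_normal(shape)
rho = 2.0                         # assumed positive scalar

# Central finite differences, one entry of W at a time.
eps = 1e-5
G_num = np.zeros_like(W)
for idx in np.ndindex(*W.shape):
    E = np.zeros_like(W)
    E[idx] = eps
    G_num[idx] = (f(W + E, Q, A, Gamma1, rho) - f(W - E, Q, A, Gamma1, rho)) / (2 * eps)

max_err = np.max(np.abs(grad_f(W, Q, A, Gamma1, rho) - G_num))
print("max abs difference:", max_err)
```

A small difference here only shows agreement for these random inputs under the stated assumptions.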
