Problem statement:
The log-likelihood functions to be optimized are:
$$L_{X}(\boldsymbol{\theta})=\log p(\boldsymbol{X} \mid t, \boldsymbol{\theta})=-\frac{1}{2}(\boldsymbol{X}-\boldsymbol{m})^{T}\left(\boldsymbol{K}+\sigma_{n}^{2} \boldsymbol{I}\right)^{-1}(\boldsymbol{X}-\boldsymbol{m})-\frac{1}{2} \log \left|\boldsymbol{K}+\sigma_{n}^{2} \boldsymbol{I}\right|-\frac{n}{2} \log 2 \pi\tag{1}$$

$$L_{Y}(\boldsymbol{\theta})=\log p(\boldsymbol{Y} \mid t, \boldsymbol{\theta})=-\frac{1}{2}(\boldsymbol{Y}-\boldsymbol{m})^{T}\left(\boldsymbol{K}+\sigma_{n}^{2} \boldsymbol{I}\right)^{-1}(\boldsymbol{Y}-\boldsymbol{m})-\frac{1}{2} \log \left|\boldsymbol{K}+\sigma_{n}^{2} \boldsymbol{I}\right|-\frac{n}{2} \log 2 \pi\tag{2}$$
where $m(t)$ is the mean function and $K$ is the kernel function, given respectively by

$$m(t)=a_{0}+a_{1} t+a_{2} t^{2}+a_{3} t^{3}+a_{4} t^{4}+a_{5} t^{5}\tag{3}$$

$$k\left(t, t^{\prime}\right)=\sigma_{f}^{2} \exp \left(-\frac{1}{2 l^{2}}\left(t-t^{\prime}\right)^{2}\right)+\sigma_{n}^{2} \delta_{t t^{\prime}}\tag{4}$$

Solution procedure
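Before differentiating, it can help to have equations (1) to (4) in executable form. The following is a minimal NumPy sketch, not the original author's code: the function names `mean_fn`, `se_kernel`, and `log_likelihood` are made up for illustration, and it assumes the noise term $\sigma_n^2 \delta_{tt'}$ in (4) is the same $\sigma_n^2 I$ that appears in (1), so the noise is added only once. The Cholesky factorization is a common numerical choice for the inverse and log-determinant, not something the derivation itself requires.

```python
import numpy as np

def mean_fn(t, a):
    """Degree-5 polynomial mean of equation (3); a = [a0, ..., a5]."""
    return np.polynomial.polynomial.polyval(t, a)

def se_kernel(t, sigma_f, ell):
    """Squared-exponential part of equation (4), without the noise term."""
    d = t[:, None] - t[None, :]          # pairwise differences t - t'
    return sigma_f**2 * np.exp(-0.5 * d**2 / ell**2)

def log_likelihood(x, t, a, sigma_f, ell, sigma_n):
    """Equation (1): Gaussian log-likelihood of observations x at times t."""
    n = t.size
    # Assumption: noise enters once, as sigma_n^2 * I.
    C = se_kernel(t, sigma_f, ell) + sigma_n**2 * np.eye(n)
    r = x - mean_fn(t, a)                # residual X - m
    L = np.linalg.cholesky(C)            # C = L L^T
    z = np.linalg.solve(L, r)            # z = L^{-1} r, so r^T C^{-1} r = z^T z
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * z @ z - 0.5 * logdet - 0.5 * n * np.log(2.0 * np.pi)
```

Equation (2) is evaluated by the same function, passing the $Y$ observations in place of $X$.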
(1) Partial derivatives with respect to $a_{0}, a_{1}, a_{2}, a_{3}, a_{4}, a_{5}$
As a simple example, take $f(x)=X^{T} A X$, where $X=\left(\varepsilon_{1}, \varepsilon_{2}, \ldots, \varepsilon_{n}\right)^{T}$ and $A=\left(\begin{array}{ccc}{a_{11}} & {\dots} & {a_{1 n}} \\ {\vdots} & {} & {\vdots} \\ {a_{n 1}} & {\dots} & {a_{n n}}\end{array}\right)$.
Then

$$\begin{aligned}f(x)&=X^{T} A X\\&=\sum_{i=1}^{n} \sum_{j=1}^{n} \varepsilon_{i} a_{i j} \varepsilon_{j}\\&=\varepsilon_{1} \sum_{j=1}^{n} a_{1 j} \varepsilon_{j}+\cdots+\varepsilon_{k-1} \sum_{j=1}^{n} a_{(k-1) j} \varepsilon_{j}+\varepsilon_{k} \sum_{j=1}^{n} a_{k j} \varepsilon_{j}+\varepsilon_{k+1} \sum_{j=1}^{n} a_{(k+1) j} \varepsilon_{j}+\cdots+\varepsilon_{n} \sum_{j=1}^{n} a_{n j} \varepsilon_{j}\end{aligned}\tag{5}$$

$$\begin{aligned}\frac{\partial f}{\partial \varepsilon_{k}}&=\varepsilon_{1} a_{1 k}+\cdots+\varepsilon_{k-1} a_{(k-1) k}+\left(\sum_{j=1}^{n} a_{k j} \varepsilon_{j}+\varepsilon_{k} a_{k k}\right)+\varepsilon_{k+1} a_{(k+1) k}+\cdots+\varepsilon_{n} a_{n k}\\&=\sum_{j=1}^{n} a_{k j} \varepsilon_{j}+\sum_{i=1}^{n} \varepsilon_{i} a_{i k}\end{aligned}\tag{6}$$

$$\frac{\partial f}{\partial X}=\left(\begin{array}{c}{\frac{\partial f}{\partial \varepsilon_{1}}} \\ {\frac{\partial f}{\partial \varepsilon_{2}}} \\ {\vdots} \\ {\frac{\partial f}{\partial \varepsilon_{n}}}\end{array}\right)=\left(\begin{array}{c}{\sum_{j=1}^{n} a_{1 j} \varepsilon_{j}+\sum_{i=1}^{n} \varepsilon_{i} a_{i 1}} \\ {\vdots} \\ {\sum_{j=1}^{n} a_{n j} \varepsilon_{j}+\sum_{i=1}^{n} \varepsilon_{i} a_{i n}}\end{array}\right)=A X+A^{T} X=2 A X\tag{7}$$

The last equality holds when $A$ is symmetric, which is the case for $A=\left(K+\sigma_{n}^{2} I\right)^{-1}$.

The derivative with respect to a hyperparameter takes the chain-rule form:

$$\frac{\partial f}{\partial a_{0}}=\frac{\partial f}{\partial x_{1}} \frac{\partial x_{1}}{\partial a_{0}}+\frac{\partial f}{\partial x_{2}} \frac{\partial x_{2}}{\partial a_{0}}+\ldots+\frac{\partial f}{\partial x_{n}} \frac{\partial x_{n}}{\partial a_{0}}\tag{8}$$

Combining this with the results above gives

$$\frac{\partial L}{\partial a}=2 P A X\tag{9}$$

where $P_{0}=[1\ \ 1 \ldots 1\ \ 1]_{1 \times 200}$, $P_{1}=t \times[1\ \ 1 \ldots 1\ \ 1]_{1 \times 200}$, ..., $P_{5}=t^{5} \times[1\ \ 1 \ldots 1\ \ 1]_{1 \times 200}$, and $P=[P_{0} \ \ P_{1}\ \ P_{2} \ \ P_{3} \ \ P_{4} \ \ P_{5}]^{T}$. In other words, row $k$ of $P$ holds $t_{i}^{k}$ for each of the 200 sample times, so that $\partial m(t_{i}) / \partial a_{k}=t_{i}^{k}$ by (3). In (9), $X$ plays the role of the residual $X-m$ and $A$ that of $\left(K+\sigma_{n}^{2} I\right)^{-1}$. Read against (1), once the factor $-\frac{1}{2}$ and the sign of $\partial(X-m) / \partial a_{k}$ are carried through, the gradient of the log-likelihood itself is $\partial L / \partial a=P A(X-m)$.
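The quadratic-form rule (7) and the gradient with respect to the polynomial coefficients can be checked numerically. The sketch below is an illustration under stated assumptions: the helper names `quad_form_grad`, `design_matrix`, and `grad_mean_coeffs` are hypothetical, `C` stands for $K+\sigma_n^2 I$, and the coefficient gradient is written for the log-likelihood (1) itself, where the factor $-\frac{1}{2}$ and $\partial(X-m)/\partial a_k = -P_k^T$ give $P\,C^{-1}(X-m)$.

```python
import numpy as np

def quad_form_grad(A, x):
    """Gradient of f(x) = x^T A x, equation (7): (A + A^T) x."""
    return (A + A.T) @ x

def design_matrix(t, degree=5):
    """Matrix P with rows P_0..P_degree; row k holds t_i**k."""
    return np.vstack([t**k for k in range(degree + 1)])

def grad_mean_coeffs(x, t, a, C):
    """dL/da for L = -1/2 (x - m)^T C^{-1} (x - m) + const, with m = P^T a."""
    P = design_matrix(t, len(a) - 1)
    r = x - P.T @ a                      # residual X - m
    return P @ np.linalg.solve(C, r)     # P C^{-1} (X - m), C symmetric

# Finite-difference check of equation (7) on a random, non-symmetric A.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
x = rng.normal(size=4)
eps = 1e-6
numeric = np.array([
    ((x + eps * e) @ A @ (x + eps * e) - (x - eps * e) @ A @ (x - eps * e)) / (2 * eps)
    for e in np.eye(4)
])
print(np.allclose(numeric, quad_form_grad(A, x), atol=1e-6))
```

For a symmetric `A` the result of `quad_form_grad` reduces to `2 * A @ x`, matching the last step of (7).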
(2) Partial derivatives with respect to $\sigma_{f}$, $l$, $\sigma_{n}$

$$\frac{\partial L}{\partial K}=X^{T} X\tag{10}$$

Then, from the functional form of $K$, the partial derivatives are

$$\begin{aligned}\frac{\partial L}{\partial \sigma_{f}}&=2 \sigma_{f} e^{-\frac{1}{2 l^{2}}\left(t-t^{\prime}\right)^{2}} X^{T} I X\\\frac{\partial L}{\partial l}&=\sigma_{f}^{2} e^{-\frac{1}{2 l^{2}}\left(t-t^{\prime}\right)^{2}} \frac{1}{l^{3}}\left(t-t^{\prime}\right)^{2} X^{T} I X\\\frac{\partial L}{\partial \sigma_{n}}&=2 \sigma_{n} \delta_{t t^{\prime}} X^{T} I X\end{aligned}\tag{11}$$

The scalar factors in front of $X^{T} I X$ are the entrywise derivatives $\partial k / \partial \sigma_{f}$, $\partial k / \partial l$, and $\partial k / \partial \sigma_{n}$ of the kernel (4).

For the second part, $\frac{1}{2} \log \left|\boldsymbol{K}+\sigma_{n}^{2} \boldsymbol{I}\right|$, the partial derivatives are obtained using the matrix trace. Before solving, first review the relationship between matrix differentiation and the trace:
Matrix calculus: https://www.qiujiawei.com/matrix-calculus-1/
The underlying identities are:

$$d[\ln |\Sigma|]=|\Sigma|^{-1} d|\Sigma|=\operatorname{tr}\left(\Sigma^{-1} d \Sigma\right)\tag{12}$$

$$d f=\operatorname{tr}\left(\left(\frac{\partial f}{\partial X}\right)^{T} d X\right)$$
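Applying (12) with $C=K+\sigma_n^2 I$ gives $\partial \log|C| / \partial\theta = \operatorname{tr}\left(C^{-1}\, \partial C/\partial\theta\right)$. Differentiating through the inverse in the quadratic term as well leads to the commonly used combined gradient, which differs in form from (10) and (11) because it accounts for $C^{-1}$:

$$\frac{\partial L}{\partial \theta}=\frac{1}{2} r^{T} C^{-1} \frac{\partial C}{\partial \theta} C^{-1} r-\frac{1}{2} \operatorname{tr}\left(C^{-1} \frac{\partial C}{\partial \theta}\right), \qquad r=X-m$$

The sketch below evaluates this for $\theta \in \{\sigma_f, l, \sigma_n\}$. It is only an illustration: the function names are hypothetical, and it assumes the noise enters once as $\sigma_n^2 I$, so that $\partial C/\partial\sigma_n = 2\sigma_n I$.

```python
import numpy as np

def covariance_and_derivs(t, sigma_f, ell, sigma_n):
    """Return C = K + sigma_n^2 I and dC/dtheta for each hyperparameter."""
    d2 = (t[:, None] - t[None, :])**2            # (t - t')^2
    E = np.exp(-0.5 * d2 / ell**2)
    eye = np.eye(t.size)
    C = sigma_f**2 * E + sigma_n**2 * eye        # assumption: noise added once
    dC = {
        "sigma_f": 2.0 * sigma_f * E,            # entrywise dk/dsigma_f
        "ell": sigma_f**2 * E * d2 / ell**3,     # entrywise dk/dl
        "sigma_n": 2.0 * sigma_n * eye,          # entrywise dk/dsigma_n
    }
    return C, dC

def log_lik(r, C):
    """Log-likelihood (1) written in terms of the residual r = X - m."""
    _, logdet = np.linalg.slogdet(C)
    return (-0.5 * r @ np.linalg.solve(C, r)
            - 0.5 * logdet
            - 0.5 * r.size * np.log(2.0 * np.pi))

def hyper_grads(r, C, dC):
    """dL/dtheta = 1/2 r^T C^-1 dC C^-1 r - 1/2 tr(C^-1 dC) per parameter."""
    alpha = np.linalg.solve(C, r)                # C^{-1} r
    Cinv = np.linalg.inv(C)
    return {name: 0.5 * alpha @ D @ alpha - 0.5 * np.trace(Cinv @ D)
            for name, D in dC.items()}
```

Comparing `hyper_grads` against central finite differences of `log_lik` is a quick way to confirm that both the quadratic term and the trace term have the right sign.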