Finding a locally optimal solution for the hyperparameters of a quadratic-form function with the conjugate gradient method

Problem statement:

The likelihood functions to be optimized are:
$$
L_{X}(\boldsymbol{\theta})=\log p(\boldsymbol{X} \mid t, \boldsymbol{\theta})=-\frac{1}{2}(\boldsymbol{X}-\boldsymbol{m})^{T}\left(\boldsymbol{K}+\sigma_{n}^{2} \boldsymbol{I}\right)^{-1}(\boldsymbol{X}-\boldsymbol{m})-\frac{1}{2} \log \left|\boldsymbol{K}+\sigma_{n}^{2} \boldsymbol{I}\right|-\frac{n}{2} \log 2 \pi\tag{1}
$$

$$
L_{Y}(\boldsymbol{\theta})=\log p(\boldsymbol{Y} \mid t, \boldsymbol{\theta})=-\frac{1}{2}(\boldsymbol{Y}-\boldsymbol{m})^{T}\left(\boldsymbol{K}+\sigma_{n}^{2} \boldsymbol{I}\right)^{-1}(\boldsymbol{Y}-\boldsymbol{m})-\frac{1}{2} \log \left|\boldsymbol{K}+\sigma_{n}^{2} \boldsymbol{I}\right|-\frac{n}{2} \log 2 \pi\tag{2}
$$

where $m(t)$ is the mean function and $K$ is the kernel function, given respectively by

$$
m(t)=a_{0}+a_{1} t+a_{2} t^{2}+a_{3} t^{3}+a_{4} t^{4}+a_{5} t^{5}\tag{3}
$$

$$
k\left(t, t^{\prime}\right)=\sigma_{f}^{2} \exp \left(-\frac{1}{2 l^{2}}\left(t-t^{\prime}\right)^{2}\right)+\sigma_{n}^{2} \delta_{t t^{\prime}}\tag{4}
$$

Solution procedure
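Before differentiating, it can help to evaluate the objective in (1) numerically. The following is a minimal sketch, not the author's implementation. The function names `poly_mean`, `se_kernel` and `log_likelihood` are hypothetical, and the sketch assumes the noise variance $\sigma_n^2$ is added to the diagonal exactly once, with the kernel function returning only the squared-exponential part of (4).

```python
import numpy as np

def poly_mean(t, a):
    """Polynomial mean m(t) = a0 + a1*t + ... + a5*t^5, as in (3)."""
    # Columns of the Vandermonde matrix are 1, t, t^2, ..., t^(len(a)-1).
    return np.vander(t, N=len(a), increasing=True) @ a

def se_kernel(t, sigma_f, ell):
    """Squared-exponential part of (4), without the noise term (assumption)."""
    d = t[:, None] - t[None, :]
    return sigma_f**2 * np.exp(-0.5 * d**2 / ell**2)

def log_likelihood(x, t, a, sigma_f, ell, sigma_n):
    """Log marginal likelihood (1) for observations x at inputs t."""
    n = len(t)
    # Assumption: the noise variance enters once, on the diagonal.
    C = se_kernel(t, sigma_f, ell) + sigma_n**2 * np.eye(n)
    r = x - poly_mean(t, a)                  # residual X - m
    _, log_det = np.linalg.slogdet(C)        # log|C|, numerically stable
    quad = r @ np.linalg.solve(C, r)         # (X - m)^T C^{-1} (X - m)
    return -0.5 * quad - 0.5 * log_det - 0.5 * n * np.log(2.0 * np.pi)
```

The same function evaluates (2) when called with the $Y$ observations in place of `x`.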

(1) Partial derivatives with respect to $a_{0}, a_{1}, a_{2}, a_{3}, a_{4}, a_{5}$

A simple example: let $f(x)=X^{T} A X$, where $X=\left(\varepsilon_{1}, \varepsilon_{2}, \ldots, \varepsilon_{n}\right)^{T}$ and

$$
A=\left(\begin{array}{ccc}{a_{11}} & {\cdots} & {a_{1 n}} \\ {\vdots} & {\ddots} & {\vdots} \\ {a_{n 1}} & {\cdots} & {a_{n n}}\end{array}\right)
$$
$$
\begin{aligned}
f(x)&=X^{T} A X\\
&=\sum_{i=1}^{n} \sum_{j=1}^{n} \varepsilon_{i} a_{i j} \varepsilon_{j}\\
&=\varepsilon_{1} \sum_{j=1}^{n} a_{1 j} \varepsilon_{j}+\cdots+\varepsilon_{k-1} \sum_{j=1}^{n} a_{(k-1) j} \varepsilon_{j}+\varepsilon_{k} \sum_{j=1}^{n} a_{k j} \varepsilon_{j}+\varepsilon_{k+1} \sum_{j=1}^{n} a_{(k+1) j} \varepsilon_{j}+\cdots+\varepsilon_{n} \sum_{j=1}^{n} a_{n j} \varepsilon_{j}
\end{aligned}\tag{5}
$$

$$
\begin{aligned}
\frac{\partial f}{\partial \varepsilon_{k}}&=\varepsilon_{1} a_{1 k}+\cdots+\varepsilon_{k-1} a_{(k-1) k}+\left(\sum_{j=1}^{n} a_{k j} \varepsilon_{j}+\varepsilon_{k} a_{k k}\right)+\varepsilon_{k+1} a_{(k+1) k}+\cdots+\varepsilon_{n} a_{n k}\\
&=\sum_{j=1}^{n} a_{k j} \varepsilon_{j}+\sum_{i=1}^{n} \varepsilon_{i} a_{i k}
\end{aligned}\tag{6}
$$

$$
\frac{\partial f}{\partial X}=\left(\begin{array}{c}{\frac{\partial f}{\partial \varepsilon_{1}}} \\ {\frac{\partial f}{\partial \varepsilon_{2}}} \\ {\vdots} \\ {\frac{\partial f}{\partial \varepsilon_{n}}}\end{array}\right)=\left(\begin{array}{c}{\sum_{j=1}^{n} a_{1 j} \varepsilon_{j}+\sum_{i=1}^{n} \varepsilon_{i} a_{i 1}} \\ {\vdots} \\ {\sum_{j=1}^{n} a_{n j} \varepsilon_{j}+\sum_{i=1}^{n} \varepsilon_{i} a_{i n}}\end{array}\right)=A X+A^{T} X=2 A X\tag{7}
$$

The last equality requires $A$ to be symmetric, which holds here because $A=\left(\boldsymbol{K}+\sigma_{n}^{2} \boldsymbol{I}\right)^{-1}$.

The derivative with respect to a hyperparameter, for example $a_{0}$, follows from the chain rule:

$$
\frac{\partial f}{\partial a_{0}}=\frac{\partial f}{\partial x_{1}} \frac{\partial x_{1}}{\partial a_{0}}+\frac{\partial f}{\partial x_{2}} \frac{\partial x_{2}}{\partial a_{0}}+\ldots+\frac{\partial f}{\partial x_{n}} \frac{\partial x_{n}}{\partial a_{0}}\tag{8}
$$

where $x_{i}$ is the $i$-th component of the residual $\boldsymbol{X}-\boldsymbol{m}$. Combining this with the result above, and writing $X$ for that residual, gives, up to sign and a constant factor,

$$
\frac{\partial L}{\partial a}=2 P A X\tag{9}
$$

Keeping track of the factor $-\frac{1}{2}$ in (1) and of $\partial x_{i} / \partial a_{k}=-t_{i}^{k}$, the exact gradient is $\frac{\partial L}{\partial a}=P A(\boldsymbol{X}-\boldsymbol{m})$.

Here $P_{0}=[1\ \ 1 \ldots 1\ \ 1]_{1 \times 200}$, $P_{1}=t \times[1\ \ 1 \ldots 1\ \ 1]_{1 \times 200}$, ..., $P_{5}=t^{5} \times[1\ \ 1 \ldots 1\ \ 1]_{1 \times 200}$, with the products taken elementwise, so that $P_{k}$ is the $1 \times 200$ row whose $i$-th entry is $t_{i}^{k}$. The rows are stacked into the $6 \times 200$ matrix

$$
P=\left(\begin{array}{c}{P_{0}} \\ {P_{1}} \\ {\vdots} \\ {P_{5}}\end{array}\right)
$$
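The quadratic-form gradient and the gradient with respect to the mean coefficients can be checked numerically. The sketch below is an illustration under stated assumptions: `C` stands for $\boldsymbol{K}+\sigma_{n}^{2}\boldsymbol{I}$ and is held fixed, the gradient uses the exact form $P\,C^{-1}(\boldsymbol{X}-\boldsymbol{m})$ with the sign and factor of (1) included, and all function names are hypothetical.

```python
import numpy as np

def quad_form_grad(A, x):
    """Gradient of x^T A x, i.e. (A + A^T) x; equals 2 A x when A is symmetric."""
    return (A + A.T) @ x

def data_fit(x, t, a, C):
    """The part of (1) that depends on a: -1/2 (X - m)^T C^{-1} (X - m)."""
    r = x - np.vander(t, N=len(a), increasing=True) @ a
    return -0.5 * r @ np.linalg.solve(C, r)

def grad_mean_coeffs(x, t, a, C):
    """Gradient of (1) with respect to a = (a0, ..., a5) for fixed C."""
    P = np.vander(t, N=len(a), increasing=True).T   # row k holds t_i^k
    r = x - P.T @ a                                  # residual X - m
    return P @ np.linalg.solve(C, r)                 # P C^{-1} (X - m)

def finite_diff(f, v, h=1e-6):
    """Central finite differences, used only as an independent check."""
    g = np.zeros_like(v)
    for k in range(len(v)):
        e = np.zeros_like(v)
        e[k] = h
        g[k] = (f(v + e) - f(v - e)) / (2.0 * h)
    return g
```

Comparing `grad_mean_coeffs(x, t, a, C)` with `finite_diff(lambda v: data_fit(x, t, v, C), a)` on a small random problem is a quick way to catch sign or factor mistakes before running an optimizer.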

(2) Partial derivatives with respect to $\sigma_{f}$, $l$ and $\sigma_{n}$

$$
\frac{\partial L}{\partial K}=X^{T} X\tag{10}
$$

Then, from the expression for $K$, the partial derivatives are

$$
\begin{aligned}
\frac{\partial L}{\partial \sigma_{f}}&=2 \sigma_{f} e^{-\frac{1}{2 l^{2}}\left(t-t^{\prime}\right)^{2}} X^{T} I X\\
\frac{\partial L}{\partial l}&=\sigma_{f}^{2} e^{-\frac{1}{2 l^{2}}\left(t-t^{\prime}\right)^{2}} \frac{1}{l^{3}}\left(t-t^{\prime}\right)^{2} X^{T} I X\\
\frac{\partial L}{\partial \sigma_{n}}&=2 \sigma_{n} \delta_{t t^{\prime}} X^{T} I X
\end{aligned}\tag{11}
$$

The second part, $\frac{1}{2} \log \left|\boldsymbol{K}+\sigma_{n}^{2} \boldsymbol{I}\right|$, is differentiated using the matrix trace. Before doing so, it helps to review how matrix derivatives relate to the trace:

Matrix differentials: https://www.qiujiawei.com/matrix-calculus-1/

The underlying identities are:

$$
d[\ln |\Sigma|]=|\Sigma|^{-1} d|\Sigma|=\operatorname{tr}\left(\Sigma^{-1} d \Sigma\right)\tag{12}
$$

$$
d f=\operatorname{tr}\left(\frac{\partial f}{\partial X}^{T} d X\right)
$$
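Applying (12) with $\Sigma=C=\boldsymbol{K}+\sigma_{n}^{2}\boldsymbol{I}$ gives $\frac{\partial}{\partial \theta_{j}} \frac{1}{2}\log|C|=\frac{1}{2}\operatorname{tr}\left(C^{-1} \frac{\partial C}{\partial \theta_{j}}\right)$. The sketch below combines this with the standard Gaussian-process identity $\frac{\partial L}{\partial \theta_{j}}=\frac{1}{2}\operatorname{tr}\left(\left(\alpha \alpha^{T}-C^{-1}\right) \frac{\partial C}{\partial \theta_{j}}\right)$, where $\alpha=C^{-1}(\boldsymbol{X}-\boldsymbol{m})$. This is a swapped-in formulation rather than a literal transcription of (10) and (11); it reuses only their elementwise factors, which are the derivatives of $k$ with respect to each hyperparameter. It also assumes the noise variance is added to the diagonal once, and the function names are hypothetical.

```python
import numpy as np

def kernel_and_grads(t, sigma_f, ell, sigma_n):
    """Return C = K + sigma_n^2 I and its derivatives w.r.t. (sigma_f, ell, sigma_n)."""
    d2 = (t[:, None] - t[None, :]) ** 2
    E = np.exp(-0.5 * d2 / ell**2)
    eye = np.eye(len(t))
    C = sigma_f**2 * E + sigma_n**2 * eye
    dC = [
        2.0 * sigma_f * E,               # dC/d sigma_f
        sigma_f**2 * E * d2 / ell**3,    # dC/d ell
        2.0 * sigma_n * eye,             # dC/d sigma_n
    ]
    return C, dC

def log_lik(r, t, theta):
    """Log likelihood (1) for a fixed residual r = X - m."""
    C, _ = kernel_and_grads(t, *theta)
    _, log_det = np.linalg.slogdet(C)
    return (-0.5 * r @ np.linalg.solve(C, r) - 0.5 * log_det
            - 0.5 * len(t) * np.log(2.0 * np.pi))

def grad_log_lik(r, t, theta):
    """Gradient of (1) w.r.t. theta = (sigma_f, ell, sigma_n)."""
    C, dC = kernel_and_grads(t, *theta)
    C_inv = np.linalg.inv(C)
    alpha = C_inv @ r
    W = np.outer(alpha, alpha) - C_inv
    # tr(W D) equals sum(W * D) because W and D are both symmetric.
    return np.array([0.5 * np.sum(W * D) for D in dC])
```

To maximize the likelihood with a conjugate gradient routine, one option is to pass the negated objective and negated gradient to `scipy.optimize.minimize` with `method="CG"` and the `jac` argument. Because the objective is not convex in these hyperparameters, the result is in general only a local optimum and depends on the starting point.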
