A Bayesian linear model can be written as
$$\boldsymbol y = \boldsymbol {H} \boldsymbol{x} + \boldsymbol w \tag{1}$$
where $\boldsymbol{y} \in \mathbb{R}^{N}$, $\boldsymbol{H} \in \mathbb{R}^{N \times p}$ is known, $\boldsymbol x \in \mathbb{R}^{p}$ with $\boldsymbol x \sim \mathcal{N}(\boldsymbol{\mu}_{\boldsymbol x}, \boldsymbol{C}_{\boldsymbol x})$, and $\boldsymbol{w} \in \mathbb{R}^N$ is a noise vector with $\boldsymbol w \sim \mathcal{N}(\boldsymbol 0, \boldsymbol {C}_{\boldsymbol w})$, independent of $\boldsymbol x$. Compared with the classical linear model, the Bayesian linear model treats $\boldsymbol{x}$ as a random vector.
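As a small illustration (not part of the original derivation), here is a minimal NumPy sketch that simulates one draw from model (1); the dimensions $N$, $p$ and the particular $\boldsymbol H$, $\boldsymbol\mu_{\boldsymbol x}$, $\boldsymbol C_{\boldsymbol x}$, $\boldsymbol C_{\boldsymbol w}$ below are arbitrary illustrative choices.

```python
# Minimal simulation sketch of the Bayesian linear model y = Hx + w (Eq. (1)).
# All dimensions and parameter values below are hypothetical, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
N, p = 5, 3

H = rng.standard_normal((N, p))                 # known observation matrix
mu_x = rng.standard_normal(p)                   # prior mean of x
L = rng.standard_normal((p, p))
C_x = L @ L.T + np.eye(p)                       # prior covariance of x (SPD)
C_w = 0.1 * np.eye(N)                           # noise covariance

x = rng.multivariate_normal(mu_x, C_x)          # random parameter vector x
w = rng.multivariate_normal(np.zeros(N), C_w)   # noise, independent of x
y = H @ x + w                                   # observed data
print(y)
```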
Consider the joint distribution of $\boldsymbol{x}$ and $\boldsymbol{y}$. Let $\boldsymbol{z} = [\boldsymbol{y}^T,\boldsymbol{x}^T]^T$; then
$$\begin{aligned} \boldsymbol{z}&=\left[ \begin{array}{c} \boldsymbol{Hx}+\boldsymbol{w}\\ \boldsymbol{x}\\ \end{array} \right] \\ &= \left[ \begin{matrix} \boldsymbol{H}& \boldsymbol{I}_N\\ \boldsymbol{I}_p& \boldsymbol{0}\\ \end{matrix} \right] \left[ \begin{array}{c} \boldsymbol{x}\\ \boldsymbol{w}\\ \end{array} \right] \\ &= \boldsymbol{A} \left[ \begin{array}{c} \boldsymbol{x}\\ \boldsymbol{w}\\ \end{array} \right] \end{aligned} \tag{2}$$
Since $\boldsymbol{x}$ and $\boldsymbol{w}$ are Gaussian and independent, the stacked vector $[\boldsymbol{x}^T,\boldsymbol{w}^T]^T$ is jointly Gaussian; and since $\boldsymbol{z}$ is obtained from $[\boldsymbol{x}^T,\boldsymbol{w}^T]^T$ by a linear transformation (the matrix $\boldsymbol{A}$), $\boldsymbol{z}$ is Gaussian as well. We now explain and prove this property of linear transformations of Gaussian vectors using the characteristic function.
A linear transformation of a Gaussian random vector is still Gaussian: explanation and proof

Given a random variable $X \sim f_{X}(x)$, its characteristic function is
$$\begin{aligned} \phi_{X}(w) &= \int f_{X}(x)\, e^{jwx}\, \mathrm{d}x \\ &= \mathbb{E}\!\left[ e^{jwx} \right] \end{aligned}$$
Assume
$$\boldsymbol X \in \mathbb{R}^{n},\qquad \boldsymbol X \sim \mathcal{N}(\boldsymbol \mu, \boldsymbol \Sigma),\qquad \boldsymbol Y = \boldsymbol {AX} \in \mathbb{R}^{m},\qquad \boldsymbol A \in \mathbb{R}^{m \times n}$$
Since $\boldsymbol{A}$ is in general not square (and hence not invertible), we cannot obtain the distribution of $\boldsymbol{Y}$ by a direct change of variables of the density; instead we work with the characteristic function:
$$\begin{aligned} \phi_{\boldsymbol Y}(\boldsymbol w) &= \mathbb{E}\!\left[ \exp(j \boldsymbol w^T \boldsymbol y) \right] \\ &= \mathbb{E}\!\left[ \exp(j \boldsymbol w^T \boldsymbol {Ax}) \right] \\ &= \mathbb{E}\!\left[ \exp\!\big(j (\boldsymbol A^T \boldsymbol w)^T \boldsymbol {x}\big) \right] \\ &= \phi_{\boldsymbol X}(\boldsymbol A^T \boldsymbol w) \\ &= \exp\!\left(j \boldsymbol w^T \boldsymbol A \boldsymbol \mu - \frac{1}{2} \boldsymbol w^T \boldsymbol {A \Sigma A}^T \boldsymbol w\right) \\ \Rightarrow \boldsymbol Y &\sim \mathcal{N}(\boldsymbol {A\mu},\ \boldsymbol {A \Sigma A}^T) \end{aligned} \tag{3}$$
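A minimal Monte-Carlo sanity check of this property (Eq. (3)), assuming NumPy is available; the sizes, mean, covariance and transform below are arbitrary illustrative choices.

```python
# Check that Y = A X has mean A mu and covariance A Sigma A^T when X is Gaussian.
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 2
mu = rng.standard_normal(n)
L = rng.standard_normal((n, n))
Sigma = L @ L.T + np.eye(n)                 # SPD covariance of X
A = rng.standard_normal((m, n))             # non-square linear transform

X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ A.T                                 # row-wise Y = A X

print(np.abs(Y.mean(axis=0) - A @ mu).max())          # close to 0
print(np.abs(np.cov(Y.T) - A @ Sigma @ A.T).max())    # close to 0
```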
Hence a linear transform of a Gaussian vector is again Gaussian. Conversely, we can use this property, by constructing a suitable matrix $\boldsymbol{A}$, to read off marginal distributions: to obtain the marginal of $x_1$, for example, take $\boldsymbol{A} = [1, 0, \dots, 0]$, the row that selects the first component. In other words: if the joint distribution is Gaussian, then every marginal distribution is Gaussian. The converse does not hold: Gaussian marginals do not imply a Gaussian joint distribution. A counterexample can be built as follows. Construct
$$f_{X_1,X_2}(x_1,x_2) = \frac{1}{2\pi} \exp\!\left( -\frac{x_1^2 + x_2^2}{2} \right) + K(x_1,x_2)$$
where
$$\int K(x_1,x_2)\, \mathrm{d} x_1 = \int K(x_1,x_2)\, \mathrm{d} x_2 = 0$$
Then the marginal distributions of $x_1$ and $x_2$ are both Gaussian. One concrete choice for which the joint distribution is not Gaussian is
$$f_{X_1,X_2}(x_1,x_2) = \frac{1}{2\pi} \exp\!\left( -\frac{x_1^2 + x_2^2}{2} \right)(1 + \sin x_1\sin x_2) \tag{4}$$
It can be seen that this construction has Gaussian marginals, yet its joint distribution is not Gaussian.
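A quick numerical look at the construction above (a sketch assuming NumPy): the density integrates to one and has exact standard-normal marginals, even though its logarithm is clearly not a quadratic form.

```python
# Numerical check that f(x1, x2) = phi(x1) phi(x2) (1 + sin x1 sin x2)
# integrates to 1 and has standard-normal marginals.
import numpy as np

grid = np.linspace(-8.0, 8.0, 1601)
dx = grid[1] - grid[0]
X1, X2 = np.meshgrid(grid, grid, indexing="ij")

phi = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)   # standard normal pdf
f = phi(X1) * phi(X2) * (1.0 + np.sin(X1) * np.sin(X2))  # the counterexample density

total = f.sum() * dx * dx                     # ~ 1
marginal_x1 = f.sum(axis=1) * dx              # integrate out x2
print(abs(total - 1.0))                       # close to 0
print(np.abs(marginal_x1 - phi(grid)).max())  # close to 0: marginal is N(0, 1)
```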
Returning to Eqs. (1) and (2): by independence, we immediately have
$$\left[ \begin{array}{c} \boldsymbol{x}\\ \boldsymbol{w}\\ \end{array} \right] \sim \mathcal{N} \left( \left[ \begin{array}{c} \boldsymbol{\mu}_{\boldsymbol x}\\ \boldsymbol{0}\\ \end{array} \right] ,\left[ \begin{matrix} \boldsymbol{C}_{\boldsymbol{x}}& \boldsymbol 0\\ \boldsymbol 0& \boldsymbol{C}_{\boldsymbol{w}}\\ \end{matrix} \right] \right) \tag{5}$$
Applying the linear transformation and using the property in Eq. (3), we obtain
$$\boldsymbol{z} = \left[ \begin{matrix} \boldsymbol{H}& \boldsymbol{I}_N\\ \boldsymbol{I}_p& \boldsymbol{0}\\ \end{matrix} \right] \left[ \begin{array}{c} \boldsymbol{x}\\ \boldsymbol{w}\\ \end{array} \right] \sim \mathcal{N} \left( \left[ \begin{matrix} \boldsymbol{H}& \boldsymbol{I}_N\\ \boldsymbol{I}_p& \boldsymbol{0}\\ \end{matrix} \right] \left[ \begin{array}{c} \boldsymbol{\mu}_{\boldsymbol x}\\ \boldsymbol{0}\\ \end{array} \right] ,\left[ \begin{matrix} \boldsymbol{H}& \boldsymbol{I}_N\\ \boldsymbol{I}_p& \boldsymbol{0}\\ \end{matrix} \right] \left[ \begin{matrix} \boldsymbol{C}_{\boldsymbol{x}}& \boldsymbol 0\\ \boldsymbol 0& \boldsymbol{C}_{\boldsymbol{w}}\\ \end{matrix} \right] \left[ \begin{matrix} \boldsymbol{H}^T& \boldsymbol{I}_p\\ \boldsymbol{I}_N& \boldsymbol{0}\\ \end{matrix} \right] \right)$$
that is,
$$\left[ \begin{array}{c} \boldsymbol{y}\\ \boldsymbol{x}\\ \end{array} \right] \sim \mathcal{N} \left( \left[ \begin{matrix} \boldsymbol{H}& \boldsymbol{I}_N\\ \boldsymbol{I}_p& \boldsymbol{0}\\ \end{matrix} \right] \left[ \begin{array}{c} \boldsymbol{\mu}_{\boldsymbol x}\\ \boldsymbol{0}\\ \end{array} \right] ,\left[ \begin{matrix} \boldsymbol{H}& \boldsymbol{I}_N\\ \boldsymbol{I}_p& \boldsymbol{0}\\ \end{matrix} \right] \left[ \begin{matrix} \boldsymbol{C}_{\boldsymbol{x}}& \boldsymbol 0\\ \boldsymbol 0& \boldsymbol{C}_{\boldsymbol{w}}\\ \end{matrix} \right] \left[ \begin{matrix} \boldsymbol{H}^T& \boldsymbol{I}_p\\ \boldsymbol{I}_N& \boldsymbol{0}\\ \end{matrix} \right] \right) \tag{6}$$
Eq. (6) is exactly the joint distribution of $\boldsymbol{x}$ and $\boldsymbol{y}$. Multiplying out the blocks and reordering the vector as $[\boldsymbol x^T, \boldsymbol y^T]^T$, it simplifies to
$$\left[ \begin{array}{c} \boldsymbol{x}\\ \boldsymbol{y}\\ \end{array} \right] \sim \mathcal{N} \left( \left[ \begin{array}{c} \boldsymbol{\mu}_{\boldsymbol x}\\ \boldsymbol{H}\boldsymbol{\mu}_{\boldsymbol x}\\ \end{array} \right] ,\left[ \begin{matrix} \boldsymbol{C}_{\boldsymbol{x}}& \boldsymbol{C}_{\boldsymbol{x}}\boldsymbol{H}^T\\ \boldsymbol{HC}_{\boldsymbol{x}}& \boldsymbol{HC}_{\boldsymbol{x}}\boldsymbol{H}^T+\boldsymbol{C}_{\boldsymbol{w}}\\ \end{matrix} \right] \right) \tag{7}$$
Based on Eq. (7), applying the selection (linear) transformation
$$\left[ \begin{matrix} \boldsymbol{0}& \boldsymbol{I}_N\\ \end{matrix} \right] \left[ \begin{array}{c} \boldsymbol{x}\\ \boldsymbol{y}\\ \end{array} \right] = \boldsymbol y$$
together with the property in Eq. (3) yields the marginal distribution of $\boldsymbol{y}$:
$$\boldsymbol y \sim \mathcal{N}(\boldsymbol{H}\boldsymbol{\mu}_{\boldsymbol x},\ \boldsymbol{HC}_{\boldsymbol{x}}\boldsymbol{H}^T+\boldsymbol{C}_{\boldsymbol{w}}) \tag{8}$$
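A Monte-Carlo sanity check of Eqs. (7) and (8), assuming NumPy; the dimensions and covariances below are arbitrary illustrative choices.

```python
# Empirically verify the joint moments in Eq. (7) and the marginal of y in Eq. (8).
import numpy as np

rng = np.random.default_rng(2)
N, p, S = 4, 2, 200_000

H = rng.standard_normal((N, p))
mu_x = rng.standard_normal(p)
L = rng.standard_normal((p, p))
C_x = L @ L.T + np.eye(p)
C_w = 0.5 * np.eye(N)

x = rng.multivariate_normal(mu_x, C_x, size=S)
w = rng.multivariate_normal(np.zeros(N), C_w, size=S)
y = x @ H.T + w                                        # y = Hx + w, row-wise

C_yx_hat = np.cov(y.T, x.T)[:N, N:]                    # empirical Cov(y, x)
print(np.abs(y.mean(axis=0) - H @ mu_x).max())              # close to 0
print(np.abs(C_yx_hat - H @ C_x).max())                     # close to 0
print(np.abs(np.cov(y.T) - (H @ C_x @ H.T + C_w)).max())    # close to 0
```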
For ease of notation, write
$$\left[ \begin{array}{c} \boldsymbol{x}\\ \boldsymbol{y}\\ \end{array} \right] \sim \mathcal{N} \left( \left[ \begin{array}{c} \mathbb{E}[\boldsymbol{x}]\\ \mathbb{E}[\boldsymbol{y}]\\ \end{array} \right] ,\boldsymbol{C} \right)$$
Then the conditional density $p(\boldsymbol{y}|\boldsymbol{x})$ can be written as
$$\begin{aligned} p(\boldsymbol{y}|\boldsymbol{x}) &= \frac{p(\boldsymbol{x},\boldsymbol y)}{p(\boldsymbol x)} \\ &= \frac{\dfrac{1}{(2 \pi)^{\frac{N+p}{2}} \det^{\frac{1}{2}}(\boldsymbol C)} \exp\!\left[ -\frac{1}{2} \left[ \begin{array}{c} \boldsymbol{x}-\mathbb{E}[\boldsymbol{x}]\\ \boldsymbol{y}-\mathbb{E}[\boldsymbol{y}]\\ \end{array} \right]^T \boldsymbol{C}^{-1} \left[ \begin{array}{c} \boldsymbol{x}-\mathbb{E}[\boldsymbol{x}]\\ \boldsymbol{y}-\mathbb{E}[\boldsymbol{y}]\\ \end{array} \right] \right]}{\dfrac{1}{(2 \pi)^{\frac{p}{2}} \det^{\frac{1}{2}}(\boldsymbol C_{\boldsymbol x})} \exp\!\left[ -\frac{1}{2} (\boldsymbol{x}-\mathbb{E}[\boldsymbol{x}])^T \boldsymbol C^{-1}_{\boldsymbol x} (\boldsymbol{x}-\mathbb{E}[\boldsymbol{x}]) \right]} \end{aligned}$$
Writing the covariance matrix in block form (matching Eq. (7)),
$$\boldsymbol C = \left[ \begin{matrix} \boldsymbol{C}_{\boldsymbol{x}}& \boldsymbol{C}_{\boldsymbol{xy}}\\ \boldsymbol{C}_{\boldsymbol{yx}}& \boldsymbol{C}_{\boldsymbol{y}}\\ \end{matrix} \right]$$
and using the block-determinant factorization
$$\det\!\left( \left[ \begin{matrix} \boldsymbol{A}_{11}& \boldsymbol{A}_{12}\\ \boldsymbol{A}_{21}& \boldsymbol{A}_{22}\\ \end{matrix} \right] \right) = \det(\boldsymbol{A}_{11})\, \det(\boldsymbol{A}_{22} - \boldsymbol{A}_{21}\boldsymbol{A}_{11}^{-1} \boldsymbol{A}_{12})$$
we obtain
$$\begin{aligned} \det(\boldsymbol C) &= \det(\boldsymbol {C}_{\boldsymbol x})\, \det(\boldsymbol C_{\boldsymbol y} - \boldsymbol{C}_{\boldsymbol{yx}} \boldsymbol {C}_{\boldsymbol x}^{-1} \boldsymbol{C}_{\boldsymbol{xy}}) \\ \Rightarrow \frac{\det(\boldsymbol C)}{\det(\boldsymbol {C}_{\boldsymbol x})} &= \det(\boldsymbol C_{\boldsymbol y} - \boldsymbol{C}_{\boldsymbol{yx}} \boldsymbol {C}_{\boldsymbol x}^{-1} \boldsymbol{C}_{\boldsymbol{xy}}) \end{aligned}$$
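A quick numerical check of this block-determinant identity, assuming NumPy, with a randomly generated symmetric positive-definite $\boldsymbol C$ (block sizes are arbitrary).

```python
# Verify det(C) = det(C_x) * det(C_y - C_yx C_x^{-1} C_xy) on a random SPD matrix.
import numpy as np

rng = np.random.default_rng(3)
p, N = 3, 4
M = rng.standard_normal((p + N, p + N))
C = M @ M.T + np.eye(p + N)                       # random SPD covariance
C_x, C_xy = C[:p, :p], C[:p, p:]
C_yx, C_y = C[p:, :p], C[p:, p:]

lhs = np.linalg.det(C)
rhs = np.linalg.det(C_x) * np.linalg.det(C_y - C_yx @ np.linalg.solve(C_x, C_xy))
print(np.isclose(lhs, rhs))                       # True
```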
Using this determinant ratio, we can further write $p(\boldsymbol{y}|\boldsymbol{x})$ as
$$p(\boldsymbol{y}|\boldsymbol{x}) = \frac{1}{(2\pi)^{\frac{N}{2}} \det^{\frac{1}{2}}(\boldsymbol C_{\boldsymbol y} - \boldsymbol{C}_{\boldsymbol{yx}} \boldsymbol {C}_{\boldsymbol x}^{-1} \boldsymbol{C}_{\boldsymbol{xy}})} \exp\!\left( -\frac{1}{2} Q \right)$$
where
$$Q = \left[ \begin{array}{c} \boldsymbol{x}-\mathbb{E}[\boldsymbol{x}]\\ \boldsymbol{y}-\mathbb{E}[\boldsymbol{y}]\\ \end{array} \right]^T \boldsymbol{C}^{-1} \left[ \begin{array}{c} \boldsymbol{x}-\mathbb{E}[\boldsymbol{x}]\\ \boldsymbol{y}-\mathbb{E}[\boldsymbol{y}]\\ \end{array} \right] - (\boldsymbol{x}-\mathbb{E}[\boldsymbol{x}])^T \boldsymbol C^{-1}_{\boldsymbol x} (\boldsymbol{x}-\mathbb{E}[\boldsymbol{x}])$$
For the symmetric block matrix $\boldsymbol{C}$, the block inversion formula is
$$\left[ \begin{matrix} \boldsymbol{A}_{11}& \boldsymbol{A}_{12}\\ \boldsymbol{A}_{21}& \boldsymbol{A}_{22}\\ \end{matrix} \right]^{-1}=\left[ \begin{matrix} \left( \boldsymbol{A}_{11}-\boldsymbol{A}_{12}\boldsymbol{A}_{22}^{-1}\boldsymbol{A}_{21} \right)^{-1}& -\boldsymbol{A}_{11}^{-1}\boldsymbol{A}_{12}\left( \boldsymbol{A}_{22}-\boldsymbol{A}_{21}\boldsymbol{A}_{11}^{-1}\boldsymbol{A}_{12} \right)^{-1}\\ -\left( \boldsymbol{A}_{22}-\boldsymbol{A}_{21}\boldsymbol{A}_{11}^{-1}\boldsymbol{A}_{12} \right)^{-1}\boldsymbol{A}_{21}\boldsymbol{A}_{11}^{-1}& \left( \boldsymbol{A}_{22}-\boldsymbol{A}_{21}\boldsymbol{A}_{11}^{-1}\boldsymbol{A}_{12} \right)^{-1}\\ \end{matrix} \right]$$
By the matrix inversion lemma,
$$\left( \boldsymbol{A}_{11}-\boldsymbol{A}_{12}\boldsymbol{A}_{22}^{-1}\boldsymbol{A}_{21} \right)^{-1} = \boldsymbol{A}_{11}^{-1} + \boldsymbol{A}_{11}^{-1}\boldsymbol{A}_{12}\left( \boldsymbol{A}_{22}-\boldsymbol{A}_{21}\boldsymbol{A}_{11}^{-1}\boldsymbol{A}_{12} \right)^{-1}\boldsymbol{A}_{21}\boldsymbol{A}_{11}^{-1}$$
Substituting the blocks of $\boldsymbol C$, we obtain
$$\boldsymbol{C}^{-1}=\left[ \begin{matrix} \boldsymbol{C}_{\boldsymbol{x}}^{-1}+\boldsymbol{C}_{\boldsymbol{x}}^{-1}\boldsymbol{C}_{\boldsymbol{xy}}\boldsymbol{B}^{-1}\boldsymbol{C}_{\boldsymbol{yx}}\boldsymbol{C}_{\boldsymbol{x}}^{-1}& -\boldsymbol{C}_{\boldsymbol{x}}^{-1}\boldsymbol{C}_{\boldsymbol{xy}}\boldsymbol{B}^{-1}\\ -\boldsymbol{B}^{-1}\boldsymbol{C}_{\boldsymbol{yx}}\boldsymbol{C}_{\boldsymbol{x}}^{-1}& \boldsymbol{B}^{-1}\\ \end{matrix} \right]$$
where
$$\boldsymbol B = \boldsymbol C_{\boldsymbol {y}} - \boldsymbol C_{\boldsymbol {yx}} \boldsymbol C^{-1}_{\boldsymbol {x}} \boldsymbol C_{\boldsymbol {xy}}$$
Furthermore, $\boldsymbol{C}^{-1}$ can be factored as
$$\boldsymbol{C}^{-1}=\left[ \begin{matrix} \boldsymbol{I}& -\boldsymbol{C}_{\boldsymbol{x}}^{-1}\boldsymbol{C}_{\boldsymbol{xy}}\\ \boldsymbol{0}& \boldsymbol{I}\\ \end{matrix} \right] \left[ \begin{matrix} \boldsymbol{C}_{\boldsymbol{x}}^{-1}& \boldsymbol{0}\\ \boldsymbol{0}& \boldsymbol{B}^{-1}\\ \end{matrix} \right] \left[ \begin{matrix} \boldsymbol{I}& \boldsymbol{0}\\ -\boldsymbol{C}_{\boldsymbol{yx}}\boldsymbol{C}_{\boldsymbol{x}}^{-1}& \boldsymbol{I}\\ \end{matrix} \right]$$
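A numerical check of this three-factor decomposition of $\boldsymbol C^{-1}$ (and hence of the block-inverse expression above), assuming NumPy, with random SPD blocks of arbitrary size.

```python
# Verify that the three-factor decomposition above reproduces inv(C).
import numpy as np

rng = np.random.default_rng(4)
p, N = 3, 4
M = rng.standard_normal((p + N, p + N))
C = M @ M.T + np.eye(p + N)
C_x, C_xy = C[:p, :p], C[:p, p:]
C_yx, C_y = C[p:, :p], C[p:, p:]

C_x_inv = np.linalg.inv(C_x)
B = C_y - C_yx @ C_x_inv @ C_xy                   # Schur complement of C_x

U = np.block([[np.eye(p), -C_x_inv @ C_xy],
              [np.zeros((N, p)), np.eye(N)]])
D = np.block([[C_x_inv, np.zeros((p, N))],
              [np.zeros((N, p)), np.linalg.inv(B)]])
V = np.block([[np.eye(p), np.zeros((p, N))],
              [-C_yx @ C_x_inv, np.eye(N)]])

print(np.allclose(U @ D @ V, np.linalg.inv(C)))   # True
```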
Let $\tilde{\boldsymbol x} = \boldsymbol{x} - \mathbb{E}[\boldsymbol{x}]$ and $\tilde{\boldsymbol y} = \boldsymbol{y} - \mathbb{E}[\boldsymbol{y}]$. Then
$$\begin{aligned} Q &= \left[ \begin{array}{c} \tilde{\boldsymbol{x}}\\ \tilde{\boldsymbol{y}}\\ \end{array} \right]^T \left[ \begin{matrix} \boldsymbol{I}& -\boldsymbol{C}_{\boldsymbol{x}}^{-1}\boldsymbol{C}_{\boldsymbol{xy}}\\ \boldsymbol{0}& \boldsymbol{I}\\ \end{matrix} \right] \left[ \begin{matrix} \boldsymbol{C}_{\boldsymbol{x}}^{-1}& \boldsymbol{0}\\ \boldsymbol{0}& \boldsymbol{B}^{-1}\\ \end{matrix} \right] \left[ \begin{matrix} \boldsymbol{I}& \boldsymbol{0}\\ -\boldsymbol{C}_{\boldsymbol{yx}}\boldsymbol{C}_{\boldsymbol{x}}^{-1}& \boldsymbol{I}\\ \end{matrix} \right] \left[ \begin{array}{c} \tilde{\boldsymbol{x}}\\ \tilde{\boldsymbol{y}}\\ \end{array} \right] - \tilde{\boldsymbol{x}}^T\boldsymbol{C}_{\boldsymbol{x}}^{-1}\tilde{\boldsymbol{x}} \\ &= \left[ \begin{array}{c} \tilde{\boldsymbol{x}}\\ \tilde{\boldsymbol{y}}-\boldsymbol{C}_{\boldsymbol{yx}}\boldsymbol{C}_{\boldsymbol{x}}^{-1}\tilde{\boldsymbol{x}}\\ \end{array} \right]^T \left[ \begin{matrix} \boldsymbol{C}_{\boldsymbol{x}}^{-1}& \boldsymbol{0}\\ \boldsymbol{0}& \boldsymbol{B}^{-1}\\ \end{matrix} \right] \left[ \begin{array}{c} \tilde{\boldsymbol{x}}\\ \tilde{\boldsymbol{y}}-\boldsymbol{C}_{\boldsymbol{yx}}\boldsymbol{C}_{\boldsymbol{x}}^{-1}\tilde{\boldsymbol{x}}\\ \end{array} \right] - \tilde{\boldsymbol{x}}^T\boldsymbol{C}_{\boldsymbol{x}}^{-1}\tilde{\boldsymbol{x}} \\ &= \left( \tilde{\boldsymbol{y}}-\boldsymbol{C}_{\boldsymbol{yx}}\boldsymbol{C}_{\boldsymbol{x}}^{-1}\tilde{\boldsymbol{x}} \right)^T\boldsymbol{B}^{-1}\left( \tilde{\boldsymbol{y}}-\boldsymbol{C}_{\boldsymbol{yx}}\boldsymbol{C}_{\boldsymbol{x}}^{-1}\tilde{\boldsymbol{x}} \right) \end{aligned}$$
Therefore the conditional density $p(\boldsymbol{y}|\boldsymbol{x})$ is
$$p(\boldsymbol{y}|\boldsymbol{x}) = \frac{1}{(2\pi)^{\frac{N}{2}} \det^{\frac{1}{2}}(\boldsymbol C_{\boldsymbol y} - \boldsymbol{C}_{\boldsymbol{yx}} \boldsymbol {C}_{\boldsymbol x}^{-1} \boldsymbol{C}_{\boldsymbol{xy}})} \exp\!\left( -\frac{1}{2} \left\Vert \left(\boldsymbol C_{\boldsymbol {y}} - \boldsymbol C_{\boldsymbol {yx}} \boldsymbol C^{-1}_{\boldsymbol {x}} \boldsymbol C_{\boldsymbol {xy}}\right)^{-\frac{1}{2}} \Big( \boldsymbol y - \big(\mathbb{E}[\boldsymbol y] + \boldsymbol{C}_{\boldsymbol{yx}} \boldsymbol {C}_{\boldsymbol x}^{-1} (\boldsymbol{x} - \mathbb{E}[\boldsymbol{x}]) \big) \Big) \right\Vert^2_2 \right)$$
That is,
$$\boldsymbol y| \boldsymbol x \sim \mathcal{N}\!\left(\mathbb{E}[\boldsymbol y] + \boldsymbol{C}_{\boldsymbol{yx}} \boldsymbol {C}_{\boldsymbol x}^{-1} (\boldsymbol{x} - \mathbb{E}[\boldsymbol{x}]),\ \boldsymbol C_{\boldsymbol {y}} - \boldsymbol C_{\boldsymbol {yx}} \boldsymbol C^{-1}_{\boldsymbol {x}} \boldsymbol C_{\boldsymbol {xy}} \right) \tag{9}$$
Similarly,
$$\boldsymbol x| \boldsymbol y \sim \mathcal{N}\!\left(\mathbb{E}[\boldsymbol x] + \boldsymbol{C}_{\boldsymbol{xy}} \boldsymbol {C}_{\boldsymbol y}^{-1} (\boldsymbol{y} - \mathbb{E}[\boldsymbol{y}]),\ \boldsymbol C_{\boldsymbol {x}} - \boldsymbol C_{\boldsymbol {xy}} \boldsymbol C^{-1}_{\boldsymbol {y}} \boldsymbol C_{\boldsymbol {yx}} \right) \tag{10}$$
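A Monte-Carlo illustration of Eq. (10), assuming NumPy: for jointly Gaussian $(\boldsymbol x, \boldsymbol y)$, the coefficient matrix of the least-squares regression of $\boldsymbol x$ on $\boldsymbol y$ approaches $\boldsymbol C_{\boldsymbol{xy}}\boldsymbol C_{\boldsymbol y}^{-1}$, and the residual covariance approaches $\boldsymbol C_{\boldsymbol x} - \boldsymbol C_{\boldsymbol{xy}}\boldsymbol C_{\boldsymbol y}^{-1}\boldsymbol C_{\boldsymbol{yx}}$. The joint covariance below is an arbitrary random SPD matrix.

```python
# Regression-based check of the conditional mean slope and covariance in Eq. (10).
import numpy as np

rng = np.random.default_rng(5)
p, N, S = 2, 3, 400_000
M = rng.standard_normal((p + N, p + N))
C = M @ M.T + np.eye(p + N)                       # joint covariance of (x, y)
mu = rng.standard_normal(p + N)
C_x, C_xy, C_yx, C_y = C[:p, :p], C[:p, p:], C[p:, :p], C[p:, p:]

z = rng.multivariate_normal(mu, C, size=S)
x, y = z[:, :p], z[:, p:]
xc, yc = x - x.mean(axis=0), y - y.mean(axis=0)

G_hat = np.linalg.lstsq(yc, xc, rcond=None)[0].T  # empirical C_xy C_y^{-1}
resid = xc - yc @ G_hat.T

print(np.abs(G_hat - C_xy @ np.linalg.inv(C_y)).max())                           # close to 0
print(np.abs(np.cov(resid.T) - (C_x - C_xy @ np.linalg.inv(C_y) @ C_yx)).max())  # close to 0
```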
In the Bayesian linear model, the covariance blocks correspond as follows:
$$\boldsymbol C = \left[ \begin{matrix} \boldsymbol{C}_{\boldsymbol{x}}& \boldsymbol{C}_{\boldsymbol{xy}}\\ \boldsymbol{C}_{\boldsymbol{yx}}& \boldsymbol{C}_{\boldsymbol{y}}\\ \end{matrix} \right] = \left[ \begin{matrix} \boldsymbol{C}_{\boldsymbol{x}}& \boldsymbol{C}_{\boldsymbol{x}}\boldsymbol{H}^T\\ \boldsymbol{HC}_{\boldsymbol{x}}& \boldsymbol{HC}_{\boldsymbol{x}}\boldsymbol{H}^T+\boldsymbol{C}_{\boldsymbol{w}}\\ \end{matrix} \right] \tag{11}$$
(1) Likelihood
The likelihood corresponds to Eq. (9); substituting (11) into (9), we find
$$\boldsymbol y | \boldsymbol x \sim \mathcal{N}\!\left(\boldsymbol y; \boldsymbol {Hx}, \boldsymbol C_{\boldsymbol w} \right) \tag{12}$$
Note that this likelihood was derived from the full joint distribution; conveniently, it coincides with the likelihood one would write down intuitively from the Bayesian linear model $\boldsymbol y = \boldsymbol {Hx} + \boldsymbol w$. When $\boldsymbol{x}$ and $\boldsymbol{w}$ are both Gaussian (this is the key premise; for a general distribution of $\boldsymbol{x}$ I am not yet certain the likelihood can be written down this directly, and I suspect it cannot — one would probably have to write out the linear-transformation model (2) and argue via the characteristic function and its inverse transform), we can then write the joint density directly from the product rule:
$$p(\boldsymbol y, \boldsymbol x)=p(\boldsymbol y | \boldsymbol x)\,p(\boldsymbol x)=\mathcal{N}\!\left(\boldsymbol y; \boldsymbol {Hx}, \boldsymbol C_{\boldsymbol w} \right) \cdot \mathcal{N}(\boldsymbol x; \boldsymbol{\mu}_{\boldsymbol x}, \boldsymbol{C}_{\boldsymbol x}) \tag{13}$$
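A pointwise sanity check of Eqs. (12) and (13), assuming NumPy and SciPy are available: the joint Gaussian density implied by Eq. (7) should factor exactly as $\mathcal N(\boldsymbol y; \boldsymbol{Hx}, \boldsymbol C_{\boldsymbol w})\cdot \mathcal N(\boldsymbol x; \boldsymbol\mu_{\boldsymbol x}, \boldsymbol C_{\boldsymbol x})$. All matrices below are illustrative.

```python
# Check that logpdf of the joint (Eq. (7)) equals log-likelihood + log-prior (Eq. (13)).
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(6)
N, p = 3, 2
H = rng.standard_normal((N, p))
mu_x = rng.standard_normal(p)
L = rng.standard_normal((p, p))
C_x = L @ L.T + np.eye(p)
C_w = 0.4 * np.eye(N)

mu_joint = np.concatenate([mu_x, H @ mu_x])                # mean of (x, y), Eq. (7)
C_joint = np.block([[C_x, C_x @ H.T],
                    [H @ C_x, H @ C_x @ H.T + C_w]])       # covariance of (x, y), Eq. (7)

x0, y0 = rng.standard_normal(p), rng.standard_normal(N)    # arbitrary evaluation point

lhs = multivariate_normal(mu_joint, C_joint).logpdf(np.concatenate([x0, y0]))
rhs = (multivariate_normal(H @ x0, C_w).logpdf(y0)         # likelihood, Eq. (12)
       + multivariate_normal(mu_x, C_x).logpdf(x0))        # prior
print(np.isclose(lhs, rhs))                                # True
```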
(2) Posterior
The posterior corresponds to Eq. (10); substituting (11) into (10), we find
$$\begin{aligned} \mathbb{E}[\boldsymbol x| \boldsymbol y] &= \mathbb{E}[\boldsymbol x] + \boldsymbol C_{\boldsymbol x} \boldsymbol H^T \left( \boldsymbol H \boldsymbol C_{\boldsymbol x} \boldsymbol H^T + \boldsymbol C_{\boldsymbol w} \right)^{-1} (\boldsymbol y - \mathbb{E}[\boldsymbol y]) \\ &= \boldsymbol \mu_{\boldsymbol x} + \boldsymbol C_{\boldsymbol x} \boldsymbol H^T \left( \boldsymbol H \boldsymbol C_{\boldsymbol x} \boldsymbol H^T + \boldsymbol C_{\boldsymbol w} \right)^{-1} (\boldsymbol y - \boldsymbol H \boldsymbol \mu_{\boldsymbol x}) \end{aligned} \tag{14}$$
The corresponding covariance matrix is
$$\boldsymbol C_{\boldsymbol x|\boldsymbol y} = \boldsymbol C_{\boldsymbol x} - \boldsymbol C_{\boldsymbol x} \boldsymbol H^T \left( \boldsymbol H \boldsymbol C_{\boldsymbol x} \boldsymbol H^T + \boldsymbol C_{\boldsymbol w} \right)^{-1} \boldsymbol H \boldsymbol C_{\boldsymbol x} \tag{15}$$
Using the matrix inversion lemma
$$(\boldsymbol E + \boldsymbol B \boldsymbol C \boldsymbol D)^{-1}=\boldsymbol E^{-1}- \boldsymbol E^{-1} \boldsymbol B (\boldsymbol C^{-1}+ \boldsymbol D \boldsymbol E^{-1} \boldsymbol B)^{-1} \boldsymbol D \boldsymbol E^{-1}$$
and after some algebra, Eqs. (14) and (15) can also be written as
$$\begin{aligned} \mathbb{E}[\boldsymbol x| \boldsymbol y] &= \boldsymbol \mu_{\boldsymbol x} + \left( \boldsymbol C^{-1}_{\boldsymbol x} + \boldsymbol H^T \boldsymbol C^{-1}_{\boldsymbol w} \boldsymbol H \right)^{-1} \boldsymbol H^T \boldsymbol C^{-1}_{\boldsymbol w} (\boldsymbol y - \boldsymbol H \boldsymbol \mu_{\boldsymbol x}) \\ \boldsymbol C_{\boldsymbol x|\boldsymbol y} &= \left( \boldsymbol C^{-1}_{\boldsymbol x} + \boldsymbol H^T \boldsymbol C^{-1}_{\boldsymbol w} \boldsymbol H \right)^{-1} \end{aligned} \tag{16}$$
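A quick numerical check, assuming NumPy, that the two forms of the posterior mean and covariance in Eqs. (14)-(15) and Eq. (16) agree; $\boldsymbol H$, $\boldsymbol C_{\boldsymbol x}$, $\boldsymbol C_{\boldsymbol w}$ and $\boldsymbol y$ below are random illustrative choices.

```python
# Verify that (14)-(15) and (16) give the same posterior mean and covariance.
import numpy as np

rng = np.random.default_rng(7)
N, p = 4, 3
H = rng.standard_normal((N, p))
mu_x = rng.standard_normal(p)
L = rng.standard_normal((p, p))
C_x = L @ L.T + np.eye(p)
Lw = rng.standard_normal((N, N))
C_w = Lw @ Lw.T + np.eye(N)
y = rng.standard_normal(N)

K = C_x @ H.T @ np.linalg.inv(H @ C_x @ H.T + C_w)          # "gain" in (14)-(15)
mean_14 = mu_x + K @ (y - H @ mu_x)
cov_15 = C_x - K @ H @ C_x

P = np.linalg.inv(np.linalg.inv(C_x) + H.T @ np.linalg.inv(C_w) @ H)   # (16)
mean_16 = mu_x + P @ H.T @ np.linalg.inv(C_w) @ (y - H @ mu_x)

print(np.allclose(mean_14, mean_16), np.allclose(cov_15, P))  # True True
```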
(3) Likelihood $\rightarrow$ joint distribution $\rightarrow$ posterior
By Bayes' rule,
$$\begin{aligned} p(\boldsymbol x|\boldsymbol y) &= \frac{p(\boldsymbol y | \boldsymbol x)\,p(\boldsymbol x)}{p(\boldsymbol y)} \\ &= \frac{p(\boldsymbol y | \boldsymbol x)\,p(\boldsymbol x)}{\int p(\boldsymbol y | \boldsymbol x)\,p(\boldsymbol x)\, \mathrm{d} \boldsymbol x} \end{aligned}$$
Since the denominator is a normalizing constant (equivalently, $\boldsymbol{y}$ has already been observed, so $p(\boldsymbol y)$ is treated as known), we have
$$\begin{aligned} p(\boldsymbol x|\boldsymbol y) &\propto p(\boldsymbol y | \boldsymbol x)\,p(\boldsymbol x) \\ &= \mathcal{N}\!\left(\boldsymbol y; \boldsymbol {Hx}, \boldsymbol C_{\boldsymbol w} \right) \cdot \mathcal{N}(\boldsymbol x; \boldsymbol{\mu}_{\boldsymbol x}, \boldsymbol{C}_{\boldsymbol x}) \end{aligned}$$
Using the result from my earlier post on the product of two complex Gaussian distributions, the product $\mathcal{N}(\boldsymbol y; \boldsymbol {Hx}, \boldsymbol C_{\boldsymbol w}) \cdot \mathcal{N}(\boldsymbol x; \boldsymbol{\mu}_{\boldsymbol x}, \boldsymbol{C}_{\boldsymbol x})$, viewed as a density in $\boldsymbol x$, has mean and covariance
$$\begin{aligned} \mathbb{E}[\boldsymbol x| \boldsymbol y] &= \left( \boldsymbol C^{-1}_{\boldsymbol x}+\boldsymbol H^T \boldsymbol C^{-1}_{\boldsymbol w} \boldsymbol H \right)^{-1} \left( \boldsymbol C^{-1}_{\boldsymbol x}\boldsymbol \mu_{\boldsymbol x}+\boldsymbol H^T \boldsymbol C_{\boldsymbol w}^{-1}\boldsymbol{y} \right) \\ \boldsymbol C_{\boldsymbol x|\boldsymbol y} &= \left( \boldsymbol C^{-1}_{\boldsymbol x}+\boldsymbol H^T \boldsymbol C_{\boldsymbol w}^{-1} \boldsymbol H \right)^{-1} \end{aligned} \tag{17}$$
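Likewise, a small numerical check (assuming NumPy) that the mean in Eq. (17) coincides with the form in Eq. (14); all matrices are random illustrative choices.

```python
# Verify that the posterior mean from the product-of-Gaussians route (Eq. (17))
# matches the conditional-Gaussian route (Eq. (14)).
import numpy as np

rng = np.random.default_rng(8)
N, p = 4, 3
H = rng.standard_normal((N, p))
mu_x = rng.standard_normal(p)
L = rng.standard_normal((p, p))
C_x = L @ L.T + np.eye(p)
C_w = 0.3 * np.eye(N)
y = rng.standard_normal(N)

mean_14 = mu_x + C_x @ H.T @ np.linalg.solve(H @ C_x @ H.T + C_w, y - H @ mu_x)

P = np.linalg.inv(np.linalg.inv(C_x) + H.T @ np.linalg.inv(C_w) @ H)
mean_17 = P @ (np.linalg.inv(C_x) @ mu_x + H.T @ np.linalg.inv(C_w) @ y)

print(np.allclose(mean_14, mean_17))   # True
```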
To summarize, for the Bayesian linear model
$$\boldsymbol y = \boldsymbol {H} \boldsymbol{x} + \boldsymbol w$$
where $\boldsymbol{y} \in \mathbb{R}^{N}$, $\boldsymbol{H} \in \mathbb{R}^{N \times p}$ is known, $\boldsymbol x \in \mathbb{R}^{p}$ with $\boldsymbol x \sim \mathcal{N}(\boldsymbol{\mu}_{\boldsymbol x}, \boldsymbol{C}_{\boldsymbol x})$, and $\boldsymbol{w} \in \mathbb{R}^N$ is a noise vector with $\boldsymbol w \sim \mathcal{N}(\boldsymbol 0, \boldsymbol {C}_{\boldsymbol w})$, independent of $\boldsymbol x$:
(1) Joint distribution of $\boldsymbol{x}$ and $\boldsymbol{y}$
$$\left[ \begin{array}{c} \boldsymbol{x}\\ \boldsymbol{y}\\ \end{array} \right] \sim \mathcal{N} \left( \left[ \begin{array}{c} \boldsymbol{\mu}_{\boldsymbol x}\\ \boldsymbol{H}\boldsymbol{\mu}_{\boldsymbol x}\\ \end{array} \right] ,\left[ \begin{matrix} \boldsymbol{C}_{\boldsymbol{x}}& \boldsymbol{C}_{\boldsymbol{x}}\boldsymbol{H}^T\\ \boldsymbol{HC}_{\boldsymbol{x}}& \boldsymbol{HC}_{\boldsymbol{x}}\boldsymbol{H}^T+\boldsymbol{C}_{\boldsymbol{w}}\\ \end{matrix} \right] \right)$$
(2) Marginal distribution of $\boldsymbol{y}$
$$\boldsymbol y \sim \mathcal{N}(\boldsymbol{H}\boldsymbol{\mu}_{\boldsymbol x},\ \boldsymbol{HC}_{\boldsymbol{x}}\boldsymbol{H}^T+\boldsymbol{C}_{\boldsymbol{w}})$$
(3) Likelihood of $\boldsymbol y$ given $\boldsymbol x$
$$\boldsymbol y | \boldsymbol x \sim \mathcal{N}\!\left(\boldsymbol y; \boldsymbol {Hx}, \boldsymbol C_{\boldsymbol w} \right)$$
As noted above, this likelihood was derived from the full joint distribution, and it happens to coincide with the likelihood suggested directly by $\boldsymbol y = \boldsymbol {Hx} + \boldsymbol w$ (with $\boldsymbol{x}$ and $\boldsymbol{w}$ both Gaussian as the key premise; for a general distribution of $\boldsymbol{x}$ I am not yet certain it can be written this directly — I suspect not, and one would likely have to go through the linear-transformation model (2), the characteristic function and its inverse transform).
(4) Posterior distribution of $\boldsymbol x$ given $\boldsymbol y$
$$\begin{aligned} \mathbb{E}[\boldsymbol x| \boldsymbol y] & \overset{a}{=} \boldsymbol \mu_{\boldsymbol x} + \boldsymbol C_{\boldsymbol x} \boldsymbol H^T \left( \boldsymbol H \boldsymbol C_{\boldsymbol x} \boldsymbol H^T + \boldsymbol C_{\boldsymbol w} \right)^{-1} (\boldsymbol y - \boldsymbol H \boldsymbol \mu_{\boldsymbol x}) \\ & \overset{b}{=} \boldsymbol \mu_{\boldsymbol x} + \left( \boldsymbol C^{-1}_{\boldsymbol x} + \boldsymbol H^T \boldsymbol C^{-1}_{\boldsymbol w} \boldsymbol H \right)^{-1} \boldsymbol H^T \boldsymbol C^{-1}_{\boldsymbol w} (\boldsymbol y - \boldsymbol H \boldsymbol \mu_{\boldsymbol x}) \\ & \overset{c}{=} \left( \boldsymbol C^{-1}_{\boldsymbol x}+\boldsymbol H^T \boldsymbol C^{-1}_{\boldsymbol w} \boldsymbol H \right)^{-1} \left( \boldsymbol C^{-1}_{\boldsymbol x}\boldsymbol \mu_{\boldsymbol x}+\boldsymbol H^T \boldsymbol C_{\boldsymbol w}^{-1}\boldsymbol{y} \right) \end{aligned}$$
The three expressions $(a)$, $(b)$, $(c)$ are equivalent; $(a)$ is probably the form most commonly seen. They also have the same form as the LMMSE estimator (equivalent in form only; the derivation here is unrelated to the LMMSE derivation), and because the posterior is Gaussian, the LMMSE estimator coincides with the MMSE estimator.
The corresponding covariance matrix is
$$\begin{aligned} \boldsymbol C_{\boldsymbol x|\boldsymbol y} & \overset{a}{=}\boldsymbol C_{\boldsymbol x} - \boldsymbol C_{\boldsymbol x} \boldsymbol H^T \left( \boldsymbol H \boldsymbol C_{\boldsymbol x} \boldsymbol H^T + \boldsymbol C_{\boldsymbol w} \right)^{-1} \boldsymbol H \boldsymbol C_{\boldsymbol x} \\ & \overset{b,c}{=}\left( \boldsymbol C^{-1}_{\boldsymbol x} + \boldsymbol H^T \boldsymbol C^{-1}_{\boldsymbol w} \boldsymbol H \right)^{-1} \end{aligned}$$
The labels on the covariance expressions above correspond to the labels on $\mathbb{E}[\boldsymbol x|\boldsymbol y]$.
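Finally, a compact worked example, assuming NumPy, that evaluates the four summary quantities above for one small (entirely hypothetical) model instance.

```python
# Evaluate (1) the joint, (2) the marginal of y, (3) the likelihood parameters,
# and (4) the posterior of x given one observed y, for a toy model instance.
import numpy as np

rng = np.random.default_rng(9)
N, p = 3, 2
H = rng.standard_normal((N, p))
mu_x = np.array([1.0, -0.5])
C_x = np.array([[1.0, 0.3], [0.3, 2.0]])
C_w = 0.2 * np.eye(N)

# (1) joint distribution of (x, y)
mu_joint = np.concatenate([mu_x, H @ mu_x])
C_joint = np.block([[C_x, C_x @ H.T], [H @ C_x, H @ C_x @ H.T + C_w]])

# (2) marginal distribution of y
mu_y, C_y = H @ mu_x, H @ C_x @ H.T + C_w

# (3) likelihood y | x is N(Hx, C_w); its mean depends on the realised x.

# (4) posterior of x given one simulated observation y
y_obs = rng.multivariate_normal(mu_y, C_y)
C_post = np.linalg.inv(np.linalg.inv(C_x) + H.T @ np.linalg.inv(C_w) @ H)
mu_post = C_post @ (np.linalg.inv(C_x) @ mu_x + H.T @ np.linalg.inv(C_w) @ y_obs)
print(mu_post)
print(C_post)
```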