"Gaussian" refers to the Gaussian distribution.
"Process" refers to a stochastic process.
Univariate Gaussian distribution: $p(x)=N(\mu,\sigma^2)$
Multivariate Gaussian distribution (Gaussian network): for $x\in\mathbb{R}^p$,
$p(x)=N(\mu,\Sigma)$, where $\Sigma$ is a $p\times p$ matrix and $p<\infty$
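The infinite-dimensional case is the Gaussian process: a stochastic process made up of infinitely many Gaussian random variables defined on a continuous domain (time or space).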
Suppose $T$ is a continuous domain. If, for every positive integer $n$ and every choice of $t_1,\dots,t_n\in T$, the following holds
$$\left[\begin{array}{c}{\xi_{t_{1}}} \\ {\vdots} \\ {\xi_{t_{n}}}\end{array}\right] \sim N(\mu_{t_1-t_n},\Sigma_{t_1-t_n})$$
then $\{\xi_t\}_{t\in T}$ is a Gaussian process. Here the subscript $t_1-t_n$ means "over the indices $t_1$ through $t_n$", not a difference.
A Gaussian process can therefore be written as
$$GP(m(t),k(s,t))$$
where $m(t)=E[\xi_t]$ is the mean function and $k(s,t)=E\left[(\xi_s-E[\xi_s])(\xi_t-E[\xi_t])\right]$ is the covariance function.
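The definition above says that evaluating the process at any finite set of index points yields an ordinary multivariate Gaussian. The following is a minimal sketch of that idea, not a definitive implementation. The zero mean function `mean_fn` and the squared-exponential covariance `rbf_kernel` are assumed choices for illustration only; the notes do not fix a particular $m$ or $k$.

```python
import numpy as np

def rbf_kernel(s, t, length_scale=1.0):
    """Squared-exponential covariance k(s, t) for 1-D index points (assumed choice)."""
    s = np.asarray(s, dtype=float).reshape(-1, 1)
    t = np.asarray(t, dtype=float).reshape(1, -1)
    return np.exp(-0.5 * (s - t) ** 2 / length_scale ** 2)

def mean_fn(t):
    """Zero mean function m(t) = 0 (assumed choice)."""
    return np.zeros(len(t))

# Any finite set of index points t_1, ..., t_n in T.
t = np.linspace(0.0, 5.0, 50)
n = len(t)

mu = mean_fn(t)           # mean vector, shape (n,)
Sigma = rbf_kernel(t, t)  # covariance matrix, shape (n, n)

# A small jitter on the diagonal keeps the Cholesky factorization stable.
L = np.linalg.cholesky(Sigma + 1e-6 * np.eye(n))

rng = np.random.default_rng(0)
z = rng.standard_normal((3, n))
# Each row is one draw of (xi_{t_1}, ..., xi_{t_n}) from N(mu, Sigma + jitter).
samples = mu + z @ L.T
```

Each row of `samples` can be plotted against `t` to see one realization of the process on the chosen grid.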
Linear regression
With a kernel function, it can also be applied to nonlinear problems.
Bayesian linear regression combined with the kernel method (an inner product after a nonlinear transformation) is Gaussian process regression:
$$\left\{\begin{array}{l}{f(x)=\phi^T(x)w} \\ {y=f(x)+\varepsilon}\end{array}\right.$$
This is the weight-space view. In the function-space view, the prior is placed directly on $f$:
$$f(x) \sim GP(m(x),k(x,x'))$$
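To make the link between the two views concrete, here is a small sketch under assumptions that are not in the notes: a prior $w\sim N(0,\Sigma_p)$ on the weights and a hypothetical polynomial feature map `phi`. Under that prior, $f(x)=\phi^T(x)w$ has zero mean and covariance $k(x,x')=\phi^T(x)\Sigma_p\phi(x')$, which the code checks by Monte Carlo.

```python
import numpy as np

def phi(x):
    """Hypothetical feature map phi(x) = (1, x, x^2), for illustration only."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x ** 2], axis=-1)  # shape (n, 3)

x = np.array([-1.0, 0.0, 0.5, 1.0])
Phi = phi(x)         # shape (4, 3)
Sigma_p = np.eye(3)  # assumed prior covariance of the weights w

# Kernel implied by the weight-space model: k(x, x') = phi(x)^T Sigma_p phi(x').
K = Phi @ Sigma_p @ Phi.T

# Monte Carlo check: sample weights, evaluate f(x) = phi(x)^T w, estimate Cov[f].
rng = np.random.default_rng(0)
W = rng.multivariate_normal(np.zeros(3), Sigma_p, size=200_000)  # shape (N, 3)
F = W @ Phi.T                                                    # shape (N, 4)
K_mc = F.T @ F / len(F)  # E[f] = 0, so this estimates the covariance
```

`K_mc` should agree with `K` up to sampling error, which is the sense in which a prior over weights induces a Gaussian process over functions.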
Regression problem:
Data: $\{(x_i,y_i)\}_{i=1}^N$, $y=f(x)+\epsilon$, with $\epsilon\sim N(0,\sigma^2)$
Define $X_{N\times p}=(x_1,\dots,x_N)^T$, $Y_{N\times 1}=(y_1,\dots,y_N)^T$
$$f(X)\sim N(\mu(X),K(X,X))$$
$$Y=f(X)+\epsilon \sim N(\mu(X),K(X,X)+\sigma^2I)$$
The inputs at which we want predictions are $X^*$, so $Y^*=f(X^*)+\epsilon$
Recall that if $x \sim N(\mu,\Sigma)$, where
$$x = \left( \begin{array}{l} {x_a}\\ {x_b} \end{array} \right),\quad \mu = \left( \begin{array}{l} {\mu_a}\\ {\mu_b} \end{array} \right),\quad \Sigma= \left( \begin{array}{ll}{\Sigma_{aa}} & {\Sigma_{ab}} \\ {\Sigma_{ba}} & {\Sigma_{bb}}\end{array}\right)$$
then $x_b|x_a \sim N(\mu_{b|a},\Sigma_{b|a})$, where
$$\mu_{b|a}=\Sigma_{ba}\Sigma_{aa}^{-1}(x_a-\mu_a)+\mu_b,\quad \Sigma_{b|a}=\Sigma_{bb}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab}$$
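The conditioning formula can be sketched in a few lines of NumPy. This is a minimal illustration; the function name `condition_gaussian` and the two-dimensional example values are made up for the demonstration.

```python
import numpy as np

def condition_gaussian(mu_a, mu_b, S_aa, S_ab, S_bb, x_a):
    """Return (mu_{b|a}, Sigma_{b|a}) for a partitioned joint Gaussian."""
    S_ba = S_ab.T
    # Solve linear systems with S_aa instead of forming its inverse explicitly.
    mu_cond = mu_b + S_ba @ np.linalg.solve(S_aa, x_a - mu_a)
    Sigma_cond = S_bb - S_ba @ np.linalg.solve(S_aa, S_ab)
    return mu_cond, Sigma_cond

# Two-dimensional example: unit variances, covariance 0.8 between x_a and x_b.
mu_a = np.array([0.0])
mu_b = np.array([1.0])
S_aa = np.array([[1.0]])
S_ab = np.array([[0.8]])
S_bb = np.array([[1.0]])
x_a = np.array([2.0])

mu_cond, Sigma_cond = condition_gaussian(mu_a, mu_b, S_aa, S_ab, S_bb, x_a)
# mu_cond    = 1 + 0.8 * (2 - 0) = 2.6
# Sigma_cond = 1 - 0.8 * 0.8     = 0.36
```

Observing $x_a$ shifts the mean of $x_b$ and always shrinks (or leaves unchanged) its variance, independent of the observed value.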
Let $x_a=Y$ and $x_b=f(X^*)$. Their joint distribution is
$$\left( \begin{array}{c} Y \\ f(X^*) \end{array} \right) \sim N\left( \left( \begin{array}{c} \mu(X) \\ \mu(X^*) \end{array} \right), \left( \begin{array}{cc} K(X,X)+\sigma^2I & K(X,X^*) \\ K(X^*,X) & K(X^*,X^*) \end{array} \right) \right)$$
The conditional distribution we want is $p(f(X^*)|Y,X,X^*)$, that is, $p(x_b|x_a)$. Substituting into the formula gives
$$\mu^* = K(X^*,X)\left(K(X,X)+\sigma^2I\right)^{-1}\left(Y-\mu(X)\right)+\mu(X^*)$$
$$\Sigma^* = K(X^*,X^*)-K(X^*,X)\left(K(X,X)+\sigma^2I\right)^{-1}K(X,X^*)$$
Therefore
$$p(f(X^*)|Y,X,X^*)=N(\mu^*,\Sigma^*)$$
$$p(Y^*|Y,X,X^*)=N(\mu^*,\Sigma^*+\sigma^2I)$$
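The predictive equations for $\mu^*$ and $\Sigma^*$ translate fairly directly into code. The sketch below is one possible implementation under stated assumptions: a zero mean function ($\mu(\cdot)=0$), a squared-exponential kernel `rbf_kernel`, and synthetic data from a sine curve. None of these choices come from the notes. It uses a Cholesky factorization in place of the explicit inverse, which is a common numerical substitution and gives the same result.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between rows of A (n, p) and B (m, p); assumed choice."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale ** 2)

def gp_posterior(X, Y, X_star, noise_var, kernel=rbf_kernel):
    """Posterior mean and covariance of f(X_star), assuming a zero mean function."""
    K = kernel(X, X) + noise_var * np.eye(len(X))  # K(X, X) + sigma^2 I
    K_s = kernel(X_star, X)                        # K(X*, X)
    K_ss = kernel(X_star, X_star)                  # K(X*, X*)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))  # (K + sigma^2 I)^{-1} Y
    V = np.linalg.solve(L, K_s.T)                        # L^{-1} K(X, X*)
    mu_star = K_s @ alpha
    Sigma_star = K_ss - V.T @ V
    return mu_star, Sigma_star

# Synthetic training data (illustrative only): noisy observations of sin(x).
rng = np.random.default_rng(0)
noise_var = 0.01
X = np.linspace(-3.0, 3.0, 8).reshape(-1, 1)
Y = np.sin(X).ravel() + np.sqrt(noise_var) * rng.standard_normal(len(X))
X_star = np.linspace(-3.0, 3.0, 25).reshape(-1, 1)

mu_star, Sigma_star = gp_posterior(X, Y, X_star, noise_var)
# Predictive covariance of the noisy targets Y*: add sigma^2 I.
Sigma_y_star = Sigma_star + noise_var * np.eye(len(X_star))
```

The square roots of the diagonal of `Sigma_star` give pointwise uncertainty for $f(X^*)$; using `Sigma_y_star` instead gives the wider band for the noisy observations $Y^*$.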
I found an easy-to-understand blog post on this: https://blog.csdn.net/greenapple_shan/article/details/52402051