Derivation of the Probability Formulas Based on Bayesian Estimation

Bayesian estimation problem from Chapter 4 of Statistical Learning Methods

Reference 1: https://blog.csdn.net/bumingqiu/article/details/73397812

Reference 2: https://blog.csdn.net/bitcarmanlee/article/details/82156281


 

1. The first formula:

p(Y=c_{k} )=\frac{\lambda+\sum_{i=1}^{N} {I(y_{i}=c_{k})}}{N+K\lambda}, (1)

where c_{k} is the k-th class (there are K classes in total) and N is the number of samples;
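To make formula (1) concrete, here is a minimal sketch in Python (not part of the original derivation; the function name smoothed_prior and the toy labels are illustrative only) that computes the smoothed class prior from a list of labels:

```python
# Minimal sketch (illustrative): Bayesian / Laplace-smoothed estimate of p(Y = c_k),
# i.e. (sum_i I(y_i = c_k) + lambda) / (N + K * lambda), from a list of labels.
from collections import Counter

def smoothed_prior(labels, classes, lam=1.0):
    counts = Counter(labels)                   # sum_i I(y_i = c_k) for each class
    N, K = len(labels), len(classes)
    return {c: (counts.get(c, 0) + lam) / (N + K * lam) for c in classes}

# Toy example: 6 samples, 3 classes, lambda = 1 (Laplace smoothing).
print(smoothed_prior([1, 1, 2, 2, 2, 3], classes=[1, 2, 3], lam=1.0))
# {1: 0.333..., 2: 0.444..., 3: 0.222...}; the values sum to 1.
```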

Proof:

Let p(Y=c_{i})=\pi_{i},i\in [1,K], and assume (\pi_{1},\pi_{2},...,\pi_{K}) follows a Dirichlet distribution with parameter \lambda (the prior distribution). Its probability density function is:

\large p(\pi_{1},\pi_{2},...,\pi_{K})=\frac{1}{B(\lambda)}\prod_{i=1}^{K}\pi_{i}^{\lambda-1},(2);

Equation (2) can be rewritten as:

\large p(\pi_{1},\pi_{2},...,\pi_{K})\propto \prod_{i=1}^{K}\pi_{i}^{\lambda-1},(3)

Let M_{j},j\in[1,K] denote the observed count of each class:

M_{j}=\sum_{i=1}^{N}I(y_{i}=c_{j}),j\in[1,K],(4)
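As a small illustration (assumed Python, illustrative variable names), the counts M_{j} are just per-class frequencies, and they sum to N, a fact used again below:

```python
# Sketch: M_j = sum_i I(y_i = c_j), the number of observed samples in class c_j.
labels = [1, 1, 2, 2, 2, 3]                        # toy data, N = 6
M = {c: sum(1 for y in labels if y == c) for c in [1, 2, 3]}
print(M)                                           # {1: 2, 2: 3, 3: 1}
print(sum(M.values()) == len(labels))              # True: sum_j M_j = N
```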

The prior distribution is then updated with the observed data via Bayes' rule:

\large p(\overrightarrow{\pi}|\overrightarrow{M})=\frac{p(\overrightarrow{M}|\overrightarrow{\pi})p(\overrightarrow{\pi})}{p(\overrightarrow{M})},(5)

where \large \overrightarrow{\pi}=(\pi_{1},\pi_{2},...,\pi_{K}) and \overrightarrow{M}=(M_{1},M_{2},...,M_{K}). Since \large p(\overrightarrow{M}) does not depend on \large \overrightarrow{\pi}, equation (5) can be written as:

\large p(\overrightarrow{\pi}|\overrightarrow{M})\propto p(\overrightarrow{M}|\overrightarrow{\pi})p(\overrightarrow{\pi}),(6)

\large p(\overrightarrow{M}|\overrightarrow{\pi}) follows a multinomial distribution, so:

\large p(\overrightarrow{M}|\overrightarrow{\pi})=\frac{N!}{\prod_{j=1}^{K}M_{j}!}\prod_{j=1}^{K}\pi_{j}^{M_{j}},j\in[1,K],(7)

Equation (7) can be rewritten as:

\large p(\overrightarrow{M}|\overrightarrow{\pi})\propto \prod_{j=1}^{K}\pi_{j}^{M_{j}},j\in[1,K],(8)
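For a quick numerical feel for the likelihood in (7), the sketch below (plain Python with toy counts, not from the original post) evaluates the multinomial probability directly:

```python
# Sketch: the multinomial likelihood p(M | pi) of equation (7).
from math import factorial, prod

def multinomial_pmf(M, pi):
    """N! / (prod_j M_j!) * prod_j pi_j^{M_j}, where N = sum_j M_j."""
    N = sum(M)
    coeff = factorial(N) / prod(factorial(m) for m in M)
    return coeff * prod(p ** m for p, m in zip(pi, M))

print(multinomial_pmf([2, 3, 1], [1/3, 1/2, 1/6]))  # likelihood of the toy counts
```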

Substituting equations (3) and (8) into equation (6), we obtain:

\large p(\overrightarrow{\pi}|\overrightarrow{M})\propto \prod_{j=1}^{K}\pi_{j}^{M_{j}+\lambda-1},(9)

We therefore conclude that the posterior distribution \large p(\overrightarrow{\pi}|\overrightarrow{M}) of \large \overrightarrow{\pi} is a Dirichlet distribution with parameters M_{j}+\lambda.

By the expectation formula of the Dirichlet distribution, E(\pi_{j})=\alpha_{j}/\sum_{i=1}^{K}\alpha_{i} for parameters \alpha_{i}, the expectation of \large \overrightarrow{\pi} is:

\large E(\overrightarrow{\pi})=(\frac{M_{1}+\lambda}{\sum_{j=1}^{K}(M_{j}+\lambda)},\frac{M_{2}+\lambda}{\sum_{j=1}^{K}(M_{j}+\lambda)},...,\frac{M_{K}+\lambda}{\sum_{j=1}^{K}(M_{j}+\lambda)}),(10)

That is, noting that \sum_{j=1}^{K}M_{j}=N, so the denominator equals N+K\lambda:

\large E(\pi_{k})=\frac{M_{k}+\lambda}{\sum_{j=1}^{K}(M_{j}+\lambda)}\Leftrightarrow p(Y=c_{k})=\frac{\sum_{i=1}^{N}I(y_{i}=c_{k})+\lambda}{N+K\lambda},(11)

This proves the first formula.
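The conclusion can also be checked numerically: sample from the posterior Dirichlet in (9) and compare the empirical mean with the closed form (M_{j}+\lambda)/(N+K\lambda). A sketch assuming numpy is available (toy counts, illustrative only):

```python
# Sketch: verify E(pi_j | M) = (M_j + lambda) / (N + K*lambda) by sampling the posterior.
import numpy as np

M, lam = np.array([2, 3, 1]), 1.0            # toy counts M_j and smoothing parameter
N, K = M.sum(), len(M)

samples = np.random.dirichlet(M + lam, size=200_000)   # draws from Dir(M_j + lambda)
print(samples.mean(axis=0))                  # approximately [0.333, 0.444, 0.222]
print((M + lam) / (N + K * lam))             # closed form from equation (11)
```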


2. The second formula

p(X^{j}=a_{jl_{j}}|Y=c_{k})=\frac{\sum_{i=1}^{N}I(x_{i}^{j}=a_{jl_{j}},y_{i}=c_{k})+\lambda}{\sum_{i=1}^{N}I(y_{i}=c_{k})+S_{j}\lambda},j\in[1,n],l_{j}\in[1,S_{j}],k\in[1,K],(1)

where x_{i}^{j} denotes the value of the j-th feature of the i-th sample, S_{j} is the number of possible values of the j-th feature, n is the number of feature dimensions, K is the number of classes, and N is the number of samples;
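Before the proof, a minimal sketch in Python (illustrative names and toy data, not part of the original) of how formula (1) is computed for one feature dimension j:

```python
# Sketch: smoothed estimate of p(X^j = a | Y = c_k) for one feature dimension j.
def smoothed_conditional(x_j, y, c_k, a_values, lam=1.0):
    """(sum_i I(x_i^j = a, y_i = c_k) + lambda) / (sum_i I(y_i = c_k) + S_j * lambda)."""
    S_j = len(a_values)
    n_k = sum(1 for yi in y if yi == c_k)                     # sum_i I(y_i = c_k)
    return {a: (sum(1 for xi, yi in zip(x_j, y) if xi == a and yi == c_k) + lam)
               / (n_k + S_j * lam)
            for a in a_values}

# Toy data: values of feature j and class labels for 6 samples.
x_j = ['S', 'M', 'M', 'L', 'S', 'L']
y   = [ 1,   1,   2,   2,   2,   1 ]
print(smoothed_conditional(x_j, y, c_k=1, a_values=['S', 'M', 'L'], lam=1.0))
# {'S': 0.333..., 'M': 0.333..., 'L': 0.333...}  (each value seen once in class 1)
```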

Proof:

Following the proof of the first formula, let:

p(X^{j}=a_{jl_{j}}|Y=c_{k})=\pi_{l_{j}},l_{j}\in[1,S_{j}], and assume (\pi_{1},\pi_{2},...,\pi_{S_{j}}) follows a Dirichlet distribution with parameter \lambda (the prior distribution). Its probability density function is:

 

\large p(\pi_{1},\pi_{2},...,\pi_{S_{j}})=\frac{1}{B(\lambda)}\prod_{l_{j}=1}^{S_{j}}\pi_{l_{j}}^{\lambda-1},(2)

Equation (2) can be rewritten as:

\large p(\pi_{1},\pi_{2},...,\pi_{S_{j}})\propto \prod_{l_{j}=1}^{S_{j}}\pi_{l_{j}}^{\lambda-1},(3)

Let M_{jl_{j}},j\in[1,n],l_{j}\in[1,S_{j}] denote the observed count of the l_{j}-th value of the j-th feature among samples of class c_{k}:

M_{jl_{j}}=\sum_{i=1}^{N}I(x_{i}^{j}=a_{jl_{j}},y_{i}=c_{k}),j\in[1,n],l_{j}\in[1,S_{j}],(4)

As before, the prior in equation (3) is updated with the observed data:

\large p(\overrightarrow{\pi}|\overrightarrow{M})=\frac{p(\overrightarrow{M}|\overrightarrow{\pi})p(\overrightarrow{\pi})}{p(\overrightarrow{M})},(5)

where \large \overrightarrow{\pi}=(\pi_{1},\pi_{2},...,\pi_{S_{j}}) and \overrightarrow{M}=(M_{j1},M_{j2},...,M_{jS_{j}}). Since \large p(\overrightarrow{M}) does not depend on \large \overrightarrow{\pi}, equation (5) can be written as:

\large p(\overrightarrow{\pi}|\overrightarrow{M})\propto p(\overrightarrow{M}|\overrightarrow{\pi})p(\overrightarrow{\pi}),(6)

\large p(\overrightarrow{M}|\overrightarrow{\pi}) follows a multinomial distribution, so:

\large p(\overrightarrow{M}|\overrightarrow{\pi})=\frac{(\sum_{i=1}^{N}I(y_{i}=c_{k}))!}{\prod_{l_{j}=1}^{S_{j}}M_{jl_{j}}!}\prod_{l_{j}=1}^{S_{j}}\pi_{l_{j}}^{M_{jl_{j}}},l_{j}\in[1,S_{j}],(7)

Equation (7) can be rewritten as:

\large p(\overrightarrow{M}|\overrightarrow{\pi})\propto \prod_{l_{j}=1}^{S_{j}}\pi_{l_{j}}^{M_{jl_{j}}},j\in[1,n],l_{j}\in[1,S_{j}],(8)

Substituting equations (3) and (8) into equation (6) gives:

\large p(\overrightarrow{\pi}|\overrightarrow{M})\propto \prod_{l_{j}=1}^{S_{j}}\pi_{l_{j}}^{M_{jl_{j}}+\lambda-1},(9)

We therefore conclude that the posterior distribution \large p(\overrightarrow{\pi}|\overrightarrow{M}) of \large \overrightarrow{\pi} is a Dirichlet distribution with parameters M_{jl_{j}}+\lambda.

By the Dirichlet expectation formula, the expectation of \large \overrightarrow{\pi} is:

\large E(\overrightarrow{\pi})=(\frac{M_{j1}+\lambda}{\sum_{l_{j}=1}^{S_{j}}(M_{jl_{j}}+\lambda)},\frac{M_{j2}+\lambda}{\sum_{l_{j}=1}^{S_{j}}(M_{jl_{j}}+\lambda)},...,\frac{M_{jS_{j}}+\lambda}{\sum_{l_{j}=1}^{S_{j}}(M_{jl_{j}}+\lambda)}),(10)

That is, noting that \sum_{l_{j}=1}^{S_{j}}M_{jl_{j}}=\sum_{i=1}^{N}I(y_{i}=c_{k}):

\large E(\pi_{l_{j}})=\frac{M_{jl_{j}}+\lambda}{\sum_{l_{j}=1}^{S_{j}}(M_{jl_{j}}+\lambda)}\Leftrightarrow p(X^{j}=a_{jl_{j}}|Y=c_{k})=\frac{\sum_{i=1}^{N}I(x_{i}^{j}=a_{jl_{j}},y_{i}=c_{k})+\lambda}{\sum_{i=1}^{N}I(y_{i}=c_{k})+S_{j}\lambda},(11)

This proves the second formula.
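As a final sanity check (illustrative Python, toy counts), the denominator in (11) equals \sum_{l_{j}}M_{jl_{j}}+S_{j}\lambda, so the smoothed conditional probabilities for a fixed class sum to 1 over the S_{j} feature values:

```python
# Sketch: the smoothed conditionals in (11) sum to 1 over the S_j feature values,
# because sum_{l_j} M_{j l_j} = sum_i I(y_i = c_k).
lam = 1.0
M_jl = [1, 1, 1]                       # toy counts M_{j l_j} within class c_k
n_k, S_j = sum(M_jl), len(M_jl)        # sum_i I(y_i = c_k) and number of values

probs = [(m + lam) / (n_k + S_j * lam) for m in M_jl]
print(probs, sum(probs))               # [0.333..., 0.333..., 0.333...] 1.0
```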
