By Bayes' theorem,

$$p(\theta|x)\propto p(x|\theta)p(\theta)$$

If the prior distribution $p(\theta)$ of $\Theta$ and the posterior distribution $p(\theta|x)$ belong to the same family of distributions, the prior and the posterior are called conjugate distributions, and $p(\theta)$ is called the conjugate prior of the likelihood function $p(x|\theta)$.
In $n$ independent repeated trials, each with only two outcomes (the event occurs or does not occur) and with occurrence probability $p$, the number of occurrences $X$ follows the binomial distribution $X\sim B(n,p)$:

$$P(X=k)=C_n^k p^k(1-p)^{n-k},\quad k=0,1,\dots,n$$
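As a quick sanity check of the binomial formula, the following minimal sketch evaluates it with the standard library. The function name `binomial_pmf` and the example values of `n` and `p` are illustrative assumptions, not part of the original text.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p), i.e. C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Illustrative values (assumed): 10 trials, success probability 0.3.
n, p = 10, 0.3

# The probabilities over k = 0..n should add up to 1.
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print(total)
```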
The Beta distribution $X\sim Be(\alpha,\beta)$ has density

$$f(x) = \frac{1}{B(\alpha,\beta)} x^{\alpha-1}(1-x)^{\beta-1},\quad x\in[0,1],\ \alpha,\beta>0$$

where the normalizing constant is expressed through the Gamma function (note the product, not the sum, in the denominator):

$$\frac{1}{B(\alpha,\beta)} =\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)},\quad \Gamma(z)=\int_0^\infty t^{z-1}e^{-t}dt$$

$$\Gamma(z+1)=z\Gamma(z),\quad \Gamma(1)=1$$
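The Gamma recurrence and the role of $B(\alpha,\beta)$ as a normalizing constant can be checked numerically. This is only a sketch: `beta_fn` and `integrate_beta_pdf` are hypothetical helper names, the test point `z` and the parameters are arbitrary, and the integral uses a plain midpoint rule rather than a library integrator.

```python
from math import gamma, isclose

def beta_fn(a: float, b: float) -> float:
    """B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return gamma(a) * gamma(b) / gamma(a + b)

# Check the recurrence Gamma(z + 1) = z * Gamma(z) at an arbitrary point.
z = 3.7
recurrence_holds = isclose(gamma(z + 1), z * gamma(z), rel_tol=1e-9)

def integrate_beta_pdf(a: float, b: float, steps: int = 100_000) -> float:
    """Midpoint-rule integral of the Beta(a, b) density over [0, 1]."""
    h = 1.0 / steps
    s = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        s += x**(a - 1) * (1 - x)**(b - 1)
    return s * h / beta_fn(a, b)

# The density should integrate to approximately 1.
area = integrate_beta_pdf(2.0, 3.0)
print(recurrence_holds, area)
```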
Expectation of the Beta distribution, using $\int_0^1 x^{\alpha}(1-x)^{\beta-1}dx=B(\alpha+1,\beta)$ and $\Gamma(z+1)=z\Gamma(z)$:

$$\begin{aligned} E[X]&=\int_0^1 x \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1}(1-x)^{\beta-1} dx\\ &=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\int_0^1 x^\alpha (1-x)^{\beta-1} dx\\ &=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\frac{\Gamma(\alpha+1)\Gamma(\beta)}{\Gamma(\alpha+\beta+1)}\\ &=\frac{\alpha}{\alpha+\beta} \end{aligned}$$
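The closed form for the mean can be compared against a direct numerical integration of $x f(x)$. The sketch below assumes arbitrary parameters $\alpha=2$, $\beta=5$ and uses a simple midpoint rule; the helper names are hypothetical.

```python
from math import gamma

def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of Beta(a, b) at x, with 1/B(a, b) written via Gamma functions."""
    norm = gamma(a + b) / (gamma(a) * gamma(b))
    return norm * x**(a - 1) * (1 - x)**(b - 1)

def beta_mean_numeric(a: float, b: float, steps: int = 100_000) -> float:
    """Approximate E[X] = integral of x * f(x) over [0, 1] by the midpoint rule."""
    h = 1.0 / steps
    s = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        s += x * beta_pdf(x, a, b)
    return s * h

a, b = 2.0, 5.0                  # illustrative parameters (assumed)
numeric = beta_mean_numeric(a, b)
closed_form = a / (a + b)        # alpha / (alpha + beta)
print(numeric, closed_form)
```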
Assume the prior distribution is $\Theta\sim Be(\alpha,\beta)$:

$$p(\theta)=\frac{1}{B(\alpha,\beta)} \theta^{\alpha-1}(1-\theta)^{\beta-1}$$
and the likelihood is $X|\Theta\sim B(n,\theta)$:

$$p(X=k|\Theta=\theta)= C_n^k \theta^k(1-\theta)^{n-k}$$
Then the posterior distribution is $\Theta|X=k\sim Be(\alpha+k,\beta+n-k)$. To see this, first multiply the likelihood by the prior:

$$\begin{aligned} p(X=k|\Theta=\theta)p(\theta)&=C_n^k \theta^k(1-\theta)^{n-k} \frac{1}{B(\alpha,\beta)} \theta^{\alpha-1}(1-\theta)^{\beta-1}\\ &=C_n^k \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\theta^{\alpha+k-1}(1-\theta)^{\beta+n-k-1}\\ &=C\theta^{\alpha+k-1}(1-\theta)^{\beta+n-k-1} \end{aligned}$$

where $C=C_n^k/B(\alpha,\beta)$ does not depend on $\theta$.
$$\begin{aligned} p(X=k)&=\int_0^1 p(X=k|\Theta=\theta)p(\theta)d\theta\\ &=C\int_0^1\theta^{\alpha+k-1}(1-\theta)^{\beta+n-k-1}d\theta\\ &=C B(\alpha+k,\beta+n-k) \end{aligned}$$
$$p(\theta|X=k)=\frac{p(X=k|\Theta=\theta)p(\theta)}{p(X=k)}= \frac{1}{B(\alpha+k,\beta+n-k)}\theta^{\alpha+k-1}(1-\theta)^{\beta+n-k-1}$$
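The derivation above can be sketched in code: one routine applies Bayes' theorem directly (likelihood times prior, normalized by a numerically integrated evidence), and another simply evaluates the $Be(\alpha+k,\beta+n-k)$ density. The two should agree up to integration error. The function names and the values $\alpha=\beta=2$, $n=10$, $k=7$ are assumptions chosen for illustration.

```python
from math import comb, gamma

def beta_pdf(theta: float, a: float, b: float) -> float:
    """Density of Beta(a, b) at theta."""
    norm = gamma(a + b) / (gamma(a) * gamma(b))
    return norm * theta**(a - 1) * (1 - theta)**(b - 1)

def posterior_params(alpha: float, beta: float, n: int, k: int):
    """Conjugate update: Beta(alpha, beta) prior, k successes in n trials."""
    return alpha + k, beta + n - k

def posterior_by_bayes(theta, alpha, beta, n, k, steps=20_000):
    """Likelihood * prior at theta, divided by a midpoint-rule estimate of p(X = k)."""
    def unnorm(t):
        return comb(n, k) * t**k * (1 - t)**(n - k) * beta_pdf(t, alpha, beta)
    h = 1.0 / steps
    evidence = sum(unnorm((i + 0.5) * h) for i in range(steps)) * h
    return unnorm(theta) / evidence

alpha, beta, n, k = 2.0, 2.0, 10, 7       # illustrative values (assumed)
a_post, b_post = posterior_params(alpha, beta, n, k)

theta = 0.6
direct = posterior_by_bayes(theta, alpha, beta, n, k)
conjugate = beta_pdf(theta, a_post, b_post)
print(a_post, b_post, direct, conjugate)
```

Under these assumed values the updated parameters are $(9, 5)$, so the posterior mean is $9/14$, pulled from the prior mean $1/2$ toward the observed frequency $7/10$.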
In $n$ independent repeated trials, each with $k$ possible outcomes $A_1,...,A_k$ occurring with probabilities $p_1,...,p_k$, the numbers of times $X_1,...,X_k$ that each outcome occurs (with $n_1+\dots+n_k=n$) follow the multinomial distribution $X\sim multi(X_1,...,X_k;p_1,...,p_k)$:

$$P(X_1=n_1,...,X_k=n_k)=\frac{n!}{n_1!...n_k!}\prod_{i=1}^kp_i^{n_i}$$

$$\sum_{i=1}^kp_i=1,\quad p_i>0$$
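A minimal sketch of the multinomial probability, assuming counts and probabilities are passed as equal-length lists. The name `multinomial_pmf` is hypothetical. With two categories it reduces to the binomial formula.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(X_1 = n_1, ..., X_k = n_k) = n! / (n_1! ... n_k!) * prod(p_i ** n_i)."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)   # stays an exact integer at every step
    return coef * prod(p**c for p, c in zip(probs, counts))

# Three equally likely outcomes, each observed once in n = 3 trials.
p_three = multinomial_pmf([1, 1, 1], [1 / 3, 1 / 3, 1 / 3])

# Two categories: the same value as the binomial C(4, 2) * 0.5^4.
p_two = multinomial_pmf([2, 2], [0.5, 0.5])
print(p_three, p_two)
```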
The Dirichlet distribution $X\sim Dir(X_1,...,X_k;\alpha_1,...,\alpha_k)$ has density

$$f(x_1,...,x_k)=\frac{1}{B(\alpha_1,...,\alpha_k)}\prod_{i=1}^kx_i^{\alpha_i-1}$$

$$B(\alpha_1,...,\alpha_k)=\frac{\prod_{i=1}^k\Gamma(\alpha_i)}{\Gamma(\sum_{i=1}^k\alpha_i)},\quad \sum_{i=1}^k x_i=1,\quad \alpha_i>0\ \forall i$$
Expectation of the Dirichlet distribution. All integrals are taken over the simplex $\sum_{i=1}^k x_i=1$, $x_i\ge 0$, so the integral does not factor into one-dimensional integrals; instead, the integrand is recognized as an unnormalized Dirichlet density with $\alpha_j$ replaced by $\alpha_j+1$:

$$\begin{aligned} E[X_j]&=\int x_j \frac{1}{B(\alpha_1,...,\alpha_k)}\prod_{i=1}^kx_i^{\alpha_i-1} dx_1...dx_k\\ &=\frac{1}{B(\alpha_1,...,\alpha_k)}\int x_j^{\alpha_j}\prod_{i\neq j}x_i^{\alpha_i-1}dx_1...dx_k\\ &=\frac{B(\alpha_1,...,\alpha_j+1,...,\alpha_k)}{B(\alpha_1,...,\alpha_j,...,\alpha_k)}\\ &=\frac{\alpha_j}{\sum_{i=1}^k\alpha_i} \end{aligned}$$

The last step uses $\Gamma(\alpha_j+1)=\alpha_j\Gamma(\alpha_j)$ and $\Gamma(\sum_i\alpha_i+1)=(\sum_i\alpha_i)\,\Gamma(\sum_i\alpha_i)$.
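The final two steps of this derivation, the ratio of multivariate Beta functions and its simplification to $\alpha_j/\sum_i\alpha_i$, can be checked numerically. In this sketch `multi_beta` and the parameter vector `[1, 2, 3]` are illustrative assumptions.

```python
from math import gamma, prod

def multi_beta(alphas):
    """B(a_1, ..., a_k) = prod(Gamma(a_i)) / Gamma(sum(a_i))."""
    return prod(gamma(a) for a in alphas) / gamma(sum(alphas))

def dirichlet_mean(alphas):
    """Closed form: E[X_j] = a_j / sum(a_i)."""
    total = sum(alphas)
    return [a / total for a in alphas]

def dirichlet_mean_via_beta_ratio(alphas):
    """E[X_j] = B(..., a_j + 1, ...) / B(..., a_j, ...), as in the derivation."""
    means = []
    for j in range(len(alphas)):
        bumped = list(alphas)
        bumped[j] += 1
        means.append(multi_beta(bumped) / multi_beta(alphas))
    return means

alphas = [1.0, 2.0, 3.0]          # illustrative parameters (assumed)
closed = dirichlet_mean(alphas)
ratio = dirichlet_mean_via_beta_ratio(alphas)
print(closed, ratio)
```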
Assume the prior distribution is $\Theta_1,...,\Theta_k\sim Dir(\alpha_1,...,\alpha_k)$:

$$p(\theta_1,...,\theta_k)=\frac{1}{B(\alpha_1,...,\alpha_k)}\prod_{i=1}^k\theta_i^{\alpha_i-1}$$
and the likelihood is $X_1,...,X_k|\Theta_1,...,\Theta_k\sim multi(\theta_1,...,\theta_k)$:

$$p(n_1,...,n_k|\theta_1,...,\theta_k)=\frac{n!}{n_1!...n_k!}\prod_{i=1}^k\theta_i^{n_i}$$
Then the posterior distribution is $\Theta_1,...,\Theta_k|X_1=n_1,...,X_k=n_k\sim Dir(\alpha_1+n_1,...,\alpha_k+n_k)$. Multiplying the likelihood by the prior, whose normalizing constant is $1/B(\alpha_1,...,\alpha_k)=\Gamma(\sum_i\alpha_i)/\prod_i\Gamma(\alpha_i)$:

$$\begin{aligned} p(n_1,...,n_k|\theta_1,...,\theta_k)p(\theta_1,...,\theta_k)&=\frac{n!}{n_1!...n_k!}\prod_{i=1}^k\theta_i^{n_i}\frac{\Gamma(\sum_{i=1}^k\alpha_i)}{\prod_{i=1}^k\Gamma(\alpha_i)}\prod_{i=1}^k\theta_i^{\alpha_i-1}\\ &=\frac{n!}{n_1!...n_k!}\frac{\Gamma(\sum_{i=1}^k\alpha_i)}{\prod_{i=1}^k\Gamma(\alpha_i)}\prod_{i=1}^k\theta_i^{\alpha_i+n_i-1}\\ &=C\prod_{i=1}^k\theta_i^{\alpha_i+n_i-1} \end{aligned}$$

where $C$ does not depend on $\theta_1,...,\theta_k$.
$$\begin{aligned} p(n_1,...,n_k)&=\int p(n_1,...,n_k|\theta_1,...,\theta_k)p(\theta_1,...,\theta_k)d\theta\\ &=C\int\prod_{i=1}^k\theta_i^{\alpha_i+n_i-1}d\theta\\ &=C B(\alpha_1+n_1,...,\alpha_k+n_k) \end{aligned}$$
$$\begin{aligned} p(\theta_1,...,\theta_k|n_1,...,n_k)&=\frac{p(n_1,...,n_k|\theta_1,...,\theta_k)p(\theta_1,...,\theta_k)}{p(n_1,...,n_k)}\\ &= \frac{1}{ B(\alpha_1+n_1,...,\alpha_k+n_k)}\prod_{i=1}^k\theta_i^{\alpha_i+n_i-1} \end{aligned}$$
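In practice the conjugate update amounts to adding the observed counts to the prior parameters. The sketch below assumes a uniform $Dir(1,1,1)$ prior and made-up counts; `dirichlet_posterior` is a hypothetical helper name.

```python
def dirichlet_posterior(alphas, counts):
    """Conjugate update: Dir(a_1, ..., a_k) prior plus counts n_1, ..., n_k."""
    if len(alphas) != len(counts):
        raise ValueError("alphas and counts must have the same length")
    return [a + n for a, n in zip(alphas, counts)]

prior = [1.0, 1.0, 1.0]      # assumed uniform prior over three outcomes
counts = [3, 5, 2]           # assumed observations from n = 10 trials

posterior = dirichlet_posterior(prior, counts)

# Posterior mean of each theta_j: (a_j + n_j) / sum(a_i + n_i).
total = sum(posterior)
posterior_mean = [a / total for a in posterior]
print(posterior, posterior_mean)
```

With these assumed inputs the posterior is $Dir(4, 6, 3)$, and each posterior mean lies between the prior mean $1/3$ and the observed frequency $n_j/n$.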