Original post: blog.lucien.ink/archives/500
The data $x_1, \dots, x_n$ are drawn from a normal distribution $N(\mu, \sigma^2)$, where $\sigma^2$ is known.
The probability density function of the normal distribution is $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$, so the likelihood is
$$L(\mu) = \prod_{i=1}^{n} f(x_i) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \cdot e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2}$$
Taking the logarithm gives $\ln L(\mu) = n \cdot \ln\frac{1}{\sigma\sqrt{2\pi}} - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$. The first term does not depend on $\mu$, so maximizing $L(\mu)$ is equivalent to maximizing $-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$.
Then
$$\frac{\partial}{\partial\mu}\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right] = \frac{1}{2\sigma^2}\sum_{i=1}^{n} 2 \cdot (x_i - \mu) = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu)$$
Setting
$$\frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0$$
gives
$$\widehat\mu = \frac{\sum_{i=1}^{n} x_i}{n} = \bar x$$
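As a quick sanity check on the result above, here is a minimal Python sketch that compares the closed-form estimate $\bar x$ against the log-likelihood at nearby values of $\mu$. The function names, the seed, and the values `true_mu = 2.0` and `sigma = 1.5` are assumptions made only for illustration; they are not part of the original problem.

```python
import random

def log_likelihood(mu, xs, sigma):
    """Log-likelihood of mu up to an additive constant (sigma is known)."""
    return -sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2)

def mle_mean(xs):
    """Closed-form maximum likelihood estimate of mu: the sample mean."""
    return sum(xs) / len(xs)

# Hypothetical data: 200 draws from N(2.0, 1.5^2) with a fixed seed.
rng = random.Random(0)
true_mu, sigma = 2.0, 1.5
xs = [rng.gauss(true_mu, sigma) for _ in range(200)]

mu_hat = mle_mean(xs)

# Moving away from mu_hat in either direction should lower the log-likelihood.
for delta in (-0.1, -0.01, 0.01, 0.1):
    assert log_likelihood(mu_hat + delta, xs, sigma) < log_likelihood(mu_hat, xs, sigma)

print("MLE of mu:", mu_hat)
```

The check only probes a few points around $\widehat\mu$; it illustrates the derivation rather than proving it.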
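As someone more experienced pointed out to me, this part is, strictly speaking, asking for the maximum a posteriori (MAP) estimate.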
The prior on $\mu$ is the $N(0, \tau^2)$ density:
$$P(\mu) = \frac{1}{\tau\sqrt{2\pi}} e^{-\frac{\mu^2}{2\tau^2}}$$
$$
\begin{aligned}
P(\mu \mid x_1, \dots, x_n)
&= \frac{P(\mu) \cdot P(x_1, \dots, x_n \mid \mu)}{P(x_1, \dots, x_n)}
= \frac{P(\mu) \cdot \prod_{i=1}^{n} P(x_i \mid \mu)}{\int P(\mu, x_1, \dots, x_n)\,\mathrm{d}\mu} \\
&\propto P(\mu) \cdot \prod_{i=1}^{n} P(x_i \mid \mu) \\
&= \frac{1}{\tau\sqrt{2\pi}} \cdot e^{-\frac{\mu^2}{2\tau^2}} \cdot \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \cdot e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2}
\end{aligned}
$$
Taking the logarithm gives
$$
\begin{aligned}
&\ln\left(\frac{1}{\tau\sqrt{2\pi}} \cdot e^{-\frac{\mu^2}{2\tau^2}} \cdot \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \cdot e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2}\right) \\
&= \ln\frac{1}{\tau\sqrt{2\pi}} - \frac{\mu^2}{2\tau^2} + n \cdot \ln\frac{1}{\sigma\sqrt{2\pi}} - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2 \\
&= -\frac{\mu^2}{2\tau^2} - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2 + C
\end{aligned}
$$
where $C$ collects the terms that do not depend on $\mu$, so only the first two terms matter for the maximization.
Then
$$\frac{\partial}{\partial\mu}\left[-\frac{\mu^2}{2\tau^2} - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right] = -\frac{\mu}{\tau^2} + \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu)$$
Setting
$$-\frac{\mu}{\tau^2} + \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0$$
we have
$$\frac{1}{\sigma^2}\sum_{i=1}^{n} x_i - \frac{n}{\sigma^2}\mu = \frac{\mu}{\tau^2} \;\Rightarrow\; \frac{1}{\sigma^2}\sum_{i=1}^{n} x_i = \left(\frac{1}{\tau^2} + \frac{n}{\sigma^2}\right) \cdot \mu = \frac{\sigma^2 + n\tau^2}{\tau^2\sigma^2}\mu$$
which gives
$$\widehat\mu = \frac{\tau^2\sum_{i=1}^{n} x_i}{\sigma^2 + n\tau^2} = \frac{\sum_{i=1}^{n} x_i}{\frac{\sigma^2}{\tau^2} + n}$$
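To see how the MAP estimate behaves next to the sample mean, here is a small Python sketch of the closed form above. The function names, the seed, and the values of `sigma` and `tau` are hypothetical choices for illustration only.

```python
import random

def map_mean(xs, sigma, tau):
    """MAP estimate of mu under a N(0, tau^2) prior, with sigma known."""
    return sum(xs) / (sigma ** 2 / tau ** 2 + len(xs))

def log_posterior(mu, xs, sigma, tau):
    """Log-posterior of mu up to an additive constant."""
    prior_term = -mu ** 2 / (2 * tau ** 2)
    data_term = -sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2)
    return prior_term + data_term

# Hypothetical data: 20 draws from N(2.0, 1.5^2) with a fixed seed.
rng = random.Random(0)
sigma, tau = 1.5, 0.5
xs = [rng.gauss(2.0, sigma) for _ in range(20)]

mu_map = map_mean(xs, sigma, tau)
x_bar = sum(xs) / len(xs)

# Moving away from mu_map in either direction should lower the log-posterior.
for delta in (-0.1, -0.01, 0.01, 0.1):
    assert log_posterior(mu_map + delta, xs, sigma, tau) < log_posterior(mu_map, xs, sigma, tau)

# The extra sigma^2 / tau^2 in the denominator pulls the estimate toward 0.
assert abs(mu_map) <= abs(x_bar)

print("sample mean:", x_bar)
print("MAP estimate:", mu_map)
```

Because the denominator is $\frac{\sigma^2}{\tau^2} + n$ rather than $n$, the MAP estimate is never larger in magnitude than $\bar x$, and as $\tau^2$ grows (a flatter prior) it approaches $\bar x$.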
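Consider the following situation: there are $1000$ random numbers in total. Each time, $10$ of them are drawn with replacement, and this is repeated $100$ times. Each draw yields its own value of $\mu$, so there are $100$ values of $\mu$, and these all follow one common distribution with the same parameters.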
$P(\mu)$ is the probability that $\mu$ takes a particular value.
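The resampling picture can be simulated in a few lines of Python. This is only a sketch under assumptions of my own: the $1000$ numbers are taken to be uniform on $[0, 1)$, each $\mu$ is taken to be the mean of the $10$ numbers in one draw, and `sample_means` is a hypothetical helper name.

```python
import random
import statistics

def sample_means(population, sample_size, n_draws, rng):
    """Mean of each draw; random.Random.choices samples with replacement."""
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_draws)
    ]

rng = random.Random(0)

# Assumption: the 1000 "random numbers" are uniform on [0, 1).
population = [rng.random() for _ in range(1000)]

# 100 draws of 10 numbers each, giving 100 values of mu.
means = sample_means(population, sample_size=10, n_draws=100, rng=rng)

print("number of mu values:", len(means))
print("mean of the mu values:", statistics.fmean(means))
print("standard deviation of the mu values:", statistics.stdev(means))
```

A histogram of `means` would give an empirical picture of the distribution that the $100$ values of $\mu$ share.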