While looking into a problem today, I ran into the need to prove:
$$E\left[\frac{\partial^2 \ln f(x;\theta)}{\partial\theta^2}\right] = -E\left\{\left(\frac{\partial \ln f(x;\theta)}{\partial\theta}\right)^2\right\}$$
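One search led to another and I ended up at Fisher information, so I worked through the derivation by hand. Since I will probably forget it later, I am taking the time to write it up. Typing this up electronically really eats time, and I have to psych myself up for it every time (facepalm).
Note: everything below is my own understanding and is for reference only.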
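As we know, point estimation mainly comes in two flavors: the method of moments and maximum likelihood estimation.
The main idea of the method of moments: if the population has $K$ unknown parameters, use the first $K$ sample moments to estimate the corresponding first $K$ population moments, then use the functional relationship between the unknown parameters and the population moments to solve for estimators of the parameters.
The main idea of maximum likelihood estimation: choose the parameter value that maximizes the probability of the sample that was actually observed.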
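To make the two ideas concrete, here is a minimal sketch on a toy model. The Uniform$(0,\theta)$ model, the function names, and the data are assumptions chosen purely for illustration; none of them come from the notes above. For this model the first population moment is $\theta/2$, so the moment estimator is $2\bar{x}$, while the likelihood $\theta^{-n}$ (valid for $\theta \ge \max_i x_i$) is largest at $\theta = \max_i x_i$.

```python
def moment_estimate_uniform(sample):
    """Method of moments for Uniform(0, theta): E[X] = theta / 2, so theta_hat = 2 * mean."""
    mean = sum(sample) / len(sample)
    return 2 * mean


def mle_uniform(sample):
    """Maximum likelihood for Uniform(0, theta): the likelihood theta**(-n) is
    maximized by the smallest admissible theta, i.e. the sample maximum."""
    return max(sample)


# Hypothetical data, assumed to come from Uniform(0, theta) with unknown theta.
sample = [0.8, 2.1, 3.5, 1.2, 2.9]
theta_mom = moment_estimate_uniform(sample)
theta_mle = mle_uniform(sample)
print("moment estimate:", theta_mom)
print("maximum likelihood estimate:", theta_mle)
```

The two estimators disagree on the same data, which is exactly why criteria for comparing estimators are needed.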
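Given several candidate estimators, how do we judge whether one is good or bad, and how do we choose among them? This is where the three standard criteria for evaluating an estimator come in: unbiasedness, efficiency, and consistency.
The main content of these three properties is briefly introduced below: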
Let $x_1, x_2, \dots, x_n$ be a sample drawn from a population $X$ with probability density function $f(x;\theta)$, $\theta\in\Theta=\{\theta: a<\theta<b\}$, where $a, b$ are known constants, $a$ may be $-\infty$, and $b$ may be $+\infty$. Further, let $\eta=\mu(x_1, x_2, \dots,x_n)$ be an unbiased estimator of $g(\theta)$, and suppose the following regularity conditions hold:
(1) The set $\{x: f(x;\theta)>0\}$ does not depend on $\theta$;
(2) $g'(\theta)$ and $\frac{\partial f(x;\theta)}{\partial\theta}$ exist, and for all $\theta\in\Theta$,
$$\frac{\partial}{\partial\theta}\int f(x;\theta)\,dx = \int\frac{\partial f(x;\theta)}{\partial\theta}\,dx$$
$$\begin{aligned} &\frac{\partial}{\partial\theta}\int\int\dots\int \mu(x_1, x_2, \dots,x_n)f(x_1;\theta)f(x_2;\theta)\dots f(x_n;\theta)\,dx_1dx_2\dots dx_n \\ &\quad= \int\int\dots\int\mu(x_1, x_2, \dots,x_n)\frac{\partial}{\partial\theta}\left[\prod_{i=1}^n f(x_i;\theta)\right]dx_1dx_2\dots dx_n \end{aligned}$$
(3) Let
$$I(\theta) = E_\theta\left\{\left(\frac{\partial \ln f(x;\theta)}{\partial\theta}\right)^2\right\}$$
which is called the Fisher information. Then the variance $D_\theta\eta$ of $\eta$ satisfies
$$D_\theta\eta\geq\frac{[g'(\theta)]^2}{nI(\theta)}$$
Equality holds if and only if there exists a $K$ that does not depend on $x_1,x_2,\dots,x_n$, but may depend on $\theta$, such that
$$\sum_{i=1}^{n}\frac{\partial \ln f(x_i;\theta)}{\partial\theta} = K(\eta - g(\theta))$$
holds with probability 1.
In particular, when $g(\theta)=\theta$, the inequality becomes
$$D_\theta\eta\geq\frac{1}{nI(\theta)}$$
Proof:
To be added later.
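In the meantime, one common textbook route goes through the Cauchy-Schwarz inequality. The sketch below is an outline under regularity conditions (1) and (2), not necessarily the proof the author intends to add; the symbol $S$ is introduced here only for convenience.

Write $S = \sum_{i=1}^{n}\frac{\partial \ln f(x_i;\theta)}{\partial\theta}$. Condition (2) gives $E_\theta\left[\frac{\partial \ln f(x;\theta)}{\partial\theta}\right] = 0$, so by independence of the sample
$$E_\theta S = 0, \qquad D_\theta S = nI(\theta).$$
Since $\eta$ is unbiased for $g(\theta)$, differentiating $g(\theta) = E_\theta\eta$ under the integral sign (condition (2) again) gives
$$g'(\theta) = \int\dots\int \mu(x_1,\dots,x_n)\,\frac{\partial}{\partial\theta}\left[\prod_{i=1}^n f(x_i;\theta)\right]dx_1\dots dx_n = E_\theta[\eta S] = \mathrm{Cov}_\theta(\eta, S).$$
The Cauchy-Schwarz inequality then yields
$$[g'(\theta)]^2 = \mathrm{Cov}_\theta(\eta,S)^2 \leq D_\theta\eta \cdot D_\theta S = D_\theta\eta \cdot nI(\theta),$$
with equality exactly when $S$ and $\eta - g(\theta)$ are proportional with probability 1, which is the stated condition $S = K(\eta - g(\theta))$.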
This important property was really proved to make the information $I(\theta)$ easier to compute. Stated mathematically:
If
$$\frac{\partial}{\partial\theta}\int\frac{\partial f(x;\theta)}{\partial\theta}\,dx = \int\frac{\partial^2 f(x;\theta)}{\partial\theta^2}\,dx$$
then:
$$I(\theta) = -E\left[\frac{\partial^2 \ln f(x;\theta)}{\partial\theta^2}\right]$$
Proof:
$$\begin{aligned} E\left[\frac{\partial \ln f(x;\theta)}{\partial\theta}\right] &= \int\frac{1}{f(x;\theta)}\cdot\frac{\partial f(x;\theta)}{\partial\theta}\cdot f(x;\theta)\,dx \\ &= \int\frac{\partial f(x;\theta)}{\partial\theta}\,dx \\ &= \underline{\frac{\partial}{\partial\theta}\int f(x;\theta)\,dx} \\ &= \frac{\partial}{\partial\theta}\,1 \\ &= \underline{0} \end{aligned}$$
Therefore:
$$\int\frac{\partial^2 f(x;\theta)}{\partial\theta^2}\,dx = \frac{\partial}{\partial\theta}\int\frac{\partial f(x;\theta)}{\partial\theta}\,dx = 0$$
From the variance formula $Var(X)=EX^2 - (EX)^2$ and $E\left[\frac{\partial \ln f(x;\theta)}{\partial\theta}\right]=0$, we get:
$$\begin{aligned} Var\left[\frac{\partial \ln f(x;\theta)}{\partial\theta}\right] &= E\left[\left(\frac{\partial \ln f(x;\theta)}{\partial\theta}\right)^2\right] - \left\{E\left[\frac{\partial \ln f(x;\theta)}{\partial\theta}\right]\right\}^2 \\ &= E\left[\left(\frac{\partial \ln f(x;\theta)}{\partial\theta}\right)^2\right] \end{aligned}$$
Moreover,
$$\begin{aligned} E\left[\frac{\partial^2 \ln f(x;\theta)}{\partial\theta^2}\right] &= \int\frac{\partial}{\partial\theta}\left(\frac{\partial \ln f(x;\theta)}{\partial\theta}\right)f(x;\theta)\,dx \\ &= \int\frac{\partial}{\partial\theta}\left(\frac{\frac{\partial f(x;\theta)}{\partial\theta}}{f(x;\theta)}\right)f(x;\theta)\,dx \\ &= \int\frac{\frac{\partial^2 f(x;\theta)}{\partial\theta^2}\cdot f(x;\theta) - \frac{\partial f(x;\theta)}{\partial\theta}\cdot\frac{\partial f(x;\theta)}{\partial\theta}}{f(x;\theta)^2}\,f(x;\theta)\,dx \\ &= \underline{\int\frac{\partial^2 f(x;\theta)}{\partial\theta^2}\,dx} - \int\left(\frac{\frac{\partial f(x;\theta)}{\partial\theta}}{f(x;\theta)}\right)^2 f(x;\theta)\,dx \\ &= 0 - \int\left(\frac{\partial \ln f(x;\theta)}{\partial\theta}\right)^2 f(x;\theta)\,dx \\ &= -E\left(\frac{\partial \ln f(x;\theta)}{\partial\theta}\right)^2 \end{aligned}$$
Combining this with the definition of $I(\theta)$ and the variance computed from the zero-mean score gives:
$$I(\theta) = E\left(\frac{\partial \ln f(x;\theta)}{\partial\theta}\right)^2 = -E\left[\frac{\partial^2 \ln f(x;\theta)}{\partial\theta^2}\right] = Var\left[\frac{\partial \ln f(x;\theta)}{\partial\theta}\right]$$
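The identity $E\left(\frac{\partial \ln f}{\partial\theta}\right)^2 = -E\left[\frac{\partial^2 \ln f}{\partial\theta^2}\right]$ can also be checked numerically. The sketch below is only an illustration under assumptions made here: it uses a Poisson($\lambda$) model (a discrete case, so integrals become sums), approximates both derivatives by central finite differences, and truncates the sum at `x_max`. The function names and step sizes are hypothetical choices, not part of the derivation above.

```python
import math


def log_pmf(x, lam):
    """Log of the Poisson(lam) probability mass function at x."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def score(x, lam, h=1e-5):
    """Central-difference approximation of d/d(lam) of log_pmf."""
    return (log_pmf(x, lam + h) - log_pmf(x, lam - h)) / (2 * h)


def second_derivative(x, lam, h=1e-4):
    """Central-difference approximation of d^2/d(lam)^2 of log_pmf."""
    return (log_pmf(x, lam + h) - 2 * log_pmf(x, lam) + log_pmf(x, lam - h)) / h**2


def fisher_two_ways(lam, x_max=200):
    """Return (E[score], E[score^2], -E[second derivative]) under Poisson(lam),
    summing over x = 0..x_max (the tail beyond x_max is assumed negligible)."""
    e_score = e_score_sq = e_second = 0.0
    for x in range(x_max + 1):
        prob = math.exp(log_pmf(x, lam))
        s = score(x, lam)
        e_score += s * prob
        e_score_sq += s * s * prob
        e_second += second_derivative(x, lam) * prob
    return e_score, e_score_sq, -e_second


e_score, e_score_sq, neg_e_second = fisher_two_ways(2.5)
print("E[score]              :", e_score)
print("E[score^2]            :", e_score_sq)
print("-E[second derivative] :", neg_e_second)
```

For the Poisson model the exact information is $1/\lambda$, so with $\lambda = 2.5$ the last two printed values should both be close to $0.4$, and the first close to $0$.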
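How the information is computed
By the property above, the information can be computed from the second derivative of the logarithm of the probability density function.
The neat link between the first and second derivatives
The expectation of the squared first derivative equals the negative of the expectation of the second derivative.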
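As a quick worked illustration for a continuous density, take $X \sim N(\mu, \sigma^2)$ with $\sigma^2$ known and $\mu$ as the parameter. This model is an assumption chosen here for illustration and does not appear in the notes above.

$$\ln f(x;\mu) = -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2}, \qquad \frac{\partial \ln f(x;\mu)}{\partial\mu} = \frac{x-\mu}{\sigma^2}, \qquad \frac{\partial^2 \ln f(x;\mu)}{\partial\mu^2} = -\frac{1}{\sigma^2}$$

Both routes give the same information:
$$E\left[\left(\frac{x-\mu}{\sigma^2}\right)^2\right] = \frac{\sigma^2}{\sigma^4} = \frac{1}{\sigma^2}, \qquad -E\left[-\frac{1}{\sigma^2}\right] = \frac{1}{\sigma^2}$$

The second route needs no expectation of a squared quantity, which is why it is usually the easier one.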
Suppose $X \sim B(1,p)$, i.e. $X$ follows a two-point (Bernoulli) distribution. Its probability function is:
$$f(x;p)=\begin{cases} p^x(1-p)^{1-x}, & x=0,1 \\ 0, & \text{otherwise} \end{cases} \qquad 0<p<1$$
Then, for $x = 0, 1$:
$$\frac{\partial \ln f(x;p)}{\partial p} = \frac{\partial [x\ln p + (1-x)\ln(1-p)]}{\partial p} = \frac{x}{p} - \frac{1-x}{1-p}$$
$$\frac{\partial^2 \ln f(x;p)}{\partial p^2} = \frac{\partial \left[\frac{x}{p} - \frac{1-x}{1-p}\right]}{\partial p} = -\frac{x}{p^2} - \frac{1-x}{(1-p)^2}$$
Since $E(X)=p$:
$$I(p) = E\left[-\frac{\partial^2 \ln f(x;p)}{\partial p^2}\right] = E\left[\frac{x}{p^2} + \frac{1-x}{(1-p)^2}\right] = \frac{1}{p} + \frac{1}{1-p} = \frac{1}{p(1-p)}$$
It is known that $\bar{X}$ is an unbiased estimator of $p$, with variance $\frac{p(1-p)}{n}$.
Moreover, since $I(p) = \frac{1}{p(1-p)}$,
$$\frac{1}{nI(p)} = \frac{p(1-p)}{n} = Var(\bar{X})$$
so the variance of $\bar{X}$ attains the Cramer-Rao lower bound.
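The Bernoulli case is small enough to check exactly. The sketch below is an illustration under assumptions made here: it uses the pmf $p^x(1-p)^{1-x}$, computes $I(p)$ by summing over $x \in \{0, 1\}$, and gets $Var(\bar{X})$ by enumerating the binomial distribution of the sample sum. The function names and the values $p = 3/10$, $n = 8$ are hypothetical; rational arithmetic avoids rounding so the two sides can be compared with `==`.

```python
from fractions import Fraction
from math import comb


def bernoulli_fisher_information(p):
    """I(p) = E[-d^2/dp^2 ln f(X; p)] for f(x; p) = p^x (1-p)^(1-x), x in {0, 1}."""
    info = Fraction(0)
    for x in (0, 1):
        prob = p if x == 1 else 1 - p
        # Negative second derivative of the log-pmf: x/p^2 + (1-x)/(1-p)^2.
        neg_second_derivative = Fraction(x) / p**2 + Fraction(1 - x) / (1 - p) ** 2
        info += prob * neg_second_derivative
    return info


def variance_of_sample_mean(p, n):
    """Exact Var(X-bar) for n i.i.d. Bernoulli(p) draws, by enumerating
    the Binomial(n, p) distribution of their sum."""
    mean = Fraction(0)
    second_moment = Fraction(0)
    for k in range(n + 1):
        prob = comb(n, k) * p**k * (1 - p) ** (n - k)
        x_bar = Fraction(k, n)
        mean += prob * x_bar
        second_moment += prob * x_bar**2
    return second_moment - mean**2


p = Fraction(3, 10)  # hypothetical success probability
n = 8                # hypothetical sample size
cr_bound = 1 / (n * bernoulli_fisher_information(p))
var_mean = variance_of_sample_mean(p, n)
print("Cramer-Rao bound 1/(n I(p)):", cr_bound)
print("Var(X-bar)                 :", var_mean)
```

Both quantities come out as the same fraction, $p(1-p)/n$, so the sample mean sits exactly on the bound rather than merely close to it.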