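These notes are based on a Zhihu article about deriving the normal distribution. The article was eye-opening, so I am writing down my notes here.

Suppose the error probability density function is $f(t)$, and we have $n$ independent observations $x_1, x_2, \cdots, x_n$. If the true value is $\mu$, the errors are: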
$$
\begin{aligned}
\varepsilon_{1} &= x_{1}-\mu \\
\varepsilon_{2} &= x_{2}-\mu \\
&\;\;\vdots \\
\varepsilon_{n} &= x_{n}-\mu
\end{aligned}
$$
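Experience suggests that, over many observations, most errors $\varepsilon$ fall in a narrow range around $0$ and occur often. Errors with a large $|\varepsilon|$ should occur rarely. Because the observations are independent, the likelihood function is the product of the individual densities: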
$$
\begin{aligned}
L(\mu) &= \prod_{i=1}^{n} f(\varepsilon_{i}) \\
&= f(x_{1}-\mu)\,f(x_{2}-\mu)\cdots f(x_{n}-\mu)
\end{aligned}
$$
Take the natural logarithm of $L(\mu)$:
$$
\begin{aligned}
\ln[L(\mu)] &= \ln\left[\prod_{i=1}^{n} f(\varepsilon_{i})\right] \\
&= \ln\left[f(x_{1}-\mu)\,f(x_{2}-\mu)\cdots f(x_{n}-\mu)\right] \\
&= \ln[f(x_{1}-\mu)]+\ln[f(x_{2}-\mu)]+\cdots+\ln[f(x_{n}-\mu)] \\
&= \sum_{i=1}^{n} \ln[f(x_{i}-\mu)]
\end{aligned}
$$
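The logarithm turns the product into a sum. A sum is easier to differentiate, and it also avoids floating-point underflow when $n$ is large. The sketch below illustrates both points. Purely for illustration, it assumes a standard Gaussian error density and simulated observations centred at 5. The helper names `gaussian_pdf`, `likelihood` and `log_likelihood` are hypothetical and not part of the original derivation.

```python
import math
import random

def gaussian_pdf(t, sigma=1.0):
    """Hypothetical error density f(t); an illustrative assumption only."""
    return math.exp(-0.5 * (t / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

def likelihood(xs, mu, f=gaussian_pdf):
    """L(mu) = prod_i f(x_i - mu), computed directly as a product."""
    return math.prod(f(x - mu) for x in xs)

def log_likelihood(xs, mu, f=gaussian_pdf):
    """ln L(mu) = sum_i ln f(x_i - mu)."""
    return sum(math.log(f(x - mu)) for x in xs)

rng = random.Random(0)
small = [rng.gauss(5.0, 1.0) for _ in range(5)]
# For small n both routes agree: ln(product) equals the sum of logs.
print(math.log(likelihood(small, 5.0)), log_likelihood(small, 5.0))

large = [rng.gauss(5.0, 1.0) for _ in range(2000)]
# For large n the raw product underflows to 0.0; the sum of logs stays finite.
print(likelihood(large, 5.0), log_likelihood(large, 5.0))
```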
To find the maximum of $\ln[L(\mu)]$, differentiate it with respect to $\mu$ and set the derivative to $0$:
$$
\begin{aligned}
\frac{\partial \ln[L(\mu)]}{\partial \mu} &= \frac{\partial \sum_{i=1}^{n} \ln[f(x_{i}-\mu)]}{\partial \mu} \\
&= -\sum_{i=1}^{n} \frac{f'(x_{i}-\mu)}{f(x_{i}-\mu)} \\
&= 0
\end{aligned}
$$
Let $g(t)=\frac{f'(t)}{f(t)}$. The equation above then becomes:
$$
\sum_{i=1}^{n} g(x_{i}-\mu)=0
$$
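For a given density $f$, the equation $\sum_i g(x_i-\mu)=0$ can be solved numerically. The sketch below does this in three steps:

- it assumes a standard Gaussian $f$ (an illustrative choice, since $f$ is still unknown at this point);
- it approximates $f'$ with a central difference;
- it finds the root by bisection.

The names `g`, `score_sum` and `solve_mu` are hypothetical. For this particular $f$, the root lands on the sample mean.

```python
import math

def gaussian_pdf(t, sigma=1.0):
    # Illustrative choice of f; the derivation has not determined f yet.
    return math.exp(-0.5 * (t / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

def g(t, f=gaussian_pdf, h=1e-5):
    """g(t) = f'(t) / f(t), with f' approximated by a central difference."""
    return (f(t + h) - f(t - h)) / (2 * h) / f(t)

def score_sum(xs, mu, f=gaussian_pdf):
    """Left-hand side of sum_i g(x_i - mu) = 0."""
    return sum(g(x - mu, f) for x in xs)

def solve_mu(xs, f=gaussian_pdf, tol=1e-10):
    """Bisection for the root of score_sum on [min(xs), max(xs)].

    Assumes score_sum is increasing in mu on that interval, which holds
    for the Gaussian f used here (g is decreasing in t).
    """
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score_sum(xs, mid, f) > 0:
            hi = mid  # root lies below mid
        else:
            lo = mid  # root lies at or above mid
    return 0.5 * (lo + hi)

xs = [4.1, 5.3, 4.8, 6.0, 5.2]
mu_hat = solve_mu(xs)
print(mu_hat, sum(xs) / len(xs))  # both approximately 5.08
```

Bisection is used here only because the left-hand side is monotone in $\mu$ for this $f$. For other densities the equation may behave differently.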
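This is where the elegant part begins, and where Gauss's insight shows. He postulated that, for any observations $x_1,\dots,x_n$, the solution of this equation (the maximum-likelihood estimate of $\mu$) should be the arithmetic mean $\bar{x}$. The equation then becomes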
$$
\sum_{i=1}^{n} g(x_{i}-\bar{x})=0
$$
where
$$
\bar{x}=\frac{1}{n}\sum_{i=1}^{n} x_{i}
$$
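This identity must hold for every choice of $x_1,\dots,x_n$, so we can differentiate it with respect to each $x_i$. Differentiating with respect to $x_1$, for example, gives: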
$$
\begin{aligned}
\frac{\partial \sum_{i=1}^{n} g(x_{i}-\bar{x})}{\partial x_{1}} &= \frac{\partial \sum_{i=1}^{n} g\left(x_{i}-\frac{1}{n}\sum_{j=1}^{n} x_{j}\right)}{\partial x_{1}} \\
&= g'(x_{1}-\bar{x})\left(1-\frac{1}{n}\right)+g'(x_{2}-\bar{x})\left(-\frac{1}{n}\right)+\cdots+g'(x_{n}-\bar{x})\left(-\frac{1}{n}\right) \\
&= 0
\end{aligned}
$$
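The chain-rule step can be checked numerically. For any smooth $g$, the partial derivative of $\sum_i g(x_i-\bar{x})$ with respect to $x_1$ should equal $g'(x_1-\bar{x})\left(1-\frac{1}{n}\right)-\frac{1}{n}\sum_{i\ge 2} g'(x_i-\bar{x})$.

In the minimal sketch below, $\sin$ and $\cos$ stand in for $g$ and $g'$ only to test the formula. The helper names and data are hypothetical.

```python
import math

def S(xs, g):
    """S(x_1, ..., x_n) = sum_i g(x_i - xbar)."""
    xbar = sum(xs) / len(xs)
    return sum(g(x - xbar) for x in xs)

def dS_dx1_formula(xs, g_prime):
    """Chain-rule result: g'(x_1 - xbar)(1 - 1/n) + sum_{i>=2} g'(x_i - xbar)(-1/n)."""
    n = len(xs)
    xbar = sum(xs) / n
    first = g_prime(xs[0] - xbar) * (1 - 1 / n)
    rest = sum(g_prime(x - xbar) * (-1 / n) for x in xs[1:])
    return first + rest

def dS_dx1_numeric(xs, g, h=1e-6):
    """Central difference of S with respect to x_1 only."""
    up = [xs[0] + h] + xs[1:]
    down = [xs[0] - h] + xs[1:]
    return (S(up, g) - S(down, g)) / (2 * h)

xs = [0.3, -1.2, 2.5, 0.7]
g, g_prime = math.sin, math.cos  # arbitrary smooth test pair, not the real g
print(dS_dx1_formula(xs, g_prime), dS_dx1_numeric(xs, g))
```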
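Treat the values $g'(x_i-\bar{x})$ ($i=1,\dots,n$) as unknowns. Differentiating with respect to each $x_i$ in turn gives $n$ homogeneous linear equations. Write them as a matrix equation $\boldsymbol{M}\boldsymbol{X}=\mathbf{0}$, where $\boldsymbol{X}$ is the vector of unknowns: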
$$
\begin{pmatrix}
1-\frac{1}{n} & -\frac{1}{n} & \cdots & -\frac{1}{n} \\
-\frac{1}{n} & 1-\frac{1}{n} & \cdots & -\frac{1}{n} \\
\vdots & \vdots & \ddots & \vdots \\
-\frac{1}{n} & -\frac{1}{n} & \cdots & 1-\frac{1}{n}
\end{pmatrix}
\begin{pmatrix} g'(x_{1}-\bar{x}) \\ g'(x_{2}-\bar{x}) \\ \vdots \\ g'(x_{n}-\bar{x}) \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}
$$
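For the coefficient matrix $\boldsymbol{M}$, add rows $2, 3, \cdots, n$ to row $1$. Each column of $\boldsymbol{M}$ sums to $\left(1-\frac{1}{n}\right)+(n-1)\left(-\frac{1}{n}\right)=0$, so this gives: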
$$
\boldsymbol{M}=\begin{pmatrix}
1-\frac{1}{n} & -\frac{1}{n} & \cdots & -\frac{1}{n} \\
-\frac{1}{n} & 1-\frac{1}{n} & \cdots & -\frac{1}{n} \\
\vdots & \vdots & \ddots & \vdots \\
-\frac{1}{n} & -\frac{1}{n} & \cdots & 1-\frac{1}{n}
\end{pmatrix}
\rightarrow
\begin{pmatrix}
0 & 0 & \cdots & 0 \\
-\frac{1}{n} & 1-\frac{1}{n} & \cdots & -\frac{1}{n} \\
\vdots & \vdots & \ddots & \vdots \\
-\frac{1}{n} & -\frac{1}{n} & \cdots & 1-\frac{1}{n}
\end{pmatrix}
$$
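The row operation can be reproduced exactly with rational arithmetic. The sketch below uses Python's `fractions.Fraction`, with $n=5$ chosen arbitrarily. `coefficient_matrix` and `add_rows_to_first` are hypothetical helper names.

```python
from fractions import Fraction

def coefficient_matrix(n):
    """M with 1 - 1/n on the diagonal and -1/n elsewhere, as exact fractions."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)] for i in range(n)]

def add_rows_to_first(m):
    """Return a copy of m in which rows 2..n have been added to row 1."""
    out = [row[:] for row in m]
    for row in m[1:]:
        out[0] = [a + b for a, b in zip(out[0], row)]
    return out

M = coefficient_matrix(5)
R = add_rows_to_first(M)
print(R[0])  # every entry is 0, because each column of M sums to 0
```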
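Since row 1 is all zeros, $\det\boldsymbol{M}=0$. This only tells us that the system has infinitely many solutions. To describe them we still need $\operatorname{rank}(\boldsymbol{M})$.

Write $\boldsymbol{M}=\boldsymbol{I}-\frac{1}{n}\boldsymbol{J}$, where $\boldsymbol{J}$ is the $n\times n$ all-ones matrix. Then $\boldsymbol{M}$ has eigenvalue $0$ once, with eigenvector $(1,\dots,1)^{\mathrm{T}}$, and eigenvalue $1$ with multiplicity $n-1$. Hence $\operatorname{rank}(\boldsymbol{M})=n-1$, and the solution space is one-dimensional. Since $\boldsymbol{M}(1,\dots,1)^{\mathrm{T}}=\mathbf{0}$, that space is spanned by $(1,\dots,1)^{\mathrm{T}}$. The solution of the system can therefore be written as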
$$
\boldsymbol{X}=\begin{pmatrix} g'(x_{1}-\bar{x}) \\ g'(x_{2}-\bar{x}) \\ \vdots \\ g'(x_{n}-\bar{x}) \end{pmatrix}=k\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}
$$
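The rank and the direction of the null space can also be checked exactly for small $n$. The sketch below computes $\operatorname{rank}(\boldsymbol{M})$ by Gaussian elimination over the rationals and verifies that $\boldsymbol{M}(1,\dots,1)^{\mathrm{T}}=\mathbf{0}$. The helper names and the range of $n$ are hypothetical choices.

```python
from fractions import Fraction

def coefficient_matrix(n):
    """M = I - (1/n) J as exact fractions."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)] for i in range(n)]

def rank(m):
    """Rank by exact Gaussian elimination (reduced row echelon form)."""
    a = [row[:] for row in m]
    rows, cols = len(a), len(a[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if a[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        a[r], a[pivot] = a[pivot], a[r]
        for i in range(rows):
            if i != r and a[i][c] != 0:
                factor = a[i][c] / a[r][c]
                a[i] = [x - factor * y for x, y in zip(a[i], a[r])]
        r += 1
        if r == rows:
            break
    return r

def mat_vec(m, v):
    """Matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

for n in range(2, 7):
    M = coefficient_matrix(n)
    ones = [Fraction(1)] * n
    # Expect rank n - 1 and M @ (1, ..., 1) == 0 for every n.
    print(n, rank(M), mat_vec(M, ones) == [0] * n)
```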
Every component of $\boldsymbol{X}$ therefore equals the same constant: $g'(x_{1}-\bar{x})=g'(x_{2}-\bar{x})=\cdots=g'(x_{n}-\bar{x})=k$. Because the $x_i$ are arbitrary, $t=x_i-\bar{x}$ can take any real value, so $g'(t)=k$ for all $t$. Integrating gives:
$$
g(t)=kt+b
$$
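Substituting this back into $\sum_{i=1}^{n} g(x_i-\bar{x})=0$ gives $k\sum_{i=1}^{n}(x_i-\bar{x})+nb=nb=0$. So $b=0$, and $\frac{f'(t)}{f(t)}=kt$. Now solve this differential equation: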
$$
\begin{aligned}
\int \frac{f'(t)}{f(t)}\,\mathrm{d}t=\int kt\,\mathrm{d}t
&\Leftrightarrow \int \frac{\mathrm{d}[f(t)]}{f(t)}=\frac{1}{2}kt^{2}+c \\
&\Leftrightarrow \ln[f(t)]=\frac{1}{2}kt^{2}+c \\
&\Leftrightarrow f(t)=K\mathrm{e}^{\frac{1}{2}kt^{2}} \quad (K=\mathrm{e}^{c})
\end{aligned}
$$
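As a quick sanity check on this solution, the numerical logarithmic derivative $f'(t)/f(t)$ of $K\mathrm{e}^{\frac{1}{2}kt^{2}}$ should equal $kt$ whatever the value of $K$. This is why $K$ has to be fixed separately, by normalization. The constants $K=0.7$ and $k=-2$ below are arbitrary values chosen for the check.

```python
import math

def f(t, K, k):
    """Candidate solution f(t) = K * exp(k t^2 / 2)."""
    return K * math.exp(0.5 * k * t * t)

def log_derivative(t, K, k, h=1e-6):
    """Numerical f'(t) / f(t) via a central difference."""
    return (f(t + h, K, k) - f(t - h, K, k)) / (2 * h) / f(t, K, k)

K, k = 0.7, -2.0  # arbitrary constants for the check
for t in [-1.5, -0.2, 0.0, 0.9, 2.0]:
    print(t, log_derivative(t, K, k), k * t)  # last two columns should match
```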
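Moreover, $f(t)$ is a probability density, so its integral over $(-\infty,+\infty)$ must equal $1$ (normalization). This integral converges only if $k<0$, so write $k=-\frac{1}{\sigma^{2}}$ with $\sigma>0$: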
$$
\begin{aligned}
\int_{-\infty}^{+\infty} f(t)\,\mathrm{d}t &= \int_{-\infty}^{+\infty} K\mathrm{e}^{\frac{1}{2}kt^{2}}\,\mathrm{d}t \\
&= K\int_{-\infty}^{+\infty} \mathrm{e}^{-\frac{t^{2}}{2\sigma^{2}}}\,\mathrm{d}t \\
&= K\sqrt{\left[\sqrt{2}\sigma\int_{-\infty}^{+\infty}\mathrm{e}^{-\left(\frac{t}{\sqrt{2}\sigma}\right)^{2}}\mathrm{d}\left(\frac{t}{\sqrt{2}\sigma}\right)\right]\left[\sqrt{2}\sigma\int_{-\infty}^{+\infty}\mathrm{e}^{-\left(\frac{s}{\sqrt{2}\sigma}\right)^{2}}\mathrm{d}\left(\frac{s}{\sqrt{2}\sigma}\right)\right]} \\
&= K\sqrt{2}\sigma\sqrt{\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\mathrm{e}^{-(u^{2}+v^{2})}\,\mathrm{d}u\,\mathrm{d}v} \\
&= K\sqrt{2}\sigma\sqrt{\int_{0}^{2\pi}\mathrm{d}\theta\int_{0}^{+\infty}\mathrm{e}^{-r^{2}}r\,\mathrm{d}r} \\
&= K\sqrt{2}\sigma\sqrt{2\pi\cdot\tfrac{1}{2}} = K\sqrt{2}\sigma\sqrt{\pi} \\
&= 1
\end{aligned}
$$
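The Gaussian integral $\int_{-\infty}^{+\infty}\mathrm{e}^{-t^{2}/(2\sigma^{2})}\,\mathrm{d}t=\sqrt{2\pi}\,\sigma$ can be confirmed numerically. The sketch below uses the composite trapezoidal rule on the truncated interval $[-12\sigma, 12\sigma]$. The truncation and the step count are assumptions chosen for accuracy, not part of the derivation.

```python
import math

def gaussian_kernel(t, sigma):
    """Unnormalized integrand exp(-t^2 / (2 sigma^2))."""
    return math.exp(-t * t / (2 * sigma * sigma))

def trapezoid(func, a, b, steps=20000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    total += sum(func(a + i * h) for i in range(1, steps))
    return total * h

for sigma in [0.5, 1.0, 3.0]:
    # Truncate the infinite range at +-12 sigma; the neglected tails are negligible.
    approx = trapezoid(lambda t: gaussian_kernel(t, sigma), -12 * sigma, 12 * sigma)
    print(sigma, approx, math.sqrt(2 * math.pi) * sigma)
```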
Solving $K\sqrt{2}\sigma\sqrt{\pi}=1$ gives $K=\frac{1}{\sqrt{2\pi}\,\sigma}$, so the error density is:
$$
f(t)=\frac{1}{\sqrt{2\pi}\,\sigma}\,\mathrm{e}^{-\frac{1}{2}\left(\frac{t}{\sigma}\right)^{2}}
$$
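As a final check, the derived density coincides with the zero-mean normal density in Python's standard library (`statistics.NormalDist`, Python 3.8+). The minimal sketch below uses an arbitrary $\sigma=1.5$.

```python
import math
from statistics import NormalDist

def error_pdf(t, sigma):
    """Derived density f(t) = exp(-(t/sigma)^2 / 2) / (sqrt(2 pi) sigma)."""
    return math.exp(-0.5 * (t / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

sigma = 1.5
reference = NormalDist(mu=0.0, sigma=sigma)  # standard-library normal distribution
for t in [-3.0, -1.0, 0.0, 0.5, 2.5]:
    print(t, error_pdf(t, sigma), reference.pdf(t))  # the two columns should agree
```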