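This post explains why the sample variance divides by $n-1$.
Consider an application: we want the mean and variance of the math scores of all high-school students in a city. The city has $N$ high-school students, and collecting every score is impractical, so we take the scores of only $n$ students and use their mean and variance to estimate the mean and variance of all $N$ students, as accurately as possible.
First, some definitions: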
$\mu=\frac{1}{N}\sum\limits_{i=1}^N X_i$: the population mean, which is unknown ($N$: population size)
$\bar{X}=\frac{1}{n}\sum\limits_{i=1}^n X_i$: the sample mean ($n$: sample size; when $n\rightarrow N$, $\bar{X}=\mu$)
$\sigma^2=\frac{1}{N}\sum\limits_{i=1}^N (X_i-\mu)^2$: the population variance; note that $\mu$ is what gets subtracted here
$S^2$: the sample variance, which comes in an unbiased and a biased form
$$
\begin{cases}
S^2=\frac{1}{n}\sum\limits_{i=1}^n (X_i-\bar{X})^2, & \text{biased estimator} \\
S^2=\frac{1}{n-1}\sum\limits_{i=1}^n (X_i-\bar{X})^2, & \text{unbiased estimator}
\end{cases}
$$
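As a small illustration of the two forms, the sketch below computes both on a made-up sample (the values 1 to 5 and the function names are assumptions chosen for illustration). It also cross-checks them against Python's `statistics.pvariance` (divides by $n$) and `statistics.variance` (divides by $n-1$).

```python
import statistics

def biased_variance(xs):
    """Divide the sum of squared deviations from the sample mean by n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def unbiased_variance(xs):
    """Divide the same sum of squared deviations by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical sample

print(biased_variance(sample))         # 2.0
print(unbiased_variance(sample))       # 2.5
print(statistics.pvariance(sample))    # 2.0, same as the biased form
print(statistics.variance(sample))     # 2.5, same as the unbiased form
```

The two results differ only by the factor $\frac{n}{n-1}$, here $\frac{5}{4}$.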
We want the sample variance to be an unbiased estimate of the population variance, that is, its expectation should equal the population variance: $E(S^2)=\sigma^2$. Computing the expectation of the biased form gives:
$$
\begin{aligned}
E(S^2) &= E\left[\frac{1}{n}\sum\limits_{i=1}^n (X_i-\bar{X})^2\right]\\
&= \frac{1}{n}E\left[\sum\limits_{i=1}^n (X_i-\mu+\mu-\bar{X})^2\right]\\
&= \frac{1}{n}E\sum\limits_{i=1}^n \left[(X_i-\mu)-(\bar{X}-\mu)\right]^2\\
&= \frac{1}{n}E\sum\limits_{i=1}^n \left[(X_i-\mu)^2+(\bar{X}-\mu)^2-2(X_i-\mu)(\bar{X}-\mu)\right]\\
&= \frac{1}{n}E\left[\sum\limits_{i=1}^n(X_i-\mu)^2+\sum\limits_{i=1}^n(\bar{X}-\mu)^2-2\sum\limits_{i=1}^n(X_i-\mu)(\bar{X}-\mu)\right] \quad \textcircled{\scriptsize{1}}\\
&= \frac{1}{n}E\left[\sum\limits_{i=1}^n (X_i-\mu)^2+n(\bar{X}-\mu)^2-2n(\bar{X}-\mu)^2\right] \quad \textcircled{\scriptsize{2}}\\
&= \frac{1}{n}E\left[\sum\limits_{i=1}^n (X_i-\mu)^2-n(\bar{X}-\mu)^2\right]\\
&= \frac{1}{n}E\sum\limits_{i=1}^n (X_i-\mu)^2-E(\bar{X}-\mu)^2 \quad \textcircled{\scriptsize{3}}\\
&= D(X)-\frac{1}{n}D(X) \quad \textcircled{\scriptsize{4}}\\
&= \frac{n-1}{n}\sigma^2
\end{aligned}
$$
Explanation 1: the step from $\textcircled{\scriptsize{1}}$ to $\textcircled{\scriptsize{2}}$
$\sum\limits_{i=1}^n(X_i-\mu)(\bar{X}-\mu)=(\bar{X}-\mu)\sum\limits_{i=1}^n(X_i-\mu)$, and
$\sum\limits_{i=1}^n(X_i-\mu)=\sum\limits_{i=1}^n(\bar{X}-\mu)=n(\bar{X}-\mu)$, because $\sum\limits_{i=1}^n X_i=n\bar{X}$. The cross term is therefore $2n(\bar{X}-\mu)^2$.
As an example, take the sample 1, 2, 3, 4, 5 and suppose the population mean is $\mu=1$; the sample mean is $\bar{X}=3$
$\sum\limits_{i=1}^n(X_i-\mu)=0+1+2+3+4=10$
$\sum\limits_{i=1}^n(\bar{X}-\mu)=2+2+2+2+2=10$
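The same numbers can be checked in a few lines. This is only a sketch on the example above (sample 1 to 5, with $\mu=1$ assumed as in the text). It confirms that the cross sum equals $n(\bar{X}-\mu)^2$ and that $\sum(X_i-\bar{X})^2=\sum(X_i-\mu)^2-n(\bar{X}-\mu)^2$.

```python
sample = [1, 2, 3, 4, 5]
mu = 1                       # population mean assumed in the example
n = len(sample)
x_bar = sum(sample) / n      # sample mean, 3.0

# Cross term: sum of (X_i - mu)(X_bar - mu) versus n (X_bar - mu)^2
cross = sum((x - mu) * (x_bar - mu) for x in sample)
print(cross, n * (x_bar - mu) ** 2)   # 20.0 20.0

# Squared deviations about X_bar versus the decomposition about mu
lhs = sum((x - x_bar) ** 2 for x in sample)
rhs = sum((x - mu) ** 2 for x in sample) - n * (x_bar - mu) ** 2
print(lhs, rhs)                       # 10.0 10.0
```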
Explanation 2: the step from $\textcircled{\scriptsize{3}}$ to $\textcircled{\scriptsize{4}}$
$D(X)=\frac{1}{N}\sum\limits_{i=1}^N (X_i-\mu)^2=\frac{1}{n}E\sum\limits_{i=1}^n (X_i-\mu)^2=\sigma^2$. Because $\mu$ is subtracted, every term has expectation $E(X_i-\mu)^2=\sigma^2$, so the expected mean squared deviation of the sample about $\mu$ is the population variance.
For the second term, use $E(\bar{X})=\mu$ and assume the samples $X_i$ are independent (for example drawn with replacement, or with $n\ll N$), so that the variance of the sum is the sum of the variances:
$$
\begin{aligned}
E(\bar{X}-\mu)^2 &= E(\bar{X}-E(\bar{X}))^2\\
&= D(\bar{X})\\
&= D\left(\frac{1}{n}\sum\limits_{i=1}^n X_i\right)\\
&= \frac{1}{n^2}\sum\limits_{i=1}^n D(X_i)\\
&= \frac{1}{n}D(X)\\
&= \frac{1}{n}\sigma^2
\end{aligned}
$$
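The result $D(\bar{X})=\frac{1}{n}\sigma^2$ can be verified exactly on a tiny population by enumerating every possible sample. The sketch below assumes a made-up population of four scores and sampling with replacement, so that the draws are independent. It uses `fractions.Fraction` to avoid rounding.

```python
from fractions import Fraction
from itertools import product

population = [60, 70, 80, 90]   # hypothetical scores
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

n = 3
# All N**n equally likely samples of size n, drawn with replacement.
means = [Fraction(sum(s), n) for s in product(population, repeat=n)]

mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

print(mean_of_means == mu)          # True: E(X_bar) = mu
print(var_of_means == sigma2 / n)   # True: D(X_bar) = sigma^2 / n
```

Sampling without replacement from a small population would make the draws correlated, and the equality would no longer hold exactly.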
So unless $n\rightarrow\infty$, the expectation of the biased form is $\frac{n-1}{n}\sigma^2$ rather than $\sigma^2$. To compensate, $S^2$ is multiplied by $\frac{n}{n-1}$, which gives the unbiased estimator:
$$
S^2=\frac{n}{n-1}\cdot\frac{1}{n}\sum\limits_{i=1}^n (X_i-\bar{X})^2=\frac{1}{n-1}\sum\limits_{i=1}^n (X_i-\bar{X})^2
$$
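A quick simulation makes the bias visible. This is a sketch under assumptions that are not in the original text: scores are drawn from a normal distribution with mean 10 and $\sigma=2$ (so $\sigma^2=4$), samples have size $n=5$, and the random seed is fixed for repeatability.

```python
import random

def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

rng = random.Random(42)   # fixed seed so the run is repeatable
sigma = 2.0               # assumed population standard deviation, sigma^2 = 4
n = 5
trials = 20000

total_biased = 0.0
total_unbiased = 0.0
for _ in range(trials):
    xs = [rng.gauss(10.0, sigma) for _ in range(n)]
    ss = sum_sq_dev(xs)
    total_biased += ss / n          # divide by n
    total_unbiased += ss / (n - 1)  # divide by n - 1

avg_biased = total_biased / trials
avg_unbiased = total_unbiased / trials
print(avg_biased)     # close to (n-1)/n * sigma^2 = 3.2
print(avg_unbiased)   # close to sigma^2 = 4.0
```

Averaged over many samples, dividing by $n$ settles near $\frac{n-1}{n}\sigma^2$, while dividing by $n-1$ settles near $\sigma^2$.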
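Degrees of freedom: this concept comes up often in this context. It can be understood as the number of linearly independent quantities.
Within a sample, once the sample mean and the first $n-1$ sample values are known, the $n$-th value can be computed. The last value is therefore linearly dependent on the first $n-1$ (equivalently, the $n$ deviations $X_i-\bar{X}$ sum to zero), so the degrees of freedom are $n-1$.
By contrast, knowing the population mean $\mu$ and the first $n-1$ sample values does not determine the $n$-th value, which is why $D(X)=\frac{1}{n}E\sum\limits_{i=1}^n(X_i-\mu)^2=\sigma^2$ divides by $n$.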
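The dependence can be seen directly on a small made-up sample (the values 1 to 5 are an assumption for illustration): the last value follows from the sample mean and the other $n-1$ values, and the deviations from the sample mean sum to zero.

```python
sample = [1, 2, 3, 4, 5]     # hypothetical sample
n = len(sample)
sample_mean = sum(sample) / n

# Knowing the sample mean and the first n-1 values pins down the last one.
recovered_last = n * sample_mean - sum(sample[:-1])
print(recovered_last)        # 5.0

# The n deviations from the sample mean are constrained to sum to zero,
# so only n-1 of them can vary freely.
deviations = [x - sample_mean for x in sample]
print(sum(deviations))       # 0.0
```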
~~~~~~~~~~~~~~~ divider ~~~~~~~~~~~~~~~~
An additional note on the variance of a vector
Let the vector $x=\begin{bmatrix} x_1\\ x_2 \\ \vdots \\ x_n \end{bmatrix}$, with $E(x)=\begin{bmatrix} E(x_1)\\ E(x_2) \\ \vdots \\ E(x_n) \end{bmatrix}$, written $\mu$ below. Then
$$
\begin{aligned}
Var(x) &= E[(x-\mu)(x-\mu)^T]\\
&= \begin{bmatrix}
var(x_1) & cov(x_1,x_2) & \cdots & cov(x_1,x_n)\\
cov(x_2,x_1) & cov(x_2,x_2) & \cdots & cov(x_2,x_n)\\
\vdots & \vdots & \ddots & \vdots\\
cov(x_n,x_1) & cov(x_n,x_2) & \cdots & cov(x_n,x_n)
\end{bmatrix}
\end{aligned}
$$
For a constant matrix $A$:
$$
\begin{aligned}
Var(Ax) &= E[(Ax-A\mu)(Ax-A\mu)^T]\\
&= E[A(x-\mu)(x-\mu)^TA^T]\\
&= AE[(x-\mu)(x-\mu)^T]A^T\\
&= AVar(x)A^T
\end{aligned}
$$
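The identity $Var(Ax)=AVar(x)A^T$ also holds for the empirical covariance of any data set, which allows a numerical check. The sketch below uses plain Python lists. The data, the matrix `A`, and the helper names are all made up for illustration, and the covariance divides by the number of samples.

```python
import random

def covariance_matrix(samples):
    """Empirical covariance matrix, dividing by the number of samples."""
    n = len(samples)
    mu = [sum(col) / n for col in zip(*samples)]
    dim = len(mu)
    return [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / n
             for j in range(dim)] for i in range(dim)]

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

rng = random.Random(0)
# Hypothetical data: 1000 draws of a 3-dimensional vector with correlated parts.
samples = []
for _ in range(1000):
    z = rng.gauss(0, 1)
    samples.append([z + rng.gauss(0, 1), 2 * z + rng.gauss(0, 1), rng.gauss(0, 1)])

A = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]   # arbitrary constant 2x3 matrix

lhs = covariance_matrix([mat_vec(A, s) for s in samples])            # Var(Ax)
rhs = mat_mul(mat_mul(A, covariance_matrix(samples)), transpose(A))  # A Var(x) A^T

max_diff = max(abs(l - r)
               for lrow, rrow in zip(lhs, rhs)
               for l, r in zip(lrow, rrow))
print(max_diff)   # only floating-point rounding error remains
```

Both sides are $2\times 2$ matrices here, since `A` maps the 3-dimensional vector to 2 dimensions.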