A Detailed Derivation of Sample vs. Population Variance (the Origin of n−1)

This article analyzes why the sample variance divides by n−1.

As a motivating example, suppose we want to know the mean and variance of math scores for all high-school students in a city. There are N students in total, and collecting every score is impractical, so we sample the scores of n students and use their mean and variance to estimate the mean and variance of all N students, hoping the estimate is as accurate as possible.

First, a few definitions:

$\mu=\frac{1}{N}\sum\limits_{i=1}^N X_i$: population mean, unknown ($N$: population size)

$\bar{X}=\frac{1}{n} \sum\limits_{i=1}^n X_i$: sample mean (as $n\rightarrow N$, $\bar{X}\rightarrow\mu$; $n$: sample size)

$\sigma^2=\frac{1}{N}\sum\limits_{i=1}^N (X_i-\mu)^2$: population variance; note that $\mu$ is subtracted here

$S^2$: sample variance, which comes in a biased and an unbiased form

$$
S^2=
\begin{cases}
\frac{1}{n} \sum\limits_{i=1}^n (X_i-\bar{X})^2, & \text{biased estimator} \\
\frac{1}{n-1} \sum\limits_{i=1}^n (X_i-\bar{X})^2, & \text{unbiased estimator}
\end{cases}
$$
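The two forms differ only in the divisor. A minimal sketch in Python (the sample of scores is made-up data for illustration); the standard library's `statistics.pvariance` uses the $n$ divisor and `statistics.variance` uses $n-1$:

```python
import statistics

# Hypothetical sample of exam scores (illustrative data, not from the text)
sample = [78, 85, 92, 70, 88]
n = len(sample)
mean = sum(sample) / n  # 82.6

# Biased estimator: divide the sum of squared deviations by n
biased = sum((x - mean) ** 2 for x in sample) / n        # 60.64
# Unbiased estimator: divide by n - 1
unbiased = sum((x - mean) ** 2 for x in sample) / (n - 1)  # 75.8

# The stdlib implements exactly these two conventions
print(biased == statistics.pvariance(sample))   # True
print(unbiased == statistics.variance(sample))  # True
```

Note that the unbiased value is always the biased value multiplied by $\frac{n}{n-1}$, which is the compensation factor derived below.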

We would like the sample variance to match the population variance on average, i.e. the expectation of the sample variance should equal the population variance: $E(S^2)=\sigma^2$. Start from the biased form:

$$
\begin{aligned}
E(S^2) &= E\Big[\frac{1}{n} \sum\limits_{i=1}^n (X_i-\bar{X})^2\Big]\\
&= \frac{1}{n}E\Big[\sum\limits_{i=1}^n (X_i-\mu+\mu-\bar{X})^2\Big]\\
&= \frac{1}{n}E\sum\limits_{i=1}^n \big[(X_i-\mu)-(\bar{X}-\mu)\big]^2\\
&= \frac{1}{n}E\sum\limits_{i=1}^n \big[(X_i-\mu)^2+(\bar{X}-\mu)^2-2(X_i-\mu)(\bar{X}-\mu)\big]\\
&= \frac{1}{n}E\Big[\sum\limits_{i=1}^n(X_i-\mu)^2+\sum\limits_{i=1}^n(\bar{X}-\mu)^2-2\sum\limits_{i=1}^n(X_i-\mu)(\bar{X}-\mu)\Big] \quad ①\\
&= \frac{1}{n}E\Big[\sum\limits_{i=1}^n (X_i-\mu)^2+n(\bar{X}-\mu)^2-2n(\bar{X}-\mu)^2\Big] \quad ②\\
&= \frac{1}{n}E\Big[\sum\limits_{i=1}^n (X_i-\mu)^2-n(\bar{X}-\mu)^2\Big]\\
&= \frac{1}{n}E\sum\limits_{i=1}^n (X_i-\mu)^2-E(\bar{X}-\mu)^2 \quad ③\\
&= D(X)-\frac{1}{n}D(X) \quad ④\\
&= \frac{n-1}{n}\sigma^2
\end{aligned}
$$

Explanation 1: the step from ① to ②
$\sum\limits_{i=1}^n(X_i-\mu)(\bar{X}-\mu)=(\bar{X}-\mu)\sum\limits_{i=1}^n(X_i-\mu)$, and
$\sum\limits_{i=1}^n(X_i-\mu)=\sum\limits_{i=1}^n(\bar{X}-\mu)$ (both equal $n\bar{X}-n\mu$, since $\sum_i X_i = n\bar{X}$).
As a concrete example, take the sample 1, 2, 3, 4, 5 with an assumed population mean $\mu=1$; the sample mean is $\bar{X}=3$:
$\sum\limits_{i=1}^n(X_i-\mu)=0+1+2+3+4=10$
$\sum\limits_{i=1}^n(\bar{X}-\mu)=2+2+2+2+2=10$
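The numeric example above can be checked in a few lines of Python:

```python
sample = [1, 2, 3, 4, 5]
mu = 1                              # assumed population mean from the example
xbar = sum(sample) / len(sample)    # sample mean = 3.0

s1 = sum(x - mu for x in sample)          # sum of (X_i - mu)
s2 = sum(xbar - mu for _ in sample)       # n copies of (X̄ - mu)
print(s1, s2)  # 10 10.0 — the two sums agree
```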

Explanation 2: the steps from ③ to ④
$D(X)=\frac{1}{N}\sum\limits_{i=1}^N (X_i-\mu)^2=\sigma^2$, and since $E(X_i-\mu)^2=D(X)$ for every $i$, we have $\frac{1}{n}E\sum\limits_{i=1}^n (X_i-\mu)^2=D(X)=\sigma^2$. Because $\mu$ (not $\bar{X}$) is subtracted here, the expectation of this sample quantity equals the population variance exactly.
$$
\begin{aligned}
E(\bar{X}-\mu)^2 &= E\big(\bar{X}-E(\bar{X})\big)^2\\
&= D(\bar{X})\\
&= D\Big(\frac{1}{n} \sum\limits_{i=1}^n X_i\Big)\\
&= \frac{1}{n^2} \sum\limits_{i=1}^n D(X_i) \quad \text{(assuming the samples are independent)}\\
&= \frac{1}{n}D(X)\\
&= \frac{1}{n}\sigma^2
\end{aligned}
$$

So unless $n\rightarrow\infty$, the biased estimate falls short by a factor of $\frac{n}{n-1}$, and $S^2$ must be compensated accordingly, which yields the unbiased estimator:

$$
S^2=\frac{n}{n-1}\cdot\frac{1}{n} \sum\limits_{i=1}^n (X_i-\bar{X})^2= \frac{1}{n-1} \sum\limits_{i=1}^n (X_i-\bar{X})^2
$$
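The conclusion $E(S^2_{\text{biased}})=\frac{n-1}{n}\sigma^2$ can be checked with a small Monte Carlo simulation, a sketch assuming independent normal samples (here $\sigma^2=4$, $n=5$, so the biased mean should come out near $\frac{4}{5}\cdot 4 = 3.2$ and the unbiased mean near $4$):

```python
import random

random.seed(0)
sigma2 = 4.0      # true population variance (std dev 2)
n = 5             # sample size
trials = 100_000

biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 2.0) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)   # sum of squared deviations
    biased_sum += ss / n
    unbiased_sum += ss / (n - 1)

print(biased_sum / trials)    # ≈ (n-1)/n * sigma2 = 3.2
print(unbiased_sum / trials)  # ≈ sigma2 = 4.0
```

Averaged over many trials, the $n$-divisor estimator systematically undershoots while the $n-1$ divisor corrects it, exactly as the derivation predicts.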

Degrees of freedom: the term often comes up here; it can be understood as the number of linearly independent quantities.

Given the sample mean and the first n−1 sample values, the n-th sample value is fully determined, so the last sample is linearly dependent on the first n−1; hence the degrees of freedom are n−1.

By contrast, knowing the population mean $\mu$ and the first $n-1$ sample values does not determine the $n$-th sample value, which is why $D(X)=\frac{1}{n}E\sum\limits_{i=1}^n(X_i-\mu)^2=\sigma^2$ divides by $n$.

~~~~~~~~~~~~~~~ divider ~~~~~~~~~~~~~~~~

An extra note on the variance of a random vector:

Let the vector $x=\begin{bmatrix} x_1\\ x_2 \\ \vdots \\ x_n \end{bmatrix}$, $E(x)=\begin{bmatrix} E(x_1)\\ E(x_2) \\ \vdots \\ E(x_n) \end{bmatrix}$

Writing $\mu=E(x)$, the variance (covariance matrix) is
$$
Var(x)=E[(x-\mu)(x-\mu)^T]=
\begin{bmatrix}
var(x_1) & cov(x_1,x_2) & \cdots & cov(x_1, x_n)\\
cov(x_2,x_1) & var(x_2) & \cdots & cov(x_2, x_n) \\
\vdots & \vdots & \ddots & \vdots \\
cov(x_n,x_1) & cov(x_n,x_2) & \cdots & var(x_n)
\end{bmatrix}
$$

$$
\begin{aligned}
Var(Ax)&=E[(Ax-A\mu)(Ax-A\mu)^T]\\
&=E[A(x-\mu)(x-\mu)^TA^T]\\
&=AE[(x-\mu)(x-\mu)^T]A^T\\
&=A\,Var(x)\,A^T
\end{aligned}
$$
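The identity $Var(Ax)=A\,Var(x)\,A^T$ holds exactly for the empirical covariance matrix as well, since it is bilinear in the samples. A pure-Python sketch with 2-D vectors and an arbitrary matrix $A$ (the data-generating choices are illustrative assumptions):

```python
import random

random.seed(1)

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def cov(samples):
    """Empirical covariance E[(x - mean)(x - mean)^T] of 2-D samples."""
    n = len(samples)
    m = [sum(s[j] for s in samples) / n for j in range(2)]
    return [[sum((s[i] - m[i]) * (s[j] - m[j]) for s in samples) / n
             for j in range(2)] for i in range(2)]

# Correlated 2-D data: the second coordinate depends on the first
xs = []
for _ in range(1000):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    xs.append([z1, z1 + 0.5 * z2])

A = [[1.0, 2.0], [0.0, 1.0]]   # an arbitrary linear map

# Left side: covariance of the transformed samples y = Ax
ys = [[A[0][0] * x[0] + A[0][1] * x[1],
       A[1][0] * x[0] + A[1][1] * x[1]] for x in xs]
lhs = cov(ys)

# Right side: A Var(x) A^T
rhs = matmul(matmul(A, cov(xs)), transpose(A))

diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(diff < 1e-9)  # True: the identity holds up to floating-point error
```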
