We focus on the algebraic, geometric, and statistical viewpoints.
Consider a single attribute only: $\mathbf{D}=\left(\begin{array}{c} X \\ \hline x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{array}\right),\ x_i\in\mathbb{R}$.
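Statistical view: $X$ can be regarded as a random variable, each $x_i$ is itself a random variable with the same distribution as $X$, and $x_1,\cdots,x_n$ is regarded as a random sample of size $n$ drawn from $X$.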
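Def.1. The empirical cumulative distribution function (CDF) of $X$ is $\hat{F}(x)=\frac{1}{n}\sum_{i=1}^{n} I(x_i\le x)$, where $I(\cdot)$ is the indicator function.
Def.2. The inverse cumulative distribution function (quantile function) is $F^{-1}(q)=\min\{x \mid F(x)\ge q\}$ for $q\in(0,1]$; replacing $F$ by $\hat{F}$ gives the empirical inverse CDF $\hat{F}^{-1}$.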
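A minimal Python sketch of the empirical CDF and its inverse, assuming the convention $\hat F^{-1}(q)=\min\{x \mid \hat F(x)\ge q\}$ for $q\in(0,1]$. The helper names `ecdf` and `inverse_ecdf` are hypothetical, not from a library. Because $\hat F$ only jumps at sample values, the inverse only needs to scan the sorted sample.

```python
from bisect import bisect_right


def ecdf(sample, x):
    """Empirical CDF: F_hat(x) = (1/n) * #{i : x_i <= x}."""
    xs = sorted(sample)
    # bisect_right returns how many sorted values are <= x
    return bisect_right(xs, x) / len(xs)


def inverse_ecdf(sample, q):
    """Inverse empirical CDF: the smallest x with F_hat(x) >= q, for q in (0, 1]."""
    if not 0 < q <= 1:
        raise ValueError("q must lie in (0, 1]")
    xs = sorted(sample)
    n = len(xs)
    # F_hat is a step function that only jumps at sample values, and
    # F_hat(x_(k)) >= k/n for the k-th smallest value x_(k).
    for k, x in enumerate(xs, start=1):
        if k / n >= q:
            return x


sample = [3, 1, 4, 1, 5]
print(ecdf(sample, 1), ecdf(sample, 3.5))                     # 0.4 0.6
print(inverse_ecdf(sample, 0.5), inverse_ecdf(sample, 1.0))   # 3 5
```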
Def.3. The empirical probability mass function (PMF) of the random variable $X$ is
$$\hat{f}(x)=\frac{1}{n}\sum_{i=1}^{n} I(x_i=x),\quad \forall x\in\mathbb{R},\qquad I(x_i=x)=\begin{cases}1, & x_i=x\\ 0, & x_i\ne x\end{cases}$$
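The empirical PMF can be sketched the same way. `empirical_pmf` mirrors the indicator sum literally, `empirical_pmf_table` tabulates every observed value at once; both names are illustrative, and `Fraction` keeps the values exact so they sum to exactly 1.

```python
from collections import Counter
from fractions import Fraction


def empirical_pmf(sample, x):
    """f_hat(x) = (1/n) * sum_i I(x_i = x), returned as an exact fraction."""
    indicator_sum = sum(1 for xi in sample if xi == x)  # sum of I(x_i = x)
    return Fraction(indicator_sum, len(sample))


def empirical_pmf_table(sample):
    """f_hat for every distinct observed value; unobserved values have f_hat = 0."""
    n = len(sample)
    return {value: Fraction(count, n) for value, count in Counter(sample).items()}


sample = [3, 1, 4, 1, 5]
print(empirical_pmf(sample, 1))   # 2/5
print(empirical_pmf(sample, 2))   # 0
print(sum(empirical_pmf_table(sample).values()))   # 1
```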
Def.4. The expectation of a discrete random variable $X$ is $\mu:=E(X)=\sum_{x} x f(x)$, where $f(x)$ is the PMF of $X$.
The expectation of a continuous random variable $X$ is $\mu:=E(X)=\int_{-\infty}^{+\infty} x f(x)\,dx$, where $f(x)$ is the PDF of $X$.
Note: expectation is linear, $E(aX+bY)=aE(X)+bE(Y)$ for any constants $a,b$, whether or not $X$ and $Y$ are independent.
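To make the expectation and its linearity concrete, the sketch below computes expectations exactly from a small joint PMF of $(X, Y)$. The PMF is a made-up example in which $X$ and $Y$ are dependent, chosen to show that linearity does not require independence.

```python
from fractions import Fraction

# Hypothetical joint PMF f(x, y) of (X, Y); Y = 1 exactly when X = 1,
# so X and Y are dependent.
joint_pmf = {
    (0, 0): Fraction(1, 2),
    (1, 1): Fraction(1, 4),
    (2, 0): Fraction(1, 4),
}


def expect(g, pmf):
    """E[g(X, Y)] = sum over the support of g(x, y) * f(x, y)."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())


a, b = 3, -2
e_x = expect(lambda x, y: x, joint_pmf)   # E(X) = 3/4
e_y = expect(lambda x, y: y, joint_pmf)   # E(Y) = 1/4
lhs = expect(lambda x, y: a * x + b * y, joint_pmf)
rhs = a * e_x + b * e_y
print(lhs, rhs)   # 7/4 7/4
```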
Def.5. The sample mean of $X$ is $\hat{\mu}=\frac{1}{n}\sum_{i=1}^{n}x_i$. Note that $\hat{\mu}$ is an estimator of $\mu$.
Def.6. An estimator (statistic) $\hat{\theta}$ is called an unbiased estimator of the parameter $\theta$ if $E(\hat{\theta})=\theta$.
Exercise: show that the sample mean $\hat{\mu}$ is an unbiased estimator of the expectation $\mu$. Hint: $E(x_i)=\mu$ for all $x_i$.
Def.7. An estimator is robust if it is not unduly affected by extreme values (outliers) in the sample. (The sample mean is not robust.)
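A quick illustration of Def.7 with Python's standard `statistics` module on made-up numbers: replacing one value with an outlier moves the sample mean from 3 to 202, while the sample median stays at 3. (`statistics.median` averages the two middle values when $n$ is even; with odd $n$, as here, it returns the middle value.)

```python
import statistics

clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 1000]   # the largest value replaced by an extreme one

# The sample mean is dragged toward the outlier; the median does not move.
print(statistics.mean(clean), statistics.mean(with_outlier))
print(statistics.median(clean), statistics.median(with_outlier))
```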
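Def.8. The median of a random variable $X$ is a value $m$ with $P(X\le m)\ge\frac{1}{2}$ and $P(X\ge m)\ge\frac{1}{2}$; in terms of the inverse CDF, $m=F^{-1}(0.5)$.
Def.9. The sample median of $X$ is $\hat{F}^{-1}(0.5)$.
Def.10. The mode of $X$ is the value at which the PMF (or PDF) attains its maximum, $\arg\max_x f(x)$; the sample mode is $\arg\max_x \hat{f}(x)$, i.e., the most frequent value in the sample.
Def.11. The range of $X$ is $\max X-\min X$; the sample range is $\hat{r}=\max_i x_i-\min_i x_i$.
Def.12. The interquartile range of $X$ is $IQR=F^{-1}(0.75)-F^{-1}(0.25)$; the sample interquartile range is $\widehat{IQR}=\hat{F}^{-1}(0.75)-\hat{F}^{-1}(0.25)$.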
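The sample versions of the median, mode, range, and interquartile range can all be read off the empirical distribution. This sketch assumes the inverse-CDF convention $\hat F^{-1}(q)=\min\{x\mid \hat F(x)\ge q\}$, under which the median of an even-sized sample is the lower of the two middle values (4 below, where `statistics.median` would return 4.5); other quantile conventions exist. `sample_summary` is a hypothetical helper, and it returns every mode when several values tie.

```python
from collections import Counter


def inverse_ecdf(sample, q):
    """Smallest sample value x with F_hat(x) >= q, for q in (0, 1]."""
    if not 0 < q <= 1:
        raise ValueError("q must lie in (0, 1]")
    xs = sorted(sample)
    n = len(xs)
    for k, x in enumerate(xs, start=1):
        if k / n >= q:  # F_hat(x_(k)) >= k/n
            return x


def sample_summary(sample):
    counts = Counter(sample)
    top = max(counts.values())
    return {
        "median": inverse_ecdf(sample, 0.5),
        # argmax of the empirical PMF; several values may tie
        "modes": sorted(v for v, c in counts.items() if c == top),
        "range": max(sample) - min(sample),
        "iqr": inverse_ecdf(sample, 0.75) - inverse_ecdf(sample, 0.25),
    }


print(sample_summary([1, 1, 3, 4, 5, 7, 8, 9]))
# {'median': 4, 'modes': [1], 'range': 8, 'iqr': 6}
```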
Def.13. The variance of a random variable $X$ is
$$\sigma^{2}=\operatorname{var}(X)=E\left[(X-\mu)^{2}\right]=\begin{cases}\sum_{x}(x-\mu)^{2} f(x) & \text{if } X \text{ is discrete}\\ \int_{-\infty}^{\infty}(x-\mu)^{2} f(x)\,dx & \text{if } X \text{ is continuous}\end{cases}$$
The standard deviation $\sigma$ is the positive square root of $\sigma^2$.
Note: the variance is the second moment about the mean; in general, the $r$-th moment about the mean (central moment) is $E[(X-\mu)^r]$.
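For a discrete $X$, the variance and the central moments translate directly into sums over the PMF. The sketch below uses a fair die as an assumed example with exact fractions; it also checks the standard identity $\sigma^2=E[X^2]-\mu^2$.

```python
from fractions import Fraction
import math

# Assumed example: a fair six-sided die, f(x) = 1/6 for x = 1, ..., 6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(x * p for x, p in pmf.items())   # E(X) = sum_x x f(x)


def central_moment(r):
    """r-th moment about the mean: E[(X - mu)^r] = sum_x (x - mu)^r f(x)."""
    return sum((x - mu) ** r * p for x, p in pmf.items())


var = central_moment(2)                               # sigma^2 is the 2nd central moment
std = math.sqrt(var)                                  # sigma, the positive square root
second_raw = sum(x ** 2 * p for x, p in pmf.items())  # E[X^2]

print(mu, var, central_moment(3))   # 7/2 35/12 0
print(var == second_raw - mu ** 2)  # True
```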
Properties:
Def.14. The sample variance is $\hat{\sigma}^{2}=\frac{1}{n}\sum_{i=1}^{n}(x_{i}-\hat{\mu})^{2}$ (note that the denominator is $n$, not $n-1$).
Geometric meaning of the sample variance: consider the centered data matrix (here a single column)
$$C:=\left(\begin{array}{c} x_{1}-\hat{\mu} \\ x_{2}-\hat{\mu} \\ \vdots \\ x_{n}-\hat{\mu} \end{array}\right),\qquad n\cdot \hat{\sigma}^2=\sum_{i=1}^{n}(x_{i}-\hat{\mu})^{2}=\|C\|^2,$$
so the sample variance is the squared length of the centered vector divided by $n$.
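A sketch checking the geometric reading numerically on made-up data: the $1/n$ sample variance equals $\|C\|^2/n$, and it matches Python's `statistics.pvariance`, which also divides by $n$ (whereas `statistics.variance` divides by $n-1$). The function names are illustrative.

```python
from fractions import Fraction
import statistics


def sample_variance(xs):
    """sigma_hat^2 = (1/n) * sum_i (x_i - mu_hat)^2 (divides by n, not n - 1)."""
    n = len(xs)
    mu_hat = Fraction(sum(xs), n)
    return sum((x - mu_hat) ** 2 for x in xs) / n


def centered_vector(xs):
    """C = (x_1 - mu_hat, ..., x_n - mu_hat)^T."""
    mu_hat = Fraction(sum(xs), len(xs))
    return [x - mu_hat for x in xs]


xs = [2, 4, 4, 4, 5, 5, 7, 9]
C = centered_vector(xs)
norm_sq = sum(c * c for c in C)                  # ||C||^2
print(sample_variance(xs), norm_sq / len(xs))    # 4 4
print(statistics.pvariance(xs))                  # population-style variance, also / n
```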
Question: what are the expectation and the variance of the sample mean of $X$?
$$E(\hat{\mu})=E\Big(\frac{1}{n}\sum_{i=1}^{n}x_i\Big)=\frac{1}{n}\sum_{i=1}^{n}E(x_i)=\frac{1}{n}\sum_{i=1}^{n}\mu=\mu$$
For the variance there are two approaches: expand the definition directly, or use the fact that $x_1,\cdots,x_n$ are independent and identically distributed (iid):
$$\operatorname{var}\Big(\sum_{i=1}^{n}x_i\Big)=\sum_{i=1}^{n}\operatorname{var}(x_i)=n\sigma^2\ \Longrightarrow\ \operatorname{var}(\hat{\mu})=\frac{1}{n^2}\operatorname{var}\Big(\sum_{i=1}^{n}x_i\Big)=\frac{\sigma^2}{n}$$
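Both results can be confirmed exactly, without simulation, by enumerating every iid sample of size $n$ from a small discrete population. The population PMF below is a hypothetical example, and `sampling_distribution` and `mean_and_var` are illustrative helpers.

```python
from fractions import Fraction
from itertools import product
from math import prod

# Hypothetical population PMF for X (any small discrete distribution works).
pmf = {1: Fraction(1, 2), 2: Fraction(1, 4), 6: Fraction(1, 4)}
mu = sum(x * p for x, p in pmf.items())                  # 5/2
sigma2 = sum((x - mu) ** 2 * p for x, p in pmf.items())  # 17/4


def sampling_distribution(stat, n):
    """All (stat(sample), probability) pairs over iid samples of size n."""
    return [
        (stat(s), prod(pmf[x] for x in s))  # iid: joint prob = product of marginals
        for s in product(pmf, repeat=n)
    ]


def mean_and_var(dist):
    e = sum(v * p for v, p in dist)
    return e, sum((v - e) ** 2 * p for v, p in dist)


n = 3
dist = sampling_distribution(lambda s: Fraction(sum(s), len(s)), n)
e_mu_hat, var_mu_hat = mean_and_var(dist)
print(e_mu_hat == mu, var_mu_hat == sigma2 / n)   # True True
```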
Note: the sample variance is a biased estimator of $\sigma^2$, because $E(\hat{\sigma}^2)=\left(\frac{n-1}{n}\right)\sigma^2\xrightarrow{n\to+\infty}\sigma^2$; the bias vanishes only asymptotically.
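The same exhaustive enumeration exposes the bias: averaging the $1/n$ sample variance over all iid samples gives exactly $\frac{n-1}{n}\sigma^2$, while dividing by $n-1$ instead gives $\sigma^2$. The PMF is again a hypothetical example; the `ddof` parameter name is borrowed from NumPy's convention purely for readability.

```python
from fractions import Fraction
from itertools import product
from math import prod

pmf = {1: Fraction(1, 2), 2: Fraction(1, 4), 6: Fraction(1, 4)}   # hypothetical population
mu = sum(x * p for x, p in pmf.items())
sigma2 = sum((x - mu) ** 2 * p for x, p in pmf.items())


def var_hat(s, ddof=0):
    """(1/(n - ddof)) * sum_i (x_i - mu_hat)^2; ddof=0 is the 1/n sample variance."""
    m = Fraction(sum(s), len(s))
    return sum((x - m) ** 2 for x in s) / (len(s) - ddof)


def expected(stat, n):
    """Exact E[stat(x_1, ..., x_n)] over all iid samples of size n."""
    return sum(stat(s) * prod(pmf[x] for x in s) for s in product(pmf, repeat=n))


for n in (2, 3, 4):
    biased = expected(var_hat, n)
    unbiased = expected(lambda s: var_hat(s, ddof=1), n)
    print(n, biased == Fraction(n - 1, n) * sigma2, unbiased == sigma2)
```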
(Omitted.)
$$\mathbf{D}=\left(\begin{array}{c|cccc} & X_{1} & X_{2} & \cdots & X_{d} \\ \hline \mathbf{x}_{1} & x_{11} & x_{12} & \cdots & x_{1 d} \\ \mathbf{x}_{2} & x_{21} & x_{22} & \cdots & x_{2 d} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \mathbf{x}_{n} & x_{n 1} & x_{n 2} & \cdots & x_{n d} \end{array}\right)$$
This can be viewed as a sample of the random vector $\mathbf{X}=(X_1,\cdots,X_d)^T$.
Def.15. For a random vector $\mathbf{X}$, its expectation (mean) vector is
$$E[\mathbf{X}]=\left(\begin{array}{c} E[X_{1}] \\ E[X_{2}] \\ \vdots \\ E[X_{d}] \end{array}\right)$$
The sample mean vector is $\hat{\boldsymbol{\mu}}=\frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_{i}\ \left(=\operatorname{mean}(\mathbf{D})\right)\in\mathbb{R}^{d}$.
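A small pure-Python sketch of the sample mean vector on a hypothetical $4\times 2$ data matrix, computed both as the average of the rows $\mathbf{x}_i$ and column by column (the two must agree).

```python
from fractions import Fraction

# Hypothetical 4 x 2 data matrix D: rows are points x_i, columns are attributes X_1, X_2.
D = [
    [2, 1],
    [4, 3],
    [6, 8],
    [8, 4],
]
n, d = len(D), len(D[0])

# mu_hat = (1/n) * sum_i x_i, accumulated row by row
mu_hat = [Fraction(0)] * d
for row in D:
    mu_hat = [m + Fraction(v, n) for m, v in zip(mu_hat, row)]

# Equivalently, the j-th entry is the mean of column X_j.
mu_hat_cols = [Fraction(sum(col), n) for col in zip(*D)]
print(mu_hat == mu_hat_cols, [str(m) for m in mu_hat])   # True ['5', '4']
```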
Def.16. For $X_1, X_2$, the covariance is defined as $\sigma_{12}=E[(X_1-E(X_1))(X_2-E(X_2))]=E(X_1X_2)-E(X_1)E(X_2)$.
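Both expressions for the covariance can be compared on a small joint PMF; the distribution below is an assumed example, and `expect` is an illustrative helper.

```python
from fractions import Fraction

# Hypothetical joint PMF f(x1, x2) of (X_1, X_2).
joint_pmf = {
    (0, 0): Fraction(1, 2),
    (1, 1): Fraction(1, 4),
    (2, 0): Fraction(1, 4),
}


def expect(g):
    """E[g(X_1, X_2)] under joint_pmf."""
    return sum(g(x1, x2) * p for (x1, x2), p in joint_pmf.items())


e1 = expect(lambda x1, x2: x1)
e2 = expect(lambda x1, x2: x2)
cov_definition = expect(lambda x1, x2: (x1 - e1) * (x2 - e2))  # E[(X1 - E X1)(X2 - E X2)]
cov_shortcut = expect(lambda x1, x2: x1 * x2) - e1 * e2         # E(X1 X2) - E(X1) E(X2)
print(cov_definition, cov_shortcut)   # 1/16 1/16
```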
Remark:
Def.17. For a random vector $\mathbf{X}=(X_1,\cdots,X_d)^T$ with $\boldsymbol{\mu}=E[\mathbf{X}]$, define the covariance matrix
$$\boldsymbol{\Sigma}=E\left[(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})^{T}\right]=\left(\begin{array}{cccc} \sigma_{1}^{2} & \sigma_{12} & \cdots & \sigma_{1 d} \\ \sigma_{21} & \sigma_{2}^{2} & \cdots & \sigma_{2 d} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{d 1} & \sigma_{d 2} & \cdots & \sigma_{d}^{2} \end{array}\right)_{d\times d}$$
It is a symmetric matrix, since $\sigma_{ij}=\sigma_{ji}$. The generalized variance of $\mathbf{X}$ is defined as $\det(\boldsymbol{\Sigma})$.
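A sketch of the covariance matrix for $d=2$ on an assumed joint PMF: each entry of $\boldsymbol\Sigma$ is computed as $E[(X_j-\mu_j)(X_k-\mu_k)]$, symmetry is checked, and the generalized variance is obtained with a simple cofactor-expansion determinant (adequate only for small $d$). The `det` helper is written here for illustration.

```python
from fractions import Fraction

# Hypothetical joint PMF of the random vector X = (X_1, X_2)^T.
joint_pmf = {
    (0, 0): Fraction(1, 2),
    (1, 1): Fraction(1, 4),
    (2, 0): Fraction(1, 4),
}
d = 2

mu = [sum(x[j] * p for x, p in joint_pmf.items()) for j in range(d)]

# Sigma[j][k] = E[(X_j - mu_j)(X_k - mu_k)], the (j, k) entry of E[(X - mu)(X - mu)^T]
Sigma = [
    [sum((x[j] - mu[j]) * (x[k] - mu[k]) * p for x, p in joint_pmf.items())
     for k in range(d)]
    for j in range(d)
]


def det(M):
    """Determinant by cofactor expansion along the first row (fine for small d)."""
    if len(M) == 1:
        return M[0][0]
    return sum(
        (-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
        for j in range(len(M))
    )


symmetric = all(Sigma[j][k] == Sigma[k][j] for j in range(d) for k in range(d))
print(symmetric, det(Sigma))   # True 1/8
```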
Note:
Def.18. For $\mathbf{X}=(X_1,\cdots,X_d)^T$, define the sample covariance matrix
$$\hat{\boldsymbol{\Sigma}}=\frac{1}{n}\left(\mathbf{Z}^{T}\mathbf{Z}\right)=\frac{1}{n}\left(\begin{array}{cccc} Z_{1}^{T} Z_{1} & Z_{1}^{T} Z_{2} & \cdots & Z_{1}^{T} Z_{d} \\ Z_{2}^{T} Z_{1} & Z_{2}^{T} Z_{2} & \cdots & Z_{2}^{T} Z_{d} \\ \vdots & \vdots & \ddots & \vdots \\ Z_{d}^{T} Z_{1} & Z_{d}^{T} Z_{2} & \cdots & Z_{d}^{T} Z_{d} \end{array}\right)_{d\times d}$$
where $\mathbf{Z}$ is the centered data matrix ($\mathbf{1}\in\mathbb{R}^n$ is the all-ones vector):
$$\mathbf{Z}=\mathbf{D}-\mathbf{1}\cdot\hat{\boldsymbol{\mu}}^{T}=\left(\begin{array}{c} \mathbf{x}_{1}^{T}-\hat{\boldsymbol{\mu}}^{T} \\ \mathbf{x}_{2}^{T}-\hat{\boldsymbol{\mu}}^{T} \\ \vdots \\ \mathbf{x}_{n}^{T}-\hat{\boldsymbol{\mu}}^{T} \end{array}\right)=\left(\begin{array}{ccc} - & \mathbf{z}_{1}^{T} & - \\ - & \mathbf{z}_{2}^{T} & - \\ & \vdots & \\ - & \mathbf{z}_{n}^{T} & - \end{array}\right)=\left(\begin{array}{cccc} \mid & \mid & & \mid \\ Z_{1} & Z_{2} & \cdots & Z_{d} \\ \mid & \mid & & \mid \end{array}\right)$$
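The construction of $\mathbf{Z}$ and $\hat{\boldsymbol\Sigma}=\frac1n\mathbf{Z}^T\mathbf{Z}$ can be sketched in pure Python on hypothetical data. The diagonal entries should coincide with the per-attribute $1/n$ sample variances, which the last line checks against `statistics.pvariance`.

```python
from fractions import Fraction
import statistics

# Hypothetical 4 x 2 data matrix D (rows = points, columns = attributes).
D = [[2, 1], [4, 3], [6, 8], [8, 4]]
n, d = len(D), len(D[0])

mu_hat = [Fraction(sum(col), n) for col in zip(*D)]

# Z = D - 1 * mu_hat^T: subtract the mean vector from every row.
Z = [[x - m for x, m in zip(row, mu_hat)] for row in D]
cols = list(zip(*Z))   # centered attribute columns Z_1, ..., Z_d


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Sigma_hat[j][k] = (1/n) * Z_j^T Z_k, i.e. the (j, k) entry of (1/n) Z^T Z
Sigma_hat = [[dot(cols[j], cols[k]) / n for k in range(d)] for j in range(d)]

# Diagonal entries are the per-attribute sample variances (divided by n).
print(all(Sigma_hat[j][j] == statistics.pvariance([row[j] for row in D]) for j in range(d)))
```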
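The total sample variance is $\operatorname{tr}(\hat{\boldsymbol{\Sigma}})$, and the generalized sample variance is $\det(\hat{\boldsymbol{\Sigma}})\ge 0$ (nonnegative because $\hat{\boldsymbol{\Sigma}}=\frac{1}{n}\mathbf{Z}^T\mathbf{Z}$ is positive semidefinite).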
Equivalently, $\hat{\boldsymbol{\Sigma}}$ is the average of the outer products of the centered points:
$$\hat{\boldsymbol{\Sigma}}=\frac{1}{n}\sum_{i=1}^{n}\mathbf{z}_{i}\mathbf{z}_{i}^T$$
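On a hypothetical $4\times 2$ data matrix, accumulating the outer products $\mathbf{z}_i\mathbf{z}_i^T$ gives $\hat{\boldsymbol\Sigma}$; the sketch then computes the total variance $\operatorname{tr}(\hat{\boldsymbol\Sigma})$ and the generalized variance using the closed-form $2\times 2$ determinant, so it assumes $d=2$.

```python
from fractions import Fraction

D = [[2, 1], [4, 3], [6, 8], [8, 4]]   # hypothetical data matrix
n, d = len(D), len(D[0])
mu_hat = [Fraction(sum(col), n) for col in zip(*D)]
z = [[x - m for x, m in zip(row, mu_hat)] for row in D]   # centered points z_i


def outer(u, v):
    return [[a * b for b in v] for a in u]


# Sigma_hat = (1/n) * sum_i z_i z_i^T
Sigma_hat = [[Fraction(0)] * d for _ in range(d)]
for zi in z:
    P = outer(zi, zi)
    Sigma_hat = [[Sigma_hat[j][k] + P[j][k] / n for k in range(d)] for j in range(d)]

total_var = sum(Sigma_hat[j][j] for j in range(d))                               # tr(Sigma_hat)
gen_var = Sigma_hat[0][0] * Sigma_hat[1][1] - Sigma_hat[0][1] * Sigma_hat[1][0]  # det, d = 2 only
print(total_var, gen_var)   # 23/2 81/4
```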