| | $B_1$ | $B_2$ | $\cdots$ | $B_s$ | Total |
|---|---|---|---|---|---|
| $A_1$ | $n_{11}$ | $n_{12}$ | $\cdots$ | $n_{1s}$ | $n_{1\cdot}$ |
| $A_2$ | $n_{21}$ | $n_{22}$ | $\cdots$ | $n_{2s}$ | $n_{2\cdot}$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\cdots$ | $\vdots$ | $\vdots$ |
| $A_r$ | $n_{r1}$ | $n_{r2}$ | $\cdots$ | $n_{rs}$ | $n_{r\cdot}$ |
| Total | $n_{\cdot 1}$ | $n_{\cdot 2}$ | $\cdots$ | $n_{\cdot s}$ | $n$ |
Here $n_{i\cdot}=\sum\limits_{j=1}^s n_{ij}$ and $n_{\cdot j}=\sum\limits_{i=1}^r n_{ij}$. The attributes $A$ and $B$ have $r$ and $s$ levels respectively, and $n_{ij}$ denotes the number of the $n$ samples that fall in $A_i\cap B_j$.
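The marginal totals defined above can be computed directly from the table of counts. The following is a minimal sketch in plain Python; the 2×3 table `counts` is made-up illustrative data, not taken from the text.

```python
# Hypothetical 2x3 contingency table of counts n_ij (illustrative data only).
counts = [[10, 20, 30],
          [20, 20, 20]]

# Row totals n_{i.}: sum over the columns j of each row i.
row_totals = [sum(row) for row in counts]

# Column totals n_{.j}: sum over the rows i of each column j.
col_totals = [sum(col) for col in zip(*counts)]

# Grand total n; summing either set of marginals gives the same value.
n = sum(row_totals)

print(row_totals, col_totals, n)  # [60, 60] [30, 40, 50] 120
```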
Consider testing whether the two attributes are independent, that is,

$$H_0:\ \text{attributes } A \text{ and } B \text{ are independent (equivalently, } A \text{ and } B \text{ are unrelated)}\tag{1}$$
Writing $p_{ij}=P\{X\in A_i\cap B_j\}$, the $n$ observations can be regarded as a sample from a multinomial distribution $X$ with cell probabilities $p_{ij}$.
Further let $p_{i\cdot}=P\{X\in A_i\}$, $i=1,\cdots,r$, and $p_{\cdot j}=P\{X\in B_j\}$, $j=1,\cdots,s$. Then $p_{i\cdot}=\sum\limits_{j=1}^s p_{ij}$ and $p_{\cdot j}=\sum\limits_{i=1}^r p_{ij}$, subject to the constraint

$$\sum\limits_{i=1}^r p_{i\cdot}=\sum\limits_{j=1}^s p_{\cdot j}=1\tag{2}$$
When $H_0$ holds we must have $p_{ij}=p_{i\cdot}p_{\cdot j}$ for all $i$ and $j$, so hypothesis $(1)$ is equivalent to

$$H_0:\ p_{ij}=p_{i\cdot}p_{\cdot j}\tag{3}$$
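Because the contingency-table data can be viewed as a sample from a multinomial distribution, the $\chi^2$ goodness-of-fit test can be used to carry out a significance test of the independence hypothesis $(3)$.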
However, $p_{i\cdot}$ and $p_{\cdot j}$ are all unknown and subject to constraint $(2)$, so under $H_0$ there are $(r-1)+(s-1)=r+s-2$ free unknown parameters in total. Their maximum likelihood estimates are

$$\hat p_{i\cdot}=\frac{n_{i\cdot}}{n},\qquad \hat p_{\cdot j}=\frac{n_{\cdot j}}{n}$$
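To make the estimation step concrete, here is a small sketch that computes the maximum likelihood estimates of the marginal probabilities and the expected cell counts $n\hat p_{i\cdot}\hat p_{\cdot j}$ under $H_0$. The function names `marginal_mle` and `expected_counts` and the example table are hypothetical, chosen only for illustration.

```python
def marginal_mle(counts):
    """Return (p_row, p_col, n): the MLEs n_{i.}/n and n_{.j}/n, plus n."""
    n = sum(sum(row) for row in counts)
    p_row = [sum(row) / n for row in counts]        # estimates of p_{i.}
    p_col = [sum(col) / n for col in zip(*counts)]  # estimates of p_{.j}
    return p_row, p_col, n


def expected_counts(counts):
    """Expected counts under independence: n * p_row[i] * p_col[j]."""
    p_row, p_col, n = marginal_mle(counts)
    return [[n * pi * pj for pj in p_col] for pi in p_row]


# Hypothetical 2x3 table (illustrative data only).
table = [[10, 20, 30],
         [20, 20, 20]]
print(marginal_mle(table))
print(expected_counts(table))
```

Each set of estimated marginals sums to 1, matching constraint $(2)$, which is why only $r+s-2$ of them are free.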
Substituting these estimates into the goodness-of-fit statistic gives

$$\chi^2=n\sum_{i=1}^r\sum_{j=1}^s\frac{\left(n_{ij}-\frac{n_{i\cdot}n_{\cdot j}}{n}\right)^2}{n_{i\cdot}n_{\cdot j}}\tag{4}$$
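Formula $(4)$ translates almost term by term into code. The sketch below is one possible implementation in plain Python; the name `chi2_statistic` and the example table are hypothetical, and it assumes every row and column total is positive (otherwise the denominator $n_{i\cdot}n_{\cdot j}$ is zero).

```python
def chi2_statistic(counts):
    """Chi-square statistic of formula (4) for a table of counts n_ij."""
    row_totals = [sum(row) for row in counts]        # n_{i.}
    col_totals = [sum(col) for col in zip(*counts)]  # n_{.j}
    n = sum(row_totals)

    total = 0.0
    for i, row in enumerate(counts):
        for j, n_ij in enumerate(row):
            prod = row_totals[i] * col_totals[j]     # n_{i.} * n_{.j}
            total += (n_ij - prod / n) ** 2 / prod
    return n * total


# Hypothetical 2x3 table (illustrative data only).
table = [[10, 20, 30],
         [20, 20, 20]]
print(chi2_statistic(table))
```

This is the same quantity as the familiar form $\sum_{i,j}(n_{ij}-E_{ij})^2/E_{ij}$ with $E_{ij}=n_{i\cdot}n_{\cdot j}/n$, since dividing by $E_{ij}$ equals multiplying by $n/(n_{i\cdot}n_{\cdot j})$.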
When $H_0$ holds and $n\to\infty$, the statistic $(4)$ converges in distribution to a chi-square distribution:

$$\chi^2\to\chi^2((r-1)(s-1))$$
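As a quick check on the degrees of freedom, recall that a goodness-of-fit test with $k$ cells and $m$ estimated parameters has $k-1-m$ degrees of freedom. Here $k=rs$ and $m=r+s-2$ (the symbols $k$ and $m$ are introduced only for this remark), so

$$rs-1-(r+s-2)=rs-r-s+1=(r-1)(s-1).$$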
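The rejection region at significance level $\alpha$ is therefore

$$W=\{\chi^2\ge\chi^2_\alpha((r-1)(s-1))\}\tag{5}$$

where $\chi^2_\alpha(k)$ is read here as the upper-$\alpha$ critical value of the $\chi^2(k)$ distribution, since the test rejects for large values of the statistic.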
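Putting the pieces together, the whole test can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: it assumes NumPy and SciPy are available, reads $\chi^2_\alpha$ as the upper-$\alpha$ critical value (computed with `scipy.stats.chi2.ppf(1 - alpha, dof)`), and uses a made-up 2×3 table. The name `independence_test` is hypothetical. The result is cross-checked against `scipy.stats.chi2_contingency` with `correction=False`.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency


def independence_test(counts, alpha=0.05):
    """Return (statistic, dof, critical value, reject?) for rejection region (5)."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    # Outer product of the marginal totals: entry (i, j) is n_{i.} * n_{.j}.
    prod = np.outer(counts.sum(axis=1), counts.sum(axis=0))
    stat = n * np.sum((counts - prod / n) ** 2 / prod)   # formula (4)
    r, s = counts.shape
    dof = (r - 1) * (s - 1)
    critical = chi2.ppf(1 - alpha, dof)                  # upper-alpha critical value
    return stat, dof, critical, bool(stat >= critical)


# Hypothetical 2x3 table (illustrative data only).
table = [[10, 20, 30],
         [20, 20, 20]]
stat, dof, critical, reject = independence_test(table)

# Cross-check with SciPy's built-in test (no continuity correction).
ref_stat, ref_p, ref_dof, ref_expected = chi2_contingency(table, correction=False)

print(stat, dof, critical, reject)
print(ref_stat, ref_p, ref_dof)
```

For this made-up table the statistic is about 5.33 against a critical value of about 5.99 with 2 degrees of freedom, so $H_0$ is not rejected at the 5% level.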