Frequency Tables and Contingency Tables
The data in this section come from the Arthritis dataset in the vcd package.
> library(vcd)
Loading required package: grid
> head(Arthritis)
  ID Treatment  Sex Age Improved
1 57   Treated Male  27     Some
2 46   Treated Male  29     None
3 77   Treated Male  30     None
4 17   Treated Male  32   Marked
5 36   Treated Male  46   Marked
6 23   Treated Male  58   Marked
One-Way Tables
The table() function generates a simple frequency table for a single categorical variable:
> mytable<-table(Arthritis$Improved)
> mytable
  None   Some Marked
    42     14     28
Use prop.table() to convert these counts into proportions:
> prop.table(mytable)
     None      Some    Marked
0.5000000 0.1666667 0.3333333
Alternatively, multiply the result of prop.table() by 100 to get percentages:
> prop.table(mytable)*100
    None     Some   Marked
50.00000 16.66667 33.33333
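As a small sketch of the steps above, the counts and rounded percentages can be collected in one data frame. The counts are typed in from the output above so that the snippet runs without the vcd package; the names improved, pct and summary_df are made up for this illustration.

```r
# Counts copied by hand from the table() output above (assumption: no vcd needed)
improved <- as.table(c(None = 42, Some = 14, Marked = 28))

# Proportions times 100, rounded to one decimal place
pct <- round(prop.table(improved) * 100, 1)

# Counts and percentages side by side
summary_df <- data.frame(
  Improved = names(improved),
  Count    = as.vector(improved),
  Percent  = as.vector(pct)
)
summary_df
```

Rounding only at the last step keeps the stored proportions exact and limits the rounding to what is displayed.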
Two-Way Tables
For a two-way table, table() is called as follows:
table(a,b), where a is the row variable and b is the column variable
> table(Arthritis$Treatment,Arthritis$Improved)
          None Some Marked
  Placebo   29    7      7
  Treated   13    7     21
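The xtabs() function can also be used. It is called as follows:
xtabs(~A+B,data=mydata)
where mydata is a matrix or data frame, and the variables to be cross-classified appear on the right-hand side of ~. The result is stored in mytable for the calls that follow.
> mytable <- xtabs(~Treatment+Improved,data = Arthritis)
> mytable
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21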
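The formula interface also accepts a count variable on the left-hand side of ~, which is handy when the data are already aggregated. The sketch below rebuilds the table above from a hand-typed data frame; agg and its Freq column are hypothetical names used only for this illustration.

```r
# Aggregated counts typed in from the two-way table above (hypothetical data frame)
agg <- data.frame(
  Treatment = rep(c("Placebo", "Treated"), times = 3),
  Improved  = rep(c("None", "Some", "Marked"), each = 2),
  Freq      = c(29, 13, 7, 7, 7, 21)
)

# Fix the level order so the columns appear as None, Some, Marked
agg$Improved <- factor(agg$Improved, levels = c("None", "Some", "Marked"))

# Freq on the left-hand side: cells hold sums of Freq instead of row counts
tab <- xtabs(Freq ~ Treatment + Improved, data = agg)
tab
```

Going the other way, as.data.frame(tab) returns one row per cell with a Freq column.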
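In addition, margin.table() and prop.table() generate marginal frequencies and proportions, respectively. Here mytable is the two-way Treatment by Improved table returned by xtabs(~Treatment+Improved,data = Arthritis).
> margin.table(mytable,1) # 1 refers to the first variable (Treatment)
Treatment
Placebo Treated
     43      41
> margin.table(mytable,2) # 2 refers to the second variable (Improved)
Improved
  None   Some Marked
    42     14     28
> prop.table(mytable) # cell proportions: each count divided by the grand total
         Improved
Treatment       None       Some     Marked
  Placebo 0.34523810 0.08333333 0.08333333
  Treated 0.15476190 0.08333333 0.25000000
> prop.table(mytable,1) # row proportions: each row sums to 1
         Improved
Treatment      None      Some    Marked
  Placebo 0.6744186 0.1627907 0.1627907
  Treated 0.3170732 0.1707317 0.5121951
> prop.table(mytable,2) # column proportions: each column sums to 1
         Improved
Treatment      None      Some    Marked
  Placebo 0.6904762 0.5000000 0.2500000
  Treated 0.3095238 0.5000000 0.7500000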
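To make the role of the second argument of prop.table() concrete, the following sketch checks which totals each variant divides by. The table is rebuilt by hand from the counts above so that the snippet does not depend on vcd; tab, cell_p, row_p, col_p and row_p_manual are illustrative names.

```r
# Counts copied from the two-way table above
tab <- as.table(matrix(
  c(29, 13, 7, 7, 7, 21), nrow = 2,
  dimnames = list(Treatment = c("Placebo", "Treated"),
                  Improved  = c("None", "Some", "Marked"))
))

cell_p <- prop.table(tab)      # divide every cell by the grand total
row_p  <- prop.table(tab, 1)   # divide each row by its row total
col_p  <- prop.table(tab, 2)   # divide each column by its column total

sum(cell_p)      # the whole table sums to 1
rowSums(row_p)   # every row sums to 1
colSums(col_p)   # every column sums to 1

# The same row proportions written out explicitly with sweep()
row_p_manual <- sweep(tab, 1, rowSums(tab), "/")
all.equal(as.vector(row_p), as.vector(row_p_manual))
```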
The addmargins() function adds marginal sums to a table:
> addmargins(mytable)
         Improved
Treatment None Some Marked Sum
  Placebo   29    7      7  43
  Treated   13    7     21  41
  Sum       42   14     28  84
> addmargins(prop.table(mytable))
         Improved
Treatment       None       Some     Marked        Sum
  Placebo 0.34523810 0.08333333 0.08333333 0.51190476
  Treated 0.15476190 0.08333333 0.25000000 0.48809524
  Sum     0.50000000 0.16666667 0.33333333 1.00000000
> addmargins(prop.table(mytable,1),2) # row proportions plus a Sum column
         Improved
Treatment      None      Some    Marked       Sum
  Placebo 0.6744186 0.1627907 0.1627907 1.0000000
  Treated 0.3170732 0.1707317 0.5121951 1.0000000
> addmargins(prop.table(mytable,2),1) # column proportions plus a Sum row
         Improved
Treatment      None      Some    Marked
  Placebo 0.6904762 0.5000000 0.2500000
  Treated 0.3095238 0.5000000 0.7500000
  Sum     1.0000000 1.0000000 1.0000000
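The second argument of addmargins() names the dimension that is summed over, which is easy to mix up: 1 appends a Sum row and 2 appends a Sum column. A minimal sketch, again using a hand-built copy of the table (tab, with_row, with_col and with_all are illustrative names):

```r
# Counts copied from the two-way table above
tab <- as.table(matrix(
  c(29, 13, 7, 7, 7, 21), nrow = 2,
  dimnames = list(Treatment = c("Placebo", "Treated"),
                  Improved  = c("None", "Some", "Marked"))
))

with_row <- addmargins(tab, 1)  # sum over Treatment: appends a "Sum" row
with_col <- addmargins(tab, 2)  # sum over Improved: appends a "Sum" column
with_all <- addmargins(tab)     # both margins plus the grand total

dim(with_row)
dim(with_col)
dim(with_all)
```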
The CrossTable() function in the gmodels package is a third way to create a two-way table:
> library(gmodels)
> library(vcd)
Loading required package: grid
> CrossTable(Arthritis$Treatment,Arthritis$Improved)
   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|
Total Observations in Table:  84
                    | Arthritis$Improved
Arthritis$Treatment |      None |      Some |    Marked | Row Total |
--------------------|-----------|-----------|-----------|-----------|
            Placebo |        29 |         7 |         7 |        43 |
                    |     2.616 |     0.004 |     3.752 |           |
                    |     0.674 |     0.163 |     0.163 |     0.512 |
                    |     0.690 |     0.500 |     0.250 |           |
                    |     0.345 |     0.083 |     0.083 |           |
--------------------|-----------|-----------|-----------|-----------|
            Treated |        13 |         7 |        21 |        41 |
                    |     2.744 |     0.004 |     3.935 |           |
                    |     0.317 |     0.171 |     0.512 |     0.488 |
                    |     0.310 |     0.500 |     0.750 |           |
                    |     0.155 |     0.083 |     0.250 |           |
--------------------|-----------|-----------|-----------|-----------|
       Column Total |        42 |        14 |        28 |        84 |
                    |     0.500 |     0.167 |     0.333 |           |
--------------------|-----------|-----------|-----------|-----------|
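The "Chi-square contribution" line in each cell can be reproduced in base R as (observed - expected)^2 / expected, where the expected count is row total times column total divided by the table total. The sketch below is not how CrossTable() is implemented internally; it is only a hand calculation on a copy of the counts, with tab, expected and contrib as illustrative names.

```r
# Counts copied from the two-way table above
tab <- as.table(matrix(
  c(29, 13, 7, 7, 7, 21), nrow = 2,
  dimnames = list(Treatment = c("Placebo", "Treated"),
                  Improved  = c("None", "Some", "Marked"))
))

# Expected counts under independence of rows and columns
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Per-cell contribution to the Pearson chi-square statistic
contrib <- (tab - expected)^2 / expected
round(contrib, 3)   # compare with the second line of each CrossTable() cell

# The contributions add up to the statistic reported by chisq.test()
sum(contrib)
chisq.test(tab)$statistic
```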
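Multi-Way Tables
Multi-way tables are created in much the same way as two-way tables:
# The first variable is the row variable, the second is the column variable, and the third is the grouping (layer) variable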
> mytable<-xtabs(~Treatment+Improved+Sex,data = Arthritis)
> mytable
, , Sex = Female
         Improved
Treatment None Some Marked
  Placebo   19    7      6
  Treated    6    5     16
, , Sex = Male
         Improved
Treatment None Some Marked
  Placebo   10    0      1
  Treated    7    2      5
> ftable(mytable)
                   Sex Female Male
Treatment Improved
Placebo   None             19   10
          Some              7    0
          Marked            6    1
Treated   None              6    7
          Some              5    2
          Marked           16    5
# Marginal frequencies for each variable
> margin.table(mytable,1)
Treatment
Placebo Treated
     43      41
> margin.table(mytable,2)
Improved
  None   Some Marked
    42     14     28
> margin.table(mytable,3)
Sex
Female   Male
    59     25
# Marginal frequencies for pairs of variables
> margin.table(mytable,c(1,3))
         Sex
Treatment Female Male
  Placebo     32   11
  Treated     27   14
> margin.table(mytable,c(1,2))
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21
# Proportion of each Sex within every Treatment x Improved combination
> ftable(prop.table(mytable,c(1,2)))
                   Sex    Female      Male
Treatment Improved
Placebo   None         0.6551724 0.3448276
          Some         1.0000000 0.0000000
          Marked       0.8571429 0.1428571
Treated   None         0.4615385 0.5384615
          Some         0.7142857 0.2857143
          Marked       0.7619048 0.2380952
# Add a Sum over the third variable (Sex); every row sums to 1
> ftable(addmargins(prop.table(mytable,c(1,2)),3))
                   Sex    Female      Male       Sum
Treatment Improved
Placebo   None         0.6551724 0.3448276 1.0000000
          Some         1.0000000 0.0000000 1.0000000
          Marked       0.8571429 0.1428571 1.0000000
Treated   None         0.4615385 0.5384615 1.0000000
          Some         0.7142857 0.2857143 1.0000000
          Marked       0.7619048 0.2380952 1.0000000
# Multiply by 100 to express the proportions as percentages
> ftable(addmargins(prop.table(mytable,c(1,2)),3))*100
                   Sex    Female      Male       Sum
Treatment Improved
Placebo   None          65.51724  34.48276 100.00000
          Some         100.00000   0.00000 100.00000
          Marked        85.71429  14.28571 100.00000
Treated   None          46.15385  53.84615 100.00000
          Some          71.42857  28.57143 100.00000
          Marked        76.19048  23.80952 100.00000
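The margin indices used above can be checked on a hand-built copy of the three-way table. This is only a sketch: the counts are typed in from the output above so that vcd is not needed, and tab3, two_way, p and by_sex are illustrative names.

```r
# Three-way counts copied from the xtabs() output above (filled column by column)
tab3 <- as.table(array(
  c(19, 6, 7, 5, 6, 16,    # Sex = Female
    10, 7, 0, 2, 1, 5),    # Sex = Male
  dim = c(2, 3, 2),
  dimnames = list(Treatment = c("Placebo", "Treated"),
                  Improved  = c("None", "Some", "Marked"),
                  Sex       = c("Female", "Male"))
))

# Summing over Sex (keeping dimensions 1 and 2) recovers the two-way table
two_way <- margin.table(tab3, c(1, 2))
two_way

# With margin c(1, 2), proportions are taken within each Treatment x Improved cell,
# so adding them up over Sex gives 1 everywhere
p <- prop.table(tab3, c(1, 2))
apply(p, c(1, 2), sum)

# ftable() can also put other variables in the rows
by_sex <- ftable(tab3, row.vars = c("Sex", "Treatment"))
by_sex
```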