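1. The table function corresponds to the contingency table in statistics: it records how often each combination of factor levels occurs (frequency counts), and it is widely used in statistical work. The examples below all use two dimensions; the higher-dimensional case works the same way.
Consider the following example: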
> ct <- data.frame(
+ Vote.for.X = factor(c("Yes", "Yes", "No", "Not Sure", "No"), levels = c("Yes", "No", "Not Sure")),
+ Vote.for.X.Last.Time = factor(c("Yes", "No", "No", "Yes", "No"), levels = c("Yes", "No"))
+ )
> ct
  Vote.for.X Vote.for.X.Last.Time
1        Yes                  Yes
2        Yes                   No
3         No                   No
4   Not Sure                  Yes
5         No                   No
> cttab <- table(ct)
> cttab
          Vote.for.X.Last.Time
Vote.for.X Yes No
  Yes        1  1
  No         0  2
  Not Sure   1  0
Now let's inspect the properties of cttab:
> mode(cttab)
[1] "numeric"
> str(cttab)
'table' int [1:3, 1:2] 1 0 1 1 2 0
- attr(*, "dimnames")=List of 2
..$ Vote.for.X : chr [1:3] "Yes" "No" "Not Sure"
..$ Vote.for.X.Last.Time: chr [1:2] "Yes" "No"
> summary(cttab)
Number of cases in table: 5
Number of factors: 2
Test for independence of all factors:
Chisq = 2.9167, df = 2, p-value = 0.2326
Chi-squared approximation may be incorrect
> attributes(cttab)
$dim
[1] 3 2
$dimnames
$dimnames$Vote.for.X
[1] "Yes" "No" "Not Sure"
$dimnames$Vote.for.X.Last.Time
[1] "Yes" "No"
$class
[1] "table"
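As the output above shows, a table is an integer array carrying dim and dimnames attributes plus the class "table", so matrix-style indexing and arithmetic apply to it. The following is a minimal sketch that rebuilds the ct example so it runs on its own; the names n_no_no, yes_row and fractions are illustrative and not part of the original example.

```r
# Rebuild the example data so this snippet is self-contained.
ct <- data.frame(
  Vote.for.X = factor(c("Yes", "Yes", "No", "Not Sure", "No"),
                      levels = c("Yes", "No", "Not Sure")),
  Vote.for.X.Last.Time = factor(c("Yes", "No", "No", "Yes", "No"),
                                levels = c("Yes", "No"))
)
cttab <- table(ct)

# Index a single cell by level names, as with a matrix that has dimnames.
n_no_no <- cttab["No", "No"]

# Extract one row: the counts for Vote.for.X == "Yes".
yes_row <- cttab["Yes", ]

# Element-wise arithmetic also works: convert counts to fractions of the total.
fractions <- cttab / sum(cttab)

print(n_no_no)
print(yes_row)
print(fractions)
```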
2. Operations on table objects
One operation you should definitely know is addmargins, which appends the marginal totals to a table:
> addmargins(cttab)
          Vote.for.X.Last.Time
Vote.for.X Yes No Sum
  Yes        1  1   2
  No         0  2   2
  Not Sure   1  0   1
  Sum        2  3   5
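The marginal totals that addmargins appends can also be obtained separately. The sketch below uses the base R functions margin.table and prop.table on the same data; the variable names (row_totals, col_totals, row_props, with_margins) are illustrative only.

```r
# Rebuild the example data so this snippet is self-contained.
ct <- data.frame(
  Vote.for.X = factor(c("Yes", "Yes", "No", "Not Sure", "No"),
                      levels = c("Yes", "No", "Not Sure")),
  Vote.for.X.Last.Time = factor(c("Yes", "No", "No", "Yes", "No"),
                                levels = c("Yes", "No"))
)
cttab <- table(ct)

# Totals over dimension 1 (rows) and dimension 2 (columns).
row_totals <- margin.table(cttab, 1)
col_totals <- margin.table(cttab, 2)

# Proportions within each row: every row of row_props sums to 1.
row_props <- prop.table(cttab, 1)

# addmargins combines the counts and both sets of totals in one table.
with_margins <- addmargins(cttab)

print(row_totals)
print(col_totals)
print(row_props)
```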
> dimnames(cttab)
$Vote.for.X
[1] "Yes" "No" "Not Sure"
$Vote.for.X.Last.Time
[1] "Yes" "No"
subtable(tbl, subnames): tbl is the table of interest, and subnames is a list giving, for each dimension, the levels you want to keep. subtable is implemented as follows:
subtable <- function(tbl, subnames) {
  # Convert the table to a plain array to get at its cell counts
  tblarray <- unclass(tbl)
  # Collect tblarray and the entries of subnames in one argument list
  dcargs <- list(tblarray)
  ndims <- length(subnames)
  for (i in 1:ndims) {
    dcargs[[i+1]] <- subnames[[i]]
  }
  # Equivalent to tblarray[subnames[[1]], subnames[[2]], ...]
  subarray <- do.call("[", dcargs)
  # Number of selected levels in each dimension
  dims <- lapply(subnames, length)
  # Rebuild an array of that shape and mark it as a table again
  subtbl <- array(subarray, dims, dimnames = subnames)
  class(subtbl) <- "table"
  return(subtbl)
}
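A usage sketch for subtable follows. The function is restated without comments so that the snippet runs on its own, and the choice of levels ("No" and "Yes" in both dimensions, dropping "Not Sure") is only an illustration.

```r
# subtable as defined above, restated so this snippet is self-contained.
subtable <- function(tbl, subnames) {
  tblarray <- unclass(tbl)
  dcargs <- list(tblarray)
  ndims <- length(subnames)
  for (i in 1:ndims) {
    dcargs[[i + 1]] <- subnames[[i]]
  }
  subarray <- do.call("[", dcargs)
  dims <- lapply(subnames, length)
  subtbl <- array(subarray, dims, dimnames = subnames)
  class(subtbl) <- "table"
  return(subtbl)
}

ct <- data.frame(
  Vote.for.X = factor(c("Yes", "Yes", "No", "Not Sure", "No"),
                      levels = c("Yes", "No", "Not Sure")),
  Vote.for.X.Last.Time = factor(c("Yes", "No", "No", "Yes", "No"),
                                levels = c("Yes", "No"))
)
cttab <- table(ct)

# Keep only the "No" and "Yes" levels in both dimensions, in that order.
st <- subtable(cttab, list(Vote.for.X = c("No", "Yes"),
                           Vote.for.X.Last.Time = c("No", "Yes")))
print(st)
print(class(st))
```

The names in subnames become the dimnames of the result, so they should match the dimension names of the original table.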
> as.data.frame(cttab)
  Vote.for.X Vote.for.X.Last.Time Freq
1        Yes                  Yes    1
2         No                  Yes    0
3   Not Sure                  Yes    1
4        Yes                   No    1
5         No                   No    2
6   Not Sure                   No    0
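The data frame form has one row per cell and a Freq column holding the count. As a hedged illustration of how the two representations relate, the sketch below converts the table to a data frame and back using the base R function xtabs; the names df and rebuilt are illustrative only.

```r
# Rebuild the example data so this snippet is self-contained.
ct <- data.frame(
  Vote.for.X = factor(c("Yes", "Yes", "No", "Not Sure", "No"),
                      levels = c("Yes", "No", "Not Sure")),
  Vote.for.X.Last.Time = factor(c("Yes", "No", "No", "Yes", "No"),
                                levels = c("Yes", "No"))
)
cttab <- table(ct)

# Long format: one row per combination of levels, with its count in Freq.
df <- as.data.frame(cttab)

# Back to a contingency table: sum Freq over each combination of the factors.
rebuilt <- xtabs(Freq ~ Vote.for.X + Vote.for.X.Last.Time, data = df)

print(df)
print(rebuilt)
```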
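tabdom finds the k most frequent cells of a table:
tabdom <- function(tbl, k) {
  # One row per cell, with the count in the Freq column
  tbldf <- as.data.frame(tbl)
  # Row order that sorts the cells by decreasing frequency
  freqord <- order(tbldf$Freq, decreasing=TRUE)
  # Keep the first k rows of the sorted data frame
  dom <- tbldf[freqord, ][1:k, ]
  return(dom)
}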
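A usage sketch for tabdom follows. The function is restated here, selecting the first k rows with [1:k, ], so that the snippet runs on its own; k = 2 is an arbitrary choice for illustration.

```r
# tabdom restated so this snippet is self-contained.
tabdom <- function(tbl, k) {
  tbldf <- as.data.frame(tbl)
  freqord <- order(tbldf$Freq, decreasing = TRUE)
  dom <- tbldf[freqord, ][1:k, ]
  return(dom)
}

ct <- data.frame(
  Vote.for.X = factor(c("Yes", "Yes", "No", "Not Sure", "No"),
                      levels = c("Yes", "No", "Not Sure")),
  Vote.for.X.Last.Time = factor(c("Yes", "No", "No", "Yes", "No"),
                                levels = c("Yes", "No"))
)
cttab <- table(ct)

# The two most frequent cells; the top one is ("No", "No") with a count of 2.
dom <- tabdom(cttab, 2)
print(dom)
```

Several cells tie with a count of 1, so which of them appears in second place depends on how order() breaks ties.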
Note: see also the aggregate() and cut() functions.
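To illustrate how these two functions relate to tables, the sketch below bins a numeric vector with cut() so it can be tabulated, then uses aggregate() to compute a per-group mean. The data (ages, income) and the break points are made up for illustration and do not come from the example above.

```r
# Hypothetical data, invented for this illustration.
ages   <- c(23, 35, 47, 51, 62, 38, 29)
income <- c(30, 40, 50, 60, 70, 45, 35)

# cut() turns a numeric vector into a factor of intervals.
# With the default right = TRUE the bins are (20,40], (40,60], (60,80].
age_group <- cut(ages, breaks = c(20, 40, 60, 80))

# The resulting factor can be tabulated like any other.
counts <- table(age_group)
print(counts)

# aggregate() applies a summary function within each group.
people <- data.frame(age_group = age_group, income = income)
mean_income <- aggregate(income ~ age_group, data = people, FUN = mean)
print(mean_income)
```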