R Basics: Converting Between Case-Level Data Frames, Summary Data Frames, and Contingency Tables

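My R fundamentals are not especially strong, and I am still building experience through research work.
This note summarizes how to convert among case-level data frames, summary data frames, and contingency tables.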

Concepts

R's built-in datasets are used here to show what a "case-level data frame", a "summary data frame", and a "contingency table" are.

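  • Case-level data frame
# Concepts
# Built-in R datasets are used to illustrate each concept.
# A case-level data frame is an ordinary data.frame with one row per observation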
class(iris)
# [1] "data.frame"
head(iris)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa
  • Contingency table
# A contingency table is an object of class table
class(Titanic)
# [1] "table"
Titanic  # print the full four-way table
# , , Age = Child, Survived = No
# 
# Sex
# Class  Male Female
# 1st     0      0
# 2nd     0      0
# 3rd    35     17
# Crew    0      0
# 
# , , Age = Adult, Survived = No
# 
# Sex
# Class  Male Female
# 1st   118      4
# 2nd   154     13
# 3rd   387     89
# Crew  670      3
# 
# , , Age = Child, Survived = Yes
# 
# Sex
# Class  Male Female
# 1st     5      1
# 2nd    11     13
# 3rd    13     14
# Crew    0      0
# 
# , , Age = Adult, Survived = Yes
# 
# Sex
# Class  Male Female
# 1st    57    140
# 2nd    14     80
# 3rd    75     76
# Crew  192     20
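  • Summary data frame
# A summary data frame is a data.frame with one row per cell and the counts in a Freq column;
# it can be obtained directly from a contingency table with as.data.frame()
titanic_df <- as.data.frame(Titanic)  # avoid naming the object t, which masks the transpose function
class(titanic_df)
# [1] "data.frame"
head(titanic_df)
#   Class    Sex   Age Survived Freq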
# 1   1st   Male Child       No    0
# 2   2nd   Male Child       No    0
# 3   3rd   Male Child       No   35
# 4  Crew   Male Child       No    0
# 5   1st Female Child       No    0
# 6   2nd Female Child       No    0
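
To see the three shapes on something smaller than Titanic, here is a minimal sketch with a made-up five-row dataset. The `pets` data frame and its column names are hypothetical and only serve as an illustration.

```r
# Hypothetical toy data: one row per observed case
pets <- data.frame(
  species = c("cat", "dog", "cat", "dog", "cat"),
  indoor  = c("yes", "yes", "no", "yes", "yes")
)

# Case-level data frame -> contingency table
pets_tab <- table(pets)
class(pets_tab)

# Contingency table -> summary data frame (one row per cell, counts in Freq)
pets_sum <- as.data.frame(pets_tab)
class(pets_sum)

# The cell counts add up to the number of cases
sum(pets_sum$Freq) == nrow(pets)
```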

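Data Preparation

Build a dataset (a case-level data frame). Every column must be a categorical variable, since contingency tables are only meaningful for categorical variables.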

# Build a dataset (case-level data frame); A, B, and C must all be categorical
A <- c(rep("male",15),rep("female",20),rep("male",15)) # create variable A
B <- c(rep("healthy",4),rep("sick",35),rep("healthy",11)) # create variable B
C <- c(rep("smoker",26), rep("nonsmoker",24)) # create variable C
mydata <- data.frame(A,B,C) # build the data frame from the variables created above
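
Since R 4.0.0, `data.frame()` keeps strings as character columns by default. `table()` still tabulates them, but converting to factors makes the categorical nature explicit and lets you control the order of the levels. The sketch below works on a copy named `mydata_f` (a name introduced here for illustration) so the original `mydata` is left unchanged.

```r
# Rebuild the example data so this block runs on its own
A <- c(rep("male", 15), rep("female", 20), rep("male", 15))
B <- c(rep("healthy", 4), rep("sick", 35), rep("healthy", 11))
C <- c(rep("smoker", 26), rep("nonsmoker", 24))
mydata <- data.frame(A, B, C)

# Work on a copy and convert every column to a factor
mydata_f <- mydata
mydata_f[] <- lapply(mydata_f, factor)
sapply(mydata_f, is.factor)

# Optional: set the level order so "male" is listed before "female"
mydata_f$A <- factor(mydata_f$A, levels = c("male", "female"))
levels(mydata_f$A)
table(mydata_f$A)
```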

(Case-Level Data Frame) to Contingency Table

# (Case-level data frame) to contingency table
head(mydata)
#   A       B      C
# 1 male healthy smoker
# 2 male healthy smoker
# 3 male healthy smoker
# 4 male healthy smoker
# 5 male    sick smoker
# 6 male    sick smoker
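##### Two-way tables (rows first, then columns)
table(mydata$A, mydata$B)   # the dimension names A and B are not shown
with(mydata, table(A, B))   # with() evaluates table(A, B) using the columns of mydata
ftable(mydata, row.vars = 'A', col.vars = 'B')  # row.vars and col.vars are optional arguments
xtabs(~ A + B, mydata)
library(gmodels)  # provides CrossTable(); install the gmodels package first if needed
CrossTable(mydata$A, mydata$B, prop.c = FALSE, prop.r = FALSE, prop.t = FALSE, prop.chisq = FALSE, format = 'SPSS')
##### Tables with more than two dimensions
table(mydata)                 # shown as a series of two-way layers
with(mydata, table(A, B, C))  # shown as a series of two-way layers
ftable(mydata)                # flat layout: n-1 row variables by 1 column variable
xtabs(~ A + B + C, mydata)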
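
As a quick sanity check, the following sketch confirms that `table()` and `xtabs()` produce the same counts for the example data, and shows two base R helpers, `addmargins()` and `prop.table()`, that are often applied to such tables. The object names `tab_ab` and `xtab_ab` are introduced here for illustration.

```r
# Rebuild the example data so this block runs on its own
A <- c(rep("male", 15), rep("female", 20), rep("male", 15))
B <- c(rep("healthy", 4), rep("sick", 35), rep("healthy", 11))
C <- c(rep("smoker", 26), rep("nonsmoker", 24))
mydata <- data.frame(A, B, C)

tab_ab  <- with(mydata, table(A, B))
xtab_ab <- xtabs(~ A + B, data = mydata)

# Both approaches give the same counts, cell by cell
all(tab_ab == xtab_ab)

# Add row and column totals
addmargins(tab_ab)

# Row proportions (each row sums to 1)
prop.table(tab_ab, margin = 1)
```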

Contingency Table to [Summary Data Frame]

# Contingency table to [summary data frame]
# Prepare the data
crosstable <- table(mydata)
crosstable
# , , C = nonsmoker
# 
# B
# A        healthy sick
# female       0    9
# male        11    4
# 
# , , C = smoker
# 
# B
# A        healthy sick
# female       0   11
# male         4   11
as.data.frame(crosstable)
ftable(crosstable, row.vars = 'A')  # same as ftable(mydata, row.vars = 'A')
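
The conversion can be adjusted slightly. The sketch below uses the `responseName` argument of `as.data.frame()` for tables to rename the count column, and then drops empty cells. The column name `n` and the object `sumdf_nonzero` are arbitrary choices made for this example.

```r
# Rebuild the example data so this block runs on its own
A <- c(rep("male", 15), rep("female", 20), rep("male", 15))
B <- c(rep("healthy", 4), rep("sick", 35), rep("healthy", 11))
C <- c(rep("smoker", 26), rep("nonsmoker", 24))
mydata <- data.frame(A, B, C)
crosstable <- table(mydata)

# Name the count column "n" instead of the default "Freq"
sumdf <- as.data.frame(crosstable, responseName = "n")
names(sumdf)
nrow(sumdf)   # one row per cell of the 2 x 2 x 2 table

# Keep only the combinations that actually occur in the data
sumdf_nonzero <- sumdf[sumdf$n > 0, ]
nrow(sumdf_nonzero)
```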

[Summary Data Frame] to Contingency Table

# [Summary data frame] to contingency table
sumdf <- as.data.frame(crosstable)
head(sumdf)
#   A       B         C Freq
# 1 female healthy nonsmoker    0
# 2   male healthy nonsmoker   11
# 3 female    sick nonsmoker    9
# 4   male    sick nonsmoker    4
# 5 female healthy    smoker    0
# 6   male healthy    smoker    4
xtabs(Freq ~ A + B, sumdf)  # not the same as xtabs(~ A + B, sumdf), which only counts rows of sumdf; equivalent to xtabs(~ A + B, mydata)
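
The difference between the two formulas is easy to miss, so here is a small sketch that compares them directly. With `Freq` on the left-hand side, `xtabs()` sums the counts; without it, `xtabs()` simply counts rows of the summary data frame. The names `weighted`, `unweighted`, and `from_cases` are introduced here for illustration.

```r
# Rebuild the example data so this block runs on its own
A <- c(rep("male", 15), rep("female", 20), rep("male", 15))
B <- c(rep("healthy", 4), rep("sick", 35), rep("healthy", 11))
C <- c(rep("smoker", 26), rep("nonsmoker", 24))
mydata <- data.frame(A, B, C)
sumdf <- as.data.frame(table(mydata))

# Sums Freq within each A x B cell: the real counts
weighted <- xtabs(Freq ~ A + B, data = sumdf)

# Counts rows of sumdf: every A x B cell appears once per level of C
unweighted <- xtabs(~ A + B, data = sumdf)

# Reference: counts taken straight from the case-level data
from_cases <- xtabs(~ A + B, data = mydata)

all(weighted == from_cases)   # the weighted table matches the case-level counts
all(unweighted == 2)          # the unweighted table only reflects the two levels of C
```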

Summary

Conversion relationships and the functions used
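
The conversions can be chained into a round trip. The sketch below also includes the step from a summary data frame back to a case-level data frame, which is done here by repeating each row `Freq` times with `rep()`. This is one common base R approach, not the only one, and it does not restore the original row order. The names `cases2`, `tab2`, and `idx` are introduced for illustration.

```r
# Rebuild the example data so this block runs on its own
A <- c(rep("male", 15), rep("female", 20), rep("male", 15))
B <- c(rep("healthy", 4), rep("sick", 35), rep("healthy", 11))
C <- c(rep("smoker", 26), rep("nonsmoker", 24))
mydata <- data.frame(A, B, C)

# Case-level data frame -> contingency table
tab <- table(mydata)

# Contingency table -> summary data frame
sumdf <- as.data.frame(tab)

# Summary data frame -> contingency table
tab2 <- xtabs(Freq ~ A + B + C, data = sumdf)

# Summary data frame -> case-level data frame:
# repeat each row index Freq times, then drop the Freq column
idx <- rep(seq_len(nrow(sumdf)), times = sumdf$Freq)
cases2 <- sumdf[idx, c("A", "B", "C")]
rownames(cases2) <- NULL

nrow(cases2)                # same number of cases as mydata
all(tab2 == tab)            # the rebuilt table matches the original
all(table(cases2) == tab)   # and so does a table of the rebuilt cases
```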

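An Aside: Today's Small Trick

Reading a table from the clipboard in R

# Read a table from the clipboard (the 'clipboard' name works on Windows; on macOS use pipe('pbpaste'))
read.table('clipboard', header = TRUE)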
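
A rough cross-platform sketch is shown below. `read_clipboard_table()` is a hypothetical helper written for this example, not a base R function. It assumes `'clipboard'` on Windows, `pipe('pbpaste')` on macOS, and `'X11_clipboard'` elsewhere, and the last of these needs a running X11 session. Since clipboard contents cannot be reproduced here, the parsing step is demonstrated with the `text` argument of `read.table()` instead.

```r
# Hypothetical helper: pick a clipboard source based on the operating system
read_clipboard_table <- function(header = TRUE, ...) {
  sysname <- Sys.info()[["sysname"]]
  src <- switch(sysname,
    Windows = "clipboard",       # Windows clipboard
    Darwin  = pipe("pbpaste"),   # macOS: read via the pbpaste command
    "X11_clipboard"              # assumption: an X11 session is available
  )
  read.table(src, header = header, ...)
}

# Stand-in for clipboard contents, so the parsing can be tried anywhere
pasted <- "A B Freq\nmale healthy 15\nfemale sick 20"
df <- read.table(text = pasted, header = TRUE)
df
```

After copying a block of cells from a spreadsheet, calling `read_clipboard_table()` should return them as a data frame. Tab-separated data containing spaces may need `sep = "\t"` passed through.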
