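My R fundamentals are not particularly strong; I am still building experience through research work.
This post summarizes how to convert between case-level data frames, summary data frames, and cross-frequency tables.
Concepts
R's built-in datasets are used to illustrate what a "case-level data frame", a "summary data frame", and a "cross-frequency table" are.
- Case-level data frame
# Concepts
# R's built-in datasets are used to illustrate each concept.
# A case-level data frame is simply an object of class data.frame, with one row per observation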
class(iris)
# [1] "data.frame"
head(iris)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 2 4.9 3.0 1.4 0.2 setosa
# 3 4.7 3.2 1.3 0.2 setosa
# 4 4.6 3.1 1.5 0.2 setosa
# 5 5.0 3.6 1.4 0.2 setosa
# 6 5.4 3.9 1.7 0.4 setosa
- Cross-frequency table
# A cross-frequency table is an object of class table
class(Titanic)
# [1] "table"
head(Titanic)
# , , Age = Child, Survived = No
#
# Sex
# Class Male Female
# 1st 0 0
# 2nd 0 0
# 3rd 35 17
# Crew 0 0
#
# , , Age = Adult, Survived = No
#
# Sex
# Class Male Female
# 1st 118 4
# 2nd 154 13
# 3rd 387 89
# Crew 670 3
#
# , , Age = Child, Survived = Yes
#
# Sex
# Class Male Female
# 1st 5 1
# 2nd 11 13
# 3rd 13 14
# Crew 0 0
#
# , , Age = Adult, Survived = Yes
#
# Sex
# Class Male Female
# 1st 57 140
# 2nd 14 80
# 3rd 75 76
# Crew 192 20
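- Summary data frame
# A summary data frame is still a data frame; it can be obtained directly from a cross-frequency table with as.data.frame()
titanic_df <- as.data.frame(Titanic) # named titanic_df rather than t, to avoid masking base::t()
class(titanic_df)
# [1] "data.frame"
head(titanic_df)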
# Class Sex Age Survived Freq
# 1 1st Male Child No 0
# 2 2nd Male Child No 0
# 3 3rd Male Child No 35
# 4 Crew Male Child No 0
# 5 1st Female Child No 0
# 6 2nd Female Child No 0
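The relationship between the two forms can be checked directly: the summary data frame has one row per combination of factor levels, and its Freq column carries the cell counts of the table. A minimal sketch using the built-in Titanic table (the names `titanic_df` and `n_cells` are chosen here for illustration only):

```r
# Convert the 4 x 2 x 2 x 2 table to a summary data frame
titanic_df <- as.data.frame(Titanic)

# One row per cell of the table
n_cells <- prod(dim(Titanic))
nrow(titanic_df) == n_cells

# The Freq column holds the same counts as the table cells
sum(titanic_df$Freq) == sum(Titanic)
```

Both comparisons should return TRUE, since as.data.frame() only reshapes the counts and does not change them.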
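Data preparation
Build a dataset (a case-level data frame) in which every column is a categorical variable, since cross-frequency tables are only meaningful for categorical variables.
# Build a dataset (a case-level data frame); A, B, and C must all be categorical
A <- c(rep("male",15),rep("female",20),rep("male",15)) # create variable A
B <- c(rep("healthy",4),rep("sick",35),rep("healthy",11)) # create variable B
C <- c(rep("smoker",26), rep("nonsmoker",24)) # create variable C
mydata <- data.frame(A,B,C) # combine the variables created above into a data frame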
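Since R 4.0.0, data.frame() keeps character vectors as character columns instead of converting them to factors. table() and xtabs() accept both, but if explicit factors are preferred, a sketch like the following converts and checks the columns. The data is rebuilt here only so that the block runs on its own.

```r
# Rebuild the example data so this block is self-contained
A <- c(rep("male", 15), rep("female", 20), rep("male", 15))
B <- c(rep("healthy", 4), rep("sick", 35), rep("healthy", 11))
C <- c(rep("smoker", 26), rep("nonsmoker", 24))
mydata <- data.frame(A, B, C)

# Convert every column to a factor; mydata[] keeps the data.frame structure
mydata[] <- lapply(mydata, factor)

# Check that all columns are now categorical, and inspect their levels
all(vapply(mydata, is.factor, logical(1)))
lapply(mydata, levels)
```

Factor levels are sorted alphabetically by default, which is why "female" appears before "male" in the tables of this post.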
Extracting a cross-frequency table from a case-level data frame
# Extract a cross-frequency table from a case-level data frame
head(mydata)
# A B C
# 1 male healthy smoker
# 2 male healthy smoker
# 3 male healthy smoker
# 4 male healthy smoker
# 5 male sick smoker
# 6 male sick smoker
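##### Two-way tables (row variable first, then column variable)
table(mydata$A,mydata$B) # the variable names A and B are dropped from the output
with(mydata,table(A,B)) # same as table(A,B) on the columns of mydata; with() just evaluates an expression inside the data frame
ftable(mydata,row.vars = 'A',col.vars = 'B') # row.vars and col.vars are optional; if both are omitted, all variables are kept and the last one forms the columns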
xtabs(~A+B,mydata)
library(gmodels) # provides CrossTable(); run install.packages("gmodels") first if it is missing
CrossTable(mydata$A,mydata$B,prop.c = FALSE,prop.r = FALSE,prop.t = FALSE,prop.chisq = FALSE,format='SPSS')
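##### Tables with more than two dimensions
table(mydata) # higher-dimensional tables print as a stack of two-way layers
with(mydata,table(A,B,C)) # higher-dimensional tables print as a stack of two-way layers
ftable(mydata) # flat layout: the first n-1 variables form the rows, the last one forms the columns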
xtabs(~A+B+C,mydata)
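The base-R approaches above differ in how the result is labeled and printed, not in the counts. A small sketch that checks this for the two-way case and adds row and column totals with addmargins(); the object names `tab_table`, `tab_with`, and `tab_xtabs` are illustrative, and the data is rebuilt so that the block runs on its own.

```r
# Rebuild the example data so this block is self-contained
A <- c(rep("male", 15), rep("female", 20), rep("male", 15))
B <- c(rep("healthy", 4), rep("sick", 35), rep("healthy", 11))
C <- c(rep("smoker", 26), rep("nonsmoker", 24))
mydata <- data.frame(A, B, C)

# Three ways to build the same A x B table
tab_table <- table(mydata$A, mydata$B)
tab_with  <- with(mydata, table(A, B))
tab_xtabs <- xtabs(~ A + B, data = mydata)

# Compare the raw counts, ignoring the differences in dimension labels
all(as.vector(tab_table) == as.vector(tab_with))
all(as.vector(tab_with) == as.vector(tab_xtabs))

# Append row and column totals to the table
addmargins(tab_with)
```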
Converting a cross-frequency table to a summary data frame
# Convert a cross-frequency table to a summary data frame
# Prepare the data
crosstable <- table(mydata)
head(crosstable)
# , , C = nonsmoker
#
# B
# A healthy sick
# female 0 9
# male 11 4
#
# , , C = smoker
#
# B
# A healthy sick
# female 0 11
# male 4 11
as.data.frame(crosstable)
ftable(crosstable,row.vars = 'A') # same as ftable(mydata,row.vars='A')
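as.data.frame() on a table keeps every cell, including those with a count of zero, and names the count column Freq by default. The sketch below shows two small variations: renaming the count column through the responseName argument and dropping the empty cells. The column name `n` and the object name `nonzero` are arbitrary choices for illustration.

```r
# Rebuild the example data so this block is self-contained
A <- c(rep("male", 15), rep("female", 20), rep("male", 15))
B <- c(rep("healthy", 4), rep("sick", 35), rep("healthy", 11))
C <- c(rep("smoker", 26), rep("nonsmoker", 24))
mydata <- data.frame(A, B, C)
crosstable <- table(mydata)

# responseName sets the name of the count column (default "Freq")
sumdf <- as.data.frame(crosstable, responseName = "n")
nrow(sumdf) # one row per cell of the 2 x 2 x 2 table

# Keep only the level combinations that actually occur in the data
nonzero <- subset(sumdf, n > 0)
nonzero
```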
Extracting a cross-frequency table from a summary data frame
# Extract a cross-frequency table from a summary data frame
sumdf <- as.data.frame(crosstable)
head(sumdf)
# A B C Freq
# 1 female healthy nonsmoker 0
# 2 male healthy nonsmoker 11
# 3 female sick nonsmoker 9
# 4 male sick nonsmoker 4
# 5 female healthy smoker 0
# 6 male healthy smoker 4
xtabs(Freq~A+B,sumdf) # differs from xtabs(~A+B, sumdf), which only counts the rows of sumdf; equivalent to xtabs(~A+B,mydata)
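The remaining direction, from a summary data frame back to a case-level data frame, can be sketched by repeating each row as many times as its Freq value. This is one common base-R approach rather than the only one; the names `idx` and `casedf` are illustrative, and the data is rebuilt so that the block runs on its own.

```r
# Rebuild the example data so this block is self-contained
A <- c(rep("male", 15), rep("female", 20), rep("male", 15))
B <- c(rep("healthy", 4), rep("sick", 35), rep("healthy", 11))
C <- c(rep("smoker", 26), rep("nonsmoker", 24))
mydata <- data.frame(A, B, C)
sumdf <- as.data.frame(table(mydata))

# Without a left-hand side, xtabs() counts rows of sumdf, not cases
rows_only <- xtabs(~ A + B, data = sumdf)

# Repeat each row index Freq times, then drop the Freq column
idx <- rep(seq_len(nrow(sumdf)), times = sumdf$Freq)
casedf <- sumdf[idx, c("A", "B", "C")]
rownames(casedf) <- NULL

# Tabulating the rebuilt cases should reproduce the original counts
all(as.vector(table(casedf)) == as.vector(table(mydata)))
```

The rebuilt rows are ordered by cell rather than in the original order of mydata, which does not matter for tabulation.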
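Summary
An aside: a small tip for today
Reading a table from the clipboard in R
# Read a table from the clipboard in R
read.table('clipboard', header=TRUE) # 'clipboard' works on Windows; on macOS, read.table(pipe('pbpaste'), header=TRUE) is the usual equivalent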