reshape2 is a powerful R package for reshaping data.
It has two main functions: melt and *cast (acast and dcast).
### S3 method for class 'data.frame'
melt(data, id.vars, measure.vars,
  variable.name = "variable", ..., na.rm = FALSE, value.name = "value",
  factorsAsStrings = TRUE)
### Default(vector) S3 method:
melt(data, ..., na.rm = FALSE, value.name = "value")
### S3 method for class 'list'
melt(data, ..., level = 1)
### S3 method for class 'array', 'table', 'matrix'
melt(data, varnames = names(dimnames(data)), ...,
na.rm = FALSE, as.is = FALSE, value.name = "value")
###data.frame###
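library(reshape2)
names(airquality) <- tolower(names(airquality))  # the examples below use lowercase column names
head(airquality)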
ozone solar.r wind temp month day
41 190 7.4 67 5 1
36 118 8.0 72 5 2
12 149 12.6 74 5 3
18 313 11.5 62 5 4
NA NA 14.3 56 5 5
melt(airquality, id=c("month", "day"))
month day variable value
5 1 ozone 41
5 2 ozone 36
5 3 ozone 12
5 4 ozone 18
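The data.frame method also takes measure.vars, variable.name and value.name. Below is a minimal sketch, assuming only ozone and temp are wanted in long form. The column names measure and reading are illustrative choices, not reshape2 defaults, and aq is just a local copy of airquality.

```r
library(reshape2)

# Work on a copy of the built-in data set with lowercase column names
aq <- airquality
names(aq) <- tolower(names(aq))

# Melt only two measured columns and give the new columns custom names
# ("measure" and "reading" are illustrative names)
long <- melt(aq,
             id.vars = c("month", "day"),
             measure.vars = c("ozone", "temp"),
             variable.name = "measure",
             value.name = "reading",
             na.rm = TRUE)   # rows whose value is NA are dropped

str(long)
```

Columns not listed in id.vars or measure.vars (solar.r and wind here) are left out of the result.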
###matrix,array,list###
a <- array(c(1:8,NA), c(3,3))
a
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 NA
melt(a)
Var1 Var2 value
1 1 1 1
2 2 1 2
3 3 1 3
4 1 2 4
5 2 2 5
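# Var1 and Var2 are the indices of value; a 3-D array works the same way (with an extra Var3 column); for a list, an L1 column is added that records the position in the list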
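The 3-D array and list cases described above can be sketched as follows. The objects arr and lst are made-up inputs for illustration only.

```r
library(reshape2)

# A 3-D array: melt() returns one index column per dimension plus value
arr <- array(1:24, dim = c(2, 3, 4))
arr_long <- melt(arr)
names(arr_long)

# A named list of vectors: L1 records which list element each value came from
lst <- list(a = 1:2, b = 3:5)
lst_long <- melt(lst)
lst_long
```

Because lst is named, L1 holds the element names; for an unnamed list it would hold the element positions instead.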
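acast and dcast differ only in what they return: acast returns a vector/matrix/array, while dcast returns a data.frame. In the formula argument, . stands for no variable (for example, month ~ . has no column variable), and ... stands for all other variables not used elsewhere in the formula.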
*cast(data, formula, fun.aggregate = NULL, ..., margins = NULL,
subset = NULL, fill = NULL, drop = TRUE,
value.var = guess_value(data))
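##Arguments##
formula is the casting formula: each side of ~ names the variables that define one dimension of the output (rows ~ columns, or x ~ y ~ z for acast).
fun.aggregate is the aggregation function. It is needed when the variables in the formula do not identify a single observation for each output cell; if it is omitted in that case, it defaults to length.
subset is a filtering rule applied before reshaping. It is a quoted expression written with .() from the plyr package.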
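To make the two special formula symbols concrete, here is a small sketch. It rebuilds the molten airquality data under the local names aq and aqm so that it runs on its own, and passes value.var explicitly to avoid the guessing message.

```r
library(reshape2)

aq <- airquality
names(aq) <- tolower(names(aq))
aqm <- melt(aq, id.vars = c("month", "day"), na.rm = TRUE)

# "." on the right-hand side: no column variable, so all values of a month
# are aggregated into a single column (which is named ".")
by_month <- dcast(aqm, month ~ ., fun.aggregate = mean, value.var = "value")

# "..." on the left-hand side: every variable not otherwise mentioned in the
# formula (month and day here), which undoes the melt
wide <- dcast(aqm, ... ~ variable, value.var = "value")

head(by_month)
head(wide)
```

The second call gives back one row per month and day, with NA in the cells that na.rm = TRUE removed during melting.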
aqm <- melt(airquality, id=c("month", "day"), na.rm=TRUE)
head(aqm)
month day variable value
1 5 1 ozone 41
2 5 2 ozone 36
3 5 3 ozone 12
4 5 4 ozone 18
acast(aqm, day ~ month ~ variable) ## split the data by day, month and variable; returns a 3-D array
, , ozone
5 6 7 8 9
1 41 NA 135 39 96
2 36 NA 49 9 78
, , solar.r
5 6 7 8 9
1 190 286 269 83 167
2 118 287 248 24 197
, , wind
5 6 7 8 9
1 7.4 8.6 4.1 6.9 6.9
2 8.0 9.7 9.2 13.8 5.1
, , temp
5 6 7 8 9
1 67 78 84 81 91
2 72 74 85 81 92
## In the result, rows (X) are day, columns (Y) are month, and slices (Z) are variable ##
acast(aqm, formula=month ~ variable, fun.aggregate=mean) # split by month and variable, then take the mean of each cell
ozone solar.r wind temp
5 23.61538 181.2963 11.622581 65.54839
6 29.44444 190.1667 10.266667 79.10000
······
## Row names are month, column names are variable; the result is a matrix
dcast(aqm, month ~ variable, mean, margins = TRUE)
month ozone solar.r wind temp (all)
1 5 23.61538 181.2963 11.622581 65.54839 68.70696
2 6 29.44444 190.1667 10.266667 79.10000 87.38384
······
6 (all) 42.12931 185.9315 9.957516 77.88235 80.05722
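## Rows are month, columns are variable; the result is a data frame. margins = TRUE adds an (all) row and an (all) column that aggregate over the whole data.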
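One detail worth checking is what the (all) cells contain. The sketch below, which rebuilds aqm locally so that it is self-contained, compares them with means taken directly over the molten values. It suggests that a margin is the aggregate of all underlying values rather than a mean of the per-variable means.

```r
library(reshape2)

aq <- airquality
names(aq) <- tolower(names(aq))
aqm <- melt(aq, id.vars = c("month", "day"), na.rm = TRUE)

wide <- dcast(aqm, month ~ variable, fun.aggregate = mean,
              margins = TRUE, value.var = "value")

# Margin cell for month 5, and the grand margin in the bottom-right corner
may_all   <- wide[wide$month == "5", "(all)"]
grand_all <- wide[wide$month == "(all)", "(all)"]

# The same numbers computed directly from the molten values
all.equal(may_all, mean(aqm$value[aqm$month == 5]))
all.equal(grand_all, mean(aqm$value))
```

Since the four variables are on different scales, the (all) column is rarely meaningful for this data set; margins are more useful when all molten values share one unit.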
library(plyr)
acast(aqm, variable ~ month, mean, subset = .(variable == "ozone"))
5 6 7 8 9
ozone 23.61538 29.44444 59.11538 59.96154 31.44828
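If loading plyr just for .() is not wanted, a similar result can be obtained by filtering the molten data before casting. This is a sketch of that alternative, not a description of how subset works internally; the names ozone_only and ozone_by_month are illustrative.

```r
library(reshape2)

aq <- airquality
names(aq) <- tolower(names(aq))
aqm <- melt(aq, id.vars = c("month", "day"), na.rm = TRUE)

# Keep only the ozone rows and drop the now-unused levels of `variable`
ozone_only <- droplevels(aqm[aqm$variable == "ozone", ])

# Cast the filtered data: one row (ozone), one column per month
ozone_by_month <- acast(ozone_only, variable ~ month,
                        fun.aggregate = mean, value.var = "value")
ozone_by_month
```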
colsplit(string, pattern, names)
##example###
x <- c("a_1", "a_2", "b_2", "c_3")
x
[1] "a_1" "a_2" "b_2" "c_3"
vars <- colsplit(x, "_", c("trt", "time"))
vars
trt time
1 a 1
2 a 2
3 b 2
4 c 3
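A typical use of colsplit is to unpack column names that encode more than one variable after melting. The sketch below uses a made-up data frame (wide, with columns named as treatment_time) purely for illustration.

```r
library(reshape2)

# Hypothetical wide data: measurement columns are named "<trt>_<time>"
wide <- data.frame(id  = 1:2,
                   a_1 = c(0.5, 0.7),
                   a_2 = c(0.6, 0.9),
                   b_2 = c(1.1, 1.3))

long <- melt(wide, id.vars = "id")

# Split the encoded names on "_" and attach the two pieces as new columns
parts <- colsplit(as.character(long$variable), "_", c("trt", "time"))
long <- cbind(long, parts)
long
```

The pattern argument is a regular expression, so characters with a special meaning (such as ".") need escaping when they are used as the separator.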