When using tidyr you may also need the dplyr package.
Usage of gather:
gather(data, key, value, ..., na.rm = FALSE, convert = FALSE,
factor_key = FALSE)
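data is the data to process; key is the name of the new column that will hold the former column names; value is the name of the new column that will hold their values. ... selects the columns to gather. In the example below a variable stock is created, and time must stay as it is rather than being gathered into the values, so the selection is written as X:Z or -time.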
> stocks <- data.frame(
+   time = as.Date("2009-01-01") + 0:9,
+   X = rnorm(10, 0, 1),
+   Y = rnorm(10, 0, 2),
+   Z = rnorm(10, 0, 4)
+ )
> stocks
         time           X          Y          Z
1  2009-01-01 -1.06024371 -1.3799111  1.2076349
2  2009-01-02 -0.49010731 -2.4870899  3.7888550
3  2009-01-03  1.15709245 -1.5708653  2.2164464
4  2009-01-04 -0.52049101  1.9813593  2.0094476
5  2009-01-05  0.60130291 -2.6300865 -3.9681798
6  2009-01-06 -0.09306031 -0.8420800  0.6704426
7  2009-01-07 -0.83323255 -1.4147638  2.3398504
8  2009-01-08 -0.15517432  1.8773903  1.6437630
9  2009-01-09  0.36519166  0.3696238  1.5746588
10 2009-01-10  0.09442596 -0.5840472 -1.0298875
> stocks %>% gather(stock, price, -time)
         time stock       price
1  2009-01-01     X -1.06024371
2  2009-01-02     X -0.49010731
3  2009-01-03     X  1.15709245
4  2009-01-04     X -0.52049101
5  2009-01-05     X  0.60130291
6  2009-01-06     X -0.09306031
7  2009-01-07     X -0.83323255
8  2009-01-08     X -0.15517432
9  2009-01-09     X  0.36519166
10 2009-01-10     X  0.09442596
11 2009-01-01     Y -1.37991110
12 2009-01-02     Y -2.48708993
13 2009-01-03     Y -1.57086533
14 2009-01-04     Y  1.98135935
15 2009-01-05     Y -2.63008648
16 2009-01-06     Y -0.84208003
17 2009-01-07     Y -1.41476385
18 2009-01-08     Y  1.87739029
19 2009-01-09     Y  0.36962382
20 2009-01-10     Y -0.58404720
21 2009-01-01     Z  1.20763492
22 2009-01-02     Z  3.78885500
23 2009-01-03     Z  2.21644638
24 2009-01-04     Z  2.00944760
25 2009-01-05     Z -3.96817978
26 2009-01-06     Z  0.67044262
27 2009-01-07     Z  2.33985039
28 2009-01-08     Z  1.64376300
29 2009-01-09     Z  1.57465875
30 2009-01-10     Z -1.02988745
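gather_ is used the same way, except that it takes column names as strings, which makes it better suited to programmatic use.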
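The two ways of selecting columns mentioned above (X:Z and -time) can be compared directly. The following is a minimal sketch, assuming tidyr is installed; the fixed seed and the names by_range and by_exclusion are illustrative additions, not part of the original example.

```r
library(tidyr)

set.seed(1)  # assumption: a fixed seed, only so the sketch is reproducible
stocks <- data.frame(
  time = as.Date("2009-01-01") + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

# Select the columns to gather by range ...
by_range <- gather(stocks, stock, price, X:Z)

# ... or by excluding the column that should stay untouched.
by_exclusion <- gather(stocks, stock, price, -time)

# Both selections describe the same three columns.
print(all.equal(by_range, by_exclusion))
print(dim(by_range))  # 10 dates x 3 stocks = 30 rows; columns time, stock, price
```

Excluding with -time tends to be more convenient when there are many value columns and only a few identifier columns.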
Usage of separate:
separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE,
convert = FALSE, extra = "warn", fill = "warn", ...)
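data is the data to process; col is the name of the column to split; into gives the names of the new columns to create; sep is a regular expression that marks where to split; extra controls what happens when a string yields too many pieces; fill controls what happens when it yields too few.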
> df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))
> df %>% separate(x, c("A", "B"))
     A    B
1 <NA> <NA>
2    a    b
3    a    d
4    b    c
> df <- data.frame(x = c("a", "a b", "a b c", NA))
> df
      x
1     a
2   a b
3 a b c
4  <NA>
> df %>% separate(x, c("a", "b"), extra = "merge", fill = "left")
     a    b
1 <NA>    a
2    a    b
3    a  b c
4 <NA> <NA>
> df <- data.frame(x = c("x: 123", "y: error: 7"))
> df
            x
1      x: 123
2 y: error: 7
> df %>% separate(x, c("key", "value"), sep = ":", extra = "merge")
  key     value
1   x       123
2   y  error: 7
separate_ is used the same way, but is better suited to programmatic use.
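The extra and fill arguments also accept values other than the ones used above. As a minimal sketch (the variable name res is illustrative), extra = "drop" discards surplus pieces and fill = "right" pads short rows with NA on the right:

```r
library(tidyr)

df <- data.frame(x = c("a", "a b", "a b c", NA), stringsAsFactors = FALSE)

# extra = "drop": pieces beyond the second are discarded without a warning.
# fill = "right": rows with too few pieces get NA in the rightmost columns.
res <- separate(df, x, c("a", "b"), extra = "drop", fill = "right")
print(res)
```

Compared with the extra = "merge", fill = "left" call above, the third row loses its trailing "c" instead of keeping "b c", and the single-piece first row is padded on the right rather than the left.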
Usage of separate_rows:
separate_rows(data, ..., sep = "[^[:alnum:].]+", convert = FALSE)
> df <- data.frame(
+   x = 1:3,
+   y = c("a", "d,e,f", "g,h"),
+   z = c("1", "2,3,4", "5,6"),
+   stringsAsFactors = FALSE
+ )
> separate_rows(df, y, z, convert = TRUE)
  x y z
1 1 a 1
2 2 d 2
3 2 e 3
4 2 f 4
5 3 g 5
6 3 h 6
In the version of tidyr used here, separate_rows(df, c("y","z"), convert = T) is not accepted; to pass the column names as strings you have to use separate_rows_ instead.
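To make the effect of sep and convert more visible, here is a small sketch that passes the separator explicitly and inspects the resulting column types. The name long is illustrative, and sep = "," is a choice made for this sketch rather than the default shown in the signature.

```r
library(tidyr)

df <- data.frame(
  x = 1:3,
  y = c("a", "d,e,f", "g,h"),
  z = c("1", "2,3,4", "5,6"),
  stringsAsFactors = FALSE
)

# Split y and z on commas in parallel; each piece becomes its own row.
# convert = TRUE runs type conversion on the split columns,
# so z ends up as an integer column while y stays character.
long <- separate_rows(df, y, z, sep = ",", convert = TRUE)

print(long)
print(sapply(long, class))
```

Note that y and z must split into the same number of pieces within each row, as they do here.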
Usage of spread:
spread(data, key, value, fill = NA, convert = FALSE, drop = TRUE,
sep = NULL)
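data is the data to spread; key is the column whose values become the headers of the new columns; value is the column whose values fill the cells of those columns; fill is the value used where a key/value combination is missing.
Using the stocks example above (the numbers differ from the earlier printout, presumably because stocks was regenerated with fresh random draws):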
> stocksm <- stocks %>% gather(stock, price, -time)
> stocksm %>% spread(stock, price)
         time           X           Y          Z
1  2009-01-01 -1.04456577 -0.97179381  2.4330105
2  2009-01-02 -0.09717072 -0.77313257 -2.3443736
3  2009-01-03  1.36933739  1.54137383  0.6551325
4  2009-01-04  2.13655070 -0.05889974  1.9988306
5  2009-01-05  0.39546822  3.80644394  3.7166546
6  2009-01-06  0.09720381 -0.44658971  0.5136471
7  2009-01-07 -0.50775134 -2.53712365  0.2004835
8  2009-01-08 -1.65134456  1.01235639 10.4377141
9  2009-01-09 -1.62622446 -0.80074087  3.0444515
10 2009-01-10 -0.37272122  3.86510320  4.7424977
> stocksm %>% spread(time, price)
  stock 2009-01-01  2009-01-02 2009-01-03  2009-01-04 2009-01-05
1     X -1.0445658 -0.09717072  1.3693374  2.13655070  0.3954682
2     Y -0.9717938 -0.77313257  1.5413738 -0.05889974  3.8064439
3     Z  2.4330105 -2.34437356  0.6551325  1.99883061  3.7166546
   2009-01-06 2009-01-07 2009-01-08 2009-01-09 2009-01-10
1  0.09720381 -0.5077513  -1.651345 -1.6262245 -0.3727212
2 -0.44658971 -2.5371237   1.012356 -0.8007409  3.8651032
3  0.51364707  0.2004835  10.437714  3.0444515  4.7424977
spread_ is the variant intended for programmatic use.
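The stocks data has a value for every time/stock pair, so the fill argument never comes into play there. A minimal sketch with one missing combination shows what it does; the sales data frame and its column names are made up for illustration.

```r
library(tidyr)

# Hypothetical long-format data: store B has no row for "pear".
sales <- data.frame(
  store   = c("A", "A", "B"),
  product = c("apple", "pear", "apple"),
  units   = c(3, 5, 2),
  stringsAsFactors = FALSE
)

# Default fill = NA: the missing store/product cell becomes NA.
wide_na <- spread(sales, product, units)

# fill = 0: the missing cell is replaced by 0 instead.
wide_zero <- spread(sales, product, units, fill = 0)

print(wide_na)
print(wide_zero)
```

Choosing fill = 0 is reasonable only when an absent row really means "none"; otherwise keeping NA avoids inventing a measurement.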
Usage of unite:
unite(data, col, ..., sep = "_", remove = TRUE)
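data is the data; col is the name of the new column to create; ... are the columns to combine; sep is the separator placed between the combined values.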
> mtcars %>% unite(vs_am, vs, am)
                   mpg cyl  disp  hp drat    wt  qsec vs_am gear
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46   0_1    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02   0_1    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61   1_1    4
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44   1_0    3
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02   0_0    3
Valiant           18.1   6 225.0 105 2.76 3.460 20.22   1_0    3
unite_ can be used as well:
> mtcars %>% unite_("vs_am", c("vs","am"))
                   mpg cyl  disp  hp drat    wt  qsec vs_am gear
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46   0_1    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02   0_1    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61   1_1    4
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44   1_0    3
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02   0_0    3
Valiant           18.1   6 225.0 105 2.76 3.460 20.22   1_0    3
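The sep and remove arguments of unite are not exercised in the examples above. The sketch below, which assumes the built-in mtcars data and uses the illustrative name first_cars, joins the values with a hyphen and keeps the original columns:

```r
library(tidyr)

# Work on the first three rows only, to keep the printout short.
first_cars <- head(mtcars, 3)

# sep = "-" changes the separator from the default "_";
# remove = FALSE keeps the input columns vs and am next to the new one.
res <- unite(first_cars, "vs_am", vs, am, sep = "-", remove = FALSE)

print(res[, c("vs_am", "vs", "am")])
```

With the default remove = TRUE, vs and am would be dropped from the result, as in the printouts above.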