summarise() and mutate() are two functions in the plyr package.
summarise() function
Usage:
summarise(.data, ...)
Arguments:
.data
the data frame to be summarised
...
further arguments of the form var = value
mutate() function
Usage:
mutate(.data, ...)
Arguments:
.data
the data frame to transform
...
named parameters giving definitions of new columns.
Reference: http://xukuang.github.io/blog/2014/06/dots-in-plyr-of-r-packages/
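Both summarise() and mutate() can modify or summarise individual columns of a data frame (rather than the data frame as a whole). The main difference is in what they return: summarise() returns a data frame containing only the modified or summarised columns, whereas mutate() returns a data frame made up of the original data together with the modified or summarised columns. mutate() is similar to transform() from base R; the difference is that mutate() can compute on columns that were just created in the same call, whereas transform() can only compute on the original columns of the data.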
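The difference between mutate() and transform() described above can be sketched with a small example. The data frame df and the column names doubled and plus_one are made up for illustration and do not come from the original post; the sketch also assumes that no object named doubled exists in the calling environment.

```r
library(plyr)

# Hypothetical toy data, used only for this illustration
df <- data.frame(x = 1:3)

# mutate() evaluates its arguments in order, so the freshly created
# column `doubled` can be used when defining `plus_one`
m <- mutate(df, doubled = x * 2, plus_one = doubled + 1)
print(m)

# transform() evaluates every argument against the original columns only,
# so `doubled` is not visible here and the call fails; the error message
# is captured instead of stopping the script
res <- tryCatch(
  transform(df, doubled = x * 2, plus_one = doubled + 1),
  error = function(e) conditionMessage(e)
)
print(res)
```

With transform(), the same result would need two separate calls, one to create doubled and a second to create plus_one.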
Example:
require(plyr)
set.seed(1) # fix the random seed so that the same data frame is generated on every run
dfx <- data.frame(
group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
sex = sample(c("M", "F"), size = 29, replace = TRUE),
age = sample(20:30, size = 29, replace = TRUE),
worktime = sample(1:5, size = 29, replace = TRUE)
)
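### Modifying data
summarise(dfx, age = age + 1) # returns a data frame containing only the single column age
mutate(dfx, age = age + 1) # returns a 4-column data frame with the same columns as dfx, but with the age values updated
### Summarising data
summarise(dfx, mean.age = mean(age), sd.age = sd(age)) # returns a 2-column data frame containing only the summary results
mutate(dfx, mean.age = mean(age), sd.age = sd(age)) # returns a 6-column data frame: the 4 columns of dfx plus the 2 summary columns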
Output:
dfx
group sex age worktime
1 A M 21 5
2 A F 29 2
3 A M 28 1
4 A M 20 3
5 A F 23 3
6 A M 22 4
7 A M 25 3
8 A M 29 1
9 B F 29 4
10 B F 25 5
11 B M 23 1
12 B M 23 1
13 B M 29 4
14 B M 28 5
15 B M 26 5
16 B F 25 4
17 B F 28 5
18 B F 27 4
19 B F 28 4
20 B M 26 1
21 B M 27 5
22 B M 25 5
23 B M 29 1
24 C M 26 1
25 C M 22 3
26 C M 29 2
27 C F 25 2
28 C M 27 3
29 C M 21 2
summarise(dfx, age = age + 1)
age
1 22
2 30
3 29
4 21
5 24
6 23
7 26
8 30
9 30
10 26
11 24
12 24
13 30
14 29
15 27
16 26
17 29
18 28
19 29
20 27
21 28
22 26
23 30
24 27
25 23
26 30
27 26
28 28
29 22
mutate(dfx, age = age + 1)
group sex age worktime
1 A M 22 5
2 A F 30 2
3 A M 29 1
4 A M 21 3
5 A F 24 3
6 A M 23 4
7 A M 26 3
8 A M 30 1
9 B F 30 4
10 B F 26 5
11 B M 24 1
12 B M 24 1
13 B M 30 4
14 B M 29 5
15 B M 27 5
16 B F 26 4
17 B F 29 5
18 B F 28 4
19 B F 29 4
20 B M 27 1
21 B M 28 5
22 B M 26 5
23 B M 30 1
24 C M 27 1
25 C M 23 3
26 C M 30 2
27 C F 26 2
28 C M 28 3
29 C M 22 2
summarise(dfx, mean.age = mean(age), sd.age = sd(age))
mean.age sd.age
1 25.68966 2.804377
mutate(dfx, mean.age = mean(age), sd.age = sd(age))
group sex age worktime mean.age sd.age
1 A M 21 5 25.68966 2.804377
2 A F 29 2 25.68966 2.804377
3 A M 28 1 25.68966 2.804377
4 A M 20 3 25.68966 2.804377
5 A F 23 3 25.68966 2.804377
6 A M 22 4 25.68966 2.804377
7 A M 25 3 25.68966 2.804377
8 A M 29 1 25.68966 2.804377
9 B F 29 4 25.68966 2.804377
10 B F 25 5 25.68966 2.804377
11 B M 23 1 25.68966 2.804377
12 B M 23 1 25.68966 2.804377
13 B M 29 4 25.68966 2.804377
14 B M 28 5 25.68966 2.804377
15 B M 26 5 25.68966 2.804377
16 B F 25 4 25.68966 2.804377
17 B F 28 5 25.68966 2.804377
18 B F 27 4 25.68966 2.804377
19 B F 28 4 25.68966 2.804377
20 B M 26 1 25.68966 2.804377
21 B M 27 5 25.68966 2.804377
22 B M 25 5 25.68966 2.804377
23 B M 29 1 25.68966 2.804377
24 C M 26 1 25.68966 2.804377
25 C M 22 3 25.68966 2.804377
26 C M 29 2 25.68966 2.804377
27 C F 25 2 25.68966 2.804377
28 C M 27 3 25.68966 2.804377
29 C M 21 2 25.68966 2.804377
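The example above applies summarise() and mutate() to the whole data frame and leaves the group column unused. As a possible extension, not part of the original example, the same two functions can be passed to plyr's ddply() to work group by group. The sketch below uses a small made-up data frame (df, with hypothetical values) rather than dfx so that the results are easy to check by hand.

```r
library(plyr)

# Hypothetical toy data, used only for this illustration
df <- data.frame(
  group = c("A", "A", "B", "B", "B"),
  age   = c(20, 30, 21, 24, 27)
)

# summarise() inside ddply(): one row per group, holding only the summaries
by_group <- ddply(df, .(group), summarise,
                  mean.age = mean(age), n = length(age))
print(by_group)

# mutate() inside ddply(): all original rows are kept, and each row
# gets the mean age of its own group as an extra column
with_mean <- ddply(df, .(group), mutate, mean.age = mean(age))
print(with_mean)
```

Here by_group has one row for each of the two groups, while with_mean keeps all five rows, which mirrors the summarise()/mutate() contrast shown for the whole data frame.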