The mutate family in R


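First, load the required packages. mutate() lives in the dplyr package; here we simply load the whole tidyverse.
The tidyverse bundles packages for data wrangling and plotting. Load it as follows: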

> library(tidyverse)
-- Attaching packages ------------------------ tidyverse 1.3.0 --
√ ggplot2 3.3.3     √ purrr   0.3.4
√ tibble  3.0.5     √ dplyr   1.0.3
√ tidyr   1.1.2     √ stringr 1.4.0
√ readr   1.4.0     √ forcats 0.5.1
-- Conflicts --------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()

References
https://dplyr.tidyverse.org/reference/mutate_all.html
Textbook: R for Data Science


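The mutate function
The main job of mutate() is to add columns to a data frame. By default, mutate() appends new columns at the end of the dataset. A new column can be used as soon as it is created, even later in the same mutate() call.
A simple example: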

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
# Add a new column at the end
> mutate(iris, new_col = Petal.Length + Petal.Width) %>% head()
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species new_col
1          5.1         3.5          1.4         0.2  setosa     1.6
2          4.9         3.0          1.4         0.2  setosa     1.6
3          4.7         3.2          1.3         0.2  setosa     1.5
4          4.6         3.1          1.5         0.2  setosa     1.7
5          5.0         3.6          1.4         0.2  setosa     1.6
6          5.4         3.9          1.7         0.4  setosa     2.1
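
To illustrate that a new column can be used right away, here is a minimal sketch that builds a second column from new_col inside the same mutate() call. The column name ratio is a hypothetical name chosen for this illustration.

```r
library(dplyr)

# new_col is created first and then reused in the same mutate() call.
# 'ratio' is a hypothetical column name used only for this illustration.
res <- iris %>%
  mutate(
    new_col = Petal.Length + Petal.Width,
    ratio   = new_col / Sepal.Length
  )

head(res, 3)
```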

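PS: %>% is the pipe operator. It passes the result on its left to the function on its right, which avoids nested function calls and makes the code easier to read.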
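
As a small sketch of what the pipe saves you, the nested call and the piped call below should produce the same result. The variable names nested and piped are only illustrative.

```r
library(dplyr)

# Nested form: read from the inside out
nested <- head(mutate(iris, new_col = Petal.Length + Petal.Width))

# Piped form: read from top to bottom
piped <- iris %>%
  mutate(new_col = Petal.Length + Petal.Width) %>%
  head()

all.equal(nested, piped)
```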


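mutate() also has three scoped variants:
mutate_at(); mutate_if(); mutate_all()
The official documentation explains the three suffixes as follows:
_all: affects every variable
_at: affects variables selected with a character vector or vars()
_if: affects variables selected with a predicate function
In other words, _all applies to every column, _at applies to specific columns, and _if applies to the columns that satisfy a condition.
Their signatures are: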
mutate_all(.tbl, .funs, ...)
mutate_if(.tbl, .predicate, .funs, ...)
mutate_at(.tbl, .vars, .funs, ..., .cols = NULL)


Let's walk through the examples from the official documentation.
mutate_at

scale2 <- function(x, na.rm = FALSE) (x - mean(x, na.rm = na.rm)) / sd(x, na.rm)
starwars %>% mutate_at(c("height", "mass"), scale2)
# A tibble: 87 x 14
   name    height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>    <dbl> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
 1 Luke S~     NA    NA blond      fair       blue            19   male  mascu~
 2 C-3PO       NA    NA NA         gold       yellow         112   none  mascu~
 3 R2-D2       NA    NA NA         white, bl~ red             33   none  mascu~
 4 Darth ~     NA    NA none       white      yellow          41.9 male  mascu~
 5 Leia O~     NA    NA brown      light      brown           19   fema~ femin~
 6 Owen L~     NA    NA brown, gr~ light      blue            52   male  mascu~
 7 Beru W~     NA    NA brown      light      blue            47   fema~ femin~
 8 R5-D4       NA    NA NA         white, red red             NA   none  mascu~
 9 Biggs ~     NA    NA black      light      brown           24   male  mascu~
10 Obi-Wa~     NA    NA auburn, w~ fair       blue-gray       57   male  mascu~
# ... with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#   films <list>, vehicles <list>, starships <list>
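
The height and mass columns in the output above are all NA because both columns contain missing values and scale2() defaults to na.rm = FALSE. A minimal sketch of the fix, assuming you want missing values ignored, is to pass na.rm = TRUE after the function. The variable name scaled is only illustrative.

```r
library(dplyr)

scale2 <- function(x, na.rm = FALSE) (x - mean(x, na.rm = na.rm)) / sd(x, na.rm)

# Both columns contain missing values, so with the default na.rm = FALSE
# mean() and sd() return NA and the whole column becomes NA
anyNA(starwars$height)
anyNA(starwars$mass)

# Arguments given after the function are passed on to scale2()
scaled <- starwars %>% mutate_at(c("height", "mass"), scale2, na.rm = TRUE)

scaled %>% select(name, height, mass) %>% head(3)
```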

This applies scale2 to the height and mass columns.
The following two commands are equivalent:

starwars %>% mutate_at(c("height", "mass"), scale2)
starwars %>% mutate(across(c("height", "mass"), scale2))

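PS: across() lets a function "go across" the selected columns, that is, it applies one or more functions to several selected columns at once. Combined with mutate(), it does the job of mutate_at().
In mutate_at(), select the columns with vars() and supply the function either directly or as a formula-style lambda (~). The older funs() helper has been deprecated since dplyr 0.8.0.
For example:

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> mutate_at(iris, vars(-Species), ~ log(.)) %>% head()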
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1     1.629241    1.252763    0.3364722  -1.6094379  setosa
2     1.589235    1.098612    0.3364722  -1.6094379  setosa
3     1.547563    1.163151    0.2623643  -1.6094379  setosa
4     1.526056    1.131402    0.4054651  -1.6094379  setosa
5     1.609438    1.280934    0.3364722  -1.6094379  setosa
6     1.686399    1.360977    0.5306283  -0.9162907  setosa
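
The log transformation above can also be written with across(). The sketch below compares the two spellings on iris; the variable names old_style and new_style are only illustrative.

```r
library(dplyr)

# Scoped variant: every column except Species
old_style <- mutate_at(iris, vars(-Species), log)

# across() inside mutate(): the same selection, written with tidyselect syntax
new_style <- mutate(iris, across(-Species, log))

all.equal(old_style, new_style)
```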

mutate_if

starwars %>% mutate_if(is.numeric, scale2, na.rm = TRUE)
# A tibble: 87 x 14
   name        height    mass hair_color  skin_color eye_color birth_year sex  
   <chr>        <dbl>   <dbl> <chr>       <chr>      <chr>          <dbl> <chr>
 1 Luke Skyw~ -0.0678 -0.120  blond       fair       blue          -0.443 male 
 2 C-3PO      -0.212  -0.132  NA          gold       yellow         0.158 none 
 3 R2-D2      -2.25   -0.385  NA          white, bl~ red           -0.353 none 
 4 Darth Vad~  0.795   0.228  none        white      yellow        -0.295 male 
 5 Leia Orga~ -0.701  -0.285  brown       light      brown         -0.443 fema~
 6 Owen Lars   0.105   0.134  brown, grey light      blue          -0.230 male 
 7 Beru Whit~ -0.269  -0.132  brown       light      blue          -0.262 fema~
 8 R5-D4      -2.22   -0.385  NA          white, red red           NA     none 
 9 Biggs Dar~  0.249  -0.0786 black       light      brown         -0.411 male 
10 Obi-Wan K~  0.220  -0.120  auburn, wh~ fair       blue-gray     -0.198 male 
# ... with 77 more rows, and 6 more variables: gender <chr>, homeworld <chr>,
#   species <chr>, films <list>, vehicles <list>, starships <list>

Likewise, the following two lines are equivalent:

starwars %>% mutate_if(is.numeric, scale2, na.rm = TRUE)
starwars  %>% mutate(across(where(is.numeric), scale2, na.rm = TRUE))

where() picks out the numeric columns and across() applies the function to exactly those columns, which reproduces the behaviour of mutate_if().
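
To see which columns where(is.numeric) actually selects, and to check the equivalence on a small dataset, here is a sketch on iris. The variable names numeric_cols, via_if and via_across are only illustrative.

```r
library(dplyr)

scale2 <- function(x, na.rm = FALSE) (x - mean(x, na.rm = na.rm)) / sd(x, na.rm)

# where() takes a predicate and keeps the columns for which it returns TRUE
numeric_cols <- iris %>% select(where(is.numeric)) %>% names()
numeric_cols

via_if     <- iris %>% mutate_if(is.numeric, scale2)
via_across <- iris %>% mutate(across(where(is.numeric), scale2))

all.equal(via_if, via_across)
```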

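To apply several functions to the selected columns at once, pass them in a list(). When more than one function is supplied, new columns are created instead of the original columns being modified in place.
For example:

> head(iris)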
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> iris %>% mutate_if(is.numeric, list(scale2, log)) %>% head()
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_fn1
1          5.1         3.5          1.4         0.2  setosa       -0.8976739
2          4.9         3.0          1.4         0.2  setosa       -1.1392005
3          4.7         3.2          1.3         0.2  setosa       -1.3807271
4          4.6         3.1          1.5         0.2  setosa       -1.5014904
5          5.0         3.6          1.4         0.2  setosa       -1.0184372
6          5.4         3.9          1.7         0.4  setosa       -0.5353840
  Sepal.Width_fn1 Petal.Length_fn1 Petal.Width_fn1 Sepal.Length_fn2
1      1.01560199        -1.335752       -1.311052         1.629241
2     -0.13153881        -1.335752       -1.311052         1.589235
3      0.32731751        -1.392399       -1.311052         1.547563
4      0.09788935        -1.279104       -1.311052         1.526056
5      1.24503015        -1.335752       -1.311052         1.609438
6      1.93331463        -1.165809       -1.048667         1.686399
  Sepal.Width_fn2 Petal.Length_fn2 Petal.Width_fn2
1        1.252763        0.3364722      -1.6094379
2        1.098612        0.3364722      -1.6094379
3        1.163151        0.2623643      -1.6094379
4        1.131402        0.4054651      -1.6094379
5        1.280934        0.3364722      -1.6094379
6        1.360977        0.5306283      -0.9162907

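You can also name the functions in the list. Note that the column names of the resulting data frame differ from those above: each new column now carries the function's name as a suffix.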

> iris %>% mutate_if(is.numeric, list(scale = scale2, log = log)) %>% head()
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_scale
1          5.1         3.5          1.4         0.2  setosa         -0.8976739
2          4.9         3.0          1.4         0.2  setosa         -1.1392005
3          4.7         3.2          1.3         0.2  setosa         -1.3807271
4          4.6         3.1          1.5         0.2  setosa         -1.5014904
5          5.0         3.6          1.4         0.2  setosa         -1.0184372
6          5.4         3.9          1.7         0.4  setosa         -0.5353840
  Sepal.Width_scale Petal.Length_scale Petal.Width_scale Sepal.Length_log
1        1.01560199          -1.335752         -1.311052         1.629241
2       -0.13153881          -1.335752         -1.311052         1.589235
3        0.32731751          -1.392399         -1.311052         1.547563
4        0.09788935          -1.279104         -1.311052         1.526056
5        1.24503015          -1.335752         -1.311052         1.609438
6        1.93331463          -1.165809         -1.048667         1.686399
  Sepal.Width_log Petal.Length_log Petal.Width_log
1        1.252763        0.3364722      -1.6094379
2        1.098612        0.3364722      -1.6094379
3        1.163151        0.2623643      -1.6094379
4        1.131402        0.4054651      -1.6094379
5        1.280934        0.3364722      -1.6094379
6        1.360977        0.5306283      -0.9162907
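
A named list works the same way inside across(). The sketch below assumes a reasonably recent dplyr (1.0.1 or later), where the .names argument accepts the {.col} and {.fn} placeholders. Note that across() orders the new columns by input column rather than by function, so the column order differs from the mutate_if() output above. The variable names res and res2 are only illustrative.

```r
library(dplyr)

scale2 <- function(x, na.rm = FALSE) (x - mean(x, na.rm = na.rm)) / sd(x, na.rm)

# Default naming for a named list of functions is "{.col}_{.fn}"
res <- iris %>%
  mutate(across(where(is.numeric), list(scale = scale2, log = log)))
names(res)

# .names takes a glue pattern, here with the function name first
res2 <- iris %>%
  mutate(across(where(is.numeric), list(scale = scale2, log = log),
                .names = "{.fn}_{.col}"))
names(res2)
```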

mutate_all
The documentation page has few examples for mutate_all(), but by its description it applies the function to every variable.

> a = matrix(rep(1:5,each =10),10) %>% as.data.frame()
> a
   V1 V2 V3 V4 V5
1   1  2  3  4  5
2   1  2  3  4  5
3   1  2  3  4  5
4   1  2  3  4  5
5   1  2  3  4  5
6   1  2  3  4  5
7   1  2  3  4  5
8   1  2  3  4  5
9   1  2  3  4  5
10  1  2  3  4  5
> mutate_all(a, ~ sum(.))
   V1 V2 V3 V4 V5
1  10 20 30 40 50
2  10 20 30 40 50
3  10 20 30 40 50
4  10 20 30 40 50
5  10 20 30 40 50
6  10 20 30 40 50
7  10 20 30 40 50
8  10 20 30 40 50
9  10 20 30 40 50
10 10 20 30 40 50
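
The same result can be obtained with across(everything()). A minimal sketch, rebuilding the same data frame a as above; old_style and new_style are illustrative names:

```r
library(dplyr)

# 10 rows; column Vk holds the constant k
a <- as.data.frame(matrix(rep(1:5, each = 10), 10))

old_style <- mutate_all(a, sum)
new_style <- mutate(a, across(everything(), sum))

all.equal(old_style, new_style)
```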

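One more note:
For the .funs argument you can supply a function you wrote yourself, as in the examples above, wrap several functions in list(), or use a formula-style lambda of the form ~ fun(.).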


starwars %>% mutate_at(c("height", "mass"), ~ scale2(., na.rm = TRUE))
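
For comparison, the formula-style lambda is shorthand for an ordinary anonymous function. The sketch below checks that both spellings give the same columns; f1 and f2 are illustrative names.

```r
library(dplyr)

scale2 <- function(x, na.rm = FALSE) (x - mean(x, na.rm = na.rm)) / sd(x, na.rm)

# Formula-style lambda: '.' stands for the current column
f1 <- starwars %>%
  mutate_at(c("height", "mass"), ~ scale2(., na.rm = TRUE))

# Equivalent anonymous function
f2 <- starwars %>%
  mutate_at(c("height", "mass"), function(x) scale2(x, na.rm = TRUE))

all.equal(f1$height, f2$height)
all.equal(f1$mass, f2$mass)
```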

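Summary
Unlike plain mutate(), which adds new variables, the mutate variants mainly apply functions column by column. For group-wise or row-wise computations, combine them with group_by() or rowwise().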
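
A rough sketch of both cases on iris, assuming dplyr 1.0.0 or later for c_across(). The names centred, totals and row_total are hypothetical and chosen only for this illustration.

```r
library(dplyr)

# Group-wise: centre each numeric column within its species
centred <- iris %>%
  group_by(Species) %>%
  mutate(across(where(is.numeric), ~ . - mean(.))) %>%
  ungroup()

# Row-wise: add up the four measurements of each flower.
# 'row_total' is a hypothetical column name.
totals <- iris %>%
  rowwise() %>%
  mutate(row_total = sum(c_across(Sepal.Length:Petal.Width))) %>%
  ungroup()

head(totals, 3)
```

Calling ungroup() at the end drops the grouping or row-wise marker so that later steps operate on the whole data frame again.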
