A Guide to R with the tidyverse

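Introduction to the tidyverse

The tidyverse is a collection of commonly used packages, including ggplot2, dplyr, tibble, readr, and forcats. Together they cover data import, data cleaning, and visualization, and provide tools for working with factors, dates, and strings.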

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.1.1     v dplyr   1.0.6
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

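Reading data

Commonly used import functions are read_csv() from readr and read_excel() from the readxl package. Data read in with other functions can be converted to a tibble with as_tibble().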

# dat <- read_csv("iris.csv")
# dat

head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
iris<-iris %>% as_tibble()
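The import step described above can be sketched as a round trip through a temporary file. This is only an illustration: the file path is hypothetical (created with tempfile()), and the column specification is an assumption chosen so that Species comes back as a factor.

```r
library(readr)
library(tibble)

# Hypothetical input file: write the built-in iris data to a temporary CSV
csv_path <- tempfile(fileext = ".csv")
write_csv(iris, csv_path)

# read_csv() returns a tibble directly; col_types declares the column classes
dat <- read_csv(
  csv_path,
  col_types = cols(.default = col_double(), Species = col_factor())
)

# A data frame produced by another reader (here base R's read.csv)
# can be converted to a tibble afterwards
dat_base <- as_tibble(read.csv(csv_path))
```

Declaring col_types up front avoids relying on type guessing, which matters mostly for factor and date columns.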

Previewing data

iris %>% str()
## tibble[,5] [150 x 5] (S3: tbl_df/tbl/data.frame)
##  $ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
iris %>% glimpse()
## Rows: 150
## Columns: 5
## $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.~
## $ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.~
## $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.~
## $ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.~
## $ Species      <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s~
iris %>% summary()
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
iris %>% skimr::skim()
Data summary

Name                     Piped data
Number of rows           150
Number of columns        5
_______________________
Column type frequency:
  factor                 1
  numeric                4
________________________
Group variables          None

Variable type: factor

skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
Species                0              1  FALSE           3  set: 50, ver: 50, vir: 50

Variable type: numeric

skim_variable  n_missing  complete_rate  mean    sd   p0  p25   p50  p75  p100  hist
Sepal.Length           0              1  5.84  0.83  4.3  5.1  5.80  6.4   7.9  ▆▇▇▅▂
Sepal.Width            0              1  3.06  0.44  2.0  2.8  3.00  3.3   4.4  ▁▆▇▂▁
Petal.Length           0              1  3.76  1.77  1.0  1.6  4.35  5.1   6.9  ▇▁▆▇▂
Petal.Width            0              1  1.20  0.76  0.1  0.3  1.30  1.8   2.5  ▇▁▇▅▃

Subsetting

Selecting rows

# Select rows
iris %>% head(5)
## # A tibble: 5 x 5
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##          <dbl>       <dbl>        <dbl>       <dbl> <fct>
## 1          5.1         3.5          1.4         0.2 setosa 
## 2          4.9         3            1.4         0.2 setosa 
## 3          4.7         3.2          1.3         0.2 setosa 
## 4          4.6         3.1          1.5         0.2 setosa 
## 5          5           3.6          1.4         0.2 setosa
iris %>% slice(3:5)
## # A tibble: 3 x 5
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##          <dbl>       <dbl>        <dbl>       <dbl> <fct>
## 1          4.7         3.2          1.3         0.2 setosa 
## 2          4.6         3.1          1.5         0.2 setosa 
## 3          5           3.6          1.4         0.2 setosa

Selecting rows by condition

iris %>% distinct()
## # A tibble: 149 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>
##  1          5.1         3.5          1.4         0.2 setosa 
##  2          4.9         3            1.4         0.2 setosa 
##  3          4.7         3.2          1.3         0.2 setosa 
##  4          4.6         3.1          1.5         0.2 setosa 
##  5          5           3.6          1.4         0.2 setosa 
##  6          5.4         3.9          1.7         0.4 setosa 
##  7          4.6         3.4          1.4         0.3 setosa 
##  8          5           3.4          1.5         0.2 setosa 
##  9          4.4         2.9          1.4         0.2 setosa 
## 10          4.9         3.1          1.5         0.1 setosa 
## # ... with 139 more rows
iris %>% distinct(across(1:2))
## # A tibble: 117 x 2
##    Sepal.Length Sepal.Width
##           <dbl>       <dbl>
##  1          5.1         3.5
##  2          4.9         3  
##  3          4.7         3.2
##  4          4.6         3.1
##  5          5           3.6
##  6          5.4         3.9
##  7          4.6         3.4
##  8          5           3.4
##  9          4.4         2.9
## 10          4.9         3.1
## # ... with 107 more rows
iris %>% distinct(across(where(is.numeric), ~round(.x,0)))
## # A tibble: 33 x 4
##    Sepal.Length Sepal.Width Petal.Length Petal.Width
##           <dbl>       <dbl>        <dbl>       <dbl>
##  1            5           4            1           0
##  2            5           3            1           0
##  3            5           3            2           0
##  4            5           4            2           0
##  5            4           3            1           0
##  6            6           4            1           0
##  7            6           4            2           0
##  8            4           2            1           0
##  9            5           4            2           1
## 10            7           3            5           1
## # ... with 23 more rows
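When distinct() is given specific columns, only those columns are returned. The sketch below shows the `.keep_all` argument, which keeps the first complete row for each unique combination instead. The object names are illustrative only.

```r
library(dplyr)

iris_tbl <- as_tibble(iris)

# Unique Sepal.Length / Sepal.Width pairs: only the two columns come back
pairs_only <- iris_tbl %>% distinct(Sepal.Length, Sepal.Width)

# Same de-duplication, but the first full row of each pair is kept
first_rows <- iris_tbl %>%
  distinct(Sepal.Length, Sepal.Width, .keep_all = TRUE)

dim(pairs_only)
dim(first_rows)
```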
iris %>% filter(Sepal.Length>5,Sepal.Width>3.5)
## # A tibble: 16 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species  
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>
##  1          5.4         3.9          1.7         0.4 setosa   
##  2          5.4         3.7          1.5         0.2 setosa   
##  3          5.8         4            1.2         0.2 setosa   
##  4          5.7         4.4          1.5         0.4 setosa   
##  5          5.4         3.9          1.3         0.4 setosa   
##  6          5.7         3.8          1.7         0.3 setosa   
##  7          5.1         3.8          1.5         0.3 setosa   
##  8          5.1         3.7          1.5         0.4 setosa   
##  9          5.2         4.1          1.5         0.1 setosa   
## 10          5.5         4.2          1.4         0.2 setosa   
## 11          5.1         3.8          1.9         0.4 setosa   
## 12          5.1         3.8          1.6         0.2 setosa   
## 13          5.3         3.7          1.5         0.2 setosa   
## 14          7.2         3.6          6.1         2.5 virginica
## 15          7.7         3.8          6.7         2.2 virginica
## 16          7.9         3.8          6.4         2   virginica
iris %>% filter(if_any(everything(), ~ !is.na(.x)))
## # A tibble: 150 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>
##  1          5.1         3.5          1.4         0.2 setosa 
##  2          4.9         3            1.4         0.2 setosa 
##  3          4.7         3.2          1.3         0.2 setosa 
##  4          4.6         3.1          1.5         0.2 setosa 
##  5          5           3.6          1.4         0.2 setosa 
##  6          5.4         3.9          1.7         0.4 setosa 
##  7          4.6         3.4          1.4         0.3 setosa 
##  8          5           3.4          1.5         0.2 setosa 
##  9          4.4         2.9          1.4         0.2 setosa 
## 10          4.9         3.1          1.5         0.1 setosa 
## # ... with 140 more rows
iris %>% filter(if_all(everything(), ~ !is.na(.x)))
## # A tibble: 150 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>
##  1          5.1         3.5          1.4         0.2 setosa 
##  2          4.9         3            1.4         0.2 setosa 
##  3          4.7         3.2          1.3         0.2 setosa 
##  4          4.6         3.1          1.5         0.2 setosa 
##  5          5           3.6          1.4         0.2 setosa 
##  6          5.4         3.9          1.7         0.4 setosa 
##  7          4.6         3.4          1.4         0.3 setosa 
##  8          5           3.4          1.5         0.2 setosa 
##  9          4.4         2.9          1.4         0.2 setosa 
## 10          4.9         3.1          1.5         0.1 setosa 
## # ... with 140 more rows
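# Note: across() inside filter() is deprecated in later dplyr releases; if_all() is the equivalent
iris %>% filter(across(everything(), ~ !is.na(.x)))
## # A tibble: 150 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>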
##  1          5.1         3.5          1.4         0.2 setosa 
##  2          4.9         3            1.4         0.2 setosa 
##  3          4.7         3.2          1.3         0.2 setosa 
##  4          4.6         3.1          1.5         0.2 setosa 
##  5          5           3.6          1.4         0.2 setosa 
##  6          5.4         3.9          1.7         0.4 setosa 
##  7          4.6         3.4          1.4         0.3 setosa 
##  8          5           3.4          1.5         0.2 setosa 
##  9          4.4         2.9          1.4         0.2 setosa 
## 10          4.9         3.1          1.5         0.1 setosa 
## # ... with 140 more rows
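Because iris contains no missing values, the if_any(), if_all(), and across() filters above all return the full 150 rows. A small sketch on toy data (the `toy` tibble is made up for illustration) shows where if_any() and if_all() differ.

```r
library(dplyr)

# Hypothetical toy data with missing values
toy <- tibble(
  x = c(1, NA, 3, NA),
  y = c("a", "b", NA, NA)
)

# if_all(): keep rows where every column is non-missing (complete rows only)
complete_rows <- toy %>% filter(if_all(everything(), ~ !is.na(.x)))

# if_any(): keep rows where at least one column is non-missing
partial_rows <- toy %>% filter(if_any(everything(), ~ !is.na(.x)))

nrow(complete_rows)
nrow(partial_rows)
```

Only the first row is complete, while the first three rows each have at least one non-missing value.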
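# top_n() is superseded as of dplyr 1.0.0; slice_max() is the recommended replacement
iris %>% top_n(n = 6, wt = Sepal.Length)
## # A tibble: 6 x 5
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##          <dbl>       <dbl>        <dbl>       <dbl> <fct>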
## 1          7.6         3            6.6         2.1 virginica
## 2          7.7         3.8          6.7         2.2 virginica
## 3          7.7         2.6          6.9         2.3 virginica
## 4          7.7         2.8          6.7         2   virginica
## 5          7.9         3.8          6.4         2   virginica
## 6          7.7         3            6.1         2.3 virginica
iris %>% slice_tail(n=10)
## # A tibble: 10 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species  
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>
##  1          6.7         3.1          5.6         2.4 virginica
##  2          6.9         3.1          5.1         2.3 virginica
##  3          5.8         2.7          5.1         1.9 virginica
##  4          6.8         3.2          5.9         2.3 virginica
##  5          6.7         3.3          5.7         2.5 virginica
##  6          6.7         3            5.2         2.3 virginica
##  7          6.3         2.5          5           1.9 virginica
##  8          6.5         3            5.2         2   virginica
##  9          6.2         3.4          5.4         2.3 virginica
## 10          5.9         3            5.1         1.8 virginica
iris %>% slice_head(n=10)
## # A tibble: 10 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>
##  1          5.1         3.5          1.4         0.2 setosa 
##  2          4.9         3            1.4         0.2 setosa 
##  3          4.7         3.2          1.3         0.2 setosa 
##  4          4.6         3.1          1.5         0.2 setosa 
##  5          5           3.6          1.4         0.2 setosa 
##  6          5.4         3.9          1.7         0.4 setosa 
##  7          4.6         3.4          1.4         0.3 setosa 
##  8          5           3.4          1.5         0.2 setosa 
##  9          4.4         2.9          1.4         0.2 setosa 
## 10          4.9         3.1          1.5         0.1 setosa
iris %>% slice_max(Sepal.Length,n=5)
## # A tibble: 5 x 5
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species  
##          <dbl>       <dbl>        <dbl>       <dbl> <fct>
## 1          7.9         3.8          6.4         2   virginica
## 2          7.7         3.8          6.7         2.2 virginica
## 3          7.7         2.6          6.9         2.3 virginica
## 4          7.7         2.8          6.7         2   virginica
## 5          7.7         3            6.1         2.3 virginica
iris %>% slice_min(Sepal.Length,n=5)
## # A tibble: 5 x 5
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##          <dbl>       <dbl>        <dbl>       <dbl> <fct>
## 1          4.3         3            1.1         0.1 setosa 
## 2          4.4         2.9          1.4         0.2 setosa 
## 3          4.4         3            1.3         0.2 setosa 
## 4          4.4         3.2          1.3         0.2 setosa 
## 5          4.5         2.3          1.3         0.3 setosa
iris %>% slice_min(Sepal.Length,prop=0.1)
## # A tibble: 16 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>
##  1          4.3         3            1.1         0.1 setosa 
##  2          4.4         2.9          1.4         0.2 setosa 
##  3          4.4         3            1.3         0.2 setosa 
##  4          4.4         3.2          1.3         0.2 setosa 
##  5          4.5         2.3          1.3         0.3 setosa 
##  6          4.6         3.1          1.5         0.2 setosa 
##  7          4.6         3.4          1.4         0.3 setosa 
##  8          4.6         3.6          1           0.2 setosa 
##  9          4.6         3.2          1.4         0.2 setosa 
## 10          4.7         3.2          1.3         0.2 setosa 
## 11          4.7         3.2          1.6         0.2 setosa 
## 12          4.8         3.4          1.6         0.2 setosa 
## 13          4.8         3            1.4         0.1 setosa 
## 14          4.8         3.4          1.9         0.2 setosa 
## 15          4.8         3.1          1.6         0.2 setosa 
## 16          4.8         3            1.4         0.3 setosa
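The call with prop = 0.1 returns 16 rows rather than 15 because slice_min() keeps ties by default. A short sketch of the `with_ties` argument, using illustrative object names:

```r
library(dplyr)

iris_tbl <- as_tibble(iris)

# Default behaviour: rows tied with the last selected value are all kept
with_ties <- iris_tbl %>% slice_min(Sepal.Length, prop = 0.1)

# with_ties = FALSE returns exactly 10% of 150 rows
no_ties <- iris_tbl %>%
  slice_min(Sepal.Length, prop = 0.1, with_ties = FALSE)

nrow(with_ties)
nrow(no_ties)
```

Which of the tied rows survive with with_ties = FALSE depends on row order, so the default is usually safer when the result must be reproducible across differently sorted inputs.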
iris %>% slice_sample(n = 20, replace = TRUE) # randomly draw 20 rows, with replacement
## # A tibble: 20 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>
##  1          4.4         3            1.3         0.2 setosa    
##  2          5.5         2.3          4           1.3 versicolor
##  3          6.1         2.9          4.7         1.4 versicolor
##  4          5.9         3            4.2         1.5 versicolor
##  5          6.1         2.8          4           1.3 versicolor
##  6          5.2         2.7          3.9         1.4 versicolor
##  7          6.1         3            4.9         1.8 virginica 
##  8          4.9         2.5          4.5         1.7 virginica 
##  9          6.9         3.2          5.7         2.3 virginica 
## 10          4.4         3            1.3         0.2 setosa    
## 11          5.8         2.8          5.1         2.4 virginica 
## 12          7.1         3            5.9         2.1 virginica 
## 13          5.7         4.4          1.5         0.4 setosa    
## 14          5.1         3.8          1.9         0.4 setosa    
## 15          5.7         3            4.2         1.2 versicolor
## 16          5.5         2.5          4           1.3 versicolor
## 17          6.7         2.5          5.8         1.8 virginica 
## 18          6.7         3            5.2         2.3 virginica 
## 19          6.9         3.1          4.9         1.5 versicolor
## 20          4.8         3.4          1.6         0.2 setosa

Selecting columns

iris %>% select(1,3)
## # A tibble: 150 x 2
##    Sepal.Length Petal.Length
##           <dbl>        <dbl>
##  1          5.1          1.4
##  2          4.9          1.4
##  3          4.7          1.3
##  4          4.6          1.5
##  5          5            1.4
##  6          5.4          1.7
##  7          4.6          1.4
##  8          5            1.5
##  9          4.4          1.4
## 10          4.9          1.5
## # ... with 140 more rows
iris %>% select(-c(1,3))
## # A tibble: 150 x 3
##    Sepal.Width Petal.Width Species
##          <dbl>       <dbl> <fct>
##  1         3.5         0.2 setosa 
##  2         3           0.2 setosa 
##  3         3.2         0.2 setosa 
##  4         3.1         0.2 setosa 
##  5         3.6         0.2 setosa 
##  6         3.9         0.4 setosa 
##  7         3.4         0.3 setosa 
##  8         3.4         0.2 setosa 
##  9         2.9         0.2 setosa 
## 10         3.1         0.1 setosa 
## # ... with 140 more rows
iris %>% select(Sepal.Length:Petal.Length)
## # A tibble: 150 x 3
##    Sepal.Length Sepal.Width Petal.Length
##           <dbl>       <dbl>        <dbl>
##  1          5.1         3.5          1.4
##  2          4.9         3            1.4
##  3          4.7         3.2          1.3
##  4          4.6         3.1          1.5
##  5          5           3.6          1.4
##  6          5.4         3.9          1.7
##  7          4.6         3.4          1.4
##  8          5           3.4          1.5
##  9          4.4         2.9          1.4
## 10          4.9         3.1          1.5
## # ... with 140 more rows
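# Caution: unary minus binds tighter than `:`, so this is read as (-Sepal.Length):Petal.Length
# and does not drop the range; wrap the range in parentheses, e.g. select(-(Sepal.Length:Petal.Length))
iris %>% select(-Sepal.Length:Petal.Length)
## Warning in x:y: numerical expression has 4 elements: only the first used
## # A tibble: 150 x 2
##    Sepal.Width Petal.Length
##          <dbl>        <dbl>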
##  1         3.5          1.4
##  2         3            1.4
##  3         3.2          1.3
##  4         3.1          1.5
##  5         3.6          1.4
##  6         3.9          1.7
##  7         3.4          1.4
##  8         3.4          1.5
##  9         2.9          1.4
## 10         3.1          1.5
## # ... with 140 more rows
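The warning above arises because `-Sepal.Length:Petal.Length` is not parsed as the negation of a range. A minimal sketch of two ways to drop a contiguous range of columns, with illustrative object names:

```r
library(dplyr)

iris_tbl <- as_tibble(iris)

# Parenthesise the range, then negate it with `!` ...
dropped_a <- iris_tbl %>% select(!(Sepal.Length:Petal.Length))

# ... or with unary minus; both remove the first three columns
dropped_b <- iris_tbl %>% select(-(Sepal.Length:Petal.Length))

names(dropped_a)
```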
iris %>% select(1:2) # first two columns by position
## # A tibble: 150 x 2
##    Sepal.Length Sepal.Width
##           <dbl>       <dbl>
##  1          5.1         3.5
##  2          4.9         3  
##  3          4.7         3.2
##  4          4.6         3.1
##  5          5           3.6
##  6          5.4         3.9
##  7          4.6         3.4
##  8          5           3.4
##  9          4.4         2.9
## 10          4.9         3.1
## # ... with 140 more rows
iris %>% select(last_col())
## # A tibble: 150 x 1
##    Species
##    <fct>
##  1 setosa 
##  2 setosa 
##  3 setosa 
##  4 setosa 
##  5 setosa 
##  6 setosa 
##  7 setosa 
##  8 setosa 
##  9 setosa 
## 10 setosa 
## # ... with 140 more rows
iris %>% select(last_col(0:2))
## # A tibble: 150 x 3
##    Species Petal.Width Petal.Length
##    <fct>         <dbl>        <dbl>
##  1 setosa          0.2          1.4
##  2 setosa          0.2          1.4
##  3 setosa          0.2          1.3
##  4 setosa          0.2          1.5
##  5 setosa          0.2          1.4
##  6 setosa          0.4          1.7
##  7 setosa          0.3          1.4
##  8 setosa          0.2          1.5
##  9 setosa          0.2          1.4
## 10 setosa          0.1          1.5
## # ... with 140 more rows

Selecting columns by condition

# Rules based on variable names
# starts_with(),ends_with(),contains(),matches("(.)\\1"),num_range("x", 1:3)
iris %>% select(starts_with("S") & ends_with('s'))
## # A tibble: 150 x 1
##    Species
##    <fct>
##  1 setosa 
##  2 setosa 
##  3 setosa 
##  4 setosa 
##  5 setosa 
##  6 setosa 
##  7 setosa 
##  8 setosa 
##  9 setosa 
## 10 setosa 
## # ... with 140 more rows
iris %>% select(contains("Length"))
## # A tibble: 150 x 2
##    Sepal.Length Petal.Length
##           <dbl>        <dbl>
##  1          5.1          1.4
##  2          4.9          1.4
##  3          4.7          1.3
##  4          4.6          1.5
##  5          5            1.4
##  6          5.4          1.7
##  7          4.6          1.4
##  8          5            1.5
##  9          4.4          1.4
## 10          4.9          1.5
## # ... with 140 more rows
iris %>% select(all_of(c("Sepal.Length","Species")))
## # A tibble: 150 x 2
##    Sepal.Length Species
##           <dbl> <fct>
##  1          5.1 setosa 
##  2          4.9 setosa 
##  3          4.7 setosa 
##  4          4.6 setosa 
##  5          5   setosa 
##  6          5.4 setosa 
##  7          4.6 setosa 
##  8          5   setosa 
##  9          4.4 setosa 
## 10          4.9 setosa 
## # ... with 140 more rows
iris %>% select(any_of(c("Sepal.Length","Species","aaa")))
## # A tibble: 150 x 2
##    Sepal.Length Species
##           <dbl> <fct>
##  1          5.1 setosa 
##  2          4.9 setosa 
##  3          4.7 setosa 
##  4          4.6 setosa 
##  5          5   setosa 
##  6          5.4 setosa 
##  7          4.6 setosa 
##  8          5   setosa 
##  9          4.4 setosa 
## 10          4.9 setosa 
## # ... with 140 more rows
# Rules based on data type
iris %>% select(where(is.numeric))
## # A tibble: 150 x 4
##    Sepal.Length Sepal.Width Petal.Length Petal.Width
##           <dbl>       <dbl>        <dbl>       <dbl>
##  1          5.1         3.5          1.4         0.2
##  2          4.9         3            1.4         0.2
##  3          4.7         3.2          1.3         0.2
##  4          4.6         3.1          1.5         0.2
##  5          5           3.6          1.4         0.2
##  6          5.4         3.9          1.7         0.4
##  7          4.6         3.4          1.4         0.3
##  8          5           3.4          1.5         0.2
##  9          4.4         2.9          1.4         0.2
## 10          4.9         3.1          1.5         0.1
## # ... with 140 more rows
iris %>% select(where(is.factor))
## # A tibble: 150 x 1
##    Species
##    <fct>
##  1 setosa 
##  2 setosa 
##  3 setosa 
##  4 setosa 
##  5 setosa 
##  6 setosa 
##  7 setosa 
##  8 setosa 
##  9 setosa 
## 10 setosa 
## # ... with 140 more rows
iris %>% select(where(~ is.factor(.x)))
## # A tibble: 150 x 1
##    Species
##    <fct>
##  1 setosa 
##  2 setosa 
##  3 setosa 
##  4 setosa 
##  5 setosa 
##  6 setosa 
##  7 setosa 
##  8 setosa 
##  9 setosa 
## 10 setosa 
## # ... with 140 more rows
# && short-circuits, so mean() is never called on the non-numeric Species column
iris %>% select(where(~ is.numeric(.x) && mean(.x) > 3.5))
## # A tibble: 150 x 2
##    Sepal.Length Petal.Length
##           <dbl>        <dbl>
##  1          5.1          1.4
##  2          4.9          1.4
##  3          4.7          1.3
##  4          4.6          1.5
##  5          5            1.4
##  6          5.4          1.7
##  7          4.6          1.4
##  8          5            1.5
##  9          4.4          1.4
## 10          4.9          1.5
## # ... with 140 more rows
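A predicate passed to where() is called once for every column, including the factor column, so numeric-only computations need a guard. The sketch below uses `&&`, which stops evaluating as soon as is.numeric() returns FALSE. The object name is illustrative.

```r
library(dplyr)

iris_tbl <- as_tibble(iris)

# is.numeric() is checked first; mean() runs only for numeric columns
big_numeric <- iris_tbl %>%
  select(where(~ is.numeric(.x) && mean(.x) > 3.5))

names(big_numeric)
```

With the vectorised `&`, both sides are always evaluated, which is what triggers a "argument is not numeric or logical" warning on Species.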

Sorting

Sorting rows by column values

sort(), rank(), order()

aa <- c(10, 4, 6, 2, 8)
sort(aa, decreasing = FALSE)
## [1]  2  4  6  8 10
rank(aa) # rank of each element
## [1] 5 2 3 1 4
order(aa) # indices that would sort the vector
## [1] 4 2 3 5 1
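The relationship between the three base functions can be sketched as follows: indexing by order() reproduces sort(), and, for a vector without ties, rank() maps the sorted values back to their original positions.

```r
aa <- c(10, 4, 6, 2, 8)

# order() returns the indices that sort the vector
sorted_via_order <- aa[order(aa)]

# rank() returns each element's position in the sorted vector,
# so indexing the sorted vector by rank recovers the original (no ties here)
recovered <- sort(aa)[rank(aa)]

identical(sorted_via_order, sort(aa))
identical(recovered, aa)
```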
iris %>% arrange(Sepal.Length,desc(Sepal.Width))
## # A tibble: 150 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>
##  1          4.3         3            1.1         0.1 setosa 
##  2          4.4         3.2          1.3         0.2 setosa 
##  3          4.4         3            1.3         0.2 setosa 
##  4          4.4         2.9          1.4         0.2 setosa 
##  5          4.5         2.3          1.3         0.3 setosa 
##  6          4.6         3.6          1           0.2 setosa 
##  7          4.6         3.4          1.4         0.3 setosa 
##  8          4.6         3.2          1.4         0.2 setosa 
##  9          4.6         3.1          1.5         0.2 setosa 
## 10          4.7         3.2          1.3         0.2 setosa 
## # ... with 140 more rows
iris %>% arrange(across(everything(), desc))
## # A tibble: 150 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species  
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>
##  1          7.9         3.8          6.4         2   virginica
##  2          7.7         3.8          6.7         2.2 virginica
##  3          7.7         3            6.1         2.3 virginica
##  4          7.7         2.8          6.7         2   virginica
##  5          7.7         2.6          6.9         2.3 virginica
##  6          7.6         3            6.6         2.1 virginica
##  7          7.4         2.8          6.1         1.9 virginica
##  8          7.3         2.9          6.3         1.8 virginica
##  9          7.2         3.6          6.1         2.5 virginica
## 10          7.2         3.2          6           1.8 virginica
## # ... with 140 more rows
iris %>% arrange(across(starts_with("Sepal"),desc))
## # A tibble: 150 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species  
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>
##  1          7.9         3.8          6.4         2   virginica
##  2          7.7         3.8          6.7         2.2 virginica
##  3          7.7         3            6.1         2.3 virginica
##  4          7.7         2.8          6.7         2   virginica
##  5          7.7         2.6          6.9         2.3 virginica
##  6          7.6         3            6.6         2.1 virginica
##  7          7.4         2.8          6.1         1.9 virginica
##  8          7.3         2.9          6.3         1.8 virginica
##  9          7.2         3.6          6.1         2.5 virginica
## 10          7.2         3.2          6           1.8 virginica
## # ... with 140 more rows

Reordering columns

iris %>% select(Species, everything())
## # A tibble: 150 x 5
##    Species Sepal.Length Sepal.Width Petal.Length Petal.Width
##    <fct>          <dbl>       <dbl>        <dbl>       <dbl>
##  1 setosa           5.1         3.5          1.4         0.2
##  2 setosa           4.9         3            1.4         0.2
##  3 setosa           4.7         3.2          1.3         0.2
##  4 setosa           4.6         3.1          1.5         0.2
##  5 setosa           5           3.6          1.4         0.2
##  6 setosa           5.4         3.9          1.7         0.4
##  7 setosa           4.6         3.4          1.4         0.3
##  8 setosa           5           3.4          1.5         0.2
##  9 setosa           4.4         2.9          1.4         0.2
## 10 setosa           4.9         3.1          1.5         0.1
## # ... with 140 more rows
iris %>% relocate(Species)
## # A tibble: 150 x 5
##    Species Sepal.Length Sepal.Width Petal.Length Petal.Width
##    <fct>          <dbl>       <dbl>        <dbl>       <dbl>
##  1 setosa           5.1         3.5          1.4         0.2
##  2 setosa           4.9         3            1.4         0.2
##  3 setosa           4.7         3.2          1.3         0.2
##  4 setosa           4.6         3.1          1.5         0.2
##  5 setosa           5           3.6          1.4         0.2
##  6 setosa           5.4         3.9          1.7         0.4
##  7 setosa           4.6         3.4          1.4         0.3
##  8 setosa           5           3.4          1.5         0.2
##  9 setosa           4.4         2.9          1.4         0.2
## 10 setosa           4.9         3.1          1.5         0.1
## # ... with 140 more rows
iris %>% relocate(ends_with("Width"))
## # A tibble: 150 x 5
##    Sepal.Width Petal.Width Sepal.Length Petal.Length Species
##          <dbl>       <dbl>        <dbl>        <dbl> <fct>
##  1         3.5         0.2          5.1          1.4 setosa 
##  2         3           0.2          4.9          1.4 setosa 
##  3         3.2         0.2          4.7          1.3 setosa 
##  4         3.1         0.2          4.6          1.5 setosa 
##  5         3.6         0.2          5            1.4 setosa 
##  6         3.9         0.4          5.4          1.7 setosa 
##  7         3.4         0.3          4.6          1.4 setosa 
##  8         3.4         0.2          5            1.5 setosa 
##  9         2.9         0.2          4.4          1.4 setosa 
## 10         3.1         0.1          4.9          1.5 setosa 
## # ... with 140 more rows
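By default relocate() moves the chosen columns to the front. A short sketch of its `.before` and `.after` arguments, which place them relative to another column (object names are illustrative):

```r
library(dplyr)

iris_tbl <- as_tibble(iris)

# Put Species immediately after Sepal.Length
moved_after <- iris_tbl %>% relocate(Species, .after = Sepal.Length)

# Put Species immediately before Petal.Length
moved_before <- iris_tbl %>% relocate(Species, .before = Petal.Length)

names(moved_after)
names(moved_before)
```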

Data wrangling

Renaming columns

iris %>% rename(Sepal_Length=Sepal.Length)
## # A tibble: 150 x 5
##    Sepal_Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>
##  1          5.1         3.5          1.4         0.2 setosa 
##  2          4.9         3            1.4         0.2 setosa 
##  3          4.7         3.2          1.3         0.2 setosa 
##  4          4.6         3.1          1.5         0.2 setosa 
##  5          5           3.6          1.4         0.2 setosa 
##  6          5.4         3.9          1.7         0.4 setosa 
##  7          4.6         3.4          1.4         0.3 setosa 
##  8          5           3.4          1.5         0.2 setosa 
##  9          4.4         2.9          1.4         0.2 setosa 
## 10          4.9         3.1          1.5         0.1 setosa 
## # ... with 140 more rows
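rename() changes one column at a time. When the same rule applies to many columns, rename_with() takes a function instead. A minimal sketch, with illustrative object names:

```r
library(dplyr)

iris_tbl <- as_tibble(iris)

# Replace every "." in the column names with "_"
renamed <- iris_tbl %>%
  rename_with(~ gsub(".", "_", .x, fixed = TRUE))

# Apply a function to a subset of columns only
lowered <- iris_tbl %>%
  rename_with(tolower, .cols = starts_with("Sepal"))

names(renamed)
names(lowered)
```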

Counting rows

count()

iris %>% count()
## # A tibble: 1 x 1
##       n
##   <int>
## 1   150
iris %>% tally()
## # A tibble: 1 x 1
##       n
##   <int>
## 1   150
iris %>% count(Species, Sepal.Length, sort = TRUE)
## # A tibble: 57 x 3
##    Species    Sepal.Length     n
##    <fct>             <dbl> <int>
##  1 setosa              5       8
##  2 setosa              5.1     8
##  3 virginica           6.3     6
##  4 setosa              4.8     5
##  5 setosa              5.4     5
##  6 versicolor          5.5     5
##  7 versicolor          5.6     5
##  8 versicolor          5.7     5
##  9 virginica           6.4     5
## 10 virginica           6.7     5
## # ... with 47 more rows
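count() with grouping columns is shorthand for group_by() followed by summarise(n = n()). The sketch below checks that equivalence on iris; the object names are illustrative.

```r
library(dplyr)

iris_tbl <- as_tibble(iris)

# Shorthand form
by_count <- iris_tbl %>% count(Species)

# Explicit form: group, then count the rows in each group
by_summarise <- iris_tbl %>%
  group_by(Species) %>%
  summarise(n = n())

all.equal(as.data.frame(by_count), as.data.frame(by_summarise))
```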

Computing new values

mutate(), transmute()

iris %>% mutate(delta=Sepal.Length-Petal.Length)
## # A tibble: 150 x 6
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species delta
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>   <dbl>
##  1          5.1         3.5          1.4         0.2 setosa    3.7
##  2          4.9         3            1.4         0.2 setosa    3.5
##  3          4.7         3.2          1.3         0.2 setosa    3.4
##  4          4.6         3.1          1.5         0.2 setosa    3.1
##  5          5           3.6          1.4         0.2 setosa    3.6
##  6          5.4         3.9          1.7         0.4 setosa    3.7
##  7          4.6         3.4          1.4         0.3 setosa    3.2
##  8          5           3.4          1.5         0.2 setosa    3.5
##  9          4.4         2.9          1.4         0.2 setosa    3  
## 10          4.9         3.1          1.5         0.1 setosa    3.4
## # ... with 140 more rows
iris %>% transmute(delta=Sepal.Length-Petal.Length)
## # A tibble: 150 x 1
##    delta
##    <dbl>
##  1   3.7
##  2   3.5
##  3   3.4
##  4   3.1
##  5   3.6
##  6   3.7
##  7   3.2
##  8   3.5
##  9   3  
## 10   3.4
## # ... with 140 more rows
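The difference between mutate() and transmute() is only in which columns are kept. As a sketch, the `.keep` argument of mutate() (available since dplyr 1.0.0) can reproduce the transmute() result on ungrouped data. Object names are illustrative.

```r
library(dplyr)

iris_tbl <- as_tibble(iris)

# transmute() keeps only the newly created column
via_transmute <- iris_tbl %>%
  transmute(delta = Sepal.Length - Petal.Length)

# mutate() with .keep = "none" drops the existing columns in the same way
kept_none <- iris_tbl %>%
  mutate(delta = Sepal.Length - Petal.Length, .keep = "none")

names(kept_none)
```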
iris %>% mutate(across(c(1:2),~.x/2,.names="new.{.col}"))
## # A tibble: 150 x 7
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species new.Sepal.Length
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>              <dbl>
##  1          5.1         3.5          1.4         0.2 setosa              2.55
##  2          4.9         3            1.4         0.2 setosa              2.45
##  3          4.7         3.2          1.3         0.2 setosa              2.35
##  4          4.6         3.1          1.5         0.2 setosa              2.3 
##  5          5           3.6          1.4         0.2 setosa              2.5 
##  6          5.4         3.9          1.7         0.4 setosa              2.7 
##  7          4.6         3.4          1.4         0.3 setosa              2.3 
##  8          5           3.4          1.5         0.2 setosa              2.5 
##  9          4.4         2.9          1.4         0.2 setosa              2.2 
## 10          4.9         3.1          1.5         0.1 setosa              2.45
## # ... with 140 more rows, and 1 more variable: new.Sepal.Width <dbl>
iris %>% mutate(across(.cols=where(is.numeric),.fns=list(log=log,abs=abs),.names="{.fn}_{.col}"))
## # A tibble: 150 x 13
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species log_Sepal.Length
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>              <dbl>
##  1          5.1         3.5          1.4         0.2 setosa              1.63
##  2          4.9         3            1.4         0.2 setosa              1.59
##  3          4.7         3.2          1.3         0.2 setosa              1.55
##  4          4.6         3.1          1.5         0.2 setosa              1.53
##  5          5           3.6          1.4         0.2 setosa              1.61
##  6          5.4         3.9          1.7         0.4 setosa              1.69
##  7          4.6         3.4          1.4         0.3 setosa              1.53
##  8          5           3.4          1.5         0.2 setosa              1.61
##  9          4.4         2.9          1.4         0.2 setosa              1.48
## 10          4.9         3.1          1.5         0.1 setosa              1.59
## # ... with 140 more rows, and 7 more variables: abs_Sepal.Length <dbl>,
## #   log_Sepal.Width <dbl>, abs_Sepal.Width <dbl>, log_Petal.Length <dbl>,
## #   abs_Petal.Length <dbl>, log_Petal.Width <dbl>, abs_Petal.Width <dbl>

Summarising

group_by,summarise

iris %>% group_by(Species) %>% tally()
## # A tibble: 3 x 2
##   Species        n
##   <fct>      <int>
## 1 setosa        50
## 2 versicolor    50
## 3 virginica     50
iris %>% group_by(Species) %>% summarise(n=n(),
                                         first=first(Sepal.Length),
                                         max=max(Sepal.Length)
                                         )
## # A tibble: 3 x 4
##   Species        n first   max
##   <fct>      <int> <dbl> <dbl>
## 1 setosa        50   5.1   5.8
## 2 versicolor    50   7     7  
## 3 virginica     50   6.3   7.9
iris %>% group_by(Species) %>% 
  summarise(n=n(),mean_Sepal.Length=mean(Sepal.Length)) %>% 
  arrange(mean_Sepal.Length)
## # A tibble: 3 x 3
##   Species        n mean_Sepal.Length
##   <fct>      <int>             <dbl>
## 1 setosa        50              5.01
## 2 versicolor    50              5.94
## 3 virginica     50              6.59
iris %>% group_by(Species) %>% 
  summarise(n=n(),mean_Sepal.Length=mean(Sepal.Length)) %>% 
  arrange(desc(mean_Sepal.Length)) %>% 
  filter(rank(desc(mean_Sepal.Length))<3)
## # A tibble: 2 x 3
##   Species        n mean_Sepal.Length
##   <fct>      <int>             <dbl>
## 1 virginica     50              6.59
## 2 versicolor    50              5.94
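# n is created before across(), so where(is.numeric) also picks it up (hence mean_n and se_n in the output)
iris %>% group_by(Species) %>% 
  summarise(n=n(),across(where(is.numeric),list(mean=mean,se=~sd(.x)/sqrt(n)),.names="{.fn}_{.col}"))
## # A tibble: 3 x 12
##   Species     n mean_Sepal.Leng~ se_Sepal.Length mean_Sepal.Width se_Sepal.Width
##   <fct>   <int>            <dbl>           <dbl>            <dbl>          <dbl>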
## 1 setosa     50             5.01          0.0498             3.43         0.0536
## 2 versic~    50             5.94          0.0730             2.77         0.0444
## 3 virgin~    50             6.59          0.0899             2.97         0.0456
## # ... with 6 more variables: mean_Petal.Length <dbl>, se_Petal.Length <dbl>,
## #   mean_Petal.Width <dbl>, se_Petal.Width <dbl>, mean_n <dbl>, se_n <dbl>
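
The mean_n and se_n columns in the last result appear because across(where(is.numeric)) also matches the freshly created n column. One possible way around this, sketched here under the assumption of dplyr 1.0 or later (iris_summary is just an illustrative name), is to call n() inside the lambda and add the count column last:

```r
library(dplyr)

# Compute mean and standard error for the numeric measurement columns only.
# n() is called inside the lambda, and the count column is created last,
# so across(where(is.numeric)) never sees a column called n.
iris_summary <- iris %>%
  group_by(Species) %>%
  summarise(
    across(where(is.numeric),
           list(mean = mean, se = ~ sd(.x) / sqrt(n())),
           .names = "{.fn}_{.col}"),
    n = n()
  )

names(iris_summary)
```

The result should have one Species column, eight summary columns and the count, with no summary of n itself.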

Joins

iris1 <- iris %>% select(1,5) %>% slice(c(1,52:53))
iris2 <- iris %>% select(2,5) %>% slice(c(53,105:107))
iris1
## # A tibble: 3 x 2
##   Sepal.Length Species   
##          <dbl> <fct>
## 1          5.1 setosa    
## 2          6.4 versicolor
## 3          6.9 versicolor
iris2
## # A tibble: 4 x 2
##   Sepal.Width Species   
##         <dbl> <fct>
## 1         3.1 versicolor
## 2         3   virginica 
## 3         3   virginica 
## 4         2.5 virginica
iris1 %>% left_join(iris2) # keep all rows of x
## Joining, by = "Species"
## # A tibble: 3 x 3
##   Sepal.Length Species    Sepal.Width
##          <dbl> <fct>            <dbl>
## 1          5.1 setosa            NA  
## 2          6.4 versicolor         3.1
## 3          6.9 versicolor         3.1
iris1 %>% right_join(iris2) # keep all rows of y
## Joining, by = "Species"
## # A tibble: 5 x 3
##   Sepal.Length Species    Sepal.Width
##          <dbl> <fct>            <dbl>
## 1          6.4 versicolor         3.1
## 2          6.9 versicolor         3.1
## 3         NA   virginica          3  
## 4         NA   virginica          3  
## 5         NA   virginica          2.5
iris1 %>% inner_join(iris2) # keep only rows with a match in both x and y
## Joining, by = "Species"
## # A tibble: 2 x 3
##   Sepal.Length Species    Sepal.Width
##          <dbl> <fct>            <dbl>
## 1          6.4 versicolor         3.1
## 2          6.9 versicolor         3.1
iris1 %>% full_join(iris2) # keep all rows of both x and y
## Joining, by = "Species"
## # A tibble: 6 x 3
##   Sepal.Length Species    Sepal.Width
##          <dbl> <fct>            <dbl>
## 1          5.1 setosa            NA  
## 2          6.4 versicolor         3.1
## 3          6.9 versicolor         3.1
## 4         NA   virginica          3  
## 5         NA   virginica          3  
## 6         NA   virginica          2.5
iris1 %>% semi_join(iris2,by=c("Species"="Species")) # rows of x that have a match in y
## # A tibble: 2 x 2
##   Sepal.Length Species   
##          <dbl> <fct>
## 1          6.4 versicolor
## 2          6.9 versicolor
iris1 %>% anti_join(iris2) # rows of x without a match in y
## Joining, by = "Species"
## # A tibble: 1 x 2
##   Sepal.Length Species
##          <dbl> <fct>
## 1          5.1 setosa
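
The joins above rely on both tables sharing a column called Species. When the key columns have different names, the by argument can map one onto the other. The following is a small sketch with made-up toy tables (x, y and the column sp are hypothetical, not part of the original data):

```r
library(dplyr)

# Hypothetical toy tables: the key is "Species" in x but "sp" in y.
x <- tibble::tibble(Species = c("setosa", "versicolor"),
                    Sepal.Length = c(5.1, 6.4))
y <- tibble::tibble(sp = c("versicolor", "virginica"),
                    Sepal.Width = c(3.1, 3.0))

# Mutating join: every row of x is kept, unmatched rows get NA.
joined <- left_join(x, y, by = c("Species" = "sp"))

# Filtering join: only the rows of x with no partner in y.
unmatched <- anti_join(x, y, by = c("Species" = "sp"))

joined
unmatched
```

Spelling out by also silences the "Joining, by = ..." message, which can be handy in scripts.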

Set operations

iris1 <- iris %>% as_tibble(rownames = "ID") %>% slice(c(1:5,52:55))
iris2 <- iris %>% as_tibble(rownames = "ID") %>% slice(c(3:7,54:57))
iris1
## # A tibble: 9 x 6
##   ID    Sepal.Length Sepal.Width Petal.Length Petal.Width Species   
##   <chr>        <dbl>       <dbl>        <dbl>       <dbl> <fct>
## 1 1              5.1         3.5          1.4         0.2 setosa    
## 2 2              4.9         3            1.4         0.2 setosa    
## 3 3              4.7         3.2          1.3         0.2 setosa    
## 4 4              4.6         3.1          1.5         0.2 setosa    
## 5 5              5           3.6          1.4         0.2 setosa    
## 6 52             6.4         3.2          4.5         1.5 versicolor
## 7 53             6.9         3.1          4.9         1.5 versicolor
## 8 54             5.5         2.3          4           1.3 versicolor
## 9 55             6.5         2.8          4.6         1.5 versicolor
iris2
## # A tibble: 9 x 6
##   ID    Sepal.Length Sepal.Width Petal.Length Petal.Width Species   
##   <chr>        <dbl>       <dbl>        <dbl>       <dbl> <fct>
## 1 3              4.7         3.2          1.3         0.2 setosa    
## 2 4              4.6         3.1          1.5         0.2 setosa    
## 3 5              5           3.6          1.4         0.2 setosa    
## 4 6              5.4         3.9          1.7         0.4 setosa    
## 5 7              4.6         3.4          1.4         0.3 setosa    
## 6 54             5.5         2.3          4           1.3 versicolor
## 7 55             6.5         2.8          4.6         1.5 versicolor
## 8 56             5.7         2.8          4.5         1.3 versicolor
## 9 57             6.3         3.3          4.7         1.6 versicolor
iris1 %>% intersect(iris2) # intersection: rows present in both
## # A tibble: 5 x 6
##   ID    Sepal.Length Sepal.Width Petal.Length Petal.Width Species   
##   <chr>        <dbl>       <dbl>        <dbl>       <dbl> <fct>
## 1 3              4.7         3.2          1.3         0.2 setosa    
## 2 4              4.6         3.1          1.5         0.2 setosa    
## 3 5              5           3.6          1.4         0.2 setosa    
## 4 54             5.5         2.3          4           1.3 versicolor
## 5 55             6.5         2.8          4.6         1.5 versicolor
iris1 %>% union(iris2) # union: all distinct rows from both
## # A tibble: 13 x 6
##    ID    Sepal.Length Sepal.Width Petal.Length Petal.Width Species   
##    <chr>        <dbl>       <dbl>        <dbl>       <dbl> <fct>
##  1 1              5.1         3.5          1.4         0.2 setosa    
##  2 2              4.9         3            1.4         0.2 setosa    
##  3 3              4.7         3.2          1.3         0.2 setosa    
##  4 4              4.6         3.1          1.5         0.2 setosa    
##  5 5              5           3.6          1.4         0.2 setosa    
##  6 52             6.4         3.2          4.5         1.5 versicolor
##  7 53             6.9         3.1          4.9         1.5 versicolor
##  8 54             5.5         2.3          4           1.3 versicolor
##  9 55             6.5         2.8          4.6         1.5 versicolor
## 10 6              5.4         3.9          1.7         0.4 setosa    
## 11 7              4.6         3.4          1.4         0.3 setosa    
## 12 56             5.7         2.8          4.5         1.3 versicolor
## 13 57             6.3         3.3          4.7         1.6 versicolor
iris1 %>% setdiff(iris2) # set difference: rows in iris1 but not in iris2
## # A tibble: 4 x 6
##   ID    Sepal.Length Sepal.Width Petal.Length Petal.Width Species   
##   <chr>        <dbl>       <dbl>        <dbl>       <dbl> <fct>
## 1 1              5.1         3.5          1.4         0.2 setosa    
## 2 2              4.9         3            1.4         0.2 setosa    
## 3 52             6.4         3.2          4.5         1.5 versicolor
## 4 53             6.9         3.1          4.9         1.5 versicolor
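
Unlike intersect and union, setdiff depends on the order of its arguments. A minimal sketch with two hypothetical one-column tables (a and b are invented for illustration) shows both directions:

```r
library(dplyr)

# Hypothetical ID tables that overlap on "2" and "3".
a <- tibble::tibble(ID = c("1", "2", "3"))
b <- tibble::tibble(ID = c("2", "3", "4"))

only_a <- setdiff(a, b)  # rows found in a but not in b
only_b <- setdiff(b, a)  # rows found in b but not in a

only_a
only_b
```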

Binding

iris1 <- iris %>% slice(1:5)
iris2 <- iris %>% slice(101:105)
iris1 %>% bind_cols(iris2) # both tables must have the same number of rows
## New names:
## * Sepal.Length -> Sepal.Length...1
## * Sepal.Width -> Sepal.Width...2
## * Petal.Length -> Petal.Length...3
## * Petal.Width -> Petal.Width...4
## * Species -> Species...5
## * ...
## # A tibble: 5 x 10
##   Sepal.Length...1 Sepal.Width...2 Petal.Length...3 Petal.Width...4 Species...5
##              <dbl>           <dbl>            <dbl>           <dbl> <fct>
## 1              5.1             3.5              1.4             0.2 setosa     
## 2              4.9             3                1.4             0.2 setosa     
## 3              4.7             3.2              1.3             0.2 setosa     
## 4              4.6             3.1              1.5             0.2 setosa     
## 5              5               3.6              1.4             0.2 setosa     
## # ... with 5 more variables: Sepal.Length...6 <dbl>, Sepal.Width...7 <dbl>,
## #   Petal.Length...8 <dbl>, Petal.Width...9 <dbl>, Species...10 <fct>
iris1 %>% bind_rows(iris2) 
## # A tibble: 10 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species  
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>
##  1          5.1         3.5          1.4         0.2 setosa   
##  2          4.9         3            1.4         0.2 setosa   
##  3          4.7         3.2          1.3         0.2 setosa   
##  4          4.6         3.1          1.5         0.2 setosa   
##  5          5           3.6          1.4         0.2 setosa   
##  6          6.3         3.3          6           2.5 virginica
##  7          5.8         2.7          5.1         1.9 virginica
##  8          7.1         3            5.9         2.1 virginica
##  9          6.3         2.9          5.6         1.8 virginica
## 10          6.5         3            5.8         2.2 virginica
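
After stacking tables with bind_rows it is often useful to remember which table each row came from. A possible approach, assuming the .id argument of bind_rows together with named inputs (the labels first and second and the column name source are arbitrary choices for this sketch):

```r
library(dplyr)

part1 <- iris %>% slice(1:5)
part2 <- iris %>% slice(101:105)

# Named arguments supply the labels; .id names the new identifier column.
combined <- bind_rows(first = part1, second = part2, .id = "source")

combined %>% count(source)
```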

pivot_longer/pivot_wider

iris %>% group_by(Species) %>% summarise(across(where(is.numeric),mean)) %>% 
  pivot_longer(where(is.numeric),names_to = "para",values_to = "value")
## # A tibble: 12 x 3
##    Species    para         value
##    <fct>      <chr>        <dbl>
##  1 setosa     Sepal.Length 5.01 
##  2 setosa     Sepal.Width  3.43 
##  3 setosa     Petal.Length 1.46 
##  4 setosa     Petal.Width  0.246
##  5 versicolor Sepal.Length 5.94 
##  6 versicolor Sepal.Width  2.77 
##  7 versicolor Petal.Length 4.26 
##  8 versicolor Petal.Width  1.33 
##  9 virginica  Sepal.Length 6.59 
## 10 virginica  Sepal.Width  2.97 
## 11 virginica  Petal.Length 5.55 
## 12 virginica  Petal.Width  2.03
iris %>% group_by(Species) %>% summarise(across(where(is.numeric),mean)) %>% 
  pivot_longer(where(is.numeric),names_to = "para",values_to = "value") %>% 
  pivot_wider(names_from = para,values_from = value)
## # A tibble: 3 x 5
##   Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
##   <fct>             <dbl>       <dbl>        <dbl>       <dbl>
## 1 setosa             5.01        3.43         1.46       0.246
## 2 versicolor         5.94        2.77         4.26       1.33 
## 3 virginica          6.59        2.97         5.55       2.03
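
Because the iris column names follow a part.measure pattern, pivot_longer can also split them into two columns in one step. This is a sketch using the names_sep argument; the new column names part and measure are chosen here purely for illustration:

```r
library(dplyr)
library(tidyr)

# Split "Sepal.Length" into part = "Sepal" and measure = "Length".
long <- iris %>%
  as_tibble() %>%
  pivot_longer(-Species,
               names_to = c("part", "measure"),
               names_sep = "\\.",
               values_to = "value")

long %>% count(part, measure)
```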

unite/separate

iris %>% unite("Sepal",c(1:2),sep="/",remove=FALSE)
## # A tibble: 150 x 6
##    Sepal   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##    <chr>          <dbl>       <dbl>        <dbl>       <dbl> <fct>
##  1 5.1/3.5          5.1         3.5          1.4         0.2 setosa 
##  2 4.9/3            4.9         3            1.4         0.2 setosa 
##  3 4.7/3.2          4.7         3.2          1.3         0.2 setosa 
##  4 4.6/3.1          4.6         3.1          1.5         0.2 setosa 
##  5 5/3.6            5           3.6          1.4         0.2 setosa 
##  6 5.4/3.9          5.4         3.9          1.7         0.4 setosa 
##  7 4.6/3.4          4.6         3.4          1.4         0.3 setosa 
##  8 5/3.4            5           3.4          1.5         0.2 setosa 
##  9 4.4/2.9          4.4         2.9          1.4         0.2 setosa 
## 10 4.9/3.1          4.9         3.1          1.5         0.1 setosa 
## # ... with 140 more rows
iris %>% unite("Sepal",c(1:2),sep="_") %>% 
  separate(Sepal,c("Length","Width"),sep="_",remove=FALSE)
## # A tibble: 150 x 6
##    Sepal   Length Width Petal.Length Petal.Width Species
##    <chr>   <chr>  <chr>        <dbl>       <dbl> <fct>
##  1 5.1_3.5 5.1    3.5            1.4         0.2 setosa 
##  2 4.9_3   4.9    3              1.4         0.2 setosa 
##  3 4.7_3.2 4.7    3.2            1.3         0.2 setosa 
##  4 4.6_3.1 4.6    3.1            1.5         0.2 setosa 
##  5 5_3.6   5      3.6            1.4         0.2 setosa 
##  6 5.4_3.9 5.4    3.9            1.7         0.4 setosa 
##  7 4.6_3.4 4.6    3.4            1.4         0.3 setosa 
##  8 5_3.4   5      3.4            1.5         0.2 setosa 
##  9 4.4_2.9 4.4    2.9            1.4         0.2 setosa 
## 10 4.9_3.1 4.9    3.1            1.5         0.1 setosa 
## # ... with 140 more rows
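
Note that separate returns the new Length and Width columns as character vectors. If numeric columns are wanted back, one option is the convert argument of separate, sketched below (sep_num is an illustrative name):

```r
library(dplyr)
library(tidyr)

# convert = TRUE asks separate() to re-guess column types after splitting,
# so the pieces come back as numbers rather than strings.
sep_num <- iris %>%
  as_tibble() %>%
  unite("Sepal", 1:2, sep = "_") %>%
  separate(Sepal, c("Length", "Width"), sep = "_", convert = TRUE)

sep_num %>% summarise(mean_length = mean(Length), mean_width = mean(Width))
```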

ggplot

iris %>% group_by(Species) %>% 
  summarise(n=n(),
            across(ends_with("th"),list(mean=mean,se=~sd(.x)/sqrt(n)),.names="{.fn}_{.col}")) %>% 
  ggplot(aes(x=Species,y=mean_Sepal.Length))+
  geom_col()+
  geom_errorbar(aes(ymin=mean_Sepal.Length,ymax=mean_Sepal.Length+se_Sepal.Length),width=0.4)

[Figure 1: bar chart of mean Sepal.Length by Species with error bars]
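
The plot above draws only the upper half of each error bar, since ymin is set to the mean itself. If a symmetric mean ± SE bar is preferred, a variant could look like the sketch below (sepal_stats, mean_len and se_len are names made up for this example):

```r
library(dplyr)
library(ggplot2)

# Per-species mean and standard error of Sepal.Length.
sepal_stats <- iris %>%
  group_by(Species) %>%
  summarise(mean_len = mean(Sepal.Length),
            se_len = sd(Sepal.Length) / sqrt(n()))

# Error bars extend one standard error below and above the mean.
p <- ggplot(sepal_stats, aes(x = Species, y = mean_len)) +
  geom_col() +
  geom_errorbar(aes(ymin = mean_len - se_len, ymax = mean_len + se_len),
                width = 0.4)
p
```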

Modelling

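iris %>% group_nest(Species) %>%
  mutate(model=map(data,~lm(Sepal.Length~Sepal.Width,data=.x))) %>%
  mutate(pred=map2(model,data,predict)) # predict() returns fitted values, not R-squared
## # A tibble: 3 x 4
##   Species                  data model  pred      
##   <fct>      <list<tibble[,4]>> <list> <list>    
## 1 setosa               [50 x 4] <lm>   <dbl [50]>
## 2 versicolor           [50 x 4] <lm>   <dbl [50]>
## 3 virginica            [50 x 4] <lm>   <dbl [50]>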
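
If the aim is a per-species R-squared rather than a list of fitted values, one possible approach is to pull r.squared out of each model summary with map_dbl. This is only a sketch; fits is a name chosen for illustration:

```r
library(dplyr)
library(purrr)

# Fit one linear model per species, then extract R-squared as a plain number.
fits <- iris %>%
  group_nest(Species) %>%
  mutate(model = map(data, ~ lm(Sepal.Length ~ Sepal.Width, data = .x)),
         rsquare = map_dbl(model, ~ summary(.x)$r.squared))

fits %>% select(Species, rsquare)
```

Using map_dbl instead of map gives an ordinary numeric column, which is easier to sort or plot than a list column.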
iris%>%group_by(Species)%>%
  group_map(~ broom::tidy(lm(Sepal.Length~Sepal.Width,data=.x)))
## [[1]]
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    2.64     0.310       8.51 3.74e-11
## 2 Sepal.Width    0.690    0.0899      7.68 6.71e-10
## 
## [[2]]
## # A tibble: 2 x 5
##   term        estimate std.error statistic      p.value
##   <chr>          <dbl>     <dbl>     <dbl>        <dbl>
## 1 (Intercept)    3.54      0.563      6.29 0.0000000907
## 2 Sepal.Width    0.865     0.202      4.28 0.0000877   
## 
## [[3]]
## # A tibble: 2 x 5
##   term        estimate std.error statistic    p.value
##   <chr>          <dbl>     <dbl>     <dbl>      <dbl>
## 1 (Intercept)    3.91      0.757      5.16 0.00000466
## 2 Sepal.Width    0.902     0.253      3.56 0.000843
# coefficients
iris%>%group_by(Species)%>%
  group_modify(~ broom::tidy(lm(Sepal.Length~Sepal.Width,data=.x)))
## # A tibble: 6 x 6
## # Groups:   Species [3]
##   Species    term        estimate std.error statistic  p.value
##   <fct>      <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 setosa     (Intercept)    2.64     0.310       8.51 3.74e-11
## 2 setosa     Sepal.Width    0.690    0.0899      7.68 6.71e-10
## 3 versicolor (Intercept)    3.54     0.563       6.29 9.07e- 8
## 4 versicolor Sepal.Width    0.865    0.202       4.28 8.77e- 5
## 5 virginica  (Intercept)    3.91     0.757       5.16 4.66e- 6
## 6 virginica  Sepal.Width    0.902    0.253       3.56 8.43e- 4
# model fit statistics
iris%>%group_by(Species)%>%
  group_modify(~ broom::glance(lm(Sepal.Length~Sepal.Width,data=.x)))
## # A tibble: 3 x 13
## # Groups:   Species [3]
##   Species    r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC
##   <fct>          <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl>
## 1 setosa         0.551         0.542 0.239      59.0 6.71e-10     1   1.73  2.53
## 2 versicolor     0.277         0.262 0.444      18.4 8.77e- 5     1 -29.3  64.6 
## 3 virginica      0.209         0.193 0.571      12.7 8.43e- 4     1 -41.9  89.9 
## # ... with 4 more variables: BIC <dbl>, deviance <dbl>, df.residual <int>,
## #   nobs <int>
# residuals and fitted values
iris%>%group_by(Species)%>%
  group_modify(~ broom::augment(lm(Sepal.Length~Sepal.Width,data=.x)))
## # A tibble: 150 x 9
## # Groups:   Species [3]
##    Species Sepal.Length Sepal.Width .fitted  .resid   .hat .sigma   .cooksd
##    <fct>          <dbl>       <dbl>   <dbl>   <dbl>  <dbl>  <dbl>     <dbl>
##  1 setosa           5.1         3.5    5.06  0.0443 0.0207  0.241 0.000373 
##  2 setosa           4.9         3      4.71  0.190  0.0460  0.239 0.0160   
##  3 setosa           4.7         3.2    4.85 -0.149  0.0274  0.240 0.00561  
##  4 setosa           4.6         3.1    4.78 -0.180  0.0353  0.240 0.0107   
##  5 setosa           5           3.6    5.12 -0.125  0.0242  0.240 0.00348  
##  6 setosa           5.4         3.9    5.33  0.0681 0.0516  0.241 0.00234  
##  7 setosa           4.6         3.4    4.99 -0.387  0.0201  0.234 0.0275   
##  8 setosa           5           3.4    4.99  0.0133 0.0201  0.241 0.0000327
##  9 setosa           4.4         2.9    4.64 -0.241  0.0596  0.238 0.0345   
## 10 setosa           4.9         3.1    4.78  0.120  0.0353  0.240 0.00484  
## # ... with 140 more rows, and 1 more variable: .std.resid <dbl>
model_lm6<-lm(Sepal.Length~Sepal.Width+Petal.Length,data=iris)
summary(model_lm6)
## 
## Call:
## lm(formula = Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.96159 -0.23489  0.00077  0.21453  0.78557 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.24914    0.24797    9.07 7.04e-16 ***
## Sepal.Width   0.59552    0.06933    8.59 1.16e-14 ***
## Petal.Length  0.47192    0.01712   27.57  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3333 on 147 degrees of freedom
## Multiple R-squared:  0.8402, Adjusted R-squared:  0.838 
## F-statistic: 386.4 on 2 and 147 DF,  p-value: < 2.2e-16
anova(model_lm6)
## Analysis of Variance Table
## 
## Response: Sepal.Length
##               Df Sum Sq Mean Sq F value    Pr(>F)    
## Sepal.Width    1  1.412   1.412  12.714 0.0004902 ***
## Petal.Length   1 84.427  84.427 760.059 < 2.2e-16 ***
## Residuals    147 16.329   0.111                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
vip::vi(model_lm6)
## # A tibble: 2 x 3
##   Variable     Importance Sign 
##   <chr>             <dbl> <chr>
## 1 Petal.Length      27.6  POS  
## 2 Sepal.Width        8.59 POS
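
The Importance values reported here line up with the absolute t statistics in the summary() table shown earlier (27.57 and 8.59). The base R sketch below reproduces them directly from the coefficient table; t_abs is an illustrative name:

```r
model_lm6 <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)

# Drop the intercept row and take the absolute value of each t statistic.
t_abs <- abs(coef(summary(model_lm6))[-1, "t value"])

sort(t_abs, decreasing = TRUE)
```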
vip::vi(model_lm6,method="firm")
## # A tibble: 2 x 2
##   Variable     Importance
##   <chr>             <dbl>
## 1 Petal.Length      0.832
## 2 Sepal.Width       0.441
bruceR::model_summary(model_lm6)
## 
## ==============================
##               (1) Sepal.Length
## ------------------------------
## (Intercept)     2.249 ***     
##                (0.248)        
## Sepal.Width     0.596 ***     
##                (0.069)        
## Petal.Length    0.472 ***     
##                (0.017)        
## ------------------------------
## R^2             0.840         
## Adj. R^2        0.838         
## Num. obs.     150             
## ==============================
## Note. * p < .05, ** p < .01, *** p < .001.
## 
## # Check for Multicollinearity
## 
## Low Correlation
## 
##          Term  VIF Increased SE Tolerance
##   Sepal.Width 1.22         1.11      0.82
##  Petal.Length 1.22         1.11      0.82
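
The VIF column in the multicollinearity check can be reproduced by hand. Under the standard definition, the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the remaining ones. With only two predictors this reduces to a single auxiliary regression, as in the following base R sketch (r2_aux, vif and tolerance are names invented here):

```r
# Regress one predictor on the other and take the R-squared of that fit.
r2_aux <- summary(lm(Sepal.Width ~ Petal.Length, data = iris))$r.squared

vif <- 1 / (1 - r2_aux)   # variance inflation factor
tolerance <- 1 / vif      # equivalently 1 - R-squared

round(c(VIF = vif, Increased_SE = sqrt(vif), Tolerance = tolerance), 2)
```

With two predictors the value is the same for both, which agrees with the identical rows in the table above.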
