20180409-D · NFL Positional Salaries · Drawing a jittered scatter plot with ggplot2 and ggbeeswarm · R data visualization example with source code

NFL Positional Salaries


Tidy Tuesday on GitHub:
Thomas Mock (2022). Tidy Tuesday: A weekly data project aimed at the R ecosystem. https://github.com/rfordatascience/tidytuesday



1. Environment setup

# Use a CRAN mirror in China (TUNA, Tsinghua University) so packages install quickly
options("repos" = c(CRAN = "https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))

2. Set the working directory

wkdir <- '/home/user/R_workdir/TidyTuesday/2018/2018-04-09_NFL_Positional_Salaries/src-d'
setwd(wkdir)

3. Load R packages

library(tidyverse)
library(ggbeeswarm)
library(showtext)
# Tested on Ubuntu: without this call, Chinese characters in the plot render as garbled text
showtext_auto()
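
Before loading the packages, it can help to check which ones are not installed yet. The following is a minimal sketch, not part of the original workflow: the helper name `find_missing_pkgs` is hypothetical, and the package list is assumed from the `library()` and `readxl::` calls used in this post.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed
find_missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Package list assumed from the calls used in this post
required_pkgs <- c("tidyverse", "ggbeeswarm", "showtext", "readxl")
missing_pkgs <- find_missing_pkgs(required_pkgs)

if (length(missing_pkgs) > 0) {
  # Only report what is missing; install manually with install.packages(missing_pkgs)
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```

The sketch only reports missing packages rather than installing them, so running it has no side effects on the library.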

4. Load the data

df_input <- readxl::read_excel("../data/nfl_salary.xlsx")

# Take a quick look at the data
glimpse(df_input)
## Rows: 800
## Columns: 11
## $ year                 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 20…
## $ Cornerback           11265916, 11000000, 10000000, 10000000, 10000000, …
## $ `Defensive Lineman`  17818000, 16200000, 12476000, 11904706, 11762782, …
## $ Linebacker           16420000, 15623000, 11825000, 10083333, 10020000, …
## $ `Offensive Lineman`  15960000, 12800000, 11767500, 10358200, 10000000, …
## $ Quarterback          17228125, 16000000, 14400000, 14100000, 13510000, …
## $ `Running Back`       12955000, 10873833, 9479000, 7700000, 7500000, 703…
## $ Safety               8871428, 8787500, 8282500, 8000000, 7804333, 76527…
## $ `Special Teamer`     4300000, 3725000, 3556176, 3500000, 3250000, 32250…
## $ `Tight End`          8734375, 8591000, 8290000, 7723333, 6974666, 61333…
## $ `Wide Receiver`      16250000, 14175000, 11424000, 11415000, 10800000, …
# Check the column names
colnames(df_input)
##  [1] "year"              "Cornerback"        "Defensive Lineman"
##  [4] "Linebacker"        "Offensive Lineman" "Quarterback"      
##  [7] "Running Back"      "Safety"            "Special Teamer"   
## [10] "Tight End"         "Wide Receiver"

5. Data preprocessing

# Reshape the data from wide format to long format
df_plot <- df_input %>% 
  # pivot_longer() gathers the position columns into name/value pairs
  pivot_longer(cols = -year,
               names_to = "position",
               values_to = "salary_position") %>% 
  # Drop missing values
  filter(!is.na(salary_position))

# Take a quick look at the data
glimpse(df_plot)
## Rows: 7,944
## Columns: 3
## $ year             2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, …
## $ position         "Cornerback", "Defensive Lineman", "Linebacker", "Offe…
## $ salary_position  11265916, 17818000, 16420000, 15960000, 17228125, 1295…
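
To make the wide-to-long step easier to follow, here is a minimal sketch on a tiny made-up table. The data in `toy_wide` is hypothetical and only mimics the layout of `df_input` (one column per position). It also shows that `pivot_longer()` can drop missing values itself through `values_drop_na = TRUE`, an alternative to the separate `filter()` step used above.

```r
library(tidyr)
library(dplyr)

# Hypothetical toy table in the same wide layout: one column per position
toy_wide <- tibble::tibble(
  year        = c(2011, 2011, 2012),
  Quarterback = c(17, 16, NA),
  Safety      = c(8, NA, 9)
)

# Same two steps as in the post: pivot to long format, then drop missing values
toy_long <- toy_wide %>%
  pivot_longer(cols = -year,
               names_to = "position",
               values_to = "salary_position") %>%
  filter(!is.na(salary_position))

# Alternative: let pivot_longer() drop the missing values directly
toy_long2 <- toy_wide %>%
  pivot_longer(cols = -year,
               names_to = "position",
               values_to = "salary_position",
               values_drop_na = TRUE)

print(toy_long)
```

The toy table has six salary cells, two of them missing, so both variants keep four rows.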

6. Plot with ggplot2

# Note: the plot is built step by step here for clarity; in practice the layers can be chained together
gg <- ggplot(df_plot, aes(year, salary_position / 1000000, group = year))
# geom_quasirandom() draws the jittered (quasirandom) scatter points
gg <- gg + geom_quasirandom(size = 0.7, alpha = .3, colour = "#FF7F50")
# facet_wrap() draws one panel per position; ncol = 5 lays the panels out in five columns
gg <- gg + facet_wrap( ~ position, ncol = 5)
# scale_y_continuous() formats the y-axis labels as dollar amounts in millions (e.g. $10m)
gg <- gg + scale_y_continuous(labels = scales::dollar_format(suffix = "m"))
# labs() sets the plot labels (title, subtitle, axis titles, and caption)
gg <- gg + labs(title = "NFL Salaries by Position",
                subtitle = NULL,
                x = NULL,
                y = 'Salary',
                caption = "NFL Quarterback Salaries · graph by 数绘小站")
# theme_minimal() applies a minimal theme with no panel border or background
gg <- gg + theme_minimal()
# theme() adjusts the non-data elements to polish the final appearance
gg <- gg + theme(
  # panel.grid.major: major grid lines, drawn here as thin grey lines
  panel.grid.major = element_line("grey", size = 0.2),
  # panel.grid.minor: minor grid lines, removed here
  panel.grid.minor = element_blank(),
  # axis.text: axis tick labels
  axis.text = element_text(color = "black", size = 9),
  # axis.title: axis titles
  axis.title = element_text(color = "black", size = 12),
  # axis.ticks: axis tick marks, removed here
  axis.ticks = element_blank(),
  # plot.title: main title, centered and bold
  plot.title = element_text(hjust = 0.5, color = "black", size = 16, face = "bold"),
  # plot.background: background of the whole plot
  plot.background = element_rect(fill = "white"),
  # strip.text: text of each facet title
  strip.text = element_text(face = "bold", size = rel(0.8), vjust = -.2),
  # strip.background: background of each facet title, removed here
  strip.background = element_blank())
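
As the note at the top of this section says, the layers can also be chained in one expression. The following is a minimal sketch of that chained form. So that it runs without the Excel file, it uses a small simulated data frame, `df_demo`, which is a hypothetical stand-in for `df_plot` (random log-normal values, not real salaries), with two positions instead of ten.

```r
library(ggplot2)
library(ggbeeswarm)

# Hypothetical stand-in for df_plot: simulated values, NOT real NFL salaries
set.seed(1)
df_demo <- data.frame(
  year            = rep(2011:2013, each = 40),
  position        = rep(c("Quarterback", "Safety"), times = 60),
  salary_position = rlnorm(120, meanlog = 15, sdlog = 0.8)
)

# Same layers as above, chained with `+` in a single expression
gg_demo <- ggplot(df_demo, aes(year, salary_position / 1e6, group = year)) +
  geom_quasirandom(size = 0.7, alpha = 0.3, colour = "#FF7F50") +
  facet_wrap(~ position, ncol = 2) +
  scale_y_continuous(labels = scales::dollar_format(suffix = "m")) +
  labs(x = NULL, y = "Salary") +
  theme_minimal()
```

Printing `gg_demo` draws the plot. The step-by-step form used in this post produces the same kind of object and is easier to annotate line by line.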

7. Save the plot as PDF and PNG

gg

[Figure: jittered scatter plot of NFL salaries by year, one panel per position]

filename <- '20180409-D-01'
ggsave(filename = paste0(filename, ".pdf"), plot = gg, width = 8.6, height = 5, device = cairo_pdf)
ggsave(filename = paste0(filename, ".png"), plot = gg, width = 8.6, height = 5, dpi = 100, device = "png")

8. Session info

sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] showtext_0.9-5   showtextdb_3.0   sysfonts_0.8.8   ggbeeswarm_0.6.0
##  [5] forcats_0.5.2    stringr_1.4.1    dplyr_1.0.10     purrr_0.3.4     
##  [9] readr_2.1.2      tidyr_1.2.1      tibble_3.1.8     ggplot2_3.3.6   
## [13] tidyverse_1.3.2 
## 
## loaded via a namespace (and not attached):
##  [1] lubridate_1.8.0     assertthat_0.2.1    digest_0.6.29      
##  [4] utf8_1.2.2          R6_2.5.1            cellranger_1.1.0   
##  [7] backports_1.4.1     reprex_2.0.2        evaluate_0.16      
## [10] highr_0.9           httr_1.4.4          pillar_1.8.1       
## [13] rlang_1.0.5         googlesheets4_1.0.1 readxl_1.4.1       
## [16] rstudioapi_0.14     jquerylib_0.1.4     rmarkdown_2.16     
## [19] textshaping_0.3.6   labeling_0.4.2      googledrive_2.0.0  
## [22] munsell_0.5.0       broom_1.0.1         compiler_4.2.1     
## [25] vipor_0.4.5         modelr_0.1.9        xfun_0.32          
## [28] systemfonts_1.0.4   pkgconfig_2.0.3     htmltools_0.5.3    
## [31] tidyselect_1.1.2    fansi_1.0.3         crayon_1.5.1       
## [34] tzdb_0.3.0          dbplyr_2.2.1        withr_2.5.0        
## [37] grid_4.2.1          jsonlite_1.8.0      gtable_0.3.1       
## [40] lifecycle_1.0.1     DBI_1.1.3           magrittr_2.0.3     
## [43] scales_1.2.1        cli_3.3.0           stringi_1.7.8      
## [46] cachem_1.0.6        farver_2.1.1        fs_1.5.2           
## [49] xml2_1.3.3          bslib_0.4.0         ragg_1.2.3         
## [52] ellipsis_0.3.2      generics_0.1.3      vctrs_0.4.1        
## [55] tools_4.2.1         glue_1.6.2          beeswarm_0.4.0     
## [58] hms_1.1.2           fastmap_1.1.0       yaml_2.3.5         
## [61] colorspace_2.0-3    gargle_1.2.1        rvest_1.0.3        
## [64] knitr_1.40          haven_2.5.1         sass_0.4.2

Test data

Companion data download: nfl_salary.xlsx
