20180416-H · Global Mortality · ggplot2 地图 gganimate 动图 动态地图 · R 语言数据可视化 案例 源码

所有作品合集传送门: Tidy Tuesday

2018 年合集传送门: 2018

Global Mortality

What do people die from?


在过去的几个世纪里,世界发生了很大的变化–这就是《我们的世界》的数据所显示的。然而,有一件事在这种转变中一直保持不变:我们都必须在某个时候死亡。然而,随着生活水平的提高、医疗保健的进步和生活方式的改变,死亡的原因正在发生变化。

在这篇博客中,我们试图回答 “人们死于什么?”,首先看一下全球死因的数据,然后选择国家层面的例子。
世界各地的主要死因仍有很大差异,因此,也可以选择了一些国家,以突出这种异质性。

本次示例通过一些可视化方式来展示这些信息。

gganimate 让你的统计图动起来!动态交互图的绘制在 R 实际工作中应用的比较多,在 R 中我们可以使用 gganimate 包来快速完成一张动态图的绘制。这里我们借助这个 R 包绘制了一个动态地图。






1. 一些环境设置

# 设置为国内镜像, 方便快速安装模块
options("repos" = c(CRAN = "https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))

2. 设置工作路径

wkdir <- '/home/user/R_workdir/TidyTuesday/2018/2018-04-16_Global_Mortality/src-h'
setwd(wkdir)

3. 加载 R 包

一些关于字体的设置可以参考这篇文章 R/ggplot2保存图片中文字体至PDF——showtext包一文清除所有障碍 。需要更多的相关字体设置知识可以自行学习帮助文档或检索其他文章,这里不多加累述了。

# rnaturalearth 提供了自然地球的简单特征
# The rnaturalearthdata package needs to be installed. --> install.packages("rnaturalearthdata")
library(tidyverse)
library(lubridate)
library(rnaturalearth)
library(sf)
library(gganimate)
library(ggtext)

# 需要能访问 Google, 也可以注释掉下面这行, 影响不大
sysfonts::font_add_google("Gochi Hand", "gochi")

4. 加载数据

# 读取数据
df_input <- readxl::read_excel("../data/global_mortality.xlsx")

# 简要查看数据内容
glimpse(df_input)
## Rows: 6,156
## Columns: 35
## $ country                                     "Afghanistan", "Afghanistan…
## $ country_code                                "AFG", "AFG", "AFG", "AFG",…
## $ year                                        1990, 1991, 1992, 1993, 199…
## $ `Cardiovascular dalberts (%)`               17.61040, 17.80181, 18.3868…
## $ `Cancers (%)`                               4.025975, 4.054145, 4.17395…
## $ `Respiratory diseases (%)`                  2.106626, 2.134176, 2.20829…
## $ `Diabetes (%)`                              3.832555, 3.822228, 3.90012…
## $ `Dementia (%)`                              0.5314287, 0.5324973, 0.540…
## $ `Lower respiratory infections (%)`          10.886362, 10.356968, 10.09…
## $ `Neonatal deaths (%)`                       9.184653, 8.938897, 8.84138…
## $ `Diarrheal diseases (%)`                    2.497141, 2.572228, 2.70774…
## $ `Road accidents (%)`                        3.715944, 3.729142, 3.81635…
## $ `Liver disease (%)`                         0.8369093, 0.8455159, 0.874…
## $ `Tuberculosis (%)`                          5.877075, 5.891704, 6.03466…
## $ `Kidney disease (%)`                        1.680611, 1.671115, 1.70098…
## $ `Digestive diseases (%)`                    1.058771, 1.049322, 1.06288…
## $ `HIV/AIDS (%)`                              0.01301948, 0.01451458, 0.0…
## $ `Suicide (%)`                               0.4366105, 0.4422802, 0.456…
## $ `Malaria (%)`                               0.4488863, 0.4550191, 0.460…
## $ `Homicide (%)`                              1.287020, 1.290991, 1.32616…
## $ `Nutritional deficiencies (%)`              0.3505045, 0.3432123, 0.345…
## $ `Meningitis (%)`                            3.037603, 2.903202, 2.84064…
## $ `Protein-energy malnutrition (%)`           0.3297599, 0.3221711, 0.323…
## $ `Drowning (%)`                              0.9838624, 0.9545860, 0.951…
## $ `Maternal deaths (%)`                       1.769213, 1.749264, 1.76424…
## $ `Parkinson ialbert (%)`                     0.02515859, 0.02545063, 0.0…
## $ `Alcohol disorders (%)`                     0.02899828, 0.02917152, 0.0…
## $ `Intestinal infectious diseases (%)`        0.1833303, 0.1781074, 0.176…
## $ `Drug disorders (%)`                        0.04120540, 0.04203340, 0.0…
## $ `Hepatitis (%)`                             0.1387378, 0.1350081, 0.134…
## $ `Fire (%)`                                  0.1741567, 0.1706712, 0.171…
## $ `Heat-related (hot and cold exposure) (%)`  0.1378229, 0.1348266, 0.139…
## $ `Natural disasters (%)`                     0.00000000, 0.79760256, 0.3…
## $ `Conflict (%)`                              0.932, 2.044, 2.408, NA, 4.…
## $ `Terrorism (%)`                             0.007, 0.040, 0.027, NA, 0.…
# 检查数据的列名
colnames(df_input)
##  [1] "country"                                 
##  [2] "country_code"                            
##  [3] "year"                                    
##  [4] "Cardiovascular diseases (%)"             
##  [5] "Cancers (%)"                             
##  [6] "Respiratory diseases (%)"                
##  [7] "Diabetes (%)"                            
##  [8] "Dementia (%)"                            
##  [9] "Lower respiratory infections (%)"        
## [10] "Neonatal deaths (%)"                     
## [11] "Diarrheal diseases (%)"                  
## [12] "Road accidents (%)"                      
## [13] "Liver disease (%)"                       
## [14] "Tuberculosis (%)"                        
## [15] "Kidney disease (%)"                      
## [16] "Digestive diseases (%)"                  
## [17] "HIV/AIDS (%)"                            
## [18] "Suicide (%)"                             
## [19] "Malaria (%)"                             
## [20] "Homicide (%)"                            
## [21] "Nutritional deficiencies (%)"            
## [22] "Meningitis (%)"                          
## [23] "Protein-energy malnutrition (%)"         
## [24] "Drowning (%)"                            
## [25] "Maternal deaths (%)"                     
## [26] "Parkinson disease (%)"                   
## [27] "Alcohol disorders (%)"                   
## [28] "Intestinal infectious diseases (%)"      
## [29] "Drug disorders (%)"                      
## [30] "Hepatitis (%)"                           
## [31] "Fire (%)"                                
## [32] "Heat-related (hot and cold exposure) (%)"
## [33] "Natural disasters (%)"                   
## [34] "Conflict (%)"                            
## [35] "Terrorism (%)"

5. 数据预处理

# 取前一
df_tidy <- df_input %>%
  gather(key = disease, value = mortality, -c(country, country_code, year)) %>%
  # 建议使用 dplyr::mutate 形式调用函数, 有可能与 plyr 中的函数冲突 (因为我自己就报错了...)
  dplyr::mutate(disease = substr(disease, 1, nchar(disease) - 4),
                year = year(as.Date.character(year, format = "%Y"))) %>%
  group_by(country, year) %>%
  top_n(1, mortality)

# 删除缺失值
df_tidy <- na.omit(df_tidy)

# 这将是我们的基本世界地图, 不包括南极洲
world <- ne_countries(scale = 'medium', type = 'map_units', returnclass = 'sf') %>% 
  filter(!name %in% c("Fr. S. Antarctic Lands", "Antarctica"))

# 合并数据
df_world <- merge(world, df_tidy, by.x = "name", by.y = "country")


# 简要查看数据内容
glimpse(df_world)
## Rows: 4,644
## Columns: 68
## $ name          "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan…
## $ scalerank     1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ featurecla    "Admin-0 map unit", "Admin-0 map unit", "Admin-0 map unit…
## $ labelrank     3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, …
## $ sovereignt    "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan…
## $ sov_a3        "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "…
## $ adm0_dif      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ level         2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
## $ type          "Sovereign country", "Sovereign country", "Sovereign coun…
## $ admin         "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan…
## $ adm0_a3       "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "…
## $ geou_dif      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ geounit       "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan…
## $ gu_a3         "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "…
## $ su_dif        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ subunit       "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan…
## $ su_a3         "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "…
## $ brk_diff      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ name_long     "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan…
## $ brk_a3        "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "…
## $ brk_name      "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan…
## $ brk_group     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ abbrev        "Afg.", "Afg.", "Afg.", "Afg.", "Afg.", "Afg.", "Afg.", "…
## $ postal        "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF…
## $ formal_en     "Islamic State of Afghanistan", "Islamic State of Afghani…
## $ formal_fr     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ note_adm0     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ note_brk      NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ name_sort     "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan…
## $ name_alt      NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ mapcolor7     5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, …
## $ mapcolor8     6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, …
## $ mapcolor9     8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, …
## $ mapcolor13    7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, …
## $ pop_est       28400000, 28400000, 28400000, 28400000, 28400000, 2840000…
## $ gdp_md_est    22270, 22270, 22270, 22270, 22270, 22270, 22270, 22270, 2…
## $ pop_year      NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ lastcensus    1979, 1979, 1979, 1979, 1979, 1979, 1979, 1979, 1979, 197…
## $ gdp_year      NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ economy       "7. Least developed region", "7. Least developed region",…
## $ income_grp    "5. Low income", "5. Low income", "5. Low income", "5. Lo…
## $ wikipedia     NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ fips_10       NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ iso_a2        "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF…
## $ iso_a3        "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "…
## $ iso_n3        "004", "004", "004", "004", "004", "004", "004", "004", "…
## $ un_a3         "004", "004", "004", "004", "004", "004", "004", "004", "…
## $ wb_a2         "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF…
## $ wb_a3         "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "…
## $ woe_id        NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ adm0_a3_is    "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "…
## $ adm0_a3_us    "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "…
## $ adm0_a3_un    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ adm0_a3_wb    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ continent     "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "…
## $ region_un     "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "…
## $ subregion     "Southern Asia", "Southern Asia", "Southern Asia", "South…
## $ region_wb     "South Asia", "South Asia", "South Asia", "South Asia", "…
## $ name_len      11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 1…
## $ long_len      11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 1…
## $ abbrev_len    4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, …
## $ tiny          NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ homepart      1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ country_code  "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "…
## $ year          1996, 1998, 2006, 2005, 2007, 2016, 1997, 1995, 2008, 201…
## $ disease       "Cardiovascular diseases", "Cardiovascular diseases", "Ca…
## $ mortality     20.40614, 19.62352, 23.76755, 23.51810, 24.12453, 26.5172…
## $ geometry      MULTIPOLYGON (((74.89131 37..., MULTIPOLYGON…

6. 用 ggplot2 绘图

# PS: 方便讲解, 我这里进行了拆解, 具体使用时可以组合在一起
gg <- ggplot()
# geom_sf() 绘制地图
gg <- gg + geom_sf(data = world, colour = "azure4", fill = "grey60", size = .5)
gg <- gg + geom_sf(data = df_world, aes(fill = disease, group = interaction(year, disease)))
# coord_sf() 用于地图的投影转换
gg <- gg + coord_sf(crs = st_crs(world), datum = NA)
# scale_fill_discrete() 修改图例顺序, breaks 是原数据, labels 是新的标签, 需要一一对应
gg <- gg + scale_fill_discrete(breaks = c('Cardiovascular diseases', 'Diarrheal diseases', 'Neonatal deaths', 
                                          'Natural disasters', 'Lower respiratory infections', 'Malaria', 'HIV/AIDS',
                                          'Cancers', 'Conflict', 'Tuberculosis', 'Nutritional deficiencies'),
                               labels = c('心血管疾病', '痢疾', '新生儿死亡', '自然灾害', '下呼吸道感染', '疟疾', 
                                          '艾滋病', '癌症', '战争冲突', '肺结核', '营养缺乏'))
# transition_states() 在动画中数据的几个不同阶段之间的转换
gg <- gg + transition_states(year)
gg <- gg + ggthemes::theme_fivethirtyeight()
# enter_fade() 淡入
gg <- gg + enter_fade()
# exit_fade() 淡出
gg <- gg + exit_fade()
# labs() 对图形添加注释和标签(包含标题、子标题、坐标轴和引用等注释)
gg <- gg + labs(title = "大多数人死于什么:{closest_state} 年",
                subtitle = NULL,
                x = NULL,
                y = NULL,
                caption = "资料来源: Our World in Data · graph by 萤火之森\n")
# theme_minimal() 去坐标轴边框的最小化主题
gg <- gg + theme_minimal()
# theme() 实现对非数据元素的调整, 对结果进行进一步渲染, 使之更加美观
gg <- gg + theme(
  # plot.margin 调整图像边距, 上-右-下-左
  plot.margin = grid::unit(c(9, 16, 9, 16), "mm"),
  # aspect.ratio 固定图像的纵横比
  aspect.ratio = 9/16,
  # legend.position 设置图例位置, "bottom" 表示图例放置于底端
  legend.position = "top",
  # legend.text 设置图例文本格式
  legend.text = element_text(family = 'gochi', size = 28, margin = margin(r = 12, unit = "pt")),
  # legend.title 设置图例标题
  legend.title = element_blank(),
  # plot.title 主标题
  plot.title = element_markdown(family = 'gochi', color = "dodgerblue4", hjust = 0.5, size = 48),
  # plot.caption 说明文字
  plot.caption = element_markdown(color = "red", size = 16))

7. 保存图片到 PDF 和 PNG

# Error: The gifski package is required to use gifski_renderer --> install.packages('gifski')
# Error in transform_sf(all_frames, next_state, ease, params$transition_length[i],  : The transformr package is required to tween sf layers --> install.packages('transformr')
# 默认 fps = 10, 如果调试的话, 可以设置为 fps = 1, 稍微快那么一点点 ~
animate(gg, renderer = gifski_renderer(), fps = 10, width = 1580, height = 1000, duration = length(unique(df_world$year)))

filename = '20180416-H-01'
anim_save(filename = paste0(filename, ".gif"))

8. session-info

sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] ggtext_0.1.2        gganimate_1.0.8     sf_1.0-8           
##  [4] rnaturalearth_0.1.0 lubridate_1.8.0     forcats_0.5.2      
##  [7] stringr_1.4.1       dplyr_1.0.10        purrr_0.3.4        
## [10] readr_2.1.2         tidyr_1.2.1         tibble_3.1.8       
## [13] ggplot2_3.3.6       tidyverse_1.3.2    
## 
## loaded via a namespace (and not attached):
##  [1] fs_1.5.2                progress_1.2.2          httr_1.4.4             
##  [4] tools_4.2.1             backports_1.4.1         bslib_0.4.0            
##  [7] utf8_1.2.2              R6_2.5.1                KernSmooth_2.23-20     
## [10] DBI_1.1.3               colorspace_2.0-3        withr_2.5.0            
## [13] sp_1.5-0                tidyselect_1.1.2        rnaturalearthdata_0.1.0
## [16] prettyunits_1.1.1       curl_4.3.2              compiler_4.2.1         
## [19] cli_3.3.0               rvest_1.0.3             xml2_1.3.3             
## [22] sass_0.4.2              scales_1.2.1            classInt_0.4-8         
## [25] proxy_0.4-27            digest_0.6.29           rmarkdown_2.16         
## [28] pkgconfig_2.0.3         htmltools_0.5.3         highr_0.9              
## [31] dbplyr_2.2.1            fastmap_1.1.0           rlang_1.0.5            
## [34] ggthemes_4.2.4          readxl_1.4.1            rstudioapi_0.14        
## [37] sysfonts_0.8.8          jquerylib_0.1.4         farver_2.1.1           
## [40] generics_0.1.3          jsonlite_1.8.0          googlesheets4_1.0.1    
## [43] magrittr_2.0.3          s2_1.1.0                Rcpp_1.0.9             
## [46] munsell_0.5.0           fansi_1.0.3             lifecycle_1.0.1        
## [49] stringi_1.7.8           yaml_2.3.5              grid_4.2.1             
## [52] crayon_1.5.1            lattice_0.20-45         haven_2.5.1            
## [55] gridtext_0.1.5          hms_1.1.2               transformr_0.1.4       
## [58] knitr_1.40              pillar_1.8.1            markdown_1.1           
## [61] lpSolve_5.6.16          wk_0.6.0                reprex_2.0.2           
## [64] glue_1.6.2              evaluate_0.16           gifski_1.6.6-1         
## [67] modelr_0.1.9            vctrs_0.4.1             tzdb_0.3.0             
## [70] tweenr_2.0.2            cellranger_1.1.0        gtable_0.3.1           
## [73] assertthat_0.2.1        cachem_1.0.6            xfun_0.32              
## [76] broom_1.0.1             e1071_1.7-11            class_7.3-20           
## [79] googledrive_2.0.0       gargle_1.2.1            units_0.8-0            
## [82] ellipsis_0.3.2

测试数据

配套数据下载:global_mortality.xlsx

你可能感兴趣的:(#,Tidy,Tuesday,(2018),r语言,数据挖掘)