Sankey Diagrams: Tracing the Flow of Your Data

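Xiao Guo's ramblings
Last year I noticed that Yunlai (运来兄) had written an introduction to Sankey diagrams. They looked fun, but I had no idea how to use them and no real urge to learn. Recently I saw that Guozi (果子大神) was writing about them as well, which rekindled my curiosity, so I took some spare time to study the R packages for Sankey diagrams. These are my notes, kept here for when I need them.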

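What is it?

Sankey diagram: a kind of flow diagram used to depict how a quantity flows through a system. The width of each branch is proportional to the size of the flow it represents. Sankey diagrams are commonly used to visualize energy, material composition, and financial data. They are named after the Irishman Matthew Henry Phineas Riall Sankey, a captain and engineer. In 1898, in an article on the energy efficiency of steam engines published in the Minutes of Proceedings of the Institution of Civil Engineers, Sankey presented the first energy flow diagram, which later came to be called the Sankey diagram.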

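Implementations in R

The ggalluvial package;
The ggforce package.

This post is only meant for learning how to use the R packages related to Sankey diagrams. Since I have no real data at hand, I will not go further than the packages' own examples.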

1. ggalluvial : Alluvial Diagrams in ggplot2

Setup
install.packages("ggalluvial")
library(ggalluvial)
## Open the package's introductory vignette
vignette(topic = "ggalluvial", package = "ggalluvial")
Alluvial data

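ggalluvial recognizes two data formats, "wide" and "long", for categorical repeated-measures data. A third type, tabular (or array) data, which stores counts across several categorical dimensions, is also popular; the Titanic and UCBAdmissions datasets are examples.
To stay consistent with ggplot2's data conventions, ggalluvial does not accept tabular input; base::as.data.frame() converts such arrays into an acceptable data frame.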
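
As a quick illustration of that conversion, here is a minimal sketch using only base R and the built-in Titanic dataset. The variable name titanic_df is my own choice, not something from the package.

```r
# Titanic ships with base R as a 4-dimensional contingency table, not a data frame
class(Titanic)
dim(Titanic)

# as.data.frame() returns one row per combination of factor levels,
# with the cell counts stored in a column named Freq
titanic_df <- as.data.frame(Titanic)
str(titanic_df)

# The conversion preserves the total count
sum(titanic_df$Freq) == sum(Titanic)
```

The resulting data frame has one column per categorical dimension plus Freq, which is the shape the plotting calls in the rest of this post expect.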

A simple example
> head(as.data.frame(UCBAdmissions), n = 12)
      Admit Gender Dept Freq
1  Admitted   Male    A  512
2  Rejected   Male    A  313
3  Admitted Female    A   89
4  Rejected Female    A   19
5  Admitted   Male    B  353
6  Rejected   Male    B  207
7  Admitted Female    B   17
8  Rejected Female    B    8
9  Admitted   Male    C  120
10 Rejected   Male    C  205
11 Admitted Female    C  202
12 Rejected Female    C  391
> is_alluvia_form(as.data.frame(UCBAdmissions), axes = 1:3, silent = TRUE)
[1] TRUE
ggplot(as.data.frame(UCBAdmissions),
       aes(y = Freq, axis1 = Gender, axis2 = Dept)) +
  geom_alluvium(aes(fill = Admit), width = 1/12) +
  geom_stratum(width = 1/12, fill = "black", color = "grey") +
  # label.strata is deprecated in recent ggalluvial releases; use after_stat(stratum)
  geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Gender", "Dept"), expand = c(.05, .05)) +
  scale_fill_brewer(type = "qual", palette = "Set1") +
  ggtitle("UC Berkeley admissions and rejections, by sex and department")
Alluvia formats: wide and long

Wide format: as.data.frame

ggplot(as.data.frame(Titanic),
       aes(axis1 = Class, axis2 = Sex, axis3 = Age,
           y= Freq)) +
  scale_x_discrete(limits = c("Class", "Sex", "Age"), expand = c(.1, .05)) +
  geom_alluvium(aes(fill = Survived)) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  theme_minimal() +
  ggtitle("passengers on the maiden voyage of the Titanic",
          "stratified by demographics and survival")


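Notes on the arguments: data sets the data source; axis1, axis2, ... choose the variables shown as vertical axes; y gives the numeric value (the count) for each row; geom_alluvium() draws the alluvia, the bands that connect the groups across axes, here filled by survival status; geom_stratum() draws the stacked bar (the strata) at each axis; geom_text() places labels inside the strata; theme_minimal() is one of the built-in themes; ggtitle() sets the plot title and subtitle.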
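
To see those arguments in isolation, here is a minimal sketch on a made-up data frame. The data frame toy and its columns source, target, and n are hypothetical, invented only for this illustration; the plotting calls are the same ones used above.

```r
library(ggplot2)
library(ggalluvial)

# Hypothetical toy data: one row per source/target combination plus a count
toy <- data.frame(
  source = c("A", "A", "B", "B"),
  target = c("X", "Y", "X", "Y"),
  n      = c(10, 5, 3, 12)
)

# Check that the first two columns can serve as axes of a wide-format alluvial plot
is_alluvia_form(toy, axes = 1:2, silent = TRUE)

p <- ggplot(toy, aes(axis1 = source, axis2 = target, y = n)) +
  geom_alluvium(aes(fill = source)) +          # bands between the two axes
  geom_stratum() +                             # stacked boxes at each axis
  geom_text(stat = "stratum",
            aes(label = after_stat(stratum))) + # label each box with its level
  scale_x_discrete(limits = c("source", "target")) +
  theme_minimal()
p
```

With only two axes and four rows it is easy to check by eye that the width of each band tracks the value in n.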

Converting to long format: to_lodes_form

titanic_long <- to_lodes_form(data.frame(Titanic),
                         key = "Demographic",
                         axes = 1:3)
> head(titanic_long)
  Survived Freq alluvium Demographic stratum
1       No    0        1       Class     1st
2       No    0        2       Class     2nd
3       No   35        3       Class     3rd
4       No    0        4       Class    Crew
5       No    0        5       Class     1st
6       No    0        6       Class     2nd
ggplot(data = titanic_long,
       aes(x = Demographic, stratum = stratum, alluvium = alluvium,
           y = Freq, label = stratum)) +
  geom_alluvium(aes(fill = Survived)) +
  geom_stratum() + geom_text(stat = "stratum") +
  theme_minimal() +
  ggtitle("passengers on the maiden voyage of the Titanic",
          "stratified by demographics and survival")

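To make the reshaping more concrete, the sketch below repeats the to_lodes_form() call from above and then inspects a single alluvium. The helper variable one is my own name; the choice of alluvium 3 is arbitrary.

```r
library(ggalluvial)

titanic_wide <- as.data.frame(Titanic)
titanic_long <- to_lodes_form(titanic_wide,
                              key = "Demographic",
                              axes = 1:3)

# Each wide row is repeated once per gathered axis (Class, Sex, Age)
nrow(titanic_wide)
nrow(titanic_long)

# All long-format rows that belong to the third wide row:
# same Survived and Freq, one stratum value per axis
one <- subset(titanic_long, alluvium == 3)
one
```

In the long format, the alluvium column ties together the rows that came from one wide row, Demographic says which axis a row sits on, and stratum holds the level at that axis.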
Swapping the x and y axes with coord_flip()

ggplot(as.data.frame(Titanic),
       aes(y = Freq,
           axis1 = Survived, axis2 = Sex, axis3 = Class)) +
  geom_alluvium(aes(fill = Class),
                width = 0, knot.pos = 0, reverse = FALSE) +
  guides(fill = "none") +  # fill = FALSE is deprecated in recent ggplot2
  geom_stratum(width = 1/8, reverse = FALSE) +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)),
            reverse = FALSE) +
  scale_x_continuous(breaks = 1:3, labels = c("Survived", "Sex", "Class")) +
  coord_flip() +
  ggtitle("Titanic survival by class and sex")
Alluvial plots with unequal heights
data(Refugees, package = "alluvial")
country_regions <- c(
  Afghanistan = "Middle East",
  Burundi = "Central Africa",
  `Congo DRC` = "Central Africa",
  Iraq = "Middle East",
  Myanmar = "Southeast Asia",
  Palestine = "Middle East",
  Somalia = "Horn of Africa",
  Sudan = "Central Africa",
  Syria = "Middle East",
  Vietnam = "Southeast Asia"
)
Refugees$region <- country_regions[Refugees$country]
ggplot(data = Refugees,
       aes(x = year, y = refugees, alluvium = country)) +
  geom_alluvium(aes(fill = country, colour = country),
                alpha = .75, decreasing = FALSE) +
  scale_x_continuous(breaks = seq(2003, 2013, 2)) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = -30, hjust = 0)) +
  scale_fill_brewer(type = "qual", palette = "Set3") +
  scale_color_brewer(type = "qual", palette = "Set3") +
  facet_wrap(~ region, scales = "fixed") +
  ggtitle("refugee volume by country and region of origin")
Warning message:
In f(...) :
  Some differentiation aesthetics vary within alluvia, and will be diffused by their first value.
Consider using `geom_flow()` instead.
Equal heights, unequal flows
data(majors)
majors$curriculum <- as.factor(majors$curriculum)
ggplot(majors,
       aes(x = semester, stratum = curriculum, alluvium = student,
           fill = curriculum, label = curriculum)) +
  scale_fill_brewer(type = "qual", palette = "Set2") +
  geom_flow(stat = "alluvium", lode.guidance = "rightleft",
            color = "darkgray") +
  geom_stratum() +
  theme(legend.position = "bottom") +
  ggtitle("student curricula across several semesters")
State transitions over time
data(vaccinations)
levels(vaccinations$response) <- rev(levels(vaccinations$response))
ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq,
           fill = response, label = response)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3) +
  theme(legend.position = "none") +
  ggtitle("vaccination survey responses at three points in time")

2. ggforce: Visual Guide

library(ggforce)  # provides gather_set_data() and geom_parallel_sets*()
data <- reshape2::melt(Titanic)
head(data)
  Class    Sex   Age Survived value
1   1st   Male Child       No     0
2   2nd   Male Child       No     0
3   3rd   Male Child       No    35
4  Crew   Male Child       No     0
5   1st Female Child       No     0
6   2nd Female Child       No     0
data <- gather_set_data(data, 1:4)
head(data)
  Class    Sex   Age Survived value id     x    y
1   1st   Male Child       No     0  1 Class  1st
2   2nd   Male Child       No     0  2 Class  2nd
3   3rd   Male Child       No    35  3 Class  3rd
4  Crew   Male Child       No     0  4 Class Crew
5   1st Female Child       No     0  5 Class  1st
6   2nd Female Child       No     0  6 Class  2nd
ggplot(data, aes(x, id = id, split = y, value = value)) +
  geom_parallel_sets(aes(fill = Sex), alpha = 0.3, axis.width = 0.1) +
  geom_parallel_sets_axes(axis.width = 0.1) +
  geom_parallel_sets_labels(colour = 'white')

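On the shoulders of giants

ggalluvial : Alluvial Diagrams in ggplot2
ggforce: Visual Guide
A simple implementation of Sankey diagrams (桑基图(Sankey)的简单实现)
How to read and draw Sankey diagrams, with R code (桑基图怎么看怎么画(附R代码))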
