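Using ggcor in microbial ecology
厚哥 (Houyun Huang, the author of ggcor) is still actively developing this package, but I could not wait to try it. Install it from GitHub with the install_github function.
In brief, what is this package for?
There are already at least two implementations of correlation-matrix visualization. Taiyun Wei (魏太云) wrote the corrplot package on top of the base graphics system. It is arguably the most polished package in this small niche: simple to use, rich in styles, and frankly stunning. Kassambara's ggcorrplot rewrites corrplot with ggplot2 and covers most of its features, but it only supports the "square" and "circle" markers, so the styles are somewhat monotonous. On the other hand, the whole ggcorrplot package is only about 300 lines of code, so its source is a good read if you want to learn how to write custom plotting functions with ggplot2. There is also the corrr package, which overlaps in part (according to the ggcor author, he had not looked at corrr at all before writing ggcor, and afterwards found the two strikingly similar in how they turn a correlation matrix into a data.frame). corrr puts most of its effort into extracting and reshaping correlation coefficients. The core aim of ggcor is to provide a complete solution for correlation analysis, data extraction, transformation, and visualization.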
if(!require(devtools))
install.packages("devtools")
if(!require(ggcor))
devtools::install_github("houyunhuang/ggcor")
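First, let's walk through the functions Houyun provides.
We will use them in a hands-on example later, and I will also pick out a few visualization styles I like. The example data used here are all ready-made, so you can run the code as you read.
Load the R package and example data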
suppressWarnings(suppressMessages(library("ggcor")))
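quickcor combines the correlation calculation and the conversion for plotting in one step
Start with a quick calculation and plot. By default quickcor uses cor to compute the correlations and does not compute significance.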
quickcor(mtcars) + geom_star()
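To make "computed with cor, no significance" concrete, here is a minimal base-R sketch of the numbers behind such a plot. It does not call ggcor at all; it only assumes that the default is the Pearson correlation returned by stats::cor.

```r
# Pearson correlation matrix for the built-in mtcars data (base R only)
r <- cor(mtcars)

dim(r)                 # one row and one column per variable
isSymmetric(r)         # r[i, j] equals r[j, i]
round(r[1:3, 1:3], 2)  # a corner of the matrix; note there are no p-values
```

The matrix holds coefficients only, which is why a separate significance test is needed before drawing conclusions.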
Because the correlation matrix is symmetric, half of it is redundant, so it is worth learning how to show only one half.
quickcor(mtcars, type = "lower", show.diag = FALSE) + geom_square()
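As a rough illustration of why half of the matrix is enough, the base-R sketch below pulls out the cells strictly below the diagonal into a long table. The column names row_var and col_var are made up for this example, and this is not how ggcor does it internally.

```r
r <- cor(mtcars)

# positions strictly below the diagonal (the diagonal itself is excluded)
idx <- which(lower.tri(r), arr.ind = TRUE)

half <- data.frame(
  row_var = rownames(r)[idx[, "row"]],
  col_var = colnames(r)[idx[, "col"]],
  r       = r[idx]
)

nrow(half)  # 11 * 10 / 2 = 55 unique variable pairs
head(half)
```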
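Without a significance test we cannot draw firm conclusions, so here we set cor.test = TRUE. This is the way of marking significance recommended in Houyun's package, although it does look somewhat redundant.
add_diag_label: I could not find this function. Has it been dropped? After all, the version I am using is the latest one, downloaded just now.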
quickcor(mtcars, cor.test = TRUE) +
geom_square(data = get_data(type = "lower")) +
geom_mark(data = get_data(type = "upper")) +
# add_diag_label(size = 5, colour = "red") +
remove_axis()
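For readers who want to see what a significance test adds, here is a small base-R sketch that builds a matrix of pairwise p-values with stats::cor.test. The helper name cor_pvalues is hypothetical, and this is only an approximation of what cor.test = TRUE presumably computes, not ggcor's own code.

```r
# pairwise p-values from stats::cor.test (Pearson by default)
cor_pvalues <- function(df) {
  vars <- names(df)
  p <- matrix(NA_real_, length(vars), length(vars),
              dimnames = list(vars, vars))
  for (i in seq_along(vars)) {
    for (j in seq_along(vars)) {
      if (i < j) {
        # the test is symmetric, so fill both triangles at once
        p[i, j] <- p[j, i] <- cor.test(df[[i]], df[[j]])$p.value
      }
    }
  }
  p  # the diagonal stays NA: a variable is not tested against itself
}

p <- cor_pvalues(mtcars)
p["mpg", "wt"] < 0.05         # is the mpg-wt correlation significant?
sum(p[lower.tri(p)] >= 0.05)  # number of non-significant pairs
```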
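The next example solves that problem nicely: it shows only the significant correlations, in one half of the matrix, and labels each with its correlation value. Since everything shown is significant, significance marks are no longer needed, so we drop geom_mark and use geom_number instead.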
quickcor(mtcars, cor.test = TRUE,type = "lower", show.diag = FALSE) +
geom_square(data = get_data(p.value < 0.05, type = "lower")) +
geom_number(data = get_data(p.value < 0.05, type = "lower"),aes(num = r))
If you like Houyun's five-pointed stars, here is a star version as well.
quickcor(mtcars, cor.test = TRUE,type = "lower", show.diag = FALSE) +
geom_star(data = get_data(p.value < 0.05, type = "lower"))
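This gives us the figure we want, but the method used to compute the correlations should still be adjustable. That is easy: just set the method argument, for example method = "kendall". The options are the three methods that cor.test accepts:
"pearson", "kendall", "spearman"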
quickcor(mtcars, cor.test = TRUE,type = "lower", show.diag = FALSE,method = "kendall") +
geom_star(data = get_data(p.value < 0.05, type = "lower"))
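To get a feel for how much the choice of method matters, this base-R sketch computes the same pairwise correlation under all three methods with stats::cor. The pair mpg and wt is picked only as an example.

```r
methods <- c("pearson", "kendall", "spearman")

# correlation between fuel economy (mpg) and weight (wt) under each method
r_by_method <- vapply(
  methods,
  function(m) cor(mtcars$mpg, mtcars$wt, method = m),
  numeric(1)
)

round(r_by_method, 3)
```

Pearson measures linear association, while Kendall and Spearman are rank-based, so the three values generally differ even when they agree in sign.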
A simple mode for correlations between two datasets
This simplifies our data handling.
library(vegan)
data("varechem")
data("varespec")
quickcor(varespec, varechem,cor.test = TRUE,method = "pearson") +
geom_star()
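When two tables are supplied, the result is a rectangular rather than a square matrix. The sketch below shows this with stats::cor, using two column subsets of mtcars as hypothetical stand-ins for two tables measured on the same samples.

```r
# stand-ins for two tables that share the same rows (samples)
x <- mtcars[, 1:4]   # hypothetical "first dataset"
y <- mtcars[, 5:11]  # hypothetical "second dataset"

r_xy <- cor(x, y)    # rows come from x, columns from y
dim(r_xy)            # 4 x 7, so the plot is not square
round(r_xy, 2)
```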
Understanding the internals, if you are curious. After all, what we covered above is already enough for everyday use.
library(ggplot2)
library(tidyverse)
correlate(mtcars, cor.test = TRUE) %>%
as_cor_tbl() %>%
ggcor() + geom_point(aes(size = r, fill = r), shape = 21) +
scale_size_area(max_size = 10) +
theme(axis.title = element_blank()) +
coord_fixed(xlim = c(0.5, 11.5), ylim = c(0.5, 11.5))
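The key idea in this pipeline is turning a correlation matrix into a long data frame that ggplot2 can map to aesthetics. The sketch below imitates that step with base R only; the column names x, y and r are chosen for this example and are not necessarily the names ggcor uses. The plotting part runs only if ggplot2 is installed.

```r
r <- cor(mtcars)

# as.table() + as.data.frame() gives one row per matrix cell
long <- as.data.frame(as.table(r), responseName = "r")
names(long)[1:2] <- c("x", "y")
head(long)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # size uses |r| so that strong negative correlations are not drawn tiny
  p <- ggplot(long, aes(x, y)) +
    geom_point(aes(size = abs(r), fill = r), shape = 21) +
    scale_size_area(max_size = 10) +
    coord_fixed() +
    theme(axis.title = element_blank())
  # print(p) draws the figure
}
```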
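The section below is what I wrote when I first learned ggcor; it now needs updating.
Here are a few visualization styles I like. Throughout, I switch the legend to discrete colours: fill.bin = TRUE, legend.breaks = seq(-1, 1, length.out = 5)
Note that only the development version supports these colour arguments (fill.bin = TRUE, legend.breaks = seq(-1, 1, length.out = 5)); see the installation instructions at the end of the post.
# This shows not only the correlation values but also their confidence intervals; geom_cross crosses out non-significant correlations.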
quickcor(mtcars, type = "full", cor.test = TRUE) + geom_confbox()+ geom_cross()
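I think squares look more refined and I prefer them; the non-significant correlations are crossed out.
# squares, with non-significant correlations crossed out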
quickcor(mtcars, type = "full", cor.test = TRUE) +
geom_square() + geom_cross()
Here is the five-pointed star that Houyun added; let's take a look.
# five-pointed star version (the square layer is commented out)
quickcor(mtcars, type = "full", cor.test = TRUE) +
# geom_square() +
geom_star(n = 5)+
geom_cross()
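Colour fill combined with significance labels makes it easy to take in the correlations among indicators at a glance. Setting r = NA drops the correlation values, since colour already encodes their magnitude, and mark = "*" gives a uniform significance label.
# colour fill plus significance marks; r = NA hides the correlation values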
quickcor(mtcars, type = "full", cor.test = TRUE, cluster.type = "all") +
# geom_raster() +
geom_colour()+
geom_mark(r = NA,sig.thres = 0.05, size = 3, colour = "grey90")
Correlating the community matrix with environmental factors: this workflow is fairly complete and works well.
Points to note:
data("varespec", package = "vegan")
data("varechem", package = "vegan")
df <- mantel_test(varespec, varechem)
library(ggplot2)
df <- df %>%
mutate(lty = cut(r, breaks = c(-Inf, 0, Inf),
labels = c("r <= 0", "r > 0")),
col = cut(p.value, breaks = c(0, 0.01, 0.05, 1),
labels = c("< 0.01", "< 0.05", ">= 0.05"),
right = FALSE, include.lowest = TRUE))
quickcor(varechem, type = "upper") +
geom_square() +
# add_diag_label() +
add_link(df, mapping = aes(colour = col,
size = r,
linetype = lty),diag.label = TRUE) +
scale_fill_gradient2n() +
scale_size_area(max_size = 3) +
scale_linetype_manual(values = c("dotted", "solid")) +
guides(
fill = guide_colourbar(title = "corr", order = 1),
colour = guide_legend(title = "Mantel's p", order = 2),
size = guide_legend(title = "Mantel's r", order = 3),
linetype = "none"
)
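The two cut() calls above decide which line type and colour each link receives, and their boundary behaviour is easy to get wrong. The base-R sketch below runs the same breaks on a few made-up values so the binning can be checked directly.

```r
p_values <- c(0.001, 0.01, 0.03, 0.05, 0.5, 1)  # made-up p-values
r_values <- c(-0.3, 0, 0.2)                     # made-up Mantel r values

# right = FALSE gives the bins [0, 0.01), [0.01, 0.05), [0.05, 1];
# include.lowest = TRUE closes the last bin so that p = 1 is kept
p_class <- cut(p_values, breaks = c(0, 0.01, 0.05, 1),
               labels = c("< 0.01", "< 0.05", ">= 0.05"),
               right = FALSE, include.lowest = TRUE)

# the default right = TRUE gives the bins (-Inf, 0] and (0, Inf)
r_class <- cut(r_values, breaks = c(-Inf, 0, Inf),
               labels = c("r <= 0", "r > 0"))

data.frame(p_values, p_class)
data.frame(r_values, r_class)
```

Without include.lowest = TRUE, a p-value of exactly 1 would fall outside every bin and become NA.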
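What about analysing several communities at once: how are they brought in? varespec is the community data, stored as a data frame. Here its columns are split into three groups, and each group is treated as a separate community.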
df02 <- mantel_test(varespec, varechem,
spec.select = list(spec01 = 1:5,
spec02 = 6:12,
spec03 = 13:17
)) %>%
mutate(lty = cut(r, breaks = c(-Inf, 0, Inf),
labels = c("r <= 0", "r > 0")),
col = cut(p.value, breaks = c(0, 0.01, 0.05, 1),
labels = c("< 0.01", "< 0.05", ">= 0.05"),
right = FALSE, include.lowest = TRUE))
extra.params <- extra_params(
spec.label = text_params(colour = "red", size = 7),
env.point = point_params(size = 2, fill = "grey80"),
spec.point = point_params(size = 4, shape = 24, fill = "red"),
link.params = link_params(env.point.hjust = -0.5,
env.point.vjust = -0.1,
spec.point.hjust = 1)
)
quickcor(varechem, type = "lower") +
geom_square() +
add_link(df02, mapping = aes(colour = col,
size = r,
linetype = lty),
diag.label = TRUE, spec.label.hspace = 0.5,
extra.params = extra.params) +
# add_diag_label(angle = 45) +
remove_axis("y") +
expand_axis(x = 25) + ## widen the x-axis range
scale_fill_gradient2n() +
scale_colour_manual(values = c("#D95F02", "#1B9E77", "#D2D2D2")) +
scale_size_area(max_size = 3) +
scale_linetype_manual(values = c("dotted", "solid")) +
guides(
fill = guide_colourbar(title = "corr", order = 1),
colour = guide_legend(title = "Mantel's p", order = 2),
size = guide_legend(title = "Mantel's r", order = 3),
linetype = "none"
)
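To show what a per-group Mantel test involves, here is a rough base-R sketch. It is not ggcor's mantel_test: the function mantel_sketch is hypothetical, it uses Euclidean distances from dist() for both tables (the package may well use other distance measures), and columns of mtcars stand in for community and environmental data. The Mantel statistic itself is the Pearson correlation between two distance matrices, with a permutation p-value.

```r
set.seed(1)

# Mantel statistic: correlation between the lower triangles of two distance
# matrices; p-value from shuffling the rows/columns of one of them
mantel_sketch <- function(x, y, nperm = 199) {
  dx <- as.matrix(dist(x))
  dy <- as.matrix(dist(y))
  lower <- lower.tri(dx)
  r_obs <- cor(dx[lower], dy[lower])
  r_perm <- replicate(nperm, {
    i <- sample(nrow(dx))
    cor(dx[i, i][lower], dy[lower])
  })
  c(r = r_obs, p.value = (sum(r_perm >= r_obs) + 1) / (nperm + 1))
}

spec <- scale(mtcars[, 1:7])   # hypothetical stand-in for a community table
env  <- scale(mtcars[, 8:11])  # hypothetical stand-in for environmental data
spec.select <- list(spec01 = 1:3, spec02 = 4:7)

# one test per (column group, environmental variable) pair
res <- do.call(rbind, lapply(names(spec.select), function(s) {
  do.call(rbind, lapply(colnames(env), function(e) {
    m <- mantel_sketch(spec[, spec.select[[s]]], env[, e, drop = FALSE])
    data.frame(spec = s, env = e, r = m[["r"]], p.value = m[["p.value"]])
  }))
}))
res
```

Each row of res corresponds to one link in a plot of this kind: a community group on one side, an environmental variable on the other.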
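Correlating three sets of data
I think this design of Houyun's is very good: it relates the environmental variables to another set of variables while also bringing the communities in, which enriches the combined analysis and makes the conclusions flow more smoothly.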
corr <- fortify_cor(varechem, varespec[1:6])
extra.params <- extra_params(
spec.label = text_params(colour = "red", size = 7),
env.point = point_params(size = 2, fill = "grey80"),
spec.point = point_params(size = 4, shape = 24, fill = "red"),
link.params = link_params(spec.point.hjust = 3)
)
quickcor(corr) +
geom_square() +
add_link(df02, mapping = aes(colour = col,
size = r,
linetype = lty),
spec.label.hspace = 0.5,
extra.params = extra.params) +
expand_axis(x = 15) +
scale_fill_gradient2n() +
scale_colour_manual(values = c("#D95F02", "#1B9E77", "#D2D2D2")) +
scale_size_area(max_size = 3) +
scale_linetype_manual(values = c("dotted", "solid")) +
guides(
fill = guide_colourbar(title = "corr", order = 1),
colour = guide_legend(title = "Mantel's p", order = 2),
size = guide_legend(title = "Mantel's r", order = 3),
linetype = "none"
)
Another way to combine the plots
library(cowplot)
mantel <- fortify_mantel(varespec, varechem,
spec.select = list(spec01 = 22:25,
spec02 = 1:4,
spec03 = 38:43,
spec04 = 15:20),
mantel.fun = "mantel.randtest")
mantel$p <- cut(mantel$p.value, breaks = c(0, 0.001, 0.01, 0.05, 1),
labels = c("< 0.001", "< 0.01", "< 0.05", ">= 0.05"),
right = FALSE, include.lowest = TRUE)
p1 <- quickcor(varechem) + geom_square() + remove_axis("x")
p2 <- quickcor(mantel, mapping = aes(fill = p.value), is.minimal = TRUE, keep.name = TRUE) +
geom_star(aes(r = 0.65), n = 5, ratio = 0.6)
plot_grid(p1, p2, ncol = 1, align = "v", labels = c('A', 'B'),
rel_heights = c(0.135*dim( varechem)[2], 1))
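Hands-on example
I have developed a series of amplicon data analysis scripts based on phyloseq, and I will add this script to them soon. To follow along you need to know phyloseq's data structure and basic usage.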
library("phyloseq")
library(microbiomeSeq)
library("vegan")
library("grid")
library("gridExtra")
library("ggplot2")
ps = readRDS(".//ps_OTU_.rds")
ps1 = ps
ps1 = filter_taxa(ps1, function(x) sum(x ) > 200 , TRUE);ps1
ps1 = transform_sample_counts(ps1, function(x) x / sum(x) );ps1
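The two phyloseq calls above first drop rare taxa and then convert counts to relative abundance. The sketch below does the same two steps on a toy count matrix in base R, so the effect is visible without a phyloseq object. The numbers are invented for illustration.

```r
# toy count table: taxa in rows, samples in columns (invented numbers)
counts <- matrix(c(150, 120,  90,
                    10,   5,  20,
                   300, 280, 310,
                    60,  90,  70),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("OTU", 1:4), paste0("S", 1:3)))

# keep taxa whose total count across all samples exceeds 200
kept <- counts[rowSums(counts) > 200, , drop = FALSE]

# divide each sample (column) by its own total to get relative abundance
rel <- sweep(kept, 2, colSums(kept), "/")

rownames(kept)  # OTU2 is dropped (total count 35)
colSums(rel)    # every sample now sums to 1
```

Because filtering happens first, the relative abundances are computed over the retained taxa only.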
path = "./phyloseq_5_RDA_CCA_cor/"
dir.create(path)
vegan_otu <- function(physeq){
OTU <- otu_table(physeq)
if(taxa_are_rows(OTU)){
OTU <- t(OTU)
}
return(as(OTU,"matrix"))
}
otu = as.data.frame(t(vegan_otu(ps1)))
mapping = as.data.frame( sample_data(ps1))
env.dat = mapping[,3:ncol(sample_data(ps1))]
env.st = decostand(env.dat, method="standardize", MARGIN=2)  # standardize each column (zero mean, unit variance)
env_dat = env.st
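Standardizing by column means centring each variable to mean 0 and scaling it to unit standard deviation. As a quick check of what that does, here is the base-R equivalent with scale(), using three mtcars columns as a hypothetical stand-in for the environmental table.

```r
env <- mtcars[, c("mpg", "hp", "wt")]  # hypothetical stand-in for env.dat

# centre each column to mean 0 and scale it to standard deviation 1
env_scaled <- as.data.frame(scale(env))

round(colMeans(env_scaled), 10)  # all zero
apply(env_scaled, 2, sd)         # all one
```

Pearson correlations are unchanged by this step; it matters for distance-based analyses such as the Mantel tests later on, where variables on large scales would otherwise dominate.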
# This shows not only the correlation values but also their confidence intervals; geom_cross crosses out non-significant correlations.
quickcor(env_dat, type = "full", cor.test = TRUE) + geom_confbox()+ geom_cross()
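I think squares look more refined and I prefer them; the non-significant correlations are crossed out.
# squares, with non-significant correlations crossed out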
quickcor(env_dat, type = "full", cor.test = TRUE) +
geom_square() + geom_cross()
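Colour fill combined with significance labels makes it easy to take in the correlations among indicators at a glance. Setting r = NA drops the correlation values, since colour already encodes their magnitude, and mark = "*" gives a uniform significance label.
# colour fill plus significance marks; r = NA hides the correlation values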
quickcor(env_dat, type = "full", cor.test = TRUE, cluster.type = "all") +
# geom_raster() +
geom_colour()+
geom_mark(r = NA,sig.thres = 0.05, size = 3, colour = "grey90")
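With this in hand, picking out microbes and environmental factors to correlate becomes much easier.
# An asymmetric correlation plot saves space. Many people have asked how to make one; it simply means cropping the final plotting matrix accordingly. I used to crop it by hand, but Houyun has now built this into the package, which is much more convenient.
# compute the correlations
# too many OTUs make the plot hard to read, so only 30 are shown here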
ss = t(otu)[,1:30]
df03 <- fortify_cor(x = env_dat, y = ss, cluster.type = "col")
quickcor(df03) + geom_square()
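Now let's combine the community and environmental-factor relationships. Here I simulate several communities (four in the code below) that are all identical copies of the same table, and collect them in a list.
xlim = c(-5, (dim(env_dat)[2] +0.5)): following Houyun's advice I changed the upper limit to the number of environmental factors plus 0.5. He said this argument may later be handled inside the function.
# transpose the OTU table to use as the first community
otu2 = t(otu)
# assign the same table as the second community
otu3 = t(otu)
# however many communities there are, wrap them in a list and remember to name them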
spe = list(A = otu2,B = otu3,C = otu3,D = otu3)
a = dim(spe[[1]])[2]
b = dim(spe[[2]])[2]
c = dim(spe[[3]])[2]
d = dim(spe[[4]])[2]
spec.select = list(A = 1:a,
B = (a+1) :(b +a ),
C = (a +b +1) :(b +a +c),
D = (a +b +c +1) :(b +a +c +d)
)
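The index ranges above are written out by hand for four communities. As a possible generalization, the sketch below derives the same kind of ranges for any number of communities with cumsum. The helper name make_spec_select is hypothetical and is not part of ggcor.

```r
# build consecutive column ranges, one per community, from a named list of tables
make_spec_select <- function(spe) {
  n_cols <- vapply(spe, ncol, integer(1))  # columns contributed by each community
  last   <- cumsum(n_cols)                 # last column index of each block
  first  <- last - n_cols + 1L             # first column index of each block
  setNames(Map(function(a, b) a:b, first, last), names(spe))
}

# toy communities with 3, 2 and 4 columns
toy <- list(A = matrix(0, 2, 3), B = matrix(0, 2, 2), C = matrix(0, 2, 4))
str(make_spec_select(toy))
```

The ranges line up with the column order produced when the list is bound column-wise, as as.data.frame(spe) does in the next step.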
df02 <- mantel_test(as.data.frame(spe), env_dat,
spec.select = spec.select) %>%
mutate(lty = cut(r, breaks = c(-Inf, 0, Inf),
labels = c("r <= 0", "r > 0")),
col = cut(p.value, breaks = c(0, 0.01, 0.05, 1),
labels = c("< 0.01", "< 0.05", ">= 0.05"),
right = FALSE, include.lowest = TRUE))
extra.params <- extra_params(
spec.label = text_params(colour = "red", size = 7),
env.point = point_params(size = 2, fill = "grey80"),
spec.point = point_params(size = 4, shape = 24, fill = "red"),
link.params = link_params(env.point.hjust = -0.5,
env.point.vjust = -0.1,
spec.point.hjust = 1)
)
quickcor(env_dat, type = "lower") +
geom_square() +
add_link(df02, mapping = aes(colour = col,
size = r,
linetype = lty),
diag.label = TRUE, spec.label.hspace = 0.5,
extra.params = extra.params) +
# add_diag_label(angle = 45) +
remove_axis("y") +
expand_axis(x = 25) + ## widen the x-axis range
scale_fill_gradient2n() +
scale_colour_manual(values = c("#D95F02", "#1B9E77", "#D2D2D2")) +
scale_size_area(max_size = 3) +
scale_linetype_manual(values = c("dotted", "solid")) +
guides(
fill = guide_colourbar(title = "corr", order = 1),
colour = guide_legend(title = "Mantel's p", order = 2),
size = guide_legend(title = "Mantel's r", order = 3),
linetype = "none"
)
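Another combined figure for visualizing community and environmental-factor relationships. Panel B needs its size adjusted. I originally assumed the heights mapped linearly and tried to derive them from a variable, but that is clearly not the case, so I simply adjusted the size by hand.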
library(cowplot)
mantel <- fortify_mantel(as.data.frame(spe), env_dat,
spec.select = spec.select,
mantel.fun = "mantel.randtest")
mantel$p <- cut(mantel$p.value, breaks = c(0, 0.001, 0.01, 0.05, 1),
labels = c("< 0.001", "< 0.01", "< 0.05", ">= 0.05"),
right = FALSE, include.lowest = TRUE)
p1 <- quickcor(env_dat) + geom_square() + remove_axis("x")
p2 <- quickcor(mantel, mapping = aes(fill = p.value), is.minimal = TRUE, keep.name = TRUE) +
geom_star(aes(r = 0.65), n = 5, ratio = 0.6)
plot_grid(p1, p2, ncol = 1, align = "v", labels = c('A', 'B'),
rel_heights = c(0.14 * ncol(env_dat), 1))