The paper:
Environmental factors shaping the gut microbiome in a Dutch population
GitHub repository with the data and code:
https://github.com/GRONINGEN-MICROBIOME-CENTRE/DMP
The data and code can also be downloaded from Zenodo, where you can browse the directory structure:
https://zenodo.org/record/5910709#.YmAcp4VBzic
Today's post reproduces Figure 2c of the paper.
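The code published with the paper defines one very long custom function that seems to combine the statistical tests and the plotting. I did not fully understand the statistical-test part, so here I pulled out only the plotting code; the FDR value from the statistical test is added to the figure by hand at the end.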
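The paper's own test is not reproduced in this post. For orientation only: an FDR value is usually a raw p-value after a multiple-testing correction such as Benjamini-Hochberg (BH). A minimal sketch with base R's `p.adjust()`, assuming you already have one raw p-value per comparison; the comparison names and numbers below are made up, not results from the paper:

```r
# Hypothetical raw p-values, one per comparison (made-up numbers, not from the paper)
raw_p <- c(PARTNERS_vs_RND = 0.01,
           PARENT_CHILD_vs_RND = 0.04,
           SIBLINGS_vs_RND = 0.03)

# Benjamini-Hochberg: sort the p-values, scale each by n / rank,
# then take cumulative minima from the largest down so the result stays monotone
fdr <- p.adjust(raw_p, method = "BH")
print(fdr)
```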
First, read in the data
dfToPlot<-read.csv("dfToPlot.csv")
head(dfToPlot)
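If you do not have dfToPlot.csv at hand, a hypothetical stand-in with the two columns the plotting code uses (RELATIONSHIP.0 and BC_Spec) lets everything below run. The values are random draws, not the paper's data, and the group size of 50 is arbitrary:

```r
# Hypothetical stand-in for dfToPlot.csv: random numbers, NOT the paper's data
set.seed(1)
groups <- c("RND.PAIR", "PARTNERS", "PARENT_CHILD", "SIBLINGS")
dfToPlot <- data.frame(
  RELATIONSHIP.0 = rep(groups, each = 50),        # the four pair types used as factor levels below
  BC_Spec = rbeta(200, shape1 = 5, shape2 = 3),   # Bray-Curtis distances lie in [0, 1]
  stringsAsFactors = FALSE                        # character column, like read.csv() in R >= 4.0
)
head(dfToPlot)
```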
Assign factor levels to the x-axis variable
dfToPlot$RELATIONSHIP.0 <- factor(dfToPlot$RELATIONSHIP.0,
                                  levels = c("RND.PAIR", "PARTNERS", "PARENT_CHILD", "SIBLINGS"))
The factor levels mainly control the order in which the groups appear along the x axis.
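A small base-R illustration of what the levels do, using a toy vector `rel` (hypothetical) in place of the real column. It also shows a pitfall: any value that is not listed in `levels` silently becomes NA, so a typo in a level name drops those rows from the plot:

```r
# Toy vector standing in for dfToPlot$RELATIONSHIP.0
rel <- c("SIBLINGS", "PARTNERS", "RND.PAIR", "PARENT_CHILD", "PARTNERS")

# Without explicit levels, factor() sorts the values alphabetically
levels(factor(rel))

# Explicit levels: this is the order ggplot2 uses on a discrete x axis
rel_f <- factor(rel, levels = c("RND.PAIR", "PARTNERS", "PARENT_CHILD", "SIBLINGS"))
levels(rel_f)
sort(rel_f)   # sorting a factor also follows the level order

# A misspelled level ("PARTNER" instead of "PARTNERS") turns those values into NA
typo_f <- factor(rel, levels = c("RND.PAIR", "PARTNER", "PARENT_CHILD", "SIBLINGS"))
sum(is.na(typo_f))   # 2
```

Running `stopifnot(!anyNA(dfToPlot$RELATIONSHIP.0))` right after the factor() call is a cheap guard against this.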
Load ggplot2
library(ggplot2)
Box plot
ggplot(data = dfToPlot, aes(x = RELATIONSHIP.0,
                            y = BC_Spec,
                            color = RELATIONSHIP.0)) +
  geom_boxplot()
Jittered scatter plot
ggplot(data = dfToPlot, aes(x = RELATIONSHIP.0,
                            y = BC_Spec,
                            color = RELATIONSHIP.0)) +
  geom_jitter()
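By default geom_jitter() adds random noise in both directions; the vertical amount defaults to 40% of the resolution of the data, so the plotted distances are slightly shifted. A sketch that jitters only horizontally, on hypothetical random stand-in data (`df` and `p_jit` are illustrative names):

```r
library(ggplot2)

# Hypothetical stand-in data (random, not the paper's)
set.seed(1)
lv <- c("RND.PAIR", "PARTNERS", "PARENT_CHILD", "SIBLINGS")
df <- data.frame(RELATIONSHIP.0 = factor(rep(lv, each = 30), levels = lv),
                 BC_Spec = rbeta(120, 5, 3))

# width: horizontal spread within each group; height = 0: no vertical noise,
# so every point sits at its exact distance value
p_jit <- ggplot(df, aes(x = RELATIONSHIP.0, y = BC_Spec, color = RELATIONSHIP.0)) +
  geom_jitter(width = 0.2, height = 0)
p_jit
```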
Violin plot
ggplot(data = dfToPlot, aes(x = RELATIONSHIP.0,
                            y = BC_Spec,
                            color = RELATIONSHIP.0)) +
  geom_violin()
Combining the three plots: first save each plot as an object
library(ggplot2)
ggplot(data = dfToPlot, aes(x = RELATIONSHIP.0,
                            y = BC_Spec,
                            color = RELATIONSHIP.0)) +
  geom_boxplot() -> p1
p1
ggplot(data = dfToPlot, aes(x = RELATIONSHIP.0,
                            y = BC_Spec,
                            color = RELATIONSHIP.0)) +
  geom_jitter() -> p2
p2
ggplot(data = dfToPlot, aes(x = RELATIONSHIP.0,
                            y = BC_Spec,
                            color = RELATIONSHIP.0)) +
  geom_violin() -> p3
p3
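The code above only saves and prints the three plots one at a time. To place them side by side, one option is the patchwork package (an assumption here: it has to be installed separately). A minimal sketch on hypothetical random stand-in data:

```r
library(ggplot2)
library(patchwork)  # assumed installed, e.g. install.packages("patchwork")

# Hypothetical stand-in data (random, not the paper's)
set.seed(1)
lv <- c("RND.PAIR", "PARTNERS", "PARENT_CHILD", "SIBLINGS")
df <- data.frame(RELATIONSHIP.0 = factor(rep(lv, each = 30), levels = lv),
                 BC_Spec = rbeta(120, 5, 3))

# One shared base plot; each panel adds a different geom
base <- ggplot(df, aes(x = RELATIONSHIP.0, y = BC_Spec, color = RELATIONSHIP.0))
p1 <- base + geom_boxplot()
p2 <- base + geom_jitter(width = 0.2, height = 0)
p3 <- base + geom_violin()

# `|` puts panels side by side; guides = "collect" merges the three identical legends
combined <- (p1 | p2 | p3) + plot_layout(guides = "collect")
combined
```

Building a shared `base` object avoids repeating the same aes() call three times.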
Overlay the three layers (jittered points, box plot, violin) in a single plot
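# Colour-blind-friendly palette; with four groups only the first four colours are used
cbPalette <- c("#E69F00", "#CC79A7", "#56B4E9", "#009E73", "#CC79A7", "#F0E442", "#999999", "#0072B2", "#D55E00")
ggplot(data = dfToPlot, aes(x = RELATIONSHIP.0,
                            y = BC_Spec,
                            color = RELATIONSHIP.0)) +
  # points spread horizontally only; jitter.height = 0 keeps the exact distances
  geom_jitter(alpha = 0.2,
              position = position_jitterdodge(jitter.width = 0.35,
                                              jitter.height = 0,
                                              dodge.width = 0.8)) +
  # outlier.colour = NA hides outlier points, since every point is already drawn above
  # (linewidth needs ggplot2 >= 3.4.0; use size = on older versions)
  geom_boxplot(alpha = 0.2, width = 0.45,
               position = position_dodge(width = 0.8),
               linewidth = 0.75, outlier.colour = NA) +
  geom_violin(alpha = 0.2, width = 0.9,
              position = position_dodge(width = 0.8),
              linewidth = 0.75) +
  scale_color_manual(values = cbPalette) +
  theme_classic() +
  theme(legend.position = "none") +
  theme(text = element_text(size = 16)) +
  #ylim(0.0, 1.3) +
  ylab("Bray-Curtis distance of Species")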
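scale_color_manual() assigns an unnamed `values` vector in the order of the factor levels, which is why only the first four cbPalette entries end up in the plot. A named vector binds each colour to a group name instead, so reordering the levels does not swap colours. A sketch on hypothetical random stand-in data, reusing the first four cbPalette colours (`group_cols` and `p_col` are illustrative names):

```r
library(ggplot2)

# Hypothetical stand-in data (random, not the paper's)
set.seed(1)
lv <- c("RND.PAIR", "PARTNERS", "PARENT_CHILD", "SIBLINGS")
df <- data.frame(RELATIONSHIP.0 = factor(rep(lv, each = 30), levels = lv),
                 BC_Spec = rbeta(120, 5, 3))

# Names must match the factor levels exactly
group_cols <- c(RND.PAIR = "#E69F00", PARTNERS = "#CC79A7",
                PARENT_CHILD = "#56B4E9", SIBLINGS = "#009E73")

p_col <- ggplot(df, aes(x = RELATIONSHIP.0, y = BC_Spec, color = RELATIONSHIP.0)) +
  geom_boxplot() +
  scale_color_manual(values = group_cols) +
  theme_classic()
p_col
```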
Finally, add the statistical test result (the FDR value) to the plot by hand
ggplot(data = dfToPlot, aes(x = RELATIONSHIP.0,
                            y = BC_Spec,
                            color = RELATIONSHIP.0)) +
  geom_jitter(alpha = 0.2,
              position = position_jitterdodge(jitter.width = 0.35,
                                              jitter.height = 0,
                                              dodge.width = 0.8)) +
  geom_boxplot(alpha = 0.2, width = 0.45,
               position = position_dodge(width = 0.8),
               linewidth = 0.75, outlier.colour = NA) +
  geom_violin(alpha = 0.2, width = 0.9,
              position = position_dodge(width = 0.8),
              linewidth = 0.75) +
  scale_color_manual(values = cbPalette) +
  theme_classic() +
  theme(legend.position = "none") +
  theme(text = element_text(size = 16)) +
  #ylim(0.0, 1.3) +
  ylab("Bray-Curtis distance of Species") +
  #scale_x_discrete(labels = c("A", "B", "C", "D")) +
  # arrow from group 1 to group 2 (head at the right end)...
  annotate("segment", x = 1 - 0.01, y = 1, xend = 2.01, yend = 1,
           lineend = "round", linewidth = 1, colour = "black",
           arrow = arrow(length = unit(0.02, "npc"))) +
  # ...and back from group 2 to group 1 (head at the left end)
  annotate("segment", x = 2.01, y = 1, xend = 0.99, yend = 1,
           lineend = "round", linewidth = 1, colour = "black",
           arrow = arrow(length = unit(0.02, "npc"))) +
  # FDR value typed in by hand; %*% renders as a multiplication sign
  annotate("text", x = 1.5, y = 1.01,
           label = expression("**" ~ "FDR" ~ 2.41 %*% 10^-10), vjust = 0) -> p4
p4
Note that the original code draws the double-headed arrow by adding a single-headed arrow twice, once in each direction.
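ggplot2 re-exports grid's arrow(), whose `ends` argument accepts "first", "last" or "both", so a single segment with `ends = "both"` also gives a double-headed arrow. The sketch below additionally builds the FDR label from a number with bquote() instead of typing mantissa and exponent by hand. The stand-in data are random, and 2.41e-10 is simply the value quoted in the annotation above; `double_arrow`, `fdr` and `p_ann` are illustrative names:

```r
library(ggplot2)

# Hypothetical stand-in data (random, not the paper's)
set.seed(1)
lv <- c("RND.PAIR", "PARTNERS", "PARENT_CHILD", "SIBLINGS")
df <- data.frame(RELATIONSHIP.0 = factor(rep(lv, each = 30), levels = lv),
                 BC_Spec = rbeta(120, 5, 3))

# One arrow specification with heads at both ends
double_arrow <- arrow(ends = "both", length = unit(0.02, "npc"))

# Split the number into mantissa and exponent for the plotmath label
fdr  <- 2.41e-10
expo <- floor(log10(fdr))         # exponent of 10
mant <- round(fdr / 10^expo, 2)   # mantissa, two decimals
lab  <- bquote("**" ~ "FDR" ~ .(mant) %*% 10^.(expo))

p_ann <- ggplot(df, aes(x = RELATIONSHIP.0, y = BC_Spec, color = RELATIONSHIP.0)) +
  geom_boxplot() +
  annotate("segment", x = 0.99, xend = 2.01, y = 1, yend = 1,
           linewidth = 1, colour = "black", arrow = double_arrow) +
  # the label is passed as a string and parsed back into plotmath
  annotate("text", x = 1.5, y = 1.01,
           label = paste(deparse(lab), collapse = ""), parse = TRUE, vjust = 0)
p_ann
```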
Making the cover image
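library(patchwork)  # `+` between two ggplot objects needs the patchwork package
# p4: the annotated plot from the previous step (save it with `-> p4` if you have not);
# the trailing scale_color_manual() recolours only the second copy
p4 + p4 + scale_color_manual(values = cbPalette[5:8])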
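With patchwork, `p4 + p4` places two copies side by side, and a ggplot component added after that (here scale_color_manual()) applies only to the last panel, which is why the second copy gets different colours. A self-contained sketch that also writes the cover to a file with ggsave(); the simplified p4, the random stand-in data, the file name and the 10 x 5 inch size are all hypothetical choices:

```r
library(ggplot2)
library(patchwork)  # assumed installed

# Hypothetical stand-in data (random, not the paper's)
set.seed(1)
lv <- c("RND.PAIR", "PARTNERS", "PARENT_CHILD", "SIBLINGS")
df <- data.frame(RELATIONSHIP.0 = factor(rep(lv, each = 30), levels = lv),
                 BC_Spec = rbeta(120, 5, 3))
cbPalette <- c("#E69F00", "#CC79A7", "#56B4E9", "#009E73", "#CC79A7",
               "#F0E442", "#999999", "#0072B2", "#D55E00")

# Simplified stand-in for the annotated p4
p4 <- ggplot(df, aes(x = RELATIONSHIP.0, y = BC_Spec, color = RELATIONSHIP.0)) +
  geom_boxplot() +
  scale_color_manual(values = cbPalette) +
  theme_classic() +
  theme(legend.position = "none")

# The second scale replaces the first one, but only in the right-hand panel
cover <- p4 + p4 + scale_color_manual(values = cbPalette[5:8])

# The .pdf extension selects the PDF device; width and height are in inches
out <- file.path(tempdir(), "cover.pdf")
ggsave(out, plot = cover, width = 10, height = 5)
```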
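The example data and code for today's post can be obtained by leaving the message 20220505 in the backend of the WeChat public account.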
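You are welcome to follow my WeChat public account 小明的数据分析笔记本 (Xiaoming's Data Analysis Notebook), which mainly shares: 1. simple examples of data analysis and data visualization in R and Python; 2. reading notes on transcriptomics, genomics and population genetics literature in horticultural plants; 3. introductory bioinformatics learning materials and my own study notes.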