Reproducing article figures with R: Reproducing-1 Core gut microbial communities are maintained by beneficial interactions an...

Original article: https://www.nature.com/articles/s41564-019-0560-0
Title: Core gut microbial communities are maintained by beneficial interactions and strain variability in fish
Journal: Nature Microbiology
I. Target figures to reproduce (panels as labelled):
Figure 1.

image.png

Figure 2.

image.png

II. Tools and preparation
All of the visualization below was done on Windows 10.
1) Phylogenetic tree construction: MEGA X version 10.2.2, used to build the tree.
Download: https://www.megasoftware.net/

2) Tree decoration (online tool): EvolView, used to add markers and colours to the tree:
https://evolgenius.info/evolview-v2/#mytrees/DEMOS/yeast%20duplications

3) Data storage and preparation: Microsoft Excel and plain-text files.

4) Data visualization: R version 4.0.3 and RStudio, used to draw, style and combine:
stacked bar charts, box plots, bar charts, Venn diagrams, bubble plots and line graphs
For installation, see: https://www.jianshu.com/p/1a0f25086e8b
Download links:
R: https://www.r-project.org/
R Studio: https://rstudio.com/products/rstudio/download/

5) Auxiliary tool, Adobe Photoshop CS4: used to touch up some of the text labels in the finished figures.

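III. Data preparation and implementation
3.1 Stacked bar chart
3.1.1 Data preparation: preparing the data is usually the most important part of any visualization. The table below shows the data used:
Column 1: the group (four groups in total).
Column 2: the taxa in each group.
Column 3: the relative abundance of each taxon in each group.

With this structure in mind, I generated the data below myself and saved it in CSV format.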

image.png
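If you don't have your own table yet, a toy data frame in the same three-column layout can be built directly in R. The group names, taxon names and numbers below are made up for illustration. The sketch also shows where the odd column name `Relative.abundance....` used below most likely comes from: assuming the CSV header was `Relative abundance (%)`, `read.csv()` passes header names through `make.names()`, which turns every space, bracket and `%` into a dot.

```r
# Hypothetical toy data in long format: one row per (group, taxon) pair.
df <- data.frame(
  Status = rep(c("GroupA", "GroupB"), each = 3),
  Taxonomy = rep(c("Firmicutes", "Proteobacteria", "Others"), times = 2),
  abundance = c(50, 30, 20, 60, 25, 15)
)
# Assumed original header, as it might be typed in Excel.
names(df)[3] <- "Relative abundance (%)"

# Write and re-read a temporary CSV, the same way read.csv(file.choose()) reads it.
tmp <- tempfile(fileext = ".csv")
write.csv(df, tmp, row.names = FALSE)
df2 <- read.csv(tmp)

names(df2)                            # "Status" "Taxonomy" "Relative.abundance...."
make.names("Relative abundance (%)")  # "Relative.abundance...."
```

Passing `check.names = FALSE` to `read.csv()` keeps the original header, but the column then has to be quoted with backticks inside `aes()`.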

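3.1.2 Required R package and implementation

install.packages('ggplot2') # install the ggplot2 plotting package
library(ggplot2) # load ggplot2

Next, read the data. To keep things simple for beginners, file.choose() opens a dialog for picking the file: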

df<-read.csv(file.choose())

Press Enter, select the CSV file in the dialog that pops up, and the data are read in. Type

df

to view them:

image.png

The output confirms that the data were read into a data frame. With the data and the plotting package ready, we can start plotting:

ggplot(df,aes(x=Status,y=Relative.abundance....,fill=Taxonomy))   +     geom_bar(stat = 'identity', width = 0.3, position = 'fill')

image.png

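Explanation:
ggplot(data frame to plot, aes(x = group column, y = abundance values, fill = colour by taxon)) + geom_bar(stat = 'identity', width = bar width, position = 'fill'). stat = 'identity' uses the y values as bar heights instead of counting rows. position = 'fill' stretches every bar to the full height; if you remove it you get the plot below (copy and run the following code to see the difference):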

ggplot(df,aes(x=Status,y=Relative.abundance....,fill=Taxonomy))   +     geom_bar(stat = 'identity', width = 0.5)

image.png
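What position = 'fill' does can be checked by hand: it stacks the segments and then divides each one by its bar's total, so every bar ends at 1 (the y axis then runs from 0 to 1 rather than 0 to 100). A minimal base-R sketch of that per-group rescaling, using made-up values whose group totals are deliberately not 100:

```r
# Made-up abundances; the group totals (10 and 4) are deliberately not 100.
df <- data.frame(
  Status = c("A", "A", "A", "B", "B"),
  Taxonomy = c("t1", "t2", "t3", "t1", "t2"),
  value = c(2, 3, 5, 1, 3)
)
# Divide each value by the total of its own group, which is what
# position = "fill" does for each bar before drawing it.
df$prop <- df$value / ave(df$value, df$Status, FUN = sum)
df$prop                          # 0.20 0.30 0.50 0.25 0.75
tapply(df$prop, df$Status, sum)  # A = 1, B = 1
```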

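The grey background and grid lines look cluttered, so remove them:

ggplot(df, aes(x = Status, y = Relative.abundance...., fill = Taxonomy)) +
  geom_bar(stat = 'identity', width = 0.3, position = 'fill') +
  theme_bw() +
  theme(panel.grid.major = element_line(colour = NA))

Explanation:
theme_bw() # white panel background with a border instead of the default grey
theme(panel.grid.major=element_line(colour=NA)) # hide the major grid lines
Add theme_bw() directly rather than + theme_set(theme_bw()): theme_set() changes the global default theme and returns the previous one, so adding its result to a plot can apply the old grey theme.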

This gives the plot below, which is getting closer to the target:

image.png
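The code above hides only the major grid lines, and theme_bw() also draws minor grid lines, so faint lines may remain along the y axis. A sketch, assuming ggplot2 is installed and using a made-up data frame in place of df, that blanks the parent panel.grid element instead, which removes both the major and the minor grid:

```r
library(ggplot2)

# Made-up data in the same long format as df.
toy <- data.frame(
  Status = rep(c("A", "B"), each = 2),
  Taxonomy = rep(c("t1", "t2"), times = 2),
  abundance = c(40, 60, 70, 30)
)

p <- ggplot(toy, aes(x = Status, y = abundance, fill = Taxonomy)) +
  geom_bar(stat = "identity", width = 0.3, position = "fill") +
  theme_bw() +                          # applies to this plot only
  theme(panel.grid = element_blank())   # parent element: blanks major and minor grid lines
p
```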

The x-axis labels in the target figure are rotated. Style the x- and y-axis tick labels with:

ggplot(df, aes(x = Status, y = Relative.abundance...., fill = Taxonomy)) +
  geom_bar(stat = 'identity', width = 0.3, position = 'fill') +
  theme_bw() +
  theme(panel.grid.major = element_line(colour = NA)) +
  theme(axis.text.x = element_text(face = "bold", color = "black", size = 10, angle = 90),
        axis.text.y = element_text(face = "bold", color = "black", size = 10))

image.png

Remove the x-axis title:

ggplot(df, aes(x = Status, y = Relative.abundance...., fill = Taxonomy)) +
  geom_bar(stat = 'identity', width = 0.3, position = 'fill') +
  theme_bw() +
  theme(panel.grid.major = element_line(colour = NA)) +
  theme(axis.text.x = element_text(face = "bold", color = "black", size = 10, angle = 90),
        axis.text.y = element_text(face = "bold", color = "black", size = 10)) +
  theme(axis.title.x = element_blank())

image.png

With a 45-degree angle the x labels overlap the axis, so nudge them away with hjust and vjust in theme():

p0 <- ggplot(df, aes(x = Status, y = Relative.abundance...., fill = Taxonomy)) +
  geom_bar(stat = 'identity', width = 0.3, position = 'fill') +
  theme_bw() +
  theme(panel.grid.major = element_line(colour = NA),
        axis.title.x = element_blank(),
        # 45-degree labels; hjust/vjust move them off the axis line
        axis.text.x = element_text(face = "bold", color = "black", size = 10,
                                   angle = 45, hjust = 0.4, vjust = 0.4),
        axis.text.y = element_text(face = "bold", color = "black", size = 10))

This gives a cleaner plot:

image.png

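3.1.3 Fine-tuning text in Photoshop
If the text format or font needs further adjustment, this can be done in Photoshop:
1) First export the plot above: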

image.png

2) Open it in Adobe Photoshop CS4, select the text with the selection tool and press Delete to remove it;

image.png

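3) Press Ctrl+Shift+N to create a new layer, type the label with the Type tool, then select the text with the selection tool, enter a rotation of -45 degrees and apply. The result looks much better: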

image.png

Treat the other text elements the same way and save, giving:

Rplot2.png

3.2 Box plots
3.2.1 A simple box plot
For what a box plot represents and how to read it, see my earlier note: https://www.jianshu.com/p/54d4996d73cd

Required data format for the box plot:

image.png

Read the data as before (the result is a data frame):

bp<-read.csv(file.choose())
bp

image.png

Plot:

boxplot(bp, col = c("green","brown","purple","blue"),names = c("High marine Protein","Medium fat","High fat","Low marine protein"),  ylab = "Richness")

Explanation:
boxplot(bp,                                           # the data frame read above
        col = c("green", "brown", "purple", "blue"),  # one colour per box (use ASCII double quotes)
        names = c("High marine Protein", "Medium fat", "High fat", "Low marine protein"),  # box labels
        ylab = "Richness")                            # y-axis title
Result:

image.png
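When boxplot() is given a data frame, it draws one box per column, in column order; that is why col and names above are matched to the four columns by position. With plot = FALSE it returns the statistics behind each box instead of drawing. A sketch with made-up wide-format data (the column names are hypothetical):

```r
# Hypothetical wide-format data: one column per diet, one row per sample.
set.seed(1)
wide <- data.frame(
  HighMarineProtein = rnorm(8, mean = 300, sd = 30),
  MediumFat = rnorm(8, mean = 280, sd = 30),
  HighFat = rnorm(8, mean = 250, sd = 30),
  LowMarineProtein = rnorm(8, mean = 270, sd = 30)
)
# plot = FALSE returns the numbers instead of drawing.
# Rows of $stats: lower whisker, lower hinge, median, upper hinge, upper whisker.
bx <- boxplot(wide, plot = FALSE)
dim(bx$stats)   # 5 4: one column per box, in column order
bx$names        # the column names, used as the default box labels
bx$stats[3, ]   # medians of the four columns
```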

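In the target figure, the group names are moved out from under the boxes into a legend. ggplot2 can draw such a box plot directly, but it needs the data in a slightly different (long) format, with one row per sample and columns Diet and Richness: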

image.png

# read the data
bp<-read.csv(file.choose())
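Instead of retyping the data in long format, the wide table used with boxplot() can be reshaped in R. A minimal base-R sketch with made-up numbers and hypothetical column names: stack() returns a values column and a factor ind holding the former column names, which are renamed here to the Richness and Diet columns used by ggplot().

```r
# Hypothetical wide data, as it would be laid out for boxplot().
wide <- data.frame(
  HighFat = c(250, 260, 240),
  LowMarineProtein = c(270, 280, 265)
)
# stack() gives one row per value: column "values" plus a factor "ind"
# holding the original column name.
long <- stack(wide)
names(long) <- c("Richness", "Diet")
long
```

Because read.csv() replaces spaces in headers with dots, the Diet labels obtained this way may need tidying before they match the labels used elsewhere.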

Then plot:

ggplot(bp, aes(x=Diet, y=Richness, fill=Diet)) + geom_boxplot()

Result:

image.png

Now style it like the stacked bar chart and remove the x-axis title:

ggplot(bp, aes(x=Diet, y=Richness, fill=Diet)) + geom_boxplot()+theme(axis.title.x=element_blank())

Result:

image.png

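Alternatively, remove everything on the x axis (title, tick labels and ticks); this assumes p1 has already been assigned as shown below:

p1<-p1+theme(axis.title.x=element_blank(),
         axis.text.x=element_blank(),
         axis.ticks.x=element_blank())
p1 # shows the result below: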

image.png

That finishes the first box plot. To combine the plots later, assign it to p1:

p1<-ggplot(bp, aes(x=Diet, y=Richness, fill=Diet)) + geom_boxplot()+theme(axis.title.x=element_blank())

image.png

3.2.2 Box plot 2: box plot with p-values
First install two more packages:
ggsignif, for significance tests,

install.packages('ggsignif')

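and ggthemr:

install.packages('ggthemr') # theme package

If installing ggthemr fails (it may not be available from CRAN), install it from GitHub with devtools:

install.packages("devtools") # install or update devtools first
devtools::install_github('Mikata-Project/ggthemr') # install ggthemr; if asked which packages to update, choose None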

Load the packages:

library(ggplot2)
library(ggthemr)   
library(ggsignif)

Now for the actual plot:

# pairs of groups to compare: all 6 pairs of the 4 diets
compaired <- list(c("High marine Protein", "Medium fat"),
                  c("High marine Protein", "High fat"),
                  c("High marine Protein", "Low marine protein"),
                  c("High fat", "Medium fat"),
                  c("High fat", "Low marine protein"),
                  c("Medium fat", "Low marine protein"))
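Writing the pairs out by hand makes it easy to repeat one pair or miss another. combn() can generate every unordered pair instead. A sketch, assuming the labels are spelled exactly as in the Diet column:

```r
# The four diet labels, spelled exactly as in the Diet column.
diets <- c("High marine Protein", "Medium fat", "High fat", "Low marine protein")
# Every unordered pair: choose(4, 2) = 6 comparisons, none repeated or missing.
compaired <- combn(diets, 2, simplify = FALSE)
length(compaired)   # 6
compaired[[6]]      # "High fat" "Low marine protein"
```

With the real data, as.character(unique(bp$Diet)) could replace the hand-typed diets vector.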

Then plot:

ggthemr("flat") 
p2 <- ggplot(bp, aes(Diet, Richness, fill = Diet)) + geom_boxplot() + geom_signif(comparisons = compaired, step_increase = 0.3, map_signif_level = F, test = wilcox.test)
p2

This gives a box plot annotated with pairwise wilcox.test() results (a non-parametric two-sample test):

image.png

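Note: with map_signif_level = TRUE, significance is shown as asterisks ("***" for p < 0.001, "**" for p < 0.01, "*" for p < 0.05); with FALSE (F), the p-values themselves are shown.
In the figure below, NS means not significant: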

image.png
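To see where the numbers and stars come from, the test can be run by hand for one pair. The helper stars() below is hypothetical and only approximates the default map_signif_level = TRUE thresholds; the two groups are made-up values:

```r
# Hypothetical helper approximating the default map_signif_level = TRUE labels.
stars <- function(p) {
  ifelse(p < 0.001, "***",
         ifelse(p < 0.01, "**",
                ifelse(p < 0.05, "*", "NS.")))
}

# Two made-up groups; every value in a is lower than every value in b.
a <- c(1.1, 2.3, 1.9, 2.8, 1.5)
b <- c(3.9, 4.4, 5.1, 4.8, 4.2)

# With test = wilcox.test, geom_signif() runs a test like this for each pair.
p <- wilcox.test(a, b)$p.value
p          # 0.007936508, i.e. 2/252 (exact test, no ties)
stars(p)   # "**"
```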

When each test is appropriate:

image.png

The main function in ggsignif is geom_signif(). Like any other ggplot2 geom_*() function, it is added to the plot as a layer. Its main arguments are:

image.png

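Note also that parametric tests such as t.test() assume roughly normally distributed data; the Wilcoxon test used above is non-parametric and does not require normality. For normality testing, see my earlier note: https://www.jianshu.com/p/0150a9233809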
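A sketch of checking normality per group before choosing a test, with made-up data. The decision rule is an assumption for illustration, not the paper's procedure, and a non-significant Shapiro-Wilk result does not prove normality:

```r
# Made-up richness values for two groups.
set.seed(42)
g1 <- rnorm(20, mean = 300, sd = 30)
g2 <- rnorm(20, mean = 270, sd = 30)

# Shapiro-Wilk test per group; a small p-value (e.g. < 0.05) suggests non-normal data.
normal_p <- c(g1 = shapiro.test(g1)$p.value, g2 = shapiro.test(g2)$p.value)
normal_p

# Simple rule of thumb (an assumption):
# t-test if both groups look normal, Wilcoxon rank-sum test otherwise.
res <- if (all(normal_p > 0.05)) t.test(g1, g2) else wilcox.test(g1, g2)
res$method
res$p.value
```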

3.2.3 Faceted box plot
One of the target box plots is split into panels by group, so add a grouping column (gender) to the original data:

image.png

Then split the plot into panels:

p3 <- p2 + facet_wrap(~gender) # one panel per gender; the data behind p2 must contain a gender column

Running this gives the faceted box plot:

image.png
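A self-contained sketch of the same faceting, assuming ggplot2 is installed; the data frame and its gender values are made up:

```r
library(ggplot2)

# Hypothetical long data with an extra grouping column, gender.
set.seed(7)
toy <- data.frame(
  Diet = rep(c("High fat", "Medium fat"), each = 10),
  gender = rep(c("female", "male"), times = 10),
  Richness = rnorm(20, mean = 280, sd = 25)
)

# facet_wrap(~gender) draws one panel per value of gender,
# so the plot's data must contain that column.
p3_sketch <- ggplot(toy, aes(Diet, Richness, fill = Diet)) +
  geom_boxplot() +
  facet_wrap(~gender)
p3_sketch
```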

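3.2.4 Combining several plots with ggpubr

install.packages('ggpubr') # install the package
library(ggpubr) # load the package

Then combine them:

ggarrange(p0, p1, p2, p3, labels = c("A", "B", "C", "D"), ncol = 2, nrow = 2) # two rows, two columns

Explanation: pass the plots p0, p1, p2 and p3, their panel labels, and the number of columns and rows. ncol and nrow go outside c(); inside it they would be treated as extra labels. Running this gives the combined figure: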

Rplot11.png

Important: remove the x-axis labels
For all four plots, remove the x-axis title, tick labels and ticks with:

+theme(axis.title.x=element_blank(),
             axis.text.x=element_blank(),
             axis.ticks.x=element_blank())

Applied to each plot:

p0<-p0+theme(axis.title.x=element_blank(),
             axis.text.x=element_blank(),
             axis.ticks.x=element_blank())

p1<-p1+theme(axis.title.x=element_blank(),
             axis.text.x=element_blank(),
             axis.ticks.x=element_blank())

p2<-p2+theme(axis.title.x=element_blank(),
             axis.text.x=element_blank(),
             axis.ticks.x=element_blank())

p3<-p3+theme(axis.title.x=element_blank(),
             axis.text.x=element_blank(),
             axis.ticks.x=element_blank())
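Repeating the same theme() call four times works, but the tweak can also be stored once as a theme object and added to every plot. A sketch assuming ggplot2; the two mtcars plots are placeholders standing in for p0 to p3:

```r
library(ggplot2)

# Define the "no x axis" tweak once as a reusable theme object.
no_x_axis <- theme(
  axis.title.x = element_blank(),
  axis.text.x = element_blank(),
  axis.ticks.x = element_blank()
)

# Placeholder plots standing in for p0 to p3 (built from the mtcars demo data).
plots <- list(
  ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot(),
  ggplot(mtcars, aes(factor(gear), hp)) + geom_boxplot()
)

# Add the same tweak to every plot in the list.
plots <- lapply(plots, function(p) p + no_x_axis)
```

ggpubr's ggarrange() also accepts such a list through its plotlist argument, e.g. ggarrange(plotlist = plots, labels = c("A", "B"), ncol = 2).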

Code to combine them:

ggarrange(p0, p1, p2, p3, labels = c("A", "B", "C", "D"), ncol = 2, nrow = 2) # two rows, two columns

Result after removing the x-axis labels and combining:

image.png

Text and font adjustments can again be made in Adobe Photoshop CS4 as described in 3.1.3.

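3.3 Building and decorating the phylogenetic tree
3.3.1 Building the tree
Building the bacterial tree from 16S rRNA gene sequences is only outlined here: start from a FASTA file of full-length 16S sequences, import it into MEGA, align it with the MUSCLE algorithm and export the alignment as a MEGA file. This example builds the tree with the neighbor-joining method and 1000 bootstrap replicates: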

image.png

The resulting tree:

image.png

Export it in Newick (.nwk) format:

image.png

3.3.2 Decorating the tree
1) Open the Evolview home page:
https://evolgenius.info/evolview-v2/#mytrees/DEMOS/yeast%20duplications
and upload the tree:

image.png

Display it as a circular tree without species names:

image.png

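2) Add the annotations:
https://www.evolgenius.info/evolview/helpsite/dat9.html
The annotation help on this page explains how to set marker shapes, colours and positions, and even how to add bar charts; here only the markers on the branch tips are reproduced.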

1. First, enter the species names from the tree into an Excel file:

image.png

Colour reference:
http://xh.5156edu.com/page/z1015m9220j18754.html
The annotation content:

!defaultstrokewidth     0.7
Blautia obeum ATCC 29174T (X85101)  rect,  #008080
Blautia glucerasea HFTH-1T (AB439724)   rect,  #008080
Blautia faecis strain M25T (HM626178)   triangle,  #008080
Blautia wexlerae strain WAL 14507T (EF036467)   triangle,  #008080
Blautia schinkii strain Bie 41T (X94964)    circle,#008080
Blautia caecimuris strain SJ18T (KR364746)  circle,#FFA500
Blautia luti strain bln9T (AJ133124)    triangle, #FFA500
Blautia stercoris strain GAM6-1T (HM626177) triangle, #FFA500
Blautia hansenii JCM 14655T (AB534168)  star,#C71585
Blautia hominis strain KB1T (KY703632)  star,#C71585
Blautia producta ATCC 27340T (X94966)   star,#C71585
Blautia coccoides JCM 1395T (AB571656)  star,#FFA500
Blautia hydrogenotrophica JCM 14656T (AB910751) star,#FFA500
Murimonas intestini strain SRB-530-5-HT (KC311366)  star,#008080

image.png

Save the annotations above to a .txt file and upload it:

image.png
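For many tips, typing the annotation file by hand is tedious; it can also be written from R. The sketch assumes the layout of the listing above (a !defaultstrokewidth line, then one line per leaf with the leaf label, a tab, and shape,colour); the tab separator is an assumption, so check it against the Evolview help page. The two leaf names and colours are copied from the listing; the data frame itself is hypothetical:

```r
# Hypothetical annotation table; the two leaves and colours are copied from the listing above.
ann <- data.frame(
  tip = c("Blautia obeum ATCC 29174T (X85101)", "Blautia luti strain bln9T (AJ133124)"),
  shape = c("rect", "triangle"),
  colour = c("#008080", "#FFA500")
)

# Assumed layout: a header line, then "leaf label<TAB>shape,colour" per leaf.
lines <- c("!defaultstrokewidth\t0.7",
           paste0(ann$tip, "\t", ann$shape, ",", ann$colour))

out <- tempfile(fileext = ".txt")   # in practice, a real path such as a hypothetical "tree_annotation.txt"
writeLines(lines, out)
readLines(out)
```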

The exported image:

tree.png

Finally, add the corresponding labels in Photoshop as described in 3.1.3.

Result:

tree.png

The panels reproduced in the following sections:

image.png

3.4 Bar chart with a dividing line and labels above the bars

3.4.1 Data preparation

image.png

Read the data:

b<-read.csv(file.choose())
b

image.png

3.4.2 Drawing and styling the bar chart

library(ggplot2) # load the package
b1<-ggplot(b,mapping=aes(x=Niche.width,y=Number.of.ESV,fill='green',group=factor(2)))+  geom_bar(stat="identity",position=position_dodge(0.7),width=0.99, fill=c("grey","grey","grey","grey","grey","pink"), colour=c("black","black","black","black","black","pink"))
b1

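Explanation: position = position_dodge(0.7) places bars from different groups side by side; with a single group, as here, it should have no visible effect. width = 0.99 is the bar width, so neighbouring bars almost touch. fill = c("grey","grey","grey","grey","grey","pink") gives the fill of each bar in order, so it needs one colour per bar (six here); colour sets the bar outlines in the same way. Because fill is set in geom_bar(), the fill = 'green' inside aes() has no effect.

The bar chart: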

image.png
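Hard-coding six fill colours breaks as soon as the number of bars changes. A sketch of an alternative, with made-up numbers and assuming ggplot2: compute the colours from the data, map them with the identity scales so the strings are used as colours without a legend, and use geom_col(), which is shorthand for geom_bar(stat = "identity"):

```r
library(ggplot2)

# Hypothetical data: six niche-width classes with made-up ESV counts.
b_toy <- data.frame(
  Niche.width = factor(1:6),
  Number.of.ESV = c(900, 400, 200, 120, 60, 30)
)

# Highlight the last bar; the colour vectors follow the number of rows automatically.
last <- seq_len(nrow(b_toy)) == nrow(b_toy)
b_toy$fill_col <- ifelse(last, "pink", "grey")
b_toy$line_col <- ifelse(last, "pink", "black")

# geom_col() is shorthand for geom_bar(stat = "identity");
# the identity scales use the strings as colours and draw no legend.
b1_sketch <- ggplot(b_toy, aes(Niche.width, Number.of.ESV)) +
  geom_col(aes(fill = fill_col, colour = line_col), width = 0.99) +
  scale_fill_identity() +
  scale_colour_identity()
b1_sketch
```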

Add a red dashed dividing line:

b1<-b1+geom_vline(xintercept = 4.0,linetype=5,col="red")
b1

image.png

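Add text labels above the bars.
vjust sets the vertical position of the text relative to the top of each bar: negative values such as -0.75 place it above the bar, values of 1 or more move it down inside the bar; size is the font size. b needs a column named label holding the text to print.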

b1<-b1+geom_text(aes(Niche.width, Number.of.ESV,label=label),size=4.3,vjust=-0.75)
b1

Result:

image.png

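Adjust the font of the axis titles and labels:

b1<-b1+theme_classic(base_size = 16) # classic theme without the grey background; base font size 16 for axis titles and labels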
b1

Result:

image.png

Set the y-axis range:

b1<-b1+scale_y_continuous(limits=c(0,1150))
b1

The final target figure:

image.png

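3.5 Venn diagrams: three-set and four-set
A Venn diagram shows the intersections and unions of several groups, i.e. which elements (numbers or strings) are shared between groups and which belong to only one group.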
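The counts in each region of a Venn diagram are ordinary set operations. A base-R sketch with two made-up sets of ESV IDs:

```r
# Two made-up sets of ESV IDs.
set_a <- c("ESV1", "ESV2", "ESV3", "ESV4")
set_b <- c("ESV3", "ESV4", "ESV5")

intersect(set_a, set_b)       # "ESV3" "ESV4": the overlap of the two circles
setdiff(set_a, set_b)         # "ESV1" "ESV2": only in set_a
setdiff(set_b, set_a)         # "ESV5": only in set_b
length(union(set_a, set_b))   # 5: all distinct elements
```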

3.5.1 Three-set Venn diagram
Data format:

image.png

install.packages('VennDiagram') # install the package
library(VennDiagram) # load the package
sdv<-read.csv(file.choose()) # read the data

Draw the three-set Venn diagram:

sdv.plot <- venn.diagram(
    x = list( Pyloric.caeca=sdv$Pyloric.caeca, Midgut = sdv$Midgut, Hindgut = sdv$Hindgut),
    filename = 'C:/Users/Mr.R/Documents/ReproduceImages/Reproduction1 Core gut microbial communities are maintained by beneficial interactions and strain/sdvVennDiagram.tiff',    
    col = "black", lwd = 1,
    fill = c("cornflowerblue", "green", "yellow"),alpha = 0.20, cex = 0.6,    fontfamily = "serif",    fontface = "bold",  cat.col = c("darkblue", "darkgreen", "orange"), cat.cex = 1.2,    
    cat.fontface = "bold",cat.fontfamily = "serif")

Result:

image.png
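One pitfall with this data layout: the sets usually have different sizes, so when they are stored as columns of one CSV file, read.csv() pads the shorter text columns with empty strings ("") and numeric columns with NA. Those padding values would otherwise be passed to venn.diagram() as if they were elements. A base-R sketch with a tiny made-up file that strips the padding before building the list:

```r
# A tiny made-up CSV: the Hindgut column is shorter than Midgut.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Midgut,Hindgut", "ESV1,ESV1", "ESV2,ESV3", "ESV4,"), tmp)
sdv_toy <- read.csv(tmp)
sdv_toy$Hindgut   # "ESV1" "ESV3" "": the blank cell becomes an empty string

# Remove NA and "" padding from every column before building the Venn list.
sets <- lapply(sdv_toy, function(v) v[!is.na(v) & v != ""])
sets$Hindgut      # "ESV1" "ESV3"
```

The cleaned list can then be passed as x = sets. Setting filename = NULL makes venn.diagram() return the grid object instead of writing a file; it can then be drawn with grid::grid.draw().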

3.5.2 Four-set Venn diagram
1) Data format:

image.png

Install and load the package:

install.packages('VennDiagram') # install the package
library(VennDiagram) # load the package

Read the data:

vn= read.table('C:/Users/Mr.R/Documents/ReproduceImages/Reproduction1 Core gut microbial communities are maintained by beneficial interactions and strain/venn.txt', header = T,sep="\t")
head(vn) # show the first 6 rows

image.png

Plot:

venn.plot <- venn.diagram(
x = list( High.marine.Protein= vn$High.marine.Protein, Medium.fat = vn$Medium.fat, High.fat = vn$High.fat,  Low.marine.protein = vn$Low.marine.protein    ),
filename = 'C:/Users/Mr.R/Documents/ReproduceImages/Reproduction1 Core gut microbial communities are maintained by beneficial interactions and strain/VennDiagram1.tiff',    
col = "black",    lty = "dotted",lwd = 1,
fill = c("cornflowerblue", "green", "yellow", "darkorchid1"),alpha = 0.20,    
label.col = c("orange", "white", "darkorchid4", "white", "white", "white","white", "white", "darkblue", "white", "white", "white", "white", "darkgreen", "white"),
cex = 0.6,    fontfamily = "serif",    fontface = "bold",    
cat.col = c("darkblue", "darkgreen", "orange", "darkorchid4"), cat.cex = 0.6,    
cat.fontface = "bold",cat.fontfamily = "serif")

This gives:

image.png

Explanation:

x = list(High.marine.Protein = vn$High.marine.Protein, Medium.fat = vn$Medium.fat, High.fat = vn$High.fat, Low.marine.protein = vn$Low.marine.protein), # named list, one set per diet
filename = 'C:/Users/Mr.R/Documents/ReproduceImages/Reproduction1 Core gut microbial communities are maintained by beneficial interactions and strain/VennDiagram1.tiff', # output path and image format
col = "black", lty = "dotted", # outline colour; lty = "dotted" draws dotted outlines, leave it out for solid lines (the default)
lwd = 1, # outline width
fill = c("cornflowerblue", "green", "yellow", "darkorchid1"), alpha = 0.20, # fill colour of each set and its transparency
label.col = c("orange", "white", "darkorchid4", "white", "white", "white", "white", "white", "darkblue", "white", "white", "white", "white", "darkgreen", "white"), # colours of the region counts (15 regions for 4 sets)
cex = 1.0, fontfamily = "serif", fontface = "bold", # size, font and weight of the counts
cat.col = c("darkblue", "darkgreen", "orange", "darkorchid4"), # colours of the set names
cat.cex = 0.8, cat.fontface = "bold", # size and weight of the set names
cat.fontfamily = "serif") # font of the set names

However, the original figure uses solid outlines and black counts, so drop lty and label.col and keep the defaults (solid lines and black numbers):

venn.plot <- venn.diagram(
    x = list( High.marine.Protein= vn$High.marine.Protein, Medium.fat = vn$Medium.fat, High.fat = vn$High.fat,  Low.marine.protein = vn$Low.marine.protein    ),
    filename = 'C:/Users/Mr.R/Documents/ReproduceImages/Reproduction1 Core gut microbial communities are maintained by beneficial interactions and strain/VennDiagram.tiff',    
    col = "black", lwd = 1,
    fill = c("cornflowerblue", "green", "yellow", "darkorchid1"),alpha = 0.20,    
    cex = 0.6,    fontfamily = "serif",    fontface = "bold",    
    cat.col = c("darkblue", "darkgreen", "orange", "darkorchid4"), cat.cex = 0.6,    
    cat.fontface = "bold",cat.fontfamily = "serif")

Final result:

image.png

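3.6 Bubble plot
A bubble plot displays multi-dimensional data, here four dimensions:
1) animals
2) species
3) number of each species (bubble size)
4) relative abundance of each species (shade of the bubble colour)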

Required data and format:

image.png

Read the data:

bubble<-read.csv(file.choose()) # choose the corresponding CSV file
bubble # show the data that were read

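Plot:

# load the plotting package
library(ggplot2)

# base plot: Animals on the x axis, Species on the y axis
p <- ggplot(bubble, aes(Animals, Species))
p + geom_point()  # step 1: plain points

# step 2: point size shows Number
p + geom_point(aes(size = Number))

# step 3: all four dimensions in one layer; each step starts from p,
# so earlier point layers are not drawn underneath
pbubble <- p + geom_point(aes(size = Number, color = Relative.Abundance))

# colour gradient from green (low) to red (high)
pr <- pbubble + scale_color_gradient(low = "green", high = "red")
# remove the axis titles, tilt and embolden the x labels, italicise the y labels, legend at the bottom
bbp <- pr + theme(axis.text.x = element_text(angle = 45, hjust = 0.4, size = 10, vjust = 0.5, face = "bold"),
                  axis.text.y = element_text(face = "bold.italic"),
                  legend.position = "bottom",
                  axis.title = element_blank())
# view the result
bbp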

## save the figure
ggsave("C:/Users/Mr.R/Documents/ReproduceImages/Reproduction1 Core gut microbial communities are maintained by beneficial interactions and strain/bubble.pdf", plot = bbp) # save as PDF

ggsave("C:/Users/Mr.R/Documents/ReproduceImages/Reproduction1 Core gut microbial communities are maintained by beneficial interactions and strain/bubble.png", plot = bbp, width = 8, height = 6) # canvas size, in inches by default

This gives the target bubble plot:

image.png
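A self-contained version of the bubble plot, for trying the code without the original CSV. The animals, species and values are made up; the column names follow the code above and ggplot2 is assumed to be installed:

```r
library(ggplot2)

# Hypothetical long-format data: one row per animal-species combination.
bubble_toy <- expand.grid(
  Animals = c("Fish1", "Fish2", "Fish3"),
  Species = c("Taxon A", "Taxon B")
)
bubble_toy$Number <- c(5, 12, 3, 8, 1, 20)
bubble_toy$Relative.Abundance <- c(0.10, 0.35, 0.05, 0.20, 0.01, 0.50)

# One point layer carries both extra dimensions: size and colour.
bbp_sketch <- ggplot(bubble_toy, aes(Animals, Species)) +
  geom_point(aes(size = Number, color = Relative.Abundance)) +
  scale_color_gradient(low = "green", high = "red") +
  theme(axis.title = element_blank(), legend.position = "bottom")
bbp_sketch
```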

Text labels can again be touched up in Adobe Photoshop CS4 if needed.

3.7 Labelled scatter plot with a log-scaled y axis
This panel looks like a line graph, but on closer inspection it is a scatter plot.

image.png

Key points:
1) labels on the scatter points
2) a log-scaled y axis
(using ggplot2::scale_y_log10())
Implementation:
3.7.1 Data format

image.png

Read the data:

l1<-read.csv(file.choose())
head(l1)

image.png

3.7.2 Scatter plot

sp1<-ggplot(l1) + geom_point(aes(Species.Rank, Cumultative.relative.abundance), color = 'black') 
sp1

image.png

3.7.3 Log-scaling the y axis

sp1<-sp1+ggplot2::scale_y_log10()
sp1

image.png
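scale_y_log10() plots the base-10 logarithm of the y values. The logarithm of 0 is -Inf, so a zero abundance cannot be placed on a log axis and ggplot2 warns about infinite values. A base-R check with made-up values:

```r
# Made-up cumulative relative abundances, one of them zero.
y <- c(0, 0.5, 5, 50, 100)
log10(y)          # -Inf for the zero, then about -0.30, 0.70, 1.70 and 2
# Keep only positive values (or add a small offset) before using a log axis.
y_pos <- y[y > 0]
y_pos             # 0.5 5.0 50.0 100.0
```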

3.7.4 Adding text labels to the points

sp1<-sp1+geom_text(aes(Species.Rank, Cumultative.relative.abundance,label=r1),size=3)
sp1

image.png

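The labels are drawn right on top of the points, which looks messy, so use geom_text_repel() from the ggrepel package instead; it pushes labels away from the points and from each other:

install.packages('ggrepel') # install the package (the name must be quoted)
library(ggrepel) # load the package

The complete code with repelled labels: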

sp1 <- ggplot(l1) +
  geom_point(aes(Species.Rank, Cumultative.relative.abundance), color = 'black') +
  ggplot2::scale_y_log10() +
  geom_text_repel(aes(Species.Rank, Cumultative.relative.abundance, label = r1)) # labels pushed away from the points
sp1 <- sp1 + theme_classic(base_size = 16) # larger axis titles and tick labels
sp1

Result:

image.png

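This gives a clean, labelled scatter plot with a log-scaled y axis.

Combine the plots above with the ggpubr package:

library(ggpubr) # load the package
ggarrange(b1, sp1, bbp, labels = c("a", "b", "c"), ncol = 1, nrow = 3) # one column, three rows

Result: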

image.png
