火山图是测序分析报告中最为核心的图片之一。绘制火山图的方法有许多,Excel和第三方软件等,本文主要运用ggplot2和ggrepel两个R包演示[1][2]。
横坐标为Fold change(倍数),越偏离中心差异倍数越大;纵坐标为P value(P值),值越大差异越显著。
(一):绘图分析:
基本要素: 区分上调、下调的散点、辅助线、坐标轴、图例、标签
运用参数: geom_point()、geom_vline and geom_hline、labs()、theme()、geom_text_repel()
注:区分上、下调基因,就要对差异基因进行统计学分析,主要包括统计显著性和生物学差异显著性;需要用到ifelse函数 [3]
(二):数据处理
rm(list = ls())
library(ggplot2)
# 数据来自https://www.jianshu.com/p/a973c29c9103 [参考资料2]
dataset <- read.table(file = "C:/Users/Administrator/Documents/dataset_volcano.txt",
header = TRUE, sep = "")
# 设置p_value和logFC的阈值
cut_off_pvalue = 0.0000001 #统计显著性
cut_off_logFC = 1 #差异倍数值
# 根据阈值参数,上调基因设置为‘up’,下调基因设置为‘Down’,无差异设置为‘Stable’,并保存到change列中
dataset$change = ifelse(dataset$P.Value < cut_off_pvalue & abs(dataset$logFC) >= cut_off_logFC,
ifelse(dataset$logFC> cut_off_logFC ,'Up','Down'),
'Stable')
head(dataset)
gene logFC P.Value change
1 LOC100909539 -4.514013 4.33e-12 Down
2 Steap2 -3.678175 1.12e-11 Down
3 Trpt1 2.594115 1.21e-11 Up
4 Cers6 -3.630773 1.45e-11 Down
5 Hs3st1 -3.129658 1.74e-11 Down
6 Lama5 -2.772380 2.51e-11 Down
先从整体判断条件的真假:
1. 不满足,记作"stable";
2. 满足;
logFC大于1,记作"up";
logFC小于1,记作"down";
(三):绘图
p <- ggplot(
# 数据、映射、颜色
dataset, aes(x = logFC, y = -log10(P.Value), colour=change)) +
geom_point(alpha=0.4, size=3.5) +
scale_color_manual(values=c("#546de5", "#d2dae2","#ff4757"))+
# 辅助线
geom_vline(xintercept=c(-1,1),lty=4,col="black",lwd=0.8) +
geom_hline(yintercept = -log10(cut_off_pvalue),lty=4,col="black",lwd=0.8) +
# 坐标轴
labs(x="log2(fold change)",
y="-log10 (p-value)")+
theme_bw()+
# 图例
theme(plot.title = element_text(hjust = 0.5),
legend.position="right",
legend.title = element_blank())
p
(四):添加标签
# 将需要标记的基因放置在label列(logFC >= 5)
library(ggrepel)
dataset$label <- ifelse(dataset$P.Value > cut_off_pvalue & abs(dataset$logFC) >= 5,
as.character(dataset$gene), "")
dataset[2600:2606, ]
gene logFC P.Value change label
2600 Ano5 0.1883082 1.20e-07 Stable
2601 Rab3a 0.8501624 1.21e-07 Stable
2602 Pank1 5.2116194 1.21e-07 Stable Pank1
2603 Ghitm 2.4541362 1.21e-07 Stable
2604 Ndufa8 1.4141038 1.21e-07 Stable
2605 Zc3h7a -0.9070267 1.21e-07 Stable
2606 Trpm3 0.1879326 1.21e-07 Stable
p + geom_label_repel(data = dataset, aes(x = dataset$logFC,
y = -log10(dataset$P.Value),
label = label),
size = 3, box.padding = unit(0.5, "lines"),
point.padding = unit(0.8, "lines"),
segment.color = "black",
show.legend = FALSE)
关于ggrepel包,后期会写参数使用相关教程
(五):Volcano plot[4]
# 火山图应用
genes <- read.table(file = "C:/Users/Administrator/Documents/volcano.txt", header = TRUE)
genes$Significant <- ifelse(genes$padj < 0.05 & abs(genes$log2FoldChange) >= 1,
ifelse(genes$log2FoldChange > 1, "Up", "Down"), "Stable")
ggplot(
# 数据、映射、颜色
genes, aes(x = log2FoldChange, y = -log10(pvalue))) +
geom_point(aes(color = Significant), size=2) +
scale_color_manual(values = c("blue","grey", "red")) +
# 注释
geom_text_repel(
data = subset(genes, padj < 0.05 & abs(genes$log2FoldChange) >= 1),
aes(label = Gene),
size = 5,
box.padding = unit(0.35, "lines"),
point.padding = unit(0.3, "lines")) +
# 辅助线
geom_vline(xintercept=c(-1,1),lty=4,col="black",lwd=0.8) +
geom_hline(yintercept = 1,lty=4,col="black",lwd=0.8) +
# 坐标轴
labs(x="log2(fold change)",
y="-log10 (p-value)") +
# 图例
theme(legend.position = "bottom")
进一步修饰
# 进一步修饰
ggplot(
# 数据、映射、颜色
genes, aes(x = log2FoldChange, y = -log10(pvalue))) +
geom_point(aes(color = Significant), size=2) +
scale_color_manual(values = c("blue","grey", "red")) +
# 注释
geom_label_repel(
data = subset(genes, padj < 0.05 & abs(genes$log2FoldChange) >= 1),
aes(label = Gene),
size = 5,fill = "darkred", color = "white",
box.padding = unit(0.35, "lines"),
point.padding = unit(0.3, "lines")) +
# 辅助线
geom_vline(xintercept=c(-0.5, 0.5),lty=4,col="#666666",lwd=0.8) +
geom_hline(yintercept = 1,lty=4,col="#666666",lwd=0.8) +
# 坐标轴
labs(x="log2(fold change)",
y="-log10 (p-value)") +
# 图例
theme(legend.position = "bottom")
参考资料:
-
使用ggplot2和ggrepel包做火山图 ↩
-
R 数据可视化 02 | 火山图 ↩
-
R语言_ifelse()函数用法 ↩
-
ggplot2 texts : Add text annotations to a graph in R software ↩