R绘图:ggplot2绘制火山图

火山图是测序分析报告中最为核心的图片之一。绘制火山图的方法有许多,Excel和第三方软件等,本文主要运用ggplot2和ggrepel两个R包演示[1][2]
横坐标为Fold change(倍数),越偏离中心差异倍数越大;纵坐标为P value(P值),值越大差异越显著。

(一):绘图分析:

基本要素: 区分上调、下调的散点辅助线坐标轴图例标签
运用参数: geom_point()、geom_vline and geom_hline、labs()、theme()、geom_text_repel()

注:区分上、下调基因,就要对差异基因进行统计学分析,主要包括统计显著性和生物学差异显著性;需要用到ifelse函数 [3]

(二):数据处理

rm(list = ls())
library(ggplot2)
# 数据来自https://www.jianshu.com/p/a973c29c9103 [参考资料2]
dataset <- read.table(file = "C:/Users/Administrator/Documents/dataset_volcano.txt", 
                      header = TRUE, sep = "")
# 设置p_value和logFC的阈值
cut_off_pvalue = 0.0000001  #统计显著性
cut_off_logFC = 1           #差异倍数值

# 根据阈值参数,上调基因设置为‘up’,下调基因设置为‘Down’,无差异设置为‘Stable’,并保存到change列中
dataset$change = ifelse(dataset$P.Value < cut_off_pvalue & abs(dataset$logFC) >= cut_off_logFC, 
                          ifelse(dataset$logFC> cut_off_logFC ,'Up','Down'),
                          'Stable')
head(dataset)
          gene     logFC  P.Value change
1 LOC100909539 -4.514013 4.33e-12   Down
2       Steap2 -3.678175 1.12e-11   Down
3        Trpt1  2.594115 1.21e-11     Up
4        Cers6 -3.630773 1.45e-11   Down
5       Hs3st1 -3.129658 1.74e-11   Down
6        Lama5 -2.772380 2.51e-11   Down

先从整体判断条件的真假:
  1. 不满足,记作"stable";
  2. 满足;
     logFC大于1,记作"up";
     logFC小于1,记作"down";

(三):绘图

p <- ggplot(
  # 数据、映射、颜色
  dataset, aes(x = logFC, y = -log10(P.Value), colour=change)) +
  geom_point(alpha=0.4, size=3.5) +
  scale_color_manual(values=c("#546de5", "#d2dae2","#ff4757"))+
  # 辅助线
  geom_vline(xintercept=c(-1,1),lty=4,col="black",lwd=0.8) +
  geom_hline(yintercept = -log10(cut_off_pvalue),lty=4,col="black",lwd=0.8) +
  # 坐标轴
  labs(x="log2(fold change)",
       y="-log10 (p-value)")+
  theme_bw()+
  # 图例
  theme(plot.title = element_text(hjust = 0.5), 
        legend.position="right", 
        legend.title = element_blank())
p
  • Rplot01.png

(四):添加标签

# 将需要标记的基因放置在label列(logFC >= 5)
library(ggrepel)
dataset$label <- ifelse(dataset$P.Value > cut_off_pvalue & abs(dataset$logFC) >= 5,
                      as.character(dataset$gene), "")
dataset[2600:2606, ]
     gene      logFC  P.Value change label
2600   Ano5  0.1883082 1.20e-07 Stable      
2601  Rab3a  0.8501624 1.21e-07 Stable      
2602  Pank1  5.2116194 1.21e-07 Stable Pank1
2603  Ghitm  2.4541362 1.21e-07 Stable      
2604 Ndufa8  1.4141038 1.21e-07 Stable      
2605 Zc3h7a -0.9070267 1.21e-07 Stable      
2606  Trpm3  0.1879326 1.21e-07 Stable

p + geom_label_repel(data = dataset, aes(x = dataset$logFC, 
                                     y = -log10(dataset$P.Value), 
                                     label = label),
                 size = 3, box.padding = unit(0.5, "lines"),
                 point.padding = unit(0.8, "lines"), 
                 segment.color = "black", 
                 show.legend = FALSE)

关于ggrepel包,后期会写参数使用相关教程

  • Rplot02.png

(五):Volcano plot[4]

# 火山图应用
genes <- read.table(file = "C:/Users/Administrator/Documents/volcano.txt", header = TRUE)
genes$Significant <- ifelse(genes$padj < 0.05 & abs(genes$log2FoldChange) >= 1, 
                            ifelse(genes$log2FoldChange > 1, "Up", "Down"), "Stable")
ggplot(
  # 数据、映射、颜色
  genes, aes(x = log2FoldChange, y = -log10(pvalue))) +
  geom_point(aes(color = Significant), size=2) +
  scale_color_manual(values = c("blue","grey", "red")) +
  # 注释
  geom_text_repel(
    data = subset(genes, padj < 0.05 & abs(genes$log2FoldChange) >= 1),
    aes(label = Gene),
    size = 5,
    box.padding = unit(0.35, "lines"),
    point.padding = unit(0.3, "lines")) +
  # 辅助线
  geom_vline(xintercept=c(-1,1),lty=4,col="black",lwd=0.8) +
  geom_hline(yintercept = 1,lty=4,col="black",lwd=0.8) +
  # 坐标轴
  labs(x="log2(fold change)",
       y="-log10 (p-value)") +
  # 图例
  theme(legend.position = "bottom")
Rplot01.png

进一步修饰

# 进一步修饰
ggplot(
  # 数据、映射、颜色
  genes, aes(x = log2FoldChange, y = -log10(pvalue))) +
  geom_point(aes(color = Significant), size=2) +
  scale_color_manual(values = c("blue","grey", "red")) +
  # 注释
  geom_label_repel(
    data = subset(genes, padj < 0.05 & abs(genes$log2FoldChange) >= 1),
    aes(label = Gene),
    size = 5,fill = "darkred", color = "white",
    box.padding = unit(0.35, "lines"),
    point.padding = unit(0.3, "lines")) +
  # 辅助线
  geom_vline(xintercept=c(-0.5, 0.5),lty=4,col="#666666",lwd=0.8) +
  geom_hline(yintercept = 1,lty=4,col="#666666",lwd=0.8) +
  # 坐标轴
  labs(x="log2(fold change)",
       y="-log10 (p-value)") +
  # 图例
  theme(legend.position = "bottom")
Rplot02.png

参考资料:


  1. 使用ggplot2和ggrepel包做火山图 ↩

  2. R 数据可视化 02 | 火山图 ↩

  3. R语言_ifelse()函数用法 ↩

  4. ggplot2 texts : Add text annotations to a graph in R software ↩

你可能感兴趣的:(R绘图:ggplot2绘制火山图)