用EnhancedVolcano 绘制火山图

最近发现一个新包EnhancedVolcano,画火山图令人发指的简单,而且用户自主设置颜色、形状、大小和阴影等参数定义不同的绘图属性,还可以通过添加连线的方式有效避免数据点之间的重叠现象,强烈分享一下:
介绍来源:
https://www.bioconductor.org/packages/release/bioc/vignettes/EnhancedVolcano/inst/doc/EnhancedVolcano.html

1 安装

1.1 1. 来源 Bioconductor

  if (!requireNamespace('BiocManager', quietly = TRUE))
    install.packages('BiocManager')

  BiocManager::install('EnhancedVolcano')

注:安装开发版

  devtools::install_github('kevinblighe/EnhancedVolcano')

1.2 2. 加载R包

  library(EnhancedVolcano)

2 快速启动

通过示例演示 RNA-seq workflow: gene-level exploratory analysis and differential expression. 加载 ‘airway’ 数据

  library(airway)
  library(magrittr)

  data('airway')
  airway$dex %<>% relevel('untrt')

注释Ensembl gene IDs为gene symbols:

  ens <- rownames(airway)

  library(org.Hs.eg.db)
  symbols <- mapIds(org.Hs.eg.db, keys = ens,
    column = c('SYMBOL'), keytype = 'ENSEMBL')
  symbols <- symbols[!is.na(symbols)]
  symbols <- symbols[match(rownames(airway), names(symbols))]
  rownames(airway) <- symbols
  keep <- !is.na(rownames(airway))
  airway <- airway[keep,]

运行DESeq2进行两组的差异分析

  library('DESeq2')

  dds <- DESeqDataSet(airway, design = ~ cell + dex)
  dds <- DESeq(dds, betaPrior=FALSE)
  res <- results(dds,
    contrast = c('dex','trt','untrt'))
  res <- lfcShrink(dds,
    contrast = c('dex','trt','untrt'), res=res, type = 'normal')

2.1 最基础的火山图绘制

对于多数基础火山图而言,仅有单个数据框、数据矩阵或三列数据结果即可,包括标签、log2FC和校正或未校正的P值,其中默认log2FC的阈值为|2|; 默认 P值阈值为10e-6.

  EnhancedVolcano(res,
    lab = rownames(res),
    x = 'log2FoldChange',
    y = 'pvalue')
最基础火山图

3 高级参数

实际上,EnhancedVolcano图的所有方面都可以配置为适应所有类型的统计分布和标签偏好。默认情况下,EnhancedVolcano只会尝试标记那些通过统计显著性阈值的基因,即“pCutoff”和“FCcutoff”。此外,它只会标注尽可能多的这些可以合理地适应情节空间。用户可以选择提供他/她希望在绘图中标记的标签向量(如“selectLab”)。

3.1 修改log2FC和P值的截止值;指定标题;调整点和标签大小

对于大多数研究而言,10e-6的默认P值截止值可能过于宽松,因此可能需要将该阈值提高几个数量级。同样,log2FC截止值可能过于严格,因为现在可以计算差异表达分析中log2FC差异的适度“收缩”估计。

在本例中,我们还修改了点和标签大小,这有助于提高差异表达式分析中许多变量的清晰度。

  EnhancedVolcano(res,
    lab = rownames(res),
    x = 'log2FoldChange',
    y = 'pvalue',
    title = 'N061011 versus N61311',
    pCutoff = 10e-32,  ##自己定义p阈值
    FCcutoff = 0.5,  ##自己定义FC阈值
    pointSize = 3.0,
    labSize = 6.0)
修改log2FC和P值的截止值;指定标题;调整点和标签大小

4.2 调整点和字母的颜色

默认的配色方案可能不符合每个人的口味。在这里,我们使得只有通过log2FC和P值阈值的变量是红色的,其他的都是黑色的。我们还调整“alpha”的值,它控制打印点的透明度:1=100%不透明;0=100%透明。

  EnhancedVolcano(res,
    lab = rownames(res),
    x = 'log2FoldChange',
    y = 'pvalue',
    title = 'N061011 versus N61311',
    pCutoff = 10e-16,
    FCcutoff = 1.5,
    pointSize = 3.0,
    labSize = 6.0,
    col=c('black', 'black', 'black', 'red3'),
    colAlpha = 1)
调整点和字母的颜色

3.3 调整点的形状

也可以绘制不同的点作为不同的形状。默认形状是圆。用户可以通过“shape”参数指定自己的形状编码,该参数接受单个或四个可能的值:如果是四个值,则这些值映射到同样由颜色指定的标准名称;如果只有一个值,则所有点都将使用该值进行变化。

更多形状信息见 ggplot2 Quick Reference: shape

 EnhancedVolcano(res,
    lab = rownames(res),
    x = 'log2FoldChange',
    y = 'pvalue',
    title = 'N061011 versus N61311',
    pCutoff = 10e-16,
    FCcutoff = 1.5,
    pointSize = 4.0,
    labSize = 6.0,
    shape = 8,
    colAlpha = 1)
  EnhancedVolcano(res,
    lab = rownames(res),
    x = 'log2FoldChange',
    y = 'pvalue',
    title = 'N061011 versus N61311',
    pCutoff = 10e-16,
    FCcutoff = 1.5,
    pointSize = 3.0,
    labSize = 6.0,
    shape = c(1, 4, 23, 25),
    colAlpha = 1)
点的形状

3.4 Adjust cut-off lines and add extra threshold lines

The lines that are drawn to indicate cut-off points are also modifiable. The parameter ‘cutoffLineType’ accepts the following values: “blank”, “solid”, “dashed”, “dotted”, “dotdash”, “longdash”, and “twodash”. The colour and thickness of these can also be modified with ‘cutoffLineCol’ and ‘cutoffLineWidth’. To disable the lines, set either cutoffLineType=“blank” or cutoffLineWidth=0.

Extra lines can also be added via ‘hline’ and ‘vline’ to display other cut-offs.

To make these more visible, we will also remove the default gridlines.

  EnhancedVolcano(res,
    lab = rownames(res),
    x = 'log2FoldChange',
    y = 'pvalue',
    xlim = c(-6, 6),
    title = 'N061011 versus N61311',
    pCutoff = 10e-12,
    FCcutoff = 1.5,
    pointSize = 3.0,
    labSize = 6.0,
    colAlpha = 1,
    cutoffLineType = 'blank',
    cutoffLineCol = 'black',
    cutoffLineWidth = 0.8,
    hline = c(10e-20,
      10e-20 * 10e-30,
      10e-20 * 10e-60,
      10e-20 * 10e-90),
    hlineCol = c('pink', 'red', 'red2', 'red4'),
    hlineType = c('solid', 'longdash', 'dotdash', 'dotted'),
    hlineWidth = c(4, 3, 2, 1),
    gridlines.major = FALSE,
    gridlines.minor = FALSE)
image.png

Adjust cut-off lines and add extra threshold lines.

4.5 Adjust legend position, size, and text

The position of the legend can also be changed to “left” or “right” (and stacked vertically), or ‘top’ or “bottom” (stacked horizontally). The legend text, label size, and icon size can also be modified.

  EnhancedVolcano(res,
    lab = rownames(res),
    x = 'log2FoldChange',
    y = 'pvalue',
    pCutoff = 10e-12,
    FCcutoff = 1.5,
    cutoffLineType = 'twodash',
    cutoffLineWidth = 0.8,
    pointSize = 4.0,
    labSize = 6.0,
    colAlpha = 1,
    legendLabels=c('Not sig.','Log (base 2) FC','p-value',
      'p-value & Log (base 2) FC'),
    legendPosition = 'right',
    legendLabSize = 16,
    legendIconSize = 5.0)
image.png

Adjust legend position, size, and text.

Note: to make the legend completely invisible, specify:

legendPosition = 'none'

4.6 Fit more labels by adding connectors

In order to maximise free space in the plot window, one can fit more labels by adding connectors from labels to points, where appropriate. The width and colour of these connectors can also be modified with ‘widthConnectors’ and ‘colConnectors’, respectively. Further configuration is achievable via ‘typeConnectors’ (“open”, “closed”), ‘endsConnectors’ (“last”, “first”, “both”), and lengthConnectors (default = unit(0.01, ‘npc’)).

The result may not always be desirable as it can make the plot look overcrowded.

  EnhancedVolcano(res,
    lab = rownames(res),
    x = 'log2FoldChange',
    y = 'pvalue',
    xlab = bquote(~Log[2]~ 'fold change'),
    pCutoff = 10e-32,
    FCcutoff = 2.0,
    pointSize = 4.0,
    labSize = 6.0,
    colAlpha = 1,
    legendPosition = 'right',
    legendLabSize = 12,
    legendIconSize = 4.0,
    drawConnectors = TRUE,
    widthConnectors = 0.75)
image.png

Fit more labels by adding connectors.

4.7 Only label key variables

In many situations, people may only wish to label their key variables / variables of interest. One can therefore supply a vector of these variables via the ‘selectLab’ parameter, the contents of which have to also be present in the vector passed to ‘lab’. In addition, only those variables that pass both the cutoff for log2FC and P value will be labelled.

  EnhancedVolcano(res,
    lab = rownames(res),
    x = 'log2FoldChange',
    y = 'pvalue',
    selectLab = c('TMEM176B','ADH1A'),
    xlab = bquote(~Log[2]~ 'fold change'),
    pCutoff = 10e-14,
    FCcutoff = 2.0,
    pointSize = 4.0,
    labSize = 6.0,
    shape = c(4, 35, 17, 18),
    colAlpha = 1,
    legendPosition = 'right',
    legendLabSize = 14,
    legendIconSize = 5.0)

4.8 Draw labels in boxes

To improve label clarity, we can draw simple boxes around the plots labels. This works much better when drawConnectors is also TRUE.

  EnhancedVolcano(res,
    lab = rownames(res),
    x = 'log2FoldChange',
    y = 'pvalue',
    selectLab = c('VCAM1','KCTD12','ADAM12',
      'CXCL12','CACNB2','SPARCL1','DUSP1','SAMHD1','MAOA'),
    xlab = bquote(~Log[2]~ 'fold change'),
    pCutoff = 10e-14,
    FCcutoff = 2.0,
    pointSize = 4.0,
    labSize = 6.0,
    labCol = 'black',
    labFace = 'bold',
    boxedLabels = TRUE,
    colAlpha = 4/5,
    legendPosition = 'right',
    legendLabSize = 14,
    legendIconSize = 4.0,
    drawConnectors = TRUE,
    widthConnectors = 1.0,
    colConnectors = 'black')
image.png

Draw labels in boxes.

4.9 Over-ride colouring scheme with custom key-value pairs

In certain situations, one may wish to over-ride the default colour scheme with their own colour-scheme, such as colouring variables by pathway, cell-type or group. This can be achieved by supplying a named vector as ‘colCustom’.

In this example, we just wish to colour all variables with log2FC > 2.5 as ‘high’ and those with log2FC < -2.5 as ‘low’.

  # create custom key-value pairs for 'high', 'low', 'mid' expression by fold-change
  # this can be achieved with nested ifelse statements
  keyvals <- ifelse(
    res$log2FoldChange < -2.5, 'royalblue',
      ifelse(res$log2FoldChange > 2.5, 'gold',
        'black'))
  keyvals[is.na(keyvals)] <- 'black'
  names(keyvals)[keyvals == 'gold'] <- 'high'
  names(keyvals)[keyvals == 'black'] <- 'mid'
  names(keyvals)[keyvals == 'royalblue'] <- 'low'
  EnhancedVolcano(res,
    lab = rownames(res),
    x = 'log2FoldChange',
    y = 'pvalue',
    selectLab = rownames(res)[which(names(keyvals) %in% c('high', 'low'))],
    xlab = bquote(~Log[2]~ 'fold change'),
    title = 'Custom colour over-ride',
    pCutoff = 10e-14,
    FCcutoff = 1.0,
    pointSize = 3.5,
    labSize = 4.5,
    shape = c(6, 4, 2, 11),
    colCustom = keyvals,
    colAlpha = 1,
    legendPosition = 'left',
    legendLabSize = 15,
    legendIconSize = 5.0,
    drawConnectors = TRUE,
    widthConnectors = 1.0,
    colConnectors = 'black',
    arrowheads = FALSE,
    gridlines.major = TRUE,
    gridlines.minor = FALSE,
    border = 'partial',
    borderWidth = 1.5,
    borderColour = 'black')
image.png

Over-ride colouring scheme with custom key-value pairs.

4.10 Over-ride colour and/or shape scheme with custom key-value pairs

In this example, we first over-ride the existing shape scheme and then both the colour and shape scheme at the same time.

  # define different cell-types that will be shaded
  celltype1 <- c('VCAM1','KCTD12','ADAM12','CXCL12')
  celltype2 <- c('CACNB2','SPARCL1','DUSP1','SAMHD1','MAOA')

  # create custom key-value pairs for different cell-types
  # this can be achieved with nested ifelse statements
  keyvals.shape <- ifelse(
    rownames(res) %in% celltype1, 17,
      ifelse(rownames(res) %in% celltype2, 64,
        3))
  keyvals.shape[is.na(keyvals.shape)] <- 3
  names(keyvals.shape)[keyvals.shape == 3] <- 'PBMC'
  names(keyvals.shape)[keyvals.shape == 17] <- 'Cell-type 1'
  names(keyvals.shape)[keyvals.shape == 64] <- 'Cell-type 2'
  p1 <- EnhancedVolcano(res,
    lab = rownames(res),
    x = 'log2FoldChange',
    y = 'pvalue',
    selectLab = rownames(res)[which(names(keyvals) %in% c('high', 'low'))],
    xlab = bquote(~Log[2]~ 'fold change'),
    title = 'Custom shape over-ride',
    pCutoff = 10e-14,
    FCcutoff = 1.0,
    pointSize = 4.5,
    labSize = 4.5,
    shapeCustom = keyvals.shape,
    colCustom = NULL,
    colAlpha = 1,
    legendLabSize = 15,
    legendPosition = 'left',
    legendIconSize = 5.0,
    drawConnectors = TRUE,
    widthConnectors = 0.5,
    colConnectors = 'grey50',
    gridlines.major = TRUE,
    gridlines.minor = FALSE,
    border = 'partial',
    borderWidth = 1.5,
    borderColour = 'black')

  # create custom key-value pairs for 'high', 'low', 'mid' expression by fold-change
  # this can be achieved with nested ifelse statements
  keyvals.colour <- ifelse(
    res$log2FoldChange < -2.5, 'royalblue',
      ifelse(res$log2FoldChange > 2.5, 'gold',
        'black'))
  keyvals.colour[is.na(keyvals.colour)] <- 'black'
  names(keyvals.colour)[keyvals.colour == 'gold'] <- 'high'
  names(keyvals.colour)[keyvals.colour == 'black'] <- 'mid'
  names(keyvals.colour)[keyvals.colour == 'royalblue'] <- 'low'

  p2 <- EnhancedVolcano(res,
    lab = rownames(res),
    x = 'log2FoldChange',
    y = 'pvalue',
    selectLab = rownames(res)[which(names(keyvals) %in% c('High', 'Low'))],
    xlab = bquote(~Log[2]~ 'fold change'),
    title = 'Custom shape & colour over-ride',
    pCutoff = 10e-14,
    FCcutoff = 1.0,
    pointSize = 5.5,
    labSize = 0.0,
    shapeCustom = keyvals.shape,
    colCustom = keyvals.colour,
    colAlpha = 1,
    legendPosition = 'right',
    legendLabSize = 15,
    legendIconSize = 5.0,
    drawConnectors = TRUE,
    widthConnectors = 0.5,
    colConnectors = 'grey50',
    gridlines.major = TRUE,
    gridlines.minor = FALSE,
    border = 'full',
    borderWidth = 1.0,
    borderColour = 'black')

  library(gridExtra)
  library(grid)
  grid.arrange(p1, p2,
    ncol=2,
    top = textGrob('EnhancedVolcano',
      just = c('center'),
      gp = gpar(fontsize = 32)))
image.png

Over-ride colour and/or shape scheme with custom key-value pairs.

4.11 Encircle / highlight certain variables

In this example we add an extra level of identifying key variables by encircling them.

This feature works best for shading just 1 or 2 key variables. It is expected that the user can use the ‘shapeCustom’ parameter for more in depth identification of different types of variables.

  # define different cell-types that will be shaded
  celltype1 <- c('VCAM1','CXCL12')
  celltype2 <- c('SORT1', 'KLF15')
  EnhancedVolcano(res,
    lab = rownames(res),
    x = 'log2FoldChange',
    y = 'pvalue',
    selectLab = c(celltype1, celltype2),
    xlab = bquote(~Log[2]~ 'fold change'),
    title = 'Shading cell-type 1|2',
    pCutoff = 10e-14,
    FCcutoff = 1.0,
    pointSize = 8.0,
    labSize = 6.0,
    labCol = 'black',
    labFace = 'bold',
    boxedLabels = TRUE,
    shape = 42,
    colCustom = keyvals,
    colAlpha = 1,
    legendPosition = 'right',
    legendLabSize = 20,
    legendIconSize = 20.0,
    # encircle
      encircle = celltype1,
      encircleCol = 'black',
      encircleSize = 2.5,
      encircleFill = 'pink',
      encircleAlpha = 1/2,
    # shade
      shade = celltype2,
      shadeAlpha = 1/2,
      shadeFill = 'skyblue',
      shadeSize = 1,
      shadeBins = 5,
    drawConnectors = TRUE,
    widthConnectors = 2.0,
    gridlines.major = TRUE,
    gridlines.minor = FALSE,
    border = 'full',
    borderWidth = 5,
    borderColour = 'black')
image.png

Shade certain variables.

4.12 Highlighting key variables via custom point sizes

One can also supply a vector of sizes to pointSize for the purpose of having a different size for each poin. For example, if we want to change the size of just those variables with log2FC>2:

  library("pasilla")
  pasCts <- system.file("extdata", "pasilla_gene_counts.tsv",
    package="pasilla", mustWork=TRUE)
  pasAnno <- system.file("extdata", "pasilla_sample_annotation.csv",
    package="pasilla", mustWork=TRUE)
  cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="gene_id"))
  coldata <- read.csv(pasAnno, row.names=1)
  coldata <- coldata[,c("condition","type")]
  rownames(coldata) <- sub("fb", "", rownames(coldata))
  cts <- cts[, rownames(coldata)]
  library("DESeq2")
  dds <- DESeqDataSetFromMatrix(countData = cts,
    colData = coldata,
    design = ~ condition)

  featureData <- data.frame(gene=rownames(cts))
  mcols(dds) <- DataFrame(mcols(dds), featureData)
  dds <- DESeq(dds)
  res <- results(dds)

  p1 <- EnhancedVolcano(res,
    lab = rownames(res),
    x = "log2FoldChange",
    y = "pvalue",
    pCutoff = 10e-4,
    FCcutoff = 2,
    ylim = c(0, -log10(10e-12)),
    pointSize = c(ifelse(res$log2FoldChange>2, 8, 1)),
    labSize = 6.0,
    shape = c(6, 6, 19, 16),
    title = "DESeq2 results",
    subtitle = "Differential expression",
    caption = bquote(~Log[2]~ "fold change cutoff, 2; p-value cutoff, 10e-4"),
    legendPosition = "right",
    legendLabSize = 14,
    col = c("grey30", "forestgreen", "royalblue", "red2"),
    colAlpha = 0.9,
    drawConnectors = TRUE,
    hline = c(10e-8),
    widthConnectors = 0.5)

  p1
image.png

Highlighting key variabvles via custom point sizes.

4.13 Change to continuous colour scheme

We can over-ride the default ‘discrete’ colour scheme with a continuous one that shades between 2 colours based on nominal or adjusted p-value, whichever is selected by y, via colGradient:

  p1 <- EnhancedVolcano(res,
    lab = rownames(res),
    x = "log2FoldChange",
    y = "pvalue",
    pCutoff = 10e-4,
    FCcutoff = 2,
    ylim = c(0, -log10(10e-12)),
    pointSize = c(ifelse(res$log2FoldChange>2, 8, 1)),
    labSize = 6.0,
    shape = c(6, 6, 19, 16),
    title = "DESeq2 results",
    subtitle = "Differential expression",
    caption = bquote(~Log[2]~ "fold change cutoff, 2; p-value cutoff, 10e-4"),
    legendPosition = "right",
    legendLabSize = 14,
    colAlpha = 0.9,
    colGradient = c('red3', 'royalblue'),
    drawConnectors = TRUE,
    hline = c(10e-8),
    widthConnectors = 0.5)

  p1
image.png

Highlighting key variabvles via custom point sizes.

4.14 Custom axis tick marks

Custom axis ticks can be added in a ‘plug and play’ fashion via ggplot2 functionality, as follows:

  p1 +
    ggplot2::coord_cartesian(xlim=c(-6, 6)) +
    ggplot2::scale_x_continuous(
      breaks=seq(-6,6, 1))
image.png

Custom axis tick marks

More information on this can be found here: http://www.sthda.com/english/wiki/ggplot2-axis-ticks-a-guide-to-customize-tick-marks-and-labels

你可能感兴趣的:(用EnhancedVolcano 绘制火山图)