ggpubr: Publication Ready Plots (an R package for publication-quality figures)

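In genomics research, it is common to explore the expression profile of one gene, or of a set of genes, involved in a pathway of interest. This article introduces ggpubr, a practical and polished R package that produces publication-quality figures and can directly apply the color palettes of specific journals, which makes exploratory data analysis (EDA) easier for life scientists.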

[Figure: an example plot]

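Does a figure like this look professional, and would you like to make one yourself? The steps below walk through it. Note up front that all of these plots can be created with the very flexible ggplot2 R package. However, the syntax for customizing a ggplot can look opaque to beginners, which raises the barrier for researchers without advanced R programming skills. ggpubr is a wrapper around ggplot2 that provides easy-to-use functions for creating ggplot2-based publication-ready plots. We will use ggpubr functions to visualize gene expression profiles from the TCGA genomic data sets.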

Contents:

  • Prerequisites: ggpubr package, TCGA data
  • Gene expression data
  • Box plots
  • Violin plots
  • Stripcharts and dot plots
  • Density plots
  • Histogram plots
  • Empirical cumulative density function
  • Quantile - Quantile plot

Prerequisites

1. ggpubr package: install it from CRAN with the following command.

install.packages("ggpubr")

Or install the latest development version from GitHub.

if (!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")

Then load the package.

library(ggpubr)

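2. TCGA data

The Cancer Genome Atlas (TCGA) is a publicly available resource containing clinical and genomic data for 33 cancer types. The data include gene expression, CNV profiles, SNP genotypes, DNA methylation, miRNA profiles, exome sequencing and other data types. The RTCGA package, developed by Marcin et al., offers a convenient way to access the clinical and genomic data available in TCGA. For installation instructions, see the Bioconductor repository or my other article, "RTCGA: The Ultimate Tool for TCGA Data Mining". The R code below requires the core RTCGA package plus the clinical and mRNA gene expression data packages.

To see which data types are available for each cancer type, use the following command:

library(RTCGA)
infoTCGA()

[Figure: data types available for each cancer type]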

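3. Gene expression data

The expressionTCGA() function in the RTCGA package makes it easy to extract the expression values of genes of interest in one or more cancer types. In the R code below, we extract the mRNA expression of five genes of interest (GATA3, PTEN, XBP1, ESR1 and MUC1) from three data sets: breast invasive carcinoma (BRCA), ovarian serous cystadenocarcinoma (OV) and lung squamous cell carcinoma (LUSC).

library(RTCGA)
library(RTCGA.mRNA)
expr

[Figure: extracted mRNA expression values]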
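The expr assignment above is cut off in the source. A minimal sketch of what the extraction call might look like follows. It assumes that the RTCGA and RTCGA.mRNA packages are installed from Bioconductor and that the three cohorts are available as the data objects BRCA.mRNA, OV.mRNA and LUSC.mRNA; the exact arguments are an assumption, not the author's confirmed code.

```r
# Sketch only: reconstructs the truncated call under the stated assumptions.
genes <- c("GATA3", "PTEN", "XBP1", "ESR1", "MUC1")

# Only run the extraction when both Bioconductor packages are available
have_rtcga <- requireNamespace("RTCGA", quietly = TRUE) &&
  requireNamespace("RTCGA.mRNA", quietly = TRUE)

if (have_rtcga) {
  library(RTCGA)
  library(RTCGA.mRNA)
  # One row per sample; extract.cols keeps only the requested genes
  expr <- expressionTCGA(BRCA.mRNA, OV.mRNA, LUSC.mRNA,
                         extract.cols = genes)
  print(head(expr))
} else {
  message("RTCGA / RTCGA.mRNA not installed; skipping extraction")
}
```

The result should be a data frame with a patient barcode column, a dataset column and one column per gene, which is the shape the rest of the tutorial relies on.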

To display the number of samples in each data set, type the following.

nb_samples

[Figure: number of samples per data set]
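The nb_samples statement is truncated in the source. One plausible way to count the samples per data set, assuming expr has a dataset column as described, is base R's table(). The toy data frame below is hypothetical and only stands in for the real expr.

```r
# Hypothetical stand-in for expr: only the 'dataset' column matters here
toy_expr <- data.frame(
  dataset = c("BRCA.mRNA", "BRCA.mRNA", "BRCA.mRNA",
              "OV.mRNA", "OV.mRNA", "LUSC.mRNA"),
  GATA3   = c(2.9, 2.2, 1.5, -0.6, -1.2, 0.1)
)

# table() counts how many rows (samples) fall in each data set
nb_samples <- table(toy_expr$dataset)
nb_samples
```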

We can simplify the data set names by removing the "mRNA" tag. This can be done with the base R function gsub().

expr$dataset 
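Since the assignment is truncated in the source, here is a small sketch of the gsub() step on a hypothetical vector of data set labels. Using fixed = TRUE is a choice made for this sketch, so that the dot in ".mRNA" is matched literally rather than as a regular-expression wildcard.

```r
# Hypothetical data set labels, in the form produced by the extraction step
dataset <- c("BRCA.mRNA", "OV.mRNA", "LUSC.mRNA")

# fixed = TRUE treats ".mRNA" as a literal string, not a regex
dataset_short <- gsub(".mRNA", "", dataset, fixed = TRUE)
dataset_short
```

Applied to the real data, the same call would be assigned back to expr$dataset.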

Let's also simplify the patient barcode column. The R code below changes the barcodes to BRCA1, BRCA2, ..., OV1, OV2, ..., and so on.

expr$bcr_patient_barcode

[Figure: the data set after simplifying the labels]
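The barcode assignment is truncated in the source. The sketch below shows one way to build labels of the form BRCA1, BRCA2, ..., OV1, OV2, ... by numbering the samples within each data set. The toy data frame and its placeholder barcodes are hypothetical, and ave() is a choice made here so the code does not depend on the rows being sorted by data set.

```r
# Hypothetical stand-in for expr with placeholder barcodes
toy_expr <- data.frame(
  bcr_patient_barcode = c("sample_a", "sample_b", "sample_c",
                          "sample_d", "sample_e"),
  dataset = c("BRCA", "BRCA", "OV", "LUSC", "OV"),
  stringsAsFactors = FALSE
)

# Running index within each data set: 1, 2, ... per group
idx <- ave(seq_along(toy_expr$dataset), toy_expr$dataset, FUN = seq_along)

# Replace each barcode with "<data set><index>"
toy_expr$bcr_patient_barcode <- paste0(toy_expr$dataset, idx)
toy_expr
```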

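The data set needed for this demonstration has also been prepared and made available for download online. You need it to practice the R code in this tutorial. If you run into problems installing the RTCGA packages, you can simply load the data as follows: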

expr 

Box plots

Create box plots of the gene expression profiles, colored by group (here, data set/cancer type):

library(ggpubr)
# GATA3
ggboxplot(expr, x = "dataset", y = "GATA3",
          title = "GATA3", ylab = "Expression",
          color = "dataset", palette = "jco")
# PTEN
ggboxplot(expr, x = "dataset", y = "PTEN",
          title = "PTEN", ylab = "Expression",
          color = "dataset", palette = "jco")

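The palette argument selects the color palette. I plan to cover palettes systematically in a later article. For now, it is enough to know that ggpubr can directly use the scientific journal palettes from the ggsci package, for example "npg", "aaas", "lancet", "jco" and "ucscgb". The code above uses the "jco" palette, modeled on the Journal of Clinical Oncology, which looks clean and attractive.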
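As a small illustration of the palette argument, the sketch below switches to another ggsci palette and then to a hand-picked color vector. The simulated toy_expr data frame and the hex colors are assumptions made for this example only; they are not part of the TCGA data.

```r
library(ggpubr)

# Simulated stand-in for expr (hypothetical values, three groups of 20)
set.seed(1)
toy_expr <- data.frame(
  dataset = rep(c("BRCA", "OV", "LUSC"), each = 20),
  GATA3   = c(rnorm(20, 2.5), rnorm(20, -0.5), rnorm(20, 0))
)

# A named journal palette from ggsci
p_npg <- ggboxplot(toy_expr, x = "dataset", y = "GATA3",
                   color = "dataset", palette = "npg")

# A custom palette: one color per group
p_custom <- ggboxplot(toy_expr, x = "dataset", y = "GATA3",
                      color = "dataset",
                      palette = c("#00AFBB", "#E7B800", "#FC4E07"))

# get_palette() returns the color codes behind a named palette
npg_cols <- get_palette("npg", 3)
npg_cols
```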

Instead of repeating the same R code for each gene, you can create a list of plots in one call, as follows:

# Create a list of plots
p
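The call that creates p is truncated in the source. A sketch of how such a list of plots can be built is shown below: passing several variable names to y makes ggboxplot() return one plot per variable. The simulated toy_expr data frame is hypothetical, and the arguments are assumed to mirror the single-gene calls shown earlier.

```r
library(ggpubr)

# Simulated stand-in for expr (hypothetical values)
set.seed(1)
toy_expr <- data.frame(
  dataset = rep(c("BRCA", "OV", "LUSC"), each = 20),
  GATA3   = rnorm(60),
  PTEN    = rnorm(60),
  XBP1    = rnorm(60)
)

genes <- c("GATA3", "PTEN", "XBP1")

# One call, several y variables: the result is a list of ggplot objects
p <- ggboxplot(toy_expr, x = "dataset", y = genes,
               title = genes, ylab = "Expression",
               color = "dataset", palette = "jco")

# Individual plots can then be pulled out of the list
p[[1]]
```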

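Note that when the y argument contains multiple variables (here, several gene names), the title, xlab and ylab arguments can also be character vectors of the same length as y. To add p-values and significance levels to the box plots, you can simply do the following: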

my_comparisons 
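The my_comparisons code is truncated in the source. A common pattern in ggpubr is to pass a list of group pairs to stat_compare_means(); the sketch below assumes that pattern. The specific pairs and the simulated toy_expr data frame are assumptions for illustration.

```r
library(ggpubr)

# Simulated stand-in for expr (hypothetical values)
set.seed(1)
toy_expr <- data.frame(
  dataset = rep(c("BRCA", "OV", "LUSC"), each = 20),
  GATA3   = c(rnorm(20, 2.5), rnorm(20, -0.5), rnorm(20, 0))
)

# Each element names the two groups to compare (assumed pairs)
my_comparisons <- list(c("BRCA", "OV"), c("OV", "LUSC"))

# stat_compare_means() draws a bracket with a p-value for each pair
p <- ggboxplot(toy_expr, x = "dataset", y = "GATA3",
               title = "GATA3", ylab = "Expression",
               color = "dataset", palette = "jco") +
  stat_compare_means(comparisons = my_comparisons)
p
```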

For each gene, you can compare the groups as follows.

compare_means(c(GATA3, PTEN, XBP1) ~ dataset, data = expr)

[Figure: pairwise comparisons of each gene across cancer types]

To select which items to display (here, cancer types) or to remove specific items from the plot, use the arguments select and remove, as follows:

# Select BRCA and OV cancer types
ggboxplot(expr, x = "dataset", y = "GATA3",
          title = "GATA3", ylab = "Expression",
          color = "dataset", palette = "jco",
          select = c("BRCA", "OV"))
# or remove BRCA
ggboxplot(expr, x = "dataset", y = "GATA3",
          title = "GATA3", ylab = "Expression",
          color = "dataset", palette = "jco",
          remove = "BRCA")

To change the order of the data sets on the x axis, use the argument order, for example order = c("LUSC", "OV", "BRCA").

# Order data sets
ggboxplot(expr, x = "dataset", y = "GATA3",
          title = "GATA3", ylab = "Expression",
          color = "dataset", palette = "jco",
          order = c("LUSC", "OV", "BRCA"))

[Figure: order of the data sets on the x axis]

To create a horizontal plot, use the argument rotate = TRUE.

ggboxplot(expr, x = "dataset", y = "GATA3",
          title = "GATA3", ylab = "Expression",
          color = "dataset", palette = "jco",
          rotate = TRUE)

[Figure: horizontal box plot]

To combine the three gene expression plots into a multi-panel plot, use the argument combine = TRUE.

ggboxplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          combine = TRUE,
          ylab = "Expression",
          color = "dataset", palette = "jco")

[Figure: multi-panel plot]

You can also merge the three plots with the argument merge = TRUE or merge = "asis".

ggboxplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          merge = TRUE,
          ylab = "Expression",
          palette = "jco")

[Figure: merged box plot, with genes as the grouping variable within each cancer type]

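In the plot above, it is easy to compare visually the expression levels of the different genes within each cancer type. But you might want to put the genes (the y variables) on the x axis instead, in order to compare expression levels across the different groups (here, cancer types). In that case, the y variables (the genes) become the x tick labels and the x variable (the data set) becomes the grouping variable. To do this, use the argument merge = "flip".

ggboxplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          merge = "flip",
          ylab = "Expression",
          palette = "jco")

[Figure: genes on the x axis, with cancer type as the grouping variable]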

You may want to add jittered points to the box plots. Each point corresponds to an individual observation. To add jittered points, use the argument add = "jitter" as shown below. To customize the added elements, specify the argument add.params.

ggboxplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          combine = TRUE,
          color = "dataset", palette = "jco",
          ylab = "Expression",
          add = "jitter",                              # Add jittered points
          add.params = list(size = 0.1, jitter = 0.2)  # Point size and the amount of jittering
)

[Figure: box plots with jittered points]

Note that when using ggboxplot(), sensible values for the add argument are one of c("jitter", "dotplot"). If you choose add = "dotplot" and the dot plot is very dense, you can adjust the dot size and the bin width. You can add and adjust a dot plot as follows.

ggboxplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          combine = TRUE,
          color = "dataset", palette = "jco",
          ylab = "Expression",
          add = "dotplot",                              # Add dotplot
          add.params = list(binwidth = 0.1, dotsize = 0.3)
)

[Figure: box plots with an added dot plot]

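You may want to label the box plots with the sample names of the top n highest or lowest values. In that case, you can use the following arguments:

label: the name of the column containing the point labels.

label.select: can take one of two forms:

  • A character vector specifying the labels to show.
  • A list containing one or a combination of the following components:
      top.up and top.down: show the labels of the top up/down points. For example, label.select = list(top.up = 10, top.down = 4).
      criteria: filter, for example, by the x and y variable values. For example, label.select = list(criteria = "`y` > 3.9 & `y` < 5 & `x` %in% c('BRCA', 'OV')").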

ggboxplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          combine = TRUE,
          color = "dataset", palette = "jco",
          ylab = "Expression",
          add = "jitter",                                # Add jittered points
          add.params = list(size = 0.1, jitter = 0.2),   # Point size and the amount of jittering
          label = "bcr_patient_barcode",                 # Column containing point labels
          label.select = list(top.up = 2, top.down = 2), # Select some labels to display
          font.label = list(size = 9, face = "italic"),  # Label font
          repel = TRUE                                   # Avoid label text overplotting
)

[Figure: labels for the top samples]

More complex label selections can be specified as follows.

label.select.criteria <- list(criteria = "`y` > 3.9 & `x` %in% c('BRCA', 'OV')")
ggboxplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          combine = TRUE,
          color = "dataset", palette = "jco",
          ylab = "Expression",
          label = "bcr_patient_barcode",                # Column containing point labels
          label.select = label.select.criteria,         # Select some labels to display
          font.label = list(size = 9, face = "italic"), # Label font
          repel = TRUE                                  # Avoid label text overplotting
)

[Figure: labels selected by custom criteria]

Violin plots

The R code below draws violin plots with box plots inside.

ggviolin(expr, x = "dataset",
         y = c("GATA3", "PTEN", "XBP1"),
         combine = TRUE,
         color = "dataset", palette = "jco",
         ylab = "Expression",
         add = "boxplot")

[Figure: violin plots with box plots inside]

Instead of adding a box plot inside the violin plot, you can add the median and interquartile range as follows.

ggviolin(expr, x = "dataset",
         y = c("GATA3", "PTEN", "XBP1"),
         combine = TRUE,
         color = "dataset", palette = "jco",
         ylab = "Expression",
         add = "median_iqr")

[Figure: violin plots with median and interquartile range]

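When using the function ggviolin(), sensible values for the add argument include "mean", "mean_se", "mean_sd", "mean_ci", "mean_range", "median", "median_iqr", "median_mad" and "median_range". You can also add "jitter" points and a "dotplot" inside the violin plot, as described earlier.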
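As a brief illustration of one of these summary options, the sketch below overlays the mean and standard deviation on a violin plot. The simulated toy_expr data frame is hypothetical, and the gray color passed through add.params is only an example setting.

```r
library(ggpubr)

# Simulated stand-in for expr (hypothetical values)
set.seed(1)
toy_expr <- data.frame(
  dataset = rep(c("BRCA", "OV", "LUSC"), each = 20),
  GATA3   = c(rnorm(20, 2.5), rnorm(20, -0.5), rnorm(20, 0))
)

# add = "mean_sd" draws the group mean with +/- one standard deviation
p_violin <- ggviolin(toy_expr, x = "dataset", y = "GATA3",
                     color = "dataset", palette = "jco",
                     ylab = "Expression",
                     add = "mean_sd",
                     add.params = list(color = "gray"))
p_violin
```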

Stripcharts and dot plots

To draw a strip chart, type the following.

ggstripchart(expr, x = "dataset",
             y = c("GATA3", "PTEN", "XBP1"),
             combine = TRUE,
             color = "dataset", palette = "jco",
             size = 0.1, jitter = 0.2,
             ylab = "Expression",
             add = "median_iqr",
             add.params = list(color = "gray"))

[Figure: strip chart]

For a dot plot, use the code below.

ggdotplot(expr, x = "dataset",
          y = c("GATA3", "PTEN", "XBP1"),
          combine = TRUE,
          color = "dataset", palette = "jco",
          fill = "white",
          binwidth = 0.1,
          ylab = "Expression",
          add = "median_iqr",
          add.params = list(size = 0.9))

[Figure: dot plot]

Density plots

To visualize a distribution as a density plot, use the function ggdensity() as follows.

# Basic density plot
ggdensity(expr,
          x = c("GATA3", "PTEN", "XBP1"),
          y = "..density..",
          combine = TRUE,       # Combine the 3 plots
          xlab = "Expression",
          add = "median",       # Add median line
          rug = TRUE            # Add marginal rug
)

[Figure: basic density plot]

# Change color and fill by dataset
ggdensity(expr,
          x = c("GATA3", "PTEN", "XBP1"),
          y = "..density..",
          combine = TRUE,       # Combine the 3 plots
          xlab = "Expression",
          add = "median",       # Add median line
          rug = TRUE,           # Add marginal rug
          color = "dataset",
          fill = "dataset",
          palette = "jco")

[Figure: density plot colored and filled by data set]

# Merge the 3 plots
# and use y = "..count.." instead of "..density.."
ggdensity(expr,
          x = c("GATA3", "PTEN", "XBP1"),
          y = "..count..",
          merge = TRUE,         # Merge the 3 plots
          xlab = "Expression",
          add = "median",       # Add median line
          rug = TRUE,           # Add marginal rug
          palette = "jco"       # Change color palette
)

[Figure: the three density plots merged]

# Color and fill by x variables
ggdensity(expr,
          x = c("GATA3", "PTEN", "XBP1"),
          y = "..count..",
          color = ".x.", fill = ".x.",  # Color and fill by x variables
          merge = TRUE,                 # Merge the 3 plots
          xlab = "Expression",
          add = "median",               # Add median line
          rug = TRUE,                   # Add marginal rug
          palette = "jco"               # Change color palette
)

[Figure: density plot colored and filled by x variable]

# Facet by "dataset"
ggdensity(expr,
          x = c("GATA3", "PTEN", "XBP1"),
          y = "..count..",
          color = ".x.", fill = ".x.",
          facet.by = "dataset",         # Split by "dataset" into multi-panel
          merge = TRUE,                 # Merge the 3 plots
          xlab = "Expression",
          add = "median",               # Add median line
          rug = TRUE,                   # Add marginal rug
          palette = "jco"               # Change color palette
)

[Figure: density plot faceted by data set]

Histogram plots

To visualize a distribution as a histogram, use the function gghistogram() as follows.

# Basic histogram plot
gghistogram(expr,
            x = c("GATA3", "PTEN", "XBP1"),
            y = "..density..",
            combine = TRUE,     # Combine the 3 plots
            xlab = "Expression",
            add = "median",     # Add median line
            rug = TRUE          # Add marginal rug
)

[Figure: basic histogram]

# Change color and fill by dataset
gghistogram(expr,
            x = c("GATA3", "PTEN", "XBP1"),
            y = "..density..",
            combine = TRUE,     # Combine the 3 plots
            xlab = "Expression",
            add = "median",     # Add median line
            rug = TRUE,         # Add marginal rug
            color = "dataset",
            fill = "dataset",
            palette = "jco")

[Figure: histogram colored and filled by data set]

# Merge the 3 plots
# and use y = "..count.." instead of "..density.."
gghistogram(expr,
            x = c("GATA3", "PTEN", "XBP1"),
            y = "..count..",
            merge = TRUE,       # Merge the 3 plots
            xlab = "Expression",
            add = "median",     # Add median line
            rug = TRUE,         # Add marginal rug
            palette = "jco"     # Change color palette
)

[Figure: merged histogram]

# Color and fill by x variables
gghistogram(expr,
            x = c("GATA3", "PTEN", "XBP1"),
            y = "..count..",
            color = ".x.", fill = ".x.",  # Color and fill by x variables
            merge = TRUE,                 # Merge the 3 plots
            xlab = "Expression",
            add = "median",               # Add median line
            rug = TRUE,                   # Add marginal rug
            palette = "jco"               # Change color palette
)

[Figure: histogram colored and filled by x variable]

# Facet by "dataset"
gghistogram(expr,
            x = c("GATA3", "PTEN", "XBP1"),
            y = "..count..",
            color = ".x.", fill = ".x.",
            facet.by = "dataset",         # Split by "dataset" into multi-panel
            merge = TRUE,                 # Merge the 3 plots
            xlab = "Expression",
            add = "median",               # Add median line
            rug = TRUE,                   # Add marginal rug
            palette = "jco"               # Change color palette
)

[Figure: histogram faceted by data set]

Empirical cumulative density function

# Basic ECDF plot
ggecdf(expr,
       x = c("GATA3", "PTEN", "XBP1"),
       combine = TRUE,
       xlab = "Expression", ylab = "F(expression)")

[Figure: basic ECDF plot]

# Change color by dataset
ggecdf(expr,
       x = c("GATA3", "PTEN", "XBP1"),
       combine = TRUE,
       xlab = "Expression", ylab = "F(expression)",
       color = "dataset", palette = "jco")

[Figure: ECDF plot colored by data set]

# Merge the 3 plots and color by x variables
ggecdf(expr,
       x = c("GATA3", "PTEN", "XBP1"),
       merge = TRUE,
       xlab = "Expression", ylab = "F(expression)",
       color = ".x.", palette = "jco")

[Figure: merged ECDF plot]

# Merge the 3 plots and color by x variables
# Facet by "dataset" into multi-panel
ggecdf(expr,
       x = c("GATA3", "PTEN", "XBP1"),
       merge = TRUE,
       xlab = "Expression", ylab = "F(expression)",
       color = ".x.", palette = "jco",
       facet.by = "dataset")

[Figure: ECDF plot faceted by data set]

Quantile - Quantile plot

# Basic Q-Q plot
ggqqplot(expr,
         x = c("GATA3", "PTEN", "XBP1"),
         combine = TRUE, size = 0.5)

[Figure: basic Q-Q plot]

# Change color by dataset
ggqqplot(expr,
         x = c("GATA3", "PTEN", "XBP1"),
         combine = TRUE, color = "dataset", palette = "jco",
         size = 0.5)

[Figure: Q-Q plot colored by data set]

# Merge the 3 plots and color by x variables
ggqqplot(expr,
         x = c("GATA3", "PTEN", "XBP1"),
         merge = TRUE,
         color = ".x.", palette = "jco")

[Figure: merged Q-Q plot]

# Merge the 3 plots and color by x variables
# Facet by "dataset" into multi-panel
ggqqplot(expr,
         x = c("GATA3", "PTEN", "XBP1"),
         merge = TRUE, size = 0.5,
         color = ".x.", palette = "jco",
         facet.by = "dataset")

[Figure: Q-Q plot faceted by data set]

After seeing the demonstrations above, are you eager to try it yourself? Stay tuned for my follow-up articles!
