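The 桓峰基因 WeChat public account has published a series of bioinformatics tutorials on genomic variation data, each with an online video. The tutorials compiled so far are:
DNA 1. Germline Mutation vs. Somatic Mutation: telling them apart
DNA 2. maftools, a powerful tool for genomic variant analysis in SCI papers
DNA 3. maftools, a powerful tool for genomic variant analysis in SCI papers
DNA 4. Genomic mutational signatures in SCI papers (maftools)
DNA 5. The VCF format for genomic variant files explained
DNA 6. Drawing polished waterfall plots of genomic variants (ComplexHeatmap)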
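If you have a MAF file, you can draw the waterfall plot directly with the oncoplot function in maftools, which offers many display and summary options (see the earlier posts "maftools | 从头开始绘制发表级oncoplot(瀑布图)" and "maftools|TCGA肿瘤突变数据的汇总,分析和可视化"). If all you have is an Excel sheet recording whether each gene is mutated in each sample, don't worry: the ComplexHeatmap package can draw the plot too.
ComplexHeatmap is a very powerful package, and this post only gives a brief introduction to drawing a genomic landscape plot (waterfall plot) with it. It has a great many parameters, and we have to prepare the inputs ourselves: the mutation matrix, the colours, the legend data and the legend positions. oncoPrint can also be combined with the draw function, so you can lay out a figure to your own taste for a paper. Generally speaking, the more complex mutation waterfall plots tend to be drawn with ComplexHeatmap; don't rely on oncoplot alone, because it is less flexible and does not display continuous variables well. The main oncoPrint parameters used here are: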
mat: A character matrix that encodes multiple alterations, or a list of matrices in which every matrix contains binary values representing whether the alteration is present or absent. When the value is a list, the names of the list represent alteration types. You can use unify_mat_list to give all matrices the same row names and column names.
alter_fun: A single function or a list of functions that defines how to add graphics for the different alterations. You can use alter_graphic to generate such functions automatically for rectangles and points.
alter_fun_is_vectorized: Whether alter_fun is implemented in a vectorized way. Internally the function will guess.
col: A vector of colours whose names correspond to alteration types.
top_annotation: Annotation put on top of the oncoPrint. By default it is a barplot showing the number of genes with a certain alteration in each sample.
right_annotation: Annotation put on the right of the oncoPrint. By default it is a barplot showing the number of samples with a certain alteration in each gene.
left_annotation: Annotation put on the left of the oncoPrint.
bottom_annotation: Annotation put at the bottom of the oncoPrint.
heatmap_legend_param: Passed to Heatmap.
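The two input forms accepted by mat can be related with a few lines of base R. The sketch below is only an illustration on made-up data (the gene names, sample names and the object names mat_chr and mat_list are assumptions): it splits a character matrix whose cells hold ";"-separated alteration labels into a named list of 0/1 matrices, one per alteration type.

```r
# Toy data: two genes x three samples; all names are illustrative only.
mat_chr <- matrix(
  c("Missense",            "",         "Truncating",
    "Missense;Truncating", "In_frame", ""),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B"), c("S1", "S2", "S3"))
)

# Split every cell on ";" (cells are visited column by column).
cell_types <- strsplit(mat_chr, ";", fixed = TRUE)

# All alteration types that occur anywhere in the matrix.
alt_types <- sort(unique(unlist(cell_types)))

# One 0/1 matrix per alteration type, with the same dimnames as the input.
mat_list <- lapply(alt_types, function(type) {
  hit <- vapply(cell_types, function(x) type %in% x, logical(1))
  matrix(as.integer(hit), nrow = nrow(mat_chr), dimnames = dimnames(mat_chr))
})
names(mat_list) <- alt_types

mat_list$Truncating
```

Either mat_chr or mat_list describes the same alterations; the character form is the one used in the rest of this post.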
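Install and load the ComplexHeatmap package. The mutation matrix used here is obtained from maftools, which is very convenient; how to get it is shown in a moment.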
if (!require(maftools)) BiocManager::install("maftools")
if (!require(ComplexHeatmap)) BiocManager::install("ComplexHeatmap")
library(maftools)
library(ComplexHeatmap)
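The example data is the LAML dataset. It is also available from the TCGA database, but here I use the example data bundled with maftools for convenience. To get the mutation matrix, just set writeMatrix = TRUE in the oncoplot function, and a mutation matrix file named "onco_matrix.txt" is written automatically. Simple, isn't it? The prerequisite is that you have a MAF file; if you don't, build a mutation matrix yourself with genes as row names and samples as column names.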
laml.maf = system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
# clinical information containing survival information and histology. This is
# optional
laml.clin = system.file("extdata", "tcga_laml_annot.tsv", package = "maftools")
laml = read.maf(maf = laml.maf, clinicalData = laml.clin)
## -Reading
## -Validating
## -Silent variants: 475
## -Summarizing
## -Processing clinical data
## -Finished in 3.190s elapsed (0.450s cpu)
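# Export the matrix behind the plot to onco_matrix.txt (default oncoplot settings assumed here)
oncoplot(maf = laml, writeMatrix = TRUE)
matMut <- read.table("onco_matrix.txt", header = TRUE, check.names = FALSE, sep = "\t")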
matMut[1:3, 1:3]
## TCGA-AB-2945 TCGA-AB-2965 TCGA-AB-2993
## FLT3 Missense In-frame In-frame
## DNMT3A Missense Missense Truncating
## NPM1 Truncating Truncating Truncating
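For the case without a MAF file, the mutation matrix can be built by hand. The following is a minimal sketch on invented data: binary_df stands in for a spreadsheet of 0/1 values, and the label "Mutated" as well as all gene and sample names are assumptions chosen for illustration.

```r
# Hypothetical spreadsheet content: 1 = mutated, 0 = wild type.
# With a real file this data frame would come from a CSV or Excel reader.
binary_df <- data.frame(
  gene = c("GENE_A", "GENE_B", "GENE_C"),
  S1 = c(1, 0, 1),
  S2 = c(0, 0, 1),
  S3 = c(1, 1, 0),
  stringsAsFactors = FALSE
)

# Move the gene column into the row names and keep only the 0/1 values.
binary_mat <- as.matrix(binary_df[, -1])
rownames(binary_mat) <- binary_df$gene

# Recode to the character form: a label where mutated, "" elsewhere.
# ifelse() keeps the dimensions and dimnames of its test argument.
mat_manual <- ifelse(binary_mat == 1, "Mutated", "")

mat_manual
```

A matrix like this has a single alteration type, so the colour vector and the drawing functions passed to oncoPrint would each need one entry named "Mutated".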
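Next, check which mutation types are present, so that a shape, size and colour can be set for each. There are four mutation types in total (the empty string marks cells without a mutation), so the graphics below are designed for these four.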
library(reshape2)
matMut[matMut == "In-frame"] = "In_frame"
matMuttmp = matMut
matMuttmp$gene = row.names(matMuttmp)
mat_long <- melt(matMuttmp, id.vars = "gene", value.name = "Variant_Classification")
levels(factor(mat_long$Variant_Classification))
## [1] "" "In_frame" "Missense" "Multi_Hit" "Truncating"
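The same check can be done without reshaping to long format. The sketch below is a base-R alternative on a toy stand-in for matMut (toy_mut and its contents are invented for illustration); it also counts how often each label occurs.

```r
# Toy stand-in for matMut; the real object is read from onco_matrix.txt.
toy_mut <- data.frame(
  S1 = c("Missense", "Truncating", ""),
  S2 = c("In_frame", "Missense", "Multi_Hit"),
  row.names = c("GENE_A", "GENE_B", "GENE_C"),
  stringsAsFactors = FALSE
)

# Flatten all cells into one vector and tabulate the labels.
type_counts <- table(unlist(toy_mut, use.names = FALSE))

# Drop the empty string, which marks "no alteration".
alt_types <- setdiff(names(type_counts), "")

alt_types
```

The resulting alt_types vector lists exactly the names that the colour vector and the alter_fun list need to cover.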
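For the clinical information, we need to distinguish continuous variables from discrete or categorical ones, because the way colours are chosen for the legend differs considerably between them.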
pdata <- getClinicalData(laml)
pdata <- subset(pdata, pdata$Tumor_Sample_Barcode %in% colnames(matMut))
pdata = as.data.frame(pdata)
pdata$days_to_last_followup = ifelse(pdata$days_to_last_followup == "-Inf", 0, pdata$days_to_last_followup)
# Samples absent from the mutation matrix were dropped above; now convert each clinical column to a suitable type
pdata$days_to_last_followup = as.numeric(pdata$days_to_last_followup)
pdata$FAB_classification = factor(pdata$FAB_classification)
pdata$Overall_Survival_Status = factor(pdata$Overall_Survival_Status)
str(pdata)
## 'data.frame': 164 obs. of 4 variables:
## $ Tumor_Sample_Barcode : chr "TCGA-AB-2802" "TCGA-AB-2804" "TCGA-AB-2805" "TCGA-AB-2806" ...
## $ FAB_classification : Factor w/ 8 levels "M0","M1","M2",..: 5 4 1 2 2 3 4 3 3 5 ...
## $ days_to_last_followup : num 365 2557 577 945 181 ...
## $ Overall_Survival_Status: Factor w/ 2 levels "0","1": 2 1 2 2 2 1 2 2 2 2 ...
matMut <- matMut[, pdata$Tumor_Sample_Barcode]
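Annotation vectors are matched to matrix columns by position, so it is worth confirming that the two are in the same order. The sketch below shows one way to check this on toy objects (toy_mut, toy_clin and their values are made up); here the clinical table is reordered to follow the matrix, which is the mirror image of the line above.

```r
# Toy stand-ins for the mutation matrix and the clinical table.
toy_mut <- matrix("", nrow = 2, ncol = 3,
                  dimnames = list(c("GENE_A", "GENE_B"), c("S3", "S1", "S2")))
toy_clin <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S2", "S3"),
  days_to_last_followup = c(365, 120, 900),
  stringsAsFactors = FALSE
)

# Find, for every matrix column, the matching row of the clinical table.
idx <- match(colnames(toy_mut), toy_clin$Tumor_Sample_Barcode)
stopifnot(!anyNA(idx))  # every matrix column must have clinical data

# Reorder the clinical rows to follow the matrix columns.
toy_clin <- toy_clin[idx, ]

# After this, element i of any annotation vector describes column i.
aligned <- identical(colnames(toy_mut), toy_clin$Tumor_Sample_Barcode)
aligned
```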
Specify the shape drawn for each alteration: x and y give the position of the cell, and w and h give its width and height.
alter_fun <- list(
background = function(x, y, w, h) {
grid.rect(x, y, w-unit(0.5, "mm"), h-unit(0.5, "mm"),
gp = gpar(fill = "white", col = NA))
},
In_frame = function(x, y, w, h) {
grid.rect(x, y, w-unit(0.5, "mm"), h-unit(0.5, "mm"),
gp = gpar(fill = col["In_frame"], col = NA))
},
Missense = function(x, y, w, h) {
grid.rect(x, y, w-unit(0.5, "mm"), h-unit(0.5, "mm"),
gp = gpar(fill = col["Missense"], col = NA))
},
Multi_Hit = function(x, y, w, h) {
grid.rect(x, y, w-unit(0.5, "mm"), h-unit(0.5, "mm"),
gp = gpar(fill = col["Multi_Hit"], col = NA))
},
Truncating = function(x, y, w, h) {
grid.rect(x, y, w-unit(0.5, "mm"), h-unit(0.5, "mm"),
gp = gpar(fill = col["Truncating"], col = NA))
}
# Splice_Site = function(x, y, w, h) {
# grid.rect(x, y, w-unit(0.5, "mm"),h-unit(0.5, "mm"),
# gp = gpar(fill = col["Splice_Site"], col = NA))
#}
)
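Because the four alteration functions differ only in fill colour, the list can also be generated from the colour vector. This is a sketch of one possible shortcut, not part of the original workflow: make_rect_fun and alter_fun_compact are hypothetical names, and the colours are copied from the colour vector used in this post.

```r
library(grid)  # provides grid.rect(), unit() and gpar()

# Same colours as used in this tutorial.
col <- c(In_frame = "purple", Missense = "orange",
         Multi_Hit = "black", Truncating = "blue")

# Factory: returns a drawing function that fills the cell with one colour.
make_rect_fun <- function(fill) {
  force(fill)  # fix the colour now, not when the function is called later
  function(x, y, w, h) {
    grid.rect(x, y, w - unit(0.5, "mm"), h - unit(0.5, "mm"),
              gp = gpar(fill = fill, col = NA))
  }
}

# Background first, then one function per alteration type.
alter_fun_compact <- c(
  list(background = make_rect_fun("white")),
  lapply(col, make_rect_fun)
)

names(alter_fun_compact)
```

Adding a new alteration type then only requires a new named entry in col. The hand-written list remains the better choice when different alterations need different shapes or sizes.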
heatmap_legend_param <- list(title = "Alterations", at = c("In_frame", "Missense",
"Truncating", "Multi_Hit"), labels = c("In_frame", "Missense", "Truncating",
"Multi_Hit"))
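Colours can be specified for the mutation types in the heatmap, for the sample annotations, for the gene annotations and so on. Here we set colours only for the mutation types and the sample annotations.
# Specify the colours; adjust the colour names or codes as needed
col <- c(In_frame = "purple", Missense = "orange", Multi_Hit = "black", Truncating = "blue")
# Define custom colours for the annotations; a continuous variable needs a colour mapping function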
library(circlize)
col_OS = colorRamp2(c(0, 973), c("white", "red"))
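The distinction between continuous and discrete annotations shows up in how their colours are given: a mapping function for the former, a named vector for the latter. The sketch below uses invented follow-up times and takes the breakpoints from the data instead of typing them in; days, os_breaks, status_col, col_OS_data and col_list are illustrative names, and the grey/black status colours are an arbitrary choice.

```r
# Toy follow-up times in days; NA marks a missing value.
days <- c(365, 2557, 577, NA, 181)

# Breakpoints taken from the data rather than entered by hand.
os_breaks <- range(days, na.rm = TRUE)

# Discrete annotations take a named vector: one colour per level.
status_col <- c("0" = "grey80", "1" = "black")

# colorRamp2() comes from circlize; the guard only keeps this sketch
# runnable where the package is not installed.
if (requireNamespace("circlize", quietly = TRUE)) {
  col_OS_data <- circlize::colorRamp2(os_breaks, c("white", "red"))
  col_list <- list(OS = col_OS_data, Status = status_col)
}

os_breaks
```

A list shaped like col_list can be passed as the col argument of HeatmapAnnotation, with each name matching one annotation.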
Set up the sample annotations with the HeatmapAnnotation function:
ha <- HeatmapAnnotation(OS = pdata$days_to_last_followup, Status = pdata$Overall_Survival_Status,
FAB_classification = pdata$FAB_classification, col = list(OS = col_OS), show_annotation_name = TRUE,
annotation_name_gp = gpar(fontsize = 7))
column_title <- "This is Oncoplot "
A simple waterfall plot contains only the heatmap body, and its legend shows the mutation types:
oncoPrint(matMut, alter_fun = alter_fun, col = col, alter_fun_is_vectorized = FALSE)
Add the sample annotations:
oncoPrint(matMut,
bottom_annotation = ha, # place the annotation at the bottom
# top_annotation=top_annotation,
#right_annotation=NULL,
alter_fun = alter_fun,
col = col,
column_title = column_title,
heatmap_legend_param = heatmap_legend_param,
row_names_side = "left",
pct_side = "right",
# column_order=sample_order,
# column_split=3
alter_fun_is_vectorized = FALSE
)
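The waterfall plot offers three kinds of annotation: the mutation types, the sample annotations and, if wanted, gene annotations. Where their legends go is controlled through the arguments of draw, which can place them on any of the four sides of the plot (top, bottom, left or right). A few examples follow. First, build the plot object with the oncoPrint function:
oncoplot_anno <- oncoPrint(matMut,
bottom_annotation = ha, # place the annotation at the bottom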
# top_annotation=top_annotation,
#right_annotation=NULL,
alter_fun = alter_fun,
col = col,
column_title = "",
heatmap_legend_param = heatmap_legend_param,
row_names_side = "left",
pct_side = "right",
# column_order=sample_order,
# column_split=3
alter_fun_is_vectorized = FALSE
)
Put the sample-annotation legend on the left:
draw(oncoplot_anno, annotation_legend_side = "left")
Put the sample-annotation legend on the left and the mutation-type (heatmap) legend on the right:
draw(oncoplot_anno, annotation_legend_side = "left", heatmap_legend_side = "right")
The reverse of the arrangement above:
draw(oncoplot_anno, annotation_legend_side = "right", heatmap_legend_side = "left")
Put the heatmap legend at the very bottom of the waterfall plot and centre it with the align_heatmap_legend argument:
draw(oncoplot_anno, annotation_legend_side = "right", heatmap_legend_side = "bottom",
align_heatmap_legend = "global_center")
In short, design the figure however you like. And remember: oncoplot in maftools is not the only way to draw a waterfall plot!