Learning ComplexHeatmap for Complex Heatmaps: 11. Examples

More Examples

Reference link

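1. Add more information for a gene expression matrix

Heatmaps are widely used to visualize gene expression matrices. Rows of the matrix correspond to genes, and further information about these genes can be attached after the expression heatmap.

In the following example, the main heatmap shows the relative expression of genes (expression is scaled per gene). To its right, the absolute expression level of each gene (the base mean) is shown as a single-column heatmap. Gene length and gene type (protein coding or lincRNA) are also added, as a row annotation and as a heatmap.

On the far left of the heatmaps, colored rectangles drawn by anno_block() identify the five clusters from k-means clustering. On top of the "base mean" and "gene type" heatmaps, summary plots (barplots and boxplots) show the statistics or distribution of the data points within the five clusters.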

library(ComplexHeatmap)
library(circlize)

expr = readRDS(system.file(package = "ComplexHeatmap", "extdata", "gene_expression.rds"))
mat = as.matrix(expr[, grep("cell", colnames(expr))])
base_mean = rowMeans(mat)
mat_scaled = t(apply(mat, 1, scale))

type = gsub("s\\d+_", "", colnames(mat))
ha = HeatmapAnnotation(type = type, annotation_name_side = "left")

ht_list = Heatmap(mat_scaled, name = "expression", row_km = 5, 
    col = colorRamp2(c(-2, 0, 2), c("green", "white", "red")),
    top_annotation = ha, 
    show_column_names = FALSE, row_title = NULL, show_row_dend = FALSE) +
Heatmap(base_mean, name = "base mean", 
    top_annotation = HeatmapAnnotation(summary = anno_summary(gp = gpar(fill = 2:6), 
        height = unit(2, "cm"))),
    width = unit(15, "mm")) +
rowAnnotation(length = anno_points(expr$length, pch = 16, size = unit(1, "mm"), 
    axis_param = list(at = c(0, 2e5, 4e5, 6e5), 
        labels = c("0kb", "200kb", "400kb", "600kb")),
    width = unit(2, "cm"))) +
Heatmap(expr$type, name = "gene type", 
    top_annotation = HeatmapAnnotation(summary = anno_summary(height = unit(2, "cm"))),
    width = unit(15, "mm"))

ht_list = rowAnnotation(block = anno_block(gp = gpar(fill = 2:6, col = NA)), 
    width = unit(2, "mm")) + ht_list

draw(ht_list)
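The row scaling step, t(apply(mat, 1, scale)), can be checked on a small matrix. The following is a minimal sketch in base R; the matrix toy and its values are made up for illustration and are not part of the example data.

```r
# Toy matrix: 3 genes (rows) x 4 samples (columns); values are hypothetical.
toy = matrix(c(1, 2, 3, 4,
               10, 20, 30, 40,
               5, 5, 6, 6), nrow = 3, byrow = TRUE,
             dimnames = list(paste0("gene", 1:3), paste0("s", 1:4)))

# scale() standardizes a vector to mean 0 and sd 1. apply() over rows returns
# the results as columns, so the matrix is transposed back with t().
toy_scaled = t(apply(toy, 1, scale))
colnames(toy_scaled) = colnames(toy)   # column names are lost by apply()

row_means = rowMeans(toy_scaled)       # approximately 0 for every gene
row_sds = apply(toy_scaled, 1, sd)     # 1 for every gene
```

gene1 and gene2 differ tenfold in absolute level but end up with identical scaled rows, which is why the absolute level is shown separately as the base mean.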

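2. The measles vaccine heatmap

The following code reproduces the heatmap introduced in two earlier write-ups (linked in the original post).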

mat = readRDS(system.file("extdata", "measles.rds", package = "ComplexHeatmap"))
ha1 = HeatmapAnnotation(
    dist1 = anno_barplot(
        colSums(mat), 
        bar_width = 1, 
        gp = gpar(col = "white", fill = "#FFE200"), 
        border = FALSE,
        axis_param = list(at = c(0, 2e5, 4e5, 6e5, 8e5),
            labels = c("0", "200k", "400k", "600k", "800k")),
        height = unit(2, "cm")
    ), show_annotation_name = FALSE)
ha2 = rowAnnotation(
    dist2 = anno_barplot(
        rowSums(mat), 
        bar_width = 1, 
        gp = gpar(col = "white", fill = "#FFE200"), 
        border = FALSE,
        axis_param = list(at = c(0, 5e5, 1e6, 1.5e6),
            labels = c("0", "500k", "1m", "1.5m")),
        width = unit(2, "cm")
    ), show_annotation_name = FALSE)
year_text = as.numeric(colnames(mat))
year_text[year_text %% 10 != 0] = ""
ha_column = HeatmapAnnotation(
    year = anno_text(year_text, rot = 0, location = unit(1, "npc"), just = "top")
)
col_fun = colorRamp2(c(0, 800, 1000, 127000), c("white", "cornflowerblue", "yellow", "red"))
ht_list = Heatmap(mat, name = "cases", col = col_fun,
    cluster_columns = FALSE, show_row_dend = FALSE, rect_gp = gpar(col= "white"), 
    show_column_names = FALSE,
    row_names_side = "left", row_names_gp = gpar(fontsize = 8),
    column_title = 'Measles cases in US states 1930-2001\nVaccine introduced 1961',
    top_annotation = ha1, bottom_annotation = ha_column,
    heatmap_legend_param = list(at = c(0, 5e4, 1e5, 1.5e5), 
        labels = c("0", "50k", "100k", "150k"))) + ha2
draw(ht_list, ht_gap = unit(3, "mm"))
decorate_heatmap_body("cases", {
    i = which(colnames(mat) == "1961")
    x = i/ncol(mat)
    grid.lines(c(x, x), c(0, 1), gp = gpar(lwd = 2, lty = 2))
    grid.text("Vaccine introduced", x, unit(1, "npc") + unit(5, "mm"))
})
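Two small tricks in the code above are the decade-only year labels and the horizontal position of the dashed line. The sketch below replays both in base R on hypothetical column names (1930 to 1940) and a hypothetical marked year, not on the measles data.

```r
years = as.character(1930:1940)        # hypothetical column names

# Keep a label only for years divisible by 10. Assigning "" to a numeric
# vector coerces the whole vector to character.
year_text = as.numeric(years)
year_text[year_text %% 10 != 0] = ""

# Columns are not clustered, so column i occupies [(i - 1)/n, i/n] of the
# heatmap body in npc units. i/n is therefore the right edge of column i.
i = which(years == "1935")
x = i / length(years)
```

Drawing the line at i/n rather than at the column center, (i - 0.5)/n, places it on the boundary after the marked year.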

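3. Visualize cell heterogeneity from single-cell RNA-Seq

In this example, single-cell RNA-Seq data from mouse T cells is visualized to show the heterogeneity of the cells. The data (mouse_scRNAseq_corrected.txt) comes from Buettner et al., 2015, supplementary data 1, sheet "Cell-cycle corrected gene expression". The file is linked in the original post.

In the following code, duplicated genes are removed.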

expr = read.table("data/mouse_scRNAseq_corrected.txt", sep = "\t", header = TRUE)
expr = expr[!duplicated(expr[[1]]), ]
rownames(expr) = expr[[1]]
expr = expr[-1]
expr = as.matrix(expr)

Keep only the genes that are expressed in more than half of the cells.

expr = expr[apply(expr, 1, function(x) sum(x > 0)/length(x) > 0.5), , drop = FALSE]
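The filter can be tried on a toy matrix. This is a minimal sketch with made-up values; rowMeans(toy > 0) is used here as an equivalent way of computing the fraction of cells in which a gene is expressed.

```r
# Hypothetical expression matrix: 3 genes (rows) x 4 cells (columns).
toy = rbind(geneA = c(0, 0, 0, 5),   # expressed in 1 of 4 cells
            geneB = c(1, 2, 0, 3),   # expressed in 3 of 4 cells
            geneC = c(0, 0, 2, 2))   # expressed in exactly half of the cells

# Fraction of cells with expression above zero, per gene.
frac_expressed = rowMeans(toy > 0)

# Strictly more than half is required, so geneC (exactly 0.5) is dropped.
# drop = FALSE keeps the result a matrix even when one row remains.
kept = toy[frac_expressed > 0.5, , drop = FALSE]
```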

The function get_correlated_variable_genes() is defined here. It extracts signature genes that are variably expressed across cells and correlate with other genes.

get_correlated_variable_genes = function(mat, n = nrow(mat), cor_cutoff = 0, n_cutoff = 0) {
    ind = order(apply(mat, 1, function(x) {
            q = quantile(x, c(0.1, 0.9))
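            # keep the values between the 10th and 90th percentiles
            # (the reversed comparison selected nothing, giving NA for every gene)
            x = x[x > q[1] & x < q[2]]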
            var(x)/mean(x)
        }), decreasing = TRUE)[1:n]
    mat2 = mat[ind, , drop = FALSE]
    dt = cor(t(mat2), method = "spearman")
    diag(dt) = 0
    dt[abs(dt) < cor_cutoff] = 0
    dt[dt < 0] = -1
    dt[dt > 0] = 1

    i = colSums(abs(dt)) > n_cutoff

    mat3 = mat2[i, ,drop = FALSE]
    return(mat3)
}
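The selection step of the function (count, for each gene, how many other genes pass the correlation cutoff) can be illustrated on toy data. The sketch below is a simplified stand-in written for this illustration, not a call to the function above; the genes g1 to g5 and the cutoffs are hypothetical.

```r
# Five hypothetical genes measured in 20 cells.
base = 1:20
toy = rbind(g1 = base,
            g2 = base^2,                  # same ranks as g1: Spearman cor 1
            g3 = -base,                   # reversed ranks: Spearman cor -1
            g4 = rep(c(1, 2), 10),        # alternating pattern, weakly related
            g5 = rep(c(1, 1, 2, 2), 5))   # another weakly related pattern

# Gene-gene Spearman correlation; self-correlation is not counted.
ct = cor(t(toy), method = "spearman")
diag(ct) = 0

# Number of partner genes with absolute correlation of at least 0.5.
partners = rowSums(abs(ct) >= 0.5)

# Keep genes with more than one partner (the example data uses 20 instead).
selected = rownames(toy)[partners > 1]
```

Negative correlations count as well, because the cutoff is applied to the absolute value.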

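Signature genes are defined as genes that each have an absolute correlation above 0.5 with more than 20 other genes.

mat2 contains expression values scaled per gene, i.e. the relative expression of each gene across cells. Since single-cell RNA-Seq data is highly variable and outliers are frequent, the values of each gene are clipped to its 10th and 90th percentiles before scaling.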

mat = get_correlated_variable_genes(expr, cor_cutoff = 0.5, n_cutoff = 20)
mat2 = t(apply(mat, 1, function(x) {
    q10 = quantile(x, 0.1)
    q90 = quantile(x, 0.9)
    x[x < q10] = q10
    x[x > q90] = q90
    scale(x)
}))
colnames(mat2) = colnames(mat)
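The effect of clipping before scaling is easiest to see on a single vector with one outlier. A minimal sketch with hypothetical values:

```r
x = c(1:9, 100)                 # hypothetical expression of one gene; 100 is an outlier

q10 = quantile(x, 0.1)
q90 = quantile(x, 0.9)

# Clip values outside the 10th and 90th percentiles to those percentiles.
clipped = x
clipped[clipped < q10] = q10
clipped[clipped > q90] = q90

# Standardize both versions for comparison.
z_raw = as.vector(scale(x))
z_clipped = as.vector(scale(clipped))
```

Without clipping, the outlier inflates the standard deviation and compresses the other nine values into a narrow band of z-scores; after clipping they spread over a wider range, so the heatmap colors are used more evenly.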

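Load the cell cycle genes and the ribonucleoprotein genes. The cell cycle gene list comes from Buettner et al., 2015, supplementary table 1, sheet "Union of Cyclebase and GO gene". The ribonucleoprotein genes come from GO:0030529. The gene lists are stored in mouse_cell_cycle_gene.rds and mouse_ribonucleoprotein.rds; both files are linked in the original post.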

cc = readRDS("data/mouse_cell_cycle_gene.rds")
ccl = rownames(mat) %in% cc
cc_gene = rownames(mat)[ccl]

rp = readRDS("data/mouse_ribonucleoprotein.rds")
rpl = rownames(mat) %in% rp

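Because the expression values are scaled per gene, the expression level of a gene relative to other genes is lost. We therefore compute the base mean as the mean expression of each gene across all samples. The base mean can be used to compare expression levels between genes.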

base_mean = rowMeans(mat)

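The following information is now available:

  1. scaled expression, mat2,
  2. base mean, base_mean,
  3. whether a gene is a ribonucleoprotein gene, rpl,
  4. whether a gene is a cell cycle gene, ccl,
  5. symbols of the cell cycle genes, cc_gene.

In the next step, we put this information together and visualize it as a list of heatmaps. A gene-gene correlation heatmap is appended at the end and set as main_heatmap, which means the row order of all heatmaps and row annotations is based on the clustering of this correlation matrix.

For cell cycle genes with relatively high expression (base mean above the 25% quantile of all genes), the gene names are shown as text labels. In the first heatmap, the column dendrogram is underlaid with two colors based on the two main groups from hierarchical clustering, to highlight the two subpopulations.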

library(GetoptLong)
ht_list = Heatmap(mat2, col = colorRamp2(c(-1.5, 0, 1.5), c("blue", "white", "red")), 
    name = "scaled_expr", column_title = qq("relative expression for @{nrow(mat)} genes"),
    show_column_names = FALSE, width = unit(8, "cm"),
    heatmap_legend_param = list(title = "Scaled expr")) +
    Heatmap(base_mean, name = "base_expr", width = unit(5, "mm"),
        heatmap_legend_param = list(title = "Base expr")) +
    Heatmap(rpl + 0, name = "ribonucleoprotein", col = c("0" = "white", "1" = "purple"), 
        show_heatmap_legend = FALSE, width = unit(5, "mm")) +
    Heatmap(ccl + 0, name = "cell_cycle", col = c("0" = "white", "1" = "red"), 
        show_heatmap_legend = FALSE, width = unit(5, "mm")) +
    rowAnnotation(link = anno_mark(at = which(ccl & base_mean > quantile(base_mean, 0.25)), 
        labels = rownames(mat)[ccl & base_mean > quantile(base_mean, 0.25)], 
        labels_gp = gpar(fontsize = 10), padding = unit(1, "mm"))) +
    Heatmap(cor(t(mat2)), name = "cor", 
        col = colorRamp2(c(-1, 0, 1), c("green", "white", "red")), 
        show_row_names = FALSE, show_column_names = FALSE, row_dend_side = "right", 
        show_column_dend = FALSE, column_title = "pairwise correlation between genes",
        heatmap_legend_param = list(title = "Correlation"))
ht_list = draw(ht_list, main_heatmap = "cor")
decorate_column_dend("scaled_expr", {
    tree = column_dend(ht_list)$scaled_expr
    ind = cutree(as.hclust(tree), k = 2)[order.dendrogram(tree)]

    first_index = function(l) which(l)[1]
    last_index = function(l) { x = which(l); x[length(x)] }
    x1 = c(first_index(ind == 1), first_index(ind == 2)) - 1
    x2 = c(last_index(ind == 1), last_index(ind == 2))
    grid.rect(x = x1/length(ind), width = (x2 - x1)/length(ind), just = "left",
        default.units = "npc", gp = gpar(fill = c("#FF000040", "#00FF0040"), col = NA))
})
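The rectangle positions in decorate_column_dend() come from finding, along the dendrogram leaf order, where each of the two clusters starts and ends. The sketch below replays that calculation in base R on six hypothetical one-dimensional points instead of the heatmap's column dendrogram.

```r
# Six hypothetical cells that form two obvious groups (around 0 and around 10).
pos = c(c1 = 0, c2 = 10, c3 = 0.5, c4 = 11, c5 = 1, c6 = 10.5)
tree = as.dendrogram(hclust(dist(pos)))

# Cluster membership (k = 2), arranged in the left-to-right leaf order.
ind = cutree(as.hclust(tree), k = 2)[order.dendrogram(tree)]

first_index = function(l) which(l)[1]
last_index = function(l) { x = which(l); x[length(x)] }

# Leaf i spans [(i - 1)/n, i/n], so the left edge uses first index - 1.
x1 = c(first_index(ind == 1), first_index(ind == 2)) - 1
x2 = c(last_index(ind == 1), last_index(ind == 2))
left = x1 / length(ind)             # left edge of each rectangle in npc
width = (x2 - x1) / length(ind)     # width of each rectangle in npc
```

This works because the leaves of each cluster are contiguous in the dendrogram order, so a single rectangle per cluster is enough.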

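The heatmap clearly shows that the cells fall into two subpopulations. The population on the left of the first heatmap shows high expression of a subset of cell cycle genes (cell cycle genes are indicated in the "cell_cycle" heatmap). However, the overall expression level of these genes is relatively low (see the "base_expr" heatmap). The population on the right has higher expression of the other signature genes. Interestingly, the signature genes that are highly expressed in this subpopulation are enriched for genes coding for ribonucleoproteins (see the "ribonucleoprotein" heatmap). A subset of the ribonucleoprotein genes shows strong coexpression (see the correlation heatmap) and overall high expression levels (see the "base_expr" heatmap).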

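4. Correlations between methylation, expression and other genomic features

In the following example, the data is randomly generated based on patterns found in an unpublished analysis. First we load the data. meth.rds is linked in the original post.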

res_list = readRDS("data/meth.rds")
type = res_list$type
mat_meth = res_list$mat_meth
mat_expr = res_list$mat_expr
direction = res_list$direction
cor_pvalue = res_list$cor_pvalue
gene_type = res_list$gene_type
anno_gene = res_list$anno_gene
dist = res_list$dist
anno_enhancer = res_list$anno_enhancer

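The different sources of information and the corresponding variables are:

  1. type: label indicating whether a sample is tumor or normal.
  2. mat_meth: a matrix whose rows correspond to differentially methylated regions (DMRs). The values are the mean methylation level of the DMR in each sample.
  3. mat_expr: a matrix whose rows correspond to the genes associated with the DMRs (i.e. the gene nearest to each DMR). The values are the expression level of each gene in each sample. Expression is scaled per gene across samples.
  4. direction: direction of the methylation change (hyper means higher methylation in tumor samples, hypo means lower methylation in tumor samples).
  5. cor_pvalue: p-value of the correlation test between methylation and expression of the associated gene.
  6. gene_type: type of the gene (e.g. protein-coding gene or lincRNA).
  7. anno_gene: annotation to the gene model (intergenic, intragenic or TSS).
  8. dist: distance from the DMR to the TSS of the associated gene.
  9. anno_enhancer: fraction of the DMR that overlaps with enhancers.

The data only includes DMRs for which methylation and expression of the associated gene are negatively correlated.

The column clustering is first computed for the methylation matrix, so that the columns of the expression matrix can be arranged in the same order as the columns of the methylation matrix.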

column_tree = hclust(dist(t(mat_meth)))
column_order = column_tree$order
library(RColorBrewer)
meth_col_fun = colorRamp2(c(0, 0.5, 1), c("blue", "white", "red"))
direction_col = c("hyper" = "red", "hypo" = "blue")
expr_col_fun = colorRamp2(c(-2, 0, 2), c("green", "white", "red"))
pvalue_col_fun = colorRamp2(c(0, 2, 4), c("white", "white", "red"))
gene_type_col = structure(brewer.pal(length(unique(gene_type)), "Set3"), 
    names = unique(gene_type))
anno_gene_col = structure(brewer.pal(length(unique(anno_gene)), "Set1"), 
    names = unique(anno_gene))
dist_col_fun = colorRamp2(c(0, 10000), c("black", "white"))
enhancer_col_fun = colorRamp2(c(0, 1), c("white", "orange"))
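Sharing one column order between two matrices only requires the order vector from hclust(). A minimal base R sketch follows, with two hypothetical matrices m_a and m_b that have the same samples in their columns.

```r
# Hypothetical matrices with the same 4 samples as columns.
# In m_a, s1 resembles s3 and s2 resembles s4.
m_a = cbind(s1 = c(0, 0, 0), s2 = c(5, 5, 5),
            s3 = c(0.1, 0, 0.2), s4 = c(5, 5.2, 5.1))
m_b = matrix(1:8, nrow = 2, dimnames = list(NULL, colnames(m_a)))

# Cluster the columns of m_a only; dist() works on rows, hence t().
tree = hclust(dist(t(m_a)))
ord = tree$order                  # indices into the original columns

# Applying the same index vector to both matrices keeps samples aligned.
cols_a = colnames(m_a)[ord]
cols_b = colnames(m_b)[ord]
```

Heatmap() interprets column_order as indices into the matrix exactly as it is supplied, so each matrix should be passed in its original column order together with the shared order vector.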

We first define two column annotations and then make the complex heatmaps.

ht_opt(
    legend_title_gp = gpar(fontsize = 8, fontface = "bold"), 
    legend_labels_gp = gpar(fontsize = 8), 
    heatmap_column_names_gp = gpar(fontsize = 8),
    heatmap_column_title_gp = gpar(fontsize = 10),
    heatmap_row_title_gp = gpar(fontsize = 8)
)

ha = HeatmapAnnotation(type = type, 
    col = list(type = c("Tumor" = "pink", "Control" = "royalblue")),
    annotation_name_side = "left")
ha2 = HeatmapAnnotation(type = type, 
    col = list(type = c("Tumor" = "pink", "Control" = "royalblue")), 
    show_legend = FALSE)

ht_list = Heatmap(mat_meth, name = "methylation", col = meth_col_fun,
    column_order= column_order,
    top_annotation = ha, column_title = "Methylation") +
    Heatmap(direction, name = "direction", col = direction_col) +
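    # pass the matrix in its original column order; column_order below does the
    # reordering (reordering here as well would permute the columns twice)
    Heatmap(mat_expr, name = "expression", 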
        col = expr_col_fun, 
        column_order = column_order, 
        top_annotation = ha2, column_title = "Expression") +
    Heatmap(cor_pvalue, name = "-log10(cor_p)", col = pvalue_col_fun) +
    Heatmap(gene_type, name = "gene type", col = gene_type_col) +
    Heatmap(anno_gene, name = "anno_gene", col = anno_gene_col) +
    Heatmap(dist, name = "dist_tss", col = dist_col_fun) +
    Heatmap(anno_enhancer, name = "anno_enhancer", col = enhancer_col_fun, 
        cluster_columns = FALSE, column_title = "Enhancer")

draw(ht_list, row_km = 2, row_split = direction,
    column_title = "Comprehensive correspondence between methylation, expression and other genomic features", 
    column_title_gp = gpar(fontsize = 12, fontface = "bold"), 
    merge_legends = TRUE, heatmap_legend_side = "bottom")
ht_opt(RESET = TRUE)

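The complex heatmaps show that hypermethylated DMRs are enriched in intergenic and intragenic regions and rarely overlap with enhancers. In contrast, hypomethylated DMRs are enriched at transcription start sites (TSS) and enhancers.

5. Visualize methylation profiles with complex annotations

In this example, Figure 1 of Sturm et al., 2012 is re-implemented with some adjustments.

Some packages need to be loaded first.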

library(matrixStats)
library(GenomicRanges)

The methylation profiles can be downloaded from the GEO database. The GEOquery package is used to retrieve the data from GEO.

library(GEOquery)
gset = getGEO("GSE36278")

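The methylation profiles were measured with the Illumina HumanMethylation450 BeadChip array. We load the probe data via the IlluminaHumanMethylation450kanno.ilmn12.hg19 package.

The rows of the matrix are adjusted to match the probes.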

library(IlluminaHumanMethylation450kanno.ilmn12.hg19)
data(Locations)

mat = exprs(gset[[1]])
colnames(mat) = phenoData(gset[[1]])@data$title
mat = mat[rownames(Locations), ] 

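Locations contains the genomic positions of the probes, and SNPs.137CommonSingle records whether the CpG sites overlap with SNPs. Here we remove probes on the sex chromosomes and probes that overlap with SNPs.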

data(SNPs.137CommonSingle)
data(Islands.UCSC)
l = Locations$chr %in% paste0("chr", 1:22) & is.na(SNPs.137CommonSingle$Probe_rs)
mat = mat[l, ]

Get the corresponding subset of probe locations and the annotation to CpG islands.

cgi = Islands.UCSC$Relation_to_Island[l]
loc = Locations[l, ]

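Split the matrix into one matrix for tumor samples and one for normal samples. The column names of the tumor samples are also modified to be consistent with the phenotype data.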

mat1 = as.matrix(mat[, grep("GBM", colnames(mat))])   # tumor samples
mat2 = as.matrix(mat[, grep("CTRL", colnames(mat))])  # normal samples
colnames(mat1) = gsub("GBM", "dkfz", colnames(mat1))

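The phenotype data is from Sturm et al., 2012, supplementary table S1 (linked in the original post).

The rows of the phenotype data are adjusted to match the columns of the methylation matrix.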

phenotype = read.table("data/450K_annotation.txt", header = TRUE, sep = "\t", 
    row.names = 1, check.names = FALSE, comment.char = "", stringsAsFactors = FALSE)
phenotype = phenotype[colnames(mat1), ]

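Note that we only use the 136 samples from DKFZ, whereas Sturm et al., 2012 used an additional 74 TCGA samples.

Extract the top 8000 probes with the most variable methylation in the tumor samples, and subset the other information accordingly.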

ind = order(rowVars(mat1, na.rm = TRUE), decreasing = TRUE)[1:8000]
m1 = mat1[ind, ]
m2 = mat2[ind, ]
cgi2 = cgi[ind]
cgi2 = ifelse(grepl("Shore", cgi2), "Shore", cgi2)
cgi2 = ifelse(grepl("Shelf", cgi2), "Shelf", cgi2)
loc = loc[ind, ]
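The same two steps (pick the most variable rows, then collapse the island relation labels) can be tried on a toy example. In this sketch the beta values are made up, the labels only mimic the N_Shore / S_Shelf style of the annotation, and apply(..., var) is a base R stand-in for rowVars() from matrixStats.

```r
# Hypothetical beta values: 5 probes (rows) x 4 samples (columns).
beta = rbind(p1 = c(0.1, 0.1, 0.1, 0.1),
             p2 = c(0.1, 0.9, 0.1, 0.9),
             p3 = c(0.4, 0.5, 0.6, 0.5),
             p4 = c(0.0, 1.0, 0.0, 1.0),
             p5 = c(0.2, 0.2, 0.3, 0.2))
relation = c("Island", "N_Shore", "S_Shelf", "OpenSea", "S_Shore")

# Row-wise variance, then the indices of the 3 most variable probes.
row_var = apply(beta, 1, var, na.rm = TRUE)
top = order(row_var, decreasing = TRUE)[1:3]

# Subset the matrix and the annotation with the same index vector.
beta_top = beta[top, ]
relation_top = relation[top]

# Merge the north/south variants into single "Shore" and "Shelf" levels.
relation_top = ifelse(grepl("Shore", relation_top), "Shore", relation_top)
relation_top = ifelse(grepl("Shelf", relation_top), "Shelf", relation_top)
```

Every object that is aligned with the rows of the matrix has to be subset with the same index vector, otherwise the annotations no longer match the probes.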

For each probe, find the distance to the nearest TSS. pc_tx_tss.bed contains the TSS positions of protein-coding genes.

gr = GRanges(loc[, 1], ranges = IRanges(loc[, 2], loc[, 2]+1))
tss = read.table("data/pc_tx_tss.bed", stringsAsFactors = FALSE)
tss = GRanges(tss[[1]], ranges = IRanges(tss[, 2], tss[, 3]))

tss_dist = distanceToNearest(gr, tss)
tss_dist = tss_dist@elementMetadata$distance

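Because the matrix contains a few NA values (sum(is.na(m1))/length(m1) = 0.0011967), which would break the cor() function, we replace NA with the intermediate methylation level (0.5). Note that although ComplexHeatmap allows NA in the matrix, removing NA speeds up the clustering.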

m1[is.na(m1)] = 0.5
m2[is.na(m2)] = 0.5
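Why a handful of NA values is a problem for correlation-based clustering can be seen in a small case. A minimal sketch with hypothetical methylation values follows; cor() is called with its default NA handling.

```r
# Hypothetical methylation values: 4 probes (rows) x 3 samples (columns).
m = cbind(s1 = c(0.1, 0.8, NA, 0.4),
          s2 = c(0.2, 0.7, 0.5, 0.3),
          s3 = c(0.9, 0.1, 0.5, 0.6))

na_fraction = sum(is.na(m)) / length(m)

# With the default settings, one NA makes every correlation that pairs s1
# with another sample NA, which a distance-based clustering cannot use.
cor_with_na = cor(m, method = "spearman")

# Replace NA with the intermediate methylation level and recompute.
m[is.na(m)] = 0.5
cor_filled = cor(m, method = "spearman")
```

Filling with 0.5 is a pragmatic choice that is harmless when the NA fraction is as small as it is here; with many missing values it would pull correlations toward the filled level.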

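The following annotations are added to the columns of the methylation matrix:

  1. age
  2. subtype classification by DKFZ
  3. subtype classification by TCGA
  4. subtype classification by TCGA, based on the expression profile
  5. IDH1 mutation
  6. H3F3A mutation
  7. TP53 mutation
  8. chr7 gain
  9. chr10 loss
  10. CDKN2A deletion
  11. EGFR amplification
  12. PDGFRA amplification

In the following code, we define the column annotations in the ha variable. We also customize the colors, legends and heights of the annotations.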

mutation_col = structure(names = c("MUT", "WT", "G34R", "G34V", "K27M"), 
    c("black", "white", "#4DAF4A", "#4DAF4A", "#377EB8"))
cnv_col = c("gain" = "#E41A1C", "loss" = "#377EB8", "amp" = "#E41A1C", 
    "del" = "#377EB8", "normal" = "white")
ha = HeatmapAnnotation(
    age = anno_points(phenotype[[13]], 
        gp = gpar(col = ifelse(phenotype[[13]] > 20, "black", "red")), 
        height = unit(3, "cm")),
    dkfz_cluster = phenotype[[1]],
    tcga_cluster = phenotype[[2]],
    tcga_expr = phenotype[[3]],
    IDH1 = phenotype[[5]],
    H3F3A = phenotype[[4]],
    TP53 = phenotype[[6]],
    chr7_gain = ifelse(phenotype[[7]] == 1, "gain", "normal"),
    chr10_loss = ifelse(phenotype[[8]] == 1, "loss", "normal"),
    CDKN2A_del = ifelse(phenotype[[9]] == 1, "del", "normal"),
    EGFR_amp = ifelse(phenotype[[10]] == 1, "amp", "normal"),
    PDGFRA_amp = ifelse(phenotype[[11]] == 1, "amp", "normal"),
    col = list(dkfz_cluster = structure(names = c("IDH", "K27", "G34", "RTK I PDGFRA", 
            "Mesenchymal", "RTK II Classic"), brewer.pal(6, "Set1")),
        tcga_cluster = structure(names = c("G-CIMP+", "Cluster #2", "Cluster #3"), 
            brewer.pal(3, "Set1")),
        tcga_expr = structure(names = c("Proneural", "Classical", "Mesenchymal"), 
            c("#377EB8", "#FFFF33", "#FF7F00")),
        IDH1 = mutation_col,
        H3F3A = mutation_col,
        TP53 = mutation_col,
        chr7_gain = cnv_col,
        chr10_loss = cnv_col,
        CDKN2A_del = cnv_col,
        EGFR_amp = cnv_col,
        PDGFRA_amp = cnv_col),
    na_col = "grey", border = TRUE,
    show_legend = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE),
    show_annotation_name = FALSE,
    annotation_legend_param = list(
        dkfz_cluster = list(title = "DKFZ Methylation"),
        tcga_cluster = list(title = "TCGA Methylation"),
        tcga_expr = list(title = "TCGA Expression"),
        H3F3A = list(title = "Mutations"),
        chr7_gain = list(title = "CNV"))
)

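In the final plot, four heatmaps are added. From left to right, they are:

  1. the heatmap of methylation in tumor samples
  2. methylation in normal samples
  3. distance to the nearest TSS
  4. CpG island (CGI) annotation.

The heatmaps are split by rows according to the CGI annotation.

After the heatmaps are drawn, the decorate_*() functions add further graphics, such as the labels for the annotations.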

col_fun = colorRamp2(c(0, 0.5, 1), c("#377EB8", "white", "#E41A1C"))
ht_list = Heatmap(m1, col = col_fun, name = "Methylation",
    clustering_distance_columns = "spearman",
    show_row_dend = FALSE, show_column_dend = FALSE,
    show_column_names = FALSE,
    bottom_annotation = ha, column_title = qq("GBM samples (n = @{ncol(m1)})"),
    row_split = factor(cgi2, levels = c("Island", "Shore", "Shelf", "OpenSea")), 
    row_title_gp = gpar(col = "#FFFFFF00")) + 
Heatmap(m2, col = col_fun, show_column_names = FALSE, 
    show_column_dend = FALSE, column_title = "Controls",
    show_heatmap_legend = FALSE, width = unit(1, "cm")) +
Heatmap(tss_dist, name = "tss_dist", col = colorRamp2(c(0, 2e5), c("white", "black")), 
    width = unit(5, "mm"),
    heatmap_legend_param = list(at = c(0, 1e5, 2e5), labels = c("0kb", "100kb", "200kb"))) + 
Heatmap(cgi2, name = "CGI", show_row_names = FALSE, width = unit(5, "mm"),
    col = structure(names = c("Island", "Shore", "Shelf", "OpenSea"), c("red", "blue", "green", "#CCCCCC")))
draw(ht_list, row_title = paste0("DNA methylation probes (n = ", nrow(m1), ")"),
    annotation_legend_side = "left", heatmap_legend_side = "left")

annotation_titles = c(dkfz_cluster = "DKFZ Methylation",
    tcga_cluster = "TCGA Methylation",
    tcga_expr = "TCGA Expression",
    IDH1 = "IDH1",
    H3F3A = "H3F3A",
    TP53 = "TP53",
    chr7_gain = "Chr7 gain",
    chr10_loss = "Chr10 loss",
    CDKN2A_del = "CDKN2A del",
    EGFR_amp = "EGFR amp",
    PDGFRA_amp = "PDGFRA amp")
for(an in names(annotation_titles)) {
    decorate_annotation(an, {
        grid.text(annotation_titles[an], unit(-2, "mm"), just = "right")
        grid.rect(gp = gpar(fill = NA, col = "black"))
    })
}
decorate_annotation("age", {
    grid.text("Age", unit(8, "mm"), just = "right")
    grid.rect(gp = gpar(fill = NA, col = "black"))
    grid.lines(unit(c(0, 1), "npc"), unit(c(20, 20), "native"), gp = gpar(lty = 2))
})
decorate_annotation("IDH1", {
    grid.lines(unit(c(-40, 0), "mm"), unit(c(1, 1), "npc"))
})
decorate_annotation("chr7_gain", {
    grid.lines(unit(c(-40, 0), "mm"), unit(c(1, 1), "npc"))
})

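6. Add multiple boxplots for a single row

The annotation function anno_boxplot() draws only one boxplot per row. When multiple heatmaps are concatenated, or when groups of columns have been defined, we may want to compare the heatmaps or column groups for each row, so multiple boxplots need to be drawn per row.

In the following example, we demonstrate how to implement an annotation function that draws multiple boxplots for a single row. The grid.boxplot() function is from the ComplexHeatmap package and makes it easy to draw boxplots under the grid system.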

m1 = matrix(sort(rnorm(100)), 10, byrow = TRUE)
m2 = matrix(sort(rnorm(100), decreasing = TRUE), 10, byrow = TRUE)

ht_list = Heatmap(m1, name = "m1") + Heatmap(m2, name = "m2")

rg = range(c(m1, m2))
pad = (rg[2] - rg[1]) * 0.02   # pad both ends by 2% of the original range
rg = rg + c(-pad, pad)
anno_multiple_boxplot = function(index) {
    nr = length(index)
    pushViewport(viewport(xscale = rg, yscale = c(0.5, nr + 0.5)))
    for(i in seq_along(index)) {
        grid.rect(y = nr-i+1, height = 1, default.units = "native")
        grid.boxplot(m1[ index[i], ], pos = nr-i+1 + 0.2, box_width = 0.3, 
            gp = gpar(fill = "red"), direction = "horizontal")
        grid.boxplot(m2[ index[i], ], pos = nr-i+1 - 0.2, box_width = 0.3, 
            gp = gpar(fill = "green"), direction = "horizontal")
    }
    grid.xaxis()
    popViewport()
}

ht_list = ht_list + rowAnnotation(boxplot = anno_multiple_boxplot, width = unit(4, "cm"), 
    show_annotation_name = FALSE)
lgd = Legend(labels = c("m1", "m2"), title = "boxplots",
    legend_gp = gpar(fill = c("red", "green")))
draw(ht_list, padding = unit(c(20, 2, 2, 2), "mm"), heatmap_legend_list = list(lgd))
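Inside the annotation function, index holds the row indices in display order from top to bottom, while the viewport's y scale runs from bottom to top. The sketch below shows the resulting mapping in base R for a hypothetical row order; it only computes positions and draws nothing.

```r
index = c(3, 1, 2)            # hypothetical row order after clustering, top to bottom
nr = length(index)

# yscale is c(0.5, nr + 0.5), so the i-th displayed row is centered at
# nr - i + 1: the first row gets the largest y value and sits on top.
y_center = nr - seq_along(index) + 1

# The two boxplots of a row are shifted up and down from the row center.
pos_m1 = y_center + 0.2
pos_m2 = y_center - 0.2

layout = data.frame(row = index, y_center, pos_m1, pos_m2)
```

With box_width = 0.3 and offsets of 0.2, the two boxes of a row sit side by side without overlapping and stay inside the unit-height row.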

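Other tricks

Reference link

1. Set the same cell size for different heatmaps with different dimensions

Assume you have a list of heatmaps/oncoPrints that you want to save as separate files (e.g. png or pdf). One thing you might want is for every grid/cell to have the same size across the heatmaps, so you need to calculate the size of each png/pdf file from the number of rows or columns in the heatmap. In heatmaps produced by ComplexHeatmap, all heatmap components have absolute sizes and only the size of the heatmap body (i.e. of the cells) is adjustable. In other words, if you resize the final graphics device, for example by dragging the graphics window, only the heatmap body is resized. The size of the whole plot is therefore linearly related to the number of rows or columns, and we can fit a linear model y = a*x + b, where y is the height of the whole plot and x is the number of rows.

In the following example, we simply demonstrate how to establish the relation between the plot height and the number of rows in the heatmap. We first define a function that generates a 10-column matrix with a given number of rows. The values in the matrix are of no importance in this demonstration.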

random_mat = function(nr) {
    m = matrix(rnorm(10*nr), nc = 10)
    colnames(m) = letters[1:10]
    return(m)
}

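Since the relation is exactly linear, we only need to test two heatmaps with different numbers of rows, where the height of a single row is unit(5, "mm"). The heatmaps also have a column title, a column dendrogram, a column annotation and column names.

A few points about the following code:

  1. The heatmap object should be the one returned by draw(), because the layout of the heatmap is only calculated when draw() is executed.
  2. component_height() returns a vector of units that correspond to the heights of all heatmap components, from top to bottom. (component_width() returns the widths of the heatmap components.)
  3. When calculating ht_height, we add unit(4, "mm") because the final plot has a 2mm white border at the top and at the bottom.
  4. ht_height needs to be converted to a simple unit, cm or inch.

In the following, y contains the values measured in inches.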

y = NULL
for(nr in c(10, 20)) {
    ht = draw(Heatmap(random_mat(nr), height = unit(5, "mm")*nr, 
        column_title = "foo", # one line text
        top_annotation = HeatmapAnnotation(bar = 1:10)))
    ht_height = sum(component_height(ht)) + unit(4, "mm")
    ht_height = convertHeight(ht_height, "inch", valueOnly = TRUE)
    y = c(y, ht_height)
}

Then we can fit a linear relation between y and the number of rows:

x = c(10, 20)
lm(y ~ x)
## 
## Call:
## lm(formula = y ~ x)
## 
## Coefficients:
## (Intercept)            x  
##      1.2222       0.1969

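This means the relation between the number of rows x and the height of the plot y is y = 0.1969*x + 1.2222, using the coefficients printed above.

You can test whether the height of a single row is the same in heatmaps with different numbers of rows with the following code. Note that all heatmap settings should be the same as those used when preparing y.

for(nr in c(10, 20)) {
    png(paste0("test_heatmap_nr_", nr, ".png"), width = 5, height = 0.1969*nr + 1.2222, 
        units = "in", res = 100)
    draw(Heatmap(random_mat(nr), height = unit(5, "mm")*nr, 
        column_title = "foo", # column title can be any one-line string
        top_annotation = HeatmapAnnotation(bar = 1:10)))
    dev.off()
}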
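Hard-coding the coefficients is easy to get wrong, so it can help to keep the fitted model and read the coefficients from it. The sketch below uses two hypothetical plot heights in inches, chosen to be consistent with the coefficients printed above; height_for() is a helper name introduced here for illustration.

```r
x = c(10, 20)              # number of rows in the two test heatmaps
y = c(3.1907, 5.1592)      # hypothetical total plot heights in inches

fit = lm(y ~ x)
a = unname(coef(fit)["x"])              # inches added per row
b = unname(coef(fit)["(Intercept)"])    # height of everything except the body

# Sanity check: each row is 5 mm high, and 5 mm is 5/25.4 inches.
slope_expected = 5 / 25.4

# Plot height in inches for a heatmap with nr rows and the same settings.
height_for = function(nr) a * nr + b
h50 = height_for(50)
```

The slope only depends on the row height, so a change in titles, annotations or dendrograms affects the intercept alone and requires refitting.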
