R 数据可视化 —— circlize 圆形热图

前言

在前一节,我们介绍了如何使用 circos.rect() 函数来绘制圆形热图,在 0.4.10 版本之后,circlize 包提供了 circos.heatmap() 函数,来简化圆形热图的创建和绘制。

下面我们来介绍如何使用该函数来绘制圆形热图,首先,创建数据

mat <- rbind(
  cbind(matrix(
    rnorm(50*5, mean = 1), nr = 50), 
    matrix(rnorm(50*5, mean = -1), nr = 50)
    ),
  cbind(matrix(
    rnorm(50*5, mean = -1), nr = 50), 
    matrix(rnorm(50*5, mean = 1), nr = 50)
    )
)
rownames(mat) <- paste0("R", 1:100)
colnames(mat) <- paste0("C", 1:10)
mat <- mat[sample(100, 100), ]
split <- sample(letters[1:5], 100, replace = TRUE)
split <- factor(split, levels = letters[1:5])

构造 100*10 的矩阵,并随机分为 5 类。如果是正常聚类热图的话

Heatmap(mat, row_split = split)

圆形热图

1. 输入数据

circos.heatmap() 的输入数据必须为矩阵,如果矩阵存在不同的分组,必须在 split 参数中传递字符向量或因子,如果传递了数值向量,会自动转换为字符

如果输入数据的矩阵是连续型数值,颜色参数 col 必须为 colorRamp2() 生成的颜色映射,如果是字符型,则需要传递定义颜色映射的命名向量

col_fun <- colorRamp2(c(-2, 0, 2), c("#fc8d59", "#ffffbf", "#91bfdb"))
circos.heatmap(mat, split = split, col = col_fun)
circos.clear()

行数据的绘制是沿着圆的方向,列是沿着半径的方向

如果不添加 split,则默认绘制一个大的扇形区域

circos.heatmap(mat, col = col_fun)
circos.clear()

记住要用 circos.clear() 来删除布局

2. 布局控制

可以在绘制圆形热图之前,使用 circos.par() 函数来控制布局,例如起始的角度、扇形之间的间隔等

circos.par(start.degree = 90, gap.degree = 10)
circos.heatmap(
  mat, split = split, col = col_fun, 
  track.height = 0.4, bg.border = "red", 
  bg.lwd = 2, bg.lty = 2, show.sector.labels = TRUE
  )
circos.clear()

扇形的顺序可以使用 factorlevels 参数来设置

circos.heatmap(
  mat, split = factor(split, levels = c("e", "d", "c", "b", "a")), 
  col = col_fun, show.sector.labels = TRUE
  )
circos.clear()

我们使用 circos.clear() 清除布局之后,扇形从 0 度开始

3. 聚类

默认情况下,会对数值矩阵按行进行聚类,将 cluster 设置为 FALSE 可以关闭聚类。如果没有执行聚类,尽管设置了 dend.side 参数的值也不会绘制树状图

par(mfcol = c(1, 2))
circos.heatmap(mat, split = split, 
               cluster = TRUE, col = col_fun)
circos.clear()
text(0, 0, "cluster = TRUE")
circos.heatmap(mat, split = split, 
               cluster = FALSE, col = col_fun)
circos.clear()
text(0, 0, "cluster = FALSE")

clustering.methoddistance.method 两个参数用于设置聚类方法(hclust)和距离度量(dist)

circos.heatmap() 不能直接对列进行聚类,可以在绘制之前,手动对列进行聚类,然后按聚类的顺序对矩阵的列进行重排

column_od <- hclust(dist(t(mat)))$order
circos.heatmap(mat[, column_od], split = split, 
               col = col_fun)
circos.clear()

4. 树状图和行名

dend.side 参数可以控制树状图相对于热图的位置,树状图是一个独立的轨迹

par(mfcol = c(1, 2))
circos.heatmap(
  mat, split = split, col = col_fun, 
  dend.side = "inside"
  )
circos.clear()
circos.heatmap(
  mat, split = split, col = col_fun, 
  dend.side = "outside"
  )
circos.clear()

dend.track.height 参数可以控制树状图的高度

rownames.side 参数用于控制矩阵行名的显示,类似于树状图,也是一个独立于热图的轨迹

par(mfcol = c(1, 2))
circos.heatmap(
  mat, split = split, col = col_fun, 
  rownames.side = "inside"
  )
circos.clear()
text(0, 0, 'rownames.side = "inside"')
circos.heatmap(
  mat, split = split, col = col_fun, 
  rownames.side = "outside"
  )
circos.clear()
text(0, 0, 'rownames.side = "outside"')

同时显示树状图和行名,当然你不会傻到两个图绘制在同一个方向

par(mfcol = c(1, 2))
circos.heatmap(
  mat, split = split, col = col_fun, 
  rownames.side = "inside",  dend.side = "outside"
  )
circos.clear()
text(0, 0, 'A')
circos.heatmap(
  mat, split = split, col = col_fun, 
  rownames.side = "outside",  dend.side = "inside"
  )
circos.clear()
text(0, 0, 'B')

行名的图形属性可以是标量值或长度与行数一致向量值

circos.heatmap(
  mat, split = split, col = col_fun, 
  rownames.side = "outside",
  rownames.col = 1:nrow(mat) %% 10 + 1,
  rownames.cex = runif(nrow(mat), min = 0.3, max = 1.5),
  rownames.font = 1:nrow(mat) %% 4 + 1
)

树状图的渲染是通过回调函数来实现的,回调函数会在相应的扇形中生成了树状图之后调用,可以用于设置树状图的顺序和颜色

dend.callback 参数用于接收自定义的回调函数,该函数需要传递三个参数

  • dend: 当前扇形的树状图
  • m: 当前扇形所对应的数据子集
  • si: 当前扇形的索引或名称

默认的回调函数是

function(dend, m, si) reorder(dend, rowMeans(m))

会根据行均值对树状图进行重排

我们也可以使用其他排序方式,例如 dendsort::dendsort()

library(dendsort)

par(mfcol = c(1, 2))
circos.heatmap(
  mat, split = split, col = col_fun, 
  dend.side = "inside"
)
circos.clear()
text(0, 0, "reorder by row means")

circos.heatmap(
  mat, split = split, col = col_fun, 
  dend.side = "inside", 
  dend.callback = function(dend, m, si) {
    dendsort(dend)
  }
)
circos.clear()
text(0, 0, "reorder by dendsort")

使用 dendextend 包的 color_branches() 函数,可以对树状图进行渲染

library(dendextend)

dend_col <- structure(rainbow(5), names = letters[1:5])
circos.heatmap(
  mat, split = split, col = col_fun, 
  dend.side = "inside", dend.track.height = 0.2,
  dend.callback = function(dend, m, si) {
    color_branches(dend, k = 1, col = dend_col[si])
  }
)
circos.clear()

如果矩阵未分组,可以设置子树的颜色,在这里,我们设置子树的数量为 4

circos.heatmap(
  mat, col = col_fun, 
  dend.side = "inside", dend.track.height = 0.2,
  dend.callback = function(dend, m, si) {
    color_branches(dend, k = 4, col = rainbow(4))
  }
)
circos.clear()

5. 多个热图轨迹

绘制多个热图轨迹,只要多次调用 circos.heatmap() 就行,但是还有一些细节需要注意

首先,第一次调用 circos.heatmap() 函数时,会初始化圆形布局,后面调用的 circos.heatmap() 函数,将会按照该布局进行绘制,即具有相同的行顺序以及分组。

例如

mat2 <- mat[sample(100, 100), ]
col_fun2 <- colorRamp2(c(-2, 0, 2), c("#af8dc3", "#f7f7f7", "#7fbf7b"))

par(mfcol = c(1, 2))
circos.heatmap(mat, split = split, col = col_fun, dend.side = "outside")
circos.heatmap(mat2, col = col_fun2)
circos.clear()
text(0, 0, "first")

circos.heatmap(mat2, split = split, col = col_fun2, dend.side = "outside")
circos.heatmap(mat, col = col_fun)
circos.clear()
text(0, 0, "second")

上面两个图形都是按照第一条轨迹的布局进行绘制的,如果不想以第一条轨迹作为布局,可以使用 circos.heatmap.initialize() 函数来指定初始化布局的数据,例如

circos.heatmap.initialize(mat, split = split)
circos.heatmap(mat2, col = col_fun2, dend.side = "outside")
circos.heatmap(mat, col = col_fun)
circos.clear()

我们以 mat 数据来初始化布局,然后分别绘制前、后五列的数据

circos.heatmap.initialize(mat, split = split)
circos.heatmap(mat[, 1:5], col = col_fun)
circos.heatmap(mat[, 6:10], col = col_fun)
circos.clear()

可以看到,分开绘制与同事绘制,其整体效果是一样的。

6. 与其他轨迹混用

circos.heatmap() 还可以与其他类型的轨迹混用,在初始化圆形热图布局之后,也有一个特殊的变量 CELL_META 用于保存扇形/轨迹/单元格的信息

  • CELL_META$row_dend(CELL_META$dend): 当前扇形的树状图,如果未聚类,则为 NULL
  • CELL_META$row_order(simply CELL_META$order): 当前扇形所对应的数据子集聚类后的顺序,如果未聚类,则为 c(1,2,...)
  • CELL_META$subset: 当前扇形所对应的数据在矩阵中的索引,索引按升序排列

例如,在热图的内侧绘制行均值的点图

circos.heatmap(mat, split = split, col = col_fun)
row_mean <- rowMeans(mat[, 1:5])
circos.track(
  ylim = range(row_mean), panel.fun = function(x, y) {
    # 获取当前扇形对应的数据的行均值
    y = row_mean[CELL_META$subset]
    # 将行均值按聚类顺序排列
    y = y[CELL_META$row_order]
    # 添加线条和点
    circos.lines(CELL_META$cell.xlim, c(0, 0), lty = 2, col = "grey50")
    circos.points(seq_along(y) - 0.5, y, col = ifelse(y > 0, "red", "blue"))
  }, 
  cell.padding = c(0.02, 0, 0.02, 0)
  )
circos.clear()

如果想要在外侧绘制点图,内侧绘制热图,则需要先使用 circos.heatmap.initialize() 函数进行热图布局

circos.heatmap.initialize(mat, split = split)

row_mean <- rowMeans(mat[, 1:5])
circos.track(
  ylim = range(row_mean), panel.fun = function(x, y) {
    y = row_mean[CELL_META$subset]
    y = y[CELL_META$row_order]
    circos.lines(CELL_META$cell.xlim, c(0, 0), lty = 2, col = "grey50")
    circos.points(seq_along(y) - 0.5, y, col = ifelse(y > 0, "red", "blue"))
  }, 
  cell.padding = c(0.02, 0, 0.02, 0)
)
circos.heatmap(mat, split = split, col = col_fun)
circos.clear()

或者,使用箱线图

circos.heatmap(mat, split = split, col = col_fun)
circos.track(
  ylim = range(mat), panel.fun = function(x, y) {
    m = mat[CELL_META$subset, , drop = FALSE]
    m = m[CELL_META$row_order, , drop = FALSE]
    n = nrow(m)
    # circos.boxplot 应用于矩阵的列,所以需要转置
    circos.boxplot(
      t(m), pos = 1:n - 0.5, pch = 16, cex = 0.3, 
      col = CELL_META$sector.numeric.index)
    circos.lines(CELL_META$cell.xlim, c(0, 0), lty = 2, col = "grey50")
  }, 
  cell.padding = c(0.02, 0, 0.02, 0)
)
circos.clear()

7. 添加注释

显示扇形的标签,可以设置 show.sector.labels = TRUE

如果需要显示自定义的标签,需要使用 panel.fun 函数来绘制,例如

circos.heatmap(mat, split = split, col = col_fun)
circos.track(
  track.index = get.current.track.index(), 
  panel.fun = function(x, y) {
    circos.text(
      CELL_META$xcenter,
      CELL_META$cell.ylim[2] + convert_y(2, "mm"),
      paste0("this is group ", CELL_META$sector.index),
      facing = "bending.inside", cex = 0.8, 
      adj = c(0.5, 0), niceFacing = TRUE
    )
  }, bg.border = NA)
circos.clear()

circos.heatmap() 函数并没有提供显示列名的方法,所以只能在 panel.fun 函数中添加

我们需要使用 circos.par() 函数为最后一个扇形设置较大间隔,好放置列标签,然后在 panel.fun 函数中,在最后一个扇形的末尾处添加列标签,例如

circos.par(gap.after = c(2, 2, 2, 2, 10))
circos.heatmap(
  mat, split = split, col = col_fun,
  track.height = 0.4
  )
circos.track(
  track.index = get.current.track.index(), 
  panel.fun = function(x, y) {
    if (CELL_META$sector.numeric.index == 5) {
      # 在最后一个扇形中
      cn = colnames(mat)
      n = length(cn)
      circos.text(
        rep(CELL_META$cell.xlim[2], n) + convert_x(1, "mm"),
        1:n - 0.5, cn, cex = 0.5, adj = c(0, 0.5),
        facing = "inside"
      )
    }
  }, 
  bg.border = NA
)
circos.clear()

类似的,我们可以将列名换成分组标签,并为标签添加矩形框

circos.par(gap.after = c(2, 2, 2, 2, 10))
circos.heatmap(mat, split = split, col = col_fun, track.height = 0.4)
circos.track(
  track.index = get.current.track.index(), 
  panel.fun = function(x, y) {
    if(CELL_META$sector.numeric.index == 5) {
      # 第一个分组
      circos.rect(
        CELL_META$cell.xlim[2] + convert_x(1, "mm"), 0,
        CELL_META$cell.xlim[2] + convert_x(5, "mm"), 5,
        col = "green", border = NA
      )
      circos.text(
        CELL_META$cell.xlim[2] + convert_x(3, "mm"), 2.5,
        "group 1", cex = 0.5, facing = "clockwise"
      )
      # 第二个分组
      circos.rect(
        CELL_META$cell.xlim[2] + convert_x(1, "mm"), 5,
        CELL_META$cell.xlim[2] + convert_x(5, "mm"), 10,
        col = "pink", border = NA
      )
      circos.text(
        CELL_META$cell.xlim[2] + convert_x(3, "mm"), 7.5,
        "group 2", cex = 0.5, facing = "clockwise"
      )
    }
  }, bg.border = NA
)
circos.clear()

添加图例

circos.heatmap(mat, split = split, col = col_fun)
circos.clear()

library(ComplexHeatmap)
lgd <- Legend(title = "mat", col_fun = col_fun)
grid.draw(lgd)

8. 复杂热图示例

我们将通过一个示例来说明如何绘制复杂热图,以热图的方式来展示 DNA 甲基化、基因表达以及其他基因组层面的信息。

我们有一份模拟的数据:https://github.com/dxsbiocc/learn/blob/main/data/meth.rds

res_list <- readRDS("Downloads/meth.rds")

type <- res_list$type
mat_meth <- res_list$mat_meth
mat_expr <- res_list$mat_expr
direction <- res_list$direction
cor_pvalue <- res_list$cor_pvalue
gene_type <- res_list$gene_type
anno_gene <- res_list$anno_gene
dist <- res_list$dist
anno_enhancer <- res_list$anno_enhancer

该数据包含 9 部分:

  • mat_meth: 甲基化谱,行表示差异甲基化区域(DMR),列表示样本
  • mat_expr: 与 DMR 相关的基因的表达谱
  • direction: 甲基化方向(hyper:超高甲基化,hypo:低甲基化)
  • cor_pvalue: 基因表达与甲基化的相关性,经过 -log10 转化
  • gene_type: 基因的类型(如,protein coding geneslincRNA)
  • anno_gene: 基因模型的注释(如,intergenic, intragenictranscription start site(TSS))
  • dist: DMR 与相关基因的 TSS 之间的距离
  • anno_enhancer: DMR 与增强子之间的重叠部分

mat_methmat_exprcor_pvaluedistanno_enhancer 是数值型,可以设置为连续型颜色,其他的数据可以设置为离散的颜色映射

例如

# k-means 聚类
km <- kmeans(mat_meth, centers = 5)$cluster
# 绘制甲基化谱热图
col_meth <- colorRamp2(c(0, 0.5, 1), c("#a6611a", "#f5f5f5", "#018571"))
circos.heatmap(mat_meth, split = km, col = col_meth, track.height = 0.12)
# 绘制甲基化方向热图,离散型
col_direction <- c("hyper" = "red", "hypo" = "blue")
circos.heatmap(direction, col = col_direction, track.height = 0.01)
# 绘制基因表达谱热图
col_expr <- colorRamp2(c(-2, 0, 2), c("#d01c8b", "#f7f7f7", "#4dac26"))
circos.heatmap(mat_expr, col = col_expr, track.height = 0.12)
# 绘制相关性热图
col_pvalue <- colorRamp2(c(0, 2, 4), c("#f1a340", "#f7f7f7", "#998ec3"))
circos.heatmap(cor_pvalue, col = col_pvalue, track.height = 0.01)
# 绘制基因类型热图,离散型
library(RColorBrewer)
col_gene_type <- structure(brewer.pal(length(unique(gene_type)), "Set3"), names = unique(gene_type))
circos.heatmap(gene_type, col = col_gene_type, track.height = 0.01)
# 基因注释信息,离散型
col_anno_gene <- structure(brewer.pal(length(unique(anno_gene)), "Set1"), names = unique(anno_gene))
circos.heatmap(anno_gene, col = col_anno_gene, track.height = 0.01) 
# 距离热图
col_dist <- colorRamp2(c(0, 10000), c("#ef8a62", "#67a9cf"))
circos.heatmap(dist, col = col_dist, track.height = 0.01)
# 重叠比例热图
col_enhancer <- colorRamp2(c(0, 1), c("#fc8d59", "#99d594"))
circos.heatmap(anno_enhancer, col = col_enhancer, track.height = 0.03)

因为矩阵的行对应的是差异甲基化区域,我们可以绘制不同区域之间的连接线,表示不同区域之间存在某种关系。

我们构造随机的连接

df_link <- data.frame(
  from_index = sample(nrow(mat_meth), 20),
  to_index = sample(nrow(mat_meth), 20)
)

绘制连接线

for(i in seq_len(nrow(df_link))) {
  # 假设,我们的连接时 DMR1 ——>  DMR2.
  
  # 获取 DMR1 所属的扇形
  group1 = km[ df_link$from_index[i] ]
  # 获取 DMR2 所属的扇形
  group2 = km[ df_link$to_index[i] ]
  
  # 获取 DMR1 所属扇形的数据索引
  subset1 = get.cell.meta.data("subset", sector.index = group1)
  # 该扇形的行顺序
  row_order1 = get.cell.meta.data("row_order", sector.index = group1)
  # 获取 DMR1 在扇形中的位置
  x1 = which(subset1[row_order1] == df_link$from_index[i])
  
  # 获取 DMR2 所属扇形的数据索引
  subset2 = get.cell.meta.data("subset", sector.index = group2)
  row_order2 = get.cell.meta.data("row_order", sector.index = group2)
  x2 = which(subset2[row_order2] == df_link$to_index[i])
  
  # 连接 DMR1 和 DMR2 的中点位置
  circos.link(group1, x1 - 0.5, group2, x2 - 0.5, col = rand_color(1))
}

其实,上面的代码也可以直接使用连接行索引的方式

for(i in seq_len(nrow(df_link))) {
  circos.heatmap.link(
    df_link$from_index[i],
    df_link$to_index[i],
    col = rand_color(1)
    )
}

现在,就剩最后一步了,添加图例,还是使用 ComplexHeatmap::Legend() 函数来进行添加

lgd_meth <- Legend(title = "Methylation", col_fun = col_meth)
lgd_direction <- Legend(
  title = "Direction", at = names(col_direction), 
  legend_gp = gpar(fill = col_direction)
  )
lgd_expr <- Legend(title = "Expression", col_fun = col_expr)
lgd_pvalue <- Legend(
  title = "P-value", col_fun = col_pvalue, at = c(0, 2, 4), 
  labels = c(1, 0.01, 0.0001)
  )
lgd_gene_type <- Legend(
  title = "Gene type", at = names(col_gene_type), 
  legend_gp = gpar(fill = col_gene_type)
  )
lgd_anno_gene <- Legend(
  title = "Gene anno", at = names(col_anno_gene), 
  legend_gp = gpar(fill = col_anno_gene)
  )
lgd_dist <- Legend(
  title = "Dist to TSS", col_fun = col_dist, 
  at = c(0, 5000, 10000), labels = c("0kb", "5kb", "10kb")
  )
lgd_enhancer <- Legend(
  title = "Enhancer overlap", col_fun = col_enhancer, 
  at = c(0, 0.25, 0.5, 0.75, 1), 
  labels = c("0%", "25%", "50%", "75%", "100%")
  )

同时,我们将热图的绘制放置一个函数里面

circlize_plot = function() {
  circos.heatmap(mat_meth, split = km, col = col_meth, track.height = 0.12)
  circos.heatmap(direction, col = col_direction, track.height = 0.01)
  circos.heatmap(mat_expr, col = col_expr, track.height = 0.12)
  circos.heatmap(cor_pvalue, col = col_pvalue, track.height = 0.01)
  circos.heatmap(gene_type, col = col_gene_type, track.height = 0.01)
  circos.heatmap(anno_gene, col = col_anno_gene, track.height = 0.01) 
  circos.heatmap(dist, col = col_dist, track.height = 0.01)
  circos.heatmap(anno_enhancer, col = col_enhancer, track.height = 0.03)
  
  for(i in seq_len(nrow(df_link))) {
    circos.heatmap.link(
      df_link$from_index[i], df_link$to_index[i], col = rand_color(1))
  }
  circos.clear()
}

我们使用 gridBase 包来混合 base 图形和 grid 系统的图例。在这里,我们将图形输出为文件,如果在 RStudio 中绘制,显示的图形会出现不完整或重叠的情况,需要注意

library(gridBase)

# 创建 png 图形设备,并设置足够的大小
# 注意:如果图形设备的大小太小,会提示 "figure margins too large"
# 并且,gridOMI() 会返回负值
png(filename = "~/Downloads/a.png", width = 1000, height = 800)
plot.new()
circle_size = unit(1, "snpc") # snpc unit gives you a square region

pushViewport(viewport(
  x = 0, y = 0.5, width = circle_size, 
  height = circle_size, just = c("left", "center"))
  )
# 设置 new = TRUE,避免重新创建图形
par(omi = gridOMI(), new = TRUE)
circlize_plot()
upViewport()

# 获取图形设备的高度
h <- dev.size()[2]
lgd_list = packLegend(
  lgd_meth, lgd_direction, lgd_expr, 
  lgd_pvalue, lgd_gene_type,  lgd_anno_gene, 
  lgd_dist, lgd_enhancer, 
  max_height = unit(0.9*h, "inch")
  )
draw(lgd_list, x = circle_size, just = "left")
dev.off()
circos.clear()

完整代码:https://github.com/dxsbiocc/learn/blob/main/R/plot/circos_heatmap.R

你可能感兴趣的:(R 数据可视化 —— circlize 圆形热图)