用Harmony整合单细胞数据并用LISI评估整合效果

harmony是一个非常流行的整合单细胞数据的算法,由于它的核心是用C++写的所以相比其他的类似算法快很多,如果单细胞样本量很大的时候推荐使用。

首先加载其文章中用到的一个示例数据,为三个人10X pbmc数据,分别为5'端测序,3'端测序V1,3'端测序V2。

exprs_raw <- list(
  './harmony/data/fiveprime_raw.rds',
  './harmony/data/threepv2_raw.rds',
  './harmony/data/threepv1_raw.rds'
) %>% 
  lapply(readRDS) %>% 
  purrr::reduce(Matrix::cbind2)

首先简单跑一遍Seurat并查看批次差异。可以看到有很明显的差异。

scRNA_harmony <- CreateSeuratObject(counts = exprs_raw, project = "pbmc harmony", min.cells = 3, min.features = 200)
[email protected]$group <- [email protected]$orig.ident
scRNA_harmony <- NormalizeData(scRNA_harmony) %>% FindVariableFeatures(selection.method = "vst", nfeatures = 2000) %>% ScaleData() %>% RunPCA(verbose=FALSE)

p1 <- DimPlot(object = scRNA_harmony, dims=c(1,2), reduction = "pca", pt.size = .1, group.by = "orig.ident")
p2 <- VlnPlot(object = scRNA_harmony, features = c("PC_1", "PC_2"), group.by = "orig.ident", pt.size = .1)
plot_grid(p1,p2)

plot_zoom_png.png

跑RunHarmony并检查整合结果。可以看到三组数据很好的整合在一起了。

scRNA_harmony <- RunHarmony(scRNA_harmony, group.by.vars = "orig.ident", theta = 1, plot_convergence = TRUE, dims.use=1:30)

p1 <- DimPlot(object = scRNA_harmony, reduction = "harmony", pt.size = .1, group.by = "orig.ident")
p2 <- VlnPlot(object = scRNA_harmony, features = c("harmony_1","harmony_2"), group.by = "orig.ident", pt.size = .1)
plot_grid(p1,p2)
plot_zoom_png.png

接下来就可以根据新的数据共享降维空间(dimension reduced shared space)来对数据进行下游分析了,此处省略。比较感兴趣是文章中提到的评估整合效果的metric iLISI(integration local inverse Simpson’s Index),原理示意图如下:

Screen Shot 2021-02-01 at 12.00.48 PM.png

简单来说就是对每个细胞计算其邻近细胞(nearest neighbours)是否来自于同一个数据集,是的话其值为1,反之其值大于1。下面我们来计算一下。

ix <- match(rownames(scRNA_harmony[[]]), meta_data$cell_id)
meta_data <- meta_data[ix,]

lisi_res <- rbind(
  lisi::compute_lisi(Embeddings(scRNA_harmony,reduction = "harmony"), meta_data, c('donor', 'res_0.80')) %>% 
    dplyr::mutate(type = 'harmony'),
  lisi::compute_lisi(Embeddings(scRNA_harmony,reduction = "pca"), meta_data, c('donor', 'res_0.80')) %>% 
    dplyr::mutate(type = 'pca')
) %>% 
  tidyr::gather(key, val, res_0.80, donor)

lisi_res %>% 
  ggplot(aes(val)) + 
  geom_density() + 
  facet_wrap(type ~ key, scales = 'free') + 
  theme_tufte(base_size = 20) +
  xlim(1, 3)+
  theme_grey()

可以看出整合前(PCA做聚类),iLISI值基本都在1,而整合后(harmony做聚类)明显增大且峰值在2。

plot_zoom_png.png

Reference:
https://www.nature.com/articles/s41592-019-0619-0#MOESM5
https://github.com/immunogenomics/harmony

你可能感兴趣的:(用Harmony整合单细胞数据并用LISI评估整合效果)