Single-cell sequencing analysis: integrating multiple single-cell datasets with the R package harmony

Overview of Harmony algorithm

Fast, sensitive and accurate integration of single-cell data with Harmony

#System requirements

  • Linux, OS X, and Windows are all supported;
  • R version 3.4 or later is required;
  • Python users should see harmonypy.

#Installation

library(devtools)
install_github("immunogenomics/harmony")
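The version requirement and the install step can be combined into one guarded routine. The following is only a sketch: `install_harmony_if_missing()` is a hypothetical helper name, not part of harmony or devtools, and it assumes network access to CRAN and GitHub when it is called.

```r
# Minimal sketch: check the R version requirement, then install only if needed.
# install_harmony_if_missing() is a hypothetical helper, not a harmony function.
r_version_ok <- getRversion() >= "3.4.0"

install_harmony_if_missing <- function() {
  if (!r_version_ok) {
    stop("Harmony needs R >= 3.4; found ", as.character(getRversion()))
  }
  if (!requireNamespace("harmony", quietly = TRUE)) {
    # devtools is only needed for the GitHub install
    if (!requireNamespace("devtools", quietly = TRUE)) {
      install.packages("devtools")
    }
    devtools::install_github("immunogenomics/harmony")
  }
  # TRUE if harmony can be loaded after the steps above
  invisible(requireNamespace("harmony", quietly = TRUE))
}
```

Calling `install_harmony_if_missing()` once at the top of a script leaves an existing installation untouched.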

#Examples

##PCA matrix

Harmony iteratively corrects PCA embeddings. When passing a PCA matrix directly, set do_pca=FALSE so that Harmony skips its own PCA step.

library(harmony)
data(cell_lines_small)
pca_matrix <- cell_lines_small$scaled_pcs
meta_data <- cell_lines_small$meta_data
harmony_embeddings <- HarmonyMatrix(pca_matrix, meta_data, 'dataset',
                                    do_pca=FALSE)

## Output is a matrix of corrected PC embeddings
dim(harmony_embeddings)
harmony_embeddings[seq_len(5), seq_len(5)]

## Finally, we can return an object with all the underlying data structures
harmony_object <- HarmonyMatrix(pca_matrix, meta_data, 'dataset', 
                                do_pca=FALSE, return_object=TRUE)
dim(harmony_object$Y) ## cluster centroids
dim(harmony_object$R) ## soft cluster assignment
dim(harmony_object$Z_corr) ## corrected PCA embeddings
head(harmony_object$O) ## batch by cluster co-occurrence matrix
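To give a feel for how the soft cluster assignments and the co-occurrence matrix relate, here is a toy sketch in base R. It is an illustration under assumptions, not Harmony's internal code: the numbers are made up, `Phi` is a name chosen here for a one-hot batch indicator, and the row/column orientation of the matrices in the real object may differ.

```r
# Toy sketch (base R only): deriving a cluster-by-batch co-occurrence table
# from soft cluster assignments. All values are made up for illustration.
set.seed(1)
n_cells <- 6
n_clusters <- 2
batch <- factor(c("A", "A", "A", "B", "B", "B"))

# Soft assignments: one column per cell, each column sums to 1
R <- matrix(runif(n_clusters * n_cells), nrow = n_clusters)
R <- sweep(R, 2, colSums(R), "/")

# One-hot batch indicator: one row per batch, one column per cell
Phi <- t(model.matrix(~ 0 + batch))

# Expected number of cells from each batch in each cluster
O <- R %*% t(Phi)
```

Each column of `O` sums to the number of cells in that batch, so a well-mixed cluster has a row roughly proportional to the batch sizes.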

##Normalized gene matrix

Harmony expects a normalized expression matrix as input. It then scales the data, reduces dimensionality with PCA, and finally integrates the datasets.

library(harmony)
my_harmony_embeddings <- HarmonyMatrix(normalized_counts, meta_data, "dataset")
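The call above assumes that `normalized_counts` and `meta_data` already exist. As a rough sketch of one way to prepare them in base R: the toy `raw_counts` matrix, the genes-by-cells orientation, and the 1e4 scale factor are all assumptions made for this example (log-normalized counts per 10,000 is a common convention, not something the text above prescribes).

```r
# Toy sketch: library-size normalization followed by log1p, in base R.
# raw_counts is a made-up genes x cells matrix; 1e4 is a conventional scale factor.
raw_counts <- matrix(
  c(0, 3, 7,
    2, 0, 1,
    8, 5, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3))
)
lib_size <- colSums(raw_counts)
normalized_counts <- log1p(sweep(raw_counts, 2, lib_size, "/") * 1e4)

# meta_data needs one row per cell, in the same order as the matrix columns
meta_data <- data.frame(
  cell_id = colnames(raw_counts),
  dataset = c("d1", "d1", "d2"),
  stringsAsFactors = FALSE
)
stopifnot(identical(meta_data$cell_id, colnames(normalized_counts)))
```

The final `stopifnot()` guards against the most common mistake, metadata rows that are out of order relative to the cells in the matrix.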

##Seurat

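Harmony can be used within the Seurat workflow (Seurat V2 and Seurat V3). Call RunHarmony() after PCA, then pass the resulting harmony reduction, in place of pca, to downstream steps such as RunUMAP().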

seuratObj <- RunHarmony(seuratObj, "dataset")
## Seurat V3 requires dims in RunUMAP(); 1:20 is an illustrative choice
seuratObj <- RunUMAP(seuratObj, reduction = "harmony", dims = 1:20)

##Harmony with two or more covariates

Harmony can integrate over several covariates at once; pass them as a character vector of metadata column names.

my_harmony_embeddings <- HarmonyMatrix(
  my_pca_embeddings, meta_data, c("dataset", "donor", "batch_id"),
  do_pca = FALSE
)

In the Seurat workflow:

seuratObject <- RunHarmony(seuratObject, c("dataset", "donor", "batch_id"))
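Because the covariates are looked up by column name, it can help to validate the metadata before integrating. The sketch below uses base R only; `check_covariates()` is a hypothetical helper written for this example, not a harmony or Seurat function, and the toy metadata is made up.

```r
# Sketch: validate covariate columns before integration (base R only).
# check_covariates() is a hypothetical helper, not part of harmony.
check_covariates <- function(meta_data, vars_use) {
  missing_cols <- setdiff(vars_use, colnames(meta_data))
  if (length(missing_cols) > 0) {
    stop("Missing metadata columns: ", paste(missing_cols, collapse = ", "))
  }
  has_na <- vapply(meta_data[vars_use], anyNA, logical(1))
  if (any(has_na)) {
    stop("NA values in: ", paste(vars_use[has_na], collapse = ", "))
  }
  # Number of distinct levels per covariate
  vapply(meta_data[vars_use], function(x) length(unique(x)), integer(1))
}

meta_data <- data.frame(
  dataset = c("d1", "d1", "d2", "d2"),
  donor = c("p1", "p2", "p3", "p3"),
  batch_id = c("b1", "b1", "b1", "b2")
)
n_levels <- check_covariates(meta_data, c("dataset", "donor", "batch_id"))
```

A covariate with a single level carries no batch information, so the returned level counts are a quick sanity check on what is about to be corrected.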

For detailed usage, see the advanced tutorial.

Code to reproduce the analyses in the paper "Fast, sensitive and accurate integration of single-cell data with Harmony" is available in harmony2019.

#References:

Harmony
Fast, sensitive and accurate integration of single-cell data with Harmony
