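In this vignette, we present a slightly modified workflow for integrating scRNA-seq datasets. Instead of using canonical correlation analysis ("CCA") to identify anchors, we use reciprocal PCA ("RPCA"). When determining anchors between any two datasets using RPCA, we project each dataset into the other's PCA space and constrain the anchors by the same mutual neighborhood requirement. The commands for the two workflows are largely identical, but the two methods are suited to different contexts.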
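The idea of projecting each dataset into the other's PCA space and keeping only mutual neighbors can be sketched in base R. This is a simplified illustration on simulated data, not Seurat's implementation: the toy matrices, the helper names `project` and `nearest`, and the choice of 5 components are all assumptions made for the example.

```r
set.seed(42)

# Toy data (assumption): cells in rows, 20 shared features in columns.
# Features 1-10 separate two hypothetical cell types; dataset b also carries
# a constant shift that stands in for a batch effect.
type_a <- rep(c("T", "B"), each = 50)
type_b <- rep(c("T", "B"), each = 40)
signal <- c(rep(3, 10), rep(0, 10))
a <- matrix(rnorm(100 * 20), nrow = 100) + outer(as.numeric(type_a == "T"), signal)
b <- matrix(rnorm(80 * 20), nrow = 80) + outer(as.numeric(type_b == "T"), signal) + 0.5

# Run PCA separately on each dataset (prcomp centers the data by default).
pca_a <- prcomp(a, rank. = 5)
pca_b <- prcomp(b, rank. = 5)

# Center a dataset with its own feature means, then project it with the given loadings.
project <- function(x, loadings) scale(x, center = TRUE, scale = FALSE) %*% loadings

# For every row of `query`, return the row indices of its k nearest rows in `ref`.
nearest <- function(query, ref, k) {
  t(apply(query, 1, function(q) {
    d <- colSums((t(ref) - q)^2)
    order(d)[seq_len(k)]
  }))
}

k <- 5
# Neighbors of a-cells among b-cells, searched in b's PCA space ...
nn_ab <- nearest(project(a, pca_b$rotation), project(b, pca_b$rotation), k)
# ... and neighbors of b-cells among a-cells, searched in a's PCA space.
nn_ba <- nearest(project(b, pca_a$rotation), project(a, pca_a$rotation), k)

# Keep only mutual pairs: b-cell j is a neighbor of a-cell i, and i is a neighbor of j.
anchors <- do.call(rbind, lapply(seq_len(nrow(a)), function(i) {
  js <- nn_ab[i, ]
  mutual <- js[vapply(js, function(j) i %in% nn_ba[j, ], logical(1))]
  if (length(mutual) == 0) return(NULL)
  cbind(cell_a = i, cell_b = mutual)
}))

# Fraction of anchor pairs that connect cells of the same simulated type.
same_type <- type_a[anchors[, "cell_a"]] == type_b[anchors[, "cell_b"]]
mean(same_type)
```

Because a pair is kept only when both searches agree, cells with no counterpart in the other dataset tend to contribute few anchors, which is consistent with the more conservative behavior described for RPCA.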
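By identifying shared sources of variation between datasets, CCA is well suited for identifying anchors when cell types are conserved but there are very substantial differences in gene expression across experiments. CCA-based integration therefore enables integrative analysis when experimental conditions or disease states introduce very strong expression shifts, or when integrating datasets across modalities and species. However, CCA-based integration may also lead to overcorrection, especially when a large proportion of cells do not overlap across datasets.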
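RPCA-based integration runs significantly faster and represents a more conservative approach, in which cells in different biological states are less likely to "align" after integration. We therefore recommend RPCA for integrative analyses where:
A substantial fraction of cells in one dataset have no matching type in the other.
Datasets originate from the same platform (i.e. multiple lanes of 10x Genomics).
There are a large number of datasets or cells to integrate.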
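Below, we demonstrate the use of reciprocal PCA. While the commands are nearly identical, this workflow requires users to run principal component analysis (PCA) individually on each dataset prior to integration. Users should also set the `reduction` argument to "rpca" when running FindIntegrationAnchors().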
library(Seurat)
library(SeuratData)
# install dataset
InstallData("ifnb")
# load dataset
ifnb <- LoadData("ifnb")
# split the dataset into a list of two seurat objects (stim and CTRL)
ifnb.list <- SplitObject(ifnb, split.by = "stim")
# normalize and identify variable features for each dataset independently
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
    x <- NormalizeData(x)
    x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})
# select features that are repeatedly variable across datasets for integration,
# then scale each dataset and run PCA on it using these features
features <- SelectIntegrationFeatures(object.list = ifnb.list)
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
    x <- ScaleData(x, features = features, verbose = FALSE)
    x <- RunPCA(x, features = features, verbose = FALSE)
})
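To make "repeatedly variable across datasets" concrete, here is a minimal base R sketch of the ranking idea: features are ordered by the number of datasets in which they are variable, with ties broken by median rank. It is a simplified illustration rather than the code behind SelectIntegrationFeatures(); the function name `select_features` and the short gene lists are hypothetical.

```r
# Hypothetical ranked variable-feature lists (most variable first), one per dataset.
var_features <- list(
  ctrl = c("ISG15", "CD3D", "LYZ", "NKG7", "MS4A1"),
  stim = c("ISG15", "IFIT1", "CD3D", "NKG7", "GNLY")
)

select_features <- function(feature_lists, nfeatures) {
  all_features <- unique(unlist(feature_lists, use.names = FALSE))
  # Number of datasets in which each feature is variable.
  n_datasets <- vapply(all_features, function(f) {
    sum(vapply(feature_lists, function(v) f %in% v, logical(1)))
  }, integer(1))
  # Median rank of each feature across the datasets that contain it (tie-breaker).
  median_rank <- vapply(all_features, function(f) {
    ranks <- unlist(lapply(feature_lists, function(v) match(f, v)))
    median(ranks, na.rm = TRUE)
  }, numeric(1))
  # Most widely shared features first, then the best-ranked among ties.
  ranked <- all_features[order(-n_datasets, median_rank)]
  head(ranked, nfeatures)
}

top_features <- select_features(var_features, nfeatures = 3)
top_features  # "ISG15" "CD3D" "NKG7"
```

In this toy input, the three selected genes are the only ones that are variable in both lists, so they outrank genes that are highly variable in just one dataset.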
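Perform integration
We then identify anchors using the FindIntegrationAnchors() function, which takes a list of Seurat objects as input, and use these anchors to integrate the two datasets together with IntegrateData().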
immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, anchor.features = features, reduction = "rpca")
# this command creates an 'integrated' data assay
immune.combined <- IntegrateData(anchorset = immune.anchors)
Now we can run a single integrated analysis on all cells!
# specify that we will perform downstream analysis on the corrected data;
# note that the original unmodified data still resides in the 'RNA' assay
DefaultAssay(immune.combined) <- "integrated"
# Run the standard workflow for visualization and clustering
immune.combined <- ScaleData(immune.combined, verbose = FALSE)
immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindClusters(immune.combined, resolution = 0.5)
# Visualization
p1 <- DimPlot(immune.combined, reduction = "umap", group.by = "stim")
p2 <- DimPlot(immune.combined, reduction = "umap", group.by = "seurat_annotations", label = TRUE,
    repel = TRUE)
p1 + p2
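Modifying the strength of integration
The results show that rpca-based integration is more conservative and, in this case, does not perfectly align a subset of cells (naive and memory T cells) across experiments. You can increase the strength of alignment by increasing the k.anchor parameter, which is set to 5 by default. Increasing this parameter to 20 will help align these populations.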
immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, anchor.features = features, reduction = "rpca",
    k.anchor = 20)
immune.combined <- IntegrateData(anchorset = immune.anchors)
immune.combined <- ScaleData(immune.combined, verbose = FALSE)
immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindClusters(immune.combined, resolution = 0.5)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 13999
## Number of edges: 594589
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9128
## Number of communities: 15
## Elapsed time: 8 seconds
# Visualization
p1 <- DimPlot(immune.combined, reduction = "umap", group.by = "stim")
p2 <- DimPlot(immune.combined, reduction = "umap", label = TRUE, repel = TRUE)
p1 + p2
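Why a larger k.anchor strengthens alignment can be illustrated with a small base R sketch that counts mutual nearest-neighbor pairs between two simulated point sets at k = 5 and k = 20. This is a toy illustration only: Seurat additionally filters and scores anchors, and the data and the helper names `cross_dist` and `count_mutual_pairs` are assumptions made for the example.

```r
set.seed(1)

# Two simulated "datasets" in a shared 5-dimensional space; b is shifted.
a <- matrix(rnorm(200 * 5), nrow = 200)
b <- matrix(rnorm(150 * 5), nrow = 150) + 1

# Squared Euclidean distance between every row of x and every row of y.
cross_dist <- function(x, y) {
  outer(rowSums(x^2), rowSums(y^2), "+") - 2 * x %*% t(y)
}

# Count pairs (i, j) where y[j, ] is among the k nearest y-rows of x[i, ]
# and x[i, ] is among the k nearest x-rows of y[j, ].
count_mutual_pairs <- function(x, y, k) {
  d <- cross_dist(x, y)
  in_k_xy <- t(apply(d, 1, function(r) rank(r, ties.method = "first") <= k))
  in_k_yx <- apply(d, 2, function(col) rank(col, ties.method = "first") <= k)
  sum(in_k_xy & in_k_yx)
}

n_k5 <- count_mutual_pairs(a, b, k = 5)
n_k20 <- count_mutual_pairs(a, b, k = 20)
c(k5 = n_k5, k20 = n_k20)
```

Every pair that is mutual at k = 5 is also mutual at k = 20, so the candidate set can only grow as k increases. More candidate pairs give the correction more links between datasets, which pulls them together more strongly.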
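Now that the datasets have been integrated, you can follow the previous steps to identify cell types and cell type-specific responses.
Performing integration on datasets normalized with SCTransform
As an additional example, we repeat the analysis performed above, but normalize the datasets using SCTransform. We may choose to set the method parameter to glmGamPoi (available from Bioconductor) to enable faster estimation of regression parameters in SCTransform().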
ifnb <- LoadData("ifnb")
ifnb.list <- SplitObject(ifnb, split.by = "stim")
ifnb.list <- lapply(X = ifnb.list, FUN = SCTransform, method = "glmGamPoi")
features <- SelectIntegrationFeatures(object.list = ifnb.list, nfeatures = 3000)
ifnb.list <- PrepSCTIntegration(object.list = ifnb.list, anchor.features = features)
ifnb.list <- lapply(X = ifnb.list, FUN = RunPCA, features = features)
immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list, normalization.method = "SCT",
    anchor.features = features, dims = 1:30, reduction = "rpca", k.anchor = 20)
immune.combined.sct <- IntegrateData(anchorset = immune.anchors, normalization.method = "SCT", dims = 1:30)
immune.combined.sct <- RunPCA(immune.combined.sct, verbose = FALSE)
immune.combined.sct <- RunUMAP(immune.combined.sct, reduction = "pca", dims = 1:30)
# Visualization
p1 <- DimPlot(immune.combined.sct, reduction = "umap", group.by = "stim")
p2 <- DimPlot(immune.combined.sct, reduction = "umap", group.by = "seurat_annotations", label = TRUE,
    repel = TRUE)
p1 + p2