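The SingleR package automatically annotates the cell types in single-cell data by comparing them against a reference dataset. The celldex package provides several commonly used human and mouse reference datasets: bulk RNA-seq/microarray profiles of already-annotated cell types.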

https://bioconductor.org/packages/release/bioc/vignettes/SingleR/inst/doc/SingleR.html

http://bioconductor.org/packages/release/data/experiment/vignettes/celldex/inst/doc/userguide.html

I. SingleR automatic annotation workflow

1. Prepare the input data

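The test data and the reference share the same format requirements. (1) The input can be a matrix, a sparse matrix (dgCMatrix), a data.frame, or a SummarizedExperiment object (by default, the logcounts assay is used). (2) The values must be normalized and log-transformed, which corresponds to the data slot of a Seurat object.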
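Requirement (2) asks for a log-normalized matrix. The base-R sketch below illustrates what that means. It follows the formula of Seurat's default LogNormalize method: each count is divided by the cell's total count, multiplied by scale.factor (10000), and transformed with log1p. The helper name log_normalize and the toy counts are hypothetical. In practice, use NormalizeData() or scuttle::logNormCounts(). Note that logNormCounts() uses size factors and log2, so its values differ from this sketch.

```r
# Hypothetical helper: Seurat-style LogNormalize on a genes x cells count matrix
log_normalize <- function(counts, scale.factor = 10000) {
  counts <- as.matrix(counts)          # also accepts a numeric data.frame
  lib_size <- colSums(counts)          # total counts per cell
  # divide each column (cell) by its library size, rescale, then log(1 + x)
  log1p(sweep(counts, 2, lib_size, "/") * scale.factor)
}

# Toy counts: 3 genes x 2 cells (made-up numbers)
counts <- matrix(c(10, 0, 90,
                   5, 5, 0),
                 nrow = 3,
                 dimnames = list(c("GeneA", "GeneB", "GeneC"),
                                 c("cell1", "cell2")))
norm <- log_normalize(counts)
round(norm, 3)
```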
Test data: the single-cell expression matrix whose cell types are unknown and need to be annotated.

# For example, extract it from a Seurat object (built here from a SingleCellExperiment)
library(scRNAseq)
hESCs <- LaMannoBrainData('human-es') # SingleCellExperiment object
assays(hESCs) # only counts
hESCs <- scuttle::logNormCounts(hESCs) # adds the logcounts assay
hESCs
library(Seurat)
scRNA = as.Seurat(hESCs)
scRNA
head(scRNA@meta.data)
scRNA <- NormalizeData(scRNA, normalization.method = "LogNormalize", scale.factor = 10000)
scRNA <- FindVariableFeatures(scRNA, selection.method = "vst", nfeatures = 2000) 
scRNA <- ScaleData(scRNA, features = VariableFeatures(scRNA))
scRNA <- RunPCA(scRNA, features = VariableFeatures(scRNA)) 
pc.num=1:20
scRNA <- FindNeighbors(scRNA, dims = pc.num) 
scRNA <- FindClusters(scRNA, resolution = c(0.01,0.05,0.1,0.2,0.5,0.7,0.9),
                      verbose = FALSE)
table(scRNA$originalexp_snn_res.0.2) # one metadata column per resolution

norm_count = GetAssayData(scRNA, slot="data") # sparse matrix (dgCMatrix); in Seurat v5 use layer="data"
dim(norm_count)
#[1] 18538  1715
norm_count[1:4,1:4]
# 4 x 4 sparse Matrix of class "dgCMatrix"
# 1772122_301_C02 1772122_180_E05 1772122_300_H02 1772122_180_B09
# WASH7P-p1               1.1275620       .                0.485604               .
# LINC01002-loc4          .               .                .                      .
# LOC100133331-loc1       0.7149377       0.5823631        .                      .
# LOC100132287-loc2       .               .                .                      .

Reference data: a dataset, ref, whose cell types have already been annotated:

library(celldex)
ref = HumanPrimaryCellAtlasData()
# or load a locally saved copy (see section II):
#ref = readRDS("C:/Users/xiaoxin/Desktop/生信数据/celldex/HumanPrimaryCellAtlasDatar.rds")

2. SingleR annotation

There are several cases, depending on the annotation mode and the type of reference:

(1) The reference is a SummarizedExperiment object derived from bulk RNA-seq/microarray data, and each cell is annotated individually

library(SingleR)
pred<- SingleR(test = norm_count, 
               ref = ref, 
               labels = ref$label.main)
head(pred)
table(pred$labels)
# Astrocyte         Chondrocytes Embryonic_stem_cells            iPS_cells 
# 32                    1                  127                  195 
# Neuroepithelial_cell              Neurons  Smooth_muscle_cells 
# 1030                  325                    5 


# Sanity check: the rows of pred follow the cell order of norm_count
#identical(rownames(pred),colnames(norm_count))
#TRUE

# Add the annotations to the Seurat object
scRNA$singleR_cell = pred$labels
table(scRNA$singleR_cell)
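Per-cell mode is correlation-based. For each cell, SingleR computes the Spearman correlation with every reference sample and summarizes the correlations within each label by a quantile (0.8 by default). It then assigns the highest-scoring label, followed by a fine-tuning step. The base-R sketch below reproduces only the initial scoring step. It makes two simplifying assumptions: it uses all shared genes instead of SingleR's marker selection, and it skips fine-tuning. The function score_cells and the toy matrices are hypothetical.

```r
# Hypothetical, simplified version of SingleR's initial scoring step.
# test: genes x cells; ref: genes x reference samples; labels: one per ref column.
score_cells <- function(test, ref, labels, q = 0.8) {
  genes <- intersect(rownames(test), rownames(ref))   # common genes only
  test <- test[genes, , drop = FALSE]
  ref  <- ref[genes, , drop = FALSE]
  # Spearman correlation of every cell with every reference sample
  rho <- cor(test, ref, method = "spearman")          # cells x reference samples
  lab_levels <- sort(unique(labels))
  scores <- matrix(NA_real_, nrow = ncol(test), ncol = length(lab_levels),
                   dimnames = list(colnames(test), lab_levels))
  for (lab in lab_levels) {
    # per label, summarize the correlations by their q-th quantile
    r <- rho[, labels == lab, drop = FALSE]
    scores[, lab] <- apply(r, 1, quantile, probs = q, names = FALSE)
  }
  # assign the highest-scoring label to each cell (no fine-tuning here)
  list(scores = scores,
       labels = lab_levels[max.col(scores, ties.method = "first")])
}

# Toy reference: two "T" samples high in G1/G2, two "B" samples high in G3/G4
ref <- matrix(c(9, 8, 1, 2,
                8, 9, 2, 1,
                1, 2, 9, 8,
                2, 1, 8, 9),
              nrow = 4,
              dimnames = list(paste0("G", 1:4), c("T1", "T2", "B1", "B2")))
labels <- c("T", "T", "B", "B")
test <- matrix(c(7, 6, 0, 1,    # T-like cell
                 0, 1, 5, 6),   # B-like cell
               nrow = 4,
               dimnames = list(paste0("G", 1:4), c("cellA", "cellB")))
res <- score_cells(test, ref, labels)
res$labels
```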

(2) The reference is a SummarizedExperiment object derived from bulk RNA-seq/microarray data, and annotation is done per cluster
All that is needed is the extra clusters argument:

pred<- SingleR(test = norm_count, 
               ref = ref, 
               clusters = scRNA$originalexp_snn_res.0.2,
               labels = ref$label.main)
head(pred)
table(pred$labels)
scRNA$singleR_cluster = pred$labels[match(scRNA$originalexp_snn_res.0.2,
                                          rownames(pred))]
table(scRNA$singleR_cluster)
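With clusters, SingleR annotates one aggregated profile per cluster instead of each cell. The match() call above then copies each cluster's label back to its cells. The base-R sketch below shows both steps under simplifying assumptions. First, cells are summed per cluster with rowsum(). Because the scoring is Spearman-based, summing and averaging give the same gene ranks within a cluster. Second, the cluster labels are made up. aggregate_by_cluster is a hypothetical helper, not a SingleR function.

```r
# Hypothetical helper: collapse a genes x cells matrix into genes x clusters
aggregate_by_cluster <- function(expr, clusters) {
  # rowsum() sums rows by group, so transpose to group the cells
  t(rowsum(t(expr), group = as.character(clusters)))
}

expr <- matrix(c(1, 0, 3,
                 2, 1, 0,
                 0, 4, 4),
               nrow = 3,
               dimnames = list(c("G1", "G2", "G3"), c("c1", "c2", "c3")))
clusters <- factor(c("0", "0", "1"))
agg <- aggregate_by_cluster(expr, clusters)   # one column per cluster

# Suppose the aggregated clusters were annotated like this (made-up labels);
# map them back to individual cells, as with match() above
cluster_labels <- c("0" = "Neurons", "1" = "Astrocyte")
cell_labels <- unname(cluster_labels[match(clusters, names(cluster_labels))])
cell_labels
```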

(3) The reference is a self-built expression matrix derived from bulk RNA-seq/microarray data, and each cell is annotated individually

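# Expression matrix: row names are genes, column names are cell-type labels

# If starting from raw counts (ref is the count matrix here):
# library(SummarizedExperiment)
# ref <- SummarizedExperiment(assays=list(counts=ref))
# ref <- scuttle::logNormCounts(ref)
# ref_logcount <- assay(ref, "logcounts")
# If the matrix is already log-normalized, use it directly as ref_logcount.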

ref_logcount[1:4,1:4]
pred<- SingleR(test = norm_count, 
               ref = ref_logcount, 
               labels = colnames(ref_logcount))
head(pred)
table(pred$labels)
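With a self-built matrix, two things are easy to get wrong. First, the reference and test data must use the same gene identifiers: SingleR matches genes by row name, so mixing Ensembl IDs and gene symbols leaves almost no overlap. Second, each column name is taken as that sample's label. The hypothetical pre-flight helper check_reference below reports both before calling SingleR. The toy matrices are made up.

```r
# Hypothetical pre-flight check for a custom reference matrix
check_reference <- function(test, ref_logcount) {
  common <- intersect(rownames(test), rownames(ref_logcount))
  list(n_common    = length(common),                  # shared genes
       frac_test   = length(common) / nrow(test),     # share of test genes covered
       label_count = table(colnames(ref_logcount)))   # samples per cell type
}

test <- matrix(0, nrow = 4, ncol = 2,
               dimnames = list(paste0("G", 1:4), c("cell1", "cell2")))
ref_logcount <- matrix(0, nrow = 4, ncol = 3,
                       dimnames = list(paste0("G", 2:5),
                                       c("T_cell", "T_cell", "B_cell")))
chk <- check_reference(test, ref_logcount)
chk
```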

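(4) The reference is scRNA-seq data, and each cell is annotated individually
Single-cell expression matrices are sparse, with most values zero. For this reason, the marker genes that distinguish the reference labels are better identified with the Wilcoxon rank-sum test (de.method="wilcox") than with the default log-fold-change ranking. Everything else is the same as above.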

pred<- SingleR(test = norm_count, 
               ref = ref,                # here: an annotated single-cell reference
               labels = ref$label.main,  # the column holding its cell-type labels
               de.method="wilcox")
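de.method="wilcox" changes how SingleR selects marker genes from the reference. For each pair of labels, it tests which genes are higher in one label than in the other and keeps the top genes. The base-R sketch below illustrates a single such comparison with stats::wilcox.test, ranking genes by one-sided p-value. It is a simplified stand-in: SingleR's own implementation differs in details and repeats the comparison for every pair of labels. pairwise_wilcox_markers and the toy data are hypothetical.

```r
# Hypothetical sketch of one pairwise Wilcoxon marker comparison.
# expr: genes x cells (log-expression); labels: one per cell.
pairwise_wilcox_markers <- function(expr, labels, up, vs, n = 2) {
  in_up <- labels == up
  in_vs <- labels == vs
  pvals <- apply(expr, 1, function(g) {
    # one-sided rank-sum test: is the gene higher in `up` than in `vs`?
    # (ties force the normal approximation, hence the silenced warning)
    suppressWarnings(
      wilcox.test(g[in_up], g[in_vs], alternative = "greater")$p.value
    )
  })
  head(names(sort(pvals)), n)   # top-n genes with the smallest p-values
}

expr <- rbind(G1 = c(5, 6, 7, 0, 0, 1),   # higher in A
              G2 = c(1, 2, 3, 2, 3, 1),   # no difference
              G3 = c(0, 0, 1, 5, 6, 7))   # higher in B
labels <- c("A", "A", "A", "B", "B", "B")
markers <- pairwise_wilcox_markers(expr, labels, up = "A", vs = "B")
markers
```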

(5) Visualizing the annotation scores

plotScoreHeatmap(pred)                 # scores of every cell for every label
plotDeltaDistribution(pred, ncol = 3)  # per-label distribution of score deltas
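plotDeltaDistribution() shows, for each label, the "delta" of its cells. The delta is the difference between the score of the assigned label and the median score across all labels. A small delta means the call is ambiguous. The base-R sketch below computes this quantity from a score matrix shaped like pred$scores (cells x labels). It assumes the assigned label is the top-scoring one. The function name and the scores are made up for illustration.

```r
# Hypothetical sketch: per-cell delta = top label score - median score across labels
delta_from_median <- function(scores) {
  top <- apply(scores, 1, max)      # score of the (assumed) assigned label
  med <- apply(scores, 1, median)   # median score over all labels
  top - med
}

scores <- rbind(cell1 = c(Neurons = 0.60, Astrocyte = 0.30, iPS_cells = 0.20),
                cell2 = c(Neurons = 0.41, Astrocyte = 0.40, iPS_cells = 0.39))
delta <- delta_from_median(scores)
delta   # cell2's near-zero delta marks an ambiguous assignment
```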

II. The celldex reference data package

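The celldex datasets are grouped by species:

Human reference datasets: HumanPrimaryCellAtlasData, BlueprintEncodeData, MonacoImmuneData, DatabaseImmuneCellExpressionData, NovershternHematopoieticData

Mouse reference datasets: ImmGenData, MouseRNAseqData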

To download a dataset, call the function with the same name:

library(celldex)
ref = HumanPrimaryCellAtlasData()

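The data are hosted on overseas servers, so downloads are sometimes unstable. Download the dataset while the connection is good and save it as a local object for later use: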

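# saveRDS(ref, "C:/Users/xiaoxin/Desktop/生信数据/celldex/HumanPrimaryCellAtlasDatar.rds")  # once, after downloading
ref = readRDS("C:/Users/xiaoxin/Desktop/生信数据/celldex/HumanPrimaryCellAtlasDatar.rds")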
assay(ref, "logcounts")[1:4,1:4]
unique(ref$label.main)  # broad cell types
unique(ref$label.fine)  # finer subtypes
unique(ref$label.ont)   # Cell Ontology IDs
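The download-once-then-reuse advice can be wrapped in a small helper. It reads the local copy if one exists; otherwise it calls the download function and caches the result with saveRDS(). load_or_download is a hypothetical name. The demo uses a dummy fetch function so that it runs offline, and the commented line shows the intended use with celldex.

```r
# Hypothetical helper: reuse a local copy when it exists, otherwise fetch and cache it
load_or_download <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  obj <- fetch()
  saveRDS(obj, path)
  obj
}

# Intended use (needs network the first time):
# ref <- load_or_download("HumanPrimaryCellAtlasData.rds", celldex::HumanPrimaryCellAtlasData)

# Offline demo with a dummy fetch function that counts its calls
path <- tempfile(fileext = ".rds")
calls <- 0
fake_fetch <- function() { calls <<- calls + 1; matrix(1:4, 2) }
a <- load_or_download(path, fake_fetch)   # "downloads" and caches
b <- load_or_download(path, fake_fetch)   # reads the cached file
```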
