Cell Type Annotation with Garnett (Monocle 3 Version)

Garnett is an R software package that facilitates automated cell type classification from single-cell expression data. Garnett works by taking single-cell data, along with a cell type definition (marker) file, and training a regression-based classifier. Once a classifier is trained for a tissue/sample type, it can be applied to classify future datasets from similar tissues. In addition to describing the training and classification functions, the Garnett website aims to be a repository of previously trained classifiers. (Source: https://cole-trapnell-lab.github.io/garnett/docs_m3/)

The following pre-trained classifiers are currently available:

[Screenshot: list of available classifiers]

1. Install the garnett package

First install the dependencies below. If they are already installed, you can skip this step.

# First install Bioconductor and Monocle 3
if (!requireNamespace("BiocManager"))
    install.packages("BiocManager")

BiocManager::install()

# Next install a few more dependencies
BiocManager::install(c('BiocGenerics', 'DelayedArray', 'DelayedMatrixStats',
                       'limma', 'S4Vectors', 'SingleCellExperiment',
                       'SummarizedExperiment'))

install.packages("devtools")
devtools::install_github('cole-trapnell-lab/monocle3')

Then install Garnett:

# Install a few Garnett dependencies:
BiocManager::install(c('org.Hs.eg.db', 'org.Mm.eg.db'))

# Install Garnett
devtools::install_github("cole-trapnell-lab/garnett", ref="monocle3")

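Note that ref="monocle3" is required here. Without it, the version built for the original Monocle is installed, and some of its functions differ from the Monocle 3 version.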
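One way to confirm which branch was actually installed is to read the metadata recorded at install time. The sketch below assumes the package was installed with devtools::install_github, which normally writes a RemoteRef field into the installed package's DESCRIPTION. installed_ref is a hypothetical helper name used only for illustration; it is not part of Garnett.

```r
# Return the Git ref recorded when a package was installed from GitHub,
# or NA if the package is missing or carries no such metadata.
# NOTE: 'installed_ref' is an illustrative helper, not a Garnett function.
installed_ref <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(NA_character_)
  }
  ref <- utils::packageDescription(pkg)$RemoteRef
  if (is.null(ref)) NA_character_ else ref
}

ref <- installed_ref("garnett")
if (is.na(ref)) {
  message("garnett is not installed, or no install ref was recorded")
} else if (ref != "monocle3") {
  warning("garnett was installed from ref '", ref, "', not 'monocle3'")
} else {
  message("garnett was installed from the monocle3 branch")
}
```

If the warning appears, reinstalling with ref="monocle3" as shown above should fix it.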
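In this tutorial I test Garnett directly with an existing reference classifier. If you are interested, you can train your own classifier instead.
First, download the classifier: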

[Screenshot: RDS classifier download]

Save it to a directory of your choice, then load the classifier:

# Load the packages
library(monocle3)
library(garnett)
# Load the pre-trained classifier (adjust the path to where you saved the file)
classifier <- readRDS("D:/test/garnett/hsPBMC_20191017.RDS")

Next, load the test data:

# load in the data
# NOTE: the 'system.file' file name is only necessary to read in
# included package data
#
mat <- Matrix::readMM(system.file("extdata", "exprs_sparse.mtx", package = "garnett"))
fdata <- read.table(system.file("extdata", "fdata.txt", package = "garnett"))
pdata <- read.table(system.file("extdata", "pdata.txt", package = "garnett"),
                    sep="\t")
row.names(mat) <- row.names(fdata)
colnames(mat) <- row.names(pdata)

# create a new CDS object
pbmc_cds <- new_cell_data_set(as(mat, "dgCMatrix"),
                              cell_metadata = pdata,
                              gene_metadata = fdata)

pbmc_classifier <- classifier
library(org.Hs.eg.db)
pbmc_cds <- classify_cells(pbmc_cds, pbmc_classifier,
                           db = org.Hs.eg.db,
                           cluster_extend = TRUE,
                           cds_gene_id_type = "SYMBOL")

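cluster_extend = TRUE tells Garnett to also produce a second, cluster-extended set of assignments.

The arguments of classify_cells are described in detail below: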

  • cds: This is the CDS object containing your gene expression data (see above).

  • classifier: This is the garnett_classifier you obtained above.

  • db: db is a required argument for a Bioconductor AnnotationDb-class package used for converting gene IDs. For example, for humans use org.Hs.eg.db. See available packages at the Bioconductor website. Load your chosen db using library(db). If your species does not have an AnnotationDb-class package, see here.

  • cluster_extend: This tells Garnett whether to create a second set of assignments that expands classifications to cells in the same cluster. You can either provide cluster IDs in the pData table in a column titled "garnett_cluster", or you can let Garnett calculate the clusters and populate the column.

    Warning: if not providing a "garnett_cluster" column and setting cluster_extend to TRUE with a very large dataset, this function will slow down considerably. For convenience, Garnett will save the clusters it calculates to "garnett_cluster", so the function will be faster if run again.

  • cds_gene_id_type: This argument tells Garnett the format of the gene IDs in your CDS object. It should be one of the values in columns(db). The default is "ENSEMBL".
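Because cds_gene_id_type must be one of the values in columns(db), it can help to list those values before calling classify_cells. The following is a minimal sketch, assuming org.Hs.eg.db is installed as described above. The variable names valid_id_types and id_type are illustrative only.

```r
library(org.Hs.eg.db)  # human annotation database, as used in this tutorial

# List the gene ID types this database can convert between
valid_id_types <- AnnotationDbi::columns(org.Hs.eg.db)
print(valid_id_types)

# The ID type passed to classify_cells() above
id_type <- "SYMBOL"

# Stop early if the chosen ID type is not supported by the database
if (!id_type %in% valid_id_types) {
  stop("'", id_type, "' is not a valid ID type for this database")
}
```

For another species, substitute the corresponding AnnotationDb package for org.Hs.eg.db.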

Finally, inspect the results:

head(pData(pbmc_cds))
[Screenshot: output of head(pData(pbmc_cds))]

Then tabulate the number of cells assigned to each cell type:

table(pData(pbmc_cds)$cell_type)
table(pData(pbmc_cds)$cluster_ext_type)
[Screenshot: cell_type counts]

[Screenshot: cluster_ext_type counts]
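To see how the cluster-extended assignments differ from the per-cell ones, the two columns can be cross-tabulated. The sketch below uses a small made-up data frame as a stand-in for as.data.frame(pData(pbmc_cds)); the labels and counts are toy values for illustration, not results from the PBMC data.

```r
# Toy stand-in for as.data.frame(pData(pbmc_cds)); values are invented
annotations <- data.frame(
  cell_type        = c("B cells", "Unknown", "T cells",
                       "T cells", "Unknown", "B cells"),
  cluster_ext_type = c("B cells", "B cells", "T cells",
                       "T cells", "T cells", "B cells"),
  stringsAsFactors = FALSE
)

# Fraction of cells carrying each per-cell label
type_fractions <- prop.table(table(annotations$cell_type))
print(type_fractions)

# Cross-tabulate per-cell labels (rows) against cluster-extended labels (columns)
cross_tab <- table(cell_type        = annotations$cell_type,
                   cluster_ext_type = annotations$cluster_ext_type)
print(cross_tab)

# Number of cells that were "Unknown" per cell but received a label
# after cluster extension
rescued <- sum(annotations$cell_type == "Unknown" &
               annotations$cluster_ext_type != "Unknown")
print(rescued)
```

On real data, replace annotations with as.data.frame(pData(pbmc_cds)).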

Plot the cells colored by each of the two annotations above:

library(ggplot2)
qplot(tsne_1, tsne_2, color = cell_type, data = as.data.frame(pData(pbmc_cds))) + 
    theme_bw()
qplot(tsne_1, tsne_2, color = cluster_ext_type, data = as.data.frame(pData(pbmc_cds))) + 
    theme_bw()
[Figure: t-SNE plot colored by cell_type]

[Figure: t-SNE plot colored by cluster_ext_type]
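qplot() has been deprecated since ggplot2 3.4.0, so on newer installations the same plot may be better expressed with ggplot(). The sketch below shows the equivalent call on a small randomly generated data frame; the column names match those used above, but the coordinates and labels are toy values.

```r
library(ggplot2)

# Toy stand-in for as.data.frame(pData(pbmc_cds)); values are random
set.seed(1)
plot_data <- data.frame(
  tsne_1    = rnorm(30),
  tsne_2    = rnorm(30),
  cell_type = rep(c("B cells", "T cells", "Unknown"), each = 10)
)

# Same plot as the qplot() call, written with ggplot() + geom_point()
p <- ggplot(plot_data, aes(x = tsne_1, y = tsne_2, color = cell_type)) +
  geom_point() +
  theme_bw()
print(p)
```

On real data, pass as.data.frame(pData(pbmc_cds)) as the data argument and switch the color mapping to cluster_ext_type for the second plot.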

References:

  1. Official tutorial: https://cole-trapnell-lab.github.io/garnett/docs_m3/
  2. Pre-trained classifiers: https://cole-trapnell-lab.github.io/garnett/classifiers/
