Garnett is an R package that facilitates automated cell type classification from single-cell expression data. It takes single-cell data together with a cell type definition (marker) file and trains a regression-based classifier. Once a classifier has been trained for a tissue or sample type, it can be applied to classify future datasets from similar tissues. Besides describing the training and classification functions, the Garnett website aims to be a repository of previously trained classifiers (https://cole-trapnell-lab.github.io/garnett/docs_m3/).
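The pre-trained classifiers currently available are listed at https://cole-trapnell-lab.github.io/garnett/classifiers/.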
1. Installing the garnett package
First install the following dependencies; skip this step if they are already installed.
# First install Bioconductor and Monocle 3
if (!requireNamespace("BiocManager"))
    install.packages("BiocManager")
BiocManager::install()
# Next install a few more dependencies
BiocManager::install(c('BiocGenerics', 'DelayedArray', 'DelayedMatrixStats',
                       'limma', 'S4Vectors', 'SingleCellExperiment',
                       'SummarizedExperiment'))
install.packages("devtools")
devtools::install_github('cole-trapnell-lab/monocle3')
Next, install garnett:
# Install a few Garnett dependencies:
BiocManager::install(c('org.Hs.eg.db', 'org.Mm.eg.db'))
# Install Garnett
devtools::install_github("cole-trapnell-lab/garnett", ref="monocle3")
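Note that ref="monocle3" is required here. Without it, the version built for the original monocle is installed, and some of its functions differ from the monocle3 version.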
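Because installing from the wrong branch is easy to miss, it can help to confirm what was actually installed. The following is a minimal sketch, assuming the packages were installed with devtools::install_github, which normally records the branch in the RemoteRef field of the package description. The helper name installed_ref is hypothetical and not part of Garnett.

```r
# Packages installed in the steps above.
pkgs <- c("BiocManager", "monocle3", "garnett", "org.Hs.eg.db")

# TRUE for every package that can be loaded, FALSE otherwise.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Hypothetical helper: return the GitHub ref a package was installed from,
# or NA if the package is missing or the field was not recorded.
installed_ref <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(NA_character_)
  ref <- utils::packageDescription(pkg)$RemoteRef
  if (is.null(ref)) NA_character_ else ref
}

print(installed_ref("garnett"))
```

If the installation followed the commands above, the last line should report the monocle3 branch; an NA only means the information is unavailable, not that the install is wrong.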
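In this tutorial I test Garnett with an existing reference classifier; if you are interested, you can also train a custom classifier of your own.
First download the classifier (here, the file hsPBMC_20191017.RDS).
Save it to a directory of your choice and load it:
# Load the packages
library(monocle3)
library(garnett)
# Load the pre-trained classifier
classifier <- readRDS("D:/test/garnett/hsPBMC_20191017.RDS")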
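readRDS fails with a fairly terse error if the path is wrong, which is a common stumbling block when copying a Windows path from a tutorial. A small wrapper can make that failure clearer. This is only a sketch; load_classifier is a hypothetical helper, not a Garnett function.

```r
# Hypothetical helper: load a saved classifier and fail early with a clear
# message when the file is not where we expect it.
load_classifier <- function(path) {
  if (!file.exists(path)) {
    stop("Classifier file not found: ", path)
  }
  readRDS(path)
}

# Usage with the path from this tutorial (adjust to your own directory):
# classifier <- load_classifier("D:/test/garnett/hsPBMC_20191017.RDS")
# class(classifier)
```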
Then load the test data:
# Load the data
# NOTE: the 'system.file' call is only necessary to read in
# the data included with the package
mat <- Matrix::readMM(system.file("extdata", "exprs_sparse.mtx", package = "garnett"))
fdata <- read.table(system.file("extdata", "fdata.txt", package = "garnett"))
pdata <- read.table(system.file("extdata", "pdata.txt", package = "garnett"),
                    sep = "\t")
row.names(mat) <- row.names(fdata)
colnames(mat) <- row.names(pdata)
# Create a new CDS object
pbmc_cds <- new_cell_data_set(as(mat, "dgCMatrix"),
                              cell_metadata = pdata,
                              gene_metadata = fdata)
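When you replace the bundled example files with your own data, the expression matrix must have one row per gene and one column per cell. The sketch below illustrates a dimension check that can be run before building the CDS. It uses a tiny made-up matrix and made-up gene and cell names, so the values are purely illustrative.

```r
library(Matrix)

# Toy data (made up): 3 genes x 2 cells.
mat <- Matrix::sparseMatrix(i = c(1, 2, 3), j = c(1, 2, 2),
                            x = c(5, 1, 2), dims = c(3, 2))
fdata <- data.frame(gene_short_name = c("CD3E", "CD19", "NKG7"),
                    row.names = c("CD3E", "CD19", "NKG7"))
pdata <- data.frame(tsne_1 = c(0.1, -0.3), tsne_2 = c(1.2, 0.4),
                    row.names = c("cell1", "cell2"))

# Genes are rows and cells are columns; stop if the tables do not line up.
stopifnot(nrow(mat) == nrow(fdata), ncol(mat) == nrow(pdata))

rownames(mat) <- rownames(fdata)
colnames(mat) <- rownames(pdata)
print(dim(mat))
```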
# Classify the cells with the pre-trained classifier
pbmc_classifier <- classifier
library(org.Hs.eg.db)
pbmc_cds <- classify_cells(pbmc_cds, pbmc_classifier,
                           db = org.Hs.eg.db,
                           cluster_extend = TRUE,
                           cds_gene_id_type = "SYMBOL")
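Setting cluster_extend = TRUE tells Garnett to also produce a cluster-extended classification.
The arguments of the classify_cells function are described in detail below: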
- cds: This is the CDS object containing your gene expression data (see above).
- classifier: This is the garnett_classifier you obtained above.
- db: A required argument naming a Bioconductor AnnotationDb-class package used for converting gene IDs. For example, for humans use org.Hs.eg.db. See the available packages on the Bioconductor website. Load your chosen db using library(db). If your species does not have an AnnotationDb-class package, see the official Garnett documentation.
- cluster_extend: This tells Garnett whether to create a second set of assignments that expands classifications to cells in the same cluster. You can either provide cluster IDs in the pData table in a column titled "garnett_cluster", or you can let Garnett calculate the clusters and populate the column. Warning: if you do not provide a "garnett_cluster" column and set cluster_extend to TRUE with a very large dataset, this function will slow down considerably. For convenience, Garnett will save the clusters it calculates to "garnett_cluster", so the function will be faster if run again.
- cds_gene_id_type: This argument tells Garnett the format of the gene IDs in your CDS object. It should be one of the values in columns(db). The default is "ENSEMBL".
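Since cds_gene_id_type must be one of the values in columns(db), it can be worth checking the value before calling classify_cells. The following sketch does that with AnnotationDbi::columns. The helper is_valid_id_type is hypothetical and not part of Garnett, and it returns NA when the annotation package is not installed rather than failing.

```r
# Hypothetical helper: check whether a gene ID type is offered by an
# AnnotationDb package. Returns NA if the check cannot be performed.
is_valid_id_type <- function(id_type, db_pkg = "org.Hs.eg.db") {
  if (!requireNamespace(db_pkg, quietly = TRUE) ||
      !requireNamespace("AnnotationDbi", quietly = TRUE)) {
    return(NA)
  }
  # The annotation object has the same name as its package,
  # e.g. org.Hs.eg.db::org.Hs.eg.db.
  db <- getExportedValue(db_pkg, db_pkg)
  id_type %in% AnnotationDbi::columns(db)
}

print(is_valid_id_type("SYMBOL"))   # the type used in this tutorial
print(is_valid_id_type("ENSEMBL"))  # the default in classify_cells
```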
Finally, take a look at the results:
head(pData(pbmc_cds))
Then count the cells assigned to each cell type:
table(pData(pbmc_cds)$cell_type)
table(pData(pbmc_cds)$cluster_ext_type)
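The two tables above are easier to compare when they are cross-tabulated, which shows how many cells changed label once the classification was extended to clusters. The sketch below uses a small made-up annotation table in place of pData(pbmc_cds); it assumes the two column names shown above and that unclassified cells carry the label "Unknown".

```r
# Made-up annotations standing in for as.data.frame(pData(pbmc_cds)).
ann <- data.frame(
  cell_type        = c("B cells", "Unknown", "T cells", "T cells", "Unknown"),
  cluster_ext_type = c("B cells", "B cells", "T cells", "T cells", "T cells")
)

# Rows: per-cell call; columns: cluster-extended call.
cross_tab <- table(ann$cell_type, ann$cluster_ext_type,
                   dnn = c("cell_type", "cluster_ext_type"))
print(cross_tab)

# Cells that were unclassified on their own but received a label
# through their cluster.
rescued <- sum(ann$cell_type == "Unknown" & ann$cluster_ext_type != "Unknown")
print(rescued)
```

On real data, replace ann with as.data.frame(pData(pbmc_cds)).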
Plot the cells colored by each of the two annotations above:
library(ggplot2)
qplot(tsne_1, tsne_2, color = cell_type, data = as.data.frame(pData(pbmc_cds))) +
    theme_bw()
qplot(tsne_1, tsne_2, color = cluster_ext_type, data = as.data.frame(pData(pbmc_cds))) +
    theme_bw()
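qplot() has been deprecated since ggplot2 3.4.0, so recent versions print a warning for the calls above. An equivalent plot can be written with ggplot() directly. The sketch below uses a small made-up data frame with the same column names (tsne_1, tsne_2, cell_type), so the coordinates and labels are illustrative only.

```r
library(ggplot2)

# Made-up coordinates and labels standing in for as.data.frame(pData(pbmc_cds)).
plot_df <- data.frame(
  tsne_1    = c(-2.1, -1.8, 3.0, 3.4, 0.2),
  tsne_2    = c(1.0, 1.3, -0.5, -0.9, 2.2),
  cell_type = c("B cells", "B cells", "T cells", "T cells", "Unknown")
)

# Same plot as the qplot() call, built from ggplot() + geom_point().
p <- ggplot(plot_df, aes(x = tsne_1, y = tsne_2, color = cell_type)) +
  geom_point() +
  theme_bw()
print(p)
```

To plot the cluster-extended labels instead, map color to cluster_ext_type.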
References:
- Official package tutorial: https://cole-trapnell-lab.github.io/garnett/docs_m3/
- Available classifiers: https://cole-trapnell-lab.github.io/garnett/classifiers/