新的一周,新的开始,今天我们来分享一个cellphoneDB的升级版,CellphoneDB的V3.0版本,升级的最大改进在于服务于空间转录组的通讯分析,会结合空间位置和生态位进行有效通讯的识别和判断,我们来看一下。
关于CellphoneDB,大家应该都不陌生,目前做通讯分析引用率最高的软件,之前的版本升级到2之前都是服务于单细胞的数据分析的,但是现在作者将软件升级到3,就是将空间转录组纳入分析,非常棒的想法和运用,参考文章在Mapping the temporal and spatial dynamics of the human endometrium in vivo and in vitro,2021年12月份发表于Nature Genetics,IF38.330,文献的内容我们下一篇再分享,这一片我们专注于CellphoneDB(V3.0)。
CellphoneDB(V3.0)的创新点
- 合并空间信息 CellPhoneDB 现在允许通过微环境文件合并细胞的空间信息。 这是一个两列文件,指示哪种细胞类型在哪个空间微环境中(参见示例见下图)。 CellphoneDB 将使用此信息来定义可能的交互细胞对(即在微环境中共享/共存的集群对)。 可以使用 cell2location(关于cell2location分享了多次,大家可以参考文章 10X单细胞和空间联合分析的方法---cell2location,10X单细胞空间联合分析之再次解读cell2location) 定义具有先验知识、成像或 Visium 分析的微环境。
cell_type | microenviroment |
---|---|
epi_Ciliated | Proliferative |
epi_Pre-ciliated | Proliferative |
epi_SOX9_LGR5 | Proliferative |
epi_SOX9_prolif | Proliferative |
epi_SOX9 | Proliferative |
Fibroblast eS | Proliferative |
Lymphoid | Proliferative |
Myeloid | Proliferative |
Fibroblast C7 | Proliferative |
epi_Ciliated | Secretory |
epi_Lumenal 1 | Secretory |
epi_Lumenal 2 | Secretory |
epi_Glandular | Secretory |
epi_Glandular_secretory | Secretory |
Fibroblast dS | Secretory |
Lymphoid | Secretory |
epi_SOX9 | Secretory |
Fibroblast C7 | Secretory |
Myeloid | Secretory |
- 添加了新的分析方法,使用差异表达基因 (DEG) 而不是 random shuffling(cellphonedb 方法 degs_analysis)。 这种方法将选择所有基因都由高于--阈值的一小部分细胞表达并且至少一个基因是 DEG 的相互作用。 可以使用喜欢的工具识别 DEG,并通过文本文件将信息提供给 CellphoneDB。 第一列应该是细胞类型/cluster,第二列应该是相关的基因 id。 其余列将被忽略(参见示例见下表)。 这里提供了为 Seurat 和 Scanpy 的参考示例。
- Database update WNT pathway has been further curated.
先来看看R版本的分析(联合Seurat)
library(Seurat)
library(SeuratObject)
library(Matrix)
so = readRDS('endometrium_example_counts_seurat.rds')
writeMM(so@assays$RNA@counts, file = 'endometrium_example_counts_mtx/matrix.mtx')
# save gene and cell names
write(x = rownames(so@assays$RNA@counts), file = "endometrium_example_counts_mtx/genes.tsv")
write(x = colnames(so@assays$RNA@counts), file = "endometrium_example_counts_mtx/barcodes.tsv")
[email protected]$Cell = rownames([email protected])
df = [email protected][, c('Cell', 'cell_type')]
write.table(df, file ='endometrium_example_meta.tsv', sep = '\t', quote = F, row.names = F)
## OPTION 1 - compute DEGs for all cell types
## Extract DEGs for each cell_type
# DEGs <- FindAllMarkers(so,
# test.use = 'LR',
# verbose = F,
# only.pos = T,
# random.seed = 1,
# logfc.threshold = 0.2,
# min.pct = 0.1,
# return.thresh = 0.05)
# OPTION 2 - optional - Re-compute hierarchical (per lineage) DEGs for Epithelial and Stromal lineages
DEGs = c()
for( lin in c('Epithelial', 'Stromal') ){
message('Computing DEGs within linage ', lin)
so_in_lineage = subset(so, cells = Cells(so)[ so$lineage == lin ] )
celltye_in_lineage = unique(so$cell_type[ so$lineage == lin ])
DEGs_lin = FindAllMarkers(so_in_lineage,
verbose = F,
only.pos = T,
random.seed = 1,
logfc.threshold = 0,
min.pct = 0.1,
return.thresh = 0.05)
DEGs = rbind(DEGs_lin, DEGs)
}
fDEGs = subset(DEGs, p_val_adj < 0.05 & avg_logFC > 0.1)
# 1st column = cluster; 2nd column = gene
fDEGs = fDEGs[, c('cluster', 'gene', 'p_val_adj', 'p_val', 'avg_logFC', 'pct.1', 'pct.2')]
write.table(fDEGs, file ='endometrium_example_DEGs.tsv', sep = '\t', quote = F, row.names = F)
Run cellphoneDB
cellphonedb method degs_analysis \
endometrium_example_meta.tsv \
endometrium_example_counts_mtx \
endometrium_example_DEGs.tsv \
--microenvs endometrium_example_microenviroments.tsv \ #optional
--counts-data hgnc_symbol \
--database database/database/cellphonedb_user_2021-06-29-11_41.db \
--threshold 0.1
python版本的分析
import numpy as np
import pandas as pd
import scanpy as sc
import anndata
import os
import sys
from scipy import sparse
sc.settings.verbosity = 1 # verbosity: errors (0), warnings (1), info (2), hints (3)
sys.executable
adata = sc.read('endometrium_example_counts.h5ad')
df_meta = pd.DataFrame(data={'Cell':list(adata.obs.index),
'cell_type':[ i for i in adata.obs['cell_type']]
})
df_meta.set_index('Cell', inplace=True)
df_meta.to_csv('endometrium_example_meta.tsv', sep = '\t')
# Conver to dense matrix for Seurat
adata.X = adata.X.toarray()
import rpy2.rinterface_lib.callbacks
import logging
# Ignore R warning messages
#Note: this can be commented out to get more verbose R output
rpy2.rinterface_lib.callbacks.logger.setLevel(logging.ERROR)
import anndata2ri
anndata2ri.activate()
%load_ext rpy2.ipython
%%R -o DEGs
library(Seurat)
so = as.Seurat(adata, counts = "X", data = "X")
Idents(so) = so$cell_type
## OPTION 1 - compute DEGs for all cell types
## Extract DEGs for each cell_type
# DEGs <- FindAllMarkers(so,
# test.use = 'LR',
# verbose = F,
# only.pos = T,
# random.seed = 1,
# logfc.threshold = 0.2,
# min.pct = 0.1,
# return.thresh = 0.05)
# OPTION 2 - optional - Re-compute hierarchical (per lineage) DEGs for Epithelial and Stromal lineages
DEGs = c()
for( lin in c('Epithelial', 'Stromal') ){
message('Computing DEGs within linage ', lin)
so_in_lineage = subset(so, cells = Cells(so)[ so$lineage == lin ] )
celltye_in_lineage = unique(so$cell_type[ so$lineage == lin ])
DEGs_lin = FindAllMarkers(so_in_lineage,
test.use = 'LR',
verbose = F,
only.pos = T,
random.seed = 1,
logfc.threshold = 0.2,
min.pct = 0.1,
return.thresh = 0.05)
DEGs = rbind(DEGs_lin, DEGs)
}
cond1 = DEGs['p_val_adj'] < 0.05
cond2 = DEGs['avg_log2FC'] > 0.1
mask = [all(tup) for tup in zip(cond1, cond2)]
fDEGs = DEGs[mask]
# 1st column = cluster; 2nd column = gene
fDEGs = fDEGs[['cluster', 'gene', 'p_val_adj', 'p_val', 'avg_log2FC', 'pct.1', 'pct.2']]
fDEGs.to_csv('endometrium_example_DEGs.tsv', index=False, sep='\t')
Run cellphoneDB
cellphonedb method degs_analysis \
endometrium_example_meta.tsv \
endometrium_example_counts.h5ad \
endometrium_example_DEGs.tsv \
--microenvs endometrium_example_microenviroments.tsv \
--counts-data hgnc_symbol \
--database database/database/cellphonedb_user_2021-06-29-11_41.db \
--threshold 0.1
至于cellphoneDB的绘图操作,大家可以参考文章空间通讯分析章节2,软件链接在CellphoneDB.
生活很好,有你更好