hello,大家好,今天周一,我们不分享那些算法思路什么的,分享一点简单的可视化,专门针对cellphoneDB的结果进行可视化,不知道为什么,一直想起吕子乔的语录,谁年轻时没喜欢过几个人渣,
关于cellphoneDB,大家绝对低估了其价值,之前分享的有关cellphoneDB的重要内容分享在这里,供大家参考。
10X单细胞(10X空间转录组)细胞通讯的一个小小的细节
考虑空间位置的通讯分析手段---CellphoneDB(V3.0)
10X单细胞(10X空间转录组)空间相关性分析和cellphoneDB与NicheNet联合进行细胞通讯分析
关于cellphoneDB,目前升级到了3.0,已经用于分析生态位的通讯分析,当然官网也有很多的介绍,包括考虑复合物等等,也可以用自己定义的配受体数据集,算法上也算是非常经典,对结果文件也有解释,放在下面,供大家查阅。
User-defined receptor-ligand datasets
Our system allows users to create their own lists of curated proteins and complexes. In order to do so, the format of the users’ lists must be compatible with the input files. Users can submit their lists using the Python package version of CellPhoneDB, and then send them via email, the cellphonedb.org form, or a pull request to the CellPhoneDB data repository (https://github.com/Teichlab/cellphonedb-data).
Database structure
Information is stored in an SQLite relational database (https://www.sqlite.org). SQLAlchemy (www.sqlalchemy.org) and Python 3 were used to build the database structure and the query logic. The application is designed to allow analysis on potentially large count matrices to be performed in parallel. This requires an efficient database design, including optimisation for query times, indices and related strategies. All application code is open source and uploaded both to github and the web server.
Statistical inference of receptor-ligand specificity(显著性)
To assess cellular crosstalk between different cell types, we use our repository in a statistical framework for inferring cell–cell communication networks from single-cell transcriptome data. We predict enriched receptor–ligand interactions between two cell types based on expression of a receptor by one cell type and a ligand by another cell type, using scRNA-seq data. To identify the most relevant interactions between cell types, we look for the cell-type specific interactions between ligands and receptors. Only receptors and ligands expressed in more than a user-specified threshold percentage of the cells in the specific cluster are considered significant (default is 0.1).
We then perform pairwise comparisons between all cell types. First, we randomly permute the cluster labels of all cells (1,000 times as a default) and determine the mean of the average receptor expression level in a cluster and the average ligand expression level in the interacting cluster. For each receptor–ligand pair in each pairwise comparison between two cell types, this generates a null distribution. By calculating the proportion of the means which are as or higher than the actual mean, we obtain a p-value for the likelihood of cell-type specificity of a given receptor–ligand complex. We then prioritize interactions that are highly enriched between cell types based on the number of significant pairs, so that the user can manually select biologically relevant ones. For the multi-subunit heteromeric complexes, we require that all subunits of the complex are expressed (using a user-specified threshold), and therefore we use the member of the complex with the minimum average expression to perform the random shuffling.
Cell subsampling for accelerating analyses(下采样)
Technological developments and protocol improvements have enabled an exponential growth of the number of cells obtained from scRNA-seq experiments. Large-scale datasets can profile hundreds of thousands cells, which presents a challenge for the existing analysis methods in terms of both memory usage and runtime. In order to improve the speed and efficiency of our protocol and facilitate its broad accessibility, we integrated subsampling as described in Hie et al.28. This "geometric sketching" approach aims to maintain the transcriptomic heterogeneity within a dataset with a smaller subset of cells. The subsampling step is optional, enabling users to perform the analysis either on all cells, or with other subsampling methods of their choice.
结果文件的解释
Table. Description of the output files means.csv, pvalues.csv, significant_means.csv and relevant_interactions.txt
Identifier | Definition | Output file | Example |
---|---|---|---|
id_cp_interaction | Unique CellPhoneDB identifier for each interaction stored in the database. | means.csv; pvalues.csv; significant_means.csv | CPI-SS096F3E0F2 |
interacting_pair | Name of the interacting pairs separated by "|". | means.csv; pvalues.csv; significant_means.csv | JAG2|NOTCH4 |
partner A or B | Identifier for the first interacting partner (A) or the second (B). It could be: UniProt (prefix simple:) or complex (prefix complex:) | means.csv; pvalues.csv; significant_means.csv | simple:Q9Y219 |
gene A or B | Gene identifier for the first interacting partner (A) or the second (B). The identifier will depend on the input user list. | means.csv; pvalues.csv; significant_means.csv | ENSG00000184916 |
secreted | True if one of the partners is secreted. | means.csv; pvalues.csv; significant_means.csv | FALSE |
Receptor A or B | True if the first interacting partner (A) or the second (B) is annotated as a receptor in our database. | means.csv; pvalues.csv; significant_means.csv | FALSE |
annotation_strategy | Curated if the interaction was annotated by the CellPhoneDB developers. Otherwise, the name of the database where the interaction has been downloaded from. | means.csv; pvalues.csv; significant_means.csv | curated |
is_integrin | True if one of the partners is integrin. | means.csv; pvalues.csv; significant_means.csv | FALSE |
rank | Total number of significant p-values for each interaction divided by the number of cell type-cell type comparisons. | significant_means.csv | 0.25 |
means | Mean values for all the interacting partners: mean value refers to the total mean of the individual partner average expression values in the corresponding interacting pairs of cell types. If one of the mean values is 0, then the total mean is set to 0. | means.csv | 0.53 |
p.values | p-values for the all the interacting partners: p.value refers to the enrichment of the interacting ligand-receptor pair in each of the interacting pairs of cell types. | pvalues.csv | 0.01 |
significant_mean | Significant mean calculation for all the interacting partners. If p.value < 0.05, the value will be the mean. Alternatively, the value is set to 0. | significant_means.csv | 0.53 |
relevant_interactions | Indicates if the interaction is relevant (1) or not (0). If a gene in the interaction is a DEG, and all the participants are expressed, the interaction will be classified as relevant. Alternatively, the value is set to 0. | relevant_interactions.txt | 1 or 0 |
Table. Description of the output file deconvoluted.csv
Identifier | Definition | Output file | Example |
---|---|---|---|
gene_name | Gene identifier for one of the subunits that are participating in the interaction defined in "means.csv" file. The identifier will depend on the input of the user list. | deconvoluted.csv | JAG2 |
uniprot | UniProt identifier for one of the subunits that are participating in the interaction defined in "means.csv" file. | deconvoluted.csv | Q9Y219 |
is_complex | True if the subunit is part of a complex. Single if it is not, complex if it is. | deconvoluted.csv | FALSE |
protein_name | Protein name for one of the subunits that are participating in the interaction defined in "means.csv" file. | deconvoluted.csv | JAG2_HUMAN |
complex_name | Complex name if the subunit is part of a complex. Empty if not. | deconvoluted.csv | a10b1 complex |
id_cp_interaction | Unique CellPhoneDB identifier for each of the interactions stored in the database. | deconvoluted.csv | CPI-SS0DB3F5A37 |
mean | Mean expression of the corresponding gene in each cluster. | 0.9 |
最后软件默认的可视化
可视化进阶,这里介绍一个专门为cellphoneDB分析结果配套的可视化方法,软件在ktplots,这个软件的作者同时开发了BCR的集成分析软件dandelion,大家可以参考我的文章10X单细胞(10X空间转录组)BCR(TCR)数据分析之(11)dandelion
安装
if (!requireNamespace("devtools", quietly = TRUE))
install.packages("devtools")
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
devtools::install_github('zktuong/ktplots', dependencies = TRUE)
library(Seurat)
data(kidneyimmune)
library(ktplots)
plot_cpdb
This function seems like it's the most popular so I moved it up! Please see below for alternative visualisation options.
Generates a dot plot after CellPhoneDB analysis via specifying the query celltypes and genes. The difference compared to the original cellphonedb plot is that this is totally customizable!
The plotting is largely determined by the format of the meta file provided to CellPhoneDB analysis.
For the split.by option to work, the annotation in the meta file must be defined in the following format:
{split.by}_{idents}
so to set up an example vector, it would be something like:
annotation <- paste0(kidneyimmune$Experiment, '_', kidneyimmune$celltype)
To run, you will need to load in the means.txt and pvals.txt from the analysis. If you are using results from cellphonedb version 3, the pvalues.txt is relevant_interactions.txt and also add version3 = TRUE into all the functions below.
# pvals <- read.delim("pvalues.txt", check.names = FALSE)
# means <- read.delim("means.txt", check.names = FALSE)
# I've provided an example dataset
data(cpdb_output)
plot_cpdb(cell_type1 = 'B cell', cell_type2 = 'CD4T cell', scdata = kidneyimmune,
idents = 'celltype', # column name where the cell ids are located in the metadata
split.by = 'Experiment', # column name where the grouping column is. Optional.
means = means, pvals = pvals,
genes = c("XCR1", "CXCL10", "CCL5")) +
small_axis(fontsize = 3) + small_grid() + small_guide() + small_legend(fontsize = 2) # some helper functions included in ktplots to help with the plotting
plot_cpdb(cell_type1 = 'B cell', cell_type2 = 'CD4T cell', scdata = kidneyimmune,
idents = 'celltype', means = means, pvals = pvals, split.by = 'Experiment',
gene.family = 'chemokines') + small_guide() + small_axis() + small_legend(keysize=.5)
plot_cpdb(cell_type1 = 'B cell', cell_type2 = 'CD4T cell', scdata = kidneyimmune,
idents = 'celltype', means = means, pvals = pvals, split.by = 'Experiment',
gene.family = 'chemokines', col_option = "maroon", highlight = "blue") + small_guide() + small_axis() + small_legend(keysize=.5)
plot_cpdb(cell_type1 = 'B cell', cell_type2 = 'CD4T cell', scdata = kidneyimmune,
idents = 'celltype', means = means, pvals = pvals, split.by = 'Experiment',
gene.family = 'chemokines', col_option = viridis::cividis(50)) + small_guide() + small_axis() + small_legend(keysize=.5)
plot_cpdb(cell_type1 = 'B cell', cell_type2 = 'CD4T cell', scdata = kidneyimmune,
idents = 'celltype', means = means, pvals = pvals, split.by = 'Experiment',
gene.family = 'chemokines', noir = TRUE) + small_guide() + small_axis() + small_legend(keysize=.5)
plot_cpdb(cell_type1 = 'B cell', cell_type2 = 'CD4T cell', scdata = kidneyimmune,
idents = 'celltype', means = means, pvals = pvals, split.by = 'Experiment',
gene.family = 'chemokines', default_style = FALSE) + small_guide() + small_axis() + small_legend(keysize=.5)
plot_cpdb2
library(ktplots)
data(kidneyimmune)
data(cpdb_output2)
sce <- Seurat::as.SingleCellExperiment(kidneyimmune)
p <- plot_cpdb2(cell_type1 = 'B cell', cell_type2 = 'CD4T cell',
scdata = sce,
idents = 'celltype', # column name where the cell ids are located in the metadata
means = means2,
pvals = pvals2,
deconvoluted = decon2, # new options from here on specific to plot_cpdb2
desiredInteractions = list(
c('CD4T cell', 'B cell'),
c('B cell', 'CD4T cell')),
interaction_grouping = interaction_annotation,
edge_group_colors = c(
"Activating" = "#e15759",
"Chemotaxis" = "#59a14f",
"Inhibitory" = "#4e79a7",
"Intracellular trafficking" = "#9c755f",
"DC_development" = "#B07aa1",
"Unknown" = "#e7e7e7"
),
node_group_colors = c(
"CD4T cell" = "red",
"B cell" = "blue"),
keep_significant_only = TRUE,
standard_scale = TRUE,
remove_self = TRUE
)
p
# code example but not using the example datasets
library(SingleCellExperiment)
library(reticulate)
library(ktplots)
ad=import('anndata')
adata = ad$read_h5ad('rna.h5ad')
counts <- Matrix::t(adata$X)
row.names(counts) <- row.names(adata$var)
colnames(counts) <- row.names(adata$obs)
sce <- SingleCellExperiment(list(counts = counts), colData = adata$obs, rowData = adata$var)
means <- read.delim('out/means.txt', check.names = FALSE)
pvalues <- read.delim('out/pvalues.txt', check.names = FALSE)
deconvoluted <- read.delim('out/deconvoluted.txt', check.names = FALSE)
interaction_grouping <- read.delim('interactions_groups.txt')
# > head(interaction_grouping)
# interaction role
# 1 ALOX5_ALOX5AP Activating
# 2 ANXA1_FPR1 Inhibitory
# 3 BTLA_TNFRSF14 Inhibitory
# 4 CCL5_CCR5 Chemotaxis
# 5 CD2_CD58 Activating
# 6 CD28_CD86 Activating
test <- plot_cpdb2(cell_type1 = "CD4_Tem|CD4_Tcm|CD4_Treg", # same usage style as plot_cpdb
cell_type2 = "cDC",
idents = 'fine_clustering',
split.by = 'treatment_group_1',
scdata = sce,
means = means,
pvals = pvalues,
deconvoluted = deconvoluted, # new options from here on specific to plot_cpdb2
gene_symbol_mapping = 'index', # column name in rowData holding the actual gene symbols if the row names is ENSG Ids. Might be a bit buggy
desiredInteractions = list(c('CD4_Tcm', 'cDC1'), c('CD4_Tcm', 'cDC2'), c('CD4_Tem', 'cDC1'), c('CD4_Tem', 'cDC2 '), c('CD4_Treg', 'cDC1'), c('CD4_Treg', 'cDC2')),
interaction_grouping = interaction_grouping,
edge_group_colors = c("Activating" = "#e15759", "Chemotaxis" = "#59a14f", "Inhibitory" = "#4e79a7", " Intracellular trafficking" = "#9c755f", "DC_development" = "#B07aa1"),
node_group_colors = c("CD4_Tcm" = "#86bc86", "CD4_Tem" = "#79706e", "CD4_Treg" = "#ff7f0e", "cDC1" = "#bcbd22" ,"cDC2" = "#17becf"),
keep_significant_only = TRUE,
standard_scale = TRUE,
remove_self = TRUE)
compare_cpdb/plot_compare_cpdb
data(covid_sample_metadata)
data(covid_cpdb_meta)
file <- system.file("extdata", "covid_cpdb.tar.gz", package = "ktplots")
# copy and unpack wherever you want this to end up
file.copy(file, ".")
system("tar -xzf covid_cpdb.tar.gz")
It requires 1) an input table like so:
covid_cpdb_meta
sample_id cellphonedb_folder sce_file
1 MH9143325 covid_cpdb/MH9143325/out covid_cpdb/MH9143325/sce.rds
2 MH9143320 covid_cpdb/MH9143320/out covid_cpdb/MH9143320/sce.rds
3 MH9143274 covid_cpdb/MH9143274/out covid_cpdb/MH9143274/sce.rds
4 MH8919226 covid_cpdb/MH8919226/out covid_cpdb/MH8919226/sce.rds
5 MH8919227 covid_cpdb/MH8919227/out covid_cpdb/MH8919227/sce.rds
6 newcastle49 covid_cpdb/newcastle49/out covid_cpdb/newcastle49/sce.rds
7 MH9179822 covid_cpdb/MH9179822/out covid_cpdb/MH9179822/sce.rds
8 MH8919178 covid_cpdb/MH8919178/out covid_cpdb/MH8919178/sce.rds
9 MH8919177 covid_cpdb/MH8919177/out covid_cpdb/MH8919177/sce.rds
10 MH8919176 covid_cpdb/MH8919176/out covid_cpdb/MH8919176/sce.rds
11 MH8919179 covid_cpdb/MH8919179/out covid_cpdb/MH8919179/sce.rds
12 MH9179826 covid_cpdb/MH9179826/out covid_cpdb/MH9179826/sce.rds
where each row is a sample, the location of the out folder generated by cellphonedb and a single-cell object used to generate the cellphonedb result. If cellphonedb was ran with a .h5ad the sce_file would be the path to the .h5ad file.
and 2) a sample metadata data frame for the tests to run:
covid_sample_metadata
sample_id Status_on_day_collection_summary
MH9143325 MH9143325 Severe
MH9143320 MH9143320 Severe
MH9143274 MH9143274 Severe
MH8919226 MH8919226 Healthy
MH8919227 MH8919227 Healthy
newcastle49 newcastle49 Severe
MH9179822 MH9179822 Severe
MH8919178 MH8919178 Healthy
MH8919177 MH8919177 Healthy
MH8919176 MH8919176 Healthy
MH8919179 MH8919179 Healthy
MH9179826 MH9179826 Severe
重点So if I want to compare between Severe vs Healthy, I would specify the function as follows:
## set up the levels
covid_sample_metadata$Status_on_day_collection_summary <- factor(covid_sample_metadata$Status_on_day_collection_summary, levels = c('Healthy', 'Severe'))
out <- compare_cpdb(cpdb_meta = covid_cpdb_meta,
sample_metadata = covid_sample_metadata,
celltypes = c("B_cell", "CD14", "CD16", "CD4", "CD8", "DCs", "MAIT", "NK_16hi", "NK_56hi", "Plasmablast", "Platelets", "Treg", "gdT", "pDC"), # the actual celltypes you want to test
celltype_col = "initial_clustering", # the column that holds the cell type annotation in the sce object
groupby = "Status_on_day_collection_summary") # the column in the sample_metadata that holds the column where you want to do the comparison. In this example, it's Severe vs Healthy
This returns a list of dataframes (for each contrast found) with which you can use to plot the results.
plot_compare_cpdb is a simple function to achieve that but you can always just make a custom plotting function based on what you want.
plot_compare_cpdb(out) # red is significantly increased in Severe compared to Healthy.
If there are multiple contrasts and groups, you can facet the plot by specifying groups = c('group1', 'group2')
# let's mock up a new contrast like this
covid_sample_metadata$Status_on_day_collection_summary <- c(rep('Severe', 3), rep('Healthy', 2), rep('notSevere', 2), rep('Healthy', 4), 'notSevere')
out <- compare_cpdb(cpdb_meta = covid_cpdb_meta,
sample_metadata = covid_sample_metadata,
celltypes = c("B_cell", "CD14", "CD16", "CD4", "CD8", "DCs", "MAIT", "NK_16hi", "NK_56hi", "Plasmablast", "Platelets", "Treg", "gdT", "pDC"), # the actual celltypes you want to test
celltype_col = "initial_clustering", # the column that holds the cell type annotation in the sce object
groupby = "Status_on_day_collection_summary")
plot_compare_cpdb(out, alpha = .1, groups = names(out)) # there's no significant hit at 0.05 in this dummy example
The default method uses a pairwise wilcox.test. Alternatives are pairwise Welch's t.test or a linear mixed model with lmer.
To run the linear mixed effect model, it expects that the input data is paired (i.e an individual with multiple samples corresponding to multiple timepoints):
# just as a dummy example, lets say the samples are matched where there are two samples per individual
covid_sample_metadata$individual <- rep(c("A", "B", "C", "D", "E", "F"), 2)
# actually run it
out <- compare_cpdb(cpdb_meta = covid_cpdb_meta,
sample_metadata = covid_sample_metadata,
celltypes = c("B_cell", "CD14", "CD16", "CD4", "CD8", "DCs", "MAIT", "NK_16hi",
"NK_56hi", "Plasmablast", "Platelets", "Treg", "gdT", "pDC"),
celltype_col = "initial_clustering",
groupby = "Status_on_day_collection_summary",
formula = "~ Status_on_day_collection_summary + (1|individual)", # formula passed to lmer
method = "lmer")
plot_compare_cpdb(out, contrast = 'Status_on_day_collection_summarySevere') # use the colnames(out) to pick the right column.
Specifying cluster = TRUE will move the rows and columns to make it look a bit more ordered.
plot_compare_cpdb(out, contrast = 'Status_on_day_collection_summarySevere', cluster = TRUE)
其他绘图功能
# Note, this conflicts with tidyr devel version
geneDotPlot(scdata = kidneyimmune, # object
genes = c("CD68", "CD80", "CD86", "CD74", "CD2", "CD5"), # genes to plot
idents = "celltype", # column name in meta data that holds the cell-cluster ID/assignment
split.by = 'Project', # column name in the meta data that you want to split the plotting by. If not provided, it will just plot according to idents
standard_scale = TRUE) + # whether to scale expression values from 0 to 1. See ?geneDotPlot for other options
theme(strip.text.x = element_text(angle=0, hjust = 0, size =7)) + small_guide() + small_legend()
correlationSpot(空间相关性通讯分析)
library(ggplot2)
scRNAseq <- Seurat::SCTransform(scRNAseq, verbose = FALSE) %>% Seurat::RunPCA(., verbose = FALSE) %>% Seurat::RunUMAP(., dims = 1:30, verbose = FALSE)
anchors <- Seurat::FindTransferAnchors(reference = scRNAseq, query = spatial, normalization.method = "SCT")
predictions.assay <- Seurat::TransferData(anchorset = anchors, refdata = scRNAseq$label, dims = 1:30, prediction.assay = TRUE, weight.reduction = spatial[["pca"]])
spatial[["predictions"]] <- predictions.assay
Seurat::DefaultAssay(spatial) <- "predictions"
Seurat::DefaultAssay(spatial) <- 'SCT'
pa <- Seurat::SpatialFeaturePlot(spatial, features = c('Tnfsf13b', 'Cd79a'), pt.size.factor = 1.6, ncol = 2, crop = TRUE) + viridis::scale_fill_viridis()
Seurat::DefaultAssay(spatial) <- 'predictions'
pb <- Seurat::SpatialFeaturePlot(spatial, features = 'Group1-3', pt.size.factor = 1.6, ncol = 2, crop = TRUE) + viridis::scale_fill_viridis()
p1 <- correlationSpot(spatial, genes = c('Tnfsf13b', 'Cd79a'), celltypes = 'Group1-3', pt.size.factor = 1.6, ncol = 2, crop = TRUE) + scale_fill_gradientn( colors = rev(RColorBrewer::brewer.pal(12, 'Spectral')),limits = c(-1, 1))
p2 <- correlationSpot(spatial, genes = c('Tnfsf13b', 'Cd79a'), celltypes = 'Group1-3', pt.size.factor = 1.6, ncol = 2, crop = TRUE, average_by_cluster = TRUE) + scale_fill_gradientn(colors = rev(RColorBrewer::brewer.pal(12, 'Spectral')),limits = c(-1, 1)) + ggtitle('correlation averaged across clusters')
cowplot::plot_grid(pa, pb, p1, p2, ncol = 2)
好像改进的余地还挺多。
StackedVlnPlot
features <- c("CD79A", "MS4A1", "CD8A", "CD8B", "LYZ", "LGALS3", "S100A8", "GNLY", "NKG7", "KLRB1", "FCGR3A", "FCER1A", "CST3")
StackedVlnPlot(kidneyimmune, features = features) + theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8))
rainCloudPlot(data = [email protected], groupby = "celltype", parameter = "n_counts") + coord_flip()
生活很好,有你更好,百度文库出现了大量抄袭我的文章,对此我深表无奈,我写的文章,别人挂上去赚钱,抄袭可耻,挂到百度文库的人更可耻