10X单细胞 && 10X空间转录组分析之cellphoneDB可视化进阶

hello,大家好,今天周一,我们不分享那些算法思路什么的,分享一点简单的可视化,专门针对cellphoneDB的结果进行可视化,不知道为什么,一直想起吕子乔的语录,谁年轻时没喜欢过几个人渣,

关于cellphoneDB,大家绝对低估了其价值,之前分享的有关cellphoneDB的重要内容分享在这里,供大家参考。

10X单细胞(10X空间转录组)细胞通讯的一个小小的细节

考虑空间位置的通讯分析手段---CellphoneDB(V3.0)

10X单细胞(10X空间转录组)空间相关性分析和cellphoneDB与NicheNet联合进行细胞通讯分析

关于cellphoneDB,目前升级到了3.0,已经用于分析生态位的通讯分析,当然官网也有很多的介绍,包括考虑复合物等等,也可以用自己定义的配受体数据集,算法上也算是非常经典,对结果文件也有解释,放在下面,供大家查阅。

User-defined receptor-ligand datasets

Our system allows users to create their own lists of curated proteins and complexes. In order to do so, the format of the users’ lists must be compatible with the input files. Users can submit their lists using the Python package version of CellPhoneDB, and then send them via email, the cellphonedb.org form, or a pull request to the CellPhoneDB data repository (https://github.com/Teichlab/cellphonedb-data).

Database structure

Information is stored in an SQLite relational database (https://www.sqlite.org). SQLAlchemy (www.sqlalchemy.org) and Python 3 were used to build the database structure and the query logic. The application is designed to allow analysis on potentially large count matrices to be performed in parallel. This requires an efficient database design, including optimisation for query times, indices and related strategies. All application code is open source and uploaded both to github and the web server.

Statistical inference of receptor-ligand specificity(显著性)

To assess cellular crosstalk between different cell types, we use our repository in a statistical framework for inferring cell–cell communication networks from single-cell transcriptome data. We predict enriched receptor–ligand interactions between two cell types based on expression of a receptor by one cell type and a ligand by another cell type, using scRNA-seq data. To identify the most relevant interactions between cell types, we look for the cell-type specific interactions between ligands and receptors. Only receptors and ligands expressed in more than a user-specified threshold percentage of the cells in the specific cluster are considered significant (default is 0.1).

We then perform pairwise comparisons between all cell types. First, we randomly permute the cluster labels of all cells (1,000 times as a default) and determine the mean of the average receptor expression level in a cluster and the average ligand expression level in the interacting cluster. For each receptor–ligand pair in each pairwise comparison between two cell types, this generates a null distribution. By calculating the proportion of the means which are as or higher than the actual mean, we obtain a p-value for the likelihood of cell-type specificity of a given receptor–ligand complex. We then prioritize interactions that are highly enriched between cell types based on the number of significant pairs, so that the user can manually select biologically relevant ones. For the multi-subunit heteromeric complexes, we require that all subunits of the complex are expressed (using a user-specified threshold), and therefore we use the member of the complex with the minimum average expression to perform the random shuffling.

Cell subsampling for accelerating analyses(下采样)

Technological developments and protocol improvements have enabled an exponential growth of the number of cells obtained from scRNA-seq experiments. Large-scale datasets can profile hundreds of thousands cells, which presents a challenge for the existing analysis methods in terms of both memory usage and runtime. In order to improve the speed and efficiency of our protocol and facilitate its broad accessibility, we integrated subsampling as described in Hie et al.28. This "geometric sketching" approach aims to maintain the transcriptomic heterogeneity within a dataset with a smaller subset of cells. The subsampling step is optional, enabling users to perform the analysis either on all cells, or with other subsampling methods of their choice.

结果文件的解释

Table. Description of the output files means.csv, pvalues.csv, significant_means.csv and relevant_interactions.txt
Identifier Definition Output file Example
id_cp_interaction Unique CellPhoneDB identifier for each interaction stored in the database. means.csv; pvalues.csv; significant_means.csv CPI-SS096F3E0F2
interacting_pair Name of the interacting pairs separated by "|". means.csv; pvalues.csv; significant_means.csv JAG2|NOTCH4
partner A or B Identifier for the first interacting partner (A) or the second (B). It could be: UniProt (prefix simple:) or complex (prefix complex:) means.csv; pvalues.csv; significant_means.csv simple:Q9Y219
gene A or B Gene identifier for the first interacting partner (A) or the second (B). The identifier will depend on the input user list. means.csv; pvalues.csv; significant_means.csv ENSG00000184916
secreted True if one of the partners is secreted. means.csv; pvalues.csv; significant_means.csv FALSE
Receptor A or B True if the first interacting partner (A) or the second (B) is annotated as a receptor in our database. means.csv; pvalues.csv; significant_means.csv FALSE
annotation_strategy Curated if the interaction was annotated by the CellPhoneDB developers. Otherwise, the name of the database where the interaction has been downloaded from. means.csv; pvalues.csv; significant_means.csv curated
is_integrin True if one of the partners is integrin. means.csv; pvalues.csv; significant_means.csv FALSE
rank Total number of significant p-values for each interaction divided by the number of cell type-cell type comparisons. significant_means.csv 0.25
means Mean values for all the interacting partners: mean value refers to the total mean of the individual partner average expression values in the corresponding interacting pairs of cell types. If one of the mean values is 0, then the total mean is set to 0. means.csv 0.53
p.values p-values for the all the interacting partners: p.value refers to the enrichment of the interacting ligand-receptor pair in each of the interacting pairs of cell types. pvalues.csv 0.01
significant_mean Significant mean calculation for all the interacting partners. If p.value < 0.05, the value will be the mean. Alternatively, the value is set to 0. significant_means.csv 0.53
relevant_interactions Indicates if the interaction is relevant (1) or not (0). If a gene in the interaction is a DEG, and all the participants are expressed, the interaction will be classified as relevant. Alternatively, the value is set to 0. relevant_interactions.txt 1 or 0
Table. Description of the output file deconvoluted.csv
Identifier Definition Output file Example
gene_name Gene identifier for one of the subunits that are participating in the interaction defined in "means.csv" file. The identifier will depend on the input of the user list. deconvoluted.csv JAG2
uniprot UniProt identifier for one of the subunits that are participating in the interaction defined in "means.csv" file. deconvoluted.csv Q9Y219
is_complex True if the subunit is part of a complex. Single if it is not, complex if it is. deconvoluted.csv FALSE
protein_name Protein name for one of the subunits that are participating in the interaction defined in "means.csv" file. deconvoluted.csv JAG2_HUMAN
complex_name Complex name if the subunit is part of a complex. Empty if not. deconvoluted.csv a10b1 complex
id_cp_interaction Unique CellPhoneDB identifier for each of the interactions stored in the database. deconvoluted.csv CPI-SS0DB3F5A37
mean Mean expression of the corresponding gene in each cluster. 0.9

最后软件默认的可视化

图片.png

可视化进阶,这里介绍一个专门为cellphoneDB分析结果配套的可视化方法,软件在ktplots,这个软件的作者同时开发了BCR的集成分析软件dandelion,大家可以参考我的文章10X单细胞(10X空间转录组)BCR(TCR)数据分析之(11)dandelion

安装

if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
devtools::install_github('zktuong/ktplots', dependencies = TRUE)
library(Seurat)
data(kidneyimmune)
library(ktplots)

plot_cpdb

This function seems like it's the most popular so I moved it up! Please see below for alternative visualisation options.

Generates a dot plot after CellPhoneDB analysis via specifying the query celltypes and genes. The difference compared to the original cellphonedb plot is that this is totally customizable!

The plotting is largely determined by the format of the meta file provided to CellPhoneDB analysis.

For the split.by option to work, the annotation in the meta file must be defined in the following format:

{split.by}_{idents}

so to set up an example vector, it would be something like:

annotation <- paste0(kidneyimmune$Experiment, '_', kidneyimmune$celltype)

To run, you will need to load in the means.txt and pvals.txt from the analysis. If you are using results from cellphonedb version 3, the pvalues.txt is relevant_interactions.txt and also add version3 = TRUE into all the functions below.

# pvals <- read.delim("pvalues.txt", check.names = FALSE)
# means <- read.delim("means.txt", check.names = FALSE)

# I've provided an example dataset
data(cpdb_output)
plot_cpdb(cell_type1 = 'B cell', cell_type2 = 'CD4T cell', scdata = kidneyimmune,
    idents = 'celltype', # column name where the cell ids are located in the metadata
    split.by = 'Experiment', # column name where the grouping column is. Optional.
    means = means, pvals = pvals,
    genes = c("XCR1", "CXCL10", "CCL5")) +
small_axis(fontsize = 3) + small_grid() + small_guide() + small_legend(fontsize = 2) # some helper functions included in ktplots to help with the plotting
图片.png
plot_cpdb(cell_type1 = 'B cell', cell_type2 = 'CD4T cell', scdata = kidneyimmune,
    idents = 'celltype', means = means, pvals = pvals, split.by = 'Experiment',
    gene.family = 'chemokines') + small_guide() + small_axis() + small_legend(keysize=.5)
图片.png
plot_cpdb(cell_type1 = 'B cell', cell_type2 = 'CD4T cell', scdata = kidneyimmune,
    idents = 'celltype', means = means, pvals = pvals, split.by = 'Experiment',
    gene.family = 'chemokines', col_option = "maroon", highlight = "blue") + small_guide() + small_axis() + small_legend(keysize=.5)
图片.png
plot_cpdb(cell_type1 = 'B cell', cell_type2 = 'CD4T cell', scdata = kidneyimmune,
    idents = 'celltype', means = means, pvals = pvals, split.by = 'Experiment',
    gene.family = 'chemokines', col_option = viridis::cividis(50)) + small_guide() + small_axis() + small_legend(keysize=.5)
图片.png
plot_cpdb(cell_type1 = 'B cell', cell_type2 = 'CD4T cell', scdata = kidneyimmune,
    idents = 'celltype', means = means, pvals = pvals, split.by = 'Experiment',
    gene.family = 'chemokines', noir = TRUE) + small_guide() + small_axis() + small_legend(keysize=.5)
图片.png
plot_cpdb(cell_type1 = 'B cell', cell_type2 = 'CD4T cell', scdata = kidneyimmune,
    idents = 'celltype', means = means, pvals = pvals, split.by = 'Experiment',
    gene.family = 'chemokines', default_style = FALSE) + small_guide() + small_axis() + small_legend(keysize=.5)
图片.png

plot_cpdb2

library(ktplots)
data(kidneyimmune)
data(cpdb_output2)

sce <- Seurat::as.SingleCellExperiment(kidneyimmune)
p <- plot_cpdb2(cell_type1 = 'B cell', cell_type2 = 'CD4T cell',
    scdata = sce,
    idents = 'celltype', # column name where the cell ids are located in the metadata
    means = means2,
    pvals = pvals2,
    deconvoluted = decon2, # new options from here on specific to plot_cpdb2
    desiredInteractions = list(
        c('CD4T cell', 'B cell'),
        c('B cell', 'CD4T cell')),
    interaction_grouping = interaction_annotation,
    edge_group_colors = c(
        "Activating" = "#e15759",
        "Chemotaxis" = "#59a14f",
        "Inhibitory" = "#4e79a7",
        "Intracellular trafficking" = "#9c755f",
        "DC_development" = "#B07aa1",
        "Unknown" = "#e7e7e7"
        ),
    node_group_colors = c(
        "CD4T cell" = "red",
        "B cell" = "blue"),
    keep_significant_only = TRUE,
    standard_scale = TRUE,
    remove_self = TRUE
    )
p
图片.png
# code example but not using the example datasets
library(SingleCellExperiment)
library(reticulate)
library(ktplots)
ad=import('anndata')

adata = ad$read_h5ad('rna.h5ad')
counts <- Matrix::t(adata$X)
row.names(counts) <- row.names(adata$var)
colnames(counts) <- row.names(adata$obs)
sce <- SingleCellExperiment(list(counts = counts), colData = adata$obs, rowData = adata$var)

means <- read.delim('out/means.txt', check.names = FALSE)
pvalues <- read.delim('out/pvalues.txt', check.names = FALSE)
deconvoluted <- read.delim('out/deconvoluted.txt', check.names = FALSE)
interaction_grouping <- read.delim('interactions_groups.txt')
# > head(interaction_grouping)
#     interaction       role
# 1 ALOX5_ALOX5AP Activating
# 2    ANXA1_FPR1 Inhibitory
# 3 BTLA_TNFRSF14 Inhibitory
# 4     CCL5_CCR5 Chemotaxis
# 5      CD2_CD58 Activating
# 6     CD28_CD86 Activating

test <- plot_cpdb2(cell_type1 = "CD4_Tem|CD4_Tcm|CD4_Treg", # same usage style as plot_cpdb
    cell_type2 = "cDC",
    idents = 'fine_clustering',
    split.by = 'treatment_group_1',
    scdata = sce,
    means = means,
    pvals = pvalues,
    deconvoluted = deconvoluted, # new options from here on specific to plot_cpdb2
    gene_symbol_mapping = 'index', # column name in rowData holding the actual gene symbols if the row names is ENSG Ids. Might be a bit buggy
    desiredInteractions = list(c('CD4_Tcm', 'cDC1'), c('CD4_Tcm', 'cDC2'), c('CD4_Tem', 'cDC1'), c('CD4_Tem', 'cDC2 '), c('CD4_Treg', 'cDC1'), c('CD4_Treg', 'cDC2')),
    interaction_grouping = interaction_grouping,
    edge_group_colors = c("Activating" = "#e15759", "Chemotaxis" = "#59a14f", "Inhibitory" = "#4e79a7", "   Intracellular trafficking" = "#9c755f", "DC_development" = "#B07aa1"),
    node_group_colors = c("CD4_Tcm" = "#86bc86", "CD4_Tem" = "#79706e", "CD4_Treg" = "#ff7f0e", "cDC1" = "#bcbd22"  ,"cDC2" = "#17becf"),
    keep_significant_only = TRUE,
    standard_scale = TRUE,
    remove_self = TRUE)
图片.png

compare_cpdb/plot_compare_cpdb

data(covid_sample_metadata)
data(covid_cpdb_meta)
file <- system.file("extdata", "covid_cpdb.tar.gz", package = "ktplots")
# copy and unpack wherever you want this to end up
file.copy(file, ".")
system("tar -xzf covid_cpdb.tar.gz")

It requires 1) an input table like so:

covid_cpdb_meta
sample_id cellphonedb_folder sce_file
1 MH9143325 covid_cpdb/MH9143325/out covid_cpdb/MH9143325/sce.rds
2 MH9143320 covid_cpdb/MH9143320/out covid_cpdb/MH9143320/sce.rds
3 MH9143274 covid_cpdb/MH9143274/out covid_cpdb/MH9143274/sce.rds
4 MH8919226 covid_cpdb/MH8919226/out covid_cpdb/MH8919226/sce.rds
5 MH8919227 covid_cpdb/MH8919227/out covid_cpdb/MH8919227/sce.rds
6 newcastle49 covid_cpdb/newcastle49/out covid_cpdb/newcastle49/sce.rds
7 MH9179822 covid_cpdb/MH9179822/out covid_cpdb/MH9179822/sce.rds
8 MH8919178 covid_cpdb/MH8919178/out covid_cpdb/MH8919178/sce.rds
9 MH8919177 covid_cpdb/MH8919177/out covid_cpdb/MH8919177/sce.rds
10 MH8919176 covid_cpdb/MH8919176/out covid_cpdb/MH8919176/sce.rds
11 MH8919179 covid_cpdb/MH8919179/out covid_cpdb/MH8919179/sce.rds
12 MH9179826 covid_cpdb/MH9179826/out covid_cpdb/MH9179826/sce.rds

where each row is a sample, the location of the out folder generated by cellphonedb and a single-cell object used to generate the cellphonedb result. If cellphonedb was ran with a .h5ad the sce_file would be the path to the .h5ad file.

and 2) a sample metadata data frame for the tests to run:

covid_sample_metadata
sample_id Status_on_day_collection_summary
MH9143325 MH9143325 Severe
MH9143320 MH9143320 Severe
MH9143274 MH9143274 Severe
MH8919226 MH8919226 Healthy
MH8919227 MH8919227 Healthy
newcastle49 newcastle49 Severe
MH9179822 MH9179822 Severe
MH8919178 MH8919178 Healthy
MH8919177 MH8919177 Healthy
MH8919176 MH8919176 Healthy
MH8919179 MH8919179 Healthy
MH9179826 MH9179826 Severe

重点So if I want to compare between Severe vs Healthy, I would specify the function as follows:

## set up the levels
covid_sample_metadata$Status_on_day_collection_summary <- factor(covid_sample_metadata$Status_on_day_collection_summary, levels = c('Healthy', 'Severe'))

out <- compare_cpdb(cpdb_meta = covid_cpdb_meta,
    sample_metadata = covid_sample_metadata,
    celltypes = c("B_cell", "CD14", "CD16", "CD4", "CD8", "DCs", "MAIT", "NK_16hi", "NK_56hi", "Plasmablast", "Platelets", "Treg", "gdT", "pDC"), # the actual celltypes you want to test
    celltype_col = "initial_clustering", # the column that holds the cell type annotation in the sce object
    groupby = "Status_on_day_collection_summary") # the column in the sample_metadata that holds the column where you want to do the comparison. In this example, it's Severe vs Healthy

This returns a list of dataframes (for each contrast found) with which you can use to plot the results.

plot_compare_cpdb is a simple function to achieve that but you can always just make a custom plotting function based on what you want.

plot_compare_cpdb(out) # red is significantly increased in Severe compared to Healthy.
图片.png
If there are multiple contrasts and groups, you can facet the plot by specifying groups = c('group1', 'group2')
# let's mock up a new contrast like this
covid_sample_metadata$Status_on_day_collection_summary <- c(rep('Severe', 3), rep('Healthy', 2), rep('notSevere', 2), rep('Healthy', 4), 'notSevere')
out <- compare_cpdb(cpdb_meta = covid_cpdb_meta,
    sample_metadata = covid_sample_metadata,
    celltypes = c("B_cell", "CD14", "CD16", "CD4", "CD8", "DCs", "MAIT", "NK_16hi", "NK_56hi", "Plasmablast", "Platelets", "Treg", "gdT", "pDC"), # the actual celltypes you want to test
    celltype_col = "initial_clustering", # the column that holds the cell type annotation in the sce object
    groupby = "Status_on_day_collection_summary")
plot_compare_cpdb(out, alpha = .1, groups = names(out)) # there's no significant hit at 0.05 in this dummy example
图片.png

The default method uses a pairwise wilcox.test. Alternatives are pairwise Welch's t.test or a linear mixed model with lmer.

To run the linear mixed effect model, it expects that the input data is paired (i.e an individual with multiple samples corresponding to multiple timepoints):

# just as a dummy example, lets say the samples are matched where there are two samples per individual
covid_sample_metadata$individual <- rep(c("A", "B", "C", "D", "E", "F"), 2)

# actually run it
out <- compare_cpdb(cpdb_meta = covid_cpdb_meta,
    sample_metadata = covid_sample_metadata,
    celltypes = c("B_cell", "CD14", "CD16", "CD4", "CD8", "DCs", "MAIT", "NK_16hi",
            "NK_56hi", "Plasmablast", "Platelets", "Treg", "gdT", "pDC"),
    celltype_col = "initial_clustering",
    groupby = "Status_on_day_collection_summary",
    formula = "~ Status_on_day_collection_summary + (1|individual)", # formula passed to lmer
    method = "lmer")

plot_compare_cpdb(out, contrast = 'Status_on_day_collection_summarySevere') # use the colnames(out) to pick the right column.
图片.png

Specifying cluster = TRUE will move the rows and columns to make it look a bit more ordered.

plot_compare_cpdb(out, contrast = 'Status_on_day_collection_summarySevere', cluster = TRUE)
图片.png

其他绘图功能

# Note, this conflicts with tidyr devel version
geneDotPlot(scdata = kidneyimmune, # object
    genes = c("CD68", "CD80", "CD86", "CD74", "CD2", "CD5"), # genes to plot
    idents = "celltype", # column name in meta data that holds the cell-cluster ID/assignment
    split.by = 'Project', # column name in the meta data that you want to split the plotting by. If not provided, it will just plot according to idents
    standard_scale = TRUE) + # whether to scale expression values from 0 to 1. See ?geneDotPlot for other options
theme(strip.text.x = element_text(angle=0, hjust = 0, size =7)) + small_guide() + small_legend()
图片.png
correlationSpot(空间相关性通讯分析)
library(ggplot2)
scRNAseq <- Seurat::SCTransform(scRNAseq, verbose = FALSE) %>% Seurat::RunPCA(., verbose = FALSE) %>% Seurat::RunUMAP(., dims = 1:30, verbose = FALSE)
anchors <- Seurat::FindTransferAnchors(reference = scRNAseq, query = spatial, normalization.method = "SCT")
predictions.assay <- Seurat::TransferData(anchorset = anchors, refdata = scRNAseq$label, dims = 1:30, prediction.assay = TRUE, weight.reduction = spatial[["pca"]])
spatial[["predictions"]] <- predictions.assay
Seurat::DefaultAssay(spatial) <- "predictions"
Seurat::DefaultAssay(spatial) <- 'SCT'
pa <- Seurat::SpatialFeaturePlot(spatial, features = c('Tnfsf13b', 'Cd79a'), pt.size.factor = 1.6, ncol = 2, crop = TRUE) + viridis::scale_fill_viridis()
Seurat::DefaultAssay(spatial) <- 'predictions'
pb <- Seurat::SpatialFeaturePlot(spatial, features = 'Group1-3', pt.size.factor = 1.6, ncol = 2, crop = TRUE) + viridis::scale_fill_viridis()

p1 <- correlationSpot(spatial, genes = c('Tnfsf13b', 'Cd79a'), celltypes = 'Group1-3', pt.size.factor = 1.6, ncol = 2, crop = TRUE) + scale_fill_gradientn( colors = rev(RColorBrewer::brewer.pal(12, 'Spectral')),limits = c(-1, 1))
p2 <- correlationSpot(spatial, genes = c('Tnfsf13b', 'Cd79a'), celltypes = 'Group1-3', pt.size.factor = 1.6, ncol = 2, crop = TRUE, average_by_cluster = TRUE) + scale_fill_gradientn(colors = rev(RColorBrewer::brewer.pal(12, 'Spectral')),limits = c(-1, 1)) + ggtitle('correlation averaged across clusters')

cowplot::plot_grid(pa, pb, p1, p2, ncol = 2)
图片.png

好像改进的余地还挺多。

StackedVlnPlot

features <- c("CD79A", "MS4A1", "CD8A", "CD8B", "LYZ", "LGALS3", "S100A8", "GNLY", "NKG7", "KLRB1", "FCGR3A", "FCER1A", "CST3")
StackedVlnPlot(kidneyimmune, features = features) + theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8))
图片.png
rainCloudPlot(data = [email protected], groupby = "celltype", parameter = "n_counts") + coord_flip()
图片.png

生活很好,有你更好,百度文库出现了大量抄袭我的文章,对此我深表无奈,我写的文章,别人挂上去赚钱,抄袭可耻,挂到百度文库的人更可耻

你可能感兴趣的:(10X单细胞 && 10X空间转录组分析之cellphoneDB可视化进阶)