10X single-cell (10X spatial transcriptomics) communication analysis: a summary (scSeqComm)

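First day back at work after the holiday, with the New Year red envelope in hand. How is everyone's work going? In this post we summarize cell communication analysis for single-cell data, based on the article “Identify, quantify and characterize cellular communication from single-cell RNA sequencing data with scSeqComm”. A new year is a new start, so let's do a round-up.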

Two main signaling mechanisms are involved in cellular communication:
  • intercellular signaling through ligand–receptor protein interactions
  • intracellular signaling triggered by the receptor
Overview of software for cell communication analysis (I have written posts on all of these tools):
Intercellular signaling methods
[Figure: intercellular signaling methods]
Intercellular + intracellular signaling methods
[Figure: intercellular + intracellular signaling methods]

So real cellular communication comprises two parts: intercellular signaling and the related intracellular signaling.

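  • First, a new method to identify and quantify intercellular signaling from scRNA-seq data. The proposed intercellular scoring scheme can handle the multi-subunit structure of ligands and receptors included in the latest ligand-receptor databases. It is also more conservative than commonly used methods, with the specific goal of reducing and prioritizing experimental targets.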

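  • Second, a computational procedure to quantify the effect of ongoing intracellular signaling in the receiving cells on the activation of known biological signaling pathways. Quantifying evidence of both intercellular and intracellular signaling allows the two aspects to be integrated, which yields a more reliable way to infer ongoing cellular communication from scRNA-seq data. The genes associated with the detected communication effects can also be used to functionally characterize the effect of intercellular communication through GO enrichment analysis.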

[Figure: overview of scSeqComm]

Now let's look at the code (the scSeqComm package)

Installation
library(devtools)
install_gitlab("sysbiobig/scseqcomm")
Quick start

scSeqComm performs the identification, quantification and functional characterization of cellular communication from scRNA-seq data.

First, it identifies and quantifies ongoing intercellular and intracellular signaling through the function scSeqComm_analyze(). Then, it functionally characterizes the cellular response in the receiving cells through the function scSeqComm_GO_analysis(). Analysis results can be saved in csv/excel files or visualized using one of the many plot functions such as scSeqComm_plot_scores().

scSeqComm requires four inputs:

  • scRNA-seq dataset
  • ligand - receptor pairs
  • transcriptional regulatory network
  • receptor - transcription factor a-priori association from gene signaling networks

The scRNA-seq dataset must be provided by the user. For the other three inputs, users can either provide their own ligand-receptor pairs, transcriptional regulatory network and receptor-transcription factor a-priori association, or use the ones included in the package.

library(scSeqComm)

########## Load scRNA-seq data ##########
# load example data (gene expression matrix and cell groups) from Tirosh et al. (2016)
data(Example_data_GSE72056)
gene_expr_matrix <- Example_data_GSE72056$GSE72056_log2tpm_filtered
cell_cluster <- Example_data_GSE72056$GSE72056_clusters

########## Ligand-receptor pairs ##########
# let's use the ligand-receptor pairs from Kumar et al. (2018)
data(LR_pairs_Kumar_2018)
LR_db <- LR_pairs_Kumar_2018

########## Transcriptional regulatory networks ##########
# let's use the transcriptional regulatory network obtained by merging TRRUST v2, HTRIdb and RegNetwork
data(TF_TG_TRRUSTv2_HTRIdb_RegNetwork_High)
TF_TG_db <- TF_TG_TRRUSTv2_HTRIdb_RegNetwork_High

########## Receptor-Transcription factor a-priori association #########
# let's use the R-TF a-priori association obtained by KEGG signaling networks
data(TF_PPR_KEGG_human)
TF_PPR <- TF_PPR_KEGG_human

########## Identify and quantify intercellular and intracellular signaling ##########
num_core <- 2
scSeqComm_res <- scSeqComm_analyze(gene_expr = gene_expr_matrix,
                                  cell_group = cell_cluster,
                                  LR_pairs_DB = LR_db,
                                  TF_reg_DB = TF_TG_db, 
                                  R_TF_association = TF_PPR,
                                  N_cores = num_core)
head(scSeqComm_res$comm_results)

The main output of scSeqComm_analyze() is a dataframe (comm_results) containing several pieces of information for each ligand-receptor pair across any pair of cell clusters. A complete description of the scSeqComm_analyze() output is provided in the dedicated section. The most important information contained in comm_results includes:

  • ligands and receptors names and their expressing clusters
  • evidence of an ongoing intercellular signaling (S_inter score, range [0,1])
  • evidence of an ongoing intracellular signaling (S_intra score, range [0,1])
  • differentially expressed genes connected to the given receptor through the specified pathway
An example of the above information is shown in the table below:

| ligand | receptor | cluster_L | cluster_R | S_inter | S_intra | pathway | list_genes |
|---|---|---|---|---|---|---|---|
| CCL5 | CCR5 | T_cell | Macrophage | 0.985 | 0.823 | Chemokine signaling pathway | TNFSF10,IFI27,CD44,… |
| CCL5 | CCR5 | T_cell | Macrophage | 0.985 | 0.654 | Human immunodeficiency virus 1 infection | MMP2,SOX7,CLU,EZR,… |
| CCL5 | CCR5 | B_cell | Macrophage | 0.782 | 0.823 | Chemokine signaling pathway | TNFSF10,CASP10,CD44,… |
| CCL5 | CCR5 | B_cell | Macrophage | 0.782 | 0.654 | Human immunodeficiency virus 1 infection | MMP2,SOX7,CLU,EZR,… |
| CCL5 | CCR5 | Macrophage | B_cell | 0.357 | 0.024 | Chemokine signaling pathway | FBXO32,TNFSF10,VEGFA,… |
| CCL5 | CCR5 | Macrophage | B_cell | 0.357 | 0.001 | Human immunodeficiency virus 1 infection | GSTP1,FOS,MAPK1,… |
| A2M | LRP1 | T_cell | B_cell | 0.324 | NA | NA | NA |

Receptors with no information about intracellular signaling (i.e. NA values) are those without known downstream genes in the pathway database.

The data in the comm_results table allow users to quantify the ongoing cellular communication in terms of evidence of ongoing intercellular signaling and of consequent intracellular signaling in the receiving cells. We suggest that users prioritize their analyses and focus first on the strongest signals (see the functions scSeqComm_select() and scSeqComm_summaryze_S_intra()).

Once the evidence of ongoing intercellular and intracellular signaling has been computed, many analyses can be performed. As an example, the following code visualizes the amount of ongoing cellular communication among cell types based on the evidence of intercellular signaling.

#summarise S_intra score for each ligand-receptor pair and cell cluster couple as max S_intra values
inter_max_intra_scores <- scSeqComm_summaryze_S_intra(scSeqComm_res$comm_results)

## select communication with "medium" intercellular evidence (e.g. > 0.5)
selected_comm <- scSeqComm_select(inter_max_intra_scores, 
                                  S_inter = 0.5)

# interactive chord diagram - Figure A)
scSeqComm_chorddiag_cardinality(data = selected_comm)
# heatmap - Figure B)
scSeqComm_heatmap_cardinality(data = selected_comm, 
                              title = "Ongoing cellular communication")
[Figure: A) interactive chord diagram and B) heatmap of ongoing cellular communication]

As a second example, the following code visualizes the evidence of ongoing intercellular and intracellular signaling for a set of selected ligand-receptor pairs.

# List of receptors
selected_rec <- c("CXCR3", "CXCR4", "ITGA5")

# Select results involving the above receptors
selected_comm <- scSeqComm_select(inter_max_intra_scores,
                                  receptor = selected_rec)

# Figure A)
scSeqComm_plot_scores(selected_comm, 
                      title = "Ligand-receptor pairs involving receptors CXCR3, CXCR4 and ITGA5")

# Figure B)
pair <- "ADAM15 - ITGA5" 
scSeqComm_plot_LR_pair(data = inter_max_intra_scores, 
                       title = pair, 
                       selected_LR_pair = pair)

# Figure C)
scSeqComm_plot_cluster_pair (data = selected_comm, 
                             selected_cluster_pair = "CAF --> NK_cell", 
                             title = "CAF --> NK cells")
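[Figure: A) scores of ligand-receptor pairs involving CXCR3, CXCR4 and ITGA5; B) the ADAM15 - ITGA5 pair; C) CAF --> NK cells]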

Once the evidence of ongoing intercellular and intracellular signaling has been computed, the cellular response associated with the ongoing cellular communication can be functionally characterized through the function scSeqComm_GO_analysis(). The functional characterization is performed through a Gene Ontology enrichment analysis on the differentially expressed genes downstream of the receptors.

As an example, the following code performs the functional characterization of the cellular response in cell cluster CAF. It considers only intercellular and intracellular signaling with evidence higher than 0.8.

# CAF communication with strong intercellular and intracellular evidence
CAF_comm <- scSeqComm_select(data = scSeqComm_res$comm_results,
                             cluster_R = "CAF",
                             S_inter = 0.8, S_intra = 0.8,
                             NA_included = FALSE)

# GO analysis of CAF communication
geneUniverse <- unique(unlist(scSeqComm_res$TF_reg_DB_scrnaseq))

CAF_cell_functional_response <- scSeqComm_GO_analysis(results_signaling = CAF_comm, 
                                                    geneUniverse = geneUniverse,
                                                    method = "general")

The output is a table of GO terms with their associated p-values. An example output is shown in the table below.

[Table: example GO enrichment output for CAF communication]

The following code allows to visualize intercellular and intracellular signaling evidence of the selected pairs in CAF cells, together with the information about the functional response of CAF cells.

# ligand-receptor pairs of interest
CAF_cell_pairs <- unique(CAF_comm$LR_pair)

# select communication related to ligand-receptor pairs of interest
CAF_cell_comm <- scSeqComm_select(scSeqComm_res$comm_results,
                                  cluster_R = "CAF", 
                                  LR_pair = CAF_cell_pairs)

# plot scores and GO terms having adjusted pvalue < 0.05
scSeqComm_plot_scores(data = CAF_cell_comm, 
                       title = "CAF intercellular, intracellular and functional analysis",
                       facet_grid_y = NULL,
                       annotation_GO = CAF_cell_functional_response, cutoff = 0.05, topGO = 20)
[Figure: CAF intercellular, intracellular and functional analysis]

GO analysis can also be performed for each receptor independently by specifying the parameter method = "specific". This functionally characterizes the cellular responses triggered by each specific receptor.

As an example, the following code performs the functional characterization of the cellular response in cell cluster CAF, considering only the communications selected above (CAF_comm). The function performs an independent GO analysis for each receptor in CAF_comm, i.e. each receptor passing the 0.8 thresholds.

# GO analysis of CAF cells communication (stratified by receptors)
CAF_cell_functional_response_rec <- scSeqComm_GO_analysis(results_signaling = CAF_comm, 
                                                    geneUniverse = geneUniverse,
                                                    method = "specific")

Among all the ligand-receptor pairs of interest, the following code visualizes the results for ligand-receptor pairs involving receptors CD44 and IL10RB.

# Among all the ligand-receptor pairs of interest, select results involving receptors CD44 and IL10RB.
CAF_cell_comm <- scSeqComm_select(scSeqComm_res$comm_results,
                                cluster_R = "CAF", 
                                LR_pair = CAF_cell_pairs,
                                receptor = c("CD44","IL10RB"))

# plot scores and GO terms having adjusted pvalue < 0.05
scSeqComm_plot_scores_pathway (data = CAF_cell_comm, 
                       title = "CAF intercellular, intracellular and functional analysis",
                       annotation_GO = CAF_cell_functional_response_rec[c("CD44","IL10RB")], cutoff = 0.05, topGO = 20)
[Figure: CAF intercellular, intracellular and functional analysis for receptors CD44 and IL10RB]

scSeqComm input

scSeqComm requires four inputs:

  • scRNA-seq dataset
  • ligand - receptor pairs
  • transcriptional regulatory network
  • receptor - transcription factor a-priori association from gene signaling networks
scRNA-seq data

The main input of scSeqComm is a scRNA-seq dataset to analyze. Users must provide as input:

  • scRNA-seq gene expression matrix: a normalized scRNA-seq gene expression matrix with genes in rows and cells in columns. rownames() must contain the gene names.
  • Cell groups/clusters: a named list of cell groups/clusters. Each element in the list is an array of cells/columns identifiers (i.e. a subset of colnames() or a subset of column indices).
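As an illustration of these two inputs, the following base R sketch builds a tiny expression matrix and the matching named list of cell groups. Gene names, cell names and values are made up. Only the shapes follow the description above: a genes × cells matrix with rownames, and a named list of column names. If cluster labels are available as one label per cell, split() produces the named-list format directly.

```r
# Toy normalized expression matrix (made-up values): genes in rows, cells in columns
set.seed(1)
gene_expr_toy <- matrix(round(runif(4 * 6, min = 0, max = 5), 2),
                        nrow = 4, ncol = 6,
                        dimnames = list(c("CCL5", "CCR5", "CD44", "FN1"),
                                        paste0("cell_", 1:6)))

# Named list of cell groups: each element is a subset of colnames(gene_expr_toy)
cell_group_toy <- list(
  T_cell     = c("cell_1", "cell_2", "cell_3"),
  Macrophage = c("cell_4", "cell_5", "cell_6")
)

# Same format obtained from one cluster label per cell
cell_labels <- c("T_cell", "T_cell", "T_cell", "Macrophage", "Macrophage", "Macrophage")
cell_group_from_labels <- split(colnames(gene_expr_toy), cell_labels)

# Consistency checks: gene names present, every grouped cell is a column of the matrix
stopifnot(!is.null(rownames(gene_expr_toy)))
stopifnot(all(unlist(cell_group_toy) %in% colnames(gene_expr_toy)))
```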
Ligand - receptor pairs

Users must provide as input a set of ligand-receptor pairs to be used in intercellular signaling computation.

For user convenience, scSeqComm includes 21 ligand-receptor pair databases derived from literature data for human (13 databases) and mouse (8 databases). Some of the available databases include information about the multi-subunit structure of ligands and receptors. The function available_LR_pairs() lists the names of the available databases; additional information can be obtained by typing ?available_LR_pairs.

The current version of scSeqComm contains the following ligand-receptor databases:


[Table: ligand-receptor databases included in scSeqComm]

Additional information for each ligand-receptor database can be obtained by typing ? followed by the database name. For example, details about the LR_pairs_Ramilowski_2015 database can be obtained by typing ?LR_pairs_Ramilowski_2015.

Ligand-receptor databases included in scSeqComm can be loaded into the environment with data(), passing the database name. Alternatively, they can be accessed directly with scSeqComm:: followed by the database name.

For example, typing data(LR_pairs_Ramilowski_2015) will load in the environment the R object named LR_pairs_Ramilowski_2015; the same R object can be accessed typing scSeqComm::LR_pairs_Ramilowski_2015.

Users can also specify their own ligand-receptor pairs as an R data.frame. The data.frame must contain two columns named “ligand” and “receptor”, with one ligand-receptor pair per row. The following code shows an example of a correctly formatted data.frame.

library(scSeqComm)
head(scSeqComm::LR_pairs_Choi_2015)
##   ligand receptor
## 1    CCK    CCKAR
## 2   GAST    CCKBR
## 3    GRP     GRPR
## 4  IL17F   IL17RA
## 5   NTN1    DSCAM
## 6 SEMA3A   PLXNA1

In case of multi-subunit structure of ligands and/or receptors, each sub-unit must be specified in a comma separated way (no spaces). The following code shows an example of a correctly formatted dataframe.

library(scSeqComm)
head(scSeqComm::LR_pairs_Efremova_2020)
##          ligand        receptor
## 1   IL12A,IL12B IL12RB1,IL12RB2
## 2 ACVR1B,ACVR2A     INHBA,INHBB
## 3  ACVR1,ACVR2A     INHBA,INHBB
## 4   IL12B,IL23A   IL12RB1,IL23R
## 5 ACVR1B,ACVR2B     INHBA,INHBB
## 6   ITGB2,ITGAM     FGA,FGG,FGB
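Below is a minimal sketch of a custom ligand-receptor table in the same format, with one multi-subunit pair written as comma-separated genes. The pair_measured check is a hypothetical pre-filter and is not part of scSeqComm. It splits each complex into its subunits and tests whether all of them are among the genes of a made-up dataset.

```r
# Hypothetical custom ligand-receptor table: one pair per row,
# multi-subunit complexes as comma-separated gene names without spaces
my_LR_pairs <- data.frame(
  ligand   = c("CCL5", "IL12A,IL12B", "FN1"),
  receptor = c("CCR5", "IL12RB1,IL12RB2", "CD44"),
  stringsAsFactors = FALSE
)

# Formatting rules from the text: required column names, no spaces inside entries
stopifnot(all(c("ligand", "receptor") %in% colnames(my_LR_pairs)))
stopifnot(!any(grepl(" ", c(my_LR_pairs$ligand, my_LR_pairs$receptor), fixed = TRUE)))

# Split a complex into its subunits
split_subunits <- function(x) strsplit(x, ",", fixed = TRUE)[[1]]
receptor_subunits <- lapply(my_LR_pairs$receptor, split_subunits)

# Hypothetical pre-check: are all subunits of both partners measured in the data?
genes_in_data <- c("CCL5", "CCR5", "FN1", "CD44", "IL12A")
pair_measured <- mapply(function(l, r) all(c(split_subunits(l), split_subunits(r)) %in% genes_in_data),
                        my_LR_pairs$ligand, my_LR_pairs$receptor, USE.NAMES = FALSE)
pair_measured
```

The IL12 pair is flagged because IL12B and both receptor subunits are missing from the toy gene list.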
Transcriptional regulatory network

Users must provide as input a transcriptional regulatory network, i.e. a set of transcription factors (TFs) and their target genes (TGs), to be used in the intracellular signaling computation.

For user convenience, scSeqComm includes several TFs-TGs databases derived from the literature. The function available_TF_TG() lists the names of the available databases; additional information can be obtained by typing ?available_TF_TG.

The current version of scSeqComm contains the following TFs-TGs databases:


[Table: TFs-TGs databases included in scSeqComm]

Additional information for each TFs-TGs database can be obtained by typing ? followed by the database name. For example, details about the TF_TG_TRRUSTv2 database can be obtained by typing ?TF_TG_TRRUSTv2.

TFs-TGs databases included in scSeqComm can be loaded into the environment with data(), passing the database name. Alternatively, they can be accessed directly with scSeqComm:: followed by the database name.

For example, typing data(TF_TG_TRRUSTv2) will load in the environment the R object named TF_TG_TRRUSTv2; the same R object can be accessed typing scSeqComm::TF_TG_TRRUSTv2.

Users can also specify their own TFs-TGs database as an R named list. Each element of the list represents a TF and contains an array of target gene names. The following code shows an example of a correctly formatted list.

library(scSeqComm)
head(scSeqComm::TF_TG_TRRUSTv2, n = 3)
## $AATF
## [1] "BAX"    "CDKN1A" "KLK3"   "MYC"    "TP53"  
## 
## $ABL1
##  [1] "BAX"    "BCL2"   "BCL6"   "CCND2"  "CDKN1A" "CSF1"   "FOXO3"  "JUN"   
##  [9] "PIM1"   "TP53"  
## 
## $AES
## [1] "EPHA3" "LEF1"  "RND3"
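Suppose a custom regulatory network is stored as an edge table, with one TF-target pair per row (this storage format is an assumption). In that case, base R's split() converts it into the named-list format shown above. The edges below are a small subset of the TF_TG_TRRUSTv2 entries printed above.

```r
# Hypothetical TF -> target gene edge table (subset of the TF_TG_TRRUSTv2 entries above)
tf_tg_edges <- data.frame(
  tf = c("AATF", "AATF", "ABL1", "ABL1", "AES"),
  tg = c("BAX", "MYC", "BCL2", "JUN", "LEF1"),
  stringsAsFactors = FALSE
)

# One list element per TF, holding its (unique) target genes
my_TF_TG <- lapply(split(tf_tg_edges$tg, tf_tg_edges$tf), unique)
str(my_TF_TG)
```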
Receptor-Transcription factor a-priori association from gene signaling networks

Users must provide as input a dataframe describing the receptor-transcription factor (R-TF) a-priori association to be used in intracellular signaling computation.

For user convenience, scSeqComm includes four receptor-transcription factor a-priori associations derived from human and mouse gene signaling networks in the KEGG and REACTOME pathway databases. Each association is computed as the Personalized PageRank (PPR) score of each transcription factor, using the receptor annotated in that pathway as the seed node. The available data also include the corresponding signaling pathway category. The function available_TF_PPR() lists the names of the available a-priori association dataframes; additional information can be obtained by typing ?available_TF_PPR.

The current version of scSeqComm contains the following receptor-transcription factor a-priori associations:


[Table: receptor-transcription factor a-priori associations included in scSeqComm]

Additional information for each R-TF a-priori association can be obtained by typing ? followed by its name. For example, details about the receptor-transcription factor a-priori association from human KEGG gene networks can be obtained by typing ?TF_PPR_KEGG_human.

R-TF a-priori associations included in scSeqComm can be loaded into the environment with data(), passing the object name. Alternatively, they can be accessed directly with scSeqComm:: followed by the object name.

For example, typing data(TF_PPR_REACTOME_mouse) will load in the environment the R object named TF_PPR_REACTOME_mouse; the same R object can be accessed typing scSeqComm::TF_PPR_REACTOME_mouse.

Users can also specify their own receptor-transcription factor a-priori association as an R data.frame. The data.frame must contain four columns:

  • “receptor” containing the receptor names
  • “pathway” containing the pathway names
  • “tf” containing the transcription factor names
  • “tf_PPR” containing the receptor-transcription factor association (e.g. Personalized PageRank score)

Each row describes the a-priori association of a given transcription factor with a given receptor in a given signaling pathway. The following code shows an example of a correctly formatted dataframe.

library(scSeqComm)
head(scSeqComm::TF_PPR_KEGG_human)
##   receptor                      pathway   tf       tf_PPR   category
## 1      PKM Glycolysis / Gluconeogenesis ENO1 0.0429336576 Metabolism
## 2    ALDOA Glycolysis / Gluconeogenesis ENO1 0.0096962874 Metabolism
## 3      GPI Glycolysis / Gluconeogenesis ENO1 0.0009609631 Metabolism
## 4   MINPP1 Glycolysis / Gluconeogenesis ENO1 0.0660902131 Metabolism
## 5     NPR2            Purine metabolism NME2 0.0224362875 Metabolism
## 6      PKM            Purine metabolism NME2 0.0194948840 Metabolism
##               subcategory
## 1 Carbohydrate metabolism
## 2 Carbohydrate metabolism
## 3 Carbohydrate metabolism
## 4 Carbohydrate metabolism
## 5   Nucleotide metabolism
## 6   Nucleotide metabolism

Users can build their own R-TF a-priori association from given gene signaling networks with the compute_tfactors_PPR() function. The function runs the Personalized PageRank (PPR) algorithm to measure the a-priori association between the specified receptors and transcription factors in the given gene signaling networks. Users can specify their own gene signaling networks as a list of directed igraph objects, together with the lists of receptors and transcription factors to be considered in the computation. Additional information about the igraph object format can be obtained by typing ?compute_tfactors_PPR.
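To show what a Personalized PageRank score measures, the following sketch computes it by power iteration on a tiny hypothetical network with placeholder node names. It then stores the TF scores in the four-column R-TF format described above. This is a self-contained illustration, not the code of compute_tfactors_PPR(). Two settings are assumptions of this sketch: the damping factor of 0.85, and sending the random walk back to the receptor from dead-end nodes. The package works on igraph objects and may use different settings.

```r
# Personalized PageRank (PPR) by power iteration on a directed adjacency matrix.
# Illustrative sketch only: damping and dead-end handling are assumptions.
personalized_pagerank <- function(adj, seed, damping = 0.85, tol = 1e-10, max_iter = 1000) {
  nodes   <- rownames(adj)
  restart <- as.numeric(nodes == seed)              # all restart mass on the receptor
  out_deg <- rowSums(adj)
  trans   <- adj / ifelse(out_deg > 0, out_deg, 1)  # row-normalized transition matrix
  r <- restart
  for (i in seq_len(max_iter)) {
    flow     <- as.vector(crossprod(trans, r))      # mass moved along out-edges
    dangling <- sum(r[out_deg == 0])                # mass stuck at nodes without out-edges
    r_new    <- damping * (flow + dangling * restart) + (1 - damping) * restart
    if (sum(abs(r_new - r)) < tol) { r <- r_new; break }
    r <- r_new
  }
  setNames(r, nodes)
}

# Toy signaling network (hypothetical): RECEPTOR -> kinases -> transcription factors
nodes <- c("RECEPTOR", "KINASE_A", "KINASE_B", "TF_1", "TF_2")
adj <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
adj["RECEPTOR", c("KINASE_A", "KINASE_B")] <- 1
adj["KINASE_A", "TF_1"] <- 1
adj["KINASE_B", c("TF_1", "TF_2")] <- 1

ppr <- personalized_pagerank(adj, seed = "RECEPTOR")

# Store the TF scores in the receptor / pathway / tf / tf_PPR format
tfs <- c("TF_1", "TF_2")
toy_R_TF <- data.frame(receptor = "RECEPTOR", pathway = "toy pathway",
                       tf = tfs, tf_PPR = unname(ppr[tfs]), stringsAsFactors = FALSE)
toy_R_TF
```

Transcription factors that the receptor reaches through more paths get higher scores. Here TF_1 is reached via both kinases and scores three times higher than TF_2.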

The scSeqComm package includes two R objects, example_KEGG_igraphs_human and example_REACTOME_igraphs_mouse, as examples of correctly formatted input objects.

As an example, the following code computes the a-priori association between all receptors and transcription factors included in scSeqComm, using two human gene signaling networks derived from the KEGG database.

library(scSeqComm)

# example of KEGG gene networks in igraph format
data(example_KEGG_igraphs_human)

# receptor list
receptors_human <- unique(c(LR_pairs_BaderLab_2017$receptor, LR_pairs_Browaeys_2019$receptor,
                            LR_pairs_CabelloAguilar_2020$receptor, LR_pairs_Choi_2015$receptor,
                            LR_pairs_ConnectomeDB_2020$receptor, LR_pairs_Efremova_2020$receptor,
                            LR_pairs_Jin_2020$receptor, LR_pairs_Kumar_2018$receptor,
                            LR_pairs_Noel_2020$receptor, LR_pairs_Ramilowski_2015$receptor,
                            LR_pairs_Shao_2020$receptor, LR_pairs_Wang_2019$receptor,
                            LR_pairs_Zhang_2019$receptor))

# transcription factor list
tfactors_human <- unique(c(names(TF_TG_TRRUSTv2_HTRIdb_RegNetwork_High), names(TF_TG_RegNetwork_Med_High)))

# computation of R-TF a-priori association
TF_PPR <- compute_tfactors_PPR(receptors_human, tfactors_human, example_KEGG_igraphs_human)

Similarly, the receptor-transcription factor a-priori association can be computed from two mouse gene signaling networks derived from the Reactome database.

library(scSeqComm)

# example of Reactome gene networks in igraph format
data(example_REACTOME_igraphs_mouse)

# receptor list
receptors_mouse <- unique(c(LR_pairs_Cain_2020_mouse$receptor, LR_pairs_Ding_2016_mouse$receptor,
                            LR_pairs_Hu_2021_mouse$receptor, LR_pairs_Jin_2020_mouse$receptor,
                            LR_pairs_Shao_2020_mouse$receptor, LR_pairs_Sheikh_2019_mouse$receptor,
                            LR_pairs_Skelly_2018_mouse$receptor, LR_pairs_Yuzwa_2016_mouse$receptor))

# transcription factor list
tfactors_mouse <- unique(c(names(TF_TG_TRRUSTv2_RegNetwork_High_mouse), names(TF_TG_RegNetwork_Med_High_mouse)))

# computation of R-TF a-priori association
TF_PPR <- compute_tfactors_PPR(receptors_mouse, tfactors_mouse, example_REACTOME_igraphs_mouse)

scSeqComm analysis

scSeqComm performs the identification, quantification and functional characterization of cellular communication from scRNA-seq data.

First, it identifies and quantifies ongoing intercellular and intracellular signaling through the function scSeqComm_analyze(). Then, it functionally characterizes the cellular response in the receiving cells through the function scSeqComm_GO_analysis(). Additionally, scSeqComm provides other functions for data handling and visualization.

Intercellular and intracellular signaling analysis

The function scSeqComm_analyze() identifies and quantifies ongoing intercellular and intracellular signaling from user-specified inputs (see section scSeqComm input). It returns three outputs characterizing the given scRNA-seq data: a complete report of the cellular communication analysis (comm_results), the ligand-receptor pairs (LR_pairs_DB_scrnaseq) and the transcriptional regulatory network (TF_reg_DB_scrnaseq).

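As an example, the following code runs the analysis on the example scRNA-seq dataset included in the package. It uses the built-in ligand-receptor pairs, transcriptional regulatory network and R-TF a-priori association loaded in the Quick start section (gene_expr_matrix, cell_cluster, LR_db, TF_TG_db and TF_PPR). Note that the compute_tfactors_PPR() examples above overwrite TF_PPR. If you ran them, reload the KEGG association first with TF_PPR <- TF_PPR_KEGG_human.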

########## Identify and quantify intercellular and intracellular signaling ##########
num_core <- 2
scSeqComm_res <- scSeqComm_analyze(gene_expr = gene_expr_matrix,
                                  cell_group = cell_cluster,
                                  LR_pairs_DB = LR_db,
                                  TF_reg_DB = TF_TG_db, 
                                  R_TF_association = TF_PPR,
                                  N_cores = num_core)

#intercellular and intracellular signaling analysis report
inter_intra_scores <- scSeqComm_res$comm_results
head(scSeqComm_res$comm_results)

#Ligand-receptor pairs characterizing input scRNA-seq data
head(scSeqComm_res$LR_pairs_DB_scrnaseq)

#transcriptional regulatory network characterizing input scRNA-seq data
head(scSeqComm_res$TF_reg_DB_scrnaseq)

The main output of scSeqComm_analyze() is a dataframe (comm_results) containing intercellular and intracellular signaling information for each ligand-receptor pair across any pair of cell clusters. The data in the comm_results table allow users to quantify the ongoing cellular communication in terms of evidence of ongoing intercellular signaling and of consequent intracellular signaling in the receiving cells.

An ongoing cellular communication is defined by a ligand, a receptor and the two clusters expressing them. Each ligand-receptor pair and cell cluster couple is uniquely characterized by an S_inter score, which quantifies the intercellular signaling evidence. The most important intercellular signaling information includes:

  • ligand and receptor scores
  • evidence of an ongoing intercellular signaling (S_inter score, range [0,1])

An example of the above intercellular information is shown in the table below.


[Table: intercellular information]

Additional columns contain information about the involved ligands and receptors (e.g. their average expression levels) and (optional) alternative intercellular signaling scoring schemes. The complete list of alternative intercellular signaling scoring schemes is available in scSeqComm_analyze() function documentation.

Among the most important information on intracellular signaling contained in comm_results, there are

  • receptor name and its expressing cluster
  • KEGG or Reactome pathway names
  • evidence of an ongoing intracellular signaling (S_intra score, range [0,1])
  • differentially expressed genes connected to the given receptor through the specified KEGG or Reactome pathway

An example of the above intracellular information is shown in the table below.


[Table: intracellular information]

By default, differentially expressed genes are identified by comparing their expression level in the current cluster with that in the remaining clusters. Users can also specify their own background cells for the differential expression analysis through the cell_reference parameter. cell_reference can take one of two forms. It can be an array of cell names, used as the background for every cell cluster. Or it can be a set of cluster-specific backgrounds in the form of a named list, where each element is an array of cell names. This option is useful to adapt the concept of background/reference to different experimental designs and biological scenarios of interest.
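Below is a sketch of the two accepted shapes for cell_reference, built from hypothetical per-cell metadata in base R. Using control-condition cells as the background is only an example design. The final comment shows where the object would be passed.

```r
# Hypothetical per-cell metadata: cluster label and experimental condition
cell_meta <- data.frame(
  cell      = paste0("cell_", 1:8),
  cluster   = rep(c("CAF", "T_cell"), each = 4),
  condition = rep(c("control", "treated"), times = 4),
  stringsAsFactors = FALSE
)

# Shape 1: one array of cell names, used as background for every cluster
ref_shared <- cell_meta$cell[cell_meta$condition == "control"]

# Shape 2: named list of cluster-specific backgrounds (names = cluster names)
ref_by_cluster <- lapply(split(cell_meta, cell_meta$cluster),
                         function(d) d$cell[d$condition == "control"])

# Either object would then be passed to the analysis, e.g.
# scSeqComm_analyze(..., cell_reference = ref_by_cluster)
```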

A cellular communication can trigger different cellular responses in the receiving cell through different pathways. Thus, a given ligand-receptor pair and cell cluster pair can be associated with multiple S_intra scores. Receptors with no information about intracellular signaling are those without known downstream genes in the KEGG or Reactome pathway database; for them, the columns related to intracellular evidence contain NA values.

We suggest that users focus their analysis on the most important information contained in comm_results, summarized in the table below.


[Table: summary of the most important columns of comm_results]

Once the evidence of ongoing intracellular and intercellular signaling have been computed, many analyses can be performed.

Users might be interested in intercellular-only or intracellular-only analysis: the logical arguments inter_signaling and intra_signaling specify which analysis to perform. By default, scSeqComm_analyze() performs both intercellular and intracellular analysis from the user-specified inputs. Its main output is a dataframe (comm_results) containing intercellular and/or intracellular signaling information, depending on the arguments passed.

For example, if only intercellular signaling is analyzed, the main output is a dataframe containing intercellular signaling information for each ligand-receptor pair across any pair of cell clusters. This includes ligand/receptor scores and ongoing intercellular signaling evidence (see Table “Intercellular information” above). If only intracellular signaling is analyzed, the main output is a dataframe containing intracellular signaling information for each receptor-pathway pair in each cell cluster. This includes ongoing intracellular signaling evidence and the related differentially expressed genes (see Table “Intracellular information” above).

Select and summarize signaling results

Cellular communication results are multidimensional data involving different ligand-receptor pairs, several cell clusters, etc. Therefore, scSeqComm provides users with a set of functions to select and summarize the results of interest.

Users often want to prioritize their analyses, focusing on the strongest signals and, possibly, on the cellular communications of interest. The function scSeqComm_select() selects a subset of the inferred intercellular and intracellular signaling. It takes as input the selection criteria used to filter the corresponding columns of the input data.frame (e.g. comm_results).

For example, the following code selects all cellular signals involving a set of user-specified receptors.

#list of receptors
selected_rec <- c("CXCR3","CXCR4","ITGA5")

## select cellular communication involving user specified receptors
selected_comm <- scSeqComm_select(inter_intra_scores, 
                                  receptor = selected_rec)

Multiple filtering options can be combined with either of two operators:
  • AND: select the cellular communication results that fulfill all the input filtering options
  • OR: select the results that fulfill at least one of them

By default, filtering options are combined with an AND operator. As an example, the following code selects communications having both intercellular and intracellular evidence above a certain threshold.

## select cellular communication having intercellular AND intracellular evidence above a user specified threshold (e.g. 0.9)
selected_comm <- scSeqComm_select(inter_intra_scores, 
                                  S_inter = 0.9,
                                  S_intra = 0.9)

Users can specify an OR operator with the parameter operator = "OR". As an example, the following code selects cellular communication results involving the user-specified receptors or having an intercellular signaling evidence above 0.5.

#list of receptors
selected_rec <- c("CXCR3","CXCR4","ITGA5")

## select cellular communication involving the user specified receptors OR having an intercellular signaling evidence above 0.5.
selected_comm <- scSeqComm_select(inter_intra_scores, 
                                  S_inter = 0.5,
                                  receptor = selected_rec,
                                  operator = "OR")
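The AND/OR semantics can be mimicked in base R on a toy table (values are made up) to see which rows each operator keeps. This sketch treats thresholds as strict lower bounds. It also drops NA S_intra values, which is presumably what NA_included = FALSE does; check ?scSeqComm_select for the package's exact behavior.

```r
# Toy results table (made-up values) with the columns used for filtering
toy_res <- data.frame(
  LR_pair  = c("CCL5 - CCR5", "FN1 - CD44", "CXCL12 - CXCR4", "A2M - LRP1"),
  receptor = c("CCR5", "CD44", "CXCR4", "LRP1"),
  S_inter  = c(0.95, 0.40, 0.30, 0.92),
  S_intra  = c(0.93, 0.85, 0.97, NA),
  stringsAsFactors = FALSE
)
selected_rec <- c("CXCR3", "CXCR4", "ITGA5")

# One logical vector per criterion (thresholds as strict lower bounds)
crit_inter <- toy_res$S_inter > 0.5
crit_rec   <- toy_res$receptor %in% selected_rec
crit_intra <- !is.na(toy_res$S_intra) & toy_res$S_intra > 0.8  # NA rows dropped

and_rows    <- toy_res[crit_inter & crit_rec, ]    # AND: every criterion must hold
or_rows     <- toy_res[crit_inter | crit_rec, ]    # OR: at least one criterion holds
strong_rows <- toy_res[crit_inter & crit_intra, ]  # strong inter AND intra evidence
```

With these values, no row passes the AND of the two criteria, while the OR keeps three rows.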

Additional information on other possible filtering criteria can be obtained by typing ?scSeqComm_select.

In addition to filtering cellular communication results, scSeqComm lets users summarize the intracellular signaling evidence. A given ligand-receptor pair between two cell clusters can be associated with multiple pieces of evidence of ongoing intracellular signaling, i.e. multiple S_intra scores, one for each KEGG or Reactome pathway including the given receptor. The function scSeqComm_summaryze_S_intra() summarizes these results by taking, for each receptor in a cell cluster, the largest of all the associated S_intra values.

The output is a data.frame with the same columns as comm_results, but each ligand-receptor pair and cell cluster couple is associated with a unique value of S_inter and S_intra. As an example, let us consider the following entries.

[Table: example entries before summarization]

Applying the function scSeqComm_summaryze_S_intra(), the entries are summarized as follows.

[Table: the same entries after scSeqComm_summaryze_S_intra()]
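The max rule described above can be sketched in base R with aggregate(), using the S_inter and S_intra values of the CCL5/CCR5 example rows shown in the Quick start section. The sketch reproduces only the summarization rule and drops the pathway-specific columns. By contrast, scSeqComm_summaryze_S_intra() returns the full comm_results columns.

```r
# Toy comm_results-like rows: one row per pathway, each with its own S_intra
toy_comm <- data.frame(
  ligand    = "CCL5",
  receptor  = "CCR5",
  cluster_L = c("T_cell", "T_cell", "B_cell", "B_cell"),
  cluster_R = "Macrophage",
  S_inter   = c(0.985, 0.985, 0.782, 0.782),
  S_intra   = c(0.823, 0.654, 0.823, 0.654),
  pathway   = rep(c("Chemokine signaling pathway",
                    "Human immunodeficiency virus 1 infection"), times = 2),
  stringsAsFactors = FALSE
)

# Keep, for each ligand-receptor pair and cluster couple, the largest S_intra
keys <- c("ligand", "receptor", "cluster_L", "cluster_R", "S_inter")
summarized <- aggregate(toy_comm["S_intra"], by = toy_comm[keys], FUN = max)
summarized
```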

Functionally characterize cellular communication

scSeqComm allows users to functionally characterize the cellular response in the receiving cells through the function scSeqComm_GO_analysis().

The functional characterization is performed through a Gene Ontology enrichment analysis on the differentially expressed genes downstream of the receptors (the genes in the list_genes column of comm_results). Users must define the list of genes used as the “background” against which DE genes are compared in the enrichment analysis (geneUniverse parameter). We suggest using the list of all genes in the transcriptional regulatory network characterizing the scRNA-seq data, i.e. all target genes present in the dataset.

# gene universe is all the target genes in dataset
geneUniverse <- unique(unlist(scSeqComm_res$TF_reg_DB_scrnaseq))
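To make the role of geneUniverse concrete, the following sketch scores a single GO term with a one-sided hypergeometric test, a common over-representation test. It is an assumption whether scSeqComm_GO_analysis() uses exactly this test. Gene names are placeholders, and the universe is built the same way as above, from all target genes of a toy regulatory network.

```r
# One-sided hypergeometric over-representation test for a single GO term
go_term_pvalue <- function(de_genes, term_genes, universe) {
  de_genes    <- intersect(de_genes, universe)    # genes outside the universe are ignored
  term_genes  <- intersect(term_genes, universe)
  overlap     <- length(intersect(de_genes, term_genes))  # DE genes annotated to the term
  in_term     <- length(term_genes)                       # universe genes in the term
  not_in_term <- length(universe) - in_term               # universe genes not in the term
  drawn       <- length(de_genes)                         # number of DE genes
  # P(X >= overlap) for X ~ Hypergeometric(in_term, not_in_term, drawn)
  phyper(overlap - 1, in_term, not_in_term, drawn, lower.tail = FALSE)
}

# Toy regulatory network: the universe is the set of all its target genes
toy_TF_TG <- list(TF_A = paste0("G", 1:12), TF_B = paste0("G", 8:20))
universe  <- unique(unlist(toy_TF_TG))

de_genes   <- c("G1", "G2", "G3", "G10")
term_genes <- paste0("G", 1:5)
p <- go_term_pvalue(de_genes, term_genes, universe)
p
```

Because genes outside the universe are ignored, the choice of universe directly changes the p-value. Adding more unrelated genes to the universe makes the same overlap look more surprising.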

Users can functionally characterize the cellular response associated with a subset of ongoing cellular communications by first filtering them with scSeqComm_select(). For example, they can specify lower bounds for intercellular and intracellular evidence and/or a set of ligand-receptor pairs of interest. Additional information about the parameters of scSeqComm_select() can be obtained by typing ?scSeqComm_select.

The results of the GO analysis are formatted as a table of GO terms ordered by p-value. An example output is shown in the table below.

[Table image: example GO analysis output]

The overall functional characterization of a cell cluster's response is performed on all the DE genes in the cluster of interest associated with the selected receptors, by setting method = "general". As an example, the following code performs the overall functional characterization of the CAF cellular response, considering only intercellular and intracellular signaling with evidence higher than 0.8.

# CAF communication with strong intercellular and intracellular evidence
CAF_comm <- scSeqComm_select(data = scSeqComm_res$comm_results,
                             cluster_R = "CAF",
                             S_inter = 0.8, S_intra = 0.8,
                             NA_included = FALSE)

# GO analysis of CAF communication
CAF_cell_functional_response <- scSeqComm_GO_analysis(results_signaling = CAF_comm,
                                                      geneUniverse = geneUniverse,
                                                      method = "general")

head(CAF_cell_functional_response)

Similarly, the GO enrichment analysis can be performed independently for each receptor by setting method = "specific": scSeqComm_GO_analysis() then provides the functional characterization of the cluster's cellular response triggered by each receptor. The output is a list of data.frames, one with the enrichment analysis results for each receptor. As an example, the following code performs the independent functional characterization of the "CAF" cellular response for each receptor of the specified ligand-receptor pairs that pass the 0.8 threshold for both intercellular and intracellular evidence.
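The difference between the two methods lies in which gene sets are tested. The base R sketch below, on invented receptors and DE genes, builds the single pooled gene set that the "general" mode would analyze and the per-receptor gene sets of the "specific" mode. It only illustrates the grouping, not the enrichment test performed by scSeqComm_GO_analysis().

```r
# Hypothetical selected communications: receptor and downstream DE genes
# (playing the role of the list_genes column of comm_results)
receptor   <- c("EGFR", "EGFR", "CD44")
list_genes <- list(c("FOS", "JUN"), c("JUN", "MYC"), c("SPP1", "MMP9"))

# "general": one pooled gene set for the whole receiving cluster
general_set <- unique(unlist(list_genes))

# "specific": one gene set per receptor, each tested independently
specific_sets <- lapply(split(list_genes, receptor),
                        function(g) unique(unlist(g)))

general_set
specific_sets
```

This mirrors why the "specific" output is a list named by receptor.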

# set of ligand-receptor pairs of interest
pairs <- c("ANXA1 - EGFR","CDH1 - EGFR","FN1 - CD44","CCL5 - CCR1", "A2M - LRP1")

# CAF communication with strong evidence and involving selected pairs
CAF_comm <- scSeqComm_select(data = scSeqComm_res$comm_results,
                             cluster_R = "CAF",
                             S_inter = 0.8, S_intra = 0.8,
                             LR_pair = pairs,
                             NA_included = FALSE)

# GO analysis of CAF communication for each receptor
CAF_cell_functional_response_rec <- scSeqComm_GO_analysis(results_signaling = CAF_comm,
                                                          geneUniverse = geneUniverse,
                                                          method = "specific")

#receptors for which the GO analysis is performed
names(CAF_cell_functional_response_rec)
# [1] "EGFR" "CD44"

head(CAF_cell_functional_response_rec[["EGFR"]])

Visualize cellular communication results

Analysis results can be visualized using one of the many plot functions implemented in scSeqComm.

Functions scSeqComm_chorddiag_cardinality() and scSeqComm_heatmap_cardinality() plot the amount of ongoing cellular communication among cell types, measured as the number of interacting ligand-receptor pairs contained in the input data. The two functions show the same information using two different visualization techniques: an (interactive) chord diagram and a heatmap. Users can select the ongoing cellular communications of interest with scSeqComm_select() and pass the result as input to these functions.

As an example, the following code visualizes the amount of cell-cell communication signals, considering only ligand-receptor pairs with an intercellular score S_inter above 0.5.

#summarize S_intra score
inter_max_intra_scores <- scSeqComm_summaryze_S_intra(scSeqComm_res$comm_results)

# select communication with s_inter > 0.5
selected_comm <- scSeqComm_select(inter_max_intra_scores, 
                                  S_inter = 0.5)

# interactive chord diagram - Figure A
scSeqComm_chorddiag_cardinality(data = selected_comm)
# heatmap - Figure B
scSeqComm_heatmap_cardinality(data = selected_comm,
                              title = "Ongoing cellular communication (intercellular evidence only)")
[Figure: (A) interactive chord diagram; (B) heatmap of ongoing cellular communication with S_inter > 0.5]
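The "cardinality" shown in both plots is a count of ligand-receptor pairs per sender/receiver cluster couple. A base R sketch of that count on a made-up selection is below; the column names cluster_L, cluster_R and LR_pair are assumed, and duplicates are removed so that each pair is counted once per couple.

```r
# Hypothetical selected communications
sel <- data.frame(
  cluster_L = c("Macrophage", "Macrophage", "CAF", "CAF", "CAF"),
  cluster_R = c("CAF", "CAF", "Macrophage", "Malignant", "Macrophage"),
  LR_pair   = c("FN1 - CD44", "CCL5 - CCR1", "A2M - LRP1", "CDH1 - EGFR", "A2M - LRP1"),
  stringsAsFactors = FALSE
)

# Count each LR pair once per sender/receiver couple
sel <- unique(sel[, c("cluster_L", "cluster_R", "LR_pair")])
counts <- table(sender = sel$cluster_L, receiver = sel$cluster_R)
print(counts)
```

The resulting sender-by-receiver matrix is the kind of summary that a heatmap or chord diagram displays; the two scSeqComm functions above build and draw it directly from the selected results.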

Functions scSeqComm_plot_scores(), scSeqComm_plot_scores_pathway(), scSeqComm_plot_LR_pair() and scSeqComm_plot_cluster_pair() combine two data visualization techniques (proportional area charts and heatmaps) to plot information about the inferred intercellular and intracellular signaling simultaneously. In particular, these functions map the intercellular evidence (S_inter score) to circle size and the intracellular evidence (S_intra score) to circle color. Grey indicates an ongoing cellular communication with no information about intracellular signaling (i.e. an NA S_intra score).
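The visual encoding described above can be sketched in base R: circle size proportional to S_inter, color taken from a palette indexed by S_intra, and grey when S_intra is NA. The palette, the size scaling and the toy scores are arbitrary choices for illustration, not scSeqComm's actual plotting code.

```r
# Map S_inter to point size and S_intra to a colour; grey for NA S_intra.
encode_scores <- function(S_inter, S_intra, max_cex = 3) {
  pal <- colorRampPalette(c("lightyellow", "darkred"))(100)
  # S_intra in [0, 1] -> palette index 1..100 (NA stays NA)
  idx <- pmin(100, pmax(1, ceiling(S_intra * 100)))
  col <- ifelse(is.na(S_intra), "grey", pal[idx])
  data.frame(cex = S_inter * max_cex, col = col, stringsAsFactors = FALSE)
}

# Toy scores for three ligand-receptor pairs
S_inter <- c(0.95, 0.60, 0.85)
S_intra <- c(0.90, 0.20, NA)
enc <- encode_scores(S_inter, S_intra)

# Circle size = intercellular evidence, circle colour = intracellular evidence
plot(seq_along(S_inter), rep(1, 3), pch = 19, cex = enc$cex, col = enc$col,
     xlab = "ligand-receptor pair", ylab = "", yaxt = "n", xlim = c(0.5, 3.5))
```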

As with the cardinality plots, users can select the ongoing cellular communications to be plotted with scSeqComm_select() and pass the result as input to these functions.

The function scSeqComm_plot_scores() visualizes S_inter and the maximal S_intra score for each ligand-receptor pair between any cell types in the data. Rows can be grouped by receptor (default) or by ligand, and columns by receiver (default) or sender cell cluster, by setting the corresponding parameters (facet_grid_y and facet_grid_x).

The following code will show intercellular and intracellular evidence of selected ligand-receptor pairs across all cell clusters.

# set of ligand-receptor pairs of interest
pairs <- c("ANXA1 - EGFR","CDH1 - EGFR","FN1 - CD44","CCL5 - CCR1", "A2M - LRP1")

# select communication involving above LR pairs
selected_comm <- scSeqComm_select(inter_max_intra_scores, 
                                  LR_pair = pairs)

# plot scores
scSeqComm_plot_scores(data = selected_comm,
                      title = "Intercellular and intracellular results of selected ligand-receptor pairs")
[Figure: intercellular and intracellular results of selected ligand-receptor pairs]

In addition, scSeqComm_plot_scores() can add the functional analysis results to the figure: pass the output of scSeqComm_GO_analysis() (annotation_GO) and choose which GO terms to show (cutoff, topGO). The data to be plotted should be consistent with the cellular communications used in the enrichment analysis. As an example, the following code visualizes the intercellular and intracellular signaling evidence of the selected pairs with CAF as the receiving cells, together with the corresponding GO analysis results computed above.

# select communication related to ligand-receptor pairs of interest and the GO analysis computed above
CAF_cell_comm <- scSeqComm_select(inter_max_intra_scores,
                                  cluster_R = "CAF", 
                                  LR_pair = pairs,
                                  receptor = names(CAF_cell_functional_response_rec))

# plot scores and GO terms 
scSeqComm_plot_scores(data = CAF_cell_comm,
                      title = "CAF intercellular, intracellular and functional analysis",
                      annotation_GO = CAF_cell_functional_response_rec, cutoff = 0.05, topGO = 5)
[Figure: CAF intercellular, intracellular and functional analysis]

Similarly to scSeqComm_plot_scores(), scSeqComm_plot_scores_pathway() plots the S_inter and S_intra scores for each ligand-receptor pair and cell cluster couple, together with the KEGG or Reactome pathways to which the intracellular evidence of each ongoing cellular communication is related. The function also lets users visualize the functional analysis by passing the results of scSeqComm_GO_analysis(). Again, the input data should be consistent with the cellular communications used in the enrichment analysis.

The following code will plot intercellular and intracellular signaling evidence of selected ligand-receptor pairs, together with information about pathways.

# set of ligand-receptor pairs of interest
pairs <- c("ANXA1 - EGFR","CDH1 - EGFR","FN1 - CD44","CCL5 - CCR1", "A2M - LRP1")

# per-pathway (not summarized) results: one S_intra score per pathway
inter_intra_scores <- scSeqComm_res$comm_results

# select communication involving above LR pairs
selected_comm <- scSeqComm_select(inter_intra_scores,
                                  LR_pair = pairs)

# plot scores and pathway info (Figure A)
scSeqComm_plot_scores_pathway(data = selected_comm,
                              title = "Intercellular and intracellular results of selected ligand-receptor pairs")
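[Figure: intercellular and intracellular results of selected ligand-receptor pairs, with pathway information]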

The code below plots the intercellular and intracellular signaling evidence of the selected ligand-receptor pairs, together with KEGG pathway information and the functional analysis results computed above.

# select communication related to ligand-receptor pairs of interest and the GO analysis computed above
CAF_cell_comm <- scSeqComm_select(inter_intra_scores,
                                  cluster_R = "CAF", 
                                  LR_pair = pairs,
                                  S_inter = 0.8,
                                  S_intra = 0.8,
                                  receptor = names(CAF_cell_functional_response_rec))

# plot scores, pathway info and GO terms
scSeqComm_plot_scores_pathway(data = CAF_cell_comm,
                              title = "CAF intercellular, intracellular and functional analysis",
                              annotation_GO = CAF_cell_functional_response_rec, cutoff = 0.05, topGO = 5)
[Figure: CAF intercellular, intracellular and functional analysis, with pathway information]

Users can also visualize S_inter and (maximal) S_intra scores for i) a specific ligand-receptor pair across cell types or ii) a specific cell cluster couple across ligand-receptor pairs, using the functions scSeqComm_plot_LR_pair() and scSeqComm_plot_cluster_pair(), respectively.

The following code will visualize intercellular and intracellular evidence for a specific ligand-receptor pair.

# plot scores for a selected ligand-receptor pair
scSeqComm_plot_LR_pair(data = inter_max_intra_scores,
                       title = "FN1 - CD44",
                       selected_LR_pair = "FN1 - CD44")
[Figure: FN1 - CD44]

The following code will visualize intercellular and intracellular evidence of selected ligand-receptor pairs for a specific cell cluster pair.

# set of ligand-receptor pairs of interest
pairs <- c("CCL4 - CCR1","CCL3 - CCR1","GNAI2 - CCR5","CCL5 - CCR1", "CCL5 - CCR5")

# select communication involving above LR pairs
selected_comm <- scSeqComm_select(inter_max_intra_scores, 
                                  LR_pair = pairs)

# plot scores contained in "selected_comm" for a selected cluster couple 
scSeqComm_plot_cluster_pair(data = selected_comm,
                            title = "CAF --> Macrophage",
                            selected_cluster_pair = "CAF --> Macrophage")
[Figure: selected ligand-receptor pairs, CAF --> Macrophage]
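Both functions plot a slice of the same table: all cluster couples for one ligand-receptor pair, or all pairs for one cluster couple. The base R sketch below takes these two slices from a toy table. The "Sender --> Receiver" label format is taken from the selected_cluster_pair examples above, while the helper parse_cluster_pair() and the column names are hypothetical.

```r
# Hypothetical summarized results
res <- data.frame(
  LR_pair   = c("FN1 - CD44", "FN1 - CD44", "CCL5 - CCR1", "CCL5 - CCR5"),
  cluster_L = c("Macrophage", "CAF", "CAF", "CAF"),
  cluster_R = c("CAF", "Macrophage", "Macrophage", "Macrophage"),
  S_inter   = c(0.91, 0.75, 0.88, 0.82),
  stringsAsFactors = FALSE
)

# Split a "Sender --> Receiver" label into its two clusters (hypothetical helper)
parse_cluster_pair <- function(label) {
  parts <- strsplit(label, " --> ", fixed = TRUE)[[1]]
  c(sender = parts[1], receiver = parts[2])
}

# i) one ligand-receptor pair across all cluster couples
one_pair <- res[res$LR_pair == "FN1 - CD44", ]

# ii) one cluster couple across all ligand-receptor pairs
cp <- parse_cluster_pair("CAF --> Macrophage")
one_couple <- res[res$cluster_L == cp[["sender"]] & res$cluster_R == cp[["receiver"]], ]
one_couple$LR_pair
```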

Real case example: cellular communication analysis of scRNA-seq data from human metastatic melanoma

library(scSeqComm)

########## Load scRNA-seq data ##########
# load example data (gene expression matrix and cell groups) from Tirosh et al. (2016)
data(Example_data_GSE72056)
gene_expr_matrix <- Example_data_GSE72056$GSE72056_log2tpm_filtered
cell_cluster <- Example_data_GSE72056$GSE72056_clusters

########## Ligand-receptor pairs ##########
# list available ligand-receptor pairs databases included in scSeqComm
?available_LR_pairs
available_LR_pairs(species = "human")
# let's use the ligand-receptor pairs from Kumar et al. (2018)
?LR_pairs_Kumar_2018
data(LR_pairs_Kumar_2018)
LR_db <- LR_pairs_Kumar_2018

########## Transcriptional regulatory networks ##########
# list available transcriptional regulatory networks databases included in scSeqComm
?available_TF_TG
available_TF_TG(species = "human")
# let's use the transcriptional regulatory networks from TRRUST v2 [Han et al. (2018)], HTRIdb [Bovolenta et al. (2012)] and RegNetwork (Only "High confidence" entries) [Liu et al. (2015)]
?TF_TG_TRRUSTv2_HTRIdb_RegNetwork_High
data(TF_TG_TRRUSTv2_HTRIdb_RegNetwork_High)
TF_TG_db <- TF_TG_TRRUSTv2_HTRIdb_RegNetwork_High

########## Receptor-Transcription factor a-priori association  ##########
# list available receptor-transcription factor a-priori associations included in scSeqComm
?available_TF_PPR
available_TF_PPR(species = "human")
# let's use the R-TF a-priori association obtained by KEGG signaling networks
?TF_PPR_KEGG_human
data(TF_PPR_KEGG_human)
TF_PPR <- TF_PPR_KEGG_human

# remove the original objects (copies are kept in LR_db, TF_TG_db and TF_PPR)
rm(Example_data_GSE72056, LR_pairs_Kumar_2018, TF_TG_TRRUSTv2_HTRIdb_RegNetwork_High, TF_PPR_KEGG_human)

Once the inputs are ready, we can run the intercellular/intracellular analysis.

########## Cellular communication analysis ##########

# Define the number of cores for parallel execution (more cores reduce execution time at the cost of higher RAM usage)
num_cores <- 4

##### Identify and quantify intercellular and intracellular signaling
comm_res <- scSeqComm_analyze(gene_expr = gene_expr_matrix,
                              cell_group = cell_cluster,
                              LR_pairs_DB = LR_db,
                              TF_reg_DB = TF_TG_db,
                              R_TF_assocition = TF_PPR,
                              N_cores = num_cores)

inter_intra_scores <- comm_res$comm_results
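The number of cores above is hard-coded to 4. As a hedged alternative, the core count can be derived from the machine with parallel::detectCores(), leaving one core free; keep in mind the RAM trade-off noted in the comment above, since more cores mean higher memory usage. Leaving one core free is just a common heuristic, not a scSeqComm recommendation.

```r
# Derive a core count from the machine, leaving one core free.
# detectCores() can return NA on some platforms, so fall back to 1.
n_detected <- parallel::detectCores()
num_cores <- if (is.na(n_detected)) 1L else max(1L, n_detected - 1L)
num_cores
```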

Amount of cell-cell communication signals
#summarise S_intra score for each LR pair and cell cluster couple as max S_intra
inter_max_intra_scores <- scSeqComm_summaryze_S_intra(inter_intra_scores)

##### Ongoing cellular communication among cell types #####

## select communication with "strong" intercellular evidence (e.g. > 0.8)
sinter_TH <- 0.8
selected_comm <- scSeqComm_select(inter_max_intra_scores, 
                                  S_inter = sinter_TH)
# interactive chord diagram - Figure A)
scSeqComm_chorddiag_cardinality(data = selected_comm)
# heatmap - Figure B)
scSeqComm_heatmap_cardinality(data = selected_comm, 
                              title = "Ongoing cellular communication (intercellular evidence only)")
[Figure: (A) chord diagram and (B) heatmap, intercellular evidence only (S_inter > 0.8)]
Then, we refined the analysis by counting only ligand-receptor pairs that also have S_intra above 0.8, so as to account only for cell-cell communication with evidence of a corresponding intracellular signal.
# select communication with "strong" intercellular AND intracellular evidence (e.g. s_inter > 0.8 and s_intra > 0.8)
sintra_TH <- 0.8
selected_comm <- scSeqComm_select(inter_max_intra_scores, 
                                  S_inter = sinter_TH, 
                                  S_intra = sintra_TH)
# interactive chord diagram - Figure C)
scSeqComm_chorddiag_cardinality(data = selected_comm)
# heatmap - Figure D)
scSeqComm_heatmap_cardinality(data = selected_comm, 
                              title = "Ongoing cellular communication (inter- and intra-cellular evidence)")
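[Figure: (C) chord diagram and (D) heatmap, intercellular and intracellular evidence (S_inter > 0.8, S_intra > 0.8)]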
## select communication with "medium" intercellular evidence (e.g. > 0.5)
sinter_TH <- 0.5
selected_comm <- scSeqComm_select(inter_max_intra_scores, 
                                  S_inter = sinter_TH)
# interactive chord diagram - Figure A)
scSeqComm_chorddiag_cardinality(data = selected_comm)
# heatmap - Figure B)
scSeqComm_heatmap_cardinality(data = selected_comm, title = "Ongoing cellular communication (intercellular evidence only)")

# select communication with "medium" intercellular AND intracellular evidence (e.g. s_inter > 0.5 and s_intra > 0.5)
sintra_TH <- 0.5
selected_comm <- scSeqComm_select(inter_max_intra_scores, 
                                  S_inter = sinter_TH, 
                                  S_intra = sintra_TH)
# interactive chord diagram - Figure C)
scSeqComm_chorddiag_cardinality(data = selected_comm)
# heatmap - Figure D)
scSeqComm_heatmap_cardinality(data = selected_comm, title = "Ongoing cellular communication (inter- and intra-cellular evidence)")
[Figure: chord diagrams and heatmaps with the "medium" 0.5 thresholds]
Communication involving chemokines
### chemokine pairs
chemokine_lig <- c("CX3CL1","CXCL2","CXCL3","CXCL9","CXCL10","CXCL12","CXCL13","CXCL16",
                   "CCL14","CCL16","CCL2","CCL21","CCL3","CCL3L3","CCL4","CCL5")
chemokine_rec <- c("CX3CR1", "CXCR2", "CXCR3", "CXCR4", "CXCR5", "CXCR6",
                   "CCR1","CCR2","CCR5","CCR7","CCRL2")

chemokine_comm <- scSeqComm_select(inter_max_intra_scores, operator = "OR",
                                   ligand = chemokine_lig, receptor = chemokine_rec)
# plot grouped by receptor and receiver cluster - Figure S5
scSeqComm_plot_scores(data = chemokine_comm,
              title = "Ligand-receptor pairs involving chemokines - Intercellular and intracellular signaling analysis")
[Figure S5: ligand-receptor pairs involving chemokines, grouped by receptor and receiver cluster]
# plot grouped by ligand and sender (ligand) cluster
scSeqComm_plot_scores(data = chemokine_comm, 
              title = "Ligand-receptor pairs involving chemokines - Intercellular and intracellular signaling analysis",
              facet_grid_x = "cluster_L", facet_grid_y = "ligand")
[Figure: ligand-receptor pairs involving chemokines, grouped by ligand and sender cluster]
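With operator = "OR", the selection above presumably keeps a row when its ligand is in chemokine_lig or its receptor is in chemokine_rec, which would explain why a pair such as CXCL12 - ITGB1 (receptor not in chemokine_rec) can still be plotted later. The base R sketch below contrasts that reading with an AND combination on a toy table; the semantics of the operator parameter of scSeqComm_select() are an assumption here, inferred from its name and the examples.

```r
# Hypothetical ligand-receptor entries
res <- data.frame(
  ligand   = c("CCL5", "GNAI2", "CXCL12", "A2M"),
  receptor = c("CCR5", "CCR5", "ITGB1", "LRP1"),
  stringsAsFactors = FALSE
)
chemokine_lig <- c("CCL5", "CXCL12")
chemokine_rec <- c("CCR1", "CCR5")

in_lig <- res$ligand %in% chemokine_lig
in_rec <- res$receptor %in% chemokine_rec

or_sel  <- res[in_lig | in_rec, ]   # ligand OR receptor in the lists
and_sel <- res[in_lig & in_rec, ]   # both ligand AND receptor in the lists
or_sel
and_sel
```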
## specific cell cluster pairs

# Macrophage --> CAF - Figure A)
# first select entries related to the given cell clusters pair
specific_cluster_pair_comm <- scSeqComm_select(chemokine_comm, 
                                               cluster_L = "Macrophage", cluster_R = "CAF")
# then plot the selected entries
scSeqComm_plot_cluster_pair(data = specific_cluster_pair_comm,
                            title = "Chemokine - Macrophage --> CAF")

# CAF --> Macrophage - Figure B)
# alternative: specify the given cell clusters pair as input parameter
scSeqComm_plot_cluster_pair(data = chemokine_comm,
                            selected_cluster_pair = "CAF --> Macrophage",
                            title = "Chemokine - CAF --> Macrophage")
[Figure: (A) Chemokine - Macrophage --> CAF; (B) Chemokine - CAF --> Macrophage]
## specific ligand-receptor pairs

pair <- "CCL2 - CCR5" # Figure A)
# first select entries related to the given LR pair
specific_LR_pair_comm <- scSeqComm_select(chemokine_comm, LR_pair = pair) 
# then plot the selected entries
scSeqComm_plot_LR_pair(data = specific_LR_pair_comm, title = pair)

pair <- "CXCL12 - ITGB1" # Figure B)
# alternative: specify the given LR pair as input parameter
scSeqComm_plot_LR_pair(data = chemokine_comm, title = pair, selected_LR_pair = pair)
[Figure: (A) CCL2 - CCR5; (B) CXCL12 - ITGB1]
CAF specific communication
##### CAF specific cellular communication #####

## select cellular communication
# - with "strong" intercellular AND intracellular evidence (e.g. s_inter > 0.8 and s_intra > 0.8)
# - specific of CAF cells (i.e. they do not show a high evidence of being involved in other groups of cells)

sinter_TH <- 0.8
sintra_TH <- 0.8
all_cluster <- names(cell_cluster)
all_pair <- unique(inter_intra_scores$LR_pair)
selected_cluster <- "CAF"

# ligand-receptor pairs with strong intercellular AND intracellular evidence in the other cell clusters
non_specific_pair <- unique(scSeqComm_select(inter_intra_scores, 
                                              cluster_R = setdiff(all_cluster,selected_cluster),
                                              S_inter = sinter_TH, S_intra = sintra_TH, 
                                              NA_included = FALSE, 
                                              output_field = "LR_pair"))

# ligand-receptor pairs with strong intercellular AND intracellular evidence ONLY in CAF cells
CAF_specific_comm <- scSeqComm_select(data = inter_intra_scores,
                                      cluster_R = selected_cluster,
                                      LR_pair = setdiff(all_pair, non_specific_pair),
                                      S_inter = sinter_TH, S_intra = sintra_TH,
                                      NA_included = FALSE)
CAF_specific_pair <- unique(CAF_specific_comm$LR_pair)
# KEGG pathways associated to CAF specific communication (for visualization purposes)
KEGG_CAF_specific <- scSeqComm_select(data = CAF_specific_comm,
                                      output_field = "pathway")

# select results associated to CAF specific cellular communication (for visualization purposes)
CAF_specific_comm_plot <- scSeqComm_select(inter_intra_scores,
                                      cluster_R = "CAF",
                                      LR_pair = CAF_specific_pair,
                                      pathway = KEGG_CAF_specific)

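# GO analysis (method = "general") of the CAF specific communication
# gene universe is all the target genes in dataset
geneUniverse <- unique(unlist(comm_res$TF_reg_DB_scrnaseq))
enrich_res_CAF_general_specificpair <- scSeqComm_GO_analysis(results_signaling = CAF_specific_comm,
                                                             geneUniverse = geneUniverse,
                                                             method = "general")

# plot (GO global)
scSeqComm_plot_scores_pathway(data = CAF_specific_comm_plot,
                              title = "CAF intercellular, intracellular and functional analysis",
                              annotation_GO = enrich_res_CAF_general_specificpair, cutoff = 0.05, topGO = 20)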
[Figure: CAF intercellular, intracellular and functional analysis of CAF specific communication]
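The specificity criterion above boils down to a set difference: pairs with strong evidence in CAF minus pairs with strong evidence in any other receiving cluster. A toy base R sketch of that logic is below; the table and values are hypothetical, thresholds are treated as strict (> 0.8), and an NA S_intra counts as no evidence, as with NA_included = FALSE.

```r
# Hypothetical per-receiver results
res <- data.frame(
  LR_pair   = c("FN1 - CD44", "FN1 - CD44", "A2M - LRP1", "CDH1 - EGFR"),
  cluster_R = c("CAF", "Macrophage", "CAF", "CAF"),
  S_inter   = c(0.90, 0.85, 0.95, 0.92),
  S_intra   = c(0.88, 0.81, 0.90, 0.40),
  stringsAsFactors = FALSE
)
th <- 0.8

# strong intercellular AND intracellular evidence (NA S_intra excluded)
strong <- res[res$S_inter > th & !is.na(res$S_intra) & res$S_intra > th, ]

strong_in_CAF    <- unique(strong$LR_pair[strong$cluster_R == "CAF"])
strong_elsewhere <- unique(strong$LR_pair[strong$cluster_R != "CAF"])

# pairs with strong evidence only in CAF
CAF_specific <- setdiff(strong_in_CAF, strong_elsewhere)
CAF_specific
```

FN1 - CD44 is dropped because it is also strong in macrophages, and CDH1 - EGFR because its S_intra is below the threshold.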

Life is good, and better with you.
