单细胞分析之细胞交互-7：目标基因集的配体和靶基因活性预测分析

常用的细胞通讯软件：

CellphoneDB：是公开的人工校正的，储存受体、配体以及两种相互作用的数据库。此外，还考虑了结构组成，能够描述异构复合物。（配体-受体+多聚体）

iTALK：通过平均表达量方式，筛选高表达的胚体和受体，根据结果作圈图。（配体-受体）

CellChat：CellChat将细胞的基因表达数据作为输入，并结合配体受体及其辅助因子的相互作用来模拟细胞间通讯。（配体-受体+多聚体+辅因子）

NicheNet // NicheNet多样本分析 // 目标基因的配体和靶基因活性预测：通过将相互作用细胞的表达数据与信号和基因调控网络的先验知识相结合来预测相互作用细胞之间的配体-靶标联系的方法。( 配体-受体+信号通路）
附：NicheNet使用的常见问题汇总

其它细胞互作软件还包括Celltalker，SingleCellSignalR，scTensor和SoptSC（这几个也是基于配体-受体相互作用）

Nichenet可以预测：

which ligands from one cell population (“sender/niche”) are most likely to affect target gene expression in an interacting cell population (“receiver/target”);
which specific target genes are affected by which of these predicted ligands.

1. 演示

1.0 加载R包和数据集

# R包
library(nichenetr)
library(tidyverse)

# 配体靶基因信息
ligand_target_matrix = readRDS(url("https://zenodo.org/record/3260758/files/ligand_target_matrix.rds"))
ligand_target_matrix[1:5,1:5] # target genes in rows, ligands in columns
##                 CXCL1        CXCL2        CXCL3        CXCL5         PPBP
## A1BG     3.534343e-04 4.041324e-04 3.729920e-04 3.080640e-04 2.628388e-04
## A1BG-AS1 1.650894e-04 1.509213e-04 1.583594e-04 1.317253e-04 1.231819e-04
## A1CF     5.787175e-04 4.596295e-04 3.895907e-04 3.293275e-04 3.211944e-04
## A2M      6.027058e-04 5.996617e-04 5.164365e-04 4.517236e-04 4.590521e-04
## A2M-AS1  8.898724e-05 8.243341e-05 7.484018e-05 4.912514e-05 5.120439e-05

##表达矩阵和metadata
hnscc_expression = readRDS(url("https://zenodo.org/record/3260758/files/hnscc_expression.rds"))
expression = hnscc_expression$expression
sample_info = hnscc_expression$sample_info # contains meta-information about the cells
View(hnscc_expression)

1.1 Define expressed genes in sender and receiver cell populations

我们的目标是探究CAFs表达的什么配体引起了周围肿瘤细胞的p-EMT，所以CAFs是sender细胞，肿瘤细胞是receiver细胞。（sender 和 receiver 也可以是同一种细胞类型，也就是自分泌）
因为我们想要研究的是high quality primary tumors，因此less quality的和lymph node metastases的肿瘤样本将被剔除。
在这个数据集中，为了定义expressed genes，我们采用了Ea, the aggregate expression of each gene i across the k cells, calculated as Ea(i) = log2(average(TPM(i)1…k)+1), should be >= 4. 而10x的数据集，我们更推荐genes to be expressed in a cell type when they have non-zero values in at least 10% of the cells from that cell type.

tumors_remove = c("HN10","HN","HN12", "HN13", "HN24", "HN7", "HN8","HN23")

CAF_ids = sample_info %>% filter(`Lymph node` == 0 & !(tumor %in% tumors_remove) & `non-cancer cell type` == "CAF") %>% pull(cell)
malignant_ids = sample_info %>% filter(`Lymph node` == 0 & !(tumor %in% tumors_remove) & `classified  as cancer cell` == 1) %>% pull(cell)

expressed_genes_sender = expression[CAF_ids,] %>% apply(2,function(x){10*(2**x - 1)}) %>% apply(2,function(x){log2(mean(x) + 1)}) %>% .[. >= 4] %>% names()
expressed_genes_receiver = expression[malignant_ids,] %>% apply(2,function(x){10*(2**x - 1)}) %>% apply(2,function(x){log2(mean(x) + 1)}) %>% .[. >= 4] %>% names()

# Check the number of expressed genes: should be a 'reasonable' number of total expressed genes in a cell type, e.g. between 5000-10000 (and not 500 or 20000)
length(expressed_genes_sender)
## [1] 6706
length(expressed_genes_receiver)
## [1] 6351

1.2 Define the gene set of interest and a background of genes

我们使用定义好的p-EMT基因集作为interest gene set，用肿瘤细胞表达的基因作为background.

geneset_oi = readr::read_tsv(url("https://zenodo.org/record/3260758/files/pemt_signature.txt"), col_names = "gene") %>% pull(gene) %>% .[. %in% rownames(ligand_target_matrix)] # only consider genes also present in the NicheNet model - this excludes genes from the gene list for which the official HGNC symbol was not used by Puram et al.
head(geneset_oi)
## [1] "SERPINE1" "TGFBI"    "MMP10"    "LAMC2"    "P4HA2"    "PDPN"

background_expressed_genes = expressed_genes_receiver %>% .[. %in% rownames(ligand_target_matrix)]
head(background_expressed_genes)
## [1] "RPS11"   "ELMO2"   "PNMA1"   "MMP2"    "TMEM216" "ERCC5"

1.3 Define a set of potential ligands

作为潜在的活性配体，我们将使用 1) 由 CAF 表达和 2) 可以结合恶性细胞表达的（putative）受体的配体。假定的配体-受体links是从 NicheNet 的ligand-receptor data sources收集的。

lr_network = readRDS(url("https://zenodo.org/record/3260758/files/lr_network.rds"))

# If wanted, users can remove ligand-receptor interactions that were predicted based on protein-protein interactions and only keep ligand-receptor interactions that are described in curated databases. To do this: uncomment following line of code:
# lr_network = lr_network %>% filter(database != "ppi_prediction_go" & database != "ppi_prediction")

ligands = lr_network %>% pull(from) %>% unique()
expressed_ligands = intersect(ligands,expressed_genes_sender)

receptors = lr_network %>% pull(to) %>% unique()
expressed_receptors = intersect(receptors,expressed_genes_receiver)

lr_network_expressed = lr_network %>% filter(from %in% expressed_ligands & to %in% expressed_receptors) 
head(lr_network_expressed)
## # A tibble: 6 x 4
##   from    to        source         database
##                        
## 1 HGF     MET       kegg_cytokines kegg    
## 2 TNFSF10 TNFRSF10A kegg_cytokines kegg    
## 3 TNFSF10 TNFRSF10B kegg_cytokines kegg    
## 4 TGFB2   TGFBR1    kegg_cytokines kegg    
## 5 TGFB3   TGFBR1    kegg_cytokines kegg    
## 6 INHBA   ACVR2A    kegg_cytokines kegg

配体-受体网络包含表达的配体-受体相互作用。作为 NicheNet 分析的潜在活性配体，我们将考虑来自该网络的配体。

potential_ligands = lr_network_expressed %>% pull(from) %>% unique()
head(potential_ligands)
## [1] "HGF"     "TNFSF10" "TGFB2"   "TGFB3"   "INHBA"   "CD99"

1.4 Perform NicheNet’s ligand activity analysis on the gene set of interest

现在进行配体活性分析：在此分析中，我们将计算每个配体的配体活性，或者换句话说，我们将评估每个 CAF 配体 (和背景基因相比) 预测 p-EMT 基因的能力（预测一个基因是否属于 p-EMT program）。

ligand_activities = predict_ligand_activities(geneset = geneset_oi, background_expressed_genes = background_expressed_genes, ligand_target_matrix = ligand_target_matrix, potential_ligands = potential_ligands)

现在，我们要根据配体活性对配体进行排名。在我们的validation study中，我们发现配体的靶基因预测与观察到的转录反应之间的pearson 相关系数 (PCC) 是定义配体活性的最有用的测量方法。因此，我们将根据配体的PCC对配体进行排名。 This allows us to prioritize p-EMT-regulating ligands.

ligand_activities %>% arrange(-pearson) 
## # A tibble: 131 x 4
##    test_ligand auroc   aupr pearson
##                
##  1 PTHLH       0.667 0.0720   0.128
##  2 CXCL12      0.680 0.0507   0.123
##  3 AGT         0.676 0.0581   0.120
##  4 TGFB3       0.689 0.0454   0.117
##  5 IL6         0.693 0.0510   0.115
##  6 INHBA       0.695 0.0502   0.113
##  7 ADAM17      0.672 0.0526   0.113
##  8 TNC         0.700 0.0444   0.109
##  9 CTGF        0.680 0.0473   0.108
## 10 FN1         0.679 0.0505   0.108
## # ... with 121 more rows
best_upstream_ligands = ligand_activities %>% top_n(20, pearson) %>% arrange(-pearson) %>% pull(test_ligand)
head(best_upstream_ligands)
## [1] "PTHLH"  "CXCL12" "AGT"    "TGFB3"  "IL6"    "INHBA"

我们在这里看到，performance metrics 表明 20 个排名靠前的配体可以合理地预测 p-EMT 基因，这意味着配体的排名应该是准确的。然而，对于某些基因集，排名靠前的配体的目标基因预测性能可能不会比随机预测好多少。在这种情况下，配体的优先级将不太可信。

Additional note：我们在这里查看了前 20 个配体，并将通过推断这 20 个配体的 p-EMT 靶基因来继续分析。然而，选择仅查看排名靠前的 20 个配体以进行进一步的生物学解释是基于生物学直觉并且是相当随意的。因此，用户可以决定使用不同数量的配体继续分析。我们建议通过查看配体活性值的分布来检查选定的截止值。在这里，我们显示了配体活性直方图（第 20 个配体的分数通过虚线表示）。

# show histogram of ligand activity scores
p_hist_lig_activity = ggplot(ligand_activities, aes(x=pearson)) + 
  geom_histogram(color="black", fill="darkorange")  + 
  # geom_density(alpha=.1, fill="orange") +
  geom_vline(aes(xintercept=min(ligand_activities %>% top_n(20, pearson) %>% pull(pearson))), color="red", linetype="dashed", size=1) + 
  labs(x="ligand activity (PCC)", y = "# ligands") +
  theme_classic()
p_hist_lig_activity

1.5 Infer target genes of top-ranked ligands and visualize in a heatmap

现在我们将展示如何查看配体和感兴趣的靶基因之间的调节潜力评分。在这种情况下，我们将研究排名靠前的 p-EMT 调节配体和 p-EMT 基因之间的联系。在配体-靶标热图中，我们展示了 20 个排名靠前的配体与以下靶基因之间相互作用的调节潜力评分：属于感兴趣基因组的基因和 20 个排名靠前的配体中至少一个的 250 个最强烈预测的靶标（根据一般先验模型的前 250 个靶标，因此不是该数据集的前 250 个靶标）。因此，基因集中不是优先配体之一的top靶基因的基因将不会显示在热图上。

active_ligand_target_links_df = best_upstream_ligands %>% lapply(get_weighted_ligand_target_links,geneset = geneset_oi, ligand_target_matrix = ligand_target_matrix, n = 250) %>% bind_rows()

nrow(active_ligand_target_links_df)
## [1] 143
head(active_ligand_target_links_df)
## # A tibble: 6 x 3
##   ligand target  weight
##         
## 1 PTHLH  COL1A1 0.00399
## 2 PTHLH  MMP1   0.00425
## 3 PTHLH  MMP2   0.00210
## 4 PTHLH  MYH9   0.00116
## 5 PTHLH  P4HA2  0.00190
## 6 PTHLH  PLAU   0.00401

出于可视化目的，我们按照如下方法调整了配体-靶标regulatory potential matrix。如果它们的分数低于预定义的阈值，则将调节潜力评分设置为 0，在这里使用的是 20 个排名靠前的配体与其各自的顶级目标之间的相互作用分数的 0.25 分位数（see the ligand-target network defined in the data frame）。

active_ligand_target_links = prepare_ligand_target_visualization(ligand_target_df = active_ligand_target_links_df, ligand_target_matrix = ligand_target_matrix, cutoff = 0.25)

nrow(active_ligand_target_links_df)
## [1] 143
head(active_ligand_target_links_df)
## # A tibble: 6 x 3
##   ligand target  weight
##         
## 1 PTHLH  COL1A1 0.00399
## 2 PTHLH  MMP1   0.00425
## 3 PTHLH  MMP2   0.00210
## 4 PTHLH  MYH9   0.00116
## 5 PTHLH  P4HA2  0.00190
## 6 PTHLH  PLAU   0.00401

我们使用热图来对putatively活性配体-目标links进行可视化。配体的顺序与根据配体活性预测的排序一致。

order_ligands = intersect(best_upstream_ligands, colnames(active_ligand_target_links)) %>% rev()
order_targets = active_ligand_target_links_df$target %>% unique()
vis_ligand_target = active_ligand_target_links[order_targets,order_ligands] %>% t()

p_ligand_target_network = vis_ligand_target %>% make_heatmap_ggplot("Prioritized CAF-ligands","p-EMT genes in malignant cells", color = "purple",legend_position = "top", x_axis_position = "top",legend_title = "Regulatory potential") + scale_fill_gradient2(low = "whitesmoke",  high = "purple", breaks = c(0,0.005,0.01)) + theme(axis.text.x = element_text(face = "italic"))

p_ligand_target_network

请注意，这些可视化cutoffs的选择是比较主观的。我们建议多测试几个cutoffs。

如果根据先验信息考虑超过前 250 个targets，将推断出更多但不太confident的ligand-target links；通过考虑少于 250 个targets，结果会更加stringent。

如果您将用于将分数设置为 0（出于可视化目的）的分位数截止值更改，则降低此截止值将导致更密集的热图，而提高此截止值将导致更稀疏的热图。

2. Follow-up analysis

2.1 Ligand-receptor network inference for top-ranked ligands

一种类型的后续分析是观察receiver细胞群（此处：肿瘤细胞）的哪些受体可能与来自sender细胞群（此处：CAF）的优先配体结合。

因此，我们现在将推断出排名靠前的配体的预测配体-受体相互作用，并在热图中将它们可视化。

# get the ligand-receptor network of the top-ranked ligands
lr_network_top = lr_network %>% filter(from %in% best_upstream_ligands & to %in% expressed_receptors) %>% distinct(from,to)
best_upstream_receptors = lr_network_top %>% pull(to) %>% unique()

# get the weights of the ligand-receptor interactions as used in the NicheNet model
weighted_networks = readRDS(url("https://zenodo.org/record/3260758/files/weighted_networks.rds"))
lr_network_top_df = weighted_networks$lr_sig %>% filter(from %in% best_upstream_ligands & to %in% best_upstream_receptors)

# convert to a matrix
lr_network_top_df = lr_network_top_df %>% spread("from","weight",fill = 0)
lr_network_top_matrix = lr_network_top_df %>% select(-to) %>% as.matrix() %>% magrittr::set_rownames(lr_network_top_df$to)

# perform hierarchical clustering to order the ligands and receptors
dist_receptors = dist(lr_network_top_matrix, method = "binary")
hclust_receptors = hclust(dist_receptors, method = "ward.D2")
order_receptors = hclust_receptors$labels[hclust_receptors$order]

dist_ligands = dist(lr_network_top_matrix %>% t(), method = "binary")
hclust_ligands = hclust(dist_ligands, method = "ward.D2")
order_ligands_receptor = hclust_ligands$labels[hclust_ligands$order]

Show a heatmap of the ligand-receptor interactions

vis_ligand_receptor_network = lr_network_top_matrix[order_receptors, order_ligands_receptor]
p_ligand_receptor_network = vis_ligand_receptor_network %>% t() %>% make_heatmap_ggplot("Prioritized CAF-ligands","Receptors expressed by malignant cells", color = "mediumvioletred", x_axis_position = "top",legend_title = "Prior interaction potential")
p_ligand_receptor_network

2.2 Visualize expression of top-predicted ligands and their target genes in a combined heatmap

NicheNet 只考虑sender细胞的表达配体，但不考虑它们的表达来对配体进行排序。该排名纯粹基于在给定先验知识的情况下，配体可能调节感兴趣的基因集的潜力。因为进一步研究配体及其靶基因的表达也很有用，我们在此演示如何制作显示配体活性、配体表达、靶基因表达和配体-靶调节潜力的组合图。

Load additional packages required for the visualization:

library(RColorBrewer)
library(cowplot)
library(ggpubr)

Prepare the ligand activity matrix

ligand_pearson_matrix = ligand_activities %>% select(pearson) %>% as.matrix() %>% magrittr::set_rownames(ligand_activities$test_ligand)

vis_ligand_pearson = ligand_pearson_matrix[order_ligands, ] %>% as.matrix(ncol = 1) %>% magrittr::set_colnames("Pearson")

p_ligand_pearson = vis_ligand_pearson %>% make_heatmap_ggplot("Prioritized CAF-ligands","Ligand activity", color = "darkorange",legend_position = "top", x_axis_position = "top", legend_title = "Pearson correlation coefficient\ntarget gene prediction ability)")
p_ligand_pearson

Prepare expression of ligands in fibroblast per tumor

因为单细胞数据是从多个肿瘤中收集的，我们将在这里显示每个肿瘤配体的平均表达。

expression_df_CAF = expression[CAF_ids,order_ligands] %>% data.frame() %>% rownames_to_column("cell") %>% as_tibble() %>% inner_join(sample_info %>% select(cell,tumor), by =  "cell")

aggregated_expression_CAF = expression_df_CAF %>% group_by(tumor) %>% select(-cell) %>% summarise_all(mean)

aggregated_expression_df_CAF = aggregated_expression_CAF %>% select(-tumor) %>% t() %>% magrittr::set_colnames(aggregated_expression_CAF$tumor) %>% data.frame() %>% rownames_to_column("ligand") %>% as_tibble() 

aggregated_expression_matrix_CAF = aggregated_expression_df_CAF %>% select(-ligand) %>% as.matrix() %>% magrittr::set_rownames(aggregated_expression_df_CAF$ligand)

order_tumors = c("HN6","HN20","HN26","HN28","HN22","HN25","HN5","HN18","HN17","HN16") # this order was determined based on the paper from Puram et al. Tumors are ordered according to p-EMT score.
vis_ligand_tumor_expression = aggregated_expression_matrix_CAF[order_ligands,order_tumors]

library(RColorBrewer)
color = colorRampPalette(rev(brewer.pal(n = 7, name ="RdYlBu")))(100)
p_ligand_tumor_expression = vis_ligand_tumor_expression %>% make_heatmap_ggplot("Prioritized CAF-ligands","Tumor", color = color[100],legend_position = "top", x_axis_position = "top", legend_title = "Expression\n(averaged over\nsingle cells)") + theme(axis.text.y = element_text(face = "italic"))
p_ligand_tumor_expression

Prepare expression of target genes in malignant cells per tumor

expression_df_target = expression[malignant_ids,geneset_oi] %>% data.frame() %>% rownames_to_column("cell") %>% as_tibble() %>% inner_join(sample_info %>% select(cell,tumor), by =  "cell") 

aggregated_expression_target = expression_df_target %>% group_by(tumor) %>% select(-cell) %>% summarise_all(mean)

aggregated_expression_df_target = aggregated_expression_target %>% select(-tumor) %>% t() %>% magrittr::set_colnames(aggregated_expression_target$tumor) %>% data.frame() %>% rownames_to_column("target") %>% as_tibble() 

aggregated_expression_matrix_target = aggregated_expression_df_target %>% select(-target) %>% as.matrix() %>% magrittr::set_rownames(aggregated_expression_df_target$target)

vis_target_tumor_expression_scaled = aggregated_expression_matrix_target %>% t() %>% scale_quantile() %>% .[order_tumors,order_targets]

p_target_tumor_scaled_expression = vis_target_tumor_expression_scaled  %>% make_threecolor_heatmap_ggplot("Tumor","Target", low_color = color[1],mid_color = color[50], mid = 0.5, high_color = color[100], legend_position = "top", x_axis_position = "top" , legend_title = "Scaled expression\n(averaged over\nsingle cells)") + theme(axis.text.x = element_text(face = "italic"))
p_target_tumor_scaled_expression

都画在一起

figures_without_legend = plot_grid(
  p_ligand_pearson + theme(legend.position = "none", axis.ticks = element_blank()) + theme(axis.title.x = element_text()),
  p_ligand_tumor_expression + theme(legend.position = "none", axis.ticks = element_blank()) + theme(axis.title.x = element_text()) + ylab(""),
  p_ligand_target_network + theme(legend.position = "none", axis.ticks = element_blank()) + ylab(""), 
  NULL,
  NULL,
  p_target_tumor_scaled_expression + theme(legend.position = "none", axis.ticks = element_blank()) + xlab(""), 
  align = "hv",
  nrow = 2,
  rel_widths = c(ncol(vis_ligand_pearson)+ 4.5, ncol(vis_ligand_tumor_expression), ncol(vis_ligand_target)) -2,
  rel_heights = c(nrow(vis_ligand_pearson), nrow(vis_target_tumor_expression_scaled) + 3)) 

legends = plot_grid(
  as_ggplot(get_legend(p_ligand_pearson)),
  as_ggplot(get_legend(p_ligand_tumor_expression)),
  as_ggplot(get_legend(p_ligand_target_network)),
  as_ggplot(get_legend(p_target_tumor_scaled_expression)),
  nrow = 2,
  align = "h")

plot_grid(figures_without_legend, 
          legends, 
          rel_heights = c(10,2), nrow = 2, align = "hv")

2.3 Other follow-up analyses:

As another follow-up analysis, you can infer possible signaling paths between ligands and targets of interest. You can read how to do this in the following vignette Inferring ligand-to-target signaling paths:vignette("ligand_target_signaling_path", package="nichenetr").
Another follow-up analysis is getting a “tangible” measure of how well top-ranked ligands predict the gene set of interest and assess which genes of the gene set can be predicted well. You can read how to do this in the following vignette Assess how well top-ranked ligands can predict a gene set of interest:vignette("target_prediction_evaluation_geneset", package="nichenetr").
In case you want to visualize ligand-target links between multiple interacting cells, you can make an appealing circos plot as shown in vignette Circos plot visualization to show active ligand-target links between interacting cells:vignette("circos", package="nichenetr").

3. 从seurat对象做分析

3.0 数据集准备

library(nichenetr)
library(tidyverse)

# 配体靶基因信息
ligand_target_matrix = readRDS(url("https://zenodo.org/record/3260758/files/ligand_target_matrix.rds"))
ligand_target_matrix[1:5,1:5] # target genes in rows, ligands in columns
#读入seurat对象，用的是nichenet多组比较的那个数据集
seuratObj <- readRDS(url("https://zenodo.org/record/4675430/files/seurat_obj_hnscc.rds"))
#得到表达矩阵和metadata
expression=t(as.matrix(seuratObj@assays$SCT@data))
[email protected]

3.1 定义sender 和 receiver细胞群中的 expressed genes

#筛选样本
CAF_ids = sample_info %>% filter(`non.cancer.cell.type` == "CAF") %>% pull(cell)
malignant_ids = sample_info %>% filter(`classified..as.cancer.cell` == 1) %>% pull(cell)

#10x推荐至少pct = 0.10
receiver = "Malignant"
expressed_genes_receiver = get_expressed_genes(receiver, seuratObj, pct = 0.10)
length(expressed_genes_receiver)
# [1] 9994
sender = "CAF"
expressed_genes_sender = get_expressed_genes(sender, seuratObj, pct = 0.10)
length(expressed_genes_sender)
# [1] 8117

# 多种sender细胞的情况
# sender_celltypes = c("CD4 T","Treg", "Mono", "NK", "B", "DC")
# list_expressed_genes_sender = sender_celltypes %>% unique() %>% lapply(get_expressed_genes, seuratObj, 0.10) # lapply to get the expressed genes of every sender cell type separately here
# expressed_genes_sender = list_expressed_genes_sender %>% unlist() %>% unique()

3.2 定义interest 和 background基因集

geneset_oi = readr::read_tsv(url("https://zenodo.org/record/3260758/files/pemt_signature.txt"), col_names = "gene") %>% pull(gene) %>% .[. %in% rownames(ligand_target_matrix)] # only consider genes also present in the NicheNet model - this excludes genes from the gene list for which the official HGNC symbol was not used by Puram et al.
head(geneset_oi)
## [1] "SERPINE1" "TGFBI"    "MMP10"    "LAMC2"    "P4HA2"    "PDPN"

background_expressed_genes = expressed_genes_receiver %>% .[. %in% rownames(ligand_target_matrix)]
head(background_expressed_genes)

3.3 Define a set of potential ligands

lr_network = readRDS(url("https://zenodo.org/record/3260758/files/lr_network.rds"))

# If wanted, users can remove ligand-receptor interactions that were predicted based on protein-protein interactions and only keep ligand-receptor interactions that are described in curated databases. To do this: uncomment following line of code:
# lr_network = lr_network %>% filter(database != "ppi_prediction_go" & database != "ppi_prediction")

ligands = lr_network %>% pull(from) %>% unique()
expressed_ligands = intersect(ligands,expressed_genes_sender)

receptors = lr_network %>% pull(to) %>% unique()
expressed_receptors = intersect(receptors,expressed_genes_receiver)

lr_network_expressed = lr_network %>% filter(from %in% expressed_ligands & to %in% expressed_receptors) 
head(lr_network_expressed)
## A tibble: 6 × 4
#  from  to     source         database
#                  
# 1 IL6   IL6ST  kegg_cytokines kegg    
# 2 IL6   IL6R   kegg_cytokines kegg    
# 3 IL11  IL6ST  kegg_cytokines kegg    
# 4 CLCF1 IL6ST  kegg_cytokines kegg    
# 5 HGF   MET    kegg_cytokines kegg    
# 6 IL10  IL10RB kegg_cytokines kegg

potential_ligands = lr_network_expressed %>% pull(from) %>% unique()
head(potential_ligands)
# [1] "IL6"     "IL11"    "CLCF1"   "HGF"     "IL10"    "TNFSF10"

3.4 Perform NicheNet’s ligand activity analysis on the gene set of interest

ligand_activities = predict_ligand_activities(geneset = geneset_oi, background_expressed_genes = background_expressed_genes, ligand_target_matrix = ligand_target_matrix, potential_ligands = potential_ligands)
ligand_activities %>% arrange(-pearson) 
best_upstream_ligands = ligand_activities %>% top_n(20, pearson) %>% arrange(-pearson) %>% pull(test_ligand)
head(best_upstream_ligands)
p_hist_lig_activity = ggplot(ligand_activities, aes(x=pearson)) + 
  geom_histogram(color="black", fill="darkorange")  + 
  # geom_density(alpha=.1, fill="orange") +
  geom_vline(aes(xintercept=min(ligand_activities %>% top_n(20, pearson) %>% pull(pearson))), color="red", linetype="dashed", size=1) + 
  labs(x="ligand activity (PCC)", y = "# ligands") +
  theme_classic()
p_hist_lig_activity

3.5 Infer target genes of top-ranked ligands and visualize in a heatmap

active_ligand_target_links_df = best_upstream_ligands %>% lapply(get_weighted_ligand_target_links,geneset = geneset_oi, ligand_target_matrix = ligand_target_matrix, n = 250) %>% bind_rows()

nrow(active_ligand_target_links_df)
## [1] 136
head(active_ligand_target_links_df)

active_ligand_target_links = prepare_ligand_target_visualization(ligand_target_df = active_ligand_target_links_df, ligand_target_matrix = ligand_target_matrix, cutoff = 0.25)

nrow(active_ligand_target_links_df)
## [1] 136
head(active_ligand_target_links_df)

order_ligands = intersect(best_upstream_ligands, colnames(active_ligand_target_links)) %>% rev()
order_targets = active_ligand_target_links_df$target %>% unique()
vis_ligand_target = active_ligand_target_links[order_targets,order_ligands] %>% t()

p_ligand_target_network = vis_ligand_target %>% make_heatmap_ggplot("Prioritized CAF-ligands","p-EMT genes in malignant cells", color = "purple",legend_position = "top", x_axis_position = "top",legend_title = "Regulatory potential") + scale_fill_gradient2(low = "whitesmoke",  high = "purple", breaks = c(0,0.005,0.01)) + theme(axis.text.x = element_text(face = "italic"))

p_ligand_target_network

3.6 合并绘图

## Load additional packages required for the visualization
library(RColorBrewer)
library(cowplot)
library(ggpubr)

## Prepare the ligand activity matrix
ligand_pearson_matrix = ligand_activities %>% select(pearson) %>% as.matrix() %>% magrittr::set_rownames(ligand_activities$test_ligand)
vis_ligand_pearson = ligand_pearson_matrix[order_ligands, ] %>% as.matrix(ncol = 1) %>% magrittr::set_colnames("Pearson")
p_ligand_pearson = vis_ligand_pearson %>% make_heatmap_ggplot("Prioritized CAF-ligands","Ligand activity", color = "darkorange",legend_position = "top", x_axis_position = "top", legend_title = "Pearson correlation coefficient\ntarget gene prediction ability)")
# p_ligand_pearson

## Prepare expression of ligands in fibroblast per tumor
expression_df_CAF = expression[CAF_ids,order_ligands] %>% data.frame() %>% rownames_to_column("cell") %>% as_tibble() %>% inner_join(sample_info %>% select(cell,tumor), by =  "cell")
aggregated_expression_CAF = expression_df_CAF %>% group_by(tumor) %>% select(-cell) %>% summarise_all(mean)
aggregated_expression_df_CAF = aggregated_expression_CAF %>% select(-tumor) %>% t() %>% magrittr::set_colnames(aggregated_expression_CAF$tumor) %>% data.frame() %>% rownames_to_column("ligand") %>% as_tibble() 
aggregated_expression_matrix_CAF = aggregated_expression_df_CAF %>% select(-ligand) %>% as.matrix() %>% magrittr::set_rownames(aggregated_expression_df_CAF$ligand)
order_tumors = c("HN6","HN20","HN26","HN28","HN22","HN25","HN5","HN18","HN17","HN16") # this order was determined based on the paper from Puram et al. Tumors are ordered according to p-EMT score.
vis_ligand_tumor_expression = aggregated_expression_matrix_CAF[order_ligands,order_tumors]
library(RColorBrewer)
color = colorRampPalette(rev(brewer.pal(n = 7, name ="RdYlBu")))(100)
p_ligand_tumor_expression = vis_ligand_tumor_expression %>% make_heatmap_ggplot("Prioritized CAF-ligands","Tumor", color = color[100],legend_position = "top", x_axis_position = "top", legend_title = "Expression\n(averaged over\nsingle cells)") + theme(axis.text.y = element_text(face = "italic"))
# p_ligand_tumor_expression

## Prepare expression of target genes in malignant cells per tumor
expression_df_target = expression[malignant_ids,geneset_oi] %>% data.frame() %>% rownames_to_column("cell") %>% as_tibble() %>% inner_join(sample_info %>% select(cell,tumor), by =  "cell") 
aggregated_expression_target = expression_df_target %>% group_by(tumor) %>% select(-cell) %>% summarise_all(mean)
aggregated_expression_df_target = aggregated_expression_target %>% select(-tumor) %>% t() %>% magrittr::set_colnames(aggregated_expression_target$tumor) %>% data.frame() %>% rownames_to_column("target") %>% as_tibble() 
aggregated_expression_matrix_target = aggregated_expression_df_target %>% select(-target) %>% as.matrix() %>% magrittr::set_rownames(aggregated_expression_df_target$target)
vis_target_tumor_expression_scaled = aggregated_expression_matrix_target %>% t() %>% scale_quantile() %>% .[order_tumors,order_targets]
p_target_tumor_scaled_expression = vis_target_tumor_expression_scaled  %>% make_threecolor_heatmap_ggplot("Tumor","Target", low_color = color[1],mid_color = color[50], mid = 0.5, high_color = color[100], legend_position = "top", x_axis_position = "top" , legend_title = "Scaled expression\n(averaged over\nsingle cells)") + theme(axis.text.x = element_text(face = "italic"))
# p_target_tumor_scaled_expression

##都画在一起
figures_without_legend = plot_grid(
  p_ligand_pearson + theme(legend.position = "none", axis.ticks = element_blank()) + theme(axis.title.x = element_text()),
  p_ligand_tumor_expression + theme(legend.position = "none", axis.ticks = element_blank()) + theme(axis.title.x = element_text()) + ylab(""),
  p_ligand_target_network + theme(legend.position = "none", axis.ticks = element_blank()) + ylab(""), 
  NULL,
  NULL,
  p_target_tumor_scaled_expression + theme(legend.position = "none", axis.ticks = element_blank()) + xlab(""), 
  align = "hv",
  nrow = 2,
  rel_widths = c(ncol(vis_ligand_pearson)+ 4.5, ncol(vis_ligand_tumor_expression), ncol(vis_ligand_target)) -2,
  rel_heights = c(nrow(vis_ligand_pearson), nrow(vis_target_tumor_expression_scaled) + 3)) 

legends = plot_grid(
  as_ggplot(get_legend(p_ligand_pearson)),
  as_ggplot(get_legend(p_ligand_tumor_expression)),
  as_ggplot(get_legend(p_ligand_target_network)),
  as_ggplot(get_legend(p_target_tumor_scaled_expression)),
  nrow = 2,
  align = "h")

plot_grid(figures_without_legend, 
          legends, 
          rel_heights = c(10,2), nrow = 2, align = "hv")