在我们之前的分享中,已经分享过很多的单细胞和空间联合分析的方法,有关Spotlight,之前也提到过,但是并没有很认真的对关注这个方法,这次的分享,就让我们来参透这个方法;这个软件目前已经发表,文献在SPOTlight:Seeded NMF regression to Deconvolute Spatial Transcriptomics Spots with Single-Cell Transcriptomes,发表于Nucleic acids research,影响因子11分以上。文章大家可以认真研读一下,算法也很经典。
软件联合分析的原理
SPOTlight is centered around a seeded non-negative matrix factorization (NMF) regression, initialized using cell-type marker genes, and nonnegative least squares (NNLS) to subsequently deconvolute ST capture locations (spots).
其中的几个概念我们需要了解;
(1)NMF,这个可以参考文章Non-Negative Matrix Factorization (NMF)。我们这里举一个简单的例子帮助大家理解:
矩阵分解的想法源于我们相信每个电影都是由某些元素组成的,而这些元素到底是什么,有多少,我们是不知道的。
而不同的用户出于个人喜好一定会对这些元素有个评价。
比如:
一个电影里如果只考虑爱情和动作两个因素。
一个用户也一定会有我给爱情打分多一点还是喜欢动作多一点。
如图:
期望就是我们已知下面的矩阵,而期望分解出比较合理的上面两个矩阵。这样就可以用来预测别的用户对别的电影的看法了。把这个想法运用到我们的空间数据上,就可以了。
(2) nonnegative least squares (NNLS),非负最小二乘法,这个可以参考文章【技术分享】非负最小二乘,算法很难,了解一下即可。
总而言之一句话,Spotlight利用单细胞数据的特征,deconvolute ST 数据,从而知道每种细胞类型在空间上的位置。
软件的优势
Using synthetic spots, simulating varying reference quantities and qualities, we confirmed high prediction accuracy also with shallowly sequenced or small-sized scRNA-seq reference datasets(准确度高,文献肯定都这么写)。We trained the NMF regression model with sample-matched or external datasets, resulting in accurate and sensitive spatial predictions。还有一点,Using SPOTlight to detect regional enrichment of immune cells and their co-localization with tumor and adjacent stroma provides an illustrative example in its flexible application spectrum and future potential in digital pathology.(免疫细胞和肿瘤细胞的共定位问题,值得关注)。
(重点)Successful integration of both data modalities could enable an in-depth study of tissue and organ architecture, elucidate cellular cross-talk, spatially track dynamic cell trajectories, and identify disease-specific interaction networks 。Intersecting cell-type-specific genes from scRNA-seq with ST capture sites previously identified local enrichments, sufficient to segment tumor sections into normal and cancerous areas.(这个优点大家要多注意下,单细胞分析得到的通讯啊,轨迹啊,在空间上又是什么状态呢?)
技术示意图
我们来看一下这个软件对空间数据分解的过程:
1、最核心的部分,identify cell type-specific topic profiles used to deconvolute ST spots。
这里多说一句,第一步的做法相当于将我们的表达矩阵拆分成了两个矩阵,W和H。(理论部分 NMF(Non-negative matrix factorization),即对于任意给定的一个非负矩阵V,其能够寻找到一个非负矩阵W和一个非负矩阵H,满足条件V=WH,从而将一个非负的矩阵分解为左右两个非负矩阵的乘积。*其中,V矩阵中每一列代表一个观测(observation),每一行代表一个特征(feature);W矩阵称为基矩阵,H矩阵称为系数矩阵或权重矩阵。这时用系数矩阵H代替原始矩阵,就可以实现对原始矩阵进行降维,得到数据特征的降维矩阵。)We set out to use NMF to obtain topic profiles due to its previous success in identifying biologically relevant gene expression programs, as well as its previous implementation in ST analysis。这一步大家着重理解一下,关于NMF。Importantly, the NMF is initialized by the two main matrices: the basis matrix (W) with unique cell type marker genes and weights, and the coefficient matrix (H) in which each row is initialized, specifying the corresponding relationship of a cell to a topic。
2、Factorization is then carried out using non-smooth NMF(因式分解)。promoting cell type-specific topic profiles, while reducing overfitting during training。
3、NNLS regression is used to map each spot’s transcriptome to a topic profile distribution using the unit-variance normalized ST count matrix and the basis matrix previously obtained。回归分析,注意这里用到的矩阵关系。
4、Lastly, NNLS is again applied to determine the weights for each cell type that best fit each spot’s topic profile by minimizing the residuals.
We use a minimum weight contribution threshold to determine which cell types are contributing to the profile of a given spot, also considering the possibility of partial contributions. NNLS also returns a measure of error along with the predicted cell proportions, allowing the user to estimate the reliability of predicted spot compositions
这个地方大家需要注意的是,矩阵的分解,以及分别进行回归,最后一起预测细胞类型的组成。
接下来就是一些运用了,包括文献本身的验证和空间数据上的运用,大家看看即可, 肯定效果很不错,要不然发不出来,。
最后我们来看一下代码部分:
SPOTlight
这就不照抄别人的代码了,我们只关注重点:spotlight_deconvolution
关于这个函数有几个参数必须要注意:
1、cluster_markers 注意这里需要的marker单细胞分析得到的,但是用多少,需要判断。
2、cl_n Object of integer indicating how many cells to keep from each cluster. If a cluster has n < cl_n then all cells will be selected, if it has more then cl_n will be sampled randomly,100 by default.
3、method Object of class character; Type of method to us to find W and H. Look at NMF package for the options and specifications, by default nsNMF(最重要的一步,因式分解的方法)。
等等等等
代码部分
The goal of SPOTlight is to provide a tool that enables the deconvolution of cell types and cell type proportions present within each capture locations comprising mixtures of cells, originally developed for 10X’s Visium - spatial trancsiptomics- technology, it can be used for all technologies returning mixtures of cells. SPOTlight is based on finding topic profile signatures, by means of an NMFreg model, for each cell type and finding which combination fits best the spot we want to deconvolute.
Graphical abstract
0.1 Installation
You can install the latest stable version from the GitHub repository SPOTlight with:
# install.packages("devtools")
devtools::install_github("https://github.com/MarcElosua/SPOTlight")
devtools::install_git("https://github.com/MarcElosua/SPOTlight")
Or the latest version in development by downloading the devel branch
devtools::install_github("https://github.com/MarcElosua/SPOTlight", ref = "devel")
devtools::install_git("https://github.com/MarcElosua/SPOTlight", ref = "devel")
0.2 Libraries
library(Matrix)
library(data.table)
library(Seurat)
library(SeuratData)
library(dplyr)
library(gt)
library(SPOTlight)
library(igraph)
library(RColorBrewer)
0.3 Load data
For the purpose of this tutorial we are going to use adult mouse brain data. The scRNAseq data can be downloaded here while the spatial data is the one put out publicly by 10X and the processed object can be downloaded using SeuratData as shown below.
Load single-cell reference dataset.
path_to_data <- system.file(package = "SPOTlight")
cortex_sc <- readRDS(glue::glue("{path_to_data}/allen_cortex_dwn.rds"))
Load Spatial data
if (! "stxBrain" %in% SeuratData::AvailableData()[, "Dataset"]) {
# If dataset not downloaded proceed to download it
SeuratData::InstallData("stxBrain")
}
# Load data
anterior <- SeuratData::LoadData("stxBrain", type = "anterior1")
0.4 Pre-processing
set.seed(123)
cortex_sc <- Seurat::SCTransform(cortex_sc, verbose = FALSE) %>%
Seurat::RunPCA(., verbose = FALSE) %>%
Seurat::RunUMAP(., dims = 1:30, verbose = FALSE)
Visualize the clustering
Seurat::DimPlot(cortex_sc,
group.by = "subclass",
label = TRUE) + Seurat::NoLegend()
0.5 Descriptive
[email protected] %>%
dplyr::count(subclass) %>%
gt::gt(.[-1, ]) %>%
gt::tab_header(
title = "Cell types present in the reference dataset",
) %>%
gt::cols_label(
subclass = gt::html("Cell Type")
)
Cell types present in the reference dataset | |
---|---|
--- | |
Cell Type | n |
--- | --- |
Astro | 70 |
CR | 7 |
Endo | 70 |
L2/3 IT | 70 |
L4 | 70 |
L5 IT | 70 |
L5 PT | 70 |
L6 CT | 70 |
L6 IT | 70 |
L6b | 70 |
Lamp5 | 70 |
Macrophage | 51 |
Meis2 | 45 |
NP | 70 |
Oligo | 70 |
Peri | 32 |
Pvalb | 70 |
Serpinf1 | 27 |
SMC | 55 |
Sncg | 70 |
Sst | 70 |
Vip | 70 |
VLMC | 67 |
0.6 Compute marker genes
To determine the most important marker genes we can use the function Seurat::FindAllMarkers
which will return the markers for each cluster.
Seurat::Idents(object = cortex_sc) <- [email protected]$subclass
cluster_markers_all <- Seurat::FindAllMarkers(object = cortex_sc,
assay = "SCT",
slot = "data",
verbose = TRUE,
only.pos = TRUE)
saveRDS(object = cluster_markers_all,
file = here::here("inst/markers_sc.RDS"))
0.6.1 SPOTlight Decomposition
set.seed(123)
spotlight_ls <- spotlight_deconvolution(
se_sc = cortex_sc,
counts_spatial = anterior@assays$Spatial@counts,
clust_vr = "subclass", # Variable in sc_seu containing the cell-type annotation
cluster_markers = cluster_markers_all, # Dataframe with the marker genes
cl_n = 100, # number of cells per cell type to use
hvg = 3000, # Number of HVG to use
ntop = NULL, # How many of the marker genes to use (by default all)
transf = "uv", # Perform unit-variance scaling per cell and spot prior to factorzation and NLS
method = "nsNMF", # Factorization method
min_cont = 0 # Remove those cells contributing to a spot below a certain threshold
)
saveRDS(object = spotlight_ls, file = here::here("inst/spotlight_ls.rds"))
Read RDS object
spotlight_ls <- readRDS(file = here::here("inst/spotlight_ls.rds"))
nmf_mod <- spotlight_ls[[1]]
decon_mtrx <- spotlight_ls[[2]]
0.6.2 Assess deconvolution
Before even looking at the decomposed spots we can gain insight on how well the model performed by looking at the topic profiles for the cell types.
The first thing we can do is look at how specific the topic profiles are for each cell type.
h <- NMF::coef(nmf_mod[[1]])
rownames(h) <- paste("Topic", 1:nrow(h), sep = "_")
topic_profile_plts <- SPOTlight::dot_plot_profiles_fun(
h = h,
train_cell_clust = nmf_mod[[2]])
topic_profile_plts[[2]] + ggplot2::theme(
axis.text.x = ggplot2::element_text(angle = 90),
axis.text = ggplot2::element_text(size = 12))
0.7 Visualization
Join decomposition with metadata
# This is the equivalent to setting min_cont to 0.04
decon_mtrx_sub <- decon_mtrx[, colnames(decon_mtrx) != "res_ss"]
decon_mtrx_sub[decon_mtrx_sub < 0.08] <- 0
decon_mtrx <- cbind(decon_mtrx_sub, "res_ss" = decon_mtrx[, "res_ss"])
rownames(decon_mtrx) <- colnames(anterior)
decon_df <- decon_mtrx %>%
data.frame() %>%
tibble::rownames_to_column("barcodes")
[email protected] <- [email protected] %>%
tibble::rownames_to_column("barcodes") %>%
dplyr::left_join(decon_df, by = "barcodes") %>%
tibble::column_to_rownames("barcodes")
0.7.1 Specific cell-types
we can use the standard Seurat::SpatialFeaturePlot
to view predicted celltype proportions one at a time.
Seurat::SpatialFeaturePlot(
object = anterior,
features = c("L2.3.IT", "L6b", "Meis2", "Oligo"),
alpha = c(0.1, 1))
0.7.2 Spatial scatterpies
Plot spot composition of all the spots.
cell_types_all <- colnames(decon_mtrx)[which(colnames(decon_mtrx) != "res_ss")]
SPOTlight::spatial_scatterpie(se_obj = anterior,
cell_types_all = cell_types_all,
img_path = "sample_data/spatial/tissue_lowres_image.png",
pie_scale = 0.4)
Plot spot composition of spots containing cell-types of interest
SPOTlight::spatial_scatterpie(se_obj = anterior,
cell_types_all = cell_types_all,
img_path = "sample_data/spatial/tissue_lowres_image.png",
cell_types_interest = "L6b",
pie_scale = 0.8)
0.7.3 Spatial interaction graph
Now that we know which cell types are found within each spot we can make a graph representing spatial interactions where cell types will have stronger edges between them the more often we find them within the same spot. To do this we will only need to run the function get_spatial_interaction_graph
, this function prints the plot and returns the elements needed to plot it.
graph_ntw <- SPOTlight::get_spatial_interaction_graph(decon_mtrx = decon_mtrx[, cell_types_all])
If you want to tune how the graph looks you can do the following or you can check out more options here:
deg <- degree(graph_ntw, mode="all")
# Get color palette for difusion
edge_importance <- E(graph_ntw)$importance
# Select a continuous palette
qual_col_pals <- brewer.pal.info[brewer.pal.info$category == 'seq',]
# Create a color palette
getPalette <- colorRampPalette(brewer.pal(9, "YlOrRd"))
# Get how many values we need
grad_edge <- seq(0, max(edge_importance), 0.1)
# Generate extended gradient palette dataframe
graph_col_df <- data.frame(value = as.character(grad_edge),
color = getPalette(length(grad_edge)),
stringsAsFactors = FALSE)
# Assign color to each edge
color_edge <- data.frame(value = as.character(round(edge_importance, 1)), stringsAsFactors = FALSE) %>%
dplyr::left_join(graph_col_df, by = "value") %>%
dplyr::pull(color)
# Open a pdf file
plot(graph_ntw,
# Size of the edge
edge.width = edge_importance,
edge.color = color_edge,
# Size of the buble
vertex.size = deg,
vertex.color = "#cde394",
vertex.frame.color = "white",
vertex.label.color = "black",
vertex.label.family = "Ubuntu", # Font family of the label (e.g.“Times”, “Helvetica”)
layout = layout.circle)
Lastly one can compute cell-cell correlations to see groups of cells that correlate positively or negatively.
# Remove cell types not predicted to be on the tissue
decon_mtrx_sub <- decon_mtrx[, cell_types_all]
decon_mtrx_sub <- decon_mtrx_sub[, colSums(decon_mtrx_sub) > 0]
# Compute correlation
decon_cor <- cor(decon_mtrx_sub)
# Compute correlation P-value
p.mat <- corrplot::cor.mtest(mat = decon_mtrx_sub, conf.level = 0.95)
# Visualize
ggcorrplot::ggcorrplot(
corr = decon_cor,
p.mat = p.mat[[1]],
hc.order = TRUE,
type = "full",
insig = "blank",
lab = TRUE,
outline.col = "lightgrey",
method = "square",
# colors = c("#4477AA", "white", "#BB4444"))
colors = c("#6D9EC1", "white", "#E46726"),
title = "Predicted cell-cell proportion correlation",
legend.title = "Correlation\n(Pearson)") +
ggplot2::theme(
plot.title = ggplot2::element_text(size = 22, hjust = 0.5, face = "bold"),
legend.text = ggplot2::element_text(size = 12),
legend.title = ggplot2::element_text(size = 15),
axis.text.x = ggplot2::element_text(angle = 90),
axis.text = ggplot2::element_text(size = 18, vjust = 0.5))
0.8 Step-by-Step insight
Here we are going to show step by step what is going on and all the different steps involved in the process.
还是那句话,生活很好,有你更好