Joint Analysis of 10X Single-Cell and Spatial Data, Part 4 ---- DSTG

Today we look at another method for joint analysis of 10X single-cell and spatial data: DSTG (Deconvoluting Spatial Transcriptomics Data). Before getting into the method, let's cover some background.

Background

Graph convolutional networks (GCN)

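To understand GCNs, you first need to know CNNs (convolutional neural networks). I covered CNNs in an earlier article, "10X Spatial Transcriptomics and Convolutional Neural Networks (CNNs)", so I won't explain them again here.
For GCNs themselves, see the article "Deep learning rising star | How powerful are graph convolutional networks (GCNs)?". Readers who are not interested in the algorithm can skip this part.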
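For readers who want something concrete, here is a minimal sketch of one graph-convolution layer. It uses the widely used propagation rule from Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). Whether DSTG uses exactly this normalization is an assumption here. The function names are illustrative and do not come from the DSTG code. The key property is that each node's new features mix only its own features and those of its direct neighbors.

```python
import math

def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a dense, symmetric adjacency matrix."""
    n = len(adj)
    # Self-loops let every node keep part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def gcn_layer(a_norm, h, w):
    """One graph convolution: ReLU(A_norm @ H @ W). W is shared by all nodes."""
    z = matmul(a_norm, matmul(h, w))
    return [[max(0.0, v) for v in row] for row in z]

# Path graph 0 - 1 - 2; only node 0 carries a signal.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
a_norm = normalized_adjacency(adj)
out = gcn_layer(a_norm, [[1.0], [0.0], [0.0]], [[1.0]])
print(out)  # node 2 is two hops from node 0, so it receives nothing in one layer
```

The weight matrix W is the same for every node. This is the "shared convolution kernel" the paper later lists as an advantage.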

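With that background, let's look at the paper "DSTG: Deconvoluting Spatial Transcriptomics Data through Graph-based Artificial Intelligence". It has been published in a journal with an impact factor of about 11, which is quite high, and the authors are Chinese researchers.
The paper is not hard to follow, so we will focus only on the key points.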

In this work, we have developed a novel graph-based artificial intelligence model, Deconvoluting Spatial Transcriptomics data through Graph-based convolutional networks (DSTG), for reliable and accurate decomposition of cell mixtures in the spatially resolved transcriptomics data. Based on the well-characterized scRNA-seq dataset (a well-annotated single-cell dataset is required), DSTG is able to learn the precise composition of spatial transcriptomics data using semi-supervised graph convolutional network (that is, a graph convolutional network is used to deconvolve the spatial data).

The performance of DSTG has been validated on synthetic ST data (validation on synthetic data), as well as on different experimental ST datasets with well-defined structures including mouse cortex layer, hippocampus tissue, and pancreatic tumor tissues (validation on real spatial data).
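Synthetic spots have known cell compositions, so validation on them can score the predictions directly against the truth. This post does not say which metric the paper used. As an illustrative assumption, the sketch below uses the Jensen-Shannon divergence, a common way to compare two proportion vectors. With log base 2, it ranges from 0 (identical) to 1 (no overlap).

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute 0 by convention."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence between two proportion vectors (base 2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def mean_jsd(true_props, pred_props):
    """Average JSD over spots; lower means predictions are closer to the truth."""
    scores = [jsd(t, p) for t, p in zip(true_props, pred_props)]
    return sum(scores) / len(scores)

truth = [[0.5, 0.5, 0.0], [1.0, 0.0, 0.0]]
pred = [[0.5, 0.5, 0.0], [0.0, 1.0, 0.0]]
print(mean_jsd(truth, pred))  # first spot perfect (0), second completely wrong (1)
```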

First, the underlying principle


Our hypothesis is that the captured gene expression on a spot is contributed by a mixture of cells located on that spot. (Note: this means each spatial spot is a mixture of several cells.) Our strategy is to use the scRNA-seq-derived synthetic spatial transcriptomics data called "pseudo-ST", to predict cell compositions in real-ST data through semi-supervised learning. (Several cells from the single-cell data are randomly mixed to "fake" spatial data, and this fake data is then used to predict the compositions of the real spatial transcriptomics data.)
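Below is a minimal sketch of how such "pseudo-ST" spots could be generated. It follows the method section's description of combining 2 to 8 randomly selected cells. Two choices are assumptions, not confirmed details of DSTG's code: summing raw counts as the way cells are "combined", and using cell-type fractions as the label. The function names are hypothetical.

```python
import random
from collections import Counter

def make_pseudo_spot(expr, labels, cell_types, rng, min_cells=2, max_cells=8):
    """Combine k random cells (min_cells <= k <= max_cells) into one pseudo-spot.
    Returns (summed expression vector, cell-type proportions)."""
    k = rng.randint(min_cells, min(max_cells, len(expr)))
    picked = rng.sample(range(len(expr)), k)
    n_genes = len(expr[0])
    # Assumption: spot expression is the sum of the chosen cells' counts.
    spot = [sum(expr[i][g] for i in picked) for g in range(n_genes)]
    counts = Counter(labels[i] for i in picked)
    proportions = [counts[t] / k for t in cell_types]
    return spot, proportions

def make_pseudo_st(expr, labels, n_spots, seed=0):
    """Build n_spots pseudo-spots; expr is cells x genes, labels are cell types."""
    rng = random.Random(seed)
    cell_types = sorted(set(labels))
    spots, props = [], []
    for _ in range(n_spots):
        s, p = make_pseudo_spot(expr, labels, cell_types, rng)
        spots.append(s)
        props.append(p)
    return spots, props, cell_types

# Toy data: type A cells express only gene 0, type B cells only gene 1.
expr = [[1, 0]] * 5 + [[0, 1]] * 5
labels = ["A"] * 5 + ["B"] * 5
spots, props, types = make_pseudo_st(expr, labels, n_spots=20)
```

Because the labels come only from the cell types present in the scRNA-seq reference, a cell type missing from the reference can never appear in any pseudo-spot label. This is why the matching issue below matters.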
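One issue deserves attention here.
If the single-cell data and the spatial data do not fully match, for example if the single-cell data lack a cell type that is present in the tissue or contain an extra one, the predictions can be seriously wrong.
Let's look at the steps: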
(1) DSTG constructs the synthetic pseudo-ST data from scRNA-seq data as the learning basis of our method. (Pseudo-ST data are synthesized by randomly combining the profiles of several single cells. This is where the cell-type matching issue above comes in.)
(2) DSTG learns a link graph of spot mapping across the pseudo-ST data and real-ST data using shared nearest neighbors. The link graph captures the intrinsic topological similarity between spots and incorporate the pseudo-ST and real-ST data into the same graph for learning. (Neighbors are found across the two datasets, similar in spirit to anchor finding in Seurat's FindIntegrationAnchors/FindTransferAnchors.)
(3) Based on the link graph, semi-supervised GCN is used to learn a latent representation of both local graph structure and gene expression patterns that can explain the various cell compositions at spots. (The GCN learns the best-fitting cell-type "composition" for each spot.)
The steps are rigorously designed, although applying the method in practice takes a fair amount of tuning.
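Step (3) is semi-supervised: only the pseudo-ST spots have known compositions. In the standard GCN setup, the loss is computed on labeled nodes only. Unlabeled nodes still shape the result because graph convolution mixes their features into their neighbors'. The sketch below assumes DSTG follows this pattern, with pseudo-ST spots as labeled nodes, cell-type proportions as soft targets, and a softmax output read as the predicted composition. The function names are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax; the output sums to 1 like a composition."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def masked_cross_entropy(logits, targets, labeled_mask):
    """Mean cross-entropy over labeled (pseudo-ST) nodes only.
    Real-ST nodes (mask False, target None) add nothing to the loss."""
    total, n = 0.0, 0
    for z, y, is_labeled in zip(logits, targets, labeled_mask):
        if not is_labeled:
            continue
        p = softmax(z)
        total -= sum(yi * math.log(pi) for yi, pi in zip(y, p) if yi > 0)
        n += 1
    return total / n

def predicted_compositions(logits, labeled_mask):
    """Read out predicted cell-type proportions for the unlabeled real-ST spots."""
    return [softmax(z) for z, is_labeled in zip(logits, labeled_mask) if not is_labeled]

# Node 0 is a pseudo-spot of pure type 1; node 1 is a real spot with no label.
logits = [[0.0, 0.0], [5.0, -5.0]]
loss = masked_cross_entropy(logits, [[1.0, 0.0], None], [True, False])
print(loss)  # ln(2): the uniform prediction on node 0 is penalized; node 1 is ignored
print(predicted_compositions(logits, [True, False]))
```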
Advantages of the method
(1) Sensitive and efficient, since for each spot, only the features of similar spots (i.e., neighbor nodes) are used.
(2) Acquiring generalizable knowledge about the association between gene expression patterns and cell compositions across spots in both pseudo- and real-ST, since the weight parameters in the convolution kernel are shared by all spots.
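The paper does not discuss drawbacks, but we can summarize a few ourselves:
(1) The data must match: the single-cell reference and the spatial sample need to contain the same cell types.
(2) The "fake" spatial data must account for heterogeneity within each cell type. Using features extracted for a cell type to represent all of its cells is somewhat problematic. From this angle, the finer the cell-type subdivision, the better for the joint analysis, but that places high demands on the single-cell analysis.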

Next come several validation examples. As usual, the results look good; otherwise the paper would not have been published.


Next, let's look at the software's algorithm.

First, how the single-cell data are processed

Variable gene selection
For the scRNA-seq data, we first identify genes that exhibit the most variability across different cell types using the analysis of variance (ANOVA). The top 2,000 most variable gene features in the scRNA-seq data are selected according to adjusted P values with Bonferroni correction. Using the scRNA-seq data of the top variable genes, we then generate the pseudo-ST data (note: only the top 2,000 variable genes are used to "fake" the ST data) with synthetic mixtures of cells with known cell compositions. The gene expression at each pseudospot of the pseudo-ST data is generated by combining 2 to 8 randomly selected cells from the scRNA-seq data. (Caution: each cell type is itself heterogeneous. Even among T cells, random combinations over the variable genes can give very different results.) For simplicity and illustration, we consistently use the term "spot" to represent the synthetic cell mixture of the pseudo-ST data as well as a spot or a bead of real-ST data.
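The ANOVA-based gene selection can be sketched as follows. For each gene, compute a one-way ANOVA F statistic across cell-type groups and keep the highest-scoring genes. The paper ranks by Bonferroni-adjusted P values. Every gene is tested on the same groups, so the degrees of freedom are identical, and the P value decreases monotonically as F grows. Bonferroni multiplies every P value by the same constant, capped at 1. Ranking by F therefore picks the same top genes, apart from ties among genes whose adjusted P values are capped at 1. Computing actual P values would need an F distribution (e.g. `scipy.stats.f_oneway`), which this pure-Python sketch avoids. The function names are hypothetical.

```python
import math

def anova_f(values, groups):
    """One-way ANOVA F statistic for one gene across cell-type groups."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    n, k = len(values), len(by_group)
    grand_mean = sum(values) / n
    means = {g: sum(vs) / len(vs) for g, vs in by_group.items()}
    ss_between = sum(len(vs) * (means[g] - grand_mean) ** 2 for g, vs in by_group.items())
    ss_within = sum((v - means[g]) ** 2 for g, vs in by_group.items() for v in vs)
    if ss_within == 0:
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def top_variable_genes(expr, labels, gene_names, n_top=2000):
    """Rank genes (columns of the cells x genes matrix) by F and keep the top n_top."""
    scored = []
    for g, name in enumerate(gene_names):
        column = [row[g] for row in expr]
        scored.append((anova_f(column, labels), name))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [name for _, name in scored[:n_top]]

# geneA separates the two cell types; geneB barely differs between them.
expr = [[5.0, 1.0], [5.2, 2.0], [0.1, 1.5], [0.0, 1.2]]
labels = ["X", "X", "Y", "Y"]
print(top_variable_genes(expr, labels, ["geneA", "geneB"], n_top=1))  # ['geneA']
```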
Link graph
Two points deserve attention here:
(1) How the links are built: the paper gives this algorithm in a figure, which is not reproduced here.

(2) Analysis in the low-dimensional space: Second, in the low dimension space, we identify the mutual nearest neighbors among spots from pseudo-ST and real-ST data.
The algorithm is fairly involved; readers with a strong math background are welcome to explain it.
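The mutual-nearest-neighbor step quoted above can be sketched as follows. A pseudo-ST spot i and a real-ST spot j are linked only if each is among the other's k nearest neighbors. This sketch does not reproduce the paper's first step, which is shown only as a figure. It simply assumes both datasets are already embedded in a shared low-dimensional space and uses plain Euclidean distance. The node layout (pseudo spots first, then real spots) and all function names are hypothetical. The resulting adjacency matrix is the kind of input a GCN layer would take.

```python
import math

def knn(query, ref, k):
    """For each row of `query`, the set of indices of its k nearest rows in `ref`."""
    neighbors = []
    for q in query:
        order = sorted(range(len(ref)), key=lambda j: math.dist(q, ref[j]))
        neighbors.append(set(order[:k]))
    return neighbors

def mutual_nearest_neighbors(pseudo, real, k):
    """Edges (i, j) where pseudo spot i and real spot j are in each other's k-NN."""
    p2r = knn(pseudo, real, k)
    r2p = knn(real, pseudo, k)
    return sorted((i, j) for i in range(len(pseudo)) for j in p2r[i] if i in r2p[j])

def link_graph_adjacency(n_pseudo, n_real, edges):
    """Symmetric adjacency: nodes 0..n_pseudo-1 are pseudo spots, the rest are real spots."""
    n = n_pseudo + n_real
    adj = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        a, b = i, n_pseudo + j
        adj[a][b] = adj[b][a] = 1.0
    return adj

# Low-dimensional embeddings (assumed already computed).
pseudo = [[0.0, 0.0], [10.0, 10.0]]
real = [[0.1, 0.0], [10.0, 9.9], [5.0, 5.0]]
edges = mutual_nearest_neighbors(pseudo, real, k=1)
print(edges)  # [(0, 0), (1, 1)]; real spot 2 is nobody's nearest neighbor
adj = link_graph_adjacency(len(pseudo), len(real), edges)
```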

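The code for this method is in the DSTG repository, so I won't walk through it here. What matters is understanding how the software is used and what its parameters mean. Wrapping it in a script is straightforward, so give it a try yourself.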

Life is good, and better with you.
