Manifold learning-based methods for analyzing single-cell RNA-sequencing data 

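Released by Yale University in December 2017: an improved approach to dimensionality reduction and denoising of single-cell data, based on manifold learning from machine learning.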

Manifold learning:


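Common manifold learning (MFL) methods include PCA, MDS, and diffusion maps. The figure below briefly summarizes the strengths and weaknesses of each.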

[Figure 1: strengths and weaknesses of common manifold learning methods]
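Of the methods listed above, diffusion maps are the foundation that DPT builds on. The following is a minimal NumPy sketch of a diffusion map. It assumes a dense Gaussian kernel with a single fixed bandwidth `sigma`, whereas single-cell tools typically use adaptive, kNN-based kernels. The sketch builds the Markov matrix P = D^-1 K, takes its leading non-trivial eigenvectors, and scales them by lambda^t. The function name `diffusion_map` and its parameters are illustrative only, not any library's API.

```python
import numpy as np

def diffusion_map(X, n_components=2, sigma=1.0, t=1):
    """Minimal diffusion map of the rows (cells) of X."""
    # Pairwise squared Euclidean distances between cells.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian affinities with one fixed bandwidth (an assumption of this sketch).
    K = np.exp(-sq_dists / (2 * sigma ** 2))
    d = K.sum(axis=1)
    # D^-1/2 K D^-1/2 is symmetric and has the same eigenvalues as P = D^-1 K,
    # so the stable symmetric solver eigh can be used.
    A = K / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(A)
    order = np.argsort(eigvals)[::-1]              # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Right eigenvectors of P are D^-1/2 times the eigenvectors of A.
    psi = eigvecs / np.sqrt(d)[:, None]
    # Skip the trivial constant eigenvector (eigenvalue 1); scale by lambda^t.
    return psi[:, 1:n_components + 1] * eigvals[1:n_components + 1] ** t

# Usage: a noisy helix, i.e. a 1-D manifold embedded in 3-D space.
rng = np.random.default_rng(0)
s = np.sort(rng.uniform(0, 3 * np.pi, 200))        # position along the curve
X = np.column_stack([np.cos(s), np.sin(s), s / 3]) + 0.05 * rng.normal(size=(200, 3))
emb = diffusion_map(X, n_components=2, sigma=0.5)
print(emb.shape)  # (200, 2)
```

On data like this, the first diffusion component should vary monotonically with the position `s` along the curve. In other words, it recovers the single latent axis that the manifold view assumes.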

Key terms in this paper:

MFL (manifold learning): manifold models can also be useful for analyzing data generated from disparate dynamics or profiles, as the data can be modeled with several disconnected manifolds.

DPT (diffusion pseudotime): a pseudotime trajectory through the data that describes a latent axis of development or cell-state transition.

DPT method: to find a major axis of variability in the data, DPT defines a distance from a source cell to all other cells over a modified transition operator that includes only non-trivial diffusion components. This produces trajectories of nonlinear variation across a dataset.
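The DPT definition above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the published implementation (which adds refinements such as locally adaptive kernel widths). It uses a fixed-bandwidth Gaussian kernel and a dense matrix inverse, so it only suits small datasets. The "modified transition operator" is T minus its stationary component. Summing its powers over all walk lengths gives M = (I - T_tilde)^-1 - I, and the DPT distance is the Euclidean distance between rows of M. The name `diffusion_pseudotime` is hypothetical.

```python
import numpy as np

def diffusion_pseudotime(X, root, sigma=1.0):
    """DPT-style distance from cell `root` to every cell (rows of X)."""
    n = X.shape[0]
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2 * sigma ** 2))   # fixed-bandwidth kernel (assumption)
    d = K.sum(axis=1)
    T = K / d[:, None]                          # row-stochastic transition matrix
    pi = d / d.sum()                            # stationary distribution: pi @ T == pi
    # Remove the trivial (stationary) component so only non-trivial diffusion
    # components remain; then T_tilde^t = T^t - 1 pi^T for every t >= 1.
    T_tilde = T - np.outer(np.ones(n), pi)
    # Sum random walks of every length t >= 1 in closed form:
    # M = sum_t T_tilde^t = (I - T_tilde)^-1 - I.
    I = np.eye(n)
    M = np.linalg.inv(I - T_tilde) - I
    # DPT distance: Euclidean distance between the root's row of M and each row.
    return np.linalg.norm(M - M[root], axis=1)

# Usage: cells spread along a 1-D curve; cell 0 (smallest s) is the root.
rng = np.random.default_rng(1)
s = np.sort(rng.uniform(0, 10, 150))               # hidden "developmental time"
X = np.column_stack([s, np.sin(s)]) + 0.05 * rng.normal(size=(150, 2))
dpt = diffusion_pseudotime(X, root=0, sigma=1.0)
print(dpt[0])  # 0.0
```

Each non-trivial eigenvalue lambda of T becomes lambda / (1 - lambda) in M. The slowest diffusion components therefore dominate the distance, and this is why DPT follows the major axis of variability instead of local noise.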


The overall workflow consists of five steps:

1. Gene selection

2. Manifold learning

3. Cell organization

4. Dimensionality reduction and visualization

5. Density estimation and clustering

Together, the first three steps make up what are called pseudotime methods.
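To illustrate the first stage, gene selection, here is a sketch of one common heuristic. This is an assumption, not necessarily the criterion any of the compared methods uses. Each cell is normalized to the median library size and transformed with log1p, and the genes with the highest variance across cells are kept. The function name `select_variable_genes` is hypothetical.

```python
import numpy as np

def select_variable_genes(counts, n_genes):
    """Return sorted indices and log-expression of the n_genes most variable genes.

    counts: cells x genes matrix of raw counts.
    """
    lib_size = counts.sum(axis=1, keepdims=True)
    # Normalize every cell to the median library size, then log1p-transform.
    logx = np.log1p(counts / lib_size * np.median(lib_size))
    # Per-gene variance across cells; keep the top n_genes.
    var = logx.var(axis=0)
    top = np.sort(np.argsort(var)[::-1][:n_genes])
    return top, logx[:, top]

# Usage: 4 cells x 5 genes; only genes 0 and 3 change between cells.
counts = np.array([[10, 5, 5, 0, 5],
                   [0, 5, 5, 10, 5],
                   [10, 5, 5, 0, 5],
                   [0, 5, 5, 10, 5]], dtype=float)
idx, logx = select_variable_genes(counts, n_genes=2)
print(idx)  # [0 3]
```

The selected columns would then be the input to the manifold learning step, such as a diffusion map.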


[Figure 2]




[Figure 3]

Comparison of pseudotime methods (four methods are compared). Pseudotime methods can generally be broken down into three stages: gene selection, manifold learning, and cell organization.


A current limitation of these methods is that they rely, to varying degrees, on assumptions about the underlying shape of the data (e.g., a tree or a bifurcating trajectory). The geometry assumed for this latent structure strongly influences how cells are classified later on.

DPT, the last of the compared methods, provides two significant advantages over other pseudotemporal techniques. First, working directly on a diffusion map does not require any greedy computational steps. By contrast, every step of classic hierarchical clustering is greedy: it finds a local optimum at each step rather than a globally optimal tree. Second, and most importantly, because DPT operates directly on the diffusion space, it features the least coarse-graining or over-fitting of the data into low-dimensional assumptions. DPT works on the entire diffusion space rather than forcing the data into a bifurcating or tree-shaped structure.



[Figure 4]

Validation of the three dimensionality reduction methods, together with a Jaccard-index similarity validation of the simulated data points on a Jaccard graph, which I covered in a previous blog post.
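The Jaccard index of two neighbor sets, |A ∩ B| / |A ∪ B|, also gives a simple way to check how well an embedding preserves local structure. Below is a minimal sketch, assuming brute-force kNN on Euclidean distances. The helper names `knn_sets` and `neighborhood_preservation` are hypothetical. This is an illustration, not necessarily the exact validation used in the figure.

```python
import numpy as np

def knn_sets(X, k):
    """Index sets of the k nearest neighbors of each row of X (self excluded)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq, np.inf)                 # a cell is not its own neighbor
    nn = np.argsort(sq, axis=1)[:, :k]
    return [set(row.tolist()) for row in nn]

def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets."""
    return len(a & b) / len(a | b)

def neighborhood_preservation(X_high, X_low, k=10):
    """Mean Jaccard index between each cell's kNN set before and after reduction."""
    hi, lo = knn_sets(X_high, k), knn_sets(X_low, k)
    return float(np.mean([jaccard(a, b) for a, b in zip(hi, lo)]))

# Usage: compare the data with itself and with a crude 2-D "embedding"
# that simply keeps the first two coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
same = neighborhood_preservation(X, X)
crude = neighborhood_preservation(X, X[:, :2])
print(same)   # 1.0
```

A score of 1.0 means every cell keeps exactly the same neighbors. The crude projection scores far lower because it discards most of the dimensions that define the neighborhoods.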

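The paper introduces the algorithms entirely in narrative form, without any formulas or code. In my humble opinion, the most important machine-learning ideas in it are manifold learning, pseudotime methods, and the dimensionality reduction methods derived from MFL.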



[Figure 5]


