Paper reading (12): Challenges in unsupervised clustering of single-cell RNA-seq data

Paper title: Challenges in unsupervised clustering of single-cell RNA-seq data

Google Scholar citations: 47

Pages: 10

Publication date: 2019.01

Journal: Nature Reviews Genetics

Authors: Vladimir Yu Kiselev, Tallulah S. Andrews and Martin Hemberg*

Abstract:

Single-cell RNA sequencing (scRNA-seq) allows researchers to collect large catalogues detailing the transcriptomes of individual cells. Unsupervised clustering is of central importance for the analysis of these data, as it is used to identify putative cell types. However, there are many challenges involved. We discuss why clustering is a challenging problem from a computational point of view and what aspects of the data make it challenging. We also consider the difficulties related to the biological interpretation and annotation of the identified clusters.

Conclusions:

  • Unsupervised clustering is likely to remain a central component of scRNA-seq analysis.
  • One interesting line of research is into so-called multi-omics methods.
  • Another important technological development is spatial methods.
  • Incorporating spatial information will be important for clustering.

Introduction:

  • discoveries in molecular biology made it possible to define cell types on the basis of the presence or absence of surface proteins.
  • large volumes of scRNA-seq data make it possible to provide detailed catalogues of the cells found in a sample.
  • We encourage the reader to consult recently published overviews of this workflow.
  • this review focuses on clustering.
  • Defining cell types on the basis of the transcriptome is attractive.
  • For a cell atlas to be of practical use, reliable methods for unsupervised clustering of the cells will be one of the key computational challenges.
  • open questions: there is no strong consensus about which approach is best or how cell types should be defined on the basis of scRNA-seq data.

Structure of the main text:

1. Introduction

2. What clustering strategies are available?

3. Discrete versus continuous cell grouping

4. Technical challenges

5. Biological challenges

6. Computational challenges

7. Biological interpretation and annotation

8. When does a cluster represent a new cell type?

9. Outlook

Selected excerpts from the main text:

  • cosine similarity, Pearson's correlation and Spearman's correlation: because of their scale invariance, they consider relative differences in values, making them more robust to library or cell size differences.
  • k-means, Lloyd's algorithm
  • Another disadvantage of k-means is its bias towards identifying equal-sized clusters, which may result in rare cell types being hidden among a larger group.
  • hierarchical clustering: computationally expensive. For graph-based methods, it is necessary to construct a k-nearest-neighbours graph.
  • An advantage is that most graph-based methods do not require the user to specify the number of clusters to identify, instead employing indirect resolution parameters.
  • Louvain algorithm
  • clustering based on the Louvain method does not perform as well for smaller data sets.
  • clustering methods will partition the data, regardless of whether or not there are any biologically meaningful groups present.
  • Why are dropouts observed?
  1. the transcript was not present and the zero is thus an accurate representation of the state of the cell.
  2. the sequencing depth was low, and although it was present, the transcript is not reported.
  3. as part of the library preparation, the transcript was not captured or failed to amplify.
  • Estimating technical noise in scRNA-seq data is challenging because each individual cell is a biological, not a technical, replicate.
  • batch effect
  • Table 1 | Clustering methods for scRNA-seq. This table is very intuitive and is a good reference for how to present methods when reviewing them.
  • The heterogeneity of most tissues presents an additional challenge.
  • determining when a large cluster should or should not be reclustered is difficult.
  • nonlinear techniques are more flexible, as they can provide outcomes that are often more aesthetically pleasing and easier to interpret by visual inspection.
  • The main limitation of nonlinear dimensionality reductions is that they contain parameters that are required to be manually defined by the user and can strongly affect the visualization.
  • Perhaps the most challenging aspect of scRNA-seq analysis is how to validate a computational analysis method.
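
The excerpt on cosine similarity, Pearson's correlation and Spearman's correlation says these measures are scale invariant and therefore more robust to library size differences. A minimal sketch of that property follows, using only the Python standard library. The expression values, the 10x depth factor and all function names are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two expression vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson's correlation: cosine similarity of the mean-centred vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


def ranks(x):
    """Ranks 1..n of the values in x (this sketch assumes no tied values)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    result = [0.0] * len(x)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def spearman(x, y):
    """Spearman's correlation: Pearson's correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def euclidean(x, y):
    """Euclidean distance, included as a scale-sensitive contrast."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical expression values for two cells over five genes.
cell_a = [1.0, 5.0, 2.0, 8.0, 3.0]
cell_b = [2.0, 4.0, 1.0, 9.0, 5.0]
# The same profile as cell_b, as if sequenced ten times deeper.
cell_b_deep = [10.0 * value for value in cell_b]

for name, measure in [
    ("cosine", cosine_similarity),
    ("pearson", pearson),
    ("spearman", spearman),
    ("euclidean", euclidean),
]:
    print(name, measure(cell_a, cell_b), measure(cell_a, cell_b_deep))
```

Scaling one cell by a constant leaves the three similarity measures unchanged up to floating-point error, while the Euclidean distance grows with the depth difference. Real library size effects are not a pure constant factor, so this only illustrates the idealised case.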
