Title: Full-Length mRNA-Seq from single cell levels of RNA and individual circulating tumor cells
Journal: Nat Biotechnol.
Corresponding author: Rickard Sandberg

1. Background

It remained unclear whether single-cell transcriptomes faithfully represent the RNA population before amplification, and how much technical variation limits the power to detect differential expression.
The initial single-cell mRNA-Seq method also **preferentially amplified the 3′ ends of mRNAs, so its data could only be used to identify distal splicing events.** More recently, a method for multiplexed single-cell RNA-Seq was introduced that quantifies transcripts through reads mapping to mRNA 5′ ends. Neither method generates read coverage across full transcripts.

2. Gap

Since most mammalian multi-exon genes are subject to alternative RNA processing, a single-cell transcriptome method is needed that can both quantify gene expression and provide sufficient read coverage for efficient detection of transcript variants and alleles.

3. Aims

In this study, we introduce a single-cell RNA-Sequencing protocol with markedly improved transcriptome coverage, which samples cDNAs from more than just the ends of mRNAs.

4. Approaches

Smart-Seq protocol: To enable gene and mRNA isoform expression analyses in single cells, a new full-transcriptome mRNA-Seq protocol (Smart-Seq) was developed. Each cell was lysed in hypotonic solution, and poly(A)+ RNA was converted to full-length cDNA using oligo(dT) priming and SMART™ template-switching technology, followed by only 12–18 cycles of PCR pre-amplification of the cDNA. The amplified cDNA was then used to construct standard Illumina sequencing libraries, either by Covaris shearing followed by **ligation of adaptors (PE)** or by **Tn5-mediated “tagmentation”** using the Nextera technology (Tn5). Both library preparation methods enable random shotgun sequencing of cDNAs.
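As a rough worked example, assuming ideal PCR in which every cycle doubles the cDNA (efficiency E = 1, an upper bound that real reactions do not reach), the theoretical fold amplification after n cycles is:

```latex
\text{fold amplification} = (1+E)^{n}, \qquad E = 1:\quad 2^{12} = 4096, \qquad 2^{18} = 262144
```

Under this idealized assumption, going from 12 to 18 cycles raises the maximal theoretical amplification by a further 2^6 = 64-fold.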

Figure: Smart-Seq workflow

5. Results

5.1 Smart-Seq read coverage across transcripts

Figure: Smart-Seq read coverage across transcripts

5.2 Quantitative assessment of single-cell transcriptomics

Aim: Gene expression analysis of millions of cells with mRNA-Seq is highly reproducible and has low technical variation. However, no single-cell mRNA-Seq study had yet measured the technical variation intrinsic to the cDNA pre-amplification step of single-cell methods.
Method: We therefore diluted microgram amounts of reference total RNA down to nanogram and picogram levels and applied Smart-Seq to assess its sensitivity, technical variability and ability to detect differentially expressed transcripts from low amounts of total RNA. For comparison, standard mRNA-Seq libraries were generated from 100 ng to microgram amounts of reference total RNA.

5.2.1 Sensitivity of the method in detecting transcripts present at different expression levels

Sensitivity

Starting with 10 ng or 1 ng of total RNA, we found no or only minimal decline in sensitivity compared with standard mRNA-Seq. However, lowering the starting amount to single-cell levels decreased the detection rate of less abundant transcripts (Fig. 2a). Analyses of twelve cancer cell line cells (four cells each from the LNCaP, PC3 and T24 lines) showed that ~76% of transcripts expressed at 10 RPKM (reads per kilobase of exon model per million mapped reads), roughly the median expression level of detected transcripts, were reproducibly detected in all single-cell profiles (Fig. 2b).
Summary: Limited starting amounts of RNA reduce detection sensitivity through random loss of low-abundance transcripts, but the majority of low-abundance transcripts and the vast majority of highly expressed transcripts are still reliably detected, even in single cells.
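RPKM, the unit used throughout these results, normalizes a transcript's read count by its exon-model length (in kilobases) and by the library's mapped-read total (in millions). A minimal sketch of that arithmetic; the function name `rpkm` is illustrative only:

```python
def rpkm(read_count: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon model per million mapped reads."""
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    # Divide the raw count by transcript length in kb and by library size in millions.
    return read_count / (exon_length_bp / 1_000) / (total_mapped_reads / 1_000_000)


# 200 reads on a 2 kb transcript in a 10-million-read library -> 10 RPKM
print(rpkm(200, 2000, 10_000_000))
```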
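The "~76% detected in all single-cell profiles at 10 RPKM" figure is a fraction computed over an expression window. A hedged sketch of such a calculation, assuming each transcript's expression level is taken from a separate reference profile (for example, the mean across cells) and that "detected" means RPKM above a cutoff; the 0.1 RPKM default and all names here are hypothetical, not taken from the paper:

```python
from typing import Dict, List


def fraction_detected_in_all(
    reference_rpkm: Dict[str, float],
    cell_rpkm: List[Dict[str, float]],
    low: float,
    high: float,
    detection_threshold: float = 0.1,  # hypothetical cutoff, not from the paper
) -> float:
    """Fraction of transcripts with reference expression in [low, high) that are
    detected (RPKM > detection_threshold) in every single-cell profile."""
    in_window = [t for t, v in reference_rpkm.items() if low <= v < high]
    if not in_window:
        return float("nan")
    # A transcript absent from a cell's profile counts as 0 RPKM (not detected).
    detected = sum(
        all(cell.get(t, 0.0) > detection_threshold for cell in cell_rpkm)
        for t in in_window
    )
    return detected / len(in_window)
```

Sweeping the window over increasing expression levels would give a detection-rate curve of the kind summarized above.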

5.2.2 Reproducibility of expression levels generated from diluted RNA and individual cells

Expression level estimation with Smart-Seq (lower oocyte-to-oocyte variability)

Correlation analyses

Correlation analyses between technical replicates of diluted RNA showed increasing concordance with larger amounts of input RNA. Comparing single cells with the RNA dilutions, we observed higher correlations among individual cells of the same type (Pearson correlations of 0.75–0.85) than among dilution replicates at 10 pg (Pearson correlations of 0.65–0.75).
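A minimal sketch of a pairwise Pearson correlation between two expression profiles aligned by transcript. Computing it on log2(RPKM + 1) is our assumption (the transform behind the reported values is not stated in these notes), and the function names are illustrative:

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def log_pearson(x: Sequence[float], y: Sequence[float], pseudocount: float = 1.0) -> float:
    """Pearson r on log2(RPKM + pseudocount); the transform is an assumption."""
    return pearson([math.log2(v + pseudocount) for v in x],
                   [math.log2(v + pseudocount) for v in y])
```

Applied to every pair of cells of one type, and to every pair of 10 pg dilution replicates, this yields the two ranges of correlations being compared.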

Variability

Because the variability of expression measurements depends on transcript expression level, we computed variability as a function of expression level (Fig. 2c,d). Smart-Seq on 10 ng total RNA had the same technical variability as standard mRNA-Seq, and Smart-Seq on 1 ng total RNA showed only a modest increase in technical noise (Fig. 2c). Lowering input amounts to picogram levels clearly increased technical variability, particularly for less abundant transcripts (Fig. 2c). The technical variability at picogram levels of total RNA was compared with the biological variation between human brain and UHRR (Universal Human Reference RNA) measured by standard mRNA-Seq (Fig. 2c, green line). Interestingly, analyses of variation between individual cancer cells of different origin revealed extensive biological variation in highly expressed genes (Fig. 2d).
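To make "variability as a function of expression level" concrete, here is a hedged sketch that groups transcripts by mean expression and reports the median coefficient of variation (stdev / mean) across replicates in each bin. The CV metric, bin edges and function name are our assumptions; the measure actually plotted in Fig. 2c,d may differ:

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def cv_by_expression(
    replicates: Sequence[Dict[str, float]],
    bin_edges: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Median coefficient of variation (stdev / mean) across replicates, with
    transcripts grouped by their mean expression into [edge_i, edge_i+1) bins."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicates")
    shared = set(replicates[0]).intersection(*replicates[1:])
    bins: List[List[float]] = [[] for _ in range(len(bin_edges) - 1)]
    for t in shared:
        values = [r[t] for r in replicates]
        mean = statistics.mean(values)
        if mean <= 0:
            continue  # CV is undefined when mean expression is zero
        cv = statistics.stdev(values) / mean
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= mean < bin_edges[i + 1]:
                bins[i].append(cv)
                break
    return [
        (bin_edges[i], bin_edges[i + 1], statistics.median(b) if b else float("nan"))
        for i, b in enumerate(bins)
    ]
```

Running it separately on replicates from each input amount would show whether noise rises mainly in the low-expression bins, as described for picogram inputs.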

5.2.3 Whether pre-amplified single-cell expression profiles are representative of the original expression profiles

Spearman correlations between relative expression estimated by standard mRNA-Seq and by Smart-Seq

Comparing relative gene expression levels (UHRR vs. brain) estimated by **standard mRNA-Seq with those estimated by Smart-Seq** from different amounts of input RNA, we again found high concordance (Fig. 2e–g). With 1 ng or 100 pg of input total RNA, the relative expression estimates from Smart-Seq and standard mRNA-Seq had Spearman correlations of 0.87 and 0.77, respectively (Fig. 2e,f). Comparisons with 10 pg input RNA still showed good overall correlation (Fig. 2g) but identified two populations of transcripts with distorted expression in the Smart-Seq data from either human brain or UHRR, reflecting stochastic losses, mostly of low-abundance transcripts, when starting from such minute levels of RNA (Fig. 2a,g).

Analyses of GC and length biases in Smart-Seq and mRNA-Seq data

Pre-amplification of cDNA could also lead to disproportionate amplification of short transcripts, but we found no systematic bias (Supplementary Fig. 7). A previous microarray study of PCR-amplified cDNA (from picogram amounts of RNA) found the transcriptome to be preserved overall, but skewed.
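One simple way to look for a length bias (not necessarily the analysis behind Supplementary Fig. 7) is to bin transcripts by length and check whether the median log2 ratio of Smart-Seq to standard mRNA-Seq expression drifts across bins. A hedged sketch; the pseudocount, bin edges and function name are assumptions:

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple


def length_bias_profile(
    lengths: Dict[str, int],
    smart_seq: Dict[str, float],
    standard: Dict[str, float],
    bin_edges: Sequence[int],
    pseudocount: float = 1.0,
) -> List[Tuple[int, int, float]]:
    """Median log2((Smart-Seq + pc) / (standard + pc)) per transcript-length bin.

    A roughly flat profile across bins suggests no systematic length bias."""
    bins: List[List[float]] = [[] for _ in range(len(bin_edges) - 1)]
    for t, length in lengths.items():
        if t not in smart_seq or t not in standard:
            continue  # only compare transcripts quantified by both methods
        ratio = math.log2((smart_seq[t] + pseudocount) / (standard[t] + pseudocount))
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= length < bin_edges[i + 1]:
                bins[i].append(ratio)
                break
    return [
        (bin_edges[i], bin_edges[i + 1], statistics.median(b) if b else float("nan"))
        for i, b in enumerate(bins)
    ]
```

The same binning could be repeated with GC content in place of length.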

Together, these results demonstrate that transcriptome analyses of few or single cells generally preserve relative expression differences for detected transcripts.

5.3 Analyses of transcriptional and post-transcriptional (alternatively spliced exon) differences in single cells

Transcriptional and post-transcriptional analyses of cancer cell line cells using Smart-Seq

Conclusion: Smart-Seq significantly improves our ability to detect alternative RNA processing in single cells.
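Detecting alternatively spliced exons needs reads across internal exon junctions, which full-length coverage provides and 3′-end-biased data largely lacks. One common way to quantify exon inclusion, not necessarily the one used in the paper, is "percent spliced in" (PSI) from junction reads. A hedged sketch; the function name and the junction-count normalization are our assumptions:

```python
def percent_spliced_in(
    inclusion_reads: int,
    exclusion_reads: int,
    inclusion_junctions: int = 2,
    exclusion_junctions: int = 1,
) -> float:
    """PSI of a cassette exon from junction read counts.

    Inclusion reads span the two junctions flanking the exon and exclusion reads
    span the single skipping junction, so each count is normalized by its number
    of junctions before taking the ratio. Returns NaN when no reads are informative."""
    inc = inclusion_reads / inclusion_junctions
    exc = exclusion_reads / exclusion_junctions
    if inc + exc == 0:
        return float("nan")
    return inc / (inc + exc)
```

For example, 20 inclusion-junction reads and 10 skipping-junction reads give PSI = 10 / (10 + 10) = 0.5.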

5.4 Analyses of circulating tumor cell transcriptomes

Aim: To test whether global transcriptome analyses of putative circulating tumor cells (CTCs) could reveal their tumor of origin and provide data supporting the use of this method for unbiased identification of cancer-specific biomarkers.
Method: Transcriptomes were generated from NG2+ putative melanoma CTCs isolated from peripheral blood of a patient with recurrent melanoma by immunomagnetic purification on a MagSweeper instrument (Illumina Inc.). For comparison, Smart-Seq libraries were also generated from single cells of primary melanocytes (PMs, n=2), melanoma cancer cell lines (SKMEL5, n=4; UACC257, n=3) and human embryonic stem cells (ESCs, n=8). Because the NG2+ putative CTCs were isolated from blood, it was important to compare them with blood cells as well.
**The putative CTCs were distinct from lymphoma cell lines (BL41 and BJAB)[13], immune tissues (lymph node and white blood cell samples) and embryonic stem cells, and instead showed high similarity to PMs and melanoma cell line cells.**
Results:
Unsupervised hierarchical clustering and correlation analyses of gene expression levels showed clear clustering of cells by cell type of origin.
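A hedged sketch of unsupervised hierarchical clustering of cell profiles, assuming correlation distance (1 − Pearson r) and average linkage; the paper's exact distance, linkage and gene set are not given in these notes, and the names here are illustrative. With SciPy, `scipy.cluster.hierarchy.linkage(X, method="average", metric="correlation")` is the library route; the pure-Python version below just makes the merging explicit:

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage_clusters(profiles: Dict[str, Sequence[float]], k: int) -> List[List[str]]:
    """Agglomerative clustering with distance 1 - Pearson r and average linkage,
    merging the closest pair of clusters until k clusters remain."""
    names = list(profiles)
    dist = {(a, b): 1 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a < b}

    def d(a: str, b: str) -> float:
        return dist[(a, b) if a < b else (b, a)]

    clusters = [[name] for name in names]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                pairs = [d(a, b) for a in clusters[i] for b in clusters[j]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]
```

In a real analysis the profiles would be log-transformed expression values over many genes, and one would inspect whether CTCs join the melanocyte and melanoma clusters rather than the blood or ESC clusters.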
Further support for the melanocytic origin of the putative melanoma CTCs came from melanocyte lineage-specific markers: all NG2+ cells expressed high levels of MLANA[14], TYR[15] and the melanocyte-specific M-form of MITF[16], but not immune markers such as PTPRC, in contrast to peripheral blood lymphocytes. Furthermore, NG2+ cells expressed high levels of melanoma-associated genes (from our unbiased selection of the 100 transcripts most strongly associated with melanoma, see Methods), but not immune cell-associated genes selected in the same way.
Thus, both their global transcriptomes and their expression of melanoma-associated transcripts clearly support a melanocytic origin for the NG2+ putative melanoma CTCs.
Smart-Seq also enables screening for SNPs and mutations in transcribed regions using only a few cells.
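The notes say only that the 100 melanoma-associated transcripts were selected in an unbiased way (see Methods). As a purely illustrative stand-in, the sketch below ranks transcripts by mean log2 expression in a chosen group of samples minus the mean in all other samples; this scoring rule, the pseudocount and the function name are our assumptions, not the paper's criterion:

```python
import math
from typing import Dict, List, Sequence


def top_associated(
    expr: Dict[str, Dict[str, float]],
    group: Sequence[str],
    n: int = 100,
    pseudocount: float = 1.0,
) -> List[str]:
    """Return the n transcripts with the largest (mean log2 expression in `group`)
    minus (mean log2 expression in all other samples).

    expr maps sample name -> {transcript: RPKM}; missing values count as 0.
    The scoring rule is a hypothetical stand-in for the paper's selection."""
    members = set(group)
    inside = [s for s in expr if s in members]
    outside = [s for s in expr if s not in members]
    if not inside or not outside:
        raise ValueError("need samples both inside and outside the group")
    transcripts = set().union(*(expr[s] for s in expr))

    def mean_log(samples: List[str], t: str) -> float:
        return sum(math.log2(expr[s].get(t, 0.0) + pseudocount) for s in samples) / len(samples)

    scores = {t: mean_log(inside, t) - mean_log(outside, t) for t in transcripts}
    # Highest score first; ties broken by name for a deterministic order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:n]
```

Selecting immune-associated genes "in the same way" would amount to calling the function with the immune samples as the group.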
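Because Smart-Seq reads cover whole transcripts, base counts at each transcribed position can be screened for non-reference alleles. A hedged sketch of a naive per-site call from allele counts; the depth and allele-fraction thresholds are hypothetical, and real variant calling from single-cell RNA would also need to account for sequencing errors and allele-specific or monoallelic expression:

```python
from typing import Dict, Optional, Tuple


def call_site(
    base_counts: Dict[str, int],
    ref: str,
    min_depth: int = 10,             # hypothetical minimum coverage
    min_alt_fraction: float = 0.2,   # hypothetical allele-fraction cutoff
) -> Tuple[str, Optional[str], float]:
    """Classify one position as no_call / ref / het / hom_alt from base counts."""
    depth = sum(base_counts.values())
    if depth < min_depth:
        return ("no_call", None, float("nan"))
    # Most frequent non-reference base, if any.
    alt, alt_count = max(
        ((b, c) for b, c in base_counts.items() if b != ref),
        key=lambda bc: bc[1],
        default=(None, 0),
    )
    fraction = alt_count / depth
    if alt is None or fraction < min_alt_fraction:
        return ("ref", None, fraction)
    if fraction > 1 - min_alt_fraction:
        return ("hom_alt", alt, fraction)
    return ("het", alt, fraction)
```

For example, 12 A and 8 G reads at a reference-A position give ("het", "G", 0.4).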

6. Novelty and significance

Generation of high-coverage transcriptomes from single cells and small numbers of cells.
Importantly, Smart-Seq has significantly improved read coverage across transcripts, which enables detailed analyses of alternative splicing and identification of SNPs and mutations.

7. Problems

The coverage is uneven, with a bias toward the 3′ ends of transcripts.
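A 3′ bias of this kind is usually visualized as a gene-body coverage profile. A hedged sketch, assuming per-base coverage of one transcript is already available as a list ordered 5′ to 3′ (in practice it would be derived from aligned reads); the bin count, the 20% end fraction and the function names are our assumptions:

```python
from typing import List, Sequence


def gene_body_profile(coverage: Sequence[float], n_bins: int = 10) -> List[float]:
    """Mean coverage in n_bins equal-width bins along a transcript (5' -> 3')."""
    length = len(coverage)
    if length < n_bins:
        raise ValueError("transcript shorter than the number of bins")
    profile = []
    for i in range(n_bins):
        start, end = i * length // n_bins, (i + 1) * length // n_bins
        segment = coverage[start:end]  # non-empty because length >= n_bins
        profile.append(sum(segment) / len(segment))
    return profile


def three_prime_bias(profile: Sequence[float], fraction: float = 0.2) -> float:
    """Mean coverage of the 3'-most bins divided by that of the 5'-most bins."""
    k = max(1, int(len(profile) * fraction))
    five_prime = sum(profile[:k]) / k
    three_prime = sum(profile[-k:]) / k
    return three_prime / five_prime if five_prime > 0 else float("inf")
```

A ratio near 1 indicates even coverage; values well above 1 indicate the 3′ preference noted here. Averaging profiles over many transcripts gives the usual gene-body coverage plot.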

Smart-Seq (2012)