Notes on a transcriptome analysis paper

Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown

The paper describes how to analyze expression levels from transcriptome sequencing data with HISAT, StringTie and Ballgown.

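Note: the paper skips quality control, contaminant removal, adapter trimming and similar preprocessing, and starts directly from read alignment.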

The analysis workflow can be divided into four main parts:

(i) alignment of the reads to the genome;

(ii) assembly of the alignments into full-length transcripts;

(iii) quantification of the expression levels of each gene and transcript; and

(iv) calculation of the differences in expression for all genes among the different experimental conditions.

Roles of the three programs used in the analysis:

HISAT: aligns RNA-seq reads to a genome and discovers transcript splice sites.

StringTie: assembles the alignments into full and partial transcripts, creating multiple isoforms as necessary and estimating the expression levels of all genes and transcripts.

Ballgown: takes the transcripts and expression levels from StringTie and applies rigorous statistical methods to determine which transcripts are differentially expressed between two or more experiments.

Workflow diagram:

Figure 1 | An overview of the ‘new Tuxedo’ protocol.

Detailed workflow:

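* Quality control of the raw RNA-seq data with FastQC and the FASTX-Toolkit: remove contaminants, adapters and low-quality sequences

1 Align each sample's reads to the reference genome with HISAT

2 Pass the alignments to StringTie for transcript assembly

3 Merge the assembled transcripts from all samples with StringTie's merge function

(Cufflinks' cuffmerge can be used in place of StringTie's merge function)

4 Feed the merged transcripts back into StringTie to re-estimate transcript abundances

gffcompare: determines how many of the assembled transcripts match annotated genes and how many are entirely novel

5 StringTie provides the read counts for the transcripts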

StringTie passes three types of data to Ballgown:

(i) phenotype data: information about the samples being collected;

(ii) expression data: normalized and un-normalized measures of the amount of each exon, junction, transcript and gene expressed in each sample;

(iii) genomic information: coordinates giving the location of the exons, introns, transcripts and genes, as well as annotation including information such as gene names.

6 Ballgown computes the genes that are differentially expressed between the experimental conditions
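As a rough orientation for steps 1 to 5, the sketch below builds (but does not run) the command lines in Python. It is a sketch under assumptions, not the paper's exact script: it assumes the HISAT2 binary `hisat2`, paired-end reads, and `samtools` for sorting, and every file name (`<sample>_1.fastq.gz`, `mergelist.txt`, `stringtie_merged.gtf`, the `ballgown/` directory) and the function name `build_pipeline_commands` are hypothetical.

```python
import shlex


def build_pipeline_commands(samples, index_prefix, annotation_gtf, threads=8):
    """Return command lines (as argument lists) for steps 1-5.

    All file names below are illustrative assumptions, not taken from the notes.
    """
    t = str(threads)
    commands = []
    for s in samples:
        # Step 1: align paired-end reads; --dta tailors alignments for assemblers.
        commands.append(["hisat2", "-p", t, "--dta", "-x", index_prefix,
                         "-1", f"{s}_1.fastq.gz", "-2", f"{s}_2.fastq.gz",
                         "-S", f"{s}.sam"])
        # StringTie expects coordinate-sorted BAM input.
        commands.append(["samtools", "sort", "-@", t,
                         "-o", f"{s}.bam", f"{s}.sam"])
        # Step 2: assemble transcripts per sample, guided by the annotation.
        commands.append(["stringtie", "-p", t, "-G", annotation_gtf,
                         "-o", f"{s}.gtf", "-l", s, f"{s}.bam"])
    # Step 3: merge assemblies; mergelist.txt is assumed to list the
    # per-sample GTF files, one path per line.
    commands.append(["stringtie", "--merge", "-p", t, "-G", annotation_gtf,
                     "-o", "stringtie_merged.gtf", "mergelist.txt"])
    # Optional: compare the merged transcripts with the reference annotation.
    commands.append(["gffcompare", "-r", annotation_gtf,
                     "-o", "merged", "stringtie_merged.gtf"])
    for s in samples:
        # Steps 4-5: re-estimate abundances against the merged transcripts (-e)
        # and write the tables that Ballgown reads (-B).
        commands.append(["stringtie", "-e", "-B", "-p", t,
                         "-G", "stringtie_merged.gtf",
                         "-o", f"ballgown/{s}/{s}.gtf", f"{s}.bam"])
    return commands


if __name__ == "__main__":
    for cmd in build_pipeline_commands(["sampleA", "sampleB"],
                                       "indexes/genome", "genes.gtf"):
        print(shlex.join(cmd))
```

Returning argument lists instead of running the tools keeps the sketch inspectable; each list could be handed to `subprocess.run` once the paths are adapted.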

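Ballgown analysis workflow:

A Loading the data into R.

Load the abundance data produced by StringTie, together with the phenotype data describing the samples, into R.

Key point: make sure the sample IDs of the abundance data match the IDs in the phenotype data.

B Inspect the distribution of abundance estimates for the transcripts.

Key point: abundances are expressed as FPKM, the number of fragments per kilobase of exon per million mapped fragments.

Ballgown's stattest function lets you specify any known confounders directly.

C The result is a table with information on the feature tested for differential expression.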
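The two key points above (matching sample IDs in step A, the FPKM definition in step B) can be illustrated with a small Python sketch. It is not Ballgown code: the function names are hypothetical, and the ID check assumes that the phenotype rows must also be in the same order as the samples.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    # fragments / (length_bp / 1e3) / (total_mapped_fragments / 1e6)
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def mismatched_sample_ids(expression_ids, phenotype_ids):
    """Return (position, expression_id, phenotype_id) for every disagreement.

    Assumption: both lists must have the same length and the same order.
    """
    expression_ids = list(expression_ids)
    phenotype_ids = list(phenotype_ids)
    if len(expression_ids) != len(phenotype_ids):
        raise ValueError("different number of samples in the two tables")
    return [(i, e, p)
            for i, (e, p) in enumerate(zip(expression_ids, phenotype_ids))
            if e != p]


if __name__ == "__main__":
    # 500 fragments on a 2,000 bp transcript, 20 million mapped fragments.
    print(fpkm(500, 2000, 20_000_000))
    # An empty list means the IDs line up.
    print(mismatched_sample_ids(["s1", "s2"], ["s1", "s2"]))
```

An empty result from `mismatched_sample_ids` means the two tables line up; any tuple it returns points at the row to fix before loading the data.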

The paper lists the software installation steps and the exact commands, so they are not repeated here. See the paper for details.
