Analyzing RNA-seq data with DESeq2 (Part 2)

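Picking up where the previous post left off: we have already seen how to build a DESeqDataSet from a count matrix. Today we look at a second way of getting data in, htseq-count input.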

htseq-count input

First, a quick word on HTSeq: it is a Python package for analyzing sequencing data. Typical uses include:

1. Getting statistical summaries about the base-call quality scores to study the data quality.
2. Calculating a coverage vector and exporting it for visualization in a genome browser.
3. Reading in annotation data from a GFF file.
4. Assigning aligned reads from an RNA-Seq experiment to exons and genes.

A good place to start learning the package: https://htseq.readthedocs.io/en/master/tour.html
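
To make the input format concrete, here is a minimal sketch of what an htseq-count output file looks like and how it can be read in R. The file content, the gene names (`geneA` and so on), and the variable names are made up for illustration. The sketch assumes the usual two-column, tab-separated layout (feature ID, count) with special counters such as `__no_feature` at the end.

```r
# Hypothetical htseq-count output: one "feature<TAB>count" line per feature,
# followed by special counters whose names start with "__".
example_lines <- c(
  "geneA\t12",
  "geneB\t0",
  "geneC\t7",
  "__no_feature\t30",
  "__ambiguous\t2"
)

# Write the example to a temporary file so the sketch is self-contained.
path <- tempfile(fileext = ".txt")
writeLines(example_lines, path)

# Read the two columns back in.
tab <- read.table(path, sep = "\t",
                  col.names = c("feature", "count"),
                  stringsAsFactors = FALSE)

# Separate real features from the special counters.
is_special  <- startsWith(tab$feature, "__")
gene_counts <- tab[!is_special, ]

print(gene_counts)
```

In practice there is no need to read the files by hand like this, since DESeqDataSetFromHTSeqCount takes the file names directly. The sketch is only meant to show what sits inside each file.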


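# In your own analysis, point `directory` at the folder holding the htseq-count output files:
directory <- "/path/to/your/files/"
# For this demo, use the example count files shipped with the pasilla package instead
# (this overrides the line above; the pasilla package must be installed):
directory <- system.file("extdata", package="pasilla",
                         mustWork=TRUE)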

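# Pick out the count files: every file name containing "treated"
# (this matches both the treated and the untreated samples)
sampleFiles <- grep("treated",list.files(directory),value=TRUE)
# Derive the condition ("treated" or "untreated") from each file name
sampleCondition <- sub("(.*treated).*","\\1",sampleFiles)
# Column order matters: sample name first, file name second,
# and any remaining columns are stored as sample metadata (colData)
sampleTable <- data.frame(sampleName = sampleFiles,
                          fileName = sampleFiles,
                          condition = sampleCondition)
# The design variable should be a factor
sampleTable$condition <- factor(sampleTable$condition)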
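
The regular expression in the `sub()` call is easy to misread, so here is a small standalone sketch of what it does. The three file names are typed in by hand (assumed to mirror the pasilla file naming); no package is needed.

```r
# Hand-typed file names in the same style as the pasilla example files.
files <- c("treated1fb.txt", "treated2fb.txt", "untreated1fb.txt")

# "(.*treated)" captures everything up to and including "treated";
# the trailing ".*" swallows the rest, and "\\1" keeps only the captured part.
condition <- sub("(.*treated).*", "\\1", files)

print(condition)
```

Every file name collapses to either "treated" or "untreated", which is exactly the grouping label needed for the `condition` column.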

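library("DESeq2")
# Read every count file listed in sampleTable and assemble the DESeqDataSet;
# the design formula says that counts are modelled as a function of condition
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                       directory = directory,
                                       design = ~ condition)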

ddsHTSeq
## class: DESeqDataSet 
## dim: 70463 7 
## metadata(1): version
## assays(1): counts
## rownames(70463): FBgn0000003:001 FBgn0000008:001 ... FBgn0261575:001
##   FBgn0261575:002
## rowData names(0):
## colnames(7): treated1fb.txt treated2fb.txt ... untreated3fb.txt
##   untreated4fb.txt
## colData names(1): condition
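
One detail worth checking at this point is the order of the factor levels in `condition`. R sorts levels alphabetically by default, so "treated" comes before "untreated" and would be taken as the reference level in later comparisons. The sketch below uses a hand-made factor (not the real `colData`) to show how `relevel()` changes that. Choosing "untreated" as the reference is an assumption about what most analyses want.

```r
# A hand-made stand-in for the condition column.
condition <- factor(c("treated", "treated", "untreated", "untreated"))

# Default: alphabetical order, so "treated" is the first (reference) level.
print(levels(condition))

# Make "untreated" the reference level instead.
condition <- relevel(condition, ref = "untreated")
print(levels(condition))
```

On the real object the same idea would be applied to `ddsHTSeq$condition` before running the analysis.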

So what do the raw counts look like?

> head(counts(ddsHTSeq))
                treated1fb.txt treated2fb.txt treated3fb.txt untreated1fb.txt untreated2fb.txt untreated3fb.txt untreated4fb.txt
FBgn0000003:001              0              0              1                0                0                0                0
FBgn0000008:001              0              0              0                0                0                0                0
FBgn0000008:002              0              0              0                0                0                1                0
FBgn0000008:003              0              1              0                1                1                1                0
FBgn0000008:004              1              0              1                0                1                0                1
FBgn0000008:005              4              1              1                2                2                0                1
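
Many of these rows contain almost no reads. As a rough sketch of how such rows could be spotted, the snippet below rebuilds the six rows shown above as a plain matrix and keeps only rows whose total count reaches a threshold. The threshold of 10 is an assumption chosen for illustration, and the matrix is typed in by hand rather than pulled from the DESeqDataSet.

```r
# The six rows printed above, re-entered by hand as a plain matrix.
counts_head <- matrix(
  c(0, 0, 1, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 1, 0,
    0, 1, 0, 1, 1, 1, 0,
    1, 0, 1, 0, 1, 0, 1,
    4, 1, 1, 2, 2, 0, 1),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("FBgn0000003:001", "FBgn0000008:001", "FBgn0000008:002",
      "FBgn0000008:003", "FBgn0000008:004", "FBgn0000008:005"),
    c("treated1fb.txt", "treated2fb.txt", "treated3fb.txt",
      "untreated1fb.txt", "untreated2fb.txt", "untreated3fb.txt",
      "untreated4fb.txt")
  )
)

# Total reads per row across all seven samples.
total <- rowSums(counts_head)

# Keep rows with at least 10 reads in total (illustrative threshold).
keep     <- total >= 10
filtered <- counts_head[keep, , drop = FALSE]

print(total)
print(rownames(filtered))
```

With these six rows only FBgn0000008:005 passes the filter. The same logical vector idea carries over to the full object, where the row totals would come from `rowSums(counts(ddsHTSeq))`.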

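That covers the two common input formats. The next step is to actually process the data.
By the way, if you have some spare time, the HTSeq Python package itself is worth a look; it seems quite powerful.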

See you next time ( ^ . ^ )
