DESeq2差异分析

need:表达矩阵分组矩阵，值要是整数

DESeq2和EdgeR都可用于做基因差异表达分析，主要也是用于RNA-Seq数据，同样也可以处理类似的ChIP-Seq,shRNA以及质谱数据。

这两个都属于R包，其相同点在于都是对count data数据进行处理，都是基于负二项分布模型。因此会发现，用两者处理同一组数据，最后在相同阈值下筛选出的大部分基因都是一样的，但是有一部分不同应该是由于其估计离散度的不同方法所导致的。

library(DESeq2)

library(limma)

library(pasilla)#示例数据包

data(pasillaGenes)

exprSet=counts(pasillaGenes) ##做好表达矩阵

group_list=pasillaGenes$condition##做好分组因子即可

(colData <- data.frame(row.names=colnames(exprSet), group_list=group_list))

dds <- DESeqDataSetFromMatrix(countData = exprSet,

colData = colData,#Rows of colData correspond to columns of countData

design = ~ group_list)

##上面是第一步第一步，构建dds这个对象，需要一个表达矩阵和分组矩阵！！！

dds2 <- DESeq(dds) ##第二步，直接用DESeq函数即可

resultsNames(dds2)#results extracts a result table from a DESeq analysis giving base means across samples, log2 fold changes, standard errors, test statistics, p-values and adjusted p-values; resultsNames returns the names of the estimated effects (coefficents) of the model; removeResults returns a DESeqDataSet object with results columns removed.

res <- results(dds2, contrast=c("group_list","treated","untreated"))#this argument specifies what comparison to extract from the object to build a results table

## 提取你想要的差异分析结果，我们这里是treated组对untreated组进行比较

resOrdered <- res[order(res$padj),]#padj排序

resOrdered=as.data.frame(resOrdered)

可以看到程序非常好用！

它只对RNA-seq的基因的reads的counts数进行分析，请不要用RPKM等经过了normlization的表达矩阵来分析。

值得一提的是DESeq2软件独有的normlization方法！

rld <- rlogTransformation(dds2) ## 得到经过DESeq2软件normlization的表达矩阵！

exprSet_new=assay(rld)

par(cex = 0.7)#指定符号的大小

n.sample=ncol(exprSet)#样本数

if(n.sample>40) par(cex = 0.5)

cols <- rainbow(n.sample*1.2)

par(mfrow=c(2,2))

boxplot(exprSet, col = cols,main="expression value",las=2)

boxplot(exprSet_new, col = cols,main="expression value",las=2)

hist(exprSet)

hist(exprSet_new)

第一张图也是箱型图

看这个图就知道了，它把本来应该是数据离散程度非常大的RNA-seq的基因的reads的counts矩阵经过normlization后变成了类似于芯片表达数据的表达矩阵，然后其实可以直接用T检验来找差异基因了！

但是，如果你的分组不只是两个，就复杂了，你需要再仔细研读说明书，甚至你可能需要咨询实验设计人员或者统计人员！

DESeq2差异分析

你可能感兴趣的:(DESeq2差异分析)