前面铺垫了那么多终于要开始了
Analyzing RNA-seq data with DESeq2(一)
Analyzing RNA-seq data with DESeq2(二)
Analyzing RNA-seq data with DESeq2(三)
Analyzing RNA-seq data with DESeq2(四)
Analyzing RNA-seq data with DESeq2(五)
Differential expression analysis
The standard differential expression analysis steps are wrapped into a single function, DESeq.
Results tables are generated using the function results, which extracts a results table with log2 fold changes, p values and adjusted p values.
dds <- DESeq(dds)
res <- results(dds)
res
## log2 fold change (MLE): condition treated vs untreated
## Wald test p-value: condition treated vs untreated
## DataFrame with 9921 rows and 6 columns
## baseMean log2FoldChange lfcSE stat pvalue padj
##
## FBgn0000008 95.14429 0.00227644 0.223729 0.010175 0.9918817 0.997211
## FBgn0000014 1.05652 -0.49512039 2.143186 -0.231021 0.8172987 NA
## FBgn0000017 4352.55357 -0.23991894 0.126337 -1.899041 0.0575591 0.288002
## FBgn0000018 418.61048 -0.10467391 0.148489 -0.704927 0.4808558 0.826834
## FBgn0000024 6.40620 0.21084779 0.689588 0.305759 0.7597879 0.943501
## ... ... ... ... ... ... ...
## FBgn0261570 3208.38861 0.2955329 0.127350 2.3206264 0.020307 0.144240
## FBgn0261572 6.19719 -0.9588230 0.775315 -1.2366888 0.216203 0.607848
## FBgn0261573 2240.97951 0.0127194 0.113300 0.1122634 0.910615 0.982657
## FBgn0261574 4857.68037 0.0153924 0.192567 0.0799327 0.936291 0.988179
## FBgn0261575 10.68252 0.1635705 0.930911 0.1757102 0.860522 0.967928
Note that we could have specified the coefficient or contrast we want to build a results table for, using either of the following equivalent commands:
如果需要明确如何处理对象则可以使用以下方法:
res <- results(dds, name="condition_treated_vs_untreated")
##The text, condition treated vs untreated, tells you that the estimates are of the logarithmic fold change log2(treated/untreated).
通过对treated和untreated顺序的更改来达到明确处理对象的问题
res2 <- results(dds, contrast=c("condition","untreated","treated"))
res2
log2 fold change (MLE): condition untreated vs treated
Wald test p-value: condition untreated vs treated
DataFrame with 9921 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
FBgn0000008 95.1440789963134 -0.00215437939131803 0.223778503681452 -0.00962728481903151 0.992318656737681 0.997034169677036
FBgn0000014 1.05657219346166 0.495299273223956 2.14327479881991 0.231094619083222 0.817241295341332 NA
FBgn0000017 4352.5928987935 0.240025555383081 0.126342087768198 1.89980678349608 0.0574584804033397 0.287669966702742
FBgn0000018 418.614930505122 0.104799219010569 0.148536767352526 0.705543959778297 0.480471785048082 0.826695551321695
FBgn0000024 6.40628919476961 -0.210746819818719 0.689970836726196 -0.305443083389843 0.760028712717949 0.943790945818601
... ... ... ... ... ... ...
FBgn0261570 3208.38445969106 -0.295433108560876 0.127325952346867 -2.32028980043316 0.0203252055046957 0.144360633334623
FBgn0261572 6.19713652050888 0.95895731601274 0.775588232046352 1.23642582028685 0.216300323350011 0.608242035446242
FBgn0261573 2240.98398636611 -0.0126194294797913 0.113296402610125 -0.111384202755468 0.911311686482631 0.982594010602204
FBgn0261574 4857.74267170989 -0.0152592268752615 0.192504546880859 -0.0792668387448812 0.936820382128824 0.988423794214432
FBgn0261575 10.6835537573787 -0.163219202829332 0.930591813482421 -0.175392906389902 0.860770914073928 0.967954942188896
Log fold change shrinkage for visualization and ranking
对数据进行收缩处理
resultsNames(dds)
## [1] "Intercept" "condition_treated_vs_untreated"
resLFC <- lfcShrink(dds, coef="condition_treated_vs_untreated", type="apeglm")
resLFC
## log2 fold change (MAP): condition treated vs untreated
## Wald test p-value: condition treated vs untreated
## DataFrame with 9921 rows and 5 columns
## baseMean log2FoldChange lfcSE pvalue padj
##
## FBgn0000008 95.14429 0.00119920 0.151897 0.9918817 0.997211
## FBgn0000014 1.05652 -0.00473412 0.205468 0.8172987 NA
## FBgn0000017 4352.55357 -0.18989990 0.120377 0.0575591 0.288002
## FBgn0000018 418.61048 -0.06995753 0.123901 0.4808558 0.826834
## FBgn0000024 6.40620 0.01752715 0.198633 0.7597879 0.943501
## ... ... ... ... ... ...
## FBgn0261570 3208.38861 0.24110290 0.1244469 0.020307 0.144240
## FBgn0261572 6.19719 -0.06576173 0.2141351 0.216203 0.607848
## FBgn0261573 2240.97951 0.01000619 0.0993764 0.910615 0.982657
## FBgn0261574 4857.68037 0.00843552 0.1408267 0.936291 0.988179
## FBgn0261575 10.68252 0.00809101 0.2014704 0.860522 0.967928
Using parallelization
并行处理
使用BiocParallel包
Parallelizing DESeq, results, and lfcShrink can be easily accomplished by loading the BiocParallel package, and then setting the following arguments: parallel=TRUE and BPPARAM=MulticoreParam(4), for example, splitting the job over 4 cores.
library("BiocParallel") ##使用BiocParallel包
register(MulticoreParam(4))
#提前声明好后,直接添加parallel=TRUE选项即可
p-values and adjusted p-values
P值调整
##p值从小到大排列
resOrdered <- res[order(res$pvalue),]
##使用summary函数获取简单的基本结果
summary(res)
##
## out of 9921 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up) : 518, 5.2%
## LFC < 0 (down) : 536, 5.4%
## outliers [1] : 1, 0.01%
## low counts [2] : 1539, 16%
## (mean count < 6)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
##p值小于0.1的共有多少个
sum(res$padj < 0.1, na.rm=TRUE)
##[1] 1048
如果想要提取排序后的差异分析结果可以这样操作:
resOrdered <- res[order(res$padj),] #padj排序
resOrdered_frame=as.data.frame(resOrdered)
write.csv(resOrdered_frame,
file="condition_treated_results.csv")
##通过class函数看看数据类型
> class(resOrdered)
##[1] "DESeqResults"
##attr(,"package")
##[1] "DESeq2"
> class(resOrdered_frame)
##[1] "data.frame"
函数results有着丰富的参数设置,可以自定义生成的结果,默认p值=0.1,可使用alpha参数更改为合适的值。
例如:
res05 <- results(dds, alpha=0.05)
summary(res05)
##
##out of 9921 with nonzero total read count
##adjusted p-value < 0.05
##LFC > 0 (up) : 408, 4.1%
##LFC < 0 (down) : 428, 4.3%
##outliers [1] : 1, 0.01%
##low counts [2] : 1154, 12%
##(mean count < 4)
##[1] see 'cooksCutoff' argument of ?results
##[2] see 'independentFiltering' argument of ?results
sum(res05$padj < 0.05, na.rm=TRUE)
## [1] 836
基本的数据处理到这里就完成了,接下来就是结果展示了。