Ta-da!
With today's notes, the basic DESeq2 workflow is complete. There is not that much material in the whole process; what matters is understanding each step.
Analyzing RNA-seq data with DESeq2 (1)
Analyzing RNA-seq data with DESeq2 (2)
Analyzing RNA-seq data with DESeq2 (3)
Analyzing RNA-seq data with DESeq2 (4)
Analyzing RNA-seq data with DESeq2 (5)
Exploring and exporting results
MA-plot
plotMA(res, ylim=c(-2,2))
plotMA(resLFC, ylim=c(-2,2))
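Points with an adjusted p-value below 0.1 are drawn in red. Points that fall outside the plotting window are drawn as open triangles pointing up or down.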
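To make concrete what an MA-plot shows, here is a minimal base-R sketch on simulated data. It does not call DESeq2: the data frame `toy` and all of its values are made up, and only the column names mirror a DESeq2 results table. It imitates the two conventions described above (highlighting genes with padj < 0.1, and drawing out-of-window points as open triangles at the border), and is not meant to reproduce plotMA exactly.

```r
# Simulated stand-in for a DESeq2 results table (all values are made up)
set.seed(1)
n <- 500
toy <- data.frame(
  baseMean       = 10^runif(n, 0, 5),  # mean of normalized counts per gene
  log2FoldChange = rnorm(n, 0, 1),     # effect size, treated vs untreated
  padj           = runif(n)            # adjusted p-value
)

ylim <- c(-2, 2)
sig  <- !is.na(toy$padj) & toy$padj < 0.1   # genes to highlight
out  <- toy$log2FoldChange < ylim[1] | toy$log2FoldChange > ylim[2]

# Clamp out-of-window points to the border so they remain visible
y <- pmax(pmin(toy$log2FoldChange, ylim[2]), ylim[1])

# pch 16 = filled circle, 2 = open triangle up, 6 = open triangle down
pch <- ifelse(!out, 16, ifelse(toy$log2FoldChange > 0, 2, 6))

plot(toy$baseMean, y, log = "x", ylim = ylim, pch = pch, cex = 0.5,
     col = ifelse(sig, "red", "gray40"),
     xlab = "mean of normalized counts", ylab = "log fold change")
abline(h = 0, col = "gray60")
```

The x-axis is on a log scale because mean counts span several orders of magnitude; the y-axis is the log2 fold change, so unchanged genes scatter around the horizontal line at 0.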
Alternative shrinkage estimators
The lfcShrink function offers three shrinkage estimators, selected with the type argument:
The options for type are:
- apeglm is the adaptive t prior shrinkage estimator from the apeglm package (Zhu, Ibrahim, and Love 2018). As of version 1.28.0, it is the default estimator.
- ashr is the adaptive shrinkage estimator from the ashr package (Stephens 2016). Here DESeq2 uses the ashr option to fit a mixture of Normal distributions to form the prior, with method="shrinkage".
- normal is the original DESeq2 shrinkage estimator, an adaptive Normal distribution as prior.
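In lfcShrink, the coef argument specifies which coefficient to shrink, for example coef="condition_treated_vs_untreated". Alternatively, the coefficient can be given by its position in the output of resultsNames(dds).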
resultsNames(dds)
## [1] "Intercept" "condition_treated_vs_untreated"
# because we are interested in treated vs untreated, we set 'coef=2'
resNorm <- lfcShrink(dds, coef=2, type="normal")
resAsh <- lfcShrink(dds, coef=2, type="ashr")
par(mfrow=c(1,3), mar=c(4,4,2,1)) # three panels in one row, with tighter margins
xlim <- c(1,1e5); ylim <- c(-3,3) # shared x- and y-axis limits
plotMA(resLFC, xlim=xlim, ylim=ylim, main="apeglm")
plotMA(resNorm, xlim=xlim, ylim=ylim, main="normal")
plotMA(resAsh, xlim=xlim, ylim=ylim, main="ashr")
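Returning to the coef argument used above: hard-coding coef=2 works here, but the position changes if the design changes. A small sketch of looking the position up by name instead is shown below. The vector `coef_names` simply copies the resultsNames(dds) output printed earlier, so the snippet runs without a dds object; the variable names are illustrative.

```r
# Coefficient names, copied from the resultsNames(dds) output shown above
coef_names <- c("Intercept", "condition_treated_vs_untreated")

# Find the position of the coefficient by name rather than hard-coding it
coef_idx <- match("condition_treated_vs_untreated", coef_names)
coef_idx

# With a real dds object, either form should select the same coefficient:
# lfcShrink(dds, coef = coef_idx, type = "apeglm")
# lfcShrink(dds, coef = "condition_treated_vs_untreated", type = "apeglm")
```

Passing the name directly is usually the more readable choice, since it still points at the intended comparison if the order of coefficients changes.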
Plot counts
plotCounts shows the normalized counts of a specified gene in each group.
plotCounts(dds, gene=which.min(res$padj), intgroup="condition")
plotCounts(dds, gene="FBgn0000008", intgroup="condition")
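With returnData=TRUE, plotCounts returns the data as a data frame instead of drawing the plot, so the figure can be customized with ggplot2.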
d <- plotCounts(dds, gene=which.min(res$padj), intgroup="condition",
returnData=TRUE)
library("ggplot2")
ggplot(d, aes(x=condition, y=count)) +
geom_point(position=position_jitter(w=0.1,h=0)) +
scale_y_log10(breaks=c(25,100,400))
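Because the returned object is an ordinary data frame with a count column and one column per intgroup variable, it can also be summarized directly. The sketch below uses a hypothetical stand-in, `d_toy`, with invented counts, so it runs without DESeq2; with real data, `d` from the call above would take its place.

```r
# Hypothetical stand-in for plotCounts(..., returnData=TRUE); counts are invented
d_toy <- data.frame(
  count     = c(1200.5, 1350.5, 1100.5, 95.5, 80.5, 110.5),
  condition = factor(rep(c("untreated", "treated"), each = 3),
                     levels = c("untreated", "treated"))
)

# Median count per condition
group_median <- aggregate(count ~ condition, data = d_toy, FUN = median)
group_median

# Rough log2 ratio of the group medians, treated over untreated
med_treated   <- group_median$count[group_median$condition == "treated"]
med_untreated <- group_median$count[group_median$condition == "untreated"]
log2(med_treated / med_untreated)
```

This is only a quick sanity check on one gene; it is not a substitute for the model-based log2 fold change reported by results().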
Exporting results to CSV files
- Export the ordered results
write.csv(as.data.frame(resOrdered),
file="condition_treated_results.csv")
- Set an adjusted p-value threshold and extract the genes that pass it (here, padj < 0.1)
resSig <- subset(resOrdered, padj < 0.1)
resSig
## log2 fold change (MLE): condition treated vs untreated
## Wald test p-value: condition treated vs untreated
## DataFrame with 1054 rows and 6 columns
## baseMean log2FoldChange lfcSE stat pvalue
##
## FBgn0039155 730.568 -4.61874 0.1691240 -27.3098 3.24447e-164
## FBgn0025111 1501.448 2.89995 0.1273576 22.7701 9.07164e-115
## FBgn0029167 3706.024 -2.19691 0.0979154 -22.4368 1.72030e-111
## FBgn0003360 4342.832 -3.17954 0.1435677 -22.1466 1.12417e-108
## FBgn0035085 638.219 -2.56024 0.1378126 -18.5777 4.86845e-77
## ... ... ... ... ... ...
## FBgn0037073 973.1016 -0.252146 0.1009872 -2.49681 0.0125316
## FBgn0029976 2312.5885 -0.221127 0.0885764 -2.49645 0.0125443
## FBgn0030938 24.8064 0.957645 0.3836454 2.49617 0.0125542
## FBgn0039260 1088.2766 -0.259253 0.1038739 -2.49585 0.0125656
## FBgn0034753 7775.2711 0.393515 0.1576749 2.49574 0.0125696
## padj
##
## FBgn0039155 2.71919e-160
## FBgn0025111 3.80147e-111
## FBgn0029167 4.80595e-108
## FBgn0003360 2.35542e-105
## FBgn0035085 8.16049e-74
## ... ...
## FBgn0037073 0.0999489
## FBgn0029976 0.0999489
## FBgn0030938 0.0999489
## FBgn0039260 0.0999489
## FBgn0034753 0.0999489
## save the results
write.csv(as.data.frame(resSig),
file="p0.1_low.csv")
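The same filter-then-export pattern can be tried on a small toy table. The sketch below uses a plain base-R data frame, `res_toy`, with invented gene IDs and values, and additionally splits the significant genes by direction of change. Note that padj can be NA for some genes; with a base data frame, subset() drops rows where the condition evaluates to NA.

```r
# Toy results table; gene IDs and values are invented for illustration
res_toy <- data.frame(
  log2FoldChange = c(-4.6, 2.9, 0.1, -0.3, 1.2),
  padj           = c(1e-160, 1e-111, 0.80, NA, 0.04),
  row.names      = paste0("gene", 1:5)
)

# Keep genes below the threshold; gene4 (padj = NA) is dropped by subset()
sig_toy <- subset(res_toy, padj < 0.1)

# Split the significant genes by direction of change
up   <- rownames(sig_toy)[sig_toy$log2FoldChange > 0]
down <- rownames(sig_toy)[sig_toy$log2FoldChange < 0]

# Write to a temporary file so the working directory is left untouched
out_file <- tempfile(fileext = ".csv")
write.csv(sig_toy, file = out_file)
```

Here `up` and `down` hold the names of the up- and down-regulated genes, which could each be written to a separate file in the same way.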
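That covers the standard basic DESeq2 workflow. This is the simplest and most fundamental part; there is much more to extend and explore, and many of the underlying principles take real effort to understand. I hope to gain a deeper understanding of them over time!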