ChIP-Seq: DiffBind with no control and no replicate samples

Others have already written detailed guides to using DiffBind; these are worth consulting:

  • https://mp.weixin.qq.com/s?__biz=MzAxMDkxODM1Ng==&mid=2247487545&idx=1&sn=6dea2112d5a1c14555a4263d5dcfe42c&chksm=9b485082ac3fd994df4665359d5e71feb39c4188e9aabebe3996ca0744a54cc711df368068a7&scene=21#wechat_redirect
  • https://www.jianshu.com/p/f849bd55ac27
  • https://zhuanlan.zhihu.com/p/52379828

The official vignette is here:

  • https://www.bioconductor.org/packages/release/bioc/vignettes/DiffBind/inst/doc/DiffBind.pdf

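DiffBind first needs a SampleSheet file in CSV format. The official documentation says "Spreadsheets in Excel® format, with a .xls or .xlsx suffix, are also accepted", but importing one failed for me.
The SampleSheet has a fixed set of columns: SampleID, Tissue, Factor, Condition, Treatment, Replicate, bamReads, ControlID, bamControl, Peaks and PeakCaller.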


The options for PeakCaller are:
– “raw”: text file; peak score is in fourth column
– “bed”: .bed file; peak score is in fifth column
– “narrow”: default peak.format: narrowPeaks file
– “macs”: MACS .xls file
– “swembl”: SWEMBL .peaks file
– “bayes”: bayesPeak file
– “peakset”: peakset written out using pv.writepeakset
– “fp4”: FindPeaks v4
(See the manual for details: https://www.bioconductor.org/packages/release/bioc/manuals/DiffBind/man/DiffBind.pdf)
My samples have no control input, so the two columns ControlID and bamControl are left empty.
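As a rough illustration of such a sheet, the sketch below builds a two-sample SampleSheet in base R and leaves ControlID and bamControl blank. The sample names mirror the ones used in this post, but the BAM and peak file paths are hypothetical placeholders, and the sheet is written to a temporary directory rather than the working directory.

```r
# Sketch: a two-sample DiffBind sample sheet with no control input.
# The BAM and narrowPeak paths are hypothetical placeholders.
samples <- data.frame(
  SampleID   = c("trisomy_21", "euploid"),
  Tissue     = "fibroblasts",
  Factor     = c("trisomy_21", "euploid"),
  Condition  = c("trisomy_21", "euploid"),
  Treatment  = c("trisomy_21", "euploid"),
  Replicate  = 1L,
  # Backslashes are doubled only because this is an R string literal;
  # the CSV file itself will contain single backslashes.
  bamReads   = c("E:\\data\\sample_A_sorted.bam",
                 "E:\\data\\sample_B_sorted.bam"),
  ControlID  = "",   # no control sample
  bamControl = "",   # no control BAM
  Peaks      = c("E:\\data\\sample_A_peaks.narrowPeak",
                 "E:\\data\\sample_B_peaks.narrowPeak"),
  PeakCaller = "narrow",
  stringsAsFactors = FALSE
)

sheet_path <- file.path(tempdir(), "SampleSheet.csv")
write.csv(samples, sheet_path, row.names = FALSE)

# Print the raw file to confirm the empty control columns.
cat(readLines(sheet_path), sep = "\n")
```

The resulting file can then be passed to dba(sampleSheet = sheet_path) once the paths point at real files.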

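Second, the BAM file path in the sheet can stay as E:\defect\DNA_protein_interaction\GSE55506\Differential_expression\T2N_H3K4me3_sorted.bam; it does not need the doubled \\ that R string literals require, because the path is read from the CSV rather than typed in R code. Loading the sheet into R as a test: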

> dbObj <- dba(sampleSheet="SampleSheet.csv")
trisomy_21 fibroblasts trisomy_21 trisomy_21 trisomy_21 1 narrow
euploid fibroblasts euploid euploid euploid 1 narrow
> dbObj
2 Samples, 33153 sites in matrix (47495 total):
          ID      Tissue     Factor  Condition  Treatment Replicate Caller Intervals
1 trisomy_21 fibroblasts trisomy_21 trisomy_21 trisomy_21         1 narrow     40820
2    euploid fibroblasts    euploid    euploid    euploid         1 narrow     44391

No problem: this shows that loading works without a control input. The differential analysis step, however, fails:

> dbObj <- dba.contrast(dbObj, categories=DBA_FACTOR,minMembers = 1)
Error in dba.contrast(dbObj, categories = DBA_FACTOR, minMembers = 1) : 
  minMembers must be at least 2. Use of replicates strongly advised.
> dbObj <- dba.contrast(dbObj, categories=DBA_FACTOR,minMembers = 2)
Warning message:
No contrasts added. Perhaps try more categories, or lower value for minMembers. 
> dbObj <- dba.analyze(dbObj)
Error in pv.DBA(DBA, method, bSubControl, bFullLibrarySize, bTagwise = bTagwise,  : 
  Unable to perform analysis: no contrasts specified.
In addition: Warning message:
No contrasts added. Perhaps try more categories, or lower value for minMembers. 
> dbObj <- dba.contrast(dbObj, categories=DBA_CONDITION)
Warning message:
No contrasts added. Perhaps try more categories, or lower value for minMembers. 
> dbObj <- dba.contrast(dbObj, categories=DBA_CONDITION, minMembers = 1)
Error in dba.contrast(dbObj, categories = DBA_CONDITION, minMembers = 1) : 
  minMembers must be at least 2. Use of replicates strongly advised.

Judging from these messages, are at least two replicates really required? To be resolved...
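Read literally, the messages above say that every group in a contrast needs at least two members. A small pre-flight check in base R makes that condition explicit before calling dba.contrast. This is only a sanity check consistent with the error text, not DiffBind's internal logic, and has_enough_replicates is a hypothetical helper name.

```r
# Hypothetical helper: TRUE only if every group has at least
# `min_members` samples.
has_enough_replicates <- function(groups, min_members = 2L) {
  counts <- table(groups)
  all(counts >= min_members)
}

# The design in this post: one sample per condition.
single <- has_enough_replicates(c("trisomy_21", "euploid"))

# A design with two replicates per condition.
paired <- has_enough_replicates(
  c("trisomy_21", "trisomy_21", "euploid", "euploid")
)

print(c(single = single, paired = paired))
```

With one sample per condition the check is FALSE, which matches the "No contrasts added" warning seen with minMembers = 2.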

=================================================================================
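I asked on the forum and the official site, and the DiffBind author replied: DiffBind requires replicates for the input samples.
The original answer: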
Yes, replicates are required to do any kind of statistical analysis. Replicates are required to estimate the variance in the data and calculate confidence statistics such as p-values/FDRs.

Without replicates, you can do some exploratory analysis of overlapping peaks (occupancy analysis). For example using dba.plotVenn(). But not knowing if your data represents an outlier, combined with the inherent noisiness of peak calling, means you will have to have another way to validate any "differential" peaks you identify.
Link: https://support.bioconductor.org/p/125809/#125840
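The occupancy analysis the author suggests could look roughly like the sketch below. It assumes DiffBind is installed and that the sample sheet lists exactly two peaksets; run_occupancy is a hypothetical wrapper name, and the function skips quietly if the package or the sheet is missing. The dba.plotVenn and dba.overlap calls follow my reading of the DiffBind manual, so check the argument names against your installed version.

```r
# Sketch: exploratory overlap (occupancy) analysis for two unreplicated
# peaksets. No p-values or FDRs are produced by this.
run_occupancy <- function(sample_sheet = "SampleSheet.csv") {
  if (!requireNamespace("DiffBind", quietly = TRUE) ||
      !file.exists(sample_sheet)) {
    message("DiffBind or the sample sheet is not available; skipping.")
    return(invisible(NULL))
  }
  suppressPackageStartupMessages(library(DiffBind))

  dbObj <- dba(sampleSheet = sample_sheet)

  # Venn diagram of the peaks shared by, and unique to, samples 1 and 2.
  dba.plotVenn(dbObj, mask = 1:2)

  # Retrieve the peak sets behind the diagram.
  olap <- dba.overlap(dbObj, mask = 1:2, mode = DBA_OLAP_PEAKS)

  # Number of peaks in each returned set.
  print(sapply(olap, length))
  invisible(olap)
}

result <- run_occupancy("SampleSheet.csv")
```

Any peaks that look sample-specific here still need independent validation, as the answer above points out.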
