References:
Using IDR to process peak calls from biological replicates
Irreproducible Discovery Rate (IDR)
(base) zexing@DNA:~/projects/zhaoyingying/ChIP_seq/2021_05_01/scripts_log$ conda install idr
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /f/xudonglab/zexing/miniconda3
added / updated specs:
- idr
The following packages will be downloaded:
package | build
---------------------------|-----------------
conda-4.10.1 | py37h89c1867_0 3.1 MB https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
idr-2.0.4.2 | py37h77a2a36_5 77 KB https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda
matplotlib-3.2.2 | 1 6 KB https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
matplotlib-base-3.2.2 | py37h1d35a4c_1 7.0 MB https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
openssl-1.1.1k | h7f98852_0 2.1 MB https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge
------------------------------------------------------------
Total: 12.3 MB
The following NEW packages will be INSTALLED:
idr anaconda/cloud/bioconda/linux-64::idr-2.0.4.2-py37h77a2a36_5
matplotlib anaconda/cloud/conda-forge/linux-64::matplotlib-3.2.2-1
The following packages will be UPDATED:
conda 4.9.2-py37h89c1867_0 --> 4.10.1-py37h89c1867_0
openssl 1.1.1i-h7f98852_0 --> 1.1.1k-h7f98852_0
The following packages will be DOWNGRADED:
matplotlib-base 3.3.3-py37h4f6019d_0 --> 3.2.2-py37h1d35a4c_1
Proceed ([y]/n)? y
Downloading and Extracting Packages
conda-4.10.1 | 3.1 MB | ####################################################################################################### | 100%
matplotlib-base-3.2. | 7.0 MB | ####################################################################################################### | 100%
idr-2.0.4.2 | 77 KB | ####################################################################################################### | 100%
openssl-1.1.1k | 2.1 MB | ####################################################################################################### | 100%
matplotlib-3.2.2 | 6 KB | ####################################################################################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
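After the transaction completes, it can be worth confirming that the `idr` executable is actually reachable from the current shell. The following is only a minimal sketch: `have_cmd` is a hypothetical helper name introduced here for illustration, and the only idr option used is `--version`, which appears in idr's own usage message.

```shell
# Hypothetical helper (not part of idr or conda): succeeds if a command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd idr; then
    # --version is listed in idr's usage message.
    idr --version
else
    echo "idr not found on PATH; is the right conda environment active?" >&2
fi
```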
usage: idr [-h] --samples SAMPLES SAMPLES [--peak-list PEAK_LIST]
[--input-file-type {narrowPeak,broadPeak,bed,gff}] [--rank RANK]
[--output-file OUTPUT_FILE]
[--output-file-type {narrowPeak,broadPeak,bed}]
[--log-output-file LOG_OUTPUT_FILE] [--idr-threshold IDR_THRESHOLD]
[--soft-idr-threshold SOFT_IDR_THRESHOLD] [--use-old-output-format]
[--plot] [--use-nonoverlapping-peaks]
[--peak-merge-method {sum,avg,min,max}] [--initial-mu INITIAL_MU]
[--initial-sigma INITIAL_SIGMA] [--initial-rho INITIAL_RHO]
[--initial-mix-param INITIAL_MIX_PARAM] [--fix-mu] [--fix-sigma]
[--dont-filter-peaks-below-noise-mean] [--use-best-multisummit-IDR]
[--allow-negative-scores] [--random-seed RANDOM_SEED]
[--max-iter MAX_ITER] [--convergence-eps CONVERGENCE_EPS]
[--only-merge-peaks] [--verbose] [--quiet] [--version]
optional arguments:
-h, --help show this help message and exit
--samples SAMPLES SAMPLES, -s SAMPLES SAMPLES
Files containing peaks and scores.
--peak-list PEAK_LIST, -p PEAK_LIST
If provided, all peaks will be taken from this file.
--input-file-type {narrowPeak,broadPeak,bed,gff}
File type of --samples and --peak-list.
--rank RANK Which column to use to rank peaks.
Options: signal.value p.value q.value columnIndex
Defaults:
narrowPeak/broadPeak: signal.value
bed: score
--output-file OUTPUT_FILE, -o OUTPUT_FILE
File to write output to.
Default: idrValues.txt
--output-file-type {narrowPeak,broadPeak,bed}
Output file type. Defaults to input file type when available, otherwise bed.
--log-output-file LOG_OUTPUT_FILE, -l LOG_OUTPUT_FILE
File to write output to. Default: stderr
--idr-threshold IDR_THRESHOLD, -i IDR_THRESHOLD
Only return peaks with a global idr threshold below this value.
Default: report all peaks
--soft-idr-threshold SOFT_IDR_THRESHOLD
Report statistics for peaks with a global idr below this value but return all peaks with an idr below --idr.
Default: 0.05
--use-old-output-format
Use old output format.
--plot Plot the results to [OFNAME].png
--use-nonoverlapping-peaks
Use peaks without an overlapping match and set the value to 0.
--peak-merge-method {sum,avg,min,max}
Which method to use for merging peaks.
Default: 'sum' for signal/score/column indexes, 'min' for p/q-value.
--initial-mu INITIAL_MU
Initial value of mu. Default: 0.10
--initial-sigma INITIAL_SIGMA
Initial value of sigma. Default: 1.00
--initial-rho INITIAL_RHO
Initial value of rho. Default: 0.20
--initial-mix-param INITIAL_MIX_PARAM
Initial value of the mixture params. Default: 0.50
--fix-mu Fix mu to the starting point and do not let it vary.
--fix-sigma Fix sigma to the starting point and do not let it vary.
--dont-filter-peaks-below-noise-mean
Allow signal points that are below the noise mean (should only be used if you know what you are doing).
--use-best-multisummit-IDR
Set the IDR value for a group of multi summit peaks (a group of peaks with the same chr/start/stop but different summits) to the best value across all of these peaks. This is a work around for peak callers that don't do a good job splitting scores across multi summit peaks (e.g. MACS). If set in conjunction with --plot two plots will be created - one with alternate summits and one without. Use this option with care.
--allow-negative-scores
Allow negative values for scores. (should only be used if you know what you are doing)
--random-seed RANDOM_SEED
The random seed value (sor braking ties). Default: 0
--max-iter MAX_ITER The maximum number of optimization iterations. Default: 3000
--convergence-eps CONVERGENCE_EPS
The maximum change in parameter value changes for convergence. Default: 1.00e-06
--only-merge-peaks Only return the merged peak list.
--verbose Print out additional debug information
--quiet Don't print any status messages
--version show program's version number and exit
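This experiment includes biological replicates. Before downstream analysis, the peaks shared between the replicates need to be identified, so IDR is used to filter them.
Create ChIP_seq_script_1 with vim; the script is as follows: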
#!/bin/bash
# The line above declares that this script uses bash syntax, so that the relevant bash environment settings are loaded when the program is executed.
# Program:
# This program is used for calling peaks from different samples in the same condition.
# History:
# 2021/05/08 zexing First release
#
# Option --samples: files containing peaks and scores.
# Option --peak-list: if provided, all peaks will be taken from this file.
# Option --output-file: file to write output to. Default: idrValues.txt
# Option --plot: plot the results to [OFNAME].png.
# Option --output-file-type {narrowPeak,broadPeak,bed}: output file type. Defaults to the input file type when available, otherwise bed.
# Option --idr-threshold: only return peaks with a global IDR below this value. Default: report all peaks.
# Option --soft-idr-threshold: report statistics for peaks with a global IDR below this value, but return all peaks with an IDR below --idr. Default: 0.05
dir=/f/xudonglab/zexing/projects/zhaoyingying/ChIP_seq/2021_05_01
peak=${dir}/macs2_callpeak
results=${dir}/idr
# Create the output directory if it does not exist yet.
mkdir -p "${results}"
# --plot is a flag that takes no argument; per the help text the figure is written to [OFNAME].png.
idr \
--samples "${peak}/JV21_H3K27me3_peaks.narrowPeak" "${peak}/JV22_H3K27me3_peaks.narrowPeak" \
--output-file "${results}/JV21_JV22_H3K27me3_narrowPeak.txt" \
--plot \
--idr-threshold 0.05
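Because idr is given two narrowPeak files, a small pre-flight check can catch a missing or truncated input before the job is sent to the background. The sketch below assumes the standard 10-column, tab-separated narrowPeak layout; `is_narrowpeak` is a hypothetical helper name, not something idr or MACS2 provides.

```shell
# Hypothetical helper: succeeds only if the file exists, is non-empty,
# and every line has exactly 10 tab-separated fields (narrowPeak layout).
is_narrowpeak() {
    [ -s "$1" ] && awk -F'\t' 'NF != 10 { bad = 1; exit } END { exit bad + 0 }' "$1"
}

# Check every file passed on the command line and warn about problems.
for f in "$@"; do
    is_narrowpeak "$f" || echo "warning: $f is missing, empty, or not 10-column narrowPeak" >&2
done
```

Saved as a separate script, it could be called with the two replicate peak files as arguments before running ChIP_seq_script_1.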
Run the ChIP_seq_script_1 script in the background as follows:
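nohup bash ChIP_seq_script_1 > ChIP_seq_script_1_log 2>&1 &
idr writes its log to stderr by default, so 2>&1 is needed to capture it in the same log file.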
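Once the background job has finished, a quick sanity check is to count how many peaks in the IDR output meet a given cutoff. This sketch relies on an assumption taken from the IDR README rather than from the help text above: column 5 of the output holds a scaled IDR score, min(int(-125 * log2(IDR)), 1000), so an IDR of 0.05 corresponds to a score of 540. `count_idr_peaks` is a hypothetical helper name.

```shell
# Hypothetical helper: count lines whose scaled IDR score (column 5)
# is at least the given minimum (default 540, assumed to mean IDR <= 0.05).
count_idr_peaks() {
    awk -v min="${2:-540}" '$5 >= min { n++ } END { print n + 0 }' "$1"
}

# If an IDR output file is passed as the first argument, report its count.
if [ -n "${1:-}" ] && [ -r "${1:-}" ]; then
    count_idr_peaks "$1"
fi
```

Since the script already passes --idr-threshold 0.05, the count at the default cutoff should match the number of lines in the output file; a higher minimum score selects a stricter subset.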
There are two ways to process the results in the output file: