直接把 bam 导入IGV看峰图太慢,把 bam 转为 bigWig 格式后,文件会小很多,载入也很快了。
启发: GEO 看到一句:
Bigwig files were obtained with the bamCoverage function from deepTools. Values represent number of reads per bin. We used a bin size of 50.
https://deeptools.readthedocs.io/en/develop/content/list_of_tools.html
安装软件
$ pip3 install deeptools
$ deeptools --version
deeptools 3.5.1
$ bamCoverage --help
usage: An example usage is:$ bamCoverage -b reads.bam -o coverage.bw
This tool takes an alignment of reads or fragments as input (BAM file) and generates a coverage track (bigWig or bedGraph) as output.
The coverage is calculated as the number of reads per bin, where bins are short consecutive counting windows of a defined size. It is possible to extended the length of the reads to better reflect the actual fragment length.
*bamCoverage* offers normalization by scaling factor, Reads Per Kilobase per Million mapped reads (RPKM), counts per million (CPM), bins per million mapped reads (BPM) and 1x depth (reads per genome coverage, RPGC).
对于基因 EEF1D, 10x bam 最大366;
# v1 # 去掉的过多了! 最大是 75
$ bamCoverage --bam BMMC_donor1_cluster6.bam \
--outFileName bw/BMMC_donor1_cluster6.bigWig \
--outFileFormat bigwig \
--ignoreDuplicates \
-p 12 \
--scaleFactor 0.5
其中 -p 12 是多进程,测试机器只有12个核;
# v2 no --ignoreDuplicates \ #最大是 3851
$ bamCoverage --bam BMMC_donor1_cluster6.bam \
--outFileName bw/BMMC_donor1_cluster6.v2.bigWig \
--outFileFormat bigwig \
-p 12 \
--scaleFactor 0.5
--ignoreDuplicates If set, reads that have the same orientation and start position will be considered only once.
If reads are paired, the mate's position also has to coincide to ignore a read. (default: False)
# 方向一致,起点一致就算一次! 如果是配对的,另一个位置也要一致才可以忽略一个 read。 默认 F
# v3 no --ignoreDuplicates , no --scaleFactor 0.5 #最大是 v2 的2倍,7701
$ bamCoverage --bam BMMC_donor1_cluster6.bam \
--outFileName bw/BMMC_donor1_cluster6.v3.bigWig \
--outFileFormat bigwig \
-p 12
$ bamCoverage --bam BMMC_donor1_cluster6.bam \
--outFileName bw/BMMC_donor1_cluster6.v4.bigWig \
--outFileFormat bigwig \
--binSize 10 \
-p 12
--binSize INT bp, -bs INT bp
Size of the bins, in bases, for the output of the bigwig/bedgraph file. (Default: 50)
分区间的大小,默认 50
当改为1bp分辨率时,生成时间突然变长了:
$ bamCoverage --bam BMMC_donor1_cluster6.bam \
--outFileName bw/BMMC_donor1_cluster6.v4_1.bigWig \
--outFileFormat bigwig \
--binSize 1 \
-p 12
$ ls -lth bam/bw
total 29M
-rw-rw-r-- 1 wangjl wangjl 13M May 23 13:42 BMMC_donor1_cluster6.v4_1.bigWig
-rw-rw-r-- 1 wangjl wangjl 4.6M May 23 13:34 BMMC_donor1_cluster6.v4.bigWig
-rw-rw-r-- 1 wangjl wangjl 1.9M May 23 13:19 BMMC_donor1_cluster6.v3.bigWig
-rw-rw-r-- 1 wangjl wangjl 1.9M May 23 11:30 BMMC_donor1_cluster6.v2.bigWig
-rw-rw-r-- 1 wangjl wangjl 1.9M May 22 22:44 BMMC_donor1_cluster6.bigWig
-rw-rw-r-- 1 wangjl wangjl 3.2M May 22 22:44 BMMC_donor1_cluster1.bigWig
-rw-rw-r-- 1 wangjl wangjl 2.5M May 22 22:42 BMMC_donor1_cluster3.bigWig
会特别慢 13:54–>14:11 还没结束,算了,强制退出。其他都是1-2min结束。
$ bamCoverage --bam BMMC_donor1_cluster6.bam \
--outFileName bw/BMMC_donor1_cluster6.v5.bigWig \
--outFileFormat bigwig \
--binSize 10 \
--smoothLength 20
-p 12
--smoothLength INT bp
The smooth length defines a window, larger than the binSize, to average the number of reads.
For example, if the --binSize is set to 20 and the --smoothLength is set to 60, then, for each bin, the average of the bin and its left and right neighbors is considered.
Any value smaller than --binSize will be ignored and no smoothing will be applied. (default: None)
平滑长度,低于 --binSize 则忽略,不做平滑化。默认不做平滑化。
起初,这个无脑做出来的bw文件在 igv上看是反的,仔细看参数,是过滤掉 forward,那么剩下的是 reverse。同理 filterRNAstrand reverse 得到的是 forward 链的reads。
下面是正确的命名:
$ bamCoverage --bam BMMC_donor1_cluster6.bam \
--outFileName bw/BMMC_donor1_cluster6.v6r.bigWig \
--outFileFormat bigwig \
--binSize 1 \
--filterRNAstrand forward \
-p 12
$ bamCoverage --bam BMMC_donor1_cluster6.bam \
--outFileName bw/BMMC_donor1_cluster6.v6f.bigWig \
--outFileFormat bigwig \
--binSize 1 \
--filterRNAstrand reverse \
-p 12
总结:按方向生成 1bp 分辨率的bw,再使用 igv 查看。
效果图:
看图中,cluster6的bam(最大366)和后面的v6 (最大5791) 最高值差异很大,可能 bam 还是有抽样?或者 bw 值偏大?
换几个基因看看:
BigWig 和 原bam 的峰形状为什么不一样呢? //todo 不知道。
这不是 igv.js 特有的,igv 桌面版也是这样:
网页版 IGV.js 的代码:
help: https://github.com/igvteam/igv.js/wiki
|
test3: show tracks using BigWig file as input
目前,该issue还没有收到回复:https://github.com/deeptools/deepTools/issues/1139