统计测序深度和覆盖度的工具

1.GATK DepthOfCoverage

cat test.sh 

#!/usr/bin/bash HG38=/path/hg38/hg38.fa GATK=/path/biosoft/gatk3.7/GenomeAnalysisTK.jar java -jar -Xmx10g $GATK -T DepthOfCoverage \ -R $HG38 \ -o ./test \ -I /path2bam/03_bam_processing/03_base_recal/test.sorted_MarkDuplicates_BQSR.bam \ -L /path2bed/target.bed \ --omitDepthOutputAtEachBase --omitIntervalStatistics \ -ct 1 -ct 5 -ct 10 -ct 20 -ct 30 -ct 50 -ct 100

结果生成四个文件

 4096 Jan 13 21:39 ./
 4096 Jan 13 21:05 ../
 6417 Jan 13 21:39 test.sample_cumulative_coverage_counts
 GenekVIP 6412 Jan 13 21:39 test.sample_cumulative_coverage_proportions
 9362 Jan 13 21:39 test.sample_statistics
 291 Jan 13 21:39 test.sample_summary

 less test.sample_sumary 结果不是很好

sample_id       total   mean    granular_third_quartile granular_median granular_first_quartile %_bases_above_1 %_bases_above_5 %_bases_above_10        %_bases_above_20        %_bases_above_3
test    2025198 17.50   1       1       1       5.1     2.7     2.6     2.5     2.5     2.4     2.3
Total   2025198 17.50   N/A     N/A     N/A

#

2.bamdst: bam文件深度统计

#

软件安装:

git clone https://github.com/shiquan/bamdst.git
cd bamdst/
make

 

测试:

cat  test.sh

#!/usr/bin/bash
/path2bamdst/biosoft/bamdst/bamdst -p /path2bed/target.bed -o ./ ../03_base_recal/test.sorted.markdup.realign.BQSR.bam echo "run status $?"

结果文件:

   1630 Jan 13 16:58 chromosomes.report
   4331 Jan 13 16:58 coverage.report
  64188 Jan 13 16:58 depth_distribution.plot
 838584 Jan 13 16:58 depth.tsv.gz
  18119 Jan 13 16:58 insertsize.plot
   9519 Jan 13 16:58 region.tsv.gz
     0  Jan 13 16:58 uncover.bed

看一下内容:

$ less -S chromosomes.report#该文件中存储的是从bam文件中获取的染色体深度、覆盖度信息
Chromosome        DATA(%)       Avg depth          Median       Coverage%        Cov 4x %       Cov 10x %       Cov 30x %
chrM       100.00            0.26              0.0           5.66            2.83            0.00            0.00
$ less coverage.report#信息太多了,我目前觉得比较重要的有
[Total] Raw Reads (All reads)    15
[Total] Mapped Reads    15
[Total] Properly paired    0#Paired reads with properly insert size
[Total] Fraction of Properly paired    0.00%
[Total] Read and mate paired    0#read1 and read2 paired.
[Total] Fraction of Read and mate paired    0.00%
[Total] Map quality cutoff value    20
[Total] MapQuality above cutoff reads    10
[Total] Fraction of MapQ reads in all reads    66.67%
[Total] Fraction of MapQ reads in mapped reads    66.67%
[Target] Average depth    0.26
[Target] Average depth(rmdup)    0.06
[Target] Coverage (>0x)    5.66%
[Target] Coverage (>=4x)    2.83%
[Target] Coverage (>=10x)    0.00%
[Target] Coverage (>=30x)    0.00%
$ less depth_distribution.plot#结合R可以做出目标区域的深度分布图
0       900     0.943396        54      0.056604
1       0       0.000000        54      0.056604
2       0       0.000000        54      0.056604
3       27      0.028302        27      0.028302
4       4       0.004193        23      0.024109
#左边三列分别代表:覆盖深度,对应该深度的碱基数,碱基占比(以目标区域的碱基数为分母);
#右边两列是高深度向低深度累加的值,分别是碱基数及其占比,累加到X=1为止。
$ less depth.tsv
#Chr    Pos     Raw Depth       Rmdup depth     Cover depth
chrM    650     8       6       8
chrM    651     8       6       8
chrM    652     8       6       8
chrM    653     9       6       9
chrM    654     9       6       9
#Raw Depth从输入bam文件中得到,没有任何限制;
#Rmdup depth去除了PCR重复的reads, 次优比对reads, 低比对质量的reads(mapQ < 20), 该值与samtools depth的输出深度类似;
#默认使用raw depth来统计coverage.report文件中的覆盖率信息。 如果要使用rmdup depth计算覆盖率,可使用参数"--use_rmdup";
#The coverage depth is the raw depth with consider of deletion region, so this value should be equal to or greated than the raw depth.
$ less insertsize.plot#做出两个接头之间的插入片段大小的图,理论上是根据read1和read2在染色体上的坐标信息求得,这时read1和read2要求比对到一条染色体上。
$ less region.tsv
#Chr    Start   Stop    Avg depth       Median  Coverage        Coverage(FIX)
chrM    649     1603    0.26    0.0     5.66    5.66
#目标区域文件每一个区域的统计,其中Coverage以X>0来统计
$ less uncover.bed
chrM    672     1603
#--uncover的值默认是5,从前面depth_distribution.plot文件中也能看出来,深度小于5的碱基数就是931;
#包含低覆盖或者是未覆盖的,通过"--uncover"规定“未覆盖”的阈值。

REFERENCE   简书作者:hsy_hzauer 链接:https://www.jianshu.com/p/ac7b8e4d76e4

转载于:https://www.cnblogs.com/yellow-hgy/p/10264275.html

你可能感兴趣的:(统计测序深度和覆盖度的工具)