HISAT2 Bowtie2 提取唯一比对 unique mapping reads

把read比对到基因组之后,需要提取唯一比对来进行下一步分析。

bowtie2和HISAT2 都没有只输出唯一比对的命令,所以需要对比对结果sam文件进行提取,可以通过sam文件最后一列的标签进行筛选。

Bowtie2

bowtie2的输出sam文件的标签:

AS:i::Alignment score. Can be negative. Can be greater than 0 in [--local] mode (but not in [--end-to-end] mode). Only present if SAM record is for an aligned read.(引用自bowtie2 manual:http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#sam-output)

表示比对分数,只有比对上的reads才会有这个标签。
所以我们先用grep “AS:” aligned.sam将所有比对上的reads提取出来。

XS:i:: Alignment score for the best-scoring alignment found other than the alignment reported. Can be negative. Can be greater than 0 in [--local] mode (but not in [--end-to-end]mode). Only present if the SAM record is for an aligned read and more than one alignment was found for the read. Note that, when the read is part of a concordantly-aligned pair, this score could be greater than [AS:i] (引用自bowtie2 manual:http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#sam-output)

XS:i:标签是在多个比对时才会出现(multi mapping reads)。
所以用grep -v "XS:i"命令即可实现将不包含XS:i标签的例子全部提取出来。

bowtie2提取唯一比对的命令总结
grep "AS:" aligned.sam | grep –v "XS:"  > unique_alignments.sam

HISAT2

HISAT2输出sam文件的标签:

ZS:i: : Alignment score for the best-scoring alignment found other than the alignment reported. Can be negative. Only present if the SAM record is for an aligned read and more than one alignment was found for the read. Note that, when the read is part of a concordantly-aligned pair, this score could be greater than AS:i.
NH:i: : The number of mapped locations for the read or the pair.
(引用自HISAT2 manual: http://daehwankimlab.github.io/hisat2/manual/)

ZS:i:仅当SAM记录用于已对齐的读取且已为该读取找到多个对齐时才出现。
NH:i:显示read 比对到基因组位置的个数

HISAT2提取唯一比对的命令总结
grep "NH:i:1" aligned.sam | grep -v "ZS:i" > unique_alignments.sam

进行提取后,一定要用wc -l unique_alignments.sam来查看read的个数,用来确定和比对软件输出的log文件中记录的个数是否一致。

wc -l unique_alignments.sam
42731387 unique_alignments.sam

比对软件输出的log文件结果:

62624861 reads; of these:
  62624861 (100.00%) were unpaired; of these:
    7315939 (11.68%) aligned 0 times
    42731387 (68.23%) aligned exactly 1 time
    12577535 (20.08%) aligned >1 times
88.32% overall alignment rate

aligned exactly 1 time 表示唯一比对。

在网上查了很多方法,进行HISAT2比对时,很多人建议只需要用grep "NH:i:1"就可以筛选出唯一比对,但是筛选出的read个数和输出的log文件显示的个数不相符。也看到其他人提问过这个问题,但是没有人解答,所以分享出来。

Bowtie1

只需要在mapping时 加入 -m 1 ,就可以直接输出所有 唯一比对的read。

你可能感兴趣的:(HISAT2 Bowtie2 提取唯一比对 unique mapping reads)