BAM visualization: samtools tview explained

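When it comes to BAM visualization, most people know IGV. However, IGV is a graphical application and is not very convenient for many Linux users who work in a terminal, whereas samtools tview covers this need well.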

Without further ado, here is the command:

samtools tview -p chr1:3128088 NA12878.bam hg38.fasta

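-p  the chromosome position; tview starts the display at this position
NA12878.bam  the BAM file with the alignment results; it must be indexed (NA12878.bam.bai)
hg38.fasta  the FASTA reference used for the alignment. If it is not provided, the first line (the reference row) is shown as N. It needs a fai index (hg38.fasta.fai)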
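Because tview needs both index files, it can help to check them before launching. Below is a minimal Python sketch that assumes the index names used in this post (`<bam>.bai` and `<fasta>.fai`). The helper `tview_command` is hypothetical and only assembles the argument list. samtools may also find indexes under other names (for example `.csi`), and this check does not consider those.

```python
import os


def tview_command(bam, ref=None, region=None):
    """Build a `samtools tview` argument list after checking index files.

    Hypothetical helper: it only looks for the index names used in this post
    (BAM -> BAM + ".bai", FASTA -> FASTA + ".fai").
    """
    if not os.path.exists(bam + ".bai"):
        raise FileNotFoundError(f"missing BAM index {bam}.bai (try: samtools index {bam})")
    cmd = ["samtools", "tview"]
    if region is not None:
        cmd += ["-p", region]  # e.g. "chr1:3128088": where the display starts
    cmd.append(bam)
    if ref is not None:
        if not os.path.exists(ref + ".fai"):
            raise FileNotFoundError(f"missing FASTA index {ref}.fai (try: samtools faidx {ref})")
        cmd.append(ref)  # without a reference, tview shows the reference row as N
    return cmd
```

Once both index files exist, `subprocess.run(tview_command("NA12878.bam", "hg38.fasta", "chr1:3128088"))` would launch the same command as above.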

After running it, you enter the samtools tview interactive interface:
[Figure 1: the samtools tview interactive interface]
If the figure above looks confusing, note that the symbols follow the notation of the Pileup format (see https://en.wikipedia.org/wiki/Pileup_format):
. (dot) denotes a base that matched the reference on the forward strand
, (comma) denotes a base that matched the reference on the reverse strand
> or < (less-/greater-than sign) denotes a reference skip. This occurs, for example, if a base in the reference genome is intronic and a read maps to two flanking exons. If quality scores are given in a sixth column, they refer to the quality of the read and not the specific base.
AGTCN (upper case) denotes a base that did not match the reference on the forward strand
agtcn (lower case) denotes a base that did not match the reference on the reverse strand
A sequence matching the regular expression +[0-9]+[ACGTNacgtn]+ denotes an insertion of one or more bases starting from the next position. For example, +2AG means insertion of AG in the forward strand
A sequence matching the regular expression -[0-9]+[ACGTNacgtn]+ denotes a deletion of one or more bases starting from the next position. For example, -2ct means deletion of CT in the reverse strand
^ (caret) marks the start of a read segment, and the ASCII code of the character following '^' minus 33 gives the mapping quality
$ (dollar) marks the end of a read segment
* (asterisk) is a placeholder for a deleted base in a multiple basepair deletion that was mentioned in a previous line by the -[0-9]+[ACGTNacgtn]+ notation
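To make these rules concrete, here is a minimal Python sketch that tallies the symbols in one pileup read-bases string. That string is column 5 of pileup text, such as `samtools mpileup` output. tview draws reads graphically rather than printing this string. The function name `parse_read_bases` is hypothetical, and the parser handles only the symbols listed above and assumes well-formed input.

```python
import re
from collections import Counter

# An indel is '+' or '-', a length N, then exactly N bases.
INDEL = re.compile(r"[+-](\d+)")


def parse_read_bases(bases):
    """Tally the symbols of one pileup read-bases string (column 5).

    Returns (counts, mapqs): a Counter of symbol categories and the mapping
    qualities decoded from '^' markers. Assumes well-formed input; characters
    not covered by the rules above are skipped.
    """
    counts = Counter()
    mapqs = []
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # The character after '^' encodes mapping quality as ASCII - 33.
            mapqs.append(ord(bases[i + 1]) - 33)
            counts["read_start"] += 1
            i += 2
            continue
        if c in "+-":
            m = INDEL.match(bases, i)
            n = int(m.group(1))
            counts["insertion" if c == "+" else "deletion"] += 1
            i = m.end() + n  # skip the N inserted/deleted bases
            continue
        if c == ".":
            counts["match_fwd"] += 1
        elif c == ",":
            counts["match_rev"] += 1
        elif c in "ACGTN":
            counts["mismatch_fwd"] += 1
        elif c in "acgtn":
            counts["mismatch_rev"] += 1
        elif c in "<>":
            counts["ref_skip"] += 1
        elif c == "*":
            counts["deleted_base"] += 1
        elif c == "$":
            counts["read_end"] += 1
        i += 1
    return counts, mapqs
```

For example, in `"^I.,$"` the `^I` marks a read start with mapping quality `ord("I") - 33 = 40`, followed by one forward match, one reverse match and a read end.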

In addition, pressing "Shift+?" (that is, the ? key) shows the help menu, as shown below:
[Figure 2: the samtools tview help menu]

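  1. Press g to be prompted for a genomic position to jump to.
  2. Use H (left), J (down), K (up) and L (right) to move the view, following vi conventions; the arrow keys and lowercase h/j/k/l also scroll, in smaller steps.
  3. Ctrl+H scrolls 1 kb to the left; Ctrl+L scrolls 1 kb to the right.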
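  4. Reads can be colored by mapping quality, base quality, nucleotide and more (see the corresponding keys in the help menu). For base or mapping quality:
    30-40: white;
    20-30: yellow;
    10-20: green;
    0-10: blue.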
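  5. Press the dot key '.' to toggle between showing bases and dots;
  6. Press r to toggle the display of read names;
  7. Other features: see the help menu.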
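The color bins in item 4 can be expressed as a small lookup. This is a Python sketch that assumes half-open bins (0-9 blue, 10-19 green, 20-29 yellow, 30 and above white), because the ranges listed above overlap at their boundaries. `quality_color` is a hypothetical name, not part of samtools.

```python
def quality_color(q):
    """Map a base or mapping quality to the color bin described in item 4.

    Assumption: half-open bins [0,10) blue, [10,20) green, [20,30) yellow,
    and 30 or more white (qualities above 40 are also treated as white).
    """
    if q < 0:
        raise ValueError("quality must be non-negative")
    colors = ["blue", "green", "yellow", "white"]
    return colors[min(q // 10, 3)]  # integer division picks the bin, capped at white
```

With these assumptions, a quality of exactly 20 is shown as yellow and 60 as white.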
