Generating .tdf files from bam/sam for visualization in IGV

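Chinese version (translated into English):

A note up front: I am a half-trained bioinformatician who gets by on trial and error and a lot of searching, so I am writing this down before I forget it.

(For how to change the parameters, the official documentation is still the place to look.)

My boss gave me a sam file and asked whether I could produce a visual coverage plot from it (that is, a tdf). The IGV website lists the input formats supported by the toTDF command in igvtools, and sam/bam are not among them, so I went through a few forums and put together the steps below.

Prerequisites: samtools and igvtools. igvtools can be installed with conda install igvtools; samtools can be downloaded from its official website. My environment is Ubuntu.

First, if the file is in sam format, convert it to bam so that it can be sorted and indexed.

The simplest command is:

samtools view -bS -1 seq.sam > seq.bam

With the bam file in hand, sort first and then index; the file can be renamed along the way:

samtools sort seq.bam -o seq.sort.bam

samtools index seq.sort.bam

Next, igvtools converts the bam to tdf. Before that, you need to set up a reference genome.

If a fa.fai file already sits next to the reference you aligned against (with bwa, for example), you can skip this step; check first, because bwa index does not write a .fai file itself. Otherwise, index the reference fasta (ref.fa), which creates ref.fa.fai:

samtools faidx ref.fa

Then:

cut -f1,2 ref.fa.fai > ref.chrom.sizes

Put this file into the genomes folder of igvtools so that it is recognized automatically when the tdf is generated (you have to find where the genomes folder is on your own system). I used my own reference; if yours comes from a database, first check whether that folder already contains it:

cp ref.chrom.sizes ~/anaconda3/share/igvtools-2.5.3-0/lib/genomes

Then the last step (-w 25 is the default window size; -z 5 sets the maximum zoom level, for which the igvtools default is 7):

igvtools count -z 5 -w 25 seq.sort.bam seq.bam.tdf ref

That is it. Open IGV to view the generated file.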



English version:

(For parameter settings, please refer to the igvtools and samtools documentation.)

Given a sam or bam file, you can generate a coverage track in .tdf format for paired-end short reads. According to the official IGV website, the toTDF command in igvtools does not list sam or bam among its supported input formats, so the conversion goes through igvtools count instead. I did some research on forums and organized the commands as follows.

Prerequisites: samtools (can be downloaded and installed from the official website) and igvtools (can be installed using conda install igvtools). My environment is Ubuntu.

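First, sam files need to be converted into bam files so that they can be sorted and indexed.

The simplest command is below: -b asks for bam output, -1 selects fast compression, and -S marks the input as sam (recent samtools versions detect the input format on their own and ignore this flag):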

samtools view -bS -1 seq.sam > seq.bam
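One thing that can trip up this conversion is a sam file without @SQ header lines, since the bam header needs the reference names and lengths. The snippet below is a minimal sketch of a pre-flight check; has_sq_header is a hypothetical helper, and the tiny sam file is made up purely to exercise it.

```shell
# has_sq_header (hypothetical helper): succeeds if the sam file
# contains at least one @SQ header line.
has_sq_header() {
    grep -q '^@SQ' "$1"
}

# Made-up three-line sam file, used only to demonstrate the check.
demo_sam=$(mktemp)
printf '@HD\tVN:1.6\tSO:unsorted\n'  > "$demo_sam"
printf '@SQ\tSN:contig1\tLN:5000\n' >> "$demo_sam"
printf 'read1\t0\tcontig1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n' >> "$demo_sam"

if has_sq_header "$demo_sam"; then
    echo "header present: samtools view -b -1 can convert this file"
else
    echo "no @SQ lines: reference lengths are missing from the header"
fi
```

If the check fails on a real file, samtools view has a -t option that takes a tab-delimited list of reference names and lengths (a .fai file works); see the samtools manual before relying on it.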

Once the bam files are generated, you can sort and then index your files (file names can be changed during this process):

samtools sort seq.bam -o seq.sort.bam

samtools index seq.sort.bam
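To confirm that the sort step did what was expected, the header can be inspected: samtools sort records SO:coordinate on the @HD line. The sketch below reads header text on standard input, so it can be tried without a bam file; is_coordinate_sorted is a hypothetical helper name, and the header fed to it here is made up.

```shell
# is_coordinate_sorted (hypothetical helper): reads sam header text on
# stdin and succeeds only if the @HD line carries the SO:coordinate tag.
is_coordinate_sorted() {
    awk -F'\t' '
        $1 == "@HD" {
            for (i = 2; i <= NF; i++) if ($i == "SO:coordinate") found = 1
        }
        END { exit !found }
    '
}

# Demonstration on a made-up header.
printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:contig1\tLN:5000\n' \
    | is_coordinate_sorted && echo "sorted by coordinate"

# With a real file the header would come from samtools:
#   samtools view -H seq.sort.bam | is_coordinate_sorted
```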

Before generating the tdf from the bam file, you need to set up a reference genome for igvtools.

If a file ending in fa.fai already sits next to the reference you aligned against (with bwa, for example), you can skip the indexing step; do check, though, because bwa index does not write a .fai file itself. Otherwise, index the reference sequence in fasta format (ref.fa), which creates ref.fa.fai alongside it:

samtools faidx ref.fa

Once you have the fa.fai file, extract the sequence names and lengths (its first two columns) with:

cut -f1,2 ref.fa.fai > ref.chrom.sizes

Then put this chrom.sizes file into the genomes folder under the igvtools folder so that igvtools finds your reference genome by name when generating the tdf (you need to find out where your genomes folder is located; my own path is shown below as an example). I used my own reference genome. If you are using a reference from a public database, browse the local genomes folder first to check whether it is already there.
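To make the cut step above concrete, here is a small worked example on a made-up .fai file. The five tab-separated columns of a .fai are sequence name, length, byte offset, bases per line, and bytes per line; the contig names and numbers below are illustrative only.

```shell
# Build a made-up two-sequence .fai file in a scratch directory.
tmpdir=$(mktemp -d)
printf 'contig1\t5000\t9\t60\t61\n'     > "$tmpdir/ref.fa.fai"
printf 'contig2\t1200\t5102\t60\t61\n' >> "$tmpdir/ref.fa.fai"

# Keep only name and length, exactly as in the command above.
cut -f1,2 "$tmpdir/ref.fa.fai" > "$tmpdir/ref.chrom.sizes"

cat "$tmpdir/ref.chrom.sizes"
# contig1	5000
# contig2	1200
```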

The command I used to put my own ref genome into the tool’s folder was:

cp ref.chrom.sizes ~/anaconda3/share/igvtools-2.5.3-0/lib/genomes
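The path above is specific to my install, so here is a rough sketch for locating the folder on another machine. It assumes igvtools was installed through conda and that the folder is named genomes somewhere under a directory with igvtools in its path; find_genomes_dir is a hypothetical helper, and the fallback to ~/anaconda3 is only a guess.

```shell
# find_genomes_dir (hypothetical helper): print the first directory named
# "genomes" whose path contains "igvtools" under the given prefix.
find_genomes_dir() {
    find "$1" -type d -name genomes -path '*igvtools*' 2>/dev/null | head -n 1
}

# CONDA_PREFIX is set by conda when an environment is active;
# fall back to ~/anaconda3 otherwise (an assumption, adjust as needed).
genomes_dir=$(find_genomes_dir "${CONDA_PREFIX:-$HOME/anaconda3}")

# Copy only when both the folder and the chrom.sizes file exist.
if [ -n "$genomes_dir" ] && [ -f ref.chrom.sizes ]; then
    cp ref.chrom.sizes "$genomes_dir/"
    echo "copied ref.chrom.sizes to $genomes_dir"
fi
```

As far as I can tell from the igvtools documentation, the genome argument of igvtools count can also be given as a path to a chrom.sizes file, which would make the copy unnecessary; treat that as something to verify rather than as part of the steps here.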

The final step generates the tdf. Here -w 25 is the igvtools default window size, while -z 5 sets the maximum zoom level (the igvtools default is 7):

igvtools count -z 5 -w 25 seq.sort.bam seq.bam.tdf ref
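The steps above can be strung together in one function. This is only a sketch under a few assumptions: the input name ends in .sam, the chrom.sizes file is already in the genomes folder under the name passed as the second argument, and run and sam_to_tdf are hypothetical helper names. Setting DRY_RUN=1 prints the commands instead of executing them, so the sequence can be reviewed first. It writes the bam with the -o option of samtools view rather than a shell redirect so that every step is a plain command.

```shell
# run (hypothetical helper): print the command when DRY_RUN=1,
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# sam_to_tdf SAM GENOME: convert, sort, index, then count.
# Stops at the first step that fails.
sam_to_tdf() {
    sam="$1"
    genome="$2"
    base="${sam%.sam}"    # seq.sam -> seq
    run samtools view -b -1 -o "$base.bam" "$sam" &&
    run samtools sort "$base.bam" -o "$base.sort.bam" &&
    run samtools index "$base.sort.bam" &&
    run igvtools count -z 5 -w 25 "$base.sort.bam" "$base.bam.tdf" "$genome"
}

# Preview the commands for the file names used in this post.
DRY_RUN=1 sam_to_tdf seq.sam ref
```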

You are all set now. Open the tdf file in the IGV desktop application to check the result.
