Installing idba_ud (2018-05-24)

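This assembler is a fairly old release, but it is more convenient to use than SOAPdenovo.

Sequence assembly is a critical step in processing genome sequencing data, and it matters most for the huge volume of short reads that high-throughput sequencing produces.

By a rough count there are at least a dozen assemblers. The better-known ones include SOAPdenovo, idba_ud, ABySS and Velvet, and each has its own strengths and weaknesses.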

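Assembly strategies fall broadly into two classes: overlap-based and de Bruijn graph-based.

The former joins reads by the overlap between the end of one read and the start of another, and suits first-generation sequencing data. The latter cuts reads into shorter fragments called k-mers and assembles from how those k-mers link together, which suits second-generation high-throughput sequencing. idba_ud is a de novo assembler for short reads with uneven sequencing depth, built on an iterative de Bruijn graph. It iterates from a small k to a large k, and at each step applies thresholds to remove short and low-depth contigs, which lets it assemble both low-depth and high-depth regions.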
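
To make the k-mer idea concrete, here is a minimal sketch in Python of cutting reads into k-mers and linking them into a de Bruijn graph. It is only an illustration, not how idba_ud is implemented: the function names and the toy reads are made up for this example, and real assemblers also handle reverse complements, sequencing errors and coverage.

```python
from collections import defaultdict


def kmers(read, k):
    """Return every length-k substring of read, left to right."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            # Each k-mer is an edge from its first k-1 bases to its last k-1 bases.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


if __name__ == "__main__":
    toy_reads = ["ATGCG", "GCGTA"]  # hypothetical reads, for illustration only
    print(kmers(toy_reads[0], 3))
    for node, successors in sorted(de_bruijn_edges(toy_reads, 3).items()):
        print(node, "->", sorted(successors))
```

The two toy reads share the k-mer GCG, so their paths join in the graph. An iterative assembler repeats this kind of graph construction with increasing k.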

The current version has moved to:

https://github.com/loneknightpy/idba

References

Peng, Y., et al. (2010) IDBA - A Practical Iterative de Bruijn Graph De Novo Assembler. RECOMB. Lisbon.

Peng, Y., et al. (2012) IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth, Bioinformatics, 28, 1420-1428.

Installation steps:

1. git clone https://github.com/loneknightpy/idba.git

Change into the cloned directory (cd idba).

2. $ ./build.sh

3. $ ./configure --prefix=<absolute path>   (a directory you can write to, since I had no permission to install system-wide)

4. $ make

5. Add that path to the PATH environment variable

export PATH=<absolute path>:$PATH

After that, the tools can be run directly.

Comments

Note that IDBA assemblers are designed for short reads (around 100bp). If you want to assemble paired-end reads with longer read length, please modify the constant kMaxShortSequence in src/sequence/short_sequence.h to support longer read length.

Please find the manual by running the assembler without any parameters. For example:

$ bin/idba

IDBA series assemblers accept FASTA format reads. FASTQ format reads can be converted by fq2fa program in the package.

$ bin/fq2fa read.fq read.fa

IDBA-UD, IDBA-Hybrid and IDBA-Tran require paired-end reads stored in the same FASTA file. The two reads of a pair should be consecutive records. If they are not, please use fq2fa to merge two FASTQ read files into a single file.

$ bin/fq2fa --merge --filter read_1.fq read_2.fq read.fa

or convert a FASTQ read file to FASTA file.

$ bin/fq2fa --paired --filter read.fq read.fa
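
To show what "paired reads in consecutive records" looks like, here is a rough Python sketch that interleaves two FASTQ inputs into one FASTA listing. It is not fq2fa's implementation and does no filtering. The function names are invented for this example, and it assumes plain four-line FASTQ records with both inputs in the same pair order.

```python
from itertools import islice


def fastq_records(lines):
    """Yield (name, sequence) from an iterable of FASTQ lines, 4 lines per record."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4))
        if len(chunk) < 4:
            return  # end of input (or a truncated trailing record)
        header, seq = chunk[0].strip(), chunk[1].strip()
        yield header[1:], seq  # drop the leading '@' from the header


def interleave_to_fasta(fq1_lines, fq2_lines):
    """Return FASTA lines with mate 1 and mate 2 of each pair as consecutive records."""
    out = []
    pairs = zip(fastq_records(fq1_lines), fastq_records(fq2_lines))
    for (name1, seq1), (name2, seq2) in pairs:
        out += [">" + name1, seq1, ">" + name2, seq2]
    return out


if __name__ == "__main__":
    # Hypothetical one-pair inputs, for illustration only.
    mate1 = ["@r1/1", "ACGT", "+", "IIII"]
    mate2 = ["@r1/2", "TTGG", "+", "IIII"]
    print("\n".join(interleave_to_fasta(mate1, mate2)))
```

For real data the fq2fa commands above are the tool to use. The sketch only shows the layout the assembler expects.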

This tool assumes that the paired-end reads are in order (->, <-). If your data is in reverse order (<-, ->), please convert it by yourself.
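
The README does not say what this conversion involves. One common way to turn outward-facing pairs into inward-facing ones is to reverse-complement every read, so the Python sketch below does that for an interleaved FASTA listing. Treat it as an assumption about what your library needs, not as the procedure the IDBA authors intend. The function names are made up for this example, and each sequence is assumed to sit on a single line.

```python
# Translation table for complementing bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_complement_fasta(fasta_lines):
    """Reverse-complement every sequence line, leaving '>' header lines untouched."""
    out = []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            out.append(line)
        else:
            out.append(reverse_complement(line))
    return out


if __name__ == "__main__":
    # Hypothetical pair, for illustration only.
    pair = [">r1/1", "AACG", ">r1/2", "GGTN"]
    print("\n".join(reverse_complement_fasta(pair)))
```

Check the orientation of your own library before applying anything like this, because the right conversion depends on how the reads were generated.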

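Tip: by default idba_ud only supports reads up to 128 bases long.

If your reads are longer than 128, change the value of kMaxShortSequence in src/sequence/short_sequence.h.

My longest read was 150, so I set the value to 150. After changing it you have to rebuild and reinstall; run make clean first to clear out the previous build.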
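
Before editing the header it helps to know how long your longest read is. The small Python sketch below scans FASTA lines and reports the maximum read length. The function name and the DEFAULT_LIMIT constant are invented for this example, and the value 128 is simply the default mentioned in the tip above.

```python
DEFAULT_LIMIT = 128  # default read-length limit as described in the tip above


def max_read_length(fasta_lines):
    """Return the length of the longest sequence in an iterable of FASTA lines."""
    longest = 0
    current = 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: close out the previous one.
            longest = max(longest, current)
            current = 0
        else:
            current += len(line)  # sequences may span several lines
    return max(longest, current)


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    demo = [">a", "ACGT", ">b", "ACGTAC", "GT"]
    longest = max_read_length(demo)
    print("longest read:", longest)
    if longest > DEFAULT_LIMIT:
        print("raise kMaxShortSequence to at least", longest, "and rebuild")
```

To run it on a real file, pass an open file handle, for example max_read_length(open("read.fa")).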
