A Head-to-Head Comparison of One-Click Mitochondrial Genome Assemblers
[toc]
1. MITObim
1.1 Software information
Published: 2013
Download: https://github.com/chrishah/MITObim
Paper: https://academic.oup.com/nar/article/41/13/e129/1129833
Main implementation language: Perl
Prerequisites: MIRA
Input: reference, seed
Output
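1.2 Algorithm
Baiting and iterative mapping:
(i) deriving a reference sequence from the previous mapping assembly;
(ii) in silico baiting using the newly derived reference;
(iii) mapping the previously fished reads to the newly derived reference, which extends the reference sequence.
![Schematic workflow of MITObim. Step 1: mitochondrial reads are mapped to conserved regions of a related reference sequence, and a preliminary reference for the species of interest is derived from the mapping result. Step 2: reads overlapping the previously identified regions are fished from the read pool. Step 3: the subset of reads is mapped and a new, extended reference is created. Steps 2 and 3 are repeated until all gaps are closed and the number of reads remains stationary. Black rectangles: nuclear reads; red rectangles: mitochondrial genome of a distantly related species; green rectangles: mitochondrial reads and the growing mitochondrial reference.]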
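The bait-and-extend loop can be illustrated with a toy sketch. This is not MITObim's code (MITObim is a Perl wrapper that relies on MIRA): exact k-mer sharing stands in for baiting, and a naive suffix overlap stands in for the mapping assembly. All function names and the toy sequences are made up for illustration.

```python
def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bait_reads(reference, reads, k):
    """Step (ii): fish out reads that share at least one k-mer with the reference."""
    ref_kmers = kmers(reference, k)
    return [read for read in reads if kmers(read, k) & ref_kmers]


def extend_reference(reference, baited, k):
    """Step (iii), heavily simplified: extend the 3' end with the baited read
    that overlaps the last k bases and reaches furthest past them."""
    tail = reference[-k:]
    best = reference
    for read in baited:
        pos = read.find(tail)
        if pos != -1:
            candidate = reference + read[pos + k:]
            if len(candidate) > len(best):
                best = candidate
    return best


def iterative_baiting(seed, reads, k=5, max_iter=50):
    """Repeat baiting and extension until the reference stops growing."""
    reference = seed
    for _ in range(max_iter):
        baited = bait_reads(reference, reads, k)
        extended = extend_reference(reference, baited, k)
        if extended == reference:  # no growth in this round: stop iterating
            break
        reference = extended       # step (i): the next round uses the new reference
    return reference


# Toy data: overlapping 10-bp reads from a made-up 26-bp "mitogenome",
# plus one unrelated read that should never be baited.
genome = "ACGTTGCAATCGGATCCTAGGCTAAC"
reads = [genome[i:i + 10] for i in range(0, 17, 4)] + ["TTTTTTTTTT"]
assembled = iterative_baiting(genome[:8], reads)
```

In this toy case `assembled` grows from the 8-bp seed back to the full `genome`, and the poly-T read is never recruited. Real data needs error-tolerant mapping and extension in both directions, which is exactly what the MIRA-based steps provide.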
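1.3 Innovations
The first assembler designed specifically for organelle genomes.
1.4 Drawbacks
Slow to run and memory-hungry.
It is hard to define what counts as a closely related sequence, and some species have no closely related reference at all.
NUMTs (nuclear mitochondrial DNA segments) are not assessed.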
2. ARC
2.1 Software information
Published: 2014
Download: http://ibest.github.io/ARC/
Paper: https://www.biorxiv.org/content/10.1101/014662v1.full
Main implementation language: **Python**
Prerequisites:
- Mapper: Bowtie 2 or BLAT
- Assembler: Roche/Newbler or SPAdes
- Additional requirements: Python 2.7.x, Python module Biopython
Input: trimmed and interleaved reads (FASTA or FASTQ), reference mitogenome (FASTA)
Output: contigs
2.2 Algorithm
- Map reads against a set of targets using BLAT or Bowtie 2
- Extract mapped reads
- Assemble mapped reads into contigs using the Roche/Newbler or SPAdes assemblers
- Map reads against the newly formed contigs
- Iterate until stopping conditions have been met
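The control flow of these steps can be sketched as a small driver loop. This is an illustration only, not ARC's code: the mapper and assembler are passed in as plain Python callables, the toy stand-ins below are hypothetical, and "no additional reads recruited" is just one plausible stopping condition chosen for the example.

```python
def iterate_until_stable(targets, reads, map_reads, assemble, max_iterations=10):
    """Map, extract, assemble, and re-map until no additional reads are recruited."""
    contigs = list(targets)
    previous_count = 0
    iterations_run = 0
    for _ in range(max_iterations):
        mapped = map_reads(contigs, reads)   # map reads and extract the hits
        if len(mapped) <= previous_count:    # stopping condition: recruitment has plateaued
            break
        contigs = assemble(mapped)           # assembled hits become the next round's targets
        previous_count = len(mapped)
        iterations_run += 1
    return contigs, iterations_run


def toy_mapper(contigs, reads, k=3):
    """Stand-in for BLAT/Bowtie 2: a read 'maps' if it shares a k-mer with any contig."""
    contig_kmers = {c[i:i + k] for c in contigs for i in range(len(c) - k + 1)}
    return [r for r in reads
            if any(r[i:i + k] in contig_kmers for i in range(len(r) - k + 1))]


def toy_assembler(mapped_reads):
    """Stand-in for Newbler/SPAdes: returns the reads unchanged as 'contigs'."""
    return list(mapped_reads)


contigs, rounds = iterate_until_stable(
    ["AAAAC"], ["AAACG", "ACGTT", "GGGGG"], toy_mapper, toy_assembler
)
```

The second read is only recruited in round two, after the first round's "contig" has extended the target. That is the point of iterating: each round can reach reads that the original, more divergent target could not.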
2.3 Innovations
In many experiments, de novo assembly of the full dataset is slow, resource intensive, and the end result is difficult to analyze because thousands of contigs are produced. Furthermore, it is difficult to take advantage of additional information available from previously assembled, but distantly related sequences during the assembly stage.
Mapping-based approaches also have limitations, because reads cannot be mapped in regions of low sequence identity, as described by Heng Li.
2.4 Drawbacks
Depends on a closely related reference sequence.
Struggles with repetitive sequences.
3. mitoMaker
3.1 Software information
Published: 2014
Download:
Paper:
Main implementation language: **Python 2.7**
Prerequisites:
Input: raw data
Output: contigs
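3.2 Algorithm
mitoMaker is a pipeline for mitochondrial/chloroplast assembly. Starting from raw sequencer output, it automatically assembles the genome, annotates gene structure, and produces GenBank, FASTA, and other files. The pipeline has six main steps:
1) De novo assembly with several different k-mer sizes;
2) Search for the corresponding structural genes and check whether each assembly from step 1 is circularized;
3) Pick the best assembly among all the results;
4) Use the best assembly as the reference and call MIRA and MITObim to extend it and close gaps, further improving assembly quality;
5) Annotate gene structure based on the assembly from step 4;
6) Create the final results directory containing all outputs (PNG, GENBANK, FASTA, SEQUIN, CAF, MAF and a stats logfile).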
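The circularization check in step 2 can be sketched in a few lines. This is not mitoMaker's code, only one common heuristic: a contig is treated as circular when its end repeats its start. The function names, the `min_overlap` default, and the example contig are all made up for illustration.

```python
def circular_overlap(contig, min_overlap=5):
    """Return the length of the longest suffix that repeats the contig's prefix,
    or 0 if no overlap of at least min_overlap bases exists."""
    for n in range(len(contig) - 1, min_overlap - 1, -1):
        if contig[:n] == contig[-n:]:
            return n
    return 0


def circularize(contig, min_overlap=5):
    """Trim the redundant end if the contig looks circular; otherwise return it unchanged."""
    n = circular_overlap(contig, min_overlap)
    return contig[:-n] if n else contig


contig = "ACGTAGGCTTACGTA"          # made-up contig whose last 5 bases repeat its first 5
overlap = circular_overlap(contig)
trimmed = circularize(contig)
```

Exact string matching is a simplification. On real assemblies the overlap is usually found by alignment so that a sequencing error in the repeated end does not hide it, and tandem repeats can produce spurious overlaps that need a second look.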
3.3 Innovations
One-click operation.
3.4 Drawbacks
Rarely used.
Runs into errors at run time.
4. NOVOPlasty
4.1 Software information
Published: 2016
Download: https://github.com/ndierckx/NOVOPlasty/
Paper: Dierckxsens N., Mardulyn P. and Smits G. (2016) NOVOPlasty: De novo assembly of organelle genomes from whole genome data. Nucleic Acids Research, doi: 10.1093/nar/gkw955
Main implementation language: Perl
Prerequisites:
Input: Trimmed and interleaved reads, reference mitogenome
Output: contigs
4.2 Algorithm
Seed-and-extend algorithm:
(A) All reads are stored in a hash table with a unique id. A second hash table contains the ids for the read start (= k-mer parameter, default = 38) of the corresponding read.
(B) Scope of search 1 is the region where a match of the "read start" indicates an extension of the sequence. All these matching reads are stored separately.
(C) The positions of the paired reads are verified by aligning each paired read to a previously assembled area, which is determined by the library insert size (scope of search 2).
(D) A consensus sequence of the different extensions is determined.
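Steps (A), (B) and (D) can be sketched as follows. This is a simplified illustration, not NOVOPlasty's Perl implementation: it only looks up reads whose start matches the last k bases of the assembly (a search scope of one position), skips the paired-read check in (C), and uses a small k on made-up sequences. All names are hypothetical.

```python
from collections import Counter, defaultdict


def build_tables(reads, k):
    """(A) Table 1 maps id -> read; table 2 maps read start (first k bases) -> ids."""
    read_by_id = dict(enumerate(reads))
    ids_by_start = defaultdict(list)
    for read_id, read in read_by_id.items():
        if len(read) >= k:
            ids_by_start[read[:k]].append(read_id)
    return read_by_id, ids_by_start


def extend_once(assembly, read_by_id, ids_by_start, k):
    """(B) Collect reads whose start matches the assembly's last k bases,
    then (D) append the per-position majority base of their extensions."""
    extensions = [read_by_id[i][k:] for i in ids_by_start.get(assembly[-k:], [])]
    extensions = [e for e in extensions if e]
    if not extensions:
        return assembly
    # zip() stops at the shortest extension, so every column is fully covered.
    consensus = "".join(Counter(column).most_common(1)[0][0]
                        for column in zip(*extensions))
    return assembly + consensus


def seed_and_extend(seed, reads, k=5, max_steps=100):
    """Extend the seed until no matching read start is found (or max_steps is hit)."""
    read_by_id, ids_by_start = build_tables(reads, k)
    assembly = seed
    for _ in range(max_steps):
        extended = extend_once(assembly, read_by_id, ids_by_start, k)
        if extended == assembly:
            break
        assembly = extended
    return assembly


# Toy data: every 9-bp window of a made-up genome, a duplicate of one window,
# and one read carrying a sequencing error that the consensus should outvote.
genome = "ACGTTGCAATCGGATCCTAGGCTAAC"
reads = [genome[i:i + 9] for i in range(len(genome) - 8)]
reads += [genome[1:10], genome[1:6] + "CTAT"]
assembly = seed_and_extend(genome[:6], reads)
```

The hash lookup on the read start is what makes this approach fast: each extension step is a dictionary access rather than a scan over all reads. The `max_steps` cap matters on real circular genomes, where extension would otherwise run around the circle indefinitely.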
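4.3 Innovations
Seed-and-extend algorithm.
Fast, accurate, and light on resources.
4.4 Drawbacks
Fairly dependent on the seed: different seed sequences may give different results.
Also works for chloroplast genomes.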
5. Norgal
5.1 Software information
Published: 2017
Download: https://bitbucket.org/kosaidtu/norgal
Paper: Al-Nakeeb, K., Petersen, T. & Sicheritz-Pontén, T. Norgal: extraction and de novo assembly of mitochondrial DNA from whole-genome sequencing data. BMC Bioinformatics 18, 510 (2017) doi:10.1186/s12859-017-1927-y
Main implementation language: **Python 3**
Prerequisites: Java, matplotlib
Input: Trimmed and interleaved reads, reference mitogenome
Output: contigs
5.2 Algorithm
Based on k-mer frequency:
- Trim and remove adapters from NGS reads using AdapterRemoval [6] and perform a de novo assembly using MEGAHIT [7].
- Map the reads back to the longest assembled sequence using bwa mem [8] and calculate the read depths for each position in order to determine the nuclear depth threshold (ND threshold).
- Count k-mers of size 31 in all reads and only keep a subset of reads that contains at least one 31-mer with a frequency that is greater than the ND threshold. This is done using the program BBTools [9].
- Perform a de novo assembly using idba_ud with the reads containing the frequent k-mers and extract either the longest contig or optionally the longest contig with a predicted cytochrome c oxidase subunit 1 (COI) gene.
- Examine circularity of the longest contig, determine read depth, identify potential mitochondrial and chloroplast contigs, and output plots comparing depths between this contig and the longest contig from the assembly in step (1).
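The read-filtering idea in the third step can be sketched in plain Python. Norgal itself delegates this to BBTools; the version below is only a toy illustration with hypothetical function names, a small k, and made-up reads. It also counts k-mers on one strand only, whereas real k-mer counters normally merge a k-mer with its reverse complement.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def keep_high_depth_reads(reads, k, nd_threshold):
    """Keep reads that contain at least one k-mer seen more often than nd_threshold."""
    counts = count_kmers(reads, k)
    return [read for read in reads
            if any(counts[read[i:i + k]] > nd_threshold
                   for i in range(len(read) - k + 1))]


# Toy data: a "mitochondrial" read present five times and two single-copy "nuclear" reads.
mito_reads = ["ACGTAC"] * 5
nuclear_reads = ["TTGCAA", "GGATCC"]
kept = keep_high_depth_reads(mito_reads + nuclear_reads, k=4, nd_threshold=2)
```

Here only the five high-copy reads survive, because every k-mer in the two nuclear reads occurs once. The same logic explains the tool's main limitation: high-copy nuclear repeats also pass the filter, and a low-copy mitogenome does not.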
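5.3 Innovations
k-mer based.
Norgal is the best choice when the mitochondrial genome is completely unknown and cannot be assembled from any known reference or seed sequence.
5.4 Drawbacks
Slow, resource-hungry, and not necessarily accurate.
NUMTs cannot be removed.
Not suitable for metagenomic datasets, for datasets in which reads are evenly distributed between the mitogenome and the nuclear genome (e.g. organisms with a low mitochondrial copy number, or samples with many PCR duplicates), or for irregular and complex mitochondria.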
5.5 Suggestions
For users interested in completely unknown chloroplast or other organelle genomes for which there are no known sequences, the following approach is suggested:
- Extract contigs of interest from the Norgal assembly, such as the ten longest contigs or the contigs with hits from the BLAST search.
- Run MITObim, NOVOPlasty, or another assembler that can extend seed sequences on each of the ten contigs.
- Validate the output by:
  - (a) mapping reads back to the contigs and comparing depths to the nuclear depth
  - (b) checking for circularity in the contigs
  - (c) annotating the contigs with relevant features, e.g. mitochondrial genes.
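Validation step (a) comes down to comparing a contig's read depth with the nuclear depth. A minimal sketch, assuming the read alignments are already available as 0-based, half-open `(start, end)` intervals on the contig. The function names and numbers are made up; in practice the depths would come from a tool such as samtools.

```python
def per_base_depth(contig_length, alignments):
    """Compute read depth at every position from (start, end) alignment intervals."""
    diff = [0] * (contig_length + 1)
    for start, end in alignments:
        diff[start] += 1      # a read begins covering the contig here
        diff[end] -= 1        # and stops covering it here
    depth, running = [], 0
    for delta in diff[:contig_length]:
        running += delta
        depth.append(running)
    return depth


def depth_ratio(depth, nuclear_depth):
    """Mean contig depth divided by the nuclear depth."""
    return (sum(depth) / len(depth)) / nuclear_depth


depth = per_base_depth(5, [(0, 3), (1, 5), (2, 4)])
ratio = depth_ratio(depth, nuclear_depth=0.9)
```

A ratio well above 1 is what one would expect from a high-copy organelle contig, but where to draw the line depends on the organism and tissue, so no cutoff is suggested here.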
6. MitoZ
6.1 Software information
Published: 2019
Download: https://github.com/linzhi2013/MitoZ
Paper: MitoZ: a toolkit for animal mitochondrial genome assembly, annotation and visualization. Nucleic Acids Research
Main implementation language: **Python**
Prerequisites:
$ conda create -n mitozEnv libgd=2.2.4 python=3.6.0 biopython=1.69 ete3=3.0.0b35 perl-list-moreutils perl-params-validate perl-clone circos=0.69 perl-bioperl blast=2.2.31 hmmer=3.1b2 bwa=0.7.12 samtools=1.3.1 infernal=1.1.1 tbl2asn openjdk
$ conda update -c bioconda tbl2asn ete3
Input: raw data
Output: contigs, figures, annotation
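6.2 Algorithm
De novo assembly, followed by identification of protein-coding genes (PCGs).
6.3 Innovations
No reference sequence required.
One-click operation.
Good accuracy and completeness.
6.4 Drawbacks
High memory usage.
Can only assemble vertebrate and insect mitogenomes.
tRNAs are sometimes missing, even when the assembly is circular.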
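P.S. After finishing this post I came across a review that readers may find useful:
Kuang Weimin (匡卫民), Yu Li (于黎). Mitochondrial genome assembly strategies and software applications in the genomic era [J]. Hereditas (Beijing) (遗传), doi: 10.16288/j.yczz.19-227.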