Software 1: MITObim - mitochondrial baiting and iterative mapping
VERSION: 1.9 (stable - depends on MIRA 4.0.2)
Further examples are available at https://github.com/chrishah/MITObim/tree/master/examples
Prerequisites
-------------
- GNU utilities
- Perl
- A working installation of MIRA
MIRA 4.0.2 http://sourceforge.net/projects/mira-assembler/files/MIRA/stable/
**Precompiled binaries** for MIRA are available for Linux and OSX. See the "Definitive Guide to MIRA": http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html
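The MITObim procedure (mitochondrial baiting and iterative mapping) is a highly efficient approach to assembling novel mitochondrial genomes of non-model organisms directly from NGS reads derived from total genomic DNA.
The script performs three steps and repeats them iteratively:
(i) deriving a reference sequence from the previous mapping assembly,
(ii) in silico baiting using the newly derived reference,
(iii) mapping the previously baited reads to the newly derived reference, which extends the reference sequence.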
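The directory containing the MIRA executables needs to be in your PATH for MITObim.pl to run successfully.
If you cannot or do not want to do this, you can use the --mirapath option to tell MITObim where to find the correct MIRA binaries.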
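The choice between PATH and `--mirapath` described above can be checked before launching a run. The following is a minimal sketch, not part of MITObim itself; `mira_args` and the `MIRA_DIR` variable are hypothetical names introduced here for illustration.

```bash
#!/usr/bin/env bash
# Print the extra MITObim argument needed to locate MIRA.
# MIRA_DIR is a hypothetical variable naming the directory that holds the MIRA binaries.
mira_args() {
    if command -v mira >/dev/null 2>&1; then
        # mira is already on PATH: no extra argument is needed
        printf '%s' ""
    elif [ -n "${MIRA_DIR:-}" ]; then
        # fall back to the --mirapath option from the usage text
        printf '%s' "--mirapath ${MIRA_DIR}"
    else
        echo "mira not found: add it to PATH or set MIRA_DIR" >&2
        return 1
    fi
}
```

The printed string can then be appended to a MITObim.pl command line, e.g. `./MITObim.pl ... $(mira_args)`.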
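- Download the MITObim wrapper script and test data from GitHub, e.g. download the entire MITObim repository as a zip archive (using the button on the GitHub page) or use git on the command line (`git clone --recursive git://github.com/chrishah/MITObim.git`)
Installing Docker on Ubuntu should be as simple as: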
```bash
sudo apt-get install docker.io
```
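You can then specify a working directory on your computer that will be synced with the /home/data directory in the image, and enter a self-contained shell environment to run MITObim:
```bash
WORKING_DIR=/your/desired/working/directory
sudo docker run -i -t -v $WORKING_DIR/:/home/data chrishah/mitobim /bin/bash
```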
If you find MITObim useful, please cite:
Hahn C, Bachmann L and Chevreux B. (2013) Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads -
a baiting and iterative mapping approach. Nucl. Acids Res. 41(13):e129. doi: 10.1093/nar/gkt371
*************************************************************************************************************************************************************************
usage: ./MITObim.pl
parameters:
-start iteration to start with (default=0, when using '-quick' reference)
-end iteration to end with (default=startiteration, i.e. if not specified otherwise stop after 1 iteration)
-sample sampleID (please don't use '.' in the sampleID). If resuming, the sampleID needs to be identical to that of the previous iteration / MIRA assembly.
-ref referenceID. If resuming, use the same as in previous iteration/initial MIRA assembly.
-readpool readpool in fastq format (*.gz is also allowed). read pairs need to be interleaved for full functionality of the '-pair' option below.
-quick reference sequence to be used as bait in fasta format
-maf extracts reference from maf file created by previous MITObim iteration/MIRA assembly (resume)
optional:
--kbait set kmer for baiting stringency (default: 31)
--platform specify sequencing platform (default: 'solexa'; other options: 'iontor', '454', 'pacbio')
--denovo runs MIRA in denovo mode
--pair extend readpool to contain full read pairs, even if only one member was baited (relies on /1 and /2 header convention for read pairs) (default: no).
--verbose show detailed output of MIRA modules (default: no)
--split split reference at positions with more than 5N (default: no)
--help shows this helpful information
--clean retain only the last 2 iteration directories (default: no)
--trimreads trim data (default: no; we recommend to trim beforehand and feed MITObim with pre trimmed data)
--trimoverhang trim overhang up- and downstream of reference, i.e. don't extend the bait, just re-assemble (default: no)
--mismatch number of allowed mismatches in mapping - only for illumina data (default: 15% of avg. read length)
--min_cov minimum average coverage of contigs to be retained (default: 0 - off)
--min_len minimum length of contig to be retained as backbone (default: 0 - off)
--mirapath full path to MIRA binaries (only needed if MIRA is not in PATH)
--redirect_tmp redirect temporary output to this location (useful in case you are running MITObim on an NFS mount)
--NFS_warn_only allow MIRA to run on NFS mount without aborting - warn only (expert option - see MIRA documentation 'check_nfs')
--version display MITObim version
examples:
./MITObim.pl -start 1 -end 5 -sample StrainX -ref reference-mt -readpool illumina_readpool.fastq -maf initial_assembly.maf
./MITObim.pl -end 10 -quick reference.fasta -sample StrainY -ref reference-mt -readpool illumina_readpool.fastq
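The `-readpool` and `--pair` descriptions above ask for interleaved read pairs that follow the /1 and /2 header convention. A quick way to verify this before a run might look like the sketch below. It assumes an uncompressed FASTQ file with exactly four lines per record; `check_interleaved` is a hypothetical helper, not a MITObim feature.

```bash
#!/usr/bin/env bash
# Return 0 if the FASTQ file (uncompressed, 4 lines per record) holds
# interleaved pairs whose headers end in /1 and /2 and otherwise match.
check_interleaved() {
    awk '
        NR % 8 == 1 { first = $1 }            # header of the first mate
        NR % 8 == 5 {                         # header of the second mate
            second = $1
            stem1 = substr(first, 1, length(first) - 2)
            stem2 = substr(second, 1, length(second) - 2)
            if (first !~ /\/1$/ || second !~ /\/2$/ || stem1 != stem2) { bad = 1; exit }
        }
        # an empty file or an incomplete final pair also counts as a failure
        END { if (NR == 0 || NR % 8 != 0) bad = 1; exit (bad ? 1 : 0) }
    ' "$1"
}
```

For example, `check_interleaved illumina_readpool.fastq && echo "pairs look interleaved"`.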
TUTORIAL I: reconstruction of a mitochondrial genome using a two step procedure
a. Initial mapping assembly using MIRA:
-bash-4.1$ mkdir tutorial1
-bash-4.1$ cd tutorial1
-bash-4.1$ cp /PATH/TO/testdata1/Tthymallus-150bp-300sd50-interleaved.fastq initial-mapping-testpool-to-Salpinus-mt_in.solexa.fastq
-bash-4.1$ cp /PATH/TO/testdata1/Salpinus-mt-genome-NC_000861.fasta initial-mapping-testpool-to-Salpinus-mt_backbone_in.fasta
-bash-4.1$ ln -s /PATH/TO/testdata1/Tthymallus-150bp-300sd50-interleaved.fastq reads.fastq
-bash-4.1$ ln -s /PATH/TO/testdata1/Salpinus-mt-genome-NC_000861.fasta reference.fa
-bash-4.1$ echo -e "\n#manifest file for basic mapping assembly with illumina data using MIRA 4\n\nproject = initial-mapping-testpool-to-Salpinus-mt\n\njob=genome,mapping,accurate\n\nparameters = -NW:mrnl=0 -AS:nop=1 SOLEXA_SETTINGS -CO:msr=no\n\nreadgroup\nis_reference\ndata = reference.fa\nstrain = Salpinus-mt-genome\n\nreadgroup = reads\ndata = reads.fastq\ntechnology = solexa\nstrain = testpool\n" > manifest.conf
-bash-4.1$ head -n 20 manifest.conf
-bash-4.1$ mira manifest.conf
The output of this run was saved in a.txt.
-bash-4.1$ ls -hlrt
-bash-4.1$ ls -hlrt initial-mapping-testpool-to-Salpinus-mt_assembly/
The newly constructed reference is contained in the file `initial-mapping-testpool-to-Salpinus-mt_out.maf` in the `initial-mapping-testpool-to-Salpinus-mt_d_results` directory.
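The manifest from step a is written with a single long `echo -e` string, which is hard to read and edit. As an alternative, the same settings can be written with a here-document. This is only a sketch: the function name `write_manifest` and its optional output-path argument are illustrative, and the manifest content is copied from the command above.

```bash
#!/usr/bin/env bash
# Write the MIRA 4 manifest from step a using a here-document.
# Optional argument: output path (defaults to manifest.conf).
write_manifest() {
    cat > "${1:-manifest.conf}" <<'EOF'

#manifest file for basic mapping assembly with illumina data using MIRA 4

project = initial-mapping-testpool-to-Salpinus-mt

job=genome,mapping,accurate

parameters = -NW:mrnl=0 -AS:nop=1 SOLEXA_SETTINGS -CO:msr=no

readgroup
is_reference
data = reference.fa
strain = Salpinus-mt-genome

readgroup = reads
data = reads.fastq
technology = solexa
strain = testpool
EOF
}
```

Calling `write_manifest` without arguments creates `manifest.conf` in the current directory, ready for `mira manifest.conf`.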
b. Baiting and iterative mapping using the MITObim.pl script:
-bash-4.1$ /PATH/TO/MITObim.pl -start 1 -end 10 -sample testpool -ref Salpinus_mt_genome -readpool reads.fastq -maf initial-mapping-testpool-to-Salpinus-mt_assembly/initial-mapping-testpool-to-Salpinus-mt_d_results/initial-mapping-testpool-to-Salpinus-mt_out.maf &> log
pwd:/home/cainana/MITObim/dir/tutorial2
running mapping assembly using MIRA
readpool contains 6000 reads
assembly contains 1 contig(s)
contig length: 16664
MITObim has reached a stationary read number after 5 iterations!!
Final assembly result will be written to file: /home/cainana/MITObim/dir/tutorial2/iteration5/testpool_Salpinus_mt_genome-it5_noIUPAC.fasta
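Since the run writes everything to `log`, the location of the final assembly can be pulled out of it afterwards. The sketch below assumes the log contains a line worded exactly like the "Final assembly result will be written to file:" line shown above; `final_assembly_path` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print the path that follows "Final assembly result will be written to file:"
# in a MITObim log. If the line occurs more than once, the last one wins.
final_assembly_path() {
    sed -n 's/^.*Final assembly result will be written to file: *//p' "$1" | tail -n 1
}
```

For example, `final_assembly_path log` prints the FASTA path, which can then be passed to other tools.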
TUTORIAL II - direct reconstruction without prior mapping assembly using the --quick option (*approximate runtime: 4 min*)
-bash-4.1$ mkdir tutorial3
-bash-4.1$ cd tutorial3
-bash-4.1$ /PATH/TO/MITObim.pl -start 1 -end 30 -sample testpool -ref Salpinus_mt_genome -readpool ~/PATH/TO/testdata1/Tthymallus-150bp-300sd50-interleaved.fastq --quick ~/PATH/TO/testdata1/Salpinus-mt-genome-NC_000861.fasta &> log
Result: MITObim reconstructs the mitochondrial genome and reaches a stationary number of mitochondrial reads after 14 iterations.
324.1MB
TUTORIAL III - reconstructing mt (mitochondrial) genomes from mt barcode seeds (*approximate runtime: 20 min*)
-bash-4.1$ mkdir tutorial4
-bash-4.1$ cd tutorial4
-bash-4.1$ ~/PATH/TO/MITObim.pl -sample testpool -ref Tthymallus-COI -readpool ~/PATH/TO/testdata1/Tthymallus-150bp-300sd50-interleaved.fastq --quick ~/PATH/TO/testdata1/Tthymallus-COI-partial-HQ961018.fasta -end 100 --clean &> log
MITObim reconstructs the mitochondrial genome in 82 iterations. Runtime: 6 min.
The `--clean` option tells MITObim to keep only the latest two iteration directories to save space.
After the run, `ls tutorial4` shows only iteration81, iteration82 and log (85.1MB).
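For runs that were started without `--clean`, a similar cleanup can be done by hand afterwards. The following is a rough sketch of that idea and not MITObim's own implementation; it assumes directories named `iterationN` directly under the run directory and GNU `head` (for the negative line count). `prune_iterations` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Keep only the N highest-numbered iterationX directories under DIR (default N=2)
# and delete the rest. Other files, such as the log, are left alone.
prune_iterations() {
    local dir=$1 keep=${2:-2}
    find "$dir" -maxdepth 1 -type d -name 'iteration[0-9]*' \
        | sed 's/.*iteration//' \
        | sort -n \
        | head -n "-$keep" \
        | while read -r n; do rm -r "$dir/iteration$n"; done
}
```

For example, `prune_iterations tutorial3` would leave the last two iteration directories of a finished run in place.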
For "well behaved" datasets, _de novo_ assembly (`--denovo` flag) and the use of read pair information (`--paired` flag) can further speed up the reconstruction (*approximate runtime: 10 min*).
-bash-4.1$ mkdir tutorial3-denovo
-bash-4.1$ cd tutorial3-denovo
-bash-4.1$ ~/PATH/TO/MITObim.pl -sample testpool -ref Tthymallus-COI -readpool ~/PATH/TO/testdata1/Tthymallus-150bp-300sd50-interleaved.fastq --quick ~/PATH/TO/testdata1/Tthymallus-COI-partial-HQ961018.fasta -end 50 --denovo --paired --clean &> log
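During the run, tutorial4-denovo is continuously updated and holds the latest 3 iteration directories plus the log. The run ends with iteration30 and iteration31.
Runtime: 3 min; 93.9MB.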
~/MITObim/MITObim.pl --sample SRR831234 -ref GQ368662 -readpool ~/F/genomes/SRR831234.fastq --quick ~/F/genomes/GQ368662.fasta -end 5 --denovo --paired --clean &>log
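Fatal error (may be due to problems of the input data or parameters):
********************************************************************************
* Tmp directory is on a NFS mount ... but we don't want that. *
********************************************************************************
MIRA performs a controlled program stop here. It is best not to keep temporary files on a mounted disk, because that is very slow; the `--redirect_tmp` and `--NFS_warn_only` options listed in the usage text address this case.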
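Whether a working directory sits on an NFS mount can be checked before starting a run. The sketch below is not MIRA's own `check_nfs` logic; it simply asks GNU `stat` for the filesystem type and assumes that NFS mounts are reported with a type name starting with `nfs`. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Return 0 if the given filesystem type name denotes NFS (e.g. nfs, nfs4).
fs_type_is_nfs() {
    case "$1" in
        nfs*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Return 0 if DIR lives on an NFS mount; relies on GNU stat (-f -c %T).
dir_on_nfs() {
    fs_type_is_nfs "$(stat -f -c %T "$1")"
}
```

For example, `dir_on_nfs . && echo "consider --redirect_tmp"` gives a hint before MIRA aborts.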
cainana@cainana-VirtualBox:~/genomes/reconstruction/work2$ ~/MITObim/MITObim.pl --sample SRR831234 -ref GQ368662 -readpool ~/genomes/SRR831234.fastq --quick ~/genomes/GQ368662.fasta -end 5 --denovo --paired --clean &>log
error
********************************
Workflow
1. Alignment: Bowtie2-2.2.9
$ ./bowtie2-build AJ492192.fasta AJ492192
$ ./bowtie2 -x AJ492192 -U SRR831234.fastq -S SRR831234_bowtie.sam
2. Format conversion: Picard
$ java -jar picard.jar SortSam I=SRR831234_bowtie.sam O=SRR831234_bowtie_s.sam SORT_ORDER=coordinate
$ java -jar picard.jar SamToFastq I=SRR831234_bowtie_s.sam FASTQ=SRR831234_bowtie_pic.fastq
3. Assembly: MITObim_1.8 (--kbait left at its default of 31)
$ nohup ./MITObim_1.8.pl -start 1 -end 30 -sample SRR831234 -ref AJ492192 -readpool SRR831234_bowtie_pic_QC.fastq \
  --quick AJ492192.fasta --mirapath /home/su/cesar/mira_4.0.2_linux-gnu_x86_64_static/bin --NFS_warn_only >log &
4. Circularization: BLAST+
5. Duplicate check: Notepad++
6. Annotation: BioEdit