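DAS Tool is an automated method that integrates the results of multiple binning algorithms to obtain an optimized, non-redundant set of bins from a single assembly. Compared with other methods, it can reconstruct more near-complete genomes from soil metagenomes [1][2].
DAS Tool can be installed through Bioconda (see the project repository).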
conda install -c bioconda das_tool
Example 1: Run DAS Tool on the binning results of MetaBAT, MaxBin, CONCOCT, and tetraESOM.
$ ./DAS_Tool -i \
sample_data/sample.human.gut_concoct_scaffolds2bin.tsv,\
sample_data/sample.human.gut_maxbin2_scaffolds2bin.tsv,\
sample_data/sample.human.gut_metabat_scaffolds2bin.tsv,\
sample_data/sample.human.gut_tetraESOM_scaffolds2bin.tsv \
-l concoct,maxbin,metabat,tetraESOM \
-c sample_data/sample.human.gut_contigs.fa \
-o sample_output/DASToolRun1
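Here, -i specifies the bin tables produced by the different binning tools; -l specifies the labels, that is, the name of the tool that produced each binning result; -c specifies the contigs used for this binning, in FASTA format; and -o specifies the prefix of the output files.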
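Because -i and -l each take a single comma-separated value, any whitespace after a comma makes the shell split the list into separate arguments. The following is a minimal sketch, not part of DAS Tool, that builds both values with a hypothetical helper named join_by_comma; the paths are the sample_data files used above, and the command is only printed so the sketch runs without DAS Tool installed.

```shell
#!/bin/sh
# join_by_comma ARG...: join all arguments with commas, no surrounding whitespace.
# (Hypothetical helper for illustration; DAS Tool does not ship it.)
join_by_comma() {
  ( IFS=','; printf '%s\n' "$*" )
}

# Bin tables and their labels, listed in the same order.
bins_arg=$(join_by_comma \
  sample_data/sample.human.gut_concoct_scaffolds2bin.tsv \
  sample_data/sample.human.gut_maxbin2_scaffolds2bin.tsv)
labels_arg=$(join_by_comma concoct maxbin)

# Print the command instead of running it.
echo DAS_Tool -i "$bins_arg" -l "$labels_arg" \
  -c sample_data/sample.human.gut_contigs.fa \
  -o sample_output/DASToolRun1
```

Keeping the tables and labels in the same order in both lists is what ties each label to its table.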
Comma-separated list of bin tables
-i, --bins methodA.scaffolds2bin,...,methodN.scaffolds2bin
Each table lists scaffold IDs and bin IDs separated by a tab ("\t"), as follows:
Scaffold_1 bin.01
Scaffold_8 bin.01
Scaffold_42 bin.02
Scaffold_49 bin.03
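Since tabs are easily replaced by spaces when a table is copied or edited, it can help to check a table before passing it to -i. The sketch below uses a hypothetical function, check_scaffolds2bin, that verifies every line has exactly two tab-separated fields. It also flags a scaffold assigned to two different bins within one table, which is an assumption about what a clean table looks like rather than a documented DAS Tool rule.

```shell
#!/bin/sh
# check_scaffolds2bin FILE: return non-zero if FILE is not a clean
# two-column, tab-separated scaffold-to-bin table.
# (Hypothetical helper for illustration; not part of DAS Tool.)
check_scaffolds2bin() {
  awk -F '\t' '
    NF != 2 || $1 == "" || $2 == "" {
      printf "line %d: expected 2 tab-separated fields\n", NR
      bad = 1
      next
    }
    ($1 in seen) && seen[$1] != $2 {
      printf "line %d: %s is in both %s and %s\n", NR, $1, seen[$1], $2
      bad = 1
    }
    { seen[$1] = $2 }
    END { exit bad + 0 }
  ' "$1"
}

# Demo on the four example rows shown above.
tmp_table=$(mktemp)
printf 'Scaffold_1\tbin.01\nScaffold_8\tbin.01\nScaffold_42\tbin.02\nScaffold_49\tbin.03\n' > "$tmp_table"
check_scaffolds2bin "$tmp_table" && echo "table looks OK"
rm -f "$tmp_table"
```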
Contigs in FASTA format
-c, --contigs contigs.fa
This is the assembly file that was used for binning, as follows:
>Scaffold_1
ATCATCGTCCGCATCGACGAATTCGGCGAACGAGTACCCCTGACCATCTCCGATTA...
>Scaffold_2
GATCGTCACGCAGGCTATCGGAGCCTCGACCCGCAAGCTCTGCGCCTTGGAGCAGG...
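The scaffold IDs in each bin table should refer to sequences in this contigs file. As a rough sketch, the hypothetical function missing_scaffolds below prints every scaffold ID from a table that has no matching FASTA header. It assumes the ID is the header text after ">" up to the first whitespace; the sequences in the demo are toy data.

```shell
#!/bin/sh
# missing_scaffolds CONTIGS_FASTA TABLE: print scaffold IDs listed in TABLE
# that have no header in CONTIGS_FASTA.
# (Hypothetical helper for illustration; not part of DAS Tool.)
missing_scaffolds() {
  awk -F '\t' '
    NR == FNR {                      # first file: collect FASTA header IDs
      if (substr($0, 1, 1) == ">") {
        id = substr($0, 2)
        sub(/[ \t].*/, "", id)       # keep only the text before any whitespace
        known[id] = 1
      }
      next
    }
    $1 != "" && !($1 in known) { print $1 }   # second file: the bin table
  ' "$1" "$2"
}

# Demo with toy data.
work=$(mktemp -d)
printf '>Scaffold_1\nATCATCGTCC\n>Scaffold_2\nGATCGTCACG\n' > "$work/contigs.fa"
printf 'Scaffold_1\tbin.01\nScaffold_8\tbin.01\n' > "$work/methodA.scaffolds2bin"
missing_scaffolds "$work/contigs.fa" "$work/methodA.scaffolds2bin"   # prints Scaffold_8
rm -rf "$work"
```

An empty result means every scaffold in the table was found in the assembly.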
Pre-predicted protein sequences
--proteins proteins.faa
The format is as follows:
>Scaffold_1_1
MPRKNKKLPRHLLVIRTSAMGDVAMLPHALRALKEAYPEVKVTVATKSLFHPFFEG...
>Scaffold_1_2
MANKIPRVPVREQDPKVRATNFEEVCYGYNVEEATLEASRCLNCKNPRCVAACPVN...
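The help text describes these headers as Prodigal-style, >scaffoldID_geneNo, so each protein can be traced back to its scaffold by dropping the trailing gene number. The following sketch uses two hypothetical helpers: scaffold_of_protein strips that suffix, and bad_protein_headers lists headers that do not end in an underscore followed by a number. The sequences in the demo are toy data.

```shell
#!/bin/sh
# scaffold_of_protein ID: strip the trailing _geneNo from a protein ID.
# (Hypothetical helper for illustration; not part of DAS Tool.)
scaffold_of_protein() {
  printf '%s\n' "$1" | sed 's/_[0-9][0-9]*$//'
}

# bad_protein_headers FAA: print header IDs that do not end in _<number>.
bad_protein_headers() {
  awk '/^>/ { id = substr($1, 2); if (id !~ /_[0-9]+$/) print id }' "$1"
}

scaffold_of_protein Scaffold_1_2   # prints Scaffold_1

# Demo with toy data: the second header lacks a gene number.
tmp_faa=$(mktemp)
printf '>Scaffold_1_1\nMPRKNKK\n>Scaffold_1\nMANKIPR\n' > "$tmp_faa"
bad_protein_headers "$tmp_faa"     # prints Scaffold_1
rm -f "$tmp_faa"
```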
Output files include:
If --write_bin_evals is 1 (default: 1), the quality and completeness estimates of the input bin sets (_[method].eval).
If --create_plots is 1 (default: 1), plots showing the number of high-quality bins and the score distribution for each method (_DASTool_hqBins.pdf, _DASTool_scores.pdf).
If --write_bins is 1 (default: 0), the bins in FASTA format (DASTool_Bins).
Usage:
DAS_Tool -i methodA.scaffolds2bin,...,methodN.scaffolds2bin
-l methodA,...,methodN -c contigs.fa -o myOutput
-i, --bins Comma separated list of tab separated scaffolds to bin tables.
-c, --contigs Contigs in fasta format.
-o, --outputbasename Basename of output files.
-l, --labels Comma separated list of binning prediction names. (optional)
--search_engine Engine used for single copy gene identification [blast/diamond/usearch].
(default: usearch)
--write_bin_evals Write evaluation for each input bin set [0/1]. (default: 1)
--create_plots Create binning performance plots [0/1]. (default: 1)
--write_bins Export bins as fasta files [0/1]. (default: 0)
--proteins Predicted proteins in prodigal fasta format (>scaffoldID_geneNo).
Gene prediction step will be skipped if given. (optional)
--score_threshold Score threshold until selection algorithm will keep selecting bins [0..1].
(default: 0.5)
--duplicate_penalty Penalty for duplicate single copy genes per bin (weight b).
Only change if you know what you're doing. [0..3]
(default: 0.6)
--megabin_penalty Penalty for megabins (weight c). Only change if you know what you're doing. [0..3]
(default: 0.5)
--db_directory Directory of single copy gene database. (default: install_dir/db)
--resume Use existing predicted single copy gene files from a previous run [0/1]. (default: 0)
--debug Write debug information to log file.
-t, --threads Number of threads to use. (default: 1)
-v, --version Print version number and exit.
-h, --help Show this message.
Example 2: Run DAS Tool again with different parameters. Use the proteins predicted in Example 1 to skip the gene prediction step, disable writing of bin evaluations, set the number of threads to 2 and score threshold to 0.6. Output files will start with the prefix DASToolRun2:
$ ./DAS_Tool -i sample_data/sample.human.gut_concoct_scaffolds2bin.tsv,\
sample_data/sample.human.gut_maxbin2_scaffolds2bin.tsv,\
sample_data/sample.human.gut_metabat_scaffolds2bin.tsv,\
sample_data/sample.human.gut_tetraESOM_scaffolds2bin.tsv \
-l concoct,maxbin,metabat,tetraESOM \
-c sample_data/sample.human.gut_contigs.fa \
-o sample_output/DASToolRun2 \
--proteins sample_output/DASToolRun1_proteins.faa \
--write_bin_evals 0 \
--threads 2 \
--score_threshold 0.6
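Not all binning tools write their results as a tab-separated ("\t") file of scaffold IDs and bin IDs. DAS Tool therefore also provides a script, Fasta_to_Scaffolds2Bin, that converts a set of bins in FASTA format into a "scaffolds2bin" table that can be used as DAS Tool input: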
$ src/Fasta_to_Scaffolds2Bin.sh -h
Fasta_to_Scaffolds2Bin: Converts genome bins in fasta format to scaffolds-to-bin table.
Usage: Fasta_to_Scaffolds2Bin.sh -e fasta > my_scaffolds2bin.tsv
-e, --extension Extension of fasta files. (default: fasta)
-i, --input_folder Folder with bins in fasta format. (default: ./)
-h, --help Show this message.
$ ls /maxbin/output/folder
maxbin.001.fasta maxbin.002.fasta maxbin.003.fasta...
$ src/Fasta_to_Scaffolds2Bin.sh -i /maxbin/output/folder -e fasta > maxbin.scaffolds2bin.tsv
$ head maxbin.scaffolds2bin.tsv
NODE_10_length_127450_cov_375.783524 maxbin.001
NODE_27_length_95143_cov_427.155298 maxbin.001
NODE_51_length_78315_cov_504.322425 maxbin.001
NODE_84_length_66931_cov_376.684775 maxbin.001
NODE_87_length_65653_cov_460.202156 maxbin.001
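To make the conversion concrete, here is a minimal sketch of the same idea: for every FASTA file in a folder, emit one "scaffold ID, tab, bin name" line per header, with the bin name taken from the file name minus its extension. This is an illustrative reimplementation under that assumption, not the script shipped with DAS Tool, and the function name fasta_to_scaffolds2bin is hypothetical.

```shell
#!/bin/sh
# fasta_to_scaffolds2bin FOLDER [EXT]: print "scaffoldID<TAB>binName" for
# every header in FOLDER/*.EXT (EXT defaults to fasta).
# (Illustrative sketch; not the Fasta_to_Scaffolds2Bin.sh shipped with DAS Tool.)
fasta_to_scaffolds2bin() (
  folder=$1
  ext=${2:-fasta}
  for f in "$folder"/*."$ext"; do
    [ -e "$f" ] || continue            # the glob matched nothing
    bin=$(basename "$f" ".$ext")       # maxbin.001.fasta -> maxbin.001
    awk -v bin="$bin" '/^>/ { printf "%s\t%s\n", substr($1, 2), bin }' "$f"
  done
)

# Demo with two toy bins.
bins_dir=$(mktemp -d)
printf '>NODE_1\nACGT\n>NODE_2\nTTGA\n' > "$bins_dir/maxbin.001.fasta"
printf '>NODE_3\nGGCC\n' > "$bins_dir/maxbin.002.fasta"
fasta_to_scaffolds2bin "$bins_dir" fasta
rm -rf "$bins_dir"
```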
The DASTool_output/ directory must be created manually; otherwise no output is written when the run finishes.
mv: cannot stat ‘DASTool_output/_proteins.faa.scg’: No such file or directory
mv: cannot stat ‘DASTool_output/_proteins.faa.scg’: No such file or directory
rm: cannot remove ‘DASTool_output/_proteins.faa.findSCG.b6’: No such file or directory
rm: cannot remove ‘DASTool_output/_proteins.faa.scg.candidates.faa’: No such file or directory
rm: cannot remove ‘DASTool_output/_proteins.faa.all.b6’: No such file or directory
The run succeeded after adding --search_engine diamond.
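Creating the output location up front avoids the missing-directory problem noted above. The sketch below uses a hypothetical helper, ensure_output_dir, that takes the value intended for -o and creates the directory part of it. It handles both a prefix that is itself a directory (such as DASTool_output/) and a prefix with a file-name stem (such as sample_output/DASToolRun1).

```shell
#!/bin/sh
# ensure_output_dir PREFIX: create the directory part of an output prefix.
# (Hypothetical helper for illustration; not part of DAS Tool.)
ensure_output_dir() {
  case $1 in
    */) mkdir -p "$1" ;;                # prefix is a directory, e.g. DASTool_output/
    *)  mkdir -p "$(dirname "$1")" ;;   # prefix like sample_output/DASToolRun1
  esac
}

# Demo inside a scratch directory so nothing is left behind.
scratch=$(mktemp -d)
ensure_output_dir "$scratch/DASTool_output/"
ensure_output_dir "$scratch/sample_output/DASToolRun1"
ls "$scratch"
rm -rf "$scratch"
```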
DAS_Tool -i MetaBat.scaffolds2bin.tsv,MaxBin.scaffolds2bin.tsv,CONCOCT.scaffolds2bin.tsv -l MetaBat,MaxBin,CONCOCT -c …/scaffold.fa -o DASTool_output/
srun -p small -n 4 --pty /bin/bash DAS_Tool -i MetaBat.scaffolds2bin.tsv,MaxBin.scaffolds2bin.tsv,CONCOCT.scaffolds2bin.tsv -l MetaBat,MaxBin,CONCOCT -c …/scaffold.fa -o DASTool_output/
checkM MaxBin scaffold_gene.faa
CONCOCT MetaBat scaffold_gene.gff
DAS_Tool scaffold.bam scaffold_gene.gtf
Mariana_TY42_1_paired.fq.gz scaffold.bam.bai scaffold.res
Mariana_TY42_1_unpaired.fq.gz scaffold.depth scaffold.res.summary
Mariana_TY42_2_paired.fq.gz scaffold.fa
Mariana_TY42_2_unpaired.fq.gz scaffold_gene_count.fa
[1] DAS Tool for Genome Reconstruction from Metagenomes. https://www.baidu.com/link?url=JbN0z_QhZbcz05SXOmXghq4KtVaCf00Tbp6YBX3qm3O6AB-yyFw2gN9XISe880jE3sylTvZ4mTI3k-XvDwzTg9D8mefZI0koVLxEVn_M6gk_jaRX6x8BXgfeRqsWaQmH&wd=&eqid=f554c6c4000ce067000000065eca2021
[2] Christian M. K. Sieber, Alexander J. Probst, Allison Sharrar, Brian C. Thomas, Matthias Hess, Susannah G. Tringe & Jillian F. Banfield (2018). Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nature Microbiology. https://doi.org/10.1038/s41564-018-0171-1