Bacterial Genome Analysis Software: Bactopia

I. Introduction:


1. Publication:

Petit RA III, Read TD. Bactopia: a flexible pipeline for complete analysis of bacterial genomes. mSystems 5 (2020). https://doi.org/10.1128/mSystems.00190-20

2. Project page:

https://github.com/bactopia/bactopia

3. Workflow:


[Figure: Bactopia analysis workflow]

4. Main features

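In my view, its biggest selling point is that it is foolproof and does everything in one go. Previously, an analysis usually meant chaining several tools by hand, finishing one before starting the next, for example FastQC → Trimmomatic → Unicycler (SPAdes) → Prokka → BLAST against a custom database. Worse, you often had to write small scripts just to convert between file formats. All in all it was tedious, and it was hard to publish good papers that way (a lesson learned the hard way).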

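Once configured, Bactopia runs the whole analysis in a single step. Exciting, right? Summary reports, extracting 16S sequences to build a phylogenetic tree, taxonomic classification, finer-grained ANI-based classification (species/subspecies?), pan-genome analysis and more are all handled in one run. I wonder whether companies currently building their own pipelines are excited to see this.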

Software list for version 1.4, as given in the paper

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Tool    Version    Description

AMRFinder    3.6.7    Finds acquired antimicrobial resistance genes and some point mutations in protein or assembled nucleotide sequences

Aragorn    1.2.38    Finds transfer RNA (tRNA) features 

Ariba    2.14.4    Antimicrobial resistance identification by assembly

ART    2016.06.05    A set of simulation tools to generate synthetic next-generation sequencing reads 

assembly-scan    0.3.0    Generates basic stats for an assembly

Barrnap    0.9    Bacterial ribosomal RNA predictor

BBMap    38.76    A suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data

BCFtools    1.9    Utilities for variant calling and manipulating VCFs and BCFs

Bedtools    2.29.2    A powerful tool set for genome arithmetic

BioPython    1.76    Tools for biological computation written in Python 

BLAST    2.9.0    Basic local alignment search tool

Bowtie2    2.4.1    A fast and sensitive gapped-read aligner

BWA    0.7.17    Burrows-Wheeler Aligner for short-read alignment

CD-HIT    4.8.1    Accelerated clustering of next-generation sequencing data

CheckM    1.1.2    Assesses the quality of microbial genomes recovered from isolates, single cells, and metagenomes

ClonalFrameML    1.12    Efficient inference of recombination in whole bacterial genomes

DiagrammeR    1.0.0    Graph and network visualization using tabular data in R (https://github.com/rich-iannone/DiagrammeR)

DIAMOND    0.9.35    Accelerated BLAST-compatible local sequence aligner (https://github.com/bbuchfink/diamond)

eggNOG-Mapper    2.0.1    Fast genome-wide functional annotation through orthology assignment

EMIRGE    0.61.1    Reconstructs full-length ribosomal genes from short-read sequencing data

FastANI    1.3    Fast whole-genome similarity (ANI) estimation 

FastTree2    2.1.10    Approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences

fastq-dl    1.0.3    Downloads FASTQ files from SRA or ENA repositories

FastQC    0.11.9    A quality control analysis tool for high throughput sequencing data

fastq-scan    0.4.3    Outputs FASTQ summary statistics in JSON format

FLASH    1.2.11    A fast and accurate tool to merge paired-end reads

freebayes    1.3.2    Bayesian haplotype-based genetic polymorphism discovery and genotyping

GNU Parallel    20200122    A shell tool for executing jobs in parallel

GTDB-tk    1.0.2    A tool kit for assigning objective taxonomic classifications to bacterial and archaeal genomes

HMMER    3.3    Biosequence analysis using profile hidden Markov models

Infernal    1.1.2    Searches DNA sequence databases for RNA structure and sequence similarities

IQ-TREE    1.6.12    Efficient phylogenomic software by maximum likelihood

ISMapper    2.0    Insertion sequence mapping software

Lighter    1.1.2    Fast and memory-efficient sequencing error corrector

MAFFT    7.455    Multiple alignment program for amino acid or nucleotide sequences

Mash    2.2.2    Fast genome and metagenome distance estimation using MinHash

Mashtree    1.1.2    Creates a tree using Mash distances

maskrc-svg    0.5    Masks recombination as detected by ClonalFrameML or Gubbins and draws an SVG

McCortex    1.0    De novo genome assembly and multisample variant calling

MEGAHIT    1.2.9    Ultra-fast and memory-efficient (meta-)genome assembler

MinCED    0.4.2    Mining CRISPRs in environmental data sets

Minimap2    2.17    A versatile pairwise aligner for genomic and spliced nucleotide sequences

ncbi-genome-download    0.2.12    Scripts to download genomes from the NCBI FTP servers

Nextflow    19.10.0    A DSL for data-driven computational pipelines

phyloFlash    3.3b3    Rapidly reconstructs SSU rRNAs and explores the phylogenetic composition of an Illumina (metagenomic) data set

Pigz    2.3.4    A parallel implementation of gzip for modern multiprocessor, multicore machines

Pilon    1.23    An automated genome assembly improvement and variant detection tool

PIRATE    1.0.3    A toolbox for pan-genome analysis and threshold evaluation

pplacer    1.1.alpha19    Phylogenetic placement and downstream analysis

Prodigal    2.6.3    Fast, reliable protein-coding gene prediction for prokaryotic genomes

Prokka    1.14.5    Rapid prokaryotic genome annotation

QUAST    5.0.2    Quality assessment tool for genome assemblies

Racon    1.4.13    Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads

Roary    3.13.0    Rapid large-scale prokaryote pan genome analysis

samclip    0.2    Filter SAM file for soft and hard clipped alignments

SAMtools    1.9    Tools for manipulating next-generation sequencing data

Seqtk    1.3    A fast and lightweight tool for processing sequences in the FASTA or FASTQ format

Shovill    1.0.9se    Faster assembly of Illumina reads

SKESA    2.3.0    Strategic k-mer extension for scrupulous assemblies

Snippy    4.4.5    Rapid haploid variant calling and core genome alignment

SnpEff    4.3.1    Genomic variant annotations and functional effect prediction toolbox

snp-dists    0.6.3    Pairwise SNP distance matrix from a FASTA sequence alignment 

SNP-sites    2.5.1    Rapidly extracts SNPs from a multi-FASTA alignment

Sourmash    3.2.0    Compute and compare MinHash signatures for DNA data sets 

SPAdes    3.13.0    An assembly toolkit containing various assembly pipelines

Trimmomatic    0.39    A flexible read trimming tool for Illumina NGS data

Unicycler    0.4.8    Hybrid assembly pipeline for bacterial genomes

vcf-annotator    0.5    Add biological annotations to variants in a VCF file 

Vcflib    1.0.0rc3    A simple C++ library for parsing and manipulating VCF files

Velvet    1.2.10    Short read de novo assembler using de Bruijn graphs

VSEARCH    2.14.1    Versatile open-source tool for metagenomics

vt    2015.11.10    A tool set for short-variant discovery in genetic sequence data

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

5. Usage

    5.1 Installation

    conda create -y -n bactopia -c conda-forge -c bioconda bactopia

    conda activate bactopia

    bactopia datasets datasets/ # downloads into the specified directory 'datasets/': CARD, VFDB (core), RefSeq Mash sketch, GenBank Sourmash signatures, PLSDB Mash sketch & BLAST database
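Before starting a long run, it can save time to confirm that the environment is active and the datasets were actually downloaded. The following is only a sketch and not part of Bactopia; the function names `require_cmd` and `require_nonempty_dir` are hypothetical, and it assumes the datasets live in the `datasets/` directory used above.

```shell
#!/usr/bin/env bash
# Pre-flight checks before a Bactopia run (a sketch, not a Bactopia feature).

# Fail with a message if a command is not on PATH.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "error: '$1' not found on PATH (is the bactopia env activated?)" >&2
        return 1
    fi
}

# Fail if the directory is missing or contains no entries at all.
require_nonempty_dir() {
    if [[ ! -d "$1" ]] || [[ -z "$(ls -A "$1")" ]]; then
        echo "error: '$1' is missing or empty; run 'bactopia datasets $1' first" >&2
        return 1
    fi
}
```

Typical use is `require_cmd bactopia && require_nonempty_dir datasets/` right before the `bactopia` command. Note that this only checks that the directory is non-empty, not that every database finished downloading.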

    5.2 Running Bactopia

    Paired-end reads

bactopia --R1 "${SAMPLE}_R1.fastq.gz" --R2 "${SAMPLE}_R2.fastq.gz" --sample "${SAMPLE}" \
        --datasets datasets/ --outdir "${OUTDIR}"
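The paired-end command expects `${SAMPLE}` and `${OUTDIR}` to be set already. As a minimal sketch (not part of Bactopia), assuming reads are named `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` and sit in a directory `reads/` (a hypothetical name, overridable via `READS_DIR`), a loop can derive `SAMPLE` from each R1 file and launch one run per sample:

```shell
#!/usr/bin/env bash
# Sketch: one Bactopia run per paired-end sample.
# Assumption: files are named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.
set -euo pipefail
shopt -s nullglob   # no matching files -> zero loop iterations, not a literal pattern

READS_DIR="${READS_DIR:-reads}"          # hypothetical default
OUTDIR="${OUTDIR:-bactopia-out}"         # hypothetical default

# Strip the directory and the "_R1.fastq.gz" suffix to get the sample name.
sample_name() {
    local r1
    r1=$(basename "$1")
    printf '%s\n' "${r1%_R1.fastq.gz}"
}

for R1 in "${READS_DIR}"/*_R1.fastq.gz; do
    SAMPLE=$(sample_name "$R1")
    R2="${READS_DIR}/${SAMPLE}_R2.fastq.gz"
    if [[ ! -f "$R2" ]]; then
        echo "skipping ${SAMPLE}: no matching R2 file" >&2
        continue
    fi
    bactopia --R1 "$R1" --R2 "$R2" --sample "$SAMPLE" \
        --datasets datasets/ --outdir "$OUTDIR"
done
```

Each iteration starts a separate Nextflow run; for many samples, the `--fastqs` route under "Multiple samples" processes everything in a single run.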

    Single-end reads

bactopia --SE ${SAMPLE}.fastq.gz --sample ${SAMPLE} --datasets datasets/ --outdir ${OUTDIR}

    Multiple samples

bactopia prepare directory-of-fastqs/ > fastqs.txt

bactopia --fastqs fastqs.txt --datasets datasets --outdir ${OUTDIR}
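`bactopia prepare` builds `fastqs.txt` from file names, so it is worth a quick check before a long run. The sketch below (the function `check_fofn` is hypothetical) assumes `fastqs.txt` is tab-delimited with one header line, and that read files end in `.fastq.gz`; it reports any listed read file that does not exist.

```shell
#!/usr/bin/env bash
# Sketch: verify that every FASTQ path listed in a bactopia FOFN exists.
# Assumptions: tab-delimited, first line is a header, reads end in .fastq.gz.

check_fofn() {
    local fofn="$1" missing=0 f
    local -a fields
    # Skip the header, then test every field that looks like a FASTQ path.
    while IFS=$'\t' read -r -a fields; do
        for f in "${fields[@]}"; do
            if [[ "$f" == *.fastq.gz && ! -f "$f" ]]; then
                echo "missing: $f" >&2
                missing=$((missing + 1))
            fi
        done
    done < <(tail -n +2 "$fofn")
    # Exit status 0 if nothing is missing, 1 otherwise.
    return $(( missing > 0 ))
}
```

For example: `check_fofn fastqs.txt && bactopia --fastqs fastqs.txt --datasets datasets --outdir ${OUTDIR}`.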

    Public data from ENA (genuinely great)

bactopia --accessions ena-accessions.txt \
        --datasets datasets/ \
        --species "Staphylococcus aureus" \
        --coverage 100 \
        --genome_size median \
        --cpus 2 \
        --outdir ena-multiple-samples
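The file passed to `--accessions` lists one accession per line. A small sketch for building it from a list of IDs follows; the function `write_accessions` is hypothetical, and the accepted format (SRA/ENA experiment accessions, i.e. `SRX`/`ERX`/`DRX` followed by digits) is an assumption based on Bactopia 1.x, so check `bactopia --help` for your version.

```shell
#!/usr/bin/env bash
# Sketch: write the one-accession-per-line file for --accessions.
# Assumption: Bactopia 1.x expects experiment accessions (SRX/ERX/DRX + digits).

# Usage: write_accessions OUTFILE ACC [ACC ...]
# Keeps well-formed accessions, removes duplicates, warns about the rest.
write_accessions() {
    local out="$1"; shift
    local acc
    for acc in "$@"; do
        if [[ "$acc" =~ ^[SED]RX[0-9]+$ ]]; then
            printf '%s\n' "$acc"
        else
            echo "warning: skipping '$acc' (not an experiment accession)" >&2
        fi
    done | sort -u > "$out"
}
```

Calling `write_accessions ena-accessions.txt` with your own accessions produces the input for the command above. Sorting with `sort -u` also drops accidental duplicates, which would otherwise be downloaded twice.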
