Comparison of long-read sequencing technologies in the hybrid assembly of complex bacterial genomes

Abstract

Illumina sequencing allows rapid, cheap and accurate whole-genome analysis of bacteria, but short reads (<300 bp) do not usually enable complete genome assembly. Long-read sequencing greatly assists with resolving complex bacterial genomes, particularly when combined with short-read Illumina data (hybrid assembly). However, it is not clear how different long-read sequencing methods affect hybrid assembly accuracy. A largely automated assembly process, one that avoids multiple bespoke filtering and data manipulation steps, is also crucial for high-throughput reconstruction of complete bacterial genomes. In this study, we compared hybrid assemblies for 20 bacterial isolates, including two reference strains, using Illumina sequencing and long reads from either the Oxford Nanopore Technologies (ONT) or the Pacific Biosciences (PacBio) SMRT sequencing platform. We chose isolates from the family Enterobacteriaceae, as these frequently have highly plastic, repetitive genetic structures, and complete genome reconstruction for these species is relevant for a precise understanding of the epidemiology of antimicrobial resistance. We assembled genomes de novo using the hybrid assembler Unicycler and compared different read processing strategies; we also compared these hybrid assemblies with long-read-only assembly using Flye followed by short-read polishing with Pilon. Hybrid assembly with either PacBio or ONT reads facilitated high-quality genome reconstruction, and was superior to the long-read assembly and polishing approach evaluated with respect to accuracy and completeness. Combining ONT and Illumina reads fully resolved most genomes without additional manual steps, and at a lower consumables cost per isolate in our setting. Automated hybrid assembly is a powerful tool for complete and accurate bacterial genome assembly.

Keywords: hybrid assembly, bacterial genomics, long-read sequencing, Enterobacteriaceae, plasmid assembly
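
The two strategies compared in the abstract (hybrid assembly with Unicycler versus long-read-only assembly with Flye followed by Pilon polishing) can be sketched as command lines. The following is a minimal illustration, not the authors' actual pipeline: file names, thread counts and the helper function names are hypothetical, the abstract does not state which options were used, and the sketch assumes that `unicycler`, `flye` and a `pilon` wrapper script (as installed by some package managers; otherwise `java -jar pilon.jar`) are on the PATH. Pilon also needs the short reads mapped to the draft assembly as a sorted, indexed BAM file, which is assumed to be produced separately.

```python
import shlex


def unicycler_cmd(r1, r2, long_reads, out_dir, threads=8):
    """Hybrid assembly: paired Illumina reads plus ONT or PacBio long reads."""
    return ["unicycler", "-1", str(r1), "-2", str(r2),
            "-l", str(long_reads), "-o", str(out_dir), "-t", str(threads)]


def flye_cmd(long_reads, out_dir, platform="ont", threads=8):
    """Long-read-only assembly; the input flag depends on the platform."""
    flag = {"ont": "--nano-raw", "pacbio": "--pacbio-raw"}[platform]
    return ["flye", flag, str(long_reads),
            "--out-dir", str(out_dir), "--threads", str(threads)]


def pilon_cmd(assembly, short_read_bam, out_dir, prefix="polished"):
    """Short-read polishing of a draft assembly.

    Assumes short_read_bam holds Illumina reads already mapped to
    `assembly`, sorted and indexed (mapping step not shown here).
    """
    return ["pilon", "--genome", str(assembly), "--frags", str(short_read_bam),
            "--outdir", str(out_dir), "--output", prefix]


if __name__ == "__main__":
    # Hypothetical file names for one isolate.
    commands = [
        unicycler_cmd("iso1_R1.fastq.gz", "iso1_R2.fastq.gz",
                      "iso1_ont.fastq.gz", "iso1_unicycler"),
        flye_cmd("iso1_ont.fastq.gz", "iso1_flye", platform="ont"),
        pilon_cmd("iso1_flye/assembly.fasta", "iso1_illumina.sorted.bam",
                  "iso1_pilon"),
    ]
    for cmd in commands:
        print(shlex.join(cmd))  # print only; pass to subprocess.run to execute
```

The functions only build argument lists, so they can be inspected or logged before anything is run; looping over isolates and passing each list to `subprocess.run(cmd, check=True)` would be one way to automate the workflow across many samples.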
