Assembling a genome with NextDenovo

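Background

NextDenovo is a genome assembler for long-read (third-generation) sequencing data, developed by Nextomics in Wuhan (which these days probably goes by GrandOmics).
Back when I was doing my master's, I actually spent several months at Nextomics for a collaboration project.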

Resources

GitHub: https://github.com/Nextomics/NextDenovo
Official documentation: https://nextdenovo.readthedocs.io/en/latest/
Zhougeng's notes: https://xuzhougeng.top/archives/Assembly-nanopore-with-NextDenovo

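Installation

Installation is painless: the software itself needs no installing, since a precompiled binary release can be downloaded and used directly. The only thing you have to install is one Python dependency, Paralleltask.

# Download the software
wget https://github.com/Nextomics/NextDenovo/releases/download/v2.5.0/NextDenovo.tgz
# Install the dependency
python -m pip install Paralleltask
# Extract the archive
tar -zxvf NextDenovo.tgz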

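Testing the install

After extracting the archive you will find a test_data folder containing an example config, test_data/run.cfg. You can run it directly to check that the software works on your server. This step is optional, of course.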

cd NextDenovo
./nextDenovo test_data/run.cfg

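Running your own project

Generating the input file

Write the absolute paths of your sequencing data files into a file named input.fofn, one path per line.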

ls /path/to/01RawData/PacBio/*hifi_reads.fastq.gz > input.fofn
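
If you would rather not depend on remembering to type an absolute path, the same step can be wrapped in a small helper. This is only a sketch: make_fofn is a hypothetical function name of my own, not part of NextDenovo, and it assumes all read files sit directly inside one directory.

```shell
# Hypothetical helper (not part of NextDenovo): list the files in a directory
# that match a glob pattern, as absolute paths, one per line.
make_fofn() {
    local data_dir="$1"
    local pattern="$2"
    local out="$3"
    # Refuse to continue if the data directory does not exist
    [ -d "$data_dir" ] || return 1
    # Resolve the directory to an absolute path so every listed path is absolute
    find "$(cd "$data_dir" && pwd)" -maxdepth 1 -type f -name "$pattern" | sort > "$out"
    # Return non-zero if nothing matched, so an empty fofn is caught early
    [ -s "$out" ]
}

# Usage (same placeholder path as above):
# make_fofn /path/to/01RawData/PacBio '*hifi_reads.fastq.gz' input.fofn
```

The final test on the output file means a typo in the pattern shows up immediately rather than after the assembler has started.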

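Writing the config file

Copy the template cfg file that ships with NextDenovo and name it test.run.cfg:

cp ../NextDenovo/doc/run.cfg test.run.cfg

Then edit the parameters to match your own project. My test.run.cfg looks like this: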

[General]
job_type = local # local, slurm, sge, pbs, lsf
job_prefix = test_nextDenovo
task = all # all, correct, assemble
rewrite = yes # yes/no
deltmp = yes 
parallel_jobs = 24 # number of tasks used to run in parallel
input_type = raw # raw, corrected
read_type = hifi # clr, ont, hifi
input_fofn = input.fofn
workdir = 01_rundir

[correct_option]
read_cutoff = 1k
genome_size = x.xg # estimated genome size
sort_options = -m 20g -t 15
minimap2_options_raw = -t 8
pa_correction = 3 # number of corrected tasks used to run in parallel, each corrected task requires ~TOTAL_INPUT_BASES/4 bytes of memory usage.
correction_options = -p 15

[assemble_option]
minimap2_options_cns = -t 8 
nextgraph_options = -a 1
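
The comment on pa_correction gives a rule of thumb for memory: each correction task needs roughly TOTAL_INPUT_BASES/4 bytes. A quick back-of-the-envelope check before picking a value might look like the sketch below. The function name and the example numbers are made up for illustration, and the result is only as good as that rule of thumb.

```shell
# Rough memory estimate (in bytes) for the correction step, assuming the
# rule of thumb from the cfg comment: ~TOTAL_INPUT_BASES/4 bytes per task.
estimate_correction_mem_bytes() {
    local total_bases="$1"
    local pa_correction="$2"
    echo $(( total_bases / 4 * pa_correction ))
}

# Made-up example: 120 Gb of input reads with pa_correction = 3
estimate_correction_mem_bytes 120000000000 3   # prints 90000000000, i.e. about 90 GB
```

If the estimate exceeds the memory on your machine, lowering pa_correction is the obvious knob to turn.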

For more details on the parameters, see the official documentation:

https://nextdenovo.readthedocs.io/en/latest/OPTION.html
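
Before launching a long job it can be worth checking that the config points at what you think it does. The sketch below reads a single key from a cfg file laid out like the one above (key = value, with optional trailing # comments). cfg_get is a hypothetical helper of mine, not something NextDenovo provides, and it ignores section names, so it assumes the key is unique in the file.

```shell
# Hypothetical helper: print the value of one key from a NextDenovo-style cfg.
# Section headers are skipped and trailing "# ..." comments are stripped.
cfg_get() {
    local key="$1"
    local cfg="$2"
    awk -v key="$key" '
        /^[ \t]*\[/ { next }                  # skip [General] and other headers
        {
            line = $0
            sub(/#.*/, "", line)              # drop trailing comments
            n = index(line, "=")
            if (n == 0) next                  # not a key = value line
            k = substr(line, 1, n - 1)
            v = substr(line, n + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)    # trim whitespace around the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)    # trim whitespace around the value
            if (k == key) { print v; exit }
        }
    ' "$cfg"
}

# Usage: make sure the read list named in the config exists and is not empty
# fofn=$(cfg_get input_fofn test.run.cfg)
# [ -s "$fofn" ] || echo "input_fofn '$fofn' is missing or empty" >&2
```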

Then just run it:

nohup nextDenovo test.run.cfg &
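
With a bare nohup the output ends up in nohup.out in whatever directory you launched from. If you prefer a named log file and want the process ID kept around so you can check on the job later, a small wrapper along these lines would do. run_in_background and BG_PID are names I made up for this sketch.

```shell
# Hypothetical wrapper: run a command under nohup in the background,
# send stdout and stderr to a log file, and remember the PID in BG_PID.
run_in_background() {
    local log_file="$1"
    shift
    nohup "$@" > "$log_file" 2>&1 &
    BG_PID=$!
}

# Usage (assumes nextDenovo is on your PATH):
# run_in_background nextdenovo.log nextDenovo test.run.cfg
# echo "started as PID $BG_PID"
# tail -f nextdenovo.log
```

The PID is stored in a variable rather than echoed so that the job stays a child of your current shell and can still be waited on.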

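My two cents

In my current projects assembling genomes from PacBio HiFi data, NextDenovo's results are second only to hifiasm.
On GitHub the NextDenovo team currently has HiFi struck out. I am not sure whether that means they do not recommend NextDenovo for HiFi data or something else.
At the time of writing the NextDenovo paper has not been published yet, so if you use it, please cite the GitHub repository:
https://github.com/Nextomics/NextDenovo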