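Canu is the successor to the Celera Assembler and can assemble long reads produced by both PacBio and Oxford Nanopore sequencers.
A Canu run has three stages: correction, trimming, and assembly.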
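As a rough sketch of how the three stages chain together, the helper below prints (but does not run) the Canu command for a given stage. The function name `canu_stage_cmd` and its argument order are hypothetical, chosen for illustration only. The file names follow the `correctedReads` and `trimmedReads` outputs used by the commands later in this post, and any tuning options are left out.

```shell
# Hypothetical helper: print (not run) the Canu command for one stage.
# Usage: canu_stage_cmd STAGE TECH PREFIX DIR GENOME_SIZE RAW_READS
#   STAGE is correct, trim, or assemble; TECH is pacbio or nanopore.
canu_stage_cmd() {
    case "$1" in
        # Correction reads the raw input file.
        correct)  input="-$2-raw $6" ;;
        # Trimming reads the output of the correction stage.
        trim)     input="-$2-corrected $4/$3.correctedReads.fasta.gz" ;;
        # Assembly reads the output of the trimming stage.
        assemble) input="-$2-corrected $4/$3.trimmedReads.fasta.gz" ;;
        *)        echo "unknown stage: $1" >&2; return 1 ;;
    esac
    echo "canu -$1 -p $3 -d $4 genomeSize=$5 $input"
}
```

For example, `canu_stage_cmd trim pacbio ecoli ecoli-pacbio 4.8m pacbio.fastq` prints the trimming command, whose input is the file written by the correction stage. The `4.8m` value here assumes an E. coli sized genome.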
# Download the PacBio test data
wget -c http://gembox.cbcb.umd.edu/mhap/raw/ecoli_p6_25x.filtered.fastq -O pacbio.fastq
# Download the Nanopore test data
wget -c http://nanopore.s3.climb.ac.uk/MAP006-PCR-1_2D_pass.fasta -O oxford.fasta
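Before starting an assembly it can help to confirm that the downloads are complete and to get a rough idea of the coverage. The sketch below is one way to do that with `awk`. The function name `read_stats` is hypothetical, and the script assumes FASTQ records occupy exactly four lines each (no wrapped sequences) and that the genome size is given in bases.

```shell
# Hypothetical helper: count reads and bases in a FASTA or FASTQ file and
# estimate coverage for a genome size given in bases.
# Usage: read_stats FILE GENOME_SIZE_IN_BASES
read_stats() {
    awk -v g="$2" '
        # The first character of the file decides the format: "@" means FASTQ.
        NR == 1 { fastq = (substr($0, 1, 1) == "@") }
        # FASTQ: the sequence is the second line of every four-line record.
        fastq {
            if (NR % 4 == 2) { reads++; bases += length($0) }
            next
        }
        # FASTA: header lines start with ">", every other line is sequence.
        /^>/ { reads++; next }
        { bases += length($0) }
        END { printf "reads=%d bases=%d coverage=%.1f\n", reads, bases, bases / g }
    ' "$1"
}
```

A call such as `read_stats pacbio.fastq 4800000` reports the read count, total bases, and estimated coverage, here assuming a genome of about 4.8 Mb.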
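To keep the test simple, Canu is not compiled from source here; it is run directly from a Docker container instead.
Pulling the Canu image from quay.io requires an account first. Registration is simple and only involves filling in a form.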
sudo docker login quay.io
sudo docker pull quay.io/biocontainers/canu:1.7.1--pl526h470a237_0
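From the directory that contains the data files downloaded above, run the following command to start a container. The -w /canu option makes the mounted directory the working directory, so the relative paths used in the later commands resolve.
sudo docker run -it --rm -v "$(pwd)":/canu -w /canu quay.io/biocontainers/canu:1.7.1--pl526h470a237_0 bash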
# PacBio: correct the raw reads
canu -correct \
-p ecoli -d ecoli-pacbio \
corThreads=32 corOutCoverage=120 corMinCoverage=2 \
gnuplotTested=true \
genomeSize=4.8m minReadLength=2000 minOverlapLength=500 \
maxMemory=500g maxThreads=32 \
ovsMemory=1-32G ovsThreads=16 ovsConcurrency=16 \
ovbMemory=1g ovbConcurrency=16 oeaThreads=16 \
-pacbio-raw pacbio.fastq
# PacBio: trim the corrected reads
canu -trim \
-p ecoli -d ecoli-pacbio \
gnuplotTested=true \
genomeSize=4.8m minReadLength=2000 minOverlapLength=500 \
maxMemory=500g maxThreads=32 \
ovsMemory=1-32G ovsThreads=16 ovsConcurrency=16 \
ovbMemory=1g ovbConcurrency=16 oeaThreads=16 \
-pacbio-corrected ecoli-pacbio/ecoli.correctedReads.fasta.gz
# PacBio: assemble the trimmed reads
canu -assemble \
-p ecoli -d ecoli-pacbio \
gnuplotTested=true \
genomeSize=4.8m minReadLength=2000 minOverlapLength=500 \
maxMemory=500g maxThreads=32 \
ovsMemory=1-32G ovsThreads=16 ovsConcurrency=16 \
ovbMemory=1g ovbConcurrency=16 oeaThreads=16 \
correctedErrorRate=0.050 \
-pacbio-corrected ecoli-pacbio/ecoli.trimmedReads.fasta.gz
# Nanopore: correct the raw reads
canu -correct \
-p ecoli -d ecoli-oxford \
corThreads=32 corOutCoverage=120 corMinCoverage=2 \
gnuplotTested=true \
genomeSize=4.8m minReadLength=2000 minOverlapLength=500 \
maxMemory=500g maxThreads=32 \
ovsMemory=1-32G ovsThreads=16 ovsConcurrency=16 \
ovbMemory=1g ovbConcurrency=16 oeaThreads=16 \
-nanopore-raw oxford.fasta
# Nanopore: trim the corrected reads
canu -trim \
-p ecoli -d ecoli-oxford \
gnuplotTested=true \
genomeSize=4.8m minReadLength=2000 minOverlapLength=500 \
maxMemory=500g maxThreads=32 \
ovsMemory=1-32G ovsThreads=16 ovsConcurrency=16 \
ovbMemory=1g ovbConcurrency=16 oeaThreads=16 \
-nanopore-corrected ecoli-oxford/ecoli.correctedReads.fasta.gz
# Nanopore: assemble the trimmed reads
canu -assemble \
-p ecoli -d ecoli-oxford \
gnuplotTested=true \
genomeSize=4.8m minReadLength=2000 minOverlapLength=500 \
maxMemory=500g maxThreads=32 \
ovsMemory=1-32G ovsThreads=16 ovsConcurrency=16 \
ovbMemory=1g ovbConcurrency=16 oeaThreads=16 \
correctedErrorRate=0.050 \
-nanopore-corrected ecoli-oxford/ecoli.trimmedReads.fasta.gz
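Once an assembly stage finishes, a quick look at the contig count and N50 gives a first impression of the result. The sketch below computes these with `awk` and `sort`. The function name `contig_stats` is hypothetical, and the example path in the note after the block assumes Canu wrote its contigs to a file named after the `-p` prefix inside the `-d` directory; check the output directory for the actual file name.

```shell
# Hypothetical helper: report contig count, total length, and N50 of a FASTA file.
# Usage: contig_stats FASTA
contig_stats() {
    # Step 1: emit one length per sequence.
    awk '
        /^>/ { if (len) print len; len = 0; next }
        { len += length($0) }
        END { if (len) print len }
    ' "$1" |
    # Step 2: sort lengths from longest to shortest.
    sort -rn |
    # Step 3: walk down the sorted lengths until half the total is covered.
    awk '
        { lens[NR] = $1; total += $1 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run >= half) { n50 = lens[i]; break }
            }
            printf "contigs=%d total=%d n50=%d\n", NR, total, n50
        }
    '
}
```

For example, `contig_stats ecoli-pacbio/ecoli.contigs.fasta` summarizes the PacBio assembly. For a bacterial genome, a total length close to the expected genome size in very few contigs is a good sign.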
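To run Canu on a cluster through a grid scheduler (the options below target Grid Engine), create a spec file, for example myspec.txt, with the following contents: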
useGrid=1
gridOptions=-S /bin/bash -q all.q -l mem_free=32g
gridEngineThreadsOption=-pe mpi THREADS
gridEngineMemoryOption=-l mem_total=MEMORY
ovbMemory=8g
maxMemory=64g
maxThreads=32
ovsMemory=8-64g
ovsThreads=4
oeaThreads=4
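A typo in the spec file is easy to miss, so a quick format check before submitting jobs can save a failed run. The sketch below flags any line that is not blank, a `#` comment, or of the form `option=value`. The function name `check_spec` is hypothetical, and the accepted format is an assumption about how Canu reads spec files rather than a statement of its exact parsing rules; the check does not verify that an option name is one Canu knows.

```shell
# Hypothetical helper: report spec file lines that do not look like option=value.
# Usage: check_spec FILE   (exit status is 1 if any suspicious line is found)
check_spec() {
    awk '
        /^[ \t]*#/ { next }                                # comment line
        /^[ \t]*$/ { next }                                # blank line
        /^[ \t]*[A-Za-z][A-Za-z0-9]*[ \t]*=/ { next }      # option=value
        { printf "line %d: %s\n", NR, $0; bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Running `check_spec myspec.txt` prints nothing and exits with status 0 when every line passes; otherwise it lists the offending lines with their line numbers.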
# Test with the PacBio data
canu -s myspec.txt -p ecoli -d ecoli-pacbio gnuplotTested=true genomeSize=4.8m -pacbio-raw pacbio.fastq
# Test with the Oxford Nanopore data
canu -s myspec.txt -p ecoli -d ecoli-oxford gnuplotTested=true genomeSize=4.8m -nanopore-raw oxford.fasta