BUSCO Usage Notes

Today I'd like to use BUSCO again to assess the quality of a fungal genome assembly. Like most bioinformatics tools, it runs on Unix-like platforms such as Ubuntu. My computer dual-boots Windows and Ubuntu. I have no Chinese input method on Ubuntu, so this post was finished in English, building on the earlier edition I wrote when I first used the tool.
Unfortunately, I ran into a handful of tough problems this time and they cost me nearly 6 hours; I can hardly imagine how smoothly things went last time. Anyway, the job got done, and I wrote this post to save myself the next time around, although I do not know when that will be.

Installing BUSCO

Download it from https://gitlab.com/ezlab/busco
cd into the busco folder (use the absolute path; a relative path is not allowed)
Install: sudo python setup.py install

(Somehow everything I typed here came out in traditional Chinese characters.)

Install the following three programs. The last two must actually be built and installed, not just downloaded; see each one's README for instructions.

  • NCBI BLAST+ [NB: please see release note 2.0.1 below]
    https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST
    This package contains all the tools needed for the alignments.
    It seems this tool does not need to be installed; I nearly forgot.

  • HMMER (HMMER v3.1b2)
    Download from http://hmmer.org/ and build:
    tar xf hmmer-3.2.1.tar.gz
    cd hmmer-3.2.1
    ./configure
    make

  • Augustus (> 3.2.1) (only used for assessing genomes)
    http://bioinf.uni-greifswald.de/augustus/
    Many libraries are required. I do not know what each one is for; just install them, either from the Ubuntu repositories or from GitHub.
    From Ubuntu: sudo apt-get install xx. When you hit an error, the message usually tells you what is missing; just follow it.
    tar -xzf augustus-3.3.1.tar.gz
    Edit common.mk and uncomment the line ZIPINPUT = true
    sudo apt-get install libboost-iostreams-dev
    sudo apt-get install zlib1g-dev
    sudo apt-get install libgsl-dev
    sudo apt-get install libmysql++-dev
    sudo apt-get install libboost-graph-dev
    sudo apt-get install libsuitesparse-dev liblpsolve55-dev
    sudo apt-get install bamtools libbamtools-dev
    sudo apt-get install libboost-all-dev

From GitHub: git clone the package's repository URL, cd into the cloned directory, and then run sudo make install.
sudo make install libbzip2

sudo apt-get install libbz2-dev
sudo apt-get install liblzma-dev
sudo apt-get install libncurses5-dev
export TOOLDIR=/home/dong/dlm_wd/asmb_assess/sof/TOOLDIR

Installing Augustus requires several additional tools; read its manual carefully and follow the instructions step by step.

Configuring config.ini

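The BUSCO directory contains a config folder. Copy config.ini.default from it, rename the copy to config.ini, open it, and replace every default path such as /home/osboxes/BUSCOVM/augustus/augustus-3.2.2/ with the actual path on your machine, for example /home/larix/下载/augustus/ in my case.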
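
The copy-and-replace step can also be scripted. What follows is only a minimal sketch under assumptions: `rewrite_config` is a hypothetical helper of mine, not something BUSCO provides, and it does nothing smarter than plain text substitution of path prefixes.

```python
def rewrite_config(default_file, target_file, replacements):
    """Copy default_file to target_file, swapping path prefixes on the way.

    `replacements` maps each default prefix to the real prefix on this
    machine. Returns how many occurrences were replaced in total.
    """
    with open(default_file, encoding="utf-8") as handle:
        text = handle.read()
    total = 0
    for old_prefix, new_prefix in replacements.items():
        total += text.count(old_prefix)
        text = text.replace(old_prefix, new_prefix)
    with open(target_file, "w", encoding="utf-8") as handle:
        handle.write(text)
    return total
```

For example, `rewrite_config("config/config.ini.default", "config/config.ini", {"/home/osboxes/BUSCOVM/augustus/augustus-3.2.2/": "/home/larix/下载/augustus/"})` would cover the Augustus entries. A return value of 0 means nothing matched, so it is still worth opening the result and checking it by hand.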

export AUGUSTUS_CONFIG_PATH="/path/to/AUGUSTUS/augustus-3.2.3/config/"
Replace the path here with the actual one; otherwise you get an error.

export AUGUSTUS_CONFIG_PATH="/home/dong/dlm_wd/asmb_assess/sof/augustus/augustus-3.3.1/config/"
export BUSCO_CONFIG_FILE="/home/dong/dlm_wd/asmb_assess/sof/busco/busco-master/config/config.ini"

export PATH="/home/dong/dlm_wd/asmb_assess/sof/augustus/augustus-3.3.1/bin:$PATH"
export AUGUSTUS_CONFIG_PATH="/home/dong/dlm_wd/asmb_assess/sof/augustus/augustus-3.3.1/config/"
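
Since a wrong or missing variable is one of the ways to lose hours here, a small pre-flight check may help. This is a sketch of my own, not part of BUSCO or Augustus: `check_environment` is a hypothetical name, and the only things it looks at are the three variables exported above.

```python
import os
import shutil


def check_environment(env):
    """Return a list of problems found in `env` (a dict such as os.environ)."""
    problems = []
    config_path = env.get("AUGUSTUS_CONFIG_PATH", "")
    if not os.path.isdir(config_path):
        problems.append("AUGUSTUS_CONFIG_PATH is not a directory: %r" % config_path)
    busco_config = env.get("BUSCO_CONFIG_FILE", "")
    if not os.path.isfile(busco_config):
        problems.append("BUSCO_CONFIG_FILE is not a file: %r" % busco_config)
    search_path = env.get("PATH", "")
    if shutil.which("augustus", path=search_path) is None:
        problems.append("augustus was not found on PATH")
    # A literal 'PATH' entry usually means :PATH was typed instead of :$PATH.
    if "PATH" in search_path.split(os.pathsep):
        problems.append("PATH contains the literal entry 'PATH'")
    return problems
```

Calling `check_environment(os.environ)` in the same shell session should return an empty list when everything is set as intended.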

Here is my config.ini:

[tblastn]
# path to tblastn
path = /home/dong/dlm_wd/asmb_assess/sof/blast/ncbi-blast-2.7.1+/bin/


[makeblastdb]
# path to makeblastdb
path = /home/dong/dlm_wd/asmb_assess/sof/blast/ncbi-blast-2.7.1+/bin/


[augustus]
# path to augustus
path = /home/dong/dlm_wd/asmb_assess/sof/augustus/augustus-3.3.1/bin/

[etraining]
# path to augustus etraining
path = /home/dong/dlm_wd/asmb_assess/sof/augustus/augustus-3.3.1/bin/

# path to augustus perl scripts, redeclare it for each new script
[gff2gbSmallDNA.pl]
path = /home/dong/dlm_wd/asmb_assess/sof/augustus/augustus-3.3.1/scripts/
[new_species.pl]
path = /home/dong/dlm_wd/asmb_assess/sof/augustus/augustus-3.3.1/scripts/
[optimize_augustus.pl]
path = /home/dong/dlm_wd/asmb_assess/sof/augustus/augustus-3.3.1/scripts/

[hmmsearch]
# path to HMMsearch executable
path = /home/dong/dlm_wd/asmb_assess/sof/hmmer/hmmer-3.2.1/src/


[Rscript]
# path to Rscript, if you wish to use the plot tool
path = /usr/bin/
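
A leftover default path in this file is easy to miss, so it can be worth checking every entry before a run. The sketch below rests on an assumption drawn from the file above: each section is named after the executable or script that should sit inside that section's `path` directory. `find_missing_tools` is a hypothetical helper, not a BUSCO function.

```python
import configparser
import os


def find_missing_tools(config_file):
    """Return (section, expected_file) pairs that do not exist on disk.

    Assumption: the section name is the file name expected inside the
    directory given by that section's `path` key.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(config_file, encoding="utf-8")
    missing = []
    for section in parser.sections():
        if not parser.has_option(section, "path"):
            continue  # sections without a path entry are not tool sections
        expected = os.path.join(parser.get(section, "path"), section)
        if not os.path.isfile(expected):
            missing.append((section, expected))
    return missing
```

With the file above, an empty result would mean that tblastn, makeblastdb, augustus, etraining, the three Perl scripts, hmmsearch and Rscript were all found where config.ini says they are.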

Preparing your data and the lineage dataset

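Your own data means the assembled FASTA file. Download the matching lineage dataset from http://busco.ezlab.org/. You can put it anywhere on your computer, as long as you give the correct path at run time.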

Running BUSCO

Genome assembly assessment

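First cd into the BUSCO folder, then enter:
python scripts/run_BUSCO.py -i SEQUENCE_FILE -o OUTPUT_NAME -l LINEAGE -m geno
SEQUENCE_FILE - your assembled genome file
OUTPUT_NAME - a name for the output
LINEAGE - the lineage dataset to match against
geno - selects genome assessment mode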

Mine: python scripts/run_BUSCO.py -i /home/larix/下载/myproject/zglk_fungi.genome_contigs.fasta -o ev_genome -l ascomycota_odb9 -m geno
I put my lineage dataset inside the BUSCO folder.

python scripts/run_BUSCO.py \
  -i /home/dong/dlm_wd/asmb_assess/seq/genome/Metarhizium_anisopliae.Metarhizium_anisopliae.dna.nonchromosomal.fa \
  -o lvjiangjun_geno \
  -l /home/dong/dlm_wd/asmb_assess/seq/reference/ascomycota_odb9 \
  -m geno
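
If you run several assemblies, assembling the command in a script avoids retyping long paths. This is a rough sketch only: `build_busco_command` and `run_busco` are hypothetical helper names, and the only flags used are the four shown above (-i, -o, -l, -m).

```python
import subprocess


def build_busco_command(sequence_file, output_name, lineage, mode="geno"):
    """Return the argument list for scripts/run_BUSCO.py."""
    return [
        "python", "scripts/run_BUSCO.py",
        "-i", sequence_file,
        "-o", output_name,
        "-l", lineage,
        "-m", mode,
    ]


def run_busco(busco_dir, sequence_file, output_name, lineage):
    """Run BUSCO from inside busco_dir and return its exit code."""
    command = build_busco_command(sequence_file, output_name, lineage)
    # cwd replaces the manual "cd into the BUSCO folder" step.
    return subprocess.run(command, cwd=busco_dir).returncode
```

Passing the arguments as a list rather than one long string means that paths containing spaces or non-ASCII characters need no extra quoting.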

After you press Enter, the program starts running.
Here is a sample run:

root@larix:/home/larix/下载/busco-master-e83a6c94101511484799f9770cdfc148559b136d# python scripts/run_BUSCO.py -i /home/larix/下载/myproject/sequence.fasta -o genome -l fungi_odb9 -m geno
INFO    ****************** Start a BUSCO 3.0.2 analysis, current time: 07/23/2017 14:44:33 ******************
INFO    Configuration loaded from /home/larix/下载/busco-master-e83a6c94101511484799f9770cdfc148559b136d/scripts/../config/config.ini
INFO    Init tools...
INFO    Check dependencies...
INFO    Check input file...
INFO    To reproduce this run: python scripts/run_BUSCO.py -i /home/larix/下载/myproject/sequence.fasta -o genome -l fungi_odb9/ -m genome -c 1 -sp aspergillus_nidulans
INFO    Mode is: genome
INFO    The lineage dataset is: fungi_odb9 (eukaryota)
INFO    Temp directory is ./tmp/
INFO    ****** Phase 1 of 2, initial predictions ******
INFO    ****** Step 1/3, current time: 07/23/2017 14:44:34 ******
INFO    Create blast database...
INFO    [makeblastdb]   Building a new DB, current time: 07/23/2017 14:44:35
INFO    [makeblastdb]   New DB name:   /home/larix/下载/busco-master-e83a6c94101511484799f9770cdfc148559b136d/tmp/genome_2557368148
INFO    [makeblastdb]   New DB title:  /home/larix/下载/myproject/sequence.fasta
INFO    [makeblastdb]   Sequence type: Nucleotide
INFO    [makeblastdb]   Keep Linkouts: T
INFO    [makeblastdb]   Keep MBits: T
INFO    [makeblastdb]   Maximum file size: 1000000000B
INFO    [makeblastdb]   Adding sequences from FASTA; added 1 sequences in 0.142315 seconds.
INFO    [makeblastdb]   1 of 1 task(s) completed at 07/23/2017 14:44:35
INFO    Running tblastn, writing output to /home/larix/下载/busco-master-e83a6c94101511484799f9770cdfc148559b136d/run_genome/blast_output/tblastn_genome.tsv...
INFO    [tblastn]   1 of 1 task(s) completed at 07/23/2017 14:44:40
INFO    ****** Step 2/3, current time: 07/23/2017 14:44:40 ******
INFO    Maximum number of candidate contig per BUSCO limited to: 3
INFO    Getting coordinates for candidate regions...
INFO    Pre-Augustus scaffold extraction...
INFO    Running Augustus prediction using aspergillus_nidulans as species:
INFO    [augustus]  Please find all logs related to Augustus errors here: /home/larix/下载/busco-master-e83a6c94101511484799f9770cdfc148559b136d/run_genome/augustus_output/augustus.log
INFO    [augustus]  7 of 69 task(s) completed at 07/23/2017 14:45:02
INFO    [augustus]  14 of 69 task(s) completed at 07/23/2017 14:45:27
INFO    [augustus]  21 of 69 task(s) completed at 07/23/2017 14:45:49
INFO    [augustus]  28 of 69 task(s) completed at 07/23/2017 14:46:04
INFO    [augustus]  35 of 69 task(s) completed at 07/23/2017 14:46:25
INFO    [augustus]  42 of 69 task(s) completed at 07/23/2017 14:46:42
INFO    [augustus]  49 of 69 task(s) completed at 07/23/2017 14:46:56
INFO    [augustus]  56 of 69 task(s) completed at 07/23/2017 14:47:50
INFO    [augustus]  63 of 69 task(s) completed at 07/23/2017 14:48:07
INFO    [augustus]  69 of 69 task(s) completed at 07/23/2017 14:48:18
INFO    Extracting predicted proteins...
INFO    ****** Step 3/3, current time: 07/23/2017 14:48:18 ******
INFO    Running HMMER to confirm orthology of predicted proteins:
INFO    [hmmsearch] 7 of 69 task(s) completed at 07/23/2017 14:48:19
INFO    [hmmsearch] 14 of 69 task(s) completed at 07/23/2017 14:48:19
INFO    [hmmsearch] 21 of 69 task(s) completed at 07/23/2017 14:48:19
INFO    [hmmsearch] 28 of 69 task(s) completed at 07/23/2017 14:48:20
INFO    [hmmsearch] 35 of 69 task(s) completed at 07/23/2017 14:48:20
INFO    [hmmsearch] 42 of 69 task(s) completed at 07/23/2017 14:48:20
INFO    [hmmsearch] 49 of 69 task(s) completed at 07/23/2017 14:48:20
INFO    [hmmsearch] 56 of 69 task(s) completed at 07/23/2017 14:48:20
INFO    [hmmsearch] 63 of 69 task(s) completed at 07/23/2017 14:48:21
INFO    [hmmsearch] 69 of 69 task(s) completed at 07/23/2017 14:48:21
INFO    Results:
INFO    C:6.6%[S:6.6%,D:0.0%],F:3.4%,M:90.0%,n:290
INFO    19 Complete BUSCOs (C)
INFO    19 Complete and single-copy BUSCOs (S)
INFO    0 Complete and duplicated BUSCOs (D)
INFO    10 Fragmented BUSCOs (F)
INFO    261 Missing BUSCOs (M)
INFO    290 Total BUSCO groups searched
INFO    ****** Phase 2 of 2, predictions using species specific training ******
INFO    ****** Step 1/3, current time: 07/23/2017 14:48:21 ******
INFO    Extracting missing and fragmented buscos from the ancestral_variants file...
INFO    Running tblastn, writing output to /home/larix/下载/busco-master-e83a6c94101511484799f9770cdfc148559b136d/run_genome/blast_output/tblastn_genome_missing_and_frag_rerun.tsv...
INFO    [tblastn]   1 of 1 task(s) completed at 07/23/2017 14:49:00
INFO    Maximum number of candidate contig per BUSCO limited to: 3
INFO    Getting coordinates for candidate regions...
INFO    ****** Step 2/3, current time: 07/23/2017 14:49:00 ******
INFO    Training Augustus using Single-Copy Complete BUSCOs:
INFO    Converting predicted genes to short genbank files at 07/23/2017 14:49:00...
INFO    All files converted to short genbank files, now running the training scripts at 07/23/2017 14:49:01...
INFO    Pre-Augustus scaffold extraction...
INFO    Re-running Augustus with the new metaparameters, number of target BUSCOs: 271
INFO    [augustus]  7 of 61 task(s) completed at 07/23/2017 14:49:15
INFO    [augustus]  13 of 61 task(s) completed at 07/23/2017 14:49:29
INFO    [augustus]  19 of 61 task(s) completed at 07/23/2017 14:49:43
INFO    [augustus]  25 of 61 task(s) completed at 07/23/2017 14:50:02
INFO    [augustus]  31 of 61 task(s) completed at 07/23/2017 14:50:19
INFO    [augustus]  37 of 61 task(s) completed at 07/23/2017 14:51:08
INFO    [augustus]  43 of 61 task(s) completed at 07/23/2017 14:51:24
INFO    [augustus]  49 of 61 task(s) completed at 07/23/2017 14:51:46
INFO    [augustus]  55 of 61 task(s) completed at 07/23/2017 14:52:07
INFO    [augustus]  61 of 61 task(s) completed at 07/23/2017 14:52:26
INFO    Extracting predicted proteins...
INFO    ****** Step 3/3, current time: 07/23/2017 14:52:27 ******
INFO    Running HMMER to confirm orthology of predicted proteins:
INFO    [hmmsearch] 6 of 59 task(s) completed at 07/23/2017 14:52:27
INFO    [hmmsearch] 12 of 59 task(s) completed at 07/23/2017 14:52:27
INFO    [hmmsearch] 18 of 59 task(s) completed at 07/23/2017 14:52:27
INFO    [hmmsearch] 24 of 59 task(s) completed at 07/23/2017 14:52:27
INFO    [hmmsearch] 30 of 59 task(s) completed at 07/23/2017 14:52:27
INFO    [hmmsearch] 36 of 59 task(s) completed at 07/23/2017 14:52:27
INFO    [hmmsearch] 42 of 59 task(s) completed at 07/23/2017 14:52:28
INFO    [hmmsearch] 48 of 59 task(s) completed at 07/23/2017 14:52:28
INFO    [hmmsearch] 54 of 59 task(s) completed at 07/23/2017 14:52:28
INFO    [hmmsearch] 59 of 59 task(s) completed at 07/23/2017 14:52:28
INFO    Results:
INFO    C:6.6%[S:6.6%,D:0.0%],F:3.4%,M:90.0%,n:290
INFO    19 Complete BUSCOs (C)
INFO    19 Complete and single-copy BUSCOs (S)
INFO    0 Complete and duplicated BUSCOs (D)
INFO    10 Fragmented BUSCOs (F)
INFO    261 Missing BUSCOs (M)
INFO    290 Total BUSCO groups searched
INFO    BUSCO analysis done. Total running time: 474.648495913 seconds
INFO    Results written in /home/larix/下载/busco-master-e83a6c94101511484799f9770cdfc148559b136d/run_genome/
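
The one-line summary in the log (C, S, D, F, M and n) is the part you usually want to keep. As a sketch, assuming the line always has the shape seen in the log above, it can be pulled apart with a regular expression; `parse_summary` is a hypothetical helper and not part of BUSCO.

```python
import re

# Matches e.g. C:6.6%[S:6.6%,D:0.0%],F:3.4%,M:90.0%,n:290
SUMMARY_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_summary(line):
    """Return the percentages and n from a summary line, or None if absent."""
    match = SUMMARY_PATTERN.search(line)
    if match is None:
        return None
    result = {key: float(value) for key, value in match.groupdict().items()}
    result["n"] = int(match.group("n"))  # n is a count, not a percentage
    return result
```

Applied to the sample log, this gives C = 6.6, F = 3.4, M = 90.0 and n = 290, which agrees with the counts listed under it: 19 of 290 complete, 10 fragmented and 261 missing.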

For further reading, see the post "BUSCO - 组装质量评估" (BUSCO: assembly quality assessment).
