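I. NCBI BLAST+
1. Installing and configuring BLAST+
Download the latest BLAST+ executables from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/. Do not download the source code, because compiling from source is very slow; choose a precompiled build instead, such as ncbi-blast-2.2.30+-x64-linux.tar.gz. If the server has internet access, download the archive directly with wget. Otherwise, download it locally and transfer it to the server with an SFTP client.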
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.2.30+-x64-linux.tar.gz
Extract the archive:
tar -zxvf ncbi-blast-2.2.30+-x64-linux.tar.gz
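The commands in the following sections are called by name, so the extracted bin directory needs to be on the PATH. A minimal sketch, assuming the archive was extracted in the current directory (BLAST_HOME is just an illustrative variable name):

```shell
# Assumption: the tarball was extracted in the current directory,
# which creates ncbi-blast-2.2.30+/ with the programs in its bin/ subdirectory.
BLAST_HOME="$PWD/ncbi-blast-2.2.30+"
export PATH="$BLAST_HOME/bin:$PATH"

# Report whether the shell can now find the programs used below.
for tool in makeblastdb blastn blastp blast_formatter; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: $(command -v "$tool")"
    else
        echo "$tool: not found, check BLAST_HOME"
    fi
done
```

To keep the setting across logins, the export line can be appended to a shell startup file such as ~/.bashrc, with BLAST_HOME set to the absolute path of the extracted directory.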
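2. Basic usage
**Tip: BLAST supports many output formats. Format 11 (the BLAST archive format) carries the most complete information, and the blast_formatter program can convert a format 11 result into any of the other formats. For that reason, save alignment results in format 11.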
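As a rough sketch of the conversion mentioned in the tip, the wrapper below calls blast_formatter on a format 11 archive. The function name to_format, the DRY_RUN switch and the file names are made up for illustration; -archive, -outfmt and -out are blast_formatter options.

```shell
# to_format ARCHIVE FORMAT_CODE OUTPUT
# Convert a format 11 archive into another output format (e.g. 6 = tabular).
# to_format and DRY_RUN are hypothetical helpers, not part of BLAST+.
to_format() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command, so it can be inspected before running.
        echo "blast_formatter -archive $1 -outfmt $2 -out $3"
    else
        blast_formatter -archive "$1" -outfmt "$2" -out "$3"
    fi
}

# Show the command that would turn a (hypothetical) archive into tabular output.
DRY_RUN=1 to_format results.asn 6 results.tsv
```

Because the archive keeps the full result, the same file can be converted again later with a different format code without rerunning the search.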
1) Build a database from the reference sequences
makeblastdb -in db.fasta -dbtype nucl -parse_seqids -out dbname
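**Here -dbtype nucl builds a database from nucleotide sequences, and -dbtype prot builds one from amino acid sequences.
2) After building the database, align the query sequences against it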
blastn -query test.fa -db dbname -outfmt 11 -out "[email protected]" -num_threads 8
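**The output file name [email protected] is a personal convention, namely "query file name.BLAST subprogram name@database name.result format", which makes it clear at a glance how each result was produced.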
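The naming convention can be sketched as a small shell helper. The function name result_name is hypothetical, and two details are assumptions rather than part of the convention as stated: the query file's extension is stripped, and the numeric format code is used as the suffix.

```shell
# result_name QUERY_FILE PROGRAM DB_NAME FORMAT
# Build "<query>.<program>@<db>.<format>" (hypothetical helper).
# Assumptions: the query extension is dropped and the format code is the suffix.
result_name() {
    query_base=$(basename "$1")
    echo "${query_base%.*}.${2}@${3}.${4}"
}

result_name test.fa blastn dbname 11
```

The helper can then supply the -out argument, for example -out "$(result_name test.fa blastn dbname 11)".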
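**If the query is a protein sequence to be searched against nr, another protein database, or a protein database you built yourself, use blastp instead; the other parameters are similar.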
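The choice of subprogram follows from the query type and the database type. The sketch below spells this out; the function name pick_blast is made up, and the nucl/prot labels reuse the values of -dbtype. It also lists blastx and tblastn, the BLAST+ programs for the two mixed cases, which the text above does not cover.

```shell
# pick_blast QUERY_TYPE DB_TYPE   (each either "nucl" or "prot")
# Print the BLAST+ subprogram for that combination (hypothetical helper).
pick_blast() {
    case "$1:$2" in
        nucl:nucl) echo blastn ;;   # nucleotide query, nucleotide database
        prot:prot) echo blastp ;;   # protein query, protein database
        nucl:prot) echo blastx ;;   # translated nucleotide query, protein database
        prot:nucl) echo tblastn ;;  # protein query, translated nucleotide database
        *) echo "unknown combination: $1 vs $2" >&2; return 1 ;;
    esac
}

pick_blast prot prot
```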
II. The DIAMOND program
1. Installing DIAMOND
Get the download link from the DIAMOND download page:
wget http://github.com/bbuchfink/diamond/releases/download/v0.9.17/diamond-linux64.tar.gz
tar xzf diamond-linux64.tar.gz
**The archive unpacks to a single binary executable named diamond; simply add its directory to the PATH environment variable.
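One way to do this, assuming the binary was unpacked into the current directory (DIAMOND_DIR is an illustrative variable name):

```shell
# Assumption: the diamond binary sits in the current directory.
DIAMOND_DIR="$PWD"
export PATH="$DIAMOND_DIR:$PATH"

# "version" is one of the commands listed by diamond help.
if command -v diamond >/dev/null 2>&1; then
    diamond version
else
    echo "diamond not found in $DIAMOND_DIR"
fi
```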
2. Basic usage
To run an alignment task, we assume a protein database file in FASTA format named nr.faa and a file of DNA reads that we want to align named reads.fna.
1) Building the database. To set up a reference database for DIAMOND, run the makedb command as follows:
$ diamond makedb --in nr.faa -d nr ## build the database
$ diamond help
diamond v0.8.8.70 | by Benjamin Buchfink
Check http://github.com/bbuchfink/diamond for updates.
Syntax: diamond COMMAND [OPTIONS]
Commands:
makedb Build DIAMOND database from a FASTA file
blastp Align amino acid query sequences against a protein reference database
blastx Align DNA query sequences against a protein reference database
view View DIAMOND alignment archive (DAA) formatted file
help Produce help message
version Display version information
General options:
--threads (-p) number of CPU threads
--db (-d) database file
--daa (-a) DIAMOND alignment archive (DAA) file
--verbose (-v) verbose console output
--log enable debug log
--quiet disable console output
Makedb options:
--in input reference file in FASTA format
--block-size (-b) sequence block size in billions of letters (default=2)
Aligner options:
--query (-q) input query file
--max-target-seqs (-k) maximum number of target sequences to report alignments for
--top report alignments within this percentage range of top alignment score (overrides --max-target-seqs)
--compress compression for output files (0=none, 1=gzip)
--evalue (-e) maximum e-value to report alignments
--min-score minimum bit score to report alignments (overrides e-value setting)
--id minimum identity% to report an alignment
--query-cover minimum query cover% to report an alignment
--sensitive enable sensitive mode (default: fast)
--index-chunks (-c) number of chunks for index processing
--tmpdir (-t) directory for temporary files
--gapopen gap open penalty (default=11 for protein)
--gapextend gap extension penalty (default=1 for protein)
--matrix score matrix for protein alignment
--seg enable SEG masking of queries (yes/no)
--salltitles print full subject titles in output files
Advanced options:
--seed-freq maximum seed frequency
--run-len (-l) mask runs between stop codons shorter than this length
--max-hits (-C) maximum number of hits to consider for one seed
--id2 minimum number of identities for stage 1 hit
--window (-w) window size for local hit search
--xdrop (-x) xdrop for ungapped alignment
--gapped-xdrop (-X) xdrop for gapped alignment in bits
--ungapped-score minimum raw alignment score to continue local extension
--hit-band band for hit verification
--hit-score minimum score to keep a tentative alignment
--band band for dynamic programming computation
--shapes (-s) number of seed shapes (0 = all available)
--index-mode index mode (0=4x12, 1=16x9)
--fetch-size trace point fetch size
--single-domain Discard secondary domains within one target sequence
--dbsize effective database size (in letters)
--no-auto-append disable auto appending of DAA and DMND file extensions
View options:
--out (-o) output file
--outfmt (-f) output format (tab/sam/xml)
--forwardonly only show alignments of forward strand
2) Sequence alignment
** Building the database above produces a file named nr.dmnd. The alignment task may then be initiated using the blastx command like this:
$ diamond blastx -d nr -q reads.fna -o matches.m8
The output file is specified with the -o option and is named matches.m8 here. By default, it is written in BLAST tabular format.
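BLAST tabular output has 12 tab-separated columns: query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value and bit score. As an illustration of working with such a file, the sketch below keeps the highest-scoring hit per query using awk. The function name best_hits and the sample rows are made up; a real run would pass matches.m8 instead.

```shell
# best_hits FILE
# Print, for each query (column 1), the row with the highest bit score (column 12).
# Hypothetical helper; output is sorted because awk's array order is unspecified.
best_hits() {
    awk -F '\t' '
        !($1 in best) || $12 + 0 > best[$1] { best[$1] = $12 + 0; row[$1] = $0 }
        END { for (q in row) print row[q] }
    ' "$1" | sort
}

# Made-up sample rows in the 12-column tabular layout.
sample=$(mktemp)
printf 'read1\tprotB\t80.0\t48\t9\t1\t4\t147\t30\t77\t1e-10\t60.5\n'  > "$sample"
printf 'read1\tprotA\t95.0\t50\t2\t0\t1\t150\t10\t59\t1e-20\t98.2\n' >> "$sample"
printf 'read2\tprotC\t70.0\t40\t12\t0\t1\t120\t5\t44\t1e-05\t45.1\n' >> "$sample"

best_hits "$sample"
```

For the sample rows, read1 has two hits and only the protA row, which has the higher bit score, is kept.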