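Compared with legacy BLAST, the new BLAST+ splits blastn, blastx and the other programs out of the single blastall command into separate executables, which makes it much easier to tailor the options of each program.
While using blastn I have collected the options that I find myself reaching for most often. They are summarized below: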
blastn -db database_name -query input_file -out output_file -evalue evalue -max_target_seqs num_sequences -num_threads int_value -outfmt format
blastn -db database_name -query input_file -out output_file -evalue evalue -max_target_seqs num_sequences -num_threads int_value -outfmt "7 qacc sacc evalue length pident"
For example:
blastn -db plant_rna -query test.fa -out test.out -evalue 0.00001 -max_target_seqs 5 -num_threads 4 -outfmt "7 qacc sacc evalue length pident"
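One detail worth checking in the example is that the whole -outfmt value reaches blastn as a single argument. The following is a minimal dry-run sketch: it only assembles and prints the argument list and does not run a search. The variable names (db, query, out, outfmt, argc) are my own, chosen for illustration; the values are copied from the example.

```shell
#!/bin/sh
# Dry-run sketch: build the blastn call from the example and print it.
# Nothing is executed; variable names are illustrative assumptions.
db=plant_rna
query=test.fa
out=test.out
outfmt='7 qacc sacc evalue length pident'

# Store the command in the positional parameters.
# "$outfmt" is quoted so the format code and specifiers stay ONE argument.
set -- blastn -db "$db" -query "$query" -out "$out" \
    -evalue 0.00001 -max_target_seqs 5 -num_threads 4 \
    -outfmt "$outfmt"

argc=$#
printf '%s\n' "$@"   # one argument per line

# To run the search for real (needs BLAST+ and the database), replace
# the printf line with:
# "$@"
```

If the quotes around "$outfmt" were dropped, the shell would split the value into six separate words and blastn would see the specifiers as stray arguments.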
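blastn: needs no introduction; this is nucleotide-versus-nucleotide alignment
-db: the database to search against (see the previous post for details)
-query: the input query sequences, in FASTA format
-out: the output file for the results
-evalue: the E-value cutoff
-max_target_seqs: the maximum number of target sequences to keep (with blastall I used -b 5 -v 5; strictly speaking those correspond to -num_alignments and -num_descriptions, so corrections are welcome if I have this wrong)
-num_threads: the number of CPU threads to use (depends on your system; equivalent to the old -a option)
-outfmt "7 qacc sacc evalue length pident": this is the coolest feature of the new BLAST+. You control the output format directly and no longer need a parser. 7 means tabular output with comment lines. You choose which fields to print by listing them, separated by spaces, after the 7, and you enclose the whole format specification in double quotes. Here qacc is the query accession, sacc is the subject accession, evalue is the E-value, length is the alignment length, and pident is the percentage of identical matches. The other available specifiers are listed below: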
*** Formatting options
-outfmt
alignment view options:
0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = XML Blast output,
6 = tabular,
7 = tabular with comment lines,
8 = Text ASN.1,
9 = Binary ASN.1
10 = Comma-separated values
Options 6, 7, and 10 can be additionally configured to produce
a custom format specified by space delimited format specifiers.
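The supported format specifiers are:
qseqid means Query Seq-id
qgi means Query GI
qacc means Query accesion
sseqid means Subject Seq-id
sallseqid means All subject Seq-id(s), separated by a ';'
sgi means Subject GI
sallgi means All subject GIs
sacc means Subject accession
sallacc means All subject accessions
qstart means Start of alignment in query
qend means End of alignment in query
sstart means Start of alignment in subject
send means End of alignment in subject
qseq means Aligned part of query sequence
sseq means Aligned part of subject sequence
evalue means Expect value
bitscore means Bit score
score means Raw score
length means Alignment length
pident means Percentage of identical matches
nident means Number of identical matches
mismatch means Number of mismatches
positive means Number of positive-scoring matches
gapopen means Number of gap openings
gaps means Total number of gaps
ppos means Percentage of positive-scoring matches
frames means Query and subject frames separated by a '/'
qframe means Query frame
sframe means Subject frame
When not provided, the default value is: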
'qseqid sseqid pident length mismatch gapopen qstart qend sstart send
evalue bitscore', which is equivalent to the keyword 'std'
Default = `0'
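Because format 7 is plain tab-separated text with '#' comment lines, ordinary command-line tools can stand in for a parser. The following is a minimal sketch with awk, assuming the five-field layout from the example (qacc sacc evalue length pident). The sample rows, the comment-line wording and the thresholds (identity of at least 90, length of at least 100) are invented for illustration and are not real search results.

```shell
#!/bin/sh
# Mock output in the layout of -outfmt "7 qacc sacc evalue length pident".
# All accessions and numbers are made up for this sketch.
sample=$(mktemp)
{
  printf '# Fields: query acc., subject acc., evalue, alignment length, %% identity\n'
  printf '# 2 hits found\n'
  printf 'q1\ts1\t1e-50\t200\t98.50\n'
  printf 'q1\ts2\t2e-10\t80\t85.00\n'
} > "$sample"

# Skip comment lines, keep hits with pident >= 90 and length >= 100,
# and print query accession, subject accession and E-value.
# Column numbers follow the order of the specifiers in the -outfmt string.
filtered=$(awk -F'\t' -v OFS='\t' \
  '!/^#/ && $5 >= 90 && $4 >= 100 { print $1, $2, $3 }' "$sample")

printf '%s\n' "$filtered"
rm -f "$sample"
```

If the specifier list is changed, the column numbers in the awk condition have to be changed to match, since awk only sees positions and not field names.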
Running blastn with the -help option prints the detailed help shown below:
blastn -help
blastn [-h] [-help] [-import_search_strategy filename]
[-export_search_strategy filename] [-task task_name] [-db database_name]
[-dbsize num_letters] [-gilist filename] [-negative_gilist filename]
[-entrez_query entrez_query] [-db_soft_mask filtering_algorithm]
[-subject subject_input_file] [-subject_loc range] [-query input_file]
[-out output_file] [-evalue evalue] [-word_size int_value]
[-gapopen open_penalty] [-gapextend extend_penalty]
[-perc_identity float_value] [-xdrop_ungap float_value]
[-xdrop_gap float_value] [-xdrop_gap_final float_value]
[-searchsp int_value] [-penalty penalty] [-reward reward] [-no_greedy]
[-min_raw_gapped_score int_value] [-template_type type]
[-template_length int_value] [-dust DUST_options]
[-filtering_db filtering_database]
[-window_masker_taxid window_masker_taxid]
[-window_masker_db window_masker_db] [-soft_masking soft_masking]
[-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]
[-best_hit_score_edge float_value] [-window_size int_value]
[-off_diagonal_range int_value] [-use_index boolean] [-index_name string]
[-lcase_masking] [-query_loc range] [-strand strand] [-parse_deflines]
[-outfmt format] [-show_gis] [-num_descriptions int_value]
[-num_alignments int_value] [-html] [-max_target_seqs num_sequences]
[-num_threads int_value] [-remote] [-version]
DESCRIPTION
Nucleotide-Nucleotide BLAST 2.2.23+
OPTIONAL ARGUMENTS
-h
Print USAGE and DESCRIPTION; ignore other arguments
-help
Print USAGE, DESCRIPTION and ARGUMENTS description; ignore other arguments
-version
Print version number; ignore other arguments
*** Input query options
-query
Input file name
Default = `-'
-query_loc
Location on the query sequence (Format: start-stop)
-strand
Query strand(s) to search against database/subject
Default = `both'
*** General search options
-task <String, Permissible values: 'blastn' 'blastn-short' 'dc-megablast' 'megablast' 'vecscreen' >
Task to execute
Default = `megablast'
-db
BLAST database name
* Incompatible with: subject, subject_loc
-out
Output file name
Default = `-'
-evalue
Expectation value (E) threshold for saving hits
Default = `10'
-word_size <Integer, >=4>
Word size for wordfinder algorithm (length of best perfect match)
-gapopen
Cost to open a gap
-gapextend
Cost to extend a gap
-penalty
Penalty for a nucleotide mismatch
-reward <Integer, >=0>
Reward for a nucleotide match
-use_index
Use MegaBLAST database index
-index_name
MegaBLAST database index name
*** BLAST-2-Sequences options
-subject
Subject sequence(s) to search
* Incompatible with: db, gilist, negative_gilist, db_soft_mask
-subject_loc
Location on the subject sequence (Format: start-stop)
* Incompatible with: db, gilist, negative_gilist, db_soft_mask, remote
*** Formatting options
-outfmt
alignment view options:
0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = XML Blast output,
6 = tabular,
7 = tabular with comment lines,
8 = Text ASN.1,
9 = Binary ASN.1
10 = Comma-separated values
Options 6, 7, and 10 can be additionally configured to produce
a custom format specified by space delimited format specifiers.
The supported format specifiers are:
qseqid means Query Seq-id
qgi means Query GI
qacc means Query accesion
sseqid means Subject Seq-id
sallseqid means All subject Seq-id(s), separated by a ';'
sgi means Subject GI
sallgi means All subject GIs
sacc means Subject accession
sallacc means All subject accessions
qstart means Start of alignment in query
qend means End of alignment in query
sstart means Start of alignment in subject
send means End of alignment in subject
qseq means Aligned part of query sequence
sseq means Aligned part of subject sequence
evalue means Expect value
bitscore means Bit score
score means Raw score
length means Alignment length
pident means Percentage of identical matches
nident means Number of identical matches
mismatch means Number of mismatches
positive means Number of positive-scoring matches
gapopen means Number of gap openings
gaps means Total number of gaps
ppos means Percentage of positive-scoring matches
frames means Query and subject frames separated by a '/'
qframe means Query frame
sframe means Subject frame
When not provided, the default value is:
'qseqid sseqid pident length mismatch gapopen qstart qend sstart send
evalue bitscore', which is equivalent to the keyword 'std'
Default = `0'
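The fixed column order of the default 'std' layout quoted above also makes simple post-processing possible. The following is a minimal sketch that keeps, for each query, the row with the highest bit score. It assumes tab-separated rows in the 12-column 'std' order (as produced by format 6); the rows themselves are invented for illustration, and the variable names are my own.

```shell
#!/bin/sh
# Mock rows in the 'std' column order:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
# All identifiers and numbers are made up for this sketch.
hits=$(
  printf 'q1\ts1\t85.00\t80\t12\t0\t1\t80\t5\t84\t2e-10\t60.2\n'
  printf 'q1\ts2\t98.50\t200\t3\t0\t1\t200\t10\t209\t1e-50\t350\n'
  printf 'q2\ts3\t91.00\t100\t9\t0\t1\t100\t1\t100\t3e-30\t130\n'
)

# For each query (column 1) remember the row whose bit score (column 12)
# is the highest seen so far, then print one row per query.
best=$(printf '%s\n' "$hits" |
  awk -F'\t' '$12 + 0 > top[$1] + 0 { top[$1] = $12; row[$1] = $0 }
              END { for (q in row) print row[q] }' |
  sort)

printf '%s\n' "$best"
```

The final sort is there only because awk does not guarantee the order of a for-in loop; it makes the output order reproducible.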
-show_gis
Show NCBI GIs in deflines?
-num_descriptions <Integer, >=0>
Number of database sequences to show one-line descriptions for
Default = `500'
-num_alignments <Integer, >=0>
Number of database sequences to show alignments for
Default = `250'
-html
Produce HTML output?
*** Query filtering options
-dust
Filter query sequence with DUST (Format: 'yes', 'level window linker', or
'no' to disable)
Default = `20 64 1'
-filtering_db
BLAST database containing filtering elements (i.e.: repeats)
-window_masker_taxid
Enable WindowMasker filtering using a Taxonomic ID
-window_masker_db
Enable WindowMasker filtering using this repeats database.
-soft_masking
Apply filtering locations as soft masks
Default = `true'
-lcase_masking
Use lower case filtering in query and subject sequence(s)?
*** Restrict search or results
-gilist
Restrict search of database to list of GI's
* Incompatible with: negative_gilist, remote, subject, subject_loc
-negative_gilist
Restrict search of database to everything except the listed GIs
* Incompatible with: gilist, remote, subject, subject_loc
-entrez_query
Restrict search with the given Entrez query
* Requires: remote
-db_soft_mask
Filtering algorithm ID to apply to the BLAST database as soft masking
* Incompatible with: subject, subject_loc
-perc_identity
Percent identity
-culling_limit <Integer, >=0>
If the query range of a hit is enveloped by that of at least this many
higher-scoring hits, delete the hit
* Incompatible with: best_hit_overhang, best_hit_score_edge
-best_hit_overhang <Real, (>=0 and =<0.5)>
Best Hit algorithm overhang value (recommended value: 0.1)
* Incompatible with: culling_limit
-best_hit_score_edge <Real, (>=0 and =<0.5)>
Best Hit algorithm score edge value (recommended value: 0.1)
* Incompatible with: culling_limit
-max_target_seqs <Integer, >=1>
Maximum number of aligned sequences to keep
*** Discontiguous MegaBLAST options
-template_type
Discontiguous MegaBLAST template type
* Requires: template_length
-template_length
Discontiguous MegaBLAST template length
* Requires: template_type
*** Statistical options
-dbsize
Effective length of the database
-searchsp <Int8, >=0>
Effective length of the search space
*** Search strategy options
-import_search_strategy
Search strategy to use
* Incompatible with: export_search_strategy
-export_search_strategy
File name to record the search strategy used
* Incompatible with: import_search_strategy
*** Extension options
-xdrop_ungap
X-dropoff value (in bits) for ungapped extensions
-xdrop_gap
X-dropoff value (in bits) for preliminary gapped extensions
-xdrop_gap_final
X-dropoff value (in bits) for final gapped alignment
-no_greedy
Use non-greedy dynamic programming extension
-min_raw_gapped_score
Minimum raw gapped score to keep an alignment in the preliminary gapped and
traceback stages
-ungapped
Perform ungapped alignment only?
-window_size <Integer, >=0>
Multiple hits window size, use 0 to specify 1-hit algorithm
-off_diagonal_range <Integer, >=0>
Number of off-diagonals to search for the 2nd hit, use 0 to turn off
Default = `0'
*** Miscellaneous options
-parse_deflines
Should the query and subject defline(s) be parsed?
-num_threads <Integer, >=1>
Number of threads to use in the BLAST search
Default = `1'
* Incompatible with: remote
-remote
Execute search remotely?
* Incompatible with: gilist, negative_gilist, subject_loc, num_threads