Common BLAST Commands

Build a protein database:

makeblastdb -in spinach.fa  -parse_seqids -hash_index  -out spinach_DB -dbtype prot

Build a nucleotide database:

makeblastdb -in spinach.fa  -parse_seqids -hash_index  -out spinach_DB -dbtype nucl    # -parse_seqids and -hash_index are usually included
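
Before building a database with -parse_seqids, it can help to confirm that every FASTA header ID is unique, since makeblastdb is widely reported to reject duplicate IDs when that flag is set. The following is a minimal sketch, not part of BLAST itself. It assumes a toy file named demo.fa (a hypothetical stand-in for spinach.fa) and treats the first whitespace-delimited token of each header as the ID:

```shell
# Create a tiny demo FASTA (hypothetical data) containing one duplicated ID.
cat > demo.fa <<'EOF'
>seq1 first record
ATGCATGC
>seq2 second record
GGCCTTAA
>seq1 duplicate of the first ID
TTTTAAAA
EOF

# Take the first token of every header line, drop the leading '>',
# and report any ID that occurs more than once.
dup_ids=$(awk '/^>/ {print substr($1, 2)}' demo.fa | sort | uniq -d)

if [ -n "$dup_ids" ]; then
    echo "Duplicate IDs found: $dup_ids"
else
    echo "All IDs are unique"
fi
```

If the check reports duplicates, renaming those records before running makeblastdb is usually simpler than debugging the build afterwards.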

blastp

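-query  the file containing the query sequences;
-db  the name of the formatted database;
-out  the name of the output file (the output format is set separately with -outfmt)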

blastp -query spinach_1.fa -db spinach_DB -out spinach_blast

These are the blastn parameters I have found most useful in practice:

blastn -query input_file -db database_name  -out output_file -evalue evalue -max_target_seqs num_sequences -num_threads int_value -outfmt "format format_string"
blastn -query input_file -db database_name  -out output_file -evalue evalue -max_target_seqs num_sequences -num_threads int_value -outfmt "7 qacc sacc evalue length pident"

blastn -task blastn-short -query input_file -out output_file -db database_name -outfmt 6 -evalue 0.01 -num_threads 15    # sequence alignment with the blastn-short task; -outfmt 7 also works

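-db: the database to search (see the previous post for details)
-query: the input query sequences, in FASTA format
-out: the output results file
-evalue: the E-value cutoff
-max_target_seqs: the maximum number of target sequences to report (with the legacy blastall I used -b 5 -v 5; corrections are welcome if I have this wrong)
-num_threads: the number of CPU threads to use (depends on your system; equivalent to the legacy -a option)
-outfmt "7 qacc sacc evalue length pident": probably the most impressive feature of the new BLAST+. It controls the output format directly, so a separate parser is no longer needed. 7 selects tab-delimited output with comment lines; the fields to output are listed after the 7, separated by spaces, and the whole specification is enclosed in double quotes. Here qacc is the query accession, sacc is the subject accession, evalue is the E-value, length is the alignment length, and pident is the percentage of identical matches. Other available fields are listed below: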

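Field       Meaning

qaccver     Query AC with version (similar fields: qseqid, qgi, qacc; all relate to sequence naming)
saccver     Subject AC with version (similar fields: sseqid, sallseqid, sgi, sacc, sallacc; also naming-related)
pident      Percentage of identical matches (the corresponding nident is the number of identical matches)
length      Alignment length (qlen is the total query sequence length and slen is the total subject sequence length)
mismatch    Number of mismatches
gapopen     Number of gap openings
qstart      Start of alignment in query
qend        End of alignment in query
sstart      Start of alignment in subject
send        End of alignment in subject
evalue      Expect value
bitscore    Bit score
score       Raw score
AC: accession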
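
Because format 7 mixes comment lines with tab-separated data, downstream filtering has to skip the lines that start with '#'. The sketch below illustrates this for the column order "qacc sacc evalue length pident" used above. The file name hits.tsv, the sample rows, and the cutoffs (E-value at most 1e-5, identity at least 90%) are all hypothetical choices for illustration, and the comment lines only approximate what BLAST writes:

```shell
# Write an illustrative stand-in for the output of
#   -outfmt "7 qacc sacc evalue length pident"
# Comment lines start with '#'; data lines are tab-separated.
printf '%s\n' '# illustrative comment line' '# Query: q1' > hits.tsv
printf 'q1\ts1\t1e-50\t200\t98.50\n' >> hits.tsv
printf 'q1\ts2\t0.002\t40\t85.00\n'  >> hits.tsv
printf '%s\n' '# Query: q2'          >> hits.tsv
printf 'q2\ts3\t3e-12\t150\t92.10\n' >> hits.tsv
printf 'q2\ts4\t1e-30\t180\t88.00\n' >> hits.tsv

# Column 3 is evalue and column 5 is pident in this field order.
# Adding 0 forces a numeric comparison; comment lines are skipped.
good_hits=$(awk -F'\t' -v OFS='\t' '
    /^#/ { next }
    $3 + 0 <= 1e-5 && $5 + 0 >= 90 { print $1, $2 }
' hits.tsv)

echo "$good_hits"
```

The column positions depend entirely on the order of the specifiers passed to -outfmt, so the awk field numbers need to change whenever that string changes.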

*** Formatting options

-outfmt 
alignment view options:
0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = XML Blast output,
6 = tabular,
7 = tabular with comment lines,
8 = Text ASN.1,
9 = Binary ASN.1,
10 = Comma-separated values
Options 6, 7, and 10 can be additionally configured to produce
a custom format specified by space delimited format specifiers.

The supported format specifiers are:
qseqid    means Query Seq-id
qgi       means Query GI
qacc      means Query accession
sseqid    means Subject Seq-id
sallseqid means All subject Seq-id(s), separated by a ';'
sgi       means Subject GI
sallgi    means All subject GIs
sacc      means Subject accession
sallacc   means All subject accessions
qstart    means Start of alignment in query
qend      means End of alignment in query
sstart    means Start of alignment in subject
send      means End of alignment in subject
qseq      means Aligned part of query sequence
sseq      means Aligned part of subject sequence
evalue    means Expect value
bitscore  means Bit score
score     means Raw score
length    means Alignment length
pident    means Percentage of identical matches
nident    means Number of identical matches
mismatch  means Number of mismatches
positive  means Number of positive-scoring matches
gapopen   means Number of gap openings
gaps      means Total number of gaps
ppos      means Percentage of positive-scoring matches
frames    means Query and subject frames separated by a '/'
qframe    means Query frame
sframe    means Subject frame

When not provided, the default value is:
'qseqid sseqid pident length mismatch gapopen qstart qend sstart send
evalue bitscore', which is equivalent to the keyword 'std'
Default = `0'
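
With the default 'std' columns, bitscore is the 12th field of each tabular line, which makes it straightforward to keep only the highest-scoring hit per query. The following is a rough sketch under stated assumptions: std_hits.tsv and its rows are made-up sample data in -outfmt 6 layout, and ties are resolved by keeping the first hit seen:

```shell
# Illustrative -outfmt 6 lines with the 12 'std' columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
printf 'q1\ts1\t98.50\t200\t3\t0\t1\t200\t10\t209\t1e-100\t360\n' >  std_hits.tsv
printf 'q1\ts2\t90.00\t180\t18\t0\t1\t180\t5\t184\t1e-60\t250\n'  >> std_hits.tsv
printf 'q2\ts3\t85.00\t100\t15\t0\t1\t100\t1\t100\t1e-20\t120\n'  >> std_hits.tsv
printf 'q2\ts4\t98.00\t150\t3\t0\t1\t150\t20\t169\t1e-70\t270\n'  >> std_hits.tsv

# For each qseqid (column 1), remember the line with the largest bitscore
# (column 12), then print qseqid, sseqid, and bitscore for those lines.
best_hits=$(awk -F'\t' '
    !($1 in best) || $12 + 0 > best[$1] { best[$1] = $12 + 0; line[$1] = $0 }
    END { for (q in line) print line[q] }
' std_hits.tsv | sort | cut -f1,2,12)

echo "$best_hits"
```

The final sort is there only because awk does not guarantee the iteration order of an array in the END block.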
