1. Download the Swiss-Prot protein sequences and build a BLAST database
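Swiss-Prot entries are manually annotated and reviewed, and much of their functional information is experimentally supported, so the annotations are reliable. However, Swiss-Prot holds far fewer proteins than NR: only about 540,000 entries at the time of writing. Because the database is small, it is well suited to running BLAST locally for Swiss-Prot annotation.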
(1) Download the Swiss-Prot protein sequences:
$wget ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
(On Windows, the file can also be downloaded from this page: http://www.uniprot.org/downloads)
(2) Decompress the downloaded file:
$gzip -d uniprot_sprot.fasta.gz
(3) Build the database:
$makeblastdb -in uniprot_sprot.fasta -dbtype prot -title uniprot_sprot -parse_seqids -out uniprot_sprot -logfile uniprot_sprot.log
$cat uniprot_sprot.log
(Before this, I had added makeblastdb, and the blastp program used below, to my PATH environment variable.)
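It can be worth sanity-checking the downloaded file. Below is a minimal Python sketch, assuming the `uniprot_sprot.fasta` from steps (1) and (2) (plain or still gzipped). It counts records, which you can compare with the roughly 540,000 mentioned above and with the sequence count the makeblastdb log should report. It also splits a header such as `sp|Q197F8|002R_IIV3 ... OS=... GN=... PE=4 SV=1` into fields. The tabular BLAST output only reports the subject ID, so a lookup like this is one way to attach protein names to hits. The function names are my own. The `OS`/`OX`/`GN`/`PE`/`SV` key set is an assumption based on UniProt's FASTA header convention.

```python
import gzip
import re
from pathlib import Path

# Keys UniProt appends after the protein name in FASTA headers.
# OX= (taxonomy ID) only appears in newer releases; older files, like the
# one used in this post, go straight from OS= to GN=.
_KEY_RE = re.compile(r"\s(OS|OX|GN|PE|SV)=")


def open_fasta(path):
    """Open a FASTA file as text, transparently handling a .gz suffix."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    return opener(path, "rt")


def count_fasta_records(path):
    """Count records by counting header lines that start with '>'."""
    with open_fasta(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def parse_uniprot_header(header):
    """Split a UniProt header into ID parts, description and key=value fields."""
    header = header.lstrip(">").strip()
    seq_id, _, rest = header.partition(" ")
    db, accession, entry_name = seq_id.split("|")
    # re.split with a capturing group returns
    # [description, key1, value1, key2, value2, ...]
    parts = _KEY_RE.split(" " + rest)
    record = {"db": db, "accession": accession, "entry_name": entry_name,
              "description": parts[0].strip()}
    for key, value in zip(parts[1::2], parts[2::2]):
        record[key] = value.strip()
    return record


# Example (header taken from the results below):
print(parse_uniprot_header(
    ">sp|Q197F8|002R_IIV3 Uncharacterized protein 002R "
    "OS=Invertebrate iridescent virus 3 GN=IIV3-002R PE=4 SV=1")["GN"])  # IIV3-002R
# count_fasta_records("uniprot_sprot.fasta") reads the whole file once.
```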
2. Annotate against Swiss-Prot with blastp
$blastp -query proteins.fasta -out swiss-prot.tab -db uniprot_sprot -evalue 1e-5 -outfmt 7
$cat swiss-prot.tab
The annotation results are shown below:
# BLASTP 2.2.30+
# Query: sp|Q197F8|002R_IIV3 Uncharacterized protein 002R OS=Invertebrate iridescent virus 3 GN=IIV3-002R PE=4 SV=1
# Database: uniprot_sprot
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 1 hits found
sp|Q197F8|002R_IIV3 sp|Q197F8|002R_IIV3 100.00 458 0 0 1 458 1 458 0.0 949
# BLASTP 2.2.30+
# Query: sp|Q197F7|003L_IIV3 Uncharacterized protein 003L OS=Invertebrate iridescent virus 3 GN=IIV3-003L PE=4 SV=1
# Database: uniprot_sprot
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 1 hits found
sp|Q197F7|003L_IIV3 sp|Q197F7|003L_IIV3 100.00 156 0 0 1 156 1 156 1e-111 320
# BLASTP 2.2.30+
# Query: sp|Q6GZX2|003R_FRG3G Uncharacterized protein 3R OS=Frog virus 3 (isolate Goorha) GN=FV3-003R PE=4 SV=1
# Database: uniprot_sprot
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 1 hits found
sp|Q6GZX2|003R_FRG3G sp|Q6GZX2|003R_FRG3G 100.00 438 0 0 1 438 1 438 0.0 900
# BLASTP 2.2.30+
# Query: sp|Q6GZX1|004R_FRG3G Uncharacterized protein 004R OS=Frog virus 3 (isolate Goorha) GN=FV3-004R PE=4 SV=1
# Database: uniprot_sprot
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 1 hits found
sp|Q6GZX1|004R_FRG3G sp|Q6GZX1|004R_FRG3G 100.00 60 0 0 1 60 1 60 3e-36 121
# BLASTP 2.2.30+
# Query: sp|Q197F5|005L_IIV3 Uncharacterized protein 005L OS=Invertebrate iridescent virus 3 GN=IIV3-005L PE=4 SV=1
# Database: uniprot_sprot
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 1 hits found
sp|Q197F5|005L_IIV3 sp|Q197F5|005L_IIV3 99.08 217 2 0 1 217 1 217 2e-156 439
# BLAST processed 5 queries
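With `-outfmt 7`, each query gets a block of `#` comment lines followed by one line per hit. Each hit line uses the 12 default columns listed in the `# Fields` line, which BLAST+ calls qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. Below is a minimal Python sketch for loading such a file into dictionaries. It assumes the default columns; a custom `-outfmt "7 ..."` column list would need a different `FIELDS` list. It splits on any whitespace, so it also accepts output that was copied with spaces instead of tabs. The function name is my own.

```python
# Default column order for BLAST+ -outfmt 6/7 (matches the "# Fields" line).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}


def parse_outfmt7(lines):
    """Yield one dict per hit line; '#' comment lines and blank lines are skipped."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        values = line.split()  # sequence IDs never contain whitespace
        if len(values) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}: {line!r}")
        hit = {}
        for name, value in zip(FIELDS, values):
            if name in INT_FIELDS:
                hit[name] = int(value)
            elif name in FLOAT_FIELDS:
                hit[name] = float(value)
            else:
                hit[name] = value
        yield hit
```

Usage: `with open("swiss-prot.tab") as fh: hits = list(parse_outfmt7(fh))`. For the last query above, this yields one hit with `pident` 99.08, `length` 217 and `mismatch` 2.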
3. Swiss-Prot annotation practice
$mkdir -p /home/train/swiss-prot
$cd /home/train/swiss-prot
$blast.pl blastp uniprot_sprot ../proteins.fasta 1e-5 4 uniprot_sprot 5
I keep getting stuck at this step; still investigating.
$parsing_blast_result.pl uniprot_sprot.xml 20 1e-5 0.2 > uniprot_sprot.xls
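`blast.pl` and `parsing_blast_result.pl` appear to be helper scripts from a training course rather than part of BLAST+. What their numeric arguments mean (4 and 5 for `blast.pl`; 20, 1e-5 and 0.2 for the parser) isn't documented here. As a hypothetical stand-in for a typical filtering step, the sketch below does the following:

- It keeps at most `max_hits` hits per query whose e-value is at or below `max_evalue`.
- It ranks the kept hits by bit score.

The names `filter_hits`, `max_hits` and `max_evalue` are illustrative. The sketch makes no attempt to reproduce the 0.2 argument. Hits are dicts with at least `qseqid`, `evalue` and `bitscore` keys.

```python
from collections import defaultdict


def filter_hits(hits, max_hits=20, max_evalue=1e-5):
    """Hypothetical filter: per query, keep hits with evalue <= max_evalue,
    sorted by bit score (highest first), truncated to max_hits."""
    by_query = defaultdict(list)
    for hit in hits:
        if hit["evalue"] <= max_evalue:
            by_query[hit["qseqid"]].append(hit)
    return {
        query: sorted(query_hits, key=lambda h: h["bitscore"], reverse=True)[:max_hits]
        for query, query_hits in by_query.items()
    }
```

Queries with no hit passing the cutoff are simply absent from the result, which makes "unannotated" queries easy to list afterwards.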
----------------------------------------
Fixing "bash: /home/sicong/blast/bin/parsing_blast_result.pl: Permission denied":
$cd /home/sicong/blast/bin/
$chmod 755 parsing_blast_result.pl
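The error means the script file lacks execute permission. In `chmod 755`, each octal digit is one permission set (owner, group, others), built from read=4, write=2 and execute=1. So 755 means rwxr-xr-x: the owner can read, write and execute, and everyone else can read and execute. The sketch below expresses the same change with Python's standard `os` and `stat` modules, which makes the bit arithmetic explicit. The helper name is my own.

```python
import os
import stat


def make_executable_755(path):
    """Equivalent of `chmod 755 path`; returns the mode that was set."""
    mode = (stat.S_IRWXU                   # owner:  read + write + execute = 7
            | stat.S_IRGRP | stat.S_IXGRP  # group:  read + execute = 5
            | stat.S_IROTH | stat.S_IXOTH) # others: read + execute = 5
    os.chmod(path, mode)
    return mode


# e.g. make_executable_755("/home/sicong/blast/bin/parsing_blast_result.pl")
```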
----------------------------------------
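Next I used blastx, which translates nucleotide queries (here, the Trinity transcripts in Trinity.fasta) in all six reading frames and aligns the translations against a protein database, in this case Swiss-Prot.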
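The six-frame translation that blastx performs on each query can be sketched in Python with the standard genetic code (NCBI translation table 1, which blastx uses by default; its `-query_gencode` option selects another table). This sketch is only for illustration and is not BLAST's own code. It reads frames +1 to +3 from the given strand and frames -1 to -3 from the reverse complement, with frame -1 starting at the last base. It drops incomplete trailing codons and translates codons containing non-ACGT bases as `X`.

```python
from itertools import product

BASES = "TCAG"
# Standard code, codons in TCAG order: TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon; '*' marks stop codons."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq):
    """Return {frame: protein} for frames +1..+3 and -1..-3."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


print(six_frame_translation("ATGGCCTAA")[1])  # MA*
```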
$makeblastdb -in uniprot_sprot.fasta -dbtype prot -title uniprot_sprot -parse_seqids -out uniprot_sprot -logfile uniprot_sprot.log
$cat uniprot_sprot.log
$blastx -help
$blastx -query Trinity.fasta -out swiss-prot_.tab -db uniprot_sprot -evalue 1e-5 -outfmt 7
$cat swiss-prot_.tab
# BLASTX 2.2.30+
# Query: TRINITY_DN105_c0_g1_i1 len=201 path=[179:0-200] [-1, 179, -2]
# Database: uniprot_sprot
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 280 hits found
TRINITY_DN105_c0_g1_i1 sp|P46595|UBC4_SCHPO 100.00 67 0 0 1 201 35 101 9e-43 141
TRINITY_DN105_c0_g1_i1 sp|Q9UVR2|UBC1_MAGO7 97.01 67 2 0 1 201 35 101 4e-42 139
TRINITY_DN105_c0_g1_i1 sp|O74196|UBC1_COLGL 95.52 67 3 0 1 201 35 101 2e-41 137
TRINITY_DN105_c0_g1_i1 sp|P15732|UBC5_YEAST 89.55 67 7 0 1 201 36 102 1e-39 133
TRINITY_DN105_c0_g1_i1 sp|P15731|UBC4_YEAST 88.06 67 8 0 1 201 36 102 2e-39 132
TRINITY_DN105_c0_g1_i1 sp|P61078|UB2D3_RAT 92.54 67 5 0 1 201 35 101 7e-39 131
TRINITY_DN105_c0_g1_i1 sp|Q5R4V7|UB2D3_PONAB 92.54 67 5 0 1 201 35 101 7e-39 131
TRINITY_DN105_c0_g1_i1 sp|P61079|UB2D3_MOUSE 92.54 67 5 0 1 201 35 101 7e-39 131
TRINITY_DN105_c0_g1_i1 sp|Q4R5N4|UB2D3_MACFA 92.54 67 5 0 1 201 35 101 7e-39 131
TRINITY_DN105_c0_g1_i1 sp|P61077|UB2D3_HUMAN 92.54 67 5 0 1 201 35 101 7e-39 131
TRINITY_DN105_c0_g1_i1 sp|P62840|UB2D2_XENLA 92.54 67 5 0 1 201 35 101 7e-39 131
TRINITY_DN105_c0_g1_i1 sp|P62839|UB2D2_RAT 92.54 67 5 0
...
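In blastx output, q. start and q. end are nucleotide positions on the query, while s. start and s. end and the alignment length are in amino acids. For the first hit above, positions 1 to 201 span 201 nt, or 67 codons. That matches the alignment length of 67 because the alignment has no gap opens. The helpers below recover the reading frame and the codon count from those coordinates. They assume BLAST+'s convention of reporting minus-strand hits with q. start > q. end, and they treat frame -1 as starting at the last query base. Both names are my own. Requesting `qframe` in a custom `-outfmt "7 ..."` column list is a way to cross-check them.

```python
def blastx_frame(qstart, qend, qlen):
    """Reading frame implied by 1-based, inclusive blastx query coordinates.

    Assumes minus-strand hits are reported with qstart > qend and that
    frame -1 begins at the last base of the query.
    """
    if qstart <= qend:
        return (qstart - 1) % 3 + 1
    return -((qlen - qstart) % 3 + 1)


def codons_spanned(qstart, qend):
    """Whole codons in the query interval (one per aligned residue when ungapped)."""
    return (abs(qend - qstart) + 1) // 3


# First hit of TRINITY_DN105_c0_g1_i1 (len=201): q. start 1, q. end 201
print(blastx_frame(1, 201, 201), codons_spanned(1, 201))  # 1 67
```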