I annotated our de novo assembled transcriptome with Sma3s (Sequence Massive Annotation with 3 modules) against the UniProt plant taxonomic division (i.e. uniprot_sprot_plants.dat.gz and uniprot_trembl_plants.dat.gz).
Following the instructions on the Sma3s website, I installed all the required scripts and modules. I ran the annotation on a 64-bit Ubuntu virtual machine (Ubuntu 12.04 LTS, Perl 5.14.2, NCBI BLAST 2.2.26).
When the annotation finished, I got a result file with a .annot suffix. However, Blast2GO doesn't accept the e-values and descriptions that Sma3s writes. So, as Antonio (one of the Sma3s developers) suggested, I extracted the columns containing the IDs and GO terms.
I used the commands below:
$ cut -f 1,4 all_annotation > two_column_annot
$ awk -F"\t" '{gsub (/(\@[0-9]\.?[0-9]*|\@[0-9]\.?[0-9]*\e\-[0-9]*)/, "", $0); print $0}' two_column_annot | sed -e 's/([^)(]*)//g' | sed -e 's/([^)(]*)//g' | tr ';' ',' > b2g.annot
In the second command above, I ran sed -e 's/([^)(]*)//g' twice because there were nested parentheses, and a single pass removes only the innermost level. I'm not good at writing shell commands, and the pipeline above is complicated, but it works well in my case.
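To see why the sed expression has to run twice, here is a minimal sketch on a made-up value. The sample string is hypothetical and not taken from real Sma3s output. The last line shows an alternative that I did not use in the original command: a sed loop that repeats the substitution until nothing is left to remove, so it should cope with deeper nesting as well.

```shell
# Hypothetical sample value (an assumption, not real Sma3s output).
sample='GO:0005515(protein binding (inferred))'

# One pass removes only the innermost group, because [^)(]* cannot
# match across another parenthesis.
one_pass=$(printf '%s\n' "$sample" | sed -e 's/([^)(]*)//g')

# A second pass removes the group that used to enclose it.
two_pass=$(printf '%s\n' "$one_pass" | sed -e 's/([^)(]*)//g')

# Alternative: label :a, substitute, and branch back (ta) while a
# substitution was made, which handles any nesting depth.
looped=$(printf '%s\n' "$sample" | sed -e ':a' -e 's/([^)(]*)//g' -e 'ta')

printf '%s\n' "$one_pass" "$two_pass" "$looped"
```

With only two levels of nesting, as in my data, the two chained sed calls and the loop give the same result.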
Open Blast2GO, then: File => Load Annotations (.annot), Annotation => GO-Slim => Run GO-Slim (online), Analysis => Make Combined Graph. Done.
I appreciate the generous help from Antonio and Evan. Without it, I would still be struggling with my sequences.
This article is reposted from: http://senhao.github.io/2013/07/10/import-sma3s-annotation-result-into-blast2go/