signalp工具+ 我的脚本

之前出现了一个bug
http://stackoverflow.com/questions/38605052/signalp-error-message-cant-locate-fasta-pm-in-inc

使用环境:ubuntu
下载signalp:http://www.cbs.dtu.dk/download/EBC48D58-1F4D-11E7-A2FF-F5A645117D33/
接着解压之后,查看readme,使用说明如下:

  1. Test SignalP on the 10 eukaryotic sequences shipped with the package: go to
    the signalp-4.1 directory on your system and issue the following commands:
    > ./signalp -t euk -f short test/euk10.fsa > euk10.fsa.short_out
    > ./signalp -t euk -f long test/euk10.fsa > euk10.fsa.long_out
    > ./signalp -t euk -f all test/euk10.fsa > euk10.fsa.all_out
    > ./signalp -t euk -f summary test/euk10.fsa > euk10.fsa.summary_out

然后我们要修改一下singalp的配置,打开signalp,编辑第14行,改成你自己文件的目录

BEGIN {
    #$ENV{SIGNALP} = '/usr/cbs/bio/src/signalp-4.1';
     $ENV{SIGNALP} = '/home/huangle/signalp-4.1';   
}

我的脚本是批处理序列的脚本,统计cutoff:

import re
import os
filename = 'test.fas'
directory = '/home/huangle/signalp-4.1/'
filepath = directory + filename
wfilepath = directory + filename+'.out'
pfilepath =directory + filename + '.short_out'
seq ={}#key: 'cazyme_name' value:  seq
cutoff ={}#key: 'cazyme_name' value:  cutoff
#shell exec

os.system('./signalp -t euk -f short '+ filename +' > ' + pfilepath)


#calculate the length of cazyme
f = open(filepath)
line = f.readline()
lenn = 0
line1 = re.split('\s',line)
line1 = line1[0].split('>')
key = line1[1]
tag =0
while line:

      line = f.readline()
      
      if line and line[0] == '>':
     seq[key] = lenn
     lenn=0
     line1 = re.split('\s',line)
     line1 = line1[0].split('>')
     key = line1[1]
      else:
           if line:
          lenn += len(line)-1

seq[key] = lenn
#print seq
f.close()
#calculate cutoff
f = open(pfilepath)
line = f.readline()
line = f.readline()
line = f.readline()
while line:
    line = re.split('\s\s+',line)
    key = line[0]
    n1 = int(line[2])
    n2 = int(line[4])
    n3 = int(line[6])
    MAX = -1
    if(n1>n2):
        if(n1>n3):
            MAX = n1
        else:
            MAX = n3
    else:
        if(n2>n3):
            MAX = n2
        else:
            MAX = n3    
    cutoff[key] = MAX-1
    line = f.readline()
#print cutoff
f.close()
#output
fw =open(wfilepath,'w')
title = "#cazyme_name   cayzme_length   Cleavage Site\n"
fw.write(title)
for (key,value) in seq.items():
    fw.write(key + '    ' + str(value) + '  ' + str(cutoff[key]) + '\n')
fw.close()
print "success"
os.system('gedit ' + wfilepath)

你可能感兴趣的:(signalp工具+ 我的脚本)