[Py005] GFF file processing 1

Goal: using the feature type in column 3, extract each mRNA record together with its exon records.
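As a small illustration of what "the type in column 3" means, the sketch below splits one GFF record on tabs and returns its third field. The helper name `feature_type` and the sample record are made up for this example and are not part of the original script.

```python
def feature_type(line):
    """Return the feature type (column 3) of a GFF record, or None otherwise."""
    # Comment/directive lines start with '#'; blank lines carry no record.
    if line.startswith("#") or not line.strip():
        return None
    fields = line.rstrip("\n").split("\t")
    # GFF records are tab-separated; the feature type is the third column.
    return fields[2] if len(fields) >= 3 else None


# Hypothetical sample record, for illustration only.
sample = "chr1\tsource\tmRNA\t100\t900\t.\t+\t.\tID=mRNA1\n"
print(feature_type(sample))
```

Comparing the third field directly is stricter than searching the whole line for `\tmRNA\t`, since it cannot match the same text in another column.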

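Approach:

Read one line at a time; when a line has the mRNA feature type, write it out.

Then check whether the next line has the mRNA or exon feature type; if it does, write it out and recursively check the line after that.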

import re
import sys

sys.setrecursionlimit(1000000)  # raise the maximum recursion depth (autoNext recurses once per consecutive mRNA/exon line)

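def autoNext(file, out):
    # Read the next line; if it is an mRNA or exon record, write it and recurse.
    # Any other line ends the recursion (that line is consumed and discarded).
    content = next(file)
    if re.search(r"\tmRNA\t", content) or re.search(r"\texon\t", content):
        out.write(content)
        return autoNext(file, out)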


with open("genome.gff", "r") as gff, open("mRNA.tmp.gff", "w") as outGFF:
    try:
        while True:
            line = next(gff)
            if re.search(r"\tmRNA\t", line):
                outGFF.write(line)
                autoNext(gff, outGFF)
                outGFF.flush()  # flush the write buffer promptly
    except StopIteration:  # next() raises this once the last line has been read
        pass
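The recursive script above uses one stack frame per consecutive mRNA/exon line, which is why it has to raise the recursion limit. As an alternative sketch (not the method used in this post), the same selection can be written as a plain loop with a flag. The generator name `extract_mrna_blocks` is hypothetical, and this version compares column 3 exactly instead of searching the whole line with a regex.

```python
def extract_mrna_blocks(lines):
    """Yield each mRNA line plus the mRNA/exon lines that directly follow it."""
    in_block = False  # True while we are inside a run started by an mRNA line
    for line in lines:
        fields = line.split("\t")
        ftype = fields[2] if len(fields) >= 3 else None
        if ftype == "mRNA":
            in_block = True
            yield line
        elif ftype == "exon" and in_block:
            yield line
        else:
            # Any other line (gene, CDS, comment, ...) ends the current run.
            in_block = False


# Hypothetical in-memory records, for illustration only.
demo = [
    "##gff-version 3\n",
    "chr1\tsrc\tgene\t1\t900\t.\t+\t.\tID=g1\n",
    "chr1\tsrc\tmRNA\t1\t900\t.\t+\t.\tID=m1;Parent=g1\n",
    "chr1\tsrc\texon\t1\t300\t.\t+\t.\tParent=m1\n",
    "chr1\tsrc\tCDS\t10\t300\t.\t+\t0\tParent=m1\n",
    "chr1\tsrc\texon\t500\t900\t.\t+\t.\tParent=m1\n",
]
selected = list(extract_mrna_blocks(demo))

# With real files, the generator can be streamed straight to the output:
# with open("genome.gff") as gff, open("mRNA.tmp.gff", "w") as out:
#     out.writelines(extract_mrna_blocks(gff))
```

As in the recursive version, an exon line that does not directly follow an mRNA/exon run is skipped, so the last exon in the demo data is not selected because a CDS line interrupts the run.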
