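Overview
Use a Python 3 script to count the number of bases in a FASTA file and compute its GC content.
Python 3 script
The FASTA file format itself is not covered here.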
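#!/usr/bin/env python3
import re
import sys

# Expects two arguments: infile outfile
_, infile, outfile = sys.argv
with open(infile) as f:
    # read() loads the whole file into memory
    seq = f.read()
# Remove header lines and line breaks
seq = re.sub(r'>.*\n|\n', '', seq)
size = len(seq)
nG = seq.count("g") + seq.count("G")
nC = seq.count("c") + seq.count("C")
# Despite the name, this is a fraction between 0 and 1
percent_GC = (nG + nC) / size
with open(outfile, 'w') as out:
    out.write("ID\tsize\tpercent_gc\n")
    # Strip the trailing .fasta extension to get the ID
    ID = re.sub(r'\.fasta$', '', infile)
    # End the record with \n
    out.write("{}\t{}\t{}\n".format(ID, size, percent_GC))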
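Because read() loads the whole file into memory, a very large assembly may be easier to handle one line at a time. The following is a minimal sketch of that idea, not part of the original script: the function name gc_stats and the in-memory example records are hypothetical and only serve as illustration.

```python
def gc_stats(lines):
    """Return (size, gc_fraction) for an iterable of FASTA lines."""
    size = 0
    gc = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines and header lines (those starting with '>')
        if not line or line.startswith(">"):
            continue
        size += len(line)
        upper = line.upper()
        gc += upper.count("G") + upper.count("C")
    # Guard against division by zero when there is no sequence at all
    return size, (gc / size if size else 0.0)


# Hypothetical records, used here instead of a real file
example = [">seq1", "ATGC", "ggcc", ">seq2", "AT"]
size, gc_fraction = gc_stats(example)
print(size, gc_fraction)
```

An open file handle can be passed in the same way, for example `with open(infile) as f: size, gc_fraction = gc_stats(f)`, since iterating over a file yields one line at a time.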
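Run the script to compute the base count and GC content
python3 script.py assembly.fasta out.file
# Usage: python3 script.py infile outfile (script.py is a placeholder for whatever name the script above was saved under)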