Why is PyPy slower?

Sample of the file:

407     206     399     474     380     505     378     262     16      307     463     239     137     518     114     470     
ENST00000456328.2_1     26.0522146463374        12.8134728632941        53.9639191995227        17.6675974730022        23.31847
ENST00000450305.2_1     0       0.71185960351634        0       0       0       4.07723611272235        0.95740454723925        
ENST00000488147.1_1     373.714527340564        453.454567439909        381.068290962784        539.450642842335        261.9898
ENST00000473358.1_1     0       0       0.830214141531119       1.17783983153348        1.37167476868719        0       0       
ENST00000469289.1_1     0       0       0       0       0       0       0       0.994054947983683       0       0       0       
ENST00000417324.1_1     0       0       0       0       0       0       0       0       0       0       0       0       0       
ENST00000461467.1_1     0       0       0       0       0       0       0       0       0       0       0       0       0       
ENST00000606857.1_1     0       0       0       0       0       0       0       0       0       0       0       0       0       
ENST00000642116.1_1     0       0       0       0       0       0       0       0       0       0       0       0       0       
ENST00000492842.2_2     0       0       0       0       0       0       0       0       0       0       0       0       0       
ENST00000641515.2_2     0       0       0.830214141531119       0       0       0       0       0       0       0       0       
ENST00000335137.4_2     0       0       0       0       0       0       0       0       0       0       0       0       0       
ENST00000466430.5_1     6.28846560428834        7.83045563867974        6.64171313224895        14.1340779784018        1.371674
ENST00000477740.5_1     0       0       0       0       0       0       0       0       0       0       0       0       0       
ENST00000471248.1_1     0.898352229184049       2.13557881054902        3.32085656612448        0       0       2.03861805636117
ENST00000610542.1_1     0       0       0       0       0       0       0       0       0       0       0       0       0       
ENST00000453576.2_1     0       0       0       0       0       0       0       0       0       0       0       0       0       
ENST00000495576.1_1     0       0       0       0       0       0       0       0       0       0       0       0       0       
ENST00000442987.3_1     8.08517006265644        21.3557881054902        16.6042828306224        9.42271865226786        15.08842

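This is a standard gene expression matrix file in CSV-style delimited text (the script below splits it on tabs). It is 352 MB in size and has 208,938 lines.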

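Next, without using third-party libraries such as pandas, I strip the suffix from the gene names in the first column. To check whether PyPy speeds up for loops, I deliberately use a for loop rather than a list comprehension.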
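To make the transformation concrete, here is a minimal sketch on a single row. The ID and the first three values are copied from the sample above; the row is shortened to three value columns, the tab delimiter is assumed from the script's use of split("\t"), and strip_version is a hypothetical name used only for this illustration.

```python
def strip_version(transcript_id):
    # Keep only the text before the first "." in the ID.
    return transcript_id.split(".")[0]

# One data row, shortened to three value columns for illustration.
# Tab-separated columns are an assumption based on the script's split("\t").
row = "ENST00000456328.2_1\t26.0522146463374\t12.8134728632941\t53.9639191995227\n"

name, *values = row.split("\t")
new_row = "\t".join([strip_version(name)] + values)
print(repr(new_row))
# 'ENST00000456328\t26.0522146463374\t12.8134728632941\t53.9639191995227\n'
```

Only the first column changes; the value columns and the trailing newline pass through untouched.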

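import datetime

starttime = datetime.datetime.now()

def split_fs(name):
    # Drop everything after the first "." (for example the ".2_1" suffix).
    return name.split(".")[0]

with open("mRNA_normlized_by_deseq_quan.txt", "r") as f:
    first_line = next(f)  # header row, copied unchanged
    fin = [first_line]
    for line in f:
        if line:  # always true here: every line read from the file is non-empty
            name, *ot = line.split("\t")
            new = [split_fs(name)]
            new.extend(ot)
            new_line = "\t".join(new)
            fin.append(new_line)

# "w" instead of "a", so that rerunning does not append to an old test.txt
with open("test.txt", "w") as l:
    l.writelines(fin)

endtime = datetime.datetime.now()
# .seconds is the whole-second part of the elapsed time
print("time = ", (endtime - starttime).seconds)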

With the code as above, run it directly with python3:

python3 test.py
time = 4

The data processing finished in about 4 seconds.

Now run it with the latest PyPy release, pypy-3.6:

pypy3 test.py
time = 15 

Here pypy3 is, surprisingly, nearly 4 times slower than python3 (15 s versus 4 s). What is the reason?
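One way to narrow the question down is to time the reading, transforming, and writing phases separately under both interpreters. The sketch below is not the original script: it reads all lines up front with readlines() so that the phases can be separated, and timed_phases and strip_version are hypothetical names. One hypothesis worth testing with it (an assumption, not something the measurements above establish) is that most of the time goes to file I/O and built-in string methods rather than to the for loop itself.

```python
import time

def strip_version(name):
    # Same transformation as split_fs in the script.
    return name.split(".")[0]

def timed_phases(in_path, out_path):
    """Return the seconds spent in the read, transform, and write phases."""
    timings = {}

    # Phase 1: read every line into memory.
    t0 = time.perf_counter()
    with open(in_path, "r") as f:
        lines = f.readlines()
    timings["read"] = time.perf_counter() - t0

    # Phase 2: rewrite the first column of each data row with a for loop.
    t0 = time.perf_counter()
    out = lines[:1]  # header row, copied unchanged
    for line in lines[1:]:
        name, *rest = line.split("\t")
        out.append("\t".join([strip_version(name)] + rest))
    timings["transform"] = time.perf_counter() - t0

    # Phase 3: write the result.
    t0 = time.perf_counter()
    with open(out_path, "w") as f:
        f.writelines(out)
    timings["write"] = time.perf_counter() - t0

    return timings

# Usage, with the file names from the script:
# for phase, secs in timed_phases("mRNA_normlized_by_deseq_quan.txt", "test.txt").items():
#     print(phase, round(secs, 2))
```

Running the same file through python3 and pypy3 and comparing the three numbers shows which phase accounts for the gap. time.perf_counter() is used here because it returns fractional seconds, whereas the .seconds attribute in the script reports whole seconds only.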
