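Python performance optimization usually follows one of two approaches: 1) JIT (just-in-time) compilation, where selected functions are compiled to machine code at run time; Numba and PyPy work this way. 2) Translating the Python code into C/C++ and compiling it ahead of time; Cython and Nuitka work this way. In the benchmark below, Cython with declared types is the fastest option, ahead of Numba and PyPy, but Cython without type declarations is slower than both JITs, so ahead-of-time compilation pays off mainly once static types are declared.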
1. First, a pure-Python function that finds prime numbers, and the same function optimized with Numba's jit decorator
# main.py
# Prime search written in pure Python
def primes_python(nb_primes):
    p = []
    n = 2
    while len(p) < nb_primes:
        # Is n prime?
        for i in p:
            if n % i == 0:
                break
        # If no break occurred in the loop
        else:
            p.append(n)
        n += 1
    return p
# The Numba version: the same function with one extra line, the @jit decorator
from numba import jit
@jit
def primes_jit(nb_primes):
    p = []
    n = 2
    while len(p) < nb_primes:
        # Is n prime?
        for i in p:
            if n % i == 0:
                break
        # If no break occurred in the loop
        else:
            p.append(n)
        n += 1
    return p
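Both versions rely on Python's for ... else construct: the else branch runs only when the loop finishes without hitting break. A minimal sketch of that behaviour in isolation (the helper name has_no_divisor_in is made up for illustration and is not part of the benchmark):

```python
def has_no_divisor_in(n, candidates):
    """Return True if no element of candidates divides n (hypothetical helper)."""
    for d in candidates:
        if n % d == 0:
            break          # a divisor was found; the else branch is skipped
    else:
        return True        # the loop ran to completion without break
    return False


print(has_no_divisor_in(7, [2, 3, 5]))   # True
print(has_no_divisor_in(9, [2, 3, 5]))   # False
```

In the prime search, the list of candidates is the list of primes found so far, which is why a number that survives the loop can be appended.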
2. Create a primes.pyx file containing a Cython function that declares the types of its variables
# primes.pyx
def primes(int nb_primes):
    cdef int n, i, len_p
    cdef int p[1000]
    # The C array holds at most 1000 entries, so clamp the request.
    if nb_primes > 1000:
        nb_primes = 1000
    len_p = 0  # The current number of elements in p.
    n = 2
    while len_p < nb_primes:
        # Is n prime?
        for i in p[:len_p]:
            if n % i == 0:
                break
        # If no break occurred in the loop, we have a prime.
        else:
            p[len_p] = n
            len_p += 1
        n += 1
    # Let's return the result in a python list:
    result_as_list = [prime for prime in p[:len_p]]
    return result_as_list
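One consequence of the fixed-size C array is that the function never returns more than 1000 primes, whatever the caller asks for. The following pure-Python model mirrors that clamping logic. It is an illustration only, not the compiled code, and MAX_PRIMES and primes_capped are names invented here:

```python
MAX_PRIMES = 1000  # mirrors the size of `cdef int p[1000]`


def primes_capped(nb_primes):
    """Pure-Python model of primes.pyx: a preallocated buffer plus a fill counter."""
    nb_primes = min(nb_primes, MAX_PRIMES)   # same clamp as the .pyx version
    p = [0] * MAX_PRIMES                     # fixed-size buffer
    len_p = 0                                # number of slots in use
    n = 2
    while len_p < nb_primes:
        for i in p[:len_p]:
            if n % i == 0:
                break
        else:
            p[len_p] = n
            len_p += 1
        n += 1
    return p[:len_p]


print(len(primes_capped(1500)))  # 1000: the request is silently clamped
print(primes_capped(5))          # [2, 3, 5, 7, 11]
```

The benchmark later in this post asks for exactly 1000 primes, so it stays within the limit.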
Next, create a primes_python.pyx file containing the same function as the pure-Python version above, for comparison.
# primes_python.pyx
def primes_python(nb_primes):
    p = []
    n = 2
    while len(p) < nb_primes:
        # Is n prime?
        for i in p:
            if n % i == 0:
                break
        # If no break occurred in the loop
        else:
            p.append(n)
        n += 1
    return p
Create a setup.py file to compile the .pyx files
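# setup.py
# distutils was removed in Python 3.12; setuptools provides the same setup()
from setuptools import setup
from Cython.Build import cythonize
setup(
    # annotate=True also writes an HTML annotation report for each module
    ext_modules=cythonize(["primes.pyx", "primes_python.pyx"],
                          annotate=True)
)
# Build with this command:
# python setup.py build_ext --inplace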
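Running python setup.py build_ext --inplace produces compiled extension modules (.pyd files on Windows, .so files on Linux and macOS) that can be imported like any other Python module.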
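The exact file suffix depends on the platform and interpreter. As a small sketch using only the standard library, the interpreter can list the suffixes it accepts for extension modules (looks_like_extension is a hypothetical helper added here for illustration):

```python
import importlib.machinery

# File suffixes this interpreter recognises as compiled extension modules.
suffixes = importlib.machinery.EXTENSION_SUFFIXES
print(suffixes)


def looks_like_extension(filename):
    """Hypothetical helper: True if filename ends with one of the suffixes above."""
    return any(filename.endswith(s) for s in suffixes)


# After a successful build, `import primes; print(primes.__file__)` should
# show a path for which looks_like_extension() returns True.
```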
3. Update main.py to call the functions and time them
# main.py, complete contents
import primes
import primes_python
import timeit
from numba import jit
# Note: the function below rebinds the name primes_python in this module.
# The timeit setup strings re-import the compiled module by name, so t4
# still measures the Cython build.
def primes_python(nb_primes):
    p = []
    n = 2
    while len(p) < nb_primes:
        # Is n prime?
        for i in p:
            if n % i == 0:
                break
        # If no break occurred in the loop
        else:
            p.append(n)
        n += 1
    return p
@jit
def primes_jit(nb_primes):
    p = []
    n = 2
    while len(p) < nb_primes:
        # Is n prime?
        for i in p:
            if n % i == 0:
                break
        # If no break occurred in the loop
        else:
            p.append(n)
        n += 1
    return p
if __name__ == "__main__":
    repeat_times = 1000
    t1 = timeit.timeit(stmt="primes_python(1000)",
                       setup="from __main__ import primes_python", number=repeat_times)
    print(f"run in python: {t1}s")
    t2 = timeit.timeit(stmt="primes.primes(1000)",
                       setup="import primes", number=repeat_times)
    print(f"run cython with cdef: {t2}s")
    t3 = timeit.timeit(stmt="primes_jit(1000)",
                       setup="from __main__ import primes_jit", number=repeat_times)
    print(f"run in python with numba jit: {t3}s")
    t4 = timeit.timeit(stmt="primes_python.primes_python(1000)",
                       setup="import primes_python", number=repeat_times)
    print(f"run cython without cdef: {t4}s")
Running main.py gives the following results (total time for 1000 calls each):
run in python: 28.519053545829927s
run cython with cdef: 1.6289360376895452s
run in python with numba jit: 2.0565857326599577s
run cython without cdef: 13.221758278866588s
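The Numba figure above most likely includes the one-off compilation that happens on the first call to primes_jit, since nothing calls the function before the timer starts. One way to separate that cost, sketched here with only the standard library, is to call the function once before timing and to report the best of several repeats. The helper name best_time and the stand-in workload are illustrative and not part of the original script:

```python
import timeit


def best_time(func, *args, number=100, repeat=5):
    """Warm up once, then return the fastest total time over `repeat` rounds."""
    func(*args)  # warm-up call; for a JIT-compiled function this triggers compilation
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number))


def sum_of_squares(n):
    # Stand-in workload; replace with primes_python, primes_jit, etc.
    return sum(i * i for i in range(n))


print(f"best of 5: {best_time(sum_of_squares, 1000):.6f}s for 100 calls")
```

Taking the minimum rather than the mean reduces the influence of other processes that happen to run during a measurement.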
4. To test PyPy, create a primes_pypy.py file:
# primes_pypy.py
import timeit
def primes_python(nb_primes):
    p = []
    n = 2
    while len(p) < nb_primes:
        # Is n prime?
        for i in p:
            if n % i == 0:
                break
        # If no break occurred in the loop
        else:
            p.append(n)
        n += 1
    return p
if __name__ == "__main__":
    repeat_times = 1000
    t1 = timeit.timeit(stmt="primes_python(1000)",
                       setup="from __main__ import primes_python", number=repeat_times)
    print(f"run in pypy: {t1}s")
Run the file with pypy3 primes_pypy.py. The result is:
run in pypy: 3.0445395345987682s
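The script labels its output "run in pypy" regardless of which interpreter actually runs it. As a small optional safeguard (an addition for illustration, not part of the original script), the standard library can report the interpreter in use so the label can be checked:

```python
import platform
import sys

impl = platform.python_implementation()   # e.g. "CPython" or "PyPy"
version = platform.python_version()
print(f"running on {impl} {version} ({sys.executable})")
```

Printing this line at the top of a benchmark script makes it harder to mix up results from python and pypy3 runs.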
5. I have not managed to get Nuitka working yet. The combined results are:
run in python: 28.519053545829927s
run cython with cdef: 1.6289360376895452s
run in python with numba jit: 2.0565857326599577s
run cython without cdef: 13.221758278866588s
run in pypy: 3.0445395345987682s
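To make the comparison easier to read, the timings can be turned into speedup factors relative to plain CPython. The sketch below simply divides the baseline by each measurement; the dictionary keys are shortened labels chosen here, and the numbers are copied from the results above:

```python
# Total times in seconds for 1000 calls, copied from the results above.
timings = {
    "python": 28.519053545829927,
    "cython with cdef": 1.6289360376895452,
    "numba jit": 2.0565857326599577,
    "cython without cdef": 13.221758278866588,
    "pypy": 3.0445395345987682,
}

baseline = timings["python"]
speedups = {name: baseline / t for name, t in timings.items()}

# Print the fastest variant first.
for name, factor in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>20}: {factor:5.1f}x")
```

This works out to about 17.5x for typed Cython, 13.9x for Numba, 9.4x for PyPy and 2.2x for untyped Cython.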
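Overall, JIT compilation gives a clear speedup (roughly 14x with Numba and 9x with PyPy in this test) with almost no changes to the Python code.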