Writing Faster Python Code by Compiling with Cython

Original article: http://www.behnel.de/cython200910/talk.html. The original text follows.

About myself

  • Passionate Python developer since 2002

    • after Basic, Logo, Pascal, Prolog, Scheme, Java, C, ...

  • CS studies in Germany, Ireland, France

  • PhD in distributed systems in 2007

    • Language design for self-organising systems

    • Darmstadt University of Technology, Germany

  • Current occupations:

    • Employed by Senacor Technologies AG, Germany

      • IT transformations, SOA design, Java-Development, ...

    • »lxml« OpenSource XML toolkit for Python

      • http://codespeak.net/lxml/

    • »Cython«

Part 1: Intro to Cython

  • Part 1: Intro to Cython

  • Part 2: Building Cython modules

  • Part 3: Writing fast code

  • Part 4: Talking to other extensions

What is Cython?

Cython is the missing link between the simplicity of Python and the speed of C / C++ / Fortran.

[Figure: Simplicity vs. Speed]

What is Cython?

Cython is

  • an Open-Source project

    • http://cython.org

    • http://pypi.python.org/pypi/Cython

  • a Python compiler (almost)

    • an enhanced, optimising fork of Pyrex

  • an extended Python language for

    • writing fast Python extension modules

    • interfacing Python with C libraries

Major Cython Core Developers

  • Robert Bradshaw, Stefan Behnel, Dag Sverre Seljebotn

    • lead developers

  • Lisandro Dalcín

    • C/C++ portability and various feature patches

  • Kurt Smith, Danilo Freitas

    • Google Summer of Code 2009: Fortran/C++ integration

  • Greg Ewing

    • main developer and maintainer of Pyrex

  • many, many others - see

    • http://cython.org/

    • the mailing list archives of Cython and Pyrex

How to use Cython

  • you write Python code

    • Cython translates it into C code

    • your C compiler builds a shared library for CPython

    • you import your module into CPython

  • Cython has support for

    • distutils

      • optionally compile Python code from setup.py!

      • Cython does that for its own modules :-)

    • embedding the CPython runtime in an executable

Example: compiling Python code

# file: worker.py
class HardWorker(object):
    u"Almost Sisyphos"
    def __init__(self, task):
        self.task = task

    def work_hard(self, repeat=100):
        for i in range(repeat):
            self.task()

def add_simple_stuff():
    x = 1+1

HardWorker(add_simple_stuff).work_hard()

Example: compiling Python code

  • compile with

    $ cython worker.py
  • translates to a ~1500-line .c file (Cython 0.11.3)

    • lots of portability #defines

      • for different C compilers, Python versions, ...

    • tons of helpful C comments with Python code snippets

      • help you trace your own code in the generated sources

    • a lot of code that you don't want to write yourself

Portable Code

  • Cython compiler generates C code that compiles

    • with all major compilers (C and C++)

    • on all major platforms

    • in Python 2.3 through 3.1

  • Cython language syntax follows Python 2.6

    • optional Python 3 syntax support is on the TODO list

      • get involved to get it quicker!

... the fastest way to port Python 2 code to Py3 ;-)

Python language feature support

  • most of Python 2 syntax is supported

    • top-level classes and functions

    • control structures: loops, with, try-except/finally, ...

    • object operations, arithmetic, ...

  • plus many Py3 features:

    • list/set/dict comprehensions

    • keyword-only arguments

    • extended iterable unpacking (a,b,*c,d = some_list)
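
The last two Python 3 features can be tried in a plain Python 3 interpreter. In this minimal sketch, `connect` and its `timeout` parameter are hypothetical names used only for illustration:

```python
# keyword-only arguments: every parameter after the bare `*` must be passed by name
def connect(host, *, timeout=10):  # hypothetical function, for illustration only
    return host, timeout

# extended iterable unpacking: the starred name collects the middle items as a list
a, b, *c, d = [1, 2, 3, 4, 5, 6]

print(connect("localhost", timeout=5))  # ('localhost', 5)
print(a, b, c, d)                       # 1 2 [3, 4, 5] 6
```

Calling `connect("localhost", 5)` raises a TypeError, because `timeout` can only be passed by keyword.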

Python features in the works

  • Inner functions with closures

    def factory(a,b):
        def closure_function(c):
            return a+b+c
        return closure_function
    • status: (hopefully) to be merged for 0.12
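
In plain Python, the closure above behaves as follows; this is the behaviour a compiled inner function has to reproduce. `add_three` is an illustrative name:

```python
def factory(a, b):
    def closure_function(c):
        # a and b are free variables captured from the enclosing call
        return a + b + c
    return closure_function

add_three = factory(1, 2)   # binds a=1, b=2
print(add_three(10))        # 13

# CPython keeps the captured values alive in cell objects attached to the function
print(sorted(cell.cell_contents for cell in add_three.__closure__))  # [1, 2]
```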

Planned Cython features

  • improved C++ integration (GSoC 2009)

    • e.g. function/operator overloading support

    • status: mostly there, to be finished and integrated

  • improved Fortran integration (GSoC 2009)

    • talking to Fortran code directly

    • status: mostly there, to be finished and integrated

  • native array data type with SIMD behaviour

    • status: large interest, implementation pending

... as usual: great ideas, little time

Currently unsupported

  • local/inner classes (~open)

  • lambda expressions (~easy)

  • generators (~needs work)

  • generator expressions (~easy)

    • with obvious optimisations, e.g.

      set(x.a for x in some_list) == {x.a for x in some_list}

... all certainly on the TODO list for 1.0.
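
The optimisation relies on both spellings producing the same set, while the comprehension avoids building a generator object and calling set() on it. A quick plain-Python check, assuming a hypothetical `Item` record type that supplies the `.a` attribute:

```python
from collections import namedtuple

Item = namedtuple("Item", ["a"])  # hypothetical record type with an attribute `a`
some_list = [Item(1), Item(2), Item(2), Item(3)]

via_genexpr = set(x.a for x in some_list)   # generator expression passed to set()
via_setcomp = {x.a for x in some_list}      # set comprehension

print(via_genexpr == via_setcomp)  # True
print(sorted(via_setcomp))         # [1, 2, 3]
```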

Speed

Cython generates very efficient C code:

  • PyBench: most benchmarks run 20-80% faster

    • conditions and loops run 5-8x faster than in Py2.6.2

    • overall about 30% faster for plain Python benchmark

    • obviously, real applications are different

  • PyPy's richards.py benchmark:

    • heavily class-based scheduler

    • 20% faster than CPython 2.6.2

Type declarations

Cython supports optional type declarations that

  • can be employed exactly where performance matters

  • let Cython generate plain C instead of C-API calls

  • make richards.py benchmark 5x faster than CPython

    • without Python code modifications :)

  • can make code 100-1000x faster than CPython

    • expect speedups of several hundred times in calculation loops

Part 2: Building Cython modules

Ways to build Cython code

To compile Python code (.py) or Cython code (.pyx)

  • you need:

    • Cython, Python and a C compiler

  • you can use:

    • distutils

      • setup.py script (likely required anyway)

    • pyximport

      • on-the-fly build + import (for experiments)

    • Sage notebook

      • web app that supports writing and running Cython code

    • cython source.pyx + manual C compilation

Example: distutils

  • A minimal setup.py script:

    from distutils.core import setup
    from distutils.extension import Extension
    from Cython.Distutils import build_ext
    
    ext_modules = [Extension("worker", ["worker.py"])]
    
    setup(
      name = 'stupid little app',
      cmdclass = {'build_ext': build_ext},
      ext_modules = ext_modules
    )
  • Run with

    $ python setup.py build_ext --inplace

Example: pyximport

Build and import Cython code files (.pyx) on the fly

$ ls
worker.pyx
$ PYTHONPATH=. python
Python 2.6.2 (r262:71600, Apr 17 2009, 11:29:30)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyximport
>>> pyximport.install()
>>> import worker
>>> worker
<module 'worker' from '~/.pyxbld/.../worker.so'>
>>> worker.HardWorker
<class 'worker.HardWorker'>
>>> worker.HardWorker(worker.add_simple_stuff).work_hard()

pyximporting Python modules

  • pyximport can also compile Python modules:

>>> import pyximport
>>> pyximport.install(pyimport = True)
>>> import shlex
[lots of compiler errors from different modules ...]
>>> help(shlex)

  • currently works for a few stdlib modules

  • falls back to normal Python import automatically

  • not production ready, but nice for testing :)

Writing executable programs

# file: hw.py
def hello_world():
    import sys
    print "Welcome to Python %d.%d!" % sys.version_info[:2]

if __name__ == '__main__':
    hello_world()

Compile, link and run:

$ cython --embed hw.py   # <- embed a main() function
$ gcc $CFLAGS -I/usr/include/python2.6 \
      -o hw hw.c -lpython2.6 -lpthread -lm -lutil -ldl
$ ./hw
Welcome to Python 2.6!

Part 3: Writing fast code

A simple example

  • Plain Python code:

# integrate_py.py
from math import sin

def f(x):
    return sin(x**2)

def integrate_f(a, b, N):
    dx = (b-a)/N
    s = 0
    for i in range(N):
        s += f(a+i*dx)
    return s * dx
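
The function computes a left Riemann sum: N rectangles of width dx = (b-a)/N, each evaluated at its left edge. The Python 3 sketch below runs it and compares the result with the Taylor series of the integral, where the integral of sin(x**2) over [0, 1] equals the sum over k of (-1)**k / ((2k+1)! * (4k+3)); `series_reference` is an illustrative helper, not part of the original example. Under Python 2, integer arguments such as integrate_f(0, 1, 100) turn (b-a)/N into floor division and return zero, so pass floats there; the typed Cython version avoids this by declaring a and b as double.

```python
from math import factorial, sin

def f(x):
    return sin(x**2)

def integrate_f(a, b, N):
    """Left Riemann sum of f over [a, b] with N rectangles."""
    dx = (b - a) / N          # true division in Python 3
    s = 0
    for i in range(N):
        s += f(a + i * dx)    # f evaluated at the left edge of rectangle i
    return s * dx

def series_reference(terms=10):
    """Integral of sin(x**2) over [0, 1] from its Taylor series (illustrative helper)."""
    return sum((-1) ** k / (factorial(2 * k + 1) * (4 * k + 3)) for k in range(terms))

approx = integrate_f(0.0, 1.0, 100000)
exact = series_reference()
print("%.6f %.6f" % (approx, exact))
```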

Type declarations in Cython

Function arguments are easy

  • Python:

    def f(x):
        return sin(x**2)
  • Cython:

    def f(double x):
        return sin(x**2)

Type declarations in Cython

»cdef« keyword declares

  • variables with C or builtin types

    cdef double dx, s
  • functions with C signatures

    cdef double f(double x):
        return sin(x**2)
  • classes as 'builtin' extension types

    cdef class MyType:
        cdef int field

Functions: def vs. cdef vs. cpdef

  • def func(int x):

    • part of the Python module API

    • Python call semantics

  • cdef int func(int x):

    • C signature

    • C call semantics

  • cpdef int func(int x):

    • Python wrapper around cdef function

    • C calls cdef function, Python calls wrapper

    • note: modified C signature!

Typed arguments and return values

  • def func(int x):

    • caller passes Python objects for x

    • function converts to int on entry

    • implicit return type is always object

  • cdef int func(int x):

    • caller converts arguments as required

    • function receives C int for x

    • arbitrary return type, defaults to object

  • cpdef int func(int x):

    • the Python wrapper converts arguments and return value

    • C callers convert arguments as required

    • Python callers pass and receive objects

A simple example: Python

# integrate_py.py
from math import sin

def f(x):
    return sin(x**2)

def integrate_f(a, b, N):
    dx = (b-a)/N
    s = 0
    for i in range(N):
        s += f(a+i*dx)
    return s * dx

A simple example: Cython

# integrate_cy.pyx
cdef extern from "math.h":
    double sin(double x)

cdef double f(double x):
    return sin(x**2)

cpdef double integrate_f(double a, double b, int N):
    cdef double dx, s
    cdef int i

    dx = (b-a)/N
    s = 0
    for i in range(N):
        s += f(a+i*dx)
    return s * dx

Overriding declarations in .pxd

Python: integrate_py.py

# integrate_py.py
from math import sin

def f(x):
    return sin(x**2)

def integrate_f(a, b, N):
    dx = (b-a)/N
    s = 0
    for i in range(N):
        s += f(a+i*dx)
    return s * dx

Cython: integrate_py.pxd

# integrate_py.pxd
cimport cython

cpdef double f(double x)

@cython.locals(
    dx=double, s=double, i=int)
cpdef integrate_f(
    double a, double b, int N)

The .pxd file used

# integrate_py.pxd
cimport cython

cpdef double f(double x)

cpdef double integrate_f(double a, double b, int N)

Overriding declarations in .pxd

  • advantage:

    • plain Python code

      • runs unchanged in Python interpreter

    • complete Python tool-chain available

      • Eclipse, pylint, 2to3, ...

  • drawback:

    • no access to C functions

      • cannot override from math import sin

Typing in Python syntax

  • http://wiki.cython.org/pure

from math import sin
import cython

@cython.locals(x=cython.double)
def f(x):
    return sin(x**2)

@cython.locals(a=cython.double, b=cython.double,
               N=cython.Py_ssize_t, dx=cython.double,
               s=cython.double, i=cython.Py_ssize_t)
def integrate_f(a, b, N):
    dx = (b-a)/N
    s = 0
    for i in range(N):
        s += f(a+i*dx)
    return s * dx
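
Cython provides a pure-Python `cython` module, so code written this way also runs uncompiled in a normal interpreter. If Cython may be missing entirely, one option is a local fallback. The `_CythonStandIn` class below is a hypothetical stand-in, not part of Cython: it mimics only the names used here and turns `@cython.locals` into a no-op.

```python
from math import sin

try:
    import cython  # Cython's pure-Python module, available when Cython is installed
except ImportError:
    # Hypothetical minimal stand-in (not Cython's API): type names map to
    # Python types and @cython.locals returns the function unchanged.
    class _CythonStandIn:
        double = float
        Py_ssize_t = int

        @staticmethod
        def locals(**types):
            def decorator(func):
                return func
            return decorator

    cython = _CythonStandIn()

@cython.locals(x=cython.double)
def f(x):
    return sin(x**2)

print(f(0.0))
```

Either way, the decorated function behaves like the undecorated one when it is not compiled.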

Declaring Python types

  • Access to Python's builtins is heavily optimised

    • for ... in range()/list/tuple/dict

    • list.append(), list.reverse()

    • set([...]), tuple([...])

  • Further improvements in Cython 0.12

    • replacements for enumerate(), type()

    • dict([...]), unicode.encode(), list.sort()

  • Declaring Python types is often worth it!

  • Easy to add new optimisations

    • don't write prematurely optimised code, fix Cython!

Declaring Python types: dict

  • example: dict iteration

def filter_a(d):
    return { key : value
             for key, value in d.iteritems()
             if 'a' not in value }

import string
d = { s:s for s in string.ascii_letters }
print filter_a(d)

Declaring Python types: dict

  • simple change, ~30% faster:

def filter_a(dict d):       # <====
    return { key : value
             for key, value in d.iteritems()
             if 'a' not in value }

import string
d = { s:s for s in string.ascii_letters }
print filter_a(d)

  • drawback:

    • non-dict mapping arguments raise a TypeError
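
For comparison, here is the untyped function in Python 3 syntax (`d.items()` instead of `d.iteritems()`, `print()` as a function). Without the `dict` declaration it accepts any mapping that has `.items()`, for example the read-only `types.MappingProxyType` view; per the slide, the `dict d` version compiled by Cython would reject such an argument with a TypeError.

```python
import string
from types import MappingProxyType

def filter_a(d):
    # keep every entry whose value does not contain the letter 'a'
    return {key: value
            for key, value in d.items()   # Python 3 spelling of d.iteritems()
            if 'a' not in value}

d = {s: s for s in string.ascii_letters}
result = filter_a(d)
print(len(result))  # 51: only the 'a' entry is dropped

# the untyped version also accepts a non-dict mapping such as a read-only view
print(filter_a(MappingProxyType(d)) == result)  # True
```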

Think twice before you type

[Figure: Simplicity vs. Speed]

  • benchmark code before adding static types!
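
A minimal timing helper in the spirit of that advice, using the standard-library timeit module; `best_time` and `baseline` are illustrative names. Running the same measurement against the plain-Python module and the compiled one shows whether a type declaration actually pays off.

```python
import timeit

def best_time(func, number=1000, repeat=5):
    """Return the fastest observed time per call of func(), in seconds."""
    # timeit.repeat returns one total (in seconds) for each run of `number` calls;
    # the minimum is the estimate least disturbed by other system activity
    runs = timeit.repeat(func, number=number, repeat=repeat)
    return min(runs) / number

def baseline():
    return sum(i * i for i in range(1000))

if __name__ == "__main__":
    print("baseline: %.1f microseconds per call" % (best_time(baseline) * 1e6))
```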

Classes

  • class MyClass(object):

    • Python class with __dict__

    • multiple inheritance

    • arbitrary Python attributes

    • Python methods

    • monkey-patchable, etc.

  • cdef class MyClass(SomeSuperClass):

    • "builtin" extension type

    • single inheritance

      • only from other extension types!

    • fixed, typed fields

      • C-only access by default, or readonly/public

    • Python + C methods

cdef classes - when to use them?

  • Use cdef classes

    • e.g. whenever wrapping C structs/pointers/etc.

    • when C attribute types are used

    • when the need for speed beats Python's generality

  • Use Python classes

    • for bytes/tuple subtypes (PyVarObject)

    • for exceptions if Py<2.5 compatibility is required

    • when multiple inheritance is required

    • when users are allowed to monkey-patch

Part 4: Talking to other extensions

Talking to other extensions

  • Python 3 buffer protocol (available in Py2.6)

  • external C-APIs

Python 3 buffer protocol

  • Native support for new Python buffer protocol

    • PEP 3118

def inplace_invert_2D_buffer(
        object[unsigned char, 2] image):
    cdef int i, j
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            image[i, j] = 255 - image[i, j]

  • can be supported for extension types in Py2.x

    • declared through .pxd files

    • Cython ships with numpy.pxd

    • array.pxd available (stdlib's array)
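
Plain Python 3 exposes the same PEP 3118 buffer protocol through memoryview. The sketch below mirrors the loop above on a writable bytearray viewed as a 2-D array of unsigned bytes (format 'B'); `inplace_invert_2d` and its `height`/`width` parameters are illustrative names. It runs at interpreter speed, which is exactly the kind of loop the typed Cython version is meant to accelerate.

```python
def inplace_invert_2d(buf, height, width):
    """Invert an 8-bit grayscale image stored row by row in a writable buffer."""
    # cast the flat byte buffer to a 2-D view of unsigned chars (format 'B');
    # this requires len(buf) == height * width
    image = memoryview(buf).cast('B', shape=[height, width])
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            image[i, j] = 255 - image[i, j]   # tuple indexing needs Python 3.5+
    image.release()

pixels = bytearray([0, 10, 255, 128, 1, 2])   # a 2 x 3 "image"
inplace_invert_2d(pixels, 2, 3)
print(list(pixels))  # [255, 245, 0, 127, 254, 253]
```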

Conclusion

  • Cython is a tool for

    • translating Python code to efficient C

    • easily interfacing to external C/C++/Fortran code

  • Use it to

    • speed up existing Python modules

      • concentrate on optimisations, not rewrites!

    • write C extensions for CPython

      • don't change the language just to get fast code!

    • wrap C libraries in Python

      • concentrate on the mapping, not the glue!

... but Cython is also

  • a great project

  • a very open playground for great ideas!

Cython

C-Extensions in Python

... use it, and join the project!

http://cython.org/
