Why is truth() faster than bool()? Part II

In this post we’ll continue our quest for line-by-line Python optimizations.

We already covered several tips in part I.

Today we’ll learn:

  1. Why using syntax beats calling methods
  2. How dict.get() is disappointing
  3. Why you should avoid everything in the copy module if you can
  4. How object attribute resolution is not free
  5. And finally, why operator.truth() is faster than bool()

1) Syntax is faster than calling methods

Use the {} and [] syntax to create new dicts and lists rather than calling the dict() and list() built-ins.

Using the dis module to examine dict construction, we can see that the two approaches compile to completely different bytecode.

>>> from dis import dis

>>> dis(lambda: dict(a=1))

  1           0 LOAD_GLOBAL              0 (dict)

              3 LOAD_CONST               0 ('a')

              6 LOAD_CONST               1 (1)

              9 CALL_FUNCTION          256

             12 RETURN_VALUE        

>>> dis(lambda: {'a': 1})

  1           0 BUILD_MAP                1

              3 LOAD_CONST               0 (1)

              6 LOAD_CONST               1 ('a')

              9 STORE_MAP           

             10 RETURN_VALUE     

The {} syntax compiles directly to the BUILD_MAP and STORE_MAP bytecodes, whereas dict() makes a function call (CALL_FUNCTION) and has to look up the name 'dict' first (LOAD_GLOBAL).

The difference is large:

>>> timeit.Timer("{'a': 1}").timeit(1000000)

0.10097813606262207

>>> timeit.Timer("dict(a=1)").timeit(1000000)

0.43113112449645996

The result is similar for making an empty list:

>>> timeit.Timer("[]").timeit(1000000)

0.039584875106811523

>>> timeit.Timer("list()").timeit(1000000)

0.1627810001373291
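
The gap for an empty dict tells the same story; you can check it on your own machine (timings vary by interpreter and hardware, so none are quoted here):

>>> timeit.Timer("{}").timeit(1000000)
>>> timeit.Timer("dict()").timeit(1000000)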

2) Conditional dict lookup

It’s a heartbreaker, but dict.get() is slower than its more verbose alternatives.

k in adict and adict[k]

is faster than

adict.get(k)

According to timeit:

>>> timeit.Timer("adict.get(0)", "adict = {1: 2, 3: 4, 5: 6}").timeit(1000000)

0.13234400749206543

>>> timeit.Timer("0 in adict and adict[0]", "adict = {1: 2, 3: 4, 5: 6}").timeit(1000000)

0.052942037582397461

It’s still slower even if the key exists:

>>> timeit.Timer("adict.get(1)", "adict = {1: 2, 3: 4, 5: 6}").timeit(1000000)

0.13730907440185547

>>> timeit.Timer("1 in adict and adict[1]", "adict = {1: 2, 3: 4, 5: 6}").timeit(1000000)

0.099563121795654297

Or if you specify an alternative:

>>> timeit.Timer("adict.get(0, None)", "adict = {1: 2, 3: 4, 5: 6}").timeit(1000000)

0.15692591667175293

>>> timeit.Timer("adict[0] if 0 in adict else None", "adict = {1: 2, 3: 4, 5: 6}").timeit(1000000)

0.077980995178222656

Even so, it's kind of ugly and long-winded. Isn't it better to keep things simple and readable, even at the cost of a little efficiency?

Let’s consider a slightly more complex case:

adict.get(x, {})

In this case the {} dict is constructed every single time, passed to the get method, and then thrown away if the key is found (imagine an even more expensive default than an empty dict). Consider instead an alternative such as this:

adict[x] if x in adict else {}

This way the {} is never constructed unless it is actually needed.
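
You can make the eager evaluation visible with a side effect. Here's a small sketch, where expensive_default is a hypothetical stand-in for any costly-to-build default:

def expensive_default():
    print("building the default")  # side effect shows when it runs
    return {}

adict = {1: 2}

adict.get(1, expensive_default())                 # prints, even though key 1 exists
adict[1] if 1 in adict else expensive_default()   # prints nothing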

3) Avoid the copy module

copy.copy and copy.deepcopy are performance hogs. Avoid them if you can!

Look at how copy.deepcopy performs when copying nested dicts just two levels deep, compared to iterating through them and copying each value ourselves:

>>> adict = {1: {2: 3, 4: 5}, 4: {5: 8}, 8: {3: 'a', 'b': 9}}

>>> timeit.Timer("copy.deepcopy(adict)", "import copy; from __main__ import adict").timeit(100000)

2.78672194480896

>>> timeit.Timer("dict((k, v.copy()) for k, v in adict.iteritems())", "from __main__ import adict").timeit(100000)

0.25296592712402344

Yikes.

copy.deepcopy does a big, scary recursive copy, so no wonder it’s so slow. But, you might ask, I just want a shallow copy of a list or dict! Can’t I use copy.copy?

copy.copy does all sorts of type checking, looking for different ways to copy different types of objects. It's also implemented in Python, not C, so there's no advantage there. If you're certain of the type of object you're going to copy, there is always a better alternative than copy.copy.

You might think that if you're copying a dict, you can just pass it to dict() to make a copy, as in dict(adict). The same goes for a list: list(alist); and for a set: set(aset). But these are not actually the fastest ways to make copies of these types of objects.

The dict and set types have copy() methods that are the fastest way to make a shallow copy:

adict.copy()

and

aset.copy()

For a dict, the timeit output is:

>>> timeit.Timer("copy.copy(adict)", "import copy; adict={1: 2, 3: 4, 5: 6}").timeit(100000)

0.12309503555297852

>>> timeit.Timer("dict(adict)", "import copy; adict={1: 2, 3: 4, 5: 6}").timeit(100000)

0.052445888519287109

>>> timeit.Timer("adict.copy()", "import copy; adict={1: 2, 3: 4, 5: 6}").timeit(100000)

0.017553091049194336

Interestingly, for lists the fastest way to make a copy is a full slice:

newlist = alist[:]

timeit says it all:

>>> timeit.Timer("copy.copy(a)", "import copy; a = [1,2,3,4,5]").timeit(100000)

0.092635869979858398

>>> timeit.Timer("list(a)", "import copy; a = [1,2,3,4,5]").timeit(100000)

0.028503894805908203

>>> timeit.Timer("a[:]", "import copy; a = [1,2,3,4,5]").timeit(100000)

0.013506889343261719

The same syntax works for tuples (or, for that matter, strings), but you should never need to copy a tuple or string, since they are immutable.

If you're copying an object of your own design, the best approach is to write your own copy constructor and use that.
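
As a minimal sketch, here's what that might look like for a hypothetical Point class (the class and attribute names are illustrative):

class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def copy(self):
        # We know exactly which attributes need copying,
        # so no generic type inspection is required.
        return Point(self.x, self.y)

p = Point(1, 2)
q = p.copy()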

4) Attribute resolution is not free

If you want to use the groupby function from the itertools module, should you import groupby from itertools, or import itertools and then call itertools.groupby?

Opinions differ on the relative merits in terms of style, but each of those '.' lookups incurs a small cost, so "from itertools import groupby" is more efficient if you're calling groupby a lot.

You can see the minor difference using the dis module again:

>>> import itertools

>>> dis.dis(lambda: itertools.groupby(()))

  1           0 LOAD_GLOBAL              0 (itertools)

              3 LOAD_ATTR                1 (groupby)

              6 LOAD_CONST               0 (())

              9 CALL_FUNCTION            1

             12 RETURN_VALUE        

>>> from itertools import groupby

>>> dis.dis(lambda: groupby(()))

  1           0 LOAD_GLOBAL              0 (groupby)

              3 LOAD_CONST               0 (())

              6 CALL_FUNCTION            1

              9 RETURN_VALUE 

Note the additional LOAD_ATTR instruction when calling itertools.groupby. This cost is incurred every time itertools.groupby is called; for all the Python interpreter knows, the attribute could have been rebound to something else between calls.

The actual performance difference for a trivial case is real but small:

>>> timeit.Timer("groupby(())", "from itertools import groupby").timeit(1000000) 

0.20799016952514648

>>> timeit.Timer("itertools.groupby(())", "import itertools").timeit(1000000)

0.24183106422424316
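
If you do keep the itertools.groupby spelling but call it inside a tight loop, one common way to avoid paying for the repeated LOAD_ATTR is to bind the attribute to a local name once up front. A quick sketch:

import itertools

def count_groups(chunks):
    groupby = itertools.groupby  # resolve the attribute a single time
    total = 0
    for chunk in chunks:
        total += sum(1 for _key, _group in groupby(chunk))
    return total

count_groups([[1, 1, 2], [3, 3, 3]])  # -> 3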

5) Truth is faster than bool

Suppose you want to count how many objects in some collection have a particular attribute set to True. Following the advice from the previous post, you could write something like:

sum(some_object.some_attribute for some_object in object_list)

(Remember, bool is a subclass of int in Python: True equals 1 and False equals 0.)

But if some_attribute in the snippet above could possibly be None, you risk a TypeError. Unlike False, None is not equivalent to zero and does not support arithmetic.
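
For example:

>>> sum(attr for attr in [True, None, True])
Traceback (most recent call last):
  ...
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'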

So here's an alternative: pass the attribute through bool() to turn None into False.

sum(bool(some_object.some_attribute) for some_object in object_list)

But is this the fastest way? No, as it turns out: operator.truth is faster than bool.

from operator import truth

sum(truth(some_object.some_attribute) for some_object in object_list)

If object_list is extremely large, this can make a significant difference.
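
You can verify the gap on your own machine (measure it yourself; the exact numbers depend on your setup):

>>> timeit.Timer("truth(0)", "from operator import truth").timeit(1000000)
>>> timeit.Timer("bool(0)").timeit(1000000)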

Why is this? Examining the source code for bool, we find:

static PyObject *
bool_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    static char *kwlist[] = {"x", 0};
    PyObject *x = Py_False;
    long ok;

    if (!PyArg_ParseTupleAndKeywords(args, kwds, "|O:bool", kwlist, &x))
        return NULL;
    ok = PyObject_IsTrue(x);
    if (ok < 0)
        return NULL;
    return PyBool_FromLong(ok);
}

Whereas in the operator module source code:

spami(truth            , PyObject_IsTrue)

operator.truth is just a thin wrapper around the C function PyObject_IsTrue, whereas bool() has to parse its arguments (PyArg_ParseTupleAndKeywords) and go through the usual type-call machinery before ultimately calling PyObject_IsTrue itself.

By the way, remember:

from operator import truth

truth()

is a tiny bit faster than

import operator

operator.truth()

As we saw above, that '.' lookup adds a tiny bit of execution time on every call.

Coda

If you enjoyed these posts and like writing nimble Python, we should have a conversation about you working here. Contact our man on the inside, John Delaney, at [email protected]
