Dictionary access in Python the right way

The first step to writing good code is to read good code. The second step is to figure out what is wrong with that code, and the third step is to avoid those mistakes.

A common task in Python is to return a value from a dictionary, given a key, and to return some other value if the dictionary does not contain the key. A variant of this problem is to execute some code depending on whether the key is in the dictionary. These four functions all achieve this task:

def get_with_has_key(dictionary, key, default):
    if dictionary.has_key(key):
        return dictionary[key]
    else:
        return default

def get_with_in(dictionary, key, default):
    if key in dictionary:
        return dictionary[key]
    else:
        return default

def get_with_get(dictionary, key, default):
    return dictionary.get(key, default)

def get_with_try_except(dictionary, key, default):
    try:
        return dictionary[key]
    except KeyError:
        return default

Great! Woohoo, we’ve achieved the task. But, according to The Zen of Python, “there should be one– and preferably only one –obvious way to do it.” So, which way is obvious?

I see get_with_has_key a lot in code written by people who are still learning Python. It’s perfectly reasonable, but there is a clear reason it’s not the most obvious way. Membership testing in Python should nearly always be done using the “in” keyword. I don’t know why the dict type even has a “has_key” method, other than to help beginners who don’t know how to use the “in” keyword (the method was deprecated and then removed entirely in Python 3). In fact, it is more of a hindrance to beginners, as it prevents them from learning the obvious/elegant way to do membership testing.
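
As a small illustration of why “in” is the habit worth building, the sketch below applies the same operator to a dict, a set, a list and a string. The sample data is invented for the example and is not from any real code base.

```python
# Hypothetical sample data, invented for this illustration.
prices = {"apple": 0.5, "pear": 0.75}
seen = {"apple", "fig"}
queue = ["pear", "fig"]

# The same operator works on every built-in container.
in_dict = "apple" in prices   # for a dict, this tests the keys, not the values
in_set = "apple" in seen
in_list = "apple" in queue
in_str = "pp" in "apple"      # for a string, this is a substring test

# "not in" is the idiomatic negation.
missing = "kiwi" not in prices
value_is_not_key = 0.5 in prices  # values are ignored by the membership test
```

Nothing here is specific to dictionaries, which is the point: once “in” is learned for one container, it carries over to the rest.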

get_with_in is a very elegant solution. It would be (similar to) the obvious solution if this were a problem about sets. But because we’re using dictionaries…

get_with_get is the obvious solution. It’s a dict method that serves exactly this purpose. Why would you use anything else? Well, there’s the variant of the problem where code needs to be executed. dict.get cannot elegantly be used to execute code. Let’s say you need to run some code if the key isn’t in the dictionary. Then you can use…

get_with_try_except. get_with_try_except has a space for a suite (block of code) to run if the key is in the dictionary, and a suite to run if the key is not in the dictionary. In fact, here’s an example straight from the NumPy source:

try:
    thismat = ldict[col]
except KeyError:
    try:
        thismat = gdict[col]
    except KeyError:
        raise KeyError("%s not found" % (col,))

Notice that an extra block of code is run if the key isn’t in ldict. This means get_with_get isn’t suitable for this situation. NumPy uses a variation on get_with_try_except here. But they also could have used a variation on get_with_in:

if col in ldict:
    thismat = ldict[col]
else:
    if col in gdict:
        thismat = gdict[col]
    else:
        raise KeyError("%s not found" % (col,))
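
To make the point about dict.get concrete, here is a minimal sketch, written so that it also runs on Python 3. The settings dictionary and the expensive_default helper are hypothetical, made up for this example. It shows the normal use of get, and then one reason get is awkward when code has to run on a miss: the default is an ordinary argument, so it is evaluated before get is called, whether or not the key is present.

```python
# Hypothetical data, invented for this illustration.
settings = {"retries": 3, "timeout": None}

retries = settings.get("retries", 5)      # key present: the stored value
verbose = settings.get("verbose", False)  # key absent: the supplied default
missing = settings.get("verbose")         # no default supplied: None
timeout = settings.get("timeout", 30)     # key present with value None: still None

calls = []

def expensive_default():
    # Hypothetical helper standing in for "code that should only run on a miss".
    calls.append("called")
    return 5

# The key exists, yet expensive_default() runs anyway, because arguments
# are evaluated before dict.get is invoked.
value = settings.get("retries", expensive_default())
```

After this runs, value holds the stored 3, but calls is no longer empty. An if/else around “in” only runs the fallback code when the key is really missing.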

So, which way is obvious?

The Zen of Python also states that “Explicit is better than implicit.” Catching a KeyError is only implicitly a result of the key not being in the dictionary. This could be fixed by adding a comment “# col not a key in either dict.” That would be explicit. On the other hand, catching a KeyError does not necessarily mean the key isn’t in the dictionary. If you try to access dictionary[somefunction()] and somefunction raises a KeyError, the result is indistinguishable. And, if a type other than the builtin dict is used, that class’s __getitem__ hook could raise a KeyError for some reason other than the key being invalid. So, get_with_in is not only more explicit but also more precise in its condition checking. The condition will pass if and only if the dictionary has the specified key. With get_with_try_except, some edge case could go wrong.
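
The ambiguity described above can be sketched in a few lines. The helper pick_key and both lookup functions are hypothetical names invented for this example; pick_key plays the role of somefunction and raises a KeyError of its own.

```python
def pick_key(config):
    # Hypothetical helper: raises KeyError itself if "field" is missing.
    return config["field"]

def lookup_with_try(d, config, default):
    try:
        return d[pick_key(config)]
    except KeyError:
        # Reached when the key is missing from d, but also when pick_key failed.
        return default

def lookup_with_in(d, config, default):
    key = pick_key(config)  # a KeyError raised here propagates to the caller
    if key in d:
        return d[key]
    return default

data = {"a": 1}
broken_config = {}  # has no "field" entry

# The try/except version silently hides the broken config.
masked = lookup_with_try(data, broken_config, "default")

# The "in" version lets the unrelated KeyError surface.
try:
    lookup_with_in(data, broken_config, "default")
    surfaced = False
except KeyError:
    surfaced = True
```

The try/except version can be made just as precise by computing the key before entering the try block. The sketch only shows that the “in” test cannot be confused by a KeyError raised somewhere else.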

Another reason get_with_in is the obvious choice is performance. Raising and catching an exception is expensive (not to mention ugly), as the following IPython timing session shows.

import random

dictionary = {i: random.random() for i in xrange(10000) if i % 2 == 0}

def get_with_try_except(d, key, default):
    try:
        return d[key]
    except KeyError:
        return default

def get_with_get(d, key, default):
    return d.get(key, default)

def get_with_in(d, key, default):
    if key in d:
        return d[key]
    else:
        return default

for function in [get_with_try_except, get_with_get, get_with_in]:
    print '---'
    print function.__name__
    print 'With valid key:'
    %timeit function(dictionary, 24, None)
    print 'With invalid key:'
    %timeit function(dictionary, 25, None)

---
get_with_try_except
With valid key:
1000000 loops, best of 3: 333 ns per loop
With invalid key:
100000 loops, best of 3: 2.86 us per loop
---
get_with_get
With valid key:
1000000 loops, best of 3: 491 ns per loop
With invalid key:
1000000 loops, best of 3: 501 ns per loop
---
get_with_in
With valid key:
1000000 loops, best of 3: 380 ns per loop
With invalid key:
1000000 loops, best of 3: 335 ns per loop

If the dictionary does not contain the key, get_with_in takes about 335 ns. But get_with_try_except takes 2860 ns! That is roughly 8.5 times as long, or about 750% longer. If this simple operation is part of a nested loop in a function that gets called millions of times (which it very often is, in libraries like NumPy), those extra couple of microseconds can add up.
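
For readers without IPython, the same comparison can be approximated with the standard timeit module. This is a rough sketch written for Python 3, not the original benchmark: the time_ns helper is a hypothetical name, the lambda wrapper adds a fixed overhead to every measurement, and the absolute numbers will differ by machine and Python version.

```python
import random
import timeit

def get_with_try_except(d, key, default):
    try:
        return d[key]
    except KeyError:
        return default

def get_with_get(d, key, default):
    return d.get(key, default)

def get_with_in(d, key, default):
    if key in d:
        return d[key]
    return default

# Even keys only, as in the session above: 24 is present, 25 is absent.
dictionary = {i: random.random() for i in range(0, 10000, 2)}

def time_ns(function, key, number=20000):
    """Return the best-of-3 time per call, in nanoseconds."""
    timings = timeit.repeat(
        lambda: function(dictionary, key, None), repeat=3, number=number
    )
    return min(timings) / number * 1e9

results = {}
for function in (get_with_try_except, get_with_get, get_with_in):
    results[function.__name__] = (time_ns(function, 24), time_ns(function, 25))

for name, (hit, miss) in results.items():
    print("%s: valid key %.0f ns, invalid key %.0f ns" % (name, hit, miss))
```

The figure to compare is the gap between the valid-key and invalid-key columns for each function, rather than the raw values.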

Conclusions

get_with_try_except should not be used. It’s unclear, slow, ugly, and imprecise. Yet people continue to include it in their code, because it’s what they’ve seen in other people’s code. That’s why it’s important to move past the first step, reading other people’s code, to the second step: figuring out what’s wrong with that code so you can learn from it.

get_with_get is the clearest, most obvious way to get a value from a dictionary and fall back to a default. Although get_with_in outperforms it slightly, clarity is almost always valued over marginal speed in Python. get_with_in and get_with_get are both clear and simple, however, and this is one of those cases where the obvious way may not be obvious at first unless you’re Dutch. However, the obvious way is certainly not get_with_try_except. And if a block of code needs to be executed, get_with_in is definitely the obvious way.

This is a simple example (I hope!) but the message stands: don’t blindly copy a practice that you’ve seen in someone else’s code, even if it’s in NumPy’s code! (I’m not trying to single out NumPy — I wanted to provide an example of real world code, and it had to be open source). No code is perfect, and noticing flaws helps you avoid them. Dictionary access may seem trivial but doing it inefficiently can add up and cause major performance problems. I could say the same for using a list comprehension where a generator expression would suffice, and so on. If you find yourself writing a code snippet you’ve written a thousand times, that’s where you should be most cautious that there could be a better way to do it.

From:
http://cbarker.net/blog/archives/219
