You might have lived a long and happy coding life without ever needing to know what closures are. For one, many languages like C or C++ don't even support them (edit: C++ does support them via lambdas, of course. Sorry for any misinformation). Many functional languages rely heavily on closures. Others, like Python, support them, but they are not at the very core of the language (or the language philosophy, for that matter). Python, however, is a fine language for getting to know closures. So, even if your expertise is in some other language, you might learn a thing or two here.
Before I even attempt some kind of definition (wikipedia/closure has one, albeit somewhat complicated) or give you an example, I'll tell you what closures are about: They are about maintaining state of some kind in functions. Now, that is about as vague as can be, but keep that in mind as the over-arching theme here – more often than not, when we read about features we didn't know about, the question is not “how does it work” but “why would I need it”. This article attempts to answer both.
A few words about the conventions in this article: All examples are in Python 3. Some things don't even work in Python 2 (like the nonlocal keyword), but even if you have no experience in Python 3, it should trivially be understandable. A code example with >>> in front of it means I have used the python command line interpreter to toy with the code. I highly encourage you to try all the examples!
Instead of the canonical, entirely useless incrementer example, I'll give you one that is at least somewhat useful. Imagine a function that logs to stdout (that is: it prints to the screen). Just like print, of course, but with the difference that it prints the logging level right before each output. And (that part is a bit weak, I know, but stay with me for the purpose of argument) there should be one function for every logging level, like so:
log_info()
log_warning()
log_error()
Let's just write them down, shall we (no closures yet):
def log_info(message):
    print("info:", message)

def log_warning(message):
    print("warning:", message)

def log_error(message):
    print("error:", message)
Now, here's quite a bit of code repetition, and if I wanted to introduce more levels (say, “debug” or “critical”), it would get even worse. Since one of my most useful coding mantras is code duplication is a bug, we should try to find another way to construct these functions for us.
We all know that Object Orientation (OO) is here to help us out, so let's write the above example with classes (still no closures):
class Log:
    def __init__(self, level):
        self._level = level

    def __call__(self, message):
        print("{}: {}".format(self._level, message))

log_info = Log("info")
log_warning = Log("warning")
log_error = Log("error")
Both of these versions behave exactly the same:
>>> log_info("this is a test")
info: this is a test
Admittedly, this kind of API isn't the best fit for a class-based implementation, but it wouldn't get much more readable if we did it any other way.
Now consider this (a closure, finally):
def make_log(level):
    def _(message):
        print("{}: {}".format(level, message))
    return _

log_info = make_log("info")
log_warning = make_log("warning")
log_error = make_log("error")
And – try it! – it behaves exactly like the two examples before. And it's not only more succinct than the class-based version, one might argue it's also more readable. But:
We see two functions here – the make_log function, and within that we have a function called _. Why is it called _? Because it does not matter. We could have given it any name we wanted – it's not visible outside of this function. In fact, since this is a one-liner, we could have used a lambda expression:
def make_log(level):
    return lambda message: print("{}: {}".format(level, message))
This works just as well. However, lambda expressions are frequently frowned upon in the Python community, because more often than not they make your code less readable. I consider the above case simple enough to understand, but your co-workers might not, so my recommendation is to always use named functions (the ones created with def) and just give them a name that makes clear they won't be used.
Now that inner function is returned by the outer function (make_log). Mind you, it's the function itself that is returned, not the return value of that function. In other words, the inner function is not called within the make_log function. Or yet another way: make_log is a function that returns a function when called.
If that sounds strange to you, then remember that everything in Python is, in fact, an object. That includes functions, which means they can be used as function return values (they can be used as function parameters as well). In Python, even classes are objects, which has interesting consequences that I might explore in another tutorial.
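To make that a bit more tangible, here's a tiny sketch (the names shout, apply and pick are made up purely for illustration) of functions being passed around and returned like any other value:

def shout(text):
    return text.upper()

def apply(func, value):
    # a function can be a parameter ...
    return func(value)

def pick():
    # ... and a return value
    return shout

print(apply(shout, "hello"))    # HELLO
print(pick()("hello"))          # HELLO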
Back to the make_log example: I call make_log three times with three different arguments (“info”, “warning” and “error”). I get back three functions that have the same signature (meaning they accept the same parameters), but behave differently. They have somehow saved the state of the variable level of the outer function, beyond the scope of the inner function _. And they still maintain that state, even after the control flow has left the scope of the function make_log. How does it work?
We have seen that make_log can be considered a kind of function factory. It creates new customized functions for us. In our case, it binds the variable level in the inner function to a value. That value could be anything, but it wouldn't make a lot of sense to have it be a static value, because then all returned functions would behave the same. No, we bind the value of the variable level to the argument we have passed to our factory function make_log. The value (say, “info”) is attached to the inner function. When it is returned and executed and the control flow reaches the variable level, Python knows how to look it up in some kind of special storage (which we are getting to know in a minute) and retrieves the value “info”. Control flow resumes, all is well.
OK, now where are the non-local values stored? As said before, functions are objects. Since they are objects, they might have attributes where things can be stored. Let's explore one of the generated functions:
>>> dir(log_info)
['__annotations__', '__call__', '__class__', '__closure__', '__code__',
 '__defaults__', '__delattr__', '__dict__', '__doc__', '__eq__',
 '__format__', '__ge__', '__get__', '__getattribute__', '__globals__',
 '__gt__', '__hash__', '__init__', '__kwdefaults__', '__le__', '__lt__',
 '__module__', '__name__', '__ne__', '__new__', '__reduce__',
 '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
 '__subclasshook__']
Do you see that? There is an attribute called __closure__. Let's examine it further:
>>> log_info.__closure__
(<cell at 0xb7107d1c: str object at 0xb7207620>,)
So it looks like a one-tuple containing a single cell object. Let's take a look at that object:
>>> dir(log_info.__closure__[0])
['__class__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__',
 '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__',
 '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
 '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
 'cell_contents']
See the attribute cell_contents? Maybe that's it:
>>> log_info.__closure__[0].cell_contents
'info'
Voilà! There we have it. But wait: Where does Python store the name level of the non-local variable? The answer is: It doesn't. Then how does Python know which name it should associate the value to? The answer is, surprisingly: It doesn't need to know.
Python constructs a “byte code” representation of the code that is executed on a Virtual Machine (or VM). In that byte code, there's a special operation for getting values out of that closure tuple, and the name of the non-local variable is replaced by that operation. Consider the following definition:
def make_useless_function():
    spam = "eggs"
    foo = "bar"
    def _():
        return spam, foo
    return _
Let's have a look at this using dis, the built-in disassembler:
>>> import dis
>>> useless_function = make_useless_function()
>>> [cell.cell_contents for cell in useless_function.__closure__]
['bar', 'eggs']
>>> dis.dis(useless_function)
  5           0 LOAD_DEREF               1 (spam)
              3 LOAD_DEREF               0 (foo)
              6 BUILD_TUPLE              2
              9 RETURN_VALUE
I won't go into detail about what all this means; if you are really curious, you should read about stack-based languages (because the Python byte code is essentially one, and the Python VM is an interpreter for this language). The important thing can still be seen here: a LOAD_DEREF 1 loads the second value from the closure tuple, which, as we can see, is the value “eggs”. Instead of the name “spam”, the Python byte code references the index in the tuple for retrieving the needed value.
That's the end of our little excursion. I hope you had fun (I surely had).
Up to this point we have used the non-local value read-only – what if we wanted to write to it? Consider this definition:
def make_stupid_function():
    spam = "eggs"
    def _():
        spam = "bacon"
        return spam
    return _
And now let's try it:
>>> stupid_function = make_stupid_function()
>>> stupid_function()
'bacon'
Works as expected, doesn't it? But when we look at the __closure__ attribute, we see:
>>> stupid_function.__closure__
That's right: We see nothing. Or to be more explicit:
>>> print(stupid_function.__closure__)
None
>>> import dis
>>> dis.dis(stupid_function)
  4           0 LOAD_CONST               1 ('bacon')
              3 STORE_FAST               0 (spam)

  5           6 LOAD_FAST                0 (spam)
              9 RETURN_VALUE
So we see that Python does not consider spam to be a reference to a non-local value, but to a local value instead. That's very much in line with how Python treats assignment to names, so it's really not very surprising. Just remember: If there's an assignment to a variable in the inner function, the name of that variable shadows an equally named variable in the outer scope. So, on to another try – this time a somewhat less stupid example. Imagine a counter function that returns the number of times it has been called.
# this won't work!
def make_counter():
    i = 0
    def _():
        i += 1
        return i
    return _
OK, looks fine. But, as the comment has given away already (sorry to spoil it): It doesn't work. Watch how it fails:
>>> counter = make_counter()
>>> counter()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in _
UnboundLocalError: local variable 'i' referenced before assignment
We need to tell Python that the name i is, in fact, a non-local variable and not a local one as it seems to assume. Luckily for us, there's a keyword to do exactly that, and it goes by the intuitive name of nonlocal. We correct our definition of make_counter like so:
# this will work!
def make_counter():
    i = 0
    def _():
        nonlocal i  # This line is new
        i += 1
        return i
    return _
And use it:
>>> counter = make_counter()
>>> counter()
1
>>> counter()
2
>>> counter()
3
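By the way, each call to make_counter creates a fresh closure with its own cell, so two counters don't interfere with each other – try it:

>>> counter_a = make_counter()
>>> counter_b = make_counter()
>>> counter_a()
1
>>> counter_a()
2
>>> counter_b()
1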
Note that Python 2 doesn't have the nonlocal keyword, so Python 2 only has read-only closures – one more reason to dump Python 2, really.
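If you are stuck on Python 2 anyway, the usual workaround is to mutate a container instead of rebinding the name – a minimal sketch:

# works in Python 2 and 3: we never rebind i, we only change what it contains
def make_counter():
    i = [0]
    def _():
        i[0] += 1
        return i[0]
    return _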
At this point, I assume that you know what a closure is and how it works. But where are closures useful?
Closures as decorators
There's a part of Python where you often need closures, and that is writing decorators. While not every decorator returns a closure function, most do. It gets especially closure-ish if your decorator takes arguments – we'll see an example of that further down. Here's an example of a decorator that prints the name of the decorated function before each call of that function:
def print_name(func):
    def _(*args, **kwargs):
        print("calling function '{}'".format(func.__name__))
        return func(*args, **kwargs)
    return _
And trying it:
>>> @print_name
... def do_nothing():
...     pass
...
>>> do_nothing()
calling function 'do_nothing'
It is OK if you don't understand exactly what is going on here. Using decorators is easy, but writing them can be an exercise in brain-bending at first. The @-syntax is just syntactic sugar, really – the above application of the decorator could have been written like this:
>>> def do_nothing():
...     pass
...
>>> do_nothing = print_name(do_nothing)
>>> do_nothing()
calling function 'do_nothing'
We see that print_name is analogous to our make_log function in the first example, as it behaves like a function factory. However, it takes a function as its argument. It returns a new function that is then bound to the same name as the function we passed as an argument. I repeat that: It's a new function, it just carries the same name. The old function is a non-local value of the new function, and usually it is used to call that old function, as we did with
return func(*args, **kwargs)
However, decorators don't need to call the function that is passed to them – sometimes it makes sense to, say, return a dummy function. But all that is subject to another article (I'm not promising anything).
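As mentioned above, decorators that take arguments are especially closure-ish: you end up with a factory that returns a decorator that returns a function, and the decorator's argument is carried along as a non-local value. Here's a minimal sketch (repeat and greet are made-up names, not from any library):

def repeat(times):
    # outermost factory: takes the decorator's argument
    def decorator(func):
        # the actual decorator: takes the function to be decorated
        def _(*args, **kwargs):
            result = None
            for _unused in range(times):   # 'times' is a non-local value here
                result = func(*args, **kwargs)
            return result
        return _
    return decorator

@repeat(3)
def greet():
    print("hello")

greet()   # prints "hello" three times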
When do you need to write decorators? If you develop an application, chances are you don't have to write a single decorator. But if you write a library or a framework, decorators often make for a simpler API than what you could achieve by inheriting classes or other, more explicit techniques. See, for example, how flask uses decorators for routing. Often decorators are used to register functions as plug-ins, or as callbacks to signals, or the like. Depending on the library or framework you write, you might want to consider decorators as part of the API.
The nice thing about decorators is that even if you don't really know how they work, you can use them just as well. So don't hesitate to expose nice and helpful decorators to your users, even if the implementation is tricky.
Closures as alternatives to class-based objects
In our very first example, we have seen how a function factory that returns a closure function can operate analogously to a class that returns an object. Oftentimes, that makes for a more elegant and concise version of the same logic. However, if you are writing a library/framework, my recommendation is to rarely, if ever, use such a function factory as part of your API. The reason is simple: A Python programmer doesn't expect it. We are used to classes and functions as part of an API, but functions that return functions – and worse: functions that maintain some kind of state – can be really surprising. Don't surprise the users of your software.
What about using closures within your software, without exposing the magic? Here it can make a whole lot of sense. Let's say we make a simple logger module – save the following example as log.py:
def _make_log(level):
    def _(message):
        print("{}: {}".format(level, message))
    return _

info = _make_log("info")
warning = _make_log("warning")
error = _make_log("error")
That is almost exactly like our first closure example, just with some renaming. Enter the interactive interpreter and try this module:
>>> import log
>>> log.info("this is a test")
info: this is a test
>>> log.warning("Uh oh")
warning: Uh oh
To the user of this module, the functions info, warning and error appear as regular functions. The API is even somewhat sane (almost as in the official logging package – see this simple logging example).
Programmers with a Java background might be irritated by how not everything is wrapped in some kind of class syntax, but, dear Java coder – consider it as an opportunity to free your mind. Or, if you are still not convinced, see how a module is syntactically just like an instantiated object, and the functions of that object are like its methods. That is, think of the log module as an instance of some hypothetical class Log. Does it make you feel better? Good. Because there's really not much of a difference – in Python, classes are just some kind of blessed namespaces (with a lot of magic to hide that fact).
So, as long as your closure-based code is as readable to a hypothetical co-programmer (and that might be you in a few years, when you have – god forbid! – forgotten everything you learned about closures in this article) as a class-based version would be, feel free to use it. However, you might be disappointed to find that very often, this is not the case. Python programmers are used to class-based code, so you should not unnecessarily raise the bar for potential contributors (like your future self) higher than it needs to be.
Closures and callbacks
If you have ever written code with twisted, written GUI code with PyGObject (GTK) or PySide (Qt), you surely know what callbacks are. For everyone else: A callback is a kind of function that is called whenever some kind of event happens. Typically, you pass function A as an argument for another function B, and that function B may call A (or not). It is often called for some asynchronous I/O, like so:
get_http_return_code("http://www.example.com", on_success, on_error)
where on_success and on_error are functions we have declared somewhere. on_success is then called by the get_http_return_code function if the URL could be reached, on_error if not. Hence the name callback (“we'll call you back” – and then, mostly, they don't; you know how that goes ...).
In GUI frameworks, callbacks are mostly tied to a signal/slot mechanism. A basic registering call might look like this:
connect("button_press", on_button_press)
Again, assume we have a function called on_button_press declared. When a "button_press" signal is emitted, this function is called.
Canonically, these callbacks are methods of an object that tracks the application's (or just the GUI's) state. Here's some (pseudo-)code for a fictional GUI framework named MyGUI that prints an increasing number every time a button is pressed:
class App(MyGUI.Window):
    def __init__(self):
        super().__init__()
        self._counter = 0
        self.button = MyGUI.Button()
        self.attach(self.button)
        self.button.connect("click", self._on_button_clicked)
        self.show()

    def _on_button_clicked(self, button):
        self._counter += 1
        print(self._counter)

app = App()
app.run()
Here's how a non-class-based version might look:
def run_app():
    counter = 0

    def on_button_clicked(button):
        nonlocal counter
        counter += 1
        print(counter)

    window = MyGUI.Window()
    button = MyGUI.Button()
    window.attach(button)
    button.connect("click", on_button_clicked)
    window.show()

run_app()
Or a mix-and-match version:
class App(MyGUI.Window):
    def __init__(self):
        super().__init__()
        counter = 0

        def on_button_clicked(button):
            nonlocal counter
            counter += 1
            print(counter)

        self.button = MyGUI.Button()
        self.attach(self.button)
        self.button.connect("click", on_button_clicked)
        self.show()

app = App()
app.run()
This is not an optimal example for showing the possible advantages of closure-based code; I chose it solely because it allows for a side-by-side comparison. There are no differences in how the versions behave, but one can see a very distinctive trait: In the first version, the callback and the counter are private members, but in the last version, they don't even exist as attributes of the app object. Meaning: They are not merely private – they can't be accessed from other methods of App instances at all. From an architectural point of view, closures allow for very tight access control. In our case, the on_button_clicked function should only ever be called when a "click" signal is emitted, so we don't need to expose it to the rest of the methods. Likewise, counter is only ever needed in that function, so we bind it to the function via the closure mechanism and don't expose it on the object namespace.
GUI programming (which is mostly OO, because that paradigm is very helpful there) aside, we usually encounter callbacks in non-blocking network frameworks. Here, text-book OO is often syntactic overkill and leads to a lot of boilerplate code. So when a more functional style is chosen, closures come as a natural way to inject a value (that is: an object) into a callback.
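To illustrate that last point, here's a hypothetical sketch – fetch_status and make_success_handler are names I made up – in which a closure injects a request id into a callback without any class around it:

def fetch_status(url, on_success, on_error):
    # stand-in for a non-blocking HTTP request; we simply pretend it succeeded
    on_success(200)

def make_success_handler(request_id):
    def _(status_code):
        print("request {} finished with status {}".format(request_id, status_code))
    return _

fetch_status("http://www.example.com", make_success_handler(42), print)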
We programmers possess a kind of mental toolkit – when we encounter a problem, we mentally try out each of the tools to see which one suits best. A good programmer is one that has a lot of tools to choose from.
I hope that you, dear reader, did learn about what closures are and how they work, but most importantly, that you were able to add the tool closure to your toolbox, making you an even better programmer than you already were. Thanks for reading!