Improve Your Python: 'yield' and Generators Explained

Both versions below are reposted: a translation first, followed by the original English post.

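Before tutoring begins, I ask students to fill out a survey that shows how well they understand various Python concepts. Some topics ("control flow with if/else" or "defining and using functions") pose no problem for most students. A few topics, though, are ones that most students have had little or no exposure to, "generators and the yield keyword" above all. I suspect the same is true of most novice Python programmers.

Experience shows that some people still can't make sense of generators and the yield keyword even after putting serious effort into learning them. I want to improve that. In this post I'll explain what the yield keyword actually is, why it's useful, and how to use it.

Note: In recent years generators have grown more and more powerful as features were added through PEPs. In my next post I'll show the real power of yield through coroutines, cooperative multitasking, and asynchronous I/O (especially the "tulip" prototype implementation GvR has been working on). Before that, though, we need a solid understanding of generators and yield.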

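Coroutines and Subroutines

When we call an ordinary Python function, execution starts at the function's first line and ends at a return statement, an exception, or the end of the function (which can be seen as an implicit return None). Once the function hands control back to its caller, it is completely finished. All the work the function did and all the data held in its local variables are lost. Calling the function again creates everything from scratch.

This is the standard flow for functions as discussed in computer programming. Such a function can return only one value, but sometimes it helps to be able to write a function that produces a sequence of values. To do that, the function needs to be able to "save its work".

I say "produce a sequence" because our function doesn't return in the usual sense. return implies that the function is handing control of execution back to the point where it was called. "yield", by contrast, implies that the transfer of control is temporary and voluntary, and that our function will get control back later.

In Python, "functions" with this ability are called generators, and they are very useful. Generators (and the yield statement) were originally introduced to give programmers a simpler way to write code that produces a sequence of values. Before that, implementing something like a random number generator meant writing a class or module that produced the data while also tracking state between calls. With generators, this became much simpler.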
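To make the contrast concrete, here is a minimal sketch comparing the two approaches. The names CountFrom and count_from are hypothetical and not from the original post, and a simple counter stands in for the random number generator mentioned above. The class has to store its state on the instance by hand, while the generator function keeps it in an ordinary local variable.

```python
class CountFrom:
    """Class-based iterator: state lives in an instance attribute."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        value = self.current
        self.current += 1
        return value


def count_from(start):
    """Generator version: state lives in the local variable `current`."""
    current = start
    while True:
        yield current
        current += 1


class_based = CountFrom(5)
generator_based = count_from(5)

# Pull three values from each; both remember where they stopped.
first_three_class = [next(class_based) for _ in range(3)]
first_three_gen = [next(generator_based) for _ in range(3)]
print(first_three_class, first_three_gen)
```

Both objects produce the same values; the generator simply needs less bookkeeping.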

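To better understand the problem generators solve, let's look at an example. As you work through it, keep in mind the problem we are solving: generating a sequence of values.

Note: Outside of Python, all but the simplest generators would be called coroutines. I'll use that term later in this post. Keep in mind that, in Python's terms, everything called a coroutine here is still a generator. Python's formal term is generator; coroutine is only a convenience for discussion and has no formal definition at the language level.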

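Example: Fun With Prime Numbers

Suppose your boss asks you to write a function that takes a list of ints and returns an iterable containing the primes among them (see footnote 1).

Remember, an iterable is simply an object that can return its members one at a time.

"That's easy," you think, and you quickly write the following: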

import math

def get_primes(input_list):
    result_list = list()
    for element in input_list:
        if is_prime(element):
            # keep only the prime elements
            result_list.append(element)

    return result_list

# or better yet...

def get_primes(input_list):
    return (element for element in input_list if is_prime(element))

# here is one possible implementation of is_prime...

def is_prime(number):
    if number > 1:
        if number == 2:
            return True
        if number % 2 == 0:
            return False
        # only odd divisors up to the square root need checking
        for current in range(3, int(math.sqrt(number) + 1), 2):
            if number % current == 0:
                return False
        return True
    return False

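Either get_primes implementation above meets the requirements, so we tell the boss we're done. She reports that our function works and is exactly what she wanted.

Dealing With Infinite Sequences

Or is it? A few days later the boss comes back with a small problem: she wants to use our get_primes function on a very large list of numbers. In fact the list is so large that merely creating it would use up all of the system's memory. So she wants to be able to call get_primes with a start argument and get back all primes larger than it (perhaps she's solving Project Euler problem 10).

Looking at this new requirement, it's clear that a simple tweak to get_primes won't do. Naturally, we can't return a list of every prime from start to infinity (even though operating on infinite sequences has many useful applications). The chances of solving this with an ordinary function look slim.

Before giving up, let's pin down the core obstacle that keeps us from writing a function that meets the boss's new requirement. Thinking it over, we conclude: a function gets only one chance to return results, so it must return all of them at once. Stating something so obvious seems pointless; "that's just how functions work," we usually think. The real value lies in asking, "what if they didn't?"

Imagine what we could do if get_primes could simply return the next value rather than all values at once. We would no longer need to build a list. No list, no memory problem. Since the boss told us she only needs to iterate over the results, she would never notice the difference in implementation.

Unfortunately, this doesn't look possible. Even if we had a magical function that let us iterate from n to infinity, we would get stuck after returning the first value: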

def get_primes(start):
    for element in magical_infinite_range(start):
        if is_prime(element):
            return element

Suppose get_primes is called like this:

def solve_number_10():
    # She *is* working on Project Euler #10, I knew it!
    total = 2
    for next_prime in get_primes(3):
        if next_prime < 2000000:
            total += next_prime
        else:
            print(total)
            return

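Clearly, get_primes would immediately hit the case where element is 3 and return at line 4 of the function. Instead of returning outright, we need a way to hand back a value and be ready to produce the next one when asked.

Functions can't do that, though. When a function returns, it is finished for good. We can make sure the function is called again, but we have no way to say, "this time, start at line 4 where we left off, not at the first line as usual." A function has a single entry point: its first line.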
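A rough illustration of getting stuck, assuming itertools.count as a stand-in for the hypothetical magical_infinite_range (first_prime_from is an illustrative name, not from the post). Every call starts again from the first line, so we only ever see the first prime.

```python
import itertools
import math


def is_prime(number):
    """Trial division, following the implementation shown earlier."""
    if number < 2:
        return False
    if number == 2:
        return True
    if number % 2 == 0:
        return False
    for current in range(3, int(math.sqrt(number)) + 1, 2):
        if number % current == 0:
            return False
    return True


def first_prime_from(start):
    # itertools.count(start) yields start, start + 1, ... without end.
    for element in itertools.count(start):
        if is_prime(element):
            return element  # the function is finished for good here


# Calling again does not continue the search; it restarts it.
results = [first_prime_from(3) for _ in range(3)]
print(results)
```

The only way to make progress with a plain function is for the caller to track the position itself, which is exactly the bookkeeping generators remove.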

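Enter the Generator

This kind of problem is so common that Python added a dedicated construct to solve it: the generator. A generator "generates" values. Generators were made as simple to create as possible through the concept of generator functions.

A generator function is defined much like an ordinary function, except that whenever it needs to generate a value it uses the yield keyword instead of return. If the body of a def contains yield, the function automatically becomes a generator function (even if it also contains a return). Nothing else is needed to create one.

A generator function returns a generator iterator. This may be the last time you see the term "generator iterator", since they are usually just called "generators". Note that a generator is a special kind of iterator. As an iterator, a generator must define a few methods, one of which is __next__(). As with any iterator, we use the next() function to get the next value.

To get the next value from a generator, we use the next() function, just as we would with an iterator.

(next() takes care of calling the generator's __next__() method.) Since a generator is an iterator, it can be used in a for loop.

Whenever next() is called on a generator, the generator passes a value back to the caller. Inside the generator this is done with yield (for example, yield 7). The easiest way to remember what yield does is to think of it as a special return (plus a little magic) for generator functions.

yield is just return (plus a little magic) for generators.

Here is a simple generator function: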

>>> def simple_generator_function():
...     yield 1
...     yield 2
...     yield 3
...

Here are two simple ways to use it:

>>> for value in simple_generator_function():
...     print(value)
...
1
2
3
>>> our_generator = simple_generator_function()
>>> next(our_generator)
1
>>> next(our_generator)
2
>>> next(our_generator)
3

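Magic?

So where is the magic? I'm glad you asked! When a generator function reaches a yield, the "state" of the generator function is frozen: the values of all its variables are kept, and the position of the next line to execute is recorded, until next() is called again. Once next() is called again, the generator function picks up where it left off. If next() is never called again, the state saved at the yield is eventually discarded.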
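The freezing and resuming can be observed directly. The sketch below is illustrative only: countdown and the log list are hypothetical names introduced here, and the log simply records which parts of the generator body have run so far.

```python
log = []  # records which parts of the generator body have executed


def countdown(start):
    log.append("started")
    remaining = start
    while remaining > 0:
        log.append(f"about to yield {remaining}")
        yield remaining
        # Execution continues here on the following next() call,
        # with `remaining` still holding its earlier value.
        log.append(f"resumed after yielding {remaining}")
        remaining -= 1
    log.append("finished")


gen = countdown(2)
log_before_first_next = list(log)  # calling the function runs no body code

first = next(gen)   # runs up to the first yield
second = next(gen)  # resumes after that yield and runs to the next one
print(first, second)
print(log)
```

Note that creating the generator runs none of the body; the first line executes only when the first value is requested.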

Let's rewrite get_primes, this time as a generator function. Note that we no longer need the magical_infinite_range function. With a simple while loop, we create our own infinite sequence:

def get_primes(number):
    while True:
        if is_prime(number):
            yield number
        number += 1

If a generator function executes return or reaches the end of its body, a StopIteration exception is raised. This tells whoever called next() that the generator has no more values (which is normal iterator behavior). It is also why the while loop appears in get_primes. Without it, the second call to next() would run the generator function to its end and trigger StopIteration. Once a generator's values are used up, calling next() on it again raises an error, so each generator can be consumed only once. The following will not work:

>>> our_generator = simple_generator_function()
>>> for value in our_generator:
...     print(value)
...
1
2
3
>>> # our generator has no values left...
>>> print(next(our_generator))
Traceback (most recent call last):
  File "", line 1, in <module>
    next(our_generator)
StopIteration

>>> # however, we can always create another generator
>>> # simply by calling the generator function again

>>> new_generator = simple_generator_function()
>>> print(next(new_generator)) # works fine
1

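So the while loop is there to make sure the generator function never reaches its end. As long as next() is called, the generator produces a value. This is a common idiom for handling infinite sequences (and generators of this kind are common too).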
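Because such a generator never finishes on its own, the caller decides how many values to take. One way to do that, sketched here under the assumption that only the first few primes are wanted, is itertools.islice from the standard library. The compact is_prime and the one_shot generator are illustrative helpers, not code from the post.

```python
import itertools


def is_prime(number):
    # Compact trial division; fine for small illustrative inputs.
    return number > 1 and all(
        number % divisor for divisor in range(2, int(number ** 0.5) + 1)
    )


def get_primes(number):
    while True:  # never reaches the end, so never raises StopIteration
        if is_prime(number):
            yield number
        number += 1


# Take exactly five values from the infinite generator.
first_five = list(itertools.islice(get_primes(3), 5))
print(first_five)


def one_shot(number):
    yield number  # no loop: the body ends after a single value


gen = one_shot(3)
only_value = next(gen)
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True
print(only_value, exhausted)
```

Calling list() directly on get_primes(3) would never return, which is why a bounded consumer such as islice or a loop with an explicit exit is needed.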

Visualizing the Flow

Let's go back to the code that calls get_primes: solve_number_10.

def solve_number_10():
    # She *is* working on Project Euler #10, I knew it!
    total = 2
    for next_prime in get_primes(3):
        if next_prime < 2000000:
            total += next_prime
        else:
            print(total)
            return

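Let's look at the call to get_primes in solve_number_10's for loop; seeing how the first few elements are produced helps our understanding. When the for loop requests the first value from get_primes, we enter get_primes just as we would an ordinary function.

  1. We enter the while loop on line 2
  2. The if condition holds (3 is prime)
  3. yield passes the value 3 and control back to solve_number_10

Then, back in solve_number_10:

  1. The for loop receives the value 3
  2. The for loop assigns it to next_prime
  3. next_prime is added to total
  4. The for loop requests the next value from get_primes

This time we don't enter get_primes at the top; we resume at line 5, where we left off.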

def get_primes(number):
    while True:
        if is_prime(number):
            yield number
        number += 1 # <<<<<<<<<<

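Most importantly, number still holds the value it had when we yielded (that is, 3). Remember, yield passes a value to whoever called next() and also saves the "state" of the generator function. Next, number is incremented to 4, we return to the top of the while loop, and keep incrementing until we reach the next prime (5). Once again we yield the value of number to the for loop in solve_number_10. This cycle continues until the for loop ends (at the first prime greater than 2,000,000).

Moar Power

PEP 342 added support for passing values into generators. It gave generators the ability, in a single statement, to yield a value (as before), receive a value, or both yield a value and receive a (possibly different) one.

To show how a value is passed to a generator, let's return to our prime number function. This time, instead of simply generating every prime greater than some number, we'll find the smallest prime greater than each successive power of a number (for 10, we want the smallest prime greater than 10, then 100, then 1000, and so on). We start the same way as get_primes: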

def print_successive_primes(iterations, base=10):
    # like an ordinary function, a generator function can take an argument

    prime_generator = get_primes(base)
    # something will be added here later
    for power in range(iterations):
        pass  # something will be added here later

def get_primes(number):
    while True:
        if is_prime(number):
            pass  # what goes here?

The last lines of get_primes need some explanation. While yield number would yield the value of number, a statement of the form other = yield foo means, "yield the value of foo and, when a value is sent to me, set other to that value." You can "send" a value to a generator with its send method.

def get_primes(number):
    while True:
        if is_prime(number):
            number = yield number
        number += 1

In this way, we can set number to a different value each time the generator yields. Now we can fill in the missing code in print_successive_primes:

def print_successive_primes(iterations, base=10):
    prime_generator = get_primes(base)
    prime_generator.send(None)
    for power in range(iterations):
        print(prime_generator.send(base ** power))

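Two things to note here. First, we print the result of generator.send, which works because send both sends a value to the generator and returns the value the generator yields (mirroring what yield does inside the generator function).

Second, look at the prime_generator.send(None) line. When you use send to "start" a generator (that is, run from the first line of the generator function up to the first yield), you must send None. This makes sense: by definition the generator hasn't reached its first yield yet, so if we sent a real value there would be nothing to "receive" it. Once the generator is started, we can send values as above.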
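The rule about starting with None can be checked with a small experiment. This is a sketch with a hypothetical echo generator (not part of the post): sending a real value before the generator has reached its first yield raises TypeError, while priming it first with next(), which is equivalent to send(None), works.

```python
def echo():
    """Yield back whatever value was most recently sent in."""
    received = None
    while True:
        received = yield received


unstarted = echo()
try:
    unstarted.send("too early")  # nothing is waiting at a yield yet
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

primed = echo()
first = next(primed)          # runs to the first yield, which yields None
reply = primed.send("hello")  # "hello" becomes the value of the yield
print(first, reply)
```

The exact wording of the error message may differ between Python versions, so the sketch only relies on the exception type.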

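Round-up

In the second half of this series, we'll discuss some advanced uses of yield and what they make possible. yield has become one of the most powerful keywords in Python. Now that we have a solid understanding of how yield works, we have the knowledge needed to tackle some of its more "mind-bending" uses.

Believe it or not, we have only scratched the surface of what yield can do. For example, while send does work as described above, it is almost never used when generating simple sequences like the one in our example. Below I've pasted a snippet showing how send is typically used. I won't say more about how or why it works; figuring that out will make a good warm-up for part two.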

import random

def get_data():
    """Return 3 random integers between 0 and 9"""
    return random.sample(range(10), 3)

def consume():
    """Display a running average across the lists of integers sent in"""
    running_sum = 0
    data_items_seen = 0

    while True:
        data = yield
        data_items_seen += len(data)
        running_sum += sum(data)
        print('The running average is {}'.format(running_sum / float(data_items_seen)))

def produce(consumer):
    """Produce sets of values and forward them to the consumer"""
    while True:
        data = get_data()
        print('Produced {}'.format(data))
        consumer.send(data)
        yield

if __name__ == '__main__':
    consumer = consume()
    consumer.send(None)
    producer = produce(consumer)

    for _ in range(10):
        print('Producing...')
        next(producer)

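Remember...

I hope you take a few key ideas away from this discussion:

  • Generators are used to produce a series of values
  • yield is like the return of generator functions
  • The only other thing yield does is save the state of a generator function
  • A generator is just a special type of iterator
  • As with iterators, we can get the next value from a generator using next()
  • for gets values by calling next() implicitly

I hope this post was helpful. If you had never heard of generators, I hope you now understand what they are, why they're useful, and how to use them. If you were already somewhat familiar with generators, I hope this post has cleared up any confusion.

As always, if any section is unclear (or, more importantly, contains errors), by all means let me know. You can leave a comment below, email [email protected], or mention @jeffknupp on Twitter.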


Improve Your Python: 'yield' and Generators Explained

Posted on Apr 07, 2013 by Jeff Knupp

Prior to beginning tutoring sessions, I ask new students to fill out a brief self-assessment where they rate their understanding of various Python concepts. Some topics ("control flow with if/else" or "defining and using functions") are understood by a majority of students before ever beginning tutoring. There are a handful of topics, however, that almost all students report having no knowledge or very limited understanding of. Of these, "generators and the yield keyword" is one of the biggest culprits. I'm guessing this is the case for most novice Python programmers.

Many report having difficulty understanding generators and the yield keyword even after making a concerted effort to teach themselves the topic. I want to change that. In this post, I'll explain what the yield keyword does, why it's useful, and how to use it.

Note: In recent years, generators have grown more powerful as features have been added through PEPs. In my next post, I'll explore the true power of yield with respect to coroutines, cooperative multitasking and asynchronous I/O (especially their use in the "tulip" prototype implementation GvR has been working on). Before we get there, however, we need a solid understanding of how the yield keyword and generators work.

Coroutines and Subroutines

When we call a normal Python function, execution starts at the function's first line and continues until a return statement, exception, or the end of the function (which is seen as an implicit return None) is encountered. Once a function returns control to its caller, that's it. Any work done by the function and stored in local variables is lost. A new call to the function creates everything from scratch.

This is all very standard when discussing functions (more generally referred to as subroutines) in computer programming. There are times, though, when it's beneficial to have the ability to create a "function" which, instead of simply returning a single value, is able to yield a series of values. To do so, such a function would need to be able to "save its work," so to speak.

I said, "yield a series of values" because our hypothetical function doesn't "return" in the normal sense. return implies that the function is returning control of execution to the point where the function was called. "Yield," however, implies that the transfer of control is temporary and voluntary, and our function expects to regain it in the future.

In Python, "functions" with these capabilities are called generators, and they're incredibly useful. Generators (and the yield statement) were initially introduced to give programmers a more straightforward way to write code responsible for producing a series of values. Previously, creating something like a random number generator required a class or module that both generated values and kept track of state between calls. With the introduction of generators, this became much simpler.

To better understand the problem generators solve, let's take a look at an example. Throughout the example, keep in mind the core problem being solved: generating a series of values.

Note: Outside of Python, all but the simplest generators would be referred to as coroutines. I'll use the latter term later in the post. The important thing to remember is, in Python, everything described here as a coroutine is still a generator. Python formally defines the term generator; coroutine is used in discussion but has no formal definition in the language.

Example: Fun With Prime Numbers

Suppose our boss asks us to write a function that takes a list of ints and returns some Iterable containing the elements which are prime numbers (see footnote 1).

Remember, an Iterable is just an object capable of returning its members one at a time.
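As a quick illustration of "returning its members one at a time", the sketch below uses only the built-ins iter() and next() on a plain list. The variable names are illustrative. It also shows the difference between an iterable (which can be traversed repeatedly) and the iterator obtained from it (which is used up as it goes).

```python
numbers = [2, 3, 5]          # a list is an iterable

iterator = iter(numbers)     # ask the iterable for an iterator
first = next(iterator)       # members come back one at a time
remaining = list(iterator)   # consume whatever is left
after_exhaustion = list(iterator)  # the iterator is now used up

# The list itself is untouched and can be iterated again.
second_pass = [n for n in numbers]
print(first, remaining, after_exhaustion, second_pass)
```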

"Simple," we say, and we write the following:

import math

def get_primes(input_list):
    result_list = list()
    for element in input_list:
        if is_prime(element):
            # keep only the prime elements
            result_list.append(element)

    return result_list

# or better yet...

def get_primes(input_list):
    return (element for element in input_list if is_prime(element))

# not germane to the example, but here's a possible implementation of
# is_prime...

def is_prime(number):
    if number > 1:
        if number == 2:
            return True
        if number % 2 == 0:
            return False
        # only odd divisors up to the square root need checking
        for current in range(3, int(math.sqrt(number) + 1), 2):
            if number % current == 0:
                return False
        return True
    return False

Either get_primes implementation above fulfills the requirements, so we tell our boss we're done. She reports our function works and is exactly what she wanted.
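The two implementations differ in when the work happens. In the sketch below, the names get_primes_list and get_primes_lazy are hypothetical labels for the two versions, and is_prime is a compact stand-in instrumented with a checked list so we can see which numbers have been tested so far.

```python
checked = []  # records every number handed to is_prime


def is_prime(number):
    checked.append(number)
    if number < 2:
        return False
    return all(number % d for d in range(2, int(number ** 0.5) + 1))


def get_primes_list(input_list):
    # Builds the whole result before returning.
    return [element for element in input_list if is_prime(element)]


def get_primes_lazy(input_list):
    # Generator expression: tests elements only when values are requested.
    return (element for element in input_list if is_prime(element))


eager = get_primes_list([2, 3, 4, 5])
checked_by_list = list(checked)

checked.clear()
lazy = get_primes_lazy([2, 3, 4, 5])
checked_before_next = list(checked)   # nothing has been tested yet
first = next(lazy)
checked_after_first = list(checked)   # only as much as was needed
rest = list(lazy)
print(eager, first, rest)
```

Both satisfy the original requirement; the lazy version simply defers each primality test until a value is asked for.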

Dealing With Infinite Sequences

Well, not quite exactly. A few days later, our boss comes back and tells us she's run into a small problem: she wants to use our get_primes function on a very large list of numbers. In fact, the list is so large that merely creating it would consume all of the system's memory. To work around this, she wants to be able to call get_primes with a start value and get all the primes larger than start (perhaps she's solving Project Euler problem 10).

Once we think about this new requirement, it becomes clear that it requires more than a simple change to get_primes. Clearly, we can't return a list of all the prime numbers from start to infinity (operating on infinite sequences, though, has a wide range of useful applications). The chances of solving this problem using a normal function seem bleak.

Before we give up, let's determine the core obstacle preventing us from writing a function that satisfies our boss's new requirements. Thinking about it, we arrive at the following: functions only get one chance to return results, and thus must return all results at once. It seems pointless to make such an obvious statement; "functions just work that way," we think. The real value lies in asking, "but what if they didn't?"

Imagine what we could do if get_primes could simply return the next value instead of all the values at once. It wouldn't need to create a list at all. No list, no memory issues. Since our boss told us she's just iterating over the results, she wouldn't know the difference.

Unfortunately, this doesn't seem possible. Even if we had a magical function that allowed us to iterate from n to infinity, we'd get stuck after returning the first value:

def get_primes(start):
    for element in magical_infinite_range(start):
        if is_prime(element):
            return element

Imagine get_primes is called like so:

def solve_number_10():
    # She *is* working on Project Euler #10, I knew it!
    total = 2
    for next_prime in get_primes(3):
        if next_prime < 2000000:
            total += next_prime
        else:
            print(total)
            return

Clearly, in get_primes, we would immediately hit the case where element = 3 and return at line 4. Instead of return, we need a way to generate a value and, when asked for the next one, pick up where we left off.

Functions, though, can't do this. When they return, they're done for good. Even if we could guarantee a function would be called again, we have no way of saying, "OK, now, instead of starting at the first line like we normally do, start up where we left off at line 4." Functions have a single entry point: the first line.

Enter the Generator

This sort of problem is so common that a new construct was added to Python to solve it: the generator. A generator "generates" values. Creating generators was made as straightforward as possible through the concept of generator functions, introduced simultaneously.

A generator function is defined like a normal function, but whenever it needs to generate a value, it does so with the yield keyword rather than return. If the body of a def contains yield, the function automatically becomes a generator function (even if it also contains a return statement). There's nothing else we need to do to create one.

Generator functions create generator iterators. That's the last time you'll see the term generator iterator, though, since they're almost always referred to as "generators". Just remember that a generator is a special type of iterator. To be considered an iterator, generators must define a few methods, one of which is __next__(). To get the next value from a generator, we use the same built-in function as for iterators: next().

This point bears repeating: to get the next value from a generator, we use the same built-in function as for iterators: next().

(next() takes care of calling the generator's __next__() method). Since a generator is a type of iterator, it can be used in a for loop.
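These claims about the iterator protocol can be verified with a few checks. The sketch below assumes nothing beyond the standard library; collections.abc.Iterator is used only to confirm that a generator counts as an iterator.

```python
from collections.abc import Iterator


def simple_generator_function():
    yield 1
    yield 2
    yield 3


gen = simple_generator_function()

is_iterator = isinstance(gen, Iterator)  # generators satisfy the protocol
is_own_iterator = iter(gen) is gen       # __iter__ returns the generator itself

via_builtin = next(gen)      # the usual spelling
via_method = gen.__next__()  # what next() calls under the hood
rest = list(gen)             # consuming in bulk works like a for loop
print(is_iterator, is_own_iterator, via_builtin, via_method, rest)
```

In everyday code the built-in next() is preferred over calling __next__() directly.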

So whenever next() is called on a generator, the generator is responsible for passing back a value to whoever called next(). It does so by using yield along with the value to be passed back (e.g. yield 7). The easiest way to remember what yield does is to think of it as return (plus a little magic) for generator functions.

Again, this bears repeating: yield is just return (plus a little magic) for generator functions.

Here's a simple generator function:

>>> def simple_generator_function():
...     yield 1
...     yield 2
...     yield 3
...

And here are two simple ways to use it:

>>> for value in simple_generator_function():
...     print(value)
...
1
2
3
>>> our_generator = simple_generator_function()
>>> next(our_generator)
1
>>> next(our_generator)
2
>>> next(our_generator)
3

Magic?

What's the magic part? Glad you asked! When a generator function calls yield, the "state" of the generator function is frozen; the values of all variables are saved and the next line of code to be executed is recorded until next() is called again. Once it is, the generator function simply resumes where it left off. If next() is never called again, the state recorded during the yield call is (eventually) discarded.

Let's rewrite get_primes as a generator function. Notice that we no longer need the magical_infinite_range function. Using a simple while loop, we can create our own infinite sequence:

def get_primes(number):
    while True:
        if is_prime(number):
            yield number
        number += 1

If a generator function calls return or reaches the end of its definition, a StopIteration exception is raised. This signals to whoever was calling next() that the generator is exhausted (this is normal iterator behavior). It is also the reason the while True: loop is present in get_primes. If it weren't, the first time next() was called we would check if the number is prime and possibly yield it. If next() were called again, we would uselessly add 1 to number and hit the end of the generator function (causing StopIteration to be raised). Once a generator has been exhausted, calling next() on it will result in an error, so you can only consume all the values of a generator once. The following will not work:

>>> our_generator = simple_generator_function()
>>> for value in our_generator:
...     print(value)
...
1
2
3
>>> # our_generator has been exhausted...
>>> print(next(our_generator))
Traceback (most recent call last):
  File "", line 1, in <module>
    next(our_generator)
StopIteration

>>> # however, we can always create a new generator
>>> # by calling the generator function again...

>>> new_generator = simple_generator_function()
>>> print(next(new_generator)) # perfectly valid
1

Thus, the while loop is there to make sure we never reach the end of get_primes. It allows us to generate a value for as long as next() is called on the generator. This is a common idiom when dealing with infinite series (and generators in general).

Visualizing the flow

Let's go back to the code that was calling get_primes: solve_number_10.

def solve_number_10():
    # She *is* working on Project Euler #10, I knew it!
    total = 2
    for next_prime in get_primes(3):
        if next_prime < 2000000:
            total += next_prime
        else:
            print(total)
            return

It's helpful to visualize how the first few elements are created when we call get_primes in solve_number_10's for loop. When the for loop requests the first value from get_primes, we enter get_primes as we would in a normal function.

  1. We enter the while loop on line 2
  2. The if condition holds (3 is prime)
  3. We yield the value 3 and control to solve_number_10.

Then, back in solve_number_10:

  1. The value 3 is passed back to the for loop
  2. The for loop assigns next_prime to this value
  3. next_prime is added to total
  4. The for loop requests the next element from get_primes

This time, though, instead of entering get_primes back at the top, we resume at line 5, where we left off.

def get_primes(number):
    while True:
        if is_prime(number):
            yield number
        number += 1 # <<<<<<<<<<

Most importantly, number still has the same value it did when we called yield (i.e. 3). Remember, yield both passes a value to whoever called next(), and saves the "state" of the generator function. Clearly, then, number is incremented to 4, we hit the top of the while loop, and keep incrementing number until we hit the next prime number (5). Again we yield the value of number to the for loop in solve_number_10. This cycle continues until the for loop stops (at the first prime greater than 2,000,000).
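The whole cycle can be run end to end on a smaller limit. The following is a sketch rather than the author's code: sum_primes_below is a hypothetical variant of solve_number_10 that takes the limit as a parameter and returns the total instead of printing it, so the result is easy to check by hand.

```python
import math


def is_prime(number):
    if number < 2:
        return False
    if number == 2:
        return True
    if number % 2 == 0:
        return False
    for current in range(3, int(math.sqrt(number)) + 1, 2):
        if number % current == 0:
            return False
    return True


def get_primes(number):
    while True:
        if is_prime(number):
            yield number
        number += 1  # execution resumes here on each later request


def sum_primes_below(limit):
    total = 2  # 2 is added up front because the search starts at 3
    for next_prime in get_primes(3):
        if next_prime < limit:
            total += next_prime
        else:
            return total


gen = get_primes(3)
first_four = [next(gen) for _ in range(4)]  # step through by hand
print(first_four)
print(sum_primes_below(10), sum_primes_below(100))
```

For a limit of 10 the loop adds 3, 5 and 7 to the initial 2 and stops when it receives 11.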

Moar Power

In PEP 342, support was added for passing values into generators. PEP 342 gave generators the power to yield a value (as before), receive a value, or both yield a value and receive a (possibly different) value in a single statement.

To illustrate how values are sent to a generator, let's return to our prime number example. This time, instead of simply printing every prime number greater than number, we'll find the smallest prime number greater than successive powers of a number (i.e. for 10, we want the smallest prime greater than 10, then 100, then 1000, etc.). We start in the same way as get_primes:

def print_successive_primes(iterations, base=10):
    # like normal functions, a generator function can be called
    # and its result (a generator) assigned to a variable

    prime_generator = get_primes(base)
    # missing code...
    for power in range(iterations):
        pass  # missing code...

def get_primes(number):
    while True:
        if is_prime(number):
            pass  # ... what goes here?

The next line of get_primes takes a bit of explanation. While yield number would yield the value of number, a statement of the form other = yield foo means, "yield foo and, when a value is sent to me, set other to that value." You can "send" values to a generator using the generator's send method.

def get_primes(number):
    while True:
        if is_prime(number):
            number = yield number
        number += 1

In this way, we can set number to a different value each time the generator yields. We can now fill in the missing code in print_successive_primes:

def print_successive_primes(iterations, base=10):
    prime_generator = get_primes(base)
    prime_generator.send(None)
    for power in range(iterations):
        print(prime_generator.send(base ** power))

Two things to note here: First, we're printing the result of generator.send, which is possible because send both sends a value to the generator and returns the value yielded by the generator (mirroring how yield works from within the generator function).

Second, notice the prime_generator.send(None) line. When you're using send to "start" a generator (that is, execute the code from the first line of the generator function up to the first yield statement), you must send None. This makes sense, since by definition the generator hasn't gotten to the first yield statement yet, so if we sent a real value there would be nothing to "receive" it. Once the generator is started, we can send values as we do above.
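The two-way exchange is easier to see in a smaller setting. Here is a minimal sketch with a hypothetical running_total generator (not from the post): each send delivers a value into the paused yield expression, and the value send returns is whatever the generator yields next.

```python
def running_total():
    """Yield the total so far; each value sent in is added to it."""
    total = 0
    while True:
        # Hand `total` out, then pause until a value is sent in.
        amount = yield total
        total += amount


gen = running_total()
initial = next(gen)          # start the generator; it yields 0
after_ten = gen.send(10)     # amount = 10, next yield hands back 10
after_five = gen.send(5)     # amount = 5, next yield hands back 15
print(initial, after_ten, after_five)
```

One caveat worth keeping in mind: a plain next() call resumes the generator with None as the value of the yield expression, so a generator written this way should be driven with send once it has been started.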

Round-up

In the second half of this series, we'll discuss the various ways in which generators have been enhanced and the power they gained as a result. yield has become one of the most powerful keywords in Python. Now that we've built a solid understanding of how yield works, we have the knowledge necessary to understand some of the more "mind-bending" things that yield can be used for.

Believe it or not, we've barely scratched the surface of the power of yield. For example, while send does work as described above, it's almost never used when generating simple sequences like our example. Below, I've pasted a small demonstration of one common way send is used. I'll not say any more about it as figuring out how and why it works will be a good warm-up for part two.

import random

def get_data():
    """Return 3 random integers between 0 and 9"""
    return random.sample(range(10), 3)

def consume():
    """Displays a running average across lists of integers sent to it"""
    running_sum = 0
    data_items_seen = 0

    while True:
        data = yield
        data_items_seen += len(data)
        running_sum += sum(data)
        print('The running average is {}'.format(running_sum / float(data_items_seen)))

def produce(consumer):
    """Produces a set of values and forwards them to the pre-defined consumer
    function"""
    while True:
        data = get_data()
        print('Produced {}'.format(data))
        consumer.send(data)
        yield

if __name__ == '__main__':
    consumer = consume()
    consumer.send(None)
    producer = produce(consumer)

    for _ in range(10):
        print('Producing...')
        next(producer)
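A convenience sometimes paired with send-driven generators is a small decorator that performs the initial start-up call, so callers cannot forget it. The sketch below is an assumption-laden variation, not part of the original post: primed and averager are hypothetical names, and unlike consume above, averager yields the running average back to the sender instead of printing it.

```python
import functools


def primed(generator_function):
    """Decorator: create the generator and advance it to its first yield."""
    @functools.wraps(generator_function)
    def wrapper(*args, **kwargs):
        generator = generator_function(*args, **kwargs)
        next(generator)  # same effect as generator.send(None)
        return generator
    return wrapper


@primed
def averager():
    """Receive lists of numbers; yield the running average after each."""
    running_sum = 0
    items_seen = 0
    average = None
    while True:
        data = yield average
        items_seen += len(data)
        running_sum += sum(data)
        average = running_sum / items_seen


consumer = averager()                       # already started by the decorator
first_average = consumer.send([1, 2, 3])    # 6 / 3
second_average = consumer.send([4, 6])      # 16 / 5
print(first_average, second_average)
```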

Remember...

There are a few key ideas I hope you take away from this discussion:

  • Generators are used to generate a series of values
  • yield is like the return of generator functions
  • The only other thing yield does is save the "state" of a generator function
  • A generator is just a special type of iterator
  • Like iterators, we can get the next value from a generator using next()
    • for gets values by calling next() implicitly
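The last point can be spelled out by writing the loop by hand. This is a sketch of roughly what a for loop does behind the scenes, using the simple_generator_function from earlier; the variable names are illustrative.

```python
def simple_generator_function():
    yield 1
    yield 2
    yield 3


# The ordinary for loop.
collected_by_for = []
for value in simple_generator_function():
    collected_by_for.append(value)

# Roughly the same thing written out with iter(), next() and StopIteration.
collected_by_hand = []
iterator = iter(simple_generator_function())
while True:
    try:
        value = next(iterator)
    except StopIteration:
        break  # the for loop ends silently at this point
    collected_by_hand.append(value)

print(collected_by_for, collected_by_hand)
```

This is why exhausting a generator inside a for loop never surfaces StopIteration to the programmer.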

I hope this post was helpful. If you had never heard of generators, I hope you now understand what they are, why they're useful, and how to use them. If you were somewhat familiar with generators, I hope any confusion is now cleared up.

As always, if any section is unclear (or, more importantly, contains errors), by all means let me know. You can leave a comment below, email me [email protected], or hit me up on Twitter @jeffknupp.


  1. Quick refresher: a prime number is a positive integer greater than 1 that has no divisors other than 1 and itself. 3 is prime because there are no numbers that evenly divide it other than 1 and 3 itself.

