Iterator and Iterable


Unfortunately, the versions of Sentence below are bad ideas and not Pythonic.


import re
import reprlib

RE_WORD = re.compile(r'\w+')  # raw string avoids the invalid-escape warning for \w


class Sentence(object):
    def __init__(self, text):
        self.text = text  # kept so __repr__ can show it
        self.words = RE_WORD.findall(text)

    def __getitem__(self, idx):
        return self.words[idx]  # 0-based indexing makes instances iterable

    def __len__(self):
        return len(self.words)

    def __repr__(self):
        return "Sentence({})".format(reprlib.repr(self.text))

Definition of an Iterable object

The abc.Iterable check considers an object iterable if it implements the __iter__ method:

class Foo(object):
    """Minimal class that implements __iter__."""
    def __iter__(self):
        return iter([])  # __iter__ must return an iterator

>>> from collections import abc
>>> issubclass(Foo, abc.Iterable)
True
>>> isinstance(Foo(), abc.Iterable)
True

However, the most accurate way to check whether an object x is iterable is to call iter(x) inside a try/except statement and handle the TypeError exception if it isn't. This is more accurate than using isinstance(x, abc.Iterable), because iter(x) also considers the legacy __getitem__ method, while the Iterable ABC does not.
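A minimal sketch of that check. The helper is_iterable() and the class OnlyGetitem are hypothetical names used only for illustration; OnlyGetitem supports iteration solely through __getitem__, so the two checks disagree about it.

```python
from collections import abc


class OnlyGetitem:
    """Hypothetical class: no __iter__, only the legacy __getitem__."""

    def __getitem__(self, idx):
        return 'xyz'[idx]  # IndexError past the end stops iteration


def is_iterable(obj):
    """Hypothetical helper: True if iter() accepts obj."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


obj = OnlyGetitem()
print(isinstance(obj, abc.Iterable))  # False: the ABC only looks for __iter__
print(is_iterable(obj))               # True: iter() falls back to __getitem__
print(is_iterable(42))                # False: int supports neither protocol
```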


Specifically: objects implementing an __iter__ method that returns an iterator are iterable.

Sequences are always iterable (every sequence implements __getitem__), as are objects implementing a __getitem__ method that takes 0-based indexes.
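To see the 0-based index protocol at work, here is a sketch with a hypothetical class Squares that defines only __getitem__ and records which indexes iteration asks for. Iteration starts at 0 and stops at the first IndexError.

```python
class Squares:
    """Hypothetical class: iterable only through __getitem__."""

    def __init__(self, n):
        self.n = n
        self.requested = []  # indexes asked for during iteration

    def __getitem__(self, idx):
        self.requested.append(idx)
        if idx >= self.n:
            raise IndexError(idx)  # signals the end of the sequence
        return idx * idx


sq = Squares(4)
result = [x for x in sq]  # Python calls sq[0], sq[1], ... until IndexError
print(result)             # [0, 1, 4, 9]
print(sq.requested)       # [0, 1, 2, 3, 4]
```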




In code terms, a for loop works internally like a while loop combined with try...except exception handling.


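# The for machinery, done by hand with a while loop.
>>> s = 'ABC'
>>> for char in s:
...     print(char)
A
B
C
That is equivalent to:
>>> it = iter(s)              # build an iterator from the iterable
>>> while True:
...     try:
...         print(next(it))   # fetch the next item
...     except StopIteration: # the iterator is exhausted
...         del it            # delete the name: one less reference to the iterator
...         break             # leave the loop
A
B
C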

The relationship between Iterable and Iterator

Python obtains iterators from iterables.

In other words, iterables build iterators.

For example, all sequences are iterable, and the Python interpreter obtains a sequence's iterator through the built-in iter() function, just like iter(s) above.
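A quick check with collections.abc makes the relationship concrete: the string is an iterable, and the object iter() builds from it is an iterator (which is itself also iterable).

```python
from collections import abc

s = 'ABC'
it = iter(s)  # the iterable s builds an iterator

print(isinstance(s, abc.Iterable), isinstance(s, abc.Iterator))    # True False
print(isinstance(it, abc.Iterable), isinstance(it, abc.Iterator))  # True True
print(next(it), next(it))                                          # A B
```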



The standard interface for an iterator has two methods:

1. __next__: returns the next available item, raising StopIteration when there are no more items.

2. __iter__: returns self; this allows iterators to be used where an iterable is expected, for example, in a for loop.
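A minimal sketch of a class that follows this two-method interface; the name Countdown is hypothetical.

```python
class Countdown:
    """Hypothetical iterator yielding start, start - 1, ..., 1."""

    def __init__(self, start):
        self.current = start

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # no more items
        value = self.current
        self.current -= 1
        return value

    def __iter__(self):
        return self  # lets the iterator be used wherever an iterable is expected


c = Countdown(3)
print(next(c))          # 3
print(list(c))          # [2, 1]: list() keeps calling __next__ until StopIteration
for n in Countdown(2):  # works because __iter__ returns self
    print(n)            # 2, then 1
```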

Inside abc.Iterator

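# collections.abc.Iterator, as defined in Lib/_collections_abc.py
# (from an older CPython 3.x; later versions refactor this check).

class Iterator(Iterable):

    __slots__ = ()  # adds no per-instance __dict__

    @abstractmethod
    def __next__(self):
        'Return the next item from the iterator. When exhausted, raise StopIteration'
        raise StopIteration

    def __iter__(self):
        return self  # an iterator is its own iterator

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Iterator:
            # C qualifies if __next__ and __iter__ are both defined
            # somewhere in its MRO
            if (any("__next__" in B.__dict__ for B in C.__mro__) and
                    any("__iter__" in B.__dict__ for B in C.__mro__)):
                return True
        return NotImplemented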

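__subclasshook__ is what isinstance() and issubclass() consult (through ABCMeta) when checking against the ABC.


This is also why abc.Iterator should not be used to decide whether an object is iterable: first, it checks whether an object is an Iterator, not whether it is Iterable; second, its criterion requires both __iter__ and __next__ to be present.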
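The following sketch calls the hooks directly and then through isinstance(). Pair is a hypothetical class that is iterable but not an iterator, so abc.Iterator rejects it even though iterating over it works.

```python
from collections import abc


class Pair:
    """Hypothetical iterable: __iter__ but no __next__."""

    def __iter__(self):
        return iter((1, 2))


print(abc.Iterator.__subclasshook__(Pair))  # NotImplemented: __next__ is missing
print(abc.Iterable.__subclasshook__(Pair))  # True: __iter__ is present
print(isinstance(Pair(), abc.Iterator))     # False, although...
print(list(Pair()))                         # [1, 2]: ...Pair() is iterable
```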


Iterators in Python aren't a matter of type but of protocol. 

A large and changing number of builtin types implement *some* flavor of iterator.

Don't check the type! Use hasattr to check for both "__iter__" and "__next__" attributes instead.

In fact, that's exactly what the __subclasshook__ method of the abc.Iterator ABC does.

The best way to check if an object x is an iterator is to call isinstance(x, abc.Iterator).

Thanks to Iterator.__subclasshook__, this test works even if the class of x is not a real or virtual subclass of Iterator.
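A small sketch of that point: Ticker is a hypothetical class that neither inherits from abc.Iterator nor is registered with it, yet it passes the isinstance() test because it provides both methods.

```python
from collections import abc


class Ticker:
    """Hypothetical iterator; note it does NOT subclass abc.Iterator."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n == 0:
            raise StopIteration
        self.n -= 1
        return 'tick'


t = Ticker(2)
print(abc.Iterator in type(t).__mro__)  # False: not a real subclass
print(isinstance(t, abc.Iterator))      # True, thanks to __subclasshook__
print(hasattr(t, '__iter__') and hasattr(t, '__next__'))  # True: the same protocol check by hand
```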


Because the only methods required of an iterator are __next__ and __iter__, there is no way to check whether there are remaining items, other than to call next() and catch StopIteration. 


Also, it's not possible to 'reset' an iterator.


If you need to start over, you need to call iter() on the iterable that built the iterator in the first place.


Calling iter() on the original iterator itself won't help, because an iterator's __iter__ returns self.
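A short sketch of these points, using a plain list as the iterable: exhaustion is only discovered by calling next(), iter() on the exhausted iterator hands back the same object, and only the iterable can produce a fresh iterator.

```python
words = ['one', 'two']  # the iterable
it = iter(words)        # an iterator built from it

print(list(it))         # ['one', 'two']: this consumes the iterator

try:
    next(it)
except StopIteration:
    print('no items left')  # only discovered by calling next()

print(iter(it) is it)       # True: __iter__ returns self
print(list(iter(it)))       # []: still the same, exhausted iterator
print(list(iter(words)))    # ['one', 'two']: a fresh iterator from the iterable
```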


An iterator is any object that implements the no-argument __next__ method, which returns the next item in a series or raises StopIteration when there are no more items.


Python iterators also implement the __iter__ method so they are iterable as well.



import re
import reprlib

RE_WORD = re.compile(r'\w+')


class Sentence(object):
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __iter__(self):
        return SentenceIterator(self.words)  # a new iterator on every call

    def __repr__(self):
        return "Sentence({})".format(reprlib.repr(self.words))


class SentenceIterator(object):
    def __init__(self, words):
        self.words = words
        self.idx = 0  # position of the next word to return

    def __iter__(self):
        return self

    def __next__(self):
        try:
            word = self.words[self.idx]
        except IndexError:
            raise StopIteration  # no more words
        self.idx += 1
        return word
# this version has no __getitem__, to make it clear that the class is iterable because it implements __iter__

Note that implementing the __iter__ method in SentenceIterator is not actually needed for this example to work, but it's the right thing to do, for two reasons:

    Iterators are supposed to implement both __next__ and __iter__; only an object that implements both methods of the interface is a complete iterator.

    Doing so makes our iterator pass the issubclass(SentenceIterator, abc.Iterator) test.
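A sketch of both points, using two hypothetical classes modeled on Sentence and SentenceIterator, where BareIterator deliberately leaves out __iter__. A for loop over the iterable still works (in CPython, iter() only requires the object returned by __iter__ to have __next__), but the ABC test fails and the bare iterator cannot itself be passed to iter().

```python
from collections import abc


class WordsIterable:
    """Hypothetical iterable, shaped like Sentence."""

    def __init__(self, words):
        self.words = words

    def __iter__(self):
        return BareIterator(self.words)


class BareIterator:
    """Hypothetical iterator, shaped like SentenceIterator but WITHOUT __iter__."""

    def __init__(self, words):
        self.words = words
        self.idx = 0

    def __next__(self):
        try:
            word = self.words[self.idx]
        except IndexError:
            raise StopIteration
        self.idx += 1
        return word


words = WordsIterable(['The', 'time', 'has', 'come'])
print([w for w in words])                      # ['The', 'time', 'has', 'come']
print(issubclass(BareIterator, abc.Iterator))  # False: __iter__ is missing

try:
    iter(iter(words))  # iter() on the BareIterator itself
except TypeError as exc:
    print('TypeError:', exc)  # the bare iterator is not iterable
```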

A common cause of errors in building iterables and iterators is to confuse the two.

To be clear:


Iterables have an __iter__ method that instantiates a new iterator every time. (Objects that implement only __getitem__ are the special case: the iter() built-in creates the iterator for them.)

Iterators implement a __next__ method that returns individual items, and an __iter__ method that returns self.

Therefore, iterators are also iterable, but iterables are not necessarily iterators.

It may be tempting to add a __next__ method to the Sentence class, making each Sentence instance at the same time an iterable and an iterator over itself. But this is a terrible idea.


It must be possible to obtain multiple independent iterators from the same iterable instance, and each iterator must keep its own internal state, so a proper implementation of the pattern requires each call to iter(my_iterable) to create a new, independent iterator.


That is why we need the SentenceIterator class in this example.
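A minimal sketch of why this matters, with two hypothetical classes: SelfIteratingWords is its own iterator (the anti-pattern), while Words returns a fresh iterator from each iter() call. For brevity, Words delegates to iter() on its list instead of defining a SentenceIterator-style class.

```python
class SelfIteratingWords:
    """Hypothetical anti-pattern: the iterable is its own iterator."""

    def __init__(self, words):
        self.words = words
        self.idx = 0  # ONE shared position for every loop

    def __iter__(self):
        return self  # every "new" iterator is the same object

    def __next__(self):
        try:
            word = self.words[self.idx]
        except IndexError:
            raise StopIteration
        self.idx += 1
        return word


class Words:
    """Hypothetical well-behaved iterable: each iter() call builds a new iterator."""

    def __init__(self, words):
        self.words = words

    def __iter__(self):
        return iter(self.words)  # fresh list iterator with its own position


bad = SelfIteratingWords(['a', 'b', 'c'])
print(list(bad))  # ['a', 'b', 'c']
print(list(bad))  # []: the shared position is already at the end

good = Words(['a', 'b', 'c'])
it1, it2 = iter(good), iter(good)
print(next(it1), next(it1), next(it2))  # a b a: independent positions
print(list(good), list(good))           # ['a', 'b', 'c'] ['a', 'b', 'c']
```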


An iterable should never act as an iterator over itself.


In other words, an iterable should implement __iter__, but not __next__.

On the other hand, for convenience, iterators should be iterable.

An iterator's __iter__ should just return self.
