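Translated from: https://pymotw.com/3/itertools/index.html
The functions in itertools are designed to be fast and to use memory efficiently: data is not produced until it is actually needed, and this "lazy" approach avoids holding large amounts of data in memory.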
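As a rough illustration of the lazy behaviour described above, the sketch below uses a generator that records every value it creates. The names `noisy_numbers` and `produced` are invented for this example and are not part of itertools.

```python
from itertools import islice

produced = []  # records every value the generator actually creates


def noisy_numbers():
    """Hypothetical infinite generator that logs each value it produces."""
    n = 0
    while True:
        produced.append(n)
        yield n
        n += 1


# Building the pipeline does no work: nothing has been produced yet.
first_three = islice(noisy_numbers(), 3)
produced_before = list(produced)

# Values are created only when the iterator is consumed.
result = list(first_three)

print(produced_before)  # []
print(result)           # [0, 1, 2]
print(produced)         # [0, 1, 2]
```

Only three values are ever created, even though the generator itself is infinite.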
1) chain
chain() takes several iterables as arguments and returns a single iterator that yields all of their elements in order.
from itertools import *
for i in chain([1, 2, 3], ['a', 'b', 'c']):  # chain avoids having to build one large list
    print(i, end=' ')
# Output
1 2 3 a b c
2) chain.from_iterable()
When the iterables to combine are not known in advance, use chain.from_iterable(), which takes a single iterable that yields them one at a time.
from itertools import *
def make_iterables_to_chain():
    yield [1, 2, 3]
    yield ['a', 'b', 'c']
for i in chain.from_iterable(make_iterables_to_chain()):
    print(i, end=' ')
print()
# Output
1 2 3 a b c
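3) zip
zip() combines the elements at matching positions of several iterators into tuples. Note the difference between zip(), which stops as soon as the shortest input is exhausted, and zip_longest(), which stops only when the longest input is exhausted and fills the missing values with None by default.
for i in zip([1, 2, 3], ['a', 'b', 'c']):
    print(i)
# Output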
(1, 'a')
(2, 'b')
(3, 'c')
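To make the contrast between zip() and zip_longest() concrete, here is a minimal sketch with inputs of different lengths. The sample lists are invented for the illustration.

```python
from itertools import zip_longest

short = [1, 2, 3]
longer = ['a', 'b', 'c', 'd', 'e']

# zip stops as soon as the shorter input is exhausted.
zipped = list(zip(short, longer))

# zip_longest keeps going and fills the gaps with None by default.
padded = list(zip_longest(short, longer))

# The fillvalue argument replaces the default None.
padded_dash = list(zip_longest(short, longer, fillvalue='-'))

print(zipped)
print(padded)
print(padded_dash)
```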
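4) islice
islice() returns an iterator over the items of the input iterator selected by index. It takes the same stop, start, and step arguments as the slice operator, except that none of them may be negative.
from itertools import *
print('Stop at 5:')
for i in islice(range(100), 5):
    print(i, end=' ')
print('\n')
print('Start at 5, Stop at 10:')
for i in islice(range(100), 5, 10):
    print(i, end=' ')
print('\n')
print('By tens to 100:')
for i in islice(range(100), 0, 100, 10):
    print(i, end=' ')
print('\n')
# Output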
Stop at 5:
0 1 2 3 4
Start at 5, Stop at 10:
5 6 7 8 9
By tens to 100:
0 10 20 30 40 50 60 70 80 90
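The examples above slice a range, which could also be sliced directly. The sketch below applies islice() to a generator instead, which has no slice syntax of its own, and shows that the skipped and selected items are consumed from the underlying iterator. The generator `squares` is a hypothetical helper written for this illustration.

```python
from itertools import count, islice


def squares():
    """Hypothetical infinite generator of square numbers."""
    for n in count():
        yield n * n


gen = squares()

# gen[2:5] would raise TypeError; islice selects the same positions.
middle = list(islice(gen, 2, 5))

# islice advanced gen past index 4, so the next value comes from index 5.
following = next(gen)

print(middle)     # [4, 9, 16]
print(following)  # 25
```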
5) tee
tee() returns several independent iterators (two by default) based on a single input iterator.
from itertools import *
r = islice(count(), 5)
i1, i2 = tee(r)
print('i1:', list(i1))
print('i2:', list(i2))
# Output
i1: [0, 1, 2, 3, 4]
i2: [0, 1, 2, 3, 4]
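The new iterators returned by tee() share their input with the source iterator, so any values consumed from the source will not appear in the new iterators. For that reason the source iterator should not be used once the new ones have been created.
from itertools import *
r = islice(count(), 5)
i1, i2 = tee(r)
print('r:', end=' ')
for i in r:
    print(i, end=' ')
    if i > 1:
        break
print()
print('i1:', list(i1))
print('i2:', list(i2))
# Output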
r: 0 1 2
i1: [3, 4]
i2: [3, 4]
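One common use of tee() is to walk the same data at two different offsets. The sketch below follows the well-known "pairwise" recipe; the function is defined here for illustration (Python 3.10 and later also ship itertools.pairwise).

```python
from itertools import tee


def pairwise(iterable):
    """Yield overlapping pairs (s0, s1), (s1, s2), ... from iterable."""
    first, second = tee(iterable)
    next(second, None)  # advance the second copy by one item
    return zip(first, second)


pairs = list(pairwise([1, 2, 3, 4]))
print(pairs)  # [(1, 2), (2, 3), (3, 4)]
```

The two copies advance independently of each other, which is why moving `second` forward does not affect `first`.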
1) The built-in map function
map() returns an iterator that calls a function on the values from the input iterators and stops when any one of the inputs is exhausted.
def times_two(x):
    return 2 * x
def multiply(x, y):
    return (x, y, x * y)
print('Doubles:')
for i in map(times_two, range(5)):
    print(i)
print('\nMultiples:')
r1 = range(5)
r2 = range(5, 10)
for i in map(multiply, r1, r2):
    print('{:d} * {:d} = {:d}'.format(*i))
print('\nStopping:')
r1 = range(5)
r2 = range(2)
for i in map(multiply, r1, r2):
    print(i)
# Output
Doubles:
0
2
4
6
8
Multiples:
0 * 5 = 0
1 * 6 = 6
2 * 7 = 14
3 * 8 = 24
4 * 9 = 36
Stopping:
(0, 0, 0)
(1, 1, 1)
2) starmap
starmap() is similar to map(), but where map() accepts several iterators, starmap() accepts a single iterator and uses the * syntax to unpack each of its items into the arguments of the function.
from itertools import *
values = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9)]
for i in starmap(lambda x, y: (x, y, x * y), values):
    print('{} * {} = {}'.format(*i))
# Output
0 * 5 = 0
1 * 6 = 6
2 * 7 = 14
3 * 8 = 24
4 * 9 = 36
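The difference between the two call styles can be sketched side by side. The sample `pairs` below is made up for the illustration; `pow` is the Python built-in.

```python
from itertools import starmap

pairs = [(2, 5), (3, 2), (10, 3)]

# starmap unpacks each tuple into positional arguments:
# pow(2, 5), pow(3, 2), pow(10, 3)
powers = list(starmap(pow, pairs))

# With map, the same data must first be split into one iterable per argument.
bases, exponents = zip(*pairs)
powers_with_map = list(map(pow, bases, exponents))

print(powers)           # [32, 9, 1000]
print(powers_with_map)  # [32, 9, 1000]
```

starmap() is the natural choice when the arguments already arrive grouped as tuples, for example from zip() or from rows of a table.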
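1) count
count() returns an iterator that produces consecutive integers indefinitely. The first value defaults to 0, and there is no upper bound.
from itertools import *
for i in zip(count(1), ['a', 'b', 'c']):  # count starts at 1
    print(i)
# Output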
(1, 'a')
(2, 'b')
(3, 'c')
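count() also accepts a start value and a step, and both may be any numeric type that supports addition. The following is a small sketch; islice() is used only to cut the infinite sequence short.

```python
from fractions import Fraction
from itertools import count, islice

# Start at 1/3 and add 1/3 each time.
thirds = list(islice(count(Fraction(1, 3), Fraction(1, 3)), 4))

# A negative step counts downwards.
countdown = list(islice(count(10, -2), 4))

print(thirds)     # 1/3, 2/3, 1, 4/3 as Fraction objects
print(countdown)  # [10, 8, 6, 4]
```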
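2) cycle
cycle() returns an iterator that repeats the contents of its argument indefinitely. Because it has to remember the entire contents of the input iterator, it can consume a lot of memory when the input is large.
from itertools import *
for i in zip(range(7), cycle(['a', 'b', 'c'])):
    print(i)
# Output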
(0, 'a')
(1, 'b')
(2, 'c')
(3, 'a')
(4, 'b')
(5, 'c')
(6, 'a')
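A typical use of this pattern is round-robin assignment. The sketch below is only an illustration; the worker and task names are hypothetical.

```python
from itertools import cycle

workers = ['alice', 'bob', 'carol']      # hypothetical worker names
tasks = ['t1', 't2', 't3', 't4', 't5']   # hypothetical task ids

# zip stops when the tasks run out, so the infinite cycle is safe here.
assignments = list(zip(tasks, cycle(workers)))

for task, worker in assignments:
    print(task, '->', worker)
```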
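3) repeat
repeat() returns an iterator that produces the same value every time. The number of repetitions can be set with the times argument; without it, the value is repeated forever.
from itertools import *
for i in repeat('over-and-over', 5):
    print(i)
# Output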
over-and-over
over-and-over
over-and-over
over-and-over
over-and-over
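Combining repeat() with zip() or map() is useful when a constant value has to be paired with the values from another iterator. With zip() and count(), for example, it produces a numbered sequence of the constant:
from itertools import *
for i, s in zip(count(), repeat('over-and-over', 5)):
    print(i, s)
# Output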
0 over-and-over
1 over-and-over
2 over-and-over
3 over-and-over
4 over-and-over
Combined with map(), it generates a multiplication table:
from itertools import *
for i in map(lambda x, y: (x, y, x * y), repeat(2), range(5)):
    print('{:d} * {:d} = {:d}'.format(*i))
# Output
2 * 0 = 0
2 * 1 = 2
2 * 2 = 4
2 * 3 = 6
2 * 4 = 8
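1) dropwhile
dropwhile() returns an iterator that tests the elements of the input one by one and discards them for as long as the test is true. Once an element makes the test return False, that element and every element after it are produced without any further testing.
from itertools import *
def should_drop(x):
    print('Testing:', x)
    return x < 1
for i in dropwhile(should_drop, [-1, 0, 1, 2, -2]):
    print('Yielding:', i)
# Output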
Testing: -1
Testing: 0
Testing: 1
Yielding: 1
Yielding: 2
Yielding: -2
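Because testing stops at the first failure, dropwhile() is well suited to skipping a leading block such as a file header. A minimal sketch follows, with invented sample lines standing in for a real file.

```python
from itertools import dropwhile

# Hypothetical file contents: a comment header followed by the body.
lines = [
    '# generated file',
    '# do not edit',
    'name = demo',
    '# a later comment is kept',
    'debug = true',
]

# Only the leading comment lines are dropped.
body = list(dropwhile(lambda line: line.startswith('#'), lines))

for line in body:
    print(line)
```

The comment in the middle survives because the predicate is no longer consulted after 'name = demo'. Use filter() or filterfalse() when every item has to be tested.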
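2) takewhile
takewhile() is the opposite of dropwhile(): its iterator returns elements from the input for as long as the test is true and stops at the first element for which the test is false.
from itertools import *
def should_take(x):
    print('Testing:', x)
    return x < 2
for i in takewhile(should_take, [-1, 0, 1, 2, -2]):
    print('Yielding:', i)
# Output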
Testing: -1
Yielding: -1
Testing: 0
Yielding: 0
Testing: 1
Yielding: 1
Testing: 2
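The last line of the output shows that the element 2 is tested but never yielded. One consequence, sketched below with made-up data, is that the first failing element is consumed from the underlying iterator and is lost if that iterator is read again afterwards.

```python
from itertools import takewhile

source = iter([1, 2, 3, 10, 4, 5])

# Takes 1, 2, 3, then reads 10, which fails the test and ends the iteration.
small = list(takewhile(lambda x: x < 5, source))

# 10 has already been consumed, so it does not show up here either.
rest = list(source)

print(small)  # [1, 2, 3]
print(rest)   # [4, 5]
```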
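3) filter
The built-in filter() returns an iterator that contains every element for which the test is true. Unlike dropwhile() and takewhile(), it tests every element.
from itertools import *
def check_item(x):
    print('Testing:', x)
    return x < 1
for i in filter(check_item, [-1, 0, 1, 2, -2]):
    print('Yielding:', i)
# Output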
Testing: -1
Yielding: -1
Testing: 0
Yielding: 0
Testing: 1
Testing: 2
Testing: -2
Yielding: -2
4) filterfalse
filterfalse() returns an iterator that contains only the elements for which the test is false.
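This section has no listing, so here is a sketch that reuses the test function and the input from the filter() example. The list `kept` is added only to collect the results.

```python
from itertools import filterfalse


def check_item(x):
    print('Testing:', x)
    return x < 1


kept = []
for i in filterfalse(check_item, [-1, 0, 1, 2, -2]):
    print('Yielding:', i)
    kept.append(i)

print(kept)  # [1, 2]
```

Every element is tested, and exactly the ones that filter() rejected (1 and 2) are yielded.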
5) compress
Compared with filter(), compress() offers a different way of filtering: instead of a function, it takes a second iterable, and the values in that iterable decide which values of the input iterable are kept.
from itertools import *
every_third = cycle([False, False, True])
data = range(1, 10)
for i in compress(data, every_third):
    print(i, end=' ')
print()
# Output
3 6 9
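The selectors do not have to follow a fixed pattern; they are often computed from a second, parallel sequence. A small sketch with invented names and scores:

```python
from itertools import compress

names = ['ann', 'ben', 'cat', 'dan']   # hypothetical data
scores = [82, 47, 91, 60]

# One boolean per name: True where the matching score is at least 60.
selectors = [score >= 60 for score in scores]

passed = list(compress(names, selectors))
print(passed)  # ['ann', 'cat', 'dan']
```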
1) groupby
groupby() groups the elements of an iterator by a key. Only consecutive elements with the same key end up in the same group, so the input has to be sorted by that key first for the groups to come out as expected.
import functools
from itertools import *
import operator
import pprint
@functools.total_ordering
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        return '({}, {})'.format(self.x, self.y)
    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)
    def __gt__(self, other):
        return (self.x, self.y) > (other.x, other.y)
# Create a dataset of Point instances
data = list(map(Point,
                cycle(islice(count(), 3)),
                islice(count(), 7)))
print('Data:')
pprint.pprint(data, width=35)
print()
# Try to group the unsorted data based on X values
print('Grouped, unsorted:')
for k, g in groupby(data, operator.attrgetter('x')):
    print(k, list(g))
print()
# Sort the data
data.sort()
print('Sorted:')
pprint.pprint(data, width=35)
print()
# Group the sorted data based on X values
print('Grouped, sorted:')
for k, g in groupby(data, operator.attrgetter('x')):
    print(k, list(g))
print()
# Output
Data:
[(0, 0),
 (1, 1),
 (2, 2),
 (0, 3),
 (1, 4),
 (2, 5),
 (0, 6)]
Grouped, unsorted:
0 [(0, 0)]
1 [(1, 1)]
2 [(2, 2)]
0 [(0, 3)]
1 [(1, 4)]
2 [(2, 5)]
0 [(0, 6)]
Sorted:
[(0, 0),
 (0, 3),
 (0, 6),
 (1, 1),
 (1, 4),
 (2, 2),
 (2, 5)]
Grouped, sorted:
0 [(0, 0), (0, 3), (0, 6)]
1 [(1, 1), (1, 4)]
2 [(2, 2), (2, 5)]
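The sort-then-group pattern shown above is often written in a more compact form. The sketch below uses a made-up word list and a hypothetical helper `first_letter` as the key.

```python
from itertools import groupby

words = ['banana', 'apple', 'cherry', 'avocado', 'blueberry']


def first_letter(word):
    """Key function shared by sorted() and groupby()."""
    return word[0]


# Sort with the same key that groupby uses; otherwise equal keys that are
# not adjacent would end up in separate groups.
grouped = {
    key: list(group)
    for key, group in groupby(sorted(words, key=first_letter), key=first_letter)
}

print(grouped)
```

Each group is turned into a list right away because a group iterator shares the underlying data with groupby() and is no longer usable once the next group has been requested.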