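The collections module contains implementations of several data structures that extend the corresponding structures found elsewhere in Python. For example, deque is a double-ended queue that allows items to be added or removed from either end. defaultdict is a dictionary that responds with a default value when a key is missing, while OrderedDict remembers the order in which items were added. namedtuple extends the normal tuple, giving each member an attribute name in addition to a numeric index.
For large amounts of data, an array uses memory more efficiently than a list. Because an array is limited to a single data type, it can use a more compact memory representation than a general-purpose list. Many list methods also work on an array.
list has a sort() method, but the functions in heapq modify the contents of a list while maintaining its sort order at low overhead.
bisect is another option for building sorted lists or arrays.
Simulating a queue with a list's insert() and pop() methods is not thread-safe. For ordered communication between threads, use the Queue module. multiprocessing includes a version of Queue that handles communication between processes, which makes it easier to convert a multithreaded program to use processes instead.
struct is useful for decoding data from another application (for example, binary data produced on Windows) into Python's built-in types so that it is easier to work with.
For highly interconnected data structures such as graphs and trees, weakref can maintain references while still allowing the garbage collector to clean up objects that are no longer needed. The functions in copy duplicate data structures and their contents, including recursive copies made with deepcopy().
pprint can be used to create easy-to-read representations.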
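Purpose: Container data types
Python version: 2.4 and later
The collections module provides container data types beyond the built-in list, dict, and tuple.
A Counter is a container that keeps track of how many times equivalent values are added.
Counter supports three forms of initialization. Its constructor can be called with a sequence of items, with a dictionary of keys and counts, or with keyword arguments that map string names to counts: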
>>> import collections
>>> collections.Counter(['a', 'b', 'c', 'a', 'b', 'b'])
Counter({'b': 3, 'a': 2, 'c': 1})
>>> collections.Counter({'a':2, 'b':3, 'c':1})
Counter({'b': 3, 'a': 2, 'c': 1})
>>> collections.Counter(a = 2, b = 3, c = 1)
Counter({'b': 3, 'a': 2, 'c': 1})
Because a Counter is a dict subclass, we can add data with update(), inspect the counts with items(), and list every element with elements():
>>> c = collections.Counter()
>>> c
Counter()
>>> c.update('abcdaab')
>>> c
Counter({'a': 3, 'b': 2, 'c': 1, 'd': 1})
>>> c.update({'a':1, 'd':5})
>>> c
Counter({'d': 6, 'a': 4, 'b': 2, 'c': 1})
>>> for key, value in c.items():
...     print key, ' => ', value
...
a  =>  4
c  =>  1
b  =>  2
d  =>  6
>>> c.elements()
<itertools.chain object at 0x0000000002C97390>
>>> list(c.elements())
['a', 'a', 'a', 'a', 'c', 'b', 'b', 'd', 'd', 'd', 'd', 'd', 'd']
most_common() produces a sequence of the n most frequently encountered input values and their counts (much like sorting a dictionary by value):
>>> c = collections.Counter()
>>> c.update({'a':5, 'b':3, 'c':11, 'd':23, 'e':2})
>>> for letter, count in c.most_common(3):
...     print '%s: %d' % (letter, count)
...
d: 23
c: 11
a: 5
A plain dictionary is a hash table that keeps no ordering by value, so it has no direct equivalent of most_common(); its items would have to be sorted explicitly.
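For comparison, here is a minimal sketch of what the explicit sort looks like with a plain dict. The counts dictionary is illustrative data copied from the example above, and the code uses print() calls so that it also runs on Python 3.

```python
from operator import itemgetter

# Illustrative counts held in a plain dict (same data as the Counter example).
counts = {'a': 5, 'b': 3, 'c': 11, 'd': 23, 'e': 2}

# Sort the (key, value) pairs by value, largest first, and keep the top three.
top_three = sorted(counts.items(), key=itemgetter(1), reverse=True)[:3]

for letter, count in top_three:
    print('%s: %d' % (letter, count))
```

This gives the same pairs as most_common(3), at the cost of sorting every item each time it is needed.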
Counter also supports arithmetic and set operations for aggregating results:
>>> import collections
>>> c1 = collections.Counter(['a', 'b', 'c', 'a', 'b', 'b'])
>>> c2 = collections.Counter('alphabet')
>>> c1
Counter({'b': 3, 'a': 2, 'c': 1})
>>> c2
Counter({'a': 2, 'b': 1, 'e': 1, 'h': 1, 'l': 1, 'p': 1, 't': 1})
>>> c1 + c2
Counter({'a': 4, 'b': 4, 'c': 1, 'e': 1, 'h': 1, 'l': 1, 'p': 1, 't': 1})
>>> c1 - c2
Counter({'b': 2, 'c': 1})
>>> c1 & c2
Counter({'a': 2, 'b': 1})
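The standard dictionary includes the method setdefault() for retrieving a value and establishing a default if the value does not exist. By contrast, defaultdict lets the caller specify the default up front, when the container is initialized.
>>> import collections
>>> def default_factory():
...     return 'default value'
...
>>> d = collections.defaultdict(default_factory, foo='bar')
>>> d
defaultdict(<function default_factory at 0x0000000002C929E8>, {'foo': 'bar'})
>>> d['foo']
'bar'
>>> d['bar']
'default value'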
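The contrast between setdefault() and defaultdict is easiest to see side by side. The sketch below groups a made-up list of words by first letter in both styles; the names words, by_letter_plain, and by_letter are illustrative only.

```python
import collections

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# Plain dict: setdefault() supplies the empty list the first time a key is seen.
by_letter_plain = {}
for word in words:
    by_letter_plain.setdefault(word[0], []).append(word)

# defaultdict: the factory (list) is named once, when the container is created.
by_letter = collections.defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

print(dict(by_letter))
```

Note that merely looking up a missing key in a defaultdict inserts the default value, which setdefault() only does when it is called explicitly.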
A deque (double-ended queue) supports adding and removing items from either end.
>>> d = collections.deque('abcdefg')
>>> d
deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
>>> del d
>>> d = collections.deque()
>>> d.extend('abcdefg')
>>> d.append('h')
>>> d
deque(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
>>> d2 = collections.deque()
>>> d2.extendleft(range(6))
>>> d2.appendleft(6)
>>> d2
deque([6, 5, 4, 3, 2, 1, 0])
>>> d2.pop()
0
>>> d2.popleft()
6
>>> d2
deque([5, 4, 3, 2, 1])
Because a deque is thread-safe, its contents can even be consumed from both ends at the same time by different threads:
import collections
import threading
import time

candle = collections.deque(range(5))

def burn(direction, nextSource):
    while True:
        try:
            next = nextSource()
        except IndexError:
            break
        else:
            print '%8s: %s' % (direction, next)
            time.sleep(0.1)
    print '%8s done' % direction
    return

if __name__ == "__main__":
    left = threading.Thread(target=burn, args=('Left', candle.popleft))
    right = threading.Thread(target=burn, args=('Right', candle.pop))

    left.start()
    right.start()

    left.join()
    right.join()
The interpreter prints:
    Left: 0
   Right: 4
   Right: 3
    Left: 1
   Right: 2
    Left done
   Right done
Another useful feature of deque is that it can be rotated in either direction, to skip over some items.
>>> import collections
>>> d = collections.deque(range(10))
>>> d
deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> d.rotate(2)
>>> d
deque([8, 9, 0, 1, 2, 3, 4, 5, 6, 7])
>>> d = collections.deque(range(10))
>>> d.rotate(-2)
>>> d
deque([2, 3, 4, 5, 6, 7, 8, 9, 0, 1])
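One way to picture rotate() is as repeated pop-and-append. The following sketch, with the illustrative name manual, checks that rotating right by two steps matches moving the last item to the front twice.

```python
import collections

d = collections.deque(range(10))
d.rotate(2)

# Rotating right by one step is the same as moving the last item to the front.
manual = collections.deque(range(10))
for _ in range(2):
    manual.appendleft(manual.pop())

print(d)
print(manual)
```

A negative argument works the other way round: each step moves the first item to the end.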
The standard tuple uses numeric indexes to access its members:
bob = ('Bob', 30, 'male')
print 'Representation:', bob

jane = ('jane', 29, 'female')
print '\nFields by index:', jane[0]

print '\nFields by index:'
for p in [bob, jane]:
    print '%s is a %d year old %s' % p
The interpreter prints:
Representation: ('Bob', 30, 'male')

Fields by index: jane

Fields by index:
Bob is a 30 year old male
jane is a 29 year old female
Using a tuple means remembering which index goes with which value, and that invites errors, especially when the tuple has many fields or is built far from where it is used. A namedtuple assigns each member a name as well as a numeric index (it can be thought of as a dictionary-like structure with a fixed order).
import collections

Person = collections.namedtuple('Person', 'name age gender')
print 'Type of Person', type(Person)

bob = Person(name='Bob', age=30, gender='male')
print '\nRepresentation:', bob

jane = Person(name='jane', age=29, gender='female')
print '\nField by name:', jane.name

print '\nFields by index:'
for p in [bob, jane]:
    print '%s is a %d year old %s' % p
The interpreter prints:
Type of Person <type 'type'>

Representation: Person(name='Bob', age=30, gender='male')

Field by name: jane

Fields by index:
Bob is a 30 year old male
jane is a 29 year old female
Field names are invalid if they are repeated or conflict with Python keywords:
import collections

try:
    collections.namedtuple('Person', 'name class age gender')
except ValueError, err:
    print err

try:
    collections.namedtuple('Person', 'name age gender age')
except ValueError, err:
    print err
The interpreter prints:
Type names and field names cannot be a keyword: 'class'
Encountered duplicate field name: 'age'
When a namedtuple is built from values outside the program's control (for example, rows returned by a database query whose schema is not known in advance), set the rename option to True so that invalid field names are renamed:
import collections

with_class = collections.namedtuple('Person', 'name class age gender', rename = True)
print with_class._fields

two_ages = collections.namedtuple('Person', 'name age gender age', rename = True)
print two_ages._fields
The interpreter prints:
('name', '_1', 'age', 'gender')
('name', 'age', 'gender', '_3')
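The database case mentioned above can be sketched with the standard sqlite3 module. The table people, its columns, and the type name Row are made up for illustration; the point is only that the column names come from the query at run time and one of them ("class") is a Python keyword.

```python
import collections
import sqlite3

# An in-memory database with a hypothetical table; "class" is a legal column
# name in SQL but a keyword in Python.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE people (name TEXT, "class" TEXT, age INTEGER)')
conn.execute("INSERT INTO people VALUES ('Bob', 'staff', 30)")

cursor = conn.execute('SELECT * FROM people')

# cursor.description holds one tuple per column; the first item is the name.
column_names = [col[0] for col in cursor.description]

# rename=True replaces the invalid name with a positional one such as _1.
Row = collections.namedtuple('Row', column_names, rename=True)
rows = [Row(*values) for values in cursor.fetchall()]
conn.close()

print(Row._fields)
print(rows[0].name)
```

The renamed field is still reachable by index, or by its generated name (here rows[0]._1).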
An OrderedDict is a dictionary subclass that remembers the order in which its contents were added.
>>> import collections
>>> d1 = {}
>>> d1['a'] = 'A'
>>> d1['b'] = 'B'
>>> d1['c'] = 'C'
>>> d2 = collections.OrderedDict()
>>> d2['a'] = 'A'
>>> d2['b'] = 'B'
>>> d2['c'] = 'C'
>>> d1
{'a': 'A', 'c': 'C', 'b': 'B'}
>>> d2
OrderedDict([('a', 'A'), ('b', 'B'), ('c', 'C')])
Insertion order is also taken into account when two OrderedDict instances are tested for equality.
>>> import collections
>>> d1 = {}
>>> d1['a'] = 'A'
>>> d1['b'] = 'B'
>>> d1['c'] = 'C'
>>> d2 = {}
>>> d2['c'] = 'C'
>>> d2['b'] = 'B'
>>> d2['a'] = 'A'
>>> d1 == d2
True
>>> d1 = collections.OrderedDict()
>>> d1['a'] = 'A'
>>> d1['b'] = 'B'
>>> d1['c'] = 'C'
>>> d2 = collections.OrderedDict()
>>> d2['c'] = 'C'
>>> d2['b'] = 'B'
>>> d2['a'] = 'A'
>>> d1 == d2
False
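Purpose: Manage sequences of fixed-type numerical data efficiently
Python version: 1.4 and later
The array module defines a sequence data structure that looks much like a list, except that all members must be of the same primitive type.
An array is instantiated with an argument describing the type of data it may hold, and optionally with an initial sequence of data to store. Supported operations include slicing, iteration, and appending items to the end.
import array

a = array.array('i', range(3))
print 'Initial:', a

a.extend(range(4, 6))
print 'Extended:', a

print 'Slice:', a[2:5]

print 'Iterator:'
print list(enumerate(a))
The interpreter prints:
Initial: array('i', [0, 1, 2])
Extended: array('i', [0, 1, 2, 4, 5])
Slice: array('i', [2, 4, 5])
Iterator:
[(0, 0), (1, 1), (2, 2), (3, 4), (4, 5)]
The contents of an array can be written to and read from files using built-in methods designed for efficient file I/O:
import array
import binascii

a = array.array('i', range(5))
print 'A1:', a

# Write the array to a file; binary mode keeps the raw bytes intact on Windows.
output = open('test.txt', 'wb')
a.tofile(output)
output.flush()
output.close()

# Read the raw bytes back, then load them into a new array.
with open('test.txt', 'rb') as input:
    raw_data = input.read()
    print 'Raw Contents:', binascii.hexlify(raw_data)

    input.seek(0)
    a2 = array.array('i')
    a2.fromfile(input, len(a))
    print 'A2:', a2
The interpreter prints:
A1: array('i', [0, 1, 2, 3, 4])
Raw Contents: 0000000001000000020000000300000004000000
A2: array('i', [0, 1, 2, 3, 4])
If the data in the array is not in the native byte order, or needs to be swapped before being sent to a system with a different byte order (or over the network), Python can convert the entire array without iterating over each element: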
import array
import binascii

def to_hex(a):
    # Each byte becomes two hex characters, so one item spans itemsize * 2.
    chars_per_item = a.itemsize * 2
    hex_version = binascii.hexlify(a)
    num_chunks = len(hex_version) / chars_per_item
    for i in range(num_chunks):
        start = i * chars_per_item
        end = start + chars_per_item
        yield hex_version[start:end]

a1 = array.array('i', range(5))
a2 = array.array('i', range(5))
a2.byteswap()

fmt = '%10s %10s %10s %10s'
print fmt % ('A1 hex', 'A1', 'A2 hex', 'A2')
print fmt % (('-' * 10, ) * 4)
for values in zip(to_hex(a1), a1, to_hex(a2), a2):
    print fmt % values
The interpreter prints (run on a 64-bit system, where 'i' is a 4-byte little-endian integer):
    A1 hex         A1     A2 hex         A2
---------- ---------- ---------- ----------
  00000000          0   00000000          0
  01000000          1   00000001   16777216
  02000000          2   00000002   33554432
  03000000          3   00000003   50331648
  04000000          4   00000004   67108864
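Because every member has the same fixed-size type, an array can store raw values instead of references to separate objects. A rough way to see the effect, assuming CPython's sys.getsizeof() as the measuring tool, is sketched below; the variable names are illustrative and the exact numbers vary by platform and Python version.

```python
import array
import sys

numbers = list(range(1000))
packed = array.array('i', numbers)

# Bytes used by the array's raw element storage.
array_payload = packed.itemsize * len(packed)

# A list stores one pointer per element, and each int is a separate object.
# Small ints may be shared by the interpreter, so treat this as an upper bound.
list_total = sys.getsizeof(numbers) + sum(sys.getsizeof(n) for n in numbers)

print('array payload: %d bytes' % array_payload)
print('list estimate: %d bytes' % list_total)
```

This is an estimate rather than a benchmark, but the gap is usually large enough to make the point.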
A max-heap ensures that a parent is greater than or equal to both of its children. A min-heap requires that a parent be less than or equal to its children. Python's heapq module implements a min-heap.
heapq_heapdata.py:
data = [19, 9, 4, 10, 11]
heapq_showtree.py:
import math
from cStringIO import StringIO

def show_tree(tree, total_width=36, fill=' '):
    """Pretty-print a tree."""
    output = StringIO()
    last_row = -1
    for i, n in enumerate(tree):
        if i:
            row = int(math.floor(math.log(i + 1, 2)))
        else:
            row = 0
        if row != last_row:
            output.write('\n')
        columns = 2 ** row
        col_width = int(math.floor((total_width * 1.0) / columns))
        output.write(str(n).center(col_width, fill))
        last_row = row
    print output.getvalue()
    print '-' * total_width
    print
    return
There are two basic ways to create a heap: heappush() and heapify():
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

heap = []
print 'random:', data
print

for n in data:
    print 'add %3d:' % n
    heapq.heappush(heap, n)
    show_tree(heap)
The interpreter prints:
random: [19, 9, 4, 10, 11]

add  19:

                 19
------------------------------------

add   9:

                 9
        19
------------------------------------

add   4:

                 4
        19                9
------------------------------------

add  10:

                 4
        10                9
    19
------------------------------------

add  11:

                 4
        10                9
    19       11
------------------------------------

If the data is already in memory, it is more efficient to use heapify() to rearrange the list in place:
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

print 'random :', data
heapq.heapify(data)
print 'heapified:'
show_tree(data)
The interpreter prints:
random : [19, 9, 4, 10, 11]
heapified:

                 4
        9                 19
    10       11
------------------------------------

Once the heap is organized correctly, use heappop() to remove the element with the smallest value:
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

print 'random :', data
heapq.heapify(data)
print 'heapified:'
show_tree(data)
print

for i in range(2):
    smallest = heapq.heappop(data)
    print 'pop %3d:' % smallest
    show_tree(data)
The interpreter prints:
random : [19, 9, 4, 10, 11]
heapified:

                 4
        9                 19
    10       11
------------------------------------


pop   4:

                 9
        10                19
    11
------------------------------------

pop   9:

                 10
        11                19
------------------------------------

heapreplace() removes the smallest existing element and replaces it with a new value in a single operation:
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data

heapq.heapify(data)
print 'start:'
show_tree(data)
print

for i in [0, 13]:
    smallest = heapq.heapreplace(data, i)
    print 'replace %2d with %2d:' % (smallest, i)
    show_tree(data)
The interpreter prints:
start:

                 4
        9                 19
    10       11
------------------------------------


replace  4 with  0:

                 0
        9                 19
    10       11
------------------------------------

replace  0 with 13:

                 9
        10                19
    13       11
------------------------------------

heapq also includes two functions that examine an iterable to find a range of the largest or smallest values it contains:
import heapq
from heapq_heapdata import data

print 'all :', data
print '3 largest:', heapq.nlargest(3, data)
print 'from sort:', list(reversed(sorted(data)[-3:]))
print '3 smallest:', heapq.nsmallest(3, data)
print 'from sort :', sorted(data)[:3]
The interpreter prints:
all : [19, 9, 4, 10, 11]
3 largest: [19, 11, 10]
from sort: [19, 11, 10]
3 smallest: [4, 9, 10]
from sort : [4, 9, 10]
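Putting heapify() and heappop() together gives a simple heap sort. The helper name heap_sorted below is made up for this sketch, and the data is the same list used in the examples above.

```python
import heapq

def heap_sorted(values):
    """Return the items of values in ascending order using a min-heap."""
    heap = list(values)      # copy, so the caller's list is left untouched
    heapq.heapify(heap)      # rearrange the copy into heap order in place
    # Each heappop() removes and returns the current smallest item.
    return [heapq.heappop(heap) for _ in range(len(heap))]

data = [19, 9, 4, 10, 11]
print(heap_sorted(data))
```

In practice sorted() is the simpler choice for a one-off sort; a heap pays off when items keep arriving while the smallest ones are being consumed.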
The bisect module implements an algorithm for inserting elements into a list while keeping the list in sorted order. Items are inserted with insort().
import bisect
import random

random.seed(1)

print 'New Pos Contents'
print '--- --- --------'

lst = []
for i in range(1, 15):
    r = random.randint(1, 100)
    # bisect() reports the position at which r will be inserted.
    position = bisect.bisect(lst, r)
    # insort() inserts r into the list at that position.
    bisect.insort(lst, r)
    print '%3d %3d' % (r, position), lst
The interpreter prints:
New Pos Contents
--- --- --------
 14   0 [14]
 85   1 [14, 85]
 77   1 [14, 77, 85]
 26   1 [14, 26, 77, 85]
 50   2 [14, 26, 50, 77, 85]
 45   2 [14, 26, 45, 50, 77, 85]
 66   4 [14, 26, 45, 50, 66, 77, 85]
 79   6 [14, 26, 45, 50, 66, 77, 79, 85]
 10   0 [10, 14, 26, 45, 50, 66, 77, 79, 85]
  3   0 [3, 10, 14, 26, 45, 50, 66, 77, 79, 85]
 84   9 [3, 10, 14, 26, 45, 50, 66, 77, 79, 84, 85]
 44   4 [3, 10, 14, 26, 44, 45, 50, 66, 77, 79, 84, 85]
 77   9 [3, 10, 14, 26, 44, 45, 50, 66, 77, 77, 79, 84, 85]
  1   0 [1, 3, 10, 14, 26, 44, 45, 50, 66, 77, 77, 79, 84, 85]
insort() is actually an alias for insort_right(), which inserts a new value after any existing equal values. insort_left() inserts the new value before them instead.
import bisect
import random

random.seed(1)

print 'New Pos Contents'
print '--- --- --------'

lst = []
for i in range(1, 15):
    r = random.randint(1, 100)
    position = bisect.bisect_left(lst, r)
    bisect.insort_left(lst, r)
    print '%3d %3d' % (r, position), lst
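The difference between the left and right variants only shows up when the list already contains the value. A small sketch with a made-up scores list:

```python
import bisect

scores = [10, 20, 20, 20, 30]

# bisect_left() returns the position before any existing entries equal to 20.
left = bisect.bisect_left(scores, 20)

# bisect_right() (also available as bisect()) returns the position after them.
right = bisect.bisect_right(scores, 20)

print('left = %d, right = %d' % (left, right))
print('occurrences of 20: %d' % (right - left))
```

For plain numbers the resulting lists look identical either way; the distinction matters when equal-comparing items are otherwise distinguishable, for example records sorted on one field.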
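Purpose: Provide a thread-safe FIFO implementation
Python version: 1.4 and later
The Queue module provides a first-in, first-out data structure suited to multithreaded programming. It can be used to pass messages or other data safely between producer and consumer threads. Locking is handled for the caller, so many threads can work with the same Queue instance safely. The size of a Queue (the number of items it holds) can be restricted to limit memory use or processing.
Items are added to one end of the sequence with put() and removed from the other end with get().
import Queue

q = Queue.Queue()

for i in range(5):
    q.put(i)

while not q.empty():
    print q.get(),
print
The interpreter prints:
0 1 2 3 4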
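The size limit mentioned above is set with the maxsize argument. The sketch below is written so that it runs on both Python 2 (module Queue) and Python 3 (module queue); the flag name overflowed is illustrative.

```python
try:
    import queue            # Python 3 name
except ImportError:
    import Queue as queue   # Python 2 name

# A queue that holds at most two items.
q = queue.Queue(maxsize=2)
q.put('first')
q.put('second')

# put() would block here until a slot frees up; put_nowait() raises instead.
try:
    q.put_nowait('third')
    overflowed = False
except queue.Full:
    overflowed = True

print(overflowed)
```

In a real producer/consumer setup the blocking put() is usually what is wanted, since it throttles the producer automatically.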
LifoQueue uses last-in, first-out ordering instead:
import Queue

q = Queue.LifoQueue()

for i in range(5):
    q.put(i)

while not q.empty():
    print q.get(),
print
The interpreter prints:
4 3 2 1 0
A priority queue is for cases where the processing order should depend on characteristics of the items rather than on the order in which they were added.
import Queue
import threading

class Job(object):
    def __init__(self, priority, description):
        self.priority = priority
        self.description = description
        print 'New job:', description
    def __cmp__(self, other):
        return cmp(self.priority, other.priority)

q = Queue.PriorityQueue()

q.put(Job(3, 'Mid-level job'))
q.put(Job(10, 'Low-level job'))
q.put(Job(1, 'Important job'))

def process_job(q):
    while True:
        next_job = q.get()
        # Format the whole line with 'A %s' % B instead of printing 'A', B:
        # with several threads, the second form can produce interleaved output.
        print 'Processing job:%s\n' % next_job.description
        q.task_done()

workers = [threading.Thread(target=process_job, args = (q,)),
           threading.Thread(target=process_job, args = (q,)),]
for w in workers:
    w.setDaemon(True)
    w.start()

q.join()
The interpreter prints:
New job: Mid-level job
New job: Low-level job
New job: Important job
Processing job:Important job

Processing job:Mid-level job

Processing job:Low-level job

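The Job class above relies on __cmp__, which only exists in Python 2. A version-independent alternative, sketched here as an assumption rather than as part of the original example, is to queue (priority, description) tuples, since tuples compare element by element.

```python
try:
    import queue            # Python 3 name
except ImportError:
    import Queue as queue   # Python 2 name

pq = queue.PriorityQueue()

# The first tuple element acts as the priority; lower numbers come out first.
pq.put((3, 'Mid-level job'))
pq.put((10, 'Low-level job'))
pq.put((1, 'Important job'))

order = []
while not pq.empty():
    priority, description = pq.get()
    order.append(description)

print(order)
```

If two entries share a priority, the comparison falls through to the second element, so that element must be comparable too.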
Purpose: Convert between strings and binary data
Python version: 1.4 and later
The struct module includes functions for converting between byte strings and built-in Python data types such as numbers and strings.
The Struct class packs data into strings and unpacks data from strings using format specifiers, which are made up of characters representing the data type plus optional count and endianness indicators.
Data is packed with pack() and unpacked with unpack():
import struct
import binascii

values = (1, 'ab', 2.7)
s = struct.Struct('I 2s f')
packed_data = s.pack(*values)

print 'Original values:', values
print 'Format string :', s.format
print 'Uses :', s.size, 'bytes'
print 'Packed Value :', binascii.hexlify(packed_data)

packed_data = binascii.unhexlify(binascii.hexlify(packed_data))
unpacked_data = s.unpack(packed_data)
print 'Unpacked Values:', unpacked_data
The interpreter prints:
Original values: (1, 'ab', 2.7)
Format string : I 2s f
Uses : 12 bytes
Packed Value : 0100000061620000cdcc2c40
Unpacked Values: (1, 'ab', 2.700000047683716)
By default, values are encoded using the native C library's notion of endianness. That choice is easy to override by providing an explicit endianness directive in the format string:
import struct
import binascii

values = (1, 'ab', 2.7)
print 'Original values:', values

endianness = [
    ('@', 'native, native'),
    ('=', 'native, standard'),
    ('<', 'little-endian'),
    ('>', 'big-endian'),
    ('!', 'network'),
    ]

for code, name in endianness:
    s = struct.Struct(code + ' I 2s f')
    packed_data = s.pack(*values)
    print
    print 'Format string :', s.format, 'for', name
    print 'Uses :', s.size, 'bytes'
    print 'Packed Value :', binascii.hexlify(packed_data)
    print 'Unpacked Value :', s.unpack(packed_data)
The interpreter prints:
Original values: (1, 'ab', 2.7)

Format string : @ I 2s f for native, native
Uses : 12 bytes
Packed Value : 0100000061620000cdcc2c40
Unpacked Value : (1, 'ab', 2.700000047683716)

Format string : = I 2s f for native, standard
Uses : 10 bytes
Packed Value : 010000006162cdcc2c40
Unpacked Value : (1, 'ab', 2.700000047683716)

Format string : < I 2s f for little-endian
Uses : 10 bytes
Packed Value : 010000006162cdcc2c40
Unpacked Value : (1, 'ab', 2.700000047683716)

Format string : > I 2s f for big-endian
Uses : 10 bytes
Packed Value : 000000016162402ccccd
Unpacked Value : (1, 'ab', 2.700000047683716)

Format string : ! I 2s f for network
Uses : 10 bytes
Packed Value : 000000016162402ccccd
Unpacked Value : (1, 'ab', 2.700000047683716)
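The 12-byte versus 10-byte difference in the output above comes from alignment padding, which only the native '@' mode applies. A short sketch using struct.calcsize() makes this visible without packing anything; the dictionary name sizes is illustrative, and the native size is platform dependent.

```python
import struct

# Size in bytes of the same three fields under each byte-order prefix.
sizes = {}
for code in '@=<>!':
    sizes[code] = struct.calcsize(code + 'I2sf')
    print('%s -> %d bytes' % (code, sizes[code]))

# The byte order itself is easy to see on a single integer.
little = struct.pack('<I', 1)
big = struct.pack('>I', 1)
print(repr(little))
print(repr(big))
```

With the standard-size prefixes the layout is the same on every machine, which is why they are the safer choice for files and network protocols.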
Binary packed data is typically handled in performance-sensitive situations or when passing data into and out of extension modules. These cases can be optimized by avoiding the overhead of allocating a new buffer for each packed structure. The pack_into() and unpack_from() methods write to and read from a preallocated buffer directly.
import struct
import binascii

s = struct.Struct('I 2s f')
values = (1, 'ab', 2.7)
print 'Original:', values

print
print 'ctypes string buffer'

import ctypes
b = ctypes.create_string_buffer(s.size)
print 'Before :', binascii.hexlify(b.raw)
s.pack_into(b, 0, *values)
print 'After :', binascii.hexlify(b.raw)
print 'Unpacked:', s.unpack_from(b, 0)

print
print 'array'

import array
a = array.array('c', '\0' * s.size)
print 'Before :', binascii.hexlify(a)
s.pack_into(a, 0, *values)
print 'After :', binascii.hexlify(a)
print 'Unpacked :', s.unpack_from(a, 0)
The interpreter prints:
Original: (1, 'ab', 2.7)

ctypes string buffer
Before : 000000000000000000000000
After : 0100000061620000cdcc2c40
Unpacked: (1, 'ab', 2.700000047683716)

array
Before : 000000000000000000000000
After : 0100000061620000cdcc2c40
Unpacked : (1, 'ab', 2.700000047683716)
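Purpose: Refer to an "expensive" object, but allow the garbage collector to reclaim its memory if there are no other non-weak references.
Python version: 2.1 and later
The weakref module supports weak references to objects. A normal reference increments the object's reference count and keeps it from being garbage collected. That is not always desirable, for example when a circular reference might be present, or when building a cache of objects that should be dropped when memory is needed. A weak reference is a handle to an object that does not keep the object from being cleaned up automatically.
Weak references to objects are managed through the ref class. To retrieve the original object, call the reference object.
import weakref

class ExpensiveObject(object):
    def __del__(self):
        print '(Deleting %s)' % self

obj = ExpensiveObject()
r = weakref.ref(obj)

print 'obj:', obj
print 'ref:', r
print 'r():', r()

print 'deleting obj'
del obj
# The weak reference is still usable and now returns None;
# using the deleted name obj directly would raise an exception.
print 'r():', r()
The interpreter prints:
obj: <__main__.ExpensiveObject object at 0x0000000002CE07B8>
ref: <weakref at 0000000002CDD688; to 'ExpensiveObject' at 0000000002CE07B8>
r(): <__main__.ExpensiveObject object at 0x0000000002CE07B8>
deleting obj
(Deleting <__main__.ExpensiveObject object at 0x0000000002CE07B8>)
r(): None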
The ref constructor accepts an optional callback function, which is invoked when the referenced object is deleted:
import weakref

class ExpensiveObject(object):
    def __del__(self):
        print '(Deleting %s)' % self

def callback(reference):
    """Invoked when referenced object is deleted"""
    print 'callback(', reference, ')'

obj = ExpensiveObject()
r = weakref.ref(obj, callback)

print 'obj:', obj
print 'ref:', r
print 'r():', r()

print 'deleting obj'
del obj
print 'r():', r()
The interpreter prints:
obj: <__main__.ExpensiveObject object at 0x0000000002C50828>
ref: <weakref at 0000000002C4D6D8; to 'ExpensiveObject' at 0000000002C50828>
r(): <__main__.ExpensiveObject object at 0x0000000002C50828>
deleting obj
callback( <weakref at 0000000002C4D6D8; dead> )
(Deleting <__main__.ExpensiveObject object at 0x0000000002C50828>)
r(): None
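Since a dead reference returns None, code that holds a weak reference normally checks the result before using it. The helper describe() below is a made-up name for this sketch; the explicit gc.collect() call is only there so the example does not depend on CPython's immediate reference counting.

```python
import gc
import weakref

class ExpensiveObject(object):
    def __init__(self, name):
        self.name = name

def describe(reference):
    """Return the target's name, or 'gone' if it has been collected."""
    target = reference()   # a temporary strong reference, or None
    if target is None:
        return 'gone'
    return target.name

obj = ExpensiveObject('My Object')
r = weakref.ref(obj)

before = describe(r)
del obj
gc.collect()
after = describe(r)

print(before)
print(after)
```

Binding the result of r() to a local name first avoids the race where the object disappears between the check and the use.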
A proxy is sometimes more convenient than a weak reference, because it can be used as though it were the original object. It is still only a reference, however, not the real object, so accessing it after the object has been removed raises ReferenceError:
import weakref

class ExpensiveObject(object):
    def __init__(self, name):
        self.name = name
    def __del__(self):
        print '(Deleting %s)' % self

obj = ExpensiveObject('My Object')
r = weakref.ref(obj)
p = weakref.proxy(obj)

print 'via obj:', obj.name
print 'via ref:', r().name
print 'via proxy:', p.name
del obj
print 'via proxy:', p.name
The interpreter prints:
via obj: My Object
via ref: My Object
via proxy: My Object
(Deleting <__main__.ExpensiveObject object at 0x0000000002BC07B8>)
via proxy:
Traceback (most recent call last):
  File "C:\Python27\test.py", line 17, in <module>
    print 'via proxy:', p.name
ReferenceError: weakly-referenced object no longer exists
One use for weak references is to allow cyclic references without preventing garbage collection.
weakref_graph.py:
import gc
from pprint import pprint
import weakref

class Graph(object):
    def __init__(self, name):
        self.name = name
        self.other = None
    def set_next(self, other):
        print '%s.set_next(%r)' % (self.name, other)
        self.other = other
    def all_nodes(self):
        "Generate the nodes in the graph sequence."
        yield self
        n = self.other
        while n and n.name != self.name:
            yield n
            n = n.other
        if n is self:
            yield n
        return
    def __str__(self):
        return '->'.join(n.name for n in self.all_nodes())
    def __repr__(self):
        return '<%s at 0x%x name=%s' % (self.__class__.__name__, id(self), self.name)
    def __del__(self):
        print '(Deleting %s)' % self.name
        self.set_next(None)

def collect_and_show_garbage():
    "Show what garbage is present."
    print 'Collecting...'
    n = gc.collect()
    print 'Unreachable objects:', n
    print 'Garbage:',
    pprint(gc.garbage)

def demo(graph_factory):
    print 'Set up graph:'
    one = graph_factory('one')
    two = graph_factory('two')
    three = graph_factory('three')
    one.set_next(two)
    two.set_next(three)
    three.set_next(one)

    print
    print 'Graph:'
    print str(one)
    collect_and_show_garbage()

    print
    three = None
    two = None
    print 'After 2 references removed:'
    print str(one)
    collect_and_show_garbage()

    print
    print 'Removing last reference:'
    one = None
    collect_and_show_garbage()
weakref_cycle.py:
import gc
from pprint import pprint
import weakref

from weakref_graph import Graph, demo, collect_and_show_garbage

gc.set_debug(gc.DEBUG_LEAK)

print 'Setting up the cycle'
print
demo(Graph)

print
print 'Breaking the cycle and cleaning up garbage'
print
gc.garbage[0].set_next(None)
while gc.garbage:
    del gc.garbage[0]
print
collect_and_show_garbage()
The interpreter prints:
Setting up the cycle

Set up graph:
one.set_next(<Graph at 0x2ad0be0 name=two)
two.set_next(<Graph at 0x2adb5c0 name=three)
three.set_next(<Graph at 0x2ad0a58 name=one)

Graph:
one->two->three->one
Collecting...
Unreachable objects: 0
Garbage:[]

After 2 references removed:
one->two->three->one
Collecting...
Unreachable objects: 0
Garbage:[]

Removing last reference:
Collecting...
gc: uncollectable <Graph 0000000002AD0A58>
gc: uncollectable <Graph 0000000002AD0BE0>
gc: uncollectable <Graph 0000000002ADB5C0>
gc: uncollectable <dict 0000000002A36378>
gc: uncollectable <dict 00000000029F8488>
gc: uncollectable <dict 00000000029F88C8>
Unreachable objects: 6
Garbage:[<Graph at 0x2ad0a58 name=one,
 <Graph at 0x2ad0be0 name=two,
 <Graph at 0x2adb5c0 name=three,
 {'name': 'one', 'other': <Graph at 0x2ad0be0 name=two},
 {'name': 'two', 'other': <Graph at 0x2adb5c0 name=three},
 {'name': 'three', 'other': <Graph at 0x2ad0a58 name=one}]

Breaking the cycle and cleaning up garbage

one.set_next(None)
(Deleting two)
two.set_next(None)
(Deleting three)
three.set_next(None)
(Deleting one)
one.set_next(None)

Collecting...
Unreachable objects: 0
Garbage:[]
Replacing the reference that closes the cycle with a proxy lets the garbage collector reclaim the objects:
import gc
from pprint import pprint
import weakref

from weakref_graph import Graph, demo

class WeakGraph(Graph):
    def set_next(self, other):
        if other is not None:
            # Use a weak proxy if linking to other would close a cycle.
            if self in other.all_nodes():
                other = weakref.proxy(other)
        super(WeakGraph, self).set_next(other)
        return

demo(WeakGraph)
The interpreter prints:
Set up graph:
one.set_next(<WeakGraph at 0x2b8d668 name=two)
two.set_next(<WeakGraph at 0x2b8d6d8 name=three)
three.set_next(<weakproxy at 0000000002B7DB38 to WeakGraph at 0000000002B80BA8>)

Graph:
one->two->three
Collecting...
Unreachable objects: 0
Garbage:[]

After 2 references removed:
one->two->three
Collecting...
Unreachable objects: 0
Garbage:[]

Removing last reference:
(Deleting one)
one.set_next(None)
(Deleting two)
two.set_next(None)
(Deleting three)
three.set_next(None)
Collecting...
Unreachable objects: 0
Garbage:[]
WeakValueDictionary holds weak references to the values it stores, allowing them to be garbage collected once other code is no longer using them. The following example uses explicit calls to the garbage collector to illustrate the difference in memory handling between a regular dictionary and a WeakValueDictionary.
import gc
from pprint import pprint
import weakref

gc.set_debug(gc.DEBUG_LEAK)

class ExpensiveObject(object):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return 'ExpensiveObject(%s)' % self.name
    def __del__(self):
        print '    (Deleting %s)' % self

def demo(cache_factory):
    # Hold the objects so that weak references are not removed immediately.
    all_refs = {}
    print 'CACHE TYPE:', cache_factory
    cache = cache_factory()
    for name in ['one', 'two', 'three']:
        o = ExpensiveObject(name)
        cache[name] = o
        all_refs[name] = o
        del o

    print '  all_refs =',
    pprint(all_refs)
    print '\n  Before, cache contains:', cache.keys()
    for name, value in cache.items():
        print '    %s = %s' % (name, value)
        del value

    # Remove every reference to the objects except the cache itself.
    print '\n  Cleanup:'
    del all_refs
    gc.collect()

    print '\n  After, cache contains:', cache.keys()
    for name, value in cache.items():
        print '    %s = %s' % (name, value)
    print '  demo returning'
    return

demo(dict)
print

demo(weakref.WeakValueDictionary)
Any loop variable that refers to a cached value must be cleared explicitly so that the object's reference count is decremented. Otherwise the garbage collector would not remove the objects, and they would remain in the cache. Similarly, the all_refs variable holds references so that the objects are not garbage collected prematurely.
CACHE TYPE: <type 'dict'>
  all_refs ={'one': ExpensiveObject(one),
 'three': ExpensiveObject(three),
 'two': ExpensiveObject(two)}

  Before, cache contains: ['three', 'two', 'one']
    three = ExpensiveObject(three)
    two = ExpensiveObject(two)
    one = ExpensiveObject(one)

  Cleanup:

  After, cache contains: ['three', 'two', 'one']
    three = ExpensiveObject(three)
    two = ExpensiveObject(two)
    one = ExpensiveObject(one)
  demo returning
    (Deleting ExpensiveObject(three))
    (Deleting ExpensiveObject(two))
    (Deleting ExpensiveObject(one))

CACHE TYPE: weakref.WeakValueDictionary
  all_refs ={'one': ExpensiveObject(one),
 'three': ExpensiveObject(three),
 'two': ExpensiveObject(two)}

  Before, cache contains: ['three', 'two', 'one']
    three = ExpensiveObject(three)
    two = ExpensiveObject(two)
    one = ExpensiveObject(one)

  Cleanup:
    (Deleting ExpensiveObject(three))
    (Deleting ExpensiveObject(two))
    (Deleting ExpensiveObject(one))

  After, cache contains: []
  demo returning
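Purpose: Provide functions for duplicating objects using shallow or deep copy semantics
Python version: 1.4 and later
The copy module includes two functions, copy() and deepcopy(), for duplicating existing objects.
copy() creates a shallow copy: a new container populated with references to the contents of the original object.
import copy

class MyClass:
    def __init__(self, name):
        self.name = name
    def __cmp__(self, other):
        return cmp(self.name, other.name)

a = MyClass('a')
my_list = [a]
dup = copy.copy(my_list)

# The two lists are different objects, but they hold the same element.
print [id(x) for x in [my_list, dup]]
print [id(y) for x in [my_list, dup] for y in x]
The interpreter prints:
[44632008L, 44573384L]
[44573512L, 44573512L]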
A deep copy is an entirely new copy, including the contents.
import copy

class MyClass:
    def __init__(self, name):
        self.name = name
    def __cmp__(self, other):
        return cmp(self.name, other.name)

a = MyClass('a')
my_list = [a]
dup = copy.deepcopy(my_list)

# Both the lists and the elements they hold are now different objects.
print [id(x) for x in [my_list, dup]]
print [id(y) for x in [my_list, dup] for y in x]
The interpreter prints:
[45615048L, 36209224L]
[45556552L, 45556424L]
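Comparing id() values shows what is shared; mutating a nested object shows why it matters. A minimal sketch with a made-up nested list:

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)
deep = copy.deepcopy(original)

# Change an inner list through the original.
original[0].append(99)

# The shallow copy shares the inner lists, so it sees the change.
print(shallow[0])
# The deep copy has its own inner lists, so it does not.
print(deep[0])
```

Appending to the outer list, by contrast, affects neither copy, because both copies have their own outer container.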
Copy behavior can be customized by defining the __copy__() and __deepcopy__() methods:
import copy

class MyClass:
    def __init__(self, name):
        self.name = name
    def __cmp__(self, other):
        return cmp(self.name, other.name)
    def __copy__(self):
        print '__copy__()'
        return MyClass(self.name)
    def __deepcopy__(self, memo):
        print '__deepcopy__(%s)' % str(memo)
        return MyClass(copy.deepcopy(self.name, memo))

a = MyClass('a')
sc = copy.copy(a)
dc = copy.deepcopy(a)
The interpreter prints:
__copy__()
__deepcopy__({})
To avoid problems with duplicating recursive data structures, deepcopy() uses a dictionary to track the objects that have already been copied. This dictionary is passed to the __deepcopy__() method so that it can be checked there as well:
Note: I do not fully understand this code yet.
import copy
import pprint

class Graph:
    def __init__(self, name, connections):
        self.name = name
        self.connections = connections
    def add_connection(self, other):
        self.connections.append(other)
    def __repr__(self):
        return 'Graph(name=%s, id=%s)' % (self.name, id(self))
    def __deepcopy__(self, memo):
        print '\nCalling __deepcopy__ for %r' % self
        # If this node has been copied already, reuse that copy.
        if self in memo:
            existing = memo.get(self)
            print '  Already copied to %r' % existing
            return existing
        print '  Memo dictionary:'
        pprint.pprint(memo, indent=4, width=40)
        dup = Graph(copy.deepcopy(self.name, memo), [])
        print '  Copying to new object %s' % dup
        # Record the copy before recursing, so cycles lead back to it.
        memo[self] = dup
        for c in self.connections:
            dup.add_connection(copy.deepcopy(c, memo))
        return dup

root = Graph('root', [])
a = Graph('a', [root])
b = Graph('b', [a, root])
root.add_connection(a)
root.add_connection(b)

dup = copy.deepcopy(root)
The interpreter prints:

Calling __deepcopy__ for Graph(name=root, id=45364872)
  Memo dictionary:
{   }
  Copying to new object Graph(name=root, id=45359816)

Calling __deepcopy__ for Graph(name=a, id=45363848)
  Memo dictionary:
{   Graph(name=root, id=45364872): Graph(name=root, id=45359816),
    34200192L: 'root',
    46032552L: ['root']}
  Copying to new object Graph(name=a, id=45361160)

Calling __deepcopy__ for Graph(name=root, id=45364872)
  Already copied to Graph(name=root, id=45359816)

Calling __deepcopy__ for Graph(name=b, id=45365512)
  Memo dictionary:
{   Graph(name=a, id=45363848): Graph(name=a, id=45361160),
    Graph(name=root, id=45364872): Graph(name=root, id=45359816),
    33255512L: 'a',
    34200192L: 'root',
    45363848L: Graph(name=a, id=45361160),
    45364872L: Graph(name=root, id=45359816),
    46032552L: [   'root',
                   'a',
                   Graph(name=root, id=45364872),
                   Graph(name=a, id=45363848)]}
  Copying to new object Graph(name=b, id=45331720)
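A much smaller case may help show what the memo dictionary buys. In the sketch below a list contains itself; without tracking which objects were already copied, deepcopy() would recurse forever. With it, the copy ends up pointing at itself rather than at the original. The names loop and dup are illustrative.

```python
import copy

# A list whose last element is the list itself.
loop = [1, 2]
loop.append(loop)

dup = copy.deepcopy(loop)

# The copy is a new object...
print(dup is loop)
# ...and its self-reference points at the copy, not at the original.
print(dup[2] is dup)
```

The Graph example above does the same thing by hand: it records the new object in memo before copying the connections, so a connection that leads back to an already-copied node gets the existing copy.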
Purpose: Pretty-print data structures
Python version: 1.4 and later
Test data, pprint_data.py:
data = [(1, {'a' : 'A', 'b' : 'B', 'c' : 'C', 'd' : 'D'}),
        (2, {'e' : 'E', 'f' : 'F', 'g' : 'G', 'h' : 'H',
             'i' : 'I', 'j' : 'J', 'k' : 'K', 'l' : 'L'}),]
The simplest way to use the module is the pprint() function:
from pprint import pprint
from pprint_data import data

print 'PRINT:'
print data
print
print 'PPRINT:'
pprint(data)
The interpreter prints:
PRINT:
[(1, {'a': 'A', 'c': 'C', 'b': 'B', 'd': 'D'}), (2, {'e': 'E', 'g': 'G', 'f': 'F', 'i': 'I', 'h': 'H', 'k': 'K', 'j': 'J', 'l': 'L'})]

PPRINT:
[(1, {'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'}),
 (2,
  {'e': 'E',
   'f': 'F',
   'g': 'G',
   'h': 'H',
   'i': 'I',
   'j': 'J',
   'k': 'K',
   'l': 'L'})]
To format a data structure without writing it directly to a stream, use pformat() to build a string representation.
from pprint import pformat
from pprint_data import data
import logging

logging.basicConfig(level=logging.DEBUG,
                    format='%(levelname)-8s %(message)s',)

logging.debug('Logging pformatted data')
formatted = pformat(data)
for line in formatted.splitlines():
    logging.debug(line.rstrip())
The interpreter prints:
DEBUG    Logging pformatted data
DEBUG    [(1, {'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'}),
DEBUG     (2,
DEBUG      {'e': 'E',
DEBUG       'f': 'F',
DEBUG       'g': 'G',
DEBUG       'h': 'H',
DEBUG       'i': 'I',
DEBUG       'j': 'J',
DEBUG       'k': 'K',
DEBUG       'l': 'L'})]
The output for custom classes can be controlled by defining __repr__():
from pprint import pprint

class node(object):
    def __init__(self, name, contents = []):
        self.name = name
        self.contents = contents[:]
    def __repr__(self):
        return ('node(' + repr(self.name) + ', ' +
                repr(self.contents) + ')')

trees = [node('node-1'),
         node('node-2', [node('node-2-1')]),
         node('node-3', [node('node-3-1')]),]
pprint(trees)
The interpreter prints:
[node('node-1', []),
 node('node-2', [node('node-2-1', [])]),
 node('node-3', [node('node-3-1', [])])]
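Recursive data structures are represented with a reference to the original source of the data:
>>> from pprint import pprint
>>> ll = [1, 2]
>>> ll.append(ll)
>>> ll
[1, 2, [...]]
>>> pprint(ll)
[1, 2, <Recursion on list with id=47206664>]
The depth argument limits how many levels of nesting are printed:
>>> ll = [1, 2, [3, 4, [5, 6]]]
>>> pprint(ll, depth=2)
[1, 2, [3, 4, [...]]]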
Use width to control the width of the output. When the requested width is too narrow to hold the data, pprint breaks the structure up as far as it can without producing invalid syntax:
from pprint import pprint
from pprint_data import data

for width in [80, 5]:
    print 'WIDTH =', width
    pprint(data, width=width)
    print
The interpreter prints:
WIDTH = 80
[(1, {'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'}),
 (2,
  {'e': 'E',
   'f': 'F',
   'g': 'G',
   'h': 'H',
   'i': 'I',
   'j': 'J',
   'k': 'K',
   'l': 'L'})]

WIDTH = 5
[(1,
  {'a': 'A',
   'b': 'B',
   'c': 'C',
   'd': 'D'}),
 (2,
  {'e': 'E',
   'f': 'F',
   'g': 'G',
   'h': 'H',
   'i': 'I',
   'j': 'J',
   'k': 'K',
   'l': 'L'})]