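1. Overview
The collections module includes implementations of several data structures that extend those found in other modules. For example, deque is a double-ended queue that allows elements to be added or removed from either end. defaultdict is a dictionary that responds with a default value when a key is missing, while OrderedDict remembers the order in which elements were added. namedtuple extends the normal tuple to give each member an attribute name in addition to its numeric index.
For large amounts of data, an array may make more efficient use of memory than a list. Because an array is limited to a single data type, it can use a more compact memory representation than a general-purpose list. Many list methods also work on arrays.
Although list includes a sort() method, heapq provides functions that modify the contents of a list while maintaining heap order with low overhead.
bisect is another option for building sorted lists or arrays.
Emulating a queue with list's insert() and pop() methods is not a safe way to pass data between threads. For ordered communication between threads, use the Queue module. multiprocessing includes a version of Queue that handles communication between processes, making it easier to convert a multithreaded program to use processes instead of threads.
struct is useful for decoding data from another application (such as binary data produced on Windows) into Python's built-in types so it can be processed.
For highly interconnected data structures such as graphs and trees, weakref can maintain references while still allowing the garbage collector to clean up objects that are no longer needed. The functions in copy duplicate data structures and their contents, including recursive copies with deepcopy().
pprint creates readable representations of data structures.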
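2. collections---Container Data Types
Purpose: Container data types
Python Version: 2.4 and later (individual types arrived later: defaultdict in 2.5, namedtuple in 2.6, Counter and OrderedDict in 2.7)
The collections module includes container data types beyond the built-in types list, dict, and tuple.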
1. Counter
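Counter is a container that keeps track of how many times equivalent values are added.
Counter supports three forms of initialization. Its constructor can be called with a sequence of items, with a dictionary containing keys and counts, or with keyword arguments mapping string names to counts: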
>>> import collections
>>> collections.Counter(['a', 'b', 'c', 'a', 'b', 'b'])
Counter({'b': 3, 'a': 2, 'c': 1})
>>> collections.Counter({'a':2, 'b':3, 'c':1})
Counter({'b': 3, 'a': 2, 'c': 1})
>>> collections.Counter(a = 2, b = 3, c = 1)
Counter({'b': 3, 'a': 2, 'c': 1})
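Counter is a dict subclass, so update() adds more data, items() shows the key/count pairs, and elements() returns an iterator over every element, repeated as many times as its count: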
>>> c = collections.Counter()
>>> c
Counter()
>>> c.update('abcdaab')
>>> c
Counter({'a': 3, 'b': 2, 'c': 1, 'd': 1})
>>> c.update({'a':1, 'd':5})
>>> c
Counter({'d': 6, 'a': 4, 'b': 2, 'c': 1})
>>> for key, value in c.items():
...     print key, '=>', value
...
a => 4
c => 1
b => 2
d => 6
>>> c.elements()
<itertools.chain object at 0x...>
>>> list(c.elements())
['a', 'a', 'a', 'a', 'c', 'b', 'b', 'd', 'd', 'd', 'd', 'd', 'd']
most_common() produces a sequence of the n most frequently encountered input values and their counts (the equivalent of sorting a dictionary's items by value):
>>> c = collections.Counter()
>>> c.update({'a':5, 'b':3, 'c':11, 'd':23, 'e':2})
>>> for letter, count in c.most_common(3):
...     print '%s: %d' % (letter, count)
...
d: 23
c: 11
a: 5
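A plain dict is a hash table with no ordering of its own, so it has no built-in equivalent of most_common(). To get the same result, you have to sort its items by value yourself.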
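The sketch below shows that manual approach next to most_common() for comparison. The helper name top_n is hypothetical, and the code is written to run under both Python 2.7 and Python 3:

```python
import collections

def top_n(counts, n):
    """Hypothetical helper: the n (key, count) pairs with the highest counts, largest first."""
    # sorted() accepts any iterable, so the dict's items can be ordered by their count.
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]

counts = {'a': 5, 'b': 3, 'c': 11, 'd': 23, 'e': 2}
print(top_n(counts, 3))
print(collections.Counter(counts).most_common(3))
```

Both lines report d, c and a with their counts. Counter simply saves you the sorting step.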
Counter even supports arithmetic and set operations for aggregating results:
>>> import collections
>>> c1 = collections.Counter(['a', 'b', 'c', 'a', 'b', 'b'])
>>> c2 = collections.Counter('alphabet')
>>> c1
Counter({'b': 3, 'a': 2, 'c': 1})
>>> c2
Counter({'a': 2, 'b': 1, 'e': 1, 'h': 1, 'l': 1, 'p': 1, 't': 1})
>>> c1 + c2
Counter({'a': 4, 'b': 4, 'c': 1, 'e': 1, 'h': 1, 'l': 1, 'p': 1, 't': 1})
>>> c1 - c2
Counter({'b': 2, 'c': 1})
>>> c1 & c2
Counter({'a': 2, 'b': 1})
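The same session can be extended to two more operations:

- The union operator `|` keeps the larger of the two counts.
- The subtract() method, unlike `-`, keeps zero and negative results.

The sketch below shows both. subtract() requires Python 2.7 or later, and the code also runs under Python 3:

```python
import collections

c1 = collections.Counter(['a', 'b', 'c', 'a', 'b', 'b'])
c2 = collections.Counter('alphabet')

# | keeps the larger of the two counts for every key.
union = c1 | c2

# subtract() updates in place and keeps results that are zero or negative,
# whereas c1 - c2 silently drops them.
diff = collections.Counter(c1)
diff.subtract(c2)

print(sorted(union.items()))
print(sorted(diff.items()))
```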
2. defaultdict
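The standard dictionary includes the method setdefault() for retrieving a value and establishing a default if the value does not exist. defaultdict instead lets the caller specify the default up front, when the container is initialized. The default is given as a factory function that is called to create the value for any missing key.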
>>> import collections
>>> def default_factory():
...     return 'default value'
...
>>> d = collections.defaultdict(default_factory, foo='bar')
>>> d
defaultdict(<function default_factory at 0x...>, {'foo': 'bar'})
>>> d['foo']
'bar'
>>> d['bar']
'default value'
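The factory does not have to be a hand-written function. Any callable works, and built-in types are a common choice. The sketch below groups words by their first letter, using list as the factory. The word list is made up for illustration:

```python
import collections

words = ['apple', 'bob', 'avocado', 'banana', 'cherry']
by_letter = collections.defaultdict(list)
for word in words:
    # The first access to a missing key stores list() (an empty list) under it.
    by_letter[word[0]].append(word)

print(by_letter['z'])   # an empty list; note that this lookup also inserts 'z'
```

Looking up a missing key stores the new default in the dictionary. If that is not wanted, test membership with `in` or use get().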
3. deque
A deque (double-ended queue) supports adding and removing elements from either end:
>>> d = collections.deque('abcdefg')
>>> d
deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
>>> del d
>>> d = collections.deque()
>>> d.extend('abcdefg')
>>> d.append('h')
>>> d
deque(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
>>> d2 = collections.deque()
>>> d2.extendleft(range(6))
>>> d2.appendleft(6)
>>> d2
deque([6, 5, 4, 3, 2, 1, 0])
>>> d2.pop()
0
>>> d2.popleft()
6
>>> d2
deque([5, 4, 3, 2, 1])
Since deques are thread-safe, the contents can even be consumed from both ends at the same time by separate threads:
import collections
import threading
import time
candle = collections.deque(range(5))
def burn(direction, next_source):
    while True:
        try:
            item = next_source()
        except IndexError:
            # The deque is empty: nothing is left to take from this end
            break
        else:
            print '%8s: %s' % (direction, item)
            time.sleep(0.1)
    print '%8s done' % direction
    return
if __name__ == "__main__":
    left = threading.Thread(target=burn, args=('Left', candle.popleft))
    right = threading.Thread(target=burn, args=('Right', candle.pop))
    left.start()
    right.start()
    left.join()
    right.join()
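The interpreter displays something like the following (the interleaving of the two threads varies from run to run):
>>>
Left: 0 Right: 4
Right: 3 Left: 1
Right: 2 Left done
Right done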
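Another useful feature of deque is that it can be rotated in either direction, to skip over some elements. rotate(n) with a positive n moves items from the right end to the left end, and a negative n moves them from the left end to the right end: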
>>> import collections
>>> d = collections.deque(range(10))
>>> d
deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> d.rotate(2)
>>> d
deque([8, 9, 0, 1, 2, 3, 4, 5, 6, 7])
>>> d = collections.deque(range(10))
>>> d.rotate(-2)
>>> d
deque([2, 3, 4, 5, 6, 7, 8, 9, 0, 1])
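To make the direction convention concrete, the sketch below performs the same rotation by hand, one element at a time. rotate_by_hand is a hypothetical helper used only for comparison, and the code runs under Python 2.7 and 3:

```python
import collections

def rotate_by_hand(d, n):
    """Hypothetical helper: rotate deque d by n steps using pop/append."""
    if n >= 0:
        for _ in range(n):
            d.appendleft(d.pop())    # move the rightmost item to the left end
    else:
        for _ in range(-n):
            d.append(d.popleft())    # move the leftmost item to the right end
    return d

for n in (2, -2):
    a = collections.deque(range(10))
    a.rotate(n)
    b = rotate_by_hand(collections.deque(range(10)), n)
    print('%d %s' % (n, a == b))
```

rotate() does the same work in C without the Python-level loop.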
4. namedtuple
A standard tuple uses numeric indexes to access its members:
bob = ('Bob', 30, 'male')
print 'Representation:', bob
jane = ('jane', 29, 'female')
print '\nField by index:', jane[0]
print '\nFields by index:'
for p in [bob, jane]:
    print '%s is a %d year old %s' % p
The interpreter displays:
>>>
Representation: ('Bob', 30, 'male')
Field by index: jane
Fields by index:
Bob is a 30 year old male
jane is a 29 year old female
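Having to remember which index to use for each value can lead to errors, especially when the tuple has many fields and is constructed far from where it is used. namedtuple assigns each member a name as well as a numeric index. You can think of it as a lightweight, immutable record whose fields keep their order.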
import collections
Person = collections.namedtuple('Person', 'name age gender')
print 'Type of Person:', type(Person)
bob = Person(name='Bob', age=30, gender='male')
print '\nRepresentation:', bob
jane = Person(name='jane', age=29, gender='female')
print '\nField by name:', jane.name
print '\nFields by index:'
for p in [bob, jane]:
    print '%s is a %d year old %s' % p
The interpreter displays:
>>>
Type of Person: <type 'type'>
Representation: Person(name='Bob', age=30, gender='male')
Field by name: jane
Fields by index:
Bob is a 30 year old male
jane is a 29 year old female
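Because a namedtuple class is a subclass of tuple, its instances can still be indexed and unpacked, and they are just as immutable. The sketch below demonstrates this. _replace() is part of the namedtuple API despite the leading underscore, which is there only to avoid clashing with field names. The code runs under Python 2.7 and 3:

```python
import collections

Person = collections.namedtuple('Person', 'name age gender')
bob = Person(name='Bob', age=30, gender='male')

name, age, gender = bob          # unpacks like any tuple
print(bob[1])                    # positional access still works: 30

older = bob._replace(age=31)     # _replace() returns a new instance
print(older)

try:
    bob.age = 31                 # fields are read-only, like tuple items
except AttributeError:
    print('cannot assign to a field')
```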
Field names are invalid if they are repeated or conflict with Python keywords:
import collections
try:
    collections.namedtuple('Person', 'name class age gender')
except ValueError as err:
    print err
try:
    collections.namedtuple('Person', 'name age gender age')
except ValueError as err:
    print err
The interpreter displays:
>>>
Type names and field names cannot be a keyword: 'class'
Encountered duplicate field name: 'age'
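A namedtuple is sometimes created from values outside the program's control. For example, it may represent the rows returned by a database query whose schema is not known in advance. In that case, set the rename option to True so that invalid fields are renamed: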
import collections
with_class = collections.namedtuple('Person', 'name class age gender', rename = True)
print with_class._fields
two_ages = collections.namedtuple('Person', 'name age gender age', rename = True)
print two_ages._fields
Each invalid field is replaced by an underscore followed by its index. The interpreter displays:
>>>
('name', '_1', 'age', 'gender')
('name', 'age', 'gender', '_3')
5. OrderedDict
An OrderedDict is a dictionary subclass that remembers the order in which its contents are added:
>>> import collections
>>> d1 = {}
>>> d1['a'] = 'A'
>>> d1['b'] = 'B'
>>> d1['c'] = 'C'
>>> d2 = collections.OrderedDict()
>>> d2['a'] = 'A'
>>> d2['b'] = 'B'
>>> d2['c'] = 'C'
>>> d1
{'a': 'A', 'c': 'C', 'b': 'B'}
>>> d2
OrderedDict([('a', 'A'), ('b', 'B'), ('c', 'C')])
When testing for equality, an OrderedDict also takes into account the order in which its items were added:
>>> import collections
>>> d1 = {}
>>> d1['a'] = 'A'
>>> d1['b'] = 'B'
>>> d1['c'] = 'C'
>>> d2 = {}
>>> d2['c'] = 'C'
>>> d2['b'] = 'B'
>>> d2['a'] = 'A'
>>> d1 == d2
True
>>> d1 = collections.OrderedDict()
>>> d1['a'] = 'A'
>>> d1['b'] = 'B'
>>> d1['c'] = 'C'
>>> d2 = collections.OrderedDict()
>>> d2['c'] = 'C'
>>> d2['b'] = 'B'
>>> d2['a'] = 'A'
>>> d1 == d2
False
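The sketch below shows two related details (the code runs under Python 2.7 and 3):

- Equality between two OrderedDict objects is order-sensitive. Comparing an OrderedDict with a regular dict ignores order, as it does between ordinary dictionaries.
- popitem(last=False) removes items in first-in, first-out order.

```python
import collections

od1 = collections.OrderedDict([('a', 'A'), ('b', 'B'), ('c', 'C')])
od2 = collections.OrderedDict([('c', 'C'), ('b', 'B'), ('a', 'A')])
plain = {'c': 'C', 'b': 'B', 'a': 'A'}

print(od1 == od2)    # False: both are OrderedDicts, so order matters
print(od1 == plain)  # True: comparing with a regular dict ignores order

# popitem(last=False) removes the oldest item first (FIFO order).
first = od1.popitem(last=False)
print(first)
```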
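3. array---Sequence of Fixed-type Data
Purpose: Manage sequences of fixed-type numerical data efficiently
Python Version: 1.4 and later
The array module defines a sequence data structure that is similar to a list, except that all members must be of the same primitive type.
An array is instantiated with an argument describing the allowed data type (a type code such as 'i' for signed int). It can optionally be given an initial sequence of data to store. Arrays support slicing, iteration, and appending elements to the end.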
import array
a = array.array('i', range(3))
print 'Initial:', a
a.extend(range(4, 6))
print 'Extended:', a
print 'Slice:', a[2:5]
print 'Iterator:'
print list(enumerate(a))
The interpreter displays:
>>>
Initial: array('i', [0, 1, 2])
Extended: array('i', [0, 1, 2, 4, 5])
Slice: array('i', [2, 4, 5])
Iterator:
[(0, 0), (1, 1), (2, 2), (3, 4), (4, 5)]
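The type code is enforced on every insertion. The sketch below assumes the usual behavior of the 'i' (signed int) code, which rejects a float with TypeError (checked against Python 3). itemsize reports how many bytes each element occupies:

```python
import array

a = array.array('i', [1, 2, 3])
print(a.itemsize)        # bytes per item for type code 'i' on this platform

rejected = False
try:
    a.append(1.5)        # a float cannot be stored under the signed-int type code
except TypeError:
    rejected = True

a.append(4)              # an int is accepted
print(a)
```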
The contents of an array can be written to and read from files using built-in methods designed for efficient file I/O, tofile() and fromfile():
import array
import binascii
a = array.array('i', range(5))
print 'A1:', a
# Use binary mode ('wb'/'rb') so that Windows does not translate newline bytes
with open('test.txt', 'wb') as output:
    a.tofile(output)
with open('test.txt', 'rb') as input_file:
    raw_data = input_file.read()
    print 'Raw Contents:', binascii.hexlify(raw_data)
    input_file.seek(0)
    a2 = array.array('i')
    a2.fromfile(input_file, len(a))
    print 'A2:', a2
The interpreter displays:
>>>
A1: array('i', [0, 1, 2, 3, 4])
Raw Contents: 0000000001000000020000000300000004000000
A2: array('i', [0, 1, 2, 3, 4])
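Sometimes the data in an array is not in native byte order, or its bytes must be swapped before the array is sent to a system with a different byte order (or over the network). In either case, byteswap() converts the entire array without iterating over each element in Python code: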
import array
import binascii
def to_hex(a):
    chars_per_item = a.itemsize * 2  # two hex digits per byte
    hex_version = binascii.hexlify(a)
    num_chunks = len(hex_version) // chars_per_item
    for i in range(num_chunks):
        start = i * chars_per_item
        end = start + chars_per_item
        yield hex_version[start:end]
a1 = array.array('i', range(5))
a2 = array.array('i', range(5))
a2.byteswap()
fmt = '%10s %10s %10s %10s'
print fmt % ('A1 hex', 'A1', 'A2 hex', 'A2')
print fmt % (('-' * 10, ) * 4)
for values in zip(to_hex(a1), a1, to_hex(a2), a2):
    print fmt % values
The interpreter displays (on a 64-bit little-endian system, where each 'i' item is 4 bytes):
>>>
    A1 hex         A1     A2 hex         A2
---------- ---------- ---------- ----------
  00000000          0   00000000          0
  01000000          1   00000001   16777216
  02000000          2   00000002   33554432
  03000000          3   00000003   50331648
  04000000          4   00000004   67108864
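4. heapq---Heap Sort Algorithm
A heap is a tree-like data structure in which child nodes have a sort-order relationship with their parents. In a max-heap, each parent is greater than or equal to both of its children. In a min-heap, each parent is less than or equal to its children. Python's heapq module implements a min-heap.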
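heapq stores the tree in a plain list. The children of the item at index k sit at indexes 2*k+1 and 2*k+2. The sketch below checks the min-heap property described above. The helper is_min_heap is hypothetical and not part of heapq, and the code runs under Python 2.7 and 3:

```python
import heapq

def is_min_heap(heap):
    """Hypothetical helper: check heap[k] <= heap[2k+1] and heap[k] <= heap[2k+2]."""
    n = len(heap)
    for k in range(n):
        for child in (2 * k + 1, 2 * k + 2):
            if child < n and heap[k] > heap[child]:
                return False
    return True

data = [19, 9, 4, 10, 11]
print(is_min_heap(data))   # False: 19 is larger than its children
heapq.heapify(data)
print(is_min_heap(data))   # True
print(data[0])             # 4: the smallest item is always at index 0
```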
heapq_heapdata.py:
data = [19, 9, 4, 10, 11]
heapq_showtree.py:
import math
from cStringIO import StringIO
def show_tree(tree, total_width=36, fill=' '):
    """Pretty-print a tree."""
    output = StringIO()
    last_row = -1
    for i, n in enumerate(tree):
        if i:
            # The row of node i in the tree is floor(log2(i + 1))
            row = int(math.floor(math.log(i + 1, 2)))
        else:
            row = 0
        if row != last_row:
            output.write('\n')
        columns = 2 ** row
        col_width = int(math.floor((total_width * 1.0) / columns))
        output.write(str(n).center(col_width, fill))
        last_row = row
    print output.getvalue()
    print '-' * total_width
    print
    return
There are two basic ways to create a heap: heappush() and heapify():
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
heap = []
print 'random:', data
print
for n in data:
    print 'add %3d:' % n
    heapq.heappush(heap, n)
    show_tree(heap)
The interpreter displays:
>>>
random: [19, 9, 4, 10, 11]
add 19:
19
------------------------------------
add 9:
9
19
------------------------------------
add 4:
4
19 9
------------------------------------
add 10:
4
10 9
19
------------------------------------
add 11:
4
10 9
19 11
------------------------------------
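If the data is already in memory, it is more efficient to use heapify() to rearrange the items of the list in place: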
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
print 'random :', data
heapq.heapify(data)
print 'heapified:'
show_tree(data)
The interpreter displays:
>>>
random : [19, 9, 4, 10, 11]
heapified:
4
9 19
10 11
------------------------------------
Once the heap is organized correctly, heappop() removes the element with the lowest value:
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
print 'random :', data
heapq.heapify(data)
print 'heapified:'
show_tree(data)
print
for i in range(2):
    smallest = heapq.heappop(data)
    print 'pop %3d:' % smallest
    show_tree(data)
The interpreter displays:
>>>
random : [19, 9, 4, 10, 11]
heapified:
4
9 19
10 11
------------------------------------
pop 4:
9
10 19
11
------------------------------------
pop 9:
10
11 19
------------------------------------
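Repeatedly popping the smallest element is exactly heap sort, which gives this module its title. The sketch below uses a hypothetical heap_sort helper that is not part of heapq. It runs under Python 2.7 and 3:

```python
import heapq

def heap_sort(iterable):
    """Hypothetical helper: sort by building a heap, then popping the minimum repeatedly."""
    heap = list(iterable)     # copy so the caller's data is left untouched
    heapq.heapify(heap)       # O(n)
    # range(len(heap)) is evaluated once, before any pops; each pop is O(log n)
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heap_sort([19, 9, 4, 10, 11]))
```

For a one-off full sort, sorted() is the practical choice. heapq pays off when only the first few items are needed or when items keep arriving over time.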
To remove the smallest element and replace it with a new value in a single operation, use heapreplace():
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
heapq.heapify(data)
print 'start:'
show_tree(data)
print
for i in [0, 13]:
    smallest = heapq.heapreplace(data, i)
    print 'replace %2d with %2d:' % (smallest, i)
    show_tree(data)
The interpreter displays:
>>>
start:
4
9 19
10 11
------------------------------------
replace 4 with 0:
0
9 19
10 11
------------------------------------
replace 0 with 13:
9
10 19
13 11
------------------------------------
heapq also includes two functions that examine an iterable and find a range of the largest or smallest values it contains:
import heapq
from heapq_heapdata import data
print 'all :', data
print '3 largest:', heapq.nlargest(3, data)
print 'from sort:', list(reversed(sorted(data)[-3:]))
print '3 smallest:', heapq.nsmallest(3, data)
print 'from sort :', sorted(data)[:3]
The interpreter displays:
>>>
all : [19, 9, 4, 10, 11]
3 largest: [19, 11, 10]
from sort: [19, 11, 10]
3 smallest: [4, 9, 10]
from sort : [4, 9, 10]
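5. bisect---Maintain Lists in Sorted Order
The bisect module implements an algorithm for inserting elements into a list while keeping the list in sorted order. insort() inserts an element into a list:
import bisect
import random
random.seed(1)
print 'New Pos Contents'
print '--- --- --------'
lst = []
for i in range(1, 15):
    r = random.randint(1, 100)
    # bisect() finds the position where r will be inserted
    position = bisect.bisect(lst, r)
    # insort() inserts r into the list at that position
    bisect.insort(lst, r)
    print '%3d %3d' % (r, position), lst
The interpreter displays: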
>>>
New Pos Contents
--- --- --------
14 0 [14]
85 1 [14, 85]
77 1 [14, 77, 85]
26 1 [14, 26, 77, 85]
50 2 [14, 26, 50, 77, 85]
45 2 [14, 26, 45, 50, 77, 85]
66 4 [14, 26, 45, 50, 66, 77, 85]
79 6 [14, 26, 45, 50, 66, 77, 79, 85]
10 0 [10, 14, 26, 45, 50, 66, 77, 79, 85]
3 0 [3, 10, 14, 26, 45, 50, 66, 77, 79, 85]
84 9 [3, 10, 14, 26, 45, 50, 66, 77, 79, 84, 85]
44 4 [3, 10, 14, 26, 44, 45, 50, 66, 77, 79, 84, 85]
77 9 [3, 10, 14, 26, 44, 45, 50, 66, 77, 77, 79, 84, 85]
1 0 [1, 3, 10, 14, 26, 44, 45, 50, 66, 77, 77, 79, 84, 85]
insort() is actually an alias for insort_right(), which inserts a new value after any existing equal values. To insert it before the existing equal values instead, use insort_left():
import bisect
import random
random.seed(1)
print 'New Pos Contents'
print '--- --- --------'
lst = []
for i in range(1, 15):
    r = random.randint(1, 100)
    position = bisect.bisect_left(lst, r)
    bisect.insort_left(lst, r)
    print '%3d %3d' % (r, position), lst
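The difference between the left and right variants only shows up when the value is already present. The sketch below uses a hand-picked list containing duplicates and runs under Python 2.7 or 3:

```python
import bisect

lst = [14, 26, 77, 77, 85]
print(bisect.bisect_left(lst, 77))   # 2: before the existing 77s
print(bisect.bisect_right(lst, 77))  # 4: after the existing 77s
print(bisect.bisect(lst, 77))        # 4: bisect() is bisect_right()

bisect.insort_left(lst, 50)          # 50 is not present, so left and right agree
print(lst)
```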
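6. Queue---Thread-safe FIFO Implementation
Purpose: Provides a thread-safe FIFO implementation
Python Version: At least 1.4
The Queue module provides a first-in, first-out (FIFO) data structure suitable for multithreaded programming. It can be used to pass messages or other data safely between producer and consumer threads. Locking is handled for the caller, so many threads can work with the same Queue instance safely. The size of a Queue (the number of elements it contains) can be limited to restrict memory usage or processing.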
1. Basic FIFO Queue
put() adds elements to one end of the sequence, and get() removes them from the other end:
import Queue
q = Queue.Queue()
for i in range(5):
    q.put(i)
while not q.empty():
    print q.get(),
print
The interpreter displays:
>>>
0 1 2 3 4
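2. LIFO Queue
In contrast to the standard FIFO implementation of Queue, LifoQueue uses last-in, first-out ordering:
import Queue
q = Queue.LifoQueue()
for i in range(5):
    q.put(i)
while not q.empty():
    print q.get(),
print
The interpreter displays: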
>>>
4 3 2 1 0
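A Queue is usually shared between threads rather than filled and drained by a single one. The sketch below is a minimal producer/consumer example. It uses a None sentinel to signal the end of the work, which is a convention chosen here and not something Queue requires. The module is named queue in Python 3, so the import tries both names:

```python
import threading
try:
    import queue            # Python 3 name
except ImportError:
    import Queue as queue   # Python 2 name, as used in this article

def producer(q, items):
    for item in items:
        q.put(item)         # blocks while the queue is full
    q.put(None)             # sentinel: no more work is coming

def consumer(q, results):
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item * item)

q = queue.Queue(maxsize=2)  # at most 2 items waiting at any time
results = []
threads = [threading.Thread(target=producer, args=(q, range(5))),
           threading.Thread(target=consumer, args=(q, results))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

With a single consumer, the FIFO order of the queue carries through to the results list.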
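3. Priority Queue
Sometimes the processing order of the items in a queue needs to be based on characteristics of those items, rather than on the order in which they were added. PriorityQueue uses the sort order of its contents to decide which item to retrieve next:
import Queue
import threading
class Job(object):
    def __init__(self, priority, description):
        self.priority = priority
        self.description = description
        print 'New job:', description
    def __cmp__(self, other):
        # Jobs are ordered by priority; the lowest value is retrieved first
        return cmp(self.priority, other.priority)
q = Queue.PriorityQueue()
q.put(Job(3, 'Mid-level job'))
q.put(Job(10, 'Low-level job'))
q.put(Job(1, 'Important job'))
def process_job(q):
    while True:
        next_job = q.get()
        # Build the whole line with 'A %s' % B rather than printing 'A', B:
        # with multiple threads, the second form can produce garbled output
        print 'Processing job:%s\n' % next_job.description
        q.task_done()
workers = [threading.Thread(target=process_job, args=(q,)),
           threading.Thread(target=process_job, args=(q,)),]
for w in workers:
    w.setDaemon(True)
    w.start()
q.join()
The interpreter displays: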
>>>
New job: Mid-level job
New job: Low-level job
New job: Important job
Processing job:Important job
Processing job:Mid-level job
Processing job:Low-level job
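Python 3 no longer uses __cmp__(), so the Job class above only works under Python 2. A version-neutral alternative is to queue (priority, description) tuples. Tuples compare by their first element and fall back to the second only on a tie. A sketch:

```python
try:
    import queue            # Python 3 name
except ImportError:
    import Queue as queue   # Python 2 name

q = queue.PriorityQueue()
# Tuples compare element by element, so the priority decides the order.
q.put((3, 'Mid-level job'))
q.put((10, 'Low-level job'))
q.put((1, 'Important job'))

order = []
while not q.empty():
    priority, description = q.get()
    order.append(description)
print(order)
```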
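7. struct---Binary Data Structures
Purpose: Convert between strings and binary data
Python Version: 1.4 and later
The struct module includes functions for converting between strings of bytes and native Python data types such as numbers and strings.
1. Packing and Unpacking
Structs support packing data into strings, and unpacking data from strings, using format specifiers. A format specifier is made up of characters representing the type of the data, with optional count and endianness indicators.
pack() packs the values and unpack() unpacks them. In the format 'I 2s f', I is an unsigned integer, 2s is a two-byte string, and f is a single-precision float: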
import struct
import binascii
values = (1, 'ab', 2.7)
s = struct.Struct('I 2s f')
packed_data = s.pack(*values)
print 'Original values:', values
print 'Format string :', s.format
print 'Uses :', s.size, 'bytes'
print 'Packed Value :', binascii.hexlify(packed_data)
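# Round-trip through a hex string, as if the data had arrived in that form
packed_data = binascii.unhexlify(binascii.hexlify(packed_data))
unpacked_data = s.unpack(packed_data)
print 'Unpacked Values:', unpacked_data
The interpreter displays the following. The unpacked float is not exactly 2.7 because 'f' stores a 32-bit single-precision value: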
>>>
Original values: (1, 'ab', 2.7)
Format string : I 2s f
Uses : 12 bytes
Packed Value : 0100000061620000cdcc2c40
Unpacked Values: (1, 'ab', 2.700000047683716)
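2. Endianness
By default, values are encoded using the native C library notion of endianness. It is easy to override that choice by providing an explicit endianness directive in the format string: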
import struct
import binascii
values = (1, 'ab', 2.7)
print 'Original values:', values
endianness = [
    ('@', 'native, native'),
    ('=', 'native, standard'),
    ('<', 'little-endian'),
    ('>', 'big-endian'),
    ('!', 'network'),
    ]
for code, name in endianness:
    s = struct.Struct(code + ' I 2s f')
    packed_data = s.pack(*values)
    print
    print 'Format string :', s.format, 'for', name
    print 'Uses :', s.size, 'bytes'
    print 'Packed Value :', binascii.hexlify(packed_data)
    print 'Unpacked Value :', s.unpack(packed_data)
The interpreter displays the following. Only the native '@' mode adds alignment padding, which is why it uses 12 bytes instead of 10:
>>>
Original values: (1, 'ab', 2.7)
Format string : @ I 2s f for native, native
Uses : 12 bytes
Packed Value : 0100000061620000cdcc2c40
Unpacked Value : (1, 'ab', 2.700000047683716)
Format string : = I 2s f for native, standard
Uses : 10 bytes
Packed Value : 010000006162cdcc2c40
Unpacked Value : (1, 'ab', 2.700000047683716)
Format string : < I 2s f for little-endian
Uses : 10 bytes
Packed Value : 010000006162cdcc2c40
Unpacked Value : (1, 'ab', 2.700000047683716)
Format string : > I 2s f for big-endian
Uses : 10 bytes
Packed Value : 000000016162402ccccd
Unpacked Value : (1, 'ab', 2.700000047683716)
Format string : ! I 2s f for network
Uses : 10 bytes
Packed Value : 000000016162402ccccd
Unpacked Value : (1, 'ab', 2.700000047683716)
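The difference between 12 and 10 bytes comes from alignment. In native '@' mode, the C compiler's rules may insert padding so that the float starts on a 4-byte boundary. That padding is the two zero bytes after 6162 in the native output. The other modes never pad. struct.calcsize() reports the size of a format without packing anything. In the sketch below, the code does not assume the native size is 12, because that depends on the platform:

```python
import struct

# '@' (native) may pad for alignment; '=', '<', '>' and '!' never do.
native = struct.calcsize('@ I 2s f')
standard = struct.calcsize('= I 2s f')
print('%d %d' % (native, standard))
```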
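3. Buffers
Working with binary packed data is typically reserved for performance-sensitive situations, or for passing data into and out of extension modules. These cases can be optimized by avoiding the overhead of allocating a new buffer for each packed structure. The pack_into() and unpack_from() methods support writing directly to pre-allocated buffers.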
import struct
import binascii
s = struct.Struct('I 2s f')
values = (1, 'ab', 2.7)
print 'Original:', values
print
print 'ctypes string buffer'
import ctypes
b = ctypes.create_string_buffer(s.size)
print 'Before :', binascii.hexlify(b.raw)
s.pack_into(b, 0, *values)
print 'After :', binascii.hexlify(b.raw)
print 'Unpacked:', s.unpack_from(b, 0)
print
print 'array'
import array
a = array.array('c', '\0' * s.size)
print 'Before :', binascii.hexlify(a)
s.pack_into(a, 0, *values)
print 'After :', binascii.hexlify(a)
print 'Unpacked :', s.unpack_from(a, 0)
The interpreter displays:
>>>
Original: (1, 'ab', 2.7)
ctypes string buffer
Before : 000000000000000000000000
After : 0100000061620000cdcc2c40
Unpacked: (1, 'ab', 2.700000047683716)
array
Before : 000000000000000000000000
After : 0100000061620000cdcc2c40
Unpacked : (1, 'ab', 2.700000047683716)
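A bytearray is also writable, and it is often the simplest pre-allocated buffer. The sketch below packs two records back to back into one buffer, using offsets. The values are made up for illustration, and '<' is used so the layout has no padding:

```python
import binascii
import struct

s = struct.Struct('< I 2s f')        # fixed little-endian layout, 10 bytes
buf = bytearray(s.size * 2)          # room for two records
s.pack_into(buf, 0, 1, b'ab', 2.5)
s.pack_into(buf, s.size, 2, b'cd', 0.5)   # the second record starts at offset s.size

print(binascii.hexlify(bytes(buf)))
first = s.unpack_from(buf, 0)
second = s.unpack_from(buf, s.size)
print(first)
print(second)
```

The floats 2.5 and 0.5 were chosen because they are exactly representable in single precision, so they round-trip unchanged.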
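8. weakref---Impermanent References to Objects
Purpose: Refer to an 'expensive' object, but allow its memory to be reclaimed by the garbage collector if there are no other non-weak references.
Python Version: 2.1 and later
The weakref module supports weak references to objects. A normal reference increments the reference count on the object and prevents it from being garbage collected. This is not always desirable. For example, a circular reference might be present, or a cache of objects might need to be deleted when memory is needed. A weak reference is a handle to an object that does not keep it from being cleaned up automatically.
1. References
Weak references to objects are managed through the ref class. To retrieve the original object, call the reference object.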
import weakref
class ExpensiveObject(object):
    def __del__(self):
        print '(Deleting %s)' % self
obj = ExpensiveObject()
r = weakref.ref(obj)
print 'obj:', obj
print 'ref:', r
print 'r():', r()
print 'deleting obj'
del obj
# The object has been reclaimed, so calling the weak reference now returns None
print 'r():', r()
The interpreter displays:
>>>
obj: <__main__.ExpensiveObject object at 0x0000000002CE07B8>
ref: <weakref at ...; to 'ExpensiveObject' at ...>
r(): <__main__.ExpensiveObject object at 0x0000000002CE07B8>
deleting obj
(Deleting <__main__.ExpensiveObject object at 0x0000000002CE07B8>)
r(): None
2. Reference Callbacks
The ref constructor accepts an optional callback function that is invoked when the referenced object is deleted:
import weakref
class ExpensiveObject(object):
    def __del__(self):
        print '(Deleting %s)' % self
def callback(reference):
    """Invoked when referenced object is deleted"""
    print 'callback(', reference, ')'
obj = ExpensiveObject()
r = weakref.ref(obj, callback)
print 'obj:', obj
print 'ref:', r
print 'r():', r()
print 'deleting obj'
del obj
print 'r():', r()
The interpreter displays:
>>>
obj: <__main__.ExpensiveObject object at 0x0000000002C50828>
ref: <weakref at ...; to 'ExpensiveObject' at ...>
r(): <__main__.ExpensiveObject object at 0x0000000002C50828>
deleting obj
callback( <weakref at ...; dead> )
(Deleting <__main__.ExpensiveObject object at 0x0000000002C50828>)
r(): None
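3. Proxies
It is sometimes more convenient to use a proxy than a weak reference, because a proxy can be used as though it were the original object, without calling it first. It is still only a reference, though, not the real object. Once the referent is deleted, using the proxy raises ReferenceError: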
import weakref
class ExpensiveObject(object):
    def __init__(self, name):
        self.name = name
    def __del__(self):
        print '(Deleting %s)' % self
obj = ExpensiveObject('My Object')
r = weakref.ref(obj)
p = weakref.proxy(obj)
print 'via obj:', obj.name
print 'via ref:', r().name
print 'via proxy:', p.name
del obj
print 'via proxy:', p.name
The interpreter displays:
>>>
via obj: My Object
via ref: My Object
via proxy: My Object
(Deleting <__main__.ExpensiveObject object at 0x0000000002BC07B8>)
via proxy:
Traceback (most recent call last):
  File "C:\Python27\test.py", line 17, in <module>
print 'via proxy:', p.name
ReferenceError: weakly-referenced object no longer exists
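Because the referent can disappear at any time, code holding a weak reference should call it once, keep the result in a local variable, and check for None. A proxy offers no such check and raises ReferenceError instead. The sketch below follows this pattern, with two caveats:

- describe is a hypothetical helper.
- The immediate cleanup after del relies on CPython's reference counting. The gc.collect() call is there to help other runtimes.

```python
import gc
import weakref

class ExpensiveObject(object):
    pass

def describe(ref):
    """Hypothetical helper: report whether a weak reference's target still exists."""
    target = ref()        # a strong reference while this function runs, or None
    if target is None:
        return 'object is gone'
    return 'object is alive'

obj = ExpensiveObject()
r = weakref.ref(obj)
p = weakref.proxy(obj)
before = describe(r)
del obj
gc.collect()              # CPython already freed the object at del
after = describe(r)

try:
    p.name                # any attribute access through a dead proxy fails
    proxy_usable = True
except ReferenceError:
    proxy_usable = False
print(before)
print(after)
```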
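4. Cyclic References
One use of weak references is to allow cyclic references without preventing garbage collection. The module weakref_graph.py below defines a Graph class whose nodes can be linked into a cycle, plus helpers for running the demonstration: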
weakref_graph.py:
import gc
from pprint import pprint
import weakref
class Graph(object):
    def __init__(self, name):
        self.name = name
        self.other = None
    def set_next(self, other):
        print '%s.set_next(%r)' % (self.name, other)
        self.other = other
    def all_nodes(self):
        "Generate the nodes in the graph sequence."
        yield self
        n = self.other
        while n and n.name != self.name:
            yield n
            n = n.other
        if n is self:
            yield n
        return
    def __str__(self):
        return '->'.join(n.name for n in self.all_nodes())
    def __repr__(self):
        return '<%s at 0x%x name=%s>' % (self.__class__.__name__, id(self), self.name)
    def __del__(self):
        print '(Deleting %s)' % self.name
        self.set_next(None)
def collect_and_show_garbage():
    "Show what garbage is present."
    print 'Collecting...'
    n = gc.collect()
    print 'Unreachable objects:', n
    print 'Garbage:',
    pprint(gc.garbage)
def demo(graph_factory):
    print 'Set up graph:'
    one = graph_factory('one')
    two = graph_factory('two')
    three = graph_factory('three')
    one.set_next(two)
    two.set_next(three)
    three.set_next(one)
    print
    print 'Graph:'
    print str(one)
    collect_and_show_garbage()
    print
    three = None
    two = None
    print 'After 2 references removed:'
    print str(one)
    collect_and_show_garbage()
    print
    print 'Removing last reference:'
    one = None
    collect_and_show_garbage()
weakref_cycle.py:
import gc
from pprint import pprint
import weakref
from weakref_graph import Graph, demo, collect_and_show_garbage
gc.set_debug(gc.DEBUG_LEAK)
print 'Setting up the cycle'
print
demo(Graph)
print
print 'Breaking the cycle and cleaning up garbage'
print
gc.garbage[0].set_next(None)
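while gc.garbage:
    del gc.garbage[0]
print
collect_and_show_garbage()
The interpreter displays the following. Because Graph defines __del__(), the Python 2 garbage collector cannot break the cycle, since it has no safe order in which to run the finalizers. The three nodes and their attribute dictionaries are therefore reported as uncollectable and left in gc.garbage:
>>>
Setting up the cycle
Set up graph:
one.set_next(<Graph at 0x... name=two>)
two.set_next(<Graph at 0x... name=three>)
three.set_next(<Graph at 0x... name=one>)
Graph:
one->two->three->one
Collecting...
Unreachable objects: 0
Garbage:[]
After 2 references removed:
one->two->three->one
Collecting...
Unreachable objects: 0
Garbage:[]
Removing last reference:
Collecting...
gc: uncollectable <Graph ...>
gc: uncollectable <Graph ...>
gc: uncollectable <Graph ...>
gc: uncollectable <dict ...>
gc: uncollectable <dict ...>
gc: uncollectable <dict ...>
Unreachable objects: 6
Garbage:[<Graph at 0x... name=one>,
 <Graph at 0x... name=two>,
 <Graph at 0x... name=three>,
 {'name': 'one', 'other': <Graph at 0x... name=two>},
 {'name': 'two', 'other': <Graph at 0x... name=three>},
 {'name': 'three', 'other': <Graph at 0x... name=one>}]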
Using a weak proxy for the link that would close the cycle lets the objects be reclaimed normally:
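import gc
from pprint import pprint
import weakref
from weakref_graph import Graph, demo
class WeakGraph(Graph):
    def set_next(self, other):
        if other is not None:
            # If self is already reachable from other, this link would close
            # the cycle, so store a weak proxy instead of a strong reference
            if self in other.all_nodes():
                other = weakref.proxy(other)
        super(WeakGraph, self).set_next(other)
        return
demo(WeakGraph)
The interpreter displays: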
>>>
Set up graph:
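one.set_next(<WeakGraph at 0x... name=two>)
two.set_next(<WeakGraph at 0x... name=three>)
three.set_next(<weakproxy at ... to WeakGraph at ...>)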
Graph:
one->two->three
Collecting...
Unreachable objects: 0
Garbage:[]
After 2 references removed:
one->two->three
Collecting...
Unreachable objects: 0
Garbage:[]
Removing last reference:
(Deleting one)
one.set_next(None)
(Deleting two)
two.set_next(None)
(Deleting three)
three.set_next(None)
Collecting...
Unreachable objects: 0
Garbage:[]
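5. Caching Objects
WeakValueDictionary uses weak references to the values it holds, allowing them to be garbage collected when other code is no longer using them. The example below calls the garbage collector explicitly to illustrate the difference between memory handling with a regular dictionary and with WeakValueDictionary: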
import gc
from pprint import pprint
import weakref
gc.set_debug(gc.DEBUG_LEAK)
class ExpensiveObject(object):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return 'ExpensiveObject(%s)' % self.name
    def __del__(self):
        print ' (Deleting %s)' % self
def demo(cache_factory):
    # all_refs keeps the objects alive until the cleanup step
    all_refs = {}
    print 'CACHE TYPE:', cache_factory
    cache = cache_factory()
    for name in ['one', 'two', 'three']:
        o = ExpensiveObject(name)
        cache[name] = o
        all_refs[name] = o
        del o  # drop the loop variable's reference
    print ' all_refs =',
    pprint(all_refs)
    print '\n Before, cache contains:', cache.keys()
    for name, value in cache.items():
        print ' %s = %s' % (name, value)
        del value  # drop the loop variable's reference
    # Remove every reference to the objects except the cache
    print '\n Cleanup:'
    del all_refs
    gc.collect()
    print '\n After, cache contains:', cache.keys()
    for name, value in cache.items():
        print ' %s = %s' % (name, value)
    print ' demo returning'
    return
demo(dict)
print
demo(weakref.WeakValueDictionary)
The loop variables that refer to cached values must be cleared explicitly. Otherwise the objects' reference counts are not decremented, the garbage collector does not remove them, and they stay in the cache. Similarly, the all_refs variable holds references so that the objects are not garbage collected prematurely. The interpreter displays:
>>>
CACHE TYPE: <type 'dict'>
all_refs ={'one': ExpensiveObject(one),
'three': ExpensiveObject(three),
'two': ExpensiveObject(two)}
Before, cache contains: ['three', 'two', 'one']
three = ExpensiveObject(three)
two = ExpensiveObject(two)
one = ExpensiveObject(one)
Cleanup:
After, cache contains: ['three', 'two', 'one']
three = ExpensiveObject(three)
two = ExpensiveObject(two)
one = ExpensiveObject(one)
demo returning
(Deleting ExpensiveObject(three))
(Deleting ExpensiveObject(two))
(Deleting ExpensiveObject(one))
CACHE TYPE: weakref.WeakValueDictionary
all_refs ={'one': ExpensiveObject(one),
'three': ExpensiveObject(three),
'two': ExpensiveObject(two)}
Before, cache contains: ['three', 'two', 'one']
three = ExpensiveObject(three)
two = ExpensiveObject(two)
one = ExpensiveObject(one)
Cleanup:
(Deleting ExpensiveObject(three))
(Deleting ExpensiveObject(two))
(Deleting ExpensiveObject(one))
After, cache contains: []
demo returning
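The usual way to put this to work is a lookup function. It returns a cached object as long as something else still holds that object, and creates a new one otherwise. The sketch below uses a hypothetical get_object() helper. As before, immediate cleanup after del is CPython behavior, and gc.collect() is there for other runtimes:

```python
import gc
import weakref

class ExpensiveObject(object):
    def __init__(self, name):
        self.name = name

_cache = weakref.WeakValueDictionary()

def get_object(name):
    """Hypothetical helper: return the live cached object for name, or create one."""
    obj = _cache.get(name)           # None if never cached or already collected
    if obj is None:
        obj = ExpensiveObject(name)
        _cache[name] = obj
    return obj

a = get_object('one')
b = get_object('one')
same = a is b                        # the second call is served from the cache
del a, b                             # drop the only strong references
gc.collect()
remaining = list(_cache.keys())      # the entry vanished along with the object
print(same)
print(remaining)
```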
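9. copy---Duplicate Objects
Purpose: Provides functions for duplicating objects using shallow or deep copy semantics
Python Version: 1.4 and later
The copy module includes two functions, copy() and deepcopy(), for duplicating existing objects.
1. Shallow Copies
The shallow copy created by copy() is a new container populated with references to the contents of the original object: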
import copy
class MyClass:
    def __init__(self, name):
        self.name = name
    def __cmp__(self, other):
        return cmp(self.name, other.name)
a = MyClass('a')
my_list = [a]
dup = copy.copy(my_list)
# The two lists are different objects...
print [id(x) for x in [my_list, dup]]
# ...but both contain the very same MyClass instance
print [id(y) for x in [my_list, dup] for y in x]
The interpreter displays:
>>>
[44632008L, 44573384L]
[44573512L, 44573512L]
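2. Deep Copies
The deep copy created by deepcopy() is a new container populated with copies of the contents of the original object: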
import copy
class MyClass:
    def __init__(self, name):
        self.name = name
    def __cmp__(self, other):
        return cmp(self.name, other.name)
a = MyClass('a')
my_list = [a]
dup = copy.deepcopy(my_list)
# The two lists are different objects...
print [id(x) for x in [my_list, dup]]
# ...and each one holds its own MyClass instance
print [id(y) for x in [my_list, dup] for y in x]
The interpreter displays:
>>>
[45615048L, 36209224L]
[45556552L, 45556424L]
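The difference matters once something inside the copied container is modified. The sketch below uses nested lists with illustrative data and runs under Python 2.7 and 3:

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)
deep = copy.deepcopy(original)

original[0].append(99)   # mutate an inner list

print(shallow[0])   # [1, 2, 99]: the shallow copy shares the inner lists
print(deep[0])      # [1, 2]: the deep copy has its own inner lists
```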
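3. Customizing Copy Behavior
Copy behavior can be customized by implementing the special methods __copy__() and __deepcopy__(). __deepcopy__() also receives the memo dictionary used during the deep copy: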
import copy
class MyClass:
    def __init__(self, name):
        self.name = name
    def __cmp__(self, other):
        return cmp(self.name, other.name)
    def __copy__(self):
        print '__copy__()'
        return MyClass(self.name)
    def __deepcopy__(self, memo):
        print '__deepcopy__(%s)' % str(memo)
        return MyClass(copy.deepcopy(self.name, memo))
a = MyClass('a')
sc = copy.copy(a)
dc = copy.deepcopy(a)
The interpreter displays:
>>>
__copy__()
__deepcopy__({})
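4. Recursion in Deep Copy
To avoid problems when duplicating recursive data structures, deepcopy() uses a dictionary to track objects that have already been copied. This dictionary is passed to the __deepcopy__() method so that it can be checked there as well:
Note: the key step is that __deepcopy__() stores the new copy in memo (memo[self] = dup) before copying its connections. When the recursion reaches a node that is already being copied, it finds the existing copy in memo and returns it, instead of recursing forever.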
import copy
import pprint
class Graph:
    def __init__(self, name, connections):
        self.name = name
        self.connections = connections
    def add_connection(self, other):
        self.connections.append(other)
    def __repr__(self):
        return 'Graph(name=%s, id=%s)' % (self.name, id(self))
    def __deepcopy__(self, memo):
        print '\nCalling __deepcopy__ for %r' % self
        if self in memo:
            existing = memo.get(self)
            print ' Already copied to %r' % existing
            return existing
        print ' Memo dictionary:'
        pprint.pprint(memo, indent=4, width=40)
        dup = Graph(copy.deepcopy(self.name, memo), [])
        print ' Copying to new object %s' % dup
        memo[self] = dup
        for c in self.connections:
            dup.add_connection(copy.deepcopy(c, memo))
        return dup
root = Graph('root', [])
a = Graph('a', [root])
b = Graph('b', [a, root])
root.add_connection(a)
root.add_connection(b)
dup = copy.deepcopy(root)
The interpreter displays:
>>>
Calling __deepcopy__ for Graph(name=root, id=45364872)
Memo dictionary:
{ }
Copying to new object Graph(name=root, id=45359816)
Calling __deepcopy__ for Graph(name=a, id=45363848)
Memo dictionary:
{ Graph(name=root, id=45364872): Graph(name=root, id=45359816),
34200192L: 'root',
46032552L: ['root']}
Copying to new object Graph(name=a, id=45361160)
Calling __deepcopy__ for Graph(name=root, id=45364872)
Already copied to Graph(name=root, id=45359816)
Calling __deepcopy__ for Graph(name=b, id=45365512)
Memo dictionary:
{ Graph(name=a, id=45363848): Graph(name=a, id=45361160),
Graph(name=root, id=45364872): Graph(name=root, id=45359816),
33255512L: 'a',
34200192L: 'root',
45363848L: Graph(name=a, id=45361160),
45364872L: Graph(name=root, id=45359816),
46032552L: [ 'root',
'a',
Graph(name=root, id=45364872),
Graph(name=a, id=45363848)]}
Copying to new object Graph(name=b, id=45331720)
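The same memo mechanism lets deepcopy() handle built-in containers that refer to themselves. The memo maps the id() of each original to its copy. When the recursion comes back to an object that is already being copied, the existing copy is reused, and the cycle is reproduced in the new structure. The sketch below uses a self-referencing dict and runs under Python 2.7 and 3:

```python
import copy

node = {'name': 'root', 'children': []}
node['children'].append(node)     # the dict now contains itself

dup = copy.deepcopy(node)
print(dup is node)                # False: a new object
print(dup['children'][0] is dup)  # True: the cycle now points at the copy
```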
10. pprint---Pretty-print Data Structures
Purpose: Pretty-print data structures
Python Version: 1.4 and later
Test data, pprint_data.py:
data = [(1, {'a' : 'A', 'b' : 'B', 'c' : 'C', 'd' : 'D'}),
(2, {'e' : 'E', 'f' : 'F', 'g' : 'G', 'h' : 'H',
'i' : 'I', 'j' : 'J', 'k' : 'K', 'l' : 'L'}),]
1. Printing
from pprint import pprint
from pprint_data import data
print 'PRINT:'
print data
print
print 'PPRINT:'
pprint(data)
The interpreter displays:
>>>
PRINT:
[(1, {'a': 'A', 'c': 'C', 'b': 'B', 'd': 'D'}), (2, {'e': 'E', 'g': 'G', 'f': 'F', 'i': 'I', 'h': 'H', 'k': 'K', 'j': 'J', 'l': 'L'})]
PPRINT:
[(1, {'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'}),
(2,
{'e': 'E',
'f': 'F',
'g': 'G',
'h': 'H',
'i': 'I',
'j': 'J',
'k': 'K',
'l': 'L'})]
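2. Formatting
To format a data structure without writing it directly to a stream (for example, for logging), use pformat() to build a string representation: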
from pprint import pformat
from pprint_data import data
import logging
logging.basicConfig(level=logging.DEBUG, format='%(levelname)-8s %(message)s',)
logging.debug('Logging pformatted data')
formatted = pformat(data)
for line in formatted.splitlines():
logging.debug(line.rstrip())
The interpreter displays:
>>>
DEBUG Logging pformatted data
DEBUG [(1, {'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'}),
DEBUG (2,
DEBUG {'e': 'E',
DEBUG 'f': 'F',
DEBUG 'g': 'G',
DEBUG 'h': 'H',
DEBUG 'i': 'I',
DEBUG 'j': 'J',
DEBUG 'k': 'K',
DEBUG 'l': 'L'})]
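3. Arbitrary Classes
pprint() can also work with custom classes. It relies on each class's __repr__() method, so defining __repr__() controls the output: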
from pprint import pprint
class node(object):
    def __init__(self, name, contents=[]):
        self.name = name
        # Copy the argument so instances never share the default list
        self.contents = contents[:]
    def __repr__(self):
        return ('node(' + repr(self.name) + ', ' +
                repr(self.contents) + ')')
trees = [node('node-1'),
         node('node-2', [node('node-2-1')]),
         node('node-3', [node('node-3-1')]),]
pprint(trees)
The interpreter displays:
>>>
[node('node-1', []),
node('node-2', [node('node-2-1', [])]),
node('node-3', [node('node-3-1', [])])]
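4. Recursion
Recursive data structures are represented with a reference to the original source of the data, in the form <Recursion on list with id=...>:
>>> from pprint import pprint
>>> ll = [1, 2]
>>> ll.append(ll)
>>> ll
[1, 2, [...]]
>>> pprint(ll)
[1, 2, <Recursion on list with id=...>]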
5. Limiting Nested Output
The depth argument limits how many levels of nesting are printed. Deeper levels are shown as [...]:
>>> ll = [1, 2, [3, 4, [5, 6]]]
>>> pprint(ll, depth=2)
[1, 2, [3, 4, [...]]]
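pformat() accepts the same depth argument, which makes the effect easy to compare level by level. The sketch below uses the list from the session above:

```python
from pprint import pformat

nested = [1, 2, [3, 4, [5, 6]]]
for depth in (1, 2, 3):
    # depth=1: [1, 2, [...]]
    # depth=2: [1, 2, [3, 4, [...]]]
    # depth=3: [1, 2, [3, 4, [5, 6]]]
    print('%d: %s' % (depth, pformat(nested, depth=depth)))
```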
6. Controlling Output Width
The default output width is 80 columns. Use the width argument to change it:
from pprint import pprint
from pprint_data import data
for width in [80, 5]:
print 'WIDTH =', width
pprint(data, width=width)
print
The interpreter displays:
>>>
WIDTH = 80
[(1, {'a': 'A', 'b': 'B', 'c': 'C', 'd': 'D'}),
(2,
{'e': 'E',
'f': 'F',
'g': 'G',
'h': 'H',
'i': 'I',
'j': 'J',
'k': 'K',
'l': 'L'})]
WIDTH = 5
[(1,
{'a': 'A',
'b': 'B',
'c': 'C',
'd': 'D'}),
(2,
{'e': 'E',
'f': 'F',
'g': 'G',
'h': 'H',
'i': 'I',
'j': 'J',
'k': 'K',
'l': 'L'})]