Python高级数据结构（一）

数据结构

数据结构的概念很好理解，就是用来将数据组织在一起的结构。换句话说，数据结构是用来存储一系列关联数据的东西。在Python中有四种内建的数据结构，分别是List、Tuple、Dictionary以及Set。大部分的应用程序不需要其他类型的数据结构，但若是真需要也有很多高级数据结构可供选择，例如Collection、Array、Heapq、Bisect、Weakref、Copy以及Pprint。本文将介绍这些数据结构的用法，看看它们是如何帮助我们的应用程序的。

关于四种内建数据结构的使用方法很简单，并且网上有很多参考资料，因此本文将不会讨论它们。

1. Collections

1.1 Counter()

如果你想统计一个单词在给定的序列中一共出现了多少次，诸如此类的操作就可以用到Counter。来看看如何统计一个list中出现的item次数：

from collections import Counter
 
li = ["Dog", "Cat", "Mouse", 42, "Dog", 42, "Cat", "Dog"]
a = Counter(li)
print a 
#Counter({'Dog': 3, 42: 2, 'Cat': 2, 'Mouse': 1})

若要统计一个list中不同单词的数目，可以这么用：

from collections import Counter
 
li = ["Dog", "Cat", "Mouse", 42, "Dog", 42, "Cat", "Dog"]
a = Counter(li)
print a # Counter({'Dog': 3, 42: 2, 'Cat': 2, 'Mouse': 1})
 
print len(set(li)) # 4

如果需要对结果进行分组，可以这么做：

from collections import Counter
 
li = ["Dog", "Cat", "Mouse","Dog","Cat", "Dog"]
a = Counter(li)
 
print a # Counter({'Dog': 3, 'Cat': 2, 'Mouse': 1})
 
print "{0} : {1}".format(a.values(),a.keys())  # [1, 3, 2] : ['Mouse', 'Dog', 'Cat']
 
print(a.most_common(3)) # [('Dog', 3), ('Cat', 2), ('Mouse', 1)]

1.2 deque

deque即双头队列，队列元素能够在队列两端添加或删除。Deque支持线程安全的，经过优化的append和pop操作，在队列两端的相关操作都能达到近乎O(1)的时间复杂度。

以下的例子是执行基本的队列操作：

from collections import deque
q = deque(range(5))
q.append(5)
q.appendleft(6)
print q
print q.pop()
print q.popleft()
print q.rotate(3)
print q
print q.rotate(-1)
print q
 
# deque([6, 0, 1, 2, 3, 4, 5])
# 5
# 6
# None
# deque([2, 3, 4, 0, 1])
# None
# deque([3, 4, 0, 1, 2])

1.3 defaultdict

当查找一个不存在的键操作发生时，它的default_factory会被调用，提供一个默认的值，并将这对键值存储下来。其他的参数同普通的字典一致。

defaultdict对象可以用来追踪单词的位置，如：

from collections import defaultdict
 
s = "the quick brown fox jumps over the lazy dog"
 
words = s.split()
location = defaultdict(list)
for m, n in enumerate(words):
    location[n].append(m)
 
print location
 
# defaultdict(, {'brown': [2], 'lazy': [7], 'over': [5], 'fox': [3],
# 'dog': [8], 'quick': [1], 'the': [0, 6], 'jumps': [4]})

是选择lists或sets与defaultdict搭配取决于你的目的，使用list能够保存你插入元素的顺序，而使用set则不关心元素插入顺序，它会帮助消除重复元素。

from collections import defaultdict
 
s = "the quick brown fox jumps over the lazy dog"
 
words = s.split()
location = defaultdict(set)
for m, n in enumerate(words):
    location[n].add(m)
 
print location
 
# defaultdict(, {'brown': set([2]), 'lazy': set([7]),
# 'over': set([5]), 'fox': set([3]), 'dog': set([8]), 'quick': set([1]),
# 'the': set([0, 6]), 'jumps': set([4])})

另一种创建multidict的方法：

s = "the quick brown fox jumps over the lazy dog"
d = {}
words = s.split()
 
for key, value in enumerate(words):
    d.setdefault(key, []).append(value)
print d
 
# {0: ['the'], 1: ['quick'], 2: ['brown'], 3: ['fox'], 4: ['jumps'], 5: ['over'], 6: ['the'], 7: ['lazy'], 8: ['dog']}

一个更复杂的例子：

class Example(dict):
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value
 
a = Example()
 
a[1][2][3] = 4
a[1][3][3] = 5
a[1][2]['test'] = 6
 
print a # {1: {2: {'test': 6, 3: 4}, 3: {3: 5}}}