Python includes many standard programming data structures as built-in types, such as list, tuple, dict, and set.
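The collections module contains implementations of several data structures that extend the corresponding structures found in other modules.
Counter is a container that keeps track of how many times equivalent values are added. It can be used to implement algorithms for which other languages commonly use a bag or multiset data structure.
deque is a double-ended queue that allows elements to be added or removed from either end.
defaultdict is a dictionary that responds with a default value when a key is not found.
OrderedDict remembers the order in which items were added.
namedtuple extends the ordinary tuple: in addition to a numeric index for each member element, it also provides an attribute name.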
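As a brief illustration of that last point, the sketch below builds a namedtuple and reads the same field by index and by attribute. The type name Person and its fields are made up for this example, and print() is called with a single argument so the code should run on both Python 2.7 and Python 3.

```python
import collections

# Person and its field names are hypothetical, chosen only for illustration.
Person = collections.namedtuple('Person', ['name', 'age'])
bob = Person(name='Bob', age=30)

print(bob[0])    # Bob (numeric index, as with a plain tuple)
print(bob.name)  # Bob (attribute name)
print(bob.age)   # 30
```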
1.Counter
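As a container, Counter keeps track of how many times equivalent values are added. It can be used to implement algorithms for which other languages commonly use a bag or multiset data structure.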
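Initialization
Counter supports three forms of initialization. Its constructor can be called with a sequence of items, with a dictionary containing keys and counts, or with keyword arguments that map string names to counts.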
import collections

print collections.Counter(['a', 'b', 'c', 'a', 'b', 'b'])
print collections.Counter({'a': 2, 'b': 3, 'c': 1})
print collections.Counter(a=2, b=3, c=1)
print collections.Counter('aabbbc')
All four calls produce the same result (the last one passes a string, which is simply a sequence of characters).
Counter({'b': 3, 'a': 2, 'c': 1})
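One way to confirm that the results match is to compare the objects directly. This is only a small sketch: the variable names are not from the original text, and it relies on Counter instances comparing equal when every key has the same count.

```python
import collections

# Build the same multiset four different ways.
from_list = collections.Counter(['a', 'b', 'c', 'a', 'b', 'b'])
from_dict = collections.Counter({'a': 2, 'b': 3, 'c': 1})
from_kwargs = collections.Counter(a=2, b=3, c=1)
from_string = collections.Counter('aabbbc')

# Counters compare equal when every key maps to the same count.
all_equal = from_list == from_dict == from_kwargs == from_string
print(all_equal)  # True
```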
If no arguments are given, an empty Counter is constructed, which can then be populated with the update() method.
import collections

c = collections.Counter()
print 'Initial :', c

c.update('abcdcaa')
print 'Sequencel:', c

c.update({'a': 1, 'd': 6})
print 'Dict :', c
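The counts are increased by the new data rather than replaced: in the second update, a goes from 3 to 4 instead of being reset to 1.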
Initial : Counter()
Sequencel: Counter({'a': 3, 'c': 2, 'b': 1, 'd': 1})
Dict : Counter({'d': 7, 'a': 4, 'c': 2, 'b': 1})
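The difference from an ordinary dictionary can be seen by running the same update on both. The following is a minimal sketch; the names counts and plain are made up for the comparison.

```python
import collections

counts = collections.Counter('abcdcaa')  # a=3, b=1, c=2, d=1
plain = dict(counts)                     # ordinary dict with the same items

counts.update({'a': 1, 'd': 6})  # Counter.update() adds to the existing counts
plain.update({'a': 1, 'd': 6})   # dict.update() overwrites the existing values

print(counts['a'])  # 4
print(plain['a'])   # 1
print(counts['d'])  # 7
print(plain['d'])   # 6
```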
Accessing Counts
Once a Counter is populated, its values can be retrieved using the dictionary API.
import collections

c = collections.Counter('abcdccca')

for letter in 'abcde':
    print '%s : %d' % (letter, c[letter])
Counter does not raise KeyError for unknown items. If a value was never seen in the input (as with e in this example), its count is 0.
a : 2
b : 1
c : 4
d : 1
e : 0
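To make the contrast with a plain dictionary concrete, the sketch below looks up a key that was never counted. The key 'z' and the variable names are chosen only for illustration. Note that the lookup returns 0 without adding the key to the Counter.

```python
import collections

c = collections.Counter('abcdccca')
plain = dict(c)

missing_count = c['z']       # 0, and no exception is raised
still_absent = 'z' not in c  # True: the lookup did not insert the key

try:
    plain['z']
    raised = False
except KeyError:
    raised = True            # an ordinary dict raises KeyError

print(missing_count)  # 0
print(still_absent)   # True
print(raised)         # True
```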
The elements() method returns an iterator that produces all of the items known to the Counter.
import collections

c = collections.Counter('abcdccca')
c['e'] = 0

print c
print list(c.elements())
The order of the elements is not guaranteed, and items with counts less than or equal to 0 are not included.
Counter({'c': 4, 'a': 2, 'b': 1, 'd': 1, 'e': 0})
['a', 'a', 'c', 'c', 'c', 'c', 'b', 'd']
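The same behaviour can be checked for a negative count as well. In this sketch the extra key 'f' is an assumption added for illustration, and the result is sorted so that the output does not depend on iteration order.

```python
import collections

c = collections.Counter('abcdccca')
c['e'] = 0   # zero count: skipped by elements()
c['f'] = -2  # negative count: also skipped

# Sort the result so the output is the same regardless of iteration order.
expanded = sorted(c.elements())
print(expanded)  # ['a', 'a', 'b', 'c', 'c', 'c', 'c', 'd']
```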
Use most_common() to produce a sequence of the n most frequently encountered input values and their respective counts.
import collections

c = collections.Counter()
with open(r'd:\check_traffic.sh', 'rt') as f:
    for line in f:
        c.update(line.rstrip().lower())

print 'Most common:'
for letter, count in c.most_common(5):
    print '%s: %6d' % (letter, count)
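This example counts the characters that appear in the input file to produce a frequency distribution, then prints the 5 most common ones. The entries printed with no visible key are presumably whitespace characters.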
Most common:
:   6535
e:   3435
:   3202
t:   3141
i:   3100
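For readers without that file, most_common() can also be tried on in-memory data. The sample sentence below is hypothetical and was chosen so that the top two counts are not tied, since the relative order of tied items should not be relied on.

```python
import collections

# Hypothetical sample text, used instead of a file so the sketch is self-contained.
text = 'the quick the lazy the dog quick'
word_counts = collections.Counter(text.split())

top_two = word_counts.most_common(2)
print(top_two)  # [('the', 3), ('quick', 2)]

# With no argument, most_common() returns every item, most frequent first.
everything = word_counts.most_common()
print(len(everything))  # 4
```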
Arithmetic
Counter instances support arithmetic and set operations for aggregating results.
import collections

c1 = collections.Counter(['a', 'a', 'c', 'b', 'a'])
c2 = collections.Counter('alphabet')

print 'c1:', c1
print 'c2:', c2

print '\nCombined counts:'
print c1 + c2

print '\nSubtraction:'
print c1 - c2

print '\nIntersection:'
print c1 & c2

print '\nUnion:'
print c1 | c2
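Each time a new Counter is produced through one of these operations, any items with zero or negative counts are discarded. That is why b, whose count is 1 in both c1 and c2, is missing from the subtraction result.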
c1: Counter({'a': 3, 'c': 1, 'b': 1})
c2: Counter({'a': 2, 'b': 1, 'e': 1, 'h': 1, 'l': 1, 'p': 1, 't': 1})

Combined counts:
Counter({'a': 5, 'b': 2, 'c': 1, 'e': 1, 'h': 1, 'l': 1, 'p': 1, 't': 1})

Subtraction:
Counter({'a': 1, 'c': 1})

Intersection:
Counter({'a': 2, 'b': 1})

Union:
Counter({'a': 3, 'c': 1, 'b': 1, 'e': 1, 'h': 1, 'l': 1, 'p': 1, 't': 1})
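The discarding of non-positive counts applies to the operators, which build a new Counter. As a point of comparison, the sketch below also uses the in-place subtract() method, which is not covered in the text above and which keeps zero and negative counts. The variable names are illustrative.

```python
import collections

c1 = collections.Counter(['a', 'a', 'c', 'b', 'a'])
c2 = collections.Counter('alphabet')

# The - operator builds a new Counter and keeps only positive counts.
difference = c1 - c2
print(sorted(difference.items()))  # [('a', 1), ('c', 1)]

# subtract() modifies a Counter in place and keeps zero and negative counts.
in_place = collections.Counter(c1)
in_place.subtract(c2)
print(in_place['b'])  # 0
print(in_place['t'])  # -1
```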
2.defaultdict
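The standard dictionary includes the setdefault() method, which retrieves a value and establishes a default if the value does not exist. By contrast, defaultdict lets the caller specify the default up front, when the container is initialized.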
import collections

def default_factory():
    return 'default value'

d = collections.defaultdict(default_factory, foo='bar')
# Alternatively, pass a lambda as the factory and a dict of initial items:
# d = collections.defaultdict(lambda: '333', {'foo': 'bar'})
print 'd:', d
print 'foo =>', d['foo']
print 'var =>', d['bar']
This approach works as long as it is appropriate for all keys to share the same default.
d: defaultdict(<function default_factory at 0x0201FAB0>, {'foo': 'bar'})
foo => bar
var => default value
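A common use of a shared default is grouping values into lists. The sketch below does the same grouping twice, once with setdefault() on a plain dict and once with defaultdict(list), to show where the default is specified in each case. The sample pairs are made up for illustration.

```python
import collections

# Hypothetical (key, value) pairs to group by key.
pairs = [('fruit', 'apple'), ('veg', 'carrot'), ('fruit', 'pear')]

# With a plain dict, setdefault() must be called on every insertion.
plain = {}
for kind, name in pairs:
    plain.setdefault(kind, []).append(name)

# With defaultdict, the factory (list) is supplied once, up front.
grouped = collections.defaultdict(list)
for kind, name in pairs:
    grouped[kind].append(name)

print(plain == grouped)  # True
print(grouped['fruit'])  # ['apple', 'pear']
```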
3.deque
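A deque (double-ended queue) supports adding and removing elements from either end. The two more commonly used structures, stacks and queues, are degenerate forms of a deque in which input and output are restricted to a single end.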
import collections

d = collections.deque('abcdefg')
print 'Deque:', d
print 'Length:', len(d)
print 'Left end', d[0]
print 'Right end', d[-1]

d.remove('c')
print 'remove(c):', d
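A deque is a sequence container, so it supports some of the same operations as list, such as indexing, taking the length, and removing the first element in the middle of the sequence that matches a given value.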
Deque: deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
Length: 7
Left end a
Right end g
remove(c): deque(['a', 'b', 'd', 'e', 'f', 'g'])
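The earlier remark that stacks and queues are restricted forms of a deque can be sketched as follows. This is an illustrative example rather than part of the original walkthrough: both structures add on the right, and they differ only in which end they remove from.

```python
import collections

# Stack: add and remove at the same (right) end, so last in is first out.
stack = collections.deque()
for item in 'abc':
    stack.append(item)
stack_order = [stack.pop() for _ in range(3)]
print(stack_order)  # ['c', 'b', 'a']

# Queue: add on the right, remove from the left, so first in is first out.
queue = collections.deque()
for item in 'abc':
    queue.append(item)
queue_order = [queue.popleft() for _ in range(3)]
print(queue_order)  # ['a', 'b', 'c']
```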
Populating
A deque can be populated from either end, termed "left" and "right" in the Python implementation.
import collections

d1 = collections.deque()
d1.extend('abcdefg')
print 'extend:', d1
d1.append('h')
print 'append:', d1

d2 = collections.deque()
d2.extendleft(xrange(6))
print 'extendleft', d2
d2.appendleft(6)
print 'appendleft', d2
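extendleft() iterates over its input and performs the equivalent of an appendleft() for each item. As a result, the deque ends up holding the input sequence in reverse order.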
extend: deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
append: deque(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
extendleft deque([5, 4, 3, 2, 1, 0])
appendleft deque([6, 5, 4, 3, 2, 1, 0])
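The relationship between extendleft() and appendleft() can be checked directly. In this small sketch the variable names are illustrative, and range is used in place of xrange so that the code should also run on Python 3.

```python
import collections

by_extendleft = collections.deque()
by_extendleft.extendleft(range(6))

# Same effect, spelled out one appendleft() call at a time.
by_appendleft = collections.deque()
for value in range(6):
    by_appendleft.appendleft(value)

print(list(by_extendleft))             # [5, 4, 3, 2, 1, 0]
print(by_extendleft == by_appendleft)  # True
```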
Consuming
The elements of a deque can be consumed from either end, depending on the algorithm being applied.
import collections

print "From the right:"
d = collections.deque('abcdefg')
while True:
    try:
        print d.pop(),
    except IndexError:
        break