Python Tricks - Common Data Structures in Python(4)

Sets and Multisets

In this chapter you’ll see how to implement mutable and immutable set and multiset (bag) data structures in Python, using built-in data types and classes from the standard library. First though, let’s do a quick recap of what a set data structure is:

A set is an unordered collection of objects that does not allow duplicate elements. Typically, sets are used to quickly test a value for membership in the set, to insert or delete new values from a set, and to compute the union or intersection of two sets.

In a “proper” set implementation, membership tests are expected to run in fast O(1) time. Union, intersection, difference, and subset operations should take O(n) time on average. The set implementations included in Python’s standard library follow these performance characteristics.
是否在这个集合中这个测试很快,时间复杂度为O(1)。其他的差值等运算时间复杂度为O(n)。

Just like dictionaries, sets get special treatment in Python and have some syntactic sugar that makes them easy to create. For example, the curly braces set expression syntax and set comprehensions allow you to conveniently define new set instances:

vowels = {'a', 'e', 'i', 'o', 'u'}
squares = {x * x for x in range(10)}

But be careful: To create an empty set you’ll need to call the set() constructor. Using empty curly-braces {} is ambiguous and will create an empty dictionary instead.
注意点就是我们需要用set()来初始化创建一个集合,而不是用{},因为如果使用{}的话,python会不知道这是集合还是字典。

Python and its standard library provide several set implementations. Let’s have a look at them.

set – Your Go-To Set

This is the built-in set implementation in Python. The set type is mutable and allows for the dynamic insertion and deletion of elements.
set可以动态插入和删除元素。

Python’s sets are backed by the dict data type and share the same performance characteristics. Any hashable object can be stored in a set.
python的集合set是被字典数据类型所支持的,享有同样的表现特征。任何可被哈希的对象可以存在集合中。这意思就是列表对象是不能存在集合中的。

>>> vowels = {'a', 'e', 'i', 'o', 'u'}
>>> 'e' in vowels
True

# 两个集合的交集
>>> letters = set('alice')
>>> letters.intersection(vowels)
{'a', 'e', 'i'}

>>> vowels.add('x')
>>> vowels
{'i', 'a', 'u', 'o', 'x', 'e'}

>>> len(vowels)
6
frozenset – Immutable Sets

The frozenset class implements an immutable version of set that cannot be changed after it has been constructed. Frozensets are static and only allow query operations on their elements (no inserts or deletions.) Because frozensets are static and hashable, they can be used as dictionary keys or as elements of another set, something that isn’t possible with regular (mutable) set objects.
frozenset就像字面意思一样set被创建以后就被冰封了起来。frozenset是静态的,可以进行查询操作,不支持插入和删除。因为frozenset是静态的和可哈希的,它可以被用于字典键,或者其他集合的元素,而这些是一般集合可变对象所不支持的。

>>> vowels = frozenset({'a', 'e', 'i', 'o', 'u'})
>>> vowels.add('p')
AttributeError:
"'frozenset' object has no attribute 'add'"

# Frozensets are hashable and can
# be used as dictionary keys:
>>> d = { frozenset({1, 2, 3}): 'hello' }
>>> d[frozenset({1, 2, 3})]
'hello'
collections.Counter – Multisets

The collections.Counter class in the Python standard library implements a multiset (or bag) type that allows elements in the set to have more than one occurrence.
这个就是Counter是允许元素出现多次的,只不过就像下面这样,他会自动叠加这个元素出现的次数。update第一次的a是1次,uodate第二次的时候a是3次,这样update结束后的Counter集合中a就是出现了4次。

This is useful if you need to keep track of not only if an element is part of a set, but also how many times it is included in the set:

>>> from collections import Counter
>>> inventory = Counter()

>>> loot = {'sword': 1, 'bread': 3}
>>> inventory.update(loot)
>>> inventory
Counter({'bread': 3, 'sword': 1})

>>> more_loot = {'sword': 1, 'apple': 1}
>>> inventory.update(more_loot)
>>> inventory
Counter({'bread': 3, 'sword': 2, 'apple': 1})

如果你不仅仅需要追踪到这个元素是否是一个集合的一部分而且需要追踪到这个元素在集合里面出现了多少次的时候,这个Counter很有用。

Here’s a caveat for the Counter class: You’ll want to be careful when counting the number of elements in a Counter object. Calling len() returns the number of unique elements in the multiset, whereas the total number of elements can be retrieved using the sum function:

>>> len(inventory)
3 # Unique elements

>>> sum(inventory.values())
6 # Total no. of elements

主要的是len()就是返回的是Counter中有多少个元素。sum()返回的是元素的总个数。

Key Takeaways
  • Sets are another useful and commonly used data structure included
    with Python and its standard library.
  • Use the built-in set type when looking for a mutable set.
  • frozenset objects are hashable and can be used as dictionary
    or set keys.
  • collections.Counter implements multiset or “bag” data
    structures.

你可能感兴趣的:(Python Tricks - Common Data Structures in Python(4))