Python Tricks - Common Data Structures in Python(2)

Array Data Structures

An array is a fundamental data structure available in most programming languages, and it has a wide range of uses across different algorithms.

In this chapter we’ll take a look at array implementations in Python that only use core language features or functionality that’s included in the Python standard library.

You’ll see the strengths and weaknesses of each approach so you can decide which implementation is right for your use case. But before we jump in, let’s cover some of the basics first.
先介绍一下基础知识。

How do arrays work, and what are they used for?
集合是怎么工作的?它们被用来干什么?

Arrays consist of fixed-size data records that allow each element to be efficiently located based on its index.
列表由固定大小的数据记录构成,并且允许根据列表的索引对列表中的元素进行定位。

Because arrays store information in adjoining blocks of memory, they’re considered contiguous data structures (as opposed to linked datas structure like linked lists, for example.)
因为列表以连续内存块存储信息,它们被认为是连续数据结构,这和链接数据结构比如链接列表是相反的。

A real world analogy for an array data structure is a parking lot:

You can look at the parking lot as a whole and treat it as a single object, but inside the lot there are parking spots indexed by a unique number. Parking spots are containers for vehicles—each parking spot can either be empty or have a car, a motorbike, or some other vehicle parked on it.

列表就像是停车场。我们可以将停车场视作一个整体,像对待单个对象一样对待它。但是在停车场内部又是由特殊的数字定位的停车点。

But not all parking lots are the same:

Some parking lots may be restricted to only one type of vehicle. For example, a motor-home parking lot wouldn’t allow bikes to be parked on it. A “restricted” parking lotcorresponds to a “typed array” data structure that only allows elements that have the same data type stored in them.

但是有一些停车点要求只能停一种车。

Performance-wise, it’s very fast to look up an element contained in an array given the element’s index. A proper array implementation guarantees a constant O(1) access time for this case.
在知道数据索引的情况下查询相应的数据的速度很快。

Python includes several array-like data structures in its standard library that each have slightly different characteristics. Let’s take a look at them:
python在其标准库中包含了一些列表类的数据结构,但是每个各有不同。

list – Mutable Dynamic Arrays

可变动态列表
Lists are a part of the core Python language. Despite their name, Python’s lists are implemented as dynamic arrays behind the scenes. This means a list allows elements to be added or removed, and the list will automatically adjust the backing store that holds these elements by allocating or releasing memory.
列表是python语言核心的一部分。不管他们叫什么名字,python的列表都是作为动态列表实现的。这就意味着一个列表允许元素增加或者删除而且列表可以自动的通过分配和释放内存来调整存储这些元素的空间。

Python lists can hold arbitrary elements—“everything” is an object in Python, including functions. Therefore, you can mix and match different kinds of data types and store them all in a single list.
python列表可以存储任意的元素,任意一个在python中的对象,包括函数。因此,你亏混合和对比不同的数据类型和将他们都存储在一个列表中。

This can be a powerful feature, but the downside is that supporting multiple data types at the same time means that data is generally less tightly packed. And as a result, the whole structure takes up more space.
多类型数据的存储的坏处在于数据没有被紧紧的堆压起来,结果就是整个结构需要占用更多的空间。

>>> arr = ['one', 'two', 'three']
>>> arr[0]
'one'

# Lists have a nice repr:
>>> arr
['one', 'two', 'three']

# Lists are mutable:
>>> arr[1] = 'hello'
>>> arr
['one', 'hello', 'three']

>>> del arr[1]
>>> arr
['one', 'three']

# Lists can hold arbitrary data types:
>>> arr.append(23)
>>> arr
['one', 'three', 23]
tuple – Immutable Containers

Just like lists, tuples are also a part of the Python core language. Unlike lists, however, Python’s tuple objects are immutable. This means elements can’t be added or removed dynamically—all elements in a tuple must be defined at creation time.
元组不可动态改变。

Just like lists, tuples can hold elements of arbitrary data types. Having this flexibility is powerful, but again, it also means that data is less tightly packed than it would be in a typed array.

>>> arr = 'one', 'two', 'three'
>>> arr[0]
'one'

# Tuples have a nice repr:
>>> arr
('one', 'two', 'three')

# Tuples are immutable:
>>> arr[1] = 'hello'
TypeError:
"'tuple' object does not support item assignment"

>>> del arr[1]
TypeError:
"'tuple' object doesn't support item deletion"

# Tuples can hold arbitrary data types:
# (Adding elements creates a copy of the tuple)
>>> arr + (23,)
('one', 'two', 'three', 23)
array.array – Basic Typed Arrays

Python’s array module provides space-efficient storage of basic Cstyle data types like bytes, 32-bit integers, floating point numbers, and so on.

Arrays created with the array.array class are mutable and behave similarly to lists, except for one important difference—they are “typed arrays” constrained to a single data type.
array.array只接受同一种数据类型。

Because of this constraint, array.array objects with many elements are more space-efficient than lists and tuples. The elements stored in them are tightly packed, and this can be useful if you need to store many elements of the same type.

Also, arrays support many of the same methods as regular lists, and you might be able to use them as a “drop-in replacement” without requiring other changes to your application code.

>>> import array
>>> arr = array.array('f', (1.0, 1.5, 2.0, 2.5))
>>> arr[1]
1.5

# Arrays have a nice repr:
>>> arr
array('f', [1.0, 1.5, 2.0, 2.5])

# Arrays are mutable:
>>> arr[1] = 23.0
>>> arr
array('f', [1.0, 23.0, 2.0, 2.5])
>>> del arr[1]
>>> arr
array('f', [1.0, 2.0, 2.5])
>>> arr.append(42.0)
>>> arr
array('f', [1.0, 2.0, 2.5, 42.0])

# Arrays are "typed":
>>> arr[1] = 'hello'
TypeError: "must be real number, not str"
str – Immutable Arrays of Unicode Characters

Python 3.x uses str objects to store textual data as immutable sequences of Unicode characters. Practically speaking, that means a str is an immutable array of characters. Oddly enough, it’s also a recursive data structure—each character in a string is a str object of length 1 itself.
str是一个不变的字符列表。奇怪的是,它也是一个递归数据结构,字符串中的每个字符本身就是长度为1的str对象。

String objects are space-efficient because they’re tightly packed and they specialize in a single data type. If you’re storing Unicode text, you should use them. Because strings are immutable in Python, modifying a string requires creating a modified copy. The closest equivalent to a “mutable string” is storing individual characters inside a list.
如果你存储unicode文本,你应该使用它们。因为字符串是不可变的。修改一个字符串需要创造一个修改后的副本。一个可变字符串近似于在列表中存入单个字符。

>>> arr = 'abcd'
>>> arr[1]
'b'

>>> arr
'abcd'

# Strings are immutable:
>>> arr[1] = 'e'
TypeError:
"'str' object does not support item assignment"

>>> del arr[1]
TypeError:
"'str' object doesn't support item deletion"

# Strings can be unpacked into a list to
# get a mutable representation:
>>> list('abcd')
['a', 'b', 'c', 'd']
>>> ''.join(list('abcd'))
'abcd'

# Strings are recursive data structures:
>>> type('abc')
""
>>> type('abc'[0])
""

其实上面的意思也很明确了,就是字符串就相当于包含了每个字符为单元素的列表,但是这个列表中的元素不可改变。

bytes – Immutable Arrays of Single Bytes

Bytes objects are immutable sequences of single bytes (integers in the range of 0 <= x <= 255). Conceptually, they’re similar to str objects, and you can also think of them as immutable arrays of bytes.
比特对象是单个比特组成的不可变的序列。在0-255之间的整数称之为一个比特。在概念上它们和字符对象比较类似,你可以把它们认为是不可变的比特列表。

Like strings, bytes have their own literal syntax for creating objects and they’re space-efficient. Bytes objects are immutable, but unlike strings, there’s a dedicated “mutable byte array” data type called bytearray that they can be unpacked into. You’ll hear more about that in the next section.

>>> arr = bytes((0, 1, 2, 3))
>>> arr[1]
1

# Bytes literals have their own syntax:
>>> arr
b'x00x01x02x03'
>>> arr = b'x00x01x02x03'

# Only valid "bytes" are allowed:
>>> bytes((0, 300))
ValueError: "bytes must be in range(0, 256)"

# Bytes are immutable:
>>> arr[1] = 23
TypeError:
"'bytes' object does not support item assignment"

>>> del arr[1]
TypeError:
"'bytes' object doesn't support item deletion"
bytearray – Mutable Arrays of Single Bytes

The bytearray type is a mutable sequence of integers in the range 0 <= x <= 255. They’re closely related to bytes objects with the main difference being that bytearrays can be modified freely—you can overwrite elements, remove existing elements, or add new ones. The bytearray object will grow and shrink accordingly.
bytearray可以做增删改的操作,是0-255之间的单类型整数构成的可变序列。

Bytearrays can be converted back into immutable bytes objects but this involves copying the stored data in full—a slow operation taking O(n) time.
比特列表可以被转换成不可变的比特对象但是这个关于全部的复制存储的所有数据,这是一个花费时间长的操作。

>>> arr = bytearray((0, 1, 2, 3))
>>> arr[1]
1

# The bytearray repr:
>>> arr
bytearray(b'x00x01x02x03')

# Bytearrays are mutable:
>>> arr[1] = 23
>>> arr
bytearray(b'x00x17x02x03')
>>> arr[1]
23

# Bytearrays can grow and shrink in size:
>>> del arr[1]
>>> arr
bytearray(b'x00x02x03')

>>> arr.append(42)
>>> arr
bytearray(b'x00x02x03*')

# Bytearrays can only hold "bytes"
# (integers in the range 0 <= x <= 255)
>>> arr[1] = 'hello'
TypeError: "an integer is required"

>>> arr[1] = 300
ValueError: "byte must be in range(0, 256)"

# Bytearrays can be converted back into bytes objects:
# (This will copy the data)
>>> bytes(arr)
b'x00x02x03*'
Key Takeaways

There are a number of built-in data structures you can choose from when it comes to implementing arrays in Python. In this chapter we’ve focused on core language features and data structures included in the standard library only.

If you’re willing to go beyond the Python standard library, third-party
packages like NumPy offer a wide range of fast array implementations
for scientific computing and data science.

By restricting ourselves to the array data structures included with Python, here’s what our choices come down to:

You need to store arbitrary objects, potentially with mixed data types? Use a list or a tuple, depending on whether you want an immutable data structure or not.

You have numeric (integer or floating point) data and tight packing and performance is important? Try out array.array and see if it does everything you need. Also, consider going beyond the standard library and try out packages like NumPy or Pandas.

You have textual data represented as Unicode characters? Use Python’s built-in str. If you need a “mutable string,” use a list of characters.

You want to store a contiguous block of bytes? Use the immutable bytes type, or bytearray if you need a mutable data structure.

In most cases, I like to start out with a simple list. I’ll only specialize later on if performance or storage space becomes an issue. Most of the time, using a general-purpose array data structure like list gives you the fastest development speed and the most programming convenience.

I found that this is usually much more important in the beginning than trying to squeeze out every last drop of performance right from the start.

你可能感兴趣的:(Python Tricks - Common Data Structures in Python(2))