《Python数据分析技术栈》 Chapter 05, Section 01: Getting familiar with arrays and NumPy functions
Here, we look at various methods of creating and combining arrays, along with commonly used NumPy functions.
在此,我们将介绍创建和组合数组的各种方法,以及常用的 NumPy 函数。
The NumPy package has to be imported before its functions can be used, as shown below. The conventional shorthand, or alias, for NumPy is np.
在使用 NumPy 软件包的函数之前,必须先导入 NumPy 软件包,如下所示。NumPy 的简称或别名是 np。
import numpy as np
If you have not already installed NumPy, go to the Anaconda Prompt and enter the following command:
如果尚未安装 NumPy,请转到 Anaconda 提示符并输入以下命令:
pip install numpy
The basic unit in NumPy is an array. In Table 5-1, we look at the various methods for creating an array.
NumPy 的基本单位是数组。表 5-1 介绍了创建数组的各种方法。
The np.array function is used to create a one-dimensional or multidimensional array from a list.
np.array 函数用于从列表创建一维或多维数组。
np.array([[1,2,3],[4,5,6]])
The np.arange function is used to create a range of integers.
np.arange 函数用于创建整数范围。
np.arange(0,9)
#Alternate syntax:
np.arange(9)
#Generates 9 equally spaced integers starting from 0
The np.linspace function creates a given number of equally spaced values between two limits.
np.linspace 函数在两个限值之间创建给定数量的等距值。
np.linspace(1,6,5)
# This generates five equally spaced values between 1 and 6
The np.zeros function creates an array with a given number of rows and columns, with only one value throughout the array – “0”.
np.zeros 函数创建了一个具有给定行数和列数的数组,整个数组只有一个值–“0”。
np.zeros((4,2))
#Creates a 4*2 array with all values as 0
The np.ones function is similar to the np.zeros function, the difference being that the value repeated throughout the array is “1”.
np.ones 函数与 np.zeros 函数类似,不同之处在于整个数组中重复出现的值是 “1”。
np.ones((2,3))
#creates a 2*3 array with all values as 1
The np.full function creates an array using the value specified by the user.
np.full 函数使用用户指定的值创建一个数组。
np.full((2,2),3)
#Creates a 2*2 array with all values as 3
The np.empty function allocates an array without initializing its entries. The values it contains are arbitrary (whatever happened to be in memory), not deliberately generated random numbers, so they should be overwritten before use.
np.empty((2,2))
#creates a 2*2 array whose values are uninitialized (arbitrary)
The np.repeat function creates an array by repeating each element of a list a given number of times.
np.repeat 函数从列表中创建一个数组,并重复给定的次数。
np.repeat([1,2,3],3)
#Will repeat each value in the list 3 times
The randint function (from the np.random module) generates an array containing random numbers.
randint 函数(来自 np.random 模块)生成一个包含随机数的数组。
np.random.randint(1,100,5)
#Will generate an array with 5 random integers from 1 to 99 (the upper bound, 100, is excluded)
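The creation functions listed above can be tried together in one short script. This is only a sketch for experimentation; the variable names are chosen here for illustration and do not come from the original table.

```python
import numpy as np

# Build small arrays with the creation functions described above.
from_list = np.array([[1, 2, 3], [4, 5, 6]])   # 2-D array from a nested list
int_range = np.arange(0, 9)                    # integers 0..8 (the stop value is excluded)
spaced = np.linspace(1, 6, 5)                  # 5 values from 1 to 6, both ends included
zeros = np.zeros((4, 2))                       # 4 rows, 2 columns of 0.0
filled = np.full((2, 2), 3)                    # 2x2 array where every value is 3
repeated = np.repeat([1, 2, 3], 3)             # each element repeated 3 times
random_ints = np.random.randint(1, 100, 5)     # 5 integers, each >= 1 and < 100

print(from_list.shape)      # (2, 3)
print(int_range.tolist())   # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(spaced.tolist())      # [1.0, 2.25, 3.5, 4.75, 6.0]
print(repeated.tolist())    # [1, 1, 1, 2, 2, 2, 3, 3, 3]
print(random_ints)          # values differ on every run
```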
One point to note is that arrays are homogeneous data structures, unlike containers such as lists, tuples, and dictionaries; that is, all items in an array share a single data type. An array cannot hold integers, strings, and floating-point (decimal) values side by side as they are. Defining a NumPy array from items of different data types does not raise an error, but NumPy silently converts every item to one common type (for example, numbers become strings when a string is present), so mixing types should be avoided.
需要注意的一点是,与容器(如列表、元组和字典)不同,数组是同质数据结构;也就是说,数组应包含相同数据类型的项目。例如,我们不能让数组同时包含整数、字符串和浮点数(十进制)值。虽然使用不同数据类型的项定义 NumPy 数组不会在编写代码时导致错误,但应避免使用。
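To see what happens when data types are mixed, the following sketch inspects the dtype attribute of two small arrays. The array names are made up for this illustration.

```python
import numpy as np

mixed_numbers = np.array([1, 2.5, 3])      # integers together with a float
mixed_text = np.array([1, 2.5, "three"])   # numbers together with a string

print(mixed_numbers.dtype)       # float64: the integers were converted to floats
print(mixed_text.dtype.kind)     # U: every item is now a Unicode string
print(mixed_text[0] == "1")      # True: the integer 1 was stored as the string "1"
```

No error is raised in either case, which is why this kind of silent conversion is easy to miss.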
Now that we have looked at the various ways of defining an array, we look at the operations that we can perform on them, starting with the reshaping of an array.
在了解了定义数组的各种方法后,我们来看看可以对数组执行哪些操作,首先是重塑数组。
Reshaping an array is the process of changing the dimensionality of an array. The NumPy method “reshape” is important and is commonly used to convert a 1-D array to a multidimensional one.
重塑数组是改变数组维度的过程。NumPy 方法 "reshape "非常重要,常用于将一维数组转换为多维数组。
Consider a simple 1-D array containing ten elements, as shown in the following statement.
考虑一个包含十个元素的简单一维数组,如下面的语句所示。
x=np.arange(0,10)
We can reshape the 1-D array “x” into a 2-D array with five rows and two columns:
我们可以将一维数组 "x "重塑为五行两列的二维数组:
x.reshape(5,2)
As another example, consider the following array:
另一个例子是下面的数组:
x=np.arange(0,12)
x
Now, apply the reshape method to create two subarrays - each with three rows and two columns:
现在,应用重塑方法创建两个子数组,每个子数组有三行两列:
x=np.arange(0,12).reshape(2,3,2)
x
The product of the dimensions of the reshaped array should equal the number of elements in the original array. In this case, the dimensions of the array (2,3,2) when multiplied equal 12, the number of elements in the array. If this condition is not satisfied, reshaping fails to work.
重塑后数组的尺寸乘积应等于原数组的元素个数。在本例中,数组 (2,3,2) 的尺寸相乘等于 12,即数组中的元素个数。如果不满足这个条件,重塑将失败。
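The rule about the product of the dimensions can be checked with a small experiment. The sketch below also uses -1 as a dimension, which is not covered in the text above: it asks NumPy to work out that dimension from the others.

```python
import numpy as np

x = np.arange(0, 12)           # 12 elements

ok = x.reshape(2, 3, 2)        # 2 * 3 * 2 == 12, so this works
inferred = x.reshape(4, -1)    # -1 lets NumPy infer the missing dimension (3 here)

try:
    x.reshape(5, 2)            # 5 * 2 == 10, which does not match 12 elements
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)

print(ok.shape)         # (2, 3, 2)
print(inferred.shape)   # (4, 3)
```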
Apart from the reshape method, we can also use the shape attribute to change the shape or dimensions of an array:
除了重塑方法,我们还可以使用 shape 属性来改变数组的形状或尺寸:
x.shape=(6,2)
#6 is the number of rows, 2 is the number of columns; their product must equal the 12 elements of x
Note that assigning to the shape attribute changes the original array in place, while the reshape method returns a reshaped array and leaves the original array unchanged.
请注意,shape 属性会更改原始数组,而 reshape 方法不会更改数组。
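The difference between the two approaches can be seen by checking the shape of the original array after each step. This is a small sketch; the name y is introduced only for illustration.

```python
import numpy as np

x = np.arange(0, 12)

y = x.reshape(3, 4)            # returns a reshaped array; x keeps its own shape
print(x.shape)                 # (12,)
print(y.shape)                 # (3, 4)
print(np.shares_memory(x, y))  # True: here y is a view onto the same data as x

x.shape = (6, 2)               # assigning to the attribute changes x itself
print(x.shape)                 # (6, 2)
```

Because the reshaped array in this sketch shares its data with the original, changing a value through y would also change the corresponding value in x, even though the shape of x is untouched.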
The reshaping process can be reversed using the “ravel” method, which flattens an array of any shape back into a 1-D array:
重塑过程可以用 "ravel "方法逆转:
x=np.arange(0,12).reshape(2,3,2)
x.ravel()
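As a quick check, the flattened result can be compared with the 1-D array we started from. The name flat is used here only for illustration.

```python
import numpy as np

x = np.arange(0, 12).reshape(2, 3, 2)   # 3-D array with 12 elements
flat = x.ravel()                        # back to one dimension

print(flat.shape)                            # (12,)
print(np.array_equal(flat, np.arange(12)))   # True: same values, same order
```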
There are three methods for combining arrays: appending, concatenation, and stacking.
组合数组有三种方法:追加、连接和堆叠。
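Appending involves joining one array at the end of another array. The np.append function is used to append two arrays. When no axis is given, both arrays are flattened first, so the result is a 1-D array even if the inputs are 2-D.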
追加是指将一个数组连接到另一个数组的末尾。np.append 函数用于追加两个数组。
x=np.array([[1,2],[3,4]])
y=np.array([[6,7,8],[9,10,11]])
np.append(x,y)
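The following sketch shows what np.append returns for these two arrays, with and without the optional axis argument. The variable names flat and side_by_side are illustrative.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[6, 7, 8], [9, 10, 11]])

flat = np.append(x, y)                  # no axis: both inputs are flattened first
print(flat.tolist())                    # [1, 2, 3, 4, 6, 7, 8, 9, 10, 11]

side_by_side = np.append(x, y, axis=1)  # axis=1: both arrays must have the same number of rows
print(side_by_side.tolist())            # [[1, 2, 6, 7, 8], [3, 4, 9, 10, 11]]
```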
Concatenation involves joining arrays along an axis (either vertical or horizontal). The np.concatenate function concatenates arrays.
连接是指沿轴(垂直或水平)连接数组。np.concatenate 函数用于连接数组。
x=np.array([[1,2],[3,4]])
y=np.array([[6,7],[9,10]])
np.concatenate((x,y))
By default, the concatenate function joins the arrays vertically (along the “0” axis). If you want the arrays to be joined side by side, the “axis” parameter needs to be added with the value as “1”:
默认情况下,连接功能会垂直(沿 "0 "轴)连接数组。如果希望并排连接数组,则需要添加 "axis "参数,其值为 “1”:
np.concatenate((x,y),axis=1)
The append function uses the concatenate function internally.
追加函数内部使用连接函数。
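To compare the two directions of concatenation, the sketch below joins the same pair of arrays along each axis and prints the result. The names vertical and horizontal are chosen for this example.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[6, 7], [9, 10]])

vertical = np.concatenate((x, y))             # default axis=0: rows of y go below x
horizontal = np.concatenate((x, y), axis=1)   # axis=1: columns of y go to the right of x

print(vertical.tolist())     # [[1, 2], [3, 4], [6, 7], [9, 10]]
print(horizontal.tolist())   # [[1, 2, 6, 7], [3, 4, 9, 10]]
```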
Stacking: Stacking can be of two types, vertical or horizontal, as explained in the following.
堆叠: 堆叠有垂直和水平两种类型,具体说明如下。
As the name indicates, vertical stacking stacks arrays one below the other. The number of elements in each subarray of the arrays being stacked vertically must be the same for vertical stacking to work. The np.vstack function is used for vertical stacking.
顾名思义,垂直堆叠是将数组一个接一个地堆叠在一起。垂直堆叠的数组中每个子数组的元素数必须相同,这样垂直堆叠才有效。np.vstack 函数用于垂直堆叠。
x=np.array([[1,2],[3,4]])
y=np.array([[6,7],[8,9],[10,11]])
np.vstack((x,y))
See how there are two elements in each subarray of the arrays “x” and “y”.
看看 "x "和 "y "数组的每个子数组中都有两个元素。
Horizontal stacking stacks arrays side by side. The number of subarrays needs to be the same for each of the arrays being horizontally stacked. The np.hstack function is used for horizontal stacking.
水平堆叠是并排堆叠阵列。水平堆叠的每个数组的子数组数必须相同。np.hstack 函数用于水平堆叠。
In the following example, we have two subarrays in each of the arrays, “x” and “y”:
在下面的示例中,每个数组中都有两个子数组,即 "x "和 “y”
x=np.array([[1,2],[3,4]])
y=np.array([[6,7,8],[9,10,11]])
np.hstack((x,y))
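The shape requirements for the two kinds of stacking can be checked side by side. In this sketch the names tall and wide are made up to distinguish the two second arrays used in the examples above.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
tall = np.array([[6, 7], [8, 9], [10, 11]])   # 3 rows, 2 columns
wide = np.array([[6, 7, 8], [9, 10, 11]])     # 2 rows, 3 columns

v = np.vstack((x, tall))   # works: both arrays have 2 columns
h = np.hstack((x, wide))   # works: both arrays have 2 rows
print(v.shape)             # (5, 2)
print(h.shape)             # (2, 5)

try:
    np.vstack((x, wide))   # column counts differ (2 versus 3)
    stack_failed = False
except ValueError as err:
    stack_failed = True
    print("vstack failed:", err)
```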
In the next section, we look at how to use logical operators to test for conditions in NumPy arrays.
下一节,我们将了解如何使用逻辑运算符测试 NumPy 数组中的条件。
NumPy uses logical operators (&,|,~), and functions like np.any, np.all, and np.where to check for conditions. The elements in the array (or their indexes) that satisfy the condition are returned.
NumPy 使用逻辑运算符 (&,|,~) 以及 np.any、np.all 和 np.where 等函数来检查条件。数组中满足条件的元素(或其索引)将被返回。
Consider the following array:
x=np.linspace(1,50,10)
x
Let us check for the following conditions and see which elements satisfy them:
让我们检查以下条件,看看哪些元素满足这些条件:
Checking if all the values satisfy a given condition: The np.all function returns the value “True” only if the condition holds for all the items of the array, as shown in the following example.
检查是否所有值都满足给定条件: np.all 函数只有在数组的所有项都满足条件的情况下才会返回值 “True”,如下例所示。
np.all(x>20)
#returns True only if all the elements are greater than 20
Checking if any of the values in the array satisfy a condition: The np.any function returns the value “True” if any of the items satisfy the condition.
检查数组中的任何值是否满足条件: 如果有任何项目满足条件,np.any 函数将返回值 “True”。
np.any(x>20)
#returns True if any one element in the array is greater than 20
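For the array x defined above, the two checks give different answers, as the following sketch shows. The variable names are illustrative.

```python
import numpy as np

x = np.linspace(1, 50, 10)    # 10 equally spaced values from 1 to 50

mask = x > 20                 # Boolean array: one True/False per element
all_above = np.all(mask)      # every element must be greater than 20
any_above = np.any(mask)      # at least one element must be greater than 20

print(all_above)              # False: the first few values are below 20
print(any_above)              # True
print(int(mask.sum()))        # 6 of the 10 values are greater than 20
```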
Returning the index of the items that satisfy a condition: The np.where function returns the indices of the values in the array satisfying a given condition.
返回满足条件的项的索引 np.where 函数返回数组中满足给定条件的值的索引。
np.where(x<10)
#returns the index of elements that are less than 10
The np.where function is also useful for selectively retrieving or filtering values in an array. For example, we can retrieve those items that satisfy the condition “x<10”, using the following code statement:
np.where 函数还可用于选择性地检索或过滤数组中的值。例如,我们可以使用下面的代码语句检索满足条件 "x<10 "的项:
x[np.where(x<10)]
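It can help to look at what np.where actually returns. The sketch below, with illustrative variable names, also shows that indexing with the Boolean condition directly gives the same filtered values.

```python
import numpy as np

x = np.linspace(1, 50, 10)

idx = np.where(x < 10)        # a tuple holding one index array per dimension
print(type(idx).__name__)     # tuple
print(idx[0].tolist())        # [0, 1]: the first two values are less than 10

via_where = x[idx]            # filter using the indices
via_mask = x[x < 10]          # filter using the Boolean condition directly
print(np.array_equal(via_where, via_mask))   # True
```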
& operator (element-wise counterpart of the and operator in Python): Returns True for each element where all conditions are satisfied:
& 运算符(相当于 Python 中的 和 运算符): 当所有条件都满足时,返回 True:
x[(x>10) & (x<50)]
#Returns all items that have a value greater than 10 and less than 50
| operator (element-wise counterpart of the or operator in Python): Returns True for each element where any one condition, from a given set of conditions, is satisfied.
| 操作符(相当于 Python 中的 or 操作符): 当满足给定条件集中的任意一个条件时,返回 True。
x[(x>10) | (x<5)]
#Returns all items that have a value greater than 10 or less than 5
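~ operator (element-wise counterpart of the not operator in Python): Returns True for each element where the given condition is not satisfied.
x[~(x<8)]
#Returns all items that are not less than 8, that is, greater than or equal to 8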
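The three operators can be compared on the same array. The sketch below uses illustrative variable names and also shows why the plain Python keyword and cannot be used in place of & on arrays.

```python
import numpy as np

x = np.linspace(1, 50, 10)

both = x[(x > 10) & (x < 50)]     # greater than 10 AND less than 50
either = x[(x > 10) | (x < 5)]    # greater than 10 OR less than 5
not_small = x[~(x < 8)]           # NOT less than 8, i.e. greater than or equal to 8

print(both.size)        # 7
print(either.size)      # 9
print(not_small.size)   # 8

# The keyword "and" needs a single True/False value, which an array cannot provide.
try:
    (x > 10) and (x < 50)
    and_failed = False
except ValueError:
    and_failed = True
print(and_failed)       # True
```

The parentheses around each condition matter: & and | bind more tightly than the comparison operators, so leaving them out changes how the expression is evaluated.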
We now move on to some other important concepts in NumPy like broadcasting and vectorization. We also discuss the use of arithmetic operators with NumPy arrays.
接下来,我们将讨论 NumPy 中的其他一些重要概念,如广播和矢量化。我们还将讨论算术运算符在 NumPy 数组中的使用。