python numpy
ULTIMATE GUIDE
Numpy (which stands for Numerical Python) is a library for the Python programming language that supports matrix data structures and multidimensional array objects. It is the most basic scientific computing library that we need to learn to begin our journey in the field of data science.
Numpy provides a comprehensive set of mathematical functions that make it easier to build advanced machine learning and artificial intelligence applications. Together with several libraries built on top of it (like matplotlib, pandas, scikit-learn, etc.), numpy lets us carry out complex mathematical calculations effortlessly.
This library is a great tool for every data science professional to handle and analyze data efficiently. Moreover, mathematical operations are much easier to perform with numpy arrays than with Python lists.
The numpy library offers many functions. In this article, we will learn some essential and lesser-known functions of this library and how to use them efficiently.
Note: In this article, we will be using Google Colaboratory to run our code.
Importing numpy
Numpy can be simply imported in the notebook by using the following code:
import numpy as np
Here, numpy is imported as np to save typing while coding; this alias is also the de facto convention in the data science community.
Now, let’s get started with numpy functions!
Creation of n-dimensional arrays using numpy
An array is the core data structure of the numpy library. Like a list, it stores values, but with two differences: we can specify the data type of its elements (the dtype parameter), and operations on arrays are faster and arrays use less memory to store data, allowing the code to be optimized even further.
To create a single-dimensional array we can use the following code:
import numpy as np
array = np.array([1,2,3,4,5])
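To illustrate what the dtype parameter controls, here is a small sketch that stores the same five values as 64-bit and as 32-bit floats. The variable names are only illustrative; itemsize and nbytes are standard array attributes reporting the bytes per element and in total.

```python
import numpy as np

# Same values, stored with two different element types.
arr_f64 = np.array([1, 2, 3, 4, 5], dtype=np.float64)
arr_f32 = np.array([1, 2, 3, 4, 5], dtype=np.float32)

print(arr_f64.dtype, arr_f64.itemsize, arr_f64.nbytes)  # float64 8 40
print(arr_f32.dtype, arr_f32.itemsize, arr_f32.nbytes)  # float32 4 20

# astype() converts an existing array into a new array with another dtype.
arr_int = arr_f64.astype(np.int32)
print(arr_int.dtype)  # int32
```

Halving the element size halves the memory used, which is one reason choosing a dtype explicitly can matter for large arrays.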
The process for creating a multi-dimensional array is similar; we just nest lists inside the outer [] brackets, one list per row:
array = np.array([[1.1,2.2,3.0,4.6,5.0],[6.4,7.3,8.5,9.1,10.2]])
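A quick way to check the structure of such an array is to inspect its ndim and shape attributes. The sketch below also contrasts element-wise arithmetic on an array with the same operator on a Python list, which is one concrete reason arrays are easier for math.

```python
import numpy as np

array = np.array([[1.1, 2.2, 3.0, 4.6, 5.0],
                  [6.4, 7.3, 8.5, 9.1, 10.2]])
print(array.ndim)   # 2 (rows and columns)
print(array.shape)  # (2, 5)

# Arithmetic on an array is applied to every element...
doubled = array * 2
print(doubled)
# ...whereas the same operator on a Python list repeats the list.
print([1, 2] * 2)   # [1, 2, 1, 2]
```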
numpy.linspace() function
The numpy.linspace() function is used to create an array of evenly spaced numbers over a given interval. The basic call is numpy.linspace(start, stop), and both start and stop are included by default. We can also set the number of samples to generate with the optional num parameter (its default value is fifty samples). Another optional parameter is retstep: if it is True, the function returns a tuple containing the samples and the step, i.e. the spacing between consecutive samples. Let’s apply this function in an example:
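import numpy as np
import matplotlib.pyplot as plt

# 10 samples from 0 to 10 cast to int; retstep=True also returns the step size
x = np.linspace(0, 10, 10, dtype=int, retstep=True)
print(x)

# 100 evenly spaced points give a smooth sine curve
x = np.linspace(0, 10, 100)
y = np.sin(x)  # np.sin() is applied element-wise
plt.plot(x, y, color='orange')
plt.show()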
As we can see here, we use the numpy library even to evaluate mathematical functions: np.sin() is applied to every element of the array. We used the linspace() function to generate equally spaced values and used that array to plot the sine function.
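The sketch below checks the default number of samples and the tuple returned by retstep=True. It also uses the endpoint parameter (not mentioned in the text above, but part of numpy.linspace()) to exclude stop from the samples.

```python
import numpy as np

# num defaults to 50 samples.
print(len(np.linspace(0, 1)))  # 50

# retstep=True returns a (samples, step) tuple.
samples, step = np.linspace(0, 1, 5, retstep=True)
print(samples)
print(step)  # 0.25

# endpoint=False excludes stop, so the step becomes (stop - start) / num.
samples_open, step_open = np.linspace(0, 1, 4, endpoint=False, retstep=True)
print(samples_open)
print(step_open)  # 0.25
```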
Functions for random sampling
Here, the numpy.random module helps us generate random values in various ways, such as generating random values of a given shape, generating an array by randomly selecting values from a given 1D array, or randomly permuting a sequence or a range.
numpy.random.rand(): With this function, we can create an array of the given shape filled with uniformly distributed values in the range [0,1) (i.e. 1 is excluded). For example:
np.random.rand(3,4)
As we can see in this example, an array of shape (3,4) is generated with all values lying in a range of [0,1).
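Because the values are random, the exact numbers change on every run. A minimal sketch, assuming we want reproducible output, fixes the global seed with np.random.seed() (the seed values 42 and 7 are arbitrary). It also shows np.random.random(), which draws from the same [0, 1) range but takes the shape as one tuple.

```python
import numpy as np

np.random.seed(42)  # fix the seed so that repeated runs produce the same numbers

# rand() takes each dimension as a separate integer argument...
a = np.random.rand(3, 4)
# ...while random() takes a single shape tuple; both sample uniformly from [0, 1).
b = np.random.random((3, 4))

print(a.shape, b.shape)               # (3, 4) (3, 4)
print(a.min() >= 0.0, a.max() < 1.0)  # True True

# Re-seeding with the same value reproduces exactly the same draws.
np.random.seed(7)
first = np.random.rand(5)
np.random.seed(7)
second = np.random.rand(5)
print(np.array_equal(first, second))  # True
```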
numpy.random.choice(): This function returns an array of random samples drawn from a given 1D input array. Other optional parameters that we can define are size, i.e. the output shape of the array; replace, i.e. whether values may be repeated in the output array; and p, i.e. the probability of drawing each element of the input array. Check out the following example:
np.random.choice([1,2,3,4],(1,2,3),replace=True,p=[0.2,0.1,0.4,0.3])
Here, we have passed the following input parameters to the function: an input array with four elements; the output shape (1, 2, 3), i.e. one block of 2 rows and 3 columns; replace=True, so values may repeat; and a probability for each element (the probabilities must sum to one).
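The sketch below checks the output shape of the example call, shows that replace=False yields distinct values, and draws a large sample to see the relative frequencies approach p. The seed and the sample size of 100,000 are arbitrary choices; passing an integer as the first argument draws from np.arange() of that integer.

```python
import numpy as np

np.random.seed(0)

# Output shape (1, 2, 3): one block of 2 rows x 3 columns.
sample = np.random.choice([1, 2, 3, 4], (1, 2, 3), replace=True, p=[0.2, 0.1, 0.4, 0.3])
print(sample.shape)  # (1, 2, 3)

# With replace=False each candidate is drawn at most once,
# so size must not exceed the number of candidates (here 10).
unique_draw = np.random.choice(10, size=5, replace=False)
print(unique_draw)

# Over many draws, the relative frequencies approach p.
big = np.random.choice([1, 2, 3, 4], size=100_000, p=[0.2, 0.1, 0.4, 0.3])
values, counts = np.unique(big, return_counts=True)
print(values, counts / big.size)
```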
np.random.permutation(): This function returns a randomly permuted copy of a sequence (when given an array) or a permuted range 0 to n-1 (when given a single integer n).
arr = np.random.permutation(5)
print('Permutation of a range: ' + str(arr))
arr_ = np.random.permutation([1,2,3,4,5,6,7,8,9])
print('Permutation of elements of input array: ' + str(arr_))
In the first case, we get a random permutation of the range 0 to 4, and in the second case, a random permutation of the elements of the input array.
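A related function is np.random.shuffle(), which reorders an array in place instead of returning a copy. This sketch (seed chosen arbitrarily) contrasts the two and checks that both keep exactly the same elements.

```python
import numpy as np

np.random.seed(1)
original = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# permutation() returns a shuffled copy; the input is left untouched.
permuted = np.random.permutation(original)
print(original)  # [1 2 3 4 5 6 7 8 9]

# shuffle() reorders the array in place and returns None.
in_place = original.copy()
result = np.random.shuffle(in_place)
print(result)  # None

# Both contain exactly the same elements as the input.
print(np.array_equal(np.sort(permuted), original))  # True
print(np.array_equal(np.sort(in_place), original))  # True
```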
The numpy.random module is not limited to these functions; you can find the complete list of functions here: numpy documentation page.
Indexing and slicing of an array
To access and modify the elements of an array, we use indexing and slicing. In an array of length n, the first element has index 0 and the last element has index n-1.
a = np.array([1,2,3,4,5,6])
b = a[3]
# output: 4
In the above example, a[3] returns the fourth element of the array a, since indexing starts at 0.
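Indexing also works from the end of the array with negative indices, and an index can be assigned to in order to modify an element. A short sketch:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
print(a[0])   # 1, the first element
print(a[-1])  # 6, negative indices count back from the end (same as a[len(a) - 1])

# Assigning to an index modifies that element in place.
a[3] = 40
print(a)
```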
For basic slicing of the array (i.e. selecting a sub-range of it), we use the [start:stop:step_size] notation, where start is included and stop is excluded.
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
arr[1:7:2]
# output: array([1, 3, 5])
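Any of start, stop and step_size can be omitted, and a negative step walks backwards. One detail worth knowing is that a basic slice is a view of the original data, so writing to the slice also changes the array. A small sketch:

```python
import numpy as np

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

print(arr[:3])    # start defaults to 0: elements 0, 1, 2
print(arr[8:])    # stop defaults to the end: elements 8, 9, 10
print(arr[::3])   # every third element: 0, 3, 6, 9
print(arr[::-1])  # a negative step reverses the array

# A basic slice is a view: writing to it also changes the original array.
part = arr[1:7:2]
part[0] = 100
print(arr[1])     # 100
```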
Multi-dimensional indexing and slicing: for a multi-dimensional array, we can index and slice each dimension separately by giving row and column values in [rows, columns] format. For better understanding, check the following example:
x = np.array([[ 0, 1, 2],
              [ 3, 4, 5],
              [ 6, 7, 8]])
x[0:2, 1:2]
# output: array([[1], [4]])
Here, we have selected the first two rows (i.e. 0:2 in code) and a single column with index 1 (i.e. 1:2 in code). Because both dimensions are sliced, the result is still 2D, with shape (2, 1).
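The sketch below compares slicing a dimension with indexing it by a single integer, which drops that dimension. It then shows what NumPy itself calls "advanced indexing": using integer arrays or boolean masks as indices, which returns a copy rather than a view.

```python
import numpy as np

x = np.array([[0, 1, 2],
              [3, 4, 5],
              [6, 7, 8]])

print(x[1, 2])            # 5, a single element (row 1, column 2)
print(x[0:2, 1:2].shape)  # (2, 1): slicing keeps both dimensions
print(x[0:2, 1].shape)    # (2,): an integer index drops that dimension
print(x[:, 1])            # [1 4 7], the whole second column

# Integer-array indexing picks elements (0, 1) and (2, 2).
print(x[[0, 2], [1, 2]])  # [1 8]
# Boolean-mask indexing keeps the elements where the condition is True.
print(x[x > 4])           # [5 6 7 8]
```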
numpy.ravel() and ndarray.flatten() functions
These functions return a 1D flattened form of the input array.
arr = np.array([[1,2], [3,4],[5,6]])
x = arr.flatten()
print(x)
y = arr.ravel()
print(y)
You may observe that the output of both functions is the same! So what is the difference between them? ndarray.flatten() always returns a copy of the original array, while numpy.ravel() returns a view of the original array whenever possible. The call itself does not change the original array, but modifying the result of ravel() can modify the original array. For the same reason, numpy.ravel() is usually faster than ndarray.flatten(), since it avoids allocating and copying memory when no copy is needed.
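The copy-versus-view difference can be observed with np.shares_memory() and by writing to each result. This sketch reuses the array from the example; note that ravel() can only return a view when the data is contiguous, so for a transposed array it has to copy.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]])

flat_copy = arr.flatten()  # always a new copy
flat_view = arr.ravel()    # a view here, since arr is contiguous in memory

print(np.shares_memory(arr, flat_copy))  # False
print(np.shares_memory(arr, flat_view))  # True

# Writing to the copy leaves arr unchanged...
flat_copy[0] = 100
print(arr[0, 0])  # 1
# ...but writing to the view changes arr as well.
flat_view[0] = 100
print(arr[0, 0])  # 100

# For a non-contiguous array such as a transpose, ravel() has to copy.
print(np.shares_memory(arr.T, arr.T.ravel()))  # False
```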
numpy.isclose() function
This function checks element-wise whether two arrays are equal within a tolerance and returns a boolean array. Combined with assert, numpy.isclose() can be used to verify your code.
def inv(arr):
    # compute the matrix inverse with numpy.linalg.inv()
    return np.linalg.inv(np.array(arr))

expected = np.array([[0.17647058823529413, -0.0032679738562091526, -0.02287581699346405],
                     [0.05882352941176469, -0.130718954248366, 0.0849673202614379],
                     [-0.1176470588235294, 0.1503267973856209, 0.0522875816993464]])
assert np.all(np.isclose(inv([[6, 1, 1], [4, -2, 5], [2, 8, 7]]), expected))
print("Sample Tests passed", '\U0001F44D')
In the above example, we find the inverse of a given matrix using another numpy function, numpy.linalg.inv(). We then verify the result with an assert statement: numpy.isclose() checks whether each output value is close to the expected value, and np.all() combines the element-wise results into a single boolean. The assert passes only if all the values are True; otherwise it raises an AssertionError.
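The reason to use a tolerance at all is floating-point rounding: values that are mathematically equal can differ in the last bits. The sketch below shows this, the default tolerances (rtol=1e-05, atol=1e-08), how to override them, and np.allclose(), which is shorthand for np.all(np.isclose(...)).

```python
import numpy as np

print(0.1 + 0.2 == 0.3)            # False: floating-point rounding error
print(np.isclose(0.1 + 0.2, 0.3))  # True

# Element-wise check: |a - b| <= atol + rtol * |b| with the default tolerances.
print(np.isclose([1.0, 1.1, 1e-10], [1.0, 1.0, 0.0]))

# Looser tolerances can be passed explicitly.
print(np.isclose(1.0, 1.001, rtol=1e-2))  # True

# np.allclose() reduces the element-wise result to one boolean.
print(np.allclose([1.0, 2.0], [1.0 + 1e-9, 2.0]))  # True
```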
Stacking arrays in numpy
Numpy provides several functions for stacking arrays; two commonly used ones are:
numpy.hstack(): this function stacks the arrays column-wise (i.e. horizontally), which is equivalent to concatenation along the second axis, except for 1D arrays, where it concatenates along the first axis. The input arrays must have the same shape along all but the second axis (1D arrays can be of any length).
a = np.array([[1,2],[3,4],[5,6]])
b = np.array([[7,8],[9,10],[11,12]])
np.hstack((a,b))
# output: a (3, 4) array with rows [1, 2, 7, 8], [3, 4, 9, 10], [5, 6, 11, 12]
numpy.vstack(): this function stacks the arrays row-wise (i.e. vertically), which is equivalent to concatenation along the first axis after 1D arrays of shape (N,) have been reshaped to (1, N). The input arrays must have the same shape along all but the first axis (1D arrays must have the same length).
a = np.array([[1,2],[3,4],[5,6]])
b = np.array([[7,8],[9,10],[11,12]])
np.vstack((a,b))
# output: a (6, 2) array, the rows of a followed by the rows of b
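The special rules for 1D inputs are easiest to see in a small sketch: hstack() joins 1D arrays end to end, so their lengths may differ, while vstack() turns each one into a row and therefore needs equal lengths. For 2D inputs, both functions match np.concatenate() along axis 1 and axis 0 respectively.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5])
w = np.array([7, 8, 9])

# 1D inputs: hstack joins them end to end, so lengths may differ.
print(np.hstack((u, v)))        # [1 2 3 4 5]
# vstack treats each 1D array as a row of shape (1, N).
print(np.vstack((u, w)).shape)  # (2, 3)

# Rows of different lengths cannot be stacked vertically.
mismatch_error = None
try:
    np.vstack((u, v))
except ValueError as err:
    mismatch_error = err
    print("vstack needs equal lengths:", err)

# For 2D inputs they match concatenate() along axis 1 and axis 0.
a = np.array([[1, 2], [3, 4], [5, 6]])
b = np.array([[7, 8], [9, 10], [11, 12]])
print(np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1)))  # True
print(np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0)))  # True
```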
Statistics functions of numpy
The numpy library has several useful functions for exploring and analyzing data statistically. We can calculate the mean, median, variance and standard deviation, compute a histogram over a set of data, and much more.
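numpy.mean(): with this function, we can calculate the arithmetic mean of a given array, optionally along a specified axis (axis=0 gives the mean of each column, axis=1 the mean of each row; by default, the mean of all elements is computed).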
a = np.array([[1, 2], [3, 4]])
np.mean(a, axis=1)
# output: array([1.5, 3.5])
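The other statistics mentioned above follow the same pattern. For the same 2x2 array, this sketch computes the overall and per-column mean, the median, and the variance and standard deviation. np.var() and np.std() divide by n by default (ddof=0); passing ddof=1 gives the sample versions that divide by n - 1.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

print(np.mean(a))          # 2.5, axis=None (default) uses every element
print(np.mean(a, axis=0))  # [2. 3.], mean of each column
print(np.median(a))        # 2.5
print(np.var(a))           # 1.25, population variance (ddof=0)
print(np.std(a))           # square root of the variance, about 1.118
print(np.var(a, ddof=1))   # sample variance: 5 / 3, about 1.667
```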
numpy.histogram(): this function computes the histogram of a set of data. We pass the array of data over which we want to compute the histogram (a multi-dimensional input is flattened automatically). Optionally, we can also define bins, either the number of equal-width bins or a sequence of bin edges, and range, the lower and upper limits of the bins. The function returns the count in each bin together with the bin edges.
arr = np.array([1,2,3,2,2,3,4,5])
np.histogram(arr, bins=[1,2,3,4,5])
# output: counts [1, 3, 2, 2] and bin edges [1, 2, 3, 4, 5]
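To make the bin boundaries explicit: every bin is half-open, [left, right), except the last one, which also includes its right edge, so the value 5 falls in the last bin. The sketch below reproduces the same counts using an integer bins value together with range, and confirms that a 2D input is flattened before counting.

```python
import numpy as np

arr = np.array([1, 2, 3, 2, 2, 3, 4, 5])

# Explicit edges: bins are [1, 2), [2, 3), [3, 4) and [4, 5].
counts, edges = np.histogram(arr, bins=[1, 2, 3, 4, 5])
print(counts)    # [1 3 2 2]

# An integer bins value with range=(lower, upper) gives equal-width bins.
counts_eq, edges_eq = np.histogram(arr, bins=4, range=(1, 5))
print(edges_eq)  # [1. 2. 3. 4. 5.]
print(counts_eq) # [1 3 2 2]

# A 2D input is flattened automatically before counting.
counts_2d, _ = np.histogram(arr.reshape(2, 4), bins=[1, 2, 3, 4, 5])
print(np.array_equal(counts_2d, counts))  # True
```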
You can also visualize these histogram values on a plot using the matplotlib library.
You can find other numpy statistics functions here: numpy documentation page.
Conclusion
I hope this article has taught you some essential and lesser-known functions of this library. I recommend trying these functions on your own for a better understanding.
Using these functions in your daily work will definitely benefit you as a data science professional!
If you have any questions or comments, please post them in the comment section.
If you want to learn how to visualize data and find insights in it visually, then check out our other article:
Originally published at: www.patataeater.blogspot.com
Translated from: https://towardsdatascience.com/numpy-cheatsheet-for-essential-functions-python-2e7d8618d688