Creating n-dimensional arrays with NumPy

Introduction

NumPy is a Python library for numerical computation on large datasets. The name stands for Numerical Python, and the library is popular among data scientists, especially for machine learning problems. NumPy is useful for pre-processing data before you train a machine learning model on it.

Working with n-dimensional arrays is easier in NumPy than with Python lists. NumPy arrays are also faster than Python lists because, unlike lists, they store their elements in one contiguous block of memory. This lets the processor perform computations on NumPy arrays efficiently.

In this article, we will look at the basics of working with NumPy, including array operations, matrix transformations, generating random values, and so on.

Installation

Clear installation instructions are provided on the official NumPy website, so I am not going to repeat them here. Please follow the instructions there.

Working with NumPy

Importing NumPy

To start using NumPy in your script, you have to import it. By convention it is imported under the alias np.

import numpy as np

Converting Arrays to NumPy Arrays

You can convert your existing Python lists into NumPy arrays using the np.array() function.

arr = [1,2,3]
np.array(arr)

This also applies to multi-dimensional arrays. NumPy keeps track of the shape (dimensions) of the array.

nested_arr = [[1,2],[3,4],[5,6]]
np.array(nested_arr)
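To see what "keeping track of the shape" means in practice, here is a small sketch that inspects the converted array. The variable name matrix is only illustrative; shape, ndim, and size are standard array attributes.

```python
import numpy as np

nested_arr = [[1, 2], [3, 4], [5, 6]]
matrix = np.array(nested_arr)

print(matrix.shape)  # (3, 2): three rows, two columns
print(matrix.ndim)   # 2: number of dimensions
print(matrix.size)   # 6: total number of elements
```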

NumPy Arange Function

When working with data, you will often come across use cases where you need to generate data.

NumPy has an arange() function with which you can generate a range of values between two numbers. It takes the start, the end (which is not included in the result), and an optional distance (step) parameter.

print(np.arange(0,10)) # without distance parameter
OUTPUT: [0 1 2 3 4 5 6 7 8 9]

print(np.arange(0,10,2)) # with distance parameter
OUTPUT: [0 2 4 6 8]
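A few more arange() calls can make the rules clearer. This is a minimal sketch with illustrative variable names: the stop value is never included, the step can be negative, and it can also be a float.

```python
import numpy as np

evens = np.arange(0, 10, 2)      # the stop value 10 is not included
countdown = np.arange(5, 0, -1)  # a negative step counts down
halves = np.arange(0, 2, 0.5)    # the step may be a float

print(evens)      # [0 2 4 6 8]
print(countdown)  # [5 4 3 2 1]
print(halves)     # [0.  0.5 1.  1.5]
```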

Zeroes and Ones

You can also generate an array or matrix of zeroes or ones using NumPy (trust me, you will need it). Here's how.

print(np.zeros(3))
OUTPUT: [0. 0. 0.]

print(np.ones(3))
OUTPUT: [1. 1. 1.]

Both of these functions support n-dimensional arrays as well. Pass the shape as a tuple, for example (rows, columns).

print(np.zeros((4,5)))
OUTPUT:
[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]

print(np.ones((4,5)))
OUTPUT:
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
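Both functions return floats by default. As a hedged sketch (the variable names are made up for illustration), the dtype argument changes the element type, the shape tuple can have more than two entries, and np.full() fills an array with any constant, which is not covered in the article itself.

```python
import numpy as np

zeros_int = np.zeros((2, 3), dtype=int)  # integer zeros instead of floats
ones_3d = np.ones((2, 3, 4))             # a 3-dimensional array of ones
sevens = np.full((2, 2), 7)              # np.full fills with any constant

print(zeros_int)
print(ones_3d.shape)  # (2, 3, 4)
print(sevens)
```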

Identity Matrix

You can also generate an identity matrix using the built-in NumPy function eye().

np.eye(5)
OUTPUT:
[[1., 0., 0., 0., 0.],
 [0., 1., 0., 0., 0.],
 [0., 0., 1., 0., 0.],
 [0., 0., 0., 1., 0.],
 [0., 0., 0., 0., 1.]]
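The defining property of an identity matrix is that multiplying by it leaves a matrix unchanged. A minimal sketch of that check, using an arbitrary 2x2 example matrix that is not from the article:

```python
import numpy as np

a = np.array([[2, 3], [4, 5]])
identity = np.eye(2)

product = a @ identity  # @ is matrix multiplication
print(product)
print(np.array_equal(product, a))  # True
```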

NumPy Linspace Function

NumPy has a linspace() function that generates evenly spaced points between two numbers.

print(np.linspace(0,10,3))
OUTPUT: [ 0.  5. 10.]

In the above example, the first and second parameters are the start and end points, while the third parameter is the total number of points to generate, with both the start and the end included.

Here is the same range with 20 points.

print(np.linspace(0,10,20))
OUTPUT:
[ 0.          0.52631579  1.05263158  1.57894737  2.10526316  2.63157895
  3.15789474  3.68421053  4.21052632  4.73684211  5.26315789  5.78947368
  6.31578947  6.84210526  7.36842105  7.89473684  8.42105263  8.94736842
  9.47368421 10.        ]
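Because both endpoints are included, the spacing between points is (end - start) / (points - 1). The sketch below checks this with the retstep and endpoint arguments of linspace(); the variable names are illustrative.

```python
import numpy as np

# retstep=True also returns the spacing between consecutive points
points, step = np.linspace(0, 10, 5, retstep=True)
print(points)  # [ 0.   2.5  5.   7.5 10. ]
print(step)    # 2.5, i.e. (10 - 0) / (5 - 1)

# endpoint=False leaves the stop value out, like arange does
open_ended = np.linspace(0, 10, 5, endpoint=False)
print(open_ended)  # [0. 2. 4. 6. 8.]
```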

Random Number Generation

When you are working on machine learning problems, you will often need to generate random numbers. NumPy has built-in functions for that as well.

But before we start generating random numbers, let's look at two major types of distributions.

Normal and Uniform Distribution

Normal Distribution

In a standard normal distribution, the values peak in the middle, around a mean of zero, and become less likely the further they are from it. The normal distribution is a very important concept in statistics since it is seen in many natural phenomena. It is also called the "bell curve".

Uniform Distribution

If every value in the distribution has the same probability, it is called a uniform distribution. For example, a fair coin toss has a uniform distribution since the probability of getting heads is the same as the probability of getting tails.

Now that you know how the two main distributions work, let's generate some random numbers.

To generate random numbers from a uniform distribution over the interval [0, 1), use the rand() function from np.random.

print(np.random.rand(10)) # array
OUTPUT:
[0.46015141 0.89326339 0.22589334 0.29874476 0.5664353  0.39257603
 0.77672998 0.35768031 0.95087408 0.34418542]

print(np.random.rand(3,4)) # 3x4 matrix
OUTPUT:
[[0.63775985 0.91746663 0.41667645 0.28272243]
 [0.14919547 0.72895922 0.87147748 0.94037953]
 [0.5545835  0.30870297 0.49341904 0.27852723]]

To generate random numbers from a standard normal distribution (mean 0, standard deviation 1), use the randn() function from np.random.

print(np.random.randn(10))
OUTPUT:
[-1.02087155 -0.75207769 -0.22696798  0.86739858  0.07367362 -0.41932541
  0.86303979  0.13739312  0.13214285  1.23089936]

print(np.random.randn(3,4))
OUTPUT:
[[ 1.61013773  1.37400445  0.55494053  0.23133522]
 [ 0.31290971 -0.30866402  0.33093618  0.34868954]
 [-0.11659865 -1.22311073  0.36676476  0.40819545]]

To generate random integers between a low value (inclusive) and a high value (exclusive), use the randint() function from np.random.

print(np.random.randint(1,100,10))
OUTPUT: [64 37 62 27  4 33 23 52 70  7]

print(np.random.randint(1,100,(2,3)))
OUTPUT:
[[92 42 38]
 [87 69 38]]

A seed value is used if you want your random numbers to be the same on every run. Here is how you set a seed value in NumPy.

np.random.seed(42)
print(np.random.rand(4))
OUTPUT: [0.37454012 0.95071431 0.73199394 0.59865848]

Whenever you set the same seed before generating, you will get the same sequence of random numbers.
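To check the reproducibility claim yourself, here is a minimal sketch that resets the seed and compares two draws. The second half uses np.random.default_rng(), the Generator-based interface available in newer NumPy versions; it is not part of the original walkthrough and is shown only as an alternative.

```python
import numpy as np

np.random.seed(42)
first = np.random.rand(4)

np.random.seed(42)  # resetting the seed restarts the sequence
second = np.random.rand(4)
print(np.array_equal(first, second))  # True

# Alternative: a Generator object carries its own seed
# instead of relying on the global one.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
same = np.array_equal(rng_a.random(4), rng_b.random(4))
print(same)  # True
```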

Reshaping Arrays

As a data scientist, you will often reshape datasets for different types of computations. In this section, we will look at how to work with the shapes of arrays.

To get the shape of an array, use the shape attribute.

arr = np.random.rand(2,2)
print(arr)
OUTPUT:
[[0.19890857 0.00806693]
 [0.48199837 0.55373954]]

print(arr.shape)
OUTPUT: (2, 2)

To reshape an array, use the reshape() method.

print(arr.reshape(1,4))
OUTPUT: [[0.19890857 0.00806693 0.48199837 0.55373954]]

print(arr.reshape(4,1))
OUTPUT:
[[0.19890857]
 [0.00806693]
 [0.48199837]
 [0.55373954]]

reshape() returns a new array object and leaves the shape of 'arr' unchanged, so to keep the new shape you have to assign the result back to the 'arr' variable. Also, reshape only works if the total number of elements stays the same. You cannot reshape a 2x2 array (4 elements) into a 3x1 array (3 elements).
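The following sketch walks through these rules on a small array: reassigning to keep the shape, letting NumPy infer one dimension with -1 (a reshape feature not mentioned above), and the ValueError raised when the element counts do not match.

```python
import numpy as np

arr = np.arange(6)         # [0 1 2 3 4 5]
arr = arr.reshape(2, 3)    # reassign to keep the new shape
print(arr.shape)           # (2, 3)

auto = arr.reshape(-1, 2)  # -1 lets NumPy infer that dimension
print(auto.shape)          # (3, 2)

try:
    arr.reshape(4, 2)      # 6 elements cannot fill 8 slots
except ValueError as err:
    print("reshape failed:", err)
```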

Slicing Data

Let's look at fetching data from NumPy arrays. For indexing and slicing, NumPy arrays use the same syntax as Python lists.

To slice an array:

myarr = np.arange(0,11)
print(myarr)
OUTPUT: [ 0  1  2  3  4  5  6  7  8  9 10]

sliced = myarr[0:5]
print(sliced)
OUTPUT: [0 1 2 3 4]

sliced[:] = 99
print(sliced)
OUTPUT: [99 99 99 99 99]

print(myarr)
OUTPUT: [99 99 99 99 99  5  6  7  8  9 10]

In the above example, even though we assigned the slice of "myarr" to the variable "sliced", changing the values of "sliced" also changes the original array. This is because a slice is a view: "sliced" points at the same memory as the original array instead of holding its own copy of the data. This differs from Python lists, where slicing creates a new list.

To make an independent copy of a section of an array, use the copy() method.

sliced = myarr[0:5].copy() # copies only the five sliced elements
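If you are unsure whether two arrays share data, np.shares_memory() can tell you. A minimal sketch contrasting a view with a copy (the names view and independent are illustrative):

```python
import numpy as np

myarr = np.arange(0, 11)

view = myarr[0:5]                # a view: shares memory with myarr
independent = myarr[0:5].copy()  # a copy: owns its own data

print(np.shares_memory(myarr, view))         # True
print(np.shares_memory(myarr, independent))  # False

independent[:] = 99
print(myarr[:5])  # [0 1 2 3 4], the original is untouched
```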

Slicing multi-dimensional arrays works similarly to slicing one-dimensional arrays.

my_matrix = np.random.randint(1,30,(3,3))
print(my_matrix)
OUTPUT:
[[21  1 20]
 [22 16 27]
 [24 14 22]]

print(my_matrix[0]) # print a single row
OUTPUT: [21  1 20]

print(my_matrix[0][0]) # print a single value at row 0, column 0
OUTPUT: 21

print(my_matrix[0,0]) # alternate way to print the value at row 0, column 0
OUTPUT: 21
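The comma notation also accepts a slice per dimension, which is how you pull out columns or sub-matrices. A short sketch on a fixed 3x3 array (used here instead of random values so the results are predictable):

```python
import numpy as np

grid = np.arange(1, 10).reshape(3, 3)
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]

print(grid[:, 0])     # first column: [1 4 7]
print(grid[0:2, 1:])  # rows 0-1, columns 1-2
print(grid[-1, -1])   # last row, last column: 9
```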

Array Computations

Now let's look at array computations. NumPy is known for its speed when performing complex computations on large multi-dimensional arrays.

Let's try a few basic operations. Each one is applied element by element.

new_arr = np.arange(1,11)
print(new_arr)
OUTPUT: [ 1  2  3  4  5  6  7  8  9 10]

Addition

print(new_arr + 5)
OUTPUT: [ 6  7  8  9 10 11 12 13 14 15]

Subtraction

print(new_arr - 5)
OUTPUT: [-4 -3 -2 -1  0  1  2  3  4  5]

Array Addition

print(new_arr + new_arr)
OUTPUT: [ 2  4  6  8 10 12 14 16 18 20]

Array Division

print(new_arr / new_arr)
OUTPUT: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

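Dividing by zero does not raise an exception in NumPy. Instead, NumPy issues a RuntimeWarning and fills in a special value: 0/0 becomes NaN (Not a Number), while a non-zero value divided by zero becomes inf or -inf.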
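Here is a small sketch of that behavior. It uses np.errstate() to silence the warning for this one block; the array contents are arbitrary examples.

```python
import numpy as np

numerator = np.array([1.0, 0.0, -1.0])
denominator = np.zeros(3)

# np.errstate silences the RuntimeWarning inside this block only
with np.errstate(divide="ignore", invalid="ignore"):
    result = numerator / denominator

print(result)            # [ inf  nan -inf]
print(np.isnan(result))  # [False  True False]
```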

There are also a few built-in functions in NumPy to calculate values like the mean, standard deviation, variance, etc.

Sum: np.sum()
Square Root: np.sqrt()
Mean: np.mean()
Variance: np.var()
Standard Deviation: np.std()
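To see these functions side by side, here is a minimal sketch on a four-element example array. Note that np.var() and np.std() compute the population versions by default; passing ddof=1 gives the sample versions.

```python
import numpy as np

values = np.array([1, 4, 9, 16])

print(np.sum(values))          # 30
print(np.sqrt(values))         # [1. 2. 3. 4.]
print(np.mean(values))         # 7.5
print(np.var(values))          # 32.25 (population variance)
print(np.std(values))          # square root of the variance
print(np.var(values, ddof=1))  # 43.0 (sample variance)
```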

While working with 2D arrays, you will often need to calculate the row-wise or column-wise sum, mean, variance, etc. You can use the optional axis parameter for this: axis=0 gives one result per column, and axis=1 gives one result per row.

arr2d = np.arange(25).reshape(5,5)
print(arr2d)
OUTPUT:
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]

print(arr2d.sum())
OUTPUT: 300

print(arr2d.sum(axis=0)) # sum of columns
OUTPUT: [50 55 60 65 70]

print(arr2d.sum(axis=1)) # sum of rows
OUTPUT: [ 10  35  60  85 110]
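The axis parameter works the same way for the other aggregation methods. A short sketch with mean() on a smaller 2x3 array; the keepdims argument, which keeps the reduced axis as a length-1 dimension, is an extra not covered above.

```python
import numpy as np

arr2d = np.arange(6).reshape(2, 3)
# [[0 1 2]
#  [3 4 5]]

col_means = arr2d.mean(axis=0)  # one value per column
row_means = arr2d.mean(axis=1)  # one value per row
print(col_means)  # [1.5 2.5 3.5]
print(row_means)  # [1. 4.]

# keepdims=True keeps the result two-dimensional
row_sums = arr2d.sum(axis=1, keepdims=True)
print(row_sums.shape)  # (2, 1)
```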

Conditional Operations

You can also do conditional filtering in NumPy using the square bracket notation. A comparison such as arr > 4 produces an array of booleans, and indexing with that array keeps only the elements where it is True. Here is an example.

arr = np.arange(0,10)
print(arr)
OUTPUT: [0 1 2 3 4 5 6 7 8 9]

print(arr > 4)
OUTPUT: [False False False False False  True  True  True  True  True]

print(arr[arr > 4])
OUTPUT: [5 6 7 8 9]
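Boolean masks can be combined and used in a few other ways. The sketch below is illustrative and goes slightly beyond the example above: it combines two conditions, uses np.where() to choose between two values, and assigns through a mask.

```python
import numpy as np

arr = np.arange(0, 10)

# Combine conditions with & (and) or | (or); the parentheses are required
middle = arr[(arr > 2) & (arr < 7)]
print(middle)  # [3 4 5 6]

# np.where picks between two values element by element
labels = np.where(arr % 2 == 0, "even", "odd")
print(labels[:4])  # ['even' 'odd' 'even' 'odd']

# A mask on the left side of an assignment updates only matching elements
arr[arr > 7] = 0
print(arr)  # [0 1 2 3 4 5 6 7 0 0]
```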

Summary

When it comes to working with large datasets, NumPy is a powerful tool to have in your toolkit. It is capable of handling advanced numeric computations and complex n-dimensional array operations. I highly recommend learning NumPy if you plan to start a career in machine learning.

Here is a Google Colab notebook if you want to try out these examples.


Translated from: https://medium.com/manishmshiva/numpy-crash-course-building-powerful-n-dimensional-arrays-810edc87dcc7