Getting to Know NumPy

1. What is NumPy?

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices; masked arrays are arrays that may have missing or invalid entries), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

At the core of the NumPy package is the ndarray object. This encapsulates n-dimensional arrays of homogeneous data types, with many operations being performed in compiled code for performance. There are several important differences between NumPy arrays and the standard Python sequences:

1. NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray will create a new array and delete the original.

2. The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory. The exception: one can have arrays of (Python, including NumPy) objects, thereby allowing for arrays of different sized elements. (Note: I do not yet fully understand this exception.)

3. NumPy arrays facilitate advanced mathematical and other types of operations on large numbers of data. Typically, such operations are executed more efficiently and with less code than is possible using Python’s built-in sequences.

4. A growing plethora of scientific and mathematical Python-based packages are using NumPy arrays; though these typically support Python-sequence input, they convert such input to NumPy arrays prior to processing, and they often output NumPy arrays. In other words, in order to efficiently use much (perhaps even most) of today’s scientific/mathematical Python-based software, just knowing how to use Python’s built-in sequence types is insufficient - one also needs to know how to use NumPy arrays.
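The first two differences in the list above can be checked directly. The following is a minimal sketch, not taken from the original notes; the variable names (`grown`, `mixed`, `objs`) are chosen here purely for illustration. It also touches on the "exception" from point 2: an array with `dtype=object` holds references to Python objects, so the objects themselves may differ in type and size.

```python
import numpy as np

# Point 1: an ndarray has a fixed size; "growing" it builds a new array.
a = np.array([1, 2, 3])
grown = np.append(a, 4)        # returns a new array; `a` is left untouched
print(a.shape, grown.shape)    # (3,) (4,)
print(grown is a)              # False

# Point 2: every element shares one dtype, and therefore one size in memory.
print(a.dtype, a.itemsize)     # platform dependent, e.g. int64 and 8 bytes
mixed = np.array([1, 2.5])     # the int is upcast so the dtype stays uniform
print(mixed.dtype)             # float64

# The exception: dtype=object stores references to arbitrary Python objects,
# so the elements may be of different types and sizes.
objs = np.array([1, "two", 3.0], dtype=object)
print(objs.dtype)                          # object
print([type(x).__name__ for x in objs])    # ['int', 'str', 'float']
```

Object arrays give up most of the speed advantage, since operations on them fall back to calling Python-level methods on each element.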

The points about sequence size and speed are particularly important in scientific computing. As a simple example, consider the case of multiplying each element in a 1-D sequence with the corresponding element in another sequence of the same length. If the data are stored in two Python lists, a and b, we could iterate over each element:

a = [1, 2, 3]
b = [4, 5, 6]
c = []
# Multiply the two lists element by element
for i in range(len(a)):
    c.append(a[i] * b[i])
print(c)  # [4, 10, 18]

This produces the correct answer, but if a and b each contain millions of numbers, we will pay the price for the inefficiencies of looping in Python. We could accomplish the same task much more quickly in C with an equivalent loop, but the C code is more complex: it also needs variable declarations and initializations, memory allocation, and so on.

NumPy gives us the best of both worlds: element-by-element operations are the “default mode” when an ndarray is involved, but the element-by-element operation is speedily executed by pre-compiled C code. In NumPy:

import numpy as np

a = np.array((1, 2, 3), dtype=int)
b = np.array((4, 5, 6), dtype=int)
c = a * b  # element-wise product, no explicit loop
print(a)
print(b)
print(c)  # [ 4 10 18]

This does what the earlier loop does, at near-C speeds, but with the code simplicity we expect from something based on Python. Indeed, the NumPy idiom is even simpler! This last example illustrates two of NumPy’s features which are the basis of much of its power: vectorization and broadcasting.
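To see the speed difference rather than take it on faith, the two approaches can be timed side by side. This is a rough sketch only: the array size `n`, the repeat count, and the helper names `loop_product` and `vectorized_product` are arbitrary choices made for illustration, and the measured times will vary by machine.

```python
import timeit
import numpy as np

n = 100_000  # illustrative size, chosen arbitrarily
a_list = list(range(n))
b_list = list(range(n))
# int64 is requested explicitly so the products cannot overflow a 32-bit default
a_arr = np.arange(n, dtype=np.int64)
b_arr = np.arange(n, dtype=np.int64)

def loop_product():
    # Element-by-element product computed by the Python interpreter
    return [x * y for x, y in zip(a_list, b_list)]

def vectorized_product():
    # Single ndarray expression; the loop runs in compiled code
    return a_arr * b_arr

# Both approaches produce the same values
assert loop_product() == vectorized_product().tolist()

t_loop = timeit.timeit(loop_product, number=10)
t_vec = timeit.timeit(vectorized_product, number=10)
print(f"list loop: {t_loop:.4f} s, ndarray: {t_vec:.4f} s")
```

On typical hardware the ndarray version should come out well ahead, though the exact ratio depends on the array size and the machine.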

2. Why is NumPy fast?

Vectorization describes the absence of any explicit looping, indexing, etc., in the code - these things are taking place, of course, just “behind the scenes” in optimized, pre-compiled C code. Vectorized code has many advantages, among which are:

  1. vectorized code is more concise and easier to read
  2. fewer lines of code generally means fewer bugs
  3. the code more closely resembles standard mathematical notation (making it easier, typically, to correctly code mathematical constructs)
  4. vectorization results in more “Pythonic” code. Without vectorization, our code would be littered with inefficient and difficult-to-read for loops.
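The third advantage in the list above, closeness to mathematical notation, is easiest to see on a small formula. The sketch below computes a weighted Euclidean norm, sqrt(sum_i (w_i * x_i)^2), once with an explicit loop and once vectorized. The formula and the sample values are picked here only for illustration and do not come from the NumPy documentation.

```python
import math
import numpy as np

x = [3.0, 4.0, 12.0]
w = [1.0, 2.0, 3.0]

# Loop version: the formula is buried in indexing and an accumulator
total = 0.0
for i in range(len(x)):
    total += (w[i] * x[i]) ** 2
norm_loop = math.sqrt(total)

# Vectorized version: reads almost like sqrt(sum((w * x)^2))
xa = np.array(x)
wa = np.array(w)
norm_vec = np.sqrt(np.sum((wa * xa) ** 2))

print(norm_loop, norm_vec)  # 37.0 37.0
```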

Broadcasting is the term used to describe the implicit element-by-element behavior of operations; generally speaking, in NumPy all operations, not just arithmetic operations, but logical, bit-wise, functional, etc., behave in this implicit element-by-element fashion, i.e., they broadcast. Moreover, in the example above, a and b could be multidimensional arrays of the same shape, or a scalar and an array, or even two arrays with different shapes, provided that the smaller array is “expandable” to the shape of the larger in such a way that the resulting broadcast is unambiguous. For detailed “rules” of broadcasting see basics.broadcasting.
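The cases mentioned above (a scalar with an array, two arrays of different but compatible shapes, and a non-arithmetic operation) might look like this in practice. It is a small sketch with made-up values, not an exhaustive statement of the broadcasting rules.

```python
import numpy as np

a = np.array([1, 2, 3])

# Scalar and array: the scalar is stretched to the array's shape
print((a * 2).tolist())        # [2, 4, 6]

# Different shapes: (3, 1) combined with (3,) broadcasts to (3, 3)
col = np.array([[10], [20], [30]])
grid = col + a
print(grid.shape)              # (3, 3)
print(grid.tolist())           # [[11, 12, 13], [21, 22, 23], [31, 32, 33]]

# Non-arithmetic operations broadcast too, e.g. a comparison
print((a > 2).tolist())        # [False, False, True]

# Shapes that cannot be expanded unambiguously are rejected
try:
    a + np.array([1, 2])
except ValueError as err:
    print("cannot broadcast:", err)
```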

References:
What is NumPy?