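I have already written up notes on the official NumPy introduction and its data types; see NumPy Basics (1. Getting Ready!) if you need them.
ndarray is NumPy's core class; every array built with NumPy's constructors is an instance of ndarray.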
>>> import numpy as np
# Create an array by passing in a Python list
>>> a = np.array([2,3,4])
>>> a
array([2, 3, 4])
>>> type(a)
<class 'numpy.ndarray'>
# The default integer dtype depends on the platform (e.g. int32 or int64)
>>> a.dtype
dtype('int32')
>>> b = np.array([1.2, 3.5, 5.1])
>>> b.dtype
dtype('float64')
>>> b = np.array([(1.5,2,3), (4,5,6)])
>>> b
array([[ 1.5, 2. , 3. ],
[ 4. , 5. , 6. ]])
>>> c = np.array( [ [1,2], [3,4] ], dtype=complex )
>>> c
array([[ 1.+0.j, 2.+0.j],
[ 3.+0.j, 4.+0.j]])
>>> np.zeros( (3,4) )
array([[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]])
>>> np.ones( (2,3,4), dtype=np.int16 ) # dtype can also be specified
array([[[ 1, 1, 1, 1],
[ 1, 1, 1, 1],
[ 1, 1, 1, 1]],
[[ 1, 1, 1, 1],
[ 1, 1, 1, 1],
[ 1, 1, 1, 1]]], dtype=int16)
>>> np.empty( (2,3) ) # uninitialized, output may vary
array([[ 3.73603959e-262, 6.02658058e-154, 6.55490914e-260],
[ 5.30498948e-313, 3.14673309e-307, 1.00000000e+000]])
>>> np.arange( 10, 30, 5 )
array([10, 15, 20, 25])
>>> np.arange( 0, 2, 0.3 ) # it accepts float arguments
array([ 0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8])
When arange is used with floating point arguments, it is generally not possible to predict the number of elements
obtained, due to the finite floating point precision. For this reason, it is usually better to use the function linspace
that receives as an argument the number of elements that we want, instead of the step:
>>> from numpy import pi
>>> np.linspace( 0, 2, 9 ) # 9 numbers from 0 to 2
array([ 0. , 0.25, 0.5 , 0.75, 1. , 1.25, 1.5 , 1.75, 2. ])
# Sample 100 evenly spaced points from 0 to 2π, then take the sine of each
>>> x = np.linspace( 0, 2*pi, 100 ) # useful to evaluate function at lots of points
>>> f = np.sin(x)
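To make the warning about arange concrete, here is a small sketch (the variable names are my own). The length of an arange result depends on how (stop - start) / step happens to round, whereas linspace returns exactly the number of points you ask for.

```python
import numpy as np

# The length depends on floating point rounding of (stop - start) / step,
# so it may differ from what the arithmetic on paper suggests.
stepped = np.arange(1, 1.3, 0.1)
print(len(stepped), stepped)

# linspace takes the number of points directly, so the length is guaranteed.
spaced = np.linspace(1, 1.3, 4)
print(len(spaced), spaced)
```

Run it and compare the two lengths on your own machine.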
reshape changes an array's dimensions; it raises a ValueError if the element count does not match the requested shape.
>>> a = np.arange(6) # 1d array
>>> print(a)
[0 1 2 3 4 5]
>>>
>>> b = np.arange(12).reshape(4,3) # 2d array
>>> print(b)
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]
>>>
>>> c = np.arange(24).reshape(2,3,4) # 3d array
>>> print(c)
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
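The ValueError mentioned for reshape can be seen with a minimal sketch like the one below. The shapes are chosen only for illustration: 6 elements cannot fill a 4 x 3 grid.

```python
import numpy as np

a = np.arange(6)  # 6 elements

try:
    a.reshape(4, 3)  # would need 4 * 3 = 12 elements
except ValueError as exc:
    error_message = str(exc)
    print("reshape failed:", error_message)

# A compatible shape works, and the original array is left untouched.
ok = a.reshape(2, 3)
print(ok.shape, a.shape)
```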
If an array is too large to print, NumPy automatically skips the central part and prints only the corners:
>>> print(np.arange(10000))
[ 0 1 2 ..., 9997 9998 9999]
>>>
>>> print(np.arange(10000).reshape(100,100))
[[ 0 1 2 ..., 97 98 99]
[ 100 101 102 ..., 197 198 199]
[ 200 201 202 ..., 297 298 299]
...,
[9700 9701 9702 ..., 9797 9798 9799]
[9800 9801 9802 ..., 9897 9898 9899]
[9900 9901 9902 ..., 9997 9998 9999]]
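If you do want the full output, the summarization can be switched off through the print options. The sketch below uses the np.printoptions context manager so the change is only temporary; the array size of 2000 is arbitrary, chosen to exceed the default threshold of 1000 elements.

```python
import sys
import numpy as np

big = np.arange(2000)

# With default options, arrays above the threshold are summarized with "...".
summarized = np.array2string(big)

# Raise the threshold temporarily to force the complete representation.
with np.printoptions(threshold=sys.maxsize):
    full = np.array2string(big)

print(len(summarized), len(full))
```

np.set_printoptions(threshold=sys.maxsize) makes the same change globally.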
Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.
>>> a = np.array( [20,30,40,50] )
>>> b = np.arange( 4 )
>>> b
array([0, 1, 2, 3])
# Array subtraction
>>> c = a-b
>>> c
array([20, 29, 38, 47])
# Elementwise square
>>> b**2
array([0, 1, 4, 9])
# Elementwise sine
>>> 10*np.sin(a)
array([ 9.12945251, -9.88031624, 7.4511316 , -2.62374854])
# Elementwise comparison
>>> a<35
array([ True, True, False, False])
Unlike in many matrix languages, the product operator * operates elementwise in NumPy arrays. The matrix product
can be performed using the @ operator (in python >=3.5) or the dot function or method:
>>> A = np.array( [[1,1],
... [0,1]] )
>>> B = np.array( [[2,0],
... [3,4]] )
# Elementwise product
>>> A * B # elementwise product
array([[2, 0],
[0, 4]])
# Matrix product
>>> A @ B # matrix product
array([[5, 4],
[3, 4]])
# Matrix product, second way
>>> A.dot(B) # another matrix product
array([[5, 4],
[3, 4]])
Some operations, such as += and *=, act in place to modify an existing array rather than create a new one.
>>> a = np.ones((2,3), dtype=int)
>>> b = np.random.random((2,3))
>>> a *= 3
>>> a
array([[3, 3, 3],
[3, 3, 3]])
>>> b += a
>>> b
array([[ 3.417022 , 3.72032449, 3.00011437],
[ 3.30233257, 3.14675589, 3.09233859]])
>>> a += b # b is not automatically converted to integer type
Traceback (most recent call last):
...
TypeError: Cannot cast ufunc add output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
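There are two common ways around the in-place casting error, sketched below with made-up values. Either let NumPy allocate a new float array, or cast explicitly so that the truncation is visible in the code.

```python
import numpy as np

a = np.full((2, 3), 3, dtype=int)
b = np.full((2, 3), 0.5)

# Option 1: a regular addition allocates a new float64 array.
c = a + b

# Option 2: convert b explicitly; the fractional part is dropped
# before the in-place update, so a keeps its integer dtype.
a += b.astype(a.dtype)

print(c.dtype, a.dtype)
```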
When operating with arrays of different types, the type of the resulting array corresponds to the more general or precise
one (a behavior known as upcasting).
# int32 + float64 -> float64
>>> a = np.ones(3, dtype=np.int32)
>>> b = np.linspace(0,pi,3)
>>> b.dtype.name
'float64'
>>> c = a+b
>>> c
array([ 1. , 2.57079633, 4.14159265])
>>> c.dtype.name
'float64'
# Multiplying by 1j upcasts float64 to complex128
>>> d = np.exp(c*1j)
>>> d
array([ 0.54030231+0.84147098j, -0.84147098+0.54030231j,
-0.54030231-0.84147098j])
>>> d.dtype.name
'complex128'
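The result type can also be checked without building any arrays. A small sketch using np.result_type, which reports the dtype NumPy would pick when combining its arguments:

```python
import numpy as np

# Integer combined with float promotes to float.
int_float = np.result_type(np.int32, np.float64)

# Float combined with complex promotes to a complex type wide enough for both.
float_complex = np.result_type(np.float64, np.complex64)

# The same rule applies when a float array meets the Python scalar 1j.
scaled = np.ones(3) * 1j

print(int_float, float_complex, scaled.dtype)
```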
Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of
the ndarray class.
>>> a = np.random.random((2,3))
>>> a
array([[ 0.18626021, 0.34556073, 0.39676747],
[ 0.53881673, 0.41919451, 0.6852195 ]])
# Sum of all elements
>>> a.sum()
2.5718191614547998
# Minimum over all elements
>>> a.min()
0.1862602113776709
>>> a.max()
0.6852195003967595
By default, these operations apply to the array as though it were a list of numbers, regardless of its shape. However,
by specifying the axis parameter you can apply an operation along the specified axis of an array:
# 3 rows, 4 columns
>>> b = np.arange(12).reshape(3,4)
>>> b
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>>
# Reduce down each column (axis=0)
>>> b.sum(axis=0) # sum of each column
array([12, 15, 18, 21])
>>>
# Reduce across each row (axis=1)
>>> b.min(axis=1) # min of each row
array([0, 4, 8])
>>>
>>> b.cumsum(axis=1) # cumulative sum along each row
array([[ 0, 1, 3, 6],
[ 4, 9, 15, 22],
[ 8, 17, 27, 38]])
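One way to remember what axis does: a reduction removes the axis you name from the result's shape. The sketch below checks this on a 3D array; the shape (2, 3, 4) is just an example.

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)

# Reducing along an axis removes that axis from the shape.
shape_axis0 = c.sum(axis=0).shape  # (3, 4)
shape_axis1 = c.sum(axis=1).shape  # (2, 4)
shape_axis2 = c.sum(axis=2).shape  # (2, 3)

# With no axis given, every element is reduced to a single value.
total = c.sum()  # 0 + 1 + ... + 23 = 276

print(shape_axis0, shape_axis1, shape_axis2, total)
```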
NumPy provides familiar mathematical functions such as sin, cos, and exp. In NumPy, these are called “universal
functions” (ufunc). Within NumPy, these functions operate elementwise on an array, producing an array as output.
# Create the array [0, 1, 2]
>>> B = np.arange(3)
>>> B
array([0, 1, 2])
# exp (e raised to the power of each element) applied elementwise
>>> np.exp(B)
array([ 1. , 2.71828183, 7.3890561 ])
# Square root of each element of B
>>> np.sqrt(B)
array([ 0. , 1. , 1.41421356])
>>> C = np.array([2., -1., 4.])
# Elementwise sum of two arrays
>>> np.add(B, C)
array([ 2., 0., 6.])
Other commonly used functions:
all, any, apply_along_axis, argmax, argmin, argsort, average, bincount, ceil, clip, conj,
corrcoef, cov, cross, cumprod, cumsum, diff, dot, floor, inner, inv, lexsort, max, maximum,
mean, median, min, minimum, nonzero, outer, prod, re, round, sort, std, sum, trace, transpose,
var, vdot, vectorize, where
One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.
# Build an array of 0 through 9 and cube each element
>>> a = np.arange(10)**3
>>> a
array([ 0, 1, 8, 27, 64, 125, 216, 343, 512, 729])
# Indexing
>>> a[2]
8
# Slicing: take positions 2 up to 5 (start included, end excluded)
>>> a[2:5]
array([ 8, 27, 64])
# [start:stop:step] = value assigns to every selected element
>>> a[:6:2] = -1000 # equivalent to a[0:6:2] = -1000; from start to position 6, exclusive, set every 2nd element to -1000
>>> a
array([-1000, 1, -1000, 27, -1000, 125, 216, 343, 512, 729])
# Reverse
>>> a[ : :-1] # reversed a
array([ 729, 512, 343, 216, 125, -1000, 27, -1000, 1, -1000])
# Arrays are iterable
>>> for i in a:
... print(i**(1/3.))
...
nan
1.0
nan
3.0
nan
5.0
6.0
7.0
8.0
9.0
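The nan entries above come from raising the negative value -1000 to a fractional power, which has no real result. If a real cube root is what you want, np.cbrt handles negative inputs. This is a side note of my own, not part of the original walkthrough.

```python
import numpy as np

values = np.array([-1000.0, 1.0, 27.0, 125.0])

# np.cbrt computes the real cube root elementwise, including for negatives.
roots = np.cbrt(values)
print(roots)
```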
Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas:
# Define f to return (x index * 10 + y index)
>>> def f(x,y):
... return 10*x+y
...
# Use fromfunction to build an int array of shape (5,4) whose elements are given by f
>>> b = np.fromfunction(f,(5,4),dtype=int)
>>> b
array([[ 0, 1, 2, 3],
[10, 11, 12, 13],
[20, 21, 22, 23],
[30, 31, 32, 33],
[40, 41, 42, 43]])
# Look up a value by its (row, column) indices, counting from 0
>>> b[2,3]
23
# Rows 0 through 4, column index 1
>>> b[0:5, 1] # each row in the second column of b
array([ 1, 11, 21, 31, 41])
# All rows, column index 1; same result as the previous example
>>> b[ : ,1] # equivalent to the previous example
array([ 1, 11, 21, 31, 41])
# Rows 1 and 2 (index 3 is excluded), all columns
>>> b[1:3, : ] # each column in the second and third row of b
array([[10, 11, 12, 13],
[20, 21, 22, 23]])
When fewer indices are provided than the number of axes, the missing indices are considered complete slices:
>>> b[-1] # the last row. Equivalent to b[-1,:]
array([40, 41, 42, 43])
The expression within brackets in b[i] is treated as an i followed by as many instances of : as needed to represent
the remaining axes. NumPy also allows you to write this using dots as b[i,...].
The dots (...) represent as many colons as needed to produce a complete indexing tuple. For example, if x is an
array with 5 axes, then
• x[1,2,...] is equivalent to x[1,2,:,:,:],
• x[...,3] to x[:,:,:,:,3] and
• x[4,...,5,:] to x[4,:,:,5,:].
>>> c = np.array( [[[ 0, 1, 2], # a 3D array (two stacked 2D arrays)
... [ 10, 12, 13]],
... [[100,101,102],
... [110,112,113]]])
# A 3-dimensional array
>>> c.shape
(2, 2, 3)
# Index 1 along the first axis
>>> c[1,...] # same as c[1,:,:] or c[1]
array([[100, 101, 102],
[110, 112, 113]])
# Index 2 along the last axis
>>> c[...,2] # same as c[:,:,2]
array([[ 2, 13],
[102, 113]])
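The three equivalences listed for a 5-axis array can be checked directly. In this sketch the shape (5, 3, 2, 6, 4) is arbitrary, chosen only so that every index used is in range.

```python
import numpy as np

x = np.arange(5 * 3 * 2 * 6 * 4).reshape(5, 3, 2, 6, 4)

# Each pair indexes the same elements; the dots stand in for the missing colons.
same_leading = np.array_equal(x[1, 2, ...], x[1, 2, :, :, :])
same_trailing = np.array_equal(x[..., 3], x[:, :, :, :, 3])
same_middle = np.array_equal(x[4, ..., 5, :], x[4, :, :, 5, :])

print(same_leading, same_trailing, same_middle)
```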
Iterating over multidimensional arrays is done with respect to the first axis:
>>> for row in b:
... print(row)
...
[0 1 2 3]
[10 11 12 13]
[20 21 22 23]
[30 31 32 33]
[40 41 42 43]
However, if one wants to perform an operation on each element in the array, one can use the flat attribute which is
an iterator over all the elements of the array:
>>> for element in b.flat:
... print(element)
...
0
1
2
3
10
11
12
13
20
21
22
23
30
31
32
33
40
41
42
43
An array has a shape given by the number of elements along each axis:
>>> a = np.floor(10*np.random.random((3,4)))
>>> a
array([[ 2., 8., 0., 6.],
[ 4., 5., 1., 1.],
[ 8., 9., 3., 6.]])
# Inspect the shape attribute
>>> a.shape
(3, 4)
The shape of an array can be changed with various commands. Note that the following three commands all return a
modified array, but do not change the original array:
>>> a.ravel() # returns the array, flattened
array([ 2., 8., 0., 6., 4., 5., 1., 1., 8., 9., 3., 6.])
>>> a.reshape(6,2) # returns the array with a modified shape
array([[ 2., 8.],
[ 0., 6.],
[ 4., 5.],
[ 1., 1.],
[ 8., 9.],
[ 3., 6.]])
>>> a.T # returns the array, transposed
array([[ 2., 4., 8.],
[ 8., 5., 9.],
[ 0., 1., 3.],
[ 6., 1., 6.]])
>>> a.T.shape
(4, 3)
>>> a.shape
(3, 4)
>>> b = np.array([1,2,3])
>>> b
array([1, 2, 3])
>>> b.T
array([1, 2, 3])
# Transposing a 1D array leaves it unchanged; add an axis first:
>>> c = b[np.newaxis,]
>>> c
array([[1, 2, 3]])
>>> c.T
array([[1],
[2],
[3]])
The order of the elements in the array resulting from ravel() is normally “C-style”, that is, the rightmost index “changes
the fastest”, so the element after a[0,0] is a[0,1]. If the array is reshaped to some other shape, again the array is treated
as “C-style”. NumPy normally creates arrays stored in this order, so ravel() will usually not need to copy its argument,
but if the array was made by taking slices of another array or created with unusual options, it may need to be copied.
The functions ravel() and reshape() can also be instructed, using an optional argument, to use FORTRAN-style arrays,
in which the leftmost index changes the fastest.
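The optional argument mentioned here is order. A small sketch of the difference between the default C-style order and order="F" on a 2 x 3 example array:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

# C-style: the rightmost index changes fastest, so rows are read one after another.
c_order = a.ravel()           # [0, 1, 2, 3, 4, 5]

# Fortran-style: the leftmost index changes fastest, so columns are read first.
f_order = a.ravel(order="F")  # [0, 3, 1, 4, 2, 5]

# reshape accepts the same argument and fills column by column.
reshaped_f = np.arange(6).reshape((2, 3), order="F")  # [[0, 2, 4], [1, 3, 5]]

print(c_order, f_order, reshaped_f, sep="\n")
```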
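The reshape function returns its argument with a modified shape, whereas the ndarray.resize method modifies the array itself:
>>> a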
array([[ 2., 8., 0., 6.],
[ 4., 5., 1., 1.],
[ 8., 9., 3., 6.]])
>>> a.resize((2,6))
>>> a
array([[ 2., 8., 0., 6., 4., 5.],
[ 1., 1., 8., 9., 3., 6.]])
# With -1, the column count is inferred automatically from the 3 rows
>>> a.reshape(3,-1)
array([[ 2., 8., 0., 6.],
[ 4., 5., 1., 1.],
[ 8., 9., 3., 6.]])
See also:
ndarray.shape, reshape, resize, ravel
Arrays can be extended by stacking them together:
>>> a = np.floor(10*np.random.random((2,2)))
>>> a
array([[ 8., 8.],
[ 0., 0.]])
>>> b = np.floor(10*np.random.random((2,2)))
>>> b
array([[ 1., 8.],
[ 0., 4.]])
>>> np.vstack((a,b))
array([[ 8., 8.],
[ 0., 0.],
[ 1., 8.],
[ 0., 4.]])
>>> np.hstack((a,b))
array([[ 8., 8., 1., 8.],
[ 0., 0., 0., 4.]])
The function column_stack stacks 1D arrays as columns into a 2D array. It is equivalent to hstack only for 2D
arrays:
>>> from numpy import newaxis
>>> np.column_stack((a,b)) # with 2D arrays
array([[ 8., 8., 1., 8.],
[ 0., 0., 0., 4.]])
>>> a = np.array([4.,2.])
>>> b = np.array([3.,8.])
>>> np.column_stack((a,b)) # returns a 2D array
array([[ 4., 3.],
[ 2., 8.]])
>>> np.hstack((a,b)) # the result is different
array([ 4., 2., 3., 8.])
>>> a[:,newaxis] # this gives a 2D column vector
array([[ 4.],
[ 2.]])
>>> np.column_stack((a[:,newaxis],b[:,newaxis]))
array([[ 4., 3.],
[ 2., 8.]])
>>> np.hstack((a[:,newaxis],b[:,newaxis])) # the result is the same
array([[ 4., 3.],
[ 2., 8.]])
# For higher-dimensional arrays you can also use concatenate; axis selects the dimension to stack along
>>> np.concatenate((a,b), axis=0)
array([ 4., 2., 3., 8.])
On the other hand, the function row_stack is equivalent to vstack for any input arrays. In general, for arrays
with more than two dimensions, hstack stacks along their second axes, vstack stacks along their first axes, and
concatenate allows for an optional argument giving the number of the axis along which the concatenation should
happen.
In complex cases, r_ and c_ are useful for creating arrays by stacking numbers along one axis. They allow the use of
range literals (“:”):
>>> np.r_[1:4,0,4]
array([1, 2, 3, 0, 4])
When used with arrays as arguments, r_ and c_ are similar to vstack and hstack in their default behavior, but
allow for an optional argument giving the number of the axis along which to concatenate.
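A short sketch of r_ and c_ with array arguments, using two example 1D arrays of my own: r_ joins them end to end, while c_ places them side by side as columns.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# r_ concatenates along the first axis: one longer 1D array.
joined = np.r_[a, b]

# c_ stacks 1D inputs as columns of a 2D array.
columns = np.c_[a, b]

print(joined)
print(columns)
```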
See also:
hstack, vstack, column_stack, concatenate, c_, r_
Using hsplit, you can split an array along its horizontal axis, either by specifying the number of equally shaped
arrays to return, or by specifying the columns after which the division should occur:
>>> a = np.floor(10*np.random.random((2,12)))
>>> a
array([[ 9., 5., 6., 3., 6., 8., 0., 7., 9., 7., 2., 7.],
[ 1., 4., 9., 2., 2., 1., 0., 6., 2., 2., 4., 0.]])
# Split the 2x12 array a into 3 pieces of 4 columns each
>>> np.hsplit(a,3) # Split a into 3
[array([[ 9., 5., 6., 3.],
[ 1., 4., 9., 2.]]), array([[ 6., 8., 0., 7.],
[ 2., 1., 0., 6.]]), array([[ 9., 7., 2., 7.],
[ 2., 2., 4., 0.]])]
# Cut once after the third column and once after the fourth
>>> np.hsplit(a,(3,4)) # Split a after the third and the fourth column
[array([[ 9., 5., 6.],
[ 1., 4., 9.]]), array([[ 3.],
[ 2.]]), array([[ 6., 8., 0., 7., 9., 7., 2., 7.],
[ 2., 1., 0., 6., 2., 2., 4., 0.]])]
vsplit splits along the vertical axis, and array_split allows one to specify along which axis to split.
>>> a
array([[ 3, 4, 5, 6],
[ 7, 8, 9, 10],
[11, 12, 13, 14]])
# split requires an equal division, so every piece has the same shape
>>> np.split(a,2,axis=1)
[array([[ 3, 4],
[ 7, 8],
[11, 12]]),
array([[ 5, 6],
[ 9, 10],
[13, 14]])]
# array_split allows pieces with different shapes
>>> np.array_split(a,2,axis=0)
[array([[ 3, 4, 5, 6],
[ 7, 8, 9, 10]]),
array([[11, 12, 13, 14]])]
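vsplit is mentioned above but not shown. Here is a minimal sketch on a 4 x 3 example array, next to array_split for a row count that does not divide evenly.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

# vsplit cuts along the vertical axis (rows); 4 rows split evenly into 2 pieces.
top, bottom = np.vsplit(a, 2)

# array_split tolerates an uneven division: 4 rows into 3 pieces.
pieces = np.array_split(a, 3, axis=0)
piece_shapes = [p.shape for p in pieces]

print(top.shape, bottom.shape, piece_shapes)
```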
When operating and manipulating arrays, their data is sometimes copied into a new array and sometimes not. This is
often a source of confusion for beginners. There are three cases:
5.1.1 Assignment makes no copy
>>> a = np.arange(12)
>>> b = a # no new object is created
>>> b is a # a and b are two names for the same ndarray object
True
>>> b.shape = 3,4 # changes the shape of a
>>> a.shape
(3, 4)
5.1.2 Function calls make no copy, because Python passes mutable objects by reference
>>> def f(x):
... print(id(x))
...
>>> id(a) # id is a unique identifier of an object
148293216
>>> f(a)
148293216
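A practical consequence of passing by reference is that a function can modify the caller's array. The sketch below uses a hypothetical helper, zero_first, to show the effect, and how copy() protects the original.

```python
import numpy as np

def zero_first(x):
    """Hypothetical helper: overwrites the first element in place."""
    x[0] = 0

a = np.arange(1, 4)   # [1, 2, 3]
zero_first(a)         # no copy is made, so a itself changes

b = np.arange(1, 4)
zero_first(b.copy())  # the function only sees a copy; b is unchanged

print(a, b)
```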
Views: the shape is independent, but the data is shared
Different array objects can share the same data. The view method creates a new array object that looks at the same data.
>>> c = a.view()
>>> c is a
False
>>> c.base is a # c is a view of the data owned by a
True
>>> c.flags.owndata
False
>>>
# Changing c's shape does not change a's shape
>>> c.shape = 2,6 # a's shape doesn't change
>>> a.shape
(3, 4)
>>> c.shape
(2, 6)
# Changing c's data does change a's data
>>> c[0,4] = 1234 # a's data changes
>>> a
array([[ 0, 1, 2, 3],
[1234, 5, 6, 7],
[ 8, 9, 10, 11]])
Slicing an array also returns a view of it:
>>> s = a[ : , 1:3] # spaces added for clarity; could also be written "s = a[:,1:3]"
>>> s[:] = 10 # s[:] is a view of s. Note the difference between s=10 and s[:]=10
>>> a
array([[ 0, 10, 10, 3],
[1234, 10, 10, 7],
[ 8, 10, 10, 11]])
A deep copy simply duplicates the data. The copy method makes a complete copy of the array and its data.
# Copy the object with the copy method
>>> d = a.copy() # a new array object with new data is created
# The new object is a different object
>>> d is a
False
# Its data does not come from a
>>> d.base is a # d doesn't share anything with a
False
# Modifying d's data leaves a unchanged
>>> d[0,0] = 9999
>>> a
array([[ 0, 10, 10, 3],
[1234, 10, 10, 7],
[ 8, 10, 10, 11]])
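When it is unclear whether two arrays share data, np.shares_memory gives a direct answer. A small sketch contrasting a slice (a view) with a copy, on an example array of my own:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

view = a[:, 1:3]            # slicing returns a view
copied = a[:, 1:3].copy()   # copy() duplicates the data

shares_with_view = np.shares_memory(a, view)
shares_with_copy = np.shares_memory(a, copied)

# Writing through the view changes a; writing to the copy does not.
view[0, 0] = -1
copied[0, 1] = -2

print(shares_with_view, shares_with_copy)
print(a[0])
```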
Sometimes copy should be called after slicing if the original array is not required anymore: keeping only the data
you still need saves memory. For example, suppose a is a huge intermediate result and the final result b only
contains a small fraction of a; a deep copy should be made when constructing b with slicing:
>>> a = np.arange(int(1e8))
>>> b = a[:100].copy()
>>> del a # the memory of ``a`` can be released.
Array Creation: arange, array, copy, empty, empty_like, eye, fromfile, fromfunction,
identity, linspace, logspace, mgrid, ogrid, ones, ones_like, r_, zeros, zeros_like
Conversions: ndarray.astype, atleast_1d, atleast_2d, atleast_3d, mat
Manipulations: array_split, column_stack, concatenate, diagonal, dsplit, dstack, hsplit,
hstack, ndarray.item, newaxis, ravel, repeat, reshape, resize, squeeze, swapaxes,
take, transpose, vsplit, vstack
Questions: all, any, nonzero, where
Ordering: argmax, argmin, argsort, max, min, ptp, searchsorted, sort
Operations: choose, compress, cumprod, cumsum, inner, ndarray.fill, imag, prod, put, putmask,
real, sum
Basic Statistics: cov, mean, std, var
Basic Linear Algebra: cross, dot, outer, linalg.svd, vdot
That covers most of the basics. There are still some fancier tricks, which I will add in a later update.