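Using NumPy

What is NumPy

NumPy is the fundamental package for scientific computing with Python. The name stands for "Numerical Python".

It is a library made up of a multidimensional array object and a collection of routines for processing arrays.

It is mostly used to perform numerical operations on large, multidimensional arrays.

Official website: http://www.numpy.org/

Loading NumPy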

import numpy as np

np.__version__

'1.15.2'

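Why use NumPy

Characteristics of the Python list

A list places no constraint on the types of the elements it stores. That makes it flexible, but the flexibility costs performance.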

L = [i for i in range(10)]
L

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

L[5]

L[5] = 100
L

[0, 1, 2, 3, 4, 100, 6, 7, 8, 9]

L[5] = 'wangsicong'
L

[0, 1, 2, 3, 4, 'wangsicong', 6, 7, 8, 9]

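The advantage of a Python list is that the element types are unrestricted, which makes it flexible.
The drawback is that it is inefficient, because the type of every element has to be checked.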

import array
arr = array.array('i', [i for i in range(10)])
arr

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

arr[5]

arr[5] = 100
arr

array('i', [0, 1, 2, 3, 4, 100, 6, 7, 8, 9])

# arr[5] = 'hehe'

TypeError Traceback (most recent call last)
in ()
----> 1 arr[5] = 'hehe'

TypeError: an integer is required (got type str)

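array.array does not treat its contents as a vector or a matrix, so it comes with none of the corresponding operations. NumPy was created to fill that gap.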
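A minimal sketch of that gap, with illustrative variable names: for a list and for array.array, multiplying by an integer repeats the sequence, whereas a NumPy array treats the same expression as elementwise arithmetic.

```python
import array

import numpy as np

values = [1, 2, 3]
typed = array.array('i', values)
vector = np.array(values)

# For list and array.array, "* 2" means repetition, not arithmetic.
repeated_list = values * 2      # [1, 2, 3, 1, 2, 3]
repeated_typed = typed * 2      # array('i', [1, 2, 3, 1, 2, 3])

# For a NumPy array, "* 2" multiplies every element.
scaled = vector * 2             # array([2, 4, 6])

print(repeated_list, repeated_typed, scaled)
```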

Creating a numpy.array

nparr = np.array([i for i in range(10)])
nparr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

nparr[5]

nparr[5] = 100
# nparr[5] = 'hehe' would raise an error
nparr

array([  0,   1,   2,   3,   4, 100,   6,   7,   8,   9])

# dtype is an attribute that reports the element type
nparr.dtype

dtype('int64')

nparr[5] = 3.74
nparr

array([0, 1, 2, 3, 4, 3, 6, 7, 8, 9])

# implicit conversion: 3.74 is stored as 3, the fractional part is simply dropped
nparr.dtype

dtype('int64')
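If the fractional part has to survive, the array needs a floating-point dtype. The sketch below shows two common ways to get one, using variable names chosen for illustration: converting an existing array with astype, or passing dtype when the array is created.

```python
import numpy as np

ints = np.arange(10)
ints[5] = 3.74                  # stored as 3: the array keeps its integer dtype

# Option 1: convert an existing array; astype returns a new array.
floats = ints.astype(float)
floats[5] = 3.74

# Option 2: request the dtype when the array is created.
created = np.array([0, 1, 2], dtype=float)

print(ints[5], floats[5], created.dtype)
```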

nparr2= np.array([1, 2, 3.0])
nparr2

array([1., 2., 3.])

nparr2.dtype

dtype('float64')

Other ways to create a numpy.array

np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

np.zeros(10).dtype

dtype('float64')

np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Passing a tuple as the shape argument

np.zeros((3, 5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

A 3 x 5 matrix of zeros

np.zeros(shape=(3, 5), dtype=int)

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

np.ones(10)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

A 3 x 5 matrix of ones

np.ones((3, 5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

np.full((3, 5), 666)

array([[666, 666, 666, 666, 666],
       [666, 666, 666, 666, 666],
       [666, 666, 666, 666, 666]])

np.full(fill_value=666, shape=(3, 5))

array([[666, 666, 666, 666, 666],
       [666, 666, 666, 666, 666],
       [666, 666, 666, 666, 666]])
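Unlike zeros and ones, which default to float64, full infers its dtype from the fill value when no dtype is given. A short sketch of the three cases, with illustrative variable names:

```python
import numpy as np

int_filled = np.full((3, 5), 666)             # integer fill value: integer dtype
float_filled = np.full((3, 5), 666.0)         # float fill value: float64
forced = np.full((3, 5), 666, dtype=float)    # an explicit dtype overrides the inference

print(int_filled.dtype, float_filled.dtype, forced.dtype)
```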

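The arange function is the counterpart of Python's range()

It takes a start, a stop and a step; the interval is closed on the left and open on the right.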

[i for i in range(0, 20, 2)]

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

# [i for i in range(0, 1, 0.2)]
# raises the error below: range() does not accept floats

TypeError Traceback (most recent call last)
in ()
----> 1 [i for i in range(0, 1, 0.2)]

TypeError: 'float' object cannot be interpreted as an integer

np.arange(0, 1, 0.2)

array([0. , 0.2, 0.4, 0.6, 0.8])

np.arange(0, 10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

[i for i in range(10)]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

linspace

An arithmetic sequence of 10 evenly spaced points from 0 to 20, with both endpoints included

np.linspace(0, 20, 10)

array([ 0.        ,  2.22222222,  4.44444444,  6.66666667,  8.88888889,
       11.11111111, 13.33333333, 15.55555556, 17.77777778, 20.        ])

np.linspace(0, 20, 11)

array([ 0.,  2.,  4.,  6.,  8., 10., 12., 14., 16., 18., 20.])
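The spacing that linspace picks can be inspected directly. As a sketch, the retstep and endpoint keyword arguments of np.linspace return the step and switch to the half-open interval that arange uses:

```python
import numpy as np

# retstep=True also returns the spacing between consecutive points.
points, step = np.linspace(0, 20, 10, retstep=True)

# With endpoint=False the stop value is excluded, matching arange's half-open interval.
half_open = np.linspace(0, 20, 10, endpoint=False)

print(step)
print(half_open)
```

With 10 points and both endpoints included there are 9 gaps, so the step is 20 / 9 rather than 2; that is why the first call above does not produce round numbers.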

Random numbers: random

randint

np.random.randint(0, 10)    # a random integer in [0, 10)

# generate a one-dimensional array (a vector); the third argument is the number of elements

np.random.randint(0, 10, 10)

array([6, 4, 9, 2, 1, 5, 7, 8, 2, 8])

The interval is closed on the left and open on the right

np.random.randint(0, 1, 10)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

np.random.randint(0, 10, size=10)

array([6, 9, 6, 2, 6, 0, 9, 6, 4, 0])

np.random.randint(0, 10, size=(3,5))

array([[8, 7, 5, 2, 6],
       [5, 1, 8, 3, 0],
       [7, 9, 4, 2, 0]])
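The half-open interval can be checked empirically. The sketch below draws a larger sample (the size is an arbitrary choice) and looks at the extremes; it also uses the one-argument form, in which the single bound is taken as the exclusive upper limit and the lower limit is 0.

```python
import numpy as np

samples = np.random.randint(0, 10, size=(100, 100))

# The upper bound 10 is never produced; every value lies in 0..9.
print(samples.min(), samples.max())

# With a single bound, values are drawn from [0, 10).
short_form = np.random.randint(10, size=5)
print(short_form)
```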

The seed function sets the random seed

np.random.seed(666)

np.random.randint(0, 10, size=(3, 5))

array([[2, 6, 9, 4, 3],
       [1, 0, 8, 7, 5],
       [2, 5, 5, 4, 8]])

np.random.seed(666)
np.random.randint(0, 10, size=(3,5))

array([[2, 6, 9, 4, 3],
       [1, 0, 8, 7, 5],
       [2, 5, 5, 4, 8]])

Summary: after seeding with the same value, 666, the generated numbers are identical.
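The same behaviour as a self-contained sketch. The second half is an assumption about the environment: it uses np.random.default_rng, which exists only in NumPy 1.17 and later, so it does not apply to the 1.15.2 version shown earlier.

```python
import numpy as np

np.random.seed(666)
first = np.random.randint(0, 10, size=(3, 5))

np.random.seed(666)             # reset the global generator to the same state
second = np.random.randint(0, 10, size=(3, 5))

print(np.array_equal(first, second))

# Assumption: NumPy >= 1.17. A Generator object carries its own seed
# instead of relying on global state.
rng_a = np.random.default_rng(666)
rng_b = np.random.default_rng(666)
draw_a = rng_a.integers(0, 10, size=5)
draw_b = rng_b.integers(0, 10, size=5)

print(np.array_equal(draw_a, draw_b))
```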

random: random floats

np.random.random()

0.7315955468480113

np.random.random((3,5))

array([[0.8578588 , 0.76741234, 0.95323137, 0.29097383, 0.84778197],
       [0.3497619 , 0.92389692, 0.29489453, 0.52438061, 0.94253896],
       [0.07473949, 0.27646251, 0.4675855 , 0.31581532, 0.39016259]])

The values above are floats drawn uniformly from [0, 1)

normal: the normal distribution

A random float with mean 0 and variance 1

np.random.normal()

1.678190210883876

Mean 10 and standard deviation 100 (the second argument is the standard deviation, so the variance is 10000)

np.random.normal(10, 100)

-72.62832650185376

Mean 0 and standard deviation 1; the third argument is the shape of the output matrix

np.random.normal(0, 1, (3, 5))

array([[ 0.82101369,  0.36712592,  1.65399586,  0.13946473, -1.21715355],
       [-0.99494737, -1.56448586, -1.62879004,  1.23174866, -0.91360034],
       [-0.27084407,  1.42024914, -0.98226439,  0.80976498,  1.85205227]])
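Because the second argument of np.random.normal is the standard deviation and not the variance, a quick empirical check can be helpful. This is only a sketch: the seed and the sample size are arbitrary choices.

```python
import numpy as np

np.random.seed(0)               # arbitrary seed, only to make the run repeatable
samples = np.random.normal(10, 100, size=100000)

sample_mean = samples.mean()
sample_std = samples.std()
sample_var = samples.var()

# The standard deviation comes out near 100, the variance near 100 ** 2.
print(sample_mean, sample_std, sample_var)
```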

np.random?

np.random.normal?
# view the documentation inside the notebook
help(np.random.normal)

Help on built-in function normal:

normal(...) method of mtrand.RandomState instance
    normal(loc=0.0, scale=1.0, size=None)
    
    Draw random samples from a normal (Gaussian) distribution.
    
    The probability density function of the normal distribution, first
    derived by De Moivre and 200 years later by both Gauss and Laplace
    independently [2]_, is often called the bell curve because of
    its characteristic shape (see the example below).
    
    The normal distributions occurs often in nature.  For example, it
    describes the commonly occurring distribution of samples influenced
    by a large number of tiny, random disturbances, each with its own
    unique distribution [2]_.
    
    Parameters
    ----------
    loc : float or array_like of floats
        Mean ("centre") of the distribution.
    scale : float or array_like of floats
        Standard deviation (spread or "width") of the distribution.
    size : int or tuple of ints, optional
        Output shape.  If the given shape is, e.g., ``(m, n, k)``, then
        ``m * n * k`` samples are drawn.  If size is ``None`` (default),
        a single value is returned if ``loc`` and ``scale`` are both scalars.
        Otherwise, ``np.broadcast(loc, scale).size`` samples are drawn.
    
    Returns
    -------
    out : ndarray or scalar
        Drawn samples from the parameterized normal distribution.
    
    See Also
    --------
    scipy.stats.norm : probability density function, distribution or
        cumulative density function, etc.
    
    Notes
    -----
    The probability density for the Gaussian distribution is
    
    .. math:: p(x) = \frac{1}{\sqrt{ 2 \pi \sigma^2 }}
                     e^{ - \frac{ (x - \mu)^2 } {2 \sigma^2} },
    
    where :math:`\mu` is the mean and :math:`\sigma` the standard
    deviation. The square of the standard deviation, :math:`\sigma^2`,
    is called the variance.
    
    The function has its peak at the mean, and its "spread" increases with
    the standard deviation (the function reaches 0.607 times its maximum at
    :math:`x + \sigma` and :math:`x - \sigma` [2]_).  This implies that
    `numpy.random.normal` is more likely to return samples lying close to
    the mean, rather than those far away.
    
    References
    ----------
    .. [1] Wikipedia, "Normal distribution",
           http://en.wikipedia.org/wiki/Normal_distribution
    .. [2] P. R. Peebles Jr., "Central Limit Theorem" in "Probability,
           Random Variables and Random Signal Principles", 4th ed., 2001,
           pp. 51, 51, 125.
    
    Examples
    --------
    Draw samples from the distribution:
    
    >>> mu, sigma = 0, 0.1 # mean and standard deviation
    >>> s = np.random.normal(mu, sigma, 1000)
    
    Verify the mean and the variance:
    
    >>> abs(mu - np.mean(s)) < 0.01
    True
    
    >>> abs(sigma - np.std(s, ddof=1)) < 0.01
    True
    
    Display the histogram of the samples, along with
    the probability density function:
    
    >>> import matplotlib.pyplot as plt
    >>> count, bins, ignored = plt.hist(s, 30, density=True)
    >>> plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
    ...                np.exp( - (bins - mu)**2 / (2 * sigma**2) ),
    ...          linewidth=2, color='r')
    >>> plt.show()

Basic operations on numpy.array

import numpy as np
# np.random.seed(0)

x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

X = np.arange(15).reshape((3, 5))
X

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

Basic attributes

Checking the number of dimensions of an array

x.ndim

X.ndim

shape returns a tuple; x is one-dimensional

x.shape  # the shape of the array

(10,)

X.shape

(3, 5)

Number of elements

x.size

X.size
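The three attributes are related: ndim is the length of shape, and size is the product of the entries of shape. A small sketch that prints them side by side for the two arrays used here:

```python
import numpy as np

x = np.arange(10)
X = np.arange(15).reshape((3, 5))

for name, arr in (("x", x), ("X", X)):
    # ndim == len(shape) and size == product of the entries of shape
    print(name, arr.ndim, arr.shape, arr.size, arr.size == np.prod(arr.shape))
```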

Accessing data in a numpy.array

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

x[0]

x[-1]

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

X[0][0] # not recommended!

X[0, 0]

X[0, -1]

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

x[0:5]

array([0, 1, 2, 3, 4])

x[:5]

array([0, 1, 2, 3, 4])

x[5:]

array([5, 6, 7, 8, 9])

x[4:7]

array([4, 5, 6])

Step size

x[::2]

array([0, 2, 4, 6, 8])

x[1::2]

array([1, 3, 5, 7, 9])

x[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

Array slicing

X[:2, :3]

array([[0, 1, 2],
       [5, 6, 7]])

X[:2][:3] # not the same result; in NumPy, use "," for multidimensional indexing

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

X[:2]

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

X[:2][:3] # slices the result above a second time; [:3] asks for the first three rows, but only 2 are left

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
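The difference is easiest to see from the shapes. In this sketch, with illustrative variable names, the comma form slices rows and columns in one step, while the chained form applies both slices to the row axis:

```python
import numpy as np

X = np.arange(15).reshape((3, 5))

joint = X[:2, :3]       # first 2 rows, and within them the first 3 columns
chained = X[:2][:3]     # first 2 rows, then up to the first 3 rows of that result

print(joint.shape)      # (2, 3)
print(chained.shape)    # (2, 5)
```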

X[:2, ::2]

array([[0, 2, 4],
       [5, 7, 9]])

X[::-1, ::-1] # reverse the matrix along both axes

array([[14, 13, 12, 11, 10],
       [ 9,  8,  7,  6,  5],
       [ 4,  3,  2,  1,  0]])

X[0, :] # reduces the dimensionality: the result is a one-dimensional vector

array([0, 1, 2, 3, 4])

X[0, :].ndim

X[:, 0]

array([ 0,  5, 10])
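An integer index drops its axis, which is why the results above are one-dimensional. When the row or column should stay two-dimensional, one option is a slice of length 1 in place of the integer, as this sketch shows:

```python
import numpy as np

X = np.arange(15).reshape((3, 5))

row_vector = X[0, :]     # integer index: the row axis is dropped
row_matrix = X[0:1, :]   # slice of length 1: the row axis is kept
column = X[:, 0:1]       # the same idea for a column

print(row_vector.shape, row_matrix.shape, column.shape)
```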

The slices above extract submatrices of X. Subarrays behave very differently from slices of a Python list.

Subarray of numpy.array

subX = X[:2, :3]
subX

array([[0, 1, 2],
       [5, 6, 7]])

subX[0, 0] = 100
subX

array([[100,   1,   2],
       [  5,   6,   7]])

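Has the original matrix X changed? Yes. Unlike a list slice, a NumPy slice does not create a new matrix, so modifying subX also modifies X.

X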

array([[100,   1,   2,   3,   4],
       [  5,   6,   7,   8,   9],
       [ 10,  11,  12,  13,  14]])

X[0, 0] = 0
X

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

Use copy() to create a subX that is an independent copy, unrelated to the original matrix

subX = X[:2, :3].copy()

subX[0, 0] = 100
subX

array([[100,   1,   2],
       [  5,   6,   7]])

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
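Whether two arrays share the same underlying data can be tested with np.shares_memory. The sketch below compares a plain slice with a copied slice, and contrasts both with a Python list slice; the variable names are illustrative.

```python
import numpy as np

X = np.arange(15).reshape((3, 5))
view = X[:2, :3]
independent = X[:2, :3].copy()

print(np.shares_memory(X, view))         # True
print(np.shares_memory(X, independent))  # False

# A Python list slice, by contrast, is always a new list.
L = list(range(10))
part = L[:3]
part[0] = 100
print(L[0])                              # 0: the original list is untouched
```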

Reshape

x.shape

(10,)

x.ndim

x.reshape(2, 5)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

A = x.reshape(2, 5)
A

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

B = x.reshape(1, 10)
B

array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

B.ndim

B.shape

(1, 10)

x.reshape(-1, 10)

array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

x.reshape(10, -1)

array([[0],
       [1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8],
       [9]])

x.reshape(2, -1)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

#x.reshape(3, -1)

This raises an error:

ValueError Traceback (most recent call last)
in ()
----> 1 x.reshape(3, -1)

ValueError: cannot reshape array of size 10 into shape (3,newaxis)
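A -1 in reshape asks NumPy to infer that dimension from the total number of elements, which only works when the other dimensions divide the size evenly. A small sketch that shows the inference and catches the failing case:

```python
import numpy as np

x = np.arange(10)

inferred = x.reshape(2, -1)      # -1 is replaced by 10 / 2 = 5
print(inferred.shape)            # (2, 5)
print(x.shape)                   # (10,): reshape does not change x itself

try:
    x.reshape(3, -1)             # 10 is not divisible by 3
except ValueError as exc:
    message = str(exc)
    print(message)
```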
