Python Data Analysis: NumPy Study Notes

The NumPy library

1. Arrays

import numpy as np

(1) array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0)

'''
dtype : data-type, optional
The desired data-type for the array.  If not given, then the type will be determined as the minimum type required to hold the objects in the sequence.

ndmin : int, optional
Specifies the minimum number of dimensions that the resulting
array should have.  Ones will be pre-pended to the shape as
needed to meet this requirement.
order: the memory layout of the stored data; 'C' is row-major (C style), 'F' is column-major (Fortran style)
'''
data=np.array([1,2,3])
print(type(data))   # <class 'numpy.ndarray'>
print(data.dtype)   # int32 here; the default integer dtype is platform-dependent (often int64)

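np.array() returns a numpy.ndarray object with the following attributes:

dtype: the type of the array's elements

shape: a tuple of integers; each integer gives the number of elements along the corresponding axis

size: the total number of elements

ndim: the number of axes, i.e. the number of dimensions

nbytes: the number of bytes used to store the data

int32 -> 4 bytes per element

int64 -> 8 bytes per element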
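
The attributes listed above can be checked on a small array. This is a minimal sketch; the dtype is set explicitly so that the byte count does not depend on the platform's default integer type.

```python
import numpy as np

# A 2x3 array with an explicit 4-byte integer type
data = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

print(data.dtype)   # int32
print(data.shape)   # (2, 3)
print(data.size)    # 6
print(data.ndim)    # 2
print(data.nbytes)  # 24, i.e. 6 elements * 4 bytes each
```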

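(2) Creating arrays with functions

1 Arrays filled with a single value

np.zeros(shape, dtype=float, order='C'), np.zeros_like(arr)

Creates an array consisting entirely of zeros; shape is given as a tuple that declares the array's shape.

np.ones(shape), np.ones_like(arr): the same, filled with ones

np.empty(shape): allocates the array without initializing it, so its contents are arbitrary

np.full(shape, fill_value): fills the array with fill_value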
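
A short sketch of these constructors; the shapes and the fill value 7 are arbitrary choices for illustration.

```python
import numpy as np

zeros = np.zeros((2, 3))       # float64 zeros with shape (2, 3)
ones = np.ones_like(zeros)     # same shape and dtype as `zeros`, filled with ones
sevens = np.full((2, 2), 7)    # every element is 7
scratch = np.empty((2, 2))     # uninitialized memory: do not rely on its contents

print(zeros.shape, zeros.dtype)  # (2, 3) float64
print(sevens)
```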

2 Arrays with a distinguished diagonal (n×n)

np.eye()

eye(N, M=None, k=0, dtype=float, order='C')
'''
N, M: number of rows and columns (M defaults to N)
k : int, optional
Index of the diagonal: 0 (the default) refers to the main diagonal, a positive value refers to an upper diagonal, and a negative value to a lower diagonal.
'''

np.identity()

identity(n, dtype=None)
n: the number of rows (and columns) of the square output

np.diag(): extract a diagonal or build a diagonal array

diag(v, k=0)
'''
Extract a diagonal or construct a diagonal array.
v : array_like
If `v` is a 2-D array, return a copy of its `k`-th diagonal.
If `v` is a 1-D array, return a 2-D array with `v` on the `k`-th diagonal.
k : int, optional
Diagonal in question. The default is 0. Use `k>0` for diagonals
above the main diagonal, and `k<0` for diagonals below the main diagonal.
'''
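
The effect of `k` and the two modes of np.diag() are easiest to see on a 3×3 example. A minimal sketch:

```python
import numpy as np

upper = np.eye(3, k=1, dtype=int)   # ones on the first diagonal above the main one
# [[0 1 0]
#  [0 0 1]
#  [0 0 0]]
ident = np.identity(3, dtype=int)   # ones on the main diagonal

m = np.arange(9).reshape(3, 3)
main_diag = np.diag(m)              # 2-D input: extracts [0 4 8]
above = np.diag(m, k=1)             # 2-D input, k=1: extracts [1 5]
built = np.diag([1, 2, 3])          # 1-D input: builds a 3x3 array with 1, 2, 3 on the diagonal

print(upper)
print(main_diag, above)
print(built)
```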
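3 Arithmetic and geometric sequences

np.arange(): similar to Python's range()

arange([start,] stop[, step,], dtype=None)
Returns a one-dimensional array over the half-open interval [start, stop)
start: defaults to 0
step: defaults to 1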

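np.linspace(): generates an arithmetic sequence (evenly spaced values)

linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
'''
Returns `num` evenly spaced samples, calculated over the
interval [`start`, `stop`].
endpoint: defaults to True, meaning `stop` is included as the last sample.
num: the number of samples to generate, default 50.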
'''

np.logspace(): generates a geometric sequence

logspace(start, stop, num=50, endpoint=True, base=10.0, dtype=None, axis=0)
base: the base, default 10; the sequence runs from base**start to base**stop
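
The three sequence functions differ mainly in how they treat the end point and in what `start` and `stop` mean. A minimal sketch with small inputs:

```python
import numpy as np

a = np.arange(0, 10, 2)                    # [0 2 4 6 8]; stop is excluded
b = np.linspace(0, 1, 5)                   # [0.   0.25 0.5  0.75 1.  ]; stop is included
c = np.linspace(0, 1, 5, endpoint=False)   # [0.  0.2 0.4 0.6 0.8]; stop is excluded
d = np.logspace(0, 3, 4)                   # 10**0 .. 10**3: [1, 10, 100, 1000]
e = np.logspace(0, 3, 4, base=2)           # 2**0 .. 2**3: [1, 2, 4, 8]

for seq in (a, b, c, d, e):
    print(seq)
```

Note that for np.logspace() the arguments `start` and `stop` are exponents, not the end values themselves.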

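4 Creating arrays with a custom dtype

my_type=np.dtype({
     "names":['book','version'],"formats":["S40",int]})
# or, equivalently:
my_type=np.dtype([("book",'S40'),('version',int)])

my_type=np.dtype({
     "names":['book','age'],"formats":["S40",int]})   # np.int was removed in NumPy 1.24; use int
print(my_type)   # [('book', 'S40'), ('age', '<i4')], or '<i8' where the default integer is 64-bit
infs=np.array([('hack',20),('mary',18)],dtype=my_type)
print(infs)          #[(b'hack', 20) (b'mary', 18)]
print(infs[0])       #(b'hack', 20)
print(infs['book']) #[b'hack' b'mary']

Elements of the array can be modified in place, but no new elements can be added: the size of an array is fixed once it has been created.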

5 Creating arrays with the from* family of functions

np.frombuffer()

"""
frombuffer(buffer, dtype=float, count=-1, offset=0)

    Interpret a buffer as a 1-dimensional array.

    Parameters
    ----------
    buffer : buffer_like
        An object that exposes the buffer interface.
    dtype : data-type, optional
        Data-type of the returned array; default: float.
    count : int, optional
        Number of items to read. ``-1`` means all data in the buffer.
    offset : int, optional
        Start reading the buffer from this offset (in bytes); default: 0.
"""

np.fromfunction()

fromfunction(function, shape, *, dtype=float, **kwargs)
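function: a callable (for example a lambda) that is called with the index arrays of each axis
mul=np.fromfunction(lambda i,j:(i+1)*(j+1), (9,9), dtype=int)   # np.int was removed in NumPy 1.24; use int
print(mul)
# the 9x9 multiplication table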
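
A small sketch of both functions. The byte string and the lambda are made-up inputs chosen so that the results are easy to check by hand.

```python
import numpy as np

raw = bytes([1, 2, 3, 4, 5, 6])

# Interpret every byte as one unsigned 8-bit integer
whole = np.frombuffer(raw, dtype=np.uint8)                     # [1 2 3 4 5 6]
# Skip the first 2 bytes, then read 3 items
part = np.frombuffer(raw, dtype=np.uint8, count=3, offset=2)   # [3 4 5]

# The lambda receives the row-index array i and the column-index array j
grid = np.fromfunction(lambda i, j: i * 10 + j, (2, 3), dtype=int)
# [[ 0  1  2]
#  [10 11 12]]

print(whole, part)
print(grid)
```

Because `bytes` objects are immutable, the arrays returned by np.frombuffer() here are read-only views on the buffer rather than copies.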

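2. Array indexing and slicing

(1) Array axes

Axes are numbered by index: 0, 1, 2, …

For a two-dimensional array, axis 0 runs vertically (down the rows) and axis 1 runs horizontally (across the columns).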
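
A reduction such as sum() makes the direction of each axis concrete: the axis that is named is the one that gets collapsed. A minimal sketch:

```python
import numpy as np

c = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

col_sums = c.sum(axis=0)   # collapse axis 0 (move down the rows): one value per column
row_sums = c.sum(axis=1)   # collapse axis 1 (move across the columns): one value per row

print(col_sums)   # [12 15 18 21]
print(row_sums)   # [ 6 22 38]
```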

(2) Selecting elements by index

b=np.linspace(0,10,5)
print(b) #[ 0.   2.5  5.   7.5 10. ]
three=b[0],b[2],b[3]
three2=b[[0,2,3]]
print(three)    #(0.0, 5.0, 7.5)
print(three2)   #[0.  5.  7.5]

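Indexing with a list returns an array

c=np.arange(12).reshape(3,4)
print(c[[0,2]])	# rows 0 and 2
# [[ 0  1  2  3]
#  [ 8  9 10 11]]
print(c[[0,2],[1,2]])   # [ 1 10]: the two lists are paired element by element, giving c[0,1] and c[2,2]

Note: c[0,2] is the third element of row 0, the same element as c[0][2]

data[[m,n,p…],[m,n,p…],…]

The first list holds the indices along axis 0, the second list holds the indices along axis 1, and so on.

Indexing with an array also returns an array

It behaves the same way as indexing with a list.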

t=b==5
print(t)    #[False False  True False False]
print(b[t]) #[5.]

t2=c%2==0
print(c[t2])    #[ 0  2  4  6  8 10]

A boolean array used as an index filters the array: only the elements at positions where the mask is True are selected.
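
Boolean masks can be combined and can also be used on the left-hand side of an assignment. A minimal sketch with made-up scores:

```python
import numpy as np

scores = np.array([55, 78, 91, 42, 67])    # hypothetical data

passed = scores >= 60                      # boolean mask, one entry per element
print(scores[passed])                      # [78 91 67]

# Combine conditions with & (and), | (or), ~ (not); the parentheses are required
print(scores[(scores >= 60) & (scores < 90)])   # [78 67]

# Assigning through a mask changes only the selected elements
scores[~passed] = 60
print(scores)                              # [60 78 91 60 67]
```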

(3) Array slicing

d=np.arange(12)
d1=[i for i in range(12)]
d2=d[2:8]
d3=d1[2:8]
print(d)    #[ 0  1  2  3  4  5  6  7  8  9 10 11]
print(d2)   #[2 3 4 5 6 7]
d2[0]=100
d3[0]=100
print(d)    #[  0   1 100   3   4   5   6   7   8   9  10  11]
print(d1)   #[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

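Note: a slice of an array shares memory with the original array (it is a view), so writing to d2 also changes d; a slice of a list is an independent copy, so d1 is unchanged.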
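
Whether two arrays share memory can be checked with np.shares_memory(), and an independent copy can be requested with .copy(). A minimal sketch:

```python
import numpy as np

d = np.arange(12)

view = d[2:8]             # basic slice: a view on d's memory
copied = d[2:8].copy()    # explicit copy: independent memory
fancy = d[[2, 3, 4]]      # indexing with a list also returns a copy

print(np.shares_memory(d, view))     # True
print(np.shares_memory(d, copied))   # False
print(np.shares_memory(d, fancy))    # False

copied[0] = 100
print(d[2])               # 2, the original is untouched
```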

e=np.arange(0,100,10).reshape(-1,1)+np.arange(0,10)
print(e[1:4])
print(e[1:4,2:5])
print(e[1,:])
print(e[:3,[0,3]])

Output:

[[10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]]
[[12 13 14]
 [22 23 24]
 [32 33 34]]
[10 11 12 13 14 15 16 17 18 19]
[[ 0  3]
 [10 13]
 [20 23]]

3. Operations on arrays

1 Reshaping arrays
np.reshape(a, shape, order='C')
# module-level function
arr.reshape(shape, order='C') # method of the array object
a=np.arange(10).reshape(2,5)

b=np.reshape(a,(10,))   #[0 1 2 3 4 5 6 7 8 9]
c=np.reshape(a,(1,10))  #[[0 1 2 3 4 5 6 7 8 9]]

a1=a.flatten() # [0 1 2 3 4 5 6 7 8 9]
# flatten() turns the array into a one-dimensional copy
# ravel() gives the same values but returns a view where possible
# Going the other way: adding dimensions
b1=b[:,None]    # a new axis 1 is inserted
print(b1,b1.ndim)   # 1-D becomes 2-D, shape=(10,1)
# None is equivalent to np.newaxis
b2=b[np.newaxis,:]
print(b2)   # [[0 1 2 3 4 5 6 7 8 9]] shape (1,10)
b3=np.expand_dims(b,axis=0)
print(b3)   #[[0 1 2 3 4 5 6 7 8 9]]
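
flatten() and ravel() return the same values but differ in whether the result shares memory with the source. The sketch below shows the difference for a contiguous array, where ravel() can return a view:

```python
import numpy as np

a = np.arange(10).reshape(2, 5)

flat_copy = a.flatten()   # always a copy
flat_view = a.ravel()     # a view when the memory layout allows it, as it does here

flat_copy[0] = -1
print(a[0, 0])            # 0, the copy is independent

flat_view[0] = -1
print(a[0, 0])            # -1, the view writes through to a

# -1 in a shape lets NumPy infer that dimension from the total size
print(a.reshape(5, -1).shape)   # (5, 2)
```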
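2 Joining and splitting

(1) Horizontal joining (along axis 1)

np.hstack(tup)

np.stack(tup, axis=1): requires all arrays to have the same shape, and adds a new dimension to the result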

d1=np.arange(9).reshape(3,3)
d2=np.arange(12).reshape(3,4)
d3=np.arange(15).reshape(3,5)

d4=np.hstack((d1,d2,d3))	# the number of dimensions is unchanged
print(d4)
d5=d1*3
d6=np.stack((d1,d5))	# joins along a new axis 0: 2-D becomes 3-D
print(d6)
d7=np.stack((d1,d5),axis=1)
print(d7)
d8=np.concatenate((d1,d2),axis=1)	# joins along axis 1; the number of dimensions is unchanged
print(d8)
Output:
[[ 0  1  2  0  1  2  3  0  1  2  3  4]
 [ 3  4  5  4  5  6  7  5  6  7  8  9]
 [ 6  7  8  8  9 10 11 10 11 12 13 14]]
[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]
 [[ 0  3  6]
  [ 9 12 15]
  [18 21 24]]]
[[[ 0  1  2]
  [ 0  3  6]]
 [[ 3  4  5]
  [ 9 12 15]]
 [[ 6  7  8]
  [18 21 24]]]
[[ 0  1  2  0  1  2  3]
 [ 3  4  5  4  5  6  7]
 [ 6  7  8  8  9 10 11]]

(2) Vertical joining (along axis 0)

d9=d2.T	# transpose: shape (4,3)
d10=np.vstack((d1,d9))
print(d10)
Output:
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]]

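(3) Other joining functions

dstack(tup): joins along the third axis (depth); the arrays in tup must have the same shape

column_stack(tup): stacks 1-D arrays as the columns of a 2-D array

row_stack(tup): an alias of vstack (deprecated since NumPy 2.0)

e=np.arange(9).reshape(3,3)
e1=e*3
e2=np.dstack((e,e1))  # depth-wise joining
print(e2)

Output: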

[[[ 0  0]
  [ 1  3]
  [ 2  6]]
 [[ 3  9]
  [ 4 12]
  [ 5 15]]
 [[ 6 18]
  [ 7 21]
  [ 8 24]]]

Summary:

'''
concatenate : Join a sequence of arrays along an existing axis.
stack : Join a sequence of arrays along a new axis.
block : Assemble an nd-array from nested lists of blocks.
vstack : Stack arrays in sequence vertically (row wise).
hstack : Stack arrays in sequence horizontally (column wise).
column_stack : Stack 1-D arrays as columns into a 2-D array.
dsplit : Split array along third axis.
'''
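
The practical difference between these functions shows up in the shape of the result. A minimal sketch that joins two arrays of shape (3, 4) in each of the ways summarized above:

```python
import numpy as np

x = np.zeros((3, 4))
y = np.ones((3, 4))

shapes = {
    "concatenate axis=0": np.concatenate((x, y), axis=0).shape,   # (6, 4), same as vstack
    "concatenate axis=1": np.concatenate((x, y), axis=1).shape,   # (3, 8), same as hstack
    "stack axis=0": np.stack((x, y), axis=0).shape,               # (2, 3, 4), new leading axis
    "stack axis=1": np.stack((x, y), axis=1).shape,               # (3, 2, 4), new middle axis
    "dstack": np.dstack((x, y)).shape,                            # (3, 4, 2), new trailing axis
    "column_stack 1-D": np.column_stack(([1, 2, 3], [4, 5, 6])).shape,  # (3, 2)
}
for name, shape in shapes.items():
    print(name, shape)
```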

(4) Splitting arrays

split(ary, indices_or_sections, axis=0)
'''
If `indices_or_sections` is an integer, N, the array will be divided into N equal arrays along `axis`.  If such a split is not possible,an error is raised.

If `indices_or_sections` is a 1-D array of sorted integers, the entries indicate where along `axis` the array is split. 
'''
a2=np.arange(24).reshape(4,6)
a3=np.split(a2,2,axis=1) # splits a2 into 2 equal parts along axis 1
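
Both forms of `indices_or_sections` can be sketched on the same array. np.array_split() is not mentioned in these notes; it is included here only as the standard alternative when an equal division is impossible.

```python
import numpy as np

a2 = np.arange(24).reshape(4, 6)

# Integer: two equal halves along axis 1, each of shape (4, 3)
left, right = np.split(a2, 2, axis=1)

# List of indices: cut before column 1 and before column 4
parts = np.split(a2, [1, 4], axis=1)
print([p.shape for p in parts])     # [(4, 1), (4, 3), (4, 2)]

# 6 columns cannot be divided into 4 equal parts, so np.split raises an error
try:
    np.split(a2, 4, axis=1)
    split_failed = False
except ValueError:
    split_failed = True

# np.array_split allows unequal parts instead
uneven = np.array_split(a2, 4, axis=1)
print([p.shape for p in uneven])    # [(4, 2), (4, 2), (4, 1), (4, 1)]
```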
3 Modifying elements

np.append(arr, values, axis=None)

"""
arr : array_like
Values are appended to a copy of this array.
values : array_like
These values are appended to a copy of `arr`.  It must be of the
correct shape (the same shape as `arr`, excluding `axis`).  If
`axis` is not specified, `values` can be any shape and will be
flattened before use.
axis : int, optional
The axis along which `values` are appended.  If `axis` is not given, both `arr` and `values` are flattened before use.
"""
# When axis is not given, both arr and values are flattened first

np.insert(arr, obj, values, axis=None)

"""
arr:array_like
obj: int, slice or sequence of ints; the index or indices before which `values` is inserted
values:Values to insert into `arr`. If the type of `values` is different from that of `arr`, `values` is converted to the type of `arr`. `values` should be shaped so that ``arr[...,obj,...] = values`` is legal.
"""
f=np.array([[1,2,3],[4,5,6]])   # assumed definition: f is not defined in the original notes
e=f.flatten()
e1=np.insert(e,1,99)
print(e1)   #[ 1 99  2  3  4  5  6]
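
np.append() and np.insert() both return a new array and leave their input unchanged; the `axis` argument decides whether the input is flattened first. A minimal sketch, where the array `g` is a made-up example:

```python
import numpy as np

g = np.array([[1, 2, 3], [4, 5, 6]])   # hypothetical 2x3 input

flat = np.append(g, [7, 8])                 # no axis: both are flattened
print(flat)                                 # [1 2 3 4 5 6 7 8]

with_row = np.append(g, [[7, 8, 9]], axis=0)   # values must match g's shape apart from axis 0
print(with_row.shape)                       # (3, 3)

with_col = np.insert(g, 1, 0, axis=1)       # a column of zeros before column 1
print(with_col)
# [[1 0 2 3]
#  [4 0 5 6]]

print(g.shape)                              # (2, 3), g itself is unchanged
```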

np.delete(arr, obj, axis=None)

Returns a new array with the specified elements removed along the given axis

"""
arr : array_like
        Input array.
    obj : slice, int or array of ints
        Indicate indices of sub-arrays to remove along the specified axis.

        .. versionchanged:: 1.19.0
            Boolean indices are now treated as a mask of elements to remove,
            rather than being cast to the integers 0 and 1.

    axis : int, optional
        The axis along which to delete the subarray defined by `obj`.
        If `axis` is None, `obj` is applied to the flattened array.
"""

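4. Operations and universal functions

1. Arithmetic operations

Adding, subtracting, multiplying or dividing an array by a scalar applies the operation to every element of the array and returns a new array.

In principle, only arrays of the same shape can be combined. NumPy can, however, temporarily stretch an array by "broadcasting": the shapes are compared axis by axis starting from the last one, and two axes are compatible when their lengths are equal or one of them is 1.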

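2. Comparison and logical operations

Comparison operations work element by element; the two arrays must have the same shape or be broadcastable to a common shape.

Logical operations: logical_and(), logical_or(), logical_not()

np.any(a, axis=None, out=None): a is an array; returns True if at least one element of a is True

np.all(a, axis=None, out=None): a is an array; returns True only if every element of a is True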
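
Broadcasting, comparison and the any/all reductions can be sketched together. The rule assumed here is NumPy's standard one: shapes are aligned from the last axis, and two axes are compatible when they are equal or one of them has length 1.

```python
import numpy as np

col = np.arange(3).reshape(3, 1)    # shape (3, 1)
row = np.arange(4)                  # shape (4,), treated as (1, 4)

table = col * 10 + row              # both are broadcast to shape (3, 4)
# [[ 0  1  2  3]
#  [10 11 12 13]
#  [20 21 22 23]]

mask = table > row                  # comparisons broadcast in the same way
print(np.any(mask), np.all(mask))   # True False
print(np.all(mask, axis=1))         # [False  True  True]

in_range = np.logical_and(table > 5, table < 20)   # same as (table > 5) & (table < 20)
print(table[in_range])              # [10 11 12 13]

# Shapes (3, 4) and (3,) are incompatible: last axes are 4 and 3
try:
    table + np.arange(3)
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
```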

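3. Universal functions

Unary functions

np.sin, np.cos, np.tan: trigonometric functions

np.arcsin, np.arccos, np.arctan: inverse trigonometric functions

np.sinh, np.cosh, np.tanh: hyperbolic functions

np.arcsinh, np.arccosh, np.arctanh: inverse hyperbolic functions

np.sqrt: square root

np.exp: natural exponential, e**x

np.log, np.log2, np.log10: logarithms (natural, base 2, base 10)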

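Binary functions

np.add, np.subtract, np.multiply, np.divide: arithmetic functions

np.equal, np.not_equal, np.less, np.less_equal, np.greater, np.greater_equal: comparison functions

np.power(value, num): exponentiation, value**num

np.remainder(x1, x2): the element-wise remainder of dividing x1 by x2

np.reciprocal(): the reciprocal, 1/x

np.real, np.imag, np.conj: the real part, the imaginary part and the complex conjugate of a complex number

np.sign, np.abs: the sign and the absolute value

np.round(arr, decimals): arr is array_like; decimals is the number of decimal places to keep. Values exactly halfway between two candidates are rounded to the nearest even value, not always up.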

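np.frompyfunc(func, nin, nout)

Turns an ordinary Python function into a universal function that operates element by element on arrays

func: a function object defined in Python

nin: an integer, the number of arguments func accepts

nout: an integer, the number of objects func returns

np.vectorize(bmi, otypes=[float])

Vectorizes the function bmi so that it can operate on arrays (np.float was removed in NumPy 1.24; use float)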
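
The notes mention a function `bmi` without defining it. The sketch below assumes a simple body-mass-index formula as a stand-in and wraps it both ways; the weights and heights are made-up data.

```python
import numpy as np

def bmi(weight_kg, height_m):
    """Hypothetical scalar function: body-mass index of one person."""
    return weight_kg / height_m ** 2

weights = np.array([60.0, 72.0, 90.0])
heights = np.array([1.60, 1.80, 1.75])

vbmi = np.vectorize(bmi, otypes=[float])   # result dtype is fixed by otypes
ubmi = np.frompyfunc(bmi, 2, 1)            # 2 inputs, 1 output; result has dtype object

by_vectorize = vbmi(weights, heights)
by_frompyfunc = ubmi(weights, heights)

print(by_vectorize.dtype)                  # float64
print(by_frompyfunc.dtype)                 # object
print(by_frompyfunc.astype(float))         # convert when numeric operations are needed
```

Both wrappers still call the Python function once per element, so they are a convenience rather than a speed-up; for plain arithmetic like this, `weights / heights ** 2` gives the same result directly.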

>>> a = np.array([1+2j, 3+4j, 5+6j])
>>> a.real
array([1.,  3.,  5.])
>>> a.real = 9
>>> a
array([9.+2.j,  9.+4.j,  9.+6.j])
>>> a.real = np.array([9, 8, 7])
>>> a
array([9.+2.j,  8.+4.j,  7.+6.j])
>>> np.real(1 + 1j)
1.0

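5. Simple statistics

1 Normal distribution functions
np.random.normal(loc=0.0, scale=1.0, size=None)
loc: float, the mean of the distribution
scale: float, the standard deviation of the distribution
size: int or tuple of ints, the shape of the output
np.random.randn(1000)
Generates random numbers from the standard normal distribution (loc=0.0, scale=1.0); the dimensions are passed as positional arguments, not as size=
2. Simple statistical functions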

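np.mean(arr), np.average(arr): the arithmetic mean; np.average also accepts weights for a weighted mean

np.mean(a, axis=None, dtype=None, out=None, keepdims=np._NoValue)

np.var: variance

np.std: standard deviation

np.min, np.max: minimum and maximum

np.argmin, np.argmax: the index of the minimum and of the maximum

np.ptp: the range ("peak to peak"), i.e. the difference between the maximum and the minimum

np.percentile: the value at a given percentile of the data

np.median: the median of the data

np.sum: the sum of the data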
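
A minimal sketch of these functions on a small, made-up table of marks, chosen so that every result can be verified by hand:

```python
import numpy as np

marks = np.array([[60, 70, 80, 90],
                  [55, 65, 75, 85]])   # hypothetical data: 2 rows of 4 marks

row_means = np.mean(marks, axis=1)                     # [75. 70.]
weighted = np.average(marks[0], weights=[1, 1, 1, 3])  # (60+70+80+3*90)/6 = 80.0
variance = np.var(marks[0])                            # 125.0
spread = np.ptp(marks, axis=1)                         # [30 30]
middle = np.median(marks[0])                           # 75.0
p50 = np.percentile(marks[0], 50)                      # 75.0, same as the median
best = np.argmax(marks)                                # 3, an index into the flattened array

print(row_means, weighted, variance, spread, middle, p50, best)
```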

np.random.randint(low, high=None, size=None, dtype=None)

Return random integers from the "discrete uniform" distribution of the specified dtype in the "half-open" interval [`low`, `high`). If `high` is None (the default), then results are from [0, `low`).
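chi=np.random.normal(loc=0.0,scale=1.0,size=100)
eng=np.random.randn(100)    # standard normal distribution

chi=(chi-chi.min())/10
eng=(eng-eng.min())/10

chi=np.round((100.1-20)*chi+20,1)
eng=np.round((100.1-20)*eng+20,1)

marks=np.vstack((chi,eng))
print(np.mean(marks,axis=1))	# mean along axis 1: one value per row
print(np.std(marks,axis=1))
3. Ternary selection functions
np.where(condition,res1,res2)
# Where the condition holds the result is taken from res1, otherwise from res2
# With only the condition given, it returns the indices of the elements that satisfy it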
np.select(condlist, choicelist, default=0)
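
np.where() chooses between two alternatives, while np.select() handles several conditions in order. A minimal sketch with made-up marks and arbitrary grade codes:

```python
import numpy as np

marks = np.array([45, 60, 75, 90])   # hypothetical data

# Three-argument form: element-wise choice between two results
labels = np.where(marks >= 60, "pass", "fail")
print(labels)                        # ['fail' 'pass' 'pass' 'pass']

# One-argument form: a tuple of index arrays, one per axis
passing_idx = np.where(marks >= 60)[0]
print(passing_idx)                   # [1 2 3]

# np.select: the first condition that is True wins, otherwise `default` is used
grades = np.select([marks >= 85, marks >= 60], [2, 1], default=0)
print(grades)                        # [0 1 1 2]
```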

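6. Matrices and the numpy.linalg linear algebra module

1. Creating matrices

A matrix is a two-dimensional array

np.mat(data, dtype=None)   # removed in NumPy 2.0; np.asmatrix is the replacement
np.matrix(data, dtype=None, copy=True)
np.bmat(obj, ldict=None, gdict=None)
# data: array_like, or a string such as '1 2 3;4 5 6;7 8 9'
2. Matrix multiplication

For matrix objects, a * b or np.dot(a, b)

3. Basic operations

Transpose: .T

Inverse: .I

Identity matrix: np.mat(np.eye(4))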
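
The same operations can be written with ordinary two-dimensional arrays, which the NumPy documentation recommends over the matrix class. A minimal sketch, with `A` and `B` as made-up inputs:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.array([[1.0, 0.0], [2.0, 1.0]])

product = A @ B            # matrix product, same as np.dot(A, B)
elementwise = A * B        # for plain arrays, * multiplies element by element
A_inv = np.linalg.inv(A)   # inverse, the array counterpart of .I

print(product)             # [[4. 1.]
                           #  [7. 3.]]
print(A.T)                 # transpose
print(np.allclose(A @ A_inv, np.eye(2)))   # True
```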

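7. Vector operations

A vector can be represented by a one-dimensional array

Vector addition:

import math
f1=np.array([10*math.cos(math.pi/6),10*math.sin(math.pi/6)])
f2=np.array([0,12])
f=f1+f2
np.sqrt(np.sum(f**2))   # magnitude of the resultant vector
# equivalent to np.linalg.norm(f)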

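Scalar (dot) product:

f1=np.array([10*math.cos(math.pi/6),10*math.sin(math.pi/6)])
f2=np.array([0,12])
np.dot(f1,f2)

For one-dimensional arrays, dot computes the scalar product

For two-dimensional arrays (matrices), dot computes the matrix product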

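Vector (cross) product:

The result is again a vector, perpendicular to the plane spanned by the two original vectors

np.cross(f1,f2)   # for 2-element vectors only the z-component is returned, as a scalar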
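
To obtain the perpendicular vector itself, the inputs need three components. The sketch below pads the two force vectors with a zero z-component; this padding is an assumption made for illustration, and NumPy 2.0 additionally deprecates 2-element inputs to np.cross().

```python
import math
import numpy as np

# The same two vectors as above, embedded in 3-D with z = 0
f1 = np.array([10 * math.cos(math.pi / 6), 10 * math.sin(math.pi / 6), 0.0])
f2 = np.array([0.0, 12.0, 0.0])

n = np.cross(f1, f2)       # points along the z-axis
print(n)

# The cross product is perpendicular to both inputs
print(np.isclose(np.dot(n, f1), 0.0), np.isclose(np.dot(n, f2), 0.0))   # True True
```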

Tensor product:

# signature: np.tensordot(a, b, axes=2)
a=np.arange(6).reshape(2,3)
b=np.arange(9).reshape(3,3)
np.tensordot(a,b,axes=([1],[0]))# equivalent to matrix multiplication

a1=np.random.randint(2,size=(2,6,5))
b1=np.random.randint(2,size=(3,2,4))
c=np.tensordot(a1,b1,axes=([0],[1]))   # contracts axis 0 of a1 with axis 1 of b1 (both of length 2)
print(c.shape)	#(6, 5, 3, 4)

8. Worked examples

1. Polynomials
np.poly1d(c_or_r, r=False, variable=None)
'''
Parameters
    ----------
c_or_r : array_like
 The polynomial's coefficients, in decreasing powers, or if the value of the second parameter is True, the polynomial's roots (values where the polynomial evaluates to 0).  For example,``poly1d([1, 2, 3])`` returns an object that represents :math:`x^2 + 2x + 3`, whereas ``poly1d([1, 2, 3], True)`` returns one that represents :math:`(x-1)(x-2)(x-3) = x^3 - 6x^2 + 11x -6`.
r : bool, optional
If True, `c_or_r` specifies the polynomial's roots; the default is False.
variable : str, optional
Changes the variable used when printing `p` from `x` to `variable`
 '''
a=np.poly1d([1,3,2])    #x^2 + 3x + 2
print(type(a))  # a poly1d object
print(a(0.0))      # 2.0, the constant term

Polynomial arithmetic

b=np.poly1d([1,-1,2])
print(a+b)  # 2x^2 + 2x + 4
print(a*b)  # x^4 + 2x^3 + x^2 + 4x + 4
print(a/b)  # (poly1d([1.]), poly1d([4., 0.]))
# the first item of the result is the quotient, the second is the remainder

b1=b+[-2,1]
print(b1)   # x^2 - 3x + 3
b2=b*[3,2,1]
print(b2)   # 3x^4 - x^3 + 5x^2 + 3x + 2
print(b.deriv())    # derivative: 2x - 1
print(b.integ())    # antiderivative: 0.3333x^3 - 0.5x^2 + 2x
print(b.roots)  # roots: [0.5+1.32287566j 0.5-1.32287566j]
# Build a polynomial that has the entries of c as its roots; poly() returns its coefficients
c=np.array([1,-1])
c1=np.poly(c)
print(c1)   #[ 1.  0. -1.]
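# Fitting a polynomial
# signature: np.polyfit(x, y, deg, rcond=None, full=False, w=None, cov=False)
'''
x: the independent variable
y: the dependent variable
deg: the degree of the fitting polynomial
'''
x=np.linspace(0,2*np.pi,10000)
y=np.sin(x)

f7=np.polyfit(x,y,7)
# polyval() evaluates the polynomial with coefficients f7 at the points x
space7=np.abs(np.polyval(f7,x)-y)
# largest absolute error over the 10000 sample points
print(np.max(space7))   #0.0006625828772341516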
2 Solving systems of linear equations
np.linalg.solve(a,b)
# a must be square (axis 0 and axis 1 have the same length) and of full rank
# Computes the "exact" solution, `x`, of the well-determined, i.e., full rank, linear matrix equation `ax = b`.
d=np.array([[3,1],[1,2]])
e=np.array([9,8])
x=np.linalg.solve(d,e)
print(x)    #[2. 3.]
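
A solution can be verified by substituting it back into the system, and a singular coefficient matrix shows what happens when the full-rank requirement is not met. A minimal sketch; the `singular` matrix is a made-up counterexample.

```python
import numpy as np

d = np.array([[3, 1], [1, 2]])   # 3x + y = 9
e = np.array([9, 8])             #  x + 2y = 8
x = np.linalg.solve(d, e)

# Substituting back: d @ x should reproduce e
print(np.allclose(d @ x, e))     # True

# The second row is twice the first, so there is no unique solution
singular = np.array([[1, 2], [2, 4]])
try:
    np.linalg.solve(singular, e)
    solve_failed = False
except np.linalg.LinAlgError as err:
    solve_failed = True
    print("no unique solution:", err)
```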
