Import numpy and check its version
import numpy as np
np.__version__
'1.13.1'
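What is numpy?
numpy stands for Numerical Python. It extends Python with array and matrix types and provides a large collection of functions for computing on them.
numpy is the foundation for the machine learning and data mining material that follows; pandas, scipy, matplotlib and others are all built on numpy.
The most basic data structure in numpy is the ndarray, i.e. the array.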
data = [1,2,3]
nd = np.array(data)
nd
array([1, 2, 3])
type(data),type(nd)
(list, numpy.ndarray)
# Check the type of the elements in nd
nd.dtype
dtype('int32')
nd2 = np.array([1,3,4.6,"fdsaf",True])
nd2
array(['1', '3', '4.6', 'fdsaf', 'True'],
dtype='<U32')
nd2.dtype
dtype('<U32')
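[Note]
1. All elements of an array have the same type.
2. When an array is created from a list whose elements have different types, they are all converted to one common type (priority: str > float > int).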
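The type unification described in the note can be checked directly. This is a minimal sketch; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np

# All integers: the array gets an integer dtype (exact width depends on the platform)
ints = np.array([1, 2, 3])

# int mixed with float: everything is promoted to float64
mixed_float = np.array([1, 2, 3.5])

# anything mixed with str: everything becomes a unicode string
mixed_str = np.array([1, 2.5, "a"])

print(ints.dtype.kind, mixed_float.dtype, mixed_str.dtype.kind)
print(mixed_str)
```

The `kind` attribute is used instead of the full dtype name because the integer width and the string length can differ between platforms and numpy versions.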
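# Note: an image is also an array in numpy
# Load an image
import matplotlib.pyplot as plt
# matplotlib is a data visualization library; here it is only used to load the image
girl = plt.imread("./source/girl.jpg")
type(girl) # once loaded, the image is an ndarray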
numpy.ndarray
# Check the shape of the array
girl.shape
# shape is a tuple; each entry is the number of elements of girl along that dimension
(900, 1440, 3)
girl
array([[[225, 231, 231],
[229, 235, 235],
[222, 228, 228],
...,
[206, 213, 162],
[211, 213, 166],
[217, 220, 173]],
[[224, 230, 230],
[229, 235, 235],
[223, 229, 229],
...,
[206, 213, 162],
[211, 213, 166],
[217, 220, 173]],
[[224, 230, 230],
[229, 235, 235],
[223, 229, 229],
...,
[206, 213, 162],
[211, 213, 166],
[219, 221, 174]],
...,
[[175, 187, 213],
[180, 192, 218],
[175, 187, 213],
...,
[155, 162, 180],
[153, 160, 178],
[156, 163, 181]],
[[175, 187, 213],
[180, 192, 218],
[174, 186, 212],
...,
[155, 162, 180],
[153, 160, 178],
[155, 162, 180]],
[[177, 189, 215],
[181, 193, 219],
[174, 186, 212],
...,
[155, 162, 180],
[153, 160, 178],
[156, 163, 181]]], dtype=uint8)
# Display the image with plt
plt.imshow(girl)
plt.show()
Create an image
# Create an image
boy = np.array([[[0.4,0.5,0.6],[0.8,0.8,0.2],[0.6,0.9,0.5]],
[[0.12,0.32,0.435],[0.22,0.45,0.9],[0.1,0.2,0.3]],
[[0.12,0.32,0.435],[0.12,0.32,0.435],[0.12,0.32,0.435]],
[[0.12,0.32,0.435],[0.12,0.32,0.435],[0.12,0.32,0.435]]])
boy
array([[[ 0.4 , 0.5 , 0.6 ],
[ 0.8 , 0.8 , 0.2 ],
[ 0.6 , 0.9 , 0.5 ]],
[[ 0.12 , 0.32 , 0.435],
[ 0.22 , 0.45 , 0.9 ],
[ 0.1 , 0.2 , 0.3 ]],
[[ 0.12 , 0.32 , 0.435],
[ 0.12 , 0.32 , 0.435],
[ 0.12 , 0.32 , 0.435]],
[[ 0.12 , 0.32 , 0.435],
[ 0.12 , 0.32 , 0.435],
[ 0.12 , 0.32 , 0.435]]])
plt.imshow(boy)
plt.show()
A 2-D array can also represent an image; a 2-D image is grayscale
# A 2-D array can also represent an image; a 2-D image is grayscale
boy2 = np.array([[0.1,0.2,0.3,0.4],
[0.6,0.3,0.2,0.5],
[0.9,0.8,0.3,0.2]])
boy2
array([[ 0.1, 0.2, 0.3, 0.4],
[ 0.6, 0.3, 0.2, 0.5],
[ 0.9, 0.8, 0.3, 0.2]])
plt.imshow(boy2,cmap="gray")
plt.show()
Cropping an image: taking out part of it
# Crop the image
g = girl[:200,:300]
plt.imshow(g)
plt.show()
1) np.ones(shape, dtype=None, order='C')
np.ones((2,3,3,4,5))
# shape is the shape of the array, given as a tuple or list; each entry
# is the number of elements along that dimension of the new array
array([[[[[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]],
[[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]],
[[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]]],
[[[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]],
[[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]],
[[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]]],
[[[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]],
[[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]],
[[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]]]],
[[[[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]],
[[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]],
[[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]]],
[[[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]],
[[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]],
[[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]]],
[[[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]],
[[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]],
[[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1.]]]]])
ones = np.ones((168,233,3))
plt.imshow(ones)
plt.show()
2) np.zeros(shape, dtype="float", order="C")
np.zeros((1,2,3))
array([[[ 0., 0., 0.],
[ 0., 0., 0.]]])
3) np.full(shape, fill_value, dtype=None)
np.full((2,3),12)
array([[12, 12, 12],
[12, 12, 12]])
4) np.eye(N, M=None, k=0, dtype='float')
np.eye(6)
array([[ 1., 0., 0., 0., 0., 0.],
[ 0., 1., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 0., 1.]])
np.eye(3,4)
array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.]])
np.eye(5,4)
array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.],
[ 0., 0., 0., 0.]])
5) np.linspace(start, stop, num=50)
np.linspace(1,10,num=100)
# Returns num evenly spaced points from start to stop, both endpoints included
array([ 1. , 1.09090909, 1.18181818, 1.27272727,
1.36363636, 1.45454545, 1.54545455, 1.63636364,
1.72727273, 1.81818182, 1.90909091, 2. ,
2.09090909, 2.18181818, 2.27272727, 2.36363636,
2.45454545, 2.54545455, 2.63636364, 2.72727273,
2.81818182, 2.90909091, 3. , 3.09090909,
3.18181818, 3.27272727, 3.36363636, 3.45454545,
3.54545455, 3.63636364, 3.72727273, 3.81818182,
3.90909091, 4. , 4.09090909, 4.18181818,
4.27272727, 4.36363636, 4.45454545, 4.54545455,
4.63636364, 4.72727273, 4.81818182, 4.90909091,
5. , 5.09090909, 5.18181818, 5.27272727,
5.36363636, 5.45454545, 5.54545455, 5.63636364,
5.72727273, 5.81818182, 5.90909091, 6. ,
6.09090909, 6.18181818, 6.27272727, 6.36363636,
6.45454545, 6.54545455, 6.63636364, 6.72727273,
6.81818182, 6.90909091, 7. , 7.09090909,
7.18181818, 7.27272727, 7.36363636, 7.45454545,
7.54545455, 7.63636364, 7.72727273, 7.81818182,
7.90909091, 8. , 8.09090909, 8.18181818,
8.27272727, 8.36363636, 8.45454545, 8.54545455,
8.63636364, 8.72727273, 8.81818182, 8.90909091,
9. , 9.09090909, 9.18181818, 9.27272727,
9.36363636, 9.45454545, 9.54545455, 9.63636364,
9.72727273, 9.81818182, 9.90909091, 10. ])
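The spacing in the output above follows from the endpoints being included: 100 points leave 99 gaps, so the step is 9/99. A small sketch to confirm this (variable names are illustrative):

```python
import numpy as np

start, stop, num = 1, 10, 100
points = np.linspace(start, stop, num=num)

# num points including both endpoints means num - 1 equal gaps
step = (stop - start) / (num - 1)

print(len(points), points[0], points[-1])
print(step, np.allclose(np.diff(points), step))
```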
np.logspace(1,10,num=10)
# The exponents from 1 to 10 are split into 10 evenly spaced values (1, 2, 3, ..., 10)
# log10(x) = 1, log10(x) = 2, log10(x) = 3, ... => returns 10^1, 10^2, ..., 10^10
array([ 1.00000000e+01, 1.00000000e+02, 1.00000000e+03,
1.00000000e+04, 1.00000000e+05, 1.00000000e+06,
1.00000000e+07, 1.00000000e+08, 1.00000000e+09,
1.00000000e+10])
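Put differently, np.logspace applies np.linspace to the exponents and then raises the base to them. The sketch below checks that relationship; the base-2 line shows the optional `base` parameter, which the original notebook does not use.

```python
import numpy as np

exponents = np.linspace(1, 10, num=10)   # 1, 2, ..., 10
powers = np.logspace(1, 10, num=10)      # 10**1, 10**2, ..., 10**10

same = np.allclose(powers, 10.0 ** exponents)
print(same)

# With base=2 the results are powers of two instead of powers of ten
powers_of_two = np.logspace(0, 3, num=4, base=2)
print(powers_of_two)
```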
6) np.arange([start,] stop[, step,], dtype=None); the parts in "[]" are optional
np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
np.arange(2,12)
array([ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
np.arange(2,12,2)
array([ 2, 4, 6, 8, 10])
7) np.random.randint(low, high=None, size=None, dtype='l')
np.random.randint(3,10,size=(10,10,3))
# Generate an array of random integers in the half-open range [low, high)
array([[[4, 6, 6],
[5, 9, 4],
[5, 9, 6],
[4, 6, 4],
[7, 4, 9],
[5, 9, 4],
[8, 6, 3],
[7, 5, 8],
[8, 3, 4],
[5, 4, 8]],
[[6, 5, 8],
[9, 3, 5],
[8, 4, 4],
[5, 9, 8],
[8, 5, 6],
[9, 4, 6],
[5, 8, 8],
[5, 7, 6],
[3, 7, 9],
[5, 5, 7]],
[[4, 7, 5],
[9, 4, 9],
[3, 3, 4],
[8, 4, 8],
[3, 6, 3],
[4, 4, 3],
[4, 4, 5],
[5, 5, 4],
[5, 7, 9],
[4, 4, 9]],
[[6, 3, 8],
[5, 9, 6],
[5, 6, 7],
[3, 8, 6],
[3, 7, 8],
[6, 9, 7],
[6, 7, 3],
[7, 5, 4],
[3, 3, 6],
[9, 9, 7]],
[[3, 5, 6],
[7, 4, 6],
[5, 3, 7],
[3, 6, 3],
[8, 3, 8],
[7, 9, 7],
[8, 7, 9],
[4, 7, 5],
[8, 8, 6],
[4, 5, 4]],
[[4, 4, 9],
[9, 8, 7],
[6, 6, 6],
[4, 9, 5],
[6, 9, 6],
[9, 4, 8],
[4, 7, 9],
[9, 4, 9],
[6, 9, 3],
[8, 5, 9]],
[[7, 6, 3],
[4, 5, 4],
[5, 6, 7],
[7, 3, 4],
[7, 4, 8],
[7, 5, 6],
[4, 9, 9],
[4, 4, 8],
[9, 3, 6],
[3, 6, 9]],
[[7, 7, 4],
[8, 6, 3],
[3, 8, 7],
[5, 6, 9],
[5, 8, 4],
[9, 4, 4],
[3, 6, 6],
[6, 7, 4],
[4, 8, 8],
[4, 6, 3]],
[[7, 4, 9],
[5, 3, 7],
[5, 9, 4],
[5, 7, 9],
[7, 6, 6],
[6, 3, 3],
[9, 4, 4],
[5, 3, 4],
[5, 7, 9],
[3, 3, 5]],
[[7, 3, 8],
[7, 6, 8],
[5, 7, 4],
[4, 4, 7],
[4, 5, 9],
[8, 3, 5],
[5, 9, 9],
[6, 3, 7],
[9, 5, 7],
[8, 5, 9]]])
8) np.random.randn(d0, d1, ..., dn)
Generates an array of shape (d0, d1, ..., dn) whose values are drawn from the standard normal distribution
np.random.randn(2,3,10)
# N(0,1)
array([[[-0.03414751, -1.01771263, 1.12067965, -0.43953023, -1.82364645,
-0.0971702 , -0.65734554, -0.10303229, 1.52904104, -0.48624526],
[-0.29295679, -1.09430988, 0.07499788, 0.31664607, 0.3500672 ,
-0.18508775, 1.75620537, 0.71531162, 0.6161491 , -1.22053836],
[ 0.7323965 , 0.20671506, -0.58314419, -0.16540522, -0.23903187,
1.27785655, 0.26691062, -1.45973265, -0.27273178, -1.02878312]],
[[ 0.07655004, -0.35616184, -0.46353849, -1.8515281 , -0.26543777,
0.76412627, 0.83337437, 0.04521198, -2.10686009, 0.84883742],
[ 0.22188875, 0.63737544, 0.26173337, -0.11475485, -1.30431707,
1.25062924, 2.03032414, 0.13742253, -0.98713219, 1.19711129],
[ 0.69212245, 0.70550039, -1.15995398, -0.95507681, -0.39439139,
2.76551965, 0.56088858, 0.54709151, 1.17615801, 0.17744971]]])
9) np.random.normal(loc=0.0, scale=1.0, size=None)
np.random.normal(175,20,size=100)
# Draw 100 samples from a normal distribution with mean 175 and standard deviation 20
array([ 174.44281329, 177.66402876, 162.76426831, 210.11244283,
161.26671985, 209.52372115, 159.92703726, 197.83048917,
190.60230978, 170.27114821, 202.67422923, 203.04492988,
171.13235245, 175.64710565, 200.40533303, 207.930948 ,
141.09792492, 158.87495159, 176.74197674, 164.57884322,
181.22386631, 156.26287142, 133.37408465, 178.07588597,
187.50842048, 186.35236779, 153.61560634, 145.53831704,
232.55949685, 142.01340562, 195.22465693, 188.922162 ,
170.02159668, 167.74728882, 173.27258287, 187.68132279,
217.7260755 , 158.28833839, 155.11568289, 200.26945864,
178.91552559, 149.21007505, 200.6454259 , 169.37529856,
201.18878627, 184.37773296, 196.67909536, 144.10223051,
184.63682023, 167.86858875, 191.08394709, 169.98017168,
204.05198975, 199.65286793, 176.22452948, 181.17515804,
178.81440955, 176.79845708, 189.50950157, 136.05787608,
199.35198398, 162.43654974, 155.61396415, 172.22147069,
181.91161368, 192.82571507, 203.70689642, 190.79312957,
204.48924027, 180.48880551, 176.81359193, 145.87844077,
190.13853094, 160.22281705, 200.04783678, 165.19927728,
184.10218694, 178.27524256, 191.58148162, 141.4792985 ,
208.4723939 , 163.70082179, 142.70675324, 189.25398816,
183.53849685, 150.86998696, 172.04187127, 207.12343336,
190.10648007, 188.18995666, 175.43040298, 183.79396855,
172.60260342, 195.1083776 , 194.70719705, 163.10904061,
146.78089275, 195.2271401 , 201.60339544, 164.91176955])
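To check that loc is the mean and scale is the standard deviation (not the variance), one can draw a larger sample and compare its statistics. This sketch uses a seeded np.random.RandomState so the run is repeatable; the seed and sample size are arbitrary choices, not taken from the original.

```python
import numpy as np

rng = np.random.RandomState(0)            # fixed seed, chosen arbitrarily
heights = rng.normal(175, 20, size=100000)

sample_mean = heights.mean()
sample_std = heights.std()
print(sample_mean, sample_std)            # both should be close to 175 and 20
```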
10) np.random.random(size=None)
np.random.random(size=(12,1)) # floats in the range [0, 1)
array([[ 0.54080763],
[ 0.95618258],
[ 0.19457156],
[ 0.12198452],
[ 0.3423529 ],
[ 0.01716331],
[ 0.28061005],
[ 0.51960339],
[ 0.60122982],
[ 0.26462352],
[ 0.85645091],
[ 0.32352418]])
Exercise: generate an image from random numbers
boy = np.random.random(size=(667,568,3))
plt.imshow(boy)
plt.show()
Common array attributes:
number of dimensions ndim, number of elements size, shape shape, element type dtype, bytes per element itemsize, data buffer data
tigger = plt.imread("./source/tigger.jpg")
# 1. Number of dimensions
tigger.ndim
3
# 2. Size: the total number of elements in the array
tigger.size
2829600
# 3. Shape
tigger.shape
(786, 1200, 3)
# 4. Element type
tigger.dtype
dtype('uint8')
# 5. Size of each element (in bytes)
tigger.itemsize
1
t = tigger / 255.0
t.dtype
dtype('float64')
t.itemsize
8
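These attributes are linked: size is the product of the shape, and the total memory used is size times itemsize. The sketch below uses an all-zero stand-in with the same shape and dtype as the tiger image, since the image file itself is not available here.

```python
import numpy as np

# Stand-in for the tiger image: same shape and dtype, but all zeros
pixels = np.zeros((786, 1200, 3), dtype=np.uint8)
scaled = pixels / 255.0                    # division promotes uint8 to float64

print(pixels.ndim, pixels.size, pixels.itemsize, pixels.nbytes)
print(scaled.dtype, scaled.itemsize, scaled.nbytes)
```

Dividing by 255.0 makes the array eight times larger in memory, because each element grows from 1 byte to 8 bytes.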
# 6. data
tigger.data
l = [1,2,3,4,5,6]
l[5]
l[-1]
l[0]
l[-6]
# Counting from the front starts at 0; counting from the back starts at -1
1
nd = np.random.randint(0,10,size=(4))
nd
array([9, 6, 1, 7])
nd[0]
nd[1]
nd[-3]
6
lp = [[1,2,3],
[4,5,6],
[7,8]]
lp[1][2]
6
np.array(lp)
array([list([1, 2, 3]), list([4, 5, 6]), list([7, 8])], dtype=object)
np.array(lp)
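# If the sublists of a 2-D list have different lengths, each sublist is stored as a list object
# (newer numpy versions raise an error here unless dtype=object is passed)
# [Note] every dimension of an array must have a consistent number of elements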
array([list([1, 2, 3]), list([4, 5, 6]), list([7, 8])], dtype=object)
nd = np.random.randint(0,10,size=(4,4))
nd
#[[2,2,1],[1,2,1]]
array([[7, 9, 2, 3],
[0, 2, 7, 3],
[1, 9, 0, 1],
[4, 1, 2, 8]])
nd[1][3]
# Chained indexing: first index the leading dimension to get a sub-array, then index into that sub-array
3
Unlike a list
nd[1,3]
# Single indexing: look up position (1, 3) directly
3
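For reading a single element the two styles give the same value. They can differ on assignment when the first index is a list, because the chained form writes into a temporary copy. A minimal sketch using the same 4x4 values as above (the `work` array and the -1 marker are illustrative assumptions):

```python
import numpy as np

nd = np.array([[7, 9, 2, 3],
               [0, 2, 7, 3],
               [1, 9, 0, 1],
               [4, 1, 2, 8]])

chained = nd[1][3]   # two steps: take row 1, then element 3 of that row
direct = nd[1, 3]    # one step: element at (1, 3)

work = nd.copy()
work[[0, 1]][0, 0] = -1          # work[[0, 1]] is a new array, so work is unchanged
after_chained = work[0, 0]

work[[0, 1], 0] = -1             # single index: writes into work itself
after_direct = work[0, 0]

print(chained, direct, after_chained, after_direct)
```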
lp[1,3] # a list cannot be indexed this way
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in ()
----> 1 lp[1,3] # a list cannot be indexed this way
TypeError: list indices must be integers or slices, not tuple
nd[[1,1,2,3,1,2]]
# Indexing with a list: rows are picked in the order given by the list
array([[0, 2, 7, 3],
[0, 2, 7, 3],
[1, 9, 0, 1],
[4, 1, 2, 8],
[0, 2, 7, 3],
[1, 9, 0, 1]])
lp[[1,1]] # a list cannot be indexed with a list
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in ()
----> 1 lp[[1,1]] # a list cannot be indexed with a list
TypeError: list indices must be integers or slices, not list
nd[[1,2,2,2]][[0,1,2]]
array([[0, 2, 7, 3],
[1, 9, 0, 1],
[1, 9, 0, 1]])
nd[[2,2,1]]
array([[1, 9, 0, 1],
[1, 9, 0, 1],
[0, 2, 7, 3]])
nd[[2,2,1,1],[1,2,1,1]]
array([9, 0, 2, 2])
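When two index lists are given, numpy pairs them element by element, so the result above is the values at (2,1), (2,2), (1,1) and (1,1). The sketch below spells that out with the same array. np.ix_ is shown as one way to get every row/column combination instead; it is not used in the original notebook.

```python
import numpy as np

nd = np.array([[7, 9, 2, 3],
               [0, 2, 7, 3],
               [1, 9, 0, 1],
               [4, 1, 2, 8]])

rows = [2, 2, 1, 1]
cols = [1, 2, 1, 1]

paired = nd[rows, cols]                                    # one element per (row, col) pair
manual = np.array([nd[r, c] for r, c in zip(rows, cols)])  # the same lookup done by hand

# All combinations of rows [2, 1] with columns [1, 2]: a 2x2 block
block = nd[np.ix_([2, 1], [1, 2])]

print(paired, manual)
print(block)
```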
nd
array([[7, 9, 2, 3],
[0, 2, 7, 3],
[1, 9, 0, 1],
[4, 1, 2, 8]])
nd[0:100] # half-open interval [start, stop); a stop beyond the end is clipped to the array length
array([[7, 9, 2, 3],
[0, 2, 7, 3],
[1, 9, 0, 1],
[4, 1, 2, 8]])
lp[0:100]
[[1, 2, 3], [4, 5, 6], [7, 8]]
nd[:2]
array([[7, 9, 2, 3],
[0, 2, 7, 3]])
nd[1:]
array([[0, 2, 7, 3],
[1, 9, 0, 1],
[4, 1, 2, 8]])
nd[3:0:-1]
# A negative step walks backwards, so start must be greater than stop
array([[4, 1, 2, 8],
[1, 9, 0, 1],
[0, 2, 7, 3]])
nd
array([[7, 9, 2, 3],
[0, 2, 7, 3],
[1, 9, 0, 1],
[4, 1, 2, 8]])
nd[:,0::2]
array([[7, 2],
[0, 7],
[1, 0],
[4, 2]])
nd[1:3,0:2] # slice rows and columns at the same time
array([[0, 2],
[1, 9]])
Flip girl upside down (a negative step reverses an axis; a step of -2 also keeps only every other pixel)
girl
array([[[225, 231, 231],
[229, 235, 235],
[222, 228, 228],
...,
[206, 213, 162],
[211, 213, 166],
[217, 220, 173]],
[[224, 230, 230],
[229, 235, 235],
[223, 229, 229],
...,
[206, 213, 162],
[211, 213, 166],
[217, 220, 173]],
[[224, 230, 230],
[229, 235, 235],
[223, 229, 229],
...,
[206, 213, 162],
[211, 213, 166],
[219, 221, 174]],
...,
[[175, 187, 213],
[180, 192, 218],
[175, 187, 213],
...,
[155, 162, 180],
[153, 160, 178],
[156, 163, 181]],
[[175, 187, 213],
[180, 192, 218],
[174, 186, 212],
...,
[155, 162, 180],
[153, 160, 178],
[155, 162, 180]],
[[177, 189, 215],
[181, 193, 219],
[174, 186, 212],
...,
[155, 162, 180],
[153, 160, 178],
[156, 163, 181]]], dtype=uint8)
plt.imshow(girl[::-2,::-2])
plt.show()
t = tigger.copy() # keep a copy of the original image
plt.imshow(tigger)
plt.show()
girl2 = plt.imread("./source/girl2.jpg")
plt.imshow(girl2)
plt.show()
# Cut a hole in the tiger image and fill it with girl2 (girl2 must fit the 300x300 region)
tigger[150:450,300:600] = girl2
plt.imshow(tigger)
plt.show()
reshape()
resize()
tigger.shape
(786, 1200, 3)
nd = np.random.randint(0,10,size=12)
nd
array([4, 0, 1, 1, 8, 7, 7, 5, 3, 0, 7, 3])
nd.shape
(12,)
nd.reshape((3,2,2,1)) # the argument is a tuple giving the target shape for nd
array([[[[4],
[0]],
[[1],
[1]]],
[[[8],
[7]],
[[7],
[5]]],
[[[3],
[0]],
[[7],
[3]]]])
nd
array([4, 0, 1, 1, 8, 7, 7, 5, 3, 0, 7, 3])
nd.reshape((3,2)) # cannot reshape array of size 12 into shape (3,2)
# The size must stay the same when reshaping
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in ()
----> 1 nd.reshape((3,2)) # cannot reshape array of size 12 into shape (3,2)
ValueError: cannot reshape array of size 12 into shape (3,2)
nd.resize((2,6))
nd
array([[4, 0, 1, 1, 8, 7],
[7, 5, 3, 0, 7, 3]])
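[Note]
1) The size of the array must be the same before and after reshaping; otherwise the reshape fails.
2) reshape() leaves the shape of the original array unchanged and returns the reshaped result as a new array object (where possible this is a view that shares the original data rather than a full copy).
3) resize() reshapes the original array in place and does not return a result.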
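The difference between the two calls can be seen in a short experiment. This is a sketch with illustrative variable names; np.shares_memory is used only to show that the reshaped result here is a view of the same data.

```python
import numpy as np

nd = np.arange(12)

reshaped = nd.reshape((3, 4))        # returns a new array object
original_shape = nd.shape            # nd itself still has shape (12,)
shares = np.shares_memory(nd, reshaped)

reshaped[0, 0] = 99                  # writing through the view changes nd too
first_value = nd[0]

other = np.arange(12)
returned = other.resize((2, 6))      # reshapes other in place, returns None

print(original_shape, reshaped.shape, shares, first_value)
print(returned, other.shape)
```

Because the reshaped array can share data with the original, call `.copy()` on it when an independent array is needed.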
Concatenation: joining two arrays along a specified dimension
nd1 = np.random.randint(0,10,size=(4,4))
nd2 = np.random.randint(20,40,size=(3,4))
print(nd1)
print(nd2)
[[2 5 6 1]
[4 8 0 5]
[9 4 7 8]
[4 3 0 8]]
[[38 22 25 38]
[22 38 30 21]
[23 34 28 26]]
# Concatenate the two arrays
np.concatenate([nd1,nd2],axis=0)
# Argument 1 is a list (or tuple) of the arrays to concatenate
# axis defaults to 0, which joins along rows (dimension 0); 1 joins along columns (dimension 1)
array([[ 2, 5, 6, 1],
[ 4, 8, 0, 5],
[ 9, 4, 7, 8],
[ 4, 3, 0, 8],
[38, 22, 25, 38],
[22, 38, 30, 21],
[23, 34, 28, 26]])
np.concatenate([nd1,nd2],axis=1)
# Concatenating along columns requires the same number of rows
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in ()
----> 1 np.concatenate([nd1,nd2],axis=1)
ValueError: all the input array dimensions except for the concatenation axis must match exactly
nd3 = np.random.randint(0,10,size=(4,3))
nd3
array([[1, 3, 7],
[9, 5, 3],
[9, 0, 2],
[0, 7, 4]])
nd1
array([[2, 5, 6, 1],
[4, 8, 0, 5],
[9, 4, 7, 8],
[4, 3, 0, 8]])
np.concatenate([nd1,nd3])
# The column counts differ, so the arrays cannot be concatenated along rows
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in ()
----> 1 np.concatenate([nd1,nd3])
ValueError: all the input array dimensions except for the concatenation axis must match exactly
np.concatenate([nd1,nd3],axis=1)
array([[2, 5, 6, 1, 1, 3, 7],
[4, 8, 0, 5, 9, 5, 3],
[9, 4, 7, 8, 9, 0, 2],
[4, 3, 0, 8, 0, 7, 4]])
1) Arrays can be concatenated only when their shapes are compatible
nd4 = np.random.randint(0,10,size=(1,2,3))
nd5 = np.random.randint(0,10,size=(1,4,3))
print(nd4)
print(nd5)
[[[2 9 8]
[9 5 6]]]
[[[9 9 6]
[8 3 4]
[8 7 7]
[0 6 6]]]
np.concatenate([nd4,nd5],axis=1)
array([[[2, 9, 8],
[9, 5, 6],
[9, 9, 6],
[8, 3, 4],
[8, 7, 7],
[0, 6, 6]]])
nd6 = np.random.randint(0,10,size=4)
nd6
array([3, 5, 3, 6])
2) Arrays with different numbers of dimensions cannot be concatenated
np.concatenate([nd1,nd6])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in ()
----> 1 np.concatenate([nd1,nd6])
ValueError: all the input arrays must have same number of dimensions
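Points to note when concatenating:
1) The number of dimensions must be the same.
2) The shapes must be compatible (after removing the dimension given by axis, the remaining shapes must be identical).
3) The direction is set by axis, which defaults to 0.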
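The three points above can be turned into a small check. `can_concatenate` is a hypothetical helper written for this illustration, not a numpy function; it only restates the rule and does not replace the error numpy raises.

```python
import numpy as np

def can_concatenate(x, y, axis=0):
    """Hypothetical helper: True if x and y satisfy the concatenation rules."""
    if x.ndim != y.ndim:                 # rule 1: same number of dimensions
        return False
    shape_x = list(x.shape)
    shape_y = list(y.shape)
    del shape_x[axis]                    # rule 2: ignore the concatenation axis...
    del shape_y[axis]
    return shape_x == shape_y            # ...and compare what is left

a = np.zeros((4, 4))
b = np.ones((3, 4))
c = np.ones((4, 3))
d = np.ones(4)

rows_joined = np.concatenate([a, b], axis=0)   # shapes (4,4) + (3,4) -> (7,4)
cols_joined = np.concatenate([a, c], axis=1)   # shapes (4,4) + (4,3) -> (4,7)

print(rows_joined.shape, cols_joined.shape)
print(can_concatenate(a, b, 1), can_concatenate(a, c, 0), can_concatenate(a, d))
```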
For 2-D arrays there are also hstack and vstack
nd = np.random.randint(0,10,size=(10,1))
nd
array([[1],
[7],
[6],
[9],
[0],
[4],
[6],
[2],
[0],
[8]])
np.hstack(nd)
array([1, 7, 6, 9, 0, 4, 6, 2, 0, 8])
nd1 = np.random.randint(0,10,size=(10,2))
nd1
array([[4, 4],
[3, 1],
[3, 3],
[9, 6],
[5, 1],
[4, 7],
[3, 3],
[4, 3],
[7, 9],
[6, 5]])
np.hstack(nd1)
array([4, 4, 3, 1, 3, 3, 9, 6, 5, 1, 4, 7, 3, 3, 4, 3, 7, 9, 6, 5])
np.vstack(nd1)
array([[4, 4],
[3, 1],
[3, 3],
[9, 6],
[5, 1],
[4, 7],
[3, 3],
[4, 3],
[7, 9],
[6, 5]])
nd2 = np.random.randint(0,10,size=10)
nd2
array([1, 7, 4, 3, 9, 0, 3, 3, 2, 5])
np.vstack(nd2)
array([[1],
[7],
[4],
[3],
[9],
[0],
[3],
[3],
[2],
[5]])
np.hstack(nd2)
array([1, 7, 4, 3, 9, 0, 3, 3, 2, 5])
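hstack() turns a column array into a row array, flattening a 2-D array to 1-D (when given a single 2-D array, its rows are treated as the arrays to join side by side)
vstack() turns a row array into a column array, making a 1-D array 2-D (each element of the 1-D array becomes one row)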
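The examples above pass a single array to each function. The more common use is to pass a list of arrays, in which case hstack joins them side by side and vstack stacks them on top of each other. A minimal sketch with made-up values:

```python
import numpy as np

left = np.array([[1, 2],
                 [3, 4]])
right = np.array([[5],
                  [6]])
bottom = np.array([[7, 8]])

side_by_side = np.hstack([left, right])    # same row count, columns are appended
stacked = np.vstack([left, bottom])        # same column count, rows are appended

# For 2-D inputs these match concatenate along axis 1 and axis 0
same_h = np.array_equal(side_by_side, np.concatenate([left, right], axis=1))
same_v = np.array_equal(stacked, np.concatenate([left, bottom], axis=0))

print(side_by_side)
print(stacked)
print(same_h, same_v)
```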
Splitting cuts one array into several
vsplit()
hsplit()
split()
nd = np.random.randint(0,100,size=(5,6))
nd
array([[17, 47, 83, 33, 69, 24],
[60, 4, 34, 29, 75, 60],
[33, 55, 67, 1, 76, 82],
[31, 92, 1, 14, 83, 95],
[59, 88, 81, 49, 70, 11]])
# Split horizontally (by columns)
np.hsplit(nd,[1,4,5,8,9])
# Argument 1 is the array to split; argument 2 is a list of the split positions
[array([[17],
[60],
[33],
[31],
[59]]), array([[47, 83, 33],
[ 4, 34, 29],
[55, 67, 1],
[92, 1, 14],
[88, 81, 49]]), array([[69],
[75],
[76],
[83],
[70]]), array([[24],
[60],
[82],
[95],
[11]]), array([], shape=(5, 0), dtype=int32), array([], shape=(5, 0), dtype=int32)]
# Split vertically (by rows)
np.vsplit(nd,[1,3,5])
[array([[17, 47, 83, 33, 69, 24]]), array([[60, 4, 34, 29, 75, 60],
[33, 55, 67, 1, 76, 82]]), array([[31, 92, 1, 14, 83, 95],
[59, 88, 81, 49, 70, 11]]), array([], shape=(0, 6), dtype=int32)]
The split() function
nd
array([[17, 47, 83, 33, 69, 24],
[60, 4, 34, 29, 75, 60],
[33, 55, 67, 1, 76, 82],
[31, 92, 1, 14, 83, 95],
[59, 88, 81, 49, 70, 11]])
np.split(nd,[1,2],axis=0)
# axis defaults to 0, which splits along dimension 0; 1 splits along dimension 1
[array([[17, 47, 83, 33, 69, 24]]),
array([[60, 4, 34, 29, 75, 60]]),
array([[33, 55, 67, 1, 76, 82],
[31, 92, 1, 14, 83, 95],
[59, 88, 81, 49, 70, 11]])]
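The split positions act like slice boundaries: [1, 2] along axis 0 produces rows [0:1], [1:2] and [2:]. The sketch below checks this on a stand-in 5x6 array and also shows the integer form, which asks for that many equal pieces.

```python
import numpy as np

nd = np.arange(30).reshape(5, 6)          # stand-in for the random 5x6 array above

parts = np.split(nd, [1, 2], axis=0)      # rows [0:1], [1:2], [2:]
part_shapes = [p.shape for p in parts]

# Concatenating the pieces along the same axis restores the original
restored = np.concatenate(parts, axis=0)

# An integer second argument means "this many equal pieces"
equal_parts = np.split(nd, 3, axis=1)     # 6 columns -> three pieces of 2 columns
equal_shapes = [p.shape for p in equal_parts]

print(part_shapes, equal_shapes)
```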
Generalization
nd1 = np.random.randint(0,10,size=(3,4,5))
nd1
array([[[5, 7, 8, 7, 9],
[3, 6, 1, 9, 0],
[6, 0, 2, 6, 9],
[4, 5, 5, 3, 9]],
[[6, 7, 6, 2, 3],
[3, 0, 0, 5, 3],
[9, 9, 0, 6, 2],
[5, 4, 5, 4, 4]],
[[8, 7, 4, 8, 9],
[2, 2, 1, 7, 3],
[2, 2, 9, 4, 7],
[7, 3, 9, 4, 1]]])
np.split(nd1,[2],axis=2)
[array([[[5, 7],
[3, 6],
[6, 0],
[4, 5]],
[[6, 7],
[3, 0],
[9, 9],
[5, 4]],
[[8, 7],
[2, 2],
[2, 2],
[7, 3]]]), array([[[8, 7, 9],
[1, 9, 0],
[2, 6, 9],
[5, 3, 9]],
[[6, 2, 3],
[0, 5, 3],
[0, 6, 2],
[5, 4, 4]],
[[4, 8, 9],
[1, 7, 3],
[9, 4, 7],
[9, 4, 1]]])]
nd = np.random.randint(0,100,size=6)
nd
array([34, 69, 14, 2, 48, 74])
nd1 = nd
# Assigning an array to another name only copies the reference; the array object itself is not copied
nd1
array([34, 69, 14, 2, 48, 74])
nd1[0] = 100
nd1
array([100, 69, 14, 2, 48, 74])
nd
array([100, 69, 14, 2, 48, 74])
nd2 = nd.copy()
# copy() makes a copy of the array that nd refers to and binds nd2 to that copy
nd2[0] = 200000
nd
array([100, 69, 14, 2, 48, 74])
nd1
array([100, 69, 14, 2, 48, 74])
nd2
array([200000, 69, 14, 2, 48, 74])
Discussion: is a copy made when an array is created from a list?
l = [1,2,3]
l
[1, 2, 3]
nd = np.array(l)
nd
array([1, 2, 3])
nd[0] = 1000
l
[1, 2, 3]
Explanation: creating an array from a list copies the list's contents, converts the elements to a common type, and stores them in the array object
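The three cases discussed here (plain assignment, copy(), and creation from a list) can be compared side by side. This is a sketch with illustrative names; np.shares_memory reports whether two arrays use the same underlying data.

```python
import numpy as np

source = [1, 2, 3]
arr = np.array(source)        # the list contents are copied into a new array

alias = arr                   # same array object under a second name
independent = arr.copy()      # separate array with its own data

alias[0] = 100                # visible through arr, not through the others
independent[1] = -5           # affects only the copy

print(source, arr, alias, independent)
print(alias is arr, np.shares_memory(arr, independent))
```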
Aggregation means computing a summary property over the data in an array
nd = np.random.randint(0,10,size=(3,4))
nd
array([[5, 9, 6, 8],
[3, 7, 1, 9],
[5, 7, 6, 3]])
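nd.sum() # full aggregation
69
nd.sum(axis=0) # aggregate over rows (dimension 0), giving one sum per column
array([13, 23, 13, 20])
nd.sum(axis=1) # aggregate over columns (dimension 1), giving one sum per row
array([28, 20, 21])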
array([28, 20, 21])
nd = np.random.randint(0,10,size=(2,3,4))
nd
array([[[1, 0, 0, 3],
[9, 6, 1, 8],
[4, 9, 3, 9]],
[[8, 0, 4, 3],
[3, 0, 1, 8],
[8, 0, 7, 4]]])
nd.sum()
99
nd.sum(axis=0)
array([[ 9, 0, 4, 6],
[12, 6, 2, 16],
[12, 9, 10, 13]])
nd.sum(axis=2)
array([[ 4, 24, 25],
[15, 12, 19]])
The rule for aggregation: axis selects the axis to aggregate over; with axis=x, dimension x disappears and the elements along it are combined
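A quick way to see the rule is to look only at the shapes. The sketch below uses a stand-in array of shape (2, 3, 4) and records the result shape for each axis; the tuple form removes several dimensions at once.

```python
import numpy as np

nd = np.arange(24).reshape(2, 3, 4)       # stand-in with shape (2, 3, 4)

shape_axis0 = nd.sum(axis=0).shape        # dimension 0 disappears
shape_axis1 = nd.sum(axis=1).shape        # dimension 1 disappears
shape_axis2 = nd.sum(axis=2).shape        # dimension 2 disappears
shape_two_axes = nd.sum(axis=(1, 2)).shape

total = nd.sum()                          # no axis: everything collapses to one number

print(shape_axis0, shape_axis1, shape_axis2, shape_two_axes, total)
```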
nd1 = np.random.randint(0,10,size=(2,3,4,5))
nd1
array([[[[3, 2, 9, 4, 0],
[1, 0, 2, 3, 7],
[4, 8, 6, 6, 5],
[2, 3, 4, 1, 5]],
[[3, 2, 0, 1, 3],
[7, 3, 3, 4, 1],
[0, 4, 0, 6, 9],
[3, 8, 6, 0, 5]],
[[5, 1, 3, 5, 0],
[1, 4, 1, 8, 0],
[9, 1, 9, 6, 5],
[6, 1, 8, 5, 1]]],
[[[7, 5, 3, 4, 5],
[7, 8, 6, 7, 2],
[9, 9, 5, 3, 4],
[9, 2, 9, 7, 2]],
[[3, 2, 9, 7, 7],
[0, 8, 1, 3, 0],
[1, 5, 5, 6, 5],
[4, 8, 7, 2, 9]],
[[1, 3, 5, 0, 6],
[6, 0, 3, 5, 6],
[2, 4, 6, 9, 0],
[8, 7, 4, 0, 6]]]])
Approach 1
nd1.sum(axis=2).sum(axis=2)
array([[ 75, 68, 79],
[113, 92, 81]])
Approach 2
nd1.sum(axis=-1).sum(axis=-1)
array([[ 75, 68, 79],
[113, 92, 81]])
Approach 3
nd1.sum(axis=(-1,-2))
array([[ 75, 68, 79],
[113, 92, 81]])
nd
array([[[1, 0, 0, 3],
[9, 6, 1, 8],
[4, 9, 3, 9]],
[[8, 0, 4, 3],
[3, 0, 1, 8],
[8, 0, 7, 4]]])
nd.sum(axis=-1)
array([[ 4, 24, 25],
[15, 12, 19]])
nd.max()
9
nd.max(axis=-1)
array([[3, 9, 9],
[8, 8, 8]])
nd.max(axis=1)
array([[9, 9, 3, 9],
[8, 0, 7, 8]])
nd.min(axis=0)
array([[1, 0, 0, 3],
[3, 0, 1, 8],
[4, 0, 3, 4]])
Function Name    NaN-safe Version    Description
np.sum           np.nansum           Compute sum of elements
np.prod          np.nanprod          Compute product of elements
np.mean          np.nanmean          Compute mean of elements
np.std           np.nanstd           Compute standard deviation
np.var           np.nanvar           Compute variance
np.min           np.nanmin           Find minimum value
np.max           np.nanmax           Find maximum value
np.argmin        np.nanargmin        Find index of minimum value
np.argmax        np.nanargmax        Find index of maximum value
np.median        np.nanmedian        Compute median of elements
np.percentile    np.nanpercentile    Compute rank-based statistics of elements
np.any           N/A                 Evaluate whether any elements are true
np.all           N/A                 Evaluate whether all elements are true
np.power         N/A                 Element-wise power (not an aggregation)
np.nan
# This value represents missing data; it is a float
type(np.nan) # any arithmetic involving nan yields nan
float
np.nan + 10
nan
np.nan*10
nan
nd2 = np.array([12,23,np.nan,34,np.nan,90])
nd2
array([ 12., 23., nan, 34., nan, 90.])
# Aggregate nd2
nd2.sum(axis=0)
nan
nd2.max()
nan
Ordinary aggregations are thrown off by arrays with missing values, so the NaN-safe versions are needed
np.nansum(nd2)
159.0
np.nanmean(nd2)
39.75
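Aggregation:
1) axis specifies which dimension to aggregate over. By default no axis is given, which means full aggregation (all elements are combined into a single scalar). If axis names a dimension, that dimension disappears and is replaced by the aggregated result.
2) numpy's aggregation functions come in two versions, plain and NaN-safe; the NaN-safe versions simply skip the missing entries when aggregating.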
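Skipping the missing entries is equivalent to filtering them out with a boolean mask before aggregating. The sketch below compares both routes on the same values as nd2; the mask-based version is an illustration, not how the notebook computes it.

```python
import numpy as np

values = np.array([12, 23, np.nan, 34, np.nan, 90])

plain_sum = values.sum()                  # nan: one missing value poisons the result
safe_sum = np.nansum(values)
safe_mean = np.nanmean(values)

# The same result by hand: keep only the entries that are not nan
present = values[~np.isnan(values)]
manual_sum = present.sum()
manual_mean = present.mean()

print(plain_sum, safe_sum, safe_mean, manual_sum, manual_mean)
```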
nd = np.random.randint(0,100,size=(5,5))
nd
array([[70, 76, 87, 23, 68],
[34, 3, 59, 93, 71],
[71, 64, 98, 31, 70],
[59, 17, 71, 99, 50],
[86, 58, 91, 22, 18]])
np.sort(nd,axis=0)
array([[34, 3, 59, 22, 18],
[59, 17, 71, 23, 50],
[70, 58, 87, 31, 68],
[71, 64, 91, 93, 70],
[86, 76, 98, 99, 71]])
np.sort(nd[:,3])
array([22, 23, 31, 93, 99])
nd[[4,0,2,1,3]]
array([[86, 58, 91, 22, 18],
[70, 76, 87, 23, 68],
[71, 64, 98, 31, 70],
[34, 3, 59, 93, 71],
[59, 17, 71, 99, 50]])
ind = np.argsort(nd[:,3]) # returns the indices that would sort the values in ascending order
ind
array([4, 0, 2, 1, 3], dtype=int64)
nd[ind]
array([[86, 58, 91, 22, 18],
[70, 76, 87, 23, 68],
[71, 64, 98, 31, 70],
[34, 3, 59, 93, 71],
[59, 17, 71, 99, 50]])
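The pattern above, argsort on one column followed by indexing with the result, sorts whole rows by that column. The sketch repeats it with the same values and adds a descending variant by reversing the index array, which is an extension not shown in the original.

```python
import numpy as np

nd = np.array([[70, 76, 87, 23, 68],
               [34,  3, 59, 93, 71],
               [71, 64, 98, 31, 70],
               [59, 17, 71, 99, 50],
               [86, 58, 91, 22, 18]])

order = np.argsort(nd[:, 3])        # row indices that put column 3 in ascending order
by_col3 = nd[order]                 # reorder whole rows with those indices
by_col3_desc = nd[order[::-1]]      # reversed indices give descending order

print(order)
print(by_col3[:, 3], by_col3_desc[:, 3])
```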
nd = np.random.randint(0,10,size=(3,3))
nd
array([[7, 4, 6],
[4, 5, 1],
[0, 2, 5]])
nd + nd
array([[14, 8, 12],
[ 8, 10, 2],
[ 0, 4, 10]])
nd + 2 # here the scalar 2 is expanded to a 3x3 matrix filled with 2
array([[9, 6, 8],
[6, 7, 3],
[2, 4, 7]])
nd - 2
array([[ 5, 2, 4],
[ 2, 3, -1],
[-2, 0, 3]])
In mathematics a matrix can be multiplied or divided by a scalar
nd * 4
array([[28, 16, 24],
[16, 20, 4],
[ 0, 8, 20]])
nd / 4
array([[ 1.75, 1. , 1.5 ],
[ 1. , 1.25, 0.25],
[ 0. , 0.5 , 1.25]])
1/nd
C:\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in true_divide
"""Entry point for launching an IPython kernel.
array([[ 0.14285714, 0.25 , 0.16666667],
[ 0.25 , 0.2 , 1. ],
[ inf, 0.5 , 0.2 ]])
nd1 = np.random.randint(0,10,size=(2,3))
nd2 = np.random.randint(0,10,size=(3,3))
print(nd1)
print(nd2)
[[8 3 5]
[3 3 5]]
[[4 1 0]
[1 3 0]
[7 6 7]]
np.dot(nd1,nd2)
array([[70, 47, 35],
[50, 42, 35]])
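When multiplying two matrices A and B (A*B), mathematics requires the number of columns of A to equal the number of rows of B (because each row of A is multiplied with each column of B)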
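The row-times-column rule can be verified on the same two matrices. The sketch computes one entry by hand and shows that reversing the order fails because the shapes no longer line up; the names A, B and C are illustrative.

```python
import numpy as np

A = np.array([[8, 3, 5],
              [3, 3, 5]])            # shape (2, 3)
B = np.array([[4, 1, 0],
              [1, 3, 0],
              [7, 6, 7]])            # shape (3, 3)

C = np.dot(A, B)                     # (2, 3) times (3, 3) gives (2, 3)

# Entry (0, 0) is row 0 of A multiplied with column 0 of B, then summed
entry_00 = (A[0, :] * B[:, 0]).sum()

# B has 3 columns but A has 2 rows, so B times A is not defined
try:
    np.dot(B, A)
    reversed_ok = True
except ValueError:
    reversed_ok = False

print(C)
print(entry_00, reversed_ok)
```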
Broadcasting rules for ndarray (summarized at the end of this section):
nd + nd1
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in ()
----> 1 nd + nd1
ValueError: operands could not be broadcast together with shapes (3,3) (2,3)
nd
array([[7, 4, 6],
[4, 5, 1],
[0, 2, 5]])
nd1 = np.random.randint(0,10,size=3)
nd1
array([1, 8, 6])
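Adding or subtracting a matrix and a vector, a matrix and a scalar, or a vector and a scalar is not defined in mathematics
In code these operations work because of broadcasting, which expands the lower-dimensional operand to the shape of the higher-dimensional one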
nd + nd1
array([[ 8, 12, 12],
[ 5, 13, 7],
[ 1, 10, 11]])
nd1 + 3
array([ 4, 11, 9])
nd2 = np.random.randint(0,10,size=4)
nd2
array([8, 5, 1, 7])
nd1+nd2
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in ()
----> 1 nd1+nd2
ValueError: operands could not be broadcast together with shapes (3,) (4,)
nd + nd2
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in ()
----> 1 nd + nd2
ValueError: operands could not be broadcast together with shapes (3,3) (4,)
nd3 = np.random.randint(0,10,size=(3,1))
nd3
array([[6],
[8],
[6]])
nd +nd3 # nd3 is a column vector; a vector can be broadcast to a matrix
array([[13, 10, 12],
[12, 13, 9],
[ 6, 8, 11]])
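Principles of broadcasting:
1) The missing rows or columns are filled in so that the shapes match.
2) A scalar can be broadcast to any matrix or vector; the scalar fills the whole expanded array.
3) A vector can be broadcast to a matrix of compatible shape (for example, a row vector to a matrix with the same number of columns); the vector's row (or column) is repeated to fill the expanded matrix.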
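One way to picture these principles is to build the expanded operand explicitly and check that it gives the same answer. The sketch uses np.tile to repeat the row and column vectors by hand; numpy does not actually materialize these copies, so this is only a mental model.

```python
import numpy as np

matrix = np.array([[7, 4, 6],
                   [4, 5, 1],
                   [0, 2, 5]])
row = np.array([1, 8, 6])            # shape (3,)
col = np.array([[6], [8], [6]])      # shape (3, 1)

row_sum = matrix + row                               # row is repeated for every row
row_explicit = matrix + np.tile(row, (3, 1))

col_sum = matrix + col                               # col is repeated for every column
col_explicit = matrix + np.tile(col, (1, 3))

# A length-4 vector matches neither 3 rows nor 3 columns, so broadcasting fails
try:
    matrix + np.array([8, 5, 1, 7])
    mismatch_ok = True
except ValueError:
    mismatch_ok = False

print(row_sum)
print(col_sum)
print(mismatch_ok)
```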