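The two most important modules for scientific computing in Python are numpy and pandas. Almost every data analysis package builds on them.
(1) Fast: the core of numpy is implemented in C, and pandas is built on top of numpy, extending it with labeled data structures.
(2) Low resource usage: data is stored in compact typed arrays and processed with vectorized array operations, which is usually much faster than looping over Python's built-in dictionaries or lists.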
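As a rough illustration of the speed claim, the sketch below times squaring the same numbers with a plain Python list and with a numpy array. The size `n`, the repeat count, and the helper names `square_with_list` and `square_with_numpy` are arbitrary choices made for this example; timings depend on the machine, so none are quoted here.

```python
import timeit

import numpy as np

n = 100_000  # arbitrary size chosen for illustration
values_list = list(range(n))
values_array = np.arange(n, dtype=np.int64)


def square_with_list():
    # Pure-Python loop: every element is handled as a separate Python object.
    return [v * v for v in values_list]


def square_with_numpy():
    # Vectorized: the loop runs in compiled code over one contiguous buffer.
    return values_array * values_array


list_seconds = timeit.timeit(square_with_list, number=20)
numpy_seconds = timeit.timeit(square_with_numpy, number=20)
print(f"list:  {list_seconds:.4f} s")
print(f"numpy: {numpy_seconds:.4f} s")
```

On typical hardware the numpy version should come out clearly ahead, but the exact ratio varies with array size and dtype.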
The recommended setup is to install Anaconda, which ships with both modules.
Create a new Python file and enter:
import numpy as np
array = np.array([[1, 2, 3],
[2, 3, 4]])
print(array)
print('number of dim:', array.ndim)
print('shape:', array.shape)
print('size:', array.size)
Output:
[[1 2 3]
[2 3 4]]
number of dim: 2
shape: (2, 3)
size: 6
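The three attributes are related: `size` is the product of the entries in `shape`, and `ndim` is the length of `shape`. A minimal sketch with a 3-dimensional array (the name `cube` and its shape are picked only for illustration):

```python
import numpy as np

cube = np.zeros((2, 3, 4))
print(cube.ndim)    # 3
print(cube.shape)   # (2, 3, 4)
print(cube.size)    # 24
print(cube.size == np.prod(cube.shape))  # True
print(len(cube))    # 2, the length of the first axis only
```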
Create a new Python file and enter:
import numpy as np
a = np.array([1, 2, 3], dtype=np.int64)
print(a)
print(a.dtype)
b = np.array([[1, 2, 3],
[2, 3, 4]])
print(b)
c = np.zeros((3, 4))
print(c)
d = np.ones((3, 4), dtype=np.int16)
print(d)
e = np.empty((3, 4))  # uninitialized memory: the printed values are arbitrary, not guaranteed zeros
print(e)
f = np.arange(10, 20, 2)
print(f)
g = np.arange(12).reshape((3, 4))
print(g)
h = np.linspace(1, 10, 5)
print(h)
i = np.linspace(1, 10, 6).reshape((2, 3))
print(i)
Output:
[1 2 3]
int64
[[1 2 3]
[2 3 4]]
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
[[1 1 1 1]
[1 1 1 1]
[1 1 1 1]]
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
[10 12 14 16 18]
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[ 1. 3.25 5.5 7.75 10. ]
[[ 1. 2.8 4.6]
[ 6.4 8.2 10. ]]
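The difference between `np.arange` and `np.linspace` is easy to miss in the output above: `arange` takes a step and excludes the stop value, while `linspace` takes a number of points and includes the stop value by default. The sketch below rebuilds the `linspace` result by hand; the variable names are illustrative only.

```python
import numpy as np

start, stop, num = 1, 10, 5
step = (stop - start) / (num - 1)         # spacing between neighbours: 2.25
by_linspace = np.linspace(start, stop, num)
by_hand = start + step * np.arange(num)   # the same points built from arange
print(step)
print(by_linspace)
print(np.allclose(by_linspace, by_hand))  # True

# arange, by contrast, never reaches its stop value.
print(np.arange(10, 20, 2)[-1])           # 18
```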
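matrix and array are two data types in NumPy; both can hold numeric elements arranged in rows and columns.
(1) A matrix is always 2-dimensional, whereas an array can have any number of dimensions.
(2) The same mathematical operation can give different results on the two types. For example, * is element-wise on an array but performs matrix multiplication on a matrix.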
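Point (2) can be checked directly with the `*` operator. This is a minimal sketch; the input values are arbitrary. Note that the NumPy documentation no longer recommends `np.matrix` for new code, and constructing one may emit a `PendingDeprecationWarning`, which is silenced here.

```python
import warnings

import numpy as np

values = [[1, 2], [3, 4]]
as_array = np.array(values)

with warnings.catch_warnings():
    # Silence the warning np.matrix may raise on construction.
    warnings.simplefilter("ignore")
    as_matrix = np.matrix(values)

array_product = as_array * as_array      # element-wise: [[1, 4], [9, 16]]
matrix_product = as_matrix * as_matrix   # matrix product: [[7, 10], [15, 22]]
print(array_product)
print(matrix_product)
print(as_matrix.ndim)                    # 2
```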
import numpy as np
a = np.array([10, 20, 30, 40])
b = np.arange(4)
print(a, b)
c = a-b
print(c)
d = b**2
print(d)
e = 10*np.sin(a)  # input is in radians; np.cos and np.tan work the same way
print(e)
print(b)
print(b < 3)
print(b == 3)
Output:
[10 20 30 40] [0 1 2 3]
[10 19 28 37]
[0 1 4 9]
[-5.44021111 9.12945251 -9.88031624 7.4511316 ]
[0 1 2 3]
[ True True True False]
[False False False True]
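The boolean arrays printed by `b < 3` and `b == 3` are more than a display trick: they can be used as masks to select or replace elements. A short sketch, reusing `b` from the listing above (the fill value `-1` is an arbitrary choice):

```python
import numpy as np

b = np.arange(4)
mask = b < 3                       # True, True, True, False
selected = b[mask]                 # keeps only the elements where mask is True
replaced = np.where(mask, b, -1)   # keeps b where True, writes -1 elsewhere
print(selected)
print(np.count_nonzero(mask))      # 3
print(replaced)
```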
import numpy as np
a = np.array([[1, 1],
[0, 1]])
b = np.arange(4).reshape((2, 2))
print(a)
print(b)
c = a*b
c_dot = np.dot(a, b)
c_dot_2 = a.dot(b)
print(c)
print(c_dot)
print(c_dot_2)
Output:
[[1 1]
[0 1]]
[[0 1]
[2 3]]
[[0 1]
[0 3]]
[[2 4]
[2 3]]
[[2 4]
[2 3]]
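For 2-D arrays, the `@` operator is another way to write the matrix product computed by `np.dot` above. The sketch below reuses `a` and `b` and also shows that, unlike the element-wise product, the matrix product depends on operand order.

```python
import numpy as np

a = np.array([[1, 1],
              [0, 1]])
b = np.arange(4).reshape((2, 2))

matrix_product = a @ b
print(np.array_equal(matrix_product, np.dot(a, b)))  # True
print(np.array_equal(a * b, b * a))                  # True: element-wise commutes
print(np.array_equal(a @ b, b @ a))                  # False: matrix product does not
```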
import numpy as np
a = np.random.random((2, 4))
print(a)
print(np.sum(a))
print(np.max(a))
print(np.min(a))
print(np.sum(a, axis=1))
print(np.max(a, axis=0))
print(np.min(a, axis=1))
Output:
[[0.73702458 0.34496308 0.17568417 0.52969913]
[0.63922476 0.01391143 0.00312518 0.04572013]]
2.4893524676796797
0.737024580227777
0.003125182682755523
[1.78737096 0.70198151]
[0.73702458 0.34496308 0.17568417 0.52969913]
[0.17568417 0.00312518]
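Because the random input changes on every run, the meaning of `axis` is easier to see with fixed numbers. In the sketch below, `axis=0` collapses the rows (one result per column) and `axis=1` collapses the columns (one result per row). The `keepdims` line is an optional extra, not part of the listing above.

```python
import numpy as np

a = np.arange(8).reshape((2, 4))
# a is:
# [[0 1 2 3]
#  [4 5 6 7]]
column_sums = np.sum(a, axis=0)                 # values 4, 6, 8, 10
row_sums = np.sum(a, axis=1)                    # values 6, 22
row_sums_2d = np.sum(a, axis=1, keepdims=True)  # same values, shape (2, 1)
print(column_sums)
print(row_sums)
print(row_sums_2d.shape)
```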
import numpy as np
A = np.arange(2, 14).reshape((3, 4))
print(A)
print(np.argmin(A))
print(np.argmax(A))
print(np.mean(A))
print(A.mean())
print(np.average(A))
print(np.median(A))
print(np.cumsum(A))
print(np.diff(A))
print(np.nonzero(A))
Output:
[[ 2 3 4 5]
[ 6 7 8 9]
[10 11 12 13]]
0
11
7.5
7.5
7.5
7.5
[ 2 5 9 14 20 27 35 44 54 65 77 90]
[[1 1 1]
[1 1 1]
[1 1 1]]
(array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2], dtype=int64), array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3], dtype=int64))
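Note that `np.argmin` and `np.argmax` return positions in the flattened array (0 and 11 above), not row and column pairs. One way to recover the 2-D position is `np.unravel_index`; the sketch also shows the `axis` argument of `argmax` and `diff`.

```python
import numpy as np

A = np.arange(2, 14).reshape((3, 4))
flat_index = np.argmax(A)                         # 11, counted over the flattened array
row, col = np.unravel_index(flat_index, A.shape)  # row 2, column 3
print(flat_index, row, col, A[row, col])

per_column = np.argmax(A, axis=0)  # row index of the maximum in each column
print(per_column)

# np.diff works along the last axis by default; axis=0 differences adjacent rows.
row_differences = np.diff(A, axis=0)
print(row_differences)
```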
import numpy as np
B = np.arange(14, 2, -1).reshape((3, 4))
print(B)
print(np.sort(B))
print(np.transpose(B))
print(B.T)
print((B.T).dot(B))
print(np.clip(B, 5, 9))
print(np.mean(B, axis=0))
print(np.mean(B, axis=1))
Output:
[[14 13 12 11]
[10 9 8 7]
[ 6 5 4 3]]
[[11 12 13 14]
[ 7 8 9 10]
[ 3 4 5 6]]
[[14 10 6]
[13 9 5]
[12 8 4]
[11 7 3]]
[[14 10 6]
[13 9 5]
[12 8 4]
[11 7 3]]
[[332 302 272 242]
[302 275 248 221]
[272 248 224 200]
[242 221 200 179]]
[[9 9 9 9]
[9 9 8 7]
[6 5 5 5]]
[10. 9. 8. 7.]
[12.5 8.5 4.5]
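In the output above, `np.sort(B)` sorted each row separately, because the default is the last axis. The sketch below shows the other options and confirms that `np.sort` returns a new array instead of modifying `B`.

```python
import numpy as np

B = np.arange(14, 2, -1).reshape((3, 4))
rows_sorted = np.sort(B)             # default axis=-1: each row sorted
columns_sorted = np.sort(B, axis=0)  # each column sorted
all_sorted = np.sort(B, axis=None)   # flattened first, then sorted
print(columns_sorted)
print(all_sorted)
print(B[0])                          # B itself is unchanged
```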
Create a new Python file and enter:
import numpy as np
A = np.arange(3, 15)
print(A)
print(A[3])
B = np.arange(3, 15).reshape((3, 4))
print(B)
print(B[2])
print(B[1][1])
print(B[1, 1])
print(B[2, :])
print(B[:, 0])
print(B[1, 1:3])
for row in B:
    print(row)
for column in B.T:
    print(column)
print(B.flatten())
for item in B.flat:
    print(item)
Output:
[ 3 4 5 6 7 8 9 10 11 12 13 14]
6
[[ 3 4 5 6]
[ 7 8 9 10]
[11 12 13 14]]
[11 12 13 14]
8
8
[11 12 13 14]
[ 3 7 11]
[8 9]
[3 4 5 6]
[ 7 8 9 10]
[11 12 13 14]
[ 3 7 11]
[ 4 8 12]
[ 5 9 13]
[ 6 10 14]
[ 3 4 5 6 7 8 9 10 11 12 13 14]
3
4
5
6
7
8
9
10
11
12
13
14
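`flatten()` and `.flat` look similar in the output but behave differently: `flatten()` returns a new array, while `.flat` is an iterator over the original data. A small sketch of that difference, plus a slice over both axes at once (the values written are arbitrary):

```python
import numpy as np

B = np.arange(3, 15).reshape((3, 4))
flattened = B.flatten()  # a new 1-D array with its own data
flattened[0] = -1
print(B[0, 0])           # still 3

B.flat[0] = 100          # .flat indexes the original array in 1-D order
print(B[0, 0])           # now 100

block = B[:2, 1:3]       # rows 0-1, columns 1-2
print(block)
```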
Create a new Python file and enter:
import numpy as np
A = np.array([1, 1, 1])
B = np.array([2, 2, 2])
C = np.vstack((A, B))
D = np.hstack((A, B))
print(A.shape, B.shape, C.shape, D.shape)
print(C) #vertical stack
print(D) #horizontal stack
print(A[:, np.newaxis])
E = np.array([1, 1, 1])[:, np.newaxis]
F = np.array([2, 2, 2])[:, np.newaxis]
print(E)
print(F)
G = np.concatenate((E, F, F, E), axis=1)
print(G)
Output:
(3,) (3,) (2, 3) (6,)
[[1 1 1]
[2 2 2]]
[1 1 1 2 2 2]
[[1]
[1]
[1]]
[[1]
[1]
[1]]
[[2]
[2]
[2]]
[[1 2 2 1]
[1 2 2 1]
[1 2 2 1]]
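The position of `np.newaxis` decides whether a 1-D array becomes a row or a column. The sketch below compares both, shows an equivalent `reshape`, and stacks along each axis; `np.column_stack` is used here as one possible shortcut and does not appear in the listing above.

```python
import numpy as np

A = np.array([1, 1, 1])
B = np.array([2, 2, 2])

row = A[np.newaxis, :]           # shape (1, 3)
column = A[:, np.newaxis]        # shape (3, 1)
same_column = A.reshape(-1, 1)   # equivalent way to get a column
print(row.shape, column.shape)
print(np.array_equal(column, same_column))  # True

stacked_rows = np.concatenate((row, B[np.newaxis, :]), axis=0)  # like vstack, shape (2, 3)
stacked_columns = np.column_stack((A, B))                       # shape (3, 2)
print(stacked_rows)
print(stacked_columns)
```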
Create a new Python file and enter:
import numpy as np
A = np.arange(12).reshape((3, 4))
print(A)
print(np.split(A, 2, axis=1))
print(np.split(A, 3, axis=0))
print(np.array_split(A, 3, axis=1))
print(np.vsplit(A, 3))
print(np.hsplit(A, 2))
Output:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[array([[0, 1],
[4, 5],
[8, 9]]), array([[ 2, 3],
[ 6, 7],
[10, 11]])]
[array([[0, 1, 2, 3]]), array([[4, 5, 6, 7]]), array([[ 8, 9, 10, 11]])]
[array([[0, 1],
[4, 5],
[8, 9]]), array([[ 2],
[ 6],
[10]]), array([[ 3],
[ 7],
[11]])]
[array([[0, 1, 2, 3]]), array([[4, 5, 6, 7]]), array([[ 8, 9, 10, 11]])]
[array([[0, 1],
[4, 5],
[8, 9]]), array([[ 2, 3],
[ 6, 7],
[10, 11]])]
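The listing uses `np.array_split` for the 3-way column split because `np.split` only accepts equal divisions and raises `ValueError` otherwise. A sketch of that failure, the uneven split, and the alternative of passing explicit split positions to `np.split`:

```python
import numpy as np

A = np.arange(12).reshape((3, 4))

try:
    np.split(A, 3, axis=1)  # 4 columns cannot be divided into 3 equal parts
    split_failed = False
except ValueError as error:
    split_failed = True
    print("np.split failed:", error)

parts = np.array_split(A, 3, axis=1)
print([part.shape for part in parts])  # column widths 2, 1, 1

# np.split also accepts a list of positions at which to cut.
left, middle, right = np.split(A, [1, 3], axis=1)
print(left.shape, middle.shape, right.shape)
```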
Create a new Python file and enter:
import numpy as np
a = np.arange(4)
b = a
c = b
a[0] = 11
print(a)
print(b)
print(c)
print(b is a)
print(c is a)
d = a.copy() #deep copy
print(d)
d[0] = 20
print(d)
print(a)
Output:
[11 1 2 3]
[11 1 2 3]
[11 1 2 3]
True
True
[11 1 2 3]
[20 1 2 3]
[11 1 2 3]
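Plain assignment is not the only way to end up sharing data: a basic slice is a view onto the original array, so writing through it changes the original too. A minimal sketch, using `np.shares_memory` to check (the names `view` and `independent` are illustrative):

```python
import numpy as np

a = np.arange(4)
view = a[1:3]                # basic slicing returns a view, not a copy
independent = a[1:3].copy()  # copy() gives the slice its own data

view[0] = 99
print(a)            # a[1] is now 99
print(independent)  # unaffected: still holds 1 and 2
print(np.shares_memory(a, view), np.shares_memory(a, independent))  # True False
```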
This part is left for later study.
See: https://morvanzhou.github.io/tutorials/data-manipulation/np-pd/4-1-speed-up-numpy/