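Reference: https://datawhalechina.github.io/joyful-pandas/build/html/%E7%9B%AE%E5%BD%95/ch1.html
Ordinary matrix multiplication can be written as a triple loop directly from its defining formula. Rewrite it as a list comprehension.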
import numpy as np
M1 = np.random.rand(2,3)
M2 = np.random.rand(3,4)
print(M1)
print(M2)
M1@M2
[[0.21882773 0.83856712 0.10216554] # M1, shape = (2, 3)
[0.8178865 0.82343903 0.14219665]]
[[0.4981882 0.81916687 0.285332 0.69347725] # M2, shape = (3, 4)
[0.92874779 0.18476286 0.51331421 0.50188801]
[0.08611562 0.91474668 0.99530425 0.47614143]]
array([[0.8966328 , 0.42764808, 0.59457277, 0.62126409], # result, shape = (2, 4)
[1.18447394, 0.95220039, 0.79758107, 1.04816558]])
list(zip(*M2)) # zip() packs several iterables into an iterable of tuples; it returns a zip object, which tuple() or list() turns into the packed result. Here it yields the columns of M2.
[(0.49818820178117007, 0.9287477924393945, 0.08611562119504934),
(0.8191668746611285, 0.1847628644683621, 0.9147466808414475),
(0.28533199558653055, 0.5133142118732157, 0.9953042512832437),
(0.6934772523076975, 0.5018880125960207, 0.47614143326355496)]
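To make the `zip(*M2)` step easier to follow, here is a minimal sketch on a tiny hand-written nested list. The name `M` is only an illustrative stand-in, not part of the exercise. Unpacking the rows into `zip` groups the first elements together, then the second elements, and so on, which gives the columns.

```python
# M is a small illustrative matrix (an assumption for this sketch), not M2.
M = [[1, 2, 3],
     [4, 5, 6]]

# zip(*M) is the same call as zip([1, 2, 3], [4, 5, 6]).
columns = list(zip(*M))
print(columns)  # [(1, 4), (2, 5), (3, 6)]
```

Each tuple is one column of `M`, which is why iterating over `zip(*M2)` visits the columns of `M2`.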
# Method 1:
z1 = [[sum(x*y for x, y in zip(row, col)) for col in zip(*M2)] for row in M1] # M1 times M2: the inner loop of the comprehension runs over the columns of M2, the outer loop over the rows of M1
z1
[[0.8966328043025076,
0.4276480815824257,
0.5945727688239985,
0.6212640864654319],
[1.1844739389352992,
0.9522003935787853,
0.7975810730177366,
1.0481655767133908]]
# Method 2:
z2 = [[sum([M1[i][k]*M2[k][j] for k in range(M1.shape[1])]) for j in range(M2.shape[1])] for i in range(M1.shape[0])]
z2
[[0.8966328043025076,
0.4276480815824257,
0.5945727688239985,
0.6212640864654319],
[1.1844739389352992,
0.9522003935787853,
0.7975810730177366,
1.0481655767133908]]
(np.abs(M1@M2 - z2) < 1e-15).all()
True
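For comparison, the triple loop that the exercise starts from could look like the sketch below. The function name `matmul_loops` is hypothetical, and the matrices are freshly generated random values rather than the ones printed above. It is meant only to show how the three nested loops map onto the three `for` clauses of the index-based list comprehension.

```python
import numpy as np

def matmul_loops(X, Y):
    """Multiply two 2-D arrays with an explicit triple loop (illustrative helper)."""
    rows, inner, cols = X.shape[0], X.shape[1], Y.shape[1]
    res = np.zeros((rows, cols))
    for i in range(rows):           # rows of X
        for j in range(cols):       # columns of Y
            for k in range(inner):  # shared dimension
                res[i, j] += X[i, k] * Y[k, j]
    return res

M1 = np.random.rand(2, 3)
M2 = np.random.rand(3, 4)

loop_result = matmul_loops(M1, M2)
comp_result = [[sum(M1[i][k] * M2[k][j] for k in range(M1.shape[1]))
                for j in range(M2.shape[1])]
               for i in range(M1.shape[0])]

# np.allclose uses a relative tolerance, which is less fragile than a fixed 1e-15 bound.
print(np.allclose(loop_result, M1 @ M2), np.allclose(comp_result, M1 @ M2))
```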
Let $A_{m\times n}$ be a matrix. Update every element of $A$ to produce a matrix $B$, using the rule $B_{ij}=A_{ij}\sum_{k=1}^n\frac{1}{A_{ik}}$. For example, if $A$ is the matrix below, then $B_{2,2}=5\times(\frac{1}{4}+\frac{1}{5}+\frac{1}{6})=\frac{37}{12}$. Implement this efficiently with NumPy.
$$A= \begin{bmatrix} 1 & 2 &3\\ 4 & 5 &6\\ 7 & 8 & 9 \end{bmatrix}$$
A = np.arange(1,10).reshape(3,3)
A
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
# Method 1: loop over the rows
for i in range(A.shape[0]):
    print(A[i]*(1/A[i]).sum())
[1.83333333 3.66666667 5.5 ]
[2.46666667 3.08333333 3.7 ]
[2.65277778 3.03174603 3.41071429]
# Method 2: broadcast the row sums of 1/A against A
(1/A).sum(1)
array([1.83333333, 0.61666667, 0.37896825])
B = A*((1/A).sum(1).reshape(3,1))
B
array([[1.83333333, 3.66666667, 5.5 ],
[2.46666667, 3.08333333, 3.7 ],
[2.65277778, 3.03174603, 3.41071429]])
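As a hedged variation on Method 2, the sketch below uses `keepdims=True` instead of `reshape(3,1)`, so the row sums stay a column vector for any number of rows. It also checks the worked value $B_{2,2}=\frac{37}{12}$ from the problem statement.

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)

# keepdims=True keeps the row sums with shape (3, 1), so they broadcast
# across the columns of A without a hard-coded reshape.
row_sums = (1 / A).sum(axis=1, keepdims=True)
B = A * row_sums

# B[1, 1] is B_{2,2} in the 1-based notation of the problem.
print(B[1, 1], 37 / 12)
```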
Let $A_{m\times n}$ be a matrix, and let $B_{ij} = \frac{(\sum_{i=1}^mA_{ij})\times (\sum_{j=1}^nA_{ij})}{\sum_{i=1}^m\sum_{j=1}^nA_{ij}}$, that is, the product of the $j$-th column sum and the $i$-th row sum of $A$, divided by the sum of all elements. The chi-square value is defined as:
$$\chi^2 = \sum_{i=1}^m\sum_{j=1}^n\frac{(A_{ij}-B_{ij})^2}{B_{ij}}$$
Use NumPy to compute $\chi^2$ for the given matrix $A$.
np.random.seed(0)
A = np.random.randint(10, 20, (8, 5))
A
array([[15, 10, 13, 13, 17],
[19, 13, 15, 12, 14],
[17, 16, 18, 18, 11],
[16, 17, 17, 18, 11],
[15, 19, 18, 19, 14],
[13, 10, 13, 15, 10],
[12, 13, 18, 11, 13],
[13, 13, 17, 10, 11]])
A.sum(1).reshape(-1,1)*A.sum(0) # shape = (8,5)
array([[ 8160, 7548, 8772, 7888, 6868],
[ 8760, 8103, 9417, 8468, 7373],
[ 9600, 8880, 10320, 9280, 8080],
[ 9480, 8769, 10191, 9164, 7979],
[10200, 9435, 10965, 9860, 8585],
[ 7320, 6771, 7869, 7076, 6161],
[ 8040, 7437, 8643, 7772, 6767],
[ 7680, 7104, 8256, 7424, 6464]])
B =(A.sum(1).reshape(-1,1)*A.sum(0))/A.sum()
B
array([[14.14211438, 13.08145581, 15.20277296, 13.67071057, 11.90294627],
[15.18197574, 14.04332756, 16.32062392, 14.67590988, 12.77816291],
[16.63778163, 15.38994801, 17.88561525, 16.08318891, 14.0034662 ],
[16.42980936, 15.19757366, 17.66204506, 15.88214905, 13.82842288],
[17.67764298, 16.35181976, 19.0034662 , 17.08838821, 14.87868284],
[12.68630849, 11.73483536, 13.63778163, 12.26343154, 10.67764298],
[13.93414211, 12.88908146, 14.97920277, 13.46967071, 11.72790295],
[13.3102253 , 12.31195841, 14.3084922 , 12.86655113, 11.20277296]])
chiq2 = (np.power((A-B),2)/B).sum()
chiq2
11.842696601945802
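The row-sum times column-sum product above is an outer product, so the same computation can also be sketched with `np.outer`. To avoid depending on the random number generator, this sketch hard-codes the matrix `A` printed above. The names `expected` and `chi2` are illustrative choices, not from the original.

```python
import numpy as np

# The matrix A as printed above, hard-coded for reproducibility.
A = np.array([[15, 10, 13, 13, 17],
              [19, 13, 15, 12, 14],
              [17, 16, 18, 18, 11],
              [16, 17, 17, 18, 11],
              [15, 19, 18, 19, 14],
              [13, 10, 13, 15, 10],
              [12, 13, 18, 11, 13],
              [13, 13, 17, 10, 11]])

# expected[i, j] = (sum of row i) * (sum of column j) / (sum of all elements)
expected = np.outer(A.sum(axis=1), A.sum(axis=0)) / A.sum()
chi2 = ((A - expected) ** 2 / expected).sum()

# Compare with the value reported above.
print(np.isclose(chi2, 11.842696601945802))
```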
Let $Z$ be an $m\times n$ matrix, and let $B$ and $U$ be $m\times p$ and $p\times n$ matrices respectively. Let $B_i$ denote the $i$-th row of $B$ and $U_j$ the $j$-th column of $U$. Define $\displaystyle R=\sum_{i=1}^m\sum_{j=1}^n\|B_i-U_j\|_2^2Z_{ij}$, where $\|\mathbf{a}\|_2^2$ denotes the sum of the squared components of the vector $\mathbf{a}$, $\sum_i a_i^2$.
Someone computes $R$ from the sample data below with the following code. Making full use of NumPy functions, improve the performance of this code.
np.random.seed(0)
m, n, p = 100, 80, 50
B = np.random.randint(0, 2, (m, p))
U = np.random.randint(0, 2, (p, n))
Z = np.random.randint(0, 2, (m, n))
def solution1(B=B, U=U, Z=Z):
    L_res = []
    for i in range(m):
        for j in range(n):
            norm_value = ((B[i]-U[:,j])**2).sum()
            L_res.append(norm_value*Z[i][j])
    return sum(L_res)
solution1(B, U, Z)
100566
def solution2(B=B, U=U, Z=Z):
    T = np.array([[np.power([(B[i][k] - U[k][j]) for k in range(p)],2).sum() for j in range(n)] for i in range(m)])
    R = (T*Z).sum()
    return R
%timeit L1 = solution1(B, U, Z)
%timeit L2 = solution2(B, U, Z) # method 2 is roughly an order of magnitude slower than method 1
37.5 ms ± 145 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
311 ms ± 1.48 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
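def solution3(B=B, U=U, Z=Z):
    # Expand ||B_i - U_j||^2 = ||B_i||^2 + ||U_j||^2 - 2 * (B_i . U_j)
    T = (np.power(B,2).sum(1).reshape(-1,1))+(np.power(U,2).sum(0))-2*B@U
    R = (T*Z).sum()
    return R
%timeit L3 = solution3(B, U, Z) # about two orders of magnitude faster than method 1
260 µs ± 3.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Compared with explicit Python loops, NumPy's built-in vectorized operations cut the running time here from 37.5 ms to 260 µs, so using NumPy functions where appropriate can make a program much more efficient.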
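The speedup in `solution3` rests on the identity $\|B_i-U_j\|_2^2=\|B_i\|_2^2+\|U_j\|_2^2-2\,B_i\cdot U_j$, which turns the pairwise distances into one matrix product. As a rough check, the sketch below compares it with a direct broadcasting version that builds the full $(m, p, n)$ difference array. The names `R_direct` and `R_expanded` are illustrative, and the direct version is an alternative added here for comparison, not one of the original solutions.

```python
import numpy as np

np.random.seed(0)
m, n, p = 100, 80, 50
B = np.random.randint(0, 2, (m, p))
U = np.random.randint(0, 2, (p, n))
Z = np.random.randint(0, 2, (m, n))

# Direct form: diff[i, k, j] = B[i, k] - U[k, j], an (m, p, n) intermediate array.
diff = B[:, :, None] - U[None, :, :]
R_direct = ((diff ** 2).sum(axis=1) * Z).sum()

# Expanded form: only (m, n)-sized intermediates are needed.
T = (B ** 2).sum(axis=1, keepdims=True) + (U ** 2).sum(axis=0) - 2 * (B @ U)
R_expanded = (T * Z).sum()

# All arithmetic is on integers, so the two results should match exactly.
print(R_direct == R_expanded)
```

The direct form is simpler to read but allocates `m * p * n` elements, so the expanded form is likely to scale better as the matrices grow.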
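Given a NumPy array of integers, return the maximum length of a subarray of consecutive increasing integers, that is, a run in which each element is exactly 1 larger than the one before it. For example, for the input [1,2,5,6,7], the longest such subarray is [5,6,7], so the output is 3; for the input [3,2,1,2,3,4,6], it is [1,2,3,4], so the output is 4. Make full use of NumPy's built-in functions. (Hint: consider the nonzero and diff functions.)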
a = np.array([5,3,2,1,0,-1,2,3,4,6])
print(np.diff(a))
array_zero = np.diff(np.diff(a)) # second differences: look for runs of consecutive 0s, which mark a constant step in a
print(array_zero)
print(np.nonzero(array_zero)) # indices of the nonzero elements
[-2 -1 -1 -1 -1 3 1 1 2]
[ 1 0 0 0 4 -2 0 1]
(array([0, 4, 5, 7], dtype=int64),)
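np.diff(np.nonzero(array_zero)) # gaps between the nonzero indices of array_zero; a gap plus 1 is the length of a constant-step (increasing or decreasing) run in a
array([[4, 1, 2]], dtype=int64)
np.diff(np.nonzero(array_zero)).max()+1 # maximum length of an increasing or decreasing run of consecutive integers in a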
5
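The problem, however, asks only for increasing runs of consecutive integers, and the attempt above also counts decreasing runs (here [3,2,1,0,-1], length 5). I could not come up with a suitable method myself; the one given in the video is: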
np.diff(a)!=1
array([ True, True, True, True, True, True, False, False, True])
np.r_[1,np.diff(a)!=1,1]
array([1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1], dtype=int32)
np.nonzero(np.r_[1,np.diff(a)!=1,1])
(array([ 0, 1, 2, 3, 4, 5, 6, 9, 10], dtype=int64),)
np.diff(np.nonzero(np.r_[1,np.diff(a)!=1,1]))
array([[1, 1, 1, 1, 1, 1, 3, 1]], dtype=int64)
np.diff(np.nonzero(np.r_[1,np.diff(a)!=1,1])).max()
3
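The steps above can be collected into a single function. The sketch below is one possible packaging: the name `longest_increasing_run` and the empty-array guard are additions made here, not part of the video's solution. A 1 (or `True`) is placed at every position where a run breaks, plus one at each end, so the gaps between those markers are the run lengths.

```python
import numpy as np

def longest_increasing_run(a):
    """Length of the longest run in which each element is the previous one plus 1."""
    a = np.asarray(a)
    if a.size == 0:  # guard added for this sketch; the original does not handle empty input
        return 0
    # True wherever a run ends; the extra True at both ends closes the first and last runs.
    breaks = np.r_[True, np.diff(a) != 1, True]
    # Distances between consecutive break positions are the run lengths.
    return int(np.diff(np.flatnonzero(breaks)).max())

print(longest_increasing_run([1, 2, 5, 6, 7]))                    # 3
print(longest_increasing_run([3, 2, 1, 2, 3, 4, 6]))              # 4
print(longest_increasing_run([5, 3, 2, 1, 0, -1, 2, 3, 4, 6]))    # 3
```

The first two calls reproduce the examples from the problem statement, and the third uses the array `a` from the walkthrough above.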