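When a matrix contains many zero elements, it can be compressed to save memory, and a compressed matrix can of course be decompressed again. Conveniently, Python already provides this functionality in the scipy library (scipy.sparse). Compression comes in two flavors, row-major and column-major, and so does decompression. The same constructors handle both directions: what they do depends on the arguments passed to them.
The constructors are csr_matrix() and csc_matrix(). The third letter tells you the orientation: r for row-major (Compressed Sparse Row) and c for column-major (Compressed Sparse Column).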
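To compress a sparse matrix, just pass the dense array:
csr_matrix(arr)  row-major compression
csc_matrix(arr)  column-major compression
If compressed_mat is the compressed matrix, compressed_mat.toarray() converts it back to a dense array.
Code example: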
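import numpy as np
from scipy.sparse import csc_matrix
arr = np.array([[0, 0, 0, 1, 0], [0, 1, 0, 0, 0]])
a = csc_matrix(arr)
# These three arrays are enough to rebuild the original matrix; how they are used is explained further down.
print(a.data, a.indptr, a.indices)
print('\n', a.toarray())
Output:
[1 1] [0 0 1 1 2 2] [1 0]
 [[0 0 0 1 0]
 [0 1 0 0 0]]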
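The same matrix with row-major compression:
import numpy as np
from scipy.sparse import csr_matrix
arr = np.array([[0, 0, 0, 1, 0], [0, 1, 0, 0, 0]])
a = csr_matrix(arr)
# These three arrays are enough to rebuild the original matrix; how they are used is explained further down.
print(a.data, a.indptr, a.indices)
print('\n', a.toarray())
Output:
[1 1] [0 1 2] [3 1]
 [[0 0 0 1 0]
 [0 1 0 0 0]]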
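To make the memory saving concrete, here is a minimal sketch that compares the bytes held by a dense array with the bytes held by the three CSR arrays. The 1000 x 1000 size and the 100 non-zero entries are made-up values chosen for illustration, and the exact byte counts depend on the index dtype SciPy picks.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical example: a 1000 x 1000 matrix with only 100 non-zero entries.
dense = np.zeros((1000, 1000), dtype=np.float64)
dense[np.arange(100), np.arange(100)] = 1.0

sparse = csr_matrix(dense)

# Memory held by the dense array versus the three CSR arrays.
dense_bytes = dense.nbytes
sparse_bytes = sparse.data.nbytes + sparse.indices.nbytes + sparse.indptr.nbytes
print(dense_bytes, sparse_bytes)

# Decompressing recovers the original dense array.
print(np.array_equal(sparse.toarray(), dense))
```

The saving grows with the share of zeros; for a matrix that is mostly non-zero, the extra index arrays can make the compressed form larger than the dense one.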
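A sparse matrix can also be built directly from its non-zero entries, given three arrays: the values and their row and column coordinates. Note that this coordinate form is not the (data, indices, indptr) triple printed above; that form is covered further down.
csr_matrix((data, (row_ind, col_ind)), shape=(M, N))
where data, row_ind and col_ind satisfy the relationship a[row_ind[k], col_ind[k]] = data[k].
csc_matrix((data, (row_ind, col_ind)), shape=(M, N))
where data, row_ind and col_ind satisfy the relationship a[row_ind[k], col_ind[k]] = data[k].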
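The relationship a[row_ind[k], col_ind[k]] = data[k] can be checked by filling a dense array by hand and comparing it with what the constructor produces. This is only a sketch with made-up values; it includes one repeated coordinate, (1, 0), to show that SciPy sums duplicate entries when building the matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical triplets; the coordinate (1, 0) appears twice on purpose.
row = np.array([0, 1, 1, 2])
col = np.array([1, 0, 0, 2])
data = np.array([10, 20, 5, 30])

# Fill a dense array by hand from the triplets.
expected = np.zeros((3, 3), dtype=data.dtype)
for k in range(len(data)):
    expected[row[k], col[k]] += data[k]  # += because duplicates are summed

a = csr_matrix((data, (row, col)), shape=(3, 3))
print(a.toarray())
print(np.array_equal(a.toarray(), expected))
```

Swapping csr_matrix for csc_matrix in this sketch should give the same dense result, since the coordinate form describes entries independently of the storage orientation.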
Code example:
import numpy as np
from scipy.sparse import csr_matrix
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
a = csr_matrix((data, (row, col)), shape=(3, 3)).toarray()
print(a)
Output:
[[1 0 2]
 [0 0 3]
 [4 5 6]]
The same triplets with csc_matrix:
import numpy as np
from scipy.sparse import csc_matrix
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
a = csc_matrix((data, (row, col)), shape=(3, 3)).toarray()
print(a)
Output:
[[1 0 2]
 [0 0 3]
 [4 5 6]]
csr_matrix((data, indices, indptr), [shape=(M, N)])
is the standard CSR representation where the column indices for row i are stored in indices[indptr[i]:indptr[i+1]] and their corresponding values are stored in data[indptr[i]:indptr[i+1]]. If the shape parameter is not supplied, the matrix dimensions are inferred from the index arrays.
csc_matrix((data, indices, indptr), [shape=(M, N)])
is the standard CSC representation where the row indices for column i are stored in indices[indptr[i]:indptr[i+1]] and their corresponding values are stored in data[indptr[i]:indptr[i+1]]. If the shape parameter is not supplied, the matrix dimensions are inferred from the index arrays.
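To see how indptr, indices and data fit together, the following sketch expands CSR arrays into a dense array by hand. The helper name csr_to_dense is hypothetical and not part of SciPy, and the sketch assumes each row has no repeated column index, which holds for a matrix compressed from a dense array.

```python
import numpy as np
from scipy.sparse import csr_matrix

def csr_to_dense(data, indices, indptr, shape):
    """Hypothetical helper: expand CSR arrays into a dense array by hand."""
    dense = np.zeros(shape, dtype=data.dtype)
    for i in range(shape[0]):
        start, end = indptr[i], indptr[i + 1]
        # indices[start:end] holds the column indices of row i,
        # data[start:end] holds the matching values.
        dense[i, indices[start:end]] = data[start:end]
    return dense

arr = np.array([[0, 0, 0, 1, 0], [0, 1, 0, 0, 0]])
m = csr_matrix(arr)
rebuilt = csr_to_dense(m.data, m.indices, m.indptr, m.shape)
print(np.array_equal(rebuilt, arr))
```

For CSC the loop would run over columns instead, with indices[start:end] read as row indices.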
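It follows that when csr_matrix and csc_matrix are given the same (data, indices, indptr) arguments with a square shape, the two resulting matrices are transposes of each other.
Code example: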
indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
a = csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
print(a)
Output:
[[1 0 2]
 [0 0 3]
 [4 5 6]]
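The same three arrays passed to csc_matrix:
import numpy as np
from scipy.sparse import csc_matrix
indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
a = csc_matrix((data, indices, indptr), shape=(3, 3)).toarray()
print(a)
Output:
[[1 0 4]
 [0 0 5]
 [2 3 6]]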
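The transpose relationship between the CSR and CSC readings of the same three arrays can be checked directly. This is a small sketch that reuses the arrays from the examples; the variable names by_row and by_col are chosen here for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix, csc_matrix

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])

# Same arrays, read once row by row (CSR) and once column by column (CSC).
by_row = csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
by_col = csc_matrix((data, indices, indptr), shape=(3, 3)).toarray()

print(np.array_equal(by_row, by_col.T))
```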
Sparse matrices can be used in arithmetic operations: they support addition, subtraction, multiplication, division, and matrix power.
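A brief sketch of a few of these operations follows, using two small made-up matrices. It uses the @ operator for the matrix product and .multiply() for the element-wise product, and shows only division by a scalar; each result is checked against the equivalent dense NumPy computation.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical small matrices for illustration.
A = np.array([[1, 0, 2], [0, 0, 3], [4, 5, 6]])
B = np.array([[0, 1, 0], [0, 0, 0], [7, 0, 0]])
a = csr_matrix(A)
b = csr_matrix(B)

total = a + b                # addition
diff = a - b                 # subtraction
product = a @ b              # matrix product
elementwise = a.multiply(b)  # element-wise product
scaled = a / 2               # division by a scalar

# Each result matches the dense computation.
print(np.array_equal(total.toarray(), A + B))
print(np.array_equal(diff.toarray(), A - B))
print(np.array_equal(product.toarray(), A @ B))
print(np.array_equal(elementwise.toarray(), A * B))
print(np.allclose(scaled.toarray(), A / 2))
```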
Advantages of the CSR format
Disadvantages of the CSR format