Python scipy.sparse: Compressing and Restoring Sparse Matrices

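When a matrix contains many zero elements, it can be compressed to save memory, and a compressed matrix can of course be decompressed again. Python already provides both operations in the scipy library, under scipy.sparse. Compression comes in a row-oriented and a column-oriented variant, and the same is true of decompression. Both directions go through the same constructors, which behave differently depending on the arguments they receive.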
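To get a rough feel for the memory saving, the sketch below compares the size of a dense array with the three arrays a CSR matrix stores. The matrix size, the number of nonzero entries and the variable names are assumptions made up for illustration, and the exact byte counts depend on the index dtype SciPy picks.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical example: a 1000 x 1000 matrix with at most 100 nonzero entries.
dense = np.zeros((1000, 1000), dtype=np.float64)
rng = np.random.default_rng(0)
rows = rng.integers(0, 1000, size=100)
cols = rng.integers(0, 1000, size=100)
dense[rows, cols] = 1.0

sparse = csr_matrix(dense)

# The dense array stores every element, zeros included.
dense_bytes = dense.nbytes
# A CSR matrix stores only three arrays: data, indices and indptr.
sparse_bytes = sparse.data.nbytes + sparse.indices.nbytes + sparse.indptr.nbytes

print(dense_bytes, sparse_bytes)
```

The sparse representation only pays for the nonzero values plus one pointer per row, so the gap grows as the matrix gets emptier.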

The constructors are csr_matrix() and csc_matrix(). The third letter of the name tells you whether the format is row-oriented (r) or column-oriented (c).

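To compress a matrix, simply pass the dense array to the constructor:

csr_matrix(arr)  row-oriented compression

csc_matrix(arr)  column-oriented compression

If compressed_mat is the compressed matrix, compressed_mat.toarray() converts it back into an ordinary dense array.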

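Code example:

import numpy as np
from scipy.sparse import csc_matrix

arr = np.array([[0, 0, 0, 1, 0], [0, 1, 0, 0, 0]])
a = csc_matrix(arr)
# These three arrays are enough to rebuild the original matrix; how to use them is explained further down.
print(a.data, a.indptr, a.indices)

print('\n', a.toarray())
Output: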

[1 1] [0 0 1 1 2 2] [1 0]

 [[0 0 0 1 0]
 [0 1 0 0 0]]

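import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0, 1, 0], [0, 1, 0, 0, 0]])
a = csr_matrix(arr)
# These three arrays are enough to rebuild the original matrix; how to use them is explained further down.
print(a.data, a.indptr, a.indices)
print('\n', a.toarray())
Output: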

[1 1] [0 1 2] [3 1]

 [[0 0 0 1 0]
 [0 1 0 0 0]]

 

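A sparse matrix can also be built directly from three arrays. There are two forms: coordinate arrays (data, row_ind, col_ind), and the (data, indices, indptr) arrays printed in the examples above.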

csr_matrix((data, (row_ind, col_ind)), shape=(M, N))

  where data, row_ind and col_ind satisfy the relationship a[row_ind[k], col_ind[k]] = data[k].

csc_matrix((data, (row_ind, col_ind)), shape=(M, N))

  where data, row_ind and col_ind satisfy the relationship a[row_ind[k], col_ind[k]] = data[k].

Code example:

import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
a = csr_matrix((data, (row, col)), shape=(3, 3)).toarray()

print(a)


Output:
[[1 0 2]
 [0 0 3]
 [4 5 6]]


import numpy as np
from scipy.sparse import csc_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
a = csc_matrix((data, (row, col)), shape=(3, 3)).toarray()

print(a)

Output:
[[1 0 2]
 [0 0 3]
 [4 5 6]]

csr_matrix((data, indices, indptr), [shape=(M, N)])

  is the standard CSR representation where the column indices for row i are stored in indices[indptr[i]:indptr[i+1]] and their corresponding values are stored in data[indptr[i]:indptr[i+1]]. If the shape parameter is not supplied, the matrix dimensions are inferred from the index arrays.

csc_matrix((data, indices, indptr), [shape=(M, N)])

  is the standard CSC representation where the row indices for column i are stored in indices[indptr[i]:indptr[i+1]] and their corresponding values are stored in data[indptr[i]:indptr[i+1]]. If the shape parameter is not supplied, the matrix dimensions are inferred from the index arrays.

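So when the two constructors above receive the same (data, indices, indptr) arguments, which is only valid for both when the shape is square, the dense matrices they describe are transposes of each other.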
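The CSR indexing rule can be made concrete by expanding the three arrays by hand. The following is only a minimal sketch: csr_to_dense is a hypothetical helper written for this illustration, not a SciPy function, and in practice toarray() does this job.

```python
import numpy as np
from scipy.sparse import csr_matrix

def csr_to_dense(data, indices, indptr, shape):
    """Hypothetical helper: expand CSR arrays into a dense NumPy array."""
    dense = np.zeros(shape, dtype=data.dtype)
    n_rows = len(indptr) - 1
    for i in range(n_rows):
        start, end = indptr[i], indptr[i + 1]
        # indices[start:end] holds the column indices of row i,
        # data[start:end] holds the matching values.
        for col, value in zip(indices[start:end], data[start:end]):
            dense[i, col] = value
    return dense

arr = np.array([[0, 0, 0, 1, 0], [0, 1, 0, 0, 0]])
m = csr_matrix(arr)
rebuilt = csr_to_dense(m.data, m.indices, m.indptr, m.shape)
print(np.array_equal(rebuilt, arr))  # True
```

Swapping the roles of rows and columns in the loop would give the corresponding CSC decoder.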

Code example:

import numpy as np
from scipy.sparse import csr_matrix, csc_matrix

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
a = csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()

print(a)


Output:
[[1 0 2]
 [0 0 3]
 [4 5 6]]


indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
a = csc_matrix((data, indices, indptr), shape=(3, 3)).toarray()

print(a)

Output:
[[1 0 4]
 [0 0 5]
 [2 3 6]]
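The transpose relationship between the two results can be checked directly. This is a small sketch that reuses the same three arrays; the names by_row and by_col are chosen here only for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix, csc_matrix

indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])

# Same arrays, interpreted row by row (CSR) and column by column (CSC).
by_row = csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
by_col = csc_matrix((data, indices, indptr), shape=(3, 3)).toarray()

print(np.array_equal(by_row, by_col.T))  # True
```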

Sparse matrices can be used in arithmetic operations: they support addition, subtraction, multiplication, division, and matrix power.
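A few of these operations are sketched below on the 3 x 3 matrix from the earlier examples. The variable names are illustrative, and the sketch uses the @ operator for the matrix product so that it does not depend on how * is interpreted for a given sparse class.

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[1, 0, 2], [0, 0, 3], [4, 5, 6]])
a = csr_matrix(dense)
v = np.array([1, 1, 1])

total = a + a                # sparse + sparse, result stays sparse
scaled = a / 2               # division by a scalar
product = a @ a              # matrix product, result stays sparse
elementwise = a.multiply(a)  # element-wise (Hadamard) product
mv = a @ v                   # matrix-vector product, returns a dense 1-D array

print(total.toarray())
print(mv)
```

Calling toarray() on the sparse results is only done here for printing; in real use the results would normally be kept sparse.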

 

Advantages of the CSR format

  • efficient arithmetic operations CSR + CSR, CSR * CSR, etc.
  • efficient row slicing
  • fast matrix vector products

Disadvantages of the CSR format

  • slow column slicing operations (consider CSC)
  • changes to the sparsity structure are expensive (consider LIL or DOK)
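The trade-offs listed above suggest switching format depending on the operation. The sketch below is one possible way to do that with the conversion methods tocsc(), tolil() and tocsr(); the matrix and the value written into it are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[1, 0, 2], [0, 0, 3], [4, 5, 6]])
a = csr_matrix(dense)

# Row slicing is the cheap direction for CSR.
row = a[1, :]
print(row.toarray())  # [[0 0 3]]

# For column slicing, convert to CSC first.
col = a.tocsc()[:, 2]

# For changes to the sparsity structure, edit in LIL format and convert back.
lil = a.tolil()
lil[0, 1] = 7          # fills a position that was previously zero
updated = lil.tocsr()
```

Each conversion has a cost of its own, so this pattern pays off mainly when many slices or many edits are done between conversions.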

 

 

 
