I got an error when using scipy.sparse.hstack() to merge two arrays.
It turned out to be a real pitfall.
I found this answer on Stack Overflow.
def hstack(blocks ...):
    return bmat([blocks], ...)
def bmat(blocks, ...):
    blocks = np.asarray(blocks, dtype='object')
    if blocks.ndim != 2:
        raise ValueError('blocks must be 2-D')
    # (continues)
# @hpaulj
# https://stackoverflow.com/questions/31900567/scipy-sparse-hstack1-2-valueerror-blocks-must-be-2-d-why
As you can see, the argument is first converted to an np.ndarray, and then checked to see whether it is 2-D.
Let's test it:
import numpy as np
from scipy.sparse import coo_matrix, hstack
aa = np.array([[4], [5], [6]])
bb = np.array([[1], [2], [3]])
print(aa.shape)  # (3, 1)
print(bb.shape)  # (3, 1)
A = coo_matrix(aa)
B = coo_matrix(bb)
print(A.shape)  # (3, 1)
print(B.shape)  # (3, 1)
# After converting to scipy.sparse.coo_matrix, the two can be stacked normally
C = hstack([A, B])
print(C.shape)  # (3, 2)
# Passing plain numpy.ndarray objects raises an error
c = hstack([aa, bb])
# ValueError: blocks must be 2-D
If this isn't a pitfall, what is??
Why on earth does np.ndarray raise an error!!!
Let's reproduce what happens step by step.
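# Inside hstack (via bmat), the argument is first converted to an np.ndarray
blocks = [aa, bb]
blocks = np.asarray(blocks, dtype='object')
# Print it and take a look
print(blocks.shape)
# Output:
# (2, 3, 1), so the two arrays got stacked along a new leading axis!
print(blocks.ndim)
# Output: 3, so it is judged not to be a 2-D array
# (per the excerpt above, hstack actually passes [blocks] to bmat, which adds
# one more leading dimension; either way ndim != 2)
if blocks.ndim != 2:
    raise ValueError('blocks must be 2-D')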
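So in the end, it all comes down to how numpy's own np.asarray() treats its input.
With sparse.coo_matrix, each matrix is kept as a single object, so [A, B] is laid out horizontally as a row of two blocks.
With np.ndarray, [aa, bb] is unpacked into one higher-dimensional array, so the two arrays end up stacked along a new leading axis.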
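The difference can be checked directly. The following is a minimal sketch, assuming the `[blocks]` wrapping shown in the quoted excerpt; the names `dense_blocks` and `sparse_blocks` are only for illustration.

```python
import numpy as np
from scipy.sparse import coo_matrix

aa = np.array([[4], [5], [6]])
bb = np.array([[1], [2], [3]])
A = coo_matrix(aa)
B = coo_matrix(bb)

# ndarrays are unpacked element by element, which adds extra dimensions
dense_blocks = np.asarray([[aa, bb]], dtype='object')
print(dense_blocks.shape)   # (1, 2, 3, 1)

# sparse matrices are stored as single opaque objects in a 2-D grid
sparse_blocks = np.asarray([[A, B]], dtype='object')
print(sparse_blocks.shape)  # (1, 2)
```

Only the second array passes the `blocks.ndim != 2` check.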
In summary, to get rid of the error,
I recommend converting everything to sparse.coo_matrix before use.
import numpy as np
from scipy.sparse import coo_matrix, hstack
# aa and bb are the (3, 1) arrays from the test above
A = coo_matrix(aa)
B = coo_matrix(bb)
C = hstack([A, B])
# No error this way
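But converting like this is such a hassle (said in a Taiwanese accent)!!
So what do we do!!
As Uncle says: magic must defeat magic!
Leave numpy's business to numpy!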
numpy.hstack(tup)
This is equivalent to concatenation along the second axis, except for 1-D arrays where it concatenates along the first axis. Rebuilds arrays divided by hsplit.
tup : sequence of ndarrays
The arrays must have the same shape along all but the second axis, except 1-D arrays which can be any length.
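The 1-D exception in that description can be seen with a quick check. This is a small illustrative sketch; the variable names `v` and `m` are my own.

```python
import numpy as np

# 1-D inputs: concatenated along the first (only) axis, lengths may differ
v = np.hstack([np.array([1, 2, 3]), np.array([4, 5])])
print(v.shape)  # (5,)

# 2-D inputs: concatenated along the second axis (columns side by side)
m = np.hstack([np.array([[4], [5], [6]]), np.array([[1], [2], [3]])])
print(m.shape)  # (3, 2)
```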
Example:
import numpy as np
from scipy.sparse import coo_matrix, hstack
aa = np.array([[4], [5], [6]])
bb = np.array([[1], [2], [3]])
print(aa.shape)  # (3, 1)
print(bb.shape)  # (3, 1)
# Use magic to defeat magic!
c = np.hstack([aa, bb])
print(c.shape)
# (3, 2)
# Finally, convert back to a sparse matrix
cc = coo_matrix(c)
# -----------------------------------
# For comparison: convert first, then stack with scipy.sparse.hstack
A = coo_matrix(aa)
B = coo_matrix(bb)
print(A.shape)
print(B.shape)
C = hstack([A, B])
print(C.shape)