The transform method of OneHotEncoder returns a SciPy sparse matrix

from sklearn.preprocessing import OneHotEncoder

xgb_enc_1 = OneHotEncoder()
xgb_enc_2 = OneHotEncoder()

xgb_enc_1.fit(model_1.apply(train_gb))
xgb_enc_2.fit(model_2.apply(train_gb))
# transform returns a sparse matrix; train_lr is a dense NumPy array
temp_1 = xgb_enc_1.transform(model_1.apply(train_lr))
temp_2 = xgb_enc_2.transform(model_2.apply(train_lr))
temp_3 = train_lr

temp_1
Out[24]: 
<256x1624 sparse matrix of type '<class 'numpy.float64'>'
	with 217600 stored elements in Compressed Sparse Row format>

temp_2
Out[25]: 
<256x1977 sparse matrix of type '<class 'numpy.float64'>'
	with 217600 stored elements in Compressed Sparse Row format>

temp_3.shape
Out[31]: (256, 14)

If you concatenate the two directly with np.hstack:

train_lr_ext_2 = np.hstack((temp_1,temp_3))
it raises an error:
ValueError: all the input arrays must have same number of dimensions
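
The failure can be reproduced without the trained models. The following is a minimal sketch, assuming only NumPy and SciPy are installed; sparse_part and dense_part are hypothetical stand-ins for temp_1 and temp_3. It suggests why np.hstack complains about the number of dimensions: NumPy does not read the sparse matrix as a 2-D array but wraps the whole object in a 0-dimensional object array. The exact wording of the error message may differ between NumPy versions.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical stand-ins for temp_1 (sparse) and temp_3 (dense).
sparse_part = csr_matrix(np.eye(4))
dense_part = np.ones((4, 2))

# NumPy does not unpack a SciPy sparse matrix; it stores the whole
# object as the single element of an object array.
wrapped = np.asarray(sparse_part)
print(wrapped.ndim, wrapped.dtype)

# np.hstack therefore sees inputs with different numbers of dimensions.
try:
    np.hstack((sparse_part, dense_part))
    hstack_failed = False
except ValueError as exc:
    hstack_failed = True
    print("ValueError:", exc)
```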

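NumPy does not treat the sparse matrix as a 2-D array, so its number of dimensions does not match that of the dense matrix. There are two ways to solve this: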

  1. Convert the sparse matrix to a dense one with its todense() method
a = temp_1.todense()

train_lr_ext_2 = np.hstack((a,temp_3))

train_lr_ext_2.shape
Out[34]: (256, 1638)
  2. Concatenate with the hstack() function from scipy.sparse
from scipy.sparse import hstack

b = hstack((temp_1,temp_3))

b.shape
Out[39]: (256, 1638)
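
For comparison, here is a small self-contained sketch of both approaches side by side. It assumes only NumPy and SciPy; onehot_part and dense_part are hypothetical stand-ins for temp_1 and temp_3, built by hand rather than from the trained models. It uses toarray() instead of todense(), a deliberate swap: toarray() returns a plain ndarray, while todense() returns a numpy.matrix.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins: a one-hot style sparse block and dense features.
rows = np.arange(6)
cols = np.array([0, 2, 1, 0, 3, 2])
onehot_part = sparse.csr_matrix((np.ones(6), (rows, cols)), shape=(6, 4))
dense_part = np.random.default_rng(0).normal(size=(6, 3))

# Method 1: densify first, then concatenate with NumPy.
dense_result = np.hstack((onehot_part.toarray(), dense_part))

# Method 2: stay sparse. scipy.sparse.hstack also accepts the dense block;
# format="csr" requests CSR output instead of the default COO.
sparse_result = sparse.hstack((onehot_part, dense_part), format="csr")

# Both methods produce the same values in the same column order.
print(dense_result.shape, sparse_result.shape)
print(np.allclose(dense_result, sparse_result.toarray()))
```

The second method is usually the better fit when the one-hot block is wide and mostly zeros, because the result stays sparse and estimators such as scikit-learn's LogisticRegression accept sparse input directly.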
