sklearn.preprocessing.MultiLabelBinarizer

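Multi-label binarization (each sample's collection of labels becomes one row of a binary indicator matrix, with one column per class): sklearn.preprocessing.MultiLabelBinarizer(classes=None, sparse_output=False)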

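The classes_ attribute: if the classes parameter is set, classes_ equals the classes argument (in the given order); otherwise it is the sorted set of all labels found in the training data.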
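The rule above can be sketched in plain Python. This is a simplified re-implementation written only for illustration, not scikit-learn's actual code, and the function name `binarize_labels` is hypothetical. Without `classes` it uses the sorted union of all labels; with `classes` it keeps the caller's order, so the column layout follows `classes_` in both cases.

```python
def binarize_labels(y, classes=None):
    """Hypothetical helper that mimics MultiLabelBinarizer.fit_transform (dense output)."""
    if classes is None:
        # Default: sorted set of every label seen in the training data
        classes = sorted(set(label for labels in y for label in labels))
    else:
        # Explicit classes: keep the given order unchanged
        classes = list(classes)
    index = {c: i for i, c in enumerate(classes)}
    matrix = []
    for labels in y:
        row = [0] * len(classes)
        for label in labels:
            if label in index:  # labels outside `classes` are skipped in this sketch
                row[index[label]] = 1
        matrix.append(row)
    return classes, matrix


classes, Y = binarize_labels([(1, 2), (3, 4), (5,)])
print(classes)  # [1, 2, 3, 4, 5]
print(Y)        # [[1, 1, 0, 0, 0], [0, 0, 1, 1, 0], [0, 0, 0, 0, 1]]
```

Passing `classes=[2, 3, 4, 5, 6, 1]` to the same helper reproduces the column order of the second example in this post.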

① classes left at its default (None): classes_ is collected from the labels in the training set

In [1]: from sklearn.preprocessing import MultiLabelBinarizer
   ...: mlb = MultiLabelBinarizer()
   ...: mlb.fit_transform([(1, 2), (3,4),(5,)])
   ...:
Out[1]:
array([[1, 1, 0, 0, 0],
       [0, 0, 1, 1, 0],
       [0, 0, 0, 0, 1]])

In [2]: mlb.classes_
Out[2]: array([1, 2, 3, 4, 5])
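To see that the learned `classes_` is the *sorted* set of labels rather than their order of first appearance, one can fit on deliberately unordered input. A short sketch, assuming a recent scikit-learn is installed:

```python
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# Labels first appear in the order 5, 3, 1, yet classes_ comes out sorted
Y = mlb.fit_transform([(5,), (3, 1)])
print(mlb.classes_)  # [1 3 5]
print(Y)
# [[0 0 1]
#  [1 1 0]]
```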

In [5]: from sklearn.preprocessing import MultiLabelBinarizer
   ...: mlb = MultiLabelBinarizer(sparse_output=True)
   ...: mlb.fit_transform([set(['sci-fi', 'thriller']), set(['comedy'])]).toarray()
   ...:
Out[5]:
array([[0, 1, 1],
       [1, 0, 0]])
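With `sparse_output=True` the result is a SciPy sparse matrix in CSR format, which is why the session above calls `.toarray()` before displaying it. String labels work the same way and are sorted alphabetically into `classes_`. A quick check, assuming scikit-learn and SciPy are installed:

```python
from scipy import sparse
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer(sparse_output=True)
Y = mlb.fit_transform([{"sci-fi", "thriller"}, {"comedy"}])
print(sparse.issparse(Y), Y.format)  # True csr
print(list(mlb.classes_))            # ['comedy', 'sci-fi', 'thriller']
# Densify only for display; keep Y sparse when there are many classes
print(Y.toarray())
```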

② classes set explicitly: classes_ equals the classes argument, and the output columns follow that order

In [3]: from sklearn.preprocessing import MultiLabelBinarizer
   ...: mlb = MultiLabelBinarizer(classes = [2,3,4,5,6,1])
   ...: mlb.fit_transform([(1, 2), (3,4),(5,)])
   ...:
Out[3]:
array([[1, 0, 0, 0, 0, 1],
       [0, 1, 1, 0, 0, 0],
       [0, 0, 0, 1, 0, 0]])

In [4]: mlb.classes_
Out[4]: array([2, 3, 4, 5, 6, 1])
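Two consequences of the explicit `classes` order are worth checking. The column for label `6` is all zeros because `6` never occurs in the data, and `inverse_transform` reads the columns back in `classes_` order, so the first sample comes back as `(2, 1)` rather than `(1, 2)`. A short sketch, assuming a recent scikit-learn:

```python
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer(classes=[2, 3, 4, 5, 6, 1])
Y = mlb.fit_transform([(1, 2), (3, 4), (5,)])
# Column index 4 corresponds to label 6, which never appears in the data
print(Y[:, 4])  # [0 0 0]
# Map indicator rows back to label tuples; labels follow classes_ order
print(mlb.inverse_transform(Y))
```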
