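Multi-label binarization: sklearn.preprocessing.MultiLabelBinarizer(classes=None, sparse_output=False)
classes_ attribute: if the classes parameter is set, classes_ equals that value (in the given order); otherwise classes_ holds the sorted unique labels found in the training data
① classes left at its default (None): classes_ is collected from the labels in the training data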
In [1]: from sklearn.preprocessing import MultiLabelBinarizer
...: mlb = MultiLabelBinarizer()
...: mlb.fit_transform([(1, 2), (3,4),(5,)])
...:
Out[1]:
array([[1, 1, 0, 0, 0],
       [0, 0, 1, 1, 0],
       [0, 0, 0, 0, 1]])
In [2]: mlb.classes_
Out[2]: array([1, 2, 3, 4, 5])
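As a minimal sketch of the default behaviour shown above: the labels are deliberately passed out of order to illustrate that classes_ comes back sorted, and that column j of the output corresponds to classes_[j]. The variable names (samples, encoded, decoded) are illustrative only.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Labels given out of order on purpose; classes is left at its default (None).
samples = [(3, 1), (2,), (5, 4)]

mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform(samples)

# classes_ holds the sorted unique labels seen during fit.
learned_classes = mlb.classes_.tolist()

# Column j of `encoded` corresponds to classes_[j], so each row can be
# mapped back to its labels by pairing flags with classes_.
decoded = [
    tuple(int(label) for label, flag in zip(mlb.classes_, row) if flag)
    for row in encoded
]

print(learned_classes)
print(encoded)
print(decoded)
```

The decoded tuples list each sample's labels in classes_ order, not in the order they were originally written.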
In [5]: from sklearn.preprocessing import MultiLabelBinarizer
...: mlb = MultiLabelBinarizer(sparse_output=True)
...: mlb.fit_transform([set(['sci-fi', 'thriller']), set(['comedy'])]).toarray()
...:
Out[5]:
array([[0, 1, 1],
       [1, 0, 0]])
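The sparse_output example can be checked with a short sketch like the following. It assumes SciPy is available (scikit-learn depends on it) and uses scipy.sparse.issparse to confirm the return type; the variable names are illustrative.

```python
from scipy.sparse import issparse
from sklearn.preprocessing import MultiLabelBinarizer

genres = [{"sci-fi", "thriller"}, {"comedy"}]

# sparse_output=True returns a SciPy sparse matrix instead of a dense array.
mlb_sparse = MultiLabelBinarizer(sparse_output=True)
sparse_result = mlb_sparse.fit_transform(genres)

# The default (sparse_output=False) returns a dense NumPy array.
mlb_dense = MultiLabelBinarizer()
dense_result = mlb_dense.fit_transform(genres)

print(issparse(sparse_result))
print(mlb_sparse.classes_)
print(sparse_result.toarray())
```

Both variants encode the same indicator matrix; only the container type differs, which matters mainly when the number of classes is large and most entries are zero.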
② classes parameter set: classes_ equals the value passed to classes, and the output columns follow that order
In [3]: from sklearn.preprocessing import MultiLabelBinarizer
...: mlb = MultiLabelBinarizer(classes = [2,3,4,5,6,1])
...: mlb.fit_transform([(1, 2), (3,4),(5,)])
...:
Out[3]:
array([[1, 0, 0, 0, 0, 1],
       [0, 1, 1, 0, 0, 0],
       [0, 0, 0, 1, 0, 0]])
In [4]: mlb.classes_
Out[4]: array([2, 3, 4, 5, 6, 1])
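A small sketch of how the explicit classes order maps to output columns, reusing the same data as above. The helper names (class_order, column_of, unused_labels) are illustrative and not part of scikit-learn.

```python
from sklearn.preprocessing import MultiLabelBinarizer

class_order = [2, 3, 4, 5, 6, 1]
samples = [(1, 2), (3, 4), (5,)]

mlb = MultiLabelBinarizer(classes=class_order)
encoded = mlb.fit_transform(samples)

# With classes given, classes_ keeps the supplied order instead of sorting.
column_of = {label: idx for idx, label in enumerate(mlb.classes_.tolist())}

# A label listed in classes but absent from the data yields an all-zero column.
unused_labels = [
    label for label, col in column_of.items() if not encoded[:, col].any()
]

print(column_of)
print(unused_labels)
```

Here label 1 lands in the last column because it is last in classes, and label 6 never appears in the samples, so its column stays zero.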