[sklearn] Data preprocessing with LabelEncoder() and OneHotEncoder()

Using scikit-learn

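Note: tell OneHotEncoder to return a dense array by passing sparse=False (the parameter was renamed sparse_output in scikit-learn 1.2, and the old name was removed in 1.4). Otherwise fit_transform returns a SciPy sparse matrix, which prints as (row, column) index/value entries instead of a regular array.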

import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder

# define example
data = ['cold', 'cold', 'warm', 'cold', 'hot', 'hot', 'warm', 'cold', 'warm', 'hot']
values = np.array(data)
print(values)

# integer encode
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(values)
print(integer_encoded)

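# one-hot encode
# sparse_output=False returns a dense array (on scikit-learn < 1.2 use sparse=False instead)
onehot_encoder = OneHotEncoder(sparse_output=False)
# required: OneHotEncoder expects a 2-D array of shape (n_samples, n_features)
values = values.reshape(len(values), 1)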
onehot_encoded = onehot_encoder.fit_transform(values)
print(onehot_encoded)

Output:

['cold' 'cold' 'warm' 'cold' 'hot' 'hot' 'warm' 'cold' 'warm' 'hot']
[0 0 2 0 1 1 2 0 2 1]
[[ 1.  0.  0.]
 [ 1.  0.  0.]
 [ 0.  0.  1.]
 [ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]
 [ 1.  0.  0.]
 [ 0.  0.  1.]
 [ 0.  1.  0.]]
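
To make the note about sparse output concrete, here is a minimal sketch of what the encoder returns with its default settings and how to densify the result afterwards. The shortened input array is only for illustration.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Shortened illustrative input, reshaped to (n_samples, 1)
values = np.array(['cold', 'warm', 'hot', 'cold']).reshape(-1, 1)

# With default settings the encoder returns a SciPy sparse matrix
sparse_encoded = OneHotEncoder().fit_transform(values)
print(type(sparse_encoded))
print(sparse_encoded)  # printed as (row, column) value entries

# .toarray() converts the sparse matrix to an ordinary dense NumPy array
dense_encoded = sparse_encoded.toarray()
print(dense_encoded)
```

Calling .toarray() avoids the sparse/sparse_output keyword entirely, so this form should behave the same across scikit-learn versions.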

Converting one-hot encoding back to label encoding

# invert the one-hot encoding: the column index of the 1 in each row is the integer label
int_encoded = np.argmax(onehot_encoded, axis=1)
print(int_encoded)

Output:

[0 0 2 0 1 1 2 0 2 1]
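
The integer labels can be taken one step further, back to the original strings. The sketch below assumes a fitted LabelEncoder is still available. The one-hot matrix is built with np.eye purely as a stand-in so that the snippet runs on its own.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

data = np.array(['cold', 'cold', 'warm', 'cold', 'hot', 'hot', 'warm', 'cold', 'warm', 'hot'])

label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(data)

# classes_ holds the sorted unique labels; integer i stands for classes_[i]
print(label_encoder.classes_)

# Stand-in one-hot matrix (row i has a 1 in column integer_encoded[i])
onehot = np.eye(len(label_encoder.classes_))[integer_encoded]

# one-hot -> integer labels -> original string labels
recovered_ints = np.argmax(onehot, axis=1)
recovered_labels = label_encoder.inverse_transform(recovered_ints)
print(recovered_labels)
```

Because LabelEncoder sorts the classes, 'cold', 'hot' and 'warm' map to 0, 1 and 2, which matches the integer output shown earlier.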

Using Keras

Suppose the label encoding [0 0 2 0 1 1 2 0 2 1] is already available. keras.utils.to_categorical() converts it to a one-hot encoding.

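from keras.utils import to_categorical

# integer_encoded is the LabelEncoder output from the scikit-learn example
encoded = to_categorical(integer_encoded)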
print(integer_encoded)
print(encoded)

Output:

[0 0 2 0 1 1 2 0 2 1]
[[ 1.  0.  0.]
 [ 1.  0.  0.]
 [ 0.  0.  1.]
 [ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]
 [ 1.  0.  0.]
 [ 0.  0.  1.]
 [ 0.  1.  0.]]
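
If Keras is not installed, a matrix with the same values can be built with plain NumPy. This is only a rough equivalent for this input, not how Keras implements it. It assumes the classes are numbered from 0 to the largest label, which is also how to_categorical infers the number of classes when num_classes is not given.

```python
import numpy as np

integer_encoded = np.array([0, 0, 2, 0, 1, 1, 2, 0, 2, 1])

# Assumption: classes are numbered 0..max, so the class count is max + 1
num_classes = integer_encoded.max() + 1

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with the labels picks one row per sample
encoded = np.eye(num_classes)[integer_encoded]
print(encoded)
```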

In deep-learning classification tasks, y (the labels) is typically one-hot encoded before training.

Reference:
https://blog.csdn.net/gdh756462786/article/details/79161525
