One-hot encoding

Implementing one-hot encoding
1. Using scikit-learn (LabelEncoder + OneHotEncoder)

import numpy as np  # needed for np.hstack below
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder

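testdata = pd.DataFrame({'pet': ['cat', 'dog', 'dog', 'fish']})
# Map each category string to an integer label (cat -> 0, dog -> 1, fish -> 2)
a = LabelEncoder().fit_transform(testdata.pet)
# One-hot encode the labels. `sparse=False` was renamed to `sparse_output` in
# scikit-learn 1.2, so the sparse result is converted explicitly instead.
b = OneHotEncoder().fit_transform(a.reshape(-1, 1)).toarray()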

# Concatenate features column-wise (here b is stacked with itself as a demonstration)
np.hstack((b, b))
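
As a variation on the snippet above: in scikit-learn 0.20 and later, OneHotEncoder accepts string categories directly, so the LabelEncoder step can be skipped. The sketch below assumes such a version is installed and rebuilds the same `testdata` frame so it runs on its own; the names `encoder`, `encoded`, and `categories` are only illustrative.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

testdata = pd.DataFrame({'pet': ['cat', 'dog', 'dog', 'fish']})

# Fit directly on the string column. The input must be 2-D,
# hence the double brackets that select a one-column DataFrame.
encoder = OneHotEncoder()
encoded = encoder.fit_transform(testdata[['pet']]).toarray()

# categories_ holds the sorted category values for each input column,
# which tells us which output column belongs to which category.
categories = list(encoder.categories_[0])
print(categories)
print(encoded)
```

Keeping the fitted `encoder` object around is useful when the same mapping has to be applied to new data later with `encoder.transform`.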

2. Using scikit-learn's DictVectorizer

from sklearn.feature_extraction import DictVectorizer
vec = DictVectorizer(sparse=False)
# The orient value is 'records'; the short form 'record' is rejected by pandas 2.0+
vec.fit_transform(testdata.to_dict(orient='records'))
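
To see which output column corresponds to which category, the fitted vectorizer's `feature_names_` attribute can be inspected. The sketch below is self-contained and writes the records out by hand (an assumption made here so that it does not depend on the pandas version); `records` and `matrix` are illustrative names.

```python
from sklearn.feature_extraction import DictVectorizer

# One dict per sample, the same shape that to_dict(orient='records') produces.
records = [{'pet': 'cat'}, {'pet': 'dog'}, {'pet': 'dog'}, {'pet': 'fish'}]

vec = DictVectorizer(sparse=False)
matrix = vec.fit_transform(records)

# String values are expanded into "<key>=<value>" indicator features.
print(vec.feature_names_)
print(matrix)
```

Numeric values in the dicts would be passed through unchanged, so this approach suits records that mix categorical and numeric fields.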

3. Using pandas' get_dummies function

pd.get_dummies(testdata, columns=['pet'])
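
A small self-contained sketch of what get_dummies returns for this data. The `dtype=int` argument is an addition made here, not part of the original call: the default dtype of the indicator columns differs between pandas versions (boolean in recent releases), and requesting integers keeps the output the same across them.

```python
import pandas as pd

testdata = pd.DataFrame({'pet': ['cat', 'dog', 'dog', 'fish']})

# Each category becomes its own column, named "<column>_<category>".
dummies = pd.get_dummies(testdata, columns=['pet'], dtype=int)

print(list(dummies.columns))
print(dummies)
```

Unlike the scikit-learn encoders, get_dummies does not remember the categories it saw, so it fits best for one-off exploration rather than pipelines that must encode new data consistently.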
