X = OneHotEncoder().fit_transform(X_data).toarray()  # .toarray() gives an ndarray; .todense() would return np.matrix
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
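def oneHot(df):
    new_cols = []
    for old_col in df.columns:
        # Sort the values themselves, not the formatted names: a string sort would
        # put '..._12' before '..._6' and no longer match the encoder's column order.
        new_cols += ['{0}_onehot_{1}'.format(old_col, str(x).lower()) for x in sorted(set(df[old_col].values))]
    ec = OneHotEncoder()
    # Fit and transform the same array so both steps see identical input.
    encoded = ec.fit_transform(df.values).toarray()
    return pd.DataFrame(encoded, columns=new_cols)


if __name__ == '__main__':
    df = pd.DataFrame(np.arange(24).reshape(4, 6))
    print(df)
    print(oneHot(df))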
    0   1   2   3   4   5
0   0   1   2   3   4   5
1   6   7   8   9  10  11
2  12  13  14  15  16  17
3  18  19  20  21  22  23
FutureWarning: The handling of integer data will change in version 0.22. Currently, the categories are determined based on the range [0, max(values)], while in the future they will be determined based on the unique values.
If you want the future behaviour and silence this warning, you can specify "categories='auto'".
In case you used a LabelEncoder before this OneHotEncoder to convert the categories to integers, then you can now use the OneHotEncoder directly.
warnings.warn(msg, FutureWarning)
     0    1    2    3    4    5    6  ...   17   18   19   20   21   22   23
0  1.0  0.0  0.0  0.0  1.0  0.0  0.0  ...  0.0  0.0  0.0  1.0  0.0  0.0  0.0
1  0.0  1.0  0.0  0.0  0.0  1.0  0.0  ...  1.0  0.0  0.0  0.0  1.0  0.0  0.0
2  0.0  0.0  1.0  0.0  0.0  0.0  1.0  ...  0.0  1.0  0.0  0.0  0.0  1.0  0.0
3  0.0  0.0  0.0  1.0  0.0  0.0  0.0  ...  0.0  0.0  1.0  0.0  0.0  0.0  1.0

[4 rows x 24 columns]
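
The FutureWarning in the output says it can be silenced by passing categories='auto'. The following is a minimal sketch of that suggestion, not a definitive implementation. The function name one_hot_auto is hypothetical, and building the column names from the encoder's fitted categories_ attribute is an assumption about how one might keep the labels in the encoder's own order.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder


def one_hot_auto(df):
    """Hypothetical variant of oneHot() that passes categories='auto'."""
    # categories='auto' derives each column's categories from its unique values,
    # which is what the FutureWarning describes as the future behaviour.
    ec = OneHotEncoder(categories='auto')
    encoded = ec.fit_transform(df.values).toarray()
    # ec.categories_ holds one sorted array of categories per input column,
    # in the same order as the encoded output columns.
    names = ['{0}_onehot_{1}'.format(col, str(cat).lower())
             for col, cats in zip(df.columns, ec.categories_)
             for cat in cats]
    return pd.DataFrame(encoded, columns=names)


df = pd.DataFrame(np.arange(24).reshape(4, 6))
result = one_hot_auto(df)
print(result.shape)              # (4, 24)
print(list(result.columns[:4]))  # names for the four categories of column 0
```

Taking the names from categories_ avoids recomputing the unique values separately, so the labels cannot drift out of step with the encoded columns.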