Two ways to implement one-hot encoding


Approach 1: You can use get_dummies on a pandas DataFrame.

# Transform a given column into one-hot columns. Pass prefix= to tell the dummies of several columns apart.
>>> import pandas as pd
>>> df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': ['b', 'a', 'c']})
>>> df
   A  B
0  a  b
1  b  a
2  c  c
>>> # Get the one-hot encoding of column B (dtype=int gives 0/1 rather than booleans)...
>>> one_hot = pd.get_dummies(df['B'], dtype=int)
>>> # Drop column B, as it is now encoded...
>>> df = df.drop('B', axis=1)
>>> # Join the encoded columns back onto df...
>>> df = df.join(one_hot)
>>> df
   A  a  b  c
0  a  0  1  0
1  b  1  0  0
2  c  0  0  1
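
The drop-and-join steps above can also be folded into a single call. The following is a minimal sketch, assuming a pandas version that supports the `dtype` argument of `get_dummies`; the sample frame and the name `encoded` are only illustrative.

```python
import pandas as pd

# Illustrative sample data, same shape as the session above.
df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': ['b', 'a', 'c']})

# columns= selects which columns to encode. The original column 'B' is
# removed and replaced by B_a, B_b, B_c; dtype=int yields 0/1 values.
encoded = pd.get_dummies(df, columns=['B'], prefix='B', dtype=int)

print(encoded.columns.tolist())
print(encoded)
```

With a prefix, the dummy columns of several encoded columns stay distinguishable even when the columns share category values.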

A demo of dummy-encoding qualitative (categorical) features:

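import pandas as pd

def one_hot(df, cols):
    """
    @param df pandas DataFrame
    @param cols a list of columns to encode
    @return a DataFrame with one-hot encoding
    """
    for each in cols:
        # One dummy column per category, named "<column>_<category>"
        dummies = pd.get_dummies(df[each], prefix=each, drop_first=False)
        # Append the dummies; the original column is kept alongside them
        df = pd.concat([df, dummies], axis=1)
    return df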
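
A short usage sketch of the helper above may make its behaviour clearer. The helper is repeated only so the snippet runs on its own; the sample frame and the names `encoded` and `only_dummies` are made up for illustration.

```python
import pandas as pd

def one_hot(df, cols):
    # Same helper as above, repeated so this snippet is self-contained.
    for each in cols:
        dummies = pd.get_dummies(df[each], prefix=each, drop_first=False)
        df = pd.concat([df, dummies], axis=1)
    return df

# Illustrative sample frame
df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': ['b', 'a', 'c']})

encoded = one_hot(df, ['A', 'B'])
print(encoded.columns.tolist())

# The helper keeps the source columns; drop them if only the dummies are wanted.
only_dummies = encoded.drop(columns=['A', 'B'])
print(only_dummies.columns.tolist())
```

Because `pd.concat` returns a new object, the frame passed in by the caller is left unchanged.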

Dummy-encoding feature variables with sklearn:

>>> # Output as shown by older scikit-learn releases (before 0.22)
>>> from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder()
>>> enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])
OneHotEncoder(categorical_features='all', dtype=<class 'numpy.float64'>,
       handle_unknown='error', n_values='auto', sparse=True)
>>> enc.n_values_
array([2, 3, 4])
>>> enc.feature_indices_
array([0, 2, 5, 9])
>>> enc.transform([[0, 1, 1]])
<1x9 sparse matrix of type '<class 'numpy.float64'>'
        with 3 stored elements in Compressed Sparse Row format>
>>> enc.transform([[0, 1, 1]]).toarray()
array([[ 1.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.]])
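
On scikit-learn 0.22 or later, the fitted encoder exposes `categories_` rather than `n_values_` and `feature_indices_`. The sketch below repeats the same fit under that assumption; `handle_unknown='ignore'` is an optional extra shown here for illustration, and the names `X`, `row`, and `unseen` are made up.

```python
from sklearn.preprocessing import OneHotEncoder

X = [[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]

# handle_unknown='ignore' encodes a value not seen during fit as an
# all-zero block instead of raising an error.
enc = OneHotEncoder(handle_unknown='ignore')
enc.fit(X)

# categories_ lists the values learned for each input column
# (2, 3 and 4 distinct values here, hence 9 output columns).
print(enc.categories_)

# transform returns a sparse matrix by default; toarray() makes it dense.
row = enc.transform([[0, 1, 1]]).toarray()
print(row)

# 7 never appeared in the third column, so its block stays all zeros.
unseen = enc.transform([[0, 1, 7]]).toarray()
print(unseen)
```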

A demo that keeps a LabelBinarizer in a global variable:

from sklearn.preprocessing import LabelBinarizer

label_binarizer = LabelBinarizer()
# Fit once on the complete label list (all_your_labels_list is your own data).
# The fitted object needs to be global, or otherwise remembered, so that later
# calls reuse the same label-to-column mapping.
label_binarizer.fit(all_your_labels_list)

def one_hot_encode(x):
    """
    One-hot encode a list of sample labels. Return a one-hot encoded vector for each label.

    : x: List of sample labels
    : return: Numpy array of one-hot encoded labels
    """
    return label_binarizer.transform(x)
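
To see what the fitted binarizer produces, here is a small sketch with a concrete label list. The labels and the names `all_labels`, `encoded`, and `decoded` are hypothetical stand-ins for your own data.

```python
from sklearn.preprocessing import LabelBinarizer

# Hypothetical label set standing in for all_your_labels_list
all_labels = ['cat', 'dog', 'bird']

label_binarizer = LabelBinarizer()
label_binarizer.fit(all_labels)

# classes_ holds the sorted labels; output column i corresponds to classes_[i]
print(label_binarizer.classes_)

encoded = label_binarizer.transform(['dog', 'bird', 'dog'])
print(encoded)

# inverse_transform maps one-hot rows back to the original labels
decoded = label_binarizer.inverse_transform(encoded)
print(decoded)
```

Note that with exactly two classes, LabelBinarizer returns a single 0/1 column rather than two one-hot columns.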

https://stackoverflow.com/questions/37292872/how-can-i-one-hot-encode-in-python
