Handling Discrete Feature Values in sklearn, Part 1: LabelEncoder

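import pandas as pd
import numpy as np

# Load the video game sales data; the file is not UTF-8 encoded
vg_df = pd.read_csv('vgsales.csv', encoding='ISO-8859-1')
# Preview rows 1 to 6 of the descriptive columns
vg_df[['Name', 'Platform', 'Year', 'Genre', 'Publisher']].iloc[1:7]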


genres = np.unique(vg_df['Genre'])
genres

array(['Action', 'Adventure', 'Fighting', 'Misc', 'Platform', 'Puzzle',
       'Racing', 'Role-Playing', 'Shooter', 'Simulation', 'Sports',
       'Strategy'], dtype=object)

from sklearn.preprocessing import LabelEncoder

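gle = LabelEncoder()
# Learn the distinct genres and encode each row as an integer label
genre_labels = gle.fit_transform(vg_df['Genre'])
# Map each integer label back to its genre name
genre_mappings = {
    index: label for index, label in enumerate(gle.classes_)}
print(genre_mappings)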

{0: 'Action',
 1: 'Adventure',
 2: 'Fighting',
 3: 'Misc',
 4: 'Platform',
 5: 'Puzzle',
 6: 'Racing',
 7: 'Role-Playing',
 8: 'Shooter',
 9: 'Simulation',
 10: 'Sports',
 11: 'Strategy'}

gle.classes_

array(['Action', 'Adventure', 'Fighting', 'Misc', 'Platform', 'Puzzle',
       'Racing', 'Role-Playing', 'Shooter', 'Simulation', 'Sports',
       'Strategy'], dtype=object)
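
The outputs above suggest that each integer label is simply the index of a genre in the sorted `gle.classes_` array. A minimal sketch of that round trip follows. It uses a small made-up list (`toy_genres`) and a separate encoder (`toy_gle`), both hypothetical names, rather than the vgsales data.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for vg_df['Genre']
toy_genres = ['Sports', 'Action', 'Puzzle', 'Action', 'Sports']

toy_gle = LabelEncoder()
toy_labels = toy_gle.fit_transform(toy_genres)

# classes_ holds the sorted unique values; each label is an index into it
print(toy_gle.classes_)
print(toy_labels)

# inverse_transform maps the integer labels back to the original strings
recovered = toy_gle.inverse_transform(toy_labels)
print(recovered)
```

Because the labels follow alphabetical order, they carry no meaningful ranking of the genres. They are identifiers only.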

vg_df['GenreLabel'] = genre_labels
vg_df[['Name', 'Platform', 'Year', 'Genre', 'GenreLabel']].iloc[:7]
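
Once an encoder is fitted, it can also label rows that arrive later by calling `transform`. The sketch below shows this on made-up frames (`train_df` and `new_df` are hypothetical, not part of the vgsales data). It also shows that a value never seen during fitting raises a `ValueError`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frames; not the vgsales data
train_df = pd.DataFrame({'Genre': ['Sports', 'Action', 'Puzzle']})
new_df = pd.DataFrame({'Genre': ['Puzzle', 'Action']})

# Fit on the training genres only, then reuse the mapping on new rows
toy_gle = LabelEncoder().fit(train_df['Genre'])
new_df['GenreLabel'] = toy_gle.transform(new_df['Genre'])
print(new_df)

# A value that was not present during fit cannot be encoded
try:
    toy_gle.transform(['Strategy'])
    unseen_ok = True
except ValueError as err:
    unseen_ok = False
    print('unseen label:', err)
```

In practice this means the encoder should be fitted on data that covers every category expected later, or unseen values need to be handled before calling `transform`.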

