pandas: Converting Categories to Numbers

There are three ways to convert categorical values to numbers. The first is to convert them to a one-hot encoding:

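import pandas as pd

data = pd.DataFrame({"level": ["low", "high", "medium", "high"], "age": [14, 33, 24, 35]})
# dtype=int keeps the 0/1 output shown below; recent pandas versions default to booleans
print(pd.get_dummies(data, dtype=int))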

#    age  level_high  level_low  level_medium
# 0   14           0          1             0
# 1   33           1          0             0
# 2   24           0          0             1
# 3   35           1          0             0

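The second method converts the categories to numbers automatically, but the resulting numbers carry no logical order. For example, "medium" should fall between "low" and "high", yet below it is assigned a value larger than both, because the codes follow the alphabetical order of the labels: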

import pandas as pd

data = pd.DataFrame({"level": ["low", "high", "medium", "high"], "age": [14, 33, 24, 35]})
# cat.codes starts at 0, so add 1 to get codes starting at 1
data["level"] = data["level"].astype("category").cat.codes + 1
print(data)

#    level  age
# 0      2   14
# 1      1   33
# 2      3   24
# 3      1   35
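
As a variation on the second method, the category order can be declared up front so that the automatic codes follow it. The following is a minimal sketch, assuming the intended order is low < medium < high; the name `ordered_levels` is introduced here purely for illustration.

```python
import pandas as pd

data = pd.DataFrame({"level": ["low", "high", "medium", "high"], "age": [14, 33, 24, 35]})

# Assumed order: low < medium < high. Codes are assigned by position in this list.
ordered_levels = pd.CategoricalDtype(categories=["low", "medium", "high"], ordered=True)

# cat.codes starts at 0, so add 1 to get low=1, medium=2, high=3
data["level"] = data["level"].astype(ordered_levels).cat.codes + 1
print(data)
```

With this dtype, any value not listed in `categories` becomes missing and gets the code -1 (0 after adding 1), so it is worth checking for unexpected labels first.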

The third method lets you specify the mapping between categories and numbers yourself, for example mapping "high" to 3 and "low" to 1:

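import pandas as pd

data = pd.DataFrame({"level": ["low", "high", "medium", "high"], "age": [14, 33, 24, 35]})
# Explicit mapping from each category to the number it should become
level_map = {"low": 1, "medium": 2, "high": 3}
data["level"] = data["level"].map(level_map)
print(data)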

#    level  age
# 0      1   14
# 1      3   33
# 2      2   24
# 3      3   35
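
One thing to watch with an explicit mapping: a category that is missing from the dictionary is turned into NaN by `map` rather than raising an error. The sketch below shows one way to spot such values; the "unknown" label and the names `mapped` and `missing` are made up for this illustration.

```python
import pandas as pd

# "unknown" is a hypothetical label that has no entry in level_map
data = pd.DataFrame({"level": ["low", "high", "unknown"], "age": [14, 33, 24]})
level_map = {"low": 1, "medium": 2, "high": 3}

# Labels without an entry in level_map come back as NaN
mapped = data["level"].map(level_map)

# Collect the original labels that could not be mapped
missing = data.loc[mapped.isna(), "level"].unique().tolist()
print(missing)
```

Checking `missing` before overwriting the column makes it easier to notice typos or new categories in the data.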
