Playing with Data in Pandas (10) -- Data Binning

Python3 Data Science Roundup: https://blog.csdn.net/weixin_41793113/article/details/99707225


import numpy as np
import pandas as pd
from pandas import Series, DataFrame

score_list = np.random.randint(25, 100, size=20)
score_list

bins = [0,59,70,80,100]
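score_cat = pd.cut(score_list, bins)  # intervals are open on the left and closed on the right
# The top-level pd.value_counts is deprecated in newer pandas releases; count through a Series instead
pd.Series(score_cat).value_counts()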
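
To make the left-open, right-closed behavior concrete, here is a minimal sketch with a few hand-picked boundary scores. The sample values and the names `edge_scores`, `default_bins`, and `left_closed_bins` are illustrative assumptions, not part of the original post. It also shows `right=False`, which switches the intervals to left-closed, right-open.

```python
import numpy as np
import pandas as pd

bins = [0, 59, 70, 80, 100]

# Hypothetical scores sitting exactly on, or just past, the bin edges
edge_scores = np.array([59, 60, 70, 71, 100])

# Default (right=True): intervals are (0, 59], (59, 70], (70, 80], (80, 100]
default_bins = pd.cut(edge_scores, bins)

# right=False: intervals are [0, 59), [59, 70), [70, 80), [80, 100)
left_closed_bins = pd.cut(edge_scores, bins, right=False)

print(default_bins)
print(left_closed_bins)
```

With the default setting a score of 59 lands in the first bin, while with `right=False` it moves to the second bin and a score of 100 falls outside every interval, so it becomes NaN.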


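import random
import string

df = DataFrame()
df['score'] = score_list
# pd.util.testing.rands is unavailable in recent pandas releases, so build random 3-letter names with the standard library
df['student'] = [''.join(random.choices(string.ascii_letters, k=3)) for _ in range(20)]
df['Categories'] = pd.cut(df['score'], bins, labels=['Low', 'OK', 'Good', 'Great'])
df  # the sample DataFrame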


# Bin the score column
bins = [0,59,70,80,100]
pd.cut(df['score'], bins)
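
One consequence of these bin edges is worth a quick check: a value outside every interval is returned as NaN, and because the first interval is open on the left, a score of exactly 0 is also left out. The sketch below uses made-up scores (`raw_scores` is a hypothetical name) and the `include_lowest=True` option of `pd.cut` to pull the lowest edge into the first bin.

```python
import pandas as pd

bins = [0, 59, 70, 80, 100]

# Hypothetical scores: 0 sits on the lowest edge, 105 is above the top edge
raw_scores = [0, 30, 100, 105]

# Default: 0 is excluded from (0, 59] and 105 is out of range, so both become NaN
plain = pd.cut(raw_scores, bins)

# include_lowest=True makes the first interval include its left edge
with_lowest = pd.cut(raw_scores, bins, include_lowest=True)

print(plain)
print(with_lowest)
```

The random scores in this post are drawn from 25 to 99, so the issue does not show up here, but it matters when real data can touch the lowest edge.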


# Pass labels to name each bin
del df['Categories']
df['cc'] = pd.cut(df['score'], bins, labels=['low','ok','good','great'])
df
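
As a follow-up, here is a small sketch of what the labeled column can be used for. It relies on fixed, made-up scores in a hypothetical `demo` frame so the result is reproducible. Since `pd.cut` with `labels` returns an ordered categorical by default, the labels can be counted in bin order and compared against each other.

```python
import pandas as pd

bins = [0, 59, 70, 80, 100]
labels = ['low', 'ok', 'good', 'great']

# Hypothetical fixed scores instead of random ones
demo = pd.DataFrame({'score': [45, 59, 65, 78, 92, 100]})
demo['cc'] = pd.cut(demo['score'], bins, labels=labels)

# Count rows per label, then sort by the category order (low < ok < good < great)
label_counts = demo['cc'].value_counts().sort_index()
print(label_counts)

# The categorical is ordered, so comparisons against a label work
passed = demo[demo['cc'] > 'low']
print(passed)
```

Keeping the labels as an ordered categorical, rather than converting them to plain strings, is what makes the `> 'low'` filter possible.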


 
