Python: Discretizing Continuous Attributes (Equal-Frequency Method)

import pandas as pd

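datafile = '/users/dzh/downloads/discretization_data.xls'  # path to the input data
data = pd.read_excel(datafile)  # read the data (pandas needs the xlrd package for .xls files)
data = data['share_credit_limit_amount'].copy()  # keep only the column to discretize
k = 2  # number of bins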

 

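# Equal-frequency discretization
w = [i / k for i in range(k + 1)]  # quantile levels 0, 1/k, ..., 1
w = data.quantile(w).to_numpy()  # bin edges; quantile() returns exactly the requested levels, unlike slicing describe()
# include_lowest=True keeps the minimum in the first bin, also when it is zero or negative
d2 = pd.cut(data, w, labels=range(k), include_lowest=True)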
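The script calls cluster_plot(d2, k) a few lines further down, but the helper is not defined in this listing. The following is only a minimal sketch of what such a helper might look like, assuming it draws every raw value as a dot on a horizontal line for its bin and returns the matplotlib pyplot module so that .show() can be chained. The points_per_bin function and the values parameter are hypothetical names introduced here for illustration, and matplotlib is an assumed dependency. The definition would need to appear before the call.

```python
def points_per_bin(values, labels, k):
    """Group raw values by their bin label (0 .. k-1). Hypothetical helper."""
    groups = [[] for _ in range(k)]
    for value, label in zip(values, labels):
        # Values outside every bin get a missing label (None or NaN); skip them.
        if label is None or label != label:
            continue
        groups[int(label)].append(value)
    return groups


def cluster_plot(d, k, values=None):
    """Plot each value at the height of its bin label; return pyplot."""
    import matplotlib.pyplot as plt  # assumed dependency, imported lazily

    if values is None:
        values = data  # assumption: fall back to the script's global Series

    plt.figure(figsize=(8, 3))
    for j, group in enumerate(points_per_bin(values, d, k)):
        plt.plot(group, [j] * len(group), "o")  # one row of dots per bin
    plt.ylim(-0.5, k - 0.5)
    return plt
```

With this definition in place, the existing call cluster_plot(d2, k).show() would pick up the global data Series. Passing the values explicitly, as in cluster_plot(d2, k, data), avoids relying on a global.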


 
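# cluster_plot is a user-defined plotting helper; it is not defined in this listing
cluster_plot(d2, k).show()
# Save the bin labels to CSV
d2.to_csv("/users/dzh/desktop/testcsv.csv", encoding='utf-8', index=False)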
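To make the equal-frequency idea above concrete without the Excel file, here is a small standard-library sketch. It computes the k-1 interior quantile cut points and assigns each value to a right-closed bin, which is roughly what the quantile edges plus pd.cut do in the script. The function name equal_frequency_labels and the sample values are made up for illustration. It assumes Python 3.8 or later for statistics.quantiles.

```python
from bisect import bisect_left
from statistics import quantiles


def equal_frequency_labels(values, k):
    """Label each value 0 .. k-1 so that bins hold roughly equal counts."""
    # k-1 interior cut points; "inclusive" interpolates linearly between
    # the sorted data points and treats min and max as the 0% and 100% levels.
    cuts = quantiles(values, n=k, method="inclusive")
    # bisect_left counts the cut points strictly below the value, so a value
    # equal to a cut point falls into the lower bin (right-closed intervals).
    return [bisect_left(cuts, v) for v in values]


sample = [60, 10, 40, 20, 50, 30]  # made-up data
print(equal_frequency_labels(sample, 3))  # [2, 0, 1, 0, 2, 1]
```

Each of the three bins receives two of the six values. With heavily tied data the bins can end up uneven, because equal values always share a bin.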
