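'''
pd.cut(
    x,                             # the data to bin (1-D array-like)
    bins,                          # the bins: an int (number of equal-width bins) or a sequence of bin edges
    right: bool = True,            # which side of each interval is closed. True: left-open, right-closed (a, b]; False: left-closed, right-open [a, b). Default True
    labels=None,                   # labels for the bins, returned instead of the intervals; values outside every bin become NaN. labels=False returns integer bin indices
    retbins: bool = False,         # whether to also return the bin edges. Default False
    precision: int = 3,            # precision used to store and display the bin labels. Default 3
    include_lowest: bool = False,  # whether the first interval includes its left edge. Default False
    duplicates: str = 'raise'      # what to do if the bin edges are not unique: 'raise' raises a ValueError, 'drop' drops the duplicate edges. Default 'raise'
)
'''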
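bins: can be an int or a list. An int n splits the range of x into n equal-width bins (the range is widened by 0.1% so that the minimum is included, hence the 9.918 below); a list gives the bin edges explicitly, and values outside them become NaN.
import numpy as np
import pandas as pd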
x = np.array([10,32,38,55,70,92])
# bins as an int
pd.cut(x, bins=3)
# output
[(9.918, 37.333], (9.918, 37.333], (37.333, 64.667], (37.333, 64.667], (64.667, 92.0], (64.667, 92.0]]
Categories (3, interval[float64]): [(9.918, 37.333] < (37.333, 64.667] < (64.667, 92.0]]
# bins as a list: 10 is not inside (10, 30], so it becomes NaN
pd.cut(x, bins=[10,30,80,100])
# output
[NaN, (30.0, 80.0], (30.0, 80.0], (30.0, 80.0], (30.0, 80.0], (80.0, 100.0]]
Categories (3, interval[int64]): [(10, 30] < (30, 80] < (80, 100]]
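With an int bins, the edges shown above (9.918, 37.333, 64.667, 92.0) can be reproduced by hand. The sketch below assumes the rule pandas applies internally for this case: n + 1 equally spaced edges from np.linspace over [min, max], then the outer edge on the open side moved outward by 0.1% of the range. manual_edges is a hypothetical helper written only for this comparison; it is not part of pandas and assumes max > min.

```python
import numpy as np
import pandas as pd


def manual_edges(x, n, right=True):
    """Hypothetical helper: recompute the edges pd.cut uses for an int `bins`.

    Assumes max(x) > min(x); pandas handles a constant x differently.
    """
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    edges = np.linspace(lo, hi, n + 1)  # n equal-width bins over [min, max]
    adj = (hi - lo) * 0.001             # 0.1% of the range
    if right:
        edges[0] -= adj   # (a, b] intervals: lower the first edge so min(x) is inside
    else:
        edges[-1] += adj  # [a, b) intervals: raise the last edge so max(x) is inside
    return edges


x = np.array([10, 32, 38, 55, 70, 92])
_, pandas_edges = pd.cut(x, bins=3, retbins=True)
print(manual_edges(x, 3))                             # first edge is 10 - 0.082 = 9.918
print(np.allclose(manual_edges(x, 3), pandas_edges))  # True
```

With right=False the same comparison should show the last edge raised to 92 + 0.082 = 92.082 instead of the first edge lowered.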
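right: controls which side of each interval is closed. The default is True, i.e. left-open and right-closed.
True: left-open, right-closed, e.g. (10, 30]
False: left-closed, right-open, e.g. [10, 30)
The output below shows that with right=False the intervals become left-closed and right-open, so 10 now falls into [10, 30) instead of becoming NaN; with right=True it is the other way around.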
import numpy as np
import pandas as pd
x = np.array([10,32,38,55,70,92])
pd.cut(x, bins=[10,30,80,100],right=False)
# output
[[10, 30), [30, 80), [30, 80), [30, 80), [30, 80), [80, 100)]
Categories (3, interval[int64]): [[10, 30) < [30, 80) < [80, 100)]
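The only difference between the two settings is which neighbouring interval owns a shared edge. A quick check with pd.Interval, the type of each category in the output above, makes this concrete for the edge value 30:

```python
import pandas as pd

edges = [10, 30, 80, 100]

# pd.Interval membership follows its `closed` side
print(30 in pd.Interval(10, 30, closed="right"))  # True:  (10, 30] contains 30
print(30 in pd.Interval(10, 30, closed="left"))   # False: [10, 30) does not

# So pd.cut puts the edge value 30 into different bins
print(pd.cut([30], bins=edges)[0])               # (10, 30]
print(pd.cut([30], bins=edges, right=False)[0])  # [30, 80)
```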
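labels: labels for the bins, returned in place of the intervals. Values that fall outside every bin are NaN. The number of labels must equal the number of bins, i.e. one fewer than the number of edges.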
import numpy as np
import pandas as pd
x = np.array([10,32,38,55,70,92])
pd.cut(x, bins=[10,30,80,100], labels=['low', 'medium', 'high'])
# output
[NaN, medium, medium, medium, medium, high]
Categories (3, object): [low < medium < high]
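A related option is labels=False: instead of intervals or label categories, pd.cut returns the integer index of each value's bin. The sketch below, using the same data and edges, shows that looking label names up by those indices gives the same result as passing the names as labels, that values outside every bin come back as NaN (which forces a float array), and that a label list of the wrong length is rejected. The names list is only an illustration.

```python
import numpy as np
import pandas as pd

x = np.array([10, 32, 38, 55, 70, 92])
edges = [10, 30, 80, 100]
names = ["low", "medium", "high"]  # one name per bin: len(edges) - 1

# labels=False: integer bin indices; NaN for values outside every bin,
# which makes the result a float array
codes = pd.cut(x, bins=edges, labels=False)
print(codes)  # NaN for 10, then 1, 1, 1, 1, 2

# Looking the names up by index gives the same result as labels=names
manual = [None if np.isnan(c) else names[int(c)] for c in codes]
print(manual)  # [None, 'medium', 'medium', 'medium', 'medium', 'high']

# A label list of the wrong length is rejected
try:
    pd.cut(x, bins=edges, labels=["low", "high"])
except ValueError as err:
    print("ValueError:", err)
```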
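retbins: whether to also return the bin edges. Default False, i.e. the edges are not returned.
The code below shows the output with retbins=True: the result becomes a tuple of the binned data and the edges, here array([ 10, 30, 80, 100]).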
import numpy as np
import pandas as pd
x = np.array([10,32,38,55,70,92])
pd.cut(x, bins=[10,30,80,100],retbins=True)
# output
([NaN, (30.0, 80.0], (30.0, 80.0], (30.0, 80.0], (30.0, 80.0], (80.0, 100.0]]
Categories (3, interval[int64]): [(10, 30] < (30, 80] < (80, 100]],
array([ 10, 30, 80, 100]))
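A common reason to ask for the edges is to bin new data with exactly the same intervals. A minimal sketch, assuming the 3 equal-width bins learned from x should be reused unchanged for a second, hypothetical array new_x:

```python
import numpy as np
import pandas as pd

x = np.array([10, 32, 38, 55, 70, 92])

# Let pd.cut choose 3 equal-width bins and also return the edges it used
binned, edges = pd.cut(x, bins=3, retbins=True)

# Reuse exactly the same edges for new data (new_x is hypothetical)
new_x = np.array([20, 60, 90, 95])
new_binned = pd.cut(new_x, bins=edges)

# Bin indices: 95 lies beyond the last edge (92.0), so its code is -1 (NaN)
print(new_binned.codes.tolist())  # [0, 1, 2, -1]
```

Passing the returned edges back in, rather than calling pd.cut(new_x, bins=3) again, keeps the intervals fixed; a fresh int bins would recompute them from new_x's own min and max.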
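include_lowest: whether the first interval includes its left edge. Default False, i.e. it does not.
The code below shows the output with include_lowest=True, so the leftmost edge is included.
The leftmost edge is 10. Because the data contains 10 and that edge must now be included, the first interval is displayed as (9.999, 30.0]: pandas shifts the displayed left edge down by 10**(-precision), which is 0.001 with the default precision=3.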
import numpy as np
import pandas as pd
x = np.array([10,32,38,55,70,92])
pd.cut(x, bins=[10,30,80,100],include_lowest=True)
# output
[(9.999, 30.0], (30.0, 80.0], (30.0, 80.0], (30.0, 80.0], (30.0, 80.0], (80.0, 100.0]]
Categories (3, interval[float64]): [(9.999, 30.0] < (30.0, 80.0] < (80.0, 100.0]]
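The 9.999 is a display adjustment. In the pandas versions I am aware of, membership is still decided by the real edge 10, and include_lowest only adds values exactly equal to it; a value such as 9.9995 stays NaN even though it is larger than the displayed 9.999. The sketch below checks this on made-up data and also shows how precision changes the displayed left edge.

```python
import numpy as np
import pandas as pd

edges = [10, 30, 80, 100]
data = np.array([9.9995, 10, 32])  # made-up values around the first edge

default = pd.cut(data, bins=edges)
lowest = pd.cut(data, bins=edges, include_lowest=True)
print(default.codes.tolist())  # [-1, -1, 1]: 10 is not in (10, 30]
print(lowest.codes.tolist())   # [-1, 0, 1]: only the exact edge value 10 is added

# The displayed left edge is 10 - 10**(-precision)
print(lowest.categories[0].left)  # about 9.999
coarse = pd.cut(data, bins=edges, include_lowest=True, precision=1)
print(coarse.categories[0].left)  # about 9.9
```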
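duplicates: what to do when the bin edges are not unique. The default 'raise' raises a ValueError; 'drop' removes the repeated edges and bins with the remaining unique ones. The input can be a plain array or an indexed Series; for a Series the result keeps the original index.
The code below shows the output with duplicates='drop': the repeated edge 30 is dropped, so the bins are the same as [10, 30, 80, 100].
import numpy as np
import pandas as pd
x = pd.Series(np.array([10,32,38,55,70,92]),
              index=['a', 'b', 'c', 'd', 'e','f'])
pd.cut(x, bins=[10,30,30,80,100], duplicates='drop')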
# output
a NaN
b (30.0, 80.0]
c (30.0, 80.0]
d (30.0, 80.0]
e (30.0, 80.0]
f (80.0, 100.0]
dtype: category
Categories (3, interval[int64]): [(10, 30] < (30, 80] < (80, 100]]
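Duplicate edges typically appear when the edges are computed from the data, for example from quantiles of a skewed column. The sketch below assumes such a case (the data is made up for illustration) and shows both settings: 'raise' fails with a ValueError, while 'drop' bins with the unique edges only. include_lowest=True is added so that the zeros, which sit exactly on the first edge, are not lost.

```python
import pandas as pd

# Made-up, skewed data: most values are 0, so several quantiles coincide
s = pd.Series([0, 0, 0, 0, 0, 0, 1, 5, 10, 50])
edges = s.quantile([0, 0.25, 0.5, 0.75, 1.0]).tolist()  # [0.0, 0.0, 0.0, 4.0, 50.0]

try:
    pd.cut(s, bins=edges)  # duplicates='raise' is the default
except ValueError as err:
    print("raise ->", err)

# 'drop' bins with the unique edges [0.0, 4.0, 50.0] instead
binned = pd.cut(s, bins=edges, duplicates="drop", include_lowest=True)
print(binned.value_counts(sort=False))
```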
If you spot any mistakes, please point them out~