pd.cut() splits data values into intervals (bins) based on the values themselves and assigns each value to one of those bins.
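Parameters: x (the 1-D input); bins (an int for that many equal-width bins, or a sequence of bin edges); right (whether intervals are closed on the right, default True); include_lowest (whether the first interval also includes its left edge, default False); labels (labels for the resulting bins; False returns integer bin indices); retbins (whether to also return the bin edges); precision (number of digits used for the bin labels, default 3).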
# x = [1,2,3,5,3,4,1], bins = 3: split the range of x into 3 equal-width bins
[In ] import numpy as np
import pandas as pd
pd.cut(np.array([1,2,3,5,3,4,1]),3)
[Out] [(0.996, 2.333], (0.996, 2.333], (2.333, 3.667], (3.667, 5.0], (2.333, 3.667], (3.667, 5.0], (0.996, 2.333]]
Categories (3, interval[float64]): [(0.996, 2.333] < (2.333, 3.667] < (3.667, 5.0]]
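When `bins` is an integer, pandas spreads equal-width edges across [min(x), max(x)]. Because the default intervals are open on the left, it then lowers the first edge by 0.1% of the range so that the minimum still falls inside the first bin, which is where 0.996 comes from. The sketch below reproduces those edges by hand under that assumption and compares them with the edges returned by `retbins=True`. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

x = np.array([1, 2, 3, 5, 3, 4, 1])
n_bins = 3

# Equal-width edges spanning [min(x), max(x)]
lo, hi = x.min(), x.max()
edges = np.linspace(lo, hi, n_bins + 1)

# right=True (the default) makes the first interval open on the left,
# so the first edge is lowered by 0.1% of the range to keep min(x) inside
edges[0] -= (hi - lo) * 0.001   # edges approx. [0.996, 2.333, 3.667, 5.0]

# Compare with the edges pandas computed itself
_, pandas_edges = pd.cut(x, n_bins, retbins=True)
print(np.allclose(edges, pandas_edges))  # True
```

Only the first edge moves. The last edge stays at max(x) = 5.0 because the last interval is already closed on the right.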
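# x = [1,2,3,5,3,4,1], bins = [1,2,3]; by default (right=True) the intervals are (a, b], so the left endpoint 1 is excluded and the right endpoint 3 is included; values outside the edges (1, 4, 5) become NaN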
[In ] pd.cut(np.array([1,2,3,5,3,4,1]),[1,2,3])
[Out] [NaN, (1.0, 2.0], (2.0, 3.0], NaN, (2.0, 3.0], NaN, NaN]
Categories (2, interval[int64]): [(1, 2] < (2, 3]]
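# x = [1,2,3,5,3,4,1], bins = [1,2,3]; right=False makes the intervals [a, b), so the left endpoint 1 is included but the right endpoint 3 is now excluded (3 becomes NaN); include_lowest=True adds nothing here because the first interval already contains 1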
[In ] pd.cut(np.array([1,2,3,5,3,4,1]),[1,2,3],include_lowest=True,right=False)
[Out] [[1.0, 2.0), [2.0, 3.0), NaN, NaN, NaN, NaN, [1.0, 2.0)]
Categories (2, interval[int64]): [[1, 2) < [2, 3)]
# x = [1,2,3,5,3,4,1], bins = 3; use the given labels ['A','B','C'] instead of intervals
[In ] pd.cut(np.array([1,2,3,5,3,4,1]),3,labels=['A','B','C'])
[Out] [A, A, B, C, B, C, A]
Categories (3, object): [A < B < C]
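When only the bin numbers are needed, `labels=False` makes `pd.cut` return 0-based integer bin indices instead of a Categorical. These integers can then be mapped to any names. The sketch below uses the same data. The `names` array is illustrative.

```python
import numpy as np
import pandas as pd

x = np.array([1, 2, 3, 5, 3, 4, 1])

# labels=False returns plain 0-based bin indices (no Interval labels)
idx = pd.cut(x, 3, labels=False)
print(idx)  # [0 0 1 2 1 2 0]

# Map the indices to arbitrary names afterwards
names = np.array(['A', 'B', 'C'])
print(names[idx])  # ['A' 'A' 'B' 'C' 'B' 'C' 'A']
```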
# x = [1,2,3,5,3,4,1], bins = 3; retbins=True also returns the computed bin edges
[In ] pd.cut(np.array([1,2,3,5,3,4,1]),3,retbins=True)
[Out] ([(0.996, 2.333], (0.996, 2.333], (2.333, 3.667], (3.667, 5.0], (2.333, 3.667], (3.667, 5.0], (0.996, 2.333]]
Categories (3, interval[float64]): [(0.996, 2.333] < (2.333, 3.667] < (3.667, 5.0]],
array([0.996 , 2.33333333, 3.66666667, 5. ]))
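The edges returned by `retbins=True` can also be used to bin other data on the same scale. In the minimal sketch below, `train` and `new` are hypothetical arrays. The edges learned from the first are passed back as an explicit `bins` sequence, and any value outside the outer edges comes out as NaN.

```python
import numpy as np
import pandas as pd

train = np.array([1, 2, 3, 5, 3, 4, 1])

# Learn equal-width edges from one dataset...
_, edges = pd.cut(train, 3, retbins=True)

# ...and reuse exactly those edges for new values;
# 6.0 lies beyond the last edge (5.0), so it becomes NaN
new = np.array([1.5, 4.5, 6.0])
binned = pd.cut(new, edges)
print(binned)
```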
# x = [1,2,3,5,3,4,1], bins = 3; precision=2 rounds the bin labels to 2 digits (the default is 3)
[In ] pd.cut(np.array([1,2,3,5,3,4,1]),3,precision=2)
[Out] [(1.0, 2.33], (1.0, 2.33], (2.33, 3.67], (3.67, 5.0], (2.33, 3.67], (3.67, 5.0], (1.0, 2.33]]
Categories (3, interval[float64]): [(1.0, 2.33] < (2.33, 3.67] < (3.67, 5.0]]
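As I read it, `precision` only affects how the interval labels are rounded for display. Bin assignment still uses the unrounded edges. The following check tests that reading by comparing the bin codes produced with and without `precision=2`.

```python
import numpy as np
import pandas as pd

x = np.array([1, 2, 3, 5, 3, 4, 1])

coarse = pd.cut(x, 3, precision=2)
fine = pd.cut(x, 3)  # default precision=3

# Same bin for every value; only the displayed labels differ
print((coarse.codes == fine.codes).all())        # True
print(coarse.categories[0], fine.categories[0])  # (1.0, 2.33] (0.996, 2.333]
```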
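pd.qcut() performs quantile-based discretization. It splits a variable into equal-sized buckets based on rank or on sample quantiles (that is, based on how frequently sample values fall into each interval), so each bucket receives roughly the same number of values rather than covering the same width.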
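Parameters: x; q (the number of quantiles, e.g. 4 for quartiles, or an array of quantiles such as [0, .25, .5, .75, 1.]); labels; retbins; precision; duplicates (what to do when bin edges are not unique: 'raise' by default, or 'drop').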
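With an integer `q`, the bin edges are evenly spaced sample quantiles of x. The sketch below assumes that NumPy's default linear interpolation matches what `pd.qcut` uses. Under that assumption, it reproduces the quartile edges for the list `ll` from the next example with `np.quantile` and checks them against `retbins=True`.

```python
import numpy as np
import pandas as pd

ll = [1, 2, 3, 5, 3, 4, 1, 2]

# q=4 means the 0%, 25%, 50%, 75% and 100% sample quantiles become the edges
edges = np.quantile(ll, np.linspace(0, 1, 5))  # 1.0, 1.75, 2.5, 3.25, 5.0

_, qcut_edges = pd.qcut(ll, 4, retbins=True)
print(np.allclose(edges, qcut_edges))  # True

# Each quartile bucket receives two of the eight values
print(pd.qcut(ll, 4).value_counts().tolist())  # [2, 2, 2, 2]
```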
[In] import pandas as pd
ll = [1,2,3,5,3,4,1,2]
print('- - - pd.cut() example 1 - - -')
print(pd.cut(ll, 4, precision=2).value_counts())
print('- - - pd.cut() example 2 - - -')
print(pd.cut(ll, [1,2,4], precision=2).value_counts())
print('- - - pd.qcut() example - - -')
print(pd.qcut(ll, 4, precision=2).value_counts())
[Out] - - - pd.cut() example 1 - - -
(1.0, 2.0] 4
(2.0, 3.0] 2
(3.0, 4.0] 1
(4.0, 5.0] 1
dtype: int64
- - - pd.cut() example 2 - - -
(1, 2] 2
(2, 4] 3
dtype: int64
- - - pd.qcut() example - - -
(0.99, 1.75] 2
(1.75, 2.5] 2
(2.5, 3.25] 2
(3.25, 5.0] 2
dtype: int64
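The `duplicates` parameter matters when many values are tied. Repeated quantiles produce repeated edges, and `pd.qcut` rejects those with a ValueError by default. The sketch below uses a hypothetical skewed list, `skewed`, to show both the error and the `duplicates='drop'` workaround. Note that dropping the repeated edges yields fewer buckets than requested.

```python
import pandas as pd

skewed = [1, 1, 1, 1, 2, 3]

# The 0%, 25% and 50% quantiles are all 1, so the edges repeat
raised = False
try:
    pd.qcut(skewed, 4)
except ValueError as err:
    raised = True
    print("ValueError:", err)

# duplicates='drop' keeps only the unique edges (1, 1.75, 3): 2 buckets, not 4
binned = pd.qcut(skewed, 4, duplicates='drop')
print(len(binned.categories))  # 2
```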