pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates='raise', ordered=True)
Parameters
x:array-like
The input array to be binned. Must be 1-dimensional.
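bins:int, sequence of scalars, or IntervalIndex
The criteria to bin by. An int defines the number of equal-width bins in the range of x; a sequence of scalars defines the bin edges; an IntervalIndex defines the exact bins to use.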
right:bool, default True
Indicates whether bins includes the rightmost edge or not. If right == True (the default), then the bins [1, 2, 3, 4] indicate (1,2], (2,3], (3,4]. This argument is ignored when bins is an IntervalIndex.
labels:array or False, default None
Specifies the labels for the returned bins. Must be the same length as the resulting bins. If False, returns only integer indicators of the bins. This affects the type of the output container (see below). This argument is ignored when bins is an IntervalIndex. If True, raises an error. When ordered=False, labels must be provided.
retbins:bool, default False
Whether to return the bins or not. Useful when bins is provided as a scalar.
precision:int, default 3
The precision at which to store and display the bin labels.
include_lowest:bool, default False
Whether the first interval should be left-inclusive or not.
duplicates:{'raise', 'drop'}, default 'raise'
If bin edges are not unique, raise ValueError or drop the non-unique edges.
ordered:bool, default True
Whether the labels are ordered or not. Applies to returned types Categorical and Series (with Categorical dtype). If True, the resulting categorical will be ordered. If False, the resulting categorical will be unordered (labels must be provided).
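The parameters above can be tried side by side in a minimal sketch. The sample values, edges, and label names here are assumptions chosen only for illustration, and ordered=False needs a pandas version that provides the ordered parameter.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 3, 4])
edges = [1, 2, 3, 4]  # illustrative bin edges

# right=True (default): bins are (1, 2], (2, 3], (3, 4], so the value 1 is not binned.
right_closed = pd.cut(values, edges)

# include_lowest=True makes the first interval left-inclusive, so 1 is binned.
with_lowest = pd.cut(values, edges, include_lowest=True)

# right=False: bins are [1, 2), [2, 3), [3, 4), so the value 4 is not binned.
left_closed = pd.cut(values, edges, right=False)

# labels needs one entry per bin (three bins here).
named = pd.cut(values, edges, labels=["low", "mid", "high"], include_lowest=True)

# labels=False returns integer bin indicators instead of a Categorical.
codes = pd.cut(values, edges, labels=False, include_lowest=True)

# duplicates='drop' removes the repeated edge 2 instead of raising ValueError.
deduped = pd.cut(values, [1, 2, 2, 3, 4], duplicates="drop", include_lowest=True)

# ordered=False requires explicit labels and gives an unordered categorical.
unordered = pd.cut(values, edges, labels=["a", "b", "c"], ordered=False,
                   include_lowest=True)

print(named)
print(codes)
```

Values that fall outside every bin come back as NaN, which is why the first sketch line leaves 1 unassigned until include_lowest=True is passed.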
Returns
out:Categorical, Series, or ndarray
An array-like object representing the respective bin for each value of x. The type depends on the value of labels.
bins:numpy.ndarray or IntervalIndex
The computed or specified bins. Only returned when retbins=True. For scalar or sequence bins, this is an ndarray with the computed bins. If duplicates='drop' is set, non-unique bin edges are dropped. For IntervalIndex bins, this is equal to bins.
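As a small sketch of the second return value, the snippet below calls cut with retbins=True for scalar bins and for IntervalIndex bins. The interval boundaries are assumptions picked to cover the sample data.

```python
import numpy as np
import pandas as pd

data = np.array([1, 7, 5, 4, 6, 3])

# Scalar bins: the computed edges come back as an ndarray.
out, edges = pd.cut(data, 3, retbins=True)

# IntervalIndex bins (illustrative boundaries): the same intervals come back.
intervals = pd.IntervalIndex.from_tuples([(0, 3), (3, 5), (5, 7)])
out_iv, bins_iv = pd.cut(data, intervals, retbins=True)

print(type(edges).__name__, edges)
print(type(bins_iv).__name__)
```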
pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates='raise', ordered=True)
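Bin values into discrete intervals.
Use cut when you need to segment and sort data values into bins. This function is also useful for going from a continuous variable to a categorical variable. For example, cut can convert ages to groups of age ranges. It supports binning into an equal number of bins or into a pre-specified array of bins.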
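The age example can be sketched as follows. The ages, the edges, and the group names are hypothetical values chosen only to illustrate the call.

```python
import pandas as pd

ages = pd.Series([5, 17, 18, 34, 65, 80])

# Hypothetical age-group edges and labels, for illustration only.
edges = [0, 17, 64, 120]
group_names = ["child", "adult", "senior"]

# Bins are (0, 17], (17, 64], (64, 120] because right=True by default.
age_groups = pd.cut(ages, bins=edges, labels=group_names)
print(age_groups.tolist())
```

Passing an int instead of the edges list would split the observed age range into that many equal-width bins.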
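Parameters:
x: array-like. The input array to be binned; must be 1-dimensional.
bins: the number of bins, usually an int; may also be a sequence of bin edges.
retbins: bool. Whether to return the bins; they are returned when True.
precision: int, default 3. The precision at which to store and display the bin labels.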
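Returns:
out:
An array-like object representing the respective bin for each value of x. The type depends on the value of labels.
None (default): returns a Series for Series x or a Categorical for all other inputs. The values stored within are of Interval dtype.
Sequence of scalars: returns a Series for Series x or a Categorical for all other inputs. The values stored within are whatever the type in the sequence is.
False: returns an ndarray of integers.
bins:
numpy.ndarray or IntervalIndex.
The computed or specified bins. Only returned when retbins=True. For scalar or sequence bins, this is an ndarray with the computed bins.
Example: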
import numpy as np
import pandas as pd
pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3)
#[(0.994, 3.0], (5.0, 7.0], (3.0, 5.0], (3.0, 5.0], (5.0, 7.0], (0.994, 3.0]]
#Categories (3, interval[float64]): [(0.994, 3.0] < (3.0, 5.0] < (5.0, 7.0]]
pd.cut(np.array([1, 7, 5, 4, 6, 3]), 3, retbins=True)
#([(0.994, 3.0], (5.0, 7.0], (3.0, 5.0], (3.0, 5.0], (5.0, 7.0], (0.994, 3.0]]
#Categories (3, interval[float64]): [(0.994, 3.0] < (3.0, 5.0] < (5.0, 7.0]],
# array([0.994, 3. , 5. , 7. ]))
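To see how the return type follows labels and the input type, here is a minimal sketch that reuses the same sample array. The label names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 7, 5, 4, 6, 3])
ser = pd.Series(arr)

# labels=None with an ndarray input: a Categorical holding intervals.
as_categorical = pd.cut(arr, 3)

# labels=None with a Series input: a Series with category dtype.
as_series = pd.cut(ser, 3)

# A sequence of labels (illustrative names): categories take the label values.
as_labeled = pd.cut(arr, 3, labels=["low", "mid", "high"])

# labels=False with an ndarray input: an ndarray of integer bin indicators.
as_codes = pd.cut(arr, 3, labels=False)

print(type(as_categorical).__name__)
print(as_series.dtype)
print(list(as_labeled))
print(as_codes)
```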
pandas.qcut(x, q, labels=None, retbins=False, precision=3, duplicates='raise')
Quantile-based discretization function.
Discretize a variable into equal-sized buckets based on rank or on sample quantiles. For example, 1000 values for 10 quantiles would produce a Categorical object indicating quantile membership for each data point.
Parameters
Returns
out:Categorical or Series or array of integers if labels is False
The return type (Categorical or Series) depends on the input: a Series of type category if input is a Series else Categorical. Bins are represented as categories when categorical data is returned.
bins:ndarray of floats
Returned only if retbins is True.
Example:
import pandas as pd
pd.qcut(range(5), 4)
#[(-0.001, 1.0], (-0.001, 1.0], (1.0, 2.0], (2.0, 3.0], (3.0, 4.0]]
#Categories (4, interval[float64]): [(-0.001, 1.0] < (1.0, 2.0] < (2.0, 3.0] < (3.0, 4.0]]
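The difference between equal-width bins (cut) and equal-sized buckets (qcut) shows up on skewed data. The sketch below uses a made-up array with one large outlier and counts how many values land in each bin.

```python
import numpy as np
import pandas as pd

# Skewed sample data, chosen only for illustration.
data = np.array([1, 2, 3, 4, 5, 6, 7, 100])

# cut: four equal-width bins over the value range, so most values share the first bin.
width_codes = pd.cut(data, 4, labels=False)

# qcut: four quantile-based bins, so each bin holds about the same number of values.
quantile_codes, edges = pd.qcut(data, 4, labels=False, retbins=True)

print(np.bincount(width_codes, minlength=4))
print(np.bincount(quantile_codes, minlength=4))
print(edges)
```

With labels=False both functions return integer bin indicators for array input, which makes the per-bin counts easy to compare.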