Python DataFrame quantiles – groupby by percentiles of the values in a selected DataFrame column

I don't have a computer to test this right now, but I think you can do it with: df.groupby(pd.cut(df.col0, np.percentile(df.col0, [0, 25, 75, 90, 100]), include_lowest=True)).mean(). I'll update in about 150 minutes.
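A self-contained, runnable version of that one-liner might look like the sketch below. The DataFrame is hypothetical (made-up values in `col0` and `col1`), and `observed=False` is passed explicitly so that empty percentile bins are kept as NaN rows, matching the output shown further down.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the question's actual DataFrame is not shown.
df = pd.DataFrame({"col0": [0.0, 1.0, 2.0, 3.0, 20.0],
                   "col1": [10, 20, 30, 40, 50]})

# Bin edges at the 0th, 25th, 75th, 90th and 100th percentiles of col0.
edges = np.percentile(df.col0, [0, 25, 75, 90, 100])

# include_lowest=True closes the first interval on the left,
# so the minimum of col0 is assigned to the first bin instead of NaN.
bins = pd.cut(df.col0, edges, include_lowest=True)

# observed=False keeps empty bins; their aggregated values are NaN.
result = df.groupby(bins, observed=False).mean()
print(result)
```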

Some explanation:

In [42]:
# use np.percentile to get the bin edges at whatever percentiles you want
np.percentile(df.col0, [0, 25, 75, 90, 100])

Out[42]:
[0.0067930000000000004, 0.907609, 3.7436589999999996, 13.089311200000001, 19.319745999999999]
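The same edges can also be computed with pandas' own `Series.quantile`, which takes fractions (0 to 1) instead of percents (0 to 100) and, like `np.percentile`, uses linear interpolation by default. A quick check on a made-up Series standing in for `df.col0`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df.col0.
col0 = pd.Series([0.0, 1.0, 2.0, 3.0, 20.0], name="col0")

# NumPy: percentiles on a 0-100 scale.
np_edges = np.percentile(col0, [0, 25, 75, 90, 100])

# pandas: quantiles on a 0-1 scale; returns a Series indexed by the quantiles.
pd_edges = col0.quantile([0, 0.25, 0.75, 0.9, 1.0]).to_numpy()

print(np_edges)
print(np.allclose(np_edges, pd_edges))
```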

In [43]:
# include_lowest=True is needed so that the minimum value falls into the first bin
print(df.groupby(pd.cut(df.col0, np.percentile(df.col0, [0, 25, 75, 90, 100]), include_lowest=True)).mean())

                       col0     col1      col2
col0
[0.00679, 0.908]   0.457201     41.0  2.103996
(0.908, 3.744]     3.051177    923.5  5.790717
(3.744, 13.0893]        NaN      NaN       NaN
(13.0893, 19.32]  19.319746  11969.0  7.405685
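The `(3.744, 13.0893]` row is NaN because no value of `col0` falls in that bin. When grouping by a categorical such as the output of `pd.cut`, whether empty bins appear at all is controlled by the `observed` argument of `groupby`; recent pandas versions warn that its default is changing to `observed=True`, which drops them. A sketch with made-up data comparing both settings, using `size()` to count rows per bin:

```python
import numpy as np
import pandas as pd

# Hypothetical data in which one percentile bin ends up empty.
df = pd.DataFrame({"col0": [0.0, 1.0, 2.0, 3.0, 20.0]})
edges = np.percentile(df.col0, [0, 25, 75, 90, 100])
bins = pd.cut(df.col0, edges, include_lowest=True)

# observed=False: every interval produced by pd.cut is kept, empty ones get 0.
all_bins = df.groupby(bins, observed=False).size()

# observed=True: only intervals that actually contain rows are kept.
nonempty_bins = df.groupby(bins, observed=True).size()

print(all_bins)
print(nonempty_bins)
```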

In [44]:
# Otherwise the smallest value is skipped (pd.cut maps it to NaN, so groupby drops it)
print(df.groupby(pd.cut(df.col0, np.percentile(df.col0, [0, 25, 75, 90, 100]))).mean())

                       col0     col1      col2
col0
(0.00679, 0.908]   0.907609     82.0  4.207991
(0.908, 3.744]     3.051177    923.5  5.790717
(3.744, 13.0893]        NaN      NaN       NaN
(13.0893, 19.32]  19.319746  11969.0  7.405685
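pandas also provides `pd.qcut`, which takes the quantiles directly (as fractions) and always includes the lowest value, so the `include_lowest` pitfall does not arise. This is a swapped-in alternative, not what the answer above uses; the sketch below, on made-up data, checks that it assigns rows to the same bins as `pd.cut(..., include_lowest=True)` and shows the minimum turning into NaN without `include_lowest`.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for df.col0.
col0 = pd.Series([0.0, 1.0, 2.0, 3.0, 20.0], name="col0")
edges = np.percentile(col0, [0, 25, 75, 90, 100])

# pd.cut with explicit percentile edges (the approach in the answer).
by_cut = pd.cut(col0, edges, include_lowest=True)

# pd.qcut computes the quantile edges itself; its first bin includes the minimum.
by_qcut = pd.qcut(col0, [0, 0.25, 0.75, 0.9, 1.0])

# Same bin index (category code) for every row.
print((by_cut.cat.codes == by_qcut.cat.codes).all())

# Without include_lowest, the minimum lies outside every bin and becomes NaN.
no_lowest = pd.cut(col0, edges)
print(no_lowest.isna().tolist())
```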
