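np.percentile usage explained

np.percentile computes percentiles of a set of numbers; the 50th percentile is the median. The function's documentation is shown below.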

In [1]: import numpy as np

In [2]: np.percentile?
Signature:
np.percentile(
    a,
    q,
    axis=None,
    out=None,
    overwrite_input=False,
    interpolation='linear',
    keepdims=False,
)
Docstring:
Compute the q-th percentile of the data along the specified axis.

Returns the q-th percentile(s) of the array elements.

Parameters
----------
a : array_like
    Input array or object that can be converted to an array.
q : array_like of float
    Percentile or sequence of percentiles to compute, which must be between
    0 and 100 inclusive.
axis : {int, tuple of int, None}, optional
    Axis or axes along which the percentiles are computed. The
    default is to compute the percentile(s) along a flattened
    version of the array.

    .. versionchanged:: 1.9.0
        A tuple of axes is supported
out : ndarray, optional
    Alternative output array in which to place the result. It must
    have the same shape and buffer length as the expected output,
    but the type (of the output) will be cast if necessary.
overwrite_input : bool, optional
    If True, then allow the input array `a` to be modified by intermediate
    calculations, to save memory. In this case, the contents of the input
    `a` after this function completes is undefined.

interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}
    This optional parameter specifies the interpolation method to
    use when the desired percentile lies between two data points
    ``i < j``:

    * 'linear': ``i + (j - i) * fraction``, where ``fraction``
      is the fractional part of the index surrounded by ``i``
      and ``j``.
    * 'lower': ``i``.
    * 'higher': ``j``.
    * 'nearest': ``i`` or ``j``, whichever is nearest.
    * 'midpoint': ``(i + j) / 2``.

    .. versionadded:: 1.9.0
keepdims : bool, optional
    If this is set to True, the axes which are reduced are left in
    the result as dimensions with size one. With this option, the
    result will broadcast correctly against the original array `a`.

    .. versionadded:: 1.9.0

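We ignore the other parameters and focus on interpolation. (In NumPy 1.22 and later this parameter is named method, and interpolation is deprecated.) The signature shows that np.percentile defaults to 'linear', that is, linear interpolation. According to the docstring, every interpolation mode is defined in terms of i and j. Note that i and j are values, not indices. Here is how i and j are determined: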

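  1. First sort the array in ascending order. For example, a = [4, 2, 1, 3] becomes a = [1, 2, 3, 4].
  2. Then compute the position loc = 1 + (n - 1) * p, where n is the length of the array and p is the percentile expressed as a fraction, 0 <= p <= 1 (for example, p = 0.95 for the 95th percentile). loc says which element of the sorted array, counting from 1, the percentile is: loc = 3.0 means the third element, whose index is 2. Note: in Python, if p is a float then loc is a float as well.
  3. If the fractional part of loc is nonzero, then i = a[int(loc) - 1] and j = a[int(loc)]. If loc is an integer, or its fractional part is zero, then i = j = percentile = a[int(loc) - 1].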
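
The three steps above can be sketched as a small function. This is only an illustration of the rule as described here; locate is a hypothetical helper name, not part of NumPy, and it assumes the input is already sorted and non-empty.

```python
import math


def locate(sorted_values, p):
    """Return (loc, i, j) for a fraction p with 0 <= p <= 1.

    Hypothetical helper: follows the steps in the text, not NumPy's source.
    """
    n = len(sorted_values)
    loc = 1 + (n - 1) * p         # 1-based, possibly fractional position
    whole = math.floor(loc)       # integer part of loc
    i = sorted_values[whole - 1]  # value at, or just below, the position
    if loc == whole:              # fractional part is zero: i and j coincide
        j = i
    else:
        j = sorted_values[whole]  # value just above the position
    return loc, i, j


a = sorted([4, 2, 1, 3])          # [1, 2, 3, 4]
loc, i, j = locate(a, 0.5)        # loc = 2.5, so i = a[1] = 2 and j = a[2] = 3
```

Note that i and j come back as array values, matching the remark that they are values rather than indices.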

Code example

import numpy as np
a = np.array([5, 3, 1, 7, 9])
a = np.sort(a)  # a = [1, 3, 5, 7, 9]
loc = 1 + (5 - 1) * 0.5  # loc = 3.0; the 50th percentile, i.e. the median
a[int(loc) - 1]  # 5: the fractional part of loc is zero, so take a[int(loc) - 1] = a[3 - 1] = a[2] = 5

Next, how the percentile is computed with linear interpolation:

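  1. If loc is an integer, or its fractional part is zero, then a[int(loc) - 1] is the percentile.
  2. If the fractional part of loc is nonzero, for example loc = 2.4, the percentile is the "2.4th" element of the array, and it is computed as a[int(loc) - 1] + (a[int(loc)] - a[int(loc) - 1]) * (fractional part of loc), that is, i + (j - i) * fraction. For loc = 2.4, the position lies between the second and third elements, a[1] and a[2], so the result is a[1] + (a[2] - a[1]) * 0.4.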
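
The two cases above can be combined into one function and checked against np.percentile. This is a minimal sketch for 1-D, non-empty input; percentile_linear is a hypothetical name, and the code mirrors the rule as described in this article rather than NumPy's actual implementation.

```python
import math

import numpy as np


def percentile_linear(values, q):
    """Linear-interpolation percentile for 0 <= q <= 100 (hypothetical helper)."""
    a = np.sort(np.asarray(values, dtype=float))
    loc = 1 + (len(a) - 1) * q / 100.0  # 1-based, possibly fractional position
    whole = math.floor(loc)             # integer part of loc
    frac = loc - whole                  # fractional part of loc
    i = a[whole - 1]
    if frac == 0:                       # case 1: loc lands exactly on an element
        return float(i)
    j = a[whole]                        # case 2: interpolate between i and j
    return float(i + (j - i) * frac)


data = [7, 9, 5, 1, 3]
for q in (0, 30, 50, 95, 100):
    # Compare with a tolerance, since floating-point rounding may differ slightly.
    assert np.isclose(percentile_linear(data, q), np.percentile(data, q))
```

The comparison uses np.isclose rather than ==, because the last digits of the result can differ depending on the order of floating-point operations.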

Example 1

In [1]: import numpy as np

In [2]: a=np.array(([7, 9, 5, 1, 3]))

In [3]: np.percentile(a,30)
Out[3]: 3.4000000000000004

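Explanation:

  1. First sort a to get [1, 3, 5, 7, 9].
  2. Then compute the position loc = 1 + (5 - 1) * 0.3 = 2.2, which means the 30th percentile is the "2.2th" element of the array.
  3. Because the fractional part of 2.2 is nonzero, the percentile is a[1] + (a[2] - a[1]) * 0.2 = 3 + (5 - 3) * 0.2 = 3.4.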

Example 2

In [4]: np.percentile(a,50)
Out[4]: 5.0

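Explanation:

  1. First sort a to get [1, 3, 5, 7, 9].
  2. Then compute the position loc = 1 + (5 - 1) * 0.5 = 3.0, which means the 50th percentile is the third element of the array.
  3. Because the fractional part of 3.0 is zero, the percentile is a[3 - 1], that is a[2], which is 5.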
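
For comparison, the other interpolation modes listed in the docstring can be tried on the same array. The sketch below assumes that newer NumPy releases accept the keyword method while older ones only accept interpolation, so it tries the former and falls back to the latter; percentile_with_mode is a hypothetical wrapper, not a NumPy function.

```python
import numpy as np


def percentile_with_mode(values, q, mode):
    """Call np.percentile with the given interpolation mode (hypothetical wrapper)."""
    try:
        # Newer NumPy: the parameter is called `method`.
        return np.percentile(values, q, method=mode)
    except TypeError:
        # Older NumPy: the parameter is called `interpolation`.
        return np.percentile(values, q, interpolation=mode)


a = np.array([7, 9, 5, 1, 3])
modes = ("linear", "lower", "higher", "nearest", "midpoint")
results = {m: float(percentile_with_mode(a, 30, m)) for m in modes}
print(results)
```

For q = 30 the position falls between i = 3 and j = 5, so by the docstring's definitions 'lower' gives 3, 'higher' gives 5, 'midpoint' gives 4, 'nearest' gives 3, and 'linear' gives approximately 3.4.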
