在python中,dataframe自身带了nlargest和nsmallest用来求解n个最大值/n个最小值,具体案例如下:
data = pd.DataFrame(np.array([[1,2],[3,4],[5,6],[7,8],[6,8],[17,98]]),columns=['x','y'],dtype=float)
Three = data.nlargest(3,'y',keep='all')
print(Three)
结果:
x y
5 17.0 98.0
3 7.0 8.0
4 6.0 8.0
data = pd.DataFrame(np.array([[1,2],[3,4],[5,6],[7,8],[6,8],[17,98]]),columns=['x','y'],dtype=float)
Three = data.nsmallest(3,'y',keep='all')
print(Three)
结果:
` x y
0 1.0 2.0
1 3.0 4.0
2 5.0 6.0`
想要了解更详细的信息,可参考源码,源码解释的很详细,并且有官方案例可参考:
def nlargest(self, n, columns, keep="first") -> "DataFrame":
"""
Return the first `n` rows ordered by `columns` in descending order.
Return the first `n` rows with the largest values in `columns`, in
descending order. The columns that are not specified are returned as
well, but not used for ordering.
This method is equivalent to
``df.sort_values(columns, ascending=False).head(n)``, but more
performant.
Parameters
----------
n : int
Number of rows to return.
columns : label or list of labels
Column label(s) to order by.
keep : {'first', 'last', 'all'}, default 'first'
Where there are duplicate values:
- `first` : prioritize the first occurrence(s)
- `last` : prioritize the last occurrence(s)
- ``all`` : do not drop any duplicates, even it means
selecting more than `n` items.
.. versionadded:: 0.24.0
Returns
-------
DataFrame
The first `n` rows ordered by the given columns in descending
order.
See Also
--------
DataFrame.nsmallest : Return the first `n` rows ordered by `columns` in
ascending order.
DataFrame.sort_values : Sort DataFrame by the values.
DataFrame.head : Return the first `n` rows without re-ordering.
Notes
-----
This function cannot be used with all column types. For example, when
specifying columns with `object` or `category` dtypes, ``TypeError`` is
raised.
Examples
--------
>>> df = pd.DataFrame({'population': [59000000, 65000000, 434000,
... 434000, 434000, 337000, 11300,
... 11300, 11300],
... 'GDP': [1937894, 2583560 , 12011, 4520, 12128,
... 17036, 182, 38, 311],
... 'alpha-2': ["IT", "FR", "MT", "MV", "BN",
... "IS", "NR", "TV", "AI"]},
... index=["Italy", "France", "Malta",
... "Maldives", "Brunei", "Iceland",
... "Nauru", "Tuvalu", "Anguilla"])
>>> df
population GDP alpha-2
Italy 59000000 1937894 IT
France 65000000 2583560 FR
Malta 434000 12011 MT
Maldives 434000 4520 MV
Brunei 434000 12128 BN
Iceland 337000 17036 IS
Nauru 11300 182 NR
Tuvalu 11300 38 TV
Anguilla 11300 311 AI
In the following example, we will use ``nlargest`` to select the three
rows having the largest values in column "population".
>>> df.nlargest(3, 'population')
population GDP alpha-2
France 65000000 2583560 FR
Italy 59000000 1937894 IT
Malta 434000 12011 MT
When using ``keep='last'``, ties are resolved in reverse order:
>>> df.nlargest(3, 'population', keep='last')
population GDP alpha-2
France 65000000 2583560 FR
Italy 59000000 1937894 IT
Brunei 434000 12128 BN
When using ``keep='all'``, all duplicate items are maintained:
>>> df.nlargest(3, 'population', keep='all')
population GDP alpha-2
France 65000000 2583560 FR
Italy 59000000 1937894 IT
Malta 434000 12011 MT
Maldives 434000 4520 MV
Brunei 434000 12128 BN
To order by the largest values in column "population" and then "GDP",
we can specify multiple columns like in the next example.
>>> df.nlargest(3, ['population', 'GDP'])
population GDP alpha-2
France 65000000 2583560 FR
Italy 59000000 1937894 IT
Brunei 434000 12128 BN
"""
def nsmallest(self, n, columns, keep="first") -> "DataFrame":
"""
Return the first `n` rows ordered by `columns` in ascending order.
Return the first `n` rows with the smallest values in `columns`, in
ascending order. The columns that are not specified are returned as
well, but not used for ordering.
This method is equivalent to
``df.sort_values(columns, ascending=True).head(n)``, but more
performant.
Parameters
----------
n : int
Number of items to retrieve.
columns : list or str
Column name or names to order by.
keep : {'first', 'last', 'all'}, default 'first'
Where there are duplicate values:
- ``first`` : take the first occurrence.
- ``last`` : take the last occurrence.
- ``all`` : do not drop any duplicates, even it means
selecting more than `n` items.
.. versionadded:: 0.24.0
Returns
-------
DataFrame
See Also
--------
DataFrame.nlargest : Return the first `n` rows ordered by `columns` in
descending order.
DataFrame.sort_values : Sort DataFrame by the values.
DataFrame.head : Return the first `n` rows without re-ordering.
Examples
--------
>>> df = pd.DataFrame({'population': [59000000, 65000000, 434000,
... 434000, 434000, 337000, 337000,
... 11300, 11300],
... 'GDP': [1937894, 2583560 , 12011, 4520, 12128,
... 17036, 182, 38, 311],
... 'alpha-2': ["IT", "FR", "MT", "MV", "BN",
... "IS", "NR", "TV", "AI"]},
... index=["Italy", "France", "Malta",
... "Maldives", "Brunei", "Iceland",
... "Nauru", "Tuvalu", "Anguilla"])
>>> df
population GDP alpha-2
Italy 59000000 1937894 IT
France 65000000 2583560 FR
Malta 434000 12011 MT
Maldives 434000 4520 MV
Brunei 434000 12128 BN
Iceland 337000 17036 IS
Nauru 337000 182 NR
Tuvalu 11300 38 TV
Anguilla 11300 311 AI
In the following example, we will use ``nsmallest`` to select the
three rows having the smallest values in column "population".
>>> df.nsmallest(3, 'population')
population GDP alpha-2
Tuvalu 11300 38 TV
Anguilla 11300 311 AI
Iceland 337000 17036 IS
When using ``keep='last'``, ties are resolved in reverse order:
>>> df.nsmallest(3, 'population', keep='last')
population GDP alpha-2
Anguilla 11300 311 AI
Tuvalu 11300 38 TV
Nauru 337000 182 NR
When using ``keep='all'``, all duplicate items are maintained:
>>> df.nsmallest(3, 'population', keep='all')
population GDP alpha-2
Tuvalu 11300 38 TV
Anguilla 11300 311 AI
Iceland 337000 17036 IS
Nauru 337000 182 NR
To order by the smallest values in column "population" and then "GDP", we can
specify multiple columns like in the next example.
>>> df.nsmallest(3, ['population', 'GDP'])
population GDP alpha-2
Tuvalu 11300 38 TV
Anguilla 11300 311 AI
Nauru 337000 182 NR
"""