Learning Pandas from Scratch (4): DataFrame API Introduction, Part 3

1. Finding the largest values

import pandas as pd
import numpy as np
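# Use the full option names: the short forms can be ambiguous in newer pandas versions
pd.set_option('display.max_columns', 4)
pd.set_option('display.max_rows', 10)
pd.set_option('display.max_colwidth', 12)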

movie = pd.read_csv('../data/movie.csv')
movie2 = movie[['movie_title', 'imdb_score', 'budget']]
movie2.head()

# Use the .nlargest method to select the N rows with the largest values in a column
# e.g. select the top 100 movies by imdb_score (.head() then shows the first 5 of them)
movie2.nlargest(100, 'imdb_score').head()
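
Since movie.csv may not be at hand, here is a minimal sketch of the same call on a small made-up DataFrame. The titles and numbers are hypothetical placeholders, not values from the real dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for movie.csv (all values are made up)
toy = pd.DataFrame({
    'movie_title': ['A', 'B', 'C', 'D', 'E'],
    'imdb_score': [7.1, 8.9, 6.4, 9.2, 8.0],
    'budget': [50.0, 20.0, 5.0, 120.0, 35.0],
})

# Keep the 3 rows with the highest imdb_score, ordered from highest to lowest
top3 = toy.nlargest(3, 'imdb_score')
print(top3)
```

The result keeps all columns of the original rows and retains their original index labels.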


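2. Finding the smallest values

You can chain method calls, invoking the next method directly on the result of the previous one.
For example: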

# chain the .nsmallest method to return the 3 lowest budget films among those with a top 100 score
(movie2
  .nlargest(100, 'imdb_score')
  .nsmallest(3, 'budget')
)
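
The chain can be tried on a small made-up DataFrame as well. This is only a sketch; the data below is hypothetical and the sizes (3 and 2) are chosen to fit the toy table.

```python
import pandas as pd

# Hypothetical toy data (all values are made up)
toy = pd.DataFrame({
    'movie_title': ['A', 'B', 'C', 'D', 'E'],
    'imdb_score': [7.1, 8.9, 6.4, 9.2, 8.0],
    'budget': [50.0, 20.0, 5.0, 120.0, 35.0],
})

# Step 1: the 3 highest-scoring films; step 2: the 2 cheapest among those
cheapest_of_best = (toy
  .nlargest(3, 'imdb_score')
  .nsmallest(2, 'budget')
)
print(cheapest_of_best)
```

Note that film C has the lowest budget overall but is not in the result, because it was already filtered out by the first step.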


3. Sorting by values

(movie
  [['movie_title', 'title_year', 'imdb_score']]
  .sort_values('imdb_score', ascending=False)
)
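
As a small sketch on made-up data, .sort_values also accepts a list of columns together with a list of ascending flags, one per column. The DataFrame below is hypothetical.

```python
import pandas as pd

# Hypothetical toy data (all values are made up)
toy = pd.DataFrame({
    'movie_title': ['A', 'B', 'C', 'D'],
    'title_year': [2010, 2012, 2010, 2012],
    'imdb_score': [7.1, 8.9, 6.4, 9.2],
})

# Single column, highest score first
by_score = toy.sort_values('imdb_score', ascending=False)

# Two columns: year ascending, then score descending within each year
by_year_then_score = toy.sort_values(['title_year', 'imdb_score'],
                                     ascending=[True, False])
print(by_score)
print(by_year_then_score)
```

Neither call modifies toy; each returns a new, sorted DataFrame.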


4. Removing duplicates

Before deduplication

# original data
(movie
  [['movie_title', 'title_year', 'imdb_score']]
  .sort_values(['title_year','imdb_score'],
               ascending=False)
)

After deduplication

# use the .drop_duplicates method to keep only the first row of every year;
# because of the descending sort, that row is the highest-scoring film of the year
(movie
  [['movie_title', 'title_year', 'imdb_score']]
  .sort_values(['title_year','imdb_score'],
               ascending=False)
  .drop_duplicates(subset='title_year')
)
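
The sort-then-deduplicate pattern can be checked on a small made-up DataFrame. This is a sketch with hypothetical data; keep='first' is the default of .drop_duplicates, and keep='last' is shown for contrast.

```python
import pandas as pd

# Hypothetical toy data (all values are made up)
toy = pd.DataFrame({
    'movie_title': ['A', 'B', 'C', 'D'],
    'title_year': [2010, 2012, 2010, 2012],
    'imdb_score': [7.1, 8.9, 6.4, 9.2],
})

# Sort by year, then score, both descending
ranked = toy.sort_values(['title_year', 'imdb_score'], ascending=False)

# Default keep='first': the top-scoring film of each year survives
best_per_year = ranked.drop_duplicates(subset='title_year')

# keep='last': the lowest-scoring film of each year survives instead
worst_per_year = ranked.drop_duplicates(subset='title_year', keep='last')
print(best_per_year)
print(worst_per_year)
```

Which row survives depends entirely on the row order at the time .drop_duplicates is called, so the sort has to come first.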
