Reference: pandas cookbook
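import pandas as pd
# load the data with the default integer index (file name taken from the read_csv call further down)
movies=pd.read_csv("movie.csv")
# the DataFrame .rename method accepts dictionaries that map the old name to the new name
col_map={"director_name":"director","num_critic_for_reviews":"critic_reviews"}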
movies.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4916 entries, 0 to 4915
Data columns (total 28 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 color 4897 non-null object
1 director_name 4814 non-null object
2 num_critic_for_reviews 4867 non-null float64
3 duration 4901 non-null float64
4 director_facebook_likes 4814 non-null float64
5 actor_3_facebook_likes 4893 non-null float64
6 actor_2_name 4903 non-null object
7 actor_1_facebook_likes 4909 non-null float64
8 gross 4054 non-null float64
9 genres 4916 non-null object
10 actor_1_name 4909 non-null object
11 movie_title 4916 non-null object
12 num_voted_users 4916 non-null int64
13 cast_total_facebook_likes 4916 non-null int64
14 actor_3_name 4893 non-null object
15 facenumber_in_poster 4903 non-null float64
16 plot_keywords 4764 non-null object
17 movie_imdb_link 4916 non-null object
18 num_user_for_reviews 4895 non-null float64
19 language 4904 non-null object
20 country 4911 non-null object
21 content_rating 4616 non-null object
22 budget 4432 non-null float64
23 title_year 4810 non-null float64
24 actor_2_facebook_likes 4903 non-null float64
25 imdb_score 4916 non-null float64
26 aspect_ratio 4590 non-null float64
27 movie_facebook_likes 4916 non-null int64
dtypes: float64(13), int64(3), object(12)
memory usage: 1.1+ MB
movies.rename(columns=col_map).head()
Out[35]:
color director ... aspect_ratio movie_facebook_likes
0 Color James Cameron ... 1.78 33000
1 Color Gore Verbinski ... 2.35 0
2 Color Sam Mendes ... 2.35 85000
3 Color Christopher Nolan ... 2.35 164000
4 NaN Doug Walker ... NaN 0
[5 rows x 28 columns]
# without columns=, the mapping is applied to the index labels, so the column names stay unchanged
movies.rename(col_map).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4916 entries, 0 to 4915
Data columns (total 28 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 color 4897 non-null object
1 director_name 4814 non-null object
2 num_critic_for_reviews 4867 non-null float64
3 duration 4901 non-null float64
4 director_facebook_likes 4814 non-null float64
5 actor_3_facebook_likes 4893 non-null float64
6 actor_2_name 4903 non-null object
7 actor_1_facebook_likes 4909 non-null float64
8 gross 4054 non-null float64
9 genres 4916 non-null object
10 actor_1_name 4909 non-null object
11 movie_title 4916 non-null object
12 num_voted_users 4916 non-null int64
13 cast_total_facebook_likes 4916 non-null int64
14 actor_3_name 4893 non-null object
15 facenumber_in_poster 4903 non-null float64
16 plot_keywords 4764 non-null object
17 movie_imdb_link 4916 non-null object
18 num_user_for_reviews 4895 non-null float64
19 language 4904 non-null object
20 country 4911 non-null object
21 content_rating 4616 non-null object
22 budget 4432 non-null float64
23 title_year 4810 non-null float64
24 actor_2_facebook_likes 4903 non-null float64
25 imdb_score 4916 non-null float64
26 aspect_ratio 4590 non-null float64
27 movie_facebook_likes 4916 non-null int64
dtypes: float64(13), int64(3), object(12)
memory usage: 1.1+ MB
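The unchanged column list above can be reproduced on a small frame. This is a minimal sketch with a made-up two-row DataFrame (the values are hypothetical, not taken from movie.csv); it compares the positional form of `.rename` with the `columns=` and `axis="columns"` forms.

```python
import pandas as pd

# Hypothetical toy frame standing in for the movies data
df = pd.DataFrame({"director_name": ["A", "B"], "duration": [100.0, 90.0]})
col_map = {"director_name": "director"}

# Positional mapper targets the row index; no index label matches, so nothing changes
by_index = df.rename(col_map)

# Keyword form targets the column labels
by_columns = df.rename(columns=col_map)

# Equivalent spelling: positional mapper plus an explicit axis
by_axis = df.rename(col_map, axis="columns")

print(list(by_index.columns))
print(list(by_columns.columns))
print(list(by_axis.columns))
```

Keys that are missing from the targeted axis are ignored by default, which is why the positional call succeeds silently instead of raising.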
# rename index labels and column names with dictionaries via the .rename method
idx_map={"Avatar":"Ratava","Spectre":"Ertceps",
"Pirates of the Caribbean: At World's End":"POC"}
movies.set_index("movie_title").rename(index=idx_map,columns=col_map).head()
Out[39]:
color ... movie_facebook_likes
movie_title ...
Ratava Color ... 33000
POC Color ... 0
Ertceps Color ... 85000
The Dark Knight Rises Color ... 164000
Star Wars: Episode VII - The Force Awakens NaN ... 0
[5 rows x 27 columns]
movies.head()
Out[40]:
color director_name ... aspect_ratio movie_facebook_likes
0 Color James Cameron ... 1.78 33000
1 Color Gore Verbinski ... 2.35 0
2 Color Sam Mendes ... 2.35 85000
3 Color Christopher Nolan ... 2.35 164000
4 NaN Doug Walker ... NaN 0
[5 rows x 28 columns]
movies["movie_title"].head()
Out[41]:
0 Avatar
1 Pirates of the Caribbean: At World's End
2 Spectre
3 The Dark Knight Rises
4 Star Wars: Episode VII - The Force Awakens
Name: movie_title, dtype: object
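As the two outputs above suggest, `.rename` and `.set_index` return new objects and leave `movies` untouched. A minimal sketch of that behaviour, assuming a toy frame built from two of the titles shown above (the variable names `df` and `renamed` are illustrative):

```python
import pandas as pd

# Toy frame using two rows visible in the transcript above
df = pd.DataFrame({
    "movie_title": ["Avatar", "Spectre"],
    "director_name": ["James Cameron", "Sam Mendes"],
})

# The chained call builds a new DataFrame; df itself is not modified
renamed = df.set_index("movie_title").rename(
    index={"Avatar": "Ratava"},
    columns={"director_name": "director"},
)

print(list(df.columns))       # original labels are still in place
print(list(renamed.index))
print(list(renamed.columns))
```

To keep the renamed labels, assign the result back to a name, for example `movies = movies.rename(columns=col_map)`.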
# With this approach you can change just a few column names instead of all of them.
movies=pd.read_csv("movie.csv",index_col="movie_title")
ids=movies.index.to_list()
columns=movies.columns.to_list()
ids[0]='Ratava'
ids[1:3]=['POC','Ertceps']
columns[1]="director"
columns[-2]="aspect"
columns[-1]="fblikes"
movies.index=ids
movies.columns=columns
movies.head(3)
Out[54]:
color director ... aspect fblikes
Ratava Color James Cameron ... 1.78 33000
POC Color Gore Verbinski ... 2.35 0
Ertceps Color Sam Mendes ... 2.35 85000
[3 rows x 27 columns]
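Unlike `.rename`, assigning to `.index` or `.columns` modifies the DataFrame in place, and the new list has to contain exactly one label per existing label. A minimal sketch on a hypothetical one-row frame (the name `mismatch_error` is illustrative):

```python
import pandas as pd

# Hypothetical one-row frame with two columns
df = pd.DataFrame(
    {"color": ["Color"], "director_name": ["James Cameron"]},
    index=["Avatar"],
)

# Copy the labels to a plain list, edit one entry, assign the list back
columns = df.columns.to_list()
columns[1] = "director"
df.columns = columns  # in-place change, no new DataFrame is created

# A list of the wrong length is rejected
try:
    df.columns = ["color"]
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(list(df.columns))
print(type(mismatch_error).__name__)
```

Because the labels are matched by position, this approach depends on the column order staying the same, whereas a dictionary passed to `.rename` matches by name.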
# clean up spaces and uppercase letters in the column names
# pass a function into the .rename method
# this function takes a column name and returns a new name
movies.columns
Out[58]:
Index(['color', 'director', 'num_critic_for_reviews', 'duration',
'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',
'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name',
'num_voted_users', 'cast_total_facebook_likes', 'actor_3_name',
'facenumber_in_poster', 'plot_keywords', 'movie_imdb_link',
'num_user_for_reviews', 'language', 'country', 'content_rating',
'budget', 'title_year', 'actor_2_facebook_likes', 'imdb_score',
'aspect', 'fblikes'],
dtype='object')
def to_clean(val):
    # return the cleaned name; without return the function yields None and every column label shows as NaN
    return val.strip().lower().replace(" ","_")
movies.rename(columns=to_clean).head()
Out[61]:
color ... fblikes
Ratava Color ... 33000
POC Color ... 0
Ertceps Color ... 85000
The Dark Knight Rises Color ... 164000
Star Wars: Episode VII - The Force Awakens NaN ... 0
[5 rows x 27 columns]
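The function passed to `.rename` is called once per label and must return the new label; a function with no `return` statement yields `None` for every column. The movie columns are already lower case, so the effect is easier to see on deliberately messy names. A minimal sketch, assuming two made-up column names that are not in movie.csv:

```python
import pandas as pd

def to_clean(val):
    # Strip outer whitespace, lower-case, and replace inner spaces with underscores
    return val.strip().lower().replace(" ", "_")

# Hypothetical frame with untidy column names
df = pd.DataFrame([[1, 2]], columns=[" Movie Title", "IMDB Score "])

# rename applies to_clean to each column label and returns a new frame
cleaned = df.rename(columns=to_clean)

print(list(cleaned.columns))
```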
# the same cleanup with a list comprehension, assigned back to .columns
cols=[col.strip().lower().replace(" ","_") for col in movies.columns]
movies.columns=cols
movies.head()
Out[64]:
color ... fblikes
Ratava Color ... 33000
POC Color ... 0
Ertceps Color ... 85000
The Dark Knight Rises Color ... 164000
Star Wars: Episode VII - The Force Awakens NaN ... 0
[5 rows x 27 columns]
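The list comprehension above can also be written with the `.str` accessor that pandas exposes on a string Index. This is an alternative spelling that is not in the original notes, sketched on a hypothetical frame with untidy names:

```python
import pandas as pd

# Hypothetical frame with untidy column names
df = pd.DataFrame([[1, 2]], columns=[" Movie Title", "IMDB Score "])

# Chain vectorized string methods over the column Index, then assign back in place
df.columns = (
    df.columns.str.strip()
    .str.lower()
    .str.replace(" ", "_", regex=False)  # regex=False: treat " " as a literal string
)

print(list(df.columns))
```

Both spellings produce the same labels; the comprehension works on any iterable of strings, while the `.str` chain stays inside pandas.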