Analyzing a Million Movie Ratings with pandas

Life is short, use Python.

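We will do the analysis in Python. pandas is one of the most important Python packages for data analysis; other important ones are numpy and matplotlib.

Install pandas (the same on Linux, Mac, and Windows):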

pip install pandas

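Movie data source: http://grouplens.org/datasets/movielens/

Download and unzip the data file (this walkthrough uses the ml-1m archive). It contains the following 4 files:

  • users.dat: user data
  • movies.dat: movie data
  • ratings.dat: rating data
  • README: description of the files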

The README describes the format of the source files:

  • users.dat (UserID::Gender::Age::Occupation::Zip-code)
  • movies.dat (MovieID::Title::Genres)
  • ratings.dat (UserID::MovieID::Rating::Timestamp)

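Notes: Occupation is the user's occupation, Zip-code is the postal code, Timestamp is the time of the rating, and Genres lists the movie's genres (see the README for further details).

Within each record, fields are separated by ::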


Environment:

  • OS: Windows
  • Language: Python 3.4
  • Editor: Jupyter

Reading the data with pandas.

Import the required modules:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
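
Read the data. Define the column names first, because the source files have no header row, only records whose fields are separated by '::'.

user_names = ['user_id', 'gender', 'age', 'occupation', 'zip']  # column names for the users table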

Read the data; adjust the path to wherever the source files are on your machine.

users = pd.read_table('C:\\Users\\Administrator\\Downloads\\ml-1m\\users.dat', sep='::', header=None, names=user_names)
D:\Anaconda3\lib\site-packages\ipykernel\__main__.py:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators; you can avoid this warning by specifying engine='python'.
  if __name__ == '__main__':

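The warning above can be ignored. It only means the data was loaded with the Python engine instead of the C engine, because the C engine does not support a multi-character separator such as '::'. Passing engine='python' explicitly, as the message suggests, silences it.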
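
Since the warning itself suggests passing engine='python', here is a minimal sketch of such a call. It parses three in-memory records in the users.dat layout instead of the real file. The names sample_records and users_demo are illustrative, and dtype={'zip': str} is an assumption made here so that this tiny sample keeps the leading zero in a zip code. For the real files, adding encoding='latin-1' is a further assumption that may help if non-ASCII movie titles fail to decode or come out garbled.

```python
import io

import pandas as pd

# A few records in the users.dat layout (UserID::Gender::Age::Occupation::Zip-code)
sample_records = "1::F::1::10::48067\n2::M::56::16::70072\n4::M::45::7::02460\n"

user_names = ['user_id', 'gender', 'age', 'occupation', 'zip']

# engine='python' is stated explicitly, so pandas does not need to fall back and warn.
# dtype={'zip': str} keeps zip codes as text (an assumption for this small sample).
users_demo = pd.read_csv(
    io.StringIO(sample_records),
    sep='::',
    header=None,
    names=user_names,
    engine='python',
    dtype={'zip': str},
)

print(users_demo)
```

With a file path in place of the StringIO object, the same keyword arguments apply unchanged.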
Check how many records there are and look at the first 5 rows.

print(len(users))
users.head()
6040
   user_id  gender  age  occupation    zip
0        1       F    1          10  48067
1        2       M   56          16  70072
2        3       M   25          15  55117
3        4       M   45           7  02460
4        5       M   25          20  55455

Read the movies and ratings data in the same way.

ratings_names = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_table('C:\\Users\\Administrator\\Downloads\\ml-1m\\ratings.dat', sep='::', header=None, names=ratings_names)
movies_names = ['movie_id', 'title', 'genres']
movies = pd.read_table('C:\\Users\\Administrator\\Downloads\\ml-1m\\movies.dat', sep='::', header=None, names=movies_names)
D:\Anaconda3\lib\site-packages\ipykernel\__main__.py:2: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators; you can avoid this warning by specifying engine='python'.
  from ipykernel import kernelapp as app
D:\Anaconda3\lib\site-packages\ipykernel\__main__.py:4: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators; you can avoid this warning by specifying engine='python'.

Loading takes a little while because the data contains over a million records.
Look at the ratings and movies tables.

print(len(ratings))
ratings.head()
1000209
   user_id  movie_id  rating  timestamp
0        1      1193       5  978300760
1        1       661       3  978302109
2        1       914       3  978301968
3        1      3408       4  978300275
4        1      2355       5  978824291
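
The timestamp column is hard to read as a raw integer. Assuming, as the README indicates, that it counts seconds since the Unix epoch, the sketch below converts it to a datetime. The frame ratings_demo and the column name rated_at are illustrative; the two rows are copied from the output above.

```python
import pandas as pd

# Two rows copied from the ratings output above
ratings_demo = pd.DataFrame({
    'user_id': [1, 1],
    'movie_id': [1193, 661],
    'rating': [5, 3],
    'timestamp': [978300760, 978302109],
})

# unit='s' tells pandas the integers are seconds since 1970-01-01 (UTC)
ratings_demo['rated_at'] = pd.to_datetime(ratings_demo['timestamp'], unit='s')

print(ratings_demo[['movie_id', 'rating', 'rated_at']])
```

Both ratings fall on the evening of 2000-12-31 (UTC).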
print(len(movies))
movies.head()
3883
   movie_id  title                               genres
0         1  Toy Story (1995)                    Animation|Children's|Comedy
1         2  Jumanji (1995)                      Adventure|Children's|Fantasy
2         3  Grumpier Old Men (1995)             Comedy|Romance
3         4  Waiting to Exhale (1995)            Comedy|Drama
4         5  Father of the Bride Part II (1995)  Comedy
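
The genres column packs several values into one string separated by '|'. One possible way to make it usable for filtering or counting is to expand it into indicator columns. The following is a sketch only; movies_demo and genre_flags are illustrative names, and the three rows are copied from the output above.

```python
import pandas as pd

# Three rows copied from the movies output above
movies_demo = pd.DataFrame({
    'movie_id': [1, 2, 3],
    'title': ['Toy Story (1995)', 'Jumanji (1995)', 'Grumpier Old Men (1995)'],
    'genres': ["Animation|Children's|Comedy", "Adventure|Children's|Fantasy", 'Comedy|Romance'],
})

# One 0/1 column per genre; a row has 1 where the movie carries that genre
genre_flags = movies_demo['genres'].str.get_dummies(sep='|')

# Number of movies per genre in this small sample
print(genre_flags.sum())
```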

There are over 1 million movie ratings.
Merge the 3 tables into a single table, data.

data = pd.merge(pd.merge(users, ratings), movies)
print(len(data))
data.head()
1000209
   user_id  gender  age  occupation    zip  movie_id  rating  timestamp  title                                   genres
0        1       F    1          10  48067      1193       5  978300760  One Flew Over the Cuckoo's Nest (1975)  Drama
1        2       M   56          16  70072      1193       5  978298413  One Flew Over the Cuckoo's Nest (1975)  Drama
2       12       M   25          12  32793      1193       4  978220179  One Flew Over the Cuckoo's Nest (1975)  Drama
3       15       M   25           7  22903      1193       4  978199279  One Flew Over the Cuckoo's Nest (1975)  Drama
4       17       M   50           1  95350      1193       5  978158471  One Flew Over the Cuckoo's Nest (1975)  Drama
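
When pd.merge is called without keys, it joins on the column names the two tables share: user_id for users and ratings, then movie_id for that result and movies. A hedged sketch of the same join with the keys spelled out, on toy frames whose names and values are purely illustrative:

```python
import pandas as pd

# Toy stand-ins for the three tables (values are made up for illustration)
users_demo = pd.DataFrame({'user_id': [1, 2], 'gender': ['F', 'M']})
ratings_demo = pd.DataFrame({
    'user_id': [1, 1, 2],
    'movie_id': [10, 20, 10],
    'rating': [5, 3, 4],
})
movies_demo = pd.DataFrame({
    'movie_id': [10, 20],
    'title': ['Movie A (1990)', 'Movie B (1995)'],
})

# Naming the keys makes the join explicit and guards against
# accidentally joining on some other column that happens to share a name
merged_demo = (
    users_demo
    .merge(ratings_demo, on='user_id', how='inner')
    .merge(movies_demo, on='movie_id', how='inner')
)

print(merged_demo)
```

Each rating ends up as one row carrying both the user's and the movie's attributes, which is why the merged table has as many rows as the ratings table.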

Look at all the ratings given by the user with id 1.

data[data.user_id==1]
       user_id  gender  age  occupation    zip  movie_id  rating  timestamp  title                                               genres
    0        1       F    1          10  48067      1193       5  978300760  One Flew Over the Cuckoo's Nest (1975)              Drama
 1725        1       F    1          10  48067       661       3  978302109  James and the Giant Peach (1996)                    Animation|Children's|Musical
 2250        1       F    1          10  48067       914       3  978301968  My Fair Lady (1964)                                 Musical|Romance
 2886        1       F    1          10  48067      3408       4  978300275  Erin Brockovich (2000)                              Drama
 4201        1       F    1          10  48067      2355       5  978824291  Bug's Life, A (1998)                                Animation|Children's|Comedy
 5904        1       F    1          10  48067      1197       3  978302268  Princess Bride, The (1987)                          Action|Adventure|Comedy|Romance
 8222        1       F    1          10  48067      1287       5  978302039  Ben-Hur (1959)                                      Action|Adventure|Drama
 8926        1       F    1          10  48067      2804       5  978300719  Christmas Story, A (1983)                           Comedy|Drama
10278        1       F    1          10  48067       594       4  978302268  Snow White and the Seven Dwarfs (1937)              Animation|Children's|Musical
11041        1       F    1          10  48067       919       4  978301368  Wizard of Oz, The (1939)                            Adventure|Children's|Drama|Musical
12759        1       F    1          10  48067       595       5  978824268  Beauty and the Beast (1991)                         Animation|Children's|Musical
13819        1       F    1          10  48067       938       4  978301752  Gigi (1958)                                         Musical
14006        1       F    1          10  48067      2398       4  978302281  Miracle on 34th Street (1947)                       Drama
14386        1       F    1          10  48067      2918       4  978302124  Ferris Bueller's Day Off (1986)                     Comedy
15859        1       F    1          10  48067      1035       5  978301753  Sound of Music, The (1965)                          Musical
16741        1       F    1          10  48067      2791       4  978302188  Airplane! (1980)                                    Comedy
18472        1       F    1          10  48067      2687       3  978824268  Tarzan (1999)                                       Animation|Children's
18914        1       F    1          10  48067      2018       4  978301777  Bambi (1942)                                        Animation|Children's
19503        1       F    1          10  48067      3105       5  978301713  Awakenings (1990)                                   Drama
20183        1       F    1          10  48067      2797       4  978302039  Big (1988)                                          Comedy|Fantasy
21674        1       F    1          10  48067      2321       3  978302205  Pleasantville (1998)                                Comedy
22832        1       F    1          10  48067       720       3  978300760  Wallace & Gromit: The Best of Aardman Animatio...  Animation
23270        1       F    1          10  48067      1270       5  978300055  Back to the Future (1985)                           Comedy|Sci-Fi
25853        1       F    1          10  48067       527       5  978824195  Schindler's List (1993)                             Drama|War
28157        1       F    1          10  48067      2340       3  978300103  Meet Joe Black (1998)                               Romance
28501        1       F    1          10  48067        48       5  978824351  Pocahontas (1995)                                   Animation|Children's|Musical|Romance
28883        1       F    1          10  48067      1097       4  978301953  E.T. the Extra-Terrestrial (1982)                   Children's|Drama|Fantasy|Sci-Fi
31152        1       F    1          10  48067      1721       4  978300055  Titanic (1997)                                      Drama|Romance
32698        1       F    1          10  48067      1545       4  978824139  Ponette (1996)                                      Drama
32771        1       F    1          10  48067       745       3  978824268  Close Shave, A (1995)                               Animation|Comedy|Thriller
33428        1       F    1          10  48067      2294       4  978824291  Antz (1998)                                         Animation|Children's
34073        1       F    1          10  48067      3186       4  978300019  Girl, Interrupted (1999)                            Drama
34504        1       F    1          10  48067      1566       4  978824330  Hercules (1997)                                     Adventure|Animation|Children's|Comedy|Musical
34973        1       F    1          10  48067       588       4  978824268  Aladdin (1992)                                      Animation|Children's|Comedy|Musical
36324        1       F    1          10  48067      1907       4  978824330  Mulan (1998)                                        Animation|Children's
36814        1       F    1          10  48067       783       4  978824291  Hunchback of Notre Dame, The (1996)                 Animation|Children's|Musical
37204        1       F    1          10  48067      1836       5  978300172  Last Days of Disco, The (1998)                      Drama
37339        1       F    1          10  48067      1022       5  978300055  Cinderella (1950)                                   Animation|Children's|Musical
37916        1       F    1          10  48067      2762       4  978302091  Sixth Sense, The (1999)                             Thriller
40375        1       F    1          10  48067       150       5  978301777  Apollo 13 (1995)                                    Drama
41626        1       F    1          10  48067         1       5  978824268  Toy Story (1995)                                    Animation|Children's|Comedy
43703        1       F    1          10  48067      1961       5  978301590  Rain Man (1988)                                     Drama
45033        1       F    1          10  48067      1962       4  978301753  Driving Miss Daisy (1989)                           Drama
45685        1       F    1          10  48067      2692       4  978301570  Run Lola Run (Lola rennt) (1998)                    Action|Crime|Romance
46757        1       F    1          10  48067       260       4  978300760  Star Wars: Episode IV - A New Hope (1977)           Action|Adventure|Fantasy|Sci-Fi
49748        1       F    1          10  48067      1028       5  978301777  Mary Poppins (1964)                                 Children's|Comedy|Musical
50759        1       F    1          10  48067      1029       5  978302205  Dumbo (1941)                                        Animation|Children's|Musical
51327        1       F    1          10  48067      1207       4  978300719  To Kill a Mockingbird (1962)                        Drama
52255        1       F    1          10  48067      2028       5  978301619  Saving Private Ryan (1998)                          Action|Drama|War
54908        1       F    1          10  48067       531       4  978302149  Secret Garden, The (1993)                           Children's|Drama
55246        1       F    1          10  48067      3114       4  978302174  Toy Story 2 (1999)                                  Animation|Children's|Comedy
56831        1       F    1          10  48067       608       4  978301398  Fargo (1996)                                        Crime|Drama|Thriller
59344        1       F    1          10  48067      1246       4  978302091  Dead Poets Society (1989)                           Drama
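
The expression data[data.user_id==1] is boolean-mask indexing: the comparison yields one True/False per row, and only the True rows are kept. A small sketch of the same idea, narrowed to two columns and sorted. The frame data_demo is a toy stand-in; user 1's three ratings are taken from the table above, while the row for user 2 is made up.

```python
import pandas as pd

# Toy stand-in for the merged table; the user 2 row is invented for contrast
data_demo = pd.DataFrame({
    'user_id': [1, 1, 2, 1],
    'title': ['Toy Story (1995)', 'Titanic (1997)', 'Toy Story (1995)', 'Pocahontas (1995)'],
    'rating': [5, 4, 3, 5],
})

# One boolean per row: True where the rating belongs to user 1
is_user_1 = data_demo['user_id'] == 1

# .loc takes the mask for rows and a list of labels for columns
user_1_ratings = (
    data_demo.loc[is_user_1, ['title', 'rating']]
    .sort_values(by=['rating', 'title'], ascending=[False, True])
)

print(user_1_ratings)
```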

Mean rating of each movie by gender.

mean_ratings_by_gender = data.pivot_table(values='rating',index='title',columns='gender', aggfunc='mean')
mean_ratings_by_gender.head(10)  # show the first 10 rows
gender                                    F         M
title
$1,000,000 Duck (1971)             3.375000  2.761905
'Night Mother (1986)               3.388889  3.352941
'Til There Was You (1997)          2.675676  2.733333
'burbs, The (1989)                 2.793478  2.962085
...And Justice for All (1979)      3.828571  3.689024
1-900 (1994)                       2.000000  3.000000
10 Things I Hate About You (1999)  3.646552  3.311966
101 Dalmatians (1961)              3.791444  3.500000
101 Dalmatians (1996)              3.240000  2.911215
12 Angry Men (1957)                4.184397  4.328421
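
A pivot table of this shape is a group-by on two keys followed by moving one key into the columns. The sketch below shows both spellings side by side on toy data (the frame and its values are illustrative), which may help when reading either form.

```python
import pandas as pd

# Toy ratings: movie A has one female and two male ratings, movie B one of each
demo = pd.DataFrame({
    'title': ['A', 'A', 'A', 'B', 'B'],
    'gender': ['F', 'M', 'M', 'F', 'M'],
    'rating': [4, 2, 3, 5, 1],
})

# Spelling 1: pivot_table, as used in the article
pivot = demo.pivot_table(values='rating', index='title', columns='gender', aggfunc='mean')

# Spelling 2: group by both keys, average, then unstack gender into columns
grouped = demo.groupby(['title', 'gender'])['rating'].mean().unstack('gender')

print(pivot)
print(grouped)
```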

Add a column to mean_ratings_by_gender: the difference between the female and male mean ratings.

mean_ratings_by_gender['diff'] = mean_ratings_by_gender.F - mean_ratings_by_gender.M
mean_ratings_by_gender.head()
gender                                F         M       diff
title
$1,000,000 Duck (1971)         3.375000  2.761905   0.613095
'Night Mother (1986)           3.388889  3.352941   0.035948
'Til There Was You (1997)      2.675676  2.733333  -0.057658
'burbs, The (1989)             2.793478  2.962085  -0.168607
...And Justice for All (1979)  3.828571  3.689024   0.139547

Which movies show the largest rating gap between men and women (rated high by men and low by women, or high by women and low by men)?

mean_ratings_by_gender.sort_values(by='diff',ascending=True).head()
# rated higher by men than by women
gender                                        F         M       diff
title
Tigrero: A Film That Was Never Made (1994)  1.0  4.333333  -3.333333
Neon Bible, The (1995)                      1.0  4.000000  -3.000000
Enfer, L' (1994)                            1.0  3.750000  -2.750000
Stalingrad (1993)                           1.0  3.593750  -2.593750
Killer: A Journal of Murder (1995)          1.0  3.428571  -2.428571
mean_ratings_by_gender.sort_values(by='diff',ascending=False).head()
# rated higher by women than by men
gender                                                              F         M      diff
title
James Dean Story, The (1957)                                 4.000000  1.000000  3.000000
Spiders, The (Die Spinnen, 1. Teil: Der Goldene See) (1919)  4.000000  1.000000  3.000000
Country Life (1994)                                          5.000000  2.000000  3.000000
Babyfever (1994)                                             3.666667  1.000000  2.666667
Woman of Paris, A (1923)                                     5.000000  2.428571  2.571429
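
Several of the extremes above have a mean of exactly 1.0 or 5.0 for one gender, which suggests that very few ratings stand behind them. One possible refinement, sketched here on toy data, is to keep only movies with a minimum number of ratings before ranking by diff. The threshold min_ratings and all values are hypothetical choices for illustration.

```python
import pandas as pd

# Toy data: 'Popular' has six ratings, 'Obscure' only two
demo = pd.DataFrame({
    'title': ['Popular'] * 6 + ['Obscure'] * 2,
    'gender': ['F', 'F', 'F', 'M', 'M', 'M', 'F', 'M'],
    'rating': [4, 5, 4, 3, 4, 3, 1, 5],
})

min_ratings = 3  # hypothetical cutoff; pick one that suits the real data

# Titles that have at least min_ratings ratings in total
counts = demo.groupby('title').size()
active_titles = counts.index[counts >= min_ratings]

# Mean rating per gender, restricted to the active titles, then the gap
means = demo.pivot_table(values='rating', index='title', columns='gender', aggfunc='mean')
means = means.loc[active_titles]
means['diff'] = means['F'] - means['M']

ranked = means.sort_values(by='diff', ascending=False)
print(ranked)
```

Without the filter, 'Obscure' would top the list with a gap of 4 points based on a single rating per gender.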

Number of ratings per movie.

total_rating_by_title = data.groupby('title').size()
total_rating_by_title    # index: movie title; values: number of ratings
title
$1,000,000 Duck (1971)                              37
'Night Mother (1986)                                70
'Til There Was You (1997)                           52
'burbs, The (1989)                                 303
...And Justice for All (1979)                      199
1-900 (1994)                                         2
10 Things I Hate About You (1999)                  700
101 Dalmatians (1961)                              565
101 Dalmatians (1996)                              364
12 Angry Men (1957)                                616
13th Warrior, The (1999)                           750
187 (1997)                                          55
2 Days in the Valley (1996)                        286
20 Dates (1998)                                    139
20,000 Leagues Under the Sea (1954)                575
200 Cigarettes (1999)                              181
2001: A Space Odyssey (1968)                      1716
2010 (1984)                                        470
24 7: Twenty Four Seven (1997)                       5
24-hour Woman (1998)                                 9
28 Days (2000)                                     505
3 Ninjas: High Noon On Mega Mountain (1998)         47
3 Strikes (2000)                                     4
301, 302 (1995)                                      9
39 Steps, The (1935)                               253
400 Blows, The (Les Quatre cents coups) (1959)     187
42 Up (1998)                                        88
52 Pick-Up (1986)                                  140
54 (1998)                                          259
7th Voyage of Sinbad, The (1958)                   258
                                                  ... 
Wrongfully Accused (1998)                          123
Wyatt Earp (1994)                                  270
X-Files: Fight the Future, The (1998)              996
X-Men (2000)                                      1511
X: The Unknown (1956)                               12
Xiu Xiu: The Sent-Down Girl (Tian yu) (1998)        69
Yankee Zulu (1994)                                   2
Yards, The (1999)                                   77
Year My Voice Broke, The (1987)                     27
Year of Living Dangerously (1982)                  391
Year of the Horse (1997)                             4
Yellow Submarine (1968)                            399
Yojimbo (1961)                                     215
You Can't Take It With You (1938)                   77
You So Crazy (1994)                                 13
You've Got Mail (1998)                             838
Young Doctors in Love (1982)                        79
Young Frankenstein (1974)                         1193
Young Guns (1988)                                  562
Young Guns II (1990)                               369
Young Poisoner's Handbook, The (1995)               79
Young Sherlock Holmes (1985)                       379
Young and Innocent (1937)                           10
Your Friends and Neighbors (1998)                  109
Zachariah (1971)                                     2
Zed & Two Noughts, A (1985)                         29
Zero Effect (1998)                                 301
Zero Kelvin (Kjærlighetens kjøtere) (1995)           2
Zeus and Roxanne (1997)                             23
eXistenZ (1999)                                    410
dtype: int64

The 10 movies with the most ratings.

top_10_total_rating = total_rating_by_title.sort_values(ascending=False).head(10)
top_10_total_rating
title
American Beauty (1999)                                   3428
Star Wars: Episode IV - A New Hope (1977)                2991
Star Wars: Episode V - The Empire Strikes Back (1980)    2990
Star Wars: Episode VI - Return of the Jedi (1983)        2883
Jurassic Park (1993)                                     2672
Saving Private Ryan (1998)                               2653
Terminator 2: Judgment Day (1991)                        2649
Matrix, The (1999)                                       2590
Back to the Future (1985)                                2583
Silence of the Lambs, The (1991)                         2578
dtype: int64
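
Counting rows per title and taking the largest counts can be written in a couple of equivalent ways. A brief sketch on toy titles (all names and values are illustrative):

```python
import pandas as pd

# Toy data: title A rated three times, B twice, C once
data_demo = pd.DataFrame({'title': ['A', 'B', 'A', 'C', 'A', 'B']})

# Same approach as the article: count rows per title
counts_demo = data_demo.groupby('title').size()

# nlargest(n) returns the n biggest values without a separate sort and head
top_2 = counts_demo.nlargest(2)

# value_counts() counts and sorts in descending order in one step
top_2_alt = data_demo['title'].value_counts().head(2)

print(top_2)
print(top_2_alt)
```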
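
As we can see, the most-rated movies are mostly ones we know well; they can generally be regarded as popular movies.
Next, look at the 10 highest-rated movies (note: the maximum rating is 5.0).
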
# groupby(...).mean() returns a Series indexed by title, so sort_values() needs no column name
mean_ratings_by_title = data.groupby('title')['rating'].mean()
top_10_mean_ratings = mean_ratings_by_title.sort_values(ascending=False).head(10)
top_10_mean_ratings
title
Gate of Heavenly Peace, The (1995)           5.0
Lured (1947)                                 5.0
Ulysses (Ulisse) (1954)                      5.0
Smashing Time (1967)                         5.0
Follow the Bitch (1998)                      5.0
Song of Freedom (1936)                       5.0
Bittersweet Motel (2000)                     5.0
Baby, The (1973)                             5.0
One Little Indian (1973)                     5.0
Schlafes Bruder (Brother of Sleep) (1995)    5.0
Name: rating, dtype: float64

Mean ratings of the 10 most-rated movies.

mean_ratings_by_title[top_10_total_rating.index]
title
American Beauty (1999)                                   4.317386
Star Wars: Episode IV - A New Hope (1977)                4.453694
Star Wars: Episode V - The Empire Strikes Back (1980)    4.292977
Star Wars: Episode VI - Return of the Jedi (1983)        4.022893
Jurassic Park (1993)                                     3.763847
Saving Private Ryan (1998)                               4.337354
Terminator 2: Judgment Day (1991)                        4.058513
Matrix, The (1999)                                       4.315830
Back to the Future (1985)                                3.990321
Silence of the Lambs, The (1991)                         4.351823
Name: rating, dtype: float64
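
This shows that the 10 most-rated movies do not rank especially high by mean rating, and that some of the highest-rated movies are ones we have never heard of. Is something wrong with the data? Not really.
Suppose a poor movie draws very few viewers, and those few all give it a high score: the result is a movie with very few ratings but a very high mean rating.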
If you are not convinced, look at the data: the number of ratings for each of the 10 highest-rated movies.

total_rating_by_title[top_10_mean_ratings.index]
title
Gate of Heavenly Peace, The (1995)           3
Lured (1947)                                 1
Ulysses (Ulisse) (1954)                      1
Smashing Time (1967)                         2
Follow the Bitch (1998)                      1
Song of Freedom (1936)                       1
Bittersweet Motel (2000)                     1
Baby, The (1973)                             1
One Little Indian (1973)                     1
Schlafes Bruder (Brother of Sleep) (1995)    1
dtype: int64
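
The two quantities compared above, how many ratings a movie has and what their mean is, can also be computed together so that they sit side by side. A minimal sketch on toy data; the frame, the titles, and the cutoff min_count are illustrative assumptions.

```python
import pandas as pd

# Toy data: 'Hit' has four ratings, 'Rare' a single perfect score
demo = pd.DataFrame({
    'title': ['Hit', 'Hit', 'Hit', 'Hit', 'Rare'],
    'rating': [4, 5, 4, 3, 5],
})

# One row per title with both the number of ratings and their mean
stats = demo.groupby('title')['rating'].agg(['count', 'mean'])

# Ranking by mean alone puts the single-rating movie first
by_mean = stats.sort_values('mean', ascending=False)

# Requiring a minimum number of ratings removes it
min_count = 3  # hypothetical cutoff
reliable = stats[stats['count'] >= min_count].sort_values('mean', ascending=False)

print(by_mean)
print(reliable)
```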

Now recompute the top 10 among popular movies only; here a popular movie is one with more than 1000 ratings.
Select the popular movies:

hot_movie = total_rating_by_title[total_rating_by_title>1000]
print(len(hot_movie))
hot_movie
207
title
2001: A Space Odyssey (1968)                          1716
Abyss, The (1989)                                     1715
African Queen, The (1951)                             1057
Air Force One (1997)                                  1076
Airplane! (1980)                                      1731
Aladdin (1992)                                        1351
Alien (1979)                                          2024
Aliens (1986)                                         1820
Amadeus (1984)                                        1382
American Beauty (1999)                                3428
American Pie (1999)                                   1389
American President, The (1995)                        1033
Animal House (1978)                                   1207
Annie Hall (1977)                                     1334
Apocalypse Now (1979)                                 1176
Apollo 13 (1995)                                      1251
Arachnophobia (1990)                                  1367
Armageddon (1998)                                     1110
As Good As It Gets (1997)                             1424
Austin Powers: International Man of Mystery (1997)    1205
Austin Powers: The Spy Who Shagged Me (1999)          1434
Babe (1995)                                           1751
Back to the Future (1985)                             2583
Back to the Future Part II (1989)                     1158
Back to the Future Part III (1990)                    1148
Batman (1989)                                         1431
Batman Returns (1992)                                 1031
Beauty and the Beast (1991)                           1060
Beetlejuice (1988)                                    1495
Being John Malkovich (1999)                           2241
                                                      ... 
Superman (1978)                                       1222
Talented Mr. Ripley, The (1999)                       1331
Taxi Driver (1976)                                    1240
Terminator 2: Judgment Day (1991)                     2649
Terminator, The (1984)                                2098
Thelma & Louise (1991)                                1417
There's Something About Mary (1998)                   1371
This Is Spinal Tap (1984)                             1118
Thomas Crown Affair, The (1999)                       1089
Three Kings (1999)                                    1021
Time Bandits (1981)                                   1010
Titanic (1997)                                        1546
Top Gun (1986)                                        1010
Total Recall (1990)                                   1996
Toy Story (1995)                                      2077
Toy Story 2 (1999)                                    1585
True Lies (1994)                                      1400
Truman Show, The (1998)                               1005
Twelve Monkeys (1995)                                 1511
Twister (1996)                                        1110
Untouchables, The (1987)                              1127
Usual Suspects, The (1995)                            1783
Wayne's World (1992)                                  1120
When Harry Met Sally... (1989)                        1568
Who Framed Roger Rabbit? (1988)                       1799
Willy Wonka and the Chocolate Factory (1971)          1313
Witness (1985)                                        1046
Wizard of Oz, The (1939)                              1718
X-Men (2000)                                          1511
Young Frankenstein (1974)                             1193
dtype: int64

# mean ratings of the popular movies
hot_movie_mean_rating = mean_ratings_by_title[hot_movie.index]
print(len(hot_movie_mean_rating))
hot_movie_mean_rating
207
title
2001: A Space Odyssey (1968)                          4.068765
Abyss, The (1989)                                     3.683965
African Queen, The (1951)                             4.251656
Air Force One (1997)                                  3.588290
Airplane! (1980)                                      3.971115
Aladdin (1992)                                        3.788305
Alien (1979)                                          4.159585
Aliens (1986)                                         4.125824
Amadeus (1984)                                        4.251809
American Beauty (1999)                                4.317386
American Pie (1999)                                   3.709863
American President, The (1995)                        3.793804
Animal House (1978)                                   4.053024
Annie Hall (1977)                                     4.141679
Apocalypse Now (1979)                                 4.243197
Apollo 13 (1995)                                      4.073541
Arachnophobia (1990)                                  3.002926
Armageddon (1998)                                     3.191892
As Good As It Gets (1997)                             3.950140
Austin Powers: International Man of Mystery (1997)    3.710373
Austin Powers: The Spy Who Shagged Me (1999)          3.388424
Babe (1995)                                           3.891491
Back to the Future (1985)                             3.990321
Back to the Future Part II (1989)                     3.343696
Back to the Future Part III (1990)                    3.242160
Batman (1989)                                         3.600978
Batman Returns (1992)                                 2.976722
Beauty and the Beast (1991)                           3.885849
Beetlejuice (1988)                                    3.567893
Being John Malkovich (1999)                           4.125390
                                                        ...   
Superman (1978)                                       3.536825
Talented Mr. Ripley, The (1999)                       3.503381
Taxi Driver (1976)                                    4.183871
Terminator 2: Judgment Day (1991)                     4.058513
Terminator, The (1984)                                4.152050
Thelma & Louise (1991)                                3.680311
There's Something About Mary (1998)                   3.904449
This Is Spinal Tap (1984)                             4.179785
Thomas Crown Affair, The (1999)                       3.641873
Three Kings (1999)                                    3.807052
Time Bandits (1981)                                   3.694059
Titanic (1997)                                        3.583441
Top Gun (1986)                                        3.686139
Total Recall (1990)                                   3.682365
Toy Story (1995)                                      4.146846
Toy Story 2 (1999)                                    4.218927
True Lies (1994)                                      3.634286
Truman Show, The (1998)                               3.861692
Twelve Monkeys (1995)                                 3.945731
Twister (1996)                                        3.173874
Untouchables, The (1987)                              4.007986
Usual Suspects, The (1995)                            4.517106
Wayne's World (1992)                                  3.600893
When Harry Met Sally... (1989)                        4.073342
Who Framed Roger Rabbit? (1988)                       3.679822
Willy Wonka and the Chocolate Factory (1971)          3.861386
Witness (1985)                                        3.996176
Wizard of Oz, The (1939)                              4.247963
X-Men (2000)                                          3.820649
Young Frankenstein (1974)                             4.250629
Name: rating, dtype: float64

# the 10 highest-rated movies among those with more than 1000 ratings
top_10_rating_movie = hot_movie_mean_rating.sort_values(ascending=False).head(10)
top_10_rating_movie
title
Shawshank Redemption, The (1994)                                               4.554558
Godfather, The (1972)                                                          4.524966
Usual Suspects, The (1995)                                                     4.517106
Schindler's List (1993)                                                        4.510417
Raiders of the Lost Ark (1981)                                                 4.477725
Rear Window (1954)                                                             4.476190
Star Wars: Episode IV - A New Hope (1977)                                      4.453694
Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963)    4.449890
Casablanca (1942)                                                              4.412822
Sixth Sense, The (1999)                                                        4.406263
Name: rating, dtype: float64
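
# Use this magic in IPython (or Jupyter) only; it is not needed elsewhere.
# It must stand on its own line, without a trailing comment.
%matplotlib inline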
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1,11)
y = top_10_rating_movie.values
name = top_10_rating_movie.index

# draw the plot
plt.plot(x, y, 'r-o')

# annotate each point with its movie title
for i in range(10):
    plt.text(x[i], y[i], name[i])

# set the axis ranges
plt.xlim(0, 15)
plt.ylim(4.4, 4.56)

# set the axis labels
#plt.xlabel('Rank')
#plt.ylabel('Rating')

#plt.show() # needed when not running in IPython

[Figure 1: line plot of the mean ratings of the 10 highest-rated popular movies, each point labeled with its title]

That plot is rather ugly; here is a better one:

import matplotlib.pyplot as plt
import numpy as np

plt.rcdefaults()

people = name
y_pos = np.arange(len(people))
performance = y

# one horizontal bar per movie, labeled with its title
plt.barh(y_pos, performance, align='center', alpha=0.4)
plt.yticks(y_pos, people)

#plt.xlabel('Rating')
#plt.title('Rank')


#plt.show() # needed when not running in IPython

[Figure 2: horizontal bar chart of the same 10 movies and their mean ratings]
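
For readers who want to reproduce the bar chart without rerunning the whole analysis, here is a self-contained sketch using matplotlib's object-oriented interface. The ten values are copied from the top-10 output earlier in the article; the Agg backend, the figure size, the color, and the x-axis range are assumptions chosen for this sketch, not part of the original code.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs without a display; drop this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Mean ratings of the top 10 popular movies, copied from the output above
top10 = pd.Series({
    'Shawshank Redemption, The (1994)': 4.554558,
    'Godfather, The (1972)': 4.524966,
    'Usual Suspects, The (1995)': 4.517106,
    "Schindler's List (1993)": 4.510417,
    'Raiders of the Lost Ark (1981)': 4.477725,
    'Rear Window (1954)': 4.476190,
    'Star Wars: Episode IV - A New Hope (1977)': 4.453694,
    'Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963)': 4.449890,
    'Casablanca (1942)': 4.412822,
    'Sixth Sense, The (1999)': 4.406263,
})

# Sort ascending so the highest-rated movie ends up at the top of the chart
ordered = top10.sort_values()

fig, ax = plt.subplots(figsize=(10, 4))
bars = ax.barh(list(ordered.index), ordered.values, color='steelblue', alpha=0.6)

# Zoom the x axis into the range where the values differ (assumed range)
ax.set_xlim(4.3, 4.6)
ax.set_xlabel('Mean rating')
ax.set_title('Top 10 movies with more than 1000 ratings')
fig.tight_layout()
```

Call plt.show() or fig.savefig(...) afterwards, depending on the environment, to display or store the figure.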
