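Movie Data Analysis with pandas

Movie Data Analysis

    • Initial Approach
        • Reading the Data
        • Data Integration
        • Analyzing Movies with Pivot Tables
          • Selecting the Highest- and Lowest-Rated Movies
          • Differences in Average Rating by Gender
        • Using pandas Group Operations to Analyze Ratings by Age Group
    • Refined Approach
    • Summary

Initial Approach

Reading the Data

  • Input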
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
%matplotlib inline

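# Read the data
# engine='python' is specified because the default C parser does not support the multi-character separator '::'
labels = ['UserId','Gender','Age','Occupation','zip-code']
users = pd.read_csv('./users.dat',sep = '::',header = None,names = labels,engine = 'python')
users.shape

# encoding='latin-1' assumes movies.dat is Latin-1 (ISO-8859-1) encoded; adjust if your copy differs
labels = ['movieId','Title','Genres']
movie = pd.read_csv('./movies.dat',sep = '::',header = None, names = labels,engine = 'python',encoding = 'latin-1')
display(movie.head(),movie.shape)

labels = ['UserId','MovieId','Rating','Time']
ratings = pd.read_csv('./ratings.dat',sep = '::',header = None, names = labels,engine = 'python')
display(ratings.head(),ratings.shape)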

  • Output
(6040, 5)

movieId	Title	Genres
0	1	Toy Story (1995)	Animation|Children's|Comedy
1	2	Jumanji (1995)	Adventure|Children's|Fantasy
2	3	Grumpier Old Men (1995)	Comedy|Romance
3	4	Waiting to Exhale (1995)	Comedy|Drama
4	5	Father of the Bride Part II (1995)	Comedy

	UserId	MovieId	Rating	Time
0	1	1193	5	978300760
1	1	661	3	978302109
2	1	914	3	978301968
3	1	3408	4	978300275
4	1	2355	5	978824291
(1000209, 4)
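
The reading step above can be tried without the real files. The following is a minimal sketch, assuming two made-up rows in the same "::"-delimited layout as users.dat; the name users_demo and the dtype override for the zip code are illustrative additions, not part of the original notebook.

```python
import io
import pandas as pd

# Two made-up rows in the same "::"-delimited layout as users.dat.
sample = io.StringIO("1::F::1::10::48067\n2::M::56::16::02460\n")

users_demo = pd.read_csv(
    sample,
    sep="::",
    engine="python",          # multi-character separators need the Python parser
    header=None,
    names=["UserId", "Gender", "Age", "Occupation", "zip-code"],
    dtype={"zip-code": str},  # keep leading zeros such as "02460"
)
print(users_demo)
```

Reading the zip code as a string is a defensive choice: if every value in a file happened to be numeric, pandas would otherwise infer an integer column and drop the leading zero.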

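Data Integration

The data is spread across three tables, so the tables have to be merged. The technical term for this kind of merging is data integration.

  • Input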
display(users.head(),movie.head(),ratings.head())

df1 = pd.merge(left = users,right = ratings,on = 'UserId')
df1.head()

movie_data = pd.merge(movie,df1,left_on = 'movieId',right_on = 'MovieId')
display(movie_data.shape,movie_data.head())

movie_data['Age'].unique()
movie_data.shape
movie_data.head()
movie_data['Title'].unique()
movie_data['Title'].unique().size
  • Output
UserId	Gender	Age	Occupation	zip-code
0	1	F	1	10	48067
1	2	M	56	16	70072
2	3	M	25	15	55117
3	4	M	45	7	02460
4	5	M	25	20	55455
movieId	Title	Genres
0	1	Toy Story (1995)	Animation|Children's|Comedy
1	2	Jumanji (1995)	Adventure|Children's|Fantasy
2	3	Grumpier Old Men (1995)	Comedy|Romance
3	4	Waiting to Exhale (1995)	Comedy|Drama
4	5	Father of the Bride Part II (1995)	Comedy
UserId	MovieId	Rating	Time
0	1	1193	5	978300760
1	1	661	3	978302109
2	1	914	3	978301968
3	1	3408	4	978300275
4	1	2355	5	978824291

	UserId	Gender	Age	Occupation	zip-code	MovieId	Rating	Time
0	1	F	1	10	48067	1193	5	978300760
1	1	F	1	10	48067	661	3	978302109
2	1	F	1	10	48067	914	3	978301968
3	1	F	1	10	48067	3408	4	978300275
4	1	F	1	10	48067	2355	5	978824291

(1000209, 11)
movieId	Title	Genres	UserId	Gender	Age	Occupation	zip-code	MovieId	Rating	Time
0	1	Toy Story (1995)	Animation|Children's|Comedy	1	F	1	10	48067	1	5	978824268
1	1	Toy Story (1995)	Animation|Children's|Comedy	6	F	50	9	55117	1	4	978237008
2	1	Toy Story (1995)	Animation|Children's|Comedy	8	M	25	12	11413	1	4	978233496
3	1	Toy Story (1995)	Animation|Children's|Comedy	9	M	25	17	61614	1	5	978225952
4	1	Toy Story (1995)	Animation|Children's|Comedy	10	F	35	1	95370	1	5	978226474

array([ 1, 50, 25, 35, 18, 45, 56], dtype=int64)
(1000209, 11)

	movieId	Title	Genres	UserId	Gender	Age	Occupation	zip-code	MovieId	Rating	Time
0	1	Toy Story (1995)	Animation|Children's|Comedy	1	F	1	10	48067	1	5	978824268
1	1	Toy Story (1995)	Animation|Children's|Comedy	6	F	50	9	55117	1	4	978237008
2	1	Toy Story (1995)	Animation|Children's|Comedy	8	M	25	12	11413	1	4	978233496
3	1	Toy Story (1995)	Animation|Children's|Comedy	9	M	25	17	61614	1	5	978225952
4	1	Toy Story (1995)	Animation|Children's|Comedy	10	F	35	1	95370	1	5	978226474

array(['Toy Story (1995)', 'Jumanji (1995)', 'Grumpier Old Men (1995)',
       ..., 'Tigerland (2000)', 'Two Family House (2000)',
       'Contender, The (2000)'], dtype=object)
3706
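
The two merges can be sketched on toy tables. This is only an illustration under assumptions: the three small frames and their values are invented, and the rename of movieId plus the validate argument are optional safeguards that the original notebook does not use.

```python
import pandas as pd

users_demo = pd.DataFrame({"UserId": [1, 2], "Gender": ["F", "M"]})
ratings_demo = pd.DataFrame(
    {"UserId": [1, 1, 2], "MovieId": [10, 20, 10], "Rating": [5, 3, 4]}
)
movies_demo = pd.DataFrame({"movieId": [10, 20], "Title": ["A (1990)", "B (1995)"]})

# Name the join key instead of relying on the shared column name;
# validate= raises MergeError if the key relationship is not as expected.
user_ratings = pd.merge(users_demo, ratings_demo, on="UserId", validate="one_to_many")

merged = pd.merge(
    movies_demo.rename(columns={"movieId": "MovieId"}),  # avoid two id columns
    user_ratings,
    on="MovieId",
    validate="one_to_many",
)
print(merged)
```

Renaming the key before merging leaves a single MovieId column, whereas left_on/right_on keeps both movieId and MovieId, which is why the merged table above has 11 columns.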

Analyzing Movies with Pivot Tables

Selecting the Highest- and Lowest-Rated Movies
  • Input
movie_rate_mean = pd.pivot_table(movie_data,values = ['Rating'],index = ['Title'],aggfunc = 'mean')
movie_rate_mean.shape

movie_rate_mean.head()
movie_rate_mean.sort_values(by = 'Rating',ascending = False,inplace = True)

# Select the 20 highest-rated movies
movie_rate_mean[:20]

# Select the 20 lowest-rated movies
movie_rate_mean[-20:]
  • Output
(3706, 1)

	Rating
Title	
$1,000,000 Duck (1971)	3.027027
'Night Mother (1986)	3.371429
'Til There Was You (1997)	2.692308
'burbs, The (1989)	2.910891
...And Justice for All (1979)	3.713568

	Rating
Title	
Ulysses (Ulisse) (1954)	5.000000
Lured (1947)	5.000000
Follow the Bitch (1998)	5.000000
Bittersweet Motel (2000)	5.000000
Song of Freedom (1936)	5.000000
One Little Indian (1973)	5.000000
Smashing Time (1967)	5.000000
Schlafes Bruder (Brother of Sleep) (1995)	5.000000
Gate of Heavenly Peace, The (1995)	5.000000
Baby, The (1973)	5.000000
I Am Cuba (Soy Cuba/Ya Kuba) (1964)	4.800000
Lamerica (1994)	4.750000
Apple, The (Sib) (1998)	4.666667
Sanjuro (1962)	4.608696
Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954)	4.560510
Shawshank Redemption, The (1994)	4.554558
Godfather, The (1972)	4.524966
Close Shave, A (1995)	4.520548
Usual Suspects, The (1995)	4.517106
Schindler's List (1993)	4.510417

	Rating
Title	
Cheetah (1989)	1.0
Torso (Corpi Presentano Tracce di Violenza Carnale) (1973)	1.0
Mutters Courage (1995)	1.0
Sleepover (1995)	1.0
Bloody Child, The (1996)	1.0
Get Over It (1996)	1.0
Even Dwarfs Started Small (Auch Zwerge haben klein angefangen) (1971)	1.0
Nueba Yol (1995)	1.0
Lotto Land (1995)	1.0
White Boys (1999)	1.0
Terror in a Texas Town (1958)	1.0
Hillbillys in a Haunted House (1967)	1.0
McCullochs, The (1975)	1.0
Shadows (Cienie) (1988)	1.0
Little Indian, Big City (Un indien dans la ville) (1994)	1.0
Uninvited Guest, An (2000)	1.0
Blood Spattered Bride, The (La Novia Ensangrentada) (1972)	1.0
Diebinnen (1995)	1.0
Elstree Calling (1930)	1.0
Windows (1980)	1.0
Differences in Average Rating by Gender
  • Input
# Average rating of each movie by gender
# Use a pivot table to reshape the data
movie_gender_rating_mean = pd.pivot_table(movie_data,values = ['Rating'],index = ['Title','Gender'],aggfunc = 'mean')
movie_gender_rating_mean.shape
movie_rate_mean.shape

movie_gender_rating_mean.head()

movie_gender_rating_mean = pd.pivot_table(movie_data,values = ['Rating'],index = ['Title'],columns = ['Gender'],aggfunc = 'mean')
movie_gender_rating_mean.head()

# Movies on which female and male ratings differ the most
movie_gender_rating_mean.columns

movie_gender_rating_mean = pd.pivot_table(movie_data,values = 'Rating',index = ['Title'],columns = ['Gender'],aggfunc = 'mean')
movie_gender_rating_mean.head()

movie_gender_rating_mean.columns

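# Difference between the female and male average rating of each movie
movie_gender_rating_mean['diff'] = movie_gender_rating_mean.F-movie_gender_rating_mean.M
movie_gender_rating_mean.head()

# Sort by the difference, largest first
movie_gender_rating_mean.sort_values(by = 'diff',ascending = False,inplace = True)
movie_gender_rating_mean.head()

# Movies with the largest gap that female users rated higher
f = movie_gender_rating_mean[:10]
f

# Drop rows with missing values (sort_values places NaN last),
# then take the 10 movies that male users rated higher by the largest margin
m = movie_gender_rating_mean.dropna()[-10:]
m
diff = pd.concat([f,m])
diff

# Visualize the result
# as a horizontal bar chart
diff.plot(y = 'diff',kind = 'barh',figsize = (12,9))
  • Output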
(3706, 1)

	Rating
Title	Gender	
$1,000,000 Duck (1971)	F	3.375000
M	2.761905
'Night Mother (1986)	F	3.388889
M	3.352941
'Til There Was You (1997)	F	2.675676

	Rating
Gender	F	M
Title		
$1,000,000 Duck (1971)	3.375000	2.761905
'Night Mother (1986)	3.388889	3.352941
'Til There Was You (1997)	2.675676	2.733333
'burbs, The (1989)	2.793478	2.962085
...And Justice for All (1979)	3.828571	3.689024


MultiIndex(levels=[['Rating'], ['F', 'M']],
           codes=[[0, 0], [0, 1]],
           names=[None, 'Gender'])
Gender	F	M
Title		
$1,000,000 Duck (1971)	3.375000	2.761905
'Night Mother (1986)	3.388889	3.352941
'Til There Was You (1997)	2.675676	2.733333
'burbs, The (1989)	2.793478	2.962085
...And Justice for All (1979)	3.828571	3.689024

Index(['F', 'M'], dtype='object', name='Gender')

Gender	F	M	diff
Title			
$1,000,000 Duck (1971)	3.375000	2.761905	0.613095
'Night Mother (1986)	3.388889	3.352941	0.035948
'Til There Was You (1997)	2.675676	2.733333	-0.057658
'burbs, The (1989)	2.793478	2.962085	-0.168607
...And Justice for All (1979)	3.828571	3.689024	0.139547

Gender	F	M	diff
Title			
James Dean Story, The (1957)	4.000000	1.000000	3.000000
Spiders, The (Die Spinnen, 1. Teil: Der Goldene See) (1919)	4.000000	1.000000	3.000000
Country Life (1994)	5.000000	2.000000	3.000000
Babyfever (1994)	3.666667	1.000000	2.666667
Woman of Paris, A (1923)	5.000000	2.428571	2.571429

Gender	F	M	diff
Title			
James Dean Story, The (1957)	4.000000	1.000000	3.000000
Spiders, The (Die Spinnen, 1. Teil: Der Goldene See) (1919)	4.000000	1.000000	3.000000
Country Life (1994)	5.000000	2.000000	3.000000
Babyfever (1994)	3.666667	1.000000	2.666667
Woman of Paris, A (1923)	5.000000	2.428571	2.571429
Cobra (1925)	4.000000	1.500000	2.500000
Other Side of Sunday, The (Søndagsengler) (1996)	5.000000	2.928571	2.071429
Theodore Rex (1995)	3.000000	1.000000	2.000000
For the Moment (1994)	5.000000	3.000000	2.000000
Separation, The (La Séparation) (1994)	4.000000	2.000000	2.000000

Gender	F	M	diff
Title			
Jamaica Inn (1939)	1.0	3.142857	-2.142857
Flying Saucer, The (1950)	1.0	3.300000	-2.300000
Rosie (1998)	1.0	3.333333	-2.333333
In God's Hands (1998)	1.0	3.333333	-2.333333
Dangerous Ground (1997)	1.0	3.333333	-2.333333
Killer: A Journal of Murder (1995)	1.0	3.428571	-2.428571
Stalingrad (1993)	1.0	3.593750	-2.593750
Enfer, L' (1994)	1.0	3.750000	-2.750000
Neon Bible, The (1995)	1.0	4.000000	-3.000000
Tigrero: A Film That Was Never Made (1994)	1.0	4.333333	-3.333333

Gender	F	M	diff
Title			
James Dean Story, The (1957)	4.000000	1.000000	3.000000
Spiders, The (Die Spinnen, 1. Teil: Der Goldene See) (1919)	4.000000	1.000000	3.000000
Country Life (1994)	5.000000	2.000000	3.000000
Babyfever (1994)	3.666667	1.000000	2.666667
Woman of Paris, A (1923)	5.000000	2.428571	2.571429
Cobra (1925)	4.000000	1.500000	2.500000
Other Side of Sunday, The (Søndagsengler) (1996)	5.000000	2.928571	2.071429
Theodore Rex (1995)	3.000000	1.000000	2.000000
For the Moment (1994)	5.000000	3.000000	2.000000
Separation, The (La Séparation) (1994)	4.000000	2.000000	2.000000
Jamaica Inn (1939)	1.000000	3.142857	-2.142857
Flying Saucer, The (1950)	1.000000	3.300000	-2.300000
Rosie (1998)	1.000000	3.333333	-2.333333
In God's Hands (1998)	1.000000	3.333333	-2.333333
Dangerous Ground (1997)	1.000000	3.333333	-2.333333
Killer: A Journal of Murder (1995)	1.000000	3.428571	-2.428571
Stalingrad (1993)	1.000000	3.593750	-2.593750
Enfer, L' (1994)	1.000000	3.750000	-2.750000
Neon Bible, The (1995)	1.000000	4.000000	-3.000000
Tigrero: A Film That Was Never Made (1994)	1.000000	4.333333	-3.333333

[Figure 1: horizontal bar chart of diff for the 20 movies listed above]
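
The gender comparison can be reproduced on a toy table to see how missing values arise. This is a small sketch with invented titles and ratings; the names demo, by_gender and ranked are illustrative only.

```python
import pandas as pd

demo = pd.DataFrame(
    {
        "Title": ["A", "A", "A", "B", "B", "C"],
        "Gender": ["F", "M", "M", "F", "M", "M"],
        "Rating": [5, 3, 4, 2, 4, 5],
    }
)

by_gender = pd.pivot_table(
    demo, values="Rating", index="Title", columns="Gender", aggfunc="mean"
)
by_gender["diff"] = by_gender["F"] - by_gender["M"]

# Title C has no female ratings, so its F and diff cells are NaN;
# dropna() removes such rows before ranking.
ranked = by_gender.dropna().sort_values("diff", ascending=False)
print(ranked)
```

Passing values as a plain string rather than a one-element list keeps the columns as a flat Index of F and M, which is what makes the attribute-style access in the notebook work.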

Using pandas Group Operations to Analyze Ratings by Age Group

  • Input
# Most frequently rated movies, using a pandas group operation
movie_data.shape

rating_count = movie_data.groupby(['Title']).size()

rating_count.sort_values(ascending = False)[:50]

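# Most controversial movies across age groups
# 1. Inspect the age distribution
movie_data['Age'].plot(kind = 'hist',bins = 20)

movie_data.Age.max()

# Use pandas.cut to bin user ages into groups
# Note: pd.cut uses right-closed bins by default, i.e. (0, 10], (10, 20], ...,
# so Age 50 falls into the bin labeled '40-49', as the output shows
labels = ['0-9','10-19','20-29','30-39','40-49','50-59']
movie_data['Age_range'] = pd.cut(movie_data.Age,bins = range(0,61,10),labels = labels)
movie_data.head()

# Number of ratings and average rating for each age group
movie_data.groupby('Age_range').agg({'Rating':['size','mean']})
  • Output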
(1000209, 11)

Title
American Beauty (1999)                                   3428
Star Wars: Episode IV - A New Hope (1977)                2991
Star Wars: Episode V - The Empire Strikes Back (1980)    2990
Star Wars: Episode VI - Return of the Jedi (1983)        2883
Jurassic Park (1993)                                     2672
Saving Private Ryan (1998)                               2653
Terminator 2: Judgment Day (1991)                        2649
Matrix, The (1999)                                       2590
Back to the Future (1985)                                2583
Silence of the Lambs, The (1991)                         2578
Men in Black (1997)                                      2538
Raiders of the Lost Ark (1981)                           2514
Fargo (1996)                                             2513
Sixth Sense, The (1999)                                  2459
Braveheart (1995)                                        2443
Shakespeare in Love (1998)                               2369
Princess Bride, The (1987)                               2318
Schindler's List (1993)                                  2304
L.A. Confidential (1997)                                 2288
Groundhog Day (1993)                                     2278
E.T. the Extra-Terrestrial (1982)                        2269
Star Wars: Episode I - The Phantom Menace (1999)         2250
Being John Malkovich (1999)                              2241
Shawshank Redemption, The (1994)                         2227
Godfather, The (1972)                                    2223
Forrest Gump (1994)                                      2194
Ghostbusters (1984)                                      2181
Pulp Fiction (1994)                                      2171
Terminator, The (1984)                                   2098
Toy Story (1995)                                         2077
Alien (1979)                                             2024
Total Recall (1990)                                      1996
Fugitive, The (1993)                                     1995
Gladiator (2000)                                         1924
Aliens (1986)                                            1820
Blade Runner (1982)                                      1800
Who Framed Roger Rabbit? (1988)                          1799
Stand by Me (1986)                                       1785
Usual Suspects, The (1995)                               1783
Babe (1995)                                              1751
Airplane! (1980)                                         1731
Independence Day (ID4) (1996)                            1730
Galaxy Quest (1999)                                      1728
One Flew Over the Cuckoo's Nest (1975)                   1725
Wizard of Oz, The (1939)                                 1718
2001: A Space Odyssey (1968)                             1716
Abyss, The (1989)                                        1715
Bug's Life, A (1998)                                     1703
Jaws (1975)                                              1697
Godfather: Part II, The (1974)                           1692
dtype: int64

[Figure 2: histogram of Age with 20 bins]

56

	movieId	Title	Genres	UserId	Gender	Age	Occupation	zip-code	MovieId	Rating	Time	Age_value	Age_range
0	1	Toy Story (1995)	Animation|Children's|Comedy	1	F	1	10	48067	1	5	978824268	0-9	0-9
1	1	Toy Story (1995)	Animation|Children's|Comedy	6	F	50	9	55117	1	4	978237008	40-49	40-49
2	1	Toy Story (1995)	Animation|Children's|Comedy	8	M	25	12	11413	1	4	978233496	20-29	20-29
3	1	Toy Story (1995)	Animation|Children's|Comedy	9	M	25	17	61614	1	5	978225952	20-29	20-29
4	1	Toy Story (1995)	Animation|Children's|Comedy	10	F	35	1	95370	1	5	978226474	30-39	30-39

	Rating
size	mean
Age_range		
0-9	27211	3.549520
10-19	183536	3.507573
20-29	395556	3.545235
30-39	199003	3.618162
40-49	156123	3.673559
50-59	38780	3.766632
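
The binning deserves a closer look, because the output above places Age 50 under '40-49'. The sketch below compares the default right-closed bins with left-closed bins on the seven distinct Age values found in the data; right=False is a suggested alternative, not what the notebook ran, so the counts in the table above would change if it were used.

```python
import pandas as pd

ages = pd.Series([1, 18, 25, 35, 45, 50, 56])
labels = ["0-9", "10-19", "20-29", "30-39", "40-49", "50-59"]
bins = list(range(0, 61, 10))

# Default: bins are (0, 10], (10, 20], ..., so 50 lands in (40, 50].
right_closed = pd.cut(ages, bins=bins, labels=labels)

# right=False: bins are [0, 10), [10, 20), ..., so 50 lands in [50, 60).
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

print(pd.DataFrame({"Age": ages, "right=True": right_closed, "right=False": left_closed}))
```

With right=False the labels describe the bins exactly. If the files are the MovieLens 1M dataset, as their layout suggests, the Age values are themselves codes for age ranges rather than exact ages, so grouping directly on Age may be the more faithful choice.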

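Refined Approach

  • The number of ratings varies enormously from movie to movie, so a high average rating does not necessarily indicate a good movie (an average may rest on only a handful of ratings). The fix is to restrict both analyses to frequently rated movies:

    • Average rating by gender, restricted by rating count
    • Highest average ratings, restricted by rating count
  • Input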

movie_gender_rating_mean.index

# Average rating by gender, restricted to frequently rated movies
movie_gender_rating_mean.head()

top_movie_title = movie_data.groupby('Title').size().sort_values(ascending = False)[:50].index
top_movie_title

flag = movie_gender_rating_mean.index.isin(top_movie_title)
flag

df1 = movie_gender_rating_mean[flag].sort_values(by = 'diff')
df1

df1.plot(kind = 'barh',figsize = (12,9))

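# Highest average ratings, restricted to frequently rated movies
movie_rating_mean = pd.pivot_table(movie_data,values = 'Rating',index = ['Title'])
index = movie_data.groupby('Title').size().sort_values()[::-1][:50].index
index

flag = movie_rating_mean.index.isin(index)

# Average rating of the 50 most-rated movies
movie_rating_top_mean = movie_rating_mean[flag]
movie_rating_top_mean

# Sorted from highest to lowest average rating
movie_rating_top_mean.sort_values(by = 'Rating',ascending = False)
  • Output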
Index(['James Dean Story, The (1957)',
       'Spiders, The (Die Spinnen, 1. Teil: Der Goldene See) (1919)',
       'Country Life (1994)', 'Babyfever (1994)', 'Woman of Paris, A (1923)',
       'Cobra (1925)', 'Other Side of Sunday, The (Søndagsengler) (1996)',
       'Theodore Rex (1995)', 'For the Moment (1994)',
       'Separation, The (La Séparation) (1994)',
       ...
       'White Boys (1999)', 'Wild Bill (1995)', 'Windows (1980)',
       'Wings of Courage (1995)', 'With Byrd at the South Pole (1930)',
       'With Friends Like These... (1998)',
       'Wooden Man's Bride, The (Wu Kui) (1994)', 'Year of the Horse (1997)',
       'Zachariah (1971)', 'Zero Kelvin (Kjærlighetens kjøtere) (1995)'],
      dtype='object', name='Title', length=3706)
Gender	F	M	diff
Title			
James Dean Story, The (1957)	4.000000	1.000000	3.000000
Spiders, The (Die Spinnen, 1. Teil: Der Goldene See) (1919)	4.000000	1.000000	3.000000
Country Life (1994)	5.000000	2.000000	3.000000
Babyfever (1994)	3.666667	1.000000	2.666667
Woman of Paris, A (1923)	5.000000	2.428571	2.571429

Index(['American Beauty (1999)', 'Star Wars: Episode IV - A New Hope (1977)',
       'Star Wars: Episode V - The Empire Strikes Back (1980)',
       'Star Wars: Episode VI - Return of the Jedi (1983)',
       'Jurassic Park (1993)', 'Saving Private Ryan (1998)',
       'Terminator 2: Judgment Day (1991)', 'Matrix, The (1999)',
       'Back to the Future (1985)', 'Silence of the Lambs, The (1991)',
       'Men in Black (1997)', 'Raiders of the Lost Ark (1981)', 'Fargo (1996)',
       'Sixth Sense, The (1999)', 'Braveheart (1995)',
       'Shakespeare in Love (1998)', 'Princess Bride, The (1987)',
       'Schindler's List (1993)', 'L.A. Confidential (1997)',
       'Groundhog Day (1993)', 'E.T. the Extra-Terrestrial (1982)',
       'Star Wars: Episode I - The Phantom Menace (1999)',
       'Being John Malkovich (1999)', 'Shawshank Redemption, The (1994)',
       'Godfather, The (1972)', 'Forrest Gump (1994)', 'Ghostbusters (1984)',
       'Pulp Fiction (1994)', 'Terminator, The (1984)', 'Toy Story (1995)',
       'Alien (1979)', 'Total Recall (1990)', 'Fugitive, The (1993)',
       'Gladiator (2000)', 'Aliens (1986)', 'Blade Runner (1982)',
       'Who Framed Roger Rabbit? (1988)', 'Stand by Me (1986)',
       'Usual Suspects, The (1995)', 'Babe (1995)', 'Airplane! (1980)',
       'Independence Day (ID4) (1996)', 'Galaxy Quest (1999)',
       'One Flew Over the Cuckoo's Nest (1975)', 'Wizard of Oz, The (1939)',
       '2001: A Space Odyssey (1968)', 'Abyss, The (1989)',
       'Bug's Life, A (1998)', 'Jaws (1975)',
       'Godfather: Part II, The (1974)'],
      dtype='object', name='Title')
array([False, False, False, ..., False, False, False])

Gender	F	M	diff
Title			
Airplane! (1980)	3.656566	4.064419	-0.407854
Godfather: Part II, The (1974)	4.040936	4.437778	-0.396842
Aliens (1986)	3.802083	4.186684	-0.384601
Terminator 2: Judgment Day (1991)	3.785088	4.115367	-0.330279
Alien (1979)	3.888252	4.216119	-0.327867
Terminator, The (1984)	3.899729	4.205899	-0.306170
Groundhog Day (1993)	3.735562	4.041358	-0.305796
2001: A Space Odyssey (1968)	3.825581	4.129738	-0.304156
Saving Private Ryan (1998)	4.114783	4.398941	-0.284159
Braveheart (1995)	4.016484	4.297839	-0.281355
Pulp Fiction (1994)	4.071956	4.346839	-0.274883
Godfather, The (1972)	4.314700	4.583333	-0.268634
Star Wars: Episode V - The Empire Strikes Back (1980)	4.106481	4.344577	-0.238096
Jurassic Park (1993)	3.579407	3.814197	-0.234791
Matrix, The (1999)	4.128405	4.362235	-0.233830
Blade Runner (1982)	4.086538	4.312500	-0.225962
Star Wars: Episode VI - Return of the Jedi (1983)	3.865237	4.069058	-0.203821
Star Wars: Episode IV - A New Hope (1977)	4.302937	4.495307	-0.192371
Raiders of the Lost Ark (1981)	4.332168	4.520597	-0.188429
Jaws (1975)	3.946875	4.122731	-0.175856
L.A. Confidential (1997)	4.106007	4.256678	-0.150671
Who Framed Roger Rabbit? (1988)	3.569378	3.713251	-0.143873
Total Recall (1990)	3.573718	3.702494	-0.128776
Silence of the Lambs, The (1991)	4.271955	4.381944	-0.109990
American Beauty (1999)	4.238901	4.347301	-0.108400
One Flew Over the Cuckoo's Nest (1975)	4.310811	4.418423	-0.107612
Star Wars: Episode I - The Phantom Menace (1999)	3.328326	3.431054	-0.102728
Ghostbusters (1984)	3.833962	3.928528	-0.094566
Back to the Future (1985)	3.932707	4.009259	-0.076552
Forrest Gump (1994)	4.045031	4.105806	-0.060775
Fargo (1996)	4.217656	4.267780	-0.050124
Abyss, The (1989)	3.659236	3.689507	-0.030272
Gladiator (2000)	4.088312	4.110461	-0.022150
Shawshank Redemption, The (1994)	4.539075	4.560625	-0.021550
Usual Suspects, The (1995)	4.513317	4.518248	-0.004931
Fugitive, The (1993)	4.100457	4.104046	-0.003590
Being John Malkovich (1999)	4.159930	4.113636	0.046293
Princess Bride, The (1987)	4.342767	4.288942	0.053826
Toy Story (1995)	4.187817	4.130552	0.057265
Stand by Me (1986)	4.146341	4.080210	0.066132
Schindler's List (1993)	4.562602	4.491415	0.071187
Shakespeare in Love (1998)	4.181704	4.099936	0.081768
Babe (1995)	3.953368	3.860922	0.092446
Sixth Sense, The (1999)	4.477410	4.379944	0.097465
Men in Black (1997)	3.817844	3.719000	0.098844
Bug's Life, A (1998)	3.927505	3.826580	0.100925
Wizard of Oz, The (1939)	4.355030	4.203138	0.151892
Galaxy Quest (1999)	3.901554	3.733979	0.167575
E.T. the Extra-Terrestrial (1982)	4.089850	3.920264	0.169586
Independence Day (ID4) (1996)	3.651007	3.481145	0.169861

[Figure 3: horizontal bar chart of F, M and diff for the 50 most-rated movies]

Index(['American Beauty (1999)', 'Star Wars: Episode IV - A New Hope (1977)',
       'Star Wars: Episode V - The Empire Strikes Back (1980)',
       'Star Wars: Episode VI - Return of the Jedi (1983)',
       'Jurassic Park (1993)', 'Saving Private Ryan (1998)',
       'Terminator 2: Judgment Day (1991)', 'Matrix, The (1999)',
       'Back to the Future (1985)', 'Silence of the Lambs, The (1991)',
       'Men in Black (1997)', 'Raiders of the Lost Ark (1981)', 'Fargo (1996)',
       'Sixth Sense, The (1999)', 'Braveheart (1995)',
       'Shakespeare in Love (1998)', 'Princess Bride, The (1987)',
       'Schindler's List (1993)', 'L.A. Confidential (1997)',
       'Groundhog Day (1993)', 'E.T. the Extra-Terrestrial (1982)',
       'Star Wars: Episode I - The Phantom Menace (1999)',
       'Being John Malkovich (1999)', 'Shawshank Redemption, The (1994)',
       'Godfather, The (1972)', 'Forrest Gump (1994)', 'Ghostbusters (1984)',
       'Pulp Fiction (1994)', 'Terminator, The (1984)', 'Toy Story (1995)',
       'Alien (1979)', 'Total Recall (1990)', 'Fugitive, The (1993)',
       'Gladiator (2000)', 'Aliens (1986)', 'Blade Runner (1982)',
       'Who Framed Roger Rabbit? (1988)', 'Stand by Me (1986)',
       'Usual Suspects, The (1995)', 'Babe (1995)', 'Airplane! (1980)',
       'Independence Day (ID4) (1996)', 'Galaxy Quest (1999)',
       'One Flew Over the Cuckoo's Nest (1975)', 'Wizard of Oz, The (1939)',
       '2001: A Space Odyssey (1968)', 'Abyss, The (1989)',
       'Bug's Life, A (1998)', 'Jaws (1975)',
       'Godfather: Part II, The (1974)'],
      dtype='object', name='Title')
	Rating
Title	
2001: A Space Odyssey (1968)	4.068765
Abyss, The (1989)	3.683965
Airplane! (1980)	3.971115
Alien (1979)	4.159585
Aliens (1986)	4.125824
American Beauty (1999)	4.317386
Babe (1995)	3.891491
Back to the Future (1985)	3.990321
Being John Malkovich (1999)	4.125390
Blade Runner (1982)	4.273333
Braveheart (1995)	4.234957
Bug's Life, A (1998)	3.854375
E.T. the Extra-Terrestrial (1982)	3.965183
Fargo (1996)	4.254676
Forrest Gump (1994)	4.087967
Fugitive, The (1993)	4.103258
Galaxy Quest (1999)	3.771412
Ghostbusters (1984)	3.905548
Gladiator (2000)	4.106029
Godfather, The (1972)	4.524966
Godfather: Part II, The (1974)	4.357565
Groundhog Day (1993)	3.953029
Independence Day (ID4) (1996)	3.510405
Jaws (1975)	4.089570
Jurassic Park (1993)	3.763847
L.A. Confidential (1997)	4.219406
Matrix, The (1999)	4.315830
Men in Black (1997)	3.739953
One Flew Over the Cuckoo's Nest (1975)	4.390725
Princess Bride, The (1987)	4.303710
Pulp Fiction (1994)	4.278213
Raiders of the Lost Ark (1981)	4.477725
Saving Private Ryan (1998)	4.337354
Schindler's List (1993)	4.510417
Shakespeare in Love (1998)	4.127480
Shawshank Redemption, The (1994)	4.554558
Silence of the Lambs, The (1991)	4.351823
Sixth Sense, The (1999)	4.406263
Stand by Me (1986)	4.096919
Star Wars: Episode I - The Phantom Menace (1999)	3.409778
Star Wars: Episode IV - A New Hope (1977)	4.453694
Star Wars: Episode V - The Empire Strikes Back (1980)	4.292977
Star Wars: Episode VI - Return of the Jedi (1983)	4.022893
Terminator 2: Judgment Day (1991)	4.058513
Terminator, The (1984)	4.152050
Total Recall (1990)	3.682365
Toy Story (1995)	4.146846
Usual Suspects, The (1995)	4.517106
Who Framed Roger Rabbit? (1988)	3.679822
Wizard of Oz, The (1939)	4.247963

	Rating
Title	
Shawshank Redemption, The (1994)	4.554558
Godfather, The (1972)	4.524966
Usual Suspects, The (1995)	4.517106
Schindler's List (1993)	4.510417
Raiders of the Lost Ark (1981)	4.477725
Star Wars: Episode IV - A New Hope (1977)	4.453694
Sixth Sense, The (1999)	4.406263
One Flew Over the Cuckoo's Nest (1975)	4.390725
Godfather: Part II, The (1974)	4.357565
Silence of the Lambs, The (1991)	4.351823
Saving Private Ryan (1998)	4.337354
American Beauty (1999)	4.317386
Matrix, The (1999)	4.315830
Princess Bride, The (1987)	4.303710
Star Wars: Episode V - The Empire Strikes Back (1980)	4.292977
Pulp Fiction (1994)	4.278213
Blade Runner (1982)	4.273333
Fargo (1996)	4.254676
Wizard of Oz, The (1939)	4.247963
Braveheart (1995)	4.234957
L.A. Confidential (1997)	4.219406
Alien (1979)	4.159585
Terminator, The (1984)	4.152050
Toy Story (1995)	4.146846
Shakespeare in Love (1998)	4.127480
Aliens (1986)	4.125824
Being John Malkovich (1999)	4.125390
Gladiator (2000)	4.106029
Fugitive, The (1993)	4.103258
Stand by Me (1986)	4.096919
Jaws (1975)	4.089570
Forrest Gump (1994)	4.087967
2001: A Space Odyssey (1968)	4.068765
Terminator 2: Judgment Day (1991)	4.058513
Star Wars: Episode VI - Return of the Jedi (1983)	4.022893
Back to the Future (1985)	3.990321
Airplane! (1980)	3.971115
E.T. the Extra-Terrestrial (1982)	3.965183
Groundhog Day (1993)	3.953029
Ghostbusters (1984)	3.905548
Babe (1995)	3.891491
Bug's Life, A (1998)	3.854375
Galaxy Quest (1999)	3.771412
Jurassic Park (1993)	3.763847
Men in Black (1997)	3.739953
Abyss, The (1989)	3.683965
Total Recall (1990)	3.682365
Who Framed Roger Rabbit? (1988)	3.679822
Independence Day (ID4) (1996)	3.510405
Star Wars: Episode I - The Phantom Menace (1999)	3.409778
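
An alternative to keeping the 50 most-rated titles is to require a minimum number of ratings. The following is a rough sketch on invented data; MIN_RATINGS is a hypothetical threshold and the names stats and popular are illustrative, none of them come from the original notebook.

```python
import pandas as pd

demo = pd.DataFrame(
    {
        "Title": ["A"] * 4 + ["B"] * 1 + ["C"] * 3,
        "Rating": [4, 5, 4, 3, 5, 2, 3, 4],
    }
)

MIN_RATINGS = 3  # hypothetical threshold; tune it to the dataset

# Compute the mean and the number of ratings per title in one pass.
stats = demo.groupby("Title")["Rating"].agg(mean="mean", count="count")

# Title B has a perfect mean but only one rating, so it is filtered out.
popular = stats[stats["count"] >= MIN_RATINGS].sort_values("mean", ascending=False)
print(popular)
```

A threshold keeps every sufficiently rated movie rather than a fixed number of titles, and having the count next to the mean makes it easy to see how much evidence each average rests on.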

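Summary

  • pandas is similar to SQL: both make it easy to join, filter, transform, and aggregate data. However, the group operations that SQL can express are limited, whereas pandas is more expressive and better suited to group operations.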
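
As one illustration of that expressiveness, the sketch below puts a plain aggregation, comparable to a basic SQL GROUP BY, next to a per-group transform. The data is invented and the deviation column is a hypothetical example; SQL dialects with window functions can express something similar, so this is meant as a convenience comparison rather than a strict limitation.

```python
import pandas as pd

demo = pd.DataFrame(
    {
        "Gender": ["F", "F", "M", "M", "M"],
        "Rating": [5, 3, 4, 2, 3],
    }
)

# Comparable to: SELECT Gender, AVG(Rating) FROM demo GROUP BY Gender
group_mean = demo.groupby("Gender")["Rating"].mean()

# transform() broadcasts each group's mean back onto the original rows,
# so every rating can be compared with the average of its own group.
demo["deviation"] = demo["Rating"] - demo.groupby("Gender")["Rating"].transform("mean")

print(group_mean)
print(demo)
```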
