So far we have covered the DataFrame, which is the core data structure in pandas.
If we break a DataFrame apart, any single row or single column is a Series.
import pandas as pd
fandango=pd.read_csv("fandango_score_comparison.csv")
# Extract one column; a single column is a Series
series_film=fandango["FILM"]
# Check the column's type: it is pandas.core.series.Series, i.e. a Series
print(type(series_film))
# Retrieve data by index and slice
print(series_film[0:5])
series_rt=fandango["RottenTomatoes"]
print(series_rt[0:5])
fandango.head()
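The claim that both a column and a row of a DataFrame are Series can be checked on a tiny frame. This is a minimal sketch: the three-row `df` is a hand-built stand-in for the CSV (the names and scores are copied from the output shown later in these notes), not the real file.

```python
import pandas as pd

# Hand-built stand-in for a few rows of the fandango data (assumption:
# only these two columns are needed for the illustration)
df = pd.DataFrame({
    "FILM": ["Avengers: Age of Ultron (2015)", "Cinderella (2015)", "Ant-Man (2015)"],
    "RottenTomatoes": [74, 85, 80],
})

column = df["RottenTomatoes"]  # one column -> Series indexed by row labels
row = df.iloc[0]               # one row -> Series indexed by column names

print(type(column).__name__)
print(type(row).__name__)
print(list(row.index))
```

Note that a row Series uses the column names as its index, so `row["RottenTomatoes"]` looks up the score of that row.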
Import the Series object from pandas
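Through the .values attribute of a Series (it is an attribute, not a method), used below, we get an ndarray object.
In other words, a DataFrame is made up of Series, and each Series is in turn backed by an ndarray.
Many pandas objects are built on top of NumPy, so many operations work across both libraries.
A Series can be built with the Series() constructor, which takes two ndarray arrays: one supplies the values and the other supplies the index labels for those values.
Given the relationship between ndarray and Series, the index can also be passed as a Series column; note that if the data itself is a Series, pandas aligns it to the new index by label rather than by position.
How to access the elements of a Series: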
from pandas import Series
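# Read the Series' .values attribute; it returns an ndarray
film_names=series_film.values
print(type(film_names))
# Output: <class 'numpy.ndarray'>
# Note that element access here differs from a DataFrame: DataFrame element access goes through .loc[],
# while an ndarray can be sliced directly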
print(film_names[0:10])
rt_scores=series_rt.values
print(rt_scores[0:10])
series_custom=Series(rt_scores,index=film_names)
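# Positions are normally integers such as 0, 1, 2, but a Series can also use strings as its index
# Passing in the two ndarray objects gives a structure similar to key-value pairs
print(type(series_custom[["Minions (2015)","Leviathan (2014)"]]))
print(type(series_custom["Cinderella (2015)"]))
# This is like looking up a value in a dict by key; the first lookup passes a list of two keys, and a single key also works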
['Avengers: Age of Ultron (2015)' 'Cinderella (2015)' 'Ant-Man (2015)'
'Do You Believe? (2015)' 'Hot Tub Time Machine 2 (2015)'
'The Water Diviner (2015)' 'Irrational Man (2015)' 'Top Five (2014)'
'Shaun the Sheep Movie (2015)' 'Love & Mercy (2015)']
[74 85 80 18 14 63 42 86 99 89]
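To make the key-value analogy concrete, here is a small sketch that builds a Series from two ndarrays and compares a single-label lookup with a list-of-labels lookup. The three films and scores are taken from the output above; the variable names `one` and `several` are only for illustration.

```python
import numpy as np
from pandas import Series

film_names = np.array(["Avengers: Age of Ultron (2015)", "Cinderella (2015)", "Ant-Man (2015)"])
rt_scores = np.array([74, 85, 80])

# values first, labels passed through the index keyword
series_custom = Series(rt_scores, index=film_names)

one = series_custom["Cinderella (2015)"]                          # single label -> scalar
several = series_custom[["Ant-Man (2015)", "Cinderella (2015)"]]  # list of labels -> Series

print(one)
print(type(several).__name__)
print(isinstance(series_custom.values, np.ndarray))
```

A single label returns a NumPy scalar, while a list of labels returns a new Series in the order the labels were requested.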
series_custom=Series(rt_scores,index=film_names)
# Different ways to index:
# 1. Strings (labels) as the index
print(series_custom[["Minions (2015)","Leviathan (2014)"]])
# 2. Integer positions as the index
series_custom[5:10]
Minions (2015) 54
Leviathan (2014) 99
dtype: int64
The Water Diviner (2015) 63
Irrational Man (2015) 42
Top Five (2014) 86
Shaun the Sheep Movie (2015) 99
Love & Mercy (2015) 89
dtype: int64
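Because the same square brackets accept both labels and integer positions, it can help to spell out which one is meant. The sketch below uses `.loc` for labels and `.iloc` for positions; this is a suggested alternative spelling, not what the code above uses. The three entries come from the output just shown.

```python
from pandas import Series

s = Series(
    [63, 42, 86],
    index=["The Water Diviner (2015)", "Irrational Man (2015)", "Top Five (2014)"],
)

by_label = s.loc["Irrational Man (2015)"]  # explicit label lookup
by_position = s.iloc[1]                    # explicit positional lookup

first_two = s.iloc[0:2]  # positional slice, end is exclusive
label_slice = s.loc["The Water Diviner (2015)":"Irrational Man (2015)"]  # label slice, end is inclusive

print(by_label, by_position)
print(first_two.tolist() == label_slice.tolist())
```

The main difference to remember is the slice end: positional slices exclude it, label slices include it.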
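Sorting a Series is not needed very often; one way to do it uses Python's built-in sorted() function.
The built-in sorted() function sorts the index labels directly; this parallels sorting a DataFrame with its *sort_values()* method.
The reindex() method then reorders the Series to follow the re-sorted index, comparable to DataFrame's *reindex()* method.
To sort by index or by value, call sort_index() or sort_values() respectively: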
origin_index=series_custom.index.tolist()
#print(origin_index)  # the string labels
sorted_index=sorted(origin_index)
#print(sorted_index)  # the labels sorted in ascending order
sorted_by_index=series_custom.reindex(sorted_index)
print(sorted_by_index)
'71 (2015) 97
5 Flights Up (2015) 52
A Little Chaos (2015) 40
A Most Violent Year (2014) 90
About Elly (2015) 97
..
What We Do in the Shadows (2015) 96
When Marnie Was There (2015) 89
While We're Young (2015) 83
Wild Tales (2014) 96
Woman in Gold (2015) 52
Length: 146, dtype: int64
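The sorted()-then-reindex() route above gives the same ordering as sort_index(). The sketch below checks that on three entries and also shows what reindex() does with a label that is not present; the label `"Not In Data (hypothetical)"` is made up for the illustration.

```python
from pandas import Series

s = Series(
    [74, 85, 80],
    index=["Avengers: Age of Ultron (2015)", "Cinderella (2015)", "Ant-Man (2015)"],
)

# Route 1: sort the labels by hand, then reindex
by_reindex = s.reindex(sorted(s.index.tolist()))
# Route 2: let pandas sort by label
by_sort_index = s.sort_index()

print(by_reindex.index.tolist() == by_sort_index.index.tolist())
print(by_reindex.tolist() == by_sort_index.tolist())

# reindex() does not fail on unknown labels; it fills them with NaN
with_missing = s.reindex(["Ant-Man (2015)", "Not In Data (hypothetical)"])
print(with_missing)
```

Since NaN is a float, reindexing an integer Series with a missing label changes its dtype to float64.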
# Sort by index
sc2=series_custom.sort_index()
# Sort by values
sc3=series_custom.sort_values()
print(sc3[0:10])
Paul Blart: Mall Cop 2 (2015) 5
Hitman: Agent 47 (2015) 7
Hot Pursuit (2015) 8
Fantastic Four (2015) 9
Taken 3 (2015) 9
The Boy Next Door (2015) 10
The Loft (2015) 11
Unfinished Business (2015) 11
Mortdecai (2015) 12
Seventh Son (2015) 12
dtype: int64
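sort_values() sorts in ascending order by default and returns a new Series. A short sketch of the descending variant, using four entries from the earlier output; `top_two` is just an illustrative name.

```python
from pandas import Series

s = Series(
    [74, 85, 80, 18],
    index=["Avengers: Age of Ultron (2015)", "Cinderella (2015)",
           "Ant-Man (2015)", "Do You Believe? (2015)"],
)

ascending = s.sort_values()                  # lowest score first (default)
descending = s.sort_values(ascending=False)  # highest score first
top_two = descending.iloc[:2]                # the two best-rated films

print(top_two)
print(s.tolist())  # the original Series keeps its order
```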
# The values in a Series object are stored as an ndarray, the core data type in NumPy
import numpy as np
# Add the Series to itself element-wise
print(np.add(series_custom,series_custom))
#apply sine function to each value
np.sin(series_custom)
#Return the highest value (will return a single value but not a series)
np.max(series_custom)
Avengers: Age of Ultron (2015) 148
Cinderella (2015) 170
Ant-Man (2015) 160
Do You Believe? (2015) 36
Hot Tub Time Machine 2 (2015) 28
...
Mr. Holmes (2015) 174
'71 (2015) 194
Two Days, One Night (2014) 194
Gett: The Trial of Viviane Amsalem (2015) 200
Kumiko, The Treasure Hunter (2015) 174
Length: 146, dtype: int64
100
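The output above shows that element-wise NumPy functions keep the film labels, while a reduction such as np.max collapses to one number. A minimal sketch on three entries from the earlier output:

```python
import numpy as np
from pandas import Series

s = Series(
    [74, 85, 80],
    index=["Avengers: Age of Ultron (2015)", "Cinderella (2015)", "Ant-Man (2015)"],
)

doubled = np.add(s, s)  # element-wise: result is still a Series with the same labels
sines = np.sin(s)       # element-wise: float values, same labels
highest = np.max(s)     # reduction: a single scalar, not a Series

print(doubled)
print(sines.index.tolist() == s.index.tolist())
print(highest)
```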
# Use a boolean (True/False) mask as the index
series_greater_than_50=series_custom[series_custom>50]
print(series_custom)
criteria_one=series_custom>50
criteria_two=series_custom<75
both_criteria=series_custom[criteria_one&criteria_two]
print(both_criteria)
Avengers: Age of Ultron (2015) 74
Cinderella (2015) 85
Ant-Man (2015) 80
Do You Believe? (2015) 18
Hot Tub Time Machine 2 (2015) 14
...
Mr. Holmes (2015) 87
'71 (2015) 97
Two Days, One Night (2014) 97
Gett: The Trial of Viviane Amsalem (2015) 100
Kumiko, The Treasure Hunter (2015) 87
Length: 146, dtype: int64
Avengers: Age of Ultron (2015) 74
The Water Diviner (2015) 63
Unbroken (2014) 51
Southpaw (2015) 59
Insidious: Chapter 3 (2015) 59
The Man From U.N.C.L.E. (2015) 68
Run All Night (2015) 60
5 Flights Up (2015) 52
Welcome to Me (2015) 71
Saint Laurent (2015) 51
Maps to the Stars (2015) 60
Pitch Perfect 2 (2015) 67
The Age of Adaline (2015) 54
The DUFF (2015) 71
Ricki and the Flash (2015) 64
Unfriended (2015) 60
American Sniper (2015) 72
The Hobbit: The Battle of the Five Armies (2014) 61
Paper Towns (2015) 55
Big Eyes (2014) 72
Maggie (2015) 54
Focus (2015) 57
The Second Best Exotic Marigold Hotel (2015) 62
The 100-Year-Old Man Who Climbed Out the Window and Disappeared (2015) 67
Escobar: Paradise Lost (2015) 52
Into the Woods (2014) 71
Inherent Vice (2014) 73
Magic Mike XXL (2015) 62
Woman in Gold (2015) 52
The Last Five Years (2015) 60
Jurassic World (2015) 71
Minions (2015) 54
Spare Parts (2015) 52
dtype: int64
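A comparison such as `series_custom>50` produces a boolean Series, and masks are combined with `&` (and) or `|` (or). Python's `and`/`or` keywords raise a ValueError on a Series. The sketch below uses four entries from the earlier output; when the comparisons are written inline they need parentheses, because `&` and `|` bind more tightly than `>` and `<`.

```python
from pandas import Series

s = Series(
    [74, 85, 18, 63],
    index=["Avengers: Age of Ultron (2015)", "Cinderella (2015)",
           "Do You Believe? (2015)", "The Water Diviner (2015)"],
)

mask = (s > 50) & (s < 75)         # True where both conditions hold
between = s[mask]

either = s[(s < 20) | (s > 80)]    # True where at least one condition holds

print(mask.tolist())
print(between)
print(either)
```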
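# First build two Series that share the same index
# Note: both Series below read the "RottenTomatoes" column, so the mean printed here equals the critic score;
# a separate user-score column was presumably intended for rt_users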
rt_critics=Series(fandango["RottenTomatoes"].values,index=fandango['FILM'])
rt_users=Series(fandango["RottenTomatoes"].values,index=fandango['FILM'].values)
rt_mean=(rt_critics+rt_users)/2
print(rt_mean)
FILM
Avengers: Age of Ultron (2015) 74.0
Cinderella (2015) 85.0
Ant-Man (2015) 80.0
Do You Believe? (2015) 18.0
Hot Tub Time Machine 2 (2015) 14.0
...
Mr. Holmes (2015) 87.0
'71 (2015) 97.0
Two Days, One Night (2014) 97.0
Gett: The Trial of Viviane Amsalem (2015) 100.0
Kumiko, The Treasure Hunter (2015) 87.0
Length: 146, dtype: float64
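Arithmetic between two Series matches entries by index label, not by position, which is why the two Series need the same index. A minimal sketch: the values in `b` and the one-entry Series `c` are made-up numbers used only to show the alignment.

```python
from pandas import Series

a = Series([74, 85], index=["Avengers: Age of Ultron (2015)", "Cinderella (2015)"])
# Same labels in reversed order; the scores are hypothetical
b = Series([80, 86], index=["Cinderella (2015)", "Avengers: Age of Ultron (2015)"])

mean = (a + b) / 2  # values are paired by label, so order does not matter
print(mean)

# A label that exists in only one Series has no partner and becomes NaN
c = Series([80], index=["Ant-Man (2015)"])
print(a + c)
```

In the second sum no label is shared, so every entry in the result is NaN.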