Python Beginners(4) -- Pandas

1. Series

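Use case: suppose you have a list of film names and a list of film scores, and the two lists line up position by position. Given a film name, how do you quickly find its score?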

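pd.Series(data=..., index=...) builds a one-dimensional labeled array. Here the film scores are the data and the film names are the index, which ties each film name to its score.
To look up scores, use result["name"] for a single film or result[["n1", "n2"]] for several films at once.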

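# Import pandas; Series is also imported directly for the later examples
import pandas as pd
from pandas import Series
# Assumes `fandango` is a DataFrame loaded earlier with 'FILM' and 'RottenTomatoes' columns
series_film = fandango['FILM']
series_rt = fandango['RottenTomatoes']
film_names = series_film.values    # array of film names
rt_scores = series_rt.values       # array of scores, aligned with film_names
series_custom = pd.Series(rt_scores, index=film_names)  # scores as data, names as index
# Get several films' scores at once by name
print(series_custom[['Minions (2015)', 'Leviathan (2014)']])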
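Because the code above depends on the fandango dataset, here is a self-contained sketch of the same idea. The film names and scores are hypothetical toy values, not real data: scores are the data, names are the index, and lookups work with one label or a list of labels.

```python
import pandas as pd

# Hypothetical toy data: film names and scores, aligned by position
names = ["Film A", "Film B", "Film C", "Film D"]
scores = [74, 98, 45, 62]

# Scores become the data; film names become the index (labels)
scores_by_name = pd.Series(scores, index=names)

# A single label returns a scalar
one_score = scores_by_name["Film B"]
# A list of labels returns a smaller Series
two_scores = scores_by_name[["Film A", "Film D"]]
print(one_score)
print(two_scores)
```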

The resulting series_custom can also be accessed by integer position, so it behaves both like a list and like a dictionary.

series_custom[5:11]
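Since the index here holds strings, an integer slice with plain [] selects by position. A minimal sketch with hypothetical toy scores makes the two access styles explicit: .loc selects by label (dictionary-like) and .iloc selects by position (list-like). Note that a label slice includes its end label, while a position slice excludes its end.

```python
import pandas as pd

# Hypothetical toy scores indexed by film name
s = pd.Series([74, 98, 45, 62, 88],
              index=["Film A", "Film B", "Film C", "Film D", "Film E"])

# Dictionary-like: select by label; the end label "Film D" is included
by_label = s.loc["Film B":"Film D"]
# List-like: select by position; position 3 is excluded
by_position = s.iloc[1:3]
print(by_label)
print(by_position)
```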

Sort the string index of series_custom
Sort series_custom from the previous step by its index:

First, convert the existing index to a list;
then sort it with sorted(list);
finally, call series_custom.reindex(new_index_list) to get the reordered result.

original_index = series_custom.index
indexlist = sorted(original_index.tolist())
sorted_by_index = series_custom.reindex(indexlist)
print (sorted_by_index)
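To check the manual approach, this sketch uses hypothetical toy data and compares the sorted-labels-plus-reindex result with the built-in sort_index(). For a Series whose labels are unique, the two give the same result.

```python
import pandas as pd

# Hypothetical toy data with unsorted string labels
s = pd.Series([74, 98, 45], index=["Film C", "Film A", "Film B"])

# Manual approach: sort the labels, then reindex in that order
sorted_labels = sorted(s.index.tolist())
manual = s.reindex(sorted_labels)

# Built-in equivalent
builtin = s.sort_index()

print(manual)
print(manual.equals(builtin))
```

One caveat: if the index contains duplicate labels, reindex raises a ValueError, but sort_index still works.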

Series.sort_index() and Series.sort_values()
Sorting is common enough that pandas provides built-in methods for it, one to sort by index and one to sort by value.

sc2 = series_custom.sort_index()  # sort index
sc3 = series_custom.sort_values()  # sort value
print(sc2.iloc[0:10])  # first 10 rows by position
print(sc3.iloc[0:10])
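Both methods also take an ascending flag. This sketch uses hypothetical toy scores and sorts in descending order, which is handy for listing the top-rated films first. head(n) is a readable way to take the first n rows.

```python
import pandas as pd

# Hypothetical toy scores indexed by film name
s = pd.Series([74, 98, 45, 62], index=["Film C", "Film A", "Film D", "Film B"])

# Highest scores first
top = s.sort_values(ascending=False)
# Labels in reverse alphabetical order
rev = s.sort_index(ascending=False)

print(top.head(2))
print(rev.index.tolist())
```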

Normalization
As with an ndarray, arithmetic applies to every element at once. For example, you can divide every score by 20.

series_normalized = series_custom/20
print (series_normalized)
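If the scores are on a 0–100 scale (an assumption about the data), dividing by 20 rescales them to 0–5. This sketch uses hypothetical toy values to show that the division is element-wise and that the film-name index is carried over unchanged.

```python
import pandas as pd

# Hypothetical scores on a 0-100 scale
s = pd.Series([100, 50, 0, 85], index=["Film A", "Film B", "Film C", "Film D"])

# Element-wise division; the index is preserved
five_point = s / 20
print(five_point)
```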

Some other examples:

import numpy as np
# Add each value to itself, element-wise
np.add(series_custom, series_custom)
# Apply the sine function to each value
np.sin(series_custom)
# Return the highest value (a single value, not a Series)
np.max(series_custom)
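This self-contained sketch uses hypothetical toy scores. It shows that element-wise NumPy functions return a Series with the same labels, while a reduction such as np.max returns a single scalar. It also shows Series.idxmax(), which returns the label of the highest value. That is often the more useful answer when the labels are film names.

```python
import numpy as np
import pandas as pd

# Hypothetical toy scores
s = pd.Series([10, 20, 30], index=["Film A", "Film B", "Film C"])

doubled = np.add(s, s)   # same as s + s; still a Series with the same labels
sines = np.sin(s)        # element-wise; returns a Series
highest = np.max(s)      # reduction; returns a single scalar
best = s.idxmax()        # label of the highest value
print(doubled)
print(highest, best)
```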

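Selecting values that meet a condition
The expression below returns an intermediate Boolean Series that shows, for each film, whether its score is greater than 50: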

series_custom > 50

Passing that Boolean Series back into the brackets returns only the matching values:

series_greater_than_50 = series_custom[series_custom > 50]
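Here is a small self-contained sketch of the two steps, using hypothetical toy scores. The comparison produces a Boolean mask with the same index, and indexing with that mask keeps only the rows where it is True.

```python
import pandas as pd

# Hypothetical toy scores
s = pd.Series([74, 98, 45, 62], index=["Film A", "Film B", "Film C", "Film D"])

mask = s > 50        # Boolean Series with the same index as s
above_50 = s[mask]   # keeps only the rows where mask is True
print(mask)
print(above_50)
```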

You can also combine several conditions with logical operators:

criteria_one = series_custom > 50
criteria_two = series_custom < 75
# Combine element-wise with & (Python's `and` raises a ValueError on a Series)
both_criteria = series_custom[criteria_one & criteria_two]
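Python's `and` tries to reduce a whole Series to a single True/False value, so pandas raises an error. Use the element-wise operators & (and) and | (or) instead. This sketch uses hypothetical toy scores and writes both conditions inline. The parentheses are required because & and | bind more tightly than > and <.

```python
import pandas as pd

# Hypothetical toy scores
s = pd.Series([74, 98, 45, 62, 80],
              index=["Film A", "Film B", "Film C", "Film D", "Film E"])

# Element-wise AND: both conditions must hold
between = s[(s > 50) & (s < 75)]
# Element-wise OR: either condition may hold
extremes = s[(s < 50) | (s > 90)]
print(between)
print(extremes)
```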

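A Series always keeps the link between its values and their labels.
For example, you can compute the mean of the critics' and users' scores, and the resulting new Series keeps the original film-name index.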

rt_critics = Series(fandango['RottenTomatoes'].values, index=fandango['FILM'])
rt_users = Series(fandango['RottenTomatoes_User'].values, index=fandango['FILM'])
rt_mean = (rt_critics + rt_users) / 2  # aligned by film name; the result is already a Series indexed by FILM
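The label link holds because arithmetic between two Series aligns on index labels, not on positions. This sketch uses hypothetical toy critic and user scores in a different order. The sum still pairs each film with itself. A film present in only one Series gets NaN.

```python
import pandas as pd

# Hypothetical toy scores
critics = pd.Series([80, 60, 90], index=["Film A", "Film B", "Film C"])
# Different order, plus one film that critics lacks
users = pd.Series([70, 100, 50], index=["Film C", "Film A", "Film D"])

# Values are matched by label before adding
mean_score = (critics + users) / 2
print(mean_score)
```

If missing values should count as 0 instead of producing NaN, Series.add(other, fill_value=0) is one option.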
