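A Series is a one-dimensional data structure, similar to a one-dimensional list or a NumPy array, but more capable because every value carries a label. A Series has two parts: the index and the values.
In addition, the index of a Series does not have to be numeric; it can hold meaningful labels such as names or class names.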
Creating a Series
import pandas as pd
s = pd.Series([1, 2, 3], index=['A', 'B', 'C'], name='First_Series')
'''
A 1
B 2
C 3
Name: First_Series, dtype: int64
'''
s.values
# array([1, 2, 3], dtype=int64)
type(s.values)
# numpy.ndarray
s.index
# Index(['A', 'B', 'C'], dtype='object')
s.name
# 'First_Series'
# Access values by index label
s['A']
# 1
s['A': 'B']
'''
A 1
B 2
Name: First_Series, dtype: int64
'''
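One detail of the slice above is worth spelling out: slicing by label includes the end label, whereas positional slicing with `iloc` follows the usual Python convention and excludes the stop position. A minimal sketch, rebuilding the same `s` so that it runs on its own (the names `by_label` and `by_position` are only illustrative):

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['A', 'B', 'C'], name='First_Series')

# Label-based slice: both endpoints, 'A' and 'B', are included
by_label = s.loc['A':'B']

# Position-based slice: stop position 2 is excluded, so positions 0 and 1 are returned
by_position = s.iloc[0:2]

# Both selections contain the same two rows
print(by_label.equals(by_position))  # True
```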
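A Series looks a lot like a dictionary, and in fact we can create one from a dictionary: the keys become the index and the values become the data.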
g7_pop = pd.Series({
'Canada': 35.467,
'France': 63.951,
'Germany': 80.94,
'Italy': 60.665,
'Japan': 127.061,
'United Kingdom': 64.511,
'United States': 318.523
}, name='G7 Population in millions')
'''
Canada 35.467
France 63.951
Germany 80.940
Italy 60.665
Japan 127.061
United Kingdom 64.511
United States 318.523
Name: G7 Population in millions, dtype: float64
'''
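The dictionary analogy extends to a few everyday operations. The sketch below uses a small subset of the data above under the hypothetical name `pop`, and shows that membership tests look at the index rather than the values, and that `get` returns a default for a missing label:

```python
import pandas as pd

# Small subset of the population data, so the block runs on its own
pop = pd.Series({'Canada': 35.467, 'France': 63.951, 'Japan': 127.061})

# Membership tests check the index (the "keys"), not the values
has_japan = 'Japan' in pop     # True
has_value = 127.061 in pop     # False: 127.061 is a value, not a label

# get() returns a default instead of raising KeyError for a missing label
missing = pop.get('Spain', 0.0)

# Unlike a plain dict, a Series supports element-wise arithmetic
pop_in_thousands = pop * 1000
```

This is also where a Series goes beyond a dictionary: the arithmetic on the last line applies to every value at once and keeps the labels.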
Creating a time series
dates = pd.date_range("20220101", periods=6)
'''
DatetimeIndex(['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04',
'2022-01-05', '2022-01-06'],
dtype='datetime64[ns]', freq='D')
'''
pd.Series(data=range(0, 10, 2), index=pd.date_range("20220101", periods=5, freq="MS"))
'''
2022-01-01 0
2022-02-01 2
2022-03-01 4
2022-04-01 6
2022-05-01 8
Freq: MS, dtype: int64
'''
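Once a Series has a `DatetimeIndex`, dates can be used as labels for selection. The following is a minimal sketch that rebuilds the monthly series above under the illustrative name `ts`; as with other label-based slices, the end of the range is included:

```python
import pandas as pd

ts = pd.Series(data=range(0, 10, 2),
               index=pd.date_range("20220101", periods=5, freq="MS"))

# Look up a single value by its timestamp
march = ts.loc[pd.Timestamp("2022-03-01")]

# Slice with partial date strings; the slice is label-based,
# so the entry for April is included
feb_to_apr = ts.loc["2022-02":"2022-04"]

print(march)                 # 4
print(feb_to_apr.tolist())   # [2, 4, 6]
```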
Boolean operations on a Series
g7_pop
'''
Canada 35.467
France 63.951
Germany 80.940
Italy 60.665
Japan 127.061
United Kingdom 64.511
United States 318.523
Name: G7 Population in millions, dtype: float64
'''
g7_pop > 70
'''
Canada False
France False
Germany True
Italy False
Japan True
United Kingdom False
United States True
Name: G7 Population in millions, dtype: bool
'''
g7_pop[g7_pop > 70]
'''
Germany 80.940
Japan 127.061
United States 318.523
Name: G7 Population in millions, dtype: float64
'''
g7_pop[g7_pop > g7_pop.mean()]
'''
Japan 127.061
United States 318.523
Name: G7 Population in millions, dtype: float64
'''
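# Boolean masks are combined element-wise with these operators
# (the Python keywords not / or / and do not work on a Series):
# ~ not
# | or
# & and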
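The three operators can be applied to the population data as follows. This is a sketch with arbitrarily chosen thresholds; the variable names `mid_sized`, `extremes` and `not_large` are only illustrative. Each comparison is wrapped in parentheses because `&` and `|` bind more tightly than `>` and `<`.

```python
import pandas as pd

g7_pop = pd.Series({
    'Canada': 35.467, 'France': 63.951, 'Germany': 80.94, 'Italy': 60.665,
    'Japan': 127.061, 'United Kingdom': 64.511, 'United States': 318.523,
}, name='G7 Population in millions')

# and: populations between 60 and 100 million
mid_sized = g7_pop[(g7_pop > 60) & (g7_pop < 100)]

# or: populations below 40 million or above 200 million
extremes = g7_pop[(g7_pop < 40) | (g7_pop > 200)]

# not: invert the mask, keeping populations of 70 million or less
not_large = g7_pop[~(g7_pop > 70)]

print(list(mid_sized.index))  # ['France', 'Germany', 'Italy', 'United Kingdom']
print(list(extremes.index))   # ['Canada', 'United States']
print(list(not_large.index))  # ['Canada', 'France', 'Italy', 'United Kingdom']
```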