Let's go straight to the examples:
(1) Build a Series from a list
import pandas as pd
my_list=[7,'Beijing','19大',3.1415,-10000,'Happy']
s=pd.Series(my_list)
print(type(s))
print(s)
<class 'pandas.core.series.Series'>
0 7
1 Beijing
2 19大
3 3.1415
4 -10000
5 Happy
dtype: object
By default pandas uses 0 to n-1 as the index of a Series, but we can also specify the index ourselves. The index can be thought of as the keys of a dict.
s=pd.Series([7,'Beijing','19大',3.1415,-10000,'Happy'],
index=['A','B','C','D','E','F'])
print(s)
A 7
B Beijing
C 19大
D 3.1415
E -10000
F Happy
dtype: object
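To make the "index is like dict keys" comparison concrete, here is a minimal sketch. The shortened three-element Series is an assumption made for brevity, not the data used above.

```python
import pandas as pd

# A shortened version of the Series above, with custom labels.
s = pd.Series([7, 'Beijing', 3.1415], index=['A', 'B', 'C'])

# Looking up by label works like looking up a key in a dict.
print(s['B'])          # Beijing

# The labels live in s.index; the Series converts cleanly to a dict.
print(list(s.index))   # ['A', 'B', 'C']
print(s.to_dict())     # {'A': 7, 'B': 'Beijing', 'C': 3.1415}
```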
(2) Build a Series from a dict, since a Series is itself essentially a key-value structure
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
print(apts)
Beijing 55000.0
Guangzhou 45000.0
Hangzhou 20000.0
Shanghai 60000.0
Suzhou NaN
shenzhen 50000.0
Name: income, dtype: float64
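The output above shows float values even though the dict held integers. A small sketch of why, under the assumption that the only difference between the two hypothetical dicts below is the None entry: a missing value is stored as NaN, which forces a float dtype.

```python
import pandas as pd

# Hypothetical two-entry dicts, one with a missing value and one without.
with_none = pd.Series({'Beijing': 55000, 'Suzhou': None})
without_none = pd.Series({'Beijing': 55000, 'Shanghai': 60000})

# None is stored as NaN, so the whole Series becomes floating point.
print(with_none.dtype)
# Without missing values the integers stay integers.
print(without_none.dtype)
# The missing entry can be detected with isnull().
print(with_none.isnull()['Suzhou'])
```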
(3) Build a Series from a numpy array
import numpy as np
d=pd.Series(np.random.randn(5),index=['a','b','c','d','e'])
print(d)
a -0.329401
b -0.435921
c -0.232267
d -0.846713
e -0.406585
dtype: float64
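Because np.random.randn gives different numbers on every run, the output above cannot be reproduced exactly. A sketch of a reproducible variant follows; the seed value 0 is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the random values repeatable (seed chosen arbitrarily).
rng = np.random.default_rng(0)
values = rng.standard_normal(5)

d = pd.Series(values, index=['a', 'b', 'c', 'd', 'e'])

# The Series keeps the array's dtype, and to_numpy() gives the values back.
print(d.dtype)
print(d.to_numpy())
```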
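The constructions above are fairly easy to follow.
(1) A Series can be treated much like a list, supporting all kinds of slicing. Note that newer pandas versions deprecate integer positions inside plain [] on a label-indexed Series (as in apts[3] below); apts.iloc[3] is the explicit positional form.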
import pandas as pd
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
print('apts:\n',apts)
print('apts[3]:\n',apts[3])
print('apts[[3,4,1]]:\n',apts[[3,4,1]])
print('apts[:-1]:\n',apts[:-1])
print('apts[1:]+apts[:-1]:\n',apts[1:]+apts[:-1])
apts:
Beijing 55000.0
Shanghai 60000.0
shenzhen 50000.0
Hangzhou 20000.0
Guangzhou 45000.0
Suzhou NaN
Name: income, dtype: float64
apts[3]:
20000.0
apts[[3,4,1]]:
Hangzhou 20000.0
Guangzhou 45000.0
Shanghai 60000.0
Name: income, dtype: float64
apts[:-1]:
Beijing 55000.0
Shanghai 60000.0
shenzhen 50000.0
Hangzhou 20000.0
Guangzhou 45000.0
Name: income, dtype: float64
apts[1:]+apts[:-1]:
Beijing NaN
Guangzhou 90000.0
Hangzhou 40000.0
Shanghai 120000.0
Suzhou NaN
shenzhen 100000.0
Name: income, dtype: float64
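The NaN rows in the last result come from label alignment: addition matches entries by index label, not by position, and a label present on only one side yields NaN. A minimal sketch with a shortened three-city Series (an assumption for brevity), also showing Series.add with fill_value as one way to avoid those NaNs:

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000})

left = apts.iloc[1:]     # Shanghai, shenzhen
right = apts.iloc[:-1]   # Beijing, Shanghai

# Only 'Shanghai' exists on both sides; the other labels become NaN.
total = left + right
print(total)

# fill_value=0 treats a label missing on one side as 0 instead.
filled = left.add(right, fill_value=0)
print(filled)
```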
(2) A Series can be used to select data by label, much like a dict
import pandas as pd
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
print(apts['Shanghai'])
print('Hangzhou' in apts)
print('Chongqing' in apts)
60000.0
True
False
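One detail worth spelling out: the in operator checks index labels, not values. The sketch below uses a shortened two-city Series (an assumption) and Series.get for a lookup with a default, mirroring dict.get.

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Shanghai': 60000}, name='income')

# `in` looks at the index labels only.
print('Shanghai' in apts)         # True
print(60000 in apts)              # False
# To search the values, test against the underlying array.
print(60000 in apts.values)       # True

# get() returns a default instead of raising KeyError for a missing label.
print(apts.get('Chongqing', -1))  # -1
```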
(3) Much like numpy: a boolean condition can filter a Series, and functions such as mean, median, max and min are available
import pandas as pd
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
less_than_50000=(apts<=50000)  # despite the name, <= also keeps values equal to 50000
print(apts[less_than_50000])
print(apts.mean())
Guangzhou 45000.0
Hangzhou 20000.0
shenzhen 50000.0
Name: income, dtype: float64
46000.0
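The example above only calls mean. A short sketch of the other functions mentioned, on the same data, follows. The Series methods skip NaN by default, which is why the mean is 46000.0 rather than NaN, whereas a plain numpy mean over the raw array does not skip it.

```python
import numpy as np
import pandas as pd

cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')

# Series statistics ignore the NaN entry for Suzhou.
print(apts.mean())              # 46000.0
print(apts.median())            # 50000.0
print(apts.max(), apts.min())   # 60000.0 20000.0

# On the raw array, np.mean propagates NaN; np.nanmean skips it.
raw = apts.to_numpy()
print(np.isnan(np.mean(raw)))   # True
print(np.nanmean(raw))          # 46000.0
```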
Values can be assigned directly by index label, and boolean indexing also works in assignments:
import pandas as pd
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
print(apts)
print('Old income of shenzhen:{}'.format(apts['shenzhen']))
apts['shenzhen']=70000
print('New income of shenzhen:{}'.format(apts['shenzhen']),'\n')
less_than_50000=(apts<50000)
print(less_than_50000)
apts[less_than_50000]=40000
print(apts)
Beijing 55000.0
Shanghai 60000.0
shenzhen 50000.0
Hangzhou 20000.0
Guangzhou 45000.0
Suzhou NaN
Name: income, dtype: float64
Old income of shenzhen:50000.0
New income of shenzhen:70000.0
Beijing False
Shanghai False
shenzhen False
Hangzhou True
Guangzhou True
Suzhou False
Name: income, dtype: bool
Beijing 55000.0
Shanghai 60000.0
shenzhen 70000.0
Hangzhou 40000.0
Guangzhou 40000.0
Suzhou NaN
Name: income, dtype: float64
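The boolean assignment above modifies apts in place. If the original values should be kept, Series.mask is one alternative that returns a new Series. A sketch on a shortened, hypothetical three-city Series:

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Hangzhou': 20000, 'Suzhou': None},
                 name='income')

# mask() replaces values where the condition is True and returns a copy.
# NaN < 50000 evaluates to False, so the missing entry is left alone.
capped = apts.mask(apts < 50000, 40000)

print(capped)
print(apts['Hangzhou'])   # 20000.0, the original is unchanged
```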
cities={'Beijing':55000,'Shanghai':60000,'shenzhen':50000,'Hangzhou':20000,'Guangzhou':45000,'Suzhou':None}
apts=pd.Series(cities,name='income')
apts['shenzhen']=70000
less_than_50000=(apts<50000)
apts[less_than_50000]=40000
print('apts:\n',apts,'\n')
print(apts.notnull()) # boolean condition: True where a value is present
print(apts.isnull())
print(apts[apts.isnull()]) # use the is-null boolean mask to select the missing elements
apts2=pd.Series({'Beijing':10000,'Shanghai':8000,'shenzhen':6000,'Tianjin':40000,'Guangzhou':7000,'Chongqing':30000})
print('apts2:\n',apts2)
apts3=apts+apts2 # add two Series whose indexes only partly overlap; unmatched labels give NaN
print('apts3:\n',apts3)
apts3[apts3.isnull()]=300 # assign a fixed value (300) to the missing positions
print(apts3)
apts:
Beijing 55000.0
Shanghai 60000.0
shenzhen 70000.0
Hangzhou 40000.0
Guangzhou 40000.0
Suzhou NaN
Name: income, dtype: float64
Beijing True
Shanghai True
shenzhen True
Hangzhou True
Guangzhou True
Suzhou False
Name: income, dtype: bool
Beijing False
Shanghai False
shenzhen False
Hangzhou False
Guangzhou False
Suzhou True
Name: income, dtype: bool
Suzhou NaN
Name: income, dtype: float64
apts2:
Beijing 10000
Shanghai 8000
shenzhen 6000
Tianjin 40000
Guangzhou 7000
Chongqing 30000
dtype: int64
apts3:
Beijing 65000.0
Chongqing NaN
Guangzhou 47000.0
Hangzhou NaN
Shanghai 68000.0
Suzhou NaN
Tianjin NaN
shenzhen 76000.0
dtype: float64
Beijing 65000.0
Chongqing 300.0
Guangzhou 47000.0
Hangzhou 300.0
Shanghai 68000.0
Suzhou 300.0
Tianjin 300.0
shenzhen 76000.0
dtype: float64
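Filling missing positions can also be written with fillna, which returns a new Series. The sketch below uses two small hypothetical Series and shows both a constant fill and a fill with the median of the non-missing values; which of the two is appropriate depends on the data.

```python
import pandas as pd

a = pd.Series({'Beijing': 55000.0, 'Hangzhou': 40000.0, 'Suzhou': None})
b = pd.Series({'Beijing': 10000, 'Tianjin': 40000})

# NaN wherever a label is missing on one side or the value itself is NaN.
summed = a + b

# Fill with a constant, as in the example above.
filled_const = summed.fillna(300)

# Or fill with the median of the values that are present.
filled_median = summed.fillna(summed.median())

print(filled_const)
print(filled_median)
```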
import pandas as pd
import numpy as np
data = pd.Series(np.arange(10), index=[49,48,47,46,45, 1, 2, 3, 4, 5])
print('data:\n',data,'\n')
print('data.iloc[:3]:\n',data.iloc[:3],'\n')
print('data.loc[:3]:\n',data.loc[:3],'\n')
print('data.ix[:3]:\n',data.ix[:3],'\n')
data:
49 0
48 1
47 2
46 3
45 4
1 5
2 6
3 7
4 8
5 9
dtype: int64
data.iloc[:3]:
49 0
48 1
47 2
dtype: int64
data.loc[:3]:
49 0
48 1
47 2
46 3
45 4
1 5
2 6
3 7
dtype: int64
data.ix[:3]:
49 0
48 1
47 2
46 3
45 4
1 5
2 6
3 7
dtype: int64
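loc: indexes by the labels in the index (it looks up the matching label, not the position); a slice includes both start and end.
iloc: indexes by position (ordinary integer offsets); a slice excludes end.
ix: tries the labels first and falls back to positions only when the index is not entirely integer; a label slice includes end (as in data.ix[:3] above), while a positional fallback slice excludes it. ix was deprecated in pandas 0.20 and removed in pandas 1.0, so the ix examples here only run on older versions.
To avoid ambiguity, prefer loc and iloc.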
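The difference between label-based and position-based access can be checked without ix. A minimal sketch on the same data as above:

```python
import numpy as np
import pandas as pd

data = pd.Series(np.arange(10), index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

# Scalar access: label 3 versus position 3.
print(data.loc[3])    # 7, the value stored under label 3
print(data.iloc[3])   # 3, the fourth element

# Slices: iloc excludes the end position, loc includes the end label.
by_position = data.iloc[:3]
by_label = data.loc[:3]
print(list(by_position.index))   # [49, 48, 47]
print(list(by_label.index))      # [49, 48, 47, 46, 45, 1, 2, 3]
```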
>>> data = pd.Series(np.arange(10), index=[49,48,47,46,45, 1, 2, 3, 4, 5])
>>> data
49 0
48 1
47 2
46 3
45 4
1 5
2 6
3 7
4 8
5 9
>>> data.iloc[:6] # starts at position 0 and excludes the element at position 6
49 0
48 1
47 2
46 3
45 4
1 5
dtype: int64
>>> data.loc[:6] # raises an error because the index contains no label 6
...
...
KeyError: 6
>>> data.ix[:6] # the index has no label 6 and is all integers, so there is no positional fallback
...
...
KeyError: 6
>>> data= pd.Series(np.arange(10), index=['a','b','c','d','e', 1, 2, 3, 4, 5])
>>> data
a 0
b 1
c 2
d 3
e 4
1 5
2 6
3 7
4 8
5 9
dtype: int64
>>> data.ix[:6] # no error here: the index is not all integers, so ix falls back to positions
a 0
b 1
c 2
d 3
e 4
1 5
dtype: int64
>>> data.loc[:6]
TypeError: cannot do slice indexing
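On a mixed index like this one, the result that ix produced by falling back to positions can be obtained explicitly with iloc. The sketch below also uses Index.get_loc to turn a label into a position first; combining the two this way is one possible approach, not the only one.

```python
import numpy as np
import pandas as pd

data = pd.Series(np.arange(10),
                 index=['a', 'b', 'c', 'd', 'e', 1, 2, 3, 4, 5])

# Purely positional: the first six elements, same rows as data.ix[:6] gave.
first_six = data.iloc[:6]
print(list(first_six.index))   # ['a', 'b', 'c', 'd', 'e', 1]

# Label to position: find where label 1 sits, then slice up to and including it.
stop = data.index.get_loc(1)
up_to_label_1 = data.iloc[:stop + 1]
print(stop)                    # 5
```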
This is meant as a set of notes on pandas syntax.
References:
https://blog.csdn.net/cymy001/article/details/78268721
https://blog.csdn.net/zeroder/article/details/54319021