Scientific Computing Library Pandas (1): Creating Series and DataFrame Objects
Pandas is built around two data types: Series and DataFrame
1. Series objects
import pandas as pd
from pandas import Series,DataFrame
import numpy as np
1.1 Creating a Series
se1 = Series([1,2,3,4])
se2 = Series(data=[1,2,3,4],index=['a','b','c','d'])
se3 = Series(data=[5,6,7,8],index=list('fine'))
print(se1)
print(se2)
print(se3)
0 1
1 2
2 3
3 4
dtype: int64
a 1
b 2
c 3
d 4
dtype: int64
f 5
i 6
n 7
e 8
dtype: int64
1.2 Accessing data
print(se2.values)
print(se2.index)
# iteritems() was removed in pandas 2.0; items() returns the same (label, value) pairs
print(list(se2.items()))
[1 2 3 4]
Index(['a', 'b', 'c', 'd'], dtype='object')
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
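print('By label:', se2['c'])
# Use .iloc for positions; se2[2] on a label index is deprecated in recent pandas
print('By position:', se2.iloc[2])
By label: 3
By position: 3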
print(se2[['b','d']])
print(se2.iloc[[1,3]])
b 2
d 4
dtype: int64
b 2
d 4
dtype: int64
print(se2['b':'d'])
print(se2.iloc[1:3])
b 2
c 3
d 4
dtype: int64
b 2
c 3
dtype: int64
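The two slices above behave differently: a label slice includes its end label, while a positional slice excludes its stop position. A minimal sketch of that difference using the explicit .loc and .iloc accessors (the variable names here are illustrative, not from the original):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# .loc selects by label; a label slice includes both endpoints.
by_label = s.loc['b':'d']

# .iloc selects by integer position; the stop position is excluded.
by_position = s.iloc[1:3]

print(by_label.tolist())     # [2, 3, 4]
print(by_position.tolist())  # [2, 3]
print(s.iloc[2])             # 3
```

Spelling out .loc or .iloc makes the intent clear even when the index itself holds integers.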
se2.index = list('wxyz')
print(se2)
w 1
x 2
y 3
z 4
dtype: int64
1.3 reindex(), converting a dict to a Series, and drop()
print(se2.reindex(['m','w','x','y','z']))
m NaN
w 1.0
x 2.0
y 3.0
z 4.0
dtype: float64
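In the output above, the new label 'm' has no value, so it becomes NaN and the whole Series is converted to float64. If a default value makes sense for the data, the fill_value parameter of reindex() can supply it instead. A small sketch, assuming 0 is an acceptable default:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=list('wxyz'))

# Labels missing from the original index receive fill_value instead of NaN.
filled = s.reindex(['m', 'w', 'x', 'y', 'z'], fill_value=0)

print(filled.tolist())  # [0, 1, 2, 3, 4]
print(filled.dtype.kind)  # 'i' (still an integer dtype)
```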
dict0 = {'r':100,'b':400,"g":300,"p":900}
se0 = Series(dict0)
print(se0)
r 100
b 400
g 300
p 900
dtype: int64
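When building a Series from a dict, an explicit index can also be passed to select and order the keys. A brief sketch; the label 'k' is a made-up key that is not in the dict, included to show what happens to missing keys:

```python
import pandas as pd

colors = {'r': 100, 'b': 400, 'g': 300, 'p': 900}

# Only the listed labels are kept, in the listed order.
# 'k' is not a key of the dict, so its value is NaN.
s = pd.Series(colors, index=['p', 'r', 'k'])

print(list(s.index))    # ['p', 'r', 'k']
print(s['p'])           # 900.0
print(pd.isna(s['k']))  # True
```

Because NaN is a float, the values here are stored as float64 rather than int64.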
print(se2.drop(['w','y']))
x 2
z 4
dtype: int64
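Note that drop() returns a new Series and leaves the original untouched. A short sketch that checks this:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=list('wxyz'))

# drop() builds a new Series without the given labels.
dropped = s.drop(['w', 'y'])

print(list(dropped.index))  # ['x', 'z']
print(len(s))               # 4, the original still has every label
```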
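1.4 Arithmetic on Series:
····Operations are aligned on the index
····Addition, subtraction, multiplication, division: + - * /
····When alignment introduces missing values, the result is stored as float64, because NaN is a floating-point value; if every label matches, the integer dtype is kept (division always returns floats)
····If pandas cannot find the same index label in both Series, the result at that position is the missing value NaN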
se_1 = pd.Series([1,3,5,7],['Chen','Li','Guo','sun'])
se_2 = pd.Series([2,4,6,8],['Chen','Li','Guo','sun'])
se_3 = pd.Series([0,2,4,6,8],['Chen','East','South','West','North'])
print(se_2 - se_1)
print('****************')
print(se_3 - se_1)
Chen 1
Li 1
Guo 1
sun 1
dtype: int64
****************
Chen -1.0
East NaN
Guo NaN
Li NaN
North NaN
South NaN
West NaN
sun NaN
dtype: float64
print(se_2 + se_1)
print('****************')
print(se_3 + se_1)
Chen 3
Li 7
Guo 11
sun 15
dtype: int64
****************
Chen 1.0
East NaN
Guo NaN
Li NaN
North NaN
South NaN
West NaN
sun NaN
dtype: float64
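If NaN for unmatched labels is not what you want, the method forms of the operators (add, sub, mul, div) accept a fill_value that stands in for the missing side. A minimal sketch, assuming a missing entry should count as 0:

```python
import pandas as pd

a = pd.Series([1, 3, 5, 7], index=['Chen', 'Li', 'Guo', 'sun'])
b = pd.Series([0, 2, 4, 6, 8], index=['Chen', 'East', 'South', 'West', 'North'])

# The + operator yields NaN wherever a label exists on only one side.
plain = b + a

# add() with fill_value treats the missing side as 0 instead.
filled = b.add(a, fill_value=0)

print(int(plain.isna().sum()))   # 7
print(filled['Li'])              # 3.0
print(filled['East'])            # 2.0
print(int(filled.isna().sum()))  # 0
```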
ses = Series(data = [1,6,3,5], index=list('abcd'))
print(ses[ses>2])
print('****************')
print(ses * 10)
print('****************')
print(np.square(ses))
b 6
c 3
d 5
dtype: int64
****************
a 10
b 60
c 30
d 50
dtype: int64
****************
a 1
b 36
c 9
d 25
dtype: int64
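The expression ses>2 above produces a boolean Series, which is then used as a mask. A short sketch showing the mask itself and how two conditions can be combined with & (each condition needs its own parentheses):

```python
import pandas as pd

s = pd.Series([1, 6, 3, 5], index=list('abcd'))

# The comparison is applied element by element.
mask = s > 2
print(mask.tolist())  # [False, True, True, True]

# Combine conditions with & (and) or | (or), not the Python keywords.
both = s[(s > 2) & (s < 6)]
print(both.to_dict())  # {'c': 3, 'd': 5}
```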
2. DataFrame tables
data, index, and columns are the data, the row index, and the column index, respectively
2.1 Creation method 1: from a NumPy array
df_1 = DataFrame(data=np.random.randint(0, 100, (5, 5)),
                 index=[1, 2, 3, 4, 5],
                 columns=['a', 'b', 'c', 'd', 'e'])
print(df_1)
    a   b   c   d   e
1   0  48  59  73  58
2  99  94  76  55   3
3  14  29  95  92  36
4  15  82  49   9  49
5  28  42  47  56  75
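The values above come from a random generator, so each run prints a different table. If reproducible output is needed, one option is a seeded NumPy generator. This is a sketch, not the original approach; the seed 0 is an arbitrary choice:

```python
import numpy as np
import pandas as pd

# A seeded generator returns the same numbers on every run.
rng = np.random.default_rng(0)
values = rng.integers(0, 100, size=(5, 5))  # integers in [0, 100)

df = pd.DataFrame(values, index=[1, 2, 3, 4, 5], columns=list('abcde'))

print(df.shape)                        # (5, 5)
print(df.loc[1, 'a'] == values[0, 0])  # True
```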
2.2 Creation method 2: from a dict; index sets the row labels and the dict keys become the column labels
dt2 = {'name': ['Guo', 'Li', 'Huang'],
       'year': [25, 26, 27],
       'num': [6, 8, 10]}
df2=pd.DataFrame(dt2,index=[1,2,3])
print(df2)
    name  year  num
1    Guo    25    6
2     Li    26    8
3  Huang    27   10
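When the data is a dict, the columns parameter can also be passed to choose which keys become columns and in what order. A small sketch under that assumption:

```python
import pandas as pd

data = {'name': ['Guo', 'Li', 'Huang'],
        'year': [25, 26, 27],
        'num': [6, 8, 10]}

# Only 'num' and 'name' are kept, in that order; 'year' is left out.
df = pd.DataFrame(data, index=[1, 2, 3], columns=['num', 'name'])

print(list(df.columns))   # ['num', 'name']
print(df.loc[2, 'name'])  # Li
```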
2.3 Creation method 3: using from_dict
dt3 = {'apple':[2,4,6],'blue':[5,50,500]}
df3 = pd.DataFrame.from_dict(dt3)
print(df3)
print('***************')
dt4 = {
    'name': pd.Series(['G', 'F', 'W'], ['a', 'b', 'c']),
    'year': pd.Series([1998, 1999, 2000, 2001], ['a', 'b', 'c', 'd']),
    'day': pd.Series([19, 18, 17], ['a', 'b', 'd'])
}
df4 = pd.DataFrame(dt4)
print(df4)
print('***************')
dtBack = df4.to_dict()
print(dtBack)
   apple  blue
0      2     5
1      4    50
2      6   500
***************
  name  year   day
a    G  1998  19.0
b    F  1999  18.0
c    W  2000   NaN
d  NaN  2001  17.0
***************
{'name': {'a': 'G', 'b': 'F', 'c': 'W', 'd': nan}, 'year': {'a': 1998, 'b': 1999, 'c': 2000, 'd': 2001}, 'day': {'a': 19.0, 'b': 18.0, 'c': nan, 'd': 17.0}}
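Both from_dict() and to_dict() take an orient parameter that changes how the dict maps onto the table. A brief sketch; the column names 'x', 'y', 'z' are made up for illustration:

```python
import pandas as pd

data = {'apple': [2, 4, 6], 'blue': [5, 50, 500]}

# orient='index' turns each dict key into a row label instead of a column.
by_rows = pd.DataFrame.from_dict(data, orient='index', columns=['x', 'y', 'z'])
print(by_rows.shape)             # (2, 3)
print(by_rows.loc['blue', 'z'])  # 500

# orient='records' returns one dict per row.
records = pd.DataFrame(data).to_dict(orient='records')
print(len(records))                             # 3
print(records[0] == {'apple': 2, 'blue': 5})    # True
```

The default to_dict() shown earlier nests the row labels inside each column, while the 'records' form is often more convenient when each row should be handled as one item.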