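The main pandas data structures are Series (one-dimensional data) and DataFrame (two-dimensional data). Together they cover most typical use cases in finance, statistics, social science, engineering, and other fields.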
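A Series can be created from np.array() or directly from range(). The resulting dtype can differ between the two, and so can the memory used: in the run shown here the NumPy array yields int32 (NumPy's default integer on Windows before NumPy 2.0; on most other platforms it is int64), while range() yields int64. The name parameter sets the name of the Series.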
import pandas as pd
import numpy as np
data = pd.Series(np.array(range(3)),name='data')
print(data)
'''
0 0
1 1
2 2
Name: data, dtype: int32
'''
data1 = pd.Series(range(3),name='data1')
print(data1)
'''
0 0
1 1
2 2
Name: data1, dtype: int64
'''
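The link between dtype and memory can be checked directly. The following is a minimal sketch that sets the dtype explicitly so the result does not depend on the platform; the variable names s32 and s64 are illustrative only.

```python
import pandas as pd

# Fix the dtype explicitly so the comparison is platform-independent.
s32 = pd.Series(range(3), name='s32', dtype='int32')
s64 = pd.Series(range(3), name='s64', dtype='int64')

# nbytes counts the bytes of the values only, not the index.
print(s32.dtype, s32.nbytes)  # int32 12
print(s64.dtype, s64.nbytes)  # int64 24
```

Three int32 values take 4 bytes each and three int64 values take 8 bytes each, which is where the difference comes from.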
A Series can also be created from a list, a tuple, or a dictionary.
The index parameter sets custom index labels for the Series.
data_ls = pd.Series([1,2,3,4])
print(data_ls)
'''
0 1
1 2
2 3
3 4
dtype: int64
'''
data_tr = pd.Series((1,2,3,4),index=['aa','bb','cc','dd'])
print(data_tr)
'''
aa 1
bb 2
cc 3
dd 4
dtype: int64
'''
# The dictionary keys become the index
data_dict = pd.Series({'a':1,'b':2,'c':3})
print(data_dict)
'''
a 1
b 2
c 3
dtype: int64
'''
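When a dictionary and an index are passed together, the index selects and orders the dictionary keys rather than relabelling the values. A small sketch of that behaviour, with made-up data:

```python
import pandas as pd

scores = {'a': 1, 'b': 2, 'c': 3}

# With dict data, index labels are matched against the dict keys.
# Keys not listed in index are dropped; labels with no matching key become NaN.
picked = pd.Series(scores, index=['c', 'a', 'x'])
print(picked)
```

Because NaN is a floating-point value, the resulting Series is float64 rather than int64.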
data_dict = pd.Series({'a':1,'b':2,'c':3})
# pop() removes the element in place and returns its value
data_dict.pop('a')
print(data_dict)
'''
b 2
c 3
dtype: int64
'''
# drop() returns a new Series with the element removed; the original Series is not modified
data_dict= data_dict.drop('b')
print(data_dict)
'''
c 3
dtype: int64
'''
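The difference between the two deletion methods is easiest to see side by side. A minimal sketch, using illustrative variable names:

```python
import pandas as pd

s = pd.Series({'a': 1, 'b': 2, 'c': 3})

removed = s.pop('a')        # mutates s and returns the removed value
print(removed)              # 1
print(list(s.index))        # ['b', 'c']

trimmed = s.drop('b')       # returns a new Series
print(list(s.index))        # ['b', 'c'], s itself is unchanged
print(list(trimmed.index))  # ['c']
```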
data_dict = pd.Series({'a':1,'b':2,'c':3})
print(data_dict)
'''
a 1
b 2
c 3
dtype: int64
'''
# Modify by label
data_dict['a'] = 55
print(data_dict)
'''
a 55
b 2
c 3
dtype: int64
'''
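# Modify by integer position (use iloc; a plain data_dict[1] is treated as a label in recent pandas versions)
data_dict.iloc[1] = 666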
print(data_dict)
'''
a 55
b 666
c 3
dtype: int64
'''
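Labels and positions are easiest to confuse when the index itself holds integers. The sketch below, with made-up index values, shows that loc always means "label" and iloc always means "position":

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=[10, 20, 30])

s.loc[10] = 55     # label 10, which is the first element
s.iloc[1] = 666    # position 1, which is the second element
print(s.tolist())  # [55, 666, 3]
```

Using loc and iloc explicitly avoids relying on how a bare s[...] is interpreted.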
data_dict = pd.Series({'a':1,'b':2,'c':3})
print(data_dict)
'''
a 1
b 2
c 3
dtype: int64
'''
# Assigning to a label that does not exist yet adds a new element
data_dict['d'] = 4
print(data_dict)
'''
a 1
b 2
c 3
d 4
dtype: int64
'''
# Series.append() was removed in pandas 2.0; pd.concat() gives the same result
data_dict = pd.concat([data_dict, pd.Series([5,6],index=['e','f'])])
print(data_dict)
'''
a 1
b 2
c 3
d 4
e 5
f 6
dtype: int64
'''
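When two Series are joined, the index labels are carried over as they are, so duplicates are possible. A short sketch with pd.concat() and its ignore_index parameter, using made-up data:

```python
import pandas as pd

first = pd.Series([1, 2], index=['a', 'b'])
second = pd.Series([5, 6], index=['a', 'f'])

# concat keeps the original labels, so 'a' appears twice here.
kept = pd.concat([first, second])
print(list(kept.index))        # ['a', 'b', 'a', 'f']

# ignore_index=True discards the labels and renumbers from 0.
renumbered = pd.concat([first, second], ignore_index=True)
print(list(renumbered.index))  # [0, 1, 2, 3]
```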
# Create a DataFrame from a dict: each key becomes a column, and scalar values are repeated for every row
df = pd.DataFrame({'A': 1.,
                   'B': pd.Timestamp('20130102'),
                   'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                   'D': np.array([3] * 4, dtype='int32'),
                   'E': pd.Categorical(["test", "train", "test", "train"]),
                   'F': 'foo',
                   'G': (1, 2, 3, 4)})
print(df)
'''
A B C D E F G
0 1.0 2013-01-02 1.0 3 test foo 1
1 1.0 2013-01-02 1.0 3 train foo 2
2 1.0 2013-01-02 1.0 3 test foo 3
3 1.0 2013-01-02 1.0 3 train foo 4
'''
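Each column of a DataFrame keeps its own dtype, which can be inspected with dtypes. A reduced sketch of the example above (only four of the columns, to keep the output stable across versions):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': 1.,
                   'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                   'D': np.array([3] * 4, dtype='int32'),
                   'E': pd.Categorical(["test", "train", "test", "train"])})

print(df.shape)   # (4, 4)
print(df.dtypes)  # A float64, C float32, D int32, E category
```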
df1 = pd.DataFrame([1,2,3,4])
print(df1)
'''
0
0 1
1 2
2 3
3 4
'''
df2 = pd.DataFrame([['a',1],['b',2],['c',3]],columns=['string','int'])
print(df2)
'''
string int
0 a 1
1 b 2
2 c 3
'''
Setting a custom index
df3 = pd.DataFrame({'name':['zs','ls','ww'],'age':[12,23,42]},index=['a','b','c'])
print(df3)
'''
name age
a zs 12
b ls 23
c ww 42
'''
df4 = pd.DataFrame([{'name':'zhangsan','sex':'nan'},{'name':'lisi','sex':'nv'},{'name':'wangwu','sex':'nan'}],index=[1,2,3])
print(df4)
'''
name sex
1 zhangsan nan
2 lisi nv
3 wangwu nan
'''
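With a list of dictionaries, the records do not need to share the same keys. A small sketch with made-up records showing how a missing key is handled:

```python
import pandas as pd

records = [{'name': 'zhangsan', 'age': 12},
           {'name': 'lisi'}]          # no 'age' key in this record

df = pd.DataFrame(records)
print(df)

# The columns are the union of all keys; the missing value is NaN.
print(list(df.columns))               # ['name', 'age']
print(pd.isna(df.loc[1, 'age']))      # True
```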
Columns can be removed with the pop() and drop() methods, or with del.
df4 = pd.DataFrame([{'name':'zhangsan','sex':'nan'},{'name':'lisi','sex':'nv'},{'name':'wangwu','sex':'nan'}],index=[1,2,3])
print(df4)
'''
name sex
1 zhangsan nan
2 lisi nv
3 wangwu nan
'''
df4.pop('sex')
print(df4)
'''
name
1 zhangsan
2 lisi
3 wangwu
'''
df4.drop('name',axis=1,inplace=True)
print(df4)
'''
Empty DataFrame
Columns: []
Index: [1, 2, 3]
'''
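The three ways of deleting differ in what they return and whether they modify the DataFrame. A minimal sketch with an illustrative three-column frame:

```python
import pandas as pd

df = pd.DataFrame({'name': ['zs', 'ls', 'ww'],
                   'age': [12, 23, 42],
                   'city': ['x', 'y', 'z']}, index=['a', 'b', 'c'])

ages = df.pop('age')       # removes the column in place and returns it as a Series
del df['city']             # removes the column in place, returns nothing
without_b = df.drop('b')   # axis=0 by default: drops a row and returns a new DataFrame

print(ages.tolist())          # [12, 23, 42]
print(list(df.columns))       # ['name']
print(list(df.index))         # ['a', 'b', 'c'], df still has row 'b'
print(list(without_b.index))  # ['a', 'c']
```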
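loc: label-based indexing, using the row and column names.
iloc: integer position-based indexing, i.e. the absolute row and column positions, starting from 0.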
df4 = pd.DataFrame([{'name':'zhangsan','sex':'nan'},{'name':'lisi','sex':'nv'},{'name':'wangwu','sex':'nan'}],index=[1,2,3])
print(df4)
'''
name sex
1 zhangsan nan
2 lisi nv
3 wangwu nan
'''
df4.columns=['NAME','SEX']
df4.index = ['a','b','c']
print(df4)
'''
NAME SEX
a zhangsan nan
b lisi nv
c wangwu nan
'''
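Assigning to columns or index replaces every label at once. To change only some labels, rename() with a mapping is one option. A short sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'name': ['zhangsan', 'lisi'],
                   'sex': ['nan', 'nv']}, index=[1, 2])

# Only the labels listed in the mappings change; rename() returns a new DataFrame.
renamed = df.rename(columns={'name': 'NAME'}, index={1: 'a'})

print(list(renamed.columns))  # ['NAME', 'sex']
print(list(renamed.index))    # ['a', 2]
print(list(df.columns))       # ['name', 'sex'], the original is unchanged
```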
df4 = pd.DataFrame([{'name':'zhangsan','sex':'nan'},{'name':'lisi','sex':'nv'},{'name':'wangwu','sex':'nan'}],index=[1,2,3])
print(df4)
'''
name sex
1 zhangsan nan
2 lisi nv
3 wangwu nan
'''
df4.loc[1,'name'] = '张三'
print(df4)
'''
name sex
1 张三 nan
2 lisi nv
3 wangwu nan
'''
df4.loc[2] = ['李四','男']
print(df4)
'''
name sex
1 张三 nan
2 李四 男
3 wangwu nan
'''
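loc can also add a row or update rows that match a condition. The following is a sketch under the assumption of a small made-up frame; the values 'zhaoliu' and 'unknown' are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'name': ['zhangsan', 'lisi', 'wangwu'],
                   'sex': ['m', 'f', 'm']}, index=[1, 2, 3])

# Assigning to a row label that does not exist yet appends a new row.
df.loc[4] = ['zhaoliu', 'f']

# A boolean mask selects the rows to update; 'sex' is the column to write to.
df.loc[df['name'] == 'lisi', 'sex'] = 'unknown'

print(df.shape)            # (4, 2)
print(df.loc[2, 'sex'])    # unknown
print(df.loc[4, 'name'])   # zhaoliu
```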
df4 = pd.DataFrame([{'name':'zhangsan','sex':'nan'},{'name':'lisi','sex':'nv'},{'name':'wangwu','sex':'nan'}])
print(df4)
'''
name sex
0 zhangsan nan
1 lisi nv
2 wangwu nan
'''
# Modify a single element
df4.iloc[1,0]='nv'
print(df4)
'''
name sex
0 zhangsan nan
1 nv nv
2 wangwu nan
'''
# Modify an entire column
df4.iloc[:,0]=['a','b','c']
print(df4)
'''
name sex
0 a nan
1 b nv
2 c nan
'''
# Modify an entire row
df4.iloc[2,:]=['zhangmazi','nan']
print(df4)
'''
name sex
0 a nan
1 b nv
2 zhangmazi nan
'''
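One practical difference between loc and iloc shows up when slicing: a label slice includes its end point, while a position slice excludes it. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'name': ['zhangsan', 'lisi', 'wangwu'],
                   'sex': ['m', 'f', 'm']}, index=['a', 'b', 'c'])

# Label slice: both 'a' and 'b' are included.
by_label = df.loc['a':'b', 'name'].tolist()
print(by_label)      # ['zhangsan', 'lisi']

# Position slice: position 0 only, the end position 1 is excluded.
by_position = df.iloc[0:1, 0].tolist()
print(by_position)   # ['zhangsan']

# Single element by position: third row, second column.
print(df.iloc[2, 1])  # m
```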
df4 = pd.DataFrame([{'name':'zhangsan','sex':'nan'},{'name':'lisi','sex':'nv'},{'name':'wangwu','sex':'nan'}])
print(df4)
'''
name sex
0 zhangsan nan
1 lisi nv
2 wangwu nan
'''
df4.index
'''
RangeIndex(start=0, stop=3, step=1)
'''
df4.columns
'''
Index(['name', 'sex'], dtype='object')
'''
df4.describe()
'''
name sex
count 3 3
unique 3 2
top zhangsan nan
freq 1 2
'''
df4.info()
'''
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 name 3 non-null object
1 sex 3 non-null object
dtypes: object(2)
memory usage: 176.0+ bytes
'''
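The "+" in the memory usage line of info() indicates that the figure is a lower bound, because the contents of string columns are not measured by default. A sketch of how the full figure could be obtained with memory_usage(deep=True), together with two other quick summaries (the exact byte counts depend on the pandas version, so none are shown):

```python
import pandas as pd

df = pd.DataFrame([{'name': 'zhangsan', 'sex': 'nan'},
                   {'name': 'lisi', 'sex': 'nv'},
                   {'name': 'wangwu', 'sex': 'nan'}])

print(df.shape)  # (3, 2)

# Shallow: does not look inside Python string objects.
shallow = df.memory_usage().sum()
# Deep: also measures the string contents themselves.
deep = df.memory_usage(deep=True).sum()
print(shallow, deep)

# How often each value occurs in one column.
counts = df['sex'].value_counts().to_dict()
print(counts)    # {'nan': 2, 'nv': 1}
```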