Pandas: Basic Operations on Series and DataFrame

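Pandas has two main data structures: Series (one-dimensional) and DataFrame (two-dimensional). Between them they cover most typical use cases in finance, statistics, social science, engineering, and other fields.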

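I. Series

Basic operations

1. Creating

A Series built from np.array() and one built from range() can end up with different dtypes, and therefore different memory footprints. In the outputs below, np.array(range(3)) gives int32 while range(3) gives int64. The int32 result is platform-dependent (NumPy's default integer was 32-bit on Windows before NumPy 2.0), so on other systems both may show int64. The name parameter sets the name of the Series.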

import pandas as pd
import numpy as np

data = pd.Series(np.array(range(3)),name='data')
print(data)
'''
0    0
1    1
2    2
Name: data, dtype: int32
'''
data1 = pd.Series(range(3),name='data1')
print(data1)
'''
0    0
1    1
2    2
Name: data1, dtype: int64
'''
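
To make the dtype and memory difference concrete without relying on platform defaults, here is a minimal sketch that pins the dtype explicitly and compares the size of the underlying data. The variable names small and large are only for illustration.

```python
import numpy as np
import pandas as pd

# Pin the dtype explicitly so the result does not depend on the platform
small = pd.Series(np.arange(3, dtype="int32"), name="small")
large = pd.Series(np.arange(3, dtype="int64"), name="large")

# nbytes reports the size of the underlying data: 3 elements * 4 or 8 bytes
print(small.dtype, small.nbytes)  # int32 12
print(large.dtype, large.nbytes)  # int64 24
```

If memory matters, passing dtype= directly to pd.Series() has the same effect.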

A Series can also be created from a list, a tuple, or a dictionary.
The index parameter sets custom index labels for the Series.

data_ls = pd.Series([1,2,3,4])
print(data_ls)
'''
0    1
1    2
2    3
3    4
dtype: int64
'''
data_tr = pd.Series((1,2,3,4),index=['aa','bb','cc','dd'])
print(data_tr)
'''
aa    1
bb    2
cc    3
dd    4
dtype: int64
'''

# Dictionary keys become the index
data_dict = pd.Series({'a':1,'b':2,'c':3})
print(data_dict)
'''
a    1
b    2
c    3
dtype: int64
'''
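
When a dictionary is combined with the index parameter, pandas looks each label up in the dictionary instead of relabeling positionally. The sketch below illustrates this; the names scores and picked are made up for the example.

```python
import pandas as pd

scores = {"a": 1, "b": 2, "c": 3}

# Labels are looked up in the dict: 'c' and 'a' are found, 'z' is not
picked = pd.Series(scores, index=["c", "a", "z"])
print(picked)

# The missing label becomes NaN, which forces the dtype to float64
print(picked.dtype)
```

This differs from the list and tuple cases, where index simply labels the values in order and must have the same length as the data.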

2. Deleting

data_dict = pd.Series({'a':1,'b':2,'c':3})
data_dict.pop('a')
print(data_dict)
'''
b    2
c    3
dtype: int64
'''
# drop() returns a new Series with the element removed; it does not modify the original Series
data_dict= data_dict.drop('b')
print(data_dict)
'''
c    3
dtype: int64
'''
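
The difference between the two methods is easy to miss, so here is a small sketch that checks it directly: pop() returns the removed value and changes the Series in place, while drop() leaves the original untouched. The names removed and trimmed are illustrative only.

```python
import pandas as pd

s = pd.Series({"a": 1, "b": 2, "c": 3})

# pop() returns the removed value and mutates s
removed = s.pop("a")
print(removed)          # 1
print(list(s.index))    # ['b', 'c']

# drop() returns a new Series; s still contains 'b'
trimmed = s.drop("b")
print(list(trimmed.index))  # ['c']
print(list(s.index))        # ['b', 'c']
```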

3. Modifying

data_dict = pd.Series({'a':1,'b':2,'c':3})
print(data_dict)
'''
a    1
b    2
c    3
dtype: int64
'''
# Modify by label
data_dict['a'] = 55
print(data_dict)
'''
a    55
b     2
c     3
dtype: int64
'''
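# Modify by integer position (use iloc; plain data_dict[1] on a labeled index is deprecated in recent pandas)
data_dict.iloc[1] = 666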
print(data_dict)
'''
a     55
b    666
c      3
dtype: int64
'''
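
As a complement to the label and position examples, the following sketch uses the explicit loc and iloc accessors and adds a boolean mask to update several elements at once. The threshold of 10 is an arbitrary value chosen for the demonstration.

```python
import pandas as pd

s = pd.Series({"a": 1, "b": 2, "c": 3})

s.loc["a"] = 55    # by label
s.iloc[1] = 666    # by integer position (second element, label 'b')

# Boolean mask: every element smaller than 10 is set to 0 (only 'c' here)
s[s < 10] = 0
print(s.tolist())  # [55, 666, 0]
```

Using loc and iloc explicitly avoids any ambiguity about whether a number is meant as a label or as a position.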

4. Adding

data_dict = pd.Series({'a':1,'b':2,'c':3})
print(data_dict)
'''
a    1
b    2
c    3
dtype: int64
'''
data_dict['d'] = 4
print(data_dict)
'''
a    1
b    2
c    3
d    4
dtype: int64
'''
# Series.append() was removed in pandas 2.0; pd.concat() is the replacement
data_dict = pd.concat([data_dict, pd.Series([5,6],index=['e','f'])])
print(data_dict)
'''
a    1
b    2
c    3
d    4
e    5
f    6
dtype: int64
'''
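
One thing to watch when combining Series is that labels are not deduplicated. The sketch below, with the made-up names first and second, shows the duplicate label that results and how ignore_index=True renumbers the result instead.

```python
import pandas as pd

first = pd.Series([1, 2], index=["a", "b"])
second = pd.Series([3, 4], index=["b", "c"])

# Labels are kept as-is, so 'b' now appears twice
combined = pd.concat([first, second])
print(combined.index.is_unique)   # False
print(combined["b"].tolist())     # [2, 3]

# ignore_index=True discards the labels and renumbers from 0
renumbered = pd.concat([first, second], ignore_index=True)
print(list(renumbered.index))     # [0, 1, 2, 3]
```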

II. DataFrame

Basic operations

Creating

df = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo',
                    'G': (1, 2, 3, 4)})
print(df)
'''
     A          B    C  D      E    F  G
0  1.0 2013-01-02  1.0  3   test  foo  1
1  1.0 2013-01-02  1.0  3  train  foo  2
2  1.0 2013-01-02  1.0  3   test  foo  3
3  1.0 2013-01-02  1.0  3  train  foo  4
'''
1. One-dimensional list
df1 = pd.DataFrame([1,2,3,4])
print(df1)
'''
   0
0  1
1  2
2  3
3  4
'''
2. Two-dimensional list
df2 = pd.DataFrame([['a',1],['b',2],['c',3]],columns=['string','int'])
print(df2)
'''
 string  int
0      a    1
1      b    2
2      c    3
'''
3. Dictionary of lists

The index parameter replaces the default row labels.

df3 = pd.DataFrame({'name':['zs','ls','ww'],'age':[12,23,42]},index=['a','b','c'])
print(df3)
'''
   name  age
a   zs   12
b   ls   23
c   ww   42
'''
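4. List of dictionaries
Note that the strings 'nan' and 'nv' in this data are pinyin for "male" and "female"; they are ordinary strings, not missing values.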
df4 = pd.DataFrame([{'name':'zhangsan','sex':'nan'},{'name':'lisi','sex':'nv'},{'name':'wangwu','sex':'nan'}],index=[1,2,3])
print(df4)
'''
       name  sex
1  zhangsan  nan
2      lisi   nv
3    wangwu  nan
'''
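
With a list of dictionaries, the records do not all need the same keys. The sketch below uses a made-up records list in which the second entry has no age, to show how pandas fills the gap.

```python
import pandas as pd

# Hypothetical records: the second one is missing the 'age' key
records = [{"name": "zhangsan", "age": 12}, {"name": "lisi"}]
df = pd.DataFrame(records)
print(df)

# The missing value becomes NaN, so the column is stored as float64
print(df["age"].dtype)
print(df["age"].isna().tolist())  # [False, True]
```

The columns are the union of all keys seen across the dictionaries.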

Deleting

Columns can be removed with the pop() and drop() methods, or with the del statement.

df4 = pd.DataFrame([{'name':'zhangsan','sex':'nan'},{'name':'lisi','sex':'nv'},{'name':'wangwu','sex':'nan'}],index=[1,2,3])
print(df4)
'''
       name  sex
1  zhangsan  nan
2      lisi   nv
3    wangwu  nan
'''
df4.pop('sex')
print(df4)
'''
       name
1  zhangsan
2      lisi
3    wangwu
'''
df4.drop('name',axis=1,inplace=True)
print(df4)
'''
Empty DataFrame
Columns: []
Index: [1, 2, 3]
'''
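
The del statement and row deletion are not shown above, so here is a minimal sketch covering both alongside drop(). The DataFrame and its city column are invented for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["zs", "ls", "ww"], "age": [12, 23, 42], "city": ["bj", "sh", "gz"]},
    index=["a", "b", "c"],
)

# drop() without inplace=True returns a new DataFrame and leaves df alone
no_city = df.drop(columns="city")
no_row_b = df.drop(index="b")

# del removes a column from df itself
del df["age"]

print(list(no_city.columns))  # ['name', 'age']
print(list(no_row_b.index))   # ['a', 'c']
print(list(df.columns))       # ['name', 'city']
```

The columns= and index= keywords are an alternative to passing axis=1 or axis=0.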

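Modifying

loc: label-based indexing, using the row and column names
iloc: integer position-based indexing, meaning the absolute row and column position, counting from 0

Renaming rows and columns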
df4 = pd.DataFrame([{'name':'zhangsan','sex':'nan'},{'name':'lisi','sex':'nv'},{'name':'wangwu','sex':'nan'}],index=[1,2,3])
print(df4)
'''
       name  sex
1  zhangsan  nan
2      lisi   nv
3    wangwu  nan
'''
df4.columns=['NAME','SEX']
df4.index = ['a','b','c']
print(df4)
'''
       NAME  SEX
a  zhangsan  nan
b      lisi   nv
c    wangwu  nan
'''
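
Assigning to columns and index replaces every label at once. When only some labels need to change, rename() with a mapping is an alternative; the sketch below assumes a small two-row frame built just for this purpose.

```python
import pandas as pd

df = pd.DataFrame({"name": ["zhangsan", "lisi"], "sex": ["nan", "nv"]}, index=[1, 2])

# Only the labels named in the mappings change; rename() returns a new DataFrame
renamed = df.rename(columns={"name": "NAME"}, index={1: "a"})

print(list(renamed.columns))  # ['NAME', 'sex']
print(list(renamed.index))    # ['a', 2]
print(list(df.columns))       # ['name', 'sex'] (original unchanged)
```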
Modifying with loc
df4 = pd.DataFrame([{'name':'zhangsan','sex':'nan'},{'name':'lisi','sex':'nv'},{'name':'wangwu','sex':'nan'}],index=[1,2,3])
print(df4)
'''
       name  sex
1  zhangsan  nan
2      lisi   nv
3    wangwu  nan
'''
df4.loc[1,'name'] = '张三'
print(df4)
'''
     name  sex
1      张三  nan
2    lisi   nv
3  wangwu  nan
'''
df4.loc[2] = ['李四','男']
print(df4)
'''
     name  sex
1      张三  nan
2      李四    男
3  wangwu  nan
'''
Modifying with iloc
df4 = pd.DataFrame([{'name':'zhangsan','sex':'nan'},{'name':'lisi','sex':'nv'},{'name':'wangwu','sex':'nan'}])
print(df4)
'''
       name  sex
0  zhangsan  nan
1      lisi   nv
2    wangwu  nan
'''
# Modify a single element
df4.iloc[1:2,0]='nv'
print(df4)
'''
       name  sex
0  zhangsan  nan
1        nv   nv
2    wangwu  nan
'''
# Modify an entire column
df4.iloc[:,0]=['a','b','c']
print(df4)
'''
  name  sex
0    a  nan
1    b   nv
2    c  nan
'''
# Modify an entire row
df4.iloc[2,:]=['zhangmazi','nan']
print(df4)
'''
        name  sex
0          a  nan
1          b   nv
2  zhangmazi  nan
'''
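
The distinction between loc and iloc matters most when the row labels are themselves integers. The following sketch uses labels 1, 2, 3 to show that the same number selects different rows, and that slices behave differently as well.

```python
import pandas as pd

df = pd.DataFrame({"name": ["zhangsan", "lisi", "wangwu"]}, index=[1, 2, 3])

# loc treats 1 as a label: the row labeled 1 is the first row
print(df.loc[1, "name"])   # zhangsan

# iloc treats 1 as a position: position 1 is the second row
print(df.iloc[1, 0])       # lisi

# Label slices include the end point, position slices exclude it
print(len(df.loc[1:2]))    # 2
print(len(df.iloc[1:2]))   # 1
```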

Inspecting

df4 = pd.DataFrame([{'name':'zhangsan','sex':'nan'},{'name':'lisi','sex':'nv'},{'name':'wangwu','sex':'nan'}])
print(df4)
'''
       name  sex
0  zhangsan  nan
1      lisi   nv
2    wangwu  nan
'''
df4.index
'''
RangeIndex(start=0, stop=3, step=1)
'''
df4.columns
'''
Index(['name', 'sex'], dtype='object')
'''
df4.describe()
'''
           name  sex
count          3    3
unique         3    2
top     zhangsan  nan
freq           1    2
'''
df4.info()
'''
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   name    3 non-null      object
 1   sex     3 non-null      object
dtypes: object(2)
memory usage: 176.0+ bytes
'''
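
Beyond the summary attributes above, looking things up usually means selecting columns or filtering rows. Here is a short sketch of a few common patterns on the same kind of data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"name": ["zhangsan", "lisi", "wangwu"],
                   "sex": ["nan", "nv", "nan"]})

names = df["name"]                 # select one column as a Series
males = df[df["sex"] == "nan"]     # keep rows where the condition is True
first_two = df.head(2)             # first two rows
counts = df["sex"].value_counts()  # how often each value occurs

print(males["name"].tolist())  # ['zhangsan', 'wangwu']
print(counts["nan"], counts["nv"])  # 2 1
```

Here 'nan' is a plain string in the data, so it is compared with == rather than tested with isna().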
