Indexing, Selection, and Filtering

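Indexing

Series indexing (obj[...]) works like NumPy array indexing, except that index labels can be used in place of integers.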

In [39]: obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

In [40]: obj
Out[40]:
a    0.0
b    1.0
c    2.0
d    3.0
dtype: float64

In [41]: obj['b']
Out[41]: 1.0

In [42]: obj[1]
Out[42]: 1.0

In [43]: obj[2 : 4]
Out[43]:
c    2.0
d    3.0
dtype: float64

In [44]: obj[['b', 'a', 'd']]
Out[44]:
b    1.0
a    0.0
d    3.0
dtype: float64

In [45]: obj[[1, 3]]
Out[45]:
b    1.0
d    3.0
dtype: float64

In [46]: obj[obj < 2]
Out[46]:
a    0.0
b    1.0
dtype: float64
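
Recent pandas releases (2.x) emit a FutureWarning for integer keys such as obj[1] on a Series whose index holds non-integer labels, because that positional fallback is deprecated. The following is a minimal sketch of the explicit equivalents, assuming the same obj as above: loc always selects by label and iloc always selects by position.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

# Label-based selection: explicit form of obj['b']
by_label = obj.loc['b']

# Position-based selection: explicit forms of obj[1] and obj[[1, 3]]
by_position = obj.iloc[1]
several_by_position = obj.iloc[[1, 3]]

# Positional slice (end position excluded): explicit form of obj[2:4]
tail = obj.iloc[2:4]

print(by_label, by_position)
print(several_by_position)
print(tail)
```

Spelling out loc or iloc keeps the intent unambiguous regardless of what kind of labels the index holds.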

Ordinary Python slices exclude the endpoint, but slicing a Series by label is different: the endpoint is included.

In [47]: obj['b' : 'c']
Out[47]:
b    1.0
c    2.0
dtype: float64

Assigning to a label slice modifies the corresponding section of the Series.

In [48]: obj['b' : 'c'] = 5

In [49]: obj
Out[49]:
a    0.0
b    5.0
c    5.0
d    3.0
dtype: float64
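
To make the endpoint rule concrete, here is a small sketch that contrasts a plain Python list slice, a positional slice, and a label slice on the same data. The variable names (letters, label_slice, pos_slice) are illustrative only and are not part of the original session.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])

# Plain Python slice: the end position is excluded
letters = ['a', 'b', 'c', 'd']
print(letters[1:2])

# Label slice: both endpoints 'b' and 'c' are included
label_slice = obj.loc['b':'c']
print(list(label_slice.index))

# Positional slice: the end position is excluded, as with a list
pos_slice = obj.iloc[1:2]
print(list(pos_slice.index))

# Assigning through a label slice updates both endpoints in place
obj.loc['b':'c'] = 5
print(obj)
```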

Indexing a DataFrame with a single value or a sequence retrieves one or more columns.

In [50]: data = pd.DataFrame(np.arange(16).reshape((4, 4)),
    ...:                     index=['Ohio', 'Colorado', 'Utah', 'New York'],
    ...:                     columns=['one', 'two', 'three', 'four'])

In [51]: data
Out[51]:
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15

In [52]: data['two']
Out[52]:
Ohio         1
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32

In [53]: data[['three', 'one']]
Out[53]:
          three  one
Ohio          2    0
Colorado      6    4
Utah         10    8
New York     14   12
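
One detail worth checking is the type of the result: a single label returns a Series, while a list of labels returns a DataFrame, even when the list has only one element. A minimal sketch, assuming the same data as above (the names one_col, as_frame, and reordered are illustrative):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# A single label returns a Series
one_col = data['two']
print(type(one_col).__name__)

# A one-element list returns a one-column DataFrame
as_frame = data[['two']]
print(type(as_frame).__name__, as_frame.shape)

# A list also controls the column order of the result
reordered = data[['three', 'one']]
print(list(reordered.columns))
```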

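This kind of indexing has two special cases that select rows rather than columns: slicing, and selecting with a boolean array.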

In [54]: data[:2]
Out[54]:
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7

In [55]: data[data['three'] > 5]
Out[55]:
          one  two  three  four
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
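
Boolean conditions can also be combined before filtering. The sketch below assumes the same data as above and uses the element-wise operators & (and), | (or), and ~ (not); each condition needs its own parentheses because these operators bind more tightly than comparisons.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Rows where both conditions hold
both = data[(data['three'] > 5) & (data['one'] < 12)]
print(list(both.index))

# Rows where at least one condition holds
either = data[(data['one'] == 0) | (data['four'] == 15)]
print(list(either.index))

# Rows where the condition does not hold
negated = data[~(data['three'] > 5)]
print(list(negated.index))
```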

Another option is indexing with a boolean DataFrame, such as one produced by a scalar comparison.

In [56]: data < 5
Out[56]:
            one    two  three   four
Ohio       True   True   True   True
Colorado   True  False  False  False
Utah      False  False  False  False
New York  False  False  False  False

In [57]: data[data < 5] = 0

In [58]: data
Out[58]:
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15

In this case a DataFrame behaves much like a two-dimensional NumPy array.
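
The parallel with NumPy can be sketched side by side. This example also uses DataFrame.where, which is not part of the original session, as one way to get the same result without modifying data in place: it keeps values where the condition is True and substitutes the given value elsewhere.

```python
import numpy as np
import pandas as pd

# NumPy: boolean-mask assignment on a two-dimensional array
arr = np.arange(16).reshape((4, 4))
arr[arr < 5] = 0

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# pandas: keep values >= 5, replace the rest with 0, return a new DataFrame
replaced = data.where(data >= 5, 0)

print(arr)
print(replaced)
```

Because where returns a new object, the original data still holds the values 1 through 4 afterwards.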

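The special indexing operators loc and iloc select rows and columns of a DataFrame: loc by axis labels, iloc by integer positions.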

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: data = pd.DataFrame(np.arange(16).reshape((4, 4)),
   ...:                     index=['Ohio', 'Colorado', 'Utah', 'New York'],
   ...:                     columns=['one', 'two', 'three', 'four'])

In [4]: data < 5
Out[4]:
            one    two  three   four
Ohio       True   True   True   True
Colorado   True  False  False  False
Utah      False  False  False  False
New York  False  False  False  False

In [5]: data[data < 5] = 0

In [6]: data
Out[6]:
          one  two  three  four
Ohio        0    0      0     0
Colorado    0    5      6     7
Utah        8    9     10    11
New York   12   13     14    15

In [7]: data.loc['Colorado', ['two', 'three']]
Out[7]:
two      5
three    6
Name: Colorado, dtype: int32

In [8]: data.iloc[2, [3, 0, 1]]
Out[8]:
four    11
one      8
two      9
Name: Utah, dtype: int32

In [9]: data.iloc[2]
Out[9]:
one       8
two       9
three    10
four     11
Name: Utah, dtype: int32

In [10]: data.iloc[[1, 2], [3, 0, 1]]
Out[10]:
          four  one  two
Colorado     7    0    5
Utah        11    8    9

Both indexers also accept slices, in addition to single labels and lists of labels.

In [12]: data.loc[:'Utah', 'two']
Out[12]:
Ohio        0
Colorado    5
Utah        9
Name: two, dtype: int32

In [13]: data.iloc[:, :3][data.three > 5]
Out[13]:
          one  two  three
Colorado    0    5      6
Utah        8    9     10
New York   12   13     14
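
A chained expression such as the last one can usually be written as a single loc call that takes a boolean mask for the rows and a label slice for the columns. The sketch below assumes a fresh copy of data. The single-call form is also the usual choice for assignment, since assigning through a chained selection may act on a temporary copy.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Chained: select columns by position, then filter rows
chained = data.iloc[:, :3][data.three > 5]

# Single step: boolean mask for rows, inclusive label slice for columns
single = data.loc[data['three'] > 5, :'three']
print(chained.equals(single))

# Assignment through one loc call, done on a copy to leave data untouched
updated = data.copy()
updated.loc[updated['three'] > 5, 'two'] = -1
print(updated)
```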

For the full table of DataFrame indexing options, see 《利用Python进行数据分析》 (Python for Data Analysis), p. 143.

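Integer indexes

When the index itself holds integers, plain [] slicing is positional, loc slices by label (endpoint included), and iloc slices by position (endpoint excluded).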

In [15]: ser = pd.Series(np.arange(3.))

In [16]: ser
Out[16]:
0    0.0
1    1.0
2    2.0
dtype: float64

In [17]: ser[:1]
Out[17]:
0    0.0
dtype: float64

In [18]: ser.loc[:1]
Out[18]:
0    0.0
1    1.0
dtype: float64

In [19]: ser.iloc[:1]
Out[19]:
0    0.0
dtype: float64
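
The ambiguity shows up most clearly with a negative integer. In this sketch, which assumes the same ser as above, ser[-1] is looked up as a label and fails because no label -1 exists, while iloc gives the last element unambiguously. The raised flag is only there to record the outcome.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(3.))

# With an integer index, a scalar integer key is treated as a label
try:
    ser[-1]
    raised = False
except KeyError:
    raised = True
print(raised)

# Position-based access to the last element
last = ser.iloc[-1]
print(last)

# Label slice includes label 1, positional slice stops before position 1
by_label = ser.loc[:1]
by_pos = ser.iloc[:1]
print(len(by_label), len(by_pos))
```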
