A Detailed Guide to the pandas .loc Method

Picking up from yesterday's post, today let's take a close look at .loc in pandas!

Let's first see what the documentation says:

pandas provides a suite of methods in order to have purely label based indexing.

The .loc attribute is the primary access method. The following are valid inputs:

  • A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index. This use is not an integer position along the index)
  • A list or array of labels ['a', 'b', 'c']
  • A slice object with labels 'a':'f' (note that contrary to usual python slices, both the start and the stop are included, when present in the index! - also See Slicing with labels)
  • A boolean array
  • A callable, see Selection By Callable
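
The walkthrough that follows covers single labels, lists and slices, so here is a minimal sketch of the last two input types in the list, a boolean array and a callable. The Series `s` and its values are made up for illustration and are not part of the original article.

```python
import pandas as pd

# Hypothetical sample data, chosen so the results are easy to check by eye.
s = pd.Series([10, -20, 30, -40], index=list("abcd"))

# Boolean array: keep the rows where the mask is True.
mask = s > 0
positives = s.loc[mask]

# Callable: .loc calls the function with the Series itself
# and then indexes with whatever the function returns.
negatives = s.loc[lambda x: x < 0]

print(positives.index.tolist())  # ['a', 'c']
print(negatives.index.tolist())  # ['b', 'd']
```

The callable form is mostly useful in method chains, where the intermediate Series has no name you could refer to in a mask.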

>>> import numpy as np
>>> import pandas as pd
>>> s1 = pd.Series(np.random.randn(6), index=list('abcdef'))
>>> s1
a   -0.354041
b    0.286674
c   -1.144354
d   -2.290284
e   -0.299573
f   -0.011348
dtype: float64
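
We defined a pandas.Series of six random numbers whose index labels (each one "a label of the index", in the documentation's words) are the six characters a through f. We can fetch the value we want by its index label: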

>>> s1.loc['b']
0.28667372019035603

Let me stress this once more: we fetch data by the label of the index, which has nothing to do with the position that label happens to occupy:

>>> s2 = pd.Series(np.random.randn(6), index=[5, 4, 3, 2, 1, 0])
>>> s2
5    0.063622
4   -0.789719
3   -0.916464
2    1.023828
1   -0.440047
0    0.269705
dtype: float64
>>> s2.loc[5]
0.063622356476971106
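
As you can see, we built a new pandas.Series of six random numbers whose index labels are the six integers 5, 4, 3, 2, 1, 0. These labels have nothing to do with which row each value occupies in the Series. When we pass the integer 5, we get the value in the row labeled 5, not the value in the fifth row.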
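
To make the label-versus-position distinction concrete, the sketch below compares .loc with the position-based .iloc indexer on the same kind of reversed integer index. The fixed values are an assumption chosen for readability; the article's own example uses random numbers.

```python
import pandas as pd

# Same reversed integer index as above, but with fixed, made-up values.
s = pd.Series([50, 40, 30, 20, 10, 0], index=[5, 4, 3, 2, 1, 0])

by_label = s.loc[5]      # the row whose label is 5, which is the first row
by_position = s.iloc[5]  # the row at position 5, which is the last row

print(by_label)     # 50
print(by_position)  # 0
```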

We can also use a label slice (a slice object with labels) to fetch several values at once, or to assign to them:

>>> s1.loc['b':'f']
b    0.286674
c   -1.144354
d   -2.290284
e   -0.299573
f   -0.011348
dtype: float64
>>> s1.loc['b':]
b    0.286674
c   -1.144354
d   -2.290284
e   -0.299573
f   -0.011348
dtype: float64
>>> s1.loc['d':'f'] = 0
>>> s1
a   -0.354041
b    0.286674
c   -1.144354
d    0.000000
e    0.000000
f    0.000000
dtype: float64
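
Note that this slicing behaves differently from slicing a native Python list: both the start and the stop labels on either side of the colon are included. Keep the difference in mind!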
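
The difference can be checked side by side. This is a small sketch with made-up data: a plain list slice drops its stop element, while a .loc label slice keeps it.

```python
import pandas as pd

letters = list("abcdef")
s = pd.Series(range(6), index=letters)

# Plain Python list: the stop position (5, i.e. 'f') is excluded.
list_slice = letters[1:5]

# Label slice with .loc: the stop label 'f' is included.
label_slice = s.loc["b":"f"]

print(list_slice)                  # ['b', 'c', 'd', 'e']
print(label_slice.index.tolist())  # ['b', 'c', 'd', 'e', 'f']
```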

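pandas.DataFrame.loc works in much the same way, with an optional second indexer for the columns, so the code below is shown without much further commentary: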

>>> df1 = pd.DataFrame(np.random.randn(6, 4),
...                    index=list('abcdef'),
...                    columns=list('ABCD'))
>>> df1
	A	        B	        C	        D
a	0.031419	0.658151	1.069829	-1.366788
b	0.889844	-1.402487	0.183858	-0.037312
c	0.278374	-0.122152	0.429787	-1.251808
d	-0.935268	-0.768464	-1.343263	-0.435845
e	-0.612629	-1.538650	-1.774796	1.013778
f	-1.313907	-0.472731	-1.635683	0.140725
>>> df1.loc[['a', 'b', 'd'], :]
	A	        B	        C	        D
a	0.031419	0.658151	1.069829	-1.366788
b	0.889844	-1.402487	0.183858	-0.037312
d	-0.935268	-0.768464	-1.343263	-0.435845
>>> df1.loc['d':, 'A':'C']
	A	        B	        C	        
d	-0.935268	-0.768464	-1.343263	
e	-0.612629	-1.538650	-1.774796	
f	-1.313907	-0.472731	-1.635683
>>> df1.loc['a']
A    0.031419
B    0.658151
C    1.069829
D   -1.366788
Name: a, dtype: float64
>>> df1.xs('a')
A    0.031419
B    0.658151
C    1.069829
D   -1.366788
Name: a, dtype: float64
>>> df1.loc['a'] > 1
A    False
B    False
C     True
D    False
Name: a, dtype: bool
>>> df1.loc[:, df1.loc['a'] > 1]
        C
a	1.069829
b	0.183858
c	0.429787
d	-1.343263
e	-1.774796
f	-1.635683
>>> df1.loc['a', 'A']
0.03141854106892028
>>> df1.at['a', 'A']
0.03141854106892028
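
Because the output above depends on random numbers, here is a reproducible sketch of the same DataFrame selections, plus a label-based assignment like the one shown earlier for a Series. The small integer frame `df` is an assumption made for illustration, not data from the original article.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x4 frame: row 'a' is 0..3, row 'b' is 4..7, row 'c' is 8..11.
df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=list("abc"),
                  columns=list("ABCD"))

# Row and column label slices; both ends are included on both axes.
sub = df.loc["b":, "A":"C"]

# Boolean selection on columns: keep the columns where row 'a' is greater than 1.
cols = df.loc[:, df.loc["a"] > 1]

# A single cell; .at is the scalar-only counterpart of .loc.
scalar = df.loc["b", "A"]
same = df.at["b", "A"]

# Assignment through .loc modifies the frame in place.
df.loc["a":"b", "D"] = 0

print(sub.shape)              # (2, 3)
print(cols.columns.tolist())  # ['C', 'D']
print(scalar, same)           # 4 4
print(df["D"].tolist())       # [0, 0, 11]
```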

One last thing:

>>> s = pd.Series(list('abcde'), index=[0, 3, 2, 5, 4])
>>> s
0    a
3    b
2    c
5    d
4    e
dtype: object
>>> s.sort_index()
0    a
2    c
3    b
4    e
5    d
dtype: object
>>> s.sort_index().loc[1:6]
2    c
3    b
4    e
5    d
dtype: object
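
As shown above, we can sort a Series by its index labels and then filter by label. Because the sorted index is monotonic, the slice 1:6 works even though neither 1 nor 6 is a label in the index.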
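
Sorting matters here. As a sketch of the contrast, the same slice on the unsorted Series fails, because pandas cannot place the missing bounds 1 and 6 in an index that is not monotonic. The flag `raised` is a hypothetical helper name used only to record the outcome.

```python
import pandas as pd

s = pd.Series(list("abcde"), index=[0, 3, 2, 5, 4])

# Sorted index: the bounds 1 and 6 are not labels, but the slice still works.
in_range = s.sort_index().loc[1:6]
print(in_range.index.tolist())  # [2, 3, 4, 5]

# Unsorted index: the same slice cannot be resolved and raises KeyError.
try:
    s.loc[1:6]
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```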

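That's all for how to use the pandas .loc method! All the code in this article can be found on my GitHub. If you spot any mistakes in the article or the code, please point them out, and feel free to leave a comment!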
