The difference between pandas loc and iloc

Contents

1. Characteristics of the two

2. The official documentation

3. Examples: one for every situation


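1. Characteristics of the two

  • loc accepts strings, integers, and booleans as indexers; it is label-based indexing.

Note: an integer here is interpreted as a label of the index, not as a position along the index. Label slices include both endpoints.

  • iloc is position-based indexing, much like list indexing: it accepts integers (a single integer, a list of integers, or an integer slice) and boolean arrays, but never labels.

Note: an integer here is interpreted as a position along the index; slices are half-open (start included, stop excluded).

The name loc stands for location; the i in iloc stands for integer.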
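The distinction matters most when the index labels are themselves integers. The following is a minimal sketch using a made-up Series (the labels 10 to 40 are chosen only for illustration) to show that loc reads the integer as a label while iloc reads it as a position, and that the two slice differently:

```python
import pandas as pd

# Hypothetical Series whose integer labels do not coincide with positions.
s = pd.Series(["a", "b", "c", "d"], index=[10, 20, 30, 40])

by_label = s.loc[20]        # the element whose label is 20
by_position = s.iloc[2]     # the element at position 2 (the third one)

label_slice = s.loc[10:30]  # label slice: both 10 and 30 are included
pos_slice = s.iloc[0:2]     # position slice: positions 0 and 1 only

print(by_label, by_position)
print(list(label_slice.index))
print(list(pos_slice.index))
```

Here s.loc[2] would fail, because no label equals 2, even though position 2 exists.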

 

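In plain words

  • Indexing by index label or column label:

df.loc["Adam", "Age"] # returns the value at the row labeled "Adam" and the column labeled "Age".

df.loc["Adam"]            # returns the whole row labeled "Adam" as a Series whose index is df's column labels and whose values are that row's values.

  • Indexing by position in df:

df.iloc[2, 3] # returns the value at row position 2 and column position 3 (the third row and fourth column, counting from 0).

df.iloc[1:5, 3:6] # returns the rows at positions 1 through 4 and the columns at positions 3 through 5, as a DataFrame.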
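To try these calls, a small DataFrame is enough. The data below is invented for illustration (the names, columns, and values are assumptions, not taken from any real dataset), and the positions are adjusted to fit its 3x3 shape:

```python
import pandas as pd

# Hypothetical data: row labels are names, column labels are attributes.
df = pd.DataFrame(
    {
        "Age": [31, 25, 42],
        "Height": [180, 165, 172],
        "City": ["Oslo", "Lima", "Kyiv"],
    },
    index=["Adam", "Beth", "Carl"],
)

age = df.loc["Adam", "Age"]   # scalar at row label "Adam", column label "Age"
row = df.loc["Adam"]          # Series indexed by the column labels
cell = df.iloc[2, 1]          # scalar at third row, second column
block = df.iloc[0:2, 1:3]     # DataFrame: row positions 0-1, column positions 1-2

print(age, cell)
print(row)
print(block)
```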

 

2. The official documentation

DataFrame.loc

Access a group of rows and columns by label(s) or a boolean array.

.loc[] is primarily label based, but may also be used with a boolean array.

Allowed inputs are:

  • A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index, and never as an integer position along the index).

  • A list or array of labels, e.g. ['a', 'b', 'c'].

  • A slice object with labels, e.g. 'a':'f'.

    Warning: Note that contrary to usual python slices, both the start and the stop are included.

  • A boolean array of the same length as the axis being sliced, e.g. [True, False, True].

  • callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above)
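As a rough illustration of the label-slice rule and of integers being treated as labels, here is a sketch on the same small DataFrame used in the examples section. The exact exception type for a missing integer label has varied across pandas versions, so the sketch catches both KeyError and TypeError:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [4, 5], [7, 8]],
    index=["cobra", "viper", "sidewinder"],
    columns=["max_speed", "shield"],
)

# Label slice on rows: the stop label "viper" is included.
rows = df.loc["cobra":"viper"]

# Label slice on columns; ":" keeps every row.
cols = df.loc[:, "max_speed":"shield"]

# An integer is looked up as a label. This index has no label 1,
# so the lookup fails instead of returning the second row.
try:
    df.loc[1]
    missing = False
except (KeyError, TypeError):
    missing = True

print(rows)
print(cols)
print(missing)
```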

DataFrame.iloc

Purely integer-location based indexing for selection by position.

.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.

Allowed inputs are:

  • An integer, e.g. 5.
  • A list or array of integers, e.g. [4, 3, 0].
  • A slice object with ints, e.g. 1:7.
  • A boolean array.
  • callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above). This is useful in method chains, when you don’t have a reference to the calling object, but would like to base your selection on some value.
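Each of the allowed iloc inputs listed above can be tried on the small DataFrame from the examples section. This is only a sketch; the variable names are arbitrary:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [4, 5], [7, 8]],
    index=["cobra", "viper", "sidewinder"],
    columns=["max_speed", "shield"],
)

single = df.iloc[1]                     # an integer: second row, as a Series
picked = df.iloc[[2, 0]]                # a list of integers: rows in the order given
sliced = df.iloc[0:2]                   # an integer slice: positions 0 and 1
masked = df.iloc[[True, False, True]]   # a boolean array: keep rows where True
called = df.iloc[lambda d: [0, 2]]      # a callable returning a valid indexer

print(single)
print(picked)
print(sliced)
print(masked)
print(called)
```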

.iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing (this conforms with python/numpy slice semantics).
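The out-of-bounds rule can be checked with a short sketch; the three-row DataFrame here is made up for the purpose:

```python
import pandas as pd

# Hypothetical three-row DataFrame (valid positions are 0, 1, 2).
df = pd.DataFrame({"a": [1, 2, 3]})

# A single out-of-bounds position raises IndexError.
try:
    df.iloc[5]
    raised = False
except IndexError:
    raised = True

# Slices are clipped to the valid range instead of raising.
clipped = df.iloc[1:10]   # only the rows at positions 1 and 2 exist
empty = df.iloc[5:10]     # nothing in range: an empty DataFrame

print(raised, len(clipped), empty.empty)
```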

 

3. Examples: one for every situation

  • loc

Reading values:

# Create df:
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
...      index=['cobra', 'viper', 'sidewinder'],
...      columns=['max_speed', 'shield'])
>>> df
            max_speed  shield
cobra               1       2
viper               4       5
sidewinder          7       8

# Select one row of df: the row is returned as a Series
>>> df.loc['viper']
max_speed    4
shield       5
Name: viper, dtype: int64

# Select several rows of df: the result is returned as a DataFrame
>>> df.loc[['viper', 'sidewinder']] # note: pass a list of labels, hence the double brackets [[ ]]
            max_speed  shield
viper               4       5
sidewinder          7       8

# Select a single value of df:
>>> df.loc['cobra', 'shield']
2

# A list of booleans also works as a selector: True keeps the row, False drops it
>>> df.loc[[False, False, True]]
            max_speed  shield
sidewinder          7       8

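# A condition evaluates to a boolean Series, which can be used for selection as well
# Keep the rows whose 'shield' value is greater than 6, with all their columns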
>>> df.loc[df['shield'] > 6]
            max_speed  shield
sidewinder          7       8
# Keep the rows whose 'shield' value is greater than 6, but return only the ['max_speed'] column (like selecting the weight of everyone taller than 1.8 m)
>>> df.loc[df['shield'] > 6, ['max_speed']]
            max_speed
sidewinder          7


# A lambda that returns a boolean Series can also be used for selection
>>> df.loc[lambda df: df['shield'] == 8]
            max_speed  shield
sidewinder          7       8

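Assigning values:

This works the same way as reading values: select the target cells with loc, then assign to them.

# Here all_data is an existing DataFrame with a "GarageType" column; replace its missing values with "No Garage"
all_data.loc[all_data["GarageType"].isnull(), ["GarageType"]] = "No Garage"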
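Since all_data is not defined in this post, the following is a self-contained sketch of the same assignment pattern. The DataFrame contents and the GarageCars column are invented for illustration, and a position-based assignment with iloc is added for comparison:

```python
import pandas as pd

# Hypothetical stand-in for all_data; values are made up.
all_data = pd.DataFrame(
    {
        "GarageType": ["Attchd", None, "Detchd", None],
        "GarageCars": [2, 0, 1, 0],
    }
)

# Boolean mask marking the rows where GarageType is missing.
mask = all_data["GarageType"].isnull()

# Label-based assignment: masked rows, "GarageType" column.
all_data.loc[mask, "GarageType"] = "No Garage"

# Position-based assignment: first row, second column.
all_data.iloc[0, 1] = 3

print(all_data)
```

Assigning through a single loc or iloc call, rather than chaining two selections such as all_data[mask]["GarageType"] = ..., makes sure the write reaches the original DataFrame.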

 
