Differences between pandas loc, iloc, and ix

loc[]: Access a group of rows and columns of a DataFrame by label(s) or a boolean array.

Allowed inputs (the part inside the brackets of df.loc[]) are:

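  • A single label, such as 5 or 'a'. Note that 5 is interpreted as a label of the index, never as an integer position along it.

  • A list or array of labels, such as ['a', 'b', 'c'].

  • A slice of labels, such as 'a':'c'. Unlike a positional slice such as 0:2, which excludes the element at position 2, a label slice includes the row at the stop label 'c'.

  • A boolean array of the same length as the axis, such as [True, False, True]. On an index of ['a', 'b', 'c'] this selects the same rows as ['a', 'c'].

  • A callable that takes one argument (the calling Series or DataFrame; older pandas versions also list Panel) and returns valid output for indexing (any of the inputs listed above).

    For background on callables, see https://www.aliyun.com/jiaocheng/462985.html?spm=5176.100033.2.15.6cba1f38Y84nok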
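The input kinds listed above can be sketched on a small frame whose integer labels deliberately do not match their positions. The frame and the names `frame`, `by_label`, `label_slice`, `masked`, and `picked` are illustrative assumptions, not part of the original post.

```python
import pandas as pd

# Hypothetical frame: the index labels 5, 6, 7 sit at positions 0, 1, 2.
frame = pd.DataFrame({"value": [10, 20, 30]}, index=[5, 6, 7])

# loc treats 5 as a label, so this reads the first row, not the sixth.
by_label = frame.loc[5, "value"]

# A label slice includes both endpoints.
label_slice = frame.loc[5:6, "value"].tolist()

# A boolean list must be as long as the axis; True marks the rows to keep.
masked = frame.loc[[True, False, True], "value"].tolist()

# A callable receives the frame itself and returns any valid indexer.
picked = frame.loc[lambda d: d["value"] > 15, "value"].tolist()

print(by_label, label_slice, masked, picked)
```

Using integer labels here makes the label-versus-position distinction visible; with string labels the two cannot be confused in the first place.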

Example:

import pandas as pd
data=[[1,2,3,4],[5,6,7,8],[9,10,11,12]]
print(type(data))

index=['a','b','c']
columns=['d','e','f','g']

df=pd.DataFrame(data=data,index=index,columns=columns)
df

   d   e   f   g
a  1   2   3   4
b  5   6   7   8
c  9  10  11  12

df.loc['a']  # returns row 'a' as a Series
type(df.loc['a'])  # pandas.core.series.Series
df.loc[['a']]  # returns row 'a' as a DataFrame
type(df.loc[['a']])  # pandas.core.frame.DataFrame
# In short: a single label in [] returns a Series, a list of labels in [[]] returns a DataFrame
df.loc['a':'c']  # returns rows 'a' through 'c' as a DataFrame
type(df.loc['a':'c'])  # pandas.core.frame.DataFrame

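df.loc['a', 'f']  # returns the element at row 'a', column 'f'
df.loc[:, 'f']  # returns column 'f' as a Series
df.loc[['a', 'c'], ['d', 'f']]  # returns rows 'a', 'c' and columns 'd', 'f' as a DataFrame
df.loc[:, 'd':'f']  # returns all rows, columns 'd' through 'f', as a DataFrame
# In short: slices are written without surrounding [], while label lists or arrays need []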

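df['g']  # returns column 'g' as a Series, equivalent to df.loc[:, 'g']
df['g'].equals(df.loc[:, 'g'])  # True
df.loc[df['g'] > 6]  # returns every row whose 'g' value is greater than 6, as a DataFrame
# A boolean condition behaves like a slice: passed as the row input to loc[], it returns a DataFrame
df.loc[df['g'] > 6, ['g']]  # DataFrame (column given as a list)
df.loc[df['g'] > 6, 'g']  # Series (column given as a single label)
df.loc[lambda df: df['g'] == 8]  # DataFrame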

# Assignment uses the same indexing as selection
df.loc[['a', 'c'], ['d', 'f']] = 10  # sets every element in rows 'a', 'c' and columns 'd', 'f' to 10
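As a rough check of the assignment rule above, the following sketch rebuilds the same 3x4 frame, applies the label-list assignment, and then applies a second assignment through a boolean row mask. The second assignment is an added illustration, not something the original example does.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    index=["a", "b", "c"],
    columns=["d", "e", "f", "g"],
)

# Label lists on both axes: the four selected cells receive the scalar.
df.loc[["a", "c"], ["d", "f"]] = 10

# Boolean mask on rows plus a single column label: only the matching
# rows ('b' and 'c', where g > 6) are changed, and only in column 'e'.
df.loc[df["g"] > 6, "e"] = 0

print(df)
```

Doing the selection and the assignment in a single loc call keeps the write on the original frame rather than on an intermediate object.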

# A DataFrame with a MultiIndex, using the example from the official documentation
tuples = [
    ('cobra', 'mark i'), ('cobra', 'mark ii'),
    ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
    ('viper', 'mark ii'), ('viper', 'mark iii')
]
index = pd.MultiIndex.from_tuples(tuples)
values = [[12, 2], [0, 4], [10, 20],
          [1, 4], [7, 1], [16, 36]]
dfm = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)
dfm
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36
dfm.loc['cobra']  # returns a DataFrame
dfm.loc[('cobra', 'mark ii')]  # Series
dfm.loc['cobra', 'mark i']  # Series
dfm.loc['cobra', 'mark i'].equals(dfm.loc[('cobra', 'mark i')])  # True
dfm.loc[[('cobra', 'mark ii')]]  # DataFrame
dfm.loc[('cobra', 'mark i'), 'shield']  # scalar: one index tuple plus one column label
dfm.loc[('cobra', 'mark i'):'viper']  # DataFrame
dfm.loc['cobra':'viper']  # DataFrame
dfm.loc[('cobra', 'mark i'):'viper'].equals(dfm.loc['cobra':'viper'])  # True
dfm.loc[('cobra', 'mark i'):('viper', 'mark ii')]  # DataFrame
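To make the result shapes above concrete, here is a minimal sketch on the same MultiIndex frame. It also uses DataFrame.xs to select on the inner level, which goes slightly beyond the calls shown above; the names `outer`, `upto`, and `inner` are illustrative assumptions.

```python
import pandas as pd

tuples = [
    ("cobra", "mark i"), ("cobra", "mark ii"),
    ("sidewinder", "mark i"), ("sidewinder", "mark ii"),
    ("viper", "mark ii"), ("viper", "mark iii"),
]
values = [[12, 2], [0, 4], [10, 20], [1, 4], [7, 1], [16, 36]]
dfm = pd.DataFrame(
    values,
    columns=["max_speed", "shield"],
    index=pd.MultiIndex.from_tuples(tuples),
)

# A partial key on the outer level drops that level from the result.
outer = dfm.loc["cobra"]

# A tuple-to-tuple slice includes both endpoints.
upto = dfm.loc[("cobra", "mark i"):("viper", "mark ii")]

# xs selects on the inner level and drops it from the result.
inner = dfm.xs("mark ii", level=1)

print(outer.index.tolist(), len(upto), inner.index.tolist())
```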

iloc[]:

Access a DataFrame by integer position (0 to length-1) or by a boolean array, treating the DataFrame as a two-dimensional array.

For example:

dfm.iloc[0]#Series
dfm.iloc[0:1]#DataFrame
dfm.iloc[0:3,0:1]#DataFrame
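The difference between positional and label slicing can be sketched side by side. The frame below reuses the 3x4 layout from the loc example; the variable names are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    index=["a", "b", "c"],
    columns=["d", "e", "f", "g"],
)

# iloc slices are half-open: positions 0 and 1, not 2.
by_position = df.iloc[0:2]

# loc slices are closed: labels 'a' through 'b' inclusive.
by_label = df.loc["a":"b"]

same_rows = by_position.equals(by_label)

# An integer returns a Series, a list of integers returns a DataFrame.
row_kind = type(df.iloc[0]).__name__
frame_kind = type(df.iloc[[0]]).__name__

# A boolean list works with iloc as well.
kept = df.iloc[[True, False, True]].index.tolist()

print(same_rows, row_kind, frame_kind, kept)
```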

ix[]: deprecated since pandas 0.20.0 and removed in pandas 1.0.0; use loc[] or iloc[] instead.
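Code that mixed a label on one axis with a position on the other through ix can usually be rewritten with loc or iloc by converting one side. The sketch below shows two possible rewrites of a hypothetical `df.ix['a', 0]` lookup; it is one approach, not the only one.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    index=["a", "b", "c"],
    columns=["d", "e", "f", "g"],
)

# Old mixed style (no longer available in current pandas): df.ix['a', 0]

# Option 1: turn the column position into a label, then use loc.
via_loc = df.loc["a", df.columns[0]]

# Option 2: turn the row label into a position, then use iloc.
# Index.get_loc returns an integer when the label is unique.
via_iloc = df.iloc[df.index.get_loc("a"), 0]

print(via_loc, via_iloc)
```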

Original documentation

Type:        property
String form: <property object at 0x0000021B3606B728>
Docstring:  
Access a group of rows and columns by label(s) or a boolean array.

``.loc[]`` is primarily label based, but may also be used with a
boolean array.

Allowed inputs are:

- A single label, e.g. ``5`` or ``'a'``, (note that ``5`` is
  interpreted as a *label* of the index, and **never** as an
  integer position along the index).
- A list or array of labels, e.g. ``['a', 'b', 'c']``.
- A slice object with labels, e.g. ``'a':'f'``.

  .. warning:: Note that contrary to usual python slices, **both** the
      start and the stop are included

- A boolean array of the same length as the axis being sliced,
  e.g. ``[True, False, True]``.
- A ``callable`` function with one argument (the calling Series, DataFrame
  or Panel) and that returns valid output for indexing (one of the above)

See more at :ref:`Selection by Label <indexing.label>`

See Also
--------
DataFrame.at : Access a single value for a row/column label pair
DataFrame.iloc : Access group of rows and columns by integer position(s)
DataFrame.xs : Returns a cross-section (row(s) or column(s)) from the
    Series/DataFrame.
Series.loc : Access group of values using labels

Examples
--------
**Getting values**

>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
...      index=['cobra', 'viper', 'sidewinder'],
...      columns=['max_speed', 'shield'])
>>> df
            max_speed  shield
cobra               1       2
viper               4       5
sidewinder          7       8

Single label. Note this returns the row as a Series.

>>> df.loc['viper']
max_speed    4
shield       5
Name: viper, dtype: int64

List of labels. Note using ``[[]]`` returns a DataFrame.

>>> df.loc[['viper', 'sidewinder']]
            max_speed  shield
viper               4       5
sidewinder          7       8

Single label for row and column

>>> df.loc['cobra', 'shield']
2

Slice with labels for row and single label for column. As mentioned
above, note that both the start and stop of the slice are included.

>>> df.loc['cobra':'viper', 'max_speed']
cobra    1
viper    4
Name: max_speed, dtype: int64

Boolean list with the same length as the row axis

>>> df.loc[[False, False, True]]
            max_speed  shield
sidewinder          7       8

Conditional that returns a boolean Series

>>> df.loc[df['shield'] > 6]
            max_speed  shield
sidewinder          7       8

Conditional that returns a boolean Series with column labels specified

>>> df.loc[df['shield'] > 6, ['max_speed']]
            max_speed
sidewinder          7

Callable that returns a boolean Series

>>> df.loc[lambda df: df['shield'] == 8]
            max_speed  shield
sidewinder          7       8

**Setting values**

Set value for all items matching the list of labels

>>> df.loc[['viper', 'sidewinder'], ['shield']] = 50
>>> df
            max_speed  shield
cobra               1       2
viper               4      50
sidewinder          7      50

Set value for an entire row

>>> df.loc['cobra'] = 10
>>> df
            max_speed  shield
cobra              10      10
viper               4      50
sidewinder          7      50

Set value for an entire column

>>> df.loc[:, 'max_speed'] = 30
>>> df
            max_speed  shield
cobra              30      10
viper              30      50
sidewinder         30      50

Set value for rows matching callable condition

>>> df.loc[df['shield'] > 35] = 0
>>> df
            max_speed  shield
cobra              30      10
viper               0       0
sidewinder          0       0

**Getting values on a DataFrame with an index that has integer labels**

Another example using integers for the index

>>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
...      index=[7, 8, 9], columns=['max_speed', 'shield'])
>>> df
   max_speed  shield
7          1       2
8          4       5
9          7       8

Slice with integer labels for rows. As mentioned above, note that both
the start and stop of the slice are included.

>>> df.loc[7:9]
   max_speed  shield
7          1       2
8          4       5
9          7       8

**Getting values with a MultiIndex**

A number of examples using a DataFrame with a MultiIndex

>>> tuples = [
...    ('cobra', 'mark i'), ('cobra', 'mark ii'),
...    ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
...    ('viper', 'mark ii'), ('viper', 'mark iii')
... ]
>>> index = pd.MultiIndex.from_tuples(tuples)
>>> values = [[12, 2], [0, 4], [10, 20],
...         [1, 4], [7, 1], [16, 36]]
>>> df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)
>>> df
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36

Single label. Note this returns a DataFrame with a single index.

>>> df.loc['cobra']
         max_speed  shield
mark i          12       2
mark ii          0       4

Single index tuple. Note this returns a Series.

>>> df.loc[('cobra', 'mark ii')]
max_speed    0
shield       4
Name: (cobra, mark ii), dtype: int64

Single label for row and column. Similar to passing in a tuple, this
returns a Series.

>>> df.loc['cobra', 'mark i']
max_speed    12
shield        2
Name: (cobra, mark i), dtype: int64

Single tuple. Note using ``[[]]`` returns a DataFrame.

>>> df.loc[[('cobra', 'mark ii')]]
               max_speed  shield
cobra mark ii          0       4

Single tuple for the index with a single label for the column

>>> df.loc[('cobra', 'mark i'), 'shield']
2

Slice from index tuple to single label

>>> df.loc[('cobra', 'mark i'):'viper']
                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36

Slice from index tuple to index tuple

>>> df.loc[('cobra', 'mark i'):('viper', 'mark ii')]
                    max_speed  shield
cobra      mark i          12       2
           mark ii          0       4
sidewinder mark i          10      20
           mark ii          1       4
viper      mark ii          7       1

Raises
------
KeyError:
    when any items are not found
Type:        property
String form: <property object at 0x0000021B3606B688>
Docstring:  
Purely integer-location based indexing for selection by position.

``.iloc[]`` is primarily integer position based (from ``0`` to
``length-1`` of the axis), but may also be used with a boolean
array.

Allowed inputs are:

- An integer, e.g. ``5``.
- A list or array of integers, e.g. ``[4, 3, 0]``.
- A slice object with ints, e.g. ``1:7``.
- A boolean array.
- A ``callable`` function with one argument (the calling Series, DataFrame
  or Panel) and that returns valid output for indexing (one of the above)

``.iloc`` will raise ``IndexError`` if a requested indexer is
out-of-bounds, except *slice* indexers which allow out-of-bounds
indexing (this conforms with python/numpy *slice* semantics).

See more at :ref:`Selection by Position <indexing.integer>`
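The out-of-bounds rule in the iloc docstring can be checked with a short sketch. The Series and the flag names below are illustrative assumptions.

```python
import pandas as pd

s = pd.Series([10, 20, 30])

# A single out-of-bounds integer raises IndexError.
try:
    s.iloc[5]
    scalar_raised = False
except IndexError:
    scalar_raised = True

# A list containing an out-of-bounds integer raises IndexError too.
try:
    s.iloc[[0, 5]]
    list_raised = False
except IndexError:
    list_raised = True

# A slice that runs past the end is clipped, as with Python lists.
clipped = s.iloc[1:10].tolist()

print(scalar_raised, list_raised, clipped)
```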
