Using iloc, loc, and ix to Select Rows and Columns in Pandas DataFrames


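Pandas Data Selection

There are multiple ways to select and index rows and columns from Pandas DataFrames. I find that online tutorials tend to focus on advanced row and column selections, which are more complex than what I usually need.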

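Selection Options

There are three main options for selecting and indexing data in Pandas, which can be confusing. The three selection cases and methods covered in this post are:

  1. Selecting data by row number (.iloc)
  2. Selecting data by label or by a conditional statement (.loc)
  3. Selecting in a hybrid approach (.ix) (deprecated since Pandas 0.20 and removed in Pandas 1.0)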

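Data Setup

This blog post, inspired by other tutorials, describes selection activities with these operations. The tutorial is aimed at the general data science situation in which I typically find myself:

  1. Each row in the data frame represents a data sample.
  2. Each column is a variable, and is usually named. I rarely select columns without their names.
  3. I need to quickly and often select relevant rows from the data frame for modelling and visualisation activities.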

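For the uninitiated, the Pandas library for Python provides high-performance, easy-to-use data structures and data analysis tools for handling tabular data in "Series" and "DataFrames". It makes your data handling much easier, and I have written before about grouping and summarising data with Pandas.

Summary of the iloc and loc methods discussed in this blog post. iloc and loc are operations for retrieving data from Pandas DataFrames.

Selection and Indexing Methods for Pandas DataFrames

For these explorations we need some sample data. I downloaded the uk-500 sample data set from www.briandunning.com. This data contains artificial names, addresses, companies and phone numbers for fictitious UK characters. To follow along, you can download the .csv file (the URL appears in the loading code). Load the data as follows (the screenshots here are from a Jupyter notebook in the Anaconda Python installation):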

import pandas as pd
import random

# Read the data from the downloaded CSV file.
data = pd.read_csv('https://s3-eu-west-1.amazonaws.com/shanebucket/downloads/uk-500.csv')
# Set a numeric id to use as an index in the examples.
data['id'] = [random.randint(0, 1000) for x in range(data.shape[0])]

data.head(5)
  
Example data loaded from the CSV file.
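
1. Selecting pandas data using "iloc"

# Single selections using iloc and DataFrame
# Rows:
data.iloc[0]      # first row of data frame (Aleshia Tomkiewicz) - note the Series data type output.
data.iloc[1]      # second row of data frame (Evan Zigomalas)
data.iloc[-1]     # last row of data frame (Mi Richan)
# Columns:
data.iloc[:, 0]   # first column of data frame (first_name)
data.iloc[:, 1]   # second column of data frame (last_name)
data.iloc[:, -1]  # last column of data frame (id)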
  

Multiple columns and rows can be selected together using the .iloc indexer.

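# Multiple row and column selections using iloc and DataFrame
data.iloc[0:5]                       # first five rows of the data frame
data.iloc[:, 0:2]                    # first two columns of the data frame, with all rows
data.iloc[[0, 3, 6, 24], [0, 5, 6]]  # 1st, 4th, 7th, 25th rows + 1st, 6th, 7th columns
data.iloc[0:5, 5:8]                  # first 5 rows and the 6th, 7th, 8th columns (county -> phone1)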
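
For readers who do not have the CSV to hand, the same selections can be tried on a tiny frame. This is only a sketch: the frame `df` and its four made-up rows are hypothetical stand-ins for the uk-500 data.

```python
import pandas as pd

# Hypothetical stand-in for the uk-500 data: four made-up rows, four columns.
df = pd.DataFrame({
    "first_name": ["Ann", "Bob", "Cat", "Dan"],
    "last_name": ["Ash", "Birch", "Cedar", "Dale"],
    "city": ["Leeds", "York", "Bath", "Hull"],
    "id": [10, 20, 30, 40],
})

first_two_rows = df.iloc[0:2]      # rows at positions 0 and 1
first_two_cols = df.iloc[:, 0:2]   # all rows, columns at positions 0 and 1
picked = df.iloc[[0, 3], [0, 2]]   # rows 0 and 3, columns 0 and 2
block = df.iloc[1:3, 2:4]          # rows 1 and 2, columns 2 and 3

print(picked)
```

Lists pick out individual positions, while slices pick out contiguous runs; the two can be mixed between the row and column selectors.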
  

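There are two gotchas to remember when using iloc in this manner:

  1. Note that .iloc returns a Pandas Series when a single row or a single column is selected with an integer, and a Pandas DataFrame when multiple rows or columns are selected. To get a DataFrame back for a single row or column, pass a single-valued list, for example data.iloc[[0]].

    When using .loc or .iloc, you can control the output format by passing lists or single values to the selectors.

  2. When selecting multiple columns or multiple rows in this manner, remember that in a selection such as [1:5], the rows/columns selected run from the first number to one minus the second number. For example, [1:5] gives 1, 2, 3, 4, and [x:y] runs from x to y-1.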
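
Both gotchas can be checked directly. The following is a minimal sketch on a hypothetical six-row frame (the column names `a` and `b` are made up for illustration):

```python
import pandas as pd

# Hypothetical frame used only to inspect return types and slice bounds.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6], "b": list("uvwxyz")})

row_as_series = df.iloc[0]      # integer position -> Series
row_as_frame = df.iloc[[0]]     # one-element list -> DataFrame with one row
col_as_series = df.iloc[:, 0]   # integer position -> Series
col_as_frame = df.iloc[:, [0]]  # one-element list -> DataFrame with one column

sliced = df.iloc[1:5]           # positions 1, 2, 3, 4 (position 5 is excluded)

print(type(row_as_series).__name__, type(row_as_frame).__name__)
print(sliced.index.tolist())
```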

In practice, I rarely use the iloc indexer, unless I want the first (.iloc[0]) or the last (.iloc[-1]) row of the data frame.

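2. Selecting pandas data using "loc"

The Pandas loc indexer can be used with DataFrames for two different use cases:

  • a.) Selecting rows by label/index
  • b.) Selecting rows with a boolean / conditional lookup

The loc indexer is used with the same syntax as iloc: data.loc[<row selection>, <column selection>].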

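2a. Label-based / Index-based indexing using .loc

Selections using the loc method are based on the index of the data frame (if any). Where an index is set on a DataFrame using df.set_index(), the .loc method selects rows directly by their index values. For example, setting the index of our test data frame to each person's "last_name":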

data.set_index("last_name", inplace=True)
data.head()
  

Last name set as the index on the sample data frame. Now, with the index set, we can select rows directly by their "last_name" value using .loc[<label>].

Note that in the last example, data.loc[487] (the row with index value 487) is not equal to data.iloc[487] (the row at integer position 487 in the data). The index of the DataFrame can be out of numeric order, and/or a string or multi-value.
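
The difference between labels and positions is easier to see on a small frame. This is a sketch under stated assumptions: the frames `people` and `numbered`, and every value in them, are made up for illustration and are not taken from the uk-500 data.

```python
import pandas as pd

# Hypothetical frame indexed by a string label, as after set_index("last_name").
people = pd.DataFrame({
    "last_name": ["Ash", "Birch", "Cedar"],
    "city": ["Leeds", "York", "Bath"],
}).set_index("last_name")

one = people.loc["Birch"]               # single label -> Series
several = people.loc[["Ash", "Cedar"]]  # list of labels -> DataFrame

# Hypothetical frame whose integer labels are deliberately out of order.
numbered = pd.DataFrame({"city": ["Leeds", "York", "Bath"]}, index=[2, 0, 1])

by_label = numbered.loc[0, "city"]   # the row whose index label is 0
by_position = numbered.iloc[0, 0]    # the row at position 0

print(by_label, by_position)
```

Here the label 0 belongs to the second row, so the two lookups return different cities.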

2b. Boolean / Logical indexing using .loc

Conditional selection with boolean arrays using data.loc[] is the most common method that I use with Pandas DataFrames. With boolean indexing or logical selection, you pass an array or Series of True/False values to the .loc indexer to select the rows where your Series has True values.

In most use cases, you will make selections based on the values of different columns in your data set.

For example, the statement data['first_name'] == 'Antonio' produces a Pandas Series with a True/False value for every row in the 'data' DataFrame, with True for the rows where the first_name is "Antonio". This type of boolean array can be passed directly to the .loc indexer:

Using a boolean True/False Series to select rows in a pandas data frame: all rows with a first name of "Antonio" are selected.
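
A minimal sketch of the boolean mask itself, on a hypothetical four-row frame (the rows are made up; only the `first_name` column name and the value "Antonio" come from the text):

```python
import pandas as pd

# Hypothetical rows standing in for the sample data.
df = pd.DataFrame({
    "first_name": ["Antonio", "Bea", "Antonio", "Cy"],
    "city": ["Leeds", "York", "Bath", "Hull"],
})

mask = df["first_name"] == "Antonio"  # boolean Series, one value per row
antonios = df.loc[mask]               # keeps only the rows where mask is True

print(mask.tolist())
print(antonios)
```

Building the mask as a named variable first makes it easy to inspect before it is used for selection.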

As before, a second argument can be passed to .loc to select particular columns out of the data frame. Again, columns are referred to by name for the loc indexer and can be a single string, a list of columns, or a slice “:” operation.

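Selecting multiple columns with loc can be achieved by passing column names to the second argument of .loc[]. Note that when selecting columns, if only one column is selected, the .loc operator returns a Series. For a single-column DataFrame, use a one-element list to keep the DataFrame format, for example: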

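.loc returns a Series or a DataFrame depending on the selection. If a single column is selected as a string, a Series is returned from .loc. Pass a list to get a DataFrame back.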
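
The column selector behaves the same way when combined with a row mask. A short sketch, again on a hypothetical frame with made-up rows:

```python
import pandas as pd

# Hypothetical rows; column names follow the ones used in the text.
df = pd.DataFrame({
    "first_name": ["Antonio", "Bea", "Antonio"],
    "city": ["Leeds", "York", "Bath"],
    "email": ["a@example.com", "b@example.com", "c@example.com"],
})
mask = df["first_name"] == "Antonio"

as_series = df.loc[mask, "city"]            # single column name -> Series
as_frame = df.loc[mask, ["city"]]           # one-element list -> DataFrame
two_cols = df.loc[mask, ["city", "email"]]  # list of names -> DataFrame

print(type(as_series).__name__, type(as_frame).__name__)
```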

For clarity, make sure you understand the following additional examples of .loc selections:

# Select rows with first name Antonio,
# and all columns between 'city' and 'email'
data.loc[data['first_name'] == 'Antonio', 'city':'email']
# Select rows where the email column ends with 'hotmail.com', include all columns
data.loc[data['email'].str.endswith("hotmail.com")]
# Select rows with first_name equal to some values, all columns
data.loc[data['first_name'].isin(['France', 'Tyisha', 'Eric'])]
# Select rows with first name Antonio AND gmail email addresses
data.loc[data['email'].str.endswith("gmail.com") & (data['first_name'] == 'Antonio')]
# select rows with id column between 100 and 200, and just return 'postal' and 'web' columns
data.loc[(data['id'] > 100) & (data['id'] <= 200), ['postal', 'web']]
# A lambda function that yields True/False values can also be used.
# Select rows where the company name has 4 words in it.
data.loc[data['company_name'].apply(lambda x: len(x.split(' ')) == 4)]
# Selections can be achieved outside of the main .loc for clarity:
# Form a separate variable with your selections:
idx = data['company_name'].apply(lambda x: len(x.split(' ')) == 4)
# Select only the True values in 'idx' and only the 3 columns specified:
data.loc[idx, ['email', 'first_name', 'company_name']]
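
One detail in the examples above is worth checking by hand: a label slice such as 'city':'email' in .loc includes both endpoints, unlike the positional slices used with .iloc. A minimal sketch, assuming a hypothetical frame whose columns follow the order used in the text:

```python
import pandas as pd

# Hypothetical two-row frame; only the column names come from the text.
df = pd.DataFrame({
    "first_name": ["Ann", "Bob"],
    "city": ["Leeds", "York"],
    "postal": ["LS1", "YO1"],
    "email": ["a@example.com", "b@example.com"],
    "web": ["a.example", "b.example"],
})

by_label = df.loc[:, "city":"email"]  # label slice: 'email' IS included
by_position = df.iloc[:, 1:4]         # positional slice: position 4 is NOT included

rows_by_label = df.loc[0:1]           # labels 0 and 1 -> two rows
rows_by_position = df.iloc[0:1]       # position 0 only -> one row

print(by_label.columns.tolist())
```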
  
  


Logical selections and boolean Series can also be passed to the generic [] indexer of a pandas DataFrame and will give the same results: data.loc[data['id'] == 9] returns the same rows as data[data['id'] == 9].
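
The equivalence can be confirmed with DataFrame.equals, which returns a single True/False rather than an element-wise comparison. A sketch on a hypothetical frame (the `id` values are made up):

```python
import pandas as pd

# Hypothetical frame with a repeated id value.
df = pd.DataFrame({
    "id": [9, 3, 9, 7],
    "first_name": ["Ann", "Bob", "Cat", "Dan"],
})

via_loc = df.loc[df["id"] == 9]    # explicit .loc with a boolean Series
via_brackets = df[df["id"] == 9]   # generic [] indexer with the same Series

same = via_loc.equals(via_brackets)  # one boolean for the whole comparison
print(same)
```

The .loc form has the advantage that a column selection can be added as a second argument in the same statement.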

3. Selecting pandas data using ix

Note: The ix indexer was deprecated in Pandas 0.20 and removed entirely in Pandas 1.0, so the examples in this section only run on older versions.

The ix[] indexer is a hybrid of .loc and .iloc. Generally, ix is label based and acts just as the .loc indexer. However, .ix also supports integer type selections (as in .iloc) where passed an integer. This only works where the index of the DataFrame is not integer based. ix will accept any of the inputs of .loc and .iloc.

Because ix is slightly more complex, I prefer to use .iloc and .loc explicitly to avoid unexpected results.

As an example:

# ix indexing works just the same as .loc when passed strings
data.ix[['Andrade']] == data.loc[['Andrade']]
# ix indexing works the same as .iloc when passed integers.
data.ix[[33]] == data.iloc[[33]]
# ix only works in both modes when the index of the DataFrame is NOT an integer itself.
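
On Pandas versions where ix is no longer available, the same selections can be written with .loc and .iloc. The following is one possible sketch rather than an official migration recipe; the frame and its labels are hypothetical, and Index.get_loc is used to turn a label into a position.

```python
import pandas as pd

# Hypothetical frame with a string index, as after set_index("last_name").
df = pd.DataFrame(
    {"first_name": ["Ann", "Bob", "Cat"], "city": ["Leeds", "York", "Bath"]},
    index=["Ash", "Birch", "Cedar"],
)

label_only = df.loc[["Birch"]]   # what ix did when passed a label
position_only = df.iloc[[1]]     # what ix did when passed an integer

# Mixed case (row by label, column by position), written two ways:
mixed_a = df.loc["Birch", df.columns[1]]         # convert the position to a label
mixed_b = df.iloc[df.index.get_loc("Birch"), 1]  # convert the label to a position

print(mixed_a, mixed_b)
```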
  
  


Setting values in DataFrames using .loc

With a slight change of syntax, you can actually update your DataFrame in the same statement as you select and filter using .loc indexer. This particular pattern allows you to update values in columns depending on different conditions. The setting operation does not make a copy of the data frame, but edits the original data.

As an example:

# Change the first name of all rows with an ID greater than 2000 to "John"
data.loc[data['id'] > 2000, "first_name"] = "John"
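
The effect of such an assignment is easy to verify on a small frame. This sketch uses hypothetical rows with id values chosen so that the condition matches some of them:

```python
import pandas as pd

# Hypothetical rows; two of the three ids exceed 2000.
df = pd.DataFrame({
    "id": [100, 2500, 3000],
    "first_name": ["Ann", "Bob", "Cat"],
})

# Select rows by condition and a column by name, then assign in one statement.
df.loc[df["id"] > 2000, "first_name"] = "John"

print(df["first_name"].tolist())
```

Doing the row and column selection in a single .loc call matters here: a chained form such as df[df["id"] > 2000]["first_name"] = "John" may assign to a temporary copy and leave the original frame unchanged.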
  
  


Those are the basics of indexing and selecting with Pandas. If you're looking for more, take a look at the .iat and .at operations in the Pandas documentation for faster single-value accessors, and at selecting by callable functions for more iloc and loc fun.
