In NumPy arrays, we are familiar with concepts such as indexing, slicing, and masking. Similarly, Pandas supports indexing in its DataFrames. If we are familiar with indexing in NumPy arrays, indexing in Pandas will be very easy.
What is Indexing in Pandas?
Selecting values from particular rows and columns of a DataFrame is known as indexing. By using indexing, we can select all rows and some columns, or some rows and all columns.
Let’s create some sample data in the form of a Series to better understand indexing.
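A minimal sketch of such a Series, assuming the variable name ds used in the commands below and the labels 1, 3, 5 shown in the printed output:

```python
import pandas as pd

# Values 'a', 'b', 'c' stored under the explicit index labels 1, 3, 5
ds = pd.Series(["a", "b", "c"], index=[1, 3, 5])
print(ds)
```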
The output series looks like this,
1 a
3 b
5 c
dtype: object
Here, Pandas offers two types of index:
- Explicit: the index labels we assigned (1, 3, 5)
- Implicit: the integer positions of the elements (0, 1, 2)
Explicit Indexing:
For the series above, if we pass the command

ds[1]

it uses the explicit index, so the output is 'a' (the value whose label is 1).
This is explicit indexing. If instead we pass the command

ds[1:3]

it uses the implicit index style (positions 1 and 2, with the end position excluded), so the output is:
3 b
5 c
dtype: object
This mix of label-based indexing and position-based slicing can be confusing. To avoid the confusion, Pandas offers some special indexer attributes:
- loc: the loc attribute allows indexing and slicing that always references the explicit index.
- iloc: the iloc attribute allows indexing and slicing that always references the implicit index style.
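Applied to the series ds from above, the two indexers can be sketched as follows (the series is re-created so the snippet runs on its own). One detail worth noting: a .loc slice includes its end label, while an .iloc slice excludes its end position, as with ordinary Python slices.

```python
import pandas as pd

ds = pd.Series(["a", "b", "c"], index=[1, 3, 5])

# .loc always uses the explicit index labels
print(ds.loc[1])     # label 1 -> 'a'
print(ds.loc[1:3])   # labels 1 through 3, end label included -> 'a', 'b'

# .iloc always uses the implicit integer positions
print(ds.iloc[1])    # position 1 -> 'b'
print(ds.iloc[1:3])  # positions 1 and 2, end position excluded -> 'b', 'c'
```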
A guiding principle of Python code, from the Zen of Python (PEP 20), is “explicit is better than implicit.”
Let’s take a sample dataset and see how indexing can be performed in different formats.
We are using a dataset of NBA players from Kaggle.
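Since we don't load the actual Kaggle file here, the following is a minimal sketch of a tiny hypothetical stand-in: a DataFrame with some of the columns this article uses (Age, College, Draft Year) plus a made-up Team column, indexed by placeholder player names. All names and values are invented for illustration and are not the real NBA data.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle NBA data: invented players and values
df = pd.DataFrame(
    {
        "Team": ["Team 1", "Team 2", "Team 3"],
        "Age": [24, 31, 27],
        "College": ["College X", "College Y", "College Z"],
        "Draft Year": [2016, 2009, 2013],
    },
    index=["Player A", "Player B", "Player C"],
)
print(df)
```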
Single Column
To display a single column of the DataFrame, we put the column name in square brackets after the DataFrame, for example inside a print statement.
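A minimal sketch on a small hypothetical stand-in DataFrame (invented players and values, not the Kaggle data). Selecting one column by name returns a Series indexed by the row labels.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle NBA data: invented players and values
df = pd.DataFrame(
    {
        "Team": ["Team 1", "Team 2", "Team 3"],
        "Age": [24, 31, 27],
        "College": ["College X", "College Y", "College Z"],
        "Draft Year": [2016, 2009, 2013],
    },
    index=["Player A", "Player B", "Player C"],
)

ages = df["Age"]  # one column -> a Series
print(ages)
```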
Multiple Columns
Let’s try to display the ‘Age’, ‘College’ and ‘Draft Year’ columns for the players. We can display multiple columns by passing a list of column names inside the square brackets.
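Sketched on a small hypothetical stand-in DataFrame (invented data), passing a list of names (note the double brackets) returns a DataFrame restricted to those columns, in the order listed.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle NBA data: invented players and values
df = pd.DataFrame(
    {
        "Team": ["Team 1", "Team 2", "Team 3"],
        "Age": [24, 31, 27],
        "College": ["College X", "College Y", "College Z"],
        "Draft Year": [2016, 2009, 2013],
    },
    index=["Player A", "Player B", "Player C"],
)

# A list of column names -> a DataFrame with just those columns
subset = df[["Age", "College", "Draft Year"]]
print(subset)
```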
.loc Method
Indexing using the .loc method: with .loc, we pass the data using its label, that is, the index names of the rows and the names of the columns.
Single Row
To display a single row of the DataFrame, we pass the row’s index label to the .loc method.
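A minimal sketch on a hypothetical stand-in DataFrame whose rows are labeled with made-up player names (an assumption; a frame with the default integer index would use integer labels instead). The result is a Series whose index is the column names.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle NBA data: invented players and values
df = pd.DataFrame(
    {
        "Team": ["Team 1", "Team 2", "Team 3"],
        "Age": [24, 31, 27],
        "College": ["College X", "College Y", "College Z"],
        "Draft Year": [2016, 2009, 2013],
    },
    index=["Player A", "Player B", "Player C"],
)

row = df.loc["Player B"]  # select one row by its index label
print(row)
```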
Multiple Rows
As with a single row, we pass the row labels to .loc, this time as a list, to display the information for several rows.
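A minimal sketch on a hypothetical stand-in DataFrame (invented players and values), selecting two of the made-up row labels:

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle NBA data: invented players and values
df = pd.DataFrame(
    {
        "Team": ["Team 1", "Team 2", "Team 3"],
        "Age": [24, 31, 27],
        "College": ["College X", "College Y", "College Z"],
        "Draft Year": [2016, 2009, 2013],
    },
    index=["Player A", "Player B", "Player C"],
)

# A list of row labels -> a DataFrame with just those rows
rows = df.loc[["Player A", "Player C"]]
print(rows)
```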
Selecting Rows and Columns
We can also select multiple rows and multiple columns at a time using the .loc method, by passing a list of row labels and a list of column labels separated by a comma.
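A minimal sketch on a hypothetical stand-in DataFrame (invented data): the first list picks the rows by label, the second picks the columns by name.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle NBA data: invented players and values
df = pd.DataFrame(
    {
        "Team": ["Team 1", "Team 2", "Team 3"],
        "Age": [24, 31, 27],
        "College": ["College X", "College Y", "College Z"],
        "Draft Year": [2016, 2009, 2013],
    },
    index=["Player A", "Player B", "Player C"],
)

# Rows by label, then columns by name
part = df.loc[["Player A", "Player C"], ["Age", "College"]]
print(part)
```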
All Rows and Some Columns
To display all the rows with only some of the columns using the .loc method, we pass “:” for the rows and a list of column names for the columns.
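A minimal sketch on a hypothetical stand-in DataFrame (invented data), where “:” means every row:

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle NBA data: invented players and values
df = pd.DataFrame(
    {
        "Team": ["Team 1", "Team 2", "Team 3"],
        "Age": [24, 31, 27],
        "College": ["College X", "College Y", "College Z"],
        "Draft Year": [2016, 2009, 2013],
    },
    index=["Player A", "Player B", "Player C"],
)

# ":" selects all rows; the list selects the columns by name
cols = df.loc[:, ["Age", "College"]]
print(cols)
```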
The same output can be achieved by simply giving the column names without the .loc method, as shown in the Multiple Columns section.
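A quick check of that equivalence, sketched on a hypothetical stand-in DataFrame (invented data); DataFrame.equals compares the two results element by element, including their labels.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle NBA data: invented players and values
df = pd.DataFrame(
    {
        "Team": ["Team 1", "Team 2", "Team 3"],
        "Age": [24, 31, 27],
        "College": ["College X", "College Y", "College Z"],
        "Draft Year": [2016, 2009, 2013],
    },
    index=["Player A", "Player B", "Player C"],
)

with_loc = df.loc[:, ["Age", "College"]]
without_loc = df[["Age", "College"]]
print(with_loc.equals(without_loc))  # True
```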
.iloc Method
Indexing using the .iloc method: with .iloc, we pass the data using its position. It is very similar to the .loc method; the only difference is that .iloc uses integer positions (starting at 0) to extract the information.
Single Row
To get the information for a single row, we pass a single integer position to the .iloc method.
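A minimal sketch on a hypothetical stand-in DataFrame (invented data); position 0 is the first row regardless of its label.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle NBA data: invented players and values
df = pd.DataFrame(
    {
        "Team": ["Team 1", "Team 2", "Team 3"],
        "Age": [24, 31, 27],
        "College": ["College X", "College Y", "College Z"],
        "Draft Year": [2016, 2009, 2013],
    },
    index=["Player A", "Player B", "Player C"],
)

row = df.iloc[0]  # first row by position (labeled "Player A" here)
print(row)
```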
Multiple Rows
To select multiple rows, we pass a list of the positions of the selected rows.
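A minimal sketch on a hypothetical stand-in DataFrame (invented data), selecting the first and third rows by position:

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle NBA data: invented players and values
df = pd.DataFrame(
    {
        "Team": ["Team 1", "Team 2", "Team 3"],
        "Age": [24, 31, 27],
        "College": ["College X", "College Y", "College Z"],
        "Draft Year": [2016, 2009, 2013],
    },
    index=["Player A", "Player B", "Player C"],
)

rows = df.iloc[[0, 2]]  # a list of row positions
print(rows)
```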
Selecting Rows and Columns
To display specific rows and columns, we create a list of integer positions for the rows and another for the columns, and pass both to the .iloc method.
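A minimal sketch on a hypothetical stand-in DataFrame (invented data), where column positions 1 and 2 happen to be Age and College:

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle NBA data: invented players and values
df = pd.DataFrame(
    {
        "Team": ["Team 1", "Team 2", "Team 3"],
        "Age": [24, 31, 27],
        "College": ["College X", "College Y", "College Z"],
        "Draft Year": [2016, 2009, 2013],
    },
    index=["Player A", "Player B", "Player C"],
)

# Rows at positions 0 and 2; columns at positions 1 (Age) and 2 (College)
part = df.iloc[[0, 2], [1, 2]]
print(part)
```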
All Rows and Some Columns
To display all the rows with only some of the columns, we pass “:” for the rows and a list of integer positions for the columns.
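A minimal sketch on a hypothetical stand-in DataFrame (invented data); in this frame the result matches selecting the Age and College columns by name with .loc.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle NBA data: invented players and values
df = pd.DataFrame(
    {
        "Team": ["Team 1", "Team 2", "Team 3"],
        "Age": [24, 31, 27],
        "College": ["College X", "College Y", "College Z"],
        "Draft Year": [2016, 2009, 2013],
    },
    index=["Player A", "Player B", "Player C"],
)

# ":" selects all rows; the list selects columns by position
cols = df.iloc[:, [1, 2]]
print(cols)
```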
If the columns in the dataset are of “int” or “float” type, we can apply numeric operations to them directly and manipulate the data as required.
Numeric Operations using .loc Method
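A minimal sketch on a hypothetical stand-in DataFrame (invented data), applying one arbitrary numeric operation, adding 1 to every player’s age, to a column selected with .loc; the specific operation is our own illustrative choice.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle NBA data: invented players and values
df = pd.DataFrame(
    {
        "Team": ["Team 1", "Team 2", "Team 3"],
        "Age": [24, 31, 27],
        "College": ["College X", "College Y", "College Z"],
        "Draft Year": [2016, 2009, 2013],
    },
    index=["Player A", "Player B", "Player C"],
)

# Arithmetic on an integer column is applied element-wise
age_next_year = df.loc[:, "Age"] + 1
print(age_next_year)
```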
Numeric Operations using .iloc Method
The output will be the same as the previous one with the .loc method.
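A minimal sketch of an illustrative operation (adding 1 to each age) done with .iloc on a hypothetical stand-in DataFrame (invented data), where column position 1 is Age. For comparison, the snippet also computes the .loc version and checks that the two results are identical.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle NBA data: invented players and values
df = pd.DataFrame(
    {
        "Team": ["Team 1", "Team 2", "Team 3"],
        "Age": [24, 31, 27],
        "College": ["College X", "College Y", "College Z"],
        "Draft Year": [2016, 2009, 2013],
    },
    index=["Player A", "Player B", "Player C"],
)

via_iloc = df.iloc[:, 1] + 1      # column position 1 is "Age" in this frame
via_loc = df.loc[:, "Age"] + 1    # the same column selected by name
print(via_iloc.equals(via_loc))   # True
```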
Conclusion
We can conclude this article with three simple statements:
- To avoid confusion between explicit and implicit indices, we use the .loc and .iloc methods.
- The .loc method is used for label-based indexing.
- The .iloc method is used for position-based indexing.

These are the three main points we need to be aware of while using indexing methods on a Pandas DataFrame in Python.
Thank you for reading and Happy Coding!!!
Check out my previous articles about Python here
Seaborn: Python
Pandas: Python
Matplotlib: Python
NumPy: Python
Data Visualization and its Importance: Python
Time Complexity and Its Importance in Python
Python Recursion or Recursive Function in Python
Original article: https://towardsdatascience.com/indexing-in-pandas-dataframe-using-python-63dcc6242323