Python Data Analysis Tech Stack (《Python数据分析技术栈》), Chapter 06: Preparing Data with Pandas, 07 Modifying DataFrame objects

In this section, we will learn how to rename columns, replace values, and add and delete columns and rows.

Renaming columns

Column names can be changed with the rename method. A dictionary is passed to this method's columns parameter: its keys are the old column names, and its values are the new column names.

combined_ages.rename(columns={'class 1':'batch 1','class 2':'batch 2','class 3':'batch 3'},inplace=True)
combined_ages

We set inplace=True so that the change is made to the actual DataFrame object. Without it, rename returns a renamed copy and leaves combined_ages unchanged.
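
The following minimal sketch shows what inplace changes. It uses a small hypothetical DataFrame named ages with made-up values, which stands in for combined_ages because that data is not shown in this section.

```python
import pandas as pd

# Hypothetical two-row DataFrame standing in for combined_ages
ages = pd.DataFrame({'class 1': [22, 35], 'class 2': [28, 40], 'class 3': [19, 50]})

# Without inplace=True, rename returns a renamed copy
renamed = ages.rename(columns={'class 1': 'batch 1'})
print(list(renamed.columns))  # ['batch 1', 'class 2', 'class 3']
print(list(ages.columns))     # ['class 1', 'class 2', 'class 3'] (unchanged)

# With inplace=True, the original object is modified and None is returned
result = ages.rename(columns={'class 1': 'batch 1'}, inplace=True)
print(result)                 # None
print(list(ages.columns))     # ['batch 1', 'class 2', 'class 3']
```

By default, rename silently ignores dictionary keys that do not match an existing column, because its errors parameter defaults to 'ignore'. Passing errors='raise' catches typos in the old names.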

Columns can also be renamed by assigning a list of new column names directly to the columns attribute, as shown in the following example.

combined_ages.columns=['batch 1','batch 2','batch 3']

Assigning to the columns attribute is a more direct way to rename columns, and the change is made to the original DataFrame object. The disadvantage is that you need to remember the order of the columns in the DataFrame and must supply a name for every column. With the rename method, the dictionary states explicitly which column names are being changed.
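
A minimal sketch of this order dependence, using a hypothetical DataFrame named ages with made-up values. The new names are matched to the columns by position, and a list of the wrong length is rejected.

```python
import pandas as pd

# Hypothetical data standing in for combined_ages
ages = pd.DataFrame({'class 1': [22, 35], 'class 2': [28, 40], 'class 3': [19, 50]})

# Positional renaming: the new names must be listed in column order
ages.columns = ['batch 1', 'batch 2', 'batch 3']
print(list(ages.columns))  # ['batch 1', 'batch 2', 'batch 3']

# Supplying the wrong number of names raises a ValueError
# and leaves the existing names in place
mismatch_raised = False
try:
    ages.columns = ['batch 1', 'batch 2']
except ValueError as err:
    mismatch_raised = True
    print('ValueError:', err)
```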

Replacing values or observations in a DataFrame

The replace method can be used to replace values in a DataFrame. Again, we can use a dictionary, where each key/value pair gives an old value and the new value that replaces it. Here, we replace the value 22 with the value 33. Like rename, replace returns a new DataFrame by default. To keep the change, assign the result back or pass inplace=True.

combined_ages.replace({22:33})
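
A minimal sketch, again using a hypothetical DataFrame named ages. It shows that replace returns a new object, and that a nested dictionary of the form {column: {old: new}} limits the replacement to a single column.

```python
import pandas as pd

# Hypothetical data; 22 appears in both columns
ages = pd.DataFrame({'class 1': [22, 35], 'class 2': [22, 40]})

# Flat dict: replace 22 with 33 everywhere; returns a new DataFrame
everywhere = ages.replace({22: 33})

# Nested dict {column: {old: new}}: replace 22 only in 'class 2'
one_column = ages.replace({'class 2': {22: 33}})

print(everywhere)
print(one_column)
print(ages)  # the original still contains 22
```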

Adding a new column to a DataFrame

There are four ways to insert a new column in a DataFrame, as shown in Table 6-3.

Using the indexing operator, []: We can add a column by writing its name as a string inside the indexing operator and assigning values to it. A list of values must contain exactly one value per row.

combined_ages['class 4']=[18,40]
combined_ages
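
A short sketch of the indexing operator on a hypothetical DataFrame named ages. A list needs one value per row, a scalar is broadcast to all rows, and a new column can be computed from existing ones. The column names 'school' and 'total' are for illustration only.

```python
import pandas as pd

ages = pd.DataFrame({'class 1': [22, 35], 'class 2': [28, 40]})

ages['class 4'] = [18, 40]                         # one value per row
ages['school'] = 'A'                               # a scalar is broadcast to every row
ages['total'] = ages['class 1'] + ages['class 2']  # computed from existing columns
print(ages)

# A list whose length differs from the number of rows raises a ValueError
length_error = False
try:
    ages['class 5'] = [1, 2, 3]
except ValueError as err:
    length_error = True
    print('ValueError:', err)
```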

Using the insert method: The insert method adds a column at a chosen position and takes three required arguments. The first is the integer position at which to insert the new column. In this case it is 2, which means the new column is added as the third column of our DataFrame. The second is the name of the new column ("class 0" in this example). The third holds the values of the new column, here the list [18, 35]. insert modifies the DataFrame in place. It raises a ValueError if a column with the same name already exists.

combined_ages.insert(2,'class 0',[18,35])
combined_ages
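
The following sketch uses a hypothetical DataFrame named ages to show where insert places the new column and what happens when the name is already taken.

```python
import pandas as pd

ages = pd.DataFrame({'class 1': [22, 35], 'class 2': [28, 40], 'class 3': [19, 50]})

# Insert 'class 0' at position 2, i.e. as the third column; modifies ages in place
ages.insert(2, 'class 0', [18, 35])
print(list(ages.columns))  # ['class 1', 'class 2', 'class 0', 'class 3']

# Inserting a column name that already exists raises a ValueError
# (unless allow_duplicates=True is passed)
duplicate_rejected = False
try:
    ages.insert(0, 'class 0', [1, 2])
except ValueError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```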

Using the loc indexer: The loc indexer is generally used to retrieve values from Series and DataFrames, but it can also be used to add a column. In the following statement, the : operator selects all rows. It is followed by the name of the column to be added, and the values for that column are enclosed in a list. If a column with that name already exists, its values are overwritten instead.

combined_ages.loc[:,'class 4']=[20,40]
combined_ages
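
A minimal sketch with a hypothetical DataFrame named ages, showing both behaviors of assigning through loc. A new label creates a column, while an existing label overwrites that column's values.

```python
import pandas as pd

ages = pd.DataFrame({'class 1': [22, 35], 'class 4': [18, 40]})

ages.loc[:, 'class 5'] = [31, 48]  # new label: a column is appended
ages.loc[:, 'class 4'] = [20, 40]  # existing label: its values are overwritten
print(ages)
```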

Using the concat function: First, the column to be added is defined as a Series object named "class 5". It is then added to the DataFrame object using the pd.concat function. The axis needs to be set to 1 because the new data is being added along the column axis. The Series' name becomes the label of the new column, and an unnamed Series would appear as a column labeled 0. Because pd.concat aligns rows by index label, the Series should have the same index as the DataFrame.

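import pandas as pd
# name= sets the label of the new column; an unnamed Series would be labeled 0
class5=pd.Series([31,48],name='class 5')
combined_ages=pd.concat([combined_ages,class5],axis=1)
combined_ages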
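
To show why the Series should be named, here is a sketch that concatenates an unnamed Series and a named Series onto a hypothetical DataFrame named ages.

```python
import pandas as pd

ages = pd.DataFrame({'class 1': [22, 35], 'class 2': [28, 40]})

unnamed = pd.Series([31, 48])
named = pd.Series([31, 48], name='class 5')

# axis=1 places each Series beside the DataFrame as a new column;
# rows are aligned on the index labels 0 and 1
print(list(pd.concat([ages, unnamed], axis=1).columns))  # ['class 1', 'class 2', 0]
print(list(pd.concat([ages, named], axis=1).columns))    # ['class 1', 'class 2', 'class 5']
```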

In summary, we can add a column to a DataFrame using the indexing operator, loc indexer, insert method, or concat function. The most straightforward and commonly used method for adding a column is by using the indexing operator [].

Inserting rows in a DataFrame

There are two ways to add rows to a DataFrame: the append method or the concat function, as shown in Table 6-4. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0. On current pandas versions, rows are added with pd.concat.

Using the append method: The data to be added is defined as a dictionary, and this dictionary is passed as the argument to the append method. Setting ignore_index=True is required here. Appending a dictionary (or an unnamed Series) without it raises a TypeError, because pandas has no index label for the new row. With ignore_index=True, the index of the result is reset. So when using append, we either pass ignore_index=True or give a Series a name, which becomes the new row's index label, before appending it. Note that append has no inplace parameter. To keep the change, we must assign the returned object back to the original variable, as in the following code.

# DataFrame.append exists only in pandas < 2.0 (it was removed in pandas 2.0)
combined_ages=combined_ages.append({'class 1':35,'class 2':33,'class 3':21},ignore_index=True)
combined_ages
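
Because append is gone from pandas 2.0, here is a sketch of an equivalent for current versions. It assumes a hypothetical DataFrame named ages with the default 0, 1, ... index. The dictionary is wrapped in a one-row DataFrame and joined with pd.concat.

```python
import pandas as pd

ages = pd.DataFrame({'class 1': [22, 35], 'class 2': [28, 40], 'class 3': [19, 50]})

# pandas >= 2.0: wrap the dict in a one-row DataFrame and concatenate
row = {'class 1': 35, 'class 2': 33, 'class 3': 21}
ages = pd.concat([ages, pd.DataFrame([row])], ignore_index=True)

print(ages)
print(ages.index.tolist())  # [0, 1, 2]
```

With a default index, ages.loc[len(ages)] = [35, 33, 21] is another common way to add a single row in place. It relies on len(ages) not already being an index label.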

Using the pd.concat function: pd.concat can also add new rows. The new row is defined as a DataFrame object, and pd.concat is called with a list containing the two DataFrames: the original DataFrame and the new row defined as a DataFrame. Like append, concat returns a new object, so we assign the result back to keep it. Passing ignore_index=True renumbers the index. Without it, the new row keeps its own index label 0, and the result contains a duplicate label.

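new_row=pd.DataFrame([{'class 1':32,'class 2':37,'class 3':41}])
# ignore_index=True renumbers the rows, avoiding a duplicate index label 0
combined_ages=pd.concat([combined_ages,new_row],ignore_index=True)
combined_ages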
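
A minimal sketch of what ignore_index changes in the result of pd.concat. It assumes a hypothetical DataFrame named ages with the default index.

```python
import pandas as pd

ages = pd.DataFrame({'class 1': [22, 35], 'class 2': [28, 40], 'class 3': [19, 50]})
new_row = pd.DataFrame([{'class 1': 32, 'class 2': 37, 'class 3': 41}])

kept = pd.concat([ages, new_row])                         # index labels kept as-is
renumbered = pd.concat([ages, new_row], ignore_index=True)

print(kept.index.tolist())        # [0, 1, 0]: label 0 now appears twice
print(renumbered.index.tolist())  # [0, 1, 2]
```

Without ignore_index, kept.loc[0] returns two rows, which is usually not what you want.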

In summary, we can add rows to a DataFrame with either the append method (pandas versions before 2.0 only) or the concat function.

Deleting columns from a DataFrame

Three methods can be used to delete a column from a DataFrame, as shown in Table 6-5.

Using the del statement: The following statement deletes the column named "class 3". Note that the deletion occurs in place, that is, in the original DataFrame itself.

del combined_ages['class 3']
combined_ages
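
A small sketch on a hypothetical DataFrame named ages. It confirms that del changes the original object and raises a KeyError for a column that does not exist.

```python
import pandas as pd

ages = pd.DataFrame({'class 1': [22, 35], 'class 2': [28, 40], 'class 3': [19, 50]})

del ages['class 3']         # removes the column from ages itself; nothing is returned
print(list(ages.columns))   # ['class 1', 'class 2']

# Deleting a column that does not exist raises a KeyError
missing_raised = False
try:
    del ages['class 3']
except KeyError:
    missing_raised = True
    print('no such column')
```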

Using the pop method: The pop method deletes a column in place and returns the deleted column as a Series object.

combined_ages.pop('class 2')
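
A sketch with a hypothetical DataFrame named ages. It shows that pop both removes the column and returns it, so the removed data can be kept in a variable.

```python
import pandas as pd

ages = pd.DataFrame({'class 1': [22, 35], 'class 2': [28, 40], 'class 3': [19, 50]})

removed = ages.pop('class 2')          # removes the column in place and returns it
print(type(removed).__name__)          # Series
print(removed.name, removed.tolist())  # class 2 [28, 40]
print(list(ages.columns))              # ['class 1', 'class 3']
```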

Using the drop method: The column(s) to be dropped are given as strings in a list, which is passed as an argument to the drop method. Since drop removes rows (axis=0) by default, we need to set axis=1 to remove a column. Unlike the del statement and the pop method, drop does not modify the original DataFrame by default; it returns a new DataFrame instead. To make the deletion in the original object, we add the inplace=True parameter.

combined_ages.drop(['class 1'],axis=1,inplace=True)
combined_ages
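
A minimal sketch with a hypothetical DataFrame named ages. It shows that drop leaves the original untouched unless inplace=True is passed, and that the columns= keyword is an alternative to axis=1.

```python
import pandas as pd

ages = pd.DataFrame({'class 1': [22, 35], 'class 2': [28, 40], 'class 3': [19, 50]})

# Without inplace=True, drop returns a new DataFrame; ages is unchanged
trimmed = ages.drop(['class 1'], axis=1)

# The columns= keyword is an equivalent way of saying axis=1
same = ages.drop(columns=['class 1'])

print(list(trimmed.columns))  # ['class 2', 'class 3']
print(list(ages.columns))     # ['class 1', 'class 2', 'class 3']
print(trimmed.equals(same))   # True
```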

To sum up, we can use the del statement, the pop method, or the drop method to delete a column from a DataFrame.

Deleting a row from a DataFrame

There are two ways to remove rows from a DataFrame: using a Boolean selection or using the drop method, as shown in Table 6-6.

Using a Boolean selection: We build a Boolean condition that is True for the rows we do not want. We then negate it with the NOT operator (~) so that only the remaining rows are selected. Here, we remove every row that contains a value less than 50.

# keep only the rows in which no value is less than 50
combined_ages[~(combined_ages<50).any(axis=1)]
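
The following sketch uses a hypothetical DataFrame named ages with made-up values. It contrasts removing rows with .any(axis=1) against indexing with a Boolean DataFrame, which only masks individual values and keeps every row.

```python
import pandas as pd

ages = pd.DataFrame({'class 1': [22, 55], 'class 2': [60, 70], 'class 3': [51, 80]})

too_young = (ages < 50).any(axis=1)  # True for rows that contain a value below 50
kept = ages[~too_young]              # keep only the other rows
print(kept.index.tolist())           # [1]

# For contrast: a Boolean DataFrame used as the key masks individual
# values (they become NaN) instead of removing rows
masked = ages[~(ages < 50)]
print(masked)
```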

Using the drop method: Here, we remove the second row, whose index label is 1. To remove more than one row, pass the index labels of those rows as a list. As with columns, drop returns a new DataFrame unless inplace=True is set.

combined_ages.drop(1)
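
A short sketch with a hypothetical three-row DataFrame named ages. It shows that drop works with index labels, accepts a list, and keeps the original unchanged unless the result is assigned back.

```python
import pandas as pd

ages = pd.DataFrame({'class 1': [22, 35, 41], 'class 2': [28, 40, 33]})

one_gone = ages.drop(1)         # drops the row with index label 1
two_gone = ages.drop([0, 2])    # a list removes several rows at once

print(one_gone.index.tolist())  # [0, 2]
print(two_gone.index.tolist())  # [1]
print(len(ages))                # 3: the original is untouched

ages = ages.drop(index=[0])     # reassigning keeps the change
print(ages.index.tolist())      # [1, 2]
```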

Thus, we can use either a Boolean selection or the drop method to remove rows from a DataFrame. Since the drop method can remove both rows and columns, it offers a uniform way to delete either. Remember to pass the required parameters to drop: to remove columns, add axis=1. For the changes to be reflected in the original DataFrame, add inplace=True, or assign the result back to the original variable.
