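When working with tabular data in Pandas, you sometimes need to assign a value to a single cell in a particular row or column, and sometimes need other operations such as adding, deleting, or modifying rows and columns.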
import pandas as pd
import numpy as np
Create a DataFrame
dates = np.arange(20200101,20200107)
df1 = pd.DataFrame(np.arange(24).reshape((6,4)),index=dates,columns=['A','B','C','D'])
df1
index | A | B | C | D |
---|---|---|---|---|
20200101 | 0 | 1 | 2 | 3 |
20200102 | 4 | 5 | 6 | 7 |
20200103 | 8 | 9 | 10 | 11 |
20200104 | 12 | 13 | 14 | 15 |
20200105 | 16 | 17 | 18 | 19 |
20200106 | 20 | 21 | 22 | 23 |
df1.iloc[2,2]
10
Assign a new value to a specific position
df1.iloc[2,2] = 100
df1
index | A | B | C | D |
---|---|---|---|---|
20200101 | 0 | 1 | 2 | 3 |
20200102 | 4 | 5 | 6 | 7 |
20200103 | 8 | 9 | 100 | 11 |
20200104 | 12 | 13 | 14 | 15 |
20200105 | 16 | 17 | 18 | 19 |
20200106 | 20 | 21 | 22 | 23 |
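As a small sketch of how a label maps to a position, the snippet below looks up the integer positions with `Index.get_loc` and writes the same cell through `.iat`, the scalar-only positional accessor. The name `demo` is a stand-in used only for this illustration.

```python
import numpy as np
import pandas as pd

dates = np.arange(20200101, 20200107)
demo = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

# Translate labels into zero-based integer positions.
row_pos = demo.index.get_loc(20200103)
col_pos = demo.columns.get_loc('C')

# .iat reads or writes exactly one cell by position.
demo.iat[row_pos, col_pos] = 100

# .iloc with two integers addresses the same cell.
value = demo.iloc[2, 2]
```

`.iloc` also accepts slices and lists, while `.iat` is limited to a single cell, so `.iat` is mostly useful when a single scalar is all that is needed.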
Assign a new value by row label and column name
df1.loc[20200102,'B'] = 200
df1
index | A | B | C | D |
---|---|---|---|---|
20200101 | 0 | 1 | 2 | 3 |
20200102 | 4 | 200 | 6 | 7 |
20200103 | 8 | 9 | 100 | 11 |
20200104 | 12 | 13 | 14 | 15 |
20200105 | 16 | 17 | 18 | 19 |
20200106 | 20 | 21 | 22 | 23 |
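Label-based assignment is not limited to one cell. The sketch below, using an illustrative frame called `demo`, sets a single cell with `.at` and then a block of cells with a `.loc` label slice. Unlike positional slices, label slices include both endpoints.

```python
import numpy as np
import pandas as pd

dates = np.arange(20200101, 20200107)
demo = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

# .at is the label-based counterpart of .iat: one row label, one column label.
demo.at[20200102, 'B'] = 200

# A label slice in .loc includes both 20200104 and 20200105.
demo.loc[20200104:20200105, ['C', 'D']] = -1
```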
Test whether a row or column meets a condition first, then assign
df1[df1.A > 10] = 0
df1
index | A | B | C | D |
---|---|---|---|---|
20200101 | 0 | 1 | 2 | 3 |
20200102 | 4 | 200 | 6 | 7 |
20200103 | 8 | 9 | 100 | 11 |
20200104 | 0 | 0 | 0 | 0 |
20200105 | 0 | 0 | 0 | 0 |
20200106 | 0 | 0 | 0 | 0 |
df1.loc[df1.A == 0, 'A'] = 1 # a single .loc call avoids chained assignment, which can warn or fail to update df1
df1
index | A | B | C | D |
---|---|---|---|---|
20200101 | 1 | 1 | 2 | 3 |
20200102 | 4 | 200 | 6 | 7 |
20200103 | 8 | 9 | 100 | 11 |
20200104 | 1 | 0 | 0 | 0 |
20200105 | 1 | 0 | 0 | 0 |
20200106 | 1 | 0 | 0 | 0 |
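The two conditional assignments can be sketched with a boolean mask passed to `.loc`, which selects and assigns in one step. The last line shows `Series.mask` as a non-mutating alternative. The frame `demo` and the replacement value `-1` are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

dates = np.arange(20200101, 20200107)
demo = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])

# Row-wise condition: set every column to 0 in rows where A > 10.
demo.loc[demo['A'] > 10, :] = 0

# Single-column condition: replace zeros in A only.
demo.loc[demo['A'] == 0, 'A'] = 1

# Non-mutating variant: returns a new Series, demo['B'] itself is untouched.
masked_b = demo['B'].mask(demo['B'] == 0, -1)
```

Passing the row mask and the column label to one `.loc` call keeps the write on the original frame rather than on an intermediate object.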
Add a column
df1['E'] = 10 # add a column; the scalar is broadcast to every row
df1
index | A | B | C | D | E |
---|---|---|---|---|---|
20200101 | 1 | 1 | 2 | 3 | 10 |
20200102 | 4 | 200 | 6 | 7 | 10 |
20200103 | 8 | 9 | 100 | 11 | 10 |
20200104 | 1 | 0 | 0 | 0 | 10 |
20200105 | 1 | 0 | 0 | 0 | 10 |
20200106 | 1 | 0 | 0 | 0 | 10 |
Add a column from a Series
df1['F'] = pd.Series([1,2,3,4,5,6],index=dates)
df1
index | A | B | C | D | E | F |
---|---|---|---|---|---|---|
20200101 | 1 | 1 | 2 | 3 | 10 | 1 |
20200102 | 4 | 200 | 6 | 7 | 10 | 2 |
20200103 | 8 | 9 | 100 | 11 | 10 | 3 |
20200104 | 1 | 0 | 0 | 0 | 10 | 4 |
20200105 | 1 | 0 | 0 | 0 | 10 | 5 |
20200106 | 1 | 0 | 0 | 0 | 10 | 6 |
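A Series assigned as a column is matched to the frame by index label, not by position. The sketch below uses a deliberately incomplete Series (the names `demo` and `partial` are illustrative) to show that labels missing from the Series become NaN, whereas a plain list is assigned by position.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({'A': [1, 2, 3]}, index=[20200101, 20200102, 20200103])

# The Series has no entry for 20200102, so that row receives NaN.
partial = pd.Series([10, 30], index=[20200101, 20200103])
demo['F'] = partial

# A list carries no index, so it is assigned in order and must match the row count.
demo['G'] = [7, 8, 9]
```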
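Assign values by row label and a list of columns. Because 20200107 is not yet in the index, this adds a new row, leaves the unassigned columns as NaN, and upcasts the integer columns to float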
df1.loc[20200107,['A','B','C']] = [1,2,3]
df1
index | A | B | C | D | E | F |
---|---|---|---|---|---|---|
20200101 | 1.0 | 1.0 | 2.0 | 3.0 | 10.0 | 1.0 |
20200102 | 4.0 | 200.0 | 6.0 | 7.0 | 10.0 | 2.0 |
20200103 | 8.0 | 9.0 | 100.0 | 11.0 | 10.0 | 3.0 |
20200104 | 1.0 | 0.0 | 0.0 | 0.0 | 10.0 | 4.0 |
20200105 | 1.0 | 0.0 | 0.0 | 0.0 | 10.0 | 5.0 |
20200106 | 1.0 | 0.0 | 0.0 | 0.0 | 10.0 | 6.0 |
20200107 | 1.0 | 2.0 | 3.0 | NaN | NaN | NaN |
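The NaN values in the new row are the reason the columns turn into floats. As a hedged sketch, one way to get integers back is to fill the gaps and cast. The fill value `0` is an assumption for this illustration and should be chosen to suit the data.

```python
import pandas as pd

demo = pd.DataFrame({'A': [0, 4], 'B': [1, 5], 'C': [2, 6]}, index=[20200101, 20200102])

# 20200103 is a new label, so .loc enlarges the frame; C gets no value.
demo.loc[20200103, ['A', 'B']] = [1, 2]
c_is_missing = pd.isna(demo.loc[20200103, 'C'])

# Fill the gaps with an assumed placeholder, then cast back to integers.
restored = demo.fillna(0).astype('int64')
```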
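Add a row with pd.concat (DataFrame.append was removed in pandas 2.0)
s1 = pd.Series([1,2,3,4,5,6],index=['A','B','C','D','E','F'])
s1.name = 'S1' # the Series name becomes the index label of the new row
df2 = pd.concat([df1, s1.to_frame().T]) # turn the Series into a one-row DataFrame, then concatenate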
df2
index | A | B | C | D | E | F |
---|---|---|---|---|---|---|
20200101 | 1.0 | 1.0 | 2.0 | 3.0 | 10.0 | 1.0 |
20200102 | 4.0 | 200.0 | 6.0 | 7.0 | 10.0 | 2.0 |
20200103 | 8.0 | 9.0 | 100.0 | 11.0 | 10.0 | 3.0 |
20200104 | 1.0 | 0.0 | 0.0 | 0.0 | 10.0 | 4.0 |
20200105 | 1.0 | 0.0 | 0.0 | 0.0 | 10.0 | 5.0 |
20200106 | 1.0 | 0.0 | 0.0 | 0.0 | 10.0 | 6.0 |
20200107 | 1.0 | 2.0 | 3.0 | NaN | NaN | NaN |
S1 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 | 6.0 |
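When several rows need to be added, it is usually cheaper to collect them first and concatenate once. The sketch below assumes pandas 1.x or later and uses illustrative names (`base`, `s2`, `new_rows`). A list of named Series passed to `pd.DataFrame` becomes one row per Series.

```python
import pandas as pd

base = pd.DataFrame({'A': [1.0, 4.0], 'B': [1.0, 200.0]}, index=[20200101, 20200102])

s1 = pd.Series([1, 2], index=['A', 'B'], name='S1')
s2 = pd.Series([3, 4], index=['A', 'B'], name='S2')

# Each Series becomes a row, labeled by its name.
new_rows = pd.DataFrame([s1, s2])

# pd.concat returns a new frame; base is left unchanged.
combined = pd.concat([base, new_rows])
```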
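Call .insert to insert a column at a specific position
df1.insert(1,'G',df2['E']) # insert column E of df2 at position 1 (the second column) under the name G; values are aligned on the index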
df1
index | A | G | B | C | D | E | F |
---|---|---|---|---|---|---|---|
20200101 | 1.0 | 10.0 | 1.0 | 2.0 | 3.0 | 10.0 | 1.0 |
20200102 | 4.0 | 10.0 | 200.0 | 6.0 | 7.0 | 10.0 | 2.0 |
20200103 | 8.0 | 10.0 | 9.0 | 100.0 | 11.0 | 10.0 | 3.0 |
20200104 | 1.0 | 10.0 | 0.0 | 0.0 | 0.0 | 10.0 | 4.0 |
20200105 | 1.0 | 10.0 | 0.0 | 0.0 | 0.0 | 10.0 | 5.0 |
20200106 | 1.0 | 10.0 | 0.0 | 0.0 | 0.0 | 10.0 | 6.0 |
20200107 | 1.0 | NaN | 2.0 | 3.0 | NaN | NaN | NaN |
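Two details of `.insert` are easy to miss: it modifies the frame in place and returns `None`, and by default it refuses to insert a column name that already exists. A minimal sketch, with `demo` as an illustrative frame:

```python
import pandas as pd

demo = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Position 1 is zero-based, so G lands between A and B.
result = demo.insert(1, 'G', [10, 10])

# Inserting the same name again raises ValueError unless allow_duplicates=True.
duplicate_rejected = False
try:
    demo.insert(0, 'G', [0, 0])
except ValueError:
    duplicate_rejected = True
```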
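Call .pop to remove a column, then call .insert to put it back at a new position
g = df1.pop('G') # remove column G from df1 and return it as a Series
df1.insert(6,'G',g) # re-insert it at position 6, after the six remaining columns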
df1
index | A | B | C | D | E | F | G |
---|---|---|---|---|---|---|---|
20200101 | 1.0 | 1.0 | 2.0 | 3.0 | 10.0 | 1.0 | 10.0 |
20200102 | 4.0 | 200.0 | 6.0 | 7.0 | 10.0 | 2.0 | 10.0 |
20200103 | 8.0 | 9.0 | 100.0 | 11.0 | 10.0 | 3.0 | 10.0 |
20200104 | 1.0 | 0.0 | 0.0 | 0.0 | 10.0 | 4.0 | 10.0 |
20200105 | 1.0 | 0.0 | 0.0 | 0.0 | 10.0 | 5.0 | 10.0 |
20200106 | 1.0 | 0.0 | 0.0 | 0.0 | 10.0 | 6.0 | 10.0 |
20200107 | 1.0 | 2.0 | 3.0 | NaN | NaN | NaN | NaN |
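Moving a column to the end can also be done without hard-coding the position. The sketch below shows two variants on an illustrative frame `demo`: selecting the columns in a new order, and using `len(demo.columns)` as the insert position.

```python
import pandas as pd

demo = pd.DataFrame({'A': [1], 'G': [10], 'B': [2], 'C': [3]})

# Variant 1: build the desired column order and select with it (returns a new frame).
ordered = [c for c in demo.columns if c != 'G'] + ['G']
moved = demo[ordered]

# Variant 2: pop and re-insert in place, computing the last position.
g = demo.pop('G')
demo.insert(len(demo.columns), 'G', g)
```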
Use del to delete a column
del df1['G'] # delete column G in place
df1
index | A | B | C | D | E | F |
---|---|---|---|---|---|---|
20200101 | 1.0 | 1.0 | 2.0 | 3.0 | 10.0 | 1.0 |
20200102 | 4.0 | 200.0 | 6.0 | 7.0 | 10.0 | 2.0 |
20200103 | 8.0 | 9.0 | 100.0 | 11.0 | 10.0 | 3.0 |
20200104 | 1.0 | 0.0 | 0.0 | 0.0 | 10.0 | 4.0 |
20200105 | 1.0 | 0.0 | 0.0 | 0.0 | 10.0 | 5.0 |
20200106 | 1.0 | 0.0 | 0.0 | 0.0 | 10.0 | 6.0 |
20200107 | 1.0 | 2.0 | 3.0 | NaN | NaN | NaN |
Call .drop to delete specified columns
df2 = df1.drop(['A','B'],axis=1) # returns a copy without columns A and B; df1 is unchanged
df2
index | C | D | E | F |
---|---|---|---|---|
20200101 | 2.0 | 3.0 | 10.0 | 1.0 |
20200102 | 6.0 | 7.0 | 10.0 | 2.0 |
20200103 | 100.0 | 11.0 | 10.0 | 3.0 |
20200104 | 0.0 | 0.0 | 10.0 | 4.0 |
20200105 | 0.0 | 0.0 | 10.0 | 5.0 |
20200106 | 0.0 | 0.0 | 10.0 | 6.0 |
20200107 | 3.0 | NaN | NaN | NaN |
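`.drop` also accepts a `columns=` keyword, which reads more clearly than `axis=1`, and an `errors='ignore'` option for labels that may not exist. A short sketch on an illustrative frame `demo` (the column `'Z'` is deliberately absent):

```python
import pandas as pd

demo = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})

# Equivalent to demo.drop(['A', 'B'], axis=1).
without_ab = demo.drop(columns=['A', 'B'])

# 'Z' is not a column; errors='ignore' skips it instead of raising KeyError.
tolerant = demo.drop(columns=['A', 'Z'], errors='ignore')
```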
Call .drop to delete specified rows
df2 = df1.drop([20200101,20200102],axis=0) # returns a copy without these two rows; df1 is unchanged
df2
index | A | B | C | D | E | F |
---|---|---|---|---|---|---|
20200103 | 8.0 | 9.0 | 100.0 | 11.0 | 10.0 | 3.0 |
20200104 | 1.0 | 0.0 | 0.0 | 0.0 | 10.0 | 4.0 |
20200105 | 1.0 | 0.0 | 0.0 | 0.0 | 10.0 | 5.0 |
20200106 | 1.0 | 0.0 | 0.0 | 0.0 | 10.0 | 6.0 |
20200107 | 1.0 | 2.0 | 3.0 | NaN | NaN | NaN |
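Rows are often removed by condition rather than by label. The sketch below contrasts the two on an illustrative frame `demo`: `drop(index=...)` removes named rows, while a boolean mask keeps only the rows that satisfy a test.

```python
import pandas as pd

demo = pd.DataFrame({'A': [1, 4, 8, 1]},
                    index=[20200101, 20200102, 20200103, 20200104])

# Remove rows by label; index= is the keyword form of axis=0.
by_label = demo.drop(index=[20200101, 20200102])

# Remove rows by condition: keep only rows where A is not 1.
by_condition = demo[demo['A'] != 1]
```

Both forms return new frames, so `demo` keeps all four rows.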