Given a DataFrame, create a new column from the existing columns according to some condition.
Original data:
import pandas as pd
df = pd.DataFrame({'team_A': ['Spain', 'Germany', 'Brazil', 'France'],
                   'team_B': ['USA', 'Argentina', 'Mexico', 'Belgium'],
                   'score_A': [5, 3, 2, 0],
                   'score_B': [4, 0, 3, 0]},
                  columns=['team_A', 'team_B', 'score_A', 'score_B'])
print(df, '\n')
out:
    team_A     team_B  score_A  score_B
0    Spain        USA        5        4
1  Germany  Argentina        3        0
2   Brazil     Mexico        2        3
3   France    Belgium        0        0
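Q: Based on the data above, add a new column (win_team) that stores the name of the team with the higher score, i.e. team_A when score_A is greater and team_B when score_B is greater. A draw is marked with '=='.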
Method 1:
# Create the new column, initially empty
df['win_team'] = ''
# Compare score_A with score_B; task is a Series of score differences
task = df['score_A'] - df['score_B']
# Use loc with boolean masks to assign the values
df.loc[task > 0, 'win_team'] = df.loc[task > 0, 'team_A']
df.loc[task < 0, 'win_team'] = df.loc[task < 0, 'team_B']
df.loc[task == 0, 'win_team'] = '=='
print(df)
out:
    team_A     team_B  score_A  score_B  win_team
0    Spain        USA        5        4     Spain
1  Germany  Argentina        3        0   Germany
2   Brazil     Mexico        2        3    Mexico
3   France    Belgium        0        0        ==
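The three mask assignments above can also be written as a single expression. The following is only a sketch of one possible variant, assuming NumPy is available; numpy.select is swapped in here and is not part of the original method. The names conditions and choices are chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'team_A': ['Spain', 'Germany', 'Brazil', 'France'],
                   'team_B': ['USA', 'Argentina', 'Mexico', 'Belgium'],
                   'score_A': [5, 3, 2, 0],
                   'score_B': [4, 0, 3, 0]})

# Conditions are checked in order; the first one that is True picks the value.
conditions = [df['score_A'] > df['score_B'],
              df['score_A'] < df['score_B']]
# One choice per condition, aligned row by row with the DataFrame.
choices = [df['team_A'], df['team_B']]

# Rows matching neither condition (a draw) fall back to the default '=='.
df['win_team'] = np.select(conditions, choices, default='==')
print(df)
```

This keeps the same vectorized, mask-based idea as the loc version; which form reads better is largely a matter of taste.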
Method 2:
Use DataFrame.iterrows() together with a list:
def find_win_team(df):
    win_team = []
    # iterrows yields (index, row) for each row of df; row is a Series
    for i, row in df.iterrows():
        if row['score_A'] > row['score_B']:
            win_team.append(row['team_A'])
        elif row['score_A'] < row['score_B']:
            win_team.append(row['team_B'])
        else:
            win_team.append('==')
    return win_team

# The function returns a list, which is stored in a new column of df
df['win_team'] = find_win_team(df)
print(df)
out:
    team_A     team_B  score_A  score_B  win_team
0    Spain        USA        5        4     Spain
1  Germany  Argentina        3        0   Germany
2   Brazil     Mexico        2        3    Mexico
3   France    Belgium        0        0        ==
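The per-row logic of the loop above can also be handed to DataFrame.apply with axis=1, which removes the explicit list. This is a minimal sketch rather than a recommended replacement; the helper name pick_winner is hypothetical. Like iterrows, it still processes one row at a time, so it should not be expected to run faster on large data.

```python
import pandas as pd

df = pd.DataFrame({'team_A': ['Spain', 'Germany', 'Brazil', 'France'],
                   'team_B': ['USA', 'Argentina', 'Mexico', 'Belgium'],
                   'score_A': [5, 3, 2, 0],
                   'score_B': [4, 0, 3, 0]})

def pick_winner(row):
    """Return the winning team's name for one row, or '==' for a draw."""
    if row['score_A'] > row['score_B']:
        return row['team_A']
    if row['score_A'] < row['score_B']:
        return row['team_B']
    return '=='

# axis=1 passes each row to pick_winner as a Series
df['win_team'] = df.apply(pick_winner, axis=1)
print(df)
```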