Creating a new column with pandas in Python

Given a DataFrame, create a new column from the existing columns according to some condition.

Original data:

import pandas as pd

df = pd.DataFrame({'team_A': ['Spain', 'Germany', 'Brazil', 'France'],
                   'team_B': ['USA', 'Argentina', 'Mexico', 'Belgium'],
                   'score_A': [5, 3, 2, 0],
                   'score_B': [4, 0, 3, 0]},
                  columns=['team_A', 'team_B', 'score_A', 'score_B'])

print(df, '\n')

out:

    team_A     team_B  score_A  score_B
0    Spain        USA        5        4
1  Germany  Argentina        3        0
2   Brazil     Mexico        2        3
3   France    Belgium        0        0

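Q: Based on the data above, add a new column (win_team) that stores the team with the higher score, i.e. team_A when score_A is larger and team_B when score_B is larger. In the solutions below, a draw is recorded as '=='.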

Method 1:

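# Create the new column, initially empty
df['win_team'] = ''
# Compare score_A with score_B; task is a Series of score differences
task = df['score_A'] - df['score_B']
# Assign the winner with loc, using boolean masks built from task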
df.loc[task > 0, 'win_team'] = df.loc[task > 0, 'team_A']
df.loc[task < 0, 'win_team'] = df.loc[task < 0, 'team_B']
df.loc[task == 0, 'win_team'] = '=='
print(df)

out:

    team_A     team_B  score_A  score_B win_team
0    Spain        USA        5        4    Spain
1  Germany  Argentina        3        0  Germany
2   Brazil     Mexico        2        3   Mexico
3   France    Belgium        0        0       ==
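The three loc assignments above can also be written as a chained selection. The following is a minimal sketch, not part of the original solution, that assumes the same example data and uses Series.where and Series.mask to pick team_A or team_B and then overwrite draws with '=='; the variable name a_wins is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'team_A': ['Spain', 'Germany', 'Brazil', 'France'],
                   'team_B': ['USA', 'Argentina', 'Mexico', 'Belgium'],
                   'score_A': [5, 3, 2, 0],
                   'score_B': [4, 0, 3, 0]})

# a_wins is an illustrative name: True where team_A scored more
a_wins = df['score_A'] > df['score_B']

# Keep team_A where it won, otherwise take team_B ...
winner = df['team_A'].where(a_wins, df['team_B'])
# ... then replace the value with '==' wherever the scores are equal
df['win_team'] = winner.mask(df['score_A'] == df['score_B'], '==')

print(df)
```

Like Method 1, this works on whole columns at once, so no explicit Python loop over the rows is needed.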

Method 2:

Using DataFrame.iterrows() and a list:

def find_win_team(df):
    win_team = []
    # iterrows yields (index, row) pairs for each row of df; row is a Series
    for i,row in df.iterrows():
        if row['score_A'] > row['score_B']:
            win_team.append(row['team_A'])
        elif row['score_A'] < row['score_B']:
            win_team.append(row['team_B'])
        else:
            win_team.append('==')
    return win_team

# The function returns a list, which is stored in the new column of df
df['win_team'] = find_win_team(df)
print(df)

out:

    team_A     team_B  score_A  score_B win_team
0    Spain        USA        5        4    Spain
1  Germany  Argentina        3        0  Germany
2   Brazil     Mexico        2        3   Mexico
3   France    Belgium        0        0       ==
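The same row-by-row logic can be expressed without building the list by hand. The sketch below is an alternative to the function above, assuming the same example data: it passes a per-row function to DataFrame.apply with axis=1. The helper name pick_winner is hypothetical and chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({'team_A': ['Spain', 'Germany', 'Brazil', 'France'],
                   'team_B': ['USA', 'Argentina', 'Mexico', 'Belgium'],
                   'score_A': [5, 3, 2, 0],
                   'score_B': [4, 0, 3, 0]})

def pick_winner(row):
    # row is a Series holding one row of df (hypothetical helper)
    if row['score_A'] > row['score_B']:
        return row['team_A']
    if row['score_A'] < row['score_B']:
        return row['team_B']
    return '=='

# axis=1 calls pick_winner once per row and collects the results in a Series
df['win_team'] = df.apply(pick_winner, axis=1)

print(df)
```

Both iterrows and apply with axis=1 still process one row at a time in Python, so for large DataFrames the column-wise approach of Method 1 is usually the faster choice.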
