Getting the maximum of a DataFrame column: finding the column name(s) of the maximum value in a given row of a pandas DataFrame

Given the DataFrame

import pandas as pd

df=pd.DataFrame({'col1':[1,2,3],'col2':[3,2,1],'col3':[1,1,1]},index= ['row1','row2','row3'])

print(df)

      col1  col2  col3
row1     1     3     1
row2     2     2     1
row3     3     1     1

I want to get the names of the columns that hold the maximum value(s) in a given row.

The desired output would be (in pseudocode):

get_column_name_for_max_values_of(row2)

>['col1','col2']

What would be the most concise way to express

get_column_name_for_max_values_of(row2)

?

Solution

If the maximum is not duplicated within a row, you can use idxmax, but it returns only the first column that holds the max value:

print(df.idxmax(axis=1))

row1 col2

row2 col1

row3 col1

dtype: object

def get_column_name_for_max_values_of(row):
    # idxmax(axis=1) gives, for each row, the first column holding its maximum
    return df.idxmax(axis=1).loc[row]

print(get_column_name_for_max_values_of('row2'))

col1
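To see which rows have tied maxima, and so where idxmax alone would drop a column, one option is to count the cells in each row that equal the row maximum. This is a minimal sketch; the variable name `ties` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'col1': [1, 2, 3], 'col2': [3, 2, 1], 'col3': [1, 1, 1]},
    index=['row1', 'row2', 'row3'],
)

# Compare every cell with the maximum of its own row,
# then count the matching cells per row.
ties = df.eq(df.max(axis=1), axis=0).sum(axis=1)
print(ties)
```

A count greater than 1 (here only for row2) means idxmax would report just one of several columns.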

But if the maximum occurs more than once in the row, use boolean indexing:

print(df.loc['row2'] == df.loc['row2'].max())

col1 True

col2 True

col3 False

Name: row2, dtype: bool

print(df.loc[:, df.loc['row2'] == df.loc['row2'].max()])

      col1  col2
row1     1     3
row2     2     2
row3     3     1

print(df.loc[:, df.loc['row2'] == df.loc['row2'].max()].columns)

Index(['col1', 'col2'], dtype='object')

And the function is:

def get_column_name_for_max_values_of(row):
    # keep only the columns whose value in this row equals the row maximum
    return df.loc[:, df.loc[row] == df.loc[row].max()].columns.tolist()

print(get_column_name_for_max_values_of('row2'))

['col1', 'col2']
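The function above reads the global `df`. A variant that takes the frame as an argument might look like the sketch below; the name `max_columns` and its parameters are illustrative, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(
    {'col1': [1, 2, 3], 'col2': [3, 2, 1], 'col3': [1, 1, 1]},
    index=['row1', 'row2', 'row3'],
)

def max_columns(frame, row):
    """Return the labels of every column holding the maximum of `row`."""
    values = frame.loc[row]                   # the row as a Series indexed by column name
    at_max = values[values == values.max()]   # keep only the cells equal to the row maximum
    return at_max.index.tolist()

print(max_columns(df, 'row2'))  # ['col1', 'col2']

# The same helper applied to every row of the frame.
all_rows = {row: max_columns(df, row) for row in df.index}
print(all_rows)
```

Passing the frame explicitly keeps the helper reusable with other DataFrames.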
