How to iterate over rows in a DataFrame in Pandas?
I have a DataFrame from pandas:
import pandas as pd
inp = [{'c1':10, 'c2':100}, {'c1':11,'c2':110}, {'c1':12,'c2':120}]
df = pd.DataFrame(inp)
print(df)
Output:
   c1   c2
0  10  100
1  11  110
2  12  120
Now I want to iterate over the rows of this frame. For every row I want to be able to access its elements (values in cells) by the name of the columns. For example:
for row in df.rows:  # hypothetical: DataFrame has no .rows attribute
    print(row['c1'], row['c2'])
Is it possible to do that in pandas?
I found this similar question, but it does not give me the answer I need. For example, it suggests using:
for date, row in df.T.iteritems():  # df.T.items() in pandas >= 2.0, where iteritems was removed
or
for row in df.iterrows():
But I do not understand what the row object is or how to work with it.
You should use df.iterrows(), though iterating row by row is not especially efficient, since a Series object has to be created for every row.
DataFrame.iterrows is a generator that yields both the index and the row (as a Series):
import pandas as pd

df = pd.DataFrame([{'c1':10, 'c2':100}, {'c1':11,'c2':110}, {'c1':12,'c2':120}])
for index, row in df.iterrows():
    print(row['c1'], row['c2'])
Output:
10 100
11 110
12 120
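To address the question of what row actually is, here is a minimal sketch that inspects the first pair yielded by iterrows(): row is a Series whose index holds the column labels. The small mixed frame is a made-up example of a caveat noted in the pandas docs: since each row is one Series, an integer column gets upcast when the frame also has float columns.

```python
import pandas as pd

df = pd.DataFrame([{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}])

index, row = next(df.iterrows())  # take only the first (index, row) pair
print(index)                 # 0
print(type(row).__name__)    # Series
print(list(row.index))       # ['c1', 'c2'], the column names become the row's index
print(row['c1'], row['c2'])  # 10 100

# Each row is built as a single Series, so mixed int/float columns share one dtype
mixed = pd.DataFrame({'i': [1, 2], 'f': [0.5, 1.5]})
_, first = next(mixed.iterrows())
print(first.dtype, first['i'])  # float64 1.0
```

If you need the original dtypes per column, itertuples() (covered in another answer) avoids this, because it does not pack each row into a Series.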
You can also use df.apply() to iterate over rows and pass multiple columns to a function.
Docs: DataFrame.apply()
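def valuation_formula(x, y):
    return x * y * 0.5

# axis=1 passes each row to the lambda as a Series indexed by column name
# (assumes df has numeric columns 'x' and 'y')
df['price'] = df.apply(lambda row: valuation_formula(row['x'], row['y']), axis=1)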
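As a runnable sketch on the question's own DataFrame, here is the same valuation_formula helper with c1 and c2 standing in for x and y (that column choice is only for illustration):

```python
import pandas as pd

df = pd.DataFrame([{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}])

def valuation_formula(x, y):
    return x * y * 0.5

# With axis=1 the lambda receives one row (a Series) at a time
df['price'] = df.apply(lambda row: valuation_formula(row['c1'], row['c2']), axis=1)
print(df['price'].tolist())  # [500.0, 605.0, 720.0]
```

Keep in mind that apply with axis=1 still calls the Python function once per row, so it is a convenience over an explicit loop rather than a vectorized operation.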
While iterrows() is a good option, sometimes itertuples() can be much faster:
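# %timeit is an IPython/Jupyter magic command
import pandas as pd
from numpy.random import randn, randint
df = pd.DataFrame({'a': randn(1000), 'b': randn(1000), 'N': randint(100, 1000, 1000), 'x': 'x'})
%timeit [row.a * 2 for idx, row in df.iterrows()]
# => 10 loops, best of 3: 50.3 ms per loop
%timeit [row[1] * 2 for row in df.itertuples()]  # row[0] is the index, row[1] the first column
# => 1000 loops, best of 3: 541 µs per loop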
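itertuples() also covers the "access by column name" part of the question: each row is a namedtuple, so columns are available as attributes. A minimal sketch with the question's DataFrame:

```python
import pandas as pd

df = pd.DataFrame([{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}])

pairs = []
for row in df.itertuples():
    # row is a namedtuple: row.Index holds the index label,
    # and each column is an attribute named after the column
    print(row.Index, row.c1, row.c2)
    pairs.append((row.c1, row.c2))

# index=False omits the Index field; name=None yields plain tuples instead of namedtuples
plain = list(df.itertuples(index=False, name=None))
```

Per the pandas docs, column names that are not valid Python identifiers (or are repeated, or start with an underscore) are renamed to positional names, so row.c1-style access only works for identifier-like column names.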
You can use the df.iloc indexer as follows:
for i in range(len(df)):
    print(df.iloc[i]['c1'], df.iloc[i]['c2'])
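A variant of the loop above, as a sketch: df.iloc[i]['c1'] first builds a Series for the whole row and then picks one value out of it. Passing both positions to a single iloc call returns the scalar directly. Here the column positions are looked up once with df.columns.get_loc; the helper names c1_pos, c2_pos and values are just for illustration.

```python
import pandas as pd

df = pd.DataFrame([{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}])

# Look up the integer position of each column once, outside the loop
c1_pos = df.columns.get_loc('c1')
c2_pos = df.columns.get_loc('c2')

values = []
for i in range(len(df)):
    # iloc[row_position, column_position] returns a single scalar
    # without first materializing the whole row as a Series
    values.append((df.iloc[i, c1_pos], df.iloc[i, c2_pos]))

print(values)
```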