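import pandas as pd
import numpy as np

df = pd.read_csv('data.csv')
# Sort so each user's trades are in chronological order
# ('YYYY.MM.DD' date strings sort correctly as plain text)
df = df.sort_values(['user', 'date'])
df_B = df[df['indc'] == 'B']   # buy rows
df_S = df[df['indc'] == 'S']   # sell rows
# Signed volume: positive for buys ('B'), negative for sells ('S')
df['vol-sign'] = np.where(df['indc'] == 'B', df['vol'], -df['vol'])
# Running net volume (position) per user
df['cde'] = df.groupby('user')['vol-sign'].cumsum()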
data.csv:
user,vol,prc,date,indc,cde
a01,42,72,2019.07.22,B,
a01,42,72,2019.07.20,B,
a01,42,72,2019.07.22,S,
a01,42,72,2019.07.22,B,
a02,42,72,2019.07.22,B,
a02,42,72,2019.07.22,B,
a02,42,72,2019.07.20,S,
a03,42,72,2019.07.22,B,
a03,42,72,2019.07.20,B,
a03,42,72,2019.07.22,S,
a03,42,72,2019.07.22,B,
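To check the snippet end to end, here is a minimal self-contained sketch that embeds the sample data.csv through io.StringIO instead of reading a file (the CSV constant is just an illustrative name). df_B and df_S are left out because nothing uses them. The last cde value in each user's group is that user's net volume after all trades, which does not depend on how rows with the same date are ordered.

```python
import io

import numpy as np
import pandas as pd

# The sample data.csv from above, embedded so the sketch runs on its own
CSV = """user,vol,prc,date,indc,cde
a01,42,72,2019.07.22,B,
a01,42,72,2019.07.20,B,
a01,42,72,2019.07.22,S,
a01,42,72,2019.07.22,B,
a02,42,72,2019.07.22,B,
a02,42,72,2019.07.22,B,
a02,42,72,2019.07.20,S,
a03,42,72,2019.07.22,B,
a03,42,72,2019.07.20,B,
a03,42,72,2019.07.22,S,
a03,42,72,2019.07.22,B,
"""

df = pd.read_csv(io.StringIO(CSV))
df = df.sort_values(["user", "date"])
# +vol for buys, -vol for sells, then a running sum within each user
df["vol-sign"] = np.where(df["indc"] == "B", df["vol"], -df["vol"])
df["cde"] = df.groupby("user")["vol-sign"].cumsum()

# Net volume per user after all trades (last running-sum value per group)
totals = {user: int(v) for user, v in df.groupby("user")["cde"].last().items()}
print(totals)  # {'a01': 84, 'a02': 42, 'a03': 84}
```

Everything here is whole-column work (np.where, groupby().cumsum()), so no Python-level loop over rows is needed.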
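A DataFrame is well suited to whole-column (vectorized) operations, but when a computation has to go row by row it is far too slow!
In that case, convert the data back to NumPy arrays and work on those.
For example, consider this problem: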
for i in range(1, len(df)):  # e.g. len(df) == 10,000,000
    df.iloc[i, 3] = df.iloc[i - 1, 3] * df.iloc[i, 1] + df.iloc[i, 2]
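A sketch of the NumPy approach for this recurrence, x[i] = x[i-1] * a[i] + b[i], assuming (as the loop above implies) that column 1 holds a, column 2 holds b and column 3 holds x, with x's first value as the starting point. The function names recurrence_iloc and recurrence_numpy and the demo frame are hypothetical. The loop stays, but it runs over plain arrays instead of going through .iloc for every cell, and the finished column is written back in one step.

```python
import numpy as np
import pandas as pd


def recurrence_iloc(df: pd.DataFrame) -> pd.DataFrame:
    """Slow reference: the row-by-row .iloc loop from the example."""
    out = df.copy()
    for i in range(1, len(out)):
        out.iloc[i, 3] = out.iloc[i - 1, 3] * out.iloc[i, 1] + out.iloc[i, 2]
    return out


def recurrence_numpy(df: pd.DataFrame) -> pd.DataFrame:
    """Same recurrence, x[i] = x[i-1] * a[i] + b[i], on NumPy arrays.

    Assumes column 1 holds a, column 2 holds b and column 3 holds x.
    """
    a = df.iloc[:, 1].to_numpy(dtype=float)
    b = df.iloc[:, 2].to_numpy(dtype=float)
    # copy=True so the loop never writes into df's own memory
    x = df.iloc[:, 3].to_numpy(dtype=float, copy=True)
    for i in range(1, len(x)):
        x[i] = x[i - 1] * a[i] + b[i]
    out = df.copy()
    out[out.columns[3]] = x  # replace the whole column at once
    return out


demo = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "a": [0.0, 2.0, 3.0, 0.5],
    "b": [0.0, 1.0, 1.0, 1.0],
    "x": [1.0, 0.0, 0.0, 0.0],
})
print(recurrence_numpy(demo)["x"].tolist())  # [1.0, 3.0, 10.0, 6.0]
```

Because each x[i] depends on x[i-1], this recurrence cannot be reduced to a single cumsum-style call the way the signed-volume example can; moving the loop onto NumPy arrays removes the per-cell .iloc overhead, which is typically where most of the time goes.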