Inserting many columns into a DataFrame: PerformanceWarning: DataFrame is highly fragmented.

The DataFrame has a large number of columns, which are added with the following code:

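import pandas as pd

df = pd.DataFrame()
for i in range(1000):
    vlist = []
    for j in range(1000):
        vlist.append(j)
    # Adds one new column per iteration; this is the line the warning points at
    df['COL_' + str(i)] = vlist

df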

The warning:

/tmp/ipykernel_27622/2631638338.py:7: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling frame.insert many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use newframe = frame.copy()

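The assignment df['COL_' + str(i)] = vlist is effectively an insert. The warning says that the frame has become heavily fragmented, and the code takes a long time to run.
Following the hint, use pd.concat(axis=1) to add the column data instead.
Build an intermediate DataFrame for the new column, merge it with df through pd.concat(), and assign the result back to df. This avoids the repeated inserts that cause the fragmentation warning.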

df=pd.concat([df,frames], axis=1)

The modified code:

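import pandas as pd

df = pd.DataFrame()
for i in range(1000):
    vlist = []
    for j in range(1000):
        vlist.append(j)
    # Wrap the new column in its own one-column DataFrame
    frames = pd.DataFrame({'COL_' + str(i): vlist})
    # Append it to df as a new column instead of inserting in place
    df = pd.concat([df, frames], axis=1)

df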

In my run this was noticeably faster, and the warning no longer appeared.
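
The warning's hint to join all columns "at once" can also be taken literally: collect the columns first and call pd.concat a single time, rather than once per loop iteration. The following is only a minimal sketch of that variant, not the code from the post above; the names n_cols, n_rows and columns are made up for illustration.

```python
import pandas as pd

# Illustrative sizes (assumed; they match the 1000 x 1000 example above)
n_cols, n_rows = 1000, 1000

# Build every column first as a named Series, without touching a DataFrame
columns = [
    pd.Series(list(range(n_rows)), name='COL_' + str(i))
    for i in range(n_cols)
]

# Join all columns in one call; each Series name becomes a column label
df = pd.concat(columns, axis=1)
```

Because the frame is assembled in a single step, there are no repeated column inserts into an existing frame. Passing a dict of column name to values directly to pd.DataFrame should work similarly.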
