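Dumping with bz2 compression can take 20-30x longer or more (first run below: 73.7 s for arr.bz2 vs. 2.1 s for arr.pkl, about 34x)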

In [31]: arr=np.random.rand(10000,10000)

In [32]: arr.__sizeof__() / 1024 /1024
Out[32]: 762.9395599365234

In [33]: start_time = time();  dump(arr, "arr.pkl"); cost_time = time()-start_time; print(cost_time)
2.146432399749756

In [34]: start_time = time();  dump(arr, "arr.bz2"); cost_time = time()-start_time; print(cost_time)
73.71727752685547
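The session does not show where `dump` comes from. Assuming it is `joblib.dump`, which picks a compressor from the file extension (so `arr.bz2` gets bz2 compression while `arr.pkl` is written uncompressed), the gap can be reproduced with the standard library alone. This is a stand-in, not joblib's file format: the hypothetical helpers `make_payload` and `compare` pickle a buffer of random float64 values and time pickling alone against pickling followed by `bz2.compress` (level 9) and, for contrast, `zlib.compress` (level 1). Random floats compress poorly, so most of bz2's extra CPU time buys little size reduction.

```python
import bz2
import pickle
import random
import time
import zlib
from array import array


def make_payload(n_floats: int, seed: int = 0) -> bytes:
    """Raw bytes of n random float64 values in [0, 1), similar to np.random.rand(n).tobytes()."""
    rng = random.Random(seed)
    return array("d", (rng.random() for _ in range(n_floats))).tobytes()


def compare(n_floats: int = 1_000_000) -> dict:
    """Time pickling alone vs. pickling followed by zlib or bz2 compression.

    Stand-in (assumption) for dump(arr, "arr.pkl") vs. dump(arr, "arr.bz2").
    Returns {label: (output size in bytes, elapsed seconds)}.
    """
    payload = make_payload(n_floats)
    codecs = {
        "pickle only": lambda b: b,
        "pickle + zlib level 1": lambda b: zlib.compress(b, 1),
        "pickle + bz2 level 9": lambda b: bz2.compress(b, 9),
    }
    results = {}
    for label, codec in codecs.items():
        start = time.perf_counter()
        blob = codec(pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL))
        results[label] = (len(blob), time.perf_counter() - start)
    return results


if __name__ == "__main__":
    for label, (size, secs) in compare().items():
        print(f"{label:22s} {size / 2**20:7.2f} MiB {secs:7.3f} s")
```

Absolute numbers depend on the machine. If compression is still wanted with joblib, its `compress` parameter also accepts a `(method, level)` tuple such as `("zlib", 3)`, which is usually much cheaper than bz2.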

In [7]: y = np.random.randint(0, 1, [10000])

In [8]: from sklearn.ensemble import RandomForestClassifier

In [28]: start_time = time();  dump(dict_, "dict.pkl"); cost_time = time()-start_time; print(cost_time)
1.5185284614562988

In [29]: df=pd.DataFrame(arr, columns=[f"col{i}" for i in range(10000)])

In [30]: df.__sizeof__() / 1024 /1024
Out[30]: 762.9395751953125
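Both `__sizeof__()` results are essentially the raw float64 buffer: 10000 × 10000 elements × 8 bytes = 800,000,000 bytes, or 762.939453125 MiB. Converting Out[32] (`arr`) and Out[30] (`df`) back to bytes shows that they report only 112 and 128 bytes respectively on top of that buffer, so the DataFrame wrapper adds almost nothing to the in-memory size. The variable names below are illustrative.

```python
# float64 payload of a 10000 x 10000 array: 8 bytes per element
rows, cols, itemsize = 10_000, 10_000, 8
payload_bytes = rows * cols * itemsize
payload_mib = payload_bytes / 1024 / 1024

# __sizeof__() values reported in the session, in MiB (Out[32] for arr, Out[30] for df)
reported_mib = {"arr": 762.9395599365234, "df": 762.9395751953125}

# Convert back to bytes to see how much is not the data buffer itself
overhead_bytes = {
    name: round(mib * 1024 * 1024) - payload_bytes for name, mib in reported_mib.items()
}

print(payload_mib)     # 762.939453125
print(overhead_bytes)  # {'arr': 112, 'df': 128}
```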

In [31]: start_time = time();  dump(df, "df.pkl"); cost_time = time()-start_time; print(cost_time)
0.9628500938415527

In [32]: start_time = time();  dump(arr, "arr.pkl"); cost_time = time()-start_time; print(cost_time)
0.45241856575012207
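The repeated `start_time = time(); ...; print(cost_time)` one-liner can be wrapped in a small context manager. `timer` is a hypothetical helper, not part of any library; it uses `time.perf_counter`, the clock the standard library provides for measuring short durations, instead of `time.time`. In IPython, the built-in `%time` and `%timeit` magics serve the same purpose.

```python
import pickle
import time
from contextlib import contextmanager
from typing import Dict, Iterator, Optional


@contextmanager
def timer(label: str, results: Optional[Dict[str, float]] = None) -> Iterator[None]:
    """Print (and optionally record) the elapsed time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label}: {elapsed:.3f} s")


if __name__ == "__main__":
    timings: Dict[str, float] = {}
    data = list(range(1_000_000))
    # Any statement can go inside the block, e.g. a dump(...) call
    with timer("pickle.dumps(list)", timings):
        blob = pickle.dumps(data)
    print(timings)
```

In the session above, a measurement would then read `with timer("arr.bz2"): dump(arr, "arr.bz2")`.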

