safegraph data preprocessing (3): splitting a CSV file by the distinct values of a specified field

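Split Nin1.csv by the distinct values of the region field and save each subset as xxx-region.csv. I checked that the sizes of all 55 subfiles add up to the size of the parent file.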
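The naming rule xxx-region.csv is implemented in the script that follows by slicing off the last eight characters of the path (file_loc[:-8]), which only works while the merged file ends in exactly 5in1.csv. Below is a minimal sketch of a sturdier variant. It assumes the merged file's stem always ends with a known token (here 5in1) and uses pathlib to swap that token for the region code. The helper name region_output_path and its token parameter are hypothetical and not part of the original script.

```python
from pathlib import Path


def region_output_path(src_path, region, token="5in1"):
    """Return the per-region output path for a merged CSV (hypothetical helper).

    Replaces the trailing `token` of the file stem with `region`, e.g.
    '2020-06-08-weekly-patterns-5in1.csv' -> '2020-06-08-weekly-patterns-IA.csv'.
    """
    src = Path(src_path)
    if not src.stem.endswith(token):
        # Fail loudly instead of silently cutting the wrong characters
        raise ValueError(f"{src.name!r} does not end with {token!r} before the extension")
    new_stem = src.stem[: -len(token)] + region
    return src.with_name(new_stem + src.suffix)
```

Raising an error on an unexpected file name is a deliberate choice. With a fixed slice, a differently named input would produce oddly named output files without any warning.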

import time

import pandas as pd

# Alternative inputs used earlier:
# fileLocation='D:/2020-06-08-weekly-patterns.csv'
# fileLocation='D:/baidu_netdisk/safegraph/weeklyPlacesPatterns/patterns_backfill/2020/12/14/21/2020/06/08/patterns-part1.csv'
file_loc = 'D:/baidu_netdisk/safegraph/weeklyPlacesPatterns/patterns_backfill/2020/12/14/21/2020/06/08/2020-06-08-weekly-patterns-5in1.csv'

# Time the load; time.process_time() measures the CPU time of this process
start = time.process_time()
df = pd.read_csv(file_loc)
print(time.process_time() - start)


# Distinct values of the split field; dropna() keeps a missing region out of the loop
regions = df['region'].dropna().unique()
print(regions)

length = len(regions)
for i, region in enumerate(regions, start=1):
    print('processing ' + region + ', schedule:' + str(i) + '/' + str(length) + '...')

    # Exact match; str.contains() would treat the value as a substring/regex pattern
    df_by_region = df[df['region'] == region]
    # file_loc[:-8] drops the trailing '5in1.csv', leaving '...-weekly-patterns-'
    new_file_loc = file_loc[:-8] + region + '.csv'
    print('new_file_loc:')
    print(new_file_loc)
    # Save the subset
    df_by_region.to_csv(new_file_loc, index=False)

    print(region + ' success!')
print('ALL region SUCCESS!')
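The script above loads the whole merged file into memory and then builds a boolean mask over the full DataFrame once per region (55 passes). The sketch below is a one-pass alternative that assumes the same kind of key column. It reads the file with pandas.read_csv(..., chunksize=...), splits each chunk with DataFrame.groupby, and appends every group to its own file, writing the header only the first time a value is seen. The function name split_csv_by_column and its out_prefix and chunksize parameters are hypothetical.

```python
import pandas as pd


def split_csv_by_column(src_path, column, out_prefix, chunksize=100_000):
    """Write one CSV per distinct value of `column` (hypothetical helper).

    Output files are named f"{out_prefix}{value}.csv". Returns the sorted
    list of values that were written.
    """
    written = set()  # values whose output file already has a header row
    for chunk in pd.read_csv(src_path, chunksize=chunksize):
        # groupby skips rows whose key is NaN (dropna=True by default)
        for value, part in chunk.groupby(column):
            out_path = f"{out_prefix}{value}.csv"
            first = value not in written
            # Overwrite on first sight (clears stale files), append afterwards
            part.to_csv(out_path, mode="w" if first else "a",
                        header=first, index=False)
            written.add(value)
    return sorted(written)
```

For the files in this post, split_csv_by_column(file_loc, 'region', file_loc[:-8]) would produce the same xxx-region.csv names. Memory use is bounded by chunksize rather than by the size of the merged file, at the cost of reopening output files once per chunk.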

Output:

['IA' 'TX' 'OK' 'OR' 'NC' 'AR' 'PA' 'NY' 'WA' 'MS' 'MA' 'NJ' 'IL' 'VA'
 'VT' 'FL' 'WV' 'OH' 'MI' 'AZ' 'IN' 'GA' 'MN' 'ME' 'MO' 'TN' 'SC' 'CA'
 'WY' 'WI' 'NH' 'CO' 'UT' 'NE' 'KS' 'AL' 'MD' 'AK' 'LA' 'SD' 'HI' 'DE'
 'KY' 'MT' 'CT' 'RI' 'NV' 'ID' 'NM' 'PR' 'DC' 'ND' 'GU' 'VI' 'AS']
processing IA, schedule:1/55...
new_file_loc:
D:/baidu_netdisk/safegraph/weeklyPlacesPatterns/patterns_backfill/2020/12/14/21/2020/06/08/2020-06-08-weekly-patterns-IA.csv
IA success!
processing TX, schedule:2/55...
new_file_loc:
D:/baidu_netdisk/safegraph/weeklyPlacesPatterns/patterns_backfill/2020/12/14/21/2020/06/08/2020-06-08-weekly-patterns-TX.csv
TX success!
...
...
...
processing VI, schedule:54/55...
new_file_loc:
D:/baidu_netdisk/safegraph/weeklyPlacesPatterns/patterns_backfill/2020/12/14/21/2020/06/08/2020-06-08-weekly-patterns-VI.csv
VI success!
processing AS, schedule:55/55...
new_file_loc:
D:/baidu_netdisk/safegraph/weeklyPlacesPatterns/patterns_backfill/2020/12/14/21/2020/06/08/2020-06-08-weekly-patterns-AS.csv
AS success!
ALL region SUCCESS!

The large file has now been split into smaller files:
[Figure 1: the per-region files produced by the split]
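Comparing the total size of the subfiles with the parent, as described at the top, is a quick sanity check. Because every subfile gets its own header row and pandas rewrites the values when saving, a row-count comparison may be a stricter test. The sketch below assumes the subfiles can be found with a glob pattern that may also match the parent, which it skips. It also reports rows whose key is missing, because a split that drops missing keys (for example via dropna() or groupby) writes those rows to no subfile. The function check_split_row_counts is hypothetical.

```python
import glob
import os

import pandas as pd


def check_split_row_counts(parent_path, part_pattern, column):
    """Return (parent_rows, parts_rows, missing_key_rows) for a split CSV."""
    # Only the key column is needed to count rows
    parent = pd.read_csv(parent_path, usecols=[column])
    parent_rows = len(parent)
    missing_key_rows = int(parent[column].isna().sum())

    parent_abs = os.path.abspath(parent_path)
    parts_rows = 0
    for part_path in glob.glob(part_pattern):
        # The pattern may also match the parent file itself; skip it
        if os.path.abspath(part_path) == parent_abs:
            continue
        parts_rows += len(pd.read_csv(part_path, usecols=[column]))
    return parent_rows, parts_rows, missing_key_rows
```

For the files in this post, the pattern would be file_loc[:-8] + '*.csv'. The split is complete when parts_rows equals parent_rows minus missing_key_rows.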
