safegraph data preprocessing (1): decompressing all .gz files in a given directory

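The script below walks every file under the given path file_dir and prints the full path of each .gz archive it finds. It then asks for confirmation before decompressing them. Type y or Y and press Enter to continue; any other input cancels.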
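The listing step described above can also be written with the standard-library pathlib module. The following is a minimal sketch, not the original author's code, and the helper name find_gz_files is hypothetical. Path.as_posix() already returns forward-slash paths, so no manual '\' to '/' replacement is needed on Windows. Unlike the raw os.walk order, the results here are sorted.

```python
from pathlib import Path


def find_gz_files(file_dir):
    """Return the full paths of all .gz files under file_dir (recursively).

    Hypothetical helper: paths are returned sorted and in forward-slash
    form via as_posix(), so no '\\' -> '/' replacement is needed.
    """
    return sorted(
        p.as_posix()
        for p in Path(file_dir).rglob('*.gz')
        if p.is_file()  # skip any directory whose name happens to end in .gz
    )


if __name__ == '__main__':
    # Example: list the archives before deciding whether to decompress them
    for path in find_gz_files('.'):
        print(path)
```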

import os
import gzip

file_dir = 'D:/baidu_netdisk/safegraph/weeklyPlacesPatterns/patterns_backfill/2020/12/14/21/2020/06/08'
file_List = []

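# Decompress one .gz file into the same directory; an existing file with the same name is overwritten
def un_gz(file_name):
    # Output file name: drop only the trailing '.gz' (callers pass names ending in .gz)
    f_name = file_name[:-3]
    # Read the decompressed content and write it to the output file;
    # the with-blocks guarantee that both files are closed afterwards
    with gzip.GzipFile(file_name) as g_file, open(f_name, 'wb') as out_file:
        out_file.write(g_file.read())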

# os.walk yields (root, dirs, files): the current directory path,
# its subdirectories, and its non-directory files
for root, dirs, files in os.walk(file_dir):
    root_new = root.replace('\\', '/')  # on Windows, turn '\' into '/' for printing and later use
    for file in files:
        if file.endswith('.gz'):
            file_List.append(root_new + '/' + file)
    
for file in file_List:
    print(file)
# Ask for confirmation; on y/Y every archive in the list is decompressed
input_YorN = input('Files listed above. Decompress all? (Y/N): ')
if input_YorN in ('y', 'Y'):
    for file in file_List:
        un_gz(file)
        print('done:' + file)
else:
    print('user cancel.')

Output:

D:/baidu_netdisk/safegraph/weeklyPlacesPatterns/patterns_backfill/2020/12/14/21/2020/06/08/patterns-part1.csv.gz
D:/baidu_netdisk/safegraph/weeklyPlacesPatterns/patterns_backfill/2020/12/14/21/2020/06/08/patterns-part2.csv.gz
D:/baidu_netdisk/safegraph/weeklyPlacesPatterns/patterns_backfill/2020/12/14/21/2020/06/08/patterns-part3.csv.gz
D:/baidu_netdisk/safegraph/weeklyPlacesPatterns/patterns_backfill/2020/12/14/21/2020/06/08/patterns-part4.csv.gz
D:/baidu_netdisk/safegraph/weeklyPlacesPatterns/patterns_backfill/2020/12/14/21/2020/06/08/patterns-part5.csv.gz
Files listed above. Decompress all? (Y/N): Y
done:D:/baidu_netdisk/safegraph/weeklyPlacesPatterns/patterns_backfill/2020/12/14/21/2020/06/08/patterns-part1.csv.gz
done:D:/baidu_netdisk/safegraph/weeklyPlacesPatterns/patterns_backfill/2020/12/14/21/2020/06/08/patterns-part2.csv.gz
done:D:/baidu_netdisk/safegraph/weeklyPlacesPatterns/patterns_backfill/2020/12/14/21/2020/06/08/patterns-part3.csv.gz
done:D:/baidu_netdisk/safegraph/weeklyPlacesPatterns/patterns_backfill/2020/12/14/21/2020/06/08/patterns-part4.csv.gz
done:D:/baidu_netdisk/safegraph/weeklyPlacesPatterns/patterns_backfill/2020/12/14/21/2020/06/08/patterns-part5.csv.gz
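un_gz reads each archive completely into memory before writing it out, which may be a problem for large pattern files. The following is a minimal sketch of a streaming variant that uses only the standard library. The name un_gz_stream and its overwrite parameter are hypothetical additions, not part of the original script. gzip.open combined with shutil.copyfileobj copies the data in fixed-size chunks instead of all at once.

```python
import gzip
import os
import shutil


def un_gz_stream(file_name, overwrite=True):
    """Decompress file_name (which must end in .gz) next to itself.

    Hypothetical helper: returns the path of the decompressed file. If
    overwrite is False and that file already exists, it is left untouched.
    """
    if not file_name.endswith('.gz'):
        raise ValueError('not a .gz file: ' + file_name)
    out_name = file_name[:-3]  # strip only the trailing '.gz'
    if os.path.exists(out_name) and not overwrite:
        return out_name
    with gzip.open(file_name, 'rb') as src, open(out_name, 'wb') as dst:
        shutil.copyfileobj(src, dst)  # copy in chunks rather than reading everything
    return out_name
```

Inside the confirmation loop, it can replace the un_gz(file) call. Pass overwrite=False if a rerun should skip archives that were already decompressed.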
