Exporting very large datasets to Excel files with openpyxl in Python 3

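Today I received a business request to export a very large amount of data to Excel, on the order of ten million rows.
At first I planned to export TSV files directly with PL/SQL and then process them with a script. The data volume turned out to be too large, though: one momentary connection drop during the query and all the work so far was lost. So I split the data into shards by query condition; the large shards hold a few million rows, the small ones a few hundred thousand.
For Office 2007 and later, a worksheet in an xlsx file holds at most 1,048,576 rows, a little over one million.
That is the background.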
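As a rough illustration of what the row limit means for splitting, the sketch below computes how many xlsx files a shard needs for a given number of rows per file. The function name `files_needed` and the example row count are hypothetical and only serve the illustration; the limit itself is 2**20 rows per worksheet.

```python
EXCEL_MAX_ROWS = 1_048_576  # 2**20, the per-worksheet row limit in xlsx


def files_needed(total_rows, rows_per_file):
    """Return how many files are needed to hold total_rows rows.

    Hypothetical helper for illustration; it is not part of openpyxl.
    """
    if not 0 < rows_per_file <= EXCEL_MAX_ROWS:
        raise ValueError("rows_per_file must be between 1 and EXCEL_MAX_ROWS")
    # Ceiling division without floating point.
    return -(-total_rows // rows_per_file)


# Assumed example: a shard of 5,200,000 rows, 500,000 rows per file.
example_parts = files_needed(5_200_000, 500_000)
print(example_parts)
```

With these assumed numbers the shard is split into ten full files plus one partial file.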
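When I first used openpyxl to convert the TSV records to Excel, I tried to write as many records as possible into a single file. That turned out to be naive: the machine's 8 GB of memory filled up completely, with usage at 99% to 100%. I had no choice but to split the output, generating one xlsx file for roughly every 500,000 records.
The script is in the code section below.
In the actual run, memory usage stayed steady between 30% and 40%, which is acceptable. The log follows the script.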

import time

import openpyxl

file_nums = ["0112","0115","0123","0125","0126","0129","0130"]
CHUNK_SIZE = 500000  # rows written to each xlsx file


def write_part(lines, out_path):
    # Write the buffered rows to a single xlsx file in write-only mode.
    part_start = time.time()
    print("write file {} at :{}".format(out_path, part_start))
    workbook = openpyxl.Workbook(write_only=True)
    sheet = workbook.create_sheet()

    for l in lines:
        sheet.append(l)

    workbook.save(out_path)
    workbook.close()
    part_end = time.time()
    # Log the name of the file that was just written.
    print("file {} write done at :{}".format(out_path, part_end))
    print("part used : {}".format(str(part_end - part_start)))


start = time.time()
lines = []
try:
    for file_num in file_nums:
        file_name_ext = 1
        file_path = "F:\\export{}.tsv".format(file_num)

        lines.clear()
        with open(file_path, 'r', encoding="utf-8") as file:
            print("write all file begin")
            for line in file:
                # Drop the line terminator so it does not end up in the last cell.
                lines.append(line.rstrip('\r\n').split('\t'))
                if len(lines) == CHUNK_SIZE:
                    write_part(lines, "B:\\export{}_{}.xlsx".format(file_num, file_name_ext))
                    file_name_ext += 1
                    lines.clear()

            # Write whatever is left over as the last, partial file.
            if lines:
                write_part(lines, "B:\\export{}_{}.xlsx".format(file_num, file_name_ext))
                lines.clear()

except Exception as e:
    print(e)

end = time.time()
total = end - start
print("write all file finish, used {} times".format(total))
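The splitting step in the script can also be expressed as a small generator, which keeps the "read rows" and "cut into parts" logic separate from the Excel writing. The following is a minimal sketch using only the standard library; the names `iter_tsv_rows` and `iter_chunks` are hypothetical, and the in-memory sample stands in for a real TSV file.

```python
import io
from itertools import islice


def iter_tsv_rows(handle):
    """Yield each TSV line as a list of cell strings, without the line ending."""
    for line in handle:
        yield line.rstrip("\r\n").split("\t")


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows from any iterable of rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Small in-memory sample; a real run would pass an open file object instead.
sample = io.StringIO("a\t1\nb\t2\nc\t3\nd\t4\ne\t5\n")
chunks = list(iter_chunks(iter_tsv_rows(sample), 2))
part_sizes = [len(chunk) for chunk in chunks]
print(part_sizes)
```

In a real run the chunk size would be 500,000, and each chunk could be written to its own write-only workbook, with the file suffix taken from `enumerate(..., start=1)` over the chunks. Only one chunk is held in memory at a time, as in the script.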

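Console output from the run. In this run the file suffix was incremented before the "write done" message was printed, so for full 500,000-row parts that message names the next file rather than the one just written: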

write all file begin
write file B:\export0112_1.xlsx at :1517926709.4048114
file B:\export0112_2.xlsx write done at :1517926758.3921452
part used : 48.98733377456665
write file B:\export0112_2.xlsx at :1517926759.657679
file B:\export0112_3.xlsx write done at :1517926808.537671
part used : 48.87999200820923
write file B:\export0112_3.xlsx at :1517926809.696322
file B:\export0112_4.xlsx write done at :1517926859.3176138
part used : 49.62129187583923
write file B:\export0112_4.xlsx at :1517926860.551937
file B:\export0112_5.xlsx write done at :1517926910.1919634
part used : 49.640026330947876
write file B:\export0112_5.xlsx at :1517926911.337625
file B:\export0112_6.xlsx write done at :1517926960.894076
part used : 49.556451082229614
write file B:\export0112_6.xlsx at :1517926962.0258036
file B:\export0112_7.xlsx write done at :1517927011.4833808
part used : 49.45757722854614
write file B:\export0112_7.xlsx at :1517927012.7268114
file B:\export0112_8.xlsx write done at :1517927061.5736873
part used : 48.8468759059906
write file B:\export0112_8.xlsx at :1517927062.6536007
file B:\export0112_9.xlsx write done at :1517927111.822423
part used : 49.168822288513184
write file B:\export0112_9.xlsx at :1517927113.0346901
file B:\export0112_10.xlsx write done at :1517927162.1026373
part used : 49.06794714927673
write file B:\export0112_10.xlsx at :1517927163.3083832
file B:\export0112_11.xlsx write done at :1517927212.4063733
part used : 49.09799003601074
write file B:\export0112_11.xlsx at :1517927212.8430336
file B:\export0112_11.xlsx write done at :1517927230.3664558
part used : 17.523422241210938
write all file begin
write file B:\export0115_1.xlsx at :1517927230.3875127
file B:\export0115_1.xlsx write done at :1517927230.4441934
part used : 0.05668067932128906
write all file begin
write file B:\export0123_1.xlsx at :1517927231.383569
file B:\export0123_1.xlsx write done at :1517927265.6331534
part used : 34.249584436416626
write all file begin
write file B:\export0125_1.xlsx at :1517927267.00361
file B:\export0125_2.xlsx write done at :1517927316.4248767
part used : 49.42126679420471
write file B:\export0125_2.xlsx at :1517927317.801019
file B:\export0125_2.xlsx write done at :1517927370.0042286
part used : 52.20320963859558
write all file begin
write file B:\export0126_1.xlsx at :1517927371.6056333
file B:\export0126_2.xlsx write done at :1517927427.7860975
part used : 56.18046426773071
write file B:\export0126_2.xlsx at :1517927429.2113287
file B:\export0126_3.xlsx write done at :1517927481.465444
part used : 52.25411534309387
write file B:\export0126_3.xlsx at :1517927482.6818128
file B:\export0126_3.xlsx write done at :1517927526.3801858
part used : 43.69837307929993
write all file begin
write file B:\export0129_1.xlsx at :1517927527.860628
file B:\export0129_2.xlsx write done at :1517927580.998396
part used : 53.137768030166626
write file B:\export0129_2.xlsx at :1517927582.5338671
file B:\export0129_3.xlsx write done at :1517927640.1109939
part used : 57.5771267414093
write file B:\export0129_3.xlsx at :1517927640.3472462
file B:\export0129_3.xlsx write done at :1517927647.2646072
part used : 6.91736102104187
write all file begin
write file B:\export0130_1.xlsx at :1517927647.2751348
file B:\export0130_1.xlsx write done at :1517927647.291679
part used : 0.016544103622436523
write all file finish, used 938.9572479724884 times

Process finished with exit code 0
