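POI's SXSSFWorkbook uses operating-system temporary files as a cache, which makes it possible to generate very large Excel files (I tested up to 5 million rows and stopped there).
Remember to enable compression. The key code: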
SXSSFWorkbook wb = null;
try {
    wb = new SXSSFWorkbook();
    wb.setCompressTempFiles(true); // Compress the temp files. Important: otherwise the disk fills up quickly
    ...
} finally {
    if (wb != null) {
        wb.dispose(); // Delete the temp files. Important: otherwise the disk may fill up
    }
}
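For business reasons, I recently had to build a feature that exports very large data sets. Someone had already built an earlier version. Because POI fails when exporting very large amounts of data, that version split one large file into many small files and zipped them for download, and the archive was often missing one or two files.
The requirement: a single Excel sheet must support exporting 1,000,000 rows.
The first idea was to export a CSV file, which is the most convenient option. After some investigation it was also the first one I dropped, because it has two serious problems: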
Export format / rows | 10k | 100k | 300k | 500k | 700k | 900k | 1M |
---|---|---|---|---|---|---|---|
csv (size/time) | 4.0K/120ms | 50M/1261ms | 160M/3828ms | 271M/7415ms | 381M/8929ms | 491M/11356ms | 546M/13688ms |
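Each row has 30 fields, and the content of each field is generated by Math.random().
With large data volumes CSV performs poorly, so Excel was the only option left. I ran a simple test on Excel: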
Metric / rows | 10k | 20k | 30k | 40k | 50k | 60k | 70k | 80k | 100k |
---|---|---|---|---|---|---|---|---|---|
Time | 3326ms | 6483ms | 7894ms | 9899ms | 12873ms | 15198ms | 17362ms | 20106ms | 25494ms |
Output file size | 3.7M | 7.4M | 12M | 15M | 19M | 23M | 26M | 30M | 37M |
CPU usage | 100% | 100% | 100% | 100% | 100% | 200% | 200% | 800% | 900% |
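CPU usage here always means the steady-state CPU usage.
The test revealed several serious problems:
The workbook held in memory keeps growing until it hits a bottleneck. There are two ways to solve this:
At this point I found SXSSFWorkbook in the POI documentation. It supports temporary files and can be used to generate very large Excel files.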
Since 3.8-beta3, POI provides a low-memory footprint SXSSF API built on top of XSSF.
SXSSF is an API-compatible streaming extension of XSSF to be used when very large
spreadsheets have to be produced, and heap space is limited. SXSSF achieves its
low memory footprint by limiting access to the rows that are within a sliding window,
while XSSF gives access to all rows in the document. Older rows that are no longer
in the window become inaccessible, as they are written to the disk.
In auto-flush mode the size of the access window can be specified, to hold a certain
number of rows in memory. When that value is reached, the creation of an additional
row causes the row with the lowest index to be removed from the access window and
written to disk. Or, the window size can be set to grow dynamically; it can be trimmed
periodically by an explicit call to flushRows(int keepRows) as needed.
Due to the streaming nature of the implementation, there are the following
limitations when compared to XSSF:
* Only a limited number of rows are accessible at a point in time.
* Sheet.clone() is not supported.
* Formula evaluation is not supported
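The access window described above can be illustrated with a small toy model. This is only a sketch of the idea and not POI's implementation: the class `RowWindowSketch`, its methods, and the plain-text temp file format are all hypothetical names chosen for illustration. It keeps at most `windowSize` rows in memory and, when one more row is created, writes the row with the lowest index to a temp file and drops it from memory.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a sliding row window (hypothetical; not POI code). */
public class RowWindowSketch implements AutoCloseable {
    private final int windowSize;
    // Rows currently accessible in memory, ordered by row index
    private final TreeMap<Integer, String> window = new TreeMap<>();
    private final Path tempFile;
    private final BufferedWriter out;
    private int flushedRows = 0;

    public RowWindowSketch(int windowSize) {
        this.windowSize = windowSize;
        Path path;
        BufferedWriter writer;
        try {
            path = Files.createTempFile("row-window-", ".txt");
            writer = Files.newBufferedWriter(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.tempFile = path;
        this.out = writer;
    }

    /** Adds a row; if the window is already full, the lowest-index row is flushed first. */
    public void createRow(int index, String content) {
        if (window.size() >= windowSize) {
            Map.Entry<Integer, String> oldest = window.pollFirstEntry();
            try {
                out.write(oldest.getKey() + ":" + oldest.getValue());
                out.newLine();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            flushedRows++;
        }
        window.put(index, content);
    }

    /** Returns the row if it is still in the window, or null once it has been flushed. */
    public String getRow(int index) {
        return window.get(index);
    }

    public int rowsInMemory() {
        return window.size();
    }

    public int rowsFlushed() {
        return flushedRows;
    }

    /** Closes the writer and deletes the temp file. */
    @Override
    public void close() {
        try {
            out.close();
            Files.deleteIfExists(tempFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a window of 100 and 250 rows created, memory holds only the last 100 rows, and asking for row 0 returns null. This mirrors the first limitation in the list: only a limited number of rows is accessible at any point in time.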
Here are the test results for SXSSFWorkbook:
Exporting Excel with temp-file caching
Metric / rows | 100k | 200k | 300k | 500k | 800k | 1M | 1.5M | 2M | 3M |
---|---|---|---|---|---|---|---|---|---|
Output file size | 37M | 74M | 111M | 184M | 295M | 368M | 552M | 736M | 1.1G |
Time (ms) | 16259 | 29516 | 45846 | 75503 | 120434 | 156484 | 233730 | 303510 | 463399 |
CPU usage (%) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
Memory usage (KB) | 149460 | 176576 | 141940 | 143700 | 168460 | 180168 | 169632 | 198320 | 187484 |
Cache file size | 37M | 74M | 111M | 185M | 295M | 369M | 553M | 737M | 1.1G |
As the table shows, both performance and resource usage stay fairly even as the row count grows. With that, the problem is solved.
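To make the "fairly even" claim concrete, the timings in the table can be turned into rows per second. The sketch below only does arithmetic on the numbers reported above; the class and method names are hypothetical.

```java
public class ThroughputSketch {
    /** Rows written per second, using integer arithmetic on the measured milliseconds. */
    static long rowsPerSecond(long rows, long elapsedMs) {
        return rows * 1000 / elapsedMs;
    }

    public static void main(String[] args) {
        // Row counts and elapsed times (ms) copied from the SXSSFWorkbook results table
        long[] rows = {100_000, 200_000, 300_000, 500_000, 800_000,
                       1_000_000, 1_500_000, 2_000_000, 3_000_000};
        long[] elapsedMs = {16259, 29516, 45846, 75503, 120434,
                            156484, 233730, 303510, 463399};
        for (int i = 0; i < rows.length; i++) {
            System.out.println(rows[i] + " rows: "
                    + rowsPerSecond(rows[i], elapsedMs[i]) + " rows/s");
        }
    }
}
```

The throughput stays between roughly 6,100 and 6,800 rows per second at every size, so the export time grows about linearly with the number of rows while memory usage stays in the same range.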
A few things to watch out for when using SXSSFWorkbook:
SXSSF flushes sheet data in temporary files (a temp file per sheet) and the size
of these temporary files can grow to a very large value. For example, for a 20 MB
csv data the size of the temp xml becomes more than a gigabyte. If the size of the
temp files is an issue, you can tell SXSSF to use gzip compression:
SXSSFWorkbook wb = new SXSSFWorkbook();
wb.setCompressTempFiles(true); // temp files will be gzipped
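The effect of gzip on this kind of data can be tried without POI. The following is a minimal sketch under stated assumptions: the XML-like row markup is only a rough stand-in for sheet XML, everything happens in memory instead of in real temp files, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

public class TempFileCompressionSketch {
    /** Builds XML-like text for the given number of rows and columns (markup is an assumption). */
    static byte[] buildRows(int rows, int cols) {
        Random random = new Random(42); // fixed seed so runs are repeatable
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < rows; r++) {
            sb.append("<row r=\"").append(r + 1).append("\">");
            for (int c = 0; c < cols; c++) {
                sb.append("<c t=\"n\"><v>").append(random.nextDouble()).append("</v></c>");
            }
            sb.append("</row>\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Returns the size in bytes of the data after gzip compression. */
    static int gzippedSize(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (OutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        byte[] raw = buildRows(10_000, 30);
        System.out.println("raw bytes:     " + raw.length);
        System.out.println("gzipped bytes: " + gzippedSize(raw));
    }
}
```

The repeated tags compress well even though the cell values are random, which is why compressing the temp files saves disk space at the cost of some extra CPU time.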
// Needs java.io.BufferedWriter, java.io.File and java.io.FileWriter
private static void processCSV(int rowsNum) throws Exception {
    try {
        long startTime = System.currentTimeMillis();
        final int NUM_OF_ROWS = rowsNum;
        final int NUM_OF_COLUMNS = 30;
        File file = new File("ooxml-scatter-chart_" + rowsNum + ".csv");
        // try-with-resources closes the writer even if a write fails
        try (BufferedWriter bf = new BufferedWriter(new FileWriter(file))) {
            StringBuilder sb = new StringBuilder();
            for (int rownum = 0; rownum < NUM_OF_ROWS; rownum++) {
                for (int cellnum = 0; cellnum < NUM_OF_COLUMNS; cellnum++) {
                    sb.append(Math.random());
                    if ((cellnum + 1) != NUM_OF_COLUMNS) {
                        sb.append(",");
                    }
                }
                sb.append("\n");
                // Hand the buffered rows to the writer every 10,000 rows
                if (rownum % 10000 == 0) {
                    bf.write(sb.toString());
                    sb.setLength(0);
                }
            }
            // Write whatever is still buffered so the last rows are not lost
            bf.write(sb.toString());
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        long endTime = System.currentTimeMillis();
        System.out.println("process " + rowsNum + " spent time:" + (endTime - startTime));
    } catch (Exception e) {
        e.printStackTrace();
        throw e;
    }
}
Excel, without caching
// Body of the test method; rowsNum is its parameter, as in the CSV test.
// Needs java.io.FileOutputStream, org.apache.poi.ss.usermodel.* and org.apache.poi.xssf.usermodel.XSSFWorkbook.
try {
    long startTime = System.currentTimeMillis();
    final int NUM_OF_ROWS = rowsNum;
    final int NUM_OF_COLUMNS = 30;
    Workbook wb = new XSSFWorkbook();
    Sheet sheet = wb.createSheet("Sheet 1");
    // Create each row and put some cells in it. Rows are 0 based.
    for (int rowIndex = 0; rowIndex < NUM_OF_ROWS; rowIndex++) {
        Row row = sheet.createRow(rowIndex);
        for (int colIndex = 0; colIndex < NUM_OF_COLUMNS; colIndex++) {
            Cell cell = row.createCell(colIndex);
            cell.setCellValue(Math.random());
        }
    }
    // Write the output to a file
    try (FileOutputStream out = new FileOutputStream("ooxml-scatter-chart_XSSF_" + rowsNum + ".xlsx")) {
        wb.write(out);
    }
    wb.close();
    long endTime = System.currentTimeMillis();
    System.out.println("process " + rowsNum + " spent time:" + (endTime - startTime));
} catch (Exception e) {
    e.printStackTrace();
    throw e;
}
Excel, with caching
// Body of the test method; rowsNum is its parameter, as in the CSV test.
// Needs java.io.FileOutputStream, org.apache.poi.ss.usermodel.* and org.apache.poi.xssf.streaming.SXSSFWorkbook.
try {
    long startTime = System.currentTimeMillis();
    final int NUM_OF_ROWS = rowsNum;
    final int NUM_OF_COLUMNS = 30;
    SXSSFWorkbook wb = null;
    try {
        wb = new SXSSFWorkbook();
        wb.setCompressTempFiles(true); // Compress the temp files. Important: otherwise the disk fills up quickly
        Sheet sh = null;
        int rowNum = 0;
        for (int num = 0; num < NUM_OF_ROWS; num++) {
            // Start a new sheet every 1,000,000 rows, including the very first row,
            // so no empty sheet is left at the front of the workbook
            if (num % 1_000_000 == 0) {
                sh = wb.createSheet("sheet " + num);
                rowNum = 0;
            }
            // Rows are 0 based within each sheet
            Row row = sh.createRow(rowNum++);
            for (int cellnum = 0; cellnum < NUM_OF_COLUMNS; cellnum++) {
                Cell cell = row.createCell(cellnum);
                cell.setCellValue(Math.random());
            }
        }
        try (FileOutputStream out = new FileOutputStream("ooxml-scatter-chart_SXSSFW_" + rowsNum + ".xlsx")) {
            wb.write(out);
        }
    } catch (Exception ex) {
        ex.printStackTrace();
    } finally {
        if (wb != null) {
            wb.dispose(); // Delete the temp files. Important: otherwise the disk may fill up
        }
    }
    long endTime = System.currentTimeMillis();
    System.out.println("process " + rowsNum + " spent time:" + (endTime - startTime));
} catch (Exception e) {
    e.printStackTrace();
    throw e;
}
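The cached export starts a new sheet every 1,000,000 rows, which keeps each sheet below the .xlsx limit of 1,048,576 rows. The mapping from a global row number to a sheet and a row within that sheet can be sketched on its own. The class and method names below are hypothetical, and the 1,000,000 batch size is taken from the test code.

```java
public class SheetSplitSketch {
    // Rows per sheet used in the test code; it must not exceed the .xlsx limit of 1,048,576
    static final int ROWS_PER_SHEET = 1_000_000;

    /** Zero-based index of the sheet that holds the given global row number. */
    static int sheetIndex(long globalRow) {
        return (int) (globalRow / ROWS_PER_SHEET);
    }

    /** Zero-based row index inside that sheet. */
    static int rowIndex(long globalRow) {
        return (int) (globalRow % ROWS_PER_SHEET);
    }

    /** Number of sheets needed for the given total row count (rounded up). */
    static int sheetCount(long totalRows) {
        return (int) ((totalRows + ROWS_PER_SHEET - 1) / ROWS_PER_SHEET);
    }

    public static void main(String[] args) {
        long totalRows = 2_500_000;
        System.out.println("sheets needed: " + sheetCount(totalRows));
        System.out.println("row 1,000,000 goes to sheet " + sheetIndex(1_000_000)
                + ", row " + rowIndex(1_000_000));
    }
}
```

Keeping this arithmetic in small helpers makes off-by-one mistakes easier to spot, such as an empty first sheet or a skipped first row.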
Author: 听风过隙
Link: https://www.jianshu.com/p/a1a885e09b13
Source: 简书 (Jianshu)