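A previous project of mine required importing large data files, so here is a summary of how I got to a working solution.
The data file was available in two formats, xls and txt (200 MB+).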
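Since I had already worked with jxl and POI, I started with the xls file. In practice, though, the import kept failing because the Java heap ran out of memory, and increasing the heap size several times did not help.
The cause is that JXL reads and parses the entire file in one go. So I had to find another way: fall back on Java's most basic IO streams and do the parsing myself, reading one line at a time, parsing it, and inserting it.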
FileInputStream fis = null;
InputStreamReader isr = null;
BufferedReader br = null;
Connection conn = null;
PreparedStatement stmt = null;
try {
    Class.forName(jdbc_driver);
    conn = DriverManager.getConnection(jdbc_url, jdbc_user, jdbc_pwd);
    String sql = "insert into pmc values(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)";
    stmt = conn.prepareStatement(sql);
    String str = "";
    fis = new FileInputStream(filePath);
    isr = new InputStreamReader(fis);
    br = new BufferedReader(isr);
    // Read one line at a time so the whole file is never held in memory
    while ((str = br.readLine()) != null) {
        // "|" is a regex metacharacter, so it has to be escaped
        String[] rowData = str.split("\\|");
        // Skip rows that do not have all 20 columns
        if (rowData.length >= 20) {
            for (int i = 0; i < 20; i++) {
                stmt.setString(i + 1, rowData[i]);
            }
            stmt.execute(); // one database round trip per row
        }
    }
} // catch and finally blocks are the same as in the final version below
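One detail of the parsing step is easy to miss: `String.split` with no limit argument drops trailing empty strings, so a row whose last columns are empty comes back with fewer than 20 fields and is skipped by the length check. Whether the real data contains such rows is not stated here, so the following is only a sketch of how the difference shows up. `PipeRowParser` and `parse` are hypothetical names, not part of the original code.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the original import code.
final class PipeRowParser {
    // Compile the delimiter once; Pattern.quote treats "|" as a literal character.
    private static final Pattern PIPE = Pattern.compile(Pattern.quote("|"));

    // A limit of -1 keeps trailing empty fields instead of discarding them.
    static String[] parse(String line) {
        return PIPE.split(line, -1);
    }

    public static void main(String[] args) {
        String line = "a|b||";
        System.out.println(line.split("\\|").length); // prints 2: trailing empty fields are dropped
        System.out.println(parse(line).length);       // prints 4: trailing empty fields are kept
    }
}
```

If empty trailing columns are legitimate in the file, a split that keeps them would let those rows pass the `rowData.length >= 20` check; if they indicate bad data, the original behaviour of skipping them may be exactly what is wanted.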
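That solved the memory problem, but the import turned out to be far too slow. I switched to addBatch and sent the inserts to the database in batches of 1000 records. The final code looks like this: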
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.text.SimpleDateFormat;

private static int batchsize = 1000;

public void importFormTxt(String filePath) {
    FileInputStream fis = null;
    InputStreamReader isr = null;
    BufferedReader br = null;
    Connection conn = null;
    PreparedStatement stmt = null;
    try {
        Class.forName(jdbc_driver);
        conn = DriverManager.getConnection(jdbc_url, jdbc_user, jdbc_pwd);
        String sql = "insert into pmc values(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)";
        stmt = conn.prepareStatement(sql);
        String str = "";
        fis = new FileInputStream(filePath);
        isr = new InputStreamReader(fis);
        br = new BufferedReader(isr);
        int rowNum = 0;  // rows added to a batch so far
        int batchNo = 1; // number of the batch currently being filled
        long tmpT1 = System.currentTimeMillis();
        System.out.println("import PMC start at:" + (new SimpleDateFormat("yyyy.MM.dd HH:mm:ss")).format(tmpT1));
        while ((str = br.readLine()) != null) {
            String[] rowData = str.split("\\|");
            // Skip rows that do not have all 20 columns
            if (rowData.length >= 20) {
                rowNum++;
                for (int i = 0; i < 20; i++) {
                    stmt.setString(i + 1, rowData[i]);
                }
                stmt.addBatch();
            }
            // Send the batch once it holds batchsize rows
            if (rowNum == batchNo * batchsize) {
                ++batchNo;
                stmt.executeBatch();
                System.out.println("insert into " + rowNum + " success!");
                stmt.clearBatch();
            }
        }
        // Send the remaining rows that did not fill a whole batch
        if ((batchNo - 1) * batchsize < rowNum) {
            stmt.executeBatch();
            System.out.println("insert into " + rowNum + " success!");
            stmt.clearBatch();
        }
        long tmpT2 = System.currentTimeMillis();
        System.out.println("import PMC end at:" + (new SimpleDateFormat("yyyy.MM.dd HH:mm:ss")).format(tmpT2));
        System.out.println("use time:" + (tmpT2 - tmpT1) / 1000 + "s");
    } catch (FileNotFoundException e) {
        System.out.println("no file found");
    } catch (IOException e) {
        System.out.println("read file failure");
    } catch (ClassNotFoundException e) {
        e.printStackTrace();
    } catch (SQLException e) {
        e.printStackTrace();
    } finally {
        // Close only what was actually opened; an early failure leaves some of these null
        try {
            if (br != null) br.close();
            if (isr != null) isr.close();
            if (fis != null) fis.close();
            if (stmt != null) stmt.close();
            if (conn != null) conn.close();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}

It suddenly struck me that the most basic parts of Java are enough to solve the most practical problems. Sometimes a third-party jar only makes the problem more complicated.
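As a footnote to the batching approach, the flush logic can be pulled out of the JDBC code so that it can be checked without a database. The sketch below is a variation, not the code used in the project: it collects rows in a list and hands each full group to a hypothetical `sink` callback instead of calling `addBatch` on a `PreparedStatement`. `BatchingSketch`, `importInBatches`, `minColumns` and `sink` are all names made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: same read-split-batch flow, with the database replaced by a callback.
final class BatchingSketch {
    /**
     * Reads pipe-delimited lines, keeps rows with at least minColumns fields,
     * and passes them to sink in groups of at most batchSize rows.
     * Returns the number of rows that were accepted.
     */
    static int importInBatches(BufferedReader reader, int minColumns, int batchSize,
                               Consumer<List<String[]>> sink) {
        List<String[]> pending = new ArrayList<>(batchSize);
        int accepted = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] row = line.split("\\|");
                if (row.length < minColumns) {
                    continue; // skip short rows, as the original code does
                }
                pending.add(row);
                accepted++;
                if (pending.size() == batchSize) {
                    sink.accept(pending); // a full batch is ready
                    pending = new ArrayList<>(batchSize);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!pending.isEmpty()) {
            sink.accept(pending); // final partial batch
        }
        return accepted;
    }
}
```

In a JDBC setting the `sink` would presumably loop over the rows, call `setString` and `addBatch` for each one, and then call `executeBatch`. Because memory use is bounded by `batchSize` rows rather than by the file size, the approach keeps the property that made line-by-line reading work in the first place. Turning off auto-commit with `Connection.setAutoCommit(false)` and calling `commit()` after each batch is another option that may be worth measuring, though the original code does not do this and the effect depends on the database and driver.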