Using MySQL as the example database, this post walks through several efficient ways to batch-insert data and compares them, simulating 100,000 rows.
application.yml configuration:
server:
  port: 8086
spring:
  application:
    name: batch
  jpa:
    database: mysql
    show-sql: true
    properties:
      hibernate:
        dialect: org.hibernate.dialect.MySQL5InnoDBDialect
        generate_statistics: true
        jdbc:
          batch_size: 500
          batch_versioned_data: true
        order_inserts: true
        order_updates: true
  datasource:
    url: jdbc:mysql://localhost:3306/hr?rewriteBatchedStatements=true&serverTimezone=UTC&useUnicode=true&characterEncoding=utf-8&useSSL=true&allowMultiQueries=true
    username: root
    password: ****
    driver-class-name: com.mysql.cj.jdbc.Driver
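The `rewriteBatchedStatements=true` parameter on the datasource URL above deserves emphasis: it makes MySQL Connector/J rewrite a JDBC batch of single-row inserts into one multi-row `INSERT ... VALUES (...),(...)` statement, which is where most of the speedup measured below comes from. A rough sketch of the rewritten statement's shape, using a hypothetical helper purely for illustration (the driver does this internally; application code never builds this string):

```java
// Hypothetical helper, for illustration only: the shape of the statement the
// MySQL driver sends when rewriteBatchedStatements=true collapses a batch of
// single-row inserts into one multi-row INSERT.
public class MultiRowInsertSketch {
    public static String multiRowInsert(String table, String[] cols, int rows) {
        // one "(?,?,...,?)" placeholder group per row
        String group = "(" + "?,".repeat(cols.length - 1) + "?)";
        StringBuilder sb = new StringBuilder("INSERT INTO ").append(table)
                .append("(").append(String.join(",", cols)).append(") VALUES ");
        for (int i = 0; i < rows; i++) {
            if (i > 0) sb.append(",");
            sb.append(group);
        }
        return sb.toString();
    }
}
```

Without this flag, the driver still sends one network round trip per row even when you call `addBatch`/`executeBatch`.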
Test data:
private List<User> userList = new ArrayList<>();

public void inits() {
    // simulate 100,000 rows
    for (int i = 0; i < 100000; i++) {
        User user = new User();
        user.setAge(i);
        user.setId(i + "");
        user.setName("name" + i);
        userList.add(user);
    }
}
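The snippets above assume a simple `User` entity mapped to the `t_user` table. A minimal sketch, with field names inferred from the setters used above; the JPA annotations are noted in comments so the class compiles without a JPA provider on the classpath. Note that the id is assigned by the application rather than generated by the database, which matters: Hibernate cannot batch inserts for entities whose id uses the `IDENTITY` generation strategy.

```java
// Minimal sketch of the assumed User entity for table t_user. In the real
// project this would carry @Entity, @Table(name = "t_user"), and @Id on the
// id field; the annotations are omitted here so the class stands alone.
public class User {
    private String id;   // assigned by the application, not IDENTITY-generated
    private String name;
    private int age;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}
```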
Plain JPA saveAll is simply too slow: even with the configuration above, 100,000 rows take about 23 s. Working with the EntityManager directly and flushing in chunks does much better:
@PersistenceContext
private EntityManager em;

private static final int BATCH_SIZE = 10000;

/**
 * Batch insert via the EntityManager; requires the batch configuration above.
 * 100,000 rows took 2549 ms.
 */
@Transactional(rollbackFor = Exception.class)
public void batchInsertWithEntityManager(List<User> list) {
    int index = 0;
    for (User user : list) {
        em.persist(user);
        index++;
        // flush and clear the persistence context every BATCH_SIZE entities
        // to keep memory bounded and let Hibernate batch the inserts
        if (index % BATCH_SIZE == 0) {
            em.flush();
            em.clear();
        }
    }
    // flush any remaining entities in the last partial chunk
    if (index % BATCH_SIZE != 0) {
        em.flush();
        em.clear();
    }
}
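The flush/clear cadence above amounts to ⌈n / BATCH_SIZE⌉ flushes in total: one per full chunk, plus one more for any remainder. Isolated as a pure function for clarity (a hypothetical helper, not part of the original code):

```java
// Hypothetical helper: how many flush/clear cycles batchInsertWithEntityManager
// performs for n entities with the given batch size (ceiling division).
public class FlushMath {
    public static int flushCount(int n, int batchSize) {
        return (n + batchSize - 1) / batchSize;
    }
}
```

For the 100,000-row test with BATCH_SIZE = 10000 that is exactly 10 flushes, so each flush lets Hibernate issue a large grouped batch instead of 100,000 individual statements.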
Result:
@Test
public void testBatchInsert() {
    long saveStart = System.currentTimeMillis();
    batchDao.batchInsertWithEntityManager(userList);
    long saveEnd = System.currentTimeMillis();
    System.out.println("the save total time is " + (saveEnd - saveStart) + " ms"); // the save total time is 2549 ms
    userDao.deleteAllInBatch(); // a single statement, bulk delete
}
@Autowired
private JdbcTemplate jdbcTemplate;

/**
 * Batch insert via JdbcTemplate.batchUpdate; you write the SQL yourself,
 * and the configuration above is still required.
 */
public void batchWithJDBCTemplate(List<User> list) {
    String sql = "INSERT INTO t_user(id, name, age) VALUES (?, ?, ?)";
    jdbcTemplate.batchUpdate(sql, new BatchPreparedStatementSetter() {
        @Override
        public void setValues(PreparedStatement ps, int i) throws SQLException {
            ps.setString(1, list.get(i).getId());
            ps.setString(2, list.get(i).getName());
            ps.setInt(3, list.get(i).getAge());
        }

        @Override
        public int getBatchSize() {
            return list.size();
        }
    });
}
Result:
@Test
public void testBatchWithJDBC() {
    long saveStart = System.currentTimeMillis();
    batchDao.batchWithJDBCTemplate(userList);
    long saveEnd = System.currentTimeMillis();
    System.out.println("the save total time is " + (saveEnd - saveStart) + " ms"); // the save total time is 1078 ms
    userDao.deleteAllInBatch(); // a single statement, bulk delete
}
/**
 * Batch insert via the native JDBC batch API; no extra configuration needed.
 */
public void batchWithNativeSql(List<User> list) throws SQLException {
    String sql = "INSERT INTO t_user(id, name, age) VALUES (?, ?, ?)";
    DataSource dataSource = jdbcTemplate.getDataSource();
    // try-with-resources closes the connection and statement even on failure
    // (the original version leaked both and swallowed the SQLException)
    try (Connection connection = dataSource.getConnection();
         PreparedStatement ps = connection.prepareStatement(sql)) {
        connection.setAutoCommit(false);
        final int batchSize = 10000;
        int count = 0;
        for (User user : list) {
            ps.setString(1, user.getId());
            ps.setString(2, user.getName());
            ps.setInt(3, user.getAge());
            ps.addBatch();
            count++;
            // send the batch every batchSize rows, and once more for the tail
            if (count % batchSize == 0 || count == list.size()) {
                ps.executeBatch();
                ps.clearBatch();
            }
        }
        connection.commit();
    }
}
Result:
@Test
public void testBatchWithNativeSql() throws SQLException {
    long saveStart = System.currentTimeMillis();
    batchDao.batchWithNativeSql(userList);
    long saveEnd = System.currentTimeMillis();
    System.out.println("the save total time is " + (saveEnd - saveStart) + " ms"); // the save total time is 899 ms
    userDao.deleteAllInBatch();
}
Both JdbcTemplate's batchUpdate and the native JDBC batch API comfortably handle large-volume inserts; the former still needs the configuration described above, while the latter does not. For small batches, pick whichever suits your needs.
Tip: if the configuration seems to have no effect, re-check the configuration file, and also check whether the table has triggers. If it does, coordinate with the people responsible for them; consider extracting the trigger's logic into explicit batch inserts on the affected tables.