两个服务同时批量更新相同Pg表的相同记录时,产生如下错误:
[2024-01-18 09:17:52,172 host: 173.16.34.10 ERROR alarm [org.springframework.kafka.KafkaListenerEndpointContainer#3-0-C-1] com.zhongbao.alarm.service.impl.AlarmServiceImpl - <ZG-COMMON-JS-ALARM>批量更新失败:]
org.springframework.dao.DeadlockLoserDataAccessException:
### Error updating database. Cause: org.postgresql.util.PSQLException: ERROR: deadlock detected
Detail: Process 2345 waits for ShareLock on transaction 177149283; blocked by process 30377.
Process 30377 waits for ShareLock on transaction 177149284; blocked by process 2345.
Hint: See server log for query details.
Where: while updating tuple (2672,3) in relation "alarm_data_20240117"
### The error may exist in class path resource [mapper/AlarmDataMapper.xml]
### The error may involve com.zhongbao.alarm.mapper.AlarmDataMapper.updateAlarmBatch-Inline
### The error occurred while setting parameters
两个更新事务(如批量update)在已获取锁(行锁)的情况下,都尝试去获取对方的锁(行锁),会互相等待,形成死锁 !死锁的详细信息,出现在服务器日志中。
推测:
1、基于update …的批量更新,在提交给Pg后,Pg自动生成批量更新的事务;
2、更新事务欲修改行数据时,会获取锁来锁定该行,直到事务执行完毕后释放所有的行锁(Pg会自动为更新添加行锁);
3、多个更新事务因修改记录相互等待对方的释放锁时,即发生死锁;
4、死锁后,直到其中一个事务超期结束,其锁被释放,死锁才打开;
查看Pg所有锁: SELECT * FROM pg_locks;
查看Pg的行锁: CREATE EXTENSION pgrowlocks; SELECT * FROM pgrowlocks('表名');
The correct way to perform such operations is to lock resources in the same order.
For example: in this case, accounts(表) can be locked in ascending order of their numbers.即批量更改表数据,需按照相同字段的相同排序去批量修改,保证修改顺序即可。
/// 顺序更新数据,以防止死锁
oldAlarmDataList.sort(new Comparator<AlarmData>() {
@Override
public int compare(AlarmData o1, AlarmData o2) {
return o1.getSid().compareTo(o2.getSid());
}
});
startTime = System.currentTimeMillis();
this.alarmAllMapper.updateAlarmAllBatch(oldAlarmDataList);
log.info("更新消息数量:{}, 耗时:{}",
oldAlarmDataList.size(), System.currentTimeMillis() - startTime);
=> create table accounts (
acc_no integer,
amount integer
);
=> insert into accounts values(0,0),(1,1),(2,2),(3,3),(4,4);
=> create index on accounts(amount desc);
/// 自增函数,延迟1秒后加100
=> CREATE FUNCTION inc_slow(n numeric) RETURNS numeric AS $$
SELECT pg_sleep(1);
SELECT n + 100.00;
$$ LANGUAGE SQL;
/// 安装扩展函数
=> CREATE EXTENSION pgrowlocks;
/ 第一会话 顺序修改表记录
/// Since tuples on the table page are located in ascending order of the amount (exactly how we added them),
/// they will also be updated in the same order.
/// explain命令输出结果中的"Seq Scan"表示顺序扫描,即全表扫描。
=> EXPLAIN (costs off) UPDATE accounts SET amount = inc_slow(amount);
| QUERY PLAN
| ----------------------------
| Update on accounts
| -> Seq Scan on accounts
| (2 rows)
/ 第二会话 逆序修改表记录
/// 禁止顺序扫描
=> SET enable_seqscan = off;
/// 注意:这里的100如果写成100.00会因数据类型不一致导致索引失效 !!!
/// explain命令输出结果中的"Index Scan"表示索引扫描。
=> EXPLAIN (costs off)
UPDATE accounts SET amount = inc_slow(amount) WHERE amount > 100;
| QUERY PLAN
| --------------------------------------------------------
| Update on accounts
| -> Index Scan using accounts_amount_idx on accounts
| Index Cond: (amount > 100.00)
| (3 rows)
/// 查看sql线程(pid)对记录的加锁情况
=> SELECT * FROM pgrowlocks('accounts');
当两个sql线程同时运行,其更新的记录"碰头"后,会产生死锁 !
特别注意: 在执行"SET enable_seqscan = off;“之前,虽然已添加索引,但执行"UPDATE accounts SET amount = inc_slow(amount) WHERE amount > 100;”
时候,Pg依然进行的全表扫描!因此推测:对应较简单的查询,即使建立并使用索引过滤,Pg依然会进行全表扫描。
参考文档
https://blog.csdn.net/pg_hgdb/article/details/100653715