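Stop claiming you didn't kill that session: the DBA is not taking the blame for this one.
Two colleagues, A and B, got into a heated argument over an important SQL session that had been killed: an ALTER TABLE that added a column never took effect. A is an application developer and B is a DBA.
B checked the database log, which showed: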
2023-02-26 11:25:01.925 CST,"prouser","prodb",1514,"192.168.2.5",63fad0a5.5ea,4,"idle in transaction",2023-02-26 11:23:17 CST,7/50609,43757341,FATAL,57P01,"terminating connection due to administrator command",,,,,,,,,"psql","client backend",,-7856242769374161054
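The log shows that a session in the idle in transaction state was killed at that time, but it does not record which query that session had run, so there is no hard evidence that the SQL in question was the one killed.
All B could offer was a suspicion: after the ALTER TABLE finished, the session sat in idle in transaction and was then killed, so the transaction never committed and the ALTER TABLE never took effect. This was an inference from the log rather than proof, and B had no comeback when A pushed back.
When a session in the idle or idle in transaction state is terminated, its query is not logged; see how ProcessInterrupts and errfinish handle things once the process receives SIGTERM. Only an active session, one that is executing a statement at that moment, has its query logged when it is terminated. Even then, two factors make it impossible to prove who issued the terminate.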
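1. Only the terminated process is logged, not the process that did the terminating. In other words, the "culprit" is never recorded.
2. Running kill -15 pid directly on the database host has the same effect, because pg_terminate_backend() is essentially a wrapper around kill -15 pid. So the DBA cannot fully clear himself of suspicion either.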
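To make the second point concrete, here is a minimal sketch in plain C (not PostgreSQL code) of what kill -15 pid boils down to: a kill(2) call with SIGTERM. The function name terminate_child_with_sigterm is made up for this illustration, and the child here uses the default SIGTERM action, whereas a real backend installs its own handler. The point is only that the signal, and the wait status it produces, carry no record of who sent it.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that just waits, send it SIGTERM with
 * kill(2), and return the terminating signal number reported by waitpid(),
 * or -1 on any failure.
 */
int
terminate_child_with_sigterm(void)
{
    int     status = 0;
    pid_t   child = fork();

    if (child < 0)
        return -1;

    if (child == 0)
    {
        /* Child: sleep until a signal arrives (default SIGTERM action kills it). */
        for (;;)
            pause();
    }

    /* Parent: this is the same call that "kill -15 <pid>" makes. */
    if (kill(child, SIGTERM) != 0)
        return -1;

    if (waitpid(child, &status, 0) != child)
        return -1;

    /* The wait status tells us which signal ended the child, not who sent it. */
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Calling terminate_child_with_sigterm() returns the signal number SIGTERM; nothing in that result distinguishes a SQL-level terminate from a shell-level kill.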
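In this situation, finger-pointing and blame-shifting are inevitable.
If we log every call to pg_terminate_backend() together with the query of the session being killed, the picture becomes clear and nobody has to waste time arguing. It looks like modifying pg_terminate_backend() is enough.
As we know, when an instance crashes the postmaster logs information about the failed process; if a query got the process OOM-killed, for example, the log shows the query that process was running. Looking at the implementation, it is the postmaster's LogChildExit function that records this while handling a child's exit. For an ordinary backend that exits with a FATAL-level error, however, the query is not logged through LogChildExit, and a session killed by pg_terminate_backend() exits with exactly this kind of FATAL error.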
/*
 * HandleChildCrash -- cleanup after failed backend, bgwriter, checkpointer,
 * walwriter, autovacuum, archiver or background worker.
 *
 * The objectives here are to clean up our local state about the child
 * process, and to signal all other remaining children to quickdie.
 */
static void
HandleChildCrash(int pid, int exitstatus, const char *procname)
{
    dlist_mutable_iter iter;
    slist_iter  siter;
    Backend    *bp;
    bool        take_action;

    /*
     * We only log messages and send signals if this is the first process
     * crash and we're not doing an immediate shutdown; otherwise, we're only
     * here to update postmaster's idea of live processes.  If we have already
     * signaled children, nonzero exit status is to be expected, so don't
     * clutter log.
     */
    /*
     * LogChildExit is skipped whenever take_action is false, i.e. when
     * FatalError is already set or an immediate shutdown is in progress.
     */
    take_action = !FatalError && Shutdown != ImmediateShutdown;

    if (take_action)
    {
        LogChildExit(LOG, procname, pid, exitstatus);
        ereport(LOG,
                (errmsg("terminating any other active server processes")));
        SetQuitSignalReason(PMQUIT_FOR_CRASH);
    }

    /* some lines omitted */
}
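The distinction between a crash and a FATAL exit can be sketched outside PostgreSQL. The following is a minimal illustration, assuming (this is my reading of the postmaster source, so check it against your version) that a backend ending with FATAL exits with code 1, and that exit codes 0 and 1 are treated as ordinary exits while anything else, including death by signal, counts as a crash. The names looks_like_crash and wait_status_for_exit_code are hypothetical helpers, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical classifier: true if a wait status would be handled as a
 * crash under the assumption stated above (exit codes 0 and 1 are normal).
 */
bool
looks_like_crash(int exitstatus)
{
    if (WIFEXITED(exitstatus))
    {
        int code = WEXITSTATUS(exitstatus);

        return code != 0 && code != 1;
    }

    /* Killed by a signal (e.g. SIGKILL from the OOM killer): a crash. */
    return true;
}

/*
 * Hypothetical helper: fork a child that exits with the given code and
 * return the raw wait status, or -1 on failure.
 */
int
wait_status_for_exit_code(int code)
{
    int     status = 0;
    pid_t   child = fork();

    if (child < 0)
        return -1;
    if (child == 0)
        _exit(code);
    if (waitpid(child, &status, 0) != child)
        return -1;
    return status;
}
```

Under that assumption, a child that exits with code 1 is classified as a normal exit, which is why a session ended by pg_terminate_backend() would not go through the crash path that logs the query.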
Changing the postmaster is not a good fit; the cost is too high. Instead, modify pg_terminate_backend() and copy the child-logging logic straight from LogChildExit.
/*
 * Send a signal to terminate a backend process.  This is allowed if you are a
 * member of the role whose process is being terminated.  If the timeout input
 * argument is 0, then this function just signals the backend and returns
 * true.  If timeout is nonzero, then it waits until no process has the given
 * PID; if the process ends within the timeout, true is returned, and if the
 * timeout is exceeded, a warning is emitted and false is returned.
 *
 * Note that only superusers can signal superuser-owned processes.
 */
Datum
pg_terminate_backend(PG_FUNCTION_ARGS)
{
    int         pid;
    int         r;
    int         timeout;        /* milliseconds */

    /* Modify by Nickyoung at 2023-02-26 AM */
    /*
     * size of activity_buffer is arbitrary, but set equal to default
     * track_activity_query_size
     */
    char        activity_buffer[1024];
    const char *activity = NULL;
    /* End at 2023-02-26 AM */

    pid = PG_GETARG_INT32(0);
    timeout = PG_GETARG_INT64(1);

    if (timeout < 0)
        ereport(ERROR,
                (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
                 errmsg("\"timeout\" must not be negative")));

    /* Look up the query text for this pid in shared memory */
    activity = pgstat_get_crashed_backend_activity(pid,
                                                   activity_buffer,
                                                   sizeof(activity_buffer));

    /* Send SIGTERM to the pid to terminate the process */
    r = pg_signal_backend(pid, SIGTERM);

    if (r == SIGNAL_BACKEND_NOSUPERUSER)
        ereport(ERROR,
                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                 errmsg("must be a superuser to terminate superuser process")));

    if (r == SIGNAL_BACKEND_NOPERMISSION)
        ereport(ERROR,
                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                 errmsg("must be a member of the role whose process is being terminated or member of pg_signal_backend")));

    /* Modify by Nickyoung at 2023-02-26 AM */
    /*
     * Record the operation of using pg_terminate_backend (PID)
     * to kill the session and the terminated query
     */
    /*
     * If SIGTERM was sent, log the pg_terminate_backend() call together with
     * the text of the terminated query.  The lookup can return NULL when no
     * matching backend is found, so substitute a placeholder in that case.
     */
    if (r != SIGNAL_BACKEND_ERROR)
    {
        ereport(WARNING,
                (errmsg("process is terminated by: select pg_terminate_backend(%d), query is: %s",
                        pid, activity ? activity : "<not available>")));
    }
    /* End at 2023-02-26 AM */

    /* Wait only on success and if actually requested */
    if (r == SIGNAL_BACKEND_SUCCESS && timeout > 0)
        PG_RETURN_BOOL(pg_wait_until_termination(pid, timeout));
    else
        PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
}
1. session1 opens a transaction and runs ALTER TABLE without committing; the process is in the idle in transaction state
testdb=> begin;
BEGIN
testdb=*> alter table instance_list add column type varchar(50) not null default 'rds';
ALTER TABLE
testdb=*>
2. session2 queries the table and waits on the lock; the process is in the active state
testdb=> select * from instance_list limit 1;
3. session3 kills every session belonging to the admin account
testdb=> select pg_terminate_backend(pid),* from pg_stat_activity where usename='admin' and pid <> pg_backend_pid();
WARNING: process is terminated by: select pg_terminate_backend(9680), query is: alter table instance_list add column type varchar(50) not null default 'rds';
WARNING: process is terminated by: select pg_terminate_backend(9744), query is: select * from instance_list limit 1;
pg_terminate_backend | datid | datname | pid | leader_pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend
_start | xact_start | query_start | state_change | wait_event_type | wait_event | state
| backend_xid | backend_xmin | query_id | query | backend_type
----------------------+-------+---------+------+------------+----------+---------+------------------+-------------+-----------------+-------------+----------------
---------------+-------------------------------+-------------------------------+-------------------------------+-----------------+------------+--------------------
-+-------------+--------------+----------------------+-------------------------------------------------------------------------------+----------------
t | 24583 | testdb | 9680 | | 24582 | admin | psql | | | -1 | 2023-02-26 14:4
6:09.621174+08 | 2023-02-26 14:46:12.540059+08 | 2023-02-26 14:46:17.352702+08 | 2023-02-26 14:46:17.354734+08 | Client | ClientRead | idle in transaction
| 43785370 | | -7856242769374161054 | alter table instance_list add column type varchar(50) not null default 'rds'; | client backend
t | 24583 | testdb | 9744 | | 24582 | admin | psql | | | -1 | 2023-02-26 14:4
6:45.833136+08 | 2023-02-26 14:46:48.770609+08 | 2023-02-26 14:46:48.770609+08 | 2023-02-26 14:46:48.770626+08 | Lock | relation | active
| | 43785370 | | select * from instance_list limit 1; | client backend
(2 rows)
testdb=>
After pg_terminate_backend(pid) runs, the corresponding messages are printed:
WARNING: process is terminated by: select pg_terminate_backend(9680), query is: alter table instance_list add column type varchar(50) not null default 'rds';
WARNING: process is terminated by: select pg_terminate_backend(9744), query is: select * from instance_list limit 1;
The server log now contains both the record of the pg_terminate_backend() call and the text of the terminated query:
2023-02-26 14:48:37.792 CST,"admin","testdb",9850,"192.168.2.6",63fb0068.267a,4,"SELECT",2023-02-26 14:47:04 CST,7/21,0,WARNING,01000,"process is terminated by: select pg_terminate_backend(9680), query is: alter table instance_list add column type varchar(50) not null default 'rds';",,,,,,,,,"psql","client backend",,-3764573643027268885
2023-02-26 14:48:37.792 CST,"admin","testdb",9680,"192.168.2.6",63fb0031.25d0,1,"idle in transaction",2023-02-26 14:46:09 CST,4/16,43785370,FATAL,57P01,"terminating connection due to administrator command",,,,,,,,,"psql","client backend",,-7856242769374161054
2023-02-26 14:48:37.792 CST,"admin","testdb",9850,"192.168.2.6",63fb0068.267a,5,"SELECT",2023-02-26 14:47:04 CST,7/21,0,WARNING,01000,"process is terminated by: select pg_terminate_backend(9744), query is: select * from instance_list limit 1;",,,,,,,,,"psql","client backend",,-3764573643027268885
2023-02-26 14:48:37.792 CST,"admin","testdb",9744,"192.168.2.6",63fb0055.2610,1,"SELECT waiting",2023-02-26 14:46:45 CST,5/18,0,FATAL,57P01,"terminating connection due to administrator command",,,,,,"select * from instance_list limit 1;",15,,"psql","client backend",,0
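This way we can catch whoever called pg_terminate_backend(), and the arguing can stop.
That said, this is not the same as what the postmaster does. The postmaster records the query of an exiting child while reaping it; by then the child is already on its exit path and will not run any new SQL, so what the postmaster records is the last query the child executed.
Our approach involves two sibling backend processes: one fetches the other's query and then immediately sends the signal to terminate it. The two steps are asynchronous, with a tiny time gap between them (nanosecond scale), and if there are very many connections, scanning for the matching pid while fetching the query can take longer. So when killing connections in bulk with select pg_terminate_backend(pid),pid,state,query from pg_stat_activity where xxx;, an active connection that runs very fast SQL (say, under 1 ms per statement) may already be executing a different statement by the time it is actually terminated, not the one we meant to kill and logged.
I tested this with pgbench: with 20000 connections on the instance, killing 2000 active connections in bulk left roughly 20 pids (about 1%) in this situation.
Reversing the order, sending the signal first and fetching the query afterwards, causes the opposite problem: the process may already have exited by the time we look, in which case the query comes back as NULL.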
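The two orderings and their failure modes can be sketched with a small deterministic simulation. This is not PostgreSQL code: SimBackend and the sim_* functions are hypothetical stand-ins for a backend's shared-memory status slot, and the "gap" between reading and signaling is modeled by an explicit call rather than real concurrency.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one backend's status slot in shared memory. */
typedef struct SimBackend
{
    int         pid;    /* 0 means the slot is free (process has exited) */
    const char *query;  /* text of the statement currently running */
} SimBackend;

/* Look up the query for a pid; NULL when that backend is already gone. */
const char *
sim_get_activity(const SimBackend *b, int pid)
{
    return (b->pid == pid) ? b->query : NULL;
}

/* The backend finishes one statement and starts the next one. */
void
sim_next_query(SimBackend *b, const char *next)
{
    if (b->pid != 0)
        b->query = next;
}

/* The backend reacts to SIGTERM and exits, freeing its slot. */
void
sim_exit(SimBackend *b)
{
    b->pid = 0;
    b->query = NULL;
}

/* Ordering used by the patch: read the query first, signal afterwards. */
const char *
read_then_signal(SimBackend *b, int pid, const char *next)
{
    const char *logged = sim_get_activity(b, pid);

    sim_next_query(b, next);    /* a fast backend moves on during the gap */
    sim_exit(b);                /* the signal arrives while "next" is running */
    return logged;              /* stale: not the statement that was killed */
}

/* Reverse ordering: signal first, read the query afterwards. */
const char *
signal_then_read(SimBackend *b, int pid)
{
    sim_exit(b);                        /* the backend may be gone already */
    return sim_get_activity(b, pid);    /* nothing left to read */
}
```

With read_then_signal the logged text is the earlier statement even though a later one was interrupted; with signal_then_read the lookup returns NULL. Neither ordering closes the window, it only changes which error you get.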
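Looked at another way, if I want to kill a pid whose SQL runs very quickly, it may already be executing another query by the time my pg_terminate_backend(pid) call completes. That case is unavoidable. What this approach does capture accurately is the record of the pg_terminate_backend(pid) call itself.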