PostgreSQL的最大连接数在某些时候会成为一个让业务很苦恼的问题。业务连接数暴增,连接数打满,但又不能重启服务这种场景下,显得非常尴尬。最大连接数就只能重启调整,要想不停库,貌似“无解”。
为什么调整最大连接数需要重启,不能修改为reload方式?
这个问题不详细展开,简单来说最大连接数参数和共享内存及信号量等相关的,设置最大连接数后,在启动时系统会相应分配这些资源。那原生的最大连接数参数看起来只能是重启方式修改生效了。
原生pg除了max_connections,还可以ALTER USER / DATABASE … CONNECTION LIMIT connlimit通过user和 database这两个维度去动态调整连接数。不过只适用于比较特定的场景下,稍微推敲一番,就会发现大部分场景下不适用。比如同实例多database,多user,并且每个user需要一定数量的连接数,那瓶颈又回到了max_connections
思考了一番,决定尝试新增一个在线参数db_connlimit,可以动态调整最大连接数,参数级别为PGC_SIGHUP即reload生效。
默认值为-1,保持默认值,最大连接数还是沿用原生的逻辑。取值范围从-1(default)至max_connnections 。也就是说系统资源还是由max_connections参数来管理,db_connlimit参数只用来管理连接数个数,资源满足情况下适当放大max_connections,例如将db_connlimit设置为max_connections的一半,在连接数已满的情况下,可以继续增大db_connlimit,来缓解一些压力。
当然这个方案是存在问题的:例如未考虑版本差异。要想做到兼容所有版本,那么可以考虑HooK extensions 或者中间件方式,不过这样架构可能会更加复杂了。研发大佬肯定有成熟的方案,作为运维同学,我们就最后挣扎一下,验证下这个方案。
这里使用pg13.0进行测试。
db_connlimit参数必须满足以下几点要求:
针对以上3个问题,依次来看:
global.c 设置全局变量DbConnlimit
/*
* Modify by Nickxyang at 2021-06-07 PM
* Add the global variable DbConnlimit.
*/
int DbConnlimit = 100;
/*
* Modify End at 2021-06-07 PM
*/
miscadmin.h
/*
* Modify by Nickxyang at 2021-06-07 PM
*/
extern PGDLLIMPORT int DbConnlimit;
/*
* Modify End at 2021-06-07 PM
*/
guc.c新增参数db_connlimit, 首个-1为default,第二个-1为最小值
/*
* Modify by Nickxyang at 2021-06-07 PM
* Add the parameter db_connlimit, you can modify the maximum number of connections online.
* The valid range is -1 (default) to Maxconntions. A value of -1 means that the parameter is not open.
*/
{
{
'db_connlimit', PGC_SIGHUP, CONN_AUTH_SETTINGS,
gettext_noop('Sets the maximum number of concurrent connections, which can be set online.'),
NULL
},
&DbConnlimit,
-1, -1, MAX_BACKENDS,
check_dbconnlimit, NULL, NULL
},
/*
* Modify End at 2021-06-07 PM
*/
guc.c增加check函数,该函数由函数指针指向,在initdb以及start环节check该参数合法性
/*
* Modify by Nickxyang at 2021-06-07 PM
*/
static bool
check_dbconnlimit(int *newval, void **extra, GucSource source)
{
if ((*newval + autovacuum_max_workers + 1 +
max_worker_processes + max_wal_senders > MAX_BACKENDS) && *newval > MaxConnections)
return false;
return true;
}
/*
* Modify End at 2021-06-07 PM
*/
postgresql.conf.sample
#db_connlimit = -1 # range -1 to max_connections
postinit.c 中InitPostgres函数
/*
* Modify by Nickxyang at 2021-06-07 PM
*
* If the DbConnlimit parameter is configured
* and the number of connections reaches DbConnlimit - ReservedBackends,
* non-superuser will not be able to create new connections
*/
if (DbConnlimit > 0 && DbConnlimit < MaxConnections)
{
if (!am_superuser && !am_walsender
&& ReservedBackends > 0 &&
!HaveNFreeProcs(MaxConnections - DbConnlimit + ReservedBackends))
ereport(FATAL,
(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
errmsg('remaining connection slots are reserved for non-replication superuser connections,please check parameter db_connlimit.')));
if (!am_walsender &&
!HaveNFreeProcs(MaxConnections - DbConnlimit))
ereport(FATAL,
(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
errmsg('sorry, too many clients already,please check parameter db_connlimit.')));
}
/* Modify End at 2021-06-07 PM */
这里处理逻辑完全沿用原来的逻辑,老的逻辑保留着,加入了db_connlimit参数的控制逻辑。If条件中DbConnlimit > 0可以保证在使用默认值-1时不进入该逻辑。
另外判断连接数是否已满HaveNFreeProcs函数修改了入参
针对非超级用户的判断逻辑,入参为:MaxConnections – DbConnlimit + ReservedBackends,
针对所有用户入的判断逻辑,入参为(包涵ReservedBackends):MaxConnections - DbConnlimit
该函数是通过查看共享内存ProcGlobal->freeProcs列表长度来判断是否连接数已满。
guc-file.c 中ProcessConfigFileInternal函数
/* Modify by Nickxyang at 2021-06-07 PM */
char *dbconnectlimit = 'db_connlimit';
/* Modify End at 2021-06-07 PM */
/* Modify by Nickxyang at 2021-06-07 PM
* When the db_connlimit configuration is greater than MaxConnections
* or is less than ReservedBackends (when ReservedBackends > 0)
* reload will not be effective */
if (strcmp(item->name,dbconnectlimit) == 0 && atoi(item->value) != -1
&& (atoi(item->value) > MaxConnections || atoi(item->value) <= ReservedBackends || atoi(item->value) == 0))
{
if (ReservedBackends > 0 && atoi(item->value) <= ReservedBackends)
{
ereport(elevel,
(errmsg('parameter \'%s\' is not set to \'%s\', because it must be bigger than superuser_reserved_connections=%d',
item->name, item->value,ReservedBackends)));
}
else
{
if (atoi(item->value) == 0)
{
ereport(elevel,
(errmsg('parameter \'%s\' is not set to \'%s\', because this will not allow non-superuser to login',
item->name, item->value)));
}
else
{
ereport(elevel,
(errmsg('parameter \'%s\' is not set to \'%s\', because it is bigger than max_connnections=%d',
item->name, item->value, MaxConnections)));
}
}
scres = -1;
}
else
/* Modify End at 2021-06-07 PM*/
scres = set_config_option(item->name, item->value,
context, PGC_S_FILE,
GUC_ACTION_SET, applySettings, 0, false);
这里只有当修改了db_connlimit 参数,并且参数值不满足要求(值大于MaxConnections 或者小于ReservedBackends或者值等于0)才会进入该逻辑,会在pglog里边记录一条设置失败的信息。并把scres赋值为-1,不会调用set_config_option函数使参数修改值生效。
如果不修改或者关闭db_connlimit参数,还是原生的逻辑。
编译安装并启动数据库,查看参数信息
postgres=# select * from pg_settings where name='db_connlimit';
-[ RECORD 1 ]---+----------------------------------------------------------------------------
name | db_connlimit
setting | -1
unit |
category | Connections and Authentication / Connection Settings
short_desc | Sets the maximum number of concurrent connections, which can be set online.
extra_desc |
context | sighup
vartype | integer
source | configuration file
min_val | -1
max_val | 262143
enumvals |
boot_val | -1
reset_val | -1
sourcefile | /data/pg13-0debug/data/postgresql.conf
sourceline | 3
pending_restart | f
postgres=#
修改参数值为5
postgres=# alter system set db_connlimit to 5 ;
ALTER SYSTEM
postgres=# select pg_reload_conf();
pg_reload_conf
----------------
t
(1 row)
postgres=# show db_connlimit ;
db_connlimit
--------------
5
(1 row)
postgres=# show max_connections ;
max_connections
-----------------
10
(1 row)
postgres=# show superuser_reserved_connections ;
superuser_reserved_connections
--------------------------------
3
(1 row)
postgres=#
可以看到使用testuser(testuser是非超级用户)建立2个连接后,非管理员就不能再新建连接了
[postgres@NickCentos:pg13.0:5432 ~]$psql -U testuser -d postgres
psql: error: could not connect to server: FATAL: remaining connection slots are reserved for non-replication superuser connections, please check parameter db_connlimit
[postgres@NickCentos:pg13.0:5432 ~]$psql -U testuser -d postgres
psql: error: could not connect to server: FATAL: remaining connection slots are reserved for non-replication superuser connections, please check parameter db_connlimit
[postgres@NickCentos:pg13.0:5432 ~]$psql -U postgres -d postgres
psql (13.0)
PG干O,天天象上!
Type 'help' for help.
postgres=# select count(1),usename from pg_stat_activity where backend_type='client backend' group by 2 order by 1;
count | usename
-------+----------
1 | postgres
2 | testuser
(2 rows)
postgres=#
当连接数达到db_connlimit 后,superuser也无法登陆了
postgres=# select count(1),usename from pg_stat_activity where backend_type='client backend' group by 2 order by 1;
count | usename
-------+----------
2 | testuser
3 | postgres
(2 rows)
postgres=# ^Z
[1]+ Stopped psql -U postgres -d postgres
[postgres@NickCentos:pg13.0:5432 ~]$psql -U postgres -d postgres
psql: error: could not connect to server: FATAL: sorry, too many clients already, please check parameter db_connlimit
[postgres@NickCentos:pg13.0:5432 ~]$psql -U postgres -d postgres
psql: error: could not connect to server: FATAL: sorry, too many clients already, please check parameter db_connlimit
[postgres@NickCentos:pg13.0:5432 ~]$
这里要修改连接数就需要手动改文件,pg_ctl reload生效,刚好我们验证下在线增加功能,
结果来看是生效了。
[postgres@NickCentos:pg13.0:5432 ~]$psql -U postgres -d postgres
psql: error: could not connect to server: FATAL: sorry, too many clients already, please check parameter db_connlimit
[postgres@NickCentos:pg13.0:5432 ~]$psql -U postgres -d postgres
psql: error: could not connect to server: FATAL: sorry, too many clients already, please check parameter db_connlimit
[postgres@NickCentos:pg13.0:5432 ~]$vim $PGDATA/postgresql.conf
[postgres@NickCentos:pg13.0:5432 ~]$vim $PGDATA/postgresql.auto.conf
[postgres@NickCentos:pg13.0:5432 ~]$grep db_connlimit $PGDATA/postgresql*conf
/data/pg13-0debug/data/postgresql.auto.conf:db_connlimit = '8'
/data/pg13-0debug/data/postgresql.conf:db_connlimit = 8 # range -1 to max_connections
[postgres@NickCentos:pg13.0:5432 ~]$pg_ctl reload
server signaled
[postgres@NickCentos:pg13.0:5432 ~]$psql -U postgres -d postgres
psql (13.0)
PG干O,天天象上!
Type 'help' for help.
postgres=# select count(1),usename from pg_stat_activity where backend_type='client backend' group by 2 order by 1;
count | usename
-------+----------
2 | testuser
4 | postgres
(2 rows)
postgres=#
postgres=# alter system set db_connlimit to 11 ;
ALTER SYSTEM
postgres=# select pg_reload_conf();
pg_reload_conf
----------------
t
(1 row)
postgres=# show db_connlimit ;
db_connlimit
--------------
8
(1 row)
postgres=# alter system set db_connlimit to 0 ;
ALTER SYSTEM
postgres=# select pg_reload_conf();
pg_reload_conf
----------------
t
(1 row)
postgres=# show db_connlimit ;
db_connlimit
--------------
8
(1 row)
postgres=# alter system set db_connlimit to 2 ;
ALTER SYSTEM
postgres=# select pg_reload_conf();
pg_reload_conf
----------------
t
(1 row)
postgres=# show db_connlimit ;
db_connlimit
--------------
8
(1 row)
postgres=#
这里并未生效,可以看到日志里有打印原因。
2021-06-07 14:36:48.866 CST [720] LOG: received SIGHUP, reloading configuration files
2021-06-07 14:36:48.866 CST [720] LOG: parameter 'db_connlimit' is not set to '11', because it is bigger than max_connnections=10.
2021-06-07 14:37:29.966 CST [720] LOG: received SIGHUP, reloading configuration files
2021-06-07 14:37:29.967 CST [720] LOG: parameter 'db_connlimit' is not set to '0', because it must be bigger than superuser_reserved_connections=3.
2021-06-07 14:37:47.122 CST [720] LOG: received SIGHUP, reloading configuration files
2021-06-07 14:37:47.123 CST [720] LOG: parameter 'db_connlimit' is not set to '2', because it must be bigger than superuser_reserved_connections=3.
无效值不会被识别
postgres=# alter system set db_connlimit to -2 ;
ERROR: -2 is outside the valid range for parameter 'db_connlimit' (-1 .. 262143)
postgres=# alter system set db_connlimit to a ;
ERROR: invalid value for parameter 'db_connlimit': 'a'
postgres=# alter system set db_connlimit to # ;
ERROR: syntax error at or near '#'
LINE 1: alter system set db_connlimit to # ;
^
postgres=#
配置如下:
postgres=# alter system set db_connlimit to -1 ;
ALTER SYSTEM
postgres=# select pg_reload_conf();
pg_reload_conf
----------------
t
(1 row)
postgres=# show db_connlimit ;
db_connlimit
--------------
-1
(1 row)
postgres=# show max_connections ;
max_connections
-----------------
10
(1 row)
postgres=# show superuser_reserved_connections ;
superuser_reserved_connections
--------------------------------
3
(1 row)
postgres=#
这里非超级用户只能建立7个连接,可以看到结果是正常的。
[postgres@NickCentos:pg13.0:5432 ~]$psql -U testuser -d postgres
psql: error: could not connect to server: FATAL: remaining connection slots are reserved for non-replication superuser connections
[postgres@NickCentos:pg13.0:5432 ~]$psql -U testuser -d postgres
psql: error: could not connect to server: FATAL: remaining connection slots are reserved for non-replication superuser connections
[postgres@NickCentos:pg13.0:5432 ~]$psql -U postgres -d postgres
psql (13.0)
PG干O,天天象上!
Type 'help' for help.
postgres=# select count(1),usename from pg_stat_activity where backend_type='client backend' group by 2 order by 1;
count | usename
-------+----------
1 | postgres
7 | testuser
(2 rows)
postgres=#
后续也进行了多次测试是符合预期的。
尝试加入在线管理参数db_connlimit初步验证是符合预期的。这只是一个设想的方案,适不适合生产还得慎重考虑。在应用连接数异常暴增且不能重启的场景下,这个方案可以缓解一些压力,不至于修改最大连接数但不能重启生效那么尴尬,但是同时要考虑信号量等系统资源问题。