postgresql tcp 连接超时问题

错误现象

应用中遇到一个错误

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

问题分析

Connection timed out

数据库侧记录 的日志之一

could not receive data from client: Connection timed out

数据库日志中没有同样的记录,应该是连接断开,客户端直接返回这个错误了。于是想办法重现这样的错误。

猜测和连接断开有关,这个好模拟在服务端将对应的连接kill -15 杀掉,客户端再继续操作会出现类似的错误。

[postgres@work ~]$ psql -h 192.168.149.131 -U antdb
Password for user antdb: 
psql (11.5, server 11.6)
Type "help" for help.

192.168.149.131:5432 antdb@test=# begin;
BEGIN
192.168.149.131:5432 antdb@test=*# commit;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

真实应用中不会有主动断开连接的操作,所以继续分析。联想到数据库会对空闲的连接做处理的机制。由几个参数控制

 tcp_keepalives_count            | Maximum number of TCP keepalive retransmits.
 tcp_keepalives_idle             | Time between issuing TCP keepalives.
 tcp_keepalives_interval         | Time between TCP keepalive retransmits.

和操作系统对tcp socket 控制类似。

在server端查看数据库连接网络情况,用ss或者netstat 命令

[antdb@node1 ~]$ ss -apte|grep 105408
ESTAB      0      0      192.168.149.131:postgres             192.168.149.100:44338               
users:(("postgres",pid=105408,fd=11)) timer:(keepalive,19sec,0) uid:1004 ino:118045263 
sk:ffff92e9416a4d80 <->

[antdb@node1 ~]$ netstat -antpo|grep 105408
tcp        0      0 192.168.149.131:5432    192.168.149.100:44338   ESTABLISHED 105408/postgres: an  keepalive (10.05/0/0)

可以看到keepalive 的情况。

查看数据库中tcp相关参数

[local]:5432 postgres@test=# select name,setting,reset_val from pg_settings where name ~ 'tcp';
          name           | setting | reset_val 
-------------------------+---------+-----------
 tcp_keepalives_count    | 0       | 3
 tcp_keepalives_idle     | 0       | 60
 tcp_keepalives_interval | 0       | 10

setting 列没有改变,感觉生效的是reset_val 列的数值,此处没有细研究。这三个参数的组合意思就是数据库对每个链接都会有计时器一样的操作,轮询计时,间隔tcp_keepalives_idle 这么长时间没有网络交互的话,数据库会发送心跳包给客户端,用tcpdump 可以抓取到

16:57:08.366531 IP 192.168.149.131.postgres > work.47230: Flags [.], ack 160, win 227, options [nop,nop,TS val 294551424 ecr 123129537], length 0
16:57:08.366555 IP work.47230 > 192.168.149.131.postgres: Flags [.], ack 316, win 239, options [nop,nop,TS val 123189623 ecr 294491338], length 0
16:58:08.398387 IP 192.168.149.131.postgres > work.47230: Flags [.], ack 160, win 227, options [nop,nop,TS val 294611456 ecr 123189623], length 0
16:58:08.398409 IP work.47230 > 192.168.149.131.postgres: Flags [.], ack 316, win 239, options [nop,nop,TS val 123249654 ecr 294491338], length 0
16:59:08.558872 IP 192.168.149.131.postgres > work.47230: Flags [.], ack 160, win 227, options [nop,nop,TS val 294671616 ecr 123249654], length 0
16:59:08.558907 IP work.47230 > 192.168.149.131.postgres: Flags [.], ack 316, win 239, options [nop,nop,TS val 123309815 ecr 294491338], length 0

客户端对心跳包做返回时,数据库就认为这不是一个废弃连接,也就不会进一步杀掉这个连接。

什么情况下数据库会主动杀掉连接呢,从心跳机制看数据库主动发给客户端的心跳超过设置的次数并且客户端一直没有反馈,该连接就会被杀掉。但怎么模拟客户端没有反馈呢,曾尝试各个级别的kill命令,和在代码里调用sleep函数,gdb close掉tcp socket 都没有模拟出来能力有限。决定把客户端的服务器网网卡关闭(用的虚拟机模拟),这样就可以模拟网络中断导致的通信失败了。

从命令观测到,在尝试了3次后,数据库把这个连接给杀掉了。

(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (20.44/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (19.29/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (18.80/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (2.86/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (1.10/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (0.29/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (9.45/0/1)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (6.29/0/2)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (4.47/0/2)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (0.30/0/2)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (8.72/0/3)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (3.19/0/3)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)

数据库日志中能看到断开连接。

2020-07-30 17:55:32.955 CST,,,63841,"192.168.149.100:47752",5f229914.f961,1,"",2020-07-30 17:55:32 CST,,0,LOG,00000,"connection received: host=192.168.149.100 port=47752",,,,,,,,,""
2020-07-30 17:55:32.957 CST,"antdb","test",63841,"192.168.149.100:47752",5f229914.f961,2,"authentication",2020-07-30 17:55:32 CST,3/2434,0,LOG,00000,"connection authorized: user=antdb database=test",,,,,,,,,""
2020-07-30 17:57:03.087 CST,"antdb","test",63841,"192.168.149.100:47752",5f229914.f961,3,"idle",2020-07-30 17:55:32 CST,3/0,0,LOG,XX000,"could not receive data from client: Connection timed out",,,,,,,,,"psql"
2020-07-30 17:57:03.087 CST,"antdb","test",63841,"192.168.149.100:47752",5f229914.f961,4,"idle",2020-07-30 17:55:32 CST,,0,LOG,00000,"disconnection: session time: 0:01:30.131 user=antdb database=test host=192.168.149.100 port=47752",,,,,,,,,"psql"

根据模拟出来的日志在生产系统中过滤确实有相同的日志,基本可以断定该错误是因为心跳超时数据库主动杀掉连接导致的,但生产系统为什么会出现心跳超时错误呢,生产系统应用是通过F5负载均衡连接数据库的,初步分析是F5对数据库发出的心跳包不再回应导致断联。

Connection reset by peer

could not receive data from client: Connection reset by peer

这样的日志有别于time out错误,虽然客户端看到的都是server closed the connection unexpectedly,但数据库日志记录中这个是另外的错误。Connection reset by peer是一个典型的tcp 网络连接的问题,当tcp正产建立连接后,如果一端主动端开连接(发送rst 网络包等),另一侧就会收到报错,这个错误不是数据库主动断开的,所以在应用侧针对连接抓包,用wareshark分析一下,可以看到F5主动发了rst信号给连接,将其断掉了。

No. Time    Source  Destination Protocol    Length  Info
1   0.000000    10.159.101.45   10.154.52.156   TCP 77  58721 → 15432 [PSH, ACK] Seq=1 Ack=1 Win=19561 Len=11 TSval=899685638 TSecr=3959865574
2   0.001125    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=1 Ack=12 Win=65535 Len=0 TSval=3960160103 TSecr=899685638
3   0.004199    10.154.52.156   10.159.101.45   TCP 83  15432 → 58721 [PSH, ACK] Seq=1 Ack=12 Win=65535 Len=17 TSval=3960160106 TSecr=899685638
4   0.004215    10.159.101.45   10.154.52.156   TCP 66  58721 → 15432 [ACK] Seq=12 Ack=18 Win=19561 Len=0 TSval=899685642 TSecr=3960160106
5   0.004390    10.159.101.45   10.154.52.156   TCP 1514    58721 → 15432 [ACK] Seq=12 Ack=18 Win=19561 Len=1448 TSval=899685642 TSecr=3960160106
6   0.004398    10.159.101.45   10.154.52.156   TCP 209 58721 → 15432 [PSH, ACK] Seq=1460 Ack=18 Win=19561 Len=143 TSval=899685642 TSecr=3960160106
7   0.005488    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=18 Ack=1603 Win=65535 Len=0 TSval=3960160108 TSecr=899685642
8   0.078162    10.154.52.156   10.159.101.45   TCP 314 15432 → 58721 [PSH, ACK] Seq=18 Ack=1603 Win=65535 Len=248 TSval=3960160180 TSecr=899685642
9   0.078262    10.159.101.45   10.154.52.156   TCP 7306    58721 → 15432 [ACK] Seq=1603 Ack=266 Win=20823 Len=7240 TSval=899685716 TSecr=3960160180
10  0.078278    10.159.101.45   10.154.52.156   TCP 7306    58721 → 15432 [ACK] Seq=8843 Ack=266 Win=20823 Len=7240 TSval=899685716 TSecr=3960160180
11  0.079582    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=4499 Win=65535 Len=0 TSval=3960160182 TSecr=899685716
12  0.079590    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=7395 Win=65535 Len=0 TSval=3960160182 TSecr=899685716
13  0.079595    10.159.101.45   10.154.52.156   TCP 7306    58721 → 15432 [ACK] Seq=16083 Ack=266 Win=20823 Len=7240 TSval=899685717 TSecr=3960160182
14  0.079616    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=10291 Win=65535 Len=0 TSval=3960160182 TSecr=899685716
15  0.079620    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=13187 Win=65535 Len=0 TSval=3960160182 TSecr=899685716
16  0.079624    10.159.101.45   10.154.52.156   TCP 7306    58721 → 15432 [ACK] Seq=23323 Ack=266 Win=20823 Len=7240 TSval=899685717 TSecr=3960160182
17  0.079630    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=16083 Win=65535 Len=0 TSval=3960160182 TSecr=899685716
18  0.079633    10.159.101.45   10.154.52.156   TCP 7306    58721 → 15432 [ACK] Seq=30563 Ack=266 Win=20823 Len=7240 TSval=899685717 TSecr=3960160182
19  0.081022    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=18979 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
20  0.081077    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=21875 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
21  0.081089    10.159.101.45   10.154.52.156   TCP 1514    58721 → 15432 [ACK] Seq=37803 Ack=266 Win=20823 Len=1448 TSval=899685719 TSecr=3960160183
22  0.081126    10.159.101.45   10.154.52.156   TCP 988 58721 → 15432 [PSH, ACK] Seq=39251 Ack=266 Win=20823 Len=922 TSval=899685719 TSecr=3960160183
23  0.081132    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=24771 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
24  0.081136    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=27667 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
25  0.081140    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=30563 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
26  0.081144    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=33459 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
27  0.081147    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=36355 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
28  0.082225    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=39251 Win=65535 Len=0 TSval=3960160184 TSecr=899685717
29  0.082251    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=40173 Win=65535 Len=0 TSval=3960160184 TSecr=899685719
30  0.104067    10.154.52.156   10.159.101.45   TCP 85  15432 → 58721 [PSH, ACK] Seq=266 Ack=40173 Win=65535 Len=19 TSval=3960160206 TSecr=899685719
31  0.104154    10.159.101.45   10.154.52.156   TCP 78  58721 → 15432 [PSH, ACK] Seq=40173 Ack=285 Win=20823 Len=12 TSval=899685742 TSecr=3960160206
32  0.105258    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=285 Ack=40185 Win=65535 Len=0 TSval=3960160207 TSecr=899685742
33  0.313198    10.154.52.156   10.159.101.45   TCP 84  15432 → 58721 [PSH, ACK] Seq=285 Ack=40185 Win=65535 Len=18 TSval=3960160415 TSecr=899685742
34  0.324214    10.159.101.45   10.154.52.156   TCP 77  58721 → 15432 [PSH, ACK] Seq=40185 Ack=303 Win=20823 Len=11 TSval=899685962 TSecr=3960160415
35  0.325344    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=303 Ack=40196 Win=65535 Len=0 TSval=3960160427 TSecr=899685962
36  0.337175    10.154.52.156   10.159.101.45   TCP 83  15432 → 58721 [PSH, ACK] Seq=303 Ack=40196 Win=65535 Len=17 TSval=3960160439 TSecr=899685962
37  0.337250    10.159.101.45   10.154.52.156   TCP 1514    58721 → 15432 [ACK] Seq=40196 Ack=320 Win=20823 Len=1448 TSval=899685975 TSecr=3960160439
38  0.337258    10.159.101.45   10.154.52.156   TCP 209 58721 → 15432 [PSH, ACK] Seq=41644 Ack=320 Win=20823 Len=143 TSval=899685975 TSecr=3960160439
39  0.338360    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=320 Ack=41787 Win=65535 Len=0 TSval=3960160440 TSecr=899685975
40  0.390031    10.154.52.156   10.159.101.45   TCP 314 15432 → 58721 [PSH, ACK] Seq=320 Ack=41787 Win=65535 Len=248 TSval=3960160492 TSecr=899685975
41  0.390156    10.159.101.45   10.154.52.156   TCP 4410    58721 → 15432 [ACK] Seq=41787 Ack=568 Win=22085 Len=4344 TSval=899686028 TSecr=3960160492
42  0.390169    10.159.101.45   10.154.52.156   TCP 90  58721 → 15432 [PSH, ACK] Seq=46131 Ack=568 Win=22085 Len=24 TSval=899686028 TSecr=3960160492
43  0.391292    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=568 Ack=44683 Win=65535 Len=0 TSval=3960160493 TSecr=899686028
44  0.391301    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=568 Ack=46155 Win=65535 Len=0 TSval=3960160493 TSecr=899686028
45  0.416738    10.154.52.156   10.159.101.45   TCP 84  15432 → 58721 [PSH, ACK] Seq=568 Ack=46155 Win=65535 Len=18 TSval=3960160519 TSecr=899686028
46  0.416830    10.159.101.45   10.154.52.156   TCP 78  58721 → 15432 [PSH, ACK] Seq=46155 Ack=586 Win=22085 Len=12 TSval=899686055 TSecr=3960160519
47  0.417929    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=586 Ack=46167 Win=65535 Len=0 TSval=3960160520 TSecr=899686055
48  0.622674    10.154.52.156   10.159.101.45   TCP 84  15432 → 58721 [PSH, ACK] Seq=586 Ack=46167 Win=65535 Len=18 TSval=3960160725 TSecr=899686055
49  0.662720    10.159.101.45   10.154.52.156   TCP 66  58721 → 15432 [ACK] Seq=46167 Ack=604 Win=22085 Len=0 TSval=899686301 TSecr=3960160725
50  5.646245    10.159.101.45   10.154.52.156   TCP 77  58721 → 15432 [PSH, ACK] Seq=46167 Ack=604 Win=22085 Len=11 TSval=899691284 TSecr=3960160725
51  5.647397    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=604 Ack=46178 Win=65535 Len=0 TSval=3960165749 TSecr=899691284
52  5.650477    10.154.52.156   10.159.101.45   TCP 83  15432 → 58721 [PSH, ACK] Seq=604 Ack=46178 Win=65535 Len=17 TSval=3960165753 TSecr=899691284
53  5.650490    10.159.101.45   10.154.52.156   TCP 66  58721 → 15432 [ACK] Seq=46178 Ack=621 Win=22085 Len=0 TSval=899691288 TSecr=3960165753
54  5.650699    10.159.101.45   10.154.52.156   TCP 1514    58721 → 15432 [ACK] Seq=46178 Ack=621 Win=22085 Len=1448 TSval=899691289 TSecr=3960165753
55  5.650725    10.159.101.45   10.154.52.156   TCP 209 58721 → 15432 [PSH, ACK] Seq=47626 Ack=621 Win=22085 Len=143 TSval=899691289 TSecr=3960165753
56  5.651861    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=621 Ack=47769 Win=65535 Len=0 TSval=3960165754 TSecr=899691289
57  5.655522    10.154.52.156   10.159.101.45   TCP 314 15432 → 58721 [PSH, ACK] Seq=621 Ack=47769 Win=65535 Len=248 TSval=3960165758 TSecr=899691289
58  5.655759    10.159.101.45   10.154.52.156   TCP 1290    58721 → 15432 [PSH, ACK] Seq=47769 Ack=869 Win=23347 Len=1224 TSval=899691294 TSecr=3960165758
59  5.656885    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=869 Ack=48993 Win=65535 Len=0 TSval=3960165759 TSecr=899691294
60  5.663736    10.154.52.156   10.159.101.45   TCP 84  15432 → 58721 [PSH, ACK] Seq=869 Ack=48993 Win=65535 Len=18 TSval=3960165766 TSecr=899691294
61  5.663830    10.159.101.45   10.154.52.156   TCP 78  58721 → 15432 [PSH, ACK] Seq=48993 Ack=887 Win=23347 Len=12 TSval=899691302 TSecr=3960165766
62  5.664924    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=887 Ack=49005 Win=65535 Len=0 TSval=3960165767 TSecr=899691302
63  5.799961    10.154.52.156   10.159.101.45   TCP 84  15432 → 58721 [PSH, ACK] Seq=887 Ack=49005 Win=65535 Len=18 TSval=3960165902 TSecr=899691302
64  5.839732    10.159.101.45   10.154.52.156   TCP 66  58721 → 15432 [ACK] Seq=49005 Ack=905 Win=23347 Len=0 TSval=899691478 TSecr=3960165902
65  306.902943  10.154.52.156   10.159.101.45   TCP 60  15432 → 58721 [RST, ACK] Seq=905 Ack=49005 Win=65535 Len=0

解决方案是修改F5的reset 连接参数,或者在应用端加入重试机制。

操作系统心跳参数和数据库心跳参数关系

将操作系统心跳改小

[root@node1 ~]# sysctl -a|grep keepalive
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 60

数据库心跳改大

[local]:5432 antdb@test=# select setting,reset_val from pg_settings where name ~ 'tcp';
 setting | reset_val 
---------+-----------
 0       | 3
 0       | 300
 0       | 30
(3 rows)

观察数据库连接心跳管理,可以观察到是以数据库设置为准的。

[antdb@node1 pg_log]$ netstat -antpo|grep 122214
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47904   ESTABLISHED 122214/postgres: an  keepalive (281.39/0/0)
[antdb@node1 pg_log]$ netstat -antpo|grep 122214
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47904   ESTABLISHED 122214/postgres: an  keepalive (271.24/0/0)

同时在客户端抓包,发现确实是以数据库的心跳参数为准。

root@work ~]# tcpdump -i ens33 tcp port 47904
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens33, link-type EN10MB (Ethernet), capture size 262144 bytes
18:20:24.590429 IP 192.168.149.131.postgres > work.47904: Flags [.], ack 2175538745, win 227, options [nop,nop,TS val 299547648 ecr 127884958], length 0
18:20:24.590492 IP work.47904 > 192.168.149.131.postgres: Flags [.], ack 1, win 239, options [nop,nop,TS val 128185847 ecr 299246720], length 0

数据库日志查看连接开始时间,和首次心跳发送时间间隔正好是数据库心跳设置数值。

2020-07-30 18:15:23.659 CST,,,122214,"192.168.149.100:47904",5f229dbb.1dd66,1,"",2020-07-30 18:15:23 CST,,0,LOG,00000,"connection received: host=192.168.149.100 port=47904",,,,,,,,,""
2020-07-30 18:15:23.661 CST,"antdb","test",122214,"192.168.149.100:47904",5f229dbb.1dd66,2,"authentication",2020-07-30 18:15:23 CST,3/2455,0,LOG,00000,"connection authorized: user=antdb database=test",,,,,,,,,""

所以结论是数据库连接由数据库心跳参数管理。

你可能感兴趣的:(postgresql tcp 连接超时问题)