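A monitoring alert reported that one node of the MGR (MySQL Group Replication) cluster had failed. By the time I looked, LVS had already failed over and was pointing at the new MGR write node. What follows is the root-cause investigation.
/var/log/messages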
Mar 27 16:51:05 db10 kernel: crond invoked oom-killer: gfp_mask=0x3000d0, order=2, oom_score_adj=0
Mar 27 16:51:05 db10 kernel: crond cpuset=/ mems_allowed=0-1
Mar 27 16:51:05 db10 kernel: CPU: 35 PID: 12090 Comm: crond Tainted: G OE ------------ 3.10.0-693.21.1.el7.x86_64 #1
Mar 27 16:51:05 db10 kernel: Hardware name: Inspur SA5212M4/YZMB-00370-109, BIOS 4.1.16 06/21/2018
Mar 27 16:51:05 db10 kernel: Call Trace:
Mar 27 16:51:05 db10 kernel: [] dump_stack+0x19/0x1b
Mar 27 16:51:05 db10 kernel: [ ] dump_header+0x90/0x229
Mar 27 16:51:05 db10 kernel: [ ] ? ktime_get_ts64+0x52/0xf0
Mar 27 16:51:05 db10 kernel: [ ] ? delayacct_end+0x8f/0xb0
Mar 27 16:51:05 db10 kernel: [ ] oom_kill_process+0x254/0x3d0
Mar 27 16:51:05 db10 kernel: [ ] ? oom_unkillable_task+0xcd/0x120
Mar 27 16:51:05 db10 kernel: [ ] ? find_lock_task_mm+0x56/0xc0
Mar 27 16:51:05 db10 kernel: [ ] out_of_memory+0x4b6/0x4f0
Mar 27 16:51:05 db10 kernel: [ ] __alloc_pages_slowpath+0x5d6/0x724
Mar 27 16:51:05 db10 kernel: [ ] __alloc_pages_nodemask+0x405/0x420
Mar 27 16:51:05 db10 kernel: [ ] copy_process+0x1dd/0x1970
Mar 27 16:51:05 db10 kernel: [ ] ? audit_filter_rules.isra.8+0x280/0xf90
Mar 27 16:51:05 db10 kernel: [ ] do_fork+0x91/0x320
Mar 27 16:51:05 db10 kernel: [ ] SyS_clone+0x16/0x20
Mar 27 16:51:05 db10 kernel: [ ] stub_clone+0x44/0x70
Mar 27 16:51:05 db10 kernel: [ ] ? system_call_fastpath+0x1c/0x21
Mar 27 16:51:05 db10 kernel: Mem-Info:
Mar 27 16:51:05 db10 kernel: active_anon:32289123 inactive_anon:180550 isolated_anon:0#012 active_file:960 inactive_file:195 isolated_file:0#012 unevictable:0 dirty:48 writeback:0 unstable:0#012 slab_reclaimable:59079 slab_unreclaimable:32778#012 mapped:13096 shmem:534843 pagetables:66034 bounce:0#012 free:96590 free_pcp:105 free_cma:0
Mar 27 16:51:05 db10 kernel: Node 0 DMA free:13540kB min:8kB low:8kB high:12kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Mar 27 16:51:05 db10 kernel: lowmem_reserve[]: 0 1680 64143 64143
Mar 27 16:51:05 db10 kernel: Node 0 DMA32 free:250600kB min:1176kB low:1468kB high:1764kB active_anon:1442100kB inactive_anon:464kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1934208kB managed:1722948kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:1740kB slab_reclaimable:11840kB slab_unreclaimable:7640kB kernel_stack:368kB pagetables:1132kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Mar 27 16:51:05 db10 kernel: lowmem_reserve[]: 0 0 62462 62462
Mar 27 16:51:05 db10 kernel: Node 0 Normal free:54592kB min:43744kB low:54680kB high:65616kB active_anon:62871276kB inactive_anon:371740kB active_file:12kB inactive_file:24kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:65011712kB managed:63961888kB mlocked:0kB dirty:0kB writeback:0kB mapped:1028kB shmem:1190332kB slab_reclaimable:124084kB slab_unreclaimable:45492kB kernel_stack:4768kB pagetables:92984kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Mar 27 16:51:05 db10 kernel: lowmem_reserve[]: 0 0 0 0
Mar 27 16:51:05 db10 kernel: Node 1 Normal free:68040kB min:45176kB low:56468kB high:67764kB active_anon:64843172kB inactive_anon:349996kB active_file:0kB inactive_file:160kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:67108864kB managed:66056756kB mlocked:0kB dirty:192kB writeback:0kB mapped:50080kB shmem:947300kB slab_reclaimable:100392kB slab_unreclaimable:77980kB kernel_stack:28736kB pagetables:170020kB unstable:0kB bounce:0kB free_pcp:640kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:55 all_unreclaimable? no
Mar 27 16:51:05 db10 kernel: lowmem_reserve[]: 0 0 0 0
Mar 27 16:51:05 db10 kernel: Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 0*256kB 0*512kB 1*1024kB (U) 2*2048kB (UM) 2*4096kB (M) = 13540kB
Mar 27 16:51:05 db10 kernel: Node 0 DMA32: 264*4kB (UEM) 403*8kB (UEM) 475*16kB (UEM) 342*32kB (UEM) 391*64kB (UEM) 300*128kB (UEM) 208*256kB (UEM) 107*512kB (UEM) 45*1024kB (EM) 5*2048kB (E) 0*4096kB = 250600kB
Mar 27 16:51:05 db10 kernel: Node 0 Normal: 13593*4kB (UEM) 22*8kB (UM) 9*16kB (M) 2*32kB (M) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54756kB
Mar 27 16:51:05 db10 kernel: Node 1 Normal: 16649*4kB (UEM) 8*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 66660kB
Mar 27 16:51:05 db10 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Mar 27 16:51:05 db10 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Mar 27 16:51:05 db10 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Mar 27 16:51:05 db10 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Mar 27 16:51:05 db10 kernel: 535067 total pagecache pages
Mar 27 16:51:05 db10 kernel: 0 pages in swap cache
Mar 27 16:51:05 db10 kernel: Swap cache stats: add 0, delete 0, find 0/0
Mar 27 16:51:05 db10 kernel: Free swap = 0kB
Mar 27 16:51:05 db10 kernel: Total swap = 0kB
Mar 27 16:51:05 db10 kernel: 33517692 pages RAM
Mar 27 16:51:05 db10 kernel: 0 pages HighMem/MovableOnly
Mar 27 16:51:05 db10 kernel: 578319 pages reserved
Mar 27 16:51:05 db10 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 27 16:51:05 db10 kernel: [ 6050] 0 6050 35461 19476 75 0 0 systemd-journal
Mar 27 16:51:05 db10 kernel: [ 6075] 0 6075 30235 80 28 0 0 lvmetad
Mar 27 16:51:05 db10 kernel: [ 6094] 0 6094 10898 172 24 0 -1000 systemd-udevd
Mar 27 16:51:05 db10 kernel: [11985] 0 11985 4845 104 15 0 0 irqbalance
Mar 27 16:51:05 db10 kernel: [11988] 995 11988 25173 71 20 0 0 chronyd
Mar 27 16:51:06 db10 kernel: [11989] 81 11989 6709 161 21 0 -900 dbus-daemon
Mar 27 16:51:06 db10 kernel: [12004] 0 12004 31998 151 22 0 0 smartd
Mar 27 16:51:06 db10 kernel: [12006] 996 12006 2144 37 10 0 0 lsmd
Mar 27 16:51:06 db10 kernel: [12009] 0 12009 186971 9901 237 0 0 rsyslogd
Mar 27 16:51:06 db10 kernel: [12016] 0 12016 1105 39 8 0 0 rngd
Mar 27 16:51:06 db10 kernel: [12034] 0 12034 6620 99 19 0 0 systemd-logind
Mar 27 16:51:06 db10 kernel: [12068] 0 12068 5955 48 17 0 0 atd
Mar 27 16:51:06 db10 kernel: [12090] 0 12090 31058 165 19 0 0 crond
Mar 27 16:51:06 db10 kernel: [12242] 0 12242 1055 19 7 0 0 supervise
Mar 27 16:51:06 db10 kernel: [12243] 0 12243 28807 54 14 0 0 run
Mar 27 16:51:06 db10 kernel: [12260] 0 12260 139002 3217 93 0 0 tuned
Mar 27 16:51:06 db10 kernel: [12273] 0 12273 27021 242 54 0 -1000 sshd
Mar 27 16:51:06 db10 kernel: [12316] 0 12316 27523 33 10 0 0 agetty
Mar 27 16:51:06 db10 kernel: [12319] 0 12319 20378 199 38 0 0 hooagentd
Mar 27 16:51:06 db10 kernel: [12324] 0 12324 80468 586 57 0 0 hooagent
Mar 27 16:51:06 db10 kernel: [12804] 0 12804 22895 259 43 0 0 master
Mar 27 16:51:06 db10 kernel: [12831] 89 12831 22965 281 45 0 0 qmgr
Mar 27 16:51:06 db10 kernel: [13103] 0 13103 828994 4025 115 0 0 wonder-agent
Mar 27 16:51:06 db10 kernel: [20985] 0 20985 175106 1241 72 0 -1000 logmon
Mar 27 16:51:06 db10 kernel: [18570] 42583 18570 32515 159 19 0 0 screen
Mar 27 16:51:06 db10 kernel: [18571] 42583 18571 29229 485 15 0 0 bash
Mar 27 16:51:06 db10 kernel: [22385] 42583 22385 32515 153 19 0 0 screen
Mar 27 16:51:06 db10 kernel: [22386] 42583 22386 29230 485 16 0 0 bash
Mar 27 16:51:06 db10 kernel: [22416] 42583 22416 32515 154 20 0 0 screen
Mar 27 16:51:06 db10 kernel: [22417] 42583 22417 29230 485 13 0 0 bash
Mar 27 16:51:06 db10 kernel: [12032] 0 12032 28326 102 13 0 0 mysqld_safe
Mar 27 16:51:06 db10 kernel: [13363] 33173 13363 74431932 31903076 64367 0 0 mysqld
Mar 27 16:51:06 db10 kernel: [33949] 0 33949 14918 7466 33 0 0 mysqld_exporter
Mar 27 16:51:06 db10 kernel: [ 6287] 0 6287 663221 5068 121 0 0 bbmon
Mar 27 16:51:06 db10 kernel: [ 6621] 89 6621 22921 255 46 0 0 pickup
Mar 27 16:51:06 db10 kernel: [ 6957] 89 6957 22922 256 44 0 0 trivial-rewrite
Mar 27 16:51:06 db10 kernel: [ 7033] 0 7033 45072 238 45 0 0 crond
Mar 27 16:51:06 db10 kernel: [ 7045] 0 7045 28274 48 13 0 0 sh
Mar 27 16:51:06 db10 kernel: [ 7054] 0 7054 372238 1382 69 0 0 dbvip
Mar 27 16:51:06 db10 kernel: [ 7421] 0 7421 47770 1426 49 0 0 python
Mar 27 16:51:06 db10 kernel: [ 7422] 0 7422 4935 159 12 0 0 msval
Mar 27 16:51:06 db10 kernel: Out of memory: Kill process 5396 (mysqld) score 970 or sacrifice child
Mar 27 16:51:06 db10 kernel: Killed process 13363 (mysqld) total-vm:297727728kB, anon-rss:127612364kB, file-rss:0kB, shmem-rss:0kB
The direct cause is that the mysqld process below was killed by the OOM killer:
Mar 27 16:51:06 db10 kernel: Killed process 13363 (mysqld) total-vm:297727728kB, anon-rss:127612364kB, file-rss:0kB, shmem-rss:0kB
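Looking further up, the process table has the line for mysqld shown below; the system has 128G of physical memory. I read it as mysqld using 70-odd G. Note, though, that the total_vm and rss columns are counted in 4 kB pages: 74431932 pages is exactly the 297727728kB total-vm of the kill line, and the resident size (rss, 31903076 pages) is roughly 122G, which matches anon-rss:127612364kB.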
Mar 27 16:51:06 db10 kernel: [13363] 33173 13363 74431932 31903076 64367 0 0 mysqld
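As a quick unit check on that process-table line, the sketch below converts the page counts to kB and GiB. It assumes 4 kB pages (the usual x86_64 page size); the helper names are made up for illustration.

```python
# Assumption: the kernel uses 4 kB pages, the x86_64 default.
PAGE_KB = 4


def pages_to_kb(pages: int) -> int:
    """Convert a page count from the OOM process table to kB."""
    return pages * PAGE_KB


def kb_to_gib(kb: int) -> float:
    """Convert kB to GiB."""
    return kb / (1024 * 1024)


# Values copied from the mysqld row: total_vm and rss, both in pages.
total_vm_kb = pages_to_kb(74431932)
rss_kb = pages_to_kb(31903076)

print(f"total_vm = {total_vm_kb} kB ({kb_to_gib(total_vm_kb):.1f} GiB)")
print(f"rss      = {rss_kb} kB ({kb_to_gib(rss_kb):.1f} GiB)")
```

The total_vm result equals the total-vm value in the "Killed process" line, and the rss result is within a few dozen kB of its anon-rss value, which suggests the 4 kB assumption holds here.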
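Further up, the log mentions Node 0, Node 1, hugepages_total, and swap, which mainly concern NUMA and huge pages. I set those two topics aside for the moment. Since mysqld had been killed at what I took to be 70-odd G, I first set the MySQL innodb_buffer_pool_size to 64G as a safeguard against a repeat, and then continued the investigation at leisure.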
mysql> show variables like '%pool_size%';
+-------------------------+-------------+
| Variable_name           | Value       |
+-------------------------+-------------+
| innodb_buffer_pool_size | 85899345920 |
+-------------------------+-------------+
1 row in set (0.00 sec)

mysql> select 64*1024*1024*1024;
+-------------------+
| 64*1024*1024*1024 |
+-------------------+
|       68719476736 |
+-------------------+
1 row in set (0.00 sec)

mysql> set global innodb_buffer_pool_size=68719476736;
Query OK, 0 rows affected (0.00 sec)

mysql> show global variables like '%pool_size%';
+-------------------------+-------------+
| Variable_name           | Value       |
+-------------------------+-------------+
| innodb_buffer_pool_size | 68719476736 |
+-------------------------+-------------+
1 row in set (0.00 sec)
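Note that the configuration file must be changed as well, otherwise the new value is lost at the next restart. After the resize, memory is gradually released back to the OS; memory that is still in use is, of course, not released.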
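When picking a new buffer pool size, it can help to compute the byte value up front. The sketch below also rounds the request up to a multiple of innodb_buffer_pool_chunk_size × innodb_buffer_pool_instances, which is how MySQL 5.7 adjusts the value. The chunk size (128M) and instance count (8) used here are assumed defaults, not values read from this server.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def aligned_buffer_pool_size(requested: int,
                             chunk_size: int = 128 * MIB,
                             instances: int = 8) -> int:
    """Round `requested` bytes up to a multiple of chunk_size * instances.

    chunk_size and instances are assumed defaults; check the real values
    with SHOW VARIABLES before relying on the result.
    """
    unit = chunk_size * instances
    return -(-requested // unit) * unit  # ceiling division, then scale back


target = aligned_buffer_pool_size(64 * GIB)
print(f"set global innodb_buffer_pool_size={target};")
print(f"innodb_buffer_pool_size = {target}  # also put this in the config file")
```

With these assumptions 64G is already aligned, so the value matches the 68719476736 used in the session above.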
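Returning to the memory problem: the OOM report walks through memory in the order NUMA nodes --> huge pages --> swap. Memory on the NUMA nodes was exhausted, then huge pages are listed, and finally swap. Leaving NUMA and huge pages aside for now and looking at swap first: when the system would have needed swap, there was none (swap is 0), so it killed mysqld.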
Mar 27 16:51:05 db10 kernel: 535067 total pagecache pages
Mar 27 16:51:05 db10 kernel: 0 pages in swap cache
Mar 27 16:51:05 db10 kernel: Swap cache stats: add 0, delete 0, find 0/0
Mar 27 16:51:05 db10 kernel: Free swap = 0kB
Mar 27 16:51:05 db10 kernel: Total swap = 0kB
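At that moment the page cache held 535067 pages and the system had no swap at all (Total swap = 0kB), so there was nowhere to page anonymous memory out to. The kernel then printed the memory usage of every process, found that mysqld used the most, and killed it to get reclaimable memory.
This points to one remedy --> add swap. Physical memory is 128G, and I added 64G of swap. A reminder: swap is a region on disk, so accessing it is far slower than accessing memory, and swapping in and out under memory pressure also costs OS performance, so more swap is not automatically better. I settled on 64G here; the procedure for adding swap is omitted. What gets moved to swap is usually cold data, and that cold data can be reclaimed periodically.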
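To spot this condition quickly in future OOM reports, the swap summary can be pulled out of the log text. This is a minimal sketch that assumes the "Free swap = NkB" / "Total swap = NkB" line format shown above; the function name is hypothetical.

```python
import re

# Sample lines copied from the OOM report above.
OOM_SWAP_LINES = """\
Mar 27 16:51:05 db10 kernel: Free swap = 0kB
Mar 27 16:51:05 db10 kernel: Total swap = 0kB
"""


def swap_summary(log_text: str) -> dict:
    """Return {'free': kB, 'total': kB} parsed from an OOM report."""
    found = {}
    for key, value in re.findall(r"(Free|Total) swap\s*=\s*(\d+)kB", log_text):
        found[key.lower()] = int(value)
    return found


summary = swap_summary(OOM_SWAP_LINES)
has_swap = summary.get("total", 0) > 0
print(summary, "swap configured" if has_swap else "no swap configured")
```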
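Now for NUMA. When the database was migrated, it was installed on CentOS 7.4, where NUMA is off by default, or more precisely not configured by default. The configuration file was copied straight from the old system (CentOS 6.2), lightly adjusted, and the database was installed. So where does NUMA come in?
Checking the NUMA-related parameter on the running instance shows that it is ON. Since the configuration file was copied over, the copied file must have set it to ON.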
mysql> show variables like '%numa%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| innodb_numa_interleave | ON    |
+------------------------+-------+
1 row in set (0.00 sec)
The configuration file contains:
loose_innodb_numa_interleave = 1
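According to the official MySQL documentation, enabling this parameter means mysqld uses NUMA's MPOL_INTERLEAVE memory policy while the buffer pool is allocated (switching back to MPOL_DEFAULT afterwards), and it requires NUMA to be enabled on the system: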
Enables the NUMA interleave memory policy for allocation of the InnoDB buffer pool. When innodb_numa_interleave is enabled, the NUMA memory policy is set to MPOL_INTERLEAVE for the mysqld process. After the InnoDB buffer pool is allocated, the NUMA memory policy is set back to MPOL_DEFAULT. For the innodb_numa_interleave option to be available, MySQL must be compiled on a NUMA-enabled Linux system.
Checking the old system confirms that the original CentOS 6.2 host did have NUMA enabled:
$ numastat
                      node0         node1
numa_hit        48300952797   40465837517
numa_miss        8640293195      30793567
numa_foreign       30793567    8640293195
interleave_hit   1459753044    1206990706
local_node      47936393296   39695132194
other_node       9004852696     801498890
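For a rough feel of how those counters read, the sketch below computes the share of allocations on each node that were NUMA misses, using the numa_hit and numa_miss values from the output above. The ratio is only an illustrative metric chosen here, not something the original analysis relied on.

```python
# Counters copied from the numastat output of the old CentOS 6.2 host.
NUMASTAT = {
    "node0": {"numa_hit": 48300952797, "numa_miss": 8640293195},
    "node1": {"numa_hit": 40465837517, "numa_miss": 30793567},
}


def miss_ratio(hit: int, miss: int) -> float:
    """Fraction of allocations on a node that were NUMA misses."""
    return miss / (hit + miss)


ratios = {
    node: miss_ratio(c["numa_hit"], c["numa_miss"])
    for node, c in NUMASTAT.items()
}
for node, ratio in ratios.items():
    print(f"{node}: {ratio:.2%} of allocations were numa_miss")
```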
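The new CentOS 7.4 system does not have NUMA enabled by default, so the remedy follows directly: turn off innodb_numa_interleave. The parameter is off by default in MySQL 5.7.9, and the current version is MySQL 5.7.24, so deleting it from the configuration file is enough; the instance must be restarted for the change to take effect. Because the buffer pool had already been reduced and swap added, I restarted only the read nodes (LVS traffic is switched to other nodes before a read node restarts) and changed only the configuration file on the write node. With this many server nodes and a change this important, go back after the batch update and verify every node one by one so that nothing is missed.
As for huge pages: hugepages_total=0 means huge pages are disabled. MySQL has the following two huge-page parameters, which are also off by default:
mysql> show variables like 'large_page%';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| large_page_size | 0     |
| large_pages     | OFF   |
+-----------------+-------+
2 rows in set (0.00 sec)
State with huge pages disabled: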
$ grep Huge /proc/meminfo
AnonHugePages: 39612416 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
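The same check can be scripted. The sketch below parses /proc/meminfo-style text (the sample is the output above) and separates the explicit huge page pool (HugePages_Total) from AnonHugePages, which counts transparent huge pages and can be non-zero even when the explicit pool is empty. The function name is hypothetical.

```python
# Sample copied from the grep output above.
MEMINFO_SAMPLE = """\
AnonHugePages:  39612416 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""


def parse_meminfo(text: str) -> dict:
    """Map each /proc/meminfo field name to its integer value."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


fields = parse_meminfo(MEMINFO_SAMPLE)
explicit_hugepages = fields["HugePages_Total"] > 0
thp_gib = fields["AnonHugePages"] / (1024 * 1024)  # value is in kB
print(f"explicit huge pages in use: {explicit_hugepages}")
print(f"transparent huge pages: {thp_gib:.1f} GiB")
```

On a live host the same function can be fed the contents of /proc/meminfo instead of the sample string.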
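The log analysis produced remedies, but the real question was still unanswered: why was MySQL using so much memory that 128G was not enough? What was consuming it?
MySQL memory usage = key_buffer_size + query_cache_size + tmp_table_size + innodb_buffer_pool_size + innodb_additional_mem_pool_size + innodb_log_buffer_size + max_connections × (sort_buffer_size + read_buffer_size + read_rnd_buffer_size + join_buffer_size + thread_stack + binlog_cache_size)
These are the components of MySQL's memory usage. Going through them one by one, I found that binlog_cache_size had recently been raised from 16M to 128M. Connections on this system can peak at 5000, so it is easy to imagine how much memory the product requires. The fix: since the binlog cache is allocated per session, restore the parameter to 16M. The official description of the parameter: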
The size of the cache to hold changes to the binary log during a transaction. A binary log cache is allocated for each client if the server supports any transactional storage engines and if the server has the binary log enabled (--log-bin option).
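While a transaction is running, the binlog cache (sized by binlog_cache_size) buffers that transaction's changes before they are written to the binary log. This means the parameter has no effect for SELECT statements, but for DML statements resources are allocated according to it, so the higher the DML concurrency, the more memory is consumed.
Aside:
A database expert who joined not long ago gave tuning recommendations and changed about 10 parameters; counting this one, two of them have now been rolled back. He based the changes on experience from his previous company, but that company had very little concurrency, unlike here where a single database can reach four to five thousand concurrent connections, and he had also misunderstood what some of the parameters mean. Before changing a live production system, research the change thoroughly, and ideally reproduce the production workload in a test environment before rolling it out.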
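The multiplication behind that conclusion is worth spelling out. The sketch below computes an upper bound on binlog cache memory, assuming the worst case in which all 5000 connections run DML at the same time and each fills its cache completely; real usage will normally be lower.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def worst_case_gib(connections: int, per_session_bytes: int) -> float:
    """Upper bound in GiB if every session uses its full per-session buffer."""
    return connections * per_session_bytes / GIB


PEAK_CONNECTIONS = 5000  # peak connection count mentioned in the text

before = worst_case_gib(PEAK_CONNECTIONS, 16 * MIB)   # original setting
after = worst_case_gib(PEAK_CONNECTIONS, 128 * MIB)   # after the tuning change

print(f"binlog_cache_size=16M : up to {before} GiB")
print(f"binlog_cache_size=128M: up to {after} GiB")
```

Even a modest fraction of the 128M worst case is far beyond the headroom left on a 128G host whose buffer pool already takes most of the memory.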