今天从库crash重启后出现很有趣的现象:
可以发现:
Retrieved_Gtid_Set值显示从库没有接收到部分事务,丢失了部分事务。但是从Executed_Gtid_Set显示从库没有丢失事务。
错误日志:
2017-03-08 10:41:12 118393 [ERROR] /usr/local/mysql/bin/mysqld: Sort aborted: Query execution was interrupted 170308 10:55:38 mysqld_safe Number of processes running now: 0 170308 10:55:38 mysqld_safe mysqld restarted 2017-03-08 10:55:39 0 [Note] /usr/local/mysql/bin/mysqld (mysqld 5.6.29-log) starting as process 131069 ...
从中可以看到,mysql crash后由mysqld_safe重新拉起来,并没有其他有用的报错信息。
系统日志(/var/log/messsage):
Mar 8 10:55:37 mysql02 kernel: irqbalance invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0 Mar 8 10:55:37 mysql02 kernel: irqbalance cpuset=/ mems_allowed=0 Mar 8 10:55:37 mysql02 kernel: Pid: 1853, comm: irqbalance Not tainted 2.6.32-431.el6.x86_64 #1 Mar 8 10:55:37 mysql02 kernel: Call Trace: Mar 8 10:55:37 mysql02 kernel: [] ? cpuset_print_task_mems_allowed+0x91/0xb0 Mar 8 10:55:37 mysql02 kernel: [ ] ? dump_header+0x90/0x1b0 Mar 8 10:55:37 mysql02 kernel: [ ] ? security_real_capable_noaudit+0x3c/0x70 Mar 8 10:55:37 mysql02 kernel: [ ] ? oom_kill_process+0x82/0x2a0 Mar 8 10:55:37 mysql02 kernel: [ ] ? select_bad_process+0xe1/0x120 Mar 8 10:55:37 mysql02 kernel: [ ] ? out_of_memory+0x220/0x3c0 Mar 8 10:55:37 mysql02 kernel: [ ] ? __alloc_pages_nodemask+0x8ac/0x8d0 Mar 8 10:55:37 mysql02 kernel: [ ] ? alloc_pages_vma+0x9a/0x150 Mar 8 10:55:37 mysql02 kernel: [ ] ? handle_pte_fault+0x73d/0xb00 Mar 8 10:55:37 mysql02 kernel: [ ] ? inode_init_always+0x11e/0x1c0 Mar 8 10:55:37 mysql02 kernel: [ ] ? bit_waitqueue+0x17/0xd0 Mar 8 10:55:37 mysql02 kernel: [ ] ? inotify_d_instantiate+0x2a/0x60 Mar 8 10:55:37 mysql02 kernel: [ ] ? __d_instantiate+0xbb/0x110 Mar 8 10:55:37 mysql02 kernel: [ ] ? security_d_instantiate+0x1b/0x30 Mar 8 10:55:37 mysql02 kernel: [ ] ? handle_mm_fault+0x22a/0x300 Mar 8 10:55:37 mysql02 kernel: [ ] ? number+0x2ee/0x320 Mar 8 10:55:37 mysql02 kernel: [ ] ? __do_page_fault+0x138/0x480 Mar 8 10:55:37 mysql02 kernel: [ ] ? vsnprintf+0x218/0x5e0 Mar 8 10:55:37 mysql02 kernel: [ ] ? seq_release+0x0/0x30 Mar 8 10:55:37 mysql02 kernel: [ ] ? do_page_fault+0x3e/0xa0 Mar 8 10:55:37 mysql02 kernel: [ ] ? page_fault+0x25/0x30 Mar 8 10:55:37 mysql02 kernel: [ ] ? copy_user_enhanced_fast_string+0x6/0x10 Mar 8 10:55:37 mysql02 kernel: [ ] ? seq_read+0x2d2/0x400 Mar 8 10:55:37 mysql02 kernel: [ ] ? proc_reg_read+0x7e/0xc0 Mar 8 10:55:37 mysql02 kernel: [ ] ? vfs_read+0xb5/0x1a0 Mar 8 10:55:37 mysql02 kernel: [ ] ? sys_read+0x51/0x90 Mar 8 10:55:37 mysql02 kernel: [ ] ? system_call_fastpath+0x16/0x1b Mar 8 10:55:37 mysql02 kernel: Mem-Info: Mar 8 10:55:37 mysql02 kernel: Node 0 DMA per-cpu: Mar 8 10:55:37 mysql02 kernel: CPU 0: hi: 0, btch: 1 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 1: hi: 0, btch: 1 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 2: hi: 0, btch: 1 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 3: hi: 0, btch: 1 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 4: hi: 0, btch: 1 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 5: hi: 0, btch: 1 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 6: hi: 0, btch: 1 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 7: hi: 0, btch: 1 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 8: hi: 0, btch: 1 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 9: hi: 0, btch: 1 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 10: hi: 0, btch: 1 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 11: hi: 0, btch: 1 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 12: hi: 0, btch: 1 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 13: hi: 0, btch: 1 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 14: hi: 0, btch: 1 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 15: hi: 0, btch: 1 usd: 0 Mar 8 10:55:37 mysql02 kernel: Node 0 DMA32 per-cpu: Mar 8 10:55:37 mysql02 kernel: CPU 0: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 1: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 2: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 3: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 4: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 5: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 6: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 7: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 8: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 9: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 10: hi: 186, btch: 31 usd: 30 Mar 8 10:55:37 mysql02 kernel: CPU 11: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 12: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 13: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 14: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 15: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: Node 0 Normal per-cpu: Mar 8 10:55:37 mysql02 kernel: CPU 0: hi: 186, btch: 31 usd: 28 Mar 8 10:55:37 mysql02 kernel: CPU 1: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 2: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 3: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 4: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 5: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 6: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 7: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 8: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 9: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 10: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 11: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 12: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 13: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 14: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: CPU 15: hi: 186, btch: 31 usd: 0 Mar 8 10:55:37 mysql02 kernel: active_anon:6861787 inactive_anon:578006 isolated_anon:0 Mar 8 10:55:37 mysql02 kernel: active_file:242921 inactive_file:379133 isolated_file:0 Mar 8 10:55:37 mysql02 kernel: unevictable:0 dirty:356609 writeback:22685 unstable:0 Mar 8 10:55:37 mysql02 kernel: free:49534 slab_reclaimable:21256 slab_unreclaimable:10798 Mar 8 10:55:37 mysql02 kernel: mapped:2837 shmem:294 pagetables:16180 bounce:0 Mar 8 10:55:37 mysql02 kernel: Node 0 DMA free:15560kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15176kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes Mar 8 10:55:37 mysql02 kernel: lowmem_reserve[]: 0 3512 32289 32289 Mar 8 10:55:37 mysql02 kernel: Node 0 DMA32 free:122436kB min:7344kB low:9180kB high:11016kB active_anon:2254796kB inactive_anon:583736kB active_file:68kB inactive_file:1944kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3596508kB mlocked:0kB dirty:1984kB writeback:60kB mapped:32kB shmem:0kB slab_reclaimable:256kB slab_unreclaimable:124kB kernel_stack:0kB pagetables:28kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:768 all_unreclaimable? no Mar 8 10:55:37 mysql02 kernel: lowmem_reserve[]: 0 0 28777 28777 Mar 8 10:55:37 mysql02 kernel: Node 0 Normal free:60140kB min:60204kB low:75252kB high:90304kB active_anon:25192352kB inactive_anon:1728288kB active_file:971616kB inactive_file:1514588kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:29467760kB mlocked:0kB dirty:1424452kB writeback:90680kB mapped:11316kB shmem:1176kB slab_reclaimable:84768kB slab_unreclaimable:43068kB kernel_stack:3880kB pagetables:64692kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:367200 all_unreclaimable? no Mar 8 10:55:37 mysql02 kernel: lowmem_reserve[]: 0 0 0 0 Mar 8 10:55:37 mysql02 kernel: Node 0 DMA: 2*4kB 2*8kB 1*16kB 1*32kB 2*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15560kB Mar 8 10:55:37 mysql02 kernel: Node 0 DMA32: 19*4kB 35*8kB 48*16kB 57*32kB 59*64kB 54*128kB 43*256kB 39*512kB 28*1024kB 20*2048kB 2*4096kB = 122436kB Mar 8 10:55:37 mysql02 kernel: Node 0 Normal: 474*4kB 368*8kB 446*16kB 303*32kB 190*64kB 92*128kB 12*256kB 3*512kB 2*1024kB 4*2048kB 0*4096kB = 60456kB Mar 8 10:55:37 mysql02 kernel: 622459 total pagecache pages Mar 8 10:55:37 mysql02 kernel: 0 pages in swap cache Mar 8 10:55:37 mysql02 kernel: Swap cache stats: add 0, delete 0, find 0/0 Mar 8 10:55:37 mysql02 kernel: Free swap = 4194296kB Mar 8 10:55:37 mysql02 kernel: Total swap = 4194296kB Mar 8 10:55:37 mysql02 kernel: 8386559 pages RAM Mar 8 10:55:37 mysql02 kernel: 172714 pages reserved Mar 8 10:55:37 mysql02 kernel: 627141 pages shared Mar 8 10:55:37 mysql02 kernel: 7536243 pages non-shared Mar 8 10:55:37 mysql02 kernel: [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name Mar 8 10:55:37 mysql02 kernel: [ 605] 0 605 2662 171 0 -17 -1000 udevd Mar 8 10:55:37 mysql02 kernel: [ 1522] 0 1522 2280 202 0 0 0 dhclient Mar 8 10:55:37 mysql02 kernel: [ 1828] 0 1828 1540 93 0 0 0 portreserve Mar 8 10:55:37 mysql02 kernel: [ 1835] 0 1835 62993 1908 0 0 0 rsyslogd Mar 8 10:55:37 mysql02 kernel: [ 1853] 0 1853 2706 138 9 0 0 irqbalance Mar 8 10:55:37 mysql02 kernel: [ 1894] 0 1894 1695 71 1 0 0 mcelog Mar 8 10:55:37 mysql02 kernel: [ 1916] 81 1916 5351 124 4 0 0 dbus-daemon Mar 8 10:55:37 mysql02 kernel: [ 1945] 0 1945 1020 116 0 0 0 acpid Mar 8 10:55:37 mysql02 kernel: [ 1954] 68 1954 9442 381 12 0 0 hald Mar 8 10:55:37 mysql02 kernel: [ 1955] 0 1955 5082 158 0 0 0 hald-runner Mar 8 10:55:37 mysql02 kernel: [ 1984] 0 1984 5612 137 0 0 0 hald-addon-inpu Mar 8 10:55:37 mysql02 kernel: [ 2005] 68 2005 4484 141 0 0 0 hald-addon-acpi Mar 8 10:55:37 mysql02 kernel: [ 2020] 0 2020 113075 252 0 0 0 automount Mar 8 10:55:37 mysql02 kernel: [ 2061] 0 2061 1884 116 0 0 0 uvp-monitor Mar 8 10:55:37 mysql02 kernel: [ 2063] 0 2063 101731 766 0 0 0 uvp-monitor Mar 8 10:55:37 mysql02 kernel: [ 2118] 0 2118 16656 280 1 -17 -1000 sshd Mar 8 10:55:37 mysql02 kernel: [ 2148] 0 2148 27580 163 4 0 0 abrtd Mar 8 10:55:37 mysql02 kernel: [ 2175] 0 2175 29331 274 2 0 0 crond Mar 8 10:55:37 mysql02 kernel: [ 2186] 0 2186 5385 95 0 0 0 atd Mar 8 10:55:37 mysql02 kernel: [ 2194] 0 2194 47832 1721 0 0 0 python Mar 8 10:55:37 mysql02 kernel: [ 2205] 0 2205 15588 129 3 0 0 certmonger Mar 8 10:55:37 mysql02 kernel: [ 2221] 0 2221 1016 84 0 0 0 mingetty Mar 8 10:55:37 mysql02 kernel: [ 2223] 0 2223 1016 84 0 0 0 mingetty Mar 8 10:55:37 mysql02 kernel: [ 2225] 0 2225 1016 84 0 0 0 mingetty Mar 8 10:55:37 mysql02 kernel: [ 2227] 0 2227 1016 85 0 0 0 mingetty Mar 8 10:55:37 mysql02 kernel: [ 2229] 0 2229 1016 84 4 0 0 mingetty Mar 8 10:55:37 mysql02 kernel: [ 2231] 0 2231 1016 85 7 0 0 mingetty Mar 8 10:55:37 mysql02 kernel: [100856] 0 100856 2663 168 0 -17 -1000 udevd Mar 8 10:55:37 mysql02 kernel: [100860] 0 100860 2663 168 0 -17 -1000 udevd Mar 8 10:55:37 mysql02 kernel: [81005] 0 81005 18916 236 2 0 0 zabbix_agentd Mar 8 10:55:37 mysql02 kernel: [81007] 0 81007 18916 458 6 0 0 zabbix_agentd Mar 8 10:55:37 mysql02 kernel: [81008] 0 81008 18916 302 1 0 0 zabbix_agentd Mar 8 10:55:37 mysql02 kernel: [81009] 0 81009 18916 303 1 0 0 zabbix_agentd Mar 8 10:55:37 mysql02 kernel: [81010] 0 81010 18916 302 0 0 0 zabbix_agentd Mar 8 10:55:37 mysql02 kernel: [81011] 0 81011 18919 250 12 0 0 zabbix_agentd Mar 8 10:55:37 mysql02 kernel: [81012] 0 81012 18919 247 1 0 0 zabbix_agentd Mar 8 10:55:37 mysql02 kernel: [117075] 0 117075 26558 162 11 0 0 mysqld_safe Mar 8 10:55:37 mysql02 kernel: [118393] 500 118393 8672252 7434655 0 0 0 mysqld Mar 8 10:55:37 mysql02 kernel: [85036] 0 85036 24574 989 0 0 0 sshd Mar 8 10:55:37 mysql02 kernel: [85038] 0 85038 27084 444 0 0 0 bash Mar 8 10:55:37 mysql02 kernel: Out of memory: Kill process 118393 (mysqld) score 804 or sacrifice child Mar 8 10:55:37 mysql02 kernel: Killed process 118393, UID 500, (mysqld) total-vm:34689008kB, anon-rss:29732924kB, file-rss:5696kB
最终从系统日志中看出,是由于mysqld进程占用大量内存导致OOM,至于为何zabbix监控没有采集到内存这个数据,应该是发生OOM的时候zabbix agent还未到采集数据的时间点导致。
通过pt-query-digest工具分析当时故障发生点的慢日志,抓出一条慢sql,explain执行计划发现是全表扫描的SQL。
为了确认从库是否真的因为少接收到事务而漏了部分数据,特意去解析了从库的binlog日志。
可以发现,其实从库后续有接收到事务号:77d12988-29c1-11e6-a323-fa163ea5bbe1:334314693的事务,只是事务号次序被打乱,没有依次递增的情况。
这是主库的binlog日志记录:
注意:由于mysql的主从数据一致是以从库必须严格同主库按照相同sql执行次序为前提的,这种情况虽然从库也接收所有的事务并执行完成,但是主从库的执行次序并不一致。谨慎来说,从库仍然存在数据不一致的风险。需要使用pt工具包对主从库的数据做数据校验为好!