author:sufei
Versions: MySQL 8.0.18, MySQL 8.0.20
Symptom:
mysqld dumps core during startup.
The error log shows:
2020-07-01T14:46:04.895732+08:00 0 [Note] [MY-010251] [Server] Server socket created on IP: '::'.
06:46:04 UTC - mysqld got signal 11 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x6224e90
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 2ba1bff42c80 thread_stack 0x40000
/mysql-install/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x2e) [0x1e825be]
/mysql-install/bin/mysqld(handle_fatal_signal+0x341) [0xf8b031]
/lib64/libpthread.so.0(+0xf630) [0x2ba0ceb75630]
/mysql-install/bin/mysqld(Cached_authentication_plugins::get_cached_plugin_ref(MYSQL_LEX_CSTRING const*)+0xd) [0xfd6e6d]
/mysql-install/bin/mysqld() [0xfd6f2c]
/mysql-install/bin/mysqld(acl_authenticate(THD*, enum_server_command)+0x1f7) [0xfe03f7]
/mysql-install/bin/mysqld() [0xdf1635]
/mysql-install/bin/mysqld(thd_prepare_connection(THD*)+0x46) [0xdf27e6]
/mysql-install/bin/mysqld() [0xf78e41]
/mysql-install/bin/mysqld() [0x23c08a5]
/lib64/libpthread.so.0(+0x7ea5) [0x2ba0ceb6dea5]
/lib64/libc.so.6(clone+0x6d) [0x2ba0d087e8dd]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): Connection ID (thread ID): 4
Status: NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file
The call stack from the core file:
(gdb) bt
#0 0x00002ba0ceb72aa1 in pthread_kill () from /lib64/libpthread.so.0
#1 0x0000000001e81c47 in my_write_core (sig=<optimized out>) at /data2/sf/mysql8/mysql/8.0.18/mysql-8.0.18/mysys/stacktrace.cc:305
#2 0x0000000000f8afdd in handle_fatal_signal (sig=11) at /data2/sf/mysql8/mysql/8.0.18/mysql-8.0.18/sql/signal_handler.cc:169
#3 <signal handler called>
#4 0x0000000000fd6e6d in Cached_authentication_plugins::get_cached_plugin_ref (this=0x0, plugin=plugin@entry=0x2ba1bff41ff0)
at /data2/sf/mysql8/mysql/8.0.18/mysql-8.0.18/sql/auth/sql_authentication.cc:881
#5 0x0000000000fd6f2c in do_auth_once (thd=thd@entry=0x6224e90, auth_plugin_name=..., mpvio=mpvio@entry=0x2ba1bff423c0)
at /data2/sf/mysql8/mysql/8.0.18/mysql-8.0.18/sql/auth/sql_authentication.cc:2971
#6 0x0000000000fe03f7 in acl_authenticate (thd=thd@entry=0x6224e90, command=command@entry=COM_CONNECT)
at /data2/sf/mysql8/mysql/8.0.18/mysql-8.0.18/sql/auth/sql_authentication.cc:3268
#7 0x0000000000df1635 in check_connection (thd=thd@entry=0x6224e90, this=<optimized out>) at /data2/sf/mysql8/mysql/8.0.18/mysql-8.0.18/sql/sql_connect.cc:649
#8 0x0000000000df27e6 in login_connection (thd=0x6224e90) at /data2/sf/mysql8/mysql/8.0.18/mysql-8.0.18/sql/sql_connect.cc:704
#9 thd_prepare_connection (thd=thd@entry=0x6224e90) at /data2/sf/mysql8/mysql/8.0.18/mysql-8.0.18/sql/sql_connect.cc:877
#10 0x0000000000f78e41 in handle_connection (arg=arg@entry=0x2ba0e42e6970) at /data2/sf/mysql8/mysql/8.0.18/mysql-8.0.18/sql/conn_handler/connection_handler_per_thread.cc:298
#11 0x00000000023c08a5 in pfs_spawn_thread (arg=0x2ba0e4752210) at /data2/sf/mysql8/mysql/8.0.18/mysql-8.0.18/storage/perfschema/pfs.cc:2854
#12 0x00002ba0ceb6dea5 in start_thread () from /lib64/libpthread.so.0
#13 0x00002ba0d087e8dd in clone () from /lib64/libc.so.6
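The core file shows that:
1. The crash happens during login, while the connection is being authenticated.
2. The global variable g_cached_authentication_plugins used by authentication is a null pointer (this=0x0 in frame #4), which causes the segmentation fault.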
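Analysis:
In other words, the ACL state had not yet been initialized when the server began accepting user connections. Normally this cannot happen, because:
- The global variable g_cached_authentication_plugins is initialized in acl_init, which is called at line 6738 of the main function: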
if (abort || acl_init(opt_noacl)) {
  if (!abort) LogErr(ERROR_LEVEL, ER_PRIVILEGE_SYSTEM_INIT_FAILED);
  abort = true;
  opt_noacl = true;
}
- The regular listener is started at line 7050 of the main function:
mysqld_socket_acceptor->connection_event_loop();
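So g_cached_authentication_plugins is already initialized before the server starts listening.
That ordering should be safe. However, in MySQL 8.0.18 the admin interface can run on its own dedicated thread. With create_admin_listener_thread = ON, that listener is started inside network_init (line 6671 of the main function, before acl_init). The call chain is: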
network_init
|->mysqld_socket_acceptor->init_connection_acceptor()
|--->Mysqld_socket_listener->setup_listener
|------>spawn_admin_thread(m_admin_interface_listen_socket,
m_admin_bind_address.network_namespace) // starts the admin listener thread
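So with create_admin_listener_thread = ON, the admin listener thread is running before acl_init has done its initialization. If an admin user connects through admin_port during that window (between the call to network_init and the call to acl_init), mysqld dumps core.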
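The ordering problem can be replayed with a small single-threaded model. This is only a sketch under stated assumptions: PluginCache, ServerState, Outcome, admin_connect and simulate_startup are hypothetical names invented for illustration, not MySQL code, and the real server crashes with SIGSEGV where the model merely reports kNullDereference.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for Cached_authentication_plugins (not the MySQL class).
struct PluginCache {};

// Simplified server state; the field names are illustrative only.
struct ServerState {
  PluginCache *plugin_cache = nullptr;  // models g_cached_authentication_plugins
  bool admin_listener_running = false;  // is a thread accepting admin connections?
};

enum class Outcome { kNotAccepted, kNullDereference, kAuthReached };

// What happens to an admin connection attempted in the given state.
// The real server dereferences the pointer unconditionally; the model
// reports the null case instead of crashing.
Outcome admin_connect(const ServerState &s) {
  if (!s.admin_listener_running) return Outcome::kNotAccepted;
  if (s.plugin_cache == nullptr) return Outcome::kNullDereference;
  return Outcome::kAuthReached;
}

// Replays the three startup steps in order and records the outcome of an
// admin connection attempted right after each step.
std::vector<Outcome> simulate_startup(bool create_admin_listener_thread) {
  static PluginCache cache;
  ServerState s;
  std::vector<Outcome> seen;

  // Step 1: network_init. With the dedicated admin thread enabled,
  // the admin listener starts accepting connections here.
  s.admin_listener_running = create_admin_listener_thread;
  seen.push_back(admin_connect(s));

  // Step 2: acl_init allocates the plugin cache.
  s.plugin_cache = &cache;
  seen.push_back(admin_connect(s));

  // Step 3: connection_event_loop; connections are accepted in either mode.
  s.admin_listener_running = true;
  seen.push_back(admin_connect(s));

  assert(seen.size() == 3);  // one outcome per startup step
  return seen;
}
```

In this model, simulate_startup(true) reports kNullDereference for a connection made right after step 1, while simulate_startup(false) never reaches authentication until the cache exists.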
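Reproduction:
Run a script that connects to the database in a loop to simulate client connections (any user name will do; the login does not need to succeed):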
#!/bin/bash
for a in {1..10000}
do
/mysql-install/bin/mysql -h admin_ip -Padmin_port -uroot -pxxxxx
done
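At the same time, enable create_admin_listener_thread in the configuration file and start the database.
Repeating this may also expose a second problem: an arbitrary user logs in successfully, bypassing the privilege check.
This happens because acl_init leaves a window between initializing g_cached_authentication_plugins and setting initialized: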
g_cached_authentication_plugins = new Cached_authentication_plugins(); // initializes g_cached_authentication_plugins
unknown_accounts = new Map_with_rw_lock(0);
if (!g_cached_authentication_plugins->is_valid()) return 1;
if (dont_read_acl_tables) {
  return 0; /* purecov: tested */
}
if (!(thd = new THD)) return 1; /* purecov: inspected */
thd->thread_stack = (char *)&thd;
thd->store_globals();
return_val = check_engine_type_for_acl_table(thd, false);
check_acl_tables_intact(thd, false);
return_val |= acl_reload(thd, false); // sets initialized to true
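If, during startup, the main thread is between these two steps, a connection handled by the admin listener thread passes authentication. The logic is:
// in acl_authenticate
do_auth_once(thd, auth_plugin_name, &mpvio); // g_cached_authentication_plugins is already set, so this succeeds without a coredump
……
if (initialized) // initialized is still false at this point
{
  ……
} else {
  sctx->skip_grants(); // skips the grant tables, so the user logs in without a privilege check
}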
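The two windows can be summarized as a three-stage model of acl_init. This is a minimal sketch following the description above, not the server's code: AclState, LoginResult, login and unknown_user_login_at_stage are hypothetical names, and account_ok is an assumed stand-in for "the grant tables would accept this login".

```cpp
#include <cassert>

// Simplified ACL startup state; field names are illustrative, not MySQL's.
struct AclState {
  bool plugin_cache_set = false;  // models g_cached_authentication_plugins != nullptr
  bool initialized = false;       // models the flag that acl_reload sets to true
};

enum class LoginResult { kCrash, kGrantsSkipped, kGranted, kDenied };

// Models the decision path for one login attempt on the admin listener.
LoginResult login(const AclState &acl, bool account_ok) {
  if (!acl.plugin_cache_set) return LoginResult::kCrash;     // null dereference
  if (!acl.initialized) return LoginResult::kGrantsSkipped;  // skip_grants() branch
  return account_ok ? LoginResult::kGranted : LoginResult::kDenied;
}

// Result of a login by an account the grant tables would reject, at each
// stage of startup:
//   0: before acl_init
//   1: plugin cache allocated, acl_reload not yet run
//   2: acl_init finished
LoginResult unknown_user_login_at_stage(int stage) {
  assert(stage >= 0 && stage <= 2);
  AclState acl;
  acl.plugin_cache_set = stage >= 1;
  acl.initialized = stage >= 2;
  return login(acl, /*account_ok=*/false);
}
```

Stage 0 corresponds to the coredump, stage 1 to the login that bypasses the grant tables, and only stage 2 rejects the unknown account.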
Fix:
The team has already reported the bug upstream. It can be worked around by turning off create_admin_listener_thread, and it was fixed in MySQL 8.0.22.
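On affected versions, the workaround might look like the following option-file fragment. The admin_address and admin_port values are placeholders assumed for illustration; only the last setting is the workaround itself.

```ini
[mysqld]
# Placeholder values; keep whatever admin interface settings you already use.
admin_address = 127.0.0.1
admin_port    = 33062
# Workaround: do not start a dedicated admin listener thread.
create_admin_listener_thread = OFF
```

With the dedicated thread turned off, the admin interface is served by the regular listener, which per the analysis above only starts accepting connections after acl_init has run.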