Recovering a MySQL Cluster Database After an Unexpected Power Loss

Note: this write-up is project-specific. Some commands may not apply to other environments and are for reference only.


On 2018-01-21, in project xxxx, the hyperconverged infrastructure lost power unexpectedly, and the database would no longer start.

First, back up the /var/lib/mysql directory!
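A minimal sketch of such a backup, assuming mysqld is stopped while the copy runs. The helper name backup_datadir and the backup location are hypothetical and not part of the original procedure.

```shell
#!/bin/sh
# Sketch: copy a (stopped) MySQL data directory to a timestamped backup.
# backup_datadir is a hypothetical helper, not a MySQL tool.
backup_datadir() {
    src="$1"        # data directory, e.g. /var/lib/mysql
    dest_root="$2"  # where backups are kept (assumed path, e.g. /root/mysql-backup)
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    dest="$dest_root/$(basename "$src").$(date +%Y%m%d%H%M%S)"
    mkdir -p "$dest_root" || return 1
    # -a keeps ownership, permissions and timestamps of every file
    cp -a "$src" "$dest" || return 1
    # print the backup location so the caller can record it
    echo "$dest"
}

# Usage (run as root, with mysqld stopped):
#   backup_datadir /var/lib/mysql /root/mysql-backup
```

Keeping an untouched copy matters here because every later step (forced recovery, deleting the data directory) modifies or destroys the original files.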

Recovery:

1) Without forced recovery

180121 20:00:37 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql//wsrep_recovery.7565d7' --pid-file='/var/lib/mysql//plhcs_controller_3-recover.pid'

2018-01-21 20:00:38 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).

180121 20:00:38 mysqld_safe WSREP: Failed to recover position:
2018-01-21 20:00:38 338811 [Warning] Using unique option prefix myisam_recover instead of myisam-recover-options is deprecated and will be removed in a future release. Please use the full name instead.
2018-01-21 20:00:38 338811 [Note] Plugin 'FEDERATED' is disabled.
2018-01-21 20:00:38 7f64279b9740 InnoDB: Warning: Using innodb_locks_unsafe_for_binlog is DEPRECATED. This option may be removed in future releases. Please use READ COMMITTED transaction isolation level instead, see http://dev.mysql.com/doc/refman/5.6/en/set-transaction.html.
2018-01-21 20:00:38 338811 [Note] InnoDB: Using atomics to ref count buffer pool pages
2018-01-21 20:00:38 338811 [Note] InnoDB: The InnoDB memory heap is disabled
2018-01-21 20:00:38 338811 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2018-01-21 20:00:38 338811 [Note] InnoDB: Memory barrier is not used
2018-01-21 20:00:38 338811 [Note] InnoDB: Compressed tables use zlib 1.2.3
2018-01-21 20:00:38 338811 [Note] InnoDB: Using Linux native AIO
2018-01-21 20:00:38 338811 [Note] InnoDB: Using CPU crc32 instructions
2018-01-21 20:00:38 338811 [Note] InnoDB: Initializing buffer pool, size = 4.9G
2018-01-21 20:00:38 338811 [Note] InnoDB: Completed initialization of buffer pool
2018-01-21 20:00:38 338811 [Note] InnoDB: Highest supported file format is Barracuda.
2018-01-21 20:00:38 338811 [Note] InnoDB: Log scan progressed past the checkpoint lsn 1956901700
2018-01-21 20:00:38 338811 [Note] InnoDB: Database was not shutdown normally!
2018-01-21 20:00:38 338811 [Note] InnoDB: Starting crash recovery.
2018-01-21 20:00:38 338811 [Note] InnoDB: Reading tablespace information from the .ibd files...
2018-01-21 20:00:38 338811 [ERROR] InnoDB: space header page consists of zero bytes in tablespace ./cmon/cmon_stats.ibd (table cmon/cmon_stats)
2018-01-21 20:00:38 338811 [Note] InnoDB: Page size:1024 Pages to analyze:64
2018-01-21 20:00:38 338811 [Note] InnoDB: Page size: 1024, Possible space_id count:0
2018-01-21 20:00:38 338811 [Note] InnoDB: Page size:2048 Pages to analyze:56
2018-01-21 20:00:38 338811 [Note] InnoDB: Page size: 2048, Possible space_id count:0
2018-01-21 20:00:38 338811 [Note] InnoDB: Page size:4096 Pages to analyze:28
2018-01-21 20:00:38 338811 [Note] InnoDB: Page size: 4096, Possible space_id count:0
2018-01-21 20:00:38 338811 [Note] InnoDB: Page size:8192 Pages to analyze:14
2018-01-21 20:00:38 338811 [Note] InnoDB: Page size: 8192, Possible space_id count:0
2018-01-21 20:00:38 338811 [Note] InnoDB: Page size:16384 Pages to analyze:7
2018-01-21 20:00:38 338811 [Note] InnoDB: Page size: 16384, Possible space_id count:0
2018-01-21 20:00:38 7f64279b9740 InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
InnoDB: If you are installing InnoDB, remember that you must create
InnoDB: directories yourself, InnoDB does not create them.
InnoDB: Error: could not open single-table tablespace file ./cmon/cmon_stats.ibd
InnoDB: We do not continue the crash recovery, because the table may become
InnoDB: corrupt if we cannot apply the log records in the InnoDB log to it.
InnoDB: To fix the problem and start mysqld:
InnoDB: 1) If there is a permission problem in the file and mysqld cannot
InnoDB: open the file, you should modify the permissions.
InnoDB: 2) If the table is not needed, or you can restore it from a backup,
InnoDB: then you can remove the .ibd file, and InnoDB will do a normal
InnoDB: crash recovery and ignore that table.
InnoDB: 3) If the file system or the disk is broken, and you cannot remove
InnoDB: the .ibd file, you can set innodb_force_recovery > 0 in my.cnf
InnoDB: and force InnoDB to continue crash recovery here.
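The log's third suggestion is to set innodb_force_recovery in my.cnf. The numbered "forced recovery" attempts in this write-up presumably correspond to values 1 through 5 of that setting. The sketch below shows one way to switch the value in an option file. The helper name set_force_recovery is hypothetical, and it assumes the file already contains a [mysqld] section.

```shell
#!/bin/sh
# Sketch: set (or, with level 0, remove) innodb_force_recovery in an option file.
# set_force_recovery is a hypothetical helper, not a MySQL tool.
set_force_recovery() {
    cnf="$1"    # option file, e.g. /etc/my.cnf (path is an assumption)
    level="$2"  # 0 removes the setting; 1-6 are the valid recovery levels
    case "$level" in
        [0-6]) ;;
        *) echo "level must be between 0 and 6" >&2; return 1 ;;
    esac
    grep -q '^[[:space:]]*\[mysqld\]' "$cnf" || {
        echo "no [mysqld] section in $cnf" >&2; return 1; }
    tmp="$cnf.tmp.$$"
    awk -v lvl="$level" '
        # drop any previous setting (underscore or dash spelling)
        /^[ \t]*innodb[_-]force[_-]recovery[ \t]*=/ { next }
        { print }
        # add the new value directly under the [mysqld] header
        /^[ \t]*\[mysqld\][ \t]*$/ { if (lvl > 0) print "innodb_force_recovery = " lvl }
    ' "$cnf" > "$tmp" && mv "$tmp" "$cnf"
}

# Usage: set_force_recovery /etc/my.cnf 4   # then restart mysqld
#        set_force_recovery /etc/my.cnf 0   # back to normal operation
```

Raising the level one step at a time, as the attempts here do, keeps the server at the least invasive level that still lets it start.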

 

 

2) Forced recovery 1 (innodb_force_recovery = 1)

2018-01-21 20:02:29 340095 [ERROR] InnoDB: Space id in fsp header 1316159744,but in the page header 33554432

InnoDB: Error: tablespace id is 2163 in the data dictionary

InnoDB: but in file ./cmon/cmon_job.ibd it is 18446744073709551615!

2018-01-21 20:02:29 7fa4c9b82700  InnoDB: Assertion failure in thread 140345735653120 in file fil0fil.cc line 796

InnoDB: We intentionally generate a memory trap.

InnoDB: Submit a detailed bug report to http://bugs.mysql.com.

InnoDB: If you get repeated assertion failures or crashes, even

InnoDB: immediately after the mysqld startup, there may be

InnoDB: corruption in the InnoDB tablespace. Please refer to

InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html

InnoDB: about forcing recovery.

12:02:29 UTC - mysqld got signal 6 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

 

key_buffer_size=0

read_buffer_size=131072

max_used_connections=0

max_threads=10000

thread_count=2

connection_count=2

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3981875 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

 

Thread pointer: 0x0

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0 thread_stack 0x40000

/usr/sbin/mysqld(my_print_stacktrace+0x3b)[0x904aeb]

/usr/sbin/mysqld(handle_fatal_signal+0x491)[0x68dc71]

/lib64/libpthread.so.0(+0xf370)[0x7fa6db20c370]

/lib64/libc.so.6(gsignal+0x37)[0x7fa6da0131d7]

/lib64/libc.so.6(abort+0x148)[0x7fa6da0148c8]

/usr/sbin/mysqld[0xacfd89]

/usr/sbin/mysqld[0xacff9c]

/usr/sbin/mysqld[0xad7a6b]

/usr/sbin/mysqld[0xaa020b]

/usr/sbin/mysqld[0xa86e40]

/usr/sbin/mysqld[0xa6bae3]

/usr/sbin/mysqld[0xa10836]

/usr/sbin/mysqld[0xa0d700]

/usr/sbin/mysqld[0xa0e4ca]

/usr/sbin/mysqld[0xa0ee08]

/usr/sbin/mysqld[0x9dd365]

/usr/sbin/mysqld[0xa359ae]

/usr/sbin/mysqld[0xa27a2c]

/lib64/libpthread.so.0(+0x7dc5)[0x7fa6db204dc5]

/lib64/libc.so.6(clone+0x6d)[0x7fa6da0d576d]

 

3) Forced recovery 2

2018-01-21 20:03:51 341253 [Note] WSREP: Service thread queue flushed.

2018-01-21 20:03:51 341253 [Note] WSREP: GCache history reset: old(bbdd25de-fe77-11e7-9e6f-2b7b75cdd72a:0) -> new(bbdd25de-fe77-11e7-9e6f-2b7b75cdd72a:3644)

2018-01-21 20:03:51 341253 [Note] WSREP: Synchronized with group, ready for connections

2018-01-21 20:03:51 341253 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

InnoDB: A new raw disk partition was initialized or

InnoDB: innodb_force_recovery is on: we do not allow

InnoDB: database modifications by the user. Shut down

InnoDB: mysqld and edit my.cnf so that newraw is replaced

InnoDB: with raw, and innodb_force_... is removed.

InnoDB: A new raw disk partition was initialized or

InnoDB: innodb_force_recovery is on: we do not allow

InnoDB: database modifications by the user. Shut down

InnoDB: mysqld and edit my.cnf so that newraw is replaced

InnoDB: with raw, and innodb_force_... is removed.

InnoDB: A new raw disk partition was initialized or

InnoDB: innodb_force_recovery is on: we do not allow

InnoDB: database modifications by the user. Shut down

InnoDB: mysqld and edit my.cnf so that newraw is replaced

InnoDB: with raw, and innodb_force_... is removed.

InnoDB: A new raw disk partition was initialized or

InnoDB: innodb_force_recovery is on: we do not allow

InnoDB: database modifications by the user. Shut down

InnoDB: mysqld and edit my.cnf so that newraw is replaced

InnoDB: with raw, and innodb_force_... is removed.

……

……

InnoDB: Error: tablespace id is 2168 in the data dictionary

InnoDB: but in file ./cmon/backup.ibd it is 0!

2018-01-21 20:06:24 7f26f0255700  InnoDB: Assertion failure in thread 139805214463744 in file fil0fil.cc line 796

InnoDB: We intentionally generate a memory trap.

InnoDB: Submit a detailed bug report to http://bugs.mysql.com.

InnoDB: If you get repeated assertion failures or crashes, even

InnoDB: immediately after the mysqld startup, there may be

InnoDB: corruption in the InnoDB tablespace. Please refer to

InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html

InnoDB: about forcing recovery.

 

Keystone service error:
Operation not allowed when innodb_forced_recovery > 0

 

MySQL accepts logins and tables can be queried, but a dump fails. With innodb_force_recovery > 0, all databases and tables are read-only.

 

4) Forced recovery 3

The errors are the same as with forced recovery 2.

 

5) Forced recovery 4

2018-01-21 20:10:43 350749 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."networkdhcpagentbindings"' in the cache. Attempting to load the tablespace with space id 385.

2018-01-21 20:10:47 350749 [Warning] InnoDB: Allocated tablespace 385, old maximum was 0

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."agents"' in the cache. Attempting to load the tablespace with space id 578.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."bgp_speaker_dragent_bindings"' in the cache. Attempting to load the tablespace with space id 576.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."cisco_hosting_devices"' in the cache. Attempting to load the tablespace with space id 474.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."cisco_router_mappings"' in the cache. Attempting to load the tablespace with space id 476.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."ha_router_agent_port_bindings"' in the cache. Attempting to load the tablespace with space id 392.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."poolloadbalanceragentbindings"' in the cache. Attempting to load the tablespace with space id 441.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."routerl3agentbindings"' in the cache. Attempting to load the tablespace with space id 390.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_LOCKS"' in the cache. Attempting to load the tablespace with space id 1400.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_SCHEDULER_STATE"' in the cache. Attempting to load the tablespace with space id 1399.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_FIRED_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1398.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1391.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_BLOB_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1395.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_CRON_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1393.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_SIMPLE_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1392.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_SIMPROP_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1394.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_PAUSED_TRIGGER_GRPS"' in the cache. Attempting to load the tablespace with space id 1397.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_JOB_DETAILS"' in the cache. Attempting to load the tablespace with space id 1390.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"zabbix"."escalations"' in the cache. Attempting to load the tablespace with space id 1437.
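To see at a glance which tables these errors refer to, the table names can be pulled out of the error log. This is only a sketch: the helper name list_missing_tablespaces is hypothetical, and the log file path in the usage line is an assumption.

```shell
#!/bin/sh
# Sketch: list schema.table for every "Failed to find tablespace" log line.
# list_missing_tablespaces is a hypothetical helper.
list_missing_tablespaces() {
    logfile="$1"
    # Splitting on double quotes turns  '"neutron"."agents"'  into
    # field 2 = schema name and field 4 = table name.
    awk -F'"' '/Failed to find tablespace for table/ { print $2 "." $4 }' "$logfile" |
        sort -u
}

# Usage (log path is an assumption):
#   list_missing_tablespaces /var/log/mysql/error.log
```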

 

Keystone reports: Can't lock file (errno: 165 - Table is read only)

 

Backing up the databases:

/usr/bin/sh /opt/backup/shell/backupmysql.sh

Warning: Using a password on the command line interface can be insecure.

Error: Couldn't read status information for table backup ()

mysqldump: Couldn't execute 'show create table `backup`': Table 'cmon.backup' doesn't exist (1146)

 

However, dumping the other databases as follows works:
mysqldump -uroot -p`cat /etc/contrail/mysql.token` --databases aodh ceilometer cinder glance heat keystone mysql neutron nova nova_api zabbix > ./aaa.sql

 

This shows that the problem lies in the cmon tables.
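Dumping one database at a time makes it easier to pin down a damaged schema such as cmon, because a single failure does not abort the whole dump. The following is a sketch under stated assumptions: dump_each_db and the DUMP_CMD variable are hypothetical names, and credentials are expected to come from an option file such as ~/.my.cnf or to be included in DUMP_CMD.

```shell
#!/bin/sh
# Sketch: dump each database to its own file and report the ones that fail.
# dump_each_db and DUMP_CMD are hypothetical names used for illustration.
dump_each_db() {
    outdir="$1"; shift          # remaining arguments are database names
    failed=""
    mkdir -p "$outdir" || return 1
    for db in "$@"; do
        # DUMP_CMD defaults to plain mysqldump; it is left unquoted on purpose
        # so that extra options inside it are split into separate arguments.
        if ${DUMP_CMD:-mysqldump} --databases "$db" > "$outdir/$db.sql"; then
            echo "ok: $db"
        else
            echo "FAILED: $db" >&2
            rm -f "$outdir/$db.sql"   # do not keep a partial dump
            failed="$failed $db"
        fi
    done
    # succeed only if every database was dumped
    [ -z "$failed" ]
}

# Usage:
#   dump_each_db /root/dumps aodh ceilometer cinder glance heat keystone \
#       neutron nova nova_api zabbix cmon
```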

 

6) Forced recovery 5

2018-01-21 20:10:43 350749 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."networkdhcpagentbindings"' in the cache. Attempting to load the tablespace with space id 385.

2018-01-21 20:10:47 350749 [Warning] InnoDB: Allocated tablespace 385, old maximum was 0

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."agents"' in the cache. Attempting to load the tablespace with space id 578.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."bgp_speaker_dragent_bindings"' in the cache. Attempting to load the tablespace with space id 576.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."cisco_hosting_devices"' in the cache. Attempting to load the tablespace with space id 474.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."cisco_router_mappings"' in the cache. Attempting to load the tablespace with space id 476.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."ha_router_agent_port_bindings"' in the cache. Attempting to load the tablespace with space id 392.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."poolloadbalanceragentbindings"' in the cache. Attempting to load the tablespace with space id 441.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"neutron"."routerl3agentbindings"' in the cache. Attempting to load the tablespace with space id 390.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_LOCKS"' in the cache. Attempting to load the tablespace with space id 1400.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_SCHEDULER_STATE"' in the cache. Attempting to load the tablespace with space id 1399.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_FIRED_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1398.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1391.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_BLOB_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1395.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_CRON_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1393.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_SIMPLE_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1392.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_SIMPROP_TRIGGERS"' in the cache. Attempting to load the tablespace with space id 1394.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_PAUSED_TRIGGER_GRPS"' in the cache. Attempting to load the tablespace with space id 1397.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"flexscape"."QRTZ_JOB_DETAILS"' in the cache. Attempting to load the tablespace with space id 1390.

2018-01-21 20:10:47 350749 [ERROR] InnoDB: Failed to find tablespace for table '"zabbix"."escalations"' in the cache. Attempting to load the tablespace with space id 1437.

 

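Recovery steps:

Both controllers, A and B, showed the problems described above.

0) Back up the /var/lib/mysql directory on both A and B.

1) On A, start with forced recovery 4. The database is read-only in this mode but can still be dumped. Dump the databases one at a time; on site, the cmon database could not be dumped. This step produces a dump file.

2) On B, delete all files under /var/lib/mysql, then run the following: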

--  mysql_install_db --user=mysql

--  mysqld_safe --wsrep-new-cluster &

--  sudo -E /usr/local/security/kolla_security_reset

-- mysql -u root --password="${DB_ROOT_PASSWORD}" -e "GRANT ALL PRIVILEGES ON *.* TO 'root'@'localhost' IDENTIFIED BY '${DB_ROOT_PASSWORD}' WITH GRANT OPTION;"

-- mysql -u root --password="${DB_ROOT_PASSWORD}" -e "GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '${DB_ROOT_PASSWORD}' WITH GRANT OPTION;"

-- mysql -u root --password="${DB_ROOT_PASSWORD}" -e "CREATE USER 'haproxy'@'%' IDENTIFIED BY '';"

--  mysqladmin -uroot -p"${DB_ROOT_PASSWORD}" shutdown

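3) B's database is now rebuilt. Import the dump file produced in step 1 and start the database. This is necessary because the freshly rebuilt B has none of the per-database user names and passwords.

4) Copy the individual database directories under /var/lib/mysql/ on B to A, then run chown -R mysql:mysql on them.

5) Remove the forced-recovery setting on A and start it normally: /etc/init.d/mysql start --wsrep-new-cluster

6) Delete the cmon directory on A and restore cmon: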

-- mysql -u root -p`cat /etc/contrail/mysql.token` -e "CREATE SCHEMA IF NOT EXISTS cmon"

-- mysql -u root -p`cat /etc/contrail/mysql.token` < /usr/share/cmon/cmon_db.sql

-- mysql -u root -p`cat /etc/contrail/mysql.token` < /usr/share/cmon/cmon_data.sql

-- mysql -u root -p`cat /etc/contrail/mysql.token` -e "use cmon; insert into cluster(type) VALUES ('galera')"

7) Recovery is complete.
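The import in step 3 can be sketched as a loop over the dump files, assuming one .sql file per database. The names import_dumps and MYSQL_CMD are hypothetical, and credentials are again expected to come from an option file or to be included in MYSQL_CMD.

```shell
#!/bin/sh
# Sketch: feed every .sql dump in a directory to the mysql client, in name order.
# import_dumps and MYSQL_CMD are hypothetical names used for illustration.
import_dumps() {
    dumpdir="$1"
    for f in "$dumpdir"/*.sql; do
        [ -f "$f" ] || continue       # skip when the glob matches nothing
        echo "importing $f"
        # MYSQL_CMD defaults to plain mysql; stop at the first failed import
        ${MYSQL_CMD:-mysql} < "$f" || {
            echo "import failed: $f" >&2; return 1; }
    done
}

# Usage: import_dumps /root/dumps
```

Stopping at the first failure keeps the rebuilt instance in a known state, so the failing dump can be inspected before anything else is loaded.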

 

2018-03-19:
After deleting /var/lib/mysql, proceed as follows:
1) mysql_install_db --user=mysql

2) mysqld_safe --wsrep-new-cluster &

3) sudo -E /usr/local/security/kolla_security_reset
4) Then run each of the schema-creation and user-privilege statements listed above.


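Summary:

In this power-loss scenario, the failure was caused by an inconsistency between InnoDB and the cmon table files. The core of the recovery is forced recovery, which leaves the database read-only. In this state MySQL reports out-of-sync errors, which can be ignored. Dump the databases, import them into an empty MySQL instance, and let that instance generate fresh InnoDB files.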
