Kudu master failure recovery

While running Kudu 1.10, one master node showed an abnormal state. The recovery steps actually performed are recorded below, based on the following official references:
https://github.com/apache/kudu/blob/master/docs/administration.adoc#migrate_to_multi_master

https://kudu.apache.org/docs/command_line_tools_reference.html#local_replica-cmeta

# Kudu master split-brain rescue… ineffective. As I understand it, this approach requires the cluster itself to be healthy: it migrates a master by first adding a node and then removing one.
# Add a new node
sudo -u kudu kudu master add 192.168.1.54:7051 192.168.1.52:7051 --fs_wal_dir=/home/kudu_server/master/wals --fs_data_dirs=/home/kudu_server/master/data
# Remove a node
sudo -u kudu kudu master remove 192.168.1.53:7051,192.168.1.54:7051 192.168.1.52:7051
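# Before attempting add/remove, cluster health can be confirmed first; a minimal sketch using the standard ksck tool (master addresses as above):
sudo -u kudu kudu cluster ksck 192.168.1.52:7051,192.168.1.53:7051,192.168.1.54:7051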

# The following procedure worked; it applies when one master has died and a fresh node is brought up to take its place in the cluster.
# Stop all Kudu services
# On the new node: format the local filesystem layout
sudo -u kudu kudu fs format --fs_wal_dir=/home/kudu_server/master/wals --fs_data_dirs=/home/kudu_server/master/data
# On all 3 nodes: record the returned uuid; dump prints the uuid that identifies each node's local filesystem instance metadata
sudo -u kudu kudu fs dump uuid --fs_wal_dir=/home/kudu_server/master/wals --fs_data_dirs=/home/kudu_server/master/data
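# To avoid copy-paste mistakes, the uuid can be captured into a shell variable on each node; a sketch assuming the same directory layout:
UUID=$(sudo -u kudu kudu fs dump uuid --fs_wal_dir=/home/kudu_server/master/wals --fs_data_dirs=/home/kudu_server/master/data)
echo "$(hostname): ${UUID}"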
# On all old nodes: rewrite the on-disk Raft config of the master's system catalog tablet (ID 00000000000000000000000000000000); the ID before each ip:port is the uuid dumped on that node
sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=/home/kudu_server/master/wals --fs_data_dirs=/home/kudu_server/master/data 00000000000000000000000000000000 442c101a8ec844b88616be154d1c7329:192.168.1.53:7051 25d7184b525a4797bc9610524c66f95b:192.168.1.54:7051 7e25cf00fd444f8aaf423fdb4e027e0a:192.168.1.52:7051
# Start the old nodes' services. Note: /etc/kudu/conf/master.gflagfile must contain --master_addresses=192.168.1.52:7051,192.168.1.53:7051,192.168.1.54:7051, otherwise the old nodes will fail to start with an error that the configuration does not match the on-disk metadata.
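# For reference, an illustrative master.gflagfile matching this 3-node layout (flags mirror the commands above; adjust paths to your deployment):
--fs_wal_dir=/home/kudu_server/master/wals
--fs_data_dirs=/home/kudu_server/master/data
--master_addresses=192.168.1.52:7051,192.168.1.53:7051,192.168.1.54:7051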
# On the new node: copy the system catalog metadata from one of the healthy old masters (192.168.1.54 here)
sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=/home/kudu_server/master/wals --fs_data_dirs=/home/kudu_server/master/data 00000000000000000000000000000000 192.168.1.54:7051
# Start the new node and check the service

# List master info; every node should report the same state, all normal
kudu master list 192.168.1.52:7051,192.168.1.53:7051,192.168.1.54:7051
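# As a final check, ksck can be run against all three masters once more; a healthy cluster should report no errors for masters or tablet servers:
sudo -u kudu kudu cluster ksck 192.168.1.52:7051,192.168.1.53:7051,192.168.1.54:7051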
