Quick recovery of a failed mon in a multi-mon Ceph cluster

# ceph -s
cluster 0fbf2746-8132-4944-af64-e29e24e871bb
health HEALTH_WARN
1 mons down, quorum 1,2 ceph01,ceph03
monmap e3: 3 mons at {ceph01=172.28.13.58:6789/0,ceph02=172.28.13.59:6789/0,ceph03=172.28.13.60:6789/0}
election epoch 616, quorum 1,2 ceph01,ceph03
…….

The output above shows that ceph02 is down.
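Before removing the mon it can be worth checking on ceph02 itself why the daemon died (a quick optional check, assuming the node is still reachable over SSH):
# systemctl status [email protected]
# journalctl -u [email protected] -n 50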

Remove the failed mon from the monmap
# ceph mon remove ceph02
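As a quick sanity check, the monmap should now only list the two remaining mons (optional):
# ceph mon stat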

Clean up the mon's data directory
# rm -rf /var/lib/ceph/mon/ceph-ceph02
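If you prefer to keep the old store around in case anything goes wrong, renaming instead of deleting works just as well (the .bak path here is only an example):
# mv /var/lib/ceph/mon/ceph-ceph02 /var/lib/ceph/mon/ceph-ceph02.bak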

Re-initialize the store.db database for the ceph02 mon
# ceph-mon --mkfs -i ceph02 --keyring /etc/ceph/ceph.mon.keyring
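This assumes /etc/ceph/ceph.mon.keyring is already present on ceph02. If it is not, one way to proceed (a sketch based on the standard add-a-monitor procedure; the /tmp paths are just examples) is to pull the mon keyring and the current monmap from a healthy node and pass both to --mkfs:
# ceph auth get mon. -o /tmp/ceph.mon.keyring
# ceph mon getmap -o /tmp/monmap
# ceph-mon --mkfs -i ceph02 --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring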

Create the empty done file
# touch /var/lib/ceph/mon/ceph-ceph02/done

Create the keyring for the ceph02 mon
# ceph auth get-or-create mon.ceph02 mon 'allow rwx' osd 'allow *' -o /var/lib/ceph/mon/ceph-ceph02/keyring
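Optionally, confirm the auth entry now exists:
# ceph auth get mon.ceph02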

Create the empty init-system marker file:
If the node is managed by sysvinit:
# touch /var/lib/ceph/mon/ceph-ceph02/sysvinit
If it is managed by systemd:
# touch /var/lib/ceph/mon/ceph-ceph02/systemd
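If you are not sure which marker applies, looking at what an existing healthy mon has is an easy way to tell (ceph01 is just an example here):
# ls /var/lib/ceph/mon/ceph-ceph01/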

Fix file ownership
# chown -R ceph:ceph /var/lib/ceph/mon/ceph-ceph02

Add the mon back to the cluster
# ceph mon add ceph02 172.28.13.59:6789
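The new mon should now appear in the monmap; dumping it shows the updated epoch and addresses (optional check):
# ceph mon dump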

Restart the service
# systemctl restart [email protected]

It failed with an error:
Apr 26 17:14:48 ceph02 systemd[1]: [email protected] failed.
Apr 26 17:14:48 ceph02 polkitd[741]: Unregistered Authentication Agent for unix-process:29956:253026270 (system bus name :1.10026, object path /org/freedesktop/PolicyKit1/AuthenticationAg
Apr 26 17:14:54 ceph02 polkitd[741]: Registered Authentication Agent for unix-process:29988:253026886 (system bus name :1.10027 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path
Apr 26 17:14:54 ceph02 systemd[1]: start request repeated too quickly for [email protected]
Apr 26 17:14:54 ceph02 systemd[1]: Failed to start Ceph cluster monitor daemon.
-- Subject: Unit [email protected] has failed

Reload systemd and restart again, and this time it comes up fine:
# systemctl daemon-reload
# systemctl restart [email protected]
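At this point the mon should rejoin quorum; the quorum status command is a quick way to confirm before looking at the full ceph -s below (optional):
# ceph quorum_status --format json-pretty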

# ceph -s
cluster 0fbf2746-8132-4944-af64-e29e24e871bb
health HEALTH_OK
monmap e7: 3 mons at {ceph01=172.28.13.58:6789/0,ceph02=172.28.13.59:6789/0,ceph03=172.28.13.60:6789/0}
election epoch 626, quorum 0,1,2 ceph01,ceph02,ceph03
fsmap e98: 1/1/1 up {0=ceph01=up:active}
osdmap e280: 6 osds: 6 up, 6 in
flags sortbitwise,require_jewel_osds
pgmap v41214: 594 pgs, 16 pools, 2197 MB data, 854 objects
7669 MB used, 592 GB / 599 GB avail
594 active+clean

Verify that no data is missing
# ceph osd pool ls
# rados -p rbd ls
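To go through every pool rather than just rbd, a small shell loop works (a sketch; listing large pools can be slow):
# for p in $(ceph osd pool ls); do echo "== $p =="; rados -p "$p" ls; done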
