Ceph reports "daemons have recently crashed"

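Problem: A Ceph cluster running over the RDMA protocol keeps reporting "daemons have recently crashed", and the count keeps growing, yet no related errors could be found in the logs.
Solution: See the solution in the official documentation, quoted below.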

RECENT_CRASH

One or more Ceph daemons have crashed recently, and the crash has not yet been archived (acknowledged) by the administrator. This may indicate a software bug, a hardware problem (e.g., a failing disk), or some other problem.

New crashes can be listed with:

#ceph crash ls-new

Information about a specific crash can be examined with:

#ceph crash info <crash-id>
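
When the crash count keeps growing, it can help to dump the details of every new crash in one pass instead of calling `ceph crash info` by hand. The following is a minimal sketch, not an official tool: the function names `crash_ids` and `review_new_crashes` are hypothetical, and the parsing assumes that each data row printed by `ceph crash ls-new` begins with a crash ID that starts with a date (YYYY-MM-DD).

```shell
#!/bin/sh

# Hypothetical helper: read `ceph crash ls-new` output on stdin and
# print only the crash IDs.
# Assumption: data rows start with an ID beginning with YYYY-MM-DD;
# the header row and any other lines are skipped.
crash_ids() {
    awk '$1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/ { print $1 }'
}

# Hypothetical helper: print the details of every new (unarchived) crash.
review_new_crashes() {
    ceph crash ls-new | crash_ids | while read -r id; do
        echo "=== $id ==="
        ceph crash info "$id"
    done
}
```

After sourcing this in a shell on a node with admin access, running `review_new_crashes | less` pages through all unarchived crashes, which makes it easier to see whether the same daemon or backtrace keeps recurring.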

This warning can be silenced by “archiving” the crash (perhaps after being examined by an administrator) so that it does not generate this warning:

#ceph crash archive <crash-id>

Similarly, all new crashes can be archived with:

#ceph crash archive-all

Archived crashes will still be visible via ceph crash ls but not ceph crash ls-new.
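
Between archiving a single crash and archiving everything, it is sometimes useful to acknowledge only the crashes of one daemon that has already been investigated. The sketch below illustrates that idea. The function name `archive_crashes_for` is hypothetical, and it assumes the second column of `ceph crash ls-new` is the entity name (for example `osd.3`).

```shell
#!/bin/sh

# Hypothetical helper: archive only the new crashes of one daemon.
# Assumption: `ceph crash ls-new` prints rows of the form
# "<crash-id> <entity> ...", so column 2 is compared with the argument.
archive_crashes_for() {
    entity="$1"
    ceph crash ls-new |
        awk -v e="$entity" '$2 == e { print $1 }' |
        while read -r id; do
            ceph crash archive "$id"
        done
}
```

For example, `archive_crashes_for osd.3` would leave crashes from all other daemons in the `ls-new` list, so the health warning stays active until those are reviewed too.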

The time period for what “recent” means is controlled by the option mgr/crash/warn_recent_interval (default: two weeks).

These warnings can be disabled entirely with:

#ceph config set mgr mgr/crash/warn_recent_interval 0
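
Rather than disabling the warning outright, the window can be shortened so that old crashes stop counting sooner. This is a rough sketch under two assumptions: that the option value is expressed in seconds (consistent with the two-week default), and that the option is set on the `mgr` section. The function name `set_crash_warn_days` is hypothetical.

```shell
#!/bin/sh

# Hypothetical helper: set the "recent" window to a number of days.
# Assumption: mgr/crash/warn_recent_interval takes a value in seconds.
set_crash_warn_days() {
    days="$1"
    seconds=$((days * 86400))
    ceph config set mgr mgr/crash/warn_recent_interval "$seconds"
}
```

Calling `set_crash_warn_days 1` would limit the warning to crashes from the last day. The stored value can be checked afterwards with `ceph config get mgr mgr/crash/warn_recent_interval`.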

References:

https://docs.ceph.com/docs/master/rados/operations/health-checks/?highlight=backfillfull%20ratio
https://docs.ceph.com/docs/master/mgr/crash/?highlight=crash
