Degraded data redundancy: 460/77222938 objects degraded (0.001%), 11 pgs degraded, 20 pgs undersized
pg 1.9 is stuck undersized for 67505.094338, current state active+undersized+degraded, last acting [95,60]
pg 1.a is stuck undersized for 67505.092252, current state active+undersized, last acting [125,26]
pg 1.b is stuck undersized for 67505.096730, current state active+undersized, last acting [25,125]
pg 1.c is stuck undersized for 67505.098017, current state active+undersized, last acting [79,166]
pg 1.e is stuck undersized for 67505.096981, current state active+undersized+degraded, last acting [97,63]
pg 2.1 is stuck undersized for 66585.680482, current state active+undersized+degraded, last acting [167,74]
pg 2.7 is stuck undersized for 64373.451329, current state active+undersized+degraded, last acting [119,13]
pg 2.8 is stuck undersized for 66579.748265, current state active+undersized+degraded, last acting [141,7]
pg 2.a is stuck undersized for 63851.308379, current state active+undersized+degraded, last acting [85,0]
pg 2.d is stuck undersized for 63752.075326, current state active+undersized+degraded, last acting [51,115]
pg 3.a is stuck undersized for 63856.354608, current state active+undersized+degraded, last acting [119,27]
pg 4.1 is stuck undersized for 67505.096880, current state active+undersized, last acting [30,87]
pg 4.2 is stuck undersized for 67505.061049, current state active+undersized, last acting [158,74]
pg 4.6 is stuck undersized for 67505.081249, current state active+undersized, last acting [111,22]
pg 4.7 is stuck undersized for 67505.093693, current state active+undersized, last acting [95,17]
pg 4.8 is stuck undersized for 67505.092161, current state active+undersized, last acting [70,150]
pg 4.9 is stuck undersized for 67505.095216, current state active+undersized+degraded, last acting [107,17]
pg 4.c is stuck undersized for 67505.097948, current state active+undersized, last acting [127,30]
pg 4.d is stuck undersized for 67505.086830, current state active+undersized+degraded, last acting [123,2]
pg 4.f is stuck undersized for 67505.097398, current state active+undersized+degraded, last acting [110,66]
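A listing like the one above is easier to read once it is tallied. The following is a minimal sketch, not part of the original post, that parses lines in this format and counts PG states, acting-set sizes, and affected pools. The function names and the regular expression are illustrative assumptions; the sample text is three lines copied from the output above, and the pool ID is taken as the part of the PG ID before the dot.

```python
import re
from collections import Counter

# Matches one "pg ... is stuck ..." line in the format shown above.
PG_LINE = re.compile(
    r"pg (?P<pgid>\S+) is stuck (?P<stuck>\w+) for (?P<secs>[\d.]+), "
    r"current state (?P<state>[\w+]+), last acting \[(?P<acting>[\d,\s]*)\]"
)


def parse_pg_lines(text):
    """Turn each stuck-PG line into a dict; lines that do not match are skipped."""
    pgs = []
    for line in text.splitlines():
        m = PG_LINE.search(line)
        if m is None:
            continue
        pgs.append({
            "pgid": m.group("pgid"),
            # A PG ID has the form <pool id>.<hex suffix>.
            "pool": int(m.group("pgid").split(".")[0]),
            "stuck_seconds": float(m.group("secs")),
            "states": m.group("state").split("+"),
            "acting": [int(x) for x in m.group("acting").split(",") if x.strip()],
        })
    return pgs


def summarize(pgs):
    """Count PGs by state, by acting-set size, and by pool."""
    return {
        "total": len(pgs),
        "degraded": sum("degraded" in p["states"] for p in pgs),
        "undersized": sum("undersized" in p["states"] for p in pgs),
        "acting_sizes": Counter(len(p["acting"]) for p in pgs),
        "per_pool": Counter(p["pool"] for p in pgs),
    }


# Three lines copied from the output above, used as sample input.
SAMPLE = """\
pg 1.9 is stuck undersized for 67505.094338, current state active+undersized+degraded, last acting [95,60]
pg 1.a is stuck undersized for 67505.092252, current state active+undersized, last acting [125,26]
pg 2.1 is stuck undersized for 66585.680482, current state active+undersized+degraded, last acting [167,74]
"""

if __name__ == "__main__":
    print(summarize(parse_pg_lines(SAMPLE)))
```

If the acting-set size is the same for every stuck PG, as it is in this listing where each set holds two OSDs, that points at pool or CRUSH configuration rather than at a single failed disk.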
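According to the health output above, the Ceph cluster's health status is HEALTH_WARN, with a warning about reduced data redundancy: 460/77222938 objects degraded (0.001%), 11 PGs (Placement Groups) degraded, and 20 PGs undersized. Each listed PG is marked "stuck undersized", meaning it has held fewer replicas than its pool is configured for, and has done so for a long time: most of them for about 67,505 seconds, nearly 19 hours. Each line gives the PG's current state (active, undersized, and in some cases degraded) and its last acting set, that is, the OSDs (Object Storage Daemons) that were most recently serving the PG. Every acting set here contains only two OSDs. The PGs are still active, but they hold fewer copies of the data than configured, which lowers data reliability and may affect performance. To resolve the problem, you can work through the following steps: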
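1. Check the health of the OSDs: run ceph osd tree and check whether every OSD is up and in, or whether any OSD has failed or gone offline.
2. Check the state of the storage pools: run ceph osd pool stats to see each pool's client I/O and recovery activity. To find out whether a pool is approaching its capacity limit, ceph df is the command that reports usage.
3. Check the replica count and failure-domain settings: make sure each pool has an appropriate replica count (size) and that the failure domain in its CRUSH rule is set correctly, so that every replica can actually be placed in a separate location. If a pool asks for more replicas than CRUSH can place, its PGs stay undersized.
4. Let recovery run: once enough OSDs are available to hold every replica, undersized PGs recover and backfill on their own. Note that ceph pg repair <pgid> does not rebalance data; it asks the PG's primary OSD to repair inconsistencies found by scrubbing, so it is meant for PGs in the inconsistent state, not for undersized ones.
5. Monitor and tune cluster performance: make sure the cluster has enough compute, storage, and network resources for its workload, watch its performance metrics, and adjust as needed.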
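In my case the problem was the third item: the replica count had been planned incorrectly. After I corrected it, the cluster status returned to OK.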
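The replica-count check can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure used here: it assumes replicated pools whose CRUSH rule places each replica in a different failure-domain bucket (for example, one replica per host), and it reads the pool list as JSON such as that printed by ceph osd pool ls detail --format json. The function name, the sample pool names, and the bucket count are hypothetical.

```python
import json


def find_unplaceable_pools(pools_json, failure_domain_count):
    """Return (pool_name, size) for pools that ask for more replicas than
    there are failure-domain buckets to put them in.

    Assumption: one replica per failure-domain bucket, so a pool with
    size N needs at least N buckets or its PGs remain undersized.
    """
    pools = json.loads(pools_json)
    return [
        (pool["pool_name"], pool["size"])
        for pool in pools
        if pool["size"] > failure_domain_count
    ]


# Hypothetical pool list, trimmed to the fields this check reads.
SAMPLE_POOLS = json.dumps([
    {"pool_name": "pool-a", "size": 3, "min_size": 2},
    {"pool_name": "pool-b", "size": 2, "min_size": 1},
])

if __name__ == "__main__":
    # With only two failure-domain buckets, pool-a cannot place its third replica.
    for name, size in find_unplaceable_pools(SAMPLE_POOLS, failure_domain_count=2):
        print(f"{name}: size={size} exceeds the available failure domains")
```

A pool's replica count can be read with ceph osd pool get <pool> size and changed with ceph osd pool set <pool> size <n>. Whether to change the pool size or to add failure domains depends on the redundancy the cluster is meant to provide.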
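I also consulted a detailed explanation of PG states written by a more experienced engineer, which helped me a great deal:
分布式存储Ceph之PG状态详解 - 简书 (A Detailed Explanation of PG States in Ceph Distributed Storage, on Jianshu)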