[Ceph] Abnormal PG state: Degraded data redundancy: 460/77222938 objects degraded (0.001%), 11 pgs degraded

Abnormal PG state

            Degraded data redundancy: 460/77222938 objects degraded (0.001%), 11 pgs degraded, 20 pgs undersized

    pg 1.9 is stuck undersized for 67505.094338, current state active+undersized+degraded, last acting [95,60]

    pg 1.a is stuck undersized for 67505.092252, current state active+undersized, last acting [125,26]

    pg 1.b is stuck undersized for 67505.096730, current state active+undersized, last acting [25,125]

    pg 1.c is stuck undersized for 67505.098017, current state active+undersized, last acting [79,166]

    pg 1.e is stuck undersized for 67505.096981, current state active+undersized+degraded, last acting [97,63]

    pg 2.1 is stuck undersized for 66585.680482, current state active+undersized+degraded, last acting [167,74]

    pg 2.7 is stuck undersized for 64373.451329, current state active+undersized+degraded, last acting [119,13]

    pg 2.8 is stuck undersized for 66579.748265, current state active+undersized+degraded, last acting [141,7]

    pg 2.a is stuck undersized for 63851.308379, current state active+undersized+degraded, last acting [85,0]

    pg 2.d is stuck undersized for 63752.075326, current state active+undersized+degraded, last acting [51,115]

    pg 3.a is stuck undersized for 63856.354608, current state active+undersized+degraded, last acting [119,27]

    pg 4.1 is stuck undersized for 67505.096880, current state active+undersized, last acting [30,87]

    pg 4.2 is stuck undersized for 67505.061049, current state active+undersized, last acting [158,74]

    pg 4.6 is stuck undersized for 67505.081249, current state active+undersized, last acting [111,22]

    pg 4.7 is stuck undersized for 67505.093693, current state active+undersized, last acting [95,17]

    pg 4.8 is stuck undersized for 67505.092161, current state active+undersized, last acting [70,150]

    pg 4.9 is stuck undersized for 67505.095216, current state active+undersized+degraded, last acting [107,17]

    pg 4.c is stuck undersized for 67505.097948, current state active+undersized, last acting [127,30]

    pg 4.d is stuck undersized for 67505.086830, current state active+undersized+degraded, last acting [123,2]

    pg 4.f is stuck undersized for 67505.097398, current state active+undersized+degraded, last acting [110,66]

According to this output, the Ceph cluster's health is HEALTH_WARN, with a warning about reduced data redundancy: 460/77222938 objects degraded (0.001%), 11 PGs (Placement Groups) degraded, and 20 PGs undersized.
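As a quick sanity check on the figures in the warning, the percentage and the "stuck for" duration can be recomputed by hand. This is only a sketch: the variable names are mine, and it assumes the percentage is printed to three decimal places and that the duration is given in seconds.

```python
# Figures copied from the health warning quoted in this post.
degraded = 460
total_in_warning = 77222938

ratio_percent = degraded / total_in_warning * 100
print(f"{ratio_percent:.6f}%")  # unrounded share of degraded objects
print(f"{ratio_percent:.3f}%")  # three decimals, as shown in the warning

# "stuck undersized for 67505.094338" from the line for pg 1.9
stuck_seconds = 67505.094338
stuck_hours = stuck_seconds / 3600
print(f"{stuck_hours:.2f} hours")
```

So the tiny percentage is not the interesting part; the fact that the PGs had been stuck for most of a day is.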

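The warning lists PGs as "stuck undersized", meaning they have had fewer replicas than the pool's configured size for some time (the number after "for" is in seconds, so 67505 is roughly 18.75 hours). Each line gives the PG's current state (active, undersized, degraded) and its last acting set, that is, the OSDs (Object Storage Daemons) most recently responsible for that PG.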
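To summarize output like this without reading it line by line, a small parser can tally the states and the size of each acting set. The following is a minimal sketch that assumes the exact line format quoted in this post; the regular expression and the function name parse_pg_lines are my own and may need adjusting for other Ceph versions.

```python
import re
from collections import Counter

# Matches lines shaped like the ones quoted in this post, e.g.
# "pg 1.9 is stuck undersized for 67505.094338, current state
#  active+undersized+degraded, last acting [95,60]"
PG_LINE = re.compile(
    r"pg (?P<pgid>\S+) is stuck (?P<stuck>\w+) for (?P<seconds>[\d.]+), "
    r"current state (?P<state>[\w+]+), last acting \[(?P<acting>[\d,]*)\]"
)


def parse_pg_lines(text):
    """Return one dict per PG line found in the given health detail text."""
    pgs = []
    for m in PG_LINE.finditer(text):
        acting = [int(x) for x in m.group("acting").split(",") if x]
        pgs.append({
            "pgid": m.group("pgid"),
            "seconds": float(m.group("seconds")),
            "states": m.group("state").split("+"),
            "acting": acting,
        })
    return pgs


# Three lines taken from the output in this post.
sample = """
pg 1.9 is stuck undersized for 67505.094338, current state active+undersized+degraded, last acting [95,60]
pg 1.a is stuck undersized for 67505.092252, current state active+undersized, last acting [125,26]
pg 2.7 is stuck undersized for 64373.451329, current state active+undersized+degraded, last acting [119,13]
"""

pgs = parse_pg_lines(sample)
state_counts = Counter(s for pg in pgs for s in pg["states"])
acting_sizes = Counter(len(pg["acting"]) for pg in pgs)
print(state_counts)   # how often each state flag appears
print(acting_sizes)   # how many OSDs each acting set contains
```

In the full output above, all 20 listed PGs have exactly two OSDs in their acting set and 11 of them carry the degraded flag, which matches the summary line. Two OSDs per PG is what you would expect if the pool asks for more replicas than CRUSH is able to place.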

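These warnings mean that some PGs in the cluster currently hold fewer copies of their data than configured, so data redundancy is reduced. This can affect data reliability and performance. To track down the cause, you can go through the following steps: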

  1. Check OSD health: run ceph osd tree and check whether every OSD is in a normal state, or whether any OSD is down or out.

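  2. Check the state of the pools: run ceph osd pool stats to see per-pool client I/O and recovery activity, and ceph df to check whether any pool is close to its capacity limit.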

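  3. Check the replica count and failure-domain settings: make sure each pool's replica count (size) is appropriate and that the failure domain in its CRUSH rule is set correctly, so that CRUSH can actually place every replica in a separate failure domain. This is what provides redundancy and reliability.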

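  4. Let recovery run rather than forcing it: undersized PGs backfill on their own once CRUSH can map enough OSDs for every replica. Note that ceph pg repair does not rebalance data; it takes a PG ID and asks the primary OSD to repair inconsistencies found by scrubbing, so it does not fix undersized PGs.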

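  5. Monitor and tune cluster performance: make sure the cluster has enough compute, storage, and network resources for its load. Watch the cluster's performance metrics and adjust as needed.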
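Steps 1 and 3 above can be combined into one rough check: does any pool ask for more replicas than there are usable failure domains? The sketch below works on already-parsed JSON, such as what ceph osd tree --format json and ceph osd pool ls detail --format json return. The field names it reads ("nodes", "type", "children", "status", "pool_name", "size") are what I expect in that JSON and should be verified against your Ceph version. The sample data and the names host-a, host-b, rbd, and meta are hypothetical, and the check assumes every pool places one replica per bucket of the same type.

```python
def count_usable_buckets(osd_tree, bucket_type="host"):
    """Count buckets of bucket_type that contain at least one OSD marked up."""
    nodes = {n["id"]: n for n in osd_tree["nodes"]}

    def has_up_osd(node):
        # An OSD counts if it is up; any other bucket counts if one of
        # its descendants is an up OSD.
        if node.get("type") == "osd":
            return node.get("status") == "up"
        return any(
            has_up_osd(nodes[c])
            for c in node.get("children", [])
            if c in nodes
        )

    return sum(
        1 for n in osd_tree["nodes"]
        if n.get("type") == bucket_type and has_up_osd(n)
    )


def pools_short_of_buckets(pools, usable_buckets):
    """Return (pool_name, size) for pools whose size exceeds the bucket count."""
    return [
        (p["pool_name"], p["size"])
        for p in pools
        if p["size"] > usable_buckets
    ]


# Hypothetical, hand-written data shaped like the JSON described above.
osd_tree = {"nodes": [
    {"id": -1, "name": "default", "type": "root", "children": [-2, -3]},
    {"id": -2, "name": "host-a", "type": "host", "children": [0, 1]},
    {"id": -3, "name": "host-b", "type": "host", "children": [2]},
    {"id": 0, "name": "osd.0", "type": "osd", "status": "up"},
    {"id": 1, "name": "osd.1", "type": "osd", "status": "down"},
    {"id": 2, "name": "osd.2", "type": "osd", "status": "up"},
]}
pools = [
    {"pool_name": "rbd", "size": 3, "min_size": 2},
    {"pool_name": "meta", "size": 2, "min_size": 1},
]

usable = count_usable_buckets(osd_tree, bucket_type="host")
short = pools_short_of_buckets(pools, usable)
print(usable, short)
```

A pool that shows up in the result cannot place all of its replicas, so its PGs stay undersized until either more failure domains become available or the pool's size or CRUSH rule is changed (for example with ceph osd pool set <pool> size <n>). Which of these is right depends on the durability you need.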

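In my case the problem was item 3: the replica count had been planned incorrectly. After I changed it, the cluster returned to a healthy state.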

I also consulted a detailed write-up on PG states by a more experienced author, which helped me a great deal:

分布式存储Ceph之PG状态详解 ("Distributed Storage with Ceph: PG States Explained"), on 简书 (Jianshu)

 
