Abnormal condition

1. The following abnormal status was reported:

HEALTH_ERR 37 scrub errors; Possible data damage: 1 pg inconsistent

2. Check the details

# ceph health detail
HEALTH_ERR 37 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 37 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 1.dbc is active+clean+inconsistent, acting [55,71,25]
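The key line in that output names the inconsistent PG and its acting set; the first OSD in the acting set is the primary. A minimal sketch of pulling those fields out with standard tools, replaying the exact line shown above (on a live cluster you would pipe ceph health detail in instead):

```shell
# Sample line copied from the `ceph health detail` output above.
health_output='pg 1.dbc is active+clean+inconsistent, acting [55,71,25]'

# PG id is the 2nd field; the acting set is the bracketed list at the end.
pg_id=$(echo "$health_output" | awk '/inconsistent/ {print $2}')
acting=$(echo "$health_output" | sed -n 's/.*acting \[\(.*\)\]/\1/p')
primary=${acting%%,*}   # first OSD in the acting set is the primary

echo "pg=$pg_id acting=$acting primary_osd=$primary"
```

This is why the repair below targets osd.55: it is the primary of the acting set [55,71,25].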

3. Initial handling

Normally you would run ceph pg repair [pgid], but after observation this did not resolve the problem.

Reference procedure

https://ceph.com/geen-categorie/ceph-manually-repair-object/

Just move the object away with the following:

  • stop the OSD that has the wrong object responsible for that PG
  • flush the journal (ceph-osd -i --flush-journal)
  • move the bad object to another location
  • start the OSD again
  • call ceph pg repair 17.1c1
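The steps from that post can be sketched as a script. Everything here is illustrative: the OSD and PG ids are the ones from this incident, the object path in step 3 is a placeholder (you must locate the actual bad object yourself), and run=echo keeps it a dry run so the commands are printed rather than executed:

```shell
#!/bin/sh
# Dry-run sketch of the reference procedure. Set run= (empty) only
# after verifying every value on your own cluster.
run=echo
osd=55          # OSD holding the bad copy (from this incident)
pg=1.dbc        # inconsistent PG (from `ceph health detail`)

$run systemctl stop "ceph-osd@$osd"          # 1. stop the OSD
$run ceph-osd -i "$osd" --flush-journal      # 2. flush its journal
$run mv /path/to/bad/object /tmp/            # 3. move the bad object aside (placeholder path)
$run systemctl start "ceph-osd@$osd"         # 4. start the OSD again
$run ceph pg repair "$pg"                    # 5. ask Ceph to repair the PG
```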

My repair process

Find the abnormal PG, then go to the host of the corresponding OSD to repair it.


root@CLTQ-064-070:~# ceph osd find 55
{
    "osd": 55,
    "ip": "172.29.64.76:6817/789571",
    "crush_location": {
        "host": "CLTQ-064-076",
        "root": "default"
    }
}
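The host can be picked out of that JSON mechanically. A sketch using sed on the captured output shown above; on a live cluster a real JSON parser such as jq against ceph osd find 55 -f json would be more robust, but plain sed works for this simple shape:

```shell
# Replay the `ceph osd find 55` output shown above via a heredoc.
json=$(cat <<'EOF'
{
    "osd": 55,
    "ip": "172.29.64.76:6817/789571",
    "crush_location": {
        "host": "CLTQ-064-076",
        "root": "default"
    }
}
EOF
)

# Pull the value of the "host" key out of the crush_location block.
host=$(echo "$json" | sed -n 's/.*"host": "\([^"]*\)".*/\1/p')
echo "repair on host: $host"
```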

This shows the host is CLTQ-064-076.
Then go to that host and perform the repair:

1. Stop the OSD

systemctl stop [email protected]

2. Flush the journal

ceph-osd -i 55 --flush-journal

3. Start the OSD

systemctl start [email protected]

4. Repair (usually not needed at this point)

ceph pg repair 1.dbc

5. Check which OSDs the PG resides on

# ceph pg ls | grep 1.dbc

1.dbc      3695                  0        0         0       0 12956202159 1578     1578                active+clean 2018-04-03 19:34:45.924642  2489'4678 2494:19003 [55,71,25]         55 [55,71,25]             55  2489'4678 2018-04-03 18:32:56.365327       2489'4678 2018-04-03 18:32:56.365327 

This confirms the cluster has recovered. The PG is still on osd.55 as primary.
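The recovery check can be scripted too. A sketch that tests the state column of that ceph pg ls line; the sample below is the output shown above with whitespace collapsed, and since column layout varies between Ceph releases it only greps for the state strings rather than relying on field positions:

```shell
# Sample `ceph pg ls` line for pg 1.dbc (whitespace collapsed).
pg_line="1.dbc 3695 0 0 0 0 12956202159 1578 1578 active+clean 2018-04-03 19:34:45.924642 2489'4678 2494:19003 [55,71,25] 55 [55,71,25] 55"

# Healthy means the state contains active+clean and no longer
# contains inconsistent.
if echo "$pg_line" | grep -q 'active+clean' && \
   ! echo "$pg_line" | grep -q 'inconsistent'; then
    status=recovered
else
    status=still-damaged
fi
echo "pg 1.dbc: $status"
```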