Ceph PG state won't recover (from the ceph-devel mailing list)

ceph health detail
HEALTH_WARN 2 pgs down; 2 pgs peering; 2 pgs stuck inactive
pg 1.165 is stuck inactive since forever, current state down+remapped+peering, last acting [38,48]
pg 1.60 is stuck inactive since forever, current state down+remapped+peering, last acting [66,40]
pg 1.60 is down+remapped+peering, acting [66,40]
pg 1.165 is down+remapped+peering, acting [38,48]
[root@cc1 ~]# ceph -s
cluster 8cdfbff9-b7be-46de-85bd-9d49866fcf60
health HEALTH_WARN
2 pgs down
2 pgs peering
2 pgs stuck inactive
monmap e1: 3 mons at {cc1=192.168.128.1:6789/0,cc2=192.168.128.2:6789/0,cc3=192.168.128.3:6789/0}
election epoch 872, quorum 0,1,2 cc1,cc2,cc3
osdmap e115175: 100 osds: 88 up, 86 in; 2 remapped pgs
pgmap v67583069: 3520 pgs, 17 pools, 26675 GB data, 4849 kobjects
76638 GB used, 107 TB / 182 TB avail
3515 active+clean
3 active+clean+scrubbing+deep
2 down+remapped+peering
client io 0 B/s rd, 869 kB/s wr, 14 op/s rd, 113 op/s wr

The reason you can't query the PG is that the OSD is throttling
incoming work and the throttle is exhausted (the PG can't do any work, so it
isn't making progress). A workaround on jewel is to restart the OSD
serving the PG and run the query quickly after that (probably in a loop, so
that you catch it after the OSD starts up but before the throttle is
exhausted again). (In luminous this is fixed.)
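A minimal sketch of that workaround, assuming systemd-managed OSDs; pg 1.165
and osd.38 are taken from the 'last acting' output above, and on jewel the
query is spelled 'ceph pg $pgid query':

systemctl restart ceph-osd@38
# retry until the query slips through before the throttle fills again;
# timeout(1) kills attempts that hang
while ! timeout 10 ceph pg 1.165 query > /tmp/pg-1.165-query.json; do
    sleep 1
done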
Once you have the query output ('ceph tell $pgid query') you'll be able to
tell what is preventing the PG from peering.
You can identify the OSD(s) hosting the PG with 'ceph pg map $pgid'.
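For the two stuck PGs above, for example:

ceph pg map 1.165
ceph pg map 1.60

Each prints the current osdmap epoch together with the PG's up and acting
OSD sets.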


If you haven't deleted the data, you should start the OSDs back up.
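Concretely, assuming systemd units, start whichever OSDs the PG query reports
it is waiting on (the 'down_osds_we_would_probe' list under recovery_state);
osd.53 below is a hypothetical id:

systemctl start ceph-osd@53
# watch the PG go through peering back to active
ceph -w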
If they are partially damaged you can use ceph-objectstore-tool to
extract just the PGs in question to make sure you haven't lost anything,
inject them on some other OSD(s) and restart those, and *then* mark the
bad OSDs as 'lost'.
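A hedged sketch of that export/import flow with ceph-objectstore-tool, using
filestore-era paths; osd.38 as the damaged source and osd.50 as the healthy
destination are hypothetical ids, and each daemon must be stopped while the
tool touches its store:

systemctl stop ceph-osd@38
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-38 \
    --journal-path /var/lib/ceph/osd/ceph-38/journal \
    --pgid 1.165 --op export --file /root/pg.1.165.export

systemctl stop ceph-osd@50
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-50 \
    --journal-path /var/lib/ceph/osd/ceph-50/journal \
    --op import --file /root/pg.1.165.export
systemctl start ceph-osd@50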
If all else fails, you can just mark those OSDs 'lost', but in doing so
you might be telling the cluster to lose data.
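Marking an OSD lost requires an explicit confirmation flag, since it
authorizes the cluster to give up on that copy of the data (osd.38 is again a
hypothetical id):

ceph osd lost 38 --yes-i-really-mean-it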
The best thing to do is definitely to get those OSDs started again.
