Running ceph health detail shows the following PGs in the incomplete state:
pg 7.c is incomplete, acting [2,3] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.11 is incomplete, acting [2,3] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.15 is incomplete, acting [3,2] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.17 is incomplete, acting [2,3] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.1a is incomplete, acting [3,2] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.1f is incomplete, acting [2,3] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.22 is incomplete, acting [2,3] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.25 is incomplete, acting [3,2] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.27 is incomplete, acting [2,3] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.29 is incomplete, acting [3,2] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.39 is incomplete, acting [3,2] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.3f is incomplete, acting [3,2] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.41 is incomplete, acting [3,2] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.43 is incomplete, acting [3,2] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.48 is incomplete, acting [2,3] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.4d is incomplete, acting [2,3] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.4f is incomplete, acting [2,3] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.55 is incomplete, acting [3,2] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.5b is incomplete, acting [3,2] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.5d is incomplete, acting [2,3] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.61 is incomplete, acting [3,2] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.66 is incomplete, acting [3,2] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.67 is incomplete, acting [3,2] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.6e is incomplete, acting [3,2] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.76 is incomplete, acting [3,2] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.78 is incomplete, acting [2,3] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.7b is incomplete, acting [3,2] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 7.7e is incomplete, acting [2,3] (reducing pool ceph-kvm-pool min_size from 2 may help; search ceph.com/docs for 'incomplete')
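The pool id is 7 in every line. To pull just the PG ids out of this output for later scripting, a simple awk filter works (assuming the output format shown above):
ceph health detail | awk '/is incomplete/ {print $2}'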
All of these PGs sit on osd.2 and osd.3. First, set flags so the cluster does not rebalance:
ceph osd set noout
ceph osd set nodown
ceph osd set norebalance
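As a quick sanity check (not part of the original procedure), the flags should now show up in the OSD map:
ceph osd dump | grep flags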
Take osd.2 and osd.3 down:
ceph osd down osd.2
ceph osd down osd.3
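Note that ceph osd down only marks the OSDs down in the cluster map; ceph-objectstore-tool needs exclusive access to the object store, so the daemons themselves must also be stopped (this is implied by the systemctl start step later on):
systemctl stop ceph-osd@2
systemctl stop ceph-osd@3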
List the PGs stored on osd.2:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2/ --op list-pgs
As expected, the PGs flagged incomplete above do appear in this listing.
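For example, filtering the listing by the pool id (7, per the health output above) shows them directly:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2/ --op list-pgs | grep '^7\.'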
The following script marks every PG on the downed OSDs as complete:
#!/bin/bash
# Collect the down OSDs on the current node (node1); see the note below
for i in $(ceph osd tree down 2>/dev/null | grep -w -A 4 node1 | grep -v node | awk '{print $4}' | sed 's/osd.//g')
do
    # Mark the OSD in, to prevent data migration
    ceph osd in osd.$i
    # Dump all PGs on this OSD into a pg.<id> file
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$i/ --op list-pgs > pg."$i" 2>/dev/null
    # Mark each of those PGs as complete
    for j in $(cat pg."$i")
    do
        ceph-objectstore-tool --pgid $j --op mark-complete --data-path /var/lib/ceph/osd/ceph-$i/ --type bluestore
    done
done
Note: in ceph osd tree down 2>/dev/null | grep -w -A 4 node1 | grep -v node, replace node1 and node to match your own environment, so that the pipeline returns the correct ceph-osd indexes.
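Before running the full script, it can help to dry-run just that extraction pipeline and confirm it prints the expected OSD ids (node1 here is the placeholder host name from the script):
for i in $(ceph osd tree down 2>/dev/null | grep -w -A 4 node1 | grep -v node | awk '{print $4}' | sed 's/osd.//g')
do
    echo "would process osd.$i"
done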
Start the OSDs:
systemctl enable ceph-osd@2
systemctl enable ceph-osd@3
systemctl start ceph-osd@2
systemctl start ceph-osd@3
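Once the OSDs are back up, peering and recovery can be followed with the standard status commands until the incomplete PGs clear:
ceph -s
ceph health detail | grep incomplete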
Clear the flags:
ceph osd unset noout
ceph osd unset nodown
ceph osd unset norebalance
If some PGs still report the error "not deep-scrubbed in time", deep-scrub them manually:
ceph pg deep-scrub 7.66
ceph pg deep-scrub 7.5b
ceph pg deep-scrub 7.15
ceph pg deep-scrub 7.1f
ceph pg deep-scrub 7.25
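If many PGs report this, they can be scrubbed in a loop instead of one by one; a sketch, assuming the health detail lines follow the usual "pg <id> not deep-scrubbed since ..." format:
ceph health detail | awk '/not deep-scrubbed since/ {print $2}' | while read -r pgid
do
    ceph pg deep-scrub "$pgid"
done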
After this, the cluster still sits in HEALTH_WARN because of the "91 daemons have recently crashed" warning.
List the crash records that have not yet been archived: ceph crash ls-new
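An individual record can be inspected before archiving; <crash-id> below is a placeholder for an id taken from the ls output:
ceph crash info <crash-id>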
Archive all crash records: ceph crash archive-all
The cluster then returns to HEALTH_OK.