This post records a manual exercise: simulating a BlueStore cluster failure in which the OSD data itself is not damaged, and the method used to recover the rbd image.
Because BlueStore no longer has a file system, you cannot browse each OSD's current directory for the object files the way you could with FileStore. On top of that, the cluster is unusable, so the objects in the pool cannot be fetched with the rados command, and all OSDs are offline.
First, set up the test data: create a pool and an image in it.
[root@test-1 home]# ceph osd pool create rbd 64 64
pool 'rbd' created
[root@test-1 home]# rbd create rbd/image01 --size 50G
[root@test-1 home]# rbd ls
image01
[root@test-1 home]#
Because the kernel version was too old to map the image, I could only write some data into it with librbd: the string "abc123" at the 10 MB offset of the image.
[root@test-1 withoutdu]# ./util
Created a cluster handle: ok.
Connected to the cluster: ok.
read buf str = abc123 // the data read back via librbd is exactly what was written; it will be used to verify the image after migration.
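The source of the util program is not shown here; a minimal equivalent sketch, assuming the Python rados/rbd bindings and the default /etc/ceph/ceph.conf (not the author's actual C program), looks like this:

import rados, rbd

# connect to the cluster and open the rbd pool
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')

# write "abc123" at the 10 MB offset, then read it back as the verification value
image = rbd.Image(ioctx, 'image01')
image.write(b'abc123', 10 * 1024 * 1024)
print('read buf str =', image.read(10 * 1024 * 1024, 6).decode())

image.close()
ioctx.close()
cluster.shutdown()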
Stop all services of the current cluster to simulate an unusable cluster (for example a corrupted mon DB) while the OSD data stays intact.
Then export all of the image's objects with the ceph-objectstore-tool utility.
First, list the objects that need to be exported:
[root@test-1 rbddir]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --type bluestore --op list
["2.31",{"oid":"rbd_data.37146b8b4567.0000000000000002","key":"","snapid":-2,"hash":3590010929,"max":0,"pool":2,"namespace":"","max":0}]
[root@test-1 rbddir]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1/ --type bluestore --op list
["2.1c",{"oid":"rbd_directory","key":"","snapid":-2,"hash":816417820,"max":0,"pool":2,"namespace":"","max":0}]
["2.4",{"oid":"rbd_header.37146b8b4567","key":"","snapid":-2,"hash":2261731972,"max":0,"pool":2,"namespace":"","max":0}]
["2.4",{"oid":"rbd_id.image01","key":"","snapid":-2,"hash":3086248900,"max":0,"pool":2,"namespace":"","max":0}]
["2.2",{"oid":"rbd_object_map.37146b8b4567","key":"","snapid":-2,"hash":3371410498,"max":0,"pool":2,"namespace":"","max":0}]
[root@test-1 rbddir]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2/ --type bluestore --op list
["2.3a",{"oid":"rbd_info","key":"","snapid":-2,"hash":2886620986,"max":0,"pool":2,"namespace":"","max":0}]
Then export every object from the OSD that holds it into a file. Make sure the data on that OSD was consistent before the crash; if you are not sure, export the same-named object from every OSD that carries a replica and compare the md5 sums, keeping a copy whose md5 matches the others (for the 3-replica case; see the sketch after the export listing below).
[root@test-1 rbddir]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --type bluestore rbd_data.37146b8b4567.0000000000000002 get-bytes /home/rbddir/rbd_data.37146b8b4567.0000000000000002
[root@test-1 rbddir]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1/ --type bluestore rbd_directory get-bytes /home/rbddir/rbd_directory
[root@test-1 rbddir]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1/ --type bluestore rbd_header.37146b8b4567 get-bytes /home/rbddir/rbd_header.37146b8b4567
[root@test-1 rbddir]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1/ --type bluestore rbd_id.image01 get-bytes /home/rbddir/rbd_id.image01
[root@test-1 rbddir]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1/ --type bluestore rbd_object_map.37146b8b4567 get-bytes /home/rbddir/rbd_object_map.37146b8b4567
[root@test-1 rbddir]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2/ --type bluestore rbd_info get-bytes /home/rbddir/rbd_info
[root@test-1 rbddir]# pwd
/home/rbddir
[root@test-1 rbddir]# ls
rbd_data.37146b8b4567.0000000000000002 rbd_directory rbd_header.37146b8b4567 rbd_id.image01 rbd_info rbd_object_map.37146b8b4567
[root@test-1 rbddir]#
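If you did export the same object from several OSDs in order to cross-check the replicas, the md5 comparison can be scripted. A small sketch, assuming hypothetical per-OSD directories osd-0/, osd-1/ and osd-2/ under /home/rbddir (these paths are not used above):

import hashlib, os

# the same data object exported once from each OSD that holds a replica
dirs = ['/home/rbddir/osd-0', '/home/rbddir/osd-1', '/home/rbddir/osd-2']
obj = 'rbd_data.37146b8b4567.0000000000000002'

digests = set()
for d in dirs:
    with open(os.path.join(d, obj), 'rb') as f:
        digests.add(hashlib.md5(f.read()).hexdigest())

# one distinct digest means all replicas agree; otherwise keep the majority copy
print('replicas agree' if len(digests) == 1 else 'replicas differ: %s' % sorted(digests))

With the exported files copied to a node of the new cluster (aaa-1 below), create an rbd pool there: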
[root@aaa-1 rbddir]# ceph osd pool create rbd 128 128
pool 'rbd' created
[root@aaa-1 rbddir]# ls
rbd_data.37146b8b4567.0000000000000002 rbd_directory rbd_header.37146b8b4567 rbd_id.image01 rbd_info rbd_object_map.37146b8b4567
One thing to note here: if the image is format 1, the exported objects can be imported into the new pool directly. If the image is format 2, you first have to create an image with the same name in the new pool and then substitute the new id everywhere the old one appears, i.e. replace the old rbd_id in the exported objects' names with the new image's id before importing them into the new pool (for example, the id in the middle of rbd_data.xxxxxx.0000000000000002 has to be replaced with the new one, and the object_map object likewise).
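The import commands themselves are not shown above. A rough sketch of the format-2 path, again using the Python rados/rbd bindings, is given below; treat it as an assumption-laden illustration rather than the exact procedure used in this test. It creates image01 in the new pool, looks up the new image's block_name_prefix, and re-imports only the rbd_data objects under that new prefix:

import glob, os, rados, rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')

# create a same-named format-2 image in the new pool, then read its new id
rbd.RBD().create(ioctx, 'image01', 50 * 1024 ** 3, old_format=False)
with rbd.Image(ioctx, 'image01') as image:
    new_prefix = image.stat()['block_name_prefix']   # e.g. 'rbd_data.<new_id>'

old_prefix = 'rbd_data.37146b8b4567'   # the old id, taken from the exported file names

# rename each exported data object to the new prefix and push it back with rados
for path in glob.glob('/home/rbddir/%s.*' % old_prefix):
    new_name = os.path.basename(path).replace(old_prefix, new_prefix)
    with open(path, 'rb') as f:
        ioctx.write_full(new_name, f.read())

ioctx.close()
cluster.shutdown()

The object_map object is not handled in this sketch; rebuilding it afterwards (for example with rbd object-map rebuild rbd/image01) is one option, but as noted at the end of this post, the format-2 details were not fully worked out here.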
[root@aaa-1 withoutdu]# rbd ls
image01
[root@aaa-1 withoutdu]# ./util
Created a cluster handle: ok.
Connected to the cluster: ok.
read buf str = abc123
This was only a simple check that the data stayed consistent; no large-scale verification was done, and I may follow up on that later.
As for why a format 2 image cannot be rebuilt by directly importing its objects, I have not fully figured that out either and will add an explanation once I do.
My current suspicion (pure guesswork) is that the features a format 2 image carries are not stored in the rbd objects themselves but in the OSDs' metadata.
If you spot anything wrong here, corrections and discussion are very welcome.