RBD block storage is the most widely used and most stable of the three storage types Ceph provides. An RBD block device behaves like a disk and can be attached to a physical machine or a virtual machine; there are two common ways to attach it:
A block is an ordered sequence of bytes, typically 512 bytes in size. Block-based storage is the most common form of storage: hard disks, floppy disks, CD drives and the like are the simplest and fastest devices for storing data.
When a block device is provided to a physical machine, the kernel's RBD module is used. With the kernel driver, the Linux page cache can be used to improve performance.
When a block device is provided to a virtual machine (e.g. QEMU/KVM), it is usually exposed through libvirt calling the librbd library.
Before deploying, check the kernel version to confirm that it supports RBD; upgrading to kernel 4.5 or later is recommended:
[root@local-node-1 ~]# uname -r
4.4.174-1.el7.elrepo.x86_64
[root@local-node-1 ~]# modprobe rbd
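The version check above can be scripted. Below is a minimal sketch (the `kver_ge` helper is ours, not a standard tool) that compares the running kernel release string against a major.minor threshold using only POSIX shell parameter expansion:

```shell
# kver_ge RELEASE MAJ MIN -> exit 0 if RELEASE is at least MAJ.MIN
# Pure string/arithmetic parsing; handles strings like 4.4.174-1.el7.elrepo.x86_64
kver_ge() {
  maj=${1%%.*}                  # text before the first dot
  rest=${1#*.}                  # text after the first dot
  min=${rest%%.*}               # second dotted component
  [ "$maj" -gt "$2" ] || { [ "$maj" -eq "$2" ] && [ "$min" -ge "$3" ]; }
}

if kver_ge "$(uname -r)" 4 5; then
  echo "kernel OK for rbd"
else
  echo "consider upgrading the kernel or using the hammer CRUSH tunables"
fi
```

On the 4.4 kernel shown above this prints the upgrade suggestion, matching the note later in this article about setting the CRUSH tunables profile to hammer.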
[root@local-node-1 ~]# ceph -s
cluster:
id: 7bd25f8d-b76f-4ff9-89ec-186287bbeaa5
health: HEALTH_OK
services:
mon: 2 daemons, quorum local-node-2,local-node-3
mgr: ceph-mgr(active)
osd: 9 osds: 9 up, 9 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 9.2 GiB used, 81 GiB / 90 GiB avail
pgs:
[root@local-node-1 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.08817 root default
-3 0.02939 host local-node-1
0 hdd 0.00980 osd.0 up 1.00000 1.00000
1 hdd 0.00980 osd.1 up 1.00000 1.00000
2 hdd 0.00980 osd.2 up 1.00000 1.00000
-5 0.02939 host local-node-2
3 hdd 0.00980 osd.3 up 1.00000 1.00000
4 hdd 0.00980 osd.4 up 1.00000 1.00000
5 hdd 0.00980 osd.5 up 1.00000 1.00000
-7 0.02939 host local-node-3
6 hdd 0.00980 osd.6 up 1.00000 1.00000
7 hdd 0.00980 osd.7 up 1.00000 1.00000
8 hdd 0.00980 osd.8 up 1.00000 1.00000
Create a pool for RBD images and initialize it:
ceph osd pool create ceph_rbd 128
rbd pool init ceph_rbd
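The 128 passed to `ceph osd pool create` is the pool's PG (placement group) count. A common rule of thumb, assumed here, is (number of OSDs × 100) / replica count, rounded up to a power of two; a small sketch of that arithmetic (the helper names are ours):

```shell
# next_pow2 N -> smallest power of two >= N
next_pow2() {
  n=$1; p=1
  while [ "$p" -lt "$n" ]; do p=$((p * 2)); done
  echo "$p"
}

# pg_count OSDS REPLICAS -> suggested pg_num
pg_count() { next_pow2 $(( $1 * 100 / $2 )); }

pg_count 9 3   # 9 OSDs, 3 replicas -> 300 -> prints 512
```

For the 9-OSD cluster above this suggests 512; the 128 used in this walkthrough is simply a smaller power of two, which also works for a test cluster.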
Create a CephX user with the rbd profile for the pool:
ceph auth get-or-create client.{ID} mon 'profile rbd' osd 'profile {profile name} [pool={pool-name}][, profile ...]'
E.g.:
# ceph auth get-or-create client.docker mon 'profile rbd' osd 'profile rbd pool=ceph_rbd, profile rbd-read-only pool=images'
[client.docker]
key = AQDQkK1cpNAKJRAAnaw2ZYeFHsXrsTWX3QonkQ==
[root@local-node-1 ~]# cat /etc/ceph/ceph.client.docker.keyring
[client.docker]
key = AQDQkK1cpNAKJRAAnaw2ZYeFHsXrsTWX3QonkQ==
Create a block device image (the size is in megabytes):
rbd create --size {megabytes} {pool-name}/{image-name}
E.g.:
[root@local-node-1 ~]# rbd create --size 1024 ceph_rbd/docker_image
[root@local-node-1 ~]# rbd ls ceph_rbd
docker_image
[root@local-node-1 ~]# rbd trash ls ceph_rbd
[root@local-node-1 ~]# rbd info ceph_rbd/docker_image
rbd image 'docker_image':
size 1 GiB in 256 objects
order 22 (4 MiB objects)
id: 1bedc6b8b4567
block_name_prefix: rbd_data.1bedc6b8b4567
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
op_features:
flags:
create_timestamp: Wed Apr 10 14:52:48 2019
# Grow the image
[root@local-node-1 ~]# rbd resize --size 2048 ceph_rbd/docker_image
Resizing image: 100% complete...done.
# Shrink the image (requires --allow-shrink)
[root@local-node-1 ~]# rbd resize --size 1024 ceph_rbd/docker_image --allow-shrink
Resizing image: 100% complete...done.
Ceph block device images are thin-provisioned: creating an image and specifying its size does not consume physical storage until data is actually written. --size only sets an upper limit on the image's capacity.
To delete an image outright:
rbd rm ceph_rbd/docker_image
Alternatively, move it to the trash, from which it can still be restored:
# rbd trash mv ceph_rbd/docker_image
# rbd trash ls ceph_rbd
1beb76b8b4567 docker_image
rbd trash restore ceph_rbd/1beb76b8b4567
# The trash is now empty and the original image has been restored
# rbd ls ceph_rbd
docker_image
rbd trash restore ceph_rbd/1beb76b8b4567 --image docker # restore and rename the image to docker
# Permanently remove the image from the trash:
rbd trash rm ceph_rbd/1beb76b8b4567
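A small helper can turn the trash workflow above into a restore-by-name operation. The parsing function below is ours; it works on captured `rbd trash ls` output read from stdin, so it can be exercised without a cluster:

```shell
# trash_id_for NAME: read `rbd trash ls <pool>` output on stdin and
# print the id of the entry whose image name matches NAME
trash_id_for() { awk -v name="$1" '$2 == name { print $1 }'; }

# Offline example against the output shown above:
printf '1beb76b8b4567 docker_image\n' | trash_id_for docker_image   # -> 1beb76b8b4567

# With a live cluster this would compose as (untested sketch):
#   id=$(rbd trash ls ceph_rbd | trash_id_for docker_image)
#   rbd trash restore "ceph_rbd/$id"
```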
Note:
# On kernels older than 4.5, setting the CRUSH tunables profile to hammer is recommended; otherwise mapping the rbd device may fail
[root@local-node-1 ~]# ceph osd crush show-tunables
{
"choose_local_tries": 0,
"choose_local_fallback_tries": 0,
"choose_total_tries": 50,
"chooseleaf_descend_once": 1,
"chooseleaf_vary_r": 1,
"chooseleaf_stable": 0,
"straw_calc_version": 1,
"allowed_bucket_algs": 54,
"profile": "hammer",
"optimal_tunables": 0,
"legacy_tunables": 0,
"minimum_required_version": "hammer",
"require_feature_tunables": 1,
"require_feature_tunables2": 1,
"has_v2_rules": 0,
"require_feature_tunables3": 1,
"has_v3_rules": 0,
"has_v4_buckets": 1,
"require_feature_tunables5": 0,
"has_v5_rules": 0
}
# Set the tunables profile
[root@local-node-1 ~]# ceph osd crush tunables hammer
[root@local-node-1 ~]# rbd list ceph_rbd
docker_image
3. Map the block device on the client (the Ceph client packages must be installed):
[root@local-node-1 ~]# rbd device map ceph_rbd/docker_image --id admin
/dev/rbd0
==提示:==
If the mapping step fails with the error below, the kernel does not support some of the image's features and they must be disabled:
# rbd device map ceph_rbd/docker_image --id admin
rbd: sysfs write failed
RBD image feature set mismatch. Try disabling features unsupported by the kernel with "rbd feature disable".
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (6) No such device or address
Disable the unsupported features:
# rbd feature disable ceph_rbd/docker_image exclusive-lock, object-map, fast-diff, deep-flatten
To map as a specific user, pass that user's keyring or key file:
sudo rbd device map rbd/myimage --id admin --keyring /path/to/keyring
sudo rbd device map rbd/myimage --id admin --keyfile /path/to/file
E.g.:
#rbd device map ceph_rbd/docker_image --id docker --keyring /etc/ceph/ceph.client.docker.keyring
/dev/rbd0
[root@local-node-1 ~]# rbd device list
id pool image snap device
0 ceph_rbd docker_image - /dev/rbd0
[root@local-node-1 ~]# lsblk | grep rbd
rbd0 252:0 0 1G 0 disk
[root@local-node-1 ~]# mkfs.xfs /dev/rbd0
[root@local-node-1 ~]# mount /dev/rbd0 /mnt
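When scripting the map/mkfs/mount sequence it helps to resolve which /dev/rbdN a given image was mapped to. Below is a sketch (the `rbd_dev_for` helper is ours) that parses `rbd device list` output; it is shown here against captured output so it runs without a cluster:

```shell
# rbd_dev_for POOL IMAGE: read `rbd device list` output on stdin and
# print the device path for the matching pool/image pair
rbd_dev_for() {
  awk -v p="$1" -v i="$2" 'NR > 1 && $2 == p && $3 == i { print $5 }'
}

sample='id pool     image        snap device
0  ceph_rbd docker_image -    /dev/rbd0'

printf '%s\n' "$sample" | rbd_dev_for ceph_rbd docker_image   # -> /dev/rbd0

# Live usage would be (untested sketch):
#   dev=$(rbd device list | rbd_dev_for ceph_rbd docker_image)
#   mount "$dev" /mnt
```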
7. To detach the device, use the following commands:
[root@local-node-1 ~]# umount /mnt/
[root@local-node-1 ~]# rbd device unmap /dev/rbd0
After a mapped rbd device is unmapped, the data on the block device is normally not lost (although a forced reboot may corrupt it and make it unusable), and the image can be mapped again on another host.
RBD also supports features such as cloning, snapshots, and online resizing.
A snapshot is a ==read-only copy== of an image at a particular point in time. One of the advanced features of Ceph block devices is the ability to snapshot an image to preserve its history. Ceph also supports snapshot layering, which lets you clone images (such as VM images) quickly and easily. Snapshots are supported by the rbd command and by several higher-level interfaces, including QEMU, libvirt, OpenStack, and CloudStack.
If I/O is still in progress when a snapshot is taken, the snapshot may not capture the image's exact or latest state, and it may have to be cloned to a new, mountable image and repaired. We therefore recommend stopping I/O before taking a snapshot. If the image contains a filesystem, make sure the filesystem is in a consistent state, or run fsck on the mounted block device first. To quiesce I/O, use the fsfreeze command; for virtual machines, qemu-guest-agent can be used to freeze the filesystem automatically when a snapshot is taken.
# Freeze the filesystem before snapshotting
fsfreeze -f /mnt
rbd snap create rbd/foo@snapname
E.g.:
rbd snap create rbd/test@test-snap
# Thaw the filesystem afterwards
fsfreeze -u /mnt/
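The freeze/snapshot/thaw dance above can be wrapped in one function. Only the snapshot-name generator below is testable offline; the `consistent_snap` function is an untested sketch, and both names are ours:

```shell
# snap_name IMAGE PREFIX -> IMAGE@PREFIX-YYYYmmddHHMMSS
snap_name() { printf '%s@%s-%s\n' "$1" "$2" "$(date +%Y%m%d%H%M%S)"; }

# consistent_snap MOUNTPOINT IMAGE: freeze the fs, snapshot, always thaw
consistent_snap() {
  fsfreeze -f "$1" || return 1
  rbd snap create "$(snap_name "$2" auto)"
  rc=$?
  fsfreeze -u "$1"   # thaw even if the snapshot failed
  return $rc
}

snap_name rbd/test auto   # e.g. rbd/test@auto-20190416144927
```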
[root@local-node-1 ~]# rbd snap ls rbd/test
SNAPID NAME SIZE TIMESTAMP
4 test-snap 1 GiB Tue Apr 16 14:49:27 2019
5 test-snap-2 1 GiB Tue Apr 16 15:56:18 2019
rbd snap rollback rbd/test@test-snap
# Remounting the device after the rollback may fail because the filesystem is in an inconsistent state:
# mount /dev/rbd0 /mnt/
mount: wrong fs type, bad option, bad superblock on /dev/rbd0,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
# Repairing with xfs_repair then reports the following error:
xfs_repair /dev/rbd0
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed. Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair. If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
# Repair with the -L option:
xfs_repair -L /dev/rbd0
mount /dev/rbd0 /mnt/
# Delete a single snapshot:
rbd snap rm rbd/test@test-snap
# To delete all snapshots of an image:
rbd snap purge rbd/foo
Ceph can create copy-on-write clones from snapshots. Because a snapshot is read-only, when its data needs to be modified a copy-on-write clone of the snapshot can be used instead. (OpenStack uses this mechanism to create new virtual machines: an image is kept as a snapshot, and cloning that snapshot produces the disk for a new VM.)
Ceph only supports cloning format 2 images (those created with rbd create --image-format 2, which is the default in recent versions). The kernel client has supported cloned images since version 3.10.
The workflow is:
create a block device image --> create a snapshot --> protect the snapshot --> clone the snapshot
# Create a snapshot
rbd snap create rbd/test@test-snap
# List snapshots
# rbd snap list rbd/test
SNAPID NAME SIZE TIMESTAMP
10 test-snap 1 GiB Tue Apr 16 17:46:48 2019
# Protect the snapshot so it cannot be deleted
rbd snap protect rbd/test@test-snap
rbd clone {pool-name}/{parent-image}@{snap-name} {pool-name}/{child-image-name}
E.g.:
rbd clone rbd/test@test-snap rbd/test-new
List the newly created image:
# rbd ls
test
test-new
# Unprotect the snapshot (this fails while clones created from it still exist, unless they are flattened first):
rbd snap unprotect rbd/test@test-snap
# List the children of the snapshot:
# rbd children rbd/test@test-snap
rbd/test-new
[root@local-node-1 ~]# rbd --pool rbd --image test-new info
rbd image 'test-new':
size 1 GiB in 256 objects
order 22 (4 MiB objects)
id: ba9096b8b4567
block_name_prefix: rbd_data.ba9096b8b4567
format: 2
features: layering
op_features:
flags:
create_timestamp: Tue Apr 16 17:53:51 2019
parent: rbd/test@test-snap # the relationship to the parent snapshot is shown here
overlap: 1 GiB
Flattening copies the data from the parent snapshot into the clone, removing the dependency on the parent:
rbd flatten rbd/test-new
# Grow the image online while it is mapped and mounted:
# rbd resize ceph_rbd/docker_image --size 4096
Resizing image: 100% complete...done.
# rbd info ceph_rbd/docker_image
rbd image 'docker_image':
size 4 GiB in 1024 objects
order 22 (4 MiB objects)
id: 1bef96b8b4567
block_name_prefix: rbd_data.1bef96b8b4567
format: 2
features: layering
op_features:
flags:
create_timestamp: Wed Apr 10 15:50:21 2019
# xfs_growfs -d /mnt
meta-data=/dev/rbd0 isize=512 agcount=9, agsize=31744 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=0 spinodes=0
data = bsize=4096 blocks=262144, imaxpct=25
= sunit=1024 swidth=1024 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=8 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
data blocks changed from 262144 to 1048576
# lsblk | grep mnt
rbd0 252:0 0 4G 0 disk /mnt
# df -h | grep mnt
/dev/rbd0 4.0G 3.1G 998M 76% /mnt
# rbd resize --size 2048 ceph_rbd/docker_image --allow-shrink
Resizing image: 100% complete...done.
# lsblk |grep mnt
rbd0 252:0 0 2G 0 disk /mnt
# xfs_growfs -d /mnt/
meta-data=/dev/rbd0 isize=512 agcount=34, agsize=31744 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=0 spinodes=0
data = bsize=4096 blocks=1048576, imaxpct=25
= sunit=1024 swidth=1024 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=8 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
data size 524288 too small, old size is 1048576
# XFS cannot be shrunk: xfs_growfs only grows a filesystem, so after shrinking the image the filesystem cannot be reduced to match. Avoid shrinking an image that already holds a filesystem with data.
==If the RBD image is removed, the data stored on it will be lost.==
[root@local-node-3 mnt]# df -h | grep mnt
...
/dev/rbd0 2.0G 33M 2.0G 2% /mnt
[root@local-node-3 ~]# rbd device list
id pool image snap device
0 ceph_rbd docker_image - /dev/rbd0
[root@local-node-3 ~]# rbd device unmap /dev/rbd0
[root@local-node-3 ~]# rbd ls ceph_rbd
docker_image
[root@local-node-3 ~]# rbd info ceph_rbd/docker_image
rbd image 'docker_image':
size 2 GiB in 512 objects
order 22 (4 MiB objects)
id: 1bef96b8b4567
block_name_prefix: rbd_data.1bef96b8b4567
format: 2
features: layering
op_features:
flags:
create_timestamp: Wed Apr 10 15:50:21 2019
[root@local-node-3 ~]# rbd trash ls ceph_rbd
# Move the rbd image to the trash (it could also be deleted directly)
[root@local-node-3 ~]# rbd trash mv ceph_rbd/docker_image
[root@local-node-3 ~]# rbd trash ls ceph_rbd
1bef96b8b4567 docker_image
# Delete the image from the trash
[root@local-node-3 ~]# rbd trash rm ceph_rbd/1bef96b8b4567
Removing image: 100% complete...done.
[root@local-node-3 ~]# rbd trash ls ceph_rbd
[root@local-node-3 ~]# rbd ls ceph_rbd
[root@local-node-3 ~]# ceph osd lspools
7 ceph_rbd
# Deleting a pool requires giving its name twice plus --yes-i-really-really-mean-it, and the monitors must permit deletion (mon_allow_pool_delete = true)
[root@local-node-3 ~]# ceph osd pool rm ceph_rbd ceph_rbd --yes-i-really-really-mean-it
pool 'ceph_rbd' removed
[root@local-node-3 ~]# ceph -s
cluster:
id: 7bd25f8d-b76f-4ff9-89ec-186287bbeaa5
health: HEALTH_OK
services:
mon: 3 daemons, quorum local-node-1,local-node-2,local-node-3
mgr: ceph-mgr(active)
osd: 9 osds: 9 up, 9 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 9.3 GiB used, 81 GiB / 90 GiB avail
pgs:
[root@local-node-3 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.08817 root default
-3 0.02939 host local-node-1
0 hdd 0.00980 osd.0 up 1.00000 1.00000
1 hdd 0.00980 osd.1 up 1.00000 1.00000
2 hdd 0.00980 osd.2 up 1.00000 1.00000
-5 0.02939 host local-node-2
3 hdd 0.00980 osd.3 up 1.00000 1.00000
4 hdd 0.00980 osd.4 up 1.00000 1.00000
5 hdd 0.00980 osd.5 up 1.00000 1.00000
-7 0.02939 host local-node-3
6 hdd 0.00980 osd.6 up 1.00000 1.00000
7 hdd 0.00980 osd.7 up 1.00000 1.00000
8 hdd 0.00980 osd.8 up 1.00000 1.00000
Reposted from: https://blog.51cto.com/tryingstuff/2379837