Ceph is a distributed file system that handles large numbers of small files and random reads/writes well, and it adds replication and fault tolerance while keeping POSIX compatibility. The Ceph client has already been merged into the Linux kernel, although it may not yet be suitable for production. Ceph also aims at unified storage, namely:
Object storage, similar to Swift. Here it is RADOS, the Reliable Autonomic Distributed Object Store. An OSD (Object Storage Daemon) process runs on every host; if the disks of a host have already been pooled with RAID, LVM, or a file system such as btrfs or xfs (ext4 is best avoided), one OSD per host is enough. By default three pools are created: data, metadata and rbd. In addition, a MON (Monitor) process also runs on the hosts.
File storage, similar to HDFS in Hadoop, except that HDFS is streaming storage (write once, read many). To use Ceph file storage, an MDS (Meta-Data Server) process must also run on the hosts. The MDS provides a POSIX file-system abstraction for Ceph clients on top of the object store.
Block storage, similar to Cinder.
So there are at least the following ways to access objects in Ceph:
The RADOS interface. RADOS is the foundation of Ceph; even Ceph file storage sits on top of RADOS. RADOS provides the librados library for accessing objects, with bindings for PHP, Java, Python and C/C++. A Swift- and Amazon-S3-compatible REST interface is also offered through the RADOS Gateway.
RBD (RADOS Block Device) and QEMU-RBD. As mentioned above, Ceph is in the kernel, so the kernel RBD driver can be used to access objects; it is also compatible with QEMU-RBD.
CephFS, the POSIX-compatible file system provided by the MDS described above. For production systems the three methods above are recommended rather than this one.
Where a data block actually lives needs metadata to describe it. HDFS keeps the metadata centrally on one machine (HA can be arranged with an active/standby pair). Swift is fully distributed: which hosts hold a data block is decided by a consistent-hashing algorithm, and the metadata, synchronized between hosts with rsync, is kept on every host, which is why it has to be split into the three rings (Accounts, Containers, Objects) to keep it small. In RADOS (where the per-host process that maintains data blocks is called an OSD) there are two levels of mapping: a hash first maps the key to a PG (Placement Group), and then the consistent-hashing function CRUSH maps the PG ID to the hosts (OSDs) that actually store the data. Swift's consistent hashing works on a flat host list, whereas CRUSH works on a hierarchical host list (shelves, racks, rows) and lets users place replicas in different racks through policies. The rest is similar to Swift: CRUSH produces the replica placement on the ring, the first replica is the primary and the others are secondaries; the primary receives writes from clients and serializes concurrent writes, forwards the data to the secondaries, and only acknowledges the write to the client after the secondaries have replied, so replication is strongly consistent, unlike the eventual consistency of systems such as AWS Dynamo. When machines are added or fail, as in Swift, the CRUSH consistent hashing keeps data movement minimal (i.e. the fewest possible blocks are migrated).
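To make the two-level mapping concrete, here is a rough illustrative sketch in Python. It is not Ceph's real code: the real CRUSH algorithm works on a weighted, hierarchical cluster map and is far more involved, and the PG count, OSD list and object name below are made up.

import hashlib

PG_NUM = 64                                    # placement groups in the pool (assumed)
OSDS = ['osd.0', 'osd.1', 'osd.2', 'osd.3']    # flat OSD list standing in for the CRUSH map
REPLICAS = 3

def object_to_pg(object_name):
    # level 1: a plain hash maps the object key onto a PG id
    h = int(hashlib.md5(object_name.encode()).hexdigest(), 16)
    return h % PG_NUM

def pg_to_osds(pg_id):
    # level 2: CRUSH maps the PG id to an ordered list of OSDs;
    # a trivial deterministic pick stands in for it here
    start = pg_id % len(OSDS)
    return [OSDS[(start + i) % len(OSDS)] for i in range(REPLICAS)]

pg = object_to_pg('volume-1234.chunk-0007')
print(pg, pg_to_osds(pg))                      # the first OSD acts as the primary replica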
Besides the storage nodes there is a small cluster of monitor nodes that watch over the state of the storage nodes. They reach agreement and keep their data redundant through the Paxos protocol, which plays a role similar to Zab, the leader-election protocol used in ZooKeeper. As long as a majority of the monitor hosts stay up, the service keeps running, which is why an odd number of monitors is usually chosen; for example, a monitor cluster of 5 hosts keeps serving with any 2 of them down.
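The majority rule is simple arithmetic; a tiny sketch of how many monitor failures a cluster of n monitors tolerates, matching the 5-monitor example above:

def tolerated_failures(n_monitors):
    # the monitor cluster keeps serving while a strict majority is still alive
    return (n_monitors - 1) // 2

print(tolerated_failures(5))  # 2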
As for consistency, in ZooKeeper the leader and followers are clever: followers track how far they lag behind the leader with an update number (a globally unique identifier called the zxid, the ZooKeeper Transaction ID), and a write commits once a majority of hosts has acknowledged it. Ceph does essentially the same thing under a different name: the global identifier is called the epoch number. The Monitor nodes therefore only record the epoch plus some global state (whether storage nodes are online, their addresses and ports, and so on), which is very lightweight; whenever a change on a storage node is detected, such as an OSD coming online or going down, the epoch is incremented to distinguish the new state from the old one. In short, the Monitors maintain the cluster state maps, the ClusterMap, which includes the monitor map, OSD map, placement group (PG) map, CRUSH map and epoch. For example, when a storage node dies, the monitors notice, bump the epoch, update the ClusterMap, and push it to the storage nodes in a gossip, peer-to-peer fashion (this p2p notification plus autonomous replication by the storage nodes differs from the master-slave model of HDFS); the storage nodes then rerun CRUSH and recreate the replicas lost on the dead machine. Thanks to the properties of consistent hashing, not many PGs are affected, so the data movement stays small.
By mapping Ceph onto existing technologies such as Swift and Hadoop, it becomes clear what Ceph is for. Next is a look at how OpenStack uses it and how it is actually installed and deployed.
Glance_store uses the rados and rbd modules:
import rados
import rbd
Glance_store adds a new file, /glance_store/_drivers/rbd.py, which implements three classes:
class StoreLocation(location.StoreLocation) # represents a Glance image's location in RBD, of the form "rbd://<fsid>/<pool>/<image>/<snapshot>"
class ImageIterator(object) # reads data chunks from an RBD image
class Store(driver.Store): # implements RBD as a Glance image backend
The main methods implemented by class Store(driver.Store):
(1) Getting image data: given a Glance image location, return the image's IO iterator
def get(self, location, offset=0, chunk_size=None, context=None):
return (ImageIterator(loc.pool, loc.image, loc.snapshot, self), self.get_size(location))
So this Glance image is actually an RBD snapshot in Ceph: its parent image is rbd://4387471a-ae2b-47c4-b67e-9004860d0fd0/images/71dc76da-774c-411f-a958-1b51816ec50f/ and the snapshot is named "snap".
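As an illustration of what ImageIterator does, here is a minimal sketch that reads an RBD image in fixed-size chunks with the rbd module. It is an assumed helper, not the actual class in glance_store/_drivers/rbd.py (which adds store configuration and error handling), and it assumes a python-rados/python-rbd version whose objects support the with statement:

import rados
import rbd

def iterate_rbd_image(pool, image_name, snapshot=None, chunk=8 * 1024 * 1024):
    # open the cluster, the pool that holds Glance images, and the image itself
    with rados.Rados(conffile='/etc/ceph/ceph.conf') as cluster:
        with cluster.open_ioctx(pool) as ioctx:
            with rbd.Image(ioctx, image_name, snapshot=snapshot) as image:
                size = image.size()
                offset = 0
                while offset < size:
                    data = image.read(offset, min(chunk, size - offset))
                    offset += len(data)
                    yield data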
Inspect the rbd image:
root@ceph1:~# rbd info images/71dc76da-774c-411f-a958-1b51816ec50f
rbd image ‘71dc76da-774c-411f-a958-1b51816ec50f’:
size 40162 kB in 5 objects
order 23 (8192 kB objects)
block_name_prefix: rbd_data.103d2246be0e
format: 2
features: layering
Glance automatically created a snapshot of it:
root@ceph1:~# rbd snap ls images/71dc76da-774c-411f-a958-1b51816ec50f
SNAPID NAME SIZE
2 snap 40162 kB
Inspect that snapshot:
root@ceph1:~# rbd info images/71dc76da-774c-411f-a958-1b51816ec50f@snap
rbd image ‘71dc76da-774c-411f-a958-1b51816ec50f’:
size 40162 kB in 5 objects
order 23 (8192 kB objects)
block_name_prefix: rbd_data.103d2246be0e
format: 2
features: layering
protected: True
Every Cinder volume created from this Glance image is a clone of that snapshot:
root@ceph1:~# rbd children images/71dc76da-774c-411f-a958-1b51816ec50f@snap
volumes/volume-65dbaf38-0b9d-4654-bba4-53f12cc906e3
volumes/volume-6868a043-1412-4f6c-917f-bbffb1a8d21a
If you now try to delete the image, it fails:
The image cannot be deleted because it is in use through the backend store outside of Glance.
This is because the snapshot of the image's backing rbd image is still in use.
2. Integrating Cinder with Ceph RBD
The point of integrating the OpenStack Cinder component with Ceph RBD is to store Cinder volumes in Ceph RBD. With Ceph RBD as the Cinder backend, no separate cinder-volume storage node is needed.
2.1 Configuration options
Option, meaning, default:
rbd_pool - name of the Ceph pool that holds the rbd volumes. Default: rbd
rbd_user - ID of the user used to access RBD; only relevant when cephx authentication is enabled. Default: none
rbd_ceph_conf - full path to the Ceph configuration file. Default: '' (use librados' default Ceph configuration file)
rbd_secret_uuid - the rbd secret uuid
rbd_flatten_volume_from_snapshot - an RBD snapshot only makes a quick copy of the metadata, not of the data, so a volume created from a snapshot normally keeps depending on it; with this option enabled, the snapshot's data is copied while creating the new volume, producing a volume that does not depend on the source snapshot. Default: false
rbd_max_clone_depth - maximum clone depth before a volume gets flattened; 0 disables cloning. For a reason similar to the previous option: the RBD backing of several Cinder APIs (creating a volume from a snapshot, cloning a volume) uses rbd clone, but RBD currently handles I/O on long chains of dependent clones poorly and deeply nested clones suffer serious performance problems, so once this threshold is exceeded the new volume is automatically flattened. Default: 5
rbd_store_chunk_size - each RBD volume is actually made of many objects; this sets the object size (in MB) and hence the object count. Default: 4
rados_connect_timeout - timeout, in seconds, for connecting to the Ceph cluster; a negative value means use the librados default. Default: -1
Two things are also apparent from this:
(1) Cinder does not support per-volume striping parameters; it only uses the shared rbd_store_chunk_size option to derive the order.
(2) Cinder does not support setting the cache mode used when a volume is attached to a guest.
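The link between rbd_store_chunk_size and the order shown by rbd info is just a base-2 logarithm; a small illustrative sketch of the conversion the driver has to perform:

import math

def chunk_size_to_order(chunk_size_mb):
    # the chunk size is given in MB; RBD wants the object size expressed as 2^order bytes
    return int(math.log(chunk_size_mb * 1024 * 1024, 2))

print(chunk_size_to_order(4))   # 22 -> 4096 kB objects
print(chunk_size_to_order(8))   # 23 -> 8192 kB objects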
2.2 Code
Cinder uses the same rbd Python modules introduced earlier:
import rados
import rbd
It implements the following main interfaces.
2.2.1 Connecting to RBD
(1) Initializing the connection to RBD
def initialize_connection(self, volume, connector):
    hosts, ports = self._get_mon_addrs() # runs args = ['ceph', 'mon', 'dump', '--format=json'] to fetch the monmap, then extracts the hosts and ports
    data = {
        'driver_volume_type': 'rbd',
        'data': {
            'name': '%s/%s' % (self.configuration.rbd_pool, volume['name']),
            'hosts': hosts,
            'ports': ports,
            'auth_enabled': (self.configuration.rbd_user is not None),
            'auth_username': self.configuration.rbd_user,
            'secret_type': 'ceph',
            'secret_uuid': self.configuration.rbd_secret_uuid, }
    }
(2) Connecting to Ceph RADOS
client = self.rados.Rados(rados_id=self.configuration.rbd_user, conffile=self.configuration.rbd_ceph_conf)
client.connect(timeout= self.configuration.rados_connect_timeout)
ioctx = client.open_ioctx(pool)
(3) Disconnecting
ioctx.close()
client.shutdown()
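The connect/open_ioctx/close/shutdown pairs above are easy to leak on error paths; here is a hedged sketch of wrapping them in a context manager. This is an assumed helper for illustration only; the driver's own RADOSClient class plays a similar role:

import contextlib
import rados

@contextlib.contextmanager
def rados_ioctx(pool, conf='/etc/ceph/ceph.conf', user=None, timeout=-1):
    client = rados.Rados(rados_id=user, conffile=conf)
    try:
        if timeout >= 0:
            client.connect(timeout=timeout)
        else:
            client.connect()
        ioctx = client.open_ioctx(pool)
        try:
            yield client, ioctx
        finally:
            ioctx.close()
    finally:
        client.shutdown()

# usage: with rados_ioctx('volumes', user='cinder') as (client, ioctx): ...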
2.2.2 Creating a volume
Cinder API: create_volume
It calls RBD's create method to create the RBD image:
with RADOSClient(self) as client:
    self.rbd.RBD().create(client.ioctx,
                          encodeutils.safe_encode(volume['name']),
                          size,
                          order,
                          old_format=old_format,
                          features=features)
2.2.3 Cloning a volume
# create a cloned volume
def create_cloned_volume(self, volume, src_vref):
    if CONF.rbd_max_clone_depth <= 0: # if rbd_max_clone_depth is not positive (cloning disabled), do a full rbd image copy instead
        vol.copy(vol.ioctx, dest_name)
    depth = self._get_clone_depth(client, src_name) # check the clone depth of the image behind the source volume; once it reaches CONF.rbd_max_clone_depth a flatten is needed
    src_volume = self.rbd.Image(client.ioctx, src_name) # open the rbd image backing the source volume
    # if a flatten is needed:
    _pool, parent, snap = self._get_clone_info(src_volume, src_name) # get the parent and its snapshot
    src_volume.flatten() # copy the parent's data into this clone
    parent_volume = self.rbd.Image(client.ioctx, parent) # open the parent image
    parent_volume.unprotect_snap(snap) # unprotect the snapshot
    parent_volume.remove_snap(snap) # delete the snapshot
    src_volume.create_snap(clone_snap) # create a new snapshot
    src_volume.protect_snap(clone_snap) # protect the snapshot
    self.rbd.RBD().clone(client.ioctx, src_name, clone_snap, client.ioctx, dest_name, features=client.features) # clone from the snapshot
    self._resize(volume) # resize if the clone's size differs from the source volume's
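To see the same primitives outside the driver, here is a minimal standalone sketch of the snapshot, protect, clone and flatten sequence using the plain rbd Python API. The pool, image and snapshot names are made up, and it assumes client.cinder credentials plus a python-rados/python-rbd version with context-manager support:

import rados
import rbd

with rados.Rados(conffile='/etc/ceph/ceph.conf', rados_id='cinder') as cluster:
    with cluster.open_ioctx('volumes') as ioctx:
        src, snap, dst = 'volume-src', 'clone_snap_demo', 'volume-dst'
        with rbd.Image(ioctx, src) as image:
            image.create_snap(snap)      # point-in-time snapshot, no data copied
            image.protect_snap(snap)     # a snapshot must be protected before cloning
        rbd.RBD().clone(ioctx, src, snap, ioctx, dst,
                        features=rbd.RBD_FEATURE_LAYERING)   # copy-on-write clone
        with rbd.Image(ioctx, dst) as clone:
            clone.flatten()              # copy the parent data so the clone no longer depends on it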
3. Integrating Nova with Ceph
Normally Nova keeps instance image files on local disk or on Cinder volumes. To integrate with Ceph, new code was added to Nova so that these image files can be kept in Ceph instead.
3.1 Configuration options for the Nova-Ceph integration
Option, meaning, default:
images_type - one of the following values:
raw: implemented by class Raw(Image); stores raw image files in the directory given by CONF.instances_path
qcow2: implemented by class Qcow2(Image); creates and stores qcow2 image files (['qemu-img', 'create', '-f', 'qcow2']) in the directory given by CONF.instances_path
lvm: implemented by class Lvm(Image); creates logical volumes for the VM images in the LVM volume group given by CONF.libvirt.images_volume_group
rbd: implemented by class Rbd(Image)
default: fall back to the use_cow_images option. Default: default
images_rbd_pool - the RBD pool that stores the VM image files. Default: rbd
images_rbd_ceph_conf - full path to the Ceph configuration file. Default: ''
hw_disk_discard - whether to use discard mode; requires libvirt >= 1.0.6, QEMU >= 1.5 for raw images and QEMU >= 1.6 for qcow2 images.
"unmap": discard requests ("trim" or "unmap") are passed to the filesystem.
"ignore": discard requests ("trim" or "unmap") are ignored and not passed to the filesystem. Default: none
rbd_user - the rbd user ID
rbd_secret_uuid - the rbd secret UUID
About discard mode (see the referenced article for details):
Discard, called trim on SSDs, is the mechanism for reclaiming unused blocks on a disk. RBD images are sparse (thin provisioned) by default, which means physical space is only consumed when data is actually written. Using discard inside an OpenStack guest requires two things:
(1) The guest uses virtio-scsi as its storage interface rather than the traditional virtio-blk, because virtio-scsi adds:
device pass-through to directly expose physical storage devices to guests
better performance and support for true SCSI devices
common and standard device naming identical to the physical world, which makes virtualising physical applications easier
better scalability of the storage, where virtual machines can attach more devices (more LUNs etc.)
To make the guest use this interface, set these properties on the Glance image:
$ glance image-update --property hw_scsi_model=virtio-scsi --property hw_disk_bus=scsi
(2) Configure hw_disk_discard = unmap in nova.conf.
Note that Cinder does not support discard yet.
3.2 RBD image operations implemented in Nova
Nova adds a new class, class Rbd(Image), in nova/virt/libvirt/imagebackend.py to keep instance images in RBD. Its main methods include:
(1)image create API
def create_image(self, prepare_template, base, size, *args, **kwargs):
The rbd import command:
import [--image-format format-id] [--order bits] [--stripe-unit size-in-B/K/M --stripe-count num] [--image-feature feature-name]… [--image-shared] src-path [image-spec]
Creates a new image and imports its data from path (use - for stdin). The import operation will try to create sparse rbd images if possible. For import from stdin, the sparsification unit is the data block size of the destination image (1 << order).
The --stripe-unit and --stripe-count arguments are optional, but must be used together.
After the instance is created successfully, look at the image that appeared in RBD:
root@ceph1:~# rbd ls vms
74cbdb41-3789-4eae-b22e-5085de8caba8_disk.local
root@ceph1:~# rbd info vms/74cbdb41-3789-4eae-b22e-5085de8caba8_disk.local
rbd image ‘74cbdb41-3789-4eae-b22e-5085de8caba8_disk.local’:
size 1024 MB in 256 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.11552ae8944a
format: 2
features: layering
Looking at the instance's libvirt XML definition, the image files for its system disk, ephemeral disk and swap disk are all in RBD, and a specific cache mode (set by the CONF.libvirt.disk_cachemodes option) and discard mode (set by the CONF.libvirt.hw_disk_discard option) can be used:
The libvirt.disk_cachemodes option sets the cache mode for the image files; its value has the form "A=B", where:
A can be 'file', 'block', 'network' or 'mount'. 'file' applies to file-backed disks (such as qcow2 image files), 'block' to block-device disks (such as a Cinder volume or LVM), and 'network' to disks attached over the network (such as Ceph).
B can be none, writethrough, writeback, directsync or unsafe. For writethrough and writeback see "Understanding OpenStack + Ceph (2): Ceph's physical and logical structure"; look the others up as needed.
For example: disk_cachemodes="network=writeback", disk_cachemodes="block=writeback".
The instance XML definition above was generated with Nova configured with disk_cachemodes="network=writeback" and hw_disk_discard = unmap.
(2)image clone API
def clone(self, context, image_id_or_uri):

def _create_image(self, context, instance,
                  disk_mapping, suffix='',
                  disk_images=None, network_info=None,
                  block_device_info=None, files=None,
                  admin_pass=None, inject_files=True,
                  fallback_from_host=None):
    if not booted_from_volume: # if the instance is not booted from a volume
        root_fname = imagecache.get_cache_fname(disk_images, 'image_id')
        size = instance.root_gb * units.Gi
        backend = image('disk')
        if backend.SUPPORTS_CLONE: # if the image backend supports clone (of the current backends, only RBD does)
            def clone_fallback_to_fetch(*args, **kwargs):
                backend.clone(context, disk_images['image_id']) # call the backend's clone method to clone the image directly
            fetch_func = clone_fallback_to_fetch
        else:
            fetch_func = libvirt_utils.fetch_image # otherwise fall back to the usual download-and-import path
        self._try_fetch_image_cache(backend, fetch_func, context, root_fname, disk_images['image_id'], instance, size, fallback_from_host)
So when Nova's backend is Ceph, the Nova driver calls the RBD image backend and the image copy happens entirely inside the Ceph storage layer (no significant Nova overhead, and no need to download the image to the hypervisor and upload it back into Ceph), which greatly speeds up instance creation. The prerequisite is that the image is also stored in Ceph and is in raw format; otherwise the clone step fails. The details:
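What the Nova RBD backend effectively does at this point can be sketched directly with the rbd module. The pool and image names are taken from the listings below, but this is only an illustration; Nova's real Rbd backend goes through its own storage helpers:

import rados
import rbd

GLANCE_POOL, VMS_POOL = 'images', 'vms'
image_id = '0a64fa67-3e34-42e7-b7b0-423c11850e18'       # raw Glance image
vm_disk = '982b8eac-6bcc-4a21-bd04-b67e26188be0_disk'   # instance system disk to create

with rados.Rados(conffile='/etc/ceph/ceph.conf', rados_id='cinder') as cluster:
    with cluster.open_ioctx(GLANCE_POOL) as images_ioctx:
        with cluster.open_ioctx(VMS_POOL) as vms_ioctx:
            # the system disk becomes a copy-on-write clone of the image's 'snap'
            # snapshot, created entirely inside Ceph: nothing is downloaded or re-uploaded
            rbd.RBD().clone(images_ioctx, image_id, 'snap',
                            vms_ioctx, vm_disk,
                            features=rbd.RBD_FEATURE_LAYERING)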
(1) This is the rbd image behind the Glance image:
root@ceph1:~# rbd info images/0a64fa67-3e34-42e7-b7b0-423c11850e18
rbd image ‘0a64fa67-3e34-42e7-b7b0-423c11850e18’:
size 564 MB in 71 objects
order 23 (8192 kB objects)
block_name_prefix: rbd_data.16d21e1d755b
format: 2
features: layering
(2) Create the first instance from that image:
root@ceph1:~# rbd info vms/982b8eac-6bcc-4a21-bd04-b67e26188be0_disk
rbd image ‘982b8eac-6bcc-4a21-bd04-b67e26188be0_disk’:
size 3072 MB in 384 objects
order 23 (8192 kB objects)
block_name_prefix: rbd_data.130a36a6b435
format: 2
features: layering
parent: images/0a64fa67-3e34-42e7-b7b0-423c11850e18@snap
overlap: 564 MB
root@ceph1:~# rbd info vms/982b8eac-6bcc-4a21-bd04-b67e26188be0_disk.local
rbd image ‘982b8eac-6bcc-4a21-bd04-b67e26188be0_disk.local’:
size 2048 MB in 512 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.a69b2ae8944a
format: 2
features: layering
root@ceph1:~# rbd info vms/982b8eac-6bcc-4a21-bd04-b67e26188be0_disk.swap
rbd image ‘982b8eac-6bcc-4a21-bd04-b67e26188be0_disk.swap’:
size 102400 kB in 25 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.a69e74b0dc51
format: 2
features: layering
(3) Create a second instance:
root@ceph1:~# rbd info vms/a9670d9a-8aa7-49ba-baf5-9d7a450172f3_disk
rbd image ‘a9670d9a-8aa7-49ba-baf5-9d7a450172f3_disk’:
size 3072 MB in 384 objects
order 23 (8192 kB objects)
block_name_prefix: rbd_data.13611f6abac6
format: 2
features: layering
parent: images/0a64fa67-3e34-42e7-b7b0-423c11850e18@snap
overlap: 564 MB
(4) The RBD image behind the Glance image now has two children, the system disks of the two instances above:
root@ceph1:~# rbd children images/0a64fa67-3e34-42e7-b7b0-423c11850e18@snap
vms/982b8eac-6bcc-4a21-bd04-b67e26188be0_disk
vms/a9670d9a-8aa7-49ba-baf5-9d7a450172f3_disk
4. Other integration points
Besides the Cinder, Nova and Glance integrations with Ceph RBD described above, OpenStack and Ceph have other integration points:
(1) Using Ceph instead of Swift as the object store (there are many articles online comparing Ceph and Swift)
(2) CephFS as a Manila backend
(3) Integrating Keystone with the Ceph Object Gateway (see the referenced articles)
--------------------------------------------------------------------------------
Technology stack
One use case for Ceph is to provide cloud storage together with OpenStack; the call stack from OpenStack down to Ceph has the following structure:
Ceph-related configuration for the integration
Create the pools
Install the Ceph client packages
Configure the CentOS 7 Ceph yum repository
On the glance-api (controller) node:
yum install python-rbd -y
On the nova-compute and cinder-volume (compute) nodes:
yum install ceph-common -y
Set up Ceph client authentication for OpenStack
On the Ceph storage cluster:
[root@ceph ~]# ssh controller sudo tee /etc/ceph/ceph.conf < /etc/ceph/ceph.conf
[root@ceph ~]# ssh compute sudo tee /etc/ceph/ceph.conf < /etc/ceph/ceph.conf
If cephx authentication is enabled, create new users for Nova/Cinder and Glance as follows:
ceph auth get-or-create client.cinder mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rx pool=images'
ceph auth get-or-create client.glance mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=images'
Add keyrings for client.cinder and client.glance as follows:
ceph auth get-or-create client.glance | ssh controller sudo tee /etc/ceph/ceph.client.glance.keyring
ssh controller sudo chown glance:glance /etc/ceph/ceph.client.glance.keyring
ceph auth get-or-create client.cinder | ssh compute sudo tee /etc/ceph/ceph.client.cinder.keyring
ssh compute sudo chown cinder:cinder /etc/ceph/ceph.client.cinder.keyring
Create a temporary key for the nova-compute node:
ceph auth get-key client.cinder | ssh {your-compute-node} tee client.cinder.key
Here that is:
ceph auth get-key client.cinder | ssh compute tee client.cinder.key
On every compute node (only one in this example): define a new secret for libvirt on the compute node
uuidgen
536f43c1-d367-45e0-ae64-72d987417c91
cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
  <uuid>536f43c1-d367-45e0-ae64-72d987417c91</uuid>
  <usage type='ceph'>
    <name>client.cinder secret</name>
  </usage>
</secret>
EOF
virsh secret-define --file secret.xml
The key after --base64 below is the content of client.cinder.key in /root on the compute node, the temporary key file created for the compute node earlier.
virsh secret-set-value --secret 536f43c1-d367-45e0-ae64-72d987417c91 --base64 AQCliYVYCAzsEhAAMSeU34p3XBLVcvc4r46SyA==
[root@compute ~]# rm -f client.cinder.key secret.xml
OpenStack configuration
On the controller node:
vim /etc/glance/glance-api.conf
[DEFAULT]
…
default_store = rbd
show_image_direct_url = True
show_multiple_locations = True
…
[glance_store]
stores = rbd
default_store = rbd
rbd_store_pool = images
rbd_store_user = glance
rbd_store_ceph_conf = /etc/ceph/ceph.conf
rbd_store_chunk_size = 8
Disable Glance cache management by removing cachemanagement from the deployment flavor:
[paste_deploy]
flavor = keystone
On the compute node:
vim /etc/cinder/cinder.conf
[DEFAULT]
(keep the existing options and add the following)
enabled_backends = ceph
#glance_api_version = 2
…
[ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_flatten_volume_from_snapshot = false
rbd_max_clone_depth = 5
rbd_store_chunk_size = 4
rados_connect_timeout = -1
glance_api_version = 2
rbd_user = cinder
volume_backend_name = ceph
rbd_secret_uuid =536f43c1-d367-45e0-ae64-72d987417c91
Note that the uuid differs on every compute node; fill in the real value. This example has a single compute node.
Note: when multiple cinder backends are configured, glance_api_version = 2 must be added to [DEFAULT]; here it is commented out.
On every compute node, edit /etc/nova/nova.conf:
vim /etc/nova/nova.conf
[libvirt]
virt_type = qemu
hw_disk_discard = unmap
images_type = rbd
images_rbd_pool = vms
images_rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
rbd_secret_uuid = 536f43c1-d367-45e0-ae64-72d987417c91
disk_cachemodes="network=writeback"
libvirt_inject_password = false
libvirt_inject_key = false
libvirt_inject_partition = -2
live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER, VIR_MIGRATE_LIVE, VIR_MIGRATE_TUNNELLED
Restart the OpenStack services
Controller node:
systemctl restart openstack-glance-api.service
Compute node:
systemctl restart openstack-nova-compute.service openstack-cinder-volume.service
Configuration files
1. nova
[root@controller nova]# cat nova.conf
[DEFAULT]
enabled_apis = osapi_compute,metadata
rpc_backend = rabbit
auth_strategy = keystone
my_ip = 192.168.8.100
use_neutron = True
firewall_driver = nova.virt.firewall.NoopFirewallDriver
[api_database]
connection = mysql+pymysql://nova:Changeme_123@controller/nova_api
[barbican]
[cache]
[cells]
[cinder]
os_region_name = RegionOne
[conductor]
[cors]
[cors.subdomain]
[database]
connection = mysql+pymysql://nova:Changeme_123@controller/nova
[ephemeral_storage_encryption]
[glance]
api_servers = http://controller:9292
[guestfs]
[hyperv]
[image_file_url]
[ironic]
[keymgr]
[keystone_authtoken]
auth_uri = http://controller:5000
auth_url = http://controller:35357
memcached_servers = controller:11211
auth_type = password
project_domain_name = default
user_domain_name = default
project_name = service
username = nova
password = Changeme_123
[libvirt]
virt_type = qemu
hw_disk_discard = unmap
images_type = rbd
images_rbd_pool = nova
images_rbd_ceph_conf = /etc/cinder/ceph.conf
rbd_user = cinder
rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
disk_cachemodes="network=writeback"
libvirt_inject_password = false
libvirt_inject_key = false
libvirt_inject_partition = -2
live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER, VIR_MIGRATE_LIVE, VIR_MIGRATE_TUNNELLED
[matchmaker_redis]
[metrics]
[neutron]
url = http://controller:9696
auth_url = http://controller:35357
auth_type = password
project_domain_name = default
user_domain_name = default
region_name = RegionOne
project_name = service
username = neutron
password = Changeme_123
service_metadata_proxy = True
metadata_proxy_shared_secret = Changeme_123
[osapi_v21]
[oslo_concurrency]
lock_path = /var/lib/nova/tmp
[oslo_messaging_amqp]
[oslo_messaging_notifications]
[oslo_messaging_rabbit]
rabbit_host = controller
rabbit_userid = openstack
rabbit_password = Changeme_123
[oslo_middleware]
[oslo_policy]
[rdp]
[serial_console]
[spice]
[ssl]
[trusted_computing]
[upgrade_levels]
[vmware]
[vnc]
vncserver_listen = 0.0.0.0
vncserver_proxyclient_address = 192.168.8.100
enabled = True
novncproxy_base_url = http://192.168.8.100:6080/vnc_auto.html
[workarounds]
[xenserver]
cinder
[root@controller nova]# cat /etc/cinder/cinder.conf
[DEFAULT]
rpc_backend = rabbit
auth_strategy = keystone
my_ip = 192.168.8.100
glance_host = controller
enabled_backends = lvm,ceph
glance_api_servers = http://controller:9292
[BACKEND]
[BRCD_FABRIC_EXAMPLE]
[CISCO_FABRIC_EXAMPLE]
[COORDINATION]
[FC-ZONE-MANAGER]
[KEYMGR]
[cors]
[cors.subdomain]
[database]
connection = mysql+pymysql://cinder:Changeme_123@controller/cinder
[keystone_authtoken]
auth_uri = http://controller:5000
auth_url = http://controller:35357
memcached_servers = controller:11211
auth_type = password
project_domain_name = default
user_domain_name = default
project_name = service
username = cinder
password = Changeme_123
[matchmaker_redis]
[oslo_concurrency]
lock_path = /var/lib/cinder/tmp
[oslo_messaging_amqp]
[oslo_messaging_notifications]
[oslo_messaging_rabbit]
rabbit_host = controller
rabbit_userid = openstack
rabbit_password = Changeme_123
[oslo_middleware]
[oslo_policy]
[oslo_reports]
[oslo_versionedobjects]
[ssl]
[lvm]
volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
volume_group = cinder-volumes
iscsi_protocol = iscsi
iscsi_helper = lioadm
[ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = cinder
rbd_ceph_conf = /etc/cinder/ceph.conf
rbd_flatten_volume_from_snapshot = false
rbd_max_clone_depth = 5
rbd_store_chunk_size = 4
rados_connect_timeout = -1
glance_api_version = 2
rbd_user = cinder
rbd_secret_uuid =457eb676-33da-42ec-9a8c-9293d545c337
volume_backend_name = ceph
glance
[root@controller nova]# cat /etc/glance/glance-api.conf
[DEFAULT]
#default_store = rbd
show_image_direct_url = True
#show_multiple_locations = True
[cors]
[cors.subdomain]
[database]
connection = mysql+pymysql://glance:Changeme_123@controller/glance
[glance_store]
stores = rbd
default_store = rbd
rbd_store_pool = glance
rbd_store_user = glance
rbd_store_ceph_conf = /etc/glance/ceph.conf
rbd_store_chunk_size = 8
[image_format]
[keystone_authtoken]
auth_uri = http://controller:5000
auth_url = http://controller:35357
memcached_servers = controller:11211
auth_type = password
project_domain_name = default
user_domain_name = default
username = glance
password = Changeme_123
project_name = service
[matchmaker_redis]
[oslo_concurrency]
[oslo_messaging_amqp]
[oslo_messaging_notifications]
[oslo_messaging_rabbit]
[oslo_policy]
[paste_deploy]
flavor = keystone
[profiler]
[store_type_location_strategy]
[task]
[taskflow_executor]
ceph
[root@controller nova]# cat /etc/cinder/ceph.conf
[global]
heartbeat interval = 5
osd pool default size = 3
osd heartbeat grace = 10
#keyring = /etc/ceph/keyring.admin
mon osd down out interval = 90
fsid = 5e8080b0-cc54-11e6-b346-000c29976397
osd heartbeat interval = 10
max open files = 131072
auth supported = cephx
[mon]
mon osd full ratio = .90
mon data = /var/lib/ceph/mon/mon$id
[osd]
filestore xattr use omap = true
#keyring = /etc/ceph/keyring.$name
osd mkfs type = xfs
osd data = /var/lib/ceph/osd/osd$id
osd heartbeat interval = 10
osd heartbeat grace = 10
osd mkfs options xfs = -f
osd journal size = 0
[osd.0]
osd journal = /dev/sdb1
devs = /dev/sdb2
host = cloud-node1
cluster addr = 192.168.8.102
public addr = 192.168.8.102
[osd.1]
osd journal = /dev/sdb1
devs = /dev/sdb2
host = cloud-node2
cluster addr = 192.168.8.103
public addr = 192.168.8.103
[osd.2]
osd journal = /dev/sdb1
devs = /dev/sdb2
host = cloud-node3
cluster addr = 192.168.8.104
public addr = 192.168.8.104
[client.cinder]
keyring=/etc/ceph/ceph.client.cinder.keyring
How to install Ceph and hook it up to OpenStack
Installing Ceph
Preparation:
Disable SELinux
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
setenforce 0
Open the Ceph ports
Install the EPEL repository
Install ntp and synchronize the clocks
Install Ceph
Install the Ceph repository
Install the ceph-deploy tool
Create a working directory for the Ceph installation
Create the Monitor nodes
#ceph-deploy new mon1 mon2 mon3
Install the Ceph packages on every node (Monitor nodes and OSD nodes)
Initialize the Monitor nodes
Set up one OSD
Partition the disk:
parted /dev/sdb
mklabel gpt
mkpart primary 0% 50GB
mkpart primary xfs 50GB 100%
Format the data partition
mkfs.xfs /dev/sdb2
Create the OSD
ceph-deploy osd create osd1:/dev/sdb2:/dev/sdb1
Activate the OSD
ceph-deploy osd activate osd1:/dev/sdb2:/dev/sdb1
Or create several at once
ceph-deploy osd create controller2:/dev/sdb2:/dev/sdb1 controller2:/dev/sdd2:/dev/sdd1 controller2:/dev/sde2:/dev/sde1
ceph-deploy osd activate controller2:/dev/sdb2:/dev/sdb1 controller2:/dev/sdd2:/dev/sdd1 controller2:/dev/sde2:/dev/sde1
Notes:
Steps 1-6 only need to be done on one machine, the installation (admin) node
Only these two steps must be done on every node:
Format the data partition
mkfs.xfs /dev/sdb2
Create the OSD
ceph-deploy osd create osd1:/dev/sdb2:/dev/sdb1
Mechanical disks
The first partition of the first disk serves as the journal partition
SSDs
Several partitions of one SSD serve as journal storage for several OSDs
Basic configuration and checks
[root@controller3 ceph]# ceph health
[root@controller3 ceph]# ceph -w
[root@controller2 ~(keystone_admin)]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 2.7T 0 disk
├─sda1 8:1 0 1M 0 part
├─sda2 8:2 0 500M 0 part /boot
└─sda3 8:3 0 2.7T 0 part
├─centos-swap 253:0 0 7.9G 0 lvm [SWAP]
├─centos-root 253:1 0 50G 0 lvm /
└─centos-home 253:2 0 2.7T 0 lvm /home
sdb 8:16 0 2.7T 0 disk
sdd 8:48 0 2.7T 0 disk
sde 8:64 0 2.7T 0 disk
loop0 7:0 0 2G 0 loop /srv/node/swiftloopback
loop2 7:2 0 20.6G 0 loop
[root@controller2 ~(keystone_admin)]# parted /dev/sdb
GNU Parted 3.1
Using /dev/sdb
Welcome to GNU Parted! Type ‘help’ to view a list of commands.
(parted) mklabel gpt
(parted) mkpart primary 0% 50GB
(parted) mkpart primary xfs 50GB 100%
(parted) p
Model: ATA HGST HUS724030AL (scsi)
Disk /dev/sdb: 3001GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 1049kB 50.0GB 50.0GB primary
2 50.0GB 3001GB 2951GB primary
(parted)
[root@controller3 ceph]# ceph osd df
ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR
0 2.67999 1.00000 2746G 37096k 2746G 0.00 1.03
7 2.67999 1.00000 2746G 35776k 2746G 0.00 0.99
8 2.67999 1.00000 2746G 35256k 2746G 0.00 0.98
1 2.67999 1.00000 2746G 36800k 2746G 0.00 1.02
5 2.67999 1.00000 2746G 35568k 2746G 0.00 0.99
6 2.67999 1.00000 2746G 35572k 2746G 0.00 0.99
2 2.67999 1.00000 2746G 36048k 2746G 0.00 1.00
3 2.67999 1.00000 2746G 36128k 2746G 0.00 1.00
4 2.67999 1.00000 2746G 35664k 2746G 0.00 0.99
TOTAL 24719G 316M 24719G 0.00
MIN/MAX VAR: 0.98/1.03 STDDEV: 0
[root@controller3 ceph]# ceph osd pool get rbd size
[root@controller3 ceph]# ceph osd pool set rbd pg_num 256
set pool 0 pg_num to 256
[root@controller3 ceph]# ceph osd pool set rbd pgp_num 256
Error EBUSY: currently creating pgs, wait
[root@controller3 ceph]# ceph osd pool set rbd pgp_num 256
set pool 0 pgp_num to 256
[root@controller3 ceph]# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 24.11993 root default
-2 8.03998 host controller3
0 2.67999 osd.0 up 1.00000 1.00000
7 2.67999 osd.7 up 1.00000 1.00000
8 2.67999 osd.8 up 1.00000 1.00000
-3 8.03998 host controller1
1 2.67999 osd.1 up 1.00000 1.00000
5 2.67999 osd.5 up 1.00000 1.00000
6 2.67999 osd.6 up 1.00000 1.00000
-4 8.03998 host controller2
2 2.67999 osd.2 up 1.00000 1.00000
3 2.67999 osd.3 up 1.00000 1.00000
4 2.67999 osd.4 up 1.00000 1.00000
[root@controller3 ceph]# ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
24719G 24719G 307M 0
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
rbd 0 0 0 8239G 0
Next: hooking Ceph up to OpenStack
Environment: the disks of the 3 controllers provide the Ceph storage, with 3 disks per controller; the remaining machines are compute nodes
Install the clients
Install on every node that needs to access the Ceph cluster:
sudo yum install python-rbd
sudo yum install ceph-common
create pools
This only needs to be done on one Ceph node
ceph osd pool create images 1024
ceph osd pool create vms 1024
ceph osd pool create volumes 1024
Show the pool status
ceph osd lspools
ceph -w
3328 pgs: 3328 active+clean
Create the users
This only needs to be done on one Ceph node
ceph auth get-or-create client.glance mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=images'
ceph auth get-or-create client.cinder mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rx pool=images'
Nova uses the cinder user, so no separate user is created for it
Copy the keyrings
This only needs to be done on one Ceph node
ceph auth get-or-create client.glance > /etc/ceph/ceph.client.glance.keyring
ceph auth get-or-create client.cinder > /etc/ceph/ceph.client.cinder.keyring
Copy them to the other nodes with scp
[root@controller1 ceph]# ls
ceph.client.admin.keyring ceph.conf tmpOZNQbH
ceph.client.cinder.keyring rbdmap tmpPbf5oV
ceph.client.glance.keyring tmpJSCMCV tmpUKxBQB
Permissions
Change the ownership of the keyring files (run on all client nodes)
sudo chown glance:glance /etc/ceph/ceph.client.glance.keyring
sudo chown cinder:cinder /etc/ceph/ceph.client.cinder.keyring
Update the libvirt secret (only needed on the nova-compute nodes)
uuidgen
Generate a uuid
cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
  <uuid>457eb676-33da-42ec-9a8c-9293d545c337</uuid>
  <usage type='ceph'>
    <name>client.cinder secret</name>
  </usage>
</secret>
EOF
Copy it to all compute nodes
sudo virsh secret-define --file secret.xml
Generate the key
ceph auth get-key client.cinder > ./client.cinder.key
sudo virsh secret-set-value --secret adc522c4-237c-4cb8-8e39-682adcf4a830 --base64 $(cat ./client.cinder.key)
In the end client.cinder.key and secret.xml are identical on every compute node
Note down the uuid generated earlier
adc522c4-237c-4cb8-8e39-682adcf4a830
Configuring Glance
Make the following changes on every controller node
/etc/glance/glance-api.conf
[DEFAULT]
default_store = rbd
show_image_direct_url=False
bind_host=controller1
bind_port=9292
workers=24
backlog=4096
image_cache_dir=/var/lib/glance/image-cache
registry_host=controller1
registry_port=9191
registry_client_protocol=http
debug=False
verbose=True
log_file=/var/log/glance/api.log
log_dir=/var/log/glance
use_syslog=False
syslog_log_facility=LOG_USER
use_stderr=True
[database]
connection=mysql+pymysql://glance:a9717e6e21384389@controller:3305/glance
idle_timeout=3600
[glance_store]
os_region_name=RegionOne
stores = rbd
default_store = rbd
rbd_store_pool = images
rbd_store_user = glance
rbd_store_ceph_conf = /etc/ceph/ceph.conf
rbd_store_chunk_size = 8
[image_format]
[keystone_authtoken]
auth_uri=http://controller:5000/v2.0
identity_uri=http://controller:35357
admin_user=glance
admin_password=79c14ca2ba51415b
admin_tenant_name=services
[matchmaker_redis]
[matchmaker_ring]
[oslo_concurrency]
[oslo_messaging_amqp]
[oslo_messaging_qpid]
[oslo_messaging_rabbit]
rabbit_host=controller1,controller2,controller3
rabbit_hosts=controller1:5672,controller2:5672,controller3:5672
[oslo_policy]
[paste_deploy]
flavor=keystone
[store_type_location_strategy]
[task]
[taskflow_executor]
Verifying Glance
On all controller nodes:
Restart the service
openstack-service restart glance
Check the status
[root@controller1 ~]# openstack-service status glance
MainPID=23178 Id=openstack-glance-api.service ActiveState=active
MainPID=23155 Id=openstack-glance-registry.service ActiveState=active
[root@controller1 ~]# netstat -plunt | grep 9292
tcp 0 0 192.168.53.58:9292 0.0.0.0:* LISTEN 23178/python2
tcp 0 0 192.168.53.23:9292 0.0.0.0:* LISTEN 11435/haproxy
Upload an image
[root@controller1 ~]# rbd ls images
ac5c334f-fbc2-4c56-bf48-47912693b692
That shows it worked.
Configuring Cinder
/etc/cinder/cinder.conf
[DEFAULT]
enabled_backends = ceph
glance_host = controller
enable_v1_api = True
enable_v2_api = True
host = controller1
storage_availability_zone = nova
default_availability_zone = nova
auth_strategy = keystone
#enabled_backends = lvm
osapi_volume_listen = controller1
osapi_volume_workers = 24
nova_catalog_info = compute:nova:publicURL
nova_catalog_admin_info = compute:nova:adminURL
debug = False
verbose = True
log_dir = /var/log/cinder
notification_driver =messagingv2
rpc_backend = rabbit
control_exchange = openstack
api_paste_config=/etc/cinder/api-paste.ini
[ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_flatten_volume_from_snapshot = false
rbd_max_clone_depth = 5
rbd_store_chunk_size = 4
rados_connect_timeout = -1
glance_api_version = 2
rbd_user = cinder
rbd_secret_uuid = adc522c4-237c-4cb8-8e39-682adcf4a830
volume_backend_name=ceph
[BRCD_FABRIC_EXAMPLE]
[CISCO_FABRIC_EXAMPLE]
[cors]
[cors.subdomain]
[database]
connection = mysql+pymysql://cinder:02d2f2f82467400d@controller:3305/cinder
[fc-zone-manager]
[keymgr]
[keystone_authtoken]
auth_uri = http://controller:5000
identity_uri = http://controller:35357
admin_user = cinder
admin_password = 7281b7bf47044f96
admin_tenant_name = services
memcache_servers = 127.0.0.1:11211
token_cache_time = 3600
cache = true
[matchmaker_redis]
[matchmaker_ring]
[oslo_concurrency]
[oslo_messaging_amqp]
[oslo_messaging_qpid]
[oslo_messaging_rabbit]
amqp_durable_queues = False
kombu_ssl_keyfile =
kombu_ssl_certfile =
kombu_ssl_ca_certs =
rabbit_host = controller1,controller2,controller3
rabbit_port = 5672
rabbit_hosts = controller1:5672,controller2:5672,controller3:5672
rabbit_use_ssl = False
rabbit_userid = guest
rabbit_password = guest
rabbit_virtual_host = /
rabbit_ha_queues = False
heartbeat_timeout_threshold = 0
heartbeat_rate = 2
[oslo_middleware]
[oslo_policy]
[oslo_reports]
[profiler]
[lvm]
iscsi_helper=lioadm
volume_group=cinder-volumes
iscsi_ip_address=192.168.56.251
volume_driver=cinder.volume.drivers.lvm.LVMVolumeDriver
volumes_dir=/var/lib/cinder/volumes
iscsi_protocol=iscsi
volume_backend_name=lvm
Note: the [ceph] section has to be added to the file by hand
Verifying Cinder
openstack-service restart cinder
Create a volume
[root@controller1 ~]# rbd ls volumes
volume-463c3495-1747-480f-974f-51ac6e1c5612
That confirms it worked.
Configuring Nova
/etc/nova/nova.conf
[DEFAULT]
internal_service_availability_zone=internal
default_availability_zone=nova
use_ipv6=False
notify_api_faults=False
state_path=/var/lib/nova
report_interval=10
compute_manager=nova.compute.manager.ComputeManager
service_down_time=60
rootwrap_config=/etc/nova/rootwrap.conf
volume_api_class=nova.volume.cinder.API
auth_strategy=keystone
allow_resize_to_same_host=False
heal_instance_info_cache_interval=60
reserved_host_memory_mb=512
network_api_class=nova.network.neutronv2.api.API
force_snat_range =0.0.0.0/0
metadata_host=192.168.56.200
dhcp_domain=novalocal
security_group_api=neutron
compute_driver=libvirt.LibvirtDriver
vif_plugging_is_fatal=True
vif_plugging_timeout=300
firewall_driver=nova.virt.firewall.NoopFirewallDriver
force_raw_images=True
debug=False
verbose=True
log_dir=/var/log/nova
use_syslog=False
syslog_log_facility=LOG_USER
use_stderr=True
notification_topics=notifications
rpc_backend=rabbit
vncserver_proxyclient_address=compute1
vnc_keymap=en-us
sql_connection=mysql+pymysql://nova:[email protected]/nova
vnc_enabled=True
image_service=nova.image.glance.GlanceImageService
lock_path=/var/lib/nova/tmp
vncserver_listen=0.0.0.0
novncproxy_base_url=http://192.168.56.200:6080/vnc_auto.html
[api_database]
[barbican]
[cells]
[cinder]
catalog_info=volumev2:cinderv2:publicURL
[conductor]
[cors]
[cors.subdomain]
[database]
[ephemeral_storage_encryption]
[glance]
api_servers=192.168.56.200:9292
[guestfs]
[hyperv]
[image_file_url]
[ironic]
[keymgr]
[keystone_authtoken]
[libvirt]
images_type = rbd
images_rbd_pool = vms
images_rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
rbd_secret_uuid = 5b059071-1aff-4b72-bc5f-0122a7d6c1df
disk_cachemodes="network=writeback"
inject_password = false
inject_key = false
inject_partition = -2
live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_PERSIST_DEST,VIR_MIGRATE_TUNNELLED"
virt_type=kvm
inject_password=False
inject_key=False
inject_partition=-1
live_migration_uri=qemu+tcp://nova@%s/system
cpu_mode=host-model
vif_driver=nova.virt.libvirt.vif.LibvirtGenericVIFDriver
[matchmaker_redis]
[matchmaker_ring]
[metrics]
[neutron]
url=http://192.168.56.200:9696
admin_username=neutron
admin_password=0cccbadebaa14569
admin_tenant_name=services
region_name=RegionOne
admin_auth_url=http://192.168.56.200:5000/v2.0
auth_strategy=keystone
ovs_bridge=br-int
extension_sync_interval=600
timeout=30
default_tenant_id=default
[osapi_v21]
[oslo_concurrency]
[oslo_messaging_amqp]
[oslo_messaging_qpid]
[oslo_messaging_rabbit]
amqp_durable_queues=False
kombu_reconnect_delay=1.0
rabbit_host=controller1,controller2,controller3
rabbit_port=5672
rabbit_hosts=controller1:5672,controller2:5672,controller3:5672
rabbit_use_ssl=False
rabbit_userid=guest
rabbit_password=guest
rabbit_virtual_host=/
rabbit_ha_queues=False
heartbeat_timeout_threshold=0
heartbeat_rate=2
[oslo_middleware]
[rdp]
[serial_console]
[spice]
[ssl]
[trusted_computing]
[upgrade_levels]
[vmware]
[vnc]
[workarounds]
[xenserver]
[zookeeper]
Configuring the Neutron network
[root@compute1 ~]# ovs-vsctl add-br br-data
[root@compute1 ~]# ovs-vsctl add-port br-data enp7s0f0
[root@compute1 ~]# egrep -v "^$|#" /etc/neutron/plugins/ml2/openvswitch_agent.ini
[ovs]
integration_bridge = br-int
bridge_mappings =default:br-data
enable_tunneling=False
[agent]
polling_interval = 2
l2_population = False
arp_responder = False
prevent_arp_spoofing = True
enable_distributed_routing = False
extensions =
drop_flows_on_start=False
[securitygroup]
firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
Total disk capacity: 3 TB * 9 = 27 TB is used (Ceph with 3 replicas).
Ceph, a PB-scale distributed file system for Linux, is flexible, intelligent and configurable, and in the software-defined-storage wave it is getting more and more attention from IaaS solution providers.
The main VM-related storage needs in OpenStack are the disks in Nova, the images in Glance and the virtual disks in Cinder. Here Ceph is used as the backend for all of them, replacing the usual setup where each one runs its own storage. This is mainly a summary of using Ceph; environments differ, so all kinds of environment and package-dependency problems can appear (I hit a qemu version that was too old, iptables issues, and so on); feedback is welcome. First, the integration logic:
At the bottom of Ceph is RADOS storage. Access to RADOS goes through the librados library, and librbd is built on top of librados. Nova reaches librbd through libvirt -> qemu, so for now only the libvirt driver is supported; Cinder and Glance call librbd directly.
The layering inside a Ceph storage cluster is also visible in the same picture: files are striped into objects, objects are mapped by a hash function to PGs (a Pool is the container of PGs), PGs are mapped evenly onto OSDs by the CRUSH algorithm, and each OSD sits on a file system such as xfs or ext4.
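A minimal sketch of the two access paths just mentioned, writing an object through librados and creating a block device through librbd, using the Python bindings. It assumes a pool named 'rbd' already exists and that /etc/ceph/ceph.conf plus the admin keyring are in place:

import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')

# librados path: store and read back a raw object in the pool
ioctx.write_full('hello-object', b'hello ceph')
print(ioctx.read('hello-object'))

# librbd path: create a 1 GiB block-device image on top of the same pool
rbd.RBD().create(ioctx, 'demo-image', 1024 ** 3)
print(rbd.RBD().list(ioctx))

ioctx.close()
cluster.shutdown()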
Only three OSDs are used here (the official recommendation is at least two; one cannot survive a failure), three monitors (they receive status reports and serve the cluster map; at least three are needed, one is bad for availability, and an odd number ensures the PAXOS algorithm can decide which monitor holds the newest version of the cluster map), and a single MDS. This is about the smallest useful test setup; Ceph stresses scalability, so the more nodes the better.
System environment used here: RedHat 6.5, four machines, planned as follows:
mds 192.168.122.149 runs an mds, a mon and an osd
osd 192.168.122.169 runs a mon and an osd
mon 192.168.122.41 runs a mon and an osd
client 192.168.122.104 runs the OpenStack all-in-one install and acts as the admin node
The first three machines form the Ceph storage cluster, with hostnames mds, osd and mon; those short hostnames refer to the nodes below. All three run a monitor and an OSD, and mds additionally runs the metadata server. The fourth machine is the OpenStack all-in-one node, hostname client.
ceph-deploy is used to deploy Ceph, much like chef is used to deploy OpenStack. Very convenient.
Step 1: on the admin node edit /etc/hosts. The node arguments passed to ceph-deploy must be hostnames, and resolving them requires the last four lines of the /etc/hosts listing below.
[root@client ceph ]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.122.149 mds
192.168.122.169 osd
192.168.122.41 mon
192.168.122.104 client
Step 2: set up passwordless SSH from the admin node to the other nodes; this makes deploying Ceph with ceph-deploy easier.
[root@client install]# ssh-keygen
[root@client install]# ssh-copy-id mds
[root@client install]# ssh-copy-id osd
[root@client install]# ssh-copy-id mon
Step 3: on client, add the yum repo file ceph.repo, using the latest release, firefly. The local environment is RedHat 6.5, so the baseurl uses rhel6; the machine is 64-bit, so the package directory is x86_64. As follows:
[root@client~]# cat /etc/yum.repos.d/ceph.repo
[Ceph]
name=Ceph packages for $basearch
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
enabled=1
baseurl=http://ceph.com/rpm-firefly/rhel6/x86_64
priority=1
gpgcheck=1
type=rpm-md
[ceph-source]
name=Ceph source packages
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
enabled=1
baseurl=http://ceph.com/rpm-firefly/rhel6/SRPMS
priority=1
gpgcheck=1
type=rpm-md
[Ceph-noarch]
name=Ceph noarch packages
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
enabled=1
baseurl=http://ceph.com/rpm-firefly/rhel6/noarch
priority=1
gpgcheck=1
type=rpm-md
Step 4: install Ceph
[root@client~]# yum -y install ceph-deploy
ceph-deploy does the deployment here; it is best to create a directory to hold the generated files so they do not get mixed up with existing files elsewhere.
[root@client ~]# mkdir ceph
[root@client ~]# cd ceph
Create a cluster containing mds, osd and mon
[root@client ceph]# ceph-deploy new mds mon osd # hostnames must be used
Install ceph on the three nodes.
[root@client ceph]# ceph-deploy install mds mon osd
Create the monitors
[root@client ceph]# ceph-deploy mon create mds mon osd
Collect the keyring files. Note: if the firewall is up on mds, mon or osd, the keys cannot be collected; it is easiest to turn it off, otherwise the necessary rules have to be added with iptables.
If it is left on, the error is: [ceph_deploy.gatherkeys][WARNIN] Unable to find /var/lib/ceph/bootstrap-mds/ceph.keyring
[root@client ceph]# ceph-deploy gatherkeys mds # any one of the nodes will do
[root@client ceph]# ls
ceph.bootstrap-mds.keyring ceph.bootstrap-osd.keyring ceph.client.admin.keyring ceph.conf ceph.log ceph.mon.keyring
Create the OSDs (backed by the xfs file system by default) and activate them.
[root@client ceph]# ceph-deploy osd prepare mds:/opt/ceph osd:/opt/ceph mon:/opt/ceph
[root@client ceph]# ceph-deploy osd activate mds:/opt/ceph osd:/opt/ceph mon:/opt/ceph
Create the metadata server
[root@client ceph]# ceph-deploy mds create mds
A short aside here [ignore it if you only care about the Ceph-OpenStack integration]
Step 5: wire it into nova, glance and cinder
[root@client ceph]# yum install ceph
[root@client ceph]# rados mkpool volumes
[root@client ceph]# rados mkpool images
[root@client ceph]# ceph osd pool set volumes size 3
[root@client ceph]# ceph osd pool set images size 3
[root@client ceph]# ceph osd lspools
0 data,1 metadata,2 rbd,4 volumes,5 images,
Keyrings:
[root@client ceph]# ceph auth get-or-create client.volumes mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes,allow rx pool=images' -o /etc/ceph/client.volumes.keyring
[root@client ceph]# ceph auth get-or-create client.images mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=images' -o /etc/ceph/client.images.keyring
Append the following lines to ceph.conf.
[root@client ceph]# cat /etc/ceph/ceph.conf
[global]
auth_service_required= cephx
filestore_xattr_use_omap= true
auth_client_required= cephx
auth_cluster_required= cephx
mon_host= 192.168.122.149,192.168.122.169,192.168.122.41
mon_initial_members= mds, osd, mon
fsid= 3493ee7b-ce67-47ce-9ce1-f5d6b219a709
[client.volumes] # this line and everything below it are the additions
keyring= /etc/ceph/client.volumes.keyring
[client.images]
keyring= /etc/ceph/client.images.keyring
They must be added, otherwise glance image upload fails with:
Request returned failure status.
500 Internal Server Error
GL-F9EE247 Failed to upload image 1c3d2523-9752-4644-89c2-b066428144fd
(HTTP 500)
The qemu and libvirt used here are both recent builds; older versions may not support rbd. They were compiled and installed from source.
[Problem: the qemu version matters. It must support the rbd format, because libvirt drives the Ceph storage through qemu commands; check with "qemu-img --help".
qemu-img version 0.12.1,
Supported formats: raw cow qcow vdi vmdk cloop dmg bochs vpc vvfat qcow2 qed vhdx parallels nbd blkdebug host_cdrom host_floppy host_device file gluster gluster gluster gluster
As you can see, 0.12.1 does not support rbd; 0.15 or later is needed.]
To integrate nova, first create a libvirt secret as follows; qemu uses it when creating images. This should be done on the node running nova-compute; since this is an all-in-one setup, it is done directly on client.
[root@client ~]# ceph auth get-key client.volumes | ssh client tee client.volumes.key # the host right after ssh is this machine's own hostname
[root@client ~]# cat > secret.xml << EOF
<secret ephemeral='no' private='no'>
  <usage type='ceph'>
    <name>client.volumes secret</name>
  </usage>
</secret>
EOF
[root@client ~]# sudo virsh secret-define --file secret.xml
Secret ce31d8b1-62b5-1561-a489-be305336520a created
[root@client ~]# sudo virsh secret-set-value --secret ce31d8b1-62b5-1561-a489-be305336520a --base64 $(cat client.volumes.key) && rm client.volumes.key secret.xml
Secret value set
rm: remove regular file 'client.volumes.key'? y
rm: remove regular file 'secret.xml'? y
Edit the configuration files
/etc/glance/glance-api.conf
default_store = rbd
show_image_direct_url = True
rbd_store_user = images
rbd_store_pool = images
/etc/cinder/cinder.conf
volume_driver=cinder.volume.drivers.rbd.RBDDriver
rbd_pool=volumes
rbd_user=volumes
rbd_secret_uuid=ce31d8b1-62b5-1561-a489-be305336520a
/etc/nova/nova.conf
images_type=rbd
images_rbd_pool=volumes
rbd_user=volumes
rbd_secret_uuid=ce31d8b1-62b5-1561-a489-be305336520a
Then restart glance-api, cinder-volume and nova-compute.
[root@client ceph]# service openstack-glance-api restart
Stopping openstack-glance-api: [ OK ]
Starting openstack-glance-api: [ OK ]
[root@client ceph]# glance image-create --disk-format qcow2 --is-public True --container-format bare --file cirros-0.3.1-x86_64-disk.img --name cirros
±-----------------±-------------------------------------+
|Property | Value |
±-----------------±-------------------------------------+
|checksum | d972013792949d0d3ba628fbe8685bce |
|container_format | bare |
|created_at | 2014-06-24T08:49:43 |
|deleted | False |
|deleted_at | None |
|disk_format | qcow2 |
|id | 77b79879-addb-4a22-b750-7f0ef51ec154 |
|is_public | True |
|min_disk | 0 |
|min_ram | 0 |
|name | cirros |
|owner | f17fbd28fa184a39830f14a2e01a3b70 |
|protected | False |
|size | 13147648 |
|status | active |
|updated_at | 2014-06-24T08:50:01 |
|virtual_size | None |
±-----------------±-------------------------------------+
[root@clientceph]# glance index
ID Name DiskFormat Container Format Size
77b79879-addb-4a22-b750-7f0ef51ec154cirros qcow2 bare 13147648
[root@client ceph]# service openstack-cinder-volume restart
Stopping openstack-cinder-volume: [ OK ]
Starting openstack-cinder-volume: [ OK ]
[root@client ceph]# cinder create --display-name test-ceph 1
[root@client ceph]# cinder list
±-------------------------------------±----------±-------------±-----±------------±---------±------------+
| ID | Status | Display Name | Size| Volume Type | Bootable | Attached to |
±-------------------------------------±----------±-------------±-----±------------±---------±------------+
|1cc908d0-bbe9-4008-a10f-80cf1aa53afb | available | test-ceph | 1 | None | false | |
±-------------------------------------±----------±-------------±-----±------------±---------±------------+
[root@client ceph]# service openstack-nova-compute restart
Stopping openstack-nova-compute: [ OK ]
Starting openstack-nova-compute: [ OK ]
[root@client ceph]# nova boot --image image1 --flavor 1 xiao-new
[root@client ceph]# nova list
±-------------------------------------±----------±-------±-----------±------------±--------------------+
| ID | Name | Status | Task State | Power State | Networks |
±-------------------------------------±----------±-------±-----------±------------±--------------------+
| f6b04300-2d60-47d3-a65b-2d4ce32eeced | xiao-new | ACTIVE | - | Running | net_local=10.0.1.33 |
±-------------------------------------±----------±-------±-----------±------------±---------
OpenStack management 39 - connecting Cinder to multiple Ceph storage backends
Environment
The current OpenStack deployment works normally
The backend Ceph storage is already more than 60% full
Expanding it in place is not desirable, because it would cause a lot of data migration
A second Ceph cluster has been built and is to become an additional Ceph backend for OpenStack
The old cluster is called ceph-A; its pool in use is volumes
The new cluster is called ceph-B; its pool in use is develop-ceph
Goal
Connect OpenStack to the two different Ceph backends at the same time
cinder server configuration
pwd
…
[CEPH_SATA]
glance_api_version=2
volume_backend_name=ceph_sata
rbd_ceph_conf=/etc/ceph/ceph-volumes.conf
rbd_user=cinder
rbd_flatten_volume_from_snapshot=False
rados_connect_timeout=-1
rbd_max_clone_depth=5
volume_driver=cinder.volume.drivers.rbd.RBDDriver
rbd_store_chunk_size=4
rbd_secret_uuid=dc4f91c1-8792-4948-b68f-2fcea75f53b9
rbd_pool=volumes
host=hh-yun-cinder.vclound.com
[CEPH_DEVELOP]
glance_api_version=2
volume_backend_name=ceph_develop
rbd_ceph_conf=/etc/ceph/ceph-develop.conf
rbd_user=developcinder
rbd_flatten_volume_from_snapshot=False
rados_connect_timeout=-1
rbd_max_clone_depth=5
volume_driver=cinder.volume.drivers.rbd.RBDDriver
rbd_store_chunk_size=4
rbd_secret_uuid=4bf07d3e-a289-456d-9bd9-5a89832b413b
rbd_pool=develop-ceph
host=hh-yun-cinder.vclound.com
Managing the cinder services from the command line
After the services are restarted, a new service type, hh-yun-cinder.vclound.com@CEPH_DEVELOP, appears:
[root@hh-yun-puppet-129021 ~(keystone_admin)]# cinder service-list
+------------------+-----------------------------------------+------+---------+-------+----------------------------+-----------------+
| Binary           | Host                                    | Zone | Status  | State | Updated_at                 | Disabled Reason |
+------------------+-----------------------------------------+------+---------+-------+----------------------------+-----------------+
| cinder-backup    | hh-yun-cinder.vclound.com               | nova | enabled | up    | 2017-04-18T06:14:57.000000 | None            |
| cinder-scheduler | hh-yun-cinder.vclound.com               | nova | enabled | up    | 2017-04-18T06:14:49.000000 | None            |
| cinder-volume    | hh-yun-cinder.vclound.com@CEPH_DEVELOP  | nova | enabled | up    | 2017-04-18T06:14:53.000000 | None            |
| cinder-volume    | hh-yun-cinder.vclound.com@CEPH_SATA     | nova | enabled | up    | 2017-04-18T06:14:53.000000 | None            |
+------------------+-----------------------------------------+------+---------+-------+----------------------------+-----------------+
Add a new volume type to cinder
[root@hh-yun-puppet-129021 ~(keystone_admin)]# cinder type-create DEVELOP-CEPH
±-------------------------------------±-------------+
| ID | Name |
±-------------------------------------±-------------+
| 14b43bcb-0085-401d-8e2f-504587cf3589 | DEVELOP-CEPH |
±-------------------------------------±-------------+
List the volume types
[root@hh-yun-puppet-129021 ~(keystone_admin)]# cinder type-list
±-------------------------------------±---------------+
| ID | Name |
±-------------------------------------±---------------+
| 14b43bcb-0085-401d-8e2f-504587cf3589 | DEVELOP-CEPH |
| 45fdd68a-ca0f-453c-bd10-17e826a1105e | CEPH-SATA |
±-------------------------------------±---------------+
Add the extra spec
[root@hh-yun-db-129041 ~(keystone_admin)]# cinder type-key DEVELOP-CEPH set volume_backend_name=ceph_develop
Verify the extra spec
[root@hh-yun-db-129041 ~(keystone_admin)]# cinder extra-specs-list
±-------------------------------------±---------------±-------------------
| ID | Name | extra_specs |
±-------------------------------------±---------------±-------------------
| 14b43bcb-0085-401d-8e2f-504587cf3589 | DEVELOP-CEPH | {u’volume_backend_name’: u’ceph_develop’} |
| 45fdd68a-ca0f-453c-bd10-17e826a1105e | CEPH-SATA | {u’volume_backend_name’: u’ceph_sata’} |
±-------------------------------------±---------------±-------------------
Verification
Create cinder volumes from the command line to verify
[root@hh-yun-db-129041 ceph(keystone_admin)]# cinder create --display-name tt-test --volume-type DEVELOP-CEPH 20
±--------------------±-------------------------------------+
| Property | Value |
±--------------------±-------------------------------------+
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| created_at | 2017-04-18T07:02:27.783977 |
| display_description | None |
| display_name | tt-test |
| encrypted | False |
| id | 4fd11447-fd34-4dd6-8da3-634cf1c67a1e |
| metadata | {} |
| size | 20 |
| snapshot_id | None |
| source_volid | None |
| status | creating |
| user_id | 226e71f1c1aa4bae85485d1d17b6f0ae |
| volume_type | DEVELOP-CEPH | <- goes to the ceph-B cluster
±--------------------±-------------------------------------+
[root@hh-yun-db-129041 ceph(keystone_admin)]# cinder create --display-name tt-test02 --volume-type CEPH-SATA 20
±--------------------±-------------------------------------+
| Property | Value |
±--------------------±-------------------------------------+
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| created_at | 2017-04-18T07:03:20.880786 |
| display_description | None |
| display_name | tt-test02 |
| encrypted | False |
| id | f7f11c03-e2dc-44a4-bc5b-6718fc4c064d |
| metadata | {} |
| size | 20 |
| snapshot_id | None |
| source_volid | None |
| status | creating |
| user_id | 226e71f1c1aa4bae85485d1d17b6f0ae |
| volume_type | CEPH-SATA | <- goes to the ceph-A cluster
±--------------------±-------------------------------------+
Verify:
[root@hh-yun-db-129041 ceph(keystone_admin)]# cinder list
±-------------------------------------±----------±-------------±-----±-------------±---------±------------------+
| ID | Status | Display Name | Size | Volume Type | Bootable | Attached to |
±-------------------------------------±----------±-------------±-----±-------------±---------±------------------+
| 422a43f7-79b3-4fa1-a300-2ad1f3d63018 | in-use | dd250 | 250 | CEPH-SATA | false | 208a713d-ae71-4243-94a8-5a3ab22126d7 |
| 4fd11447-fd34-4dd6-8da3-634cf1c67a1e | available | tt-test | 20 | DEVELOP-CEPH | false | |
| f7f11c03-e2dc-44a4-bc5b-6718fc4c064d | available | tt-test02 | 20 | CEPH-SATA | false | |
±-------------------------------------±----------±-------------±-----±-------------±---------±-------------------+
The volume types show that the two newly created volumes land on different cinder volume backends.
Connecting nova-compute to Cinder
Note: OpenStack nova-compute does not support connecting to two Ceph clusters with different monitors using the method above.
OpenStack management 23 - connecting nova-compute to a Ceph cluster
Prerequisites
Configure nova-compute so that it can talk to the Ceph cluster
The end goal is to let instances use Ceph RBD as external volumes
nova-compute itself already works; only the Ceph connection part is added here
Install the packages
yum install -y python-ceph ceph
Create the extra directories
mkdir -p /var/run/ceph/guests/ /var/log/qemu/
chown qemu:qemu /var/run/ceph/guests /var/log/qemu/
Check the versions
Make sure qemu supports the rbd protocol; use ldd to check that qemu-img links against librbd.so
[root@hh-yun-compute-130133 ~]# ldd /usr/bin/qemu-img | grep rbd
librbd.so.1 => /lib64/librbd.so.1 (0x00007fa708216000)
It can also be checked from the command line:
[root@hh-yun-compute-130133 ~]# qemu-img -h | grep rbd
Supported formats: vvfat vpc vmdk vhdx vdi sheepdog rbd raw host_cdrom host_floppy host_device file qed qcow2 qcow parallels nbd iscsi gluster dmg cloop bochs blkverify blkdebug
Versions in use here:
qemu-img-1.5.3-86.el7_1.2.x86_64 qemu-kvm-1.5.3-86.el7_1.2.x86_64 qemu-kvm-common-1.5.3-86.el7_1.2.x86_64
Configuration
Edit ceph.conf on the nova-compute node
[global]
fsid = dc4f91c1-8792-4948-b68f-2fcea75f53b9
mon initial members = XXX.XXXX.XXXXX
mon host = XXX.XXX.XXX
public network = XXX.XXX.XXX.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 1024
filestore xattr use omap = true
osd pool default size = 3
osd pool default min size = 1
osd pool default pg num = 10240
osd pool default pgp num = 10240
osd crush chooseleaf type = 1
[client]
rbd cache = true
rbd cache writethrough until flush = true
admin socket = /var/run/ceph/guests/$cluster-$type.$id.$pid.$cctid.asok
log file = /var/log/qemu/qemu-guest-$pid.log
rbd concurrent management ops = 20
Edit the nova-compute configuration for connecting to Ceph
[libvirt]
libvirt_images_type = rbd
libvirt_images_rbd_pool = volumes
libvirt_images_rbd_ceph_conf = /etc/ceph/ceph.conf
libvirt_disk_cachemodes="network=writeback"
rbd_user = cinder
rbd_secret_uuid = dc4f91c1-8792-4948-b68f-2fcea75f53b9
libvirt_inject_password = false
libvirt_inject_key = false
libvirt_inject_partition = -2
libvirt_live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_PERSIST_DEST,VIR_MIGRATE_TUNNELLED"
Copy ceph.client.cinder.keyring from the Ceph server into /etc/ceph (matching rbd_user = cinder above)
Add the Ceph secret definitions
/etc/libvirt/secrets/dc4f91c1-8792-4948-b68f-2fcea75f53b9.base64 (the key of the user used to connect to Ceph)
AQADxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx==
/etc/libvirt/secrets/dc4f91c1-8792-4948-b68f-2fcea75f53b9.xml (the definition of the user used to connect to Ceph)
<secret ephemeral='no' private='no'>
  <uuid>dc4f91c1-8792-4948-b68f-2fcea75f53b9</uuid>
  <usage type='ceph'>
    <name>client.volumes secret</name>
  </usage>
</secret>
Notes:
1. The files must have mode 0600:
/etc/libvirt/secrets/dc4f91c1-8792-4948-b68f-2fcea75f53b9.base64
/etc/libvirt/secrets/dc4f91c1-8792-4948-b68f-2fcea75f53b9.xml
2. The .base64 file must not contain a trailing newline
Restart the services
Restart the following two services:
systemctl restart libvirtd
systemctl restart openstack-nova-compute
Common Ceph operations commands
Cluster
Starting a Ceph daemon
Start a mon daemon
service ceph start mon.node1
Start an mds daemon
service ceph start mds.node1
Start an osd daemon
service ceph start osd.0
Check the cluster health
[root@client ~]# ceph health
HEALTH_OK
Watch the cluster state in real time
[root@client ~]# ceph -w
cluster be1756f2-54f7-4d8f-8790-820c82721f17
health HEALTH_OK
monmap e2: 3 mons at {node1=10.240.240.211:6789/0,node2=10.240.240.212:6789/0,node3=10.240.240.213:6789/0}, election epoch 294, quorum 0,1,2 node1,node2,node3
mdsmap e95: 1/1/1 up {0=node2=up:active}, 1 up:standby
osdmap e88: 3 osds: 3 up, 3 in
pgmap v1164: 448 pgs, 4 pools, 10003 MB data, 2520 objects
23617 MB used, 37792 MB / 61410 MB avail
448 active+clean
2014-06-30 00:48:28.756948 mon.0 [INF] pgmap v1163: 448 pgs: 448 active+clean; 10003 MB data, 23617 MB used, 37792 MB / 61410 MB avail
Check the cluster state summary
[root@client ~]# ceph -s
cluster be1756f2-54f7-4d8f-8790-820c82721f17
health HEALTH_OK
monmap e2: 3 mons at {node1=10.240.240.211:6789/0,node2=10.240.240.212:6789/0,node3=10.240.240.213:6789/0}, election epoch 294, quorum 0,1,2 node1,node2,node3
mdsmap e95: 1/1/1 up {0=node2=up:active}, 1 up:standby
osdmap e88: 3 osds: 3 up, 3 in
pgmap v1164: 448 pgs, 4 pools, 10003 MB data, 2520 objects
23617 MB used, 37792 MB / 61410 MB avail
448 active+clean
[root@client ~]#
查看ceph存储空间
[root@client ~]# ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
61410M 37792M 23617M 38.46
POOLS:
NAME ID USED %USED OBJECTS
data 0 10000M 16.28 2500
metadata 1 3354k 0 20
rbd 2 0 0 0
jiayuan 3 0 0 0
[root@client ~]#
Purge all Ceph packages and data from a node
[root@node1 ~]# ceph-deploy purge node1
[root@node1 ~]# ceph-deploy purgedata node1
Create an admin user for Ceph, generate a key for it, and save the key under /etc/ceph; in the same way, create a user and key for osd.0 and for mds.node1 (see the hedged command sketch below).
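A sketch of the usual ceph auth get-or-create form for these three users; the capability strings and keyring paths below are assumptions and should be adapted to the cluster:
# admin user, key written to /etc/ceph
ceph auth get-or-create client.admin mon 'allow *' osd 'allow *' mds 'allow' \
      -o /etc/ceph/ceph.client.admin.keyring
# user and key for osd.0
ceph auth get-or-create osd.0 mon 'allow rwx' osd 'allow *' \
      -o /var/lib/ceph/osd/ceph-0/keyring
# user and key for mds.node1
ceph auth get-or-create mds.node1 mon 'allow rwx' osd 'allow *' mds 'allow' \
      -o /var/lib/ceph/mds/ceph-node1/keyring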
查看ceph集群中的认证用户及相关的key
ceph auth list
删除集群中的一个认证用户
ceph auth del osd.0
查看集群的详细配置
[root@node1 ~]# ceph daemon mon.node1 config show | more
查看集群健康状态细节
[root@admin ~]# ceph health detail
HEALTH_WARN 12 pgs down; 12 pgs peering; 12 pgs stuck inactive; 12 pgs stuck unclean
pg 3.3b is stuck inactive since forever, current state down+peering, last acting [1,2]
pg 3.36 is stuck inactive since forever, current state down+peering, last acting [1,2]
pg 3.79 is stuck inactive since forever, current state down+peering, last acting [1,0]
pg 3.5 is stuck inactive since forever, current state down+peering, last acting [1,2]
pg 3.30 is stuck inactive since forever, current state down+peering, last acting [1,2]
pg 3.1a is stuck inactive since forever, current state down+peering, last acting [1,0]
pg 3.2d is stuck inactive since forever, current state down+peering, last acting [1,0]
pg 3.16 is stuck inactive since forever, current state down+peering, last acting [1,2]
查看ceph log日志所在的目录
[root@node1 ~]# ceph-conf --name mon.node1 --show-config-value log_file
/var/log/ceph/ceph-mon.node1.log
mon
查看mon的状态信息
[root@client ~]# ceph mon stat
e2: 3 mons at {node1=10.240.240.211:6789/0,node2=10.240.240.212:6789/0,node3=10.240.240.213:6789/0}, election epoch 294, quorum 0,1,2 node1,node2,node3
查看mon的选举状态
[root@client ~]# ceph quorum_status
{"election_epoch":294,"quorum":[0,1,2],"quorum_names":["node1","node2","node3"],"quorum_leader_name":"node1","monmap":{"epoch":2,"fsid":"be1756f2-54f7-4d8f-8790-820c82721f17","modified":"2014-06-26 18:43:51.671106","created":"0.000000","mons":[{"rank":0,"name":"node1","addr":"10.240.240.211:6789/0"},{"rank":1,"name":"node2","addr":"10.240.240.212:6789/0"},{"rank":2,"name":"node3","addr":"10.240.240.213:6789/0"}]}}
查看mon的映射信息
[root@client ~]# ceph mon dump
dumped monmap epoch 2
epoch 2
fsid be1756f2-54f7-4d8f-8790-820c82721f17
last_changed 2014-06-26 18:43:51.671106
created 0.000000
0: 10.240.240.211:6789/0 mon.node1
1: 10.240.240.212:6789/0 mon.node2
2: 10.240.240.213:6789/0 mon.node3
删除一个mon节点
[root@node1 ~]# ceph mon remove node1
removed mon.node1 at 10.39.101.1:6789/0, there are now 3 monitors
2014-07-07 18:11:04.974188 7f4d16bfd700 0 monclient: hunting for new mon
获得一个正在运行的mon map,并保存在1.txt文件中
[root@node3 ~]# ceph mon getmap -o 1.txt
got monmap epoch 6
查看上面获得的map
[root@node3 ~]# monmaptool --print 1.txt
monmaptool: monmap file 1.txt
epoch 6
fsid 92552333-a0a8-41b8-8b45-c93a8730525e
last_changed 2014-07-07 18:22:51.927205
created 0.000000
0: 10.39.101.1:6789/0 mon.node1
1: 10.39.101.2:6789/0 mon.node2
2: 10.39.101.3:6789/0 mon.node3
[root@node3 ~]#
把上面的mon map注入新加入的节点
ceph-mon -i node4 --inject-monmap 1.txt
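Injecting a monmap is normally done with the target monitor stopped; a minimal sketch around the command above (the service invocation assumes the same sysvinit-style ceph service used elsewhere in this section):
service ceph stop mon.node4
ceph-mon -i node4 --inject-monmap 1.txt
service ceph start mon.node4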
View the mon admin socket
[root@node1 ~]# ceph-conf --name mon.node1 --show-config-value admin_socket
/var/run/ceph/ceph-mon.node1.asok
查看mon的详细状态
[root@node1 ~]# ceph daemon mon.node1 mon_status
{ "name": "node1",
"rank": 0,
"state": "leader",
"election_epoch": 96,
"quorum": [
0,
1,
2],
"outside_quorum": [],
"extra_probe_peers": [
"10.39.101.4:6789/0"],
"sync_provider": [],
"monmap": { "epoch": 6,
"fsid": "92552333-a0a8-41b8-8b45-c93a8730525e",
"modified": "2014-07-07 18:22:51.927205",
"created": "0.000000",
"mons": [
{ "rank": 0,
"name": "node1",
"addr": "10.39.101.1:6789/0"},
{ "rank": 1,
"name": "node2",
"addr": "10.39.101.2:6789/0"},
{ "rank": 2,
"name": "node3",
"addr": "10.39.101.3:6789/0"}]}
删除一个mon节点
[root@os-node1 ~]# ceph mon remove os-node1
removed mon.os-node1 at 10.40.10.64:6789/0, there are now 3 monitors
mds
Check mds status
[root@client ~]# ceph mds stat
e95: 1/1/1 up {0=node2=up:active}, 1 up:standby
View the mds map
[root@client ~]# ceph mds dump
dumped mdsmap epoch 95
epoch 95
flags 0
created 2014-06-26 18:41:57.686801
modified 2014-06-30 00:24:11.749967
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
last_failure 84
last_failure_osd_epoch 81
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap}
max_mds 1
in 0
up {0=5015}
failed
stopped
data_pools 0
metadata_pool 1
inline_data disabled
5015: 10.240.240.212:6808/3032 ‘node2’ mds.0.12 up:active seq 30
5012: 10.240.240.211:6807/3459 ‘node1’ mds.-1.0 up:standby seq 38
删除一个mds节点
[root@node1 ~]# ceph mds rm 0 mds.node1
mds gid 0 dne
osd
查看ceph osd运行状态
[root@client ~]# ceph osd stat
osdmap e88: 3 osds: 3 up, 3 in
查看osd映射信息
[root@client ~]# ceph osd dump
epoch 88
fsid be1756f2-54f7-4d8f-8790-820c82721f17
created 2014-06-26 18:41:57.687442
modified 2014-06-30 00:46:27.179793
flags
pool 0 ‘data’ replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 1 ‘metadata’ replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool stripe_width 0
pool 2 ‘rbd’ replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool stripe_width 0
pool 3 ‘jiayuan’ replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 73 owner 0 flags hashpspool stripe_width 0
max_osd 3
osd.0 up in weight 1 up_from 65 up_thru 75 down_at 64 last_clean_interval [53,55) 10.240.240.211:6800/3089 10.240.240.211:6801/3089 10.240.240.211:6802/3089 10.240.240.211:6803/3089 exists,up 8a24ad16-a483-4bac-a56a-6ed44ab74ff0
osd.1 up in weight 1 up_from 59 up_thru 74 down_at 58 last_clean_interval [31,55) 10.240.240.212:6800/2696 10.240.240.212:6801/2696 10.240.240.212:6802/2696 10.240.240.212:6803/2696 exists,up 8619c083-0273-4203-ba57-4b1dabb89339
osd.2 up in weight 1 up_from 62 up_thru 74 down_at 61 last_clean_interval [39,55) 10.240.240.213:6800/2662 10.240.240.213:6801/2662 10.240.240.213:6802/2662 10.240.240.213:6803/2662 exists,up f8107c04-35d7-4fb8-8c82-09eb885f0e58
[root@client ~]#
查看osd的目录树
[root@client ~]# ceph osd tree
-1 3 root default
-2 1 host node1
0 1 osd.0 up 1
-3 1 host node2
1 1 osd.1 up 1
-4 1 host node3
2 1 osd.2 up 1
down掉一个osd硬盘
[root@node1 ~]# ceph osd down 0 #down掉osd.0节点
在集群中删除一个osd硬盘
[root@node4 ~]# ceph osd rm 0
removed osd.0
在集群中删除一个osd 硬盘 crush map
[root@node1 ~]# ceph osd crush rm osd.0
在集群中删除一个osd的host节点
[root@node1 ~]# ceph osd crush rm node1
removed item id -2 name ‘node1’ from crush map
查看最大osd的个数
[root@node1 ~]# ceph osd getmaxosd
max_osd = 4 in epoch 514 #默认最大是4个osd节点
设置最大的osd的个数(当扩大osd节点的时候必须扩大这个值)
[root@node1 ~]# ceph osd setmaxosd 10
设置osd crush的权重为1.0
ceph osd crush set {id} {weight} [{loc1} [{loc2} …]]
例如:
[root@admin ~]# ceph osd crush set 3 3.0 host=node4
set item id 3 name ‘osd.3’ weight 3 at location {host=node4} to crush map
[root@admin ~]# ceph osd tree
-1 6 root default
-2 1 host node1
0 1 osd.0 up 1
-3 1 host node2
1 1 osd.1 up 1
-4 1 host node3
2 1 osd.2 up 1
-5 3 host node4
3 3 osd.3 up 0.5
或者用下面的方式
[root@admin ~]# ceph osd crush reweight osd.3 1.0
reweighted item id 3 name ‘osd.3’ to 1 in crush map
[root@admin ~]# ceph osd tree
-1 4 root default
-2 1 host node1
0 1 osd.0 up 1
-3 1 host node2
1 1 osd.1 up 1
-4 1 host node3
2 1 osd.2 up 1
-5 1 host node4
3 1 osd.3 up 0.5
设置osd的权重
[root@admin ~]# ceph osd reweight 3 0.5
reweighted osd.3 to 0.5 (8327682)
[root@admin ~]# ceph osd tree
-1 4 root default
-2 1 host node1
0 1 osd.0 up 1
-3 1 host node2
1 1 osd.1 up 1
-4 1 host node3
2 1 osd.2 up 1
-5 1 host node4
3 1 osd.3 up 0.5
把一个osd节点逐出集群
[root@admin ~]# ceph osd out osd.3
marked out osd.3.
[root@admin ~]# ceph osd tree
-1 4 root default
-2 1 host node1
0 1 osd.0 up 1
-3 1 host node2
1 1 osd.1 up 1
-4 1 host node3
2 1 osd.2 up 1
-5 1 host node4
3 1 osd.3 up 0 # osd.3的reweight变为0了就不再分配数据,但是设备还是存活的
把逐出的osd加入集群
[root@admin ~]# ceph osd in osd.3
marked in osd.3.
[root@admin ~]# ceph osd tree
-1 4 root default
-2 1 host node1
0 1 osd.0 up 1
-3 1 host node2
1 1 osd.1 up 1
-4 1 host node3
2 1 osd.2 up 1
-5 1 host node4
3 1 osd.3 up 1
暂停osd (暂停后整个集群不再接收数据)
[root@admin ~]# ceph osd pause
set pauserd,pausewr
再次开启osd (开启后再次接收数据)
[root@admin ~]# ceph osd unpause
unset pauserd,pausewr
查看一个集群osd.2参数的配置
ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok config show | less
PG组
查看pg组的映射信息
[root@client ~]# ceph pg dump
dumped all in format plain
version 1164
stamp 2014-06-30 00:48:29.754714
last_osdmap_epoch 88
last_pg_scan 73
full_ratio 0.95
nearfull_ratio 0.85
pg_stat objects mip degr unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
0.3f 39 0 0 0 163577856 128 128 active+clean 2014-06-30 00:30:59.193479 52’128 88:242 [0,2] 0 [0,2] 0 44’25 2014-06-29 22:25:25.282347 0’0 2014-06-26 19:52:08.521434
3.3c 0 0 0 0 0 0 0 active+clean 2014-06-30 00:15:38.675465 0’0 88:21 [2,1] 2 [2,1] 2 0’0 2014-06-30 00:15:04.295637 0’0 2014-06-30 00:15:04.295637
2.3c 0 0 0 0 0 0 0 active+clean 2014-06-30 00:10:48.583702 0’0 88:46 [2,1] 2 [2,1] 2 0’0 2014-06-29 22:29:13.701625 0’0 2014-06-26 19:52:08.845944
1.3f 2 0 0 0 452 2 2 active+clean 2014-06-30 00:10:48.596050 16’2 88:66 [2,1] 2 [2,1] 2 16’2 2014-06-29 22:28:03.570074 0’0 2014-06-26 19:52:08.655292
0.3e 31 0 0 0 130023424 130 130 active+clean 2014-06-30 00:26:22.803186 52’130 88:304 [2,0] 2 [2,0] 2 44’59 2014-06-29 22:26:41.317403 0’0 2014-06-26 19:52:08.518978
3.3d 0 0 0 0 0 0 0 active+clean 2014-06-30 00:16:57.548803 0’0 88:20 [0,2] 0 [0,2] 0 0’0 2014-06-30 00:15:19.101314 0’0 2014-06-30 00:15:19.101314
2.3f 0 0 0 0 0 0 0 active+clean 2014-06-30 00:10:58.750476 0’0 88:106 [0,2] 0 [0,2] 0 0’0 2014-06-29 22:27:44.604084 0’0 2014-06-26 19:52:08.864240
1.3c 1 0 0 0 0 1 1 active+clean 2014-06-30 00:10:48.939358 16’1 88:66 [1,2] 1 [1,2] 1 16’1 2014-06-29 22:27:35.991845 0’0 2014-06-26 19:52:08.646470
0.3d 34 0 0 0 142606336 149 149 active+clean 2014-06-30 00:23:57.348657 52’149 88:300 [0,2] 0 [0,2] 0 44’57 2014-06-29 22:25:24.279912 0’0 2014-06-26 19:52:08.514526
3.3e 0 0 0 0 0 0 0 active+clean 2014-06-30 00:15:39.554742 0’0 88:21 [2,1] 2 [2,1] 2 0’0 2014-06-30 00:15:04.296812 0’0 2014-06-30 00:15:04.296812
2.3e 0 0 0 0 0 0 0 active+clean 2014-06-30 00:10:48.592171 0’0 88:46 [2,1] 2 [2,1] 2 0’0 2014-06-29 22:29:14.702209 0’0 2014-06-26 19:52:08.855382
1.3d 0 0 0 0 0 0 0 active+clean 2014-06-30 00:10:48.938971 0’0 88:58 [1,2] 1 [1,2] 1 0’0 2014-06-29 22:27:36.971820 0’0 2014-06-26 19:52:08.650070
0.3c 41 0 0 0 171966464 157 157 active+clean 2014-06-30 00:24:55.751252 52’157 88:385 [1,0] 1 [1,0] 1 44’41 2014-06-29 22:26:34.829858 0’0 2014-06-26 19:52:08.513798
3.3f 0 0 0 0 0 0 0 active+clean 2014-06-30 00:17:08.416756 0’0 88:20 [0,1] 0 [0,1] 0 0’0 2014-06-30 00:15:19.406120 0’0 2014-06-30 00:15:19.406120
2.39 0 0 0 0 0 0 0 active+clean 2014-06-30 00:10:58.784789 0’0 88:71 [2,0] 2 [2,0] 2 0’0 2014-06-29 22:29:10.673549 0’0 2014-06-26 19:52:08.834644
1.3a 0 0 0 0 0 0 0 active+clean 2014-06-30 00:10:58.738782 0’0 88:106 [0,2] 0 [0,2] 0 0’0 2014-06-29 22:26:29.457318 0’0 2014-06-26 19:52:08.642018
0.3b 37 0 0 0 155189248 137 137 active+clean 2014-06-30 00:28:45.021993 52’137 88:278 [0,2] 0 [0,2] 0 44’40 2014-06-29 22:25:22.275783 0’0 2014-06-26 19:52:08.510502
3.38 0 0 0 0 0 0 0 active+clean 2014-06-30 00:16:13.222339 0’0 88:21 [1,0] 1 [1,0] 1 0’0 2014-06-30 00:15:05.446639 0’0 2014-06-30 00:15:05.446639
2.38 0 0 0 0 0 0 0 active+clean 2014-06-30 00:10:58.783103 0’0 88:71 [2,0] 2 [2,0] 2 0’0 2014-06-29 22:29:06.688363 0’0 2014-06-26 19:52:08.827342
1.3b 0 0 0 0 0 0 0 active+clean 2014-06-30 00:10:58.857283 0’0 88:78 [1,0] 1 [1,0] 1 0’0 2014-06-29 22:27:30.017050 0’0 2014-06-26 19:52:08.644820
0.3a 40 0 0 0 167772160 149 149 active+clean 2014-06-30 00:28:47.002342 52’149 88:288 [0,2] 0 [0,2] 0 44’46 2014-06-29 22:25:21.273679 0’0 2014-06-26 19:52:08.508654
3.39 0 0 0 0 0 0 0 active+clean 2014-06-30 00:16:13.255056 0’0 88:21 [1,0] 1 [1,0] 1 0’0 2014-06-30 00:15:05.447461 0’0 2014-06-30 00:15:05.447461
2.3b 0 0 0 0 0 0 0 active+clean 2014-06-30 00:10:48.935872 0’0 88:57 [1,2] 1 [1,2] 1 0’0 2014-06-29 22:28:35.095977 0’0 2014-06-26 19:52:08.844571
1.38 0 0 0 0 0 0 0 active+clean 2014-06-30 00:10:48.597540 0’0 88:46 [2,1] 2 [2,1] 2 0’0 2014-06-29 22:28:01.519137 0’0 2014-06-26 19:52:08.633781
0.39 48 0 0 0 201326592 164 164 active+clean 2014-06-30 00:25:30.757843 52’164 88:432 [1,0] 1 [1,0] 1 44’32 2014-06-29 22:26:33.823947 0’0 2014-06-26 19:52:08.504628
下面部分省略
查看一个PG的map
[root@client ~]# ceph pg map 0.3f
osdmap e88 pg 0.3f (0.3f) -> up [0,2] acting [0,2] #其中的[0,2]代表存储在osd.0、osd.2节点,osd.0代表主副本的存储位置
查看PG状态
[root@client ~]# ceph pg stat
v1164: 448 pgs: 448 active+clean; 10003 MB data, 23617 MB used, 37792 MB / 61410 MB avail
查询一个pg的详细信息
[root@client ~]# ceph pg 0.26 query
查看pg中stuck的状态
[root@client ~]# ceph pg dump_stuck unclean
ok
[root@client ~]# ceph pg dump_stuck inactive
ok
[root@client ~]# ceph pg dump_stuck stale
ok
显示一个集群中的所有的pg统计
ceph pg dump --format plain
恢复一个丢失的pg
ceph pg {pg-id} mark_unfound_lost revert
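Before reverting, it can help to see which objects are actually unfound; a sketch, with {pg-id} as a placeholder:
ceph health detail | grep unfound      # which PGs report unfound objects
ceph pg {pg-id} list_missing           # list the missing/unfound objects in that PG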
显示非正常状态的pg
ceph pg dump_stuck inactive|unclean|stale
pool
查看ceph集群中的pool数量
[root@admin ~]# ceph osd lspools
0 data,1 metadata,2 rbd,
在ceph集群中创建一个pool
ceph osd pool create jiayuan 100 # here 100 is pg_num, the number of placement groups for the pool
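pgp_num can also be given together with pg_num at creation time; a sketch (the pool name jiayuan2 is only an example):
ceph osd pool create jiayuan2 100 100     # pg_num=100, pgp_num=100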
为一个ceph pool配置配额
ceph osd pool set-quota data max_objects 10000
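Quotas can also be expressed in bytes, and the current quota can be read back; for example (the 10 GB value is arbitrary):
ceph osd pool set-quota data max_bytes 10737418240   # 10 GB
ceph osd pool get-quota data                          # show current quotas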
在集群中删除一个pool
ceph osd pool delete jiayuan jiayuan --yes-i-really-really-mean-it # the pool name must be given twice
显示集群中pool的详细信息
[root@admin ~]# rados df
pool name category KB objects clones degraded unfound rd rd KB wr wr KB
data - 475764704 116155 0 0 0 0 0 116379 475764704
metadata - 5606 21 0 0 0 0 0 314 5833
rbd - 0 0 0 0 0 0 0 0 0
total used 955852448 116176
total avail 639497596
total space 1595350044
[root@admin ~]#
给一个pool创建一个快照
[root@admin ~]# ceph osd pool mksnap data date-snap
created pool data snap date-snap
删除pool的快照
[root@admin ~]# ceph osd pool rmsnap data date-snap
removed pool data snap date-snap
查看data池的pg数量
[root@admin ~]# ceph osd pool get data pg_num
pg_num: 64
Set target_max_bytes on the data pool to 100 TB
[root@admin ~]# ceph osd pool set data target_max_bytes 100000000000000
set pool 0 target_max_bytes to 100000000000000
设置data池的副本数是3
[root@admin ~]# ceph osd pool set data size 3
set pool 0 size to 3
设置data池能接受写操作的最小副本为2
[root@admin ~]# ceph osd pool set data min_size 2
set pool 0 min_size to 2
查看集群中所有pool的副本尺寸
[root@admin mycephfs]# ceph osd dump | grep 'replicated size'
pool 0 ‘data’ replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 26 owner 0 flags hashpspool crash_replay_interval 45 target_bytes 100000000000000 stripe_width 0
pool 1 ‘metadata’ replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool stripe_width 0
pool 2 ‘rbd’ replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool stripe_width 0
设置一个pool的pg数量
[root@admin ~]# ceph osd pool set data pg_num 100
set pool 0 pg_num to 100
设置一个pool的pgp数量
[root@admin ~]# ceph osd pool set data pgp_num 100
set pool 0 pgp_num to 100
rados指令
查看ceph集群中有多少个pool (只是查看pool)
[root@node-44 ~]# rados lspools
data
metadata
rbd
images
volumes
.rgw.root
compute
.rgw.control
.rgw
.rgw.gc
.users.uid
查看ceph集群中有多少个pool,并且每个pool容量及利用情况
[root@node-44 ~]# rados df
pool name category KB objects clones degraded unfound rd rd KB wr wr KB
.rgw - 0 0 0 0 0 0 0 0 0
.rgw.control - 0 8 0 0 0 0 0 0 0
.rgw.gc - 0 32 0 0 0 57172 57172 38136 0
.rgw.root - 1 4 0 0 0 75 46 10 10
.users.uid - 1 1 0 0 0 0 0 2 1
compute - 67430708 16506 0 0 0 398128 75927848 1174683 222082706
data - 0 0 0 0 0 0 0 0 0
images - 250069744 30683 0 0 0 50881 195328724 65025 388375482
metadata - 0 0 0 0 0 0 0 0 0
rbd - 0 0 0 0 0 0 0 0 0
volumes - 79123929 19707 0 0 0 2575693 63437000 1592456 163812172
total used 799318844 66941
total avail 11306053720
total space 12105372564
[root@node-44 ~]#
创建一个pool
[root@node-44 ~]#rados mkpool test
查看ceph pool中的ceph object (这里的object是以块形式存储的)
[root@node-44 ~]# rados ls -p volumes | more
rbd_data.348f21ba7021.0000000000000866
rbd_data.32562ae8944a.0000000000000c79
rbd_data.589c2ae8944a.00000000000031ba
rbd_data.58c9151ff76b.00000000000029af
rbd_data.58c9151ff76b.0000000000002c19
rbd_data.58c9151ff76b.0000000000000a5a
rbd_data.58c9151ff76b.0000000000001c69
rbd_data.58c9151ff76b.000000000000281d
rbd_data.58c9151ff76b.0000000000002de1
rbd_data.58c9151ff76b.0000000000002dae
创建一个对象object
[root@admin-node ~]# rados create test-object -p test
[root@admin-node ~]# rados -p test ls
test-object
删除一个对象
[root@admin-node ~]# rados rm test-object-1 -p test
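rados can also read and write object contents directly, which is handy for quick tests; a sketch using a scratch file (object and file names are arbitrary):
rados put test-object-2 /etc/hosts -p test       # upload a local file as an object
rados get test-object-2 /tmp/hosts.copy -p test  # download it back
rados stat test-object-2 -p test                 # object size and mtime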
rbd命令的用法
查看ceph中一个pool里的所有镜像
[root@node-44 ~]# rbd ls images
2014-05-24 17:17:37.043659 7f14caa6e700 0 – :/1025604 >> 10.49.101.9:6789/0 pipe(0x6c5400 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x6c5660).fault
2182d9ac-52f4-4f5d-99a1-ab3ceacbf0b9
34e1a475-5b11-410c-b4c4-69b5f780f03c
476a9f3b-4608-4ffd-90ea-8750e804f46e
60eae8bf-dd23-40c5-ba02-266d5b942767
72e16e93-1fa5-4e11-8497-15bd904eeffe
74cb427c-cee9-47d0-b467-af217a67e60a
8f181a53-520b-4e22-af7c-de59e8ccca78
9867a580-22fe-4ed0-a1a8-120b8e8d18f4
ac6f4dae-4b81-476d-9e83-ad92ff25fb13
d20206d7-ff31-4dce-b59a-a622b0ea3af6
[root@node-44 ~]# rbd ls volumes
2014-05-24 17:22:18.649929 7f9e98733700 0 – :/1010725 >> 10.49.101.9:6789/0 pipe(0x96a400 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x96a660).fault
volume-0788fc6c-0dd4-4339-bad4-e9d78bd5365c
volume-0898c5b4-4072-4cae-affc-ec59c2375c51
volume-2a1fb287-5666-4095-8f0b-6481695824e2
volume-35c6aad4-8ea4-4b8d-95c7-7c3a8e8758c5
volume-814494cc-5ae6-4094-9d06-d844fdf485c4
volume-8a6fb0db-35a9-4b3b-9ace-fb647c2918ea
volume-8c108991-9b03-4308-b979-51378bba2ed1
volume-8cf3d206-2cce-4579-91c5-77bcb4a8a3f8
volume-91fc075c-8bd1-41dc-b5ef-844f23df177d
volume-b1263d8b-0a12-4b51-84e5-74434c0e73aa
volume-b84fad5d-16ee-4343-8630-88f265409feb
volume-c03a2eb1-06a3-4d79-98e5-7c62210751c3
volume-c17bf6c0-80ba-47d9-862d-1b9e9a48231e
volume-c32bca55-7ec0-47ce-a87e-a883da4b4ccd
volume-df8961ef-11d6-4dae-96ee-f2df8eb4a08c
volume-f1c38695-81f8-44fd-9af0-458cddf103a3
查看ceph pool中一个镜像的信息
[root@node-44 ~]# rbd info -p images --image 74cb427c-cee9-47d0-b467-af217a67e60a
rbd image ‘74cb427c-cee9-47d0-b467-af217a67e60a’:
size 1048 MB in 131 objects
order 23 (8192 KB objects)
block_name_prefix: rbd_data.95c7783fc0d0
format: 2
features: layering
在test池中创建一个命名为zhanguo的10000M的镜像
[root@node-44 ~]# rbd create -p test --size 10000 zhanguo
[root@node-44 ~]# rbd -p test info zhanguo #查看新建的镜像的信息
rbd image ‘zhanguo’:
size 10000 MB in 2500 objects
order 22 (4096 KB objects)
block_name_prefix: rb.0.127d2.2ae8944a
format: 1
[root@node-44 ~]#
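The image above ends up as format 1, which does not support snapshot cloning; to get layering support one would typically create a format 2 image instead. A sketch (the image name zhanguo2 is only an example):
rbd create -p test --size 10000 --image-format 2 zhanguo2
rbd -p test info zhanguo2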
删除一个镜像
[root@node-44 ~]# rbd rm -p test lizhanguo
Removing image: 100% complete…done.
调整一个镜像的尺寸
[root@node-44 ~]# rbd resize -p test --size 20000 zhanguo
Resizing image: 100% complete…done.
[root@node-44 ~]# rbd -p test info zhanguo #调整后的镜像大小
rbd image ‘zhanguo’:
size 20000 MB in 5000 objects
order 22 (4096 KB objects)
block_name_prefix: rb.0.127d2.2ae8944a
format: 1
[root@node-44 ~]#
给一个镜像创建一个快照
[root@node-44 ~]# rbd snap create test/zhanguo@zhanguo123 #池/镜像@快照
[root@node-44 ~]# rbd snap ls -p test zhanguo
SNAPID NAME SIZE
2 zhanguo123 20000 MB
[root@node-44 ~]#
[root@node-44 ~]# rbd info test/zhanguo@zhanguo123
rbd image ‘zhanguo’:
size 20000 MB in 5000 objects
order 22 (4096 KB objects)
block_name_prefix: rb.0.127d2.2ae8944a
format: 1
protected: False
[root@node-44 ~]#
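For format 2 images, a snapshot can additionally be protected and cloned (the zhanguo image above is format 1, so this is only a general sketch against a hypothetical format 2 image):
rbd snap protect test/zhanguo2@snap1              # protect the snapshot
rbd clone test/zhanguo2@snap1 test/zhanguo2-clone # create a copy-on-write clone
rbd children test/zhanguo2@snap1                  # list clones of the snapshot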
查看一个镜像文件的快照
[root@os-node101 ~]# rbd snap ls -p volumes volume-7687988d-16ef-4814-8a2c-3fbd85e928e4
SNAPID NAME SIZE
5 snapshot-ee7862aa-825e-4004-9587-879d60430a12 102400 MB
Delete a single snapshot of an image
[root@os-node101 ~]# rbd snap rm volumes/volume-7687988d-16ef-4814-8a2c-3fbd85e928e4@snapshot-ee7862aa-825e-4004-9587-879d60430a12
rbd: snapshot ‘snapshot-60586eba-b0be-4885-81ab-010757e50efb’ is protected from removal.
2014-08-18 19:23:42.099301 7fd0245ef760 -1 librbd: removing snapshot from header failed: (16) Device or resource busy
The error above means this snapshot is write-protected; the commands below first remove the protection and then delete the snapshot.
[root@os-node101 ~]# rbd snap unprotect volumes/volume-7687988d-16ef-4814-8a2c-3fbd85e928e4@snapshot-ee7862aa-825e-4004-9587-879d60430a12
[root@os-node101 ~]# rbd snap rm volumes/volume-7687988d-16ef-4814-8a2c-3fbd85e928e4@snapshot-ee7862aa-825e-4004-9587-879d60430a12
删除一个镜像文件的所有快照
[root@os-node101 ~]# rbd snap purge -p volumes volume-7687988d-16ef-4814-8a2c-3fbd85e928e4
Removing all snapshots: 100% complete…done.
把ceph pool中的一个镜像导出
导出镜像
[root@node-44 ~]# rbd export -p images --image 74cb427c-cee9-47d0-b467-af217a67e60a /root/aaa.img
2014-05-24 17:16:15.197695 7ffb47a9a700 0 – :/1020493 >> 10.49.101.9:6789/0 pipe(0x1368400 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x1368660).fault
Exporting image: 100% complete…done.
导出云硬盘
[root@node-44 ~]# rbd export -p volumes --image volume-470fee37-b950-4eef-a595-d7def334a5d6 /var/lib/glance/ceph-pool/volumes/Message-JiaoBenJi-10.40.212.24
2014-05-24 17:28:18.940402 7f14ad39f700 0 – :/1032237 >> 10.49.101.9:6789/0 pipe(0x260a400 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x260a660).fault
Exporting image: 100% complete…done.
Import an image into Ceph (an image imported directly like this is not usable by the cloud, because it has not gone through OpenStack and OpenStack cannot see it)
[root@node-44 ~]# rbd import /root/aaa.img -p images --image 74cb427c-cee9-47d0-b467-af217a67e60a
Importing image: 100% complete…done.
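If the imported image is meant to be snapshotted and cloned later (as Glance does with its images), it would typically be imported as a format 2 image; a sketch with an arbitrary destination name (the image would still need to be registered through Glance before OpenStack can use it):
rbd import --image-format 2 /root/aaa.img images/aaa-test
rbd info images/aaa-test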