The Ceph project provides pre-built RPM packages for a number of common platforms. For CentOS 7 they can be downloaded from: http://download.ceph.com/rpm-luminous/el7/x86_64/
Which packages you need depends on the features you want, so there is usually no need to download everything. I downloaded the following RPMs:
ceph-12.1.3-0.el7.x86_64.rpm
ceph-base-12.1.3-0.el7.x86_64.rpm
ceph-common-12.1.3-0.el7.x86_64.rpm
ceph-mds-12.1.3-0.el7.x86_64.rpm
ceph-mgr-12.1.3-0.el7.x86_64.rpm
ceph-mon-12.1.3-0.el7.x86_64.rpm
ceph-osd-12.1.3-0.el7.x86_64.rpm
ceph-radosgw-12.1.3-0.el7.x86_64.rpm
ceph-selinux-12.1.3-0.el7.x86_64.rpm
libcephfs2-12.1.3-0.el7.x86_64.rpm
librados2-12.1.3-0.el7.x86_64.rpm
librados-devel-12.1.3-0.el7.x86_64.rpm
libradosstriper1-12.1.3-0.el7.x86_64.rpm
libradosstriper-devel-12.1.3-0.el7.x86_64.rpm
librbd1-12.1.3-0.el7.x86_64.rpm
librgw2-12.1.3-0.el7.x86_64.rpm
python-cephfs-12.1.3-0.el7.x86_64.rpm
python-rados-12.1.3-0.el7.x86_64.rpm
python-rbd-12.1.3-0.el7.x86_64.rpm
python-rgw-12.1.3-0.el7.x86_64.rpm
Install them with rpm, roughly in dependency order:
rpm -hiv librados2-12.1.3-0.el7.x86_64.rpm
rpm -hiv python-rados-12.1.3-0.el7.x86_64.rpm
rpm -hiv librbd1-12.1.3-0.el7.x86_64.rpm
rpm -hiv python-rbd-12.1.3-0.el7.x86_64.rpm
rpm -hiv libcephfs2-12.1.3-0.el7.x86_64.rpm
rpm -hiv python-cephfs-12.1.3-0.el7.x86_64.rpm
rpm -hiv librgw2-12.1.3-0.el7.x86_64.rpm
rpm -hiv librados-devel-12.1.3-0.el7.x86_64.rpm
rpm -hiv libradosstriper1-12.1.3-0.el7.x86_64.rpm
rpm -hiv libradosstriper-devel-12.1.3-0.el7.x86_64.rpm
rpm -hiv python-rgw-12.1.3-0.el7.x86_64.rpm
rpm -hiv ceph-common-12.1.3-0.el7.x86_64.rpm
rpm -hiv ceph-selinux-12.1.3-0.el7.x86_64.rpm ceph-base-12.1.3-0.el7.x86_64.rpm
rpm -hiv ceph-osd-12.1.3-0.el7.x86_64.rpm
rpm -hiv ceph-mon-12.1.3-0.el7.x86_64.rpm
rpm -hiv ceph-mds-12.1.3-0.el7.x86_64.rpm
rpm -hiv ceph-mgr-12.1.3-0.el7.x86_64.rpm
rpm -hiv ceph-12.1.3-0.el7.x86_64.rpm
rpm -hiv ceph-radosgw-12.1.3-0.el7.x86_64.rpm
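After installation, it is worth verifying that the packages and the expected version are in place, for example:
rpm -qa | grep ceph
ceph --version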
In this example we manually deploy a single-node Ceph cluster named testcluster on node2. The machine has two NICs, 192.168.100.132 and 192.168.73.132, which serve as the cluster's public network and cluster network respectively. It also has three disks that will be used as OSDs.
To make the role of the cluster name visible, I do not use the default cluster name ceph but testcluster instead. Many commands take a --cluster {cluster-name} option and use it to locate the configuration file {cluster-name}.conf. When the option is omitted, the default cluster name ceph is assumed, which is why in most deployments the configuration file is simply ceph.conf and no explicit --cluster ceph option is needed.
In addition, for convenience I run everything as the root user.
For these two reasons, the following systemd unit files need to be modified:
/usr/lib/systemd/system/[email protected]
/usr/lib/systemd/system/[email protected]
/usr/lib/systemd/system/[email protected]
/usr/lib/systemd/system/[email protected]
/usr/lib/systemd/system/[email protected]
Make the following changes in each of them:
Environment=CLUSTER=ceph <--- change to CLUSTER=testcluster
ExecStart=/usr/bin/... --id %i --setuser ceph --setgroup ceph <--- remove --setuser ceph --setgroup ceph
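Taking [email protected] as an example, after the edit the two lines should read roughly as follows (the exact ExecStart options vary by package version, so treat this as a sketch), and systemd has to be told to reload the unit files:
Environment=CLUSTER=testcluster
ExecStart=/usr/bin/ceph-mon -f --cluster ${CLUSTER} --id %i
systemctl daemon-reload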
Next, create the cluster configuration file (following the --cluster convention above, /etc/ceph/testcluster.conf):
[global]
cluster = testcluster
fsid = a7f64266-0894-4f1e-a635-d0aeaca0e993
mon initial members = node2
mon host = 192.168.100.132
public network = 192.168.100.0/24
cluster network = 192.168.73.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 1024
osd pool default size = 1
osd pool default min size = 1
osd pool default pg num = 33
osd pool default pgp num = 33
osd crush chooseleaf type = 1
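If you want to double-check how a daemon will resolve values from this file, ceph-conf can help; assuming the settings above, the following should print the expanded {mon data} path /var/lib/ceph/mon/testcluster-node2:
ceph-conf --cluster testcluster --name mon.node2 --show-config-value mon_data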
In Ceph, OSDs, monitors, MDSs, clients, and so on are all abstracted as users. A user can be a person or a program module. Each user is identified by a type and an id: for example, client.admin has type client and id admin, and osd.3 has type osd and id 3. Each user has a key and a set of permissions, and the permissions are described by capabilities (caps).
Permissions (users and their keys) can be managed in two ways: by generating keyring files locally with ceph-authtool, or by creating and modifying users on a running cluster with the ceph auth commands.
Here we take the first approach: generate the keyring files now, and bring them in later when the cluster is created.
ceph-authtool --create-keyring /tmp/testcluster.mon.keyring --gen-key -n mon. --cap mon 'allow *'
ceph-authtool --create-keyring /etc/ceph/testcluster.client.admin.keyring --gen-key -n client.admin --set-uid=0 --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *' --cap mgr 'allow *'
ceph-authtool /tmp/testcluster.mon.keyring --import-keyring /etc/ceph/testcluster.client.admin.keyring
ceph-authtool --create-keyring /etc/ceph/testcluster.client.bootstrap-osd.keyring --gen-key -n client.bootstrap-osd --cap mon 'allow profile bootstrap-osd'
ceph-authtool /tmp/testcluster.mon.keyring --import-keyring /etc/ceph/testcluster.client.bootstrap-osd.keyring
# cat testcluster.client.admin.keyring
[client.admin]
key = AQBhVpFZr7x8MBAAMaLBiv5Zvkcg+S9oD+pEBA==
auid = 0
caps mds = "allow *"
caps mgr = "allow *"
caps mon = "allow *"
caps osd = "allow *"
# cat testcluster.client.bootstrap-osd.keyring
[client.bootstrap-osd]
key = AQDaVpFZ2z7vKhAARWoHu4u75lrE1gfDFoLjCg==
caps mon = "allow profile bootstrap-osd"
# cat /tmp/testcluster.mon.keyring
[mon.]
key = AQATVpFZ4/XdNBAACvFjCBjGz1G4m1WIum8+Jw==
caps mon = "allow *"
[client.admin]
key = AQBhVpFZr7x8MBAAMaLBiv5Zvkcg+S9oD+pEBA==
auid = 0
caps mds = "allow *"
caps mgr = "allow *"
caps mon = "allow *"
caps osd = "allow *"
[client.bootstrap-osd]
key = AQDaVpFZ2z7vKhAARWoHu4u75lrE1gfDFoLjCg==
caps mon = "allow profile bootstrap-osd"
Next, generate the initial monitor map:
monmaptool --create --add node2 192.168.100.132 --fsid a7f64266-0894-4f1e-a635-d0aeaca0e993 /tmp/monmap
In a single-node setup there is only one monitor. Like the keyrings, the monmap is just a local file for now; it will be brought in later when the cluster is created. Its contents can be inspected with monmaptool --print:
# monmaptool --print /tmp/monmap
monmaptool: monmap file /tmp/monmap
epoch 0
fsid a7f64266-0894-4f1e-a635-d0aeaca0e993
last_changed 2017-08-14 07:55:57.114621
created 2017-08-14 07:55:57.114621
0: 192.168.100.132:6789/0 mon.node2
The default value of the {mon data} option is /var/lib/ceph/mon/$cluster-$id (it can of course be changed to another directory in testcluster.conf). Since our cluster is named testcluster and the monitor id is node2, we create the following directory:
mkdir /var/lib/ceph/mon/testcluster-node2
Next, initialize the monitor, i.e. generate its initial files in {mon data}. This requires the {mon data} directory to exist, which is why we created it above. Note that the keyring and monmap generated earlier are passed in here:
ceph-mon --cluster testcluster --mkfs -i node2 --monmap /tmp/monmap --keyring /tmp/testcluster.mon.keyring
On success, a set of initial files appears under the {mon data} directory:
# ls /var/lib/ceph/mon/testcluster-node2
keyring kv_backend store.db
Create an empty done file to mark the monitor data directory as fully initialized, then start the monitor:
touch /var/lib/ceph/mon/testcluster-node2/done
systemctl start ceph-mon@node2
At this point we can check that the daemon is running and look at the cluster status:
# ps -ef |grep ceph
root 67290 1 2 08:10 ? 00:00:00 /usr/bin/ceph-mon -f --cluster testcluster --id node2
# ceph --cluster testcluster -s
cluster:
id: a7f64266-0894-4f1e-a635-d0aeaca0e993
health: HEALTH_OK
services:
mon: 1 daemons, quorum node2
mgr: no daemons active
osd: 0 osds: 0 up, 0 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 bytes
usage: 0 kB used, 0 kB / 0 kB avail
pgs:
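Now that the monitor is up, you can also confirm that the users imported from /tmp/testcluster.mon.keyring are known to the cluster:
ceph --cluster testcluster auth list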
Here we add three OSDs, each in a different way. This is purely to illustrate the differences between the methods; in practice there is no need to mix them.
2.5.1.1 Delete all partitions on /dev/sdb
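The original does not show a command for this step; one way to wipe the disk (destructive, so double-check the device name) is, for example:
ceph-disk zap /dev/sdb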
2.5.1.2 Generate a UUID for the OSD
# uuidgen
a270cc4b-54e4-4d5f-ab6c-d31b3037b6c7
2.5.1.3 Generate a cephx key for the OSD
# ceph-authtool --gen-print-key
AQA/XJFZmeaDFRAAKpJ1o6XXnrC6cTMLws5GrA==
2.5.1.4 Allocate an OSD id (as shown below, the id allocated here is 0)
# echo "{\"cephx_secret\": \"AQA/XJFZmeaDFRAAKpJ1o6XXnrC6cTMLws5GrA==\"}" | ceph --cluster testcluster osd new a270cc4b-54e4-4d5f-ab6c-d31b3037b6c7 -i - -n client.bootstrap-osd -k /tmp/testcluster.mon.keyring
0
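The newly allocated id and the key registered by ceph osd new can be cross-checked against the cluster, for example:
ceph --cluster testcluster osd ls
ceph --cluster testcluster auth get osd.0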
2.5.1.5 Format and mount the disk
As with {mon data} above, the directory here is {osd data}; its default value is /var/lib/ceph/osd/$cluster-$id, and it can likewise be changed in testcluster.conf.
mkdir /var/lib/ceph/osd/testcluster-0
mkfs.xfs -f -i size=1024 /dev/sdb
mount -o rw,noatime,nobarrier,inode64,logbsize=256k,delaylog /dev/sdb /var/lib/ceph/osd/testcluster-0
2.5.1.6 Create a keyring for the OSD
ceph-authtool --create-keyring /var/lib/ceph/osd/testcluster-0/keyring --name osd.0 --add-key AQA/XJFZmeaDFRAAKpJ1o6XXnrC6cTMLws5GrA==
# cat /var/lib/ceph/osd/testcluster-0/keyring
[osd.0]
key = AQA/XJFZmeaDFRAAKpJ1o6XXnrC6cTMLws5GrA==
2.5.1.7 Initialize the OSD
As with the monitor initialization, this generates the files and directories the OSD needs in its {osd data} directory:
ceph-osd --cluster testcluster -i 0 --mkfs --osd-uuid a270cc4b-54e4-4d5f-ab6c-d31b3037b6c7
# ls /var/lib/ceph/osd/testcluster-0
ceph_fsid current fsid journal keyring magic ready store_version superblock type whoami
2.5.1.8 Start the OSD
systemctl start ceph-osd@0
# ceph --cluster testcluster -s
cluster:
id: a7f64266-0894-4f1e-a635-d0aeaca0e993
health: HEALTH_WARN
no active mgr
services:
mon: 1 daemons, quorum node2
mgr: no daemons active
osd: 1 osds: 1 up, 1 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 bytes
usage: 0 kB used, 0 kB / 0 kB avail
pgs:
# ceph --cluster testcluster daemon osd.0 config show | grep objectstore
"objectstore_blackhole": "false",
"osd_objectstore": "filestore", <-------- 手动方式,osd使用的filestore
"osd_objectstore_fuse": "false",
"osd_objectstore_tracing": "false",
2.5.2.1 Delete all partitions on /dev/sdc
2.5.2.2 prepare
ceph-disk prepare --cluster testcluster --cluster-uuid a7f64266-0894-4f1e-a635-d0aeaca0e993 /dev/sdc
This step splits /dev/sdc into two partitions, /dev/sdc1 and /dev/sdc2, and formats /dev/sdc1 with a filesystem (xfs by default; a different filesystem can be specified with "--fs-type").
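The partitions that ceph-disk created, and their Ceph roles, can be inspected with:
ceph-disk list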
2.5.2.3 activate
ceph-disk activate /dev/sdc1 --activate-key /etc/ceph/testcluster.client.bootstrap-osd.keyring
With this, the OSD is added and started in one step:
# mount | grep sdc
/dev/sdc1 on /var/lib/ceph/osd/testcluster-1 type xfs (rw,noatime,seclabel,attr2,inode64,noquota) <-------- sdc has been partitioned, formatted, and mounted
# ls /var/lib/ceph/osd/testcluster-1
activate.monmap active block block_uuid bluefs ceph_fsid fsid keyring kv_backend magic mkfs_done ready systemd type whoami
# ll /var/lib/ceph/osd/testcluster-1/block
lrwxrwxrwx. 1 ceph ceph 58 Aug 14 09:04 /var/lib/ceph/osd/testcluster-1/block -> /dev/disk/by-partuuid/1f29c7dc-7b4a-4644-9283-95bd265a77ed
# ll /dev/disk/by-partuuid/1f29c7dc-7b4a-4644-9283-95bd265a77ed
lrwxrwxrwx. 1 root root 10 Aug 14 09:06 /dev/disk/by-partuuid/1f29c7dc-7b4a-4644-9283-95bd265a77ed -> ../../sdc2 <----------- block is a symlink to sdc2
# ceph --cluster testcluster daemon osd.1 config show | grep objectstore
"objectstore_blackhole": "false",
"osd_objectstore": "bluestore", <-------------bluestore !!! bluestore建在osd/testcluster-1/block之上,它是sdc2的软链接
"osd_objectstore_fuse": "false",
"osd_objectstore_tracing": "false",
2.5.3.1 Partition /dev/sdd; we will only use /dev/sdd2
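No command is shown for this step either; one possible layout (an illustrative sketch using sgdisk, with arbitrary partition sizes) is:
sgdisk --zap-all /dev/sdd
sgdisk -n 1:0:+10G /dev/sdd
sgdisk -n 2:0:0 /dev/sdd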
2.5.3.2 prepare
ceph-disk prepare --cluster testcluster --cluster-uuid a7f64266-0894-4f1e-a635-d0aeaca0e993 /dev/sdd2
2.5.3.3 activate
ceph-disk activate /dev/sdd2 --activate-key /etc/ceph/testcluster.client.bootstrap-osd.keyring
Observe the result:
# mount | grep sdd
/dev/sdd2 on /var/lib/ceph/osd/testcluster-2 type xfs (rw,noatime,seclabel,attr2,inode64,noquota) <---- sdd2 has been formatted and mounted
# ll /var/lib/ceph/osd/testcluster-2/block
-rw-r--r--. 1 ceph ceph 10737418240 Aug 14 09:11 /var/lib/ceph/osd/testcluster-2/block <----- block is a regular file
# ceph --cluster testcluster daemon osd.2 config show | grep objectstore
"objectstore_blackhole": "false",
"osd_objectstore": "bluestore", <------------------ bluestore. !!! bluestore建在osd/testcluster-1/block之上,它是一个常规文件
"osd_objectstore_fuse": "false",
"osd_objectstore_tracing": "false",
Starting with Ceph 12 (Luminous), the manager daemon is required. A mgr should be added on every machine that runs a monitor; otherwise the cluster stays in a WARN state.
ceph --cluster testcluster auth get-or-create mgr.node2 mon 'allow profile mgr' osd 'allow *' mds 'allow *'
[mgr.node2]
key = AQCTa5FZ5z2SBxAAmcNNPFCi40jI+qi+Kyk2Pw==
Create the mgr data directory ({mgr data}, which defaults to /var/lib/ceph/mgr/$cluster-$id), save the key into its keyring, and start the mgr:
mkdir /var/lib/ceph/mgr/testcluster-node2/
ceph --cluster testcluster auth get-or-create mgr.node2 -o /var/lib/ceph/mgr/testcluster-node2/keyring
systemctl start ceph-mgr@node2
# ceph --cluster testcluster -s
cluster:
id: a7f64266-0894-4f1e-a635-d0aeaca0e993
health: HEALTH_OK <------ cluster is healthy
services:
mon: 1 daemons, quorum node2
mgr: node2(active) <------ mgr with id node2 is active
osd: 3 osds: 3 up, 3 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 bytes
usage: 3240 MB used, 88798 MB / 92038 MB avail
pgs:
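With the mgr active, you can list its modules and, if desired, enable one (the dashboard module, for instance, ships with Luminous):
ceph --cluster testcluster mgr module ls
ceph --cluster testcluster mgr module enable dashboard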