Ceph Distributed Storage: Architecture Analysis and Working Principles
ceph-node1
ceph-node2
ceph-node3
Network types:
NOTE: In this article the Deploy/MGMT Network and the Public Network are combined into one.
ceph-node1:
ceph-node2:
ceph-node3:
NOTE: Clients access the Ceph Cluster over the Public Network (e.g. curl osd listen), so when a Client cannot reach the storage cluster, start by checking the Public Network configuration.
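A quick connectivity check from the client side usually rules the Public Network in or out; a minimal sketch, assuming the default MON port 6789 and that nc/ss are available:
# Can the client reach a MON host over the Public Network?
$ ping -c 3 ceph-node1
$ nc -zv ceph-node1 6789
# On the Ceph node itself, confirm the daemons are listening on Public Network addresses
$ ss -tlnp | grep ceph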
[root@ceph-node1 ~]# uname -a
Linux ceph-node1 3.10.0-957.10.1.el7.x86_64 #1 SMP Mon Mar 18 15:06:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[Ceph]
name=Ceph packages for $basearch
baseurl=https://mirrors.tuna.tsinghua.edu.cn/ceph/rpm-mimic/el7/x86_64/
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
[Ceph-noarch]
name=Ceph noarch packages
baseurl=https://mirrors.tuna.tsinghua.edu.cn/ceph/rpm-mimic/el7/noarch/
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
[ceph-source]
name=Ceph source packages
baseurl=https://mirrors.tuna.tsinghua.edu.cn/ceph/rpm-mimic/el7/SRPMS/
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
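The repo definition above is assumed to be saved as /etc/yum.repos.d/ceph.repo (the standard yum location) and must exist on every node; for example:
# Hypothetical distribution of the repo file to the other nodes
$ scp /etc/yum.repos.d/ceph.repo root@ceph-node2:/etc/yum.repos.d/
$ scp /etc/yum.repos.d/ceph.repo root@ceph-node3:/etc/yum.repos.d/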
$ yum update -y
$ yum install -y vim wget
$ systemctl stop firewalld && systemctl disable firewalld && systemctl status firewalld
$ sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config && cat /etc/selinux/config
$ setenforce 0
$ sestatus
SELinux status: disabled
# /etc/hosts
# Ceph Cluster
172.18.22.234 ceph-node1
172.18.22.235 ceph-node2
172.18.22.236 ceph-node3
# OpenStack
172.18.22.231 controller
172.18.22.232 compute
[root@ceph-node1 ~]# cat /etc/chrony.conf
server controller iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
[root@ceph-node1 ~]# systemctl enable chronyd.service && systemctl start chronyd.service && systemctl status chronyd.service
[root@ceph-node1 ~]# chronyc sources
210 Number of sources = 1
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^* controller 3 6 17 5 +22us[ +165us] +/- 72ms
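It is worth confirming on every node that chrony has actually locked onto the source; clock skew between MONs later shows up as a HEALTH_WARN. A minimal check:
# Shows the current offset and stratum relative to the selected source
$ chronyc tracking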
$ yum install -y epel-release
$ ssh-keygen
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@ceph-node1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@ceph-node2
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@ceph-node3
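Before handing the nodes over to ceph-deploy, a quick sketch to confirm that password-less SSH really works:
# Each hostname should print without a password prompt
$ for node in ceph-node1 ceph-node2 ceph-node3; do ssh root@$node hostname; done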
yum install -y ceph-deploy
mkdir -pv /opt/ceph/deploy && cd /opt/ceph/deploy
NOTE: All ceph-deploy commands should be run from this directory.
ceph-deploy new ceph-node1 ceph-node2 ceph-node3
The key lines in the log output are:
[ceph_deploy.new][DEBUG ] Monitor initial members are ['ceph-node1', 'ceph-node2', 'ceph-node3']
[ceph_deploy.new][DEBUG ] Monitor addrs are ['172.18.22.234', '172.18.22.235', '172.18.22.236']
[ceph_deploy.new][DEBUG ] Creating a random mon key...
[ceph_deploy.new][DEBUG ] Writing monitor keyring to ceph.mon.keyring...
[ceph_deploy.new][DEBUG ] Writing initial config to ceph.conf...
[root@ceph-node1 deploy]# ls
ceph.conf ceph.log ceph.mon.keyring
[root@ceph-node1 deploy]# cat ceph.conf
[global]
fsid = d82f0b96-6a69-4f7f-9d79-73d5bac7dd6c
mon_initial_members = ceph-node1, ceph-node2, ceph-node3
mon_host = 172.18.22.234,172.18.22.235,172.18.22.236
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
Next, edit ceph.conf and append the network and tuning options shown below, then check it again:
[root@ceph-node1 deploy]# cat ceph.conf
[global]
fsid = d82f0b96-6a69-4f7f-9d79-73d5bac7dd6c
mon_initial_members = ceph-node1, ceph-node2, ceph-node3
mon_host = 172.18.22.234,172.18.22.235,172.18.22.236
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
# Network used for MON<->OSD, Client<->MON and Client<->OSD traffic
public_network = 172.18.22.0/24
# Network used for OSD-to-OSD traffic (replication and recovery)
cluster_network = 192.168.57.0/24
osd_pool_default_size = 3
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 8
osd_pool_default_pgp_num = 8
osd_crush_chooseleaf_type = 1
[mon]
mon_clock_drift_allowed = 0.5
[osd]
osd_mkfs_type = xfs
osd_mkfs_options_xfs = -f
filestore_max_sync_interval = 10
filestore_min_sync_interval = 5
filestore_fd_cache_size = 32768
osd op threads = 8
osd disk threads = 4
filestore op threads = 8
max_open_files = 655350
For a detailed explanation of these configuration options, see http://www.yangguanjun.com/2017/05/15/Ceph-configuration/
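The osd_pool_default_pg_num value above is intentionally small. The usual rule of thumb for sizing PGs (a rough sketch, not tuning advice) is:
# total PGs ≈ (number of OSDs × 100) / replica size, rounded to a power of two,
# then split across the pools you plan to create
$ echo $(( 9 * 100 / 3 ))   # 300 → round to 256 total PGs for this 9-OSD cluster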
ceph-deploy install ceph-node1 ceph-node2 ceph-node3
Initialize the MON on every node:
ceph-deploy mon create-initial
Each of the 3 Ceph Nodes now runs its own MON daemon:
[root@ceph-node1 ~]# systemctl status [email protected]
● [email protected] - Ceph cluster monitor daemon
Loaded: loaded (/usr/lib/systemd/system/[email protected]; enabled; vendor preset: disabled)
Active: active (running) since Tue 2019-04-23 05:14:08 EDT; 21min ago
Main PID: 17857 (ceph-mon)
CGroup: /system.slice/system-ceph\x2dmon.slice/[email protected]
└─17857 /usr/bin/ceph-mon -f --cluster ceph --id ceph-node1 --setuser ceph --setgroup ceph
Apr 23 05:14:08 ceph-node1 systemd[1]: Started Ceph cluster monitor daemon.
Check the generated configuration file and keyring files:
[root@ceph-node1 deploy]# ll
total 268
-rw------- 1 root root 113 Apr 23 05:14 ceph.bootstrap-mds.keyring
-rw------- 1 root root 113 Apr 23 05:14 ceph.bootstrap-mgr.keyring
-rw------- 1 root root 113 Apr 23 05:14 ceph.bootstrap-osd.keyring
-rw------- 1 root root 113 Apr 23 05:14 ceph.bootstrap-rgw.keyring
-rw------- 1 root root 151 Apr 23 05:14 ceph.client.admin.keyring
-rw-r--r-- 1 root root 771 Apr 23 04:01 ceph.conf
-rw-r--r-- 1 root root 205189 Apr 23 05:14 ceph-deploy-ceph.log
-rw-r--r-- 1 root root 35529 Apr 23 04:29 ceph.log
-rw------- 1 root root 73 Apr 23 03:55 ceph.mon.keyring
Copy the configuration file ceph.conf and the keyring file ceph.client.admin.keyring to all 3 nodes:
ceph-deploy --overwrite-conf admin ceph-node1 ceph-node2 ceph-node3
NOTE: Whenever ceph.conf is modified on any node, push it again with --overwrite-conf.
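If only ceph.conf changed, pushing just the configuration file is usually enough; a sketch using the config subcommand:
$ ceph-deploy --overwrite-conf config push ceph-node1 ceph-node2 ceph-node3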
At this point the Ceph Cluster status can be queried from a Client:
[root@ceph-node1 ~]# ceph status
cluster:
id: d82f0b96-6a69-4f7f-9d79-73d5bac7dd6c
health: HEALTH_OK
services:
# 3 MON daemons
mon: 3 daemons, quorum ceph-node1,ceph-node2,ceph-node3
mgr: no daemons active
osd: 0 osds: 0 up, 0 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs:
Note:
# Remove a MON
ceph-deploy mon destroy ceph-node1
# Add the MON back
ceph-deploy mon add ceph-node1
Ceph Manager (ceph-mgr) is a daemon that has been required since the Luminous (L) release; it takes over tracking of runtime cluster state, such as PG status, from the MONs.
ceph-deploy mgr create ceph-node1 ceph-node2 ceph-node3
Each of the 3 Ceph Nodes now runs its own Manager daemon:
[root@ceph-node1 deploy]# systemctl status ceph-mgr@ceph-node1
● [email protected] - Ceph cluster manager daemon
Loaded: loaded (/usr/lib/systemd/system/[email protected]; enabled; vendor preset: disabled)
Active: active (running) since Tue 2019-04-23 05:59:23 EDT; 8min ago
Main PID: 18637 (ceph-mgr)
CGroup: /system.slice/system-ceph\x2dmgr.slice/[email protected]
└─18637 /usr/bin/ceph-mgr -f --cluster ceph --id ceph-node1 --setuser ceph --setgroup ceph
Apr 23 05:59:23 ceph-node1 systemd[1]: Started Ceph cluster manager daemon.
Check the Ceph Cluster status again:
[root@ceph-node1 ~]# ceph -s
cluster:
id: d82f0b96-6a69-4f7f-9d79-73d5bac7dd6c
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-node1,ceph-node2,ceph-node3
# 3 Manager daemons: 1 active, 2 standby
mgr: ceph-node1(active), standbys: ceph-node2, ceph-node3
osd: 0 osds: 0 up, 0 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs:
Check the current quorum status of the cluster:
[root@ceph-node1 ~]# ceph quorum_status --format json-pretty
{
"election_epoch": 12,
"quorum": [
0,
1,
2
],
"quorum_names": [
"ceph-node1",
"ceph-node2",
"ceph-node3"
],
# The MON on ceph-node1 is currently the quorum leader
"quorum_leader_name": "ceph-node1",
"monmap": {
"epoch": 1,
"fsid": "d82f0b96-6a69-4f7f-9d79-73d5bac7dd6c",
"modified": "2019-04-23 05:14:08.122885",
"created": "2019-04-23 05:14:08.122885",
"features": {
"persistent": [
"kraken",
"luminous",
"mimic",
"osdmap-prune"
],
"optional": []
},
"mons": [
{
"rank": 0,
"name": "ceph-node1",
"addr": "172.18.22.234:6789/0",
"public_addr": "172.18.22.234:6789/0"
},
{
"rank": 1,
"name": "ceph-node2",
"addr": "172.18.22.235:6789/0",
"public_addr": "172.18.22.235:6789/0"
},
{
"rank": 2,
"name": "ceph-node3",
"addr": "172.18.22.236:6789/0",
"public_addr": "172.18.22.236:6789/0"
}
]
}
}
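The same information is available in condensed form; the jq pipeline below is optional and assumes jq is installed:
# One-line MON/quorum summary
$ ceph mon stat
# Extract just the leader name from the JSON output
$ ceph quorum_status -f json | jq -r '.quorum_leader_name'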
List the disk devices on a node:
[root@ceph-node1 deploy]# ceph-deploy disk list ceph-node1
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /usr/bin/ceph-deploy disk list ceph-node1
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] debug : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] subcommand : list
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] cd_conf :
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] host : ['ceph-node1']
[ceph_deploy.cli][INFO ] func :
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph-node1][DEBUG ] connected to host: ceph-node1
[ceph-node1][DEBUG ] detect platform information from remote host
[ceph-node1][DEBUG ] detect machine type
[ceph-node1][DEBUG ] find the location of an executable
[ceph-node1][INFO ] Running command: fdisk -l
[ceph-node1][INFO ] Disk /dev/sda: 21.5 GB, 21474836480 bytes, 41943040 sectors
[ceph-node1][INFO ] Disk /dev/sdc: 10.7 GB, 10737418240 bytes, 20971520 sectors
[ceph-node1][INFO ] Disk /dev/sdd: 10.7 GB, 10737418240 bytes, 20971520 sectors
[ceph-node1][INFO ] Disk /dev/sdb: 10.7 GB, 10737418240 bytes, 20971520 sectors
[ceph-node1][INFO ] Disk /dev/mapper/centos-root: 18.2 GB, 18249416704 bytes, 35643392 sectors
[ceph-node1][INFO ] Disk /dev/mapper/centos-swap: 2147 MB, 2147483648 bytes, 4194304 sectors
Here we use sdb, sdc and sdd for OSDs; to keep the operating system stable, sda is used only as the system disk.
As usual, wipe the disks first, destroying any existing partition tables and data:
ceph-deploy disk zap ceph-node1 /dev/sdb
ceph-deploy disk zap ceph-node1 /dev/sdc
ceph-deploy disk zap ceph-node1 /dev/sdd
Create OSDs from the disk devices:
[root@ceph-node1 deploy]# ceph-deploy osd create --data /dev/sdb ceph-node1
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /usr/bin/ceph-deploy osd create --data /dev/sdb ceph-node1
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] bluestore : None
[ceph_deploy.cli][INFO ] cd_conf :
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] fs_type : xfs
[ceph_deploy.cli][INFO ] block_wal : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] journal : None
[ceph_deploy.cli][INFO ] subcommand : create
[ceph_deploy.cli][INFO ] host : ceph-node1
[ceph_deploy.cli][INFO ] filestore : None
[ceph_deploy.cli][INFO ] func :
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] zap_disk : False
[ceph_deploy.cli][INFO ] data : /dev/sdb
[ceph_deploy.cli][INFO ] block_db : None
[ceph_deploy.cli][INFO ] dmcrypt : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] dmcrypt_key_dir : /etc/ceph/dmcrypt-keys
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] debug : False
[ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data device /dev/sdb
[ceph-node1][DEBUG ] connected to host: ceph-node1
[ceph-node1][DEBUG ] detect platform information from remote host
[ceph-node1][DEBUG ] detect machine type
[ceph-node1][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO ] Distro info: CentOS Linux 7.6.1810 Core
[ceph_deploy.osd][DEBUG ] Deploying osd to ceph-node1
[ceph-node1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph-node1][WARNIN] osd keyring does not exist yet, creating one
[ceph-node1][DEBUG ] create a keyring file
[ceph-node1][DEBUG ] find the location of an executable
[ceph-node1][INFO ] Running command: /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdb
[ceph-node1][DEBUG ] Running command: /bin/ceph-authtool --gen-print-key
[ceph-node1][DEBUG ] Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 65fc482d-6c94-4517-ad4b-8b363251c6b6
[ceph-node1][DEBUG ] Running command: /usr/sbin/vgcreate --force --yes ceph-28f20cf6-9329-4600-afef-5ac72fcf0a04 /dev/sdb
[ceph-node1][DEBUG ] stdout: Physical volume "/dev/sdb" successfully created.
[ceph-node1][DEBUG ] stdout: Volume group "ceph-28f20cf6-9329-4600-afef-5ac72fcf0a04" successfully created
[ceph-node1][DEBUG ] Running command: /usr/sbin/lvcreate --yes -l 100%FREE -n osd-block-65fc482d-6c94-4517-ad4b-8b363251c6b6 ceph-28f20cf6-9329-4600-afef-5ac72fcf0a04
[ceph-node1][DEBUG ] stdout: Logical volume "osd-block-65fc482d-6c94-4517-ad4b-8b363251c6b6" created.
[ceph-node1][DEBUG ] Running command: /bin/ceph-authtool --gen-print-key
[ceph-node1][DEBUG ] Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-0
[ceph-node1][DEBUG ] Running command: /usr/sbin/restorecon /var/lib/ceph/osd/ceph-0
[ceph-node1][DEBUG ] Running command: /bin/chown -h ceph:ceph /dev/ceph-28f20cf6-9329-4600-afef-5ac72fcf0a04/osd-block-65fc482d-6c94-4517-ad4b-8b363251c6b6
[ceph-node1][DEBUG ] Running command: /bin/chown -R ceph:ceph /dev/dm-2
[ceph-node1][DEBUG ] Running command: /bin/ln -s /dev/ceph-28f20cf6-9329-4600-afef-5ac72fcf0a04/osd-block-65fc482d-6c94-4517-ad4b-8b363251c6b6 /var/lib/ceph/osd/ceph-0/block
[ceph-node1][DEBUG ] Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-0/activate.monmap
[ceph-node1][DEBUG ] stderr: got monmap epoch 1
[ceph-node1][DEBUG ] Running command: /bin/ceph-authtool /var/lib/ceph/osd/ceph-0/keyring --create-keyring --name osd.0 --add-key AQBu575cdhmoMxAAj0b4L25hgdqBJFwJb7MDoQ==
[ceph-node1][DEBUG ] stdout: creating /var/lib/ceph/osd/ceph-0/keyring
[ceph-node1][DEBUG ] added entity osd.0 auth auth(auid = 18446744073709551615 key=AQBu575cdhmoMxAAj0b4L25hgdqBJFwJb7MDoQ== with 0 caps)
[ceph-node1][DEBUG ] Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/keyring
[ceph-node1][DEBUG ] Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/
[ceph-node1][DEBUG ] Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 0 --monmap /var/lib/ceph/osd/ceph-0/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-0/ --osd-uuid 65fc482d-6c94-4517-ad4b-8b363251c6b6 --setuser ceph --setgroup ceph
[ceph-node1][DEBUG ] --> ceph-volume lvm prepare successful for: /dev/sdb
[ceph-node1][DEBUG ] Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
[ceph-node1][DEBUG ] Running command: /bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-28f20cf6-9329-4600-afef-5ac72fcf0a04/osd-block-65fc482d-6c94-4517-ad4b-8b363251c6b6 --path /var/lib/ceph/osd/ceph-0 --no-mon-config
[ceph-node1][DEBUG ] Running command: /bin/ln -snf /dev/ceph-28f20cf6-9329-4600-afef-5ac72fcf0a04/osd-block-65fc482d-6c94-4517-ad4b-8b363251c6b6 /var/lib/ceph/osd/ceph-0/block
[ceph-node1][DEBUG ] Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block
[ceph-node1][DEBUG ] Running command: /bin/chown -R ceph:ceph /dev/dm-2
[ceph-node1][DEBUG ] Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
[ceph-node1][DEBUG ] Running command: /bin/systemctl enable ceph-volume@lvm-0-65fc482d-6c94-4517-ad4b-8b363251c6b6
[ceph-node1][DEBUG ] stderr: Created symlink from /etc/systemd/system/multi-user.target.wants/[email protected] to /usr/lib/systemd/system/[email protected].
[ceph-node1][DEBUG ] Running command: /bin/systemctl enable --runtime ceph-osd@0
[ceph-node1][DEBUG ] stderr: Created symlink from /run/systemd/system/ceph-osd.target.wants/[email protected] to /usr/lib/systemd/system/[email protected].
[ceph-node1][DEBUG ] Running command: /bin/systemctl start ceph-osd@0
[ceph-node1][DEBUG ] --> ceph-volume lvm activate successful for osd ID: 0
[ceph-node1][DEBUG ] --> ceph-volume lvm create successful for: /dev/sdb
[ceph-node1][INFO ] checking OSD status...
[ceph-node1][DEBUG ] find the location of an executable
[ceph-node1][INFO ] Running command: /bin/ceph --cluster=ceph osd stat --format=json
[ceph_deploy.osd][DEBUG ] Host ceph-node1 is now ready for osd use.
# Then continue with:
ceph-deploy osd create --data /dev/sdc ceph-node1
ceph-deploy osd create --data /dev/sdd ceph-node1
NOTE: Newer releases default to BlueStore, so there is no FileStore journal and no separate block_db by default, e.g.
/usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdb
View detailed information about the OSDs on a node:
[root@ceph-node1 deploy]# ceph-deploy osd list ceph-node1
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /usr/bin/ceph-deploy osd list ceph-node1
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] debug : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] subcommand : list
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] cd_conf :
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] host : ['ceph-node1']
[ceph_deploy.cli][INFO ] func :
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph-node1][DEBUG ] connected to host: ceph-node1
[ceph-node1][DEBUG ] detect platform information from remote host
[ceph-node1][DEBUG ] detect machine type
[ceph-node1][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO ] Distro info: CentOS Linux 7.6.1810 Core
[ceph_deploy.osd][DEBUG ] Listing disks on ceph-node1...
[ceph-node1][DEBUG ] find the location of an executable
[ceph-node1][INFO ] Running command: /usr/sbin/ceph-volume lvm list
[ceph-node1][DEBUG ]
[ceph-node1][DEBUG ]
[ceph-node1][DEBUG ] ====== osd.1 =======
[ceph-node1][DEBUG ]
[ceph-node1][DEBUG ] [block] /dev/ceph-95326f6f-553a-4c14-9da7-05ebd00eb47f/osd-block-49eb6282-70ed-4436-9139-bcdbf4f4c4e0
[ceph-node1][DEBUG ]
[ceph-node1][DEBUG ] type block
[ceph-node1][DEBUG ] osd id 1
[ceph-node1][DEBUG ] cluster fsid d82f0b96-6a69-4f7f-9d79-73d5bac7dd6c
[ceph-node1][DEBUG ] cluster name ceph
[ceph-node1][DEBUG ] osd fsid 49eb6282-70ed-4436-9139-bcdbf4f4c4e0
[ceph-node1][DEBUG ] encrypted 0
[ceph-node1][DEBUG ] cephx lockbox secret
[ceph-node1][DEBUG ] block uuid AeHtvB-wCTC-cO1X-XVLI-bJEf-zhyw-ZmNqJQ
[ceph-node1][DEBUG ] block device /dev/ceph-95326f6f-553a-4c14-9da7-05ebd00eb47f/osd-block-49eb6282-70ed-4436-9139-bcdbf4f4c4e0
[ceph-node1][DEBUG ] vdo 0
[ceph-node1][DEBUG ] crush device class None
[ceph-node1][DEBUG ] devices /dev/sdc
[ceph-node1][DEBUG ]
[ceph-node1][DEBUG ] ====== osd.0 =======
[ceph-node1][DEBUG ]
[ceph-node1][DEBUG ] [block] /dev/ceph-28f20cf6-9329-4600-afef-5ac72fcf0a04/osd-block-65fc482d-6c94-4517-ad4b-8b363251c6b6
[ceph-node1][DEBUG ]
[ceph-node1][DEBUG ] type block
[ceph-node1][DEBUG ] osd id 0
[ceph-node1][DEBUG ] cluster fsid d82f0b96-6a69-4f7f-9d79-73d5bac7dd6c
[ceph-node1][DEBUG ] cluster name ceph
[ceph-node1][DEBUG ] osd fsid 65fc482d-6c94-4517-ad4b-8b363251c6b6
[ceph-node1][DEBUG ] encrypted 0
[ceph-node1][DEBUG ] cephx lockbox secret
[ceph-node1][DEBUG ] block uuid RdDa8U-13Rg-AqkN-pmSa-AB2g-fe3c-iurKn2
[ceph-node1][DEBUG ] block device /dev/ceph-28f20cf6-9329-4600-afef-5ac72fcf0a04/osd-block-65fc482d-6c94-4517-ad4b-8b363251c6b6
[ceph-node1][DEBUG ] vdo 0
[ceph-node1][DEBUG ] crush device class None
[ceph-node1][DEBUG ] devices /dev/sdb
[ceph-node1][DEBUG ]
[ceph-node1][DEBUG ] ====== osd.2 =======
[ceph-node1][DEBUG ]
[ceph-node1][DEBUG ] [block] /dev/ceph-a1d2dce5-2064-47fb-95a4-8e76a83a575d/osd-block-e9e42d62-bec5-49e0-be9b-a4b995ad5db1
[ceph-node1][DEBUG ]
[ceph-node1][DEBUG ] type block
[ceph-node1][DEBUG ] osd id 2
[ceph-node1][DEBUG ] cluster fsid d82f0b96-6a69-4f7f-9d79-73d5bac7dd6c
[ceph-node1][DEBUG ] cluster name ceph
[ceph-node1][DEBUG ] osd fsid e9e42d62-bec5-49e0-be9b-a4b995ad5db1
[ceph-node1][DEBUG ] encrypted 0
[ceph-node1][DEBUG ] cephx lockbox secret
[ceph-node1][DEBUG ] block uuid H9BSN3-oyjs-QoM5-kRXr-RE34-eEF1-ABf5De
[ceph-node1][DEBUG ] block device /dev/ceph-a1d2dce5-2064-47fb-95a4-8e76a83a575d/osd-block-e9e42d62-bec5-49e0-be9b-a4b995ad5db1
[ceph-node1][DEBUG ] vdo 0
[ceph-node1][DEBUG ] crush device class None
[ceph-node1][DEBUG ] devices /dev/sdd
The node now runs one OSD daemon per disk device:
[email protected] loaded active running Ceph object storage daemon osd.0
[email protected] loaded active running Ceph object storage daemon osd.1
[email protected] loaded active running Ceph object storage daemon osd.2
ceph-osd.target loaded active active ceph target allowing to start/stop all [email protected] instances at once
Check the Ceph Cluster status again:
[root@ceph-node1 ~]# ceph -s
cluster:
id: d82f0b96-6a69-4f7f-9d79-73d5bac7dd6c
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-node1,ceph-node2,ceph-node3
mgr: ceph-node1(active), standbys: ceph-node2, ceph-node3
# 3 OSD daemons
osd: 3 osds: 3 up, 3 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 3.0 GiB used, 27 GiB / 30 GiB avail
pgs:
The physical medium (block device) behind each OSD is an LVM logical volume:
[root@ceph-node1 ~]# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
...
osd-block-65fc482d-6c94-4517-ad4b-8b363251c6b6 ceph-28f20cf6-9329-4600-afef-5ac72fcf0a04 -wi-ao---- <10.00g
osd-block-49eb6282-70ed-4436-9139-bcdbf4f4c4e0 ceph-95326f6f-553a-4c14-9da7-05ebd00eb47f -wi-ao---- <10.00g
osd-block-e9e42d62-bec5-49e0-be9b-a4b995ad5db1 ceph-a1d2dce5-2064-47fb-95a4-8e76a83a575d -wi-ao---- <10.00g
[root@ceph-node1 ~]# vgs
VG #PV #LV #SN Attr VSize VFree
...
ceph-28f20cf6-9329-4600-afef-5ac72fcf0a04 1 1 0 wz--n- <10.00g 0
ceph-95326f6f-553a-4c14-9da7-05ebd00eb47f 1 1 0 wz--n- <10.00g 0
ceph-a1d2dce5-2064-47fb-95a4-8e76a83a575d 1 1 0 wz--n- <10.00g 0
[root@ceph-node1 ~]# pvs
PV VG Fmt Attr PSize PFree
...
/dev/sdb ceph-28f20cf6-9329-4600-afef-5ac72fcf0a04 lvm2 a-- <10.00g 0
/dev/sdc ceph-95326f6f-553a-4c14-9da7-05ebd00eb47f lvm2 a-- <10.00g 0
/dev/sdd ceph-a1d2dce5-2064-47fb-95a4-8e76a83a575d lvm2 a-- <10.00g 0
With BlueStore, the OSD data directory is only a small tmpfs holding metadata; the object data itself goes straight to the block LV rather than to files on an XFS file system (that layout belongs to the legacy FileStore backend):
[root@ceph-node1 ~]# mount
...
tmpfs on /var/lib/ceph/osd/ceph-0 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-1 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-2 type tmpfs (rw,relatime)
[root@ceph-node1 ~]# ll /var/lib/ceph/osd/ceph-0
total 48
-rw-r--r-- 1 ceph ceph 438 Apr 23 06:22 activate.monmap
lrwxrwxrwx 1 ceph ceph 93 Apr 23 06:22 block -> /dev/ceph-28f20cf6-9329-4600-afef-5ac72fcf0a04/osd-block-65fc482d-6c94-4517-ad4b-8b363251c6b6
-rw-r--r-- 1 ceph ceph 2 Apr 23 06:22 bluefs
-rw-r--r-- 1 ceph ceph 37 Apr 23 06:22 ceph_fsid
-rw-r--r-- 1 ceph ceph 37 Apr 23 06:22 fsid
-rw------- 1 ceph ceph 55 Apr 23 06:22 keyring
-rw-r--r-- 1 ceph ceph 8 Apr 23 06:22 kv_backend
-rw-r--r-- 1 ceph ceph 21 Apr 23 06:22 magic
-rw-r--r-- 1 ceph ceph 4 Apr 23 06:22 mkfs_done
-rw-r--r-- 1 ceph ceph 41 Apr 23 06:22 osd_key
-rw-r--r-- 1 ceph ceph 6 Apr 23 06:22 ready
-rw-r--r-- 1 ceph ceph 10 Apr 23 06:22 type
-rw-r--r-- 1 ceph ceph 2 Apr 23 06:22 whoami
[root@ceph-node1 ~]# cat /var/lib/ceph/osd/ceph-0/whoami
0
[root@ceph-node1 ~]# cat /var/lib/ceph/osd/ceph-0/type
bluestore
[root@ceph-node1 ~]# cat /var/lib/ceph/osd/ceph-0/fsid
65fc482d-6c94-4517-ad4b-8b363251c6b6
NOTE: Repeat the steps above for ceph-node2 and ceph-node3, for example with the loop shown below.
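A compact sketch of that loop (it assumes the device names are identical on every node):
for node in ceph-node2 ceph-node3; do
  for dev in /dev/sdb /dev/sdc /dev/sdd; do
    ceph-deploy disk zap $node $dev
    ceph-deploy osd create --data $dev $node
  done
done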
Now check the Ceph Cluster health:
[root@ceph-node1 ~]# ceph health
HEALTH_OK
View the OSD tree:
[root@ceph-node1 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.08817 root default
-3 0.02939 host ceph-node1
0 hdd 0.00980 osd.0 up 1.00000 1.00000
1 hdd 0.00980 osd.1 up 1.00000 1.00000
2 hdd 0.00980 osd.2 up 1.00000 1.00000
-5 0.02939 host ceph-node2
3 hdd 0.00980 osd.3 up 1.00000 1.00000
4 hdd 0.00980 osd.4 up 1.00000 1.00000
5 hdd 0.00980 osd.5 up 1.00000 1.00000
-7 0.02939 host ceph-node3
6 hdd 0.00980 osd.6 up 1.00000 1.00000
7 hdd 0.00980 osd.7 up 1.00000 1.00000
8 hdd 0.00980 osd.8 up 1.00000 1.00000
Ceph Cluster status once the OSDs on all 3 nodes have been created:
[root@ceph-node1 ~]# ceph -s
cluster:
id: d82f0b96-6a69-4f7f-9d79-73d5bac7dd6c
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-node1,ceph-node2,ceph-node3
mgr: ceph-node1(active), standbys: ceph-node2, ceph-node3
# 9 OSD daemons in total
osd: 9 osds: 9 up, 9 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 9.0 GiB used, 81 GiB / 90 GiB avail
pgs:
Note:
# Stop the OSD on a given node; this is just stopping a systemd service
$ systemctl stop ceph-osd@0
# Mark the specified OSD out of the cluster
$ ceph osd out 0
marked out osd.0.
# Remove the specified OSD from the CRUSH map
$ ceph osd crush remove osd.0
# Delete the specified OSD's authentication key
$ ceph auth del osd.0
updated
# Clean up the disk device behind the specified OSD
## First remove the LV
$ lvs | awk 'NR!=1 {if($1~"osd-block-") print $2 "/" $1}' | xargs -I {} sudo lvremove -y {}
## Then wipe the partition table and data on the disk
ceph-deploy disk zap ceph-node1 /dev/sdb
# Watch data migration in real time
$ ceph -w
# Inspect OSD details with ceph-bluestore-tool
$ ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block
{
"/var/lib/ceph/osd/ceph-0/block": {
"osd_uuid": "65fc482d-6c94-4517-ad4b-8b363251c6b6",
"size": 10733223936,
"btime": "2019-04-23 06:22:41.387255",
"description": "main",
"bluefs": "1",
"ceph_fsid": "d82f0b96-6a69-4f7f-9d79-73d5bac7dd6c",
"kv_backend": "rocksdb",
"magic": "ceph osd volume v026",
"mkfs_done": "yes",
"osd_key": "AQBu575cdhmoMxAAj0b4L25hgdqBJFwJb7MDoQ==",
"ready": "ready",
"whoami": "0"
}
}
Deploy the MDS (metadata server) daemons, which are required for CephFS:
$ ceph-deploy mds create ceph-node1 ceph-node2 ceph-node3
An MDS daemon is now running on each node:
[root@ceph-node1 ~]# systemctl status ceph-mds@ceph-node1
● [email protected] - Ceph metadata server daemon
Loaded: loaded (/usr/lib/systemd/system/[email protected]; enabled; vendor preset: disabled)
Active: active (running) since Wed 2019-04-24 03:24:54 EDT; 19s ago
Main PID: 24252 (ceph-mds)
CGroup: /system.slice/system-ceph\x2dmds.slice/[email protected]
└─24252 /usr/bin/ceph-mds -f --cluster ceph --id ceph-node1 --setuser ceph --setgroup ceph
Apr 24 03:24:54 ceph-node1 systemd[1]: Started Ceph metadata server daemon.
Apr 24 03:24:54 ceph-node1 ceph-mds[24252]: starting mds.ceph-node1 at -
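The MDS daemons stay in standby until a file system exists; CephFS itself is created separately. A minimal sketch (the pool names and PG counts here are only illustrative):
# Create the data and metadata pools, then the file system
ceph osd pool create cephfs_data 32
ceph osd pool create cephfs_metadata 16
ceph fs new cephfs cephfs_metadata cephfs_data
ceph fs ls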
Deploy the RADOS Gateway (RGW) on each node:
ceph-deploy rgw create ceph-node1 ceph-node2 ceph-node3
An RGW daemon is now running on each node:
[root@ceph-node1 ~]# systemctl status [email protected]
● [email protected] - Ceph rados gateway
Loaded: loaded (/usr/lib/systemd/system/[email protected]; enabled; vendor preset: disabled)
Active: active (running) since Wed 2019-04-24 03:28:02 EDT; 14s ago
Main PID: 24422 (radosgw)
CGroup: /system.slice/system-ceph\x2dradosgw.slice/[email protected]
└─24422 /usr/bin/radosgw -f --cluster ceph --name client.rgw.ceph-node1 --setuser ceph --setgroup ceph
Apr 24 03:28:02 ceph-node1 systemd[1]: Started Ceph rados gateway.
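The Mimic RGW listens on port 7480 by default; a quick sanity check that the gateway answers:
# An anonymous request should return an S3 ListAllMyBucketsResult XML document
$ curl http://ceph-node1:7480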
Check the current Ceph Cluster status:
[root@ceph-node1 ~]# ceph -s
cluster:
id: d82f0b96-6a69-4f7f-9d79-73d5bac7dd6c
health: HEALTH_WARN
too few PGs per OSD (13 < min 30)
services:
mon: 3 daemons, quorum ceph-node1,ceph-node2,ceph-node3
mgr: ceph-node1(active), standbys: ceph-node2, ceph-node3
osd: 9 osds: 9 up, 9 in
rgw: 3 daemons active
data:
pools: 5 pools, 40 pgs
objects: 283 objects, 109 MiB
usage: 9.4 GiB used, 81 GiB / 90 GiB avail
pgs: 40 active+clean
List the Pools:
[root@ceph-node1 ~]# rados lspools
.rgw.root
default.rgw.control
default.rgw.meta
default.rgw.log
After RGW is deployed, the default pools listed above are created automatically.
Create a Pool:
[root@ceph-node1 ~]# ceph osd pool create test_pool 32 32
pool 'test_pool' created
[root@ceph-node1 ~]# rados lspools
.rgw.root
default.rgw.control
default.rgw.meta
default.rgw.log
test_pool
Set the replica count of a given Pool (the configured default is 3):
[root@ceph-node1 ~]# ceph osd pool set test_pool size 2
set pool 7 size to 2
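The change can be verified with the matching get subcommand; min_size is worth checking at the same time:
$ ceph osd pool get test_pool size
$ ceph osd pool get test_pool min_size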
Create an Object:
[root@ceph-node1 ~]# rados -p test_pool put object1 /etc/hosts
[root@ceph-node1 ~]# rados -p test_pool ls
object1
Check the Object's OSD map:
[root@ceph-node1 ~]# ceph osd map test_pool object1
osdmap e69 pool 'test_pool' (7) object 'object1' -> pg 7.bac5debc (7.1c) -> up ([3,8], p3) acting ([3,8], p3)
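Reading the object back and comparing it with the source file confirms the round trip (a sketch; /tmp/object1.out is an arbitrary destination):
$ rados -p test_pool get object1 /tmp/object1.out
$ diff /etc/hosts /tmp/object1.out && echo "content matches"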
Enable the Dashboard module:
$ ceph mgr module enable dashboard
By default every HTTP connection to the Dashboard uses SSL/TLS. To get up and running quickly, generate and install a self-signed certificate:
ceph dashboard create-self-signed-cert
Create a user with the administrator role:
ceph dashboard set-login-credentials admin admin
Check the Dashboard URL; the default port is 8443 or 8080:
[root@ceph-node1 deploy]# ceph mgr services
{
"dashboard": "https://ceph-node1:8443/"
}
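If the dashboard should listen on a different address or port, the Mimic mgr configuration keys can be set explicitly (a sketch; the address and port below are examples):
$ ceph config set mgr mgr/dashboard/server_addr 172.18.22.234
$ ceph config set mgr mgr/dashboard/server_port 8443
# Restart the module so the new settings take effect
$ ceph mgr module disable dashboard && ceph mgr module enable dashboard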
Tearing the cluster down includes uninstalling the packages and removing configuration and data:
# destroy and uninstall all packages
$ ceph-deploy purge ceph-node1 ceph-node2 ceph-node3
# destroy data
$ ceph-deploy purgedata ceph-node1 ceph-node2 ceph-node3
# remove all keyring files generated during deployment
$ ceph-deploy forgetkeys
# remove the local ceph.conf, keyring and log files left in the deploy directory
$ rm -rfv ceph.*