Ceph Study Notes 1: Multi-Node Deployment of the Mimic Release

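Special notes:

  1. This method also works for a single-node deployment with only one Monitor (which then becomes a single point of failure). The minimum requirement is two partitions to create 2 OSDs (because the default minimum replica count is 2). If you do not need CephFS, you can skip deploying the MDS service; if you do not use object storage, you can skip deploying the RGW service.
  2. The Manager service was added in Ceph 11.x (kraken), where it is optional; from 12.x (luminous) onward it is required.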

System environment

  • DNS names and IP addresses of the 3 nodes (hostnames match the DNS names):
$ cat /etc/hosts
...

172.29.101.166 osdev01
172.29.101.167 osdev02
172.29.101.168 osdev03

...
  • Kernel and distribution versions:
$ uname -r
3.10.0-862.11.6.el7.x86_64

$ cat /etc/redhat-release 
CentOS Linux release 7.5.1804 (Core)
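  • All 3 nodes use sdb as the OSD disk. Use dd to wipe any partition information that may exist on it (this destroys data on the disk, so proceed with care):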
$ lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 222.6G  0 disk 
├─sda1   8:1    0     1G  0 part /boot
└─sda2   8:2    0 221.6G  0 part /
sdb      8:16   0   7.3T  0 disk

$ dd if=/dev/zero of=/dev/sdb bs=512 count=1024
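
Because the dd command above is destructive, it may be worth confirming first that nothing on the target disk is mounted. The following is a minimal sketch, not part of the original procedure: the helper name disk_is_unmounted is hypothetical, and it assumes its input has the two-column shape produced by `lsblk -n -r -o NAME,MOUNTPOINT`. The sample data mirrors the lsblk listing above.

```shell
# Hypothetical helper: reads "NAME [MOUNTPOINT]" lines on stdin and succeeds
# only if no entry whose name starts with the given disk has a mount point.
disk_is_unmounted() {
    disk="$1"
    awk -v d="$disk" '
        index($1, d) == 1 && NF > 1 { mounted = 1 }
        END { exit (mounted ? 1 : 0) }
    '
}

# Sample input shaped like `lsblk -n -r -o NAME,MOUNTPOINT` on the nodes above.
sample='sda
sda1 /boot
sda2 /
sdb'

if printf '%s\n' "$sample" | disk_is_unmounted sdb; then
    echo "sdb: no mounted partitions"
fi
if ! printf '%s\n' "$sample" | disk_is_unmounted sda; then
    echo "sda: in use, do not wipe"
fi
```

On a real node the sample would be replaced by piping the live lsblk output into the helper before running dd.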

System configuration

Yum configuration

  • Install the epel repository:
$ yum install -y epel-release
  • Install the yum priorities plugin:
$ yum install -y yum-plugin-priorities --enablerepo=rhel-7-server-optional-rpms

System settings

  • Install and enable the NTP service:
$ yum install -y ntp ntpdate ntp-doc

$ systemctl enable ntpd.service && systemctl start ntpd.service && systemctl status ntpd.service
  • Add an osdev user and grant it passwordless sudo (you can also use the root user directly; this step is only a security precaution):
$ useradd -d /home/osdev -m osdev
$ passwd osdev

$ echo "osdev ALL = (root) NOPASSWD:ALL" | tee /etc/sudoers.d/osdev
$ chmod 0440 /etc/sudoers.d/osdev
  • Disable the firewall:
$ systemctl stop firewalld && systemctl disable firewalld && systemctl status firewalld
  • Disable SELinux:
$ sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config && cat /etc/selinux/config
# setenforce 0 && sestatus
$ reboot
$ sestatus
SELinux status:                 disabled
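
The sed expression used for SELinux can be tried on a throwaway file before touching /etc/selinux/config. This is a small sketch under that assumption; the helper name disable_selinux_in is hypothetical. It also shows that the pattern `^SELINUX=` leaves the SELINUXTYPE line alone, since that line has no `=` directly after `SELINUX`.

```shell
# Hypothetical helper: apply the same substitution as above to any file.
disable_selinux_in() {
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$1"
}

# Exercise it on a temporary copy rather than the real config.
tmp="$(mktemp)"
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$tmp"
disable_selinux_in "$tmp"
cat "$tmp"
rm -f "$tmp"
```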

SSH configuration

  • Install the SSH server package:
$ yum install -y openssh-server
  • Set up passwordless SSH login:
$ ssh-keygen
$ ssh-copy-id osdev@osdev01
$ ssh-copy-id osdev@osdev02
$ ssh-copy-id osdev@osdev03
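  • Configure the default SSH user, or pass --username when running ceph-deploy. (This configuration also makes Kolla-Ansible use this user by default, which leads to permission errors. To avoid that, apply the configuration below under the osdev user and run Kolla-Ansible as root.)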
$ vi ~/.ssh/config
Host osdev01
   Hostname osdev01
   User osdev
Host osdev02
   Hostname osdev02
   User osdev
Host osdev03
   Hostname osdev03
   User osdev
  • Test that passwordless login works:
[root@osdev01 ~]# ssh osdev01
Last login: Wed Aug 22 16:53:56 2018 from osdev01
[osdev@osdev01 ~]$ exit
登出
Connection to osdev01 closed.
[root@osdev01 ~]# ssh osdev02
Last login: Wed Aug 22 16:55:06 2018 from osdev01
[osdev@osdev02 ~]$ exit
登出
Connection to osdev02 closed.
[root@osdev01 ~]# ssh osdev03
Last login: Wed Aug 22 16:55:35 2018 from osdev01
[osdev@osdev03 ~]$ exit
登出
Connection to osdev03 closed.
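
The three Host stanzas in ~/.ssh/config follow one pattern, so they can be generated instead of typed by hand. The sketch below assumes the same user on every node; gen_ssh_config is a hypothetical helper name, not something ceph-deploy provides.

```shell
# Hypothetical helper: print one Host stanza per node for the given user.
gen_ssh_config() {
    user="$1"
    shift
    for host in "$@"; do
        printf 'Host %s\n   Hostname %s\n   User %s\n' "$host" "$host" "$user"
    done
}

# Prints the same stanzas as the config shown above.
gen_ssh_config osdev osdev01 osdev02 osdev03
```

Redirecting the output with `>> ~/.ssh/config` would append the stanzas; review the file afterwards, since existing entries for the same hosts take precedence.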

Deployment

Initialize the system

  • Install ceph-deploy:
$ yum install -y ceph-deploy
  • Create the ceph-deploy working directory:
$ su - osdev
$ mkdir -pv /opt/ceph/deploy && cd /opt/ceph/deploy
  • Create a Ceph cluster, using osdev01, osdev02 and osdev03 as the Monitor nodes:
$ ceph-deploy new osdev01 osdev02 osdev03
  • Inspect the generated configuration files:
$ ls
ceph.conf  ceph-deploy-ceph.log  ceph.mon.keyring

$ cat ceph.conf 
[global]
fsid = 42ded78e-211b-4095-b795-a33f116727fc
mon_initial_members = osdev01, osdev02, osdev03
mon_host = 172.29.101.166,172.29.101.167,172.29.101.168
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
  • Edit the Ceph cluster configuration:
$ vi ceph.conf
public_network = 172.29.101.0/24
cluster_network = 172.29.101.0/24

osd_pool_default_size = 3
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 8
osd_pool_default_pgp_num = 8
osd_crush_chooseleaf_type = 1

[mon]
mon_clock_drift_allowed = 0.5

[osd]
osd_mkfs_type = xfs
osd_mkfs_options_xfs = -f
filestore_max_sync_interval = 5
filestore_min_sync_interval = 0.1
filestore_fd_cache_size = 655350
filestore_omap_header_cache_size = 655350
filestore_fd_cache_random = true
osd op threads = 8
osd disk threads = 4
filestore op threads = 8
max_open_files = 655350
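
The configuration above sets osd_pool_default_pg_num to 8 per pool. For sizing, a commonly cited rule of thumb is a cluster-wide total of roughly (number of OSDs x 100) / replica count placement groups, rounded up to a power of two. The sketch below only illustrates that arithmetic; suggest_pg_num is a hypothetical helper, the target of 100 PGs per OSD is an assumption, and the result is a total across all pools rather than a per-pool value.

```shell
# Hypothetical helper: rule-of-thumb total PG count, rounded up to a power of two.
# Arguments: number of OSDs, replica count, optional target PGs per OSD (default 100).
suggest_pg_num() {
    osds="$1"
    size="$2"
    per_osd="${3:-100}"
    raw=$(( osds * per_osd / size ))
    pg=1
    while [ "$pg" -lt "$raw" ]; do
        pg=$(( pg * 2 ))
    done
    echo "$pg"
}

# 3 OSDs with osd_pool_default_size = 3, as in this cluster.
suggest_pg_num 3 3
```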

Install packages

  • Install the Ceph packages on the 3 nodes (if this fails, first remove the ceph-release package on each of the 3 nodes):
# sudo yum remove -y ceph-release
$ ceph-deploy install osdev01 osdev02 osdev03

Deploy Monitors

  • Deploy the initial Monitors:
$ ceph-deploy mon create-initial
  • List the generated configuration and keyring files:
$ ls
ceph.bootstrap-mds.keyring  ceph.bootstrap-mgr.keyring  ceph.bootstrap-osd.keyring  ceph.bootstrap-rgw.keyring  ceph.client.admin.keyring  ceph.conf  ceph-deploy-ceph.log  ceph.mon.keyring

$ sudo chmod a+r /etc/ceph/ceph.client.admin.keyring
  • Copy the configuration and keyring files to the specified nodes:
$ ceph-deploy --overwrite-conf admin osdev01 osdev02 osdev03
  • Adjust the warning threshold for remaining available data space on the osdev01 Monitor:
$ ceph -s
  cluster:
    id:     383237bd-becf-49d5-9bd6-deb0bc35ab2a
    health: HEALTH_WARN
            mon osdev01 is low on available space
 
  services:
    mon: 3 daemons, quorum osdev01,osdev02,osdev03
    mgr: osdev03(active), standbys: osdev02, osdev01
    osd: 3 osds: 3 up, 3 in
    rgw: 3 daemons active
 
  data:
    pools:   10 pools, 176 pgs
    objects: 578  objects, 477 MiB
    usage:   4.0 GiB used, 22 TiB / 22 TiB avail
    pgs:     176 active+clean

$ ceph daemon mon.osdev01 config get mon_data_avail_warn
{
    "mon_data_avail_warn": "30"
}

$ ceph daemon mon.osdev01 config set mon_data_avail_warn 10
{
    "success": "mon_data_avail_warn = '10' (not observed, change may require restart) "
}

$ vi /etc/ceph/ceph.conf
[mon]
mon_clock_drift_allowed = 0.5
mon allow pool delete = true
mon_data_avail_warn = 10

$ systemctl restart ceph-mon@osdev01.service

$ ceph daemon mon.osdev01 config get mon_data_avail_warn
{
    "mon_data_avail_warn": "10"
}

$ ceph -s
  cluster:
    id:     383237bd-becf-49d5-9bd6-deb0bc35ab2a
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum osdev01,osdev02,osdev03
    mgr: osdev03(active), standbys: osdev02, osdev01
    osd: 3 osds: 3 up, 3 in
    rgw: 3 daemons active
 
  data:
    pools:   10 pools, 176 pgs
    objects: 578  objects, 477 MiB
    usage:   4.0 GiB used, 22 TiB / 22 TiB avail
    pgs:     176 active+clean
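
The effect of lowering mon_data_avail_warn from 30 to 10 can be pictured with a simplified model: the Monitor warns when the percentage of free space on its data filesystem drops to the threshold. The sketch below is illustrative only; mon_space_state is a hypothetical helper, the 25% free figure is made up, and the exact boundary behaviour inside Ceph is an assumption.

```shell
# Hypothetical model: "warn" when free space (in percent) is at or below the threshold.
mon_space_state() {
    avail_pct="$1"
    warn_pct="${2:-30}"
    if [ "$avail_pct" -le "$warn_pct" ]; then
        echo "warn"
    else
        echo "ok"
    fi
}

# With an assumed 25% free: warns at the default of 30, passes once set to 10.
mon_space_state 25 30
mon_space_state 25 10
```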

Remove a Monitor

  • Remove the Monitor service on osdev01:
$ ceph-deploy mon destroy osdev01
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.1): /usr/bin/ceph-deploy mon destroy osdev01
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : False
[ceph_deploy.cli][INFO  ]  subcommand                    : destroy
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : 
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  mon                           : ['osdev01']
[ceph_deploy.cli][INFO  ]  func                          : 
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.mon][DEBUG ] Removing mon from osdev01
[osdev01][DEBUG ] connected to host: osdev01 
[osdev01][DEBUG ] detect platform information from remote host
[osdev01][DEBUG ] detect machine type
[osdev01][DEBUG ] find the location of an executable
[osdev01][DEBUG ] get remote short hostname
[osdev01][INFO  ] Running command: ceph --cluster=ceph -n mon. -k /var/lib/ceph/mon/ceph-osdev01/keyring mon remove osdev01
[osdev01][WARNIN] removing mon.osdev01 at 172.29.101.166:6789/0, there will be 2 monitors
[osdev01][INFO  ] polling the daemon to verify it stopped
[osdev01][INFO  ] Running command: systemctl stop ceph-mon@osdev01.service
[osdev01][INFO  ] Running command: mkdir -p /var/lib/ceph/mon-removed
[osdev01][DEBUG ] move old monitor data
  • Add the Monitor service back on osdev01:
$ ceph-deploy mon add osdev01
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.1): /usr/bin/ceph-deploy mon add osdev01
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : False
[ceph_deploy.cli][INFO  ]  subcommand                    : add
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : 
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  mon                           : ['osdev01']
[ceph_deploy.cli][INFO  ]  func                          : 
[ceph_deploy.cli][INFO  ]  address                       : None
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.mon][INFO  ] ensuring configuration of new mon host: osdev01
[ceph_deploy.admin][DEBUG ] Pushing admin keys and conf to osdev01
[osdev01][DEBUG ] connected to host: osdev01 
[osdev01][DEBUG ] detect platform information from remote host
[osdev01][DEBUG ] detect machine type
[osdev01][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.mon][DEBUG ] Adding mon to cluster ceph, host osdev01
[ceph_deploy.mon][DEBUG ] using mon address by resolving host: 172.29.101.166
[ceph_deploy.mon][DEBUG ] detecting platform for host osdev01 ...
[osdev01][DEBUG ] connected to host: osdev01 
[osdev01][DEBUG ] detect platform information from remote host
[osdev01][DEBUG ] detect machine type
[osdev01][DEBUG ] find the location of an executable
[ceph_deploy.mon][INFO  ] distro info: CentOS Linux 7.5.1804 Core
[osdev01][DEBUG ] determining if provided host has same hostname in remote
[osdev01][DEBUG ] get remote short hostname
[osdev01][DEBUG ] adding mon to osdev01
[osdev01][DEBUG ] get remote short hostname
[osdev01][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[osdev01][DEBUG ] create the mon path if it does not exist
[osdev01][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-osdev01/done
[osdev01][DEBUG ] done path does not exist: /var/lib/ceph/mon/ceph-osdev01/done
[osdev01][INFO  ] creating keyring file: /var/lib/ceph/tmp/ceph-osdev01.mon.keyring
[osdev01][DEBUG ] create the monitor keyring file
[osdev01][INFO  ] Running command: ceph --cluster ceph mon getmap -o /var/lib/ceph/tmp/ceph.osdev01.monmap
[osdev01][WARNIN] got monmap epoch 3
[osdev01][INFO  ] Running command: ceph-mon --cluster ceph --mkfs -i osdev01 --monmap /var/lib/ceph/tmp/ceph.osdev01.monmap --keyring /var/lib/ceph/tmp/ceph-osdev01.mon.keyring --setuser 167 --setgroup 167
[osdev01][INFO  ] unlinking keyring file /var/lib/ceph/tmp/ceph-osdev01.mon.keyring
[osdev01][DEBUG ] create a done file to avoid re-doing the mon deployment
[osdev01][DEBUG ] create the init path if it does not exist
[osdev01][INFO  ] Running command: systemctl enable ceph.target
[osdev01][INFO  ] Running command: systemctl enable ceph-mon@osdev01
[osdev01][INFO  ] Running command: systemctl start ceph-mon@osdev01
[osdev01][INFO  ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.osdev01.asok mon_status
[osdev01][WARNIN] monitor osdev01 does not exist in monmap
[osdev01][INFO  ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.osdev01.asok mon_status
[osdev01][DEBUG ] ********************************************************************************
[osdev01][DEBUG ] status for monitor: mon.osdev01
[osdev01][DEBUG ] {
[osdev01][DEBUG ]   "election_epoch": 0, 
[osdev01][DEBUG ]   "extra_probe_peers": [], 
[osdev01][DEBUG ]   "feature_map": {
[osdev01][DEBUG ]     "client": [
[osdev01][DEBUG ]       {
[osdev01][DEBUG ]         "features": "0x1ffddff8eea4fffb", 
[osdev01][DEBUG ]         "num": 1, 
[osdev01][DEBUG ]         "release": "luminous"
[osdev01][DEBUG ]       }, 
[osdev01][DEBUG ]       {
[osdev01][DEBUG ]         "features": "0x3ffddff8ffa4fffb", 
[osdev01][DEBUG ]         "num": 1, 
[osdev01][DEBUG ]         "release": "luminous"
[osdev01][DEBUG ]       }
[osdev01][DEBUG ]     ], 
[osdev01][DEBUG ]     "mds": [
[osdev01][DEBUG ]       {
[osdev01][DEBUG ]         "features": "0x3ffddff8ffa4fffb", 
[osdev01][DEBUG ]         "num": 2, 
[osdev01][DEBUG ]         "release": "luminous"
[osdev01][DEBUG ]       }
[osdev01][DEBUG ]     ], 
[osdev01][DEBUG ]     "mgr": [
[osdev01][DEBUG ]       {
[osdev01][DEBUG ]         "features": "0x3ffddff8ffa4fffb", 
[osdev01][DEBUG ]         "num": 3, 
[osdev01][DEBUG ]         "release": "luminous"
[osdev01][DEBUG ]       }
[osdev01][DEBUG ]     ], 
[osdev01][DEBUG ]     "mon": [
[osdev01][DEBUG ]       {
[osdev01][DEBUG ]         "features": "0x3ffddff8ffa4fffb", 
[osdev01][DEBUG ]         "num": 1, 
[osdev01][DEBUG ]         "release": "luminous"
[osdev01][DEBUG ]       }
[osdev01][DEBUG ]     ], 
[osdev01][DEBUG ]     "osd": [
[osdev01][DEBUG ]       {
[osdev01][DEBUG ]         "features": "0x3ffddff8ffa4fffb", 
[osdev01][DEBUG ]         "num": 2, 
[osdev01][DEBUG ]         "release": "luminous"
[osdev01][DEBUG ]       }
[osdev01][DEBUG ]     ]
[osdev01][DEBUG ]   }, 
[osdev01][DEBUG ]   "features": {
[osdev01][DEBUG ]     "quorum_con": "0", 
[osdev01][DEBUG ]     "quorum_mon": [], 
[osdev01][DEBUG ]     "required_con": "144115188346404864", 
[osdev01][DEBUG ]     "required_mon": [
[osdev01][DEBUG ]       "kraken", 
[osdev01][DEBUG ]       "luminous", 
[osdev01][DEBUG ]       "mimic", 
[osdev01][DEBUG ]       "osdmap-prune"
[osdev01][DEBUG ]     ]
[osdev01][DEBUG ]   }, 
[osdev01][DEBUG ]   "monmap": {
[osdev01][DEBUG ]     "created": "2018-08-23 10:55:27.755434", 
[osdev01][DEBUG ]     "epoch": 3, 
[osdev01][DEBUG ]     "features": {
[osdev01][DEBUG ]       "optional": [], 
[osdev01][DEBUG ]       "persistent": [
[osdev01][DEBUG ]         "kraken", 
[osdev01][DEBUG ]         "luminous", 
[osdev01][DEBUG ]         "mimic", 
[osdev01][DEBUG ]         "osdmap-prune"
[osdev01][DEBUG ]       ]
[osdev01][DEBUG ]     }, 
[osdev01][DEBUG ]     "fsid": "383237bd-becf-49d5-9bd6-deb0bc35ab2a", 
[osdev01][DEBUG ]     "modified": "2018-09-19 14:57:08.984472", 
[osdev01][DEBUG ]     "mons": [
[osdev01][DEBUG ]       {
[osdev01][DEBUG ]         "addr": "172.29.101.167:6789/0", 
[osdev01][DEBUG ]         "name": "osdev02", 
[osdev01][DEBUG ]         "public_addr": "172.29.101.167:6789/0", 
[osdev01][DEBUG ]         "rank": 0
[osdev01][DEBUG ]       }, 
[osdev01][DEBUG ]       {
[osdev01][DEBUG ]         "addr": "172.29.101.168:6789/0", 
[osdev01][DEBUG ]         "name": "osdev03", 
[osdev01][DEBUG ]         "public_addr": "172.29.101.168:6789/0", 
[osdev01][DEBUG ]         "rank": 1
[osdev01][DEBUG ]       }
[osdev01][DEBUG ]     ]
[osdev01][DEBUG ]   }, 
[osdev01][DEBUG ]   "name": "osdev01", 
[osdev01][DEBUG ]   "outside_quorum": [], 
[osdev01][DEBUG ]   "quorum": [], 
[osdev01][DEBUG ]   "rank": -1, 
[osdev01][DEBUG ]   "state": "probing", 
[osdev01][DEBUG ]   "sync_provider": []
[osdev01][DEBUG ] }
[osdev01][DEBUG ] ********************************************************************************
[osdev01][INFO  ] monitor: mon.osdev01 is currently at the state of probing

Deploy Managers

  • Install the Manager service on the 3 nodes (the service was added in kraken and is required from luminous onward):
$ ceph-deploy mgr create osdev01 osdev02 osdev03
  • Check the cluster status; only one of the 3 Managers is active:
$ sudo ceph -s
  cluster:
    id:     383237bd-becf-49d5-9bd6-deb0bc35ab2a
    health: HEALTH_WARN
            mon osdev01 is low on available space
 
  services:
    mon: 3 daemons, quorum osdev01,osdev02,osdev03
    mgr: osdev01(active), standbys: osdev03, osdev02
    osd: 0 osds: 0 up, 0 in
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0  objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:
  • View the current cluster quorum status:
$ sudo ceph quorum_status --format json-pretty
{
    "election_epoch": 8,
    "quorum": [
        0,
        1,
        2
    ],
    "quorum_names": [
        "osdev01",
        "osdev02",
        "osdev03"
    ],
    "quorum_leader_name": "osdev01",
    "monmap": {
        "epoch": 2,
        "fsid": "383237bd-becf-49d5-9bd6-deb0bc35ab2a",
        "modified": "2018-08-23 10:55:53.598952",
        "created": "2018-08-23 10:55:27.755434",
        "features": {
            "persistent": [
                "kraken",
                "luminous",
                "mimic",
                "osdmap-prune"
            ],
            "optional": []
        },
        "mons": [
            {
                "rank": 0,
                "name": "osdev01",
                "addr": "172.29.101.166:6789/0",
                "public_addr": "172.29.101.166:6789/0"
            },
            {
                "rank": 1,
                "name": "osdev02",
                "addr": "172.29.101.167:6789/0",
                "public_addr": "172.29.101.167:6789/0"
            },
            {
                "rank": 2,
                "name": "osdev03",
                "addr": "172.29.101.168:6789/0",
                "public_addr": "172.29.101.168:6789/0"
            }
        ]
    }
}
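
Monitors need a strict majority to form a quorum, which is worth keeping in mind when reading the quorum status above or when removing a Monitor. The arithmetic is sketched below; quorum_needed is a hypothetical helper name.

```shell
# Hypothetical helper: smallest strict majority of n monitors.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

for n in 1 2 3 5; do
    echo "$n monitors -> quorum needs $(quorum_needed "$n")"
done
```

With 3 Monitors the cluster tolerates one failure; with only 2 (as after removing osdev01) both must stay up.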

Deploy OSDs

  • If OSDs were deployed before, remove their leftover LVM volumes:
$ sudo lvs | awk 'NR!=1 {if($1~"osd-block-") print $2 "/" $1}' | xargs -I {} sudo lvremove -y {}
  • Wipe the disks (this can be skipped if they were already wiped with dd and hold no LVM volumes):
$ ceph-deploy disk zap osdev01 /dev/sdb
$ ceph-deploy disk zap osdev02 /dev/sdb
$ ceph-deploy disk zap osdev03 /dev/sdb
  • Deploy the OSD service on the 3 nodes; bluestore is used by default, with no journal or block_db:
$ ceph-deploy osd create --data /dev/sdb osdev01
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/osdev/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.1): /usr/bin/ceph-deploy osd create --data /dev/sdb osdev01
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  bluestore                     : None
[ceph_deploy.cli][INFO  ]  cd_conf                       : 
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  fs_type                       : xfs
[ceph_deploy.cli][INFO  ]  block_wal                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  journal                       : None
[ceph_deploy.cli][INFO  ]  subcommand                    : create
[ceph_deploy.cli][INFO  ]  host                          : osdev01
[ceph_deploy.cli][INFO  ]  filestore                     : None
[ceph_deploy.cli][INFO  ]  func                          : 
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  zap_disk                      : False
[ceph_deploy.cli][INFO  ]  data                          : /dev/sdb
[ceph_deploy.cli][INFO  ]  block_db                      : None
[ceph_deploy.cli][INFO  ]  dmcrypt                       : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : False
[ceph_deploy.cli][INFO  ]  dmcrypt_key_dir               : /etc/ceph/dmcrypt-keys
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  debug                         : False
[ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data device /dev/sdb
[osdev01][DEBUG ] connection detected need for sudo
[osdev01][DEBUG ] connected to host: osdev01 
[osdev01][DEBUG ] detect platform information from remote host
[osdev01][DEBUG ] detect machine type
[osdev01][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.5.1804 Core
[ceph_deploy.osd][DEBUG ] Deploying osd to osdev01
[osdev01][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[osdev01][DEBUG ] find the location of an executable
[osdev01][INFO  ] Running command: sudo /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdb
[osdev01][DEBUG ] Running command: /bin/ceph-authtool --gen-print-key
[osdev01][DEBUG ] Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 3c3d6c5a-c82e-4318-a8fb-134de5444ca7
[osdev01][DEBUG ] Running command: /usr/sbin/vgcreate --force --yes ceph-95b94aa4-22df-401c-822b-dd62f82f6b08 /dev/sdb
[osdev01][DEBUG ]  stdout: Physical volume "/dev/sdb" successfully created.
[osdev01][DEBUG ]  stdout: Volume group "ceph-95b94aa4-22df-401c-822b-dd62f82f6b08" successfully created
[osdev01][DEBUG ] Running command: /usr/sbin/lvcreate --yes -l 100%FREE -n osd-block-3c3d6c5a-c82e-4318-a8fb-134de5444ca7 ceph-95b94aa4-22df-401c-822b-dd62f82f6b08
[osdev01][DEBUG ]  stdout: Logical volume "osd-block-3c3d6c5a-c82e-4318-a8fb-134de5444ca7" created.
[osdev01][DEBUG ] Running command: /bin/ceph-authtool --gen-print-key
[osdev01][DEBUG ] Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-1
[osdev01][DEBUG ] Running command: /bin/chown -h ceph:ceph /dev/ceph-95b94aa4-22df-401c-822b-dd62f82f6b08/osd-block-3c3d6c5a-c82e-4318-a8fb-134de5444ca7
[osdev01][DEBUG ] Running command: /bin/chown -R ceph:ceph /dev/dm-0
[osdev01][DEBUG ] Running command: /bin/ln -s /dev/ceph-95b94aa4-22df-401c-822b-dd62f82f6b08/osd-block-3c3d6c5a-c82e-4318-a8fb-134de5444ca7 /var/lib/ceph/osd/ceph-1/block
[osdev01][DEBUG ] Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-1/activate.monmap
[osdev01][DEBUG ]  stderr: got monmap epoch 1
[osdev01][DEBUG ] Running command: /bin/ceph-authtool /var/lib/ceph/osd/ceph-1/keyring --create-keyring --name osd.1 --add-key AQDxF35bOAdNHBAAelXgl7laeMnVsGAlHl0dxQ==
[osdev01][DEBUG ]  stdout: creating /var/lib/ceph/osd/ceph-1/keyring
[osdev01][DEBUG ] added entity osd.1 auth auth(auid = 18446744073709551615 key=AQDxF35bOAdNHBAAelXgl7laeMnVsGAlHl0dxQ== with 0 caps)
[osdev01][DEBUG ] Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1/keyring
[osdev01][DEBUG ] Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1/
[osdev01][DEBUG ] Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 1 --monmap /var/lib/ceph/osd/ceph-1/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-1/ --osd-uuid 3c3d6c5a-c82e-4318-a8fb-134de5444ca7 --setuser ceph --setgroup ceph
[osdev01][DEBUG ] --> ceph-volume lvm prepare successful for: /dev/sdb
[osdev01][DEBUG ] Running command: /bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-95b94aa4-22df-401c-822b-dd62f82f6b08/osd-block-3c3d6c5a-c82e-4318-a8fb-134de5444ca7 --path /var/lib/ceph/osd/ceph-1 --no-mon-config
[osdev01][DEBUG ] Running command: /bin/ln -snf /dev/ceph-95b94aa4-22df-401c-822b-dd62f82f6b08/osd-block-3c3d6c5a-c82e-4318-a8fb-134de5444ca7 /var/lib/ceph/osd/ceph-1/block
[osdev01][DEBUG ] Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-1/block
[osdev01][DEBUG ] Running command: /bin/chown -R ceph:ceph /dev/dm-0
[osdev01][DEBUG ] Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1
[osdev01][DEBUG ] Running command: /bin/systemctl enable ceph-volume@lvm-1-3c3d6c5a-c82e-4318-a8fb-134de5444ca7
[osdev01][DEBUG ]  stderr: Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-1-3c3d6c5a-c82e-4318-a8fb-134de5444ca7.service to /usr/lib/systemd/system/ceph-volume@.service.
[osdev01][DEBUG ] Running command: /bin/systemctl start ceph-osd@1
[osdev01][DEBUG ] --> ceph-volume lvm activate successful for osd ID: 1
[osdev01][DEBUG ] --> ceph-volume lvm create successful for: /dev/sdb
[osdev01][INFO  ] checking OSD status...
[osdev01][DEBUG ] find the location of an executable
[osdev01][INFO  ] Running command: sudo /bin/ceph --cluster=ceph osd stat --format=json
[osdev01][WARNIN] there is 1 OSD down
[osdev01][WARNIN] there is 1 OSD out
[ceph_deploy.osd][DEBUG ] Host osdev01 is now ready for osd use.

$ ceph-deploy osd create --data /dev/sdb osdev02
$ ceph-deploy osd create --data /dev/sdb osdev03
  • Inspect the OSD partition layout; recent Ceph versions use bluestore by default:
$ ceph-deploy osd list osdev01
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.1): /usr/bin/ceph-deploy osd list osdev01
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  debug                         : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : False
[ceph_deploy.cli][INFO  ]  subcommand                    : list
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : 
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  host                          : ['osdev01']
[ceph_deploy.cli][INFO  ]  func                          : 
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[osdev01][DEBUG ] connected to host: osdev01 
[osdev01][DEBUG ] detect platform information from remote host
[osdev01][DEBUG ] detect machine type
[osdev01][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.5.1804 Core
[ceph_deploy.osd][DEBUG ] Listing disks on osdev01...
[osdev01][DEBUG ] find the location of an executable
[osdev01][INFO  ] Running command: /usr/sbin/ceph-volume lvm list
[osdev01][DEBUG ] 
[osdev01][DEBUG ] 
[osdev01][DEBUG ] ====== osd.0 =======
[osdev01][DEBUG ] 
[osdev01][DEBUG ]   [block]    /dev/ceph-a2130090-fb78-4b65-838f-7496c63fa025/osd-block-2cb30e7c-7b98-4a6c-816a-2de7201a7669
[osdev01][DEBUG ] 
[osdev01][DEBUG ]       type                      block
[osdev01][DEBUG ]       osd id                    0
[osdev01][DEBUG ]       cluster fsid              383237bd-becf-49d5-9bd6-deb0bc35ab2a
[osdev01][DEBUG ]       cluster name              ceph
[osdev01][DEBUG ]       osd fsid                  2cb30e7c-7b98-4a6c-816a-2de7201a7669
[osdev01][DEBUG ]       encrypted                 0
[osdev01][DEBUG ]       cephx lockbox secret      
[osdev01][DEBUG ]       block uuid                AL5bfk-acAQ-9guP-tl61-A4Jf-RQOF-nFnE9o
[osdev01][DEBUG ]       block device              /dev/ceph-a2130090-fb78-4b65-838f-7496c63fa025/osd-block-2cb30e7c-7b98-4a6c-816a-2de7201a7669
[osdev01][DEBUG ]       vdo                       0
[osdev01][DEBUG ]       crush device class        None
[osdev01][DEBUG ]       devices                   /dev/sdb

$ lvs
  LV                                             VG                                        Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  osd-block-2cb30e7c-7b98-4a6c-816a-2de7201a7669 ceph-a2130090-fb78-4b65-838f-7496c63fa025 -wi-ao---- <7.28t                                                    

$ pvs
  PV         VG                                        Fmt  Attr PSize  PFree
  /dev/sdb   ceph-a2130090-fb78-4b65-838f-7496c63fa025 lvm2 a--  <7.28t    0

# osdev01
$ df -h | grep ceph
tmpfs           189G   24K  189G    1% /var/lib/ceph/osd/ceph-0

$ ll /var/lib/ceph/osd/ceph-0
总用量 24
lrwxrwxrwx 1 ceph ceph 93 8月  29 15:15 block -> /dev/ceph-a2130090-fb78-4b65-838f-7496c63fa025/osd-block-2cb30e7c-7b98-4a6c-816a-2de7201a7669
-rw------- 1 ceph ceph 37 8月  29 15:15 ceph_fsid
-rw------- 1 ceph ceph 37 8月  29 15:15 fsid
-rw------- 1 ceph ceph 55 8月  29 15:15 keyring
-rw------- 1 ceph ceph  6 8月  29 15:15 ready
-rw------- 1 ceph ceph 10 8月  29 15:15 type
-rw------- 1 ceph ceph  2 8月  29 15:15 whoami

$ cat /var/lib/ceph/osd/ceph-0/whoami 
0
$ cat /var/lib/ceph/osd/ceph-0/type 
bluestore
$ cat /var/lib/ceph/osd/ceph-0/ready 
ready
$ cat /var/lib/ceph/osd/ceph-0/fsid 
2cb30e7c-7b98-4a6c-816a-2de7201a7669

# osdev02
$ df -h | grep ceph
tmpfs           189G   48K  189G    1% /var/lib/ceph/osd/ceph-1
  • Check the cluster status:
$ sudo ceph health
HEALTH_WARN mon osdev01 is low on available space

$ sudo ceph -s
  cluster:
    id:     383237bd-becf-49d5-9bd6-deb0bc35ab2a
    health: HEALTH_WARN
            mon osdev01 is low on available space
 
  services:
    mon: 3 daemons, quorum osdev01,osdev02,osdev03
    mgr: osdev01(active), standbys: osdev03, osdev02
    osd: 3 osds: 3 up, 3 in
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0  objects, 0 B
    usage:   3.0 GiB used, 22 TiB / 22 TiB avail
    pgs:     
  • Check the OSD status:
$ sudo ceph osd tree
ID CLASS WEIGHT   TYPE NAME        STATUS REWEIGHT PRI-AFF 
-1       21.83066 root default                             
-3        7.27689     host osdev01                         
 0   hdd  7.27689         osd.0        up  1.00000 1.00000 
-5        7.27689     host osdev02                         
 1   hdd  7.27689         osd.1        up  1.00000 1.00000 
-7        7.27689     host osdev03                         
 2   hdd  7.27689         osd.2        up  1.00000 1.00000
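
When scripting a deployment, the `ceph osd tree` listing can be checked automatically, for example by counting the OSDs reported as up. The sketch below parses text in the format shown above; count_up_osds is a hypothetical helper, and it assumes the STATUS value directly follows the osd.N name.

```shell
# Hypothetical helper: count OSD rows whose status column reads "up".
count_up_osds() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i ~ /^osd\.[0-9]+$/ && $(i + 1) == "up") n++
    } END { print n + 0 }'
}

# Sample input copied from the listing above.
count_up_osds <<'EOF'
ID CLASS WEIGHT   TYPE NAME        STATUS REWEIGHT PRI-AFF
-1       21.83066 root default
-3        7.27689     host osdev01
 0   hdd  7.27689         osd.0        up  1.00000 1.00000
-5        7.27689     host osdev02
 1   hdd  7.27689         osd.1        up  1.00000 1.00000
-7        7.27689     host osdev03
 2   hdd  7.27689         osd.2        up  1.00000 1.00000
EOF
```

On a live cluster the here-document would be replaced by piping `sudo ceph osd tree` into the helper.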

Remove an OSD

  • Mark the OSD out of the cluster:
$ ceph osd out 0
marked out osd.0.
  • Watch the data migration:
$ ceph -w
  • Stop the OSD service on the corresponding node:
$ systemctl stop ceph-osd@0
  • Remove the OSD from the CRUSH map:
$ ceph osd crush remove osd.0
removed item id 0 name 'osd.0' from crush map
  • Delete the OSD's authentication key:
$ ceph auth del osd.0
updated
  • Clean up the OSD's disk:
$ sudo lvs | awk 'NR!=1 {if($1~"osd-block-") print $2 "/" $1}' | xargs -I {} sudo lvremove -y {}
  Logical volume "osd-block-2cb30e7c-7b98-4a6c-816a-2de7201a7669" successfully removed

$ ceph-deploy disk zap osdev01 /dev/sdb
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.1): /usr/bin/ceph-deploy disk zap osdev01 /dev/sdb
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  debug                         : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : False
[ceph_deploy.cli][INFO  ]  subcommand                    : zap
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : 
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  host                          : osdev01
[ceph_deploy.cli][INFO  ]  func                          : 
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.cli][INFO  ]  disk                          : ['/dev/sdb']
[ceph_deploy.osd][DEBUG ] zapping /dev/sdb on osdev01
[osdev01][DEBUG ] connected to host: osdev01 
[osdev01][DEBUG ] detect platform information from remote host
[osdev01][DEBUG ] detect machine type
[osdev01][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.5.1804 Core
[osdev01][DEBUG ] zeroing last few blocks of device
[osdev01][DEBUG ] find the location of an executable
[osdev01][INFO  ] Running command: /usr/sbin/ceph-volume lvm zap /dev/sdb
[osdev01][DEBUG ] --> Zapping: /dev/sdb
[osdev01][DEBUG ] Running command: /usr/sbin/cryptsetup status /dev/mapper/
[osdev01][DEBUG ]  stdout: /dev/mapper/ is inactive.
[osdev01][DEBUG ] Running command: /usr/sbin/wipefs --all /dev/sdb
[osdev01][DEBUG ]  stdout: /dev/sdb:8 个字节已擦除,位置偏移为 0x00000218 (LVM2_member):4c 56 4d 32 20 30 30 31
[osdev01][DEBUG ] Running command: /bin/dd if=/dev/zero of=/dev/sdb bs=1M count=10
[osdev01][DEBUG ]  stderr: 记录了10+0 的读入
[osdev01][DEBUG ] 记录了10+0 的写出
[osdev01][DEBUG ] 10485760字节(10 MB)已复制
[osdev01][DEBUG ]  stderr: ,0.0131341 秒,798 MB/秒
[osdev01][DEBUG ] --> Zapping successful for: /dev/sdb
  • Re-add the OSD:
$ ceph-deploy osd create --data /dev/sdb osdev01
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.1): /usr/bin/ceph-deploy osd create --data /dev/sdb osdev01
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  bluestore                     : None
[ceph_deploy.cli][INFO  ]  cd_conf                       : 
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  fs_type                       : xfs
[ceph_deploy.cli][INFO  ]  block_wal                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  journal                       : None
[ceph_deploy.cli][INFO  ]  subcommand                    : create
[ceph_deploy.cli][INFO  ]  host                          : osdev01
[ceph_deploy.cli][INFO  ]  filestore                     : None
[ceph_deploy.cli][INFO  ]  func                          : 
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  zap_disk                      : False
[ceph_deploy.cli][INFO  ]  data                          : /dev/sdb
[ceph_deploy.cli][INFO  ]  block_db                      : None
[ceph_deploy.cli][INFO  ]  dmcrypt                       : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : False
[ceph_deploy.cli][INFO  ]  dmcrypt_key_dir               : /etc/ceph/dmcrypt-keys
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  debug                         : False
[ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data device /dev/sdb
[osdev01][DEBUG ] connected to host: osdev01 
[osdev01][DEBUG ] detect platform information from remote host
[osdev01][DEBUG ] detect machine type
[osdev01][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.5.1804 Core
[ceph_deploy.osd][DEBUG ] Deploying osd to osdev01
[osdev01][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[osdev01][DEBUG ] find the location of an executable
[osdev01][INFO  ] Running command: /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdb
[osdev01][DEBUG ] Running command: /bin/ceph-authtool --gen-print-key
[osdev01][DEBUG ] Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new df124d5a-122a-48b4-9173-87088c6e6aac
[osdev01][DEBUG ] Running command: /usr/sbin/vgcreate --force --yes ceph-5cddc4d4-2b62-452a-8ba1-61df276d5320 /dev/sdb
[osdev01][DEBUG ]  stdout: Physical volume "/dev/sdb" successfully created.
[osdev01][DEBUG ]  stdout: Volume group "ceph-5cddc4d4-2b62-452a-8ba1-61df276d5320" successfully created
[osdev01][DEBUG ] Running command: /usr/sbin/lvcreate --yes -l 100%FREE -n osd-block-df124d5a-122a-48b4-9173-87088c6e6aac ceph-5cddc4d4-2b62-452a-8ba1-61df276d5320
[osdev01][DEBUG ]  stdout: Logical volume "osd-block-df124d5a-122a-48b4-9173-87088c6e6aac" created.
[osdev01][DEBUG ] Running command: /bin/ceph-authtool --gen-print-key
[osdev01][DEBUG ] Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-3
[osdev01][DEBUG ] Running command: /bin/chown -h ceph:ceph /dev/ceph-5cddc4d4-2b62-452a-8ba1-61df276d5320/osd-block-df124d5a-122a-48b4-9173-87088c6e6aac
[osdev01][DEBUG ] Running command: /bin/chown -R ceph:ceph /dev/dm-0
[osdev01][DEBUG ] Running command: /bin/ln -s /dev/ceph-5cddc4d4-2b62-452a-8ba1-61df276d5320/osd-block-df124d5a-122a-48b4-9173-87088c6e6aac /var/lib/ceph/osd/ceph-3/block
[osdev01][DEBUG ] Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-3/activate.monmap
[osdev01][DEBUG ]  stderr: got monmap epoch 4
[osdev01][DEBUG ] Running command: /bin/ceph-authtool /var/lib/ceph/osd/ceph-3/keyring --create-keyring --name osd.3 --add-key AQDP9qFbXoYRERAAMMz5EHjYAdlveVdDe1uAYg==
[osdev01][DEBUG ]  stdout: creating /var/lib/ceph/osd/ceph-3/keyring
[osdev01][DEBUG ]  stdout: added entity osd.3 auth auth(auid = 18446744073709551615 key=AQDP9qFbXoYRERAAMMz5EHjYAdlveVdDe1uAYg== with 0 caps)
[osdev01][DEBUG ] Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-3/keyring
[osdev01][DEBUG ] Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-3/
[osdev01][DEBUG ] Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 3 --monmap /var/lib/ceph/osd/ceph-3/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-3/ --osd-uuid df124d5a-122a-48b4-9173-87088c6e6aac --setuser ceph --setgroup ceph
[osdev01][DEBUG ] --> ceph-volume lvm prepare successful for: /dev/sdb
[osdev01][DEBUG ] Running command: /bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-5cddc4d4-2b62-452a-8ba1-61df276d5320/osd-block-df124d5a-122a-48b4-9173-87088c6e6aac --path /var/lib/ceph/osd/ceph-3 --no-mon-config
[osdev01][DEBUG ] Running command: /bin/ln -snf /dev/ceph-5cddc4d4-2b62-452a-8ba1-61df276d5320/osd-block-df124d5a-122a-48b4-9173-87088c6e6aac /var/lib/ceph/osd/ceph-3/block
[osdev01][DEBUG ] Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-3/block
[osdev01][DEBUG ] Running command: /bin/chown -R ceph:ceph /dev/dm-0
[osdev01][DEBUG ] Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-3
[osdev01][DEBUG ] Running command: /bin/systemctl enable ceph-volume@lvm-3-df124d5a-122a-48b4-9173-87088c6e6aac
[osdev01][DEBUG ]  stderr: Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-3-df124d5a-122a-48b4-9173-87088c6e6aac.service to /usr/lib/systemd/system/ceph-volume@.service.
[osdev01][DEBUG ] Running command: /bin/systemctl start ceph-osd@3
[osdev01][DEBUG ] --> ceph-volume lvm activate successful for osd ID: 3
[osdev01][DEBUG ] --> ceph-volume lvm create successful for: /dev/sdb
[osdev01][INFO  ] checking OSD status...
[osdev01][DEBUG ] find the location of an executable
[osdev01][INFO  ] Running command: /bin/ceph --cluster=ceph osd stat --format=json
[osdev01][WARNIN] there is 1 OSD down
[osdev01][WARNIN] there is 1 OSD out
[ceph_deploy.osd][DEBUG ] Host osdev01 is now ready for osd use.
  • Check the OSD status:
$ ceph osd tree
ID CLASS WEIGHT   TYPE NAME        STATUS REWEIGHT PRI-AFF 
-1       21.83066 root default                             
-3        7.27689     host osdev01                         
 3   hdd  7.27689         osd.3        up  1.00000 1.00000 
-5        7.27689     host osdev02                         
 1   hdd  7.27689         osd.1        up  1.00000 1.00000 
-7        7.27689     host osdev03                         
 2   hdd  7.27689         osd.2        up  1.00000 1.00000 
 0              0 osd.0              down        0 1.00000
 
$ ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-3/block 
{
    "/var/lib/ceph/osd/ceph-3/block": {
        "osd_uuid": "df124d5a-122a-48b4-9173-87088c6e6aac",
        "size": 8000995590144,
        "btime": "2018-09-19 15:12:17.376253",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "383237bd-becf-49d5-9bd6-deb0bc35ab2a",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "osd_key": "AQDP9qFbXoYRERAAMMz5EHjYAdlveVdDe1uAYg==",
        "ready": "ready",
        "whoami": "3"
    }
}
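
The tree above still carries a stale osd.0 entry (down, weight 0) next to the re-created osd.3. As a minimal sketch, the filter below reads ceph osd tree text on standard input and prints every OSD whose STATUS column is down. The helper names down_osds and sample_tree are hypothetical, and the parsing assumes the column layout shown above.

```shell
# down_osds: read `ceph osd tree` text on stdin and print the name of
# every OSD whose STATUS column reads "down".
# Assumption: the status immediately follows the osd.N name, as in Mimic.
down_osds() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i ~ /^osd\.[0-9]+$/ && $(i + 1) == "down")
                print $i
    }'
}

# sample_tree: a copy of the tree shown above, so the filter can be tried
# without a running cluster.
sample_tree() {
    cat <<'EOF'
ID CLASS WEIGHT   TYPE NAME        STATUS REWEIGHT PRI-AFF
-1       21.83066 root default
-3        7.27689     host osdev01
 3   hdd  7.27689         osd.3        up  1.00000 1.00000
-5        7.27689     host osdev02
 1   hdd  7.27689         osd.1        up  1.00000 1.00000
-7        7.27689     host osdev03
 2   hdd  7.27689         osd.2        up  1.00000 1.00000
 0              0 osd.0              down        0 1.00000
EOF
}

sample_tree | down_osds   # prints: osd.0
```

On a live cluster the same filter would be fed with ceph osd tree | down_osds. Once the stale entry is confirmed to be unneeded, ceph osd purge 0 --yes-i-really-mean-it (available since Luminous) should remove it from the CRUSH map, the auth database, and the OSD map.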
  • Watch the data migration status:
$ ceph -w
  cluster:
    id:     383237bd-becf-49d5-9bd6-deb0bc35ab2a
    health: HEALTH_WARN
            Degraded data redundancy: 4825/16156 objects degraded (29.865%), 83 pgs degraded, 63 pgs undersized
            clock skew detected on mon.osdev02
            mon osdev01 is low on available space
 
  services:
    mon: 3 daemons, quorum osdev01,osdev02,osdev03
    mgr: osdev03(active), standbys: osdev02, osdev01
    osd: 4 osds: 3 up, 3 in; 63 remapped pgs
    rgw: 3 daemons active
 
  data:
    pools:   10 pools, 176 pgs
    objects: 5.39 k objects, 19 GiB
    usage:   43 GiB used, 22 TiB / 22 TiB avail
    pgs:     4825/16156 objects degraded (29.865%)
             88 active+clean
             48 active+undersized+degraded+remapped+backfill_wait
             19 active+recovery_wait+degraded
             15 active+recovery_wait+undersized+degraded+remapped
             5  active+recovery_wait
             1  active+recovering+degraded
 
  io:
    recovery: 15 MiB/s, 3 objects/s
 

2018-09-19 15:14:35.149958 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4825/16156 objects degraded (29.865%), 83 pgs degraded, 63 pgs undersized (PG_DEGRADED)
2018-09-19 15:14:40.154936 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4802/16156 objects degraded (29.723%), 83 pgs degraded, 63 pgs undersized (PG_DEGRADED)
2018-09-19 15:14:45.155511 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4785/16156 objects degraded (29.617%), 72 pgs degraded, 63 pgs undersized (PG_DEGRADED)
2018-09-19 15:14:50.156258 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4761/16156 objects degraded (29.469%), 70 pgs degraded, 63 pgs undersized (PG_DEGRADED)
2018-09-19 15:14:55.157259 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4736/16156 objects degraded (29.314%), 66 pgs degraded, 63 pgs undersized (PG_DEGRADED)
2018-09-19 15:15:00.157805 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4715/16156 objects degraded (29.184%), 66 pgs degraded, 63 pgs undersized (PG_DEGRADED)
2018-09-19 15:15:05.159788 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4700/16156 objects degraded (29.091%), 65 pgs degraded, 62 pgs undersized (PG_DEGRADED)
2018-09-19 15:15:10.160347 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4687/16156 objects degraded (29.011%), 65 pgs degraded, 62 pgs undersized (PG_DEGRADED)
2018-09-19 15:15:15.161346 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4663/16156 objects degraded (28.862%), 65 pgs degraded, 62 pgs undersized (PG_DEGRADED)
2018-09-19 15:15:20.163878 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4639/16156 objects degraded (28.714%), 64 pgs degraded, 62 pgs undersized (PG_DEGRADED)
2018-09-19 15:15:25.166626 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4634/16156 objects degraded (28.683%), 64 pgs degraded, 62 pgs undersized (PG_DEGRADED)
2018-09-19 15:15:30.168933 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4612/16156 objects degraded (28.547%), 62 pgs degraded, 61 pgs undersized (PG_DEGRADED)
2018-09-19 15:15:35.170116 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4590/16156 objects degraded (28.410%), 62 pgs degraded, 61 pgs undersized (PG_DEGRADED)
2018-09-19 15:15:35.310448 mon.osdev01 [WRN] Health check failed: Reduced data availability: 1 pg inactive, 1 pg peering (PG_AVAILABILITY)
2018-09-19 15:15:40.170608 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4578/16156 objects degraded (28.336%), 60 pgs degraded, 60 pgs undersized (PG_DEGRADED)
2018-09-19 15:15:41.314443 mon.osdev01 [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 1 pg inactive, 1 pg peering)
2018-09-19 15:15:45.171537 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4564/16156 objects degraded (28.250%), 60 pgs degraded, 60 pgs undersized (PG_DEGRADED)
2018-09-19 15:15:50.172340 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4546/16156 objects degraded (28.138%), 59 pgs degraded, 59 pgs undersized (PG_DEGRADED)
2018-09-19 15:15:55.173243 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4536/16156 objects degraded (28.076%), 59 pgs degraded, 59 pgs undersized (PG_DEGRADED)
2018-09-19 15:16:00.174125 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4514/16156 objects degraded (27.940%), 59 pgs degraded, 59 pgs undersized (PG_DEGRADED)
2018-09-19 15:16:05.176502 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4496/16156 objects degraded (27.829%), 58 pgs degraded, 58 pgs undersized (PG_DEGRADED)
2018-09-19 15:16:10.177113 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4486/16156 objects degraded (27.767%), 58 pgs degraded, 58 pgs undersized (PG_DEGRADED)
2018-09-19 15:16:15.178024 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4464/16156 objects degraded (27.631%), 58 pgs degraded, 58 pgs undersized (PG_DEGRADED)
2018-09-19 15:16:20.178774 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4457/16156 objects degraded (27.587%), 57 pgs degraded, 57 pgs undersized (PG_DEGRADED)
2018-09-19 15:16:25.179609 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4436/16156 objects degraded (27.457%), 57 pgs degraded, 57 pgs undersized (PG_DEGRADED)
2018-09-19 15:16:30.180333 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4426/16156 objects degraded (27.395%), 56 pgs degraded, 56 pgs undersized (PG_DEGRADED)
2018-09-19 15:16:35.180850 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4404/16156 objects degraded (27.259%), 56 pgs degraded, 56 pgs undersized (PG_DEGRADED)
2018-09-19 15:16:37.760009 mon.osdev01 [WRN] mon.1 172.29.101.167:6789/0 clock skew 1.47964s > max 0.5s
2018-09-19 15:16:40.181520 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4383/16156 objects degraded (27.129%), 55 pgs degraded, 55 pgs undersized (PG_DEGRADED)
2018-09-19 15:16:45.183101 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4373/16156 objects degraded (27.067%), 55 pgs degraded, 55 pgs undersized (PG_DEGRADED)
2018-09-19 15:16:50.184008 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4351/16156 objects degraded (26.931%), 55 pgs degraded, 55 pgs undersized (PG_DEGRADED)
2018-09-19 15:16:51.434708 mon.osdev01 [WRN] Health check failed: Reduced data availability: 1 pg inactive, 1 pg peering (PG_AVAILABILITY)
2018-09-19 15:16:55.184869 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4336/16156 objects degraded (26.838%), 54 pgs degraded, 54 pgs undersized (PG_DEGRADED)
2018-09-19 15:16:56.238863 mon.osdev01 [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 1 pg inactive, 1 pg peering)
2018-09-19 15:17:00.185629 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4318/16156 objects degraded (26.727%), 54 pgs degraded, 54 pgs undersized (PG_DEGRADED)
2018-09-19 15:17:05.186503 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4296/16156 objects degraded (26.591%), 54 pgs degraded, 54 pgs undersized (PG_DEGRADED)
2018-09-19 15:17:10.187331 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4283/16156 objects degraded (26.510%), 52 pgs degraded, 52 pgs undersized (PG_DEGRADED)
2018-09-19 15:17:15.188170 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4261/16156 objects degraded (26.374%), 52 pgs degraded, 52 pgs undersized (PG_DEGRADED)
2018-09-19 15:17:20.189922 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4243/16156 objects degraded (26.263%), 51 pgs degraded, 51 pgs undersized (PG_DEGRADED)
2018-09-19 15:17:25.190843 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4227/16156 objects degraded (26.164%), 51 pgs degraded, 51 pgs undersized (PG_DEGRADED)
2018-09-19 15:17:30.191813 mon.osdev01 [WRN] Health check update: Degraded data redundancy: 4205/16156 objects degraded (26.027%), 51 pgs degraded, 51 pgs undersized (PG_DEGRADED)
2018-09-19 15:17:32.348305 mon.osdev01 [WRN] Health check failed: Reduced data availability: 1 pg inactive, 1 pg peering (PG_AVAILABILITY)
...

$ watch -n1 ceph -s
Every 1.0s: ceph -s                                                                                                                                                    Wed Sep 19 15:21:12 2018

  cluster:
    id:     383237bd-becf-49d5-9bd6-deb0bc35ab2a
    health: HEALTH_WARN
            Degraded data redundancy: 3372/16156 objects degraded (20.872%), 36 pgs degraded, 36 pgs undersized
            clock skew detected on mon.osdev02
            mon osdev01 is low on available space

  services:
    mon: 3 daemons, quorum osdev01,osdev02,osdev03
    mgr: osdev03(active), standbys: osdev02, osdev01
    osd: 4 osds: 3 up, 3 in; 36 remapped pgs
    rgw: 3 daemons active

  data:
    pools:   10 pools, 176 pgs
    objects: 5.39 k objects, 19 GiB
    usage:   48 GiB used, 22 TiB / 22 TiB avail
    pgs:     3372/16156 objects degraded (20.872%)
             140 active+clean
             35  active+undersized+degraded+remapped+backfill_wait
             1   active+undersized+degraded+remapped+backfilling

  io:
    recovery: 17 MiB/s, 4 objects/s
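
Recovery progress can also be polled from a script instead of watching the screen. The sketch below extracts the degraded-object percentage from ceph -s text. The name degraded_pct is a hypothetical helper, and the pattern assumes the "objects degraded (NN.NNN%)" wording shown in the output above.

```shell
# degraded_pct: read `ceph -s` text on stdin and print the first
# "objects degraded (NN.NNN%)" percentage, or 0 when no such line exists.
degraded_pct() {
    awk 'match($0, /objects degraded \([0-9.]+%\)/) {
             s = substr($0, RSTART, RLENGTH)
             sub(/^objects degraded \(/, "", s)
             sub(/%\)$/, "", s)
             print s
             found = 1
             exit
         }
         END { if (!found) print 0 }'
}

# Try it on one line copied from the status output above.
printf '%s\n' '    pgs:     3372/16156 objects degraded (20.872%)' | degraded_pct
# prints: 20.872
```

A waiting loop could then look like: while [ "$(ceph -s | degraded_pct)" != "0" ]; do sleep 10; done.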

Deploying MDS

  • Deploy the MDS service on all 3 nodes:
$ ceph-deploy mds create osdev01 osdev02 osdev03

Deploying RGW

  • Deploy the RGW service on all 3 nodes:
$ ceph-deploy rgw create osdev01 osdev02 osdev03
  • Check the cluster status:
$ sudo ceph -s
  cluster:
    id:     383237bd-becf-49d5-9bd6-deb0bc35ab2a
    health: HEALTH_WARN
            too few PGs per OSD (22 < min 30)
 
  services:
    mon: 3 daemons, quorum osdev01,osdev02,osdev03
    mgr: osdev01(active), standbys: osdev03, osdev02
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active
 
  data:
    pools:   4 pools, 32 pgs
    objects: 16  objects, 3.2 KiB
    usage:   3.0 GiB used, 22 TiB / 22 TiB avail
    pgs:     31.250% pgs unknown
             3.125% pgs not active
             21 active+clean
             10 unknown
             1  creating+peering
 
  io:
    client:   2.4 KiB/s rd, 731 B/s wr, 3 op/s rd, 0 op/s wr
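
The "too few PGs per OSD" warning above goes away once pools with enough placement groups exist. A commonly cited rule of thumb from the Ceph placement-group documentation is total PGs ≈ (OSD count × 100) / replica count, rounded up to the next power of two. The sketch below only does that arithmetic; suggested_pgs is a hypothetical helper name, and the result is a starting point rather than a tuned value.

```shell
# suggested_pgs OSDS SIZE
# Rule of thumb: (OSDS * 100) / SIZE, rounded up to the next power of two.
suggested_pgs() (
    osds=$1
    size=$2
    target=$(( osds * 100 / size ))
    pgs=1
    while [ "$pgs" -lt "$target" ]; do
        pgs=$(( pgs * 2 ))
    done
    echo "$pgs"
)

suggested_pgs 3 3   # 3 OSDs, 3 replicas -> prints 128
```

For this cluster (3 OSDs, replica size 3) the rule yields 128 PGs in total across all pools.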

Uninstalling Ceph

  • Uninstall the deployed Ceph, including packages and configuration:
# destroy and uninstall all packages
$ ceph-deploy purge osdev01 osdev02 osdev03

# destroy data
$ ceph-deploy purgedata osdev01 osdev02 osdev03

$ ceph-deploy forgetkeys

# remove all keys
$ rm -rfv ceph.*

Testing and Usage

Creating a Pool

  • List the current pools; several default pools created by the RGW gateway are already present:
$ rados lspools
.rgw.root
default.rgw.control
default.rgw.meta
default.rgw.log

$ rados -p .rgw.root ls
zone_info.4741b9cf-cc27-43d8-9bbc-59eee875b4db
zone_info.c775c6a6-036a-43ab-b558-ab0df40c3ad2
zonegroup_info.df77b60a-8423-4570-b9ae-ae4ef06a13a2
zone_info.0e5daa99-3863-4411-8d75-7d14a3f9a014
zonegroup_info.f652f53f-94bb-4599-a1c1-737f792a9510
zonegroup_info.5a4fb515-ef63-4ddc-85e0-5cf8339d9472
zone_names.default
zonegroups_names.default

$ ceph osd pool get .rgw.root pg_num
pg_num: 8

$ ceph osd dump
epoch 25
fsid 383237bd-becf-49d5-9bd6-deb0bc35ab2a
created 2018-08-23 10:55:49.409542
modified 2018-08-23 16:23:00.574710
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 7
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release mimic
pool 1 '.rgw.root' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 17 flags hashpspool stripe_width 0 application rgw
pool 2 'default.rgw.control' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 20 flags hashpspool stripe_width 0 application rgw
pool 3 'default.rgw.meta' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 22 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.log' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 24 flags hashpspool stripe_width 0 application rgw
max_osd 3
osd.0 up   in  weight 1 up_from 5 up_thru 23 down_at 0 last_clean_interval [0,0) 172.29.101.166:6801/719880 172.29.101.166:6802/719880 172.29.101.166:6803/719880 172.29.101.166:6804/719880 exists,up 2cb30e7c-7b98-4a6c-816a-2de7201a7669
osd.1 up   in  weight 1 up_from 15 up_thru 23 down_at 14 last_clean_interval [9,14) 172.29.101.167:6800/189449 172.29.101.167:6804/1189449 172.29.101.167:6805/1189449 172.29.101.167:6806/1189449 exists,up 9d3bafa9-9ea0-401c-ad67-a08ef7c2d9f7
osd.2 up   in  weight 1 up_from 13 up_thru 23 down_at 0 last_clean_interval [0,0) 172.29.101.168:6800/188591 172.29.101.168:6801/188591 172.29.101.168:6802/188591 172.29.101.168:6803/188591 exists,up a41fa4e0-c80b-4091-95cc-b58af291f387
  • Create a pool:
$ ceph osd pool create glance 32 32
pool 'glance' created
  • Try to delete the pool; the deletion is refused:
$ ceph osd pool delete glance
Error EPERM: WARNING: this will *PERMANENTLY DESTROY* all data stored in pool glance.  If you are *ABSOLUTELY CERTAIN* that is what you want, pass the pool name *twice*, followed by --yes-i-really-really-mean-it.

$ ceph osd pool delete glance glance --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool
  • Configure the monitors to allow pool deletion:
$ vi /etc/ceph/ceph.conf
[mon]
mon allow pool delete = true

$ systemctl restart ceph-mon.target
  • Delete the pool again:
$ ceph osd pool delete glance glance --yes-i-really-really-mean-it
pool 'glance' removed
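
Editing ceph.conf and restarting the monitors is one way to allow the deletion. As an alternative sketch, assuming the centralized configuration database that arrived with Mimic (ceph config set), the option can be toggled at runtime around the deletion and switched off again afterwards. The run and delete_pool helpers are hypothetical, and DRY_RUN defaults to 1 so the block only prints the commands it would execute.

```shell
# run: print the command when DRY_RUN=1 (the default here), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# delete_pool POOL: allow pool deletion, delete POOL, then lock deletion again.
delete_pool() {
    pool=$1
    run ceph config set mon mon_allow_pool_delete true
    run ceph osd pool delete "$pool" "$pool" --yes-i-really-really-mean-it
    run ceph config set mon mon_allow_pool_delete false
}

delete_pool glance
```

Setting DRY_RUN=0 before calling delete_pool would run the commands for real. Resetting the option to false afterwards keeps the safety guard in place.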

Creating an Object

  • Create a test pool and set its replica count to 3:
$ ceph osd pool create test-pool 128 128
$ ceph osd lspools
1 .rgw.root
2 default.rgw.control
3 default.rgw.meta
4 default.rgw.log
5 test-pool

$ ceph osd dump | grep pool
pool 1 '.rgw.root' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 17 flags hashpspool stripe_width 0 application rgw
pool 2 'default.rgw.control' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 20 flags hashpspool stripe_width 0 application rgw
pool 3 'default.rgw.meta' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 22 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.log' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 24 flags hashpspool stripe_width 0 application rgw
pool 5 'test-pool' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 26 flags hashpspool stripe_width 0

$ rados lspools
.rgw.root
default.rgw.control
default.rgw.meta
default.rgw.log
test-pool

# set replicated size
$ ceph osd pool set test-pool size 3
set pool 5 size to 3

$ rados -p test-pool ls
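  • Create a test file:
$ echo "He110 Ceph, You are Awesome 1ike MJ" > hello_ceph
  • Create an object:
$ rados -p test-pool put object1 hello_ceph
  • Look up the object in the OSD map; the output shows the object name, the PG it belongs to, the OSDs serving that PG, and their state: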
$ ceph osd map test-pool object1
osdmap e29 pool 'test-pool' (5) object 'object1' -> pg 5.bac5debc (5.3c) -> up ([0,1,2], p0) acting ([0,1,2], p0)

$ rados -p test-pool ls
object1

Creating an RBD Image

  • Create a pool for RBD:
$ ceph osd pool create rbd 8 8
$ rbd pool init rbd
  • Create an RBD image:
$ rbd create rbd_test --size 10240
  • Check what changed in RADOS and in the OSD map; the newly created RBD image adds 3 objects:
$ rbd ls
rbd_test

$ rados -p rbd ls
rbd_directory
rbd_header.11856b8b4567
rbd_info
rbd_object_map.11856b8b4567
rbd_id.rbd_test

$ ceph osd dump | grep pool
pool 1 '.rgw.root' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 17 flags hashpspool stripe_width 0 application rgw
pool 2 'default.rgw.control' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 20 flags hashpspool stripe_width 0 application rgw
pool 3 'default.rgw.meta' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 22 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.log' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 24 flags hashpspool stripe_width 0 application rgw
pool 5 'test-pool' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 29 flags hashpspool stripe_width 0
pool 6 'rbd' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 35 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd

Mapping the RBD Image

  • Load the RBD kernel module:
$ uname -r
3.10.0-862.11.6.el7.x86_64

$ modprobe rbd

$ lsmod | grep rbd
rbd                    83728  0 
libceph               301687  1 rbd
  • Map the RBD block device; the mapping fails because the kernel is too old to support some of the image features:
$ rbd map rbd_test
rbd: sysfs write failed
RBD image feature set mismatch. You can disable features unsupported by the kernel with "rbd feature disable rbd_test object-map fast-diff deep-flatten".
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (6) No such device or address

$ dmesg | tail
[150078.190941] Key type dns_resolver registered
[150078.231155] Key type ceph registered
[150078.231538] libceph: loaded (mon/osd proto 15/24)
[150078.239110] rbd: loaded
[152620.392095] libceph: mon1 172.29.101.167:6789 session established
[152620.392821] libceph: client4522 fsid 383237bd-becf-49d5-9bd6-deb0bc35ab2a
[152620.646943] rbd: image rbd_test: image uses unsupported features: 0x38
[152648.322295] libceph: mon0 172.29.101.166:6789 session established
[152648.322845] libceph: client4530 fsid 383237bd-becf-49d5-9bd6-deb0bc35ab2a
[152648.357522] rbd: image rbd_test: image uses unsupported features: 0x38
  • Inspect the features of the RBD block device:
$ rbd info rbd_test
rbd image 'rbd_test':
	size 10 GiB in 2560 objects
	order 22 (4 MiB objects)
	id: 11856b8b4567
	block_name_prefix: rbd_data.11856b8b4567
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	op_features: 
	flags: 
	create_timestamp: Fri Aug 24 10:21:11 2018

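layering: layering (clone) support
striping: striping v2 support
exclusive-lock: exclusive locking support
object-map: object map support (requires exclusive-lock)
fast-diff: fast diff calculation (requires object-map)
deep-flatten: snapshot flatten support
journaling: journaling of I/O operations (requires exclusive-lock)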
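
The value 0x38 in the dmesg output is a bitmask over these features. A small sketch for decoding it, assuming the feature bit values used by librbd (layering 1, striping 2, exclusive-lock 4, object-map 8, fast-diff 16, deep-flatten 32, journaling 64); decode_rbd_features is a hypothetical helper name.

```shell
# decode_rbd_features MASK
# Print the names of the RBD features whose bits are set in MASK.
# MASK may be decimal or hex (for example 0x38).
decode_rbd_features() (
    mask=$(( $1 ))
    out=""
    for pair in 1:layering 2:striping 4:exclusive-lock 8:object-map \
                16:fast-diff 32:deep-flatten 64:journaling; do
        bit=${pair%%:*}
        name=${pair#*:}
        if [ $(( mask & bit )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "$out"
)

decode_rbd_features 0x38   # prints: object-map fast-diff deep-flatten
decode_rbd_features 1      # prints: layering
```

0x38 is exactly the set of three features that the rbd map error message suggests disabling, and a feature value of 1 selects layering only.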
  • Change Ceph's default RBD features to avoid the problem for newly created images:
$ vi /etc/ceph/ceph.conf
rbd_default_features = 1

$ ceph --show-config | grep rbd | grep features
rbd_default_features = 1
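  • Or specify the features when creating the RBD image (features require image format 2):
$ rbd create rbd_test --size 10G --image-format 2 --image-feature layering
  • For an existing image, disable the features the kernel does not support: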
$ rbd feature disable rbd_test object-map fast-diff deep-flatten
$ rbd info rbd_test
rbd image 'rbd_test':
	size 10 GiB in 2560 objects
	order 22 (4 MiB objects)
	id: 11856b8b4567
	block_name_prefix: rbd_data.11856b8b4567
	format: 2
	features: layering, exclusive-lock
	op_features: 
	flags: 
	create_timestamp: Fri Aug 24 10:21:11 2018
  • Map the RBD image again:
# rbd map rbd/rbd_test
$ rbd map rbd_test
/dev/rbd0

$ rbd showmapped
id pool image    snap device    
0  rbd  rbd_test -    /dev/rbd0

$ lsblk | grep rbd0
rbd0      252:0    0  10.2G  0 disk

Using the RBD Image

  • Create a file system:
$ mkfs.xfs /dev/rbd0
meta-data=/dev/rbd0              isize=512    agcount=16, agsize=167936 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=2682880, imaxpct=25
         =                       sunit=1024   swidth=1024 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
  • Mount the RBD image and write some data:
$ mkdir -pv /mnt/rbd_test
mkdir: created directory '/mnt/rbd_test'

$ mount /dev/rbd0 /mnt/rbd_test

$ dd if=/dev/zero of=/mnt/rbd_test/fi1e1 count=100 bs=1M
  • Check what changed in RADOS; a single RBD image is split into many small objects:
$ ll -h /mnt/rbd_test/
total 100M
-rw-r--r-- 1 root root 100M Aug 24 11:35 fi1e1

$ rados -p rbd ls | grep 1185
rbd_data.11856b8b4567.0000000000000003
rbd_data.11856b8b4567.00000000000003d8
rbd_data.11856b8b4567.0000000000000d74
rbd_data.11856b8b4567.0000000000001294
rbd_data.11856b8b4567.0000000000000522
rbd_data.11856b8b4567.0000000000000007
rbd_data.11856b8b4567.0000000000001338
rbd_data.11856b8b4567.0000000000000018
rbd_data.11856b8b4567.000000000000000d
rbd_data.11856b8b4567.0000000000000148
rbd_data.11856b8b4567.00000000000000a4
rbd_data.11856b8b4567.00000000000013dc
rbd_data.11856b8b4567.0000000000000013
rbd_header.11856b8b4567
rbd_data.11856b8b4567.0000000000000000
rbd_data.11856b8b4567.0000000000000a40
rbd_data.11856b8b4567.000000000000114c
rbd_data.11856b8b4567.0000000000000008
rbd_data.11856b8b4567.0000000000000b88
rbd_data.11856b8b4567.0000000000000009
rbd_data.11856b8b4567.0000000000000521
rbd_data.11856b8b4567.0000000000000010
rbd_data.11856b8b4567.00000000000008f8
rbd_data.11856b8b4567.0000000000000012
rbd_data.11856b8b4567.0000000000000016
rbd_data.11856b8b4567.0000000000000014
rbd_data.11856b8b4567.000000000000001a
rbd_data.11856b8b4567.0000000000000854
rbd_data.11856b8b4567.000000000000000c
rbd_data.11856b8b4567.0000000000000ae4
rbd_data.11856b8b4567.000000000000047c
rbd_data.11856b8b4567.0000000000000005
rbd_data.11856b8b4567.0000000000000e18
rbd_data.11856b8b4567.000000000000000f
rbd_data.11856b8b4567.0000000000000cd0
rbd_data.11856b8b4567.00000000000001ec
rbd_data.11856b8b4567.0000000000000017
rbd_data.11856b8b4567.0000000000000a3b
rbd_data.11856b8b4567.0000000000000011
rbd_data.11856b8b4567.000000000000070c
rbd_data.11856b8b4567.0000000000000520
rbd_data.11856b8b4567.00000000000010a8
rbd_data.11856b8b4567.0000000000000015
rbd_data.11856b8b4567.0000000000000004
rbd_data.11856b8b4567.000000000000099c
rbd_data.11856b8b4567.0000000000000001
rbd_data.11856b8b4567.000000000000000b
rbd_data.11856b8b4567.0000000000000c2c
rbd_data.11856b8b4567.0000000000000334
rbd_data.11856b8b4567.00000000000005c4
rbd_data.11856b8b4567.000000000000000a
rbd_data.11856b8b4567.0000000000000006
rbd_data.11856b8b4567.0000000000000668
rbd_data.11856b8b4567.0000000000001004
rbd_data.11856b8b4567.0000000000000019
rbd_data.11856b8b4567.00000000000011f0
rbd_data.11856b8b4567.000000000000000e
rbd_data.11856b8b4567.0000000000000f60
rbd_data.11856b8b4567.00000000000007b0
rbd_data.11856b8b4567.0000000000000290
rbd_data.11856b8b4567.0000000000000ebc
rbd_data.11856b8b4567.0000000000000002

$ rados -p rbd ls | grep 1185 | wc -l
62
  • Write data again and check the change; as more data is written, the number of objects grows:
$ dd if=/dev/zero of=/mnt/rbd_test/fi1e1 count=200 bs=1M
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 0.441176 s, 475 MB/s

$ rados -p rbd ls | grep 1185 | wc -l
87
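
The object names in the listing follow the pattern block_name_prefix.<index>, where the index is the byte offset inside the image divided by the object size (order 22, i.e. 4 MiB) and printed as 16 hex digits. The sketch below reproduces that naming; rbd_object_name is a hypothetical helper, and the layout assumes the default non-striped format 2 image used here.

```shell
# rbd_object_name PREFIX OFFSET [ORDER]
# Print the RADOS object that holds byte OFFSET of an image whose
# objects are 2^ORDER bytes in size (ORDER defaults to 22, i.e. 4 MiB).
rbd_object_name() (
    prefix=$1
    offset=$2
    order=${3:-22}
    printf '%s.%016x\n' "$prefix" $(( offset >> order ))
)

# First byte of the image
rbd_object_name rbd_data.11856b8b4567 0

# Start of XFS allocation group 1: agsize (167936 blocks) * bsize (4096)
rbd_object_name rbd_data.11856b8b4567 $(( 1 * 167936 * 4096 ))
# prints: rbd_data.11856b8b4567.00000000000000a4
```

This fits the listing above: besides the low-numbered objects that hold the file data, many objects sit at multiples of 0xa4 (0xa4, 0x148, 0x1ec, ...), which is where mkfs.xfs places its allocation group headers.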

Resizing the RBD Image

  • Resize the RBD image:
$ rbd resize rbd_test --size 20480
Resizing image: 100% complete...done.
  • Grow the file system:
$ xfs_growfs -d /mnt/rbd_test/
meta-data=/dev/rbd0              isize=512    agcount=16, agsize=167936 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=2682880, imaxpct=25
         =                       sunit=1024   swidth=1024 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 2682880 to 5242880
  • Check the RBD image after the change:
$ rbd info rbd_test
rbd image 'rbd_test':
	size 20 GiB in 5120 objects
	order 22 (4 MiB objects)
	id: 11856b8b4567
	block_name_prefix: rbd_data.11856b8b4567
	format: 2
	features: layering, exclusive-lock
	op_features: 
	flags: 
	create_timestamp: Fri Aug 24 10:21:11 2018

$ lsblk | grep rbd0
rbd0      252:0    0    20G  0 disk /mnt/rbd_test

$ df -h | grep rbd
/dev/rbd0        20G  234M   20G    2% /mnt/rbd_test
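
The numbers reported by rbd info and xfs_growfs can be cross-checked with a little arithmetic: --size is given in MiB, and each object holds 2^order bytes. A minimal sketch, with rbd_object_count as a hypothetical helper name:

```shell
# rbd_object_count SIZE_MIB [ORDER]
# Number of 2^ORDER-byte objects needed to back an image of SIZE_MIB MiB.
rbd_object_count() (
    size_mib=$1
    order=${2:-22}
    obj_bytes=$(( 1 << order ))
    size_bytes=$(( size_mib * 1024 * 1024 ))
    echo $(( (size_bytes + obj_bytes - 1) / obj_bytes ))
)

rbd_object_count 10240   # before the resize -> prints 2560
rbd_object_count 20480   # after the resize  -> prints 5120

# XFS data blocks after xfs_growfs, times the 4096-byte block size, in GiB
echo $(( 5242880 * 4096 / 1024 / 1024 / 1024 ))   # prints 20
```

Both results agree with the "20 GiB in 5120 objects" line from rbd info and with the 20G shown by lsblk and df.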

RBD Snapshots

  • Create a test file:
$ echo "Hello Ceph This is snapshot test" > /mnt/rbd_test/file2

$ ls -lh /mnt/rbd_test/
total 201M
-rw-r--r-- 1 root root 200M Aug 24 15:46 fi1e1
-rw-r--r-- 1 root root   33 Aug 24 15:51 file2

$ cat /mnt/rbd_test/file2
Hello Ceph This is snapshot test
  • Create an RBD snapshot:
$ rbd snap create rbd_test@snap1
$ rbd snap ls rbd_test
SNAPID NAME    SIZE TIMESTAMP                
     4 snap1 20 GiB Fri Aug 24 15:52:49 2018
  • Delete the file:
$ rm -rfv /mnt/rbd_test/file2
removed '/mnt/rbd_test/file2'
$ ls -lh /mnt/rbd_test/
total 200M
-rw-r--r-- 1 root root 200M Aug 24 15:46 fi1e1
  • Unmount the file system and unmap the RBD image:
$ umount /mnt/rbd_test
$ rbd unmap rbd_test
  • Roll the RBD image back to the snapshot:
$ rbd snap rollback rbd_test@snap1
Rolling back to snapshot: 100% complete...done.
  • Map and mount the RBD image again, then check the files:
$ rbd map rbd_test
/dev/rbd0

$ mount /dev/rbd0 /mnt/rbd_test
$ ls -lh /mnt/rbd_test/
total 201M
-rw-r--r-- 1 root root 200M Aug 24 15:46 fi1e1
-rw-r--r-- 1 root root   33 Aug 24 15:51 file2

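Observing PGs

  • Look up a few objects from the rbd pool in the OSD map; the OSD order differs between PGs, and the number before the dot in the PG ID (the pool ID) is the same for all objects in a pool: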
$ ceph osd map rbd rbd_info
osdmap e74 pool 'rbd' (6) object 'rbd_info' -> pg 6.ac0e573a (6.2) -> up ([1,0,2], p1) acting ([1,0,2], p1)

$ ceph osd map rbd rbd_directory
osdmap e74 pool 'rbd' (6) object 'rbd_directory' -> pg 6.30a98c1c (6.4) -> up ([0,1,2], p0) acting ([0,1,2], p0)

$ ceph osd map rbd rbd_id.rbd_test
osdmap e74 pool 'rbd' (6) object 'rbd_id.rbd_test' -> pg 6.818788b3 (6.3) -> up ([1,2,0], p1) acting ([1,2,0], p1)

$ ceph osd map rbd rbd_data.11856b8b4567.0000000000000022
osdmap e74 pool 'rbd' (6) object 'rbd_data.11856b8b4567.0000000000000022' -> pg 6.deee7c73 (6.3) -> up ([1,2,0], p1) acting ([1,2,0], p1)

$ ceph osd map rbd rbd_data.11856b8b4567.000000000000000a
osdmap e74 pool 'rbd' (6) object 'rbd_data.11856b8b4567.000000000000000a' -> pg 6.561c344b (6.3) -> up ([1,2,0], p1) acting ([1,2,0], p1)

$ ceph osd map rbd rbd_data.11856b8b4567.00000000000007b0
osdmap e74 pool 'rbd' (6) object 'rbd_data.11856b8b4567.00000000000007b0' -> pg 6.a603e1f (6.7) -> up ([1,0,2], p1) acting ([1,0,2], p1)
  • Create a pool with two replicas. Objects in the same pool can land in PGs that use different sets of OSDs:
$ ceph osd pool create pg_test 8 8
pool 'pg_test' created

$ ceph osd dump | grep pg_test
pool 12 'pg_test' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 75 flags hashpspool stripe_width 0

$ ceph osd pool set pg_test size 2
set pool 12 size to 2
$ ceph osd dump | grep pg_test
pool 12 'pg_test' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 78 flags hashpspool stripe_width 0

$ rados -p pg_test put object1 /etc/hosts
$ rados -p pg_test put object2 /etc/hosts
$ rados -p pg_test put object3 /etc/hosts
$ rados -p pg_test put object4 /etc/hosts
$ rados -p pg_test put object5 /etc/hosts

$ rados -p pg_test ls
object1
object2
object3
object4
object5

$ ceph osd map pg_test object1
osdmap e79 pool 'pg_test' (12) object 'object1' -> pg 12.bac5debc (12.4) -> up ([2,0], p2) acting ([2,0], p2)

$ ceph osd map pg_test object2
osdmap e79 pool 'pg_test' (12) object 'object2' -> pg 12.f85a416a (12.2) -> up ([2,0], p2) acting ([2,0], p2)

$ ceph osd map pg_test object3
osdmap e79 pool 'pg_test' (12) object 'object3' -> pg 12.f877ac20 (12.0) -> up ([1,0], p1) acting ([1,0], p1)

$ ceph osd map pg_test object4
osdmap e79 pool 'pg_test' (12) object 'object4' -> pg 12.9d9216ab (12.3) -> up ([2,1], p2) acting ([2,1], p2)

$ ceph osd map pg_test object5
osdmap e79 pool 'pg_test' (12) object 'object5' -> pg 12.e1acd6d (12.5) -> up ([1,2], p1) acting ([1,2], p1)
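
The PG ID in parentheses can be reproduced from the raw object hash printed just before it. The sketch below assumes a power-of-two `pg_num` (8 for `pg_test`, per the `ceph osd dump` output), in which case the PG index is simply the low bits of the hash. For other `pg_num` values Ceph applies a "stable mod" that is not reproduced here. The helper name `pg_index` is hypothetical.

```shell
# pg_index HEX_HASH PG_NUM
# PG index (hex) for an object hash, assuming PG_NUM is a power of two.
pg_index() {
    printf '%x\n' $(( 0x$1 & ($2 - 1) ))
}

pg_index bac5debc 8   # object1 -> 4, i.e. PG 12.4
pg_index f85a416a 8   # object2 -> 2, i.e. PG 12.2
pg_index f877ac20 8   # object3 -> 0, i.e. PG 12.0
pg_index 9d9216ab 8   # object4 -> 3, i.e. PG 12.3
pg_index e1acd6d 8    # object5 -> 5, i.e. PG 12.5
```

The results match the PG IDs reported by `ceph osd map`. Objects are spread across PGs by hash, and CRUSH then maps each PG to its own ordered set of OSDs.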

Performance Testing

  • Write performance test:
$ rados bench -p test-pool 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_osdev01_1827771
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        31        15   59.8716        60    0.388146    0.666288
    2      16        49        33   65.9176        72     0.62486    0.824162
    3      16        65        49   65.2595        64     1.18038    0.834558
    4      16        86        70   69.8978        84    0.657194    0.834779
    5      16       107        91   72.7115        84    0.594541    0.829814
    6      16       125       109   72.5838        72    0.371435    0.796664
    7      16       149       133   75.8989        96     1.17764    0.803259
    8      16       165       149   74.4101        64    0.568129    0.797091
    9      16       185       169     75.01        80    0.813372     0.81463
   10      16       203       187   74.7085        72    0.728715    0.812529
Total time run:         10.3161
Total writes made:      203
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     78.7122
Stddev Bandwidth:       11.1634
Max bandwidth (MB/sec): 96
Min bandwidth (MB/sec): 60
Average IOPS:           19
Stddev IOPS:            2
Max IOPS:               24
Min IOPS:               15
Average Latency(s):     0.80954
Stddev Latency(s):      0.293645
Max latency(s):         1.77366
Min latency(s):         0.240024
  • Sequential read performance test:
$ rados bench -p test-pool 10 seq
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        72        56   223.808       224   0.0519066    0.217292
    2      16       111        95   189.736       156    0.658876    0.289657
    3      16       160       144   191.663       196   0.0658452    0.301259
    4      16       203       187   186.745       172    0.210803    0.297584
Total time run:       4.43386
Total reads made:     203
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   183.136
Average IOPS:         45
Stddev IOPS:          7
Max IOPS:             56
Min IOPS:             39
Average Latency(s):   0.346754
Max latency(s):       1.37891
Min latency(s):       0.0249563
  • Random read performance test:
$ rados bench -p test-pool 10 rand
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        59        43    171.94       172    0.271225    0.222279
    2      16       108        92    183.95       196     1.06429    0.275433
    3      16       153       137   182.618       180  0.00350975    0.304582
    4      16       224       208   207.951       284   0.0678476    0.278888
    5      16       267       251   200.757       172  0.00386545    0.289519
    6      16       319       303   201.955       208    0.866646    0.294983
    7      16       360       344   196.529       164  0.00428517     0.30615
    8      16       405       389   194.458       180    0.903073    0.311316
    9      16       455       439   195.071       200  0.00368576    0.316057
   10      16       517       501    200.36       248    0.621325    0.309242
Total time run:       10.5614
Total reads made:     518
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   196.187
Average IOPS:         49
Stddev IOPS:          9
Max IOPS:             71
Min IOPS:             41
Average Latency(s):   0.321834
Max latency(s):       1.16304
Min latency(s):       0.0026629
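
As a sanity check on the `rados bench` summaries above, the reported bandwidth should equal total operations times object size divided by total run time. The sketch below recomputes it with `awk`. The helper name `bench_mbps` is hypothetical, and it assumes the "MB/sec" figure is based on 1048576-byte units, which is what the reported numbers imply.

```shell
# bench_mbps OPS OBJECT_BYTES SECONDS
# Bandwidth in MB/s (1 MB = 1048576 bytes here), one decimal place.
bench_mbps() {
    awk -v n="$1" -v b="$2" -v t="$3" \
        'BEGIN { printf "%.1f\n", n * b / 1048576 / t }'
}

bench_mbps 203 4194304 10.3161   # write: 78.7  (reported 78.7122)
bench_mbps 203 4194304 4.43386   # seq:   183.1 (reported 183.136)
bench_mbps 518 4194304 10.5614   # rand:  196.2 (reported 196.187)
```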
  • Test with fio:
$ yum install -y fio "*librbd*"

$ rbd create fio_test --size 20480

$ vi write.fio
[global]
description="write test with block size of 4M"
ioengine=rbd
clustername=ceph
clientname=admin
pool=rbd
rbdname=fio_test
iodepth=32
runtime=120
rw=write
bs=4M

[logging]
write_iops_log=write_iops_log
write_bw_log=write_bw_log
write_lat_log=write_lat_log


$ fio write.fio 
logging: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=rbd, iodepth=32
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]  
logging: (groupid=0, jobs=1): err= 0: pid=161962: Wed Aug 29 19:17:17 2018
  Description  : ["write test with block size of 4M"]
  write: IOPS=15, BW=60.4MiB/s (63.3MB/s)(7252MiB/120085msec)
    slat (usec): min=665, max=14535, avg=1584.29, stdev=860.28
    clat (msec): min=1828, max=4353, avg=2092.28, stdev=180.12
     lat (msec): min=1829, max=4354, avg=2093.87, stdev=180.15
    clat percentiles (msec):
     |  1.00th=[ 1838],  5.00th=[ 1938], 10.00th=[ 1989], 20.00th=[ 2022],
     | 30.00th=[ 2039], 40.00th=[ 2056], 50.00th=[ 2072], 60.00th=[ 2106],
     | 70.00th=[ 2123], 80.00th=[ 2165], 90.00th=[ 2198], 95.00th=[ 2232],
     | 99.00th=[ 2333], 99.50th=[ 3977], 99.90th=[ 4111], 99.95th=[ 4329],
     | 99.99th=[ 4329]
   bw (  KiB/s): min=  963, max= 2294, per=3.26%, avg=2013.72, stdev=117.50, samples=1813
   iops        : min=    1, max=    1, avg= 1.00, stdev= 0.00, samples=1813
  lat (msec)   : 2000=13.40%, >=2000=86.60%
  cpu          : usr=1.94%, sys=0.40%, ctx=157, majf=0, minf=157364
  IO depths    : 1=2.3%, 2=6.0%, 4=12.6%, 8=25.2%, 16=50.3%, 32=3.6%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=97.0%, 8=0.0%, 16=0.0%, 32=3.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,1813,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=60.4MiB/s (63.3MB/s), 60.4MiB/s-60.4MiB/s (63.3MB/s-63.3MB/s), io=7252MiB (7604MB), run=120085-120085msec

Disk stats (read/write):
  sda: ios=5/653, merge=0/6, ticks=6/2818, in_queue=2824, util=0.17%
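
The latency figures in both tools can be cross-checked against throughput with Little's law: average latency is roughly the number of in-flight operations divided by the operation rate. The sketch below is an approximation under the assumption that the queue stays full for the whole run. The helper name `expected_latency` is hypothetical.

```shell
# expected_latency CONCURRENCY OPS SECONDS
# Little's law estimate of average latency in seconds: C / (OPS / SECONDS).
expected_latency() {
    awk -v c="$1" -v n="$2" -v t="$3" \
        'BEGIN { printf "%.2f\n", c * t / n }'
}

expected_latency 16 203 10.3161    # rados bench write: 0.81 (reported 0.80954)
expected_latency 32 1813 120.085   # fio, iodepth=32:   2.12 (reported avg lat 2093.87 msec)
```

Both estimates land close to the measured averages, which suggests the roughly 2 s latency in the fio run comes mostly from the deep queue (32 outstanding 4 MiB writes at about 15 IOPS) rather than from slow individual operations.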

References

  1. INSTALLATION (CEPH-DEPLOY)
