Deploying ceph-12 (Luminous) on CentOS 7 -- Multi-Node Cluster

0. Preparation

The previous post deployed only a single-node cluster. In this post we manually deploy a multi-node cluster named mycluster. We have three machines: node1, node2 and node3; node1 can ssh/scp to the other two machines without a password. All of our work is done on node1.

The preparation consists of installing the ceph RPM packages on every machine (see Section 1 of the previous post), and modifying the following files on each machine:

/usr/lib/systemd/system/[email protected]
/usr/lib/systemd/system/[email protected]
/usr/lib/systemd/system/[email protected]
/usr/lib/systemd/system/[email protected]
/usr/lib/systemd/system/[email protected]

Modify:

Environment=CLUSTER=ceph                                                  <--- change to CLUSTER=mycluster
ExecStart=/usr/bin/... --id %i --setuser ceph --setgroup ceph    <--- remove --setuser ceph --setgroup ceph
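
Since the same two edits have to be made in each of these unit files on every machine, they can be scripted. Below is a minimal sed sketch (it assumes node1 can also ssh to itself; otherwise run the sed part locally on node1 and over ssh only for node2 and node3):

for node in node1 node2 node3; do
    # edit the unit files in place, then reload systemd so the changes take effect
    ssh $node "sed -i \
        -e 's/Environment=CLUSTER=ceph/Environment=CLUSTER=mycluster/' \
        -e 's/ --setuser ceph --setgroup ceph//' \
        /usr/lib/systemd/system/ceph-{mds,mgr,osd,radosgw,volume}@.service \
        && systemctl daemon-reload"
done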


1. Create a working directory

Create a working directory on node1; all subsequent work is done in this directory on node1:

mkdir /tmp/mk-ceph-cluster
cd /tmp/mk-ceph-cluster

2. Create the configuration file

vim mycluster.conf
[global]
    cluster                     = mycluster
    fsid                        = 116d4de8-fd14-491f-811f-c1bdd8fac141

    public network              = 192.168.100.0/24
    cluster network             = 192.168.73.0/24

    auth cluster required       = cephx
    auth service required       = cephx
    auth client required        = cephx

    osd pool default size       = 3
    osd pool default min size   = 2

    osd pool default pg num     = 128
    osd pool default pgp num    = 128

    osd pool default crush rule = 0
    osd crush chooseleaf type   = 1

    admin socket                = /var/run/ceph/$cluster-$name.asock
    pid file                    = /var/run/ceph/$cluster-$name.pid
    log file                    = /var/log/ceph/$cluster-$name.log
    log to syslog               = false

    max open files              = 131072
    ms bind ipv6                = false

[mon]
    mon initial members = node1,node2,node3
    mon host = 192.168.100.131:6789,192.168.100.132:6789,192.168.100.133:6789

    ;Yuanguo: the default value of {mon data} is /var/lib/ceph/mon/$cluster-$id,
    ;         we overwrite it.
    mon data                     = /var/lib/ceph/mon/$cluster-$name
    mon clock drift allowed      = 10
    mon clock drift warn backoff = 30

    mon osd full ratio           = .95
    mon osd nearfull ratio       = .85

    mon osd down out interval    = 600
    mon osd report timeout       = 300

    debug ms                     = 20
    debug mon                    = 20
    debug paxos                  = 20
    debug auth                   = 20
    mon allow pool delete      = true  ; without this, you cannot delete pools
[mon.node1]
    host                         = node1
    mon addr                     = 192.168.100.131:6789
[mon.node2]
    host                         = node2
    mon addr                     = 192.168.100.132:6789
[mon.node3]
    host                         = node3
    mon addr                     = 192.168.100.133:6789

[mgr]
    ;Yuanguo: the default value of {mgr data} is /var/lib/ceph/mgr/$cluster-$id,
    ;         we overwrite it.
    mgr data                     = /var/lib/ceph/mgr/$cluster-$name

[osd]
    ;Yuanguo: we wish to overwrite {osd data}, but it seems that 'ceph-disk' forces
    ;     to use the default value, so keep the default now; maybe in later versions
    ;     of ceph the limitation will be eliminated.
    osd data                     = /var/lib/ceph/osd/$cluster-$id
    osd recovery max active      = 3
    osd max backfills            = 5
    osd max scrubs               = 2

    osd mkfs type = xfs
    osd mkfs options xfs = -f -i size=1024
    osd mount options xfs = rw,noatime,inode64,logbsize=256k,delaylog

    filestore max sync interval  = 5
    osd op threads               = 2

    debug ms                     = 100
    debug osd                    = 100
Note that in this configuration file we override some of the defaults, for example {mon data} and {mgr data}, but not {osd data}, because ceph-disk appears to force the default value. Also, the pid and admin-socket files are placed in /var/run/ceph/ and named after $cluster-$name, and the log files are placed in /var/log/ceph/, also named after $cluster-$name. All of these can be overridden.
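
To double-check how $cluster and $name expand under this configuration, ceph-conf can be queried against the file in the working directory. A small sketch (assuming ceph-conf honors --cluster together with -c; the expected values follow from the settings above):

ceph-conf --cluster mycluster -c mycluster.conf --name mon.node1 --show-config-value mon_data
/var/lib/ceph/mon/mycluster-mon.node1
ceph-conf --cluster mycluster -c mycluster.conf --name osd.0 --show-config-value osd_data
/var/lib/ceph/osd/mycluster-0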

3. Generate keyrings

As mentioned in the single-node deployment post, there are two ways to manage users and their capabilities in a cluster. Here we use the first one: generate the keyring files up front, then feed them in when the cluster is created so that they take effect.

ceph-authtool --create-keyring mycluster.keyring --gen-key -n mon. --cap mon 'allow *'

ceph-authtool --create-keyring mycluster.client.admin.keyring --gen-key -n client.admin --set-uid=0 --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *' --cap mgr 'allow *'
ceph-authtool --create-keyring mycluster.client.bootstrap-osd.keyring --gen-key -n client.bootstrap-osd --cap mon 'allow profile bootstrap-osd'
ceph-authtool --create-keyring mycluster.mgr.node1.keyring --gen-key -n mgr.node1 --cap mon 'allow profile mgr' --cap osd 'allow *' --cap mds 'allow *'
ceph-authtool --create-keyring mycluster.mgr.node2.keyring --gen-key -n mgr.node2 --cap mon 'allow profile mgr' --cap osd 'allow *' --cap mds 'allow *'
ceph-authtool --create-keyring mycluster.mgr.node3.keyring --gen-key -n mgr.node3 --cap mon 'allow profile mgr' --cap osd 'allow *' --cap mds 'allow *'
  
ceph-authtool mycluster.keyring  --import-keyring mycluster.client.admin.keyring
ceph-authtool mycluster.keyring  --import-keyring mycluster.client.bootstrap-osd.keyring
ceph-authtool mycluster.keyring  --import-keyring mycluster.mgr.node1.keyring
ceph-authtool mycluster.keyring  --import-keyring mycluster.mgr.node2.keyring
ceph-authtool mycluster.keyring  --import-keyring mycluster.mgr.node3.keyring

cat mycluster.keyring
[mon.]
        key = AQA525NZsY73ERAAIM1J6wSxglBNma3XAdEcVg==
        caps mon = "allow *"
[client.admin]
        key = AQBJ25NZznIpEBAAlCdCy+OyUIvxtNq+1DSLqg==
        auid = 0
        caps mds = "allow *"
        caps mgr = "allow *"
        caps mon = "allow *"
        caps osd = "allow *"
[client.bootstrap-osd]
        key = AQBW25NZtl/RBxAACGWafYy1gPWEmx9geCLi6w==
        caps mon = "allow profile bootstrap-osd"
[mgr.node1]
        key = AQBb25NZ1mIeFhAA/PmRHFY6OgnAMXL1/8pSxw==
        caps mds = "allow *"
        caps mon = "allow profile mgr"
        caps osd = "allow *"
[mgr.node2]
        key = AQBg25NZJ6jyHxAAf2GfBAG5tuNwf9YjkhhEWA==
        caps mds = "allow *"
        caps mon = "allow profile mgr"
        caps osd = "allow *"
[mgr.node3]
        key = AQBl25NZ7h6CJRAAaFiea7hiTrQNVoZysA7n/g==
        caps mds = "allow *"
        caps mon = "allow profile mgr"
        caps osd = "allow *"

4. Generate the monmap

Generate the monmap and add the 3 monitors:

monmaptool --create --add node1 192.168.100.131:6789 --add node2 192.168.100.132:6789 --add node3 192.168.100.133:6789  --fsid 116d4de8-fd14-491f-811f-c1bdd8fac141 monmap
monmaptool --print monmap
monmaptool: monmap file monmap
epoch 0
fsid 116d4de8-fd14-491f-811f-c1bdd8fac141
last_changed 2017-08-16 05:45:37.851899
created 2017-08-16 05:45:37.851899
0: 192.168.100.131:6789/0 mon.node1
1: 192.168.100.132:6789/0 mon.node2
2: 192.168.100.133:6789/0 mon.node3


5. Distribute the configuration file, keyrings and monmap

Distribute the configuration file, keyrings and monmap generated in steps 2, 3 and 4 to every machine. The mycluster.mgr.nodeX.keyring files are not needed yet, so we do not distribute them for now (see Section 8).

cp mycluster.client.admin.keyring mycluster.client.bootstrap-osd.keyring mycluster.keyring  mycluster.conf monmap /etc/ceph
scp mycluster.client.admin.keyring mycluster.client.bootstrap-osd.keyring mycluster.keyring  mycluster.conf monmap node2:/etc/ceph
scp mycluster.client.admin.keyring mycluster.client.bootstrap-osd.keyring mycluster.keyring  mycluster.conf monmap node3:/etc/ceph
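
If the node list grows, the same copy can be wrapped in a small loop; a sketch (node1 is still handled by the local cp above):

files="mycluster.client.admin.keyring mycluster.client.bootstrap-osd.keyring mycluster.keyring mycluster.conf monmap"
for node in node2 node3; do
    scp $files $node:/etc/ceph    # same file set as the scp commands above
done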

6. Create the cluster

6.1 Create the {mon data} directories

mkdir /var/lib/ceph/mon/mycluster-mon.node1    
ssh node2 mkdir /var/lib/ceph/mon/mycluster-mon.node2
ssh node3 mkdir /var/lib/ceph/mon/mycluster-mon.node3

Note that in mycluster.conf we set {mon data} to /var/lib/ceph/mon/$cluster-$name rather than the default /var/lib/ceph/mon/$cluster-$id.
$cluster-$name expands to mycluster-mon.node1 (node2, node3);
the default $cluster-$id would expand to mycluster-node1 (node2, node3).

6.2 Initialize the monitors

ceph-mon --cluster mycluster --mkfs -i node1 --monmap /etc/ceph/monmap --keyring /etc/ceph/mycluster.keyring
ssh node2 ceph-mon --cluster mycluster --mkfs -i node2 --monmap /etc/ceph/monmap --keyring /etc/ceph/mycluster.keyring
ssh node3 ceph-mon --cluster mycluster --mkfs -i node3 --monmap /etc/ceph/monmap --keyring /etc/ceph/mycluster.keyring
Note that in mycluster.conf we set {mon data} to /var/lib/ceph/mon/$cluster-$name, which expands to /var/lib/ceph/mon/mycluster-mon.node1 (node2, node3). ceph-mon uses --cluster mycluster to locate the configuration file mycluster.conf, parses {mon data} out of it, and then initializes that directory.
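
After the mkfs step, each {mon data} directory should be populated. On node1 it looks roughly like this (a sketch; the exact contents may vary between versions):

ls /var/lib/ceph/mon/mycluster-mon.node1
keyring  kv_backend  store.db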

6.3 touch done

touch /var/lib/ceph/mon/mycluster-mon.node1/done
ssh node2 touch /var/lib/ceph/mon/mycluster-mon.node2/done
ssh node3 touch /var/lib/ceph/mon/mycluster-mon.node3/done


6.4 Start the monitors

systemctl start ceph-mon@node1
ssh node2 systemctl start ceph-mon@node2
ssh node3 systemctl start ceph-mon@node3
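
Optionally, if the monitors should also come back automatically after a reboot, the same template units can be enabled (a sketch; not part of the original steps):

systemctl enable ceph-mon@node1
ssh node2 systemctl enable ceph-mon@node2
ssh node3 systemctl enable ceph-mon@node3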

6.5 Check the cluster status

ceph --cluster mycluster -s
  cluster:
    id:     116d4de8-fd14-491f-811f-c1bdd8fac141
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum node1,node2,node3
    mgr: no daemons active
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:

7. Add OSDs

Each machine has a /dev/sdb; we use these disks as the OSDs.

7.1 Delete their existing partitions
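
A minimal sketch for this step (destructive; it assumes /dev/sdb holds nothing you need -- ceph-disk zap wipes the partition table of the given disk):

ceph-disk zap /dev/sdb
ssh node2 ceph-disk zap /dev/sdb
ssh node3 ceph-disk zap /dev/sdb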

7.2 prepare

ceph-disk prepare --cluster mycluster --cluster-uuid 116d4de8-fd14-491f-811f-c1bdd8fac141 --bluestore --block.db /dev/sdb  --block.wal /dev/sdb /dev/sdb
ssh node2 ceph-disk prepare --cluster mycluster --cluster-uuid 116d4de8-fd14-491f-811f-c1bdd8fac141 --bluestore --block.db /dev/sdb  --block.wal /dev/sdb /dev/sdb
ssh node3 ceph-disk prepare --cluster mycluster --cluster-uuid 116d4de8-fd14-491f-811f-c1bdd8fac141 /dev/sdb
Note: when preparing node3:/dev/sdb we did not pass the options --bluestore --block.db /dev/sdb --block.wal /dev/sdb; later we will see how it differs from the other two.

7.3 activate

ceph-disk activate /dev/sdb1 --activate-key /etc/ceph/mycluster.client.bootstrap-osd.keyring
ssh node2 ceph-disk activate /dev/sdb1 --activate-key /etc/ceph/mycluster.client.bootstrap-osd.keyring
ssh node3 ceph-disk activate /dev/sdb1 --activate-key /etc/ceph/mycluster.client.bootstrap-osd.keyring

Note: ceph-disk seems to have two issues:

  • As mentioned before, it does not honor a customized {osd data}, and forces the default value /var/lib/ceph/osd/$cluster-$id.
  • It does not seem possible to assign an OSD id to a disk; you can only rely on the id being generated automatically. Although ceph-disk prepare has an --osd-id option, ceph-disk activate ignores it and allocates an id by itself. When the two do not match, you get an error like the following (one way to recover is sketched after this list):
    # ceph-disk activate /dev/sdb1 --activate-key /etc/ceph/mycluster.client.bootstrap-osd.keyring
    command_with_stdin: Error EEXIST: entity osd.0 exists but key does not match
    
    mount_activate: Failed to activate
    '['ceph', '--cluster', 'mycluster', '--name', 'client.bootstrap-osd', '--keyring', '/etc/ceph/mycluster.client.bootstrap-osd.keyring', '-i', '-', 'osd', 'new', u'ca8aac6a-b442-4b07-8fa6-62ac93b7cd29']' failed with status code 17
    
    From the '-i', '-' arguments we can see that it only ever auto-generates the OSD id.
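
If you hit the EEXIST error above, one possible way to recover (a sketch, not from the original post; it assumes osd.0 is the stale entry and is not in use by any running OSD) is to remove the stale auth entity and OSD id and re-run the activation:

ceph --cluster mycluster auth del osd.0      # drop the mismatched key
ceph --cluster mycluster osd rm 0            # release the osd id
ceph-disk activate /dev/sdb1 --activate-key /etc/ceph/mycluster.client.bootstrap-osd.keyring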

7.4 Inspect the OSDs

During ceph-disk prepare, node1:/dev/sdb and node2:/dev/sdb were prepared identically, both with the --bluestore --block.db /dev/sdb --block.wal /dev/sdb options; node3:/dev/sdb was prepared without them. Let's see how they differ.

7.4.1 node1

mount | grep sdb
/dev/sdb1 on /var/lib/ceph/osd/mycluster-0 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)

ls /var/lib/ceph/osd/mycluster-0/
activate.monmap  block     block.db_uuid  block.wal       bluefs     fsid     kv_backend  mkfs_done  systemd  whoami
active           block.db  block_uuid     block.wal_uuid  ceph_fsid  keyring  magic       ready      type

ls -l /var/lib/ceph/osd/mycluster-0/block
lrwxrwxrwx. 1 ceph ceph 58 Aug 16 05:52 /var/lib/ceph/osd/mycluster-0/block -> /dev/disk/by-partuuid/a12dd642-b64c-4fef-b9e6-0b45cff40fa9

ls -l /dev/disk/by-partuuid/a12dd642-b64c-4fef-b9e6-0b45cff40fa9
lrwxrwxrwx. 1 root root 10 Aug 16 05:55 /dev/disk/by-partuuid/a12dd642-b64c-4fef-b9e6-0b45cff40fa9 -> ../../sdb2

blkid /dev/sdb2
/dev/sdb2: PARTLABEL="ceph block" PARTUUID="a12dd642-b64c-4fef-b9e6-0b45cff40fa9"

cat /var/lib/ceph/osd/mycluster-0/block_uuid
a12dd642-b64c-4fef-b9e6-0b45cff40fa9



ls -l /var/lib/ceph/osd/mycluster-0/block.db
lrwxrwxrwx. 1 ceph ceph 58 Aug 16 05:52 /var/lib/ceph/osd/mycluster-0/block.db -> /dev/disk/by-partuuid/1c107775-45e6-4b79-8a2f-1592f5cb03f2

ls -l /dev/disk/by-partuuid/1c107775-45e6-4b79-8a2f-1592f5cb03f2
lrwxrwxrwx. 1 root root 10 Aug 16 05:55 /dev/disk/by-partuuid/1c107775-45e6-4b79-8a2f-1592f5cb03f2 -> ../../sdb3

blkid /dev/sdb3
/dev/sdb3: PARTLABEL="ceph block.db" PARTUUID="1c107775-45e6-4b79-8a2f-1592f5cb03f2"

cat /var/lib/ceph/osd/mycluster-0/block.db_uuid
1c107775-45e6-4b79-8a2f-1592f5cb03f2



ls -l /var/lib/ceph/osd/mycluster-0/block.wal
lrwxrwxrwx. 1 ceph ceph 58 Aug 16 05:52 /var/lib/ceph/osd/mycluster-0/block.wal -> /dev/disk/by-partuuid/76055101-b892-4da9-b80a-c1920f24183f

ls -l /dev/disk/by-partuuid/76055101-b892-4da9-b80a-c1920f24183f
lrwxrwxrwx. 1 root root 10 Aug 16 05:55 /dev/disk/by-partuuid/76055101-b892-4da9-b80a-c1920f24183f -> ../../sdb4

blkid /dev/sdb4
/dev/sdb4: PARTLABEL="ceph block.wal" PARTUUID="76055101-b892-4da9-b80a-c1920f24183f"

cat /var/lib/ceph/osd/mycluster-0/block.wal_uuid
76055101-b892-4da9-b80a-c1920f24183f


As we can see, on node1 (and node2), /dev/sdb is split into 4 partitions:
  • /dev/sdb1: metadata
  • /dev/sdb2: the main block device
  • /dev/sdb3: db
  • /dev/sdb4: wal
See ceph-disk prepare --help for details; the ceph-disk list sketch below shows the same layout at a glance.
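
A quick way to confirm the partition roles (the exact output wording may differ slightly between ceph-disk versions):

ceph-disk list /dev/sdb
/dev/sdb :
 /dev/sdb1 ceph data, active, cluster mycluster, osd.0, block /dev/sdb2, block.db /dev/sdb3, block.wal /dev/sdb4
 /dev/sdb2 ceph block, for /dev/sdb1
 /dev/sdb3 ceph block.db, for /dev/sdb1
 /dev/sdb4 ceph block.wal, for /dev/sdb1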

7.4.2 node3

mount | grep sdb
/dev/sdb1 on /var/lib/ceph/osd/mycluster-2 type xfs (rw,noatime,seclabel,attr2,inode64,noquota)

ls /var/lib/ceph/osd/mycluster-2
activate.monmap  active  block  block_uuid  bluefs  ceph_fsid  fsid  keyring  kv_backend  magic  mkfs_done  ready  systemd  type  whoami

ls -l /var/lib/ceph/osd/mycluster-2/block
lrwxrwxrwx. 1 ceph ceph 58 Aug 16 05:54 /var/lib/ceph/osd/mycluster-2/block -> /dev/disk/by-partuuid/0a70b661-43f5-4562-83e0-cbe6bdbd31fb

ls -l /dev/disk/by-partuuid/0a70b661-43f5-4562-83e0-cbe6bdbd31fb
lrwxrwxrwx. 1 root root 10 Aug 16 05:56 /dev/disk/by-partuuid/0a70b661-43f5-4562-83e0-cbe6bdbd31fb -> ../../sdb2

blkid /dev/sdb2
/dev/sdb2: PARTLABEL="ceph block" PARTUUID="0a70b661-43f5-4562-83e0-cbe6bdbd31fb"

cat /var/lib/ceph/osd/mycluster-2/block_uuid
0a70b661-43f5-4562-83e0-cbe6bdbd31fb


As we can see, on node3, /dev/sdb is split into 2 partitions:

  • /dev/sdb1: metadata
  • /dev/sdb2: the main block device; the db and wal also live on this partition.
See ceph-disk prepare --help for details.

7.5 Check the cluster status

ceph --cluster mycluster -s
  cluster:
    id:     116d4de8-fd14-491f-811f-c1bdd8fac141
    health: HEALTH_WARN
            no active mgr

  services:
    mon: 3 daemons, quorum node1,node2,node3
    mgr: no daemons active
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:

Since no mgr has been added yet, the cluster is in HEALTH_WARN state.
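
To see where the three OSDs landed in the CRUSH hierarchy, we can also run (the exact weights depend on the disk sizes):

ceph --cluster mycluster osd tree

It should show a root named default containing the hosts node1, node2 and node3, each with one OSD (osd.0, osd.1, osd.2) that is up and in.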


8. Add mgr daemons

8.1 Create the {mgr data} directories

mkdir /var/lib/ceph/mgr/mycluster-mgr.node1
ssh node2 mkdir /var/lib/ceph/mgr/mycluster-mgr.node2
ssh node3 mkdir /var/lib/ceph/mgr/mycluster-mgr.node3
Note that, like {mon data}, we set {mgr data} in mycluster.conf to /var/lib/ceph/mgr/$cluster-$name instead of the default /var/lib/ceph/mgr/$cluster-$id.

8.2 Distribute the mgr keyrings

cp mycluster.mgr.node1.keyring /var/lib/ceph/mgr/mycluster-mgr.node1/keyring
scp mycluster.mgr.node2.keyring node2:/var/lib/ceph/mgr/mycluster-mgr.node2/keyring
scp mycluster.mgr.node3.keyring node3:/var/lib/ceph/mgr/mycluster-mgr.node3/keyring

8.3 Start the mgr daemons

systemctl start ceph-mgr@node1
ssh node2 systemctl start ceph-mgr@node2
ssh node3 systemctl start ceph-mgr@node3

8.4 Check the cluster status

ceph --cluster mycluster -s
  cluster:
    id:     116d4de8-fd14-491f-811f-c1bdd8fac141
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum node1,node2,node3
    mgr: node1(active), standbys: node3, node2
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   5158 MB used, 113 GB / 118 GB avail
    pgs:
As we can see, after adding the mgr daemons the cluster is in HEALTH_OK state.
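
As a final sanity check (a sketch, not part of the original post), we can create a small pool, write and read back an object, and delete the pool again. Pool deletion works here because mon allow pool delete = true is set in mycluster.conf; Luminous additionally requires the pool name twice and the --yes-i-really-really-mean-it flag:

ceph --cluster mycluster osd pool create testpool 128
rados --cluster mycluster -p testpool put obj1 /etc/hosts
rados --cluster mycluster -p testpool get obj1 /tmp/obj1 && diff /etc/hosts /tmp/obj1
ceph --cluster mycluster osd pool delete testpool testpool --yes-i-really-really-mean-it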
