kolla-ceph 4: supporting bcache devices and iSCSI/multipath devices

Series links

  1. https://www.jianshu.com/p/f18a1b3a4920 How to deploy a containerized Ceph cluster with kolla
  2. https://www.jianshu.com/p/a39f226d5dfb Fixing some problems encountered during deployment
  3. https://www.jianshu.com/p/d520fed237c0 Introducing the device classes feature into kolla ceph
  4. https://www.jianshu.com/p/d6e047e1ad06 Supporting bcache devices and iSCSI and multipath devices
  5. https://www.jianshu.com/p/ab8251fc991a A Ceph containerized deployment orchestration project

This post describes how kolla-ceph supports bcache devices as well as iSCSI and multipath devices.

Commit URLs

kolla and kolla-ansible do not support bcache or multipath disks out of the box, so I submitted the two commits below to add that support. The biggest difference from the original implementation is how the OSD partition symlinks are created: the original code keys them on the partition name (partname), while my commits key them on the partition UUID (partuuid), following the approach used by ceph-disk.

kolla: https://review.opendev.org/#/c/599961/

kolla-ansible: https://review.opendev.org/#/c/599962/
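The partuuid-based symlink idea behind these commits can be sketched roughly as follows. This is an illustrative sketch, not the actual code from the commits; the example partition and OSD path are assumptions:

```shell
# Resolve an OSD partition by PARTUUID, the way ceph-disk does.
# /dev/bcache0p1 and the OSD directory below are hypothetical examples.
PART=/dev/bcache0p1

# Read the GPT partition UUID; unlike the kernel device name, it is
# stable across reboots and device renames.
PUUID=$(blkid -o value -s PARTUUID "$PART")

# udev maintains a stable symlink for every GPT partition here, so the
# OSD can link against it instead of the raw kernel name:
ln -snf "/dev/disk/by-partuuid/$PUUID" /var/lib/ceph/osd/ceph-0/block
```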

Using bcache disks with Kolla Ceph

About bcache

Bcache is a block-layer cache in the Linux kernel that lets one SSD or NVMe device act as a cache for multiple HDDs. SSDs are fast but expensive and small; HDDs are cheap and large but slow. Putting an SSD in front of HDDs as a cache therefore addresses both the SSD's limited capacity and the HDD's limited speed.

Why use bcache disks with bluestore

bluestore bypasses the local filesystem and manages the raw device directly. Because the kernel AIO interface only supports direct I/O, writes to the block device go straight to disk; compared with filestore this skips the journal, turning two writes into one, so write performance should improve in theory. bluestore was thus designed with fast disks in mind, but our budget dictates that most of our disks are ordinary HDDs, whose I/O bottleneck caps performance. To raise that cap we put a cache layer in front of them, and that is exactly what bcache provides.

Building the bcache disks

Everything below was tested in my virtual machines, running CentOS 7:

Node        SSD disk    HDDs
ceph-node1  sdb         sdc,sdd
ceph-node2  sdb         sdc,sdd
ceph-node3  sdb         sdc,sdd
  • First, partition the SSD; the plan is one SSD (sdb) caching two HDDs (sdc, sdd)
sudo sgdisk --zap-all -- /dev/sdb

parted /dev/sdb -s -- mklabel gpt mkpart  bcache0  1  25000
parted /dev/sdb -s mkpart bcache1  25001  100%
  • Install bcache
# My environment was missing two packages, blkid and uuid; install whichever packages the build errors point to
yum install libblkid-devel uuid -y

# Install bcache-tools
git clone https://evilpiepirate.org/git/bcache-tools.git
cd bcache-tools
make
make install

# Load the bcache kernel module
modprobe bcache
  • Wipe stale bcache data from the partitions
dd if=/dev/zero of=/dev/sdb1 bs=512k count=200
dd if=/dev/zero of=/dev/sdb2 bs=512k count=200

Note: bcache's tooling suggests wipefs -a /dev/sdb1 for this, but that command misbehaves in my environment. When I wipe an old cache in order to re-partition the disk, the cache device shows up again under /sys/fs/bcache after wipefs runs, and every subsequent operation then fails with "Device or resource busy".

  • Wipe the bcache backing device partitions
sudo sgdisk --zap-all -- /dev/sdc
sudo sgdisk --zap-all -- /dev/sdd
  • Create the bcache devices
make-bcache -C /dev/sdb1 -B /dev/sdc --writeback
make-bcache -C /dev/sdb2 -B /dev/sdd --writeback
  • Check
[root@ceph-node1 bcache]# lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sdd               8:48   0   50G  0 disk
└─bcache1       252:128  0   50G  0 disk
sdb               8:16   0   50G  0 disk
├─sdb2            8:18   0 26.7G  0 part
│ └─bcache1     252:128  0   50G  0 disk
└─sdb1            8:17   0 23.3G  0 part
  └─bcache0     252:0    0   50G  0 disk
sdc               8:32   0   50G  0 disk
└─bcache0       252:0    0   50G  0 disk

[root@ceph-node1 bcache]# fdisk -l

Disk /dev/bcache0: 53.7 GB, 53687083008 bytes, 104857584 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/bcache1: 53.7 GB, 53687083008 bytes, 104857584 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Now the bcache devices can be used like ordinary disks.
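Before deploying on top of them, it is worth sanity-checking the cache attachment through sysfs (a quick check I find helpful; the exact attribute values vary with kernel version):

```shell
# Confirm each bcache device has a cache attached and runs in writeback mode.
for dev in bcache0 bcache1; do
    # "state" reports "clean" or "dirty" once a cache is attached,
    # and "no cache" if the attach failed.
    cat "/sys/block/$dev/bcache/state"
    # The active cache mode is shown in brackets,
    # e.g. "writethrough [writeback] writearound none".
    cat "/sys/block/$dev/bcache/cache_mode"
done
```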

Deploying kolla ceph on the bcache disks

  • Prepare the kolla ceph disks
sudo sgdisk --zap-all -- /dev/bcache0
sudo sgdisk --zap-all -- /dev/bcache1

sudo /sbin/parted  /dev/bcache0  -s  -- mklabel  gpt  mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1  1 -1
sudo /sbin/parted  /dev/bcache1  -s  -- mklabel  gpt  mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO2  1 -1

With my commits applied, the deployment succeeds.

  • Deploying with the unmodified kolla and kolla-ansible code fails with this error:
"+ sudo -E kolla_set_configs\n
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json\n
INFO:__main__:Validating config file\n
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS\n
INFO:__main__:Copying service configuration files\n
INFO:__main__:Copying /var/lib/kolla/config_files/ceph.conf to /etc/ceph/ceph.conf\n
INFO:__main__:Setting permission for /etc/ceph/ceph.conf\n
INFO:__main__:Copying /var/lib/kolla/config_files/ceph.client.admin.keyring to /etc/ceph/ceph.client.admin.keyring\n
INFO:__main__:Setting permission for /etc/ceph/ceph.client.admin.keyring\n
INFO:__main__:Writing out command to execute\n
++ cat /run_command\n
+ CMD='/usr/bin/ceph-osd -f  --public-addr 10.34.135.160 --cluster-addr 10.34.135.160'\n
+ ARGS=\n
+ [[ ! -n '' ]]\n
+ . kolla_extend_start\n
++ [[ ! -d /var/log/kolla/ceph ]]\n
+++ stat -c %a /var/log/kolla/ceph\n
++ [[ 2755 != \\7\\5\\5 ]]\n
++ chmod 755 /var/log/kolla/ceph\n
++ [[ -n 0 ]]\n
++ CEPH_JOURNAL_TYPE_CODE=45B0969E-9B03-4F30-B4C6-B4B80CEFF106\n
++ CEPH_OSD_TYPE_CODE=4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D\n
++ CEPH_OSD_BS_WAL_TYPE_CODE=0FC63DAF-8483-4772-8E79-3D69D8477DE4\n
++ CEPH_OSD_BS_DB_TYPE_CODE=CE8DF73C-B89D-45B0-AD98-D45332906d90\n
++ ceph quorum_status\n
++ [[ False == \\F\\a\\l\\s\\e ]]\n
++ [[ bluestore == \\b\\l\\u\\e\\s\\t\\o\\r\\e ]]\n
++ [[ /dev/bcache0 =~ /dev/loop ]]\n
++ sgdisk --zap-all -- /dev/bcache01\n
Problem opening /dev/bcache01 for reading! Error is 2.\n
The specified file does not exist!\n
Problem opening '' for writing! Program will now terminate.\n
Warning! MBR not overwritten! Error is 2!\n",

From the log, this is the disk as kolla detected it:

{
            "bs_blk_device": "",
            "bs_blk_label": "",
            "bs_blk_partition_num": "",
            "bs_db_device": "",
            "bs_db_label": "",
            "bs_db_partition_num": "",
            "bs_wal_device": "",
            "bs_wal_label": "",
            "bs_wal_partition_num": "",
            "device": "/dev/bcache0",
            "external_journal": false,
            "fs_label": "",
            "fs_uuid": "",
            "journal": "",
            "journal_device": "",
            "journal_num": 0,
            "partition": "/dev/bcache0",
            "partition_label": "KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1",
            "partition_num": "1"
        }

The failure comes from this code (kolla/docker/ceph/ceph-osd/extend_start.sh):

if [[ "${OSD_BS_DEV}" =~ "/dev/loop" ]]; then
    sgdisk --zap-all -- "${OSD_BS_DEV}""p${OSD_BS_PARTNUM}"
else
    sgdisk --zap-all -- "${OSD_BS_DEV}""${OSD_BS_PARTNUM}"
fi

kolla only inserts the "p" before the partition number when the device matches /dev/loop, but the first partition of bcache0 is named bcache0p1; kolla therefore builds the name bcache01, which does not exist, hence the error.
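My commits generalize that branch: whenever the parent device name ends in a digit (loop, nvme, bcache, and similarly named devices), the kernel inserts a "p" before the partition number. That naming rule can be sketched as a small helper (a sketch of the convention, not the exact code from the commits):

```shell
# Build the partition device name for a disk and partition number.
# Kernel convention: when the disk name ends in a digit (loop0, nvme0n1,
# bcache0, ...), a "p" separates the name from the partition number.
part_dev() {
    local disk=$1 num=$2
    case "$disk" in
        *[0-9]) echo "${disk}p${num}" ;;
        *)      echo "${disk}${num}" ;;
    esac
}

part_dev /dev/bcache0 1   # -> /dev/bcache0p1
part_dev /dev/sdb 1       # -> /dev/sdb1
```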

Tearing down bcache disks

# Stop the backing devices (repeat for each bcacheN)
echo 1 > /sys/block/bcache0/bcache/stop

# Unregister the cache device (substitute the cache set's UUID directory)
echo 1 > /sys/fs/bcache/<cache-set-uuid>/unregister

Note: mind the order here. If the cache device is unregistered first, without stopping the backing devices bound to it, the cache device simply re-registers itself.
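The unregister path needs the cache set's UUID, which can be read from sysfs first (bcache0 below stands for whichever backing device is being detached):

```shell
# Registered cache sets appear as UUID-named directories:
ls /sys/fs/bcache/
# Or follow the backing device's "cache" symlink to the set it is
# attached to:
basename "$(readlink /sys/block/bcache0/bcache/cache)"
```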

Multipath disks

Node        Disks        Role                              IP
ceph-node1  sdb,sdc,sdd  initiator                         192.168.10.11
ceph-node2  sdb,sdc,sdd  initiator                         192.168.10.12
ceph-node3  sdb,sdc      initiator                         192.168.10.13
ceph-node4  sdb,sdc,sdd  iSCSI target (source), dual NICs  192.168.10.14/192.168.11.14

Initializing the source node (the iSCSI target, ceph-node4)

  • Install the required packages
yum install targetd targetcli -y

systemctl enable target && systemctl start target
  • Prepare the logical volumes
sudo sgdisk --zap-all -- /dev/sdb
sudo sgdisk --zap-all -- /dev/sdc
sudo sgdisk --zap-all -- /dev/sdd

pvcreate /dev/sdb
vgcreate vg00 /dev/sdb
lvcreate -l 100%free -n lv00 vg00

pvcreate /dev/sdc
vgcreate vg01 /dev/sdc
lvcreate -l 100%free -n lv01 vg01

pvcreate /dev/sdd
vgcreate vg02 /dev/sdd
lvcreate -l 100%free -n lv02 vg02
  • Check the logical volumes
[root@ceph-node3 irteamsu]# lvs
  LV   VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root centos -wi-ao----  45.99g                                                    
  swap centos -wi-ao----   3.00g                                                    
  lv00 vg00   -wi-a----- <50.00g                                                    
  lv01 vg01   -wi-a----- <50.00g                                                    
  lv02 vg02   -wi-a----- <50.00g
  • Enter targetcli
[root@ceph-node3 irteamsu]# targetcli
targetcli shell version 2.1.fb46
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type 'help'.

/>
  • Create the multipath disks
/backstores/block create disk0 /dev/vg00/lv00
iscsi/ create iqn.2017-05.con.benet:disk0
/iscsi/iqn.2017-05.con.benet:disk0/tpg1/acls create iqn.2017-05.com.benet:192.168.10.11
/iscsi/iqn.2017-05.con.benet:disk0/tpg1/luns create /backstores/block/disk0


/backstores/block create disk1 /dev/vg01/lv01
iscsi/ create iqn.2017-05.con.benet:disk1
/iscsi/iqn.2017-05.con.benet:disk1/tpg1/acls create iqn.2017-05.com.benet:192.168.10.12
/iscsi/iqn.2017-05.con.benet:disk1/tpg1/luns create /backstores/block/disk1

/backstores/block create disk2 /dev/vg02/lv02
iscsi/ create iqn.2017-05.con.benet:disk2
/iscsi/iqn.2017-05.con.benet:disk2/tpg1/acls create iqn.2017-05.com.benet:192.168.10.13
/iscsi/iqn.2017-05.con.benet:disk2/tpg1/luns create /backstores/block/disk2
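targetcli changes live only in the running kernel until saved, so to have the targets survive a restart of the target service it is advisable to persist them (standard targetcli behavior on CentOS 7, as far as I know):

```shell
# Inside the targetcli shell, or non-interactively from the shell:
targetcli saveconfig
# The configuration is written to /etc/target/saveconfig.json and
# restored by target.service at boot.
```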
  • Check
/> ls
o- / ......................................................................................................................... [...]
  o- backstores .............................................................................................................. [...]
  | o- block .................................................................................................. [Storage Objects: 3]
  | | o- disk0 ..................................................................... [/dev/vg00/lv00 (50.0GiB) write-thru activated]
  | | | o- alua ................................................................................................... [ALUA Groups: 1]
  | | |   o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
  | | o- disk1 ..................................................................... [/dev/vg01/lv01 (50.0GiB) write-thru activated]
  | | | o- alua ................................................................................................... [ALUA Groups: 1]
  | | |   o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
  | | o- disk2 ..................................................................... [/dev/vg02/lv02 (50.0GiB) write-thru activated]
  | |   o- alua ................................................................................................... [ALUA Groups: 1]
  | |     o- default_tg_pt_gp ....................................................................... [ALUA state: Active/optimized]
  | o- fileio ................................................................................................. [Storage Objects: 0]
  | o- pscsi .................................................................................................. [Storage Objects: 0]
  | o- ramdisk ................................................................................................ [Storage Objects: 0]
  o- iscsi ............................................................................................................ [Targets: 3]
  | o- iqn.2017-05.con.benet:disk0 ....................................................................................... [TPGs: 1]
  | | o- tpg1 ............................................................................................... [no-gen-acls, no-auth]
  | |   o- acls .......................................................................................................... [ACLs: 1]
  | |   | o- iqn.2017-05.com.benet:192.168.10.11 .................................................................. [Mapped LUNs: 1]
  | |   |   o- mapped_lun0 ................................................................................. [lun0 block/disk0 (rw)]
  | |   o- luns .......................................................................................................... [LUNs: 1]
  | |   | o- lun0 ................................................................ [block/disk0 (/dev/vg00/lv00) (default_tg_pt_gp)]
  | |   o- portals .................................................................................................... [Portals: 1]
  | |     o- 0.0.0.0:3260 ..................................................................................................... [OK]
  | o- iqn.2017-05.con.benet:disk1 ....................................................................................... [TPGs: 1]
  | | o- tpg1 ............................................................................................... [no-gen-acls, no-auth]
  | |   o- acls .......................................................................................................... [ACLs: 1]
  | |   | o- iqn.2017-05.com.benet:192.168.10.12 .................................................................. [Mapped LUNs: 1]
  | |   |   o- mapped_lun0 ................................................................................. [lun0 block/disk1 (rw)]
  | |   o- luns .......................................................................................................... [LUNs: 1]
  | |   | o- lun0 ................................................................ [block/disk1 (/dev/vg01/lv01) (default_tg_pt_gp)]
  | |   o- portals .................................................................................................... [Portals: 1]
  | |     o- 0.0.0.0:3260 ..................................................................................................... [OK]
  | o- iqn.2017-05.con.benet:disk2 ....................................................................................... [TPGs: 1]
  |   o- tpg1 ............................................................................................... [no-gen-acls, no-auth]
  |     o- acls .......................................................................................................... [ACLs: 1]
  |     | o- iqn.2017-05.com.benet:192.168.10.13 ................................................................. [Mapped LUNs: 1]
  |     |   o- mapped_lun0 ................................................................................. [lun0 block/disk2 (rw)]
  |     o- luns .......................................................................................................... [LUNs: 1]
  |     | o- lun0 ................................................................ [block/disk2 (/dev/vg02/lv02) (default_tg_pt_gp)]
  |     o- portals .................................................................................................... [Portals: 1]
  |       o- 0.0.0.0:3260 ..................................................................................................... [OK]
  o- loopback ......................................................................................................... [Targets: 0]

Setting up multipath on the initiator nodes

  • Install and configure the packages
yum -y install iscsi-initiator-utils

# Set the InitiatorName; ceph-node1, for example
vi /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.2017-05.com.benet:192.168.10.11

systemctl enable iscsi && systemctl start iscsi
  • Discover and list the targets
[root@ceph-node1 irteamsu]# iscsiadm -m discovery -t st -p 192.168.10.14
192.168.10.14:3260,1 iqn.2017-05.con.benet:disk0
192.168.10.14:3260,1 iqn.2017-05.con.benet:disk1
192.168.10.14:3260,1 iqn.2017-05.con.benet:disk2
[root@ceph-node1 irteamsu]# iscsiadm -m discovery -t st -p 192.168.11.14
192.168.11.14:3260,1 iqn.2017-05.con.benet:disk0
192.168.11.14:3260,1 iqn.2017-05.con.benet:disk1
192.168.11.14:3260,1 iqn.2017-05.con.benet:disk2
  • A problem I ran into:
# Nodes on kernel 3.10.0-327.el7.x86_64 failed target discovery after this configuration
[root@ceph-node3 ~]# iscsiadm -m discovery -t st -p 192.168.10.14
iscsiadm: Cannot perform discovery. Invalid Initiatorname.
iscsiadm: Could not perform SendTargets discovery: invalid parameter

Resolved by a reboot.
  • Connect to the targets
# ceph-node1, for example
iscsiadm -m node -T iqn.2017-05.con.benet:disk0 -p 192.168.10.14 --op update -n node.startup -v automatic
iscsiadm -m node -T iqn.2017-05.con.benet:disk0 -p 192.168.11.14 --op update -n node.startup -v automatic
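node.startup=automatic only takes effect when the iscsi service starts; if the disks do not show up right away, an explicit login per portal brings the sessions up (same target names and portals as above; disk0 shown, repeat for the other disks):

```shell
# Log in over both portals so every disk has two paths:
iscsiadm -m node -T iqn.2017-05.con.benet:disk0 -p 192.168.10.14 --login
iscsiadm -m node -T iqn.2017-05.con.benet:disk0 -p 192.168.11.14 --login
# One session per portal should now be listed:
iscsiadm -m session
```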
  • Inspect the network disks
Disk /dev/sde: 53.7 GB, 53682896896 bytes, 104849408 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 4194304 bytes


Disk /dev/sdf: 53.7 GB, 53682896896 bytes, 104849408 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 4194304 bytes
  • Configure multipath
yum install device-mapper-multipath -y
systemctl enable multipathd.service && systemctl restart multipathd.service

vi /etc/multipath.conf
blacklist {
    devnode "^sda"
}
defaults {
    user_friendly_names yes
    path_grouping_policy multibus
    failback immediate
    no_path_retry fail
}
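With the daemon running and the config in place, the maps can be rebuilt and inspected; each map should show two active paths, one per portal (plain verification commands; I have not reproduced their output here):

```shell
# Rebuild the multipath maps and list them; each map should contain
# two paths (one per portal/NIC).
multipath -r
multipath -ll
# With user_friendly_names the resulting devices appear under
# /dev/mapper/ as mpatha, mpathb, ...
```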
  • Multipath maps are not created automatically (seen on kernel 4.20.2-1.el7.elrepo.x86_64)
[root@ceph-node2 ~]# multipath -v3
...
Apr 28 11:11:16 | mpathc: pgfailback = -2 (config file default)
Apr 28 11:11:16 | mpathc: pgpolicy = multibus (config file default)
Apr 28 11:11:16 | mpathc: selector = service-time 0 (internal default)
Apr 28 11:11:16 | mpathc: features = 0 (config file default)
Apr 28 11:11:16 | mpathc: hwhandler = 0 (internal default)
Apr 28 11:11:16 | mpathc: rr_weight = 1 (internal default)
Apr 28 11:11:16 | mpathc: minio = 1 rq (config file default)
Apr 28 11:11:16 | mpathc: no_path_retry = -1 (config file default)
Apr 28 11:11:16 | mpathc: pg_timeout = NONE (internal default)
Apr 28 11:11:16 | mpathc: fast_io_fail_tmo = 5 (config file default)
Apr 28 11:11:16 | mpathc: retain_attached_hw_handler = 1 (config file default)
Apr 28 11:11:16 | mpathc: deferred_remove = 1 (config file default)
Apr 28 11:11:16 | delay_watch_checks = DISABLED (internal default)
Apr 28 11:11:16 | delay_wait_checks = DISABLED (internal default)
Apr 28 11:11:16 | skip_kpartx = 1 (config file default)
Apr 28 11:11:16 | unpriv_sgio = 1 (config file default)
Apr 28 11:11:16 | mpathc: remove queue_if_no_path from '0'
Apr 28 11:11:16 | mpathc: assembled map [0 0 1 1 service-time 0 2 1 8:64 1 8:80 1]
Apr 28 11:11:16 | mpathc: set ACT_CREATE (map does not exist)
Apr 28 11:11:16 | ghost_delay = -1 (config file default)
Apr 28 11:11:16 | mpathc: domap (0) failure for create/reload map
Apr 28 11:11:16 | mpathc: ignoring map
Apr 28 11:11:16 | const prioritizer refcount 2
Apr 28 11:11:16 | directio checker refcount 2
Apr 28 11:11:16 | const prioritizer refcount 1
Apr 28 11:11:16 | directio checker refcount 1
Apr 28 11:11:16 | unloading const prioritizer
Apr 28 11:11:16 | unloading directio checker

Some digging shows the root cause: the newer multipath tooling requires scsi-mq to be enabled:

https://access.redhat.com/documentation/zh-cn/red_hat_enterprise_linux/7/html/7.2_release_notes/storage

# To use scsi-mq, add scsi_mod.use_blk_mq=y dm_mod.use_blk_mq=y to the kernel boot parameters; it also improves disk I/O performance

# Find the matching kernel entry in grub.cfg and append scsi_mod.use_blk_mq=y dm_mod.use_blk_mq=y
vi /boot/grub2/grub.cfg

### BEGIN /etc/grub.d/10_linux ###
menuentry 'CentOS Linux (4.20.2-1.el7.elrepo.x86_64) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-693.el7.x86_64-advanced-be679149-35c2-4143-b8c4-34a594f1b15f' {
        load_video
        set gfxpayload=keep
        insmod gzio
        insmod part_msdos
        insmod xfs
        set root='hd0,msdos1'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1 --hint='hd0,msdos1'  9f370650-6e47-4d78-b54d-420c0068cf6b
        else
          search --no-floppy --fs-uuid --set=root 9f370650-6e47-4d78-b54d-420c0068cf6b
        fi
        linux16 /vmlinuz-4.20.2-1.el7.elrepo.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8 scsi_mod.use_blk_mq=y dm_mod.use_blk_mq=y
        initrd16 /initramfs-4.20.2-1.el7.elrepo.x86_64.img
}
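As an alternative to editing grub.cfg by hand (which the next grub2-mkconfig run will overwrite), CentOS 7 ships grubby, which can append the same parameters persistently. This is an equivalent option, not what I did above:

```shell
# Append the blk-mq parameters to every installed kernel entry:
grubby --update-kernel=ALL --args="scsi_mod.use_blk_mq=y dm_mod.use_blk_mq=y"
# Inspect the entry for the currently running kernel:
grubby --info="/boot/vmlinuz-$(uname -r)"
```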

# Then reboot the machine
reboot
# Check that the parameters took effect
[root@ceph-node2 ~]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.20.2-1.el7.elrepo.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8 scsi_mod.use_blk_mq=y dm_mod.use_blk_mq=y
[root@ceph-node2 ~]# cat /sys/module/scsi_mod/parameters/use_blk_mq
Y

Running multipath -v3 again now brings up the multipath disks.

  • Initialize the multipath disks, after which they can be used to deploy ceph (with my commits applied)
sudo sgdisk --zap-all -- /dev/mapper/mpatha
sudo /sbin/parted  /dev/mapper/mpatha  -s  -- mklabel  gpt  mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1  1 -1

Note: both multipath and bcache disks use the p + number suffix for their partitions, which the kolla code does not handle; in addition, find_disks.py in kolla/docker/kolla-toolbox has no dedicated logic for discovering multipath disks.
