Rook is a self-managing distributed storage orchestration system that provides a convenient storage solution for Kubernetes. Rook does not provide storage itself; instead, it sits as an adaptation layer between Kubernetes and the storage system, simplifying deployment and maintenance. The storage systems Rook currently supports are Ceph, CockroachDB, Cassandra, EdgeFS, Minio, and NFS; of these, Ceph is Stable and the rest are Alpha. This article covers Ceph only.
Rook consists of two parts: the Operator and the Cluster.
The figure below shows Rook's architecture. After the Operator starts, it first creates the Agent and Discover containers, which monitor and manage the storage resources on each node. It then creates the Cluster, a CRD defined when the Operator is deployed. Based on the Cluster's configuration, the Operator launches the relevant Ceph containers. Once the storage cluster is up, Kubernetes primitives are used to create PVCs for application containers.
[root@master-0 ~]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master-0 Ready master 24m v1.13.0 172.16.7.11 <none> CentOS Linux 7 (Core) 3.10.0-862.el7.x86_64 docker://18.9.2
worker-0 Ready worker 23m v1.13.0 172.16.7.12 <none> CentOS Linux 7 (Core) 3.10.0-862.el7.x86_64 docker://18.9.2
worker-1 Ready worker 23m v1.13.0 172.16.7.13 <none> CentOS Linux 7 (Core) 3.10.0-862.el7.x86_64 docker://18.9.2
[root@master-0 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 20G 0 disk
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 19G 0 part
├─centos-root 253:0 0 17G 0 lvm /
└─centos-swap 253:1 0 2G 0 lvm
sdb 8:16 0 20G 0 disk
sr0 11:0 1 1024M 0 rom
Each node has two disks; the second one, sdb, is used as Ceph's data disk.
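Rook expects the data disk to carry no existing partition table or filesystem. If sdb has been used before, it can be cleaned first; a minimal sketch, only needed when the disk is not already blank, and it destroys all data on that disk:
### run on every storage node whose sdb is not already empty ###
[root@master-0 ~]# sgdisk --zap-all /dev/sdb    # wipe GPT/MBR partition tables
[root@master-0 ~]# wipefs --all /dev/sdb        # remove any leftover filesystem signatures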
[root@master-0 ~]# kubectl apply -f https://raw.githubusercontent.com/rook/rook/master/cluster/examples/kubernetes/ceph/operator.yaml
[root@master-0 ~]# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
<snip>
rook-ceph-system rook-ceph-agent-4tf9h 1/1 Running 0 17m
rook-ceph-system rook-ceph-agent-4zg9t 1/1 Running 0 17m
rook-ceph-system rook-ceph-agent-r82n7 1/1 Running 0 17m
rook-ceph-system rook-ceph-operator-b996864dd-zbn29 1/1 Running 0 18m
rook-ceph-system rook-discover-88zkc 1/1 Running 0 17m
rook-ceph-system rook-discover-ffsns 1/1 Running 0 17m
rook-ceph-system rook-discover-wt942 1/1 Running 0 17m
[root@master-0 ~]# kubectl get ds --all-namespaces
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
<snip>
rook-ceph-system rook-ceph-agent 3 3 3 3 3 <none> 17m
rook-ceph-system rook-discover 3 3 3 3 3 <none> 17m
The key object to watch is the rook-ceph-operator Deployment; once it is up, it launches the Agent and Discover pods as DaemonSets.
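The operator also registers the Rook CRDs; a quick sanity check (assuming the v0.9 operator manifest above — the exact CRD names can differ between Rook versions):
[root@master-0 ~]# kubectl get crd | grep rook.io              # should list cephclusters.ceph.rook.io and the other Rook CRDs
[root@master-0 ~]# kubectl -n rook-ceph-system get deploy,ds   # operator Deployment plus the agent/discover DaemonSets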
### Set a label to mark which nodes will host which applications ###
[root@master-0 ~]# kubectl label node master-0 role=storage-node
node/master-0 labeled
[root@master-0 ~]# kubectl label node worker-0 role=storage-node
node/worker-0 labeled
[root@master-0 ~]# kubectl label node worker-1 role=storage-node
node/worker-1 labeled
[root@master-0 ~]# kubectl get nodes -L role
NAME STATUS ROLES AGE VERSION ROLE
master-0 Ready master 26m v1.13.0 storage-node
worker-0 Ready worker 25m v1.13.0 storage-node
worker-1 Ready worker 25m v1.13.0 storage-node
Only a single label is set here, so Ceph's mon, osd, and mgr all run on the nodes carrying this label; separate labels can of course be set per component, as sketched below.
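For example, a hedged sketch of per-component placement (the label values mon-node and osd-node are made up for illustration; the structure mirrors the placement section of the cluster.yml below):
placement:
  mon:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: role
            operator: In
            values:
            - mon-node
  osd:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: role
            operator: In
            values:
            - osd-node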
### cluster.yml ###
apiVersion: v1
kind: Namespace
metadata:
name: rook-ceph
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: rook-ceph-osd
namespace: rook-ceph
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: rook-ceph-mgr
namespace: rook-ceph
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: rook-ceph-osd
namespace: rook-ceph
rules:
- apiGroups: [""]
resources: ["configmaps"]
verbs: [ "get", "list", "watch", "create", "update", "delete" ]
---
# Aspects of ceph-mgr that require access to the system namespace
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: rook-ceph-mgr-system
namespace: rook-ceph
rules:
- apiGroups:
- ""
resources:
- configmaps
verbs:
- get
- list
- watch
---
# Aspects of ceph-mgr that operate within the cluster's namespace
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: rook-ceph-mgr
namespace: rook-ceph
rules:
- apiGroups:
- ""
resources:
- pods
- services
verbs:
- get
- list
- watch
- apiGroups:
- batch
resources:
- jobs
verbs:
- get
- list
- watch
- create
- update
- delete
- apiGroups:
- ceph.rook.io
resources:
- "*"
verbs:
- "*"
---
# Allow the operator to create resources in this cluster's namespace
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: rook-ceph-cluster-mgmt
namespace: rook-ceph
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: rook-ceph-cluster-mgmt
subjects:
- kind: ServiceAccount
name: rook-ceph-system
namespace: rook-ceph-system
---
# Allow the osd pods in this namespace to work with configmaps
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: rook-ceph-osd
namespace: rook-ceph
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: rook-ceph-osd
subjects:
- kind: ServiceAccount
name: rook-ceph-osd
namespace: rook-ceph
---
# Allow the ceph mgr to access the cluster-specific resources necessary for the mgr modules
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: rook-ceph-mgr
namespace: rook-ceph
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: rook-ceph-mgr
subjects:
- kind: ServiceAccount
name: rook-ceph-mgr
namespace: rook-ceph
---
# Allow the ceph mgr to access the rook system resources necessary for the mgr modules
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: rook-ceph-mgr-system
namespace: rook-ceph-system
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: rook-ceph-mgr-system
subjects:
- kind: ServiceAccount
name: rook-ceph-mgr
namespace: rook-ceph
---
# Allow the ceph mgr to access cluster-wide resources necessary for the mgr modules
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: rook-ceph-mgr-cluster
namespace: rook-ceph
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: rook-ceph-mgr-cluster
subjects:
- kind: ServiceAccount
name: rook-ceph-mgr
namespace: rook-ceph
---
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
name: rook-ceph
namespace: rook-ceph
spec:
cephVersion:
# The container image used to launch the Ceph daemon pods (mon, mgr, osd, mds, rgw).
# v12 is luminous, v13 is mimic, and v14 is nautilus.
# RECOMMENDATION: In production, use a specific version tag instead of the general v13 flag, which pulls the latest release and could result in different
# versions running within the cluster. See tags available at https://hub.docker.com/r/ceph/ceph/tags/.
image: 192.168.101.88:5000/ceph/ceph:v13.1
# Whether to allow unsupported versions of Ceph. Currently only luminous and mimic are supported.
# After nautilus is released, Rook will be updated to support nautilus.
# Do not set to true in production.
allowUnsupported: false
# The path on the host where configuration files will be persisted. If not specified, a kubernetes emptyDir will be created (not recommended).
# Important: if you reinstall the cluster, make sure you delete this directory from each host or else the mons will fail to start on the new cluster.
# In Minikube, the '/data' directory is configured to persist across reboots. Use "/data/rook" in Minikube environment.
dataDirHostPath: /var/lib/rook
# set the amount of mons to be started
mon:
count: 3
allowMultiplePerNode: true
# enable the ceph dashboard for viewing cluster status
dashboard:
enabled: true
# serve the dashboard under a subpath (useful when you are accessing the dashboard via a reverse proxy)
# urlPrefix: /ceph-dashboard
# serve the dashboard at the given port.
# port: 8443
# serve the dashboard using SSL
# ssl: true
network:
# toggle to use hostNetwork
hostNetwork: false
rbdMirroring:
# The number of daemons that will perform the rbd mirroring.
# rbd mirroring must be configured with "rbd mirror" from the rook toolbox.
workers: 0
# To control where various services will be scheduled by kubernetes, use the placement configuration sections below.
# The example under 'all' would have all services scheduled on kubernetes nodes labeled with 'role=storage-node' and
# tolerate taints with a key of 'storage-node'.
placement:
all:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: role
operator: In
values:
- storage-node
# podAffinity:
# podAntiAffinity:
tolerations:
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
# The above placement information can also be specified for mon, osd, and mgr components
# mon:
# osd:
# mgr:
resources:
# The requests and limits set here, allow the mgr pod to use half of one CPU core and 1 gigabyte of memory
# mgr:
# limits:
# cpu: "500m"
# memory: "1024Mi"
# requests:
# cpu: "500m"
# memory: "1024Mi"
# The above example requests/limits can also be added to the mon and osd components
# mon:
# osd:
storage: # cluster level storage configuration and selection
useAllNodes: false
useAllDevices: false
deviceFilter:
location:
config:
# The default and recommended storeType is dynamically set to bluestore for devices and filestore for directories.
# Set the storeType explicitly only if it is required not to use the default.
# storeType: bluestore
databaseSizeMB: "1024" # this value can be removed for environments with normal sized disks (100 GB or larger)
# journalSizeMB: "1024" # this value can be removed for environments with normal sized disks (20 GB or larger)
osdsPerDevice: "1" # this value can be overridden at the node or device level
# Cluster level list of directories to use for storage. These values will be set for all nodes that have no `directories` set.
# directories:
# - path: /rook/storage-dir
# Individual nodes and their config can be specified as well, but 'useAllNodes' above must be set to false. Then, only the named
# nodes below will be used as storage resources. Each node's 'name' field should match their 'kubernetes.io/hostname' label.
nodes:
- name: "master-0"
devices:
- name: "sdb"
- name: "worker-0"
devices:
- name: "sdb"
- name: "worker-1"
devices:
- name: "sdb"
# - name: "172.17.4.101"
# directories: # specific directories to use for storage can be specified for each node
# - path: "/rook/storage-dir"
# resources:
# limits:
# cpu: "500m"
# memory: "1024Mi"
# requests:
# cpu: "500m"
# memory: "1024Mi"
# - name: "172.17.4.201"
# devices: # specific devices to use for storage can be specified for each node
# - name: "sdb"
# - name: "nvme01" # multiple osds can be created on high performance devices
# config:
# osdsPerDevice: "5"
# config: # configuration can be specified at the node level which overrides the cluster level config
# storeType: filestore
# - name: "172.17.4.301"
# deviceFilter: "^sd."
Key points to note in the CephCluster spec:
- spec.dataDirHostPath: where Rook persists its metadata, so the cluster keeps working after a server reboot; when redeploying, leftover files under this path must be deleted manually (see the cleanup sketch after this list)
- spec.storage.useAllNodes: whether all nodes are used for storage according to the configuration; it must be set to false when nodes are listed explicitly
- spec.storage.config: depending on the actual disk sizes, some of the settings under config can be removed
- spec.storage.nodes: configure the storage location for each node individually; either disks or directories can be used
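A minimal cleanup sketch for redeployment, run on every storage node; it assumes the dataDirHostPath of /var/lib/rook and the sdb data disk used in this article, and it irreversibly wipes the old cluster's data:
### only when tearing down and redeploying the cluster ###
[root@master-0 ~]# rm -rf /var/lib/rook          # remove leftover mon/osd metadata
[root@master-0 ~]# sgdisk --zap-all /dev/sdb     # clear the partitions Rook created on the data disk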
[root@master-0 ~]# kubectl apply -f cluster.yml
namespace/rook-ceph created
serviceaccount/rook-ceph-osd created
serviceaccount/rook-ceph-mgr created
role.rbac.authorization.k8s.io/rook-ceph-osd created
clusterrole.rbac.authorization.k8s.io/rook-ceph-mgr-system created
role.rbac.authorization.k8s.io/rook-ceph-mgr created
rolebinding.rbac.authorization.k8s.io/rook-ceph-cluster-mgmt created
rolebinding.rbac.authorization.k8s.io/rook-ceph-osd created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-system created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-cluster created
cephcluster.ceph.rook.io/rook-ceph created
[root@master-0 ~]# kubectl get pods --all-namespaces
<snip>
rook-ceph-system rook-ceph-agent-4tf9h 1/1 Running 0 19m
rook-ceph-system rook-ceph-agent-4zg9t 1/1 Running 0 19m
rook-ceph-system rook-ceph-agent-r82n7 1/1 Running 0 19m
rook-ceph-system rook-ceph-operator-b996864dd-zbn29 1/1 Running 0 20m
rook-ceph-system rook-discover-88zkc 1/1 Running 0 19m
rook-ceph-system rook-discover-ffsns 1/1 Running 0 19m
rook-ceph-system rook-discover-wt942 1/1 Running 0 19m
rook-ceph rook-ceph-mgr-a-7b9667498-j4bdx 1/1 Running 0 16m
rook-ceph rook-ceph-mon-a-749779c884-bqm9b 1/1 Running 0 17m
rook-ceph rook-ceph-mon-b-b97f6cbdb-hmrln 1/1 Running 0 17m
rook-ceph rook-ceph-mon-c-67d7dcc89f-pcjpn 1/1 Running 0 16m
rook-ceph rook-ceph-osd-0-86dff67f75-mvjkc 1/1 Running 0 15m
rook-ceph rook-ceph-osd-1-6cdd46dcdc-p62zq 1/1 Running 0 15m
rook-ceph rook-ceph-osd-2-c7b97f7bf-65r6d 1/1 Running 0 15m
rook-ceph rook-ceph-osd-prepare-master-0-vvp6c 0/2 Completed 0 16m
rook-ceph rook-ceph-osd-prepare-worker-0-spfbn 0/2 Completed 0 16m
rook-ceph rook-ceph-osd-prepare-worker-1-blmsz 0/2 Completed 0 16m
If anything goes wrong at this step, check the operator's logs.
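For example (assuming the operator pod carries the app=rook-ceph-operator label from the default operator.yaml):
[root@master-0 ~]# kubectl -n rook-ceph-system logs -l app=rook-ceph-operator   # operator log
[root@master-0 ~]# kubectl -n rook-ceph get pods -o wide                        # spot pods stuck in non-Running states
[root@master-0 ~]# kubectl -n rook-ceph describe pod <pod-name>                 # scheduling/volume events for a failing pod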
### Rook partitions the disk automatically ###
[root@master-0 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 20G 0 disk
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 19G 0 part
├─centos-root 253:0 0 17G 0 lvm /
└─centos-swap 253:1 0 2G 0 lvm
sdb 8:16 0 20G 0 disk
├─sdb1 8:17 0 576M 0 part
├─sdb2 8:18 0 1G 0 part
└─sdb3 8:19 0 18.4G 0 part
sr0 11:0 1 1024M 0 rom
- Do not partition the sdb handed to Rook yourself beforehand; Rook partitions it on its own
- As the example configuration shows, besides devices, Rook can also use directories for storage; that is not tried here
[root@master-0 ~]# kubectl get svc --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
<snip>
rook-ceph rook-ceph-mgr ClusterIP 10.100.35.42 <none> 9283/TCP 19m
rook-ceph rook-ceph-mgr-dashboard NodePort 10.109.82.52 <none> 8443/TCP 19m
rook-ceph rook-ceph-mon-a ClusterIP 10.107.76.183 <none> 6789/TCP 20m
rook-ceph rook-ceph-mon-b ClusterIP 10.99.3.203 <none> 6789/TCP 20m
rook-ceph rook-ceph-mon-c ClusterIP 10.97.73.46 <none> 6789/TCP 19m
### Change the dashboard service to NodePort ###
[root@master-0 ~]# kubectl edit svc -n rook-ceph rook-ceph-mgr-dashboard
[root@master-0 ~]# kubectl get svc -n rook-ceph rook-ceph-mgr-dashboard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
rook-ceph-mgr-dashboard NodePort 10.109.82.52 <none> 8443:30372/TCP 20m
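Instead of editing the Service interactively, the same change can be made non-interactively; a small sketch that simply sets spec.type, equivalent to the edit above:
[root@master-0 ~]# kubectl -n rook-ceph patch svc rook-ceph-mgr-dashboard -p '{"spec":{"type":"NodePort"}}'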
### Get the dashboard login password; the user is admin ###
[root@master-0 ~]# kubectl get secrets -n rook-ceph rook-ceph-dashboard-password -o jsonpath='{.data.password}' | base64 -d
eXnScTg7nm
Log in from a browser: https://172.16.7.11:30372
### Block storage ###
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
name: replicapool
namespace: rook-ceph
spec:
failureDomain: host
replicated:
size: 3
### StorageClass ###
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: rook-ceph-block
provisioner: ceph.rook.io/block
parameters:
blockPool: replicapool
# The value of "clusterNamespace" MUST be the same as the one in which your rook cluster exist
clusterNamespace: rook-ceph
# Specify the filesystem type of the volume. If not specified, it will use `ext4`.
fstype: xfs
# Optional, default reclaimPolicy is "Delete". Other options are: "Retain", "Recycle" as documented in https://kubernetes.io/docs/concepts/storage/storage-classes/
reclaimPolicy: Retain
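Save the two manifests above and apply them before creating any PVCs (the file names pool.yml and storageclass.yml are assumptions for this example):
[root@master-0 ~]# kubectl apply -f pool.yml
[root@master-0 ~]# kubectl apply -f storageclass.yml
[root@master-0 ~]# kubectl get storageclass rook-ceph-block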
### Test with busybox ###
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: ceph-block-volume
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storageClassName: rook-ceph-block
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: busybox
name: busybox
spec:
replicas: 1
selector:
matchLabels:
app: busybox
template:
metadata:
labels:
app: busybox
spec:
containers:
- name: busybox
image: busybox
command: ["sh", "-c", "sleep 3600"]
volumeMounts:
- name: volume
mountPath: /volume
volumes:
- name: volume
persistentVolumeClaim:
claimName: ceph-block-volume
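Apply the PVC plus Deployment above (the file name busybox-test.yml is an assumption), then verify that the claim is bound and the volume is mounted:
[root@master-0 ~]# kubectl apply -f busybox-test.yml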
[root@master-0 ~]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
ceph-block-volume Bound pvc-066408f1-5059-11e9-95ad-005056260373 1Gi RWO rook-ceph-block 23s
[root@master-0 ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
busybox-578db44c4-vwszl 1/1 Running 0 25s
[root@master-0 ~]# kubectl exec -ti busybox-578db44c4-vwszl sh
/ # df -h
Filesystem Size Used Available Use% Mounted on
overlay 17.0G 4.3G 12.6G 26% /
tmpfs 64.0M 0 64.0M 0% /dev
tmpfs 1.8G 0 1.8G 0% /sys/fs/cgroup
/dev/rbd0 1014.0M 32.3M 981.7M 3% /volume
<snip>
File storage and object storage work in the same way as above: first create the corresponding CRD object, then consume it. However, the filesystem does not yet support StorageClass, which makes RWX awkward to use; a later option is to export an NFS service with ganesha to provide RWX.
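For reference, a minimal CephFilesystem sketch along the lines of the v0.9 examples (not applied here; the name myfs and the pool sizes are placeholders):
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
  - replicated:
      size: 3
  metadataServer:
    activeCount: 1
    activeStandby: true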
https://rook.io/docs/rook/v0.9/