For a work project we needed to deploy an etcd cluster on Kubernetes (K8S) for other services to consume. Searching around, most guides build etcd clusters on K8S with Etcd Operator. But our project runs on an offline ARM platform, and the images those Etcd Operator projects provide are all x86_64; attempts such as recompiling the Operator all ended in failure. I eventually found an open-source project from a helpful author on GitHub, and after a fair amount of tinkering got the cluster deployed, so this post records the process for future reference.
Environment: minikube
Client Version: v1.29.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.3
apiVersion: v1
kind: Service
metadata:
  name: etcdsrv
spec:
  clusterIP: None
  publishNotReadyAddresses: true
  ports:
    - name: etcd-client-port
      protocol: TCP
      port: 2379
      targetPort: 2379
  selector:
    name: etcd-operator
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: etcd0
spec:
  replicas: 1
  selector:
    matchLabels:
      name: etcd-operator
  serviceName: etcdsrv
  template:
    metadata:
      labels:
        name: etcd-operator
    spec:
      containers:
        - name: app
          image: quay.io/coreos/etcd:v3.5.9
          imagePullPolicy: Always
          volumeMounts:
            - mountPath: /data/etcd_data
              name: etcd-volume
          command:
            - /usr/local/bin/etcd
            - --data-dir
            - /data/etcd_data
            - --auto-compaction-retention
            - '1'
            - --quota-backend-bytes
            - '8589934592'
            - --listen-client-urls
            - http://0.0.0.0:2379
            - --advertise-client-urls
            - http://etcd0-0.etcdsrv:2379
            - --listen-peer-urls
            - http://0.0.0.0:2380
            - --initial-advertise-peer-urls
            - http://etcd0-0.etcdsrv:2380
            - --initial-cluster-token
            - etcd-cluster
            - --initial-cluster
            - etcd0=http://etcd0-0.etcdsrv:2380,etcd1=http://etcd1-0.etcdsrv:2380,etcd2=http://etcd2-0.etcdsrv:2380
            - --initial-cluster-state
            - new
            - --enable-pprof
            - --election-timeout
            - '5000'
            - --heartbeat-interval
            - '250'
            - --name
            - etcd0
            - --logger
            - zap
      volumes:
        - name: etcd-volume
          hostPath:
            path: /var/tmp/etcd0
            type: Directory
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: etcd1
spec:
  replicas: 1
  selector:
    matchLabels:
      name: etcd-operator
  serviceName: etcdsrv  # must match the name field of the Service
  template:
    metadata:
      labels:
        name: etcd-operator
    spec:
      containers:
        - name: app
          image: quay.io/coreos/etcd:v3.5.9
          imagePullPolicy: Always
          volumeMounts:
            - mountPath: /data/etcd_data  # note: adjust this directory name accordingly
              name: etcd-volume
          command:
            - /usr/local/bin/etcd
            - --data-dir
            - /data/etcd_data
            - --auto-compaction-retention
            - '1'
            - --quota-backend-bytes
            - '8589934592'
            - --listen-client-urls
            - http://0.0.0.0:2379
            - --advertise-client-urls
            - http://etcd1-0.etcdsrv:2379
            - --listen-peer-urls
            - http://0.0.0.0:2380
            - --initial-advertise-peer-urls
            - http://etcd1-0.etcdsrv:2380
            - --initial-cluster-token
            - etcd-cluster
            - --initial-cluster
            - etcd0=http://etcd0-0.etcdsrv:2380,etcd1=http://etcd1-0.etcdsrv:2380,etcd2=http://etcd2-0.etcdsrv:2380
            - --initial-cluster-state
            - new
            - --enable-pprof
            - --election-timeout
            - '5000'
            - --heartbeat-interval
            - '250'
            - --name
            - etcd1
            - --logger
            - zap
      volumes:
        - name: etcd-volume
          hostPath:
            path: /var/tmp/etcd1
            type: Directory
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: etcd2
spec:
  replicas: 1
  selector:
    matchLabels:
      name: etcd-operator
  serviceName: etcdsrv
  template:
    metadata:
      labels:
        name: etcd-operator
    spec:
      containers:
        - name: app
          image: quay.io/coreos/etcd:v3.5.9
          imagePullPolicy: Always
          volumeMounts:
            - mountPath: /data/etcd_data
              name: etcd-volume
          command:
            - /usr/local/bin/etcd
            - --data-dir
            - /data/etcd_data
            - --auto-compaction-retention
            - '1'
            - --quota-backend-bytes
            - '8589934592'
            - --listen-client-urls
            - http://0.0.0.0:2379
            - --advertise-client-urls
            - http://etcd2-0.etcdsrv:2379
            - --listen-peer-urls
            - http://0.0.0.0:2380
            - --initial-advertise-peer-urls
            - http://etcd2-0.etcdsrv:2380
            - --initial-cluster-token
            - etcd-cluster
            - --initial-cluster
            - etcd0=http://etcd0-0.etcdsrv:2380,etcd1=http://etcd1-0.etcdsrv:2380,etcd2=http://etcd2-0.etcdsrv:2380
            - --initial-cluster-state
            - new
            - --enable-pprof
            - --election-timeout
            - '5000'
            - --heartbeat-interval
            - '250'
            - --name
            - etcd2
            - --logger
            - zap
      volumes:
        - name: etcd-volume
          hostPath:
            path: /var/tmp/etcd2
            type: Directory
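Two of the numeric flags in the manifests above are easier to read when derived: --quota-backend-bytes here is simply 8 GiB expressed in bytes, and the election timeout is 20 times the heartbeat interval (etcd's tuning guidance suggests an election timeout of at least roughly 10x the heartbeat interval, so this ratio leaves comfortable headroom). A quick sanity check:

```shell
# --quota-backend-bytes: 8 GiB in bytes
echo $((8 * 1024 * 1024 * 1024))   # 8589934592

# ratio of --election-timeout (5000 ms) to --heartbeat-interval (250 ms)
echo $((5000 / 250))               # 20
```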
Q: Unfamiliar with this at first, I changed the name field in service.yaml from the original author's value; after deploying, the Pods could not find each other's hosts and cluster formation failed.
A:
The StatefulSet's serviceName must match the name field in service.yaml, or the pod domain names will not be generated under the expected rule. StatefulSet pod domain names follow this pattern:
$(statefulset-name)-$(ordinal-index).$(service-name).$(namespace).svc.cluster.local
By this rule, to reach the etcd0 Pod the fully qualified domain name is
etcd0-0.etcdsrv.default.svc.cluster.local
Because /etc/resolv.conf inside k8s normally contains the matching search domains, names within the same namespace can be shortened to "etcd0-0.etcdsrv"; from a different namespace you must append the namespace, i.e. "etcd0-0.etcdsrv.<namespace>".
Below is /etc/resolv.conf from one of the Pods:
# in "default.svc.cluster.local", "default" is the current namespace; other namespaces work the same way
root@ubuntutest:/# cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local localdomain
options ndots:5
root@ubuntutest:/#
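The naming rule above can be sketched as a small helper (illustrative only; the function names are mine, not part of any Kubernetes API):

```python
# Illustrative helpers that build StatefulSet pod DNS names following the rule
# $(statefulset-name)-$(ordinal-index).$(service-name).$(namespace).svc.cluster.local
def pod_fqdn(statefulset: str, ordinal: int, service: str, namespace: str = "default") -> str:
    """Fully qualified in-cluster DNS name of a StatefulSet pod."""
    return f"{statefulset}-{ordinal}.{service}.{namespace}.svc.cluster.local"

def pod_short_name(statefulset: str, ordinal: int, service: str) -> str:
    """Short form usable within the same namespace (resolved via the resolv.conf search list)."""
    return f"{statefulset}-{ordinal}.{service}"

print(pod_fqdn("etcd0", 0, "etcdsrv"))        # etcd0-0.etcdsrv.default.svc.cluster.local
print(pod_short_name("etcd0", 0, "etcdsrv"))  # etcd0-0.etcdsrv
```

This also makes the failure mode clear: renaming the Service changes the $(service-name) component of every pod's DNS name, so the --initial-cluster peer URLs baked into the manifests stop resolving.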
Q: After deploying the services, kubectl get pod showed no corresponding service information.
A: kubectl get pod defaults to the default namespace. Since the actual deployment specified a namespace in the YAML files, kubectl must be run with that namespace.
The correct command is therefore
[root@192 ~]# kubectl -n <namespace> get pod
Q: I changed the data storage path in cluster.yaml, and the containers then failed to start because the directory could not be found.
A: I deployed with minikube, which starts a container to act as the k8s node, so hostPath mappings at deploy time resolve inside the minikube container, not on the physical host. The directories therefore have to be created inside that container:
minikube ssh  # enter the minikube node container
mkdir -p /var/tmp/{etcd0,etcd1,etcd2}  # create the hostPath directories used by the manifests
Q: minikube start failed because image pulls timed out.
A: For users in mainland China, add --image-mirror-country='cn' when running minikube start; otherwise it pulls from registries abroad by default and fails to start. The full command is
minikube start --image-mirror-country='cn'
References:
k8s-club/etcd-operator
k8s-StatefulSet
"minikube image pull failures, slow installs, apiserver won't start: one command to fix them"