《kubernetes-1.8.0》17-examples-Running ZooKeeper

《kubernetes 1.8.0 test environment installation and deployment》

Date: 2017-12-11

1. Environment and prerequisite knowledge:

Before working through this ZooKeeper example, you should at least understand StatefulSets, PodDisruptionBudgets, PodAntiAffinity, PV, PVC, StorageClass, and related concepts:
Relevant official docs:

  • StatefulSets
  • PodDisruptionBudgets
  • PodAntiAffinity
  • Persistent Volumes
  • basic-stateful-set(tutorials)
  • Dynamic Volume Provisioning

I will also add brief explanations where needed later in this document:

Environment requirements:

  1. A k8s cluster with at least 4 nodes
  2. At least 2 CPUs and 4 GB of memory per node
  3. Dynamic Volume Provisioning must be configured, i.e. a StorageClass must be set up
  4. If Dynamic Volume Provisioning is not configured, the corresponding PVCs must be created and bound before the StatefulSet is created. (Note: a StatefulSet creates its PVCs from a volumeClaimTemplate, so the PVC names are fixed, following the volumeClaimTemplateName-statefulsetName-ordinal pattern. For example, with a StatefulSet named zk, a volume template named datadir, and replicas set to 3, you need to pre-create 3 PVCs of the size requested by the template (10Gi in this example), named datadir-zk-0, datadir-zk-1, and datadir-zk-2; see the sketch after this list.)
  5. Dynamic Volume Provisioning was set up earlier; a few changes are made for this example: —> 《kubernetes-1.8.0》15-addon-vSphere Cloud Provider
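
For reference, a minimal sketch of pre-creating one of those claims by hand. It assumes matching 10Gi PersistentVolumes already exist for the claims to bind to; repeat for datadir-zk-1 and datadir-zk-2:

$ kubectl create -f - <<EOF
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: datadir-zk-0
spec:
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 10Gi
EOF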

The relevant changes to the environment setup are as follows:

Create a default StorageClass, so that any claim that does not specify a storageClassName is provisioned from this StorageClass:

Create the default StorageClass

default-sc.yaml

kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: default
  annotations: {
    "storageclass.kubernetes.io/is-default-class" : "true"
  }
provisioner: kubernetes.io/vsphere-volume
parameters:
    diskformat: zeroedthick
    datastore: local_datastore_47
reclaimPolicy: Retain
  • annotations: the storageclass.kubernetes.io/is-default-class annotation must be set to "true"; this marks the StorageClass as the default (an existing class can also be patched afterwards, as sketched below).
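
An existing StorageClass (for example the fast class created in the earlier vSphere post) can also be marked as the default after the fact with a patch; a small sketch:

$ kubectl patch storageclass fast -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'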

Enable apiserver support:

On all three masters, edit the /etc/kubernetes/apiserver configuration file and add DefaultStorageClass to the ADMISSION_CONTROL list:
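
A sketch of what the changed line might look like. The variable name and the exact plugin list depend on how your apiserver config was written (KUBE_ADMISSION_CONTROL and the plugins below are assumptions); keep whatever plugins you already have and just append DefaultStorageClass:

# /etc/kubernetes/apiserver (excerpt) -- append DefaultStorageClass to the existing admission plugins
KUBE_ADMISSION_CONTROL="--admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,ResourceQuota,DefaultStorageClass"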

Restart the apiserver:

$ systemctl daemon-reload
$ systemctl restart kube-apiserver

Apply the yaml:

$ kubectl create -f default-sc.yaml

Check the newly created StorageClass:

[root@node-131 zookeeper]# kubectl get sc
NAME                PROVISIONER
default (default)   kubernetes.io/vsphere-volume
fast                kubernetes.io/vsphere-volume
  • You should now see a StorageClass named default; the (default) suffix indicates that it is the default StorageClass;

That completes the base environment. One extra reminder: when doing vSphere volume provisioning, try to use a shared datastore. If you use host-local datastores, all the VMs must be on the same datastore, otherwise some pods will later fail to mount their PVCs;

2. Building the ZooKeeper environment and notes on the configuration:

zookeeper.yaml

apiVersion: v1
kind: Service
metadata:
  name: zk-hs
  labels:
    app: zk
spec:
  ports:
  - port: 2888
    name: server
  - port: 3888
    name: leader-election
  clusterIP: None
  selector:
    app: zk
---
apiVersion: v1
kind: Service
metadata:
  name: zk-cs
  labels:
    app: zk
spec:
  ports:
  - port: 2181
    name: client
  selector:
    app: zk
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  selector:
    matchLabels:
      app: zk
  maxUnavailable: 1
---
apiVersion: apps/v1beta2 # for versions before 1.8.0 use apps/v1beta1
kind: StatefulSet
metadata:
  name: zk
spec:
  selector:
    matchLabels:
      app: zk
  serviceName: zk-hs
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        app: zk
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: "app"
                    operator: In
                    values:
                    - zk
              topologyKey: "kubernetes.io/hostname"
      containers:
      - name: kubernetes-zookeeper
        imagePullPolicy: Always
        image: "gcr.mirrors.ustc.edu.cn/google_containers/kubernetes-zookeeper:1.0-3.4.10"
        resources:
          requests:
            memory: "1Gi"
            cpu: "0.5"
        ports:
        - containerPort: 2181
          name: client
        - containerPort: 2888
          name: server
        - containerPort: 3888
          name: leader-election
        command:
        - sh
        - -c
        - "start-zookeeper \
          --servers=3 \
          --data_dir=/var/lib/zookeeper/data \
          --data_log_dir=/var/lib/zookeeper/data/log \
          --conf_dir=/opt/zookeeper/conf \
          --client_port=2181 \
          --election_port=3888 \
          --server_port=2888 \
          --tick_time=2000 \
          --init_limit=10 \
          --sync_limit=5 \
          --heap=512M \
          --max_client_cnxns=60 \
          --snap_retain_count=3 \
          --purge_interval=12 \
          --max_session_timeout=40000 \
          --min_session_timeout=4000 \
          --log_level=INFO"
        readinessProbe:
          exec:
            command:
            - sh
            - -c
            - "zookeeper-ready 2181"
          initialDelaySeconds: 10
          timeoutSeconds: 5
        livenessProbe:
          exec:
            command:
            - sh
            - -c
            - "zookeeper-ready 2181"
          initialDelaySeconds: 10
          timeoutSeconds: 5
        volumeMounts:
        - name: datadir
          mountPath: /var/lib/zookeeper
      securityContext:
        runAsUser: 1000
        fsGroup: 1000
  volumeClaimTemplates:
  - metadata:
      name: datadir
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
  • This yaml file defines four objects: two Services, one PodDisruptionBudget, and one StatefulSet
  • clusterIP: None: makes zk-hs a headless Service. The StatefulSet pods are addressed by stable DNS names (podname.servicename.namespace.svc.cluster.local.), bypassing the clusterIP virtual-IP load balancing and reaching each pod directly (see the sketch after this list).
  • PodDisruptionBudget: a protection mechanism that guarantees a minimum number of available pods for a given label. For clustered systems such as ZooKeeper or etcd it prevents administrator mistakes or autoscaler scale-downs from dropping the instance count below what the cluster needs and breaking it. In this example, pods labeled app: zk have maxUnavailable set to 1: deleting one pod is fine, but trying to delete another before the first is Running again will be rejected, because that would exceed maxUnavailable: 1;
  • StatefulSet
    • podManagementPolicy: Parallel: start or terminate all pods in parallel; the default is OrderedReady, which starts pods strictly in order 0 ~ N-1
    • spec.affinity: this section controls affinity scheduling; spec.affinity.podAntiAffinity means anti-affinity scheduling
    • requiredDuringSchedulingIgnoredDuringExecution is the hard form: the anti-affinity rule must be honored. The labelSelector that follows says that if any pod on a node already carries the label app: zk, this pod must not be scheduled onto that node.
    • volumeClaimTemplates: a PVC template; a 10Gi PVC is requested automatically for each pod;
    • Note: if no default StorageClass is configured, you must set storageClassName in volumeClaimTemplates to declare which StorageClass the PVCs should be created from.
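
Once the StatefulSet is up, a quick way to verify both the headless-service DNS records and which StorageClass the templated claims were provisioned from (a sketch; the busybox image used for the lookup and the default namespace are assumptions):

# Resolve one of the stable per-pod records created by the headless service zk-hs
$ kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup zk-0.zk-hs.default.svc.cluster.local

# Show which StorageClass the first templated claim was bound from
$ kubectl get pvc datadir-zk-0 -o jsonpath='{.spec.storageClassName}{"\n"}'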

Load the yaml

$ kubectl apply -f zookeeper.yaml

Check whether the PVCs were created and bound

[root@node-131 zookeeper]# kubectl get pvc
NAME            STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
...
datadir-zk-0    Bound     pvc-fc2918cf-dd83-11e7-8e94-005056bc80ed   10Gi       RWO            default        18h
datadir-zk-1    Bound     pvc-fc2ac19e-dd83-11e7-8e94-005056bc80ed   10Gi       RWO            default        18h
datadir-zk-2    Bound     pvc-fc2c2889-dd83-11e7-8e94-005056bc80ed   10Gi       RWO            default        18h
...
  • Note the pattern of the PVC names: volumeClaimTemplate name + StatefulSet name + ordinal

Watch the pods come up

kubectl get pods -w -l app=zk
NAME      READY     STATUS    RESTARTS   AGE
zk-0      0/1       Pending   0          0s
zk-0      0/1       Pending   0         0s
zk-0      0/1       ContainerCreating   0         0s
zk-0      0/1       Running   0         19s
zk-0      1/1       Running   0         40s
zk-1      0/1       Pending   0         0s
zk-1      0/1       Pending   0         0s
zk-1      0/1       ContainerCreating   0         0s
zk-1      0/1       Running   0         18s
zk-1      1/1       Running   0         40s
zk-2      0/1       Pending   0         0s
zk-2      0/1       Pending   0         0s
zk-2      0/1       ContainerCreating   0         0s
zk-2      0/1       Running   0         19s
zk-2      1/1       Running   0         40s

Once zk-0 through zk-2 are all Running, the basic setup is complete:

3. Testing:

Check the hostnames:

[root@node-131 zookeeper]# for i in 0 1 2; do kubectl exec zk-$i -- hostname; done
zk-0
zk-1
zk-2

Check each pod's myid

[root@node-131 zookeeper]# for i in 0 1 2; do echo "myid zk-$i";kubectl exec zk-$i -- cat /var/lib/zookeeper/data/myid; done
myid zk-0
1
myid zk-1
2
myid zk-2
3

Check each pod's FQDN

[root@node-131 zookeeper]# for i in 0 1 2; do kubectl exec zk-$i -- hostname -f; done
zk-0.zk-hs.default.svc.cluster.local.
zk-1.zk-hs.default.svc.cluster.local.
zk-2.zk-hs.default.svc.cluster.local.
  • These are also the DNS records: if a pod is rescheduled, the record name stays the same and only the address it resolves to changes

Check the ZooKeeper configuration

[root@node-131 zookeeper]# kubectl exec zk-0 -- cat /opt/zookeeper/conf/zoo.cfg
#This file was autogenerated DO NOT EDIT
clientPort=2181
dataDir=/var/lib/zookeeper/data
dataLogDir=/var/lib/zookeeper/data/log
tickTime=2000
initLimit=10
syncLimit=5
maxClientCnxns=60
minSessionTimeout=4000
maxSessionTimeout=40000
autopurge.snapRetainCount=3
autopurge.purgeInteval=12
server.1=zk-0.zk-hs.default.svc.cluster.local.:2888:3888
server.2=zk-1.zk-hs.default.svc.cluster.local.:2888:3888
server.3=zk-2.zk-hs.default.svc.cluster.local.:2888:3888
  • server.1, server.2, server.3: the 1, 2, 3 are the myid values of the three servers, and what follows the = is each pod's FQDN (resolvable via DNS);

Test basic ZooKeeper functionality

Write data on one member and read it back from another:

Create the key /hello with value world on zk-0

[root@node-131 zookeeper]# kubectl exec zk-0 zkCli.sh create /hello world 
Connecting to localhost:2181
...

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
Node already exists: /hello

Read the value of /hello on zk-1

[root@node-131 zookeeper]# kubectl exec zk-1 zkCli.sh get /hello
Connecting to localhost:2181
...

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
world
cZxid = 0x100000002
ctime = Sun Dec 10 11:24:11 UTC 2017
mZxid = 0x100000002
mtime = Sun Dec 10 11:24:11 UTC 2017
pZxid = 0x100000002
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 5
numChildren = 0

Test data persistence

Delete the StatefulSet:

[root@node-131 zookeeper]# kubectl delete statefulset zk
statefulset "zk" deleted

Recreate the StatefulSet:

[root@node-131 zookeeper]# kubectl apply -f zookeeper.yaml
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
service "zk-hs" configured
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
service "zk-cs" configured
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
poddisruptionbudget "zk-pdb" configured
statefulset "zk" created
[root@node-131 zookeeper]# kubectl get pods -w -l app=zk
NAME      READY     STATUS              RESTARTS   AGE
zk-0      0/1       ContainerCreating   0          12s
zk-1      0/1       ContainerCreating   0          12s
zk-2      0/1       ContainerCreating   0          12s
zk-2      0/1       Running   0         14s
zk-1      0/1       Running   0         15s
zk-1      1/1       Running   0         29s
zk-2      1/1       Running   0         31s

On zk-2, check whether /hello can still be read:

[root@node-131 zookeeper]# kubectl exec zk-2 zkCli.sh get /hello
Connecting to localhost:2181
...

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
world
cZxid = 0x100000002
ctime = Sun Dec 10 11:24:11 UTC 2017
mZxid = 0x100000002
mtime = Sun Dec 10 11:24:11 UTC 2017
pZxid = 0x100000002
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 5
numChildren = 0

The key can still be read, which shows the data persisted across deletion of the StatefulSet. The three PVCs were dynamically created when the StatefulSet was first created; when the StatefulSet is recreated (rescheduled), the same three PVCs are mounted back at the same path (/var/lib/zookeeper).
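
One way to convince yourself that the very same volumes are reused is to record the PV bound to each claim before deleting the StatefulSet and compare again after recreating it (a quick check; the PV names themselves are cluster specific):

# The PV behind each templated claim; the list should be identical before and after recreation
$ for i in 0 1 2; do kubectl get pvc datadir-zk-$i -o jsonpath='{.spec.volumeName}{"\n"}'; done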


Verify that the configuration persists:

$ kubectl get sts zk -o yaml
...
 command:
        - sh
        - -c
        - "start-zookeeper \
          --servers=3 \
          --data_dir=/var/lib/zookeeper/data \
          --data_log_dir=/var/lib/zookeeper/data/log \
          --conf_dir=/opt/zookeeper/conf \
          --client_port=2181 \
          --election_port=3888 \
          --server_port=2888 \
          --tick_time=2000 \
          --init_limit=10 \
          --sync_limit=5 \
          --heap=512M \
          --max_client_cnxns=60 \
          --snap_retain_count=3 \
          --purge_interval=12 \
          --max_session_timeout=40000 \
          --min_session_timeout=4000 \
          --log_level=INFO"
...
  • The ZooKeeper service is started from the command line, with its parameters passed through the command field.

4. Other features:

Check the logging configuration:

[root@node-131 ~]# kubectl exec zk-0 cat /usr/etc/zookeeper/log4j.properties
zookeeper.root.logger=CONSOLE
zookeeper.console.threshold=INFO
log4j.rootLogger=${zookeeper.root.logger}
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Threshold=${zookeeper.console.threshold}
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L] - %m%n
  • This file is generated by the zkGenConfig.sh script and controls ZooKeeper's logging; logs are rotated by time and size (logrotate)
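
Because everything is written to the console, the logs can be pulled straight from the container with kubectl, for example the last 20 lines of zk-0:

$ kubectl logs zk-0 --tail=20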

Check the security context:

The earlier yaml contains:

securityContext:
  runAsUser: 1000
  fsGroup: 1000

This means the ZooKeeper process runs as a non-privileged user: inside the pod's containers, UID 1000 maps to the zookeeper user and GID 1000 maps to the zookeeper group.

Check the ZooKeeper processes:
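
For example, listing the processes inside zk-0 (a quick check; the UID column should show the zookeeper user rather than root):

$ kubectl exec zk-0 -- ps -elf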

The processes run as the zookeeper user, not as root.

Likewise, by default the PV mounted at the ZooKeeper server's data directory would only be accessible to root. The security context above is what allows the ZooKeeper process to access that data directory:

Check the data directory permissions:

[root@node-131 ~]# kubectl exec -ti zk-0 -- ls -ld /var/lib/zookeeper/data
drwxrwsr-x 4 zookeeper zookeeper 4096 Dec 10 08:27 /var/lib/zookeeper/data

You can see that the owner and group of the data directory are zookeeper. Because fsGroup is set to 1000, the ownership of the pod's PV is automatically set to the zookeeper group, which gives the ZooKeeper process access to its data directory;


Managing the ZooKeeper process:

The ZooKeeper documentation mentions that you will want a supervisory process that watches each ZooKeeper server process in the ensemble and, in a distributed environment, restarts failed processes promptly. In a k8s environment, Kubernetes itself can act as that watchdog instead of an external tool.

Rolling update of the ZooKeeper component

The update strategy was already specified in the earlier yaml file:

  updateStrategy:
    type: RollingUpdate

You can use kubectl patch to update the pods' CPU request (lowering it from 0.5 to 0.3):

kubectl patch sts zk --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value":"0.3"}]'

statefulset "zk" patched

Use kubectl rollout status to watch the update:

[root@node-131 ~]# kubectl rollout status sts/zk
Waiting for 1 pods to be ready...
Waiting for 1 pods to be ready...
waiting for statefulset rolling update to complete 1 pods at revision zk-7c9f9fc76b...
Waiting for 1 pods to be ready...
Waiting for 1 pods to be ready...
waiting for statefulset rolling update to complete 2 pods at revision zk-7c9f9fc76b...
Waiting for 1 pods to be ready...
Waiting for 1 pods to be ready...
statefulset rolling update complete 3 pods at revision zk-7c9f9fc76b...

Check the rollout history:

[root@node-131 ~]# kubectl rollout history sts/zk
statefulsets "zk"
REVISION
1
2

Verify the effect of the update:

[root@node-131 ~]# kubectl get pod zk-0 -o yaml
...
    resources:
      requests:
        cpu: 300m
        memory: 1Gi
...

Rolling back the update:

[root@node-131 ~]# kubectl rollout undo sts/zk
statefulset "zk" rolled back

Verify the rollback result:

[root@node-131 ~]# kubectl rollout history sts/zk
statefulsets "zk"
REVISION
2
3
[root@node-131 ~]# kubectl get pod zk-0 -o yaml
...
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
...

Handling failed processes:

As mentioned earlier, there is no need to deploy an extra external supervisor in a Kubernetes cluster, because Kubernetes has built-in health checks and restart policies. In this StatefulSet the Restart Policy is Always, so as soon as a health check fails the container is restarted (a quick check of the policy is shown below);
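
The restart policy can be confirmed on any of the pods; StatefulSet pod templates always use Always:

$ kubectl get pod zk-0 -o jsonpath='{.spec.restartPolicy}{"\n"}'
# prints: Always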

Let's test it. First look at the zk processes:

[root@node-131 ~]# kubectl exec zk-0 -- ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
zookeep+     1     0  0 08:02 ?        00:00:00 sh -c start-zookeeper --servers=3 --data_dir=/var/lib/zookeeper/data --data_log_dir=/var/lib/zookeeper/data/log --conf_dir=/opt/zookeeper/conf --client_port=2181 --election_port=3888 --server_port=2888 --tick_time=2000 --init_limit=10 --sync_limit=5 --heap=512M --max_client_cnxns=60 --snap_retain_count=3 --purge_interval=12 --max_session_timeout=40000 --min_session_timeout=4000 --log_level=INFO
zookeep+     7     1  0 08:02 ?        00:00:01 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Dzookeeper.log.dir=/var/log/zookeeper -Dzookeeper.root.logger=INFO,CONSOLE -cp /usr/bin/../build/classes:/usr/bin/../build/lib/*.jar:/usr/bin/../share/zookeeper/zookeeper-3.4.10.jar:/usr/bin/../share/zookeeper/slf4j-log4j12-1.6.1.jar:/usr/bin/../share/zookeeper/slf4j-api-1.6.1.jar:/usr/bin/../share/zookeeper/netty-3.10.5.Final.jar:/usr/bin/../share/zookeeper/log4j-1.2.16.jar:/usr/bin/../share/zookeeper/jline-0.9.94.jar:/usr/bin/../src/java/lib/*.jar:/usr/bin/../etc/zookeeper: -Xmx512M -Xms512M -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /usr/bin/../etc/zookeeper/zoo.cfg
  • ZooKeeper runs two processes in total: PID 1 is the parent (the sh wrapper) and PID 7 is its child (the JVM)

Start a watch on the zk pods from one node, and from another node kill the ZooKeeper (java) process inside zk-0:

[root@node-131 ~]#  kubectl exec zk-0 -- pkill java

What is seen on node.132:

[root@node-132 ~]# kubectl get pod -w -l app=zk
NAME      READY     STATUS    RESTARTS   AGE
zk-0      1/1       Running   0          8m
zk-1      1/1       Running   0          9m
zk-2      1/1       Running   0          11m
zk-0      0/1       Error     0         8m
zk-0      0/1       Running   1         8m
zk-0      1/1       Running   1         8m
  • This shows that Kubernetes itself is the watchdog: it continuously monitors the pods' health and restarts them when they fail.

Testing the health check:

Judging cluster health only by whether the process is alive is clearly not enough; there are plenty of cases where a process is still running but unresponsive or otherwise unhealthy. A liveness probe lets you tell Kubernetes that your application is unhealthy and should be restarted;

The liveness probe in this example is:

        livenessProbe:
          exec:
            command:
            - sh
            - -c
            - "zookeeper-ready 2181"

The probe uses a very simple script that sends the four-letter word ruok to test the server's health

#!/usr/bin/env bash
# Copyright 2017 The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# zkOk.sh uses the ruok ZooKeeper four letter word to determine if the instance
# is healthy. The $? variable will be set to 0 if server responds that it is
# healthy, or 1 if the server fails to respond.

OK=$(echo ruok | nc 127.0.0.1 $1)
if [ "$OK" == "imok" ]; then
        exit 0
else
        exit 1
fi

Verify it from inside zk-0 (sending ruok and getting imok back means the server is healthy), as sketched below:
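
A quick sketch of that check (it assumes nc is available in the image, which the readiness script above already relies on):

$ kubectl exec zk-0 -- sh -c 'echo ruok | nc 127.0.0.1 2181'
# a healthy server answers: imok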

Now break things: delete zookeeper-ready on one node and watch the status from another:

[root@node-131 ~]# kubectl exec zk-0 -- rm /opt/zookeeper-3.4.10/bin/zookeeper-ready

Watching from node.132:

[root@node-132 ~]# kubectl get pod -w -l app=zk
NAME      READY     STATUS    RESTARTS   AGE
zk-0      1/1       Running   1          28m
zk-1      1/1       Running   0          29m
zk-2      1/1       Running   0          30m
zk-0      0/1       Running   1         28m
zk-0      0/1       Running   2         29m
zk-0      1/1       Running   2         29m

zk-0 was restarted automatically, even though its zk process itself was fine:

Besides the liveness probe you can also use a readiness probe

        readinessProbe:
          exec:
            command:
            - "zookeeper-ready 2181"
          initialDelaySeconds: 15
          timeoutSeconds: 5
  • liveness: this probe decides when to restart a pod's container; it is about whether the process is alive
  • readiness: this probe decides when to route traffic to the pod. A process may be running fine yet still be loading a large amount of configuration or data; during that phase the pod is alive, but traffic should not be sent to it until loading finishes (see the check after this list)
  • In this example there is little practical difference between the two..
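
The effect of readiness is visible on the client Service: pods that fail their readiness probe are removed from its endpoints, so no client traffic reaches them. A quick check:

$ kubectl get endpoints zk-cs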

Tolerating node failures:

For a three-member zk ensemble, at least 2 zk servers must be running for the ensemble to stay healthy. To guard against operator mistakes and an unfortunate pod distribution, you need sensible placement planning (anti-affinity scheduling) plus a PDB;

First look at the anti-affinity scheduling (podAntiAffinity) used in this example:

Check which nodes the zk pods currently run on:

[root@node-132 ~]# for i in 0 1 2; do kubectl get pod zk-$i --template {{.spec.nodeName}}; echo ""; done
node.132
node.131
node.134

Why are the three pods spread evenly across three nodes? Because of this section:

      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: "app"
                    operator: In
                    values: 
                    - zk
              topologyKey: "kubernetes.io/hostname"
  • requiredDuringSchedulingIgnoredDuringExecution: tells the kube-scheduler never to place two pods matching the app: zk label within the same domain defined by topologyKey;
  • topologyKey kubernetes.io/hostname: defines that domain as a single node;

Next, cordon and drain a node and see how that affects the ensemble:

As shown earlier, the 3 zk pods run on node.131, node.132, and node.134:

Cordon node.134

$ kubectl cordon node.134

Check the PDB status:

[root@node-132 ~]# kubectl get pdb zk-pdb
NAME      MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
zk-pdb    N/A             1                 1                     1d
  • MAX UNAVAILABLE: at most 1 pod of the zk StatefulSet may be unavailable at a time

Watch the status in one terminal:

$ kubectl get pods -w -l app=zk

In another terminal, check which node each pod currently runs on:

[root@node-132 ~]# for i in 0 1 2; do kubectl get pod zk-$i --template {{.spec.nodeName}}; echo ""; done
node.132
node.131
node.134

Use kubectl drain to cordon and drain the node hosting zk-2 (node.134); draining means evicting (terminating) all the pods on that node;

[root@node-132 ~]# kubectl drain $(kubectl get pod zk-2 --template {{.spec.nodeName}}) --ignore-daemonsets --force --delete-local-data
node "node.134" already cordoned
WARNING: Ignoring DaemonSet-managed pods: calico-node-kt5fk, node-exporter-b9wwq; Deleting pods with local storage: elasticsearch-logging-0, monitoring-influxdb-78c4cffd8f-bfjz7, alertmanager-main-1
...
pod "zk-2" evicted
...
node "node.134" drained

The cluster has 4 nodes, so after node.134 is drained, zk-2 should automatically be rescheduled onto node.133:

[root@node-131 ~]# kubectl get pods -w -l app=zk
NAME      READY     STATUS    RESTARTS   AGE
zk-0      1/1       Running   2          58m
zk-1      1/1       Running   0          59m
zk-2      1/1       Running   0          1h
zk-2      1/1       Terminating   0         1h
zk-2      0/1       Terminating   0         1h
zk-2      0/1       Terminating   0         1h
zk-2      0/1       Terminating   0         1h
zk-2      0/1       Pending   0         0s
zk-2      0/1       Pending   0         0s
zk-2      0/1       ContainerCreating   0         0s
zk-2      0/1       Running   0         20s
zk-2      1/1       Running   0         34s

Once zk-2 is Running again:

[root@node-132 ~]# for i in 0 1 2; do kubectl get pod zk-$i --template {{.spec.nodeName}}; echo ""; done
node.132
node.131
node.133

Next... drain the node hosting zk-1:

[root@node-132 ~]# kubectl drain $(kubectl get pod zk-1 --template {{.spec.nodeName}}) --ignore-daemonsets --force --delete-local-data
node "node.131" cordoned
WARNING: Deleting pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: pvpod, pvpod-sc, test-vmdk; Ignoring DaemonSet-managed pods: calico-node-fgcnz, node-exporter-qgfv5; Deleting pods with local storage: elasticsearch-logging-0, alertmanager-main-0, grafana-7d966ff57-lzwqs, prometheus-k8s-1
...
pod "zk-1" evicted
...
node "node.131" drained

After node.131 is drained, the watching terminal shows:

[root@node-131 ~]# kubectl get pods -w -l app=zk
NAME      READY     STATUS    RESTARTS   AGE
zk-0      1/1       Running   2          58m
zk-1      1/1       Running   0          59m
zk-2      1/1       Running   0          1h
zk-2      1/1       Terminating   0         1h
zk-2      0/1       Terminating   0         1h
zk-2      0/1       Terminating   0         1h
zk-2      0/1       Terminating   0         1h
zk-2      0/1       Pending   0         0s
zk-2      0/1       Pending   0         0s
zk-2      0/1       ContainerCreating   0         0s
zk-2      0/1       Running   0         20s
zk-2      1/1       Running   0         34s
zk-1      1/1       Terminating   0         1h
zk-1      0/1       Terminating   0         1h
zk-1      0/1       Terminating   0         1h
zk-1      0/1       Terminating   0         1h
zk-1      0/1       Pending   0         0s
zk-1      0/1       Pending   0         0s
  • zk-1 is now Pending: we only have 4 nodes, 2 of them have been drained, and the remaining 2 already run zk pods, so the podAntiAffinity constraint leaves zk-1 with nowhere to be scheduled.

Check zk-1's events:

[root@node-132 ~]# kubectl describe pod zk-1
...

Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  15s (x13 over 2m)  default-scheduler  No nodes are available that match all of the predicates: MatchInterPodAffinity (2), NodeUnschedulable (2).

Next... try to drain the node hosting zk-2 as well:

[root@node-132 ~]# kubectl drain $(kubectl get pod zk-2 --template {{.spec.nodeName}}) --ignore-daemonsets --force --delete-local-data
node "node.133" cordoned
...
There are pending pods when an error occurred: Cannot evict pod as it would violate the pod's disruption budget.
pod/zk-2
  • Evicting pod/zk-2 is not allowed, so the drain of node.133 fails. The reason is clear: draining node.133 would violate the zk-pdb rule (maxUnavailable is 1, and zk-1 is already Pending, i.e. unavailable); the budget check below shows the same thing.
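
With zk-1 still Pending, the disruption budget has no room left, which can be checked directly (the ALLOWED DISRUPTIONS column should now read 0):

$ kubectl get pdb zk-pdb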

Test whether the ensemble can still serve requests:

[root@node-132 ~]# kubectl exec zk-0 zkCli.sh get /hello
Connecting to localhost:2181
...

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
world
cZxid = 0x100000002
ctime = Sun Dec 10 11:24:11 UTC 2017
mZxid = 0x100000002
mtime = Sun Dec 10 11:24:11 UTC 2017
pZxid = 0x100000002
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 5
numChildren = 0

Bring node.134 back (uncordon it):

[root@node-132 ~]# kubectl uncordon node.134
node "node.134" uncordoned

Looking at the watching terminal, zk-1 now has somewhere to go:

[root@node-131 ~]# kubectl get pods -w -l app=zk
NAME      READY     STATUS    RESTARTS   AGE
zk-0      1/1       Running   2          58m
zk-1      1/1       Running   0          59m
zk-2      1/1       Running   0          1h
zk-2      1/1       Terminating   0         1h
zk-2      0/1       Terminating   0         1h
zk-2      0/1       Terminating   0         1h
zk-2      0/1       Terminating   0         1h
zk-2      0/1       Pending   0         0s
zk-2      0/1       Pending   0         0s
zk-2      0/1       ContainerCreating   0         0s
zk-2      0/1       Running   0         20s
zk-2      1/1       Running   0         34s
zk-1      1/1       Terminating   0         1h
zk-1      0/1       Terminating   0         1h
zk-1      0/1       Terminating   0         1h
zk-1      0/1       Terminating   0         1h
zk-1      0/1       Pending   0         0s
zk-1      0/1       Pending   0         0s
zk-1      0/1       Pending   0         6m
zk-1      0/1       Pending   0         18m
zk-1      0/1       ContainerCreating   0         18m
zk-1      0/1       Running   0         18m
zk-1      1/1       Running   0         18m

Finally, uncordon the nodes that were drained earlier:

[root@node-132 ~]# kubectl get node
NAME       STATUS                     ROLES     AGE       VERSION
node.131   Ready,SchedulingDisabled   <none>    3d        v1.8.0
node.132   Ready                      <none>    3d        v1.8.0
node.133   Ready,SchedulingDisabled   <none>    3d        v1.8.0
node.134   Ready                      <none>    3d        v1.8.0
[root@node-132 ~]# kubectl uncordon  node.131
node "node.131" uncordoned
[root@node-132 ~]# kubectl uncordon  node.133
node "node.133" uncordoned
[root@node-132 ~]# kubectl get node          
NAME       STATUS    ROLES     AGE       VERSION
node.131   Ready     <none>    3d        v1.8.0
node.132   Ready     <none>    3d        v1.8.0
node.133   Ready     <none>    3d        v1.8.0
node.134   Ready     <none>    3d        v1.8.0

And with that, the ZooKeeper exercise is officially done!

Other posts in this series:

  • 01-Environment preparation

  • 02-etcd cluster setup

  • 03-kubectl management tool

  • 04-master setup

  • 05-node setup

  • 06-addon-calico

  • 07-addon-kubedns

  • 08-addon-dashboard

  • 09-addon-kube-prometheus

  • 10-addon-EFK

  • 11-addon-Harbor

  • 12-addon-ingress-nginx

  • 13-addon-traefik

References:

https://kubernetes.io/docs/tutorials/stateful-application/zookeeper/
