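RC (ReplicationController), Deployment, and DaemonSet are all designed for stateless services. The IP, hostname, and start/stop order of the Pods they manage are arbitrary, and when a managed Pod is rebuilt its IP and hostname change. A StatefulSet, by contrast, manages stateful services, such as MySQL or MongoDB clusters.
StatefulSet can be thought of as a variant of Deployment, and it became GA in v1.9. It exists to solve the problems of stateful services: the Pods it manages have fixed Pod names and a defined start/stop order. In a StatefulSet, the Pod name serves as the network identity (hostname). Persistent storage is normally used as well, usually backed by shared (network) storage so that a rebuilt Pod can reattach its own volume.
A Deployment is fronted by a regular Service, while a StatefulSet is paired with a headless Service. A headless Service differs from a regular Service in that it has no Cluster IP; resolving its name returns the endpoint list (Pod IPs) of all Pods behind it.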
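To make the difference concrete, here is a toy Python model of what a name lookup returns for each kind of Service. It is not a real DNS client: the `Service` class, the `web` Service, and the `10.96.0.10` cluster IP are made up for illustration, and the Pod IPs are sample values.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Service:
    """Toy stand-in for a Kubernetes Service (hypothetical, illustration only)."""
    name: str
    cluster_ip: Optional[str]  # None models `clusterIP: None` (headless)
    pod_ips: List[str] = field(default_factory=list)


def resolve(svc: Service) -> List[str]:
    """Return the addresses a cluster DNS lookup of the Service name would yield."""
    if svc.cluster_ip is None:
        # Headless Service: one address per backing Pod.
        return list(svc.pod_ips)
    # Regular Service: only the virtual cluster IP is returned.
    return [svc.cluster_ip]


if __name__ == "__main__":
    pods = ["10.244.1.98", "10.244.2.63"]
    print(resolve(Service("web", "10.96.0.10", pods)))
    print(resolve(Service("ngx-svc", None, pods)))
```

With a headless Service the client sees the individual Pods, which is what lets each StatefulSet Pod be addressed on its own.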
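Take a Redis cluster as an example. The Redis containers do not all play the same role (some are masters, some are slaves), so after a Redis container is rebuilt it must keep its original hostname and mount its original volume; only then does each shard stay healthy. In addition, each Redis shard manages different slots and stores different data, so each shard must be attached to its own storage to keep data from being overwritten or mixed up. (Note: in a Deployment, a volume defined in the Pod template is shared by all replicas, so they all see the same data, because every Pod is created from the same template.)
To guarantee that each container mounts the correct volume, Kubernetes introduced volumeClaimTemplates.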
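StatefulSet is therefore a good fit for applications that need one or more of the following:
1) Stable, unique network identifiers;
2) Stable, persistent storage;
3) Ordered, graceful deployment and scaling;
4) Ordered, graceful termination and deletion;
5) Ordered rolling updates.
A complete StatefulSet application consists of three parts: a headless Service, the StatefulSet controller, and volumeClaimTemplates.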
Example 1:
The PVs in this example are provisioned statically, so prepare the PVs first:
[root@k8s-master-dev statefulset]# kubectl get pv
NAME   CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
pv01   5Gi        RWO,RWX        Retain           Available                                   27m
pv02   10Gi       RWO,RWX        Retain           Available                                   27m
pv03   15Gi       RWO,RWX        Retain           Available                                   27m
[root@k8s-master-dev statefulset]#
Then define a StatefulSet application:
[root@k8s-master-dev statefulset]# vim statefulset-demo.yaml
[root@k8s-master-dev statefulset]# cat statefulset-demo.yaml
apiVersion: v1
kind: Service
metadata:
  name: ngx-svc
  labels:
    app: ngx-svc
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: ngx-pod
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ngx
spec:
  serviceName: ngx-svc   # the headless Service this StatefulSet belongs to
  replicas: 2
  selector:
    matchLabels:
      app: ngx-pod   # has to match .spec.template.metadata.labels
  template:
    metadata:
      labels:
        app: ngx-pod   # has to match .spec.selector.matchLabels
    spec:
      containers:
      - name: ngx
        image: nginx:1.15-alpine
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: ngxvol
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: ngxvol
    spec:
      accessModes: ["ReadWriteMany"]
      resources:
        requests:
          storage: 5Gi
[root@k8s-master-dev statefulset]# kubectl apply -f statefulset-demo.yaml
service/ngx-svc created
statefulset.apps/ngx created
[root@k8s-master-dev statefulset]# kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   1d
ngx-svc      ClusterIP   None         <none>        80/TCP    15s
[root@k8s-master-dev statefulset]# kubectl get sts
NAME   DESIRED   CURRENT   AGE
ngx    2         2         30s
[root@k8s-master-dev statefulset]# kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
ngx-0   1/1     Running   0          35s
ngx-1   1/1     Running   0          34s
[root@k8s-master-dev statefulset]#
Each Pod is named $(statefulset name)-$(ordinal): ngx-0, ngx-1, and so on. Each Pod's FQDN is $(pod name).$(headless service name).$(namespace).svc.cluster.local
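The naming rule can be written down as a small Python sketch. The helper names `pod_names` and `pod_fqdn` are invented for this illustration, and `cluster.local` is assumed as the cluster domain (the default, and the one that appears in the lookups in this example).

```python
from typing import List


def pod_names(sts_name: str, replicas: int) -> List[str]:
    """Pod names are the StatefulSet name plus an ordinal starting at 0."""
    return [f"{sts_name}-{ordinal}" for ordinal in range(replicas)]


def pod_fqdn(pod_name: str, service: str, namespace: str = "default",
             cluster_domain: str = "cluster.local") -> str:
    """Build the per-Pod DNS name published through the headless Service."""
    return f"{pod_name}.{service}.{namespace}.svc.{cluster_domain}"


if __name__ == "__main__":
    for name in pod_names("ngx", 2):
        print(pod_fqdn(name, "ngx-svc"))
```

For the `ngx` StatefulSet with 2 replicas this yields the names used with nslookup in this example.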
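Each PVC name has two parts, $(volumeClaimTemplates.name)-$(pod name), identifying the volumeClaimTemplates entry that created the PVC and the Pod it is always mounted into. When the original Pod is deleted, the PVC is left unchanged and no data is lost. (Deleting the PVC manually releases the PV; with the Retain reclaim policy used here, the PV then moves to the Released state and must be cleaned up by an administrator before it can be bound again.) When the replacement Pod is created, it inherits the original Pod name and mounts the original volume again.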
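The following toy Python model illustrates why data survives a Pod being recreated: the claim is keyed by a name derived from the Pod name, not by the Pod instance. `pvc_name` and `ClaimStore` are hypothetical helpers written for this sketch; real PVCs are API objects managed by Kubernetes.

```python
from typing import Dict


def pvc_name(template_name: str, pod_name: str) -> str:
    """PVC name = volumeClaimTemplates entry name + '-' + Pod name."""
    return f"{template_name}-{pod_name}"


class ClaimStore:
    """Toy model in which claims outlive the Pods that mount them."""

    def __init__(self) -> None:
        self.claims: Dict[str, Dict[str, str]] = {}  # PVC name -> files

    def mount(self, template_name: str, pod_name: str) -> Dict[str, str]:
        # Reuse the existing claim if there is one, otherwise create it.
        return self.claims.setdefault(pvc_name(template_name, pod_name), {})


if __name__ == "__main__":
    store = ClaimStore()
    volume = store.mount("ngxvol", "ngx-0")
    volume["index.html"] = "ngx-0"            # write through the first Pod
    del volume                                # the Pod is deleted
    print(store.mount("ngxvol", "ngx-0"))     # the recreated Pod sees the file
```

Because the replacement Pod gets the same name, it resolves to the same claim and therefore the same data.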
[root@k8s-master-dev statefulset]# kubectl get pv
NAME   CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                  STORAGECLASS   REASON   AGE
pv01   5Gi        RWO,RWX        Retain           Bound       default/ngxvol-ngx-0                           35m
pv02   10Gi       RWO,RWX        Retain           Bound       default/ngxvol-ngx-1                           35m
pv03   15Gi       RWO,RWX        Retain           Available                                                  35m
[root@k8s-master-dev statefulset]# kubectl get pvc
NAME           STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
ngxvol-ngx-0   Bound    pv01     5Gi        RWO,RWX                       5m
ngxvol-ngx-1   Bound    pv02     10Gi       RWO,RWX                       5m
[root@k8s-master-dev statefulset]#
[root@k8s-master-dev manifests]# kubectl exec -it ngx-0 -- /bin/sh
/ # nslookup ngx-0.ngx-svc.default.svc.cluster.local
nslookup: can't resolve '(null)': Name does not resolve
Name: ngx-0.ngx-svc.default.svc.cluster.local
Address 1: 10.244.4.2 ngx-0.ngx-svc.default.svc.cluster.local
/ #
/ #
/ # nslookup ngx-1.ngx-svc.default.svc.cluster.local
nslookup: can't resolve '(null)': Name does not resolve
Name: ngx-1.ngx-svc.default.svc.cluster.local
Address 1: 10.244.1.101 ngx-1.ngx-svc.default.svc.cluster.local
/ #
/ # [root@k8s-master-dev manifests]#
[root@k8s-master-dev statefulset]# kubectl exec ngx-0 -- ls /usr/share/nginx/html
[root@k8s-master-dev statefulset]# kubectl exec -it ngx-0 -- /bin/sh
/ # echo ngx-0 > /usr/share/nginx/html/index.html
/ # [root@k8s-master-dev statefulset]#
[root@k8s-master-dev statefulset]# kubectl get pods -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP            NODE            NOMINATED NODE
ngx-0   1/1     Running   0          9m    10.244.1.98   k8s-node1-dev   <none>
ngx-1   1/1     Running   0          9m    10.244.2.63   k8s-node2-dev   <none>
[root@k8s-master-dev statefulset]# curl http://10.244.1.98
ngx-0
[root@k8s-master-dev statefulset]# kubectl delete pod/ngx-0
pod "ngx-0" deleted
[root@k8s-master-dev statefulset]# kubectl get pods -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP            NODE            NOMINATED NODE
ngx-0   1/1     Running   0          8s    10.244.1.99   k8s-node1-dev   <none>
ngx-1   1/1     Running   0          10m   10.244.2.63   k8s-node2-dev   <none>
[root@k8s-master-dev statefulset]# curl http://10.244.1.99
ngx-0
[root@k8s-master-dev statefulset]#
Pods are scaled up and scaled down in order, as shown below:
[root@k8s-master-dev statefulset]# kubectl scale sts ngx --replicas=3
statefulset.apps/ngx scaled
[root@k8s-master-dev statefulset]# kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
ngx-0   1/1     Running   0          8m
ngx-1   1/1     Running   0          18m
ngx-2   1/1     Running   0          3s
[root@k8s-master-dev statefulset]# kubectl get pvc
NAME           STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
ngxvol-ngx-0   Bound    pv01     5Gi        RWO,RWX                       18m
ngxvol-ngx-1   Bound    pv02     10Gi       RWO,RWX                       18m
ngxvol-ngx-2   Bound    pv03     15Gi       RWO,RWX                       12s
[root@k8s-master-dev statefulset]# kubectl get pv
NAME   CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                  STORAGECLASS   REASON   AGE
pv01   5Gi        RWO,RWX        Retain           Bound    default/ngxvol-ngx-0                           49m
pv02   10Gi       RWO,RWX        Retain           Bound    default/ngxvol-ngx-1                           48m
pv03   15Gi       RWO,RWX        Retain           Bound    default/ngxvol-ngx-2                           48m
[root@k8s-master-dev statefulset]# kubectl patch sts ngx -p '{"spec":{"replicas":2}}'
statefulset.apps/ngx patched
[root@k8s-master-dev statefulset]# kubectl get pvc
NAME           STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
ngxvol-ngx-0   Bound    pv01     5Gi        RWO,RWX                       20m
ngxvol-ngx-1   Bound    pv02     10Gi       RWO,RWX                       20m
ngxvol-ngx-2   Bound    pv03     15Gi       RWO,RWX                       1m
[root@k8s-master-dev statefulset]# kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
ngx-0   1/1     Running   0          9m
ngx-1   1/1     Running   0          20m
[root@k8s-master-dev statefulset]#
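The ordering seen in the scaling transcript above can be summarized with a short Python sketch. It assumes the default ordered Pod management behavior; `scale_plan` is a made-up helper, not part of any Kubernetes client library. Note that scaling down removes the Pod but leaves its PVC in place, as the `kubectl get pvc` output after the patch shows.

```python
from typing import List, Tuple


def scale_plan(sts_name: str, current: int, desired: int) -> List[Tuple[str, str]]:
    """List the Pod operations, in order, for changing the replica count."""
    if desired >= current:
        # Scale up: create the missing ordinals from lowest to highest.
        return [("create", f"{sts_name}-{i}") for i in range(current, desired)]
    # Scale down: delete from the highest ordinal downwards.
    return [("delete", f"{sts_name}-{i}")
            for i in range(current - 1, desired - 1, -1)]


if __name__ == "__main__":
    print(scale_plan("ngx", 2, 3))
    print(scale_plan("ngx", 3, 2))
```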
[root@k8s-master-dev statefulset]# kubectl delete -f statefulset-demo.yaml
service "ngx-svc" deleted
statefulset.apps "ngx" deleted
[root@k8s-master-dev statefulset]# kubectl delete pvc --all
persistentvolumeclaim "ngxvol-ngx-0" deleted
persistentvolumeclaim "ngxvol-ngx-1" deleted
persistentvolumeclaim "ngxvol-ngx-2" deleted
[root@k8s-master-dev statefulset]# kubectl delete -f ../volumes/pv-vol-demo.yaml
persistentvolume "pv01" deleted
persistentvolume "pv02" deleted
persistentvolume "pv03" deleted
[root@k8s-master-dev statefulset]#
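Update strategies
In Kubernetes 1.7 and later, the .spec.updateStrategy field lets you configure or disable automated rolling updates for the containers, labels, resource requests/limits, and annotations of the Pods in a StatefulSet.
OnDelete: when .spec.updateStrategy.type is set to OnDelete, the StatefulSet controller does not automatically update the Pods in the StatefulSet. Users must delete Pods manually to make the controller create new Pods.
RollingUpdate: when .spec.updateStrategy.type is set to RollingUpdate, Pods are updated automatically in a rolling fashion. This is the default strategy when .spec.updateStrategy is not specified.
The StatefulSet controller deletes and recreates each Pod in the StatefulSet. It proceeds in the same order as Pod termination (from the largest ordinal to the smallest), updating one Pod at a time, and it waits until the updated Pod is Running and Ready before updating the next one.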
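A rough simulation of this loop in Python is shown below. It is only a sketch of the described behavior, not the controller's actual code: `rolling_update`, the `pods` dictionary, and the `wait_ready` callback are all invented for illustration.

```python
from typing import Callable, Dict, List


def ordinal(pod_name: str) -> int:
    """Extract the trailing ordinal from a name such as 'ngx-2'."""
    return int(pod_name.rsplit("-", 1)[1])


def rolling_update(pods: Dict[str, str], new_image: str,
                   wait_ready: Callable[[str], bool]) -> List[str]:
    """Update Pods one at a time, from the largest ordinal to the smallest."""
    touched = []
    for name in sorted(pods, key=ordinal, reverse=True):
        pods[name] = new_image       # delete and recreate with the new spec
        touched.append(name)
        if not wait_ready(name):     # not Running and Ready: the rollout stalls
            break
    return touched


if __name__ == "__main__":
    pods = {f"ngx-{i}": "nginx:1.15-alpine" for i in range(3)}
    print(rolling_update(pods, "nginx:latest", lambda name: True))
```

If a Pod never becomes ready, the loop stops there and the lower ordinals keep running the old image.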
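Partitions: the RollingUpdate strategy can be partitioned by specifying .spec.updateStrategy.rollingUpdate.partition. If a partition is specified, all Pods with an ordinal greater than or equal to the partition are updated when the StatefulSet's .spec.template is updated.
Pods with an ordinal less than the partition are not updated; even if they are deleted, they are recreated at the previous version. If the StatefulSet's .spec.updateStrategy.rollingUpdate.partition is greater than its .spec.replicas, updates to its .spec.template are not propagated to any Pod. In most cases you do not need to use a partition.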
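The partition rule reduces to a single comparison on the ordinal. The Python sketch below uses a hypothetical helper, `pods_to_update`, to list which Pods a template change would reach and in what order.

```python
from typing import List


def pods_to_update(sts_name: str, replicas: int, partition: int = 0) -> List[str]:
    """Pods with ordinal >= partition are updated, highest ordinal first."""
    return [f"{sts_name}-{i}"
            for i in range(replicas - 1, -1, -1)
            if i >= partition]


if __name__ == "__main__":
    print(pods_to_update("ngx", 3))               # no partition: every Pod
    print(pods_to_update("ngx", 3, partition=2))  # only the last Pod
    print(pods_to_update("ngx", 2, partition=4))  # partition > replicas: none
```

Setting the partition to the highest ordinal first and then lowering it step by step is one way to roll a change out gradually.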
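Example of changing the update strategy and updating the image (if the StatefulSet still has only 2 replicas, a partition of 4 means the new image is not rolled out to any Pod until the partition is lowered):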
kubectl patch sts ngx -p '{"spec":{"updateStrategy":{"rollingUpdate":{"partition":4}}}}'
kubectl set image sts/ngx ngx=nginx:latest
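If the patch needs to be generated rather than typed by hand, a small Python helper can build the JSON body and quote it for the shell. This is only a sketch: `partition_patch` and `patch_command` are invented names, and the script only prints the command; it does not talk to a cluster.

```python
import json
import shlex


def partition_patch(partition: int) -> str:
    """Build the compact JSON patch body that sets the rolling-update partition."""
    body = {"spec": {"updateStrategy": {"rollingUpdate": {"partition": partition}}}}
    return json.dumps(body, separators=(",", ":"))


def patch_command(sts_name: str, partition: int) -> str:
    """Return the kubectl command line, with the JSON safely shell-quoted."""
    return f"kubectl patch sts {sts_name} -p {shlex.quote(partition_patch(partition))}"


if __name__ == "__main__":
    print(patch_command("ngx", 4))
```

Generating the body with `json.dumps` avoids hand-written quoting mistakes when the partition value comes from a variable.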