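ReplicationController (RC), Deployment, and DaemonSet are all designed for stateless services. The IP address, hostname, and start/stop order of the Pods they manage are effectively random, and when a managed Pod is recreated its IP and hostname change. A StatefulSet, by contrast, manages stateful workloads such as MySQL or MongoDB clusters.
StatefulSet can be thought of as a variant of Deployment built for stateful services, and it became GA in v1.9. The Pods it manages have fixed names and a defined start/stop order. In a StatefulSet the Pod name doubles as the network identity (hostname), and each Pod also needs persistent storage, typically network-attached so that the volume can follow the Pod when it is rescheduled to another node.
A Deployment is normally paired with an ordinary Service, whereas a StatefulSet is paired with a headless Service. A headless Service differs from an ordinary Service in that it has no Cluster IP: resolving its name returns the addresses of all the Pods (Endpoints) behind it.
Take a redis cluster as an example. The redis containers do not all play the same role (some are masters, some are slaves), so after a redis container is recreated it must keep its original hostname and mount its original volume; only then does each shard stay healthy. Each redis shard also manages different slots and stores different data, so every shard needs its own storage to keep data from being overwritten or mixed up. (Note: in a Deployment, a volume defined in the Pod template is shared by all replicas and holds the same data, because every Pod is created from the same template.)
To make sure each container always mounts the right volume, k8s introduced volumeClaimTemplates.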
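Applications with the following requirements are therefore good candidates for a StatefulSet:
1) Stable, unique network identifiers;
2) Stable, persistent storage;
3) Ordered, graceful deployment and scaling;
4) Ordered, graceful termination and deletion;
5) Ordered rolling updates.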

A complete StatefulSet application consists of three parts: a headless Service, the StatefulSet controller, and volumeClaimTemplates.

Example 1:
Because the PVs in this example are statically provisioned, prepare them first:

[root@k8s-master-dev statefulset]# kubectl get pv
NAME      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM     STORAGECLASS   REASON    AGE
pv01      5Gi        RWO,RWX        Retain           Available                                      27m
pv02      10Gi       RWO,RWX        Retain           Available                                      27m
pv03      15Gi       RWO,RWX        Retain           Available                                      27m
[root@k8s-master-dev statefulset]#

Then define a StatefulSet application:

[root@k8s-master-dev statefulset]# vim statefulset-demo.yaml
[root@k8s-master-dev statefulset]# cat statefulset-demo.yaml
apiVersion: v1
kind: Service
metadata:
  name: ngx-svc
  labels:
    app: ngx-svc
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: ngx-pod

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ngx
spec:
  serviceName: ngx-svc # the headless Service this StatefulSet belongs to
  replicas: 2
  selector:
    matchLabels:
      app: ngx-pod  #has to match .spec.template.metadata.labels
  template:
    metadata:
      labels:
        app: ngx-pod #has to match .spec.selector.matchLabels
    spec:
      containers:
      - name: ngx
        image: nginx:1.15-alpine
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: ngxvol
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: ngxvol
    spec:
      accessModes: ["ReadWriteMany"]
      resources:
        requests:
          storage: 5Gi
[root@k8s-master-dev statefulset]# kubectl apply -f statefulset-demo.yaml
service/ngx-svc created
statefulset.apps/ngx created
[root@k8s-master-dev statefulset]# kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   1d
ngx-svc      ClusterIP   None         <none>        80/TCP    15s
[root@k8s-master-dev statefulset]# kubectl get sts
NAME      DESIRED   CURRENT   AGE
ngx       2         2         30s
[root@k8s-master-dev statefulset]# kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
ngx-0     1/1       Running   0          35s
ngx-1     1/1       Running   0          34s
[root@k8s-master-dev statefulset]#

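Each Pod is named $(statefulset name)-$(ordinal): ngx-0, ngx-1, and so on. Each Pod's FQDN is $(pod name).$(headless service name).$(namespace).svc.cluster.local
Each PVC name also has two parts, $(volumeClaimTemplates.name)-$(pod name), which shows which volumeClaimTemplates entry created the PVC and which Pod it is always mounted on. When a Pod is deleted, its PVC is left untouched and no data is lost. (Deleting a PVC by hand releases its PV; with the Retain reclaim policy used here, the PV then moves to Released and is not rebound automatically.) When the replacement Pod is created, it inherits the original Pod name and mounts the original volume again.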
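
The naming rules above can be sketched in a few lines of shell. This is only an illustration: the function names sts_pod_name, sts_pod_fqdn, and sts_pvc_name are hypothetical helpers, not kubectl features, and the cluster domain is assumed to be the default cluster.local.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names) that reproduce the StatefulSet
# naming conventions described above. They do not talk to a cluster.

# Pod name: <statefulset name>-<ordinal>
sts_pod_name() {   # args: statefulset ordinal
  printf '%s-%s\n' "$1" "$2"
}

# Pod FQDN: <pod name>.<headless service>.<namespace>.svc.<cluster domain>
sts_pod_fqdn() {   # args: statefulset ordinal service namespace [cluster-domain]
  printf '%s.%s.%s.svc.%s\n' "$(sts_pod_name "$1" "$2")" "$3" "$4" "${5:-cluster.local}"
}

# PVC name: <volumeClaimTemplates name>-<pod name>
sts_pvc_name() {   # args: template statefulset ordinal
  printf '%s-%s\n' "$1" "$(sts_pod_name "$2" "$3")"
}

# Values taken from the manifest in this example.
for i in 0 1; do
  echo "$(sts_pod_fqdn ngx "$i" ngx-svc default) $(sts_pvc_name ngxvol ngx "$i")"
done
# prints:
# ngx-0.ngx-svc.default.svc.cluster.local ngxvol-ngx-0
# ngx-1.ngx-svc.default.svc.cluster.local ngxvol-ngx-1
```

These are the same names that appear in the CLAIM column of kubectl get pv and in the nslookup queries in the session that follows.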

[root@k8s-master-dev statefulset]# kubectl get pv
NAME      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                  STORAGECLASS   REASON    AGE
pv01      5Gi        RWO,RWX        Retain           Bound       default/ngxvol-ngx-0                            35m
pv02      10Gi       RWO,RWX        Retain           Bound       default/ngxvol-ngx-1                            35m
pv03      15Gi       RWO,RWX        Retain           Available                                                   35m
[root@k8s-master-dev statefulset]# kubectl get pvc
NAME           STATUS    VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS   AGE
ngxvol-ngx-0   Bound     pv01      5Gi        RWO,RWX                       5m
ngxvol-ngx-1   Bound     pv02      10Gi       RWO,RWX                       5m
[root@k8s-master-dev statefulset]#
[root@k8s-master-dev manifests]# kubectl exec -it ngx-0 -- /bin/sh
/ # nslookup ngx-0.ngx-svc.default.svc.cluster.local
nslookup: can't resolve '(null)': Name does not resolve

Name:      ngx-0.ngx-svc.default.svc.cluster.local
Address 1: 10.244.4.2 ngx-0.ngx-svc.default.svc.cluster.local
/ #
/ #
/ # nslookup ngx-1.ngx-svc.default.svc.cluster.local
nslookup: can't resolve '(null)': Name does not resolve

Name:      ngx-1.ngx-svc.default.svc.cluster.local
Address 1: 10.244.1.101 ngx-1.ngx-svc.default.svc.cluster.local
/ #
/ # [root@k8s-master-dev manifests]#

[root@k8s-master-dev statefulset]# kubectl exec ngx-0 -- ls /usr/share/nginx/html
[root@k8s-master-dev statefulset]# kubectl exec -it ngx-0 -- /bin/sh
/ # echo ngx-0 > /usr/share/nginx/html/index.html
/ # [root@k8s-master-dev statefulset]#
[root@k8s-master-dev statefulset]# kubectl get pods -o wide
NAME      READY     STATUS    RESTARTS   AGE       IP            NODE            NOMINATED NODE
ngx-0     1/1       Running   0          9m        10.244.1.98   k8s-node1-dev   <none>
ngx-1     1/1       Running   0          9m        10.244.2.63   k8s-node2-dev   <none>
[root@k8s-master-dev statefulset]# curl http://10.244.1.98
ngx-0
[root@k8s-master-dev statefulset]# kubectl delete pod/ngx-0
pod "ngx-0" deleted
[root@k8s-master-dev statefulset]# kubectl get pods -o wide
NAME      READY     STATUS    RESTARTS   AGE       IP            NODE            NOMINATED NODE
ngx-0     1/1       Running   0          8s        10.244.1.99   k8s-node1-dev   <none>
ngx-1     1/1       Running   0          10m       10.244.2.63   k8s-node2-dev   <none>
[root@k8s-master-dev statefulset]# curl http://10.244.1.99
ngx-0
[root@k8s-master-dev statefulset]#

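Pods are scaled up and down strictly in order: new Pods are created in ascending ordinal order, and Pods are removed starting from the highest ordinal. Note that scaling down leaves the PVC of the removed Pod in place. For example: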

[root@k8s-master-dev statefulset]# kubectl scale sts ngx --replicas=3
statefulset.apps/ngx scaled
[root@k8s-master-dev statefulset]# kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
ngx-0     1/1       Running   0          8m
ngx-1     1/1       Running   0          18m
ngx-2     1/1       Running   0          3s
[root@k8s-master-dev statefulset]# kubectl get pvc
NAME           STATUS    VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS   AGE
ngxvol-ngx-0   Bound     pv01      5Gi        RWO,RWX                       18m
ngxvol-ngx-1   Bound     pv02      10Gi       RWO,RWX                       18m
ngxvol-ngx-2   Bound     pv03      15Gi       RWO,RWX                       12s
[root@k8s-master-dev statefulset]# kubectl get pv
NAME      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                  STORAGECLASS   REASON    AGE
pv01      5Gi        RWO,RWX        Retain           Bound     default/ngxvol-ngx-0                            49m
pv02      10Gi       RWO,RWX        Retain           Bound     default/ngxvol-ngx-1                            48m
pv03      15Gi       RWO,RWX        Retain           Bound     default/ngxvol-ngx-2                            48m
[root@k8s-master-dev statefulset]# kubectl patch sts ngx -p '{"spec":{"replicas":2}}'
statefulset.apps/ngx patched
[root@k8s-master-dev statefulset]# kubectl get pvc
NAME           STATUS    VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS   AGE
ngxvol-ngx-0   Bound     pv01      5Gi        RWO,RWX                       20m
ngxvol-ngx-1   Bound     pv02      10Gi       RWO,RWX                       20m
ngxvol-ngx-2   Bound     pv03      15Gi       RWO,RWX                       1m
[root@k8s-master-dev statefulset]# kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
ngx-0     1/1       Running   0          9m
ngx-1     1/1       Running   0          20m
[root@k8s-master-dev statefulset]#
[root@k8s-master-dev statefulset]# kubectl delete -f statefulset-demo.yaml
service "ngx-svc" deleted
statefulset.apps "ngx" deleted
[root@k8s-master-dev statefulset]# kubectl delete pvc --all
persistentvolumeclaim "ngxvol-ngx-0" deleted
persistentvolumeclaim "ngxvol-ngx-1" deleted
persistentvolumeclaim "ngxvol-ngx-2" deleted
[root@k8s-master-dev statefulset]# kubectl delete -f ../volumes/pv-vol-demo.yaml
persistentvolume "pv01" deleted
persistentvolume "pv02" deleted
persistentvolume "pv03" deleted
[root@k8s-master-dev statefulset]#
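
The ordering seen in the scaling session above can be sketched as follows. This is a simplified model that assumes the default OrderedReady pod management policy; sts_scale_order is a hypothetical helper, and the real controller additionally waits for each Pod to be Running and Ready (or fully terminated) before moving on to the next one.

```shell
#!/bin/sh
# Hypothetical helper: print the order in which a StatefulSet would create
# or delete Pods when scaling from <current> to <desired> replicas.
sts_scale_order() {   # args: statefulset current desired
  name=$1; i=$2; desired=$3
  if [ "$desired" -gt "$i" ]; then
    # Scale up: lowest missing ordinal first.
    while [ "$i" -lt "$desired" ]; do
      echo "create ${name}-${i}"
      i=$((i + 1))
    done
  else
    # Scale down: highest ordinal first.
    while [ "$i" -gt "$desired" ]; do
      i=$((i - 1))
      echo "delete ${name}-${i}"
    done
  fi
}

sts_scale_order ngx 2 3   # prints: create ngx-2
sts_scale_order ngx 3 2   # prints: delete ngx-2
```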

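Update strategies
In Kubernetes 1.7 and later, the .spec.updateStrategy field lets you configure or disable automated rolling updates of the containers, labels, resource requests/limits, and annotations of the Pods in a StatefulSet.
OnDelete: when .spec.updateStrategy.type is set to OnDelete, the StatefulSet controller does not update the Pods automatically. You must delete each Pod by hand for the controller to create a new one from the updated template.
RollingUpdate: when .spec.updateStrategy.type is set to RollingUpdate, Pods are updated automatically in a rolling fashion. This is the default strategy if .spec.updateStrategy is not specified.
The StatefulSet controller deletes and recreates every Pod in the StatefulSet. It proceeds in the same order as Pod termination (from the largest ordinal to the smallest), updating one Pod at a time, and it waits until the updated Pod is Running and Ready before moving on to the next one.
Partitions: the RollingUpdate strategy can be partitioned by setting .spec.updateStrategy.rollingUpdate.partition. If a partition is specified, then when the StatefulSet's .spec.template is updated, all Pods with an ordinal greater than or equal to the partition are updated.
Pods with an ordinal smaller than the partition are not updated; even if they are deleted, they are recreated from the previous version of the template. If .spec.updateStrategy.rollingUpdate.partition is greater than .spec.replicas, updates to .spec.template are not propagated to any Pod. In most cases you do not need to use a partition.
Example of changing the update strategy and updating the image (with the partition set to 4, only Pods with ordinal 4 or higher would receive the new image):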
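
As a rough sketch of the partition rule described above, the following shell function lists which Pods a rolling update would touch, and in what order. The helper name sts_rolling_update_order is hypothetical and the function only models the ordinal arithmetic; it does not query a cluster.

```shell
#!/bin/sh
# Hypothetical helper: list the Pods a RollingUpdate would replace, in
# update order (highest ordinal first), for a given partition value.
sts_rolling_update_order() {   # args: statefulset replicas partition
  name=$1; i=$2; partition=$3
  # Only ordinals >= partition are updated; walk down from replicas-1.
  while [ "$i" -gt "$partition" ]; do
    i=$((i - 1))
    echo "${name}-${i}"
  done
}

sts_rolling_update_order ngx 3 0   # prints ngx-2, ngx-1, ngx-0 (one per line)
sts_rolling_update_order ngx 3 2   # prints ngx-2 only
sts_rolling_update_order ngx 3 4   # prints nothing: partition exceeds replicas
```

Setting the partition to a high ordinal and then lowering it step by step is one way to stage a rollout on a few Pods before updating the rest.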

kubectl patch sts ngx -p '{"spec":{"updateStrategy":{"rollingUpdate":{"partition":4}}}}'
kubectl set image sts/ngx ngx=nginx:latest