A stateful workload is a piece of software that must store and maintain state in order to function. This state must be preserved when the workload is restarted or relocated, which makes stateful workloads much more difficult to operate than stateless ones.
Stateful workloads are also much harder to scale because you can’t simply add and remove replicas without considering their state, as you can with stateless workloads. If the replicas can share state by reading and writing the same files, adding new replicas isn’t a problem. However, for this to be possible, the underlying storage technology must support it. On the other hand, if each replica stores its state in its own files, you’ll need to allocate a separate volume for each replica.
Another notable difference between Deployments and StatefulSets is that, by default, the Pods of a StatefulSet aren’t created concurrently. Instead, they’re created one at a time, similar to a rolling update of a Deployment. When you create a StatefulSet, only the first Pod is created initially. Then the StatefulSet controller waits until the Pod is ready before creating the next one.
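This ordered behavior is controlled by the podManagementPolicy field, which defaults to OrderedReady. If you don’t need the Pods to be created one at a time, you can set the field to Parallel, as in the following minimal fragment (the name example is a placeholder):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example
spec:
  podManagementPolicy: Parallel   # default is OrderedReady
  ...

With Parallel, the controller creates and deletes Pods concurrently and doesn’t wait for each Pod to be ready before acting on the next. Note that this policy affects initial creation and scaling, not rolling updates.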
A StatefulSet can be scaled just like a Deployment. When you scale a StatefulSet up, new Pods and PersistentVolumeClaims are created from their respective templates. When you scale down the StatefulSet, the Pods are deleted, but the PersistentVolumeClaims are either retained or deleted, depending on the policy you configure in the StatefulSet.
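This behavior is configured in the persistentVolumeClaimRetentionPolicy field of the StatefulSet, available in newer Kubernetes versions. A minimal sketch (the name example is a placeholder):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain   # what to do with the PVCs when the StatefulSet is deleted
    whenScaled: Delete    # what to do with the PVC of each Pod removed by a scale-down
  ...

Both fields default to Retain, which preserves the claims and the data stored in the underlying volumes.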
The following manifest, statefulsetwithPV.yaml, defines a StatefulSet whose Pods each mount an Azure disk-backed PersistentVolume:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: statefulset-azuredisk
  labels:
    app: statefulsetwithazuredisk
spec:
  serviceName: statefulset-azuredisk
  replicas: 2
  template:
    metadata:
      labels:
        app: statefulsetwithazuredisk
    spec:
      nodeSelector:
        "kubernetes.io/os": linux
      containers:
        - name: statefulset-azuredisk
          image: mcr.microsoft.com/oss/nginx/nginx:1.19.5
          volumeMounts:
            - name: persistent-storage
              mountPath: /mnt/azuredisk
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: statefulsetwithazuredisk
  volumeClaimTemplates:
    - metadata:
        name: persistent-storage
      spec:
        accessModes: [ "ReadWriteOnce" ]
        # storageClassName: "ultra-disk-sc"
        storageClassName: "managed-premium"
        resources:
          requests:
            storage: 4Gi
Create the StatefulSet by applying the manifest file as follows:
$ kubectl apply -f statefulsetwithPV.yaml
statefulset.apps/statefulset-azuredisk created
As you know, you can examine an object in detail with the kubectl describe command. Here you can see what it displays for the StatefulSet you just created:
$ kubectl describe sts statefulset-azuredisk
Let’s take a closer look at the manifest of the first Pod to see how it compares to Pods created by a ReplicaSet. Use the kubectl get command to print the Pod manifest like so:
$ kubectl get po statefulset-azuredisk-0 -o yaml
Unlike the Pods created by a ReplicaSet, the Pods of a StatefulSet are named differently and each has its own PersistentVolumeClaim (or set of PersistentVolumeClaims if the StatefulSet contains multiple claim templates). As mentioned in the introduction, if a StatefulSet Pod is deleted and replaced by the controller with a new instance, the replica retains the same identity and is associated with the same PersistentVolumeClaim. Try deleting the quiz-1 Pod as follows:
$ kubectl delete po quiz-1
pod "quiz-1" deleted
If you list the Pods again, you’ll see that the Pod created in its place has the same name.
The IP address of the new Pod might be different, but that doesn’t matter because the DNS records have been updated to point to the new address. Clients using the Pod’s hostname to communicate with it won’t notice any difference.
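These per-Pod DNS records come from the headless Service that the StatefulSet references in its serviceName field. For the statefulset-azuredisk example, such a Service might look like the following sketch, which assumes the labels used in the earlier manifest:

apiVersion: v1
kind: Service
metadata:
  name: statefulset-azuredisk
spec:
  clusterIP: None            # headless: DNS resolves directly to the Pod IPs
  selector:
    app: statefulsetwithazuredisk

Each Pod then gets a stable DNS name of the form <pod-name>.<service-name>.<namespace>.svc.cluster.local, such as statefulset-azuredisk-0.statefulset-azuredisk.default.svc.cluster.local.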
In general, this new Pod can be scheduled to any cluster node if the PersistentVolume bound to the PersistentVolumeClaim represents a network-attached volume and not a local volume. If the volume is local to the node, the Pod is always scheduled to this node.
Like the ReplicaSet controller, its StatefulSet counterpart ensures that the number of Pods always matches the desired number specified in the replicas field. However, there’s an important difference in the guarantees that a StatefulSet provides compared to a ReplicaSet. This difference is explained next.
In addition to declarative scaling, StatefulSets also provide declarative updates, similar to Deployments. When you update the Pod template in a StatefulSet, the controller recreates the Pods with the updated template.
You may recall that the Deployment controller can perform the update in two ways, depending on the strategy specified in the Deployment object. You can also specify the update strategy in the updateStrategy field in the spec section of the StatefulSet manifest, but the available strategies are different from those in a Deployment, as you can see in the following table.
Table 15.2 The supported StatefulSet update strategies
Value | Description |
---|---|
RollingUpdate | In this update strategy, the Pods are replaced one by one. The Pod with the highest ordinal number is deleted first and replaced with a Pod created with the new template. When this new Pod is ready, the Pod with the next highest ordinal number is replaced. The process continues until all Pods have been replaced. This is the default strategy. |
OnDelete | The StatefulSet controller waits for each Pod to be manually deleted. When you delete the Pod, the controller replaces it with a Pod created with the new template. With this strategy, you can replace Pods in any order and at any rate. |
The following figure shows how the Pods are updated over time for each update strategy.
The RollingUpdate strategy, which you can find in both Deployments and StatefulSets, is similar between the two objects, but differs in the parameters you can set. The OnDelete strategy lets you replace Pods at your own pace and in any order. It’s different from the Recreate strategy found in Deployments, which automatically deletes and replaces all Pods at once.
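To use the OnDelete strategy instead of the default, you set it in the updateStrategy field. A minimal fragment, shown here for the quiz StatefulSet:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: quiz
spec:
  updateStrategy:
    type: OnDelete
  ...

With this configuration, changing the Pod template has no immediate effect; each Pod is updated only when you delete it manually.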
The RollingUpdate strategy also supports a partition parameter: only Pods with an ordinal number greater than or equal to the partition value are updated when the template changes. In the following manifest, the partition equals the number of replicas, so updating the template doesn’t replace any Pods:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: quiz
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 3
  replicas: 3
  ...
If you’re updating the StatefulSet and the rollout hangs, or if the rollout was successful, but you want to revert to the previous revision, you can use the kubectl rollout undo command, as described in the previous chapter. You’ll update the quiz StatefulSet again in the next section, so please reset it to the previous version as follows:
$ kubectl rollout undo sts quiz
statefulset.apps/quiz rolled back
You can also use the --to-revision option to return to a specific revision. As with Deployments, Pods are rolled back using the update strategy configured in the StatefulSet. If the strategy is RollingUpdate, the Pods are reverted one at a time.