Kubernetes Release Cycle
Patch Releases
Version Skew Policy
Upgrading clusters
Upgrading kubeadm clusters
Container Runtimes
Migrate Docker Engine nodes from dockershim to cri-dockerd
Well-Known Labels, Annotations and Taints
The cluster is currently running Kubernetes 1.19.10 (CRI: Docker via dockershim; both Docker and the kubelet use the cgroupfs cgroup driver; OS: Ubuntu 20.04, kernel 5.4.0), and the plan is to upgrade to 1.25.12. Current version information:
# kubectl get nodes -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node1 Ready master 11d v1.19.10 192.168.111.10 <none> Ubuntu 20.04.4 LTS 5.4.0-153-generic docker://20.10.7
node2 Ready master 11d v1.19.10 192.168.111.11 <none> Ubuntu 20.04.4 LTS 5.4.0-153-generic docker://20.10.7
node3 Ready master 11d v1.19.10 192.168.111.12 <none> Ubuntu 20.04.4 LTS 5.4.0-153-generic docker://20.10.7
node4 Ready worker 11d v1.19.10 192.168.111.21 <none> Ubuntu 20.04.4 LTS 5.4.0-153-generic docker://20.10.7
node5 Ready worker 11d v1.19.10 192.168.111.22 <none> Ubuntu 20.04.4 LTS 5.4.0-153-generic docker://20.10.7
node6 Ready worker 11d v1.19.10 192.168.111.23 <none> Ubuntu 20.04.4 LTS 5.4.0-153-generic docker://20.10.7
# kubectl describe nodes node1 node2 node3 | grep Taint
Taints: node-role.kubernetes.io/master:NoSchedule
Taints: node-role.kubernetes.io/master:NoSchedule
Taints: node-role.kubernetes.io/master:NoSchedule
# kubectl get nodes -l node-role.kubernetes.io/master
NAME STATUS ROLES AGE VERSION
node1 Ready master 11d v1.19.10
node2 Ready master 11d v1.19.10
node3 Ready master 11d v1.19.10
# docker info | grep cgroup
Cgroup Driver: cgroupfs
# cat /var/lib/kubelet/config.yaml | grep cgroup
cgroupDriver: cgroupfs
kubeadm does not support skipping minor versions, so the upgrade has to proceed one minor version at a time. Run kubeadm upgrade plan first to check.
With no version skipped, the plan comes out correctly:
# ./kubeadm-1.20.12 upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.19.10
[upgrade/versions] kubeadm version: v1.20.12
W0726 14:27:54.999693 69390 version.go:102] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable.txt": Get "https://storage.googleapis.com/kubernetes-release/release/stable.txt": dial tcp: lookup storage.googleapis.com on 192.168.111.10:53: server misbehaving
W0726 14:27:54.999891 69390 version.go:103] falling back to the local client version: v1.20.12
[upgrade/versions] Latest stable version: v1.20.12
[upgrade/versions] Latest stable version: v1.20.12
W0726 14:27:55.793421 69390 version.go:102] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.19.txt": Get "https://storage.googleapis.com/kubernetes-release/release/stable-1.19.txt": dial tcp: lookup storage.googleapis.com on 192.168.111.10:53: server misbehaving
W0726 14:27:55.793566 69390 version.go:103] falling back to the local client version: v1.20.12
[upgrade/versions] Latest version in the v1.19 series: v1.20.12
[upgrade/versions] Latest version in the v1.19 series: v1.20.12
Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT CURRENT AVAILABLE
kubelet 6 x v1.19.10 v1.20.12
Upgrade to the latest version in the v1.19 series:
COMPONENT CURRENT AVAILABLE
kube-apiserver v1.19.10 v1.20.12
kube-controller-manager v1.19.10 v1.20.12
kube-scheduler v1.19.10 v1.20.12
kube-proxy v1.19.10 v1.20.12
CoreDNS 1.7.0 1.7.0
etcd 3.4.13-0 3.4.13-0
You can now apply the upgrade by executing the following command:
kubeadm upgrade apply v1.20.12
_____________________________________________________________________
The table below shows the current state of component configs as understood by this version of kubeadm.
Configs that have a "yes" mark in the "MANUAL UPGRADE REQUIRED" column require manual config upgrade or
resetting to kubeadm defaults before a successful upgrade can be performed. The version to manually
upgrade to is denoted in the "PREFERRED VERSION" column.
API GROUP CURRENT VERSION PREFERRED VERSION MANUAL UPGRADE REQUIRED
kubeproxy.config.k8s.io v1alpha1 v1alpha1 no
kubelet.config.k8s.io v1beta1 v1beta1 no
_____________________________________________________________________
Skipping one version in between fails, and the error message states the reason plainly:
# ./kubeadm-1.21.12 upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[upgrade/config] FATAL: this version of kubeadm only supports deploying clusters with the control plane version >= 1.20.0. Current version: v1.19.10
To see the stack trace of this error execute with --v=5 or higher
1.24 removed dockershim, so the CRI has to be dealt with too...
# ./kubeadm-1.24.12 upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0726 14:31:18.115581 73932 initconfiguration.go:120] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/var/run/dockershim.sock". Please update your configuration!
[upgrade/config] FATAL: this version of kubeadm only supports deploying clusters with the control plane version >= 1.23.0. Current version: v1.19.10
To see the stack trace of this error execute with --v=5 or higher
For various reasons this cluster uses a local private registry, so some files need to be prepared before upgrading: the images, plus the kubectl, kubelet, and kubeadm binaries. The binaries can be downloaded directly from Download Kubernetes or installed through a package manager. The required images can be listed with kubeadm config images list:
# tree
.
├── k8s-1.20.12
│ ├── kubeadm-1.20.12
│ ├── kubectl-1.20.12
│ └── kubelet-1.20.12
├── k8s-1.21.12
│ ├── kubeadm-1.21.12
│ ├── kubectl-1.21.12
│ └── kubelet-1.21.12
├── k8s-1.22.12
│ ├── kubeadm-1.22.12
│ ├── kubectl-1.22.12
│ └── kubelet-1.22.12
├── k8s-1.23.12
│ ├── kubeadm-1.23.12
│ ├── kubectl-1.23.12
│ └── kubelet-1.23.12
├── k8s-1.24.12
│ ├── kubeadm-1.24.12
│ ├── kubectl-1.24.12
│ └── kubelet-1.24.12
└── k8s-1.25.12
├── kubeadm-1.25.12
├── kubectl-1.25.12
└── kubelet-1.25.12
6 directories, 18 files
# ./kubeadm-1.20.12 config images list --kubernetes-version v1.20.12
k8s.gcr.io/kube-apiserver:v1.20.12
k8s.gcr.io/kube-controller-manager:v1.20.12
k8s.gcr.io/kube-scheduler:v1.20.12
k8s.gcr.io/kube-proxy:v1.20.12
k8s.gcr.io/pause:3.2
k8s.gcr.io/etcd:3.4.13-0
k8s.gcr.io/coredns:1.7.0
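Since the registry is private, these images have to be mirrored into it beforehand. A minimal sketch, assuming a host that can reach both the upstream registry and the private one (the 172.30.3.150/k8s prefix is the same one used for the pause image later in this post):
for img in $(./kubeadm-1.20.12 config images list --kubernetes-version v1.20.12); do
  docker pull "$img"                          # pull from the upstream registry
  docker tag "$img" "172.30.3.150/k8s/$img"   # retag under the private registry prefix
  docker push "172.30.3.150/k8s/$img"         # push into the private registry
done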
At this point, assume the required images and binaries are ready. The private registry address is set via the imageRepository: field in kubectl -n kube-system edit configmaps kubeadm-config. The upgrade splits into control-plane and worker nodes: on the first control-plane node run kubeadm upgrade apply <version>, on the remaining control-plane nodes run kubeadm upgrade node, and on workers run kubeadm upgrade node as well. Take care not to mix up the kubeadm versions along the way.
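For reference, the setting can be checked without opening the editor; the value shown here is illustrative, matching the registry prefix used elsewhere in this post:
# kubectl -n kube-system get cm kubeadm-config -o yaml | grep imageRepository
    imageRepository: 172.30.3.150/k8s/k8s.gcr.io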
Upgrading a control-plane node updates kube-apiserver, kube-controller-manager, kube-scheduler, and etcd on that node, together with their certificates, as well as coredns and kube-proxy. It also adds the node-role.kubernetes.io/control-plane label to the node, backs up the etcd data and /etc/kubernetes/manifests/ to /etc/kubernetes/tmp/, and updates the kubelet configuration.
### Upgrade node1 (the first control-plane node)
# ./kubeadm-1.20.12 upgrade apply v1.20.12
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade/version] You have chosen to change the cluster version to "v1.20.12"
[upgrade/versions] Cluster version: v1.19.10
[upgrade/versions] kubeadm version: v1.20.12
[upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]: y #### interactive step: enter y to confirm the upgrade
[upgrade/prepull] Pulling images required for setting up a Kubernetes cluster
[upgrade/prepull] This might take a minute or two, depending on the speed of your internet connection
[upgrade/prepull] You can also perform this action in beforehand using 'kubeadm config images pull'
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.20.12"...
Static pod: kube-apiserver-node1 hash: 86f9d5eb415c02995e243dab09764902
Static pod: kube-controller-manager-node1 hash: e77fd5078bafd951d87c970393d28284
Static pod: kube-scheduler-node1 hash: 62dcf2eef35b837428c13af11ba57cf5
[upgrade/etcd] Upgrading to TLS for etcd
Static pod: etcd-node1 hash: 88a10dbea90896953d5bedb7da1eccce
......
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.20.12". Enjoy!
[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
### The control-plane nodes now carry the node-role.kubernetes.io/control-plane label
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
node1 Ready control-plane,master 11d v1.19.10
node2 Ready control-plane,master 11d v1.19.10
node3 Ready control-plane,master 11d v1.19.10
node4 Ready worker 11d v1.19.10
node5 Ready worker 11d v1.19.10
node6 Ready worker 11d v1.19.10
### Backups of the original data
# tree /etc/kubernetes/tmp/
/etc/kubernetes/tmp/
├── kubeadm-backup-etcd-2023-07-26-15-01-26
│ └── etcd
│ └── member
│ ├── snap
│ │ ├── 0000000000000005-00000000003ff0df.snap
│ │ ├── 0000000000000005-00000000004017f0.snap
│ │ ├── 0000000000000005-0000000000403f01.snap
│ │ ├── 0000000000000005-0000000000406612.snap
│ │ ├── 0000000000000005-0000000000408d23.snap
│ │ └── db
│ └── wal
│ ├── 0000000000000035-00000000003b4801.wal
│ ├── 0000000000000036-00000000003c6ac4.wal
│ ├── 0000000000000037-00000000003d8c31.wal
│ ├── 0000000000000038-00000000003eae65.wal
│ ├── 0000000000000039-00000000003fd1a2.wal
│ ├── 0.tmp
│ └── 1.tmp
└── kubeadm-backup-manifests-2023-07-26-15-01-26
├── etcd.yaml
├── kube-apiserver.yaml
├── kube-controller-manager.yaml
└── kube-scheduler.yaml
6 directories, 17 files
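If an upgrade step goes wrong, these backups give a crude manual rollback path for the static pod manifests (a sketch only, assuming the backup directory shown above; restoring the etcd data is more involved and not covered here):
# cp /etc/kubernetes/tmp/kubeadm-backup-manifests-2023-07-26-15-01-26/*.yaml /etc/kubernetes/manifests/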
Drain the node, replace the kubeadm, kubectl, and kubelet binaries, then restart the kubelet:
# kubectl drain node1 --ignore-daemonsets
# mv kubeadm-1.20.12 `which kubeadm`
# mv kubectl-1.20.12 `which kubectl`
# mv kubelet-1.20.12 `which kubelet`
# systemctl restart kubelet
# kubectl uncordon node1
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
node1 Ready control-plane,master 11d v1.20.12
node2 Ready control-plane,master 11d v1.19.10
node3 Ready control-plane,master 11d v1.19.10
node4 Ready worker 11d v1.19.10
node5 Ready worker 11d v1.19.10
node6 Ready worker 11d v1.19.10
Upgrade the control plane on node2:
# ./kubeadm-1.20.12 upgrade node
# kubectl drain node2 --ignore-daemonsets
# mv kubeadm-1.20.12 `which kubeadm`
# mv kubectl-1.20.12 `which kubectl`
# mv kubelet-1.20.12 `which kubelet`
# systemctl restart kubelet
# kubectl uncordon node2
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
node1 Ready control-plane,master 11d v1.20.12
node2 Ready control-plane,master 11d v1.20.12
node3 Ready control-plane,master 11d v1.19.10
node4 Ready worker 11d v1.19.10
node5 Ready worker 11d v1.19.10
node6 Ready worker 11d v1.19.10
Upgrade the control plane on node3:
# ./kubeadm-1.20.12 upgrade node
# kubectl drain node3 --ignore-daemonsets
# mv kubeadm-1.20.12 `which kubeadm`
# mv kubectl-1.20.12 `which kubectl`
# mv kubelet-1.20.12 `which kubelet`
# systemctl restart kubelet
# kubectl uncordon node3
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
node1 Ready control-plane,master 11d v1.20.12
node2 Ready control-plane,master 11d v1.20.12
node3 Ready control-plane,master 11d v1.20.12
node4 Ready worker 11d v1.19.10
node5 Ready worker 11d v1.19.10
node6 Ready worker 11d v1.19.10
Workers are best upgraded one at a time. The worker upgrade is much simpler: it only updates the kubelet configuration.
# ./kubeadm-1.20.12 upgrade node
[upgrade] Reading configuration from the cluster...
[upgrade] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks
[preflight] Skipping prepull. Not a control plane node.
[upgrade] Skipping phase. Not a control plane node.
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[upgrade] The configuration for this node was successfully updated!
[upgrade] Now you should go ahead and upgrade the kubelet package using your package manager.
# kubectl drain node4 --ignore-daemonsets
# mv kubeadm-1.20.12 `which kubeadm`
# mv kubelet-1.20.12 `which kubelet`
# systemctl restart kubelet
# kubectl uncordon node4
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
node1 Ready control-plane,master 11d v1.20.12
node2 Ready control-plane,master 11d v1.20.12
node3 Ready control-plane,master 11d v1.20.12
node4 Ready worker 11d v1.20.12
node5 Ready worker 11d v1.19.10
node6 Ready worker 11d v1.19.10
Repeat the same steps on node5 and node6:
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
node1 Ready control-plane,master 11d v1.20.12
node2 Ready control-plane,master 11d v1.20.12
node3 Ready control-plane,master 11d v1.20.12
node4 Ready worker 11d v1.20.12
node5 Ready worker 11d v1.20.12
node6 Ready worker 11d v1.20.12
1.20.12 -> 1.21.12: follow the same procedure as 1.19.10 -> 1.20.12.
1.21.12 -> 1.22.12: follow the same procedure as 1.19.10 -> 1.20.12.
1.22.12 -> 1.23.12: follow the same procedure as 1.19.10 -> 1.20.12.
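Each hop repeats the same routine. Condensed for the first control-plane node, using the 1.20.12 -> 1.21.12 hop as an example (the remaining control-plane and worker nodes then run kubeadm upgrade node plus the same binary swap, exactly as shown above):
# ./kubeadm-1.21.12 upgrade plan
# ./kubeadm-1.21.12 upgrade apply v1.21.12
# kubectl drain node1 --ignore-daemonsets
# mv kubeadm-1.21.12 `which kubeadm`
# mv kubectl-1.21.12 `which kubectl`
# mv kubelet-1.21.12 `which kubelet`
# systemctl restart kubelet
# kubectl uncordon node1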
1.24 removed dockershim support, but to minimize the impact Docker stays on as the container runtime here, with cri-dockerd replacing dockershim.
# ./kubeadm-1.24.12 upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0726 16:36:49.267722 260221 initconfiguration.go:120] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/var/run/dockershim.sock". Please update your configuration!
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.23.12
[upgrade/versions] kubeadm version: v1.24.12
......
cri-dockerd defaults to the pause image "registry.k8s.io/pause:3.6", so after installing it, remember to edit /lib/systemd/system/cri-docker.service and add the --pod-infra-container-image flag pointing at the pause image in the private registry.
install cri-dockerd
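One way to install it on Ubuntu 20.04 is from a release package (the file name and version below are illustrative; check the cri-dockerd releases page for the actual artifact):
# dpkg -i cri-dockerd_0.3.1.3-0.ubuntu-focal_amd64.deb
# systemctl enable --now cri-docker.socket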
# cat /lib/systemd/system/cri-docker.service | grep cri-dockerd
ExecStart=/usr/bin/cri-dockerd --container-runtime-endpoint fd:// --pod-infra-container-image=172.30.3.150/k8s/k8s.gcr.io/pause:3.8
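After editing the unit file, reload systemd and restart the service so the new flag takes effect:
# systemctl daemon-reload
# systemctl restart cri-docker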
Migrate Docker Engine nodes from dockershim to cri-dockerd
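A condensed per-node sketch of that migration, following the referenced guide (node1 shown; repeat for each node):
# kubectl drain node1 --ignore-daemonsets
# vi /var/lib/kubelet/kubeadm-flags.env    ### set --container-runtime-endpoint=unix:///var/run/cri-dockerd.sock
# kubectl edit node node1                  ### point the kubeadm.alpha.kubernetes.io/cri-socket annotation at the same socket
# systemctl restart kubelet
# kubectl uncordon node1
After the migration, the kubelet flags and the node annotations reflect the new socket: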
# cat /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--pod-infra-container-image=172.30.3.150/k8s/k8s.gcr.io/pause:3.8 --container-runtime-endpoint=unix:///var/run/cri-dockerd.sock"
# kubectl describe node | grep "kubeadm.alpha.kubernetes.io/cri-socket"
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
Once this is configured, the 'Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/var/run/dockershim.sock". Please update your configuration!' warning no longer appears:
# ./kubeadm-1.24.12 upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.23.12
[upgrade/versions] kubeadm version: v1.24.12
The node-role.kubernetes.io/master label and taint have been replaced by node-role.kubernetes.io/control-plane, so the master label and taint need to be added back by hand; otherwise, for applications that previously selected or tolerated them... well, you can guess.
# kubectl describe nodes node1 node2 node3 | grep Taint
Taints: node-role.kubernetes.io/control-plane:NoSchedule
Taints: node-role.kubernetes.io/control-plane:NoSchedule
Taints: node-role.kubernetes.io/control-plane:NoSchedule
# kubectl get nodes -l node-role.kubernetes.io/master
No resources found
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
node1 Ready control-plane 11d v1.24.12
node2 Ready control-plane 11d v1.23.12
node3 Ready control-plane 11d v1.23.12
node4 Ready worker 11d v1.23.12
node5 Ready worker 11d v1.23.12
node6 Ready worker 11d v1.23.12
# kubectl label nodes node1 node2 node3 node-role.kubernetes.io/master=
node/node1 labeled
node/node2 labeled
node/node3 labeled
# kubectl taint nodes node1 node2 node3 node-role.kubernetes.io/control-plane-
node/node1 untainted
node/node2 untainted
node/node3 untainted
# kubectl taint nodes node1 node2 node3 node-role.kubernetes.io/master:NoSchedule --overwrite
node/node1 modified
node/node2 modified
node/node3 modified
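The earlier label query should now match the three control-plane nodes again:
# kubectl get nodes -l node-role.kubernetes.io/master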
1.24.12 -> 1.25.12: follow the same procedure as 1.19.10 -> 1.20.12.
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
node1 Ready control-plane,master 11d v1.25.12
node2 Ready control-plane,master 11d v1.25.12
node3 Ready control-plane,master 11d v1.25.12
node4 Ready worker 11d v1.25.12
node5 Ready worker 11d v1.25.12
node6 Ready worker 11d v1.25.12
After walking through all of this, I'd guess upgrading to 1.26 or 1.27 follows the same routine.