Have you ever thought about a “low-level” way of changing the etcd data of your Kubernetes cluster — that is, altering etcd-stored values without using any of the common Kubernetes tooling, such as its native CLI utilities or even the API? We recently had to perform such a task, and here is our story: why and how we did it.
How it all started
An increasing number of our customers (mostly developers) ask us to provide access to the Kubernetes cluster so they can interact with internal services. They want to be able to connect directly to a database or a service, to connect their local application to other applications within the cluster, and so on.
For example, you might need to connect to the memcached.staging.svc.cluster.local service from your local machine. We accomplish that via a VPN inside the cluster to which the client connects. To do this, we announce the subnets related to pods and services and push the cluster’s DNS to the client. As a result, when the client tries to connect to the memcached.staging.svc.cluster.local service, the request goes to the cluster’s DNS, which returns the address of this service from the cluster’s service network or the address of the pod.
We configure K8s clusters using kubeadm. In this case, the default service subnet is 192.168.0.0/16, and the pod subnet is 10.244.0.0/16. Generally, this approach works just fine. However, there are a couple of subtleties:
- The 192.168.*.* subnet is often used in our customers’ offices, and even more often in developers’ home offices. That is a recipe for disaster: home routers use the same address space, so the VPN pushes these subnets from the cluster to the client.
- We have several clusters (production, stage, multiple dev clusters). By default, all of them have the same subnets for pods and services, which makes it very difficult to use services in multiple clusters simultaneously.
For quite a while, we have been using different subnets for services and pods in different clusters of the same project, so each cluster has its own networks. At the same time, we maintain a large number of K8s clusters that we would prefer not to redeploy from scratch, since they run many services, stateful applications, and so on.
At some point, we asked ourselves: how do we change a subnet in an existing cluster?
Searching for a solution
The most common way is to recreate all services of the ClusterIP type. You can find this kind of advice as well:
The following process has a problem: after everything configured, the pods come up with the old IP as a DNS nameserver in /etc/resolv.conf.
Since I still did not find the solution, I had to reset the entire cluster with kubeadm reset and init it again.
Unfortunately, that does not work for everyone… Let’s define the problem for our case in more detail:
- We use Flannel;
- There are both bare metal and cloud Kubernetes clusters;
- We would prefer to avoid having to redeploy all services in the cluster;
- We would like to make the transition as hassle-free as possible;
- The cluster is managed by Kubernetes 1.16.6 (however, our steps will fit other versions, too).
The goal is to replace the 192.168.0.0/16 service subnet with 172.24.0.0/16 in the cluster deployed using kubeadm.
As a matter of fact, we have long been tempted to investigate how Kubernetes stores its data in etcd and what can be done with this storage at all… So we just thought: “Why don’t we update the data in etcd by replacing old subnet IPs with the new ones?”
We have been looking for ready-made tools for modifying data in etcd… and nothing met our needs. But it’s not all bad: etcdhelper by OpenShift was a good starting point (thanks to its creators!). This tool can connect to etcd using certificates and read etcd data using the ls, get, and dump commands.
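For instance, using the same connection flags that appear later in this article, reading data looks roughly like this (the /registry/... key below is only an illustrative example of Kubernetes’ etcd layout):

./etcdhelper -cacert /etc/kubernetes/pki/etcd/ca.crt -cert /etc/kubernetes/pki/etcd/server.crt -key /etc/kubernetes/pki/etcd/server.key -endpoint https://127.0.0.1:2379 ls
./etcdhelper -cacert /etc/kubernetes/pki/etcd/ca.crt -cert /etc/kubernetes/pki/etcd/server.crt -key /etc/kubernetes/pki/etcd/server.key -endpoint https://127.0.0.1:2379 get /registry/services/specs/default/kubernetes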
By the way, do not hesitate to share links if you are aware of other tools for directly editing data in etcd!
Extending etcdhelper
Looking at etcdhelper, we thought: “Why don’t we expand this utility so that it can also write data to etcd?”
Our efforts resulted in an updated version of etcdhelper with two new functions: changeServiceCIDR and changePodCIDR. Its source code is available here.
What do the new features do? Here is the algorithm of changeServiceCIDR:
- we create a deserializer;
- compile a regular expression to replace the CIDR;
- go through the list of ClusterIP services in the cluster and perform a few operations on each of them.
Here are our operations (a minimal Go sketch of the whole loop follows the list):
- we decode the etcd value and place it in a Go object;
- replace the first two bytes of the address using a regular expression;
- assign the service an IP address from the new subnet’s address range;
- create a serializer, convert the Go object to protobuf, and write the new data back to etcd.
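To make these operations more concrete, below is a minimal Go sketch of such a pass over the service keys, written as if it were added to the tool’s main package. It follows the description above rather than the actual etcdhelper source, so the function name, the key prefix, and the prepared client, context, and regular expression are all illustrative, and error handling is reduced to the essentials:

package main

import (
	"bytes"
	"context"
	"regexp"

	"go.etcd.io/etcd/clientv3"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime/serializer/protobuf"
	"k8s.io/kubectl/pkg/scheme"
)

// changeServiceCIDRSketch walks over the Service objects stored in etcd and rewrites
// their ClusterIP so that the old subnet prefix is replaced with the new one.
func changeServiceCIDRSketch(ctx context.Context, client *clientv3.Client, oldPrefix *regexp.Regexp, newPrefix string) error {
	decoder := scheme.Codecs.UniversalDeserializer()                // deserializer for etcd values
	encoder := protobuf.NewSerializer(scheme.Scheme, scheme.Scheme) // protobuf serializer for writing back

	resp, err := client.Get(ctx, "/registry/services/specs/", clientv3.WithPrefix())
	if err != nil {
		return err
	}
	for _, kv := range resp.Kvs {
		obj, _, err := decoder.Decode(kv.Value, nil, nil) // etcd value -> Go object
		if err != nil {
			return err
		}
		svc, ok := obj.(*corev1.Service)
		if !ok || svc.Spec.ClusterIP == "" || svc.Spec.ClusterIP == "None" {
			continue // skip anything that is not a ClusterIP service (e.g. headless services)
		}
		// Swap the subnet prefix, e.g. 192.168.x.y -> 172.24.x.y.
		svc.Spec.ClusterIP = oldPrefix.ReplaceAllString(svc.Spec.ClusterIP, newPrefix)

		var buf bytes.Buffer
		if err := encoder.Encode(svc, &buf); err != nil { // Go object -> protobuf
			return err
		}
		if _, err := client.Put(ctx, string(kv.Key), buf.String()); err != nil { // write back to etcd
			return err
		}
	}
	return nil
}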
The changePodCIDR function is essentially the same as changeServiceCIDR. The only difference is that instead of services, we edit the specification of nodes and replace the value of .spec.PodCIDR with the new subnet.
Usage
Replacing serviceCIDR
This task is very straightforward to implement. However, it involves downtime while all the pods in the cluster are recreated. First, we will describe the main steps; later, we will share our thoughts on how to minimize that downtime.
Preparatory steps:
- install the necessary software and build the patched etcdhelper tool;
- back up your etcd and /etc/kubernetes.
Here is a summary of actions for changing serviceCIDR:
- make changes in the apiserver and controller-manager manifests;
- reissue certificates;
- modify the ClusterIP specifications of services in etcd;
- restart all pods in the cluster.
Below is a detailed description of the steps.
1. Install etcd-client for dumping the data:
apt install etcd-client
2. Build the etcdhelper tool:
Install golang:
GOPATH=/root/golang
mkdir -p $GOPATH/local
curl -sSL https://dl.google.com/go/go1.14.1.linux-amd64.tar.gz | tar -xzvC $GOPATH/local
echo "export GOPATH=\"$GOPATH\"" >> ~/.bashrc
echo 'export GOROOT="$GOPATH/local/go"' >> ~/.bashrc
echo 'export PATH="$PATH:$GOPATH/local/go/bin"' >> ~/.bashrc
Copy etcdhelper.go, download dependencies, and build the tool:
wget https://raw.githubusercontent.com/flant/examples/master/2020/04-etcdhelper/etcdhelper.go
go get go.etcd.io/etcd/clientv3 k8s.io/kubectl/pkg/scheme k8s.io/apimachinery/pkg/runtime
go build -o etcdhelper etcdhelper.go
3. Back up the etcd data:
backup_dir=/root/backup
mkdir ${backup_dir}
cp -rL /etc/kubernetes ${backup_dir}
ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --key=/etc/kubernetes/pki/etcd/server.key --cert=/etc/kubernetes/pki/etcd/server.crt --endpoints https://192.168.199.100:2379 snapshot save ${backup_dir}/etcd.snapshot
4. Switch the services subnet in the manifests of the Kubernetes control plane. Replace the value of the --service-cluster-ip-range parameter with the new subnet (172.24.0.0/16 instead of 192.168.0.0/16) in /etc/kubernetes/manifests/kube-apiserver.yaml and /etc/kubernetes/manifests/kube-controller-manager.yaml.
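For reference, after this edit the corresponding flag in both manifests should look roughly like this (only the changed line is shown; the rest of each file stays untouched):

    - --service-cluster-ip-range=172.24.0.0/16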
5. Since we are making changes to the service subnet for which kubeadm issues the apiserver certificates (among others), you have to reissue them:
5.1. Check which domains and IP addresses the current certificate is issued for:
openssl x509 -noout -ext subjectAltName < /etc/kubernetes/pki/apiserver.crt
X509v3 Subject Alternative Name:
    DNS:dev-1-master, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:apiserver, IP Address:192.168.0.1, IP Address:10.0.0.163, IP Address:192.168.199.100
5.2. Prepare the basic config for kubeadm:
cat kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
networking:
  podSubnet: "10.244.0.0/16"
  serviceSubnet: "172.24.0.0/16"
apiServer:
  certSANs:
  - "192.168.199.100" # master node's IP address
5.3. Delete the old crt and key files (you have to remove them in order to issue the new certificate):
rm /etc/kubernetes/pki/apiserver.{key,crt}
5.4. Reissue certificates for the API server:
kubeadm init phase certs apiserver --config=kubeadm-config.yaml
5.5. Check that the certificate is issued for the new subnet:
openssl x509 -noout -ext subjectAltName < /etc/kubernetes/pki/apiserver.crt
X509v3 Subject Alternative Name:
    DNS:kube-2-master, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, IP Address:172.24.0.1, IP Address:10.0.0.163, IP Address:192.168.199.100
5.6. After the API server certificate reissue, you’ll have to restart its container:
docker ps | grep k8s_kube-apiserver | awk '{print $1}' | xargs docker restart
5.7. Renew the certificate embedded in admin.conf:
kubeadm alpha certs renew admin.conf
5.8. Edit the data in etcd:
./etcdhelper -cacert /etc/kubernetes/pki/etcd/ca.crt -cert /etc/kubernetes/pki/etcd/server.crt -key /etc/kubernetes/pki/etcd/server.key -endpoint https://127.0.0.1:2379 change-service-cidr 172.24.0.0/16
Caution! At this point, the DNS stops resolving domain names in the cluster. It happens because the existing pods still have the old CoreDNS (kube-dns) address in /etc/resolv.conf, while kube-proxy has already changed the iptables rules to use the new subnet instead of the old one. Below, we will discuss possible ways to minimize downtime.
5.9. Edit ConfigMaps in the kube-system namespace:
a) In this CM:
kubectl -n kube-system edit cm kubelet-config-1.16
— replace clusterDNS with the new IP address of the kube-dns service (you can find it via kubectl -n kube-system get svc kube-dns).
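In a kubeadm-based cluster, this ConfigMap embeds a KubeletConfiguration document, so the fragment being edited typically looks roughly like the following (the IP below is only an example — use whatever the kubectl get svc command reports):

data:
  kubelet: |
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    clusterDNS:
    - 172.24.0.10
    # ... the rest of the KubeletConfiguration stays unchanged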
b) In this CM:
kubectl -n kube-system edit cm kubeadm-config
— switch the data.ClusterConfiguration.networking.serviceSubnet parameter to the new subnet.
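After the edit, the networking section inside that ConfigMap should read roughly as follows (dnsDomain is shown with its usual default):

data:
  ClusterConfiguration: |
    # ...
    networking:
      dnsDomain: cluster.local
      podSubnet: 10.244.0.0/16
      serviceSubnet: 172.24.0.0/16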
5.10. Since the kube-dns address has changed, you need to update the kubelet config on all nodes:
kubeadm upgrade node phase kubelet-config && systemctl restart kubelet
5.11. It is time to restart all pods in the cluster:
kubectl get pods --no-headers=true --all-namespaces |sed -r 's/(\S+)\s+(\S+).*/kubectl --namespace \1 delete pod \2/e'
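The one-liner above relies on GNU sed’s e flag to execute each generated kubectl delete command. If you prefer something more explicit, an equivalent loop would be:

kubectl get pods --all-namespaces --no-headers | while read -r ns pod rest; do
  kubectl -n "$ns" delete pod "$pod"
done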
Minimizing downtime
Here are a few ideas on how to minimize downtime:
- After editing the control plane manifests, you can create a new kube-dns service with a new name (e.g., kube-dns-tmp) and a new address (172.24.0.10); a sample manifest for such a temporary service is sketched below.
- Then you can insert an if condition into etcdhelper so that it leaves the existing kube-dns service untouched.
- Replace the old ClusterDNS address in all kubelets with the new one (meanwhile, the old service will keep running simultaneously with the new one).
- Wait until all applications’ pods are redeployed, either naturally or at an agreed time.
- Delete the kube-dns-tmp service and edit serviceSubnetCIDR for the kube-dns service.
This plan will shorten downtime to approximately a minute: the period required to delete the kube-dns-tmp service and switch the subnet of the kube-dns service.
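For illustration, such a temporary service could be described by a manifest along these lines (the clusterIP, selector, and ports here are assumptions based on a standard kubeadm/CoreDNS setup — copy the real values from your existing kube-dns service):

apiVersion: v1
kind: Service
metadata:
  name: kube-dns-tmp
  namespace: kube-system
spec:
  clusterIP: 172.24.0.10
  selector:
    k8s-app: kube-dns
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP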
Modifying podNetwork
Along the way, we decided to modify podNetwork using our etcdhelper. Here is the required sequence of actions:
- edit configurations in the kube-system namespace;
- edit the manifest of kube-controller-manager;
- edit podCIDR directly in etcd;
- restart all nodes in the cluster.
Below is a detailed description of the above actions:
1. Edit ConfigMaps in the kube-system namespace:
a) Here:
kubectl -n kube-system edit cm kubeadm-config
— replace data.ClusterConfiguration.networking.podSubnet with the new subnet (10.55.0.0/16).
b) Here:
kubectl -n kube-system edit cm kube-proxy
— specify the new data.config.conf.clusterCIDR: 10.55.0.0/16.
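The relevant fragment of the kube-proxy ConfigMap then looks roughly like this:

data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    kind: KubeProxyConfiguration
    clusterCIDR: 10.55.0.0/16
    # ... the rest of the KubeProxyConfiguration stays unchanged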
2. Edit the manifest of the controller-manager:
vim /etc/kubernetes/manifests/kube-controller-manager.yaml
— specify: --cluster-cidr=10.55.0.0/16.
3. Verify the current values of .spec.podCIDR, .spec.podCIDRs, and the InternalIP addresses in .status.addresses for all cluster nodes:
kubectl get no -o json | jq '[.items[] | {"name": .metadata.name, "podCIDR": .spec.podCIDR, "podCIDRs": .spec.podCIDRs, "InternalIP": (.status.addresses[] | select(.type == "InternalIP") | .address)}]'
[
{
"name": "kube-2-master",
"podCIDR": "10.244.0.0/24",
"podCIDRs": [
"10.244.0.0/24"
],
"InternalIP": "192.168.199.2"
},
{
"name": "kube-2-master",
"podCIDR": "10.244.0.0/24",
"podCIDRs": [
"10.244.0.0/24"
],
"InternalIP": "10.0.1.239"
},
{
"name": "kube-2-worker-01f438cf-579f9fd987-5l657",
"podCIDR": "10.244.1.0/24",
"podCIDRs": [
"10.244.1.0/24"
],
"InternalIP": "192.168.199.222"
},
{
"name": "kube-2-worker-01f438cf-579f9fd987-5l657",
"podCIDR": "10.244.1.0/24",
"podCIDRs": [
"10.244.1.0/24"
],
"InternalIP": "10.0.4.73"
}
]
4. Replace podCIDR by editing etcd directly:
./etcdhelper -cacert /etc/kubernetes/pki/etcd/ca.crt -cert /etc/kubernetes/pki/etcd/server.crt -key /etc/kubernetes/pki/etcd/server.key -endpoint https://127.0.0.1:2379 change-pod-cidr 10.55.0.0/16
5. Check if podCIDR has changed:
kubectl get no -o json | jq '[.items[] | {"name": .metadata.name, "podCIDR": .spec.podCIDR, "podCIDRs": .spec.podCIDRs, "InternalIP": (.status.addresses[] | select(.type == "InternalIP") | .address)}]'
[
{
"name": "kube-2-master",
"podCIDR": "10.55.0.0/24",
"podCIDRs": [
"10.55.0.0/24"
],
"InternalIP": "192.168.199.2"
},
{
"name": "kube-2-master",
"podCIDR": "10.55.0.0/24",
"podCIDRs": [
"10.55.0.0/24"
],
"InternalIP": "10.0.1.239"
},
{
"name": "kube-2-worker-01f438cf-579f9fd987-5l657",
"podCIDR": "10.55.1.0/24",
"podCIDRs": [
"10.55.1.0/24"
],
"InternalIP": "192.168.199.222"
},
{
"name": "kube-2-worker-01f438cf-579f9fd987-5l657",
"podCIDR": "10.55.1.0/24",
"podCIDRs": [
"10.55.1.0/24"
],
"InternalIP": "10.0.4.73"
}
]
6. Restart all nodes of the cluster one at a time.
Note that if there is at least one node with the old podCIDR, kube-controller-manager will not start, and pods in the cluster will not be scheduled.
As a matter of fact, there are easier ways to change podCIDR (example). Still, we wanted to learn how to work with etcd directly, since there are cases when editing Kubernetes objects right in etcd is the only possible solution (for example, there is no way to avoid downtime when changing the spec.clusterIP field of a Service).
Summary
In this article, we have explored the possibility of working with the data in etcd directly (i.e., without using the Kubernetes API). At times, this approach allows you to do some “tricky things”. We have successfully tested all the above steps using our etcdhelper on real K8s clusters. However, the whole scenario is still PoC (proof of concept) only. Please use it at your own risk.
This article has been written by our engineers, Vitaly Snurnitsyn & Andrey Sidorov. Follow our blog to get new excellent content from Flant!
Originally published at: https://medium.com/flant-com/modifying-kubernetes-etcd-data-ed3d4bb42379