Prerequisites:
Adding a master node
Environment:
[root@node131 ~]# kubectl get node
NAME        STATUS   ROLES    AGE    VERSION
cm-server   Ready    <none>   123d   v1.14.1
node130     Ready    master   135d   v1.14.1
node132     Ready    master   135d   v1.14.1
The goal is to add node131 to this cluster as a master.
Run the following on node130:
[root@node130 ~]# kubeadm init phase upload-certs --experimental-upload-certs
I0313 20:23:00.559871 28333 version.go:248] remote version is much newer: v1.17.4; falling back to: stable-1.14
[upload-certs] Storing the certificates in ConfigMap "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
a4786889248af30b616aa8307968e56007b9be1821fefe922e040f6e5e85f5f4
[root@node130 ~]# kubeadm token create --print-join-command
kubeadm join apiserver.cluster.local:6443 --token 53e01r.qgkiaw0u5x72cuu1 --discovery-token-ca-cert-hash sha256:9e3e902497b8ab6c4e9111482aaed5a094013e00ff3f0d68f5489078480df3cf
Combine the two outputs above (the certificate key and the join command) into the following join command:
kubeadm join apiserver.cluster.local:6443 --token 53e01r.qgkiaw0u5x72cuu1 --discovery-token-ca-cert-hash sha256:9e3e902497b8ab6c4e9111482aaed5a094013e00ff3f0d68f5489078480df3cf \
--experimental-control-plane --certificate-key a4786889248af30b616aa8307968e56007b9be1821fefe922e040f6e5e85f5f4
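The splice above can also be scripted. The following is a minimal sketch, assuming the join command and the certificate key have already been captured as strings; build_control_plane_join is a hypothetical helper name, not part of kubeadm, and the flags are the ones this article uses with v1.14.

```shell
#!/bin/sh
# Hypothetical helper: append the control-plane flags used in this article
# (kubeadm v1.14) to a plain join command.
#   $1: output of `kubeadm token create --print-join-command`
#   $2: certificate key printed by the upload-certs phase
build_control_plane_join() {
    join_cmd=$1
    cert_key=$2
    # Trim trailing whitespace that the printed command may carry.
    join_cmd=$(printf '%s' "$join_cmd" | sed 's/[[:space:]]*$//')
    printf '%s --experimental-control-plane --certificate-key %s\n' \
        "$join_cmd" "$cert_key"
}

# Example use on node130 (not executed here):
#   build_control_plane_join "$(kubeadm token create --print-join-command)" "<certificate key>"
```

The function only concatenates text, so the result should still be read before it is run on the new node.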
On node131:
tar zxvf kube1.14.1.tar.gz
cd kube/shell && sh init.sh
Then run the join command:
kubeadm join apiserver.cluster.local:6443 --token 53e01r.qgkiaw0u5x72cuu1 --discovery-token-ca-cert-hash sha256:9e3e902497b8ab6c4e9111482aaed5a094013e00ff3f0d68f5489078480df3cf \
--experimental-control-plane --certificate-key a4786889248af30b616aa8307968e56007b9be1821fefe922e040f6e5e85f5f4
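The join fails with: error execution phase check-etcd: etcd cluster is not healthy: context deadline exceeded.
The error means kubeadm's etcd health check timed out: the cluster still records an etcd member for node131, but no etcd instance is reachable there.
Fixing the error:
1. Remove the stale node entry from the kubeadm-config ConfigMap:
The etcd instance for node131 cannot be reached, so remove its entry from ClusterStatus here (it is shown commented out in the listing).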
[root@node131 ~]# kubectl edit configmaps -n kube-system kubeadm-config
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  ClusterConfiguration: |
    apiServer:
      certSANs:
      - 127.0.0.1
      - apiserver.cluster.local
      - 192.168.3.130
      - 192.168.3.131
      - 192.168.3.132
      - 10.103.97.2
      extraArgs:
        authorization-mode: Node,RBAC
      timeoutForControlPlane: 4m0s
    apiVersion: kubeadm.k8s.io/v1beta1
    certificatesDir: /etc/kubernetes/pki
    clusterName: kubernetes
    controlPlaneEndpoint: apiserver.cluster.local:6443
    controllerManager: {}
    dns:
      type: CoreDNS
    etcd:
      local:
        dataDir: /var/lib/etcd
    imageRepository: k8s.gcr.io
    kind: ClusterConfiguration
    kubernetesVersion: v1.14.1
    networking:
      dnsDomain: cluster.local
      podSubnet: 100.64.0.0/10
      serviceSubnet: 10.96.0.0/12
    scheduler: {}
  ClusterStatus: |
    apiEndpoints:
      node130:
        advertiseAddress: 192.168.3.130
        bindPort: 6443
      # node131:
      #   advertiseAddress: 192.168.3.131
      #   bindPort: 6443
      node132:
        advertiseAddress: 192.168.3.132
        bindPort: 6443
    apiVersion: kubeadm.k8s.io/v1beta1
    kind: ClusterStatus
kind: ConfigMap
metadata:
  creationTimestamp: "2019-10-30T02:53:33Z"
  name: kubeadm-config
  namespace: kube-system
  resourceVersion: "775"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kubeadm-config
  uid: 7640fad9-fac0-11e9-9990-e0d55e7a4876
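To preview what ClusterStatus should look like after the edit, the node entry can be dropped with a small text filter. This is only a sketch, assuming the ClusterStatus text has been saved or piped as plain YAML with space indentation; drop_api_endpoint is a hypothetical helper, it does not touch the cluster, and the node name is used as a regular expression.

```shell
#!/bin/sh
# Hypothetical helper: drop one node entry (and its indented children)
# from ClusterStatus YAML read on stdin.
#   $1: node name as it appears under apiEndpoints
drop_api_endpoint() {
    awk -v node="$1" '
        # Width of the leading space indentation of a line.
        function indent(s) { match(s, /^ */); return RLENGTH }
        # Still inside the dropped entry: deeper indentation than its key.
        skipping && indent($0) > base { next }
        { skipping = 0 }
        # The key line of the entry to drop, e.g. "  node131:".
        $0 ~ "^ *" node ":[ ]*$" { base = indent($0); skipping = 1; next }
        { print }
    '
}
```

Writing the result back is still done through kubectl edit as shown above; the filter is only for checking the expected text.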
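The etcd cluster did not automatically remove the etcd member for this node, so it has to be removed manually.
Next, enter the etcd container and remove the node131 member.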
[root@node132 kubernetes]# kubectl exec -it etcd-node130 sh -n kube-system
/ # export ETCDCTL_API=3
/ #
/ # alias etcdctl='etcdctl --endpoints=https://192.168.3.130:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key'
/ #
/ # etcdctl member list
ca953a0d64b48849, started, node132, https://192.168.3.132:2380, https://192.168.3.132:2379
df8b283813118626, started, node130, https://192.168.3.130:2380, https://192.168.3.130:2379
ea4c8cb8cc15f00b, started, node131, https://192.168.3.131:2380, https://192.168.3.131:2379
/ # etcdctl member remove ea4c8cb8cc15f00b
Member ea4c8cb8cc15f00b removed from cluster 884edae04b421411
/ #
/ # etcdctl member list
ca953a0d64b48849, started, node132, https://192.168.3.132:2380, https://192.168.3.132:2379
df8b283813118626, started, node130, https://192.168.3.130:2380, https://192.168.3.130:2379
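Rather than copying the member ID by eye, it can be picked out of the list by node name. A minimal sketch, assuming the comma-separated output format shown above; etcd_member_id is a hypothetical helper, and whether awk is available inside the etcd image is not confirmed here, so the filter may need to run on the host against captured output.

```shell
#!/bin/sh
# Hypothetical helper: print the ID of the etcd member with the given name,
# reading `etcdctl member list` output (default comma-separated form) on stdin.
#   $1: member name, e.g. node131
etcd_member_id() {
    # Fields are: ID, status, name, peer URLs, client URLs.
    awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Example use (not executed here):
#   etcdctl member list | etcd_member_id node131
```

An empty result means no member with that name exists, in which case there is nothing to remove.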
Finally, after every failed kubeadm join the node must be reset with kubeadm reset; only then will the next kubeadm join succeed.
[root@node131 ~]# kubeadm reset
[root@node131 ~]# kubeadm join apiserver.cluster.local:6443 --token 53e01r.qgkiaw0u5x72cuu1 --discovery-token-ca-cert-hash sha256:9e3e902497b8ab6c4e9111482aaed5a094013e00ff3f0d68f5489078480df3cf \
--experimental-control-plane --certificate-key a4786889248af30b616aa8307968e56007b9be1821fefe922e040f6e5e85f5f4
[root@node131 ~]# mkdir -p $HOME/.kube
[root@node131 ~]# cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[root@node131 ~]# chown $(id -u):$(id -g) $HOME/.kube/config
Now check the state of the cluster:
[root@node131 ~]# kubectl get nodes
NAME        STATUS   ROLES    AGE    VERSION
cm-server   Ready    <none>   123d   v1.14.1
node130     Ready    master   135d   v1.14.1
node131     Ready    master   52s    v1.14.1
node132     Ready    master   135d   v1.14.1
node131 has now joined the cluster as a master.
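The same check can be expressed as a filter over the kubectl get nodes output, which is convenient in a script. A minimal sketch, assuming the default column layout (NAME, STATUS, ROLES, ...) shown above; node_is_ready_master is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named node is Ready and has the master role,
# reading `kubectl get nodes` output on stdin.
#   $1: node name, e.g. node131
node_is_ready_master() {
    awk -v node="$1" '
        # ROLES may be a comma-separated list; look for "master" as one item.
        $1 == node && $2 == "Ready" && $3 ~ /(^|,)master(,|$)/ { found = 1 }
        END { exit !found }
    '
}

# Example use (not executed here):
#   kubectl get nodes | node_is_ready_master node131 && echo "node131 is a Ready master"
```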