上一集:K8S最新版本集群部署(v1.28) + 容器引擎Docker部署(上)
*️⃣主目录:温故知新专栏
下一集:Kubernetes可视化管理工具Kuboard部署使用及k8s常用命令梳理记录
kubectl
是使用 Kubernetes API
与 Kubernetes
集群的控制面进行通信的命令行工具。详见官网安装步骤
[kubernetes@renxiaozhao01 ~]$ sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
[kubernetes@renxiaozhao01 ~]$
[kubernetes@renxiaozhao01 ~]$ kubectl version --client
Client Version: v1.28.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
[kubernetes@renxiaozhao01 ~]$
[kubernetes@renxiaozhao01 ~]$
就这么简单,之前傻傻的通过curl下载,估计哪里卡住了,一个半小时都没下载好,网络环境好的话当然还是直接curl -LO https://dl.k8s.io/release/v1.28.1/bin/linux/amd64/kubectl
方便
执行kubectl version --client
或者kubectl version --client --output=yaml
[kubernetes@renxiaozhao01 ~]$ kubectl version --client --output=yaml
clientVersion:
buildDate: "2023-08-24T11:23:10Z"
compiler: gc
gitCommit: 8dc49c4b984b897d423aab4971090e1879eb4f23
gitTreeState: clean
gitVersion: v1.28.1
goVersion: go1.20.7
major: "1"
minor: "28"
platform: linux/amd64
kustomizeVersion: v5.0.4-0.20230601165947-6ce0bf390ce3
[kubernetes@renxiaozhao01 ~]$
[kubernetes@renxiaozhao01 ~]$ kubectl version --client
Client Version: v1.28.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
[kubernetes@renxiaozhao01 ~]$
[kubernetes@renxiaozhao01 ~]$
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=1
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF
安装命令:sudo yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
这里不会重复安装,对于之前已经安装过的kubectl
,yum安装时会自动跳过
[kubernetes@renxiaozhao01 ~]$ kubelet --version
Kubernetes v1.28.1
[kubernetes@renxiaozhao01 ~]$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"28", GitVersion:"v1.28.1", GitCommit:"8dc49c4b984b897d423aab4971090e1879eb4f23", GitTreeState:"clean", BuildDate:"2023-08-24T11:21:51Z", GoVersion:"go1.20.7", Compiler:"gc", Platform:"linux/amd64"}
[kubernetes@renxiaozhao01 ~]$
[kubernetes@renxiaozhao01 ~]$
执行sudo systemctl enable --now kubelet
命令:
kubelet
运行在每个节点上,并负责管理和监控节点上的容器。它与Kubernetes Master
节点通信,接收来自Master
节点的指令,并根据指令来创建、销毁和管理容器。enable
将kubelet服务设置为开机启动。而通过添加--now
选项,kubelet服务将立即启动kubelet 现在每隔几秒就会重启,因为它陷入了一个等待 kubeadm 指令的死循环。
在使用kubeadm初始化Kubernetes集群时,需要在Master节点上运行一些kubeadm命令来完成初始化过程。其中一个命令是kubeadm init,它将生成一个用于加入其他节点的令牌。kubelet服务在启动时会检查是否存在这个令牌,如果没有找到,它会陷入一个等待kubeadm指令的死循环。
手动启动:systemctl start kubelet
到这里已经在r1机器安装完了Kubernetes核心组件,参照上集,主要就是关闭防火墙那些操作,和本集安装kubeadm
操作(覆盖了kubeadm、kubelet、kubectl三个核心组件),此处做个精简版概括,继续在r2、r3机器安装(机器信息在上集第二章有介绍)。
systemctl stop firewalld.service
systemctl status firewalld.service
systemctl disable firewalld.service
vi /etc/fstab
,注释掉/dev/mapper/centos-swap swap...
这一行vi /etc/sysconfig/selinux
,SELINUX=enforcing
改为SELINUX=disabled
reboot
或者shutdown -r now
useradd kubernetes -d /home/kubernetes
passwd kubernetes
echo 'kubernetes ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=1
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF
sudo yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
kubelet --version
kubeadm version
kubectl version --client
kubeadm init
和kubeadm join
命令在默认情况下会自动下载所需的Docker镜像。这些镜像包括Kubernetes组件(如kube-apiserver、kube-controller-manager、kube-scheduler等)以及其他一些常用的容器镜像(如CoreDNS等)。kubeadm init
和kubeadm join
命令来初始化和加入集群,并且不需要自定义镜像源,kubeadm init
和kubeadm join
命令会自动处理Docker
的安装和配置。尝试了一下直接执行kubeadm init
,反正都是错,所以还是在r1、r2、r3上面手动安装docker相关组件吧
Kubernetes
是一个容器编排平台,用于管理和编排容器化应用程序。它可以与多种容器运行时集成,包括Docker
、containerd
、CRI-O
等。Kubernetes
时,默认情况下会使用Docker
作为容器运行时(这是因为Docker是目前最广泛使用的容器运行时之一,具有广泛的社区支持和成熟的生态系统。因此,大多数Kubernetes安装指南和教程都会建议先安装Docker)。总结起来,Kubernetes
和Docker
是相互独立、不是必须绑定在一起的。但是正常在安装Kubernetes
时,默认都是先安装Docker
容器
yum 这种方式的安装适用于大多数情况,它会自动获取最新的稳定版本
yum-config-manager
是yum-utils
中的一个工具sudo yum install -y yum-utils
sudo yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
docker
sudo yum install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
docker
sudo systemctl start docker
docker
开启启动(虚拟机会经常重启,为了减少麻烦最好设置开机启动)sudo systemctl enable docker
docker
状态sudo systemctl status docker
docker
版本:docker version
[kubernetes@renxiaozhao01 ~]$ docker version
Client: Docker Engine - Community
Version: 24.0.5
API version: 1.43
Go version: go1.20.6
Git commit: ced0996
Built: Fri Jul 21 20:39:02 2023
OS/Arch: linux/amd64
Context: default
permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/version": dial unix /var/run/docker.sock: connect: permission denied
[kubernetes@renxiaozhao01 ~]$
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<-'EOF'
{
"registry-mirrors": ["https://xxxx.mirror.aliyuncs.com"]
}
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
sudo docker info
可以看到刚刚配置的加速器信息官网地址
cri-dockerd 是一种基于 Docker 的容器运行时实现,它通过实现 CRI 接口,使 Kubernetes 能够与 Docker 进行交互,创建和管理容器
官网说明
Git地址,直接点击下载丢到服务器(网络好的话可以直接服务器执行sudo wget https://github.com/Mirantis/cri-dockerd/releases/download/v0.3.4/cri-dockerd-0.3.4-3.el7.x86_64.rpm
进行下载),下载挺慢…
这次反而在环境上下载快一点
心累…明晚继续
cri-dockerd-0.3.4-3.el7.x86-64.rpm快速下载地址
没有rpm
命令,可以执行yum - y intall rpm
安装该命令
# 重载系统守护进程
sudo systemctl daemon-reload
# 设置cri-dockerd自启动
sudo systemctl enable cri-docker.socket cri-docker
# 启动cri-dockerd
sudo systemctl start cri-docker.socket cri-docker
# 查看cri-dockerd状态
sudo systemctl status cri-docker.socket
名称添加k8s标识,方便区分,日后部署hadoop集群可以叫hp-master、hp-slave01、hp-slave02
[kubernetes@renxiaozhao01 ~]$ sudo vi /etc/hosts
[kubernetes@renxiaozhao01 ~]$ sudo cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.17.17 k8s-master
192.168.17.18 k8s-node01
192.168.17.19 k8s-node02
[kubernetes@renxiaozhao01 ~]$
内核模块配置文件
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
加载内核模块:
net.bridge.bridge-nf-call-iptables
:用于启用桥接网络对 iptables 的调用net.bridge.bridge-nf-call-ip6tables
: 用于启用桥接网络对 ip6tables 的调用net.ipv4.ip_forward
: 用于启用 IP 转发初始化命令
sudo kubeadm init --node-name=k8s-master --image-repository=registry.aliyuncs.com/google_containers --cri-socket=unix:///var/run/cri-dockerd.sock --apiserver-advertise-address=192.168.17.17 --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12
kubeadm init
命令中的参数解释(上面的官网地址中也有解释):
--node-name=k8s-master
: 指定k8s集群中Master节点的名称,和上面/etc/hosts配置中192.168.17.17 k8s-master名称一致。--image-repository=registry.aliyuncs.com/google_containers
: 指定容器镜像的仓库地址。--cri-socket=unix:///var/run/cri-dockerd.sock
: 指定容器运行时的 CRI(Container Runtime Interface)套接字路径(Docker 的配置文件路径通常是 /etc/docker/daemon.json,CRI 套接字路径通常不会在 Docker 的配置文件中直接指定。Kubernetes一般会使用默认的 Docker 套接字路径 /var/run/docker.sock)。--apiserver-advertise-address=192.168.17.17
: 指定Kubernetes API Server的地址,即Master节点的IP地址。和上面/etc/hosts配置中192.168.17.17 k8s-master的IP一致--pod-network-cidr=10.244.0.0/16
: 指定 Pod 网络的 CIDR(Classless Inter-Domain Routing)地址范围。默认使用这个即可。--service-cidr=10.96.0.0/12
: 指定服务网络的 CIDR 地址范围(需要选择一个未被使用的 CIDR 地址范围),一般默认使用这个即可。[kubernetes@renxiaozhao01 ~]$ sudo kubeadm init --node-name=k8s-master --image-repository=registry.aliyuncs.com/google_containers --cri-socket=unix:///var/run/cri-dockerd.sock --apiserver-advertise-address=192.168.17.17 --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12
[init] Using Kubernetes version: v1.28.1
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.17.17]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-master localhost] and IPs [192.168.17.17 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-master localhost] and IPs [192.168.17.17 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 16.526647 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node k8s-master as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node k8s-master as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: 4ydg4t.7cjjm52hd4p86gmk
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.17.17:6443 --token 4ydg4t.7cjjm52hd4p86gmk \
--discovery-token-ca-cert-hash sha256:ee2c3ae1c2d702b77a0b52f9dafe734aa7e25f33c44cf7fa469c1adc8c176be1
[kubernetes@renxiaozhao01 ~]$
[kubernetes@renxiaozhao01 ~]$ mkdir -p $HOME/.kube
[kubernetes@renxiaozhao01 ~]$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[kubernetes@renxiaozhao01 ~]$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
[kubernetes@renxiaozhao01 ~]$
[kubernetes@renxiaozhao01 ~]$
export KUBECONFIG=/etc/kubernetes/admin.conf
临时生效,建议配置放到环境变量文件(非root用户不需要再配置环境变量)应该是必须安装,一开始没安装,执行下 kubeadm join
也可以添加成功(但是状态应该不对),后来 kubeadm join
添加节点名称参数时候一直不成功,然后尝试安装网络插件flannel下载地址,简单记录下吧
kube-flannel.yml
,指定网络接口名称,搜索kube-subnet-mgr
,下面添加 - --iface=ens33
(对应自己的固定IP网关)kubectl apply -f kube-flannel.yml
,如果非root
用户,添加sudo
会报错,很多加sudo
执行都会报错localhost:8080 was refused
,去掉就好了,有待研究[kubernetes@renxiaozhao03 ~]$ kubectl apply -f kube-flannel.yml
namespace/kube-flannel unchanged
serviceaccount/flannel unchanged
clusterrole.rbac.authorization.k8s.io/flannel unchanged
clusterrolebinding.rbac.authorization.k8s.io/flannel unchanged
configmap/kube-flannel-cfg unchanged
daemonset.apps/kube-flannel-ds unchanged
[kubernetes@renxiaozhao03 ~]$
[kubernetes@renxiaozhao03 ~]$
[kubernetes@renxiaozhao03 ~]$
[kubernetes@renxiaozhao03 ~]$ sudo kubectl apply -f kube-flannel.yml
The connection to the server localhost:8080 was refused - did you specify the right host or port?
[kubernetes@renxiaozhao03 ~]$
[kubernetes@renxiaozhao03 ~]$
同步master
上的 .kube
到node
上面: scp -r .kube 192.168.17.18:/home/kubernetes/
执行上文生成的join指令:
sudo kubeadm join 192.168.17.17:6443 --token 4ydg4t.7cjjm52hd4p86gmk --discovery-token-ca-cert-hash sha256:ee2c3ae1c2d702b77a0b52f9dafe734aa7e25f33c44cf7fa469c1adc8c176be1 --cri-socket=unix:///var/run/cri-dockerd.sock
添加成功后,查看节点信息kubectl get nodes
有个节点没准备好呀,有缘再说吧,累了,不过好像可以确认网络插件属于必装,比如flannel
没忍住排查了一下,反正现在好了,就是按照下面的问题四,修改了一下cri-docker.service
,然后反复重启了几次
[kubernetes@renxiaozhao03 ~]$ sudo systemctl stop cri-docker
[kubernetes@renxiaozhao03 ~]$ #怎么修改参照问题四
[kubernetes@renxiaozhao03 ~]$ sudo vi /usr/lib/systemd/system/cri-docker.service
[kubernetes@renxiaozhao03 ~]$ sudo systemctl daemon-reload
[kubernetes@renxiaozhao03 ~]$ sudo systemctl start cri-docker
[kubernetes@renxiaozhao03 ~]$ sudo systemctl restart kubelet docker cri-docker
[kubernetes@renxiaozhao03 ~]$
[kubernetes@renxiaozhao03 ~]$
[kubernetes@renxiaozhao03 ~]$ sudo systemctl restart kubelet docker cri-docker
[kubernetes@renxiaozhao03 ~]$ kubectl get pod -A|grep kube-flannel-ds-rc8vq
kube-flannel kube-flannel-ds-rc8vq 1/1 Running 2 (33s ago) 164m
[kubernetes@renxiaozhao03 ~]$
略
[kubernetes@renxiaozhao01 ~]$ sudo yum -y install lrzsz
已加载插件:fastestmirror, langpacks
Loading mirror speeds from cached hostfile
Could not retrieve mirrorlist http://mirrorlist.centos.org/?release=7&arch=x86_64&repo=os&infra=stock error was
14: curl#7 - "Failed to connect to 2600:1f16:c1:5e01:4180:6610:5482:c1c0: 网络不可达"
One of the configured repositories failed (未知),
and yum doesn't have enough cached data to continue. At this point the only
safe thing yum can do is fail. There are a few ways to work "fix" this:
1. Contact the upstream for the repository and get them to fix the problem.
2. Reconfigure the baseurl/etc. for the repository, to point to a working
upstream. This is most often useful if you are using a newer
distribution release than is supported by the repository (and the
packages for the previous distribution release still work).
3. Run the command with the repository temporarily disabled
yum --disablerepo= ...
4. Disable the repository permanently, so yum won' t use it by default. Yum
will then just ignore the repository until you permanently enable it
again or use --enablerepo for temporary usage:
yum-config-manager --disable <repoid>
or
subscription-manager repos --disable=<repoid>
5. Configure the failing repository to be skipped, if it is unavailable.
Note that yum will try to contact the repo. when it runs most commands,
so will have to try and fail each time (and thus. yum will be be much
slower). If it is a very temporary problem though, this is often a nice
compromise:
yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true
Cannot find a valid baseurl for repo: base/7/x86_64
sudo systemctl restart network
从公司回到家,切换了网络,虚拟机已经重启了,ping www.baidu.com
也是通的,后面又试了一下,发现不通了,突然想起来中间切换过一次无线网络,重启网络就好了,原因还是本地网络不好,不稳(长城宽带)
[kubernetes@renxiaozhao01 yum.repos.d]$ curl https://www.baidu.com/
curl: (7) Failed connect to www.baidu.com:443; 没有到主机的路由
执行重启命令:sudo systemctl restart network
sudo curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
容器运行时:用于指代用于创建、打包和运行容器的软件组件
容器运行时是负责管理
和执行容器
的主要组件
常见的容器运行时包括Docker
、containerd
、CRI-O
等
[init] Using Kubernetes version: v1.28.1
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
W0829 19:08:22.039382 55875 checks.go:835] detected that the sandbox image "registry.k8s.io/pause:3.6" of the container runtime is inconsistent with that used by kubeadm. It is recommended that using "registry.aliyuncs.com/google_containers/pause:3.9" as the CRI sandbox image.
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.17.17]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-master localhost] and IPs [192.168.17.17 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-master localhost] and IPs [192.168.17.17 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
- 'crictl --runtime-endpoint unix:///var/run/cri-dockerd.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint unix:///var/run/cri-dockerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
detected that the sandbox image "registry.k8s.io/pause:3.6" of the container runtime is inconsistent with that used by kubeadm. It is recommended that using "registry.aliyuncs.com/google_containers/pause:3.9" as the CRI sandbox image
[kubelet-check] Initial timeout of 40s passed.
vi /usr/lib/systemd/system/cri-docker.service
错误详情一通过修改cri-docker.service解决,错误详情二由错误详情一导致
解决步骤:
停止cri-docker服务:sudo systemctl stop cri-docker
编辑vi /usr/lib/systemd/system/cri-docker.service
--pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9
ExecStart=/usr/bin/cri-dockerd --container-runtime-endpoint fd:// --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9
重新加载服务:sudo systemctl daemon-reload
启动cri-docker服务:sudo systemctl start cri-docker
[kubernetes@renxiaozhao01 ~]$ sudo kubectl apply -f kube-flannel.yml
The connection to the server localhost:8080 was refused - did you specify the right host or port?
[WARNING Hostname]: hostname "renxiaozhao03"could not be reached
虽然是警告信息,但是名称和IP都不是我之前在/etc/hosts
中设置的,所以肯定不对对
--node-name
参数,指定主机名指定--node-name k8s-node02
,中间=
加不加都可以,不加就用空格
sudo kubeadm join 192.168.17.17:6443 --node-name=k8s-node02 --token 4ydg4t.7cjjm52hd4p86gmk --discovery-token-ca-cert-hash sha256:ee2c3ae1c2d702b77a0b52f9dafe734aa7e25f33c44cf7fa469c1adc8c176be1 --cri-socket=unix:///var/run/cri-dockerd.sock
其中--cri-socket=unix:///var/run/cri-dockerd.sock
也是根据错误提示新指定的参数,其中containerd不知道是不是那个组件自带的,博主只装了cri-dockerd
,这一块也迷糊,一方面查资料使用docker
容器,必须安装cri-dockerd
,一方面又说只要containerd
,一步步来吧,先跑起来集群再深究
[kubernetes@renxiaozhao02 .kube]$ sudo kubeadm join 192.168.17.17:6443 --node-name=k8s-node02 --token 4ydg4t.7cjjm52hd4p86gmk --discovery-token-ca-cert-hash sha256:ee2c3ae1c2d702b77a0b52f9dafe734aa7e25f33c44cf7fa469c1adc8c176be1
Found multiple CRI endpoints on the host. Please define which one do you wish to use by setting the 'criSocket' field in the kubeadm configuration file: unix:///var/run/containerd/containerd.sock, unix:///var/run/cri-dockerd.sock
To see the stack trace of this error execute with --v=5 or higher
error execution phase kubelet-start: error uploading crisocket: nodes "k8s-node01" not found
[kubernetes@renxiaozhao02 .kube]$ sudo systemctl stop kubelet
[kubernetes@renxiaozhao02 .kube]$ sudo rm -rf /etc/kubernetes/*
[kubernetes@renxiaozhao02 .kube]$ sudo kubeadm join 192.168.17.17:6443 --token urwtz8.3tvnbe2a63b3fnbl --discovery-token-ca-cert-hash sha256:ee2c3ae1c2d702b77a0b52f9dafe734aa7e25f33c44cf7fa469c1adc8c176be1 --node-name k8s-node01 --cri-socket=unix:///var/run/cri-dockerd.sock
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
error execution phase kubelet-start: error uploading crisocket: nodes "k8s-node01" not found
To see the stack trace of this error execute with --v=5 or higher
[kubernetes@renxiaozhao02 .kube]$
kubeadm reset
重置之后再添加join
不要通过上面删除的方式,博主之前添加成功过renxiaozhao02
节点(不指定--node-name
,默认就是主机名),虽然节点已经删除了,但是貌似有残留,直接在node几点机器执行kubeadm reset,然后再添加,成功
[kubernetes@renxiaozhao03 ~]$ sudo kubeadm reset --cri-socket=unix:///var/run/cri-dockerd.sock
W0830 17:11:12.123313 25230 preflight.go:56] [reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0830 17:11:14.355783 25230 removeetcdmember.go:106] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] Deleted contents of the etcd data directory: /var/lib/etcd
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of directories: [/etc/kubernetes/manifests /var/lib/kubelet /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
[kubernetes@renxiaozhao03 ~]$ #验证有没有用到ipvs(gpt说的,咋说呢,被它带着走了很远的路)
[kubernetes@renxiaozhao03 ~]$ kubectl get svc -A | grep ipvs
[kubernetes@renxiaozhao03 ~]$
[kubernetes@renxiaozhao03 ~]$
[kubernetes@renxiaozhao03 ~]$ #验证有没有用到 CNI
[kubernetes@renxiaozhao03 ~]$ kubectl get svc -A | grep cni -i
join
会报错,之前已经生成了相关文件,报文件已存在,直接执行了删除,结果引发新的错sudo systemctl stop kubelet
sudo rm -rf /etc/kubernetes/*
直接sudo rm -rf /etc/kubernetes/*
,结果引发了新的报错/etc/kubernetes/manifests
不存在(后来网上看了下,好像本来就没有这个目录,不是删除引起的),也不知道之前有没有这个文件,貌似只有master
节点会用到,最终解决方式是直接把mastre节点上的该目录同步过来,或者直接创建mkdir -p /etc/kubernetes/manifests
,应该也可以kubelet
日志:sudo journalctl -u kubelet
(也被坑了,一开始没注意,这个命令日志不自动换行),改用sudo journalctl -u kubelet --no-pager --follow
Unable to register node with API server" err="nodes \"k8s-node01\" is forbidden: node \"renxiaozhao02\" is not allowed to modify node \"k8s-node01\"" node="k8s-node01"
sudo journalctl -u kubelet --no-pager --follow
可以准确的查看日志8月 30 15:37:47 renxiaozhao02 kubelet[9469]: I0830 15:37:47.153840 9469 kubelet_node_status.go:70] "Attempting to register node" node="k8s-node01"
8月 30 15:37:47 renxiaozhao02 kubelet[9469]: E0830 15:37:47.160006 9469 kubelet_node_status.go:92] "Unable to register node with API server" err="nodes \"k8s-node01\" is forbidden: node \"renxiaozhao02\" is not allowed to modify node \"k8s-node01\"" node="k8s-node01"
8月 30 15:37:47 renxiaozhao02 kubelet[9469]: I0830 15:37:47.283440 9469 scope.go:117] "RemoveContainer" containerID="b0ab74aedc14936c3f0e95682f257b3a8981ecb3e9e36b590a35d65c3eafbd16"
8月 30 15:37:47 renxiaozhao02 kubelet[9469]: E0830 15:37:47.284025 9469 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-scheduler\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-scheduler pod=kube-scheduler-k8s-node01_kube-system(7734ba1e9e5564ed51f1e93da8155ae7)\"" pod="kube-system/kube-scheduler-k8s-node01" podUID="7734ba1e9e5564ed51f1e93da8155ae7"