Kubernetes (K8s) Common Errors and Solutions

So many pitfalls in Kubernetes... ugh.

Essential Setup

1. Configure the kernel

The kernel must support memory and swap accounting, i.e. the following options are required:

CONFIG_RESOURCE_COUNTERS=y
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
CONFIG_MEMCG_SWAP_ENABLED=y
CONFIG_MEMCG_KMEM=y

Check kernel support:

cat /boot/config-5.11.0-40-generic |grep MEMCG
cat /boot/config-5.11.0-40-generic |grep RESOURCE_COUNTERS

Enable kernel support:

vim /etc/default/grub

Edit GRUB_CMDLINE_LINUX and append:

cgroup_enable=memory swapaccount=1

A complete GRUB_CMDLINE_LINUX then looks like:

GRUB_CMDLINE_LINUX="find_preseed=/preseed.cfg auto noprompt priority=critical locale=en_US cgroup_enable=memory swapaccount=1"

Update GRUB, reboot, and verify:

update-grub
reboot now
cat /proc/cmdline
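After rebooting, the boot flags can also be checked programmatically. This is a small sketch; check_flag is a helper written for this example, not part of any standard tool.

```shell
# check_flag: look for an exact whitespace-delimited token in a cmdline
# string (helper invented for this sketch).
check_flag() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

cmdline=$(cat /proc/cmdline 2>/dev/null || true)
for flag in cgroup_enable=memory swapaccount=1; do
    if check_flag "$flag" "$cmdline"; then
        echo "OK: $flag"
    else
        echo "MISSING: $flag (re-check /etc/default/grub and rerun update-grub)"
    fi
done
```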

2. Install the k8s packages

Install on every node:

apt-get update && apt-get install -y apt-transport-https curl
apt-get install -y kubelet kubeadm kubectl --allow-unauthenticated
apt-get install conntrack
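kubeadm, kubelet, and kubectl should stay within the supported version skew. A quick sketch of a major.minor comparison; same_minor is a hypothetical helper and the version strings below are illustrative.

```shell
# same_minor: strip the patch component and compare major.minor
# (helper invented for this sketch).
same_minor() {
    test "${1%.*}" = "${2%.*}"
}

if same_minor "1.22.4" "1.22.1"; then
    echo "kubelet and kubeadm are on the same minor version"
fi
```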

3. Disable swap

swapoff -a
vim /etc/fstab

Comment out the swap line:

#/swapfile       none            swap    sw              0       0

Reboot:

reboot now
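To confirm swap stays off after the reboot, a sketch; swap_off_in_fstab is a helper invented here that checks fstab content for uncommented swap entries.

```shell
# swap_off_in_fstab: returns 0 when the given fstab text has no active
# (uncommented) swap lines (helper invented for this sketch).
swap_off_in_fstab() {
    ! echo "$1" | grep -v '^#' | grep -q '[[:space:]]swap[[:space:]]'
}

swapon --show 2>/dev/null || true   # prints nothing when no swap is active
if swap_off_in_fstab "$(cat /etc/fstab 2>/dev/null || true)"; then
    echo "no active swap entries in /etc/fstab"
else
    echo "swap entry still active in /etc/fstab"
fi
```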

4. Pull the images

Save the following to fetch_images.sh:

#!/bin/bash
# Pull each required image from the Aliyun mirror, re-tag it with the
# k8s.gcr.io name kubeadm expects, then remove the mirror tag.
for i in $(kubeadm config images list); do
    imageName=${i#k8s.gcr.io/}
    docker pull registry.aliyuncs.com/google_containers/$imageName
    docker tag registry.aliyuncs.com/google_containers/$imageName k8s.gcr.io/$imageName
    docker rmi registry.aliyuncs.com/google_containers/$imageName
done

Run it:

chmod +x fetch_images.sh
./fetch_images.sh

If some images fail to pull, list everything that is required:

kubeadm config images list

then pull each image individually and re-tag it with the k8s.gcr.io name, e.g.

sudo docker tag coredns/coredns:latest k8s.gcr.io/coredns/coredns:v1.8.4
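The pull-and-retag step can be sketched as a dry run. retag is a hypothetical helper; whether the mirror actually hosts a given image under the same path (coredns in particular is sometimes published under a different name) must be checked first.

```shell
# retag: map k8s.gcr.io/<name> to <mirror>/<name>
# (helper invented for this sketch).
retag() {
    mirror=$1
    target=$2
    name=${target#k8s.gcr.io/}
    echo "$mirror/$name"
}

# Dry run: print the commands instead of executing them.
src=$(retag registry.aliyuncs.com/google_containers k8s.gcr.io/coredns/coredns:v1.8.4)
echo "docker pull $src"
echo "docker tag $src k8s.gcr.io/coredns/coredns:v1.8.4"
```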

5. Deploy the network plugin

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Errors and Solutions

Error:

No apt package "kubeadm", but there is a snap with that name.
Try "snap install kubeadm"

Solution:

vim /etc/apt/sources.list

Add:

deb https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial main

Error:

The following signatures couldn't be verified because the public key is not available: NO_PUBKEY ED444FF07D8D0BF6

Solution:

apt-key adv --keyserver keys.gnupg.net --recv-keys ED444FF07D8D0BF6

Error:

The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused

Solution:

kubelet did not start properly; check its status:

systemctl status kubelet
journalctl -xeu kubelet > log.txt
vim log.txt

Search the log for the error and apply the matching fix below.
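As a rough triage aid, the common failure strings covered in this article can be matched mechanically; classify is a helper invented for this sketch.

```shell
# classify: map a kubelet/kubeadm log line to one of the failure modes
# covered in this article (helper invented for this sketch).
classify() {
    case "$1" in
        *"cgroup driver"*)  echo "cgroup-mismatch" ;;
        *"is in use"*)      echo "port-conflict" ;;
        *"already exists"*) echo "stale-manifests" ;;
        *)                  echo "unknown" ;;
    esac
}

classify 'kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"'
# -> cgroup-mismatch
```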

Error:

[ERROR Port-6443]: Port 6443 is in use
[ERROR Port-10259]: Port 10259 is in use
[ERROR Port-10257]: Port 10257 is in use

or

[WARNING FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists

Solution:

kubeadm reset

Error:

failed to create kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"

This is caused by Docker and kubelet using different cgroup drivers.

Solution:

vim /etc/docker/daemon.json

Change it to:

{
  "registry-mirrors": [
    "https://hub-mirror.c.163.com",
    "https://ghcr.io",
    "https://mirror.baidubce.com"
  ],
  "exec-opts":["native.cgroupdriver=systemd"]
}
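Before restarting, it is worth validating the edited file, since a JSON syntax error in daemon.json stops Docker from starting at all. A sketch using python3 as the checker (present by default on Ubuntu); the path is parameterized here so the snippet can be pointed at any file.

```shell
# Validate daemon.json before restarting docker; DAEMON_JSON defaults to
# the real path but can be overridden for testing.
DAEMON_JSON=${DAEMON_JSON:-/etc/docker/daemon.json}
if python3 -m json.tool "$DAEMON_JSON" >/dev/null 2>&1; then
    echo "daemon.json is valid JSON"
else
    echo "daemon.json is missing or has a syntax error"
fi
```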

Then restart Docker:

systemctl restart docker
systemctl status docker
docker info |grep Cgroup

Error:

The connection to the server localhost:8080 was refused - did you specify the right host or port?
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
Error from server (Forbidden): deployments.apps is forbidden: User "system:node:ubuntu" cannot list resource "deployments" in API group "apps" in the namespace "kube-system"

Solution:

#for master
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
echo 'export KUBECONFIG=$HOME/.kube/config' >> $HOME/.bashrc
source ~/.bashrc
#for node
echo 'export KUBECONFIG=/etc/kubernetes/kubelet.conf' >> $HOME/.bashrc
sudo chown $(id -u):$(id -g) /etc/kubernetes/kubelet.conf
source ~/.bashrc

Or delete the config and re-initialize the cluster:

#sudo rm -r /etc/cni/net.d
sudo rm -r $HOME/.kube

Error:

"PullImage from image service failed" err="rpc error: code = Unknown desc = error pulling image configuration: Get https://cdn02.quay.io/sha256/e6/e6ea68648f0cd70c8d77c79e8cd4c17f63d587815afcf274909b591cb0e417ab? dial tcp 99.84.224.149:443: i/o timeout" image="quay.io/coreos/flannel:v0.15.1"

Solution:

In the YAML, replace every occurrence of the image quay.io/coreos/flannel:v0.15.1 with:

quay-mirror.qiniu.com/coreos/flannel:v0.15.1

Or use docker search to find a suitable mirror of the image:

docker search flannel

#test whether this mirror actually hosts the image
docker pull xxx/flannel:v0.15.1

Error:

kubectl get nodes
 "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized"

Solution:

Deploy the flannel network plugin (step 5 above).

Error:

kubectl get cs
NAME                 STATUS    MESSAGE                         ERROR
scheduler            Unhealthy   Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused

Solution:

vim /etc/kubernetes/manifests/kube-controller-manager.yaml
vim /etc/kubernetes/manifests/kube-scheduler.yaml
#comment out the line - --port=0 in both files
sudo systemctl restart kubelet.service

Error:

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "90dc49fa4e7831704b67748edc12949438ace02fb4fad8c7c0f9e77c3939d572" network for pod "coredns-78fcd69978-6n7bh": networkPlugin cni failed to set up pod "coredns-78fcd69978-6n7bh_kube-system" network: open /run/flannel/subnet.env: no such file or directory

Solution:

vim /run/flannel/subnet.env

Write the following content:

FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
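The manual edit above can be scripted. write_subnet_env is a helper invented here; run the commented call as root on the affected node, and adjust the subnet values to your cluster.

```shell
# write_subnet_env: create a subnet.env-style file only if it does not
# already exist (helper invented for this sketch; values from this article).
write_subnet_env() {
    file=$1
    [ -f "$file" ] && return 0
    mkdir -p "$(dirname "$file")"
    cat > "$file" <<'EOF'
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF
}

# On the node, as root:
# write_subnet_env /run/flannel/subnet.env
```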

Error:

A pod is stuck in ImagePullBackOff.

Solution:

The image cannot be pulled on that node; switch to a registry mirror or pull the image manually.

Error:

networkPlugin cni failed to set up pod "coredns-78fcd69978-cg29q_kube-system" network: failed to delegate add: failed to set bridge addr: "cni0" already has an IP address different from 10.10.0.1/24

Solution:

A subnet may already have been configured:

cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

Check the bridge IP:

ifconfig cni0
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.0.1  netmask 255.255.255.0  broadcast 10.10.0.255
        inet6 fe80::1c2e:e4ff:fedd:6c48  prefixlen 64  scopeid 0x20<link>

Delete the bridge so it is recreated, or change FLANNEL_SUBNET to 10.10.0.1/24:

sudo ifconfig cni0 down
sudo ip link delete cni0
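Before deleting the bridge, the mismatch can be confirmed programmatically. subnet_matches is a hypothetical helper that assumes a /24 flannel subnet, as in this article.

```shell
# subnet_matches: compare the cni0 address against the FLANNEL_SUBNET value,
# assuming a /24 subnet (helper invented for this sketch).
subnet_matches() {
    # $1 = cni0 address (e.g. 10.244.0.1), $2 = FLANNEL_SUBNET value
    test "$1/24" = "$2"
}

if subnet_matches "10.244.0.1" "10.10.0.1/24"; then
    echo "cni0 matches the flannel subnet; nothing to do"
else
    echo "mismatch: delete cni0 so the CNI plugin recreates it"
fi
```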

Error:

0/1 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate

Solution:

On a single-node cluster the master does not schedule pods by default; remove the taint:

kubectl taint nodes --all node-role.kubernetes.io/master-

A healthy cluster looks like this:

kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE     IP               NODE         NOMINATED NODE   READINESS GATES
kube-system   coredns-9d85f5447-8lr7g              1/1     Running   0          117m    10.244.0.3       k8s-master   <none>           <none>
kube-system   coredns-9d85f5447-9d7t6              1/1     Running   0          117m    10.244.0.2       k8s-master   <none>           <none>
kube-system   etcd-k8s-master                      1/1     Running   0          117m    192.168.59.156   k8s-master   <none>           <none>
kube-system   kube-apiserver-k8s-master            1/1     Running   0          117m    192.168.59.156   k8s-master   <none>           <none>
kube-system   kube-controller-manager-k8s-master   1/1     Running   0          117m    192.168.59.156   k8s-master   <none>           <none>
kube-system   kube-flannel-ds-amd64-5hp2j          1/1     Running   0          29m     192.168.59.156   k8s-master   <none>           <none>
kube-system   kube-flannel-ds-amd64-gkfxl          1/1     Running   0          7m10s   192.168.59.158   k8s-node2    <none>           <none>
kube-system   kube-flannel-ds-amd64-gsr5n          1/1     Running   5          8m28s   192.168.59.157   k8s-node1    <none>           <none>
kube-system   kube-proxy-jfq6d                     1/1     Running   0          8m28s   192.168.59.157   k8s-node1    <none>           <none>
kube-system   kube-proxy-lxdxs                     1/1     Running   0          117m    192.168.59.156   k8s-master   <none>           <none>
kube-system   kube-proxy-pjpqq                     1/1     Running   0          7m10s   192.168.59.158   k8s-node2    <none>           <none>
kube-system   kube-scheduler-k8s-master            1/1     Running   0          117m    192.168.59.156   k8s-master   <none>           <none>

kubectl get nodes
NAME         STATUS   ROLES    AGE     VERSION
k8s-master   Ready    master   117m    v1.17.0
k8s-node1    Ready    <none>   8m37s   v1.17.0
k8s-node2    Ready    <none>   7m19s   v1.17.0

If every node reports Ready, the Kubernetes cluster has been created successfully and everything is in place. Pod states of Pending, ContainerCreating, or ImagePullBackOff all mean the pod is not ready; only Running is the ready state. If a pod shows Init:ImagePullBackOff, its image failed to pull on the corresponding node; run kubectl describe pod to inspect the pod and identify which image failed.

Common K8s Commands

Reset K8s and start over

sudo kubeadm reset

#choose a pod network CIDR that avoids conflicts; it must match the subnet in /run/flannel/subnet.env
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
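To keep kubeadm's --pod-network-cidr and flannel's subnet.env from drifting apart, one can be derived from the other. cidr_from_env is a helper written for this sketch; the path is parameterized so the snippet can be tried against any file.

```shell
# cidr_from_env: print the FLANNEL_NETWORK value from a subnet.env-style
# file (helper invented for this sketch).
ENV_FILE=${ENV_FILE:-/run/flannel/subnet.env}
cidr_from_env() {
    sed -n 's/^FLANNEL_NETWORK=//p' "$1"
}

if [ -f "$ENV_FILE" ]; then
    echo "kubeadm init --pod-network-cidr=$(cidr_from_env "$ENV_FILE")"
else
    echo "no subnet.env found; choose the CIDR first and keep both in sync"
fi
```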

List the cluster nodes

kubectl get nodes

Check pod status

kubectl get pods -n kube-system -o wide

kubectl get pods --all-namespaces

Confirm the control-plane components are healthy

kubectl get cs

Describe all pods in a namespace

kubectl describe pod -n kube-system

Describe a single pod

kubectl describe pod kube-flannel-ds-jzsm5 -n kube-system

View a pod's logs

kubectl logs kube-flannel-ds-jzsm5 -n kube-system

Run a test service

kubectl create deployment nginxtest --image=nginx
kubectl expose deployment nginxtest --port=80 --type=NodePort
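Once the service is exposed, the randomly assigned NodePort can be extracted from the kubectl get svc output and tested with curl. extract_port is a hypothetical helper, and the sample line below is illustrative.

```shell
# extract_port: pull the host port out of a "PORT(S)" field like
# 80:31234/TCP (helper invented for this sketch).
extract_port() {
    echo "$1" | sed -n 's/.*:\([0-9]*\)\/TCP.*/\1/p'
}

# Illustrative line as printed by: kubectl get svc nginxtest
line='nginxtest   NodePort   10.96.12.34   <none>   80:31234/TCP   1m'
port=$(extract_port "$line")
echo "curl http://<node-ip>:$port"
```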

Common Docker Commands

Export an image

docker save -o dockerdemo.tar  dockerdemo

Import an image

docker load -i dockerdemo.tar

Other Commands

List listening ports and the processes using them

sudo netstat -atup

K8s Installation References

Tutorial: installing K8s on Ubuntu

Three-node Kubernetes installation
