Kubernetes has an awful lot of pitfalls. Ugh...
1. Configure the kernel
The kernel must support memory and swap accounting, which requires the following options:
CONFIG_RESOURCE_COUNTERS=y
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
CONFIG_MEMCG_SWAP_ENABLED=y
CONFIG_MEMCG_KMEM=y
Check kernel support
cat /boot/config-5.11.0-40-generic |grep MEMCG
cat /boot/config-5.11.0-40-generic |grep RESOURCE_COUNTERS
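The two grep commands above can be folded into one small check. This is only a sketch: the function name check_kernel_config is made up for illustration, and it assumes the config file for the running kernel lives at /boot/config-$(uname -r). On recent kernels some of these option names may no longer exist, so a "missing" line is a hint to look closer, not proof of a problem.

```shell
# Report, for each option, whether it is set to "y" in the given kernel config file.
# Returns 1 if at least one option is missing, 0 otherwise.
# Usage: check_kernel_config <config-file> <OPTION>...
check_kernel_config() {
  config_file="$1"
  shift
  missing=0
  for opt in "$@"; do
    if grep -q "^${opt}=y" "$config_file"; then
      echo "ok $opt"
    else
      echo "missing $opt"
      missing=1
    fi
  done
  return "$missing"
}
```

Example call: check_kernel_config "/boot/config-$(uname -r)" CONFIG_MEMCG CONFIG_MEMCG_SWAP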
Enable kernel support
vim /etc/default/grub
Edit GRUB_CMDLINE_LINUX and append
cgroup_enable=memory swapaccount=1
For example, a complete GRUB_CMDLINE_LINUX looks like this
GRUB_CMDLINE_LINUX="find_preseed=/preseed.cfg auto noprompt priority=critical locale=en_US cgroup_enable=memory swapaccount=1"
Update grub, reboot, and check that the parameters took effect
update-grub
reboot now
cat /proc/cmdline
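Instead of eyeballing the output of /proc/cmdline, the check can be scripted. A minimal sketch; cmdline_has is a hypothetical helper name, not part of any tool.

```shell
# Succeed only if every given parameter appears as a whole word
# in the command line string passed as the first argument.
cmdline_has() {
  cmdline="$1"
  shift
  for param in "$@"; do
    case " $cmdline " in
      *" $param "*) ;;                          # found, check the next one
      *) echo "missing: $param"; return 1 ;;
    esac
  done
}
```

Example call: cmdline_has "$(cat /proc/cmdline)" cgroup_enable=memory swapaccount=1 && echo "kernel parameters active"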
2. Install the k8s packages
Install them on every node
apt-get update && apt-get install -y apt-transport-https curl
apt-get install -y kubelet kubeadm kubectl --allow-unauthenticated
apt-get install conntrack
3. Disable swap
swapoff -a
vim /etc/fstab
Comment out the swap line
#/swapfile none swap sw 0 0
Reboot
reboot now
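Editing /etc/fstab by hand works, but the same change can be made with sed. A sketch under assumptions: disable_swap_entries is a made-up name, GNU sed is assumed (for -i and -E), and a .bak copy is written next to the file first.

```shell
# Comment out every active fstab entry whose filesystem type (3rd field) is "swap".
# Lines that are already commented out are left alone.
disable_swap_entries() {
  sed -i.bak -E \
    '/^[[:space:]]*[^#[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+swap[[:space:]]/ s/^/#/' \
    "$1"
}
```

Example call: sudo sh -c '. ./swap.sh; disable_swap_entries /etc/fstab' (swap.sh being wherever the function is saved). After the reboot, swapon --show should print nothing if swap is really off.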
4. Pull the images
Save the following script as fetch_images.sh
#!/bin/bash
# Pull every image kubeadm needs from the Aliyun mirror, then retag it under k8s.gcr.io
for i in $(kubeadm config images list); do
  imageName=${i#k8s.gcr.io/}
  docker pull "registry.aliyuncs.com/google_containers/$imageName"
  docker tag "registry.aliyuncs.com/google_containers/$imageName" "k8s.gcr.io/$imageName"
  docker rmi "registry.aliyuncs.com/google_containers/$imageName"
done
Run it
chmod +x fetch_images.sh
./fetch_images.sh
If not every image can be pulled, list all the required images
kubeadm config images list
then pull them one by one and retag each with its k8s name, for example
sudo docker tag coredns/coredns:latest k8s.gcr.io/coredns/coredns:v1.8.4
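One likely reason the script misses an image is a nested name such as k8s.gcr.io/coredns/coredns:v1.8.4, which a mirror may publish without the extra path level. The sketch below rests on that assumption (the mirror keeps only the last path component); mirror_name and pull_and_retag are hypothetical helper names, and whether a given tag actually exists on the mirror still has to be checked by pulling it.

```shell
# Map an upstream image name to its name on the mirror.
# Assumption: the mirror keeps only the last path component, so
# k8s.gcr.io/coredns/coredns:v1.8.4 becomes <mirror>/coredns:v1.8.4.
mirror_name() {
  image="$1"
  mirror="${2:-registry.aliyuncs.com/google_containers}"
  echo "$mirror/${image##*/}"
}

# Pull from the mirror, retag under the upstream name, drop the mirror tag.
pull_and_retag() {
  src=$(mirror_name "$1")
  docker pull "$src" && docker tag "$src" "$1" && docker rmi "$src"
}
```

Example call: for i in $(kubeadm config images list); do pull_and_retag "$i"; done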
5. Deploy the network plugin
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Error message:
No apt package "kubeadm", but there is a snap with that name.
Try "snap install kubeadm"
Solution:
vim /etc/apt/sources.list
Add
deb https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial main
Error message:
The following signatures couldn't be verified because the public key is not available: NO_PUBKEY ED444FF07D8D0BF6
Solution:
apt-key adv --keyserver keys.gnupg.net --recv-keys ED444FF07D8D0BF6
Error message:
The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused
Solution:
kubelet did not start properly; check its status
systemctl status kubelet
journalctl -xeu kubelet > log.txt
vim log.txt
Find the error in the log, then apply the matching fix below
Error message:
[ERROR Port-6443]: Port 6443 is in use
[ERROR Port-10259]: Port 10259 is in use
[ERROR Port-10257]: Port 10257 is in use
or
[WARNING FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
Solution:
kubeadm reset
Error message:
failed to create kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"
The cause is that docker and k8s are using different cgroup drivers
Solution:
vim /etc/docker/daemon.json
Change it to
{
"registry-mirrors": [
"https://hub-mirror.c.163.com",
"https://ghcr.io",
"https://mirror.baidubce.com"
],
"exec-opts":["native.cgroupdriver=systemd"]
}
Then restart docker
systemctl restart docker
systemctl status docker
docker info |grep Cgroup
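To confirm that both sides now agree, the two driver names can be compared directly. A sketch with assumptions: the helper names are made up, and it assumes kubeadm has written the kubelet config to /var/lib/kubelet/config.yaml with a top-level cgroupDriver key (if the key is absent, the function prints nothing).

```shell
# Print the cgroupDriver value from a kubelet config file (top-level YAML key).
kubelet_cgroup_driver() {
  awk -F': *' '$1 == "cgroupDriver" { print $2 }' "$1"
}

# Succeed only if both driver names are non-empty and identical.
drivers_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}
```

Example call: drivers_match "$(docker info --format '{{.CgroupDriver}}')" "$(kubelet_cgroup_driver /var/lib/kubelet/config.yaml)" && echo "cgroup drivers match"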
Error message:
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
Error from server (Forbidden): deployments.apps is forbidden: User "system:node:ubuntu" cannot list resource "deployments" in API group "apps" in the namespace "kube-system"
Solution:
#for master
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
echo 'export KUBECONFIG=$HOME/.kube/config' >> $HOME/.bashrc
source ~/.bashrc
#for node
echo 'export KUBECONFIG=/etc/kubernetes/kubelet.conf' >> $HOME/.bashrc
sudo chown $(id -u):$(id -g) /etc/kubernetes/kubelet.conf
source ~/.bashrc
Or try deleting the configuration and then re-initializing the cluster
#sudo rm -r /etc/cni/net.d
sudo rm -r $HOME/.kube
Error message:
"PullImage from image service failed" err="rpc error: code = Unknown desc = error pulling image configuration: Get https://cdn02.quay.io/sha256/e6/e6ea68648f0cd70c8d77c79e8cd4c17f63d587815afcf274909b591cb0e417ab? dial tcp 99.84.224.149:443: i/o timeout" image="quay.io/coreos/flannel:v0.15.1"
Solution:
In the YML, change every occurrence of the image ***quay.io/coreos/flannel:v0.15.1*** to
quay-mirror.qiniu.com/coreos/flannel:v0.15.1
Or use docker to search for a suitable mirror of the image
docker search flannel
#test if exist
docker pull xxx/flannel:v0.15.1
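Swapping the image name in the manifest is a one-line sed job. A sketch, assuming GNU sed and a locally downloaded copy of kube-flannel.yml; replace_image is a hypothetical helper name.

```shell
# Replace every occurrence of one image reference with another in a manifest file.
# '|' is the sed delimiter because image names contain '/'.
# Note: the old name is used as a regex, so its dots match any character,
# which is harmless for this use.
replace_image() {
  sed -i "s|$2|$3|g" "$1"
}
```

Example call: replace_image kube-flannel.yml quay.io/coreos/flannel:v0.15.1 quay-mirror.qiniu.com/coreos/flannel:v0.15.1, followed by kubectl apply -f kube-flannel.yml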
Error message:
kubectl get nodes
"Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized"
Solution:
The flannel network plugin needs to be deployed
Error message:
kubectl get cs
NAME STATUS MESSAGE ERROR
scheduler Unhealthy Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused
Solution:
vim /etc/kubernetes/manifests/kube-controller-manager.yaml
vim /etc/kubernetes/manifests/kube-scheduler.yaml
#comment out the line - --port=0
sudo systemctl restart kubelet.service
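The same edit can be scripted for both manifests. A sketch assuming GNU sed; comment_out_port_zero is a made-up name. It edits in place without writing a backup on purpose, since a stray extra file in /etc/kubernetes/manifests might be picked up by kubelet as another static pod (copy the file elsewhere first if a backup is wanted).

```shell
# Comment out the "- --port=0" argument in a static pod manifest,
# keeping the original indentation.
comment_out_port_zero() {
  sed -i -E 's/^([[:space:]]*)(- --port=0)[[:space:]]*$/\1# \2/' "$1"
}
```

Example call: for f in kube-controller-manager kube-scheduler; do sudo sh -c ". ./port.sh; comment_out_port_zero /etc/kubernetes/manifests/$f.yaml"; done (port.sh being wherever the function is saved).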
Error message:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "90dc49fa4e7831704b67748edc12949438ace02fb4fad8c7c0f9e77c3939d572" network for pod "coredns-78fcd69978-6n7bh": networkPlugin cni failed to set up pod "coredns-78fcd69978-6n7bh_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Solution:
vim /run/flannel/subnet.env
Write the following content
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
Error message:
A Kubernetes pod shows the status ImagePullBackOff
Solution:
The docker image cannot be pulled; switch to another registry mirror or download it manually
Error message:
networkPlugin cni failed to set up pod "coredns-78fcd69978-cg29q_kube-system" network: failed to delegate add: failed to set bridge addr: "cni0" already has an IP address different from 10.10.0.1/24
Solution:
A subnet may already have been configured
cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
Check the interface IP
ifconfig cni0
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.244.0.1 netmask 255.255.255.0 broadcast 10.10.0.255
inet6 fe80::1c2e:e4ff:fedd:6c48 prefixlen 64 scopeid 0x20<link>
Delete the interface so that it gets recreated, or set FLANNEL_SUBNET=10.10.0.1/24
sudo ifconfig cni0 down
sudo ip link delete cni0
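Before deleting anything, it helps to see whether the bridge address and the flannel subnet really disagree. A small sketch; flannel_subnet is a hypothetical helper name.

```shell
# Print the FLANNEL_SUBNET value (for example 10.244.0.1/24) from a subnet.env file.
flannel_subnet() {
  sed -n 's/^FLANNEL_SUBNET=//p' "$1"
}
```

Example use: compare flannel_subnet /run/flannel/subnet.env with the address printed by ip -4 -o addr show dev cni0 | awk '{print $4}'. If the two differ, that is the mismatch the error message complains about.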
Error message:
0/1 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate
Solution:
In a single-node k8s setup, the master node does not allow pods to be scheduled on it by default, so remove the taint
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE     IP               NODE         NOMINATED NODE   READINESS GATES
kube-system   coredns-9d85f5447-8lr7g              1/1     Running   0          117m    10.244.0.3       k8s-master   <none>           <none>
kube-system   coredns-9d85f5447-9d7t6              1/1     Running   0          117m    10.244.0.2       k8s-master   <none>           <none>
kube-system   etcd-k8s-master                      1/1     Running   0          117m    192.168.59.156   k8s-master   <none>           <none>
kube-system   kube-apiserver-k8s-master            1/1     Running   0          117m    192.168.59.156   k8s-master   <none>           <none>
kube-system   kube-controller-manager-k8s-master   1/1     Running   0          117m    192.168.59.156   k8s-master   <none>           <none>
kube-system   kube-flannel-ds-amd64-5hp2j          1/1     Running   0          29m     192.168.59.156   k8s-master   <none>           <none>
kube-system   kube-flannel-ds-amd64-gkfxl          1/1     Running   0          7m10s   192.168.59.158   k8s-node2    <none>           <none>
kube-system   kube-flannel-ds-amd64-gsr5n          1/1     Running   5          8m28s   192.168.59.157   k8s-node1    <none>           <none>
kube-system   kube-proxy-jfq6d                     1/1     Running   0          8m28s   192.168.59.157   k8s-node1    <none>           <none>
kube-system   kube-proxy-lxdxs                     1/1     Running   0          117m    192.168.59.156   k8s-master   <none>           <none>
kube-system   kube-proxy-pjpqq                     1/1     Running   0          7m10s   192.168.59.158   k8s-node2    <none>           <none>
kube-system   kube-scheduler-k8s-master            1/1     Running   0          117m    192.168.59.156   k8s-master   <none>           <none>
kubectl get nodes
NAME         STATUS   ROLES    AGE     VERSION
k8s-master   Ready    master   117m    v1.17.0
k8s-node1    Ready    <none>   8m37s   v1.17.0
k8s-node2    Ready    <none>   7m19s   v1.17.0
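If every node is Ready, the Kubernetes cluster has been created successfully and everything is in place. A pod status of Pending, ContainerCreating, or ImagePullBackOff means the pod is not ready yet; only Running counts as ready. If a pod reports Init:ImagePullBackOff, its image failed to pull on that node; use kubectl describe pod to inspect the pod and confirm which image failed to pull.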
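On a cluster with many pods, it is easier to filter for the ones that are not ready. A minimal sketch; not_running is a made-up name, and it expects the output of kubectl get pods --all-namespaces --no-headers on stdin, where the 4th column is STATUS.

```shell
# Print "namespace/name status" for every pod whose STATUS is not Running.
not_running() {
  awk '$4 != "Running" { print $1 "/" $2, $4 }'
}
```

Example call: kubectl get pods --all-namespaces --no-headers | not_running. Pods of finished jobs (status Completed) show up too, which is expected.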
Reset K8S and start it again
sudo kubeadm reset
#specify the pod network to avoid conflicts; it must match FLANNEL_NETWORK in /run/flannel/subnet.env
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 #or 10.10.0.0/16
List the cluster nodes
kubectl get nodes
Check the status of each pod
kubectl get pods -n kube-system -o wide
kubectl get pods --all-namespaces
Confirm the cluster components are healthy
kubectl get cs
Describe the status of all pods in detail
kubectl describe pod -n kube-system
Describe the status of one pod
kubectl describe pod kube-flannel-ds-jzsm5 -n kube-system
View the logs of one pod
kubectl logs kube-flannel-ds-jzsm5 -n kube-system
Run a service
kubectl create deployment nginxtest --image=nginx
kubectl expose deployment nginxtest --port=8080 --target-port=80 --type=NodePort
Export an image
docker save -o dockerdemo.tar dockerdemo
Import an image
docker load -i dockerdemo.tar
Check which ports are in use
sudo netstat -atup
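For the "Port 6443 is in use" kind of error, a single port can be checked without scanning the whole list by eye. A sketch; port_in_use is a hypothetical helper that reads ss -ltn output on stdin (ss comes with iproute2 and is used here in place of netstat).

```shell
# Succeed if some TCP socket in the `ss -ltn` output on stdin
# is listening on the given port.
port_in_use() {
  awk -v port="$1" '
    $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }
  '
}
```

Example call: ss -ltn | port_in_use 6443 && echo "6443 is taken"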
Tutorial: installing K8S on Ubuntu
Kubernetes three-node installation