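The industry keeps getting more competitive, so as an older programmer I think learning K8S is well worth it.
This is a very simple layout: one master node with two worker nodes.
For a test environment, the requirements are: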
Node | Memory | Disk space |
---|---|---|
master | 4G | 20G |
node | 8G | 20G |
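These numbers alone are enough to put people off: a cloud server with 8G or more costs over 2,000 a year, and buying two is hard to justify. My personal suggestion is to buy two mini PCs on Taobao and run virtual machines at home.
The master needs 4G of memory because k8s runs key components such as api-server and controller manager on the master node.
Reference: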
https://blog.csdn.net/nvd11/article/details/127355709
Node | VM IP | Host | Host OS | Virtualization | VM OS | CPU threads allocated | VM memory | Disk space |
---|---|---|---|---|---|---|---|---|
master | 10.0.1.152 | NUC7i5BNK | Win10 | VirtualBox | Ubuntu Server 22.04 | 2 | 4G | 20G |
node 0 | 10.0.1.154 | Thinkpad x230 | Ubuntu Server 22.04 | kvm | Ubuntu Server 22.04 | 2 | 8G | 20G |
node 1 | 10.0.1.153 | Industrial PC from Taobao (8G) | Ubuntu Server 22.04 | kvm | Ubuntu Server 22.04 | 2 | 4G | 20G |
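Because these 3 VMs must be able to reach each other, it is best to put all 3 of them in bridged network mode.
With NAT mode, all 3 VMs would have to live on the same host, which puts a lot of pressure on that host's memory.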
gateman@k8snode1:~$ cat /etc/apt/sources.list
# Source mirrors are commented out by default to speed up apt update; uncomment them if needed
deb http://mirrors.163.com/ubuntu/ jammy main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ jammy-security main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ jammy-updates main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ jammy-backports main restricted universe multiverse
Required on all 3 servers:
sudo systemctl stop ufw
sudo systemctl disable ufw
# master, with 4G memory
sudo hostnamectl set-hostname k8smaster
# node0, with 8G memory
sudo hostnamectl set-hostname k8snode0
# node1, with 4G memory
sudo hostnamectl set-hostname k8snode1
Required on all 3 servers; k8s requires swap to be turned off.
# permanent (survives a reboot)
sudo vi /etc/fstab
..
/dev/disk/by-uuid/0f14c7cc-0d65-4c31-b867-a70c91a4e496 / ext4 defaults 0 1
#/swap.img none swap sw 0 0 <--- comment out this line
# temporary (until the next reboot)
sudo swapoff -a
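The permanent fstab edit above can also be scripted. This is only a sketch: comment_out_swap is a hypothetical helper name, and it assumes a sed that accepts -i.bak and -E (GNU and BSD sed both do). It takes the file path as an argument so it can be tried on a copy first.

```shell
# comment_out_swap FILE: prefix every active swap entry in FILE with '#'.
# Hypothetical helper, not part of any tool. A backup is kept as FILE.bak.
comment_out_swap() {
  sed -i.bak -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$1"
}

# Try it on a copy before touching the real file:
#   cp /etc/fstab /tmp/fstab.test
#   comment_out_swap /tmp/fstab.test
#   diff /etc/fstab /tmp/fstab.test
```

Once the diff looks right, the same sed expression can be run with sudo against /etc/fstab.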
Only needs to be done on the master node:
gateman@k8smaster:~$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 vm1
10.0.1.152 k8smaster
10.0.1.154 k8snode0
10.0.1.153 k8snode1
...
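Adding the three host entries by hand is fine for one machine; if the step is repeated, a small idempotent helper avoids duplicate lines. A minimal sketch, assuming a hypothetical function name add_host and the IPs and host names from the table above:

```shell
# add_host FILE IP NAME: append "IP NAME" to FILE unless NAME is already mapped.
# Hypothetical helper for illustration only.
add_host() {
  if ! grep -qE "[[:space:]]$3([[:space:]]|\$)" "$1"; then
    printf '%s %s\n' "$2" "$3" >> "$1"
  fi
}

# Usage (in a root shell, since /etc/hosts is not writable by normal users):
#   add_host /etc/hosts 10.0.1.152 k8smaster
#   add_host /etc/hosts 10.0.1.154 k8snode0
#   add_host /etc/hosts 10.0.1.153 k8snode1
```

Running the same call twice leaves the file unchanged the second time.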
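Required on all 3 servers
Without this setting, some bridged IPv4 traffic bypasses the iptables chains (iptables is a packet filter in the Linux kernel: every packet passes through it and is matched against rules that decide whether it may reach the local process), and that traffic gets lost.
Configure the k8s.conf file (it does not exist by default, so create it yourself):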
gateman@k8smaster:~$ cat /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
Then run the following command so the settings take effect immediately:
sudo sysctl --system
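One detail worth knowing: the net.bridge.* keys only exist while the br_netfilter kernel module is loaded, so if sysctl reports them as missing, loading the module first usually resolves it. The sketch below wraps the file creation in a hypothetical helper, write_sysctl_conf, so it can be tried against a scratch path before the real one.

```shell
# write_sysctl_conf FILE: write the two bridge settings shown above to FILE.
# Hypothetical helper for illustration only.
write_sysctl_conf() {
  cat > "$1" <<'EOF'
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
}

# Usage (in a root shell):
#   modprobe br_netfilter        # the net.bridge.* keys need this module
#   write_sysctl_conf /etc/sysctl.d/k8s.conf
#   sysctl --system
```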
Required on all 3 servers, to keep their clocks in sync:
sudo apt-get install ntpdate
sudo ntpdate time.apple.com
The version I am installing still uses docker as its container runtime, so docker must be installed on all 3 servers.
curl -sSL https://get.daocloud.io/docker | sh
Once the script finishes, docker is installed.
The following steps are optional; they let a non-root user run docker.
sudo usermod -aG docker $USER
sudo sh -eux <<EOF
# Install newuidmap & newgidmap binaries
apt-get install -y uidmap
EOF
dockerd-rootless-setuptool.sh install
This script was already placed in /usr/bin in step 7.1.
export PATH=/usr/bin:$PATH
export DOCKER_HOST=unix:///run/user/1000/docker.sock
Once the docker ps command runs successfully, the docker installation is complete.
gateman@k8snode0:~$ source .bashrc
gateman@k8snode0:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
Required on all 3 servers
Mirror accelerator URL:
https://cr.console.aliyun.com/cn-guangzhou/instances/mirrors
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<-'EOF'
{
"registry-mirrors": ["https://xxxxx.mirror.aliyuncs.com"]
}
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
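Use the commands below to check whether the registry-mirrors change took effect.
As mentioned earlier, if you chose option 2 in step 7, this setting has no effect for the non-root user.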
docker info # current user
sudo docker info # root user
Required on all 3 servers
Reference:
https://developer.aliyun.com/mirror/kubernetes?spm=a2c6h.13651102.0.0.3e221b11i4VEO5
sudo apt-get update && sudo apt-get install -y apt-transport-https
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg |sudo apt-key add -
sudo bash -c 'cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF'
sudo apt-get update
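Required on all 3 servers
Kubernetes removed dockershim, its built-in docker integration, in version 1.24,
so this article uses version 1.22.
List the versions available in the repository: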
apt-cache madison kubectl
Install:
sudo apt-get install kubelet=1.22.15-00 kubeadm=1.22.15-00 kubectl=1.22.15-00
Enable start on boot:
sudo systemctl enable kubelet
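Because the article depends on staying on 1.22, it can help to keep the version string in one place and to stop apt from upgrading the packages later. A small sketch, where pinned_packages is a hypothetical helper name and apt-mark hold is the standard apt command for freezing a package version:

```shell
# pinned_packages VERSION: print the three package specs for one version.
# Hypothetical helper for illustration only.
pinned_packages() {
  echo "kubelet=$1 kubeadm=$1 kubectl=$1"
}

# Usage:
#   sudo apt-get install -y $(pinned_packages 1.22.15-00)
#   sudo apt-mark hold kubelet kubeadm kubectl   # keep apt upgrade from moving past 1.22
```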
This step is done only on the master node
and requires root privileges or sudo.
sudo kubeadm init \
--apiserver-advertise-address=10.0.1.152 \
--image-repository registry.aliyuncs.com/google_containers \
--kubernetes-version v1.22.15 \
--service-cidr=10.96.0.0/12 \
--pod-network-cidr=10.244.0.0/16
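Explanation:
--apiserver-advertise-address # The IP address the API server advertises that it is listening on. If unset, the default network interface is used.
--image-repository # The container registry from which control-plane images are pulled.
--service-cidr # An alternative IP range for the virtual IPs of services. The default is 10.96.0.0/12.
--pod-network-cidr # The IP range of the Pod network. Kubernetes supports several network add-ons, and each has its own requirements for --pod-network-cidr. It is set to 10.244.0.0/16 here because we will use flannel, whose default manifest expects this CIDR.
Reference: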
https://kubernetes.io/zh-cn/docs/reference/setup-tools/kubeadm/kubeadm-init/
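The node network (10.0.1.x), the service CIDR and the pod CIDR should not overlap, so a quick membership check can be handy when choosing values. The sketch below uses only POSIX shell arithmetic; ip_to_int and ip_in_cidr are hypothetical helper names, not part of kubeadm.

```shell
# ip_to_int A.B.C.D: print the IPv4 address as a 32-bit integer.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# ip_in_cidr IP CIDR: succeed if IP lies inside CIDR (for example 10.96.0.0/12).
ip_in_cidr() (
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
)

# Usage:
#   ip_in_cidr 10.0.1.152 10.96.0.0/12  && echo "master IP collides with service CIDR"
#   ip_in_cidr 10.0.1.152 10.244.0.0/16 && echo "master IP collides with pod CIDR"
```

With the values used in this article neither message is printed, since 10.0.1.152 falls outside both ranges.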
When it finishes,
list the docker images: the master's core components, such as api server and controller manager, have already been pulled.
gateman@k8smaster:~$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
registry.aliyuncs.com/google_containers/kube-apiserver v1.22.15 bd16c7ea581a 4 weeks ago 128MB
registry.aliyuncs.com/google_containers/kube-proxy v1.22.15 8f9e316d565d 4 weeks ago 104MB
registry.aliyuncs.com/google_containers/kube-controller-manager v1.22.15 c9f0999a4422 4 weeks ago 122MB
registry.aliyuncs.com/google_containers/kube-scheduler v1.22.15 b4009e7b5215 4 weeks ago 52.7MB
registry.aliyuncs.com/google_containers/etcd 3.5.0-0 004811815584 16 months ago 295MB
registry.aliyuncs.com/google_containers/coredns v1.8.4 8d147537fb7d 17 months ago 47.6MB
registry.aliyuncs.com/google_containers/pause 3.5 ed210e3e4a5b 19 months ago 683kB
List the containers: they have already been started.
gateman@k8smaster:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
561055d01b7b 8f9e316d565d "/usr/local/bin/kube…" About a minute ago Up About a minute k8s_kube-proxy_kube-proxy-n8z8f_kube-system_841711aa-049f-4493-b650-9f6994e6b08f_0
606da9a1aa4f registry.aliyuncs.com/google_containers/pause:3.5 "/pause" About a minute ago Up About a minute k8s_POD_kube-proxy-n8z8f_kube-system_841711aa-049f-4493-b650-9f6994e6b08f_0
68f1a6a78a0d b4009e7b5215 "kube-scheduler --au…" 2 minutes ago Up 2 minutes k8s_kube-scheduler_kube-scheduler-k8smaster_kube-system_a42b5163caef5e8fa98c7a573da37679_0
953da57f8a1c c9f0999a4422 "kube-controller-man…" 2 minutes ago Up 2 minutes k8s_kube-controller-manager_kube-controller-manager-k8smaster_kube-system_f065be5d9b5bc014716ced31135f0969_0
9bd589d80f0c 004811815584 "etcd --advertise-cl…" 2 minutes ago Up 2 minutes k8s_etcd_etcd-k8smaster_kube-system_0ba085541ce986b0e765918e7a2236ba_0
e1d48c26bc51 bd16c7ea581a "kube-apiserver --ad…" 2 minutes ago Up 2 minutes k8s_kube-apiserver_kube-apiserver-k8smaster_kube-system_e12ac92bf1f55f7391d09a0523658312_0
ee0f8a5c5de4 registry.aliyuncs.com/google_containers/pause:3.5 "/pause" 2 minutes ago Up 2 minutes k8s_POD_etcd-k8smaster_kube-system_0ba085541ce986b0e765918e7a2236ba_0
4f4ada54984d registry.aliyuncs.com/google_containers/pause:3.5 "/pause" 2 minutes ago Up 2 minutes k8s_POD_kube-scheduler-k8smaster_kube-system_a42b5163caef5e8fa98c7a573da37679_0
559530b98edf registry.aliyuncs.com/google_containers/pause:3.5 "/pause" 2 minutes ago Up 2 minutes k8s_POD_kube-controller-manager-k8smaster_kube-system_f065be5d9b5bc014716ced31135f0969_0
880f6a75532d registry.aliyuncs.com/google_containers/pause:3.5 "/pause" 2 minutes ago Up 2 minutes k8s_POD_kube-apiserver-k8smaster_kube-system_e12ac92bf1f55f7391d09a0523658312_0
Next, copy the kubernetes config file into the current user's home directory:
mkdir -p ~/.kube
sudo cp -i /etc/kubernetes/admin.conf ~/.kube/config
sudo chown gateman:gateman ~/.kube/config
sudo cp is used because /etc/kubernetes/admin.conf is readable only by root.
After that we can use
kubectl get nodes
to list the nodes. At this point no worker nodes have joined yet.
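This step is done only on the master node
Once the master is initialized, nodes cannot simply join at will; they need a token.
The following command prints the token together with the full command for joining the master: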
gateman@k8smaster:~$ sudo kubeadm token create --print-join-command
[sudo] password for gateman:
kubeadm join 10.0.1.152:6443 --token 7ci99c.nozdtgow4i6lz2on --discovery-token-ca-cert-hash sha256:d0ca4eb129c624b324934398e78009f73ca16bae228ad2f2bc0056294533b46b
The token is valid for 24 hours.
This step is run only on the two worker nodes
sudo kubeadm join 10.0.1.152:6443 --token 7ci99c.nozdtgow4i6lz2on --discovery-token-ca-cert-hash sha256:d0ca4eb129c624b324934398e78009f73ca16bae228ad2f2bc0056294533b46b
When it finishes, go back to the master node and list the nodes:
gateman@k8smaster:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8smaster NotReady control-plane,master 47m v1.22.15
k8snode0 NotReady <none> 18m v1.22.15
k8snode1 NotReady <none> 5m47s v1.22.15
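Both nodes have joined, but every STATUS is NotReady, because no pod network (CNI) plugin has been installed yet. Here we install flannel: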
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl apply -f ./kube-flannel.yml
Do not run this with sudo unless you have also copied /etc/kubernetes/admin.conf into root's home directory.
When it finishes we see output like the following:
namespace/kube-flannel created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
A namespace named kube-flannel has been created.
Let's look at the pods in this namespace:
gateman@k8smaster:~$ kubectl get pods -n kube-flannel
NAME READY STATUS RESTARTS AGE
kube-flannel-ds-7hkw7 0/1 Init:0/2 0 115s
kube-flannel-ds-pkmtk 0/1 Init:1/2 0 115s
kube-flannel-ds-rnqqz 0/1 Init:1/2 0 115s
Three pods are initializing, one for each of the 3 nodes. Let's wait a while.
After more than 20 minutes, two of the pods have failed:
gateman@k8smaster:~$ kubectl get pods -n kube-flannel
NAME READY STATUS RESTARTS AGE
kube-flannel-ds-7hkw7 0/1 Init:ImagePullBackOff 0 21m
kube-flannel-ds-pkmtk 0/1 Init:ImagePullBackOff 0 21m
kube-flannel-ds-rnqqz 1/1 Running 0 21m
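We have hit a pitfall. The good news is that at least one node is ready. After some googling, the most likely cause is that pulling the image from outside the Great Firewall failed.
First, on the master, check the name and version of the flannel image: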
gateman@k8smaster:~$ cat kube-flannel.yml | grep -i flannel:v
#image: flannelcni/flannel:v0.20.0 for ppc64le and mips64le (dockerhub limitations may apply)
image: docker.io/rancher/mirrored-flannelcni-flannel:v0.20.0
#image: flannelcni/flannel:v0.20.0 for ppc64le and mips64le (dockerhub limitations may apply)
image: docker.io/rancher/mirrored-flannelcni-flannel:v0.20.0
Go to the node that was able to pull the image
and save the image to a file:
docker save -o flannel.tar.gz docker.io/rancher/mirrored-flannelcni-flannel:v0.20.0
Then copy it to the other nodes:
scp flannel.tar.gz gateman@k8smaster:/home/gateman
scp flannel.tar.gz gateman@k8snode1:/home/gateman
On each node where the pull failed, load the file into the local image list:
docker load -i flannel.tar.gz
Back on the master node, recreate the resources in the kube-flannel namespace:
kubectl replace --force -f kube-flannel.yml
or delete the namespace
and reinstall the flannel pods:
kubectl delete namespace kube-flannel --force
kubectl apply -f ./kube-flannel.yml
Everything finally starts, and all 3 nodes are Ready!
gateman@k8smaster:~$ kubectl get pods -n kube-flannel
NAME READY STATUS RESTARTS AGE
kube-flannel-ds-8mdsm 1/1 Running 0 67s
kube-flannel-ds-ctl8g 1/1 Running 0 67s
kube-flannel-ds-skzsh 1/1 Running 0 67s
gateman@k8smaster:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8smaster Ready control-plane,master 156m v1.22.15
k8snode0 Ready <none> 128m v1.22.15
k8snode1 Ready <none> 115m v1.22.15
After all of the above, the k8s test cluster is up.
Let's create a test pod.
First, deploy an nginx.
Run on the master:
kubectl create deployment nginx --image=nginx
Check the status with the get pods command:
gateman@k8smaster:~$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-865fffb4b4-xxmhz 0/1 ContainerCreating 0 6s
gateman@k8smaster:~$
It is still being created, so all we can do is wait.
Sure enough, there is another pitfall:
gateman@k8smaster:~$ kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx-6799fc88d8-2529p 0/1 ImageInspectError 0 94s
Searching Baidu alone is not enough for problems like this; we need Google as well.
First we use
kubectl describe pod <podname>
to find the cause of the error:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 111s default-scheduler Successfully assigned default/nginx-6799fc88d8-2529p to k8snode1
Normal Pulling 110s kubelet Pulling image "nginx"
Normal Pulled 82s kubelet Successfully pulled image "nginx" in 27.697366196s
Warning FailedCreatePodSandBox 31s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "32a56797b16ec7c3e32e5c14a79855c239cf12138e95b20caf7239838907ffc4" network for pod "nginx-6799fc88d8-2529p": networkPlugin cni failed to set up pod "nginx-6799fc88d8-2529p_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 29s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "1fccf177b27a523eddb2ab0c8a40916e1e23796775da6450c33eda7f98f8189c" network for pod "nginx-6799fc88d8-2529p": networkPlugin cni failed to set up pod "nginx-6799fc88d8-2529p_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 27s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "f28026b9db8e68c113b4ae343bee90aa1d190a72017437a272515394a169ac23" network for pod "nginx-6799fc88d8-2529p": networkPlugin cni failed to set up pod "nginx-6799fc88d8-2529p_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 25s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "1e39a3f87d7f0ba2eb87ea8d44704852e490f5e49bddee28d77ef776ad0f2f19" network for pod "nginx-6799fc88d8-2529p": networkPlugin cni failed to set up pod "nginx-6799fc88d8-2529p_default" network: open /run/flannel/subnet.env: no such file or directory
Normal SandboxChanged 22s (x6 over 34s) kubelet Pod sandbox changed, it will be killed and re-created.
Warning InspectFailed 6s (x5 over 23s) kubelet Failed to inspect image "nginx": rpc error: code = Unknown desc = Error response from daemon: readlink /var/lib/docker/overlay2: invalid argument
Warning Failed 6s (x5 over 23s) kubelet Error: ImageInspectError
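We can see that this pod was scheduled to k8snode1.
The reported cause of the failure is that the file /run/flannel/subnet.env cannot be opened.
Let's check on node1 right away: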
gateman@k8snode1:~$ cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.2.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
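The file is readable, so the reported cause is probably inaccurate, which is frustrating.
The word flannel suggests the problem is related to the CNI network plugin.
As a last-ditch attempt,
first remove node1 from the cluster.
Run on the master: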
kubectl delete node k8snode1
Then delete the old certificates on node1:
sudo rm -f /var/lib/kubelet/pki/*
Restart kubelet on the node:
sudo systemctl restart kubelet
Reinstall nginx:
kubectl delete deployment nginx
kubectl create deployment nginx --image=nginx
Now we hit the second pitfall:
Warning FailedCreatePodSandBox 5m9s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "e6b2acc95edb96e5743da34eb8cca515ae933758e7dc22fbdd88525af9f14085" network for pod "nginx-6799fc88d8-qzqpr": networkPlugin cni failed to set up pod "nginx-6799fc88d8-qzqpr_default" network: failed to delegate add: failed to set bridge addr: "cni0" already has an IP address different from 10.244.3.1/24
In short, the cni0 bridge interface on node1 should have the address 10.244.3.1, but it actually has a different one.
What on earth is going on?
Check on node1:
gateman@k8snode1:~$ ifconfig
cni0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 10.244.2.1 netmask 255.255.255.0 broadcast 10.244.2.255
inet6 fe80::3867:48ff:fe71:b759 prefixlen 64 scopeid 0x20<link>
ether 3a:67:48:71:b7:59 txqueuelen 1000 (Ethernet)
RX packets 82 bytes 2296 (2.2 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 153 bytes 16750 (16.7 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
gateman@k8snode1:~$ cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.3.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
Sure enough, the two do not match.
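The comparison done by eye above (the cni0 address versus FLANNEL_SUBNET) can be sketched as a small check. flannel_subnet and bridge_matches_subnet are hypothetical helper names; the usage lines assume the ip command from iproute2 is available on the node.

```shell
# flannel_subnet FILE: print the FLANNEL_SUBNET value from a subnet.env-style file.
flannel_subnet() {
  sed -n 's/^FLANNEL_SUBNET=//p' "$1"
}

# bridge_matches_subnet ADDR SUBNET: succeed if the bridge address with prefix
# (for example 10.244.2.1/24) equals the subnet flannel assigned to this node.
bridge_matches_subnet() {
  [ "$1" = "$2" ]
}

# Usage on a node:
#   want=$(flannel_subnet /run/flannel/subnet.env)
#   have=$(ip -o -4 addr show dev cni0 | awk '{print $4}')
#   bridge_matches_subnet "$have" "$want" || echo "cni0 has $have, flannel expects $want"
```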
Fix (found via Google):
delete the cni0 interface; k8s will recreate it by itself.
sudo ifconfig cni0 down
sudo ip link delete cni0
Reinstall nginx once more.
This time it finally works:
gateman@k8smaster:~$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-6799fc88d8-74n47 1/1 Running 0 19s
Although nginx is now running, it still cannot be reached from hosts outside the k8s cluster, so we need to expose a port:
kubectl expose deployment nginx --port=80 --type=NodePort
For the --type=NodePort parameter, see:
https://dockone.io/article/4884
Next, the following command shows which port was exposed (the mapping):
kubectl get pod,svc
gateman@k8smaster:~$ kubectl get pod,svc
NAME READY STATUS RESTARTS AGE
pod/nginx-6799fc88d8-74n47 1/1 Running 0 22m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 4h12m
service/nginx NodePort 10.100.66.251 <none> 80:31407/TCP 62s
We can then reach nginx at the address of any node (including the master) on port 31407.
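The node port is assigned at random, so it differs from cluster to cluster. As a sketch, the hypothetical helper node_port_of below pulls it out of a PORT(S) value like the one in the output above; the commented usage lines show kubectl's jsonpath output as a more direct route, followed by a curl check against the master's IP.

```shell
# node_port_of PORTS: extract the node port from a PORT(S) value
# such as "80:31407/TCP". Hypothetical helper for illustration only.
node_port_of() {
  echo "$1" | sed -E 's#^[0-9]+:([0-9]+)/.*$#\1#'
}

# Usage:
#   node_port_of "80:31407/TCP"
#   port=$(kubectl get svc nginx -o jsonpath='{.spec.ports[0].nodePort}')
#   curl -I "http://10.0.1.152:$port"
```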
Summary:
Installing k8s is really not a friendly experience, even though I already chose the simplest method (kubeadm).