K8S - Installation Guide and Hands-On Notes (kubeadm)

The industry keeps getting more competitive, so as an older programmer I think learning K8S is well worth it.

Installation topology

A very simple setup: one master node with two worker nodes.


Hardware requirements

For a test environment:

Node    Memory  Disk
master  4G      20G
node    8G      20G

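These numbers are enough to put people off. A cloud server with 8G or more costs over 2,000 yuan a year, and buying two is painful. My suggestion is to buy two mini PCs on Taobao and run virtual machines at home.

The master needs 4G of memory because k8s runs several key components on the master node, such as the api-server and the controller manager.

Reference: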
https://blog.csdn.net/nvd11/article/details/127355709



Server preparation

Node    VM IP       Host                      Host OS              Hypervisor  Guest OS             vCPU threads  VM memory  Disk
master  10.0.1.152  NUC7i5BNK                 Win10                VirtualBox  Ubuntu Server 22.04  2             4G         20G
node 0  10.0.1.154  Thinkpad x230             Ubuntu Server 22.04  kvm         Ubuntu Server 22.04  2             8G         20G
node 1  10.0.1.153  Taobao industrial PC, 8G  Ubuntu Server 22.04  kvm         Ubuntu Server 22.04  2             4G         20G

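The three VMs must be able to reach each other, so it is best to put all three in bridged networking mode.

With NAT mode you would have to create all three VMs on the same host, and that host's memory would be under a lot of pressure.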



Step 0, switch the apt mirror

gateman@k8snode1:~$ cat /etc/apt/sources.list
# Source (deb-src) mirrors are commented out by default to speed up apt update; uncomment them if you need them
deb http://mirrors.163.com/ubuntu/ jammy main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ jammy-security main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ jammy-updates main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ jammy-backports main restricted universe multiverse



Step 1, disable the firewall

Required on all three servers.

sudo systemctl stop ufw
sudo systemctl disable ufw



Step 2, change the hostname (optional)

# master with 4G of memory
sudo hostnamectl set-hostname k8smaster

# node0 with 8G of memory
sudo hostnamectl set-hostname k8snode0

# node1 with 4G of memory
sudo hostnamectl set-hostname k8snode1



Step 3, disable swap

Required on all three servers; this is a k8s requirement.

# permanent (survives a reboot)
sudo vi /etc/fstab

..

/dev/disk/by-uuid/0f14c7cc-0d65-4c31-b867-a70c91a4e496 / ext4 defaults 0 1
#/swap.img	none	swap	sw	0	0   <--- comment out this line

# temporary (until the next reboot)
sudo swapoff -a
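
The fstab edit above can also be done without opening an editor. The following is only a minimal sketch and not part of my original steps: the function name comment_out_swap is made up for illustration, and it assumes a swap entry is an uncommented line whose third field is "swap". It prints the result to stdout so you can review it before touching the real file.

```shell
# Hypothetical helper (not from the original steps): print an fstab-style file
# with every active swap entry commented out.
# Assumption: a swap entry is a non-comment line whose 3rd field is "swap".
comment_out_swap() {
    local fstab="${1:-/etc/fstab}"
    awk '
        # Leave blank lines and existing comments untouched.
        /^[ \t]*(#|$)/ { print; next }
        # The third whitespace-separated field is the filesystem type.
        $3 == "swap"   { print "#" $0; next }
                       { print }
    ' "$fstab"
}

# Example (review the output before replacing the real file):
#   comment_out_swap /etc/fstab > /tmp/fstab.new
#   sudo cp /etc/fstab /etc/fstab.bak && sudo cp /tmp/fstab.new /etc/fstab
```

Writing to a separate file first keeps a mistake in the pattern from leaving the machine with a broken /etc/fstab.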



Step 4, add hosts entries on the master node (including the master itself)

Only needed on the master node.

gateman@k8smaster:~$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 vm1

10.0.1.152 k8smaster
10.0.1.154 k8snode0
10.0.1.153 k8snode1
...
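
If you would rather append the entries from a script, the sketch below prints only the entries that are still missing, so running it twice does not create duplicates. The function name missing_host_entries is hypothetical; the IPs and hostnames are the ones from the table above.

```shell
# Hypothetical helper: read "IP NAME" pairs on stdin and print those whose
# NAME does not yet appear in the given hosts file.
missing_host_entries() {
    local hosts_file="${1:-/etc/hosts}" ip name
    while read -r ip name; do
        # Skip blank or incomplete input lines.
        [ -n "$name" ] || continue
        # -F: fixed string, -w: whole word, so k8snode1 does not match k8snode10.
        grep -qwF -- "$name" "$hosts_file" || printf '%s %s\n' "$ip" "$name"
    done
}

# Example usage on the master:
#   new=$(printf '10.0.1.152 k8smaster\n10.0.1.154 k8snode0\n10.0.1.153 k8snode1\n' \
#         | missing_host_entries /etc/hosts)
#   [ -n "$new" ] && printf '%s\n' "$new" | sudo tee -a /etc/hosts
```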



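Step 5, pass bridged IPv4 traffic to the iptables chains

Required on all three servers.

Without this setting, some bridged IPv4 traffic bypasses the iptables chains (the Linux kernel packet filter that every packet passes through and is matched against before it is handed to the receiving process), and that traffic gets lost.

Configure the k8s.conf file (it does not exist by default; you have to create it yourself):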

gateman@k8smaster:~$ cat /etc/sysctl.d/k8s.conf 
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1

Then run the following command to apply it immediately:

sudo sysctl --system 
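
The step above only shows the finished file. A minimal sketch of creating it from the shell is below; the helper name k8s_sysctl_conf is made up, while the keys and values are exactly the two shown above. One thing I did not cover: the net.bridge.* keys only exist while the br_netfilter kernel module is loaded, so the usage comments load it first.

```shell
# Hypothetical helper: print the two bridge settings from this step in
# sysctl.d format so they can be piped into the config file.
k8s_sysctl_conf() {
    cat <<'EOF'
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
}

# Example usage on each server:
#   sudo modprobe br_netfilter     # net.bridge.* keys need this module
#   k8s_sysctl_conf | sudo tee /etc/sysctl.d/k8s.conf
#   sudo sysctl --system
#   sysctl net.bridge.bridge-nf-call-iptables    # check the applied value
```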



Step 6, time synchronization

Required on all three servers.

sudo apt-get install ntpdate
sudo ntpdate time.apple.com



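Step 7, install docker

The k8s version I am installing still uses docker as its container runtime, so docker has to be installed on all three servers.

7.1 First, run the official install script (fetched here through the DaoCloud mirror)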
curl -sSL https://get.daocloud.io/docker | sh

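When the script finishes, docker is installed.

The following steps are optional; they let a non-root user run docker.

7.2 Option 1: add the current user to the docker group (log out and back in for the change to take effect)
sudo usermod -aG docker $USER
7.3 Option 2: use the official rootless script. This option has a serious flaw: any change made through daemon.json has no effect for the non-root user, so I do not recommend it.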
7.3.1 Install uidmap
sudo sh -eux <<EOF
# Install newuidmap & newgidmap binaries
apt-get install -y uidmap
EOF
7.3.2 Run the official rootless setup script
dockerd-rootless-setuptool.sh install

Step 7.1 already placed this script in /usr/bin.

7.3.3 Add the following two environment variables to ~/.bashrc
export PATH=/usr/bin:$PATH
export DOCKER_HOST=unix:///run/user/1000/docker.sock






Once the docker ps command runs successfully, docker is installed.

gateman@k8snode0:~$ source .bashrc
gateman@k8snode0:~$ docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES



Step 8, configure the Aliyun registry mirror for docker

Required on all three servers.

Accelerator URL:
https://cr.console.aliyun.com/cn-guangzhou/instances/mirrors

sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<-'EOF'
{
  "registry-mirrors": ["https://xxxxx.mirror.aliyuncs.com"]
}
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker

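Use the commands below to check whether the registry-mirrors change took effect.
As mentioned above, if you chose option 2 in step 7, the change has no effect for the non-root user.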

docker info # current user
sudo docker info # root user



Step 9, configure the Aliyun kubernetes apt source

Required on all three servers.

Reference:
https://developer.aliyun.com/mirror/kubernetes?spm=a2c6h.13651102.0.0.3e221b11i4VEO5

sudo apt-get update && sudo apt-get install -y apt-transport-https
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg |sudo apt-key add - 

sudo bash -c 'cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF'

sudo apt-get update



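Step 10, install kubeadm, kubelet and kubectl

Required on all three servers.
Kubernetes removed its built-in docker support (dockershim) in version 1.24,

so this article uses version 1.22.

List the versions available in the source: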

apt-cache madison kubectl

Install:

sudo apt-get install kubelet=1.22.15-00 kubeadm=1.22.15-00 kubectl=1.22.15-00

Enable start on boot:

sudo systemctl enable kubelet



Step 11, deploy the master node

This step is performed only on the master node.
It requires root privileges or sudo.

sudo kubeadm init \
--apiserver-advertise-address=10.0.1.152 \
--image-repository registry.aliyuncs.com/google_containers \
--kubernetes-version v1.22.15 \
--service-cidr=10.96.0.0/12 \
--pod-network-cidr=10.244.0.0/16

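Explanation:
--apiserver-advertise-address # The IP address the API server advertises that it is listening on. If not set, the default network interface is used.

--image-repository # The container registry to pull the control plane images from.

--service-cidr # An alternative IP address range for the virtual IPs of services. The default is 10.96.0.0/12.

--pod-network-cidr # The IP address range of the pod network. Kubernetes supports several network add-ons, and each has its own requirements for --pod-network-cidr. It is set to 10.244.0.0/16 here because we will use flannel, and this is the network configured in flannel's default manifest.

Reference: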
https://kubernetes.io/zh-cn/docs/reference/setup-tools/kubeadm/kubeadm-init/

When it finishes, list the docker images. The images for the master's core components, such as the api server and the controller manager, have been pulled.

gateman@k8smaster:~$ docker images
REPOSITORY                                                        TAG        IMAGE ID       CREATED         SIZE
registry.aliyuncs.com/google_containers/kube-apiserver            v1.22.15   bd16c7ea581a   4 weeks ago     128MB
registry.aliyuncs.com/google_containers/kube-proxy                v1.22.15   8f9e316d565d   4 weeks ago     104MB
registry.aliyuncs.com/google_containers/kube-controller-manager   v1.22.15   c9f0999a4422   4 weeks ago     122MB
registry.aliyuncs.com/google_containers/kube-scheduler            v1.22.15   b4009e7b5215   4 weeks ago     52.7MB
registry.aliyuncs.com/google_containers/etcd                      3.5.0-0    004811815584   16 months ago   295MB
registry.aliyuncs.com/google_containers/coredns                   v1.8.4     8d147537fb7d   17 months ago   47.6MB
registry.aliyuncs.com/google_containers/pause                     3.5        ed210e3e4a5b   19 months ago   683kB

List the containers; they are already running.

gateman@k8smaster:~$ docker ps
CONTAINER ID   IMAGE                                               COMMAND                  CREATED              STATUS              PORTS     NAMES
561055d01b7b   8f9e316d565d                                        "/usr/local/bin/kube…"   About a minute ago   Up About a minute             k8s_kube-proxy_kube-proxy-n8z8f_kube-system_841711aa-049f-4493-b650-9f6994e6b08f_0
606da9a1aa4f   registry.aliyuncs.com/google_containers/pause:3.5   "/pause"                 About a minute ago   Up About a minute             k8s_POD_kube-proxy-n8z8f_kube-system_841711aa-049f-4493-b650-9f6994e6b08f_0
68f1a6a78a0d   b4009e7b5215                                        "kube-scheduler --au…"   2 minutes ago        Up 2 minutes                  k8s_kube-scheduler_kube-scheduler-k8smaster_kube-system_a42b5163caef5e8fa98c7a573da37679_0
953da57f8a1c   c9f0999a4422                                        "kube-controller-man…"   2 minutes ago        Up 2 minutes                  k8s_kube-controller-manager_kube-controller-manager-k8smaster_kube-system_f065be5d9b5bc014716ced31135f0969_0
9bd589d80f0c   004811815584                                        "etcd --advertise-cl…"   2 minutes ago        Up 2 minutes                  k8s_etcd_etcd-k8smaster_kube-system_0ba085541ce986b0e765918e7a2236ba_0
e1d48c26bc51   bd16c7ea581a                                        "kube-apiserver --ad…"   2 minutes ago        Up 2 minutes                  k8s_kube-apiserver_kube-apiserver-k8smaster_kube-system_e12ac92bf1f55f7391d09a0523658312_0
ee0f8a5c5de4   registry.aliyuncs.com/google_containers/pause:3.5   "/pause"                 2 minutes ago        Up 2 minutes                  k8s_POD_etcd-k8smaster_kube-system_0ba085541ce986b0e765918e7a2236ba_0
4f4ada54984d   registry.aliyuncs.com/google_containers/pause:3.5   "/pause"                 2 minutes ago        Up 2 minutes                  k8s_POD_kube-scheduler-k8smaster_kube-system_a42b5163caef5e8fa98c7a573da37679_0
559530b98edf   registry.aliyuncs.com/google_containers/pause:3.5   "/pause"                 2 minutes ago        Up 2 minutes                  k8s_POD_kube-controller-manager-k8smaster_kube-system_f065be5d9b5bc014716ced31135f0969_0
880f6a75532d   registry.aliyuncs.com/google_containers/pause:3.5   "/pause"                 2 minutes ago        Up 2 minutes                  k8s_POD_kube-apiserver-k8smaster_kube-system_e12ac92bf1f55f7391d09a0523658312_0

Next, copy the kubernetes config file into the current user's home directory:

mkdir -p ~/.kube
sudo cp -i /etc/kubernetes/admin.conf ~/.kube/config
sudo chown gateman:gateman ~/.kube/config

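sudo cp is needed because /etc/kubernetes/admin.conf is readable only by root.
After that we can use

kubectl get nodes

to list the nodes. Of course, no worker node has joined yet, so only the master itself shows up.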



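Step 12, get the token for joining the master

This step is performed only on the master node.

Once the master is initialized, nodes on the network cannot simply join at will; they need a token.
The command below prints the token together with the full command for joining the master: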

gateman@k8smaster:~$ sudo kubeadm token create --print-join-command
[sudo] password for gateman: 
kubeadm join 10.0.1.152:6443 --token 7ci99c.nozdtgow4i6lz2on --discovery-token-ca-cert-hash sha256:d0ca4eb129c624b324934398e78009f73ca16bae228ad2f2bc0056294533b46b 

The token is valid for 24 hours.
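
If you want to script the join instead of copying the whole line by hand, the token and the CA hash can be pulled out of the printed command. This is just a sketch: the helper name join_flag_value is made up, and it assumes the "--flag value" layout shown in the output above.

```shell
# Hypothetical helper: print the value that follows a given flag in a
# "kubeadm join ..." command line (assumes the "--flag value" form).
join_flag_value() {
    local flag="$1" cmd="$2" prev="" word
    # Intentionally unquoted: split the command line into words.
    for word in $cmd; do
        if [ "$prev" = "$flag" ]; then
            printf '%s\n' "$word"
            return 0
        fi
        prev="$word"
    done
    return 1    # flag not found
}

# Example usage on the master:
#   cmd=$(sudo kubeadm token create --print-join-command)
#   join_flag_value --token "$cmd"
#   join_flag_value --discovery-token-ca-cert-hash "$cmd"
```

Existing tokens and their expiry times can be checked with sudo kubeadm token list.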



Step 13, join the Kubernetes nodes

Run this step only on the two worker nodes.

sudo kubeadm join 10.0.1.152:6443 --token 7ci99c.nozdtgow4i6lz2on --discovery-token-ca-cert-hash sha256:d0ca4eb129c624b324934398e78009f73ca16bae228ad2f2bc0056294533b46b 

When it finishes, go back to the master node and list the nodes:

gateman@k8smaster:~$ kubectl get nodes
NAME        STATUS     ROLES                  AGE     VERSION
k8smaster   NotReady   control-plane,master   47m     v1.22.15
k8snode0    NotReady   <none>                 18m     v1.22.15
k8snode1    NotReady   <none>                 5m47s   v1.22.15

Both nodes have joined, but every status is NotReady because no CNI network plugin has been installed yet.
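
While waiting for nodes to come up, it is handy to print only the ones that are not Ready yet. A small sketch (the function name not_ready_nodes is made up) that filters the kubectl get nodes output shown above:

```shell
# Hypothetical helper: read "kubectl get nodes" output on stdin and print the
# names of nodes whose STATUS column is not exactly "Ready".
# Note: a status like "Ready,SchedulingDisabled" is also reported.
not_ready_nodes() {
    # NR > 1 skips the header line; $1 is NAME, $2 is STATUS.
    awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Example usage on the master:
#   kubectl get nodes | not_ready_nodes
```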



Step 14, deploy the CNI network plugin

wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

kubectl apply -f ./kube-flannel.yml

Do not run this with sudo unless you have also copied /etc/kubernetes/admin.conf into root's home directory.

When it finishes we see the following output:

namespace/kube-flannel created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created

A namespace named kube-flannel has been created.
Let's look at the pods in this namespace:

gateman@k8smaster:~$ kubectl get pods -n kube-flannel
NAME                    READY   STATUS     RESTARTS   AGE
kube-flannel-ds-7hkw7   0/1     Init:0/2   0          115s
kube-flannel-ds-pkmtk   0/1     Init:1/2   0          115s
kube-flannel-ds-rnqqz   0/1     Init:1/2   0          115s

Three pods are initializing, one for each of the 3 nodes. Let's wait a bit.

After more than 20 minutes, two of the pods have failed:

gateman@k8smaster:~$ kubectl get pods -n kube-flannel
NAME                    READY   STATUS                  RESTARTS   AGE
kube-flannel-ds-7hkw7   0/1     Init:ImagePullBackOff   0          21m
kube-flannel-ds-pkmtk   0/1     Init:ImagePullBackOff   0          21m
kube-flannel-ds-rnqqz   1/1     Running                 0          21m

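We hit a snag. The good news is that at least one node is ready. After some googling, the most likely cause is that pulling the image from outside the Great Firewall failed.

First, on the master, I check the flannel image name and version: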

gateman@k8smaster:~$ cat kube-flannel.yml | grep -i flannel:v
       #image: flannelcni/flannel:v0.20.0 for ppc64le and mips64le (dockerhub limitations may apply)
        image: docker.io/rancher/mirrored-flannelcni-flannel:v0.20.0
       #image: flannelcni/flannel:v0.20.0 for ppc64le and mips64le (dockerhub limitations may apply)
        image: docker.io/rancher/mirrored-flannelcni-flannel:v0.20.0
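
The grep above only matches the flannel:v tag. A slightly more general sketch is below: it lists every distinct image that an uncommented image: line in a manifest refers to, so nothing is missed when pre-pulling. The function name manifest_images is hypothetical.

```shell
# Hypothetical helper: print the unique container images referenced by
# uncommented "image:" lines in a manifest file.
manifest_images() {
    awk '
        $1 == "image:"              { print $2 }   # "image: repo/name:tag"
        $1 == "-" && $2 == "image:" { print $3 }   # "- image: repo/name:tag"
    ' "$1" | sort -u
}

# Example usage (pull every image on a node that can reach the registry):
#   manifest_images kube-flannel.yml | xargs -n1 docker pull
```

Lines such as "#image: ..." are skipped because their first field is "#image:" rather than "image:".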

Go to the node that was able to pull the image
and save the image to a file:

docker save -o flannel.tar.gz docker.io/rancher/mirrored-flannelcni-flannel:v0.20.0

Then copy it to the other nodes:

scp flannel.tar.gz gateman@k8smaster:/home/gateman
scp flannel.tar.gz gateman@k8snode1:/home/gateman

On each node where the pull failed, load the file into the local image list:

docker load -i flannel.tar.gz

Back on the master node, force-recreate the flannel resources:

kubectl replace --force -f kube-flannel.yml

Or delete the namespace
and reinstall the flannel pods:

kubectl delete namespace kube-flannel --force
kubectl apply -f ./kube-flannel.yml

They finally start successfully, and all 3 nodes are Ready!

gateman@k8smaster:~$ kubectl get pods -n kube-flannel
NAME                    READY   STATUS    RESTARTS   AGE
kube-flannel-ds-8mdsm   1/1     Running   0          67s
kube-flannel-ds-ctl8g   1/1     Running   0          67s
kube-flannel-ds-skzsh   1/1     Running   0          67s
gateman@k8smaster:~$ kubectl get nodes
NAME        STATUS   ROLES                  AGE    VERSION
k8smaster   Ready    control-plane,master   156m   v1.22.15
k8snode0    Ready    <none>                 128m   v1.22.15
k8snode1    Ready    <none>                 115m   v1.22.15



Step 15, test

After all of the above, the k8s test cluster is up.

Let's create a test pod.
First, deploy an nginx.

Run on the master:

kubectl create deployment nginx --image=nginx

Check the status with the get pods command:

gateman@k8smaster:~$ kubectl get pods
NAME                     READY   STATUS              RESTARTS   AGE
nginx-865fffb4b4-xxmhz   0/1     ContainerCreating   0          6s
gateman@k8smaster:~$ 

It is still being created, so all we can do is wait.

Sure enough, another snag:

gateman@k8smaster:~$ kubectl get pod
NAME                     READY   STATUS              RESTARTS   AGE
nginx-6799fc88d8-2529p   0/1     ImageInspectError   0          94s

Knowing only Baidu is not enough when you hit a snag; we need Google as well.

First we use

kubectl describe pod <podname>

to find the cause of the error:

Events:
  Type     Reason                  Age                From               Message
  ----     ------                  ----               ----               -------
  Normal   Scheduled               111s               default-scheduler  Successfully assigned default/nginx-6799fc88d8-2529p to k8snode1
  Normal   Pulling                 110s               kubelet            Pulling image "nginx"
  Normal   Pulled                  82s                kubelet            Successfully pulled image "nginx" in 27.697366196s
  Warning  FailedCreatePodSandBox  31s                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "32a56797b16ec7c3e32e5c14a79855c239cf12138e95b20caf7239838907ffc4" network for pod "nginx-6799fc88d8-2529p": networkPlugin cni failed to set up pod "nginx-6799fc88d8-2529p_default" network: open /run/flannel/subnet.env: no such file or directory
  Warning  FailedCreatePodSandBox  29s                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "1fccf177b27a523eddb2ab0c8a40916e1e23796775da6450c33eda7f98f8189c" network for pod "nginx-6799fc88d8-2529p": networkPlugin cni failed to set up pod "nginx-6799fc88d8-2529p_default" network: open /run/flannel/subnet.env: no such file or directory
  Warning  FailedCreatePodSandBox  27s                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "f28026b9db8e68c113b4ae343bee90aa1d190a72017437a272515394a169ac23" network for pod "nginx-6799fc88d8-2529p": networkPlugin cni failed to set up pod "nginx-6799fc88d8-2529p_default" network: open /run/flannel/subnet.env: no such file or directory
  Warning  FailedCreatePodSandBox  25s                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "1e39a3f87d7f0ba2eb87ea8d44704852e490f5e49bddee28d77ef776ad0f2f19" network for pod "nginx-6799fc88d8-2529p": networkPlugin cni failed to set up pod "nginx-6799fc88d8-2529p_default" network: open /run/flannel/subnet.env: no such file or directory
  Normal   SandboxChanged          22s (x6 over 34s)  kubelet            Pod sandbox changed, it will be killed and re-created.
  Warning  InspectFailed           6s (x5 over 23s)   kubelet            Failed to inspect image "nginx": rpc error: code = Unknown desc = Error response from daemon: readlink /var/lib/docker/overlay2: invalid argument
  Warning  Failed                  6s (x5 over 23s)   kubelet            Error: ImageInspectError

The pod was scheduled onto k8snode1.
The reported cause of the failure is that the file /run/flannel/subnet.env cannot be opened.
Let's check on node1 right away:

gateman@k8snode1:~$ cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.2.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

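The file is readable, so the reported cause is probably inaccurate. How annoying.
Since the message mentions flannel, the problem is probably related to the CNI network plugin.

Time for a last-ditch attempt.

First remove node1 from the cluster.
Run on the master: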

kubectl delete node k8snode1

Then delete the old certificates on node1:

sudo rm -f /var/lib/kubelet/pki/*

Restart kubelet on the node:

sudo systemctl restart kubelet

Reinstall nginx:

kubectl delete deployment nginx
kubectl create deployment nginx --image=nginx

This leads to a second snag:

  Warning  FailedCreatePodSandBox  5m9s                   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "e6b2acc95edb96e5743da34eb8cca515ae933758e7dc22fbdd88525af9f14085" network for pod "nginx-6799fc88d8-qzqpr": networkPlugin cni failed to set up pod "nginx-6799fc88d8-qzqpr_default" network: failed to delegate add: failed to set bridge addr: "cni0" already has an IP address different from 10.244.3.1/24

In short, node1's bridge interface cni0 is expected to have the address 10.244.3.1/24, but it actually has a different one.

What on earth is this?

Check on node1:

gateman@k8snode1:~$ ifconfig
cni0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 10.244.2.1  netmask 255.255.255.0  broadcast 10.244.2.255
        inet6 fe80::3867:48ff:fe71:b759  prefixlen 64  scopeid 0x20<link>
        ether 3a:67:48:71:b7:59  txqueuelen 1000  (Ethernet)
        RX packets 82  bytes 2296 (2.2 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 153  bytes 16750 (16.7 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0


gateman@k8snode1:~$ cat /run/flannel/subnet.env 
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.3.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

Sure enough, they do not match.
Fix (from Google):
delete the cni0 interface; k8s recreates it on its own.

sudo ifconfig cni0 down
sudo ip link delete cni0
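
The comparison I did by eye between ifconfig and subnet.env can be scripted. The sketch below is only an illustration: the function name cni0_matches_flannel is made up, and it assumes subnet.env contains a FLANNEL_SUBNET=address/prefix line like the one shown above.

```shell
# Hypothetical helper: succeed if the given interface address (for example
# 10.244.2.1) equals the address part of FLANNEL_SUBNET in a subnet.env file.
cni0_matches_flannel() {
    local addr="$1" env_file="${2:-/run/flannel/subnet.env}" subnet
    # Extract the value after "FLANNEL_SUBNET=", e.g. 10.244.3.1/24.
    subnet=$(sed -n 's/^FLANNEL_SUBNET=//p' "$env_file")
    # Strip the /prefix length before comparing.
    [ -n "$subnet" ] && [ "${subnet%/*}" = "$addr" ]
}

# Example usage on a worker node:
#   addr=$(ip -4 -o addr show cni0 | awk '{print $4}' | cut -d/ -f1)
#   cni0_matches_flannel "$addr" || echo "cni0 ($addr) does not match the flannel subnet"
```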

Reinstall nginx once more.
This time it finally works:

gateman@k8smaster:~$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
nginx-6799fc88d8-74n47   1/1     Running   0          19s

Although nginx is running, it still cannot be reached from hosts outside the k8s cluster. We need to expose a port:

kubectl expose deployment nginx --port=80 --type=NodePort

For the --type=NodePort parameter, see
https://dockone.io/article/4884

Next, use the command below to see which port was exposed (the mapping):

kubectl get pod,svc
gateman@k8smaster:~$ kubectl get pod,svc
NAME                         READY   STATUS    RESTARTS   AGE
pod/nginx-6799fc88d8-74n47   1/1     Running   0          22m

NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
service/kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP        4h12m
service/nginx        NodePort    10.100.66.251   <none>        80:31407/TCP   62s

Now we can try to reach nginx at the address of any node (including the master) on port 31407.
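
To avoid reading the port off the table by hand, the node port can be cut out of the PORT(S) value. A minimal sketch, with a made-up function name node_port_of, assuming a NodePort-style value of the form servicePort:nodePort/protocol:

```shell
# Hypothetical helper: extract the node port from a PORT(S) value such as
# "80:31407/TCP" (service port, then node port, then protocol).
node_port_of() {
    local ports="$1"
    ports="${ports#*:}"             # drop the service port and the colon
    printf '%s\n' "${ports%%/*}"    # drop the slash and the protocol
}

# Example usage from any machine that can reach the nodes:
#   curl -I "http://10.0.1.152:$(node_port_of 80:31407/TCP)"
# kubectl can also print the node port directly:
#   kubectl get svc nginx -o jsonpath='{.spec.ports[0].nodePort}'
```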

Summary:
Installing k8s is really not a friendly experience, even though I picked the simplest method (kubeadm).
