Deploy a Kubernetes cluster on CentOS 7.9, using containerd as the container runtime and Calico as the Kubernetes network plugin.
CentOS Linux release 7.9.2009 (Core)
IP | Hostname | Role
---|---|---
10.230.64.16 | node-1 | master
10.230.64.17 | node-2 | worker
10.230.64.14 | node-3 | worker
In general, hardware devices have unique MAC addresses, but some virtual machines may end up with duplicates. Kubernetes uses these values to uniquely identify the nodes in a cluster; if they are not unique on every node, the installation may fail.
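To check this up front, compare the MAC addresses and product_uuid on every node (the standard kubeadm pre-flight checks):
# list the MAC addresses of the network interfaces
ip link
# show this node's product_uuid
cat /sys/class/dmi/id/product_uuid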
Port 6443 must be free; check with: lsof -i:6443
# turn off swap immediately
swapoff -a
# prevent the swap partition from being mounted at boot
sed -i '/swap / s/^\(.*\)$/#\1/g' /etc/fstab
# confirm that swap is off
free -h
# install the NTP tools
yum -y install ntp ntpdate
# sync time from a network time server
ntpdate ntp.ntsc.ac.cn
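ntpdate only performs a one-off sync; optionally, the ntpd service installed above can also be enabled so the clock stays in sync after reboots:
systemctl enable ntpd --now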
# set SELinux to permissive for the current session
setenforce 0
# disable SELinux permanently: edit /etc/selinux/config and set
vi /etc/selinux/config
SELINUX=disabled
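To confirm the change took effect (Permissive now, Disabled after a reboot):
getenforce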
Kubernetes nodes communicate over a number of ports; to keep things simple, the firewall is disabled here.
systemctl stop firewalld.service && systemctl disable firewalld.service
Alternatively, only the required ports can be opened (a sketch follows).
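A sketch of opening just the ports the Kubernetes documentation lists as required, assuming firewalld is kept enabled (adjust per node role):
# control-plane node
firewall-cmd --permanent --add-port=6443/tcp        # kube-apiserver
firewall-cmd --permanent --add-port=2379-2380/tcp   # etcd
firewall-cmd --permanent --add-port=10250/tcp       # kubelet
firewall-cmd --permanent --add-port=10257/tcp       # kube-controller-manager
firewall-cmd --permanent --add-port=10259/tcp       # kube-scheduler
# worker nodes
firewall-cmd --permanent --add-port=10250/tcp       # kubelet
firewall-cmd --permanent --add-port=30000-32767/tcp # NodePort services
firewall-cmd --reload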
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# set the required sysctl parameters; they persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
# apply the sysctl parameters without rebooting
sudo sysctl --system
# verify the settings were applied
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
# check that the br_netfilter and overlay modules are loaded
lsmod | grep br_netfilter
lsmod | grep overlay
The IPVS implementation in kube-proxy scales better by reducing the reliance on iptables; once a cluster has many load-balancing (Service) rules, IPVS forwards traffic more efficiently than iptables.
# install ipset and ipvsadm
yum install -y ipset ipvsadm
# write the module-loading config
cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF
# make it executable and run it
chmod +x /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules
# check whether the modules loaded successfully
lsmod | grep -e ip_vs -e nf_conntrack_ipv4
On newer kernels (4.19 and later, where nf_conntrack_ipv4 was merged into nf_conntrack) this fails with: FATAL: Module nf_conntrack_ipv4 not found.
Change nf_conntrack_ipv4 to nf_conntrack in the ipvs.modules file above and re-run it.
After one otherwise successful cluster deployment, the nodes kept reporting errors, and the Calico containers showed much the same error; this wasted a lot of time:
container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Installing the CNI plugins manually in advance avoids this problem.
CNI (Container Network Interface) is responsible for assigning IP addresses, network interfaces, etc. to containers.
GitHub: https://github.com/containernetworking/plugins
Download the CNI plugins and extract them to /opt/cni/bin (this directory is recommended, because the default config generated when installing containerd later points to it); a download sketch follows.
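For example, assuming version v1.2.0 from the GitHub releases page (matching the tarball name used below):
wget https://github.com/containernetworking/plugins/releases/download/v1.2.0/cni-plugins-linux-amd64-v1.2.0.tgz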
10.88.0.0/16 is the container subnet used in the config below.
mkdir /opt/cni/bin -p
tar xf cni-plugins-linux-amd64-v1.2.0.tgz -C /opt/cni/bin
cat << EOF | tee /etc/cni/net.d/10-containerd-net.conflist
{
  "cniVersion": "1.0.0",
  "name": "containerd-net",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "promiscMode": true,
      "ipam": {
        "type": "host-local",
        "ranges": [
          [{
            "subnet": "10.88.0.0/16"
          }],
          [{
            "subnet": "2001:db8:4860::/64"
          }]
        ],
        "routes": [
          { "dst": "0.0.0.0/0" },
          { "dst": "::/0" }
        ]
      }
    },
    {
      "type": "portmap",
      "capabilities": {"portMappings": true},
      "externalSetMarkChain": "KUBE-MARK-MASQ"
    }
  ]
}
EOF
# install the containerd.io package from the official Docker repository
yum install -y yum-utils device-mapper-persistent-data lvm2
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum install -y containerd.io
# generate the default configuration
mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
# edit the config --- change containerd's cgroup driver to systemd
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
...
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
# change the sandbox (pause) image source
[plugins."io.containerd.grpc.v1.cri"]
  ...
  # sandbox_image = "k8s.gcr.io/pause:3.6"
  sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.6"
# configure registry mirrors
[plugins."io.containerd.grpc.v1.cri".registry]
  config_path = ""
  [plugins."io.containerd.grpc.v1.cri".registry.auths]
  [plugins."io.containerd.grpc.v1.cri".registry.configs]
  [plugins."io.containerd.grpc.v1.cri".registry.headers]
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
      endpoint = ["https://8aj710su.mirror.aliyuncs.com", "https://registry-1.docker.io"]
# enable containerd at boot and start it now
systemctl enable containerd --now
# verify
# if crictl is not installed, install it separately (a sketch follows after these checks)
crictl version
crictl images
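A sketch for installing crictl, assuming version v1.26.0 from the kubernetes-sigs/cri-tools releases page, and pointing it at the containerd socket:
VERSION=v1.26.0
wget https://github.com/kubernetes-sigs/cri-tools/releases/download/$VERSION/crictl-$VERSION-linux-amd64.tar.gz
tar xf crictl-$VERSION-linux-amd64.tar.gz -C /usr/local/bin
cat <<EOF | tee /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
EOF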
Note: if Docker is also installed on the server, confirm that it uses the systemd cgroup driver as well
docker info | grep cgroup
If Docker is not using systemd:
add the following field to /etc/docker/daemon.json
"exec-opts": [
"native.cgroupdriver=systemd"
]
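For reference, a minimal /etc/docker/daemon.json containing only this field would look like the following (keep any existing settings you already have); Docker then needs a restart for it to take effect:
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
systemctl daemon-reload
systemctl restart docker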
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=0
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
yum makecache fast
yum install -y kubelet-1.26.1 kubeadm-1.26.1 kubectl-1.26.1
systemctl enable kubelet.service
kubeadm config print init-defaults --component-configs KubeletConfiguration > kubeadm.yaml
The main changes are the image repository (the default one is blocked) and the master node IP:
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.230.64.16 # your master node IP
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock # container runtime socket, same as configured in /etc/containerd/config.toml
  taints:
  - effect: PreferNoSchedule
    key: node-role.kubernetes.io/master
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: 1.26.1 # Kubernetes version to deploy
imageRepository: registry.aliyuncs.com/google_containers # switch the image source to the Aliyun mirror
networking:
  podSubnet: 10.244.0.0/16 # pod network CIDR
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd # kubelet cgroup driver
failSwapOn: false
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs # use the ipvs proxy mode for kube-proxy
Because kubeadm init takes a long time if it still has to pull images, pull them in advance: kubeadm config images pull --config kubeadm.yaml
After the pull completes, list the images: crictl images
Having deployed this cluster several different ways, quite a few unused images had piled up; they can be cleaned out:
# delete images whose name contains "io"
crictl images | grep -E -- 'io' | awk '{print $3}' | xargs -n 1 crictl rmi
kubeadm init --config kubeadm.yaml
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.230.64.16:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:b236772be763f26614e241577c880cda23553359ffda4d6d106f5438a138aa60
After initialization completes, follow the printed instructions:
# configure kubectl access for a regular user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubeadm join 10.230.64.16:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:b236772be763f26614e241577c880cda23553359ffda4d6d106f5438a138aa60
Note that the token expires after 24 hours; generate a new join command with: kubeadm token create --print-join-command
Check the current node status: kubectl get nodes
[root@node-1 mnt]# kubectl get nodes
NAME     STATUS     ROLES           AGE   VERSION
node     NotReady   control-plane   30m   v1.26.1
node-2   NotReady   <none>          6s    v1.26.1
node-3   NotReady   <none>          1s    v1.26.1
All nodes show NotReady because no network plugin has been deployed yet. Calico is used here; the official Calico install guide describes more flexible installation options.
You can also use my calico.yaml.
Set the value of CALICO_IPV4POOL_CIDR to the podSubnet configured earlier in kubeadm.yaml (10.244.0.0/16).
The images referenced in the official manifest may be blocked; test whether they can be pulled with: crictl pull [image]
e.g. pull calico/cni:v3.25.0
Search the yaml for "image" to find every image it depends on; if an image cannot be pulled, change the yaml to pull it from the "quay.io/" registry instead, e.g.
# original
image: calico/kube-controllers:v3.25.0
# change it to
image: quay.io/calico/kube-controllers:v3.25.0
Every image reference in the yaml needs the same change (a sed sketch follows).
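A sed sketch for making that change in one pass, assuming the image references in calico.yaml look like the example above (image: calico/... or image: docker.io/calico/...):
sed -i -e 's#image: docker.io/calico/#image: quay.io/calico/#g' \
       -e 's#image: calico/#image: quay.io/calico/#g' calico.yaml
# confirm the result
grep 'image:' calico.yaml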
kubectl apply -f calico.yaml
# all nodes are now Ready
[root@node-1 mnt]# kubectl get nodes
NAME     STATUS   ROLES           AGE   VERSION
node     Ready    control-plane   40m   v1.26.1
node-2   Ready    <none>          10m   v1.26.1
node-3   Ready    <none>          10m   v1.26.1
# the control-plane components are running normally
[root@node-1 calico]# kubectl -n kube-system get pods
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-7889bf4d45-k2827 1/1 Running 0 6h1m
calico-node-244st 1/1 Running 2 5h54m
calico-node-5svtt 1/1 Running 0 6h1m
calico-node-nvmqd 1/1 Running 0 5h43m
coredns-5bbd96d687-7t5wp 1/1 Running 0 6h14m
coredns-5bbd96d687-grlgl 1/1 Running 0 6h14m
etcd-node 1/1 Running 0 6h14m
kube-apiserver-node 1/1 Running 0 6h14m
kube-controller-manager-node 1/1 Running 0 6h14m
kube-proxy-dfkjn 1/1 Running 2 5h54m
kube-proxy-mhscv 1/1 Running 0 6h14m
kube-proxy-nxk4b 1/1 Running 0 5h43m
kube-scheduler-node 1/1 Running 0 6h14m
At this point the cluster deployment is complete.
Many problems came up during the setup; they weren't all captured as screenshots, so here is the general troubleshooting approach.
Check kubelet logs: journalctl -xefu kubelet
Check a pod's logs: kubectl -n kube-system logs -f calico-kube-controllers-7889bf4d45-k2827
Describe a pod to see its events: kubectl -n kube-system describe pod calico-kube-controllers-7889bf4d45-k2827
Find which node a pod is scheduled on: kubectl -n kube-system describe pod calico-kube-controllers-7889bf4d45-k2827 | grep "Node"
List all containers, including exited ones: crictl ps -a
Follow a container's logs (use the container ID from crictl ps): crictl logs -f <container-id>
Image pulls frequently fail when deploying components or containers; one workaround is to import the image from a tarball:
ctr -n k8s.io images import mysql.tar
After importing, retag if needed: ctr -n k8s.io i tag <original-tag> <new-tag>
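For example, a rough end-to-end sketch using calico/cni:v3.25.0 as a stand-in image name: pull and export on a machine that can reach the registry, copy the tarball over, then import it into containerd's k8s.io namespace:
# on a machine with registry access
ctr images pull quay.io/calico/cni:v3.25.0
ctr images export cni.tar quay.io/calico/cni:v3.25.0
# on the cluster node, after copying cni.tar over
ctr -n k8s.io images import cni.tar
# retag if the manifest expects a different image name
ctr -n k8s.io images tag quay.io/calico/cni:v3.25.0 docker.io/calico/cni:v3.25.0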
Reference: the official Kubernetes documentation (https://kubernetes.io/docs/).