Use Vagrant to quickly spin up a three-VM cluster: one master and two workers.
VM name | IP | User | Role | Spec |
---|---|---|---|---|
k8s-node1 | 192.168.56.101 | root | master | 2 cores, 2 GB |
k8s-node2 | 192.168.56.102 | root | worker | 2 cores, 2 GB |
k8s-node3 | 192.168.56.103 | root | worker | 2 cores, 2 GB |
Files used below:
- Vagrantfile - VM definition file
- centos-init.sh - VM environment initialization script
- docker-init.sh - Docker initialization script
- k8s-init.sh - Kubernetes initialization script
- master-images.sh - image pull script for the master server
- kube-flannel.yml - network plugin manifest
- kubeadm-config.yml - kubeadm init configuration file (for reference)
Downloads: resource bundle (Gitee), CentOS image, Ubuntu image, Tsinghua University mirror.
Please download the VM box file centos7.0-x86_64.box first.
# Add the local box (the path must not contain spaces)
vagrant box add centos7 D:/MallProject/app/VirtualBox-VMs/centos7.0-x86_64.box
# List local boxes
vagrant box list
# Change into the working directory first, in a command window
cd D:\WorkFiles\k8s\vagrant
# Initialize from the local centos7 box (this generates a Vagrantfile in the current directory)
vagrant init centos7
# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure("2") do |config|
  config.vm.box_check_update = false
  config.vm.provider 'virtualbox' do |vb|
    vb.customize [ "guestproperty", "set", :id, "/VirtualBox/GuestAdd/VBoxService/--timesync-set-threshold", 1000 ]
  end
  config.vm.synced_folder ".", "/vagrant", type: "rsync"

  $num_instances = 3
  # $etcd_cluster = "k8s-node1=http://192.168.56.101:2380"
  (1..$num_instances).each do |i|
    config.vm.define "k8s-node#{i}" do |node|
      node.vm.box = "centos7"
      node.vm.hostname = "k8s-node#{i}"
      ip = "192.168.56.#{i+100}"
      node.vm.network "private_network", ip: ip
      node.vm.provider "virtualbox" do |vb|
        vb.memory = "2048"
        vb.cpus = 2
        vb.name = "k8s-node#{i}"
      end
      # node.vm.provision "shell", path: "install.sh", args: [i, ip, $etcd_cluster]
    end
  end
end
# Note: run this in the directory containing the Vagrantfile
vagrant up
# Log in from a cmd window
vagrant ssh k8s-node1
# Change the root password - enter the new password twice when prompted
sudo passwd root
# Switch to root (if left unchanged, the default password is vagrant)
su root
# Check the current user
whoami
vi /etc/ssh/sshd_config
# Change PasswordAuthentication no to yes
PasswordAuthentication yes
# Restart the service
service sshd restart
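With password authentication enabled, you can now log in directly from the host with a regular SSH client instead of going through vagrant ssh:
# Connect to the master node as root (password set above)
ssh root@192.168.56.101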
Note: if a script fails, you can follow it and run its steps manually.
Download and upload centos-init.sh to all three servers.
# Convert the script to unix line endings
vi /opt/centos-init.sh
:set fileformat=unix
:wq
# Make it executable
chmod +x /opt/centos-init.sh
# Run it
sh centos-init.sh
You can also run the script's steps manually, following the script, to avoid errors caused by network interruptions.
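For reference, a node-preparation script like centos-init.sh typically covers the steps below. This is a sketch of the usual CentOS 7 prep for kubeadm, not necessarily the exact contents of the downloaded script:
#!/bin/bash
# Stop and disable the firewall (kubeadm expects open node-to-node traffic)
systemctl stop firewalld && systemctl disable firewalld
# Put SELinux in permissive mode
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
# Disable swap (required by the kubelet)
swapoff -a
sed -ri 's/.*swap.*/#&/' /etc/fstab
# Let iptables see bridged traffic
modprobe br_netfilter
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system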
Download and upload docker-init.sh to all three servers.
# Convert the script to unix line endings
vi /opt/docker-init.sh
:set fileformat=unix
:wq
# Make it executable
chmod +x /opt/docker-init.sh
# Run it
sh docker-init.sh
You can also run the script's steps manually to avoid errors caused by network interruptions.
Note: Kubernetes v1.24 and later no longer support Docker as the default runtime (dockershim was removed).
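For reference, docker-init.sh usually boils down to installing Docker CE from a domestic mirror and configuring the daemon. A sketch, assuming the Aliyun mirror (the URLs are examples, not necessarily what the script uses):
#!/bin/bash
# Install Docker CE from the Aliyun mirror of the docker-ce repo
yum install -y yum-utils
yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
yum install -y docker-ce docker-ce-cli containerd.io
# Match the kubelet's systemd cgroup driver up front (see the cgroup issue below)
mkdir -p /etc/docker
cat <<EOF > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
systemctl enable docker && systemctl restart docker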
Download and upload k8s-init.sh to all three servers.
# Convert the script to unix line endings
vi /opt/k8s-init.sh
:set fileformat=unix
:wq
# Make it executable
chmod +x /opt/k8s-init.sh
# Run it
sh k8s-init.sh
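For reference, k8s-init.sh typically adds a Kubernetes yum repo and pins the component versions. A sketch, assuming the Aliyun repo:
#!/bin/bash
# Add the Aliyun Kubernetes yum repo
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=0
EOF
# Install the versions used in this guide and enable the kubelet
yum install -y kubelet-1.26.0 kubeadm-1.26.0 kubectl-1.26.0
systemctl enable kubelet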
Download and upload master-images.sh to the master server (k8s-node1) and run it to pull the images the control plane depends on.
# Convert the script to unix line endings
vi /opt/master-images.sh
:set fileformat=unix
:wq
# Make it executable
chmod +x /opt/master-images.sh
# Before running it, you can list the component versions it needs
kubeadm config images list --kubernetes-version v1.26.0
# Run it
sh master-images.sh
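For reference, a pull script like master-images.sh usually just mirrors whatever `kubeadm config images list` reports. A sketch assuming Docker as the image frontend (with containerd as the runtime you would pull via `ctr -n k8s.io images pull` instead):
#!/bin/bash
# Pull each required image from the Aliyun mirror, then retag it to the
# registry.k8s.io name that kubeadm expects
for img in $(kubeadm config images list --kubernetes-version v1.26.0); do
  mirror=registry.cn-hangzhou.aliyuncs.com/google_containers/${img##*/}
  docker pull "$mirror"
  docker tag "$mirror" "$img"
done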
Initialize Kubernetes
# Option 1: initialize with flags
# Note: get the apiserver-advertise-address from `ip addr` (the eth0 address)
# Since v1.26, the image-repository flag is no longer passed to the CRI runtime when it pulls the pause image;
# /etc/containerd/config.toml must be edited by hand - see the containerd issue below
kubeadm init \
--apiserver-advertise-address=10.0.2.15 \
--control-plane-endpoint=k8s-node1 \
--image-repository registry.cn-hangzhou.aliyuncs.com/google_containers \
--kubernetes-version v1.26.0 \
--service-cidr=10.96.0.0/16 \
--pod-network-cidr=10.244.0.0/16
# Option 2: initialize the cluster from a config file (pick one of the two options; note: the config file must be edited first)
kubeadm config print init-defaults > kubeadm-config.yml
# Initialize the cluster
kubeadm init --config kubeadm-config.yml
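The printed defaults will not work as-is. At minimum, the fields below need editing; the values here mirror the flag-based init above, with the advertise address being the eth1 IP as discussed in the API-server issue further down:
# Fields to adjust in kubeadm-config.yml:
#   localAPIEndpoint.advertiseAddress: 192.168.56.101 (not the NAT eth0 address)
#   nodeRegistration.name: k8s-node1
#   imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
#   kubernetesVersion: v1.26.0
#   networking.podSubnet: 10.244.0.0/16 (must match kube-flannel.yml)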
# If initialization succeeds, run the following as prompted
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Remember to copy and save the join command it prints:
# kubeadm join 10.0.2.15:6443 --token np71a6.197zbcz5l6yur1ji --discovery-token-ca-cert-hash sha256:82e0eeadf242ce9fbcb0d3681116d3f845bf8a77deeb0a02e38e57c17112cb35
Note: when installing v1.26 or later, it is recommended to apply the fix from the containerd issue below first, and only then initialize.
# Reset - if initialization fails, run this before initializing again
kubeadm reset
# Check the kubelet's status
systemctl status kubelet
# Inspect the error log
journalctl -xeu kubelet
Install the flannel network plugin
Download and upload kube-flannel.yml to the master server.
kubectl apply -f kube-flannel.yml
# Uninstall (run before reinstalling after a failed attempt)
kubectl delete -f kube-flannel.yml
Note: if the install fails, check the error message; on v1.25+ clusters you may need to comment out the PodSecurityPolicy section of the manifest (PSP was removed in v1.25) and apply again.
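Once applied, the flannel pods should reach Running and the nodes should turn Ready shortly. A quick check (recent manifests deploy into the kube-flannel namespace; older ones use kube-system):
kubectl get pods -n kube-flannel -o wide
kubectl get nodes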
containerd issue
[init] Using Kubernetes version: v1.26.0
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: time="2023-03-13T16:54:24+08:00" level=fatal msg="validate service connection: CRI v1 runtime API is not implemented for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
Solution
# Check and back up the existing config
mv /etc/containerd/config.toml /etc/containerd/config.toml.back
# Regenerate the default config
containerd config default > /etc/containerd/config.toml
vi /etc/containerd/config.toml
# 1. Find [plugins."io.containerd.grpc.v1.cri".registry.mirrors] and append these four lines (possibly not strictly necessary)
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
    endpoint = ["https://dockerhub.mirrors.nwafu.edu.cn"]
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."k8s.gcr.io"]
    endpoint = ["https://registry.aliyuncs.com/k8sxio"]
# 2. Find [plugins."io.containerd.grpc.v1.cri"] and set the pause (sandbox) image address (required)
[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "registry.aliyuncs.com/k8sxio/pause:3.6"
# Restart the service
systemctl daemon-reload
systemctl enable containerd
systemctl restart containerd
systemctl status containerd
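After restarting containerd, it is worth confirming the sandbox image setting actually took effect:
containerd config dump | grep sandbox_image
# expected: sandbox_image = "registry.aliyuncs.com/k8sxio/pause:3.6"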
cgroup issue
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
... (the two kubelet-check lines above repeat several more times)
Cause: Kubernetes defaults its cgroup driver to systemd, while the Docker service uses the cgroupfs driver; changing either side so the two match resolves the mismatch.
Solution 1 (change the k8s side):
# Switch the kubelet's cgroup driver to cgroupfs
sudo mkdir -p /etc/systemd/system/kubelet.service.d
# (the heredoc body was truncated in the original; a typical drop-in, assumed here, follows)
sudo tee /etc/systemd/system/kubelet.service.d/10-kubeadm.conf <<EOF
[Service]
Environment="KUBELET_EXTRA_ARGS=--cgroup-driver=cgroupfs"
EOF
Solution 2 (change the Docker side) - the method the author used:
# Check Docker's current cgroup driver
docker info | grep 'Cgroup'
# Set Docker's cgroup driver to systemd
vi /etc/docker/daemon.json
# add this entry to the top-level JSON object:
"exec-opts": ["native.cgroupdriver=systemd"],
systemctl daemon-reload
systemctl restart docker
# Run on each worker node; this exact command is printed once kubeadm init succeeds on the master
kubeadm join 10.0.2.15:6443 --token np71a6.197zbcz5l6yur1ji \
--discovery-token-ca-cert-hash sha256:82e0eeadf242ce9fbcb0d3681116d3f845bf8a77deeb0a02e38e57c17112cb35
kubectl get nodes
[preflight] Running pre-flight checks
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.2. Latest validated version: 18.09
error execution phase preflight: couldn't validate the identity of the API Server: abort connecting to API servers after timeout of 5m0s
Solution
# Run on the master to generate a fresh token (the printed join command includes it)
kubeadm token create --ttl 24h --print-join-command
# List existing tokens
kubeadm token list
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: time="2023-03-15T16:04:50+08:00" level=fatal msg="validate service connection: CRI v1 runtime API is not implemented for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
, error: exit status 1
Solution
# Check and back up the existing config
mv /etc/containerd/config.toml /etc/containerd/config.toml.back
# Regenerate the default config
containerd config default > /etc/containerd/config.toml
# Restart the service
systemctl daemon-reload
systemctl enable containerd
systemctl restart containerd
systemctl status containerd
[preflight] Running pre-flight checks
error execution phase preflight: couldn't validate the identity of the API Server: Get "https://10.0.2.15:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": dial tcp 10.0.2.15:6443: connect: connection refused
Solution
# 1. Reset: kubeadm reset
# 2. Change the IP specified at k8s init time (eth0=10.0.2.15 -> eth1=192.168.56.101)
#    If you initialized from a config file, change the IP in kubeadm-config.yml instead
kubeadm init --apiserver-advertise-address=192.168.56.101 ...
# 3. Edit kube-flannel.yml to specify --iface=eth1 (see the sketch below)
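Concretely, step 3 means adding one argument to the kube-flannel container in the DaemonSet. The relevant args list ends up looking like this (a sketch; the exact manifest layout varies by flannel version):
# In kube-flannel.yml, DaemonSet kube-flannel-ds, container kube-flannel:
#   args:
#     - --ip-masq
#     - --kube-subnet-mgr
#     - --iface=eth1   # added: bind flannel to the host-only NIC instead of the NAT eth0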
[root@k8s-node2 ~]# kubectl get nodes
E0317 17:10:08.837852 8750 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0317 17:10:08.838065 8750 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0317 17:10:08.839914 8750 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0317 17:10:08.842028 8750 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0317 17:10:08.843524 8750 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Solution
# 1. Copy /etc/kubernetes/admin.conf from the master node to the worker node (see the scp example below)
# 2. Set the environment variable
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> /etc/profile
# 3. Reload the environment
source /etc/profile
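For step 1, a plain scp from the master does the job (using the root password set earlier):
# Run on the master (k8s-node1)
scp /etc/kubernetes/admin.conf root@192.168.56.102:/etc/kubernetes/
scp /etc/kubernetes/admin.conf root@192.168.56.103:/etc/kubernetes/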
[root@k8s-node1 opt]# kubectl get nodes
NAME        STATUS     ROLES           AGE   VERSION
k8s-node1   Ready      control-plane   79m   v1.26.0
k8s-node2   NotReady   <none>          63m   v1.26.0
k8s-node3   NotReady   <none>          62m   v1.26.0
[root@k8s-node2 opt]# journalctl -f -u kubelet.service
rpc error: code = Unknown desc = failed to get sandbox image \"registry.k8s.io/pause:3.6\": failed to pull image \"registry.k8s.io/pause:3.6\
Mar 17 17:44:17 k8s-node2 kubelet[8546]: E0317 17:44:17.090922 8546 kubelet.go:2475] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Solution
# 1. As in the containerd issue above, edit /etc/containerd/config.toml on the worker node and set the pause image address
# 2. Restart containerd
systemctl restart containerd
# 3. Restart the kubelet
systemctl restart kubelet
[root@k8s-node3 ~]# kubectl get nodes
NAME        STATUS   ROLES           AGE   VERSION
k8s-node1   Ready    control-plane   92m   v1.26.0
k8s-node2   Ready    <none>          75m   v1.26.0
k8s-node3   Ready    <none>          75m   v1.26.0
With that, the k8s cluster is up and running - over to you!
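As an optional smoke test, deploy something and watch it land on the workers:
# Create a deployment and expose it via a NodePort
kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80 --type=NodePort
# The pod should be scheduled onto one of the worker nodes
kubectl get pods -o wide
kubectl get svc nginx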
Finally, don't forget to like, follow, and bookmark~