Five servers were prepared earlier, with the plan to run three of them as master nodes.
However, the VMs were created with Vagrant, and the default NIC eth0 on every machine is assigned the same IP address, 10.0.2.15.
When joining the cluster with kubeadm join, the address etcd picked up was 10.0.2.15:2379; with the wrong IP registered in the etcd cluster, the deployment failed.
The issue is still unresolved for now.
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://10.0.2.15:2379 with maintenance client: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher
# The default NIC IP on a Vagrant VM is 10.0.2.15, so the etcd address must be specified explicitly
kubeadm config print
kubectl -n kube-system edit cm kubeadm-config
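One plausible fix (not verified here, since the issue above was left unresolved): when joining an additional control-plane node, pin the advertised address to the host-only NIC with --apiserver-advertise-address, so the stacked etcd member does not register eth0's 10.0.2.15. The token, hash, certificate key, and the 192.168.99.212 address below are placeholders.
# Hypothetical control-plane join that advertises the host-only NIC instead of eth0
kubeadm join 192.168.99.211:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key <certificate-key> \
  --apiserver-advertise-address 192.168.99.212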
Install on all five servers
# List the Kubernetes versions available for installation
yum list kubeadm.x86_64 --showduplicates | sort -r
# Install Kubernetes 1.25
yum install -y kubeadm-1.25* kubelet-1.25* kubectl-1.25*
# Check the kubeadm version
kubeadm version
Configure kubelet to use containerd as the runtime
# Assumed heredoc body: the snippet was truncated; these are the typical containerd runtime flags for this kubelet version
cat > /etc/sysconfig/kubelet <<EOF
KUBELET_KUBEADM_ARGS="--container-runtime=remote --runtime-request-timeout=15m --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock"
EOF
The following is done only on the master node k8s-test1
vi kubeadm-config.yaml
# Settings to modify:
# localAPIEndpoint.advertiseAddress: the IP of the k8s-test1 server
# apiServer.certSANs: the VIP address
# controlPlaneEndpoint: the VIP address
# networking.podSubnet: the pod network CIDR
# networking.serviceSubnet: the service network CIDR
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
- system:bootstrappers:kubeadm:default-node-token
token: 7t2weq.bjbawausm0jaxury
ttl: 24h0m0s
usages:
- signing
- authentication
kind: InitConfiguration
localAPIEndpoint:
advertiseAddress: 192.168.99.211
bindPort: 6443
nodeRegistration:
criSocket: unix:///var/run/containerd/containerd.sock
name: k8s-test1
taints:
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
---
apiServer:
certSANs:
- 192.168.99.211
timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
type: CoreDNS
etcd:
local:
dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.25.4 # change this version to match the output of kubeadm version
networking:
dnsDomain: cluster.local
podSubnet: 172.31.0.0/16
serviceSubnet: 10.96.0.0/16
scheduler: {}
Migrate the kubeadm config file to the new API version
kubeadm config migrate --old-config kubeadm-config.yaml --new-config kubeadm-config-new.yaml
Pull the images
kubeadm config images pull --config /root/kubeadm-config-new.yaml
Run the initialization on the master node
kubeadm init --config /root/kubeadm-config-new.yaml --upload-certs
On success it prints:
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.99.211:6443 --token 7t2weq.bjbawausm0jaxury \
--discovery-token-ca-cert-hash sha256:0d4bee0cdfaada347043712e2efe5eaad48ca0e16ee67a2fcc44d9d1712dfa09
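With --upload-certs, the init output typically also includes a join command for additional control-plane nodes, roughly of this form (the hash and certificate key are placeholders here):
kubeadm join 192.168.99.211:6443 --token 7t2weq.bjbawausm0jaxury \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key <certificate-key>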
Run:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Join k8s-test4 as a worker node
kubeadm join 192.168.99.211:6443 --token 7t2weq.bjbawausm0jaxury \
--discovery-token-ca-cert-hash sha256:0d4bee0cdfaada347043712e2efe5eaad48ca0e16ee67a2fcc44d9d1712dfa09
Check the node list from k8s-test1
kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-test1 NotReady control-plane 4m22s v1.25.4
k8s-test4 NotReady <none> 7s v1.25.4
# Check the token's expiration time
kubectl get secret -n kube-system
# kubeadm-config.yaml contains the setting bootstrapTokens.token: 7t2weq.bjbawausm0jaxury
kubectl get secret -n kube-system | grep '7t2weq'
kubectl get secret -n kube-system bootstrap-token-7t2weq -o yaml
# The field data.expiration: MjAyMi0xMS0yN1QwNDoyNzoyMFo= is base64-encoded
# Decode it with base64
echo "MjAyMi0xMS0yN1QwNDoyNzoyMFo=" | base64 -d
The token generated by kubeadm init is valid for 24 hours; once it expires, a new one must be generated.
# Generate a token for joining worker nodes
kubeadm token create --print-join-command
Join k8s-test5 as a worker node
Check the status of all nodes from k8s-test1
kubectl get nodes
Run the following on one master node
curl https://docs.projectcalico.org/manifests/calico.yaml -O
# Note: the pod network CIDR must match the one in calico.yaml; find 192.168.0.0 and change it to 172.31.0.0
cat calico.yaml |grep 192.168
vi calico.yaml
- name: CALICO_IPV4POOL_CIDR
value: "172.31.0.0/16"
kubectl apply -f calico.yaml
Check the node status and wait until all pods are Running
kubectl get nodes
kubectl get pods -A
NAME STATUS ROLES AGE VERSION
k8s-test1 Ready control-plane 17m v1.25.4
k8s-test4 Ready <none> 14m v1.25.4
k8s-test5 Ready <none> 8m5s v1.25.4
Hit a Calico error
kubectl get pods -A -owide
kubectl describe pod calico-node-rng8z -n kube-system
kubectl logs calico-node-rng8z -n kube-system
Defaulted container "calico-node" out of: calico-node, upgrade-ipam (init), install-cni (init), mount-bpffs (init)
Error from server (BadRequest): container "calico-node" in pod "calico-node-rng8z" is waiting to start: PodInitializing
# Checked the kube-proxy logs; nothing unusual found
kubectl logs kube-proxy-l6vvh -n kube-system
kubeadm deploys kube-proxy in containers; after rebooting the servers, the cluster returned to normal.
Some pods use the host machine's IP address (hostNetwork):
kubectl get pod calico-node-8c97w -n kube-system -oyaml | grep hostNetwork
Download from the official repository
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Replace with a China-hosted mirror image
registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server:v0.6.2
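A sketch of the replacement (the upstream image path is an assumption; verify it with grep first):
grep 'image:' components.yaml
sed -i 's#registry.k8s.io/metrics-server/metrics-server:v0.6.2#registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server:v0.6.2#' components.yaml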
kubectl apply -f components.yaml
Problem encountered
# The metrics-server pod would not start; the errors are below. The readiness probe kept failing, so the container never became ready.
kubectl describe pod metrics-server-76f8496875-dxqjq -n kube-system | tail -10
#Warning Unhealthy 3s (x9 over 73s) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500
kubectl logs metrics-server-76f8496875-dxqjq -n kube-system
#E1201 15:00:08.422998 1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.99.213:10250/metrics/resource\": x509: cannot validate certificate for 192.168.99.213 because it doesn't contain any IP SANs" node="k8s-test3"
# Fix: add the --kubelet-insecure-tls flag so metrics-server does not verify the kubelets' serving certificates
vim components.yaml
# Append to the args list in the Deployment
- args:
- --kubelet-insecure-tls
# Apply again
kubectl apply -f components.yaml
kubectl get pods -A
kubectl top node
kubectl top pod -A
The official Kubernetes web UI (Dashboard)
https://github.com/kubernetes/dashboard
wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml
# GitHub connections from China are slow, so the download may fail with: The connection to the server raw.githubusercontent.com was refused
# Workaround reference: https://blog.csdn.net/weixin_38074756/article/details/109231865
# Look up the IP of raw.githubusercontent.com at http://ip.tool.chinaz.com/
vi /etc/hosts
185.199.111.133 raw.githubusercontent.com
Replace with China-hosted mirror images
cat recommended.yaml | grep image
registry.cn-hangzhou.aliyuncs.com/google_containers/dashboard:v2.5.0 # v2.7.0 fails with: exec /dashboard: exec format error
registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-scraper:v1.0.8
kubectl apply -f recommended.yaml
kubectl get pods -n kubernetes-dashboard
kubectl logs -f -n kubernetes-dashboard kubernetes-dashboard-65dbd8fb9b-jd7zz
kubectl get svc kubernetes-dashboard -n kubernetes-dashboard
kubectl edit svc kubernetes-dashboard -n kubernetes-dashboard
# Change type: ClusterIP to type: NodePort
kubectl get svc kubernetes-dashboard -n kubernetes-dashboard
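The same change as a non-interactive patch (a sketch with the same effect as the edit above):
kubectl -n kubernetes-dashboard patch svc kubernetes-dashboard -p '{"spec":{"type":"NodePort"}}'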
# Check the dashboard port (443:31816/TCP; the NodePort is assigned randomly)
kubectl get svc -A |grep kubernetes-dashboard
# From a Windows machine, open https://192.168.99.211:31816
# If Chrome refuses the connection because of the certificate warning, try another browser
Create an access account
vim dashboard-usr.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: admin-user
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: admin-user
annotations:
rbac.authorization.kubernetes.io/autoupdate: "true"
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects:
- kind: ServiceAccount
name: admin-user
namespace: kube-system
Note: when copying a YAML file from Notepad++, mixed spaces and tabs can cause format errors.
Run:
kubectl apply -f dashboard-usr.yaml
Get the access token
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}')
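On Kubernetes 1.24+ a ServiceAccount no longer gets a long-lived secret automatically, so instead of the feature-gate workaround below you can mint a short-lived token directly:
kubectl -n kube-system create token admin-user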
# If no token is returned, modify the apiserver and controller-manager configuration
vim /etc/kubernetes/manifests/kube-apiserver.yaml
vim /etc/kubernetes/manifests/kube-controller-manager.yaml
# Add:
- --feature-gates=LegacyServiceAccountTokenNoAutoGeneration=false
systemctl restart kubelet
kubectl delete -f dashboard-usr.yaml
kubectl apply -f dashboard-usr.yaml
kubectl get serviceaccount -n kube-system | grep admin-user
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}')
# Log in to the Dashboard with the token
# With metrics-server installed, memory and CPU usage can be viewed in the Dashboard
Check the kube-proxy mode
curl 127.0.0.1:10249/proxyMode
# Default: iptables; for performance reasons change it to ipvs (edit on one master node only)
kubectl edit cm kube-proxy -n kube-system
mode: ipvs
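ipvs mode assumes the IPVS kernel modules are loaded on every node; a quick check and load (module names assumed for a 4.x+ kernel, where nf_conntrack replaced nf_conntrack_ipv4):
lsmod | grep ip_vs
for m in ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack; do modprobe $m; done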
Roll the kube-proxy pods to pick up the change
kubectl patch daemonset kube-proxy -p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"date\":\"`date +'%s'`\"}}}}}" -n kube-system
kubectl get pods -n kube-system
# Verify the kube-proxy mode
curl 127.0.0.1:10249/proxyMode
ipvs
By default the master nodes carry a taint, so business pods cannot be scheduled onto them for testing.
For a kubeadm-installed cluster, the configuration files are located at
/etc/kubernetes/manifests/
# After modifying the configuration, restart kubelet
systemctl restart kubelet
# View the taints
kubectl describe node -l node-role.kubernetes.io/control-plane | grep Taints
Remove the taint from all master nodes
kubectl taint node -l node-role.kubernetes.io/control-plane node-role.kubernetes.io/control-plane:NoSchedule-
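To restore the taint later, the same label selector works in reverse (a sketch):
kubectl taint node -l node-role.kubernetes.io/control-plane node-role.kubernetes.io/control-plane:NoSchedule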
https://kubernetes.io/zh/docs/reference/kubectl/cheatsheet
kubectl auto-completion
yum install -y bash-completion
source <(kubectl completion bash)
echo "source <(kubectl completion bash)" >> ~/.bashrc
# Press Tab twice to complete
kubectl get
View the apiserver address in the config files
cat /etc/kubernetes/admin.conf
cat ~/.kube/config
# Show the merged kubeconfig settings
kubectl config view
Switch clusters
kubectl config use-context my-cluster-name # set the default context to my-cluster-name
kubectl config get-contexts # list the contexts
kubectl config current-context # show the current context
kubectl config use-context kubernetes-admin@kubernetes
kubectl apply
Create and update resources in the cluster
kubectl create deployment nginx --image=nginx # start a single-instance nginx
kubectl get deployment nginx
kubectl get deployment nginx -oyaml
Generate the manifest without starting the application
kubectl create deployment nginx --image=nginx --dry-run=client -oyaml
kubectl delete deployment nginx
kubectl api-resources --namespaced=true # all namespaced resources
kubectl api-resources --namespaced=false # all non-namespaced resources
# List all Services in the current namespace, sorted by name
kubectl get services --sort-by=.metadata.name
# List pods, sorted by restart count
kubectl get pods --sort-by='.status.containerStatuses[0].restartCount'
# List all PersistentVolumes, sorted by capacity
kubectl get pv --sort-by=.spec.capacity.storage
# Show the labels of all pods
kubectl get pods --show-labels
# Filter by label
kubectl get pods -l app=nginx
Update resources
kubectl set image deployment nginx nginx=nginx:1.15.1
kubectl edit deployment nginx
kubectl logs -f nginx-5c4db87df7-2llzg
# View the last 10 lines of the log
kubectl logs nginx-5c4db87df7-2llzg --tail 10
kubectl logs my-pod -c my-container # get a container's logs (stdout, multi-container pod)
kubectl exec -it my-pod -- ls / # run a command in an existing pod (single-container case)
kubectl exec -it my-pod -c my-container -- ls / # run a command in an existing pod (multi-container case)
kubectl exec -it my-pod -- sh
kubectl exec -it my-pod -- bash
In production it is advisable to upgrade Kubernetes once a year and renew the certificates at the same time.
# Check certificate expiration dates
kubeadm certs check-expiration
Back up the certificates
cp -rp /etc/kubernetes/pki/ /opt/pki.bak
# Renew the certificates (run on every master node)
kubeadm certs renew all
kubeadm certs check-expiration
# Restart kubelet on every node
systemctl restart kubelet
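kubeadm certs renew all also rewrites admin.conf, so refresh the local kubeconfig copy afterwards (assuming it was copied during setup as above):
cp /etc/kubernetes/admin.conf $HOME/.kube/config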
Extend certificate validity to 100 years (download the Kubernetes source, patch it, and recompile kubeadm)
# Download the source and check out the branch/tag matching the installed version
kubeadm version
git clone https://gitee.com/mirrors/kubernetes.git
cd kubernetes/
git branch -a
git tag
git checkout v1.25.4 # must match the local k8s version
git status
# Build the source inside a Docker container
systemctl start docker
docker run -it --rm -v `pwd`:/go/src/ registry.cn-beijing.aliyuncs.com/dotbalo/golang:kubeadm bash
# The following runs inside the container
cd /go/src/
go env -w GOPROXY=https://goproxy.cn,direct
go env -w GOSUMDB=off
# The certificate validity constant lives in this file
grep "365" cmd/kubeadm/app/constants/constants.go
sed -i 's#365#365 * 100#g' cmd/kubeadm/app/constants/constants.go
grep "365" cmd/kubeadm/app/constants/constants.go
mkdir -p _output/
chmod 777 -R _output/
make WHAT=cmd/kubeadm
ls _output/bin/kubeadm
# Copy into the mounted directory
cp _output/bin/kubeadm ./kubeadm
# Exit the container and stop Docker
exit
systemctl stop docker
systemctl status docker
ls
# Renew the cluster certificates with the newly built kubeadm
./kubeadm version # the locally built version shows v1.25.4-dirty
cp kubeadm /opt/
/opt/kubeadm version
/opt/kubeadm certs renew all
/opt/kubeadm certs check-expiration
Restart kubelet on every node
systemctl restart kubelet
# If there are other master nodes
scp /opt/kubeadm k8s-test2:/opt/
# Renew on k8s-test2
/opt/kubeadm certs renew all
systemctl restart kubelet