k8s 1.25 Study Notes 2 - Deploying a Single-Master Cluster with kubeadm

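I had prepared 5 servers and originally planned to deploy 3 Master nodes.
However, the virtual machines were created with Vagrant, whose default NIC eth0 is assigned the IP address 10.0.2.15.
When a node joined the cluster with kubeadm join, etcd picked up the address 10.0.2.15:2379, so the IP address registered in the etcd cluster was wrong and the deployment failed.

This problem is unresolved for now, so this post deploys a single Master instead.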

[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://10.0.2.15:2379 with maintenance client: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher
  
#The default NIC IP address of a Vagrant VM is 10.0.2.15, so the etcd address has to be specified explicitly
kubeadm config print init-defaults
kubectl -n kube-system edit cm kubeadm-config 

Installing the Kubernetes components

Install on all 5 servers

#List the Kubernetes versions available for installation
yum list kubeadm.x86_64 --showduplicates | sort -r

#Install kubernetes 1.25
yum install -y kubeadm-1.25* kubelet-1.25* kubectl-1.25* 

#Check the kubernetes version
kubeadm version

Configuring Kubelet to use Containerd as the Runtime

cat > /etc/sysconfig/kubelet <

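Initializing the cluster

Perform the following steps only on the master node k8s-test1
vi kubeadm-config.yaml

#Edit the configuration
#localAPIEndpoint.advertiseAddress: IP of the k8s-test1 server
#apiServer.certSANs: VIP address
#controlPlaneEndpoint: VIP address
#networking.podSubnet: Pod network CIDR
#networking.serviceSubnet: Service network CIDR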

apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: 7t2weq.bjbawausm0jaxury
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.99.211
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  name: k8s-test1
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
---
apiServer:
  certSANs:
  - 192.168.99.211
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.25.4 # must match the version reported by kubeadm version
networking:
  dnsDomain: cluster.local
  podSubnet: 172.31.0.0/16
  serviceSubnet: 10.96.0.0/16
scheduler: {}

Migrating the kubeadm configuration file to the current API version

kubeadm config migrate --old-config kubeadm-config.yaml --new-config kubeadm-config-new.yaml

Pulling the images

kubeadm config images pull --config /root/kubeadm-config-new.yaml

Running the initialization on the master node

kubeadm init --config /root/kubeadm-config-new.yaml  --upload-certs

On success, the command returns

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.99.211:6443 --token 7t2weq.bjbawausm0jaxury \
	--discovery-token-ca-cert-hash sha256:0d4bee0cdfaada347043712e2efe5eaad48ca0e16ee67a2fcc44d9d1712dfa09 

Run

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Joining k8s-test4 as a worker node

kubeadm join 192.168.99.211:6443 --token 7t2weq.bjbawausm0jaxury \
	--discovery-token-ca-cert-hash sha256:0d4bee0cdfaada347043712e2efe5eaad48ca0e16ee67a2fcc44d9d1712dfa09

Check the node information on k8s-test1

kubectl get nodes
NAME        STATUS     ROLES           AGE     VERSION
k8s-test1   NotReady   control-plane   4m22s   v1.25.4
k8s-test4   NotReady   <none>          7s      v1.25.4

Regenerating a k8s token

#Check the token expiration time
kubectl get secret -n kube-system
#kubeadm-config.yaml contains the setting bootstrapTokens.token: 7t2weq.bjbawausm0jaxury
kubectl get secret -n kube-system | grep '7t2weq'
kubectl get secret -n kube-system bootstrap-token-7t2weq -o yaml
#The field data.expiration: MjAyMi0xMS0yN1QwNDoyNzoyMFo= is base64-encoded
#Decode the base64 value
echo "MjAyMi0xMS0yN1QwNDoyNzoyMFo=" | base64 -d
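
The decoding step above can be wrapped in a small helper that also tells whether the token has already expired. This is only a sketch: the function names decode_expiration and token_expired are made up for illustration, and it assumes the expiration is stored as a UTC timestamp in the form 2022-11-27T04:27:20Z, as in the Secret shown above.

```shell
#!/bin/sh
# Decode the base64 value stored in data.expiration of a bootstrap token Secret.
decode_expiration() {
    printf '%s' "$1" | base64 -d
}

# Succeed (exit status 0) if the token has expired at the reference time.
# $1: base64-encoded expiration
# $2: reference time such as 2022-11-28T00:00:00Z (defaults to the current UTC time)
token_expired() {
    expiration=$(decode_expiration "$1")
    now=${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}
    # Assumption: both timestamps use the same fixed-width UTC layout, so
    # stripping the separators leaves 14-digit numbers that compare chronologically.
    exp_num=$(printf '%s' "$expiration" | tr -cd '0-9')
    now_num=$(printf '%s' "$now" | tr -cd '0-9')
    [ "$exp_num" -le "$now_num" ]
}

decode_expiration "MjAyMi0xMS0yN1QwNDoyNzoyMFo="; echo
# prints 2022-11-27T04:27:20Z
```

For example, `token_expired "MjAyMi0xMS0yN1QwNDoyNzoyMFo=" && echo expired` reports whether a new join command is needed.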

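The token generated by kubeadm init is valid for 24 hours; once it has expired, a new token has to be generated

#Generate a new token and print the join command (run on the control-plane node)
kubeadm token create --print-join-command
Join k8s-test5 as a worker node with the printed command

Check the status of all nodes on k8s-test1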

kubectl get nodes

Installing the calico network plugin

Run on one master node

curl https://docs.projectcalico.org/manifests/calico.yaml -O
#Note: the CIDR in calico.yaml must match the cluster's pod network CIDR (networking.podSubnet); find 192.168.0.0 and change it to 172.31.0.0
cat calico.yaml |grep 192.168
vi calico.yaml
            - name: CALICO_IPV4POOL_CIDR
              value: "172.31.0.0/16"
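
The manual edit in vi can also be scripted. The sketch below assumes that the downloaded calico.yaml still contains the two commented-out default lines shown in the code comment (check with the grep above first); the function name set_calico_pool_cidr is hypothetical.

```shell
#!/bin/sh
# Read a calico manifest on stdin and write it to stdout with the
# CALICO_IPV4POOL_CIDR entry uncommented and set to the CIDR given in $1.
# Assumption: the manifest contains these two commented-out default lines:
#             # - name: CALICO_IPV4POOL_CIDR
#             #   value: "192.168.0.0/16"
set_calico_pool_cidr() {
    sed -e 's|# - name: CALICO_IPV4POOL_CIDR|- name: CALICO_IPV4POOL_CIDR|' \
        -e "s|#   value: \"192.168.0.0/16\"|  value: \"$1\"|"
}

# Usage (writes a new file and leaves the download untouched):
#   set_calico_pool_cidr 172.31.0.0/16 < calico.yaml > calico-custom.yaml
```

Writing to a separate file keeps the original manifest available for comparison with diff before applying it.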

kubectl apply -f calico.yaml

Check the node status and wait until all pods are in the Running state

kubectl get nodes
NAME        STATUS   ROLES           AGE    VERSION
k8s-test1   Ready    control-plane   17m    v1.25.4
k8s-test4   Ready    <none>          14m    v1.25.4
k8s-test5   Ready    <none>          8m5s   v1.25.4
kubectl get pods -A

A calico error I ran into

kubectl get pods -A -owide
kubectl describe pod calico-node-rng8z -n kube-system
kubectl logs calico-node-rng8z -n kube-system

Defaulted container "calico-node" out of: calico-node, upgrade-ipam (init), install-cni (init), mount-bpffs (init)
Error from server (BadRequest): container "calico-node" in pod "calico-node-rng8z" is waiting to start: PodInitializing

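#Checked the kube-proxy logs; nothing abnormal was found
kubectl logs kube-proxy-l6vvh -n kube-system

kubeadm deploys kube-proxy as containers; after the servers were rebooted, the k8s cluster returned to normal

Some pods use the host's IP address (hostNetwork)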

kubectl get pod calico-node-8c97w -n kube-system -oyaml | grep hostNetwork

Installing the metrics-server monitoring tool

Download from the official release

wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
#Replace the image with a mirror hosted in China
registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server:v0.6.2
kubectl apply -f components.yaml

Problem encountered

#The metrics-server pod does not become ready; the readiness probe keeps failing with the error below
kubectl describe pod metrics-server-76f8496875-dxqjq -n kube-system | tail -10
#Warning  Unhealthy  3s (x9 over 73s)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 500

kubectl logs metrics-server-76f8496875-dxqjq -n kube-system
#E1201 15:00:08.422998       1 scraper.go:140] "Failed to scrape node" err="Get \"https://192.168.99.213:10250/metrics/resource\": x509: cannot validate certificate for 192.168.99.213 because it doesn't contain any IP SANs" node="k8s-test3"


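#Fix: add the --kubelet-insecure-tls argument so that metrics-server does not verify the kubelet serving certificates
vim components.yaml
#Append it to the end of args in the Deployment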
- args:
        - --kubelet-insecure-tls
      
        
#Apply again
kubectl apply -f components.yaml
kubectl get pods -A
kubectl top node
kubectl top pod -A        

Installing Dashboard

The official web UI provided by Kubernetes
https://github.com/kubernetes/dashboard

wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml
#Connections to GitHub can be very slow, so the download may fail with: The connection to the server raw.githubusercontent.com was refused
#For a workaround see: https://blog.csdn.net/weixin_38074756/article/details/109231865
#Look up the IP of the domain raw.githubusercontent.com at http://ip.tool.chinaz.com/
vi /etc/hosts
185.199.111.133  raw.githubusercontent.com

Replacing the images with mirrors hosted in China

cat recommended.yaml | grep image
registry.cn-hangzhou.aliyuncs.com/google_containers/dashboard:v2.5.0    #v2.7.0 failed with: exec /dashboard: exec format error
registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-scraper:v1.0.8


kubectl apply -f recommended.yaml
kubectl get pods -n kubernetes-dashboard
kubectl logs -f -n kubernetes-dashboard kubernetes-dashboard-65dbd8fb9b-jd7zz 
kubectl get svc kubernetes-dashboard -n kubernetes-dashboard

kubectl edit svc kubernetes-dashboard -n kubernetes-dashboard
#Change type: ClusterIP to type: NodePort
kubectl get svc kubernetes-dashboard -n kubernetes-dashboard

#Check the dashboard port (443:31816/TCP; the node port is assigned randomly)
kubectl get svc -A |grep kubernetes-dashboard
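
Because the node port is assigned randomly, it can be handy to extract it instead of reading it off the table. The helper below is a sketch that parses a PORT(S) value such as 443:31816/TCP; the name node_port_of is hypothetical. On a live cluster, `kubectl get svc kubernetes-dashboard -n kubernetes-dashboard -o jsonpath='{.spec.ports[0].nodePort}'` should return the same number without any parsing.

```shell
#!/bin/sh
# Extract the node port from a PORT(S) column value such as "443:31816/TCP".
node_port_of() {
    mapping=${1%%/*}                # drop the protocol suffix -> "443:31816"
    printf '%s\n' "${mapping#*:}"   # drop the service port    -> "31816"
}

node_port_of "443:31816/TCP"
# prints 31816
```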

#From a Windows machine, open https://192.168.99.211:31816
#If Chrome refuses to open the page because of the certificate, try another browser

Creating an access account
vim dashboard-usr.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding 
metadata: 
  name: admin-user
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kube-system

Note:
When copying a YAML file in notepad++, spaces and tabs may get mixed, which breaks the format

Run

kubectl apply -f dashboard-usr.yaml  

Obtaining the access token

kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}')

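#If no token is returned (since Kubernetes 1.24, token Secrets are no longer auto-generated for ServiceAccounts), modify the apiserver and controller-manager configuration
#Alternatively, kubectl 1.24+ can issue a short-lived token directly with: kubectl -n kube-system create token admin-user
vim /etc/kubernetes/manifests/kube-apiserver.yaml
vim /etc/kubernetes/manifests/kube-controller-manager.yaml
#Add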
    - --feature-gates=LegacyServiceAccountTokenNoAutoGeneration=false


systemctl restart kubelet

kubectl delete -f dashboard-usr.yaml  
kubectl apply -f dashboard-usr.yaml  

kubectl get serviceaccount -n kube-system | grep admin-user 
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}')

#Log in to Dashboard with the token
#Once metrics-server is installed, Dashboard shows memory and CPU usage

Changing the kube-proxy network mode

Checking the kube-proxy mode

curl 127.0.0.1:10249/proxyMode

#The default is iptables; for performance reasons change it to ipvs (edit on one master node only)
kubectl edit cm kube-proxy -n kube-system
mode: ipvs

Rolling-update the Kube-Proxy Pods

kubectl patch daemonset kube-proxy -p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"date\":\"`date +'%s'`\"}}}}}" -n kube-system
kubectl get pods -n kube-system
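
The escaped JSON in the patch command above is hard to read. As a sketch, the same patch can be built with printf in a small helper (restart_patch is a hypothetical name); changing the annotation on the Pod template is what makes the DaemonSet roll its Pods.

```shell
#!/bin/sh
# Build the patch that stamps a "date" annotation on the Pod template.
# $1: annotation value (defaults to the current Unix timestamp).
restart_patch() {
    stamp=${1:-$(date +%s)}
    printf '{"spec":{"template":{"metadata":{"annotations":{"date":"%s"}}}}}\n' "$stamp"
}

restart_patch 1669900000
# prints {"spec":{"template":{"metadata":{"annotations":{"date":"1669900000"}}}}}

# Usage against a cluster:
#   kubectl patch daemonset kube-proxy -n kube-system -p "$(restart_patch)"
```

kubectl also has a built-in command with the same effect, `kubectl rollout restart daemonset kube-proxy -n kube-system`, which may be simpler when no custom annotation is needed.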


#Verify the Kube-Proxy mode
curl 127.0.0.1:10249/proxyMode
ipvs


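Viewing the Master node taints

By default the Master nodes carry a taint, so business Pods cannot be deployed on a Master for testing

For a k8s cluster installed with kubeadm, the control-plane manifests are located in

/etc/kubernetes/manifests/

#After changing the configuration, restart kubelet 
systemctl restart kubelet


#View the taints
kubectl describe node -l node-role.kubernetes.io/control-plane | grep Taints

Remove the taint from all master nodes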
kubectl taint node -l node-role.kubernetes.io/control-plane node-role.kubernetes.io/control-plane:NoSchedule-


Common kubectl commands

https://kubernetes.io/zh/docs/reference/kubectl/cheatsheet

Kubectl auto-completion

yum install -y bash-completion
source <(kubectl completion bash) 
echo "source <(kubectl completion bash)" >> ~/.bashrc
#Press tab twice to complete
kubectl get 

Viewing the apiserver address in the config files

cat /etc/kubernetes/admin.conf
cat ~/.kube/config


# Show the merged kubeconfig settings
kubectl config view 

Switching clusters

kubectl config use-context my-cluster-name  # set the default context to my-cluster-name

kubectl config get-contexts                          # list the contexts
kubectl config current-context                       # show the current context
kubectl config use-context kubernetes-admin@kubernetes

Creating and updating resources in the cluster

kubectl create deployment nginx --image=nginx # start a single nginx instance
kubectl get deployment nginx
kubectl get deployment nginx -oyaml


Generate the manifest without starting the application
kubectl create deployment nginx --image=nginx --dry-run=client -oyaml


kubectl delete deployment nginx

kubectl api-resources --namespaced=true      # all namespaced resources
kubectl api-resources --namespaced=false     # all non-namespaced resources

# List all Services in the current namespace, sorted by name
kubectl get services --sort-by=.metadata.name

# List Pods, sorted by restart count
kubectl get pods --sort-by='.status.containerStatuses[0].restartCount'

# List all PersistentVolumes, sorted by capacity
kubectl get pv --sort-by=.spec.capacity.storage

# Show the labels of all Pods
kubectl get pods --show-labels
#Filter by label
kubectl get pods -l app=nginx

Updating resources
kubectl set image deployment nginx nginx=nginx:1.15.1 
kubectl edit deployment nginx

kubectl logs -f nginx-5c4db87df7-2llzg
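#Show the last 10 lines of the log
kubectl logs nginx-5c4db87df7-2llzg --tail 10
kubectl logs my-pod -c my-container                 # get the logs of a container in the Pod (stdout, multi-container case)

kubectl exec -it my-pod -- ls /                         # run a command in an existing Pod (single-container case)
kubectl exec -it my-pod -c my-container -- ls /         # run a command in an existing Pod (multi-container case)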
kubectl exec -it my-pod -- sh
kubectl exec -it my-pod -- bash


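Renewing the k8s cluster certificates

In production it is advisable to upgrade the k8s version about once a year and renew the certificates at the same time

#Check the certificate expiration dates
kubeadm certs check-expiration

Back up the certificates
cp -rp /etc/kubernetes/pki/ /opt/pki.bak


#Renew the certificates (run on every master node)
kubeadm certs renew all
kubeadm certs check-expiration
#Restart kubelet on every node (kubeadm also reports that kube-apiserver, kube-controller-manager, kube-scheduler and etcd must be restarted to pick up the new certificates)
systemctl restart kubelet

Extending the certificates to 99 years (download the k8s source, modify it and rebuild kubeadm)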

#Download the source and check out the tag matching the installed version
kubeadm version
git clone https://gitee.com/mirrors/kubernetes.git
cd kubernetes/
git branch -a
git tag
git checkout v1.25.4  #matches the k8s version on this machine
git status


#Build the source inside a docker container
systemctl start docker
docker run -it --rm -v `pwd`:/go/src/ registry.cn-beijing.aliyuncs.com/dotbalo/golang:kubeadm bash 
#Inside the container
cd /go/src/
go env -w GOPROXY=https://goproxy.cn,direct
go env -w GOSUMDB=off
#File holding the certificate constants
grep "365" cmd/kubeadm/app/constants/constants.go
sed -i 's#365#365 * 100#g' cmd/kubeadm/app/constants/constants.go
grep "365" cmd/kubeadm/app/constants/constants.go
mkdir -p _output/
chmod 777 -R _output/
make WHAT=cmd/kubeadm
ls _output/bin/kubeadm
#Copy the binary to the mounted directory
cp _output/bin/kubeadm ./kubeadm
#Exit the container and stop docker
exit 
systemctl stop docker
systemctl status docker
ls
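
To see what the sed substitution in the build steps does, it can be tried on a single sample line first. This is a sketch: the sample line is my assumption of how the validity constant appears in cmd/kubeadm/app/constants/constants.go for v1.25 (confirm it with the grep from the build steps), and extend_validity is a hypothetical name.

```shell
#!/bin/sh
# Apply the same substitution as the build step to text read from stdin.
extend_validity() {
    sed 's#365#365 * 100#g'
}

# Assumed form of the constant in constants.go (verify with grep "365"):
printf 'CertificateValidity = time.Hour * 24 * 365\n' | extend_validity
# prints CertificateValidity = time.Hour * 24 * 365 * 100
```

The g flag rewrites every occurrence of 365 in the file, which is why the steps run grep both before and after the sed to check that only the intended lines changed.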


#Renew the cluster certificates with the newly built kubeadm
./kubeadm version  #the built binary reports v1.25.4-dirty
cp kubeadm /opt/
/opt/kubeadm version
/opt/kubeadm certs renew all
/opt/kubeadm certs check-expiration
#Restart kubelet on every node
systemctl restart kubelet

#If there are other master nodes
scp /opt/kubeadm k8s-test2:/opt/
#Renew on k8s-test2
/opt/kubeadm certs renew all
systemctl restart kubelet
