Machine Information
Upgrading the Kernel
System Configuration
Deploying the Container Runtime Containerd
Installing the crictl Client
Prerequisites for Enabling IPVS on the Servers
Installing kubeadm, kubelet and kubectl
Initializing the Cluster (master)
Installing the Calico CNI
Joining Worker Nodes to the Cluster
| Hostname | Cluster Role | IP | Kernel | OS Version | Spec |
|---|---|---|---|---|---|
| l-shahe-k8s-master1.ops.prod | master | 10.120.128.1 | 5.4.231-1.el7.elrepo.x86_64 | CentOS Linux release 7.9.2009 (Core) | 32C 128G |
| 10.120.129.1 | node | 10.120.129.1 | 5.4.231-1.el7.elrepo.x86_64 | CentOS Linux release 7.9.2009 (Core) | 32C 128G |
| 10.120.129.2 | node | 10.120.129.2 | 5.4.231-1.el7.elrepo.x86_64 | CentOS Linux release 7.9.2009 (Core) | 32C 128G |
Reference
kubernetes 1.26.1 Etcd部署(外接)保姆级教程 (Cloud孙文波, CSDN): https://blog.csdn.net/weixin_43798031/article/details/129215326
The kernel upgrade, system configuration, containerd deployment, crictl installation, IPVS prerequisites, and kubeadm/kubelet/kubectl installation all follow the external etcd deployment guide referenced above.
Clean up the existing cluster environment; be very careful when doing this on a production cluster.
swapoff -a
kubeadm reset
systemctl daemon-reload && systemctl restart kubelet
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
cp -r /opt/k8s-install/pki/* /etc/kubernetes/pki/.   # restore the pre-generated certificates (etcd CA and apiserver-etcd-client certs) wiped by kubeadm reset
systemctl stop kubelet
rm -rf /etc/cni/net.d/*
# Clean up the etcd data; be extremely careful
etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=https://10.120.174.14:2379,https://10.120.175.5:2379,https://10.120.175.36:2379 del "" --prefix
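If needed, verify that the keyspace is actually empty afterwards (a sketch using the same endpoints and certificates as the del command above):
etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=https://10.120.174.14:2379,https://10.120.175.5:2379,https://10.120.175.36:2379 get "" --prefix --keys-only | wc -l   # expect 0 after the cleanup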
Running kubeadm config print init-defaults --component-configs KubeletConfiguration prints the default configuration used for cluster initialization.
Note on serviceSubnet: 10.122.0.0/18. We were actually allocated a /16 (10.122.0.0/16); to avoid wasting addresses it was split into four /18 subnets, of which only one (16384 IPs) is in use, leaving the other three /18s (16384 × 3 IPs) in reserve.
Reserved subnets:
10.122.64.0/18
10.122.128.0/18
10.122.192.0/18
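A quick sanity check of the /18 split (a sketch; ipcalc here is the small utility shipped with CentOS 7):
echo $(( 2 ** (32 - 18) ))        # 16384 addresses per /18
ipcalc -b -n 10.122.0.0/18        # should report NETWORK=10.122.0.0 and BROADCAST=10.122.63.255, the range used by serviceSubnet
ipcalc -b -n 10.122.64.0/18       # the first reserved /18 begins immediately after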
Contents of kubeadm.yaml:
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.120.128.1      # master host IP
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: l-shahe-k8s-master1           # master hostname
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: 10.120.102.9:6443   # LVS VIP
controllerManager: {}
dns: {}                                   # CoreDNS (the only DNS add-on supported by v1beta3)
etcd:
  external:
    endpoints:
    - https://10.120.174.14:2379          # external etcd
    - https://10.120.175.5:2379
    - https://10.120.175.36:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt              # etcd certificates
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
imageRepository: registry.aliyuncs.com/google_containers   # Aliyun image mirror
kind: ClusterConfiguration
kubernetesVersion: 1.26.1
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.122.0.0/18   # Service CIDR; if Services must be advertised outside the cluster, plan this with the network team
  podSubnet: 10.121.0.0/16       # Pod CIDR; must be planned with the network team
scheduler: {}
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
failSwapOn: false
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
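Optionally validate the rendered configuration with a dry run first (a sketch; the dry run renders manifests into a temporary directory without starting any cluster components):
kubeadm init --config kubeadm.yaml --dry-run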
Run the following command to complete the initialization:
kubeadm init --config kubeadm.yaml --upload-certs
Create the kubectl config file:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl label node l-shahe-k8s-master1 node-role.kubernetes.io/control-plane-   # remove the control-plane role label
kubectl label node l-shahe-k8s-master1 node-role.kubernetes.io/master=          # show the node role as master instead
Note: coredns stays Pending and the node shows NotReady because no CNI plugin is installed yet; the steps below install the Calico CNI.
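A quick way to confirm that state before the CNI is installed (expect the master NotReady and the coredns pods Pending):
kubectl get nodes -o wide
kubectl get pods -n kube-system -o wide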
Download the release package release-v3.25.0.tgz from GitHub
Download and extract:
mkdir -p /opt/k8s-install/calico/ && cd /opt/k8s-install/calico/
# downloaded from GitHub in advance and staged on an internal server
wget 10.60.127.202:19999/k8s-1.26.1-image/release-v3.25.0.tgz
tar xf release-v3.25.0.tgz && cd /opt/k8s-install/calico/release-v3.25.0/images && source /root/.bash_profile
Import the images:
for i in `ls`;do ctr -n k8s.io images import $i;done
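To confirm the import succeeded, list the images in the k8s.io namespace (a sketch):
ctr -n k8s.io images ls -q | grep calico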
Installing Calico uses the following files; adjust their contents to match your environment.
custom-resources.yaml
# This section includes base Calico installation configuration.
# For more information, see: https://projectcalico.docs.tigera.io/master/reference/installation/api#operator.tigera.io/v1.Installation
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  # Configures Calico networking.
  registry: harbor-sh.yidian-inc.com/   # private registry
  imagePath: calico                     # repository path; the project is public, so no imagePullSecrets is configured
  calicoNetwork:
    # Note: The ipPools section cannot be modified post-install.
    ipPools:
    - blockSize: 27                     # size of the IP block pre-allocated to each machine
      cidr: 10.121.0.0/16
      encapsulation: IPIPCrossSubnet
      natOutgoing: Disabled             # no NAT on egress traffic
      nodeSelector: all()
---
# This section configures the Calico API server.
# For more information, see: https://projectcalico.docs.tigera.io/master/reference/installation/api#operator.tigera.io/v1.APIServer
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}
bgp-config.yaml
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: false
  asNumber: 63400
  serviceClusterIPs:
  - cidr: 10.122.0.0/18   # must match serviceSubnet in kubeadm.yaml; if Services are advertised externally, plan this with the network team
  listenPort: 178
  bindMode: NodeIP
  #communities:
  #  - name: bgp-large-community
  #    value: 63400:300:100
  #prefixAdvertisements:
  #  - cidr: 172.218.4.0/26
  #    communities:
  #      - bgp-large-community
  #      - 63400:120
Note: deploying the ToR BGP mode requires the data-center network devices to support BGP and to have the BGP network configured; the peerIP and asNumber values must be provided by the network team.
Our design: each rack has a layer-3 switch at the top; the machines in a rack establish peer sessions with their top-of-rack switch, and the top-of-rack switches then peer with one another.
bgp-peer.yaml
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: 10-120-128
spec:
  peerIP: '10.120.128.254'
  keepOriginalNextHop: true
  asNumber: 64531
  nodeSelector: rack == '10.120.128'
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: 10-120-129
spec:
  peerIP: '10.120.129.254'
  keepOriginalNextHop: true
  asNumber: 64532
  nodeSelector: rack == '10.120.129'
ippool.yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: outbound-ippool
spec:
  blockSize: 27
  cidr: 10.121.0.0/18
  ipipMode: "Never"
  natOutgoing: false
  disabled: false
  nodeSelector: node_out_internet == "true"
  allowedUses:
  - Workload
  - Tunnel
---
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: internal-ippool
spec:
  blockSize: 27
  cidr: 10.121.64.0/18
  ipipMode: "Never"
  natOutgoing: false
  disabled: false
  nodeSelector: node_out_internet == "false"
  allowedUses:
  - Workload
  - Tunnel
Install the Calico operator and configure the BGP network
kubectl create -f tigera-operator.yaml   # only the upstream image is replaced with the private-registry image; no other settings are modified
kubectl create -f custom-resources.yaml
kubectl create -f bgp-config.yaml
kubectl delete ippool default-ipv4-ippool
kubectl create -f ippool.yaml            # create the new IP pools
# Pods that already received addresses from the old default IP pool must be deleted so they are re-created with IPs from the custom IPPools
kubectl delete pod --all -n calico-apiserver
kubectl delete pod --all -n calico-system
kubectl delete pod --all -n kube-system
kubectl create -f bgp-peer.yaml
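A minimal health check after the operator and custom resources are created (a sketch; tigerastatus is the status CRD created by the Tigera operator):
kubectl get tigerastatus
calicoctl get ippool -o wide
kubectl get pods -n calico-system -o wide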
Add labels to the nodes:
kubectl label node l-shahe-k8s-master1 rack='10.120.128'
kubectl label node l-shahe-k8s-master2 rack='10.120.128'
kubectl label node l-shahe-k8s-master3 rack='10.120.129'
kubectl label node l-shahe-k8s-master1 node_out_internet=true
kubectl label node l-shahe-k8s-master2 node_out_internet=true
kubectl label node l-shahe-k8s-master3 node_out_internet=true
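Verify the labels before patching the AS numbers (a sketch using kubectl's label-column flag):
kubectl get nodes -L rack -L node_out_internet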
Important: modify each node's AS number (run on the master):
calicoctl patch node l-shahe-k8s-master1 -p '{"spec": {"bgp": {"asNumber": "64531"}}}'
calicoctl patch node l-shahe-k8s-master2 -p '{"spec": {"bgp": {"asNumber": "64531"}}}'
calicoctl patch node l-shahe-k8s-master3 -p '{"spec": {"bgp": {"asNumber": "64532"}}}'
Check the BGP session status:
[root@l-shahe-k8s-master1 calico]$ calicoctl node status
Calico process is running.
IPv4 BGP status
+----------------+---------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+----------------+---------------+-------+----------+-------------+
| 10.120.128.254 | node specific | up | 10:34:04 | Established |
+----------------+---------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
[root@10 ~]# calicoctl node status
Calico process is running.
IPv4 BGP status
+----------------+---------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+----------------+---------------+-------+----------+-------------+
| 10.120.129.254 | node specific | up | 10:48:16 | Established |
+----------------+---------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
Modify the containerd configuration file (/etc/containerd/config.toml) to pull images from the private registry:
sandbox_image = "harbor-sh.yidian-inc.com/kubernetes-1.26.1/pause:3.6"

[plugins."io.containerd.grpc.v1.cri".registry]
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."harbor-sh.myharbor"]
      endpoint = ["https://harbor-sh.myharbor.com"]
  [plugins."io.containerd.grpc.v1.cri".registry.configs]
    [plugins."io.containerd.grpc.v1.cri".registry.configs."harbor-sh.myharbor".tls]
      insecure_skip_verify = false
      ca_file = "/etc/containerd/cert/harbor-sh-ca.crt"
    [plugins."io.containerd.grpc.v1.cri".registry.configs."harbor-sh.myharbor".auth]
      username = "admin"
      password = "OpsSre"

[plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming]
  tls_cert_file = ""
  tls_key_file = ""
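After editing /etc/containerd/config.toml, restart containerd and test a pull from the private registry (a sketch; the image path is taken from the sandbox_image setting above):
systemctl restart containerd
crictl pull harbor-sh.yidian-inc.com/kubernetes-1.26.1/pause:3.6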
On the node to be added, first complete the kernel upgrade, system configuration, containerd deployment, crictl installation, and the installation of kubeadm, kubelet and kubectl.
kubelet must be started with --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock
vim /usr/lib/systemd/system/kubelet.service
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=https://kubernetes.io/docs/
Wants=network-online.target
After=network-online.target
[Service]
ExecStart=/usr/bin/kubelet --address=127.0.0.1 --pod-manifest-path=/etc/kubernetes/manifests --container-runtime=remote --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock
Restart=always
StartLimitInterval=0
RestartSec=10
[Install]
WantedBy=multi-user.target
Restart kubelet:
systemctl daemon-reload && systemctl restart kubelet && systemctl status kubelet && systemctl restart containerd
Join the node to the cluster:
kubeadm join 10.120.102.9:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:fd831c44b451c671938a0f11d15c381b75fe0b1d9182c1fd596dbd800ed3242a
Because the Calico ToR network mode is used, every newly joined node's Calico AS number must be modified before it can establish a peering session with its gateway; see the sketch below.
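A sketch for one newly joined node (the node name l-shahe-k8s-node1 is hypothetical; the rack label and AS number must match its actual top-of-rack switch):
kubectl label node l-shahe-k8s-node1 rack='10.120.129'
kubectl label node l-shahe-k8s-node1 node_out_internet=false
calicoctl patch node l-shahe-k8s-node1 -p '{"spec": {"bgp": {"asNumber": "64532"}}}'
calicoctl node status    # run on the new node to confirm the session with 10.120.129.254 is Established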
Common issues:
After tagging an image, pushing it to the local registry fails with "not found", as in the following output:
[root@l-shahe-k8s-master1 calico]$ ctr -n k8s.io images push -k --user admin:Ops12345 harbor-sh.yidian-inc.com/calico/csi:v3.25.0
index-sha256:61a95f3ee79a7e591aff9eff535be73e62d2c3931d07c2ea8a1305f7bea19b31: waiting |--------------------------------------|
elapsed: 0.1 s total: 0.0 B (0.0 B/s)
ctr: content digest sha256:105ed88c5c46c6f1a24e4deb53a20475f124506067b7847ba0d10199b71f8c42: not found
Solution:
Pull the image again, fetching content and metadata for all platforms, then push:
ctr -n k8s.io i pull --all-platforms docker.io/calico/csi:v3.25.0
ctr -n k8s.io images push -k --user admin:Ops12345 harbor-sh.yidian-inc.com/calico/csi:v3.25.0
Optimize the kubelet configuration (resource reservations and eviction thresholds), then apply it:
kubectl apply -f kubelet.yaml
kubelet.yaml:
apiVersion: v1
data:
  kubelet: |
    apiVersion: kubelet.config.k8s.io/v1beta1
    authentication:
      anonymous:
        enabled: false
      webhook:
        cacheTTL: 0s
        enabled: true
      x509:
        clientCAFile: /etc/kubernetes/pki/ca.crt
    authorization:
      mode: Webhook
      webhook:
        cacheAuthorizedTTL: 0s
        cacheUnauthorizedTTL: 0s
    cgroupDriver: systemd
    clusterDNS:
    - 10.96.0.10
    clusterDomain: cluster.local
    cpuManagerReconcilePeriod: 0s
    evictionPressureTransitionPeriod: 0s
    failSwapOn: false
    fileCheckFrequency: 0s
    healthzBindAddress: 127.0.0.1
    healthzPort: 10248
    httpCheckFrequency: 0s
    systemReserved:
      cpu: 500m
      memory: 1024Mi
    kubeReserved:
      cpu: 500m
      memory: 1024Mi
    evictionHard:
      memory.available: "1000Mi"
      nodefs.available: "10%"
      nodefs.inodesFree: "5%"
      imagefs.available: "15%"
    imageMinimumGCAge: 0s
    kind: KubeletConfiguration
    logging:
      flushFrequency: 0
      options:
        json:
          infoBufferSize: "0"
      verbosity: 0
    memorySwap: {}
    nodeStatusReportFrequency: 0s
    nodeStatusUpdateFrequency: 0s
    rotateCertificates: true
    runtimeRequestTimeout: 0s
    shutdownGracePeriod: 0s
    shutdownGracePeriodCriticalPods: 0s
    staticPodPath: /etc/kubernetes/manifests
    streamingConnectionIdleTimeout: 0s
    syncFrequency: 0s
    volumeStatsAggPeriod: 0s
kind: ConfigMap
metadata:
  creationTimestamp: "2023-02-28T06:58:21Z"
  name: kubelet-config
  namespace: kube-system
  resourceVersion: "967290"
  uid: a6bf7d67-5356-45cf-b3c2-78aedb09098b
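Applying the ConfigMap only updates the stored configuration; each node still has to download it and restart kubelet before the new reservations and eviction thresholds take effect (a sketch, assuming kubeadm-managed kubelets):
kubeadm upgrade node phase kubelet-config    # refreshes /var/lib/kubelet/config.yaml from the kubelet-config ConfigMap
systemctl restart kubelet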