1. Environment overview
(1) High-availability cluster plan
Hostname | IP address | Role |
---|---|---|
k8s-master01 | 192.168.2.96 | master node |
k8s-master02 | 192.168.2.97 | master node |
k8s-master03 | 192.168.2.98 | master node |
k8s-node01 | 192.168.2.99 | worker node |
k8s-node02 | 192.168.2.100 | worker node |
k8s-master-vip | 192.168.2.236 | keepalived virtual IP (VIP) |
(2) Network plan
Network | CIDR |
---|---|
Host network | 192.168.2.0/24 |
Pod network | 172.16.0.0/12 |
Service network | 10.0.0.0/16 |
2. Configuration
Item | Version |
---|---|
OS version | CentOS 7.9 |
Docker version | 20.10.x |
kubeadm version | v1.23.17 |
cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
docker --version
Docker version 20.10.21, build baeda1f
kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.17", GitCommit:"953be8927218ec8067e1af2641e540238ffd7576", GitTreeState:"clean", BuildDate:"2023-02-22T13:33:14Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/amd64"}
Note: the host network, the Kubernetes Service network, and the Pod network must not overlap.
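The no-overlap rule can be checked mechanically before installing anything. The following is a minimal sketch in pure bash; the helper names `ip_to_int` and `cidr_overlap` are made up for this illustration. It relies on the fact that two CIDR blocks overlap exactly when they agree on the shorter of the two prefixes.

```shell
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Print "overlap" if two CIDR blocks share any address, otherwise "disjoint".
cidr_overlap() {
  local ip1=${1%/*} len1=${1#*/} ip2=${2%/*} len2=${2#*/}
  # Compare both network addresses under the shorter prefix length.
  local len=$(( len1 < len2 ? len1 : len2 ))
  local mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  if (( ($(ip_to_int "$ip1") & mask) == ($(ip_to_int "$ip2") & mask) )); then
    echo overlap
  else
    echo disjoint
  fi
}

# The three networks planned above, checked pairwise.
cidr_overlap 192.168.2.0/24 172.16.0.0/12   # host vs Pod
cidr_overlap 192.168.2.0/24 10.0.0.0/16     # host vs Service
cidr_overlap 172.16.0.0/12 10.0.0.0/16      # Pod vs Service
```

With the plan in this article all three checks report `disjoint`; if you substitute your own ranges, any `overlap` result means the plan needs to change.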
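3. Set the hostnames
(1) On each machine, set the hostname according to the plan (run only the command that matches that machine)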
hostnamectl set-hostname k8s-master01
hostnamectl set-hostname k8s-master02
hostnamectl set-hostname k8s-master03
hostnamectl set-hostname k8s-node01
hostnamectl set-hostname k8s-node02
4. Edit the hosts file
(1) Install the vim editor; skip this step if it is already installed
yum install vim -y
(2) Edit the hosts file on every machine
vim /etc/hosts
192.168.2.96 k8s-master01
192.168.2.97 k8s-master02
192.168.2.98 k8s-master03
192.168.2.236 k8s-master-vip
192.168.2.99 k8s-node01
192.168.2.100 k8s-node02
Note: for a non-HA cluster, use Master01's IP in place of the VIP above.
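If you would rather not edit the file by hand on five machines, the entries can be appended by a script that is safe to run more than once. This is only a sketch: `add_host_entry` and the `HOSTS_FILE` variable are hypothetical names, and by default the demo writes to a scratch file rather than `/etc/hosts`.

```shell
#!/usr/bin/env bash
# add_host_entry FILE IP NAME: append "IP NAME" unless NAME is already present.
add_host_entry() {
  local file=$1 ip=$2 name=$3
  # -w matches NAME as a whole word, so k8s-node01 does not match k8s-node010.
  if ! grep -qw -- "$name" "$file" 2>/dev/null; then
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
}

# Demo on a scratch file; set HOSTS_FILE=/etc/hosts on a real node.
HOSTS_FILE=${HOSTS_FILE:-$(mktemp)}
while read -r ip name; do
  add_host_entry "$HOSTS_FILE" "$ip" "$name"
done <<'EOF'
192.168.2.96 k8s-master01
192.168.2.97 k8s-master02
192.168.2.98 k8s-master03
192.168.2.236 k8s-master-vip
192.168.2.99 k8s-node01
192.168.2.100 k8s-node02
EOF
```

Running it a second time adds nothing, which makes it convenient to push over SSH to every node.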
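5. Configure the yum repositories
(1) On every machine, run the following to configure the default yum repository and install dependencies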
curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
yum install -y yum-utils device-mapper-persistent-data lvm2
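(2) On every machine, run the following to configure the Docker yum repository
yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
(3) On every machine, run the following to configure the Kubernetes yum repository
cat <<EOF > /etc/yum.repos.d/kubernetes.repo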
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
sed -i -e '/mirrors.cloud.aliyuncs.com/d' -e '/mirrors.aliyuncs.com/d' /etc/yum.repos.d/CentOS-Base.repo
6. Install required tools
(1) On every machine, run the following to install the required tools
yum install wget jq psmisc vim net-tools telnet yum-utils device-mapper-persistent-data lvm2 git -y
7. Disable the firewall, swap, dnsmasq, and SELinux
(1) On every machine, run the following to disable the firewall
systemctl disable --now firewalld
(2) On every machine, run the following to disable SELinux
setenforce 0
sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/sysconfig/selinux
sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
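(3) On every machine, run the following to disable dnsmasq
systemctl disable --now dnsmasq
Failed to execute operation: No such file or directory
Note: the error above appears when the dnsmasq service is not installed (as on the VMware virtual machines used here) and can be ignored.
(4) On every machine, run the following to disable NetworkManager
systemctl disable --now NetworkManager
Note: do not disable NetworkManager on public cloud instances.
(5) On every machine, run the following to disable swap
Disable temporarily
swapoff -a && sysctl -w vm.swappiness=0
Disable permanently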
sed -ri '/^[^#]*swap/s@^@#@' /etc/fstab
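To see what the permanent change does before touching the real file, the same expression can be run as a filter on sample input. The sketch below wraps it in a hypothetical helper, `comment_out_swap`; the fstab lines are invented sample data, not taken from a real machine.

```shell
#!/usr/bin/env bash
# Read fstab content on stdin and comment out every active line mentioning swap.
# Lines that already start with '#' are left untouched.
comment_out_swap() {
  sed -E '/^[^#]*swap/s@^@#@'
}

# Preview on sample data (hypothetical device names).
comment_out_swap <<'EOF'
/dev/mapper/centos-root / xfs defaults 0 0
/dev/mapper/centos-swap swap swap defaults 0 0
EOF
```

On a real node, `comment_out_swap < /etc/fstab` gives a preview of the result without modifying anything.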
8. Time synchronization
(1) On every machine, run the following to install ntpdate
rpm -ivh http://mirrors.wlnmp.com/centos/wlnmp-release-centos.noarch.rpm
yum install ntpdate -y
(2) On every machine, run the following to set the time zone and synchronize the time
ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
echo 'Asia/Shanghai' >/etc/timezone
ntpdate time2.aliyun.com
# Add a cron job (paste the line below into the editor opened by crontab -e)
crontab -e
*/5 * * * * /usr/sbin/ntpdate time2.aliyun.com
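`crontab -e` is interactive, which is awkward to repeat on every machine. One possible non-interactive approach is to merge the line into the existing crontab through a filter. The helper name `merge_cron_line` is hypothetical; the sketch only defines the filter and shows the intended pipeline in a comment.

```shell
#!/usr/bin/env bash
# Read an existing crontab on stdin and print it with LINE appended,
# unless an identical line is already present.
merge_cron_line() {
  local line=$1 existing
  existing=$(cat)
  if [ -n "$existing" ]; then
    printf '%s\n' "$existing"
  fi
  # -x: whole-line match, -F: treat the cron expression as a literal string.
  if ! printf '%s\n' "$existing" | grep -qxF -- "$line"; then
    printf '%s\n' "$line"
  fi
}

# Intended use on a node (not executed here):
#   crontab -l 2>/dev/null \
#     | merge_cron_line '*/5 * * * * /usr/sbin/ntpdate time2.aliyun.com' \
#     | crontab -
```

Because the filter skips lines that already exist, re-running the pipeline does not create duplicate jobs.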
9. Configure limits
(1) On every machine, run the following to configure resource limits
ulimit -SHn 65535
vim /etc/security/limits.conf
Append the following at the end of the file
* soft nofile 65536
* hard nofile 131072
* soft nproc 65535
* hard nproc 655350
* soft memlock unlimited
* hard memlock unlimited
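10. Configure passwordless SSH from Master01
(1) On Master01, run the following so that it can log in to the other nodes without a password
ssh-keygen -t rsa # press Enter three times
for i in k8s-master01 k8s-master02 k8s-master03 k8s-node01 k8s-node02;do ssh-copy-id -i ~/.ssh/id_rsa.pub $i;done
Note: the loop prompts for a password once for each host it copies the key to.
(2) From Master01, log in to k8s-node02 to confirm that passwordless login works
ssh k8s-node02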
11. Download the source files
(1) On Master01, clone the repository
git clone https://gitee.com/jeckjohn/k8s-ha-install.git
(2) On Master01, run the following to list the branches
cd k8s-ha-install
git branch -a
[root@192 k8s-ha-install]# git branch -a
* master
remotes/origin/HEAD -> origin/master
remotes/origin/manual-installation
remotes/origin/manual-installation-v1.16.x
remotes/origin/manual-installation-v1.17.x
remotes/origin/manual-installation-v1.18.x
remotes/origin/manual-installation-v1.19.x
remotes/origin/manual-installation-v1.20.x
remotes/origin/manual-installation-v1.20.x-csi-hostpath
remotes/origin/manual-installation-v1.21.x
remotes/origin/manual-installation-v1.22.x
remotes/origin/manual-installation-v1.23.x
remotes/origin/manual-installation-v1.24.x
remotes/origin/manual-installation-v1.25.x
remotes/origin/manual-installation-v1.26.x
remotes/origin/manual-installation-v1.27.x
remotes/origin/manual-installation-v1.28.x
remotes/origin/master
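CentOS 7.9 kernel upgrade
For Kubernetes versions below 1.24, either Docker or containerd can serve as the runtime; from 1.24 onward, use containerd.
1. On every machine, run the following to install docker-ce 20.10. Installing Docker also installs containerd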
yum install docker-ce-20.10.* docker-ce-cli-20.10.* -y
2. On every machine, run the following to configure the kernel modules containerd needs
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF
3. On every machine, run the following to load the modules
modprobe -- overlay
modprobe -- br_netfilter
4. On every machine, run the following to configure the kernel parameters containerd needs
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
5. On every machine, run the following to apply the kernel parameters
sysctl --system
6. On every machine, run the following to generate the containerd configuration file
mkdir -p /etc/containerd
containerd config default | tee /etc/containerd/config.toml
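7. On every machine, switch containerd's cgroup driver to systemd: find containerd.runtimes.runc.options and set SystemdCgroup = true (if the key already exists, edit it in place; adding a duplicate key causes an error)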
vim /etc/containerd/config.toml
...
...
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
BinaryName = ""
CriuImagePath = ""
CriuPath = ""
CriuWorkPath = ""
IoGid = 0
IoUid = 0
NoNewKeyring = false
NoPivotRoot = false
Root = ""
ShimCgroup = ""
SystemdCgroup = true
8. On every machine, change the pause image in sandbox_image to an address that matches your version: registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6
vim /etc/containerd/config.toml
# Original content
sandbox_image = "registry.k8s.io/pause:3.6"
# Content after the change
sandbox_image = "registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6"
9. On every machine, run the following to start containerd and enable it at boot
systemctl daemon-reload
systemctl enable --now containerd
ls /run/containerd/containerd.sock
/run/containerd/containerd.sock
10. On every machine, run the following to tell the crictl client where the runtime socket is
cat > /etc/crictl.yaml <<EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF
11. On every machine, run the following to verify the setup
ctr image ls
REF TYPE DIGEST SIZE PLATFORMS LABELS
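Steps 7 and 8 above edit `/etc/containerd/config.toml` by hand in vim. For repeatable installs, the same two changes could be scripted. The function below is a sketch with a made-up name, `patch_containerd_config`. It assumes the file was generated by `containerd config default` and therefore already contains both keys, and it refuses to guess if `SystemdCgroup` is missing.

```shell
#!/usr/bin/env bash
# patch_containerd_config FILE [PAUSE_IMAGE]
# Sets SystemdCgroup = true and rewrites sandbox_image, keeping indentation.
patch_containerd_config() {
  local file=$1
  local pause=${2:-registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6}
  # Do not try to insert the key: where it belongs depends on the file layout.
  if ! grep -q 'SystemdCgroup' "$file"; then
    echo "SystemdCgroup key not found in $file" >&2
    return 1
  fi
  sed -e 's/^\([[:space:]]*\)SystemdCgroup = .*/\1SystemdCgroup = true/' \
      -e "s#^\([[:space:]]*\)sandbox_image = .*#\1sandbox_image = \"${pause}\"#" \
      "$file" > "${file}.new" && mv "${file}.new" "$file"
}

# Intended use on a node (not executed here):
#   patch_containerd_config /etc/containerd/config.toml
#   systemctl restart containerd
```

Writing to a temporary file and renaming it avoids leaving a half-written configuration behind if `sed` fails.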
1. On Master01, list the available Kubernetes versions
yum list kubeadm.x86_64 --showduplicates | sort -r
2. On every machine, run the following to install the latest 1.23 release of kubeadm, kubelet, and kubectl
yum install kubeadm-1.23* kubelet-1.23* kubectl-1.23* -y
Check the version
kubeadm version
3. On every machine, run the following to make kubelet use containerd as the runtime; skip this step if you chose Docker as the runtime
cat >/etc/sysconfig/kubelet<<EOF
KUBELET_EXTRA_ARGS="--container-runtime=remote --runtime-request-timeout=15m --container-runtime-endpoint=unix:///run/containerd/containerd.sock"
EOF
4. On every machine, run the following to enable kubelet at boot
systemctl daemon-reload
systemctl enable --now kubelet
systemctl status kubelet
Note: the cluster has not been initialized yet, so kubelet has no configuration file and cannot start. This is expected and needs no action.
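On public clouds, use the provider's own load balancer (for example Alibaba Cloud SLB or Tencent Cloud CLB) instead of haproxy and keepalived, because most public clouds do not support keepalived. On Alibaba Cloud, the kubectl client cannot run on a master node: SLB has a loopback limitation, meaning a server behind the SLB cannot reach the SLB itself. Tencent Cloud has fixed this problem, so it is the recommended option.
Note: for a non-HA cluster, haproxy and keepalived do not need to be installed.
1. Install HAProxy
(1) On all master nodes, install HAProxy and KeepAlived with yum
yum install keepalived haproxy -y
(2) On all master nodes, configure HAProxy; the HAProxy configuration is identical on every master node
mkdir -p /etc/haproxy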
vim /etc/haproxy/haproxy.cfg
global
maxconn 2000
ulimit-n 16384
log 127.0.0.1 local0 err
stats timeout 30s
defaults
log global
mode http
option httplog
timeout connect 5000
timeout client 50000
timeout server 50000
timeout http-request 15s
timeout http-keep-alive 15s
frontend monitor-in
bind *:33305
mode http
option httplog
monitor-uri /monitor
frontend k8s-master
bind 0.0.0.0:16443
bind 127.0.0.1:16443
mode tcp
option tcplog
tcp-request inspect-delay 5s
default_backend k8s-master
backend k8s-master
mode tcp
option tcplog
option tcp-check
balance roundrobin
default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
server k8s-master01 192.168.2.96:6443 check
server k8s-master02 192.168.2.97:6443 check
server k8s-master03 192.168.2.98:6443 check
(3) On all master nodes, restart HAProxy and verify that port 16443 is listening
systemctl restart haproxy
netstat -lntp | grep 16443
tcp 0 0 127.0.0.1:16443 0.0.0.0:* LISTEN 1075/haproxy
tcp 0 0 0.0.0.0:16443 0.0.0.0:*
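2. Install KeepAlived
On all master nodes, configure KeepAlived in /etc/keepalived/keepalived.conf. The configuration differs per node: pay attention to each node's IP address and network interface (the interface parameter)
(1) Configuration for Master01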
! Configuration File for keepalived
global_defs {
router_id LVS_DEVEL
script_user root
enable_script_security
}
vrrp_script chk_apiserver {
script "/etc/keepalived/check_apiserver.sh"
interval 5
weight -5
fall 2
rise 1
}
vrrp_instance VI_1 {
state MASTER
interface ens33
mcast_src_ip 192.168.2.96
virtual_router_id 51
priority 101
advert_int 2
authentication {
auth_type PASS
auth_pass K8SHA_KA_AUTH
}
virtual_ipaddress {
192.168.2.236
}
track_script {
chk_apiserver
}
}
(2) Configuration for Master02
! Configuration File for keepalived
global_defs {
router_id LVS_DEVEL
script_user root
enable_script_security
}
vrrp_script chk_apiserver {
script "/etc/keepalived/check_apiserver.sh"
interval 5
weight -5
fall 2
rise 1
}
vrrp_instance VI_1 {
state BACKUP
interface ens33
mcast_src_ip 192.168.2.97
virtual_router_id 51
priority 100
advert_int 2
authentication {
auth_type PASS
auth_pass K8SHA_KA_AUTH
}
virtual_ipaddress {
192.168.2.236
}
track_script {
chk_apiserver
}
}
(3) Configuration for Master03
! Configuration File for keepalived
global_defs {
router_id LVS_DEVEL
script_user root
enable_script_security
}
vrrp_script chk_apiserver {
script "/etc/keepalived/check_apiserver.sh"
interval 5
weight -5
fall 2
rise 1
}
vrrp_instance VI_1 {
state BACKUP
interface ens33
mcast_src_ip 192.168.2.98
virtual_router_id 51
priority 100
advert_int 2
authentication {
auth_type PASS
auth_pass K8SHA_KA_AUTH
}
virtual_ipaddress {
192.168.2.236
}
track_script {
chk_apiserver
}
}
(4) On all master nodes, create the KeepAlived health-check script
vim /etc/keepalived/check_apiserver.sh
#!/bin/bash
# Initialize the error counter
err=0
# Check up to three times whether the HAProxy process is running
for k in $(seq 1 3)
do
check_code=$(pgrep haproxy)
if [[ $check_code == "" ]]; then
# Process not found: increment the error counter and wait one second
err=$(expr $err + 1)
sleep 1
continue
else
# Process found: reset the error counter and leave the loop
err=0
break
fi
done
# If HAProxy was never found, stop keepalived so the VIP moves to another node
if [[ $err != "0" ]]; then
echo "systemctl stop keepalived"
/usr/bin/systemctl stop keepalived
exit 1
else
exit 0
fi
# Make the script executable
chmod +x /etc/keepalived/check_apiserver.sh
3. On all master nodes, start haproxy and keepalived
systemctl daemon-reload
systemctl enable --now haproxy
systemctl enable --now keepalived
4. Test the VIP to verify that keepalived works
ping 192.168.2.236 -c 4
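A successful ping shows that some node answers for the VIP, but not which one. One way to find the current holder is to look for the address in the output of `ip addr` on each master. The sketch below uses a hypothetical helper, `holds_vip`, that reads that output on stdin; the interface name `ens33` comes from the keepalived configuration above.

```shell
#!/usr/bin/env bash
# Succeed if the `ip -4 addr` output on stdin contains the given VIP.
holds_vip() {
  local vip=$1
  # Match "inet <vip>/" literally so 192.168.2.23 does not match 192.168.2.236.
  grep -qF -- "inet ${vip}/"
}

# Intended use on each master node (not executed here):
#   ip -4 addr show ens33 | holds_vip 192.168.2.236 && echo "VIP is on $(hostname)"
```

Stopping haproxy on the holder and re-running the check on the other masters is a simple way to watch the VIP fail over.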
1. On Master01, create the kubeadm-config.yaml configuration file with the following content
vim kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: 7t2weq.bjbawausm0jaxury
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.2.96 # IP address of Master01
  bindPort: 6443
nodeRegistration:
  criSocket: /run/containerd/containerd.sock
  name: k8s-master01
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  certSANs:
  - 192.168.2.236 # VIP address, or the load balancer address on a public cloud
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: 192.168.2.236:16443
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.23.17 # must match the kubeadm version
networking:
  dnsDomain: cluster.local
  podSubnet: 172.16.0.0/12
  serviceSubnet: 10.0.0.0/16
scheduler: {}
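2. On Master01, migrate the kubeadm configuration file to the current format
kubeadm config migrate --old-config kubeadm-config.yaml --new-config new.yaml
3. On Master01, copy new.yaml to the other master nodes
for i in k8s-master02 k8s-master03; do scp new.yaml $i:/root/; done
4. On all master nodes, pull the images in advance to shorten initialization (the other master nodes need no changes to the file, not even the IP address)
kubeadm config images pull --config /root/new.yaml
5. On all nodes, enable kubelet at boot
systemctl enable --now kubelet
6. Initialize Master01. Initialization generates the certificates and configuration files under /etc/kubernetes; afterwards the other master nodes simply join Master01
kubeadm init --config /root/new.yaml --upload-certs
On success, the output looks like this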
[root@k8s-master01 ~]# kubeadm init --config /root/new.yaml --upload-certs
[init] Using Kubernetes version: v1.23.17
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-master01 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.0.0.1 192.168.2.96 192.168.2.236]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-master01 localhost] and IPs [192.168.2.96 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-master01 localhost] and IPs [192.168.2.96 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 22.532424 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.23" in namespace kube-system with the configuration for the kubelets in the cluster
NOTE: The "kubelet-config-1.23" naming of the kubelet ConfigMap is deprecated. Once the UnversionedKubeletConfigMap feature gate graduates to Beta the default name will become just "kubelet-config". Kubeadm upgrade will handle this transition transparently.
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
3b03fabd6969ee744908335536f94e0ac11d15be87edd918d8ad08324ddfdbb2
[mark-control-plane] Marking the node k8s-master01 as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node k8s-master01 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 7t2weq.bjbawausm0jaxury
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of the control-plane node running the following command on each as root:
kubeadm join 192.168.2.236:16443 --token 7t2weq.bjbawausm0jaxury \
--discovery-token-ca-cert-hash sha256:c5891bc7b53ee8e7548de96db1f4ed5ef353b77e572910f8aa3965040356701d \
--control-plane --certificate-key 3b03fabd6969ee744908335536f94e0ac11d15be87edd918d8ad08324ddfdbb2
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.2.236:16443 --token 7t2weq.bjbawausm0jaxury \
--discovery-token-ca-cert-hash sha256:c5891bc7b53ee8e7548de96db1f4ed5ef353b77e572910f8aa3965040356701d
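Supplement:
If initialization fails, reset and then initialize again with the following command (do not run it unless initialization failed)
kubeadm reset -f ; ipvsadm --clear ; rm -rf ~/.kube
7. On Master01, set up the kubeconfig used to access the Kubernetes cluster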
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Check the node status
kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 NotReady control-plane,master 4m5s v1.23.17
1. Join Master02 and Master03 to the cluster (run on Master02 and Master03)
kubeadm join 192.168.2.236:16443 --token 7t2weq.bjbawausm0jaxury \
--discovery-token-ca-cert-hash sha256:c5891bc7b53ee8e7548de96db1f4ed5ef353b77e572910f8aa3965040356701d \
--control-plane --certificate-key 3b03fabd6969ee744908335536f94e0ac11d15be87edd918d8ad08324ddfdbb2
2. Join Node01 and Node02 to the cluster (run on Node01 and Node02)
kubeadm join 192.168.2.236:16443 --token 7t2weq.bjbawausm0jaxury \
--discovery-token-ca-cert-hash sha256:c5891bc7b53ee8e7548de96db1f4ed5ef353b77e572910f8aa3965040356701d
3. On Master01, check the node status
kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 NotReady control-plane,master 7m11s v1.23.17
k8s-master02 NotReady control-plane,master 2m28s v1.23.17
k8s-master03 NotReady control-plane,master 102s v1.23.17
k8s-node01 NotReady <none> 106s v1.23.17
k8s-node02 NotReady <none> 84s v1.23.17
1. On Master01, switch to the matching branch and enter the calico directory
cd /root/k8s-ha-install && git checkout manual-installation-v1.23.x && cd calico/
2. Extract the Pod subnet and assign it to a variable
POD_SUBNET=`cat /etc/kubernetes/manifests/kube-controller-manager.yaml | grep cluster-cidr= | awk -F= '{print $NF}'`
3. Substitute the Pod subnet into calico.yaml
sed -i "s#POD_CIDR#${POD_SUBNET}#g" calico.yaml
4. Install Calico
kubectl apply -f calico.yaml
5. Check the node status
kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready control-plane,master 9h v1.23.17
k8s-master02 Ready control-plane,master 9h v1.23.17
k8s-master03 Ready control-plane,master 9h v1.23.17
k8s-node01 Ready <none> 9h v1.23.17
k8s-node02 Ready <none> 9h v1.23.17
6. Check the Pod status; all Pods should be Running
kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-6f6595874c-tntnr 1/1 Running 0 8m52s
calico-node-5mj9g 1/1 Running 1 (41s ago) 8m52s
calico-node-hhjrv 1/1 Running 2 (61s ago) 8m52s
calico-node-szjm7 1/1 Running 0 8m52s
calico-node-xcgwq 1/1 Running 0 8m52s
calico-node-ztbkj 1/1 Running 1 (11s ago) 8m52s
calico-typha-6b6cf8cbdf-8qj8z 1/1 Running 0 8m52s
coredns-65c54cc984-nrhlg 1/1 Running 0 9h
coredns-65c54cc984-xkx7w 1/1 Running 0 9h
etcd-k8s-master01 1/1 Running 1 (29m ago) 9h
etcd-k8s-master02 1/1 Running 1 (29m ago) 9h
etcd-k8s-master03 1/1 Running 1 (29m ago) 9h
kube-apiserver-k8s-master01 1/1 Running 1 (29m ago) 9h
kube-apiserver-k8s-master02 1/1 Running 1 (29m ago) 9h
kube-apiserver-k8s-master03 1/1 Running 2 (29m ago) 9h
kube-controller-manager-k8s-master01 1/1 Running 2 (29m ago) 9h
kube-controller-manager-k8s-master02 1/1 Running 1 (29m ago) 9h
kube-controller-manager-k8s-master03 1/1 Running 1 (29m ago) 9h
kube-proxy-7rmrs 1/1 Running 1 (29m ago) 9h
kube-proxy-bmqhr 1/1 Running 1 (29m ago) 9h
kube-proxy-l9rqg 1/1 Running 1 (29m ago) 9h
kube-proxy-nn465 1/1 Running 1 (29m ago) 9h
kube-proxy-sghfb 1/1 Running 1 (29m ago) 9h
kube-scheduler-k8s-master01 1/1 Running 2 (29m ago) 9h
kube-scheduler-k8s-master02 1/1 Running 1 (29m ago) 9h
kube-scheduler-k8s-master03 1/1 Running 1 (29m ago) 9h
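In step 2 above, if the `grep` finds nothing, `POD_SUBNET` ends up empty and the `sed` in step 3 silently replaces `POD_CIDR` with an empty string. A slightly more defensive variant is sketched below; `extract_cluster_cidr` is a hypothetical helper name, and the manifest path in the usage comment is the one used in this article.

```shell
#!/usr/bin/env bash
# Print the value of --cluster-cidr from a kube-controller-manager manifest.
extract_cluster_cidr() {
  awk -F= '/--cluster-cidr=/ {print $NF; exit}' "$1"
}

# Intended use on Master01 (not executed here):
#   POD_SUBNET=$(extract_cluster_cidr /etc/kubernetes/manifests/kube-controller-manager.yaml)
#   if [ -z "$POD_SUBNET" ]; then
#     echo "cluster-cidr not found, refusing to edit calico.yaml" >&2
#   else
#     sed -i "s#POD_CIDR#${POD_SUBNET}#g" calico.yaml
#   fi
```

With the configuration in this article the extracted value should be `172.16.0.0/12`, the `podSubnet` set in the kubeadm configuration.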
Recent Kubernetes versions collect system resource metrics through metrics-server, which reports the CPU and memory usage of nodes and Pods.
1. Copy front-proxy-ca.crt from Master01 to Node01 and Node02
scp /etc/kubernetes/pki/front-proxy-ca.crt k8s-node01:/etc/kubernetes/pki/front-proxy-ca.crt
scp /etc/kubernetes/pki/front-proxy-ca.crt k8s-node02:/etc/kubernetes/pki/front-proxy-ca.crt
2. On Master01, install metrics-server
cd /root/k8s-ha-install/kubeadm-metrics-server
kubectl create -f comp.yaml
3. On Master01, check the metrics-server deployment
kubectl get po -n kube-system -l k8s-app=metrics-server
NAME READY STATUS RESTARTS AGE
metrics-server-5cf8885b66-jdjtb 1/1 Running 0 115s
4. On Master01, check node resource usage
kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k8s-master01 130m 0% 1019Mi 12%
k8s-master02 102m 0% 1064Mi 13%
k8s-master03 93m 0% 971Mi 12%
k8s-node01 45m 0% 541Mi 6%
k8s-node02 57m 0% 544Mi 6%
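Dashboard is a web-based Kubernetes user interface. You can use Dashboard to deploy containerized applications to a Kubernetes cluster, troubleshoot them, and manage cluster resources. It gives an overview of the applications running in the cluster and lets you create or modify Kubernetes resources (such as Deployments, Jobs, and DaemonSets). For example, you can scale a Deployment, start a rolling update, restart a Pod, or deploy a new application with a wizard. Dashboard also shows the state of the resources in the cluster and any errors that have occurred.
1. On Master01, install Dashboard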
cd /root/k8s-ha-install/dashboard/
kubectl create -f .
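2. On Master01, check the Dashboard Services
kubectl get svc -n kubernetes-dashboard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
dashboard-metrics-scraper ClusterIP 10.0.159.210 <none> 8000/TCP 2m6s
kubernetes-dashboard NodePort 10.0.241.159 <none> 443:31822/TCP 2m6s
3. Add startup flags to the Google Chrome shortcut so that certificate errors do not block access to Dashboard
(1) Right-click the Google Chrome shortcut and choose [Properties]
(2) Append the flags below to the [Target] field; they go after the existing path, separated from it by a space
--test-type --ignore-certificate-errors
4. On Master01, look up the token
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}')
5. Open Google Chrome and go to https://<any node IP>:<NodePort of the kubernetes-dashboard Service>; Master01 and the NodePort shown above are used here as an example
https://192.168.2.96:31822/
6. Switch the namespace to kube-system; the default namespace has no resources yet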
1. On Master01, switch kube-proxy to ipvs mode (the default is iptables) by setting mode: "ipvs" in the ConfigMap
kubectl edit cm kube-proxy -n kube-system
2. On Master01, trigger a rolling update of the kube-proxy Pods
kubectl patch daemonset kube-proxy -p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"date\":\"`date +'%s'`\"}}}}}" -n kube-system
3. On Master01, watch the kube-proxy rolling update
kubectl get po -n kube-system | grep kube-proxy
kube-proxy-2kz9g 1/1 Running 0 58s
kube-proxy-b54gh 1/1 Running 0 63s
kube-proxy-kclcc 1/1 Running 0 61s
kube-proxy-pv8gc 1/1 Running 0 59s
kube-proxy-xt52m 1/1 Running 0 56s
4. On Master01, verify the kube-proxy mode
curl 127.0.0.1:10249/proxyMode
ipvs
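Step 1 above makes the change interactively with `kubectl edit`. The edit itself is a one-line substitution, so it could also be expressed as a filter. The sketch below assumes the ConfigMap's `config.conf` contains a line of the form `mode: ""`, which is what a default kubeadm install usually has; `set_proxy_mode_ipvs` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Rewrite the kube-proxy "mode:" line to ipvs, preserving indentation.
# Keys such as detectLocalMode are not touched because the pattern is anchored
# to optional leading whitespace followed directly by "mode:".
set_proxy_mode_ipvs() {
  sed -E 's/^([[:space:]]*)mode:.*/\1mode: "ipvs"/'
}

# Possible use on Master01 (an assumption, not executed here):
#   kubectl -n kube-system get cm kube-proxy -o yaml \
#     | set_proxy_mode_ipvs \
#     | kubectl apply -f -
```

After applying the change, the kube-proxy Pods still have to be restarted as in step 2 before `curl 127.0.0.1:10249/proxyMode` reports `ipvs`.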
1. On Master01, enable kubectl shell completion
source <(kubectl completion bash)
echo "source <(kubectl completion bash)" >> ~/.bashrc
2. On Master01, define a short alias for kubectl
alias k=kubectl
complete -o default -F __start_kubectl k
1. On Master01, check that all nodes are healthy; every node should be Ready
kubectl get node
[root@k8s-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready control-plane,master 2d2h v1.23.17
k8s-master02 Ready control-plane,master 2d v1.23.17
k8s-master03 Ready control-plane,master 2d v1.23.17
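k8s-node01 Ready <none> 2d v1.23.17
k8s-node02 Ready <none> 2d v1.23.17
2. On Master01, check that all Pods are healthy (for example with kubectl get po -A): READY should read N/N and STATUS should be Running
3. On Master01, check that the cluster networks do not conflict
(1) On Master01, check the Service network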
kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
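kubernetes ClusterIP 10.0.0.1 <none> 443/TCP 2d2h
nginx NodePort 10.0.25.21 <none> 80:30273/TCP 47h
springboot-demo NodePort 10.0.115.157 <none> 30000:30001/TCP 45h
(2) On Master01, check the Pod network (for example with kubectl get po -A -owide). Pod IPs fall into two ranges: Pods that use HostNetwork take addresses from the host network, and all other Pods take addresses from the Pod network
4. On Master01, check that resources can be created
(1) On Master01, create a deployment named cluster-test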
kubectl create deploy cluster-test --image=registry.cn-hangzhou.aliyuncs.com/zq-demo/debug-tools -- sleep 3600
(2) On Master01, check that the deployment's Pod was created
kubectl get po
NAME READY STATUS RESTARTS AGE
cluster-test-79b978867f-mcqgr 1/1 Running 2 (10m ago) 2d
nginx-85b98978db-vb9gc 1/1 Running 1 (10m ago) 47h
springboot-demo-6d87f66f6b-fxpgj 1/1 Running 1 (10m ago) 45h
springboot-demo-6d87f66f6b-p4tmd 1/1 Running 1 (10m ago) 45h
springboot-demo-6d87f66f6b-tmtkn 1/1 Running 1 (10m ago) 45h
5. On Master01, check that Pods can resolve Services
(1) On Master01, resolve kubernetes from inside the test Pod; the address matches the Service address shown above
[root@k8s-master01 ~]# kubectl exec -it cluster-test-79b978867f-mcqgr -- bash
(06:53 cluster-test-79b978867f-mcqgr:/) nslookup kubernetes
Server: 10.0.0.10
Address: 10.0.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.0.0.1
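(2) On Master01, resolve kube-dns.kube-system from inside the test Pod; the address is the kube-dns Service IP, the same as the DNS server in the output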
(06:53 cluster-test-79b978867f-mcqgr:/) nslookup kube-dns.kube-system
Server: 10.0.0.10
Address: 10.0.0.10#53
Name: kube-dns.kube-system.svc.cluster.local
Address: 10.0.0.10
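6. Check that every node can reach the kubernetes Service on port 443 and the kube-dns Service on port 53
(1) On every machine, test access to the kubernetes Service on port 443 (a certificate error still shows that the port is reachable)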
[root@k8s-master02 ~]# curl https://10.0.0.1:443
curl: (60) Peer's Certificate issuer is not recognized.
More details here: http://curl.haxx.se/docs/sslcerts.html
curl performs SSL certificate verification by default, using a "bundle"
of Certificate Authority (CA) public keys (CA certs). If the default
bundle file isn't adequate, you can specify an alternate file
using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
the bundle, the certificate verification probably failed due to a
problem with the certificate (it might be expired, or the name might
not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
the -k (or --insecure) option.
(2) On every machine, test access to the kube-dns Service on port 53 (an empty reply shows that the TCP connection succeeded)
curl 10.0.0.10:53
curl: (52) Empty reply from server
7. Check that Pods and hosts can communicate
(1) On Master01, look up the Pod IPs
[root@k8s-master01 ~]# kubectl get po -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cluster-test-79b978867f-mcqgr 1/1 Running 2 (16m ago) 2d 172.27.14.201 k8s-node02 <none> <none>
nginx-85b98978db-vb9gc 1/1 Running 1 (16m ago) 47h 172.17.125.7 k8s-node01 <none> <none>
springboot-demo-6d87f66f6b-fxpgj 1/1 Running 1 (16m ago) 45h 172.27.14.203 k8s-node02 <none> <none>
springboot-demo-6d87f66f6b-p4tmd 1/1 Running 1 (16m ago) 45h 172.27.14.204 k8s-node02 <none> <none>
springboot-demo-6d87f66f6b-tmtkn 1/1 Running 1 (16m ago) 45h 172.17.125.9 k8s-node01 <none> <none>
(2) On Master01, ping a Pod IP
[root@k8s-master01 ~]# ping 172.27.14.201
PING 172.27.14.201 (172.27.14.201) 56(84) bytes of data.
64 bytes from 172.27.14.201: icmp_seq=1 ttl=63 time=0.418 ms
64 bytes from 172.27.14.201: icmp_seq=2 ttl=63 time=0.222 ms
64 bytes from 172.27.14.201: icmp_seq=3 ttl=63 time=0.269 ms
64 bytes from 172.27.14.201: icmp_seq=4 ttl=63 time=0.364 ms
64 bytes from 172.27.14.201: icmp_seq=5 ttl=63 time=0.197 ms
^C
--- 172.27.14.201 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4106ms
rtt min/avg/max/mdev = 0.197/0.294/0.418/0.084 ms
8. Check that Pods can communicate with each other
(1) On Master01, list the Pods in the default namespace
[root@k8s-master01 ~]# kubectl get po -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cluster-test-79b978867f-mcqgr 1/1 Running 2 (19m ago) 2d 172.27.14.201 k8s-node02 <none> <none>
nginx-85b98978db-vb9gc 1/1 Running 1 (19m ago) 47h 172.17.125.7 k8s-node01 <none> <none>
springboot-demo-6d87f66f6b-fxpgj 1/1 Running 1 (19m ago) 45h 172.27.14.203 k8s-node02 <none> <none>
springboot-demo-6d87f66f6b-p4tmd 1/1 Running 1 (19m ago) 45h 172.27.14.204 k8s-node02 <none> <none>
springboot-demo-6d87f66f6b-tmtkn 1/1 Running 1 (19m ago) 45h 172.17.125.9 k8s-node01 <none> <none>
(2) On Master01, list the Pods in the kube-system namespace
[root@k8s-master01 ~]# kubectl get po -n kube-system -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-6f6595874c-rvpbg 1/1 Running 4 (20m ago) 2d1h 172.18.195.8 k8s-master03 <none> <none>
calico-node-6vwpm 1/1 Running 2 (20m ago) 2d1h 192.168.2.99 k8s-node01 <none> <none>
calico-node-b8kfr 1/1 Running 3 (25m ago) 2d1h 192.168.2.96 k8s-master01 <none> <none>
calico-node-blqnz 1/1 Running 2 (20m ago) 2d1h 192.168.2.100 k8s-node02 <none> <none>
calico-node-hpw8m 1/1 Running 3 (20m ago) 2d1h 192.168.2.98 k8s-master03 <none> <none>
calico-node-pbb9c 1/1 Running 2 2d1h 192.168.2.97 k8s-master02 <none> <none>
calico-typha-6b6cf8cbdf-4jt8j 1/1 Running 2 (20m ago) 2d1h 192.168.2.99 k8s-node01 <none> <none>
coredns-65c54cc984-cdksw 1/1 Running 2 (20m ago) 2d3h 172.18.195.9 k8s-master03 <none> <none>
coredns-65c54cc984-qb5nz 1/1 Running 2 (20m ago) 2d3h 172.18.195.7 k8s-master03 <none> <none>
etcd-k8s-master01 1/1 Running 3 (21m ago) 2d3h 192.168.2.96 k8s-master01 <none> <none>
etcd-k8s-master02 1/1 Running 2 (21m ago) 2d1h 192.168.2.97 k8s-master02 <none> <none>
etcd-k8s-master03 1/1 Running 2 (20m ago) 2d1h 192.168.2.98 k8s-master03 <none> <none>
kube-apiserver-k8s-master01 1/1 Running 7 (20m ago) 2d3h 192.168.2.96 k8s-master01 <none> <none>
kube-apiserver-k8s-master02 1/1 Running 3 (18m ago) 2d1h 192.168.2.97 k8s-master02 <none> <none>
kube-apiserver-k8s-master03 1/1 Running 3 (20m ago) 2d1h 192.168.2.98 k8s-master03 <none> <none>
kube-controller-manager-k8s-master01 1/1 Running 5 (25m ago) 2d3h 192.168.2.96 k8s-master01 <none> <none>
kube-controller-manager-k8s-master02 1/1 Running 2 (21m ago) 2d1h 192.168.2.97 k8s-master02 <none> <none>
kube-controller-manager-k8s-master03 1/1 Running 3 (19m ago) 2d1h 192.168.2.98 k8s-master03 <none> <none>
kube-proxy-258rd 1/1 Running 1 (20m ago) 2d 192.168.2.99 k8s-node01 <none> <none>
kube-proxy-7vgnc 1/1 Running 1 (20m ago) 2d 192.168.2.98 k8s-master03 <none> <none>
kube-proxy-9gsq5 1/1 Running 1 (21m ago) 2d 192.168.2.97 k8s-master02 <none> <none>
kube-proxy-fltdx 1/1 Running 1 (20m ago) 2d 192.168.2.100 k8s-node02 <none> <none>
kube-proxy-hc7z9 1/1 Running 1 (25m ago) 2d 192.168.2.96 k8s-master01 <none> <none>
kube-scheduler-k8s-master01 1/1 Running 5 (25m ago) 2d3h 192.168.2.96 k8s-master01 <none> <none>
kube-scheduler-k8s-master02 1/1 Running 2 (21m ago) 2d1h 192.168.2.97 k8s-master02 <none> <none>
kube-scheduler-k8s-master03 1/1 Running 3 (19m ago) 2d1h 192.168.2.98 k8s-master03 <none> <none>
metrics-server-5cf8885b66-w9dkt 1/1 Running 1 (20m ago) 2d 172.27.14.205 k8s-node02 <none> <none>
(3) On Master01, exec into cluster-test-79b978867f-mcqgr and run a ping test
[root@k8s-master01 ~]# kubectl exec -it cluster-test-79b978867f-mcqgr -- bash
(07:02 cluster-test-79b978867f-mcqgr:/) ping 192.168.2.99
PING 192.168.2.99 (192.168.2.99) 56(84) bytes of data.
64 bytes from 192.168.2.99: icmp_seq=1 ttl=63 time=0.260 ms
64 bytes from 192.168.2.99: icmp_seq=2 ttl=63 time=0.431 ms
64 bytes from 192.168.2.99: icmp_seq=3 ttl=63 time=0.436 ms
64 bytes from 192.168.2.99: icmp_seq=4 ttl=63 time=0.419 ms
64 bytes from 192.168.2.99: icmp_seq=5 ttl=63 time=0.253 ms
64 bytes from 192.168.2.99: icmp_seq=6 ttl=63 time=0.673 ms
64 bytes from 192.168.2.99: icmp_seq=7 ttl=63 time=0.211 ms
64 bytes from 192.168.2.99: icmp_seq=8 ttl=63 time=0.374 ms
64 bytes from 192.168.2.99: icmp_seq=9 ttl=63 time=0.301 ms
64 bytes from 192.168.2.99: icmp_seq=10 ttl=63 time=0.194 ms
^C
--- 192.168.2.99 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9217ms
rtt min/avg/max/mdev = 0.194/0.355/0.673/0.137 ms
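The Pod health check in step 2 of this verification section (READY is N/N and STATUS is Running) is easy to get wrong by eye on a long list. As a rough aid, the listing can be filtered so that only the exceptions remain. The helper name `unhealthy_pods` is made up for this sketch, and it expects the column layout of `kubectl get po -A --no-headers` (NAMESPACE, NAME, READY, STATUS, ...).

```shell
#!/usr/bin/env bash
# Read `kubectl get po -A --no-headers` output on stdin and print
# namespace/name for every Pod that is not Running or not fully ready.
unhealthy_pods() {
  awk '{
    split($3, ready, "/")                      # READY column, e.g. 1/1
    if ($4 != "Running" || ready[1] != ready[2])
      print $1 "/" $2
  }'
}

# Intended use on Master01 (not executed here):
#   kubectl get po -A --no-headers | unhealthy_pods
```

An empty result means every Pod passed. Note that Pods of finished Jobs show a STATUS of Completed and would be listed too, so treat the output as a list to inspect rather than a verdict.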