• Kubernetes is a container cluster management system open-sourced by Google in 2014, commonly abbreviated K8s.
• K8s deploys, scales, and manages containerized applications.
• K8s provides container orchestration, resource scheduling, elastic scaling, deployment management, service discovery, and more.
• The goal of Kubernetes is to make deploying containerized applications simple and efficient.
Official website: http://www.kubernetes.io
![architecture overview](https://img-blog.csdnimg.cn/a130eebf82b14c54863c67cdf240b508.png)
kube-apiserver: the Kubernetes API and the cluster's unified entry point, coordinating all components. It exposes its services through a RESTful API; every create/update/delete/query and watch on object resources is handled by the APIServer and then persisted to etcd.
kube-controller-manager: handles routine background tasks in the cluster. Each resource has a corresponding controller, and the ControllerManager is responsible for managing all of these controllers.
kube-scheduler: picks a Node for each newly created Pod according to the scheduling algorithm; Pods may land on the same node or on different nodes.
etcd: a distributed key-value store that holds cluster state, such as Pod and Service objects.
kubelet: the Master's agent on each Node. It manages the lifecycle of the containers on the local machine: creating containers, mounting Pod volumes, downloading secrets, reporting container and node status, and so on. The kubelet turns every Pod into a set of containers.
kube-proxy: implements the Pod network proxy on each Node, maintaining network rules and layer-4 load balancing.
Container engine: runs the containers.
Higher-level objects that deploy and manage Pods:
ReplicaSet: ensures the expected number of Pod replicas
Deployment: stateless application deployment
StatefulSet: stateful application deployment
DaemonSet: ensures every Node runs a copy of the same Pod
Job: one-off tasks
CronJob: scheduled tasks
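As a quick illustration of these higher-level objects, a minimal Deployment can be applied once the cluster is up (a sketch; the name, image, and replica count are arbitrary examples, not part of this cluster setup):
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-demo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-demo
  template:
    metadata:
      labels:
        app: nginx-demo
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
EOF
The Deployment keeps 2 replicas running by managing a ReplicaSet underneath, which is exactly the layering the list above describes.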
| Node name | ip_addr | keepalived | HAPROXY | MASTER | WORKER |
|---|---|---|---|---|---|
| keepalive-1 | 192.168.3.141 | yes | yes | | yes |
| keepalive-2 | 192.168.3.142 | yes | yes | | yes |
| keepalive-3 | 192.168.3.143 | yes | yes | | yes |
| k8master-1 | 192.168.3.144 | | | yes | |
| k8master-2 | 192.168.3.145 | | | yes | |
| k8master-3 | 192.168.3.146 | | | yes | |
| k8worker-1 | 192.168.3.147 | | | | yes |
| k8worker-2 | 192.168.3.148 | | | | yes |
| VIP | 192.168.3.140 | | | | |

Note:
Master nodes run: kube-apiserver, kube-controller-manager, kube-scheduler, etcd
Worker nodes run: kubelet, kube-proxy, flannel
(The keepalive nodes also appear in WORK_IPS below, so they carry worker-side flannel components.)
Note:
Set a permanent hostname on each machine, then log in again:
hostnamectl set-hostname k8master-1 # replace k8master-1 with the machine's own hostname
If DNS cannot resolve these hostnames, add hostname-to-IP mappings to the /etc/hosts file on every machine:
cat >> /etc/hosts <<EOF
192.168.3.140 vip
192.168.3.141 ka-1
192.168.3.142 ka-2
192.168.3.143 ka-3
192.168.3.144 k8master-1
192.168.3.145 k8master-2
192.168.3.146 k8master-3
192.168.3.147 k8worker-1
192.168.3.148 k8worker-2
EOF
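To confirm name resolution works, a quick loop such as the following can be run afterwards (a sketch; it just pings each entry once):
for h in vip ka-1 ka-2 ka-3 k8master-1 k8master-2 k8master-3 k8worker-1 k8worker-2
do
  ping -c 1 -W 1 $h >/dev/null && echo "$h ok" || echo "$h FAILED"
done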
Add a docker account on every machine:
useradd -m docker
Set up passwordless root login from k8master-1 to all nodes:
ssh-keygen -t rsa
ssh-copy-id root@k8master-1
ssh-copy-id root@k8master-2
ssh-copy-id root@k8master-3
ssh-copy-id root@ka-1
ssh-copy-id root@ka-2
ssh-copy-id root@ka-3
ssh-copy-id root@k8worker-1
ssh-copy-id root@k8worker-2
Add the binaries directory to the PATH environment variable:
echo 'PATH=/app/k8s/bin:$PATH' >>/root/.bashrc
source /root/.bashrc
Install dependency packages on every machine:
CentOS:
yum -y install bridge-utils chrony ipvsadm ipset sysstat conntrack libseccomp wget tcpdump screen vim nfs-utils bind-utils socat telnet sshpass net-tools lrzsz yum-utils device-mapper-persistent-data lvm2 tree nc lsof strace nmon iptraf iftop rpcbind mlocate
On every machine, turn off the firewall, clear its rules, and set the default forwarding policy:
systemctl stop firewalld
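systemctl stop on its own leaves the service enabled and any loaded rules in place. A fuller sequence matching the description above would be (an assumed sketch, not from the original):
systemctl disable firewalld                # keep firewalld from starting at boot
iptables -F && iptables -X                 # flush and delete filter-table rules and chains
iptables -F -t nat && iptables -X -t nat   # same for the nat table
iptables -P FORWARD ACCEPT                 # default forwarding policy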
If swap is enabled, kubelet fails to start (this can be bypassed by setting --fail-swap-on=false), so turn swap off on every machine. Also comment out the corresponding entry in /etc/fstab so swap is not mounted again at boot:
swapoff -a
sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
Disable SELinux, otherwise later K8s volume mounts may fail with Permission denied:
setenforce 0
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
Raise resource limits on every machine:
cat >> /etc/security/limits.conf <<EOF
* soft nproc 65535
* hard nproc 65535
* soft nofile 65535
* hard nofile 65535
* soft memlock unlimited
* hard memlock unlimited
EOF
Load the kernel modules needed by IPVS and container networking:
cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4 # renamed to nf_conntrack in kernel 4.19+; an error here can be ignored
modprobe -- overlay
modprobe -- br_netfilter
EOF
chmod 755 /etc/sysconfig/modules/ipvs.modules
bash /etc/sysconfig/modules/ipvs.modules
lsmod | grep -e ip_vs -e nf_conntrack_ipv4
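The /etc/sysconfig/modules mechanism is CentOS legacy. On systemd-based systems the supported way to load modules at boot is a drop-in under /etc/modules-load.d (an alternative sketch, not from the original; module names as above, with the 4.19+ rename applied):
cat > /etc/modules-load.d/ipvs.conf <<EOF
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
overlay
br_netfilter
EOF
systemctl restart systemd-modules-load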
Tune kernel parameters (note that sysctl does not accept trailing comments on a value line, so the comments go on their own lines):
cat > kubernetes.conf <<EOF
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.ipv4.ip_forward=1
net.ipv4.tcp_tw_recycle=0
# avoid swap; the system may still use it under OOM pressure
vm.swappiness=0
# do not check whether enough physical memory is available
vm.overcommit_memory=1
# do not panic on OOM; let the OOM killer act
vm.panic_on_oom=0
fs.inotify.max_user_instances=8192
fs.inotify.max_user_watches=1048576
fs.file-max=52706963
fs.nr_open=52706963
net.ipv6.conf.all.disable_ipv6=1
net.netfilter.nf_conntrack_max=2310720
EOF
cp kubernetes.conf /etc/sysctl.d/kubernetes.conf
sysctl -p /etc/sysctl.d/kubernetes.conf
# Set the system timezone
timedatectl set-timezone Asia/Shanghai
# Keep the hardware clock in UTC
timedatectl set-local-rtc 0
# Restart services that depend on the system time
systemctl restart rsyslog
systemctl restart crond
systemctl stop postfix && systemctl disable postfix
systemd's journald is the default logging tool on CentOS 7; it records logs for the entire system, the kernel, and every service unit.
By default journald forwards logs to rsyslog, so the same logs get written more than once, /var/log/messages fills up with unrelated entries that make later inspection harder, and system performance suffers. We therefore reconfigure journald to keep its own persistent journal and stop the forwarding:
mkdir /var/log/journal # directory for persistent logs
mkdir /etc/systemd/journald.conf.d
cat > /etc/systemd/journald.conf.d/99-prophet.conf <<EOF
[Journal]
# Persist logs to disk
Storage=persistent
# Compress historical logs
Compress=yes
SyncIntervalSec=5m
RateLimitInterval=30s
RateLimitBurst=1000
# Cap total disk usage at 10G
SystemMaxUse=10G
# Cap a single journal file at 200M
SystemMaxFileSize=200M
# Keep logs for 2 weeks
MaxRetentionSec=2week
# Do not forward logs to syslog
ForwardToSyslog=no
EOF
systemctl restart systemd-journald
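To confirm the new settings took effect (a quick check):
journalctl --disk-usage
ls /var/log/journal   # a machine-id subdirectory appears once persistence is active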
Create the shared environment file (note the > redirection, which the original command was missing; quoting 'EOF' writes $(...) and $PATH into the script verbatim instead of expanding them now):
cat > /app/k8s/bin/environment.sh <<'EOF'
#!/bin/bash
# Encryption key used to generate the EncryptionConfig
export ENCRYPTION_KEY=$(head -c 32 /dev/urandom | base64)
# IPs of all cluster machines
export NODE_IPS=(192.168.3.141 192.168.3.142 192.168.3.143 192.168.3.144 192.168.3.145 192.168.3.146 192.168.3.147 192.168.3.148)
# Hostnames matching the IPs above
export NODE_NAMES=(ka-1 ka-2 ka-3 k8master-1 k8master-2 k8master-3 k8worker-1 k8worker-2)
# etcd cluster client endpoints
export ETCD_ENDPOINTS="https://192.168.3.144:2379,https://192.168.3.145:2379,https://192.168.3.146:2379"
# IPs and ports for etcd inter-member communication
export ETCD_NODES="k8master-1=https://192.168.3.144:2380,k8master-2=https://192.168.3.145:2380,k8master-3=https://192.168.3.146:2380"
# etcd member IPs
export ETCD_IPS=(192.168.3.144 192.168.3.145 192.168.3.146)
export API_IPS=(192.168.3.144 192.168.3.145 192.168.3.146)
export CTL_IPS=(192.168.3.144 192.168.3.145 192.168.3.146)
export WORK_IPS=(192.168.3.141 192.168.3.142 192.168.3.143 192.168.3.147 192.168.3.148)
export KA_IPS=(192.168.3.141 192.168.3.142 192.168.3.143)
export VIP_IP="192.168.3.140"
# Address and port of the kube-apiserver reverse proxy (kube-nginx)
export KUBE_APISERVER="https://192.168.3.140:8443"
# Network interface used for inter-node traffic
export IFACE="ens192"
# etcd data directory
export ETCD_DATA_DIR="/app/k8s/etcd/data"
# etcd WAL directory; an SSD partition, or at least a different partition from ETCD_DATA_DIR, is recommended
export ETCD_WAL_DIR="/app/k8s/etcd/wal"
# Data directory for the k8s components
export K8S_DIR="/app/k8s/k8s"
# docker data directory
export DOCKER_DIR="/app/k8s/docker"
## The parameters below rarely need changing
# Token used for TLS bootstrapping; generate with: head -c 16 /dev/urandom | od -An -t x | tr -d ' '
BOOTSTRAP_TOKEN="41f7e4ba8b7be874fcff18bf5cf41a7c"
# Prefer currently unused ranges for the service and Pod networks
# Service network; unroutable before deployment, routable inside the cluster afterwards (guaranteed by kube-proxy)
SERVICE_CIDR="10.254.0.0/16"
# Pod network, a /16 is recommended; unroutable before deployment, routable inside the cluster afterwards (guaranteed by flanneld)
CLUSTER_CIDR="172.1.0.0/16"
# NodePort range
export NODE_PORT_RANGE="30000-32767"
# etcd prefix for the flanneld network configuration
export FLANNEL_ETCD_PREFIX="/kubernetes/network"
# kubernetes service IP (usually the first IP in SERVICE_CIDR)
export CLUSTER_KUBERNETES_SVC_IP="10.254.0.1"
# Cluster DNS service IP (pre-allocated from SERVICE_CIDR)
export CLUSTER_DNS_SVC_IP="10.254.0.2"
# Cluster DNS domain (no trailing dot)
export CLUSTER_DNS_DOMAIN="cluster.local"
# Prepend the binaries directory /app/k8s/bin to PATH
export PATH=/app/k8s/bin:$PATH
EOF
Create the working directories:
mkdir -p /app/k8s/{bin,work} /etc/{kubernetes,etcd}/cert
Upgrade the kernel; see:
https://gitee.com/lenovux/k8s/blob/master/B.centos7%E5%86%85%E6%A0%B8%E5%8D%87%E7%BA%A7.md
All environment variables used later are defined in [environment.sh] in A.实用脚本.md; adjust them to your own machines and network, then copy the file to the /app/k8s/bin directory on every node.
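A loop like the following can perform the copy (a sketch, assuming the passwordless SSH set up earlier):
source /app/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "mkdir -p /app/k8s/bin"
  scp /app/k8s/bin/environment.sh root@${node_ip}:/app/k8s/bin/
done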
tags: TLS, CA, x509
To keep the cluster secure, the Kubernetes components encrypt and authenticate their communication with x509 certificates.
The CA (Certificate Authority) is a self-signed root certificate that signs all certificates created later.
This document uses CloudFlare's PKI toolkit, cfssl, to create every certificate.
Note: unless stated otherwise, all operations in this document are executed on the k8master-1 node; files and commands are then distributed to the other nodes remotely.
sudo mkdir -p /app/k8s/cert && cd /app/k8s
wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
mv cfssl_linux-amd64 /app/k8s/bin/cfssl
wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
mv cfssljson_linux-amd64 /app/k8s/bin/cfssljson
wget https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64
mv cfssl-certinfo_linux-amd64 /app/k8s/bin/cfssl-certinfo
chmod +x /app/k8s/bin/*
export PATH=/app/k8s/bin:$PATH
The CA certificate is shared by all nodes in the cluster; create it once, and every later certificate is signed by it.
The CA configuration file defines the root certificate's usage profiles and parameters (usages, expiry, server auth, client auth, encryption, and so on); a specific profile is chosen whenever another certificate is signed.
cd /app/k8s/work
cat > ca-config.json <<EOF
{
"signing": {
"default": {
"expiry": "87600h"
},
"profiles": {
"kubernetes": {
"usages": [
"signing",
"key encipherment",
"server auth",
"client auth"
],
"expiry": "87600h"
}
}
}
}
EOF
signing: the certificate can be used to sign other certificates (the generated ca.pem contains CA=TRUE);
server auth: a client may use the certificate to verify the certificate a server presents;
client auth: a server may use the certificate to verify the certificate a client presents.
Create the CA certificate signing request file:
cd /app/k8s/work
cat > ca-csr.json <<EOF
{
"CN": "kubernetes",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "hangzhou",
"L": "hangzhou",
"O": "k8s",
"OU": "CMCC"
}
],
"ca": {
"expiry": "876000h"
}
}
EOF
CN is the Common Name; kube-apiserver extracts this field from a certificate as the request's user name (User Name), and browsers use it to check whether a site is legitimate.
O is the Organization; kube-apiserver extracts this field as the group (Group) the requesting user belongs to.
Together they form the user identity used for RBAC authorization.
Generate the CA certificate and private key:
cd /app/k8s/work
cfssl gencert -initca ca-csr.json | cfssljson -bare ca
ls ca*
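The freshly generated root certificate can be inspected with the cfssl-certinfo tool downloaded above (a quick check):
cfssl-certinfo -cert ca.pem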
1. Create the certificate signing request file
cd /app/k8s/work
cat > etcd-csr.json <<EOF
{
"CN": "etcd",
"hosts": [
"127.0.0.1",
"192.168.3.144",
"192.168.3.145",
"192.168.3.146",
"192.168.3.140"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "hangzhou",
"L": "hangzhou",
"O": "k8s",
"OU": "CMCC"
}
]
}
EOF
2. Sign the certificate
cd /app/k8s/work
cfssl gencert -ca=/app/k8s/work/ca.pem \
-ca-key=/app/k8s/work/ca-key.pem \
-config=/app/k8s/work/ca-config.json \
-profile=kubernetes etcd-csr.json | cfssljson -bare etcd
ls etcd*pem
1. Create the certificate signing request file
source /app/k8s/bin/environment.sh
cat > kubernetes-csr.json <<EOF
{
"CN": "kubernetes",
"hosts": [
"127.0.0.1",
"192.168.3.144",
"192.168.3.145",
"192.168.3.146",
"192.168.3.140",
"${CLUSTER_KUBERNETES_SVC_IP}",
"kubernetes",
"kubernetes.default",
"kubernetes.default.svc",
"kubernetes.default.svc.cluster",
"kubernetes.default.svc.cluster.local."
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "zhejiang",
"L": "hangzhou",
"O": "k8s",
"OU": "CMCC"
}
]
}
EOF
2. Sign the certificate. The output files are named apiserver* so that later steps can reference /etc/kubernetes/cert/apiserver.pem directly:
cfssl gencert -ca=/app/k8s/work/ca.pem \
-ca-key=/app/k8s/work/ca-key.pem \
-config=/app/k8s/work/ca-config.json \
-profile=kubernetes kubernetes-csr.json | cfssljson -bare apiserver
ls apiserver*pem
1. Create the certificate signing request file
cd /app/k8s/work
cat > kube-controller-manager-csr.json <<EOF
{
"CN": "system:kube-controller-manager",
"key": {
"algo": "rsa",
"size": 2048
},
"hosts": [
"127.0.0.1",
"192.168.3.144",
"192.168.3.145",
"192.168.3.146",
"192.168.3.140"
],
"names": [
{
"C": "CN",
"ST": "zhejiang",
"L": "hangzhou",
"O": "system:kube-controller-manager",
"OU": "CMCC"
}
]
}
EOF
2. Sign the certificate
cd /app/k8s/work
cfssl gencert -ca=/app/k8s/work/ca.pem \
-ca-key=/app/k8s/work/ca-key.pem \
-config=/app/k8s/work/ca-config.json \
-profile=kubernetes kube-controller-manager-csr.json | cfssljson -bare kube-controller-manager
ls kube-controller-manager*pem
1. Create the certificate signing request file
cd /app/k8s/work
cat > kube-scheduler-csr.json <<EOF
{
"CN": "system:kube-scheduler",
"hosts": [
"127.0.0.1",
"192.168.3.146",
"192.168.3.145",
"192.168.3.144",
"192.168.3.140"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "zhejiang",
"L": "hangzhou",
"O": "system:kube-scheduler",
"OU": "CMCC"
}
]
}
EOF
2. Sign the certificate
cd /app/k8s/work
cfssl gencert -ca=/app/k8s/work/ca.pem \
-ca-key=/app/k8s/work/ca-key.pem \
-config=/app/k8s/work/ca-config.json \
-profile=kubernetes kube-scheduler-csr.json | cfssljson -bare kube-scheduler
ls kube-scheduler*pem
1. Create the certificate signing request file
cd /app/k8s/work
cat > admin-csr.json <<EOF
{
"CN": "admin",
"hosts": [],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "system:masters",
"OU": "4Paradigm"
}
]
}
EOF
What is bound here is not a User but a Group, specified via the O field. After deployment you can verify with: kubectl get clusterrolebinding cluster-admin -o yaml
2. Sign the certificate
cd /app/k8s/work
cfssl gencert -ca=/app/k8s/work/ca.pem \
-ca-key=/app/k8s/work/ca-key.pem \
-config=/app/k8s/work/ca-config.json \
-profile=kubernetes admin-csr.json | cfssljson -bare admin
ls admin*
1. Create the certificate signing request file
cd /app/k8s/work
cat > kube-proxy-csr.json <<EOF
{
"CN": "system:kube-proxy",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "zhejiang",
"L": "hangzhou",
"O": "k8s",
"OU": "CMCC"
}
]
}
EOF
2. Sign the certificate
cd /app/k8s/work
cfssl gencert -ca=/app/k8s/work/ca.pem \
-ca-key=/app/k8s/work/ca-key.pem \
-config=/app/k8s/work/ca-config.json \
-profile=kubernetes kube-proxy-csr.json | cfssljson -bare kube-proxy
ls kube-proxy*
1. Create the certificate signing request file
cd /app/k8s/work
cat > flanneld-csr.json <<EOF
{
"CN": "flanneld",
"hosts": [],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "zhejiang",
"L": "hangzhou",
"O": "k8s",
"OU": "CMCC"
}
]
}
EOF
2. Sign the certificate
cfssl gencert -ca=/app/k8s/work/ca.pem \
-ca-key=/app/k8s/work/ca-key.pem \
-config=/app/k8s/work/ca-config.json \
-profile=kubernetes flanneld-csr.json | cfssljson -bare flanneld
ls flanneld*pem
Copy the generated CA certificate, key, and configuration file to the /etc/kubernetes/cert directory on every node, and the flanneld certificate and key to /etc/flanneld/cert:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p /etc/kubernetes/cert"
scp flanneld*pem ca-config.json root@${node_ip}:/etc/flanneld/cert
done
tags: kubectl
This document describes installing and configuring kubectl, the command-line management tool for the Kubernetes cluster.
By default kubectl reads the kube-apiserver address and credentials from ~/.kube/config; without that file, kubectl commands fail:
$ kubectl get pods
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Note: the kubeconfig generated below is what ends up as ~/.kube/config on every machine that runs kubectl.
Download and unpack:
cd /app/k8s/work
curl -LO https://dl.k8s.io/release/v1.22.0/bin/linux/amd64/kubectl
Distribute to every node that uses kubectl:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kubectl root@${node_ip}:/app/k8s/bin/
ssh root@${node_ip} "chmod +x /app/k8s/bin/*"
done
The kubeconfig is kubectl's configuration file; it contains everything needed to reach the apiserver: the apiserver address, the CA certificate, and the client certificate.
cd /app/k8s/work
source /app/k8s/bin/environment.sh
# Set cluster parameters
kubectl config set-cluster kubernetes \
--certificate-authority=/app/k8s/work/ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=kubectl.kubeconfig
# Set client credentials
kubectl config set-credentials admin \
--client-certificate=/app/k8s/work/admin.pem \
--client-key=/app/k8s/work/admin-key.pem \
--embed-certs=true \
--kubeconfig=kubectl.kubeconfig
# Set context parameters
kubectl config set-context kubernetes \
--cluster=kubernetes \
--user=admin \
--kubeconfig=kubectl.kubeconfig
# Set the default context
kubectl config use-context kubernetes --kubeconfig=kubectl.kubeconfig
--certificate-authority: the root certificate used to verify the kube-apiserver certificate;
--client-certificate, --client-key: the admin certificate and key just generated, used when connecting to kube-apiserver;
--embed-certs=true: embeds the contents of ca.pem and admin.pem into the generated kubectl.kubeconfig (otherwise only the file paths are written, and the certificate files would have to be copied separately whenever the kubeconfig moves to another machine).
Distribute to every node that runs kubectl commands:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p ~/.kube"
scp kubectl.kubeconfig root@${node_ip}:~/.kube/config
done
wget https://github.com/dty1er/kubecolor/releases/download/v0.0.20/kubecolor_0.0.20_Linux_x86_64.tar.gz
tar -zvxf kubecolor_0.0.20_Linux_x86_64.tar.gz
cp kubecolor /app/k8s/bin/
chmod +x /app/k8s/bin/*
echo 'command -v kubecolor >/dev/null 2>&1 && alias k="kubecolor"' >> ~/.bashrc
echo 'source <(kubectl completion bash)' >> ~/.bashrc   # defines __start_kubectl
echo 'complete -o default -F __start_kubectl k' >> ~/.bashrc
source ~/.bashrc
kubecolor get pods
tags: etcd
etcd is a Raft-based distributed key-value store developed by CoreOS, commonly used for service discovery, shared configuration, and coordination (leader election, distributed locks, and so on). Kubernetes stores all of its runtime data in etcd.
This document describes deploying a three-node, highly available etcd cluster.
The etcd members are k8master-1 (192.168.3.144), k8master-2 (192.168.3.145), and k8master-3 (192.168.3.146).
Note: unless stated otherwise, all operations are executed on the k8master-1 node and then distributed remotely.
Download the latest release tarball from the etcd releases page:
cd /app/k8s/work
wget https://github.com/etcd-io/etcd/releases/download/v3.5.0/etcd-v3.5.0-linux-amd64.tar.gz
tar -zxvf etcd-v3.5.0-linux-amd64.tar.gz
Distribute the binaries to the etcd nodes:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${ETCD_IPS[@]}
do
echo ">>> ${node_ip}"
scp etcd-v3.5.0-linux-amd64/etcd* root@${node_ip}:/app/k8s/bin
ssh root@${node_ip} "chmod +x /app/k8s/bin/*"
done
Distribute the generated certificate and private key to each etcd node:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${ETCD_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p /etc/etcd/cert"
scp etcd*.pem root@${node_ip}:/etc/etcd/cert/
done
Create the etcd systemd unit template:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
cat > etcd.service.template <<EOF
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos
[Service]
Type=notify
WorkingDirectory=${ETCD_DATA_DIR}
ExecStart=/app/k8s/bin/etcd \\
--data-dir=${ETCD_DATA_DIR} \\
--wal-dir=${ETCD_WAL_DIR} \\
--name=##NODE_NAME## \\
--cert-file=/etc/etcd/cert/etcd.pem \\
--key-file=/etc/etcd/cert/etcd-key.pem \\
--trusted-ca-file=/etc/kubernetes/cert/ca.pem \\
--peer-cert-file=/etc/etcd/cert/etcd.pem \\
--peer-key-file=/etc/etcd/cert/etcd-key.pem \\
--peer-trusted-ca-file=/etc/kubernetes/cert/ca.pem \\
--peer-client-cert-auth \\
--client-cert-auth \\
--listen-peer-urls=https://##NODE_IP##:2380 \\
--initial-advertise-peer-urls=https://##NODE_IP##:2380 \\
--listen-client-urls=https://##NODE_IP##:2379,http://127.0.0.1:2379 \\
--advertise-client-urls=https://##NODE_IP##:2379 \\
--initial-cluster-token=etcd-cluster-0 \\
--initial-cluster=${ETCD_NODES} \\
--initial-cluster-state=new \\
--auto-compaction-mode=periodic \\
--auto-compaction-retention=1 \\
--max-request-bytes=33554432 \\
--quota-backend-bytes=6442450944 \\
--heartbeat-interval=250 \\
--election-timeout=2000 \\
--enable-v2=true
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF
WorkingDirectory, --data-dir: the working and data directory, ${ETCD_DATA_DIR}; create it before starting the service;
--wal-dir: the WAL directory; for performance, use an SSD or at least a different disk from --data-dir;
--name: the node name; when --initial-cluster-state is new, the value of --name must appear in the --initial-cluster list;
--cert-file, --key-file: certificate and key used between the etcd server and its clients;
--trusted-ca-file: the CA that signed the client certificates, used to verify them;
--peer-cert-file, --peer-key-file: certificate and key used between etcd peers;
--peer-trusted-ca-file: the CA that signed the peer certificates, used to verify them.
Alternatively, a fully expanded unit for k8master-1 looks like this:
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos
[Service]
Type=notify
WorkingDirectory=/app/k8s/etcd/data
ExecStart=/app/k8s/bin/etcd \
--data-dir=/app/k8s/etcd/data \
--wal-dir=/app/k8s/etcd/wal \
--name=k8master-1 \
--cert-file=/etc/etcd/cert/etcd.pem \
--key-file=/etc/etcd/cert/etcd-key.pem \
--trusted-ca-file=/etc/kubernetes/cert/ca.pem \
--peer-cert-file=/etc/etcd/cert/etcd.pem \
--peer-key-file=/etc/etcd/cert/etcd-key.pem \
--peer-trusted-ca-file=/etc/kubernetes/cert/ca.pem \
--peer-client-cert-auth \
--client-cert-auth \
--listen-peer-urls=https://192.168.3.144:2380 \
--initial-advertise-peer-urls=https://192.168.3.144:2380 \
--listen-client-urls=https://192.168.3.144:2379,http://127.0.0.1:2379 \
--advertise-client-urls=https://192.168.3.144:2379 \
--initial-cluster-token=etcd-cluster-0 \
--initial-cluster=k8master-1=https://192.168.3.144:2380,k8master-2=https://192.168.3.145:2380,k8master-3=https://192.168.3.146:2380 \
--initial-cluster-state=new \
--auto-compaction-mode=periodic \
--auto-compaction-retention=1 \
--max-request-bytes=33554432 \
--quota-backend-bytes=6442450944 \
--heartbeat-interval=250 \
--election-timeout=2000 \
--enable-v2=true
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
Substitute the template variables to create a systemd unit file for each etcd member:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for (( i=0; i < 9; i++ ))
do
sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" etcd.service.template > etcd-${NODE_IPS[i]}.service
done
ls *.service
Distribute the generated systemd unit files:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${ETCD_IPS[@]}
do
echo ">>> ${node_ip}"
scp etcd-${node_ip}.service root@${node_ip}:/etc/systemd/system/etcd.service
done
Create the data directories and start etcd on each member:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${ETCD_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p ${ETCD_DATA_DIR} ${ETCD_WAL_DIR}"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable etcd && systemctl restart etcd " &
done
systemctl start etcd can block for a while on the first member while it waits for the others to join; this is normal.
Check the service status:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${ETCD_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl status etcd|grep Active"
done
Make sure the status is active (running); otherwise inspect the logs to find the cause:
journalctl -u etcd
After the etcd cluster is deployed, run the following on any etcd node:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${ETCD_IPS[@]}
do
echo ">>> ${node_ip}"
ETCDCTL_API=3 /app/k8s/bin/etcdctl \
--endpoints=https://${node_ip}:2379 \
--cacert=/etc/kubernetes/cert/ca.pem \
--cert=/etc/etcd/cert/etcd.pem \
--key=/etc/etcd/cert/etcd-key.pem endpoint health
done
Expected output:
>>> 192.168.3.144
https://192.168.3.144:2379 is healthy: successfully committed proposal: took = 2.756451ms
>>> 192.168.3.145
https://192.168.3.145:2379 is healthy: successfully committed proposal: took = 2.025018ms
>>> 192.168.3.146
https://192.168.3.146:2379 is healthy: successfully committed proposal: took = 2.335097ms
All endpoints reporting healthy means the cluster is serving normally.
source /app/k8s/bin/environment.sh
ETCDCTL_API=3 /app/k8s/bin/etcdctl \
-w table --cacert=/etc/kubernetes/cert/ca.pem \
--cert=/etc/etcd/cert/etcd.pem \
--key=/etc/etcd/cert/etcd-key.pem \
--endpoints=${ETCD_ENDPOINTS} endpoint status
Output:
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.3.144:2379 | 4c3f96aeaf516ead | 3.5.0 | 29 kB | false | false | 16 | 44 | 44 | |
| https://192.168.3.145:2379 | 166a0a39c02deebc | 3.5.0 | 20 kB | false | false | 16 | 44 | 44 | |
| https://192.168.3.146:2379 | 4591e7e99dfd9a5e | 3.5.0 | 20 kB | true | false | 16 | 44 | 44 | |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
If two members both report IS LEADER true and the logs show:
request cluster ID mismatch
wipe the data and WAL directories and restart the service:
rm -rf /app/k8s/etcd/data/* /app/k8s/etcd/wal/*
systemctl restart etcd
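Once the cluster is healthy, a periodic snapshot is cheap insurance (a sketch; point it at any healthy endpoint):
source /app/k8s/bin/environment.sh
ETCDCTL_API=3 /app/k8s/bin/etcdctl \
  --endpoints=https://192.168.3.144:2379 \
  --cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/etcd/cert/etcd.pem \
  --key=/etc/etcd/cert/etcd-key.pem \
  snapshot save /app/k8s/etcd/snapshot-$(date +%F).db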
tags: flanneld
Kubernetes requires all nodes in the cluster (masters included) to be reachable from each other over the Pod network. flannel builds an interconnected Pod network across nodes using VXLAN on UDP port 8472 (open this port in security groups on public clouds such as AWS).
On first start, flanneld fetches the configured Pod network from etcd, allocates an unused subnet for the local node, and creates the flannel.1 network interface (the name can differ, e.g. flannel1).
flannel writes the Pod subnet allocated to it into the /app/flannel/docker file; docker later uses the environment variables in this file to configure the docker0 bridge, so every Pod container on the node receives an IP from that subnet.
Note: unless stated otherwise, all operations are executed on the k8master-1 node and then distributed remotely.
Download the latest release tarball from the flannel releases page:
cd /app/k8s/work
mkdir flannel
wget https://github.com/coreos/flannel/releases/download/v0.14.0/flannel-v0.14.0-linux-amd64.tar.gz
tar -xzvf flannel-v0.14.0-linux-amd64.tar.gz -C flannel
cd /app/k8s/work
mkdir cni
wget https://github.com/containernetworking/plugins/releases/download/v0.9.1/cni-plugins-linux-amd64-v0.9.1.tgz
tar -xvf cni-plugins-linux-amd64-v0.9.1.tgz -C cni
Distribute the flanneld binaries to the worker nodes (WORK_IPS):
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
scp flannel/{flanneld,mk-docker-opts.sh} root@${node_ip}:/app/k8s/bin/
ssh root@${node_ip} "chmod +x /app/k8s/bin/*"
done
Distribute the CNI plugin binaries to the worker nodes:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
scp cni/* root@${node_ip}:/app/k8s/bin/cni
ssh root@${node_ip} "chmod +x /app/k8s/bin/cni/*"
done
Create the flannel network configuration:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
cat > net-conf.json <<EOF
{
"Network": "172.1.0.0/16",
"Backend": {
"Type": "vxlan",
"DirectRouting": true
}
}
EOF
Distribute the network configuration:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p /etc/kube-flannel/"
scp net-conf.json root@${node_ip}:/etc/kube-flannel/
done
Create the CNI configuration:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
cat > 10-flannel.conflist <<EOF
{
"name": "cbr0",
"cniVersion": "0.3.1",
"plugins": [
{
"type": "flannel",
"delegate": {
"hairpinMode": true,
"isDefaultGateway": true
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
}
]
}
EOF
Distribute the CNI configuration:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p /etc/cni/net.d/"
scp 10-flannel.conflist root@${node_ip}:/etc/cni/net.d/
done
cat <<EOF | kubectl apply -f -
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: flannel
rules:
- apiGroups: ['extensions']
resources: ['podsecuritypolicies']
verbs: ['use']
resourceNames: ['psp.flannel.unprivileged']
- apiGroups:
- ""
resources:
- pods
verbs:
- get
- apiGroups:
- ""
resources:
- nodes
verbs:
- list
- watch
- apiGroups:
- ""
resources:
- nodes/status
verbs:
- patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: flannel
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: flannel
subjects:
- kind: ServiceAccount
name: flannel
namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: flannel
namespace: kube-system
EOF
Create a kubeconfig for flannel from its ServiceAccount token:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
kubectl config set-cluster kubernetes --kubeconfig=flannel.conf --embed-certs --server=https://192.168.3.140:8443 --certificate-authority=/app/k8s/work/ca.pem
kubectl config set-credentials flannel --kubeconfig=flannel.conf --token=$(kubectl get sa -n kube-system flannel -o jsonpath={.secrets[0].name} | xargs kubectl get secret -n kube-system -o jsonpath={.data.token} | base64 -d)
kubectl config set-context kubernetes --kubeconfig=flannel.conf --user=flannel --cluster=kubernetes
kubectl config use-context kubernetes --kubeconfig=flannel.conf
Distribute the flannel kubeconfig:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
scp flannel.conf root@${node_ip}:/etc/kubernetes/
done
Create the flanneld systemd unit template:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
cat > flanneld.service.template <<EOF
[Unit]
Description=Flanneld
After=network.target
After=network-online.target
Wants=network-online.target
After=etcd.service
[Service]
Type=notify
Environment=NODE_NAME=##NODE_IP##
ExecStart=/app/k8s/bin/flanneld \\
--iface=ens192 \\
--ip-masq \\
--kube-subnet-mgr=true \\
--kubeconfig-file=/etc/kubernetes/flannel.conf
Restart=always
RestartSec=5
StartLimitInterval=0
[Install]
WantedBy=multi-user.target
EOF
The mk-docker-opts.sh script writes the Pod subnet allocated to flanneld into the /app/flannel/docker file; docker uses the environment variables in that file to configure the docker0 bridge at startup;
--iface selects the interface used for inter-node traffic;
--ip-masq: flanneld sets up SNAT rules for traffic leaving the Pod network and sets the --ip-masq variable passed to Docker (in the /app/flannel/docker file) to false, so Docker does not create its own SNAT rules. Docker's rules would be cruder: every request a local Pod sends to a non-docker0 interface would be SNATed, so traffic to Pods on other nodes would appear to come from the flannel.1 address and the destination Pod would never see the real source Pod IP.
Substitute the template variables to generate a systemd unit file per node:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for (( i=0; i < 7; i++ ))
do
sed -e "s/##NODE_IP##/${WOKR_IPS[i]}/" flanneld.service.template > flanneld-${WORK_IPS[i]}.service
done
ls flanneld-*.service
Distribute the generated systemd unit files:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
scp flanneld-${node_ip}.service root@${node_ip}:/etc/systemd/system/flanneld.service
done
Start flanneld on the worker nodes:
source /app/k8s/bin/environment.sh
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable flanneld && systemctl restart flanneld"
done
Check the service status:
source /app/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl status flanneld|grep Active"
done
Make sure the status is active (running); otherwise inspect the logs:
journalctl -u flanneld
Write the flannel network configuration into etcd:
source /app/k8s/bin/environment.sh
ETCDCTL_API=2 etcdctl \
--endpoints=${ETCD_ENDPOINTS} \
--ca-file=/etc/kubernetes/cert/ca.pem \
--cert-file=/etc/flanneld/cert/flanneld.pem \
--key-file=/etc/flanneld/cert/flanneld-key.pem \
set ${FLANNEL_ETCD_PREFIX}/config '{"Network":"'${CLUSTER_CIDR}'", "SubnetLen": 24, "Backend": {"Type": "vxlan"}}'
View the cluster Pod network (/16):
source /app/k8s/bin/environment.sh
ETCDCTL_API=2 etcdctl \
--endpoints=${ETCD_ENDPOINTS} \
--ca-file=/etc/kubernetes/cert/ca.pem \
--cert-file=/etc/flanneld/cert/flanneld.pem \
--key-file=/etc/flanneld/cert/flanneld-key.pem \
get ${FLANNEL_ETCD_PREFIX}/config
Output:
{"Network":"172.1.0.0/16", "SubnetLen": 24, "Backend": {"Type": "vxlan"}}
List the allocated Pod subnets (/24):
source /app/k8s/bin/environment.sh
ETCDCTL_API=2 etcdctl \
--endpoints=${ETCD_ENDPOINTS} \
--ca-file=/etc/kubernetes/cert/ca.pem \
--cert-file=/etc/flanneld/cert/flanneld.pem \
--key-file=/etc/flanneld/cert/flanneld-key.pem \
ls ${FLANNEL_ETCD_PREFIX}/subnets
Output (varies by deployment):
/kubernetes/network/subnets/172.1.99.0-24
/kubernetes/network/subnets/172.1.30.0-24
/kubernetes/network/subnets/172.1.14.0-24
/kubernetes/network/subnets/172.1.38.0-24
/kubernetes/network/subnets/172.1.12.0-24
/kubernetes/network/subnets/172.1.52.0-24
/kubernetes/network/subnets/172.1.2.0-24
/kubernetes/network/subnets/172.1.24.0-24
Look up the node IP and flannel interface address for a particular Pod subnet:
source /app/k8s/bin/environment.sh
ETCDCTL_API=2 etcdctl \
--endpoints=${ETCD_ENDPOINTS} \
--ca-file=/etc/kubernetes/cert/ca.pem \
--cert-file=/etc/flanneld/cert/flanneld.pem \
--key-file=/etc/flanneld/cert/flanneld-key.pem \
get ${FLANNEL_ETCD_PREFIX}/subnets/172.1.99.0-24
Output (varies by deployment):
{"PublicIP":"192.168.3.143","BackendType":"vxlan","BackendData":{"VNI":1,"VtepMAC":"f6:89:30:ef:45:04"}}
[root@ka-3 ~]# ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:80:99:bd brd ff:ff:ff:ff:ff:ff
inet 192.168.3.143/24 brd 192.168.3.255 scope global noprefixroute ens192
valid_lft forever preferred_lft forever
3: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether f6:89:30:ef:45:04 brd ff:ff:ff:ff:ff:ff
inet 172.1.99.0/32 brd 172.1.99.0 scope global flannel.1
valid_lft forever preferred_lft forever
[root@ka-3 ~]# ip route show |grep flannel.1
172.1.2.0/24 via 172.1.2.0 dev flannel.1 onlink
172.1.12.0/24 via 172.1.12.0 dev flannel.1 onlink
172.1.14.0/24 via 172.1.14.0 dev flannel.1 onlink
172.1.24.0/24 via 172.1.24.0 dev flannel.1 onlink
172.1.30.0/24 via 172.1.30.0 dev flannel.1 onlink
172.1.38.0/24 via 172.1.38.0 dev flannel.1 onlink
172.1.52.0/24 via 172.1.52.0 dev flannel.1 onlink
When sending traffic to another node's Pod subnet, flanneld looks up entries such as ${FLANNEL_ETCD_PREFIX}/subnets/172.1.99.0-24 to decide which node's interconnect IP the packets should be forwarded to.
After flannel is deployed on the nodes, check that the flannel interface was created (its name may be flannel0, flannel.0, flannel.1, and so on):
source /app/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh ${node_ip} "/usr/sbin/ip addr show flannel.1|grep -w inet"
done
Output:
>>> 192.168.3.141
inet 172.1.2.0/32 brd 172.1.2.0 scope global flannel.1
>>> 192.168.3.142
inet 172.1.24.0/32 brd 172.1.24.0 scope global flannel.1
>>> 192.168.3.143
inet 172.1.99.0/32 brd 172.1.99.0 scope global flannel.1
>>> 192.168.3.144
inet 172.1.12.0/32 brd 172.1.12.0 scope global flannel.1
>>> 192.168.3.145
inet 172.1.52.0/32 brd 172.1.52.0 scope global flannel.1
>>> 192.168.3.146
inet 172.1.30.0/32 brd 172.1.30.0 scope global flannel.1
>>> 192.168.3.147
inet 172.1.14.0/32 brd 172.1.14.0 scope global flannel.1
>>> 192.168.3.148
inet 172.1.38.0/32 brd 172.1.38.0 scope global flannel.1
From each node, ping all the flannel interface IPs to make sure they are reachable:
source /app/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh ${node_ip} "ping -c 1 172.1.2.0"
ssh ${node_ip} "ping -c 1 172.1.24.0"
ssh ${node_ip} "ping -c 1 172.1.99.0"
done
This document describes how K8s nodes (masters and workers) get highly available access to kube-apiserver through nginx's layer-4 transparent proxying.
Note: unless stated otherwise, all operations are executed on the k8master-1 node and then distributed remotely.
Install keepalived on ka-1, ka-2, and ka-3:
yum install keepalived -y
Configure keepalived on ka-1:
cat > /etc/keepalived/keepalived.conf << EOF
global_defs {
notification_email {
[email protected]
[email protected]
[email protected]
}
notification_email_from [email protected]
smtp_server 127.0.0.1
smtp_connect_timeout 30
router_id LVS_DEVEL
vrrp_skip_check_adv_addr
vrrp_garp_interval 0
vrrp_gna_interval 0
}
vrrp_instance VI_1 {
state BACKUP
interface ens192
virtual_router_id 51
priority 100
advert_int 1
nopreempt
unicast_src_ip 192.168.3.141
unicast_peer {
192.168.3.142
192.168.3.143
}
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.3.140/24
}
}
EOF
Configure keepalived on ka-2:
cat > /etc/keepalived/keepalived.conf << EOF
global_defs {
notification_email {
[email protected]
[email protected]
[email protected]
}
notification_email_from [email protected]
smtp_server 127.0.0.1
smtp_connect_timeout 30
router_id LVS_DEVEL
vrrp_skip_check_adv_addr
vrrp_garp_interval 0
vrrp_gna_interval 0
}
vrrp_instance VI_1 {
state BACKUP
interface ens192
virtual_router_id 51
priority 90
advert_int 1
nopreempt
unicast_src_ip 192.168.3.142
unicast_peer {
192.168.3.141
192.168.3.143
}
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.3.140/24
}
}
EOF
Configure keepalived on ka-3:
cat > /etc/keepalived/keepalived.conf << EOF
global_defs {
notification_email {
[email protected]
[email protected]
[email protected]
}
notification_email_from [email protected]
smtp_server 127.0.0.1
smtp_connect_timeout 30
router_id LVS_DEVEL
vrrp_skip_check_adv_addr
vrrp_garp_interval 0
vrrp_gna_interval 0
}
vrrp_instance VI_1 {
state BACKUP
interface ens192
virtual_router_id 51
priority 60
advert_int 1
nopreempt
unicast_src_ip 192.168.3.143
unicast_peer {
192.168.3.141
192.168.3.142
}
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.3.140/24
}
}
EOF
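The three configurations differ only in priority and unicast peers; nothing takes effect until the service is started. On each of ka-1/2/3 (a sketch):
systemctl enable keepalived && systemctl restart keepalived
ip addr show ens192 | grep 192.168.3.140   # the VIP should appear on exactly one node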
Download the nginx source:
cd /app/k8s/work
wget http://nginx.org/download/nginx-1.15.3.tar.gz
tar -xzvf nginx-1.15.3.tar.gz
Install the build toolchain:
yum -y install gcc gcc-c++ autoconf automake make
Configure the build:
cd /app/k8s/work/nginx-1.15.3
mkdir nginx-prefix
./configure --with-stream --without-http --prefix=$(pwd)/nginx-prefix --without-http_uwsgi_module --without-http_scgi_module --without-http_fastcgi_module
--with-stream: enables layer-4 transparent forwarding (TCP proxy);
--without-xxx: disables everything else, so the resulting dynamically linked binary has minimal dependencies.
Output:
Configuration summary
+ PCRE library is not used
+ OpenSSL library is not used
+ zlib library is not used
nginx path prefix: "/app/k8s/work/nginx-1.15.3/nginx-prefix"
nginx binary file: "/app/k8s/work/nginx-1.15.3/nginx-prefix/sbin/nginx"
nginx modules path: "/app/k8s/work/nginx-1.15.3/nginx-prefix/modules"
nginx configuration prefix: "/app/k8s/work/nginx-1.15.3/nginx-prefix/conf"
nginx configuration file: "/app/k8s/work/nginx-1.15.3/nginx-prefix/conf/nginx.conf"
nginx pid file: "/app/k8s/work/nginx-1.15.3/nginx-prefix/logs/nginx.pid"
nginx error log file: "/app/k8s/work/nginx-1.15.3/nginx-prefix/logs/error.log"
nginx http access log file: "/app/k8s/work/nginx-1.15.3/nginx-prefix/logs/access.log"
nginx http client request body temporary files: "client_body_temp"
nginx http proxy temporary files: "proxy_temp"
Build and install:
cd /app/k8s/work/nginx-1.15.3
make && make install
cd /app/k8s/work/nginx-1.15.3
./nginx-prefix/sbin/nginx -v
Output:
nginx version: nginx/1.15.3
Check the libraries nginx links against:
$ ldd ./nginx-prefix/sbin/nginx
Output:
linux-vdso.so.1 => (0x00007ffc945e7000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f4385072000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f4384e56000)
libc.so.6 => /lib64/libc.so.6 (0x00007f4384a89000)
/lib64/ld-linux-x86-64.so.2 (0x00007f4385276000)
Create the directory structure on the keepalived nodes:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${KA_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p /app/k8s/kube-nginx/{conf,logs,sbin}"
done
Copy the binary:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${KA_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p /app/k8s/kube-nginx/{conf,logs,sbin}"
scp /app/k8s/work/nginx-1.15.3/nginx-prefix/sbin/nginx root@${node_ip}:/app/k8s/kube-nginx/sbin/kube-nginx
ssh root@${node_ip} "chmod a+x /app/k8s/kube-nginx/sbin/*"
done
Configure nginx for layer-4 transparent forwarding:
cd /app/k8s/work
cat > kube-nginx.conf << \EOF
worker_processes 1;
events {
worker_connections 1024;
}
stream {
upstream backend {
hash $remote_addr consistent;
server 192.168.3.144:6443 max_fails=3 fail_timeout=30s;
server 192.168.3.145:6443 max_fails=3 fail_timeout=30s;
server 192.168.3.146:6443 max_fails=3 fail_timeout=30s;
}
server {
listen 192.168.3.140:8443;
proxy_connect_timeout 1s;
proxy_pass backend;
}
}
EOF
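Because this config binds 192.168.3.140:8443 and the VIP is held by only one node at a time, nginx on the two standby nodes will fail to bind unless non-local binds are allowed. Enabling the following sysctl on ka-1/2/3 avoids that (an assumed fix, not in the original; listening on 0.0.0.0:8443 instead would also work):
echo 'net.ipv4.ip_nonlocal_bind=1' > /etc/sysctl.d/nonlocal-bind.conf
sysctl -p /etc/sysctl.d/nonlocal-bind.conf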
Distribute the configuration file:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${KA_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-nginx.conf root@${node_ip}:/app/k8s/kube-nginx/conf/kube-nginx.conf
done
Create the kube-nginx systemd unit file:
cd /app/k8s/work
cat > kube-nginx.service <<EOF
[Unit]
Description=kube-apiserver nginx proxy
After=network.target
After=network-online.target
Wants=network-online.target
[Service]
Type=forking
ExecStartPre=/app/k8s/kube-nginx/sbin/kube-nginx -c /app/k8s/kube-nginx/conf/kube-nginx.conf -p /app/k8s/kube-nginx -t
ExecStart=/app/k8s/kube-nginx/sbin/kube-nginx -c /app/k8s/kube-nginx/conf/kube-nginx.conf -p /app/k8s/kube-nginx
ExecReload=/app/k8s/kube-nginx/sbin/kube-nginx -c /app/k8s/kube-nginx/conf/kube-nginx.conf -p /app/k8s/kube-nginx -s reload
PrivateTmp=true
Restart=always
RestartSec=5
StartLimitInterval=0
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF
Distribute the systemd unit file:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${KA_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-nginx.service root@${node_ip}:/etc/systemd/system/
done
Start the kube-nginx service:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${KA_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-nginx && systemctl restart kube-nginx"
done
Check the service status:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${KA_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl status kube-nginx |grep 'Active:'"
done
Make sure the status is active (running); otherwise inspect the logs:
journalctl -u kube-nginx
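A quick check that the proxy is listening on every keepalived node (a sketch; with the non-local bind sysctl above, all three should show kube-nginx bound to 192.168.3.140:8443):
source /app/k8s/bin/environment.sh
for node_ip in ${KA_IPS[@]}
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "netstat -lnpt | grep 8443"
done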
tags: master, kube-apiserver, kube-scheduler, kube-controller-manager
The kubernetes master nodes run the kube-apiserver, kube-scheduler, and kube-controller-manager components, all in multi-instance mode: the apiserver instances sit behind the kube-nginx proxy, while scheduler and controller-manager elect a leader among themselves.
Note: unless stated otherwise, all operations are executed on the k8master-1 node and then distributed remotely.
For the apiserver proxy, see 06-0.apiserver高可用之nginx代理.md.
Download the binary tarball from the CHANGELOG page and unpack it:
cd /app/k8s/work
wget https://dl.k8s.io/v1.22.1/kubernetes-server-linux-amd64.tar.gz
tar -xzvf kubernetes-server-linux-amd64.tar.gz
cd kubernetes
tar -xzvf kubernetes-src.tar.gz
Copy the binaries to all master nodes:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${ETCD_IPS[@]}
do
echo ">>> ${node_ip}"
scp kubernetes/server/bin/{apiextensions-apiserver,kube-apiserver,kube-controller-manager,kube-proxy,kube-scheduler,kubeadm,kubectl,kubelet,mounter} root@${node_ip}:/app/k8s/bin/
ssh root@${node_ip} "chmod +x /app/k8s/bin/*"
done
tags: master, kube-apiserver
This document describes deploying a three-instance kube-apiserver cluster, accessed through the kube-nginx proxy for availability.
Note: unless stated otherwise, all operations are executed on the k8master-1 node and then distributed remotely.
For downloading the binaries and installing/configuring flanneld, see 06-1.部署master节点.md.
Copy the generated certificate and key files to all master nodes:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${API_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p /etc/kubernetes/cert"
scp api*.pem root@${node_ip}:/etc/kubernetes/cert/
done
Create the configuration that encrypts secrets at rest in etcd:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
cat > encryption-config.yaml <<EOF
kind: EncryptionConfig
apiVersion: v1
resources:
- resources:
- secrets
providers:
- aescbc:
keys:
- name: key1
secret: ${ENCRYPTION_KEY}
- identity: {}
EOF
Copy the encryption config to the /etc/kubernetes directory on the master nodes:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${API_IPS[@]}
do
echo ">>> ${node_ip}"
scp encryption-config.yaml root@${node_ip}:/etc/kubernetes/
done
Create the audit policy file:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
cat > audit-policy.yaml <<EOF
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Do not log some high-volume, low-risk events.
- level: None
users: ["system:kube-proxy"]
verbs: ["watch"]
resources:
- group: "" # core
resources: ["endpoints", "services", "services/status"]
- level: None
# Ingress controller reads 'configmaps/ingress-uid' through the unsecured port.
# TODO(#46983): Change this to the ingress controller service account.
users: ["system:unsecured"]
namespaces: ["kube-system"]
verbs: ["get"]
resources:
- group: "" # core
resources: ["configmaps"]
- level: None
users: ["kubelet"] # legacy kubelet identity
verbs: ["get"]
resources:
- group: "" # core
resources: ["nodes", "nodes/status"]
- level: None
userGroups: ["system:nodes"]
verbs: ["get"]
resources:
- group: "" # core
resources: ["nodes", "nodes/status"]
- level: None
users:
- system:kube-controller-manager
- system:kube-scheduler
- system:serviceaccount:kube-system:endpoint-controller
verbs: ["get", "update"]
namespaces: ["kube-system"]
resources:
- group: "" # core
resources: ["endpoints"]
- level: None
users: ["system:apiserver"]
verbs: ["get"]
resources:
- group: "" # core
resources: ["namespaces", "namespaces/status", "namespaces/finalize"]
- level: None
users: ["cluster-autoscaler"]
verbs: ["get", "update"]
namespaces: ["kube-system"]
resources:
- group: "" # core
resources: ["configmaps", "endpoints"]
# Don't log HPA fetching metrics.
- level: None
users:
- system:kube-controller-manager
verbs: ["get", "list"]
resources:
- group: "metrics.k8s.io"
# Don't log these read-only URLs.
- level: None
nonResourceURLs:
- /healthz*
- /version
- /swagger*
# Don't log events requests because of performance impact.
- level: None
resources:
- group: "" # core
resources: ["events"]
# node and pod status calls from nodes are high-volume and can be large, don't log responses for expected updates from nodes
- level: Request
users: ["kubelet", "system:node-problem-detector", "system:serviceaccount:kube-system:node-problem-detector"]
verbs: ["update","patch"]
resources:
- group: "" # core
resources: ["nodes/status", "pods/status"]
omitStages:
- "RequestReceived"
- level: Request
userGroups: ["system:nodes"]
verbs: ["update","patch"]
resources:
- group: "" # core
resources: ["nodes/status", "pods/status"]
omitStages:
- "RequestReceived"
# deletecollection calls can be large, don't log responses for expected namespace deletions
- level: Request
users: ["system:serviceaccount:kube-system:namespace-controller"]
verbs: ["deletecollection"]
omitStages:
- "RequestReceived"
# Secrets, ConfigMaps, TokenRequest and TokenReviews can contain sensitive & binary data,
# so only log at the Metadata level.
- level: Metadata
resources:
- group: "" # core
resources: ["secrets", "configmaps", "serviceaccounts/token"]
- group: authentication.k8s.io
resources: ["tokenreviews"]
omitStages:
- "RequestReceived"
# Get responses can be large; skip them.
- level: Request
verbs: ["get", "list", "watch"]
resources:
- group: "" # core
- group: "admissionregistration.k8s.io"
- group: "apiextensions.k8s.io"
- group: "apiregistration.k8s.io"
- group: "apps"
- group: "authentication.k8s.io"
- group: "authorization.k8s.io"
- group: "autoscaling"
- group: "batch"
- group: "certificates.k8s.io"
- group: "extensions"
- group: "metrics.k8s.io"
- group: "networking.k8s.io"
- group: "node.k8s.io"
- group: "policy"
- group: "rbac.authorization.k8s.io"
- group: "scheduling.k8s.io"
- group: "storage.k8s.io"
omitStages:
- "RequestReceived"
# Default level for known APIs
- level: RequestResponse
resources:
- group: "" # core
- group: "admissionregistration.k8s.io"
- group: "apiextensions.k8s.io"
- group: "apiregistration.k8s.io"
- group: "apps"
- group: "authentication.k8s.io"
- group: "authorization.k8s.io"
- group: "autoscaling"
- group: "batch"
- group: "certificates.k8s.io"
- group: "extensions"
- group: "metrics.k8s.io"
- group: "networking.k8s.io"
- group: "node.k8s.io"
- group: "policy"
- group: "rbac.authorization.k8s.io"
- group: "scheduling.k8s.io"
- group: "storage.k8s.io"
omitStages:
- "RequestReceived"
# Default level for all other requests.
- level: Metadata
omitStages:
- "RequestReceived"
EOF
Distribute the audit policy file:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${API_IPS[@]}
do
echo ">>> ${node_ip}"
scp audit-policy.yaml root@${node_ip}:/etc/kubernetes/audit-policy.yaml
done
At startup, kubelet sends a CSR request to the apiserver. Here we create a token with limited permissions that kubelet uses to request its certificate.
The token.csv format is: token,user,uid,"group1,group2,group3"
echo "$(head -c 16 /dev/urandom | od -An -t x | tr -d ' '),kubelet-bootstrap,10001,\"system:bootstrappers\"" > /etc/kubernetes/token.csv
Distribute the token:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${API_IPS[@]}
do
echo ">>> ${node_ip}"
scp /etc/kubernetes/token.csv root@${node_ip}:/etc/kubernetes/
done
Create the kube-apiserver systemd unit template:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
cat > kube-apiserver.service.template <<EOF
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target
[Service]
ExecStart=/app/k8s/bin/kube-apiserver \\
--advertise-address=##NODE_IP## \\
--allow-privileged=true \\
--apiserver-count=3 \\
--audit-log-maxage=15 \\
--audit-log-maxsize=100 \\
--audit-log-maxbackup=5 \\
--audit-log-path=/var/log/kube-apiserver-audit.log \\
--audit-policy-file=/etc/kubernetes/audit-policy.yaml \\
--authorization-mode=Node,RBAC \\
--bind-address=##NODE_IP## \\
--client-ca-file=/etc/kubernetes/cert/ca.pem \\
--delete-collection-workers=2 \\
--enable-admission-plugins=NodeRestriction \\
--enable-bootstrap-token-auth \\
--token-auth-file=/etc/kubernetes/token.csv \\
--encryption-provider-config=/etc/kubernetes/encryption-config.yaml \\
--etcd-cafile=/etc/kubernetes/cert/ca.pem \\
--etcd-certfile=/etc/etcd/cert/etcd.pem \\
--etcd-keyfile=/etc/etcd/cert/etcd-key.pem \\
--etcd-servers=https://192.168.3.144:2379,https://192.168.3.145:2379,https://192.168.3.146:2379 \\
--event-ttl=72h \\
--feature-gates=EphemeralContainers=true \\
--kubelet-certificate-authority=/etc/kubernetes/cert/ca.pem \\
--kubelet-client-certificate=/etc/kubernetes/cert/apiserver.pem \\
--kubelet-client-key=/etc/kubernetes/cert/apiserver-key.pem \\
--logtostderr=false \\
--log-file=/var/log/kube-apiserver.log \\
--log-file-max-size=1 \\
--proxy-client-cert-file=/etc/kubernetes/cert/apiserver.pem \\
--proxy-client-key-file=/etc/kubernetes/cert/apiserver-key.pem \\
--requestheader-client-ca-file=/etc/kubernetes/cert/ca.pem \\
--requestheader-allowed-names="aggregator" \\
--requestheader-username-headers="X-Remote-User" \\
--requestheader-group-headers="X-Remote-Group" \\
--requestheader-extra-headers-prefix="X-Remote-Extra-" \\
--runtime-config=api/all=true \\
--secure-port=6443 \\
--service-account-issuer=https://kubernetes.default.svc.cluster.local \\
--service-account-key-file=/etc/kubernetes/cert/apiserver.pem \\
--service-account-signing-key-file=/etc/kubernetes/cert/apiserver-key.pem \\
--service-cluster-ip-range=10.254.0.0/16 \\
--service-node-port-range=30000-32767 \\
--tls-cert-file=/etc/kubernetes/cert/apiserver.pem \\
--tls-private-key-file=/etc/kubernetes/cert/apiserver-key.pem
Restart=on-failure
RestartSec=10
Type=notify
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF
For the meaning of each flag, see the kube-apiserver reference documentation.
Substitute the template variables to generate a systemd unit file per master:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for (( i=3; i < 7; i++ ))
do
sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-apiserver.service.template > kube-apiserver-${NODE_IPS[i]}.service
done
ls kube-apiserver*.service
Distribute the generated systemd unit files:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${API_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-apiserver-${node_ip}.service root@${node_ip}:/etc/systemd/system/kube-apiserver.service
done
Start kube-apiserver on the master nodes:
source /app/k8s/bin/environment.sh
for node_ip in ${API_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-apiserver && systemctl restart kube-apiserver"
done
Check the service status:
source /app/k8s/bin/environment.sh
for node_ip in ${API_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl status kube-apiserver |grep 'Active:'"
done
Make sure the status is active (running); otherwise inspect the logs:
journalctl -u kube-apiserver
Check that kube-apiserver has written cluster data into etcd:
source /app/k8s/bin/environment.sh
ETCDCTL_API=3 etcdctl \
--endpoints=${ETCD_ENDPOINTS} \
--cacert=/app/k8s/work/ca.pem \
--cert=/app/k8s/work/etcd.pem \
--key=/app/k8s/work/etcd-key.pem \
get /registry/ --prefix --keys-only
$ kubectl cluster-info
Kubernetes master is running at https://192.168.3.140:8443
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
$ kubectl get all --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.254.0.1 <none> 443/TCP 12m
$ kubectl get componentstatuses
NAME STATUS MESSAGE ERROR
controller-manager Unhealthy Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: connect: connection refused
scheduler Unhealthy Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
etcd-0 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}
etcd-1 Healthy {"health":"true"}
If a kubectl command prints the following error, the ~/.kube/config being used is wrong; switch to the correct account and run the command again:
The connection to the server localhost:8080 was refused - did you specify the right host or port?
When kubectl get componentstatuses runs, the apiserver probes controller-manager and scheduler on 127.0.0.1 (ports 10252/10251). When those components run in cluster mode they may not be on the same machine as the apiserver answering the request, so they can show Unhealthy even though they are working correctly.
$ sudo netstat -lnpt|grep kube
tcp 0 0 192.168.3.144:6443 0.0.0.0:* LISTEN 101442/kube-apiserv
When kubectl exec, run, logs and similar commands are used, the apiserver forwards the request to the kubelet's https port. The RBAC rule below grants the user named in the apiserver client certificate (apiserver.pem, CN: kubernetes) access to the kubelet API:
kubectl create clusterrolebinding kube-apiserver:kubelet-apis --clusterrole=system:kubelet-api-admin --user kubernetes
tags: master, kube-controller-manager
This document describes deploying a highly available kube-controller-manager cluster.
The cluster has 3 nodes. After startup, one leader is chosen by competitive election and the other instances block; when the leader becomes unavailable, the blocked instances elect a new leader, keeping the service available.
For secure communication, the x509 certificate and key generated earlier are used by kube-controller-manager both when it talks to kube-apiserver's secure port and when it serves metrics on its own secure port.
Note: unless stated otherwise, all operations are executed on the k8master-1 node and then distributed remotely.
For downloading the binaries and installing/configuring flanneld, see 06-1.部署master节点.md.
Distribute the generated certificate and key to all master nodes:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${CTL_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-controller-manager*.pem root@${node_ip}:/etc/kubernetes/cert/
done
kube-controller-manager uses a kubeconfig file to access the apiserver; it contains the apiserver address, the embedded CA certificate, and the kube-controller-manager client certificate:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
kubectl config set-cluster kubernetes \
--certificate-authority=/app/k8s/work/ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=kube-controller-manager.kubeconfig
kubectl config set-credentials system:kube-controller-manager \
--client-certificate=kube-controller-manager.pem \
--client-key=kube-controller-manager-key.pem \
--embed-certs=true \
--kubeconfig=kube-controller-manager.kubeconfig
kubectl config set-context system:kube-controller-manager \
--cluster=kubernetes \
--user=system:kube-controller-manager \
--kubeconfig=kube-controller-manager.kubeconfig
kubectl config use-context system:kube-controller-manager --kubeconfig=kube-controller-manager.kubeconfig
Distribute the kubeconfig to all master nodes:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${ETCD_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-controller-manager.kubeconfig root@${node_ip}:/etc/kubernetes/
done
Create the kube-controller-manager systemd unit template:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
cat > kube-controller-manager.service.template <<EOF
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
[Service]
ExecStart=/app/k8s/bin/kube-controller-manager \\
--allocate-node-cidrs=true \\
--cluster-cidr=172.1.0.0/16 \\
--service-cluster-ip-range=10.254.0.0/16 \\
--node-cidr-mask-size=24 \\
--authentication-kubeconfig=/etc/kubernetes/kube-controller-manager.kubeconfig \\
--authorization-kubeconfig=/etc/kubernetes/kube-controller-manager.kubeconfig \\
--bind-address=##NODE_IP## \\
--client-ca-file=/etc/kubernetes/cert/ca.pem \\
--cluster-name=kubernetes \\
--cluster-signing-cert-file=/etc/kubernetes/cert/ca.pem \\
--cluster-signing-key-file=/etc/kubernetes/cert/ca-key.pem \\
--cluster-signing-duration=87600h \\
--concurrent-deployment-syncs=10 \\
--controllers=*,bootstrapsigner,tokencleaner \\
--horizontal-pod-autoscaler-initial-readiness-delay=30s \\
--horizontal-pod-autoscaler-sync-period=10s \\
--kube-api-burst=100 \\
--kube-api-qps=100 \\
--kubeconfig=/etc/kubernetes/kube-controller-manager.kubeconfig \\
--leader-elect=true \\
--logtostderr=false \\
--log-file=/var/log/kube-controller-manager.log \\
--log-file-max-size=100 \\
--pod-eviction-timeout=1m \\
--root-ca-file=/etc/kubernetes/cert/ca.pem \\
--secure-port=10257 \\
--service-account-private-key-file=/etc/kubernetes/cert/apiserver-key.pem \\
--tls-cert-file=/etc/kubernetes/cert/kube-controller-manager.pem \\
--tls-private-key-file=/etc/kubernetes/cert/kube-controller-manager-key.pem \\
--use-service-account-credentials=true
Restart=on-failure
RestartSec=10
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF
Substitute the template variables to create a systemd unit file per node:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for (( i=3; i < 7; i++ ))
do
sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-controller-manager.service.template > kube-controller-manager-${NODE_IPS[i]}.service
done
ls kube-controller-manager*.service
Distribute to all master nodes:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${CTL_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-controller-manager-${node_ip}.service root@${node_ip}:/etc/systemd/system/kube-controller-manager.service
done
Start kube-controller-manager:
source /app/k8s/bin/environment.sh
for node_ip in ${CTL_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-controller-manager && systemctl restart kube-controller-manager"
done
Check the service status:
source /app/k8s/bin/environment.sh
for node_ip in ${CTL_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl status kube-controller-manager|grep Active"
done
Make sure the status is active (running); otherwise inspect the logs:
journalctl -u kube-controller-manager
kube-controller-manager listens on port 10257 and serves https requests:
$ sudo netstat -lnpt | grep kube-cont
tcp 0 0 192.168.3.144:10257 0.0.0.0:* LISTEN 108977/kube-control
Note: run the following command on a kube-controller-manager node.
$ curl -s --cacert /app/k8s/work/ca.pem --cert /app/k8s/work/admin.pem --key /app/k8s/work/admin-key.pem https://192.168.3.144:10257/metrics |head
# HELP ClusterRoleAggregator_adds (Deprecated) Total number of adds handled by workqueue: ClusterRoleAggregator
# TYPE ClusterRoleAggregator_adds counter
ClusterRoleAggregator_adds 3
# HELP ClusterRoleAggregator_depth (Deprecated) Current depth of workqueue: ClusterRoleAggregator
# TYPE ClusterRoleAggregator_depth gauge
ClusterRoleAggregator_depth 0
# HELP ClusterRoleAggregator_longest_running_processor_microseconds (Deprecated) How many microseconds has the longest running processor for ClusterRoleAggregator been running.
# TYPE ClusterRoleAggregator_longest_running_processor_microseconds gauge
ClusterRoleAggregator_longest_running_processor_microseconds 0
# HELP ClusterRoleAggregator_queue_latency (Deprecated) How long an item stays in workqueueClusterRoleAggregator before being requested.
The ClusterRole system:kube-controller-manager has very limited permissions; it can only create secrets, serviceaccounts, and a few other resources. Each controller's permissions are split out into a ClusterRole named system:controller:XXX:
$ kubectl describe clusterrole system:kube-controller-manager
Name: system:kube-controller-manager
Labels: kubernetes.io/bootstrapping=rbac-defaults
Annotations: rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
Resources Non-Resource URLs Resource Names Verbs
--------- ----------------- -------------- -----
secrets [] [] [create delete get update]
endpoints [] [] [create get update]
serviceaccounts [] [] [create get update]
events [] [] [create patch update]
tokenreviews.authentication.k8s.io [] [] [create]
subjectaccessreviews.authorization.k8s.io [] [] [create]
configmaps [] [] [get]
namespaces [] [] [get]
*.* [] [] [list watch]
The --use-service-account-credentials=true flag must be added to kube-controller-manager's startup parameters; the main controller then creates a ServiceAccount XXX-controller for each controller, and the built-in ClusterRoleBinding system:controller:XXX grants each of these ServiceAccounts the matching ClusterRole system:controller:XXX.
$ kubectl get clusterrole|grep controller
system:controller:attachdetach-controller 51m
system:controller:certificate-controller 51m
system:controller:clusterrole-aggregation-controller 51m
system:controller:cronjob-controller 51m
system:controller:daemon-set-controller 51m
system:controller:deployment-controller 51m
system:controller:disruption-controller 51m
system:controller:endpoint-controller 51m
system:controller:expand-controller 51m
system:controller:generic-garbage-collector 51m
system:controller:horizontal-pod-autoscaler 51m
system:controller:job-controller 51m
system:controller:namespace-controller 51m
system:controller:node-controller 51m
system:controller:persistent-volume-binder 51m
system:controller:pod-garbage-collector 51m
system:controller:pv-protection-controller 51m
system:controller:pvc-protection-controller 51m
system:controller:replicaset-controller 51m
system:controller:replication-controller 51m
system:controller:resourcequota-controller 51m
system:controller:route-controller 51m
system:controller:service-account-controller 51m
system:controller:service-controller 51m
system:controller:statefulset-controller 51m
system:controller:ttl-controller 51m
system:kube-controller-manager 51m
Take the deployment controller as an example:
$ kubectl describe clusterrole system:controller:deployment-controller
Name: system:controller:deployment-controller
Labels: kubernetes.io/bootstrapping=rbac-defaults
Annotations: rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
Resources Non-Resource URLs Resource Names Verbs
--------- ----------------- -------------- -----
replicasets.apps [] [] [create delete get list patch update watch]
replicasets.extensions [] [] [create delete get list patch update watch]
events [] [] [create patch update]
pods [] [] [get list update watch]
deployments.apps [] [] [get list update watch]
deployments.extensions [] [] [get list update watch]
deployments.apps/finalizers [] [] [update]
deployments.apps/status [] [] [update]
deployments.extensions/finalizers [] [] [update]
deployments.extensions/status [] [] [update]
$ kubectl get lease -n kube-system kube-controller-manager
NAME HOLDER AGE
kube-controller-manager k8master-2_152485e6-c38b-4ef1-9f24-12388ca93634 97m
As shown, the current leader is the k8master-2 node.
To test failover, stop the kube-controller-manager service on one or two nodes and watch the other nodes' logs to see whether one of them acquires leadership.
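A minimal failover test (a sketch, assuming k8master-2 currently holds the lease as shown above):

```bash
# Stop the current leader, then watch the lease holder switch to another node.
ssh root@k8master-2 "systemctl stop kube-controller-manager"
kubectl get lease -n kube-system kube-controller-manager -w
# Afterwards, restore the stopped instance:
ssh root@k8master-2 "systemctl start kube-controller-manager"
```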
tags: master, kube-scheduler
This document describes the steps to deploy a highly available kube-scheduler cluster.
The cluster contains 3 nodes. After startup, a leader node is elected through a competitive election mechanism, and the other nodes block. When the leader becomes unavailable, the remaining nodes elect a new leader, keeping the service available.
For secure communication, this document first generates an x509 certificate and private key; kube-scheduler uses the certificate in the following two cases:
Note: unless otherwise specified, all operations in this document are performed on the k8master-1 node, and files and commands are then distributed/executed remotely.
For downloading the latest binaries and installing/configuring flanneld, see 06-1.部署master节点.md.
Distribute the generated certificates and private keys to all master nodes:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${ETCD_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-scheduler*.pem root@${node_ip}:/etc/kubernetes/cert/
done
kube-scheduler uses a kubeconfig file to access the apiserver; the file provides the apiserver address, the embedded CA certificate, and the kube-scheduler client certificate:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
kubectl config set-cluster kubernetes \
--certificate-authority=/app/k8s/work/ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=kube-scheduler.kubeconfig
kubectl config set-credentials system:kube-scheduler \
--client-certificate=kube-scheduler.pem \
--client-key=kube-scheduler-key.pem \
--embed-certs=true \
--kubeconfig=kube-scheduler.kubeconfig
kubectl config set-context system:kube-scheduler \
--cluster=kubernetes \
--user=system:kube-scheduler \
--kubeconfig=kube-scheduler.kubeconfig
kubectl config use-context system:kube-scheduler --kubeconfig=kube-scheduler.kubeconfig
Distribute the kubeconfig to all master nodes:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${ETCD_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-scheduler.kubeconfig root@${node_ip}:/etc/kubernetes/
done
Create the kube-scheduler systemd unit file template:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
cat > kube-scheduler.service.template <<EOF
[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
[Service]
ExecStart=/app/k8s/bin/kube-scheduler \\
--authentication-kubeconfig=/etc/kubernetes/kube-scheduler.kubeconfig \\
--authorization-kubeconfig=/etc/kubernetes/kube-scheduler.kubeconfig \\
--kubeconfig=/etc/kubernetes/kube-scheduler.kubeconfig \\
--bind-address=0.0.0.0 \\
--secure-port=10259 \\
--client-ca-file=/etc/kubernetes/cert/ca.pem \\
--leader-elect=true \\
--log-file=/var/log/kube-scheduler.log \\
--log-file-max-size=100 \\
--logtostderr=false \\
--tls-cert-file=/etc/kubernetes/cert/kube-scheduler.pem \\
--tls-private-key-file=/etc/kubernetes/cert/kube-scheduler-key.pem
Restart=on-failure
RestartSec=10
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF
Distribute the systemd unit file to all master nodes:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${ETCD_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-scheduler.service.template root@${node_ip}:/etc/systemd/system/kube-scheduler.service
done
source /app/k8s/bin/environment.sh
for node_ip in ${ETCD_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-scheduler && systemctl restart kube-scheduler"
done
source /app/k8s/bin/environment.sh
for node_ip in ${ETCD_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl status kube-scheduler|grep Active"
done
Make sure the status is active (running); otherwise check the logs to find the cause:
journalctl -u kube-scheduler
Note: run the following commands on a kube-scheduler node.
kube-scheduler listens on ports 10251 and 10259; both expose /metrics and /healthz.
$ sudo netstat -lnpt |grep kube-sch
tcp 0 0 192.168.3.144:10251 0.0.0.0:* LISTEN 114702/kube-schedul
tcp 0 0 192.168.3.144:10259 0.0.0.0:* LISTEN 114702/kube-schedul
$ curl -s http://192.168.3.144:10251/metrics |head
# HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
$ curl -s --cacert /app/k8s/work/ca.pem --cert /app/k8s/work/admin.pem --key /app/k8s/work/admin-key.pem https://192.168.3.144:10259/metrics |head
# HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
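Both ports also serve /healthz; a quick liveness check (a sketch, using this document's node IP):

```bash
# Expect "ok" from both endpoints.
curl -s  http://192.168.3.144:10251/healthz ; echo
curl -sk https://192.168.3.144:10259/healthz ; echo
```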
$ kubectl get lease -n kube-system kube-scheduler
NAME HOLDER AGE
kube-scheduler k8master-1_d1a9902d-3853-4d00-a2e3-6179ca0f8a5a 8m53s
As shown, the current leader is the k8master-1 node.
Pick one or two master nodes, stop the kube-scheduler service on them, and check whether another node acquires leadership.
tags: worker, flanneld, docker, kubeconfig, kubelet, kube-proxy
The kubernetes worker nodes run the following components:
Note: unless otherwise specified, all operations in this document are performed on the k8master-1 node, and files and commands are then distributed/executed remotely.
See 05-部署flannel网络.md.
See 06-0.apiserver高可用之nginx代理.md.
CentOS:
source /app/k8s/bin/environment.sh
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "yum install -y epel-release"
ssh root@${node_ip} "yum install -y conntrack ipvsadm ntp ntpdate ipset jq iptables curl sysstat libseccomp && modprobe ip_vs "
done
tags: worker, docker
docker runs and manages containers; kubelet interacts with it through the Container Runtime Interface (CRI).
Note: unless otherwise specified, all operations in this document are performed on the k8master-1 node, and files and commands are then distributed/executed remotely.
See 07-0.部署worker节点.md.
Download the latest release from the docker download page:
cd /app/k8s/work
wget https://download.docker.com/linux/static/stable/x86_64/docker-20.10.8.tgz
tar -xvf docker-20.10.8.tgz
Distribute the binaries to all worker nodes:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
scp docker/* root@${node_ip}:/app/k8s/bin/
ssh root@${node_ip} "chmod +x /app/k8s/bin/*"
done
cd /app/k8s/work
cat > docker.service <<"EOF"
[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.io
[Service]
WorkingDirectory=##DOCKER_DIR##
Environment="PATH=/app/k8s/bin:/bin:/sbin:/usr/bin:/usr/sbin"
EnvironmentFile=-/app/flannel/docker
ExecStart=/app/k8s/bin/dockerd
ExecReload=/bin/kill -s HUP $MAINPID
Restart=on-failure
RestartSec=5
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
Delegate=yes
KillMode=process
[Install]
WantedBy=multi-user.target
EOF
- EOF is quoted, so bash does not substitute variables inside the heredoc, such as $DOCKER_NETWORK_OPTIONS (systemd is responsible for substituting these environment variables);
- dockerd invokes other docker commands at runtime, such as docker-proxy, so the directory containing the docker commands must be added to the PATH environment variable;
- flanneld writes its network configuration into the /app/flannel/docker file at startup; before dockerd starts, it reads the DOCKER_NETWORK_OPTIONS environment variable from this file and uses it to set the docker0 bridge subnet;
- if multiple EnvironmentFile options are specified, /app/flannel/docker must come last (to ensure docker0 uses the bip parameter generated by flanneld);
- docker must run as the root user;
- starting with version 1.13, docker may set the default policy of the iptables FORWARD chain to DROP, which causes pings to Pod IPs on other Nodes to fail. When this happens, manually set the policy to ACCEPT:

$ sudo iptables -P FORWARD ACCEPT

and also write the following command into the /etc/rc.local file (see the sketch after this list), so that a node reboot does not revert the iptables FORWARD chain default policy to DROP:

/sbin/iptables -P FORWARD ACCEPT
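A minimal way to persist that (assuming CentOS 7, where /etc/rc.local is a symlink to /etc/rc.d/rc.local and ships non-executable):

```bash
# Append the rule and make sure rc.local actually runs at boot.
echo '/sbin/iptables -P FORWARD ACCEPT' >> /etc/rc.local
chmod +x /etc/rc.d/rc.local
```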
Distribute the systemd unit file to all worker machines:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
sed -i -e "s|##DOCKER_DIR##|${DOCKER_DIR}|" docker.service
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
scp docker.service root@${node_ip}:/etc/systemd/system/
done
Use domestic registry mirror servers to speed up image pulls, and increase the download concurrency (restart dockerd for the changes to take effect):
cd /app/k8s/work
source /app/k8s/bin/environment.sh
cat > docker-daemon.json <<EOF
{
"registry-mirrors": ["https://docker.mirrors.ustc.edu.cn","https://hub-mirror.c.163.com"],
"insecure-registries": ["docker02:35000"],
"exec-opts": ["native.cgroupdriver=systemd"],
"max-concurrent-downloads": 20,
"live-restore": true,
"max-concurrent-uploads": 10,
"debug": true,
"data-root": "${DOCKER_DIR}/data",
"exec-root": "${DOCKER_DIR}/exec",
"log-opts": {
"max-size": "100m",
"max-file": "5"
}
}
EOF
Distribute the docker configuration file to all worker nodes:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p /etc/docker/ ${DOCKER_DIR}/{data,exec}"
scp docker-daemon.json root@${node_ip}:/etc/docker/daemon.json
done
source /app/k8s/bin/environment.sh
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable docker && systemctl restart docker"
done
source /app/k8s/bin/environment.sh
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl status docker|grep Active"
done
Make sure the status is active (running); otherwise check the logs to find the cause:
journalctl -u docker
source /app/k8s/bin/environment.sh
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "/usr/sbin/ip addr show flannel.1 && /usr/sbin/ip addr show docker0"
done
Confirm that on each worker node the docker0 bridge and the flannel.1 interface are in the same subnet (below, 172.30.80.0/32 lies within 172.30.80.1/21):
>>> 172.27.137.240
3: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether ce:9c:a9:08:50:03 brd ff:ff:ff:ff:ff:ff
inet 172.30.80.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:5c:c1:77:03 brd ff:ff:ff:ff:ff:ff
inet 172.30.80.1/21 brd 172.30.87.255 scope global docker0
valid_lft forever preferred_lft forever
Note: if the services were installed in the wrong order or the machine environment is complicated (i.e., docker was installed before flanneld), the docker0 bridge and the flannel.1 interface on a worker node may end up in different subnets. In that case, stop the docker service, delete the docker0 interface by hand, and restart docker; that fixes it:
systemctl stop docker
ip link delete docker0
systemctl start docker
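A quick verification after the restart (a sketch; /app/flannel/docker is the file flanneld writes in this guide):

```bash
cat /app/flannel/docker                            # DOCKER_NETWORK_OPTIONS from flanneld
ip -4 addr show docker0 | awk '/inet /{print $2}'  # should fall inside the flannel subnet
```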
$ ps -elfH|grep docker
0 S root 28404 26159 0 80 0 - 28207 pipe_r 10:09 pts/0 00:00:00 grep --color=auto docker
4 S root 28163 1 0 80 0 - 337435 do_wai 10:08 ? 00:00:00 /app/k8s/bin/dockerd --bip=172.1.14.1/24 --ip-masq=false --mtu=1450
0 S root 28169 28163 0 80 0 - 313542 futex_ 10:08 ? 00:00:00 containerd --config /app/k8s/docker/exec/containerd/contai
$ docker info
Client:
Context: default
Debug Mode: false
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 20.10.8
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: e25210fe30a0a703442421b0f60afac609f950a3
runc version: v1.0.1-0-g4144b638
init version: de40ad0
Security Options:
seccomp
Profile: default
Kernel Version: 5.13.13-1.el7.elrepo.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.775GiB
Name: k8worker-1
ID: Z63R:N2N6:LZUF:GKF7:LF6E:4E5V:MR3O:A7GT:S423:BCEA:4S2D:44YW
Docker Root Dir: /app/k8s/docker/data
Debug Mode: true
File Descriptors: 25
Goroutines: 39
System Time: 2021-09-14T10:10:06.013211604+08:00
EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
docker02:35000
127.0.0.0/8
Registry Mirrors:
https://docker.mirrors.ustc.edu.cn/
https://hub-mirror.c.163.com/
Live Restore Enabled: true
Product License: Community Engine
WARNING: No swap limit support
tags: worker, kubelet
kubelet runs on every worker node. It receives requests from kube-apiserver, manages Pod containers, and executes interactive commands such as exec, run, and logs.
On startup, kubelet automatically registers node information with kube-apiserver; its built-in cadvisor collects and monitors the node's resource usage.
For security, this deployment disables kubelet's insecure http port; requests are authenticated and authorized, and unauthorized access (e.g., from apiserver or heapster) is rejected.
Note: unless otherwise specified, all operations in this document are performed on the k8master-1 node, and files and commands are then distributed/executed remotely.
See 06-1.部署master节点.md.
Copy the binaries to all worker nodes:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
scp ./kubernetes/server/bin/kubelet root@${node_ip}:/app/k8s/bin/
ssh root@${node_ip} "chmod +x /app/k8s/bin/*"
done
See 07-0.部署worker节点.md
cd /app/k8s/work
source /app/k8s/bin/environment.sh
# Set cluster information
kubectl config set-cluster kubernetes --kubeconfig=kubelet-bootstrap.conf --server=https://192.168.3.140:8443 --certificate-authority=/etc/kubernetes/cert/ca.pem --embed-certs=true
# Set user information. Previously we used certificates; here we use a token instead (later we grant this token only the permission to create CSR requests). The kubelet's certificates are managed via the apiserver.
kubectl config set-credentials kubelet-bootstrap --kubeconfig=kubelet-bootstrap.conf --token=`sed 's#,.*##' /etc/kubernetes/token.csv`
# Set context information
kubectl config set-context kubernetes --kubeconfig=kubelet-bootstrap.conf --cluster=kubernetes --user=kubelet-bootstrap
# Set the default context
kubectl config use-context kubernetes --kubeconfig=kubelet-bootstrap.conf
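The sed above extracts the first field of token.csv. For reference, the token file format (from the apiserver setup earlier in this guide; a sketch with hypothetical values):

```bash
# token.csv: <token>,<user>,<uid>,"<group,...>"
# e.g.: 0fb61c46f8991b718eb38d27b605b008,kubelet-bootstrap,10001,"system:bootstrappers"
head -1 /etc/kubernetes/token.csv
```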
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
scp kubelet-bootstrap.conf root@${node_ip}:/etc/kubernetes/
done
cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: create-csrs-for-bootstrapping
subjects:
- kind: Group
name: system:bootstrappers
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: ClusterRole
name: system:node-bootstrapper
apiGroup: rbac.authorization.k8s.io
EOF
cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: kube-apiserver-kubelet-apis-admin
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:kubelet-api-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
kind: User
name: kubernetes
EOF
cat > kubelet.yaml <<EOF
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
enableServer: true
staticPodPath: /etc/kubernetes/manifests
syncFrequency: 1m
fileCheckFrequency: 20s
address: 0.0.0.0
port: 10250
readOnlyPort: 0
rotateCertificates: true
serverTLSBootstrap: true
authentication:
anonymous:
enabled: false
webhook:
cacheTTL: 0s
enabled: true
x509:
clientCAFile: /etc/kubernetes/cert/ca.pem
authorization:
mode: Webhook
healthzPort: 10248
healthzBindAddress: 0.0.0.0
clusterDomain: cluster.local
clusterDNS:
# Reserved; the CoreDNS deployment later will use this address
- 10.254.0.2
nodeStatusUpdateFrequency: 10s
nodeStatusReportFrequency: 1m
imageMinimumGCAge: 2m
imageGCHighThresholdPercent: 80
imageGCLowThresholdPercent: 75
volumeStatsAggPeriod: 1m
cgroupDriver: systemd
runtimeRequestTimeout: 2m
maxPods: 200
kubeAPIQPS: 5
kubeAPIBurst: 10
serializeImagePulls: false
evictionHard:
memory.available: "100Mi"
nodefs.available: "10%"
nodefs.inodesFree: "5%"
imagefs.available: "15%"
containerLogMaxSize: 10Mi
containerLogMaxFiles: 8
EOF
Create and distribute the kubelet configuration file to each node:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
scp kubelet.yaml root@${node_ip}:/etc/kubernetes/kubelet-config.yaml
done
Create the kubelet systemd unit file template:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
cat > kubelet.service.template <<EOF
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service
[Service]
ExecStart=/app/k8s/bin/kubelet \\
--bootstrap-kubeconfig=/etc/kubernetes/kubelet-bootstrap.conf \\
--cert-dir=/etc/kubernetes/cert \\
--config=/etc/kubernetes/kubelet-config.yaml \\
--container-runtime=docker \\
--network-plugin=cni \\
--cni-bin-dir=/app/k8s/bin/cni/ \\
--hostname-override=##NODE_NAME## \\
--kubeconfig=/etc/kubernetes/kubelet.conf \\
--pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.2 \\
--alsologtostderr=true \\
--logtostderr=false \\
--log-file=/var/log/kubelet.log \\
--log-file-max-size=100 \\
--v=2
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
- if --hostname-override is set, kube-proxy must set the same option, otherwise the Node may not be found;
- --bootstrap-kubeconfig: points to the bootstrap kubeconfig file; kubelet uses the username and token in this file to send a TLS Bootstrapping request to kube-apiserver;
- after K8S approves the kubelet's CSR, the certificate and private key files are created in the --cert-dir directory, and then the --kubeconfig file is written;
- --pod-infra-container-image: do not use Red Hat's pod-infrastructure:latest image, as it cannot reap zombie processes in containers;

Create and distribute the kubelet systemd unit file for each node:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
do
echo ">>> ${node_name}"
sed -e "s/##NODE_NAME##/${node_name}/" kubelet.service.template > kubelet-${node_name}.service
scp kubelet-${node_name}.service root@${node_name}:/etc/systemd/system/kubelet.service
done
source /app/k8s/bin/environment.sh
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kubelet && systemctl restart kubelet"
done
On startup, kubelet checks whether the file referenced by --kubeconfig exists; if it does not, kubelet uses the kubeconfig specified by --bootstrap-kubeconfig to send a certificate signing request (CSR) to kube-apiserver.
When kube-apiserver receives the CSR, it authenticates the Token it contains; once authentication succeeds, it sets the request's user to system:bootstrap:<token-id> and the group to system:bootstrappers. This process is called Bootstrap Token Auth.
By default, this user and group have no permission to create CSRs, so kubelet fails to start, with errors like the following:
$ sudo journalctl -u kubelet -a |grep -A 2 'certificatesigningrequests'
May 26 12:13:41 k8master-1 kubelet[128468]: I0526 12:13:41.798230 128468 certificate_manager.go:366] Rotating certificates
May 26 12:13:41 k8master-1 kubelet[128468]: E0526 12:13:41.801997 128468 certificate_manager.go:385] Failed while requesting a signed certificate from the master: cannot cre
ate certificate signing request: certificatesigningrequests.certificates.k8s.io is forbidden: User "system:bootstrap:82jfrm" cannot create resource "certificatesigningrequests" i
n API group "certificates.k8s.io" at the cluster scope
May 26 12:13:42 k8master-1 kubelet[128468]: E0526 12:13:42.044828 128468 kubelet.go:2244] node "k8master-1" not found
May 26 12:13:42 k8master-1 kubelet[128468]: E0526 12:13:42.078658 128468 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/kubelet.go:442: Failed to list *v1.Service: Unauthor
ized
May 26 12:13:42 k8master-1 kubelet[128468]: E0526 12:13:42.079873 128468 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Node: Unauthorize
d
May 26 12:13:42 k8master-1 kubelet[128468]: E0526 12:13:42.082683 128468 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.CSIDriver: Unau
thorized
May 26 12:13:42 k8master-1 kubelet[128468]: E0526 12:13:42.084473 128468 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Unau
thorized
May 26 12:13:42 k8master-1 kubelet[128468]: E0526 12:13:42.088466 128468 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.RuntimeClass: U
nauthorized
The fix is to create a clusterrolebinding that binds the group system:bootstrappers to the clusterrole system:node-bootstrapper:
$ kubectl create clusterrolebinding kubelet-bootstrap --clusterrole=system:node-bootstrapper --group=system:bootstrappers
source /app/k8s/bin/environment.sh
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p ${K8S_DIR}/kubelet/kubelet-plugins/volume/exec/"
ssh root@${node_ip} "/usr/sbin/swapoff -a"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kubelet && systemctl restart kubelet"
done
$ journalctl -u kubelet |tail
8月 15 12:16:49 k8master-1 kubelet[7807]: I0815 12:16:49.578598 7807 feature_gate.go:230] feature gates: &{map[RotateKubeletClientCertificate:true RotateKubeletServerCertificate:true]}
8月 15 12:16:49 k8master-1 kubelet[7807]: I0815 12:16:49.578698 7807 feature_gate.go:230] feature gates: &{map[RotateKubeletClientCertificate:true RotateKubeletServerCertificate:true]}
8月 15 12:16:50 k8master-1 kubelet[7807]: I0815 12:16:50.205871 7807 mount_linux.go:214] Detected OS with systemd
8月 15 12:16:50 k8master-1 kubelet[7807]: I0815 12:16:50.205939 7807 server.go:408] Version: v1.11.2
8月 15 12:16:50 k8master-1 kubelet[7807]: I0815 12:16:50.206013 7807 feature_gate.go:230] feature gates: &{map[RotateKubeletClientCertificate:true RotateKubeletServerCertificate:true]}
8月 15 12:16:50 k8master-1 kubelet[7807]: I0815 12:16:50.206101 7807 feature_gate.go:230] feature gates: &{map[RotateKubeletServerCertificate:true RotateKubeletClientCertificate:true]}
8月 15 12:16:50 k8master-1 kubelet[7807]: I0815 12:16:50.206217 7807 plugins.go:97] No cloud provider specified.
8月 15 12:16:50 k8master-1 kubelet[7807]: I0815 12:16:50.206237 7807 server.go:524] No cloud provider specified: "" from the config file: ""
8月 15 12:16:50 k8master-1 kubelet[7807]: I0815 12:16:50.206264 7807 bootstrap.go:56] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
8月 15 12:16:50 k8master-1 kubelet[7807]: I0815 12:16:50.208628 7807 bootstrap.go:86] No valid private key and/or certificate found, reusing existing private key or creating a new one
After starting, kubelet uses --bootstrap-kubeconfig to send a CSR to kube-apiserver; once the CSR is approved, kube-controller-manager creates a TLS client certificate and private key for the kubelet and writes the file referenced by --kubeconfig.
Note: kube-controller-manager must be configured with the --cluster-signing-cert-file and --cluster-signing-key-file parameters, otherwise it will not create certificates and private keys for TLS Bootstrap.
$ kubectl get csr
NAME AGE REQUESTOR CONDITION
csr-5f4vh 31s system:bootstrap:82jfrm Pending
csr-5rw7s 29s system:bootstrap:b1f7np Pending
csr-m29fm 31s system:bootstrap:3gzd53 Pending
$ kubectl get nodes
No resources found.
Create three ClusterRoleBindings, used respectively to auto-approve client certificates and to renew client and server certificates:
cd /app/k8s/work
cat > csr-crb.yaml <<EOF
# Approve all CSRs for the group "system:bootstrappers"
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: auto-approve-csrs-for-group
subjects:
- kind: Group
name: system:bootstrappers
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: ClusterRole
name: system:certificates.k8s.io:certificatesigningrequests:nodeclient
apiGroup: rbac.authorization.k8s.io
---
# To let a node of the group "system:nodes" renew its own credentials
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: node-client-cert-renewal
subjects:
- kind: Group
name: system:nodes
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: ClusterRole
name: system:certificates.k8s.io:certificatesigningrequests:selfnodeclient
apiGroup: rbac.authorization.k8s.io
---
# A ClusterRole which instructs the CSR approver to approve a node requesting a
# serving cert matching its client cert.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: approve-node-server-renewal-csr
rules:
- apiGroups: ["certificates.k8s.io"]
resources: ["certificatesigningrequests/selfnodeserver"]
verbs: ["create"]
---
# To let a node of the group "system:nodes" renew its own server credentials
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: node-server-cert-renewal
subjects:
- kind: Group
name: system:nodes
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: ClusterRole
name: approve-node-server-renewal-csr
apiGroup: rbac.authorization.k8s.io
EOF
kubectl apply -f csr-crb.yaml
After a while (1-10 minutes), the nodes' client CSRs are automatically approved (serving CSRs stay Pending):
$ kubectl get csr
NAME AGE REQUESTOR CONDITION
csr-729m9 7m35s kubernetes.io/kube-apiserver-client-kubelet system:bootstrap:q6gvdu <none> Approved,Issued
csr-d2wwx 3m38s kubernetes.io/kubelet-serving system:node:k8worker-2 <none> Pending
csr-hzv2r 7m40s kubernetes.io/kube-apiserver-client-kubelet system:bootstrap:q6gvdu <none> Approved,Issued
csr-kckvz 3m40s kubernetes.io/kubelet-serving system:node:k8worker-2 <none> Pending
All nodes are Ready:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8worker-1 Ready <none> 4h21m v1.22.1
k8worker-2 Ready <none> 4m15s v1.22.1
kube-controller-manager generated a kubeconfig file and a key pair for each node:
$ ls -l /etc/kubernetes/kubelet.kubeconfig
-rw------- 1 root root 2306 May 26 12:17 /etc/kubernetes/kubelet.kubeconfig
$ ls -l /etc/kubernetes/cert/|grep kubelet
-rw------- 1 root root 1224 Sep 15 11:10 /etc/kubernetes/cert/kubelet-client-2021-09-15-11-10-29.pem
lrwxrwxrwx 1 root root 59 Sep 15 11:10 /etc/kubernetes/cert/kubelet-client-current.pem -> /etc/kubernetes/cert/kubelet-client-2021-09-15-11-10-29.pem
-rw------- 1 root root 1261 Sep 15 11:24 /etc/kubernetes/cert/kubelet-server-2021-09-15-11-24-11.pem
lrwxrwxrwx 1 root root 59 Sep 15 11:24 /etc/kubernetes/cert/kubelet-server-current.pem -> /etc/kubernetes/cert/kubelet-server-2021-09-15-11-24-11.pem
For security reasons, the CSR approving controllers do not automatically approve kubelet server certificate signing requests; these must be approved manually:
$ kubectl certificate approve csr-5r7j7
certificatesigningrequest.certificates.k8s.io/csr-5r7j7 approved
$ kubectl certificate approve csr-c7z56
certificatesigningrequest.certificates.k8s.io/csr-c7z56 approved
$ kubectl certificate approve csr-j55lh
certificatesigningrequest.certificates.k8s.io/csr-j55lh approved
$ ls -l /etc/kubernetes/cert/kubelet-*
-rw------- 1 root root 1224 Sep 15 11:10 /etc/kubernetes/cert/kubelet-client-2021-09-15-11-10-29.pem
lrwxrwxrwx 1 root root 59 Sep 15 11:10 /etc/kubernetes/cert/kubelet-client-current.pem -> /etc/kubernetes/cert/kubelet-client-2021-09-15-11-10-29.pem
-rw------- 1 root root 1261 Sep 15 11:24 /etc/kubernetes/cert/kubelet-server-2021-09-15-11-24-11.pem
lrwxrwxrwx 1 root root 59 Sep 15 11:24 /etc/kubernetes/cert/kubelet-server-current.pem -> /etc/kubernetes/cert/kubelet-server-2021-09-15-11-24-11.pem
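When many nodes join at once, approving serving CSRs one by one gets tedious; a bulk-approve sketch (inspect the pending list first):

```bash
# Approve every CSR currently in Pending state.
kubectl get csr | awk '/Pending/ {print $1}' | xargs -r kubectl certificate approve
```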
After starting, kubelet listens on several ports to receive requests from kube-apiserver and other clients:
$ sudo netstat -lnpt|grep kubelet
tcp 0 0 127.0.0.1:40187 0.0.0.0:* LISTEN 755/kubelet
tcp6 0 0 :::10248 :::* LISTEN 755/kubelet
tcp6 0 0 :::10250 :::* LISTEN 755/kubelet
The --cadvisor-port parameter (default port 4194) is not configured, so access to the cAdvisor UI & API is not supported. For example, when you run kubectl exec -it nginx-ds-5rmws -- sh, kube-apiserver sends the following request to kubelet:

POST /exec/default/nginx-ds-5rmws/my-nginx?command=sh&input=1&output=1&tty=1

kubelet serves https requests on port 10250, through which the following resources can be accessed:
For details see: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/server/server.go#L434:3
Because anonymous authentication is disabled and webhook authorization is enabled, every request to the https API on port 10250 must be authenticated and authorized.
The predefined ClusterRole system:kubelet-api-admin grants access to all kubelet APIs (the kubernetes certificate User used by kube-apiserver is granted this permission):
$ kubectl describe clusterrole system:kubelet-api-admin
Name: system:kubelet-api-admin
Labels: kubernetes.io/bootstrapping=rbac-defaults
Annotations: rbac.authorization.kubernetes.io/autoupdate=true
PolicyRule:
Resources Non-Resource URLs Resource Names Verbs
--------- ----------------- -------------- -----
nodes [] [] [get list watch proxy]
nodes/log [] [] [*]
nodes/metrics [] [] [*]
nodes/proxy [] [] [*]
nodes/spec [] [] [*]
nodes/stats [] [] [*]
kubelet is configured with the following authentication parameters:
It is also configured with the following authorization parameters:
When kubelet receives a request, it authenticates the certificate signature against clientCAFile, or checks whether the bearer token is valid. If neither check passes, the request is rejected with Unauthorized:
$ curl -s --cacert /etc/kubernetes/cert/ca.pem https://192.168.3.147:10250/metrics
Unauthorized
$ curl -s --cacert /etc/kubernetes/cert/ca.pem -H "Authorization: Bearer 123456" https://192.168.3.147:10250/metrics
Unauthorized
After authentication, kubelet sends a SubjectAccessReview request to kube-apiserver to check whether the user/group behind the certificate or token has permission to operate on the resource (RBAC):
$ # a certificate with insufficient permissions;
$ curl -s --cacert /etc/kubernetes/cert/ca.pem --cert /etc/kubernetes/cert/kube-controller-manager.pem --key /etc/kubernetes/cert/kube-controller-manager-key.pem https://192.168.3.147:10250/metrics
Forbidden (user=system:kube-controller-manager, verb=get, resource=nodes, subresource=metrics)
$ # the admin certificate with full privileges, created when deploying the kubectl command-line tool;
$ curl -s --cacert /etc/kubernetes/cert/ca.pem --cert /app/k8s/work/admin.pem --key /app/k8s/work/admin-key.pem https://192.168.3.147:10250/metrics|head
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
The values of --cacert, --cert, and --key must be file paths; e.g., ./admin.pem above must keep the ./ prefix, otherwise a 401 Unauthorized is returned.
Create a ServiceAccount and bind it to the ClusterRole system:kubelet-api-admin so that it has permission to call the kubelet API:
kubectl create sa kubelet-api-test
kubectl create clusterrolebinding kubelet-api-test --clusterrole=system:kubelet-api-admin --serviceaccount=default:kubelet-api-test
SECRET=$(kubectl get secrets | grep kubelet-api-test | awk '{print $1}')
TOKEN=$(kubectl describe secret ${SECRET} | grep -E '^token' | awk '{print $2}')
echo ${TOKEN}
$ curl -s --cacert /etc/kubernetes/cert/ca.pem -H "Authorization: Bearer ${TOKEN}" https://192.168.3.147:10250/metrics|head
# HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
cadvisor is embedded in the kubelet binary; it is the service that collects resource usage (CPU, memory, disk, network) of the containers on its node.
Visiting https://192.168.3.147:10250/metrics and https://192.168.3.147:10250/metrics/cadvisor in a browser returns the kubelet and cadvisor metrics, respectively.
Note:
Alternatively, refer to the comments in the source code.
tags: worker, kube-proxy
kube-proxy runs on all worker nodes. It watches the apiserver for changes to services and endpoints and creates routing rules to provide service IPs and load balancing.
This document walks through deploying kube-proxy in ipvs mode.
Note: unless otherwise specified, all operations in this document are performed on the k8master-1 node, and files and commands are then distributed/executed remotely.
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_name in ${WORK_IPS[@]}
do
echo ">>> ${node_name}"
scp /app/k8s/work/kubernetes/server/bin/kube-proxy root@${node_name}:/app/k8s/bin/
done
See 06-1.部署master节点.md.
See 07-0.部署worker节点.md.
Each node needs the ipvsadm and ipset commands installed and the ip_vs kernel module loaded.
Distribute the kube-proxy certificates and private keys:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_name in ${WORK_IPS[@]}
do
echo ">>> ${node_name}"
scp kube-proxy*.pem root@${node_name}:/etc/kubernetes/
done
cd /app/k8s/work
source /app/k8s/bin/environment.sh
kubectl config set-cluster kubernetes \
--certificate-authority=/app/k8s/work/ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=kube-proxy.kubeconfig
kubectl config set-credentials kube-proxy \
--client-certificate=kube-proxy.pem \
--client-key=kube-proxy-key.pem \
--embed-certs=true \
--kubeconfig=kube-proxy.kubeconfig
kubectl config set-context default \
--cluster=kubernetes \
--user=kube-proxy \
--kubeconfig=kube-proxy.kubeconfig
kubectl config use-context default --kubeconfig=kube-proxy.kubeconfig
--embed-certs=true: embeds the contents of ca.pem and kube-proxy.pem into the generated kube-proxy.kubeconfig file (without it, only the certificate file paths are written).
Distribute the kubeconfig file:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_name in ${WORK_IPS[@]}
do
echo ">>> ${node_name}"
scp kube-proxy.kubeconfig root@${node_name}:/etc/kubernetes/
done
Since v1.10, some kube-proxy parameters can be set in a configuration file. You can generate that file with the --write-config-to option, or refer to the comments in the source code.
Create the kube-proxy config file template:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
cat > kube-proxy-config.yaml.template <<EOF
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
bindAddress: ##NODE_IP##
healthzBindAddress: ##NODE_IP##:10256
metricsBindAddress: ##NODE_IP##:10249
bindAddressHardFail: true
enableProfiling: false
clusterCIDR: ${CLUSTER_CIDR}
hostnameOverride: ##NODE_NAME##
clientConnection:
kubeconfig: /etc/kubernetes/kube-proxy.kubeconfig
qps: 100
burst: 200
mode: "ipvs"
EOF
- bindAddress: the listen address;
- clientConnection.kubeconfig: the kubeconfig file for connecting to the apiserver;
- clusterCIDR: kube-proxy uses --cluster-cidr to distinguish traffic inside the cluster from traffic outside it; only after --cluster-cidr or --masquerade-all is specified does kube-proxy SNAT requests to Service IPs;
- hostnameOverride: the value must match the kubelet's, otherwise kube-proxy will not find the Node after starting and will not create any ipvs rules;
- mode: use ipvs mode;

Create and distribute the kube-proxy configuration file for each node:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for (( i=0; i < ${#NODE_NAMES[@]}; i++ ))
do
echo ">>> ${NODE_NAMES[i]}"
sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-proxy-config.yaml.template > kube-proxy-config-${NODE_NAMES[i]}.yaml.template
scp kube-proxy-config-${NODE_NAMES[i]}.yaml.template root@${NODE_NAMES[i]}:/etc/kubernetes/kube-proxy-config.yaml
done
cd /app/k8s/work
source /app/k8s/bin/environment.sh
cat > kube-proxy.service <<EOF
[Unit]
Description=Kubernetes Kube-Proxy Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target
[Service]
ExecStart=/app/k8s/bin/kube-proxy \\
--config=/etc/kubernetes/kube-proxy-config.yaml \\
--logtostderr=false \\
--log-file=/var/log/kube-proxy.log \\
--v=2
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF
Distribute the kube-proxy systemd unit file:
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_name in ${WORK_IPS[@]}
do
echo ">>> ${node_name}"
scp kube-proxy.service root@${node_name}:/etc/systemd/system/
done
cd /app/k8s/work
source /app/k8s/bin/environment.sh
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p ${K8S_DIR}/kube-proxy"
ssh root@${node_ip} "modprobe ip_vs_rr"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-proxy && systemctl restart kube-proxy"
done
source /app/k8s/bin/environment.sh
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl status kube-proxy|grep Active"
done
Make sure the status is active (running); otherwise check the logs to find the cause:
journalctl -u kube-proxy
$ sudo netstat -lnpt|grep kube-prox
tcp 0 0 192.168.3.147:10249 0.0.0.0:* LISTEN 28606/kube-proxy
tcp 0 0 0.0.0.0:38446 0.0.0.0:* LISTEN 28606/kube-proxy
tcp 0 0 192.168.3.147:10256 0.0.0.0:* LISTEN 28606/kube-proxy
source /app/k8s/bin/environment.sh
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "/usr/sbin/ipvsadm -ln"
done
Expected output:
>>> 192.168.3.148
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.255.0.1:443 rr
-> 192.168.3.144:6443 Masq 1 0 0
-> 192.168.3.145:6443 Masq 1 0 0
-> 192.168.3.146:6443 Masq 1 0 0
As shown, all https requests to the K8S Service kubernetes are forwarded to port 6443 on the kube-apiserver nodes.
tags: verify
This document uses a daemonset to verify that the master and worker nodes work correctly.
Note: unless otherwise specified, all operations in this document are performed on the k8master-1 node, and files and commands are then distributed/executed remotely.
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8worker-1 Ready <none> 5h16m v1.22.1
k8worker-2 Ready <none> 5h20m v1.22.1
ka-1 Ready <none> 5h20m v1.22.1
ka-2 Ready <none> 5h20m v1.22.1
ka-3 Ready <none> 5h20m v1.22.1
Everything is normal when all nodes are Ready.
cd /app/k8s/work
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: my-pod1
labels:
app: nginx
spec:
nodeName: "ka-1"
containers:
- name: nginx
image: nginx:1.14.2
ports:
- name: web
containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: my-pod1
spec:
type: ClusterIP
selector:
app: nginx
ports:
- name: web
port: 80
targetPort: 80
EOF
[root@k8master-1 work]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
my-pod1 1/1 Running 0 50s 172.1.1.2 ka-1 <none> <none>
source /app/k8s/bin/environment.sh
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
ssh ${node_ip} "ping -c 1 172.1.1.2"
done
[root@k8master-1 work]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.254.0.1 <none> 443/TCP 24h
my-pod1 ClusterIP 10.254.250.113 <none> 80/TCP 2m55s
As shown:
On all Nodes, curl the Service IP:
source /app/k8s/bin/environment.sh
for node_ip in ${WORK_IPS[@]}
do
echo ">>> ${node_ip}"
ssh ${node_ip} "curl -s 10.254.250.113"
done
The expected output is the nginx welcome page.
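A compact way to check it (a sketch; the Cluster IP is the one shown above):

```bash
# Assert just the page title instead of dumping the full HTML.
curl -s 10.254.250.113 | grep -o '<title>.*</title>'
# expected: <title>Welcome to nginx!</title>
```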
Add-ons are accessory components of the cluster; they enrich and extend its functionality.
Note:
kubectl taint node 192.168.3.144 node-role.kubernetes.io/master:NoSchedule
kubectl taint node 192.168.3.145 node-role.kubernetes.io/master:NoSchedule
kubectl taint node 192.168.3.146 node-role.kubernetes.io/master:NoSchedule
kubectl label nodes 192.168.3.144 node-role.kubernetes.io/master=
kubectl label nodes 192.168.3.144 node-role.kubernetes.io/control-plane=
kubectl label nodes 192.168.3.145 node-role.kubernetes.io/master=
kubectl label nodes 192.168.3.145 node-role.kubernetes.io/control-plane=
kubectl label nodes 192.168.3.146 node-role.kubernetes.io/master=
kubectl label nodes 192.168.3.146 node-role.kubernetes.io/control-plane=
tags: addons, dns, coredns
Note:
The coredns directory is cluster/addons/dns:
cd /app/k8s/work/kubernetes/cluster/addons/dns/coredns
cat > coredns.yaml <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
kubernetes.io/bootstrapping: rbac-defaults
name: system:coredns
rules:
- apiGroups:
- ""
resources:
- endpoints
- services
- pods
- namespaces
verbs:
- list
- watch
- apiGroups:
- discovery.k8s.io
resources:
- endpointslices
verbs:
- list
- watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
annotations:
rbac.authorization.kubernetes.io/autoupdate: "true"
labels:
kubernetes.io/bootstrapping: rbac-defaults
name: system:coredns
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:coredns
subjects:
- kind: ServiceAccount
name: coredns
namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
name: coredns
namespace: kube-system
data:
Corefile: |
.:53 {
errors
health {
lameduck 5s
}
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
fallthrough in-addr.arpa ip6.arpa
}
prometheus :9153
forward . /etc/resolv.conf {
max_concurrent 1000
}
cache 30
loop
reload
loadbalance
}
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: coredns
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/name: "CoreDNS"
spec:
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
selector:
matchLabels:
k8s-app: kube-dns
template:
metadata:
labels:
k8s-app: kube-dns
spec:
priorityClassName: system-cluster-critical
serviceAccountName: coredns
tolerations:
- key: "CriticalAddonsOnly"
operator: "Exists"
nodeSelector:
kubernetes.io/os: linux
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: k8s-app
operator: In
values: ["kube-dns"]
topologyKey: kubernetes.io/hostname
containers:
- name: coredns
image: coredns/coredns:1.7.1
imagePullPolicy: IfNotPresent
resources:
limits:
memory: 170Mi
requests:
cpu: 100m
memory: 70Mi
args: [ "-conf", "/etc/coredns/Corefile" ]
volumeMounts:
- name: config-volume
mountPath: /etc/coredns
readOnly: true
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
- containerPort: 9153
name: metrics
protocol: TCP
securityContext:
allowPrivilegeEscalation: false
capabilities:
add:
- NET_BIND_SERVICE
drop:
- all
readOnlyRootFilesystem: true
livenessProbe:
httpGet:
path: /health
port: 8080
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
readinessProbe:
httpGet:
path: /ready
port: 8181
scheme: HTTP
dnsPolicy: Default
volumes:
- name: config-volume
configMap:
name: coredns
items:
- key: Corefile
path: Corefile
---
apiVersion: v1
kind: Service
metadata:
name: kube-dns
namespace: kube-system
annotations:
prometheus.io/port: "9153"
prometheus.io/scrape: "true"
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
kubernetes.io/name: "CoreDNS"
spec:
selector:
k8s-app: kube-dns
clusterIP: 10.254.0.2
ports:
- name: dns
port: 53
protocol: UDP
- name: dns-tcp
port: 53
protocol: TCP
- name: metrics
port: 9153
protocol: TCP
EOF
kubectl create -f coredns.yaml
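Optionally wait for the Deployment to finish rolling out before checking:

```bash
kubectl -n kube-system rollout status deployment/coredns
```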
$ kubectl get all -n kube-system
NAME READY STATUS RESTARTS AGE
pod/coredns-f474c9d79-tp7c2 1/1 Running 0 4m3s
pod/kube-flannel-ds-lchjr 1/1 Running 0 2d4h
pod/kube-flannel-ds-x8xmm 1/1 Running 0 2d23h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-dns ClusterIP 10.254.0.2 <none> 53/UDP,53/TCP,9153/TCP 4m4s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/kube-flannel-ds 2 2 1 2 1 <none> 2d23h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/coredns 1/1 1 1 4m4s
NAME DESIRED CURRENT READY AGE
replicaset.apps/coredns-f474c9d79 1 1 1 4m3s
Create a new Pod + Service:
cd /app/k8s/work
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: my-nginx
labels:
app: nginx
spec:
nodeName: "k8worker-1"
containers:
- name: nginx
image: nginx:1.14.2
ports:
- name: web
containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: my-nginx
spec:
type: ClusterIP
selector:
app: nginx
ports:
- name: web
port: 80
targetPort: 80
EOF
[root@k8master-1 work]# kubectl get services --all-namespaces |grep my-nginx
default my-nginx ClusterIP 10.254.32.90 80/TCP 3m30s
Create another Pod and check whether its `/etc/resolv.conf` contains the `--cluster-dns` and `--cluster-domain` values configured for `kubelet`, and whether it can resolve the service `my-nginx` to the Cluster IP `10.254.32.90` shown above:
```bash
cd /app/k8s/work
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: my-nginx1
spec:
  containers:
  - name: nginx
    image: nginx:1.14.2
EOF
```
[root@k8worker-1 ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
my-nginx1 1/1 Running 0 62s
my-pod2222 1/1 Running 0 2d22h
nginx-6799fc88d8-xjk85 1/1 Running 0 4m52s
$ kubectl exec -it my-nginx1 -- bash
root@my-nginx1:/# cat /etc/resolv.conf
nameserver 10.254.0.2
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
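To confirm the service name actually resolves through CoreDNS (a sketch; getent is available in glibc-based images such as this nginx image):

```bash
kubectl exec my-nginx1 -- getent hosts my-nginx
# expected: the my-nginx Cluster IP, e.g. "10.254.32.90  my-nginx.default.svc.cluster.local"
```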
tags: addons, dashboard
Note:
After extracting the downloaded kubernetes-server-linux-amd64.tar.gz, extract the kubernetes-src.tar.gz file inside it:
cd /app/k8s/work/kubernetes/
tar -xzvf kubernetes-src.tar.gz
The dashboard directory is cluster/addons/dashboard:
cd /app/k8s/work/kubernetes/cluster/addons/dashboard
$ kubectl apply -f dashboard.yaml
[root@k8master-1 dashboard]# kubectl get deployment kubernetes-dashboard -n kubernetes-dashboard
NAME READY UP-TO-DATE AVAILABLE AGE
kubernetes-dashboard 1/1 1 1 2m46s
[root@k8master-1 dashboard]# kubectl --namespace kubernetes-dashboard get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
dashboard-metrics-scraper-7b4c85dd89-v8mql 1/1 Running 0 3m6s 172.1.14.4 k8worker-1 <none> <none>
kubernetes-dashboard-7fff8584c9-qf4wr 1/1 Running 0 3m6s 172.1.69.11 k8worker-2 <none> <none>
[root@k8master-1 dashboard]# kubectl get services kubernetes-dashboard -n kubernetes-dashboard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-dashboard NodePort 10.254.212.203 <none> 443:49935/TCP 9s
$ kubectl exec --namespace kubernetes-dashboard -it kubernetes-dashboard-7fff8584c9-qf4wr -- /dashboard --help
2021/09/26 07:55:43 Starting overwatch
Usage of /dashboard:
--alsologtostderr log to standard error as well as files
--api-log-level string Level of API request logging. Should be one of 'INFO|NONE|DEBUG'. (default "INFO")
--apiserver-host string The address of the Kubernetes Apiserver to connect to in the format of protocol://address:port, e.g., http://localhost:8080. If not specified, the assumption is that the binary runs inside a Kubernetes cluster and local discovery is attempted.
--authentication-mode strings Enables authentication options that will be reflected on login screen. Supported values: token, basic. Note that basic option should only be used if apiserver has '--authorization-mode=ABAC' and '--basic-auth-file' flags set. (default [token])
--auto-generate-certificates When set to true, Dashboard will automatically generate certificates used to serve HTTPS. (default false)
--bind-address ip The IP address on which to serve the --port (set to 0.0.0.0 for all interfaces). (default 0.0.0.0)
--default-cert-dir string Directory path containing '--tls-cert-file' and '--tls-key-file' files. Used also when auto-generating certificates flag is set. (default "/certs")
--disable-settings-authorizer When enabled, Dashboard settings page will not require user to be logged in and authorized to access settings page. (default false)
--enable-insecure-login When enabled, Dashboard login view will also be shown when Dashboard is not served over HTTPS. (default false)
--enable-skip-login When enabled, the skip button on the login page will be shown. (default false)
--heapster-host string The address of the Heapster Apiserver to connect to in the format of protocol://address:port, e.g., http://localhost:8082. If not specified, the assumption is that the binary runs inside a Kubernetes cluster and service proxy will be used.
--insecure-bind-address ip The IP address on which to serve the --insecure-port (set to 127.0.0.1 for all interfaces). (default 127.0.0.1)
--insecure-port int The port to listen to for incoming HTTP requests. (default 9090)
--kubeconfig string Path to kubeconfig file with authorization and master location information.
--locale-config string File containing the configuration of locales (default "./locale_conf.json")
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
--logtostderr log to standard error instead of files
--metric-client-check-period int Time in seconds that defines how often configured metric client health check should be run. (default 30)
--metrics-provider string Select provider type for metrics. 'none' will not check metrics. (default "sidecar")
--namespace string When non-default namespace is used, create encryption key in the specified namespace. (default "kube-system")
--port int The secure port to listen to for incoming HTTPS requests. (default 8443)
--sidecar-host string The address of the Sidecar Apiserver to connect to in the format of protocol://address:port, e.g., http://localhost:8000. If not specified, the assumption is that the binary runs inside a Kubernetes cluster and service proxy will be used.
--stderrthreshold severity logs at or above this threshold go to stderr (default 2)
--system-banner string When non-empty displays message to Dashboard users. Accepts simple HTML tags.
--system-banner-severity string Severity of system banner. Should be one of 'INFO|WARNING|ERROR'. (default "INFO")
--tls-cert-file string File containing the default x509 Certificate for HTTPS.
--tls-key-file string File containing the default x509 private key matching --tls-cert-file.
--token-ttl int Expiration time (in seconds) of JWE tokens generated by dashboard. '0' never expires (default 900)
-v, --v Level log level for V logs
--vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging
pflag: help requested
command terminated with exit code 2
The dashboard's --authentication-mode supports token and basic, with token as the default. To use basic, kube-apiserver must be configured with the --authorization-mode=ABAC and --basic-auth-file parameters.
Since 1.7, the dashboard only allows access over https; when using kube proxy, it must listen on localhost or 127.0.0.1. NodePort access has no such restriction, but is recommended only for development environments.
For logins that do not meet these conditions, the browser does not redirect after a successful login and stays on the login page.
Access the dashboard at https://NodeIP:NodePort.
The dashboard only supports token authentication by default (client certificate authentication is not supported), so if you use a Kubeconfig file, the token must be written into that file.
kubectl create sa dashboard-admin -n kube-system
kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kube-system:dashboard-admin
ADMIN_SECRET=$(kubectl get secrets -n kube-system | grep dashboard-admin | awk '{print $1}')
DASHBOARD_LOGIN_TOKEN=$(kubectl describe secret -n kube-system ${ADMIN_SECRET} | grep -E '^token' | awk '{print $2}')
echo ${DASHBOARD_LOGIN_TOKEN}
Log in to the Dashboard with the token output above.
On a master node, run:
kubectl proxy --address=0.0.0.0 --port=8086 --accept-hosts='^*$'
http://192.168.3.144:8086/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
source /app/k8s/bin/environment.sh
# Set cluster parameters
kubectl config set-cluster kubernetes \
--certificate-authority=/etc/kubernetes/cert/ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=dashboard.kubeconfig
# Set client authentication parameters, using the token created above
kubectl config set-credentials dashboard_user \
--token=${DASHBOARD_LOGIN_TOKEN} \
--kubeconfig=dashboard.kubeconfig
# Set context parameters
kubectl config set-context default \
--cluster=kubernetes \
--user=dashboard_user \
--kubeconfig=dashboard.kubeconfig
# Set the default context
kubectl config use-context default --kubeconfig=dashboard.kubeconfig
Log in to the Dashboard with the generated dashboard.kubeconfig.
Because the Heapster add-on is missing, the current dashboard cannot display CPU/memory statistics or charts for Pods and Nodes.