Setting up k8s v1.18.2 with Kubeadm (with a full collection of error analyses)

Setting up k8s
We use kubeadm to set up the master and cluster; the version installed here is v1.18.2.
Ops newcomers and veterans alike are welcome in the QQ group — it covers business, application, system, network, database and desktop operations plus ops development, any region; the group is newly created, come trade notes.
QQ group: 1027981908

System environment

System version / kernel: this walkthrough assumes CentOS 7 (the el7 yum repo is used below).
Set the hostname on each of the three machines:

 hostnamectl set-hostname k8s-master
 hostnamectl set-hostname k8s-node1
 hostnamectl set-hostname k8s-node2

Add the following to /etc/hosts on all three nodes (substitute your own machine IPs):
172.20.0.15 k8s-master
172.20.0.12 k8s-node1
172.20.0.43 k8s-node2
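A quick sanity check (a sketch; run it from any of the nodes) that all three names resolve and respond:

for h in k8s-master k8s-node1 k8s-node2; do
  ping -c1 -W1 $h >/dev/null && echo "$h ok" || echo "$h unreachable"
done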

Installation details

Everything below runs on all three nodes.

  • Disable the firewall
 systemctl stop firewalld
 systemctl disable firewalld
  • Disable SELinux

setenforce 0
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config

setenforce 0 only disables SELinux until the next reboot; the sed edit sets SELINUX=disabled in /etc/selinux/config so it stays off permanently.
  • Install Docker
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum -y install docker-ce-18.09.9
  • Install kubeadm: first add the Kubernetes yum repo
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
       http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
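Before pinning a version, it can help to see which versions the repo actually offers — a quick check (assuming the repo file above was written successfully):

yum list kubelet --showduplicates --disableexcludes=kubernetes | grep 1.18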
  • Install kubeadm, kubelet and kubectl
    (--disableexcludes disables every repo other than kubernetes)
yum install -y kubelet-1.18.2 kubeadm-1.18.2 kubectl-1.18.2 --disableexcludes=kubernetes
kubeadm version
  • Start Docker
systemctl start docker
  • Switch Docker's cgroup driver to systemd
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF
systemctl enable docker.service
systemctl daemon-reload
systemctl restart docker
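After the restart it's worth confirming that Docker really picked up the systemd driver, since a mismatch with the kubelet is one of the error cases analyzed later:

docker info 2>/dev/null | grep -i cgroup
# expect: Cgroup Driver: systemd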
  • Change the firewall FORWARD chain policy
    Add ExecStartPost=/usr/sbin/iptables -P FORWARD ACCEPT
    to the file /usr/lib/systemd/system/docker.service.
    For the lazy, run the following:
sed -i '20i ExecStartPost=/usr/sbin/iptables -P FORWARD ACCEPT' /usr/lib/systemd/system/docker.service
systemctl daemon-reload
systemctl restart docker

After the change, verify:

iptables -L
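The FORWARD chain policy should now be ACCEPT; to print just the policy line:

iptables -S FORWARD | head -1
# expect: -P FORWARD ACCEPT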

Here I'm deliberately leaving some pits in place — a few necessary settings are skipped. If you'd rather dodge them, scroll straight down to the "Avoiding the pits" section.

Planting the mines

Run the following on the master.

  • Pull the images in advance (start Docker first). You could also point the config at a domestic registry (e.g. Alibaba's); I had time to kill, so I pulled them one by one and re-tagged them.
    ps: on my first install kubeadm, kubelet etc. were v1.16.2, so the images pulled below are also v1.16.2.
  • List the images that need pulling:
 kubeadm config images list

k8s.gcr.io/kube-apiserver:v1.16.2
k8s.gcr.io/kube-controller-manager:v1.16.2
k8s.gcr.io/kube-scheduler:v1.16.2
k8s.gcr.io/kube-proxy:v1.16.2
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.3.15-0
k8s.gcr.io/coredns:1.6.2
Search each one for a domestic mirror that carries it, then start pulling:

docker pull aiotceo/kube-apiserver:v1.16.2
docker pull aiotceo/kube-scheduler:v1.16.2
docker pull aiotceo/kube-controller-manager:v1.16.2
docker pull aiotceo/kube-proxy:v1.16.2
docker pull tangxu/etcd:3.3.15-0
docker pull aiotceo/coredns:1.6.2
docker pull aiotceo/pause:3.1

docker tag aiotceo/kube-apiserver:v1.16.2 k8s.gcr.io/kube-apiserver:v1.16.2
docker tag aiotceo/kube-scheduler:v1.16.2 k8s.gcr.io/kube-scheduler:v1.16.2
docker tag aiotceo/kube-controller-manager:v1.16.2 k8s.gcr.io/kube-controller-manager:v1.16.2
docker tag aiotceo/kube-proxy:v1.16.2 k8s.gcr.io/kube-proxy:v1.16.2
docker tag tangxu/etcd:3.3.15-0 k8s.gcr.io/etcd:3.3.15-0
docker tag aiotceo/coredns:1.6.2 k8s.gcr.io/coredns:1.6.2
docker tag aiotceo/pause:3.1 k8s.gcr.io/pause:3.1
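If you'd rather not pull and tag one by one, here's a loop sketch that does the same thing, assuming the Alibaba mirror registry.cn-hangzhou.aliyuncs.com/google_containers carries matching tags (it may not for every version — exactly the v1.16.2 caveat noted in the config below):

MIRROR=registry.cn-hangzhou.aliyuncs.com/google_containers
for img in $(kubeadm config images list 2>/dev/null); do
  name=${img#k8s.gcr.io/}              # e.g. kube-apiserver:v1.16.2
  docker pull $MIRROR/$name
  docker tag $MIRROR/$name k8s.gcr.io/$name
done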

Initialize the cluster. The version is v1.16.2 here, chosen to match the installed kubeadm version (some lower versions will be rejected as unsupported). This is also where the pod subnet and the master IP are set.
There are two ways to initialize the cluster; either way, run it on the master.
First: pass everything as command-line flags (a config file, shown next, works too):

kubeadm init --kubernetes-version=v1.16.2 --pod-network-cidr=172.20.0.0/16 --apiserver-advertise-address=172.20.0.15

Second: generate the default config and init from it:

kubeadm config print init-defaults > xxxx.yaml
kubeadm init --config xxxx.yaml

The auto-generated config file, annotated:

apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: hdp0kg.ab86i3ms07muvkax # change this; all lowercase, format: 6 lowercase letters/digits + "." + 16 lowercase letters/digits
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 172.20.0.15 # the API address: change to your master IP
  bindPort: 6443 # the API port: change if needed
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: k8s-master # picked up automatically
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io # the image registry is abroad; if you can't reach it and don't want to re-tag images one by one, swap in Alibaba's mirror: registry.cn-hangzhou.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.16.2 # the domestic mirror above has no images for this version; if you switch to that mirror, use v1.18.2 here
networking:
  dnsDomain: cluster.k8stest # fine either way
  serviceSubnet: 10.10.0.0/16
  podSubnet: 10.18.0.0/16 # the pod subnet: set it to your needs, and avoid overlapping the subnets above
scheduler: {}
kubeadm init --kubernetes-version=v1.16.2 --pod-network-cidr=172.20.0.0/16 --apiserver-advertise-address=172.20.0.15

Error collection and analysis

  • If the warning below appears, the Docker cgroup driver wasn't configured (or not configured correctly). Docker and the kubelet must use the same cgroup driver, and the recommended one is systemd:
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
error execution phase preflight: [preflight] Some fatal errors occurred:
  • The error below means the port is already in use (stop whatever occupies it — in my case a previous failed attempt):
[ERROR Port-10250]: Port 10250 is in use

The following error is exactly that mismatch between the kubelet's and Docker's cgroup drivers; the fix is described above and below:

 failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"

Switch Docker's cgroup driver back to systemd by editing /etc/docker/daemon.json:
"exec-opts": ["native.cgroupdriver=systemd"]
The kubelet's cgroup driver is changed in this file:
/usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
Below is the relevant piece of my auto-install script (uploaded to my CSDN resources); the variable KUBESTART holds the desired kubelet cgroup driver:

counts=`grep -i "KUBELET_CGROUP_ARGS=--cgroup-driver=" /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf | wc -l`
if [[ "$counts" != "0" ]] ; then
  sed -i "s/--cgroup-driver=.*$/--cgroup-driver=$KUBESTART/g" /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
else
  sed -i "5iEnvironment=\"KUBELET_CGROUP_ARGS=--cgroup-driver=$KUBESTART\"" /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
fi
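For example, if the snippet is saved as fix-cgroup.sh (a hypothetical name), it could be run as:

KUBESTART=systemd bash fix-cgroup.sh
systemctl daemon-reload && systemctl restart kubelet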

To change it by hand, edit the file above and add the following line:
Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=systemd"
As above, reload systemd and restart afterwards.

k8s-master kubelet: W0710 15:05:16.718248   17552 docker_service.go:563] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "hairpin-veth" 

systemctl enable kubelet
systemctl daemon-reload
systemctl restart docker

  • An error of the kind below means pulling from the foreign registry timed out — which is why pre-pulling the images above matters. If it still fails after pre-pulling, check that the tags are right and the images actually came down:

[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-controller-manager:v1.16.2: output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

OK, try again.
The console output follows. At this point check the kubelet startup log and the system log for the concrete cause and fix it accordingly — it's usually a bad kubelet config file, a wrong domain setting, mismatched versions, or ipvs/firewall problems. Once the kubelet starts cleanly, the cluster will generally initialize fine.

[init] Using Kubernetes version: v1.16.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.20.0.15]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-master localhost] and IPs [172.20.0.15 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-master localhost] and IPs [172.20.0.15 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.

Try running it once more:

kubeadm init --kubernetes-version=v1.16.2 --pod-network-cidr=172.20.0.0/16 --apiserver-advertise-address=172.20.0.15 --control-plane-endpoint="172.20.0.15:6443"

The log below says init had already been run; a reset is required before initializing again:

[init] Using Kubernetes version: v1.16.2
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
	[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
	[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
	[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
kubeadm reset 
kubeadm init --kubernetes-version=v1.16.2 --pod-network-cidr=172.20.0.0/16 --apiserver-advertise-address=172.20.0.15 --control-plane-endpoint="172.20.0.15:6443"


journalctl -u kubelet
journalctl -xue kubelet

The commands above show the kubelet log; the system log is /var/log/messages.
To change the kubelet's cgroup driver, add or modify the following line in /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf:
Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=systemd"

Avoiding the pits

By now the logs are full of errors — time to fill in the pits planted earlier. Run the following on all three nodes.
  • Enable filtering of bridged traffic by iptables
    Turning on kernel IPv4 forwarding needs the br_netfilter module, so load it. Loading comes in two flavors, immediate and persistent (via config, effective after reboot); set up both so no reboot is needed.
    First check whether the module is already loaded (lsmod | grep br_netfilter); if it shows up, the next command can be skipped:

modprobe br_netfilter

For the IPv4 forwarding switches, it's best not to edit /etc/sysctl.conf directly — a dedicated file is easier to remove or rebuild later.
To make them permanent, create /etc/sysctl.d/k8s.conf with the following content:

net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1

Apply it:

sysctl -p /etc/sysctl.d/k8s.conf
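Confirm the values took effect:

sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward
# both should report = 1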

Next up is the kube-proxy service proxying mode; here ipvs (IP Virtual Server) is chosen. The errors in the logs just now came from skipping exactly this step.
At this point only nf_conntrack is loaded. Write a standalone file and drop it in the path below, so the modules are also loaded automatically after a reboot:

cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF

Make it executable:

chmod 755 /etc/sysconfig/modules/ipvs.modules

Run it once so it takes effect immediately:

bash /etc/sysconfig/modules/ipvs.modules

Check again that everything is loaded:

 lsmod | grep -e ip_vs -e nf_conntrack

(ip_vs, ip_vs_rr, ip_vs_wrr, ip_vs_sh and nf_conntrack should all show up in the output)

Run this on the master:

 kubeadm config print init-defaults > kubeadm.yaml
  • The init config file, kubeadm.yaml
    The parameters were already annotated in the mine-planting section above.
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: hdp0kg.ab86i3ms07muvkax
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 172.20.0.15
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: k8s-master
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.18.2
networking:
  dnsDomain: cluster.k8stest
  serviceSubnet: 10.10.0.0/16
  podSubnet: 10.18.0.0/16
scheduler: {}
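One note: loading the ipvs modules by itself does not switch kube-proxy into ipvs mode — kube-proxy still defaults to iptables. A sketch of the extra document you could append to kubeadm.yaml if you want ipvs mode explicitly:

cat >> kubeadm.yaml <<EOF
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
EOF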
  • Initialize the cluster
 kubeadm init --config kubeadm.yaml
  • Set up the environment variable
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
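Alternatively, the standard post-init steps that kubeadm itself prints set up a per-user kubeconfig instead:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config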
  • Test it
  • kubectl get nodes
    The master node shows up now; what's still missing is the pod network add-on (flannel here):
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl apply -f kube-flannel.yml
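A caveat on the manifest: the stock kube-flannel.yml hardcodes 10.244.0.0/16 as the pod network in its net-conf.json, while the config above set podSubnet: 10.18.0.0/16. Assuming your copy of the manifest still carries the default, align it before (re-)applying — a one-line sketch:

sed -i 's#10.244.0.0/16#10.18.0.0/16#' kube-flannel.yml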

This takes a while; you can watch the detailed progress of pod creation with:

kubectl get pods -A -o wide
find the name of the flannel pod, then:
kubectl describe pods xxxxxx -n kube-system

Common failures here: image pull timeouts; systemd service connection timeouts (check the system error log and fix whatever it reports — usually a boot-time service failing; when k8s is the cause it's almost always the kubelet, which you can stop manually for the time being and then reboot); out of memory; out of disk space; and so on.
To join the other nodes, run the command below on each of them (the token generated by every cluster is different — don't copy-paste mine):
kubeadm join 172.20.0.15:6443 --token hdp0kg.ab86i3ms07muvkax --discovery-token-ca-cert-hash sha256:c58c4a30124c7c74140ade1bbe1e460552c20ddee07e79e6a197f660e9617111
If you've forgotten the token, run:
kubeadm token list
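If the token has expired (the ttl above is 24h), a fresh join command, complete with the CA cert hash, can be generated in one step:

kubeadm token create --print-join-command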
