Problems encountered while installing a k8s cluster, and how I solved them

Preface

Resumable-transfer mode~ (written in installments)

Notes

I'm using Ubuntu 16.04. The first task is configuring the apt source, and I recommend Alibaba Cloud's mirror: the GPG key lives at https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg, and the CentOS repo is at https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64. If you have a VPN or are overseas, using the Google sources directly is the most convenient and saves a lot of trouble later.
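A minimal sketch of wiring up the Aliyun mirror on Ubuntu 16.04 (the kubernetes-xenial suite name matches the 16.04 codename; the https transport and curl are prerequisites on a fresh box):

sudo apt-get update && sudo apt-get install -y apt-transport-https curl
curl -fsSL https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update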

Installing docker plus kubeadm, kubectl, and kubelet is basically painless; just follow the docs. Once the tools are in place, the next step is downloading the images.
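For completeness, the install itself is one apt line; docker.io is Ubuntu's packaged Docker, so swap in docker-ce if you added Docker's own repository instead:

sudo apt-get install -y docker.io kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl    # optional: stop apt from upgrading them under a running cluster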

Since the images live in Google's registry (k8s.gcr.io), `kubeadm config images pull` fails with:

[preflight/images] You can also perform this action in beforehand using 'kubeadm config images pull'
[preflight] Some fatal errors occurred:
    [ERROR ImagePull]: failed to pull image [k8s.gcr.io/kube-apiserver-amd64:v1.11.3]: exit status

This is obviously the Great Firewall at work; pull the images by hand from their Docker Hub mirrors instead:

docker pull mirrorgooglecontainers/kube-apiserver-amd64:v1.11.3
docker pull mirrorgooglecontainers/kube-controller-manager-amd64:v1.11.3
docker pull mirrorgooglecontainers/kube-scheduler-amd64:v1.11.3
docker pull mirrorgooglecontainers/kube-proxy-amd64:v1.11.3
docker pull mirrorgooglecontainers/pause:3.1
docker pull mirrorgooglecontainers/etcd-amd64:3.2.18
docker pull coredns/coredns:1.1.3

Those are all the images involved, but even after pulling them `kubeadm init` still wouldn't work; it kept trying to pull from Google's registry. I was stuck for quite a while before realizing that the images pulled from Docker Hub are named differently from their counterparts in Google's registry, so `kubeadm init` considers them missing and goes back to Google for them. `docker images` shows the names they currently carry, so re-tag them to the names kubeadm expects:

docker tag docker.io/mirrorgooglecontainers/kube-proxy-amd64:v1.11.3 k8s.gcr.io/kube-proxy-amd64:v1.11.3
docker tag docker.io/mirrorgooglecontainers/kube-scheduler-amd64:v1.11.3 k8s.gcr.io/kube-scheduler-amd64:v1.11.3
docker tag docker.io/mirrorgooglecontainers/kube-apiserver-amd64:v1.11.3 k8s.gcr.io/kube-apiserver-amd64:v1.11.3
docker tag docker.io/mirrorgooglecontainers/kube-controller-manager-amd64:v1.11.3 k8s.gcr.io/kube-controller-manager-amd64:v1.11.3
docker tag docker.io/mirrorgooglecontainers/etcd-amd64:3.2.18 k8s.gcr.io/etcd-amd64:3.2.18
docker tag docker.io/mirrorgooglecontainers/pause:3.1 k8s.gcr.io/pause:3.1
docker tag docker.io/coredns/coredns:1.1.3 k8s.gcr.io/coredns:1.1.3
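If you'd rather not type fourteen commands, here is a small loop that automates the same pull-and-retag dance (the image list mirrors the manual commands above; for other kubeadm versions, take the list from `kubeadm config images list`):

for img in kube-apiserver-amd64:v1.11.3 kube-controller-manager-amd64:v1.11.3 kube-scheduler-amd64:v1.11.3 kube-proxy-amd64:v1.11.3 pause:3.1 etcd-amd64:3.2.18; do
    docker pull "mirrorgooglecontainers/${img}"
    docker tag "mirrorgooglecontainers/${img}" "k8s.gcr.io/${img}"
done
# coredns lives under its own Docker Hub namespace and gets renamed on tag
docker pull coredns/coredns:1.1.3
docker tag coredns/coredns:1.1.3 k8s.gcr.io/coredns:1.1.3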

Next came two more preflight failures: the VM had fewer than 2 CPUs, and a swap partition was enabled. I bumped the CPU count and disabled swap, which fixed both. I searched for why swap has to be off; roughly, it's about not using virtual memory so that performance stays predictable, letting instances be packed onto nodes as close to one hundred percent utilization as possible.
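Disabling swap is two commands; a minimal sketch, assuming the swap entry lives in /etc/fstab as it usually does on Ubuntu 16.04:

sudo swapoff -a                               # turn swap off immediately
sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab   # comment out the fstab entry so it stays off after reboot (keeps a .bak backup)

With both preflight issues fixed, I ran `kubeadm init` again and got this error: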

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.

At first I assumed the kubelet version was incompatible with the apiserver, so I reinstalled the tools, only to find the ports were already occupied: the earlier master start-up had claimed 6443. `kubeadm reset` sorted that out.
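To confirm what's squatting on the port before resetting, something like this works (6443 is the apiserver's default secure port; `ss -lntp` is the modern equivalent if netstat isn't installed):

sudo netstat -lntp | grep 6443
sudo kubeadm reset    # wipe the half-initialized control plane and start over

But in the middle of all this I also, rather stupidly, changed the machine's IP -_-!, after which the same error as above came back, and the kubelet journal kept repeating: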

1006 02:44:41.050125   19805 reflector.go:123] k8s.io/client-go/informers/factor
1006 02:44:41.129531   19805 kubelet.go:2267] node "ubuntu" not found
1006 02:44:41.230347   19805 kubelet.go:2267] node "ubuntu" not found
1006 02:44:41.331174   19805 kubelet.go:2267] node "ubuntu" not found
1006 02:44:41.431984   19805 kubelet.go:2267] node "ubuntu" not found
1006 02:44:41.532748   19805 kubelet.go:2267] node "ubuntu" not found
1006 02:44:41.596687   19805 controller.go:135] failed to ensure node lease exist
1006 02:44:41.633573   19805 kubelet.go:2267] node "ubuntu" not found
1006 02:44:41.734381   19805 kubelet.go:2267] node "ubuntu" not found
1006 02:44:41.745881   19805 kubelet_node_status.go:94] Unable to register node with apiserver ~~
1006 02:44:41.835220   19805 kubelet.go:2267] node "ubuntu" not found
1006 02:44:41.936016   19805 kubelet.go:2267] node "ubuntu" not found

The address change meant the old IP baked into the configuration no longer matched the node. The simplest fix is once again `kubeadm reset`, but the recommended route is to edit the conf files: replace the old address with the new one, restart the kubelet service, and the problem is solved.
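A hedged sketch of that conf edit; OLD_IP and NEW_IP are placeholders, and the files touched here are the kubeconfigs under /etc/kubernetes that embed the apiserver address (depending on how far init got, the static-pod manifests and certificates may also reference the old IP, in which case a reset is the cleaner path):

OLD_IP=192.168.1.10    # placeholder: the address the cluster was initialized with
NEW_IP=192.168.1.20    # placeholder: the machine's new address
sudo sed -i "s/${OLD_IP}/${NEW_IP}/g" /etc/kubernetes/*.conf
sudo systemctl restart kubelet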


Since the entire install ran as root, one more step remains: copy the admin kubeconfig into the home directory and fix its ownership. With that done, Kubernetes on the master node is successfully installed. Be sure to record the token printed on success; it's needed later when adding nodes:
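These are the standard post-init commands that kubeadm itself prints on success:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config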

(screenshot: `kubeadm init` success output, including the join token)
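For reference, that token is consumed by a join command of roughly this shape on each worker node (all three values are placeholders copied from the init output):

kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>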

Bus bandwidth limit reached for today; the rest will resume in the next installment.
