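An earlier attempt to install Kubernetes 1.5.2 failed because of a Docker package conflict. While looking into how newer versions are installed, I found that newer Kubernetes releases no longer bundle Docker; users are expected to install the Docker service themselves first.
Docker version 17.12.0-ce, build c97c6d6 was already installed on the machine.
Installing kubernetes (kubernetes.x86_64 1.5.2-0.7.git269f928.el7) on top of it then failed:
Error: docker-ce conflicts with 2:docker-1.12.6-71.git3e8e77d.el7.centos.1.x86_64
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
Suspecting a version problem, I searched online for how to install a newer version and found this:
“After Kubernetes 1.6, however, installation became rather cumbersome: it needs certificates and all kinds of authentication, which is unfriendly to newcomers. If you follow the official docs to install a local "cluster", I'm sure you won't get it running unless you get past the GFW and know how to keep tweaking the parameters.”
In other words, installing k8s 1.6 and later may differ considerably from earlier versions. Since Google is blocked, many Docker images have to be downloaded in advance.
Following the three articles below to install k8s 1.7.5 failed because the required Docker images were missing.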
https://www.cnblogs.com/liangDream/p/7358847.html
http://www.bubuko.com/infodetail-2375091.html
https://www.kubernetes.org.cn/3063.html
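One common workaround when gcr.io is unreachable (not necessarily what the linked articles do) is to pull each image from a reachable mirror and retag it under the gcr.io/google_containers name that kubeadm looks for. A minimal sketch, assuming a hypothetical mirror repository; setting DRY_RUN=1 prints the docker commands instead of running them:

```shell
# Target name kubeadm expects; override TARGET_REPO if your version differs.
TARGET_REPO=${TARGET_REPO:-gcr.io/google_containers}

# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# mirror_pull MIRROR IMAGE...   (IMAGE is name:tag, e.g. pause-amd64:3.0)
# Pulls each image from the (hypothetical) mirror, then tags it with the gcr.io name.
mirror_pull() {
    mirror=$1
    shift
    for image in "$@"; do
        run docker pull "$mirror/$image" || return 1
        run docker tag "$mirror/$image" "$TARGET_REPO/$image" || return 1
    done
}

# Example (registry.example.com is a placeholder mirror):
# mirror_pull registry.example.com/google_containers \
#     kube-apiserver-amd64:v1.9.0 kube-controller-manager-amd64:v1.9.0 \
#     kube-scheduler-amd64:v1.9.0 kube-proxy-amd64:v1.9.0
```

The full image list (etcd, pause and the kube-dns images as well) depends on the kubeadm version, so check it against the version actually being installed before relying on it.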
[root@tensorflow0 hdzhou]# yum remove docker \
docker-common \
docker-selinux \
docker-engine
======================================================================================================================================================================================
Package Arch Version Repository Size
======================================================================================================================================================================================
Removing:
container-selinux noarch 2:2.36-1.gitff95335.el7 @extras 34 k
Removing for dependencies:
docker-ce x86_64 17.12.0.ce-1.el7.centos installed 123 M
nvidia-docker2 noarch 2.0.2-1.docker17.12.0.ce @nvidia-docker 2.3 k
Transaction Summary
======================================================================================================================================================================================
Remove 1 Package (+2 Dependent packages)
Feb 26 16:42:00 tensorflow0 dockerd[8717]: time="2018-02-26T16:42:00.315096986+08:00" level=info msg="libcontainerd: new containerd process, pid: 8725"
Feb 26 16:42:01 tensorflow0 dockerd[8717]: time="2018-02-26T16:42:01.319051277+08:00" level=error msg="[graphdriver] prior storage driver overlay2 failed: driver not supported"
Feb 26 16:42:01 tensorflow0 dockerd[8717]: Error starting daemon: error initializing graphdriver: driver not supported
Feb 26 16:42:01 tensorflow0 systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Feb 26 16:42:01 tensorflow0 systemd[1]: Failed to start Docker Application Container Engine.
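The workaround used here was to move the old Docker data directory (created with the overlay2 driver) aside, so dockerd could initialize a fresh one:
sudo mv /var/lib/docker /var/lib/docker.old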
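The overlay2 driver needs overlay filesystem support in the kernel (and, on XFS, a filesystem created with ftype=1). A small sketch that checks the first condition; the optional file argument exists only so the function can be tested against a sample file:

```shell
# overlay_supported [FS_FILE]: succeed if the kernel lists the overlay filesystem.
# FS_FILE defaults to /proc/filesystems, whose lines look like "nodev<TAB>overlay".
overlay_supported() {
    awk '$NF == "overlay" { found = 1 } END { exit (found ? 0 : 1) }' "${1:-/proc/filesystems}"
}

# Example:
# overlay_supported || echo "overlay not registered; try: modprobe overlay"
```

/proc/filesystems only lists filesystem types the kernel has currently registered, so a negative result may just mean the overlay module is not loaded yet.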
Install the Kubernetes 1.9.0 RPMs, dependencies first:
rpm -ivh socat-1.7.3.2-2.el7.x86_64.rpm
rpm -ivh kubernetes-cni-0.6.0-0.x86_64.rpm kubelet-1.9.0-0.x86_64.rpm
rpm -ivh kubectl-1.9.0-0.x86_64.rpm
rpm -ivh kubeadm-1.9.0-0.x86_64.rpm
To uninstall, pass rpm -e the installed package name (not the .rpm file name):
rpm -e <package-name> --nodeps
rpm -e socat-1.7.3.2-2.el7.x86_64 --nodeps
rpm -e kubernetes-cni-0.6.0-0.x86_64 --nodeps
rpm -e kubelet-1.9.0-0.x86_64 --nodeps
rpm -e kubectl-1.9.0-0.x86_64 --nodeps
rpm -e kubeadm-1.9.0-0.x86_64 --nodeps
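The removal steps can also be wrapped in a loop that uninstalls the packages in the reverse order of installation. A sketch; it uses bare package names, which rpm -e accepts, and DRY_RUN=1 prints the commands instead of running them:

```shell
# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Remove the Kubernetes RPMs, last-installed first.
# --nodeps skips dependency checks, as in the commands above.
uninstall_k8s_rpms() {
    for pkg in kubeadm kubectl kubelet kubernetes-cni socat; do
        run rpm -e --nodeps "$pkg"
    done
}
```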
cat /var/log/messages
journalctl -xeu kubelet
kubeadm init --kubernetes-version=v1.9.0 --pod-network-cidr=10.244.0.0/16
kubeadm join --token 5ce44e.47b6dc4e4b66980f 192.168.1.138:6443 --discovery-token-ca-cert-hash sha256:9d7eac82d66744405c783de5403e1f2bb7191b4c1b350d721b7b8570c62ff83a
kubeadm token list
kubeadm token create
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
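The token and the CA hash combine into the join command. A sketch that computes the hash with the same openssl pipeline (adding -noout so only the public key is passed on, and assuming an RSA CA key, which is what kubeadm generates) and prints the resulting join command:

```shell
# ca_cert_hash [CA_FILE]: print the --discovery-token-ca-cert-hash value
# ("sha256:<hex>") for the cluster CA certificate.
ca_cert_hash() {
    ca_file=${1:-/etc/kubernetes/pki/ca.crt}
    [ -r "$ca_file" ] || return 1
    hex=$(openssl x509 -pubkey -noout -in "$ca_file" \
        | openssl rsa -pubin -outform der 2>/dev/null \
        | openssl dgst -sha256 -hex \
        | sed 's/^.* //')
    [ -n "$hex" ] || return 1
    printf 'sha256:%s\n' "$hex"
}

# join_command TOKEN HOST:PORT [CA_FILE]
# TOKEN comes from "kubeadm token list" or "kubeadm token create".
join_command() {
    cert_hash=$(ca_cert_hash "${3:-}") || return 1
    printf 'kubeadm join --token %s %s --discovery-token-ca-cert-hash %s\n' \
        "$1" "$2" "$cert_hash"
}

# Example, run on the master:
# join_command 5ce44e.47b6dc4e4b66980f 192.168.1.138:6443
```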
[root@tensorflow0 etc]# kubeadm init --kubernetes-version=v1.9.0 --pod-network-cidr=10.244.0.0/16
[init] Using Kubernetes version: v1.9.0
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks.
[WARNING FileExisting-crictl]: crictl not found in system path
[preflight] Some fatal errors occurred:
[ERROR Swap]: running with swap on is not supported. Please disable swap
[ERROR Port-2379]: Port 2379 is in use
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
[root@tensorflow0 etc]#
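Port 2379 is etcd's client port, so together with the non-empty /var/lib/etcd this suggests that an etcd from an earlier kubeadm init is still running (kubeadm reset should clean up both). A sketch that checks whether a port is taken by parsing "ss -ltn" output from stdin:

```shell
# port_listening PORT < ss-output
# Succeed if "ss -ltn" output shows a listener on PORT.
# Column 4 is the local address, e.g. 127.0.0.1:2379 or [::]:2379.
port_listening() {
    awk -v p=":$1" '
        NR > 1 && substr($4, length($4) - length(p) + 1) == p { found = 1 }
        END { exit (found ? 0 : 1) }'
}

# Example:
# ss -ltn | port_listening 2379 && echo "port 2379 is busy (leftover etcd?)"
```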
kubectl get pod kube-proxy-d2p7p -o wide --namespace=kube-system
kubectl describe pod kube-proxy-d2p7p --namespace=kube-system
Note: keep an eye on /var/log/messages; you will see kubelet failing to start over and over.
[preflight] Running pre-flight checks.
[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
[WARNING FileExisting-crictl]: crictl not found in system path
[preflight] Some fatal errors occurred:
[ERROR Swap]: running with swap on is not supported. Please disable swap
Disable swap:
swapoff -a
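swapoff -a only lasts until the next reboot, after which kubelet refuses to start again. A sketch of two helpers: one checks /proc/swaps for active swap, the other comments out swap entries in /etc/fstab (keeping a .bak copy). The paths default to the real files and can be overridden for testing:

```shell
# swap_active [SWAPS_FILE]: succeed if any swap area is active.
# /proc/swaps has one header line plus one line per active swap area.
swap_active() {
    [ "$(awk 'END { print NR }' "${1:-/proc/swaps}")" -gt 1 ]
}

# disable_swap_in_fstab [FSTAB]: comment out uncommented entries whose
# filesystem type (3rd field) is swap, so swap stays off after a reboot.
disable_swap_in_fstab() {
    fstab=${1:-/etc/fstab}
    awk '!/^[ \t]*#/ && $3 == "swap" { print "#" $0; next } { print }' \
        "$fstab" > "$fstab.tmp" || return 1
    cp "$fstab" "$fstab.bak" || return 1
    # Write back through the existing file so its inode and permissions are kept.
    cat "$fstab.tmp" > "$fstab" && rm -f "$fstab.tmp"
}

# Example:
# swapoff -a && disable_swap_in_fstab && ! swap_active
```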
E1216 23:50:16.116098 28152 pod_workers.go:186] Error syncing pod 6f5b9673-e2b5-11e7-a0f5-001e67d35991 ("kube-dns-6f4fd4bdf-xrj4w_kube-system(6f5b9673-e2b5-11e7-a0f5-001e67d35991)"), skipping: failed to "CreatePodSandbox" for "kube-dns-6f4fd4bdf-xrj4w_kube-system(6f5b9673-e2b5-11e7-a0f5-001e67d35991)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-6f4fd4bdf-xrj4w_kube-system(6f5b9673-e2b5-11e7-a0f5-001e67d35991)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"kube-dns-6f4fd4bdf-xrj4w_kube-system\" network: failed to allocate for range 0: no IP addresses available in range set: 10.244.0.1-10.244.0.254"
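Fix: reset kubeadm and clear the leftover flannel/CNI state (ip link delete takes one device per call):
kubeadm reset
rm -rf /var/lib/cni/flannel/*
rm -rf /var/lib/cni/networks/cbr0/*
ip link delete cni0
ip link delete flannel.1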
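The "no IP addresses available" error means the host-local IPAM plugin has no free address left in the node's range. host-local records each reservation as a file named after the IP under /var/lib/cni/networks/<network>/ (cbr0 here), so addresses leaked by earlier runs pile up there. A sketch that counts them, to compare with the range in the error (10.244.0.1-10.244.0.254) and with the pods actually running on the node:

```shell
# count_reserved_ips [DIR]: count the addresses host-local IPAM has reserved.
# Each reservation is a file named after the IP; bookkeeping files such as
# last_reserved_ip.0 and lock are skipped because they contain other characters.
count_reserved_ips() {
    dir=${1:-/var/lib/cni/networks/cbr0}
    n=0
    for f in "$dir"/*; do
        name=${f##*/}
        case $name in
            *[!0-9.]* | '') ;;          # not an IPv4 address: skip
            *.*.*.*) n=$((n + 1)) ;;   # looks like a.b.c.d
        esac
    done
    echo "$n"
}
```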
default po/httpd-68f9d7648d-5f9gt 0/1 ContainerCreating 0 1m tensorflow0
Warning FailedCreatePodSandBox 20s (x12 over 54s) kubelet, tensorflow0 Failed create pod sandbox.
Normal SandboxChanged 20s (x12 over 53s) kubelet, tensorflow0 Pod sandbox changed, it will be killed and re-created.
Error while adding to cni network: failed to set bridge addr: "cni0" already has an IP address different from 10.244.2.1/24
Error while adding to cni network: failed to allocate for range 0: no IP addresses available in range set: 10.244.2.1-10.244.2.254
[root@tensorflow0 ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Thu 2018-03-22 14:49:29 CST; 4min 12s ago
Docs: http://kubernetes.io/docs/
Main PID: 3873 (kubelet)
Memory: 45.0M
CGroup: /system.slice/kubelet.service
├─ 3873 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --feature-gates=DevicePlugins=true --pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --network-plugin=cni -...
├─11665 /opt/cni/bin/flannel
└─11670 /opt/cni/bin/bridge
Mar 22 14:53:35 tensorflow0 kubelet[3873]: E0322 14:53:35.990200 3873 kuberuntime_manager.go:647] createPodSandbox for pod "httpd-68f9d7648d-5f9gt_default(39f66066-2d9d-11e8-bf17-98eecb73f4db)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to ...
Mar 22 14:53:35 tensorflow0 kubelet[3873]: E0322 14:53:35.990287 3873 pod_workers.go:186] Error syncing pod 39f66066-2d9d-11e8-bf17-98eecb73f4db ("httpd-68f9d7648d-5f9gt_default(39f66066-2d9d-11e8-bf17-98eecb73f4db)"), skipping: failed to "Cre...bf17-98eecb73f4db)"
Mar 22 14:53:37 tensorflow0 kubelet[3873]: W0322 14:53:37.041536 3873 pod_container_deletor.go:77] Container "73c43b8766686c64d31bdd0533604d1d349ebe08f95d7463d23ebdffe377113e" not found in pod's containers
Mar 22 14:53:39 tensorflow0 kubelet[3873]: E0322 14:53:39.621047 3873 cni.go:259] Error adding network: failed to set bridge addr: "cni0" already has an IP address different from 10.244.2.1/24
Mar 22 14:53:39 tensorflow0 kubelet[3873]: E0322 14:53:39.621083 3873 cni.go:227] Error while adding to cni network: failed to set bridge addr: "cni0" already has an IP address different from 10.244.2.1/24
Mar 22 14:53:39 tensorflow0 kubelet[3873]: E0322 14:53:39.809286 3873 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "httpd-68f9d7648d-5f9gt_default" net...t from 10.244.2.1/24
Mar 22 14:53:39 tensorflow0 kubelet[3873]: E0322 14:53:39.809337 3873 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "httpd-68f9d7648d-5f9gt_default(39f66066-2d9d-11e8-bf17-98eecb73f4db)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to s...
Mar 22 14:53:39 tensorflow0 kubelet[3873]: E0322 14:53:39.809360 3873 kuberuntime_manager.go:647] createPodSandbox for pod "httpd-68f9d7648d-5f9gt_default(39f66066-2d9d-11e8-bf17-98eecb73f4db)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to ...
Mar 22 14:53:39 tensorflow0 kubelet[3873]: E0322 14:53:39.809424 3873 pod_workers.go:186] Error syncing pod 39f66066-2d9d-11e8-bf17-98eecb73f4db ("httpd-68f9d7648d-5f9gt_default(39f66066-2d9d-11e8-bf17-98eecb73f4db)"), skipping: failed to "Cre...bf17-98eecb73f4db)"
Mar 22 14:53:40 tensorflow0 kubelet[3873]: W0322 14:53:40.063548 3873 pod_container_deletor.go:77] Container "f1b063e5245c7a5c8527d1426858781c6554bcb06d987c7f472cfd0c41290110" not found in pod's containers
Hint: Some lines were ellipsized, use -l to show in full.
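"cni0 already has an IP address different from 10.244.2.1/24" means the cni0 bridge still carries an address from an earlier flannel subnet, while flannel has since given this node 10.244.2.1/24. flannel records the node's current subnet in /run/flannel/subnet.env as FLANNEL_SUBNET; a sketch that compares it with cni0's IPv4 address. When they differ, deleting cni0 (as done next) lets the bridge plugin recreate it with the right address:

```shell
# flannel_subnet [SUBNET_ENV]: print this node's flannel subnet, e.g. 10.244.2.1/24.
flannel_subnet() {
    sed -n 's/^FLANNEL_SUBNET=//p' "${1:-/run/flannel/subnet.env}"
}

# cni0_address: print cni0's IPv4 address with prefix, or nothing if cni0 is absent.
cni0_address() {
    ip -4 -o addr show dev cni0 2>/dev/null | awk '{ print $4; exit }'
}

# cni0_mismatch [SUBNET_ENV] [ADDR]: succeed if cni0 has an address that differs
# from flannel's subnet. ADDR defaults to the live cni0 address.
cni0_mismatch() {
    want=$(flannel_subnet "${1:-}")
    have=${2:-$(cni0_address)}
    [ -n "$have" ] && [ "$have" != "$want" ]
}

# Example:
# cni0_mismatch && echo "cni0 is stale: $(cni0_address) vs $(flannel_subnet)"
```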
rm -rf /var/lib/cni/flannel/* && rm -rf /var/lib/cni/networks/cbr0/* && ip link delete cni0
rm -rf /var/lib/cni/networks/cni0/*
[preflight] Running pre-flight checks.
[WARNING FileExisting-crictl]: crictl not found in system path
[preflight] Some fatal errors occurred:
[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
echo 'net.bridge.bridge-nf-call-iptables=1' >> /etc/sysctl.conf
sysctl -p
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
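Appending with "echo ... >> /etc/sysctl.conf" adds a duplicate line every time the command is rerun. A sketch of an idempotent alternative that keeps exactly one line per key in a sysctl config file; the /etc/sysctl.d/k8s.conf name in the example is an assumption. The net.bridge.* keys only exist while the br_netfilter module is loaded, hence the modprobe:

```shell
# set_sysctl_conf FILE KEY VALUE: make FILE contain exactly one "KEY = VALUE" line.
# Existing lines for KEY are dropped before the new one is appended,
# so the function can be rerun safely.
set_sysctl_conf() {
    conf=$1 key=$2 value=$3
    touch "$conf" || return 1
    # Keep every line except existing settings of this key (dots in KEY match any char).
    grep -v "^[[:space:]]*$key[[:space:]]*=" "$conf" > "$conf.tmp"
    printf '%s = %s\n' "$key" "$value" >> "$conf.tmp"
    cat "$conf.tmp" > "$conf" && rm -f "$conf.tmp"
}

# Example:
# modprobe br_netfilter
# set_sysctl_conf /etc/sysctl.d/k8s.conf net.bridge.bridge-nf-call-iptables 1
# set_sysctl_conf /etc/sysctl.d/k8s.conf net.bridge.bridge-nf-call-ip6tables 1
# set_sysctl_conf /etc/sysctl.d/k8s.conf net.ipv4.ip_forward 1
# sysctl --system
```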
[root@tensorflow1 influxdb]# kubectl get all -o wide -n kube-system
error: {batch cronjobs} matches multiple kinds [batch/v1beta1, Kind=CronJob batch/v2alpha1, Kind=CronJob]
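The error appears to come from "all" expanding to cronjobs, which this server serves under two batch API versions (v1beta1 and v2alpha1), so kubectl cannot pick one. Naming the resource types explicitly sidesteps it. A sketch; the type list is an assumption, and DRY_RUN=1 prints the command instead of running it:

```shell
# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# kube_system_overview [NAMESPACE]: list the main workload types explicitly
# instead of "all", which trips over the ambiguous cronjobs resource.
kube_system_overview() {
    run kubectl get pods,services,deployments,daemonsets -o wide -n "${1:-kube-system}"
}
```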
Starting nvidia-device-plugin-daemonset failed:
Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:337: running prestart hook 1 caused \"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig --device=all --utility --pid=12545 /data1/docker/overlay/10be1d599f91da020b7bfced8058533bb6129b637871ea61e0547ecb8758b3a2/merged]\\n*** Error in `/usr/bin/nvidia-container-cli': double free or corruption (!prev): 0x000055c6961daa10 ***\\n======= Backtrace: =========\\n/lib64/libc.so.6(+0x7c619)[0x7f5aa0af0619]\\n/usr/lib64/nvidia/libcuda.so.1(+0x2edd7c)[0x7f5a9fb77d7c]\\n/usr/lib64/nvidia/libcuda.so.1(+0x2eddc3)[0x7f5a9fb77dc3]\\n/usr/lib64/nvidia/libcuda.so.1
It turned out the GPU was already in use. After cleaning it up first, the plugin started without problems.
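A sketch for finding what holds the GPUs before restarting the device plugin: it extracts PIDs from the CSV output of nvidia-smi's compute-apps query. Which of those processes are safe to stop is a judgment call; this only lists them:

```shell
# gpu_pids: read nvidia-smi compute-app CSV lines ("PID, NAME") on stdin
# and print the PIDs, one per line. Non-numeric first fields are ignored.
gpu_pids() {
    awk -F', *' '$1 ~ /^[0-9]+$/ { print $1 }'
}

# Example:
# nvidia-smi --query-compute-apps=pid,process_name --format=csv,noheader | gpu_pids
```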