Installing the Latest Version of Kubernetes: Process and Caveats

Written on 2019-02-06, the second day of the Chinese New Year, in the Jihai Year of the Pig
The latest version at the time of writing is v1.13.3

During a JD.com sale in June 2018 I bought a copy of the book Kubernetes 权威指南 (The Definitive Guide to Kubernetes) but never found time to read it, so the Spring Festival holiday was a good opportunity to study it. The book is based on v1.6.0 from 2017; since I wanted to use the latest version instead, I kept this record of the process.

Although I had the book, the whole process drew on many other resources, primarily the official kubeadm documentation:

  • https://kubernetes.io/docs/setup/independent/install-kubeadm/
  • https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/

System firewall and SELinux configuration

Disable the firewall at boot:
systemctl disable firewalld
Stop the firewall now:
systemctl stop firewalld
Put SELinux into permissive mode for the current session, so that containers can read the host filesystem:
setenforce 0
Make the SELinux change permanent by editing its config:
vi /etc/sysconfig/selinux
Change SELINUX to disabled:
SELINUX=disabled
#SELINUX=enforcing
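
If you prefer not to edit the file by hand, a one-line sed does the same thing (my own shortcut, assuming the line currently reads SELINUX=enforcing; adjust the pattern otherwise):

# replace the enforcing setting with disabled, in place
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/sysconfig/selinux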

In the first document above, at the step for installing kubeadm, kubelet and kubectl, all the addresses in the provided script are under the https://packages.cloud.google.com domain, which is blocked and therefore unusable in China. We can use the kubernetes repository provided by the Alibaba open-source mirror site instead.

1. Install kubelet, kubeadm and kubectl

Use the Alibaba open-source mirror site:

https://opsx.alibaba.com/mirror

Find kubernetes in the list and click Help; the following instructions are shown.

I am using the latest CentOS 7 (see my post 'VMware 虚拟机 最小化安装 CentOS 7 的 IP 配置' on configuring the IP of a minimal CentOS 7 install in a VMware VM).

Debian / Ubuntu

apt-get update && apt-get install -y apt-transport-https
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add - 
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF  
apt-get update
apt-get install -y kubelet kubeadm kubectl
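
If you want to pin the exact version used in this post rather than whatever is latest, apt supports that (a sketch of mine; the -00 suffix follows these packages' naming convention, which you can verify with apt-cache madison kubeadm):

apt-get install -y kubelet=1.13.3-00 kubeadm=1.13.3-00 kubectl=1.13.3-00
# prevent accidental upgrades during later apt-get upgrade runs
apt-mark hold kubelet kubeadm kubectl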

CentOS / RHEL / Fedora

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
setenforce 0
yum install -y kubelet kubeadm kubectl
systemctl enable kubelet && systemctl start kubelet
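
Similarly, yum can pin the version (again a sketch; check the exact release strings available with yum list --showduplicates kubeadm):

yum install -y kubelet-1.13.3 kubeadm-1.13.3 kubectl-1.13.3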

Docker CE

You can also search for docker-ce on the Alibaba mirror site; its Help page points to this article:

https://yq.aliyun.com/articles/110806

Important: this post uses Kubernetes v1.13.3. Because I installed the latest Docker via the official install script, kubeadm init later printed this warning:

[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.0. Latest validated version: 18.06

Installing Docker 18.06 instead removes the warning; the article above also explains how to install a specific version (a sketch follows below).
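
On CentOS that looks roughly like this (a sketch; the exact release string offered by the repo may differ, so list the candidates first):

# list the versions the repo actually provides
yum list docker-ce --showduplicates | sort -r
# install the validated 18.06 line, for example:
yum install -y docker-ce-18.06.1.ce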

Disable the firewall
Check the firewall status: firewall-cmd --state
Stop the firewall: systemctl stop firewalld.service
Disable it at boot: systemctl disable firewalld.service
Reference: https://www.linuxidc.com/Linux/2016-12/138979.htm

2. Docker registry mirrors

Pulling Docker images from inside China is slow, so registry mirrors can speed things up.

Edit (or create) /etc/docker/daemon.json and add the following:

{
  "registry-mirrors": [
    "https://dockerhub.azk8s.cn",
    "https://reg-mirror.qiniu.com"
  ]
}
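
Docker has to be restarted for the change to take effect; afterwards the mirrors can be verified (this assumes systemd manages Docker, as it does on CentOS 7):

systemctl daemon-reload
systemctl restart docker
# the configured mirrors should be listed under "Registry Mirrors:"
docker info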

Reference: https://yeasy.gitbooks.io/docker_practice/install/mirror.html

3. Run kubeadm init

Running this command surfaces quite a few problems; all of them are listed here.

When run without arguments, kubeadm init first requests https://dl.k8s.io/release/stable-1.txt to discover the latest stable version number. That URL redirects to https://storage.googleapis.com/kubernetes-release/release/stable-1.txt , which returned v1.13.3 at the time of writing. The redirect target is blocked, so the lookup fails; to avoid it entirely, specify the version explicitly:

kubeadm init --kubernetes-version=v1.13.3

The following errors may occur when running it.

3.1 ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables

Problem:

[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1

Solution:

echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
echo 1 > /proc/sys/net/bridge/bridge-nf-call-ip6tables
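
Writing to /proc only lasts until the next reboot. To persist the setting, it can go into a sysctl.d drop-in (a sketch; the file name is my own choice):

cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
# reload all sysctl configuration files
sysctl --system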

3.2 ERROR Swap

Problem:

[ERROR Swap]: running with swap on is not supported. Please disable swap

Solution: disable the swap partition:

# turn off swap for the current session
sudo swapoff -a
# to disable swap permanently, open the file below and comment out the swap line
sudo vi /etc/fstab
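
If you would rather not edit /etc/fstab by hand, this one-liner (my own shortcut; it comments out every line containing a swap field, so review the file afterwards) does the same:

sudo sed -ri '/\sswap\s/s/^/#/' /etc/fstab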

3.3 Images cannot be pulled

Problem (output trimmed):

error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-apiserver:v1.13.3
	[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-controller-manager:v1.13.3
	[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-scheduler:v1.13.3
	[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-proxy:v1.13.3
	[ERROR ImagePull]: failed to pull image k8s.gcr.io/pause:3.1
	[ERROR ImagePull]: failed to pull image k8s.gcr.io/etcd:3.2.24
	[ERROR ImagePull]: failed to pull image k8s.gcr.io/coredns:1.2.6

gcr.io is blocked, so the images cannot be pulled directly; we can fetch them through another channel first and then rerun the command.

4. Pre-pull the images

The error messages above tell us exactly which images are needed. At the moment, the Docker Hub user mirrorgooglecontainers mirrors all of the latest k8s images, so we can pull from there and simply retag.

docker pull mirrorgooglecontainers/kube-apiserver:v1.13.3
docker pull mirrorgooglecontainers/kube-controller-manager:v1.13.3
docker pull mirrorgooglecontainers/kube-scheduler:v1.13.3
docker pull mirrorgooglecontainers/kube-proxy:v1.13.3
docker pull mirrorgooglecontainers/pause:3.1
docker pull mirrorgooglecontainers/etcd:3.2.24
docker pull coredns/coredns:1.2.6

Once the pulls finish, docker images shows the following:

REPOSITORY                                       TAG                 IMAGE ID            CREATED             SIZE
mirrorgooglecontainers/kube-apiserver            v1.13.3             fe242e556a99        5 days ago          181MB
mirrorgooglecontainers/kube-controller-manager   v1.13.3             0482f6400933        5 days ago          146MB
mirrorgooglecontainers/kube-proxy                v1.13.3             98db19758ad4        5 days ago          80.3MB
mirrorgooglecontainers/kube-scheduler            v1.13.3             3a6f709e97a0        5 days ago          79.6MB
coredns/coredns                                  1.2.6               f59dcacceff4        3 months ago        40MB
mirrorgooglecontainers/etcd                      3.2.24              3cab8e1b9802        4 months ago        220MB
mirrorgooglecontainers/pause                     3.1                 da86e6ba6ca1        13 months ago       742kB

Retag each image to the k8s.gcr.io name that kubeadm expects:

docker tag mirrorgooglecontainers/kube-apiserver:v1.13.3 k8s.gcr.io/kube-apiserver:v1.13.3
docker tag mirrorgooglecontainers/kube-controller-manager:v1.13.3 k8s.gcr.io/kube-controller-manager:v1.13.3
docker tag mirrorgooglecontainers/kube-scheduler:v1.13.3 k8s.gcr.io/kube-scheduler:v1.13.3
docker tag mirrorgooglecontainers/kube-proxy:v1.13.3 k8s.gcr.io/kube-proxy:v1.13.3
docker tag mirrorgooglecontainers/pause:3.1 k8s.gcr.io/pause:3.1
docker tag mirrorgooglecontainers/etcd:3.2.24 k8s.gcr.io/etcd:3.2.24
docker tag coredns/coredns:1.2.6 k8s.gcr.io/coredns:1.2.6
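
The pulling and retagging can also be done in one loop (a sketch of mine; it relies on the mirror repositories using the same image names as k8s.gcr.io, which holds for everything above except coredns):

images=(
  kube-apiserver:v1.13.3
  kube-controller-manager:v1.13.3
  kube-scheduler:v1.13.3
  kube-proxy:v1.13.3
  pause:3.1
  etcd:3.2.24
)
for img in "${images[@]}"; do
  docker pull "mirrorgooglecontainers/${img}"
  docker tag "mirrorgooglecontainers/${img}" "k8s.gcr.io/${img}"
done
# coredns lives under its own Docker Hub organization
docker pull coredns/coredns:1.2.6
docker tag coredns/coredns:1.2.6 k8s.gcr.io/coredns:1.2.6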

With the images in place, rerun the earlier command:

kubeadm init --kubernetes-version=v1.13.3

5. Successful installation

The command's output is the log below (keep this log somewhere safe):

[root@k8s-master ~]# kubeadm init --kubernetes-version=v1.13.3
[init] Using Kubernetes version: v1.13.3
[preflight] Running pre-flight checks
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.0. Latest validated version: 18.06
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-master localhost] and IPs [192.168.200.131 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-master localhost] and IPs [192.168.200.131 127.0.0.1 ::1]
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.200.131]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 21.507393 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.13" in namespace kube-system with the configuration for the kubelets in the cluster
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "k8s-master" as an annotation
[mark-control-plane] Marking the node k8s-master as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node k8s-master as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 7j01ut.pbdh60q732m1kd4v
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join 192.168.200.131:6443 --token 7j01ut.pbdh60q732m1kd4v --discovery-token-ca-cert-hash sha256:de1dc033ae5cc27607b0f271655dd884c0bf6efb458957133dd9f50681fa2723

6. Follow-up steps from the output above (part 1)

The output asks you to run the following:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Only with this configuration in place can you subsequently run kubectl commands.
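
Alternatively, if you are operating as root (as in this walkthrough), the kubeadm documentation also permits simply exporting the admin kubeconfig, though this only lasts for the current shell session:

export KUBECONFIG=/etc/kubernetes/admin.conf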

If the system has been rebooted since, running kubectl may fail with something like this:

[root@k8s-master ~]# kubectl get pods
The connection to the server 192.168.200.131:6443 was refused - did you specify the right host or port?

I am not sure of the exact mechanism, but I traced the root cause to swap: if it was not permanently disabled as in section 3.2, swap comes back after a reboot and k8s then fails with the error above. For that reason I recommend disabling it permanently right away:

# disable swap permanently: open the file below and comment out the swap line
sudo vi /etc/fstab

7. Follow-up steps from the output above (part 2)

The output asks you to do the following:

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Following the Definitive Guide to Kubernetes (the book targets v1.6), I install the Weave Net add-on here; its documentation:

https://www.weave.works/docs/net/latest/kubernetes/kube-addon/

Per the documentation, run this command:

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

It produces the following output:

serviceaccount/weave-net created
clusterrole.rbac.authorization.k8s.io/weave-net created
clusterrolebinding.rbac.authorization.k8s.io/weave-net created
role.rbac.authorization.k8s.io/weave-net created
rolebinding.rbac.authorization.k8s.io/weave-net created
daemonset.extensions/weave-net created
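
Before moving on, you can watch the weave-net pod come up in the kube-system namespace (press Ctrl-C to stop watching):

kubectl get pods -n kube-system -w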

8. Follow-up steps from the output above (part 3)

The output asks you to run the following as root on each worker node:

kubeadm join 192.168.200.131:6443 --token 7j01ut.pbdh60q732m1kd4v --discovery-token-ca-cert-hash sha256:de1dc033ae5cc27607b0f271655dd884c0bf6efb458957133dd9f50681fa2723

Note: do not copy the command printed here; use the one from your own install output.
The token in this command is typically valid for only 24 hours. If it has expired, the page below explains how to obtain a fresh one (a convenient shortcut is also shown after it):
https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#join-nodes
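
A quick way to mint a new token together with the complete join command is this kubeadm subcommand, run on the master:

kubeadm token create --print-join-command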

If you are building a cluster out of multiple hosts, run the command above on each of the other hosts to join them to the cluster.

Since my setup here is only for experiments, I use the single-machine cluster approach.

9. Single-machine cluster

See the official documentation: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/

First check the current pod status with the following command:

[root@k8s-master ~]# kubectl get pods --all-namespaces
NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE
kube-system   coredns-86c58d9df4-9s65p             0/1     Error     0          59m
kube-system   coredns-86c58d9df4-dvg7b             0/1     Error     0          59m
kube-system   etcd-k8s-master                      1/1     Running   3          58m
kube-system   kube-apiserver-k8s-master            1/1     Running   3          58m
kube-system   kube-controller-manager-k8s-master   1/1     Running   3          58m
kube-system   kube-proxy-5p4d8                     1/1     Running   3          59m
kube-system   kube-scheduler-k8s-master            1/1     Running   3          58m
kube-system   weave-net-j87km                      1/2     Running   2          16m

Two pods show Error (I am not sure why). Now run the single-machine cluster command, which removes the master's NoSchedule taint so that workloads can be scheduled onto it:

kubectl taint nodes --all node-role.kubernetes.io/master-

Check the status again:

[root@k8s-master ~]# kubectl get pods --all-namespaces
NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE
kube-system   coredns-86c58d9df4-9s65p             1/1     Running   1          60m
kube-system   coredns-86c58d9df4-dvg7b             1/1     Running   1          60m
kube-system   etcd-k8s-master                      1/1     Running   3          59m
kube-system   kube-apiserver-k8s-master            1/1     Running   3          59m
kube-system   kube-controller-manager-k8s-master   1/1     Running   3          59m
kube-system   kube-proxy-5p4d8                     1/1     Running   3          60m
kube-system   kube-scheduler-k8s-master            1/1     Running   3          59m
kube-system   weave-net-j87km                      2/2     Running   3          16m

Everything is Running now.

Log out of the system and create a snapshot (during this walkthrough I created three snapshots at different stages).

10. VM backup download

For my own convenience and others', here is a backup of the virtual machine that can be used directly (for anyone who hits inexplicable errors during installation and would rather skip it altogether).

VMware version: 15.0.0 build-10134415
VM backup link: https://pan.baidu.com/s/1s3FZtcvONgFXAmz1AUU9_w
Extraction code: tbi2

System login user: root
System login password: jj
For the VM's IP details see: VMware 虚拟机 最小化安装 CentOS 7 的 IP 配置

What should you do if you want to change the IP? (A consolidated sketch of these steps follows the list.)

  1. Change the IP first, then restart the network service.
  2. Reset k8s: kubeadm reset
  3. Run the install again: kubeadm init --kubernetes-version=v1.13.3
  4. You may hit the problem from 3.1 again; apply the same fix as above.
  5. Delete the $HOME/.kube directory created in section 6: rm -rf $HOME/.kube
  6. Repeat the steps from section 6, "Follow-up steps from the output above (part 1)".
  7. Repeat the steps from section 7, "Follow-up steps from the output above (part 2)".
  8. For a single-machine cluster, also run kubectl taint nodes --all node-role.kubernetes.io/master-
  9. Done, the cluster is back at full strength; run kubectl get pods --all-namespaces to check its status.
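
Put together as a single script, the recovery steps look roughly like this (a sketch under the assumptions of this post: a root shell, a single-machine cluster, and Weave as the network add-on):

systemctl restart network      # after editing the IP configuration
kubeadm reset -f               # -f skips the confirmation prompt
echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
kubeadm init --kubernetes-version=v1.13.3
rm -rf $HOME/.kube
mkdir -p $HOME/.kube
cp /etc/kubernetes/admin.conf $HOME/.kube/config
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl get pods --all-namespaces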

11. Summary

This post was written while performing and verifying each step as I went; by the time I finished writing, my own setup was fully working. Going through everything from scratch took a while, but it went fairly smoothly. All this does is set up an experimental Kubernetes environment; everything has only just begun!

12. Appendix: common problems

Problems I ran into during later use are collected here.

12.1 Pod stuck in Pending

For example:

[root@k8s-master chapter01]# kubectl get pods
NAME          READY   STATUS    RESTARTS   AGE
mysql-4wvzz   0/1     Pending   0          4m1s

You can inspect the pod's state with the following command:

[root@k8s-master chapter01]# kubectl describe pods mysql-4wvzz
Name:               mysql-4wvzz
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             app=mysql
Annotations:        <none>
Status:             Pending
IP:                 
Controlled By:      ReplicationController/mysql
Containers:
  mysql:
    Image:      mysql
    Port:       3306/TCP
    Host Port:  0/TCP
    Environment:
      MYSQL_ROOT_PASSWORD:  123456
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-rksdn (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  default-token-rksdn:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-rksdn
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  28s (x3 over 103s)  default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.

The events show a warning: 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.

The pod cannot be deployed because no node is available to schedule it on. If you run a single-machine cluster, you probably forgot to execute the following command:

kubectl taint nodes --all node-role.kubernetes.io/master-

For a multi-host cluster, check the node status with kubectl get nodes and make sure at least one node is available (a quick taint check is shown below).
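
To see which taints are blocking scheduling, the nodes can be inspected directly:

kubectl describe nodes | grep -i taint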

After the problem is fixed, check the pod again; the Events section now reads:

Events:
  Type     Reason            Age                 From                 Message
  ----     ------            ----                ----                 -------
  Warning  FailedScheduling  51s (x9 over 6m6s)  default-scheduler    0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Normal   Scheduled         10s                 default-scheduler    Successfully assigned default/mysql-4wvzz to k8s-master
  Normal   Pulling           9s                  kubelet, k8s-master  pulling image "mysql"
  Normal   Pulled            7s                  kubelet, k8s-master  Successfully pulled image "mysql"
  Normal   Created           7s                  kubelet, k8s-master  Created container
  Normal   Started           7s                  kubelet, k8s-master  Started container

The problem is now resolved.

12.2 Weave CNI not working; switching to flannel

你可能感兴趣的:(其他)