This article was written on 2019-02-06, the second day of the first lunar month in the Jihai Year of the Pig.
The latest Kubernetes version at the time of writing is v1.13.3.
During a JD.com sale in June 2018 I bought a copy of Kubernetes 权威指南 (The Definitive Guide to Kubernetes) but never found the time to read it; the Spring Festival holiday was a good chance to finally study it. The book is based on the 1.6.0 release from 2017, and since I wanted to run the latest version, I kept this record of the process.
Book aside, the whole procedure draws on many other sources, chiefly the official kubeadm documentation.
System firewall configuration
Disable the firewall at boot:
systemctl disable firewalld
Stop the running firewall:
systemctl stop firewalld
Disable SELinux so that containers can read the host filesystem:
setenforce 0
To keep SELinux disabled across reboots, edit its configuration:
vi /etc/sysconfig/selinux
Change SELINUX to disabled:
SELINUX=disabled
#SELINUX=enforcing
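If you prefer a non-interactive edit, a one-line sketch (assuming the default enforcing value; on CentOS 7, /etc/sysconfig/selinux is a symlink to /etc/selinux/config):
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/sysconfig/selinux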
In the "Installing kubeadm, kubelet and kubectl" step of the first document above, the addresses in the scripts the documentation provides are all under the https://packages.cloud.google.com domain, which is blocked and unreachable from mainland China, so we can use the Kubernetes packages provided by the Alibaba Open Source Mirror site instead.
Using the Alibaba Open Source Mirror site
https://opsx.alibaba.com/mirror
Find kubernetes in the list and click Help; the instructions below are shown, covering both Debian/Ubuntu and CentOS.
I'm using the latest CentOS 7 (for the VM setup see my post VMware 虚拟机 最小化安装 CentOS 7 的 IP 配置, on IP configuration for a minimally installed CentOS 7 in a VMware VM), so the yum block is the one I actually ran.
Debian / Ubuntu:
apt-get update && apt-get install -y apt-transport-https
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet kubeadm kubectl
CentOS / RHEL / Fedora:
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
setenforce 0
yum install -y kubelet kubeadm kubectl
systemctl enable kubelet && systemctl start kubelet
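The yum install above takes whatever version is newest in the mirror. If you want to pin the exact version used in this article, a sketch (yum accepts name-version package specs; the release suffix may vary with the mirror's packaging):
yum install -y kubelet-1.13.3 kubeadm-1.13.3 kubectl-1.13.3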
The Alibaba Open Source Mirror site also carries docker-ce; its Help page points to this guide:
https://yq.aliyun.com/articles/110806
A special note: this article uses Kubernetes v1.13.3, and because I installed the latest docker with the official docker install script, kubeadm init printed the following warning:
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.0. Latest validated version: 18.06
Installing docker 18.06 instead removes this warning; the guide above also explains how to install a specific version.
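For reference, a sketch of installing the validated 18.06 release with yum, assuming the docker-ce repo from the guide above is already configured; the exact version string below is only an example, so list what your repo actually offers first:
yum list docker-ce --showduplicates | sort -r
yum install -y docker-ce-18.06.1.ce-3.el7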
Stopping the firewall (the same steps as in the first section):
Check firewall status: firewall-cmd --state
Stop the firewall: systemctl stop firewalld.service
Disable it at boot: systemctl disable firewalld.service
Reference: https://www.linuxidc.com/Linux/2016-12/138979.htm
Because pulling docker images is slow from mainland China, a registry mirror can speed things up.
Edit (or create) /etc/docker/daemon.json and add the following:
{
  "registry-mirrors": [
    "https://dockerhub.azk8s.cn",
    "https://reg-mirror.qiniu.com"
  ]
}
Reference: https://yeasy.gitbooks.io/docker_practice/install/mirror.html
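Note that changes to daemon.json only take effect after docker restarts:
systemctl daemon-reload
systemctl restart docker
# the configured mirrors should now appear under "Registry Mirrors"
docker info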
kubeadm init
Running this command surfaced quite a few problems; I list them all here.
When you run kubeadm init, it first requests https://dl.k8s.io/release/stable-1.txt to look up the latest stable version number; that address actually redirects to https://storage.googleapis.com/kubernetes-release/release/stable-1.txt, which returned v1.13.3 at the time of writing. The lookup fails because the address is blocked, so we can avoid it by specifying the version explicitly:
kubeadm init --kubernetes-version=v1.13.3
Running this command may then hit the errors below.
Problem:
[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1
Solution:
echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
echo 1 > /proc/sys/net/bridge/bridge-nf-call-ip6tables
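The echo commands above do not survive a reboot. To make the settings permanent, write them to a sysctl file; a sketch (the file name k8s.conf is my own choice, any file under /etc/sysctl.d/ works):
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sysctl --system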
Problem:
[ERROR Swap]: running with swap on is not supported. Please disable swap
Solution: disable the swap partition:
# disable swap for the current session
sudo swapoff -a
# and permanently: open the file below and comment out the swap line
sudo vi /etc/fstab
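If you would rather not edit fstab by hand, a non-interactive sketch that comments out any active swap line (it keeps a backup at /etc/fstab.bak; review the result afterwards):
sudo sed -i.bak '/\sswap\s/ s/^#\?/#/' /etc/fstab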
Problem (output trimmed):
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-apiserver:v1.13.3
[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-controller-manager:v1.13.3
[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-scheduler:v1.13.3
[ERROR ImagePull]: failed to pull image k8s.gcr.io/kube-proxy:v1.13.3
[ERROR ImagePull]: failed to pull image k8s.gcr.io/pause:3.1
[ERROR ImagePull]: failed to pull image k8s.gcr.io/etcd:3.2.24
[ERROR ImagePull]: failed to pull image k8s.gcr.io/coredns:1.2.6
gcr.io is blocked, so the images cannot be pulled directly; we can fetch them through another channel first and then rerun the command.
The error messages above list exactly which images are needed. As of now, the Docker Hub user mirrorgooglecontainers mirrors all of the latest k8s images, so we can pull from there and retag:
docker pull mirrorgooglecontainers/kube-apiserver:v1.13.3
docker pull mirrorgooglecontainers/kube-controller-manager:v1.13.3
docker pull mirrorgooglecontainers/kube-scheduler:v1.13.3
docker pull mirrorgooglecontainers/kube-proxy:v1.13.3
docker pull mirrorgooglecontainers/pause:3.1
docker pull mirrorgooglecontainers/etcd:3.2.24
docker pull coredns/coredns:1.2.6
When the downloads finish, docker images shows the following:
REPOSITORY TAG IMAGE ID CREATED SIZE
mirrorgooglecontainers/kube-apiserver v1.13.3 fe242e556a99 5 days ago 181MB
mirrorgooglecontainers/kube-controller-manager v1.13.3 0482f6400933 5 days ago 146MB
mirrorgooglecontainers/kube-proxy v1.13.3 98db19758ad4 5 days ago 80.3MB
mirrorgooglecontainers/kube-scheduler v1.13.3 3a6f709e97a0 5 days ago 79.6MB
coredns/coredns 1.2.6 f59dcacceff4 3 months ago 40MB
mirrorgooglecontainers/etcd 3.2.24 3cab8e1b9802 4 months ago 220MB
mirrorgooglecontainers/pause 3.1 da86e6ba6ca1 13 months ago 742kB
Retag each of the downloaded images to the k8s.gcr.io name kubeadm expects:
docker tag mirrorgooglecontainers/kube-apiserver:v1.13.3 k8s.gcr.io/kube-apiserver:v1.13.3
docker tag mirrorgooglecontainers/kube-controller-manager:v1.13.3 k8s.gcr.io/kube-controller-manager:v1.13.3
docker tag mirrorgooglecontainers/kube-scheduler:v1.13.3 k8s.gcr.io/kube-scheduler:v1.13.3
docker tag mirrorgooglecontainers/kube-proxy:v1.13.3 k8s.gcr.io/kube-proxy:v1.13.3
docker tag mirrorgooglecontainers/pause:3.1 k8s.gcr.io/pause:3.1
docker tag mirrorgooglecontainers/etcd:3.2.24 k8s.gcr.io/etcd:3.2.24
docker tag coredns/coredns:1.2.6 k8s.gcr.io/coredns:1.2.6
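Equivalently, the whole pull-and-retag sequence can be scripted; a sketch over the same image list (you can confirm the list for your version with kubeadm config images list):
images="kube-apiserver:v1.13.3 kube-controller-manager:v1.13.3 kube-scheduler:v1.13.3 kube-proxy:v1.13.3 pause:3.1 etcd:3.2.24"
for img in $images; do
  docker pull "mirrorgooglecontainers/${img}"
  docker tag "mirrorgooglecontainers/${img}" "k8s.gcr.io/${img}"
done
# coredns lives under its own Docker Hub organization
docker pull coredns/coredns:1.2.6
docker tag coredns/coredns:1.2.6 k8s.gcr.io/coredns:1.2.6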
With the images in place, rerun the earlier command:
kubeadm init --kubernetes-version=v1.13.3
The log output of the command follows (keep this log safe):
[root@k8s-master ~]# kubeadm init --kubernetes-version=v1.13.3
[init] Using Kubernetes version: v1.13.3
[preflight] Running pre-flight checks
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.09.0. Latest validated version: 18.06
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-master localhost] and IPs [192.168.200.131 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-master localhost] and IPs [192.168.200.131 127.0.0.1 ::1]
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.200.131]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 21.507393 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.13" in namespace kube-system with the configuration for the kubelets in the cluster
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "k8s-master" as an annotation
[mark-control-plane] Marking the node k8s-master as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node k8s-master as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 7j01ut.pbdh60q732m1kd4v
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes master has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of machines by running the following on each node
as root:
kubeadm join 192.168.200.131:6443 --token 7j01ut.pbdh60q732m1kd4v --discovery-token-ca-cert-hash sha256:de1dc033ae5cc27607b0f271655dd884c0bf6efb458957133dd9f50681fa2723
The output above asks you to run the following:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Only with this configuration in place can you run kubectl commands afterwards.
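If you are working as root throughout (as I am here), the kubeadm documentation also notes a simpler alternative:
export KUBECONFIG=/etc/kubernetes/admin.conf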
If the system has rebooted since, running kubectl may fail with something like this:
[root@k8s-master ~]# kubectl get pods
The connection to the server 192.168.200.131:6443 was refused - did you specify the right host or port?
I don't know the exact cause here, but I traced the root of the problem to swap: if swap was not permanently disabled in 3.2 above, it comes back after a reboot and k8s then fails with the error above. For this reason I suggest disabling it outright:
# permanently disable the swap partition: open the file below and comment out the swap line
sudo vi /etc/fstab
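After the next reboot you can confirm swap stayed off:
free -m        # the Swap line should be all zeros
swapon -s      # prints nothing when no swap device is active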
The init output also asks for a pod network to be deployed:
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Following Kubernetes 权威指南 (the book targets v1.6), I install the Weave Net add-on here; documentation:
https://www.weave.works/docs/net/latest/kubernetes/kube-addon/
Following the documentation, run:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
The output:
serviceaccount/weave-net created
clusterrole.rbac.authorization.k8s.io/weave-net created
clusterrolebinding.rbac.authorization.k8s.io/weave-net created
role.rbac.authorization.k8s.io/weave-net created
rolebinding.rbac.authorization.k8s.io/weave-net created
daemonset.extensions/weave-net created
The init output also says to run the following as root on each NODE host:
kubeadm join 192.168.200.131:6443 --token 7j01ut.pbdh60q732m1kd4v --discovery-token-ca-cert-hash sha256:de1dc033ae5cc27607b0f271655dd884c0bf6efb458957133dd9f50681fa2723
Note: don't copy the line here; use the one printed at the end of your own kubeadm init output.
The token in this command is only valid for a limited time (one day, as far as I can tell); if it has expired, see the address below for how to obtain a fresh one:
https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#join-nodes
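In short, a new token plus the full join command can be generated on the master in one step; a sketch:
kubeadm token create --print-join-command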
If you are building the cluster from multiple hosts, run the command above on each additional host to join it to the cluster.
Since this setup is for experimentation, I use the single-machine cluster approach.
Official documentation reference: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/
First check the current pod status:
[root@k8s-master ~]# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-86c58d9df4-9s65p 0/1 Error 0 59m
kube-system coredns-86c58d9df4-dvg7b 0/1 Error 0 59m
kube-system etcd-k8s-master 1/1 Running 3 58m
kube-system kube-apiserver-k8s-master 1/1 Running 3 58m
kube-system kube-controller-manager-k8s-master 1/1 Running 3 58m
kube-system kube-proxy-5p4d8 1/1 Running 3 59m
kube-system kube-scheduler-k8s-master 1/1 Running 3 58m
kube-system weave-net-j87km 1/2 Running 2 16m
Two coredns pods are in Error, and I'm not sure why. Next, run the single-machine cluster command, which removes the master's NoSchedule taint so that pods can be scheduled on this node:
kubectl taint nodes --all node-role.kubernetes.io/master-
Then check the status again:
[root@k8s-master ~]# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-86c58d9df4-9s65p 1/1 Running 1 60m
kube-system coredns-86c58d9df4-dvg7b 1/1 Running 1 60m
kube-system etcd-k8s-master 1/1 Running 3 59m
kube-system kube-apiserver-k8s-master 1/1 Running 3 59m
kube-system kube-controller-manager-k8s-master 1/1 Running 3 59m
kube-system kube-proxy-5p4d8 1/1 Running 3 60m
kube-system kube-scheduler-k8s-master 1/1 Running 3 59m
kube-system weave-net-j87km 2/2 Running 3 16m
Everything is now Running normally.
Log out and take a snapshot (during this walkthrough I took 3 snapshots at different stages).
For my own and others' convenience, here is a backup of the virtual machine for direct use (for anyone who hits inexplicable installation errors and wants to skip straight past them).
VMware version: 15.0.0 build-10134415
VM backup link: https://pan.baidu.com/s/1s3FZtcvONgFXAmz1AUU9_w
Extraction code: tbi2
Login user: root
Login password: jj
For the VM's IP details, see: VMware 虚拟机 最小化安装 CentOS 7 的 IP 配置
What should you do if you want to change the IP?
1. Run kubeadm reset.
2. Run kubeadm init --kubernetes-version=v1.13.3 again.
3. Delete the old $HOME/.kube directory with rm -rf $HOME/.kube, then repeat the admin.conf copy steps above.
4. Run kubectl taint nodes --all node-role.kubernetes.io/master- again.
5. Check the result with kubectl get pods --all-namespaces.
This article was written while performing and verifying every step; by the time it was finished, my own setup was fully working. Going through it from scratch takes a fair amount of time, but it went fairly smoothly. All this article sets up is an experimental Kubernetes environment; everything has only just begun!
Problems I ran into in my own later use are collected here.
For example:
[root@k8s-master chapter01]# kubectl get pods
NAME READY STATUS RESTARTS AGE
mysql-4wvzz 0/1 Pending 0 4m1s
You can inspect the Pod with the following command:
[root@k8s-master chapter01]# kubectl describe pods mysql-4wvzz
Name: mysql-4wvzz
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: <none>
Labels: app=mysql
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicationController/mysql
Containers:
mysql:
Image: mysql
Port: 3306/TCP
Host Port: 0/TCP
Environment:
MYSQL_ROOT_PASSWORD: 123456
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-rksdn (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
default-token-rksdn:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-rksdn
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 28s (x3 over 103s) default-scheduler 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
The events show a warning: 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
The pod cannot be deployed because no schedulable node is available. If you are running a single-machine cluster, you most likely forgot to run the following:
kubectl taint nodes --all node-role.kubernetes.io/master-
If it is a multi-host cluster, check the node status with kubectl get nodes and make sure there is at least one available node.
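Either way, a quick check for the taint that blocks scheduling (k8s-master is my node name; a schedulable node shows Taints: <none>):
kubectl describe node k8s-master | grep Taints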
Once the problem is fixed, inspect the Pod again; the Events section now reads:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 51s (x9 over 6m6s) default-scheduler 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
Normal Scheduled 10s default-scheduler Successfully assigned default/mysql-4wvzz to k8s-master
Normal Pulling 9s kubelet, k8s-master pulling image "mysql"
Normal Pulled 7s kubelet, k8s-master Successfully pulled image "mysql"
Normal Created 7s kubelet, k8s-master Created container
Normal Started 7s kubelet, k8s-master Started container
The problem is now resolved.