I. Software and Hardware Environment
The environment uses CentOS 7.4 minimal, Docker 1.12, kubeadm 1.7.5, etcd 3.0, and Kubernetes 1.7.6.
The configuration described in the rest of this section must be done on every node. Here we use just two nodes to build a test environment.
Set the hostname on each node and configure the name-to-IP mappings in /etc/hosts:
10.0.2.15 gqtest1.future
10.0.2.4 gqtest2.future
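For example, on the first node this can be done as follows (a minimal sketch using the names and addresses from this article; run the matching hostnamectl command on the second node):
# Set this node's hostname (run the corresponding command on gqtest2.future)
hostnamectl set-hostname gqtest1.future
# Append the name-to-IP mappings to /etc/hosts on every node
cat >> /etc/hosts <<EOF
10.0.2.15 gqtest1.future
10.0.2.4 gqtest2.future
EOF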
Configure the firewall so that communication between these two hosts on the same subnet is unrestricted:
firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="10.0.2.0/24" accept'
firewall-cmd --reload
firewall-cmd --list-all
Note: in a test environment you may as well simply stop and disable firewalld permanently.
Disable SELinux.
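For a test environment both steps can be done with commands like the following (a minimal sketch; the SELinux change in the config file fully takes effect only after a reboot):
# Stop firewalld now and keep it disabled across reboots (test environments only)
systemctl stop firewalld
systemctl disable firewalld
# Switch SELinux to permissive mode immediately and make the change persistent
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config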
Configure the kernel so that traffic passing through bridges also enters the iptables/netfilter framework, by adding the following to /etc/sysctl.conf:
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
sysctl -p
Note: if sysctl -p reports an error, run modprobe br_netfilter first and then run sysctl -p again.
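To make sure br_netfilter is also loaded automatically after a reboot, it can be registered with systemd's modules-load mechanism (a minimal sketch; the file name is an arbitrary choice):
modprobe br_netfilter
echo br_netfilter > /etc/modules-load.d/br_netfilter.conf
sysctl -p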
II. Installing a Kubernetes Cluster Quickly with kubeadm
So far kubeadm is mainly a tool for beginners to install and learn Kubernetes quickly; it is not suitable for production environments.
1. Install kubeadm and related tools
The configuration in this subsection must be done on all nodes.
Add the Alibaba Cloud yum repository for Kubernetes:
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
EOF
yum -y install epel-release
yum clean all
yum makecache
Install kubeadm and the related packages:
# yum -y install docker kubelet kubeadm kubectl kubernetes-cni
Enable and start the Docker and kubelet services:
systemctl enable docker && systemctl start docker
systemctl enable kubelet && systemctl start kubelet
Note: at this point the kubelet service is in an abnormal (failed) state because its main configuration file kubelet.conf is missing. This can be ignored for now, since the file is only generated after the Master node has been initialized.
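The state can be confirmed with, for example, the commands below (a quick check assuming systemd-managed services; the same commands are useful again after kubeadm init):
# kubelet keeps restarting/failing until /etc/kubernetes/kubelet.conf exists
systemctl status kubelet
# Recent kubelet log entries explaining the failure
journalctl -u kubelet --no-pager | tail -n 20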
2. Download the Kubernetes images
The configuration in this subsection must be done on all nodes.
(1) Because gcr.io cannot be reached directly to pull images, a registry mirror (accelerator) located in China has to be configured.
To configure an Alibaba Cloud accelerator:
- Log in to https://cr.console.aliyun.com/
- Find and click the image accelerator entry on the page to get your personal accelerator URL; after selecting the CentOS tab you will see configuration instructions similar to the ones below.
Run the following command on the system (replace the mirror address with your own):
tee /etc/docker/daemon.json <<-'EOF'
{
"registry-mirrors": ["https://jzv3xt7h.mirror.aliyuncs.com"]
}
EOF
Restart the Docker service:
systemctl daemon-reload
systemctl restart docker
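Whether the mirror configuration has taken effect can be checked from the Docker daemon information (a hedged check; on recent Docker releases the configured mirrors are listed in the docker info output):
docker info | grep -A 1 "Registry Mirrors"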
(2) Download the Kubernetes images, then retag them with names starting with gcr.io/google_containers so that kubeadm can use them.
The shell script below does three things: it pulls the required container images, retags them with the gcr.io/google_containers names that kubeadm expects, and removes the original tags:
[root@gqtest1 ~]# more get-images.sh
#!/bin/bash
images=(kube-proxy-amd64:v1.7.6 kube-scheduler-amd64:v1.7.6 kube-controller-manager-amd64:v1.7.6
        kube-apiserver-amd64:v1.7.6 etcd-amd64:3.0.17 pause-amd64:3.0 kubernetes-dashboard-amd64:v1.6.1
        k8s-dns-sidecar-amd64:1.14.4 k8s-dns-kube-dns-amd64:1.14.4 k8s-dns-dnsmasq-nanny-amd64:1.14.4)
for imageName in ${images[@]} ; do
    docker pull cloudnil/$imageName
    docker tag cloudnil/$imageName gcr.io/google_containers/$imageName
    docker rmi cloudnil/$imageName
done
Run the script above; once the downloads are finished, check the result:
[root@gqtest1 ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
gcr.io/google_containers/kube-apiserver-amd64 v1.7.6 fd35bbc17508 5 months ago 186.1 MB
gcr.io/google_containers/kube-scheduler-amd64 v1.7.6 15c1d3eed0e7 5 months ago 77.2 MB
gcr.io/google_containers/kube-controller-manager-amd64 v1.7.6 41cbd335ed40 5 months ago 138 MB
gcr.io/google_containers/kube-proxy-amd64 v1.7.6 fbb7fbc5b300 5 months ago 114.7 MB
gcr.io/google_containers/k8s-dns-kube-dns-amd64 1.14.4 2d6a3bea02c4 7 months ago 49.38 MB
gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64 1.14.4 13117b1d461f 7 months ago 41.41 MB
gcr.io/google_containers/k8s-dns-sidecar-amd64 1.14.4 c413c7235eb4 7 months ago 41.81 MB
gcr.io/google_containers/etcd-amd64 3.0.17 393e48d05c4e 7 months ago 168.9 MB
gcr.io/google_containers/kubernetes-dashboard-amd64 v1.6.1 c14ffb751676 7 months ago 134.4 MB
gcr.io/google_containers/pause-amd64 3.0 66c684b679d2 7 months ago 746.9 kB
3. Run kubeadm init to install the Master
[root@gqtest1 ~]# kubeadm init --kubernetes-version=v1.7.6
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.7.6
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks
[kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0)
[certificates] Generated CA certificate and key.
[certificates] Generated API server certificate and key.
[certificates] API Server serving cert is signed for DNS names [gqtest1.future kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.0.2.15]
[certificates] Generated API server kubelet client certificate and key.
[certificates] Generated service account token signing key and public key.
[certificates] Generated front-proxy CA certificate and key.
[certificates] Generated front-proxy client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/scheduler.conf"
[apiclient] Created API client, waiting for the control plane to become ready
[apiclient] All control plane components are healthy after 37.006294 seconds
[token] Using token: 320116.d14b1964f47178bc
[apiconfig] Created RBAC rules
[addons] Applied essential addon: kube-proxy
[addons] Applied essential addon: kube-dns
Your Kubernetes master has initialized successfully!
To start using your cluster, you need to run (as a regular user):
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
http://kubernetes.io/docs/admin/addons/
You can now join any number of machines by running the following on each node
as root:
kubeadm join --token 320116.d14b1964f47178bc 10.0.2.15:6443
Note: the option --kubernetes-version=v1.7.6 is mandatory; without it the command fails because it tries to reach Google's site, which is blocked. Version v1.7.6 is used here to match the container images downloaded above.
The command above takes roughly one minute; while it runs you can watch tail -f /var/log/messages to follow what is being configured and how far it has got.
It is a good idea to save a copy of the output above, because it is needed later when adding worker nodes.
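If the join token from this output is ever lost, it can be listed or regenerated with kubeadm's token subcommands (a hedged sketch: these are the subcommands of later kubeadm releases and may differ slightly on 1.7):
# List the existing bootstrap tokens
kubeadm token list
# Create a new token to use with kubeadm join
kubeadm token create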
After the Kubernetes Master has been initialized successfully, run the commands shown in the prompt:
[root@gqtest1 ~]# mkdir -p $HOME/.kube
[root@gqtest1 ~]# cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[root@gqtest1 ~]# chown $(id -u):$(id -g) $HOME/.kube/config
[root@gqtest1 ~]# kubectl get nodes
NAME STATUS AGE VERSION
gqtest1.future NotReady 32m v1.7.5
[root@gqtest1 ~]# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-gqtest1.future 1/1 Running 0 32m
kube-system kube-apiserver-gqtest1.future 1/1 Running 0 32m
kube-system kube-controller-manager-gqtest1.future 1/1 Running 0 32m
kube-system kube-dns-2425271678-gps35 0/3 Pending 0 33m
kube-system kube-proxy-6m2z7 1/1 Running 0 33m
kube-system kube-scheduler-gqtest1.future 1/1 Running 0 32m
At this point the Kubernetes software on the Master node is installed, but the cluster has no usable worker Node yet, and the container network is still missing.
In the pod status above, the dns pod is still Pending; this is caused by the lack of a container network.
In the node status, gqtest1 is shown as NotReady.
4. Install a network plugin
Look again at the hints printed when the Master was initialized; they include advice on installing a network plugin:
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
http://kubernetes.io/docs/admin/addons/
Here the Weave plugin is chosen. On the Master node run:
[root@gqtest1 ~]# kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
serviceaccount "weave-net" created
clusterrole "weave-net" created
clusterrolebinding "weave-net" created
role "weave-net" created
rolebinding "weave-net" created
daemonset "weave-net" created
Weave implements a simple, secure network in a transparent and reliable way. See the end of this article for an introduction to Kubernetes network plugins.
Wait a little while, then check the pod status again; everything should now be Running:
[root@gqtest1 ~]# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-gqtest1.future 1/1 Running 0 34m
kube-system kube-apiserver-gqtest1.future 1/1 Running 0 34m
kube-system kube-controller-manager-gqtest1.future 1/1 Running 0 34m
kube-system kube-dns-2425271678-gps35 3/3 Running 0 35m
kube-system kube-proxy-6m2z7 1/1 Running 0 35m
kube-system kube-scheduler-gqtest1.future 1/1 Running 0 34m
kube-system weave-net-hd7k2 2/2 Running 0 1m
Install the weave network management tool:
[root@gqtest1 ~]# curl -L git.io/weave -o /usr/local/bin/weave
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- 0:00:02 --:--:-- 0
0 0 0 595 0 0 93 0 --:--:-- 0:00:06 --:--:-- 220
100 50382 100 50382 0 0 5268 0 0:00:09 0:00:09 --:--:-- 17671
[root@gqtest1 ~]# chmod a+x /usr/local/bin/weave
Check the status of the weave network service:
[root@gqtest1 ~]# weave status
Version: 2.2.0 (failed to check latest version - see logs; next check at 2018/02/23 16:26:07)
Service: router
Protocol: weave 1..2
Name: ea:0f:53:f9:2f:f0(gqtest1.future)
Encryption: disabled
PeerDiscovery: enabled
Targets: 1
Connections: 1 (1 failed)
Peers: 1
TrustedSubnets: none
Service: ipam
Status: ready
Range: 10.32.0.0/12
DefaultSubnet: 10.32.0.0/12
5. Install a Node and join it to the cluster
On the worker node, run the kubeadm join command to join the cluster:
[root@gqtest2 ~]# kubeadm join --token 320116.d14b1964f47178bc 10.0.2.15:6443
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[preflight] Running pre-flight checks
[discovery] Trying to connect to API Server "10.0.2.15:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.0.2.15:6443"
[discovery] Cluster info signature and contents are valid, will use API Server "https://10.0.2.15:6443"
[discovery] Successfully established connection with API Server "10.0.2.15:6443"
[bootstrap] Detected server version: v1.7.6
[bootstrap] The server supports the Certificates API (certificates.k8s.io/v1beta1)
[csr] Created API client to obtain unique certificate for this node, generating keys and certificate signing request
[csr] Received signed certificate from the API server, generating KubeConfig...
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
Node join complete:
* Certificate signing request sent to master and response
received.
* Kubelet informed of new secure connection details.
Run 'kubectl get nodes' on the master to see this machine join.
By default the Master node does not take on workloads. If you want an all-in-one Kubernetes environment, run the following command so that the Master node can also act as a worker Node:
# kubectl taint nodes --all node-role.kubernetes.io/master-
Note: this is equivalent to removing the "node-role.kubernetes.io/master" taint from the nodes.
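If you later want the Master to stop accepting ordinary workloads again, the taint can be put back (a minimal sketch using this article's Master node name; adjust it to your own):
kubectl taint nodes gqtest1.future node-role.kubernetes.io/master=:NoSchedule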
6. Verify that the Kubernetes cluster is installed successfully
Check the pod status of the complete cluster again, now that a worker node has been added:
[root@gqtest1 ~]# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-gqtest1.future 1/1 Running 1 1h
kube-system kube-apiserver-gqtest1.future 1/1 Running 1 1h
kube-system kube-controller-manager-gqtest1.future 1/1 Running 1 1h
kube-system kube-dns-2425271678-gps35 3/3 Running 3 1h
kube-system kube-proxy-0pc5d 1/1 Running 0 43m
kube-system kube-proxy-6m2z7 1/1 Running 1 1h
kube-system kube-scheduler-gqtest1.future 1/1 Running 1 1h
kube-system weave-net-3fh66 2/2 Running 0 43m
kube-system weave-net-hd7k2 2/2 Running 3 1h
Check the node information:
[root@gqtest1 ~]# kubectl get nodes
NAME STATUS AGE VERSION
gqtest1.future Ready 1h v1.7.5
gqtest2.future Ready 43m v1.7.5
Check the cluster component status:
[root@gqtest1 ~]# kubectl get cs
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health": "true"}
Check the Services in the cluster:
[root@gqtest1 ~]# kubectl get svc kubernetes
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes 10.96.0.1 <none> 443/TCP 2h
[root@gqtest1 ~]# kubectl get svc -n kube-system
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns 10.96.0.10 <none> 53/UDP,53/TCP 2h
Or view the complete information of all Services at once:
[root@gqtest1 ~]# kubectl get svc --all-namespaces -o wide
NAMESPACE NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default kubernetes 10.96.0.1 <none> 443/TCP 2h <none>
kube-system kube-dns 10.96.0.10 <none> 53/UDP,53/TCP 2h k8s-app=kube-dns
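To further confirm that cluster DNS works end to end, a throwaway test pod can be used to resolve the kubernetes service name (a hedged sketch; the busybox:1.28 image tag is an assumption, picked because newer busybox builds have known nslookup quirks):
# Start a temporary busybox pod, resolve a service name via kube-dns, then remove the pod
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default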
III. Troubleshooting and Debugging a Kubernetes Deployment
(1) First get an overview of which namespaces and pods exist, and confirm the state of each pod:
[root@gqtest1 ~]# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-gqtest1.future 1/1 Running 1 2h
kube-system kube-apiserver-gqtest1.future 1/1 Running 1 2h
kube-system kube-controller-manager-gqtest1.future 1/1 Running 1 2h
kube-system kube-dns-2425271678-gps35 3/3 Running 3 2h
kube-system kube-proxy-0pc5d 1/1 Running 0 54m
kube-system kube-proxy-6m2z7 1/1 Running 1 2h
kube-system kube-scheduler-gqtest1.future 1/1 Running 1 2h
kube-system weave-net-3fh66 2/2 Running 0 54m
kube-system weave-net-hd7k2 2/2 Running 3 1h
(2) View the detailed configuration of a given pod:
[root@gqtest1 ~]# kubectl --namespace=kube-system describe pod kube-dns-2425271678-gps35
Name: kube-dns-2425271678-gps35
Namespace: kube-system
Node: gqtest1.future/10.0.2.15
Start Time: Fri, 23 Feb 2018 18:27:04 +0800
Labels: k8s-app=kube-dns
pod-template-hash=2425271678
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"kube-dns-2425271678","uid":"386100c7-187f-11e8-9a79-08002770...
scheduler.alpha.kubernetes.io/critical-pod=
Status: Running
IP: 10.32.0.2
Created By: ReplicaSet/kube-dns-2425271678
Controlled By: ReplicaSet/kube-dns-2425271678
Containers:
kubedns:
Container ID: docker://53ba0a56e18ea8130c414f42983d89100e80646b3ee20557bb47e58079a97745
Image: gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.4
Image ID: docker://sha256:2d6a3bea02c4f469c117aaae0ac51668585024a2c9e174403076cc1c5f79860e
Ports: 10053/UDP, 10053/TCP, 10055/TCP
Args:
--domain=cluster.local.
--dns-port=10053
--config-dir=/kube-dns-config
--v=2
State: Running
Started: Fri, 23 Feb 2018 18:49:12 +0800
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Fri, 23 Feb 2018 18:27:05 +0800
Finished: Fri, 23 Feb 2018 18:43:35 +0800
Ready: True
Restart Count: 1
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:10054/healthcheck/kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
Environment:
PROMETHEUS_PORT: 10055
Mounts:
/kube-dns-config from kube-dns-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-trvkm (ro)
dnsmasq:
Container ID: docker://8baceefac0a5475d932aa77cc2bd2350a28a046ea2a27313cbac42303d96817d
Image: gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.4
Image ID: docker://sha256:13117b1d461f84c5ff47adeaff5b016922e1baab83f47de3320cf4a6f3c4e911
Ports: 53/UDP, 53/TCP
Args:
-v=2
-logtostderr
-configDir=/etc/k8s/dns/dnsmasq-nanny
-restartDnsmasq=true
--
-k
--cache-size=1000
--log-facility=-
--server=/cluster.local/127.0.0.1#10053
--server=/in-addr.arpa/127.0.0.1#10053
--server=/ip6.arpa/127.0.0.1#10053
State: Running
Started: Fri, 23 Feb 2018 18:49:13 +0800
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Fri, 23 Feb 2018 18:27:06 +0800
Finished: Fri, 23 Feb 2018 18:43:35 +0800
Ready: True
Restart Count: 1
Requests:
cpu: 150m
memory: 20Mi
Liveness: http-get http://:10054/healthcheck/dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/etc/k8s/dns/dnsmasq-nanny from kube-dns-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-trvkm (ro)
sidecar:
Container ID: docker://5284f8602b574560a673e55d5c57ed094344016067c1531c3c803267c6a36b2b
Image: gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.4
Image ID: docker://sha256:c413c7235eb4ba8165ec953c0e886e22bd94f72dd360de7ab42ce340fda6550e
Port: 10054/TCP
Args:
--v=2
--logtostderr
--probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,A
--probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,A
State: Running
Started: Fri, 23 Feb 2018 18:49:14 +0800
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Fri, 23 Feb 2018 18:27:07 +0800
Finished: Fri, 23 Feb 2018 18:43:25 +0800
Ready: True
Restart Count: 1
Requests:
cpu: 10m
memory: 20Mi
Liveness: http-get http://:10054/metrics delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-trvkm (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
kube-dns-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kube-dns
Optional: true
kube-dns-token-trvkm:
Type: Secret (a volume populated by a Secret)
SecretName: kube-dns-token-trvkm
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
Group by selector to view the detailed state of the service and its pods:
[root@gqtest1 ~]# kubectl get svc -n kube-system -l k8s-app=kube-dns -o wide
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kube-dns 10.96.0.10 <none> 53/UDP,53/TCP 2h k8s-app=kube-dns
[root@gqtest1 ~]# kubectl get pods -n kube-system -l name=weave-net -o wide
NAME READY STATUS RESTARTS AGE IP NODE
weave-net-3fh66 2/2 Running 0 1h 10.0.2.4 gqtest2.future
weave-net-hd7k2 2/2 Running 3 1h 10.0.2.15 gqtest1.future
Besides weave status, this is another way to check the status of the weave network service:
[root@gqtest1 ~]# kubectl exec -n kube-system weave-net-hd7k2 -c weave -- /home/weave/weave --local status
Version: 2.2.0 (failed to check latest version - see logs; next check at 2018/02/23 16:26:07)
Service: router
Protocol: weave 1..2
Name: ea:0f:53:f9:2f:f0(gqtest1.future)
Encryption: disabled
PeerDiscovery: enabled
Targets: 1
Connections: 2 (1 established, 1 failed)
Peers: 2 (with 2 established connections)
TrustedSubnets: none
Service: ipam
Status: ready
Range: 10.32.0.0/12
DefaultSubnet: 10.32.0.0/12
Viewing the event log produced by kubelet is very useful when troubleshooting:
journalctl -xeu kubelet
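A few related commands are often useful alongside the kubelet log (a minimal sketch; the placeholder pod and container names must be replaced with real ones):
# Docker daemon log on a node
journalctl -u docker
# Cluster events, which often explain why a pod is Pending or a node is NotReady
kubectl get events --all-namespaces
# Logs of one container of a pod
kubectl logs <pod-name> -n kube-system -c <container-name>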
View the detailed configuration of a node:
[root@gqtest1 ~]# kubectl describe node gqtest2.future
Name: gqtest2.future
Role:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=gqtest2.future
Annotations: node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: <none>
CreationTimestamp: Fri, 23 Feb 2018 19:08:04 +0800
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Fri, 23 Feb 2018 20:14:03 +0800 Fri, 23 Feb 2018 19:08:05 +0800 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Fri, 23 Feb 2018 20:14:03 +0800 Fri, 23 Feb 2018 19:08:05 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 23 Feb 2018 20:14:03 +0800 Fri, 23 Feb 2018 19:08:05 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Fri, 23 Feb 2018 20:14:03 +0800 Fri, 23 Feb 2018 19:50:40 +0800 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 10.0.2.4
Hostname: gqtest2.future
Capacity:
cpu: 2
memory: 1883376Ki
pods: 110
Allocatable:
cpu: 2
memory: 1780976Ki
pods: 110
System Info:
Machine ID: 53e312c62f2942908f2035d576b42b51
System UUID: B7ADF3E2-298A-47BC-86A3-F11038C80119
Boot ID: cbecf64b-e172-4b12-b9b4-db9646f49e1d
Kernel Version: 3.10.0-693.17.1.el7.x86_64
OS Image: CentOS Linux 7 (Core)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.12.6
Kubelet Version: v1.7.5
Kube-Proxy Version: v1.7.5
ExternalID: gqtest2.future
Non-terminated Pods: (2 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
kube-system kube-proxy-0pc5d 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system weave-net-3fh66 20m (1%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
20m (1%) 0 (0%) 0 (0%) 0 (0%)
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
24m 24m 1 kube-proxy, gqtest2.future Normal Starting Starting kube-proxy.
23m 23m 1 kubelet, gqtest2.future Normal NodeReady Node gqtest2.future status is now: NodeReady
View the application logs of the three containers in the pod that provides the dns service:
[root@gqtest1 ~]# kubectl logs -f kube-dns-2425271678-gps35 -n kube-system -c kubedns
I0223 10:49:12.712082 1 dns.go:48] version: 1.14.3-4-gee838f6
I0223 10:49:12.726504 1 server.go:70] Using configuration read from directory: /kube-dns-config with period 10s
I0223 10:49:12.726545 1 server.go:113] FLAG: --alsologtostderr="false"
I0223 10:49:12.726553 1 server.go:113] FLAG: --config-dir="/kube-dns-config"
I0223 10:49:12.726559 1 server.go:113] FLAG: --config-map=""
I0223 10:49:12.726563 1 server.go:113] FLAG: --config-map-namespace="kube-system"
I0223 10:49:12.726567 1 server.go:113] FLAG: --config-period="10s"
I0223 10:49:12.726571 1 server.go:113] FLAG: --dns-bind-address="0.0.0.0"
I0223 10:49:12.726575 1 server.go:113] FLAG: --dns-port="10053"
I0223 10:49:12.726581 1 server.go:113] FLAG: --domain="cluster.local."
I0223 10:49:12.726588 1 server.go:113] FLAG: --federations=""
I0223 10:49:12.726595 1 server.go:113] FLAG: --healthz-port="8081"
I0223 10:49:12.726599 1 server.go:113] FLAG: --initial-sync-timeout="1m0s"
I0223 10:49:12.726603 1 server.go:113] FLAG: --kube-master-url=""
I0223 10:49:12.726608 1 server.go:113] FLAG: --kubecfg-file=""
I0223 10:49:12.726611 1 server.go:113] FLAG: --log-backtrace-at=":0"
I0223 10:49:12.726617 1 server.go:113] FLAG: --log-dir=""
I0223 10:49:12.726622 1 server.go:113] FLAG: --log-flush-frequency="5s"
I0223 10:49:12.726625 1 server.go:113] FLAG: --logtostderr="true"
I0223 10:49:12.726629 1 server.go:113] FLAG: --nameservers=""
I0223 10:49:12.726633 1 server.go:113] FLAG: --stderrthreshold="2"
I0223 10:49:12.726637 1 server.go:113] FLAG: --v="2"
I0223 10:49:12.726640 1 server.go:113] FLAG: --version="false"
I0223 10:49:12.726646 1 server.go:113] FLAG: --vmodule=""
I0223 10:49:12.726916 1 server.go:176] Starting SkyDNS server (0.0.0.0:10053)
I0223 10:49:12.727218 1 server.go:198] Skydns metrics enabled (/metrics:10055)
I0223 10:49:12.727230 1 dns.go:147] Starting endpointsController
I0223 10:49:12.727234 1 dns.go:150] Starting serviceController
I0223 10:49:12.727490 1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0223 10:49:12.727497 1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0223 10:49:13.230199 1 dns.go:171] Initialized services and endpoints from apiserver
I0223 10:49:13.230215 1 server.go:129] Setting up Healthz Handler (/readiness)
I0223 10:49:13.230223 1 server.go:134] Setting up cache handler (/cache)
I0223 10:49:13.230229 1 server.go:120] Status HTTP port 8081
^C
[root@gqtest1 ~]# kubectl logs -f kube-dns-2425271678-gps35 -n kube-system -c sidecar
ERROR: logging before flag.Parse: I0223 10:49:14.751622 1 main.go:48] Version v1.14.3-4-gee838f6
ERROR: logging before flag.Parse: I0223 10:49:14.751973 1 server.go:45] Starting server (options {DnsMasqPort:53 DnsMasqAddr:127.0.0.1 DnsMasqPollIntervalMs:5000 Probes:[{Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1} {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}] PrometheusAddr:0.0.0.0 PrometheusPort:10054 PrometheusPath:/metrics PrometheusNamespace:kubedns})
ERROR: logging before flag.Parse: I0223 10:49:14.751997 1 dnsprobe.go:75] Starting dnsProbe {Label:kubedns Server:127.0.0.1:10053 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}
ERROR: logging before flag.Parse: I0223 10:49:14.752105 1 dnsprobe.go:75] Starting dnsProbe {Label:dnsmasq Server:127.0.0.1:53 Name:kubernetes.default.svc.cluster.local. Interval:5s Type:1}
^C
[root@gqtest1 ~]# kubectl logs -f kube-dns-2425271678-gps35 -n kube-system -c dnsmasq
I0223 10:49:13.799678 1 main.go:76] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] true} /etc/k8s/dns/dnsmasq-nanny 10000000000}
I0223 10:49:13.800884 1 nanny.go:86] Starting dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053]
I0223 10:49:14.638909 1 nanny.go:111]
W0223 10:49:14.639013 1 nanny.go:112] Got EOF from stdout
I0223 10:49:14.639280 1 nanny.go:108] dnsmasq[10]: started, version 2.76 cachesize 1000
I0223 10:49:14.639308 1 nanny.go:108] dnsmasq[10]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
I0223 10:49:14.639314 1 nanny.go:108] dnsmasq[10]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0223 10:49:14.639318 1 nanny.go:108] dnsmasq[10]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0223 10:49:14.639321 1 nanny.go:108] dnsmasq[10]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0223 10:49:14.639328 1 nanny.go:108] dnsmasq[10]: reading /etc/resolv.conf
I0223 10:49:14.639332 1 nanny.go:108] dnsmasq[10]: using nameserver 127.0.0.1#10053 for domain ip6.arpa
I0223 10:49:14.639336 1 nanny.go:108] dnsmasq[10]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa
I0223 10:49:14.639339 1 nanny.go:108] dnsmasq[10]: using nameserver 127.0.0.1#10053 for domain cluster.local
I0223 10:49:14.639343 1 nanny.go:108] dnsmasq[10]: using nameserver 192.168.5.66#53
I0223 10:49:14.639346 1 nanny.go:108] dnsmasq[10]: read /etc/hosts - 7 addresses
^C
Note: the container named sidecar logs some error messages; reportedly this is a known bug that has been fixed in later releases.
How to get a shell inside the kubedns container for inspection:
[root@gqtest1 ~]# docker exec -it 53ba0a56e18ea8130c414f42983d89100e80646b3ee20557bb47e58079a97745 /bin/sh
/ # ls
bin etc kube-dns lib mnt root sbin sys usr
dev home kube-dns-config media proc run srv tmp var
/ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
12: eth0@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UP
link/ether d6:25:ca:25:47:ac brd ff:ff:ff:ff:ff:ff
inet 10.32.0.2/12 brd 10.47.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::d425:caff:fe25:47ac/64 scope link tentative flags 08
valid_lft forever preferred_lft forever
Note: this enters the container named kubedns in the pod named kube-dns-2425271678-gps35.
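The same thing can be done without looking up the Docker container ID by going through the API server with kubectl exec (a minimal sketch using this article's pod name):
# Open a shell in the kubedns container of the kube-dns pod via the API server
kubectl exec -it kube-dns-2425271678-gps35 -n kube-system -c kubedns -- /bin/sh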
If building a Kubernetes cluster with kubeadm fails and you want to start over, clean up after the failed initialization with the following commands:
kubeadm reset
ifconfig weave down
ip link delete weave
rm -rf /var/lib/cni/
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
iptables -t nat -F
iptables -t mangle -F
iptables -F
iptables -X
reboot
IV. A Primer on Kubernetes Network Plugins
The plugins listed below were developed specifically for Kubernetes.
Kubenet
Kubenet is intended for single-node environments; it relies on a cloud platform that sets up routing rules to provide communication between nodes. Kubenet is a very basic network plugin; if you are looking for cross-node network policy, Kubenet is of no help.
Flannel
Flannel is an overlay network solution developed by CoreOS specifically for Kubernetes. Flannel's main advantages are that it is well tested and inexpensive to run. Flannel provides a distributed network across the whole cluster: for communication and services to work correctly, Kubernetes performs port mapping and assigns a unique IP address to every pod. If you use Google Compute Engine this works very smoothly; with other cloud providers, however, you may run into many difficulties. Flannel solves exactly this problem.
Weave
Weave is developed by Weaveworks and is used to connect, monitor, visualize, and manage Kubernetes. With Weave you can create networks, deploy firewalls faster, and improve network operations efficiency through automated troubleshooting.
A network created by Weave can connect containers located in different places, such as public clouds, private clouds, virtual machines, and bare-metal servers. The container network can carry layer-2 and layer-3 traffic and supports multicast; the built-in encryption makes container isolation easier to achieve. A Weave network also automatically chooses the fastest path for routing container traffic to keep the network fast. Every host running Weave has to run a few required containers, and cross-host communication is implemented through these containers. In a Weave network there are multiple peers running on different hosts, and these peers act as routers.
TCP or UDP connections are created between the weave routers. The workflow is as follows:
If the user enables encryption (how to enable it is described later), these full-duplex connections carry the encapsulated packets over UDP and can pass through firewalls.
In the implementation, Weave creates a bridge on the host, and containers are attached to that bridge via veth pairs. Normally Weave's built-in address allocator assigns addresses to containers automatically; if the user intervenes, the user's settings take priority.
Because the weave router container is also attached to this bridge, the weave routers use pcap on the bridge interface (set to promiscuous mode) to capture Ethernet packets; traffic between local containers, or between the host and local containers, which the kernel forwards directly, is excluded.
Captured packets are forwarded over UDP to the weave router peers on other hosts. When a peer receives such packets, its router injects them into its own bridge interface via pcap or forwards them on to other peers.
The weave routers identify which peer a MAC address is located behind and, combined with topology information, use that to decide the forwarding path; this avoids flooding every packet to every peer and improves network performance.
OpenVSwitch with GRE/VXLAN
OpenVSwitch is used to build the network across nodes. The tunnel type can be VXLAN or GRE (Generic Routing Encapsulation). GRE is used to tunnel data frames over an IP network. A VXLAN frame contains the original layer-2 packet encapsulated with an IP header, a UDP header, and a VXLAN header. VXLAN is better suited to large data centers that need large-scale network isolation. It is worth noting that OpenVSwitch is also the default networking solution for Xen and likewise works with platforms such as KVM, VirtualBox, Proxmox VE, and OpenStack.
Calico
Starting with Kubernetes 1.0, Calico has provided layer-3 networking for Kubernetes pods. Calico offers simple, scalable, secure virtual networking. It uses the Border Gateway Protocol (BGP) to distribute routes for every pod and can integrate a Kubernetes cluster with existing IT infrastructure. Calico is compatible with almost all cloud platforms and is highly scalable in Kubernetes environments. Besides Kubernetes, Calico also supports OpenStack, Mesos, and Docker.
Reference 1: 《Kubernetes权威指南——从Docker到Kubernetes实践全接触》, Chapter 2.
Reference 2: Kubernetes Chinese Community | Chinese documentation
Reference 3: roughly a hundred assorted technical blog posts, github.com issues, and stackoverflow.com questions!