本文为Kubernetes监控系列的第二篇文章,系列目录如下:
Kubernetes监控开源工具基本介绍以及如何使用Sysdig进行监控
Kubernetes集群的监控报警策略最佳实践
Kubernetes中的服务发现与故障排除(本篇)
Docker与Kubernetes在WayBlazer的使用案例(敬请期待)
它们是孤立的,你和你要监视的进程之间存在一定的障碍,在主机上运行的传统故障排除工具不了解容器、命名空间和编排平台。
它们带来最小的运行时间,只需要服务及其依赖项,而不需要所有的故障排除工具,想象下只用busybox来进行故障排除!
它们调度在你集群、进行容器移动,扩容或者缩容。随着流程的结束,它们变得非常不稳定,出现并消失。
并通过新的虚拟网络层互相通信。
第一部分:Kubernetes故障排除之服务发现
第二部分:Kubernetes故障排除之DNS解析
第三部分:Kubernetes故障排除之使用Docker运行容器
$ kubectl create namespace critical-app
namespace "critical-app" created
$ kubectl create -f backend.yaml
service "backend" created
deployment "backend" created
$ kubectl run -it --image=tutum/curl client --namespace critical-app --restart=Never
$ sudo sysdig -k http://127.0.0.1:8080 -s8192 -zw capture.scap
-k http://localhost:8080 连接Kubernetes API
-s8192 扩大IO缓冲区,因为我们需要显示全部内容,否则默认情况下会被切断,
-zw capture.scap 将系统调用与元数据信息压缩并dump到文件中
$ sysdig -r capture.scap -pk -NA "fd.type in (ipv4, ipv6) and (k8s.ns.name=critical-app or proc.name=skydns)" | less
-r capture.scap 读取捕获文件
-pk 打印Kubernetes部分到stdout中
-NA 显示ASCII输出
[...]
10030 16:41:39.536689965 0 client (b3a718d8b339) curl (22370:13) < socket fd=3(<4>)
10031 16:41:39.536694724 0 client (b3a718d8b339) curl (22370:13) > connect fd=3(<4>)
10032 16:41:39.536703160 0 client (b3a718d8b339) curl (22370:13) < connect res=0 tuple=172.17.0.7:46162->10.0.2.15:53
10048 16:41:39.536831645 1 (36ae6d09d26e) skydns (17280:11) > recvmsg fd=6(<3t>:::53)
10049 16:41:39.536834352 1 (36ae6d09d26e) skydns (17280:11) < recvmsg res=87 size=87 data=
backendcritical-appsvcclusterlocalcritical-appsvcclusterlocal tuple=::ffff:172.17.0.7:46162->:::53
10050 16:41:39.536837173 1 (36ae6d09d26e) skydns (17280:11) > recvmsg fd=6(<3t>:::53)
[...]
[...]
10096 16:41:39.538247116 1 (36ae6d09d26e) skydns (4639:8) > write fd=3(<4t>10.0.2.15:34108->10.0.2.15:4001) size=221
10097 16:41:39.538275108 1 (36ae6d09d26e) skydns (4639:8) < write res=221 data=
GET /v2/keys/skydns/local/cluster/svc/critical-app/local/cluster/svc/critical-app/backend?quorum=false&recursive=true&sorted=false HTTP/1.1
Host: 10.0.2.15:4001
User-Agent: Go 1.1 package http
Accept-Encoding: gzip
10166 16:41:39.538636659 1 (36ae6d09d26e) skydns (4617:1) > read fd=3(<4t>10.0.2.15:34108->10.0.2.15:4001) size=4096
10167 16:41:39.538638040 1 (36ae6d09d26e) skydns (4617:1) < read res=285 data=
HTTP/1.1 404 Not Found
Content-Type: application/json
X-Etcd-Cluster-Id: 7e27652122e8b2ae
X-Etcd-Index: 1259
Date: Thu, 08 Dec 2016 15:41:39 GMT
Content-Length: 112
{"errorCode":100,"message":"Key not found","cause":"/skydns/local/cluster/svc/critical-app/local","index":1259}
[...]
[...]
10218 16:41:39.538914765 0 client (b3a718d8b339) curl (22370:13) < connect res=0 tuple=172.17.0.7:35547->10.0.2.15:53
10242 16:41:39.539005618 1 (36ae6d09d26e) skydns (17280:11) < recvmsg res=74 size=74 data=
backendcritical-appsvcclusterlocalsvcclusterlocal tuple=::ffff:172.17.0.7:35547->:::53
10247 16:41:39.539018226 1 (36ae6d09d26e) skydns (17280:11) > recvmsg fd=6(<3t>:::53)
10248 16:41:39.539019925 1 (36ae6d09d26e) skydns (17280:11) < recvmsg res=74 size=74 data=
0]backendcritical-appsvcclusterlocalsvcclusterlocal tuple=::ffff:172.17.0.7:35547->:::53
10249 16:41:39.539022522 1 (36ae6d09d26e) skydns (17280:11) > recvmsg fd=6(<3t>:::53)
10273 16:41:39.539210393 1 (36ae6d09d26e) skydns (4639:8) > write fd=3(<4t>10.0.2.15:34108->10.0.2.15:4001) size=208
10274 16:41:39.539239613 1 (36ae6d09d26e) skydns (4639:8) < write res=208 data=
GET /v2/keys/skydns/local/cluster/svc/local/cluster/svc/critical-app/backend?quorum=false&recursive=true&sorted=false HTTP/1.1
Host: 10.0.2.15:4001
User-Agent: Go 1.1 package http
Accept-Encoding: gzip
10343 16:41:39.539465153 1 (36ae6d09d26e) skydns (4617:1) > read fd=3(<4t>10.0.2.15:34108->10.0.2.15:4001) size=4096
10345 16:41:39.539467440 1 (36ae6d09d26e) skydns (4617:1) < read res=271 data=
HTTP/1.1 404 Not Found
[...]
[...]
10396 16:41:39.539686075 0 client (b3a718d8b339) curl (22370:13) < connect res=0 tuple=172.17.0.7:40788->10.0.2.15:53
10418 16:41:39.539755453 0 (36ae6d09d26e) skydns (17280:11) < recvmsg res=70 size=70 data=
backendcritical-appsvcclusterlocalclusterlocal tuple=::ffff:172.17.0.7:40788->:::53
10433 16:41:39.539800679 0 (36ae6d09d26e) skydns (17280:11) > recvmsg fd=6(<3t>:::53)
10434 16:41:39.539802549 0 (36ae6d09d26e) skydns (17280:11) < recvmsg res=70 size=70 data=
backendcritical-appsvcclusterlocalclusterlocal tuple=::ffff:172.17.0.7:40788->:::53
10437 16:41:39.539805177 0 (36ae6d09d26e) skydns (17280:11) > recvmsg fd=6(<3t>:::53)
10478 16:41:39.540166087 1 (36ae6d09d26e) skydns (4639:8) > write fd=3(<4t>10.0.2.15:34108->10.0.2.15:4001) size=204
10479 16:41:39.540183401 1 (36ae6d09d26e) skydns (4639:8) < write res=204 data=
GET /v2/keys/skydns/local/cluster/local/cluster/svc/critical-app/backend?quorum=false&recursive=true&sorted=false HTTP/1.1
Host: 10.0.2.15:4001
User-Agent: Go 1.1 package http
Accept-Encoding: gzip
10523 16:41:39.540421040 1 (36ae6d09d26e) skydns (4617:1) > read fd=3(<4t>10.0.2.15:34108->10.0.2.15:4001) size=4096
10524 16:41:39.540422241 1 (36ae6d09d26e) skydns (4617:1) < read res=267 data=
HTTP/1.1 404 Not Found
[...]
[...]
10690 16:41:39.541376928 1 (36ae6d09d26e) skydns (4639:8) > connect fd=8(<4>)
10691 16:41:39.541381577 1 (36ae6d09d26e) skydns (4639:8) < connect res=0 tuple=10.0.2.15:44249->8.8.8.8:53
10702 16:41:39.541415384 1 (36ae6d09d26e) skydns (4639:8) > write fd=8(<4u>10.0.2.15:44249->8.8.8.8:53) size=68
10703 16:41:39.541531434 1 (36ae6d09d26e) skydns (4639:8) < write res=68 data=
Nbackendcritical-appsvcclusterlocallocaldomain
10717 16:41:39.541629507 1 (36ae6d09d26e) skydns (4639:8) > read fd=8(<4u>10.0.2.15:44249->8.8.8.8:53) size=512
10718 16:41:39.541632726 1 (36ae6d09d26e) skydns (4639:8) < read res=-11(EAGAIN) data=
58215 16:41:43.541261462 1 (36ae6d09d26e) skydns (4640:9) > close fd=7(<4u>10.0.2.15:54272->8.8.8.8:53)
58216 16:41:43.541263355 1 (36ae6d09d26e) skydns (4640:9) < close res=0
[...]
[...]
58292 16:41:43.542822050 1 (36ae6d09d26e) skydns (4640:9) < write res=68 data=
Nbackendcritical-appsvcclusterlocallocaldomain
58293 16:41:43.542829001 1 (36ae6d09d26e) skydns (4640:9) > read fd=8(<4u>10.0.2.15:56371->8.8.8.8:53) size=512
58294 16:41:43.542831896 1 (36ae6d09d26e) skydns (4640:9) < read res=-11(EAGAIN) data=
75904 16:41:44.543459524 0 (36ae6d09d26e) skydns (17280:11) < recvmsg res=68 size=68 data=
[...]
75923 16:41:44.543560717 0 (36ae6d09d26e) skydns (17280:11) < recvmsg res=68 size=68 data=
Nbackendcritical-appsvcclusterlocallocaldomain tuple=::ffff:172.17.0.7:47441->:::53
75927 16:41:44.543569823 0 (36ae6d09d26e) skydns (17280:11) > recvmsg fd=6(<3t>:::53)
104208 16:41:47.551459027 1 (36ae6d09d26e) skydns (4640:9) > close fd=7(<4u>10.0.2.15:42126->8.8.8.8:53)
104209 16:41:47.551460674 1 (36ae6d09d26e) skydns (4640:9) < close res=0
[...]
[...]
104406 16:41:47.552626262 0 (36ae6d09d26e) skydns (4639:8) < write res=190 data=
GET /v2/keys/skydns/local/cluster/svc/critical-app/backend?quorum=false&recursive=true&sorted=false HTTP/1.1
[...]
104457 16:41:47.552919333 1 (36ae6d09d26e) skydns (4617:1) < read res=543 data=
HTTP/1.1 200 OK
[...]
{"action":"get","node":{"key":"/skydns/local/cluster/svc/critical-app/backend","dir":true,"nodes":[{"key":"/skydns/local/cluster/svc/critical-app/back
end/6ead029a","value":"{\"host\":\"10.3.0.214\",\"priority\":10,\"weight\":10,\"ttl\":30,\"targetstrip\":0}","modifiedIndex":270,"createdIndex":270}],
"modifiedIndex":270,"createdIndex":270}}
[...]
107992 16:41:48.087346702 1 client (b3a718d8b339) curl (22369:12) < connect res=-115(EINPROGRESS) tuple=172.17.0.7:36404->10.3.0.214:80
108002 16:41:48.087377769 1 client (b3a718d8b339) curl (22369:12) > sendto fd=3(<4t>172.17.0.7:36404->10.3.0.214:80) size=102 tuple=NULL
108005 16:41:48.087401339 0 backend-1440326531-csj02 (730a6f492270) nginx (7203:6) < accept fd=3(<4t>172.17.0.7:36404->172.17.0.5:80) tuple=172.17.0.7:36404->172.17.0.5:80 queuepct=0 queuelen=0 queuemax=128
108006 16:41:48.087406626 1 client (b3a718d8b339) curl (22369:12) < sendto res=102 data=
GET / HTTP/1.1
[...]
108024 16:41:48.087541774 0 backend-1440326531-csj02 (730a6f492270) nginx (7203:6) < writev res=238 data=
HTTP/1.1 200 OK
Server: nginx/1.10.2
[...]
$ sysdig -pk -NA -r capture.scap -c echo_fds "fd.type=file and fd.name=/etc/resolv.conf"
------ Read 119B from [k8s_client.eee910bc_client_critical-app_da587b4d-bd5a-11e6-8bdb-028ce2cfb533_bacd01b6] [b3a718d8b339] /etc/resolv.conf (curl)
search critical-app.svc.cluster.local svc.cluster.local cluster.local localdomain
nameserver 10.0.2.15
options ndots:5
[...]
$ sudo sysdig -pk -s8192 -c echo_fds -NA "fd.type in (unix) and evt.buffer contains localdomain"
$ kubectl run -it --image=tutum/curl client-foobar --namespace critical-app --restart=Never
[...]
------ Write 2.61KB to [k8s-kubelet] [de7157ba23c4] (hyperkube)
POST /containers/create?name=k8s_POD.d8dbe16c_client-foobar_critical-app_085ac98f-bd64-11e6-8bdb-028ce2cfb533_9430448e HTTP/1.1
Host: docker
[...]
"DnsSearch":["critical-app.svc.cluster.local","svc.cluster.local","cluster.local","localdomain"],
[...]
https://github.com/gianlucaborello/healthchecker-kubernetes/blob/master/manifests/backend.yaml
https://sysdig.com/blog/understanding-how-kubernetes-services-dns-work/
http://kubernetes.io/docs/user-guide/services/#dns
http://blog.kubernetes.io/2015/11/monitoring-Kubernetes-with-Sysdig.html
https://github.com/bencer/troubleshooting-docker-kubernetes-sysdig/raw/master/capture.scap
http://man7.org/linux/man-pages/man5/resolv.conf.5.html
http://www.sysdig.org/install/
https://www.sysdig.com/