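As is well known, services in a Kubernetes cluster can reach one another through their FQDNs, and the cluster's kube-dns service provides the name resolution behind this. The following command shows that service: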
root@cxy:~# kubectl get svc -n kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   35d
Suppose the cluster contains a service named proxy:
root@cxy:~# kubectl get svc
NAME    TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                                                       AGE
proxy   ClusterIP   10.107.29.12   <none>        80/TCP,443/TCP,7777/TCP,8565/TCP,8665/TCP,8088/TCP,8595/TCP   2d6h
Its in-cluster FQDN is then proxy.default.svc.cluster.local, which nslookup confirms:
root@cxy:/# nslookup proxy
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: proxy.default.svc.cluster.local
Address: 10.107.29.12
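As described above, the service FQDN format that kube-dns supports is proxy.default.svc.cluster.local, where proxy is the service name, default is the namespace, svc marks the name as a Service, and cluster.local is the cluster domain.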
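The naming scheme can be made concrete with a small Python sketch. The helper names `service_fqdn` and `split_service_fqdn` are hypothetical and exist only for illustration; the sketch assumes the default cluster domain cluster.local used throughout this article.

```python
def service_fqdn(service, namespace="default", cluster_domain="cluster.local"):
    """Build the in-cluster DNS name of a Service (illustrative helper)."""
    return "{0}.{1}.svc.{2}".format(service, namespace, cluster_domain)


def split_service_fqdn(fqdn):
    """Split <service>.<namespace>.svc.<cluster-domain> into its parts."""
    labels = fqdn.rstrip(".").split(".")
    # The third label must be the fixed "svc" marker.
    if len(labels) < 4 or labels[2] != "svc":
        raise ValueError("not a service FQDN: {0!r}".format(fqdn))
    return {
        "service": labels[0],
        "namespace": labels[1],
        "cluster_domain": ".".join(labels[3:]),
    }


print(service_fqdn("proxy"))
print(split_service_fqdn("proxy.default.svc.cluster.local"))
```

The fixed svc label in the third position is the part that cannot be changed through configuration, which is what the rest of the article works around.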
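In real production environments, services are usually not containerized from the first stage of development. As a result, barring coincidence, the domain name a service actually needs at runtime is almost never of the form xxxx.default.svc.cluster.local but follows some other format, for example proxy.mgmt.pix.yun.com. How, then, can kube-dns also resolve such cluster-external domain names? Changing the namespace to mgmt and changing the domain in the CoreDNS configuration to yun.com lets kube-dns resolve names of the form xxx.mgmt.svc.yun.com, but because the svc label cannot be configured, proxy.mgmt.pix.yun.com still cannot be turned completely into an internal name that kube-dns can resolve.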
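The method is as follows.
Run the command below and add the rewrite stop configuration block (https://coredns.io/plugins/rewrite/). It rewrites any queried name matching xxx.mgmt.pix.yun.com to xxx.default.svc.cluster.local for resolution, while the name in the returned answer still appears as xxx.mgmt.pix.yun.com: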
root@cxy:/# kubectl edit cm/coredns -n kube-system
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        rewrite stop {
            name regex (.*)\.mgmt\.pix\.yun\.com {1}.default.svc.cluster.local
            answer name (.*)\.default\.svc\.cluster\.local {1}.mgmt.pix.yun.com
        }
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
            max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"Corefile":".:53 {\n errors\n health {\n lameduck 5s\n }\n ready\n #rewrite name regex (.*)\\.mgmt\\.pitrix\\.yunify\\.com {1}.default.svc.cluster.local\n rewrite stop {\n name regex (.*)\\.mgmt\\.pitrix\\.yunify\\.com {1}.default.svc.cluster.local\n answer name (.*)\\.default\\.svc\\.cluster\\.local {1}.mgmt.pitrix.yunify.com\n }\n kubernetes cluster.local in-addr.arpa ip6.arpa {\n #kubernetes mgmt.pitrix.yunify.com in-addr.arpa ip6.arpa {\n pods insecure\n fallthrough in-addr.arpa ip6.arpa\n ttl 30\n }\n prometheus :9153\n forward . /etc/resolv.conf {\n max_concurrent 1000\n }\n cache 30\n loop\n reload\n loadbalance\n}\n"},"kind":"ConfigMap","metadata":{"annotations":{},"creationTimestamp":"2021-03-04T12:01:53Z","managedFields":[{"apiVersion":"v1","fieldsType":"FieldsV1","fieldsV1":{"f:data":{".":{},"f:Corefile":{}}},"manager":"kubeadm","operation":"Update","time":"2021-03-04T12:01:53Z"}],"name":"coredns","namespace":"kube-system","resourceVersion":"280","uid":"fa78b69f-9d99-4ce1-82ce-6ee0ab966e47"}}
  creationTimestamp: "2021-04-08T03:17:54Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "5975594"
  uid: 5f9dc9d1-f9b7-476f-851a-ebdf81bfe165
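To make the effect of the two rules in the rewrite block easier to follow, here is a rough Python approximation of the name mapping. It is not CoreDNS's implementation: the helper `apply_rule` is hypothetical, the patterns are treated as matching the whole name for simplicity, and the trailing dot that DNS names carry on the wire is ignored. The real plugin's matching details may differ.

```python
import re

# The two rules from the rewrite block, written as (pattern, template) pairs.
REQUEST_RULE = (r"(.*)\.mgmt\.pix\.yun\.com", "{1}.default.svc.cluster.local")
ANSWER_RULE = (r"(.*)\.default\.svc\.cluster\.local", "{1}.mgmt.pix.yun.com")


def apply_rule(rule, name):
    """Return the rewritten name, or the name unchanged if the rule does not match."""
    pattern, template = rule
    # Assumption: whole-name matching; CoreDNS's own matching may be looser.
    match = re.fullmatch(pattern, name)
    if match is None:
        return name
    result = template
    # Substitute {1}, {2}, ... with the corresponding capture groups.
    for index, group in enumerate(match.groups(), start=1):
        result = result.replace("{%d}" % index, group)
    return result


internal = apply_rule(REQUEST_RULE, "proxy.mgmt.pix.yun.com")
print(internal)                           # name looked up by the kubernetes plugin
print(apply_rule(ANSWER_RULE, internal))  # name shown in the answer
```

The request rule maps the external name onto the internal service name, and the answer rule maps the name in the response back, so the client sees an answer for the name it asked about.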
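After saving this change, wait about two minutes; the reload plugin in the Corefile picks up the updated configuration automatically. Then verify the result: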
root@cxy:/# nslookup proxy.mgmt.pix.yun.com
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: proxy.mgmt.pix.yun.com
Address: 10.107.29.12
The test above shows that proxy.mgmt.pix.yun.com now resolves to the correct IP of the proxy service. During testing, however, socket.gethostbyname("proxy.mgmt.pix.yun.com") in a Python program still failed to obtain the IP address:
root@cxy:/# python
Python 2.7.12 (default, Mar 1 2021, 11:38:31)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> socket.gethostbyname("proxy.mgmt.pix.yun.com")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
socket.gaierror: [Errno -2] Name or service not known
Some investigation showed that the cause is an option in /etc/resolv.conf (options ndots:5):
root@cxy:/# cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
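This option means that a name containing at least 5 dots is first resolved as a complete domain name. A name with fewer dots is instead treated as a prefix: the resolver first completes it with each suffix listed after search, in order, and resolves each result as a complete domain name. In this article's scenario, "proxy.mgmt.pix.yun.com" contains 4 dots, so the resolver completes it before resolving, and the full names in the resulting queries are: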
proxy.mgmt.pix.yun.com.default.svc.cluster.local
proxy.mgmt.pix.yun.com.svc.cluster.local
proxy.mgmt.pix.yun.com.cluster.local
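These completed names do not resolve properly through CoreDNS, so the IP lookup fails. With the cause identified, the fix is straightforward: lower ndots so that it is no greater than the number of dots in the external domain name to be resolved, for example to 3: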
root@cxy:/# cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:3
Testing again now resolves the IP successfully:
root@cxy:/# python
Python 2.7.12 (default, Mar 1 2021, 11:38:31)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> socket.gethostbyname("proxy.mgmt.pix.yun.com")
'10.107.29.12'
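The effect of ndots on query order can be sketched in Python. This is a simplified model of the behavior described in the resolv.conf(5) man page, not the resolver's actual code; `candidate_names` is a hypothetical helper. In that description, a name with too few dots is still tried as written, but only after every search suffix has been tried.

```python
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]


def candidate_names(name, search, ndots):
    """List the names a stub resolver would query, in order (simplified model)."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute; the search list is skipped.
        return [name]
    expanded = ["{0}.{1}".format(name, suffix) for suffix in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as written before any search suffix.
        return [name] + expanded
    # Too few dots: try the search suffixes first, the name as written last.
    return expanded + [name]


for ndots in (5, 3):
    print(ndots, candidate_names("proxy.mgmt.pix.yun.com", SEARCH, ndots))
```

With ndots set to 5 the first query is for proxy.mgmt.pix.yun.com.default.svc.cluster.local; with ndots set to 3 the first query is for proxy.mgmt.pix.yun.com itself.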
So where can this option in a container's /etc/resolv.conf be set? It turns out that Kubernetes provides DNS configuration options for Pods: the dnsConfig section below sets ndots to 3:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: proxy
  labels:
    app: proxy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: proxy
  template:
    metadata:
      labels:
        app: proxy
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      dnsConfig:
        options:
          - name: ndots
            value: "3"
      containers:
        - name: proxy
          image: proxy
          imagePullPolicy: Always
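One way to confirm that the dnsConfig setting reached the container is to inspect the generated /etc/resolv.conf. The sketch below is a minimal, hypothetical parser (`parse_resolv_conf` is not part of any library) that handles only the nameserver, search, and options keywords seen in this article. It runs against a sample string here; inside a pod, the contents of /etc/resolv.conf could be passed in instead.

```python
SAMPLE = """\
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:3
"""


def parse_resolv_conf(text):
    """Parse the nameserver, search and options lines of a resolv.conf file."""
    config = {"nameservers": [], "search": [], "options": {}}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        keyword, values = fields[0], fields[1:]
        if keyword == "nameserver":
            config["nameservers"].extend(values)
        elif keyword == "search":
            config["search"] = values
        elif keyword == "options":
            for option in values:
                # Options look like "ndots:3"; flags without a value map to "".
                key, _, value = option.partition(":")
                config["options"][key] = value
    return config


config = parse_resolv_conf(SAMPLE)
print(config["options"].get("ndots"))
```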