Table of Contents
Introduction
Basics of k8s DNS
kube-dns customization
Setup our own name server
Prepare the Dockerfile for dnsmasq image
Prepare our dnsmasq pod, i.e. custom name server
Integrate our name server with kube-dns: create the ConfigMap
Add entries to our DNS and test it out
Issues
Pod CIDR
References
Introduction
In some cases the k8s default DNS service does not match our needs; for example, an Oracle RAC database has to resolve the public/private/VIP/SCAN IPs of its nodes. This document describes how we can customize the DNS service, including integrating our own DNS server.
This document assumes you know the basic concepts of k8s and DNS.
This document also assumes you have read this page: multiple network interfaces for a pod, because we will touch on this in a later section.
k8s 1.9+ supports better DNS customization, but our environment is still 1.8 compatible, and uses kube-dns instead of CoreDNS.
Basics of k8s DNS
The default DNS record of a k8s pod is: a-b-c-d.sub-domain.my-namespace.pod.cluster.local
a-b-c-d is the IP string of the pod. For example, a pod my-pod with IP address 10.244.0.2 in the default namespace has an A record "10-244-0-2.default.pod.cluster.local". If the pod specifies a subdomain, such as ora-subdomain, the record becomes "10-244-0-2.ora-subdomain.default.pod.cluster.local". There is no pod-name or hostname record in the DNS server. However, if we create a headless service (a service without a clusterIP) whose name is exactly the same as the subdomain, and the pod also sets its hostname, we get the record "my-pod.ora-subdomain.default.svc.cluster.local" (note the suffix "pod.cluster.local" becomes "svc.cluster.local"). This is also where a StatefulSet's stable network identifiers come from; in that case, in addition to the per-pod records, the service itself (i.e. "sub-domain.my-namespace.svc.cluster.local") resolves to the set of IPs of the pods that belong to the StatefulSet.
A service's DNS record is always my-svc.my-namespace.svc.cluster.local.
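For illustration, here is a minimal sketch of the headless-service pattern described above (the names follow the examples in this section; the port and image are illustrative). With this manifest, my-pod is resolvable as my-pod.ora-subdomain.default.svc.cluster.local:

apiVersion: v1
kind: Service
metadata:
  name: ora-subdomain          # must equal the pods' subdomain
spec:
  clusterIP: None              # headless: no virtual IP
  selector:
    app: ora
  ports:
  - port: 1521                 # illustrative (Oracle listener port)
    name: db
---
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
  labels:
    app: ora                   # must match the service selector
spec:
  hostname: my-pod             # the host part of the DNS name
  subdomain: ora-subdomain
  containers:
  - name: main
    image: oraclelinux:7
    command: ["/bin/bash", "-c", "sleep infinity"]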
kube-dns customization
kube-dns can be extended to support additional DNS name servers. In short, consider the following config map:
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"example.com": ["1.2.3.4"]}
  upstreamNameservers: |
    ["8.8.8.8", "8.8.4.4"]
The custom configurations are stubDomains and upstreamNameservers.
First, let's see what the resolution process is when these configurations do NOT exist:
- The name is resolved according to the default rules of k8s DNS, as stated in the "Basics of k8s DNS" section.
- If the name is not found (e.g. names under the example.com domain), it is forwarded to the upstream name servers inherited from the node.
This means the k8s node's /etc/hosts and /etc/resolv.conf effectively function inside the pod's containers.
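For reference, with the default dnsPolicy (ClusterFirst) a pod's /etc/resolv.conf typically looks like the following sketch; the cluster DNS service IP is an assumption here (it varies per cluster), and the node's search domains are appended after the cluster ones:

nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local cn.my.com my.com mycorp.com
options ndots:5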
Now what if stubDomains and/or upstreamNameservers are specified? The process is:
- For names with the cluster suffix, i.e. ".cluster.local", the request is sent to kube-dns.
- For names with the stub domain suffix, i.e. ".example.com", the request is sent to the configured custom DNS server, listening at 1.2.3.4.
- For names without a matching suffix, for example "github.com", the request is forwarded to the upstream DNS, in this case 8.8.8.8 and 8.8.4.4 (Google's public DNS servers).
Important: If upstreamNameservers is specified, we lose the node's resolver inheritance. A natural solution is to copy the node's resolver configuration into the custom name server.
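To verify which path a given name takes, resolve it from inside the cluster (a sketch; "some-pod" is a placeholder for any pod that has nslookup installed):

# Cluster-suffix name: answered by kube-dns itself
kubectl exec -ti some-pod -- nslookup kubernetes.default.svc.cluster.local
# Stub-domain name: forwarded to the custom server at 1.2.3.4
kubectl exec -ti some-pod -- nslookup host1.example.com
# Any other name: forwarded to the upstream servers 8.8.8.8 / 8.8.4.4
kubectl exec -ti some-pod -- nslookup github.com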
Setup our own name server
So what will our custom name server be? dnsmasq is an easy-to-configure, widely used one (kube-dns itself uses it). In this section, we will set up our own dnsmasq server from scratch.
Prepare the Dockerfile for dnsmasq image
First, write the Dockerfile:
Dockerfile
FROM oraclelinux:7

# Install dnsmasq
RUN yum install -y dnsmasq
# Install netcat
RUN yum install -y nc

# Pre-configure dnsmasq
RUN echo 'listen-address=__LOCAL_IP__' >> /etc/dnsmasq.conf
RUN echo 'resolv-file=/etc/resolv.dnsmasq.conf' >> /etc/dnsmasq.conf
RUN echo 'conf-dir=/etc/dnsmasq.d' >> /etc/dnsmasq.conf

RUN touch /etc/resolv.dnsmasq.conf
# Copied from node host /etc/resolv.conf, TODO: automate this
RUN echo 'nameserver my.internal.name.server1.ip' >> /etc/resolv.dnsmasq.conf
RUN echo 'nameserver my.internal.name.server2.ip' >> /etc/resolv.dnsmasq.conf
RUN echo 'nameserver my.internal.name.server3.ip' >> /etc/resolv.dnsmasq.conf
RUN echo 'search cn.my.com my.com mycorp.com' >> /etc/resolv.dnsmasq.conf

# This directory will usually be provided with the -v option.
# RUN echo 'address=/example.com/xx.xx.xx.xx' >> /etc/dnsmasq.d/0hosts
# On the other hand the above directory isn't reloaded with a SIGHUP. Instead
# we can use an --addn-hosts file, see run.sh.
RUN touch /etc/addn-hosts

ADD run.sh /root/run.sh

EXPOSE 22
EXPOSE 53
EXPOSE 12345

CMD /root/run.sh
Note that we explicitly specify "resolv-file=/etc/resolv.dnsmasq.conf" to replace the default /etc/resolv.conf and thus bypass the k8s pod's default resolving. We copy the node host's name servers and search domains, so our custom name server can still resolve names outside the k8s cluster. Actually, what we copy here is the kube-dns pod's /etc/resolv.conf, which in turn (partly) copies the node's file:
bash-4.2$ kubectl get po -n kube-system | grep "kube-dns"
NAME                        READY   STATUS    RESTARTS   AGE
kube-dns-545bc4bfd4-zjsz8   3/3     Running   4          4d
bash-4.2$ kubectl exec kube-dns-545bc4bfd4-zjsz8 -c kubedns -n kube-system cat /etc/resolv.conf
nameserver my.internal.name.server1.ip
nameserver my.internal.name.server2.ip
nameserver my.internal.name.server3.ip
search cn.my.com my.com mycorp.com
bash-4.2$ cat /etc/resolv.conf
options timout:1
options attempts:2
; generated by /usr/sbin/dhclient-script
search cn.my.com mycorp.com my.com
nameserver my.internal.name.server1.ip
nameserver my.internal.name.server2.ip
nameserver my.internal.name.server3.ip
This /etc/resolv.conf specifies the "upstream" name servers of kube-dns, and shows how kube-dns inherits the node's name resolution.
The run.sh specified in the CMD field of the Dockerfile is:
#!/bin/bash
sed -i s/__LOCAL_IP__/$POD_IP/ /etc/dnsmasq.conf
dnsmasq --addn-hosts=/etc/addn-hosts &
echo "Config is /etc/dnsmasq.conf"
echo "--addn-hosts=/etc/addn-hosts"
# Start netcat, listen on port 12345 and reload the host line automatically on the fly
while [ 1 ]; do
  #sleep 3600
  m=$(nc -l 0.0.0.0 12345)
  echo $m
  echo $m >> /etc/addn-hosts
  kill -HUP $(pgrep dnsmasq)
  echo "-- Reloaded --"
done
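With this loop in place, adding a record at runtime is just a matter of writing a hosts-file line to port 12345; nc accepts one connection per iteration, and each received line triggers a SIGHUP reload of dnsmasq. A sketch (the IP and names are illustrative; <dnsmasq-ip> stands for the address of the name server pod or service we create below):

echo "10.0.0.5 myhost.example.com myhost" > /dev/tcp/<dnsmasq-ip>/12345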
Put it in the same directory as the Dockerfile and build the image:
docker build --rm --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy --build-arg no_proxy=$no_proxy -t my/dnsmasq .
Prepare our dnsmasq pod, i.e. custom name server
The YAML for the pod and its service:
dnsmasq.yaml
apiVersion: v1
kind: Pod
metadata:
  name: mydns
  labels:
    app: mydns
spec:
  containers:
  - image: my/dnsmasq
    name: mydns
    command: ["/bin/sh", "-c", "/root/run.sh"]
    imagePullPolicy: Never
    securityContext:
      privileged: true
    env:
    - name: POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
---
apiVersion: v1
kind: Service
metadata:
  name: mydns-service
  labels:
    app: mydns-service
spec:
  ports:
  - port: 12345
    name: inbound-port
  - port: 53
    name: dns-service-port
    protocol: UDP    # DNS queries are typically UDP; a service port defaults to TCP otherwise
  selector:
    app: mydns
Then create the pod and service:
kubectl apply -f dnsmasq.yaml
Integrate our name server with kube-dns: create the ConfigMap
After the pod mydns starts, use "kubectl logs ..." to check whether it started successfully, and then get its IP:
bash-4.2$ kubectl logs mydns
Config is /etc/dnsmasq.conf
--addn-hosts=/etc/addn-hosts
bash-4.2$ kubectl get po -o wide | grep mydns | awk '{print $6}'
10.244.0.11
Now write our ConfigMap yaml file using this IP:
cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"example.com": ["10.244.0.11"]}
  upstreamNameservers: |
    ["10.244.0.11"]
Apply it:
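kubectl apply -f cm.yaml

kube-dns watches its ConfigMap, so the new stubDomains/upstreamNameservers should take effect without restarting the kube-dns pod, though the change may take a short while to propagate.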
Add entries to our DNS and test it out
As stated previously, we assume you have read Multiple Network Interfaces for a k8s pod; now we will test our DNS server with a simple pod that enables multiple network interfaces. The test pod is an ole7 (Oracle Linux 7) container that does nothing but sleep:
ole7.yaml
apiVersion: v1
kind: Pod
metadata:
  name: ole7
  annotations:
    k8s.v1.cni.cncf.io/networks: macvlan-conf
spec:
  volumes:
  - name: tempdir
    emptyDir: {}
  initContainers:
  - name: get-priv-ip
    image: oraclelinux:7
    command: ["/bin/bash", "-c", "ip a | grep 192.168 | awk '{ print substr($2, 1, index($2, \"/\") - 1) }' 2>&1 | tee /temp/PRIV_IP"]
    volumeMounts:
    - name: "tempdir"
      mountPath: "/temp"
  containers:
  - name: ole7
    command: ["/bin/bash", "-c", "sleep 2000000000000"]
    image: oraclelinux:7
    env:
    - name: POD_IP
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    volumeMounts:
    - name: "tempdir"
      mountPath: "/temp"
Create it by:
kubectl apply -f ole7.yaml
The pod has two IP addresses: one is the flannel default (displayed by "kubectl get pod ..."), and one is stored in the file /temp/PRIV_IP in the container.
Once the pod is up, we can test it out. In the following example, I got both the public and private IPs of pod "ole7" and dynamically added them to our custom name server (by redirecting echo output to /dev/tcp/mydns-service/12345), giving them names. Next, I pinged all the names, with and without the domain suffix. I also pinged some hosts outside the k8s cluster, e.g. vm09xxl and home.cn.my.com, and all the commands succeeded.
Test commands
bash-4.2$ kubectl get po -o wide | grep ole7 | awk '{print $6}'
10.244.0.26
bash-4.2$ kubectl exec ole7 -- cat /temp/PRIV_IP
192.168.1.211
bash-4.2$ kubectl exec -ti ole7 -- bash
[root@ole7 /]# echo "10.244.0.26 ole.example.com ole" > /dev/tcp/mydns-service/12345
[root@ole7 /]# echo "192.168.1.211 ole-priv.example.com ole-priv" > /dev/tcp/mydns-service/12345
[root@ole7 /]# ping -c 3 ole
PING ole (10.244.0.26) 56(84) bytes of data.
64 bytes from ole7 (10.244.0.26): icmp_seq=1 ttl=64 time=0.057 ms
64 bytes from ole7 (10.244.0.26): icmp_seq=2 ttl=64 time=0.057 ms
64 bytes from ole7 (10.244.0.26): icmp_seq=3 ttl=64 time=0.038 ms

--- ole ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.038/0.050/0.057/0.012 ms
[root@ole7 /]# ping -c 3 ole.example.com
PING ole.example.com (10.244.0.26) 56(84) bytes of data.
64 bytes from ole7 (10.244.0.26): icmp_seq=1 ttl=64 time=0.022 ms
64 bytes from ole7 (10.244.0.26): icmp_seq=2 ttl=64 time=0.032 ms
64 bytes from ole7 (10.244.0.26): icmp_seq=3 ttl=64 time=0.038 ms

--- ole.example.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.022/0.030/0.038/0.009 ms
[root@ole7 /]# ping -c 3 ole-priv
PING ole-priv (192.168.1.211) 56(84) bytes of data.
64 bytes from ole7 (192.168.1.211): icmp_seq=1 ttl=64 time=0.023 ms
64 bytes from ole7 (192.168.1.211): icmp_seq=2 ttl=64 time=0.045 ms
64 bytes from ole7 (192.168.1.211): icmp_seq=3 ttl=64 time=0.043 ms

--- ole-priv ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.023/0.037/0.045/0.009 ms
[root@ole7 /]# ping -c 3 ole-priv.example.com
PING ole-priv.example.com (192.168.1.211) 56(84) bytes of data.
64 bytes from ole7 (192.168.1.211): icmp_seq=1 ttl=64 time=0.026 ms
64 bytes from ole7 (192.168.1.211): icmp_seq=2 ttl=64 time=0.054 ms
64 bytes from ole7 (192.168.1.211): icmp_seq=3 ttl=64 time=0.038 ms

--- ole-priv.example.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.026/0.039/0.054/0.012 ms
[root@ole7 /]# ping -c 3 vm09xxl
PING vm09xxl.cn.my.com (10.xxx.xxx.xxx) 56(84) bytes of data.
64 bytes from vm09xxl.cn.my.com (10.xxx.xxx.xxx): icmp_seq=1 ttl=57 time=0.703 ms
64 bytes from vm09xxl.cn.my.com (10.xxx.xxx.xxx): icmp_seq=2 ttl=57 time=0.716 ms
64 bytes from vm09xxl.cn.my.com (10.xxx.xxx.xxx): icmp_seq=3 ttl=57 time=0.591 ms

--- vm09xxl.cn.my.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.591/0.670/0.716/0.056 ms
[root@ole7 /]# ping -c 3 home.cn.my.com
PING vmibfj.cn.my.com (10.xxx.x.xxx) 56(84) bytes of data.
64 bytes from vmibfj.cn.my.com (10.xxx.x.xxx): icmp_seq=1 ttl=57 time=0.602 ms
64 bytes from vmibfj.cn.my.com (10.xxx.x.xxx): icmp_seq=2 ttl=57 time=0.657 ms
64 bytes from vmibfj.cn.my.com (10.xxx.x.xxx): icmp_seq=3 ttl=57 time=0.684 ms

--- vmibfj.cn.my.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.602/0.647/0.684/0.045 ms
We added two names for each pod IP, e.g. "ole.example.com" and "ole", to addn-hosts in our name server. When a name resolution request reaches kube-dns:
- If the name carries the stub domain suffix (e.g. "ole.example.com"), the request is forwarded to the custom name server specified in the "stubDomains" attribute of the ConfigMap.
- If the name has no suffix (e.g. "ole"), it is first handled by kube-dns itself, which appends the "...cluster.local" search suffixes; since the name cannot be found in the k8s cluster, kube-dns forwards it to the "upstreamNameservers", which in our case is the same custom name server, where the name is found in addn-hosts.
- A name such as "vm09xxl" follows the same path but is not in addn-hosts either, so the name server consults its /etc/resolv.dnsmasq.conf and forwards the request to the upstream name servers specified there.
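To observe each hop in isolation, the servers can also be queried directly (a sketch; 10.244.0.11 is our dnsmasq pod from above, while 10.96.0.10 is an assumed kube-dns service IP; check yours with "kubectl get svc kube-dns -n kube-system"):

# Ask the custom dnsmasq directly; the answer comes from addn-hosts
nslookup ole.example.com 10.244.0.11
# Ask kube-dns; it forwards to the stub domain server and returns the same answer
nslookup ole.example.com 10.96.0.10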
Issues
Pod CIDR
Our current pod_network_cidr is "10.244.0.0/16", and the IPs allocated to pods conflict with those of our VMs. For example, 10.244.0.25 is somevm1.cn.my.com and 10.244.0.26 is somevm2.cn.my.com. This may cause confusion in name resolution. We may need to change pod_network_cidr to another value.
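A sketch of the change, assuming a kubeadm-bootstrapped cluster with flannel; 10.210.0.0/16 is an arbitrary example range that must be checked against the corporate network plan before use:

# Choose a non-conflicting range when (re)creating the cluster
kubeadm init --pod-network-cidr=10.210.0.0/16
# The "Network" value in flannel's net-conf.json (in kube-flannel.yml) must be changed to match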
References
https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/
https://kubernetes.io/blog/2017/04/configuring-private-dns-zones-upstream-nameservers-kubernetes/
https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/
https://github.com/noteed/docker-dnsmasq
https://pracucci.com/kubernetes-dns-resolution-ndots-options-and-why-it-may-affect-application-performances.html