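We know that containers communicate with each other over the overlay network. For example, 10.0.9.3 and 10.0.9.5 in the figure above talk to each other through a VXLAN tunnel.
Communication between services, however, goes through a VIP. Take the client service talking to the web service: web has been scaled to several replicas, so client reaches web by accessing a virtual IP (VIP). How is the VIP mapped to a concrete address such as 10.0.9.5 or 10.0.9.6? That is done by LVS.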
What is LVS?
LVS stands for Linux Virtual Server. It provides load balancing at the operating-system level.
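We can reach wordpress on port 80 from any of the three nodes; this is what the ingress network provides. When a published port is accessed on any swarm node, the request goes through that node's IPVS (IP Virtual Server), and LVS load-balances it to a node that actually runs the service. In the figure above, for example, a request sent to Docker Host3 is forwarded to the other two nodes.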
Our lab environment is the same as in the previous section. We scale whoami to 2:
iie4bu@swarm-manager:~$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
h4wlczp85sw5 client replicated 1/1 busybox:1.28.3
9i6wz6cg4koc whoami replicated 3/3 jwilder/whoami:latest *:8000->8000/tcp
iie4bu@swarm-manager:~$ docker service scale whoami=2
whoami scaled to 2
overall progress: 2 out of 2 tasks
1/2: running [==================================================>]
2/2: running [==================================================>]
verify: Service converged
Check where whoami is running:
iie4bu@swarm-manager:~$ docker service ps whoami
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
6hhuf528spdw whoami.1 jwilder/whoami:latest swarm-manager Running Running 17 hours ago
9idgk9jbrlcm whoami.3 jwilder/whoami:latest swarm-worker2 Running Running 16 hours ago
The two tasks are running on swarm-manager and swarm-worker2.
Access whoami from swarm-manager:
iie4bu@swarm-manager:~$ curl 127.0.0.1:8000
I'm cc9f97cc5056
iie4bu@swarm-manager:~$ curl 127.0.0.1:8000
I'm f47e05019fd9
Access whoami from swarm-worker1:
iie4bu@swarm-worker1:~$ curl 127.0.0.1:8000
I'm f47e05019fd9
iie4bu@swarm-worker1:~$ curl 127.0.0.1:8000
I'm cc9f97cc5056
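The alternating replies above can be counted rather than eyeballed. The following is a minimal sketch, assuming the service is published on port 8000 of the local node as in this walkthrough; `fetch_whoami` and `tally_responders` are hypothetical helper names, not part of Docker.

```python
from collections import Counter
from urllib.request import urlopen


def fetch_whoami(url="http://127.0.0.1:8000", timeout=2.0):
    """Return the body of one HTTP GET, e.g. "I'm cc9f97cc5056"."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode().strip()


def tally_responders(n, fetch=fetch_whoami):
    """Call fetch() n times and count how often each reply was seen."""
    return Counter(fetch() for _ in range(n))
```

Calling `tally_responders(10)` on a swarm node should list both container IDs with roughly equal counts if requests are being spread evenly. The `fetch` parameter is injectable so the counting logic can be exercised without a running swarm.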
Why can swarm-worker1 reach port 8000 even though no whoami task runs on it locally?
iptables shows the local forwarding rules:
iie4bu@swarm-worker1:~$ sudo iptables -nL -t nat
[sudo] password for iie4bu:
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
DOCKER-INGRESS all -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL
DOCKER all -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
DOCKER-INGRESS all -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL
DOCKER all -- 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-type LOCAL
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
MASQUERADE all -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match src-type LOCAL
MASQUERADE all -- 172.17.0.0/16 0.0.0.0/0
MASQUERADE all -- 172.19.0.0/16 0.0.0.0/0
MASQUERADE all -- 172.18.0.0/16 0.0.0.0/0
MASQUERADE tcp -- 172.17.0.2 172.17.0.2 tcp dpt:443
MASQUERADE tcp -- 172.17.0.2 172.17.0.2 tcp dpt:80
MASQUERADE tcp -- 172.17.0.2 172.17.0.2 tcp dpt:22
Chain DOCKER (2 references)
target prot opt source destination
RETURN all -- 0.0.0.0/0 0.0.0.0/0
RETURN all -- 0.0.0.0/0 0.0.0.0/0
RETURN all -- 0.0.0.0/0 0.0.0.0/0
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:443 to:172.17.0.2:443
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:81 to:172.17.0.2:80
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:23 to:172.17.0.2:22
Chain DOCKER-INGRESS (2 references)
target prot opt source destination
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:8000 to:172.19.0.2:8000
RETURN all -- 0.0.0.0/0 0.0.0.0/0
Look at the DOCKER-INGRESS chain: its rule says that TCP traffic to port 8000 is forwarded to 172.19.0.2:8000. So what is 172.19.0.2:8000?
First, look at the local IP addresses:
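To make the reading of that rule concrete, here is a small sketch that pulls the matched port and the DNAT target out of the line. `parse_dnat` is a hypothetical helper that only handles the simple `dpt:<port> to:<ip>:<port>` shape seen in this output; it is not a general iptables parser.

```python
import re

# One line copied from the DOCKER-INGRESS chain in the iptables output.
RULE = "DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:8000 to:172.19.0.2:8000"


def parse_dnat(rule):
    """Return (protocol, matched port, target ip, target port) for a DNAT line."""
    m = re.search(r"^DNAT\s+(\w+)\s.*\bdpt:(\d+)\s+to:([\d.]+):(\d+)", rule)
    if m is None:
        raise ValueError(f"not a simple DNAT rule: {rule!r}")
    proto, dpt, ip, port = m.groups()
    return proto, int(dpt), ip, int(port)


print(parse_dnat(RULE))  # ('tcp', 8000, '172.19.0.2', 8000)
```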
iie4bu@swarm-worker1:~$ ifconfig
br-3f2fc691f5da Link encap:Ethernet HWaddr 02:42:c8:f4:03:ad
inet addr:172.18.0.1 Bcast:172.18.255.255 Mask:255.255.0.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
docker0 Link encap:Ethernet HWaddr 02:42:43:69:b7:60
inet addr:172.17.0.1 Bcast:172.17.255.255 Mask:255.255.0.0
inet6 addr: fe80::42:43ff:fe69:b760/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:83 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:9208 (9.2 KB)
docker_gwbridge Link encap:Ethernet HWaddr 02:42:41:bf:4d:15
inet addr:172.19.0.1 Bcast:172.19.255.255 Mask:255.255.0.0
inet6 addr: fe80::42:41ff:febf:4d15/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:42 errors:0 dropped:0 overruns:0 frame:0
TX packets:142 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:3556 (3.5 KB) TX bytes:13857 (13.8 KB)
......
......
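There is a local bridge called docker_gwbridge with IP 172.19.0.1, which is in the same subnet as 172.19.0.2. So we can guess that 172.19.0.2 belongs to an interface attached to docker_gwbridge. brctl show lists the attached interfaces: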
iie4bu@swarm-worker1:~$ brctl show
bridge name bridge id STP enabled interfaces
br-3f2fc691f5da 8000.0242c8f403ad no
docker0 8000.02424369b760 no veth75a496d
docker_gwbridge 8000.024241bf4d15 no veth400f4b4
veth44af5f8
docker_gwbridge has two interfaces attached. Which of them is the one we are looking for?
Start with docker network ls:
iie4bu@swarm-worker1:~$ docker network ls
NETWORK ID NAME DRIVER SCOPE
bdf23298113d bridge bridge local
969e60257ba5 docker_gwbridge bridge local
cdcffe1b31cb host host local
uz1kgf9j6m48 ingress overlay swarm
cojus8blvkdo my-demo overlay swarm
3f2fc691f5da network_default bridge local
dba4587ee914 none null local
Inspect docker_gwbridge in detail:
iie4bu@swarm-worker1:~$ docker network inspect docker_gwbridge
[
{
"Name": "docker_gwbridge",
"Id": "969e60257ba50b070374d31ea43a0550d6cd3ae3e68623746642fe8736dee5a4",
"Created": "2019-04-08T09:47:59.343371327+08:00",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "172.19.0.0/16",
"Gateway": "172.19.0.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"5559895ccaea972dad4f2fe52c0ec754d2d7c485dceb35083719768f611552e7": {
"Name": "gateway_bf5031da0049",
"EndpointID": "6ad54f228134798de719549c0f93c804425beb85dd97f024408c4f7fc393fdf9",
"MacAddress": "02:42:ac:13:00:03",
"IPv4Address": "172.19.0.3/16",
"IPv6Address": ""
},
"ingress-sbox": {
"Name": "gateway_ingress-sbox",
"EndpointID": "177757eca7a18630ae91c01b8ac67bada25ce1ea050dad4ac5cc318093062003",
"MacAddress": "02:42:ac:13:00:02",
"IPv4Address": "172.19.0.2/16",
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.bridge.enable_icc": "false",
"com.docker.network.bridge.enable_ip_masquerade": "true",
"com.docker.network.bridge.name": "docker_gwbridge"
},
"Labels": {}
}
]
Two containers are attached to docker_gwbridge: gateway_bf5031da0049 and gateway_ingress-sbox, and the IP of gateway_ingress-sbox is exactly 172.19.0.2. In other words, the traffic is forwarded into the ingress_sbox network namespace.
Enter ingress_sbox:
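The lookup done by eye above can also be sketched in code: check that the DNAT target falls inside the bridge's subnet, then find the endpoint that owns the address. The JSON below is a trimmed copy of the inspect output (the long container ID is shortened for readability), and `find_endpoint` is a hypothetical helper name.

```python
import ipaddress
import json

# Trimmed copy of the `docker network inspect docker_gwbridge` output.
# The first container key is shortened here; only the fields used are kept.
INSPECT = json.loads("""
[{"Name": "docker_gwbridge",
  "IPAM": {"Config": [{"Subnet": "172.19.0.0/16", "Gateway": "172.19.0.1"}]},
  "Containers": {
    "5559895ccaea": {"Name": "gateway_bf5031da0049", "IPv4Address": "172.19.0.3/16"},
    "ingress-sbox": {"Name": "gateway_ingress-sbox", "IPv4Address": "172.19.0.2/16"}
  }}]
""")


def find_endpoint(networks, target_ip):
    """Return (network name, container key, endpoint name) owning target_ip, or None."""
    target = ipaddress.ip_address(target_ip)
    for net in networks:
        subnets = [ipaddress.ip_network(c["Subnet"]) for c in net["IPAM"]["Config"]]
        if not any(target in subnet for subnet in subnets):
            continue  # address is outside this network's subnet
        for key, endpoint in net["Containers"].items():
            if ipaddress.ip_interface(endpoint["IPv4Address"]).ip == target:
                return net["Name"], key, endpoint["Name"]
    return None


print(find_endpoint(INSPECT, "172.19.0.2"))
# ('docker_gwbridge', 'ingress-sbox', 'gateway_ingress-sbox')
```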
iie4bu@swarm-worker1:~$ sudo ls /var/run/docker/netns
1-cojus8blvk 1-uz1kgf9j6m 44e6e70b2177 b1ba5b4dd9f2 bf5031da0049 ingress_sbox
iie4bu@swarm-worker1:~$ sudo nsenter --net=//var/run/docker/netns/ingress_sbox
root@swarm-worker1:~#
We are now inside the ingress_sbox namespace. Check its IP addresses:
root@swarm-worker1:~# ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
13: eth0@if14: mtu 1450 qdisc noqueue state UP group default
link/ether 02:42:0a:ff:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.255.0.3/16 brd 10.255.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet 10.255.0.168/32 brd 10.255.0.168 scope global eth0
valid_lft forever preferred_lft forever
15: eth1@if16: mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:13:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet 172.19.0.2/16 brd 172.19.255.255 scope global eth1
valid_lft forever preferred_lft forever
eth1 has the address 172.19.0.2.
To demonstrate LVS, exit the ingress_sbox namespace and install ipvsadm, a management tool for LVS, on swarm-worker1.
iie4bu@swarm-worker1:~$ sudo apt-get install ipvsadm
Once it is installed, enter ingress_sbox again and run iptables -nL -t mangle:
root@swarm-worker1:~# iptables -nL -t mangle
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
MARK tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:8000 MARK set 0x100
Chain INPUT (policy ACCEPT)
target prot opt source destination
MARK all -- 0.0.0.0/0 10.255.0.168 MARK set 0x100
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
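The MARK rules are what feed the load balancer: packets destined for TCP port 8000 (or for 10.255.0.168) are tagged with firewall mark 0x100, which is 256 in decimal, and IPVS then balances the packets that carry this mark.
Run ipvsadm -l: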
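As a rough model of what the two MARK rules do, the sketch below returns the mark a packet would receive. `mark_for` is a hypothetical function, and treating 10.255.0.168 as the service's virtual IP on the ingress network is an assumption based on where that address shows up in this walkthrough.

```python
def mark_for(dst_ip, dst_port, proto="tcp"):
    """Mimic the two MARK rules in the mangle table; return the fwmark, or 0 if unmarked."""
    if proto == "tcp" and dst_port == 8000:  # PREROUTING: tcp dpt:8000
        return 0x100
    if dst_ip == "10.255.0.168":  # INPUT: destination 10.255.0.168 (assumed service VIP)
        return 0x100
    return 0


print(mark_for("172.19.0.2", 8000))  # 256, i.e. 0x100 in decimal
```

The decimal value matters because ipvsadm lists firewall-mark services by their decimal mark.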
root@swarm-worker1:~# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
FWM 256 rr
-> 10.255.0.5:0 Masq 1 0 0
-> 10.255.0.7:0 Masq 1 0 0
10.255.0.5 and 10.255.0.7 are the addresses of the two whoami containers behind the service.
Check whoami on swarm-manager:
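The `rr` in the scheduler column means round robin. With two backends of equal weight, that amounts to simple alternation, which the following toy model illustrates. `VirtualService` is a hypothetical class written for this sketch; the real scheduler lives in the kernel and handles weights and connection state that are ignored here.

```python
from itertools import cycle


class VirtualService:
    """Toy model of one IPVS firewall-mark service with equal-weight backends."""

    def __init__(self, fwmark, backends):
        self.fwmark = fwmark
        self._next_backend = cycle(backends)  # endless round-robin iterator

    def schedule(self):
        """Pick the backend for the next new connection."""
        return next(self._next_backend)


# Values copied from the ipvsadm output: FWM 256 with two real servers.
svc = VirtualService(256, ["10.255.0.5", "10.255.0.7"])
picks = [svc.schedule() for _ in range(4)]
print(picks)  # ['10.255.0.5', '10.255.0.7', '10.255.0.5', '10.255.0.7']
```

This matches the earlier curl runs, where consecutive requests were answered by different containers.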
iie4bu@swarm-manager:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
cc9f97cc5056 jwilder/whoami:latest "/app/http" 19 hours ago Up 19 hours 8000/tcp whoami.1.6hhuf528spdw9j9pla7l3tv3t
iie4bu@swarm-manager:~$ docker exec cc9 ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet 10.255.0.168/32 brd 10.255.0.168 scope global lo
valid_lft forever preferred_lft forever
inet 10.0.2.5/32 brd 10.0.2.5 scope global lo
valid_lft forever preferred_lft forever
18: eth0@if19: mtu 1450 qdisc noqueue state UP
link/ether 02:42:0a:ff:00:05 brd ff:ff:ff:ff:ff:ff
inet 10.255.0.5/16 brd 10.255.255.255 scope global eth0
valid_lft forever preferred_lft forever
20: eth1@if21: mtu 1500 qdisc noqueue state UP
link/ether 02:42:ac:12:00:03 brd ff:ff:ff:ff:ff:ff
inet 172.18.0.3/16 brd 172.18.255.255 scope global eth1
valid_lft forever preferred_lft forever
23: eth2@if24: mtu 1450 qdisc noqueue state UP
link/ether 02:42:0a:00:02:07 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.7/24 brd 10.0.2.255 scope global eth2
valid_lft forever preferred_lft forever
iie4bu@swarm-manager:~$
The IP of this whoami container (eth0) is indeed 10.255.0.5.
Check on swarm-worker2:
iie4bu@swarm-worker2:~$ docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f47e05019fd9 jwilder/whoami:latest "/app/http" 20 hours ago Up 20 hours 8000/tcp whoami.3.9idgk9jbrlcm3ufvkmbmvv2t8
633ddfc082b9 busybox:1.28.3 "sh -c 'while true; …" 20 hours ago Up 20 hours client.1.3iv3gworpyr5vdo0h9eortlw0
iie4bu@swarm-worker2:~$ docker exec -it f47 ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet 10.255.0.168/32 brd 10.255.0.168 scope global lo
valid_lft forever preferred_lft forever
inet 10.0.2.5/32 brd 10.0.2.5 scope global lo
valid_lft forever preferred_lft forever
24: eth2@if25: mtu 1450 qdisc noqueue state UP
link/ether 02:42:0a:00:02:0d brd ff:ff:ff:ff:ff:ff
inet 10.0.2.13/24 brd 10.0.2.255 scope global eth2
valid_lft forever preferred_lft forever
26: eth1@if27: mtu 1500 qdisc noqueue state UP
link/ether 02:42:ac:12:00:04 brd ff:ff:ff:ff:ff:ff
inet 172.18.0.4/16 brd 172.18.255.255 scope global eth1
valid_lft forever preferred_lft forever
28: eth0@if29: mtu 1450 qdisc noqueue state UP
link/ether 02:42:0a:ff:00:07 brd ff:ff:ff:ff:ff:ff
inet 10.255.0.7/16 brd 10.255.255.255 scope global eth0
valid_lft forever preferred_lft forever
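The IP of this whoami container (eth0) is indeed 10.255.0.7.
So when a packet enters ingress_sbox, LVS load-balances it: a request to port 8000 is forwarded to either 10.255.0.5 or 10.255.0.7, and from there it travels to the swarm node that runs the chosen container.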