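In the previous post I covered the basic usage and working principles of docker machine; for a recap see https://www.cnblogs.com/qiuhom-1874/p/13160915.html. Today let's talk about docker swarm, Docker's official cluster management tool. It lets you create and manage a Docker cluster across multiple hosts: its main job is to pool the Docker environments of many node hosts into one large Docker resource pool and to manage containers on top of that pool. So far we have only created and managed containers on a single host, but in production the containers that fit on one physical machine usually cannot meet business demand, so docker swarm provides a clustering solution that makes it easy to create and manage containers across multiple nodes. Let's walk through building a docker swarm cluster.
docker swarm is already installed once Docker itself is installed; we can check it with docker info.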
[root@node1 ~]# docker info
Client:
 Debug Mode: false
Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 19.03.11
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 3.10.0-693.el7.x86_64
 Operating System: CentOS Linux 7 (Core)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 3.686GiB
 Name: docker-node01
 ID: 4HXP:YJ5W:4SM5:NAPM:NXPZ:QFIU:ARVJ:BYDG:KVWU:5AAJ:77GC:X7GQ
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
  provider=generic
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
[root@node1 ~]#
Note: the output above shows that swarm is not active. We have not initialized a cluster yet, so the Swarm field reads inactive.
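If you want to check this field from a script rather than by eye, a small helper along these lines may be enough. This is only a sketch: `parse_swarm_state` is a hypothetical name of my own, not a Docker command, and it simply reads `docker info` text from standard input.

```shell
#!/bin/sh
# Hypothetical helper (not part of Docker): print the value of the "Swarm:"
# field from `docker info` text supplied on standard input.
parse_swarm_state() {
    # Split each line on ": " and print the value of the first line whose
    # key is "Swarm" (leading indentation is allowed).
    awk -F': ' '/^[[:space:]]*Swarm:/ { print $2; exit }'
}

# Usage on a real host (requires a running Docker daemon):
#   docker info 2>/dev/null | parse_swarm_state
# The Docker CLI can also print the same field through a Go template:
#   docker info --format '{{.Swarm.LocalNodeState}}'
```

Before `docker swarm init` the helper should print inactive, and after initialization or a successful join it should print active.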
Initializing the cluster
[root@docker-node01 ~]# docker swarm init --advertise-addr 192.168.0.41
Swarm initialized: current node (ynz304mbltxx10v3i15ldkmj1) is now a manager.
To add a worker to this swarm, run the following command:
    docker swarm join --token SWMTKN-1-6difxlq3wc8emlwxzuw95gp8rmvbz2oq62kux3as0e4rbyqhk3-2m9x12n102ca4qlyjpseobzik 192.168.0.41:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
[root@docker-node01 ~]#
Note: the output shows that the cluster was initialized successfully and that the current node is a manager. To add another node to the cluster as a worker, run docker swarm join --token SWMTKN-1-6difxlq3wc8emlwxzuw95gp8rmvbz2oq62kux3as0e4rbyqhk3-2m9x12n102ca4qlyjpseobzik 192.168.0.41:2377 on that node. To add a node as a manager instead, first run docker swarm join-token manager in the current terminal.
[root@docker-node01 ~]# docker swarm join-token manager
To add a manager to this swarm, run the following command:
    docker swarm join --token SWMTKN-1-6difxlq3wc8emlwxzuw95gp8rmvbz2oq62kux3as0e4rbyqhk3-dqjeh8hp6cp99bksjc03b8yu3 192.168.0.41:2377
[root@docker-node01 ~]#
Note: docker swarm join-token manager prints a command and tells us how to add a manager node: run docker swarm join --token SWMTKN-1-6difxlq3wc8emlwxzuw95gp8rmvbz2oq62kux3as0e4rbyqhk3-dqjeh8hp6cp99bksjc03b8yu3 192.168.0.41:2377 on the node that should join.
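When joining several nodes it can be handy to assemble the join command instead of copying it from the terminal. The sketch below assumes you already have the token and the manager address; `join_command` is a hypothetical helper of my own, while `docker swarm join-token -q` (print only the token) is a standard CLI option.

```shell
#!/bin/sh
# Hypothetical helper: build the join command from a token and a manager address.
join_command() {
    # $1 = join token, $2 = manager address in host:port form
    printf 'docker swarm join --token %s %s\n' "$1" "$2"
}

# Usage on a manager node (requires a running swarm):
#   token=$(docker swarm join-token -q worker)    # or: manager
#   join_command "$token" 192.168.0.41:2377
```

The worker token and the manager token are different, so the role a node gets is decided by which token it presents.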
At this point the docker swarm cluster is initialized. Next we add the other nodes to it.
Joining docker-node02 to the cluster as a worker node
[root@node2 ~]# docker swarm join --token SWMTKN-1-6difxlq3wc8emlwxzuw95gp8rmvbz2oq62kux3as0e4rbyqhk3-2m9x12n102ca4qlyjpseobzik 192.168.0.41:2377
This node joined a swarm as a worker.
[root@node2 ~]#
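Note: no error means the node joined the cluster successfully. We can use docker info to view the details of the current Docker environment.
Note: the output above shows that docker swarm is now active on docker-node02 and lists the address of the manager node. Besides this, we can confirm that docker-node02 has joined the cluster by running docker node ls on the manager node.
Viewing cluster node information
Note: running docker node ls on the manager node lists the nodes that have successfully joined the cluster.
Joining docker-node03 to the cluster as a manager node
Note: docker-node03 is now a manager node of the cluster, so docker node ls can be run on docker-node03 as well. The docker swarm cluster is now built. Next let's go through the common management tasks.
Node management commands
docker node ls: lists all nodes in the current cluster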
[root@docker-node01 ~]# docker node ls
ID                            HOSTNAME        STATUS    AVAILABILITY    MANAGER STATUS    ENGINE VERSION
ynz304mbltxx10v3i15ldkmj1 *   docker-node01   Ready     Active          Leader            19.03.11
tzkm0ymzjdmc1r8d54snievf1     docker-node02   Ready     Active                            19.03.11
aeo8j7zit9qkoeeft3j0q1h0z     docker-node03   Ready     Active          Reachable         19.03.11
[root@docker-node01 ~]#
Note: this command can only be run on a manager node.
docker node inspect: shows detailed information about the specified node
[root@docker-node01 ~]# docker node inspect docker-node01
[
    {
        "ID": "ynz304mbltxx10v3i15ldkmj1",
        "Version": {
            "Index": 9
        },
        "CreatedAt": "2020-06-20T05:57:17.57684293Z",
        "UpdatedAt": "2020-06-20T05:57:18.18575648Z",
        "Spec": {
            "Labels": {},
            "Role": "manager",
            "Availability": "active"
        },
        "Description": {
            "Hostname": "docker-node01",
            "Platform": {
                "Architecture": "x86_64",
                "OS": "linux"
            },
            "Resources": {
                "NanoCPUs": 4000000000,
                "MemoryBytes": 3958075392
            },
            "Engine": {
                "EngineVersion": "19.03.11",
                "Labels": {
                    "provider": "generic"
                },
                "Plugins": [
                    { "Type": "Log", "Name": "awslogs" },
                    { "Type": "Log", "Name": "fluentd" },
                    { "Type": "Log", "Name": "gcplogs" },
                    { "Type": "Log", "Name": "gelf" },
                    { "Type": "Log", "Name": "journald" },
                    { "Type": "Log", "Name": "json-file" },
                    { "Type": "Log", "Name": "local" },
                    { "Type": "Log", "Name": "logentries" },
                    { "Type": "Log", "Name": "splunk" },
                    { "Type": "Log", "Name": "syslog" },
                    { "Type": "Network", "Name": "bridge" },
                    { "Type": "Network", "Name": "host" },
                    { "Type": "Network", "Name": "ipvlan" },
                    { "Type": "Network", "Name": "macvlan" },
                    { "Type": "Network", "Name": "null" },
                    { "Type": "Network", "Name": "overlay" },
                    { "Type": "Volume", "Name": "local" }
                ]
            },
            "TLSInfo": {
                "TrustRoot": "-----BEGIN CERTIFICATE-----\nMIIBaTCCARCgAwIBAgIUeBd/eSZ7WaiyLby9o1yWpjps3gwwCgYIKoZIzj0EAwIw\nEzERMA8GA1UEAxMIc3dhcm0tY2EwHhcNMjAwNjIwMDU1MjAwWhcNNDAwNjE1MDU1\nMjAwWjATMREwDwYDVQQDEwhzd2FybS1jYTBZMBMGByqGSM49AgEGCCqGSM49AwEH\nA0IABMsYxnGoPbM4gqb23E1TvOeQcLcY56XysLuF8tYKm56GuKpeD/SqXrUCYqKZ\nHV+WSqcM0fD1g+mgZwlUwFzNxhajQjBAMA4GA1UdDwEB/wQEAwIBBjAPBgNVHRMB\nAf8EBTADAQH/MB0GA1UdDgQWBBTV64kbvS83eRHyI6hdJeEIv3GmrTAKBggqhkjO\nPQQDAgNHADBEAiBBB4hLn0ijybJWH5j5rtMdAoj8l/6M3PXERnRSlhbcawIgLoby\newMHCnm8IIrUGe7s4CZ07iHG477punuPMKDgqJ0=\n-----END CERTIFICATE-----\n",
                "CertIssuerSubject": "MBMxETAPBgNVBAMTCHN3YXJtLWNh",
                "CertIssuerPublicKey": "MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEyxjGcag9sziCpvbcTVO855BwtxjnpfKwu4Xy1gqbnoa4ql4P9KpetQJiopkdX5ZKpwzR8PWD6aBnCVTAXM3GFg=="
            }
        },
        "Status": {
            "State": "ready",
            "Addr": "192.168.0.41"
        },
        "ManagerStatus": {
            "Leader": true,
            "Reachability": "reachable",
            "Addr": "192.168.0.41:2377"
        }
    }
]
[root@docker-node01 ~]#
docker node ps: lists the tasks (service containers) running on the specified node
[root@docker-node01 ~]# docker node ps
ID    NAME    IMAGE    NODE    DESIRED STATE    CURRENT STATE    ERROR    PORTS
[root@docker-node01 ~]# docker node ps docker-node01
ID    NAME    IMAGE    NODE    DESIRED STATE    CURRENT STATE    ERROR    PORTS
[root@docker-node01 ~]#
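Note: this is similar to docker ps, except that it lists swarm service tasks. No service is running yet, so nothing is listed. When no node name is given, the command reports on the current node.
docker node rm: removes the specified node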
[root@docker-node01 ~]# docker node ls
ID                            HOSTNAME        STATUS    AVAILABILITY    MANAGER STATUS    ENGINE VERSION
ynz304mbltxx10v3i15ldkmj1 *   docker-node01   Ready     Active          Leader            19.03.11
tzkm0ymzjdmc1r8d54snievf1     docker-node02   Ready     Active                            19.03.11
aeo8j7zit9qkoeeft3j0q1h0z     docker-node03   Ready     Active          Reachable         19.03.11
[root@docker-node01 ~]# docker node rm docker-node03
Error response from daemon: rpc error: code = FailedPrecondition desc = node aeo8j7zit9qkoeeft3j0q1h0z is a cluster manager and is a member of the raft cluster. It must be demoted to worker before removal
[root@docker-node01 ~]# docker node rm docker-node02
Error response from daemon: rpc error: code = FailedPrecondition desc = node tzkm0ymzjdmc1r8d54snievf1 is not down and can't be removed
[root@docker-node01 ~]#
Note: two conditions must hold before a node can be removed. The node must not be a manager, and it must be in the down state.
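The two preconditions can be written down as a small check. This is only an illustration of the rule stated above: `removal_blocker` is a hypothetical helper, and it takes the STATUS and MANAGER STATUS values as they appear in the docker node ls output.

```shell
#!/bin/sh
# Hypothetical helper: report why `docker node rm` would refuse a node,
# or print "ok" when both preconditions are met.
removal_blocker() {
    # $1 = STATUS column (e.g. Ready, Down)
    # $2 = MANAGER STATUS column (empty for worker nodes)
    if [ -n "$2" ]; then
        echo "demote first"        # managers must be demoted to workers
    elif [ "$1" != "Down" ]; then
        echo "node is not down"    # the node must have left or be offline
    else
        echo "ok"
    fi
}

# Usage on a manager node (requires a running swarm), for example:
#   docker node ls --format '{{.Hostname}} {{.Status}} {{.ManagerStatus}}'
# and pass the last two fields of a row to removal_blocker.
```

For the two failures shown in the transcript, docker-node03 (Ready, Reachable) maps to "demote first" and docker-node02 (Ready, no manager status) maps to "node is not down".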
docker swarm leave: leaves the current cluster
[root@docker-node03 ~]# docker ps
CONTAINER ID   IMAGE   COMMAND                  CREATED          STATUS          PORTS    NAMES
e7958ffa16cd   nginx   "/docker-entrypoint.…"   28 seconds ago   Up 26 seconds   80/tcp   n1
[root@docker-node03 ~]# docker swarm leave
Error response from daemon: You are attempting to leave the swarm on a node that is participating as a manager. Removing this node leaves 1 managers out of 2. Without a Raft quorum your swarm will be inaccessible. The only way to restore a swarm that has lost consensus is to reinitialize it with `--force-new-cluster`. Use `--force` to suppress this message.
[root@docker-node03 ~]# docker swarm leave -f
Node left the swarm.
[root@docker-node03 ~]#
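Note: by default a manager node is not allowed to leave the cluster. If we force it out with -f, as here where only 1 of 2 managers remains, the Raft quorum is lost and the remaining manager can no longer manage the cluster.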
[root@docker-node01 ~]# docker node ls
Error response from daemon: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.
[root@docker-node01 ~]#
Note: on docker-node01 we can no longer list the cluster nodes with docker node ls. The fix is to re-initialize the cluster.
[root@docker-node01 ~]# docker node ls
Error response from daemon: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.
[root@docker-node01 ~]# docker swarm init --advertise-addr 192.168.0.41
Error response from daemon: This node is already part of a swarm. Use "docker swarm leave" to leave this swarm and join another one.
[root@docker-node01 ~]# docker swarm init --force-new-cluster
Swarm initialized: current node (ynz304mbltxx10v3i15ldkmj1) is now a manager.
To add a worker to this swarm, run the following command:
    docker swarm join --token SWMTKN-1-6difxlq3wc8emlwxzuw95gp8rmvbz2oq62kux3as0e4rbyqhk3-2m9x12n102ca4qlyjpseobzik 192.168.0.41:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
[root@docker-node01 ~]# docker node ls
ID                            HOSTNAME        STATUS    AVAILABILITY    MANAGER STATUS    ENGINE VERSION
ynz304mbltxx10v3i15ldkmj1 *   docker-node01   Ready     Active          Leader            19.03.11
tzkm0ymzjdmc1r8d54snievf1     docker-node02   Unknown   Active                            19.03.11
aeo8j7zit9qkoeeft3j0q1h0z     docker-node03   Down      Active                            19.03.11
rm3j7cjvmoa35yy8ckuzoay46     docker-node03   Unknown   Active                            19.03.11
[root@docker-node01 ~]#
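Note: the cluster cannot be re-initialized with docker swarm init --advertise-addr 192.168.0.41; we must use docker swarm init --force-new-cluster, which forces the creation of a new cluster from the current state. We can now use docker node rm to remove the nodes in the down state from the cluster.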
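The reason a two-manager cluster lost its leader comes down to majority arithmetic: as the error message says, more than half of the managers must be online. The sketch below only does that arithmetic; `quorum` and `fault_tolerance` are hypothetical helper names of my own.

```shell
#!/bin/sh
# Majority needed among N managers: more than half, i.e. floor(N/2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of managers that can be lost while a majority still remains.
fault_tolerance() {
    echo $(( ($1 - 1) / 2 ))
}

# Example:
#   quorum 2            # both managers are needed
#   fault_tolerance 2   # so no manager may be lost
```

With 2 managers the quorum is 2 and no failure is tolerated, which matches what we just saw. With 3 managers the quorum is still 2, so one manager can be lost, which is why an odd number of managers is the usual choice.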
Removing nodes in the down state
[root@docker-node01 ~]# docker node ls
ID                            HOSTNAME        STATUS    AVAILABILITY    MANAGER STATUS    ENGINE VERSION
ynz304mbltxx10v3i15ldkmj1 *   docker-node01   Ready     Active          Leader            19.03.11
tzkm0ymzjdmc1r8d54snievf1     docker-node02   Ready     Active                            19.03.11
aeo8j7zit9qkoeeft3j0q1h0z     docker-node03   Down      Active                            19.03.11
rm3j7cjvmoa35yy8ckuzoay46     docker-node03   Down      Active                            19.03.11
[root@docker-node01 ~]# docker node rm aeo8j7zit9qkoeeft3j0q1h0z rm3j7cjvmoa35yy8ckuzoay46
aeo8j7zit9qkoeeft3j0q1h0z
rm3j7cjvmoa35yy8ckuzoay46
[root@docker-node01 ~]# docker node ls
ID                            HOSTNAME        STATUS    AVAILABILITY    MANAGER STATUS    ENGINE VERSION
ynz304mbltxx10v3i15ldkmj1 *   docker-node01   Ready     Active          Leader            19.03.11
tzkm0ymzjdmc1r8d54snievf1     docker-node02   Ready     Active                            19.03.11
[root@docker-node01 ~]#
docker node promote: promotes the specified node to a manager node
[root@docker-node01 ~]# docker node ls
ID                            HOSTNAME        STATUS    AVAILABILITY    MANAGER STATUS    ENGINE VERSION
ynz304mbltxx10v3i15ldkmj1 *   docker-node01   Ready     Active          Leader            19.03.11
tzkm0ymzjdmc1r8d54snievf1     docker-node02   Ready     Active                            19.03.11
[root@docker-node01 ~]# docker node promote docker-node02
Node docker-node02 promoted to a manager in the swarm.
[root@docker-node01 ~]# docker node ls
ID                            HOSTNAME        STATUS    AVAILABILITY    MANAGER STATUS    ENGINE VERSION
ynz304mbltxx10v3i15ldkmj1 *   docker-node01   Ready     Active          Leader            19.03.11
tzkm0ymzjdmc1r8d54snievf1     docker-node02   Ready     Active          Reachable         19.03.11
[root@docker-node01 ~]#
docker node demote: demotes the specified node to a worker node
[root@docker-node01 ~]# docker node ls
ID                            HOSTNAME        STATUS    AVAILABILITY    MANAGER STATUS    ENGINE VERSION
ynz304mbltxx10v3i15ldkmj1 *   docker-node01   Ready     Active          Leader            19.03.11
tzkm0ymzjdmc1r8d54snievf1     docker-node02   Ready     Active          Reachable         19.03.11
[root@docker-node01 ~]# docker node demote docker-node02
Manager docker-node02 demoted in the swarm.
[root@docker-node01 ~]# docker node ls
ID                            HOSTNAME        STATUS    AVAILABILITY    MANAGER STATUS    ENGINE VERSION
ynz304mbltxx10v3i15ldkmj1 *   docker-node01   Ready     Active          Leader            19.03.11
tzkm0ymzjdmc1r8d54snievf1     docker-node02   Ready     Active                            19.03.11
[root@docker-node01 ~]#
docker node update: updates the specified node
[root@docker-node01 ~]# docker node ls
ID                            HOSTNAME        STATUS    AVAILABILITY    MANAGER STATUS    ENGINE VERSION
ynz304mbltxx10v3i15ldkmj1 *   docker-node01   Ready     Active          Leader            19.03.11
tzkm0ymzjdmc1r8d54snievf1     docker-node02   Ready     Active                            19.03.11
[root@docker-node01 ~]# docker node update docker-node01 --availability drain
docker-node01
[root@docker-node01 ~]# docker node ls
ID                            HOSTNAME        STATUS    AVAILABILITY    MANAGER STATUS    ENGINE VERSION
ynz304mbltxx10v3i15ldkmj1 *   docker-node01   Ready     Drain           Leader            19.03.11
tzkm0ymzjdmc1r8d54snievf1     docker-node02   Ready     Active                            19.03.11
[root@docker-node01 ~]#
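Note: the command above changes the availability of docker-node01 to drain. After this change the scheduler no longer uses docker-node01 to run service containers.
Adding a graphical interface to the docker swarm cluster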
[root@docker-node01 docker]# docker run --name v1 -d -p 8888:8080 -e HOST=192.168.0.41 -e PORT=8080 -v /var/run/docker.sock:/var/run/docker.sock docker-registry.io/test/visualizer
Unable to find image 'docker-registry.io/test/visualizer:latest' locally
latest: Pulling from test/visualizer
cd784148e348: Pull complete
f6268ae5d1d7: Pull complete
97eb9028b14b: Pull complete
9975a7a2a3d1: Pull complete
ba903e5e6801: Pull complete
7f034edb1086: Pull complete
cd5dbf77b483: Pull complete
5e7311667ddb: Pull complete
687c1072bfcb: Pull complete
aa18e5d3472c: Pull complete
a3da1957bd6b: Pull complete
e42dbf1c67c4: Pull complete
5a18b01011d2: Pull complete
Digest: sha256:54d65cbcbff52ee7d789cd285fbe68f07a46e3419c8fcded437af4c616915c85
Status: Downloaded newer image for docker-registry.io/test/visualizer:latest
3c15b186ff51848130393944e09a427bd40d2504c54614f93e28477a4961f8b6
[root@docker-node01 docker]# docker ps
CONTAINER ID   IMAGE                                COMMAND       CREATED         STATUS                            PORTS                    NAMES
3c15b186ff51   docker-registry.io/test/visualizer   "npm start"   6 seconds ago   Up 5 seconds (health: starting)   0.0.0.0:8888->8080/tcp   v1
[root@docker-node01 docker]#
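Note: the command above pulls the image from a private registry, because downloading it from the internet is too slow; I downloaded it beforehand and pushed it to the private registry. For setting up and using a private registry, see https://www.cnblogs.com/qiuhom-1874/p/13061984.html or https://www.cnblogs.com/qiuhom-1874/p/13058338.html. Once the visualizer container is running on the manager node, we can open port 8888 on that node's address and see the current state of the containers, as shown in the figure below.
Note: the view above shows that the cluster currently has one manager node and two worker nodes, and that no containers are running in the cluster yet.
Running a service on docker swarm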
[root@docker-node01 ~]# docker service create --name myweb docker-registry.io/test/nginx:latest
i0j6wvvtfe1360ibj04jxulmd
overall progress: 1 out of 1 tasks
1/1: running   [==================================================>]
verify: Service converged
[root@docker-node01 ~]# docker service ls
ID             NAME    MODE         REPLICAS   IMAGE                                  PORTS
i0j6wvvtfe13   myweb   replicated   1/1        docker-registry.io/test/nginx:latest
[root@docker-node01 ~]# docker service ps myweb
ID             NAME      IMAGE                                  NODE            DESIRED STATE   CURRENT STATE           ERROR   PORTS
99y8towew77e   myweb.1   docker-registry.io/test/nginx:latest   docker-node03   Running         Running 1 minutes ago
[root@docker-node01 ~]#
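Note: docker service create creates a service in the current swarm cluster. The command above creates a service named myweb from the docker-registry.io/test/nginx:latest image. By default only one replica is started.
Note: the cluster is now running one myweb container, on the docker-node03 host.
Creating a service with multiple replicas on the swarm cluster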
[root@docker-node01 ~]# docker service create --replicas 3 --name web docker-registry.io/test/nginx:latest
mbiap412jyugfpi4a38mb5i1k
overall progress: 3 out of 3 tasks
1/3: running   [==================================================>]
2/3: running   [==================================================>]
3/3: running   [==================================================>]
verify: Service converged
[root@docker-node01 ~]# docker service ls
ID             NAME    MODE         REPLICAS   IMAGE                                  PORTS
i0j6wvvtfe13   myweb   replicated   1/1        docker-registry.io/test/nginx:latest
mbiap412jyug   web     replicated   3/3        docker-registry.io/test/nginx:latest
[root@docker-node01 ~]# docker service ps web
ID             NAME    IMAGE                                  NODE            DESIRED STATE   CURRENT STATE            ERROR   PORTS
1rt0e7u4senz   web.1   docker-registry.io/test/nginx:latest   docker-node02   Running         Running 28 seconds ago
31ll0zu7udld   web.2   docker-registry.io/test/nginx:latest   docker-node02   Running         Running 28 seconds ago
l9jtbswl2x22   web.3   docker-registry.io/test/nginx:latest   docker-node03   Running         Running 32 seconds ago
[root@docker-node01 ~]#
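Note: the --replicas option sets the desired number of running replicas. Swarm creates that many replicas in the cluster, and even if a node goes down it keeps the specified number of containers running in the cluster.
Test: shut down docker-node03 and see whether the services we are running migrate to node 2.
Before docker-node03 is shut down
After docker-node03 is shut down
Note: the screenshots above show that once node 3 is down, all the containers that ran on node 3 are moved to node 2. That is what creating the service with --replicas gives us. To summarize: for a service created in replicated mode, when the node hosting a replica fails, the replicas on that node are moved to other nodes. One caveat: as long as the number of replicas in the cluster matches the replica count we specified, the service is not moved back even after the failed node recovers.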
[root@docker-node01 ~]# docker service ps web
ID             NAME        IMAGE                                  NODE            DESIRED STATE   CURRENT STATE             ERROR   PORTS
1rt0e7u4senz   web.1       docker-registry.io/test/nginx:latest   docker-node02   Running         Running 15 minutes ago
31ll0zu7udld   web.2       docker-registry.io/test/nginx:latest   docker-node02   Running         Running 15 minutes ago
t3gjvsgtpuql   web.3       docker-registry.io/test/nginx:latest   docker-node02   Running         Running 6 minutes ago
l9jtbswl2x22    \_ web.3   docker-registry.io/test/nginx:latest   docker-node03   Shutdown        Shutdown 23 seconds ago
[root@docker-node01 ~]#
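Note: listing the service's tasks on the manager node shows what migration really means: the replica on the failed node is stopped, and a new replica is created on another node.
Scaling a service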
[root@docker-node01 ~]# docker service ls
ID             NAME    MODE         REPLICAS   IMAGE                                  PORTS
i0j6wvvtfe13   myweb   replicated   1/1        docker-registry.io/test/nginx:latest
mbiap412jyug   web     replicated   3/3        docker-registry.io/test/nginx:latest
[root@docker-node01 ~]# docker service scale myweb=3 web=5
myweb scaled to 3
web scaled to 5
overall progress: 3 out of 3 tasks
1/3: running   [==================================================>]
2/3: running   [==================================================>]
3/3: running   [==================================================>]
verify: Service converged
overall progress: 5 out of 5 tasks
1/5: running   [==================================================>]
2/5: running   [==================================================>]
3/5: running   [==================================================>]
4/5: running   [==================================================>]
5/5: running   [==================================================>]
verify: Service converged
[root@docker-node01 ~]# docker service ls
ID             NAME    MODE         REPLICAS   IMAGE                                  PORTS
i0j6wvvtfe13   myweb   replicated   3/3        docker-registry.io/test/nginx:latest
mbiap412jyug   web     replicated   5/5        docker-registry.io/test/nginx:latest
[root@docker-node01 ~]# docker service ps myweb web
ID             NAME        IMAGE                                  NODE            DESIRED STATE   CURRENT STATE            ERROR   PORTS
j7w490h2lons   myweb.1     docker-registry.io/test/nginx:latest   docker-node02   Running         Running 12 minutes ago
1rt0e7u4senz   web.1       docker-registry.io/test/nginx:latest   docker-node02   Running         Running 21 minutes ago
99y8towew77e   myweb.1     docker-registry.io/test/nginx:latest   docker-node03   Shutdown        Shutdown 5 minutes ago
en5rk0jf09wu   myweb.2     docker-registry.io/test/nginx:latest   docker-node03   Running         Running 31 seconds ago
31ll0zu7udld   web.2       docker-registry.io/test/nginx:latest   docker-node02   Running         Running 21 minutes ago
h1hze7h819ca   myweb.3     docker-registry.io/test/nginx:latest   docker-node03   Running         Running 30 seconds ago
t3gjvsgtpuql   web.3       docker-registry.io/test/nginx:latest   docker-node02   Running         Running 12 minutes ago
l9jtbswl2x22    \_ web.3   docker-registry.io/test/nginx:latest   docker-node03   Shutdown        Shutdown 5 minutes ago
od3ti2ixpsgc   web.4       docker-registry.io/test/nginx:latest   docker-node03   Running         Running 31 seconds ago
n1vur8wbmkgz   web.5       docker-registry.io/test/nginx:latest   docker-node03   Running         Running 31 seconds ago
[root@docker-node01 ~]#
Note: docker service scale sets the number of replicas of a service, which is how services are scaled up and down dynamically.
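After scaling, the REPLICAS column of docker service ls (for example 3/3) tells us whether the running count has caught up with the desired count. A rough way to check that in a script is sketched below; `replicas_ready` is a hypothetical helper of my own, and it assumes the plain running/desired form seen in the output above.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a REPLICAS value such as "3/3" shows
# that the running count equals the desired count.
replicas_ready() {
    running=${1%%/*}    # text before the slash
    desired=${1##*/}    # text after the slash
    [ "$running" -eq "$desired" ]
}

# Usage on a manager node (requires a running swarm), for example:
#   docker service ls --format '{{.Name}} {{.Replicas}}'
# and pass the second field of a row to replicas_ready.
```

A value like 3/5 means two replicas are still being started, so the helper returns a non-zero status until the service converges.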
Publishing a service
[root@docker-node01 ~]# docker service ls
ID             NAME    MODE         REPLICAS   IMAGE                                  PORTS
i0j6wvvtfe13   myweb   replicated   3/3        docker-registry.io/test/nginx:latest
mbiap412jyug   web     replicated   5/5        docker-registry.io/test/nginx:latest
[root@docker-node01 ~]# docker service update --publish-add 80:80 myweb
myweb
overall progress: 3 out of 3 tasks
1/3: running   [==================================================>]
2/3: running   [==================================================>]
3/3: running   [==================================================>]
verify: Service converged
[root@docker-node01 ~]#
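Note: publishing a service in a docker swarm cluster works on the same principle as publishing a port in plain Docker: both are implemented with iptables rules or LVS (IPVS) rules.
Note: on the manager node we can see that port 80 is now listening, and the iptables rules have gained an entry that DNATs traffic for local port 80 to 172.18.0.2:80. This is not limited to the manager node; the iptables rules on the worker nodes change in the same way, as follows.
Note: according to the rules above, any access to port 80 on a node address is DNATed to 172.18.0.2:80.
Note: from the output above it is easy to see that the internal address of the myweb container running on docker-node02 is 10.0.0.7. So why does accessing 172.18.0.2 reach the service inside the container?
Test: on docker-node02, follow the access log of the nginx container and see which source IP address reaches the container.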
[root@docker-node02 ~]# docker ps
CONTAINER ID   IMAGE                                  COMMAND                  CREATED          STATUS          PORTS    NAMES
2134e1b2c689   docker-registry.io/test/nginx:latest   "/docker-entrypoint.…"   24 minutes ago   Up 24 minutes   80/tcp   nginx.1.ych7y3ugxp6o592pbz5k2i412
[root@docker-node02 ~]# docker logs -f nginx.1.ych7y3ugxp6o592pbz5k2i412
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
10.0.0.3 - - [21/Jun/2020:02:37:11 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.29.0" "-"
172.18.0.1 - - [21/Jun/2020:02:38:35 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.29.0" "-"
10.0.0.2 - - [21/Jun/2020:02:53:32 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.29.0" "-"
10.0.0.2 - - [21/Jun/2020:02:53:58 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.29.0" "-"
^C
[root@docker-node02 ~]#
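Note: when we access 172.18.0.2 from the manager node, the log on node2 shows the request reaching nginx from 10.0.0.2. Why? Because every node has an ingress-sbox container, and the address of that container here is 10.0.0.2. The ingress-sbox address differs from node to node, so the address nginx logs depends on which node address we access, as shown in the figure below.
Note: accessing different node addresses produces different client IPs in the nginx log.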
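Note: the screenshots above show that the ingress-sbox container on each node has a different address, but all of them use 10.0.0.1 as their gateway. This means containers on different nodes can communicate through this gateway, which is how containers across the swarm cluster talk to each other over the ingress network. One question remains: how does the 172.18.0.0/16 network communicate with the 10.0.0.0/24 network?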
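Note: the screenshot above shows two network namespaces on the manager node. One of them has id 0 and contains two interfaces, veth0 and vxlan0, both attached to the bridge br0, whose address is 10.0.0.1/24. The VXLAN ID of vxlan0 is 4096. Combining this with the nginx log above, the picture becomes clear:
When we access port 80 on the manager node, the iptables rules forward the traffic to the docker-gwbridge network. We do not yet know which network namespace the docker-gwbridge network belongs to, but we do know that the container has two interfaces, eth0 and eth1, and that eth1 is bridged to the docker-gwbridge network. This suggests that the docker-gwbridge network shares a network namespace with eth1 inside the container.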
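Note: in the screenshot above, the network namespace named 1-u5mwgfq7rb has three interfaces, eth0, eth1 and vxlan0, all attached to the bridge br0. The manager node also has the 1-u5mwgfq7rb namespace, and vxlan0 has VXLAN ID 4096 on both nodes. This means vxlan0 on the manager node can talk directly to vxlan0 on node2 (VXLAN interfaces in the same network namespace with the same VXLAN ID can communicate directly). Since vxlan0 is attached directly to br0, the nginx log shows the address of the ingress-sbox container as the client; the reason is that br0 is the gateway of ingress-sbox. node3 follows the same logic. Traffic between containers on different nodes goes through vxlan0; traffic to the outside goes through eth1, is SNATed via docker-gwbridge, and leaves through the physical NIC.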
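Note: a container is attached to two networks: eth0 belongs to the ingress network and eth1 belongs to the docker-gwbridge network, and both interfaces live in the same network namespace of that container. So when we access 172.18.0.2, the ingress-sbox container rewrites the source address to its own address on the ingress network, and the nginx log shows 10.0.0.2. We can think of the ingress-sbox container as performing SNAT.
Test: access port 80 on the manager node and check whether the page served by nginx is reachable.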
[root@docker-node02 ~]# docker ps
CONTAINER ID   IMAGE                                  COMMAND                  CREATED             STATUS             PORTS    NAMES
b829991d6966   docker-registry.io/test/nginx:latest   "/docker-entrypoint.…"   About an hour ago   Up About an hour   80/tcp   myweb.1.ilhkslrlnreyo6xx5j2h9isjb
8c2965fbdc27   docker-registry.io/test/nginx:latest   "/docker-entrypoint.…"   2 hours ago         Up 2 hours         80/tcp   web.2.pthe8da2n45i06oee4n7h4krd
b019d663e48e   docker-registry.io/test/nginx:latest   "/docker-entrypoint.…"   2 hours ago         Up 2 hours         80/tcp   web.3.w26gqpoyysgplm7qwhjbgisiv
a7c1afd76f1f   docker-registry.io/test/nginx:latest   "/docker-entrypoint.…"   2 hours ago         Up 2 hours         80/tcp   web.1.ho0d7u3wensl0kah0ioz1lpk5
[root@docker-node02 ~]# docker exec -it myweb.1.ilhkslrlnreyo6xx5j2h9isjb bash
root@b829991d6966:/# cd /usr/share/nginx/html/
root@b829991d6966:/usr/share/nginx/html# ls
50x.html  index.html
root@b829991d6966:/usr/share/nginx/html# echo "this is docker-node02 index page" >index.html
root@b829991d6966:/usr/share/nginx/html# cat index.html
this is docker-node02 index page
root@b829991d6966:/usr/share/nginx/html#
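Note: above we changed the index page of the nginx container running on docker-node02. Next we access port 80 on the manager node to see whether we can reach the containers on the worker nodes, and how the requests are distributed: round-robin, or always the same container?
Note: accessing port 80 on the manager node reaches the containers on the worker nodes in round-robin fashion. A browser may cache responses, so testing with curl is more accurate, as follows.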
[root@docker-node03 ~]# docker ps
CONTAINER ID   IMAGE                                  COMMAND                  CREATED       STATUS       PORTS    NAMES
f43fdb9ec7fc   docker-registry.io/test/nginx:latest   "/docker-entrypoint.…"   2 hours ago   Up 2 hours   80/tcp   myweb.3.pgdjutofb5thlk02aj7387oj0
4470785f3d00   docker-registry.io/test/nginx:latest   "/docker-entrypoint.…"   2 hours ago   Up 2 hours   80/tcp   myweb.2.uwxbe182qzq00qgfc7odcmx87
7493dcac95ba   docker-registry.io/test/nginx:latest   "/docker-entrypoint.…"   2 hours ago   Up 2 hours   80/tcp   web.4.rix50fhlmg6m9txw9urk66gvw
118880d300f4   docker-registry.io/test/nginx:latest   "/docker-entrypoint.…"   2 hours ago   Up 2 hours   80/tcp   web.5.vo7c7vjgpf92b0ryelb7eque0
[root@docker-node03 ~]# docker exec -it myweb.2.uwxbe182qzq00qgfc7odcmx87 bash
root@4470785f3d00:/# cd /usr/share/nginx/html/
root@4470785f3d00:/usr/share/nginx/html# echo "this is myweb.2 index page" > index.html
root@4470785f3d00:/usr/share/nginx/html# cat index.html
this is myweb.2 index page
root@4470785f3d00:/usr/share/nginx/html# exit
exit
[root@docker-node03 ~]# docker exec -it myweb.3.pgdjutofb5thlk02aj7387oj0 bash
root@f43fdb9ec7fc:/# cd /usr/share/nginx/html/
root@f43fdb9ec7fc:/usr/share/nginx/html# echo "this is myweb.3 index page" >index.html
root@f43fdb9ec7fc:/usr/share/nginx/html# cat index.html
this is myweb.3 index page
root@f43fdb9ec7fc:/usr/share/nginx/html# exit
exit
[root@docker-node03 ~]#
Note: to make the effect easy to see, we also changed the index pages of myweb.2 and myweb.3.
[root@docker-node01 ~]# for i in {1..10} ; do curl 192.168.0.41; done
this is myweb.3 index page
this is docker-node02 index page
this is myweb.2 index page
this is myweb.3 index page
this is docker-node02 index page
this is myweb.2 index page
this is myweb.3 index page
this is docker-node02 index page
this is myweb.2 index page
this is myweb.3 index page
[root@docker-node01 ~]#
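Note: as the test above shows, publishing a service with --publish-add is equivalent to creating a load balancer on the manager node: requests to the published port are spread across the service's replicas. As noted earlier, the matching iptables rules appear on the worker nodes as well.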
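To make the distribution easier to read than a raw list of responses, the curl output can be tallied per page. The sketch below is one way to do it; `tally_responses` is a hypothetical helper of my own that counts identical lines on standard input.

```shell
#!/bin/sh
# Hypothetical helper: count how many times each distinct response body
# appears on standard input and print "count body" lines in sorted order.
tally_responses() {
    awk '{ count[$0]++ } END { for (body in count) print count[body], body }' | sort
}

# Usage against the published port (requires the running swarm):
#   for i in $(seq 1 10); do curl -s 192.168.0.41; done | tally_responses
```

For the ten requests shown above, this gives 4 responses from myweb.3 and 3 each from myweb.2 and the replica on docker-node02, which is what round-robin over three replicas looks like when the request count is not a multiple of three.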