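Even with Docker Compose, deployment remains a problem: Docker Compose can only place all of a project's containers on a single machine, which is not realistic in production.
Docker Compose is generally suited only to development environments. For production deployments we need Docker Swarm.
Docker Swarm is the container orchestration system provided officially by Docker. It virtualizes a group of Docker hosts into a single virtual Docker host.
Its architecture is built from the following parts.
A swarm is a collection of nodes, and a node can be either a bare-metal machine or a virtual machine. A node can take one or both of two roles: manager or worker.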
manager
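A Docker Swarm cluster needs at least one manager node. Manager nodes coordinate with each other using the Raft consensus protocol.
Normally the first node on which swarm mode is enabled becomes the leader, and managers that join later are followers. If the current leader fails,
the remaining managers elect a new leader. Every manager holds a complete copy of the current cluster state, which keeps the manager layer highly available.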
worker
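Worker nodes are where the containers of the actual application services run. In principle a manager node can also act as a worker node, but we do not recommend this in production.
Worker nodes communicate with each other over the control plane; this communication uses a gossip protocol and is asynchronous.
Multiple tasks make up a service, and multiple services make up a stack.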
task
In Docker Swarm, a task is the smallest unit of deployment, and tasks map one-to-one to containers.
service
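A swarm service is an abstract concept: it is simply a description of the desired state of an application service running on the swarm cluster.
It works like a checklist of the following items:
Service name
Which image to use to create the containers
How many replicas to run
Which networks the service's containers attach to
Which ports to publish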
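Each item on this checklist corresponds to an option of docker service create. The sketch below only prints the resulting command line and does not run it; the helper name service_create_cmd and the example values (web, nginx:latest, mynet, 8080:80) are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: turn the checklist items into a docker service create
# command line. It only prints the command; it does not execute it.
service_create_cmd() {
    name="$1"       # service name
    image="$2"      # image used to create the containers
    replicas="$3"   # number of replicas to run
    network="$4"    # network the containers attach to
    publish="$5"    # published port mapping, host:container
    echo "docker service create --name $name --replicas $replicas --network $network --publish $publish $image"
}

# Example values are assumptions, not taken from the cluster built later.
service_create_cmd web nginx:latest 3 mynet 8080:80
```

Printing the command first makes it easy to review the desired state before handing it to a manager node.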
stack
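A stack describes a collection of related services. A stack can be defined in a single YAML file, much like docker-compose.
With single-host networking, all containers run on one Docker host, and the local bridge network is usually enough for them to communicate.
A swarm cluster spans a group of Docker hosts, so it needs Docker's overlay network.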
docker swarm
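Commands:
ca                  Display the root CA certificate
init                Initialize a swarm
join                Join a swarm
join-token worker   Show the join token for worker nodes
join-token manager  Show the join token for manager nodes
leave               Leave the swarm
unlock              Unlock the swarm
unlock-key          Manage the unlock key
update              Update the swarm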
docker node
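Commands:
demote   Demote a node from manager to worker
inspect  Show details of one or more nodes
ls       List all nodes
promote  Promote a node from worker to manager
ps       List the tasks running on one or more nodes (defaults to the current node)
rm       Remove one or more nodes
update   Update a node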
docker service
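Commands:
create    Create a new service
inspect   Show details of one or more services
logs      Fetch the logs of a service or task
ls        List all services
ps        List the tasks of one or more services
rm        Remove one or more services
rollback  Revert changes to a service's configuration
scale     Scale one or more replicated services
update    Update a service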
docker stack
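Commands:
deploy    Deploy a new stack or update an existing stack
ls        List existing stacks
ps        List the tasks in a stack
rm        Remove one or more stacks
services  List the services in a stack
Note: most of the commands above can only be run on a manager node.
To meet the cluster's high-availability requirements, run an odd number of manager nodes. With more than two manager nodes, the cluster can recover from the failure of a manager node without downtime.
A cluster of N manager nodes tolerates the loss of at most (N-1)/2 manager nodes, rounded down.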
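The (N-1)/2 rule can be checked with a few lines of shell arithmetic. This is a minimal sketch; the function name tolerated_failures is an illustrative assumption and not part of the Docker CLI.

```shell
#!/bin/sh
# Hypothetical helper (not a Docker command): given the number of manager
# nodes N, print how many managers may be lost while a majority survives.
tolerated_failures() {
    n="$1"
    # Shell integer division rounds down, which matches (N-1)/2.
    echo $(( (n - 1) / 2 ))
}

for n in 1 2 3 4 5 7; do
    echo "managers=$n tolerated_failures=$(tolerated_failures "$n")"
done
```

Three and four managers both tolerate a single failure, so the fourth manager adds Raft traffic without adding fault tolerance. That is the reason for preferring an odd count.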
Next, we create a three-node swarm cluster.
| role | ip | hostname |
|---|---|---|
| manager | 192.168.30.128 | test1 |
| worker1 | 192.168.30.129 | test2 |
| worker2 | 192.168.30.130 | test3 |
# systemctl stop firewalld && systemctl disable firewalld
# sed -i 's/=enforcing/=disabled/g' /etc/selinux/config && setenforce 0
# curl http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo -o /etc/yum.repos.d/docker.repo
# yum makecache fast
# yum install -y docker-ce
# systemctl start docker && systemctl enable docker
Note: a swarm cluster must first be initialized on the manager node; the worker nodes then join it.
192.168.30.128
# docker swarm init --advertise-addr=192.168.30.128
Swarm initialized: current node (q1n9ztahdj489pltf3gl5pomj) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-38ci507e4fyp92oauqvov5axo3qhti5wlm58odjz5lo2rdatyo-e2y52gxq7y40mah2nzq1b5fg9 192.168.30.128:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
Copy the worker join command and run it on the other, non-manager machines.
192.168.30.129
# docker swarm join --token SWMTKN-1-38ci507e4fyp92oauqvov5axo3qhti5wlm58odjz5lo2rdatyo-e2y52gxq7y40mah2nzq1b5fg9 192.168.30.128:2377
This node joined a swarm as a worker.
192.168.30.130
# docker swarm join --token SWMTKN-1-38ci507e4fyp92oauqvov5axo3qhti5wlm58odjz5lo2rdatyo-e2y52gxq7y40mah2nzq1b5fg9 192.168.30.128:2377
This node joined a swarm as a worker.
192.168.30.128
# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
q1n9ztahdj489pltf3gl5pomj * test1 Ready Active Leader 19.03.3
0qyp2ut4m3pggag1yq7f3jn31 test2 Ready Active 19.03.4
cbi8detm7t9v8w5ntyzid0cvj test3 Ready Active 19.03.4
As shown, the three-node swarm cluster is up, with test1 (192.168.30.128) as the manager node.
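If the join command printed by docker swarm init is no longer at hand, it can be rebuilt from the token. The sketch below assumes the manager address used in this walkthrough and port 2377 as printed in the init output; build_join_cmd is a hypothetical helper name, and the token shown is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: print the join command for a given token and manager IP.
# On the manager, the token itself can be read with:
#   docker swarm join-token -q worker
build_join_cmd() {
    token="$1"
    manager_ip="$2"
    port="${3:-2377}"   # 2377 is the port shown in the swarm init output
    echo "docker swarm join --token $token $manager_ip:$port"
}

# Example with a placeholder token (not a real one).
build_join_cmd "SWMTKN-1-example" "192.168.30.128"
```

The printed line is what would be run on each machine that should join as a worker.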
In a Docker Swarm cluster, we can create services with the docker service command. Each service contains one or more tasks, and each task corresponds to one container.
192.168.30.128
# docker service create --name busybox busybox:latest sh -c "while true; do sleep 3600; done"
y8o6jogs0iyp4qewb5okgzb37
overall progress: 1 out of 1 tasks
1/1: running
verify: Service converged
# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
y8o6jogs0iyp busybox replicated 1/1 busybox:latest
# docker service ps busybox
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
mavy3blpmzvz busybox.1 busybox:latest test2 Running Running about a minute ago
As shown, the service's task is running on test2 (192.168.30.129).
192.168.30.129
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
3dd48d9541ca busybox:latest "sh -c 'while true; …" 2 minutes ago Up 2 minutes busybox.1.mavy3blpmzvzka1ks1ebuz3s4
The busybox service currently has only one task. Scale it to five.
192.168.30.128
# docker service scale busybox=5
busybox scaled to 5
overall progress: 5 out of 5 tasks
1/5: running
2/5: running
3/5: running
4/5: running
5/5: running
verify: Service converged
# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
y8o6jogs0iyp busybox replicated 5/5 busybox:latest
# docker service ps busybox
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
mavy3blpmzvz busybox.1 busybox:latest test2 Running Running 4 minutes ago
gxg5gt2j5a1v busybox.2 busybox:latest test1 Running Running 20 seconds ago
okge105yuzb8 busybox.3 busybox:latest test2 Running Running 25 seconds ago
b86rr94bbotj busybox.4 busybox:latest test3 Running Running 22 seconds ago
8zogu5kacnpw busybox.5 busybox:latest test1 Running Running 20 seconds ago
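As shown, the service's tasks are spread across the cluster's three nodes.
When the container behind a task dies, Swarm automatically starts a replacement container for that task on any available node.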
192.168.30.128
# docker rm -f 7d013a7eb685
7d013a7eb685
# docker service ps busybox
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
mavy3blpmzvz busybox.1 busybox:latest test2 Running Running 11 minutes ago
jewllc9gywpa busybox.2 busybox:latest test3 Ready Ready 3 seconds ago
gxg5gt2j5a1v \_ busybox.2 busybox:latest test1 Shutdown Failed 3 seconds ago "task: non-zero exit (137)"
okge105yuzb8 busybox.3 busybox:latest test2 Running Running 7 minutes ago
b86rr94bbotj busybox.4 busybox:latest test3 Running Running 7 minutes ago
8zogu5kacnpw busybox.5 busybox:latest test1 Running Running 7 minutes ago
# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
y8o6jogs0iyp busybox replicated 5/5 busybox:latest
As shown, the container that was force-removed on test1 (192.168.30.128) has been restarted on test3 (192.168.30.130).
192.168.30.130
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
05efd68e06c5 busybox:latest "sh -c 'while true; …" 2 minutes ago Up About a minute busybox.2.jewllc9gywpaude65rmc87wka
629fe2d2b396 busybox:latest "sh -c 'while true; …" 9 minutes ago Up 9 minutes busybox.4.b86rr94bbotj1fhyk7owwt2tl
This failure recovery keeps our service stable and available.
192.168.30.128
# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
y8o6jogs0iyp busybox replicated 5/5 busybox:latest
# docker service rm busybox
busybox
# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
192.168.30.129
# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
192.168.30.130
# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
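After a service is removed, its task containers are stopped and removed as well.
To deploy a project on a swarm cluster, the first thing to create is an overlay network, because the project's related services communicate over it.
Creating an overlay network in a swarm cluster no longer requires an external distributed store (such as etcd); the swarm cluster synchronizes the overlay network automatically.
Next, we use docker service to deploy a WordPress project made up of two services: wordpress and mysql.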
192.168.30.128
# docker network ls
NETWORK ID NAME DRIVER SCOPE
5ee1b278fd34 bridge bridge local
faf258f504b0 docker_gwbridge bridge local
535808221d2e host host local
5yetwtzg2b1x ingress overlay swarm
2addad8d8857 none null local
# docker network create -d overlay test
exyl7ksbeavt00c5ot0k66s2w
# docker network ls
NETWORK ID NAME DRIVER SCOPE
5ee1b278fd34 bridge bridge local
faf258f504b0 docker_gwbridge bridge local
535808221d2e host host local
5yetwtzg2b1x ingress overlay swarm
2addad8d8857 none null local
exyl7ksbeavt test overlay swarm
192.168.30.128
# docker service create --name mysql --network test -e MYSQL_ROOT_PASSWORD=123456789 -e MYSQL_DATABASE=wordpress --mount type=volume,source=mysql_data,destination=/var/lib/mysql mysql:5.7
3bjl5kse0letilvkx0kltnfm9
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged
# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
3bjl5kse0let mysql replicated 1/1 mysql:5.7
# docker service ps mysql
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
d8ofaptycyzs mysql.1 mysql:5.7 test3 Running Running about a minute ago
192.168.30.128
# docker service create --name wordpress --network test -p 80:80 -e WORDPRESS_DB_PASSWORD=123456789 -e WORDPRESS_DB_HOST=mysql wordpress
x96xdiazi4iupgvwl5oza4sx3
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged
# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
3bjl5kse0let mysql replicated 1/1 mysql:5.7
x96xdiazi4iu wordpress replicated 1/1 wordpress:latest *:80->80/tcp
# docker service ps wordpress
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
8b9g1upsll15 wordpress.1 wordpress:latest test1 Running Running 36 seconds ago
As shown, the task container of the mysql service runs on test3 (192.168.30.130), while the task container of the wordpress service runs on test1 (192.168.30.128).
192.168.30.128
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e189db13cbe3 wordpress:latest "docker-entrypoint.s…" About a minute ago Up About a minute 80/tcp wordpress.1.8b9g1upsll15m8crlhkebgc64
192.168.30.130
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6be947ccfe36 mysql:5.7 "docker-entrypoint.s…" 3 minutes ago Up 3 minutes 3306/tcp, 33060/tcp mysql.1.d8ofaptycyzs01ckt6k2zhxd7
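Open a browser and visit 192.168.30.128. After filling in the setup information and logging in, the WordPress site is displayed.
Using an overlay network, the WordPress project has been deployed successfully on the swarm cluster with docker service.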
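The same two services could also be described as a stack in a single YAML file. The following is a rough sketch rather than the setup used in this walkthrough: the file name wordpress-stack.yml, the stack name wp, and the network name backend are assumptions, while the images and environment values are copied from the docker service create commands.

```shell
#!/bin/sh
# Write a hypothetical stack file that mirrors the two docker service commands.
cat > wordpress-stack.yml <<'EOF'
version: "3.8"
services:
  mysql:
    image: mysql:5.7
    environment:
      MYSQL_ROOT_PASSWORD: "123456789"
      MYSQL_DATABASE: wordpress
    volumes:
      - mysql_data:/var/lib/mysql
    networks:
      - backend
  wordpress:
    image: wordpress
    environment:
      WORDPRESS_DB_PASSWORD: "123456789"
      WORDPRESS_DB_HOST: mysql
    ports:
      - "80:80"
    networks:
      - backend
    deploy:
      replicas: 1
networks:
  backend:
    driver: overlay
volumes:
  mysql_data:
EOF

echo "wrote wordpress-stack.yml"
# To deploy it, run on a manager node (the stack name "wp" is an assumption):
#   docker stack deploy -c wordpress-stack.yml wp
```

Keeping the network and volume definitions in the same file means the overlay network does not have to be created by hand before the services.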
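Now visit 192.168.30.129 and 192.168.30.130 in the browser as well. The site is reachable on those nodes too, even though neither of them runs a wordpress container.
Why is that? This is the routing mesh at work: if a service publishes a port, the service can be reached on that port of any swarm node.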
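The routing mesh can also be checked from a terminal instead of a browser. The sketch below is one possible way to do it, assuming curl is installed on the machine running the check; http_status and check_mesh are hypothetical helper names.

```shell
#!/bin/sh
# Fetch the HTTP status code of http://IP:PORT/ (5-second timeout).
http_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://$1:$2/"
}

# Hypothetical helper: report the status returned by each node for one port.
check_mesh() {
    port="$1"
    shift
    for ip in "$@"; do
        echo "$ip:$port -> $(http_status "$ip" "$port")"
    done
}

# Usage with the three nodes of this cluster:
#   check_mesh 80 192.168.30.128 192.168.30.129 192.168.30.130
```

If the routing mesh is working, every node should answer on the published port, including nodes that run no task of the service.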
Create the whoami service
# docker service create --name whoami --network test -p 8000:8000 jwilder/whoami
ckgqv2okq5wdscgv0pihk2hwr
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged
# docker service scale whoami=3
whoami scaled to 3
overall progress: 3 out of 3 tasks
1/3: running [==================================================>]
2/3: running [==================================================>]
3/3: running [==================================================>]
verify: Service converged
# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
3bjl5kse0let mysql replicated 1/1 mysql:5.7
ckgqv2okq5wd whoami replicated 3/3 jwilder/whoami:latest *:8000->8000/tcp
x96xdiazi4iu wordpress replicated 1/1 wordpress:latest *:80->80/tcp
# docker service ps whoami
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
ugndxn4ojkw4 whoami.1 jwilder/whoami:latest test2 Running Running 2 minutes ago
iv3cu975hpr4 whoami.2 jwilder/whoami:latest test1 Running Running about a minute ago
qc5kp773iof7 whoami.3 jwilder/whoami:latest test3 Running Running about a minute ago
Create the busybox service
# docker service create --name busybox --network test busybox sh -c "while true; do sleep 3600; done"
bxy6hzvrfoxy28yyagzvkfqmf
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged
# docker service ps busybox
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
rewjvfn34qq9 busybox.1 busybox:latest test2 Running Running 23 seconds ago
192.168.30.129
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
933d4e916cbd busybox:latest "sh -c 'while true; …" About a minute ago Up About a minute busybox.1.rewjvfn34qq9tbrn2dty4e6vf
7b28da1c9491 jwilder/whoami:latest "/app/http" 5 minutes ago Up 5 minutes 8000/tcp whoami.1.ugndxn4ojkw42u6y22rfdjlii
# docker exec -it busybox.1.rewjvfn34qq9tbrn2dty4e6vf sh
/ # ping whoami
PING whoami (10.0.0.17): 56 data bytes
64 bytes from 10.0.0.17: seq=0 ttl=64 time=0.192 ms
64 bytes from 10.0.0.17: seq=1 ttl=64 time=0.353 ms
64 bytes from 10.0.0.17: seq=2 ttl=64 time=0.144 ms
64 bytes from 10.0.0.17: seq=3 ttl=64 time=0.144 ms
^C
--- whoami ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 0.144/0.208/0.353 ms
Three whoami task containers were started earlier, so why does ping whoami from busybox always return a single IP?
/ # nslookup whoami
Server: 127.0.0.11
Address: 127.0.0.11:53
Non-authoritative answer:
*** Can't find whoami: No answer
/ # nslookup tasks.whoami 127.0.0.11
Server: 127.0.0.11
Address: 127.0.0.11:53
Non-authoritative answer:
Name: tasks.whoami
Address: 10.0.0.18
Name: tasks.whoami
Address: 10.0.0.21
Name: tasks.whoami
Address: 10.0.0.20
*** Can't find tasks.whoami: No answer
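This time three IPs are returned: 10.0.0.18, 10.0.0.21, and 10.0.0.20. Checking the IP inside each whoami container confirms that these are the real IPs of the whoami containers, whereas the IP returned by ping whoami, 10.0.0.17, is a virtual IP (VIP).
A service's VIP generally does not change; what changes during horizontal scaling is the set of task container IPs behind the VIP.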
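To compare the task IPs with the VIP in a script, the addresses can be pulled out of the nslookup output. The sketch below runs on a sample copied from the tasks.whoami lookup; task_ips is a hypothetical helper, and it assumes the busybox-style layout in which each Name: line is followed by an Address: line.

```shell
#!/bin/sh
# Hypothetical helper: print the address that follows each "Name:" line.
# The resolver's own "Server:"/"Address:" header lines are skipped because
# they are not preceded by a "Name:" line.
task_ips() {
    awk '/^Name:/ { getline; if ($1 == "Address:") print $2 }'
}

# Sample taken from the nslookup tasks.whoami output.
task_ips <<'EOF'
Server: 127.0.0.11
Address: 127.0.0.11:53
Non-authoritative answer:
Name: tasks.whoami
Address: 10.0.0.18
Name: tasks.whoami
Address: 10.0.0.21
Name: tasks.whoami
Address: 10.0.0.20
EOF

# Inside a container attached to the overlay network, the live form would be:
#   nslookup tasks.whoami 127.0.0.11 | task_ips
```

Running the live form before and after docker service scale shows the list of task IPs changing while the VIP stays the same.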
On any node, run:
# curl 127.0.0.1:8000
I'm ae9da507e9f7
# curl 127.0.0.1:8000
I'm 5e4782fd9dee
# curl 127.0.0.1:8000
I'm 7b28da1c9491
# curl 127.0.0.1:8000
I'm ae9da507e9f7
# curl 127.0.0.1:8000
I'm 5e4782fd9dee
# curl 127.0.0.1:8000
I'm 7b28da1c9491
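The repeated curl calls show that each response comes from a different hostname and that the responses rotate in round-robin order. This is load balancing.
Internally, the routing mesh is implemented with LVS (Linux Virtual Server), which load-balances through the VIP.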
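The round-robin pattern is easier to confirm by counting how often each container answers. The sketch below tallies the six responses from the curl session; tally_hosts is a hypothetical helper name, and it assumes every response has the form "I'm <hostname>" on its own line.

```shell
#!/bin/sh
# Hypothetical helper: count how often each container hostname answered.
# Input lines look like "I'm ae9da507e9f7"; output is "<hostname> <count>".
tally_hosts() {
    awk '{ print $2 }' | LC_ALL=C sort | uniq -c | awk '{ print $2, $1 }'
}

# Sample: the six responses from the curl session.
tally_hosts <<'EOF'
I'm ae9da507e9f7
I'm 5e4782fd9dee
I'm 7b28da1c9491
I'm ae9da507e9f7
I'm 5e4782fd9dee
I'm 7b28da1c9491
EOF

# Live form on any swarm node:
#   for i in 1 2 3 4 5 6; do curl -s 127.0.0.1:8000; done | tally_hosts
```

With three replicas and six requests, an even split of two responses per hostname is what round-robin balancing would produce.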
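Internal
Containers reach each other over the overlay network (through the VIP)
Ingress
If a service publishes a port, the service can be reached on that port of any swarm node
Load balancing for external access
The service port is exposed on every swarm node
LVS performs the load balancing internally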