Docker Swarm in Practice


Preparation

Prepare three machines to form a minimal swarm cluster with one manager node and two worker nodes:

172.16.0.200  Ubuntu14.04

172.16.0.201  Ubuntu14.04

172.16.0.202  Ubuntu14.04

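In production, run 2n+1 (n >= 1) manager nodes so that a majority of managers survives when one fails. More is not always better: the Docker documentation recommends a maximum of seven manager nodes.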
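
The reason for odd manager counts can be checked with a little arithmetic. The sketch below assumes only that swarm managers need a majority to agree (swarm mode uses Raft consensus); the function names are made up for this illustration.

```shell
#!/bin/sh
# For N managers, a majority (quorum) is N/2 + 1 using integer division,
# and the cluster can lose (N - 1)/2 managers while keeping that majority.
quorum() { echo $(( $1 / 2 + 1 )); }
tolerated_failures() { echo $(( ($1 - 1) / 2 )); }

for n in 1 3 4 5 7; do
  printf '%s managers: quorum %s, tolerates %s failure(s)\n' \
    "$n" "$(quorum "$n")" "$(tolerated_failures "$n")"
done
```

Note that 4 managers tolerate no more failures than 3, which is why an even count adds cost without adding fault tolerance.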

 

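Installing Docker

 

Automatic installation with a script

For test and development environments, Docker provides a convenience script that simplifies installation. On Ubuntu it can be used as follows: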

$ curl -fsSL get.docker.com -o get-docker.sh
$ sudo sh get-docker.sh --mirror Aliyun

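The first command downloads the script to get-docker.sh; the second runs it using the Aliyun mirror. The script installs Docker CE (the Edge release at the time this was written) and starts the service by default.

Alternatively, install directly with DaoCloud's installation script: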

curl -sSL https://get.daocloud.io/docker | sh

 

Starting Docker

service docker start

 

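Creating the docker user group

By default, the docker command talks to the Docker engine over a Unix socket, and only root and members of the docker group can access that socket. For security reasons, Linux systems are generally not used directly as root, so a better approach is to add the users who need docker to the docker group.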

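Create the docker group:

sudo groupadd docker

Add the current user to the docker group:

sudo usermod -aG docker $USER

Log out of the current terminal and log back in, then run the test below.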

 

Testing that Docker is installed correctly

$ docker run hello-world

Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
ca4f61b1923c: Pull complete
Digest: sha256:be0cd392e45be79ffeffa6b05338b98ebb16c87b255f48e297ec7f98e123905c
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://cloud.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/engine/userguide/

If you see the output above, Docker is installed correctly.

 

Creating the swarm cluster

All three machines listed at the top have Docker installed. 172.16.0.200 will be the manager and the other two will be workers.

 

Initializing the cluster

Run docker swarm init on the manager machine to initialize a swarm cluster.

root@ubuntu:~# docker swarm init --advertise-addr 172.16.0.200
Swarm initialized: current node (u66elsqnr7cx3ufefopuvchbm) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-1im8j7ggh1wqwxppvg3cl1mbpkzsnux4g4vgftg6s08dydl8xw-0iorbvp4rqvsqpe658i054xii 172.16.0.200:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

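If the Docker host has several network interfaces and therefore several IP addresses, you must use --advertise-addr to specify which address to advertise.

The node that runs docker swarm init automatically becomes a manager node.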
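
On a host with several addresses it helps to list them before choosing one. The following is a minimal sketch, assuming the iproute2 `ip` tool is available; `list_ipv4` is a hypothetical helper name, and it only reformats the output of `ip -o -4 addr show`.

```shell
#!/bin/sh
# Reads `ip -o -4 addr show` output on stdin and prints "interface address"
# pairs, skipping the loopback interface.
list_ipv4() {
  awk '$2 != "lo" { split($4, a, "/"); print $2, a[1] }'
}

# Usage on a Docker host (not run here):
#   ip -o -4 addr show | list_ipv4
#   docker swarm init --advertise-addr <one of the printed addresses>
```

--advertise-addr also accepts an interface name in place of an IP address.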

 

Adding worker nodes

Log in to 201:

root@api:~# ssh 172.16.0.201
root@172.16.0.201's password: 
Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-24-generic x86_64)
Last login: Fri Dec 29 12:05:09 2017 from 172.16.0.200

Join the cluster

Run the join command that was printed when the cluster was initialized:

root@ubuntu:~# docker swarm join --token SWMTKN-1-1im8j7ggh1wqwxppvg3cl1mbpkzsnux4g4vgftg6s08dydl8xw-0iorbvp4rqvsqpe658i054xii 172.16.0.200:2377
This node joined a swarm as a worker.

Log in to 202:

root@api:~# ssh 172.16.0.202
root@172.16.0.202's password: 
Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-24-generic x86_64)
Last login: Fri Dec 29 12:05:09 2017 from 172.16.0.200

Join the cluster:

root@ubuntu:~# docker swarm join --token SWMTKN-1-1im8j7ggh1wqwxppvg3cl1mbpkzsnux4g4vgftg6s08dydl8xw-0iorbvp4rqvsqpe658i054xii 172.16.0.200:2377
This node joined a swarm as a worker.

 

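Viewing the cluster

After the two steps above we have a minimal swarm cluster with one manager node and two worker nodes.

On the manager node, run docker node ls to view the cluster.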

root@ubuntu:~# docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS
c88xqfzcbhhlg6c07oochr2g7     ubuntu              Ready               Active              
rq5oh6hfdo32t9xbl7z7sgikm     ubuntu              Ready               Active              
u66elsqnr7cx3ufefopuvchbm *   ubuntu              Ready               Active              Leader

Leaving the cluster

On a worker node, run docker swarm leave:

root@ubuntu:~# docker swarm leave
Node left the swarm.

After both workers have left, check on the manager:

root@ubuntu:~# docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS
c88xqfzcbhhlg6c07oochr2g7     ubuntu              Down                Active              
rq5oh6hfdo32t9xbl7z7sgikm     ubuntu              Down                Active              
u66elsqnr7cx3ufefopuvchbm *   ubuntu              Ready               Active              Leader

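Both workers now show STATUS=Down. They stay in the list until they are removed on the manager with docker node rm.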
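
To find the nodes that are Down without reading the table by eye, the listing can be filtered. This is a small sketch: `down_nodes` is a hypothetical helper, and it assumes the input was produced with the `--format '{{.ID}} {{.Status}}'` option of docker node ls.

```shell
#!/bin/sh
# Reads "ID STATUS" pairs on stdin and prints the IDs of nodes that are Down.
down_nodes() {
  awk '$2 == "Down" { print $1 }'
}

# Usage on a manager (not run here):
#   docker node ls --format '{{.ID}} {{.Status}}' | down_nodes
#   docker node rm <id>    # removes a Down node from the list
```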

 

A question

The join command above contains a token. If I forget it, how do I add a new worker or manager node later? Run docker swarm join-token worker to print the command again:

root@ubuntu:~# docker swarm join-token worker
To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-1im8j7ggh1wqwxppvg3cl1mbpkzsnux4g4vgftg6s08dydl8xw-0iorbvp4rqvsqpe658i054xii 172.16.0.200:2377

For a manager it works the same way, with a different argument:

root@ubuntu:~# docker swarm join-token manager
To add a manager to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-1im8j7ggh1wqwxppvg3cl1mbpkzsnux4g4vgftg6s08dydl8xw-469wlqjh1jjuupv8ha29l2rzm 172.16.0.200:2377

The token can also be rotated:

$ docker swarm join-token --rotate worker
Succesfully rotated worker join token.
 
To add a worker to this swarm, run the following command:
 
    docker swarm join \
    --token SWMTKN-1-3pu6hszjas19xyp7ghgosyx9k8atbfcr8p2is99znpy26u2lkl-b30ljddcqhef9b9v4rs7mel7t \
    172.16.0.200:2377

After rotating with --rotate, only the new token can be used to join the cluster.

The -q (or --quiet) flag prints only the token:

root@ubuntu:~# docker swarm join-token -q worker
SWMTKN-1-1im8j7ggh1wqwxppvg3cl1mbpkzsnux4g4vgftg6s08dydl8xw-0iorbvp4rqvsqpe658i054xii
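
The quiet form is convenient in scripts. As a rough sketch, the join command can be assembled from the token and the manager address; `join_command` is a hypothetical helper, and the ssh usage in the comment assumes root SSH access to the worker as in the sessions above.

```shell
#!/bin/sh
# Builds the join command from a token and the manager address.
# 2377 is the port shown in the `docker swarm init` output earlier.
join_command() {
  printf 'docker swarm join --token %s %s:2377\n' "$1" "$2"
}

# Usage on a manager (not run here):
#   token=$(docker swarm join-token -q worker)
#   ssh root@172.16.0.201 "$(join_command "$token" 172.16.0.200)"
```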

 

Deploying services

The docker service command manages the services in a swarm cluster. It can only be run on a manager node.

 

Creating a service

Run on the manager node:

root@ubuntu:~# docker service create --name nginx --replicas 3 -p 80:80 nginx
tqd95pxsro7o0rs33ylz9zimj
overall progress: 3 out of 3 tasks 
1/3: running   [==================================================>] 
2/3: running   [==================================================>] 
3/3: running   [==================================================>] 
verify: Service converged 

Now enter the IP of any node in a browser, or use curl (for example curl http://172.16.0.200), to see the default nginx page:

root@ubuntu:~# curl http://172.16.0.200



Welcome to nginx!



Welcome to nginx!

If you see this page, the nginx web server is successfully installed and working. Further configuration is required.

For online documentation and support please refer to nginx.org.
Commercial support is available at nginx.com.

Thank you for using nginx.

Checking each machine with docker ps shows that a container for the service is running on it:

root@worker2:~# docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
08a92a22061b        nginx:latest        "nginx -g 'daemon of…"   33 seconds ago      Up 33 seconds       80/tcp              nginx.1.tfhtoyshmzun54x17l3l3sop5

 

Viewing services

Use docker service ls to list the services running in the swarm cluster.

root@ubuntu:~# docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE               PORTS
tqd95pxsro7o        nginx               replicated          3/3                 nginx:latest        *:80->80/tcp

Use docker service ps xxx to see the tasks of a service, including which node each one runs on:

root@manager1:~# docker service ps nginx 
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE            ERROR               PORTS
tfhtoyshmzun        nginx.1             nginx:latest        worker2             Running             Running 18 minutes ago                       
p6b1gfragl8y        nginx.2             nginx:latest        manager1            Running             Running 16 minutes ago                       
250owqr51y50        nginx.3             nginx:latest        worker1             Running             Running 20 minutes ago                   

Use docker service logs xxx to view the logs of a service. I had requested curl http://172.16.0.200 three times; the log looks like this:

root@manager1:~# docker service logs nginx 
nginx.2.p6b1gfragl8y@manager1    | 10.255.0.2 - - [02/Jan/2018:09:58:31 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
nginx.1.tfhtoyshmzun@worker2    | 10.255.0.2 - - [02/Jan/2018:10:01:48 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
nginx.3.250owqr51y50@worker1    | 10.255.0.2 - - [02/Jan/2018:10:05:12 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"

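Above, nginx is the service name. You can also view the log of a single task by passing its task ID; pressing Tab after logs lists the available names. I made three more requests, and each of the three tasks now shows two requests, so the requests are being load balanced across them.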

root@manager1:~# docker service logs 250owqr51y50 
nginx.3.250owqr51y50@worker1    | 10.255.0.2 - - [02/Jan/2018:10:05:12 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
nginx.3.250owqr51y50@worker1    | 10.255.0.2 - - [02/Jan/2018:10:06:43 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
root@manager1:~# docker service logs p6b1gfragl8y 
nginx.2.p6b1gfragl8y@manager1    | 10.255.0.2 - - [02/Jan/2018:09:58:31 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
nginx.2.p6b1gfragl8y@manager1    | 10.255.0.2 - - [02/Jan/2018:10:06:36 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
root@manager1:~# docker service logs tfhtoyshmzun 
nginx.1.tfhtoyshmzun@worker2    | 10.255.0.2 - - [02/Jan/2018:10:01:48 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
nginx.1.tfhtoyshmzun@worker2    | 10.255.0.2 - - [02/Jan/2018:10:06:42 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
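
Counting requests per task by hand gets tedious with more traffic. The sketch below tallies access-log lines per task from docker service logs output; `requests_per_task` is a hypothetical helper, and it assumes the first field of each line is the task name as in the output above.

```shell
#!/bin/sh
# Reads `docker service logs` output on stdin and prints "task count" lines.
# Only lines containing an HTTP GET request are counted.
requests_per_task() {
  awk '/"GET / { n[$1]++ } END { for (t in n) print t, n[t] }' | sort
}

# Usage on a manager (not run here):
#   docker service logs nginx | requests_per_task
```

Roughly equal counts across tasks are what you would expect if requests are spread evenly.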

 

Removing a service

Use docker service rm xxx to remove a service from the swarm cluster.

root@manager1:~# docker service rm nginx 
nginx

Checking again, the nginx service is gone:

root@manager1:~# docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE               PORTS

On the worker nodes, the containers have been removed as well:

root@worker1:~# docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

 

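Deploying with compose in a swarm

A swarm cluster can also use a compose file (docker-compose.yml) to configure and start several services at once. The example below deploys nginx together with a visualizer.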

 

Configuration file

docker-compose.yml on the manager1 node:

version: "3"
services:
  web:
    image: nginx
    deploy:
      replicas: 3
      restart_policy:
        condition: on-failure
      resources:
        limits:
          cpus: "0.1"
          memory: 50M
    ports:
      - "80:80"
    networks:
      - webnet
  visualizer:
    image: dockersamples/visualizer:stable
    ports:
      - "8080:8080"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
    deploy:
      placement:
        constraints: [node.role == manager]
    networks:
      - webnet
networks:
  webnet:

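Here:

1. Two services are started: web and visualizer.

2. web consists of 3 nginx replicas. visualizer is an open-source project that draws a diagram of the containers running across the swarm; it is constrained to run only on manager nodes.

3. A network named webnet is created. Its driver is overlay, as the last row of the docker network ls output below shows (after deployment it appears as proj_webnet). The containers of both services are connected through this network.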

root@manager1:~# docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
965de71420b3        bridge              bridge              local
f66cd75a8741        docker_gwbridge     bridge              local
136fb30aa99c        host                host                local
73ha87ntxcd5        ingress             overlay             swarm
eebca5dc00a0        none                null                local
pjxvmr7b1wcq        proj_webnet         overlay             swarm

 

Deploying the services

deploy: deploys the stack

-c: the compose file to use

proj: the stack name, which can be anything

root@manager1:~# docker stack deploy -c docker-compose.yml proj
Creating network proj_webnet
Creating service proj_web
Creating service proj_visualizer

Once deployment has finished, open http://<any node>:8080 to see the monitoring page.

[Figure: screenshot of the visualizer monitoring page]

Viewing the services

List all stacks:

root@manager1:~# docker stack ls 
NAME                SERVICES
proj                2

One stack with 2 services (web and visualizer).

List all services in the stack:

root@manager1:~# docker stack services proj 
ID                  NAME                MODE                REPLICAS            IMAGE                             PORTS
44u5poqrieoy        proj_visualizer     replicated          1/1                 dockersamples/visualizer:stable   *:8080->8080/tcp
s2hlawvwwal4        proj_web            replicated          3/3                 nginx:latest                      *:80->80/tcp

List the tasks in the stack and where they are running:

root@manager1:~# docker stack ps proj 
ID                  NAME                IMAGE                             NODE                DESIRED STATE       CURRENT STATE            ERROR               PORTS
quexa31y9kkt        proj_visualizer.1   dockersamples/visualizer:stable   manager1            Running             Running 20 minutes ago                       
4ozvbgpkrqtb        proj_web.1          nginx:latest                      worker2             Running             Running 21 minutes ago                       
skkh5uzsyl9o        proj_web.2          nginx:latest                      manager1            Running             Running 21 minutes ago                       
uns9r5vdx1x4        proj_web.3          nginx:latest                      worker1             Running             Running 21 minutes ago 

 

Scaling a service

For example, to go from 3 replicas to 5, run docker service scale proj_web=5:

root@manager1:~# docker service scale proj_web=5
proj_web scaled to 5
overall progress: 5 out of 5 tasks 
1/5: running   [==================================================>] 
2/5: running   [==================================================>] 
3/5: running   [==================================================>] 
4/5: running   [==================================================>] 
5/5: running   [==================================================>] 
verify: Service converged 

Check the state of the service:

root@manager1:~# docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                             PORTS
lhxiyrevf4dp        proj_visualizer     replicated          1/1                 dockersamples/visualizer:stable   *:8080->8080/tcp
uxo0piooz4it        proj_web            replicated          5/5                 nginx:latest                      *:80->80/tcp
root@manager1:~# docker service ps proj_web 
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE           ERROR               PORTS
6jqcnikrh39j        proj_web.1          nginx:latest        worker1             Running             Running 4 minutes ago                       
o7gq6w3wisbk        proj_web.2          nginx:latest        worker2             Running             Running 4 minutes ago                       
15hzrtciynx7        proj_web.3          nginx:latest        manager1            Running             Running 4 minutes ago                       
zjtcv1e68aag        proj_web.4          nginx:latest        worker1             Running             Running 2 minutes ago                       
7mvta3g7m8fm        proj_web.5          nginx:latest        worker2             Running             Running 2 minutes ago       

To scale down, simply set a smaller number, for example back to 3 replicas:

root@manager1:~# docker service scale proj_web=3
proj_web scaled to 3
overall progress: 3 out of 3 tasks 
1/3: running   [==================================================>] 
2/3: running   [==================================================>] 
3/3: running   [==================================================>] 
verify: Service converged 
root@manager1:~# docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                             PORTS
lhxiyrevf4dp        proj_visualizer     replicated          1/1                 dockersamples/visualizer:stable   *:8080->8080/tcp
uxo0piooz4it        proj_web            replicated          3/3                 nginx:latest                      *:80->80/tcp
root@manager1:~# docker service ps proj_web     
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE           ERROR               PORTS
15hzrtciynx7        proj_web.3          nginx:latest        manager1            Running             Running 6 minutes ago                       
zjtcv1e68aag        proj_web.4          nginx:latest        worker1             Running             Running 4 minutes ago                       
7mvta3g7m8fm        proj_web.5          nginx:latest        worker2             Running             Running 4 minutes ago    
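
In a script, the REPLICAS column can be used to check whether scaling has converged. A minimal sketch follows; `replicas_converged` is a hypothetical helper, and the usage comment assumes the `{{.Replicas}}` format field of docker service ls and a name filter that matches exactly one service.

```shell
#!/bin/sh
# Succeeds when a REPLICAS value such as "3/3" shows running == desired.
replicas_converged() {
  value=${1%% *}          # keep only the first word, e.g. "3/3"
  running=${value%%/*}    # part before the slash
  desired=${value##*/}    # part after the slash
  [ "$running" = "$desired" ]
}

# Usage on a manager (not run here):
#   r=$(docker service ls --filter name=proj_web --format '{{.Replicas}}')
#   replicas_converged "$r" && echo converged
```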

 

Removing the stack

docker stack rm xxx removes the stack and its services:

root@manager1:~# docker stack rm proj
Removing service proj_visualizer
Removing service proj_web
Removing network proj_webnet
root@manager1:~# docker stack ls
NAME                SERVICES

root@manager1:~# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

The webnet network has been removed as well:

root@worker2:~# docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
feff2a6d1503        bridge              bridge              local
2f0e401d10a6        docker_gwbridge     bridge              local
7a60d8ee6f8b        host                host                local
73ha87ntxcd5        ingress             overlay             swarm
98442c4c5766        none                null                local

 

-----------------------

Testing load balancing and failover

For example, stop the container on 202:

root@worker2:~# docker stop 505
505
root@manager1:~# docker stack ps proj 
ID                  NAME                IMAGE                             NODE                DESIRED STATE       CURRENT STATE            ERROR               PORTS
quexa31y9kkt        proj_visualizer.1   dockersamples/visualizer:stable   manager1            Running             Running 31 minutes ago                       
4ozvbgpkrqtb        proj_web.1          nginx:latest                      worker2             Shutdown            Complete 3 seconds ago                       
skkh5uzsyl9o        proj_web.2          nginx:latest                      manager1            Running             Running 32 minutes ago                       
uns9r5vdx1x4        proj_web.3          nginx:latest                      worker1             Running             Running 32 minutes ago   

The task on worker2 is now Shutdown.

I expected a new container to start after a while, but none did:

root@manager1:~# docker stack ps proj       
ID                  NAME                IMAGE                             NODE                DESIRED STATE       CURRENT STATE            ERROR               PORTS
quexa31y9kkt        proj_visualizer.1   dockersamples/visualizer:stable   manager1            Running             Running 37 minutes ago                       
4ozvbgpkrqtb        proj_web.1          nginx:latest                      worker2             Shutdown            Complete 6 minutes ago                       
skkh5uzsyl9o        proj_web.2          nginx:latest                      manager1            Running             Running 38 minutes ago                       
uns9r5vdx1x4        proj_web.3          nginx:latest                      worker1             Running             Running 38 minutes ago   

I then expected that restarting the container on 202 would bring the task back, but it did not:

root@worker2:~# docker start 505
505
root@manager1:~# docker stack ps proj 
ID                  NAME                IMAGE                             NODE                DESIRED STATE       CURRENT STATE            ERROR               PORTS
quexa31y9kkt        proj_visualizer.1   dockersamples/visualizer:stable   manager1            Running             Running 40 minutes ago                       
4ozvbgpkrqtb        proj_web.1          nginx:latest                      worker2             Shutdown            Complete 8 minutes ago                       
skkh5uzsyl9o        proj_web.2          nginx:latest                      manager1            Running             Running 41 minutes ago                       
uns9r5vdx1x4        proj_web.3          nginx:latest                      worker1             Running             Running 41 minutes ago 

What about removing the container? Still nothing:

root@worker2:~# docker rm -f 505 
505
root@worker2:~# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
root@manager1:~# docker stack ps proj       
ID                  NAME                IMAGE                             NODE                DESIRED STATE       CURRENT STATE          ERROR               PORTS
quexa31y9kkt        proj_visualizer.1   dockersamples/visualizer:stable   manager1            Running             Running 1 hours ago                        
4ozvbgpkrqtb        proj_web.1          nginx:latest                      worker2             Shutdown            Complete 1 hours ago                       
skkh5uzsyl9o        proj_web.2          nginx:latest                      manager1            Running             Running 1 hours ago                        
uns9r5vdx1x4        proj_web.3          nginx:latest                      worker1             Running             Running 1 hours ago   

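Although the task on 202 is down, requests to that node still succeed (curl http://172.16.0.202), because the swarm routing mesh forwards them to a running task on another node:

root@api:~# curl http://172.16.0.202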



Welcome to nginx!



Welcome to nginx!

If you see this page, the nginx web server is successfully installed and working. Further configuration is required.

For online documentation and support please refer to nginx.org.
Commercial support is available at nginx.com.

Thank you for using nginx.

After more testing, I found that a container that is killed is detected and replaced right away:

Kill the container on 202:

root@worker2:~# docker kill 2ff47
2ff47
root@worker2:~# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

The replica count drops from 3 to 2:

root@manager1:~# docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                             PORTS
lhxiyrevf4dp        proj_visualizer     replicated          1/1                 dockersamples/visualizer:stable   *:8080->8080/tcp
uxo0piooz4it        proj_web            replicated          2/3                 nginx:latest                      *:80->80/tcp

A few seconds later a new container has started and all 3 replicas are back. docker service ps still lists the failed task as Shutdown:

root@manager1:~# docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                             PORTS
lhxiyrevf4dp        proj_visualizer     replicated          1/1                 dockersamples/visualizer:stable   *:8080->8080/tcp
uxo0piooz4it        proj_web            replicated          3/3                 nginx:latest                      *:80->80/tcp
root@manager1:~# docker service ps proj_web 
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE            ERROR                         PORTS
15hzrtciynx7        proj_web.3          nginx:latest        manager1            Running             Running 9 minutes ago                                  
zjtcv1e68aag        proj_web.4          nginx:latest        worker1             Running             Running 6 minutes ago                                  
gu528xwks2h8        proj_web.5          nginx:latest        worker2             Running             Running 22 seconds ago                                 
7mvta3g7m8fm         \_ proj_web.5      nginx:latest        worker2             Shutdown            Failed 28 seconds ago    "task: non-zero exit (137)"  

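So a container stopped with docker stop is not replaced, while one killed with docker kill is. At first this looked like a bug, but it is most likely the restart policy in the compose file: restart_policy is set to condition: on-failure, so swarm only restarts tasks that exit with a non-zero code. docker stop lets nginx exit cleanly (state Complete), whereas docker kill produces exit code 137 (state Failed). Setting condition: any would also restart cleanly exited tasks.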
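
The compose file used here sets restart_policy to condition: on-failure, which offers one way to read the difference between stop and kill. The sketch below is a simplified model of the three documented conditions (none, on-failure, any), not Docker's actual implementation; `would_restart` is a name made up for this illustration.

```shell
#!/bin/sh
# Prints "yes" if a task that exited with the given code would be restarted
# under the given restart_policy condition, otherwise "no".
would_restart() {
  condition=$1
  exit_code=$2
  case "$condition" in
    any)        echo yes ;;
    on-failure) if [ "$exit_code" -ne 0 ]; then echo yes; else echo no; fi ;;
    *)          echo no ;;   # "none" and anything unrecognized
  esac
}

would_restart on-failure 0     # clean exit, the "Complete" state seen after docker stop
would_restart on-failure 137   # 128 + 9 (SIGKILL), the "Failed" state seen after docker kill
would_restart any 0            # condition: any restarts clean exits too
```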

Running docker service update proj_web also restarts the task that was stopped with docker stop:

root@manager1:~# docker service update proj_web 
proj_web
overall progress: 3 out of 3 tasks 
1/3: running   [==================================================>] 
2/3:   
3/3: running   [==================================================>] 
verify: Service converged 

It also cleans up the records of tasks that had exited for whatever reason. The earlier entry "7mvta3g7m8fm \_ proj_web.5 nginx:latest worker2 Shutdown Failed 5 minutes ago "task: non-zero exit (137)"" is gone:

root@manager1:~# docker service ps proj_web 
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE            ERROR               PORTS
r5ievoac5x3o        proj_web.1          nginx:latest        worker2             Running             Running 11 seconds ago                       
15hzrtciynx7        proj_web.3          nginx:latest        manager1            Running             Running 16 minutes ago                       
zjtcv1e68aag        proj_web.4          nginx:latest        worker1             Running             Running 13 minutes ago  

On 202, the exited container is gone as well:

root@worker2:~# docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
69346c5e694e        nginx:latest        "nginx -g 'daemon of…"   2 minutes ago       Up 2 minutes        80/tcp              proj_web.1.r5ievoac5x3obvgelod00o9vw

 

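Another issue: when I tried to enter a container with docker attach, it appeared to hang, and after I interrupted it with Ctrl-C the container stopped. (docker attach connects to the container's main process, here nginx, rather than opening a shell, and by default forwards Ctrl-C to that process. To get a shell, use docker exec -it <container> sh instead.)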

root@manager1:~# docker attach proj_web.3.15hzrtciynx72plzhjula2cd5 
^C
root@manager1:~# 
root@manager1:~# docker service ls    
ID                  NAME                MODE                REPLICAS            IMAGE                             PORTS
lhxiyrevf4dp        proj_visualizer     replicated          1/1                 dockersamples/visualizer:stable   *:8080->8080/tcp
uxo0piooz4it        proj_web            replicated          2/3                 nginx:latest                      *:80->80/tcp
root@manager1:~# docker stack ps proj 
ID                  NAME                IMAGE                             NODE                DESIRED STATE       CURRENT STATE             ERROR               PORTS
r5ievoac5x3o        proj_web.1          nginx:latest                      worker2             Running             Running 18 hours ago                          
ige19sbe6zma        proj_visualizer.1   dockersamples/visualizer:stable   manager1            Running             Running 19 hours ago                          
15hzrtciynx7        proj_web.3          nginx:latest                      manager1            Shutdown            Complete 18 seconds ago                       
zjtcv1e68aag        proj_web.4          nginx:latest                      worker1             Running             Running 19 hours ago     

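proj_web.3 is now Shutdown, and the cluster does not restart it, just as with the manual stop earlier (the state is again Complete, meaning a clean exit).

docker service update also hangs and does not start a replacement for the exited container: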

root@manager1:~# docker service update proj_web 
proj_web
overall progress: 2 out of 3 tasks 
1/3:   
2/3: running   [==================================================>] 
3/3: running   [==================================================>] 
^C
Operation continuing in background.
Use `docker service ps proj_web` to check progress.

root@manager1:~# docker service ps proj_web 
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE             ERROR               PORTS
r5ievoac5x3o        proj_web.1          nginx:latest        worker2             Running             Running 19 hours ago                          
15hzrtciynx7        proj_web.3          nginx:latest        manager1            Shutdown            Complete 12 minutes ago                       
zjtcv1e68aag        proj_web.4          nginx:latest        worker1             Running             Running 19 hours ago        

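docker service scale proj_web=4 also hangs: the newly added task starts, but the slot of the exited container never comes up. This may be a problem with the service's internal network.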

root@manager1:~# docker service scale proj_web=4
proj_web scaled to 4
overall progress: 3 out of 4 tasks 
1/4: running   [==================================================>] 
2/4:   
3/4: running   [==================================================>] 
4/4: running   [==================================================>] 

Suspecting a container network problem, I tried removing the affected container:

root@manager1:~# docker ps -a
CONTAINER ID        IMAGE                             COMMAND                  CREATED              STATUS                      PORTS               NAMES
ecd4a8bebc9c        nginx:latest                      "nginx -g 'daemon of…"   About a minute ago   Up About a minute           80/tcp              proj_web.2.xwmvxfs23ijab0wessptod8d7
d58eb620e9aa        nginx:latest                      "nginx -g 'daemon of…"   19 hours ago         Exited (0) 21 minutes ago                       proj_web.3.15hzrtciynx72plzhjula2cd5
1879b088ca37        dockersamples/visualizer:stable   "npm start"              19 hours ago         Up 19 hours                 8080/tcp            proj_visualizer.1.ige19sbe6zma322oplynf7csp
5bc982461a41        dockersamples/visualizer:stable   "npm start"              27 hours ago         Exited (0) 19 hours ago                         proj_visualizer.1.quexa31y9kkt1m4a4pqcwskzt
root@manager1:~# docker rm d58
d58
root@manager1:~# docker service update proj_web 
proj_web
overall progress: 3 out of 4 tasks 
1/4:   
2/4: running   [==================================================>] 
3/4: running   [==================================================>] 
4/4: running   [==================================================>] 
^C
Operation continuing in background.
Use `docker service ps proj_web` to check progress.
##### Removing it does not help: the service still expects 4 replicas, and one of them never comes up
root@manager1:~# docker rm 5bc
5bc
root@manager1:~# docker service update proj_web 
proj_web
overall progress: 3 out of 4 tasks 
1/4:   
2/4: running   [==================================================>] 
3/4: running   [==================================================>] 
4/4: running   [==================================================>] 
^C
Operation continuing in background.
Use `docker service ps proj_web` to check progress.
root@manager1:~# docker service scale proj_web=3
proj_web scaled to 3
overall progress: 3 out of 3 tasks 
1/3: running   [==================================================>] 
2/3: running   [==================================================>] 
3/3: running   [==================================================>] 
verify: Service converged 
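##### After scaling back to 3 replicas everything is fine, which suggests the problem was tied to the exited container, possibly its network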
root@manager1:~# docker service ps proj_web 
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE           ERROR               PORTS
r5ievoac5x3o        proj_web.1          nginx:latest        worker2             Running             Running 19 hours ago                        
xwmvxfs23ija        proj_web.2          nginx:latest        manager1            Running             Running 4 minutes ago                       
zjtcv1e68aag        proj_web.4          nginx:latest        worker1             Running             Running 19 hours ago                        
root@manager1:~# 

 

Reposted from: https://my.oschina.net/u/914655/blog/1596700
