Nomad Service Orchestration
Nomad is a tool for managing a cluster of machines and running applications on it.
Quick Start
Environment Setup
Prepare three VMs as described in the earlier post "Building a Consul Cluster" (Consul 搭建集群).
Node | IP |
---|---|
n1 | 172.20.20.10 |
n2 | 172.20.20.11 |
n3 | 172.20.20.12 |
Single-node installation
Log in to VM n1 and switch to the root user:
» vagrant ssh n1
[vagrant@n1 ~]$ su
Password:
[root@n1 vagrant]#
Install a few required tools:
[root@n1 vagrant]# yum install -y epel-release
[root@n1 vagrant]# yum install -y jq
[root@n1 vagrant]# yum install -y unzip
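Download version 0.8.1 into /tmp.
The latest release at the time of writing, 0.8.3, has a bug that makes it register services repeatedly when used with Consul, so 0.8.1 is used here.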
[root@n1 vagrant]# cd /tmp/
[root@n1 vagrant]# curl -s https://releases.hashicorp.com/nomad/0.8.1/nomad_0.8.1_linux_amd64.zip -o nomad.zip
Unzip the archive, make nomad executable, and move it to /usr/bin/:
[root@n1 vagrant]# unzip nomad.zip
[root@n1 vagrant]# chmod +x nomad
[root@n1 vagrant]# mv nomad /usr/bin/nomad
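The download and install steps above can also be scripted. A minimal Python sketch, assuming the same releases.hashicorp.com URL pattern and the linux_amd64 build used above; nomad_release_url, extract_binary and install_nomad are hypothetical helper names. extract_binary mirrors unzip plus chmod +x, and install_nomad writes directly to /usr/bin/nomad, so it has to run as root.

```python
import io
import os
import stat
import urllib.request
import zipfile


def nomad_release_url(version: str, os_name: str = "linux", arch: str = "amd64") -> str:
    """Build the releases.hashicorp.com download URL for a Nomad zip archive."""
    return (
        f"https://releases.hashicorp.com/nomad/{version}/"
        f"nomad_{version}_{os_name}_{arch}.zip"
    )


def extract_binary(zip_bytes: bytes, dest: str) -> None:
    """Write the 'nomad' entry of the archive to dest and add execute bits (like chmod +x)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        data = zf.read("nomad")
    with open(dest, "wb") as f:
        f.write(data)
    mode = os.stat(dest).st_mode
    os.chmod(dest, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def install_nomad(version: str = "0.8.1", dest: str = "/usr/bin/nomad") -> None:
    """Download and install Nomad; needs network access and write access to dest."""
    with urllib.request.urlopen(nomad_release_url(version), timeout=60) as resp:
        extract_binary(resp.read(), dest)
```

Calling install_nomad() as root on a VM should be equivalent to the curl, unzip, chmod and mv commands above; pass another version string to fetch a different release.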
Check that nomad was installed successfully:
[root@n1 vagrant]# nomad
Usage: nomad [-version] [-help] [-autocomplete-(un)install] [args]
Common commands:
run Run a new job or update an existing job
stop Stop a running job
status Display the status output for a resource
alloc Interact with allocations
job Interact with jobs
node Interact with nodes
agent Runs a Nomad agent
Other commands:
acl Interact with ACL policies and tokens
agent-info Display status information about the local agent
deployment Interact with deployments
eval Interact with evaluations
namespace Interact with namespaces
operator Provides cluster-level tools for Nomad operators
quota Interact with quotas
sentinel Interact with Sentinel policies
server Interact with servers
ui Open the Nomad Web UI
version Prints the Nomad version
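If output like the above appears, the installation succeeded.
Batch installation
See the batch installation section of the earlier post "Building a Consul Cluster" (Consul 搭建集群).
The following script installs nomad on every VM in one pass and also installs Docker on each VM.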
$script = <
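Starting the Agents
First, start Consul and form a cluster, as described in "Building a Consul Cluster" (Consul 搭建集群). With the default configuration, nomad detects the Consul agent on the local machine at startup and automatically registers the nomad services with it.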
n1
[root@n1 vagrant]# consul agent -server -bootstrap-expect 3 -data-dir /etc/consul.d -node=node1 -bind=172.20.20.10 -ui -client 0.0.0.0
n2
[root@n2 vagrant]# consul agent -server -bootstrap-expect 3 -data-dir /etc/consul.d -node=node2 -bind=172.20.20.11 -ui -client 0.0.0.0 -join 172.20.20.10
n3
[root@n3 vagrant]# consul agent -server -bootstrap-expect 3 -data-dir /etc/consul.d -node=node3 -bind=172.20.20.12 -ui -client 0.0.0.0 -join 172.20.20.10
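The three Consul commands differ only in -node, -bind and whether -join is present. A small Python sketch that generates them from the VM table, assuming node1 (172.20.20.10) is the node the others join, as above; NODES, SEED and consul_server_args are hypothetical names.

```python
from typing import List, Optional

# Node name -> bind address, taken from the VM table above.
NODES = {
    "node1": "172.20.20.10",
    "node2": "172.20.20.11",
    "node3": "172.20.20.12",
}
SEED = "172.20.20.10"  # every other node joins the cluster through node1


def consul_server_args(node: str, ip: str, join: Optional[str] = None, expect: int = 3) -> List[str]:
    """Build the argv for one Consul server agent, mirroring the commands above."""
    args = [
        "consul", "agent", "-server",
        "-bootstrap-expect", str(expect),
        "-data-dir", "/etc/consul.d",
        f"-node={node}", f"-bind={ip}",
        "-ui", "-client", "0.0.0.0",
    ]
    if join:
        args += ["-join", join]
    return args


if __name__ == "__main__":
    for node, ip in NODES.items():
        join = None if ip == SEED else SEED
        print(" ".join(consul_server_args(node, ip, join)))
```

Printing the argv instead of running it keeps the script safe to try on any machine; each printed line can be pasted on the matching VM.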
[root@n1 vagrant]# consul members
Node Address Status Type Build Protocol DC Segment
node1 172.20.20.10:8301 alive server 1.1.0 2 dc1
node2 172.20.20.11:8301 alive server 1.1.0 2 dc1
node3 172.20.20.12:8301 alive server 1.1.0 2 dc1
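Before starting Nomad it is worth confirming that all three Consul servers are alive. A minimal Python sketch that parses the consul members table shown above, assuming whitespace-separated columns where only the trailing Segment column may be empty; parse_members and all_servers_alive are hypothetical names.

```python
from typing import Dict, List


def parse_members(text: str) -> List[Dict[str, str]]:
    """Turn `consul members` output into one dict per node, keyed by column name.

    Whitespace splitting only works while every column before the last non-empty
    one has a value; in the output above only the trailing Segment column is empty.
    """
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def all_servers_alive(members: List[Dict[str, str]], expected: int = 3) -> bool:
    """True when `expected` server members are present and every one is alive."""
    servers = [m for m in members if m.get("Type") == "server"]
    return len(servers) == expected and all(m.get("Status") == "alive" for m in servers)


SAMPLE = """\
Node Address Status Type Build Protocol DC Segment
node1 172.20.20.10:8301 alive server 1.1.0 2 dc1
node2 172.20.20.11:8301 alive server 1.1.0 2 dc1
node3 172.20.20.12:8301 alive server 1.1.0 2 dc1
"""

if __name__ == "__main__":
    print(all_servers_alive(parse_members(SAMPLE)))  # prints True
```

On a VM, the live table can be fed in with subprocess.run(["consul", "members"], capture_output=True, text=True).stdout instead of SAMPLE.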
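Basic concepts
- server: accepts submitted jobs and schedules them onto clients
- client: runs the tasks of the jobs assigned to it
Starting the servers
Define the server configuration file server.hcl: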
log_level = "DEBUG"
bind_addr = "0.0.0.0"
data_dir = "/home/vagrant/data_server"
name = "server1"
advertise {
http = "172.20.20.10:4646"
rpc = "172.20.20.10:4647"
serf = "172.20.20.10:4648"
}
server {
enabled = true
# Wait for three servers to join before electing a leader; use 3 or 5 in production
bootstrap_expect = 3
}
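bootstrap_expect = 3 makes each server wait until three servers have joined before the first leader election. Nomad servers replicate state with Raft, which needs a majority (quorum) of servers to elect a leader. A short sketch of the standard majority arithmetic (general Raft math, not Nomad code; quorum and failure_tolerance are hypothetical names):

```python
def quorum(servers: int) -> int:
    """Smallest majority of `servers` voters; Raft needs this many to elect a leader."""
    return servers // 2 + 1


def failure_tolerance(servers: int) -> int:
    """How many servers can fail while a majority is still available."""
    return servers - quorum(servers)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"servers={n} quorum={quorum(n)} tolerates={failure_tolerance(n)}")
```

With three servers the quorum is 2, so the cluster survives one failure, which matches the later step where server1 is stopped and a remaining server keeps the leadership. Four servers still tolerate only one failure, which is why odd cluster sizes are recommended.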
Run it from the command line:
[root@n1 vagrant]# nomad agent -config=server.hcl
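On n2 and n3, adjust server.hcl for each node first (name = "server2" / "server3", and the advertise addresses set to 172.20.20.11 / 172.20.20.12), then run: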
nomad agent -config=server.hcl
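The server.hcl above is written for n1; name and the advertise addresses are node-specific (the member list later shows server2 at 172.20.20.11 and server3 at 172.20.20.12). A Python sketch that renders the file per node, assuming everything else stays identical; SERVERS, SERVER_HCL and render_server_hcl are hypothetical names.

```python
# Server name -> the VM's own IP, matching the `nomad server members` output later on.
SERVERS = {
    "server1": "172.20.20.10",
    "server2": "172.20.20.11",
    "server3": "172.20.20.12",
}

# Same settings as server.hcl above; doubled braces are literal braces for str.format.
SERVER_HCL = """\
log_level = "DEBUG"
bind_addr = "0.0.0.0"
data_dir = "/home/vagrant/data_server"
name = "{name}"
advertise {{
  http = "{ip}:4646"
  rpc = "{ip}:4647"
  serf = "{ip}:4648"
}}
server {{
  enabled = true
  bootstrap_expect = {expect}
}}
"""


def render_server_hcl(name: str, ip: str, expect: int = 3) -> str:
    """Return server.hcl text for one node; only name and the advertise IP change."""
    return SERVER_HCL.format(name=name, ip=ip, expect=expect)


if __name__ == "__main__":
    print(render_server_hcl("server2", SERVERS["server2"]))
```

Save the rendered text as server.hcl on the matching VM before running nomad agent -config=server.hcl.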
Open http://172.20.20.10:8500/ui/#/dc1/services in a browser:
Consul shows that the nomad services have all started.
Then open Nomad's built-in UI at http://172.20.20.10:4646/ui/servers:
all servers are shown as running.
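The same checks can be scripted against the HTTP APIs instead of the browser. A minimal Python sketch, assuming the default ports used above (Consul 8500, Nomad 4646): it reads the current Raft leader from Nomad's /v1/status/leader, the server peer list from /v1/status/peers, and Consul's service catalog from /v1/catalog/services, where the Nomad servers are expected to appear under the service name nomad. get_json and check_cluster are hypothetical names; opener is injectable so the logic can be exercised without a live cluster.

```python
import json
import urllib.request


def get_json(url: str, opener=urllib.request.urlopen, timeout: float = 5):
    """GET url and decode the JSON body."""
    with opener(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def check_cluster(
    nomad_addr: str = "http://172.20.20.10:4646",
    consul_addr: str = "http://172.20.20.10:8500",
    opener=urllib.request.urlopen,
) -> dict:
    """Summarize leader, Raft peer count and Consul registration of the Nomad servers."""
    leader = get_json(f"{nomad_addr}/v1/status/leader", opener)
    peers = get_json(f"{nomad_addr}/v1/status/peers", opener)
    services = get_json(f"{consul_addr}/v1/catalog/services", opener)
    return {
        "leader": leader,                    # e.g. "ip:4647" of the current leader
        "peers": len(peers),                 # should equal bootstrap_expect once joined
        "nomad_registered": "nomad" in services,
    }
```

From any VM, print(check_cluster()) should report three peers and a non-empty leader once all servers have joined.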
Starting the clients
Start Docker before starting a client, because the client uses Docker to run the job's tasks:
[root@n1 vagrant]# systemctl start docker
Start Docker on n2 and n3 as well.
Define the client configuration file client.hcl:
log_level = "DEBUG"
data_dir = "/home/vagrant/data_clinet"
name = "client1"
advertise {
http = "172.20.20.10:4646"
rpc = "172.20.20.10:4647"
serf = "172.20.20.10:4648"
}
client {
enabled = true
servers = ["172.20.20.10:4647"]
}
ports {
http = 5656
}
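Like server.hcl, client.hcl needs a per-node name and advertise address on n2 and n3. A Python sketch that renders it, assuming the hypothetical names client1 to client3 and keeping the server address and data_dir from the listing above. One deliberate difference, labeled as an assumption: the sketch advertises the HTTP address on the port the client actually listens on (ports.http = 5656) instead of 4646. CLIENT_HCL and render_client_hcl are hypothetical names.

```python
from typing import Iterable

# Same layout as client.hcl above; doubled braces are literal braces for str.format.
CLIENT_HCL = """\
log_level = "DEBUG"
data_dir = "/home/vagrant/data_clinet"
name = "{name}"
advertise {{
  http = "{ip}:{http_port}"
  rpc = "{ip}:4647"
  serf = "{ip}:4648"
}}
client {{
  enabled = true
  servers = [{servers}]
}}
ports {{
  http = {http_port}
}}
"""


def render_client_hcl(
    name: str,
    ip: str,
    servers: Iterable[str] = ("172.20.20.10:4647",),
    http_port: int = 5656,
) -> str:
    """Return client.hcl text for one node (assumption: advertise the real HTTP port)."""
    server_list = ", ".join(f'"{s}"' for s in servers)
    return CLIENT_HCL.format(name=name, ip=ip, servers=server_list, http_port=http_port)


if __name__ == "__main__":
    print(render_client_hcl("client2", "172.20.20.11"))
```

Passing all three server RPC addresses in servers is also possible; the single address mirrors the listing above.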
On n1, run:
[root@n1 vagrant]# nomad agent -config=client.hcl
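Open http://172.20.20.10:8500/ui/#/dc1/services/nomad-client in a browser:
nomad-client is shown as started. Start a client on n2 and n3 the same way, with name and the advertise addresses changed to match each node.
Running a Job
On n2, create a directory named job and run nomad init in it: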
[root@n2 vagrant]# mkdir job
[root@n2 vagrant]# cd job/
[root@n2 job]# nomad init
Example job file written to example.nomad
This generates a job file for a job named example.
Submit it from the command line:
[root@n2 job]# nomad run example.nomad
==> Monitoring evaluation "97f8a1fe"
Evaluation triggered by job "example"
Evaluation within deployment: "3c89e74a"
Allocation "47bf1f20" created: node "9df69026", group "cache"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "97f8a1fe" finished with status "complete"
Advanced operations
Cluster members
[root@n1 vagrant]# nomad server members
Name Address Port Status Leader Protocol Build Datacenter Region
server1.global 172.20.20.10 4648 alive false 2 0.8.1 dc1 global
server2.global 172.20.20.11 4648 alive false 2 0.8.1 dc1 global
server3.global 172.20.20.12 4648 alive true 2 0.8.1 dc1 global
Checking job status
[root@n1 vagrant]# nomad status example
ID = example
Name = example
Submit Date = 2018-06-13T08:42:57Z
Type = service
Priority = 50
Datacenters = dc1
Status = running
Periodic = false
Parameterized = false
Summary
Task Group Queued Starting Running Failed Complete Lost
cache 0 0 1 0 0 0
Latest Deployment
ID = 3c89e74a
Status = successful
Description = Deployment completed successfully
Deployed
Task Group Desired Placed Healthy Unhealthy
cache 1 1 1 0
Allocations
ID Node ID Task Group Version Desired Status Created Modified
47bf1f20 9df69026 cache 0 run running 8m44s ago 8m26s ago
Modifying a job
Edit example.nomad, find count = 1, and change it to count = 3.
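For scripted rollouts, the same edit can be applied programmatically. A minimal Python sketch, assuming only the first uncommented count = N line of the job file should change; commented lines such as # count = 1 are left alone. COUNT_RE and set_count are hypothetical names.

```python
import re

# Matches an uncommented "count = N" assignment at the start of a line (indentation allowed).
COUNT_RE = re.compile(r"^([ \t]*count[ \t]*=[ \t]*)\d+", re.MULTILINE)


def set_count(job_text: str, new_count: int) -> str:
    """Return job_text with the first `count = N` line changed to `count = new_count`."""
    new_text, replaced = COUNT_RE.subn(
        lambda m: f"{m.group(1)}{new_count}", job_text, count=1
    )
    if replaced == 0:
        raise ValueError("no 'count = N' line found in the job file")
    return new_text
```

Typical use: read example.nomad, pass the text through set_count(text, 3), write it back, then run nomad plan to preview the change.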
Preview the planned changes to the job from the command line:
[root@n2 job]# nomad plan example.nomad
+/- Job: "example"
+/- Task Group: "cache" (2 create, 1 in-place update)
+/- Count: "1" => "3" (forces create)
Task: "redis"
Scheduler dry-run:
- All tasks successfully allocated.
Job Modify Index: 70
To submit the job with version verification run:
nomad job run -check-index 70 example.nomad
When running the job with the check-index flag, the job will only be run if the
server side version matches the job modify index returned. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.
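The plan-then-run pattern with -check-index can be automated by reading the Job Modify Index from the plan output. A Python sketch; check_index_command and plan_then_run are hypothetical names. According to the Nomad documentation, nomad plan exits with 0 when no allocations would change and 1 when allocations would be created or destroyed, so both are treated as success here.

```python
import re
import subprocess
from typing import List

INDEX_RE = re.compile(r"Job Modify Index:\s*(\d+)")


def check_index_command(plan_output: str, job_file: str = "example.nomad") -> List[str]:
    """Build the `nomad job run -check-index N <file>` argv from `nomad plan` output."""
    match = INDEX_RE.search(plan_output)
    if match is None:
        raise ValueError("no 'Job Modify Index' line in plan output")
    return ["nomad", "job", "run", "-check-index", match.group(1), job_file]


def plan_then_run(job_file: str = "example.nomad") -> None:
    """Run `nomad plan`, then submit only if the job is unchanged since the plan."""
    plan = subprocess.run(["nomad", "plan", job_file], capture_output=True, text=True)
    if plan.returncode not in (0, 1):  # other exit codes mean the plan itself failed
        raise RuntimeError(plan.stderr)
    subprocess.run(check_index_command(plan.stdout, job_file), check=True)
```

If another user changes the job between the plan and the run, the server rejects the submission and check=True turns that into a CalledProcessError.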
Apply the change:
[root@n2 job]# nomad job run -check-index 70 example.nomad
==> Monitoring evaluation "3a0ff5e0"
Evaluation triggered by job "example"
Evaluation within deployment: "2b5b803f"
Allocation "34086acb" created: node "6166e031", group "cache"
Allocation "4d01cd92" created: node "f97b5095", group "cache"
Allocation "47bf1f20" modified: node "9df69026", group "cache"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "3a0ff5e0" finished with status "complete"
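The monitor output of nomad run can be summarized to see how many allocations were created or modified and whether the evaluation completed. A Python sketch assuming the message formats shown above; ALLOC_RE, DONE_RE and summarize_eval are hypothetical names.

```python
import re
from collections import Counter

# e.g. Allocation "34086acb" created: node "6166e031", group "cache"
ALLOC_RE = re.compile(r'Allocation "(\w+)" (\w+): node "(\w+)", group "([^"]+)"')
# e.g. ==> Evaluation "3a0ff5e0" finished with status "complete"
DONE_RE = re.compile(r'Evaluation "(\w+)" finished with status "(\w+)"')


def summarize_eval(output: str) -> dict:
    """Summarize `nomad run` monitor output: final eval status and allocation actions."""
    allocs = ALLOC_RE.findall(output)
    done = DONE_RE.search(output)
    return {
        "evaluation": done.group(1) if done else None,
        "status": done.group(2) if done else None,
        "actions": Counter(action for _, action, _, _ in allocs),
        "nodes": sorted({node for _, _, node, _ in allocs}),
    }
```

For the run above this reports two created allocations, one modified allocation and three distinct nodes.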
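Two new allocations were placed on two more client nodes, so three clients now run the job.
The web UI shows 3 instances in total.
The job's version history is visible as well: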
[root@n2 job]# nomad status example
ID = example
Name = example
Submit Date = 2018-06-13T08:56:03Z
Type = service
Priority = 50
Datacenters = dc1
Status = running
Periodic = false
Parameterized = false
Summary
Task Group Queued Starting Running Failed Complete Lost
cache 0 0 3 0 0 0
Latest Deployment
ID = 2b5b803f
Status = successful
Description = Deployment completed successfully
Deployed
Task Group Desired Placed Healthy Unhealthy
cache 3 3 3 0
Allocations
ID Node ID Task Group Version Desired Status Created Modified
34086acb 6166e031 cache 1 run running 3m38s ago 3m25s ago
4d01cd92 f97b5095 cache 1 run running 3m38s ago 3m26s ago
47bf1f20 9df69026 cache 1 run running 16m43s ago 3m27s ago
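When scripting a rollout, the Allocations section of nomad status can be parsed to confirm that every allocation runs the new version. A Python sketch assuming the column order shown above (ID, Node ID, Task Group, Version, Desired, Status, then timestamps) and task group names without spaces; parse_allocations and all_running_version are hypothetical names. For something more robust, Nomad's HTTP API endpoint /v1/job/<job>/allocations returns the same data as JSON.

```python
from typing import Dict, List


def parse_allocations(status_output: str) -> List[Dict[str, object]]:
    """Extract the rows of the Allocations table printed by `nomad status <job>`."""
    lines = status_output.splitlines()
    starts = [i for i, line in enumerate(lines) if line.strip() == "Allocations"]
    if not starts:
        return []
    allocs = []
    # Skip the "Allocations" title and the column-header line that follows it.
    for line in lines[starts[0] + 2:]:
        fields = line.split()
        if len(fields) < 6:
            break  # blank line or end of the table
        allocs.append({
            "id": fields[0],
            "node": fields[1],
            "group": fields[2],
            "version": int(fields[3]),
            "desired": fields[4],
            "status": fields[5],
        })
    return allocs


def all_running_version(allocs: List[Dict[str, object]], version: int) -> bool:
    """True when there is at least one allocation and all run `version` and are running."""
    return bool(allocs) and all(
        a["status"] == "running" and a["version"] == version for a in allocs
    )
```

Applied to the output above, all three allocations are running version 1 on three different nodes.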
Leaving the cluster
First, stop the nomad server on n1 with Ctrl-C.
Then list the members from n2:
[root@n2 job]# nomad server members
Name Address Port Status Leader Protocol Build Datacenter Region
server1.global 172.20.20.10 4648 failed false 2 0.8.1 dc1 global
server2.global 172.20.20.11 4648 alive true 2 0.8.1 dc1 global
server3.global 172.20.20.12 4648 alive false 2 0.8.1 dc1 global
server1 now has status failed, so force it out of the cluster:
[root@n2 job]# nomad server force-leave server1.global
[root@n2 job]# nomad server members
Name Address Port Status Leader Protocol Build Datacenter Region
server1.global 172.20.20.10 4648 left false 2 0.8.1 dc1 global
server2.global 172.20.20.11 4648 alive true 2 0.8.1 dc1 global
server3.global 172.20.20.12 4648 alive false 2 0.8.1 dc1 global
server1's status is now left, which means it has been removed from the cluster.
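The check done by eye here (spot the failed member, force it out, confirm it shows left) can be scripted. A Python sketch assuming the nine-column layout of nomad server members shown above; parse_server_members, current_leader and force_leave_commands are hypothetical names. It only builds the commands, so they can be reviewed before running them, for example with subprocess.run(cmd, check=True).

```python
from typing import Dict, List, Optional


def parse_server_members(text: str) -> List[Dict[str, object]]:
    """Parse `nomad server members` output (header line first) into dicts."""
    members = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 9:
            continue
        name, address, port, status, leader, _protocol, _build, dc, region = fields[:9]
        members.append({
            "name": name,
            "address": address,
            "port": int(port),
            "status": status,
            "leader": leader == "true",
            "datacenter": dc,
            "region": region,
        })
    return members


def current_leader(members: List[Dict[str, object]]) -> Optional[str]:
    """Name of the member flagged as Raft leader, if any."""
    return next((m["name"] for m in members if m["leader"]), None)


def force_leave_commands(members: List[Dict[str, object]]) -> List[List[str]]:
    """argv for `nomad server force-leave` for every member reported as failed."""
    return [
        ["nomad", "server", "force-leave", m["name"]]
        for m in members
        if m["status"] == "failed"
    ]
```

On the first member list above this yields one command for server1.global and reports server2.global as leader; on the list after force-leave it yields no commands.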