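These notes are adapted from the official DC/OS documentation at https://docs.mesosphere.com/1.9/.
Overview
1. What is DC/OS?
Keywords: distributed system, cluster manager, container platform, operating system
DC/OS is a distributed system, a cluster manager, a container platform, and an operating system for the datacenter.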
1.1 Distributed System
Nodes are either master nodes or agent nodes.
master nodes: leader election
1.2 Cluster Manager
Built on Apache Mesos
1.3 Container Platform
Two built-in task schedulers:
- Marathon
- Metronome(DC/OS Jobs)
Two container runtimes:
- Docker
- Mesos
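All tasks running on DC/OS are containerized. You can use prebuilt images, or executables and scripts, which are containerized at runtime.
Currently Docker must be installed on every node; future versions may no longer require the Docker engine.
1.4 Operating System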
- abstract resources
- package management
- networking
- logging and metrics
- storage and volumes
- identity management
Kernel space and user space:
- system space: resource allocation, security, process isolation
- user space: applications, jobs, services (package manager)
DC/OS is not a host operating system.
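2. Architecture
Keywords: Software Layer, Platform Layer, Infrastructure Layer, package, component
DC/OS is a platform for running distributed, containerized software. It hides differences in the infrastructure layer and can run on virtual or physical machines;
any infrastructure that provides compute, storage, and networking is sufficient.
The overall architecture is divided into software, platform, and infrastructure layers.
Software Layer
The defining feature of the software layer is the package.
Package management and a package repository simplify service installation
(databases, message queues, stream processors, artifact repositories, monitoring solutions, CI tools, source control, log aggregators).
Users can also install their own custom services.
Platform Layer
The platform layer consists mainly of components distributed across the nodes.
Components fall into the following categories: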
- Cluster Management
- Container Orchestration
- Container Runtimes
- Logging and Metrics
- Networking
- Package Management
- IAM and Security (Identity and Access Management)
- Storage
Components run on master nodes, private agent nodes, and public agent nodes.
Infrastructure Layer
DC/OS can be installed on x86 machines on a shared IPv4 network:
- public clouds
- private clouds
- on-premises hardware
External components
- GUI
- CLI
- package repository
- container registry(master.mesos:5000)
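The following topics are covered next:
- Node types
- Task types
- Components
- Distributed process management
- Boot sequence
2.1 Node Types
Keywords: master node, public/private agent node, leader election, quorum
A DC/OS node is a virtual or physical machine. There are three types of node: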
- master
- private agent
- public agent
DMZ: demilitarized zone
Private | Protected | Public |
---|---|---|
private agent | master | public agent |
Master Nodes
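Most DC/OS components run on the master nodes, including the Mesos master process.
Protected Zone: master nodes should normally be deployed in a zone with restricted network access.
HA: multiple master nodes provide high availability and fault tolerance. A single-master cluster is suitable for development.
Leader Election: leader election among the master nodes routes traffic to the current leader. Other components run their own independent leader elections, so the leaders of different components may sit on different master nodes.
Quorum: a quorum is a strict majority of the master nodes, so floor(N/2) + 1 of N masters must always be available: at least 2 of 3, or 3 of 5. The number of master nodes can only be set at installation time, mainly because changing the quorum is complicated. This may be improved in later versions.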
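The quorum rule above is plain integer arithmetic. The following is a minimal sketch, not DC/OS code; the function names quorum_size and tolerated_failures are made up for illustration.

```python
def quorum_size(num_masters: int) -> int:
    """Smallest strict majority of num_masters nodes (hypothetical helper)."""
    if num_masters < 1:
        raise ValueError("a cluster needs at least one master")
    return num_masters // 2 + 1


def tolerated_failures(num_masters: int) -> int:
    """Number of masters that can be lost while a quorum is still available."""
    return num_masters - quorum_size(num_masters)


for n in (1, 3, 5):
    print(f"masters={n} quorum={quorum_size(n)} can_lose={tolerated_failures(n)}")
```

An even master count adds no fault tolerance over the next lower odd count (4 masters still tolerate only 1 failure), which is one reason odd counts such as 3 or 5 are the usual choice.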
Agent Nodes
Agent nodes run user tasks.
An agent node hosts only a few components, including a Mesos agent process.
Depending on network configuration, agents are either public or private.
Public Agent Nodes:
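Public agent nodes are reachable from outside the cluster.
Resources on public agent nodes are only allocated to tasks with the slave_public role.
The Mesos agent on a public agent node is identified by the attribute public_ip:true.
Public agent nodes are mainly used for externally facing reverse-proxy load balancers such as Marathon-LB. Exposing only the DMZ to the outside reduces the risk of malicious attacks.
A cluster usually has only a few public agent nodes.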
Private Agent Nodes:
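Private agent nodes sit on a network that cannot be reached from outside the cluster.
Resources on private agent nodes are by default available for undifferentiated use. More precisely, they carry the * role and can be allocated to any task that does not specify a role.
Private agent nodes run most tasks and are not exposed to external networks, so they make up the majority of nodes in a cluster. Most Mesosphere Universe packages also install on private agent nodes by default.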
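In practice, the role decides where a Marathon app lands: an app definition can list the roles it accepts in its acceptedResourceRoles field. The sketch below only builds such a definition as JSON. The app id, command, and resource numbers are hypothetical, and the helper marathon_app is not part of any DC/OS tool.

```python
import json


def marathon_app(app_id: str, cmd: str, public: bool = False) -> dict:
    """Build a minimal Marathon app definition (illustrative values only)."""
    app = {"id": app_id, "cmd": cmd, "cpus": 0.1, "mem": 32, "instances": 1}
    if public:
        # Only accept offers for resources reserved to the slave_public role,
        # i.e. resources that come from public agent nodes.
        app["acceptedResourceRoles"] = ["slave_public"]
    # Without the field, the app is left to Marathon's default role handling,
    # which normally places it on private agents.
    return app


print(json.dumps(marathon_app("/edge-proxy", "sleep 3600", public=True), indent=2))
```

Such a definition could then be submitted through the DC/OS CLI or the Marathon HTTP API; submission is deliberately left out of the sketch.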
2.2 Task Types
Keywords: scheduler, executor, task, Marathon, Metronome
Tasks are scheduled by two kinds of schedulers: built-in schedulers and scheduler services.
Executors
When a scheduler launches a task, it specifies a Mesos executor.
In Mesos, a scheduler and its executor are together called a framework.
Built-in executors are available to all schedulers; a scheduler can also use a custom executor.
- Command Executor: runs shell commands or Docker containers
- Default Executor (since Mesos 1.1): runs a group of shell commands or Docker containers
Schedulers
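Users do not control tasks directly. Schedulers provide a higher-level abstraction for controlling tasks.
Built-in schedulers
- The Marathon scheduler provides services (apps and pods), which run continuously and in parallel.
- The Metronome scheduler provides jobs, which run immediately or on a defined schedule.
User space schedulers (custom schedulers)
Users can install additional schedulers as services, for example: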
- Kafka scheduler provides Kafka brokers
- Cassandra scheduler provides Cassandra nodes
- Spark scheduler(dispatcher) provides Spark jobs
2.3 Components
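DC/OS is made up of many open source microservice components.
Mesosphere Enterprise DC/OS includes most of the open source components plus some additional components, modules, and plugins.
From the top: a batteries-included container platform
- container orchestration
- package management
- security
From the bottom: an operating system built on Apache Mesos
- cluster management
- software-defined networking (SDN)
- log and metrics collection
The components are described below by category: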
- Cluster Management
- Container Orchestration
- Container Runtimes
- Logging and Metrics
- Networking
- Package Management
- IAM and Security (Identity and Access Management)
- Storage
2.3.1 Cluster Management
Cluster management component | systemd service |
---|---|
Apache Mesos | dcos-mesos-master.service dcos-mesos-slave.service dcos-mesos-slave-public.service |
Apache ZooKeeper | managed by Exhibitor |
Exhibitor | dcos-exhibitor.service |
DC/OS Installer | dcos-download.service dcos-setup.service |
DC/OS GUI | served by Admin Router |
DC/OS CLI | a user downloadable binary |
Apache Mesos
kernel
Apache ZooKeeper
A consistent, highly available, distributed key-value store for configuration, synchronization, name registration, and cluster state storage.
ZooKeeper is managed by Exhibitor.
Exhibitor
Manages ZooKeeper and provides a web interface.
DC/OS Installer
- dcos_generate_config.ee.sh generates the install artifacts and installs DC/OS.
- The DC/OS Download service downloads the install artifacts from the bootstrap machine.
- The DC/OS Setup service installs components using the DC/OS Component Package Manager (Pkgpanda).
DC/OS GUI
The GUI is served by Admin Router.
DC/OS CLI
a user downloadable binary.
2.3.2 Container Orchestration
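Container orchestration is the continuous, automated scheduling, coordination, and management of containerized processes and resources.
Container orchestration component | systemd service |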
---|---|
Marathon | dcos-marathon.service |
Metronome | dcos-metronome.service |
Marathon
orchestrates long-lived containerized services (apps and pods).
Metronome
aka DC/OS Jobs
orchestrates short-lived, scheduled or immediate, containerized jobs
2.3.3 Container Runtimes
Container runtime component | systemd service |
---|---|
Universal Container Runtime | part of Mesos Agent |
Docker Engine | docker.service |
Docker GC (since 1.9.0) | dcos-docker-gc.service dcos-docker-gc.timer |
Universal Container Runtime
also called Mesos Containerizer
- a logical component built-in to the Mesos Agent
- not technically a separate process
- containerizes Mesos tasks with configurable isolators
- supports multiple image formats, including Docker images, without the Docker Engine.
Universal Container Runtime is part of Mesos Agent.
Docker Engine
Docker Engine is not installed by the DC/OS installer.
It is a system dependency of each node and must be installed manually.
The Mesos agent also includes a separate logical component, the Docker Containerizer.
Docker GC
NEW IN 1.9.0
2.3.4 Logging and Metrics
aggregating, caching, and streaming logs, metrics, and cluster state metadata.
Logging and metrics component | systemd service |
---|---|
DC/OS Network Metrics(Enterprise DC/OS) | dcos-networking_api.service |
3DT | dcos-3dt.service dcos-3dt.socket |
DC/OS Log(Since 1.9.0) | dcos-log-master.service dcos-log-master.socket dcos-log-agent.service dcos-log-agent.socket |
Logrotate | dcos-logrotate-master.service dcos-logrotate-master.timer dcos-logrotate-agent.service dcos-logrotate-agent.timer |
DC/OS Metrics | dcos-metrics-master.service dcos-metrics-master.socket dcos-metrics-agent.service dcos-metrics-agent.socket |
DC/OS Signal (consider disabling at install time) | dcos-signal.service dcos-signal.timer |
DC/OS History | dcos-history.service |
DC/OS Network Metrics(Enterprise DC/OS)
DC/OS Network Metrics is the DC/OS Networking API.
3DT
3DT: DC/OS Distributed Diagnostics Tool
aggregates and exposes component health.
DC/OS Log
NEW IN 1.9.0
exposes node, component, and container(task) logs.
Logrotate
manages rotation, compression and deletion of historical log files.
DC/OS Metrics
NEW IN 1.9.0
exposes node, container, and application metrics.
DC/OS Signal
reports cluster telemetry and analytics to help improve DC/OS.
This option can be disabled at installation time.
DC/OS History
caches and exposes historical system state to facilitate cluster usage statistics in the GUI.
2.3.5 Networking
The DC/OS networking components handle routing, proxying, name resolution, virtual IPs, load balancing, and distributed reconfiguration.
Networking component | systemd service |
---|---|
Admin Router | dcos-adminrouter.service dcos-adminrouter-reload.service dcos-adminrouter-reload.timer dcos-adminrouter-agent.service dcos-adminrouter-agent-reload.service dcos-adminrouter-agent-reload.timer |
Mesos DNS | dcos-mesos-dns.service |
DNSForwarder(Spartan) | dcos-spartan.service dcos-spartan-watchdog.service dcos-spartan-watchdog.timer |
Generate resolv.conf | dcos-gen-resolvconf.service dcos-gen-resolvconf.timer |
Minuteman | Included in Navstar |
Navstar | dcos-navstar.service |
Erlang Port Mapping Daemon(EPMD) | dcos-epmd.service |
Admin Router
Endpoint proxy:
proxies node-specific health, logs, metrics, and package management internal endpoints.
Mesos DNS
provides domain-name-based service discovery within the cluster.
DNSForwarder(Spartan)
forwards DNS requests to multiple DNS servers. Spartan Watchdog restarts Spartan when it is unhealthy.
Generate resolv.conf
Generate resolv.conf configures network name resolution by updating /etc/resolv.conf to facilitate DC/OS’s software defined networking.
Minuteman
provides distributed Layer 4 virtual IP load balancing.
Included in Navstar.
Navstar
orchestrates virtual overlay networks using VXLAN and manages distributed Layer 4 virtual IP load balancing.
Erlang Port Mapping Daemon(EPMD)
facilitates communication between distributed Erlang programs.
2.3.6 Package Management
Package management operates at two levels:
- machine-level for components
- cluster-level for user services
Package management component | systemd service |
---|---|
DC/OS Package Manager(Cosmos) | dcos-cosmos.service |
DC/OS Component Package Manager (Pkgpanda) | dcos-pkgpanda-api.service dcos-pkgpanda-api.socket |
Cosmos
aka DC/OS Package Manager
Installs packages from DC/OS package repositories such as Mesosphere Universe.
Pkgpanda
aka DC/OS Component Package Manager
Installs and manages DC/OS components.
2.3.7 IAM and Security
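In Enterprise DC/OS, IAM uses an internal database to manage users, groups, and permissions.
The following components are included only in Enterprise DC/OS.
IAM component | systemd service |
---|---|
Bouncer | dcos-bouncer.service |
DC/OS Certificate Authority | dcos-ca.service |
DC/OS Secrets | dcos-secrets.service |
Vault | dcos-vault.service |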
Bouncer
Also known as the DC/OS Identity and Access Manager.
Supports LDAP, SAML, and OpenID Connect.
DC/OS Certificate Authority
Handles the issuing of signed digital certificates.
Based on Cloudflare's CFSSL.
DC/OS Secrets
Stores secrets
such as API keys, passwords, certificates, and more
provides a secure API for storing and retrieving secrets from Vault, a secret store.
Vault
for securely managing secrets
provides a unified interface to any secret.
2.3.8 Storage
Storage component | systemd service |
---|---|
REX-Ray | dcos-rexray.service |
REX-Ray
orchestrates provisioning, attachment, and mounting of external persistent volumes
2.3.9 Legacy Component Changes
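The Cluster ID service was removed in 1.9.0. The DC/OS Setup service now generates the cluster UUID.
The Mesos Persistent Volume Discovery service was removed in 1.9.0. Detection of mounted disk resources is now performed by the DC/OS Setup service.
2.3.10 Sockets and Timers
Some components are reactive: they are started on demand through systemd sockets instead of running continuously and consuming resources.
These sockets exist as separate systemd units but are not counted as separate components.
Some components use systemd timers to run or restart periodically. The timers are likewise not counted as separate components.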
2.3.11 Component Installation
Component installation, upgrade, and management are handled by Pkgpanda.
2.3.12 Systemd Services
Most components run as systemd services on each node.
To list them, inspect
/etc/systemd/system/dcos.target.wants/
or run systemctl | grep dcos-
Master Node
[vagrant@m1 ~]$ ls /etc/systemd/system/dcos.target.wants/
dcos-3dt.service dcos-marathon.service
dcos-3dt.socket dcos-mesos-dns.service
dcos-adminrouter-reload.service dcos-mesos-master.service
dcos-adminrouter-reload.timer dcos-metrics-master.service
dcos-adminrouter.service dcos-metrics-master.socket
dcos-bouncer.service dcos-metronome.service
dcos-ca.service dcos-navstar.service
dcos-cosmos.service dcos-networking_api.service
dcos-epmd.service dcos-pkgpanda-api.service
dcos-exhibitor.service dcos-pkgpanda-api.socket
dcos-gen-resolvconf.service dcos-secrets.service
dcos-gen-resolvconf.timer dcos-signal.service
dcos-history.service dcos-signal.timer
dcos-log-master.service dcos-spartan.service
dcos-log-master.socket dcos-spartan-watchdog.service
dcos-logrotate-master.service dcos-spartan-watchdog.timer
dcos-logrotate-master.timer dcos-vault.service
Private Agent Node
[vagrant@a1 ~]$ ls /etc/systemd/system/dcos.target.wants/
dcos-3dt.service dcos-logrotate-agent.timer
dcos-3dt.socket dcos-mesos-slave.service
dcos-adminrouter-agent-reload.service dcos-metrics-agent.service
dcos-adminrouter-agent-reload.timer dcos-metrics-agent.socket
dcos-adminrouter-agent.service dcos-navstar.service
dcos-docker-gc.service dcos-pkgpanda-api.service
dcos-docker-gc.timer dcos-pkgpanda-api.socket
dcos-epmd.service dcos-rexray.service
dcos-gen-resolvconf.service dcos-signal.timer
dcos-gen-resolvconf.timer dcos-spartan.service
dcos-log-agent.service dcos-spartan-watchdog.service
dcos-log-agent.socket dcos-spartan-watchdog.timer
dcos-logrotate-agent.service
Public Agent Node
[vagrant@p1 ~]$ ls /etc/systemd/system/dcos.target.wants/
dcos-3dt.service dcos-logrotate-agent.timer
dcos-3dt.socket dcos-mesos-slave-public.service
dcos-adminrouter-agent-reload.service dcos-metrics-agent.service
dcos-adminrouter-agent-reload.timer dcos-metrics-agent.socket
dcos-adminrouter-agent.service dcos-navstar.service
dcos-docker-gc.service dcos-pkgpanda-api.service
dcos-docker-gc.timer dcos-pkgpanda-api.socket
dcos-epmd.service dcos-rexray.service
dcos-gen-resolvconf.service dcos-signal.timer
dcos-gen-resolvconf.timer dcos-spartan.service
dcos-log-agent.service dcos-spartan-watchdog.service
dcos-log-agent.socket dcos-spartan-watchdog.timer
dcos-logrotate-agent.service
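Listings like the ones above are easier to read once the units are grouped by type, because a .socket or .timer unit indicates that the matching service is started on demand or on a schedule. The following is a small sketch that works on pasted text, not on a live node; the helper name group_units and the sample input are illustrative only.

```python
from collections import defaultdict


def group_units(listing: str) -> dict:
    """Group dcos-* systemd unit names by unit type (service/socket/timer)."""
    groups = defaultdict(list)
    for name in listing.split():
        stem, _, suffix = name.rpartition(".")
        if stem.startswith("dcos-") and suffix in ("service", "socket", "timer"):
            groups[suffix].append(stem)
    return {kind: sorted(names) for kind, names in groups.items()}


# A few unit names copied from the agent listing above.
sample = """
dcos-3dt.service dcos-3dt.socket
dcos-docker-gc.service dcos-docker-gc.timer
dcos-mesos-slave.service
"""

units = group_units(sample)
print("socket-activated:", units["socket"])
print("timer-driven:", units["timer"])
print("all services:", units["service"])
```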
2.4 Distributed Process Management
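Processes communicate both between layers and within a layer.
Take a Marathon service that deploys a Docker container as an example:
Step | Description |
---|---|
1 | Client/scheduler initialization: the client needs to know how to connect to the scheduler in order to launch a process, for example via Mesos-DNS or the DC/OS CLI. |
2 | The Mesos master sends resource offers to the scheduler. Offers are computed from agent resources by the Mesos master's DRF (Dominant Resource Fairness) algorithm. |
3 | The scheduler declines the resource offers because no process has been requested. Until a process launch is initiated, the scheduler keeps declining offers from the master. |
4 | The client initiates a process launch. For example, a user creates a Marathon app through the DC/OS Services page or the HTTP endpoint /v2/apps. |
5 | The Mesos master sends resource offers, for example cpus(*):1; mem(*):128; ports(*):[21452-21452]. |
6 | If an offer satisfies the scheduler's requirements, the scheduler accepts it and sends a launchTask request to the Mesos master. |
7 | The Mesos master directs the Mesos agents to launch the task. |
8 | The Mesos agent launches the task through an executor. |
9 | The executor reports task status to the Mesos agent. |
10 | The Mesos agent reports task status to the Mesos master. |
11 | The Mesos master reports task status to the scheduler. |
12 | The scheduler reports process status to the client. |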
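Steps 2 to 6 of the table form a decline-or-accept loop on the scheduler side. The toy model below illustrates that loop only. It does not use the Mesos API, and Offer, ToyScheduler, and their methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    agent: str
    cpus: float
    mem: int


@dataclass
class ToyScheduler:
    pending: list = field(default_factory=list)   # (name, cpus, mem) requests
    launched: list = field(default_factory=list)  # (name, agent) placements

    def request(self, name: str, cpus: float, mem: int) -> None:
        """Step 4: a client asks for a process to be launched."""
        self.pending.append((name, cpus, mem))

    def on_offer(self, offer: Offer) -> str:
        """Steps 3 and 6: decline unless a pending request fits the offer."""
        for i, (name, cpus, mem) in enumerate(self.pending):
            if cpus <= offer.cpus and mem <= offer.mem:
                del self.pending[i]
                self.launched.append((name, offer.agent))
                return "accept"
        return "decline"


sched = ToyScheduler()
first = sched.on_offer(Offer("agent-1", cpus=1.0, mem=128))   # nothing requested yet
sched.request("web", cpus=1.0, mem=128)
second = sched.on_offer(Offer("agent-1", cpus=1.0, mem=128))  # request now fits
print(first, second, sched.launched)
```

A real scheduler would also handle ports, partial use of an offer, and task status updates (steps 9 to 12), all of which are omitted here.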
2.5 Boot Sequence
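When DC/OS is installed, the components are installed in parallel, but because of the dependencies between them they initialize in a particular order.
The 3DT service monitors the health of component services and nodes. A node is marked healthy only when all of its component services are healthy.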
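The health rule is an "all of" aggregation at two levels. The sketch below states it under assumptions: it is not 3DT's implementation, the function names are invented, and the dictionaries are made-up sample data.

```python
def node_healthy(units: dict) -> bool:
    """A node is healthy only if it has units and every unit is healthy."""
    return bool(units) and all(units.values())


def cluster_healthy(nodes: dict) -> bool:
    """The cluster is healthy only if every reported node is healthy."""
    return bool(nodes) and all(node_healthy(u) for u in nodes.values())


masters = {
    "m1": {"dcos-exhibitor.service": True, "dcos-mesos-master.service": True},
    "m2": {"dcos-exhibitor.service": True, "dcos-mesos-master.service": False},
}
print(node_healthy(masters["m1"]), cluster_healthy(masters))
```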
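Master Nodes
The component services on a master node start in the following order.
- 3DT starts
  - polls systemd for component status
  - reports the node as unhealthy until all components (systemd services) are healthy
  - reports the cluster as unhealthy until all master nodes are healthy
- Exhibitor starts
  - creates the ZooKeeper configuration and launches ZooKeeper
- Mesos master starts
  - registers with the local ZooKeeper
  - discovers the other Mesos masters from ZooKeeper
  - elects a leading master
- Mesos DNS starts
  - discovers the leading Mesos master (via ZooKeeper or the Mesos master? unclear in the original)
  - polls the leading Mesos master for cluster state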
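The ordering above follows from a dependency chain, which can be expressed as a topological sort. The sketch below uses Python's standard graphlib module (Python 3.9 or later). The dependency map is an assumption read off the list above, not an official DC/OS artifact, and 3DT is left out because it only observes the other services.

```python
from graphlib import TopologicalSorter

# component -> components that must be up before it can initialize
# (assumed from the boot sequence described above)
deps = {
    "Exhibitor": set(),
    "ZooKeeper": {"Exhibitor"},
    "Mesos Master": {"ZooKeeper"},
    "Mesos DNS": {"Mesos Master"},
}

order = list(TopologicalSorter(deps).static_order())
print(" -> ".join(order))
```

Because the dependencies form a single chain, only one valid order exists; adding independent components would allow several valid orders.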