作者:彭靖田
在Kubernetes的世界中,一切服务都是跑在容器中的,最简单的容器组是Pod。基于现实世界中的具体任务,Kubernetes抽象了更高级的容器组,如:ReplicaSet、Deployment、Job等。对于Web类型的长周期服务来说,重点考察两个需求:高可用(High Availability)和可伸缩性(Scalability)。换句话说,Kubernetes的目的是想让服务开发和运维人员,从以前像宠物(pet)一样呵护服务的软硬件配置中解放出来,转而像牲畜(cattle)一样管理服务,由Kubernetes来保证服务的稳定运行。
为此,Kubernetes提出了ReplicaSet和Deployment。
ReplicaSet是下一代的副本控制器(ReplicationController),在早期的Kubernetes版本中,相同服务的弹性伸缩由Replication Controller控制。在最新的Kubernetes版本中,两者的唯一区别在于对标签选择器(label selector)的支持。ReplicaSet支持set-based的标签选择器,而Replication Controller仅支持equality-based的标签选择器。这两种标签的区别见标签使用指导:https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
当明确服务本质没有区别,只需要增加固定数量的副本时,我们可以选择使用ReplicaSet,比如:nginx服务、redis服务。
apiVersion: extensions/v1beta1
kind: ReplicaSet
metadata:
name: frontend
# theselabels can be applied automatically
#from the labels in the pod template if not set
#labels:
# app: guestbook
# tier: frontend
spec:
#this replicas value is default
#modify it according to your case
replicas: 3
#selector can be applied automatically
#from the labels in the pod template if not set,
#but we are specifying the selector here to
#demonstrate its usage.
selector:
matchLabels:
tier: frontend
matchExpressions:
- {key: tier, operator: In, values: [frontend]}
template:
metadata:
labels:
app: guestbook
tier: frontend
spec:
containers:
- name: php-redis
image: gcr.io/google_samples/gb-frontend:v3
resources:
requests:
cpu: 100m
memory: 100Mi
env:
- name: GET_HOSTS_FROM
value: dns
# If your cluster config does not include adns service, then to
# instead access environment variables tofind service host
# info, comment out the 'value: dns' lineabove, and uncomment the
# line below.
# value: env
ports:
- containerPort: 80
不妨设此服务的作业描述文件为frontend.yaml,在Kubernetes集群中创建3个一样的frontend服务的副本(replicas)。
$ kubectl create -f frontend.yaml
replicaset "frontend" created
$ kubectl describe rs/frontend
Name: frontend
Namespace: default
Image(s): gcr.io/google_samples/gb-frontend:v3
Selector: tier=frontend,tier in (frontend)
Labels: app=guestbook,tier=frontend
Replicas: 3 current / 3 desired
Pods Status: 3 Running / 0 Waiting /0 Succeeded / 0 Failed
No volumes.
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1m 1m 1 {replicaset-controller } Normal SuccessfulCreate Created pod:frontend-qhloh
1m 1m 1 {replicaset-controller } Normal SuccessfulCreate Created pod:frontend-dnjpy
1m 1m 1 {replicaset-controller } Normal SuccessfulCreate Created pod:frontend-9si5l
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
frontend-9si5l 1/1 Running 0 1m
frontend-dnjpy 1/1 Running 0 1m
frontend-qhloh 1/1 Running 0 1m
自此,frontend的3个副本已经可以对外提供服务。但是,如果想要更进一步便捷的管理服务,如版本升级、回退,服务的弹性伸缩。还需要祭出另一个神器——Deployment。
Deployment为Pods和ReplicaSets的管理提供了一套抽象的描述语法,用户只需要描述自己期望服务达到的状态,Kubernetes通过自己的“黑魔法”为你实现背后的一切调度工作。
以我们MIND深度学习平台的推理(inference)服务为例,不妨设其作业描述文件为inference_deploy.yaml:
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
name: mnistx-65728444-inference
spec:
replicas: 2
template:
metadata:
labels:
name: mnistx-65728444
type: inference
spec:
containers:
- name: tf-serving
image: xx.xx.xx.xx:xxxx/mind/tf-serving:0.5.1
ports:
- containerPort: xxxx
command:
- "./tensorflow_model_server"
args:
- "--model_name=mnist"
- "--model_base_path=/mnt/nfs/dlks/mnistx-60496603/service"
- "--port=xxxx"
volumeMounts:
- name: mynfs
mountPath: /mnt/nfs/dlks
securityContext:
privileged: true
volumes:
- name: mynfs
nfs:
path: /
server: xx.xx.xx.xx
restartPolicy: Always
我们定义了2个MNIST推理服务的副本(replicas),在Kubernetes集群中创建此服务:
$ kubectl create -f inference-deploy.yaml
deployment"mnistx-65728444-inference" created
$ kubectl get deploy
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
mnistx-65728444-inference 2 2 2 2 33s
$ kubectl get rs
NAME DESIRED CURRENT READY AGE
mnistx-65728444-inference-409334488 2 2 2 2m
$ kubectl get po
NAME READY STATUS RESTARTS AGE
mnistx-65728444-inference-409334488-08t2l 1/1 Running 0 2m
mnistx-65728444-inference-409334488-jc9mp 1/1 Running 0 2m
不难发现,Deployment和ReplicaSet都是从更高层的抽象角度在描述服务的部署情况,如期望容器(DESIRED)、当前容器(CURRENT)、最新容器(UP-TO-DATE)、可用容器(AVAILABLE/READY)。而Pod则更关注容器本身的运行状态和重启次数。
再次强调,所有的高级抽象容器最终都由Pod来实现,高级之处在于抽象的语义满足了更具体的服务需求。
为了让MNIST推理服务接收让Kubernetes集群外部的访问请求,需要再创建一个反向代理的Service资源,不妨设其描述文件为inference-service.yaml。
kind: Service
apiVersion: v1
metadata:
name: mnistx-65728444-inference
spec:
selector:
name: mnistx-65728444
type: NodePort
ports:
-protocol: TCP
port: xxx
targetPort: xxxx
nodePort: xxxxx
创建Service,并查看其代理的终端(endpoint):
$ kubectl create -f inference-service.yaml
service"mnistx-65728444-inference" created
$ kubectl describe svc
Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Selector:
Type: ClusterIP
IP: xx.xx.xx.xx
Port: https 443/TCP
Endpoints: xx.xx.xx.xx:xxxx
Session Affinity: ClientIP
No events.
Name: mnistx-65728444-inference
Namespace: default
Labels:
Selector: name=mnistx-65728444
Type: NodePort
IP: xx.xx.xx.xx
Port:
NodePort:
Endpoints: xx.xx.xx.xx:xxxx, xx.xx.xx.xx:xxx
Session Affinity: None
No events.
$ kubectl describe po | grep IP
IP: xx.xx.xx.xx
IP: xx.xx.xx.xx
现在,假设访问MNIST推理服务的请求数量突然暴增10倍,我们需要将MNIST推理服务副本增加到20个。在Kubernetes的世界中,只需要一行命令即可:
$ kubectl scale --replicas=20deploy/mnistx-65728444-inference
deployment"mnistx-65728444-inference" scaled
$ kubectl get deploy
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
mnistx-65728444-inference 20 20 20 3 13m
Service瞬间识别到新增的终端:
$ kubectl describe svc
Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Selector:
Type: ClusterIP
IP: xx.xx.xx.xx
Port: https 443/TCP
Endpoints: xx.xx.xx.xx:xxxx
Session Affinity: ClientIP
No events.
Name: mnistx-65728444-inference
Namespace: default
Labels:
Selector: name=mnistx-65728444
Type: NodePort
IP: xx.xx.xx.xx
Port:
NodePort:
Endpoints: xx.xx.xx.xx:xxxx,xx.xx.xx.xx :xxxx, xx.xx.xx.xx:xxxx + 17 more...
Session Affinity: None
No events.
5秒后,新增的所有服务副本全部Ready,Kubernetes的调度速度非常快。
$ kubectl get rs
NAME DESIRED CURRENT READY AGE
mnistx-65728444-inference-409334488 20 20 20 14m
$ kubectl get po
NAME READY STATUS RESTARTS AGE
mnistx-65728444-inference-409334488-08t2l 1/1 Running 0 14m
mnistx-65728444-inference-409334488-3hrbn 1/1 Running 0 36s
mnistx-65728444-inference-409334488-4p7h4 1/1 Running 0 36s
mnistx-65728444-inference-409334488-775r3 1/1 Running 0 36s
mnistx-65728444-inference-409334488-91lx4 1/1 Running 1 36s
mnistx-65728444-inference-409334488-bj1mh 1/1 Running 0 36s
mnistx-65728444-inference-409334488-d16qn 1/1 Running 0 36s
mnistx-65728444-inference-409334488-fv7g6 1/1 Running 0 36s
mnistx-65728444-inference-409334488-hss1g 1/1 Running 0 36s
mnistx-65728444-inference-409334488-hvjbl 1/1 Running 0 36s
mnistx-65728444-inference-409334488-jc9mp 1/1 Running 0 14m
mnistx-65728444-inference-409334488-q8hq1 1/1 Running 0 36s
mnistx-65728444-inference-409334488-qcpkv 1/1 Running 0 36s
mnistx-65728444-inference-409334488-qdqmb 1/1 Running 0 36s
mnistx-65728444-inference-409334488-qt7wn 1/1 Running 0 36s
mnistx-65728444-inference-409334488-r5scs 1/1 Running 0 36s
mnistx-65728444-inference-409334488-sv7zf 1/1 Running 0 36s
mnistx-65728444-inference-409334488-wr8wv 1/1 Running 0 36s
mnistx-65728444-inference-409334488-ztp48 1/1 Running 0 36s
mnistx-65728444-inference-409334488-zw933 1/1 Running 0 36s
如果请求数量变少,减少服务副本数量也是一行命令的事儿:
$ kubectl scale --replicas=1deploy/mnistx-65728444-inference
deployment"mnistx-65728444-inference" scaled
$ kubectl get deploy
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
mnistx-65728444-inference 1 1 1 1 17m
$ kubectl get rs
NAME DESIRED CURRENT READY AGE
mnistx-65728444-inference-409334488 1 1 1 17m
$ kubectl get po
NAME READY STATUS RESTARTS AGE
mnistx-65728444-inference-409334488-08t2l 1/1 Running 0 17m
mnistx-65728444-inference-409334488-3hrbn 1/1 Terminating 0 4m
mnistx-65728444-inference-409334488-4p7h4 1/1 Terminating 0 4m
mnistx-65728444-inference-409334488-775r3 1/1 Terminating 0 4m
mnistx-65728444-inference-409334488-91lx4 1/1 Terminating 1 4m
mnistx-65728444-inference-409334488-bj1mh 1/1 Terminating 0 4m
mnistx-65728444-inference-409334488-d16qn 1/1 Terminating 0 4m
mnistx-65728444-inference-409334488-fv7g6 1/1 Terminating 0 4m
mnistx-65728444-inference-409334488-hss1g 1/1 Terminating 0 4m
mnistx-65728444-inference-409334488-hvjbl 1/1 Terminating 0 4m
mnistx-65728444-inference-409334488-jc9mp 1/1 Terminating 0 17m
mnistx-65728444-inference-409334488-q8hq1 1/1 Terminating 0 4m
mnistx-65728444-inference-409334488-qcpkv 1/1 Terminating 0 4m
mnistx-65728444-inference-409334488-qdqmb 1/1 Terminating 0 4m
mnistx-65728444-inference-409334488-qt7wn 1/1 Terminating 0 4m
mnistx-65728444-inference-409334488-r5scs 1/1 Terminating 0 4m
mnistx-65728444-inference-409334488-sv7zf 1/1 Terminating 0 4m
mnistx-65728444-inference-409334488-wr8wv 1/1 Terminating 0 4m
mnistx-65728444-inference-409334488-ztp48 1/1 Terminating 0 4m
mnistx-65728444-inference-409334488-zw933 1/1 Terminating 0 4m
整个弹性伸缩过程中,服务始终处于运行(Running)状态,极大减少了运维人员的负担。
综上:Kubernetes对于同类服务的弹性伸缩做了非常强大的处理,不论是版本升级、版本回退、灰度发布、服务发现等功能,都已经达到了业界顶尖的水准。因此,Kubernetes作为容器编排系统的事实标准也就不难理解了。感谢为此付出的Google Kubernetes组成员和广大Contributor。
Kubernetes Github:https://github.com/kubernetes/kubernetes
Kubernetes Docs:http://kubernetes.io/