I. Understanding the Job Controller
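The Job controller schedules Pod objects to run one-off tasks. When the process in a container runs to completion normally, the container is not restarted; instead, the Pod is placed in the "Completed" state. If the process terminates because of an error, the configuration determines whether it is restarted. A Pod that has not finished and is terminated unexpectedly because its node failed is rescheduled.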
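The Job Controller creates Pods according to the definition in the Job spec and keeps monitoring their status until they finish successfully. If a Pod fails, the user-defined restartPolicy (only OnFailure and Never are supported; Always is not) decides how the task is retried: with OnFailure the failed container is restarted in place, and with Never the controller creates a new Pod to retry the task.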
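In practice, some tasks need to run more than once, and users can configure them to run serially or in parallel. In summary, there are two kinds of Job of this type:
1) Serial Job with a single work queue: the task is executed several times, one run after another, until the desired number of completions is reached. This can also be understood as a Job with a parallelism of 1, where only one Pod object exists at any moment.
2) Parallel Job with multiple work queues: the number of work queues can equal the number of tasks, so that each queue runs exactly one task; alternatively, a limited number of queues can run a larger number of tasks (fewer queues than tasks), which is equivalent to running several serial queues side by side.
Setting .spec.parallelism to 1 and the total number of tasks in .spec.completions makes the Job controller run multiple tasks serially. .spec.parallelism defines the degree of parallelism; setting it to 2 or more gives parallel, multi-queue execution. If .spec.completions is also set larger than .spec.parallelism, each queue runs several tasks in sequence.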
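As a minimal sketch of the serial case described above, the manifest below sets the parallelism to 1 and the completion count to 3, so the controller should run three Pods one after another. The Job name job-serial, the completion count, and the sleep command are illustrative assumptions and are not part of the original experiment.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-serial          # hypothetical name, for illustration only
spec:
  completions: 3            # total number of Pods that must succeed (assumed value)
  parallelism: 1            # at most one Pod runs at any moment, so the runs are serial
  template:
    spec:
      containers:
      - name: myjob
        image: alpine
        command: ["/bin/sh", "-c", "sleep 10"]   # placeholder workload
      restartPolicy: Never
```

Raising parallelism to 2 or more while keeping completions above it would turn the same manifest into the multi-queue mode.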
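Once its Pods have run to completion, a Job no longer consumes system resources. You can keep it or remove it with the resource delete command as needed. However, if a Job's container application can never finish successfully and its restartPolicy is set to restart on failure (OnFailure), the Job may be stuck in an endless loop of restarts and errors. Fortunately, the Job controller provides two fields to curb this:
1) .spec.activeDeadlineSeconds: the maximum time, in seconds, that the Job may stay active; once it is exceeded, the Job is terminated.
2) .spec.backoffLimit: the number of retries allowed before the Job is marked as failed (6 by default).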
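The two fields above could be combined as in the following sketch. The Job name job-limited, the limit values, and the deliberately failing command are assumptions chosen only to make the behavior easy to observe; they do not come from the original experiment.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-limited              # hypothetical name, for illustration only
spec:
  backoffLimit: 3                # assumed value: give up after 3 retries
  activeDeadlineSeconds: 60      # assumed value: stop the Job once it has been active for 60s
  template:
    spec:
      containers:
      - name: myjob
        image: alpine
        command: ["/bin/sh", "-c", "exit 1"]   # always fails, to exercise the limits
      restartPolicy: OnFailure
```

With a manifest like this, the Job should end up marked as failed when either limit is reached, instead of restarting indefinitely. The deadline applies to the Job as a whole, regardless of how many retries are still available.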
II. Job Controller Lab
1) Write the YAML file that creates the Job
]# cat job.yml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-example
spec:
  template:
    spec:
      containers:
      - name: myjob
        image: alpine
        command: ["/bin/sh", "-c", "sleep 120"]
      restartPolicy: Never
]# kubectl apply -f job.yml
job.batch/job-example created
2) View the Pod and Job status
]# kubectl get jobs -o wide
NAME          COMPLETIONS   DURATION   AGE   CONTAINERS   IMAGES   SELECTOR
job-example   0/1           59s        59s   myjob        alpine   controller-uid=9da99a63-6e39-4e7e-a974-fc0dd6e6216a
]# kubectl get pods -o wide
NAME                READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
job-example-hqc7l   1/1     Running   0          65s   10.244.2.23   node2   <none>           <none>
]# kubectl get jobs -o wide
NAME          COMPLETIONS   DURATION   AGE     CONTAINERS   IMAGES   SELECTOR
job-example   1/1           2m9s       2m23s   myjob        alpine   controller-uid=9da99a63-6e39-4e7e-a974-fc0dd6e6216a
Once the 120-second sleep finishes, the COMPLETIONS column for this Job changes to 1/1, which means the Job has completed.
3) View the Job's detailed information
]# kubectl describe job job-example
Name:           job-example
Namespace:      default
Selector:       controller-uid=9da99a63-6e39-4e7e-a974-fc0dd6e6216a
Labels:         controller-uid=9da99a63-6e39-4e7e-a974-fc0dd6e6216a
                job-name=job-example
Annotations:
Parallelism:    1
Completions:    1
Start Time:     Mon, 10 Aug 2020 09:08:20 +0800
Completed At:   Mon, 10 Aug 2020 09:10:29 +0800
Duration:       2m9s
Pods Statuses:  0 Running / 1 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=9da99a63-6e39-4e7e-a974-fc0dd6e6216a
           job-name=job-example
  Containers:
   myjob:
    Image:      alpine
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/sh
      -c
      sleep 120
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age    From            Message
  ----    ------            ----   ----            -------
  Normal  SuccessfulCreate  3m56s  job-controller  Created pod: job-example-hqc7l
  Normal  Completed         107s   job-controller  Job completed
4) Create a parallel-queue Job
]# cat job-muilt.yml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-mulit
spec:
  completions: 6
  parallelism: 2
  template:
    spec:
      containers:
      - name: myjob
        image: alpine
        command: ["/bin/sh", "-c", "sleep 20"]
      restartPolicy: OnFailure
]# kubectl apply -f job-muilt.yml
5) View the Job controller status
]# kubectl get job -o wide -w
NAME        COMPLETIONS   DURATION   AGE   CONTAINERS   IMAGES   SELECTOR
job-mulit   0/6                      0s    myjob        alpine   controller-uid=86b6fb38-0e4f-4d3b-a0aa-7502c49b7d41
job-mulit   0/6           0s         0s    myjob        alpine   controller-uid=86b6fb38-0e4f-4d3b-a0aa-7502c49b7d41
job-mulit   1/6           22s        22s   myjob        alpine   controller-uid=86b6fb38-0e4f-4d3b-a0aa-7502c49b7d41
job-mulit   2/6           25s        25s   myjob        alpine   controller-uid=86b6fb38-0e4f-4d3b-a0aa-7502c49b7d41
job-mulit   3/6           44s        44s   myjob        alpine   controller-uid=86b6fb38-0e4f-4d3b-a0aa-7502c49b7d41
job-mulit   4/6           46s        46s   myjob        alpine   controller-uid=86b6fb38-0e4f-4d3b-a0aa-7502c49b7d41
job-mulit   5/6           72s        72s   myjob        alpine   controller-uid=86b6fb38-0e4f-4d3b-a0aa-7502c49b7d41
job-mulit   6/6           73s        73s   myjob        alpine   controller-uid=86b6fb38-0e4f-4d3b-a0aa-7502c49b7d41
]# kubectl get pods -o wide -w
NAME              READY   STATUS              RESTARTS   AGE   IP            NODE     NOMINATED NODE   READINESS GATES
job-mulit-fj7q8   0/1     Pending             0          0s    <none>        <none>   <none>           <none>
job-mulit-fj7q8   0/1     Pending             0          0s    <none>        node2    <none>           <none>
job-mulit-cbcgv   0/1     Pending             0          0s    <none>        <none>   <none>           <none>
job-mulit-cbcgv   0/1     Pending             0          0s    <none>        node1    <none>           <none>
job-mulit-fj7q8   0/1     ContainerCreating   0          0s    <none>        node2    <none>           <none>
job-mulit-cbcgv   0/1     ContainerCreating   0          0s    <none>        node1    <none>           <none>
job-mulit-cbcgv   1/1     Running             0          2s    10.244.1.70   node1    <none>           <none>
job-mulit-fj7q8   1/1     Running             0          6s    10.244.2.27   node2    <none>           <none>
job-mulit-cbcgv   0/1     Completed           0          22s   10.244.1.70   node1    <none>           <none>
job-mulit-sbmth   0/1     Pending             0          0s    <none>        <none>   <none>           <none>
job-mulit-sbmth   0/1     Pending             0          0s    <none>        node1    <none>           <none>
job-mulit-sbmth   0/1     ContainerCreating   0          0s    <none>        node1    <none>           <none>
job-mulit-sbmth   1/1     Running             0          2s    10.244.1.71   node1    <none>           <none>
job-mulit-fj7q8   0/1     Completed           0          25s   10.244.2.27   node2    <none>           <none>
job-mulit-xln8h   0/1     Pending             0          0s    <none>        <none>   <none>           <none>
job-mulit-xln8h   0/1     Pending             0          0s    <none>        node2    <none>           <none>
job-mulit-xln8h   0/1     ContainerCreating   0          0s    <none>        node2    <none>           <none>
job-mulit-xln8h   1/1     Running             0          2s    10.244.2.28   node2    <none>           <none>
job-mulit-sbmth   0/1     Completed           0          22s   10.244.1.71   node1    <none>           <none>
job-mulit-jkv78   0/1     Pending             0          0s    <none>        <none>   <none>           <none>
job-mulit-jkv78   0/1     Pending             0          0s    <none>        node1    <none>           <none>
job-mulit-jkv78   0/1     ContainerCreating   0          0s    <none>        node1    <none>           <none>
job-mulit-xln8h   0/1     Completed           0          21s   10.244.2.28   node2    <none>           <none>
job-mulit-x49rj   0/1     Pending             0          0s    <none>        <none>   <none>           <none>
job-mulit-x49rj   0/1     Pending             0          0s    <none>        node2    <none>           <none>
job-mulit-x49rj   0/1     ContainerCreating   0          0s    <none>        node2    <none>           <none>
job-mulit-jkv78   1/1     Running             0          8s    10.244.1.72   node1    <none>           <none>
job-mulit-x49rj   1/1     Running             0          7s    10.244.2.29   node2    <none>           <none>
job-mulit-jkv78   0/1     Completed           0          28s   10.244.1.72   node1    <none>           <none>
job-mulit-x49rj   0/1     Completed           0          27s   10.244.2.29   node2    <none>           <none>
Watching the Job and Pod status shows that the Job's Pods run and complete two at a time, which matches the parallelism we defined.
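As a rough sanity check on the timings in the watch output above (a back-of-the-envelope estimate that treats the Pods as running in lockstep batches, not an exact model), six completions at a parallelism of two need three batches, and each batch sleeps for 20 seconds:

```latex
\[
\text{batches} = \left\lceil \frac{\text{completions}}{\text{parallelism}} \right\rceil
             = \left\lceil \frac{6}{2} \right\rceil = 3,
\qquad
t_{\min} = 3 \times 20\,\text{s} = 60\,\text{s}.
\]
```

The observed duration of 73s is about 13 seconds above this lower bound. The watch output suggests the difference is mostly scheduling and container start-up time, since each Pod took between 2 and 8 seconds to reach the Running state.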
6) View the Job's detailed information
]# kubectl describe jobs job-mulit
Name:           job-mulit
Namespace:      default
Selector:       controller-uid=86b6fb38-0e4f-4d3b-a0aa-7502c49b7d41
Labels:         controller-uid=86b6fb38-0e4f-4d3b-a0aa-7502c49b7d41
                job-name=job-mulit
Annotations:
Parallelism:    2
Completions:    6
Start Time:     Mon, 10 Aug 2020 09:19:19 +0800
Completed At:   Mon, 10 Aug 2020 09:20:32 +0800
Duration:       73s
Pods Statuses:  0 Running / 6 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=86b6fb38-0e4f-4d3b-a0aa-7502c49b7d41
           job-name=job-mulit
  Containers:
   myjob:
    Image:      alpine
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/sh
      -c
      sleep 20
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age    From            Message
  ----    ------            ----   ----            -------
  Normal  SuccessfulCreate  3m58s  job-controller  Created pod: job-mulit-fj7q8
  Normal  SuccessfulCreate  3m58s  job-controller  Created pod: job-mulit-cbcgv
  Normal  SuccessfulCreate  3m36s  job-controller  Created pod: job-mulit-sbmth
  Normal  SuccessfulCreate  3m33s  job-controller  Created pod: job-mulit-xln8h
  Normal  SuccessfulCreate  3m14s  job-controller  Created pod: job-mulit-jkv78
  Normal  SuccessfulCreate  3m12s  job-controller  Created pod: job-mulit-x49rj
  Normal  Completed         2m45s  job-controller  Job completed