1. Create the namespace gpu
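A minimal sketch of this step, assuming kubectl is already configured for the cluster. The helper name ensure_namespace is hypothetical; it only wraps the standard `kubectl get namespace` and `kubectl create namespace` commands so the step can be rerun safely.

```shell
# Hypothetical helper: create a namespace only if it does not exist yet.
ensure_namespace() {
    ns="$1"
    if kubectl get namespace "$ns" >/dev/null 2>&1; then
        echo "namespace $ns already exists"
    else
        kubectl create namespace "$ns"
    fi
}

# Usage (requires access to a cluster):
#   ensure_namespace gpu
```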
2. Add a resource quota
[root@tensorflow1 gpu-namespace]# cat compute-resources.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
  namespace: gpu
spec:
  hard:
    pods: "5"
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi
[root@tensorflow1 gpu-namespace]# kubectl describe namespace gpu
Name:         gpu
Labels:       <none>
Annotations:  <none>
Status:       Active

Resource Quotas
 Name:            compute-resources
 Resource         Used  Hard
 --------         ---   ---
 limits.cpu       0     2
 limits.memory    0     2Gi
 pods             4     5
 requests.cpu     0     1
 requests.memory  0     1Gi

No resource limits.
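3. Check whether the quota takes effect
The quota was added after the containers had already been created, and it had no effect on them: a ResourceQuota is only enforced when a pod is created, so existing pods are left untouched. The expectation was that memory would be capped at 2 GB, but 8 GB is still visible from inside the container.
Inside the container: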
root@tensorflow-ps-rc-cm9c8:/notebooks# free -m
              total        used        free      shared  buff/cache   available
Mem:           7783        1615         274         251        5893        5383
Swap:             0           0           0
On the host:
[root@tensorflow0 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           7783        1616         272         251        5894        5382
Swap:             0           0           0
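4. Kill the container
After the pod is killed, the ReplicationController cannot recreate it: the quota now requires every container to declare its resource requests and limits.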
[root@tensorflow1 gpu-namespace]# kubectl describe rc/tensorflow-ps-rc -n gpu
...
Warning FailedCreate 2m replication-controller Error creating: pods "tensorflow-ps-rc-jrxxl" is forbidden: failed quota: compute-resources: must specify limits.cpu,limits.memory,requests.cpu,requests.memory
Warning FailedCreate 23s (x9 over 2m) replication-controller (combined from similar events): Error creating: pods "tensorflow-ps-rc-sw9wx" is forbidden: failed quota: compute-resources: must specify limits.cpu,limits.memory,requests.cpu,requests.memory
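One way to avoid this rejection without editing every manifest is a LimitRange, which fills in default requests and limits for containers that omit them (the "No resource limits." line in the namespace description indicates none is defined here). The sketch below is an assumption and not part of the original setup: the file name limit-range.yaml, the object name default-limits, and the default values are all illustrative.

```shell
# Write a hypothetical LimitRange manifest; names and values are illustrative only.
cat > limit-range.yaml <<'EOF'
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: gpu
spec:
  limits:
  - type: Container
    defaultRequest:   # used when a container omits resources.requests
      cpu: 250m
      memory: 256Mi
    default:          # used when a container omits resources.limits
      cpu: 500m
      memory: 512Mi
EOF

# Apply it (requires access to a cluster):
#   kubectl apply -f limit-range.yaml
```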
5. Configure the limits and restart
Add the following to the container spec:
resources:
  requests:
    memory: "1024Mi"
    cpu: "250m"
  limits:
    memory: "1024Mi"
    cpu: "500m"
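For context, the resources block belongs under each container in the pod template. The sketch below shows a plausible placement in the tensorflow-ps-rc ReplicationController. The object name, container name, image, selector, and replica count are taken from the kubectl output in this post; the file name tensorflow-ps-rc.yaml and the rest of the layout are assumptions, since the original manifest is not shown.

```shell
# Write a sketch of the RC manifest; the real manifest is not shown in the post.
cat > tensorflow-ps-rc.yaml <<'EOF'
apiVersion: v1
kind: ReplicationController
metadata:
  name: tensorflow-ps-rc
  namespace: gpu
spec:
  replicas: 2
  selector:
    name: tensorflow-ps
  template:
    metadata:
      labels:
        name: tensorflow-ps
    spec:
      containers:
      - name: ps
        image: nfs:5000/tensorflow/tensorflow:nightly
        resources:            # per-container requests and limits
          requests:
            memory: "1024Mi"
            cpu: "250m"
          limits:
            memory: "1024Mi"
            cpu: "500m"
EOF
```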
Only one ps pod started:
[root@tensorflow1 tf_gpu]# kubectl get all -o wide -n gpu
NAME                            READY   STATUS    RESTARTS   AGE   IP            NODE
po/tensorflow-ps-rc-9m8zj       1/1     Running   0          1h    10.244.2.91   tensorflow0
po/tensorflow-worker-rc-5zq9q   1/1     Running   0          11d   10.244.2.61   tensorflow0
po/tensorflow-worker-rc-mhncr   1/1     Running   0          11d   10.244.1.87   tensorflow2

NAME                      DESIRED   CURRENT   READY   AGE   CONTAINERS   IMAGES                                       SELECTOR
rc/tensorflow-ps-rc       2         1         1       1h    ps           nfs:5000/tensorflow/tensorflow:nightly       name=tensorflow-ps
rc/tensorflow-worker-rc   2         2         2       11d   worker       nfs:5000/tensorflow/tensorflow:nightly-gpu   name=tensorflow-worker

NAME                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE   SELECTOR
svc/tensorflow-ps-service   ClusterIP   10.99.156.187    <none>        2222/TCP   11d   name=tensorflow-ps
svc/tensorflow-wk-service   ClusterIP   10.102.251.161   <none>        2222/TCP   11d   name=tensorflow-worker
The second ps replica no longer fits within the quota:
[root@tensorflow1 tf_gpu]# kubectl describe namespace gpu
Name:         gpu
Labels:       <none>
Annotations:  <none>
Status:       Active

Resource Quotas
 Name:            compute-resources
 Resource         Used  Hard
 --------         ---   ---
 limits.cpu       500m  2
 limits.memory    1Gi   2Gi
 pods             3     5
 requests.cpu     250m  1
 requests.memory  1Gi   1Gi

No resource limits.
[root@tensorflow1 tf_gpu]# kubectl describe rc/tensorflow-ps-rc -n gpu
Warning FailedCreate 3m replication-controller Error creating: pods "tensorflow-ps-rc-cbt6c" is forbidden: exceeded quota: compute-resources, requested: requests.memory=1Gi, used: requests.memory=1Gi, limited: requests.memory=1Gi
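The rejection follows from simple arithmetic: a pod is admitted only if used plus requested stays within the hard limit for every tracked resource. Below is a rough sketch of that check for a single resource, with all values in Mi. The function name quota_fits is hypothetical, and this only illustrates the comparison; it is not how the API server implements it.

```shell
# Hypothetical check: does a new request fit within a quota? Values in Mi.
quota_fits() {
    used="$1"; requested="$2"; hard="$3"
    if [ $((used + requested)) -le "$hard" ]; then
        echo "admitted"
    else
        echo "exceeded quota"
    fi
}

# First ps pod: nothing used yet, 1024Mi requested, 1024Mi hard limit.
quota_fits 0 1024 1024
# Second ps pod: 1024Mi already used, another 1024Mi requested.
quota_fits 1024 1024 1024
```

With requests.memory capped at 1Gi, only one pod requesting 1024Mi can fit. Running both replicas would need either a larger quota or a smaller request, for example 512Mi each.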
6. Enter the container that started successfully
Host memory:
[root@tensorflow0 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           7783        1433         450         251        5899        5567
Swap:             0           0           0
Container memory, identical to what the host reports:
root@tensorflow-ps-rc-9m8zj:/notebooks# free -m
              total        used        free      shared  buff/cache   available
Mem:           7783        1433         450         251        5899        5567
Swap:             0           0           0
Although the container is limited to 1 GB of memory, it still sees 8 GB.
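This is expected: free reads /proc/meminfo, which reports host totals even inside a container, while the memory limit itself is enforced by the kernel's cgroup memory controller. Below is a sketch for reading the effective limit from inside the container, assuming the usual cgroup mount points (memory/memory.limit_in_bytes under cgroup v1, memory.max under cgroup v2). The helper name container_mem_limit and its optional root argument are hypothetical.

```shell
# Hypothetical helper: print the container's cgroup memory limit.
# $1 optionally overrides the cgroup root (default /sys/fs/cgroup).
container_mem_limit() {
    root="${1:-/sys/fs/cgroup}"
    if [ -r "$root/memory/memory.limit_in_bytes" ]; then
        cat "$root/memory/memory.limit_in_bytes"   # cgroup v1
    elif [ -r "$root/memory.max" ]; then
        cat "$root/memory.max"                     # cgroup v2
    else
        echo "unknown"
    fi
}

# Usage inside the container:
#   container_mem_limit
```

With the 1024Mi limit configured above, the cgroup v1 file would be expected to contain 1073741824 (1024 × 1024 × 1024 bytes), even though free still shows the host's 7783 MB.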