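Integrating the NVIDIA GPU device plugin into a kubeadm cluster

For installing the NVIDIA driver and nvidia-docker, see my earlier articles.

The NVIDIA driver version must be newer than 384.

nvidia-docker must be version 2 or newer.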
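
As a quick sanity check of the driver prerequisite above, the following sketch compares a driver version string against the 384 minimum. The helper name `driver_major_ok` is hypothetical, and it assumes that any 384.x or newer release is acceptable. `nvidia-smi --query-gpu=driver_version --format=csv,noheader` is one way to print the installed driver version on a GPU node.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the driver's major version is at least 384.
# Assumption: any 384.x or newer release is treated as acceptable.
driver_major_ok() {
    major=${1%%.*}          # "440.33.01" -> "440"
    [ "$major" -ge 384 ]
}

# Example usage on a GPU node (requires the NVIDIA driver to be installed):
#   ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   driver_major_ok "$ver" && echo "driver $ver is new enough"
```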

Set nvidia as Docker's default runtime

>>> cat /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
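
Docker only picks up a changed `daemon.json` after a restart. The sketch below reads the configured value straight from the file so it can be checked before restarting. `configured_default_runtime` is a hypothetical helper, and it assumes the key and its value sit on one line, as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: print the "default-runtime" value from a daemon.json.
# Assumption: key and value are on the same line, as in the file shown above.
configured_default_runtime() {
    sed -n 's/.*"default-runtime"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Example usage (needs root and a running Docker daemon):
#   configured_default_runtime /etc/docker/daemon.json
#   systemctl restart docker
#   docker info | grep -i 'default runtime'
```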

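Edit the kubelet startup flags to enable device plugins (the DevicePlugins feature gate is on by default from Kubernetes 1.10, so the flag only matters on older clusters)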

>>> cat /var/lib/kubelet/kubeadm-flags.env

KUBELET_KUBEADM_ARGS="--cgroup-driver=cgroupfs --network-plugin=cni --feature-gates=DevicePlugins=true --pod-infra-container-image=k8s.gcr.io/pause:3.1"
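
To confirm the edit landed before restarting the kubelet, a small check like the following can help. It is only a sketch: `has_device_plugins_gate` is a hypothetical helper that looks for `DevicePlugins=true` inside a `--feature-gates=` flag in the given file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kubelet flags file enables DevicePlugins.
# Other gates may precede it in the same comma-separated --feature-gates value.
has_device_plugins_gate() {
    grep -q -- '--feature-gates=[^" ]*DevicePlugins=true' "$1"
}

# Example usage on a node:
#   has_device_plugins_gate /var/lib/kubelet/kubeadm-flags.env \
#       && echo "DevicePlugins gate is set"
```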

Restart the kubelet

systemctl daemon-reload
systemctl restart kubelet

Install the GPU device plugin

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta4/nvidia-device-plugin.yml
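
Before checking node capacity, it can be worth confirming that the plugin pods actually started. The sketch below counts them. `running_plugin_pods` is a hypothetical helper, and it assumes the manifest deploys pods whose names start with `nvidia-device-plugin` into the `kube-system` namespace; adjust both if your manifest differs.

```shell
#!/bin/sh
# Hypothetical helper: count device-plugin pods in the Running state.
# Assumption: pods are named nvidia-device-plugin-* and live in kube-system.
running_plugin_pods() {
    kubectl get pods -n kube-system --no-headers \
        | awk '$1 ~ /^nvidia-device-plugin/ && $3 == "Running" { n++ } END { print n + 0 }'
}

# Example usage:
#   running_plugin_pods
```

On a healthy cluster the count should match the number of nodes the DaemonSet schedules onto.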

Check how many GPUs each node has

kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
NAME      GPU
master0   
master1   
master2   
node0     2
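
For a cluster-wide total, the same output can be piped through a short awk filter. This is a sketch with a hypothetical helper name, `total_gpus`; it skips the header row and any node whose GPU column is empty or non-numeric (such as `<none>`).

```shell
#!/bin/sh
# Hypothetical helper: sum the GPU column of the custom-columns output on stdin.
# Rows without a numeric second column (no GPUs) are ignored.
total_gpus() {
    awk 'NR > 1 && $2 ~ /^[0-9]+$/ { sum += $2 } END { print sum + 0 }'
}

# Example usage:
#   kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu" | total_gpus
```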

Prepare a test job, gpu_pod.yaml

>>> cat gpu_pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
  - image: nvidia/cuda
    name: cuda
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1

Deploy the test job

kubectl apply -f gpu_pod.yaml 
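
The pod runs `nvidia-smi` once and exits, so its logs are only complete after it finishes. One way to wait for that is to poll the pod phase, as in this sketch. `wait_for_pod_done` and its default of 60 polls at 2-second intervals are illustrative assumptions, not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper: poll until the pod reaches Succeeded or Failed.
# Arguments: pod name, max polls (default 60), seconds between polls (default 2).
# Returns 0 on Succeeded, 1 on Failed, 2 if the polls run out.
wait_for_pod_done() {
    pod=$1; tries=${2:-60}; delay=${3:-2}
    while [ "$tries" -gt 0 ]; do
        phase=$(kubectl get pod "$pod" -o jsonpath='{.status.phase}')
        case $phase in
            Succeeded) return 0 ;;
            Failed)    return 1 ;;
        esac
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 2
}

# Example usage:
#   wait_for_pod_done gpu-pod && kubectl logs gpu-pod
```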

View the result

>>> kubectl logs gpu-pod

Wed Mar 18 08:53:44 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:02:00.0 Off |                  N/A |
| 19%   43C    P2    78W / 250W |  10678MiB / 11019MiB |     22%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Done.

If you have questions, join QQ group 526855734.
