This guide will help you create a GPU-enabled Kubernetes cluster, or enable GPU support on an existing cluster, using containerd as the container runtime. You will need:
- Ubuntu 22.04 Server for worker nodes
- Ubuntu 22.04 Desktop for the master node
- Latest NVIDIA GPU drivers installed on each GPU node (see the verification command after this list)
- Static IP addresses for all machines
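Before going further, it is worth confirming the driver actually works on each GPU node. A quick check, assuming the driver installed correctly:

```bash
# Should print the driver/CUDA version and list every GPU on this node.
nvidia-smi
```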
If you don't have a cluster yet, follow the guide by Choudhry Shehryar to set up your Kubernetes cluster.
- Install the NVIDIA Container Toolkit:
  - Follow the official installing with apt guide (a sketch of the typical commands follows this list)
  - Configure containerd using the official guide
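For reference, the apt-based install currently looks roughly like this; treat it as a sketch and defer to the linked NVIDIA guide, since repository URLs and steps change over time:

```bash
# Add NVIDIA's package repository and signing key (per the official guide).
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit.
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Register the nvidia runtime in containerd's config; the manual edits in the
# next step confirm and complete this configuration.
sudo nvidia-ctk runtime configure --runtime=containerd
```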
Next, install the NVIDIA device plugin:

- Modify the containerd configuration:
  - Open `/etc/containerd/config.toml` (make a backup first)
  - Set `default_runtime_name = "nvidia"`
  - Set `runtime = "/usr/bin/nvidia-container-runtime"` or `runtime = "nvidia-container-runtime"` (see the illustrative excerpt after this list)
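The exact layout depends on your containerd version; the excerpt below is an illustrative version-2 config (the shape that `nvidia-ctk runtime configure` produces), not a drop-in replacement for your file:

```toml
# Illustrative excerpt of /etc/containerd/config.toml (config version 2).
version = 2

[plugins."io.containerd.grpc.v1.cri".containerd]
  # Make the nvidia runtime the default for all pods on this node.
  default_runtime_name = "nvidia"

  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
    runtime_type = "io.containerd.runc.v2"

    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
      BinaryName = "/usr/bin/nvidia-container-runtime"
```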
- Restart containerd:

```bash
sudo systemctl restart containerd
```
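If containerd fails to come back up, the config edit is the usual culprit; check its state before moving on:

```bash
# Confirm containerd restarted cleanly after the config change.
sudo systemctl status containerd --no-pager
```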
- Test GPU access in containerd:

```bash
sudo ctr image pull docker.io/nvidia/cuda:11.2.2-base-ubuntu20.04
# "cuda-11.0-base" here is just an arbitrary container ID for the test run.
sudo ctr run --rm --gpus 0 -t docker.io/nvidia/cuda:11.2.2-base-ubuntu20.04 cuda-11.0-base nvidia-smi
```

If the runtime is wired up correctly, this prints the same `nvidia-smi` table you see on the host.
- Apply the NVIDIA device plugin on the master node:

```bash
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.3/nvidia-device-plugin.yml
```
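The manifest deploys a DaemonSet, so one plugin pod should come up per GPU node. You can confirm this with the command below; the label selector matches the v0.14.3 manifest, but double-check it if you use another version:

```bash
# Expect one Running nvidia-device-plugin pod per GPU node.
kubectl get pods -n kube-system -l name=nvidia-device-plugin-ds
```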
- Create a test pod using the following YAML:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gputest
  namespace: default
spec:
  restartPolicy: Never
  containers:
    - name: gpu
      image: nvidia/cuda:11.2.2-base-ubuntu20.04
      command: [ "/bin/bash", "-c", "--" ]
      args: [ "while true; do sleep 30; done;" ]
      resources:
        limits:
          nvidia.com/gpu: 1
```
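Save the manifest and create the pod (the filename `gputest.yaml` below is just an example):

```bash
kubectl apply -f gputest.yaml
# Watch until the pod reports Running.
kubectl get pod gputest -n default -w
```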
- Access the pod and run `nvidia-smi`:

```bash
kubectl exec -n default -it gputest -- /bin/bash
# then, inside the pod's shell:
nvidia-smi
```
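Alternatively, skip the interactive shell and run the check in one line:

```bash
kubectl exec -n default gputest -- nvidia-smi
```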
- Check GPU availability on the worker nodes:

```bash
kubectl describe node <name-of-worker-node>
```

Look for `nvidia.com/gpu` under `Capacity` and `Allocatable` in the output; the count should match the number of GPUs in the node (e.g. `nvidia.com/gpu: 2` for a two-GPU machine).
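To see the allocatable GPU count across every node at once, a custom-columns query works too; note the escaped dots in the resource name:

```bash
# Prints each node name alongside its allocatable nvidia.com/gpu count.
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
```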
For a comprehensive guide on setting up a GPU-enabled Kubernetes cluster from start to finish, check out Choudhry Shehryar's full guide.