GPU-Enabled Kubernetes Cluster Setup Guide

This guide shows how to create a GPU-enabled Kubernetes cluster, or enable GPU support on an existing cluster, using containerd as the container runtime.

Prerequisites

  • Ubuntu 22.04 Server for worker nodes
  • Ubuntu 22.04 Desktop for the master node
  • Latest NVIDIA GPU drivers installed on every node that has a GPU
  • Static IP addresses for all machines
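
The operating system and driver prerequisites can be checked on each machine before going further. The following is a minimal sketch, not part of the original guide: the function names `is_ubuntu_2204` and `has_nvidia_driver` are hypothetical, and it assumes the standard `/etc/os-release` file and that the driver package puts `nvidia-smi` on the PATH.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if the given os-release file describes
# Ubuntu 22.04. Defaults to /etc/os-release when no path is passed.
is_ubuntu_2204() {
  local file="${1:-/etc/os-release}"
  grep -qx 'ID=ubuntu' "$file" && grep -qx 'VERSION_ID="22.04"' "$file"
}

# Hypothetical helper: succeeds if nvidia-smi is on the PATH and can talk
# to the driver (nvidia-smi exits non-zero when it cannot).
has_nvidia_driver() {
  command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1
}
```

On a GPU node, `is_ubuntu_2204 && has_nvidia_driver && echo "prerequisites look OK"` prints the message only when both checks pass. Static IP addressing is not covered by this sketch.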

Kubernetes Cluster Setup

Follow the guide by Choudhry Shehryar to set up your Kubernetes cluster.

Enabling GPU Support

  1. Install NVIDIA Container Toolkit

  2. Install NVIDIA device plugin

  3. Modify containerd configuration:

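    • Back up /etc/containerd/config.toml, then open it for editing
    • Set default_runtime_name = "nvidia" (in containerd 1.x configs this key sits under [plugins."io.containerd.grpc.v1.cri".containerd])
    • Set runtime = "/usr/bin/nvidia-container-runtime", or runtime = "nvidia-container-runtime" if the binary is on the PATH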
  4. Restart containerd:

    sudo systemctl restart containerd
    
  5. Test GPU access in containerd (cuda-11.0-base below is only a container ID and can be any unique name):

    sudo ctr image pull docker.io/nvidia/cuda:11.2.2-base-ubuntu20.04
    sudo ctr run --rm --gpus 0 -t docker.io/nvidia/cuda:11.2.2-base-ubuntu20.04 cuda-11.0-base nvidia-smi
    
  6. Deploy the NVIDIA device plugin by running the following on the master node:

    kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.3/nvidia-device-plugin.yml
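
The containerd edit in step 3 is the easiest one to get wrong, so it can help to back up the file and verify the result before restarting the service. The helpers below are a sketch under assumptions: the function names are hypothetical, and the check only looks for an uncommented `default_runtime_name = "nvidia"` line; it does not validate the rest of the file.

```shell
#!/usr/bin/env bash

# Hypothetical helper: copy the config to a timestamped backup next to it.
# Run with sudo when the file lives under /etc.
backup_containerd_config() {
  local file="${1:-/etc/containerd/config.toml}"
  cp -p "$file" "$file.bak.$(date +%Y%m%d%H%M%S)"
}

# Hypothetical helper: succeeds if the config sets nvidia as the default
# runtime. Commented-out lines do not match.
nvidia_is_default_runtime() {
  local file="${1:-/etc/containerd/config.toml}"
  grep -Eq '^[[:space:]]*default_runtime_name[[:space:]]*=[[:space:]]*"nvidia"' "$file"
}
```

For example, `nvidia_is_default_runtime && sudo systemctl restart containerd` restarts containerd only once the setting is in place. Recent NVIDIA Container Toolkit releases also ship `nvidia-ctk runtime configure --runtime=containerd --set-as-default`, which can write these settings for you; check `nvidia-ctk runtime configure --help` on your installed version before relying on it.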
    

Testing GPU Support

  1. Save the following YAML to a file (for example gputest.yaml) and create the test pod with kubectl apply -f gputest.yaml:

    apiVersion: v1
    kind: Pod
    metadata:
      name: gputest
      namespace: default
    spec:
      restartPolicy: Never
      containers:
        - name: gpu
          image: nvidia/cuda:11.2.2-base-ubuntu20.04
          command: [ "/bin/bash", "-c", "--" ]
          args: [ "while true; do sleep 30; done;" ]
          resources:
            limits:
              nvidia.com/gpu: 1
  2. Open a shell in the pod, then run nvidia-smi inside it:

    kubectl exec -n default -it gputest -- /bin/bash
    nvidia-smi
    
  3. Check GPU availability on worker nodes:

    kubectl describe node <name-of-worker-node>
    

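    Look for nvidia.com/gpu under Capacity and Allocatable in the output. The value is the number of GPUs the node exposes, for example nvidia.com/gpu: 2 on a node with two GPUs.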
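
The two checks above can also be run without an interactive shell. The sketch below is illustrative only: both function names are hypothetical, the pod name and namespace default to the `gputest` and `default` values from the YAML above, and the 120 second timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `kubectl describe node` output on stdin and print
# every nvidia.com/gpu value listed with a trailing colon (normally one line
# from Capacity and one from Allocatable).
gpu_counts_from_describe() {
  awk '$1 == "nvidia.com/gpu:" { print $2 }'
}

# Hypothetical helper: wait until the test pod is Ready, then run nvidia-smi
# in it directly instead of opening a shell first.
run_gpu_smoke_test() {
  local pod="${1:-gputest}" ns="${2:-default}"
  kubectl wait -n "$ns" --for=condition=Ready "pod/$pod" --timeout=120s &&
    kubectl exec -n "$ns" "$pod" -- nvidia-smi
}
```

A typical use is `kubectl describe node <name-of-worker-node> | gpu_counts_from_describe`, followed by `run_gpu_smoke_test` once the pod has been created. If the first command prints nothing, the device plugin has not registered any GPUs on that node.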

Additional Resources

For a comprehensive guide on setting up a GPU-enabled Kubernetes cluster from start to finish, check out Choudhry Shehryar's full guide.