GPU-Enabled Kubernetes Cluster Setup Guide

This guide will help you create a GPU-enabled Kubernetes cluster or enable GPU support on an existing cluster using containerd as the runtime.

Prerequisites

  • Ubuntu 22.04 Server for worker nodes
  • Ubuntu 22.04 Desktop for the master node
  • Latest NVIDIA GPU drivers installed
  • Static IP addresses for all machines
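
Before continuing, you can confirm the driver is working on every node that has a GPU; if nvidia-smi prints a table listing the GPUs, the driver is installed correctly:

    # Run on each GPU node; a table of GPUs means the driver is healthy
    nvidia-smi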

Kubernetes Cluster Setup

Follow the guide by Choudhry Shehryar to set up your Kubernetes cluster.

Enabling GPU Support

  1. Install the NVIDIA Container Toolkit on every node that has a GPU (an installation sketch follows this list).

  2. Install the NVIDIA device plugin (it is deployed cluster-wide as a DaemonSet in step 6 below).

  3. Modify the containerd configuration on every GPU node (an example config excerpt follows this list):

    • Open /etc/containerd/config.toml (make a backup first)
    • Set default_runtime_name = "nvidia"
    • Set runtime = "/usr/bin/nvidia-container-runtime" or runtime = "nvidia-container-runtime"
  4. Restart containerd:

    sudo systemctl restart containerd
    
  5. Test GPU access in containerd:

    sudo ctr image pull docker.io/nvidia/cuda:11.2.2-base-ubuntu20.04
    sudo ctr run --rm --gpus 0 -t docker.io/nvidia/cuda:11.2.2-base-ubuntu20.04 cuda-11.0-base nvidia-smi
    
  6. Apply the NVIDIA device plugin on the master node (a quick verification check follows this list):

    kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.3/nvidia-device-plugin.yml
    

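The NVIDIA Container Toolkit (step 1) is normally installed from NVIDIA's apt repository. The sketch below follows NVIDIA's published install steps for Ubuntu at the time of writing; the repository URLs may change, so fall back to the official installation guide if they fail:

    # Add NVIDIA's signing key and apt repository (URLs taken from NVIDIA's install docs)
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
      sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
    curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
      sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
      sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

    # Install the toolkit itself
    sudo apt-get update
    sudo apt-get install -y nvidia-container-toolkit
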
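For step 3, the exact keys in /etc/containerd/config.toml depend on which config version your containerd uses. The excerpt below shows one common layout (containerd config version 2, as used in the NVIDIA device plugin documentation) that registers an nvidia runtime and makes it the default; adapt the section names if your file looks different. Newer toolkit releases also ship a helper that can write these entries for you: sudo nvidia-ctk runtime configure --runtime=containerd.

    # Excerpt of /etc/containerd/config.toml (config version 2)
    version = 2

    [plugins."io.containerd.grpc.v1.cri".containerd]
      # Make the nvidia runtime the default for all containers
      default_runtime_name = "nvidia"

      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
        runtime_type = "io.containerd.runc.v2"

        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
          BinaryName = "/usr/bin/nvidia-container-runtime"
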
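After step 6, you can check that the device plugin DaemonSet is up and has one pod running on every GPU node. The label below (name=nvidia-device-plugin-ds) is the one used in the upstream v0.14.3 manifest; verify it with the first command if your manifest differs:

    # The device plugin runs as a DaemonSet in the kube-system namespace
    kubectl get daemonset -n kube-system | grep nvidia

    # One pod per GPU node should be Running (label taken from the upstream manifest)
    kubectl get pods -n kube-system -l name=nvidia-device-plugin-ds -o wide
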
Testing GPU Support

  1. Create a test pod using the following YAML (save it as, for example, gputest.yaml and apply it with kubectl apply -f gputest.yaml):

    apiVersion: v1
    kind: Pod
    metadata:
      name: gputest
      namespace: default
    spec:
      restartPolicy: Never
      containers:
        - name: gpu
          image: nvidia/cuda:11.2.2-base-ubuntu20.04
          command: [ "/bin/bash", "-c", "--" ]
          args: [ "while true; do sleep 30; done;" ]
          resources:
            limits:
              nvidia.com/gpu: 1
  2. Access the pod and run nvidia-smi:

    kubectl exec -n default -it gputest -- /bin/bash
    nvidia-smi
    
  3. Check GPU availability on worker nodes:

    kubectl describe node <name-of-worker-node>
    

    Look for nvidia.com/gpu under Capacity and Allocatable in the output; the count should equal the number of GPUs in that node (for example, nvidia.com/gpu: 2 on a node with two GPUs). A quick way to check every node at once follows.
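
To see at a glance how many GPUs the scheduler believes each node has, a sketch like the one below can help. The first command is a simple grep over kubectl describe; the second uses a custom-columns query in which the dots of the nvidia.com/gpu resource name must be escaped (a <none> value means the device plugin has not registered any GPUs on that node):

    # Show hostnames together with the advertised GPU resource
    kubectl describe nodes | grep -E "Hostname:|nvidia.com/gpu"

    # Per-node GPU count in one table (dots in the resource name are escaped with '\')
    kubectl get nodes -o custom-columns=NODE:.metadata.name,GPUS:'.status.allocatable.nvidia\.com/gpu'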

Additional Resources

For a comprehensive guide on setting up a GPU-enabled Kubernetes cluster from start to finish, check out Choudhry Shehryar's full guide.
