Merge branch 'kata' into 'master'
Tech preview for Sandboxed workloads

See merge request nvidia/cloud-native/cnt-docs!297
mikemckiernan committed Aug 4, 2023
2 parents cfe58a8 + e9e72b4 commit 9cbe97a
Showing 4 changed files with 509 additions and 32 deletions.
4 changes: 3 additions & 1 deletion gpu-operator/appendix.rst
@@ -38,7 +38,9 @@ Advanced Configurations

Container Device Interface Support <cdi.rst>

Confidential Containers <confidential-containers.rst>
gpu-operator-kata.rst

gpu-operator-confidential-containers.rst

GPU Operator with Amazon EKS <amazon-eks.rst>

@@ -16,9 +16,9 @@
.. headings (h1/h2/h3/h4/h5) are # * = -
##########################################################
GPU Operator Support for Confidential Containers with Kata
##########################################################
##################################################
GPU Operator with Confidential Containers and Kata
##################################################

.. contents::
:depth: 2
@@ -111,6 +111,7 @@ About NVIDIA Confidential Computing Manager

You can set the default confidential computing mode of the NVIDIA GPUs by setting the
``ccManager.defaultMode=<on|off>`` option.
The default value is ``off``.
You can set this option when you install NVIDIA GPU Operator or afterward by modifying the
``cluster-policy`` instance of the ``ClusterPolicy`` object.
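
For example, a minimal sketch of both approaches, assuming the NVIDIA Helm repository is added as ``nvidia``, the release is named ``gpu-operator``, and the chart exposes ``ccManager.enabled``:

.. code-block:: console

   # At install time (release and namespace names are assumptions):
   $ helm install gpu-operator nvidia/gpu-operator \
       --namespace gpu-operator --create-namespace \
       --set ccManager.enabled=true \
       --set ccManager.defaultMode=on

   # Afterward, by modifying the cluster-policy instance of the ClusterPolicy object:
   $ kubectl patch clusterpolicy/cluster-policy --type merge \
       -p '{"spec": {"ccManager": {"defaultMode": "on"}}}'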

@@ -185,12 +186,15 @@ Node A receives the following software components:
- ``NVIDIA Device Plugin for Kubernetes`` -- to discover and advertise GPU resources to kubelet.
- ``NVIDIA DCGM and DCGM Exporter`` -- to monitor GPUs.
- ``NVIDIA MIG Manager for Kubernetes`` -- to manage MIG-capable GPUs.
- ``Node Feature Discovery`` -- to detect CPU, kernel, and host features and label worker nodes.
- ``NVIDIA GPU Feature Discovery`` -- to detect NVIDIA GPUs and label worker nodes.

Node B receives the following software components:

- ``NVIDIA Kata Manager for Kubernetes`` -- to manage the NVIDIA artifacts such as the
NVIDIA optimized Linux kernel image and initial RAM disk.
- ``NVIDIA Confidential Computing Manager for Kubernetes`` -- to manage the confidential
computing mode of the NVIDIA GPU on the node.
- ``NVIDIA Sandbox Device Plugin`` -- to discover and advertise the passthrough GPUs to kubelet.
- ``NVIDIA VFIO Manager`` -- to load the vfio-pci device driver and bind it to all GPUs on the node.
- ``Node Feature Discovery`` -- to detect CPU security features and NVIDIA GPUs, and to label worker nodes.
@@ -241,7 +245,7 @@ Installing and configuring your cluster to support the NVIDIA GPU Operator with

#. Label the worker nodes that you want to use with confidential containers.

This step ensures that you can continue to run traditional container workloads and vGPU workloads on some nodes in your cluster.
This step ensures that you can continue to run traditional container workloads with GPU or vGPU on some nodes in your cluster.

#. Install the Confidential Containers Operator.

@@ -264,10 +268,8 @@ Label Nodes for Confidential Containers
$ kubectl label node <node-name> nvidia.com/gpu.workload.config=vm-passthrough
..
.. include:: gpu-operator-kata.rst
:start-after: start-install-coco-operator
:end-before: end-install-coco-operator
.. start-install-coco-operator
********************************************
Install the Confidential Containers Operator
@@ -358,7 +360,7 @@ Perform the following steps to install and verify the Confidential Containers Operator
ccruntime.confidentialcontainers.org/ccruntime-sample created
Wait approximately 10 minutes for the Operator to create the base runtime classes.
Wait a few minutes for the Operator to create the base runtime classes.

#. (Optional) View the runtime classes:

@@ -379,6 +381,7 @@ Perform the following steps to install and verify the Confidential Containers Operator
kata-qemu-snp kata-qemu-snp 13m
kata-qemu-tdx kata-qemu-tdx 13m
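
For example, a quick check while the runtime installs; the ``confidential-containers-system`` namespace is an assumption based on the default installation:

.. code-block:: console

   # List the runtime classes created by the Operator.
   $ kubectl get runtimeclass

   # Watch the Operator pods until they are running (namespace is an assumption).
   $ kubectl get pods -n confidential-containers-system --watch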
.. end-install-coco-operator
*******************************
Install the NVIDIA GPU Operator
@@ -482,12 +485,12 @@ Verification
.. code-block:: output
:emphasize-lines: 3
65:00.0 3D controller [0302]: NVIDIA Corporation GA102GL [A10] [10de:2236] (rev a1)
Subsystem: NVIDIA Corporation GA102GL [A10] [10de:1482]
65:00.0 3D controller [0302]: NVIDIA Corporation xxxxxxx [xxx] [10de:xxxx] (rev xx)
Subsystem: NVIDIA Corporation xxxxxxx [xxx] [10de:xxxx]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
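
Output like the preceding can be gathered on the worker node with ``lspci``; a sketch that filters on the NVIDIA PCI vendor ID ``10de``:

.. code-block:: console

   # Show NVIDIA PCI devices with numeric IDs and the kernel driver in use.
   $ lspci -nnk -d 10de: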
#. Confirm that Kata manager installed the ``kata-qemu-nvidia-gpu`` runtime class files:
#. Confirm that NVIDIA Kata Manager installed the ``kata-qemu-nvidia-gpu-snp`` runtime class files:

.. code-block:: console
@@ -540,6 +543,7 @@ To set a node-level mode, apply the ``nvidia.com/cc.mode=<on|off|devtools>`` label
$ kubectl label node <node-name> nvidia.com/cc.mode=on --overwrite
The mode that you set on a node has higher precedence than the cluster-wide default mode.
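
For example, display the node-level label, if set, as a column:

.. code-block:: console

   # Show the nvidia.com/cc.mode label value for each node (empty if unset).
   $ kubectl get nodes -L nvidia.com/cc.mode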

Verifying a Mode Change
=======================
@@ -579,7 +583,7 @@ A pod specification for confidential computing requires the following:
.. code-block:: output
{
"nvidia.com/GA102GL_A10": "1"
"nvidia.com/GH100_H100_PCIE": "1"
}
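
For example, a sketch of reading the allocatable resources from the node, assuming ``jq`` is installed; the ``nvidia.com/<device>`` entry is the resource name to request in the pod specification:

.. code-block:: console

   $ kubectl get node <node-name> -o json | jq '.status.allocatable'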
#. Create a file, such as ``cuda-vectoradd-coco.yaml``, like the following example:
@@ -599,9 +603,9 @@ A pod specification for confidential computing requires the following:
containers:
- name: cuda-vectoradd
image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04"
resources:
limits:
"nvidia.com/GA102GL_A10": 1
resources:
limits:
"nvidia.com/GH100_H100_PCIE": 1
#. Create the pod:
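
A minimal sketch, reusing the file name from the previous step; the pod name is an assumption based on the manifest:

.. code-block:: console

   $ kubectl apply -f cuda-vectoradd-coco.yaml

   # The CUDA vectorAdd sample logs "Test PASSED" on success
   # (assumes the pod is named cuda-vectoradd in the manifest).
   $ kubectl logs cuda-vectoradd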
