Configure Kubernetes Cluster to Use Nvidia GPU


As an open source platform for automating the deployment, scaling, and management of containerized applications, Kubernetes is often chosen for deploying web services in general, and that includes applications that serve ML models. On that basis, in this article we will look at how Kubernetes components interact with Nvidia GPUs and then practice on a simple k8s cluster.

Nvidia GPU on Kubernetes

A quick recap first: Kubernetes manages its workloads in units called Pods, which are designed to model an application-specific "logical host" and can contain one or more relatively tightly coupled application containers. Containers in a Pod share an IP address and port space; they are always co-located, co-scheduled, and run in a shared context on the same Node.

[Image: Kubernetes Pods and Nodes, used from https://kubernetes.io/en/docs/tutorials/kubernetes-basics/explore/explore-intro/]

Building a Kubernetes cluster for testing is quite simple, as we can use the k3s installation via a single command as follows:
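Something along these lines (I disable traefik at install time, as explained right below; adjust the flags to your needs):

```bash
# install k3s on the first node, skipping the bundled traefik ingress controller
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--disable traefik" sh -
```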

For those of you wondering, I'm not deploying traefik because this Ingress controller is not very popular; after testing other tools such as Seldon Core I had to remove it anyway, so I just don't install it from the beginning ┐( ̄ヮ ̄)┌ . In this demo I will have a cluster with 2 nodes, because my machine does not have a GPU while the other machine does (¬_¬;). Installing a new node and adding it to the cluster is done easily via the command curl -sfL https://get.k3s.io | K3S_URL=https://10.0.37.144:6443 K3S_TOKEN=$K3S_TOKEN sh - , with the value of K3S_TOKEN taken from /var/lib/rancher/k3s/server/node-token on the master node. If you are not too unlucky, the result will look like this:
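After both nodes have joined, a quick check should show something like this (output is illustrative; extra columns omitted):

```bash
kubectl get nodes
# NAME          STATUS   ROLES                  AGE
# b122436-pc    Ready    control-plane,master   15m
# b120639-pc3   Ready    <none>                 2m
```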

Here, node b120639-pc3 will be the node with a GPU and node b122436-pc is mine (;⌣̀_⌣́). To see whether a default k8s cluster can already use an Nvidia GPU, we try deploying a pod with textbook-standard content; we have to include the GPU resource in the manifest because the GPU is a special resource, you can't just use it without asking:
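A minimal pod like the one below is enough (the CUDA sample image is just an example, any GPU-capable image works):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "nvidia/samples:vectoradd-cuda11.2.1"
      resources:
        limits:
          nvidia.com/gpu: 1   # ask for one whole GPU
```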

It would be nice if it just worked, but life is not that easy =)))))))) (otherwise why would I write this article), and we will get the following result:
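Since nothing in the cluster advertises the nvidia.com/gpu resource yet, the pod just sits in Pending with an event roughly like this:

```bash
kubectl describe pod gpu-test
# Events:
#   Warning  FailedScheduling  default-scheduler
#   0/2 nodes are available: 2 Insufficient nvidia.com/gpu.
```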

To rescue this situation, let's dig into a few things below.

Kubernetes Device Plugin for GPU

Currently, Kubernetes supports the management of GPUs (not only Nvidia but also AMD and Intel) through the Device Plugin framework, but these plugins are not pre-installed: we need to install the GPU driver ourselves and configure the corresponding device plugin based on the GPU manufacturer's instructions. Once the plugin is installed successfully, our cluster will expose a custom schedulable resource such as amd.com/gpu or nvidia.com/gpu. There are a few caveats when using them:

  • You can specify GPU limits without specifying requests, because Kubernetes will use the limits as the request value by default.
  • You can specify GPU in both limits and requests, but the two values must be equal.
  • You cannot specify GPU requests without specifying limits.

A sample manifest for a pod that uses the GPU would then look like this:
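For example (the image name is only an example):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "registry.k8s.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1   # requests defaults to the same value
```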

If different nodes in the cluster have different types of GPUs, we can use node labels and node selectors to schedule pods onto the appropriate nodes, like this:
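Roughly like this (the accelerator label value is just an example, label your nodes however you like):

```bash
# label each node with the GPU model it carries
kubectl label nodes b120639-pc3 accelerator=nvidia-rtx-2080ti
```

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "registry.k8s.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1
  nodeSelector:
    accelerator: nvidia-rtx-2080ti   # only schedule onto nodes carrying this label
```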

NVIDIA device plugin for Kubernetes

To support customers using NVIDIA GPUs in Kubernetes clusters, NVIDIA has made the NVIDIA device plugin available at https://github.com/NVIDIA/k8s-device-plugin . Essentially, the NVIDIA device plugin for Kubernetes is a Daemonset that allows you to automatically:

  • Expose the number of GPUs on each node of the cluster
  • Keep track of GPU health
  • Run GPU-enabled containers in the cluster

Installation can be done through the steps listed at https://github.com/NVIDIA/k8s-device-plugin#quick-start , though you won't have to do them by hand here: these steps will be handled by the NVIDIA GPU Operator.

NVIDIA GPU Operator on Kubernetes Cluster


As mentioned in the previous section, Kubernetes provides access to special hardware resources such as NVIDIA GPUs, NICs, Infiniband adapters and others through the device plugin framework. However, configuring and managing nodes with these hardware resources requires configuring many software components such as drivers, the container runtime or other libraries, which is difficult and error prone.

To get around this, the NVIDIA GPU Operator is built on the Kubernetes operator framework to automate the management of all the NVIDIA software components needed to provision the GPU. These components include:

  • the NVIDIA GPU driver
  • the NVIDIA Container Toolkit (container runtime configuration)
  • the NVIDIA Kubernetes Device Plugin
  • GPU Feature Discovery for automatic node labelling
  • DCGM-based monitoring

The NVIDIA GPU Operator is delivered as a Helm chart and can be installed easily by following these steps:

First, we need to install helm if not already available via the command:
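For example with the official install script:

```bash
# download and run the helm 3 install script
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 \
  && chmod 700 get_helm.sh \
  && ./get_helm.sh
```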

And then add the NVIDIA Helm repository as follows:
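Per the GPU Operator instructions, that is:

```bash
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
  && helm repo update
```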

The GPU Operator's chart will automatically install the necessary components, but before using it we should review its values at https://github.com/NVIDIA/gpu-operator/tree/master/deployments/gpu-operator . If nothing needs to be changed, we create a release from that chart with the following command:
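With the default values (the release name is auto-generated here):

```bash
helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator
```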

For machines with NVIDIA driver available, we can skip the driver configuration to avoid problems by setting driver.enabled to false as follows:
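That is, passing the value when creating the release:

```bash
helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --set driver.enabled=false   # the NVIDIA driver is already installed on the host
```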

Similarly, if the runtime environment is already configured to use the NVIDIA Container Toolkit , you can customize the value of the chart release as follows:
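For instance, when both the driver and the container toolkit are already present on the host:

```bash
helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --set driver.enabled=false \
  --set toolkit.enabled=false   # NVIDIA Container Toolkit already configured
```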

Finally, after a billion years of waiting, we will have a new namespace named gpu-operator created and a billion other things in it like this:
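Listing the pods in that namespace should show components roughly like these (names carry random suffixes):

```bash
kubectl get pods -n gpu-operator
# gpu-operator-...                        the operator itself
# nvidia-device-plugin-daemonset-...      exposes nvidia.com/gpu on GPU nodes
# nvidia-container-toolkit-daemonset-...  configures the container runtime
# nvidia-dcgm-exporter-...                GPU metrics
# gpu-feature-discovery-...               node labels such as nvidia.com/gpu.product
# nvidia-operator-validator-...           sanity checks
```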

To check if the gpu-operator is working, we can use the manifest to create a Deployment as follows:
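A small Deployment such as the one below will do; the CUDA base image and the sleep command are just placeholders to keep the pod alive:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-test
  template:
    metadata:
      labels:
        app: gpu-test
    spec:
      containers:
        - name: cuda
          image: "nvidia/cuda:11.8.0-base-ubuntu22.04"
          command: ["sh", "-c", "nvidia-smi && sleep infinity"]
          resources:
            limits:
              nvidia.com/gpu: 1   # one whole GPU per pod for now
```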

The results obtained will be as follows:
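A quick check:

```bash
kubectl get pods -o wide
# the gpu-test pod should be Running and placed on the GPU node (b120639-pc3)
```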

This means that the pod requesting an Nvidia GPU has been scheduled, and fortunately my setup was error free.

☆*:.。.o(≧▽≦)o.。.:*☆ ☆*:.。.o(≧▽≦)o.。.:*☆ ☆*:.。.o(≧▽≦)o.。.:*☆

MIG Support on Kubernetes

So, as above, we can configure the components on Kubernetes to be able to use the GPU. However, the nvidia.com/gpu resource is by default counted according to the number of physical GPUs, and since we are not allowed to write something like nvidia.com/gpu: 0.5, two pods cannot share the same GPU. That leads to wasted resources, because modern GPUs often have large VRAM (e.g. 11GB on a 2080Ti) and not all machine learning models use up that much.

To make this more concrete, when we increase replicas to 2 in the manifest above, we immediately hit the error 0/2 nodes are available: 2 Insufficient nvidia.com/gpu. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod. and the new pod cannot be scheduled. To solve this problem, NVIDIA introduced the Multi-Instance GPU solution, which partitions the GPU into up to seven instances, each fully isolated with its own high-bandwidth memory, cache, and compute cores. This gives you the ability to support any workload, from the smallest to the largest, with guaranteed quality of service (QoS), and extends the reach of accelerated computing resources to all users.


However, life is not so easy: Multi-Instance GPU only supports GPUs based on the Ampere architecture and newer, more specifically the H100, A100 and A30, so we need another solution when we don't have money. (눈_눈)(눈_눈)(눈_눈)

Time-Slicing GPUs in Kubernetes

As introduced above, the latest generations of NVIDIA GPUs offer an operating mode known as Multi-Instance GPU or MIG . MIG allows us to partition the GPU into many smaller, predefined instances, each of which looks like a mini GPU providing memory and fault isolation at the hardware layer. With such a split, you can share access to the GPU by running the workload on one of these predefined instances instead of the full native GPU.

However if:

  • You don’t like to use MIG because it’s abbreviated as your ex’s name
  • You are willing to trade the isolation provided by MIG for the ability to share the GPU among a larger number of users
  • You don’t have money to buy a new GPU

To address this, the NVIDIA GPU Operator enables GPU oversubscription through an extended set of options for the NVIDIA Kubernetes Device Plugin, allowing workloads placed on an oversubscribed GPU to interleave with one another. This GPU "time sharing" mechanism in Kubernetes lets us define a set of "replicas" for a GPU, each of which can be handed out independently to a pod to run workloads on. Unlike MIG, there is no memory or fault isolation between replicas, but for some people sometimes nobody cares, and the GPU Time-Slicing mechanism is simply used to multiplex workloads across replicas of the same underlying GPU.

To configure shared access to the GPU with GPU Time-Slicing, we need to provide a time-slicing configuration to the NVIDIA Kubernetes Device Plugin through a ConfigMap.

A sample ConfigMap provided by the textbook is as follows:
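Assuming we name the ConfigMap time-slicing-config (any name works, it just has to match the Operator setting later):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  tesla-t4: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4   # advertise 4 slices per physical T4
```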

With the above configuration, we allow up to 4 pods to use the same T4. Similarly, the config for an A100 would look like this:
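For example (the replica counts here are just an illustration):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  a100-40gb: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 8
          - name: nvidia.com/mig-1g.5gb   # MIG-backed resources can also be sliced
            replicas: 2
```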

I’m not that rich, so the config I used to experiment with the 2080Ti will be as follows:
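Six slices per GPU, to match the 1/6 unit used later (the rtx-2080ti key name is arbitrary):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  rtx-2080ti: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 6   # the GPU will be advertised as 6 schedulable units
```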

To enable time-slicing with the NVIDIA GPU Operator, we pass the name of the ConfigMap created above to the devicePlugin.config.name parameter as follows:
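Assuming the Helm release is called gpu-operator (check the actual name with helm list -n gpu-operator):

```bash
helm upgrade gpu-operator nvidia/gpu-operator \
  -n gpu-operator \
  --reuse-values \
  --set devicePlugin.config.name=time-slicing-config
```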

The time-slicing configuration can be applied at the cluster level or per node. By default, the GPU Operator does not apply the time-slicing configuration to any GPU node in the cluster; we have to specify it explicitly with devicePlugin.config.default, and we can update it with the following command:
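For example, making the rtx-2080ti entry the cluster-wide default:

```bash
helm upgrade gpu-operator nvidia/gpu-operator \
  -n gpu-operator \
  --reuse-values \
  --set devicePlugin.config.name=time-slicing-config \
  --set devicePlugin.config.default=rtx-2080ti   # key inside the ConfigMap to apply by default
```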

To check if the configuration is ok, we can check the node information as follows:
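Something like this (the values shown are illustrative for my node):

```bash
kubectl describe node b120639-pc3 | grep -i nvidia.com/gpu
# expected, roughly:
#   nvidia.com/gpu.product=NVIDIA-GeForce-RTX-2080-Ti-SHARED
#   nvidia.com/gpu.replicas=6
#   nvidia.com/gpu: 6        (under Capacity and Allocatable)
```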

The above output shows that our configuration was successful: the SHARED suffix has been added to the nvidia.com/gpu.product label.

And now the basic unit of resource management on this node is 1/6 of the GPU, so we can reconfigure the Deployment above as follows:
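For instance, scaling the same Deployment to six replicas, each still requesting one nvidia.com/gpu unit:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-test
spec:
  replicas: 6                      # six pods can now share the single time-sliced GPU
  selector:
    matchLabels:
      app: gpu-test
  template:
    metadata:
      labels:
        app: gpu-test
    spec:
      containers:
        - name: cuda
          image: "nvidia/cuda:11.8.0-base-ubuntu22.04"
          command: ["sh", "-c", "nvidia-smi && sleep infinity"]
          resources:
            limits:
              nvidia.com/gpu: 1    # one unit = 1/6 of the physical GPU now
```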

The results obtained will be:
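A final check:

```bash
kubectl get pods -l app=gpu-test
# all six replicas should end up Running on the GPU node instead of staying Pending
```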

Summary

This article covered how Kubernetes components interact with Nvidia GPUs and discussed how to use GPUs on a Kubernetes cluster more efficiently through Time-Slicing. Configuring a Kubernetes cluster to use Nvidia GPUs is a fundamental step toward deploying services that rely on them, such as Triton Inference Server, and in the next article (if I get around to it) we will explore such a model serving solution together. This is the end of the article; thank you all for taking the time to read.

References


Source: Viblo