Kubernetes is an open source platform for automating the deployment, scaling, and management of containerized applications, and it is a common choice for deploying web services in general, including applications that serve ML models. On that basis, in this article we will learn how Kubernetes components interact with NVIDIA GPUs and practice on a simple k8s cluster.
Nvidia GPU on Kubernetes
A quick recap: Kubernetes manages units called Pods, which are designed to model application-specific "logical hosts" and can contain several relatively tightly coupled application containers. Containers in a Pod share an IP address and port space, and they are always co-located, co-scheduled, and run in a shared context on the same Node.
Image used from https://kubernetes.io/en/docs/tutorials/kubernetes-basics/explore/explore-intro/
Building a Kubernetes cluster for testing is quite simple, as we can install k3s with a single command:
```shell
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --no-deploy traefik" sh
```
For those of you wondering, I'm not deploying traefik because this Ingress controller is not very popular; after testing tools such as Seldon Core I would have to remove it first anyway, so I won't install it from the beginning ┐( ̄ヮ ̄)┌
. In this demo I will have a cluster with 2 nodes, because my machine does not have a GPU and the other machine does (¬_¬;). Installing a new node and adding it to the cluster is done easily via the command `curl -sfL https://get.k3s.io | K3S_URL=https://10.0.37.144:6443 K3S_TOKEN=$K3S_TOKEN sh -`, with the value of `K3S_TOKEN` taken from `/var/lib/rancher/k3s/server/node-token` on the master node. If you are not too unlucky, the result will look like this:
```
% kubectl get nodes
NAME          STATUS   ROLES                  AGE     VERSION
b122436-pc    Ready    control-plane,master   5m42s   v1.24.6+k3s1
b120639-pc3   Ready    <none>                 2m25s   v1.24.6+k3s1
```
Here node `b120639-pc3` is the node with the GPU, and node `b122436-pc` is my GPU-less machine (;⌣̀_⌣́).
So, to see whether a default k8s cluster can use an NVIDIA GPU, we try deploying a pod with textbook-standard content. We have to fill in the resource limits field of the manifest: the GPU is an extended resource, so it will never be handed out unless we explicitly ask for it:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-operator-test
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vector-add
    image: "nvcr.io/nvidia/cloud-native/gpu-operator-validator:v22.9.0"
    resources:
      limits:
        nvidia.com/gpu: 1
```
It would be nice if that just worked, but life is not that easy =)))))))) (otherwise why would I write this article). Instead, we get the following result:
```
% kubectl get event --field-selector involvedObject.name=gpu-operator-test
LAST SEEN   TYPE      REASON             OBJECT                  MESSAGE
9m3s        Warning   FailedScheduling   pod/gpu-operator-test   0/2 nodes are available: 2 Insufficient nvidia.com/gpu. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.
8m45s       Warning   FailedScheduling   pod/gpu-operator-test   skip schedule deleting pod: default/gpu-operator-test
8m43s       Normal    Scheduled          pod/gpu-operator-test   Successfully assigned default/gpu-operator-test to b120639-pc3
8m36s       Normal    Pulling            pod/gpu-operator-test   Pulling image "nvcr.io/nvidia/cloud-native/gpu-operator-validator:v22.9.0"
6m41s       Normal    Pulled             pod/gpu-operator-test   Successfully pulled image "nvcr.io/nvidia/cloud-native/gpu-operator-validator:v22.9.0" in 1m55.516263773s
6m41s       Normal    Created            pod/gpu-operator-test   Created container cuda-vector-add
6m40s       Normal    Started            pod/gpu-operator-test   Started container cuda-vector-add
98s         Warning   FailedScheduling   pod/gpu-operator-test   0/2 nodes are available: 2 Insufficient nvidia.com/gpu. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.
```
To rescue this situation, let's dig a little deeper below.
Kubernetes Device Plugin for GPU
Currently, Kubernetes supports managing GPUs (not only NVIDIA but also AMD and Intel) through the Device Plugin framework. The plugins are not pre-installed, though: we need to install the GPU driver ourselves and configure the corresponding device plugin based on the GPU manufacturer's instructions. Once the plugin is installed successfully, our cluster exposes a custom schedulable resource such as `amd.com/gpu` or `nvidia.com/gpu`. There are a few caveats when using them:
- You can specify GPU `limits` without specifying `requests`, because Kubernetes will use the limit as the request value by default.
- You can specify GPU in both `limits` and `requests`, but the two values must be equal.
- You cannot specify GPU `requests` without specifying `limits`.
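As a quick sanity check once a device plugin is running, you can ask the API server which nodes expose the custom GPU resource; this is a generic `kubectl` query (the column names are my own choice, and the cell is empty on nodes without the plugin):

```shell
# Show each node's allocatable nvidia.com/gpu count; dots inside the
# resource name must be escaped in the custom-columns JSONPath
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
```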
A sample manifest for a pod that uses a GPU would then look like this:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-vector-add
spec:
  restartPolicy: OnFailure
  containers:
  - name: example-vector-add
    image: "registry.example/example-vector-add:v42"
    resources:
      limits:
        gpu-vendor.example/example-gpu: 1 # requesting 1 GPU
```
If different nodes in the cluster have different types of GPUs, we can use node labels and node selectors to schedule pods on the appropriate nodes, like this:
```shell
# Label your nodes with the accelerator type they have.
kubectl label nodes node1 accelerator=example-gpu-x100
kubectl label nodes node2 accelerator=other-gpu-k915
```
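The label can then be consumed from the pod spec via `nodeSelector`; a minimal sketch reusing the textbook example above (the image and resource names are placeholders from that example):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-vector-add
spec:
  restartPolicy: OnFailure
  nodeSelector:
    accelerator: example-gpu-x100   # only schedule on nodes carrying this label
  containers:
  - name: example-vector-add
    image: "registry.example/example-vector-add:v42"
    resources:
      limits:
        gpu-vendor.example/example-gpu: 1
```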
NVIDIA device plugin for Kubernetes
To support customers using NVIDIA GPUs in Kubernetes clusters, NVIDIA provides the NVIDIA device plugin at https://github.com/NVIDIA/k8s-device-plugin . Essentially, the NVIDIA device plugin for Kubernetes is a DaemonSet that allows you to automatically:
- Expose the number of GPUs on each node in the cluster
- Monitor GPU health
- Run GPU-intensive containers on the cluster
Installation can be done via the steps listed in https://github.com/NVIDIA/k8s-device-plugin#quick-start , though you won't have to do them yourself: these steps are handled by the NVIDIA GPU Operator.
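For completeness, the quick start essentially boils down to applying the plugin's DaemonSet manifest from that repository; the version tag below is an assumption from around the time of writing, so check the repo for the current release:

```shell
# Deploy the NVIDIA device plugin as a DaemonSet (version tag may be outdated)
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.12.3/nvidia-device-plugin.yml
```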
NVIDIA GPU Operator on Kubernetes Cluster
As demonstrated in the previous section, Kubernetes provides access to special hardware resources such as NVIDIA GPUs, NICs, Infiniband adapters, and others through the device plugin framework. However, configuring and managing nodes with these hardware resources requires setting up many software components such as drivers, the container runtime, and other libraries, which is difficult and error prone.
To get around this, the NVIDIA GPU Operator is built on the operator framework in Kubernetes to automate the management of all the NVIDIA software components needed to provision GPUs. These components include:
- NVIDIA drivers (to be able to use CUDA)
- Kubernetes device plugin for GPU
- NVIDIA Container Toolkit
- Automatic node labeller using NVIDIA GPU feature discovery
- Monitoring based on NVIDIA Data Center GPU Manager (DCGM)
- And other components.
The NVIDIA GPU Operator is deployed via a Helm chart and can be installed easily with the following steps. First, we need to install helm, if not already available, via the command:
```shell
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 \
    && chmod 700 get_helm.sh \
    && ./get_helm.sh
```
And then add the NVIDIA Helm repository
as follows:
```shell
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
    && helm repo update
```
The GPU Operator's chart will automatically install the necessary components, but before using it we should review its values at https://github.com/NVIDIA/gpu-operator/tree/master/deployments/gpu-operator . If nothing needs to be changed, we create a release from that chart with the following command:
```shell
helm install --wait --generate-name \
    -n gpu-operator --create-namespace \
    nvidia/gpu-operator
```
For machines that already have the NVIDIA driver installed, we can skip the driver deployment to avoid problems by setting `driver.enabled` to `false` as follows:
```shell
helm install --wait --generate-name \
    -n gpu-operator --create-namespace \
    nvidia/gpu-operator \
    --set driver.enabled=false
```
Similarly, if the container runtime is already configured to use the NVIDIA Container Toolkit, you can customize the chart release values as follows:
```shell
helm install --wait --generate-name \
    -n gpu-operator --create-namespace \
    nvidia/gpu-operator \
    --set driver.enabled=false \
    --set toolkit.enabled=false
```
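If you want to see every knob the chart exposes before overriding anything, helm can print the chart's default values locally (assuming the `nvidia` repo was added as above):

```shell
# Dump the default values of the gpu-operator chart for review
helm show values nvidia/gpu-operator
```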
Finally, after a billion years of waiting, we will have a new namespace named `gpu-operator` and a billion other things inside it, like this:
```
NAME                                                              READY   STATUS            RESTARTS       AGE
gpu-operator-59b9d49c6f-xrz9q                                     1/1     Running           0              15m
gpu-operator-1665737174-node-feature-discovery-worker-2ltj5       1/1     Running           0              15m
gpu-operator-1665737174-node-feature-discovery-master-7fd6fw7ww   1/1     Running           0              15m
nvidia-dcgm-exporter-sjjjr                                        0/1     PodInitializing   0              4m10s
nvidia-cuda-validator-8kbh2                                       0/1     Completed         0              3m22s
nvidia-device-plugin-daemonset-zmqrs                              1/1     Running           0              4m10s
nvidia-device-plugin-validator-lmrfp                              0/1     Completed         0              2m39s
nvidia-operator-validator-zk77v                                   1/1     Running           0              4m11s
gpu-operator-1665737174-node-feature-discovery-worker-sjzdz       1/1     Running           6 (5m8s ago)   15m
gpu-feature-discovery-kvsc7                                       1/1     Running           0              4m10s
nvidia-container-toolkit-daemonset-cpjxx                          1/1     Running           0              4m11s
```
To check whether the `gpu-operator` is working, we can use a manifest that creates a Deployment as follows:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nvidia-plugin-test
  labels:
    app: nvidia-plugin-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nvidia-plugin-test
  template:
    metadata:
      labels:
        app: nvidia-plugin-test
    spec:
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      containers:
      - name: dcgmproftester11
        image: nvidia/samples:dcgmproftester-2.0.10-cuda11.0-ubuntu18.04
        command: ["/bin/sh", "-c"]
        args:
        - while true; do /usr/bin/dcgmproftester11 --no-dcgm-validation -t 1004 -d 300; sleep 30; done
        resources:
          limits:
            nvidia.com/gpu: 1
        securityContext:
          capabilities:
            add: ["SYS_ADMIN"]
```
The results obtained will be as follows:
```
% kubectl get pod nvidia-plugin-test-6d64ffd55f-l46r9
NAME                                  READY   STATUS    RESTARTS   AGE
nvidia-plugin-test-6d64ffd55f-l46r9   1/1     Running   0          17m
```
This means the pod with an NVIDIA GPU requirement was scheduled successfully, and my setup was fortunately error free.
☆*:.。.o(≧▽≦)o.。.:*☆
MIG Support on Kubernetes
As shown above, we can now configure the components on Kubernetes to use the GPU. However, `nvidia.com/gpu` resources are by default counted in whole registered GPUs, so two pods cannot share the same GPU: we are not allowed to write something like `nvidia.com/gpu: 0.5`. This wastes resources, since modern GPUs often have large VRAM (e.g. 12GB on a 2080Ti) and not every machine learning model uses that much.
To make this concrete: when we increase `replicas` to `2` in the manifest above, we immediately hit the error `0/2 nodes are available: 2 Insufficient nvidia.com/gpu. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.`, and the new pod cannot be scheduled. To solve this problem, NVIDIA introduced the Multi-Instance GPU solution, which partitions the GPU into up to seven instances, each completely isolated with its own high-bandwidth memory, cache, and compute cores. This gives you the ability to support any workload, from the smallest to the largest, with guaranteed quality of service (QoS), and extends the reach of accelerated computing to all users.
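For context, MIG partitioning itself is done outside Kubernetes with `nvidia-smi`; a rough sketch on an MIG-capable card (the profile ID below is the 2g.10gb profile on an A100-40GB and differs per model, so treat the exact values as assumptions):

```shell
# Enable MIG mode on GPU 0 (may require stopping all GPU clients and resetting the GPU)
sudo nvidia-smi -i 0 -mig 1
# Create three 2g.10gb GPU instances together with their compute instances (-C)
sudo nvidia-smi mig -cgi 14,14,14 -C
# List the resulting GPU instances
sudo nvidia-smi mig -lgi
```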
However, life is not so easy: Multi-Instance GPU only supports GPUs based on the Ampere architecture and newer, such as the H100, A100, and A30, so we need another solution when we don't have money. (눈_눈)(눈_눈)(눈_눈)
Time-Slicing GPUs in Kubernetes
As introduced above, recent generations of NVIDIA GPUs offer an operating mode known as Multi-Instance GPU, or MIG. MIG allows us to partition the GPU into several smaller, predefined instances, each of which looks like a mini GPU with memory and fault isolation at the hardware layer. With such a split, you can share access to the GPU by running a workload on one of these predefined instances instead of the full native GPU.
However, if:
- You don't like to use MIG because it's abbreviated like your ex's name
- You are willing to trade the isolation provided by MIG for the ability to share the GPU among a larger number of users
- You don't have money to buy a new GPU
To address this, the NVIDIA GPU Operator enables GPU over-subscription through an extended set of options for the NVIDIA Kubernetes Device Plugin, allowing workloads placed on over-subscribed GPUs to be interleaved with one another. This GPU "time sharing" mechanism in Kubernetes lets us define a set of "replicas" for a GPU, each of which can be independently handed out to a pod to run workloads. Unlike MIG, there is no memory or fault isolation between replicas, but in some cases nobody cares, and the GPU Time-Slicing mechanism is used to interleave workloads from replicas of the same underlying GPU.
To configure shared access to the GPU with GPU Time-Slicing, we need to provide a time-slicing configuration to the NVIDIA Kubernetes Device Plugin via a ConfigMap shaped like this:
```yaml
version: v1
sharing:
  timeSlicing:
    renameByDefault: <bool>
    failRequestsGreaterThanOne: <bool>
    resources:
    - name: <resource-name>
      replicas: <num-replicas>
    ...
```
A sample ConfigMap straight from the textbook looks as follows:
```yaml
apiVersion: v1
kind: ConfigMap
data:
  tesla-t4: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4
```
With the above configuration, we allow 4 pods to share the same T4. Similarly, the config for an A100 would look like this:
```yaml
a100-40gb: |-
  version: v1
  sharing:
    timeSlicing:
      resources:
      - name: nvidia.com/gpu
        replicas: 8
      - name: nvidia.com/mig-1g.5gb
        replicas: 2
      - name: nvidia.com/mig-2g.10gb
        replicas: 2
      - name: nvidia.com/mig-3g.20gb
        replicas: 3
      - name: nvidia.com/mig-7g.40gb
        replicas: 7
```
I'm not that rich, so the config I used to experiment with my 2080Ti is as follows:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  2080ti-12gb: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 6
```
We enable time-slicing with the NVIDIA GPU Operator by setting the `devicePlugin.config.name` parameter to the name of the ConfigMap created above, as follows:
```shell
kubectl patch clusterpolicy/cluster-policy \
    -n gpu-operator --type merge \
    -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config"}}}}'
```
The time-slicing configuration can be applied at the cluster level or per node. By default, the GPU Operator will not apply the time-slicing configuration to any GPU node in the cluster; we have to specify it explicitly with `devicePlugin.config.default`, which we can update with the following command:
```shell
kubectl patch clusterpolicy/cluster-policy \
    -n gpu-operator --type merge \
    -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config", "default": "2080ti-12gb"}}}}'
```
To check that the configuration took effect, we can inspect the node information as follows:
```
% kubectl describe node b120639-pc3 | grep nvidia.com/gpu:
  nvidia.com/gpu:     6
  nvidia.com/gpu:     6
```
The above output shows that our configuration succeeded, and the `SHARED` suffix has been appended to the `gpu.product` label:
```
% kubectl describe node b120639-pc3 | grep nvidia.com/gpu.product
  nvidia.com/gpu.product=NVIDIA-GeForce-RTX-2080-Ti-SHARED
```
Now the basic unit of resource management on this node is 1/6 of the GPU, so we can adapt the Deployment above as follows:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nvidia-plugin-test-2
  labels:
    app: nvidia-plugin-test-2
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nvidia-plugin-test-2
  template:
    metadata:
      labels:
        app: nvidia-plugin-test-2
    spec:
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      containers:
      - name: dcgmproftester11
        image: nvidia/samples:dcgmproftester-2.0.10-cuda11.0-ubuntu18.04
        command: ["/bin/sh", "-c"]
        args:
        - while true; do /usr/bin/dcgmproftester11 --no-dcgm-validation -t 1004 -d 300; sleep 30; done
        resources:
          limits:
            nvidia.com/gpu: 2
        securityContext:
          capabilities:
            add: ["SYS_ADMIN"]
```
The results obtained will be:
```
% kubectl describe nodes b120639-pc3
Name:               b120639-pc3
...
Capacity:
  nvidia.com/gpu:  6
Allocated resources:
  ...
  Resource        Requests  Limits
  ...
  nvidia.com/gpu  4         4
```
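The numbers line up: the Deployment runs 2 replicas, each with a limit of `nvidia.com/gpu: 2`, so `kubectl describe node` reports 4 of the node's 6 GPUs as allocated. As a rough illustration of how that `Allocated resources` figure is derived, here is a minimal Python sketch; the pod names and data layout are illustrative, not the real Kubernetes API objects:

```python
# Illustrative sketch: sum nvidia.com/gpu limits across the pods scheduled
# on a node, the way `kubectl describe node` tallies "Allocated resources".
# Pod names and the dict layout below are hypothetical sample data.
pods = [
    {"name": "nvidia-plugin-test-2-a", "limits": {"nvidia.com/gpu": 2}},
    {"name": "nvidia-plugin-test-2-b", "limits": {"nvidia.com/gpu": 2}},
]

def allocated_gpus(pods):
    """Total nvidia.com/gpu limits over all pods on the node."""
    return sum(p["limits"].get("nvidia.com/gpu", 0) for p in pods)

capacity = 6  # matches "Capacity: nvidia.com/gpu: 6" above
used = allocated_gpus(pods)
print(f"nvidia.com/gpu: {used}/{capacity} allocated")  # 2 replicas x 2 GPUs = 4
```

Since the node only advertises 6 GPUs, a third replica requesting 2 more would still fit, but a pod requesting 3 would stay Pending.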
Summary
This article covered how Kubernetes components interact with Nvidia GPUs, and discussed how to use GPUs on a Kubernetes cluster more efficiently through Time-Slicing. Configuring a Kubernetes cluster to use Nvidia GPUs is a fundamental step toward deploying GPU-backed services such as Triton Inference Server, and in the next article (if I get to it) we will learn about such a model-serving solution together. This is the end of the article, thank you all for taking the time to read.