How to Build a Kubernetes GPU Cluster for AI Workloads with K3s and NVIDIA Runtime

Modern AI workloads demand high computational power, scalability, and efficiency. Traditional CPU clusters can’t keep up with the processing needs of deep learning, data analytics, and model inference at scale.

This guide walks you through building a GPU-enabled K3s cluster that supports containerized AI workloads using the NVIDIA container runtime, containerd, and RuntimeClass integration — a lightweight but powerful foundation for machine learning infrastructure.


🧩 Why Choose K3s for GPU Workloads?

K3s is a lightweight, CNCF-certified Kubernetes distribution optimized for edge and hybrid environments. Its minimal footprint and simple deployment make it ideal for:

  • On-prem GPU clusters
  • AI research labs
  • Edge inference systems

Combined with NVIDIA GPU acceleration, K3s can orchestrate deep learning, data processing, and AI inference pipelines with remarkable efficiency.


⚙️ Step 1: Install NVIDIA Container Toolkit on GPU Nodes

Install the NVIDIA Container Toolkit so containerd can hand GPUs to containers. This assumes the NVIDIA driver is already installed and working on each GPU node.
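If the toolkit's package repository is not configured on the node yet, add it first. A minimal sketch for RPM-based distributions, using the repo URL from NVIDIA's install documentation (adapt for your distribution):

curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo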

sudo dnf install -y nvidia-container-toolkit
# If the node runs a standalone containerd, restart it. With K3s's bundled
# containerd, the k3s-agent restart in Step 2 picks up the new runtime instead.
sudo systemctl restart containerd

Verify that the GPU and its driver are visible on the host:

nvidia-smi

You should see your GPU details (model, driver version, CUDA version).
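nvidia-smi exercises the driver; to confirm the toolkit itself is in place, you can also check for its CLI and the runtime binary referenced later in Step 2:

nvidia-ctk --version
which nvidia-container-runtime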


🧰 Step 2: Configure Containerd for NVIDIA Runtime

K3s ships its own embedded containerd, which generates its configuration at /var/lib/rancher/k3s/agent/etc/containerd/config.toml rather than reading /etc/containerd/. Recent K3s releases add an nvidia runtime entry to this file automatically when the toolkit is installed; to add or adjust it yourself, copy the generated config.toml to config.toml.tmpl in the same directory on each GPU node and include a runtime block like this:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
  runtime_type = "io.containerd.runc.v2"
  privileged_without_host_devices = false
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
    BinaryName = "/usr/bin/nvidia-container-runtime"
    SystemdCgroup = true

Then restart your K3s agent:

sudo systemctl restart k3s-agent

This registers the nvidia runtime handler with containerd, so pods scheduled on GPU nodes can request it through a RuntimeClass (Step 4).
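To confirm the runtime was actually picked up by K3s's embedded containerd, inspect the generated config (path per the K3s docs):

sudo grep -A3 nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml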


🔒 Step 3: Connect to a Private Container Registry

If you store container images in a private registry (such as GitLab Container Registry), configure authentication.

Create /etc/rancher/k3s/registries.yaml on each node that needs to pull images from the registry:

mirrors:
  "registry.example.com":
    endpoint:
      - "https://registry.example.com"

configs:
  "registry.example.com":
    auth:
      username: "registry_user"
      password: "glpat-*******************"

Restart the K3s agent:

sudo systemctl restart k3s-agent

This allows your cluster to securely pull images for AI workloads.
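To confirm a node can authenticate, you can pull the image directly through K3s's bundled crictl (the image path below is the example one used later in this guide):

sudo k3s crictl pull registry.example.com/infra/gpu-test:latest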


🧱 Step 4: Define the NVIDIA RuntimeClass

Create a Kubernetes RuntimeClass so that GPU workloads can request the NVIDIA runtime.

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia

Apply it:

kubectl apply -f runtimeclass.yaml
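One prerequisite this guide otherwise assumes: pods can only request nvidia.com/gpu resources (as in the next step, and as listed in the validation summary below) if the NVIDIA device plugin DaemonSet is running. A minimal sketch using the upstream static manifest; check the NVIDIA/k8s-device-plugin releases for the current version, and note that on K3s the plugin's pods may need to use the nvidia RuntimeClass themselves (for example via the Helm chart's runtimeClassName value) if nvidia is not your default runtime:

kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.16.2/deployments/static/nvidia-device-plugin.yml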

🧪 Step 5: Deploy and Validate a GPU Test Pod

To verify that the GPUs are available, deploy a test pod running nvidia-smi inside a container.

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  runtimeClassName: nvidia
  restartPolicy: Never
  containers:
  - name: cuda-container
    image: registry.example.com/infra/gpu-test:latest
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        nvidia.com/gpu: 1

Apply and monitor:

kubectl apply -f gpu-test.yaml
kubectl logs -f gpu-test

You should see output similar to:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05   Driver Version: 580.95.05   CUDA Version: 13.0       |
| GPU Name: NVIDIA GeForce RTX 5070 Ti                                        |
+-----------------------------------------------------------------------------+
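If you don't have a private test image, a public CUDA base image with an explicit command gives the same check (the image tag is an example; pick one that matches your driver's CUDA version):

  containers:
  - name: cuda-container
    image: nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1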

🧮 Step 6: Validate Across Multiple GPU Nodes

If your cluster has multiple GPU nodes, deploy one test pod per node:

kubectl apply -f gpu-test-t.yaml
kubectl apply -f gpu-test-s.yaml
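The two manifests can be identical to the single-node test pod apart from a node pin, for example via a nodeSelector (the hostname value here is hypothetical):

spec:
  runtimeClassName: nvidia
  nodeSelector:
    kubernetes.io/hostname: gpu-node-t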

Check results:

kubectl get pods -o wide
kubectl logs gpu-test-t
kubectl logs gpu-test-s

Each node should report its respective GPU and CUDA details.


Figure: GPU-enabled K3s architecture showing the server (control-plane) node, GPU worker nodes, and NVIDIA runtime integration.


🚀 Business Value: Why GPU-Enabled Kubernetes Matters

A GPU-enabled Kubernetes environment allows organizations to:

  • Accelerate AI model training with scalable GPU scheduling.
  • Optimize resource usage by running GPU workloads only where needed.
  • Enable hybrid deployment — from cloud to on-premise.
  • Reduce operational overhead with containerized GPU drivers and automated rollout.

For businesses developing AI-driven products, this approach offers a balance of performance, security, and scalability — without requiring heavyweight enterprise clusters.


✅ Final Validation Summary

| Component | Status |
|---|---|
| NVIDIA drivers on hosts | ✅ Installed |
| NVIDIA container runtime | ✅ Configured |
| Device plugin DaemonSet | ✅ Running |
| Private registry access | ✅ Authenticated |
| RuntimeClass “nvidia” | ✅ Active |
| GPU workloads execution | ✅ Successful |
| Multi-node GPU validation | ✅ Confirmed |
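For a quick cluster-wide re-check of advertised GPU capacity, grepping the node descriptions is enough:

kubectl describe nodes | grep -E 'Name:|nvidia.com/gpu'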

🧭 Conclusion

By enabling GPU workloads inside a lightweight K3s cluster, you gain a high-performance foundation for AI inference, machine learning training, and data-intensive computing.

The NVIDIA container runtime, containerd integration, and private registry authentication ensure a secure and maintainable environment ready for enterprise-grade GPU orchestration.

Whether you’re building scalable AI pipelines or running inference workloads at the edge, this setup is a solid, production-ready foundation for your next generation of intelligent applications.

This article is inspired by real-world challenges we tackle in our projects. If you're looking for expert solutions or need a team to bring your idea to life, let's talk!
