Modern AI workloads demand high computational power, scalability, and efficiency. Traditional CPU clusters can’t keep up with the processing needs of deep learning, data analytics, and model inference at scale.
This guide walks you through building a GPU-enabled K3s cluster that supports containerized AI workloads using the NVIDIA container runtime, containerd, and RuntimeClass integration — a lightweight but powerful foundation for machine learning infrastructure.
🧩 Why Choose K3s for GPU Workloads?
K3s is a lightweight, CNCF-certified Kubernetes distribution optimized for edge and hybrid environments. Its minimal footprint and simple deployment make it ideal for:
- On-prem GPU clusters
- AI research labs
- Edge inference systems
Combined with NVIDIA GPU acceleration, K3s can orchestrate deep learning, data processing, and AI inference pipelines with remarkable efficiency.
⚙️ Step 1: Install NVIDIA Container Toolkit on GPU Nodes
Install the NVIDIA Container Toolkit so that containerd can expose the host's GPUs to containers. This assumes the NVIDIA driver is already installed on each GPU node.
sudo dnf install -y nvidia-container-toolkit
sudo systemctl restart containerd
Verify that the GPU and driver are visible on the host:
nvidia-smi
You should see your GPU details (model, driver version, CUDA version).
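nvidia-smi confirms the driver itself; to check that the toolkit's container runtime pieces are in place where Step 2 expects them, a quick check like the following can help (default package install paths assumed):

```bash
# Toolkit CLI version and the runtime binary referenced in Step 2
nvidia-ctk --version
ls -l /usr/bin/nvidia-container-runtime
```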
🧰 Step 2: Configure Containerd for NVIDIA Runtime
Create or edit /etc/containerd/config.d/99-nvidia.toml on each GPU node (this assumes your containerd configuration imports that drop-in directory):
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
runtime_type = "io.containerd.runc.v2"
privileged_without_host_devices = false
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
BinaryName = "/usr/bin/nvidia-container-runtime"
SystemdCgroup = true
Then restart your K3s agent:
sudo systemctl restart k3s-agent
This registers an nvidia runtime handler with containerd, so pods that request it (via the RuntimeClass created in Step 4) are started through the NVIDIA container runtime and can access the node's GPUs.
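To confirm that containerd actually picked up the new handler, you can inspect the CRI runtime configuration; the sketch below uses the crictl client bundled with K3s and simply greps for the handler name:

```bash
# The "nvidia" runtime should appear among the configured CRI runtimes
sudo k3s crictl info | grep -A3 '"nvidia"'
```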
🔒 Step 3: Connect to a Private Container Registry
If you store container images in a private registry (such as GitLab Container Registry), configure authentication.
Create /etc/rancher/k3s/registries.yaml:
```yaml
mirrors:
  "registry.example.com":
    endpoint:
      - "https://registry.example.com"
configs:
  "registry.example.com":
    auth:
      username: "registry_user"
      password: "glpat-*******************"
```
Restart the K3s agent:
sudo systemctl restart k3s-agent
This allows your cluster to securely pull images for AI workloads.
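Since registries.yaml contains a registry token, it is worth tightening its permissions, and you can exercise the credentials with a manual pull before deploying anything. The image path below reuses the test image from Step 5; substitute one that exists in your registry:

```bash
# Keep the registry token readable by root only
sudo chmod 600 /etc/rancher/k3s/registries.yaml

# Pull through K3s's CRI endpoint to confirm authentication works
sudo k3s crictl pull registry.example.com/infra/gpu-test:latest
```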
🧱 Step 4: Define the NVIDIA RuntimeClass
Create a Kubernetes RuntimeClass so that GPU workloads can request the NVIDIA runtime.
```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
```
Apply it:
kubectl apply -f runtimeclass.yaml
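The validation summary at the end of this guide also lists the NVIDIA device plugin DaemonSet: it is what advertises the nvidia.com/gpu resource requested in Step 5. If it is not yet running in your cluster, one common way to install it is the official Helm chart; the values shown are an example, so double-check them against the chart version you install:

```bash
# Install the NVIDIA device plugin DaemonSet and point its pods
# at the "nvidia" RuntimeClass defined above
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin --create-namespace \
  --set runtimeClassName=nvidia
```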
🧪 Step 5: Deploy and Validate a GPU Test Pod
To verify that the GPUs are available, deploy a test pod running nvidia-smi inside a container.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  runtimeClassName: nvidia
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: registry.example.com/infra/gpu-test:latest
      imagePullPolicy: IfNotPresent
      resources:
        limits:
          nvidia.com/gpu: 1
```
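If you do not have a prebuilt test image in your private registry yet, a public CUDA base image with an explicit command works just as well. This is a sketch; the image tag is only an example, so pick one that matches your driver's CUDA version:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test-public
spec:
  runtimeClassName: nvidia
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # example tag; match your CUDA version
      command: ["nvidia-smi"]                      # print the GPU/driver summary and exit
      resources:
        limits:
          nvidia.com/gpu: 1
```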
Apply and monitor:
kubectl apply -f gpu-test.yaml
kubectl logs -f gpu-test
You should see output similar to:
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05    Driver Version: 580.95.05    CUDA Version: 13.0     |
| GPU Name: NVIDIA GeForce RTX 5070 Ti                                        |
+-----------------------------------------------------------------------------+
```
🧮 Step 6: Validate Across Multiple GPU Nodes
If your cluster has multiple GPU nodes, deploy one test pod per node, each pinned to a specific node (see the sketch at the end of this step):
kubectl apply -f gpu-test-t.yaml
kubectl apply -f gpu-test-s.yaml
Check results:
kubectl get pods -o wide
kubectl logs gpu-test-t
kubectl logs gpu-test-s
Each node should report its respective GPU and CUDA details.
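One minimal way to build gpu-test-t.yaml and gpu-test-s.yaml is to copy the gpu-test manifest from Step 5 and pin each copy to one node. The node name below is a hypothetical example; use your actual node names from kubectl get nodes:

```yaml
# gpu-test-t.yaml: identical to gpu-test, but pinned to a single GPU node
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test-t
spec:
  runtimeClassName: nvidia
  restartPolicy: Never
  nodeName: gpu-node-t           # hypothetical node name; adjust per node
  containers:
    - name: cuda-container
      image: registry.example.com/infra/gpu-test:latest
      resources:
        limits:
          nvidia.com/gpu: 1
```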
🖼️ Figure Placeholder
Figure: GPU-enabled K3s architecture showing master, GPU worker nodes, and NVIDIA runtime integration.
🚀 Business Value: Why GPU-Enabled Kubernetes Matters
A GPU-enabled Kubernetes environment allows organizations to:
- Accelerate AI model training with scalable GPU scheduling.
- Optimize resource usage by running GPU workloads only where needed.
- Enable hybrid deployment — from cloud to on-premise.
- Reduce operational overhead with containerized GPU drivers and automated rollout.
For businesses developing AI-driven products, this approach offers a balance of performance, security, and scalability — without requiring heavyweight enterprise clusters.
✅ Final Validation Summary
| Component | Status |
|---|---|
| NVIDIA drivers on hosts | ✅ Installed |
| NVIDIA container runtime | ✅ Configured |
| Device plugin DaemonSet | ✅ Running |
| Private registry access | ✅ Authenticated |
| RuntimeClass “nvidia” | ✅ Active |
| GPU workloads execution | ✅ Successful |
| Multi-node GPU validation | ✅ Confirmed |
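As a quick cross-check of the table above, each GPU node should advertise the nvidia.com/gpu resource once the device plugin is running, for example:

```bash
# Each GPU node should list nvidia.com/gpu under Capacity and Allocatable
kubectl describe nodes | grep -E 'Name:|nvidia.com/gpu'
```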
🧭 Conclusion
By enabling GPU workloads inside a lightweight K3s cluster, you gain a high-performance foundation for AI inference, machine learning training, and data-intensive computing.
The NVIDIA container runtime, containerd integration, and private registry authentication ensure a secure and maintainable environment ready for enterprise-grade GPU orchestration.
Whether you’re building scalable AI pipelines or running inference workloads at the edge, this setup is a solid, production-ready foundation for your next generation of intelligent applications.


