Home Lab: K8s Cluster with GPU Node
Ubuntu Desktop + KVM/QEMU + Kubernetes + Cilium + NVIDIA GPU Passthrough + ArgoCD
Dual boot alongside Windows for gaming. Hardware: AMD Ryzen 5 9600X | 32 GB RAM | 1 TB SSD | RTX 5060 Ti 16 GB | April 2026
Table of Contents
- Architecture Overview
- Phase 1: Install Ubuntu Desktop (Dual Boot)
- Phase 2: Enable IOMMU & Configure GPU Passthrough (VFIO)
- Phase 3: Install KVM/QEMU & Libvirt
- Phase 4: Setup Bridge Network
- Phase 5: Create Virtual Machines
- Phase 6: Install Kubernetes (kubeadm)
- Phase 7: Install Cilium CNI
- Phase 8: Join gpu-worker VM as GPU Worker Node
- Phase 9: NVIDIA Container Toolkit
- Phase 10: ArgoCD — GitOps with lespaul-argo_cd
- Phase 11: Verification & Testing
- Phase 12: Jenkins Kubernetes Agent (Dynamic Pod Agents)
- Resource Allocation Summary
- Troubleshooting
- Maintenance & Tips
Node Legend
| Symbol | Meaning |
|---|---|
| 🖥️ [HOST] | Ubuntu Desktop host machine (iGPU, manages VMs; not a k8s node) |
| 🎛️ [controlplane] | VM control plane |
| 👷 [worker1, worker2] | VM worker nodes (CPU) |
| 🎮 [gpu-worker] | VM worker node (dGPU passthrough, RTX 5060 Ti) |
| 🌐 [ALL NODES] | controlplane + worker1 + worker2 + gpu-worker |
| 🧑‍💼 [kubectl client] | Any machine with kubeconfig (usually host) |
| 🔨 [docker-builder] | Jenkins VM agent for Docker image builds |
1. Architecture Overview
1.1 Final Cluster Layout
| Node | Role | vCPU | RAM | IP |
|---|---|---|---|---|
| controlplane (VM) | Control Plane | 2 | 4 GB | 192.168.100.200 |
| worker1 (VM) | Worker (CPU) | 2 | 4 GB | 192.168.100.201 |
| worker2 (VM) | Worker (CPU) | 2 | 4 GB | 192.168.100.202 |
| gpu-worker (VM) | Worker (dGPU) | 4 | 8 GB | 192.168.100.210 |
| jenkins-master (VM) | Jenkins CI | 2 | 4 GB | 192.168.100.170 |
| docker-builder (VM) | Jenkins Docker Agent | 2 | 4 GB | 192.168.100.171 |
| nfs-server (VM) | NFS Storage | 1 | 1 GB | 192.168.100.180 |
1.2 Key Design Decisions
- Bridge networking (br-k8s, 192.168.100.0/24): all VMs, including gpu-worker, share one L2 bridge with NAT to the internet. Every cluster node (controlplane, worker1, worker2, gpu-worker) gets an IP on 192.168.100.0/24. This uniform L2 topology is required for Cilium native routing to work without VXLAN encapsulation.
- GPU passthrough (VFIO/IOMMU): the RTX 5060 Ti (dGPU) is passed through to the gpu-worker VM via PCIe passthrough; the host never uses the dGPU directly. The host retains the iGPU (AMD Radeon integrated in Ryzen 5 9600X) for the desktop environment. Using a VM with passthrough (instead of the host as a k8s node) keeps all nodes on the same bridge and avoids network asymmetry.
- Cilium CNI: full eBPF networking. kube-proxy is skipped; Cilium replaces it entirely.
- Host NOT a k8s worker: the host runs the desktop, manages VMs via libvirt, and serves as the kubectl client, but does not join the cluster. All GPU workloads run inside the gpu-worker VM via passthrough.
- ArgoCD (App of Apps): all post-bootstrap changes are driven by Git pushes to lespaul-argo_cd.
Memory note: 32 GB is fully committed across 7 VMs + desktop. worker1/worker2 are reduced to 4 GB each (from 6 GB) to make room for the docker-builder VM. If OOM occurs, reduce jenkins-master to 2 GB.
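The memory budget can be checked with a quick sum. The sketch below hard-codes the RAM column from the cluster layout table in 1.1; the variable and function names are illustrative only.

```shell
# RAM (GB) per VM, copied from the cluster layout table in section 1.1.
VM_RAM_GB="controlplane:4 worker1:4 worker2:4 gpu-worker:8 jenkins-master:4 docker-builder:4 nfs-server:1"

# Sum the per-VM allocations (function name is illustrative).
vm_ram_total_gb() {
  local total=0 entry
  for entry in $VM_RAM_GB; do
    # Keep only the number after the colon
    total=$((total + ${entry##*:}))
  done
  echo "$total"
}

HOST_RAM_GB=32
vms=$(vm_ram_total_gb)
echo "VMs: ${vms} GB, left for the host desktop: $((HOST_RAM_GB - vms)) GB"
```

With the table as written this reports 29 GB for the VMs and 3 GB for the host, which is why there is little headroom before the OOM killer steps in.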
2. Phase 1: Install Ubuntu Desktop (Dual Boot)
2.1 Preparation (in Windows)
- Download Ubuntu 24.04 LTS Desktop ISO from ubuntu.com.
- Create a bootable USB with Rufus or Balena Etcher.
- Disable Fast Startup: Control Panel → Power Options → Choose what the power buttons do → uncheck "Turn on fast startup".
- Disable BitLocker if enabled: Settings → Privacy & Security → Device encryption.
- Shrink the Windows partition: open Disk Management, right-click the main partition → Shrink Volume. Free at least 300 GB for Ubuntu.
Back up important data before modifying partitions.
2.2 Install Ubuntu
- Boot from USB (press F2/F12/DEL for boot menu).
- Select "Install Ubuntu" → "Install Ubuntu alongside Windows Boot Manager".
- Recommended partition layout: 512 MB /boot/efi, remainder as ext4 at /. Use a swap file (not a partition), configured post-install.
- Complete installation and reboot. GRUB will show both Ubuntu and Windows.
2.3 Post-Install Basics
sudo apt update && sudo apt upgrade -y
sudo apt install -y build-essential curl wget git htop net-tools
Monitor connection: plug your display into the motherboard video output (HDMI/DisplayPort on the back I/O panel), not the GPU. The GPU will be passed to gpu-worker in Phase 2 and will no longer drive the desktop.
3. Phase 2: Enable IOMMU & Configure GPU Passthrough (VFIO)
GPU passthrough lets the gpu-worker VM exclusively own the RTX 5060 Ti. The host uses the AMD Ryzen 5 9600X's integrated Radeon GPU for the desktop from this point on.
BIOS first: enter UEFI/BIOS and enable AMD-Vi (IOMMU) — usually under Advanced → CPU Configuration or AMD CBS → NBIO Common Options. Save and reboot into Ubuntu.
3.1 🖥️ [HOST] Enable IOMMU in GRUB
sudo nano /etc/default/grub
# Change GRUB_CMDLINE_LINUX_DEFAULT to:
# GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=on iommu=pt"
sudo update-grub
sudo reboot
Verify after reboot:
dmesg | grep -i iommu | head -20
# Look for: AMD-Vi: IOMMU enabled or pci 0000:00:00.2: AMD-Vi: IOMMU performance
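If you prefer not to edit the GRUB file by hand, the change in 3.1 can be scripted. This is a minimal sketch, assuming GNU sed and grep and a single-line GRUB_CMDLINE_LINUX_DEFAULT="..." entry; the helper name add_grub_param is hypothetical, and the demo runs against a temporary file rather than /etc/default/grub.

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT unless it is already present.
# $1 = grub defaults file, $2 = parameter (treated as a regex, so keep it simple)
add_grub_param() {
  local file="$1" param="$2"
  # Already present as a whole, space-separated word? Then do nothing.
  if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"(.* )?${param}( .*)?\"" "$file"; then
    return 0
  fi
  # Insert the parameter just before the closing quote
  sed -i -E "s#^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"#\1 ${param}\"#" "$file"
}

# Demo on a scratch copy
demo=$(mktemp)
printf 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n' > "$demo"
add_grub_param "$demo" amd_iommu=on
add_grub_param "$demo" iommu=pt
add_grub_param "$demo" iommu=pt   # repeated call changes nothing
cat "$demo"
rm -f "$demo"
```

On the real file you would run the helper with root privileges against /etc/default/grub and still follow it with sudo update-grub and a reboot.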
3.2 🖥️ [HOST] Find GPU IOMMU Group & PCI IDs
# List all devices with their IOMMU group numbers
for d in /sys/kernel/iommu_groups/*/devices/*; do
n=${d#*/iommu_groups/*}; n=${n%%/*}
printf 'IOMMU Group %s ' "$n"
lspci -nns "${d##*/}"
done | grep -i nvidia
Example output:
IOMMU Group 14 01:00.0 VGA compatible controller [0300]: NVIDIA ... RTX 5060 Ti [10de:XXXX]
IOMMU Group 14 01:00.1 Audio device [0403]: NVIDIA ... HD Audio [10de:YYYY]
Note the PCI slot (01:00.0, 01:00.1) and the vendor:device IDs (10de:XXXX, 10de:YYYY).
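To avoid copying the IDs by hand, they can be pulled out of the lspci output. A small sketch, assuming lspci -nn prints IDs in the [vendor:device] form shown above and that 10de is the NVIDIA vendor ID; the function name is made up for this example.

```shell
# Read `lspci -nn` lines on stdin and print the NVIDIA vendor:device IDs
# as a comma-separated list, ready for the vfio-pci "ids=" option.
nvidia_pci_ids() {
  grep -i nvidia \
    | grep -oE '\[10de:[0-9a-f]{4}\]' \
    | tr -d '[]' \
    | paste -sd, -
}

# Usage on the host:
#   lspci -nn | nvidia_pci_ids
```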
All devices in the same IOMMU group must be passed through together. If the GPU shares a group with unrelated devices, consider ACS override patches (advanced — out of scope here).
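One way to confirm the group is clean is to list everything that shares it with the GPU. The sketch below reads sysfs directly; it assumes PCI domain 0000, and the function name and the optional sysfs-root argument are illustrative additions.

```shell
# Print every PCI device that shares an IOMMU group with the given device.
# $1 = PCI address without domain (e.g. 01:00.0); $2 = sysfs root (defaults to /sys)
iommu_group_members() {
  local group_dir="${2:-/sys}/bus/pci/devices/0000:$1/iommu_group/devices"
  if [ ! -d "$group_dir" ]; then
    echo "no IOMMU group found for $1 (is IOMMU enabled?)" >&2
    return 1
  fi
  ls "$group_dir"
}

# Usage on the host:
#   iommu_group_members 01:00.0
```

If the list contains only the GPU's own functions (video and audio), the group can be passed through as is.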
3.3 🖥️ [HOST] Bind RTX 5060 Ti to vfio-pci
Replace 10de:XXXX,10de:YYYY with your actual GPU + HDMI audio PCI IDs from Step 3.2.
cat <<EOF | sudo tee /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:XXXX,10de:YYYY
softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci
EOF
cat <<EOF | sudo tee /etc/modules-load.d/vfio.conf
vfio
vfio_iommu_type1
vfio_pci
EOF
sudo update-initramfs -u -k all
sudo reboot
3.4 🖥️ [HOST] Verify VFIO Binding
lspci -nnk | grep -A3 -i nvidia
# "Kernel driver in use: vfio-pci" ← GPU is claimed by VFIO — correct
# If it still shows "nvidia" or "nouveau", the softdep didn't apply — check /etc/modprobe.d/vfio.conf
The GPU is now unavailable to the host OS and ready for VM passthrough.
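The same check can be scripted by following the driver symlink in sysfs, which is handy after kernel updates. A minimal sketch, assuming PCI domain 0000; the function name and the optional sysfs-root argument are hypothetical.

```shell
# Print the kernel driver currently bound to a PCI device, or "none".
# $1 = PCI address without domain (e.g. 01:00.0); $2 = sysfs root (defaults to /sys)
pci_driver() {
  local link="${2:-/sys}/bus/pci/devices/0000:$1/driver"
  if [ ! -L "$link" ]; then
    echo "none"
    return 0
  fi
  # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
  basename "$(readlink "$link")"
}

# Usage on the host:
#   [ "$(pci_driver 01:00.0)" = "vfio-pci" ] && echo "GPU bound to vfio-pci"
```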
4. Phase 3: Install KVM/QEMU & Libvirt
4.1 Verify Virtualization Support
grep -Ec '(vmx|svm)' /proc/cpuinfo # should be > 0
sudo apt install -y cpu-checker
kvm-ok # should say: KVM acceleration can be used
4.2 Install Packages
sudo apt install -y qemu-kvm libvirt-daemon-system \
libvirt-clients bridge-utils virt-manager virtinst
sudo usermod -aG libvirt $USER
sudo usermod -aG kvm $USER
# Log out and back in for group changes to take effect
4.3 Verify
virsh list --all
sudo systemctl status libvirtd # should be active (running)
5. Phase 4: Setup Bridge Network
The bridge br-k8s (192.168.100.0/24) connects all VMs and the host on the same subnet with NAT out to the internet.
First identify your physical NIC:
Run ip a or nmcli device status. Common names: enp4s0, enp5s0, eno1. Replace accordingly below.
5.1 Option A: Netplan
# /etc/netplan/01-bridge.yaml
network:
  version: 2
  ethernets:
    enp4s0:            # your physical NIC
      dhcp4: false
  bridges:
    br-k8s:
      interfaces: [enp4s0]
      addresses: [192.168.100.1/24]
      parameters:
        stp: false
      dhcp4: false
      mtu: 1500
sudo netplan apply
ip addr show br-k8s
bridge link
Enable NAT (internet access for VMs):
# Persist in /etc/rc.local or a systemd unit
sudo iptables -t nat -A POSTROUTING -s 192.168.100.0/24 ! -d 192.168.100.0/24 -j MASQUERADE
echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward
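One possible way to make the NAT rule survive reboots is a oneshot systemd unit. The sketch below only prints the unit text; the unit name k8s-nat.service, the sysctl file name, and the /usr/sbin/iptables path are assumptions to adapt to your system.

```shell
# Print a oneshot systemd unit that installs the MASQUERADE rule at boot.
# $1 = VM subnet in CIDR notation
render_nat_unit() {
  cat <<EOF
[Unit]
Description=NAT for VMs on br-k8s
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/iptables -t nat -A POSTROUTING -s $1 ! -d $1 -j MASQUERADE
ExecStop=/usr/sbin/iptables -t nat -D POSTROUTING -s $1 ! -d $1 -j MASQUERADE

[Install]
WantedBy=multi-user.target
EOF
}

# Hypothetical install steps (file names are assumptions):
#   render_nat_unit 192.168.100.0/24 | sudo tee /etc/systemd/system/k8s-nat.service
#   sudo systemctl daemon-reload && sudo systemctl enable --now k8s-nat.service
#   echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-ip-forward.conf
```

The sysctl.d file keeps IP forwarding enabled across reboots, which the echo into /proc does not.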
5.2 Option B: NetworkManager
sudo nmcli connection add type bridge ifname br-k8s con-name br-k8s
sudo nmcli connection add type bridge-slave ifname enp4s0 master br-k8s
sudo nmcli connection modify br-k8s ipv4.addresses 192.168.100.1/24 ipv4.method manual
sudo nmcli connection down 'Wired connection 1'
sudo nmcli connection up br-k8s
Bridging a Wi-Fi interface is not supported. Use Ethernet for this setup.
6. Phase 5: Create Virtual Machines
6.1 Download Ubuntu Server ISO
cd /var/lib/libvirt/images/
sudo wget https://releases.ubuntu.com/24.04/ubuntu-24.04-live-server-amd64.iso
6.2 Create VM Disks
sudo qemu-img create -f qcow2 /var/lib/libvirt/images/controlplane.qcow2 30G
sudo qemu-img create -f qcow2 /var/lib/libvirt/images/worker1.qcow2 40G
sudo qemu-img create -f qcow2 /var/lib/libvirt/images/worker2.qcow2 40G
sudo qemu-img create -f qcow2 /var/lib/libvirt/images/gpu-worker.qcow2 60G
sudo qemu-img create -f qcow2 /var/lib/libvirt/images/jenkins.qcow2 50G
6.3 Create VMs
Control plane:
virt-install \
--name controlplane --ram 4096 --vcpus 2 \
--disk path=/var/lib/libvirt/images/controlplane.qcow2,format=qcow2 \
--os-variant ubuntu24.04 \
--network bridge=br-k8s,model=virtio \
--cdrom /var/lib/libvirt/images/ubuntu-24.04-live-server-amd64.iso \
--graphics vnc,listen=0.0.0.0 --noautoconsole
Worker nodes (repeat for worker1, worker2 — adjust --name and --disk):
virt-install \
--name worker1 --ram 4096 --vcpus 2 \
--disk path=/var/lib/libvirt/images/worker1.qcow2,format=qcow2 \
--os-variant ubuntu24.04 \
--network bridge=br-k8s,model=virtio \
--cdrom /var/lib/libvirt/images/ubuntu-24.04-live-server-amd64.iso \
--graphics vnc,listen=0.0.0.0 --noautoconsole
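The "repeat for worker1, worker2" step can be expressed as a loop. This sketch only prints the commands so they can be reviewed before running; the RAM value follows the 4 GB figure from the cluster layout table in 1.1, and the function name is illustrative.

```shell
# Print (not run) the virt-install command for each CPU worker.
IMAGES=/var/lib/libvirt/images

print_worker_cmds() {
  local name
  for name in worker1 worker2; do
    # Unquoted heredoc: variables expand, "\\" becomes a literal line-continuation backslash
    cat <<EOF
virt-install \\
  --name $name --ram 4096 --vcpus 2 \\
  --disk path=$IMAGES/$name.qcow2,format=qcow2 \\
  --os-variant ubuntu24.04 \\
  --network bridge=br-k8s,model=virtio \\
  --cdrom $IMAGES/ubuntu-24.04-live-server-amd64.iso \\
  --graphics vnc,listen=0.0.0.0 --noautoconsole
EOF
  done
}

print_worker_cmds
```

Once the output looks right, piping it to a shell (for example print_worker_cmds | bash) would execute both commands.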
gpu-worker (PCIe passthrough — adjust 01:00.0 / 01:00.1 to your GPU's PCI slot from Phase 2):
virt-install \
--name gpu-worker --ram 8192 --vcpus 4 \
--disk path=/var/lib/libvirt/images/gpu-worker.qcow2,format=qcow2 \
--os-variant ubuntu24.04 \
--network bridge=br-k8s,model=virtio \
--cdrom /var/lib/libvirt/images/ubuntu-24.04-live-server-amd64.iso \
--machine q35 \
--boot uefi \
--cpu host-passthrough \
--features kvm_hidden=on \
--hostdev 01:00.0 \
--hostdev 01:00.1 \
--graphics vnc,listen=0.0.0.0 --noautoconsole
- --network bridge=br-k8s puts gpu-worker on the same L2 bridge as all other VMs, which is essential for Cilium native routing. Do not use the default virbr0 (NAT network) or host networking here.
- --machine q35 is required for PCIe passthrough.
- --cpu host-passthrough exposes real CPU features to the VM (required by NVIDIA drivers). This is CPU feature exposure only: the GPU is passed through via --hostdev, not via the host OS.
- kvm_hidden=on prevents NVIDIA Error 43 caused by the driver detecting it's running inside a hypervisor.
- --hostdev 01:00.1 passes the GPU's HDMI audio device, which is required because GPU and audio share the same IOMMU group.
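After the VM is defined, a quick sanity check is to count the passthrough devices in its libvirt XML. A minimal sketch, assuming the domain is named gpu-worker as above; the helper name is made up.

```shell
# Count <hostdev> entries in a libvirt domain XML read from stdin.
count_hostdevs() {
  grep -c '<hostdev '
}

# Usage on the host (two entries expected: GPU video + GPU audio):
#   virsh dumpxml gpu-worker | count_hostdevs
```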
6.4 Post-Install VM Configuration
Set static IPs on each VM via Netplan:
# /etc/netplan/00-installer-config.yaml (example for controlplane)
network:
  version: 2
  ethernets:
    enp1s0:
      addresses: [192.168.100.200/24]
      routes:
        - to: default
          via: 192.168.100.1
      nameservers:
        addresses: [8.8.8.8, 8.8.4.4]
| Node | IP | Hostname |
|---|---|---|
| Host | 192.168.100.1 | k8s-host |
| controlplane | 192.168.100.200 | controlplane |
| worker1 | 192.168.100.201 | worker1 |
| worker2 | 192.168.100.202 | worker2 |
| gpu-worker | 192.168.100.210 | gpu-worker |
| jenkins-master | 192.168.100.170 | jenkins-master |
| docker-builder | 192.168.100.171 | docker-builder |
| nfs-server | 192.168.100.180 | nfs-server |
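Since every VM uses the same Netplan layout with a different last octet, the file can be generated rather than retyped. A sketch under the assumption that the guest NIC is enp1s0 as in the controlplane example; the function name and its arguments are illustrative.

```shell
# Print a static-IP Netplan config for one VM on 192.168.100.0/24.
# $1 = last octet of the VM's address; $2 = NIC name (defaults to enp1s0)
render_netplan() {
  cat <<EOF
network:
  version: 2
  ethernets:
    ${2:-enp1s0}:
      addresses: [192.168.100.$1/24]
      routes:
        - to: default
          via: 192.168.100.1
      nameservers:
        addresses: [8.8.8.8, 8.8.4.4]
EOF
}

render_netplan 201   # worker1
```

On the VM itself, the output would be written to /etc/netplan/00-installer-config.yaml and activated with sudo netplan apply.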
Add to /etc/hosts on all nodes:
192.168.100.1 k8s-host
192.168.100.200 controlplane
192.168.100.201 worker1
192.168.100.202 worker2
192.168.100.210 gpu-worker
192.168.100.170 jenkins-master
192.168.100.171 docker-builder
192.168.100.180 nfs-server
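Appending these lines more than once leaves duplicates in /etc/hosts. A small idempotent helper avoids that; it is a sketch with a made-up function name, shown here against a file path argument so it can be tried on a copy first.

```shell
# Append "IP NAME" to a hosts file only if NAME is not already mapped there.
# $1 = hosts file, $2 = IP address, $3 = hostname
add_host_entry() {
  # Match the hostname as a whole field (preceded by whitespace, followed by whitespace or end of line)
  if grep -Eq "[[:space:]]$3([[:space:]]|\$)" "$1"; then
    return 0
  fi
  printf '%s %s\n' "$2" "$3" >> "$1"
}

# Usage (as root on each node):
#   add_host_entry /etc/hosts 192.168.100.200 controlplane
#   add_host_entry /etc/hosts 192.168.100.180 nfs-server
```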
6.5 Setup SSH Access
# On host
ssh-keygen -t ed25519
ssh-copy-id ubuntu@192.168.100.200
ssh-copy-id ubuntu@192.168.100.201
ssh-copy-id ubuntu@192.168.100.202
ssh-copy-id ubuntu@192.168.100.210
ssh-copy-id ubuntu@192.168.100.170
ssh-copy-id ubuntu@192.168.100.171
ssh-copy-id ubuntu@192.168.100.180
6.6 Create NFS Server VM
The NFS server is a lightweight dedicated VM that provides persistent storage for the Kubernetes cluster via the nfs-client StorageClass.
sudo qemu-img create -f qcow2 /var/lib/libvirt/images/nfs-server.qcow2 100G
virt-install \
--name nfs-server \
--ram 1024 --vcpus 1 \
--disk path=/var/lib/libvirt/images/nfs-server.qcow2,format=qcow2 \
--os-variant ubuntu24.04 \
--network bridge=br-k8s,model=virtio \
--cdrom /var/lib/libvirt/images/ubuntu-24.04-live-server-amd64.iso \
--graphics vnc,listen=0.0.0.0 --noautoconsole
Set static IP after Ubuntu Server install:
# /etc/netplan/00-installer-config.yaml on the nfs-server VM
network:
  version: 2
  ethernets:
    enp1s0:
      addresses: [192.168.100.180/24]
      routes:
        - to: default
          via: 192.168.100.1
      nameservers:
        addresses: [8.8.8.8, 8.8.4.4]
sudo netplan apply
Add to /etc/hosts on all nodes:
192.168.100.180 nfs-server
Auto-start with the other VMs:
virsh autostart nfs-server
6.7 Configure NFS Server
All commands run on the nfs-server VM (192.168.100.180).
Install NFS
sudo apt update
sudo apt install -y nfs-kernel-server
Create export directory
sudo mkdir -p /srv/nfs/k8s
sudo chown nobody:nogroup /srv/nfs/k8s
sudo chmod 777 /srv/nfs/k8s
Configure exports
echo '/srv/nfs/k8s 192.168.100.0/24(rw,sync,no_subtree_check,no_root_squash)' | \
sudo tee -a /etc/exports
sudo exportfs -rav
sudo systemctl enable --now nfs-kernel-server
Verify from host
showmount -e 192.168.100.180
# Expected:
# Export list for 192.168.100.180:
# /srv/nfs/k8s 192.168.100.0/24
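For use in a script, the visual check can be turned into an exit status. A minimal sketch that parses the showmount output format shown above; the helper name is hypothetical.

```shell
# Succeed if the given export path appears in `showmount -e` output read from stdin.
# $1 = export path to look for
has_export() {
  awk -v path="$1" '$1 == path { found = 1 } END { exit !found }'
}

# Usage from the host:
#   showmount -e 192.168.100.180 | has_export /srv/nfs/k8s && echo "export visible"
```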
Install NFS client on all Kubernetes nodes
NFS client packages must be present on every node that will mount NFS volumes:
# Run on controlplane, worker1, worker2, and gpu-worker
sudo apt install -y nfs-common
7. Phase 6: Install Kubernetes (kubeadm)
7.1 🌐 [ALL NODES] Kernel Modules & Sysctl
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
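To confirm the settings took effect on a node, the three values can be read back from /proc/sys. This is a sketch; the function name and the optional root argument (useful for trying it against a scratch directory) are illustrative.

```shell
# Check that the Kubernetes-related kernel parameters are set to 1.
# $1 = root of the sysctl tree (defaults to /proc/sys)
check_k8s_sysctls() {
  local root="${1:-/proc/sys}" rc=0 key value
  for key in net/bridge/bridge-nf-call-iptables \
             net/bridge/bridge-nf-call-ip6tables \
             net/ipv4/ip_forward; do
    value=$(cat "$root/$key" 2>/dev/null)
    if [ "$value" = "1" ]; then
      echo "ok   $key"
    else
      echo "FAIL $key (got '${value:-missing}')"
      rc=1
    fi
  done
  return $rc
}

# Usage on each node:
#   check_k8s_sysctls
```

A "missing" result for the net/bridge keys usually means br_netfilter is not loaded.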
7.2 👷🎛️🎮 [controlplane, worker1, worker2, gpu-worker] Disable Swap
Do NOT run on the host. The host is not a k8s node.
sudo swapoff -a
sudo sed -i '/swap/d' /etc/fstab
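Because kubelet by default refuses to start with swap enabled, it is worth verifying both the running state and the fstab entry. The sketch below assumes the standard /proc/swaps and /etc/fstab layouts; the function names and optional path arguments are made up for illustration.

```shell
# Succeed when no swap device is active.
# $1 = swaps file (defaults to /proc/swaps); the first line is a header
swap_is_off() {
  awk 'NR > 1 { active = 1 } END { exit active }' "${1:-/proc/swaps}"
}

# Succeed when fstab has no uncommented entry whose filesystem type is "swap".
# $1 = fstab path (defaults to /etc/fstab)
fstab_has_no_swap() {
  awk '!/^[[:space:]]*#/ && $3 == "swap" { found = 1 } END { exit found }' "${1:-/etc/fstab}"
}

# Usage on each k8s node:
#   swap_is_off && fstab_has_no_swap && echo "swap disabled now and after reboot"
```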