Debugging
Debugging a Kubernetes cluster can be challenging. This guide provides several approaches to help you diagnose and resolve issues with your Kube-Hetzner cluster.
Using the Hetzner CLI
It is often useful to inspect your Hetzner resources quickly without logging in to the web UI. That is where the hcloud CLI comes in.
Setup
Activate it with hcloud context create Kube-hetzner; it will prompt for your Hetzner API token. Paste the token and press Enter.
Useful Commands
- Check the nodes:
hcloud server list
- Check the network:
hcloud network describe k3s
- Check the load balancer:
hcloud loadbalancer describe k3s-traefik
SSH Access to Nodes
To log in to your cluster via SSH, use:
ssh root@<control-plane-ip> -i /path/to/private_key -o StrictHostKeyChecking=no
Checking Node Status
For control-plane nodes, use:
journalctl -u k3s
For agent nodes, use:
journalctl -u k3s-agent
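The full journal can be long, so it may help to narrow it to a recent time window and to lines that look like problems. The following is a minimal sketch; the function name k3s_recent_problems and the grep pattern are illustrative assumptions, not part of Kube-Hetzner, and the pattern may need adjusting to the messages you are looking for.

```shell
# Show recent log lines that look like problems for a k3s systemd unit.
# Usage: k3s_recent_problems [unit] [since]
# The function name and the grep pattern are assumptions for illustration.
k3s_recent_problems() {
  unit="${1:-k3s}"          # pass k3s-agent on agent nodes
  since="${2:-1 hour ago}"  # any time expression journalctl accepts
  journalctl -u "$unit" --since "$since" --no-pager \
    | grep -i -E 'error|fatal|fail'
}
```

For example, k3s_recent_problems k3s-agent "30 min ago" would restrict the search to the agent unit and the last half hour.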
Inspecting Configuration
Check the k3s configuration:
cat /etc/rancher/k3s/config.yaml
Checking Reboot Times
See when the previous reboot took place:
last reboot
uptime
Kubernetes Debugging
Check Cluster Status
kubectl cluster-info
kubectl get nodes
kubectl get pods -A
Check System Components
kubectl get pods -n kube-system
kubectl logs -n kube-system -l k8s-app=kube-dns
Check CNI Status
For Cilium:
kubectl -n kube-system exec --stdin --tty cilium-xxxx -- cilium status
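In the command above, cilium-xxxx stands for the name of an actual Cilium pod. As a sketch, the name can be looked up by label instead of copied by hand. This assumes the Cilium agent pods carry the label k8s-app=cilium; the function name cilium_status is hypothetical.

```shell
# Run `cilium status` in the first Cilium agent pod found.
# Assumption: agent pods are labelled k8s-app=cilium in kube-system.
cilium_status() {
  pod="$(kubectl -n kube-system get pods -l k8s-app=cilium \
    -o jsonpath='{.items[0].metadata.name}')" || return 1
  if [ -z "$pod" ]; then
    echo "no Cilium pod found" >&2
    return 1
  fi
  kubectl -n kube-system exec "$pod" -- cilium status
}
```

This checks only one agent pod. To inspect the agent on a particular node, pick the pod scheduled on that node instead.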
Common Issues and Solutions
Nodes Not Ready
- Check if nodes are accessible via SSH
- Check k3s service status:
systemctl status k3s
or systemctl status k3s-agent
- Check logs:
journalctl -u k3s
or journalctl -u k3s-agent
- Check network connectivity between nodes
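Before working through the checklist above, it can help to list exactly which nodes are affected. A minimal sketch, assuming the default column layout of kubectl get nodes (name first, status second); the function name not_ready_nodes is hypothetical.

```shell
# Print the names of nodes whose STATUS column does not start with "Ready".
# Nodes shown as "Ready,SchedulingDisabled" (cordoned) are treated as ready.
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 !~ /^Ready/ { print $1 }'
}
```

The resulting names can then be matched against hcloud server list to find the IP addresses to SSH into.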
Pods Stuck in Pending
- Check resource availability:
kubectl describe nodes
- Check pod events:
kubectl describe pod <pod-name>
- Check for taints/tolerations mismatches
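The checks above can be narrowed with a field selector and a custom column view. This is a sketch only; the function names pending_pods and node_taints are illustrative assumptions.

```shell
# List only the pods that are currently in the Pending phase, in all namespaces.
pending_pods() {
  kubectl get pods -A --field-selector=status.phase=Pending
}

# Show each node with the keys of its taints, to compare against pod tolerations.
node_taints() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}
```

A pending pod whose events mention untolerated taints can then be compared against the node_taints output.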
Ingress Not Working
- Check load balancer status in Hetzner Cloud Console
- Check ingress controller pods:
kubectl get pods -n traefik
(or nginx/contour)
- Check ingress resources:
kubectl get ingress
- Check service endpoints:
kubectl get endpoints <service-name>
Storage Issues
For Longhorn:
kubectl -n longhorn-system get volumes
kubectl -n longhorn-system get nodes
For Hetzner CSI:
kubectl get csinodes
kubectl get csidrivers
Advanced Debugging
Enable Debug Logging
Add to your kube.tf:
k3s_exec_server_args = "--debug"
k3s_exec_agent_args = "--debug"
Check System Resources
df -h # Disk usage
free -m # Memory usage
top # CPU usage
Network Diagnostics
ip a # Network interfaces
ip r # Routing table
ping <other-node-ip> # Connectivity test
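When there are several nodes, the connectivity test can be repeated in a loop. A minimal sketch, assuming the ping on the node accepts -c (count) and -W (timeout in seconds), as the common Linux iputils version does; the function name check_nodes is hypothetical.

```shell
# Ping each given IP once and report whether it answered.
# Returns non-zero if at least one address was unreachable.
check_nodes() {
  failed=0
  for ip in "$@"; do
    if ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
      echo "$ip reachable"
    else
      echo "$ip UNREACHABLE"
      failed=1
    fi
  done
  return "$failed"
}
```

Run it from a node with the private IPs of the other nodes as arguments, for example check_nodes <other-node-ip> <another-node-ip>.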
SELinux Issues
Check for SELinux denials:
ausearch -m avc -ts recent
Rather than weakening SELinux modules for all workloads on your cluster, it's better to create a profile and apply it to a specific workload using the udica tool (pre-installed on MicroOS nodes).