Debugging

Debugging a Kubernetes cluster can be challenging. This guide provides several approaches to help you diagnose and resolve issues with your Kube-Hetzner cluster.

Using the Hetzner CLI

It is often useful to inspect your Hetzner resources quickly without logging in to the web UI. That is where the hcloud CLI comes in.

Setup

Create a context with hcloud context create Kube-hetzner. The command prompts for your Hetzner API token; paste it and press Enter.

Useful Commands

Check the nodes: hcloud server list
Check the network: hcloud network describe k3s
Check the load balancer: hcloud loadbalancer describe k3s-traefik
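When a cluster has many nodes, it can help to filter the server list down to machines that are not running. The following is a minimal sketch, assuming the hcloud CLI is installed and a context is active; the function name servers_not_running is hypothetical and not part of Kube-Hetzner.

```shell
# List Hetzner servers whose status is anything other than "running".
# Assumption: the hcloud CLI is installed and a context is active.
# servers_not_running is a hypothetical helper name.
servers_not_running() {
  # -o noheader drops the header row; -o columns limits output to name and status.
  hcloud server list -o noheader -o columns=name,status |
    awk '$2 != "running" { print $1 " (" $2 ")" }'
}
```

Calling servers_not_running prints one line per affected server and nothing when every server is running.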

SSH Access to Nodes

To log in to your cluster via SSH, use:

ssh root@<control-plane-ip> -i /path/to/private_key -o StrictHostKeyChecking=no

Checking Node Status

For control-plane nodes, use:

journalctl -u k3s

For agent nodes, use:

journalctl -u k3s-agent
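The full journal can be long. As a sketch, the commands above can be narrowed to recent error-level entries with standard journalctl options; the helper name k3s_recent_errors and its defaults are assumptions made for illustration.

```shell
# Show recent log entries at priority "err" or worse for a k3s unit.
# $1: systemd unit (k3s on control-plane nodes, k3s-agent on agent nodes)
# $2: time window accepted by journalctl --since (default: "1 hour ago")
# k3s_recent_errors is a hypothetical helper name.
k3s_recent_errors() {
  journalctl -u "${1:-k3s}" --since "${2:-1 hour ago}" -p err --no-pager
}
```

For example, k3s_recent_errors k3s-agent "30 min ago" limits the output to the agent's errors from the last half hour.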

Inspecting Configuration

Check the k3s configuration:

cat /etc/rancher/k3s/config.yaml

Checking Reboot Times

See when the node last rebooted and how long it has been up:

last reboot
uptime

Kubernetes Debugging

Check Cluster Status

kubectl cluster-info
kubectl get nodes
kubectl get pods -A
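To spot problem nodes at a glance, the node list can be filtered for anything whose status does not start with Ready. This is a small sketch, assuming kubectl is already configured for the cluster; nodes_not_ready is a hypothetical helper name.

```shell
# Print nodes whose STATUS column does not start with "Ready".
# Nodes shown as "Ready,SchedulingDisabled" (cordoned) are treated as ready.
# nodes_not_ready is a hypothetical helper name.
nodes_not_ready() {
  kubectl get nodes --no-headers | awk '$2 !~ /^Ready/ { print $1 " " $2 }'
}
```

An empty result means every node reports Ready.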

Check System Components

kubectl get pods -n kube-system
kubectl logs -n kube-system -l k8s-app=kube-dns

Check CNI Status

For Cilium:

kubectl -n kube-system exec --stdin --tty cilium-xxxx -- cilium status
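The cilium-xxxx placeholder above has to be replaced with a real pod name. One way to avoid looking it up by hand is sketched below, assuming the Cilium agent pods carry the label k8s-app=cilium (the label used by the standard Cilium DaemonSet); cilium_status is a hypothetical helper name.

```shell
# Run "cilium status" in the first Cilium agent pod found in kube-system.
# Assumption: agent pods are labelled k8s-app=cilium.
# cilium_status is a hypothetical helper name.
cilium_status() {
  local pod
  # -o name returns entries of the form pod/<name>, which kubectl exec accepts.
  pod=$(kubectl -n kube-system get pods -l k8s-app=cilium -o name | head -n 1)
  if [ -z "$pod" ]; then
    echo "no Cilium agent pod found in kube-system" >&2
    return 1
  fi
  kubectl -n kube-system exec "$pod" -- cilium status
}
```

Because each agent reports on its own node, repeat the check against the pod on the node you are investigating if the first pod looks healthy.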

Common Issues and Solutions

Nodes Not Ready

Check if nodes are accessible via SSH
Check k3s service status: systemctl status k3s or systemctl status k3s-agent
Check logs: journalctl -u k3s or journalctl -u k3s-agent
Check network connectivity between nodes

Pods Stuck in Pending

Check resource availability: kubectl describe nodes
Check pod events: kubectl describe pod <pod-name>
Check for taints/tolerations mismatches
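The three checks above can be gathered into one report. The sketch below uses only standard kubectl field selectors and custom columns; pending_pod_report is a hypothetical helper name.

```shell
# Collect the information usually needed to explain Pending pods.
# pending_pod_report is a hypothetical helper name.
pending_pod_report() {
  echo "== Pending pods =="
  kubectl get pods -A --field-selector=status.phase=Pending
  echo "== Scheduling failures =="
  kubectl get events -A --field-selector=reason=FailedScheduling
  echo "== Node taints =="
  kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}
```

The FailedScheduling events typically name the reason a pod could not be placed, such as insufficient CPU or an untolerated taint, which can then be compared against the taint list.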

Ingress Not Working

Check load balancer status in Hetzner Cloud Console
Check ingress controller pods: kubectl get pods -n traefik (or nginx/contour)
Check ingress resources: kubectl get ingress
Check service endpoints: kubectl get endpoints <service-name>
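An ingress often fails because its backing service has no ready endpoints. The following sketch makes the endpoints check scriptable; service_has_endpoints is a hypothetical helper name, and the default namespace is an assumption.

```shell
# Report whether a service has at least one ready endpoint address.
# $1: service name, $2: namespace (assumed default: "default")
# service_has_endpoints is a hypothetical helper name.
service_has_endpoints() {
  local ips
  ips=$(kubectl get endpoints "$1" -n "${2:-default}" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')
  if [ -n "$ips" ]; then
    echo "$1: $ips"
  else
    echo "$1: no ready endpoints" >&2
    return 1
  fi
}
```

A non-zero exit status suggests looking at the pods behind the service (selector mismatch or failing readiness probes) before looking at the ingress controller.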

Storage Issues

For Longhorn:

kubectl -n longhorn-system get volumes
kubectl -n longhorn-system get nodes

For Hetzner CSI:

kubectl get csinodes
kubectl get csidrivers
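Independently of the storage backend, unbound PersistentVolumeClaims are a common symptom of storage problems. A minimal sketch for listing them, with pvcs_not_bound as a hypothetical helper name:

```shell
# List PersistentVolumeClaims in all namespaces that are not in the Bound state.
# With -A, the columns are NAMESPACE, NAME, STATUS, ...
# pvcs_not_bound is a hypothetical helper name.
pvcs_not_bound() {
  kubectl get pvc -A --no-headers | awk '$3 != "Bound" { print $1 "/" $2 " " $3 }'
}
```

For any claim listed, kubectl describe pvc <name> -n <namespace> shows the provisioning events.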

Advanced Debugging

Enable Debug Logging

Add to your kube.tf:

k3s_exec_server_args = "--debug"
k3s_exec_agent_args = "--debug"

Check System Resources

df -h  # Disk usage
free -m  # Memory usage
top  # CPU usage
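Full disks are easy to miss in plain df output. As an illustration, the sketch below prints only the filesystems at or above a usage threshold; the helper name disk_usage_check and the default threshold of 85 percent are arbitrary assumptions.

```shell
# Print mount points whose usage is at or above a threshold (percent).
# $1: threshold, assumed default 85
# disk_usage_check is a hypothetical helper name.
disk_usage_check() {
  local limit="${1:-85}"
  # df -P gives the stable POSIX layout: field 5 is capacity, field 6 the mount point.
  df -P | awk -v limit="$limit" 'NR > 1 {
    use = $5
    sub(/%/, "", use)
    if (use + 0 >= limit + 0) print $6 " " $5
  }'
}
```

Run it on the node itself, for example disk_usage_check 90. Mount points containing spaces are not handled by this sketch.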

Network Diagnostics

ip a  # Network interfaces
ip r  # Routing table
ping <other-node-ip>  # Connectivity test
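To test connectivity to several nodes in one pass, the ping check can be wrapped in a loop. This is a sketch only; check_node_connectivity is a hypothetical helper name, and the IP addresses are supplied by you.

```shell
# Ping each given IP once and report whether it answered.
# Returns non-zero if any address was unreachable.
# check_node_connectivity is a hypothetical helper name.
check_node_connectivity() {
  local ip rc=0
  for ip in "$@"; do
    # -c 1: send a single probe; -W 2: wait at most 2 seconds for a reply.
    if ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
      echo "$ip reachable"
    else
      echo "$ip UNREACHABLE"
      rc=1
    fi
  done
  return "$rc"
}
```

Run it from one node with the private IPs of the others as arguments. An unanswered ping can also mean that ICMP is blocked by a firewall, so treat the result as a hint rather than proof.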

SELinux Issues

Check for SELinux denials:

ausearch -m avc -ts recent

Rather than weakening SELinux modules for all workloads on your cluster, it's better to create a profile and apply it to a specific workload using the udica tool (pre-installed on MicroOS nodes).