Complete Kubernetes Cluster Setup Guide with Cilium on Arch Linux
This comprehensive guide walks you through deploying a production-ready Kubernetes cluster on Arch Linux using Cilium as the Container Network Interface (CNI), ingress controller, and load balancer. We’ll also set up automated NFS storage provisioning and prepare the cluster for running GitLab with CI/CD pipelines.
Architecture Overview
Our cluster deployment includes:
- 3 Kubernetes nodes: 1 control plane + 2 worker nodes
- 1 NFS server: For persistent storage with automatic provisioning
- Cilium networking: Replaces kube-proxy with advanced networking features
- Integrated load balancing: L2 announcements for external IP management
- TLS ingress capabilities: SSL/TLS termination with certificate management
- Storage automation: Dynamic persistent volume provisioning
Key Features:
- No kube-proxy dependency (Cilium handles everything)
- Advanced networking with eBPF
- Built-in ingress controller and load balancer
- Automatic NFS storage provisioning
- Production-ready security configurations
Prerequisites
Before starting, ensure you have:
- Basic knowledge of Kubernetes concepts
- Understanding of Linux networking and storage
- 4 virtual or physical machines ready for Arch Linux installation
- Network access between all machines
- Administrative privileges on all systems
Resource Requirements:
- Kubernetes nodes: 8GB RAM, 4 CPU cores, 20GB storage each
- NFS server: 1GB RAM, 1 CPU core, 200GB storage
- Network: Private subnet with sufficient IP addresses
Virtual Environment Configuration
We’ll start by setting up the infrastructure for our cluster deployment.
Network Setup
Create a virtual network with the IP range 22.22.22.0/24. This provides ample address space for our cluster and future expansion.
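The guide doesn't mandate a particular hypervisor. As a minimal sketch, assuming libvirt/KVM, the network could be defined from the shell like this (the network name, bridge name, and DHCP range are illustrative choices, not part of the original setup):
# write the libvirt network definition, then define, start, and autostart it
cat > ak8s-net.xml <<'EOF'
<network>
  <name>ak8s-net</name>
  <forward mode='nat'/>
  <bridge name='virbr-ak8s' stp='on' delay='0'/>
  <ip address='22.22.22.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='22.22.22.100' end='22.22.22.199'/>
    </dhcp>
  </ip>
</network>
EOF
sudo virsh net-define ak8s-net.xml
sudo virsh net-start ak8s-net
sudo virsh net-autostart ak8s-net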
Arch Linux Template Preparation
We’ll create a base Arch Linux template with pre-configured SSH keys that can be cloned for all our machines. This approach ensures consistency and speeds up deployment.
Installation Process:
- Initial Setup Commands:
root@archiso ~ # pacman -Sy --noconfirm --needed git wget
root@archiso ~ # git clone https://github.com/hibrit/ais
root@archiso ~ # cd ais
root@archiso ~/ais (git)-[main] # ./1_update_mirrors.sh
- Disk Partitioning:
root@archiso ~/ais (git)-[main] # cfdisk /dev/vda
# Create a single partition covering the entire disk
root@archiso ~/ais (git)-[main] # mkfs.ext4 /dev/vda1
root@archiso ~/ais (git)-[main] # mount /dev/vda1 /mnt
- System Installation:
root@archiso ~/ais (git)-[main] # ./2_install_base_system.sh
[root@archiso /]# cd ais
[root@archiso ais]# vim ./3_install_system.sh
# Edit the script for minimal installation and correct GRUB configuration
- Post-Installation Configuration:
After reboot, configure networking using nmtui and enable SSH access:
[flouda@ak8s ~]$ cd Documents/ais
[flouda@ak8s ais]$ ./5_install_yay.sh
[flouda@ak8s ais]$ vim 7_terminal_setup.sh
# Edit for minimal server installation (remove GUI components)
[flouda@ak8s ais]$ ./7_terminal_setup.sh
[flouda@ak8s ais]$ systemctl enable --now sshd.service
- SSH Key Distribution:
ssh-copy-id -i ~/.ssh/id_rsa flouda@template.ak8s.net
Once the template is ready, create snapshots and clone 4 instances for your cluster.
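If the machines are libvirt/KVM guests (again an assumption, since the guide doesn't name a hypervisor), cloning could look like the sketch below; the template and clone names are illustrative, and the template VM must be powered off first:
# clone the template into the three Kubernetes nodes and the NFS server
sudo virt-clone --original ak8s-template --name ak8s1 --auto-clone
sudo virt-clone --original ak8s-template --name ak8s2 --auto-clone
sudo virt-clone --original ak8s-template --name ak8s3 --auto-clone
sudo virt-clone --original ak8s-template --name nfs --auto-clone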
NFS Server Configuration
The NFS server provides centralized storage for persistent volumes with automatic provisioning capabilities.
Storage Setup
- Add Storage Disk: Attach a 200GB disk to your NFS server and partition it:
sudo cfdisk /dev/vdb
# Create a single partition
sudo mkfs.ext4 /dev/vdb1
- Persistent Mount Configuration: Get the partition UUID and configure automatic mounting:
lsblk -f
# Note the UUID of /dev/vdb1
sudo vim /etc/fstab
Add this line to /etc/fstab:
UUID=your-uuid-here /data ext4 defaults 0 2
- Mount Point Setup:
sudo mkdir /data
sudo mount -a
sudo chown nobody:nobody -R /data
NFS Service Configuration
- Install NFS Utilities:
sudo pacman -Sy --noconfirm --needed nfs-utils
- Configure Exports:
Edit /etc/exports to share the storage:
sudo vim /etc/exports
Add the following configuration:
/data 22.22.22.0/24(rw,sync,no_subtree_check,no_root_squash)
- Start NFS Services:
sudo systemctl enable --now nfs-server.service
sudo exportfs -av
Security Note: The no_root_squash option is used for Kubernetes compatibility but should be carefully considered in production environments.
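For comparison, a share that does not back Kubernetes volumes could keep root squashing enabled; the path below is purely illustrative:
# root is mapped to nobody (UID/GID 65534), so clients cannot write as root
/backup 22.22.22.0/24(rw,sync,no_subtree_check,root_squash,anonuid=65534,anongid=65534)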
Kubernetes Node Preparation
Configure all three Kubernetes nodes with the required system modifications and package installations.
System Configuration
Apply these configurations to all Kubernetes nodes:
- Kernel Module Loading:
sudo modprobe overlay
sudo modprobe br_netfilter
- Persistent Module Configuration:
Create /etc/modules-load.d/k8s.conf:
overlay
br_netfilter
- Network Parameter Configuration:
Create /etc/sysctl.d/k8s.conf:
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
Apply the settings:
sudo sysctl --system
- Verification:
lsmod | grep -E "(overlay|br_netfilter)"
sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward
Package Installation
Control Plane Node:
sudo pacman -Sy containerd kubelet kubeadm kubectl cni-plugins helm cilium-cli
Worker Nodes:
sudo pacman -Sy containerd kubelet kubeadm kubectl cni-plugins
Container Runtime Configuration
Configure containerd on all nodes. For detailed information about containerd alternatives and rootless configurations, see our comprehensive containerd setup guide.
- Generate Default Configuration:
sudo mkdir /etc/containerd
sudo bash -c "containerd config default > /etc/containerd/config.toml"
- Critical Configuration Changes:
Edit /etc/containerd/config.toml and modify:
- Set SystemdCgroup = true (for proper cgroup management)
- Set sandbox_image = "registry.k8s.io/pause:3.9" (use the current pause image)
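If you'd rather script these two edits than open the file in an editor, something along these lines should work against the freshly generated default config; verify the resulting values before continuing:
# switch runc to the systemd cgroup driver and pin the pause image
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo sed -i 's#sandbox_image = ".*"#sandbox_image = "registry.k8s.io/pause:3.9"#' /etc/containerd/config.toml
grep -E 'SystemdCgroup|sandbox_image' /etc/containerd/config.toml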
- Service Management:
sudo systemctl enable --now containerd
sudo systemctl enable --now kubelet.service
- Pre-pull Images:
sudo kubeadm config images pull
Cluster Initialization
Control Plane Setup
Initialize the Kubernetes cluster on your control plane node:
sudo kubeadm init --pod-network-cidr='10.85.0.0/16' --skip-phases=addon/kube-proxy
Important Parameters:
- --pod-network-cidr: Defines the IP range for pod networking
- --skip-phases=addon/kube-proxy: We’ll use Cilium instead of kube-proxy
After initialization, kubeadm will provide two critical commands:
- Configure kubectl access (run on control plane):
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
- Join worker nodes (run on each worker):
kubeadm join <control-plane-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
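The bootstrap token embedded in that command expires after 24 hours by default. If it has expired or you lose the output, generate a fresh join command on the control plane:
sudo kubeadm token create --print-join-command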
Node Verification and Labeling
- Check Node Status:
kubectl get nodes -o wide
Nodes will show “NotReady” status until networking is configured.
- Label Worker Nodes:
kubectl label node ak8s2 node-role.kubernetes.io/worker=worker
kubectl label node ak8s3 node-role.kubernetes.io/worker=worker
Cilium Installation and Configuration
Cilium provides advanced networking capabilities, replacing kube-proxy with eBPF-based networking.
Install Cilium
Deploy Cilium with comprehensive features enabled:
cilium install \
--set kubeProxyReplacement=true \
--set ingressController.enabled=true \
--set ingressController.loadbalancerMode=dedicated \
--set ingressController.default=true \
--set l2announcements.enabled=true \
--set k8sClientRateLimit.qps=30 \
--set k8sClientRateLimit.burst=50 \
--set externalIPs.enabled=true \
--set devices=enp1s0
Configuration Explanation:
- kubeProxyReplacement=true: Replace kube-proxy with Cilium
- ingressController.enabled=true: Enable the built-in ingress controller
- l2announcements.enabled=true: Enable Layer 2 IP announcements
- externalIPs.enabled=true: Support for external IP addresses
- devices=enp1s0: Network interface used for L2 announcements
Monitor Deployment
watch kubectl -n kube-system get pod
Wait for all Cilium pods to reach “Running” status.
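You can also let the Cilium CLI wait for a healthy state, and optionally run its connectivity test (it spins up test workloads in a cilium-test namespace and takes a few minutes):
cilium status --wait
cilium connectivity test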
Load Balancer IP Pool Configuration
Create an IP pool for load balancer services:
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
name: "ak8s-pool"
namespace: kube-system
spec:
cidrs:
- cidr: "22.22.22.0/24"
L2 Announcement Policy
Configure which nodes announce external IPs:
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
name: ak8s-policy
namespace: kube-system
spec:
nodeSelector:
matchExpressions:
- key: node-role.kubernetes.io/control-plane
operator: DoesNotExist
interfaces:
- enp1s0
externalIPs: true
loadBalancerIPs: true
Apply both configurations:
kubectl apply -f cilium-ip-pool.yaml
kubectl apply -f cilium-l2-policy.yaml
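To confirm both objects were accepted and addresses are being handed out, list the custom resources and watch your LoadBalancer services:
kubectl get ciliumloadbalancerippools.cilium.io
kubectl get ciliuml2announcementpolicies.cilium.io
kubectl get svc -A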
Cluster Validation
Test your cluster functionality with a complete ingress setup.
TLS Certificate Generation
Create a self-signed certificate for testing:
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout cert.key -out cert.crt -subj "/CN=test.ak8s.net"
kubectl create secret tls web-tls --cert=./cert.crt --key=./cert.key
Test Application Deployment
- Deploy Test Pod:
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: web
  name: web
spec:
  containers:
  - image: nginx:latest
    name: web
    ports:
    - containerPort: 80
  restartPolicy: Always
- Create Service:
apiVersion: v1
kind: Service
metadata:
  labels:
    run: web
  name: web
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    run: web
  type: ClusterIP
- Configure Ingress:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  tls:
  - hosts:
    - test.ak8s.net
    secretName: web-tls
  rules:
  - host: test.ak8s.net
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web
            port:
              number: 80
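Save the three manifests and apply them; the filenames here are illustrative:
kubectl apply -f web-pod.yaml
kubectl apply -f web-service.yaml
kubectl apply -f web-ingress.yaml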
Verification Steps
- Check Resources:
kubectl get pod,svc,ingress
- Note External IP:
Check the EXTERNAL-IP assigned to your ingress and add it to your /etc/hosts file:
<external-ip> test.ak8s.net
- Test Access:
Navigate to https://test.ak8s.net in your browser. You should see the default Nginx page with a self-signed certificate warning.
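If you prefer to verify from the command line without touching /etc/hosts, curl's --resolve flag can pin the hostname to the external IP (replace the placeholder with the address you noted):
curl -vk --resolve test.ak8s.net:443:<external-ip> https://test.ak8s.net/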
NFS Storage Provisioning
Implement automatic persistent volume provisioning using NFS.
Install NFS Provisioner
Use Helm to deploy the NFS subdir external provisioner:
helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
--namespace kube-system \
--set nfs.server=nfs.ak8s.net \
--set nfs.path=/data \
--set storageClass.defaultClass=true \
--set storageClass.onDelete=delete \
--set storageClass.accessModes=ReadWriteMany
Configuration Parameters:
- nfs.server: FQDN or IP of your NFS server
- nfs.path: Export path on the NFS server
- storageClass.defaultClass=true: Makes this the default storage class
- storageClass.onDelete=delete: Removes data when PVCs are deleted
- storageClass.accessModes=ReadWriteMany: Allows multiple pods to mount the same volume
Verify Storage Class
kubectl get storageclass
You should see nfs-client marked as the default storage class.
Test Persistent Volume Claims
Create a test PVC to verify automatic provisioning:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
After applying this configuration:
kubectl apply -f test-pvc.yaml
kubectl get pv,pvc
You should see a persistent volume automatically created and bound to your claim.
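To confirm the volume is actually writable over NFS, you can mount the claim from a throwaway pod (the pod name and image are illustrative) and then look for the file under /data on the NFS server:
apiVersion: v1
kind: Pod
metadata:
  name: pvc-test
spec:
  containers:
  - name: writer
    image: busybox:latest
    command: ["sh", "-c", "echo hello > /mnt/data/hello.txt && sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /mnt/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test-pvc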
Advanced Configuration and Optimization
Performance Tuning
Cilium Optimization:
- Adjust k8sClientRateLimit values based on cluster size
- Configure CPU and memory limits for Cilium pods
- Enable Cilium Hubble for network observability
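One way to apply such tuning, assuming the cilium-cli install above created a Helm release named cilium in kube-system (recent cilium-cli versions in Helm mode do), is a helm upgrade with --reuse-values; the rate-limit and resource figures are placeholders to size for your cluster:
helm repo add cilium https://helm.cilium.io/
helm repo update
helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values \
  --set k8sClientRateLimit.qps=50 \
  --set k8sClientRateLimit.burst=100 \
  --set resources.requests.cpu=200m \
  --set resources.requests.memory=512Mi \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true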
Storage Optimization:
- Configure NFS mount options for better performance
- Consider using faster storage backend for NFS server
- Implement backup strategies for persistent data
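As a sketch of mount-option tuning, a StorageClass can carry an explicit mountOptions list; the provisioner name below assumes the chart's default, and the options are examples to benchmark rather than recommendations:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client-tuned
provisioner: cluster.local/nfs-subdir-external-provisioner
parameters:
  onDelete: delete
mountOptions:
- nfsvers=4.1
- noatime
- nconnect=8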
Security Hardening
Network Policies: Implement Cilium network policies to restrict pod-to-pod communication:
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
name: "default-deny"
spec:
endpointSelector: {}
egress:
- toEndpoints:
- matchLabels:
k8s:io.kubernetes.pod.namespace: kube-system
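Policies can also scope ingress traffic. As a sketch reusing the run: web label from the validation section, the test pod could be restricted so that only Cilium's ingress proxy reaches it on port 80:
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "web-from-ingress-only"
spec:
  endpointSelector:
    matchLabels:
      run: web
  ingress:
  - fromEntities:
    - ingress
    toPorts:
    - ports:
      - port: "80"
        protocol: TCP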
RBAC Configuration: Implement proper Role-Based Access Control for cluster security.
Monitoring and Logging
Cilium Hubble: Enable network observability:
cilium hubble enable --ui
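Flows can also be inspected from a terminal. This assumes the separate hubble CLI is installed on your workstation; cilium hubble port-forward exposes the Hubble Relay API locally:
cilium hubble port-forward &
hubble observe --namespace default --follow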
Metrics Collection: Configure Prometheus and Grafana for cluster monitoring (see our Prometheus and Grafana deployment guide).
Troubleshooting Common Issues
Network Connectivity Problems
Symptom: Pods cannot communicate across nodes.
Solution:
- Verify Cilium pods are running:
kubectl -n kube-system get pods -l k8s-app=cilium
- Check Cilium status:
cilium status
- Verify L2 announcements:
cilium l2 announce list
Symptom: External IPs not accessible.
Solution:
- Check CiliumLoadBalancerIPPool configuration
- Verify L2AnnouncementPolicy targets correct nodes
- Ensure network interface is correctly specified
Storage Issues
Symptom: PVCs stuck in Pending state.
Solution:
- Check NFS server connectivity:
showmount -e nfs.ak8s.net
- Verify NFS provisioner pods:
kubectl -n kube-system get pods -l app=nfs-subdir-external-provisioner
- Check storage class configuration:
kubectl describe storageclass nfs-client
Symptom: NFS mount failures.
Solution:
- Verify NFS exports:
sudo exportfs -av
- Check firewall rules on NFS server
- Test manual mount:
sudo mount -t nfs nfs.ak8s.net:/data /mnt/test
Certificate and TLS Issues
Symptom: Ingress TLS not working.
Solution:
- Verify certificate secret exists:
kubectl get secrets
- Check ingress controller logs:
kubectl -n kube-system logs -l k8s-app=cilium
- Validate certificate:
openssl x509 -in cert.crt -text -noout
Next Steps and Integration
With your cluster fully operational, you can proceed with:
- GitLab Deployment: Follow our GitLab deployment guide to set up version control and CI/CD
- Container Registry: Deploy a private Docker registry for custom images
- Monitoring Stack: Implement comprehensive monitoring and alerting
- Backup Solutions: Set up automated backup strategies for persistent data
- High Availability: Expand to multiple control plane nodes for production resilience
Questions Answered in This Document
Q: How do I set up a Kubernetes cluster without kube-proxy?
A: Use Cilium as a complete replacement for kube-proxy by installing it with kubeProxyReplacement=true. Cilium provides all networking functionality using eBPF technology, offering better performance and advanced features.
Q: What are the hardware requirements for a Cilium-based Kubernetes cluster?
A: For a basic setup, use 8GB RAM and 4 CPU cores per Kubernetes node, plus 1GB RAM and 1 CPU core for the NFS server. The NFS server needs substantial storage (200GB+) for persistent volumes.
Q: How does Cilium handle load balancing and ingress traffic?
A: Cilium includes built-in ingress controller and load balancer capabilities using L2 announcements. Configure CiliumLoadBalancerIPPool for IP allocation and CiliumL2AnnouncementPolicy to control which nodes announce external IPs.
Q: Can I use NFS for Kubernetes persistent storage in production?
A: Yes, with proper configuration. Use the NFS subdir external provisioner for automatic volume provisioning, ensure NFS server redundancy, implement proper backup strategies, and consider performance implications for high-throughput applications.
Q: How do I troubleshoot Cilium networking issues?
A: Use cilium status to check overall health, cilium connectivity test for network validation, and kubectl -n kube-system logs -l k8s-app=cilium for detailed logs. Enable Hubble for network observability and troubleshooting.
Q: What makes Cilium better than traditional Kubernetes networking?
A: Cilium uses eBPF technology for faster packet processing, provides built-in ingress and load balancing, offers advanced security policies, includes network observability tools, and eliminates the need for kube-proxy.
Q: How do I secure my Kubernetes cluster with Cilium?
A: Implement Cilium network policies for micro-segmentation, use proper RBAC configurations, enable TLS for all communications, secure the NFS server with appropriate export permissions, and regularly update all cluster components.
Q: Can this setup handle production workloads?
A: This setup provides a solid foundation for production use. For full production readiness, add multiple control plane nodes for high availability, implement comprehensive monitoring and alerting, set up automated backups, and establish proper disaster recovery procedures.