Install Kubernetes Using RKE2 with Cilium
RKE2 (Rancher Kubernetes Engine 2) is Rancher’s next-generation Kubernetes distribution that focuses on security and compliance for government and enterprise environments. This comprehensive guide walks you through setting up a production-ready Kubernetes cluster using RKE2 with Cilium as the Container Network Interface (CNI) and NFS for persistent storage.
This setup provides a robust, scalable foundation for container orchestration with advanced networking capabilities, load balancing, and reliable storage solutions. The configuration includes security best practices and monitoring capabilities essential for production environments.
Infrastructure Requirements
Server Specifications
Kubernetes Nodes (3 servers):
- Operating System: Ubuntu 22.04 LTS
- CPU: 4 cores minimum
- RAM: 8GB minimum
- Network: Static IP addresses recommended
- Storage: 50GB+ for system and container images
NFS Storage Server (1 server):
- Operating System: Ubuntu 22.04 LTS
- CPU: 1 core minimum
- RAM: 1GB minimum
- Storage: 100GB+ for shared storage
- Network: Same subnet as Kubernetes nodes
Network Configuration
This guide uses the following IP addressing scheme:
- NFS Server: 11.11.11.10 (nfs.k8s.net)
- Kubernetes Master: 11.11.11.11 (k8s1.k8s.net)
- Kubernetes Worker 1: 11.11.11.12 (k8s2.k8s.net)
- Kubernetes Worker 2: 11.11.11.13 (k8s3.k8s.net)
Adjust these addresses to match your network environment.
Pre-Installation Setup
The following steps must be performed on all Kubernetes nodes (master and workers).
System Updates and Preparation
Update your system packages to ensure security patches and compatibility:
apt update && apt upgrade -y
Disable Swap
Kubernetes requires swap to be disabled so the kubelet can account for memory accurately and schedule workloads predictably:
# Disable swap immediately
swapoff -a
# Edit fstab to prevent swap from being enabled on reboot
vim /etc/fstab
# Comment out any swap lines (add # at the beginning)
# Remove swap file if it exists
rm /swap.img
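If you prefer a non-interactive approach, the swap entries can be commented out with sed instead of editing fstab by hand; this is a sketch that assumes a standard Ubuntu fstab, so review the file afterwards:
# Comment out any uncommented fstab line that references swap
sed -i '/swap/ s/^\([^#]\)/#\1/' /etc/fstab
# Verify that no swap remains active (swapon should print nothing)
swapon --show
free -h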
Install NFS Client Support
Since this setup uses NFS for persistent storage, install NFS client utilities on all nodes:
apt install nfs-common -y
Configure Host Resolution
Add hostname entries to /etc/hosts on all servers for reliable internal communication:
vim /etc/hosts
Add these entries:
11.11.11.10 nfs.k8s.net
11.11.11.11 k8s1.k8s.net
11.11.11.12 k8s2.k8s.net
11.11.11.13 k8s3.k8s.net
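Optionally, verify that the names resolve to the expected addresses on each server:
# Each entry should resolve to the IP configured in /etc/hosts
getent hosts nfs.k8s.net k8s1.k8s.net k8s2.k8s.net k8s3.k8s.net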
RKE2 Server (Master) Node Installation
Perform these steps on your designated master node (k8s1.k8s.net).
Install RKE2 Server
Download and install RKE2 using the official installation script:
curl -sfL https://get.rke2.io | sh -
This script installs the rke2-server service and associated binaries. The installation requires root privileges.
Configure RKE2 Server
Create the RKE2 configuration directory:
mkdir -p /etc/rancher/rke2
Create the server configuration file:
vim /etc/rancher/rke2/config.yaml
Add the following configuration:
disable:
  - rke2-ingress-nginx        # We'll use Cilium's ingress controller
  - rke2-canal                # Disable default CNI to use Cilium
disable-kube-proxy: true      # Cilium will replace kube-proxy
cni:
  - cilium
tls-san:
  - api.k8s.net               # Additional TLS subject alternative name
etcd-expose-metrics: true     # Enable etcd metrics for monitoring
kube-controller-manager-arg:
  - bind-address=0.0.0.0      # Allow external access to metrics
kube-scheduler-arg:
  - bind-address=0.0.0.0      # Allow external access to metrics
Configure Cilium CNI
Create the manifests directory for custom configurations:
mkdir -p /var/lib/rancher/rke2/server/manifests
Create the Cilium configuration:
vim /var/lib/rancher/rke2/server/manifests/rke2-cilium-config.yaml
Add the following Cilium configuration:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    kubeProxyReplacement: true    # Replace kube-proxy with Cilium
    k8sServiceHost: 11.11.11.11   # Kubernetes API server IP
    k8sServicePort: 6443          # Kubernetes API server port
    cni:
      chainingMode: "none"        # Direct CNI mode
    l2announcements:
      enabled: true               # Enable L2 load balancer announcements
    ingressController:
      enabled: true               # Enable Cilium ingress controller
      default: true               # Set as default ingress class
    k8sClientRateLimit:
      qps: 30                     # Queries per second limit
      burst: 50                   # Burst limit for API requests
    externalIPs:
      enabled: true               # Enable external IP support
Start RKE2 Server
Enable and start the RKE2 server service:
systemctl enable --now rke2-server.service
Monitor Installation Progress
Follow the service logs to monitor installation progress (running this inside a tmux session keeps the log view alive if your SSH connection drops):
journalctl -u rke2-server -f
Wait for the service to start successfully before proceeding. You should see messages indicating successful cluster initialization.
Post-Installation Verification
After installation, RKE2 provides several important files and utilities:
Installed Utilities (located in /var/lib/rancher/rke2/bin/):
- kubectl - Kubernetes command-line tool
- crictl - Container runtime interface tool
- ctr - Low-level container tool
Important Files:
- Kubeconfig: /etc/rancher/rke2/rke2.yaml
- Node token: /var/lib/rancher/rke2/server/node-token
Cleanup Scripts (for maintenance):
- rke2-killall.sh - Stop all RKE2 processes
- rke2-uninstall.sh - Complete removal script
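To use kubectl directly on the master node, point your shell at the RKE2 binaries and kubeconfig listed above:
# Add the RKE2 client binaries to PATH and use the generated kubeconfig
export PATH=$PATH:/var/lib/rancher/rke2/bin
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
# The master node should report Ready once Cilium is up
kubectl get nodes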
Worker Node Installation
Perform these steps on each worker node (k8s2.k8s.net, k8s3.k8s.net).
Install RKE2 Agent
Install RKE2 in agent mode:
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -
Configure RKE2 Agent
Create the configuration directory:
mkdir -p /etc/rancher/rke2/
Get the node token from the master node:
# On master node, get the token
cat /var/lib/rancher/rke2/server/node-token
Create the agent configuration:
vim /etc/rancher/rke2/config.yaml
Add the following configuration (replace <token> with the actual token from the master):
server: https://k8s1.k8s.net:9345
token: <token from server node>
Important Notes:
- RKE2 server listens on port 9345 for node registration
- Kubernetes API remains on standard port 6443
- Each node must have a unique hostname
Start RKE2 Agent
Enable and start the agent service:
systemctl enable --now rke2-agent.service
Monitor the agent startup by following its logs (again, a tmux session is useful for keeping the view persistent):
journalctl -u rke2-agent -f
Label Worker Nodes (Optional)
From the master node, add worker role labels for better organization:
# Add kubectl to PATH or use full path
export PATH=$PATH:/var/lib/rancher/rke2/bin
# Label worker nodes
kubectl label node k8s2.k8s.net node-role.kubernetes.io/worker=worker
kubectl label node k8s3.k8s.net node-role.kubernetes.io/worker=worker
Configure Cilium Load Balancing
Cilium provides advanced load balancing capabilities. Configure L2 announcements and IP pools for external access.
Create IP Pool Configuration
Create a file for the load balancer IP pool:
vim ippool.yaml
Add the following configuration:
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
name: default-pool
spec:
cidrs:
- cidr: "11.11.11.0/24" # Adjust to your network range
Create L2 Announcement Policy
Create the L2 announcement policy:
vim l2announcementpolicy.yaml
Add the following configuration:
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
name: default-policy
spec:
externalIPs: true
loadBalancerIPs: true
Apply Cilium Configurations
Apply both configurations to your cluster:
kubectl apply -f ippool.yaml
kubectl apply -f l2announcementpolicy.yaml
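To verify that the pool and policy work, you can expose a throwaway deployment through a LoadBalancer service; the deployment name and nginx image below are only illustrative:
# Create a test deployment and expose it as a LoadBalancer service
kubectl create deployment lb-test --image=nginx
kubectl expose deployment lb-test --port=80 --type=LoadBalancer
# EXTERNAL-IP should be assigned from the CiliumLoadBalancerIPPool range
kubectl get svc lb-test
# Clean up once the external IP responds
kubectl delete service,deployment lb-test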
At this point, you have a fully functional RKE2 Kubernetes cluster with Cilium CNI and load balancing capabilities.
NFS Storage Server Setup
Configure the dedicated NFS server for persistent storage provisioning.
Prepare Storage Directory
Assuming you have mounted your storage device to /shared (adjust path as needed):
# Set appropriate ownership for NFS sharing
chown nobody:nogroup /shared -R
Install NFS Server
Update packages and install NFS server:
apt update && apt install nfs-kernel-server -y
Configure NFS Exports
Configure the NFS export for your Kubernetes subnet:
vim /etc/exports
Add the following export configuration:
/shared 11.11.11.0/24(rw,sync,no_subtree_check,no_root_squash)
Export Options Explained:
- rw - Read-write access
- sync - Synchronous writes for data safety
- no_subtree_check - Disable subtree checking for performance
- no_root_squash - Allow root access from clients
Start NFS Server
Enable and start the NFS server service:
systemctl enable --now nfs-kernel-server.service
Verify NFS Export
Check that your export is active:
exportfs -av
You should see output confirming your /shared directory is exported.
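As a quick sanity check before integrating with Kubernetes, you can mount the export manually from one of the cluster nodes (unmount it again afterwards):
# On any Kubernetes node: temporarily mount the export and test write access
mount -t nfs nfs.k8s.net:/shared /mnt
touch /mnt/nfs-test && ls -l /mnt/nfs-test
rm /mnt/nfs-test
umount /mnt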
Configure Kubernetes NFS Storage Integration
Integrate NFS storage with Kubernetes using the NFS Subdir External Provisioner.
Install NFS Provisioner
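Helm is needed for this step but is not installed by RKE2. If it is not already present on the master node, one option (an assumption; adjust to your preferred installation method) is the official Helm 3 installer script:
# Install Helm 3 using the official installer script
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# Helm uses the same kubeconfig as kubectl
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml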
Add the Helm repository and install the NFS provisioner:
# Add the NFS provisioner Helm repository
helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
# Install the provisioner with custom configuration
helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
--namespace kube-system \
--set nfs.server=nfs.k8s.net \
--set nfs.path=/shared \
--set storageClass.defaultClass=true \
--set storageClass.onDelete=delete \
--set storageClass.accessModes=ReadWriteMany
Configuration Parameters:
- nfs.server - NFS server hostname
- nfs.path - Exported directory path
- storageClass.defaultClass=true - Set as default storage class
- storageClass.onDelete=delete - Clean up storage when PVC is deleted
- storageClass.accessModes=ReadWriteMany - Allow multiple pod access
Verify Storage Configuration
Check that the storage class is created and set as default:
kubectl get storageclass
You should see the NFS storage class marked as default.
Cluster Verification and Testing
Verify Node Status
Check that all nodes are ready:
kubectl get nodes -o wide
Verify System Pods
Ensure all system components are running:
kubectl get pods -n kube-system
Test Storage Provisioning
Create a test PVC to verify NFS integration:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
EOF
Verify the PVC is bound:
kubectl get pvc test-pvc
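Optionally, before deleting the claim, mount it from a pod to confirm read/write access works end to end (a minimal sketch; the busybox image and pod name are illustrative):
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: test-nfs-pod
spec:
  containers:
  - name: writer
    image: busybox
    command: ["sh", "-c", "echo hello from nfs > /data/hello.txt && sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test-pvc
EOF
# Wait for the pod to start, then confirm the file landed on the NFS share
kubectl wait --for=condition=Ready pod/test-nfs-pod --timeout=120s
kubectl exec test-nfs-pod -- cat /data/hello.txt
kubectl delete pod test-nfs-pod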
Clean up the test:
kubectl delete pvc test-pvc
Security Considerations
Network Security
- Configure firewall rules to restrict access to cluster ports, for example with UFW (see the sketch after this list)
- Use network policies to control pod-to-pod communication
- Consider implementing TLS encryption for NFS if handling sensitive data
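As a concrete starting point for the firewall bullet above, a hedged UFW sketch using the cluster ports discussed later in this guide might look like the following; adjust the subnet to your network and make sure SSH stays reachable before enabling UFW:
# Keep SSH reachable before turning the firewall on
ufw allow OpenSSH
# Allow cluster traffic only from the node subnet
ufw allow from 11.11.11.0/24 to any port 6443 proto tcp       # Kubernetes API
ufw allow from 11.11.11.0/24 to any port 9345 proto tcp       # RKE2 node registration
ufw allow from 11.11.11.0/24 to any port 10250 proto tcp      # kubelet
ufw allow from 11.11.11.0/24 to any port 2379:2380 proto tcp  # etcd
ufw allow from 11.11.11.0/24 to any port 8472 proto udp       # Cilium VXLAN
ufw enable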
Access Control
- Implement RBAC (Role-Based Access Control) for user management (a minimal example follows this list)
- Use service accounts with minimal required permissions
- Regularly rotate certificates and tokens using OpenSSL certificate management
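A minimal, hypothetical RBAC example for the bullets above (names and namespace are illustrative) grants a dedicated service account read-only access to pods in a single namespace:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-reader
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-reader-pod-reader
  namespace: default
subjects:
- kind: ServiceAccount
  name: app-reader
  namespace: default
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io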
Monitoring and Maintenance
- Set up monitoring for cluster health and resource usage
- Implement log aggregation for troubleshooting
- Plan regular backup strategies for etcd and persistent data
- Keep RKE2 and Cilium updated to latest stable versions
Troubleshooting Common Issues
Node Join Failures
- Verify network connectivity between nodes
- Check that the node token is correct and not expired
- Ensure firewall rules allow communication on required ports (6443, 9345)
Storage Issues
- Verify NFS server is accessible from all nodes
- Check NFS export permissions and network configuration
- Ensure nfs-common is installed on all worker nodes
Cilium Networking Issues
- Check Cilium pod logs: kubectl logs -n kube-system -l k8s-app=cilium
- Verify L2 announcement configuration is applied correctly
- Ensure IP pool CIDR doesn’t conflict with existing network ranges
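For deeper diagnostics, the cilium agent binary can be invoked inside the agent pods; a hedged example, assuming the DaemonSet deployed by RKE2 is named cilium in kube-system:
# Summarize agent health from one of the Cilium agent pods
kubectl -n kube-system exec ds/cilium -- cilium status --brief
# Check node-to-node connectivity health as seen by Cilium
kubectl -n kube-system exec ds/cilium -- cilium-health status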
Performance Optimization
Resource Allocation
- Monitor resource usage and adjust CPU/memory limits as needed
- Consider node affinity rules for workload placement
- Implement horizontal pod autoscaling for dynamic workloads
Storage Performance
- Use SSD storage for NFS server when possible
- Consider multiple NFS servers for high-availability setups
- Monitor I/O patterns and optimize accordingly
Network Optimization
- Tune Cilium configuration for your specific workload patterns
- Implement proper service mesh configuration if needed
- Consider dedicated network interfaces for cluster traffic
References and Resources
Official Documentation
- RKE2 Documentation - Complete RKE2 installation and configuration guide
- Cilium Documentation - Comprehensive Cilium networking documentation
- Kubernetes NFS Documentation - Official Kubernetes NFS storage guide
Related Tools and Resources
- NFS Subdir External Provisioner - GitHub repository for NFS storage provisioner
- Helm Documentation - Package manager for Kubernetes
- kubectl Reference - Command-line tool documentation
Advanced Topics
- RKE2 High Availability Setup - Multi-master cluster configuration
- Cilium Load Balancing - Advanced load balancing features
- Kubernetes Security Best Practices - Security hardening guidelines
Questions Answered in This Document
Q: What are the minimum hardware requirements for an RKE2 Kubernetes cluster? A: For the Kubernetes nodes, you need at least 4 CPU cores and 8GB RAM per server running Ubuntu 22.04. For the NFS storage server, 1 CPU core and 1GB RAM is sufficient, with additional storage space (100GB+) for shared volumes.
Q: Why disable kube-proxy when using Cilium?
A: Cilium can replace kube-proxy functionality with better performance and more advanced features like load balancing and network policies. Setting disable-kube-proxy: true and kubeProxyReplacement: true eliminates redundancy and improves cluster efficiency.
Q: How do I troubleshoot nodes that fail to join the cluster?
A: Check network connectivity between nodes, verify the node token is correct using cat /var/lib/rancher/rke2/server/node-token, ensure firewall rules allow traffic on ports 6443 and 9345, and check service logs with journalctl -u rke2-agent -f.
Q: What storage access modes does the NFS provisioner support? A: The NFS storage provisioner supports ReadWriteMany access mode, allowing multiple pods to mount the same volume simultaneously for read and write operations, which is ideal for shared application data and logs.
Q: How do I configure external access to services in this setup? A: Use Cilium’s L2 announcements with LoadBalancer services. Configure the CiliumLoadBalancerIPPool with your network CIDR and apply the CiliumL2AnnouncementPolicy to enable external IP assignment and announcement.
Q: What cleanup scripts are available if I need to remove RKE2?
A: RKE2 provides rke2-killall.sh to stop all processes and rke2-uninstall.sh for complete removal. These scripts are located in /usr/local/bin for regular file systems or /opt/rke2/bin for read-only systems.
Q: How do I add additional worker nodes to an existing cluster?
A: Install RKE2 in agent mode using INSTALL_RKE2_TYPE="agent", configure /etc/rancher/rke2/config.yaml with the server URL and node token, then start the rke2-agent service. Each node must have a unique hostname.
Q: What ports need to be open for RKE2 cluster communication? A: Essential ports include 6443 for Kubernetes API, 9345 for node registration, 10250 for kubelet, 2379-2380 for etcd, and 8472 for Cilium VXLAN. Configure firewall rules to allow these ports between cluster nodes.
Q: How do I monitor the health of Cilium networking?
A: Use kubectl get pods -n kube-system -l k8s-app=cilium to check Cilium pod status, kubectl logs -n kube-system -l k8s-app=cilium for logs, and the Cilium CLI tools for detailed network diagnostics and connectivity testing.
Q: Can I use this setup in production environments? A: Yes, this configuration includes production-ready features like etcd metrics exposure, TLS security, resource management, and monitoring capabilities. However, consider implementing additional security hardening, backup strategies, and high-availability configurations for critical production workloads.