Install Kubernetes Using RKE2 with Cilium

RKE2 (Rancher Kubernetes Engine 2) is Rancher’s next-generation Kubernetes distribution that focuses on security and compliance for government and enterprise environments. This comprehensive guide walks you through setting up a production-ready Kubernetes cluster using RKE2 with Cilium as the Container Network Interface (CNI) and NFS for persistent storage.

This setup provides a robust, scalable foundation for container orchestration with advanced networking capabilities, load balancing, and reliable storage solutions. The configuration includes security best practices and monitoring capabilities essential for production environments.

Infrastructure Requirements

Server Specifications

Kubernetes Nodes (3 servers):

  • Operating System: Ubuntu 22.04 LTS
  • CPU: 4 cores minimum
  • RAM: 8GB minimum
  • Network: Static IP addresses recommended
  • Storage: 50GB+ for system and container images

NFS Storage Server (1 server):

  • Operating System: Ubuntu 22.04 LTS
  • CPU: 1 core minimum
  • RAM: 1GB minimum
  • Storage: 100GB+ for shared storage
  • Network: Same subnet as Kubernetes nodes

Network Configuration

This guide uses the following IP addressing scheme:

  • NFS Server: 11.11.11.10 (nfs.k8s.net)
  • Kubernetes Master: 11.11.11.11 (k8s1.k8s.net)
  • Kubernetes Worker 1: 11.11.11.12 (k8s2.k8s.net)
  • Kubernetes Worker 2: 11.11.11.13 (k8s3.k8s.net)

Adjust these addresses to match your network environment.

Pre-Installation Setup

The following steps must be performed on all Kubernetes nodes (master and workers).

System Updates and Preparation

Update your system packages to ensure security patches and compatibility:

apt update && apt upgrade -y

Disable Swap

By default the kubelet will not run with swap enabled, because swap interferes with Kubernetes memory accounting and eviction decisions:

# Disable swap immediately
swapoff -a
 
# Edit fstab to prevent swap from being enabled on reboot
vim /etc/fstab
# Comment out any swap lines (add # at the beginning)
 
# Remove the swap file if it exists (the path may differ on your system)
[ -f /swap.img ] && rm /swap.img
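
If you prefer to script the fstab change instead of editing it by hand, GNU sed can comment out any swap entries and keep a backup of the original file:

# Comment out swap entries in /etc/fstab (a backup is written to /etc/fstab.bak)
sed -ri.bak '/\sswap\s/ s/^/#/' /etc/fstab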

Install NFS Client Support

Since this setup uses NFS for persistent storage, install NFS client utilities on all nodes:

apt install nfs-common -y

Configure Host Resolution

Add hostname entries to /etc/hosts on all servers for reliable internal communication:

vim /etc/hosts

Add these entries:

11.11.11.10     nfs.k8s.net
11.11.11.11     k8s1.k8s.net
11.11.11.12     k8s2.k8s.net
11.11.11.13     k8s3.k8s.net
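
To confirm that host resolution and basic connectivity work, a quick check such as the following can be run from each server (hostnames as defined above):

for host in nfs.k8s.net k8s1.k8s.net k8s2.k8s.net k8s3.k8s.net; do
    ping -c 1 "$host"
done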

RKE2 Server (Master) Node Installation

Perform these steps on your designated master node (k8s1.k8s.net).

Install RKE2 Server

Download and install RKE2 using the official installation script:

curl -sfL https://get.rke2.io | sh -

This script installs the rke2-server service and associated binaries. The installation requires root privileges.

Configure RKE2 Server

Create the RKE2 configuration directory:

mkdir -p /etc/rancher/rke2

Create the server configuration file:

vim /etc/rancher/rke2/config.yaml

Add the following configuration:

disable:
  - rke2-ingress-nginx  # We'll use Cilium's ingress controller
  - rke2-canal          # Disable default CNI to use Cilium
disable-kube-proxy: true # Cilium will replace kube-proxy
cni:
  - cilium
tls-san:
  - api.k8s.net          # Additional TLS subject alternative name
etcd-expose-metrics: true # Enable etcd metrics for monitoring
kube-controller-manager-arg:
  - bind-address=0.0.0.0 # Allow external access to metrics
kube-scheduler-arg:
  - bind-address=0.0.0.0 # Allow external access to metrics

Configure Cilium CNI

Create the manifests directory for custom configurations:

mkdir -p /var/lib/rancher/rke2/server/manifests

Create the Cilium configuration:

vim /var/lib/rancher/rke2/server/manifests/rke2-cilium-config.yaml

Add the following Cilium configuration:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    kubeProxyReplacement: true    # Replace kube-proxy with Cilium
    k8sServiceHost: 11.11.11.11   # Kubernetes API server IP
    k8sServicePort: 6443          # Kubernetes API server port
    cni:
      chainingMode: "none"        # Direct CNI mode
    l2announcements:
      enabled: true               # Enable L2 load balancer announcements
    ingressController:
      enabled: true               # Enable Cilium ingress controller
      default: true               # Set as default ingress class
    k8sClientRateLimit:
      qps: 30                     # Queries per second limit
      burst: 50                   # Burst limit for API requests
    externalIPs:
      enabled: true               # Enable external IP support

Start RKE2 Server

Enable and start the RKE2 server service:

systemctl enable --now rke2-server.service

Monitor Installation Progress

Follow the service logs to monitor installation progress (running this inside a tmux session keeps it alive if your SSH connection drops):

journalctl -u rke2-server -f

Wait for the service to start successfully before proceeding. You should see messages indicating successful cluster initialization.

Post-Installation Verification

After installation, RKE2 provides several important files and utilities:

Installed Utilities (located in /var/lib/rancher/rke2/bin/):

  • kubectl - Kubernetes command-line tool
  • crictl - Container runtime interface tool
  • ctr - Low-level container tool

Important Files:

  • Kubeconfig: /etc/rancher/rke2/rke2.yaml (a usage example follows this list)
  • Node token: /var/lib/rancher/rke2/server/node-token
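
To use kubectl directly on the master node with these files, point it at the RKE2 kubeconfig and add the bundled binaries to your PATH; a quick sanity check looks like this:

export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
export PATH=$PATH:/var/lib/rancher/rke2/bin

# The master node should report Ready once initialization completes
kubectl get nodes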

Cleanup Scripts (for maintenance):

  • rke2-killall.sh - Stop all RKE2 processes
  • rke2-uninstall.sh - Complete removal script

Worker Node Installation

Perform these steps on each worker node (k8s2.k8s.net, k8s3.k8s.net).

Install RKE2 Agent

Install RKE2 in agent mode:

curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -

Configure RKE2 Agent

Create the configuration directory:

mkdir -p /etc/rancher/rke2/

Get the node token from the master node:

# On master node, get the token
cat /var/lib/rancher/rke2/server/node-token

Create the agent configuration:

vim /etc/rancher/rke2/config.yaml

Add the following configuration (replace <token> with the actual token from the master):

server: https://k8s1.k8s.net:9345
token: <token from server node>

Important Notes:

  • RKE2 server listens on port 9345 for node registration
  • Kubernetes API remains on standard port 6443
  • Each node must have a unique hostname

Start RKE2 Agent

Enable and start the agent service:

systemctl enable --now rke2-agent.service

Monitor the agent startup by following its logs (again, a tmux session keeps the view persistent across disconnects):

journalctl -u rke2-agent -f

Label Worker Nodes (Optional)

From the master node, add worker role labels for better organization:

# Add kubectl to PATH and point it at the RKE2 kubeconfig
export PATH=$PATH:/var/lib/rancher/rke2/bin
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
 
# Label worker nodes
kubectl label node k8s2.k8s.net node-role.kubernetes.io/worker=worker
kubectl label node k8s3.k8s.net node-role.kubernetes.io/worker=worker

Configure Cilium Load Balancing

Cilium provides advanced load balancing capabilities. Configure L2 announcements and IP pools for external access.

Create IP Pool Configuration

Create a file for the load balancer IP pool:

vim ippool.yaml

Add the following configuration:

apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: default-pool
spec:
  cidrs:
  - cidr: "11.11.11.0/24"  # Adjust to your network; avoid addresses already assigned to hosts

Create L2 Announcement Policy

Create the L2 announcement policy:

vim l2announcementpolicy.yaml

Add the following configuration:

apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: default-policy
spec:
  externalIPs: true
  loadBalancerIPs: true

Apply Cilium Configurations

Apply both configurations to your cluster:

kubectl apply -f ippool.yaml
kubectl apply -f l2announcementpolicy.yaml

At this point, you have a fully functional RKE2 Kubernetes cluster with Cilium CNI and load balancing capabilities.
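
To confirm that the IP pool and L2 announcements work end to end, you can expose a throwaway workload through a LoadBalancer service; the deployment name and image below are purely illustrative:

# Deploy a test workload and expose it via a LoadBalancer service
kubectl create deployment lb-test --image=nginx
kubectl expose deployment lb-test --port=80 --type=LoadBalancer

# EXTERNAL-IP should be assigned from the configured pool
kubectl get svc lb-test

# Clean up the test resources
kubectl delete svc lb-test
kubectl delete deployment lb-test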

NFS Storage Server Setup

Configure the dedicated NFS server for persistent storage provisioning.

Prepare Storage Directory

Assuming you have mounted your storage device to /shared (adjust path as needed):

# Set appropriate ownership for NFS sharing
chown nobody:nogroup /shared -R

Install NFS Server

Update packages and install NFS server:

apt update && apt install nfs-kernel-server -y

Configure NFS Exports

Configure the NFS export for your Kubernetes subnet:

vim /etc/exports

Add the following export configuration:

/shared 11.11.11.0/24(rw,sync,no_subtree_check,no_root_squash)

Export Options Explained:

  • rw - Read-write access
  • sync - Synchronous writes for data safety
  • no_subtree_check - Disable subtree checking for performance
  • no_root_squash - Allow root access from clients

Start NFS Server

Enable and start the NFS server service:

systemctl enable --now nfs-kernel-server.service

Verify NFS Export

Check that your export is active:

exportfs -av

You should see output confirming your /shared directory is exported.
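
Before wiring NFS into Kubernetes, it is worth confirming that the export is reachable from a cluster node; a manual test from any Kubernetes node (using the nfs-common tools installed earlier) might look like this:

# List exports visible from this node
showmount -e nfs.k8s.net

# Temporarily mount the share and test a write
mkdir -p /mnt/nfs-test
mount -t nfs nfs.k8s.net:/shared /mnt/nfs-test
touch /mnt/nfs-test/write-test && rm /mnt/nfs-test/write-test
umount /mnt/nfs-test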

Configure Kubernetes NFS Storage Integration

Integrate NFS storage with Kubernetes using the NFS Subdir External Provisioner.

Install NFS Provisioner

Add the Helm repository and install the NFS provisioner:

# Add the NFS provisioner Helm repository
helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
 
# Install the provisioner with custom configuration
helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
    --namespace kube-system \
    --set nfs.server=nfs.k8s.net \
    --set nfs.path=/shared \
    --set storageClass.defaultClass=true \
    --set storageClass.onDelete=delete \
    --set storageClass.accessModes=ReadWriteMany

Configuration Parameters:

  • nfs.server - NFS server hostname
  • nfs.path - Exported directory path
  • storageClass.defaultClass=true - Set as default storage class
  • storageClass.onDelete=delete - Clean up storage when PVC is deleted
  • storageClass.accessModes=ReadWriteMany - Allow multiple pod access

Verify Storage Configuration

Check that the storage class is created and set as default:

kubectl get storageclass

You should see the NFS storage class marked as default.

Cluster Verification and Testing

Verify Node Status

Check that all nodes are ready:

kubectl get nodes -o wide

Verify System Pods

Ensure all system components are running:

kubectl get pods -n kube-system

Test Storage Provisioning

Create a test PVC to verify NFS integration:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
EOF

Verify the PVC is bound:

kubectl get pvc test-pvc
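
To go one step further and confirm that a pod can actually write to the provisioned volume, you can run a short-lived pod against the claim; the pod name here is only for this test:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: test-nfs-pod
spec:
  containers:
  - name: writer
    image: busybox
    command: ["sh", "-c", "echo hello > /data/hello.txt && sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test-pvc
EOF

# Wait for the pod, confirm the file was written, then remove the pod before deleting the PVC below
kubectl wait --for=condition=Ready pod/test-nfs-pod --timeout=60s
kubectl exec test-nfs-pod -- cat /data/hello.txt
kubectl delete pod test-nfs-pod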

Clean up the test:

kubectl delete pvc test-pvc

Security Considerations

Network Security

  • Configure firewall rules (for example with UFW) to restrict access to cluster ports; a starting rule set is sketched after this list
  • Use network policies to control pod-to-pod communication
  • Consider implementing TLS encryption for NFS if handling sensitive data
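
As a starting point, the UFW rules below restrict the core cluster ports to the 11.11.11.0/24 subnet used in this guide; adjust the subnet and port list to your environment, and make sure SSH stays reachable before enabling the firewall:

# Keep SSH reachable
ufw allow 22/tcp

# Cluster-internal traffic to core RKE2/Kubernetes ports
ufw allow from 11.11.11.0/24 to any port 6443 proto tcp       # Kubernetes API
ufw allow from 11.11.11.0/24 to any port 9345 proto tcp       # RKE2 node registration
ufw allow from 11.11.11.0/24 to any port 10250 proto tcp      # kubelet
ufw allow from 11.11.11.0/24 to any port 2379:2380 proto tcp  # etcd (server nodes only)
ufw allow from 11.11.11.0/24 to any port 8472 proto udp       # Cilium VXLAN
ufw enable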

Access Control

  • Implement RBAC (Role-Based Access Control) for user management (a minimal example follows this list)
  • Use service accounts with minimal required permissions
  • Regularly rotate certificates and tokens
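
As a minimal illustration of the RBAC and service-account points above, the manifest below grants a hypothetical service account read-only access to pods in a single namespace (all names and the namespace are placeholders; save it to a file and apply it with kubectl apply -f):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: demo
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: demo
subjects:
- kind: ServiceAccount
  name: app-sa
  namespace: demo
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io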

Monitoring and Maintenance

  • Set up monitoring for cluster health and resource usage
  • Implement log aggregation for troubleshooting
  • Plan regular backup strategies for etcd and persistent data (an etcd snapshot sketch follows this list)
  • Keep RKE2 and Cilium updated to latest stable versions
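
For the etcd backup point above, RKE2 includes built-in etcd snapshot support; the commands and configuration keys below reflect that feature, but check the RKE2 documentation for your version before relying on them:

# On a server node: take an on-demand snapshot
rke2 etcd-snapshot save --name manual-snapshot

# List existing snapshots
rke2 etcd-snapshot list

Scheduled snapshots can also be configured in /etc/rancher/rke2/config.yaml:

etcd-snapshot-schedule-cron: "0 */6 * * *"  # every 6 hours
etcd-snapshot-retention: 10                 # keep the 10 most recent snapshots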

Troubleshooting Common Issues

Node Join Failures

  • Verify network connectivity between nodes
  • Check that the node token is correct and not expired
  • Ensure firewall rules allow communication on required ports (6443, 9345)

Storage Issues

  • Verify NFS server is accessible from all nodes
  • Check NFS export permissions and network configuration
  • Ensure nfs-common is installed on all worker nodes

Cilium Networking Issues

  • Check Cilium pod logs: kubectl logs -n kube-system -l k8s-app=cilium
  • Verify L2 announcement configuration is applied correctly
  • Ensure the IP pool CIDR doesn’t conflict with existing network ranges; a quick agent status check is sketched below
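
Assuming the default chart layout, where the agent DaemonSet is named cilium in kube-system, a quick health and configuration check from the master node looks like this (depending on your Cilium version the in-pod CLI may be invoked as cilium or cilium-dbg):

# Run the bundled Cilium CLI inside an agent pod
kubectl -n kube-system exec ds/cilium -- cilium status --brief

# Confirm the load balancer objects applied earlier exist
kubectl get ciliumloadbalancerippools
kubectl get ciliuml2announcementpolicies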

Performance Optimization

Resource Allocation

  • Monitor resource usage and adjust CPU/memory limits as needed
  • Consider node affinity rules for workload placement
  • Implement horizontal pod autoscaling for dynamic workloads (a minimal example follows this list)
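
For the autoscaling point above, a minimal HorizontalPodAutoscaler targeting a hypothetical Deployment named web might look like the following (the CPU metric relies on the metrics server that RKE2 deploys by default):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web           # placeholder deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70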

Storage Performance

  • Use SSD storage for NFS server when possible
  • Consider multiple NFS servers for high-availability setups
  • Monitor I/O patterns and optimize accordingly

Network Optimization

  • Tune Cilium configuration for your specific workload patterns
  • Implement proper service mesh configuration if needed
  • Consider dedicated network interfaces for cluster traffic

Questions Answered in This Document

Q: What are the minimum hardware requirements for an RKE2 Kubernetes cluster? A: For the Kubernetes nodes, you need at least 4 CPU cores and 8GB RAM per server running Ubuntu 22.04. For the NFS storage server, 1 CPU core and 1GB of RAM are sufficient, with additional storage space (100GB+) for shared volumes.

Q: Why disable kube-proxy when using Cilium? A: Cilium can replace kube-proxy functionality with better performance and more advanced features like load balancing and network policies. Setting disable-kube-proxy: true and kubeProxyReplacement: true eliminates redundancy and improves cluster efficiency.

Q: How do I troubleshoot nodes that fail to join the cluster? A: Check network connectivity between nodes, verify the node token is correct using cat /var/lib/rancher/rke2/server/node-token, ensure firewall rules allow traffic on ports 6443 and 9345, and check service logs with journalctl -u rke2-agent -f.

Q: What storage access modes does the NFS provisioner support? A: The NFS storage provisioner supports ReadWriteMany access mode, allowing multiple pods to mount the same volume simultaneously for read and write operations, which is ideal for shared application data and logs.

Q: How do I configure external access to services in this setup? A: Use Cilium’s L2 announcements with LoadBalancer services. Configure the CiliumLoadBalancerIPPool with your network CIDR and apply the CiliumL2AnnouncementPolicy to enable external IP assignment and announcement.

Q: What cleanup scripts are available if I need to remove RKE2? A: RKE2 provides rke2-killall.sh to stop all processes and rke2-uninstall.sh for complete removal. These scripts are located in /usr/local/bin for regular file systems or /opt/rke2/bin for read-only systems.

Q: How do I add additional worker nodes to an existing cluster? A: Install RKE2 in agent mode using INSTALL_RKE2_TYPE="agent", configure /etc/rancher/rke2/config.yaml with the server URL and node token, then start the rke2-agent service. Each node must have a unique hostname.

Q: What ports need to be open for RKE2 cluster communication? A: Essential ports include 6443 for Kubernetes API, 9345 for node registration, 10250 for kubelet, 2379-2380 for etcd, and 8472 for Cilium VXLAN. Configure firewall rules to allow these ports between cluster nodes.

Q: How do I monitor the health of Cilium networking? A: Use kubectl get pods -n kube-system -l k8s-app=cilium to check Cilium pod status, kubectl logs -n kube-system -l k8s-app=cilium for logs, and the Cilium CLI tools for detailed network diagnostics and connectivity testing.

Q: Can I use this setup in production environments? A: Yes, this configuration includes production-ready features like etcd metrics exposure, TLS security, resource management, and monitoring capabilities. However, consider implementing additional security hardening, backup strategies, and high-availability configurations for critical production workloads.