Running with a single master node in production poses a serious risk. In this article, I'll explain step-by-step how to set up a High Availability (HA) Kubernetes cluster.
HA Cluster Architecture
Components we'll use:
- 2 Master Nodes (Control Plane)
- 4 Worker Nodes
- HAProxy Load Balancer
- etcd Cluster (3 members recommended for quorum)
Note that this walkthrough uses stacked etcd, meaning the etcd members run on the control-plane nodes themselves. With only two masters that gives two etcd members, which cannot tolerate the loss of either one; for a truly fault-tolerant quorum, add a third control-plane node or run etcd as an external three-node cluster.
Prerequisites
Minimum requirements for each node (the OS-level preparation every node also needs is sketched right after this list):
- 2 CPUs
- 4GB RAM
- 20GB Disk
- Ubuntu 22.04 LTS
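Beyond the hardware minimums, every node (masters and workers alike) needs a bit of OS-level preparation before any Kubernetes packages go on: swap disabled and the kernel networking settings the kubelet expects. A minimal sketch of those steps:
# Disable swap (kubelet will not start with swap enabled by default)
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
# Load the kernel modules needed for container networking
cat << 'EOF' | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# Let iptables see bridged traffic and enable IP forwarding
cat << 'EOF' | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sudo sysctl --system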
HAProxy Installation
First, we'll place HAProxy in front of the master nodes so that kubelets and clients always talk to a single, stable endpoint. Keep in mind that a single HAProxy instance is itself a single point of failure; production setups typically run two instances with keepalived and a shared virtual IP.
# Install HAProxy
sudo apt-get update
sudo apt-get install -y haproxy
# HAProxy configuration ('sudo tee' is used because 'sudo cat > file' would perform the redirection as the unprivileged user)
sudo tee /etc/haproxy/haproxy.cfg > /dev/null << 'HAPROXY'
global
    log /dev/log local0
    log /dev/log local1 notice
    daemon

defaults
    log     global
    mode    tcp
    option  tcplog
    timeout connect 5000
    timeout client  50000
    timeout server  50000

frontend kubernetes-apiserver
    bind *:6443
    mode tcp
    option tcplog
    default_backend kubernetes-master

backend kubernetes-master
    mode tcp
    balance roundrobin
    option tcp-check
    server master1 192.168.1.10:6443 check
    server master2 192.168.1.11:6443 check

# Stats page used later in the load balancer test
listen stats
    bind *:9000
    mode http
    stats enable
    stats uri /stats
    stats refresh 10s
HAPROXY
# Restart HAProxy
sudo systemctl restart haproxy
sudo systemctl enable haproxy
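After the restart, it's worth confirming that the configuration parses cleanly and that HAProxy is actually listening on the API and stats ports:
# Validate the configuration file and check the listeners
sudo haproxy -c -f /etc/haproxy/haproxy.cfg
sudo ss -tlnp | grep -E ':6443|:9000'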
Master Node Setup
Let's set up the first master node:
# Install container runtime (containerd)
sudo apt-get install -y containerd
# Configure containerd (kubeadm defaults the kubelet to the systemd cgroup driver,
# so containerd must use it as well or the kubelet will be unstable)
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd
sudo systemctl enable containerd
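kubeadm, kubelet, and kubectl are not in Ubuntu's default repositories, so the upstream Kubernetes apt repository has to be added first. A minimal sketch (the v1.30 minor version in the URLs is just an example; substitute whichever release you are installing):
# Add the upstream Kubernetes apt repository (v1.30 is an example version)
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update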
# Install Kubernetes packages
sudo apt-get install -y kubeadm kubelet kubectl
sudo apt-mark hold kubeadm kubelet kubectl
# Initialize first master node
sudo kubeadm init \
  --control-plane-endpoint="haproxy-ip:6443" \
  --upload-certs \
  --pod-network-cidr=10.244.0.0/16
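When init finishes, kubeadm prints two join commands, one for additional control-plane nodes and one for workers; copy them somewhere safe. To run kubectl on this node as a regular user, copy the admin kubeconfig into place (the standard kubeadm step):
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config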
Adding Second Master Node
# Join command from first master
sudo kubeadm join haproxy-ip:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane \
  --certificate-key <cert-key>
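If some time has passed since init, parts of the join command may have expired: the certificate key from --upload-certs is valid for roughly two hours and the bootstrap token for 24 hours. Both can be regenerated on the first master:
# Re-upload the control-plane certificates and print a fresh certificate key
sudo kubeadm init phase upload-certs --upload-certs
# Print a fresh join command (append --control-plane --certificate-key <key> for a master)
sudo kubeadm token create --print-join-command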
Adding Worker Nodes
# Run on each worker node
sudo kubeadm join haproxy-ip:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>
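Once the workers have joined, confirm that every node has registered; they will show NotReady until the network plugin is installed in the next step:
kubectl get nodes -o wide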
Network Plugin (Calico)
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
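To confirm the rollout, watch the calico-node pods start on each node; once they are running, the nodes should flip to Ready (k8s-app=calico-node is the label the upstream manifest puts on those pods):
kubectl get pods -n kube-system -l k8s-app=calico-node
kubectl get nodes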
etcd Cluster Health
# Check etcd cluster status
kubectl exec -it etcd-master1 -n kube-system -- \
  etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list
# etcd health check (needs the same TLS flags, otherwise etcdctl falls back to the plain-HTTP default endpoint and fails)
kubectl exec -it etcd-master1 -n kube-system -- \
  etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key endpoint health
Load Balancer Test
# HAProxy stats page
curl http://haproxy-ip:9000/stats
# API server access test
kubectl get nodes
# Shut down one master node and test again
kubectl get nodes
Failover Test
Simulate a master node failure on master1. Stopping the kubelet alone only marks the node NotReady; the static kube-apiserver pod keeps running under containerd, so nothing would actually fail over. To genuinely take master1's API server offline (while leaving etcd untouched), move its static pod manifest out of the directory kubelet watches:
# Stop the kube-apiserver static pod on master1
sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
# API server should still be accessible
kubectl get pods -A
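While master1's API server is down, you can also watch HAProxy take it out of rotation through the stats listener added to the configuration above (the ;csv suffix is HAProxy's machine-readable stats output; the grep keeps only the backend rows):
curl -s 'http://haproxy-ip:9000/stats;csv' | grep kubernetes-master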
# Bring master1's API server back
sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/
Monitoring
# Install Prometheus and Grafana
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring --create-namespace
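Once the chart has finished deploying, a quick check that the Prometheus and Grafana pods came up:
kubectl get pods -n monitoring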
Backup and Restore
# etcd backup (run on a master; requires etcdctl on the host, or exec into the etcd pod as above)
sudo mkdir -p /backup
sudo ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
# etcd restore (disaster recovery)
sudo ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir=/var/lib/etcd-restore
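Restoring the snapshot only writes the data to disk; etcd still has to be pointed at it. With kubeadm's stacked etcd, the usual approach is to edit the etcd static pod manifest, roughly:
# Point the etcd static pod at the restored data:
#   sudo vi /etc/kubernetes/manifests/etcd.yaml
# Change --data-dir and the etcd-data hostPath from /var/lib/etcd to
# /var/lib/etcd-restore; kubelet recreates the pod from the edited manifest.
# Then re-run the 'etcdctl endpoint health' check from earlier to confirm recovery.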
Conclusion
With an HA Kubernetes cluster you get:
- ✅ A control plane that keeps serving when a single master fails
- ✅ Automatic failover at the API server layer
- ✅ Scalable infrastructure
- ✅ A production-ready foundation
"High Availability is not optional for production systems - it's a requirement."