You have a 3-node HA Kubernetes cluster. After a critical failure (e.g., two nodes corrupted simultaneously), etcd has lost quorum. The API server is read-only or unresponsive. You have a snapshot backup, snapshot.db.
How do you restore cluster functionality?
1. Stop the kube-apiserver and etcd static pods on all masters to prevent further corruption.
2. Run etcdctl snapshot restore snapshot.db --data-dir /var/lib/etcd-new ... to create a new single-member cluster from the snapshot data.
3. Update the etcd.yaml manifest to point to the new data directory and set --initial-cluster-state=new.
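The restore steps above can be sketched as a small shell script. This is a minimal illustration, not a verified runbook. It assumes a kubeadm-style layout where static pod manifests live in /etc/kubernetes/manifests and the old etcd data directory is /var/lib/etcd; the holding directory, the variable names, and the helper functions are hypothetical. It passes only the restore flags named above, since the remaining flags are elided in the original and depend on the cluster. The script defaults to a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch only: every path below is an assumption about a kubeadm-style node.
set -euo pipefail

MANIFEST_DIR="${MANIFEST_DIR:-/etc/kubernetes/manifests}"      # assumed kubeadm default
PARKED_DIR="${PARKED_DIR:-/etc/kubernetes/manifests-stopped}"  # hypothetical holding dir
SNAPSHOT="${SNAPSHOT:-snapshot.db}"
OLD_DATA_DIR="${OLD_DATA_DIR:-/var/lib/etcd}"                  # assumed current data dir
NEW_DATA_DIR="${NEW_DATA_DIR:-/var/lib/etcd-new}"
DRY_RUN="${DRY_RUN:-1}"                                        # 1 = print commands only

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 1: moving a manifest out of the watched directory makes the kubelet
# stop the corresponding static pod. Repeat on every master.
stop_static_pods() {
  run mkdir -p "$PARKED_DIR"
  for m in kube-apiserver.yaml etcd.yaml; do
    run mv "$MANIFEST_DIR/$m" "$PARKED_DIR/$m"
  done
}

# Step 2: restore the snapshot into a fresh data directory.
# Further flags are cluster-specific and intentionally left out here.
restore_snapshot() {
  run etcdctl snapshot restore "$SNAPSHOT" --data-dir "$NEW_DATA_DIR"
}

# Step 3: point the parked etcd manifest at the new data directory and set
# --initial-cluster-state=new (no-op if the flag is absent). Uses GNU sed -i.
patch_manifest() {
  run sed -i \
    -e "s#${OLD_DATA_DIR}\$#${NEW_DATA_DIR}#" \
    -e "s#--initial-cluster-state=[a-z]*#--initial-cluster-state=new#" \
    "$PARKED_DIR/etcd.yaml"
}

stop_static_pods
restore_snapshot
patch_manifest
```

With the default DRY_RUN=1 the script only echoes each command prefixed with "+", which makes it possible to review the plan before touching a node. In this sketch the manifests stay parked after patching; moving etcd.yaml and then kube-apiserver.yaml back into the manifest directory is what would let the kubelet start the pods again.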