Mastering ETCD Backup and Restore in Kubernetes

ETCD is the backbone of Kubernetes: it stores all cluster state. Knowing how to back up and restore ETCD is crucial for any Kubernetes administrator. In this guide, we’ll walk through backing up ETCD and restoring it after a disaster.

Prerequisites

  • A running Kubernetes cluster (preferably set up with kubeadm)
  • SSH access to the control plane node
  • kubectl and etcdctl installed on the control plane node

Step 1: Cluster Health Check

First, let’s ensure our cluster is healthy:

Terminal window
kubectl get nodes

All nodes should be in the “Ready” state.
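
If you want to script this check before taking a backup, the following is a minimal sketch. It assumes the default column layout of `kubectl get nodes --no-headers` (name, then status), and the function name `not_ready_nodes` is hypothetical.

```shell
# Print the names of nodes whose STATUS column is not exactly "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
# Assumption: column 1 is the node name and column 2 is the status.
not_ready_nodes() {
    awk '$2 != "Ready" { print $1 }'
}

# Usage on a live cluster (not executed here):
#   kubectl get nodes --no-headers | not_ready_nodes
```

An empty result means every node reported plain `Ready`. A cordoned node (status `Ready,SchedulingDisabled`) is also listed, which is usually worth knowing before a restore.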

Step 2: Identify ETCD Version

To find the ETCD version:

Terminal window
kubectl describe pod etcd-controlplane -n kube-system | grep Image:

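Note the version so you can use a matching etcdctl release. The pod is named etcd-<node name>, so replace controlplane in the command above if your control plane node has a different name.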
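
To pull just the version number out of the image string, a small helper like the one below may help. It is a sketch: the function name `etcd_version_from_image` is hypothetical, and the image string in the usage note is only an example value, not output from your cluster.

```shell
# Extract the etcd version from an image reference such as
# "registry.k8s.io/etcd:3.5.15-0" (example value, not from the article).
etcd_version_from_image() {
    tag=${1##*:}                 # keep everything after the last colon
    tag=${tag#v}                 # drop an optional leading "v"
    printf '%s\n' "${tag%%-*}"   # drop a build suffix such as "-0"
}

# Usage on a live cluster (not executed here). The label selector assumes
# a kubeadm cluster, where the etcd static pod carries component=etcd:
#   image=$(kubectl -n kube-system get pods -l component=etcd \
#       -o jsonpath='{.items[0].spec.containers[0].image}')
#   etcd_version_from_image "$image"
```

Selecting the pod by label avoids hard-coding the node name in the pod name.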

Step 3: Create a Sample Deployment

Let’s create a deployment to demonstrate the backup and restore process:

Terminal window
kubectl create deployment nginx --image=nginx --replicas=2

Verify the deployment:

Terminal window
kubectl get deployments

Step 4: Identify ETCD Endpoint and Certificates

We need to locate the ETCD endpoint, the server certificate and its private key, and the CA certificate. On a kubeadm cluster these are set as flags in the ETCD static pod manifest:

Terminal window
sudo cat /etc/kubernetes/manifests/etcd.yaml

Look for:

  • --listen-client-urls for the endpoint
  • --cert-file for the server certificate
  • --key-file for the server private key
  • --trusted-ca-file for the CA certificate
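
Rather than reading the manifest by eye, you can extract each flag value with a short helper. This is a sketch that assumes the kubeadm manifest layout, where each flag sits on its own line as `- --flag=value`; the function name `etcd_flag` is hypothetical.

```shell
# Print the value of one etcd flag from a static pod manifest.
# $1 = flag name without the leading dashes
# $2 = manifest path (defaults to the kubeadm location used in this guide)
etcd_flag() {
    sed -n "s/^[[:space:]]*-[[:space:]]*--$1=//p" \
        "${2:-/etc/kubernetes/manifests/etcd.yaml}"
}

# Usage (not executed here):
#   sudo sh -c '. ./etcd_flag.sh; etcd_flag trusted-ca-file'
```

Because the pattern requires the flag name to follow the list dash directly, `etcd_flag cert-file` does not pick up `--peer-cert-file`.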

Step 5: Set ETCDCTL API Version

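Tell etcdctl to use the v3 API. etcdctl 3.4 and later already default to v3, so this mainly matters for older binaries. On most systems sudo does not pass exported variables through, which is why the later commands also set ETCDCTL_API=3 inline: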

Terminal window
export ETCDCTL_API=3

Step 6: Take ETCD Snapshot

Now, let’s take a snapshot of ETCD:

Terminal window
sudo ETCDCTL_API=3 etcdctl snapshot save /opt/etcd-snapshot.db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key

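Verify the snapshot (on etcd 3.5 and later this subcommand is deprecated in etcdctl in favor of etcdutl snapshot status):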

Terminal window
sudo ETCDCTL_API=3 etcdctl snapshot status /opt/etcd-snapshot.db
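
For repeated backups, a fixed file name overwrites the previous snapshot. The sketch below wraps the same `snapshot save` command with a timestamped file name. The function names and the `etcd-snapshot-<UTC timestamp>.db` naming scheme are assumptions for illustration; the endpoint and certificate paths are the ones used in this guide.

```shell
# Build a timestamped snapshot path inside directory $1 (default: /opt).
snapshot_path() {
    printf '%s/etcd-snapshot-%s.db\n' "${1:-/opt}" "$(date -u +%Y%m%dT%H%M%SZ)"
}

# Take a snapshot into directory $1 and print the resulting path.
# Defined only; nothing runs until you call it on a control plane node.
take_snapshot() {
    target=$(snapshot_path "$1")
    sudo ETCDCTL_API=3 etcdctl snapshot save "$target" \
        --endpoints=https://127.0.0.1:2379 \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key &&
        printf '%s\n' "$target"
}
```

Calling `take_snapshot /opt` would write a new file on each run, and the UTC timestamp keeps the names sortable.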

Step 7: Simulate Disaster

Let’s delete our nginx deployment to simulate a disaster:

Terminal window
kubectl delete deployment nginx

Verify that the deployment is gone:

Terminal window
kubectl get deployments

Step 8: Restore ETCD from Backup

Now, let’s restore ETCD from our backup:

  1. Stop the kubelet service:

    Terminal window
    sudo systemctl stop kubelet
  2. Stop the container runtime (e.g., containerd):

    Terminal window
    sudo systemctl stop containerd
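  3. Restore the snapshot into a new data directory that does not exist yet (on etcd 3.5 and later, etcdutl snapshot restore is the preferred form of this command):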

    Terminal window
    sudo ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd-snapshot.db \
    --data-dir /var/lib/etcd-from-backup
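  4. Update the ETCD pod manifest so the data volume's hostPath points to the new directory. The $ anchor keeps the edit from matching other paths and makes it safe to run twice:

    Terminal window
    sudo sed -i 's|path: /var/lib/etcd$|path: /var/lib/etcd-from-backup|' /etc/kubernetes/manifests/etcd.yaml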
  5. Restart the kubelet and container runtime:

    Terminal window
    sudo systemctl start containerd
    sudo systemctl start kubelet
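
The manifest edit in step 4 is the easiest part of the restore to get wrong, so it may be worth keeping a copy of the original. Here is a sketch of a helper that does both; the function name `use_restored_data_dir` and its arguments are hypothetical. Keep the backup copy outside /etc/kubernetes/manifests, since the kubelet treats files in that directory as static pod definitions.

```shell
# Point the etcd data hostPath at a restored data directory.
# $1 = manifest path, $2 = new data directory, $3 = where to save a copy
# of the current manifest (choose a location outside the manifests dir).
# Run as root, because the manifest is normally writable by root only.
use_restored_data_dir() {
    manifest=$1
    new_dir=$2
    backup=$3
    cp -p "$manifest" "$backup" || return 1
    # The end-of-line anchor leaves paths such as /etc/kubernetes/pki/etcd
    # alone and makes a second run a no-op.
    sed "s|path: /var/lib/etcd\$|path: $new_dir|" "$backup" > "$manifest"
}

# Usage (not executed here):
#   use_restored_data_dir /etc/kubernetes/manifests/etcd.yaml \
#       /var/lib/etcd-from-backup /root/etcd.yaml.orig
```

Only the hostPath changes. The mount path and the --data-dir flag inside the container stay as they were, so ETCD starts up on the restored data without any further edits.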

Step 9: Verify Restoration

After a few minutes, check if the nginx deployment is back:

Terminal window
kubectl get deployments

You should see the nginx deployment with 2 replicas, just as it was before we deleted it.
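
Instead of waiting a fixed time, you can poll until the API server answers again. The helper below is a generic retry loop, offered as a sketch; the name `retry` and the attempt counts in the usage note are arbitrary choices.

```shell
# Run a command until it succeeds or the attempts are used up.
# $1 = maximum attempts, $2 = seconds to sleep between attempts,
# remaining arguments = the command to run.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while ! "$@"; do
        [ "$i" -ge "$attempts" ] && return 1
        i=$((i + 1))
        sleep "$delay"
    done
    return 0
}

# Usage after the restore (not executed here):
#   retry 30 10 kubectl get deployment nginx
```

The usage line tries up to 30 times at 10-second intervals and returns a non-zero status if the deployment never shows up.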

Key Takeaways

  1. Regular Backups: Implement a regular backup schedule for ETCD. The frequency depends on your recovery point objective (RPO).

  2. Version Compatibility: Use an etcdctl release that matches your ETCD server version.

  3. Secure Your Backups: ETCD backups contain sensitive cluster data. Ensure they are stored securely and encrypted.

  4. Test Your Backups: Regularly test the restore process to ensure your backups are valid and your team is familiar with the procedure.

  5. Document the Process: Keep detailed documentation of your backup and restore process, including the location of certificates and endpoints.

  6. Cluster-Specific Details: Be aware that certificate paths and endpoints may vary depending on your cluster setup.

  7. Minimize Downtime: Practice the restore process to minimize downtime during a real disaster recovery scenario.

  8. Backup Metadata: Along with ETCD data, back up important metadata like the ETCD version and cluster configuration.
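
A regular backup schedule also needs a retention policy, or the snapshot directory grows without bound. The sketch below keeps only the newest snapshots. It assumes files are named `etcd-snapshot-<UTC timestamp>.db` so that name order equals age order; that naming scheme and the function name `prune_snapshots` are hypothetical.

```shell
# Delete all but the newest $2 snapshots in directory $1.
# Assumption: file names embed a sortable timestamp, so the shell's
# name-sorted glob expansion lists the oldest snapshot first.
prune_snapshots() {
    dir=$1
    keep=$2
    set -- "$dir"/etcd-snapshot-*.db
    [ -e "$1" ] || return 0      # glob matched nothing
    while [ "$#" -gt "$keep" ]; do
        rm -f -- "$1"            # remove the oldest remaining snapshot
        shift
    done
}

# Usage, for example from a daily cron job (not executed here):
#   prune_snapshots /opt 7
```

How many snapshots to keep follows from your RPO and available disk space, and copies stored off the node need their own retention rule.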

Conclusion

Mastering ETCD backup and restore is crucial for maintaining a resilient Kubernetes cluster. By following this guide, you’ve learned how to take snapshots of your ETCD data and restore your cluster to a previous state. Remember, regular practice and documentation of this process are key to ensuring you can quickly recover from potential disasters.

As you continue to work with Kubernetes, consider integrating these backup and restore procedures into your regular maintenance routines and disaster recovery plans. With proper ETCD management, you can ensure the reliability and stability of your Kubernetes clusters, even in the face of unexpected issues.

Source: https://mranv.pages.dev/posts/kubernetes-etcd-backup-restore/
Author: Anubhav Gain
Published at: 2024-09-27
License: CC BY-NC-SA 4.0