
Task 4: Backup and Restore etcd in Kubernetes

etcd is the key-value store that Kubernetes uses to persist cluster state. Taking regular backups is critical to disaster recovery.


Step 1: Access the etcd Pod

Command

kubectl get pods -n kube-system -l component=etcd

This shows the etcd pod running in the control plane.
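Later steps need the pod's name, so it can help to capture it once. A minimal sketch, assuming the pod carries the `component=etcd` label used above; the function name `etcd_pod_name` is made up for illustration.

```shell
# Print the name of the first etcd pod in kube-system.
# (Hypothetical helper; kubectl reports an error if no such pod exists.)
etcd_pod_name() {
  kubectl get pods -n kube-system -l component=etcd \
    -o jsonpath='{.items[0].metadata.name}'
}
```

For example, `ETCD_POD=$(etcd_pod_name)` stores the name for use with `kubectl exec`.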

Common Issues & Solutions

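| Issue | Cause | Solution |
| --- | --- | --- |
| No resources found | The cluster runs on a managed service such as EKS, GKE, or AKS. | Managed Kubernetes does not give direct access to etcd. Use the provider's backup methods. |
| Error: Unauthorized | kubectl has no valid credentials, typically because it runs as a non-root user who cannot read the admin kubeconfig (/etc/kubernetes/admin.conf on kubeadm clusters). | Switch to root or use sudo, or point KUBECONFIG at a kubeconfig with cluster access. |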


Step 2: Find the etcdctl Command Path

Command

# Locate etcdctl on the control plane node's PATH (prints nothing if it is not installed)
ETCDCTL=$(command -v etcdctl)
echo "$ETCDCTL"
  • If etcdctl is not installed on the control plane node of a self-managed cluster, install it:

apt update && apt install etcd-client -y  # Debian/Ubuntu
yum install etcd -y  # CentOS/RHEL

Common Issues & Solutions

| Issue | Cause | Solution |
| --- | --- | --- |
| Command not found | etcdctl binary is missing. | Manually install etcdctl (apt install etcd-client or yum install etcd). |
| Error: etcd endpoint not set | ETCDCTL_ENDPOINTS is not configured. | Set ETCDCTL_ENDPOINTS manually, or pass --endpoints to etcdctl. |
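etcdctl reads its connection settings from `ETCDCTL_`-prefixed environment variables. The sketch below sets them once for the current shell; the endpoint and certificate paths are kubeadm defaults and are assumptions here, so adjust them to your cluster.

```shell
# Connection settings that etcdctl picks up from the environment.
# Endpoint and certificate paths are assumed kubeadm defaults.
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS="https://127.0.0.1:2379"
export ETCDCTL_CACERT="/etc/kubernetes/pki/etcd/ca.crt"
export ETCDCTL_CERT="/etc/kubernetes/pki/etcd/server.crt"
export ETCDCTL_KEY="/etc/kubernetes/pki/etcd/server.key"
```

If you export these, leave out the matching `--endpoints`, `--cacert`, `--cert`, and `--key` flags on later etcdctl commands, since some etcdctl versions reject a setting that is given both ways.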


Step 3: Take an etcd Snapshot

Method 1: Run Inside the etcd Pod

  • This saves an etcd snapshot inside the pod.
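A minimal sketch of an in-pod snapshot, assuming the etcd image ships `etcdctl` on its PATH and the certificates sit at the kubeadm default paths. The function name, pod name, and snapshot path are illustrative, not taken from the original.

```shell
# Take a snapshot by running etcdctl inside the etcd pod.
# $1 = etcd pod name, $2 = snapshot path inside the pod (both supplied by you).
# Endpoint and certificate paths are assumed kubeadm defaults.
snapshot_in_pod() {
  kubectl exec -n kube-system "$1" -- etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    snapshot save "$2"
}
```

Example call: `snapshot_in_pod etcd-cp1 /var/lib/etcd/snapshot.db`, where `etcd-cp1` stands in for your pod's name. On kubeadm clusters `/var/lib/etcd` is usually mounted from the host, so a file written there should also be visible on the node.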

Method 2: Run from the Control Plane Node

  • This saves a snapshot on the control plane node.
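A sketch of the node-side variant, assuming `etcdctl` is installed on the control plane node, you run as root, and the kubeadm default certificate paths apply. The function name and the `/backup` directory are hypothetical.

```shell
# Take a snapshot from the control plane node itself.
# $1 = destination file on the node, for example /backup/etcd-snapshot.db.
# Endpoint and certificate paths are assumed kubeadm defaults.
snapshot_on_node() {
  ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    snapshot save "$1"
}
```

Example call: `snapshot_on_node /backup/etcd-snapshot.db`. The destination directory has to exist and be writable beforehand.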

Verify Snapshot
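One way to check a snapshot is `etcdctl snapshot status`, which reports its hash, revision, key count, and size. This is a sketch with a hypothetical function name; recent etcd releases provide the same subcommand through `etcdutl`, so substitute that binary if your etcdctl no longer has it.

```shell
# Print a summary table for a snapshot file.
# $1 = path to the snapshot, for example /backup/etcd-snapshot.db.
verify_snapshot() {
  ETCDCTL_API=3 etcdctl --write-out=table snapshot status "$1"
}
```

Example call: `verify_snapshot /backup/etcd-snapshot.db`.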

Common Issues & Solutions

| Issue | Cause | Solution |
| --- | --- | --- |
| Error: client connection failed | etcd is not running, or the endpoint is wrong. | Check kubectl get pods -n kube-system and restart etcd. |
| snapshot save: permission denied | User lacks write permission on the backup directory. | Use sudo or change the backup directory. |
| x509: certificate signed by unknown authority | Incorrect TLS paths. | Verify that the /etc/kubernetes/pki/etcd/*.crt files exist and that --cacert, --cert, and --key point to them. |


Step 4: Move Snapshot to a Safe Location

A snapshot kept only on the control plane node is lost if that node fails, so transfer it to another server or an S3 bucket.

Copy Snapshot to Remote Server
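A minimal sketch using `scp`. The function name, remote account, host, and directory are placeholders, not values from the original.

```shell
# Copy a snapshot to a directory on a remote server over SSH.
# $1 = local snapshot file, $2 = user@host, $3 = remote directory.
copy_snapshot_to_server() {
  scp "$1" "$2:$3/"
}
```

Example call: `copy_snapshot_to_server /backup/etcd-snapshot.db backup@backup.example.com /srv/etcd-backups`.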

Upload to S3
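A sketch using the AWS CLI, assuming it is installed and has credentials that allow `s3:PutObject` on the bucket. The function name, bucket name, and the `etcd/` key prefix are hypothetical.

```shell
# Upload a snapshot to s3://<bucket>/etcd/<file name>.
# $1 = local snapshot file, $2 = bucket name.
upload_snapshot_to_s3() {
  aws s3 cp "$1" "s3://$2/etcd/$(basename "$1")"
}
```

Example call: `upload_snapshot_to_s3 /backup/etcd-snapshot.db my-etcd-backups`.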

Common Issues & Solutions

| Issue | Cause | Solution |
| --- | --- | --- |
| scp: command not found | scp is not installed. | Install it with apt install openssh-client. |
| Access Denied on S3 | IAM policy issue. | Ensure the IAM role has the s3:PutObject permission. |


Step 5: Restore etcd from Backup

Before restoring, stop the Kubernetes control plane.

Step 5.1: Stop Control Plane Services (Only for Self-Managed Clusters)

Commands
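A sketch that assumes the control plane components run as systemd units with the names shown, which is an assumption about your setup. kubeadm clusters run them as static pods instead, where the usual equivalent is to move the manifests out of /etc/kubernetes/manifests.

```shell
# Stop the control plane, API server first and etcd last.
# Unit names are assumptions; adjust them to match your installation.
stop_control_plane() {
  for unit in kube-apiserver kube-controller-manager kube-scheduler etcd; do
    sudo systemctl stop "$unit"
  done
}
```

Run `stop_control_plane` on each control plane node before restoring.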

Common Issues & Solutions

| Issue | Cause | Solution |
| --- | --- | --- |
| Unit not found | Running on a managed service. | Use EKS/GKE/AKS restore methods. |
| Failed to stop service | Service is not running. | Ignore and proceed with the restore. |


Step 5.2: Restore Snapshot

Command

  • This creates a new etcd data directory.
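A minimal sketch of the restore, run as root. The function name, snapshot path, and target directory are illustrative; the target should be a directory that does not exist yet. Recent etcd releases provide `snapshot restore` through `etcdutl`, so substitute that binary if your etcdctl no longer has the subcommand.

```shell
# Restore a snapshot into a fresh data directory.
# $1 = snapshot file, $2 = new data directory (should not exist yet).
restore_snapshot() {
  ETCDCTL_API=3 etcdctl snapshot restore "$1" --data-dir "$2"
}
```

Example call: `restore_snapshot /backup/etcd-snapshot.db /var/lib/etcd-restore`.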

Common Issues & Solutions

| Issue | Cause | Solution |
| --- | --- | --- |
| snapshot restore: permission denied | User lacks write access. | Use sudo or correct the directory permissions. |


Step 5.3: Replace Old etcd Data

Commands
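A sketch that keeps the old data directory as a timestamped copy rather than deleting it, then moves the restored directory into place. The function name and the `.bak-` suffix are made up for illustration; run it as root while etcd is stopped.

```shell
# Swap the restored data directory into place, keeping the old one.
# $1 = live data directory (e.g. /var/lib/etcd)
# $2 = restored data directory (e.g. /var/lib/etcd-restore)
swap_etcd_data() {
  # The second move only runs if the old directory was set aside successfully.
  mv "$1" "$1.bak-$(date +%Y%m%d%H%M%S)" && mv "$2" "$1"
}
```

Example call: `swap_etcd_data /var/lib/etcd /var/lib/etcd-restore`. If etcd runs as a dedicated user on your system, the new directory may also need its ownership adjusted.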


Step 5.4: Restart etcd and Control Plane

Commands
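A sketch that mirrors the stop step in reverse, again assuming the components run as systemd units with these names. On kubeadm clusters the usual equivalent is to move the static pod manifests back into /etc/kubernetes/manifests.

```shell
# Start the control plane, etcd first so the API server has a backend.
# Unit names are assumptions; adjust them to match your installation.
start_control_plane() {
  for unit in etcd kube-apiserver kube-controller-manager kube-scheduler; do
    sudo systemctl start "$unit"
  done
}
```

Run `start_control_plane` once the restored data directory is in place.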

Verify that the cluster is running:
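A quick check could look like the sketch below; the function name is hypothetical. It stops at the first failing command, so a non-zero exit status suggests the API server is not reachable yet.

```shell
# Confirm the API server answers and list the nodes.
cluster_is_up() {
  kubectl cluster-info && kubectl get nodes -o wide
}
```

The API server can take a short while to come back after etcd starts, so retrying `cluster_is_up` after a few seconds is reasonable.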

Common Issues & Solutions

| Issue | Cause | Solution |
| --- | --- | --- |
| Failed to start etcd | Corrupt data directory. | Delete /var/lib/etcd and restore again. |
| kubectl: Unable to connect | etcd is not running. | Check the logs with journalctl -u etcd -f. |


Step 6: Verify Cluster Health After Restore

Command
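A sketch of a post-restore health check, assuming the kubeadm default endpoint and certificate paths; the function name is illustrative. It checks etcd itself, then the API server's readiness endpoint, then the system pods.

```shell
# Check etcd, API server readiness, and system pods after a restore.
# Endpoint and certificate paths are assumed kubeadm defaults.
post_restore_health() {
  ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    endpoint health &&
  kubectl get --raw='/readyz?verbose' &&
  kubectl get pods -n kube-system
}
```

Workloads created after the snapshot was taken will be missing from the restored state, so compare the pod list against what you expect.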


Summary

  • etcd backup taken
  • Backup stored on remote storage
  • Cluster restored from snapshot
  • Control plane services restarted


