Backup

Task 4: Backup and Restore `etcd` in Kubernetes

etcd is the key-value store that Kubernetes uses to persist cluster state. Taking regular backups is critical to disaster recovery.

Step 1: Access the `etcd` Pod

Command

kubectl get pods -n kube-system -l component=etcd

This shows the etcd pod running in the control plane.
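Later steps need the pod's exact name. A minimal sketch of capturing it, assuming a cluster where the etcd pod carries the `component=etcd` label used above; the function name `etcd_pod_name` is illustrative and not part of any tool:

```shell
# Print the name of the first etcd pod in kube-system.
etcd_pod_name() {
  kubectl get pods -n kube-system -l component=etcd \
    -o jsonpath='{.items[0].metadata.name}'
}
```

For example, `ETCD_POD=$(etcd_pod_name)` stores the name for use with `kubectl exec` in Step 3.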

Common Issues & Solutions

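| Issue | Cause | Solution |
| --- | --- | --- |
| No resources found | Running on a managed service like EKS, GKE, or AKS, or etcd runs as a systemd service rather than a pod. | Managed Kubernetes does not give direct access to etcd; use the provider's backup methods. If etcd runs as a service, skip to Method 2 in Step 3. |
| Error: Unauthorized | The kubeconfig credentials are missing, expired, or not valid for this cluster. | Use a kubeconfig with admin access. On kubeadm control plane nodes, /etc/kubernetes/admin.conf is readable only by root, so switch to root or use sudo. |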

Step 2: Find the `etcdctl` Command Path

Command

# The etcd pod's container command is etcd itself, not etcdctl,
# so look for the etcdctl client on the node's PATH instead.
ETCDCTL=$(command -v etcdctl)
echo "$ETCDCTL"

If this prints nothing on a self-managed cluster, install etcdctl:

apt update && apt install etcd-client -y  # Debian/Ubuntu
yum install etcd -y  # CentOS/RHEL

Common Issues & Solutions

| Issue | Cause | Solution |
| --- | --- | --- |
| Command not found | etcdctl binary is missing. | Install etcdctl manually (apt install etcd-client or yum install etcd). |
| Error: etcd endpoint not set | ETCDCTL_ENDPOINTS is not configured. | Set ETCDCTL_ENDPOINTS manually. |
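etcdctl reads each connection flag from an `ETCDCTL_`-prefixed environment variable, so the endpoint and TLS settings can be exported once per shell session. A sketch under the assumption that the certificates live at the kubeadm default paths used in this guide; `set_etcdctl_env` is a hypothetical helper name:

```shell
# Export the connection settings once so later etcdctl calls can omit the flags.
# The certificate paths are the kubeadm defaults assumed by this guide; adjust as needed.
set_etcdctl_env() {
  export ETCDCTL_API=3
  export ETCDCTL_ENDPOINTS="${1:-https://127.0.0.1:2379}"
  export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
  export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
  export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key
}
```

Pick one style per command: etcdctl may reject a call that sets the same option through both a flag and its environment variable.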

Step 3: Take an `etcd` Snapshot

Method 1: Run in the `etcd` Pod (Recommended)

Command

# Replace etcd-master with the pod name from Step 1 (kubeadm names it etcd-<node-name>).
kubectl exec -it etcd-master -n kube-system -- sh -c \
"ETCDCTL_API=3 etcdctl snapshot save /var/lib/etcd/etcd-snapshot.db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key"

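This saves the snapshot inside the pod. On kubeadm clusters, /var/lib/etcd is mounted from the host, so the file also appears at the same path on the control plane node. The command needs a shell in the etcd image; if `sh` is not available there, use Method 2.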

Method 2: Run from the Control Plane Node

ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key

This saves a snapshot on the control plane node. The /backup directory must already exist and be writable.
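A fixed file name overwrites the previous backup on every run. One possible approach is to put a timestamp in the name; the helper names `snapshot_path` and `take_snapshot` and the naming pattern are assumptions for illustration, while the etcdctl flags are the ones from Method 2:

```shell
# Build a timestamped path such as <dir>/etcd-snapshot-YYYYmmdd-HHMMSS.db.
snapshot_path() {
  printf '%s/etcd-snapshot-%s.db\n' "${1:-/backup}" "$(date +%Y%m%d-%H%M%S)"
}

# Create the backup directory if needed, save a snapshot, and print its path.
take_snapshot() {
  snap_dir="${1:-/backup}"
  mkdir -p "$snap_dir" || return 1
  snap_file="$(snapshot_path "$snap_dir")"
  ETCDCTL_API=3 etcdctl snapshot save "$snap_file" \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key >&2 || return 1
  printf '%s\n' "$snap_file"
}
```

Calling `SNAPSHOT=$(take_snapshot /backup)` leaves the new file's path in `SNAPSHOT` for the verification and transfer steps.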

Verify Snapshot

ETCDCTL_API=3 etcdctl snapshot status /backup/etcd-snapshot.db
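The status check can be wrapped so that a missing or zero-byte file is caught before etcdctl runs. This is a sketch, with `verify_snapshot` as a hypothetical helper name; `--write-out=table` is etcdctl's table output format:

```shell
# Fail early on a missing or empty file, then print the snapshot's
# hash, revision, key count, and size as a table.
verify_snapshot() {
  [ -s "$1" ] || { echo "snapshot missing or empty: $1" >&2; return 1; }
  ETCDCTL_API=3 etcdctl snapshot status "$1" --write-out=table
}
```

For example: `verify_snapshot /backup/etcd-snapshot.db`.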

Common Issues & Solutions

| Issue | Cause | Solution |
| --- | --- | --- |
| Error: client connection failed | etcd is not running, or the endpoint is wrong. | Check kubectl get pods -n kube-system and restart etcd. |
| snapshot save: permission denied | User lacks write permission. | Use sudo or change the backup directory. |
| x509: certificate signed by unknown authority | Incorrect TLS paths. | Verify that the /etc/kubernetes/pki/etcd/*.crt files exist. |

Step 4: Move Snapshot to a Safe Location

A snapshot that exists only on the control plane node is lost along with that node, so copy it to another server or to an S3 bucket.

Copy Snapshot to Remote Server

scp /backup/etcd-snapshot.db user@remote-server:/backups/

Upload to S3

aws s3 cp /backup/etcd-snapshot.db s3://my-backup-bucket/
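A checksum recorded before the transfer makes it possible to confirm later that the copy is intact. A minimal sketch, assuming GNU coreutils `sha256sum` is available on both machines; the helper names are illustrative:

```shell
# Write <file>.sha256 next to the snapshot, using a relative file name
# so the checksum file still verifies after both files are copied elsewhere.
write_checksum() {
  ( cd "$(dirname "$1")" && sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}

# Verify a snapshot against its .sha256 file; returns non-zero on mismatch.
check_checksum() {
  ( cd "$(dirname "$1")" && sha256sum -c "$(basename "$1").sha256" )
}
```

Run `write_checksum /backup/etcd-snapshot.db` before copying, transfer both files, then run `check_checksum` on the destination.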

Common Issues & Solutions

| Issue | Cause | Solution |
| --- | --- | --- |
| scp: command not found | scp not installed. | Install with apt install openssh-client. |
| Access Denied on S3 | IAM policy issue. | Ensure the IAM role has s3:PutObject permission. |

Step 5: Restore `etcd` from Backup

Before restoring, stop the Kubernetes control plane.

Step 5.1: Stop Control Plane Services (Only for Self-Managed Clusters)

Commands

systemctl stop kube-apiserver kube-controller-manager kube-scheduler
systemctl stop etcd
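The systemctl commands apply when the control plane components run as systemd units. On kubeadm clusters they run as static pods instead, and the kubelet stops a static pod when its manifest leaves the manifests directory. A sketch of that alternative, assuming the kubeadm default directory /etc/kubernetes/manifests; the stash directory and function names are hypothetical:

```shell
# Move the static pod manifests aside so the kubelet stops the control plane pods.
stop_static_control_plane() {
  manifests="${1:-/etc/kubernetes/manifests}"
  stash="${2:-/etc/kubernetes/manifests-stopped}"
  mkdir -p "$stash" || return 1
  for name in kube-apiserver kube-controller-manager kube-scheduler etcd; do
    if [ -f "$manifests/$name.yaml" ]; then
      mv "$manifests/$name.yaml" "$stash/" || return 1
    fi
  done
}

# Move the manifests back so the kubelet starts the pods again (etcd first).
start_static_control_plane() {
  manifests="${1:-/etc/kubernetes/manifests}"
  stash="${2:-/etc/kubernetes/manifests-stopped}"
  for name in etcd kube-apiserver kube-controller-manager kube-scheduler; do
    if [ -f "$stash/$name.yaml" ]; then
      mv "$stash/$name.yaml" "$manifests/" || return 1
    fi
  done
}
```

On such a cluster, `start_static_control_plane` takes the place of the `systemctl start` commands after the data directory has been replaced.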

Common Issues & Solutions

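| Issue | Cause | Solution |
| --- | --- | --- |
| Unit not found | Running on a managed service, or on a kubeadm cluster where the control plane runs as static pods rather than systemd units. | On managed services, use the EKS/GKE/AKS restore methods. On kubeadm, move the manifests out of /etc/kubernetes/manifests to stop the pods. |
| Failed to stop service | Service is not running. | Ignore and proceed with restore. |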

Step 5.2: Restore Snapshot

Command

ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
--data-dir=/var/lib/etcd-new

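This writes the restored data to a new directory, /var/lib/etcd-new, and leaves the existing data directory untouched. In etcd 3.5, etcdctl marks this subcommand as deprecated in favor of `etcdutl snapshot restore`, which accepts the same --data-dir flag.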
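A small wrapper can add two conservative checks around the restore: the snapshot must be a non-empty file, and the target directory must not exist yet. This is a sketch rather than a required procedure, and `restore_snapshot` is a hypothetical helper name:

```shell
# Restore a snapshot into a fresh data directory, refusing to reuse an existing one.
restore_snapshot() {
  snapshot="$1"
  target="${2:-/var/lib/etcd-new}"
  [ -s "$snapshot" ] || { echo "snapshot missing or empty: $snapshot" >&2; return 1; }
  [ ! -e "$target" ] || { echo "refusing to overwrite existing $target" >&2; return 1; }
  ETCDCTL_API=3 etcdctl snapshot restore "$snapshot" --data-dir="$target"
}
```

For example: `restore_snapshot /backup/etcd-snapshot.db /var/lib/etcd-new`.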

Common Issues & Solutions

| Issue | Cause | Solution |
| --- | --- | --- |
| snapshot restore: permission denied | User lacks write access. | Use sudo or correct the directory permissions. |

Step 5.3: Replace Old `etcd` Data

Commands

mv /var/lib/etcd /var/lib/etcd-old
mv /var/lib/etcd-new /var/lib/etcd
chown -R etcd:etcd /var/lib/etcd
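If /var/lib/etcd-old already exists from an earlier restore, the first `mv` above places the current data inside it instead of renaming. One way to avoid that is a timestamped name for the old directory; the sketch below assumes that naming scheme, and `swap_etcd_data_dir` is a hypothetical helper:

```shell
# Move the current data directory to a timestamped name, then put the
# restored directory in its place.
swap_etcd_data_dir() {
  current="${1:-/var/lib/etcd}"
  restored="${2:-/var/lib/etcd-new}"
  [ -d "$restored" ] || { echo "restored directory not found: $restored" >&2; return 1; }
  if [ -e "$current" ]; then
    mv "$current" "$current-old-$(date +%Y%m%d-%H%M%S)" || return 1
  fi
  mv "$restored" "$current"
}
```

The `chown` from the commands above still applies afterwards when etcd runs under a dedicated `etcd` user.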

Step 5.4: Restart `etcd` and Control Plane

Commands

systemctl start etcd
systemctl start kube-apiserver kube-controller-manager kube-scheduler

Verify that the cluster is running:

kubectl get nodes
kubectl cluster-info
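The API server can take a little while to come back after a restart, so the checks above may fail at first. A sketch of a polling loop against the API server's `/readyz` endpoint; the helper name, the 5-second interval, and the default 120-second timeout are arbitrary choices for illustration:

```shell
# Poll the API server's readiness endpoint until it answers or the timeout expires.
wait_for_apiserver() {
  timeout_s="${1:-120}"
  waited=0
  until kubectl get --raw=/readyz >/dev/null 2>&1; do
    [ "$waited" -lt "$timeout_s" ] || {
      echo "API server not ready after ${timeout_s}s" >&2
      return 1
    }
    sleep 5
    waited=$((waited + 5))
  done
}
```

For example, `wait_for_apiserver 180 && kubectl get nodes` runs the node check only once the API server reports ready.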

Common Issues & Solutions

| Issue | Cause | Solution |
| --- | --- | --- |
| Failed to start etcd | Corrupt data directory. | Delete /var/lib/etcd and restore again. |
| kubectl: Unable to connect | etcd not running. | Check logs with journalctl -u etcd -f. |

Step 6: Verify Cluster Health After Restore

Command

kubectl get pods --all-namespaces
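On a busy cluster the full pod list is long. A field selector on the pod phase narrows the output to pods that are neither running nor completed; `list_unhealthy_pods` is an illustrative name:

```shell
# List only pods whose phase is neither Running nor Succeeded.
list_unhealthy_pods() {
  kubectl get pods --all-namespaces \
    --field-selector='status.phase!=Running,status.phase!=Succeeded'
}
```

A pod in the Running phase can still have containers that are not ready, so an empty result here is a first check rather than proof of full health.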

Summary

✅ etcd Backup Taken
✅ Backup Stored on Remote Storage
✅ Cluster Restored from Snapshot
✅ Control Plane Services Restarted



Last updated 11 months ago
