Node Management

Complete Guide to Node Management in AWS EKS

Managing nodes effectively in an EKS cluster ensures optimal scalability, security, and performance. Below is a detailed approach covering node lifecycle management, draining, scaling, troubleshooting, and common issues with solutions.


🛠 Step 1: Check Node Health & Status

Before making any changes, always verify node health.

kubectl get nodes -o wide

🔹 The STATUS column should show Ready.

🔹 Check node details:

kubectl describe node <node-name>

🔹 Check node resource usage:

kubectl top nodes

If a node is in NotReady, check logs to diagnose issues.


🚀 Step 2: Add New Nodes (Scaling Up)

Option 1: Managed Node Groups (Recommended)

AWS Managed Node Groups handle upgrades and scaling automatically.

🔹 Add a new node group:
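A typical eksctl command looks like this (the cluster name, node group name, instance type, and sizes below are placeholder values):

```shell
# Create a managed node group; my-cluster and standard-workers are example names
eksctl create nodegroup \
  --cluster my-cluster \
  --name standard-workers \
  --node-type t3.medium \
  --nodes 2 \
  --nodes-min 1 \
  --nodes-max 4
```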

Nodes are automatically added to the cluster.


Option 2: Self-Managed Nodes

1️⃣ Launch an EC2 instance with an EKS-compatible AMI

2️⃣ Join the new nodes to the cluster
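On an EKS-optimized Amazon Linux AMI, joining is typically done by the bootstrap script (usually invoked from the instance's user data; the cluster name is a placeholder), and the node's IAM role must be mapped in the aws-auth ConfigMap:

```shell
# On the new instance (EKS-optimized AMI): register with the cluster
/etc/eks/bootstrap.sh my-cluster

# From your workstation: map the node IAM role so the node can register
kubectl edit configmap aws-auth -n kube-system
```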

Verify the new nodes:
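For example:

```shell
# Watch until the new nodes report Ready
kubectl get nodes --watch
```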


📉 Step 3: Drain and Remove Nodes (Scaling Down)

Before removing a node, safely drain it to avoid workload disruption.

Step 1: Cordon the Node (Prevent New Pods)
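For example:

```shell
# Mark the node unschedulable; existing pods keep running
kubectl cordon <node-name>
```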

The node will not accept new pods.


Step 2: Drain the Node (Move Running Workloads)
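A typical drain command:

```shell
# Evict pods; skip DaemonSet pods and allow emptyDir volumes to be deleted
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```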

This gracefully evicts running pods before removal. DaemonSet pods (e.g., logging and monitoring agents) are not evicted and remain on the node.

Common Issue: "Cannot evict pod" error (often caused by a PodDisruptionBudget or by pods not managed by a controller)

🔹 Solution: Force drain with --force
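For example (note that --force evicts pods not managed by a controller, so their data is lost):

```shell
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --force
```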


Step 3: Delete the Node from the Cluster

🔹 For Managed Node Group:
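For example (cluster and node group names are placeholders):

```shell
# Deletes the node group; eksctl drains its nodes as part of deletion
eksctl delete nodegroup --cluster my-cluster --name standard-workers
```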

🔹 For Self-Managed Nodes:
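For example:

```shell
# Remove the node object from the cluster after draining it
kubectl delete node <node-name>
```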

🔹 Terminate the EC2 instance (if applicable):
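For example (the instance ID is a placeholder):

```shell
aws ec2 terminate-instances --instance-ids <instance-id>
```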

Verify node removal:
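For example:

```shell
# The removed node should no longer appear in the list
kubectl get nodes
```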


📦 Step 4: Upgrade Nodes (Rolling Update)

To upgrade nodes for security patches:

Step 1: Get Latest Amazon EKS AMI
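The recommended AMI ID can be read from SSM Parameter Store (the Kubernetes version in the path below is an example; substitute your cluster's version):

```shell
aws ssm get-parameter \
  --name /aws/service/eks/optimized-ami/1.29/amazon-linux-2/recommended/image_id \
  --query "Parameter.Value" --output text
```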

Step 2: Upgrade Nodes

🔹 Managed Node Groups (Rolling Update):
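A sketch with eksctl (cluster name, node group name, and version are placeholders):

```shell
# Performs a rolling replacement of nodes in the group
eksctl upgrade nodegroup \
  --cluster my-cluster \
  --name standard-workers \
  --kubernetes-version 1.29
```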

🔹 Self-Managed Nodes:

1️⃣ Launch new nodes with the latest AMI.

2️⃣ Drain old nodes:
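For example:

```shell
kubectl drain <old-node-name> --ignore-daemonsets --delete-emptydir-data
```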

3️⃣ Terminate old instances and update aws-auth-cm.yaml.

Ensure nodes are healthy post-upgrade:
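For example:

```shell
# STATUS should be Ready and VERSION should show the new kubelet version
kubectl get nodes -o wide
```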


⚠️ Step 5: Troubleshooting Common Node Issues

Below are common node-related issues and their fixes.

❌ Issue 1: Nodes in NotReady State

🔹 Check node status

🔹 Check if kubelet is running
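For example (the kubelet check runs on the node itself, e.g. over SSH or SSM Session Manager):

```shell
kubectl describe node <node-name>   # inspect Conditions and Events
sudo systemctl status kubelet       # on the node itself
```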

Solution: Restart kubelet
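On the node:

```shell
sudo systemctl restart kubelet
```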

Solution: If disk is full, clean up logs
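On the node, for example:

```shell
# Check disk usage, then trim journal logs older than one day
df -h
sudo journalctl --vacuum-time=1d
```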


❌ Issue 2: New Pods Stuck in Pending

🔹 Check node capacity:

🔹 Check if there are taints:
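Both can be read from the node description, for example:

```shell
kubectl describe node <node-name> | grep -A 6 "Allocatable"
kubectl describe node <node-name> | grep Taints
```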

Solution: Remove unwanted taints
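For example (the key and effect below are placeholders; the trailing "-" removes the taint):

```shell
kubectl taint nodes <node-name> key1=value1:NoSchedule-
```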


❌ Issue 3: Node Fails to Join Cluster

🔹 Check logs for join errors:
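On the node, the kubelet logs usually show why registration failed:

```shell
sudo journalctl -u kubelet --no-pager | tail -n 50
```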

🔹 Check if IAM role is missing permissions

Solution: Attach proper IAM policies.
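The node role needs at least these AWS-managed policies (the role name is a placeholder):

```shell
aws iam attach-role-policy --role-name <node-role-name> \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
aws iam attach-role-policy --role-name <node-role-name> \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
aws iam attach-role-policy --role-name <node-role-name> \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
```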


❌ Issue 4: Disk Pressure Warning

🔹 Check disk usage:
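On the node:

```shell
df -h
```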

🔹 Clean up unused Docker images
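On the node (which command applies depends on the container runtime; recent EKS AMIs use containerd):

```shell
# containerd runtime: remove unused images
sudo crictl rmi --prune

# Docker runtime: remove unused images, containers, and networks
docker system prune -af
```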


❌ Issue 5: Nodes Stuck in Terminating

🔹 Check if node.kubernetes.io/unreachable taint is present:
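For example:

```shell
kubectl describe node <node-name> | grep Taints
```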

Solution: Manually delete the node
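For example:

```shell
kubectl delete node <node-name>
```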


📌 Summary

🔹 Monitor and manage node health

🔹 Add new nodes using eksctl or manually

🔹 Drain and delete nodes properly to prevent downtime

🔹 Upgrade nodes with a rolling update strategy

🔹 Troubleshoot node issues efficiently


🚀 NEXT: Configure Cluster Autoscaler or Karpenter to auto-scale nodes.
