Node Management

Complete Guide to Node Management in AWS EKS

Managing nodes effectively in an EKS cluster ensures optimal scalability, security, and performance. Below is a detailed approach covering node lifecycle management, draining, scaling, troubleshooting, and common issues with solutions.


πŸ›  Step 1: Check Node Health & Status

Before making any changes, always verify node health.

kubectl get nodes -o wide

πŸ”Ή STATUS should be Ready

πŸ”Ή Check node details:

kubectl describe node <node-name>

πŸ”Ή Check node resource usage:

kubectl top nodes

βœ… If a node is NotReady, check its logs to diagnose the issue.
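A minimal sketch of how to dig into a NotReady node. The `kubectl` commands run from your workstation; the `journalctl` command assumes you can reach the node via SSH or SSM Session Manager, and `<node-name>` is a placeholder:

```shell
# Node conditions (MemoryPressure, DiskPressure, Ready, etc.) usually explain the state
kubectl describe node <node-name> | grep -A 10 "Conditions:"

# Recent cluster events related to the node
kubectl get events --field-selector involvedObject.name=<node-name>

# On the node itself (SSH / SSM): inspect the kubelet logs
journalctl -u kubelet --since "30 min ago" --no-pager | tail -n 50
```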


πŸš€ Step 2: Add New Nodes (Scaling Up)

Option 1: Managed Node Groups (Recommended)

AWS Managed Node Groups handle provisioning, upgrades, and scaling automatically.

πŸ”Ή Add a new node group:
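A hedged example using eksctl (cluster name, node group name, instance type, and node counts are placeholders to adapt):

```shell
eksctl create nodegroup \
  --cluster <cluster-name> \
  --name <nodegroup-name> \
  --node-type t3.medium \
  --nodes 3 \
  --nodes-min 2 \
  --nodes-max 5 \
  --managed
```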

βœ… Nodes are automatically added to the cluster.


Option 2: Self-Managed Nodes

1️⃣ Launch EC2 instances with an EKS-optimized AMI

2️⃣ Join the new instances to the cluster

βœ… Verify the new nodes:
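A minimal sketch of the join-and-verify steps, assuming the EKS-optimized Amazon Linux AMI (which ships the bootstrap script at the path below); `<cluster-name>` is a placeholder:

```shell
# On the new instance (typically run via EC2 user data at launch):
/etc/eks/bootstrap.sh <cluster-name>

# The node's IAM role must be mapped in the aws-auth ConfigMap,
# or the node will never register:
kubectl describe configmap aws-auth -n kube-system

# From your workstation, confirm the nodes registered and became Ready:
kubectl get nodes -o wide
```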


πŸ“‰ Step 3: Drain and Remove Nodes (Scaling Down)

Before removing a node, safely drain it to avoid workload disruption.

Step 1: Cordon the Node (Prevent New Pods)
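The cordon command, with `<node-name>` as a placeholder:

```shell
kubectl cordon <node-name>

# The node now shows SchedulingDisabled alongside its status:
kubectl get nodes
```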

βœ… The node will not accept new pods.


Step 2: Drain the Node (Move Running Workloads)

βœ… This gracefully evicts running pods before removal.

βœ… DaemonSet pods (e.g., logging/monitoring agents) remain on the node.

Common Issue: "Cannot evict pod" error

πŸ”Ή Solution: Force drain with --force (use with care: pods not managed by a controller are deleted and their state is lost)
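A forced drain looks like this:

```shell
# --force also evicts bare pods that have no controller; they will not be rescheduled
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --force
```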


Step 3: Delete the Node from the Cluster

πŸ”Ή For Managed Node Group:

πŸ”Ή For Self-Managed Nodes:

πŸ”Ή Terminate the EC2 instance (if applicable):

βœ… Verify node removal:
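The removal steps above can be sketched as follows (cluster, node group, node, and instance identifiers are all placeholders):

```shell
# Managed node group: scale it down, or delete the group entirely
eksctl scale nodegroup --cluster <cluster-name> --name <nodegroup-name> --nodes 2
eksctl delete nodegroup --cluster <cluster-name> --name <nodegroup-name>

# Self-managed node: remove the Kubernetes node object
kubectl delete node <node-name>

# Terminate the backing EC2 instance (if it is not managed by an Auto Scaling group)
aws ec2 terminate-instances --instance-ids <instance-id>

# Verify removal
kubectl get nodes
```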


πŸ“¦ Step 4: Upgrade Nodes (Rolling Update)

To upgrade nodes for security patches:

Step 1: Get Latest Amazon EKS AMI
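AWS publishes the recommended EKS-optimized AMI ID as an SSM parameter; a sketch of the lookup (replace `1.29` with your cluster's Kubernetes version and adjust the region):

```shell
aws ssm get-parameter \
  --name /aws/service/eks/optimized-ami/1.29/amazon-linux-2/recommended/image_id \
  --region us-east-1 \
  --query "Parameter.Value" --output text
```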

Step 2: Upgrade Nodes

πŸ”Ή Managed Node Groups (Rolling Update):
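With eksctl, a rolling update of a managed node group looks like this (names and version are placeholders):

```shell
eksctl upgrade nodegroup \
  --cluster <cluster-name> \
  --name <nodegroup-name> \
  --kubernetes-version 1.29
```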

πŸ”Ή Self-Managed Nodes:

1️⃣ Launch new nodes with the latest AMI.

2️⃣ Drain old nodes:

3️⃣ Terminate old instances and update aws-auth-cm.yaml.

βœ… Ensure nodes are healthy post-upgrade:
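For example:

```shell
# All nodes should be Ready and show the new kubelet version in the VERSION column
kubectl get nodes

# Resource usage sanity check (requires metrics-server)
kubectl top nodes
```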


⚠️ Step 5: Troubleshooting Common Node Issues

Below are common node-related issues and their fixes.

❌ Issue 1: Nodes in NotReady State

πŸ”Ή Check node status

πŸ”Ή Check if kubelet is running

βœ… Solution: Restart kubelet

βœ… Solution: If disk is full, clean up logs
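A sketch of the checks and fixes above, run on the affected node via SSH or SSM (log paths and retention are assumptions to adapt):

```shell
# Is the kubelet running?
systemctl status kubelet

# Restart it if it has crashed or hung
sudo systemctl restart kubelet

# If the node reports DiskPressure, find and free space
df -h
sudo journalctl --vacuum-time=2d
```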


❌ Issue 2: New Pods Stuck in Pending

πŸ”Ή Check node capacity:

πŸ”Ή Check if there are taints:

βœ… Solution: Remove unwanted taints
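The capacity and taint checks above, with `<node-name>` and the taint key/value as placeholders:

```shell
# Allocatable CPU/memory vs. what pending pods request
kubectl describe node <node-name> | grep -A 6 "Allocatable:"

# Taints that may be repelling the pending pods
kubectl describe node <node-name> | grep Taints

# Remove an unwanted taint (the trailing "-" deletes it)
kubectl taint nodes <node-name> <key>=<value>:NoSchedule-
```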


❌ Issue 3: Node Fails to Join Cluster

πŸ”Ή Check logs for join errors:

πŸ”Ή Check if IAM role is missing permissions

βœ… Solution: Attach proper IAM policies.
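A sketch of both steps; the role name is a placeholder, while the three policy ARNs are the standard ones EKS worker nodes require:

```shell
# On the node: look for bootstrap/registration errors
journalctl -u kubelet --no-pager | tail -n 50

# Attach the standard worker-node policies to the node's IAM role
aws iam attach-role-policy --role-name <node-role> \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
aws iam attach-role-policy --role-name <node-role> \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
aws iam attach-role-policy --role-name <node-role> \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
```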


❌ Issue 4: Disk Pressure Warning

πŸ”Ή Check disk usage:

πŸ”Ή Clean up unused Docker images
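On the node, the check and cleanup might look like this (which prune command applies depends on the AMI's container runtime):

```shell
# Check disk usage
df -h

# containerd-based AMIs (current EKS default): remove unused images
sudo crictl rmi --prune

# Older Docker-based AMIs:
docker system prune -a -f
```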


❌ Issue 5: Nodes Stuck in Terminating

πŸ”Ή Check if node.kubernetes.io/unreachable taint is present:

βœ… Solution: Manually delete the node
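For example:

```shell
# Check for the unreachable taint
kubectl describe node <node-name> | grep Taints

# Manually remove the stuck node object
kubectl delete node <node-name>
```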


πŸ“Œ Summary

βœ” Monitor and manage node health

βœ” Add new nodes using eksctl or manually

βœ” Drain and delete nodes properly to prevent downtime

βœ” Upgrade nodes with a rolling-update strategy

βœ” Troubleshoot node issues efficiently


πŸš€ NEXT: Configure Cluster Autoscaler or Karpenter to auto-scale nodes.
