Scaling using Karpenter

Scaled-out nodes run on Spot Instances at a 60% discount relative to On-Demand pricing

Deployed using the Helm chart

🔥 Problems before Karpenter:

  • EC2 Auto Scaling Groups required predefining instance types and capacities

  • Underutilized nodes due to static sizing and slow scaling decisions

  • Provisioning delays: new nodes took time to launch and register

  • High AWS costs due to over-provisioned capacity during traffic spikes

  • Manual management of ASG launch templates for multiple workloads

  • Node fragmentation: pods failed to schedule even with idle resources

  • Inflexible scaling during rapid CI/CD deployments or workload bursts

  • Hard to optimize the Spot and On-Demand mix without added complexity


🛠️ Solutions using Karpenter:

  • ✅ Installed Karpenter via Helm with IAM roles and OIDC trust on our EKS cluster

  • ✅ Defined Provisioners based on workload needs (e.g., general, GPU, Spot)

  • ✅ Enabled automatic node provisioning with real-time pod-driven scaling

  • ✅ Used consolidation to automatically downsize and bin-pack underutilized nodes

  • ✅ Allowed Karpenter to choose optimal instance types and sizes across Spot and On-Demand

  • ✅ Tuned ttlSecondsAfterEmpty for faster scale-down of idle nodes

  • ✅ Used taints, tolerations, and labels to isolate workloads by team/priority

  • ✅ Integrated with Prometheus and CloudWatch for cost visibility and scaling metrics
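
As an illustration of the Provisioner setup listed above, here is a minimal sketch that builds two Provisioner manifests as Python dicts and prints them as JSON. It assumes the legacy karpenter.sh/v1alpha5 API (the version that exposes ttlSecondsAfterEmpty). The Provisioner names, the taint, the label, the 30-second TTL, and the "default" providerRef are all hypothetical placeholders, not values taken from our cluster. In that API version, consolidation and ttlSecondsAfterEmpty cannot be set on the same Provisioner, so the sketch puts them on separate ones.

```python
import json

# Assumption: legacy karpenter.sh/v1alpha5 Provisioner API.
API_VERSION = "karpenter.sh/v1alpha5"


def make_provisioner(name, capacity_types, *, ttl_seconds_after_empty=None,
                     consolidation=False, taints=None, labels=None):
    """Build a Provisioner manifest as a plain dict."""
    # In v1alpha5 these two scale-down settings are mutually exclusive.
    if consolidation and ttl_seconds_after_empty is not None:
        raise ValueError(
            "consolidation and ttlSecondsAfterEmpty cannot be combined")

    spec = {
        # Restrict which capacity types Karpenter may launch.
        "requirements": [{
            "key": "karpenter.sh/capacity-type",
            "operator": "In",
            "values": list(capacity_types),
        }],
        # Hypothetical AWSNodeTemplate name (subnets, security groups).
        "providerRef": {"name": "default"},
    }
    if consolidation:
        spec["consolidation"] = {"enabled": True}
    if ttl_seconds_after_empty is not None:
        spec["ttlSecondsAfterEmpty"] = ttl_seconds_after_empty
    if taints:
        spec["taints"] = list(taints)
    if labels:
        spec["labels"] = dict(labels)

    return {
        "apiVersion": API_VERSION,
        "kind": "Provisioner",
        "metadata": {"name": name},
        "spec": spec,
    }


# General-purpose pool: Spot or On-Demand, scaled down via consolidation.
general = make_provisioner("general", ["spot", "on-demand"],
                           consolidation=True)

# Spot-only pool for tainted batch work, removed shortly after it empties.
spot_batch = make_provisioner(
    "spot-batch", ["spot"],
    ttl_seconds_after_empty=30,  # hypothetical value
    taints=[{"key": "workload", "value": "batch", "effect": "NoSchedule"}],
    labels={"team": "ci"},
)

if __name__ == "__main__":
    manifest_list = {"apiVersion": "v1", "kind": "List",
                     "items": [general, spot_batch]}
    print(json.dumps(manifest_list, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl, which accepts JSON manifests as well as YAML. Pods meant for the spot-batch pool would need a matching toleration for the workload=batch taint.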


✅ Here's a summary of what we achieved:

  • ✅ ~45% cost reduction by leveraging Spot instances dynamically

  • ✅ Improved pod scheduling speed: workloads spin up in under a minute

  • ✅ ~30% better resource utilization via bin-packing and consolidation

  • ✅ Eliminated ASG complexity: no static instance-type mapping

  • ✅ Highly responsive scaling based on actual pod needs

  • ✅ Faster rollout of high-load jobs: CI/CD jobs no longer queue for resources

  • ✅ Simplified ops: no more managing launch templates or capacity planning

  • ✅ Flexible provisioning: different instance types per workload with minimal config
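
To relate the ~45% cost reduction to the 60% Spot discount, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that the saving comes only from Spot pricing and that the node-hour total is unchanged. Consolidation and bin-packing also reduce cost, so the real Spot share may be lower than this estimate. The function names are hypothetical.

```python
SPOT_DISCOUNT = 0.60  # from these notes: Spot at a 60% discount


def blended_savings(spot_fraction, spot_discount=SPOT_DISCOUNT):
    """Fractional saving versus running every node-hour On-Demand."""
    if not 0.0 <= spot_fraction <= 1.0:
        raise ValueError("spot_fraction must be between 0 and 1")
    # Only the Spot share of node-hours earns the discount.
    return spot_fraction * spot_discount


def spot_fraction_for(target_savings, spot_discount=SPOT_DISCOUNT):
    """Spot share of node-hours needed to reach a target saving."""
    return target_savings / spot_discount


if __name__ == "__main__":
    share = spot_fraction_for(0.45)
    print(f"Spot share needed for 45% savings: {share:.0%}")
    print(f"Savings at that share: {blended_savings(share):.0%}")
```

Under these assumptions, a 45% saving at a 60% discount corresponds to roughly three quarters of node-hours running on Spot (0.45 / 0.60 = 0.75).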
