Troubleshooting
Here are some scenario-based EC2 interview questions with answers to help you prepare:
Scenario 1: High CPU Utilization on EC2 Instance
Question: You have an EC2 instance running a web application, and users report slow response times. After checking CloudWatch, you notice CPU utilization is consistently at 90%. How do you troubleshoot and fix this issue?
Answer:
Analyze the Workload:
Use htop or top to check which processes are consuming high CPU.
If a specific application is consuming excessive CPU, optimize it (e.g., adjust configurations, enable caching, or optimize database queries).
Scale the Instance:
Upgrade to a larger instance type with more vCPUs (e.g., m5.large → m5.xlarge).
Move to a Compute Optimized instance (e.g., C5 series).
Enable Auto Scaling:
Configure an Auto Scaling Group to launch additional instances when CPU exceeds a threshold (e.g., CPU > 80% for 5 minutes).
Use Load Balancing:
Distribute traffic using an Elastic Load Balancer (ELB) to multiple instances.
Optimize the Application:
Enable caching (e.g., use Redis, Memcached).
Reduce background tasks or move them to AWS Lambda for event-driven execution.
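The Auto Scaling trigger mentioned above (CPU > 80% for 5 minutes) boils down to checking that every datapoint in a recent window breaches the threshold. The following is a minimal sketch of that logic, assuming one-minute average CPU datapoints; the function name and sample values are hypothetical and this is not a CloudWatch API.

```python
from typing import Sequence


def sustained_breach(datapoints: Sequence[float], threshold: float, periods: int) -> bool:
    """Return True if the most recent `periods` datapoints all exceed `threshold`."""
    if periods <= 0 or len(datapoints) < periods:
        return False  # not enough data to decide
    return all(value > threshold for value in datapoints[-periods:])


# Hypothetical one-minute CPU averages (percent), oldest first.
cpu = [72.0, 85.5, 91.2, 90.4, 88.9, 93.1]
print(sustained_breach(cpu, threshold=80.0, periods=5))
```

Requiring every datapoint in the window to breach avoids scaling out on a single short spike.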
Scenario 2: EC2 Instance Not Accessible via SSH
Question: A developer reports that they cannot SSH into an EC2 instance. What steps would you take to troubleshoot this issue?
Answer:
Check Security Group Rules:
Ensure port 22 (SSH) is open to your IP (0.0.0.0/0 is not recommended for security).
Verify Key Pair:
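Ensure you're using the correct private key (.pem) to authenticate. Connect with ssh -i, passing the key file and the instance's user and address, and make sure the key file is readable only by you (chmod 400).
If the key pair is lost, create a new key pair, stop the instance, detach its root volume and attach it to another instance, add the new public key to ~/.ssh/authorized_keys on that volume, then reattach the volume to the original instance and start it.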
Check Network ACLs:
Ensure the VPC Network ACLs allow inbound and outbound traffic on port 22.
Restart SSH Service:
If you have access via EC2 Instance Connect (Amazon Linux 2, Ubuntu), restart the SSH service with sudo systemctl restart sshd (the unit is named ssh on Ubuntu).
Verify Instance Status:
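Check the EC2 instance status checks in the AWS Console. If the instance status check has failed, reboot the instance; if the system status check has failed, stop and start the instance so that it moves to healthy hardware.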
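The security group check above can be reasoned about offline: given the CIDR blocks allowed on port 22, does the developer's IP fall inside any of them, and is any rule open to the world? This is a small sketch using Python's standard ipaddress module; the function names and the documentation-range addresses are hypothetical.

```python
import ipaddress


def ssh_sources_matching(cidrs, client_ip):
    """Return the CIDR blocks from `cidrs` that contain `client_ip`."""
    ip = ipaddress.ip_address(client_ip)
    return [c for c in cidrs if ip in ipaddress.ip_network(c, strict=False)]


def is_open_to_world(cidrs):
    """Return True if any block has a /0 prefix (e.g., 0.0.0.0/0)."""
    return any(ipaddress.ip_network(c, strict=False).prefixlen == 0 for c in cidrs)


# Hypothetical inbound sources configured for port 22.
allowed = ["203.0.113.0/24"]
print(ssh_sources_matching(allowed, "203.0.113.25"))  # client inside the block
print(ssh_sources_matching(allowed, "198.51.100.7"))  # client outside the block
print(is_open_to_world(allowed))
```

An empty match list for the developer's IP points at the security group rather than the key pair or the SSH daemon.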
Scenario 3: EC2 Instance Stopped Unexpectedly
Question: An EC2 instance running a production application was stopped unexpectedly. How do you investigate and prevent this issue?
Answer:
Check AWS CloudTrail Logs:
Look for StopInstances API calls to see if a user manually stopped it.
Check Auto Scaling Termination Policy:
If the instance is part of an Auto Scaling Group, it might have been terminated due to scaling policies.
Verify Spot Instance Interruption:
If it's a Spot Instance, AWS might have reclaimed it because it needed the capacity back or because the Spot price rose above your maximum price.
Solution: Use Spot Fleet or switch to On-Demand/Reserved Instances for stability.
Check Billing Issues:
Exceeding Free Tier limits does not stop an instance (the extra usage is simply billed), but if the account has been suspended for non-payment, AWS might have stopped its resources.
Enable Termination Protection:
If the instance is critical, enable termination protection to prevent accidental termination, and stop protection to prevent accidental stops.
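Once CloudTrail events have been exported or fetched, finding who stopped the instance is a matter of filtering for StopInstances calls that name it. The sketch below assumes each event is a dict shaped like a CloudTrail record (eventName, eventTime, userIdentity, and requestParameters.instancesSet.items); the helper name and the sample record are hypothetical.

```python
def find_stop_events(events, instance_id):
    """Return (eventTime, caller ARN) pairs for StopInstances calls targeting instance_id."""
    matches = []
    for event in events:
        if event.get("eventName") != "StopInstances":
            continue
        params = event.get("requestParameters") or {}
        items = params.get("instancesSet", {}).get("items", [])
        if any(item.get("instanceId") == instance_id for item in items):
            arn = event.get("userIdentity", {}).get("arn", "unknown")
            matches.append((event.get("eventTime"), arn))
    return matches


# Hypothetical records; real ones come from CloudTrail event history.
sample_events = [
    {
        "eventName": "StopInstances",
        "eventTime": "2024-01-01T10:00:00Z",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/example"},
        "requestParameters": {"instancesSet": {"items": [{"instanceId": "i-0abc"}]}},
    },
    {"eventName": "DescribeInstances", "eventTime": "2024-01-01T10:05:00Z"},
]
print(find_stop_events(sample_events, "i-0abc"))
```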
Scenario 4: Data Loss After EC2 Termination
Question: You terminated an EC2 instance and lost all the application data. How can you prevent this from happening in the future?
Answer:
Use EBS Volumes Instead of Instance Store:
Instance Store data is lost when the instance is stopped, hibernated, or terminated.
Solution: Attach an EBS volume (gp3, io1, etc.), which can persist after termination.
Enable EBS Volume Snapshots:
Schedule automated snapshots using AWS Backup or Lambda.
Restore data from a snapshot in case of failure.
Use AMIs for Backups:
Create a custom Amazon Machine Image (AMI) of your instance for easy restoration.
Use S3 for Persistent Storage:
Store important data in Amazon S3 to avoid loss during instance termination.
Check "Delete on Termination" Setting:
Set the DeleteOnTermination attribute to false for any EBS volume that must survive termination; root volumes are deleted on termination by default.
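To audit the "Delete on Termination" setting, you can list the volumes that would be removed along with the instance. This sketch assumes the input is shaped like the BlockDeviceMappings list returned when describing an instance; the function name and volume IDs are hypothetical.

```python
def volumes_deleted_on_termination(block_device_mappings):
    """Return IDs of EBS volumes flagged DeleteOnTermination=True."""
    return [
        mapping["Ebs"]["VolumeId"]
        for mapping in block_device_mappings
        if mapping.get("Ebs", {}).get("DeleteOnTermination")
    ]


# Hypothetical mappings: a root volume and one data volume.
mappings = [
    {"DeviceName": "/dev/xvda", "Ebs": {"VolumeId": "vol-root", "DeleteOnTermination": True}},
    {"DeviceName": "/dev/sdf", "Ebs": {"VolumeId": "vol-data", "DeleteOnTermination": False}},
]
print(volumes_deleted_on_termination(mappings))
```

Any data volume that shows up in this list is a candidate for changing the attribute or for a snapshot schedule.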
Scenario 5: EC2 Instance Taking Too Long to Boot
Question: A newly launched EC2 instance is taking a long time to boot. How would you troubleshoot this?
Answer:
Check Instance System Logs:
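View boot logs on the instance with journalctl -b or dmesg, or fetch the console output with aws ec2 get-console-output.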
Check AWS Console System Logs (EC2 → Actions → Monitor & Troubleshoot → Get System Log).
Verify Boot Volume Size and Type:
Ensure the root EBS volume has enough space.
Upgrade from gp2 to gp3 for better performance.
Check Startup Services:
If there are too many startup scripts or services, disable unnecessary ones with systemctl disable (systemd-analyze blame shows which units take the longest to start).
Network Issues:
If the instance is waiting for a DHCP lease, manually assign a static private IP.
Try Another AMI:
The AMI might be outdated or corrupted. Use a different Amazon Linux 2 or Ubuntu AMI.
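When checking startup services, it helps to rank units by how long they took to start. The sketch below parses text in the style printed by systemd-analyze blame (a duration followed by a unit name) and sorts it slowest first. The exact output format is an assumption here, and the sample lines are hypothetical.

```python
import re

_UNIT_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 0.001, "us": 0.000001}
_TOKEN = re.compile(r"^(\d+(?:\.\d+)?)(h|min|ms|us|s)$")


def parse_blame(text):
    """Parse lines like '1min 2.345s foo.service' into (seconds, unit) pairs, slowest first."""
    results = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        seconds = 0.0
        for token in parts[:-1]:  # every token except the unit name is a duration
            match = _TOKEN.match(token)
            if match is None:
                break  # unrecognized token: skip this line
            seconds += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        else:
            results.append((seconds, parts[-1]))
    return sorted(results, reverse=True)


# Hypothetical output captured from an instance.
sample = """
     5.123s network.service
1min 2.345s cloud-final.service
      812ms sshd.service
"""
for seconds, unit in parse_blame(sample):
    print(f"{seconds:8.3f}s  {unit}")
```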
Here are more advanced EC2 scenario-based interview questions with detailed answers:
Scenario 6: Application Running on EC2 Becomes Unresponsive
Question: Your web application running on an EC2 instance suddenly becomes unresponsive. The instance is running, but you cannot access it via HTTP or SSH. How do you diagnose and fix this issue?
Answer:
Check Instance Status Checks:
Go to EC2 Console → Instance → Status Checks.
If a System Status Check has failed, stop and start the instance so that it moves to a new host, or launch a replacement in another AZ.
Check CPU and Memory Usage:
Connect using EC2 Instance Connect (if supported).
Run top or htop for CPU usage and free -m for memory usage.
Check Disk Space:
If disk usage is 100% (check with df -h), clear logs or extend the volume.
Verify Security Group & NACL Rules:
Ensure port 80 (HTTP)/443 (HTTPS) is open for web access.
Ensure port 22 (SSH) is open for admin access.
Check Web Server Logs:
If Apache/Nginx crashed, restart it with sudo systemctl restart followed by the service name (e.g., nginx, httpd, or apache2).
Restore from Snapshot:
If the instance is corrupted, create a new volume from an EBS snapshot of the original volume and attach it to a new instance.
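The disk space check in this scenario can also be scripted, for example as part of a health check that runs before the disk fills up. This is a minimal sketch using the standard library; the function names and the 90% limit are hypothetical choices, and the percentage is computed as used/total, which can differ slightly from the figure df reports.

```python
import shutil


def disk_usage_percent(path="/"):
    """Return used space on the filesystem containing `path`, as a percentage of total."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def is_nearly_full(path="/", limit_percent=90.0):
    """Return True when usage is at or above `limit_percent`."""
    return disk_usage_percent(path) >= limit_percent


print(f"root filesystem: {disk_usage_percent('/'):.1f}% used")
print("nearly full:", is_nearly_full("/"))
```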
Scenario 7: EC2 Instance with High Disk Read/Write Latency
Question: Your EC2 instance is experiencing high disk latency, slowing down the database and application. How would you investigate and fix this?
Answer:
Check Disk Utilization:
Use iostat (e.g., iostat -x 1) to check IOPS and latency.
Check EBS Volume Type:
If using gp2, upgrade to gp3 or io1/io2 for better performance.
Increase provisioned IOPS if using io1/io2.
Optimize Application Reads/Writes:
Enable database indexing and caching.
Use Amazon RDS if a managed database is needed.
Enable EBS Optimization:
Ensure instance type supports EBS-optimized volumes for better throughput.
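Whether moving from gp2 to gp3 raises baseline performance depends on volume size: gp2 baseline IOPS scale at 3 IOPS per GiB (minimum 100, maximum 16,000), while gp3 starts at a 3,000 IOPS baseline regardless of size. The arithmetic can be sketched as follows; the function names are hypothetical, and gp2 burst credits (which let smaller volumes reach 3,000 IOPS temporarily) are deliberately ignored.

```python
GP3_BASELINE_IOPS = 3000


def gp2_baseline_iops(size_gib):
    """gp2 baseline: 3 IOPS per GiB, clamped to the range 100..16000."""
    return min(max(3 * size_gib, 100), 16000)


def gp3_improves_baseline(size_gib):
    """Return True if the gp3 baseline exceeds the gp2 baseline at this size."""
    return GP3_BASELINE_IOPS > gp2_baseline_iops(size_gib)


for size in (20, 100, 1000, 6000):
    print(size, "GiB:", gp2_baseline_iops(size), "gp2 baseline IOPS,",
          "gp3 higher" if gp3_improves_baseline(size) else "gp3 not higher")
```

For gp2 volumes of 1,000 GiB or more, the baseline already meets or exceeds 3,000 IOPS, so a move to gp3 would need additional provisioned IOPS to help with latency.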
Scenario 8: EC2 Instance in a Private Subnet Cannot Reach the Internet
Question: You have an EC2 instance in a private subnet that needs to make outbound internet calls, but it cannot resolve domain names or connect externally. What is the issue and how do you resolve it?
Answer:
Check if a NAT Gateway is Set Up:
Private instances cannot access the internet unless a NAT Gateway or NAT Instance is set up in the public subnet.
Verify Route Table Configuration:
The private subnet's route table should have a default route that sends 0.0.0.0/0 to the NAT Gateway (a target ID beginning with nat-).
Check Security Group & NACL Rules:
Allow outbound access for HTTP (80), HTTPS (443).
Use VPC Endpoints (For AWS Services):
Instead of public internet access, use VPC Endpoints for services like S3, DynamoDB.
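The route table check can be expressed as a small function over the subnet's routes: find the active default route and see whether its target is a NAT Gateway. The sketch below assumes route dicts shaped like the Routes list returned when describing a route table; the function names and IDs are hypothetical.

```python
_TARGET_KEYS = ("NatGatewayId", "GatewayId", "InstanceId", "NetworkInterfaceId", "TransitGatewayId")


def default_route_target(routes):
    """Return (target_key, target_id) for the active 0.0.0.0/0 route, or None."""
    for route in routes:
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("State") == "blackhole":
            continue  # target no longer exists, so traffic is dropped
        for key in _TARGET_KEYS:
            if route.get(key):
                return key, route[key]
    return None


def has_nat_default_route(routes):
    """Return True if the default route points at a NAT Gateway."""
    target = default_route_target(routes)
    return target is not None and target[0] == "NatGatewayId"


# Hypothetical private-subnet routes.
private_routes = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0example", "State": "active"},
]
print(has_nat_default_route(private_routes))
```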
Scenario 9: EC2 Auto Scaling Group Not Launching New Instances
Question: You have an Auto Scaling Group (ASG) with a desired count of 3 instances, but it is stuck at 1 instance. What could be the issue?
Answer:
Check ASG Activity Logs:
In AWS Console:
EC2 → Auto Scaling Groups → Activity History
Verify Launch Template or Launch Configuration:
Ensure the AMI exists and the IAM role allows instance creation.
Check VPC & Subnet Availability:
If the selected subnet has no free IPs, instances won't launch.
Try launching a test instance in the same subnet.
Check Spot Instance Availability (If Used):
If using Spot Instances, AWS may not have spare capacity for the requested instance types, or the Spot price may exceed your maximum price.
Switch to On-Demand Instances temporarily.
Increase Service Quotas:
Check EC2 instance limits for your AWS region.
Request an increase if needed.
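For the subnet check, the number of usable addresses is smaller than the CIDR size suggests, because AWS reserves five addresses in every subnet (the first four and the last). A quick sketch of the arithmetic, with a hypothetical function name and example CIDRs:

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last one


def free_ips(cidr, addresses_in_use):
    """Estimate remaining assignable private IPs in a subnet."""
    total = ipaddress.ip_network(cidr).num_addresses
    return max(total - AWS_RESERVED_PER_SUBNET - addresses_in_use, 0)


print(free_ips("10.0.1.0/28", addresses_in_use=9))  # small subnet, nearly exhausted
print(free_ips("10.0.1.0/24", addresses_in_use=0))
```

A /28 subnet has only 11 assignable addresses, so an ASG spread across small subnets can stall well before its desired count.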
Scenario 10: EC2 AMI Deployment Results in Misconfigured Instances
Question: You created an AMI from a working EC2 instance and launched multiple new instances from it. However, the new instances are misconfigured and not working properly. How do you troubleshoot?
Answer:
Check AMI Customization:
Verify that the AMI retains necessary configurations (e.g., SSH keys, network settings).
Manually test the AMI by launching a single instance before scaling.
Ensure User Data Scripts Execute Correctly:
If your instance setup relies on a User Data script, check its logs (typically /var/log/cloud-init-output.log).
Verify Static Configuration Issues:
If the old instance had hardcoded IPs, hostnames, or certificates, update them dynamically.
Use the EC2 instance metadata service (http://169.254.169.254/latest/meta-data/) to fetch instance-specific details.
Check IAM Role and Instance Profile:
If the new instance needs AWS service access, assign the correct IAM role.
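Fetching instance-specific details from the metadata service, rather than baking them into the AMI, might look like the sketch below. It assumes IMDSv2, where a session token is requested with a PUT and then sent as a header on each metadata request; the helper names are hypothetical, and fetch_metadata only works when run on an EC2 instance.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"


def build_token_request(ttl_seconds=21600):
    """Build the IMDSv2 PUT request that returns a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(path, token):
    """Build a GET request for a metadata path such as 'instance-id'."""
    return urllib.request.Request(
        f"{IMDS_BASE}/meta-data/{path.lstrip('/')}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_metadata(path, timeout=2.0):
    """Fetch one metadata value; call this only from inside an EC2 instance."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(build_metadata_request(path, token), timeout=timeout) as resp:
        return resp.read().decode()
```

On an instance, fetch_metadata("instance-id") or fetch_metadata("local-ipv4") can replace values that were hardcoded on the original machine.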
Scenario 11: AWS EC2 Instance Costs Are Too High
Question: Your AWS bill is higher than expected, and you suspect EC2 costs are a major factor. How do you reduce costs without affecting performance?
Answer:
Use Reserved or Spot Instances:
Convert On-Demand instances to Reserved Instances (RI) for savings.
Use Spot Instances for non-critical workloads.
Right-size Instances:
Use AWS Compute Optimizer to find cost-effective instance types.
If CPU and RAM are underutilized, downgrade (e.g., m5.xlarge → m5.large).
Use Auto Scaling to Optimize Costs:
Configure ASG to scale down during off-peak hours.
Terminate Unused Instances:
Use AWS Trusted Advisor to find idle instances.
Set up Auto Stop for dev/test instances using AWS Lambda.
Optimize Storage:
Move infrequent-access data to S3 Glacier instead of keeping large EBS volumes.
Use gp3 instead of gp2 for better price-performance ratio.
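To estimate what stopping dev/test instances outside working hours could save, compare scheduled running hours against a full week. The sketch below covers only On-Demand compute hours; attached EBS volumes are still billed while an instance is stopped. The function names and the 12-hour, 5-day schedule are hypothetical.

```python
HOURS_PER_WEEK = 24 * 7


def running_fraction(hours_per_day, days_per_week):
    """Fraction of the week the instance is running under the schedule."""
    return (hours_per_day * days_per_week) / HOURS_PER_WEEK


def compute_savings_percent(hours_per_day, days_per_week):
    """Percentage of compute hours avoided compared with running 24x7."""
    return 100.0 * (1.0 - running_fraction(hours_per_day, days_per_week))


# Hypothetical schedule: 12 hours a day, Monday to Friday.
print(f"{compute_savings_percent(12, 5):.1f}% fewer instance-hours")
```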
Scenario 12: EC2 Instance with Long-running Background Processes Fails After Reboot
Question: You have an EC2 instance running a background job that needs to restart automatically after reboot, but it does not start. What steps would you take?
Answer:
Use Systemd Services:
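Create a systemd service file under /etc/systemd/system/ (for example, a hypothetical myjob.service).
Add a [Unit] section with a description, a [Service] section with ExecStart pointing at the job and a Restart= policy, and an [Install] section with WantedBy=multi-user.target.
Enable and start the service with sudo systemctl enable --now followed by the service name.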
Use AWS Systems Manager (SSM):
If manual SSH access is unreliable, use SSM Session Manager for shell access and SSM Run Command for automated management.
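For the systemd approach in this scenario, the unit file can be generated rather than typed by hand, which helps when the same job is deployed to many instances. This is a sketch only: the function name, job path, and the choice of Restart=on-failure are hypothetical and should be adapted to the actual job.

```python
def render_unit(description, exec_start, user="ec2-user"):
    """Render a minimal systemd service unit that starts at boot and restarts on failure."""
    return "\n".join([
        "[Unit]",
        f"Description={description}",
        "After=network-online.target",
        "Wants=network-online.target",
        "",
        "[Service]",
        f"ExecStart={exec_start}",
        f"User={user}",
        "Restart=on-failure",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ])


# Hypothetical job; write the result to /etc/systemd/system/myjob.service.
unit_text = render_unit("Background job", "/usr/local/bin/myjob --serve")
print(unit_text)
```

The [Install] section is what makes systemctl enable link the unit into the boot sequence; without it the job runs only when started manually.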
These real-world EC2 interview scenarios cover performance, security, networking, cost optimization, and automation.