Common issues
This page goes through common issues in an AWS VPC and its components, grouped by component and phrased the way a senior engineer or interviewer would expect.
🧱 VPC — Common Issues and Pitfalls
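| Component | Common issue | Explanation / real-world example |
| --- | --- | --- |
| VPC (CIDR block) | Overlapping CIDRs | A VPC peering connection cannot be created between two VPCs whose CIDRs overlap. VPN and Transit Gateway attachments can still be created, but traffic cannot be routed between the overlapping ranges. Common in multi-account setups. |
| | Insufficient IP range | Picking too small a CIDR (e.g., /24, which is 256 addresses in total) can exhaust subnet space later when you scale. |
| | Wrong Region or AZ mapping | Each subnet lives in exactly one Availability Zone of the VPC's Region. Creating subnets in an AZ that doesn't match your workload (for example, one that doesn't offer the instance type you need) leads to resource launch failures. |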
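A CIDR plan can be checked for overlaps before any VPC is created. The following is a minimal sketch using Python's standard `ipaddress` module; the account names and ranges in `plan` are hypothetical and only illustrate the check.

```python
import ipaddress
from itertools import combinations


def find_overlaps(vpc_cidrs):
    """Return pairs of VPC names whose CIDR blocks overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in vpc_cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]


# Hypothetical CIDR plan for three accounts (not from any real environment).
plan = {
    "prod": "10.0.0.0/16",
    "staging": "10.0.128.0/20",  # falls inside prod's range
    "shared": "10.1.0.0/16",
}

overlaps = find_overlaps(plan)
print(overlaps)
```

Any pair reported here could not be peered and would need a new range before the VPCs are connected.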
🌐 Subnets
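| Issue | Explanation |
| --- | --- |
| Private vs. public confusion | An Internet Gateway attaches to the VPC, not to a subnet. A subnet is public only if its route table has a route to that IGW. Users forget this route for public subnets, or the route to a NAT Gateway for private subnets, which causes connectivity issues. |
| Route table misassociation | A subnet associated with the wrong route table sends traffic to the wrong target, so traffic doesn't flow as expected. |
| Not enough IPs for scaling | Small subnet CIDRs like /28 (11 usable addresses once AWS reserves 5) cause Auto Scaling launches, ECS tasks, or EKS pods to fail for lack of available IPs. |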
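The arithmetic behind subnet exhaustion is easy to check. AWS reserves five addresses in every subnet (the first four and the last one), so the usable count is the block size minus five. A small sketch, with `usable_ips` as an illustrative helper name:

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr):
    """Number of addresses AWS can actually assign in a subnet of this size."""
    net = ipaddress.ip_network(cidr)
    return max(net.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


for cidr in ("10.0.0.0/28", "10.0.0.0/24", "10.0.0.0/20"):
    print(cidr, usable_ips(cidr))
```

A /28 leaves 11 addresses, which a handful of load balancer nodes, NAT Gateways, and tasks can consume quickly.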
🚏 Route Tables
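| Issue | Explanation |
| --- | --- |
| Missing default route (0.0.0.0/0) | Without this route pointing to an IGW (public subnets) or a NAT Gateway (private subnets), outbound internet access fails. |
| Incorrect peering routes | Forgetting to add routes for VPC peering or TGW attachments, in the route tables on both sides, means inter-VPC communication doesn't work. |
| Multiple route tables confusion | Associating the wrong route table with a subnet breaks connectivity for that subnet only, which can look intermittent when just some subnets or AZs are affected. |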
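When debugging the route table issues above, it helps to remember that a VPC route table picks the most specific matching route (longest prefix match). The sketch below models that lookup in plain Python; the route targets (`pcx-example`, `nat-example`) are placeholder identifiers, not real resources.

```python
import ipaddress


def lookup(route_table, destination):
    """Return the target of the most specific matching route, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no route: the packet is dropped
    # Longest prefix wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical private-subnet route table without a default route.
private_rt = {
    "10.0.0.0/16": "local",
    "10.1.0.0/16": "pcx-example",
}
print(lookup(private_rt, "10.1.4.7"))
print(lookup(private_rt, "8.8.8.8"))

# Adding the default route restores outbound internet access.
private_rt["0.0.0.0/0"] = "nat-example"
print(lookup(private_rt, "8.8.8.8"))
```

The second lookup returns `None` because nothing matches a public address until the default route is added.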
🌉 Internet Gateway (IGW) & NAT Gateway
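| Issue | Explanation |
| --- | --- |
| IGW not attached to VPC | Even if the subnet route points to an IGW, there is no internet access unless the IGW is attached to the VPC. |
| Elastic IP not assigned to NAT | A public NAT Gateway requires an Elastic IP. A NAT Gateway created with private connectivity has no EIP and cannot give private subnets outbound internet access. |
| NAT Gateway in wrong subnet | A NAT Gateway must sit in a public subnet with a route to the IGW, not in a private subnet. A common beginner mistake. |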
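The NAT placement rule can be expressed as a small audit over route tables. This is a sketch under simplifying assumptions: each subnet is represented only by its route table (a dict of destination to target), and targets are recognised by their `igw-` / `nat-` ID prefixes. The subnet names and IDs are hypothetical.

```python
def default_target(route_table):
    """Target of the 0.0.0.0/0 route, or an empty string if there is none."""
    return route_table.get("0.0.0.0/0", "")


def is_public(route_table):
    """A subnet is public if its default route targets an Internet Gateway."""
    return default_target(route_table).startswith("igw-")


def audit(subnets, nat_subnet):
    """Return a list of problems with NAT placement and private-subnet egress."""
    problems = []
    if not is_public(subnets[nat_subnet]):
        problems.append(
            f"{nat_subnet}: hosts the NAT Gateway but has no default route to an IGW"
        )
    for name, route_table in subnets.items():
        if name == nat_subnet or is_public(route_table):
            continue
        if not default_target(route_table).startswith("nat-"):
            problems.append(
                f"{name}: private subnet without a default route to a NAT Gateway"
            )
    return problems


# Hypothetical topology: one public subnet, two private subnets.
subnets = {
    "public-a": {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-example"},
    "private-a": {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-example"},
    "private-b": {"10.0.0.0/16": "local"},
}

for problem in audit(subnets, nat_subnet="private-a"):
    print(problem)
```

Placing the NAT Gateway in `private-a` is flagged, and `private-b` is flagged in either case because it has no default route at all.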
🔐 Security Groups (SG)
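| Issue | Explanation |
| --- | --- |
| Inbound rule missing | SSH (22), HTTPS (443), or a custom port is not open to the client's source, so connections fail. |
| Outbound rules restricted accidentally | Removing the default allow-all outbound rule blocks every connection the instance initiates. Replies to allowed inbound traffic still pass, because security groups are stateful. |
| SG referencing wrong resource | The instance SG should allow inbound traffic from the load balancer's SG. Referencing the wrong group causes denied connections. |
| Circular dependencies | Security groups that reference each other in inline rules cause dependency-cycle errors in Terraform/CDK. The usual fix is to declare the rules as separate resources from the groups. |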
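To see why mutually referencing security groups trip infrastructure-as-code tools, treat the references as a directed graph and look for a cycle. The sketch below does a depth-first search; the group names and the `refs` mapping are hypothetical, and this is only an illustration of the idea, not how Terraform or CDK detect cycles internally.

```python
def find_cycle(refs):
    """Depth-first search for a reference cycle; return its path or None."""
    visiting, done = [], set()

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # Found a back edge: slice out the loop and close it.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for nxt in refs.get(node, ()):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for node in refs:
        cycle = visit(node)
        if cycle:
            return cycle
    return None


# Hypothetical setup: the ALB group's egress rule names the app group,
# and the app group's ingress rule names the ALB group.
refs = {"alb-sg": ["app-sg"], "app-sg": ["alb-sg", "db-sg"], "db-sg": []}
print(find_cycle(refs))
```

When the rules are declared as standalone resources, the groups themselves no longer reference each other, so both can be created first and the rules attached afterwards.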
🚧 Network ACLs (NACLs)
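| Issue | Explanation |
| --- | --- |
| Stateless nature misunderstood | NACLs are stateless, so return traffic must be allowed explicitly. A typical mistake is allowing inbound 443 but not the outbound ephemeral ports (1024-65535) that carry the replies. |
| Conflicts with SG rules | A NACL deny blocks traffic even if the SG allows it, because traffic must pass both. For inbound traffic the NACL is evaluated first, at the subnet boundary. |
| Default NACL replaced accidentally | The default NACL allows all traffic, but a newly created custom NACL denies everything until rules are added. Associating it with a subnet blocks all traffic. |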
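The stateless behaviour is easier to reason about with a small model. NACL rules are evaluated in ascending rule-number order, the first match decides, and anything unmatched hits the implicit deny. The sketch below assumes a simplified rule shape of `(rule_number, (low_port, high_port), action)` and ignores protocol and CIDR matching; the client port 50000 is an arbitrary example.

```python
def nacl_allows(rules, port):
    """First matching rule in ascending rule-number order wins; default is deny."""
    for _number, (low, high), action in sorted(rules, key=lambda r: r[0]):
        if low <= port <= high:
            return action == "allow"
    return False  # the implicit "*" deny rule


# A web server subnet: HTTPS is allowed in, but nothing is allowed out.
inbound = [(100, (443, 443), "allow")]
outbound = []

client_port = 50000  # ephemeral port chosen by the client
print(nacl_allows(inbound, 443))           # request reaches the server
print(nacl_allows(outbound, client_port))  # reply is dropped on the way out

# Allowing the ephemeral range outbound lets the reply through.
outbound.append((100, (1024, 65535), "allow"))
print(nacl_allows(outbound, client_port))
```

A security group would not need the outbound rule, because it tracks the connection and allows the reply automatically.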
🔌 VPC Peering / Transit Gateway
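| Issue | Explanation |
| --- | --- |
| Overlapping CIDRs | Prevents peering outright, and breaks routing between the overlapping ranges over a TGW. The CIDR plan must be redesigned. |
| Missing route entries | Even if the peering connection is active, traffic doesn't flow until the route tables on both sides are updated. |
| No DNS resolution across the peering | Enable the DNS resolution option on the peering connection so that instance hostnames in the peer VPC resolve to private IP addresses. |
| Peering not transitive | An instance in VPC A can't reach VPC C through VPC B. Use a TGW or a hub-and-spoke design instead. |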
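Non-transitivity also explains why peering does not scale to many VPCs. The sketch below models peering as a set of unordered pairs (assuming routes are already in place) and counts how many connections a full mesh needs; the VPC names are placeholders.

```python
def can_reach(peerings, src, dst):
    """Peering is non-transitive: only a direct connection carries traffic."""
    return frozenset((src, dst)) in peerings


def full_mesh_peerings(vpc_count):
    """Peering connections needed so that every VPC can reach every other one."""
    return vpc_count * (vpc_count - 1) // 2


# A is peered with B, and B with C, but A and C are not peered.
peerings = {frozenset(("A", "B")), frozenset(("B", "C"))}
print(can_reach(peerings, "A", "B"))
print(can_reach(peerings, "A", "C"))

print(full_mesh_peerings(10))
```

Ten VPCs need 45 peering connections for full connectivity, whereas a Transit Gateway needs one attachment per VPC.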
☁️ Endpoints (S3, DynamoDB, PrivateLink)
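| Issue | Explanation |
| --- | --- |
| Wrong route table association | Gateway endpoints (S3, DynamoDB) add their route only to the route tables associated with the endpoint. Subnets that use other route tables don't get the endpoint route. |
| DNS not resolving to endpoint | Enable "Private DNS" on interface endpoints. This requires enableDnsSupport and enableDnsHostnames to be true on the VPC. |
| Cross-account access failure | An interface endpoint in one account can't reach a service in another until the provider allows the consumer's principal on the endpoint service and the relevant resource policies permit the access. |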
🧭 DNS & DHCP Options Set
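| Issue | Explanation |
| --- | --- |
| Custom DNS settings override Amazon DNS | A DHCP options set that points at custom DNS servers bypasses the Amazon-provided resolver, so private hosted zones stop resolving unless those servers forward to it. Private hosted zones also stop resolving if enableDnsSupport or enableDnsHostnames is false. |
| Wrong DHCP options set | An incorrect domain-name or domain-name-servers value breaks EC2 name resolution. |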
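When custom DNS servers need to forward to the Amazon-provided resolver, its address is the base of the VPC's primary CIDR plus two (it is also reachable at 169.254.169.253). A minimal sketch of that calculation, with `amazon_dns_ip` as an illustrative helper name:

```python
import ipaddress


def amazon_dns_ip(primary_vpc_cidr):
    """Address of the Amazon-provided DNS resolver: VPC network base + 2."""
    net = ipaddress.ip_network(primary_vpc_cidr)
    return str(net.network_address + 2)


print(amazon_dns_ip("10.0.0.0/16"))
print(amazon_dns_ip("172.31.0.0/16"))
```

For a VPC with several CIDR blocks, only the primary block's "+2" address hosts the resolver.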
🔄 VPN & Direct Connect
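| Issue | Explanation |
| --- | --- |
| BGP misconfiguration | A wrong ASN keeps the BGP session from establishing, so a dynamically routed VPN stays down. Wrong route advertisements leave it without usable routes. |
| IKE version mismatch | One side is restricted to IKEv2 and the other to IKEv1, so the tunnel won't establish. |
| Redundancy not set up | Only one tunnel or one customer gateway device is configured. If it goes down, connectivity is lost completely. |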
💡 Pro Tip (for interviews)
If they ask:
“What are common networking issues you’ve faced in AWS?”
You can say:
“Most issues come from route misconfigurations, CIDR overlaps, or misunderstood subnet roles. I usually start debugging by checking route tables, NAT/IGW setup, and DNS resolution, since in my experience those three account for most VPC-related outages.”