Common issues

Let’s go through common issues in an AWS VPC and its components, grouped by component and phrased the way a senior engineer or interviewer would expect.


🧱 VPC — Common Issues and Pitfalls

| Component | Common Issue | Explanation / Real-world Example |
| --- | --- | --- |
| VPC (CIDR block) | Overlapping CIDRs | Two VPCs with overlapping CIDRs can’t be peered, and overlapping ranges can’t be routed correctly over VPN or Transit Gateway. Common in multi-account setups. |
| VPC (CIDR block) | Insufficient IP range | Picking too small a CIDR (e.g., /24) can cause subnet exhaustion later when you scale. |
| VPC (CIDR block) | Wrong Region or AZ mapping | Creating subnets in a Region or AZ that doesn’t match your workload (for example, an AZ where the required instance type isn’t offered) leads to resource launch failures. |
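A CIDR plan can be checked for overlaps before any VPC is created. The following is a minimal sketch using only Python’s standard `ipaddress` module; the VPC names and ranges in `plan` are hypothetical and only illustrate the check.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs):
    """Return every pair of named CIDR blocks whose ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]


# Hypothetical multi-account CIDR plan (names and ranges are made up)
plan = {
    "prod": "10.0.0.0/16",
    "staging": "10.0.128.0/20",  # falls inside prod's range
    "shared": "10.1.0.0/16",
}

print(find_overlaps(plan))  # [('prod', 'staging')]
```

Any pair this reports would need a new range before the two VPCs can be peered.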


🌐 Subnets

| Issue | Explanation |
| --- | --- |
| Private vs. public confusion | A subnet is public only if its route table sends 0.0.0.0/0 to an Internet Gateway attached to the VPC; a private subnet needs a route to a NAT Gateway for outbound access. Forgetting either one causes connectivity issues. |
| Route table misassociation | Subnet associated with the wrong route table → traffic doesn’t flow as expected. |
| Not enough IPs for scaling | Small subnet CIDRs like /28 cause Auto Scaling launches or ECS task / EKS pod placements to fail for lack of available IPs. |
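To see why a /28 runs out so quickly, it helps to count the addresses that are actually assignable. AWS reserves five addresses in every subnet (the first four and the last one), so a rough capacity check can be sketched as follows; the example CIDRs are arbitrary.

```python
import ipaddress

# AWS reserves 5 addresses per subnet: network address, VPC router,
# DNS, one reserved for future use, and the broadcast address.
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr):
    """Number of addresses left for ENIs in a subnet of the given size."""
    net = ipaddress.ip_network(cidr)
    return max(net.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


for cidr in ("10.0.0.0/28", "10.0.1.0/24", "10.0.16.0/20"):
    print(cidr, usable_ips(cidr))
# 10.0.0.0/28 11
# 10.0.1.0/24 251
# 10.0.16.0/20 4091
```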


🚏 Route Tables

| Issue | Explanation |
| --- | --- |
| Missing default route (0.0.0.0/0) | Without this route pointing to an IGW or NAT Gateway, outbound internet access fails. |
| Incorrect peering routes | Forgetting to add routes for VPC peering or TGW attachments means inter-VPC communication doesn’t work. |
| Multiple route tables confusion | Associating the wrong route table with a subnet breaks connectivity for that subnet only, which can look intermittent when instances are spread across subnets. |
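Route selection uses the most specific (longest-prefix) matching route, so a missing entry simply means no match. The sketch below models a route table as a plain dict to illustrate this; it is a simplified model rather than an AWS API call, and the target IDs are hypothetical placeholders.

```python
import ipaddress


def resolve_route(route_table, destination_ip):
    """Return the target of the most specific matching route, or None."""
    dest = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no route: the packet is dropped
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical private-subnet route table with no default route yet
private_rt = {
    "10.0.0.0/16": "local",
    "10.1.0.0/16": "pcx-hypothetical",
}

print(resolve_route(private_rt, "10.1.4.7"))  # pcx-hypothetical
print(resolve_route(private_rt, "8.8.8.8"))   # None

# Adding the default route restores outbound access
private_rt["0.0.0.0/0"] = "nat-hypothetical"
print(resolve_route(private_rt, "8.8.8.8"))   # nat-hypothetical
```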


🌉 Internet Gateway (IGW) & NAT Gateway

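| Issue | Explanation |
| --- | --- |
| IGW not attached to VPC | Even if the subnet route points to an IGW, if that IGW isn’t attached to the VPC → no internet access. |
| Elastic IP not assigned to NAT | A public NAT Gateway requires an Elastic IP at creation; a private NAT Gateway (no EIP) or a NAT instance without a public IP gives private subnets no outbound internet. |
| NAT Gateway in wrong subnet | A public NAT Gateway must sit in a public subnet with a route to the IGW, not in a private one. Common beginner mistake. |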
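The NAT placement rule can be checked mechanically: a subnet counts as public when its default route targets an internet gateway. The sketch below works on plain dicts and assumes you have already collected each subnet’s routes and each NAT Gateway’s subnet; the function names and IDs are hypothetical.

```python
def is_public_subnet(routes):
    """A subnet is public when its default route targets an internet gateway."""
    return routes.get("0.0.0.0/0", "").startswith("igw-")


def nat_placement_problems(subnet_routes, nat_subnet_by_id):
    """List NAT gateways that live in a subnet without a default route to an IGW."""
    problems = []
    for nat_id, subnet_id in sorted(nat_subnet_by_id.items()):
        if not is_public_subnet(subnet_routes[subnet_id]):
            problems.append(f"{nat_id} is in {subnet_id}, which has no default route to an IGW")
    return problems


# Hypothetical inventory: subnet -> routes, NAT gateway -> subnet
subnet_routes = {
    "subnet-public": {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-example"},
    "subnet-private": {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-a"},
}
nat_gateways = {"nat-a": "subnet-public", "nat-b": "subnet-private"}

for problem in nat_placement_problems(subnet_routes, nat_gateways):
    print(problem)
```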


🔐 Security Groups (SG)

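| Issue | Explanation |
| --- | --- |
| Inbound rule missing | SSH (22), HTTPS (443), or custom port not open → no connectivity. |
| Outbound rules restricted accidentally | Removing the default allow-all egress rule leaves no outbound rules, so connections initiated by the instance fail. Replies to allowed inbound traffic still flow because SGs are stateful. |
| SG referencing wrong resource | Referencing the wrong SG between a load balancer and its targets (e.g., the instance SG doesn’t allow the load balancer’s SG as a source) causes denied connections. |
| Circular dependencies | Security groups that reference each other in inline rules cause dependency-cycle errors in Terraform/CDK; defining the rules as separate resources typically avoids this. |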
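Reference cycles between security groups can be spotted before an IaC tool reports them. Here is a minimal sketch of a depth-first search over a "which SG references which" mapping; the mapping format and group names are assumptions for illustration, not the output of any real tool.

```python
def find_reference_cycle(references):
    """Return one cycle of security-group references as a list, or None."""
    visiting, done = [], set()

    def visit(sg):
        if sg in done:
            return None
        if sg in visiting:
            # Found a back edge: slice out the loop and close it
            return visiting[visiting.index(sg):] + [sg]
        visiting.append(sg)
        for ref in references.get(sg, ()):
            cycle = visit(ref)
            if cycle:
                return cycle
        visiting.pop()
        done.add(sg)
        return None

    for sg in sorted(references):
        cycle = visit(sg)
        if cycle:
            return cycle
    return None


# Hypothetical groups: app and db reference each other in their rules
references = {
    "alb-sg": ["app-sg"],
    "app-sg": ["db-sg"],
    "db-sg": ["app-sg"],
}

print(find_reference_cycle(references))  # ['app-sg', 'db-sg', 'app-sg']
```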


🚧 Network ACLs (NACLs)

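| Issue | Explanation |
| --- | --- |
| Stateless nature misunderstood | People forget that return traffic must be allowed explicitly, e.g., allowing inbound 443 but not outbound ephemeral ports (1024-65535). |
| Conflicts with SG rules | A NACL can deny traffic even if the SG allows it, because both must permit the traffic. For inbound traffic the NACL is evaluated first, at the subnet boundary. |
| Default NACL replaced accidentally | A newly created custom NACL denies all traffic until allow rules are added, so associating it with a subnet blocks everything. |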
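The stateless behaviour is easier to see in a toy model. The sketch below evaluates rules in ascending rule-number order and checks the request and the response independently; it is deliberately simplified (ports only, with protocol and CIDR matching left out), and the rule tuples are a made-up format.

```python
def nacl_allows(rules, port):
    """Evaluate rules in ascending rule-number order; the first match decides."""
    for _number, (low, high), action in sorted(rules):
        if low <= port <= high:
            return action == "allow"
    return False  # implicit final deny (the "*" rule)


def https_request_succeeds(inbound, outbound, client_port):
    """Stateless check: the request and its response are evaluated separately."""
    return nacl_allows(inbound, 443) and nacl_allows(outbound, client_port)


# Rule format (assumed for this sketch): (rule number, (low port, high port), action)
inbound = [(100, (443, 443), "allow")]
outbound_missing = []                             # return traffic never allowed
outbound_fixed = [(100, (1024, 65535), "allow")]  # ephemeral ports opened

print(https_request_succeeds(inbound, outbound_missing, 50000))  # False
print(https_request_succeeds(inbound, outbound_fixed, 50000))    # True
```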


🔌 VPC Peering / Transit Gateway

| Issue | Explanation |
| --- | --- |
| Overlapping CIDRs | Prevents peering and breaks routing through a TGW → must redesign the CIDR plan. |
| Missing route entries | Even if the peering connection is active, traffic doesn’t flow until the route tables on both sides are updated. |
| No DNS resolution across peering | Need to enable the DNS resolution option on the peering connection so that hostnames resolve to private IPs in the peer VPC. |
| Peering not transitive | Instance in VPC A can’t reach VPC C through VPC B → must peer A and C directly or use a TGW hub-and-spoke design. |
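Non-transitivity means reachability is decided by direct peering connections only, never by paths. A small sketch makes the consequence concrete, including how many extra connections a full mesh would need; the VPC names are hypothetical.

```python
from itertools import combinations


def can_reach(peerings, src, dst):
    """Peering is not transitive: only a direct connection carries traffic."""
    return frozenset((src, dst)) in peerings


def missing_for_full_mesh(vpcs, peerings):
    """Pairs of VPCs that still need their own peering connection."""
    return [
        (a, b)
        for a, b in combinations(sorted(vpcs), 2)
        if not can_reach(peerings, a, b)
    ]


# Hypothetical setup: A-B and B-C are peered, A-C is not
peerings = {frozenset(("vpc-a", "vpc-b")), frozenset(("vpc-b", "vpc-c"))}

print(can_reach(peerings, "vpc-a", "vpc-b"))  # True
print(can_reach(peerings, "vpc-a", "vpc-c"))  # False, even though B sits between them
print(missing_for_full_mesh(["vpc-a", "vpc-b", "vpc-c"], peerings))  # [('vpc-a', 'vpc-c')]
```

The number of pairs grows quadratically with the number of VPCs, which is why a Transit Gateway tends to be preferred beyond a handful of them.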


VPC Endpoints

| Issue | Explanation |
| --- | --- |
| Wrong route table association | Gateway endpoint route missing from the target subnets’ route tables. |
| DNS not resolving to endpoint | Need to enable “Private DNS” for interface endpoints. |
| Cross-account access failure | An interface endpoint in one account can’t access services in another account without proper resource policies. |


🧭 DNS & DHCP Options Set

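| Issue | Explanation |
| --- | --- |
| Custom DNS settings override Amazon DNS | Private hosted zones stop resolving if enableDnsSupport or enableDnsHostnames is false, or if instances use a custom resolver that doesn’t forward to the Amazon-provided DNS. |
| Wrong DHCP options set | Incorrect domain-name or domain-name-servers value breaks EC2 name resolution. |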
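When checking a DHCP options set, it helps to know where the Amazon-provided resolver lives: at the base of the VPC’s primary CIDR plus two, and also at 169.254.169.253. The sketch below derives that address and checks a list of configured name servers against it; the function names and sample values are hypothetical.

```python
import ipaddress

LINK_LOCAL_AMAZON_DNS = "169.254.169.253"


def amazon_dns_address(vpc_cidr):
    """The Amazon-provided resolver sits at the primary VPC CIDR base + 2."""
    return str(ipaddress.ip_network(vpc_cidr).network_address + 2)


def uses_amazon_dns(name_servers, vpc_cidr):
    """True if any configured name server is the Amazon-provided resolver."""
    amazon = {"AmazonProvidedDNS", LINK_LOCAL_AMAZON_DNS, amazon_dns_address(vpc_cidr)}
    return any(server in amazon for server in name_servers)


print(amazon_dns_address("10.0.0.0/16"))                # 10.0.0.2
print(uses_amazon_dns(["AmazonProvidedDNS"], "10.0.0.0/16"))  # True
print(uses_amazon_dns(["192.168.1.53"], "10.0.0.0/16"))       # False
```

A `False` result is not necessarily a fault, since a custom resolver may forward to the Amazon DNS, but it is the first thing to confirm when private zones stop resolving.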


🔄 VPN & Direct Connect

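| Issue | Explanation |
| --- | --- |
| BGP misconfiguration | Wrong ASN or route advertisement keeps the BGP session down, or leaves routes unadvertised, so the VPN carries no traffic. |
| IKE version mismatch | One side uses IKEv2, other uses IKEv1 → tunnel won’t establish. |
| Redundancy not set up | Single VPN connection with only one tunnel in use → if that tunnel goes down, connectivity is lost completely. |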


💡 Pro Tip (for interviews)

If they ask:

“What are common networking issues you’ve faced in AWS?”

You can say:

“Most issues come from route misconfigurations, CIDR overlaps, or misunderstood subnet roles. I usually start debugging by checking route tables, NAT/IGW setup, and DNS resolution, since in my experience those three account for most VPC-related outages.”

