Courier partner API
API request failed scenario
Excellent question, Pavan 👏 — this is a key challenge in logistics aggregator platforms like Shiprocket, because they rely on unreliable third-party courier APIs that can fail, time out, or return inconsistent data.
Let’s go step by step on how Shiprocket (or any similar aggregator like Pickrr or NimbusPost) handles API failures robustly while maintaining reliability and good user experience 👇
⚙️ 1. Retry Mechanism (with Backoff)
When a booking or tracking API call to a courier partner fails, Shiprocket does not fail the entire workflow immediately.
Instead, they implement:
Automatic retries using exponential backoff: e.g., retry after 2s → 4s → 8s → 16s (max 3–5 attempts)
Circuit breaker pattern — if multiple requests to the same courier keep failing, temporarily stop calling that courier’s API and mark it unavailable.
🧠 Tools used:
Retry libraries (e.g., Resilience4j) or custom retry logic in Go/Rust
Background job queues (like Celery, Sidekiq, or Kafka consumers)
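A minimal sketch of the retry-with-backoff idea in Python (the `request_fn` callable and the timeout-only error handling are simplifying assumptions, not Shiprocket's actual code):

```python
import random
import time

MAX_ATTEMPTS = 4  # typically capped at 3–5 attempts

def call_with_backoff(request_fn, *args, **kwargs):
    """Call a courier API, retrying on timeout with exponential backoff + jitter."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return request_fn(*args, **kwargs)
        except TimeoutError:
            if attempt == MAX_ATTEMPTS:
                raise  # exhausted: caller routes to fallback courier or DLQ
            delay = 2 ** attempt + random.uniform(0, 1)  # 2s, 4s, 8s (+ jitter)
            time.sleep(delay)
```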
📨 2. Queue-Based Request Handling
Every courier API call (like Create Shipment, Track Order) is not synchronous — it goes through a message queue.
Flow:
Order request enters Kafka/RabbitMQ queue
Worker service picks it up and sends API request to courier
If the courier API fails → worker retries or requeues the message with delay
Status stored in DB as `pending_retry`
This ensures no data loss and eventual consistency, even if couriers are down temporarily.
🧩 Benefit: API failure doesn’t block the user. Seller immediately gets a “shipment under processing” response while retries happen in the background.
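Sketched below, under assumed `queue`, `db`, and `courier_client` interfaces (the delayed-publish call stands in for a RabbitMQ delayed exchange or similar), is what such a worker loop might look like:

```python
import json

RETRY_DELAYS = [5, 20, 60]  # seconds between requeues

def handle_booking_message(message, queue, db, courier_client):
    """Consume a booking request; on courier failure, requeue with delay."""
    payload = json.loads(message.body)
    order = payload["order"]
    attempt = payload.get("attempt", 0)
    try:
        awb = courier_client.create_shipment(order)
        db.update_status(order["id"], "BOOKED", awb=awb)
    except Exception:
        if attempt < len(RETRY_DELAYS):
            payload["attempt"] = attempt + 1
            db.update_status(order["id"], "pending_retry")
            queue.publish(json.dumps(payload), delay=RETRY_DELAYS[attempt])
        else:
            queue.publish_to_dlq(json.dumps(payload))  # see the DLQ section below
```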
💾 3. Fallback to Alternate Courier
If a courier consistently fails (API timeout or bad response), Shiprocket can:
Re-route the shipment request to another available courier based on pre-defined rules:
Cheapest available
Fastest available
Reliable score (historical SLA success rate)
This logic is implemented inside the Courier Selection Engine, which maintains health scores for each courier partner.
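A toy version of that selection logic (the field names, the 0.95 health threshold, and the scoring weights are all illustrative, not Shiprocket's real engine):

```python
def pick_courier(couriers):
    """Choose the best healthy courier; callers fall back down the ranked list."""
    healthy = [c for c in couriers if c["health"] >= 0.95]  # historical SLA score
    if not healthy:
        raise RuntimeError("no healthy courier available for this shipment")
    # Lower score = better: blend reliability, price, and speed
    return min(
        healthy,
        key=lambda c: (1 - c["health"]) * 100 + c["rate"] * 0.5 + c["eta_days"] * 10,
    )

# couriers = [{"code": "delhivery", "health": 0.97, "rate": 55.0, "eta_days": 2}, ...]
```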
📊 4. Failure Logging & Monitoring
All failed API interactions are logged in detail:
Courier name
Request payload
Response or error
Retry count
Last attempt time
These logs feed into:
ELK Stack (Elasticsearch + Kibana) for search
Prometheus + Grafana dashboards for metrics like:
API success rate per courier
Latency
Failure trend
📢 Alerts are sent (Slack/Email) if a courier’s API failure rate crosses a threshold (e.g., 10% in last 5 minutes).
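One simple way to implement that threshold check is a sliding window over recent call outcomes; a sketch (the window size, threshold, and alert hook are assumptions):

```python
import time
from collections import deque

WINDOW_SECS = 300        # last 5 minutes
ALERT_THRESHOLD = 0.10   # alert when >10% of calls fail

_events = deque()  # (timestamp, failed) for each courier API call

def record_call(failed: bool) -> float:
    """Record one call's outcome and return the current windowed failure rate."""
    now = time.time()
    _events.append((now, failed))
    while _events and _events[0][0] < now - WINDOW_SECS:
        _events.popleft()
    return sum(1 for _, f in _events if f) / len(_events)

# if record_call(failed=True) > ALERT_THRESHOLD: fire the Slack/Email alert
```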
🧰 5. Dead Letter Queue (DLQ)
If all retry attempts fail (e.g., courier API is down for hours):
The message is pushed into a Dead Letter Queue
The operations/support team manually reviews and reprocesses these shipments later
This prevents permanent data loss and ensures manual intervention is possible.
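Continuing the worker sketch from earlier, its `publish_to_dlq` call might wrap the original message with failure context so ops can triage it (the queue interface and routing key are hypothetical):

```python
import json
import time

def push_to_dlq(queue, payload, error):
    """Park an exhausted message, with context, for manual review and replay."""
    record = {
        "original": payload,
        "error": str(error),
        "attempts": payload.get("attempt", 0),
        "failed_at": time.time(),
    }
    queue.publish(json.dumps(record), routing_key="shipments.dlq")
```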
🧩 6. Idempotency & Duplicate Handling
Since retries can lead to duplicate API calls, Shiprocket uses idempotency keys:
A unique key (like order ID or AWB request ID) is sent with each API call.
If the courier processes the same request again, it safely ignores duplicates.
This avoids accidental multiple bookings or charges.
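In request terms this might look like the following sketch (the header name and URL are placeholders; each courier defines its own scheme). The key must be stable across retries, so it is generated once and stored on the order:

```python
import requests

def create_shipment(order):
    """Book a shipment, passing an idempotency key so retries can't double-book."""
    resp = requests.post(
        "https://api.example-courier.com/v1/shipments",  # placeholder URL
        json=order,
        # Generated once (e.g., uuid4) and persisted with the order,
        # so every retry sends the exact same key
        headers={"Idempotency-Key": order["idempotency_key"]},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```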
🔒 7. Transactional Integrity
Important steps (booking, label generation, tracking update) are handled via transactional states in DB:
`order_status = [CREATED, BOOKING_PENDING, BOOKED, FAILED, RETRYING, COMPLETED]`
This ensures the system knows exactly where the failure happened and can resume safely.
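A tiny state-machine guard makes those transitions explicit (the transition map below is an assumption inferred from the statuses listed above):

```python
ALLOWED = {
    "CREATED":         {"BOOKING_PENDING"},
    "BOOKING_PENDING": {"BOOKED", "RETRYING", "FAILED"},
    "RETRYING":        {"BOOKED", "FAILED"},
    "BOOKED":          {"COMPLETED"},
}

def transition(order, new_status):
    """Reject illegal status jumps so a restarted worker resumes from a known state."""
    if new_status not in ALLOWED.get(order["status"], set()):
        raise ValueError(f"illegal transition {order['status']} -> {new_status}")
    order["status"] = new_status  # in a real DB: atomic UPDATE ... WHERE status = old
```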
🔁 8. Async Tracking Updates
Tracking updates are not dependent on live API calls. Shiprocket uses:
Webhooks (preferred, near real-time)
Polling (backup) — if webhooks fail or are delayed, Shiprocket polls every X minutes to keep status fresh.
This ensures that tracking reliability is high even if some courier’s webhook system is unstable.
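A backup poller could look like this sketch (the staleness cutoff, the DB query, and the `courier_client.track` call are assumed interfaces):

```python
import time

STALE_AFTER_SECS = 2 * 3600  # poll only shipments silent for 2+ hours

def poll_stale_shipments(db, courier_client):
    """Refresh tracking for shipments whose webhooks have gone quiet."""
    cutoff = time.time() - STALE_AFTER_SECS
    for shipment in db.find_active_shipments(last_update_before=cutoff):
        status = courier_client.track(shipment["awb"])
        db.save_tracking(shipment["id"], status)

# Run every X minutes from a scheduler (cron, Celery beat, etc.)
```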
🧠 Example Scenario
Case: Booking with Delhivery API Fails
Shiprocket sends `POST /shipments` to the Delhivery API → timeout
Retry after 5s → still fails
Retry again after 20s → success ✅
Status in DB changes from `BOOKING_PENDING` → `BOOKED`
Label generated and sent to the seller
If all retries fail → fallback to XpressBees → label regenerated.
💡 Summary Table
| Failure | Handling Strategy | Tool / Mechanism |
| --- | --- | --- |
| API timeout | Retry with backoff | Resilience4j / custom logic |
| API down | Circuit breaker | Health check + temporary disable |
| All retries failed | Dead Letter Queue | Kafka / RabbitMQ DLQ |
| Courier unstable | Switch courier | Courier selection engine |
| Duplicate request | Idempotency key | UUID per shipment |
| Tracking API failure | Polling fallback | Cron / scheduler service |
API Request cases
Perfect follow-up, Pavan 👏
When Shiprocket connects to multiple third-party courier APIs, there are a lot of things that can go wrong — both from Shiprocket’s side (bad data, wrong payload) and the courier’s side (downtime, throttling, etc.).
Here’s a breakdown of the most common reasons API requests fail in such logistics integrations — grouped by type 👇
⚙️ 1. Client-Side (Shiprocket) Issues
These happen when Shiprocket sends something invalid or incomplete to the courier API.
| Issue | Cause | Typical Error |
| --- | --- | --- |
| Invalid payload / missing fields | Required parameters (e.g., pincode, weight, dimensions) missing or in the wrong format | `400 Bad Request – Missing parameter: destination_pincode` |
| Invalid authentication | Wrong API key, expired token, or incorrect headers | `401 Unauthorized` or `403 Forbidden` |
| Invalid address / pincode not serviceable | Courier API rejects shipment creation for unsupported areas | `422 Unprocessable Entity – Service not available for given pincode` |
| Invalid weight or dimensions | Exceeds courier limit or invalid numeric value | `400 Bad Request – Weight out of range` |
| Duplicate request (same AWB) | Resending the same booking without an idempotency key | `409 Conflict – Duplicate shipment request` |
| Unsupported content type | Sending JSON instead of XML (or vice versa) | `415 Unsupported Media Type` |
| Serialization error | Malformed JSON/XML in the request body | `Malformed JSON at line 1 column 50` |
🧩 2. Courier-Side (Third-Party API) Issues
These are issues with the courier partner’s systems; Shiprocket can’t control them but must handle them gracefully.
| Issue | Cause | Typical Error |
| --- | --- | --- |
| API downtime / maintenance | Courier system temporarily unavailable | `503 Service Unavailable` |
| Rate limiting / throttling | Too many requests sent in a short time | `429 Too Many Requests` |
| High latency / timeout | Courier server takes too long to respond | `504 Gateway Timeout` |
| Internal server error | Courier system crashes or has a bug | `500 Internal Server Error` |
| Partial service outage | Some endpoints work (e.g., tracking), others fail (e.g., booking) | ~50% success rate seen in monitoring |
| Inconsistent response format | Courier API changes response fields unexpectedly | JSON key mismatch or missing AWB in response |
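Taken together with the client-side errors above, these map naturally onto a retry policy: transient courier and network failures are retried, while 4xx validation errors fail fast. A sketch of that classification (the status-code sets are a reasonable default, not a published Shiprocket policy):

```python
RETRYABLE = {429, 500, 502, 503, 504}                # transient: retry with backoff
NON_RETRYABLE = {400, 401, 403, 404, 409, 415, 422}  # bad data/auth: fix, don't retry

def classify(status_code: int) -> str:
    """Decide how a failed courier call should be handled."""
    if status_code in RETRYABLE:
        return "retry"
    if status_code in NON_RETRYABLE:
        return "fail_fast"   # requeueing won't help; surface to seller/ops
    return "manual_review"   # unknown codes go to a human
```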
🧰 3. Network / Infrastructure Issues
Problems in the communication path between Shiprocket and courier systems.
| Issue | Cause | Typical Error |
| --- | --- | --- |
| DNS resolution failure | Courier API domain not resolvable | `Could not resolve host: api.delhivery.com` |
| SSL certificate errors | Expired or invalid SSL cert | `SSLHandshakeException` |
| Connection reset / dropped | Network connection broken mid-request | `Connection reset by peer` |
| Firewall / proxy blocking | Shiprocket’s IP blocked or filtered | No response / TCP timeout |
| Cloudflare or CDN blocking | Rate limiting or geo-blocking at the CDN layer | `403 Forbidden – Access denied` |
📦 4. Data Integrity / Business Logic Issues
When the API request itself succeeds technically, but the business validation fails.
| Issue | Cause | Typical Error |
| --- | --- | --- |
| Invalid courier code | Courier not onboarded or inactive | `400 – Unknown courier code` |
| Invalid order status | Booking attempted for a canceled order | `409 Conflict – Order already canceled` |
| COD limit exceeded | COD amount > partner’s allowed limit | `422 – COD limit exceeded (₹50,000)` |
| Restricted item type | Product type not allowed by courier | `422 – Dangerous goods not allowed` |
| Invalid pickup address | Pickup location not verified / approved | `422 – Pickup not available for this address` |
| Duplicate AWB assignment | AWB number already used | `409 – Duplicate airway bill` |
🧠 5. Shiprocket Internal Service Issues
Failures within Shiprocket’s microservices before the request even reaches the courier.
| Issue | Cause | Symptom / Error |
| --- | --- | --- |
| Internal timeout (upstream delay) | Rate engine or courier service slow to respond | `504 Upstream timeout` |
| Cache inconsistency | Stale courier token or invalid rate data in Redis | Incorrect courier selection |
| Queue processing failure | Kafka/RabbitMQ consumer crashed | Shipment stuck in pending |
| Database lock / transaction failure | Concurrent shipment updates | `500 Internal Error – Deadlock detected` |
| Code regression / bad deployment | New release introduces a payload bug | Sudden spike in courier API failures |
🧩 6. Tracking API-Specific Failures
Tracking is continuous, so failures here are also common.
| Issue | Cause | Symptom / Error |
| --- | --- | --- |
| Courier webhook not firing | Courier didn’t send tracking updates | No status update for >24 hrs |
| Webhook signature mismatch | Invalid or missing signature in callback | Update ignored for security |
| Tracking ID expired or not found | Courier deleted shipment record | `404 – Tracking not found` |
| Poll frequency too high | Over-polling courier API for tracking | `429 – Rate limit exceeded` |
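The signature-mismatch case usually comes down to an HMAC check on the callback body; a minimal sketch (the HMAC-SHA256 scheme and header handling are assumptions, as each courier documents its own):

```python
import hashlib
import hmac

def verify_webhook(raw_body: bytes, signature_header: str, secret: bytes) -> bool:
    """Check the HMAC signature on a tracking callback before trusting it."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids timing side channels
    return hmac.compare_digest(expected, signature_header)

# Updates failing this check are the "webhook signature mismatch" case above:
# they are logged and ignored rather than applied to the shipment.
```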
📈 7. Integration Lifecycle Issues
These occur during or after new courier integration.
Courier API version upgrade → Shiprocket still on old version
Schema changes (field renamed or removed)
Deprecated endpoints not yet updated
Sandbox vs Production URL confusion
Missing authentication key for new environment
⚙️ In Real Life
Shiprocket logs error categories and response codes in a central monitoring system (like ELK + Grafana). They often have dashboards such as:
```
Courier API Success Rate (Last 24h)
-----------------------------------
Delhivery     97.2%
Xpressbees    99.1%
Bluedart      94.5%
EcomExpress   96.8%
```

Anything below ~95% triggers an alert, and retries are automatically managed through queues.
🧩 In Short
Common reasons API requests fail in Shiprocket:
| Category | Typical Cause |
| --- | --- |
| ❌ Wrong payload | Missing or invalid parameters |
| 🔑 Auth issue | Invalid token / expired key |
| 🕓 Timeout | Courier API slow or unreachable |
| 🚫 Rate limiting | Too many requests |
| 🧱 Courier error | Internal server error / system maintenance |
| 🌐 Network issue | DNS / SSL / proxy problem |
| ⚠️ Business rule fail | Invalid pincode, COD limit exceeded |
| 💥 Internal error | Bug in a Shiprocket microservice |
| 📦 Tracking failure | Webhook not sent / invalid tracking ID |