Courier partner API

API testing images

Select courier

After a courier is allocated, the order appears under Ready to Ship.

API request failed scenario

Excellent question, Pavan 👏 — this is a key challenge in logistics aggregator platforms like Shiprocket, because they rely on unreliable third-party courier APIs that can fail, time out, or return inconsistent data.

Let’s go step by step on how Shiprocket (or any similar aggregator like Pickrr or NimbusPost) handles API failures robustly while maintaining reliability and good user experience 👇


⚙️ 1. Retry Mechanism (with Backoff)

When a booking or tracking API call to a courier partner fails, Shiprocket does not fail the entire workflow immediately.

Instead, they implement:

  • Automatic retries using exponential backoff. Example: retry after 2s → 4s → 8s → 16s (typically capped at 3–5 attempts)

  • Circuit breaker pattern — if multiple requests to the same courier keep failing, temporarily stop calling that courier’s API and mark it unavailable.
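To make the backoff schedule concrete, here is a minimal sketch in Python. The names `backoff_delays` and `call_with_retry` are hypothetical, and treating only `TimeoutError` and `ConnectionError` as retryable is an assumption; it is not Shiprocket's actual code.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 2.0, factor: float = 2.0, max_retries: int = 4) -> List[float]:
    """Delays before each retry: base, base*factor, base*factor^2, ..."""
    return [base * factor**i for i in range(max_retries)]


def call_with_retry(
    operation: Callable[[], T],
    delays: List[float],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation once, then retry after each delay until it succeeds."""
    last_error: Optional[Exception] = None
    for delay in [0.0] + delays:
        if delay:
            sleep(delay)
        try:
            return operation()
        except (TimeoutError, ConnectionError) as exc:  # assumed transient failures
            last_error = exc
    assert last_error is not None
    raise last_error
```

With the defaults, `backoff_delays()` gives the 2s → 4s → 8s → 16s schedule from the example. Real systems usually add random jitter to each delay so that many workers do not retry at the same instant.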

🧠 Tools used:

  • Retry libraries (e.g., Resilience4j, or in Go/Rust custom retry logic)

  • Background job queues (like Celery, Sidekiq, or Kafka consumers)
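If you are not using a library, the circuit breaker idea can be sketched by hand in a few lines. This is a simplified illustration: the class name, the thresholds, and the injected `clock` are all assumptions, and libraries such as Resilience4j implement a richer state machine.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; allows a trial call after `cooldown` seconds."""

    def __init__(self, threshold: int = 3, cooldown: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None while the breaker is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: let a trial request through once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.cooldown

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()  # (re)open and restart the cooldown
```

One breaker instance per courier keeps a failing partner from slowing down bookings with the others.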


📨 2. Queue-Based Request Handling

Courier API calls (such as Create Shipment or Track Order) are not made synchronously; each one goes through a message queue.

Flow:

  1. Order request enters Kafka/RabbitMQ queue

  2. Worker service picks it up and sends API request to courier

  3. If the courier API fails → worker retries or requeues the message with delay

  4. Status stored in DB as pending_retry

This ensures no data loss and eventual consistency, even if couriers are down temporarily.

🧩 Benefit: API failure doesn’t block the user. Seller immediately gets a “shipment under processing” response while retries happen in the background.
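The flow above can be sketched with an in-memory delayed queue standing in for Kafka/RabbitMQ. Everything here (`Job`, `process_due_jobs`, the 30-second retry delay, the `send` callback) is hypothetical and only illustrates the "requeue with delay" step.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(order=True)
class Job:
    run_at: float                                  # earliest time the job may run
    order_id: str = field(compare=False)
    attempts: int = field(default=0, compare=False)


def process_due_jobs(queue: List[Job], status: Dict[str, str], now: float,
                     send: Callable[[str], bool], retry_delay: float = 30.0) -> None:
    """Pop every job whose run_at has passed; requeue failed ones with a delay."""
    while queue and queue[0].run_at <= now:
        job = heapq.heappop(queue)
        if send(job.order_id):                     # send() returns True on success
            status[job.order_id] = "booked"
        else:
            status[job.order_id] = "pending_retry"
            heapq.heappush(queue, Job(now + retry_delay, job.order_id, job.attempts + 1))
```

Because the failed job is pushed back with a later `run_at`, the worker moves on to other orders instead of blocking on one courier.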


💾 3. Fallback to Alternate Courier

If a courier consistently fails (API timeout or bad response), Shiprocket can:

  • Re-route the shipment request to another available courier based on pre-defined rules:

    • Cheapest available

    • Fastest available

    • Reliable score (historical SLA success rate)

This logic is implemented inside the Courier Selection Engine, which maintains health scores for each courier partner.
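A rule-based selection like this could look roughly as follows. The `Courier` fields, the rule names, and the `exclude` parameter are assumptions made for the sketch; the real Courier Selection Engine is not public.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Courier:
    name: str
    price: float          # quoted shipping cost
    eta_days: int         # estimated delivery time in days
    sla_success: float    # historical SLA success rate, 0.0 to 1.0
    healthy: bool = True  # False while the courier's API is marked unavailable


def pick_courier(couriers: List[Courier], rule: str = "cheapest",
                 exclude: Iterable[str] = ()) -> Optional[Courier]:
    """Pick the best healthy courier under the given rule, skipping excluded names."""
    excluded = set(exclude)
    candidates = [c for c in couriers if c.healthy and c.name not in excluded]
    if not candidates:
        return None
    sort_keys = {
        "cheapest": lambda c: c.price,
        "fastest": lambda c: c.eta_days,
        "reliable": lambda c: -c.sla_success,  # highest success rate first
    }
    return min(candidates, key=sort_keys[rule])
```

On fallback, the caller would pass the failed courier's name in `exclude` so the next-best option is chosen.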


📊 4. Failure Logging & Monitoring

All failed API interactions are logged in detail:

  • Courier name

  • Request payload

  • Response or error

  • Retry count

  • Last attempt time

These logs feed into:

  • ELK Stack (Elasticsearch + Logstash + Kibana) for search

  • Prometheus + Grafana dashboards for metrics like:

    • API success rate per courier

    • Latency

    • Failure trend

📢 Alerts are sent (Slack/Email) if a courier’s API failure rate crosses a threshold (e.g., 10% in last 5 minutes).
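In practice this kind of alert is usually a Prometheus/Grafana rule, but the underlying check is simple. Here is a rough sketch of a sliding-window failure-rate monitor; the class name and the 300-second / 10% defaults mirror the example above and are otherwise assumptions.

```python
from collections import deque
from typing import Deque, Tuple


class FailureRateMonitor:
    """Tracks call outcomes for one courier over a sliding time window."""

    def __init__(self, window_seconds: float = 300.0, threshold: float = 0.10) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.events: Deque[Tuple[float, bool]] = deque()  # (timestamp, succeeded)

    def record(self, timestamp: float, succeeded: bool) -> None:
        self.events.append((timestamp, succeeded))

    def failure_rate(self, now: float) -> float:
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] < now - self.window_seconds:
            self.events.popleft()
        if not self.events:
            return 0.0
        failures = sum(1 for _, ok in self.events if not ok)
        return failures / len(self.events)

    def should_alert(self, now: float) -> bool:
        return self.failure_rate(now) > self.threshold
```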


🧰 5. Dead Letter Queue (DLQ)

If all retry attempts fail (e.g., courier API is down for hours):

  • The message is pushed into a Dead Letter Queue

  • The operations/support team manually reviews and reprocesses these shipments later

This prevents permanent data loss and ensures manual intervention is possible.
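As a rough illustration, the hand-off to a DLQ and the later manual replay might look like this. Plain lists stand in for the real queues, and `handle_failure`, `reprocess`, and the `attempts` field are hypothetical names.

```python
from typing import Dict, List


def handle_failure(message: Dict, retry_queue: List[Dict], dead_letters: List[Dict],
                   max_attempts: int = 5) -> str:
    """Count the failed attempt, then either requeue the message or park it in the DLQ."""
    message["attempts"] = message.get("attempts", 0) + 1
    if message["attempts"] >= max_attempts:
        dead_letters.append(message)
        return "dead_letter"
    retry_queue.append(message)
    return "retry"


def reprocess(dead_letters: List[Dict], retry_queue: List[Dict]) -> int:
    """Manual replay: reset the attempt counter and move parked messages back."""
    moved = len(dead_letters)
    while dead_letters:
        message = dead_letters.pop(0)
        message["attempts"] = 0
        retry_queue.append(message)
    return moved
```

Kafka and RabbitMQ both support dead-letter routing natively, so in a real setup most of this would be broker configuration rather than application code.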


🧩 6. Idempotency & Duplicate Handling

Since retries can lead to duplicate API calls, Shiprocket uses idempotency keys:

  • A unique key (like order ID or AWB request ID) is sent with each API call.

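  • If the courier receives the same key again, it returns the result of the original request instead of creating a second booking. This only works when the courier API supports idempotency keys.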

This avoids accidental multiple bookings or charges.
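Here is a small sketch of how the receiving side can honour an idempotency key. `BookingService` is a hypothetical stand-in for a courier endpoint, and the AWB format is made up for the example.

```python
import uuid
from typing import Dict


class BookingService:
    """Stand-in for a courier endpoint that honours idempotency keys."""

    def __init__(self) -> None:
        self._results: Dict[str, Dict[str, str]] = {}
        self.bookings_created = 0

    def create_shipment(self, idempotency_key: str, order_id: str) -> Dict[str, str]:
        # A repeated key returns the stored result instead of booking again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.bookings_created += 1
        result = {"order_id": order_id, "awb": f"AWB{self.bookings_created:06d}"}
        self._results[idempotency_key] = result
        return result


def new_idempotency_key() -> str:
    """Generate the key once per shipment and reuse it on every retry."""
    return str(uuid.uuid4())
```

The important detail is on the caller's side: the key must be created before the first attempt and stored with the order, not regenerated per retry.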


🔒 7. Transactional Integrity

Important steps (booking, label generation, tracking update) are handled via transactional states in DB:

order_status = [CREATED, BOOKING_PENDING, BOOKED, FAILED, RETRYING, COMPLETED]

This ensures the system knows exactly where the failure happened and can resume safely.
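One way to enforce these states is an explicit transition table, so an order can never jump to an impossible status. The states come from the list above; the allowed transitions are my assumption for illustration.

```python
from enum import Enum


class OrderStatus(Enum):
    CREATED = "CREATED"
    BOOKING_PENDING = "BOOKING_PENDING"
    BOOKED = "BOOKED"
    FAILED = "FAILED"
    RETRYING = "RETRYING"
    COMPLETED = "COMPLETED"


# Assumed transition rules; FAILED and COMPLETED are treated as terminal.
ALLOWED = {
    OrderStatus.CREATED: {OrderStatus.BOOKING_PENDING},
    OrderStatus.BOOKING_PENDING: {OrderStatus.BOOKED, OrderStatus.RETRYING, OrderStatus.FAILED},
    OrderStatus.RETRYING: {OrderStatus.BOOKED, OrderStatus.RETRYING, OrderStatus.FAILED},
    OrderStatus.BOOKED: {OrderStatus.COMPLETED},
    OrderStatus.FAILED: set(),
    OrderStatus.COMPLETED: set(),
}


def transition(current: OrderStatus, new: OrderStatus) -> OrderStatus:
    """Return the new status, or raise if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```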


🔁 8. Async Tracking Updates

Tracking updates are not dependent on live API calls. Shiprocket uses:

  • Webhooks (preferred, near real-time)

  • Polling (backup) — if webhooks fail or are delayed, Shiprocket polls every X minutes to keep status fresh.

This keeps tracking reliable even when a courier’s webhook system is unstable.
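The polling fallback can be as simple as "poll only the shipments that have been silent too long". A minimal sketch, assuming a hypothetical `last_update` map of AWB to last-update timestamp and a 30-minute silence limit:

```python
from typing import Dict, List


def shipments_to_poll(last_update: Dict[str, float], now: float,
                      max_silence: float = 1800.0) -> List[str]:
    """Return AWBs whose last tracking update is older than max_silence seconds."""
    return sorted(awb for awb, ts in last_update.items() if now - ts > max_silence)
```

A scheduler would call this every few minutes and poll only the returned AWBs, which keeps load on the courier's tracking API low.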


🧠 Example Scenario

Case: Booking with Delhivery API Fails

  1. Shiprocket sends POST /shipments to Delhivery API → Timeout

  2. Retry after 5s → Still fails

  3. Retry again after 20s → Success ✅

  4. Status in DB changes from BOOKING_PENDING → BOOKED

  5. Label generated and sent to seller

If all retries fail → fall back to Xpressbees → booking and label are generated with the new courier.


💡 Summary Table

| Failure Type | Handling Strategy | Tool/Pattern |
| --- | --- | --- |
| API timeout | Retry with backoff | Resilience4j / custom logic |
| API down | Circuit breaker | Health check + disable temporarily |
| All retries failed | Dead Letter Queue | Kafka / RabbitMQ DLQ |
| Courier unstable | Switch courier | Courier selection engine |
| Duplicate request | Idempotency key | UUID per shipment |
| Tracking API failure | Polling fallback | Cron / Scheduler service |



API Request cases

Perfect follow-up, Pavan 👏

When Shiprocket connects to multiple third-party courier APIs, there are a lot of things that can go wrong — both from Shiprocket’s side (bad data, wrong payload) and the courier’s side (downtime, throttling, etc.).

Here’s a breakdown of the most common reasons API requests fail in such logistics integrations — grouped by type 👇


⚙️ 1. Client-Side (Shiprocket) Issues

These happen when Shiprocket sends something invalid or incomplete to the courier API.

| Cause | Description | Example / Typical Error |
| --- | --- | --- |
| Invalid payload / missing fields | Required parameters (e.g., pincode, weight, dimensions) missing or wrong format | 400 Bad Request – Missing parameter: destination_pincode |
| Invalid authentication | Wrong API key, expired token, or incorrect headers | 401 Unauthorized or 403 Forbidden |
| Invalid address / pincode not serviceable | Courier API rejects shipment creation for unsupported areas | 422 Unprocessable Entity – Service not available for given pincode |
| Invalid weight or dimensions | Exceeds courier limit or invalid numeric value | 400 Bad Request – Weight out of range |
| Duplicate request (same AWB) | Resending the same booking without idempotency key | 409 Conflict – Duplicate shipment request |
| Unsupported content type | Sending JSON instead of XML (or vice versa) | 415 Unsupported Media Type |
| Serialization error | Malformed JSON/XML in the request body | Malformed JSON at line 1 column 50 |
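Many of these client-side failures can be caught before the request leaves your system. Below is a minimal pre-flight validation sketch; the field names, the 50 kg limit, and `validate_shipment` itself are assumptions, since every courier defines its own schema and limits.

```python
from typing import Dict, List

MAX_WEIGHT_KG = 50.0  # hypothetical courier limit
REQUIRED_FIELDS = ("destination_pincode", "weight_kg", "length_cm", "width_cm", "height_cm")


def validate_shipment(payload: Dict) -> List[str]:
    """Return a list of validation errors; an empty list means the payload looks sendable."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in payload]

    if "destination_pincode" in payload:
        pincode = str(payload["destination_pincode"])
        # Indian PIN codes are six digits.
        if not (pincode.isascii() and pincode.isdigit() and len(pincode) == 6):
            errors.append("destination_pincode must be 6 digits")

    weight = payload.get("weight_kg")
    if weight is not None:
        if not isinstance(weight, (int, float)) or not 0 < weight <= MAX_WEIGHT_KG:
            errors.append("weight_kg out of range")

    return errors
```

Failing fast here turns a courier-side 400/422 into a clear error for the seller, and it also avoids wasting retries on requests that can never succeed.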


🧩 2. Courier-Side (Third-Party API) Issues

These are issues with the courier partner’s systems — Shiprocket can’t control them but needs to handle gracefully.

| Cause | Description | Example / Typical Error |
| --- | --- | --- |
| API downtime / maintenance | Courier system temporarily unavailable | 503 Service Unavailable |
| Rate limiting / throttling | Too many requests sent in a short time | 429 Too Many Requests |
| High latency / timeout | Courier server takes too long to respond | 504 Gateway Timeout |
| Internal server error | Courier system crashes or has a bug | 500 Internal Server Error |
| Partial service outage | Some endpoints work (e.g., tracking), others fail (e.g., booking) | 50% success rate seen in monitoring |
| Inconsistent response format | Courier API changes response fields unexpectedly | JSON key mismatch or missing AWB in response |
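For 429 and 503 responses, some APIs include a standard `Retry-After` header that says how long to wait. Whether a given courier sends it is an assumption, so this sketch falls back to a default delay; `retry_after_seconds` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(header_value: Optional[str], now: datetime,
                        default: float = 5.0) -> float:
    """Parse a Retry-After header given as delay-seconds or as an HTTP date."""
    if header_value is None:
        return default
    value = header_value.strip()
    if value.isascii() and value.isdigit():
        return float(value)                       # e.g. "120"
    try:
        when = parsedate_to_datetime(value)       # e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    except (TypeError, ValueError):
        return default                            # unparseable header
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    return max(0.0, (when - now).total_seconds())
```

`now` must be a timezone-aware datetime, for example `datetime.now(timezone.utc)`.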


🧰 3. Network / Infrastructure Issues

Problems in the communication path between Shiprocket and courier systems.

| Cause | Description | Example / Typical Error |
| --- | --- | --- |
| DNS resolution failure | Courier API domain not resolvable | Could not resolve host: api.delhivery.com |
| SSL certificate errors | Expired or invalid SSL cert | SSLHandshakeException |
| Connection reset / dropped | Network connection broken mid-request | Connection reset by peer |
| Firewall / proxy blocking | Shiprocket’s IP blocked or filtered | No response / TCP timeout |
| Cloudflare or CDN blocking | Rate limiting or geo-blocking from CDN layer | 403 Forbidden – Access denied |


📦 4. Data Integrity / Business Logic Issues

The request is well-formed and reaches the courier, but it fails business validation.

| Cause | Description | Example / Typical Error |
| --- | --- | --- |
| Invalid courier code | Courier not onboarded or inactive | 400 – Unknown courier code |
| Invalid order status | Booking attempted for canceled order | 409 Conflict – Order already canceled |
| COD limit exceeded | COD amount > partner’s allowed limit | 422 – COD limit exceeded (₹50,000) |
| Restricted item type | Product type not allowed by courier | 422 – Dangerous goods not allowed |
| Invalid pickup address | Pickup location not verified / approved | 422 – Pickup not available for this address |
| Duplicate AWB assignment | AWB number already used | 409 – Duplicate airway bill |


🧠 5. Shiprocket Internal Service Issues

Failures within Shiprocket’s microservices before the request even reaches the courier.

| Cause | Description | Example / Typical Error |
| --- | --- | --- |
| Internal timeout (upstream delay) | Rate engine or courier service slow to respond | 504 Upstream timeout |
| Cache inconsistency | Old courier token or invalid rate data in Redis | Incorrect courier selection |
| Queue processing failure | Kafka/RabbitMQ consumer crashed | Shipment stuck in pending |
| Database lock / transaction failure | Concurrent shipment updates | 500 Internal Error – Deadlock detected |
| Code regression / bad deployment | New release introduces payload bug | Sudden spike in courier API failures |


🧩 6. Tracking API-Specific Failures

Tracking is continuous, so failures here are also common.

| Cause | Description | Example / Typical Error |
| --- | --- | --- |
| Courier webhook not firing | Courier didn’t send tracking updates | No status update for >24 hrs |
| Webhook signature mismatch | Invalid or missing signature in callback | Ignored update for security |
| Tracking ID expired or not found | Courier deleted shipment record | 404 – Tracking not found |
| Poll frequency too high | Over-polling courier API for tracking | 429 – Rate limit exceeded |
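The "webhook signature mismatch" case usually comes from an HMAC check on the callback body. Here is a sketch of that check, assuming the courier signs the raw body with a shared secret using HMAC-SHA256 and sends the hex digest; the actual scheme varies per courier.

```python
import hashlib
import hmac
from typing import Optional


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_webhook(secret: bytes, body: bytes, signature: Optional[str]) -> bool:
    """Reject callbacks with a missing or wrong signature."""
    if not signature:
        return False
    expected = sign(secret, body)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

The signature has to be computed over the exact bytes received, before any JSON parsing or re-serialization.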


📈 7. Integration Lifecycle Issues

These occur during or after new courier integration.

  • Courier API version upgrade → Shiprocket still on old version

  • Schema changes (field renamed or removed)

  • Deprecated endpoints not yet updated

  • Sandbox vs Production URL confusion

  • Missing authentication key for new environment


⚙️ In Real Life

Shiprocket logs error categories and response codes in a central monitoring system (like ELK + Grafana). They often have dashboards such as:

Courier API Success Rate (Last 24h)
-----------------------------------
Delhivery      97.2%
Xpressbees     99.1%
Bluedart       94.5%
EcomExpress    96.8%

Anything below ~95% triggers an alert and retries are automatically managed through queues.
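To feed dashboards like this, each failure first has to be put into a category and marked as retryable or not. A rough classifier might look like the sketch below; the category names and the retryable decisions are assumptions that follow the tables above, not Shiprocket's actual rules.

```python
from typing import Optional, Tuple


def classify_failure(status_code: Optional[int] = None,
                     exception: Optional[BaseException] = None) -> Tuple[str, bool]:
    """Return (category, retryable) for a failed courier API call."""
    if isinstance(exception, TimeoutError):
        return ("timeout", True)
    if isinstance(exception, ConnectionError):
        return ("network", True)
    if exception is not None or status_code is None:
        return ("internal", False)      # unexpected error on our side
    if status_code in (401, 403):
        return ("auth", False)
    if status_code == 429:
        return ("rate_limited", True)
    if 400 <= status_code < 500:
        return ("invalid_request", False)
    if 500 <= status_code < 600:
        return ("courier_error", True)
    return ("unknown", False)
```

Only the retryable categories should go back onto the retry queue; the rest need a fix to the payload, credentials, or code before another attempt makes sense.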


🧩 In Short

Common reasons API requests fail in Shiprocket:

| Category | Example |
| --- | --- |
| ❌ Wrong payload | Missing or invalid parameters |
| 🔑 Auth issue | Invalid token / expired key |
| 🕓 Timeout | Courier API slow or unreachable |
| 🚫 Rate limiting | Too many requests |
| 🧱 Courier error | Internal server error / system maintenance |
| 🌐 Network issue | DNS / SSL / proxy problem |
| ⚠️ Business rule fail | Invalid pincode, COD limit exceeded |
| 💥 Internal error | Bug in Shiprocket microservice |
| 📦 Tracking failure | Webhook not sent / invalid tracking ID |


