Parallel API Calls: How to Cut Response Times by 60% Without Changing Your Backend
If your application makes three API calls to compose a single page or response, and each call takes 100ms, your total response time is either 300ms or 100ms — depending entirely on whether those calls run sequentially or in parallel. That is the core of the parallel API execution story, and it is both simpler and more impactful than most teams realise.
In composable architectures, BFF layers, and microservice-based frontends, it is extremely common to fan out to multiple upstream services for a single user-facing response. Product pages need pricing, inventory, and catalogue data. Checkout flows need tax calculations, shipping estimates, and inventory confirmation. Dashboards need data from multiple reporting services. When these upstream calls are made sequentially, latency adds up linearly. When they are made in parallel, latency is determined by the slowest call — not the sum of all calls.
This post examines how parallel API execution works, when it is safe to apply, when it creates problems, and how an orchestration layer implements it correctly in production — including graceful handling of partial failures, timeouts, and circuit breaking.
Sequential vs. Parallel: The Numbers
To understand the impact of parallel execution, consider a typical e-commerce product detail page that requires five upstream API calls:
Sequential execution (total: ~530ms)
- → Product catalogue API 120ms
- → Pricing service 95ms
- → Inventory API 110ms
- → Recommendation engine 140ms
- → Shipping estimator 65ms
- Total: 530ms
Parallel execution, all calls running concurrently (total: ~140ms)
- ↗ Product catalogue API 120ms
- ↗ Pricing service 95ms
- ↗ Inventory API 110ms
- ↗ Recommendation engine 140ms
- ↗ Shipping estimator 65ms
- Total: 140ms (limited by slowest call)
The result: a 74% reduction in total response time without modifying a single backend service. The upstream services are unchanged — only the coordination layer between the client and the services has been optimised. This is the compounding value of an orchestration layer: performance improvements are systemic, not per-service.
In practice, the gains vary by workload. If two of the five calls depend on each other, those two must still run one after the other, but you can run the remaining three in parallel alongside them and still capture most of the benefit. Response time reductions of 40–70% are common when moving from uncoordinated sequential client calls to a parallel orchestration layer.
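The arithmetic behind these numbers can be checked with a few lines of Python. This is a minimal sketch using the latencies from the example above. The `staged_ms` helper and the assumption that pricing depends on the catalogue call are hypothetical, added only to illustrate the partial-dependency case.

```python
# Latencies (ms) taken from the product detail page example above.
LATENCIES_MS = {
    "product_catalogue": 120,
    "pricing": 95,
    "inventory": 110,
    "recommendations": 140,
    "shipping_estimator": 65,
}


def sequential_ms(latencies: dict) -> int:
    """Sequential calls: each waits for the previous one, so latencies add up."""
    return sum(latencies.values())


def parallel_ms(latencies: dict) -> int:
    """Fully parallel calls: the response waits only for the slowest call."""
    return max(latencies.values())


def staged_ms(latencies: dict, chains: list) -> int:
    """Chains run in parallel; calls inside one chain run one after another."""
    return max(sum(latencies[name] for name in chain) for chain in chains)


seq = sequential_ms(LATENCIES_MS)
par = parallel_ms(LATENCIES_MS)
print(seq, par, round(100 * (seq - par) / seq))  # 530 140 74

# Hypothetical dependency: pricing needs the catalogue result first.
chains = [["product_catalogue", "pricing"], ["inventory"],
          ["recommendations"], ["shipping_estimator"]]
print(staged_ms(LATENCIES_MS, chains))  # 215
```

Even with one dependent pair in this hypothetical layout, the critical path is 215ms rather than 530ms, a reduction of roughly 59%.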
The Fan-Out / Fan-In Pattern
The architectural pattern behind parallel API execution is called fan-out / fan-in. The orchestration layer receives a single inbound request, fans out to multiple upstream services simultaneously, then fans in by waiting for all responses, composing the data, and returning a single unified response to the caller.
The fan-out phase is where parallelism happens. All independent upstream calls are initiated at the same time. The fan-in phase is where the orchestrator waits for the slowest upstream call to complete, then merges the results. This merge step can include field mapping, data transformation, conditional logic (show a fallback if a service returns an error), and response shaping.
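As a rough illustration of the two phases, here is a minimal sketch using Python's `asyncio.gather`. The upstream names, the `call_upstream` stand-in, and the simulated delays are assumptions for the example. A real orchestrator would issue HTTP requests and apply richer merge logic.

```python
import asyncio
import time

# Simulated upstream latencies in seconds; a real call would be an HTTP request.
UPSTREAMS = {
    "catalogue": 0.120,
    "pricing": 0.095,
    "inventory": 0.110,
}


async def call_upstream(name: str, delay_s: float) -> dict:
    """Stand-in for an upstream request: sleeps, then returns a small payload."""
    await asyncio.sleep(delay_s)
    return {"service": name, "ok": True}


async def fan_out_fan_in(upstreams: dict) -> dict:
    # Fan-out: create every coroutine up front so gather() starts them together.
    calls = [call_upstream(name, delay) for name, delay in upstreams.items()]
    # Fan-in: gather() resumes once the slowest call has finished and
    # returns results in the same order the calls were passed in.
    results = await asyncio.gather(*calls)
    # Merge: compose one response keyed by upstream name.
    return dict(zip(upstreams, results))


start = time.perf_counter()
page = asyncio.run(fan_out_fan_in(UPSTREAMS))
elapsed = time.perf_counter() - start
print(sorted(page))  # ['catalogue', 'inventory', 'pricing']
print(f"{elapsed * 1000:.0f}ms")  # close to the slowest call, not the 325ms sum
```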
The critical design question in the fan-in phase is: what happens when one upstream call fails or times out while others succeed? This is where naive parallel implementations break down, and where a production-grade orchestration layer provides significant value.
Handling Partial Failures in Parallel Calls
When you run five API calls in parallel and one fails, you have several options: fail the entire response, return partial data with an indication of what is missing, use a cached fallback for the failed call, or return a degraded response that omits the failed component. Which option is correct depends entirely on the business context.
For a product page, if the recommendation engine fails, you probably want to return the page without recommendations rather than fail the entire request. If the pricing service fails, you probably want to fail the request — showing a product without a price is worse than showing an error page. These policies are business decisions, and they should be configurable in the orchestration layer rather than hard-coded into every client.
A well-designed orchestration platform allows you to configure per-upstream-call failure behaviour: which calls are required (failure propagates to the response), which are optional (failure returns a default or empty value), and which should trigger a fallback from cache. This configuration lives in the orchestration workflow, not in client code — so the policy is consistent regardless of which client (web, mobile, partner API) triggers the orchestrated request.
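The required/optional distinction can be sketched in a few lines. This is an illustrative example, not how any particular platform implements it. `UpstreamCall`, `RequiredCallFailed`, and `compose` are hypothetical names, and a cached fallback would slot into the same branch that applies the default.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass
class UpstreamCall:
    name: str
    fetch: Callable[[], Awaitable[Any]]
    required: bool = True   # required: a failure fails the whole response
    default: Any = None     # optional: value used when the call fails


class RequiredCallFailed(Exception):
    """Raised when a call marked as required did not succeed."""


async def compose(calls: list) -> dict:
    # return_exceptions=True stops one failure from aborting the fan-in;
    # failed calls come back as exception objects in the results list.
    results = await asyncio.gather(
        *(call.fetch() for call in calls), return_exceptions=True
    )
    response, degraded = {}, []
    for call, result in zip(calls, results):
        if isinstance(result, Exception):
            if call.required:
                raise RequiredCallFailed(call.name) from result
            response[call.name] = call.default
            degraded.append(call.name)
        else:
            response[call.name] = result
    response["degraded"] = degraded  # tells the client what is missing
    return response


async def pricing_ok() -> dict:
    return {"price": 19.99}


async def recommendations_down() -> list:
    raise ConnectionError("recommendation engine unavailable")


calls = [
    UpstreamCall("pricing", pricing_ok, required=True),
    UpstreamCall("recommendations", recommendations_down, required=False, default=[]),
]
print(asyncio.run(compose(calls)))
# {'pricing': {'price': 19.99}, 'recommendations': [], 'degraded': ['recommendations']}
```

Because the policy is attached to each call definition rather than to the caller, every client that triggers this composition gets the same degradation behaviour.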
When Parallel Execution Does Not Apply
Not all API calls in a workflow can be parallelised. Data dependencies between calls create a natural sequencing requirement: if call B needs the result of call A as an input, they must run sequentially. Understanding these dependencies is the first step in designing an effective parallel execution strategy.
Common patterns where sequential execution is required:
- Identity resolution before data fetch: An authentication or session validation call must complete before any data calls that require the resolved user identity.
- Cart creation before item addition: A workflow that creates a cart and then adds items must finish creating the cart before it can make the item addition calls.
- Order placement before payment capture: The order must exist before a payment can be associated with it.
- Conditional branches: If a call's result determines which downstream calls are made, those downstream calls cannot run until the determining call completes.
The good news is that most workflows are not purely sequential. They contain a mix of sequential and parallel stages. A checkout flow might require sequential steps for cart validation → order creation, but the order creation step itself can fan out in parallel to tax calculation, inventory reservation, and fraud screening — all of which are independent of each other.
Identifying the dependency graph of your workflows and parallelising independent branches is the practical work of performance optimisation at the orchestration layer.
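One way to turn a dependency graph into parallel stages is a topological sort. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+). The step names are an assumed reading of the checkout example above, with the three independent calls modelled as depending on order creation.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose results it needs (its predecessors).
CHECKOUT = {
    "cart_validation": set(),
    "order_creation": {"cart_validation"},
    "tax_calculation": {"order_creation"},
    "inventory_reservation": {"order_creation"},
    "fraud_screening": {"order_creation"},
}


def parallel_stages(dependencies: dict) -> list:
    """Group steps into stages; steps within a stage can run concurrently."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose inputs are done
        stages.append(ready)
        sorter.done(*ready)  # unlocks the steps that depend on them
    return stages


for stage in parallel_stages(CHECKOUT):
    print(stage)
# ['cart_validation']
# ['order_creation']
# ['fraud_screening', 'inventory_reservation', 'tax_calculation']
```

Each inner list is a candidate fan-out: its members have no dependencies on each other, so they can be issued together once the previous stage completes.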
Timeouts and Circuit Breaking in Parallel Flows
In a sequential call chain, a slow upstream service makes the entire chain slow. In a parallel call pattern, a slow upstream service holds the fan-in phase open until it either responds or times out. Without explicit timeout configuration, a single slow service can hold all your parallel calls hostage: the response takes as long as that one call does, even if every other call finished in milliseconds, and a hung call can wipe out the gains of parallelism entirely.
Timeout configuration is therefore essential in parallel orchestration. Each upstream call should have an individually configured timeout. When a call exceeds its timeout, the orchestration layer should apply the configured failure policy — fail the request, use a cached fallback, or return a default value — and proceed with composing the response from the calls that did succeed.
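A per-call timeout with a fallback can be sketched with `asyncio.wait_for`. The function names, timeout values, and simulated delays below are assumptions chosen for illustration.

```python
import asyncio
from typing import Any, Awaitable


async def call_with_timeout(call: Awaitable, timeout_s: float, fallback: Any) -> Any:
    """Wait for one upstream call, but never longer than its own timeout."""
    try:
        return await asyncio.wait_for(call, timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancels the slow call before raising, so it stops holding
        # the fan-in open; the configured fallback takes its place.
        return fallback


async def slow_recommendations() -> list:
    await asyncio.sleep(1.0)  # simulated slow upstream
    return ["sku-1", "sku-2"]


async def fast_pricing() -> dict:
    await asyncio.sleep(0.01)
    return {"price": 19.99}


async def product_page() -> dict:
    # Each call carries its own timeout and fallback value.
    pricing, recommendations = await asyncio.gather(
        call_with_timeout(fast_pricing(), timeout_s=0.5, fallback=None),
        call_with_timeout(slow_recommendations(), timeout_s=0.05, fallback=[]),
    )
    return {"pricing": pricing, "recommendations": recommendations}


print(asyncio.run(product_page()))
# {'pricing': {'price': 19.99}, 'recommendations': []}
```

The slow call is cut off at 50ms and replaced by an empty list, so the composed response returns long before the simulated one-second upstream would have.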
Circuit breaking extends this further. If a particular upstream service is consistently timing out or returning errors, a circuit breaker can open and stop calling it temporarily, so the orchestration layer does not waste connection budget or add latency on calls that are known to fail. After a configured recovery window, the breaker lets a probe request through; if the probe succeeds, the circuit closes and normal traffic to the service resumes.
Timeout and circuit breaker configuration per upstream call
- Connect timeout: Maximum time to establish a connection to the upstream service
- Read timeout: Maximum time to wait for a response once connected
- Overall call timeout: Hard ceiling on total call duration including retries
- Circuit open threshold: Error rate or count that triggers circuit opening
- Recovery window: Time before the circuit allows a probe request to test recovery
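To make these settings concrete, here is a simplified sketch of a per-upstream policy and a count-based circuit breaker. `UpstreamPolicy`, `CircuitBreaker`, and the default values are hypothetical. The breaker counts consecutive failures rather than an error rate, and it does not limit the half-open state to a single probe, both of which a production implementation would likely refine.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UpstreamPolicy:
    """The five per-call settings from the list above (values are illustrative)."""
    connect_timeout_s: float = 0.2
    read_timeout_s: float = 0.5
    overall_timeout_s: float = 1.0    # hard ceiling, including retries
    circuit_open_threshold: int = 3   # consecutive failures that open the circuit
    recovery_window_s: float = 30.0   # wait before a probe request is allowed


class CircuitBreaker:
    def __init__(self, policy: UpstreamPolicy, clock: Callable[[], float] = time.monotonic):
        self.policy = policy
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # Open: block calls until the recovery window has elapsed,
        # then let a probe through to test whether the service recovered.
        return self.clock() - self.opened_at >= self.policy.recovery_window_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # a successful probe closes the circuit

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.policy.circuit_open_threshold:
            self.opened_at = self.clock()  # open (or re-open) and restart the window


now = [0.0]  # fake clock so the example does not need to sleep
breaker = CircuitBreaker(
    UpstreamPolicy(circuit_open_threshold=2, recovery_window_s=10.0),
    clock=lambda: now[0],
)
breaker.record_failure()
breaker.record_failure()
print(breaker.allow_request())  # False: circuit is open
now[0] = 10.0
print(breaker.allow_request())  # True: recovery window elapsed, probe allowed
breaker.record_success()
print(breaker.allow_request())  # True: circuit closed again
```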
Caching as a Force Multiplier for Parallel Performance
Parallel execution reduces latency by eliminating wait time between sequential calls. Caching eliminates latency by serving responses from memory instead of making upstream calls at all. When combined, these two patterns create a dramatically faster system.
In an orchestration layer, caching can be applied per upstream call with configurable TTLs. A product catalogue that changes infrequently might be cached for 5 minutes, while pricing data that changes frequently might have a 10-second TTL. Inventory data that must always be fresh might bypass the cache entirely.
Because caching is managed at the orchestration layer, all consumers benefit automatically. If the web frontend and the mobile app both request the same product page through the orchestrator, the second request benefits from the cache warmed by the first. In a point-to-point architecture, each client maintains its own cache independently — meaning cache warm-up happens independently for each consumer and the downstream services see full request volumes from each.
Cache hit rates at the orchestration layer also compound across the parallel fan-out. If 4 of 5 upstream calls are served from cache, only one actual upstream request is made, so the response time is determined by that single cache miss rather than by the slowest of five network calls.
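Per-upstream TTLs can be sketched with a small in-memory cache. `TTLCache` and its method names are hypothetical, the TTL values mirror the examples above, and the example is synchronous to keep it short. A shared orchestration-layer cache would normally live in an external store rather than in process memory.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Tiny in-memory cache with a per-upstream time-to-live."""

    def __init__(self, ttls_s: dict, clock: Callable[[], float] = time.monotonic):
        self.ttls_s = ttls_s  # upstream name -> TTL in seconds; absent = never cache
        self.clock = clock
        self.entries = {}     # (upstream, key) -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get_or_fetch(self, upstream: str, key: str, fetch: Callable[[], Any]) -> Any:
        ttl = self.ttls_s.get(upstream)
        entry = self.entries.get((upstream, key))
        if ttl is not None and entry is not None and self.clock() < entry[0]:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = fetch()  # cache miss: go to the upstream service
        if ttl is not None:
            self.entries[(upstream, key)] = (self.clock() + ttl, value)
        return value


now = [0.0]  # fake clock so expiry can be shown without sleeping
cache = TTLCache({"catalogue": 300, "pricing": 10}, clock=lambda: now[0])

cache.get_or_fetch("catalogue", "sku-1", lambda: {"name": "Mug"})  # miss
cache.get_or_fetch("catalogue", "sku-1", lambda: {"name": "Mug"})  # hit
cache.get_or_fetch("pricing", "sku-1", lambda: {"price": 19.99})   # miss
now[0] = 11.0                                                      # pricing TTL expired
cache.get_or_fetch("pricing", "sku-1", lambda: {"price": 18.99})   # miss
cache.get_or_fetch("inventory", "sku-1", lambda: {"stock": 4})     # miss, never cached
cache.get_or_fetch("inventory", "sku-1", lambda: {"stock": 3})     # miss again
print(cache.hits, cache.misses)  # 1 5
```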
Measuring the Impact: What to Instrument
To understand the real performance impact of parallel execution in your system, you need to measure the right metrics at the right level. Generic application latency metrics (p50, p95, p99 response times at the edge) are useful for detecting regressions but insufficient for diagnosing the contribution of individual upstream calls to total latency.
An effective orchestration observability setup tracks:
- Per-upstream call latency: How long each upstream service takes to respond, broken down by percentile (p50, p95, p99). This identifies which services are the bottleneck in your fan-out pattern.
- Fan-in wait time: The difference between when the first upstream call completes and when the last one does. High fan-in wait time indicates that one slow service is holding your response.
- Cache hit rate per upstream: The fraction of requests served from cache vs. hitting the upstream service. Low hit rates on cacheable data often point to TTLs that are too short.
- Partial failure rate: How often a parallel fan-out completes with one or more upstream calls failing. High rates indicate a reliability problem in a specific upstream service.
- Total orchestration latency vs. upstream latencies: Comparing the total against the sum of upstream latencies shows how much time parallelism is saving. Comparing it against the slowest upstream call shows the overhead introduced by the orchestration layer itself. A well-implemented orchestrator adds 5–15ms of overhead, not hundreds of milliseconds.
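Several of these metrics can be derived from per-call start and end timestamps. The sketch below uses a hypothetical `CallSpan` record and invented timings. It assumes a single fan-out stage in which all calls start together, so orchestrator overhead is estimated as total latency minus the slowest call.

```python
from dataclasses import dataclass


@dataclass
class CallSpan:
    name: str
    start_ms: float  # when the orchestrator issued the call
    end_ms: float    # when the response arrived


def fan_out_metrics(request_start_ms: float, request_end_ms: float, spans: list) -> dict:
    durations = {s.name: s.end_ms - s.start_ms for s in spans}
    first_done = min(s.end_ms for s in spans)
    last_done = max(s.end_ms for s in spans)
    total = request_end_ms - request_start_ms
    return {
        "slowest_call": max(durations, key=durations.get),
        "fan_in_wait_ms": last_done - first_done,
        "total_ms": total,
        "sum_of_upstreams_ms": sum(durations.values()),
        # Time the request spent on anything other than the slowest call.
        "orchestrator_overhead_ms": total - max(durations.values()),
    }


spans = [
    CallSpan("catalogue", 2, 122),
    CallSpan("pricing", 2, 97),
    CallSpan("inventory", 2, 112),
    CallSpan("recommendations", 2, 142),
    CallSpan("shipping", 2, 67),
]
metrics = fan_out_metrics(request_start_ms=0, request_end_ms=150, spans=spans)
print(metrics["slowest_call"], metrics["fan_in_wait_ms"])                # recommendations 75
print(metrics["sum_of_upstreams_ms"], metrics["orchestrator_overhead_ms"])  # 530 10
```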
Implementing Parallel Execution Without Building It Yourself
Building a parallel API orchestration layer from scratch requires more than just spawning goroutines or chaining Promise.all() calls. Production-grade parallel execution needs configurable timeouts per call, partial failure policies, circuit breaking, response caching, credential management, execution tracing, schema validation, and alerting when upstream services degrade. Assembling all of this reliably from first principles is a significant engineering investment that delivers no direct business value — it is pure infrastructure.
API orchestration platforms like Apitide provide this as a managed layer. You define your fan-out workflow declaratively — which upstream services to call, what parameters to pass, how to transform and merge responses, and what to do on failure — and the platform handles the execution engine, connection pooling, caching, retry logic, and observability automatically. This means your team captures the performance benefits of parallel execution without the cost of building and operating the underlying infrastructure.
The result is that adding a new upstream service to a parallel workflow is a configuration change, not a development sprint. Performance characteristics are immediately visible in the execution logs. And when an upstream service degrades, the circuit breaker and partial failure policies you configured handle it automatically — your on-call engineer sees a clear alert rather than a cascading incident.
True Parallel Execution
Fan out to multiple upstream services simultaneously. Response time is determined by the slowest call, not the sum of all calls — delivering 40–70% latency reductions on typical multi-service pages.
Per-Call Timeout Control
Configure connect, read, and overall timeouts per upstream call. Slow services are isolated — they cannot hold the entire fan-in phase hostage when a failure policy is in place.
Execution Timeline Visibility
Every parallel execution produces a detailed timeline showing which upstream calls ran concurrently, how long each took, and where fan-in time was spent. Performance bottlenecks surface immediately.
Partial Failure Policies
Define per-call failure behaviour: required calls fail the response, optional calls degrade gracefully, and cacheable calls fall back to stale data. Consistent policy applied across all consumers without client-side logic.
Ready to cut your API response times?
Apitide's parallel execution engine fans out to multiple upstream services simultaneously, delivering sub-100ms composite responses with built-in caching, configurable timeouts, and complete execution visibility.