Ayhan Sipahi 2026-06-18

Keeping a GraphQL BFF Standing When Its Dependencies Fall Over

A GraphQL gateway over many microservices is your highest-leverage and most fragile service. Resilience comes from connector isolation, batching, cost limits and degradation.

A typed GraphQL gateway acting as a Backend-for-Frontend collapses dozens of REST microservices into one schema the client queries. That makes it the single highest-leverage service you run and also the most failure-prone, because one slow or down dependency can take the whole graph with it. Resilience here is an operational property of how the gateway talks to its dependencies, not a property of its schema, so the controls that matter are connector isolation, batching, cost limiting, and degradation, adopted in that order.

The Fan-Out Failure Shape

A single GraphQL query fans out to many downstream calls. The gateway cannot return until enough of those calls resolve. So one slow service stalls the whole response, and meanwhile it ties up the event loop and the connection pool that every other query also needs.

The naive failure modes follow from this shape. A resolver that issues one HTTP call per item in a list multiplies downstream load (the N+1 problem), and it does so during exactly the traffic spikes when you can least afford it. A deeply nested or recursive query can ask for an unbounded amount of work. And a single 500 from one non-critical dependency blanks an entire page, even when the rest of the data was available. None of these are schema problems. They are problems in how the gateway talks to its dependencies.

So the foundation is not a prettier schema. It is connector-per-service isolation, with batching, cost limiting, and partial-data degradation layered on top. The sections below take them in the order you should adopt them.

Connector-Per-Service Isolation

Each downstream gets its own connector object. That connector owns four things: a circuit breaker, a retry-with-backoff policy, an HTTP agent with keep-alive (connection reuse), and a DNS cache. The point is blast-radius containment. When one dependency fails, its breaker trips, the gateway stops hammering a service that is already down, and the other connectors keep serving.

The circuit breaker has three states. Closed lets traffic through. After a threshold of failures it trips to open, and calls fail fast without touching the downstream. After a cooldown it moves to half-open and sends a few trial requests. If those succeed, it closes again; if not, it reopens. The half-open probe is the part people forget, and without it the breaker stays open forever.
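The three states can be sketched in a few lines. This is an illustration of the state machine only, not how any particular library implements it: TinyBreaker, failureThreshold, cooldownMs, and the injectable now clock are all hypothetical names, and it counts consecutive failures rather than a failure percentage over a rolling window.

```javascript
// Minimal breaker state machine: CLOSED -> OPEN -> HALF_OPEN -> CLOSED or OPEN.
// Illustrative sketch; `now` is injectable so the cooldown can be tested.
class TinyBreaker {
  constructor(fn, { failureThreshold = 3, cooldownMs = 30000, now = Date.now } = {}) {
    this.fn = fn;
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
    this.state = "CLOSED";
    this.failures = 0;
    this.openedAt = 0;
  }

  async fire(...args) {
    if (this.state === "OPEN") {
      if (this.now() - this.openedAt < this.cooldownMs) {
        // Still cooling down: fail fast without touching the downstream.
        throw new Error("breaker open: failing fast");
      }
      this.state = "HALF_OPEN"; // cooldown elapsed: let a trial request through
    }
    try {
      const result = await this.fn(...args);
      this.state = "CLOSED"; // any success closes the breaker again
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial reopens immediately; otherwise trip at the threshold.
      if (this.state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
        this.state = "OPEN";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

Without the HALF_OPEN branch, nothing would ever move the breaker out of OPEN, which is the "stays open forever" failure described above. A production breaker would also limit how many trial requests pass while half-open; this sketch does not.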

No single library bundles all four concerns, and that is a feature, not a gap. opossum is a widely-used standalone Node circuit breaker that wraps any async function. It owns the breaker and the per-call timeout. You compose retry, keep-alive, and DNS caching around the function it wraps, which keeps the connector honest: every layer is visible in one place rather than hidden inside a black-box client.

const CircuitBreaker = require("opossum");
const https = require("https");

// Keep-alive reuses connections instead of paying TCP + TLS + DNS per call.
// maxSockets is the per-connector socket cap (see the bulkhead section).
const agent = new https.Agent({ keepAlive: true, maxSockets: 50 });

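// The raw downstream call: one HTTP request to the catalog service.
// https.get honors the agent above; Node's built-in fetch ignores an
// `agent` option, so it would silently skip the keep-alive pool.
function callCatalog(path) {
  return new Promise((resolve, reject) => {
    const req = https.get(`https://catalog.internal${path}`, { agent }, (res) => {
      let body = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => { body += chunk; });
      res.on("end", () => {
        if (res.statusCode < 200 || res.statusCode >= 300) {
          return reject(new Error(`catalog ${res.statusCode}`));
        }
        try { resolve(JSON.parse(body)); } catch (err) { reject(err); }
      });
    });
    req.on("error", reject);
  });
}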

// Trip once half of recent calls fail; these are illustrative
// defaults to tune against your own traffic, not measured truth.
const options = {
  timeout: 3000,                 // fail the call if it takes longer than 3s
  errorThresholdPercentage: 50,  // trip once 50% of requests in the window fail
  resetTimeout: 30000,           // after 30s, allow a half-open trial request
  rollingCountTimeout: 10000,    // stats window
  volumeThreshold: 20,           // minimum requests in the window before tripping
};

const catalogBreaker = new CircuitBreaker(callCatalog, options);
// Serve degraded data instead of rejecting. The fallback runs when a call
// fails or times out, and when the breaker is open.
catalogBreaker.fallback(() => ({ products: [], degraded: true }));
catalogBreaker.on("open", () => {/* emit a breaker-open metric here */});

// catalogBreaker.fire(path) resolves with the result, or with the fallback value.

Treat those breaker numbers as starting points, not measured truth. A breaker that trips too eagerly causes false trips and self-inflicted outages; one that trips too late lets a dying dependency drag everything down. You tune them against your own traffic.

Two pieces of opossum carry into later sections. Its .fallback() is the hook you wire to graceful degradation: when a call fails, times out, or is short-circuited by an open breaker, fire() resolves with the fallback value instead of rejecting, so a down dependency returns empty-but-valid data rather than a 500. Its open event is the metric you watch to know a breaker has tripped. The connector code and the degraded-mode policy are the same mechanism seen from two ends.

What opossum does not give you is retry, keep-alive, or DNS caching. You add those yourself. Retry goes inside the function the breaker wraps, so it sits below the breaker (the safety rules are in the Timeout Budgets and Retry Safety section). The keep-alive agent and a cached resolver are not optional polish: a new connection per request pays a TCP handshake, a TLS handshake, and a DNS lookup on every hop, and across a fan-out of many calls per query that overhead and its failure surface add up. A keep-alive https.Agent and a cached resolver remove most of it.
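One way to get the cached resolver is a small wrapper around dns.lookup with a time-to-live, handed to the agent as its lookup function. This is a sketch under assumptions: createCachedLookup, ttlMs, and the 30-second default are hypothetical, errors are deliberately not cached, and it relies on the Node documentation's statement that an Agent also accepts socket.connect() options such as lookup, which is worth confirming against the Node version you run.

```javascript
const dns = require("node:dns");
const https = require("node:https");

// Wraps a dns.lookup-style function with a small TTL cache.
// `resolver` is injectable so the cache can be exercised without real DNS.
function createCachedLookup({ ttlMs = 30000, resolver = dns.lookup } = {}) {
  const cache = new Map(); // "hostname|options" -> { results, expires }

  return function cachedLookup(hostname, options, callback) {
    if (typeof options === "function") {
      callback = options;
      options = {};
    }
    const key = `${hostname}|${JSON.stringify(options)}`;
    const hit = cache.get(key);
    if (hit && hit.expires > Date.now()) {
      // Replay the cached callback arguments asynchronously, as dns.lookup would.
      process.nextTick(callback, null, ...hit.results);
      return;
    }
    resolver(hostname, options, (err, ...results) => {
      // Only successful answers are cached; failures are retried next time.
      if (!err) cache.set(key, { results, expires: Date.now() + ttlMs });
      callback(err, ...results);
    });
  };
}

// One agent per connector: keep-alive plus the cached resolver.
const cachedAgent = new https.Agent({
  keepAlive: true,
  maxSockets: 50,
  lookup: createCachedLookup(),
});
```

Keep the TTL short. A long one pins the connector to addresses that a deploy or failover has already retired, which trades one failure surface for another.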

Per-Dependency Bulkheads

The circuit breaker handles a dependency that is down. It does nothing for one that is up but slow. A downstream that answers in eight seconds instead of eighty milliseconds never trips a breaker; it just holds connections and event-loop turns. If every connector draws from one shared socket pool, that one slow service can consume the whole pool, and then every other field, including the healthy ones, queues behind it. The breaker protects you from failure. The bulkhead protects you from contention.

The fix is resource isolation per dependency, borrowed from the bulkhead pattern: partition shared capacity so one consumer cannot exhaust it for the rest. In Node terms, that means two caps per connector.

The first is a socket cap. Each connector gets its own https.Agent with a maxSockets limit, so the catalog service can open at most, say, fifty connections no matter how slow it gets. The pricing service has its own fifty, drawn from a separate pool. A slow catalog cannot starve pricing, because they never share sockets in the first place.

The second is a concurrency cap. A semaphore or a p-limit-style gate bounds how many calls to a given downstream can be in flight at once. Calls past the limit wait briefly or fail fast rather than piling onto a struggling dependency. This also bounds memory: a slow service can hold at most N in-flight requests, not the entire incoming queue.

const pLimit = require("p-limit");

// One gate per downstream. The catalog connector may have at most
// 20 calls in flight; the rest queue or fail fast.
const catalogGate = pLimit(20);

function fireCatalog(path) {
  // The breaker still owns timeout and failure tracking;
  // the gate owns concurrency. They compose, they do not overlap.
  return catalogGate(() => catalogBreaker.fire(path));
}

Concern	Mechanism	What it prevents
Dependency is down	Circuit breaker	Hammering a service that cannot answer
Dependency is slow	Concurrency gate + socket cap	One slow service hogging shared capacity
Connection churn	Keep-alive agent	TCP/TLS/DNS cost per call
Total latency	Request deadline	One slow hop blowing the request budget

The breaker and the bulkhead compose cleanly because they catch different failures. The breaker stops calling a dependency that is returning errors. The bulkhead stops a dependency that is answering slowly from monopolizing sockets and concurrency. A connector wants both: without the breaker, a hard-down service wastes every call on a guaranteed failure; without the bulkhead, a soft-slow service quietly drags the whole gateway into timeouts.
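A plain queueing gate covers the "wait briefly" half of the concurrency cap but not the "fail fast" half, because its queue is unbounded. The sketch below adds a bounded wait queue so that overflow is rejected immediately. createGate, maxConcurrent, and maxQueued are hypothetical names, and the limits are placeholders to tune.

```javascript
// A concurrency gate with a bounded wait queue. Up to maxConcurrent tasks
// run at once, up to maxQueued wait, and anything beyond that is rejected.
function createGate({ maxConcurrent, maxQueued }) {
  let active = 0;
  const waiting = [];

  function release() {
    const next = waiting.shift();
    if (next) {
      next(); // hand the slot straight to the next waiter; `active` is unchanged
    } else {
      active -= 1;
    }
  }

  return async function run(task) {
    if (active < maxConcurrent) {
      active += 1;
    } else if (waiting.length < maxQueued) {
      // Wait until release() hands this call a slot.
      await new Promise((resolve) => waiting.push(resolve));
    } else {
      throw new Error("bulkhead full: rejecting instead of queueing");
    }
    try {
      return await task();
    } finally {
      release();
    }
  };
}
```

Handing the freed slot directly to the next waiter, rather than decrementing and letting callers race for it, keeps the in-flight count from briefly exceeding the cap. A rejected call should be treated like any other connector failure and routed to the degraded path.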

Timeout Budgets and Retry Safety

Isolation decides which dependency suffers when one misbehaves. Timeout budgets decide how long the whole request is allowed to suffer at all. A GraphQL query is only as fast as its slowest required field, so the gateway needs a request-level deadline that bounds total latency, and each connector’s own timeout must sit comfortably below it.

The rule is that downstream timeouts are shorter than the client-facing deadline, and they shrink as you go deeper. If the client deadline is three seconds and a resolver calls two services in sequence, neither downstream timeout can be three seconds; together they would blow the budget with no margin for the gateway’s own work. Propagate a remaining-time budget into each connector and let it cap its own timeout at the smaller of its default and the time left. When the budget is spent, fail fast and degrade rather than holding the client past the deadline for data that will arrive too late to matter.
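The remaining-time budget can be a small object created once per request and passed to every connector. This is a minimal sketch, assuming a hypothetical createDeadline helper with a reserveMs margin held back for the gateway's own work; the clock is injectable so the arithmetic can be checked.

```javascript
// Tracks the time left in one client request and caps connector timeouts.
function createDeadline(totalMs, { now = Date.now, reserveMs = 50 } = {}) {
  const expiresAt = now() + totalMs;
  // Time left for downstream calls, after holding back the gateway's reserve.
  const remaining = () => Math.max(0, expiresAt - now() - reserveMs);

  return {
    remaining,
    // The smaller of the connector's default timeout and the time left.
    timeoutFor(defaultMs) {
      const left = remaining();
      if (left <= 0) {
        throw new Error("deadline exceeded: degrade instead of calling");
      }
      return Math.min(defaultMs, left);
    },
  };
}

// Usage sketch for a 3s client deadline and two sequential calls:
//   const deadline = createDeadline(3000);
//   const first = deadline.timeoutFor(2000);  // at most 2000ms
//   ...await the first call...
//   const second = deadline.timeoutFor(2000); // whatever is left, capped at 2000ms
```

The second call inherits whatever the first one left behind, so the timeouts shrink as the request goes deeper without any connector needing to know about the others.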

Retry needs the same discipline, because a careless retry turns a struggling dependency into a dead one. Three rules keep it safe. First, only retry idempotent operations: a GET is safe to repeat, a non-idempotent mutation is not, and retrying it risks double-applying a side effect. Second, retry with jitter, not a fixed delay, so a fleet of gateways does not synchronize its retries into a thundering herd that hits the recovering service all at once. Third, never retry through an open breaker. The breaker exists precisely to stop calls to a failing dependency; a retry loop layered on top defeats it and produces the retry-storm failure mode, where each client multiplies its load on a service that is already falling over. Retry sits below the breaker, so an open breaker short-circuits the retry entirely.
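The first two rules fit in one small helper, and the third falls out of where you put it. This is a sketch under assumptions: retryWithJitter and its options are hypothetical, the delays are placeholders, and the caller has to declare a call idempotent before it is retried at all.

```javascript
// Retry with exponential backoff and full jitter, for idempotent calls only.
// All names and defaults here are illustrative assumptions.
async function retryWithJitter(attempt, options = {}) {
  const {
    idempotent = false, // callers must opt in; everything else gets one attempt
    retries = 2,
    baseMs = 100,
    capMs = 2000,
    random = Math.random,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;
  const maxAttempts = idempotent ? retries + 1 : 1;

  for (let i = 0; ; i += 1) {
    try {
      return await attempt();
    } catch (err) {
      if (i + 1 >= maxAttempts) throw err;
      // Full jitter: a uniform random delay below an exponentially growing ceiling.
      const ceiling = Math.min(capMs, baseMs * 2 ** i);
      await sleep(random() * ceiling);
    }
  }
}

// Composition (comment only, using this article's names): the breaker wraps
// the retrying function, so an open breaker rejects before any retry runs.
//   new CircuitBreaker(
//     (path) => retryWithJitter(() => callCatalog(path), { idempotent: true }),
//     options,
//   );
```

With this composition the breaker's timeout covers all attempts together, not each one, so the retry count and delays have to fit inside it.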

N+1 Batching With DataLoader

Connector isolation contains failures. It does nothing about volume. A resolver that loops over a list and fires one downstream call per element generates the N+1 explosion, and that is where DataLoader comes in. DataLoader coalesces per-item calls made within a single tick into one batched call, and it memoizes results for the duration of the request.

const DataLoader = require("dataloader");

// Created PER REQUEST, never shared across requests.
function createLoaders(connectors) {
  return {
    productById: new DataLoader(async (ids) => {
      // One batched downstream call for all ids in this tick.
      const products = await connectors.catalog.getProductsByIds(ids);
      const byId = new Map(products.map((p) => [p.id, p]));
      // Return one result per id, in the same order.
      return ids.map((id) => byId.get(id) ?? null);
    }),
  };
}

The lifecycle discipline is the whole point, and it is also a security boundary. The README is explicit: “Typically, DataLoader instances are created when a Request begins, and are not used once the Request ends.” And the risk if you ignore that: “Avoid multiple requests from different users using the DataLoader instance, which could result in cached data incorrectly appearing in each request.” A shared loader is not a slow cache. It is a cross-user data leak, which is a security bug.

One more misread to head off: DataLoader’s cache “does not replace Redis, Memcache, or any other shared application-level cache.” It is per-request memoization, scoped to one query, gone when the request ends. If you need a cache that survives requests, that is a separate layer.
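To make the "single tick" coalescing and the request-scoped cache concrete, here is a deliberately tiny loader written from scratch. It is not DataLoader's implementation and createTinyLoader is a hypothetical name; it exists only to show why keys requested together become one downstream call, and why the cache lives and dies with the loader instance.

```javascript
// Collects keys requested in the same tick and resolves them with one
// call to batchFn. Illustrative only; use the real DataLoader in practice.
function createTinyLoader(batchFn) {
  const cache = new Map(); // key -> Promise; lives only as long as this loader
  let queue = [];

  function dispatch() {
    const batch = queue;
    queue = [];
    const keys = batch.map((item) => item.key);
    // Wrapping in a Promise turns a synchronous throw into a rejection.
    new Promise((resolve) => resolve(batchFn(keys))).then(
      (values) => batch.forEach((item, i) => item.resolve(values[i])),
      (err) => batch.forEach((item) => item.reject(err)),
    );
  }

  function load(key) {
    if (cache.has(key)) return cache.get(key); // memoized for this loader
    const promise = new Promise((resolve, reject) => {
      // The first key of a tick schedules the dispatch; later keys join it.
      if (queue.length === 0) process.nextTick(dispatch);
      queue.push({ key, resolve, reject });
    });
    cache.set(key, promise);
    return promise;
  }

  return { load };
}
```

Because the cache is a field of the loader, creating one loader per request is what scopes the memoization to that request. Hoisting the loader to module scope would turn the same Map into a cache shared by every user.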

Query-Cost Limiting

Batching tames the queries you expect. Cost limiting handles the ones you do not. It is admission control: score each incoming query for depth and complexity against a budget, and reject anything over the budget before execution begins. A deeply nested or recursive query, whether deliberately abusive or accidentally generated by a client, never gets to spend the gateway’s resources.
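The scoring itself is simple arithmetic, sketched here over a simplified selection tree rather than a real GraphQL AST. Everything in it is an assumption for illustration: the tree shape (a field maps to an optional `first` list size and nested `fields`), the admit function, and a cost map keyed by field name only. A real analyzer walks the parsed query against the schema.

```javascript
// Cost of one field: its own cost plus its children's cost, multiplied by
// the requested list size. Unlisted fields cost 1.
function fieldCost(name, node, costMap) {
  const own = costMap[name] ?? 1;
  const multiplier = node.first ?? 1;
  const childCost = Object.entries(node.fields ?? {}).reduce(
    (sum, [childName, child]) => sum + fieldCost(childName, child, costMap),
    0,
  );
  return own + multiplier * childCost;
}

// Depth of one field: 1 for itself plus its deepest child.
function fieldDepth(node) {
  const children = Object.values(node.fields ?? {});
  return 1 + Math.max(0, ...children.map(fieldDepth));
}

// Admission decision made before any resolver runs.
function admit(selection, { maximumCost, maximumDepth, costMap = {} }) {
  const entries = Object.entries(selection);
  const cost = entries.reduce(
    (sum, [name, node]) => sum + fieldCost(name, node, costMap),
    0,
  );
  const depth = Math.max(0, ...entries.map(([, node]) => fieldDepth(node)));
  return { allowed: cost <= maximumCost && depth <= maximumDepth, cost, depth };
}

// Example: 50 products, each with 20 reviews, each review with an author.
const expensiveQuery = {
  products: {
    first: 50,
    fields: {
      name: {},
      reviews: {
        first: 20,
        fields: { body: {}, author: { fields: { name: {} } } },
      },
    },
  },
};
const verdict = admit(expensiveQuery, {
  maximumCost: 1000,
  maximumDepth: 5,
  costMap: { products: 2, reviews: 5 },
});
// verdict.allowed is false: the nested list sizes multiply the cost to 3302.
```

The example shows why list sizes matter more than depth alone: the query is only four levels deep, but two nested lists multiply it well past the budget.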

graphql-cost-analysis is one concrete example. You give it a maximumCost, a costMap that assigns per-field cost without needing @cost directives in the schema, and an onComplete(cost) hook for logging or session-based rate limiting. Queries above maximumCost are rejected before they run.

This is a genuine “which library” decision, not a settled one. graphql-cost-analysis is well known but has not seen frequent recent releases, so verify its maintenance status before you make it a primary dependency. The alternatives are worth weighing: graphql-query-complexity is more current, graphql-depth-limit handles the simpler depth-only case, and Apollo ships demand control based on @cost directives in its router (check the current Apollo documentation for where it is supported). The control you want is static cost analysis before execution; which package provides it is the part to re-evaluate against current maintenance status.

Cost limiting pairs well with persisted queries, an allowlist of known-good operations the client may send. Apollo’s persisted queries documentation treats them as both an admission-control and a stability lever: if only registered queries are allowed, arbitrary expensive ones never reach the executor at all. A JIT executor exists as an advanced lever for hot queries that dominate CPU, but most gateways never need it; the standard graphql-js executor is the safe default and keeps improving, so reach for execution-level tricks only when profiling names interpretation overhead as the bottleneck.
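The allowlist idea reduces to a lookup by hash. The sketch below is generic and is not Apollo's persisted-query protocol: createAllowlist, the SHA-256 keying, and the sample operation are assumptions for illustration.

```javascript
const { createHash } = require("node:crypto");

const sha256 = (text) => createHash("sha256").update(text).digest("hex");

// Built at deploy time from the operations the clients are known to send.
function createAllowlist(knownOperations) {
  const byHash = new Map(knownOperations.map((query) => [sha256(query), query]));
  return {
    // Clients send only the hash; unknown hashes never reach the executor.
    resolve(hash) {
      const query = byHash.get(hash);
      if (query === undefined) {
        throw new Error("operation not in allowlist");
      }
      return query;
    },
  };
}

const allowlist = createAllowlist([
  "query ProductPage($id: ID!) { product(id: $id) { name price } }",
]);
```

Because every allowed operation is known ahead of time, its cost can be scored once at registration rather than on every request.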

Graceful Degradation With Partial Data

The controls so far keep the gateway healthy. The last one decides what the client sees when a dependency fails anyway. GraphQL has a spec-level answer: a response can carry data for the fields that resolved and an errors array for the ones that did not, instead of a top-level 500. When one optional service is down, the page renders the rest of its data in a degraded state rather than going blank.

This is where the connector’s circuit breaker becomes a degradation mechanism, not just a protection one. opossum’s .fallback() is exactly this seam: when the wrapped call fails or the breaker is open, fire() resolves with a fallback value instead of rejecting. Wire the fallback to return empty-but-valid data for an optional dependency, and the resolver above it sees data rather than an exception, so the field renders in a degraded state instead of poisoning the whole response with a 500.

To make degradation deliberate rather than accidental, model errors as part of the schema where it matters. The GraphQL specification’s Errors section defines how partial results and an errors array coexist in one response, and Apollo’s error-handling guide shows how to express error states as types and unions, so a client can pattern-match on a known shape instead of parsing a stringly-typed errors array. The first design step is to classify each dependency as critical or optional. A failed critical field may still warrant a hard error; an optional one should degrade. Treating every dependency as critical is the mistake that turns one optional 500 into a blank page.
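The critical-versus-optional classification can be shown outside any GraphQL server. In a real gateway the executor builds the errors array when a nullable field's resolver throws; the sketch below imitates that response shape by hand. resolvePage, the policy table, and the field names are hypothetical, and anything not marked optional is treated as critical.

```javascript
// Runs every loader in parallel, then applies the policy:
// a failed optional field becomes null plus an errors entry,
// a failed critical field fails the whole response.
async function resolvePage(loaders, policy) {
  const names = Object.keys(loaders);
  const settled = await Promise.allSettled(names.map((name) => loaders[name]()));

  const data = {};
  const errors = [];
  settled.forEach((outcome, i) => {
    const name = names[i];
    if (outcome.status === "fulfilled") {
      data[name] = outcome.value;
      return;
    }
    if (policy[name] !== "optional") {
      throw outcome.reason; // critical dependency: hard error
    }
    data[name] = null; // optional dependency: degrade this field only
    errors.push({ message: outcome.reason.message, path: [name] });
  });

  return errors.length > 0 ? { data, errors } : { data };
}

// Example policy: the page cannot render without the catalog,
// but it can render without recommendations.
const pagePolicy = { catalog: "critical", recommendations: "optional" };
```

Defaulting unlisted dependencies to critical is a conservative choice for the sketch, not a recommendation. The point of the policy table is that someone decided, per dependency, which failures the page can survive.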

One supporting technique sharpens this without changing the core shape. A resolver can inspect the requested selection set and skip downstream fields the client did not ask for, which avoids over-fetching; the graphql-parse-resolve-info utility parses the GraphQLResolveInfo object into the requested-fields tree that makes this practical.

Which Controls, In What Order

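The default is to start with connector isolation, then add the rest as the shape of your gateway demands. The decision tree roots at that default and branches on the conditions that justify each further control: a resolver that fans out over a list calls for DataLoader, untrusted clients call for cost limiting and persisted queries, and optional dependencies call for partial-data degradation.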

This beats reaching for “just add a CDN” or “just cache harder” because the failure mode is dependency failure, not load. Caching does not save you when a service returns errors; it only serves stale data faster while the underlying call still fails. Schema-first refactors are the alternative most teams reach for first, and the wrong one to lead with: a cleaner schema does not contain a blast radius.

The Cost of Each Control

Every control on this list buys resilience and charges something for it. Connector isolation adds configuration and per-service tuning; a mis-tuned breaker causes false trips that look like outages. Bulkheads add a socket and concurrency cap per dependency, and caps set too low throttle a healthy service while caps set too high defeat the isolation. DataLoader demands per-request lifecycle discipline, and a shared loader leaks data across users. Cost limiting that is too strict rejects legitimate queries, so the budget has to be one you measured, not guessed. And partial data only helps if clients are written to handle data + errors; a degraded mode that no client renders is dead code.

Common Pitfalls

Sharing a DataLoader across requests, which leaks cached data across users. Fix: instantiate per request.
No timeout on downstream calls, so a hung dependency hangs the whole query. Fix: a per-connector timeout that feeds the breaker.
Treating every dependency as critical, so one optional service blanks the page. Fix: classify critical versus optional and degrade the optional.
A circuit breaker with no half-open probe, which stays open forever. Fix: half-open trial requests.
A new connection per request, which pays TCP, TLS, and DNS overhead on every hop. Fix: a keep-alive agent and a DNS cache.
One shared socket pool across all connectors, so a slow dependency starves the rest. Fix: a per-connector maxSockets cap and a concurrency gate.
Retrying through an open breaker or retrying a non-idempotent call, which amplifies load on a struggling dependency. Fix: retry only idempotent operations, with jitter, below the breaker.

On the monitoring side, treat the following as signals rather than targets, since a meaningful number comes only from your own traffic. Per connector, the metrics that matter are breaker state and its transitions (the open event opossum already emits), downstream p99 latency, and the fallback rate, which tells you how often a dependency is serving degraded data. At the gateway level, watch downstream fan-out per query (it should drop sharply once DataLoader is in place), the cost limiter’s rejection rate alongside its false-rejection rate, and the share of responses that return partial data + errors versus hard 5xx. Degraded mode is working when it converts 5xx into partial responses; a rising fallback rate on a connector is the early signal that a dependency is sliding before its breaker fully trips.

When Not to Build This

A BFF concentrates risk into one place. That is its value when you have many independent dependencies, and its cost when you do not. Skip or defer this architecture when you front only two or three services or one well-behaved backend, since a BFF then adds a hop, a deploy unit, and an on-call surface for little gain. Cost limiting and persisted-query allowlists buy less when your clients are internal and trusted. If a managed gateway such as AWS AppSync or Apollo Router with federation already gives you isolation, hand-rolling connectors may be redundant. And if the team cannot own the operational burden, an under-resourced BFF is worse than direct calls.

Where it does fit, start with connector-per-service isolation and per-dependency bulkheads as the non-negotiable foundation, add DataLoader the moment a resolver fans out over a list, add cost limiting and persisted queries before exposing the graph to untrusted clients, and write clients to consume partial data so a single failed dependency degrades a page instead of blanking it. Treat execution-level tricks as a last resort, reached for only when profiling names interpretation overhead as the bottleneck.

References

opossum - Widely-used standalone Node.js circuit breaker (current major v9). Wraps an async function with a breaker, per-call timeout, .fallback(), and an open event; the lead reference implementation for the connector pattern.
Bulkhead pattern - Azure Architecture Center: partition resources per consumer so one dependency cannot exhaust shared capacity for the rest.
p-limit - Promise-concurrency limiter used to build a per-dependency concurrency gate.
graphql/dataloader - Canonical N+1 batching and per-request memoization utility. README documents the per-request lifecycle and the cross-user cache-leak risk.
dataloader on npm - Package and version reference for the DataLoader library.
graphql-cost-analysis - Static query-cost analyzer (maximumCost, costMap, onComplete). Verify maintenance status; compare against graphql-query-complexity and Apollo demand control.
graphql-query-complexity - Actively-maintained query complexity analyzer for graphql-js, an alternative to graphql-cost-analysis.
graphql-parse-resolve-info - Parses GraphQLResolveInfo into the requested-fields tree, enabling selection-set lookaheads that skip unrequested downstreams.
GraphQL specification: Errors - The spec’s definition of how partial data and an errors array coexist in one response, the foundation for partial-data degradation.
Apollo Server error handling - Expressing error states as types and unions so clients pattern-match a known shape.
Apollo persisted queries - Persisted-query allowlists as a stability and admission-control lever.
The Back-end for Front-end Pattern (BFF) - Phil Calçado’s original write-up coining the BFF pattern at SoundCloud.
Backends For Frontends - Sam Newman’s widely-cited definition of the BFF pattern.
Microsoft REST API Guidelines - Public guidelines behind the REST microservices a gateway like this fronts; useful for the typed-contracts framing.
