Implementing Circuit Breakers in Edge Middleware
A downstream API starts timing out. Your edge middleware keeps calling it on every request, each call hanging until the AbortController fires, each hang holding a request open against tight execution limits and stacking up latency for users who were never going to get a useful response. The failing service, meanwhile, is buried under retries it cannot answer. A circuit breaker stops this spiral: after enough consecutive failures it trips, short-circuits to a fallback for a cooldown window, then cautiously probes for recovery.
This guide is part of Observability and Debugging Edge Middleware. It walks through a closed/open/half-open breaker around a downstream fetch, with exponential backoff, and explains the central edge constraint: where the breaker’s state actually lives.
Root cause: per-isolate state at the edge
A circuit breaker is fundamentally a piece of shared state — a failure count and a status — that many requests read and update. On a single long-lived server that state is one object in process memory, seen by every request. At the edge it is not.
Your code runs in many V8 isolates spread across Points of Presence. A module-scope variable is local to one isolate; a second isolate at the same PoP, and certainly one in another city, has its own copy. So a breaker held in module scope trips per isolate, not globally. That is acceptable — even desirable — for shielding a single hot isolate from a flapping dependency, and it costs nothing. When you need one consistent verdict across the whole fleet, the state must live in a single coordination point: a Durable Object, which gives you one authoritative instance every isolate can consult.
This guide builds the module-scope version first, then shows the Durable Object variant.
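To make the per-isolate behaviour concrete, here is a minimal sketch that models two isolates as separate closures, each holding its own failure counter. `createIsolate` is a hypothetical stand-in for the runtime starting a fresh isolate and evaluating your module; it is not a platform API.

```typescript
// Hypothetical model: each call stands in for one isolate evaluating the module,
// so each gets its own copy of the "module-scope" counter.
function createIsolate() {
  let failures = 0; // plays the role of a module-scope variable
  return {
    recordFailure(): void {
      failures += 1;
    },
    failureCount(): number {
      return failures;
    },
  };
}

const isolateA = createIsolate();
const isolateB = createIsolate();

// Three failed requests happen to land on isolate A.
isolateA.recordFailure();
isolateA.recordFailure();
isolateA.recordFailure();

console.log(isolateA.failureCount()); // 3
console.log(isolateB.failureCount()); // 0: isolate B never saw those failures
```

A breaker built on the counter in isolate A could trip while isolate B keeps calling the downstream, which is exactly the per-isolate behaviour described above.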
Step 1: Model the three states
A breaker has three states. Closed is healthy: requests pass through. Open is tripped: requests short-circuit to the fallback without touching the downstream. Half-open is probing: a single request is allowed through to test whether the downstream has recovered, promoting back to closed on success or back to open on failure.
// breaker.ts
export type BreakerState = "closed" | "open" | "half-open";
export interface BreakerConfig {
failureThreshold: number; // consecutive failures before opening
baseCooldownMs: number; // first open-state cooldown
maxCooldownMs: number; // ceiling for exponential backoff
}
export interface BreakerSnapshot {
state: BreakerState;
failures: number;
openedAt: number;
cooldownMs: number;
}
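The three states and their transitions can also be written down as a pure function, which may help when reasoning about the state machine before adding counters and clocks. This is a sketch only: the event names are hypothetical labels chosen for the illustration and do not appear elsewhere in this guide.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// Hypothetical event names, chosen for this illustration only.
type BreakerEvent =
  | "threshold-reached"
  | "cooldown-elapsed"
  | "probe-succeeded"
  | "probe-failed";

// Pure transition table: returns the next state, or the same state when the
// event does not apply to the current one.
function nextState(state: BreakerState, event: BreakerEvent): BreakerState {
  switch (state) {
    case "closed":
      return event === "threshold-reached" ? "open" : "closed";
    case "open":
      return event === "cooldown-elapsed" ? "half-open" : "open";
    case "half-open":
      if (event === "probe-succeeded") return "closed";
      if (event === "probe-failed") return "open";
      return "half-open";
  }
}

console.log(nextState("open", "cooldown-elapsed")); // "half-open"
```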
Step 2: Implement the state machine with exponential backoff
The breaker reads the clock, decides whether a call may proceed, and records the outcome. On repeated trips the cooldown grows exponentially up to a ceiling, so a persistently broken dependency is probed less and less often.
import type { BreakerConfig, BreakerSnapshot, BreakerState } from "./breaker";
export class CircuitBreaker {
private state: BreakerState = "closed";
private failures = 0;
private openedAt = 0;
private cooldownMs: number;
private consecutiveOpens = 0;
constructor(private readonly cfg: BreakerConfig) {
this.cooldownMs = cfg.baseCooldownMs;
}
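private probeStartedAt = -1; // -1 means no probe is in flight
/** Returns true if a call may proceed (closed, or the single half-open probe). */
canAttempt(now = Date.now()): boolean {
if (this.state === "closed") return true;
if (this.state === "open") {
if (now - this.openedAt < this.cooldownMs) return false; // still cooling down
this.state = "half-open";
this.probeStartedAt = -1;
}
// Half-open: admit one probe. If its outcome is never recorded, the claim
// lapses after cooldownMs so the breaker cannot get stuck.
if (this.probeStartedAt >= 0 && now - this.probeStartedAt < this.cooldownMs) return false;
this.probeStartedAt = now;
return true;
}
recordSuccess(): void {
this.failures = 0;
this.consecutiveOpens = 0;
this.cooldownMs = this.cfg.baseCooldownMs;
this.probeStartedAt = -1;
this.state = "closed";
}
recordFailure(now = Date.now()): void {
// Late failures from calls that started before the trip must not re-trip
// the breaker and inflate the backoff.
if (this.state === "open") return;
this.failures += 1;
if (this.state === "half-open" || this.failures >= this.cfg.failureThreshold) {
this.trip(now);
}
}
private trip(now: number): void {
this.state = "open";
this.openedAt = now;
this.probeStartedAt = -1;
this.consecutiveOpens += 1;
// Exponential backoff: base * 2^(opens-1), capped.
const factor = 2 ** (this.consecutiveOpens - 1);
this.cooldownMs = Math.min(this.cfg.baseCooldownMs * factor, this.cfg.maxCooldownMs);
}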
snapshot(): BreakerSnapshot {
return { state: this.state, failures: this.failures, openedAt: this.openedAt, cooldownMs: this.cooldownMs };
}
}
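To see what the backoff formula in trip() produces, the sketch below extracts it into a standalone function and prints the cooldown for successive trips. `cooldownFor` is a helper introduced for this illustration, and the 2-second base and 60-second ceiling are example values.

```typescript
// Same formula as trip(): base * 2^(opens - 1), capped at the ceiling.
function cooldownFor(
  consecutiveOpens: number,
  baseCooldownMs: number,
  maxCooldownMs: number,
): number {
  const factor = 2 ** (consecutiveOpens - 1);
  return Math.min(baseCooldownMs * factor, maxCooldownMs);
}

const schedule = [1, 2, 3, 4, 5, 6, 7].map((opens) => cooldownFor(opens, 2_000, 60_000));
console.log(schedule); // [2000, 4000, 8000, 16000, 32000, 60000, 60000]
```

With these values the ceiling takes over on the sixth consecutive trip, so a dependency that stays down is probed at most once a minute.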
Step 3: Wrap the downstream fetch
Hold one breaker instance in module scope, keyed by downstream, so it persists across requests served by the same isolate. The middleware consults it before calling, records the outcome, and returns a fallback when the breaker is open.
import { CircuitBreaker } from "./circuit-breaker";
// Module scope: shared across requests within one isolate.
const apiBreaker = new CircuitBreaker({ failureThreshold: 5, baseCooldownMs: 2_000, maxCooldownMs: 60_000 });
export async function callDownstream(req: Request): Promise<Response> {
if (!apiBreaker.canAttempt()) {
return new Response(JSON.stringify({ error: "service unavailable" }), {
status: 503,
headers: { "content-type": "application/json", "x-circuit": "open" },
});
}
const ctrl = new AbortController();
const timer = setTimeout(() => ctrl.abort(), 1_500);
try {
const res = await fetch("https://api.internal/resource", { signal: ctrl.signal });
if (res.status >= 500) {
apiBreaker.recordFailure();
} else {
apiBreaker.recordSuccess();
}
return res;
} catch {
apiBreaker.recordFailure();
return new Response(JSON.stringify({ error: "upstream timeout" }), {
status: 503,
headers: { "content-type": "application/json", "x-circuit": "tripped" },
});
} finally {
clearTimeout(timer);
}
}
Count only failures the breaker should react to. A timeout or 5xx is a failure; a 404 or 401 is a normal downstream answer and must not trip the breaker. Pair this with an early-return guard so the open-state fallback exits the chain immediately without invoking later stages.
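One way to keep that rule in a single place is a small classifier that the try and catch branches can both consult. This is a sketch under assumptions: the `CallOutcome` shape and the `countsAsFailure` name are invented for the example.

```typescript
// Outcome of one downstream call: either an HTTP status, or a thrown error
// (timeout/abort or network failure). The shape is an assumption of this sketch.
type CallOutcome = { kind: "response"; status: number } | { kind: "error" };

function countsAsFailure(outcome: CallOutcome): boolean {
  if (outcome.kind === "error") return true; // aborted or unreachable
  return outcome.status >= 500; // 5xx only; a 4xx is a valid downstream answer
}

console.log(countsAsFailure({ kind: "response", status: 404 })); // false
console.log(countsAsFailure({ kind: "response", status: 503 })); // true
console.log(countsAsFailure({ kind: "error" })); // true
```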
Step 4: Coordinate globally with a Durable Object
When you need one breaker verdict across all isolates, move the state into a Durable Object. Every isolate calls the same instance, so the failure count and trip decision are global. The breaker logic is identical; only the storage location changes.
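// breaker-do.ts
import { CircuitBreaker } from "./circuit-breaker";
export class BreakerDO {
// Held in memory only: the breaker starts again from closed if the object is evicted or restarted.
private breaker = new CircuitBreaker({ failureThreshold: 10, baseCooldownMs: 5_000, maxCooldownMs: 120_000 });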
async fetch(req: Request): Promise<Response> {
const { outcome } = (await req.json()) as { outcome: "success" | "failure" | "check" };
if (outcome === "success") this.breaker.recordSuccess();
else if (outcome === "failure") this.breaker.recordFailure();
const allowed = this.breaker.canAttempt();
return Response.json({ allowed, ...this.breaker.snapshot() });
}
}
The trade-off is latency: every request now makes a round-trip to the Durable Object before calling the downstream. Reserve the global breaker for dependencies where over-calling is genuinely harmful (a fragile partner API, a metered backend); use the cheaper module-scope breaker everywhere else.
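The isolate side of that round-trip might look like the sketch below. It declares minimal structural stand-ins for the Workers types so the example is self-contained; in a real Worker you would pass the BREAKER binding, whose namespace provides idFromName() and get(). The instance name "api-breaker", the URL, and the fail-open policy when the object is unreachable are all assumptions of this sketch.

```typescript
// Minimal structural stand-ins for the Workers types, declared locally so the
// sketch is self-contained.
interface StubLike {
  fetch(url: string, init: { method: string; body: string }): Promise<{ json(): Promise<unknown> }>;
}
interface NamespaceLike {
  idFromName(name: string): unknown;
  get(id: unknown): StubLike;
}

type Outcome = "success" | "failure" | "check";

// Sends one message to the breaker object and returns its verdict.
// Fails open (allowed: true) if the Durable Object itself cannot be reached;
// that policy is an assumption of this sketch, not a requirement.
async function askBreaker(ns: NamespaceLike, outcome: Outcome): Promise<{ allowed: boolean }> {
  try {
    const stub = ns.get(ns.idFromName("api-breaker")); // hypothetical instance name
    const res = await stub.fetch("https://breaker.internal/", {
      method: "POST",
      body: JSON.stringify({ outcome }),
    });
    const body = (await res.json()) as { allowed: boolean };
    return { allowed: body.allowed };
  } catch {
    return { allowed: true };
  }
}
```

A caller would send "check" before the downstream fetch and report "success" or "failure" afterwards. Failing open keeps a breaker outage from becoming a full outage, at the cost of losing protection while the object is unavailable; failing closed is the safer choice for a metered backend.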
Configuration
For the Durable Object variant, declare the binding and migration in wrangler.jsonc:
{
"name": "edge-middleware",
"main": "src/index.ts",
"compatibility_date": "2026-06-01",
"durable_objects": { "bindings": [{ "name": "BREAKER", "class_name": "BreakerDO" }] },
"migrations": [{ "tag": "v1", "new_classes": ["BreakerDO"] }]
}
On Vercel Edge and Netlify there is no Durable Object equivalent; use module-scope breakers for per-isolate protection and an external coordinated store (Vercel KV / Upstash, Netlify Blobs, or your own service) when global state is required.
Local vs production divergence
| Aspect | Local dev | Production |
|---|---|---|
| Module-scope persistence | One isolate — breaker always shared | Many isolates — state is per-isolate |
| Time source | Real Date.now() | Real Date.now(); coarsened in some runtimes |
| Durable Object | Emulated single instance | One global instance per ID |
| Failure injection | You force timeouts/5xx | Real downstream degradation |
| Cooldown observability | Console logs | Trace spans and x-circuit headers |
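For the header side of cooldown observability, a small helper can derive response headers from a breaker snapshot. The sketch below is illustrative: `circuitHeaders` is a name introduced here, and the Retry-After header is an addition made for the example; the guide's own code only sets x-circuit.

```typescript
interface BreakerSnapshot {
  state: "closed" | "open" | "half-open";
  failures: number;
  openedAt: number;
  cooldownMs: number;
}

// Builds response headers describing the breaker. Retry-After is an assumption
// of this sketch: it reports the remaining cooldown in whole seconds.
function circuitHeaders(snap: BreakerSnapshot, now: number): Record<string, string> {
  const headers: Record<string, string> = { "x-circuit": snap.state };
  if (snap.state === "open") {
    const remainingMs = Math.max(0, snap.openedAt + snap.cooldownMs - now);
    headers["retry-after"] = String(Math.ceil(remainingMs / 1000));
  }
  return headers;
}

const openSnap: BreakerSnapshot = { state: "open", failures: 5, openedAt: 0, cooldownMs: 2_000 };
console.log(circuitHeaders(openSnap, 500)); // { "x-circuit": "open", "retry-after": "2" }
```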
Step 5: Validate with Vitest
Drive the state machine with a fake clock so you can assert the trip threshold, the cooldown transition to half-open, and the recovery path — without waiting in real time.
import { describe, it, expect } from "vitest";
import { CircuitBreaker } from "../src/circuit-breaker";
const cfg = { failureThreshold: 3, baseCooldownMs: 1_000, maxCooldownMs: 8_000 };
describe("CircuitBreaker", () => {
it("opens after the failure threshold", () => {
const b = new CircuitBreaker(cfg);
b.recordFailure(); b.recordFailure(); b.recordFailure();
expect(b.snapshot().state).toBe("open");
expect(b.canAttempt(0)).toBe(false);
});
it("moves to half-open after the cooldown and closes on a successful probe", () => {
const b = new CircuitBreaker(cfg);
b.recordFailure(0); b.recordFailure(0); b.recordFailure(0);
expect(b.canAttempt(500)).toBe(false); // still cooling down
expect(b.canAttempt(1_000)).toBe(true); // promoted to half-open
expect(b.snapshot().state).toBe("half-open");
b.recordSuccess();
expect(b.snapshot().state).toBe("closed");
});
it("backs off exponentially on repeated trips", () => {
const b = new CircuitBreaker(cfg);
for (let i = 0; i < 3; i++) b.recordFailure(0);
const first = b.snapshot().cooldownMs; // 1000
b.canAttempt(first); // half-open
b.recordFailure(first); // probe fails -> reopen
expect(b.snapshot().cooldownMs).toBe(2_000); // doubled
});
});
Pitfalls
- Assuming global state from module scope. Module-scope breakers trip per isolate. For one fleet-wide verdict, use a Durable Object or external coordinated store.
- Counting normal responses as failures. A 404 or 401 is a valid answer; tripping on it opens the breaker against a healthy service. Count only timeouts and 5xx.
- No timeout on the fetch. Without an AbortController, a hanging downstream consumes CPU and wall-clock and never registers as a failure. Always bound the call.
- Fixed cooldown. A constant cooldown hammers a persistently broken dependency. Grow it with exponential backoff up to a ceiling.
- Thundering herd on half-open. Letting every isolate probe at once on recovery re-floods the downstream. The Durable Object variant naturally admits a single probe; for module scope, keep the threshold and probe count low.
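For module-scope breakers, one further option that this guide does not otherwise use is to add random jitter to the cooldown, so isolates that tripped at the same moment do not all probe at the same moment. The sketch below is one possible shape: `jitteredCooldown` is a hypothetical helper, and the choice to spread results over the upper half of the capped cooldown is an assumption.

```typescript
// Jitter over the upper half of the capped exponential cooldown:
// the result lies in [cooldown / 2, cooldown]. `random` is injectable for tests.
function jitteredCooldown(
  consecutiveOpens: number,
  baseCooldownMs: number,
  maxCooldownMs: number,
  random: () => number = Math.random,
): number {
  const capped = Math.min(baseCooldownMs * 2 ** (consecutiveOpens - 1), maxCooldownMs);
  const half = capped / 2;
  return Math.round(half + random() * half);
}

// With a fixed "random" value of 0.5, the first trip waits 1500 ms instead of 2000 ms.
console.log(jitteredCooldown(1, 2_000, 60_000, () => 0.5)); // 1500
```

The trade-off is that the cooldown is no longer an exact power of two, so tests that assert precise cooldown values need to inject the random source.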
Production deployment checklist
- Breaker counts only timeouts and 5xx as failures, never a 404 or 401
- Every downstream fetch has an AbortController timeout
- Open state returns a fast fallback 503
- Breaker state changes are emitted as trace spans and x-circuit headers
Frequently Asked Questions
Why does a module-scope circuit breaker only trip per isolate?
Module-scope variables live in a single V8 isolate’s memory. Edge platforms run many isolates across Points of Presence, each with its own copy, so a failure count in one isolate is invisible to the others. That gives per-isolate protection for free; for a single fleet-wide verdict you need a shared coordination point such as a Durable Object.
Should a 404 or 401 from the downstream count as a failure?
No. Those are normal answers from a healthy service. Counting them trips the breaker against a service that is working fine. Treat only timeouts (aborted fetches) and 5xx responses as failures.
When is a Durable Object worth the extra round-trip?
When over-calling a degraded dependency is genuinely harmful — a fragile partner API, a metered or rate-limited backend, or one that needs protection from coordinated retries. For most internal services the cheaper per-isolate breaker is enough and avoids adding a Durable Object hop to every request.
How do I avoid a thundering herd when the breaker recovers?
The half-open state admits only a single probe before deciding. With a Durable Object that single-probe guarantee is global. With module-scope breakers, keep the failure threshold and the implicit probe count low so only a few isolates test recovery at once, and lean on exponential backoff to space out attempts.