Implementing Circuit Breakers in Edge Middleware

A downstream API starts timing out. Your edge middleware keeps calling it on every request. Each call hangs until the AbortController fires, holding the request open and stacking up latency for users who were never going to get a useful response. The failing service, meanwhile, is buried under retries it cannot answer. A circuit breaker stops this spiral: after enough consecutive failures it trips, short-circuits to a fallback for a cooldown window, then cautiously probes for recovery.

This guide is part of Observability and Debugging Edge Middleware. It walks through a closed/open/half-open breaker around a downstream fetch, with exponential backoff, and explains the central edge constraint: where the breaker’s state actually lives.

Root cause: per-isolate state at the edge

A circuit breaker is fundamentally a piece of shared state — a failure count and a status — that many requests read and update. On a single long-lived server that state is one object in process memory, seen by every request. At the edge it is not.

Your code runs in many V8 isolates spread across Points of Presence. A module-scope variable is local to one isolate; a second isolate at the same PoP, and certainly one in another city, has its own copy. So a breaker held in module scope trips per isolate, not globally. That is acceptable — even desirable — for shielding a single hot isolate from a flapping dependency, and it costs nothing. When you need one consistent verdict across the whole fleet, the state must live in a single coordination point: a Durable Object, which gives you one authoritative instance every isolate can consult.
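The per-isolate behaviour can be illustrated with a small simulation. This is only a sketch: `loadModule` is a hypothetical stand-in for one isolate evaluating the module, not a platform API, and real isolates are separate processes or heaps rather than closures.

```typescript
// Simulation only: each call to loadModule() stands in for one isolate
// evaluating the module, so each gets its own private `failures` variable.
function loadModule() {
  let failures = 0; // "module scope" for this simulated isolate
  return {
    recordFailure(): number {
      failures += 1;
      return failures;
    },
    failures(): number {
      return failures;
    },
  };
}

const isolateA = loadModule();
const isolateB = loadModule();

// Five failing requests all happen to land on isolate A.
for (let i = 0; i < 5; i++) isolateA.recordFailure();

console.log(isolateA.failures()); // 5
console.log(isolateB.failures()); // 0: isolate B never saw A's failures
```

A breaker built on that counter would be open in isolate A and still closed in isolate B, which is the per-isolate tripping described above.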

This guide builds the module-scope version first, then shows the Durable Object variant.

Step 1: Model the three states

A breaker has three states. Closed is healthy: requests pass through. Open is tripped: requests short-circuit to the fallback without touching the downstream. Half-open is probing: a single request is allowed through to test whether the downstream has recovered, promoting back to closed on success or back to open on failure.

// breaker.ts
export type BreakerState = "closed" | "open" | "half-open";

export interface BreakerConfig {
  failureThreshold: number; // consecutive failures before opening
  baseCooldownMs: number;   // first open-state cooldown
  maxCooldownMs: number;    // ceiling for exponential backoff
}

export interface BreakerSnapshot {
  state: BreakerState;
  failures: number;
  openedAt: number;
  cooldownMs: number;
}
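The three states and their transitions can also be written as a small pure function, which is handy as a reference while reading the class in the next step. This is a sketch, not part of the guide's implementation: the `BreakerEvent` names and `nextState` are hypothetical, introduced here only for illustration.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// Hypothetical event names for the things that can move the breaker.
type BreakerEvent =
  | "threshold-reached"
  | "cooldown-elapsed"
  | "probe-succeeded"
  | "probe-failed";

// Returns the state the breaker moves to; events that do not apply
// to the current state leave it unchanged.
function nextState(state: BreakerState, event: BreakerEvent): BreakerState {
  switch (state) {
    case "closed":
      return event === "threshold-reached" ? "open" : "closed";
    case "open":
      return event === "cooldown-elapsed" ? "half-open" : "open";
    case "half-open":
      if (event === "probe-succeeded") return "closed";
      if (event === "probe-failed") return "open";
      return "half-open";
  }
}

console.log(nextState("closed", "threshold-reached")); // "open"
console.log(nextState("open", "cooldown-elapsed"));    // "half-open"
console.log(nextState("half-open", "probe-failed"));   // "open"
```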

Step 2: Implement the state machine with exponential backoff

The breaker reads the clock, decides whether a call may proceed, and records the outcome. On repeated trips the cooldown grows exponentially up to a ceiling, so a persistently broken dependency is probed less and less often.

// circuit-breaker.ts
import type { BreakerConfig, BreakerSnapshot, BreakerState } from "./breaker";

export class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private cooldownMs: number;
  private consecutiveOpens = 0;
  private probeStartedAt: number | null = null; // set while a half-open probe is outstanding

  constructor(private readonly cfg: BreakerConfig) {
    this.cooldownMs = cfg.baseCooldownMs;
  }

  /** Returns true if a call may proceed (closed, or the single half-open probe). */
  canAttempt(now = Date.now()): boolean {
    if (this.state === "closed") return true;
    if (this.state === "open") {
      if (now - this.openedAt < this.cooldownMs) return false;
      this.state = "half-open"; // cooldown elapsed: start probing
      this.probeStartedAt = null;
    }
    // Half-open: admit one probe. If it never reports back within a
    // cooldown window, treat it as lost and admit another.
    if (this.probeStartedAt !== null && now - this.probeStartedAt < this.cooldownMs) return false;
    this.probeStartedAt = now;
    return true;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.consecutiveOpens = 0;
    this.cooldownMs = this.cfg.baseCooldownMs;
    this.probeStartedAt = null;
    this.state = "closed";
  }

  recordFailure(now = Date.now()): void {
    // Late failures from calls admitted before the trip must not
    // re-trip the breaker and inflate the backoff.
    if (this.state === "open") return;
    this.failures += 1;
    if (this.state === "half-open" || this.failures >= this.cfg.failureThreshold) {
      this.trip(now);
    }
  }

  private trip(now: number): void {
    this.state = "open";
    this.openedAt = now;
    this.probeStartedAt = null;
    this.consecutiveOpens += 1;
    // Exponential backoff: base * 2^(opens-1), capped.
    const factor = 2 ** (this.consecutiveOpens - 1);
    this.cooldownMs = Math.min(this.cfg.baseCooldownMs * factor, this.cfg.maxCooldownMs);
  }

  snapshot(): BreakerSnapshot {
    return { state: this.state, failures: this.failures, openedAt: this.openedAt, cooldownMs: this.cooldownMs };
  }
}

The breaker trips open at the failure threshold, admits one probe after the cooldown, and doubles the cooldown on each consecutive trip, up to the configured ceiling.
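To make the backoff arithmetic concrete, the sketch below recomputes the cooldown for successive trips using the same formula as `trip()`. The helper name `cooldownForTrip` is hypothetical, and the base and ceiling values (2 s and 60 s) are illustrative.

```typescript
// Hypothetical helper mirroring the trip() arithmetic:
// cooldown = base * 2^(opens - 1), capped at the ceiling.
function cooldownForTrip(opens: number, baseMs: number, maxMs: number): number {
  return Math.min(baseMs * 2 ** (opens - 1), maxMs);
}

// Cooldowns for the 1st through 7th consecutive trip.
const schedule = [1, 2, 3, 4, 5, 6, 7].map((n) => cooldownForTrip(n, 2_000, 60_000));

console.log(schedule);
// [2000, 4000, 8000, 16000, 32000, 60000, 60000]
```

With these values the ceiling takes over at the sixth consecutive trip, so a dependency that stays down is probed about once a minute from then on.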

Step 3: Wrap the downstream fetch

Hold one breaker instance per downstream in module scope so it persists across requests served by the same isolate. The middleware consults it before calling, records the outcome, and returns a fallback when the breaker is open.

import { CircuitBreaker } from "./circuit-breaker";

// Module scope: shared across requests within one isolate.
const apiBreaker = new CircuitBreaker({ failureThreshold: 5, baseCooldownMs: 2_000, maxCooldownMs: 60_000 });

export async function callDownstream(req: Request): Promise<Response> {
  if (!apiBreaker.canAttempt()) {
    return new Response(JSON.stringify({ error: "service unavailable" }), {
      status: 503,
      headers: { "content-type": "application/json", "x-circuit": "open" },
    });
  }

  const ctrl = new AbortController();
  const timer = setTimeout(() => ctrl.abort(), 1_500);
  try {
    const res = await fetch("https://api.internal/resource", { signal: ctrl.signal });
    if (res.status >= 500) {
      apiBreaker.recordFailure();
    } else {
      apiBreaker.recordSuccess();
    }
    return res;
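  } catch {
    // Thrown on abort (timeout) or on a network error; both count as failures.
    apiBreaker.recordFailure();
    return new Response(JSON.stringify({ error: "upstream timeout or network error" }), {
      status: 503,
      // Report the breaker's actual state: one failure does not always trip it.
      headers: { "content-type": "application/json", "x-circuit": apiBreaker.snapshot().state },
    });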
  } finally {
    clearTimeout(timer);
  }
}
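When the middleware fronts more than one downstream, the module-scope breakers can be kept in a map keyed by origin. The following is a minimal sketch under that assumption: `breakerFor` is a hypothetical helper, and it is generic over the breaker type so the example stands alone (in the guide's code the factory would return a `CircuitBreaker`).

```typescript
// Hypothetical registry helper: one lazily created breaker per downstream origin.
function breakerFor<T>(registry: Map<string, T>, url: string, create: () => T): T {
  const key = new URL(url).origin; // e.g. "https://api.internal"
  let breaker = registry.get(key);
  if (breaker === undefined) {
    breaker = create();
    registry.set(key, breaker);
  }
  return breaker;
}

// Module scope: one registry per isolate. A plain object stands in for a breaker here.
const breakers = new Map<string, { name: string }>();

const first = breakerFor(breakers, "https://api.internal/resource", () => ({ name: "api" }));
const again = breakerFor(breakers, "https://api.internal/other", () => ({ name: "never created" }));

console.log(first === again); // true: both paths share the origin's breaker
```

Keying by origin means a failing endpoint trips the breaker for every path on that host. Keying by origin plus path prefix is a finer-grained alternative if endpoints fail independently.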

Count only failures the breaker should react to. A timeout or 5xx is a failure; a 404 or 401 is a normal downstream answer and must not trip the breaker. Pair this with an early-return guard so the open-state fallback exits the chain immediately without invoking later stages.
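The classification rule can be isolated in one small function so the wrapper and its tests agree on what counts. This is a sketch: the `CallResult` type and `countsAsFailure` name are hypothetical, introduced only for this example.

```typescript
// Hypothetical shape for the outcome of one downstream call.
type CallResult =
  | { kind: "response"; status: number }
  | { kind: "threw" }; // fetch rejected: abort (timeout) or network error

// True only for outcomes the breaker should react to.
function countsAsFailure(result: CallResult): boolean {
  if (result.kind === "threw") return true;
  return result.status >= 500; // 5xx only; 4xx is a normal downstream answer
}

console.log(countsAsFailure({ kind: "response", status: 404 })); // false
console.log(countsAsFailure({ kind: "response", status: 503 })); // true
console.log(countsAsFailure({ kind: "threw" }));                 // true
```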

Step 4: Coordinate globally with a Durable Object

When you need one breaker verdict across all isolates, move the state into a Durable Object. Every isolate calls the same instance, so the failure count and trip decision are global. The breaker logic is identical; only the storage location changes.

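// breaker-do.ts
import { CircuitBreaker } from "./circuit-breaker";

export class BreakerDO {
  // Held in memory only: the breaker starts closed again if the object is evicted or restarted.
  private breaker = new CircuitBreaker({ failureThreshold: 10, baseCooldownMs: 5_000, maxCooldownMs: 120_000 });

  async fetch(req: Request): Promise<Response> {
    const { outcome } = (await req.json()) as { outcome: "success" | "failure" | "check" };
    if (outcome === "check") {
      // Only a check asks for permission; canAttempt() may move the breaker to half-open.
      const allowed = this.breaker.canAttempt();
      return Response.json({ allowed, ...this.breaker.snapshot() });
    }
    // Outcome reports update the counters and return the resulting state.
    if (outcome === "success") this.breaker.recordSuccess();
    else if (outcome === "failure") this.breaker.recordFailure();
    return Response.json(this.breaker.snapshot());
  }
}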

The trade-off is latency: every request now makes a round-trip to the Durable Object before calling the downstream, plus a second call to report the outcome. Reserve the global breaker for dependencies where over-calling is genuinely harmful (a fragile partner API, a metered backend); use the cheaper module-scope breaker everywhere else.
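On the calling side, the middleware asks the Durable Object for a verdict, makes the downstream call if allowed, and then reports the outcome. The sketch below assumes the `{ outcome }` JSON protocol from the class above. `askBreaker` and `guardedCall` are hypothetical helpers, and the two interfaces are minimal stand-ins so the example is self-contained; in a Workers project the namespace would be the `BREAKER` binding, which provides `idFromName`, `get`, and a stub with `fetch`.

```typescript
// Minimal structural stand-ins for the Durable Object namespace and stub.
interface BreakerStub {
  fetch(url: string, init: { method: string; body: string }): Promise<Response>;
}
interface BreakerNamespace {
  idFromName(name: string): unknown;
  get(id: unknown): BreakerStub;
}

type Outcome = "success" | "failure" | "check";
interface Verdict { allowed?: boolean; state: string }

// One round-trip to the breaker instance named after the downstream.
async function askBreaker(ns: BreakerNamespace, downstream: string, outcome: Outcome): Promise<Verdict> {
  const stub = ns.get(ns.idFromName(downstream)); // same name => same global instance
  const res = await stub.fetch("https://breaker/", { method: "POST", body: JSON.stringify({ outcome }) });
  return (await res.json()) as Verdict;
}

function fallback(error: string): Response {
  return new Response(JSON.stringify({ error }), {
    status: 503,
    headers: { "content-type": "application/json" },
  });
}

// Check, call, report. `call` is whatever performs the bounded downstream fetch.
async function guardedCall(
  ns: BreakerNamespace,
  downstream: string,
  call: () => Promise<Response>,
): Promise<Response> {
  const verdict = await askBreaker(ns, downstream, "check");
  if (!verdict.allowed) return fallback("service unavailable");

  let res: Response | null = null;
  try {
    res = await call();
  } catch {
    res = null; // abort (timeout) or network error
  }
  const failed = res === null || res.status >= 500;
  await askBreaker(ns, downstream, failed ? "failure" : "success");
  return res ?? fallback("upstream timeout");
}
```

The sketch does not handle the Durable Object itself being unreachable; whether to fail open or closed in that case is a policy choice that depends on the dependency being protected.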

Configuration

For the Durable Object variant, declare the binding and migration in wrangler.jsonc:

{
  "name": "edge-middleware",
  "main": "src/index.ts",
  "compatibility_date": "2026-06-01",
  "durable_objects": { "bindings": [{ "name": "BREAKER", "class_name": "BreakerDO" }] },
  "migrations": [{ "tag": "v1", "new_classes": ["BreakerDO"] }]
}

On Vercel Edge and Netlify there is no Durable Object equivalent; use module-scope breakers for per-isolate protection and an external coordinated store (Vercel KV / Upstash, Netlify Blobs, or your own service) when global state is required.

Local vs production divergence

Aspect	Local dev	Production
Module-scope persistence	One isolate — breaker always shared	Many isolates — state is per-isolate
Time source	Real `Date.now()`	Real `Date.now()`; coarsened in some runtimes
Durable Object	Emulated single instance	One global instance per ID
Failure injection	You force timeouts/5xx	Real downstream degradation
Cooldown observability	Console logs	Trace spans and `x-circuit` headers

Step 5: Validate with Vitest

Drive the state machine with a fake clock so you can assert the trip threshold, the cooldown transition to half-open, and the recovery path — without waiting in real time.

import { describe, it, expect } from "vitest";
import { CircuitBreaker } from "../src/circuit-breaker";

const cfg = { failureThreshold: 3, baseCooldownMs: 1_000, maxCooldownMs: 8_000 };

describe("CircuitBreaker", () => {
  it("opens after the failure threshold", () => {
    const b = new CircuitBreaker(cfg);
    // Pass explicit timestamps so the test never mixes the real clock with the fake one.
    b.recordFailure(0); b.recordFailure(0); b.recordFailure(0);
    expect(b.snapshot().state).toBe("open");
    expect(b.canAttempt(0)).toBe(false);
  });

  it("moves to half-open after the cooldown and closes on a successful probe", () => {
    const b = new CircuitBreaker(cfg);
    b.recordFailure(0); b.recordFailure(0); b.recordFailure(0);
    expect(b.canAttempt(500)).toBe(false);      // still cooling down
    expect(b.canAttempt(1_000)).toBe(true);     // promoted to half-open
    expect(b.snapshot().state).toBe("half-open");
    b.recordSuccess();
    expect(b.snapshot().state).toBe("closed");
  });

  it("backs off exponentially on repeated trips", () => {
    const b = new CircuitBreaker(cfg);
    for (let i = 0; i < 3; i++) b.recordFailure(0);
    const first = b.snapshot().cooldownMs;       // 1000
    b.canAttempt(first);                          // half-open
    b.recordFailure(first);                       // probe fails -> reopen
    expect(b.snapshot().cooldownMs).toBe(2_000);  // doubled
  });
});

Pitfalls

Assuming global state from module scope. Module-scope breakers trip per isolate. For one fleet-wide verdict, use a Durable Object or external coordinated store.
Counting normal responses as failures. A 404 or 401 is a valid answer; tripping on it opens the breaker against a healthy service. Count only timeouts and 5xx.
No timeout on the fetch. Without an AbortController, a hanging downstream holds each request open and never registers as a failure. Always bound the call.
Fixed cooldown. A constant cooldown hammers a persistently broken dependency. Grow it with exponential backoff up to a ceiling.
Thundering herd on half-open. Letting every isolate probe at once on recovery re-floods the downstream. The Durable Object variant naturally admits a single probe; for module scope, keep the threshold and probe count low.

Production deployment checklist

Breaker counts only timeouts and `5xx` as failures, never `4xx`
Every downstream `fetch` has an `AbortController` timeout
Open state returns a fast fallback `503`, not a hung request
Cooldown grows with exponential backoff up to a documented ceiling
Module-scope breakers are used only where per-isolate tripping is acceptable
A Durable Object (or external store) backs any breaker that must be global
Breaker state changes are emitted as trace spans and `x-circuit` headers
Vitest covers open, half-open, recovery, and backoff with a fake clock

Frequently Asked Questions

Why does a module-scope circuit breaker only trip per isolate?

Module-scope variables live in a single V8 isolate’s memory. Edge platforms run many isolates across Points of Presence, each with its own copy, so a failure count in one isolate is invisible to the others. That gives per-isolate protection for free; for a single fleet-wide verdict you need a shared coordination point such as a Durable Object.

Should a 404 or 401 from the downstream count as a failure?

No. Those are normal answers from a healthy service. Counting them trips the breaker against a service that is working fine. Treat only timeouts (aborted fetches) and 5xx responses as failures.

When is a Durable Object worth the extra round-trip?

When over-calling a degraded dependency is genuinely harmful — a fragile partner API, a metered or rate-limited backend, or one that needs protection from coordinated retries. For most internal services the cheaper per-isolate breaker is enough and avoids adding a Durable Object hop to every request.

How do I avoid a thundering herd when the breaker recovers?

The half-open state admits only a single probe before deciding. With a Durable Object that single-probe guarantee is global. With module-scope breakers each isolate runs its own probe, so the downstream sees up to one probe per isolate; exponential backoff spaces those attempts out.