Edge Caching & CDN Integration Patterns

Caching is the single highest-leverage optimization available at the network perimeter, yet it is also the most constrained. An edge isolate has no shared disk, no durable local state between invocations, and no guarantee that two requests for the same URL land on the same Point of Presence (PoP). Every caching decision must therefore treat responses as immutable artifacts keyed by a deterministic identifier, replicated lazily across PoPs, and reconciled under eventual consistency. This guide establishes a constraint-first model for edge caching and CDN integration that holds across Cloudflare Workers, Vercel Edge, and Netlify Edge Functions, then links out to the in-depth walkthroughs that implement each layer.

The economic case is straightforward: a cache hit served from a PoP within 30 ms of the user can be up to two orders of magnitude faster, and far cheaper, than a cold origin fetch traversing the public internet. The engineering case is harder. Caching at the edge means choosing the right storage tier for each payload, deriving a cache key that neither fragments nor collides, revalidating without stampeding the origin, and invalidating with surgical precision so that a single product update does not purge an entire catalog. Get any of these wrong and you ship stale prices, leak one user’s dashboard to another, or melt your origin during a cache flush.

[Figure: Edge caching tier resolution. A request checks the local PoP Cache API, then a regional tier, then KV or Durable Object state, and finally the origin, with stale-while-revalidate refreshing entries in the background.]
A request resolves against progressively colder cache tiers; a background revalidation refreshes the entry without blocking the response.

Why edge caching is constrained

Three structural properties of edge runtimes define every pattern in this domain.

Responses are immutable once cached. When you write a Response into the Cache API, its body becomes an opaque, read-only artifact. You cannot patch a header or mutate a byte in place; you replace the whole entry or delete it. This is why versioned cache keys and atomic purges are the only safe invalidation primitives. The isolate model that makes this possible runs each request inside a V8 isolate with no persistent file system, so there is nowhere to stash a mutable scratch copy anyway.
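Because an entry cannot be patched in place, one way to replace it safely is to fold a content version into the key, so a new version writes a new entry and the old one simply ages out. The following is a minimal sketch under stated assumptions: the `__v` parameter name and the `versionedCacheKey` helper are hypothetical and not part of any provider API.

```typescript
// Hypothetical helper: embed a content version in the cache key URL so a new
// version creates a new entry instead of mutating the old one.
function versionedCacheKey(rawUrl: string, version: string): string {
  const url = new URL(rawUrl);
  // "__v" is an assumed, reserved parameter name chosen for this sketch.
  url.searchParams.set("__v", version);
  return url.toString();
}
```

Bumping the version string on each content change yields a distinct key per version, which avoids any need to mutate a stored response.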

There is no shared disk and no shared memory across PoPs. A response cached in Frankfurt is invisible to a request served from São Paulo until it is independently populated there. The Cache API is strictly per-PoP. Anything that must be globally readable — a rate-limit counter, a deduplicated render, a session — has to live in a replicated store such as a KV namespace or a Durable Object, each with its own consistency and latency trade-offs.

Consistency is eventual. Propagating a new value or a purge across a global network takes time: often a few seconds or less for purges on the major providers, but up to a minute or more for some replicated stores such as Workers KV, and never zero. Designs that assume read-your-writes consistency across PoPs will surface stale data. Treat every cache as a hint, validate freshness with explicit directives, and make invalidation idempotent so retries during propagation do no harm.

These constraints are not bugs to engineer around; they are the physics of a globally distributed system. The patterns below lean into them rather than fighting them.

Three storage layers: Cache-Control, the Cache API, and KV/Durable Objects

Edge caching is not one mechanism but three, each operating at a different layer with different ownership, scope, and write semantics. Choosing the wrong layer is the most common architectural mistake in edge caching.

The HTTP Cache-Control layer is declarative. You emit response directives — public, s-maxage, stale-while-revalidate, Vary — and the CDN’s managed cache obeys them. You never call an API; the platform reads the headers and decides what to store, for how long, and when to revalidate. This is the right default for shared, anonymous, GET-able content because it requires no code and benefits from the provider’s entire tiered fleet.
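To make the declarative layer concrete, here is a small sketch that assembles a Cache-Control value from a policy object. The `SharedCachePolicy` type and `buildCacheControl` function are illustrative names, not a library API; the directives themselves are the standard ones named above.

```typescript
// Options for a shared-cache policy; all durations are in seconds.
interface SharedCachePolicy {
  sMaxAge: number;
  staleWhileRevalidate?: number;
  staleIfError?: number;
}

// Build a Cache-Control value for shared, anonymous, GET-able content.
function buildCacheControl(policy: SharedCachePolicy): string {
  const parts = ["public", `s-maxage=${policy.sMaxAge}`];
  if (policy.staleWhileRevalidate !== undefined) {
    parts.push(`stale-while-revalidate=${policy.staleWhileRevalidate}`);
  }
  if (policy.staleIfError !== undefined) {
    parts.push(`stale-if-error=${policy.staleIfError}`);
  }
  return parts.join(", ");
}
```

Centralizing the header in one function keeps freshness windows reviewable in a single place instead of scattered across handlers.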

The Cache API (caches.default / caches.open) is imperative and programmatic. Inside your isolate you call cache.match(request) and cache.put(request, response) explicitly, controlling exactly what gets stored under exactly which key. This is how you cache responses you assembled yourself, cache POST-derived data under a custom key, or implement stale-while-revalidate with ctx.waitUntil when you need behavior the declarative layer cannot express.

The KV / Durable Objects layer is a key-value and coordination store, not an HTTP cache at all. It holds arbitrary serialized values — rendered fragments, API payloads, counters — that you read and write programmatically and that replicate across PoPs. Use it when data must outlive a single response, be shared globally, or be updated transactionally.

| Concern | Cache-Control (managed CDN) | Cache API (caches.default) | KV / Durable Objects |
|---|---|---|---|
| Control model | Declarative headers | Imperative match/put | Imperative get/put |
| Scope | Provider’s tiered fleet | Per-PoP (local) | Replicated globally |
| Key | URL + Vary | Any Request you construct | Arbitrary string key |
| Stored value | HTTP response | HTTP response | Any serialized value |
| TTL source | Response directives | Cache-Control on the put | Per-write expirationTtl |
| Cloudflare | Cache rules + cf.cacheTtl | caches.default | Workers KV, Durable Objects, D1 |
| Vercel Edge | s-maxage / stale-while-revalidate (CDN-Cache-Control) | Limited; prefer header layer | Vercel KV (Upstash), Edge Config |
| Netlify Edge | Cache-Control / Netlify-CDN-Cache-Control | caches (Deno) | Netlify Blobs, external KV |

A robust system uses all three: Cache-Control for the bulk of anonymous traffic, the Cache API for computed responses and background refresh, and KV or a Durable Object for the small set of values that must be shared globally or coordinated consistently. The dedicated guide on KV vs Durable Objects for edge state covers when replicated eventual-consistency (KV) beats strongly-consistent single-instance coordination (Durable Objects).
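The layer choice can be written down as a small decision function, which is useful as a review aid when mapping routes to layers. This is a sketch only: the `RouteTraits` fields are assumptions made for illustration, and real routes may need more attributes than these two.

```typescript
type StorageLayer = "cache-control" | "cache-api" | "kv-or-durable-object";

// Hypothetical route description; the field names are assumptions for this sketch.
interface RouteTraits {
  sharedMutableState: boolean; // data must be read or written across PoPs
  computedAtEdge: boolean; // response is assembled inside the isolate
}

// Pick the storage layer in the order the three-layer model suggests.
function chooseStorageLayer(route: RouteTraits): StorageLayer {
  if (route.sharedMutableState) return "kv-or-durable-object";
  if (route.computedAtEdge) return "cache-api";
  return "cache-control";
}
```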

Cache key derivation

A cache key is the contract between a request and a stored response. If the key is too coarse, two different responses collide and one user sees another’s content. If it is too fine, the cache fragments into thousands of near-duplicate entries with near-zero hit ratio. The art of cache key derivation is including exactly the request attributes that change the response, and nothing else.

The default key is the request URL. The first refinement is normalizing it: lowercase the host, strip the fragment, drop tracking parameters (utm_*, fbclid, gclid), and sort the surviving query parameters into a canonical order so that ?a=1&b=2 and ?b=2&a=1 hit the same entry. The companion guide on normalizing query parameters in edge cache keys covers allow-list versus deny-list strategies and the order-stability pitfalls in detail.

// Derive a normalized cache key from a request, allow-listing query params.
function deriveCacheKey(request: Request, allowedParams: readonly string[]): Request {
  const url = new URL(request.url);
  // The URL parser already lowercases the host; this line documents the intent.
  url.hostname = url.hostname.toLowerCase();
  url.hash = "";

  const kept = new URLSearchParams();
  // Sort a copy so the caller's array is not mutated.
  for (const name of [...allowedParams].sort()) {
    // getAll preserves repeated parameters such as ?tag=a&tag=b.
    for (const value of url.searchParams.getAll(name)) kept.append(name, value);
  }
  url.search = kept.toString();

  // Returning a Request lets caches.default key on the normalized URL.
  return new Request(url.toString(), { method: "GET" });
}
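The function above takes the allow-list approach. For comparison, here is a sketch of the deny-list strategy, which drops the tracking parameters named earlier and keeps everything else. The `stripTrackingParams` name and the exact contents of `TRACKING_PARAMS` are assumptions for illustration.

```typescript
// Deny-list normalization: drop known tracking parameters, keep everything else.
const TRACKING_PARAMS = new Set(["fbclid", "gclid"]);

function stripTrackingParams(rawUrl: string): string {
  const url = new URL(rawUrl);
  url.hash = "";
  // Copy the names first; deleting while iterating the live list can skip entries.
  for (const name of Array.from(url.searchParams.keys())) {
    if (name.startsWith("utm_") || TRACKING_PARAMS.has(name)) {
      url.searchParams.delete(name);
    }
  }
  // Canonical order so ?a=1&b=2 and ?b=2&a=1 share one entry.
  url.searchParams.sort();
  return url.toString();
}
```

A deny-list is easier to adopt on an existing site but lets any new, unknown parameter fragment the keyspace, which is why the allow-list is usually the safer default.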

Beyond the URL, the Vary header tells the shared cache which request headers also participate in the key. Vary: Accept-Encoding is almost always correct. Vary: Accept-Language is correct only if you actually serve localized bodies. Varying by Cookie is dangerous: a single rotating session cookie shatters the cache into per-user entries and collapses your hit ratio to zero. When you genuinely need per-segment caching, hash the one cookie that matters into a coarse bucket rather than varying on the whole header — the pattern in varying edge cache by cookie shows how to do this safely without leaking authenticated content into a shared cache.
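One way to derive a coarse segment from a single cookie is sketched below. Note the swap: this version maps the cookie value through an allow-list instead of hashing it, so two distinct segments can never collide into the same bucket. The function name, the `default` fallback, and the example cookie names are all assumptions.

```typescript
// Map one cookie's value to a small, fixed set of segments for use in a cache key.
// Unknown or missing values fall back to a shared "default" segment.
function segmentFromCookie(
  cookieHeader: string | null,
  name: string,
  allowedSegments: readonly string[],
): string {
  const fallback = "default";
  if (!cookieHeader) return fallback;
  for (const part of cookieHeader.split(";")) {
    const [rawName, ...rest] = part.trim().split("=");
    if (rawName !== name) continue;
    const value = rest.join("=");
    return allowedSegments.includes(value) ? value : fallback;
  }
  return fallback;
}
```

For example, `segmentFromCookie("sid=abc123; plan=pro", "plan", ["free", "pro"])` returns `"pro"` while ignoring the session cookie entirely. The segment can then be appended to the normalized key URL; responses that are truly per-user should still bypass the shared cache.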

Stale-while-revalidate

The cruelest trade-off in caching is freshness versus latency: a short TTL keeps data current but pushes load onto the origin, while a long TTL is fast but serves stale content. stale-while-revalidate dissolves the trade-off by decoupling the two. Within the max-age window the response is fresh and served directly. After max-age expires but within the stale-while-revalidate window, the cache serves the stale copy immediately and kicks off a background revalidation so the next request sees fresh data. As long as an entry exists and is inside that combined window, the user never waits for the origin; only a cold miss or an entry past the window blocks on a fetch.
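The three states described above can be expressed as a small classifier. This is a sketch of the window arithmetic only; `classifyFreshness` is an illustrative name, and all values are in seconds.

```typescript
type Freshness = "fresh" | "stale-revalidate" | "expired";

// Classify an entry by its age against max-age and the stale-while-revalidate window.
function classifyFreshness(age: number, maxAge: number, swrWindow: number): Freshness {
  if (age < maxAge) return "fresh";
  if (age < maxAge + swrWindow) return "stale-revalidate";
  return "expired";
}
```

With max-age=60 and stale-while-revalidate=600, an entry aged 120 seconds is served stale while a refresh runs, and an entry aged 660 seconds or more must be refetched before responding.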

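// Serve the cached copy immediately, revalidate in the background with waitUntil.
// Note: this sketch refreshes on every hit; gate the refresh on entry age to
// revalidate only once max-age has passed.
async function swr(
  request: Request,
  ctx: { waitUntil(p: Promise<unknown>): void },
  origin: (r: Request) => Promise<Response>,
): Promise<Response> {
  const cache = caches.default;
  const cached = await cache.match(request);

  if (cached) {
    // Refresh asynchronously; the user is already being served.
    ctx.waitUntil(
      origin(request)
        .then((fresh) => (fresh.ok ? cache.put(request, fresh) : undefined))
        // Keep the stale entry on failure, but make the failure visible.
        .catch((err) => console.error("revalidation failed", err)),
    );
    return cached;
  }

  const fresh = await origin(request);
  // Only store successful responses; clone because the body is returned to the user.
  if (fresh.ok) ctx.waitUntil(cache.put(request, fresh.clone()));
  return fresh;
}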

The header form, Cache-Control: public, max-age=60, stale-while-revalidate=600, lets the managed CDN do this for you with no code. The imperative form above gives you control over the revalidation key and lets you add stale-if-error semantics — serving stale content when the origin returns 5xx rather than propagating the failure. The companion cluster on stale-while-revalidate at the edge details the provider mapping and the two failure modes you must design against: a thundering herd when many PoPs revalidate the same expired key simultaneously, and unbounded staleness when a persistently failing origin keeps the stale-if-error window open forever. The framework-specific implementation lives in implementing stale-while-revalidate in Next.js, and the window math is covered in tuning Cache-Control max-age for edge.
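The unbounded-staleness failure mode can be guarded with an explicit cap on how long a stale copy may be served after the origin starts failing. The sketch below shows only the decision; `shouldServeStale` and its parameters are hypothetical names, and how you track the stale duration is left to the surrounding code.

```typescript
// Decide whether to serve a stale copy after a failed revalidation.
// originStatus is null when the fetch itself threw (network error, timeout).
function shouldServeStale(
  originStatus: number | null,
  staleForSeconds: number,
  maxStaleSeconds: number,
): boolean {
  const originFailed = originStatus === null || originStatus >= 500;
  // The cap bounds staleness: past it, surface the failure instead.
  return originFailed && staleForSeconds <= maxStaleSeconds;
}
```

Choosing the cap is a product decision: a catalog page may tolerate hours of staleness during an outage, while a price or inventory endpoint may tolerate only minutes.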

Tag and surrogate-key invalidation

TTL-based expiry answers “how long until this might be stale?” Invalidation answers “this is stale now — purge it.” The naive approach, purging by URL, breaks down the moment one piece of content appears under many URLs (a product on the homepage, a category page, and a search result) or one upstream change affects many pages. The solution is to tag responses with logical identifiers and purge by tag.

When you emit a response, attach a surrogate key or cache tag header listing every logical entity the response depends on — Cache-Tag: product-123, category-shoes, price-table. When product-123 changes, you issue a single purge for that tag and every cached response carrying it is evicted across the fleet, regardless of URL. This decouples invalidation from URL structure entirely.
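A small helper can keep tag headers clean by deduplicating tags and rejecting values that would break the comma-separated list. This is a sketch; `buildCacheTagHeader` is an illustrative name, and the choice to omit spaces after commas is an assumption, so check your provider's accepted format.

```typescript
// Build a comma-separated tag header value from logical entity identifiers.
function buildCacheTagHeader(tags: readonly string[]): string {
  const unique = new Set<string>();
  for (const tag of tags) {
    const trimmed = tag.trim();
    // Skip empties and anything containing a comma or whitespace.
    if (trimmed === "" || /[,\s]/.test(trimmed)) continue;
    unique.add(trimmed);
  }
  return Array.from(unique).join(",");
}
```

The result can be set on the response under whichever header name your provider reads, for example `response.headers.set("Cache-Tag", buildCacheTagHeader(tags))`.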

| Provider | Tag mechanism | Purge interface | Notes |
|---|---|---|---|
| Cloudflare | Cache-Tag response header | API purge_cache with tags | Historically Enterprise-only; up to 30 tags per purge call (check current plan limits) |
| Vercel | x-vercel-cache-tags (via next/cache) | revalidateTag() in Next.js | Tags flow through the ISR/data cache |
| Netlify | Netlify-Cache-Tag / Cache-Tag | purgeCache({ tags }) | Works with Netlify Edge + on-demand builders |
| Fastly Compute | Surrogate-Key response header | purge by surrogate key | The pattern these others emulate |

The dedicated walkthroughs cover purging Cloudflare cache by tag and using surrogate keys on Fastly Compute. The governing principle is the same everywhere: make purges idempotent and narrow. A purge that runs twice during propagation must be harmless, and a single content change must never trigger a fleet-wide flush. Where a provider lacks native tagging, emulate it by storing tag-to-key mappings in a KV namespace and iterating the affected keys.
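The emulation described in the last sentence can be sketched with an in-memory index standing in for the KV namespace. Everything here is an assumption for illustration: the `TagIndex` class, its method names, and the use of a `Map` in place of a replicated store.

```typescript
// In-memory stand-in for a KV-backed tag index (tag -> set of cache keys).
class TagIndex {
  private readonly keysByTag = new Map<string, Set<string>>();

  // Record that a cached key depends on each of the given tags.
  register(cacheKey: string, tags: readonly string[]): void {
    for (const tag of tags) {
      const keys = this.keysByTag.get(tag) ?? new Set<string>();
      keys.add(cacheKey);
      this.keysByTag.set(tag, keys);
    }
  }

  // Return the keys to evict for a tag and forget the mapping.
  // Purging an unknown or already-purged tag returns an empty list.
  purge(tag: string): string[] {
    const keys = this.keysByTag.get(tag);
    this.keysByTag.delete(tag);
    return keys ? Array.from(keys) : [];
  }
}
```

Running `purge` twice for the same tag is harmless, which satisfies the idempotency requirement. Keep in mind that evicting the returned keys from a per-PoP Cache API only affects the local PoP, so this emulation fits values held in the replicated store better than locally cached responses.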

Multi-tier CDN architecture

A single cache layer between users and origin leaves the origin exposed to every cache miss from every PoP. With hundreds of PoPs, one popular object expiring can trigger hundreds of near-simultaneous origin fetches for the same URL, and even a 5% miss rate across a busy catalog adds up to thousands of origin requests. Tiered caching inserts an intermediate layer: edge PoPs do not fall back directly to origin but to a smaller set of regional or upstream caches, which absorb and coalesce misses before any request reaches the origin.

The effect is dramatic. If 50 edge PoPs in Europe all miss the same object, a tiered topology funnels them through one regional cache that fetches from origin exactly once and fans the result back out. The origin sees one request instead of fifty. This is request coalescing applied at the network topology level, and it is the single most effective defense against origin overload during cache fill and revalidation storms.

// Conceptual tier walk: local PoP -> regional -> origin, populating colder tiers.
async function tieredFetch(
  request: Request,
  ctx: { waitUntil(p: Promise<unknown>): void },
): Promise<Response> {
  const local = caches.default;
  const hit = await local.match(request);
  if (hit) return hit;

  // Miss locally: a tiered CDN routes this to an upstream tier, not origin.
  const upstream = await fetch(request, {
    // Cloudflare honors cf.cacheTtl / tiered cache topology here.
    cf: { cacheEverything: true, cacheTtl: 300 },
  } as RequestInit);

  // Populate the local tier only with successful responses, so an upstream
  // error is not pinned into the PoP cache.
  if (upstream.ok) ctx.waitUntil(local.put(request, upstream.clone()));
  return upstream;
}

Each provider exposes tiering differently — Cloudflare via Tiered Cache and Cache Reserve, Vercel through its global/regional edge cache, Fastly through shielding. The configuration mechanics for the most common setup are in configuring tiered cache on Cloudflare, and the architectural trade-offs across providers are surveyed in multi-tier CDN cache architecture. The rule of thumb: enable tiering whenever your origin is a single region or a rate-limited API, and measure the origin offload ratio before and after.

Observability: hit ratios and the freshness budget

You cannot tune a cache you cannot measure. The headline metric is the cache hit ratio — the fraction of requests served without an origin fetch — but it is meaningless in aggregate. A 95% global hit ratio can hide a 40% hit ratio on your highest-traffic route because a rogue Vary or an unstripped query parameter fragmented that route’s keyspace. Always slice hit ratio by route, by content type, and by cache tier.
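Slicing by route is a simple aggregation once observations are logged. The sketch below assumes a hypothetical `CacheObservation` shape and counts only an exact `HIT` status as a hit; whether statuses such as `REVALIDATED` should count is a choice you would make per provider.

```typescript
interface CacheObservation {
  route: string;
  status: string; // e.g. "HIT", "MISS", "EXPIRED"
}

// Compute the hit ratio per route from a batch of logged observations.
function hitRatioByRoute(observations: readonly CacheObservation[]): Map<string, number> {
  const totals = new Map<string, { hits: number; total: number }>();
  for (const { route, status } of observations) {
    const entry = totals.get(route) ?? { hits: 0, total: 0 };
    entry.total += 1;
    if (status.toUpperCase() === "HIT") entry.hits += 1;
    totals.set(route, entry);
  }
  const ratios = new Map<string, number>();
  for (const [route, { hits, total }] of totals) {
    ratios.set(route, hits / total);
  }
  return ratios;
}
```

Comparing the per-route values against the global figure is what exposes a fragmented keyspace on a single high-traffic route.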

Providers surface cache status on the response: Cloudflare’s cf-cache-status (HIT, MISS, EXPIRED, REVALIDATED, BYPASS, DYNAMIC, among others), Vercel’s x-vercel-cache, and Netlify’s Cache-Status. Log these from your middleware and aggregate them. An unexpected spike in DYNAMIC/BYPASS means a directive or key regression; a spike in MISS after a deploy means your build invalidated more than intended.

// Emit a structured cache observation for aggregation downstream.
function recordCacheStatus(response: Response, route: string): void {
  const status =
    response.headers.get("cf-cache-status") ??
    response.headers.get("x-vercel-cache") ??
    response.headers.get("Cache-Status") ??
    "unknown";
  console.log(JSON.stringify({ event: "cache", route, status, ts: Date.now() }));
}

Background revalidation runs after the response is sent, so its failures are invisible to the request that triggered it. Always attach error handling to the promise you pass to ctx.waitUntil, and emit a counter when revalidation fails — otherwise a quietly broken origin can keep serving ever-staler content while every user-facing request still returns 200. For propagating these observations into a trace, wire the cache status into the same W3C Trace Context (traceparent) span you use for the rest of the request, as described in the middleware observability patterns.
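One way to follow that advice is to wrap every background task so it never rejects and every failure reaches a counter. This is a sketch: `guardBackground` and the `onFailure` hook are hypothetical names, and the hook would be wired to whatever metrics client you use.

```typescript
// Wrap a background task so it never rejects and every failure is reported.
// onFailure is a hypothetical hook; connect it to your metrics client.
function guardBackground(
  task: Promise<unknown>,
  onFailure: (error: unknown) => void,
): Promise<void> {
  return task.then(
    () => undefined,
    (error: unknown) => {
      onFailure(error);
    },
  );
}
```

In a handler this might be used as `ctx.waitUntil(guardBackground(refresh(request), (e) => console.error("revalidation failed", e)))`, where `refresh` stands for your own revalidation function.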

Deployment checklist

Before promoting an edge caching configuration to production, walk this sequence:

  1. Confirm the storage layer per route. Anonymous GET content uses Cache-Control; computed responses use the Cache API; globally-shared mutable state uses KV or a Durable Object. No route should silently fall through to DYNAMIC.
  2. Audit cache keys. Verify query-parameter normalization and that no Vary: Cookie leaks authenticated content into a shared cache.
  3. Set freshness windows deliberately. Pair every max-age/s-maxage with a stale-while-revalidate window, and add stale-if-error for resilience.
  4. Wire tag-based invalidation. Every mutable response carries a Cache-Tag/Surrogate-Key; every mutation path issues an idempotent, narrow purge.
  5. Enable tiering. Turn on tiered/shielded caching if the origin is single-region or rate-limited; measure origin offload.
  6. Instrument hit ratios. Log provider cache-status headers per route and alert on regressions and on background-revalidation failures.
  7. Validate propagation. Confirm purges propagate within the provider SLA and that retries during propagation are idempotent.

In-depth guides in this section

This overview anchors a set of focused walkthroughs, each referenced by name in the relevant section above.

Frequently Asked Questions

Should I use Cache-Control headers or the Cache API?

Use Cache-Control headers for shared, anonymous, GET-able content — it requires no code and benefits from the provider’s full tiered fleet. Reach for the Cache API (caches.default) when you assemble responses yourself, need a custom cache key, or implement stale-while-revalidate with ctx.waitUntil. Most production systems use both: headers for the bulk of traffic and the Cache API for computed responses and background refresh.

Why is my cache hit ratio low even though I set a long max-age?

Almost always cache-key fragmentation. Unstripped tracking query parameters, an over-broad Vary header (especially Vary: Cookie), or unsorted query strings cause semantically identical requests to map to distinct keys. Slice your hit ratio by route, then audit query-parameter normalization and Vary on the low-performing routes.

How is the Cache API different from KV or Durable Objects?

The Cache API stores HTTP responses scoped to a single PoP and is meant for caching. KV stores arbitrary serialized values replicated across all PoPs under eventual consistency. Durable Objects provide a single, strongly-consistent instance for coordination. Cache content where each PoP can independently populate; use KV or Durable Objects when data must be globally shared or transactionally updated.

Is purging by URL or by tag better?

Tag-based purging is better whenever content appears under multiple URLs or one upstream change affects many pages. Tag each response with the logical entities it depends on, then purge by tag so every affected response is evicted regardless of URL. Reserve URL purges for one-off corrections, and keep all purges idempotent and narrowly scoped.

What does tiered caching actually do for my origin?

It coalesces cache misses. Instead of every edge PoP falling back directly to origin, PoPs fall back to a smaller set of regional caches that fetch from origin once and fan the result out. During a revalidation storm this turns potentially hundreds of simultaneous origin fetches into one, which is the most effective defense against origin overload.