KV and Durable Object Caching at the Edge

This guide is part of Edge Caching & CDN Integration. It covers how to treat Cloudflare KV, Durable Objects, and the Cache API as distinct cache and state layers — when each is the right tool, how their consistency and latency models differ, and how to coalesce writes, set TTLs, and attach metadata without paying for a round trip you did not need.

Edge middleware runs inside a V8 isolate with no shared memory and no disk. The moment your logic needs to remember anything between requests (a cached upstream response, a rate-limit counter, a session flag), you reach for an external storage primitive. The mistake teams make is assuming these primitives are interchangeable. They are not. KV is an eventually consistent, globally replicated read cache. Durable Objects are single-instance, strongly consistent coordination points. The Cache API is a PoP-local HTTP cache. Choosing the wrong one produces stale reads, lost writes, or latency you cannot debug.

The constraint that drives the pattern

The edge has no central database within reach. A request landing in São Paulo cannot afford a 150 ms round trip to a primary region in Virginia on every invocation; that round trip alone exceeds the entire latency budget of most middleware chains. So edge storage primitives make a deliberate trade: they replicate data close to the reader, and they accept weaker consistency in exchange for that proximity.

Three properties separate the primitives you will use:

Read latency at the PoP. Cache API and warm KV reads resolve in single-digit milliseconds because the data lives in the colo serving the request. Durable Objects route to a single instance that may be in another region, adding a network hop.
Consistency. KV is eventually consistent: a write may take up to roughly 60 seconds to become visible at all edge locations. Durable Objects are strongly consistent because every request for a given object id is funneled to one instance. The Cache API is per-PoP, so a write in one colo is invisible to another.
Write semantics. KV is read-optimized and rate-limits writes to the same key (about one per second). Durable Objects serialize writes through a single actor, which is exactly what you want for counters and locks. The Cache API writes are local and cheap.

Pick the primitive whose consistency model matches the cost of being wrong. A cached product listing that is 30 seconds stale is fine — use KV. A seat-reservation counter that double-books under a stale read is a bug — use a Durable Object.

Each primitive trades consistency against read locality. KV reads are fast everywhere but converge slowly; Durable Objects are consistent but add a hop; the Cache API is local-only.
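
The selection rule above can be written down as a small decision helper. This is only an illustrative sketch: the `Requirements` shape and the `choosePrimitive` name are hypothetical and not part of any Cloudflare API.

```typescript
// Hypothetical description of what a piece of state needs; not a Cloudflare type.
interface Requirements {
  needsReadYourWrites: boolean; // would acting on a stale read be a bug?
  sharedAcrossPops: boolean;    // must other PoPs be able to see the value?
}

type Primitive = "durable-object" | "kv" | "cache-api";

// Picks the weakest (and cheapest) primitive that still meets the requirements.
function choosePrimitive(req: Requirements): Primitive {
  if (req.needsReadYourWrites) return "durable-object";
  if (req.sharedAcrossPops) return "kv";
  return "cache-api";
}

// A product listing tolerates staleness but should be readable everywhere.
const listing = choosePrimitive({ needsReadYourWrites: false, sharedAcrossPops: true });
// A seat-reservation counter must never act on a stale read.
const seats = choosePrimitive({ needsReadYourWrites: true, sharedAcrossPops: true });
console.log(listing, seats); // prints "kv durable-object"
```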

Architecture overview

A typical edge cache layer composes the three primitives rather than choosing one outright. The Cache API serves repeat requests within a single PoP for free. KV backs the Cache API as a globally readable second tier so a cold PoP does not always fall through to origin. Durable Objects sit beside the read path to coordinate anything that must be exactly correct — invalidation fan-out, write coalescing, atomic counters.

The read path is layered: check the PoP-local Cache API first, then KV, then origin. Each miss promotes the value back up. Writes flow the other way and are the harder problem, because KV’s write rate limit and eventual propagation mean naive write-through caching loses updates under load.

TTL and metadata as first-class concerns

Every KV write should carry an explicit lifetime. KV exposes two mechanisms: expirationTtl, a relative number of seconds (minimum 60), and expiration, an absolute Unix timestamp. Prefer expirationTtl for caches — it is computed from the moment of write and survives clock skew. The TTL is not a freshness guarantee; it is an eviction hint. KV may serve a value slightly past its expiration during propagation, and it will not proactively refresh — expiry simply means the next read after the window returns null and your code repopulates.
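
A minimal sketch of the two lifetime options, assuming you want to clamp short TTLs to the 60-second minimum rather than risk a rejected write. The helper names `relativeTtl` and `absoluteExpiry` are hypothetical; only the `expirationTtl` and `expiration` option names come from KV itself.

```typescript
const KV_MIN_TTL_SECONDS = 60; // the minimum expirationTtl stated above

// Hypothetical helper: options for a KV put with a relative lifetime,
// rounded down to whole seconds and clamped to the minimum.
function relativeTtl(seconds: number): { expirationTtl: number } {
  return { expirationTtl: Math.max(KV_MIN_TTL_SECONDS, Math.floor(seconds)) };
}

// Hypothetical helper: options for an absolute expiry, expressed in
// seconds since the Unix epoch.
function absoluteExpiry(at: Date): { expiration: number } {
  return { expiration: Math.floor(at.getTime() / 1000) };
}

// Usage inside a Worker would look like:
//   await env.RESPONSE_KV.put(key, body, relativeTtl(30)); // stored with a 60 s TTL
```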

Metadata is the second lever teams underuse. Each KV entry can carry up to 1 KB of arbitrary JSON metadata stored alongside the value and returned by getWithMetadata in the same operation. Use it to avoid a second round trip: stash the content type, an ETag, a logical version stamp, or the upstream Cache-Control so revalidation decisions need no extra read. The Cache API, by contrast, has no metadata side channel — its “metadata” is the HTTP headers on the cached Response itself, so you express TTL through Cache-Control and freshness through Age and ETag. Durable Objects have no TTL at all; their storage persists until you delete it or an alarm() handler prunes it, which makes them the wrong tool for data that should naturally expire and the right tool for data you must explicitly manage.
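
To make the revalidation idea concrete, here is a sketch that answers a conditional request from metadata alone. The `CacheMeta` shape and both function names are assumptions for illustration, and the ETag matching is deliberately simplified: it treats weak and strong validators as equal and does not handle commas inside quoted tags.

```typescript
// Hypothetical metadata shape stored next to each cached value.
interface CacheMeta {
  ct: string;   // content type
  etag: string; // validator copied from the origin response
}

const KV_METADATA_LIMIT_BYTES = 1024; // the 1 KB metadata limit mentioned above

// True when the serialized metadata fits within the limit.
function metadataFits(meta: CacheMeta): boolean {
  return new TextEncoder().encode(JSON.stringify(meta)).length <= KV_METADATA_LIMIT_BYTES;
}

// Chooses 304 or 200 from the client's If-None-Match header and the ETag
// held in metadata, so no second KV read is needed to decide.
function statusFor(ifNoneMatch: string | null, meta: CacheMeta): 200 | 304 {
  if (ifNoneMatch === null) return 200;
  const strip = (tag: string): string => tag.trim().replace(/^W\//, "");
  const candidates = ifNoneMatch.split(",").map(strip);
  return candidates.includes("*") || candidates.includes(strip(meta.etag)) ? 304 : 200;
}
```

In the read path, `statusFor(request.headers.get("if-none-match"), kvHit.metadata)` would run right after `getWithMetadata` returns, and a 304 lets you skip sending the body.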

Core implementation

The following Worker reads through Cache API → KV → origin and promotes on miss. It is edge-safe: only fetch, caches, Request/Response, and the KV binding are used.

interface Env {
  RESPONSE_KV: KVNamespace;
}

const KV_TTL_SECONDS = 300;
const EDGE_TTL_SECONDS = 60;

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
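    // The Cache API only stores GET responses; pass everything else straight through.
    if (request.method !== "GET") return fetch(request);

    const cache = caches.default;
    const cacheKey = new Request(new URL(request.url).toString(), request);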

    // Tier 1: PoP-local Cache API — single-digit ms, no cross-PoP visibility.
    const local = await cache.match(cacheKey);
    if (local) return local;

    // Tier 2: globally replicated KV. Returns value + metadata in one call.
    const kvHit = await env.RESPONSE_KV.getWithMetadata<string, { ct: string }>(
      cacheKey.url,
      "text",
    );
    if (kvHit.value !== null) {
      const response = new Response(kvHit.value, {
        headers: {
          "content-type": kvHit.metadata?.ct ?? "application/json",
          "cache-control": `public, max-age=${EDGE_TTL_SECONDS}`,
          "x-cache": "KV",
        },
      });
      // Promote into the local Cache API without blocking the response.
      ctx.waitUntil(cache.put(cacheKey, response.clone()));
      return response;
    }

    // Tier 3: origin. Populate both lower tiers on the way back.
    const origin = await fetch(request);
    if (origin.ok) {
      const body = await origin.clone().text();
      ctx.waitUntil(
        env.RESPONSE_KV.put(cacheKey.url, body, {
          expirationTtl: KV_TTL_SECONDS,
          metadata: { ct: origin.headers.get("content-type") ?? "application/json" },
        }),
      );
      ctx.waitUntil(cache.put(cacheKey, origin.clone()));
    }
    return origin;
  },
};

Two details matter. First, getWithMetadata returns the value and its attached metadata in a single read — store the content type, an ETag, or a version stamp there rather than issuing a second KV lookup. Second, every cache population runs inside ctx.waitUntil so the response is not held hostage by a slow write.

Write coalescing with a Durable Object

KV rate-limits writes to a single key to roughly one per second. Under a traffic spike, many isolates may miss the cache at the same moment and all try to repopulate the same key, a stampede that wastes origin requests and trips the write limit. Route the repopulation through a Durable Object so that concurrent misses collapse into one origin fetch and one KV write:

export class CacheCoordinator implements DurableObject {
  private inflight = new Map<string, Promise<string>>();
  constructor(private state: DurableObjectState, private env: Env) {}

  async fetch(request: Request): Promise<Response> {
    const { key, originUrl } = await request.json<{ key: string; originUrl: string }>();

    // Coalesce concurrent refreshes for the same key into one origin fetch.
    let pending = this.inflight.get(key);
    if (!pending) {
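      pending = (async () => {
        const res = await fetch(originUrl);
        // Never cache an error page: fail the refresh and let callers fall back.
        if (!res.ok) throw new Error(`origin responded ${res.status}`);
        const body = await res.text();
        await this.env.RESPONSE_KV.put(key, body, {
          expirationTtl: KV_TTL_SECONDS,
          // Keep the content type so the read path does not fall back to its default.
          metadata: { ct: res.headers.get("content-type") ?? "application/json" },
        });
        return body;
      })();
      this.inflight.set(key, pending);
      // Clear the slot on success or failure; the catch keeps the derived
      // promise from surfacing as an unhandled rejection.
      pending.finally(() => this.inflight.delete(key)).catch(() => {});
    }
    try {
      return new Response(await pending);
    } catch {
      return new Response("origin refresh failed", { status: 502 });
    }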
  }
}

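Provided the Worker derives the object id from the cache key (for example with idFromName(key)), every request for that key reaches the same Durable Object instance, so the inflight map deduplicates concurrent refreshes globally rather than per isolate. The result is one origin fetch and one KV write, no matter how many PoPs missed at the same time.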
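
The Worker side of that routing might look like the sketch below. The structural `StubLike` and `NamespaceLike` types stand in for the real Durable Object namespace and stub types so the example is self-contained; the binding name in the comment and the placeholder URL are assumptions, since the coordinator above only reads the JSON body.

```typescript
// Minimal structural stand-ins for a Durable Object stub and namespace.
interface StubLike {
  fetch(url: string, init: { method: string; body: string }): Promise<{ text(): Promise<string> }>;
}
interface NamespaceLike<Id> {
  idFromName(name: string): Id;
  get(id: Id): StubLike;
}

// Sends a refresh request for `key` to the coordinator instance that owns it.
// idFromName is deterministic, so every PoP resolves the same key to the same object.
async function refreshViaCoordinator<Id>(
  coordinators: NamespaceLike<Id>, // hypothetical binding, e.g. env.CACHE_COORDINATOR
  key: string,
  originUrl: string,
): Promise<string> {
  const stub = coordinators.get(coordinators.idFromName(key));
  // The URL is only a placeholder; the coordinator reads the JSON body.
  const res = await stub.fetch("https://coordinator.internal/refresh", {
    method: "POST",
    body: JSON.stringify({ key, originUrl }),
  });
  return res.text();
}
```

One object per key spreads load across many instances. Routing every key to a single shared object would serialize all refreshes behind one actor, which is rarely what you want.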

Provider mapping

The same layering exists on every major platform, but the primitives have different names and limits. Use the values below; they are current as of this writing.

Capability	Cloudflare	Vercel	Netlify
Eventually consistent KV cache	Workers KV (~60s global propagation, 25 MB value cap, ~1 write/s per key)	Vercel KV (Upstash Redis-backed)	Netlify Blobs (eventually consistent object store)
Low-write config / flags	KV or Cache API	Edge Config (read-optimized, low-latency global config)	Environment / Blobs
Strong consistency / coordination	Durable Objects (single-instance, transactional storage + alarms)	No native equivalent; use an external store (e.g. Upstash, or a Durable Object reached over HTTP)	No native equivalent; external store
PoP-local HTTP cache	Cache API (`caches.default`)	Managed CDN cache + `Cache-Control`	Managed CDN cache + `Cache-Control`
Object / blob storage	R2	Vercel Blob	Netlify Blobs

The asymmetry is real: only Cloudflare ships a strongly consistent coordination primitive (Durable Objects) inside the edge runtime. On Vercel and Netlify, any strong-consistency requirement is satisfied by an external service, which reintroduces the network hop you were trying to avoid. Factor that into platform selection when your workload genuinely needs single-writer semantics.

Read latency in practice

The numbers that decide your design are read latencies under realistic conditions, not best-case benchmarks. A KV read that hits a warm local replica resolves in single-digit milliseconds. A KV read for a key never before requested in that colo is a “cold read” — it fetches from a more central location and can take tens of milliseconds the first time, then warms. The Cache API is consistently the fastest tier because it never leaves the colo. A Durable Object read is fast when the object lives near the requesting PoP and noticeably slower across regions, because the request must travel to the single instance and back. This is why the layered design puts the Cache API first and reserves Durable Objects for the write path, where correctness — not read latency — is the constraint that matters.

A subtle consequence: because KV cold reads are slower, a low-traffic key that is read from many regions never warms anywhere and pays the cold penalty repeatedly. For such keys, either accept the latency or front them with the Cache API in each PoP. Keys that are genuinely hot globally need no special handling, because they warm naturally. Do not reach for a Durable Object to “speed up” reads; a single instance serving global read traffic is slower and more expensive than KV, not faster.

Control-flow variants

Read-only guard (fail open). For non-critical caches, never let a storage error break the request. Wrap the KV read and fall through to origin on failure:

let cached: string | null = null;
try {
  cached = await env.RESPONSE_KV.get(key);
} catch {
  cached = null; // KV degraded — serve from origin, do not 500.
}
if (cached !== null) return new Response(cached);

Early-exit on hit. Treat a Cache API hit as an early-return guard: return immediately and skip every downstream middleware stage, since a cached response needs no auth re-evaluation or rewriting.

Stale-while-revalidate fallback. Serve the KV value even past its logical freshness window and trigger a background refresh through the coordinator. This pairs directly with stale-while-revalidate at the edge, where the freshness directive itself lives in the response headers.
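
A sketch of the freshness decision behind that fallback, assuming the write path stores a `storedAtMs` timestamp in KV metadata. The function name, its parameters, and the three-state result are all hypothetical.

```typescript
type Freshness = "fresh" | "stale" | "expired";

// Classifies a cached entry by age. freshForS is the logical freshness window;
// staleForS is how much longer the value may be served while a refresh runs.
function classify(storedAtMs: number, nowMs: number, freshForS: number, staleForS: number): Freshness {
  const ageS = (nowMs - storedAtMs) / 1000;
  if (ageS <= freshForS) return "fresh";
  if (ageS <= freshForS + staleForS) return "stale";
  return "expired";
}

// In the Worker, the three outcomes would map to:
//   "fresh"   -> return the KV value
//   "stale"   -> return the KV value and schedule a coordinator refresh in ctx.waitUntil
//   "expired" -> treat as a miss and go to origin
```

For the stale window to be usable, the KV `expirationTtl` has to cover `freshForS + staleForS`; otherwise KV drops the entry before the window ends.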

Framework integration

Next.js App Router. Cloudflare bindings are not available in middleware.ts on Vercel. KV access happens in route handlers or, when deploying to Cloudflare, through the @cloudflare/next-on-pages adapter, which exposes getRequestContext().env. Scope the matcher tightly so the isolate only spins up for cacheable routes:

export const config = { matcher: ["/api/catalog/:path*"] };

Remix. Pass the KV namespace through the loader context. On Cloudflare Pages, the adapter injects context.cloudflare.env.RESPONSE_KV; read from it inside the loader and set Cache-Control on the returned Response.

SvelteKit. Access the binding via event.platform.env.RESPONSE_KV inside hooks.server.ts. Wrap the read in try/catch and call resolve(event) on miss so a storage outage degrades to origin rather than throwing.

Debugging workflow

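Local. Run wrangler dev (Wrangler v3 and later run locally by default; older versions need --local); Miniflare emulates KV and Durable Objects on your machine. Note that local KV is immediately consistent, which hides eventual-consistency bugs; the Common pitfalls table below lists the symptoms to expect in production.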
Tracing. Emit x-cache: HIT|KV|MISS and the key on every response. Add a cf-ray-correlated log line per tier so you can see which layer served each request.
Alerting. Watch the KV-tier hit ratio and the origin fall-through rate. A rising fall-through rate after a deploy usually means a cache-key change invalidated everything at once. For per-tier hit math, see multi-tier CDN cache architecture.
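
The two alerting metrics can be derived from the x-cache values in your logs. The class below is a hypothetical sketch meant for a log-processing job rather than for code running inside an isolate, since isolates share no memory.

```typescript
type Tier = "HIT" | "KV" | "MISS"; // the x-cache values named above

class TierStats {
  private counts: Record<Tier, number> = { HIT: 0, KV: 0, MISS: 0 };

  record(tier: Tier): void {
    this.counts[tier] += 1;
  }

  // Share of requests that missed the Cache API but were absorbed by KV.
  kvHitRatio(): number {
    const reachedKv = this.counts.KV + this.counts.MISS;
    return reachedKv === 0 ? 0 : this.counts.KV / reachedKv;
  }

  // Share of all requests that fell through to origin.
  originFallThroughRate(): number {
    const total = this.counts.HIT + this.counts.KV + this.counts.MISS;
    return total === 0 ? 0 : this.counts.MISS / total;
  }
}
```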

Common pitfalls

Symptom	Cause	Fix
Writes “don’t take” for up to a minute	KV eventual consistency; ~60s global propagation	Read-after-write from the same isolate is not guaranteed; use a Durable Object for read-your-writes
`KV PUT failed: 429` under load	More than ~1 write/s to a single key	Coalesce writes through a Durable Object or shard the value across several keys
Large object rejected	KV value exceeds the 25 MB cap	Store the blob in R2/Blob and keep only a pointer in KV
Stale data after invalidation in one region	Cache API is per-PoP; purge did not reach that colo	Invalidate via KV version stamp, not by deleting per-PoP entries
Counter drifts / double-counts	Concurrent isolates incrementing a KV value	Move the counter into a Durable Object with serialized writes
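
For the last row, a counter kept inside a Durable Object could be sketched as follows. `StorageLike` is a structural stand-in for the object's storage so the example runs on its own, and `SerializedCounter` is a hypothetical name. The read-then-write is only safe because, as described earlier, all requests for one object id are funneled through a single instance.

```typescript
// Structural stand-in for the storage API available on a Durable Object's state.
interface StorageLike {
  get<T>(key: string): Promise<T | undefined>;
  put<T>(key: string, value: T): Promise<void>;
}

// Hypothetical counter logic that would live inside a Durable Object class.
class SerializedCounter {
  private storage: StorageLike;

  constructor(storage: StorageLike) {
    this.storage = storage;
  }

  // Reads the current value, adds `by`, persists the result, and returns it.
  async increment(by = 1): Promise<number> {
    const current = (await this.storage.get<number>("count")) ?? 0;
    const next = current + by;
    await this.storage.put("count", next);
    return next;
  }
}
```

The same read-modify-write against a KV key would drift, because two isolates can read the same stale value and both write back `value + 1`.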

Runtime-constraints checklist

KV values stay under the 25 MB cap; oversized payloads moved to R2/Blob
No more than ~1 write/s per KV key; bursts coalesced through a Durable Object
All cache populations wrapped in `ctx.waitUntil`, never blocking the response
Metadata (content type, version, ETag) fetched via `getWithMetadata`, not a second read
Storage reads wrapped in `try/catch` so a degraded backend fails open to origin
Strong-consistency requirements routed to Durable Objects, not KV
Cache keys versioned so invalidation is a stamp change, not a per-PoP delete

Frequently Asked Questions

Is Cloudflare KV strongly consistent within the same region?

No. KV is eventually consistent everywhere. A write may take up to roughly 60 seconds to become visible at all edge locations, and even read-after-write from the same isolate is not guaranteed. If you need to read your own writes immediately, use a Durable Object, whose single-instance model gives strong consistency.

When should I use a Durable Object instead of KV?

Use a Durable Object when correctness depends on serialized, strongly consistent access: counters, locks, rate limiters, seat reservations, or write coalescing. Use KV when you are caching read-heavy data that tolerates short staleness, such as upstream API responses or rendered fragments.

How large a value can I store in KV?

A single KV value is capped at 25 MB. Anything larger belongs in object storage (R2 on Cloudflare, Vercel Blob, or Netlify Blobs); keep only a pointer or key reference in KV.

Do Vercel and Netlify have an equivalent to Durable Objects?

Not natively inside the edge runtime. Vercel KV (Upstash) and Netlify Blobs are eventually consistent stores. For strong-consistency or single-writer coordination on those platforms, you call an external service, which adds a network hop the edge-local Durable Object avoids.

Why does the Cache API show different results in two regions?

The Cache API is scoped to a single Point of Presence. A value cached in one colo is invisible to another, and purging it in one place does not purge it elsewhere. For cross-PoP invalidation, version your cache keys in KV rather than deleting individual Cache API entries.
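
Key versioning can be sketched in a few lines. The key scheme and the `__v` query parameter are assumptions chosen for illustration; the version stamp itself would be a small value you store in KV and bump to invalidate.

```typescript
// Hypothetical KV key scheme: "<version>:<url>". Bumping the version orphans
// old entries, which then age out through their TTL.
function versionedKey(version: string, url: string): string {
  return `${version}:${url}`;
}

// Cache API keys must be URLs, so the version travels as a query parameter.
// "__v" is an arbitrary name picked for this sketch.
function versionedCacheUrl(version: string, url: string): string {
  const u = new URL(url);
  u.searchParams.set("__v", version);
  return u.toString();
}

console.log(versionedCacheUrl("v2", "https://example.com/api/catalog?page=1"));
// prints "https://example.com/api/catalog?page=1&__v=v2"
```

Because the version stamp is read from KV, a bump is itself subject to KV propagation: PoPs pick up the new version within the usual convergence window rather than instantly.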