Tag and Surrogate-Key Cache Invalidation at the Edge
This guide is part of Edge Caching & CDN Integration. It covers how to associate cached responses with logical tags so a single content change can purge every page that depends on it — without enumerating URLs, without blanket flushes, and without races between the purge and the next write.
The hardest problem in caching is not storing data; it is knowing when to throw it away. A product price changes and now a dozen pages are wrong: the product page, three category listings, the homepage carousel, a sitemap, an RSS feed. You do not have those URLs in hand, and even if you did, purging them one by one is slow and fragile. Tag-based invalidation inverts the problem: at write time you label each response with the logical entities it depends on, and at mutation time you purge by label. Change product:sku-123 once, and every response tagged with it disappears from the edge in a single call.
Why the edge changes the invalidation problem
A monolithic origin can invalidate its own in-process cache synchronously and atomically. At the edge there is no single cache — there are hundreds of PoPs, each holding its own copy, reachable only through a control-plane API. You cannot iterate keys, you cannot hold a lock across regions, and a purge is an eventually consistent broadcast that takes time to fan out. Every design decision flows from those facts.
Three constraints dominate. First, purges are asynchronous and best-effort — you issue a command and the network propagates it, typically within a few seconds, but you must assume it can be delayed or dropped, which makes idempotent retries mandatory. Second, you cannot enumerate the keyspace, so invalidation must be driven by tags attached at write time, not by scanning. Third, the mutation that triggers a purge usually originates from your application, often passing through edge middleware on a POST/PUT/DELETE, so the purge call itself lives on a request hot path or in a background ctx.waitUntil task and must never block the user’s response.
Architecture overview
Tag-based invalidation has two halves that must stay in agreement.
Tagging at write time. When the edge serves or caches a response, it attaches a tag set describing the logical entities the response renders. A product page depends on the product, its category, and maybe a pricing rule, so it carries product:123, category:7, pricing:v3. Tags are emitted as a response header — Cache-Tag on Cloudflare, Surrogate-Key on Fastly — that the CDN strips before delivery and indexes internally.
Purging at mutation time. When the application mutates an entity, it issues a purge keyed by that entity’s tag. The CDN looks up every cached entry indexed under the tag and evicts it network-wide. The application never needs the URLs.
The contract between the halves is the tag vocabulary. Both sides must derive product:123 identically from a product id, or the purge misses. Centralize tag construction in one helper so the writer and the purger cannot drift.
// One source of truth for tag strings, used by both tagging and purging.
const tags = {
  product: (id: string) => `product:${id}`,
  category: (id: string) => `category:${id}`,
  collection: (slug: string) => `collection:${slug}`,
} as const;

function productPageTags(p: { id: string; categoryId: string }): string[] {
  return [tags.product(p.id), tags.category(p.categoryId)];
}
Soft vs hard invalidation
Two strategies, with different latency and consistency trade-offs.
Hard invalidation (purge). The entry is evicted immediately; the next request is a miss and fetches from origin. Correctness is immediate but you pay an origin fetch on the next hit, and a purge of a hot tag can cause a thundering herd as many PoPs miss at once. Use hard purges for correctness-critical changes — a price, a published/unpublished flag, anything where serving the old bytes is wrong.
Soft invalidation (revalidate / version bump). Rather than evicting, you mark entries stale and let stale-while-revalidate serve the old copy once while a background fetch refreshes it. The user sees no latency spike and origin load is smoothed. Use soft invalidation for high-traffic, low-stakes content where one extra stale hit is acceptable.
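As a minimal sketch of the header side of soft invalidation, the helper below builds a Cache-Control value that lets a shared cache keep serving a stale copy while it refreshes in the background. The function name and option names are hypothetical, and providers differ in how they honor s-maxage and stale-while-revalidate, so treat the numbers as placeholders rather than recommendations.

```typescript
// Hypothetical helper: the name and option shape are illustrative only.
interface SoftCachePolicy {
  freshSeconds: number; // how long the shared cache treats the entry as fresh
  staleSeconds: number; // how long a stale copy may be served during revalidation
}

function softCacheControl(policy: SoftCachePolicy): string {
  const { freshSeconds, staleSeconds } = policy;
  if (!isFinite(freshSeconds) || !isFinite(staleSeconds) || freshSeconds < 0 || staleSeconds < 0) {
    throw new RangeError("cache lifetimes must be finite and non-negative");
  }
  // s-maxage applies to shared caches such as a CDN; stale-while-revalidate is
  // the RFC 5861 extension that permits serving stale content during a refresh.
  return `public, s-maxage=${Math.floor(freshSeconds)}, stale-while-revalidate=${Math.floor(staleSeconds)}`;
}
```

A response carrying this header stays fresh for freshSeconds and may then be served stale for up to staleSeconds while the cache revalidates it.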
Key versioning is the third lever and the most robust. Instead of purging, fold a version token into the cache key (or the tag) and bump it. pricing:v3 becomes pricing:v4; every response keyed on the new version is a guaranteed miss, while old entries simply age out under their TTL. Versioning sidesteps purge-propagation races entirely — there is no window where a stale entry can be served under the new key — at the cost of leaving dead entries in cache until they expire. This is the safest pattern when purge ordering is uncertain.
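The version bump described above can be sketched as follows. The VersionedKeys class and its method names are hypothetical, and the in-memory Map is an assumption made to keep the example runnable: a real deployment would need a store that every PoP can read, such as a key-value service or deployed configuration.

```typescript
// Hypothetical version registry. The Map stands in for a shared store;
// an in-process Map alone would not be visible to other PoPs.
class VersionedKeys {
  private versions = new Map<string, number>();

  // Namespaces start at version 1 until they are bumped.
  current(namespace: string): number {
    const v = this.versions.get(namespace);
    return v === undefined ? 1 : v;
  }

  // Bumping makes every key built afterwards differ from the old ones.
  bump(namespace: string): number {
    const next = this.current(namespace) + 1;
    this.versions.set(namespace, next);
    return next;
  }

  // Fold the version token into the cache key.
  key(namespace: string, path: string): string {
    return `${namespace}:v${this.current(namespace)}:${path}`;
  }
}

const keys = new VersionedKeys();
const before = keys.key("pricing", "/products/123"); // "pricing:v1:/products/123"
keys.bump("pricing");
const after = keys.key("pricing", "/products/123"); // "pricing:v2:/products/123"
```

Entries stored under the old key are never read again and simply expire under their TTL, which is the trade-off the paragraph above describes.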
Idempotent purge with retries
A purge call can fail or time out, and you must be able to retry it without harm. Purges are naturally idempotent — purging product:123 twice is the same as once — so the only discipline needed is bounded retry with backoff, run off the hot path via ctx.waitUntil so the user’s mutation response is not delayed:
interface PurgeResult {
  ok: boolean;
  status: number; // last HTTP status seen; 0 if no response ever arrived
  attempts: number;
}

async function purgeWithRetry(
  doPurge: () => Promise<Response>,
  maxAttempts = 4,
): Promise<PurgeResult> {
  let lastStatus = 0;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await doPurge();
      lastStatus = res.status;
      // 2xx = success. 4xx (except 429) = our bug; do not retry.
      if (res.ok) return { ok: true, status: res.status, attempts: attempt };
      if (res.status >= 400 && res.status < 500 && res.status !== 429) {
        return { ok: false, status: res.status, attempts: attempt };
      }
    } catch {
      // doPurge rejected (network error or timeout): no response arrived,
      // so treat it as transient instead of letting the error escape the loop.
      lastStatus = 0;
    }
    // Do not sleep after the final attempt.
    if (attempt === maxAttempts) break;
    // 429 / 5xx / network failures are transient. Back off with jitter.
    const backoff = Math.min(2 ** attempt * 100, 2000);
    await new Promise((r) => setTimeout(r, backoff + Math.random() * 100));
  }
  return { ok: false, status: lastStatus, attempts: maxAttempts };
}
The retry classification matters: a 400/403/404 means the request itself is malformed or unauthorized and retrying will fail identically, so return immediately. Only 429 and 5xx are transient and worth a backed-off retry. Cap total attempts so a permanently failing control plane does not pin a waitUntil task open.
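To show what a doPurge callback might wrap, here is a hedged sketch that builds a Cloudflare purge-by-tag call as a plain object. The function name, the PurgeRequest shape, and the env.ZONE_ID / env.API_TOKEN names in the closing comment are assumptions for illustration; only the endpoint path, the bearer-token header, and the { "tags": [...] } body follow Cloudflare's purge_cache API.

```typescript
// Plain description of an HTTP call, so the sketch needs no runtime-specific types.
interface PurgeRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// zoneId and apiToken are placeholders supplied by your own configuration.
function cloudflarePurgeByTag(
  zoneId: string,
  apiToken: string,
  tagList: readonly string[],
): PurgeRequest {
  if (tagList.length === 0) throw new Error("at least one tag is required");
  return {
    url: `https://api.cloudflare.com/client/v4/zones/${encodeURIComponent(zoneId)}/purge_cache`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ tags: tagList }),
  };
}

// Inside a mutation handler, assuming ctx is the runtime's execution context
// and env holds the (hypothetically named) credentials:
//   const req = cloudflarePurgeByTag(env.ZONE_ID, env.API_TOKEN, ["product:123"]);
//   ctx.waitUntil(purgeWithRetry(() => fetch(req.url, req)));
```

Keeping request construction separate from the retry loop means the same purgeWithRetry can wrap any provider's purge call.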
Provider mapping
| Provider | Tag mechanism | Purge mechanism | Notes |
|---|---|---|---|
| Cloudflare | Cache-Tag: a,b,c response header (Enterprise) | POST /zones/{id}/purge_cache with { "tags": [...] }; also Tiered Cache for fan-out efficiency | Up to 1000 tags per request; tag length limits apply. Tiered Cache reduces origin fetches after a purge by serving from upper-tier PoPs. |
| Fastly Compute | Surrogate-Key: a b c (space-separated) response header | POST /service/{id}/purge/{key} per key, or soft purge with Fastly-Soft-Purge: 1 | Native, first-class surrogate keys; soft purge marks stale and pairs with stale-while-revalidate. |
| Netlify | Cache-Tag: a,b,c response header on Edge Functions | Purge by tag via the Purge API or purgeCache({ tags }) helper | Tags integrate with Netlify-CDN-Cache-Control; on-demand and durable caching honor tags. |
| Vercel Edge | No native surrogate keys; emulate via key versioning + revalidateTag (Next.js) | revalidateTag("product-123") in route handlers | Next.js revalidateTag is the closest analog; pair with the tags option on fetch/unstable_cache. |
Cloudflare and Netlify both use a comma-separated Cache-Tag header; Fastly uses a space-separated Surrogate-Key. The conceptual model is identical — get the tag vocabulary right once and the per-provider header format is a thin adapter. For the concrete Cloudflare flow see purging Cloudflare cache by tag; for the Fastly flow see using surrogate keys on Fastly Compute.
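The "thin adapter" idea can be sketched as a single function that turns one tag list into the right header for each provider. The Provider type and the tagHeader function are hypothetical names; the header names and separators are the ones listed in the table above.

```typescript
// Hypothetical adapter: names and shape are illustrative, not a provider SDK.
type Provider = "cloudflare" | "netlify" | "fastly";

interface TagHeader {
  name: string;
  value: string;
}

function tagHeader(provider: Provider, tagList: readonly string[]): TagHeader {
  // Drop duplicates while preserving first-seen order.
  const unique = tagList.filter((t, i) => tagList.indexOf(t) === i);
  for (const t of unique) {
    // A comma or whitespace inside a tag would be read as a separator
    // by one provider or another, so reject it up front.
    if (t.length === 0 || /[\s,]/.test(t)) {
      throw new Error(`invalid tag: ${JSON.stringify(t)}`);
    }
  }
  if (provider === "fastly") {
    return { name: "Surrogate-Key", value: unique.join(" ") };
  }
  // Cloudflare and Netlify share the comma-separated Cache-Tag format.
  return { name: "Cache-Tag", value: unique.join(",") };
}
```

Feeding the output of the shared tag helper through an adapter like this keeps the vocabulary provider-neutral and confines the format difference to one place.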
Control-flow variants
Guard: only mutations purge. Read traffic must never issue purges. Gate purge logic behind a method and route check, an early-return guard that exits the purge path for any non-mutating request:
function purgeTagsFor(request: Request): string[] | null {
  // Early return: only mutating methods may trigger a purge.
  if (!["POST", "PUT", "PATCH", "DELETE"].includes(request.method)) return null;
  const url = new URL(request.url);
  const m = url.pathname.match(/^\/api\/products\/([^/]+)/);
  return m ? [tags.product(m[1])] : null;
}
Early-exit: fire-and-forget in the background. The mutation response returns immediately; the purge runs in ctx.waitUntil(purgeWithRetry(...)) so users never wait for tag propagation.
Fallback: version bump when purge is unreliable. If the control plane is degraded, fall back to bumping a version token in the key vocabulary, which makes new reads miss without depending on a purge succeeding.
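One way to express this fallback is a small wrapper that tries the purge first and bumps the version only when the purge does not succeed. The function name and the two injected callbacks are hypothetical; the purge callback is assumed to resolve to an object with an ok flag, which is the shape purgeWithRetry returns.

```typescript
type InvalidationOutcome = "purged" | "version-bumped";

// purge and bumpVersion are injected so the sketch stays provider-agnostic.
async function invalidateWithFallback(
  purge: () => Promise<{ ok: boolean }>,
  bumpVersion: () => Promise<void> | void,
): Promise<InvalidationOutcome> {
  try {
    const result = await purge();
    if (result.ok) return "purged";
  } catch (_err) {
    // A thrown error is handled the same way as a failed purge.
  }
  // The purge did not succeed: make new reads miss by moving to a new key version.
  await bumpVersion();
  return "version-bumped";
}
```

Because the version bump runs only after the purge path has failed, the common case leaves no dead entries behind, and the degraded case still guarantees fresh reads.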
Framework integration
Next.js App Router. Tag fetches with the tags option and invalidate with revalidateTag in a route handler:
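// app/api/products/[id]/route.ts
import { revalidateTag } from "next/cache";
// saveProduct and getProduct are application helpers; this import path is illustrative.
import { getProduct, saveProduct } from "@/lib/products";

// Note: in Next.js 15 and later, params is a Promise and must be awaited first.
export async function PUT(req: Request, { params }: { params: { id: string } }) {
  await saveProduct(params.id, await req.json());
  // Read the saved product so the category tag reflects its current category.
  const product = await getProduct(params.id);
  revalidateTag(`product:${params.id}`);
  revalidateTag(`category:${product.categoryId}`);
  return Response.json({ ok: true });
}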
Remix. Set Cache-Tag/Surrogate-Key on loader responses and call the provider purge API from actions after a mutation, wrapped in purgeWithRetry.
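SvelteKit. Attach tags in the handle hook in src/hooks.server.ts by setting the header on the resolved response, or with setHeaders in a load function; purge on write from form actions in +page.server.ts or from request handlers in +server.ts.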
Debugging workflow
- Local. Log the tag set attached to each response and the tag set sent on each purge. Assert the writer and purger derive identical strings for the same entity — the number one cause of “purge did nothing.”
- Tracing. Record cache.tags as a span attribute on the response and purge.tags on the mutation. Correlate a stale-content report with whether a purge for the expected tag was issued and what it returned.
- Alerting. Alert on purge API error rate and on purgeWithRetry exhausting attempts. A rising purge-failure rate means content is going stale silently.
Common pitfalls
| Symptom | Cause | Fix |
|---|---|---|
| Purge “succeeds” but content stays stale | Writer and purger compute different tag strings | Centralize tag construction in one shared helper used by both paths |
| Mutation response is slow | Purge runs synchronously on the request path | Move the purge into ctx.waitUntil with retries |
| Transient 5xx leaves content stale | Purge issued once, no retry | Wrap in purgeWithRetry; retry only 429/5xx, never 4xx |
| Origin overwhelmed right after a purge | Hard purge of a hot tag causes a thundering herd | Use soft purge / stale-while-revalidate, or tiered cache to absorb misses |
| Stale entry served briefly under new content | Purge propagation race | Bump a version token in the tag/key so new reads are guaranteed misses |
| Purge rejected with 4xx | Too many tags or oversized tag in one call | Batch within the provider’s per-request tag limit; keep tags short |
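For the last row, batching can be as simple as slicing the tag list into chunks no larger than the provider's per-request limit. The batchTags name is hypothetical, and the limit is passed in as a parameter because it differs by provider and plan.

```typescript
// Split a tag list into batches of at most maxPerRequest tags each.
function batchTags(tagList: readonly string[], maxPerRequest: number): string[][] {
  if (Math.floor(maxPerRequest) !== maxPerRequest || maxPerRequest < 1) {
    throw new RangeError("maxPerRequest must be a positive integer");
  }
  const batches: string[][] = [];
  for (let i = 0; i < tagList.length; i += maxPerRequest) {
    batches.push(tagList.slice(i, i + maxPerRequest));
  }
  return batches;
}
```

Each batch then becomes one purge call, and each call can be wrapped in purgeWithRetry independently so one failed batch does not block the others.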
Runtime-constraints checklist
- Purges run off the hot path via ctx.waitUntil
- Purge calls are wrapped in bounded retry that retries only 429/5xx
- Read/GET requests never issue purges
Frequently Asked Questions
What is the difference between a cache tag and a surrogate key?
They are the same concept under different vendor names. Cloudflare and Netlify call it a Cache-Tag; Fastly calls it a Surrogate-Key. In all cases you attach labels to a cached response and later purge every response carrying a given label. The only differences are header syntax and the purge API.
When should I use soft invalidation instead of a hard purge?
Use soft invalidation for high-traffic, low-stakes content where serving one extra stale response is acceptable. It marks entries stale and lets stale-while-revalidate refresh them in the background, avoiding a latency spike and origin thundering herd. Use a hard purge when serving the old bytes would be incorrect, such as a price or publish-state change.
How do I make purges reliable when the control plane can fail?
Wrap the purge call in bounded retry with backoff, retrying only transient 429 and 5xx responses and never 4xx. Run it off the request hot path with ctx.waitUntil so the user’s mutation response is not delayed, and alert when retries are exhausted.
What is key versioning and why is it the safest option?
Key versioning folds a version token into the cache key or tag, such as pricing:v3. To invalidate, you bump it to pricing:v4, which makes every new read a guaranteed miss while old entries age out under their TTL. It sidesteps purge-propagation races entirely, at the cost of leaving dead entries in cache until they expire.
Why does my purge succeed but content stays stale?
Almost always because the tag string sent on purge does not exactly match the tag string attached at write time. Centralize tag construction in one shared helper so the writer and purger cannot drift, and log both tag sets to confirm they match.