Multi-Tier CDN Cache Architecture

This guide is part of Edge Caching & CDN Integration. It explains how a request travels through layered caches — browser, edge Point of Presence, regional or tiered cache, origin shield, and finally origin — why each tier exists, how to reason about the compound hit ratio across them, and how predictive pre-warming keeps the cold tiers warm.

A single cache tier is rarely enough. An edge PoP cache gives every region a local copy, but the moment a value expires or a cold PoP receives its first request, it falls straight through to origin. With hundreds of PoPs, that means hundreds of independent origin fetches for the same object — an origin stampede that defeats the purpose of caching. Multi-tier architecture inserts intermediate caches so that a miss at the edge is absorbed by a regional tier or an origin shield before it ever reaches your application.

The constraint that drives the pattern

Edge caches are distributed and independent. Cloudflare alone operates data centers in hundreds of cities; each runs its own Cache API store with no visibility into the others. Without tiering, the effective origin request rate scales with the number of cold PoPs, not with the number of unique objects. For a popular object served from 200 PoPs, a synchronized expiry can produce 200 simultaneous origin fetches.

Tiering solves this by designating upstream caches. A lower-tier PoP that misses does not go to origin; it asks an upstream regional cache, which has likely already been populated by a neighboring PoP. Only a miss at that upstream cache reaches origin (or the shield). The origin sees a request rate proportional to unique objects times a small constant, not times the PoP count.
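As a rough illustration of that scaling, the sketch below counts origin fetches for one object after a synchronized expiry. The `Topology` shape and `originFetchesOnExpiry` are hypothetical names, and the model assumes each upper tier collapses concurrent misses into a single upstream fetch.

```typescript
// Hypothetical model, not a provider API: origin fetches for ONE object
// immediately after every copy of it expires at the same moment.
interface Topology {
  popCount: number; // edge PoPs that serve the object
  upperTierCount: number; // regional caches or shields in front of origin (0 = no tiering)
}

function originFetchesOnExpiry(t: Topology): number {
  // Without tiering, every cold PoP fetches from origin on its own.
  if (t.upperTierCount === 0) return t.popCount;
  // With tiering, each upper tier fetches once (assuming it coalesces
  // concurrent misses) and the PoPs refill from it.
  return Math.min(t.popCount, t.upperTierCount);
}

const untiered = originFetchesOnExpiry({ popCount: 200, upperTierCount: 0 });
const tiered = originFetchesOnExpiry({ popCount: 200, upperTierCount: 4 });
```

With 200 PoPs, the untiered case yields 200 origin fetches and the four-upper-tier case yields 4, which is the "small constant" in the paragraph above.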

The same constraint shapes freshness. Each tier has its own TTL and its own copy, so a stale object can persist in a lower tier after the upper tier refreshed. Coherent invalidation must reach every tier, which is why tag and surrogate-key purging — covered in tag and surrogate-key invalidation — is a hard requirement at scale.

A request descends only as far as the first tier that holds the object. Each tier shields the one above it, collapsing many edge misses into few origin fetches.

Architecture overview

Read the hierarchy from the user toward origin. “Upper” tiers are the ones closer to origin.

Browser cache. Governed by Cache-Control: max-age on the response. The cheapest possible hit: zero network. Set it conservatively for HTML, generously for fingerprinted static assets.
Edge PoP cache. The colo serving the user. A hit here is single-digit milliseconds. This is where most CDN hits land.
Regional / tiered cache. An upper-tier cache that a group of PoPs treat as their upstream. A lower PoP miss is absorbed here instead of going to origin.
Origin shield. A single designated cache (or small set) that all regional tiers funnel through. It is the last line before origin and the key to a high origin offload ratio — only the shield’s misses reach your application.
Origin. Your application or object store. The architecture’s whole job is to minimize requests that arrive here.
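The descent through these tiers can be sketched with a small in-memory model. `CacheTier` and `OriginFetcher` are illustrative names rather than any provider's API, and the model leaves out TTLs, eviction, and the browser cache to keep the control flow visible.

```typescript
// Illustrative in-memory model of the hierarchy; TTLs and eviction are omitted.
type OriginFetcher = (key: string) => string;

class CacheTier {
  readonly label: string;
  hits = 0;
  private upstream: CacheTier | OriginFetcher;
  private store = new Map<string, string>();

  constructor(label: string, upstream: CacheTier | OriginFetcher) {
    this.label = label;
    this.upstream = upstream;
  }

  get(key: string): string {
    const cached = this.store.get(key);
    if (cached !== undefined) {
      this.hits++;
      return cached; // the request stops at the first tier holding the object
    }
    const upstream = this.upstream;
    const body = typeof upstream === "function" ? upstream(key) : upstream.get(key);
    this.store.set(key, body); // populate this tier on the way back to the user
    return body;
  }
}

// Two edge PoPs share one regional tier, which sits behind a shield.
let originFetches = 0;
const shield = new CacheTier("shield", (key) => {
  originFetches++;
  return `body for ${key}`;
});
const regional = new CacheTier("regional", shield);
const edgeA = new CacheTier("edge-a", regional);
const edgeB = new CacheTier("edge-b", regional);

edgeA.get("/app.js"); // cold everywhere: descends all the way to origin
edgeB.get("/app.js"); // cold PoP, but the regional tier absorbs the miss
```

After both calls, origin has been contacted once and the regional tier has recorded one hit, even though both edge PoPs started cold.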

Hit-ratio math

The value of tiering is multiplicative, and you should compute it before deploying. If each tier independently serves a fraction of the requests reaching it, the requests surviving to origin are the product of the miss rates.

Suppose the browser absorbs 20% (h_b = 0.20), the edge PoP absorbs 70% of what remains (h_e = 0.70), the regional tier absorbs 60% of the rest (h_r = 0.60), and the shield absorbs 50% of what is left (h_s = 0.50). The fraction of original requests reaching origin is:

origin_fraction = (1 - h_b) * (1 - h_e) * (1 - h_r) * (1 - h_s)
                = 0.80 * 0.30 * 0.40 * 0.50
                = 0.048   // ~4.8% reach origin; ~95.2% offloaded

Removing the regional tier and shield raises origin traffic to 0.80 * 0.30 = 24% — a fivefold increase in origin load from the same edge hit ratio. The intermediate tiers are not redundant; they compound.

function originFraction(hitRates: number[]): number {
  // Product of miss rates across all tiers, in user-to-origin order.
  return hitRates.reduce((survive, h) => survive * (1 - h), 1);
}

const offload = 1 - originFraction([0.2, 0.7, 0.6, 0.5]); // 0.952
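It can also help to see where requests are served, not only how many survive to origin. The sketch below uses the same hit rates; `servedShare` is a hypothetical helper that returns each tier's share of the original traffic, with origin as the final entry.

```typescript
// Share of ORIGINAL requests served by each tier, in user-to-origin order.
// The final element is the share that reaches origin.
function servedShare(hitRates: number[]): number[] {
  const shares: number[] = [];
  let reaching = 1; // fraction of original requests that reach the current tier
  for (const h of hitRates) {
    shares.push(reaching * h); // served here
    reaching *= 1 - h; // passed upstream
  }
  shares.push(reaching);
  return shares;
}

// Browser, edge, regional, shield, then origin:
// roughly 20%, 56%, 14.4%, 4.8%, and 4.8% of original requests.
const tierShares = servedShare([0.2, 0.7, 0.6, 0.5]);
```

The shares sum to 1, and the regional tier and shield together serve about 19% of all requests that would otherwise have reached origin.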

Predictive pre-warming

A cold tier is the enemy of tail latency. After a deploy, a purge, or a TTL expiry, the next request pays the full descent to origin. Pre-warming pushes popular objects into the upper tiers before user traffic demands them. Trigger it from the same mutation path that invalidates: when content changes, purge the tiers and immediately re-fetch the canonical URLs so the regional cache and shield repopulate during low-traffic windows rather than under peak load.

// Run from a scheduled Worker or post-deploy hook, not the hot request path.
async function prewarm(urls: string[]): Promise<void> {
  await Promise.allSettled(
    urls.map((u) =>
      fetch(u, { headers: { "x-prewarm": "1" }, cf: { cacheEverything: true } }),
    ),
  );
}

Pre-warming pairs naturally with stale-while-revalidate at the edge: serve the stale copy instantly while a background fetch refreshes the edge copy and, on its way to origin, the tiers behind it.
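The stale-while-revalidate behavior can be sketched for a single tier as follows. `SwrCache` is a hypothetical class, `load` stands in for the upstream fetch, and the clock is passed in explicitly so the logic is easy to test. A real implementation would also deduplicate concurrent refreshes, which this sketch does not.

```typescript
// Sketch of stale-while-revalidate for one tier; not a CDN API.
interface SwrEntry {
  body: string;
  storedAt: number;
}

class SwrCache {
  private entries = new Map<string, SwrEntry>();
  private maxAgeMs: number;
  private staleMs: number;
  private load: (key: string) => Promise<string>;

  constructor(maxAgeMs: number, staleMs: number, load: (key: string) => Promise<string>) {
    this.maxAgeMs = maxAgeMs;
    this.staleMs = staleMs;
    this.load = load;
  }

  async get(key: string, now: number): Promise<string> {
    const entry = this.entries.get(key);
    if (entry !== undefined) {
      const age = now - entry.storedAt;
      if (age <= this.maxAgeMs) return entry.body; // fresh hit
      if (age <= this.maxAgeMs + this.staleMs) {
        // Stale but inside the window: answer now, refresh in the background.
        // A failed refresh is swallowed so the stale copy keeps serving.
        void this.refresh(key, now).catch(() => undefined);
        return entry.body;
      }
    }
    return this.refresh(key, now); // cold or too stale: wait for upstream
  }

  private async refresh(key: string, now: number): Promise<string> {
    const body = await this.load(key);
    this.entries.set(key, { body, storedAt: now });
    return body;
  }
}
```

A request that arrives inside the stale window returns the old body immediately, and the next request sees the refreshed one.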

TTL coordination across tiers

A multi-tier cache only behaves predictably when the TTLs across tiers are deliberately coordinated rather than copied. The mechanism is the split between max-age and s-maxage: max-age applies to every cache, including the browser’s private cache, while s-maxage overrides it in shared caches, which means every CDN tier. Give the browser a short max-age so a user picks up changes quickly, and give the shared tiers a long s-maxage so the edge and regional caches absorb the bulk of traffic. A conservative profile for a fingerprinted asset is public, max-age=300, s-maxage=31536000: five minutes in the browser and one year in the shared tiers. The long shared TTL is never wrong because the fingerprint changes the URL on every content change, and the same property makes a generous browser max-age safe for these assets too.

The danger is an inverted profile on mutable content: a long browser max-age on HTML means users cannot pick up a publish until their local cache expires, and no purge you issue can reach a browser cache. As a rule, HTML and API responses get a short max-age with a longer s-maxage plus stale-while-revalidate; immutable fingerprinted assets get a very long s-maxage and can take a long max-age as well, since a new fingerprint bypasses any cached copy. The shared tiers are the ones you control through purging, so it is safe to let them hold content far longer than the browser.
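One way to keep these profiles consistent is to derive the header from a content class and check for inversion before sending it. In the sketch below, `ContentClass`, `cacheControlFor`, and `isInvertedProfile` are hypothetical helpers, and the HTML values are placeholder assumptions rather than recommendations from this guide. Only the fingerprinted-asset profile comes from the text above.

```typescript
type ContentClass = "fingerprinted-asset" | "html" | "personalized";

function cacheControlFor(kind: ContentClass): string {
  switch (kind) {
    case "fingerprinted-asset":
      return "public, max-age=300, s-maxage=31536000"; // profile quoted in the text
    case "html":
      // Assumed example values: short browser TTL, longer shared TTL, plus SWR.
      return "public, max-age=60, s-maxage=3600, stale-while-revalidate=300";
    case "personalized":
      return "private, no-store"; // never enters a shared tier
  }
}

// Reads an integer directive such as max-age or s-maxage from a header value.
function directiveSeconds(header: string, directive: string): number | undefined {
  for (const part of header.split(",")) {
    const trimmed = part.trim().toLowerCase();
    const eq = trimmed.indexOf("=");
    if (eq !== -1 && trimmed.slice(0, eq) === directive) {
      return Number(trimmed.slice(eq + 1));
    }
  }
  return undefined;
}

// True when the browser would hold the response longer than the shared tiers.
function isInvertedProfile(header: string): boolean {
  const browserTtl = directiveSeconds(header, "max-age");
  const sharedTtl = directiveSeconds(header, "s-maxage");
  return browserTtl !== undefined && sharedTtl !== undefined && browserTtl > sharedTtl;
}
```

A check like `isInvertedProfile` could run in a unit test or a response middleware to catch a long browser TTL on mutable content before it ships.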

Origin offload as the metric that matters

Edge hit ratio is the number teams report, but origin offload ratio is the number that pays the bill and protects availability. They are not the same. A 90% edge hit ratio still sends 10% of traffic deeper — and without intermediate tiers, most of that 10% lands on origin. The compound math above is exactly the gap between “edge hit ratio” and “origin offload”: each intermediate tier converts what would have been an origin request into an upper-tier hit. Instrument the origin’s own request rate, not just the CDN’s reported hit ratio, and alert on it. A deploy that quietly drops origin offload from 95% to 80% quadruples origin load while the edge hit ratio looks barely changed — the kind of regression that only the origin-side metric catches.
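The arithmetic behind that regression is worth encoding in the alert itself. The sketch below is a minimal example; `offloadRatio`, `originLoadMultiplier`, and `shouldAlert` are hypothetical names, and the default threshold of 1.5 is an arbitrary assumption to tune for your own traffic.

```typescript
// Offload ratio measured from the origin's own request count.
function offloadRatio(totalRequests: number, originRequests: number): number {
  if (totalRequests <= 0) return 0; // no traffic: report no offload rather than NaN
  return 1 - originRequests / totalRequests;
}

// How many times larger origin load becomes when offload changes.
// Returns Infinity if the baseline offload is exactly 1.
function originLoadMultiplier(offloadBefore: number, offloadAfter: number): number {
  return (1 - offloadAfter) / (1 - offloadBefore);
}

// Assumed policy: alert when origin load grows by more than maxMultiplier.
function shouldAlert(offloadBefore: number, offloadAfter: number, maxMultiplier = 1.5): boolean {
  return originLoadMultiplier(offloadBefore, offloadAfter) > maxMultiplier;
}

// The example from the text: 95% to 80% offload is about a 4x origin load.
const regressionMultiplier = originLoadMultiplier(0.95, 0.8);
```

Alerting on the multiplier rather than on the raw ratio makes a drop from 95% to 80% look as severe as it is for the origin.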

Provider mapping

Every major platform offers tiering, but the names and the granularity differ. Use the table below to map each concept to the provider’s name for it.

Capability	Cloudflare	Vercel	Netlify	Concept
Edge PoP cache	Cache (per data center)	Edge Network cache	Edge CDN cache	Closest tier to user
Regional / tiered cache	Tiered Cache (Smart or custom topology)	Regional edge caches	Regional cache	Absorbs PoP misses
Origin shield	Tiered Cache designates upper tiers; Cache Reserve adds a persistent tier	Origin shielding via regional consolidation	Origin shielding	Single funnel to origin
Persistent upper tier	Cache Reserve (R2-backed, long-lived)	—	—	Survives PoP eviction
Smart routing / latency optimization	Argo Smart Routing	Built into Edge Network	Built-in	Faster tier-to-tier hops
Tag / surrogate invalidation	Cache-Tag header + purge by tag (Enterprise)	Cache tags / on-demand revalidation	Cache-Tag + purge API	Coherent multi-tier purge

Cloudflare is the most explicit: Tiered Cache designates which data centers act as upper tiers, and Cache Reserve adds an R2-backed persistent layer that survives normal eviction, dramatically raising the offload ratio for rarely requested content. Vercel and Netlify expose tiering as a managed behavior with fewer knobs — you tune it primarily through Cache-Control and platform revalidation APIs rather than topology configuration. The hands-on walkthrough for Cloudflare lives in configuring tiered cache on Cloudflare.

Control-flow variants

Cache-everything with an early return. Treat an edge hit as an early-return guard: a cached object skips every downstream middleware stage. Mark cacheable routes explicitly so dynamic routes never accidentally populate a shared tier.

Bypass for personalized content. Authenticated or per-user responses must never enter a shared tier. Set Cache-Control: private, no-store and short-circuit the cache lookup when an Authorization header or session cookie is present, as described in cache-key normalization and Vary.
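A minimal sketch of that bypass check follows. `HeaderReader` is a small interface that the standard `Headers` object satisfies, and the cookie names in `SESSION_COOKIE_NAMES` are hypothetical placeholders for whatever your application actually sets.

```typescript
// Anything with a case-insensitive get(), such as the standard Headers object.
interface HeaderReader {
  get(name: string): string | null;
}

// Hypothetical session cookie names; replace with the ones your app uses.
const SESSION_COOKIE_NAMES = ["session", "sid"];

function shouldBypassSharedCache(headers: HeaderReader): boolean {
  // Any Authorization header marks the request as per-user.
  if (headers.get("authorization")) return true;
  const cookieHeader = headers.get("cookie") ?? "";
  // Look for a session cookie by name; values are ignored.
  return cookieHeader.split(";").some((pair) => {
    const eq = pair.indexOf("=");
    const cookieName = (eq === -1 ? pair : pair.slice(0, eq)).trim();
    return SESSION_COOKIE_NAMES.includes(cookieName);
  });
}
```

Call it before the cache lookup and skip both the read and the write when it returns true, so a personalized response can neither be served from nor stored in a shared tier.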

Tiered fallback on origin failure. If the shield’s origin fetch fails, serve the last good object from the persistent tier (Cache Reserve) rather than returning an error. This converts an origin outage into stale-but-available content.
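The fallback variant can be sketched as a wrapper around the origin fetch. Here a plain `Map` stands in for the persistent tier, and `fetchWithFallback` and its parameters are illustrative names, not a Cache Reserve API.

```typescript
interface FallbackResult {
  body: string;
  stale: boolean; // true when served from the last good copy
}

async function fetchWithFallback(
  key: string,
  fetchFromOrigin: (key: string) => Promise<string>,
  lastGood: Map<string, string>, // stand-in for a persistent tier
): Promise<FallbackResult> {
  try {
    const body = await fetchFromOrigin(key);
    lastGood.set(key, body); // remember the most recent successful response
    return { body, stale: false };
  } catch (err) {
    const saved = lastGood.get(key);
    if (saved === undefined) throw err; // nothing to fall back to
    return { body: saved, stale: true };
  }
}
```

Returning the `stale` flag lets the caller add a response header or emit a metric, so an outage that is being masked by stale content stays visible to operators.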

Framework integration

Next.js App Router. Caching is driven by route segment config and Cache-Control headers set in route handlers or middleware. On Vercel, revalidate and on-demand revalidation populate and purge the managed tiers; you do not configure topology directly. Scope config.matcher so middleware does not run for static assets that should hit the CDN directly:

export const config = { matcher: ["/((?!_next/static|favicon.ico).*)"] };

Remix. Set Cache-Control on loader responses (headers export). The CDN tiers honor max-age and s-maxage; use s-maxage to give shared tiers a longer life than the browser.

SvelteKit. Use setHeaders({ 'cache-control': 'public, s-maxage=3600' }) inside a load function. The shared-cache directive (s-maxage) controls the edge and regional tiers independently of the browser’s max-age.

Debugging workflow

Local. Tiering does not exist locally — emulators run a single cache. Validate cacheability (correct Cache-Control, no accidental Set-Cookie) locally, then verify tier behavior in a staging zone.
Tracing. Inspect cache-status response headers (cf-cache-status on Cloudflare, x-vercel-cache on Vercel, Cache-Status on Netlify). Distinguish HIT, MISS, EXPIRED, and REVALIDATED, and correlate by region to see which tier served each request.
Alerting. Track the origin offload ratio per content class. A sudden drop usually means a cache-key change, an unexpected Set-Cookie, or a Vary header fragmenting the key space. Watch shield request rate as the canary for origin load.
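For the tracing step, a small tally of cache-status values per region is often enough to spot a cold or misconfigured tier. The sketch below assumes you have already collected response headers from probes in several regions, with lower-cased header names. `ProbeSample`, `readCacheStatus`, and `tallyByRegion` are hypothetical names.

```typescript
interface ProbeSample {
  region: string;
  headers: Record<string, string>; // assumed to use lower-case header names
}

const DEFAULT_STATUS_HEADERS = ["cf-cache-status", "x-vercel-cache"];

// Returns the first cache-status value found, upper-cased, or "UNKNOWN".
function readCacheStatus(
  headers: Record<string, string>,
  names: string[] = DEFAULT_STATUS_HEADERS,
): string {
  for (const headerName of names) {
    const value = headers[headerName];
    if (value) return value.toUpperCase();
  }
  return "UNKNOWN";
}

// Counts each status per region, e.g. region "fra" -> { HIT: 2, MISS: 1 }.
function tallyByRegion(samples: ProbeSample[]): Map<string, Map<string, number>> {
  const tally = new Map<string, Map<string, number>>();
  for (const sample of samples) {
    const outcome = readCacheStatus(sample.headers);
    const perRegion = tally.get(sample.region) ?? new Map<string, number>();
    perRegion.set(outcome, (perRegion.get(outcome) ?? 0) + 1);
    tally.set(sample.region, perRegion);
  }
  return tally;
}
```

A region that shows mostly MISS while others show HIT usually points at a cold PoP or a key that is fragmenting only there.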

Common pitfalls

Symptom	Cause	Fix
Origin load scales with PoP count	No tiered cache; every cold PoP fetches origin	Enable Tiered Cache / origin shield so misses funnel through an upper tier
Low hit ratio despite long TTL	`Set-Cookie` or `Vary` on a cacheable response makes it uncacheable or fragments keys	Strip `Set-Cookie`; normalize the cache key and minimize `Vary`
Stale content after a publish	Purge reached the edge but not the regional/persistent tier	Use tag-based purge that propagates to every tier
Tail latency spikes after deploy	Cold upper tiers; first request descends to origin	Pre-warm popular URLs into the upper tiers post-deploy
Personalized data served to wrong user	User-specific response cached in a shared tier	Set `private, no-store` and bypass cache when auth is present
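The key-normalization fix in the second row can be sketched as a pure function over the URL. The list of ignored parameters is an assumption for illustration; which parameters are safe to drop depends on whether your origin varies its response on them.

```typescript
// Assumed list of tracking parameters that do not change the response.
const IGNORED_PARAMS = new Set(["utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"]);

function normalizeCacheKey(rawUrl: string): string {
  const url = new URL(rawUrl); // also lower-cases the scheme and host
  const kept: [string, string][] = [];
  url.searchParams.forEach((value, key) => {
    if (!IGNORED_PARAMS.has(key.toLowerCase())) kept.push([key, value]);
  });
  // Sort by name, then value, so parameter order never fragments the key.
  kept.sort(([aKey, aVal], [bKey, bVal]) => {
    if (aKey !== bKey) return aKey < bKey ? -1 : 1;
    if (aVal !== bVal) return aVal < bVal ? -1 : 1;
    return 0;
  });
  url.search = new URLSearchParams(kept).toString();
  url.hash = ""; // fragments are never sent to the server
  return url.toString();
}
```

Two requests that differ only in parameter order or tracking parameters then map to the same entry, for example `?b=2&utm_source=x&a=1` and `?a=1&b=2`.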

Runtime-constraints checklist

Tiered cache / origin shield enabled so origin load tracks unique objects, not PoP count
Compound hit ratio computed; intermediate tiers justified by the offload math
`s-maxage` set for shared tiers independently of browser `max-age`
No `Set-Cookie` on responses intended for shared tiers
Cache key normalized and `Vary` minimized to avoid fragmentation
Tag/surrogate-key purge propagates to every tier, including any persistent tier
Popular URLs pre-warmed after deploys and purges, off the hot path
Cache-status headers monitored per region; origin offload ratio alerted on

Frequently Asked Questions

What is an origin shield and why does it matter?

An origin shield is a single designated cache (or small set) that all regional tiers funnel through before reaching origin. Without it, every regional cache that misses contacts origin independently, so origin load scales with the number of regions. With a shield, only the shield’s misses reach origin, collapsing many regional fetches into one and dramatically raising the offload ratio.

How do I calculate the overall cache hit ratio?

Multiply the miss rate of each tier in user-to-origin order; the product is the fraction of requests that reach origin, and one minus that product is the offload ratio. Because the tiers compound, adding a regional tier and shield can cut origin traffic several-fold even when the edge hit ratio is unchanged.

Does Cloudflare Tiered Cache cost extra?

Smart Tiered Cache is available on all plans, including Free, while custom topologies are an Enterprise feature. Cache Reserve (the persistent R2-backed tier) is billed separately for storage and operations. The offload it provides on infrequently requested content usually outweighs the cost, but model it against your origin egress and request pricing before enabling it broadly.

Why is my hit ratio low even with a long TTL?

The most common causes are a Set-Cookie header making the response uncacheable, an over-broad Vary header fragmenting the cache key, or query-string parameters not being normalized. Strip cookies from cacheable responses, minimize Vary, and normalize the cache key so semantically identical URLs share one entry.

How do I keep all tiers coherent after publishing content?

Use tag-based or surrogate-key purging that propagates to every tier, including any persistent tier, rather than relying on TTL expiry. Then pre-warm the affected URLs so the upper tiers repopulate before user traffic arrives, avoiding a cold-tier latency spike.
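The purge-then-pre-warm sequence can be modeled in memory to make the coherence requirement concrete. `TaggedTier` and `purgeEveryTier` are hypothetical names; a real CDN performs the fan-out for you when you call its purge-by-tag API, and this sketch only illustrates why every tier must be included.

```typescript
// In-memory stand-in for one cache tier that indexes entries by tag.
class TaggedTier {
  private bodies = new Map<string, string>();
  private urlsByTag = new Map<string, Set<string>>();

  put(url: string, body: string, tags: string[]): void {
    this.bodies.set(url, body);
    for (const tag of tags) {
      const urls = this.urlsByTag.get(tag) ?? new Set<string>();
      urls.add(url);
      this.urlsByTag.set(tag, urls);
    }
  }

  has(url: string): boolean {
    return this.bodies.has(url);
  }

  // Removes every entry carrying the tag and returns the purged URLs.
  // Simplification: other tags' indexes may still list a purged URL.
  purgeTag(tag: string): string[] {
    const tagged = this.urlsByTag.get(tag);
    const urls = tagged ? Array.from(tagged) : [];
    for (const url of urls) this.bodies.delete(url);
    this.urlsByTag.delete(tag);
    return urls;
  }
}

// Purges the tag from every tier and returns the distinct affected URLs,
// which can then be handed to a pre-warm step.
function purgeEveryTier(tiers: TaggedTier[], tag: string): string[] {
  const affected = new Set<string>();
  for (const tier of tiers) {
    for (const url of tier.purgeTag(tag)) affected.add(url);
  }
  return Array.from(affected).sort();
}
```

If any tier is left out of the list passed to `purgeEveryTier`, its copy survives the purge and can refill the tiers in front of it with stale content.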