Per-IP Rate Limiting with Cloudflare KV

You want to slow down scrapers and runaway clients without adding a round-trip to a single-region actor on every request. A counter in a KV store reads from a local Point of Presence (PoP) replica in single-digit milliseconds, which makes it the cheapest way to put a soft ceiling on per-IP traffic. The catch is that KV is eventually consistent, so the limit it enforces is approximate. This guide is part of Rate Limiting and Abuse Prevention at the Edge, and it shows how to build the limiter, quantify its inaccuracy, and recognize when you must graduate to Durable Objects.

The constraint: KV is fast but eventually consistent

Workers KV is a read-optimized, globally replicated key-value store. Reads are served from a cache in the PoP that runs your Worker, so they are fast but may return a stale value. Writes are durable but reach other PoPs only eventually: Cloudflare documents that a change can take up to 60 seconds or more to become visible outside the location that wrote it. There is no atomic increment primitive: you read a value, add one in your code, and write it back. That read-modify-write is not transactional, so two requests for the same IP, whether concurrent in one PoP or landing in two different PoPs, can both read count = 4 and both write 5, and one increment is lost.
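A minimal sketch makes the lost update concrete. It assumes a hypothetical in-memory store (`AsyncMapStore`) standing in for a KV namespace; `incrementNonAtomic` and `raceIncrements` are illustrative names, not Workers APIs. Because every concurrent call reads the counter before any call writes it back, the stored value advances by one no matter how many requests race.

```typescript
// Hypothetical in-memory stand-in for a KV namespace (not the Workers KV API).
// get() reads the value at the moment it is called, like a PoP-local cached read.
class AsyncMapStore {
  private readonly data = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }

  async put(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
}

// Non-atomic read-modify-write: read, add one in code, write back.
async function incrementNonAtomic(store: AsyncMapStore, key: string): Promise<number> {
  const current = parseInt((await store.get(key)) ?? "0", 10);
  const next = current + 1;
  await store.put(key, String(next));
  return next;
}

// Start `n` increments at once: every call reads before any call writes.
async function raceIncrements(n: number): Promise<{ returned: number[]; stored: string | null }> {
  const store = new AsyncMapStore();
  const key = "rl:203.0.113.7:1";
  const returned = await Promise.all(
    Array.from({ length: n }, () => incrementNonAtomic(store, key)),
  );
  return { returned, stored: await store.get(key) };
}
```

With five racers, every call returns 1 and the store ends at "1": four increments are lost. Cross-PoP staleness widens the same gap, since each replica can keep serving an old value after another PoP has written.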

The practical consequence: a KV-backed limiter enforces a soft limit. Under bursty or geographically-spread traffic a client can exceed the nominal limit by a margin that grows with concurrency. That is acceptable for coarse abuse mitigation (scraper slowdown, accidental retry storms) and unacceptable for anything that must be exact (login throttling, billing quotas) — for those, use the Durable Object approach in Token-bucket rate limiting at the edge.

Without an atomic increment, two PoPs reading the same stale KV value lose one update — the source of KV's approximate limit.
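To put a rough number on "a margin that grows with concurrency", here is a deliberately pessimistic model, an assumption rather than a measurement: requests arrive in batches, and every request in a batch reads the same counter value before any of them writes. `simulateBatchedRaces` is a hypothetical helper for illustration, not part of the limiter.

```typescript
interface RaceOutcome {
  admitted: number;
  denied: number;
  finalCount: number;
}

// Pessimistic model: each batch of racers sees one snapshot of the counter,
// so the whole batch advances the stored count by only one.
function simulateBatchedRaces(limit: number, batchSize: number, totalRequests: number): RaceOutcome {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  let counter = 0; // value every reader in the next batch will see
  let admitted = 0;
  let denied = 0;

  for (let sent = 0; sent < totalRequests; sent += batchSize) {
    const inBatch = Math.min(batchSize, totalRequests - sent);
    const count = counter + 1; // all racers compute the same increment

    if (count > limit) {
      denied += inBatch;
    } else {
      admitted += inBatch;
      counter = count; // all racers write the same value; one increment lands
    }
  }
  return { admitted, denied, finalCount: counter };
}
```

`simulateBatchedRaces(100, 1, 300)` admits exactly 100, while `simulateBatchedRaces(100, 4, 600)` admits 400 even though the stored counter still reads 100. Under this model the admitted total is about LIMIT times the number of requests that share each stale read. Real traffic rarely races this uniformly, so treat it as upper-bound intuition, not a prediction.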

Step 1 — Derive a safe per-IP key

Build the counter key from the platform’s connecting-IP header and the current fixed window. Never trust a raw X-Forwarded-For value supplied by the client — it is forgeable, and an attacker who controls it can sidestep the limit by rotating the header.

// key.ts
const WINDOW_SECONDS = 60;
const LIMIT = 100; // requests per window per IP

export function rateLimitKey(request: Request): string {
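  // CF-Connecting-IP is set by Cloudflare's edge, not taken from the client.
  // It may be missing under `wrangler dev`, so local requests can share the "0.0.0.0" bucket.
  const ip = request.headers.get("cf-connecting-ip") ?? "0.0.0.0";
  // Fixed-window index: it changes every WINDOW_SECONDS, so each window gets a fresh key.
  const window = Math.floor(Date.now() / 1000 / WINDOW_SECONDS);
  return `rl:${ip}:${window}`;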
}

Folding the window number into the key gives a clean fixed-window counter: each minute uses a fresh key, and old keys expire on their own. This is the simplest algorithm to run on KV; the rate-limiting overview compares it with sliding-window and token-bucket.
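The window arithmetic is small enough to check by hand. This sketch pulls it into two hypothetical helpers, `windowIndex` and `secondsUntilReset`, which mirror the expressions in key.ts and limiter.ts. One property of fixed windows worth knowing: a client can spend a full LIMIT in the last seconds of one window and another full LIMIT at the start of the next, so a short burst of up to twice LIMIT is possible even with an exact counter.

```typescript
const WINDOW_SECONDS = 60;

// Index of the fixed window containing `nowMs` (milliseconds since the epoch).
// The same formula is folded into the counter key, so a new index means a new key.
function windowIndex(nowMs: number, windowSeconds: number = WINDOW_SECONDS): number {
  return Math.floor(nowMs / 1000 / windowSeconds);
}

// Whole seconds until the current window ends; always in 1..windowSeconds.
function secondsUntilReset(nowMs: number, windowSeconds: number = WINDOW_SECONDS): number {
  const nowSeconds = Math.floor(nowMs / 1000);
  return windowSeconds - (nowSeconds % windowSeconds);
}
```

At 119.5 s, `secondsUntilReset` returns 1; at 120 s the index moves from 1 to 2, so the next request reads and writes a fresh key.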

Step 2 — Read, increment, and write with a TTL

Read the current count, increment, and write back with expirationTtl set to the window length so the key self-cleans. Setting a TTL is mandatory — without it, every IP and window pair would persist forever and bloat the namespace.
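One constraint shapes this choice: Cloudflare documents a minimum `expirationTtl` of 60 seconds, so reusing the window length as the TTL only works for windows of a minute or longer. A minimal sketch of a hypothetical `counterPutOptions` helper that clamps the TTL for shorter windows:

```typescript
// Cloudflare documents 60 seconds as the smallest accepted expirationTtl.
const KV_MIN_TTL_SECONDS = 60;

interface CounterPutOptions {
  expirationTtl: number;
}

// For windows under a minute the key outlives its window. That is harmless:
// the window index is part of the key, so an old window's key is never read again.
function counterPutOptions(windowSeconds: number): CounterPutOptions {
  if (!Number.isInteger(windowSeconds) || windowSeconds <= 0) {
    throw new RangeError("windowSeconds must be a positive integer");
  }
  return { expirationTtl: Math.max(KV_MIN_TTL_SECONDS, windowSeconds) };
}
```

The returned object can be passed as the third argument to `kv.put`, in place of the literal `{ expirationTtl: WINDOW_SECONDS }`.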

// limiter.ts
import { rateLimitKey } from "./key";

const WINDOW_SECONDS = 60;
const LIMIT = 100;

export interface KvLimitResult {
  allowed: boolean;
  count: number;
  retryAfterSeconds: number;
}

export async function checkLimit(
  request: Request,
  kv: KVNamespace,
): Promise<KvLimitResult> {
  const key = rateLimitKey(request);
  const current = parseInt((await kv.get(key)) ?? "0", 10);
  const count = current + 1;

  if (count > LIMIT) {
    const retryAfterSeconds = WINDOW_SECONDS - (Math.floor(Date.now() / 1000) % WINDOW_SECONDS);
    return { allowed: false, count, retryAfterSeconds };
  }

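  // Refresh the TTL on every write, so the key expires one window after its
  // last write. The window index in the key already stops it being read later.
  // Workers KV allows roughly one write per second to a single key, so a burst
  // from one IP can make put() reject. For a soft limit, treat that as non-fatal.
  try {
    await kv.put(key, String(count), { expirationTtl: WINDOW_SECONDS });
  } catch {
    // Still allow the request; the dropped write only widens the soft ceiling.
  }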
  return { allowed: true, count, retryAfterSeconds: 0 };
}

This is the non-atomic read-modify-write described above. It is correct enough for coarse limiting; just remember the enforced ceiling can drift above LIMIT under concurrency.

Step 3 — Wire the limiter into the Worker

Call the limiter at the front of the request handler and short-circuit with a 429 when denied. Rejecting here is an early-return guard, so blocked requests never reach origin.

// worker.ts
import { checkLimit } from "./limiter";

interface Env {
  RATE_LIMIT: KVNamespace;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const result = await checkLimit(request, env.RATE_LIMIT);

    if (!result.allowed) {
      return new Response(JSON.stringify({ error: "rate_limited" }), {
        status: 429,
        headers: {
          "content-type": "application/json",
          "retry-after": String(result.retryAfterSeconds),
          "x-ratelimit-limit": "100",
          "x-ratelimit-remaining": "0",
        },
      });
    }

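    // Placeholder success body. In front of an origin you would forward the
    // request with fetch(request) and set this header on the upstream response.
    return new Response("ok", {
      headers: { "x-ratelimit-remaining": String(Math.max(0, 100 - result.count)) },
    });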
  },
};
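The handler above hard-codes the limit of 100 in two places. A minimal sketch of a hypothetical `rateLimitHeaders` helper that builds all rate-limit headers from one place; `LimitState` is an illustrative type whose fields match the limiter's result. Keep in mind that `x-ratelimit-remaining` is only as accurate as the approximate KV count behind it.

```typescript
interface LimitState {
  limit: number;
  count: number; // this request's position in the window (1-based)
  retryAfterSeconds: number; // 0 when the request is allowed
}

// Returns a plain header record that can be passed as ResponseInit.headers
// or spread alongside other headers such as content-type.
function rateLimitHeaders(state: LimitState): Record<string, string> {
  const headers: Record<string, string> = {
    "x-ratelimit-limit": String(state.limit),
    "x-ratelimit-remaining": String(Math.max(0, state.limit - state.count)),
  };
  if (state.retryAfterSeconds > 0) {
    headers["retry-after"] = String(state.retryAfterSeconds);
  }
  return headers;
}
```

In the denied branch this could replace the literal header values, for example `{ "content-type": "application/json", ...rateLimitHeaders({ limit: 100, count: result.count, retryAfterSeconds: result.retryAfterSeconds }) }`.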

Step 4 — Configure wrangler

Bind the KV namespace. Create the namespace once with npx wrangler kv namespace create RATE_LIMIT and paste the returned id.

// wrangler.jsonc
{
  "name": "edge-ip-limiter",
  "main": "src/worker.ts",
  "compatibility_date": "2026-06-01",
  "kv_namespaces": [
    // Paste the id printed by `npx wrangler kv namespace create RATE_LIMIT`.
    { "binding": "RATE_LIMIT", "id": "<your-namespace-id>" }
  ]
}

Local vs production divergence

Behavior	`wrangler dev` (local)	Production
KV consistency	Immediate, in-process	Eventually consistent across PoPs (up to 60 s or more)
Lost-update races	Rare (single process)	Real under concurrency / multi-PoP traffic
`cf-connecting-ip`	Often `127.0.0.1` or absent	Real client IP set by the edge
TTL expiry	Honored locally	Honored, with slight propagation delay
Read latency	Near-zero	Single-digit ms from PoP cache, but value may be stale

The most dangerous divergence is consistency: local testing makes the limiter look exact because there is one process. In production the approximation surfaces only under concurrency, so explicitly load-test it.
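A load test does not need special tooling to reveal the over-count. This is a minimal burst probe, a sketch rather than a load-testing tool: `burst`, `Fetcher`, and `BurstSummary` are hypothetical names, and the fetcher is passed in so the tallying logic can be checked without a network.

```typescript
// Anything that takes a URL and resolves to something with an HTTP status.
type Fetcher = (url: string) => Promise<{ status: number }>;

interface BurstSummary {
  ok: number;
  limited: number;
  other: number;
}

async function burst(url: string, n: number, fetcher: Fetcher): Promise<BurstSummary> {
  // Start all n requests before awaiting any, so they overlap in flight.
  const responses = await Promise.all(Array.from({ length: n }, () => fetcher(url)));
  const summary: BurstSummary = { ok: 0, limited: 0, other: 0 };
  for (const { status } of responses) {
    if (status >= 200 && status < 300) summary.ok++;
    else if (status === 429) summary.limited++;
    else summary.other++;
  }
  return summary;
}
```

Against a deployment you might run `burst("https://edge-ip-limiter.<your-subdomain>.workers.dev/", 300, fetch).then(console.log)`; an `ok` count above 100 within one window is the over-count. A single machine sends from one IP and usually reaches one PoP, so this probes in-PoP races; reproducing cross-PoP drift needs clients in several regions.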

Step 5 — Validate with Vitest

Use an in-memory KV stub to test the counting logic deterministically. This proves the boundary at LIMIT and the per-window reset without needing the real eventual-consistency behavior.

// limiter.test.ts
import { describe, it, expect, vi } from "vitest";
import { checkLimit } from "./limiter";

function fakeKv(): KVNamespace {
  const store = new Map<string, string>();
  return {
    get: async (k: string) => store.get(k) ?? null,
    put: async (k: string, v: string) => void store.set(k, v),
    delete: async (k: string) => void store.delete(k),
    list: async () => ({ keys: [], list_complete: true, cacheStatus: null }),
    getWithMetadata: async () => ({ value: null, metadata: null, cacheStatus: null }),
  } as unknown as KVNamespace;
}

function reqFromIp(ip: string): Request {
  return new Request("https://x/", { headers: { "cf-connecting-ip": ip } });
}

describe("per-IP KV limiter", () => {
  it("allows up to the limit then rejects", async () => {
    vi.spyOn(Date, "now").mockReturnValue(60_000); // fixed window
    const kv = fakeKv();
    const req = reqFromIp("203.0.113.7");

    let last;
    for (let i = 0; i < 101; i++) {
      last = await checkLimit(req, kv);
    }
    expect(last!.allowed).toBe(false); // 101st request denied
    expect(last!.retryAfterSeconds).toBeGreaterThan(0);
    vi.restoreAllMocks();
  });

  it("isolates counts per IP", async () => {
    vi.spyOn(Date, "now").mockReturnValue(60_000);
    const kv = fakeKv();
    const a = await checkLimit(reqFromIp("198.51.100.1"), kv);
    const b = await checkLimit(reqFromIp("198.51.100.2"), kv);
    expect(a.count).toBe(1);
    expect(b.count).toBe(1); // independent buckets
    vi.restoreAllMocks();
  });

  it("resets when the window rolls over", async () => {
    const now = vi.spyOn(Date, "now").mockReturnValue(60_000);
    const kv = fakeKv();
    const req = reqFromIp("203.0.113.9");
    await checkLimit(req, kv);
    now.mockReturnValue(120_000); // next window → new key
    const next = await checkLimit(req, kv);
    expect(next.count).toBe(1);
    vi.restoreAllMocks();
  });
});

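Run with npx vitest run. These tests call checkLimit one at a time, so they never exercise a race. Firing concurrent calls at the same stub (for example with Promise.all) can reproduce the in-process lost update, because each call reads before the others write. What no unit test can reproduce is replication lag between PoPs; that behavior is inherent to distributed KV and must be confirmed with a production load test.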

When to switch to Durable Objects

KV is the right tool until precision becomes a requirement. Move to a Durable Object when any of these hold:

The limit guards a security boundary (login attempts, OTP requests, password reset).
The limit meters billable usage and over-counting costs money.
Tight limits (single-digit requests per second) make KV’s drift a meaningful fraction of the budget.
You need exact, per-request X-RateLimit-Remaining values clients can rely on.

For loose limits where occasional over-count is harmless, KV’s speed and zero cross-region round-trip win.

Pitfalls

Trusting X-Forwarded-For. Key on CF-Connecting-IP; the forwarded header is client-controlled, so a client can rotate it to bypass the limit.
Omitting expirationTtl. Without a TTL, counter keys live forever and the namespace grows without bound.
Treating the limit as exact. KV under-counts concurrent writes; document the limit as a soft ceiling and size it with headroom.
Shared-NAT false positives. Many users behind one IP share a bucket; prefer authenticated identity for logged-in traffic and reserve IP limits for anonymous routes.
Reading your own write immediately. Other PoPs may keep serving the old value from their cache for a minute or more after a write; never assume a just-written count is visible everywhere.
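For the shared-NAT pitfall, a minimal sketch of a hypothetical `limiterKey` helper that prefers a verified identity and falls back to the connecting IP. It assumes `verifiedSubject` is the `sub` of a token the Worker has already verified; separate `user:` and `ip:` prefixes keep the two kinds of bucket from colliding.

```typescript
// Prefer a verified identity; fall back to the connecting IP for anonymous traffic.
// verifiedSubject must come from a token the Worker has verified, never a raw header.
function limiterKey(
  connectingIp: string | null,
  verifiedSubject: string | null,
  nowMs: number,
  windowSeconds: number = 60,
): string {
  const windowIndex = Math.floor(nowMs / 1000 / windowSeconds);
  if (verifiedSubject) {
    return `rl:user:${verifiedSubject}:${windowIndex}`;
  }
  return `rl:ip:${connectingIp ?? "unknown"}:${windowIndex}`;
}
```

With this split, logged-in users behind one corporate proxy each get their own bucket, while anonymous traffic from that proxy still shares the IP bucket.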

Production deployment checklist

Counter key built from `CF-Connecting-IP`, never raw `X-Forwarded-For`
`expirationTtl` set equal to the window length on every write
`429` responses include an accurate `Retry-After` and short-circuit the chain
Limit documented and sized as approximate (soft ceiling), with headroom
Security- or billing-critical limits moved to Durable Objects instead
Anonymous-only IP limits; logged-in traffic keyed on identity
Behavior load-tested for over-count under real concurrency

Frequently Asked Questions

Why is a KV rate limiter only approximate?

Workers KV is eventually consistent and has no atomic increment. A limiter reads the count, adds one in code, and writes it back, and that read-modify-write is not transactional. Two requests — concurrent in one PoP or in different PoPs reading a stale replica — can both read the same value and both write the same incremented value, losing one count. The enforced limit therefore drifts above the nominal value under concurrency.

When should I move from KV to Durable Objects?

Move when the limit must be exact: login or OTP throttling, billing quotas, or tight single-digit-per-second limits where KV’s drift is a large fraction of the budget. A Durable Object serializes all requests for a key through one instance, giving an exact count at the cost of a round-trip to the object’s home region. For loose limits where occasional over-count is harmless, KV’s lower latency wins.

Do I need a TTL on KV counter keys?

Yes. Set expirationTtl to the window length on every write so each IP-and-window key expires automatically. Without a TTL, every counter key persists indefinitely and the namespace grows without bound, since a new key is created for every window. The per-window reset itself comes from the window number in the key; the TTL only removes keys that are no longer read.

Can I rate-limit by IP when clients share a NAT?

IP keying treats every user behind a shared NAT or corporate proxy as one client, so a tight per-IP limit can block legitimate users. For authenticated traffic, key on a verified JWT sub or API key instead. Reserve IP-based limits for anonymous routes, and size them with enough headroom to tolerate many users behind one address.