Streaming Responses: Vercel Edge vs Cloudflare Workers

You want to stream tokens from an LLM, push Server-Sent Events, or transform a large upstream body without buffering it — and the response cuts off mid-stream, arrives all at once, or hits a hard timeout. Streaming is where Vercel Edge Runtime and Cloudflare Workers look most alike (both implement the WHATWG Streams API) and behave most differently (the wall-clock budget that ends a stream is not the same number on each).

This guide is part of Vercel Edge Runtime vs Cloudflare Workers. It builds a TransformStream pipeline and an SSE endpoint that run unchanged on both V8 isolate runtimes, then maps the limits that decide how long a stream can stay open.

Root cause: streaming is free, holding the connection open is not

Returning a ReadableStream as a Response body lets the runtime flush bytes to the client as you produce them, instead of buffering the whole payload in the 128 MB isolate. That is the same on both platforms. The constraint that bites is how long the runtime lets the response stay open:

Vercel Edge caps a streaming response at roughly 25 seconds before it terminates the connection. That is generous for SSE but finite — a stream that idles past it is killed.
Cloudflare Workers has no fixed wall-clock cap on a streaming response; the connection stays open as long as the client reads and you keep writing, bounded instead by subrequest and CPU limits. CPU time only accrues while your code runs, not while you await a write.

A second difference: on Cloudflare you should attach long-lived stream pumping to ctx.waitUntil so the runtime does not consider the request finished while the body is still being produced. Vercel keeps the invocation alive for the duration of the response automatically.

A TransformStream rewrites chunks in flight; the isolate never holds the full body in memory.

Step 1: Stream a generated body on Vercel Edge

The simplest stream uses TransformStream: write into the writable side, return the readable side as the body. This pattern is identical on both runtimes.

// app/api/stream/route.ts — Vercel Edge
export const runtime = 'edge';

export async function GET() {
  const { readable, writable } = new TransformStream();
  const writer = writable.getWriter();
  const encoder = new TextEncoder();

  (async () => {
    for (let i = 0; i < 5; i++) {
      await writer.write(encoder.encode(`chunk ${i}\n`));
      await new Promise((r) => setTimeout(r, 200)); // simulate work
    }
    await writer.close();
  })();

  return new Response(readable, {
    headers: { 'content-type': 'text/plain; charset=utf-8' },
  });
}

The async IIFE pumps the writer without blocking the return. Vercel keeps the invocation alive until writer.close() resolves or the ~25 s cap is hit.
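One thing the detached pump above leaves open is what happens when a write fails, for example because the client disconnected. The following is a minimal sketch, not part of either runtime's API: `streamFrom` and `Produce` are hypothetical names introduced here. It wraps the producer in a try/catch and aborts the writer on failure, so the error does not surface as an unhandled rejection.

```typescript
// Hypothetical helper (not a Vercel or Cloudflare API): runs a producer as a
// detached pump and makes sure a failed write never becomes an unhandled rejection.
type Produce = (write: (text: string) => Promise<void>) => Promise<void>;

function streamFrom(produce: Produce): {
  readable: ReadableStream<Uint8Array>;
  done: Promise<void>;
} {
  const { readable, writable } = new TransformStream<Uint8Array, Uint8Array>();
  const writer = writable.getWriter();
  const encoder = new TextEncoder();

  // `done` always resolves; errors are handled inside the pump.
  const done = (async () => {
    try {
      await produce((text) => writer.write(encoder.encode(text)));
      await writer.close();
    } catch (err) {
      // A rejected write usually means the consumer went away. Abort so the
      // readable side errors instead of hanging, and ignore abort failures.
      await writer.abort(err).catch(() => {});
    }
  })();

  return { readable, done };
}
```

A route would return `new Response(readable, ...)` immediately and leave `done` running in the background. On Cloudflare, `done` is the promise you would hand to ctx.waitUntil.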

Step 2: Stream the same body on Cloudflare Workers

The pump logic is the same. The difference: hand the pumping promise to ctx.waitUntil so the runtime does not treat the request as finished while bytes are still flowing.

// src/worker.ts — Cloudflare Workers
export default {
  async fetch(req: Request, env: unknown, ctx: ExecutionContext): Promise<Response> {
    const { readable, writable } = new TransformStream();
    const writer = writable.getWriter();
    const encoder = new TextEncoder();

    const pump = (async () => {
      for (let i = 0; i < 5; i++) {
        await writer.write(encoder.encode(`chunk ${i}\n`));
        await new Promise((r) => setTimeout(r, 200));
      }
      await writer.close();
    })();

    ctx.waitUntil(pump); // keep the worker alive until the stream finishes

    return new Response(readable, {
      headers: { 'content-type': 'text/plain; charset=utf-8' },
    });
  },
};
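Because the pump is identical in Steps 1 and 2, it can live in one shared function, with only the waitUntil call differing per platform. A sketch under stated assumptions: `writeChunks`, `chunkedResponse`, and the `WaitUntilContext` interface are hypothetical names, and the interface models only the single `waitUntil` method used here rather than the full Cloudflare `ExecutionContext` type.

```typescript
// Assumption: only waitUntil is needed from the Cloudflare execution context.
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void;
}

// Shared pump: writes `count` chunks with a pause between them, then closes.
async function writeChunks(
  writable: WritableStream<Uint8Array>,
  count: number,
  delayMs: number,
): Promise<void> {
  const writer = writable.getWriter();
  const encoder = new TextEncoder();
  for (let i = 0; i < count; i++) {
    await writer.write(encoder.encode(`chunk ${i}\n`));
    await new Promise((r) => setTimeout(r, delayMs)); // simulate work
  }
  await writer.close();
}

// Pass `ctx` on Cloudflare; omit it on Vercel Edge.
function chunkedResponse(ctx?: WaitUntilContext, count = 5, delayMs = 200): Response {
  const { readable, writable } = new TransformStream<Uint8Array, Uint8Array>();
  const pump = writeChunks(writable, count, delayMs);

  if (ctx) ctx.waitUntil(pump);
  else pump.catch(() => {}); // detached: avoid an unhandled rejection on disconnect

  return new Response(readable, {
    headers: { 'content-type': 'text/plain; charset=utf-8' },
  });
}
```

The Vercel route would then reduce to `return chunkedResponse()` and the Worker's fetch handler to `return chunkedResponse(ctx)`.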

Step 3: Transform an upstream body without buffering it

The high-value pattern is rewriting a streamed upstream response in flight — useful for redacting fields, injecting markup, or reframing LLM tokens. Pipe upstream.body through a TransformStream and never materialize the whole payload.

// uppercase.ts — runtime-agnostic streaming transform
export async function streamUppercase(upstreamUrl: string): Promise<Response> {
  const upstream = await fetch(upstreamUrl);
  if (!upstream.body) return new Response('no body', { status: 502 });

  const decoder = new TextDecoder();
  const encoder = new TextEncoder();

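  const transform = new TransformStream<Uint8Array, Uint8Array>({
    transform(chunk, controller) {
      // stream: true holds back an incomplete multi-byte sequence until the next chunk
      const text = decoder.decode(chunk, { stream: true });
      controller.enqueue(encoder.encode(text.toUpperCase()));
    },
    flush(controller) {
      // emit whatever the decoder is still holding when upstream ends
      const tail = decoder.decode();
      if (tail) controller.enqueue(encoder.encode(tail.toUpperCase()));
    },
  });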

  return new Response(upstream.body.pipeThrough(transform), {
    headers: { 'content-type': upstream.headers.get('content-type') ?? 'text/plain' },
  });
}

pipeThrough wires backpressure automatically: if the client reads slowly, the runtime pauses pulling from upstream. You get flow control for free as long as you do not buffer chunks yourself.
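Reframing is the case where chunk boundaries matter: network chunks rarely line up with the records you care about. The sketch below splits a byte stream into complete lines while holding only the current partial line in memory. `splitLines` is a hypothetical helper name, and newline-delimited input is an assumption made for illustration.

```typescript
// Hypothetical helper: turns arbitrary byte chunks into complete text lines.
// Only the unfinished tail of the current line is kept between chunks.
function splitLines(): TransformStream<Uint8Array, string> {
  const decoder = new TextDecoder();
  let pending = '';

  return new TransformStream<Uint8Array, string>({
    transform(chunk, controller) {
      pending += decoder.decode(chunk, { stream: true });
      const lines = pending.split('\n');
      pending = lines.pop() ?? ''; // last element is the incomplete line
      for (const line of lines) controller.enqueue(line);
    },
    flush(controller) {
      // Upstream ended: emit any trailing line that had no final newline.
      pending += decoder.decode();
      if (pending) controller.enqueue(pending);
    },
  });
}
```

Used as `upstream.body.pipeThrough(splitLines())`, each chunk downstream is one whole line, and backpressure still propagates because nothing beyond the partial line is buffered.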

Step 4: Emit Server-Sent Events

SSE is a stream with a specific framing (data: lines, double-newline terminators) and content type. Build it on the same TransformStream primitive.

// sse.ts — runtime-agnostic SSE source
export function sseResponse(): Response {
  const { readable, writable } = new TransformStream();
  const writer = writable.getWriter();
  const enc = new TextEncoder();

  const send = (event: string, data: unknown) =>
    writer.write(enc.encode(`event: ${event}\ndata: ${JSON.stringify(data)}\n\n`));

  (async () => {
    for (let i = 0; i < 10; i++) {
      await send('tick', { n: i, ts: Date.now() });
      await new Promise((r) => setTimeout(r, 1000));
    }
    await send('done', {});
    await writer.close();
  })();

  return new Response(readable, {
    headers: {
      'content-type': 'text/event-stream',
      'cache-control': 'no-cache, no-transform',
      connection: 'keep-alive',
    },
  });
}

no-transform matters: it tells intermediaries not to modify the payload, for example by compressing it, which usually forces them to buffer the stream and defeats the point.
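The `send` helper above is safe because JSON.stringify never emits a raw newline. If you send plain text instead, each line of the payload needs its own data: prefix, otherwise the frame is cut short at the first newline. A minimal sketch of a frame formatter; `formatSseEvent` and the `SseEvent` shape are hypothetical names introduced for illustration.

```typescript
// Hypothetical shape for one event; only `data` is required.
interface SseEvent {
  event?: string;
  id?: string;
  data: string;
}

// Serializes one event into SSE wire format, prefixing every payload line
// with "data: " and ending the frame with a blank line.
function formatSseEvent({ event, id, data }: SseEvent): string {
  const lines: string[] = [];
  if (event) lines.push(`event: ${event}`);
  if (id) lines.push(`id: ${id}`);
  for (const line of data.split(/\r\n|\r|\n/)) lines.push(`data: ${line}`);
  return lines.join('\n') + '\n\n';
}
```

With this in place, the writer call becomes `writer.write(enc.encode(formatSseEvent({ event: 'tick', data: text })))`. The client reassembles multiple data: lines into one message joined by newlines.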

Configuration

Vercel needs the edge runtime declared on the route. Cloudflare needs nothing special for streaming beyond a current compatibility date.

// Vercel: declare edge runtime on the route segment
export const runtime = 'edge';

// wrangler.jsonc — Cloudflare Workers
{
  "name": "stream-edge",
  "main": "src/worker.ts",
  "compatibility_date": "2026-01-01"
}

Local vs production divergence

Concern	Local dev	Production
Streaming time cap (Vercel)	unlimited under `next dev`	~25 s for streaming responses
Streaming time cap (Cloudflare)	unlimited under `wrangler dev`	no fixed cap; bounded by subrequest/CPU limits
Chunk flushing	Node may buffer the whole body before sending	flushed per chunk at the PoP
`ctx.waitUntil` (Cloudflare)	works, but the request may not be reaped early, so a missing call can go unnoticed	required to keep a long stream alive
Proxy buffering	none locally	a CDN may buffer without `no-transform`
Backpressure	rarely exercised on fast loopback	real across the network — slow clients pause pulls

The trap: SSE that works perfectly in next dev and wrangler dev arrives all-at-once in production because an upstream proxy buffers it. Set cache-control: no-cache, no-transform and avoid compressing event streams.

Validation with Vitest

Test the transform by reading the readable side to completion and asserting on the assembled output. No live runtime needed.

// uppercase.test.ts
import { describe, it, expect, vi } from 'vitest';
import { streamUppercase } from './uppercase';

function bodyOf(text: string): ReadableStream<Uint8Array> {
  const enc = new TextEncoder();
  return new ReadableStream({
    start(c) {
      c.enqueue(enc.encode(text.slice(0, 3)));
      c.enqueue(enc.encode(text.slice(3)));
      c.close();
    },
  });
}

describe('streamUppercase', () => {
  it('uppercases a chunked upstream body', async () => {
    vi.stubGlobal('fetch', async () =>
      new Response(bodyOf('hello world'), { headers: { 'content-type': 'text/plain' } }),
    );
    const res = await streamUppercase('https://upstream.test/x');
    expect(await res.text()).toBe('HELLO WORLD');
  });
});
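Asserting on the assembled text passes even if the transform accidentally buffers everything into one chunk. To check that output really arrives in pieces, a test can record each chunk separately. This is a sketch; `collectChunks` is a hypothetical helper, and it relies on in-memory streams delivering one read per enqueued chunk, which holds in a unit test but is not something a network hop preserves.

```typescript
// Hypothetical test helper: reads a stream to completion and returns the
// decoded text of each chunk separately, so a test can assert on chunking.
async function collectChunks(stream: ReadableStream<Uint8Array>): Promise<string[]> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  const chunks: string[] = [];
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(decoder.decode(value, { stream: true }));
  }
  return chunks;
}
```

In the Vitest case above, `await collectChunks(res.body!)` in place of `res.text()` would let you assert that more than one chunk came through.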

Named pitfalls

Awaiting the pump before returning. Blocking on the writer loop buffers everything and defeats streaming. Fix: run the pump in a detached async function; return the readable immediately.
Forgetting ctx.waitUntil on Cloudflare. A long stream can be cut short when the runtime reaps the request. Fix: pass the pump promise to ctx.waitUntil.
No no-transform on SSE. A buffering proxy collapses the stream into one delayed payload. Fix: set cache-control: no-cache, no-transform.
Ignoring Vercel’s ~25 s streaming cap. Long idle streams are killed. Fix: send periodic keep-alive comments (:\n\n) or move very long streams to a durable connection model.
Building chunks with Buffer. It does not exist at the edge. Fix: use TextEncoder/Uint8Array (see replacing Node Buffer with Uint8Array).
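The keep-alive fix from the pitfalls above can be sketched as a small timer around the writer. SSE lines that start with a colon are comments, and EventSource clients ignore them, so they keep bytes flowing without producing events. `startKeepAlive` is a hypothetical helper, and the 15-second default interval is an assumption, not a documented value for either platform.

```typescript
// Hypothetical helper: writes an SSE comment frame on a fixed interval.
// Returns a function that stops the timer; call it before closing the writer.
function startKeepAlive(
  writer: WritableStreamDefaultWriter<Uint8Array>,
  intervalMs = 15_000, // assumption: pick a value well below any idle timeout
): () => void {
  const encoder = new TextEncoder();
  const timer = setInterval(() => {
    // If the write fails (client gone, stream closed), stop pinging.
    writer.write(encoder.encode(': keep-alive\n\n')).catch(() => clearInterval(timer));
  }, intervalMs);
  return () => clearInterval(timer);
}
```

In an SSE source like the one in Step 4, you would call `const stop = startKeepAlive(writer)` before the event loop and `stop()` just before `writer.close()`.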

Production deployment checklist

Stream pump runs detached; the `Response` returns before the body completes
Cloudflare passes the pump promise to `ctx.waitUntil`
SSE responses set `content-type: text/event-stream` and `cache-control: no-cache, no-transform`
Vercel routes declare `export const runtime = 'edge'`
Long streams send keep-alive frames to survive the Vercel ~25 s cap
Transforms use `pipeThrough` so backpressure is honored
No `Buffer` usage; encoding uses `TextEncoder`/`Uint8Array`
Transform tested end-to-end against a chunked upstream body in CI

Frequently Asked Questions

How long can a streaming response stay open on each platform?

Vercel Edge caps streaming responses at roughly 25 seconds. Cloudflare Workers has no fixed wall-clock cap on a stream — it stays open while the client reads and you keep writing, bounded by subrequest and CPU limits rather than a single timeout.

Why does my SSE stream arrive all at once in production but not locally?

An intermediary proxy or CDN is buffering it. Set cache-control: no-cache, no-transform on the response and do not compress text/event-stream. Local dev has no intermediary in the request path, which hides the problem.

Do I need ctx.waitUntil to stream on Cloudflare?

For long-lived streams, yes. Pass the pumping promise to ctx.waitUntil so the runtime does not consider the request finished while the body is still being produced. Vercel keeps the invocation alive for the response duration automatically.

Does streaming buffer the whole body in the isolate?

No, if you stream correctly. Returning a ReadableStream and using pipeThrough flushes chunks as they are produced and applies backpressure, so the 128 MB isolate never holds the full payload. Buffering happens only if you accumulate chunks yourself before writing.