Streaming Responses: Vercel Edge vs Cloudflare Workers
You want to stream tokens from an LLM, push Server-Sent Events, or transform a large upstream body without buffering it — and the response cuts off mid-stream, arrives all at once, or hits a hard timeout. Streaming is where Vercel Edge Runtime and Cloudflare Workers look most alike (both implement the WHATWG Streams API) and behave most differently (the wall-clock budget that ends a stream is not the same number on each).
This guide is part of Vercel Edge Runtime vs Cloudflare Workers. It builds a TransformStream pipeline and an SSE endpoint that run unchanged on both V8 isolate runtimes, then maps the limits that decide how long a stream can stay open.
Root cause: streaming is free, holding the connection open is not
Returning a ReadableStream as a Response body lets the runtime flush bytes to the client as you produce them, instead of buffering the whole payload in the 128 MB isolate. That is the same on both platforms. The constraint that bites is how long the runtime lets the response stay open:
- Vercel Edge caps a streaming response at roughly 25 seconds before it terminates the connection. That is generous for SSE but finite — a stream that idles past it is killed.
- Cloudflare Workers has no fixed wall-clock cap on a streaming response; the connection stays open as long as the client reads and you keep writing, bounded instead by subrequest and CPU limits. CPU time only accrues while your code runs, not while you await a write.
A second difference: on Cloudflare you should attach long-lived stream pumping to ctx.waitUntil so the runtime does not consider the request finished while the body is still being produced. Vercel keeps the invocation alive for the duration of the response automatically.
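Both differences sit on top of the same primitive: a Response whose body is a ReadableStream. Here is a minimal sketch of that primitive without any pump. The name countingResponse is hypothetical. The runtime calls pull() as it drains the body, so with the default queuing strategy only about one chunk is produced ahead of the reader.

```typescript
// Hypothetical example: a Response whose body is produced lazily, one chunk per pull.
export function countingResponse(lines: number): Response {
  const encoder = new TextEncoder();
  let next = 0;
  const body = new ReadableStream<Uint8Array>({
    // pull() runs when the consumer (the runtime flushing to the client) wants more data.
    pull(controller) {
      if (next >= lines) {
        controller.close();
        return;
      }
      controller.enqueue(encoder.encode(`line ${next}\n`));
      next++;
    },
  });
  return new Response(body, {
    headers: { 'content-type': 'text/plain; charset=utf-8' },
  });
}
```

Either runtime can return countingResponse(5) from its handler as-is. Production is driven by pull() rather than by a detached loop, so there is no separate pump promise to hand to ctx.waitUntil.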
Step 1: Stream a generated body on Vercel Edge
The simplest stream uses TransformStream: write into the writable side, return the readable side as the body. This pattern is identical on both runtimes.
```typescript
// app/api/stream/route.ts (Vercel Edge)
export const runtime = 'edge';

export async function GET() {
  const { readable, writable } = new TransformStream();
  const writer = writable.getWriter();
  const encoder = new TextEncoder();

  (async () => {
    for (let i = 0; i < 5; i++) {
      await writer.write(encoder.encode(`chunk ${i}\n`));
      await new Promise((r) => setTimeout(r, 200)); // simulate work
    }
    await writer.close();
  })();

  return new Response(readable, {
    headers: { 'content-type': 'text/plain; charset=utf-8' },
  });
}
```
The async IIFE pumps the writer without blocking the return. Vercel keeps the invocation alive until writer.close() resolves or the ~25 s cap is hit.
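The IIFE above has no error handling. If the client disconnects, the readable side is cancelled and the next writer.write() rejects, and nothing observes that rejected promise. Below is a hedged variant that uses a hypothetical pumpedResponse helper. The helper catches the failure, aborts the writer, and also returns the pump promise so callers (or tests) can await it.

```typescript
// Hypothetical variant of the Step 1 pump with error handling (names are illustrative).
export function pumpedResponse(
  chunks: string[],
  delayMs = 200,
): { response: Response; done: Promise<void> } {
  const { readable, writable } = new TransformStream<Uint8Array, Uint8Array>();
  const writer = writable.getWriter();
  const encoder = new TextEncoder();

  // Detached pump: not awaited, so the Response below is returned immediately.
  const done = (async () => {
    try {
      for (const chunk of chunks) {
        await writer.write(encoder.encode(chunk));
        await new Promise((r) => setTimeout(r, delayMs)); // simulate work
      }
      await writer.close();
    } catch (err) {
      // A write rejects once the reader cancels (for example, the client went away).
      // Aborting an already-errored stream resolves, so this settles cleanly.
      await writer.abort(err);
    }
  })();

  return {
    response: new Response(readable, {
      headers: { 'content-type': 'text/plain; charset=utf-8' },
    }),
    done,
  };
}
```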
Step 2: Stream the same body on Cloudflare Workers
The pump logic is the same. The difference: hand the pumping promise to ctx.waitUntil so the runtime does not consider the request finished while bytes are still flowing.
```typescript
// src/worker.ts (Cloudflare Workers)
export default {
  async fetch(req: Request, env: unknown, ctx: ExecutionContext): Promise<Response> {
    const { readable, writable } = new TransformStream();
    const writer = writable.getWriter();
    const encoder = new TextEncoder();

    const pump = (async () => {
      for (let i = 0; i < 5; i++) {
        await writer.write(encoder.encode(`chunk ${i}\n`));
        await new Promise((r) => setTimeout(r, 200));
      }
      await writer.close();
    })();

    ctx.waitUntil(pump); // keep the worker alive until the stream finishes

    return new Response(readable, {
      headers: { 'content-type': 'text/plain; charset=utf-8' },
    });
  },
};
```
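The two handlers differ only in the ctx.waitUntil call, so the plumbing can live in one place. This is a minimal sketch that assumes a hypothetical streamText helper. It also uses a structural WaitUntilLike type as a stand-in for Cloudflare's ExecutionContext, since only waitUntil is needed.

```typescript
// Structural stand-in for Cloudflare's ExecutionContext; only waitUntil is used (assumption).
interface WaitUntilLike {
  waitUntil(promise: Promise<unknown>): void;
}

// Hypothetical helper: `produce` writes text; the helper owns the stream plumbing.
export function streamText(
  produce: (write: (text: string) => Promise<void>) => Promise<void>,
  ctx?: WaitUntilLike,
): Response {
  const { readable, writable } = new TransformStream<Uint8Array, Uint8Array>();
  const writer = writable.getWriter();
  const encoder = new TextEncoder();

  // Detached pump: not awaited, so the Response is returned immediately.
  const pump = produce((text) => writer.write(encoder.encode(text)))
    .then(() => writer.close())
    .catch((err) => writer.abort(err)); // tear the stream down if producing or writing fails

  ctx?.waitUntil(pump); // Cloudflare: pass ctx; Vercel Edge: omit it

  return new Response(readable, {
    headers: { 'content-type': 'text/plain; charset=utf-8' },
  });
}
```

In the Worker you would return streamText(produce, ctx). In the Vercel route you would return streamText(produce) with no context argument.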
Step 3: Transform an upstream body without buffering it
The high-value pattern is rewriting a streamed upstream response in flight — useful for redacting fields, injecting markup, or reframing LLM tokens. Pipe upstream.body through a TransformStream and never materialize the whole payload.
```typescript
// uppercase.ts (runtime-agnostic streaming transform)
export async function streamUppercase(upstreamUrl: string): Promise<Response> {
  const upstream = await fetch(upstreamUrl);
  if (!upstream.body) return new Response('no body', { status: 502 });

  const decoder = new TextDecoder();
  const encoder = new TextEncoder();
  const transform = new TransformStream<Uint8Array, Uint8Array>({
    transform(chunk, controller) {
      // stream: true holds back a multi-byte character split across chunks
      const text = decoder.decode(chunk, { stream: true });
      controller.enqueue(encoder.encode(text.toUpperCase()));
    },
    flush(controller) {
      // Emit anything the decoder still holds when the upstream body ends.
      const tail = decoder.decode();
      if (tail) controller.enqueue(encoder.encode(tail.toUpperCase()));
    },
  });

  return new Response(upstream.body.pipeThrough(transform), {
    headers: { 'content-type': upstream.headers.get('content-type') ?? 'text/plain' },
  });
}
```
pipeThrough wires backpressure automatically: if the client reads slowly, the runtime pauses pulling from upstream. You get flow control for free as long as you do not buffer chunks yourself.
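To watch that flow control happen, the sketch below pipes an endless upstream through an identity TransformStream and counts pulls. The function name observeBackpressure is hypothetical. The exact idle count depends on the streams' queuing strategies, so the check only asserts that it stays small.

```typescript
// Hypothetical demo: count pulls on an endless upstream piped through an identity TransformStream.
export async function observeBackpressure(
  reads: number,
): Promise<{ idle: number; afterReads: number }> {
  let pulls = 0;
  const upstream = new ReadableStream<Uint8Array>({
    // pull() is called only when the stream's internal queue has room.
    pull(controller) {
      pulls++;
      controller.enqueue(new Uint8Array([pulls % 256]));
    },
  });
  const piped = upstream.pipeThrough(new TransformStream<Uint8Array, Uint8Array>());

  // Nobody reads for 50 ms: the pipe stalls once the small internal queues are full.
  await new Promise((r) => setTimeout(r, 50));
  const idle = pulls;

  // A consumer reading `reads` chunks forces at least that many pulls upstream.
  const reader = piped.getReader();
  for (let i = 0; i < reads; i++) await reader.read();
  const afterReads = pulls;

  await reader.cancel(); // cancellation propagates back through the pipe to the upstream source
  return { idle, afterReads };
}
```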
Step 4: Emit Server-Sent Events
SSE is a stream with a specific framing (data: lines, double-newline terminators) and content type. Build it on the same TransformStream primitive.
```typescript
// sse.ts (runtime-agnostic SSE source)
export function sseResponse(): Response {
  const { readable, writable } = new TransformStream();
  const writer = writable.getWriter();
  const enc = new TextEncoder();
  const send = (event: string, data: unknown) =>
    writer.write(enc.encode(`event: ${event}\ndata: ${JSON.stringify(data)}\n\n`));

  (async () => {
    for (let i = 0; i < 10; i++) {
      await send('tick', { n: i, ts: Date.now() });
      await new Promise((r) => setTimeout(r, 1000));
    }
    await send('done', {});
    await writer.close();
  })();

  return new Response(readable, {
    headers: {
      'content-type': 'text/event-stream',
      'cache-control': 'no-cache, no-transform',
      connection: 'keep-alive',
    },
  });
}
```
no-transform is essential. It tells intermediaries such as CDNs not to transform the body, for example by compressing it. That kind of transformation can buffer the stream, which would defeat the point.
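The send helper is safe because JSON.stringify never emits a raw line break. If you send arbitrary text instead, a line break inside the payload would end the data field early. Each line needs its own data: prefix. This is a minimal sketch with a hypothetical formatSseEvent helper. It assumes the event name and id themselves contain no line breaks.

```typescript
// Hypothetical helper: frame one SSE event per the text/event-stream format.
// Assumes `event` and `id` contain no line breaks.
export function formatSseEvent(data: string, event?: string, id?: string): string {
  let frame = '';
  if (event) frame += `event: ${event}\n`;
  if (id) frame += `id: ${id}\n`;
  // Each payload line gets its own `data:` prefix; EventSource rejoins them with "\n".
  for (const line of data.split(/\r\n|\r|\n/)) frame += `data: ${line}\n`;
  return `${frame}\n`; // the blank line dispatches the event
}
```

Inside sseResponse, a call would look like writer.write(enc.encode(formatSseEvent(JSON.stringify(data), 'tick'))).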
Configuration
Vercel needs the edge runtime declared on the route. Cloudflare needs nothing special for streaming beyond a current compatibility date.
```typescript
// Vercel: declare edge runtime on the route segment
export const runtime = 'edge';
```

```jsonc
// wrangler.jsonc (Cloudflare Workers)
{
  "name": "stream-edge",
  "main": "src/worker.ts",
  "compatibility_date": "2026-01-01"
}
```
Local vs production divergence
| Concern | Local dev | Production |
|---|---|---|
| Streaming time cap (Vercel) | unlimited under next dev | ~25 s for streaming responses |
| Streaming time cap (Cloudflare) | unlimited under wrangler dev | no fixed cap; bounded by subrequest/CPU limits |
| Chunk flushing | Node may buffer the whole body before sending | flushed per chunk at the PoP |
| ctx.waitUntil (Cloudflare) | works, but the request may not be reaped early | required to keep a long stream alive |
| Proxy buffering | none locally | a CDN may buffer without no-transform |
| Backpressure | rarely exercised on fast loopback | real across the network; slow clients pause pulls |
The trap: SSE that works perfectly in next dev and wrangler dev arrives all at once in production because an intermediary proxy buffers it. Set cache-control: no-cache, no-transform and avoid compressing event streams.
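One way to catch this before users do is to time chunk arrivals against the deployed route. The sketch below uses a hypothetical chunkArrivalSpreadMs helper. If every chunk lands within a few milliseconds of the last, something between you and the client buffered the stream. The network can re-split chunk boundaries, so treat the chunk count as a rough signal and the spread as the useful number.

```typescript
// Hypothetical smoke check: read a streamed body and record when each chunk arrives.
export async function chunkArrivalSpreadMs(
  res: Response,
): Promise<{ chunks: number; spreadMs: number }> {
  if (!res.body) return { chunks: 0, spreadMs: 0 };
  const reader = res.body.getReader();
  const arrivals: number[] = [];
  for (;;) {
    const { done } = await reader.read();
    if (done) break;
    arrivals.push(Date.now());
  }
  // Time between the first and last chunk; near zero means the body arrived in one burst.
  const spreadMs = arrivals.length > 1 ? arrivals[arrivals.length - 1] - arrivals[0] : 0;
  return { chunks: arrivals.length, spreadMs };
}
```

Pass it the Response from fetch() against your deployed streaming route. For the SSE example, which writes one event per second, a spread close to zero points to buffering.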
Validation with Vitest
Test the transform by reading the readable side to completion and asserting on the assembled output. No live runtime needed.
```typescript
// uppercase.test.ts
import { afterEach, describe, it, expect, vi } from 'vitest';
import { streamUppercase } from './uppercase';

// Build a body that arrives in two chunks, split mid-word.
function bodyOf(text: string): ReadableStream<Uint8Array> {
  const enc = new TextEncoder();
  return new ReadableStream({
    start(c) {
      c.enqueue(enc.encode(text.slice(0, 3)));
      c.enqueue(enc.encode(text.slice(3)));
      c.close();
    },
  });
}

describe('streamUppercase', () => {
  afterEach(() => vi.unstubAllGlobals()); // restore the real fetch between tests

  it('uppercases a chunked upstream body', async () => {
    vi.stubGlobal('fetch', async () =>
      new Response(bodyOf('hello world'), { headers: { 'content-type': 'text/plain' } }),
    );
    const res = await streamUppercase('https://upstream.test/x');
    expect(await res.text()).toBe('HELLO WORLD');
  });
});
```
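The test above splits plain ASCII, so it cannot catch a multi-byte UTF-8 character split across two upstream chunks. It also cannot catch a body that ends mid-character. This standalone sketch needs no Vitest. It uses a hypothetical uppercaseChunks helper that mirrors Step 3's transform, including a flush step, and feeds it in-memory chunks.

```typescript
// Hypothetical helper mirroring Step 3's transform, fed from an in-memory list of chunks.
export async function uppercaseChunks(chunks: Uint8Array[]): Promise<string> {
  const decoder = new TextDecoder();
  const encoder = new TextEncoder();
  const transform = new TransformStream<Uint8Array, Uint8Array>({
    transform(chunk, controller) {
      // stream: true holds back the leading bytes of a character split across chunks
      const text = decoder.decode(chunk, { stream: true });
      if (text) controller.enqueue(encoder.encode(text.toUpperCase()));
    },
    flush(controller) {
      // A body that ends mid-character leaves bytes behind; decode() emits U+FFFD for them.
      const tail = decoder.decode();
      if (tail) controller.enqueue(encoder.encode(tail.toUpperCase()));
    },
  });
  const body = new ReadableStream<Uint8Array>({
    start(controller) {
      for (const chunk of chunks) controller.enqueue(chunk);
      controller.close();
    },
  });
  return new Response(body.pipeThrough(transform)).text();
}
```

In "caf\u00e9 ok", the \u00e9 is encoded as two bytes at offsets 3 and 4. Splitting after byte 4 therefore cuts the character in half.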
Named pitfalls
- Awaiting the pump before returning. Blocking on the writer loop defeats streaming, because nothing is returned until the loop finishes. With a TransformStream it can be worse: writes wait for a reader that is not attached yet, so the request can stall outright. Fix: run the pump in a detached async function; return the readable immediately.
- Forgetting ctx.waitUntil on Cloudflare. A long stream can be cut short when the runtime reaps the request. Fix: pass the pump promise to ctx.waitUntil.
- No no-transform on SSE. A buffering proxy collapses the stream into one delayed payload. Fix: set cache-control: no-cache, no-transform.
- Ignoring Vercel's ~25 s streaming cap. Long idle streams are killed. Fix: send periodic keep-alive comments (:\n\n) or move very long streams to a durable connection model.
- Building chunks with Buffer. It does not exist at the edge. Fix: use TextEncoder/Uint8Array (see replacing Node Buffer with Uint8Array).
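The keep-alive fix above can be sketched as a hypothetical withHeartbeat helper that writes an SSE comment line on a timer. EventSource clients ignore comment lines. Whether a heartbeat is enough depends on the limit you are hitting. It can prevent an idle timeout, but it cannot extend a hard wall-clock cap.

```typescript
// Hypothetical helper: write an SSE comment line (":\n\n") every `intervalMs` until stopped.
export function withHeartbeat(
  writer: WritableStreamDefaultWriter<Uint8Array>,
  intervalMs: number,
): () => void {
  const ping = new TextEncoder().encode(':\n\n');
  const timer = setInterval(() => {
    // Writes are queued in order, so a ping never splits an event as long as each
    // event is also written with a single write() call (as send does in Step 4).
    // If the stream has errored (client gone), stop pinging.
    writer.write(ping).catch(() => clearInterval(timer));
  }, intervalMs);
  return () => clearInterval(timer); // call before writer.close()
}
```

In sseResponse you would call const stop = withHeartbeat(writer, 10_000) before the loop, then call stop() before writer.close(). The 10-second interval is an arbitrary example.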
Production deployment checklist
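- Stream pump runs detached; the Response is returned immediately.
- Cloudflare passes the pump promise to ctx.waitUntil.
- SSE responses set content-type: text/event-stream and cache-control: no-cache, no-transform.
- Vercel routes declare export const runtime = 'edge'.
- Transforms use pipeThrough so backpressure is preserved.
- No Buffer usage; encoding uses TextEncoder/Uint8Array.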
Frequently Asked Questions
How long can a streaming response stay open on each platform?
Vercel Edge caps streaming responses at roughly 25 seconds. Cloudflare Workers has no fixed wall-clock cap on a stream — it stays open while the client reads and you keep writing, bounded by subrequest and CPU limits rather than a single timeout.
Why does my SSE stream arrive all at once in production but not locally?
An intermediary proxy or CDN is buffering it. Set cache-control: no-cache, no-transform on the response and do not compress text/event-stream. Local dev servers do not buffer, which hides the problem.
Do I need ctx.waitUntil to stream on Cloudflare?
For long-lived streams, yes. Pass the pumping promise to ctx.waitUntil so the runtime does not consider the request finished while the body is still being produced. Vercel keeps the invocation alive for the response duration automatically.
Does streaming buffer the whole body in the isolate?
No, if you stream correctly. Returning a ReadableStream and using pipeThrough flushes chunks as they are produced and applies backpressure, so the 128 MB isolate never holds the full payload. Buffering happens only if you accumulate chunks yourself before writing.