Streaming HTML parsing with Web Streams API

February 8, 2026

When you need to parse HTML from a network response, the typical approach is to fetch the entire body and then parse it. But for large documents, that means buffering megabytes in memory before any processing starts.
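As a point of reference, the buffered pattern looks something like this — `parseHtml` is a stand-in for whatever string-based parser you'd use, not a real API:

```typescript
// The buffered pattern: nothing happens until the whole body has
// arrived and been decoded into a single string.
async function parseBuffered(
  response: Response,
  parseHtml: (html: string) => void, // hypothetical string-based parser
): Promise<void> {
  const html = await response.text(); // buffers the entire body in memory
  parseHtml(html); // parsing only starts after the last byte arrives
}
```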

The idea

The Web Streams API lets you process data as it arrives. If you could pipe a fetch() response directly into an HTML parser, you'd get:

  • Lower memory usage (no full-body buffering)
  • Faster time-to-first-result (parsing starts immediately)
  • Backpressure support (the parser controls how fast data flows)
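The backpressure point is easy to see with two toy streams, using nothing beyond the standard Web Streams API. With `highWaterMark: 0` the source only produces when the pipe actually asks for a chunk, so production paces itself to a slow consumer:

```typescript
const order: string[] = [];

// A source that records each time the pipe pulls a chunk from it.
const source = new ReadableStream<number>(
  {
    pull(controller) {
      const produced = order.filter((e) => e.startsWith("produce")).length;
      if (produced >= 3) { controller.close(); return; }
      order.push(`produce ${produced}`);
      controller.enqueue(produced);
    },
  },
  { highWaterMark: 0 }, // don't produce ahead of demand
);

// A deliberately slow sink, standing in for a busy parser.
const slowSink = new WritableStream<number>({
  async write(chunk) {
    await new Promise((resolve) => setTimeout(resolve, 10));
    order.push(`consume ${chunk}`);
  },
});

await source.pipeTo(slowSink);
// Produces and consumes alternate: the pipe waits for each write()
// to settle before pulling the next chunk from the source.
```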

Contributing to htmlparser2

htmlparser2 already supported Node.js streams through its WritableStream class (a Node stream, despite the Web-sounding name). But it didn't support the Web Streams API — the browser-native stream interface.

I opened a PR to add a WebWritableStream adapter. The implementation wraps htmlparser2's Parser in a WritableStream that the browser's fetch() can pipe into:

const parser = new WebWritableStream(handler);
const response = await fetch(url);
await response.body.pipeTo(parser);

What I learned

  1. WritableStream's write() receives raw Uint8Array chunks — you need a TextDecoder, with { stream: true } so multi-byte characters split across chunk boundaries decode correctly, to convert them to strings before feeding them to the parser.

  2. Backpressure is automatic — the stream runtime waits for the promise returned by the sink's write() before requesting more data from the source. No manual buffering or pause/resume logic needed.

  3. Error propagation matters — if the parser throws inside write(), the pipe errors, pipeTo() rejects, and the underlying network request is canceled.
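The three lessons above can be sketched as a minimal adapter. TinyParser here is a toy stand-in for htmlparser2's Parser (not its real API), and the wrapper shows the shape such an adapter can take — not the merged implementation:

```typescript
// TinyParser: a toy stand-in for a push parser. It buffers text and
// extracts opening tag names when the stream ends.
class TinyParser {
  tags: string[] = [];
  private buf = "";
  write(text: string) { this.buf += text; }
  end() {
    for (const m of this.buf.matchAll(/<([a-z][a-z0-9]*)/g)) {
      const tag = m[1];
      if (tag) this.tags.push(tag);
    }
  }
}

// Wrap the parser in a WritableStream that a fetch() body can pipe into.
function toWebWritable(parser: TinyParser): WritableStream<Uint8Array> {
  // { stream: true } keeps multi-byte characters split across chunk
  // boundaries intact (lesson 1).
  const decoder = new TextDecoder();
  return new WritableStream<Uint8Array>({
    write(chunk) {
      // If parser.write throws, this write() rejects, the pipe errors,
      // and the upstream request is canceled (lesson 3). The runtime
      // won't hand over the next chunk until this call settles (lesson 2).
      parser.write(decoder.decode(chunk, { stream: true }));
    },
    close() {
      parser.write(decoder.decode()); // flush any buffered trailing bytes
      parser.end();
    },
  });
}
```

With this in place, `response.body.pipeTo(toWebWritable(parser))` mirrors the snippet earlier in the post.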

The result

The PR was merged and shipped. You can now parse HTML from fetch() responses without buffering the entire body — in browsers and in Deno/Bun/Cloudflare Workers that support Web Streams natively.