The O(n²) Trap Nobody Talks About
Open any LLM-powered chat. Stream a long response. Watch your DevTools. Every single token triggers a full re-parse of the entire accumulated markdown string.
On every token the whole pipeline runs again: re-parse the full string, rebuild the syntax tree, re-render the component tree, diff the DOM.
After 500 tokens, you're parsing a 2,000-character markdown string on every frame. After 2,000 tokens, roughly 8,000 characters. The work grows quadratically: at 100 tok/s the average app performs 100 full re-parses per second, each one touching content that hasn't changed.
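The arithmetic behind that claim, as a toy cost model (assuming roughly 4 characters per token; the function name is illustrative):

```typescript
// Toy cost model: a full re-parse on token t scans all t * charsPerToken
// characters accumulated so far, so total work is quadratic in token count.
function charsScanned(tokens: number, charsPerToken = 4): number {
  let total = 0;
  for (let t = 1; t <= tokens; t++) {
    total += t * charsPerToken; // document length at token t
  }
  return total;
}
```

`charsScanned(500)` comes to 501,000 characters scanned just to display a 2,000-character response, and doubling the token count roughly quadruples the work.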
The Fix: Block-Level Incremental Parsing
I asked: What if the parser only processed new characters?
StreamMD's StreamParser accepts the full accumulated text on each call, but internally tracks prevLength and only processes the delta. It classifies each line into block types (heading, code fence, table, list, paragraph) and maintains a running array of structured Block objects.
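A minimal sketch of that delta-plus-block-tracking loop, assuming a simplified Block shape and only two block types (StreamMD's real parser also classifies code fences, tables, and lists):

```typescript
interface Block {
  type: 'heading' | 'paragraph';
  text: string;
  closed: boolean;
}

// Simplified model of the StreamParser idea: accept the full accumulated
// text each call, but only process the characters after prevLength.
class IncrementalParser {
  private prevLength = 0;
  private buffer = ''; // holds the incomplete last line
  readonly blocks: Block[] = [];

  parse(fullText: string): void {
    const delta = fullText.slice(this.prevLength); // only the new characters
    this.prevLength = fullText.length;
    this.buffer += delta;
    let nl: number;
    while ((nl = this.buffer.indexOf('\n')) !== -1) {
      const line = this.buffer.slice(0, nl);
      this.buffer = this.buffer.slice(nl + 1);
      this.commitLine(line);
    }
  }

  private commitLine(line: string): void {
    const last = this.blocks[this.blocks.length - 1];
    if (line.trim() === '') {
      if (last) last.closed = true; // blank line closes the active block
      return;
    }
    const type: Block['type'] = line.startsWith('#') ? 'heading' : 'paragraph';
    if (last && !last.closed && last.type === 'paragraph' && type === 'paragraph') {
      last.text += '\n' + line; // continue the open paragraph
      return;
    }
    if (last) last.closed = true; // a new block boundary closes the previous one
    this.blocks.push({ type, text: line, closed: type === 'heading' });
  }
}
```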
The magic: completed blocks are frozen. When a block is closed (the parser encounters a blank line, a new heading, or a closing code fence), it's marked closed: true. The React layer wraps each block in React.memo, so closed blocks never re-render. Only the active (last, unclosed) block updates on each token.
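The win is easiest to see outside React: model the memoized block component as a render cache keyed by block id (a stand-in for React.memo's shallow prop comparison, not StreamMD's actual code):

```typescript
type Block = { id: number; text: string; closed: boolean };

let renderCount = 0;
const cache = new Map<number, string>();

// Stand-in for a memoized block component: closed blocks are served from
// the cache, so only the one open block pays a render on each token.
function renderBlock(b: Block): string {
  const hit = cache.get(b.id);
  if (b.closed && hit !== undefined) return hit; // frozen: zero work
  renderCount++;
  const out = `<p>${b.text}</p>`;
  if (b.closed) cache.set(b.id, out);
  return out;
}

function renderAll(blocks: Block[]): string {
  return blocks.map(renderBlock).join('');
}
```

Render the same block list twice and the closed block is only rendered once; per-token cost stays constant no matter how long the document gets.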
Architecture
The key insight: most blocks are frozen. In a typical streaming response, 95% of the rendered content is in completed blocks. The parser identifies block boundaries and the React layer leverages this to skip re-rendering everything that hasn't changed.
The incomplete-line tracking is critical ā when tokens arrive mid-line (e.g., "## He" before the "ading\n"), the partial text is held in a separate buffer and virtually appended at render time. This avoids the classic streaming bug where partial tokens get permanently committed and then duplicated when the rest of the line arrives.
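A sketch of that render-time append, assuming a simplified Block shape (in reality the parser may also classify the partial line; here it is always treated as an open paragraph):

```typescript
type Block = { type: string; text: string; closed: boolean };

// The committed block list is never mutated; the partial last line is
// layered on top as a transient open block each time we render.
function visibleBlocks(blocks: readonly Block[], partialLine: string): Block[] {
  if (partialLine === '') return [...blocks];
  return [...blocks, { type: 'paragraph', text: partialLine, closed: false }];
}
```

Because the partial text only ever exists in this transient block, nothing has to be un-committed when the rest of the line arrives, which is exactly what prevents the duplication bug.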
The Numbers
| Metric | react-markdown | StreamMD |
|---|---|---|
| Re-renders (500 tokens) | 500 | ~20 |
| Per-token complexity | O(n), full re-parse | O(1), delta only |
| Bundle size | 45kB + remark + rehype | 30kB total (incl. highlighter) |
| Runtime dependencies | unified + remark + rehype + … | 0 (React peer only) |
| Syntax highlighting | BYO (Prism/Shiki) | Built-in (15 languages) |
Usage
$ npm install stream-md
That's it. One import. One component. Your streaming goes from janky to buttery.
What Makes This Different
Incremental Diffing
The parser tracks prevLength and only processes the new characters; it never re-scans completed content.
Incomplete Line Buffer
Tokens that arrive mid-line are held in a separate buffer and virtually appended at render time, preventing the classic duplication bug.
Block-Level Memoization
Each block (heading, code, paragraph, table) is a React.memo component. Completed blocks never re-render.
Built-in Highlighter
Token-by-token syntax highlighting for 15 languages. Returns structured spans, so no dangerouslySetInnerHTML is needed.
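To illustrate the structured-span idea with a toy single-rule tokenizer (nothing like the real 15-language highlighter, but the same output shape):

```typescript
type Span = { className: string; text: string };

// Toy highlighter: split out two keywords via a capturing split and tag
// everything else as plain text. The output is plain data, not an HTML
// string, so the renderer can map spans to elements safely.
function highlightLine(line: string): Span[] {
  return line
    .split(/(\bconst\b|\blet\b)/)
    .filter((part) => part.length > 0)
    .map((part) =>
      part === 'const' || part === 'let'
        ? { className: 'keyword', text: part }
        : { className: 'plain', text: part }
    );
}
```

A React layer can then emit `<span className={s.className}>{s.text}</span>` for each span, letting React escape the text for free instead of reaching for dangerouslySetInnerHTML.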
Component Overrides
Full control: swap any element (pre, a, table, code) with your own component via the components prop.
Zero Dependencies
No unified, no remark, no rehype. Just React as a peer dependency. 30kB total ESM bundle.
The ZeroJitter + StreamMD Stack
ZeroJitter eliminates layout thrashing by rendering text to canvas. StreamMD eliminates redundant markdown parsing by incrementally tracking blocks.
Together, they own the "streaming LLM display" category. Use ZeroJitter for raw text streams. Use StreamMD when you need full markdown rendering with headings, code blocks, tables, and inline formatting.
The fastest LLM UI is the one that does the least work.
$ npm install stream-md