Releases: harlan-zw/mdream
v1.0.3
🐞 Bug Fixes
- Prevent empty link title/aria-label from leaking in excluded ele… - by @Ice-Hazymoon and Claude Opus 4.6 (1M context) in #71 (93321)
- crawl: Broken http recursive url processing - by @harlan-zw (cad7b)
- rust: Drop fast mode for <script> parse - by @harlan-zw (c09f6)
v1.0.2
🐞 Bug Fixes
- Include `<th>` cells in table cell detection for HTML passthrough - by @oritwoen in #67 (855c9)
- nuxt:
  - Reexport utils - by @harlan-zw (14060)
  - Edge build - by @harlan-zw (d1400)
- rust:
  - Opening tags inside `<script>` bypasses quote tracking - by @Ice-Hazymoon and Claude Opus 4.6 (1M context) in #68 (881a5)
- vite:
  - Re-export utils - by @harlan-zw (fa0d4)
v1.0.0
Mdream v1 ships a native Rust engine, WebAssembly for edge runtimes, and a simpler declarative API while keeping the JS engine available as @mdream/js for users who need imperative hook-based plugins.
👀 Highlights
⚡️ Native Rust Engine
Single-pass architecture via NAPI-RS: parsing and Markdown generation in one traversal, no intermediate DOM.
| Input Size | mdream (Rust NAPI) | mdream (JS) | Speedup |
|---|---|---|---|
| 166 KB | 🏆 0.60ms | 3.36ms | 5.6× |
| 420 KB | 🏆 1.26ms | 7.79ms | 6.2× |
| 1.8 MB | 🏆 7.83ms | 62.2ms | 7.9× |
Native Rust called directly (without the Node.js binding overhead) reaches 3.84ms on the 1.8MB input when built with profile-guided optimization (PGO), up to ~16× faster than the JS engine.
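The Speedup column is the JS time divided by the Rust time. Worked out for the 1.8 MB row and for the native PGO figure:

```math
\text{speedup} = \frac{t_{\text{JS}}}{t_{\text{Rust}}}, \qquad
\frac{62.2\ \text{ms}}{7.83\ \text{ms}} \approx 7.9\times\ (\text{NAPI}), \qquad
\frac{62.2\ \text{ms}}{3.84\ \text{ms}} \approx 16.2\times\ (\text{native Rust with PGO})
```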
Install mdream and the platform-specific binary resolves automatically.
🌐 WebAssembly for Edge Runtimes
For Cloudflare Workers, Vercel Edge, and browsers. Export conditions (workerd, edge-light, browser) select the correct build automatically, or use mdream/worker directly.
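Export conditions are how one `import 'mdream'` can land on different builds per runtime. The sketch below mimics the rule Node.js-style resolvers apply to conditional exports: keys are checked in object order, and the first key that is an active condition (or `default`) wins. The `exampleExports` map and its `./dist/*` paths are hypothetical, not mdream's actual `package.json`, and the resolver skips details of the real algorithm such as array targets.

```typescript
// A conditional-exports target: either a file path or a nested condition map.
type ExportTarget = string | { [condition: string]: ExportTarget }

// Simplified conditional-exports matching: keys are checked in object order,
// and the first key that is active (or "default") and resolves wins.
function resolveConditions(target: ExportTarget, active: ReadonlySet<string>): string | undefined {
  if (typeof target === 'string') return target
  for (const [condition, next] of Object.entries(target)) {
    if (condition === 'default' || active.has(condition)) {
      const resolved = resolveConditions(next, active)
      // A matching branch that resolves to nothing falls through to later keys.
      if (resolved !== undefined) return resolved
    }
  }
  return undefined
}

// HYPOTHETICAL exports map for illustration only; the paths are made up.
const exampleExports: ExportTarget = {
  'workerd': './dist/worker.mjs',
  'edge-light': './dist/worker.mjs',
  'browser': './dist/worker.mjs',
  'default': './dist/napi.mjs',
}

// Cloudflare's workerd runtime activates the "workerd" condition.
console.log(resolveConditions(exampleExports, new Set(['workerd', 'import'])))
// Plain Node.js activates none of the edge conditions, so "default" is used.
console.log(resolveConditions(exampleExports, new Set(['node', 'import'])))
```

Because order decides ties, runtime-specific keys go before `browser` and `default` in a map like this.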
🔧 Declarative Plugin Config
Built-in plugins take a flat options object instead of an array of factory calls. Serializable, works with both engines.

```ts
import { htmlToMarkdown } from 'mdream'

const html = '<main><h1>Hello</h1><p>World</p></main>'

// Use the minimal preset
const minimal = htmlToMarkdown(html, { minimal: true })

// Or configure individually
let title = ''
const markdown = htmlToMarkdown(html, {
  frontmatter: (fm) => { title = fm.title },
  filter: { exclude: ['nav', 'footer', 'aside'] },
  extraction: { h1: el => console.log('Title:', el.textContent) },
  tagOverrides: { 'x-code': 'pre' },
})
```

🧩 @mdream/js: Standalone JS Engine
Custom hook-based plugins (`createPlugin`, `extractionPlugin` with callbacks) live here:

```ts
import { createPlugin, htmlToMarkdown } from '@mdream/js'

const html = '<main><h1>Hello</h1><p>World</p></main>'

const myPlugin = createPlugin({
  onNodeEnter(el, state) { /* ... */ },
  processTextNode(node, state) { /* ... */ },
})

const markdown = htmlToMarkdown(html, {
  frontmatter: true,
  hooks: [myPlugin],
})
```

📋 Packages
| Package | Description |
|---|---|
| `mdream` | Rust NAPI engine + WASM for edge. Performance-first. Primary package. |
| `@mdream/js` | Pure JS engine. Full hook access, zero native deps. Subpaths: `/plugins`, `/splitter`, `/preset/minimal`, `/parse`, `/llms-txt`, `/negotiate`. |
| `@mdream/crawl` | Site-wide crawler for llms.txt generation. |
| `@mdream/rust-*` | Platform-specific native binaries (auto-resolved). |
⚖️ Engine Comparison
mdream is the performance-focused engine. @mdream/js is the pluggability-focused engine with full hook access and zero native dependencies.
| Feature | mdream (Rust) | @mdream/js |
|---|---|---|
| Performance | ~7.8ms (1.8MB) | ~62ms (1.8MB) |
| Custom hook plugins | — | `hooks` |
| Declarative plugins | ✅ | ✅ |
| Streaming | ✅ | ✅ |
| Extraction / Frontmatter | ✅ | ✅ |
| Edge/WASM | Built-in | JS runs anywhere |
| `parseHtml` access | — | ✅ |
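Because both packages export `htmlToMarkdown(html, options)` returning a string, an app could prefer the native engine and fall back to `@mdream/js` when the platform binary cannot be loaded at runtime. This is a minimal sketch: `loadFirstEngine` is a hypothetical helper, not part of either package, and it only helps with runtime load failures, not bundler build errors. Keep to declarative options when falling back this way, since the Rust engine rejects hook plugins.

```typescript
// Shared shape relied on here: html in, Markdown string out.
type HtmlToMarkdown = (html: string, options?: Record<string, unknown>) => string
type EngineModule = { htmlToMarkdown: HtmlToMarkdown }
type Loader = () => Promise<EngineModule>

// HYPOTHETICAL helper: try each loader in order, return the first engine that loads.
async function loadFirstEngine(loaders: Loader[]): Promise<HtmlToMarkdown> {
  let lastError: unknown
  for (const load of loaders) {
    try {
      const mod = await load()
      return mod.htmlToMarkdown
    }
    catch (err) {
      // For example, the platform-specific native binary could not be resolved.
      lastError = err
    }
  }
  throw new Error(`No mdream engine could be loaded: ${String(lastError)}`)
}

// Intended usage (not executed here; assumes the package types line up):
// const convert = await loadFirstEngine([
//   () => import('mdream'),
//   () => import('@mdream/js'),
// ])
```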
⚠️ Bundler Compatibility
The native Rust engine uses NAPI-RS to load platform-specific binaries at runtime. Bundlers that attempt to statically analyze or bundle .node files (e.g. Turbopack in Next.js 15/16) will fail because the loader pattern uses createRequire() and dynamic path resolution.
Next.js / Turbopack
Add `mdream` to `serverExternalPackages` in your `next.config.js`:

```js
/** @type {import('next').NextConfig} */
const nextConfig = {
  serverExternalPackages: ['mdream'],
}

module.exports = nextConfig
```

This tells Next.js to skip bundling `mdream` and load it via native `require()` at runtime, which is required for any package with native Node.js bindings.
Other Bundlers
If your bundler fails to resolve `mdream/napi`, mark `mdream` as external in your bundler config. For example, in webpack:

```js
// webpack.config.js
module.exports = {
  externals: ['mdream'],
}
```

The `@mdream/js` package has zero native dependencies and works with all bundlers without any configuration.
💥 Migration from v0.x
Migrate with AI. Copy the prompt below into your AI coding assistant along with any file that imports from `mdream`:

```text
Migrate from mdream v0.x to v1.0. Rules:
- `htmlToMarkdown()` now returns `string`, not an object. Frontmatter/extraction use callbacks.
- Replace `plugins: [frontmatterPlugin(), ...]` with declarative config: `{ frontmatter: true, isolateMain: true, tailwind: true, filter: { exclude: ['nav'] } }` or `{ minimal: true }`.
- Custom hook plugins (createPlugin, extractionPlugin with callbacks) must import from '@mdream/js' and use `hooks: [...]` instead of `plugins: [...]`.
- Subpath imports moved: mdream/plugins -> @mdream/js/plugins, mdream/splitter -> @mdream/js/splitter, mdream/preset/minimal -> @mdream/js/preset/minimal, mdream/llms-txt -> @mdream/js/llms-txt, mdream/negotiate -> @mdream/js/negotiate.
- Type renames: Plugin -> TransformPlugin (from @mdream/js), HTMLToMarkdownOptions -> MdreamOptions.
- TAG_* constants and ELEMENT_NODE/TEXT_NODE: import from '@mdream/js' instead of 'mdream/plugins'.
- filter.exclude uses string[] ('nav', 'footer') not TAG_* constants when using the Rust engine.
```
Quick Decision Tree
```text
Custom plugins (createPlugin, extractionPlugin with callbacks)?
  YES -> import from '@mdream/js', use `hooks` option
  NO  -> import from 'mdream', use declarative config

Subpath imports (mdream/plugins, mdream/splitter)?
  YES -> update to @mdream/js equivalents (see table below)
  NO  -> no change needed

Frontmatter or extracted element data?
  YES -> use `frontmatter` callback or `extraction` handlers
  NO  -> no change needed
```
Breaking Changes
1. Return type: string, not object
`htmlToMarkdown()` returns a string. Frontmatter and extraction use callbacks.

```diff
- const result = htmlToMarkdown(html, { plugins: [frontmatterPlugin()] })
- console.log(result.frontmatter)
+ let frontmatter: Record<string, string> | undefined
+ const markdown = htmlToMarkdown(html, {
+   frontmatter: (fm) => { frontmatter = fm },
+ })
```

The config object form also works: `frontmatter: { onExtract: (fm) => { ... } }`
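Code that still expects an object-shaped result (the old `result.frontmatter` access) can be bridged with a small wrapper around the callback. This is a sketch: `withFrontmatter` and the `{ markdown, frontmatter }` result shape are hypothetical, and it assumes the callback fires synchronously before `htmlToMarkdown()` returns, as the diff above implies.

```typescript
type Frontmatter = Record<string, string>

// Assumed shape of the v1 call: html in, Markdown string out, with frontmatter
// delivered through a callback during the call.
type ConvertFn = (html: string, options: { frontmatter: (fm: Frontmatter) => void }) => string

// HYPOTHETICAL migration helper: rebuild an object-shaped result on top of the
// callback API so older call sites can keep reading `.frontmatter`.
function withFrontmatter(convert: ConvertFn, html: string): { markdown: string, frontmatter: Frontmatter | undefined } {
  let frontmatter: Frontmatter | undefined
  const markdown = convert(html, {
    // Assumes the callback runs synchronously before convert() returns.
    frontmatter: (fm) => { frontmatter = fm },
  })
  return { markdown, frontmatter }
}
```

Usage, assuming the types line up: `const { markdown, frontmatter } = withFrontmatter(htmlToMarkdown, html)` with `htmlToMarkdown` imported from `mdream`.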
2. Plugin array → declarative config
```diff
- import { filterPlugin, frontmatterPlugin, isolateMainPlugin, tailwindPlugin } from 'mdream/plugins'
- htmlToMarkdown(html, {
-   plugins: [frontmatterPlugin(), isolateMainPlugin(), tailwindPlugin(), filterPlugin({ exclude: [TAG_NAV] })]
- })
+ htmlToMarkdown(html, {
+   frontmatter: true,
+   isolateMain: true,
+   tailwind: true,
+   filter: { exclude: ['nav'] }
+ })
```

Or: `{ minimal: true }`. Both `mdream` and `@mdream/js` accept the same config shape.
3. Custom hook plugins → @mdream/js
The Rust engine cannot execute JS callbacks mid-conversion.
```diff
- import { createPlugin } from 'mdream/plugins'
- htmlToMarkdown(html, { plugins: [createPlugin({ onNodeEnter(el) { ... } })] })
+ import { htmlToMarkdown, createPlugin } from '@mdream/js'
+ htmlToMarkdown(html, { hooks: [createPlugin({ onNodeEnter(el) { ... } })] })
```

Passing `Plugin[]` to the Rust engine throws: `Custom hook plugins require @mdream/js`.
4. Subpath imports moved
| v0.x | v1.0 |
|---|---|
| `mdream/plugins` | `@mdream/js/plugins` |
| `mdream/splitter` | `@mdream/js/splitter` |
| `mdream/preset/minimal` | `@mdream/js/preset/minimal` |
| `mdream/llms-txt` | `@mdream/js/llms-txt` |
| `mdream/negotiate` | `@mdream/js/negotiate` |

Added: `mdream/worker` for edge runtimes.
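The subpath moves in the v0.x to v1.0 table above are mechanical, so they can be scripted. Below is a minimal regex-based sketch (`rewriteImports` is a hypothetical helper, not a published codemod): it only rewrites quoted specifiers that exactly match a moved subpath, leaves `mdream` and `mdream/worker` alone, and does not handle the type renames. A real codemod should work on a parsed AST rather than raw text.

```typescript
// Subpath moves copied from the v0.x -> v1.0 table.
const SUBPATH_MOVES: Record<string, string> = {
  'mdream/plugins': '@mdream/js/plugins',
  'mdream/splitter': '@mdream/js/splitter',
  'mdream/preset/minimal': '@mdream/js/preset/minimal',
  'mdream/llms-txt': '@mdream/js/llms-txt',
  'mdream/negotiate': '@mdream/js/negotiate',
}

// Rewrites quoted module specifiers ('...' or "...") that exactly match a moved
// subpath; any other mdream/* specifier is returned unchanged.
function rewriteImports(source: string): string {
  return source.replace(/(['"])(mdream\/[\w.\/-]+)\1/g, (match: string, quote: string, spec: string) => {
    const target = SUBPATH_MOVES[spec]
    return target ? `${quote}${target}${quote}` : match
  })
}

console.log(rewriteImports(`import { filterPlugin } from 'mdream/plugins'`))
```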
5. Removed from main entry
| Removed | Use instead |
|---|---|
| `parseHtml` | `@mdream/js` |
| `MarkdownProcessor` | `htmlToMarkdown()` |
| `TagIdMap` | `@mdream/js` |
| `createPlugin` | `@mdream/js/plugins` |
6. Type renames
```diff
- import type { Plugin, HTMLToMarkdownOptions } from 'mdream'
+ import type { TransformPlugin } from '@mdream/js'
+ import type { MdreamOptions } from 'mdream'
```

7. Constants

```diff
- import { ELEMENT_NODE, TAG_NAV } from 'mdream/plugins'
+ import { ELEMENT_NODE, TAG_H1 } from '@mdream/js'
```

Only `TAG_H1`-`TAG_H6`, `ELEMENT_NODE`, and `TEXT_NODE` are exported. Use string names (`'nav'`, `'footer'`) in filter config.
8. filter.exclude types
- `mdream` (Rust): `string[]` only
- `@mdream/js`: `(string | number)[]` (tag names or `TAG_*` constants)

String tag names work everywhere: `{ filter: { exclude: ['nav', 'footer'] } }`
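Since the declarative config is plain data, it can be loaded from JSON and checked before it reaches the Rust engine. This sketch (`DeclarativeConfig` and `parseConfig` are hypothetical names covering only a subset of the real options) rejects numeric `TAG_*` ids in `filter.exclude`, which the Rust engine does not accept.

```typescript
// HYPOTHETICAL subset of the declarative options described in this guide.
interface DeclarativeConfig {
  minimal?: boolean
  frontmatter?: boolean
  isolateMain?: boolean
  tailwind?: boolean
  filter?: { exclude?: string[] }
}

// Parses a JSON config and enforces the Rust engine's string[]-only rule
// for filter.exclude (numeric TAG_* ids only work with @mdream/js).
function parseConfig(json: string): DeclarativeConfig {
  const parsed: unknown = JSON.parse(json)
  if (typeof parsed !== 'object' || parsed === null) {
    throw new TypeError('config must be a JSON object')
  }
  const config = parsed as DeclarativeConfig
  const exclude: unknown = config.filter?.exclude
  if (exclude !== undefined) {
    if (!Array.isArray(exclude) || !exclude.every(tag => typeof tag === 'string')) {
      throw new TypeError('filter.exclude must be an array of tag-name strings for the Rust engine')
    }
  }
  return config
}
```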
Changelog
🚀 Features
- crawl:
  - Add `allowSubdomains` option for cross-subdomain crawling - by @harlan-zw in #64 (18e8f)
  - Add config file support and hookable hooks - by @harlan-zw in #65 (90bd6)
🐞 Bug Fixes
- Broken iife build - by @harlan-zw (6da2d)
- Add bounds validation to unsafe `ptr::copy` and `depth_map` access - by @harlan-zw in #56 (480ab)
- crawl:
  - Support single-page mode with playwright driver - by @harlan-zw in #55 (6e2ba)
  - Handle wmic.exe ENOENT on Windows 11+ - by @harlan-zw in #57 (12f02)
- edge:
  - Export missing stream functions - by @harlan-zw (9c462)
- js:
v1.0.0-beta.14
No significant changes
v1.0.0-beta.13
🐞 Bug Fixes
- parser: Handle `<` operator inside script tags without breaking parse loop - by @harlan-zw in #66 (5fda6)
v1.0.0-beta.12
🚀 Features
- crawl:
  - Add `allowSubdomains` option for cross-subdomain crawling - by @harlan-zw in #64 (18e8f)
  - Add config file support and hookable hooks - by @harlan-zw in #65 (90bd6)
🐞 Bug Fixes
- Add bounds validation to unsafe `ptr::copy` and `depth_map` access - by @harlan-zw in #56 (480ab)
- crawl:
  - Support single-page mode with playwright driver - by @harlan-zw in #55 (6e2ba)
  - Handle wmic.exe ENOENT on Windows 11+ - by @harlan-zw in #57 (12f02)
- edge:
  - Export missing stream functions - by @harlan-zw (9c462)
- js:
- napi:
  - Prevent silent u32-to-u8 truncation in splitter options - by @harlan-zw in #54 (2549e)
- rust:
  - Prevent panics from crashing Node.js process - by @harlan-zw in #58 (8a98a)
  - Prevent `depth_map` u8 overflow with `saturating_add` - by @harlan-zw in #53 (80fef)
  - Add cargo-fuzz harness and fix 7 panics found by fuzzing - by @harlan-zw (30330)
  - Prevent `](#)` artifacts from links after close - by @harlan-zw in #62 (51317)
  - Expand named HTML entity decoder from 6 to 245 entries - by @oritwoen and @harlan-zw in #61 (0a194)
v1.0.0-beta.11
🐞 Bug Fixes
- rust: Stream - by @harlan-zw (66e17)
🏎 Performance
- crawl: Raw http crawling - by @harlan-zw (3a6f6)
v1.0.0-beta.9
No significant changes
v1.0.0-beta.7
No significant changes
v1.0.0-beta.6
No significant changes