harlan-zw / mdream
The fastest HTML to markdown convertor. Optimized for LLMs and supports streaming.
AI Architecture Analysis
This repository is indexed by RepoMind. By analyzing harlan-zw/mdream in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.
Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.
Repository Overview (README excerpt)
mdream

> The fastest HTML to markdown convertor built with JavaScript and Rust. Optimized for LLMs and supports streaming.

Made possible by my Sponsor Program
Follow me @harlan_zw • Join Discord for help

Features
• #1 Token Optimizer: Up to 2x fewer tokens than Turndown, node-html-markdown, and html-to-markdown. 70-99% fewer tokens than raw HTML.
• #1 Fastest: Fastest pure JS & native rust - 35x faster than Turndown, converts 1.8MB HTML in ~62ms (JS) and ~3.9ms (Rust).
• Generates Minimal GitHub Flavored Markdown: Frontmatter, Nested & HTML markup support. Clean mode strips broken links, empty images, redundant anchors.
• Streamable: Process HTML incrementally with for large documents and real-time pipelines.
• Tiny: 10kB gzip JS core, 60kB gzip with Rust WASM engine. Zero dependencies.
• Run anywhere: CLI Crawler, Docker, GitHub Actions, Vite, & more.

What is Mdream?
A zero-dependency, LLM-optimized HTML to Markdown converter. Faster and leaner than Turndown, node-html-markdown, and html-to-markdown, with output tuned for token efficiency and readability.
On top of the core converter, Mdream ships packages to generate LLM artifacts like for your own sites or produce LLM context for any project.

Mdream Packages
Mdream is built to run anywhere for all projects and use cases and is available in the following packages:

| Package | Description |
|---------|-------------|
| mdream | Rust NAPI engine + WASM for edge. Performance-first, declarative config. Includes CLI. |
| Browser CDN | Use mdream directly in browsers via unpkg/jsDelivr without any build step |
| @mdream/js | Pure JS engine. Full hook access, zero native deps. Subpaths: , , , , . |
| @mdream/crawl | Site-wide crawler to generate artifacts from entire websites |
| Docker | Pre-built Docker image with Playwright Chrome for containerized website crawling |
| @mdream/vite | Generate automatic for your own Vite sites |
| @mdream/nuxt | Generate automatic and artifacts generation for Nuxt Sites |
| @mdream/action | Generate and artifacts from your static output |
| mdream (crate) | Native Rust crate with CLI. Zero dependencies, streaming support. Available on crates.io |

Mdream Usage
Installation

> [!TIP]
> Generate an Agent Skill for this package using skilld:

Basic Usage
**Core Functions:**
• htmlToMarkdown - Convert HTML to Markdown
• streamHtmlToMarkdown - Stream HTML to Markdown

See the API Usage section for complete details.

Mdream Crawl

> Need something that works in the browser or an edge runtime? Use Mdream.

The package crawls an entire site generating LLM artifacts using for Markdown conversion.
• llms.txt: A consolidated text file optimized for LLM consumption.
• llms-full.txt: An extended format with comprehensive metadata and full content.
• Individual Markdown Files: Each crawled page is saved as a separate Markdown file in the directory.

Usage Examples

Analyze Websites with AI Tools
Feed website content directly to Claude or other AI tools:

Make Your Site AI-Discoverable
Generate llms.txt to help AI tools understand your site:
Outputs:
• - Optimized for LLM consumption
• - Complete content with metadata
• - Individual markdown files per page

Build RAG Systems from Websites
Crawl websites and generate embeddings for vector databases:

Extract Specific Content from Pages
Pull headers, images, or other elements during conversion:

Optimize Token Usage With Clean Mode
Use (enabled by default with ) to automatically reduce token costs:

Stdin CLI Usage
Mdream is much more minimal than Mdream Crawl. It provides a CLI designed to work exclusively with Unix pipes, providing flexibility and freedom to integrate with other tools.

**Pipe Site to Markdown**
Fetches the Markdown Wikipedia page and converts it to Markdown preserving the original links and images.
_Tip: The flag will fix relative image and link paths_

**Local File to Markdown**
Converts a local HTML file to a Markdown file, using to write the output to a file and display it in the terminal.

CLI Options
• : Base URL for resolving relative links and images
• : Conversion presets: minimal
• : Display help information
• : Display version information

Docker
Run with Playwright Chrome pre-installed for website crawling in containerized environments.
**Available Images:**
• - Latest stable release
• - GitHub Container Registry

See DOCKER.md for complete usage, configuration, and building instructions.

GitHub Actions Integration
Installation
See the GitHub Actions README for usage and configuration.

Vite Integration
Installation
See the Vite README for usage and configuration.

Nuxt Integration
Installation
See the Nuxt Module README for usage and configuration.

Browser CDN Usage
Use mdream directly via CDN with no build step. Call once to load the WASM binary, then use synchronously:
**CDN Options:**
• **unpkg**:
• **jsDelivr**:

Benchmarks
Converts 1.8MB HTML in **7.83ms** (Rust NAPI) or **62ms** (pure JS). Up to 35x faster than Turndown, 3500x faster than node-html-markdown on large files.

| Input | mdream (rust) | mdream (js) | Turndown | node-html-markdown |
|-------|---------------|-------------|----------|---------------------|
| 166 KB | **0.60ms** | 3.36ms | 11.91ms | 15.35ms |
| 420 KB | **1.26ms** | 7.79ms | 14.01ms | 17.23ms |
| 1.8 MB | **7.83ms** | 62.2ms | 276.0ms | 27,381ms |

With , mdream produces up to **92% fewer tokens** than raw HTML and up to **2x fewer tokens** than competing libraries.

| Page (HTML tokens) | mdream minimal | Turndown | node-html-markdown |
|---------------------|---------…
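
The headline speedups quoted in the Benchmarks section can be rechecked from the 1.8 MB row of the timing table. The sketch below is only that arithmetic, using the timings as quoted; the `timings` object and `speedup` helper are hypothetical names introduced for illustration and are not part of mdream.

```typescript
// Timings in milliseconds for the 1.8 MB input, copied from the benchmark table.
// `timings` and `speedup` are illustrative names, not part of mdream's API.
const timings = {
  mdreamRust: 7.83,
  mdreamJs: 62.2,
  turndown: 276.0,
  nodeHtmlMarkdown: 27381,
};

// How many times faster the `fast` timing is than the `slow` timing.
function speedup(slow: number, fast: number): number {
  return slow / fast;
}

const rustVsTurndown = speedup(timings.turndown, timings.mdreamRust);
const rustVsNodeHtmlMarkdown = speedup(timings.nodeHtmlMarkdown, timings.mdreamRust);
const jsVsTurndown = speedup(timings.turndown, timings.mdreamJs);

console.log(`Rust vs Turndown: ${rustVsTurndown.toFixed(1)}x`);
console.log(`Rust vs node-html-markdown: ${rustVsNodeHtmlMarkdown.toFixed(0)}x`);
console.log(`JS vs Turndown: ${jsVsTurndown.toFixed(1)}x`);
```

On these numbers the "35x" and "3500x" figures both correspond to the Rust engine; the pure JS engine comes out at roughly 4x faster than Turndown for the same input.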