fb55 / htmlparser2

The fast & forgiving HTML and XML parser

4,801 stars
397 forks
22 issues
TypeScript, JavaScript, HTML


Repository Overview (README excerpt)

htmlparser2

The fast & forgiving HTML/XML parser.

_htmlparser2 is the fastest HTML parser, and takes some shortcuts to get there. If you need strict HTML spec compliance, have a look at parse5._

Installation

    npm install htmlparser2

A live demo of `htmlparser2` is available on AST Explorer.

Ecosystem

| Name           | Description                                             |
| -------------- | ------------------------------------------------------- |
| htmlparser2    | Fast & forgiving HTML/XML parser                        |
| domhandler     | Handler for htmlparser2 that turns documents into a DOM |
| domutils       | Utilities for working with domhandler's DOM             |
| css-select     | CSS selector engine, compatible with domhandler's DOM   |
| cheerio        | The jQuery API for domhandler's DOM                     |
| dom-serializer | Serializer for domhandler's DOM                         |

Usage

`htmlparser2` itself provides a callback interface that allows consumption of documents with minimal allocations. For a more ergonomic experience, read Getting a DOM below.

Output (with multiple text events combined):

This example only shows three of the possible events. Read more about the parser, its events and options in the wiki.

Usage with streams

While the `Parser` interface closely resembles Node.js streams, it's not a 100% match. Use the `WritableStream` interface to process a streaming input:

Getting a DOM

The `DomHandler` produces a DOM (document object model) that can be manipulated using the `DomUtils` helper.

The `DomHandler`, while still bundled with this module, was moved to its own module. Have a look at that for further information.

Parsing Feeds

`htmlparser2` makes it easy to parse RSS, RDF and Atom feeds, by providing a `parseFeed` method:

Performance

After having some artificial benchmarks for some time, **@AndreasMadsen** published his `htmlparser-benchmark`, which benchmarks HTML parsers based on real-world websites.

At the time of writing, the latest versions of all supported parsers show the following performance characteristics on GitHub Actions (sourced from here):

How does this module differ from node-htmlparser?

In 2011, this module started as a fork of the `htmlparser` module.
`htmlparser2` was rewritten multiple times and, while it maintains an API that's mostly compatible with `htmlparser`, the projects don't share any code anymore.

The parser now provides a callback interface inspired by sax.js (originally targeted at readabilitySAX). As a result, old handlers won't work anymore.

The `DefaultHandler` was renamed to `DomHandler` to clarify its purpose. The old name is still available when requiring `htmlparser2`, so existing code should work as expected.

The `RssHandler` was replaced with a `getFeed` function that takes a DOM and returns a feed object. There is a `parseFeed` helper function that can be used to parse a feed from a string.

Security contact information

To report a security vulnerability, please use the Tidelift security contact. Tidelift will coordinate the fix and disclosure.