Best Open Source Parser Libraries
A curated list of the most popular GitHub repositories tagged with parser.
#1 opendatalab/MinerU
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
#2 markedjs/marked
A markdown parser and compiler. Built for speed.
#3 swc-project/swc
Rust-based platform for the Web
#4 cheeriojs/cheerio
The fast, flexible, and elegant library for parsing and manipulating HTML and XML.
#5 postcss/postcss
Transforming styles with JS plugins
#6 tree-sitter/tree-sitter
An incremental parsing system for programming tools
#7 vectordotdev/vector
A high-performance observability data pipeline.
#8 oxc-project/oxc
⚓ A collection of high-performance JavaScript tools.
#9 nikic/PHP-Parser
A PHP parser written in PHP
#10 json-iterator/go
A high-performance, 100% compatible drop-in replacement for "encoding/json"
#11 terser/terser
🗜 JavaScript parser, mangler and compressor toolkit for ES6+
#12 tobymao/sqlglot
Python SQL Parser and Transpiler
#13 bytedance/Dolphin
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
#14 pdfminer/pdfminer.six
Community-maintained fork of pdfminer: we fathom PDF
#15 boa-dev/boa
Boa is an embeddable JavaScript engine written in Rust.
#16 fkling/astexplorer
A web tool to explore the ASTs generated by various parsers.
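marked (#2 above) is a JavaScript library, and its real API is not shown here. To make the phrase "Markdown parser and compiler" concrete, the following Python sketch parses a tiny subset of Markdown (ATX headings and blank-line-separated paragraphs) and compiles it to HTML. The function name `markdown_to_html` is hypothetical, and inline syntax such as emphasis is deliberately left untouched.

```python
import html
import re

# Matches ATX headings such as "# Title" or "### Section" (levels 1-6, space required).
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def markdown_to_html(source: str) -> str:
    """Toy Markdown compiler: ATX headings and blank-line-separated paragraphs only."""
    blocks = []
    paragraph = []

    def flush_paragraph():
        # Join the buffered lines into a single <p> element, then reset the buffer.
        if paragraph:
            text = " ".join(line.strip() for line in paragraph)
            blocks.append(f"<p>{html.escape(text)}</p>")
            paragraph.clear()

    for line in source.splitlines():
        match = HEADING.match(line)
        if match:
            flush_paragraph()
            level = len(match.group(1))
            blocks.append(f"<h{level}>{html.escape(match.group(2).strip())}</h{level}>")
        elif not line.strip():
            # A blank line ends the current paragraph.
            flush_paragraph()
        else:
            paragraph.append(line)
    flush_paragraph()
    return "\n".join(blocks)


print(markdown_to_html("# Hello\n\nSome *text*\ncontinues here."))
```

Running it prints `<h1>Hello</h1>` followed by `<p>Some *text* continues here.</p>`; the asterisks survive because the sketch has no inline parser.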
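Cheerio (#4 above) exposes a jQuery-style API in JavaScript. As a rough analogue of its parse-then-query workflow, this Python sketch uses the standard library's `html.parser.HTMLParser` to collect link targets from an HTML fragment. `LinkCollector` is a hypothetical class name, and unlike Cheerio this event-driven parser does not build a tree that can be manipulated and re-serialized.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of
        # (name, value) pairs, where value is None for valueless attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


collector = LinkCollector()
collector.feed('<p>See <a href="https://example.com">this</a> and <a href="/docs">docs</a>.</p>')
collector.close()
print(collector.links)
```

For the sample fragment it prints `['https://example.com', '/docs']`.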
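PostCSS (#5 above) parses CSS into an abstract syntax tree, passes that tree through a chain of JavaScript plugins, and then stringifies the result. The Python sketch below imitates that parse, transform, stringify pipeline on a deliberately tiny stand-in for a CSS tree. Every name in it (`parse_css`, `stringify`, `process`, and the `px_to_rem` plugin) is hypothetical, the parser only handles flat rules without nesting, at-rules, or comments, and the 16px root font size is an assumption.

```python
import re
from typing import Callable, List, Tuple

# A rule is (selector, [(property, value), ...]): a toy stand-in for a CSS AST.
Rule = Tuple[str, List[Tuple[str, str]]]
Plugin = Callable[[List[Rule]], None]

RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def parse_css(source: str) -> List[Rule]:
    """Parse flat rules only: no nesting, at-rules, comments, or quoted braces/semicolons."""
    rules = []
    for selector, body in RULE.findall(source):
        decls = []
        for decl in body.split(";"):
            if ":" in decl:
                prop, value = decl.split(":", 1)
                decls.append((prop.strip(), value.strip()))
        rules.append((selector.strip(), decls))
    return rules


def stringify(rules: List[Rule]) -> str:
    """Serialize the toy tree back to CSS text, one rule per line."""
    return "\n".join(
        f"{sel} {{ " + " ".join(f"{p}: {v};" for p, v in decls) + " }"
        for sel, decls in rules
    )


def px_to_rem(rules: List[Rule]) -> None:
    """Hypothetical plugin: rewrite 'Npx' values as rem, assuming a 16px root font size."""
    for _, decls in rules:
        for i, (prop, value) in enumerate(decls):
            match = re.fullmatch(r"(\d+(?:\.\d+)?)px", value)
            if match:
                decls[i] = (prop, f"{float(match.group(1)) / 16:g}rem")


def process(source: str, plugins: List[Plugin]) -> str:
    rules = parse_css(source)
    for plugin in plugins:  # each plugin mutates the shared tree, in order
        plugin(rules)
    return stringify(rules)


print(process("h1 { font-size: 32px; color: red }", [px_to_rem]))
```

Here the call prints `h1 { font-size: 2rem; color: red; }`. Treating each plugin as a function that mutates one shared tree is what lets several plugins run one after another in a fixed order.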
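AST Explorer (#16 above) shows the trees that different parsers produce for a snippet of code. The same kind of tree can be inspected locally; for example, Python's standard-library `ast` module parses source text into nodes such as `Assign`, `BinOp`, and `Name`. This is only an illustration of what an AST looks like, not a description of how AST Explorer works.

```python
import ast

source = "total = price * (1 + tax_rate)"
tree = ast.parse(source)

# Pretty-print the whole tree (the indent argument needs Python 3.9+).
print(ast.dump(tree, indent=2))

# ast.walk yields every node in the tree; its order is not guaranteed.
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)
```

The nested `BinOp` nodes show operator precedence encoded in the tree: the parenthesized addition sits inside the right operand of the multiplication.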