Best Open Source Data Extraction Libraries
A curated list of the most popular GitHub repositories tagged "extract data". Select any project to visualize its architecture and explore the codebase using RepoMind's AI engine.
#1 opendatalab/MinerU
Transforms complex documents like PDFs into LLM-ready Markdown/JSON for your agentic workflows.
54,581 · Python
#2 pymupdf/PyMuPDF
PyMuPDF is a high-performance Python library for data extraction, analysis, conversion, and manipulation of PDF (and other) documents.
9,097 · Python
#3 bda-research/node-crawler
Web Crawler/Spider for NodeJS + server-side jQuery ;-)
6,785 · TypeScript