Best Open Source Document Parsing Libraries
A curated list of the most popular GitHub repositories tagged with document parsing. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.
#1 PaddlePaddle/PaddleOCR
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
70,987 | Python
#2 docling-project/docling
Get your documents ready for gen AI
53,757 | Python
#3 Unstructured-IO/unstructured
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
14,014 | HTML