Best Open Source ocr Libraries
A curated list of the most popular GitHub repositories tagged with ocr. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.
#1tesseract-ocr/tesseract
Tesseract Open Source OCR Engine (main repository)
#2PaddlePaddle/PaddleOCR
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
#3opendatalab/MinerU
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
#4hiroi-sora/Umi-OCR
OCR software, free and offline. 开源、免费的离线OCR软件。支持截屏/批量导入图片,PDF文档识别,排除水印/页眉页脚,扫描/生成二维码。内置多国语言库。
#5siyuan-note/siyuan
A privacy-first, self-hosted, fully open source personal knowledge management software, written in typescript and golang.
#6naptha/tesseract.js
Pure Javascript OCR for more than 100 Languages 📖🎉🖥
#7paperless-ngx/paperless-ngx
A community-supported supercharged document management system: scan, index and archive all your documents
#8ShareX/ShareX
ShareX is a free and open-source application that enables users to capture or record any area of their screen with a single keystroke. It also supports uploading images, text, and various file types to a wide range of destinations.
#9ocrmypdf/OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
#10JaidedAI/EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and etc.
#11opendataloader-project/opendataloader-pdf
PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.
#12pot-app/pot-desktop
🌈一个跨平台的划词翻译和OCR软件 | A cross-platform software for text translation and recognition.
#13lukas-blecher/LaTeX-OCR
pix2tex: Using a ViT to convert images of equations into LaTeX code.
#14ripperhe/Bob
Bob 是一款 macOS 平台的翻译和 OCR 软件。
#15zyddnys/manga-image-translator
Translate manga/image 一键翻译各类图片内文字 https://cotrans.touhou.ai/ (no longer working)
#16pymupdf/PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
#17bytedance/Dolphin
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
#18STranslate/STranslate
A ready-to-go translation ocr tool developed with WPF/WPF 开发的一款即用即走的翻译、OCR工具
#19oomol-lab/pdf-craft
PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books.
#20datalab-to/chandra
OCR model that handles complex tables, forms, handwriting with full layout.
#21shipfastlabs/parsel
A fast, helpful, and open-source document parser for PHP