Best Open Source OCR Libraries
A curated list of the most popular GitHub repositories tagged with ocr. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.
#1 tesseract-ocr/tesseract
Tesseract Open Source OCR Engine (main repository)
#2 PaddlePaddle/PaddleOCR
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
#3 opendatalab/MinerU
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
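#4 hiroi-sora/Umi-OCR
Free, open-source, offline OCR software. Supports screenshot capture and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning and generating QR codes. Ships with built-in language packs for multiple languages.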
#5 siyuan-note/siyuan
A privacy-first, self-hosted, fully open source personal knowledge management software, written in TypeScript and Go.
#6 naptha/tesseract.js
Pure JavaScript OCR for more than 100 languages 📖🎉🖥
#7 paperless-ngx/paperless-ngx
A community-supported supercharged document management system: scan, index and archive all your documents
#8 ShareX/ShareX
ShareX is a free and open-source application that enables users to capture or record any area of their screen with a single keystroke. It also supports uploading images, text, and various file types to a wide range of destinations.
#9 ocrmypdf/OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.
#10 JaidedAI/EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.
#11 pot-app/pot-desktop
🌈 Cross-platform software for selection-based text translation and OCR.
#12 lukas-blecher/LaTeX-OCR
pix2tex: Using a ViT to convert images of equations into LaTeX code.
#13 ripperhe/Bob
Bob is a translation and OCR application for macOS.
#14 zyddnys/manga-image-translator
Translate manga/images: one-click translation of text inside all kinds of images. https://cotrans.touhou.ai/ (no longer working)
#15 pymupdf/PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
#16 bytedance/Dolphin
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
#17 PaddlePaddle/PaddleX
All-in-One Development Tool based on PaddlePaddle
#18 oomol-lab/pdf-craft
PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books.
#19 datalab-to/chandra
OCR model that handles complex tables, forms, and handwriting while preserving full layout.
#20 lzhgus/Capso
Open-source screenshot and screen recording for macOS. The free, native alternative to CleanShot X. Built with Swift 6.0 and SwiftUI.
#21 R0Wi-DEV/workflow_ocr
A Nextcloud Workflow app that lets you process files via OCR on the server side.