Best Open Source OCR Libraries

A curated list of the most popular GitHub repositories tagged with ocr. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.

#1 tesseract-ocr/tesseract

Tesseract Open Source OCR Engine (main repository)

72,951 stars · C++

#2 PaddlePaddle/PaddleOCR

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

72,460 stars · Python

#3 opendatalab/MinerU

Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

56,399 stars · Python

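#4 hiroi-sora/Umi-OCR

Free, open-source, offline OCR software. Supports screenshot capture and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning and generating QR codes. Includes built-in multi-language libraries.

42,616 stars · Python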

#5 siyuan-note/siyuan

A privacy-first, self-hosted, fully open-source personal knowledge management system, written in TypeScript and Go.

41,915 stars · TypeScript

#6 naptha/tesseract.js

Pure JavaScript OCR for more than 100 languages 📖🎉🖥

37,925 stars · JavaScript

#7 paperless-ngx/paperless-ngx

A community-supported, supercharged document management system: scan, index, and archive all your documents.

37,418 stars · Python

#8 ShareX/ShareX

ShareX is a free and open-source application that enables users to capture or record any area of their screen with a single keystroke. It also supports uploading images, text, and various file types to a wide range of destinations.

35,901 stars · C#

#9 ocrmypdf/OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.

32,961 stars · Python

#10 JaidedAI/EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.

29,096 stars · Python

#11 opendataloader-project/opendataloader-pdf

PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.

21,855 stars · Java

#12 pot-app/pot-desktop

🌈 Cross-platform software for select-to-translate text translation and OCR.

17,365 stars · JavaScript

#13 lukas-blecher/LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.

16,255 stars · Python

#14 ripperhe/Bob

Bob is a translation and OCR app for macOS.

9,579 stars

#15 zyddnys/manga-image-translator

Translate manga and images: one-click translation of text inside all kinds of images. https://cotrans.touhou.ai/ (no longer working)

9,549 stars · Python

#16 pymupdf/PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

9,256 stars · Python

#17 bytedance/Dolphin

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

8,865 stars · Python

#18 STranslate/STranslate

A ready-to-go translation and OCR tool developed with WPF.

7,086 stars · C#

#19 oomol-lab/pdf-craft

PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books.

4,959 stars · Python

#20 datalab-to/chandra

OCR model that handles complex tables, forms, and handwriting with full layout.

4,949 stars · Python

#21 shipfastlabs/parsel

A fast, helpful, and open-source document parser for PHP.

165 stars · PHP