Best Open Source OCR Libraries

A curated list of the most popular GitHub repositories tagged with ocr. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.

#1 tesseract-ocr/tesseract

Tesseract Open Source OCR Engine (main repository)

72,951 · C++

#2 PaddlePaddle/PaddleOCR

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

72,460 · Python

#3 opendatalab/MinerU

Transforms complex documents like PDFs into LLM-ready Markdown/JSON for your agentic workflows.

56,399 · Python

#4 hiroi-sora/Umi-OCR

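Free, open-source, offline OCR software. Supports screenshot capture and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and QR code scanning and generation. Includes built-in libraries for multiple languages.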

42,616 · Python

#5 siyuan-note/siyuan

A privacy-first, self-hosted, fully open source personal knowledge management application, written in TypeScript and Go.

41,915 · TypeScript

#6 naptha/tesseract.js

Pure JavaScript OCR for more than 100 languages 📖🎉🖥

37,925 · JavaScript

#7 paperless-ngx/paperless-ngx

A community-supported supercharged document management system: scan, index and archive all your documents

37,418 · Python

#8 ShareX/ShareX

ShareX is a free and open-source application that enables users to capture or record any area of their screen with a single keystroke. It also supports uploading images, text, and various file types to a wide range of destinations.

35,901 · C#

#9 ocrmypdf/OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.

32,961 · Python

#10 JaidedAI/EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.

29,096 · Python

#11 pot-app/pot-desktop

🌈 A cross-platform application for selected-text translation and OCR.

17,365 · JavaScript

#12 lukas-blecher/LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.

16,255 · Python

#13 ripperhe/Bob

Bob is a translation and OCR application for macOS.

9,579

#14 zyddnys/manga-image-translator

Translate manga/images: one-click translation of text in all kinds of images. https://cotrans.touhou.ai/ (no longer working)

9,549 · Python

#15 pymupdf/PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

9,256 · Python

#16 bytedance/Dolphin

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

8,865 · Python

#17 PaddlePaddle/PaddleX

All-in-One Development Tool based on PaddlePaddle

6,112 · Python

#18 oomol-lab/pdf-craft

PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books.

4,959 · Python

#19 datalab-to/chandra

OCR model that handles complex tables, forms, and handwriting with full layout.

4,949 · Python

#20 lzhgus/Capso

Open-source screenshot and screen recording for macOS. The free, native alternative to CleanShot X. Built with Swift 6.0 and SwiftUI.

521 · Swift

#21 R0Wi-DEV/workflow_ocr

This is a Nextcloud Workflow App that lets you process files via OCR on the server side.

93 · PHP