Best Open Source PDF Libraries

A curated list of the most popular GitHub repositories tagged with pdf. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.

#1 justjavac/free-programming-books-zh_CN

:books: Free Chinese-language books on computer programming; contributions welcome

116,465 stars

#2 microsoft/markitdown

Python tool for converting files and office documents to Markdown.

90,864 stars · Python

#3 Stirling-Tools/Stirling-PDF

#1 PDF Application on GitHub that lets you edit PDFs on any device anywhere

75,401 stars · TypeScript

#4 opendatalab/MinerU

Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

56,399 stars · Python

#5 docling-project/docling

Get your documents ready for gen AI

55,956 stars · Python

#6 siyuan-note/siyuan

A privacy-first, self-hosted, fully open source personal knowledge management software, written in TypeScript and Golang.

41,915 stars · TypeScript

#7 paperless-ngx/paperless-ngx

A community-supported supercharged document management system: scan, index and archive all your documents

37,418 stars · Python

#8 ocrmypdf/OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

32,961 stars · Python

#9 PDFMathTranslate/PDFMathTranslate

[EMNLP 2025 Demo] PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents that fully preserves the layout, supporting services such as Google/DeepL/Ollama/OpenAI, and available via CLI/GUI/MCP/Docker/Zotero

32,297 stars · Python

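#10 hehonghui/awesome-english-ebooks

Free downloads of English-language magazines including The Economist (with audio), The New Yorker, The Guardian, Wired, and The Atlantic, in epub, mobi, and pdf formats; updated weekly

29,609 stars · CSS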

#11 posquit0/Awesome-CV

:page_facing_up: Awesome CV is a LaTeX template for your outstanding job application

27,036 stars · TeX

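#12 forthespada/CS-Books

🔥🔥 More than 1,000 classic computer science books, personal notes, and resources referenced in the author's articles published on various platforms. The books cover C/C++, Java, Python, Go, data structures and algorithms, operating systems, backend architecture, computer systems, databases, computer networking, design patterns, frontend, assembly, and interview write-ups for campus and experienced-hire recruiting.

26,476 stars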

#13 koodo-reader/koodo-reader

A modern ebook manager and reader with sync and backup capabilities for Windows, macOS, Linux, Android, iOS and Web

26,294 stars · JavaScript

#14 koreader/koreader

An ebook reader application supporting PDF, DjVu, EPUB, FB2 and many more formats, running on Cervantes, Kindle, Kobo, PocketBook and Android devices

25,799 stars · Lua

#15 readest/readest

Readest is a modern, feature-rich ebook reader designed for avid readers offering seamless cross-platform access, powerful tools, and an intuitive interface to elevate your reading experience.

18,756 stars · TypeScript

#16 ether/etherpad-lite

Etherpad: A modern really-real-time collaborative document editor.

18,185 stars · TypeScript

#17 diegomura/react-pdf

📄 Create PDF files using React

16,469 stars · TypeScript

#18 salomonelli/best-resume-ever

:necktie: :briefcase: Build fast :rocket: and easy multiple beautiful resumes and create your best CV ever! Made with Vue and LESS.

16,463 stars · Vue

#19 mayooear/ai-pdf-chatbot-langchain

AI PDF chatbot agent built with LangChain & LangGraph

16,395 stars · TypeScript

#20 sumatrapdfreader/sumatrapdf

SumatraPDF reader

16,256 stars · C

#21 jsvine/pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera, and easily extract text and tables.

9,934 stars · Python

#22 py-pdf/pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

9,875 stars · Python

#23 pymupdf/PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

9,256 stars · Python

#24 ahrm/sioyek

Sioyek is a PDF viewer with a focus on textbooks and research papers

9,228 stars · C

#25 bytedance/Dolphin

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

8,865 stars · Python

#26 Kozea/WeasyPrint

The awesome document factory

8,725 stars · Python

#27 oomol-lab/pdf-craft

PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books.

4,959 stars · Python

#28 ciromattia/kcc

KCC (a.k.a. Kindle Comic Converter) is a comic and manga converter for ebook readers.

4,882 stars · Python

#29 hadyang/interview

A compilation of Java knowledge for written tests and interviews

4,848 stars

#30 Hufe921/canvas-editor

Rich text editor built on canvas/SVG

4,843 stars · TypeScript

#31 qpdf/qpdf

qpdf: A content-preserving PDF document transformer

4,842 stars · C++

#32 prawnpdf/prawn

Fast, Nimble PDF Writer for Ruby

4,802 stars · Ruby

#33 firecrawl/pdf-inspector

Fast Rust library for PDF inspection, classification, and text extraction. Intelligently detects scanned vs text-based PDFs to enable smart routing decisions.

869 stars · Rust

#34 yfedoseev/pdf_oxide

The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF creation & editing. 0.8ms mean, 5× faster than industry leaders, 100% pass rate on 3,830 PDFs. MIT/Apache-2.0.

586 stars · Rust

#35 Aryan-Raj3112/episteme

A native Android document reader application built with Kotlin and Jetpack Compose.

470 stars · Kotlin

#36 WtfJoke/setup-tectonic

Sets up Tectonic in your GitHub Actions workflow so you can compile your LaTeX documents.

62 stars · TypeScript