Best Open Source PDF Libraries
A curated list of the most popular GitHub repositories tagged with pdf.
#1 justjavac/free-programming-books-zh_CN
:books: Free Chinese-language computer programming books; contributions welcome
#2 microsoft/markitdown
Python tool for converting files and office documents to Markdown.
#3 Stirling-Tools/Stirling-PDF
#1 PDF Application on GitHub that lets you edit PDFs on any device anywhere
#4 opendatalab/MinerU
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
#5 docling-project/docling
Get your documents ready for gen AI
#6 siyuan-note/siyuan
Privacy-first, self-hosted, fully open source personal knowledge management software, written in TypeScript and Go.
#7 paperless-ngx/paperless-ngx
A community-supported supercharged document management system: scan, index and archive all your documents
#8 ocrmypdf/OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
#9 PDFMathTranslate/PDFMathTranslate
[EMNLP 2025 Demo] PDF scientific paper translation with preserved formats. AI-based bilingual full-text translation of PDF documents that fully preserves the layout; supports services such as Google/DeepL/Ollama/OpenAI and provides CLI/GUI/MCP/Docker/Zotero.
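#10 hehonghui/awesome-english-ebooks
Free downloads of English-language magazines such as The Economist (with audio), The New Yorker, The Guardian, Wired, and The Atlantic, in epub, mobi, and pdf formats; updated weekly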
#11 posquit0/Awesome-CV
:page_facing_up: Awesome CV is a LaTeX template for your outstanding job application
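#12 forthespada/CS-Books
🔥🔥 More than 1,000 classic computer science books, personal notes, and resources referenced in the author's articles published on various platforms. Book topics include C/C++, Java, Python, Go, data structures and algorithms, operating systems, backend architecture, computer systems, databases, computer networking, design patterns, frontend, assembly, and interview write-ups from campus and experienced-hire recruiting.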
#13 koodo-reader/koodo-reader
A modern ebook manager and reader with sync and backup capabilities for Windows, macOS, Linux, Android, iOS and Web
#14 koreader/koreader
An ebook reader application supporting PDF, DjVu, EPUB, FB2 and many more formats, running on Cervantes, Kindle, Kobo, PocketBook and Android devices
#15 readest/readest
Readest is a modern, feature-rich ebook reader designed for avid readers, offering seamless cross-platform access, powerful tools, and an intuitive interface to elevate your reading experience.
#16 ether/etherpad-lite
Etherpad: A modern really-real-time collaborative document editor.
#17 diegomura/react-pdf
📄 Create PDF files using React
#18 salomonelli/best-resume-ever
:necktie: :briefcase: Build multiple beautiful resumes fast :rocket: and easily, and create your best CV ever! Made with Vue and LESS.
#19 mayooear/ai-pdf-chatbot-langchain
AI PDF chatbot agent built with LangChain & LangGraph
#20 sumatrapdfreader/sumatrapdf
SumatraPDF reader
#21 jsvine/pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
#22 py-pdf/pypdf
A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
#23 pymupdf/PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
#24 ahrm/sioyek
Sioyek is a PDF viewer with a focus on textbooks and research papers
#25 bytedance/Dolphin
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
#26 Kozea/WeasyPrint
The awesome document factory
#27 oomol-lab/pdf-craft
PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books.
#28 ciromattia/kcc
KCC (a.k.a. Kindle Comic Converter) is a comic and manga converter for ebook readers.
#29 hadyang/interview
A compilation of knowledge for Java written tests and interviews
#30 Hufe921/canvas-editor
Rich text editor built on canvas/svg
#31 qpdf/qpdf
qpdf: A content-preserving PDF document transformer
#32 prawnpdf/prawn
Fast, Nimble PDF Writer for Ruby
#33 firecrawl/pdf-inspector
Fast Rust library for PDF inspection, classification, and text extraction. Intelligently detects scanned vs text-based PDFs to enable smart routing decisions.
#34 yfedoseev/pdf_oxide
The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF creation & editing. 0.8ms mean, 5× faster than industry leaders, 100% pass rate on 3,830 PDFs. MIT/Apache-2.0.
#35 Aryan-Raj3112/episteme
A native Android document reader application built with Kotlin and Jetpack Compose.
#36 WtfJoke/setup-tectonic
Sets up Tectonic in your GitHub Actions workflow so you can compile your LaTeX documents.