Best Open Source pdf Libraries
A curated list of the most popular GitHub repositories tagged with pdf. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.
#1justjavac/free-programming-books-zh_CN
:books: 免费的计算机编程类中文书籍,欢迎投稿
#2microsoft/markitdown
Python tool for converting files and office documents to Markdown.
#3Stirling-Tools/Stirling-PDF
#1 PDF Application on GitHub that lets you edit PDFs on any device anywhere
#4opendatalab/MinerU
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
#5docling-project/docling
Get your documents ready for gen AI
#6siyuan-note/siyuan
A privacy-first, self-hosted, fully open source personal knowledge management software, written in typescript and golang.
#7paperless-ngx/paperless-ngx
A community-supported supercharged document management system: scan, index and archive all your documents
#8ocrmypdf/OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
#9PDFMathTranslate/PDFMathTranslate
[EMNLP 2025 Demo] PDF scientific paper translation with preserved formats - 基于 AI 完整保留排版的 PDF 文档全文双语翻译,支持 Google/DeepL/Ollama/OpenAI 等服务,提供 CLI/GUI/MCP/Docker/Zotero
#10hehonghui/awesome-english-ebooks
经济学人(含音频)、纽约客、卫报、连线、大西洋月刊等英语杂志免费下载,支持epub、mobi、pdf格式, 每周更新
#11posquit0/Awesome-CV
:page_facing_up: Awesome CV is LaTeX template for your outstanding job application
#12forthespada/CS-Books
🔥🔥超过1000本的计算机经典书籍、个人笔记资料以及本人在各平台发表文章中所涉及的资源等。书籍资源包括C/C++、Java、Python、Go语言、数据结构与算法、操作系统、后端架构、计算机系统知识、数据库、计算机网络、设计模式、前端、汇编以及校招社招各种面经~
#13koodo-reader/koodo-reader
A modern ebook manager and reader with sync and backup capacities for Windows, macOS, Linux, Android, iOS and Web
#14koreader/koreader
An ebook reader application supporting PDF, DjVu, EPUB, FB2 and many more formats, running on Cervantes, Kindle, Kobo, PocketBook and Android devices
#15ether/etherpad-lite
Etherpad: A modern really-real-time collaborative document editor.
#16readest/readest
Readest is a modern, feature-rich ebook reader designed for avid readers offering seamless cross-platform access, powerful tools, and an intuitive interface to elevate your reading experience.
#17xournalpp/xournalpp
Xournal++ is a handwriting notetaking software with PDF annotation support. Written in C++ with GTK3, supporting Linux (e.g. Ubuntu, Debian, Arch, SUSE), macOS and Windows 10. Supports pen input from devices such as Wacom Tablets.
#18Unstructured-IO/unstructured
Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise grade Platform product for production grade workflows, partitioning, enrichments, chunking and embedding.
#19kekingcn/kkFileView
Universal File Online Preview Project based on Spring-Boot
#20janishar/mit-deep-learning-book-pdf
MIT Deep Learning Book in PDF format (complete and parts) by Ian Goodfellow, Yoshua Bengio and Aaron Courville
#21QuestPDF/QuestPDF
QuestPDF is a modern library for PDF document generation. Its fluent C# API lets you design complex layouts with clean, readable code. Create documents using a flexible, component-based approach.
#22Zettlr/Zettlr
Your One-Stop Publication Workbench
#23documenso/documenso
The Open Source DocuSign Alternative.
#24wmjordan/PDFPatcher
PDF补丁丁——PDF工具箱,可以编辑书签、剪裁旋转页面、解除限制、提取或合并文档,探查文档结构,提取图片、转成图片等等
#25getomni-ai/zerox
OCR & Document Extraction using vision models
#26Kareadita/Kavita
Kavita is a fast, feature rich, cross platform reading server. Built with the goal of being a full solution for all your reading needs. Setup your own server and share your reading collection with your friends and family.
#27py-pdf/pypdf
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
#28jsvine/pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
#29yusufkaraaslan/Skill_Seekers
Convert documentation websites, GitHub repositories, and PDFs into Claude AI skills with automatic conflict detection
#30pymupdf/PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
#31bytedance/Dolphin
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
#32Kozea/WeasyPrint
The awesome document factory
#33pdfminer/pdfminer.six
Community maintained fork of pdfminer - we fathom PDF
#34smuyyh/BookReader
:closed_book: "任阅" 网络小说阅读器,3D翻页效果、txt/pdf/epub书籍阅读、Wifi传书~