Best Open-Source PDF Libraries

A curated list of the most popular GitHub repositories tagged with the pdf topic, ranked by GitHub stars.

#1 justjavac/free-programming-books-zh_CN

📚 Free Chinese-language computer programming books; contributions welcome

116,333 stars

#2 microsoft/markitdown

Python tool for converting files and office documents to Markdown.

87,461 stars · Python

#3 Stirling-Tools/Stirling-PDF

#1 PDF Application on GitHub that lets you edit PDFs on any device anywhere

74,425 stars · TypeScript

#4 opendatalab/MinerU

Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

54,581 stars · Python

#5 docling-project/docling

Get your documents ready for gen AI

53,757 stars · Python

#6 siyuan-note/siyuan

Privacy-first, self-hosted, fully open-source personal knowledge management software, written in TypeScript and Go.

41,378 stars · TypeScript

#7 paperless-ngx/paperless-ngx

A community-supported supercharged document management system: scan, index and archive all your documents

36,804 stars · Python

#8 ocrmypdf/OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

32,673 stars · Python

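#9 PDFMathTranslate/PDFMathTranslate

[EMNLP 2025 Demo] PDF scientific paper translation with preserved formats: AI-powered, full-text bilingual translation of PDF documents that completely preserves the layout, supporting Google, DeepL, Ollama, OpenAI and other services, and available via CLI, GUI, MCP, Docker and Zotero

31,844 stars · Python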

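#10 hehonghui/awesome-english-ebooks

Free downloads of English-language magazines such as The Economist (with audio), The New Yorker, The Guardian, Wired and The Atlantic, in EPUB, MOBI and PDF formats, updated weekly

29,136 stars · CSS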

#11 posquit0/Awesome-CV

📄 Awesome CV is a LaTeX template for your outstanding job application

26,790 stars · TeX

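#12 forthespada/CS-Books

🔥🔥 More than 1,000 classic computer science books, plus personal notes and the resources referenced in the author's articles on various platforms. The books cover C/C++, Java, Python, Go, data structures and algorithms, operating systems, backend architecture, computer systems, databases, computer networking, design patterns, frontend, assembly, and a range of interview write-ups from campus and experienced-hire recruiting

26,291 stars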

#13 koodo-reader/koodo-reader

A modern ebook manager and reader with sync and backup capabilities for Windows, macOS, Linux, Android, iOS and Web

26,067 stars · JavaScript

#14 koreader/koreader

An ebook reader application supporting PDF, DjVu, EPUB, FB2 and many more formats, running on Cervantes, Kindle, Kobo, PocketBook and Android devices

25,486 stars · Lua

#15 ether/etherpad-lite

Etherpad: A modern really-real-time collaborative document editor.

18,135 stars · TypeScript

#16 readest/readest

Readest is a modern, feature-rich ebook reader designed for avid readers, offering seamless cross-platform access, powerful tools, and an intuitive interface to elevate your reading experience.

17,836 stars · TypeScript

#17 xournalpp/xournalpp

Xournal++ is a handwriting notetaking software with PDF annotation support. Written in C++ with GTK3, supporting Linux (e.g. Ubuntu, Debian, Arch, SUSE), macOS and Windows 10. Supports pen input from devices such as Wacom Tablets.

14,331 stars · C++

#18 Unstructured-IO/unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking and embedding.

14,014 stars · HTML

#19 kekingcn/kkFileView

Universal online file preview project based on Spring Boot

13,958 stars · Java

#20 janishar/mit-deep-learning-book-pdf

MIT Deep Learning Book in PDF format (complete and parts) by Ian Goodfellow, Yoshua Bengio and Aaron Courville

13,906 stars · Java

#21 QuestPDF/QuestPDF

QuestPDF is a modern library for PDF document generation. Its fluent C# API lets you design complex layouts with clean, readable code. Create documents using a flexible, component-based approach.

13,821 stars · C#

#22 Zettlr/Zettlr

Your One-Stop Publication Workbench

12,543 stars · TypeScript

#23 documenso/documenso

The Open Source DocuSign Alternative.

12,427 stars · TypeScript

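#24 wmjordan/PDFPatcher

PDF补丁丁 (PDFPatcher): a PDF toolbox that can edit bookmarks, crop and rotate pages, remove restrictions, extract or merge documents, inspect document structure, extract images, convert pages to images, and more

12,161 stars · C#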

#25 getomni-ai/zerox

OCR & Document Extraction using vision models

12,140 stars · TypeScript

#26 Kareadita/Kavita

Kavita is a fast, feature-rich, cross-platform reading server, built with the goal of being a complete solution for all your reading needs. Set up your own server and share your reading collection with your friends and family.

9,870 stars · C#

#27 py-pdf/pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

9,822 stars · Python

#28 jsvine/pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera, and easily extract text and tables.

9,741 stars · Python

#29 yusufkaraaslan/Skill_Seekers

Convert documentation websites, GitHub repositories, and PDFs into Claude AI skills with automatic conflict detection

9,680 stars · Python

#30 pymupdf/PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

9,097 stars · Python

#31 bytedance/Dolphin

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

8,827 stars · Python

#32 Kozea/WeasyPrint

The awesome document factory

8,658 stars · Python

#33 pdfminer/pdfminer.six

Community-maintained fork of pdfminer: we fathom PDF

6,904 stars · Python

#34 smuyyh/BookReader

📕 "任阅": a web novel reader with 3D page-turning effects, TXT/PDF/EPUB book reading, and Wi-Fi book transfer

6,865 stars · Java