
Best Open Source PDF Libraries

A curated list of the most popular GitHub repositories tagged with pdf. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.

#1 justjavac/free-programming-books-zh_CN

📚 Free Chinese-language computer programming books; contributions welcome

116,465 stars

#2 microsoft/markitdown

Python tool for converting files and office documents to Markdown.

90,864 stars · Python

#3 Stirling-Tools/Stirling-PDF

#1 PDF Application on GitHub that lets you edit PDFs on any device anywhere

75,401 stars · TypeScript

#4 opendatalab/MinerU

Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

56,399 stars · Python

#5 docling-project/docling

Get your documents ready for gen AI

55,956 stars · Python

#6 siyuan-note/siyuan

A privacy-first, self-hosted, fully open source personal knowledge management software, written in TypeScript and Go.

41,915 stars · TypeScript

#7 paperless-ngx/paperless-ngx

A community-supported supercharged document management system: scan, index and archive all your documents

37,418 stars · Python

#8 ocrmypdf/OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

32,961 stars · Python

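#9 PDFMathTranslate/PDFMathTranslate

[EMNLP 2025 Demo] PDF scientific paper translation with preserved formats: AI-based full-text bilingual translation of PDF documents that fully preserves the layout, supporting services such as Google, DeepL, Ollama, and OpenAI, and offered via CLI, GUI, MCP, Docker, and Zotero

32,297 stars · Python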

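#10 hehonghui/awesome-english-ebooks

Free downloads of English-language magazines such as The Economist (with audio), The New Yorker, The Guardian, Wired, and The Atlantic, in EPUB, MOBI, and PDF formats, updated weekly

29,609 stars · CSS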

#11 posquit0/Awesome-CV

📄 Awesome CV is a LaTeX template for your outstanding job application

27,036 stars · TeX

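#12 forthespada/CS-Books

🔥🔥 More than 1,000 classic computer science books, personal notes, and resources referenced in the author's articles on various platforms. The books cover C/C++, Java, Python, Go, data structures and algorithms, operating systems, backend architecture, computer systems, databases, computer networking, design patterns, front-end development, and assembly language, plus a range of interview write-ups from campus and experienced-hire recruiting.

26,476 stars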

#13 koodo-reader/koodo-reader

A modern ebook manager and reader with sync and backup capabilities for Windows, macOS, Linux, Android, iOS and Web

26,294 stars · JavaScript

#14 koreader/koreader

An ebook reader application supporting PDF, DjVu, EPUB, FB2 and many more formats, running on Cervantes, Kindle, Kobo, PocketBook and Android devices

25,799 stars · Lua

#15 opendataloader-project/opendataloader-pdf

PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.

21,855 stars · Java

#16 readest/readest

Readest is a modern, feature-rich ebook reader designed for avid readers offering seamless cross-platform access, powerful tools, and an intuitive interface to elevate your reading experience.

18,756 stars · TypeScript

#17 ether/etherpad-lite

Etherpad: A modern really-real-time collaborative document editor.

18,185 stars · TypeScript

#18 diegomura/react-pdf

📄 Create PDF files using React

16,469 stars · TypeScript

#19 salomonelli/best-resume-ever

👔 💼 Build multiple beautiful resumes fast 🚀 and easily, and create your best CV ever! Made with Vue and LESS.

16,463 stars · Vue

#20 mayooear/ai-pdf-chatbot-langchain

AI PDF chatbot agent built with LangChain & LangGraph

16,395 stars · TypeScript

#21 sumatrapdfreader/sumatrapdf

SumatraPDF reader

16,256 stars · C

#22 documenso/documenso

The Open Source DocuSign Alternative.

13,149 stars · TypeScript

#23 T8RIN/ImageToolbox

🖼️ Image Toolbox is a powerful app for advanced image manipulation. It offers dozens of features, from basic tools like crop and draw to filters, OCR, and a wide range of image processing options

13,010 stars · Kotlin

#24 jsvine/pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

9,934 stars · Python

#25 py-pdf/pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

9,875 stars · Python

#26 pymupdf/PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

9,256 stars · Python

#27 ahrm/sioyek

Sioyek is a PDF viewer with a focus on textbooks and research papers

9,228 stars · C

#28 bytedance/Dolphin

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

8,865 stars · Python

#29 Kozea/WeasyPrint

The awesome document factory

8,725 stars · Python

#30 oomol-lab/pdf-craft

PDF craft converts PDF files into various other formats, with a focus on processing PDFs of scanned books.

4,959 stars · Python

#31 ciromattia/kcc

KCC (a.k.a. Kindle Comic Converter) is a comic and manga converter for ebook readers.

4,882 stars · Python

#32 hadyang/interview

Notes on Java knowledge for written tests and interviews

4,848 stars

#33 Hufe921/canvas-editor

A rich text editor built on canvas/SVG

4,843 stars · TypeScript

#34 qpdf/qpdf

qpdf: A content-preserving PDF document transformer

4,842 stars · C++

#35 prawnpdf/prawn

Fast, Nimble PDF Writer for Ruby

4,802 stars · Ruby

#36 tecnickcom/tc-lib-pdf

PHP PDF Library (TCPDF)

1,839 stars · PHP

#37 tobya/DocTo

Simple command line utility for converting .doc & .xls files to any supported format such as Text, RTF, CSV or PDF

506 stars · Pascal

#38 SimplePDF/simplepdf-embed

PDF editor in the browser – add text, checkboxes, pictures, signatures to PDF files. Merge, rotate PDF pages – iframe, script and React component

403 stars · TypeScript

#39 datadrivenconstruction/OpenConstructionERP

Open-source construction ERP - BOQ, PDF/CAD/BIM takeoff, AI cost matching. 42 regional catalogues, 21 languages, 71 modules. AGPL-3.0. v3.0 - pip install openconstructionerp

292 stars · TypeScript

#40 shipfastlabs/parsel

A fast, helpful, and open-source document parser for PHP

165 stars · PHP