Best Open Source RAG Libraries

A curated list of the most popular GitHub repositories tagged with rag. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.

#1 langgenius/dify

Production-ready platform for agentic workflow development.

129,920 · TypeScript

#2 langchain-ai/langchain

🦜🔗 The platform for reliable agents.

127,110 · Python

#3 open-webui/open-webui

User-friendly AI Interface (Supports Ollama, OpenAI API, ...)

124,513 · Python

#4 Shubhamsaboo/awesome-llm-apps

Collection of awesome LLM apps with AI Agents and RAG using OpenAI, Anthropic, Gemini, and open-source models.

96,413 · Python

#5 infiniflow/ragflow

RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs.

73,497 · Python

#6 PaddlePaddle/PaddleOCR

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

70,987 · Python

#7 dair-ai/Prompt-Engineering-Guide

🐙 Guides, papers, lessons, notebooks and resources for prompt engineering, context engineering, RAG, and AI Agents.

70,629 · MDX

#8 pathwaycom/llm-app

Ready-to-run cloud templates for RAG, AI pipelines, and enterprise search with live data. 🐳 Docker-friendly. ⚡ Always in sync with SharePoint, Google Drive, S3, Kafka, PostgreSQL, real-time data APIs, and more.

56,280 · Jupyter Notebook

#9 Mintplex-Labs/anything-llm

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.

54,814 · JavaScript

#10 FlowiseAI/Flowise

Build AI Agents, Visually

49,249 · TypeScript

#11 mem0ai/mem0

Universal memory layer for AI Agents

47,726 · Python

#12 run-llama/llama_index

LlamaIndex is the leading document agent and OCR platform

47,100 · Python

#13 jeecgboot/JeecgBoot

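[AI Low-Code Platform] An AI low-code platform that empowers enterprises to quickly develop low-code solutions and build AI applications. The AI application platform covers AI applications, AI models, AI chat assistants, knowledge bases, AI workflow orchestration, MCP and plugins, chat-driven business operations, and more. Powerful code generator: produces front-end and back-end code in one click, with no hand-written code required. Significantly improves efficiency and reduces costs without sacrificing flexibility.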

45,255 · Java

#14 milvus-io/milvus

Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search

42,914 · Go

#15 QuivrHQ/quivr

Opinionated RAG for integrating GenAI in your apps 🧠 Focus on your product rather than the RAG. Easy integration in existing products with customisation! Any LLM: GPT4, Groq, Llama. Any Vectorstore: PGVector, Faiss. Any Files. Any way you want.

38,947 · Python

#16 mindsdb/mindsdb

Federated Query Engine for AI - The only MCP Server you'll ever need

38,551 · Python

#17 chatchat-space/Langchain-Chatchat

Langchain-Chatchat (formerly Langchain-ChatGLM): a local knowledge-based RAG and Agent app built with Langchain and language models such as ChatGLM, Qwen, and Llama.

37,306 · Python

#18 khoj-ai/khoj

Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.

32,554 · Python

#19 microsoft/graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system

31,009 · Python

#20 patchy631/ai-engineering-hub

In-depth tutorials on LLMs, RAGs and real-world AI agent applications.

30,364 · Jupyter Notebook

#21 thedotmack/claude-mem

A Claude Code plugin that automatically captures everything Claude does during your coding sessions, compresses it with AI (using Claude's agent-sdk), and injects relevant context back into future sessions.

29,742 · TypeScript

#22 ItzCrazyKns/Perplexica

Perplexica is an AI-powered answering engine.

29,008 · TypeScript

#23 HKUDS/LightRAG

[EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"

28,485 · Python

#24 labring/FastGPT

FastGPT is a knowledge-based platform built on LLMs. It offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, letting you develop and deploy complex question-answering systems without extensive setup or configuration.

27,140 · TypeScript

#25 simstudioai/sim

Build, deploy, and orchestrate AI agents. Sim is the central intelligence layer for your AI workforce.

26,503 · TypeScript

#26 chroma-core/chroma

Open-source search and retrieval database for AI applications.

26,222 · Rust

#27 datawhalechina/happy-llm

📚 A from-scratch tutorial on the principles and practice of large language models

26,007 · Jupyter Notebook

#28 NirDiamant/RAG_Techniques

This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and contextually rich responses.

25,520 · Jupyter Notebook

#29 Cinnamon/kotaemon

An open-source RAG-based tool for chatting with your documents.

25,145 · Python

#30 langchain-ai/langgraph

Build resilient language agents as graphs.

24,913 · Python

#31 deepset-ai/haystack

Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.

24,250 · MDX

#32 getzep/graphiti

Build Real-Time Knowledge Graphs for AI Agents

22,975 · Python

#33 ScrapeGraphAI/Scrapegraph-ai

Python scraper based on AI

22,734 · Python

#34 vanna-ai/vanna

🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using Agentic Retrieval 🔄.

22,721 · Python

#35 datawhalechina/hello-agents

📚 "Building Agents from Scratch": a from-scratch tutorial on the principles and practice of AI agents

21,248 · Python

#36 1Panel-dev/MaxKB

🔥 MaxKB is an open-source platform for building enterprise-grade agents. A powerful, easy-to-use open-source enterprise agent platform.

20,146 · Python

#37 coze-dev/coze-studio

An AI agent development platform with all-in-one visual tools, simplifying agent creation, debugging, and deployment like never before. Coze your way to AI Agent creation.

19,897 · TypeScript

#38 humanlayer/12-factor-agents

What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers?

18,290 · TypeScript

#39 eosphoros-ai/DB-GPT

AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents

18,146 · Python

#40 arc53/DocsGPT

Private AI platform for agents, assistants and enterprise search. Built-in Agent Builder, Deep research, Document analysis, Multi-model support, and API connectivity for agents.

17,716 · Python

#41 elizaOS/eliza

Autonomous agents for everyone

17,566 · TypeScript

#42 onyx-dot-app/onyx

Open Source AI Platform - AI Chat with advanced features that works with every LLM

17,513 · Python

#43 Canner/WrenAI

⚡️ GenBI (Generative BI) queries any database in natural language, generates accurate SQL (Text-to-SQL), charts (Text-to-Chart), and AI-powered business intelligence in seconds.

14,443 · TypeScript

#44 ConardLi/easy-dataset

A powerful tool for creating datasets for LLM fine-tuning, RAG, and Eval

13,392 · JavaScript

#45 memvid/memvid

Memory layer for AI Agents. Replace complex RAG pipelines with a serverless, single-file memory layer. Give your agents instant retrieval and long-term memory.

13,175 · Rust

#46 Tencent/WeKnora

LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using the RAG paradigm.

13,064 · Go

#47 MODSetter/SurfSense

Connect any LLM to your internal knowledge sources and chat with it in real time alongside your team. OSS alternative to NotebookLM, Perplexity, and Glean. Join our Discord: https://discord.gg/ejRNvftDp9

13,002 · Python

#48 topoteretes/cognee

Knowledge Engine for AI Agent Memory in 6 lines of code

12,449 · Python

#49 ZhuLinsen/daily_stock_analysis

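LLM-powered intelligent analyzer for A-shares, H-shares, and US stocks: multi-source market data + real-time news + Gemini decision dashboard + multi-channel push notifications. Zero cost, completely free, runs on a schedule.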

12,353 · Python

#50 neuml/txtai

💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows

12,192 · Python