Best Open Source Information Retrieval Libraries
A curated list of the most popular GitHub repositories tagged with information retrieval. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.
#1 JaidedAI/EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.
#2 deepset-ai/haystack
Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.
#3 arc53/DocsGPT
Private AI platform for agents, assistants and enterprise search. Built-in Agent Builder, Deep research, Document analysis, Multi-model support, and API connectivity for agents.
#4 onyx-dot-app/onyx
Open Source AI Platform - AI Chat with advanced features that works with every LLM
#5 Unstructured-IO/unstructured
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
#6 neuml/txtai
💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows