Best Open Source evaluation Libraries
A curated list of the most popular GitHub repositories tagged with evaluation. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.
#1mlflow/mlflow
The open source developer platform to build AI agents and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integrated platform.
#2langfuse/langfuse
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
#3comet-ml/opik
Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
#4Tencent/WeKnora
LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.
#5vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
#6oumi-ai/oumi
Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!
#7open-compass/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2,GPT-4,LLaMa2, Qwen,GLM, Claude, etc) over 100+ datasets.