Best Open Source Evaluation Libraries

A curated list of the most popular GitHub repositories tagged with evaluation.

#1 mlflow/mlflow

The open source developer platform to build AI agents and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integrated platform.

24,349 · Python

#2 langfuse/langfuse

🪢 Open source LLM engineering platform: LLM observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, LangChain, OpenAI SDK, LiteLLM, and more. 🍊 YC W23

22,137 · TypeScript

#3 comet-ml/opik

Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.

17,798 · Python

#4 Tencent/WeKnora

LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using the RAG paradigm.

13,064 · Go

#5 vibrantlabsai/ragas

Supercharge Your LLM Application Evaluations 🚀

12,669 · Python

#6 oumi-ai/oumi

Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!

8,860 · Python

#7 open-compass/opencompass

OpenCompass is an LLM evaluation platform that supports a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) on 100+ datasets.

6,679 · Python
