
rllm-org / rllm

Democratizing Reinforcement Learning for LLMs

View on GitHub
5,268 stars
522 forks
161 issues

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing rllm-org/rllm in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.
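The core idea behind loading whole files on demand, as opposed to retrieving chunks, can be sketched in a few lines. Everything below is hypothetical illustration, not RepoMind's actual API: `build_context` and `pick_files` are made-up names for "choose the relevant files, then inline each one in full".

```python
import tempfile
from pathlib import Path

def build_context(question: str, repo_root: str, pick_files) -> str:
    """Assemble a prompt from whole source files chosen for this question.

    Unlike chunk-based RAG, each selected file goes into the context in
    full, so the model sees complete definitions, not fragments.
    """
    sections = []
    for rel_path in pick_files(question):  # the agent decides which files matter
        text = (Path(repo_root) / rel_path).read_text()
        sections.append(f"=== {rel_path} ===\n{text}")
    return "\n\n".join(sections) + f"\n\nQuestion: {question}"

# Demo on a throwaway "repo" containing a single file.
repo = tempfile.mkdtemp()
Path(repo, "api.py").write_text("def hello():\n    return 'world'\n")
prompt = build_context("What does api.py export?", repo, lambda q: ["api.py"])
```

The trade-off this sketch illustrates: full files cost more tokens per query, but the model never sees a function split away from its imports or callers.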

Source files are only loaded when you start an analysis to optimize performance.

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/rllm-org/rllm)

Repository Overview (README excerpt)


# rLLM

**Train your AI agents with RL. Any framework. Minimal code changes.**

rLLM is an open-source framework for training AI agents with reinforcement learning. Swap in a tracked client, define a reward function, and let RL handle the rest — no matter what agent framework you use.

## Core Features

- **Works with any agent framework** — LangGraph, SmolAgent, Strands, OpenAI Agents SDK, Google ADK, or a plain client. Just swap the client. šŸ”Œ
- **Near-zero code changes** — wrap your agent code and rLLM traces every LLM call automatically. šŸŖ„
- **CLI-first workflow** — eval and train from the command line with 50+ built-in benchmarks. ⚔
- **Battle-tested results** — rLLM-trained agents beat models 50x their size (4B outperforms 235B on finance, 1.5B surpasses O1-Preview on math). šŸ“ˆ
- **Multiple RL algorithms** — GRPO, REINFORCE, RLOO, rejection sampling, and more. 🧠
- **Two training backends** — verl for distributed multi-GPU training, Tinker for single-machine / CPU setups. Same API either way. šŸ”§

Read more on our documentation site.

## Installation

You can install rLLM either directly via pip or by building from source. The default pip install pulls in the dependencies for the rllm CLI, which uses Tinker as its training backend. Using verl as the training backend requires a GPU machine and a separate install; for that, and for building from source or Docker, see the installation guide.

## Quickstart

**Option A: CLI** (no code needed).

**Option B: Python API.** Define a rollout (your agent) and an evaluator (your reward function), then hand them to the trainer. During training, the client points to a gateway that transparently captures token IDs and logprobs — your agent code stays the same for eval and training. See the cookbooks for complete working examples (single-turn VLM solver, multi-agent solver-judge, and more).

## Architecture

rLLM follows a simple pipeline: **run your agent → collect traces → compute rewards → update the model**.
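In pseudocode-style Python, that run → trace → reward → update loop reduces to something like the sketch below. Every name here is hypothetical and this is not rLLM's actual API; the group-relative advantage in step 3 is one common choice (GRPO-style), shown only to make the flow concrete.

```python
def train(agent, tasks, reward_fn, update_policy, n_rollouts=4, epochs=1):
    """Toy pipeline: run agent -> collect traces -> compute rewards ->
    update the model. Every name here is made up for illustration."""
    for _ in range(epochs):
        batch = []
        for task in tasks:
            # 1. Run the agent several times per task; each run yields a
            #    trace (the sequence of LLM calls the SDK would intercept).
            group = [agent(task) for _ in range(n_rollouts)]
            # 2. Score each trace with the user-supplied reward function.
            rewards = [reward_fn(task, trace) for trace in group]
            # 3. Group-relative advantage (GRPO-style): reward minus the
            #    mean reward of the group for the same task.
            mean_r = sum(rewards) / len(rewards)
            batch.extend((t, r - mean_r) for t, r in zip(group, rewards))
        # 4. Hand (trace, advantage) pairs to the backend for the update.
        update_policy(batch)
```

A real system parallelizes step 1 and replaces step 4 with an actual policy-gradient update, but the data flow is the same.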
Your agent runs as-is — rLLM's SDK intercepts LLM calls and structures them into **Episodes** (one task) containing **Trajectories** (one agent run) made of **Steps** (one LLM call). A reward function scores the result, and the RL algorithm updates the model weights. The same agent code works for both eval and training.

Under the hood:

- **Workflow Engine** runs N parallel agent instances to collect rollouts
- **LiteLLM Proxy** routes requests and captures token IDs + logprobs
- **Transform Pipeline** groups trajectories for advantage computation
- **Training Backend** (verl or tinker) handles the policy update

## Community Projects

- Tongyi DeepResearch — Open-source AI researchers by Alibaba NLP
- Terminal-Bench-RL — Training long-horizon terminal agents with RL
- PettingLLMs — Multi-agent RL with on-policy training
- SETA — Scaling environments for terminal agents
- LLM-in-Sandbox — Building general agents by running LLMs in a sandbox
- Cogito, Ergo Ludo — An agent that learns to play by reasoning and planning
- Cut the Bill, Keep the Turns — Affordable multi-turn search RL
- Experiential Reinforcement Learning — Experience-reflection-consolidation loop for RL with sparse rewards
- V1: Unifying Generation and Self-Verification — Pairwise self-verification for parallel test-time scaling

## Articles & Blog Posts

- rLLM UI: Real-Time Observability Tool for Agent Training & Evaluation — Mar 2026
- rLLM On-Policy Distillation: Training Smaller Students from Stronger Teachers — Mar 2026
- Faster and Better: Open-Source Recipe for Deep Research Agents with Fully Async Training — Feb 2026
- rLLM-FinQA: How a 4B Model Outperforms 235B and Rivals Gemini 2.5 Pro on Financial Analysis — Feb 2026
- rLLM SDK: Training Any Agentic Program without Code Changes — Dec 2025
- rLLM v0.2: RL Training for General Agentic Programs — Oct 2025
- DeepSWE: Open-source SWE Agent via RL — Jul 2025
- DeepCoder: 14B Coder at O3-mini Level — Apr 2025
- DeepScaleR: 1.5B Surpasses O1-Preview — Feb 2025

## Acknowledgements

Our work is done as part of the Berkeley Sky Computing Lab. The rLLM team is generously supported by grants from Laude Institute, AWS, Hyperbolic, Fireworks AI, and Modal. Special thanks to Together AI for the research partnership and compute support.

## Citation

You may also cite our prior work: DeepScaleR, DeepCoder, and DeepSWE.