
inclusionAI / AReaL

Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.

4,817 stars · 413 forks · 43 open issues

Languages: Python · Jupyter Notebook · Dockerfile

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing inclusionAI/AReaL in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.
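The on-demand loading described above can be pictured with a toy sketch: whole files enter the model context only when an analysis first touches them, rather than being pre-chunked into embedding fragments. All names here (`OnDemandContext`, `build_prompt`) are hypothetical illustrations, not RepoMind's actual engine.

```python
from pathlib import Path

class OnDemandContext:
    """Toy sketch of context-augmented generation: full source files
    are loaded lazily when an analysis requests them, instead of
    pre-chunking the repository into retrieval fragments."""

    def __init__(self, repo_root):
        self.repo_root = Path(repo_root)
        self.loaded = {}  # rel_path -> full file text, cached after first use

    def load(self, rel_path):
        # Lazy load: read the complete file the first time it is requested.
        if rel_path not in self.loaded:
            self.loaded[rel_path] = (self.repo_root / rel_path).read_text()
        return self.loaded[rel_path]

    def build_prompt(self, question, paths):
        # Concatenate whole files (not chunks) ahead of the user question.
        sources = "\n\n".join(f"### {p}\n{self.load(p)}" for p in paths)
        return f"{sources}\n\nQuestion: {question}"
```

Because each file is cached after its first read, repeated questions about the same module pay the loading cost only once.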

Source files are only loaded when you start an analysis to optimize performance.

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/inclusionAI/AReaL)

Repository Overview (README excerpt)


# AReaL: A Large-Scale Asynchronous Reinforcement Learning System

| Paper | Documentation | 中文文档 (Chinese Docs) | Ask DeepWiki | 🤗 Models & Data | WeChat (微信) Group |

AReaL is an open-source **fully asynchronous** reinforcement learning training system for large **reasoning and agentic models**, developed by members from Tsinghua IIIS and the AReaL Team at Ant Group. Built upon the open-source project ReaLHF, we are fully committed to open-source principles: we provide the training details, data, and infrastructure required to reproduce our results, along with the models themselves. AReaL aims to help everyone build their own AI agents easily and affordably. Our team loves milk tea because it's delicious, customizable, and affordable, and we hope you enjoy our project just as much as you'd enjoy real milk tea. Cheers!

**AReaL Highlights**

- ⚡ **Flexibility**: Seamless customization for agentic RL and online RL training by simply replacing the .
- 📈 **Scalability**: **Stable** fully asynchronous RL training with **industry-leading speed**.
- ✨ **Cutting-Edge Performance**: State-of-the-art math, coding, search, and customer service agents.

## 📰 News

- **[2026/03/02]** We provide a complete example to train your own 🦞 OpenClaw agent by simply replacing the and with AReaL's RL service: no complicated dependencies, no code changes, and it works with any agentic runtime!
- **[2026/02/06]** We are delighted to introduce **AReaL-SEA**, a self-evolving data synthesis engine. Combined with RL training on AReaL, the 235B MoE model surpasses GPT 5 and achieves performance comparable to Gemini 3.0 Pro on $\tau^2$-bench! Check out the paper, model, data, and code.
- **[2026/01/15]** Congrats to our friends at CAMEL-AI for open-sourcing SETA, their terminal-agent RL project trained with AReaL! Check out their training workflow and the announcement on X.
## 📋 Previous Releases

- **[2026/01/01]** Happy New Year! Thanks to the outstanding contribution from @HwVanICI, we are excited to officially announce stable support for AReaL training on **Ascend NPU devices**! The code is actively maintained and continuously updated in the branch. Check out our documentation to get started, and feel free to report any issues!
- **[2025/08/30]** Introducing ASearcher, a state-of-the-art search agent built with AReaL's end-to-end asynchronous RL training. Check out the paper and the open-source repository!
- **[2025/07/31] (AReaL-lite)** We introduce AReaL-lite, a **lightweight** version of AReaL designed specifically for AI researchers and rapid prototyping. AReaL-lite features an **algorithm-first** API design that prioritizes ease of use and algorithm development, while natively supporting **fully asynchronous agentic RL**. With 80% fewer lines of code, AReaL-lite maintains 90% of AReaL's performance and core functionality. Check out our AReaL-lite design documentation and the quickstart guide to begin your journey with **AReaL-lite**!
- **[2025/06/03] (v0.3, boba²)** We release **boba²** (double-boba) for fully asynchronous RL training, which achieves a **2.77× speedup while delivering comparable or superior training performance** compared to synchronous systems. Furthermore, asynchronous RL significantly simplifies multi-turn agentic RL training setup! Check out our v0.3 overview blog and the research paper.
- **[2025/03/31] (v0.2, boba)** Introducing our milestone release: boba! Please call it A-ReaL-boba! This release features significantly faster training with SGLang support and state-of-the-art 7B and 32B models for mathematical reasoning. Check out our v0.2 technical blog.
- **[2025/02/24] (v0.1)** Our initial release includes reproducible results for 1.5B and 7B Large Reasoning Models (LRMs). Check out our v0.1 technical blog.
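The fully asynchronous design behind boba² decouples rollout generation from trainer updates: generation workers stream trajectories continuously while the trainer consumes whatever is ready, instead of the two phases alternating in lockstep. A minimal producer/consumer sketch of that pattern follows; every name here is hypothetical and the off-policy correction is elided, so this is a conceptual illustration, not AReaL's actual API.

```python
import queue
import threading

def rollout_worker(rollout_q, stop, policy_version):
    """Producer: generate trajectories continuously, never waiting for
    the trainer -- the core idea of fully asynchronous RL."""
    step = 0
    while not stop.is_set():
        traj = {"tokens": [step], "behavior_version": policy_version[0]}
        try:
            rollout_q.put(traj, timeout=0.1)
            step += 1
        except queue.Full:
            pass  # trainer is busy; keep checking the stop flag

def trainer(rollout_q, stop, policy_version, total_updates, batch_size=4):
    """Consumer: pull whichever trajectories are ready and update the
    policy. Stale samples (behavior_version behind the current policy)
    would be corrected off-policy, e.g. by importance weighting (omitted)."""
    for _ in range(total_updates):
        batch = [rollout_q.get() for _ in range(batch_size)]
        # ... compute the corrected policy-gradient loss on `batch` ...
        policy_version[0] += 1  # "publish" new weights to the worker
    stop.set()

def run_async_rl(total_updates=8):
    rollout_q = queue.Queue(maxsize=64)
    stop = threading.Event()
    policy_version = [0]  # shared, mutable policy-version counter
    w = threading.Thread(target=rollout_worker,
                         args=(rollout_q, stop, policy_version))
    t = threading.Thread(target=trainer,
                         args=(rollout_q, stop, policy_version, total_updates))
    w.start(); t.start()
    t.join(); w.join()
    return policy_version[0]
```

The speedup comes from neither side idling: the generator never blocks on a weight update, and the trainer never waits for a full synchronous batch of fresh rollouts.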
## 🚀 Getting Started

First, install the package. Our training scripts automatically download the required dataset (openai/gsm8k) and model (Qwen/Qwen2-1.5B-Instruct). You can run training on a single node, or on a Ray cluster with 2 nodes and 8 GPUs per node (remember to update the paths in the YAML file to point to your shared storage). For comprehensive setup instructions, see our quickstart guide.

## 📚 Examples

### Math & Reasoning

| Task | Description | Performance |
| --- | --- | --- |
| **Math** | GSM8K math reasoning with GRPO, PPO, DAPO, REINFORCE, RLOO, LitePPO, DR-GRPO, GSPO, and more | - |
| **Multi-Turn Math** | Multi-turn math agent with reward discounting across turns | Training Curve |
| **LoRA Math** | Parameter-efficient math training with LoRA (SGLang/vLLM backends) | - |
| **Countdown** | Countdown numbers game with custom rewards | Training Curve |

### Agentic RL

| Task | Description | Performance |
| --- | --- | --- |
| **General Agent** | General agentic training with any agentic framework | Guide |
| **Tau2 Customer Service** | Customer service agent on Tau2-Bench (retail, airline, telecom) | Paper |
| **Search Agent** | End-to-end search agent with Tongyi-DeepResearch workflow | Training Curve |
| **Tool-Integrated Reasoning** | Multi-turn tool calling during reasoning (Python executor, calculator) | Training Curve |
| **OpenAI Agents Integration** | Integration with the OpenAI Agents SDK for agentic workflows | - |
| **CAMEL-AI Integration** | Integration with the CAMEL-AI framework for agentic RL | - |

### Vision-Language Models

| Task | Description | Performance |
| --- | --- | --- |
| **VLM** | Geometry3K and CLEVR Count 70K visual reasoning with GRPO | - |
| **VLM on NPU** | … | |
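The Multi-Turn Math example mentions "reward discounting across turns": each turn earns its own reward plus exponentially discounted credit from later turns, so early turns that set up a correct final answer are still rewarded. The sketch below is a generic formulation of that credit assignment, not AReaL's exact implementation.

```python
def discounted_turn_rewards(turn_rewards, gamma=0.9):
    """Return per-turn returns: each turn receives its own reward plus
    gamma-discounted credit from every later turn (generic multi-turn
    credit assignment; the discount factor here is illustrative)."""
    returns = []
    running = 0.0
    # Walk backwards so each turn accumulates discounted future reward.
    for r in reversed(turn_rewards):
        running = r + gamma * running
        returns.append(running)
    return list(reversed(returns))
```

For example, with rewards `[0.0, 0.0, 1.0]` (only the final turn solves the problem) and `gamma=0.5`, the per-turn returns are `[0.25, 0.5, 1.0]`: earlier turns share in the eventual success, at a discount.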