
InternLM / xtuner

A Next-Generation Training Engine Built for Ultra-Large MoE Models

5,100 stars
406 forks
318 issues
Python · Dockerfile · Shell

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing InternLM/xtuner in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.
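The load-whole-files-on-demand idea can be sketched in a few lines. This is an illustrative toy, not RepoMind's actual engine (which is not public): all names (`select_files`, `build_context`, the character budget) are hypothetical. The point is the contrast with chunked RAG: a file is either included in full or listed by name, never fragmented.

```python
# Toy sketch of on-demand whole-file context loading (hypothetical API,
# not RepoMind's real implementation). Instead of retrieving pre-chunked
# snippets (RAG), we pick files relevant to a question and place their
# FULL contents into the model's context.

def select_files(question, repo_files):
    """Naive relevance filter: keep files whose path or content
    mentions a keyword from the question."""
    keywords = {w.lower() for w in question.split() if len(w) > 3}
    return {
        path: text
        for path, text in repo_files.items()
        if any(k in path.lower() or k in text.lower() for k in keywords)
    }

def build_context(question, repo_files, budget_chars=8000):
    """Concatenate whole files (never fragments) until the budget is
    exhausted; a file that does not fit is skipped, not truncated."""
    parts, used = [], 0
    for path, text in select_files(question, repo_files).items():
        if used + len(text) > budget_chars:
            continue  # skip the file entirely rather than fragment it
        parts.append(f"### {path}\n{text}")
        used += len(text)
    return "\n\n".join(parts)

repo = {
    "train/engine.py": "class Engine: ...  # training loop",
    "docs/readme.md": "XTuner V1 training engine overview",
}
ctx = build_context("How does the training engine work?", repo)
print("train/engine.py" in ctx)  # → True
```

The design choice worth noting is the skip-don't-truncate rule: a half-file in context is exactly the fragmentation this approach is meant to avoid.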

Source files are only loaded when you start an analysis to optimize performance.

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/InternLM/xtuner)
Preview: Analyzed by RepoMind

Repository Overview (README excerpt)


👋 Join us | 🔍 Explore our models | English | 简体中文 (Simplified Chinese) | 🚀 Speed Benchmark

🎉 News

- **[2025/09]** XTuner V1 released! A next-generation training engine built for ultra-large MoE models

📖 XTuner V1

XTuner V1 is a next-generation LLM training engine designed specifically for ultra-large-scale MoE models. Unlike traditional 3D parallel training architectures, XTuner V1 is optimized for the mainstream MoE training scenarios prevalent in today's academic research.

Key Features

**📊 Dropless Training**

- **Scalable without complexity:** Train 200B-scale MoE models without expert parallelism; 600B models require only intra-node expert parallelism
- **Optimized parallelism strategy:** A smaller expert-parallelism dimension than traditional 3D approaches, enabling more efficient dropless training

**📝 Long Sequence Support**

- **Memory-efficient design:** Train 200B MoE models on 64k-token sequences without sequence parallelism, thanks to advanced memory optimization techniques
- **Flexible scaling:** Full support for DeepSpeed Ulysses sequence parallelism, with the maximum sequence length scaling linearly
- **Robust performance:** Maintains stability despite expert load imbalance during long-sequence training

**⚡ Superior Efficiency**

- **Massive scale:** Supports MoE training up to 1T parameters
- **Breakthrough performance:** First to achieve FSDP training throughput that surpasses traditional 3D parallel schemes for MoE models above the 200B scale
- **Hardware optimization:** Achieves higher training efficiency on the Ascend A3 Supernode than on NVIDIA H800

🔥 Roadmap

XTuner V1 is committed to continuously improving training efficiency for pre-training, instruction fine-tuning, and reinforcement learning of ultra-large MoE models, with a special focus on Ascend NPU optimization.

🚀 Training Engine

Our vision is to establish XTuner V1 as a versatile training backend that integrates seamlessly with the broader open-source ecosystem.
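The "dropless" idea in the feature list above can be illustrated with a toy router. This is a minimal pure-Python sketch of top-1 routing, not XTuner's actual kernels: with a fixed expert capacity, tokens that overflow a popular expert are dropped for that layer; dropless routing keeps every token regardless of how imbalanced the expert load is.

```python
# Toy illustration of "dropless" top-1 MoE routing (pure Python, not
# XTuner's implementation). capacity=None means dropless: every token
# is kept no matter how unbalanced the expert load becomes.

def route(tokens, gate, num_experts, capacity=None):
    """gate(token) -> expert index. With a finite capacity, tokens
    beyond it are dropped (classic capacity-factor routing); with
    capacity=None, every token is routed (dropless)."""
    buckets = {e: [] for e in range(num_experts)}
    dropped = []
    for tok in tokens:
        e = gate(tok)
        if capacity is not None and len(buckets[e]) >= capacity:
            dropped.append(tok)   # token contributes nothing this layer
        else:
            buckets[e].append(tok)
    return buckets, dropped

tokens = list(range(8))
gate = lambda t: 0 if t < 6 else 1   # deliberately imbalanced load

_, lost = route(tokens, gate, num_experts=2, capacity=4)
print(len(lost))   # → 2: capacity routing drops the overflow

_, lost = route(tokens, gate, num_experts=2, capacity=None)
print(len(lost))   # → 0: dropless keeps everything
```

The trade-off the README alludes to is that dropless routing makes per-expert batch sizes variable, which is harder to parallelize efficiently; keeping the expert-parallelism dimension small is one way to contain that cost.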
| Model       | GPU (FP8) | GPU (BF16) | NPU (BF16) |
|-------------|-----------|------------|------------|
| Intern S1   | ✅        | ✅         | ✅         |
| Intern VL   | ✅        | ✅         | ✅         |
| Qwen3 Dense | ✅        | ✅         | ✅         |
| Qwen3 MoE   | ✅        | ✅         | ✅         |
| GPT OSS     | ✅        | ✅         | 🚧         |
| Deepseek V3 | ✅        | ✅         | 🚧         |
| KIMI K2     | ✅        | ✅         | 🚧         |

🧠 Algorithm

The algorithm component is actively evolving. We welcome community contributions: with XTuner V1, you can scale your algorithms to unprecedented sizes!

**Implemented**

- ✅ **Multimodal Pre-training**: full support for vision-language model training
- ✅ **Multimodal Supervised Fine-tuning**: optimized for instruction following
- ✅ **GRPO**: Group Relative Policy Optimization

**Coming Soon**

- 🔄 **MPO**: Mixed Preference Optimization
- 🔄 **DAPO**: Dynamic Sampling Policy Optimization
- 🔄 **Multi-turn Agentic RL**: advanced agent training capabilities

⚡ Inference Engine Integration

Seamless deployment with leading inference frameworks:

- [x] LMDeploy
- [ ] vLLM
- [ ] SGLang

Data Preparation

- You can use GraphGen to create synthetic data for fine-tuning.

🤝 Contributing

We appreciate all contributions to XTuner. Please refer to CONTRIBUTING.md for the contributing guidelines.

🙏 Acknowledgement

The development of XTuner V1's training engine has been greatly inspired by and built upon the excellent work of the open-source community.
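The GRPO entry in the algorithm list above centers on a group-relative advantage: several responses are sampled per prompt, and each response's reward is normalized against its own group, removing the need for a learned critic. A minimal sketch of that normalization, following the commonly published GRPO formulation (illustrative only, not XTuner's implementation):

```python
# Sketch of GRPO's group-relative advantage (not XTuner's code).
# Each reward in a group is standardized against the group's own
# mean and standard deviation.

from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-8):
    """rewards: scalar scores of the responses sampled for one prompt."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled responses to one prompt, scored by a reward model:
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print([round(a, 2) for a in advs])  # → [1.22, -1.22, 0.0, 0.0]
```

Responses above the group mean get positive advantages and are reinforced; those below get negative ones, and the advantages of each group sum to zero by construction.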
We extend our sincere gratitude to the following pioneering projects:

**Training Engine:**

- Torchtitan: a PyTorch-native platform for training generative AI models
- DeepSpeed: Microsoft's deep learning optimization library
- MindSpeed: Ascend's high-performance training acceleration library
- Megatron: NVIDIA's large-scale transformer training framework

**Reinforcement Learning:**

XTuner V1's reinforcement learning capabilities have been enhanced through insights and best practices from:

- veRL: Volcano Engine Reinforcement Learning for LLMs
- SLIME: THU's scalable RLHF implementation
- AReal: Ant Reasoning Reinforcement Learning for LLMs
- OpenRLHF: an easy-to-use, scalable, and high-performance RLHF framework based on Ray

We are deeply grateful to all contributors and maintainers of these projects for advancing the field of large-scale model training.

🖊️ Citation

License

This project is released under the Apache License 2.0. Please also adhere to the licenses of the models and datasets being used.