# InternLM/xtuner

A Next-Generation Training Engine Built for Ultra-Large MoE Models
## Repository Overview (README excerpt)
## News

- **[2025/09]** XTuner V1 Released! A Next-Generation Training Engine Built for Ultra-Large MoE Models

## XTuner V1

XTuner V1 is a next-generation LLM training engine specifically designed for ultra-large-scale MoE models. Unlike traditional 3D parallel training architectures, XTuner V1 is optimized for the mainstream MoE training scenarios prevalent in today's academic research.

### Key Features

**Dropless Training**

- **Scalable without complexity:** Train 200B-scale MoE models without expert parallelism; 600B models require only intra-node expert parallelism
- **Optimized parallelism strategy:** A smaller expert-parallelism dimension than traditional 3D approaches enables more efficient dropless training

**Long Sequence Support**

- **Memory-efficient design:** Train 200B MoE models on 64k-token sequences without sequence parallelism, through advanced memory optimization techniques
- **Flexible scaling:** Full support for DeepSpeed Ulysses sequence parallelism, with maximum sequence length scaling linearly
- **Robust performance:** Maintains stability despite expert load imbalance during long-sequence training

**Superior Efficiency**

- **Massive scale:** Supports MoE training up to 1T parameters
- **Breakthrough performance:** First to achieve FSDP training throughput that surpasses traditional 3D parallel schemes for MoE models above the 200B scale
- **Hardware optimization:** Achieves training efficiency on the Ascend A3 Supernode that exceeds NVIDIA H800

## Roadmap

XTuner V1 is committed to continuously improving training efficiency for pre-training, instruction fine-tuning, and reinforcement learning of ultra-large MoE models, with a special focus on Ascend NPU optimization.

### Training Engine

Our vision is to establish XTuner V1 as a versatile training backend that seamlessly integrates with the broader open-source ecosystem.
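The "dropless" property of the training scheme can be made concrete with a toy router: every token's top-k expert assignment is honored regardless of load imbalance, whereas a capacity-limited router silently drops overflow tokens. This is an illustrative sketch only (the function names `route_topk` and `apply_capacity` are hypothetical), not XTuner V1's actual routing code.

```python
# Toy contrast between dropless top-k MoE routing and capacity-limited
# routing. Pure Python for clarity; real routers operate on tensors.
from collections import defaultdict

def route_topk(scores, k):
    """Assign each token to its top-k experts by router score."""
    assignments = []
    for token_id, expert_scores in enumerate(scores):
        topk = sorted(range(len(expert_scores)),
                      key=lambda e: expert_scores[e], reverse=True)[:k]
        for e in topk:
            assignments.append((token_id, e))
    return assignments

def apply_capacity(assignments, capacity):
    """Capacity-limited routing: each expert keeps at most `capacity`
    tokens; overflow tokens are dropped (the failure mode that
    dropless training avoids)."""
    load = defaultdict(int)
    kept, dropped = [], []
    for token_id, e in assignments:
        if load[e] < capacity:
            load[e] += 1
            kept.append((token_id, e))
        else:
            dropped.append((token_id, e))
    return kept, dropped

# 4 tokens, 2 experts, top-1 routing; all tokens prefer expert 0
# (an extreme load imbalance).
scores = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
assignments = route_topk(scores, k=1)
assert len(assignments) == 4          # dropless: all 4 tokens routed

kept, dropped = apply_capacity(assignments, capacity=2)
assert len(kept) == 2 and len(dropped) == 2  # capacity router drops half
```

The price of dropless routing is that the busiest expert dictates the compute and memory of the whole MoE layer, which is why keeping the expert-parallelism dimension small matters for efficiency.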
| Model       | GPU (FP8) | GPU (BF16) | NPU (BF16) |
|-------------|-----------|------------|------------|
| Intern S1   | ✅        | ✅         | ✅         |
| Intern VL   | ✅        | ✅         | ✅         |
| Qwen3 Dense | ✅        | ✅         | ✅         |
| Qwen3 MoE   | ✅        | ✅         | ✅         |
| GPT OSS     | ✅        | ✅         | 🚧         |
| DeepSeek V3 | ✅        | ✅         | 🚧         |
| KIMI K2     | ✅        | ✅         | 🚧         |

### Algorithm

The algorithm component is actively evolving. We welcome community contributions - with XTuner V1, scale your algorithms to unprecedented sizes!

**Implemented**

- ✅ **Multimodal Pre-training** - Full support for vision-language model training
- ✅ **Multimodal Supervised Fine-tuning** - Optimized for instruction following
- ✅ **GRPO** - Group Relative Policy Optimization

**Coming Soon**

- **MPO** - Mixed Preference Optimization
- **DAPO** - Dynamic Sampling Policy Optimization
- **Multi-turn Agentic RL** - Advanced agent training capabilities

### Inference Engine Integration

Seamless deployment with leading inference frameworks:

- [x] LMDeploy
- [ ] vLLM
- [ ] SGLang

### Data Preparation

- You can use GraphGen to create synthetic data for fine-tuning.

## Contributing

We appreciate all contributions to XTuner. Please refer to CONTRIBUTING.md for the contribution guidelines.

## Acknowledgement

The development of XTuner V1's training engine has been greatly inspired by and built upon the excellent work of the open-source community.
We extend our sincere gratitude to the following pioneering projects:

**Training Engine:**

- Torchtitan - A PyTorch-native platform for training generative AI models
- DeepSpeed - Microsoft's deep learning optimization library
- MindSpeed - Ascend's high-performance training acceleration library
- Megatron - NVIDIA's large-scale transformer training framework

**Reinforcement Learning:**

XTuner V1's reinforcement learning capabilities have been enhanced through insights and best practices from:

- veRL - Volcano Engine Reinforcement Learning for LLMs
- SLIME - THU's scalable RLHF implementation
- AReaL - Ant Reasoning Reinforcement Learning for LLMs
- OpenRLHF - An easy-to-use, scalable, and high-performance RLHF framework based on Ray

We are deeply grateful to all contributors and maintainers of these projects for advancing the field of large-scale model training.

## Citation

## License

This project is released under the Apache License 2.0. Please also adhere to the licenses of the models and datasets being used.