
axolotl-ai-cloud / axolotl

Go ahead and axolotl questions

11,456 stars
1,276 forks
218 issues
Python · Jinja · Shell

AI Architecture Analysis

This repository is indexed by RepoMind. Analyze axolotl-ai-cloud/axolotl in our AI interface to generate complete architecture diagrams, visualize control flows, and run automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.

Source files are only loaded when you start an analysis to optimize performance.
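The on-demand loading pattern described above can be sketched in a few lines. This is an illustrative sketch only, with hypothetical names; RepoMind's actual implementation is not public. The idea: index file *paths* up front, and read a file's full contents into context only when an analysis first needs it, rather than pre-chunking everything as embedding-based RAG does.

```python
# Sketch of lazy, whole-file context loading (names are illustrative).
from pathlib import Path


class LazyRepoContext:
    """Index file paths at crawl time; read file bodies only on first use."""

    def __init__(self, root: str):
        self.root = Path(root)
        # Index step: record paths only; no file contents are read yet.
        self.paths = sorted(self.root.rglob("*.py"))
        self._cache: dict[Path, str] = {}

    def load(self, path: Path) -> str:
        # Analysis step: pull the *whole* file into context on demand,
        # avoiding the chunk-level fragmentation of traditional RAG.
        if path not in self._cache:
            self._cache[path] = path.read_text(encoding="utf-8")
        return self._cache[path]

    def context_for(self, question: str) -> str:
        # Toy relevance filter: include files whose stem appears in the question.
        relevant = [p for p in self.paths if p.stem in question]
        return "\n\n".join(self.load(p) for p in relevant)
```

Until `context_for` is called, the index holds only paths, which is what keeps the idle footprint small.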

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/axolotl-ai-cloud/axolotl)

Repository Overview (README excerpt)


# A Free and Open Source LLM Fine-tuning Framework

## 🎉 Latest Updates

- **2026/03**:
  - New model support has been added in Axolotl for Mistral Small 4, Qwen3.5, Qwen3.5 MoE, GLM-4.7-Flash, GLM-4.6V, and GLM-4.5-Air.
  - MoE expert quantization support (via ) greatly reduces VRAM when training MoE models (FSDP2 compatible).
- **2026/02**:
  - ScatterMoE LoRA support: LoRA fine-tuning directly on MoE expert weights using custom Triton kernels.
  - Axolotl now has support for SageAttention and GDPO (Generalized DPO).
- **2026/01**:
  - New integrations for EAFT (Entropy-Aware Focal Training), which weights the loss by the entropy of the top-k logit distribution, and Scalable Softmax, which improves long-context attention.
- **2025/12**:
  - Axolotl now includes support for Kimi-Linear, Plano-Orchestrator, MiMo, InternVL 3.5, Olmo3, Trinity, and Ministral3.
  - Distributed Muon Optimizer support has been added for FSDP2 pretraining.
- **2025/10**: New model support has been added in Axolotl for: Qwen3 Next, Qwen2.5-VL, Qwen3-VL, Qwen3, Qwen3MoE, Granite 4, HunYuan, Magistral 2509, Apertus, and Seed-OSS.

Expand older updates

- **2025/09**: Axolotl now has text diffusion training. Read more here.
- **2025/08**: QAT has been updated to include NVFP4 support. See PR.
- **2025/07**:
  - ND Parallelism support has been added to Axolotl. Compose Context Parallelism (CP), Tensor Parallelism (TP), and Fully Sharded Data Parallelism (FSDP) within a single node and across multiple nodes. Check out the blog post for more info.
  - Axolotl adds more models: GPT-OSS, Gemma 3n, Liquid Foundation Model 2 (LFM2), and Arcee Foundation Models (AFM).
  - FP8 finetuning with fp8 gather op is now possible in Axolotl via . Get started here!
  - Voxtral, Magistral 1.1, and Devstral with mistral-common tokenizer support have been integrated in Axolotl!
  - TiledMLP support for single-GPU to multi-GPU training (DDP, DeepSpeed, and FSDP) has been added to support Arctic Long Sequence Training (ALST). See examples for using ALST with Axolotl!
- **2025/06**: Magistral with mistral-common tokenizer support has been added to Axolotl. See docs to start training your own Magistral models with Axolotl!
- **2025/05**: Quantization Aware Training (QAT) support has been added to Axolotl. Explore the docs to learn more!
- **2025/04**: Llama 4 support has been added in Axolotl. See docs to start training your own Llama 4 models with Axolotl's linearized version!
- **2025/03**: Axolotl has implemented Sequence Parallelism (SP) support. Read the blog and docs to learn how to scale your context length when fine-tuning.
- **2025/03**: (Beta) Fine-tuning multimodal models is now supported in Axolotl. Check out the docs to fine-tune your own!
- **2025/02**: Axolotl has added LoRA optimizations to reduce memory usage and improve training speed for LoRA and QLoRA in single-GPU and multi-GPU training (DDP and DeepSpeed). Jump into the docs to give it a try.
- **2025/02**: Axolotl has added GRPO support. Dive into our blog and GRPO example and have some fun!
- **2025/01**: Axolotl has added Reward Modelling / Process Reward Modelling fine-tuning support. See docs.

## ✨ Overview

Axolotl is a free and open-source tool designed to streamline post-training and fine-tuning for the latest large language models (LLMs).

Features:

- **Multiple Model Support**: Train various models like GPT-OSS, LLaMA, Mistral, Mixtral, Pythia, and many more models available on the Hugging Face Hub.
- **Multimodal Training**: Fine-tune vision-language models (VLMs) including LLaMA-Vision, Qwen2-VL, Pixtral, LLaVA, SmolVLM2, GLM-4.6V, InternVL 3.5, Gemma 3n, and audio models like Voxtral, with image, video, and audio support.
- **Training Methods**: Full fine-tuning, LoRA, QLoRA, GPTQ, QAT, preference tuning (DPO, IPO, KTO, ORPO), RL (GRPO, GDPO), and Reward Modelling (RM) / Process Reward Modelling (PRM).
- **Easy Configuration**: Re-use a single YAML configuration file across the full fine-tuning pipeline: dataset preprocessing, training, evaluation, quantization, and inference.
- **Performance Optimizations**: Multipacking, Flash Attention 2/3/4, Xformers, Flex Attention, SageAttention, Liger Kernel, Cut Cross Entropy, ScatterMoE, Sequence Parallelism (SP), LoRA optimizations, multi-GPU training (FSDP1, FSDP2, DeepSpeed), multi-node training (Torchrun, Ray), and many more!
- **Flexible Dataset Handling**: Load datasets from local storage, Hugging Face, and cloud providers (S3, Azure, GCP, OCI).
- **Cloud Ready**: We ship Docker images and PyPI packages for use on cloud platforms and local hardware.

## 🚀 Quick Start - LLM Fine-tuning in Minutes

**Requirements**:

- NVIDIA GPU (Ampere or newer for and Flash Attention) or AMD GPU
- Python 3.11
- PyTorch ≥2.8.0

### Installation

Install via Google Colab, pip, or Docker. Installing with Docker can be less error-prone than installing in your own environment. Other installation approaches are described here.

### Cloud Providers

- RunPod
- Vast.ai
- PRIME Intellect
- Modal
- Novita
- JarvisLabs.ai
- Latitude.sh

### Your First Fine-tune

That's it! Check out our Getting Started Guide for a more detailed walkthrough.

## 📚 Documentation

- Installation Options - Detailed setup instructions for different environments
- Configuration Guide - Full configuration options and examples
- Dataset Loading - Loading datasets from various sources
- Dataset Guide - Supported formats and how to use them
- Multi-GPU Training
- Multi-Node Training
- Multipacking
- API Reference - Auto-generated code documentation
- FAQ - Frequently asked questions

## 🤝 Getting Help

- Join our Discord community for support
- Check out our Examples directory
- Read our Debugging Guide
- Need dedicated support? Please contact ✉️ wing@axolotl.ai for options

## 🌟 Contributing

Contributions are welcome! Please see our Contributing Guide for details.

## 📈 Telemetry

Axolotl has opt-out telemetry that helps us understand how the project is being used and prioritize improvements. We collect basic system information, model types, and error rates, never personal data or file paths. Telemetry i…
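To make the "single YAML configuration file" idea in the README excerpt concrete, here is a minimal QLoRA config sketch. Field names follow Axolotl's documented schema, but the specific values (model id, dataset, hyperparameters) are illustrative placeholders; consult the Configuration Guide for the authoritative options.

```yaml
# Illustrative Axolotl QLoRA config sketch -- values are placeholders.
base_model: meta-llama/Meta-Llama-3-8B   # any Hugging Face Hub model id
load_in_4bit: true
adapter: qlora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

datasets:
  - path: tatsu-lab/alpaca               # local path or Hub dataset
    type: alpaca

sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 1
learning_rate: 2.0e-4
output_dir: ./outputs/qlora-out
```

The same file then drives each pipeline stage via the CLI, e.g. `axolotl preprocess config.yml`, `axolotl train config.yml`, and `axolotl inference config.yml`.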