
NVIDIA-NeMo / Megatron-Bridge

Training library for Megatron-based models with bidirectional Hugging Face conversion capability

523 stars
228 forks
477 issues
Python · Shell · Jupyter Notebook

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing NVIDIA-NeMo/Megatron-Bridge in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.

Source files are only loaded when you start an analysis to optimize performance.

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/NVIDIA-NeMo/Megatron-Bridge)

Repository Overview (README excerpt)


NeMo Megatron Bridge

Documentation | Supported Models | Examples | Contributing

šŸ“£ News

• [03/12/2026] **Deprecating Python 3.10 support:** We're officially dropping Python 3.10 support with the upcoming 0.4.0 release. Downstream applications must raise their lower boundary to 3.12 to stay compatible with Megatron-Bridge.
• [12/16/2025] Mind Lab successfully used Megatron-Bridge and VeRL to train a GRPO LoRA for a trillion-parameter model on 64 H800 GPUs. See their tech blog.
• [12/15/2025] Day 0 support for NVIDIA-NeMotron-3-Nano-30B-A3B-FP8! Reproducible code and custom NGC container: nvcr.io/nvidia/nemo:25.11.nemotron_3_nano

Overview

NeMo Megatron Bridge is a PyTorch-native library within the NeMo Framework that provides pretraining, SFT, and LoRA for popular LLM and VLM models. It serves as a powerful **bridge, conversion, and verification layer** between šŸ¤— Hugging Face and Megatron Core. It provides bidirectional checkpoint conversion between these formats, enabling other projects to leverage Megatron Core's parallelism capabilities or to export models for various inference engines. The bridge includes built-in verification mechanisms to ensure conversion accuracy and checkpoint integrity across model formats.

On top of the bridge, NeMo Megatron Bridge provides a performant, scalable, PyTorch-native training loop that leverages Megatron Core to deliver state-of-the-art training throughput. It supports pretraining and fine-tuning with features such as tensor and pipeline parallelism and mixed precision (FP8, BF16, FP4, etc.). Users can either use existing šŸ¤— Hugging Face models or define custom PyTorch model definitions for flexible end-to-end workflows.

NeMo Megatron Bridge is a refactor of the previous NeMo training stack that adopts a PyTorch-native training loop to give developers greater flexibility and customizability.
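The "memory-efficient per-parameter streaming" idea behind the conversion layer can be sketched in plain Python. Everything below is a hypothetical illustration, not the Megatron-Bridge API: the name mapping is invented, plain lists stand in for tensors, and the TP/PP sharding the real bridge handles is ignored. The point is only that parameters are mapped and yielded one at a time, so peak memory stays at one parameter rather than the whole checkpoint.

```python
# Hypothetical name mapping between HF-style and Megatron-style keys.
# These key names are illustrative, not the real conversion tables.
HF_TO_MEGATRON = {
    "model.embed_tokens.weight": "embedding.word_embeddings.weight",
    "model.layers.0.self_attn.q_proj.weight": "decoder.layers.0.self_attention.linear_q.weight",
    "lm_head.weight": "output_layer.weight",
}

def stream_convert(state_dict, direction="hf_to_megatron"):
    """Yield (new_name, tensor) pairs one parameter at a time."""
    mapping = (HF_TO_MEGATRON if direction == "hf_to_megatron"
               else {v: k for k, v in HF_TO_MEGATRON.items()})
    for name, tensor in state_dict.items():
        if name not in mapping:
            raise KeyError(f"no mapping for parameter {name!r}")
        # Caller consumes each pair (writes it out) and discards it,
        # so only one parameter is "in flight" at any moment.
        yield mapping[name], tensor

# Round-trip check with dummy "tensors" (plain lists stand in for weights).
hf_weights = {k: [0.0] for k in HF_TO_MEGATRON}
mcore = dict(stream_convert(hf_weights))
back = dict(stream_convert(mcore, direction="megatron_to_hf"))
assert set(back) == set(hf_weights)
```

Because the generator never materializes a second full state dict, the same pattern supports the "online import/export without intermediate full checkpoints" feature listed below.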
šŸ”§ Installation

🐳 NeMo Framework container

The best experience, highest performance, and full feature support are provided by the NeMo Framework container. Fetch the most recent $TAG and run the following to start a container:

For development installation and additional details, please refer to our Contribution guide.

⚔ Quickstart

To get started, install Megatron Bridge or download a NeMo Framework container as described above. Log in to Hugging Face Hub:

Conversion-only quickstart (āœ… Core):

Training quickstart using pre-configured recipes:

You can launch the above script with:

More examples:
• Conversion scripts overview
• Import/Export checkpoints
• Generation with bridge
• Multi-GPU loading from HF
• Compare HF vs Megatron outputs
• Toy RLHF with Bridge (HF inference + Megatron training)

For a deeper dive into conversion design and advanced usage, see the models README.

šŸš€ Key Features

• **Bridge with šŸ¤— Hugging Face**: Seamless bidirectional conversion between šŸ¤— Hugging Face and Megatron formats for interoperability (model bridges, auto bridge, conversion examples)
  • Online import/export without intermediate full checkpoints
  • Parallelism-aware (TP/PP/VPP/CP/EP/ETP) during conversion
  • Memory-efficient per-parameter streaming
  • Simple high-level API with architecture auto-detection
  • Optimized paths when Transformer Engine is available
• **Flexible to Customize**: Lightweight custom training loop that makes it easy to configure custom logic for data loading, distributed training, checkpointing, evaluation, and logging (training framework, training utilities)
• **Supervised & Parameter-Efficient Finetuning**: SFT & PEFT implementations tailored for Megatron-based models, supporting LoRA, DoRA, and user-defined PEFT methods (PEFT implementations, finetune module, SFT dataset)
• **SOTA Training Recipes**: Pre-configured, production-ready training recipes for popular models like Llama 3, with optimized hyperparameters and distributed training configuration (Llama recipes, recipe examples)
• **Performance Optimization**: Built-in support for FP8 training, model parallelism, and memory-efficient techniques that deliver high utilization and near-linear scalability to thousands of nodes (mixed precision, communication overlap, optimizer utilities)

Supported Models

Megatron Bridge provides out-of-the-box bridges and training recipes for a wide range of models, built on top of base model architectures from Megatron Core. Refer to the models directory for the most up-to-date list of model bridges.

Supported Models Overview

For more details on supported models, see our documentation:
• **Large Language Models**
• **Vision Language Models**

| Model | Checkpoint Conversion | Pretrain Recipes | SFT & LoRA Recipes |
|-------|-----------------------|------------------|--------------------|
| DeepSeek V2 | āœ… | āœ… (v2) | Coming soon |
| DeepSeek V2 Lite | āœ… | āœ… (v2-lite) | Coming soon |
| DeepSeek V3 | āœ… | āœ… (v3) | Coming soon |
| Gemma | āœ… | Coming soon | Coming soon |
| Gemma 2 | āœ… | Coming soon | Coming soon |
| Gemma 3 | āœ… | āœ… (1B) | āœ… (1B) |
| Gemma 3-VL | āœ… | Coming soon | āœ… (4B/12B/27B) |
| GLM-4.5 | āœ… | āœ… (106B-Air/355B) | āœ… (106B-Air/355B) |
| GPT-oss | āœ… | āœ… (20B/120B) | āœ… (20B/120B) |
| Llama 2 | āœ… | āœ… (7B) | Coming soon |
| Llama 3 | āœ… | āœ… (8B/70B) | āœ… (8B/70B) |
| Llama 3.1 | āœ… | āœ… (8B/70B/405B) | āœ… (8B/70B/405B) |
| Llama 3.2 | āœ… | āœ… (1B/3B) | āœ… (1B/3B) |
| Llama 3.3 | āœ… | Coming soon | Coming soon |
| Llama Nemotron | āœ… | Coming soon | Coming soon |
| Mistral | āœ… | Coming soon | Coming soon |
| Ministral | āœ… | āœ… (3B/8B/14B) | āœ… (3B/8B/14B) |
| Moonlight | āœ… | āœ… (16B) | āœ… (16B) |
| Nemotron | āœ… | Coming soon | Coming soon |
| Nemotron-nano-v3 | āœ… | āœ… (30B-A3B) | āœ… (A3B) |
| Nemotron-super-v3 | āœ… | āœ… (120B-A12B) | āœ… (A12B) |
| Nemotron-H | āœ… | āœ… (4B/8B/47B/56B) | Coming soon |
| Nemotron Nano v2 | āœ… | āœ… (9B/12B) | Coming soon |
| Nemotron Nano v2 VL | āœ… | Coming soon | āœ… (9B/12B) |
| OlMoE | āœ… | āœ… (7B) | āœ… (7B) |
| Qwen2 | āœ… | āœ… (500M/1.5B/7B/72B) | āœ… (500M/1.5B/7B/72B) |
| Qwen2.5 | āœ… | āœ… (500M/1.5B/7B/14B/32B/72B) | āœ… (500M/1.5B/7B/14B/32B/72B) |
| Qwen2…
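The SFT & LoRA recipes above lean on LoRA's parameter efficiency. A quick back-of-the-envelope sketch shows why: instead of updating a full d_out Ɨ d_in weight matrix, LoRA trains two low-rank factors B (d_out Ɨ r) and A (r Ɨ d_in). The dimensions and rank below are illustrative assumptions (a 4096-wide projection, rank 16), not values taken from Megatron-Bridge's recipes.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters for a LoRA adapter W + B @ A,
    where A is (rank, d_in) and B is (d_out, rank)."""
    return rank * d_in + d_out * rank

# Illustrative sizes for a single square projection matrix.
d_in = d_out = 4096
full = d_in * d_out                       # full fine-tune: every weight
lora = lora_params(d_in, d_out, rank=16)  # LoRA: only the two factors

print(full)  # 16777216 weights updated per projection
print(lora)  # 131072 -> under 1% of the full matrix
```

The same arithmetic is why rank is the main knob in a LoRA recipe: trainable parameters grow linearly with rank, while the frozen base matrix stays untouched.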