NVIDIA-NeMo / Megatron-Bridge
Training library for Megatron-based models with bidirectional Hugging Face conversion capability
Repository Overview (README excerpt)
# NeMo Megatron Bridge

Documentation | Supported Models | Examples | Contributing

## 📣 News

- [03/12/2026] **Deprecating Python 3.10 support:** We're officially dropping Python 3.10 support with the upcoming 0.4.0 release. Downstream applications must raise their lower bound to 3.12 to stay compatible with Megatron-Bridge.
- [12/16/2025] Mind Lab successfully used Megatron-Bridge and VeRL to train a GRPO LoRA for a trillion-parameter model on 64 H800 GPUs. See their tech blog.
- [12/15/2025] Day 0 support for NVIDIA-NeMotron-3-Nano-30B-A3B-FP8! Reproducible code and custom NGC container: nvcr.io/nvidia/nemo:25.11.nemotron_3_nano

## Overview

NeMo Megatron Bridge is a PyTorch-native library within the NeMo Framework that provides pretraining, SFT, and LoRA for popular LLM and VLM models. It serves as a powerful **bridge, conversion, and verification layer** between 🤗 Hugging Face and Megatron Core. It provides bidirectional checkpoint conversion between these formats, enabling other projects to leverage Megatron Core's parallelism capabilities or export models to various inference engines. The bridge includes built-in verification mechanisms to ensure conversion accuracy and checkpoint integrity across different model formats.

On top of the bridge, NeMo Megatron Bridge provides a performant and scalable PyTorch-native training loop that leverages Megatron Core to deliver state-of-the-art training throughput. It supports pretraining and fine-tuning with features such as tensor and pipeline parallelism and mixed precision (FP8, BF16, FP4, etc.). Users can either use existing 🤗 Hugging Face models or define custom PyTorch model definitions for flexible end-to-end workflows.

NeMo Megatron Bridge is a refactor of the previous NeMo training stack that adopts a PyTorch-native training loop to give developers greater flexibility and customizability.
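The bidirectional conversion and verification described above can be sketched conceptually. The snippet below is *not* the Megatron-Bridge API — all parameter names and the mapping table are illustrative assumptions — but it shows the core idea: rename each Hugging Face parameter to a Megatron-style key, invert the mapping for export, and verify that a round trip reproduces the original checkpoint.

```python
# Conceptual sketch of bidirectional name mapping with round-trip
# verification. NOT the Megatron-Bridge API; key names are invented
# to mirror the HF <-> Megatron conversion idea.

# Illustrative (hypothetical) HF -> Megatron parameter-name mapping.
HF_TO_MEGATRON = {
    "model.embed_tokens.weight": "embedding.word_embeddings.weight",
    "model.layers.0.self_attn.o_proj.weight": "decoder.layers.0.self_attention.linear_proj.weight",
    "model.norm.weight": "decoder.final_layernorm.weight",
}
MEGATRON_TO_HF = {v: k for k, v in HF_TO_MEGATRON.items()}

def to_megatron(hf_state: dict) -> dict:
    """Rename every HF parameter to its Megatron-style key."""
    return {HF_TO_MEGATRON[name]: tensor for name, tensor in hf_state.items()}

def to_hf(megatron_state: dict) -> dict:
    """Inverse direction: Megatron-style keys back to HF keys."""
    return {MEGATRON_TO_HF[name]: tensor for name, tensor in megatron_state.items()}

def verify_round_trip(hf_state: dict) -> bool:
    """Verification step: export followed by import must reproduce the input."""
    return to_hf(to_megatron(hf_state)) == hf_state

hf_state = {name: [0.0] for name in HF_TO_MEGATRON}  # stand-in "tensors"
assert verify_round_trip(hf_state)
```

In the real library the mapping also has to reshard tensors across parallelism dimensions (TP/PP/etc.), but the rename-and-verify structure is the same.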
## 🔧 Installation

### 🐳 NeMo Framework container

The best experience, highest performance, and full feature support are provided by the NeMo Framework container. Fetch the most recent $TAG and run the following to start a container:

For development installation and additional details, please refer to our Contribution guide.

## ⚡ Quickstart

To get started, install Megatron Bridge or download a NeMo Framework container as described above. Log in to the Hugging Face Hub:

Conversion-only quickstart (🤗 ↔ Megatron Core):

Training quickstart using pre-configured recipes:

You can launch the above script with:

More examples:

- Conversion scripts overview
- Import/Export checkpoints
- Generation with bridge
- Multi-GPU loading from HF
- Compare HF vs Megatron outputs
- Toy RLHF with Bridge (HF inference + Megatron training)

For a deeper dive into conversion design and advanced usage, see the models README.

## 🚀 Key Features

- **Bridge with 🤗 Hugging Face**: Seamless bidirectional conversion between 🤗 Hugging Face and Megatron formats for interoperability (model bridges, auto bridge, conversion examples)
  - Online import/export without intermediate full checkpoints
  - Parallelism-aware (TP/PP/VPP/CP/EP/ETP) during conversion
  - Memory-efficient per-parameter streaming
  - Simple high-level API with architecture auto-detection
  - Optimized paths when Transformer Engine is available
- **Flexible to Customize**: Lightweight custom training loop that makes it easy to configure custom logic for data loading, distributed training, checkpointing, evaluation, and logging (training framework, training utilities)
- **Supervised & Parameter-Efficient Finetuning**: SFT & PEFT implementations tailored to Megatron-based models, supporting LoRA, DoRA, and user-defined PEFT methods (PEFT implementations, finetune module, SFT dataset)
- **SOTA Training Recipes**: Pre-configured, production-ready training recipes for popular models like Llama 3, with optimized hyperparameters and distributed training configuration (Llama recipes, recipe examples)
- **Performance Optimization**: Built-in support for FP8 training, model parallelism, and memory-efficient techniques for high utilization and near-linear scalability to thousands of nodes (mixed precision, communication overlap, optimizer utilities)

## Supported Models

Megatron Bridge provides out-of-the-box bridges and training recipes for a wide range of models, built on top of base model architectures from Megatron Core. Refer to the models directory for the most up-to-date list of model bridges.

### Supported Models Overview

For more details on supported models, see our documentation:

- **Large Language Models**
- **Vision Language Models**

| Model | Checkpoint Conversion | Pretrain Recipes | SFT & LoRA Recipes |
|-------|-----------------------|------------------|--------------------|
| DeepSeek V2 | ✅ | ✅ (v2) | Coming soon |
| DeepSeek V2 Lite | ✅ | ✅ (v2-lite) | Coming soon |
| DeepSeek V3 | ✅ | ✅ (v3) | Coming soon |
| Gemma | ✅ | Coming soon | Coming soon |
| Gemma 2 | ✅ | Coming soon | Coming soon |
| Gemma 3 | ✅ | ✅ (1B) | ✅ (1B) |
| Gemma 3-VL | ✅ | Coming soon | ✅ (4B/12B/27B) |
| GLM-4.5 | ✅ | ✅ (106B-Air/355B) | ✅ (106B-Air/355B) |
| GPT-oss | ✅ | ✅ (20B/120B) | ✅ (20B/120B) |
| Llama 2 | ✅ | ✅ (7B) | Coming soon |
| Llama 3 | ✅ | ✅ (8B/70B) | ✅ (8B/70B) |
| Llama 3.1 | ✅ | ✅ (8B/70B/405B) | ✅ (8B/70B/405B) |
| Llama 3.2 | ✅ | ✅ (1B/3B) | ✅ (1B/3B) |
| Llama 3.3 | ✅ | Coming soon | Coming soon |
| Llama Nemotron | ✅ | Coming soon | Coming soon |
| Mistral | ✅ | Coming soon | Coming soon |
| Ministral | ✅ | ✅ (3B/8B/14B) | ✅ (3B/8B/14B) |
| Moonlight | ✅ | ✅ (16B) | ✅ (16B) |
| Nemotron | ✅ | Coming soon | Coming soon |
| Nemotron-nano-v3 | ✅ | ✅ (30B-A3B) | ✅ (A3B) |
| Nemotron-super-v3 | ✅ | ✅ (120B-A12B) | ✅ (A12B) |
| Nemotron-H | ✅ | ✅ (4B/8B/47B/56B) | Coming soon |
| Nemotron Nano v2 | ✅ | ✅ (9B/12B) | Coming soon |
| Nemotron Nano v2 VL | ✅ | Coming soon | ✅ (9B/12B) |
| OlMoE | ✅ | ✅ (7B) | ✅ (7B) |
| Qwen2 | ✅ | ✅ (500M/1.5B/7B/72B) | ✅ (500M/1.5B/7B/72B) |
| Qwen2.5 | ✅ | ✅ (500M/1.5B/7B/14B/32B/72B) | ✅ (500M/1.5B/7B/14B/32B/72B) |
| Qwen2…
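The "memory-efficient per-parameter streaming" listed under Key Features can be illustrated with a generic generator pattern. This is a conceptual sketch, not the project's actual implementation (the function names and the renaming rule are invented for illustration): parameters are loaded, converted, and written one at a time, so peak memory stays at a single tensor instead of a full checkpoint.

```python
# Conceptual sketch of streaming checkpoint conversion: only one
# (name, tensor) pair is alive at a time. NOT the Megatron-Bridge API.
from typing import Iterator, Tuple

def stream_parameters(names) -> Iterator[Tuple[str, list]]:
    """Yield (name, tensor) pairs lazily instead of materializing the
    whole checkpoint in memory (tensors faked as lists here)."""
    for name in names:
        yield name, [0.0] * 4  # pretend-load a single parameter

def convert_streaming(names, rename, write):
    """Convert parameter-by-parameter: load one tensor, rename it,
    write it out, then let it go before touching the next one."""
    for name, tensor in stream_parameters(names):
        write(rename(name), tensor)

converted = {}
convert_streaming(
    ["a.weight", "b.weight"],
    rename=lambda n: "megatron." + n,  # illustrative renaming rule
    write=converted.__setitem__,
)
assert set(converted) == {"megatron.a.weight", "megatron.b.weight"}
```

In practice `write` would be a sharded-checkpoint writer and `rename` the architecture-specific mapping, but the one-tensor-at-a-time shape of the loop is what keeps memory bounded during online import/export.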