
AI-Hypercomputer / gpu-recipes

Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud.

View on GitHub · 123 stars · 65 forks · 24 issues · Languages: Python, Shell, Jinja

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing AI-Hypercomputer/gpu-recipes in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.

Source files are only loaded when you start an analysis to optimize performance.
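The on-demand loading described above can be illustrated with a minimal sketch. All names here are hypothetical, not RepoMind's actual implementation: a cache that reads a whole source file into context the first time an analysis touches it, instead of pre-chunking and embedding files as traditional RAG pipelines do.

```python
from pathlib import Path


class OnDemandContext:
    """Illustrative sketch of lazy, whole-file context loading.

    Files are read in full only when first requested, and kept intact
    rather than split into retrieval chunks. (Hypothetical names; not
    RepoMind's real API.)
    """

    def __init__(self, repo_root):
        self.repo_root = Path(repo_root)
        self.loaded = {}  # relative path -> full file text, filled on demand

    def get(self, relative_path):
        # Load the complete file on first access; later accesses hit the cache.
        if relative_path not in self.loaded:
            self.loaded[relative_path] = (self.repo_root / relative_path).read_text()
        return self.loaded[relative_path]
```

The design point is that nothing is loaded at index time: the `loaded` map stays empty until a question actually references a file, which matches the "loaded when you start an analysis" behavior described above.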

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/AI-Hypercomputer/gpu-recipes)

Repository Overview (README excerpt)


# Reproducible benchmark recipes for GPUs

Welcome to the reproducible benchmark recipes repository for GPUs! This repository contains recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud.

## Overview

- **Identify your requirements:** Determine the model, GPU type, workload, framework, and orchestrator you are interested in.
- **Select a recipe:** Based on your requirements, use the benchmark support matrix to find a recipe that meets your needs.
- **Follow the recipe:** Each recipe provides procedures to complete the following tasks:
  - Prepare your environment
  - Run the benchmark
  - Analyze the benchmark results, which include not just the metrics but detailed logs for further analysis

## Benchmark support matrix

### Training benchmarks: A3 Mega

| Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
| ------ | ---------------- | --------- | ------------- | ------------ | ------------------ |
| **GPT3-175B** | A3 Mega (NVIDIA H100) | NeMo | Pre-training | GKE | Link |
| **Llama-3-70B** | A3 Mega (NVIDIA H100) | NeMo | Pre-training | GKE | Link |
| **Llama-3.1-70B** | A3 Mega (NVIDIA H100) | NeMo | Pre-training | GKE | Link |
| **Mixtral-8-7B** | A3 Mega (NVIDIA H100) | NeMo | Pre-training | GKE | Link |

### Training benchmarks: A3 Ultra

| Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
| ------ | ---------------- | --------- | ------------- | ------------ | ------------------ |
| **Llama-3.1-70B** | A3 Ultra (NVIDIA H200) | MaxText | Pre-training | GKE | Link |
| **Llama-3.1-70B** | A3 Ultra (NVIDIA H200) | NeMo | Pre-training | GKE | Link |
| **Llama-3.1-405B** | A3 Ultra (NVIDIA H200) | MaxText | Pre-training | GKE | Link |
| **Llama-3.1-405B** | A3 Ultra (NVIDIA H200) | NeMo | Pre-training | GKE | Link |
| **Mixtral-8-7B** | A3 Ultra (NVIDIA H200) | NeMo | Pre-training | GKE | Link |

### Training benchmarks: A4

| Models | GPU Machine Type | Framework / Library | Workload Type | Orchestrator | Link to the recipe |
| ------ | ---------------- | ------------------- | ------------- | ------------ | ------------------ |
| **Llama-3.1-70B** | A4 (NVIDIA B200) | MaxText | Pre-training | GKE | Link |
| **Llama-3.1-70B** | A4 (NVIDIA B200) | NeMo | Pre-training | GKE | Link |
| **Llama-3.1-405B** | A4 (NVIDIA B200) | MaxText | Pre-training | GKE | Link |
| **Llama-3.1-405B** | A4 (NVIDIA B200) | NeMo | Pre-training | GKE | Link |
| **Mixtral-8-7B** | A4 (NVIDIA B200) | NeMo | Pre-training | GKE | Link |
| **PaliGemma2** | A4 (NVIDIA B200) | Hugging Face Accelerate | Finetuning | GKE | Link |

### Training benchmarks: A4X

| Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
| ------ | ---------------- | --------- | ------------- | ------------ | ------------------ |
| **Llama-3.1-8B** | A4X (NVIDIA GB200) | NeMo | Pre-training | GKE | Link |
| **Llama-3.1-70B** | A4X (NVIDIA GB200) | NeMo | Pre-training | GKE | Link |
| **Llama-3.1-405B** | A4X (NVIDIA GB200) | NeMo | Pre-training | GKE | Link |
| **Nemotron-4-340B** | A4X (NVIDIA GB200) | NeMo | Pre-training | GKE | Link |
| **Wan-2.1-14B** | A4X (NVIDIA GB200) | NeMo | Pre-training | GKE | Link |
| **Wan-2.1-14B** | A4X (NVIDIA GB200) | NeMo | Pre-training | Slurm | Link |

### Inference benchmarks: A3 Mega

| Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
| ------ | ---------------- | --------- | ------------- | ------------ | ------------------ |
| **Llama-4** | A3 Mega (NVIDIA H100) | SGLang | Inference | GKE | Link |
| **DeepSeek R1 671B** | A3 Mega (NVIDIA H100) | SGLang | Inference | GKE | Link |
| **DeepSeek R1 671B** | A3 Mega (NVIDIA H100) | vLLM | Inference | GKE | Link |

### Inference benchmarks: A3 Ultra

| Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
| ------ | ---------------- | --------- | ------------- | ------------ | ------------------ |
| **GPT OSS 120B** | A3 Ultra (NVIDIA H200) | vLLM | Inference | GKE | Link |
| **Llama-4** | A3 Ultra (NVIDIA H200) | vLLM | Inference | GKE | Link |
| **Llama-3.1-405B** | A3 Ultra (NVIDIA H200) | TensorRT-LLM | Inference | GKE | Link |
| **DeepSeek R1 671B** | A3 Ultra (NVIDIA H200) | SGLang | Inference | GKE | Link |
| **DeepSeek R1 671B** | A3 Ultra (NVIDIA H200) | vLLM | Inference | GKE | Link |

### Inference benchmarks: A4

| Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
| ------ | ---------------- | --------- | ------------- | ------------ | ------------------ |
| **DeepSeek R1 671B** | A4 (NVIDIA B200) | vLLM | Inference | GKE | Link |
| **DeepSeek R1 671B** | A4 (NVIDIA B200) | SGLang | Inference | GKE | Link |

### Inference benchmarks: G4

| Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
| ------ | ---------------- | --------- | ------------- | ------------ | ------------------ |
| **Qwen3 8B** | G4 (NVIDIA RTX PRO 6000 Blackwell) | vLLM | Inference | GCE | Link |
| **Qwen3 30B A3B** | G4 (NVIDIA RTX PRO 6000 Blackwell) | TensorRT-LLM | Inference | GCE | Link |
| **Qwen3 4B** | G4 (NVIDIA RTX PRO 6000 Blackwell) | TensorRT-LLM | Inference | GCE | Link |
| **Qwen3 8B** | G4 (NVIDIA RTX PRO 6000 Blackwell) | TensorRT-LLM | Inference | GCE | Link |
| **Qwen3 32B** | G4 (NVIDIA RTX PRO 6000 Blackwell) | TensorRT-LLM | Inference | GCE | Link |
| **Qwen3 32B** | G4 (NVIDIA RTX PRO 6000 Blackwell) | vLLM | Inference | GCE | Link |
| **Llama3.1 70B** | G4 (NVIDIA RTX PRO 6000 Blackwell) | TensorRT-LLM | Inference | GCE | Link |
| **DeepSeek R1** | G4 (NVIDIA… |
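The "select a recipe" step above amounts to filtering the support matrix by the requirements you identified. A minimal sketch of that lookup, using a hand-copied subset of the rows above (the `RECIPES` list and field names are illustrative, not data shipped by the gpu-recipes repository):

```python
# Hand-copied subset of the benchmark support matrix; the structure and
# field names here are illustrative, not part of the repository itself.
RECIPES = [
    {"model": "Llama-3.1-70B", "machine": "A3 Ultra (NVIDIA H200)",
     "framework": "MaxText", "workload": "Pre-training", "orchestrator": "GKE"},
    {"model": "Llama-3.1-70B", "machine": "A3 Ultra (NVIDIA H200)",
     "framework": "NeMo", "workload": "Pre-training", "orchestrator": "GKE"},
    {"model": "DeepSeek R1 671B", "machine": "A4 (NVIDIA B200)",
     "framework": "vLLM", "workload": "Inference", "orchestrator": "GKE"},
]


def find_recipes(**criteria):
    """Return matrix rows matching every given field, e.g. framework='NeMo'."""
    return [row for row in RECIPES
            if all(row.get(key) == value for key, value in criteria.items())]
```

For example, `find_recipes(model="Llama-3.1-70B", framework="MaxText")` narrows the matrix to the single matching pre-training recipe, mirroring how you would scan the tables by hand.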