# FoundationVision/VAR

[NeurIPS 2024 Best Paper Award] [GPT beats diffusion🔥] [scaling laws in visual generation📈]

Official implementation of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
## VAR: a new visual generation method elevates GPT-style models beyond diffusion🚀 & scaling laws observed📈

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction — NeurIPS 2024 Best Paper

### News

- **2025-11:** We release our text-to-video generation model **InfinityStar** based on VAR & Infinity; please check Infinity⭐️.
- **2025-11:** 🎉 InfinityStar is accepted as a **NeurIPS 2025 Oral**.
- **2025-04:** 🎉 Infinity is accepted as a **CVPR 2025 Oral**.
- **2024-12:** 🏆 VAR received the **NeurIPS 2024 Best Paper Award**.
- **2024-12:** 🔥 We release our text-to-image research based on VAR; please check Infinity.
- **2024-09:** VAR is accepted as a **NeurIPS 2024 Oral** presentation.
- **2024-04:** Visual AutoRegressive modeling is released.

### 🕹️ Try and play with VAR!

~~We provide a demo website for you to play with VAR models and generate images interactively. Enjoy the fun of visual autoregressive modeling!~~

We provide a demo website for you to play with VAR text-to-image and generate images interactively. Enjoy the fun of visual autoregressive modeling!

We also provide demo_sample.ipynb for you to see more technical details about VAR.

### What's new?

- 🔥 **Introducing VAR: a new paradigm in autoregressive visual generation**✨: Visual Autoregressive Modeling (VAR) redefines autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard raster-scan "next-token prediction".
- 🔥 **For the first time, GPT-style autoregressive models surpass diffusion models**🚀.
- 🔥 **Discovering power-law scaling laws in VAR transformers**📈.
- 🔥 **Zero-shot generalizability**🛠️.

For a deep dive into our analyses, discussions, and evaluations, check out our paper.

### VAR zoo

We provide VAR models for you to play with, which are on or can be downloaded from the following links:

| model      | reso. | FID      | rel. cost | #params | HF weights🤗 |
|:----------:|:-----:|:--------:|:---------:|:-------:|:------------|
| VAR-d16    | 256   | 3.55     | 0.4       | 310M    | var_d16.pth |
| VAR-d20    | 256   | 2.95     | 0.5       | 600M    | var_d20.pth |
| VAR-d24    | 256   | 2.33     | 0.6       | 1.0B    | var_d24.pth |
| VAR-d30    | 256   | 1.97     | 1         | 2.0B    | var_d30.pth |
| VAR-d30-re | 256   | **1.80** | 1         | 2.0B    | var_d30.pth |
| VAR-d36    | 512   | **2.63** | -         | 2.3B    | var_d36.pth |

You can load these models to generate images via the code in demo_sample.ipynb. Note: you need to download vae_ch160v4096z32.pth first.

### Installation

- Install .
- Install other pip packages via .
- Prepare the ImageNet dataset; assume ImageNet is in . It should be like this: **NOTE: the arg should be passed to the training script.**
- (Optional) install and compile and for faster attention computation. Our code will automatically use them if installed; see models/basic_var.py#L15-L30.

### Training scripts

To train VAR-{d16, d20, d24, d30, d36-s} on ImageNet 256x256 or 512x512, you can run the following command:

A folder named will be created to save the checkpoints and logs. You can monitor the training process by checking the logs in and , or by using . If your experiment is interrupted, just rerun the command, and training will **automatically resume** from the last checkpoint in (see utils/misc.py#L344-L357).

### Sampling & zero-shot inference

For FID evaluation, use to sample 50,000 images (50 per class) and save them as PNG (not JPEG) files in a folder. Pack them into a file via in utils/misc.py#L344. Then use OpenAI's FID evaluation toolkit and the reference ground-truth npz file of 256x256 or 512x512 to evaluate FID, IS, precision, and recall. Note that a relatively small is used as a trade-off between image quality and diversity. You can adjust it to , or sample with for **better visual quality**. We'll provide the sampling script later.
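To make the coarse-to-fine "next-scale prediction" idea concrete, the sketch below enumerates a multi-scale token schedule. The `PATCH_NUMS` tuple follows the 256x256 configuration described in the VAR paper; it is reproduced here as an assumption for illustration, not imported from this codebase:

```python
# Illustrative sketch of VAR's coarse-to-fine "next-scale prediction" schedule.
# Each scale k is a (pn x pn) token map; the model predicts the whole map of
# scale k+1 in one step, conditioned on all coarser scales.

PATCH_NUMS = (1, 2, 3, 4, 5, 6, 8, 10, 13, 16)  # side length of each token map

def schedule(patch_nums=PATCH_NUMS):
    """Return (side, tokens_this_step, cumulative_sequence_length) per step."""
    steps, total = [], 0
    for pn in patch_nums:
        tokens = pn * pn          # one full token map is predicted per step
        total += tokens
        steps.append((pn, tokens, total))
    return steps

for pn, tokens, total in schedule():
    print(f"scale {pn:2d}x{pn:<2d}: predict {tokens:3d} tokens "
          f"(sequence length so far: {total})")
```

Under this schedule the final 16x16 map is reached in only 10 autoregressive steps (680 tokens total) rather than 256 raster-scan steps, which is where VAR's inference speedup over next-token prediction comes from.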
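The npz-packing step for FID evaluation can be sketched as follows. The helper name `pack_samples_to_npz` is hypothetical (the repository's own helper lives in utils/misc.py), and dummy uint8 arrays stand in for the 50,000 decoded PNGs; the point is only the layout OpenAI's evaluation toolkit consumes, a single (N, H, W, 3) uint8 array stored under numpy's default `arr_0` key:

```python
import numpy as np

def pack_samples_to_npz(samples, out_path):
    """Hypothetical stand-in for the repo's npz-packing helper.

    `samples` is a list of HxWx3 uint8 arrays (decoded PNG samples in real
    use). The FID evaluator expects one (N, H, W, 3) uint8 array, saved
    under numpy's default positional key 'arr_0'.
    """
    arr = np.stack(samples).astype(np.uint8)
    assert arr.ndim == 4 and arr.shape[-1] == 3, "expected (N, H, W, 3)"
    np.savez(out_path, arr)  # positional argument -> stored as 'arr_0'
    return arr.shape

# Tiny dummy 8x8 RGB "samples" standing in for real 256x256 PNGs.
rng = np.random.default_rng(0)
dummy = [rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8) for _ in range(4)]
shape = pack_samples_to_npz(dummy, "samples.npz")
print(shape)  # (4, 8, 8, 3)
```

In real use, replace the dummy arrays with images decoded from the PNG sample folder; saving as PNG (not JPEG) matters because JPEG compression measurably shifts FID.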
### Third-party usage and research

***In this paragraph, we cross-link third-party repositories or research that use VAR and report results. You can let us know by raising an issue.***

| Time | Research | Link |
|:---|:---|:---|
| [5/12/2025] | [ICML 2025] Continuous Visual Autoregressive Generation via Score Maximization | https://github.com/shaochenze/EAR |
| [5/8/2025] | Generative Autoregressive Transformers for Model-Agnostic Federated MRI Reconstruction | https://github.com/icon-lab/FedGAT |
| [4/7/2025] | FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning | https://github.com/csguoh/FastVAR |
| [4/3/2025] | VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning | https://github.com/VARGPT-family/VARGPT-v1.1 |
| [3/31/2025] | Training-Free Text-Guided Image Editing with Visual Autoregressive Model | https://github.com/wyf0912/AREdit |
| [3/17/2025] | Next-Scale Autoregressive Models are Zero-Shot Single-Image Object View Synthesizers | https://github.com/Shiran-Yuan/ArchonView |
| [3/14/2025] | Safe-VAR: Safe Visual Autoregressive Model for Text-to-Image Generative Watermarking | https://arxiv.org/abs/2503.11324 |
| [3/3/2025] | [ICML 2025] Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator | https://research.nvidia.com/labs/dir/ddo/ |
| [2/28/2025] | Autoregressive Medical Image Segmentation via Next-Scale Mask Prediction | https://arxiv.org/abs/2502.20784 |
| [2/27/2025] | FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction | https://github.com/jiaosiyu1999/FlexVAR |
| [2/17/2025] | MARS: Mesh AutoRegressive Model for 3D Shape Detailization | https://arxiv.org/abs/2502.11390 |
| [1/31/2025] | [ICML 2025] Visual Autoregressive Modeling for Image Super-Resolut… |