
gracezhao1997 / Awesome-Video-World-Models-with-AR-Diffusion

A Curated List of Awesome Video World Models with AR Diffusion: Covering Algorithms, Applications, and Infrastructure, Aimed at Serving as a Comprehensive Resource for Researchers, Practitioners, and Enthusiasts.

View on GitHub
301 stars
9 forks
0 issues

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing gracezhao1997/Awesome-Video-World-Models-with-AR-Diffusion in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.
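The difference between chunked RAG retrieval and on-demand full-file loading can be sketched in a few lines. This is an illustrative toy, not RepoMind's actual engine; all function names and the chunking strategy here are assumptions.

```python
# Hypothetical contrast between chunk-based RAG retrieval and
# on-demand whole-file loading (the "Agentic CAG" idea described above).
# Names and chunking strategy are illustrative only.

def rag_context(files: dict, query: str, chunk_size: int = 80) -> list:
    """Traditional RAG: split every file into fixed-size chunks and keep
    only chunks mentioning the query -- context arrives fragmented,
    possibly cutting a function or class in half."""
    chunks = []
    for text in files.values():
        chunks += [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return [c for c in chunks if query in c]

def cag_context(files: dict, query: str) -> list:
    """Agentic CAG sketch: decide which *whole files* are relevant and
    load each one into context intact, so structure is never fragmented."""
    return [text for text in files.values() if query in text]
```

The trade-off the page alludes to: the chunked variant returns fragments that may split a definition across retrieval units, while the whole-file variant keeps each source file intact at the cost of a larger context.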

Source files are only loaded when you start an analysis to optimize performance.

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/gracezhao1997/Awesome-Video-World-Models-with-AR-Diffusion)

Repository Overview (README excerpt)


## 📹 Awesome Video World Models with AR Diffusion

### Overview

This repository focuses on **Video World Models with Autoregressive (AR) Diffusion**, a promising paradigm for **scalable, consistent, and interactive world modeling** (e.g., Genie 3). It aims to serve as a comprehensive and structured resource for researchers, practitioners, and enthusiasts interested in AR diffusion-based video world modeling. To stay at the forefront of the field, **this repository is updated weekly**.

### 🌟 Key Features

- **Structured Taxonomy:** We organize the evolving ecosystem from three complementary perspectives: **Algorithmic Foundations**, **Real-world Applications**, and **Infrastructure-level Acceleration**. Together, these dimensions reflect the full stack of AR diffusion, from modeling design to real-time interactive deployment.
- **One-Stop Citation Collection:** 📚 We provide a **consolidated BibTeX file** containing all papers listed in this repository. You can easily import it into your LaTeX or Zotero projects with one click!

### 📬 Contact

This repository is curated and maintained by:

- **Min Zhao** (gracezhao1997@gmail.com)
- **Hongzhou Zhu** (suinibian74@gmail.com)
- **Wenqiang Sun** (sunwq0814@gmail.com)

For any questions or suggestions, please feel free to reach out to us.

- 🎯 We have not yet compiled an exhaustive list of all related work. We apologize for any omissions and **welcome pull requests to merge them in**.
- 💡 We also welcome high-level categorization, synthesis, and perspective contributions to improve the organization and clarity of this repository.

### Table of Contents

- 1. Algorithm
  - 1.1 AR Diffusion (native pretraining)
  - 1.2 AR Diffusion Distillation for Real-time Generation (post-training)
  - 1.3 Long Video Generation
- 2. Application
  - 2.1 Open-source AR Video Foundation Models
  - 2.2 Interactive Video Action World Model
  - 2.3 Real-time Interactive Avatar & Motion Control
  - 2.4 Egocentric Interaction
  - 2.5 Embodied AI
- 3. Infrastructure
  - 3.1 Sparse Attention
  - 3.2 Caching
  - 3.3 Quantized Attention
- Contributing
- Acknowledgment

## Algorithm

### 1.1 AR Diffusion (native pretraining)

These methods focus on basic **AR diffusion**, where each chunk/frame is generated via diffusion and the chunks are generated autoregressively.

- **Diffusion Forcing**, "Next-token Prediction Meets Full-Sequence Diffusion".
- **Pyramid Flow**, "Pyramidal Flow Matching for Efficient Video Generative Modeling".
- **DFoT**, "History-Guided Video Diffusion".
- **AR-Diffusion**, "AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion".
- **PFVG**, "Pack and force your memory: Long-form and consistent video generation".
- **BAgger**, "BAgger: Backwards Aggregation for Mitigating Drift in Autoregressive Video Diffusion Models".
- **Resampling Forcing**, "End-to-End Training for Autoregressive Video Diffusion via Self-Resampling".
- **Helios**, "Helios: Real Real-Time Long Video Generation Model".

### 1.2 🔥 AR Diffusion Distillation for Real-time Generation (post-training)

This category of algorithms focuses on **distilling multi-step bidirectional diffusion models into few-step AR models**, specifically tailored for **real-time streaming generation**.

- From Multi-step Bidirectional Diffusion to Few-step Autoregressive Generators:
  - [⭐] **CausVid**, "From Slow Bidirectional to Fast Autoregressive Video Diffusion Models".
  - [⭐] **Self Forcing**, "Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion".
  - [⭐] **Causal Forcing**, "Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation".
- Further Improvements:
  - (Adversarial distillation) **Seaweed APT2**, "Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation".
  - (One-step distillation) **ASD**, "Towards One-Step Causal Video Generation via Adversarial Self-Distillation".
  - (Two-step distillation) **Diagonal Distillation**, "Streaming Autoregressive Video Generation via Diagonal Distillation".
  - (Reinforcement learning) **Reward Forcing**, "Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation".
  - (Reinforcement learning) **WorldCompass**, "WorldCompass: Reinforcement Learning for Long-Horizon World Models".
  - (Reinforcement learning) **AR-CoPO**, "AR-CoPO: Align Autoregressive Video Generation with Contrastive Policy Optimization".
  - (Reinforcement learning) **Astrolabe**, "Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models". [Website](https://franklinz233.github.io/projects/astrolabe/)

### 1.3 Long Video Generation

- From Short-video Generator to Long-video Generator:
  - **LongLive**, "LongLive: Real-time Interactive Long Video Generation".
  - **Rolling Forcing**, "Rolling Forcing: Autoregressive Long Video Diffusion in Real Time".
  - **Self Forcing++**, "Self-Forcing++: Towards Minute-Scale High-Quality Video Generation".
  - **Infinite Forcing**
  - **Infinity-RoPE**, "Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout".
  - **Deep Forcing**, "Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression".
  - **LoL**, "LoL: Longer than Longer, Scaling Video Generation to Hour".
  - **FLEX**, "Train Short, Inference Long: Training-free Horizon Extension for Autoregressive Video Generation".
  - **Rolling Sink**, "Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion".
  - **MMM**, "Mode Seeking meets Mean Seeking for Fast Long Video Generation".
  - **MemRoPE**, "MemRoPE: Training-Free Infinite Video Generation via Evolving Memory Tokens".
  - **Anchor Forcing**, "Anchor Forcing: Anchor Memory and Tri-Region RoPE for Interactive Streaming Video Diffusion".

_...truncated for preview_
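The AR-diffusion pattern that Section 1.1 describes (each chunk generated via diffusion, chunks generated autoregressively) can be sketched as a toy loop. The `denoise_step` below is a stand-in for a learned denoiser, not any of the listed models; all names and the update rule are illustrative assumptions.

```python
# Toy sketch of chunk-wise autoregressive diffusion: the video is built
# chunk by chunk (AR in time), and each chunk is refined from noise over
# several diffusion steps, conditioned on the chunks generated so far.
import random

def denoise_step(chunk, context, t):
    # Stand-in for a learned denoiser eps_theta(x_t, context, t):
    # here we simply pull the noisy chunk toward the context mean.
    target = sum(context) / len(context) if context else 0.0
    return [x + (target - x) * 0.5 for x in chunk]

def ar_diffusion_rollout(num_chunks=4, chunk_len=3, steps=4, seed=0):
    rng = random.Random(seed)
    video = []                                   # previously generated chunks (AR context)
    for _ in range(num_chunks):
        chunk = [rng.gauss(0, 1) for _ in range(chunk_len)]   # start from noise
        flat_context = [x for c in video for x in c]
        for t in reversed(range(steps)):                      # inner diffusion loop
            chunk = denoise_step(chunk, flat_context, t)
        video.append(chunk)                                   # commit chunk, extend context
    return video
```

The distillation methods in Section 1.2 target exactly this structure: they compress the inner `steps` loop from many denoising iterations down to one or a few, so the outer autoregressive loop can run at streaming frame rates.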