Wan-Video / Wan2.1
Wan: Open and Advanced Large-Scale Video Generative Models
AI Architecture Analysis
This repository is indexed by RepoMind. By analyzing Wan-Video/Wan2.1 in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.
Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.
Repository Overview (README excerpt)
Wan2.1

💜 Wan | 🖥️ GitHub | 🤗 Hugging Face | 🤖 ModelScope | 📑 Technical Report | 📑 Blog | 💬 WeChat Group | 📖 Discord

**Wan: Open and Advanced Large-Scale Video Generative Models**

In this repository, we present **Wan2.1**, a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. **Wan2.1** offers these key features:

- 👍 **SOTA Performance**: **Wan2.1** consistently outperforms existing open-source models and state-of-the-art commercial solutions across multiple benchmarks.
- 👍 **Supports Consumer-grade GPUs**: The T2V-1.3B model requires only 8.19 GB VRAM, making it compatible with almost all consumer-grade GPUs. It can generate a 5-second 480P video on an RTX 4090 in about 4 minutes (without optimization techniques like quantization). Its performance is even comparable to some closed-source models.
- 👍 **Multiple Tasks**: **Wan2.1** excels in Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and Video-to-Audio, advancing the field of video generation.
- 👍 **Visual Text Generation**: **Wan2.1** is the first video model capable of generating both Chinese and English text, featuring robust text generation that enhances its practical applications.
- 👍 **Powerful Video VAE**: **Wan-VAE** delivers exceptional efficiency and performance, encoding and decoding 1080P videos of any length while preserving temporal information, making it an ideal foundation for video and image generation.

Video Demos

🔥 Latest News!!

- May 14, 2025: 👋 We introduce **Wan2.1** VACE, an all-in-one model for video creation and editing, along with its inference code, weights, and technical report!
- Apr 17, 2025: 👋 We introduce **Wan2.1** FLF2V with its inference code and weights!
- Mar 21, 2025: 👋 We are excited to announce the release of the **Wan2.1** technical report. We welcome discussions and feedback!
- Mar 3, 2025: 👋 **Wan2.1**'s T2V and I2V have been integrated into Diffusers (T2V | I2V). Feel free to give it a try!
- Feb 27, 2025: 👋 **Wan2.1** has been integrated into ComfyUI. Enjoy!
- Feb 25, 2025: 👋 We've released the inference code and weights of **Wan2.1**.

Community Works

If your work has improved **Wan2.1** and you would like more people to see it, please inform us.

- Helios, a breakthrough video generation model based on **Wan2.1** that achieves minute-scale, high-quality video synthesis at 19.5 FPS on a single H100 GPU (about 10 FPS on a single Ascend NPU), without relying on conventional long-video anti-drifting strategies or standard video acceleration techniques. Visit their webpage for more details.
- Video-As-Prompt, the first unified semantic-controlled video generation model based on **Wan2.1-14B-I2V**, with a Mixture-of-Transformers architecture and in-context controls (e.g., concept, style, motion, camera). Refer to the project page for more examples.
- LightX2V, a lightweight and efficient video generation framework that integrates **Wan2.1** and **Wan2.2** and supports multiple engineering acceleration techniques for fast inference; it can run on GPUs from the RTX 5090 down to the RTX 4060 (8 GB VRAM).
- DriVerse, an autonomous driving world model based on **Wan2.1-14B-I2V** that generates future driving videos conditioned on any scene frame and a given trajectory. Refer to the project page for more examples.
- Training-Free-WAN-Editing, built on **Wan2.1-T2V-1.3B**, enables training-free video editing using image-based methods such as FlowEdit and FlowAlign.
- Wan-Move, accepted to NeurIPS 2025, a framework that brings **Wan2.1-I2V-14B** to SOTA fine-grained, point-level motion control. Refer to their project page for more information.
- EchoShot, a native multi-shot portrait video generation model based on **Wan2.1-T2V-1.3B**, generates multiple video clips featuring the same character with highly flexible content controllability. Refer to their project page for more information.
- AniCrafter, a human-centric animation model based on **Wan2.1-14B-I2V**, controls video diffusion models with 3DGS avatars to insert and animate anyone into any scene following given motion sequences. Refer to the project page for more examples.
- HyperMotion, a human image animation framework based on **Wan2.1**, addresses the challenge of generating complex human body motions in pose-guided animation. Refer to their website for more examples.
- MagicTryOn, a video virtual try-on framework built upon **Wan2.1-14B-I2V**, addresses the limitations of existing models in expressing garment details and maintaining dynamic stability during human motion. Refer to their website for more examples.
- ATI, built on **Wan2.1-I2V-14B**, is a trajectory-based motion-control framework that unifies object, local, and camera movements in video generation. Refer to their website for more examples.
- Phantom has developed a unified video generation framework for single- and multi-subject references based on both **Wan2.1-T2V-1.3B** and **Wan2.1-T2V-14B**. Please refer to their examples.
- UniAnimate-DiT, based on **Wan2.1-14B-I2V**, provides a trained human image animation model with open-sourced inference and training code. Feel free to enjoy it!
- CFG-Zero enhances **Wan2.1** (covering both T2V and I2V models) from the perspective of CFG.
- TeaCache now supports **Wan2.1** acceleration, increasing inference speed by approximately 2x. Feel free to give it a try!
- DiffSynth-Studio provides more support for **Wan2.1**, including video-to-video, FP8 quantization, VRAM optimization, LoRA training, and more. Please refer to their examples.

📑 Todo List

- Wan2.1 Text-to-Video
  - [x] Multi-GPU inference code for the 14B and 1.3B models
  - [x] Checkpoints of the 14B and 1.3B models
  - [x] Gradio demo
  - [x] ComfyUI integration
  - [x] Diffusers integration
  - [ ] Diffusers + Multi-GPU In…