lllyasviel / FramePack

Lets make video diffusion practical!

16,681 stars
1,649 forks
475 issues
Python

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing lllyasviel/FramePack in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.

Source files are only loaded when you start an analysis to optimize performance.
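As a purely illustrative sketch of the general idea (not RepoMind's actual engine), the difference from chunk-based RAG can be pictured as reading whole files into the model context at analysis time instead of retrieving pre-embedded snippets. Every name below (the function, the character budget, the example file path) is a hypothetical stand-in:

```python
# Illustrative sketch only: load complete source files on demand, no pre-chunking.
from pathlib import Path

def build_context(repo_root: str, candidate_paths: list[str], budget_chars: int = 200_000) -> str:
    """Concatenate whole source files (not retrieved snippets) until a context budget is hit."""
    parts, used = [], 0
    for rel in candidate_paths:                      # e.g. files an agent picked from the repo tree
        text = Path(repo_root, rel).read_text(errors="ignore")
        if used + len(text) > budget_chars:          # stop before overflowing the model context
            break
        parts.append(f"### {rel}\n{text}")
        used += len(text)
    return "\n\n".join(parts)

# Files are read only here, when an analysis actually starts; nothing is chunked or embedded ahead of time.
context = build_context("FramePack", ["demo_gradio.py"])  # hypothetical entry-point file
```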

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/lllyasviel/FramePack)

Repository Overview (README excerpt)


FramePack

Official implementation and desktop software for "Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models". Links: **Paper**, **Project Page**.

FramePack is a next-frame (next-frame-section) prediction neural network structure that generates videos progressively. FramePack compresses input contexts to a constant length so that the generation workload is invariant to video length. FramePack can process a very large number of frames with 13B models even on laptop GPUs. FramePack can be trained with a much larger batch size, similar to the batch size used for image diffusion training. **Video diffusion, but feels like image diffusion.**

News

• **2025 July 14:** Some pure text2video anti-drifting stress-test results of FramePack-P1 are uploaded here, using common prompts without any reference images.
• **2025 June 26:** Some results of FramePack-P1 are uploaded here. FramePack-P1 will be the next version of FramePack with two designs: Planned Anti-Drifting and History Discretization.
• **2025 May 03:** FramePack-F1 is released. Try it here.

Note that this GitHub repository is the only official FramePack website. We do not have any web services. All other websites are spam and fake; the README lists the known fake domains, which are omitted from this excerpt. Again, they are all spam and fake. **Do not pay money or download files from any of those websites.**

Requirements

Note that this repo is a functional desktop software with a minimal standalone high-quality sampling system and memory management. **Start with this repo before you try anything else!**

Requirements:
• Nvidia GPU in the RTX 30XX, 40XX, or 50XX series that supports fp16 and bf16. The GTX 10XX/20XX series are not tested.
• Linux or Windows operating system.
• At least 6GB GPU memory.

To generate a 1-minute video (60 seconds) at 30fps (1800 frames) using the 13B model, the minimal required GPU memory is 6GB. (Yes, 6GB, not a typo. Laptop GPUs are okay.)

About speed: on my RTX 4090 desktop it generates at a speed of 2.5 seconds/frame (unoptimized) or 1.5 seconds/frame (teacache). On my laptops, like the 3070ti laptop or 3060 laptop, it is about 4x to 8x slower. Troubleshoot if your speed is much slower than this. In any case, you will directly see the generated frames since it is next-frame(-section) prediction, so you will get lots of visual feedback before the entire video is generated.

Installation

**Windows**: >>> Click Here to Download One-Click Package (CUDA 12.6 + Pytorch 2.6)

For a first sanity check, copy the provided prompt and set the parameters like this (all default parameters, with teacache turned off); the result will match the reference video. (The prompt text, settings screenshot, and result video are embedded in the README; video may be compressed by GitHub.)

**Important Note:** Again, this is a next-frame-section prediction model. This means you will generate videos frame-by-frame or section-by-section. **If you get a much shorter video in the UI, like a video with only 1 second, then it is totally expected.** You just need to wait; more sections will be generated to complete the video.

Know the influence of TeaCache and Quantization

Download the test image, copy the provided prompt, and set the parameters as shown. (The image, prompt, and settings screenshot are embedded in the README.) Turn off teacache and you will get the reference result. Now turn on teacache: about 30% of users will get a typical worse result (the other 70% will get other random results depending on their hardware). (Both result videos are embedded in the README; video may be compressed by GitHub.) So you can see that teacache is not really lossless and sometimes can influence the result a lot. We recommend using teacache to try ideas and then using the full diffusion process to get high-quality results. This recommendation also applies to sage-attention, bnb quant, gguf, and so on.
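Before trying the 1-minute example below, here is a rough back-of-the-envelope estimate of total generation time, using the per-frame speeds quoted above (a sketch only; actual throughput depends on your hardware and settings):

```python
# Back-of-the-envelope timing using the RTX 4090 figures quoted above.
fps, duration_s = 30, 60                 # 1-minute video at 30fps
frames = fps * duration_s                # 1800 frames

for label, sec_per_frame in [("unoptimized", 2.5), ("teacache", 1.5)]:
    total_min = frames * sec_per_frame / 60
    print(f"{label:>12}: {frames} frames x {sec_per_frame} s/frame = {total_min:.0f} minutes")

# unoptimized: 1800 frames x 2.5 s/frame = 75 minutes
#    teacache: 1800 frames x 1.5 s/frame = 45 minutes
# Laptop GPUs (3070ti/3060 laptop) are quoted as roughly 4x to 8x slower than this.
```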
Image-to-1-minute

Set the video length to 60 seconds. If everything is in order, you will eventually get a result like the reference videos (a 60s version and a 6s version are embedded in the README; video may be compressed by GitHub).

More Examples

Many more examples are in the **Project Page**. Below are some more examples that you may be interested in reproducing. (Several more example videos are embedded in the README at this point; video may be compressed by GitHub.)

Prompting Guideline

Many people ask how to write better prompts. Below is a ChatGPT template that I personally often use to get prompts:

You are an assistant that writes short, motion-focused prompts for animating images. When the user sends an image, respond with a single, concise prompt describing visual motion (such as human activity, moving objects, or camera movements). Focus only on how the scene could come alive and become dynamic using brief phrases. Larger and more dynamic motions (like dancing, jumping, running, etc.) are preferred over smaller or more subtle ones (like standing still, sitting, etc.). Describe the subject, then the motion, then other things. For example: "The girl dances gracefully, with clear movements, full of charm." If there is something that can dance (like a man, girl, robot, etc.), then prefer to describe it as dancing. Stay in a loop: one image in, one motion prompt out. Do not explain, ask questions, or generate multiple options.

You paste this instruction into ChatGPT and then feed it an image to get a prompt like this:

*The man dances powerfully, striking sharp poses and gliding smoothly across the reflective floor.*

Usually this will give you a prompt that works well. You can also write prompts yourself. Concise prompts are usually preferred, for example:

*The girl dances gracefully, with clear movements, full of charm.*
*The man dances powerfully, with clear movements, full of energy.*

and so on. (A minimal scripted version of this image-to-prompt workflow is sketched below, after the citation.)

Cite

@inproceedings{zhang2025framepack,
  title={Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models},
  author={Lvmin Zhang and Shengqu Cai and Muyang Li and Gordon Wetzstein and Maneesh Agrawala},
  bookt…
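If you prefer to script the prompting guideline above rather than pasting the template into ChatGPT by hand, the sketch below shows one way to do it with the OpenAI Python SDK. This is not part of FramePack: the package, the model name "gpt-4o", and the abridged system prompt are all assumptions, and any vision-capable chat API could be substituted.

```python
# Minimal sketch: send one image, get back one concise motion prompt for FramePack.
# Assumes the "openai" package is installed and OPENAI_API_KEY is set; not part of this repo.
import base64
from openai import OpenAI

SYSTEM_PROMPT = (  # abridged version of the ChatGPT template from the Prompting Guideline above
    "You are an assistant that writes short, motion-focused prompts for animating images. "
    "When the user sends an image, respond with a single, concise prompt describing visual motion. "
    "Stay in a loop: one image in, one motion prompt out."
)

def motion_prompt(image_path: str) -> str:
    """Return a single motion-focused prompt for the given input image."""
    client = OpenAI()
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed model name; use any vision-capable model you have access to
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ]},
        ],
    )
    return resp.choices[0].message.content.strip()

# Example: motion_prompt("input.png") might return something like
# "The man dances powerfully, striking sharp poses and gliding smoothly across the reflective floor."
```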