AI Architecture Analysis
This repository is indexed by RepoMind. By analyzing modelscope/DiffSynth-Studio in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.
Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.
Repository Overview (README excerpt)
DiffSynth-Studio

Introduction

DiffSynth-Studio documentation: Chinese version, English version.

Welcome to the magical world of Diffusion models! DiffSynth-Studio is an open-source Diffusion model engine developed and maintained by the ModelScope Community. We hope to foster technological innovation through framework construction, aggregate the power of the open-source community, and explore the boundaries of generative model technology!

DiffSynth currently includes two open-source projects:
• DiffSynth-Studio: focused on aggressive technical exploration, targeting academia, and providing cutting-edge model capabilities.
• DiffSynth-Engine: focused on stable model deployment, targeting industry, and providing higher computational performance and more stable features.

DiffSynth-Studio and DiffSynth-Engine are the core engines of the ModelScope AIGC zone. You are welcome to try our carefully crafted productized features:
• ModelScope AIGC Zone (for Chinese users): https://modelscope.cn/aigc/home
• ModelScope Civision (for global users): https://modelscope.ai/civision/home

We believe that a well-developed open-source framework lowers the threshold for technical exploration. We have built many interesting technologies on this codebase, and perhaps you have wild ideas of your own that DiffSynth-Studio can help you realize quickly. For this reason we have prepared detailed documentation for developers; we hope it helps you understand the principles of Diffusion models, and we look forward to expanding the boundaries of the technology together with you.

Update History

> DiffSynth-Studio has undergone major version updates, and some old features are no longer maintained. If you need an old feature, please switch to the last release before the major version update.
> This project currently has limited development staff, with most of the work handled by Artiprocher and mi804.
Therefore, new feature development will be relatively slow, and our capacity to respond to and resolve issues is limited. We apologize for this and ask for developers' understanding.

• **March 12, 2026**: Added support for the LTX-2.3 audio-video generation model. Features include text-to-audio/video, image-to-audio/video, IC-LoRA control, audio-to-video, and audio-video inpainting. Complete inference and training functionality is supported. For details, please refer to the documentation and code.
• **March 3, 2026**: Released the DiffSynth-Studio/Qwen-Image-Layered-Control-V2 model, an updated version of Qwen-Image-Layered-Control. In addition to the originally supported text-guided functionality, it adds brush-controlled layer separation.
• **March 2, 2026**: Added support for Anima, an interesting anime-style image generation model. For details, please refer to the documentation. We look forward to its future updates.
• **February 26, 2026**: Added full and LoRA training support for the LTX-2 audio-video generation model. See the documentation for details.
• **February 10, 2026**: Added inference support for the LTX-2 audio-video generation model. See the documentation for details. Training support will be implemented in the future.
• **February 2, 2026**: The first document of the Research Tutorial series is now available, guiding you through training a small 0.1B text-to-image model from scratch. For details, see the documentation and model. We hope DiffSynth-Studio can evolve into a more powerful training framework for Diffusion models.
• **January 27, 2026**: Z-Image is released, and our Z-Image-i2L model is released concurrently. You can use it in ModelScope Studios. For details, see the documentation.
• **January 19, 2026**: Added support for the openmoss/MOVA-720p and openmoss/MOVA-360p models, including training and inference. Documentation and example code are now available.
• **January 19, 2026**: Added support for the FLUX.2-klein-4B and FLUX.2-klein-9B models, including training and inference. Documentation and example code are now available.
• **January 12, 2026**: We trained and open-sourced a text-guided image layer separation model (Model Link). Given an input image and a textual description, the model isolates the image layer corresponding to the described content. For more details, please refer to our blog post (Chinese version, English version).
• **December 24, 2025**: Based on Qwen-Image-Edit-2511, we trained an In-Context Editing LoRA model (Model Link). This model takes three images as input (Image A, Image B, and Image C), automatically analyzes the transformation from Image A to Image B, then applies the same transformation to Image C to generate Image D. For more details, please refer to our blog post (Chinese version, English version).
• **December 9, 2025**: We released a wild model based on DiffSynth-Studio 2.0: Qwen-Image-i2L (Image-to-LoRA). This model takes an image as input and outputs a LoRA. Although this version still has significant room for improvement in generalization, detail preservation, and other aspects, we are open-sourcing these models to inspire more innovative research. For more details, please refer to our blog.
• **December 4, 2025**: DiffSynth-Studio 2.0 released! Many new features are online:
  • Documentation online: our documentation is still being continuously optimized and updated.
  • VRAM Management module upgraded, supporting layer-level disk offload and releasing both memory and VRAM simultaneously.
  • New model support:
    • Z-Image Turbo: Model, Documentation, Code
    • FLUX.2-dev: Model, Documentation, Code
  • Training framework upgrade:
    • Split Training: supports automatically splitting the training process into two stages, data processing and training (even when training ControlNet or any other model).
Computations that do not require gradient backpropagation, such as text encoding and…
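The split-training idea described above can be illustrated with a minimal, framework-free sketch: stage 1 runs everything that needs no gradients (e.g. text encoding with a frozen encoder) once and caches the results; stage 2 then trains using only the cached outputs, so the frozen encoder never needs to occupy memory during training. All names here (`encode_text`, `stage1_preprocess`, `stage2_train`) are illustrative stand-ins, not the actual DiffSynth-Studio API.

```python
def encode_text(prompt):
    """Stand-in for a frozen text encoder: deterministic and gradient-free."""
    return [ord(c) / 255.0 for c in prompt]

def stage1_preprocess(prompts):
    """Stage 1 (data processing): run the frozen encoder once, cache results.
    In a real pipeline the cache would be written to disk, after which the
    encoder's weights can be freed entirely."""
    return {p: encode_text(p) for p in prompts}

def stage2_train(cache, lr=0.1, steps=20):
    """Stage 2 (training): a toy loop that reads only cached embeddings.
    Fits a single scalar to the mean embedding value via gradient descent
    on a squared-error loss."""
    weight = 0.0
    for _ in range(steps):
        for emb in cache.values():
            target = sum(emb) / len(emb)
            grad = weight - target        # d/dw of 0.5 * (w - target)^2
            weight -= lr * grad
    return weight

prompts = ["a cat", "a dog"]
cache = stage1_preprocess(prompts)        # stage 1: no gradients involved
final_weight = stage2_train(cache)        # stage 2: encoder no longer needed
```

The payoff of this separation is that the gradient-free stage is embarrassingly cacheable: re-running training with different hyperparameters reuses the cache, and the training stage's peak memory excludes the frozen encoder entirely.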