# Lightricks / LTX-2
Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.
**LTX-2** is the first DiT-based audio-video foundation model that brings all core capabilities of modern video generation together in one model: synchronized audio and video, high fidelity, multiple performance modes, production-ready outputs, API access, and open access.

## 🚀 Quick Start

### Required Models

Download the following models from the LTX-2.3 HuggingFace repository:

- **LTX-2.3 Model Checkpoint** - choose and download one of the two provided variants
- **Spatial Upscaler** - required by the current two-stage pipeline implementations in this repository (two variants provided)
- **Temporal Upscaler** - supported by the model; will be required by future pipeline implementations
- **Distilled LoRA** - required by the current two-stage pipeline implementations in this repository (except `DistilledPipeline` and `ICLoraPipeline`)
- **Gemma Text Encoder** - download all assets from its repository
- **LoRAs** - the repository also links a collection of optional LoRAs

### Available Pipelines

- **TI2VidTwoStagesPipeline** - production-quality text/image-to-video with 2x upsampling (recommended)
- **TI2VidTwoStagesHQPipeline** - same two-stage flow as above, but uses the `res_2s` second-order sampler (fewer steps, better quality)
- **TI2VidOneStagePipeline** - single-stage generation for quick prototyping
- **DistilledPipeline** - fastest inference, with 8 predefined sigmas
- **ICLoraPipeline** - video-to-video and image-to-video transformations (uses the distilled model)
- **KeyframeInterpolationPipeline** - interpolate between keyframe images
- **A2VidPipelineTwoStage** - audio-to-video generation conditioned on an input audio file
- **RetakePipeline** - regenerate a specific time region of an existing video

## ⚡ Optimization Tips

- **Use DistilledPipeline** - fastest inference, with only 8 predefined sigmas (8 steps in stage 1, 4 in stage 2)
- **Enable FP8 quantization** - lowers the memory footprint; enable it via the CLI flag or the corresponding Python option. On Hopper GPUs with TensorRT-LLM, FP8 scaled matrix multiplication is also available.
- **Install attention optimizations** - use xFormers, or Flash Attention 3 on Hopper GPUs
- **Use gradient estimation** - reduce inference steps from 40 to 20-30 while maintaining quality (see the pipeline documentation)
- **Skip memory cleanup** - if you have sufficient VRAM, disable the automatic memory cleanup between stages for faster processing
- **Choose a single-stage pipeline** - faster generation when high resolution isn't required

## ✍️ Prompting for LTX-2

When writing prompts, focus on detailed, chronological descriptions of actions and scenes. Include specific movements, appearances, camera angles, and environmental details, all in a single flowing paragraph. Start directly with the action, and keep descriptions literal and precise. Think like a cinematographer describing a shot list, and keep prompts within 200 words.

For best results, build your prompts using this structure:

- Start with the main action in a single sentence
- Add specific details about movements and gestures
- Describe character/object appearances precisely
- Include background and environment details
- Specify camera angles and movements
- Describe lighting and colors
- Note any changes or sudden events

### Automatic Prompt Enhancement

LTX-2 pipelines support automatic prompt enhancement via a dedicated pipeline parameter.

## 🔌 ComfyUI Integration

To use the model with ComfyUI, follow the ComfyUI integration instructions linked from the repository.
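As an illustration of the prompt structure recommended above, here is a small helper that assembles the pieces into one flowing paragraph and enforces the 200-word budget. This is a hypothetical utility sketch, not part of the LTX-2 packages:

```python
def build_prompt(main_action, movements, appearance, environment,
                 camera, lighting, events=None):
    """Assemble prompt parts into a single flowing paragraph, following
    the suggested order: action first, then movement details, appearance,
    environment, camera, lighting, and any sudden events."""
    parts = [main_action, movements, appearance, environment, camera, lighting]
    if events:
        parts.append(events)
    # Join into one paragraph, normalizing trailing periods.
    prompt = " ".join(p.strip().rstrip(".") + "." for p in parts if p)
    # Keep within the recommended 200-word limit.
    if len(prompt.split()) > 200:
        raise ValueError("prompt exceeds 200 words")
    return prompt

prompt = build_prompt(
    main_action="A woman in a red coat walks across a rain-slicked plaza",
    movements="she pulls her hood up as gusts scatter leaves around her boots",
    appearance="her coat is bright crimson wool and her dark hair is wind-blown",
    environment="gray stone buildings frame the plaza under a heavy overcast sky",
    camera="a slow tracking shot follows her from the side at waist height",
    lighting="diffuse cool daylight leaves soft reflections on the wet pavement",
)
print(prompt)
```

The helper only enforces form (single paragraph, action-first ordering, word budget); the content quality still depends on writing literal, chronological descriptions as advised above.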
## 📦 Packages

This repository is organized as a monorepo with three main packages:

- **ltx-core** - core model implementation, inference stack, and utilities
- **ltx-pipelines** - high-level pipeline implementations for text-to-video, image-to-video, and other generation modes
- **ltx-trainer** - training and fine-tuning tools for LoRA, full fine-tuning, and IC-LoRA

Each package has its own README and documentation; see the Documentation section below.

## 📚 Documentation

Each package includes comprehensive documentation:

- **LTX-Core README** - core model implementation, inference stack, and utilities
- **LTX-Pipelines README** - high-level pipeline implementations and usage guides
- **LTX-Trainer README** - training and fine-tuning documentation with detailed guides
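The two-stage design used by the pipelines above (generate at a base resolution, then apply a 2x spatial upscaler) can be illustrated at the shape level with a NumPy toy. This is a conceptual sketch only; the shapes are arbitrary and the functions are stand-ins, not the actual ltx-pipelines API (where the spatial upscaler is a learned model, not a pixel repeat):

```python
import numpy as np

def fake_stage1(frames=33, height=64, width=96, channels=3, seed=0):
    # Stage 1 stand-in: "generate" a base-resolution video
    # (random pixels here; the real model runs a DiT denoising loop).
    rng = np.random.default_rng(seed)
    return rng.random((frames, height, width, channels), dtype=np.float32)

def fake_spatial_upscale_2x(video):
    # Stage 2 stand-in: 2x spatial upscaling via nearest-neighbor repeat
    # along the height and width axes.
    return video.repeat(2, axis=1).repeat(2, axis=2)

base = fake_stage1()
upscaled = fake_spatial_upscale_2x(base)
print(base.shape, "->", upscaled.shape)  # (33, 64, 96, 3) -> (33, 128, 192, 3)
```

The point of the sketch is the data flow: stage 2 quadruples the pixel count of every frame, which is why the spatial upscaler checkpoint is a required download for the two-stage pipelines.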