nomadkaraoke / karaoke-gen
Generate karaoke videos by downloading audio and lyrics, separating instrumentals, synchronising lyrics using transcription models, rendering CDG, and uploading videos to YouTube, Dropbox, or Google Drive.
Repository Overview (README excerpt)
Karaoke Generator 🎶 🎥 🚀

Generate professional karaoke videos with instrumental audio and synchronized lyrics. Available as a **local CLI** ( ) or **cloud-based CLI** ( ) that offloads processing to Google Cloud.

✨ Two Ways to Generate Karaoke

• Local CLI ( ): Run all processing locally on your machine. Requires a GPU for optimal audio separation performance.
• Remote CLI ( ): Offload all processing to a cloud backend. No GPU required; just authenticate and submit jobs.

Both CLIs produce identical outputs: 4K karaoke videos, CDG+MP3 packages, audio stems, and more.

---

🎯 Features

Core Pipeline

• **Audio Separation**: AI-powered vocal/instrumental separation using MDX and Demucs models
• **Lyrics Transcription**: Word-level timestamps via AudioShake API
• **Lyrics Correction**: Match transcription against online lyrics (Genius, Spotify, Musixmatch)
• **Human Review**: Interactive UI for correcting lyrics before final render
• **Video Rendering**: High-quality 4K karaoke videos with customizable styles
• **Multiple Outputs**: MP4 (4K lossless/lossy, 720p), MKV, CDG+MP3, TXT+MP3

Distribution Features

• **YouTube Upload**: Automatic upload to your YouTube channel
• **Dropbox Integration**: Organize output in brand-coded folders
• **Google Drive**: Upload to public share folders
• **Discord Notifications**: Webhook notifications on completion

---

📦 Installation

This installs both (local) and (cloud) CLIs.

Requirements

• Python 3.10-3.13
• FFmpeg
• For local processing: CUDA-capable GPU or Apple Silicon CPU recommended

Transcription Provider Setup

**Transcription is required** for creating karaoke videos with synchronized lyrics. The system needs word-level timing data to display lyrics in sync with the music.

Option 1: AudioShake (Recommended)

Commercial service with high-quality transcription. Best for production use. Get an API key at https://www.audioshake.ai/ (available to businesses only at the time of writing).
Option 2: Local Whisper (No Cloud Required)

Run Whisper directly on your local machine using whisper-timestamped. Works on CPU, NVIDIA GPU (CUDA), or Apple Silicon.

**Model Size Guide:**

| Model | VRAM | Speed | Quality |
|-------|------|-------|---------|
| tiny | ~1GB | Fast | Lower |
| base | ~1GB | Fast | Basic |
| small | ~2GB | Medium | Good |
| medium | ~5GB | Slower | Better |
| large | ~10GB | Slowest | Best |

**CPU-Only Installation** (no GPU required):

Local Whisper runs automatically as a fallback when no cloud transcription services are configured.

Option 3: Whisper via RunPod

Cloud-based alternative using OpenAI's Whisper model on RunPod infrastructure. Set up a Whisper endpoint at https://www.runpod.io/

Without Transcription (Instrumental Only)

If you don't need synchronized lyrics, use the flag:

This creates an instrumental-only karaoke video without lyrics overlay.

> **Note:** See for detailed transcription provider configuration options.

---

🖥️ Local CLI ( )

Basic Usage

Remote Audio Separation (Optional)

Offload just the GPU-intensive audio separation to Modal.com while keeping other processing local:

Key Options

Full Options Reference

---

☁️ Remote CLI ( )

The remote CLI submits jobs to a Google Cloud backend that handles all processing. You don't need a GPU or any audio processing libraries installed locally.

Setup

• **Set the backend URL:**
• **Authenticate with Google Cloud:**

Basic Usage

Job Management

Full Production Run

Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| | Backend service URL | Required |
| | Admin auth token (for protected endpoints) | Optional |
| | Lyrics review UI URL | |
| | Seconds between status polls | |

**Note:** The defaults to the hosted lyrics review UI. For local development, set it to if you're running the frontend dev server.

Authentication

The backend uses token-based authentication for admin operations (bulk delete, internal worker triggers).
For basic job submission and monitoring, authentication is optional.

**For admin access:**

The token must match one of the tokens configured in the backend's environment variable.

Non-Interactive Mode

For automated/CI usage:

The flag auto-accepts default corrections and selects clean instrumental.

---

🎨 Style Configuration

Create a file to customize the karaoke video appearance:

When using , all referenced files are automatically uploaded with your job.

---

📤 Output Files

A completed job produces:

---

🏗️ Deploy Your Own Backend

The cloud backend runs on Google Cloud Platform using:

• **Cloud Run**: Serverless API hosting
• **Firestore**: Job state management
• **Cloud Storage**: File uploads and outputs
• **Modal.com**: GPU-accelerated audio separation
• **AudioShake**: Lyrics transcription API

Prerequisites

• Google Cloud account with billing enabled
• Pulumi CLI
• Modal.com account (for audio separation)
• AudioShake API key

Infrastructure Setup

This creates:

• Firestore database
• Cloud Storage bucket
• Artifact Registry
• Service account with IAM roles
• Secret Manager secrets (you add values)

Add Secret Values

Deploy Cloud Run

Deployments happen automatically via GitHub Actions CI when pushing to . See for the full deployment workflow.

Point CLI to Your Backend

---

🔌 Backend API Reference

The backend exposes a REST API for job management.

Job Submission

**POST** Submit a new karaoke generation job with audio file and options.

Job Status

**GET** Get job status and details.

List Jobs

**GET** List all jobs with optional status filter.

Cancel Job

**POST** Cancel a running job.

Delete Job

**DELETE** Delete a job and its files.

Lyrics Review

**GET** Get correction data for lyrics review.

**POST** Submit corrected lyrics and trigger video rendering.

Instrumental Selection

**GET** Get available instrumental options.

**POST** Submit instrumental selection (clean or with_backing).

Download Files

**GET** Get download URLs for all output files.

**GET** Str…
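The Job Status endpoint and the "seconds between status polls" setting described earlier suggest a simple client-side polling loop. The following is a minimal sketch of that idea, not the project's actual client code. Because the endpoint paths and response format are not shown in this excerpt, the HTTP call is abstracted behind a caller-supplied `fetch_status` function; that function, the `"status"` key, and the status names in `TERMINAL_STATES` are all hypothetical.

```python
import time
from typing import Callable, Iterable, Mapping

# Hypothetical terminal states; the backend's real status names are not shown above.
TERMINAL_STATES = frozenset({"complete", "failed", "cancelled"})


def poll_job(
    fetch_status: Callable[[], Mapping[str, str]],
    poll_interval: float = 5.0,
    max_polls: int = 120,
    terminal_states: Iterable[str] = TERMINAL_STATES,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status() until it reports a terminal state, then return that state.

    fetch_status stands in for a GET request to the job status endpoint and is
    assumed to return a mapping with a "status" key (an assumption, not the
    documented response shape).
    """
    terminal = frozenset(terminal_states)
    for attempt in range(max_polls):
        status = fetch_status()["status"]
        if status in terminal:
            return status
        # Wait before the next poll, but not after the final attempt.
        if attempt < max_polls - 1:
            sleep(poll_interval)
    raise TimeoutError(f"job still running after {max_polls} polls")


if __name__ == "__main__":
    # Simulated backend responses so the sketch runs without a network call.
    responses = iter(
        [{"status": "processing"}, {"status": "rendering"}, {"status": "complete"}]
    )
    print(poll_job(lambda: next(responses), poll_interval=0.0))
```

Passing the fetcher and the sleep function as arguments keeps the loop independent of any particular HTTP library and makes it easy to exercise in CI without a live backend. A real client would wrap the status request (with the admin token, where needed) in `fetch_status`.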