jianchang512 / pyvideotrans

Translate the video from one language to another and embed dubbing & subtitles.

16,448 stars
1,928 forks
328 issues
Python, Batchfile

Repository Overview (README excerpt)

> **Recall.ai - Meeting Transcription API**
>
> If you’re looking for a transcription API for meetings, consider checking out **Recall.ai**, an API that works with Zoom, Google Meet, Microsoft Teams, and more. Recall.ai diarizes by pulling the speaker data and separate audio streams from the meeting platforms, which means 100% accurate speaker diarization with actual speaker names.

pyVideoTrans

**A Powerful Open Source Video Translation / Audio Transcription / AI Dubbing / Subtitle Translation Tool**

中文 (Chinese) | **Documentation** | **Online Q&A**

**pyVideoTrans** is dedicated to seamlessly converting videos from one language to another, offering a complete workflow that includes speech recognition, subtitle translation, multi-role dubbing, and audio-video synchronization. It supports both local offline deployment and a wide variety of mainstream online APIs.

---

✨ Core Features

• **🎥 Fully Automatic Video Translation**: One-click workflow: Speech Recognition (ASR) -> Subtitle Translation -> Speech Synthesis (TTS) -> Video Synthesis.
• **🎙️ Audio Transcription / Subtitle Generation**: Batch convert audio/video to SRT subtitles, supporting **Speaker Diarization** to distinguish between different roles.
• **🗣️ Multi-Role AI Dubbing**: Assign different AI dubbing voices to different speakers.
• **🧬 Voice Cloning**: Integrates models like **F5-TTS, CosyVoice, GPT-SoVITS** for zero-shot voice cloning.
• **🧠 Powerful Model Support**:
  • **ASR**: Faster-Whisper (Local), OpenAI Whisper, Alibaba Qwen, ByteDance Volcano, Azure, Google, etc.
  • **LLM Translation**: DeepSeek, ChatGPT, Claude, Gemini, MiniMax, Ollama (Local), Alibaba Bailian, etc.
  • **TTS**: Edge-TTS (Free), OpenAI, Azure, Minimaxi, ChatTTS, ChatterBox, etc.
• **🖥️ Interactive Editing**: Supports pausing and manual proofreading at each stage (recognition, translation, dubbing) to ensure accuracy.
• **🛠️ Utility Toolkit**: Includes auxiliary tools such as vocal separation, video/subtitle merging, audio-video alignment, and transcript matching.
• **💻 Command Line Interface (CLI)**: Supports headless operation, convenient for server deployment or batch processing.

---

🚀 Quick Start (Windows Users)

We provide a pre-packaged version for Windows 10/11 users, requiring no Python environment configuration.

• **Download**: Download the latest pre-packaged version.
• **Unzip**: Extract the compressed file to a local folder.
• **Run**: Double-click the executable inside the extracted folder to launch.

> **Note**:
> * Do not run directly from within the compressed archive.
> * To use GPU acceleration, ensure **CUDA 12.8** and **cuDNN 9.11** are installed.

---

🛠️ Source Deployment (macOS / Linux / Windows Developers)

We recommend using **uv** for package management, for faster installs and better environment isolation.

• Prerequisites
  • **Python**: Recommended version 3.10 to 3.12.
  • **FFmpeg**: Must be installed and available through the environment variables (`PATH`).
    • **macOS** and **Linux (Ubuntu/Debian)**: Install FFmpeg with your package manager.
    • **Windows**: Download FFmpeg and add it to `PATH`, or place the FFmpeg executables directly in the project directory.
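The prerequisites above can be verified before installing anything else. The following is a minimal sketch and not part of pyVideoTrans: the function names (`python_version_ok`, `find_tool`, `check_prerequisites`) are hypothetical, the recommended range is assumed to be inclusive of 3.10 and 3.12, and FFmpeg is assumed to be discoverable on `PATH` under the name `ffmpeg`.

```python
import shutil
import sys

# Recommended interpreter range from the prerequisites list (assumed inclusive).
MIN_PYTHON = (3, 10)
MAX_PYTHON = (3, 12)


def python_version_ok(version=None):
    """Return True if (major, minor) falls inside the recommended range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


def find_tool(name="ffmpeg"):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


def check_prerequisites():
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    if not python_version_ok():
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(
            f"Python {found} is outside the recommended 3.10 to 3.12 range"
        )
    if find_tool("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

This check only looks at `PATH`. On Windows, FFmpeg placed directly in the project directory would not be detected by `shutil.which` unless that directory is also on `PATH`.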
• Install uv (if not installed)
• Clone and Install
• Launch Software
  • **Launch GUI**
  • **Use CLI** (view the documentation for detailed parameters)
• (Optional) GPU Acceleration Configuration: If you have an NVIDIA graphics card, install the CUDA-supported PyTorch version.

---

🧩 Supported Channels & Models (Partial)

| Category | Channel/Model | Description |
| :--- | :--- | :--- |
| **ASR (Speech Recognition)** | **Faster-Whisper** (Local) | Recommended, fast speed, high accuracy |
| | WhisperX / Parakeet | Supports timestamp alignment & speaker diarization |
| | Alibaba Qwen3-ASR / ByteDance Volcano | Online API, excellent for Chinese |
| **Translation (LLM/MT)** | **DeepSeek** / ChatGPT | Supports context understanding, more natural translation |
| | MiniMax AI | MiniMax M2.5 LLM, 204K context window, OpenAI-compatible |
| | Google / Microsoft | Traditional machine translation, fast speed |
| | Ollama / M2M100 | Fully local offline translation |
| **TTS (Speech Synthesis)** | **Edge-TTS** | Microsoft free interface, natural effect |
| | **F5-TTS / CosyVoice** | Supports **Voice Cloning**, requires local deployment |
| | GPT-SoVITS / ChatTTS | High-quality open-source TTS |
| | 302.AI / OpenAI / Azure | High-quality commercial API |

---

📚 Documentation & Support

• **Official Documentation**: https://pyvideotrans.com (includes detailed tutorials, API configuration guides, and an FAQ)
• **Online Q&A Community**: https://bbs.pyvideotrans.com (submit error logs for automated AI analysis and answers)

⚠️ Disclaimer

This software is an open-source, free, non-commercial project. Users are solely responsible for any legal consequences arising from the use of this software (including but not limited to calling third-party APIs or processing copyrighted video content). Please comply with local laws and regulations and the terms of use of relevant service providers.

🙏 Acknowledgements

This project mainly relies on the following open-source projects (partial):

• FFmpeg
• PySide6
• faster-whisper
• openai-whisper
• edge-tts
• F5-TTS
• CosyVoice

---

*Created by jianchang512*