Ksuriuri / index-tts-vllm

Added vLLM support to IndexTTS for faster inference.

1,083 stars

147 forks

148 issues

Python, Cuda, C

Repository Overview (README excerpt)

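IndexTTS-vLLM

Introduction

This project builds on index-tts and reimplements GPT model inference with the vllm library, which speeds up index-tts inference.

Inference speed gains (Index-TTS-v1/v1.5) on a single RTX 4090:

• RTF (Real-Time Factor) for a single request: ≈0.3 -> ≈0.1
• GPT model decode speed for a single request: ≈90 token/s -> ≈280 token/s
• Concurrency: with gpu_memory_utilization set to 0.25 (about 5 GB of GPU memory), roughly 16 concurrent requests ran without strain in testing (benchmark script: see […])

Changelog

• **[2025-09-22]** Added support for vllm v1; IndexTTS2 compatibility is in progress
• **[2025-09-28]** Added webui inference for IndexTTS2 and reorganized the weight files, so deployment is now easier! \0.0/ However, the current version does not seem to speed up the IndexTTS2 gpt; this still needs investigation
• **[2025-09-29]** Fixed the issue where inference acceleration had no effect on the IndexTTS2 gpt model
• **[2025-10-09]** Added compatibility for IndexTTS2 API calls, see API; the v1/1.5 API and the OpenAI-compatible API may still have bugs, to be fixed later
• **[2025-10-19]** Added vllm inference support for qwen0.6bemo4-merge
• **[2026-03-03]** vllm 0.16.0 support for gpt2 inference

TODO list

• Concurrency optimization for the V2 API: currently only gpt2 model inference runs in parallel and every other module runs serially. s2mel inference is expensive (the DiT needs 25 iteration steps) and seriously limits concurrent performance
• Inference acceleration for s2mel

Usage

• Clone this project with git
• Create and activate a conda environment
• Install the dependencies

  Install the dependencies with forced overwrite to avoid the protobuf version conflict between vllm 0.16.0 and descript-audiotools 0.7.2.

• Download the model weights

  Automatic download (recommended)

  Choose the model weights for the version you need and download them into the […] directory:

  **From ModelScope (recommended in mainland China):** […]

  **From Hugging Face:** […]

  Manual download

  • ModelScope: Index-TTS | IndexTTS-1.5 | IndexTTS-2
  • Hugging Face: IndexTTS-2

  Convert the original weights yourself (optional, not recommended)

  You can use […] to convert the official weight files yourself.

• Launch the webui!

  Run the script for the corresponding version (the first launch may take longer because the CUDA kernels for bigvgan have to be compiled).

API

The API is wrapped with fastapi. A launch example: […]

Launch parameters

• […]: required, path to the model weights
• […]: service IP address, defaults to […]
• […]: service port, defaults to […]
• […]: vllm GPU memory utilization, defaults to […]

API request examples

• v1/1.5: see […]
• v2: see […]

OpenAI API

• Added the /audio/speech API path, compatible with the OpenAI interface
• Added the /audio/voices API path, which returns the voice/character list

For details, see: createSpeech

New features

• **v1/v1.5:** Multi-speaker voice mixing: you can pass several reference audio clips, and the voice of the TTS output is a blend of them. (Passing several reference clips makes the output voice unstable; you can regenerate until you get a voice you like and then use that output as the reference audio.)

Performance

Word Error Rate (WER) Results for IndexTTS and Baseline Models on the **seed-test**

| model                   | zh    | en    |
| ----------------------- | ----- | ----- |
| Human                   | 1.254 | 2.143 |
| index-tts (num_beams=3) | 1.005 | 1.943 |
| index-tts (num_beams=1) | 1.107 | 2.032 |
| index-tts-vllm          | 1.12  | 1.987 |

The performance of the original project is largely preserved.

Concurrency test

See […]; the API service must be started first.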
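The RTF and decode-speed figures quoted in the introduction can be read with a small worked example. This is a minimal sketch, not code from the repository: it assumes the usual definition of RTF (synthesis time divided by the duration of the generated audio), and the function names are hypothetical.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced.

    Values below 1.0 mean synthesis is faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def speedup(before: float, after: float, lower_is_better: bool = True) -> float:
    """Ratio by which a metric improved (RTF: lower is better; token/s: higher is better)."""
    return before / after if lower_is_better else after / before


# 10 s of audio synthesized in 1 s gives an RTF of 0.1.
print(real_time_factor(1.0, 10.0))                          # 0.1
# RTF 0.3 -> 0.1, as quoted above.
print(round(speedup(0.3, 0.1), 2))                          # 3.0
# Decode speed 90 -> 280 token/s, as quoted above.
print(round(speedup(90, 280, lower_is_better=False), 2))    # 3.11
```

Both quoted figures correspond to roughly a threefold improvement for a single request.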
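A request to the OpenAI-compatible /audio/speech path might be built as follows. This is a hedged sketch using only the Python standard library: the `model`, `input`, and `voice` fields come from OpenAI's createSpeech interface, while the base URL `http://HOST:PORT`, the model string, and the voice name are placeholders, not values taken from this project. Whether the server expects a path prefix in front of /audio/speech is also an assumption to check against the running service.

```python
import json
import urllib.request


def build_speech_request(base_url: str, text: str, voice: str,
                         model: str = "tts-1") -> urllib.request.Request:
    """Build (but do not send) a POST request for the /audio/speech path.

    `model` and `voice` are placeholders; valid voice names would come
    from the /audio/voices path of the running service.
    """
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# HOST, PORT and "some_voice" are hypothetical; substitute your own.
req = build_speech_request("http://HOST:PORT", "Hello, world.", "some_voice")
print(req.get_method(), req.full_url)

# To actually send it once the API service is running:
# with urllib.request.urlopen(req) as resp:
#     audio_bytes = resp.read()
```

Building the request separately from sending it makes the payload easy to inspect before any service is available.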
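For readers unfamiliar with the metric in the performance table, word error rate can be sketched as the edit distance between a reference transcript and the recognized transcript, divided by the reference length. The code below is a generic illustration, not the evaluation script used for seed-test; tokenization (words for English, typically characters for Chinese) is left to the caller and is an assumption here.

```python
def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """Levenshtein distance over tokens, divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis tokens.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


ref = "the cat sat on the mat".split()
hyp = "the cat sit on mat".split()
# One substitution (sat -> sit) and one deletion (the): 2 errors over 6 words.
print(round(word_error_rate(ref, hyp), 3))  # 0.333
```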
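A concurrency test of the kind mentioned above could look roughly like this. It is a sketch under stated assumptions, not the project's benchmark script: `run_concurrent` and `fake_request` are hypothetical names, and `fake_request` merely sleeps in place of a real HTTP call to the API service.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_concurrent(send_request: Callable[[int], object], concurrency: int) -> dict:
    """Fire `concurrency` requests at once and collect simple latency stats."""

    def timed(index: int) -> float:
        start = time.perf_counter()
        send_request(index)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, range(concurrency)))
    wall = time.perf_counter() - wall_start
    return {
        "requests": concurrency,
        "wall_seconds": wall,
        "mean_latency": sum(latencies) / len(latencies),
        "max_latency": max(latencies),
    }


def fake_request(index: int) -> None:
    """Stand-in for a real TTS request; replace with an actual HTTP call."""
    time.sleep(0.05)


stats = run_concurrent(fake_request, concurrency=16)
print(stats)
```

If the service handles the requests in parallel, the wall time stays close to the latency of a single request instead of growing with the number of requests.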