
FluidInference / FluidAudio

Frontier CoreML audio models in your apps — text-to-speech, speech-to-text, voice activity detection, and speaker diarization. In Swift, powered by SOTA open source.

1,730 stars
221 forks
6 issues

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing FluidInference/FluidAudio in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.

Source files are only loaded when you start an analysis to optimize performance.

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/FluidInference/FluidAudio)

Repository Overview (README excerpt)


FluidAudio - Transcription, Text-to-speech, VAD, Speaker diarization with CoreML Models

FluidAudio is a Swift SDK for fully local, low-latency audio AI on Apple devices, with inference offloaded to the Apple Neural Engine (ANE), resulting in lower memory use and generally faster inference.

The SDK includes state-of-the-art speaker diarization, transcription, and voice activity detection via open-source models (MIT/Apache 2.0) that can be integrated with just a few lines of code. Models are optimized for background processing, ambient computing, and always-on workloads by running inference on the ANE, minimizing CPU usage and avoiding the GPU/MPS entirely.

For custom use cases, feedback, additional model support, or platform requests, join our Discord. We're also bringing visual, language, and TTS models to device and will share updates there.

Below are some featured local AI apps using FluidAudio models on macOS and iOS:

Want to convert your own model? Check out möbius.

Highlights

• **Automatic Speech Recognition (ASR)**: Parakeet TDT v3 (0.6b) for batch transcription supporting 25 European languages; Parakeet EOU (120m) for streaming ASR with end-of-utterance detection (English only)
• **Inverse Text Normalization (ITN)**: Post-process ASR output to convert spoken form to written form ("two hundred" → "200"). See text-processing-rs
• **Text-to-Speech (TTS)**: Kokoro (82m) for parallel synthesis with SSML and pronunciation control across 9 languages (EN, ES, FR, HI, IT, JA, PT, ZH); PocketTTS for streaming TTS with voice cloning support (English only)
• **Speaker Diarization (Online + Offline)**: Speaker separation and identification across audio streams, with a streaming pipeline for real-time processing and an offline batch pipeline with advanced clustering
• **Speaker Embedding Extraction**: Generate speaker embeddings for voice comparison and clustering; the embeddings can also be used for speaker identification
• **Voice Activity Detection (VAD)**: Voice activity detection with Silero models
• **Apple Neural Engine**: Models run efficiently on Apple's ANE for maximum performance with minimal power consumption
• **Open-Source Models**: All models are publicly available on HuggingFace, converted and optimized by our team, under permissive licenses

Video Demos

| Link | Description |
| --- | --- |
| **Spokenly Real-time ASR** | Video demonstration of FluidAudio's transcription accuracy and speed |
| **Senko Integration** | Python speaker diarization on Mac using FluidAudio's segmentation model |
| **Kokoro TTS** | Text-to-speech demo using FluidAudio's Kokoro and Silero models on iOS |
| **Parakeet Realtime EOU** | Parakeet streaming ASR with end-of-utterance detection on iOS |
| **Sortformer Diarization** | Sortformer for speaker diarization with overlapping speech on iOS |
| **PocketTTS** | Streaming text-to-speech using PocketTTS on iOS |

Showcase

Make a PR if you want to add your app; please keep the list in chronological order.

| App | Description |
| --- | --- |
| **Voice Ink** | Local AI for instant, private transcription with near-perfect accuracy. Uses Parakeet ASR. |
| **Spokenly** | Mac dictation app for fast, accurate voice-to-text; supports real-time dictation and file transcription. Uses Parakeet ASR and speaker diarization. |
| **Senko** | A very fast and accurate speaker diarization pipeline. A good example of how to integrate FluidAudio into a Python app. |
| **Slipbox** | Privacy-first meeting assistant for real-time conversation intelligence. Uses Parakeet ASR (iOS) and speaker diarization across platforms. |
| **Whisper Mate** | Transcribes movies and audio locally; records and transcribes in real time from speakers or system apps. Uses speaker diarization. |
| **Altic/Fluid Voice** | Lightweight, fully free and open-source voice-to-text dictation for macOS built using FluidAudio. Never pay for dictation apps. |
| **Paraspeech** | AI-powered voice-to-text. Fully offline. No subscriptions. |
| **mac-whisper-speedtest** | Comparison of different local ASR engines, including one of the first versions of FluidAudio's ASR models. |
| **Starling** | Open-source, fully local voice-to-text transcription with auto-paste at your cursor. |
| **BoltAI** | Write content 10x faster using Parakeet models. |
| **Voxeoflow** | Mac dictation app with real-time translation. Lightning-fast transcription in over 100 languages, instantly translated to your target language. |
| **Speakmac** | Mac app that lets you type anywhere on your Mac using your voice. Fully local, private dictation built on FluidAudio. |
| **SamScribe** | An open-source macOS app that captures and transcribes audio from your microphone and meeting applications (Zoom, Teams, Chrome) in real time, with cross-session speaker recognition. |
| **WhisKey** | Privacy-first voice dictation keyboard for iOS and macOS. On-device transcription in 12+ languages, AI meeting summaries, and mind map generation. Great for daily use and vibe-coding. Uses speaker diarization. |
| **Dictate Anywhere** | Native macOS dictation app with global Fn key activation. Dictate into any app with support for 25 languages. Uses Parakeet ASR. |
| **hongbomiao.com** | A personal R&D lab that facilitates knowledge sharing. Uses Parakeet ASR. |
| **Hex** | macOS app that lets you press and hold a hotkey to record your voice, transcribe it, and paste the text into any application. Uses Parakeet ASR. |
| **Super Voice Assistant** | Open-source macOS voice assistant with local transcription. Uses Parakeet ASR. |
| **VoiceTypr** | Open-source voice-to-text dictation for macOS and Windows. Uses Parakeet ASR. |
| **Summit AI Notes** | Local meeting transcription and summarization with speaker identification. Supports 100+ languages. |
| **Ora** | Local voice assistant for macOS with speech recognition and text-to-speech. |
| **Flowstay** | Easy text-to-speech, local post-processing, and Claude Code integration for macOS. Free forever. |
| **macos-speech-server** | OpenAI compatible STT/transcription and TTS/speech API s…
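The Inverse Text Normalization highlight above converts spoken-form ASR output such as "two hundred" into written form ("200"). The README points to text-processing-rs for the real implementation; the Swift sketch below is only a hypothetical, much-reduced illustration of the idea. It assumes space-separated English words without punctuation and handles cardinal numbers below one million; the `normalizeNumbers` function is invented for this example.

```swift
// Hypothetical, simplified inverse text normalization (ITN) sketch.
// Handles English cardinal numbers below one million only; this is not
// the text-processing-rs implementation.

let numberWords: [String: Int] = [
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19, "twenty": 20, "thirty": 30, "forty": 40,
    "fifty": 50, "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
]

/// Replaces each run of number words with its digit form.
func normalizeNumbers(_ text: String) -> String {
    var output: [String] = []
    var total = 0       // value from completed "thousand" groups
    var current = 0     // value of the group currently being built
    var inNumber = false

    // Emits the accumulated number (if any) and resets the state.
    func flush() {
        if inNumber { output.append(String(total + current)) }
        total = 0
        current = 0
        inNumber = false
    }

    for word in text.split(separator: " ").map({ String($0) }) {
        let lower = word.lowercased()
        if let value = numberWords[lower] {
            current += value
            inNumber = true
        } else if lower == "hundred" && inNumber {
            current *= 100
        } else if lower == "thousand" && inNumber {
            total += current * 1000
            current = 0
        } else {
            flush()
            output.append(word)
        }
    }
    flush()
    return output.joined(separator: " ")
}
```

For example, `normalizeNumbers("I paid two hundred dollars")` yields `"I paid 200 dollars"`. A production ITN system also has to handle ordinals, dates, currencies, and ambiguous runs such as "one two", which this sketch simply adds together.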
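The Speaker Embedding Extraction highlight above mentions voice comparison and speaker identification. A common way to compare two embeddings is cosine similarity; the sketch below matches a query embedding against a set of enrolled reference embeddings. The helper names, the toy three-dimensional vectors in the usage note, and the `0.7` threshold are assumptions for illustration; real embeddings come from the SDK's embedding model, and a useful threshold has to be tuned on real recordings.

```swift
// Hypothetical sketch: speaker identification by cosine similarity.
// Embeddings would come from a speaker-embedding model; the threshold
// used here is an illustrative assumption.

func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "embeddings must have the same length")
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    // A zero vector has no direction, so treat it as dissimilar.
    guard normA > 0, normB > 0 else { return 0 }
    return dot / (normA.squareRoot() * normB.squareRoot())
}

/// Returns the enrolled speaker whose embedding is most similar to `query`,
/// or nil if no similarity reaches `threshold`.
func identifySpeaker(
    _ query: [Float],
    enrolled: [String: [Float]],
    threshold: Float = 0.7
) -> String? {
    var best: (name: String, score: Float)? = nil
    for (name, reference) in enrolled {
        let score = cosineSimilarity(query, reference)
        if score >= threshold, score > (best?.score ?? -Float.infinity) {
            best = (name: name, score: score)
        }
    }
    return best?.name
}
```

With `enrolled = ["alice": [1, 0, 0], "bob": [0, 1, 0]]`, the query `[0.9, 0.1, 0]` resolves to `"alice"`, while `[0, 0, 1]` returns `nil` because it is orthogonal to both references.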
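The streaming diarization pipeline named in the highlights has to attribute each new segment to a speaker as audio arrives. One widely used approach is online clustering: compare the segment's embedding with every known speaker's running mean and either join the closest speaker or open a new one. The sketch below illustrates that general technique under assumed parameters; `OnlineSpeakerClusterer` is a hypothetical type, not FluidAudio's streaming pipeline.

```swift
// Hypothetical sketch of online (streaming) speaker clustering.
// Each incoming segment embedding joins the most similar existing speaker
// if the similarity clears a threshold; otherwise it starts a new speaker.
// Assumes every embedding has the same dimension.

struct OnlineSpeakerClusterer {
    var threshold: Float
    private(set) var centroids: [[Float]] = []  // running mean per speaker
    private var counts: [Int] = []              // segments seen per speaker

    init(threshold: Float = 0.7) {
        self.threshold = threshold
    }

    private static func cosine(_ a: [Float], _ b: [Float]) -> Float {
        var dot: Float = 0, na: Float = 0, nb: Float = 0
        for i in a.indices {
            dot += a[i] * b[i]
            na += a[i] * a[i]
            nb += b[i] * b[i]
        }
        return (na > 0 && nb > 0) ? dot / (na.squareRoot() * nb.squareRoot()) : 0
    }

    /// Returns the speaker index assigned to `embedding`.
    mutating func assign(_ embedding: [Float]) -> Int {
        var bestIndex: Int? = nil
        var bestScore = threshold
        for (index, centroid) in centroids.enumerated() {
            let score = Self.cosine(embedding, centroid)
            if score >= bestScore {
                bestIndex = index
                bestScore = score
            }
        }
        guard let index = bestIndex else {
            // Nothing similar enough: register a new speaker.
            centroids.append(embedding)
            counts.append(1)
            return centroids.count - 1
        }
        // Update the running mean so the speaker profile adapts over time.
        let n = Float(counts[index])
        for d in embedding.indices {
            centroids[index][d] = (centroids[index][d] * n + embedding[d]) / (n + 1)
        }
        counts[index] += 1
        return index
    }
}
```

Feeding `[1, 0]`, `[0, 1]`, and `[0.9, 0.1]` in turn assigns speakers 0, 1, and 0. The threshold trades off splitting one voice into several speakers (set too high) against merging different voices (set too low).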
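The VAD highlight above uses Silero models, which output a speech probability for each short chunk of audio. Turning those probabilities into speech segments is commonly done with a threshold plus a minimum-silence rule, so that brief pauses do not split an utterance. The sketch below shows only that post-processing step; `speechSegments`, its parameters, and their defaults are hypothetical and are not FluidAudio's settings.

```swift
// Hypothetical post-processing sketch: converts per-chunk speech
// probabilities (as produced by a Silero-style VAD) into time segments.
// The threshold and minimum-silence values are illustrative assumptions.

struct SpeechSegment: Equatable {
    var start: Double  // seconds
    var end: Double    // seconds
}

func speechSegments(
    probabilities: [Float],
    chunkDuration: Double,      // seconds of audio per probability
    threshold: Float = 0.5,
    minSilenceChunks: Int = 3   // consecutive silent chunks that close a segment
) -> [SpeechSegment] {
    var segments: [SpeechSegment] = []
    var segmentStart: Int? = nil  // chunk index where the open segment began
    var lastSpeech = 0            // last chunk index at or above threshold
    var silenceRun = 0

    for (index, p) in probabilities.enumerated() {
        if p >= threshold {
            if segmentStart == nil { segmentStart = index }
            lastSpeech = index
            silenceRun = 0
        } else if let start = segmentStart {
            silenceRun += 1
            if silenceRun >= minSilenceChunks {
                // End the segment at the last speech chunk, not the silence.
                segments.append(SpeechSegment(
                    start: Double(start) * chunkDuration,
                    end: Double(lastSpeech + 1) * chunkDuration))
                segmentStart = nil
                silenceRun = 0
            }
        }
    }
    // Close a segment that is still open when the audio ends.
    if let start = segmentStart {
        segments.append(SpeechSegment(
            start: Double(start) * chunkDuration,
            end: Double(lastSpeech + 1) * chunkDuration))
    }
    return segments
}
```

With 0.5-second chunks and the defaults, the probabilities `[0.1, 0.9, 0.8, 0.2, 0.9, 0.1, 0.1, 0.1, 0.7]` produce two segments: 0.5 to 2.5 s (the single low chunk at index 3 does not end the segment) and 4.0 to 4.5 s.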
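The highlights describe running inference on the ANE while avoiding the GPU. In plain Core ML, that preference is expressed through `MLModelConfiguration.computeUnits`. The snippet below is a generic Core ML illustration, not FluidAudio's internal loading code, and the compiled-model URL is a placeholder you would supply. Core ML still schedules work per operation, so anything the Neural Engine cannot run falls back to the CPU.

```swift
// The guard keeps this file compiling on platforms without Core ML.
#if canImport(CoreML)
import CoreML
import Foundation

// Generic Core ML sketch (not FluidAudio's internal code): ask Core ML to
// schedule a model on the CPU and Apple Neural Engine only, skipping the GPU.
@available(macOS 13.0, iOS 16.0, *)
func makeANEConfiguration() -> MLModelConfiguration {
    let configuration = MLModelConfiguration()
    configuration.computeUnits = .cpuAndNeuralEngine
    return configuration
}

// Loads a compiled model (.mlmodelc) from a caller-supplied URL.
@available(macOS 13.0, iOS 16.0, *)
func loadModelOnANE(at compiledModelURL: URL) throws -> MLModel {
    try MLModel(contentsOf: compiledModelURL, configuration: makeANEConfiguration())
}
#endif
```

Using `.cpuAndNeuralEngine` rather than `.all` is what keeps the GPU out of the picture; `.all` would let Core ML choose the GPU when it judges that faster.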