
evilsocket / cake

Distributed inference for mobile, desktop and server.

View on GitHub
2,970 stars
183 forks
6 issues
Rust · HTML · Shell

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing evilsocket/cake in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.

Source files are only loaded when you start an analysis to optimize performance.

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/evilsocket/cake)

Repository Overview (README excerpt)


Cake is a **multimodal AI inference server** written in Rust that can run models as a single node, or shard them across a heterogeneous cluster of devices (iOS, Android, macOS, Linux, Windows) to run workloads that wouldn't fit on a single GPU, effectively leveraging planned obsolescence to make AI more accessible and democratic. This is experimental code that is under active development and changes quickly.

Key Features

- **Multi Modal**: Text generation, image generation (Stable Diffusion, FLUX), and voice synthesis (VibeVoice TTS with voice cloning).
- **Multi Model**: 15 text model families, 6 image model variants, and 2 TTS models. Architecture is auto-detected from HuggingFace checkpoints.
- **Multi Platform**: CUDA, Metal, Vulkan, and CPU backends across Linux, macOS, Windows, iOS, and Android.
- **Multi Node**: Shard transformer blocks across devices with zero-config mDNS clustering or a manual topology. Also runs entirely on a single machine.
- **OpenAI-Compatible API**: REST API with streaming, plus a built-in web UI and TUI chat client.
- **Docker**: Container builds for Linux/NVIDIA with docker-compose cluster support.

Quick Start

Build

Models

Download models from HuggingFace with . Models are stored in the standard HuggingFace cache directory ( ) and are shared with any other tools that use the same cache (transformers, huggingface-cli, etc.). Models are also downloaded automatically on first use if not already cached.

Single Node

Run any model locally on a single machine; the architecture is auto-detected from the model's :

Distributed

Shard a model across multiple machines using . Workers don't need the model data: the master automatically streams the required tensor weights over the network (compressed with zstd, verified with CRC32 checksums), and workers cache received data locally for subsequent runs. The master discovers workers via mDNS, assigns layers proportionally to each device's VRAM/compute, and pushes only the required weight shards.
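The weight-streaming scheme described above (checksum the raw weights, compress them for the wire, verify on the worker before caching) can be sketched as follows. This is an illustrative model of the protocol, not Cake's actual Rust implementation, and it uses zlib as a stand-in for zstd, which is not in the Python standard library:

```python
import zlib


def pack_shard(tensor_bytes: bytes) -> tuple[int, bytes]:
    """Master side: checksum the raw weights, then compress for the wire."""
    crc = zlib.crc32(tensor_bytes)
    return crc, zlib.compress(tensor_bytes)


def unpack_shard(crc: int, payload: bytes) -> bytes:
    """Worker side: decompress and verify before caching the shard locally."""
    raw = zlib.decompress(payload)
    if zlib.crc32(raw) != crc:
        raise ValueError("shard checksum mismatch, refusing to cache")
    return raw


# Round-trip a fake weight shard.
shard = bytes(range(256)) * 64
crc, wire = pack_shard(shard)
assert unpack_shard(crc, wire) == shard
```

Because the checksum is taken over the uncompressed bytes, a corrupted transfer is caught after decompression, before the worker writes anything to its cache.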
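The proportional layer assignment mentioned above could look like the sketch below, using largest-remainder rounding so the per-device counts sum exactly to the layer count. The function name and the VRAM-only heuristic are assumptions for illustration; Cake's scheduler may also weigh compute:

```python
def assign_layers(num_layers: int, vram_gb: list[float]) -> list[int]:
    """Split num_layers across devices proportionally to their VRAM,
    using largest-remainder rounding so the counts sum exactly."""
    total = sum(vram_gb)
    exact = [num_layers * v / total for v in vram_gb]
    counts = [int(e) for e in exact]
    # Hand the leftover layers to the largest fractional remainders.
    leftovers = num_layers - sum(counts)
    order = sorted(range(len(exact)),
                   key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftovers]:
        counts[i] += 1
    return counts


# A 32-layer model over one 24 GB GPU and two 8 GB devices.
print(assign_layers(32, [24.0, 8.0, 8.0]))  # → [19, 7, 6]
```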
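Since the server exposes an OpenAI-compatible REST API, any OpenAI-style client should work against it. A minimal stdlib sketch that builds (but does not send) a chat completion request; the port, base URL, and model name here are placeholders, so check the project documentation for the real defaults:

```python
import json
import urllib.request


def chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request (not sent here)."""
    body = json.dumps({
        "model": model,  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode()
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = chat_request("http://localhost:8080", "llama-3", "Hello!")
print(req.full_url)  # http://localhost:8080/v1/chat/completions
# urllib.request.urlopen(req) would send it to a running server.
```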
See the clustering documentation for manual topology files and advanced configuration. For the full usage guide and API reference, check the project documentation.

Star History

License

Released under the FAIR License (Free for Attribution and Individual Rights) v1.0.0.

- **Non-commercial use** (personal, educational, research, non-profit) is freely permitted under the terms of the license.
- **Commercial use** (SaaS, paid apps, any monetization) requires visible attribution to the project and its author. See the license for details.
- **Business use** (any use by or on behalf of a business entity) requires a signed commercial agreement with the author. Contact for inquiries.

To see the licenses of the project dependencies, install cargo license with and then run .