
llamastack / llama-stack

Composable building blocks to build LLM Apps

8,298 stars
1,284 forks
182 issues
Python · TypeScript · Mustache

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing llamastack/llama-stack in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.
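The "full files on demand" idea can be sketched in a few lines. This is a hypothetical illustration only, not RepoMind's actual engine; the class and method names are invented. The point is that each file enters the context whole and lazily, so the model sees complete functions and classes rather than the fragments a chunk-based RAG retriever would return.

```python
from pathlib import Path


class FileContextLoader:
    """Loads whole source files into an LLM context on demand (illustrative sketch)."""

    def __init__(self, repo_root):
        self.repo_root = Path(repo_root)
        self._cache = {}  # relative path -> full file text, filled lazily

    def load(self, relative_path):
        # Read (and cache) the complete file only when it is first requested.
        if relative_path not in self._cache:
            self._cache[relative_path] = (self.repo_root / relative_path).read_text()
        return self._cache[relative_path]

    def build_context(self, paths):
        # Concatenate full, intact files into one prompt-ready context block.
        parts = [f"### {p}\n{self.load(p)}" for p in paths]
        return "\n\n".join(parts)
```

Nothing is read at construction time; the cache fills only when an analysis actually asks for a file, which is the behavior the note below describes.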

To optimize performance, source files are loaded only when you start an analysis.

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/llamastack/llama-stack)

Repository Overview (README excerpt)


Llama Stack

**Quick Start** | **Documentation** | **Colab Notebook** | **Discord**

πŸš€ One-Line Installer πŸš€

To try Llama Stack locally, run:

Overview

Llama Stack defines and standardizes the core building blocks that simplify AI application development. It provides a unified set of APIs with implementations from leading service providers. More specifically, it provides:

- **Unified API layer** for Inference, RAG, Agents, Tools, Safety, and Evals.
- **Plugin architecture** to support the rich ecosystem of API implementations across environments, including local development, on-premises, cloud, and mobile.
- **Prepackaged verified distributions** which offer a one-stop solution for developers to get started quickly and reliably in any environment.
- **Multiple developer interfaces** like the CLI and SDKs for Python, TypeScript, iOS, and Android.
- **Standalone applications** as examples of how to build production-grade AI applications with Llama Stack.

Llama Stack Benefits

- **Flexibility**: Developers can choose their preferred infrastructure without changing APIs and enjoy flexible deployment choices.
- **Consistent experience**: With its unified APIs, Llama Stack makes it easier to build, test, and deploy AI applications with consistent behavior.
- **Robust ecosystem**: Llama Stack is integrated with distribution partners (cloud providers, hardware vendors, and AI-focused companies) that offer tailored infrastructure, software, and services for deploying Llama models.

For more information, see the Benefits of Llama Stack documentation.

API Providers

Here is a list of the various API providers and available distributions that can help developers get started easily with Llama Stack.
Please check out the documentation for the full list.

| API Provider | Environments | Agents | Inference | VectorIO | Safety | Post Training | Eval | DatasetIO |
|:------------------:|:------------------:|:------:|:---------:|:--------:|:------:|:-------------:|:----:|:---------:|
| Builtin | Single Node | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… |
| SambaNova | Hosted | | βœ… | | βœ… | | | |
| Cerebras | Hosted | | βœ… | | | | | |
| Fireworks | Hosted | βœ… | βœ… | βœ… | | | | |
| AWS Bedrock | Hosted | | βœ… | | βœ… | | | |
| Together | Hosted | βœ… | βœ… | | βœ… | | | |
| Groq | Hosted | | βœ… | | | | | |
| Ollama | Single Node | | βœ… | | | | | |
| TGI | Hosted/Single Node | | βœ… | | | | | |
| NVIDIA NIM | Hosted/Single Node | | βœ… | | βœ… | | | |
| ChromaDB | Hosted/Single Node | | | βœ… | | | | |
| Milvus | Hosted/Single Node | | | βœ… | | | | |
| Qdrant | Hosted/Single Node | | | βœ… | | | | |
| Weaviate | Hosted/Single Node | | | βœ… | | | | |
| SQLite-vec | Single Node | | | βœ… | | | | |
| PG Vector | Single Node | | | βœ… | | | | |
| PyTorch ExecuTorch | On-device iOS | βœ… | βœ… | | | | | |
| vLLM | Single Node | | βœ… | | | | | |
| OpenAI | Hosted | | βœ… | | | | | |
| Anthropic | Hosted | | βœ… | | | | | |
| Gemini | Hosted | | βœ… | | | | | |
| WatsonX | Hosted | | βœ… | | | | | |
| HuggingFace | Single Node | | | | | βœ… | | βœ… |
| TorchTune | Single Node | | | | | βœ… | | |
| NVIDIA NEMO | Hosted | | βœ… | βœ… | | βœ… | βœ… | βœ… |
| NVIDIA | Hosted | | | | | βœ… | βœ… | βœ… |

> **Note**: Additional providers are available through external packages. See the External Providers documentation.

Distributions

A Llama Stack Distribution (or "distro") is a pre-configured bundle of provider implementations for each API component. Distributions make it easy to get started with a specific deployment scenario. For example, you can begin with a local Ollama setup and seamlessly transition to production with Fireworks, without changing your application code.
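The "transition to production without changing your application code" claim boils down to config-driven endpoint selection: the application only ever talks to a Llama Stack server, and which provider sits behind that server is a deployment decision. A minimal sketch of the pattern, with the caveat that the endpoint path, the port 8321 default, and the request field names here are assumptions for illustration, not verified against the Llama Stack API:

```python
import os


def stack_base_url() -> str:
    """Resolve the Llama Stack endpoint from the environment.

    During development, the default points at a local server (e.g. one
    backed by Ollama); in production, set LLAMA_STACK_URL to a hosted
    distribution instead. The application code never names a provider.
    """
    return os.environ.get("LLAMA_STACK_URL", "http://localhost:8321")


def chat_request(model: str, prompt: str) -> dict:
    # Build a provider-agnostic chat request; the server routes it to
    # whichever inference provider the distribution has configured.
    # The path and field names below are illustrative assumptions.
    return {
        "url": f"{stack_base_url()}/v1/inference/chat-completion",
        "json": {
            "model_id": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }
```

Swapping Ollama for a hosted provider then means changing one environment variable (and the distro behind it), not the request-building code.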
Here are some of the distributions we support:

| **Distribution** | **Llama Stack Docker** | **Start This Distribution** |
|:------------------------:|:-------------------------------------:|:---------------------------:|
| Starter Distribution | llamastack/distribution-starter | Guide |
| Starter Distribution GPU | llamastack/distribution-starter-cpu | Guide |
| Builtin | llamastack/distribution-builtin-gpu | Guide |
| PostgreSQL | llamastack/distribution-postgres-demo | N/A |
| Dell | llamastack/distribution-dell | Guide |

For full documentation on Llama Stack distributions, see the Distributions Overview page.

Documentation

Please check out our Documentation page for more details.

- CLI references
  - llama (server-side) CLI Reference: guide to using the CLI to work with Llama models (download, study prompts) and to build and start a Llama Stack distribution.
  - llama (client-side) CLI Reference: guide to using the CLI to query information about the distribution.
- Getting Started
  - Quick guide to starting a Llama Stack server.
  - Jupyter notebook walking through simple text and vision inference with the llama_stack_client APIs.
  - The complete Llama Stack lesson Colab notebook from the new Llama 3.2 course on Deeplearning.ai.
  - A Zero-to-Hero Guide that guides you through all the key components of Llama Stack, with code samples.
- Contributing
  - Adding a new API Provider: a walkthrough of how to add a new API provider.
  - Release Process: information about release schedules and versioning.

Llama Stack Client SDKs

Check out our client SDKs for connecting to a Llama Stack server in your preferred language.
| **Language** | **Client SDK** | **Package** |
|:----------:|:-----------------------------:|:-----------:|
| Python | llama-stack-client-python | |
| Swift | llama-stack-client-swift | |
| TypeScript | llama-stack-client-typescript | |
| Kotlin | llama-stack-client-kotlin | |

> **Note**: We are considering a transition from Stainless to OpenAPI Generator for SDK generation (#4609). The directory contains the new tooling for local SDK generation.

You can find more example scripts with client SDKs to talk with the Llama Stack serv…
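For a rough feel of the Python SDK's shape, a hedged sketch follows. The package and client class names match the llama-stack-client package, but the exact method names, parameters, and response fields below are assumptions to verify against the SDK reference; a running server is required, so the import is deferred inside the function.

```python
def ask_stack(question: str, base_url: str = "http://localhost:8321") -> str:
    """Send one chat turn to a running Llama Stack server (illustrative sketch).

    Requires `pip install llama-stack-client` and a reachable server.
    """
    from llama_stack_client import LlamaStackClient  # deferred: optional dependency

    client = LlamaStackClient(base_url=base_url)
    # Method and field names below are assumptions; check the SDK docs.
    response = client.inference.chat_completion(
        model_id="meta-llama/Llama-3.2-3B-Instruct",
        messages=[{"role": "user", "content": question}],
    )
    return response.completion_message.content
```

The model identifier shown is only a placeholder; any model registered with the distribution you are pointing at would take its place.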