lablup / backend.ai

Backend.AI is a streamlined, container-based computing cluster platform that hosts popular computing/ML frameworks and diverse programming languages, with pluggable heterogeneous accelerator support including CUDA GPU, ROCm GPU, Gaudi NPU, Google TPU, Graphcore IPU and other NPUs.

View on GitHub
628 stars
169 forks
1,439 issues

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing lablup/backend.ai in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.

Source files are only loaded when you start an analysis to optimize performance.

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/lablup/backend.ai)

Repository Overview (README excerpt)

Backend.AI
==========

Backend.AI is a streamlined, container-based computing cluster platform that hosts popular computing/ML frameworks and diverse programming languages, with pluggable heterogeneous accelerator support including CUDA GPU, ROCm GPU, Rebellions, FuriosaAI, HyperAccel, Google TPU, Graphcore IPU and other NPUs. It allocates and isolates the underlying computing resources for multi-tenant computation sessions on-demand or in batches, with customizable job schedulers driven by its own orchestrator named "Sokovan". All its functions are exposed as REST and GraphQL APIs.

Requirements
------------

Python & Build Tools
• **Python**: 3.13.x (the main branch requires CPython 3.13.7)
• **Pantsbuild**: 2.27.x
• See the full version compatibility table

Infrastructure

**Required**:
• Docker 20.10+ (with Compose v2)
• PostgreSQL 16+ (tested with 16.3)
• Redis 7.2+ (tested with 7.2.11)
• etcd 3.5+ (tested with 3.5.14)
• Prometheus 3.x (tested with 3.1.0)

**Recommended** (for observability):
• Grafana 11.x (tested with 11.4.0)
• Loki 3.x (tested with 3.5.0)
• Tempo 2.x (tested with 2.7.2)
• OpenTelemetry Collector

→ Detailed infrastructure setup: Infrastructure Documentation

System
• **OS**: Linux (Debian/RHEL-based) or macOS
• **Permissions**: sudo access for installation
• **Resources**: 4+ CPU cores, 8GB+ RAM recommended for development

Getting Started
---------------

Quick Start (Development)

• Clone and Install

This script will:
• Check required dependencies (Docker, Python, etc.)
• Set up a Python virtual environment with Pantsbuild
• Start the halfstack infrastructure (PostgreSQL, Redis, etcd, Grafana, etc.)
• Initialize database schemas
• Create default API keypairs and user accounts

• Start Backend.AI Services

Start each component in separate terminals:
**Manager** (Terminal 1)
**Agent** (Terminal 2)
**Storage Proxy** (Terminal 3)
**Web Server** (Terminal 4)
**App Proxy** (Terminals 5-6, optional for in-container service access)

• Run Your First Session

Set up the client environment, then run a simple Python session, or access the Web UI at **http://localhost:8090** with credentials from the generated files.

Accessing Compute Sessions (aka Kernels)

Backend.AI provides websocket tunneling into individual computation sessions (containers), so that users can use their browsers and the client CLI to access in-container applications directly and securely.

• **Jupyter**: data scientists' favorite tool. Most container images have intrinsic Jupyter and JupyterLab support.
• **Web-based terminal**: all container sessions have intrinsic ttyd support.
• **SSH**: all container sessions have intrinsic SSH/SFTP/SCP support with an auto-generated per-user SSH keypair. PyCharm and other IDEs can use on-demand sessions via SSH remote interpreters.
• **VSCode**: most container sessions have intrinsic web-based VSCode support.

Working with Storage

Backend.AI provides an abstraction layer on top of existing network-based storages (e.g., NFS/SMB), called vfolders (virtual folders). Each vfolder works like cloud storage that can be mounted into any computation session and shared between users and user groups with differentiated privileges.

Installation for Multi-node Tests & Production

Please consult our documentation for community-supported materials. Contact the sales team (contact@lablup.com) for professional paid support and deployment options.
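The quick-start flow described above lost its code blocks in extraction; a rough sketch follows. The setup-script path, the `./backend.ai` wrapper subcommands, the client environment variable names, and the endpoint/port are assumptions for illustration, not verbatim from this excerpt:

```shell
# Hypothetical single-node development setup (script path assumed):
git clone https://github.com/lablup/backend.ai.git
cd backend.ai
./scripts/install-dev.sh

# Start each service in its own terminal (wrapper subcommands assumed):
./backend.ai mgr start-server       # Terminal 1: Manager
./backend.ai ag start-server        # Terminal 2: Agent
./backend.ai storage start-server   # Terminal 3: Storage Proxy
./backend.ai web start-server       # Terminal 4: Web Server

# Client environment (variable names assumed from Backend.AI client convention;
# keys come from the keypair files generated during install):
export BACKEND_ENDPOINT=http://127.0.0.1:8081
export BACKEND_ACCESS_KEY=...
export BACKEND_SECRET_KEY=...

# Run a simple Python session:
backend.ai run python -c 'print("hello world")'
```

Consult the repository's own setup script and documentation for the exact commands; the shape above only mirrors the steps the excerpt enumerates.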
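The vfolder workflow described under "Working with Storage" might look like the following in the client CLI. The subcommand and option names here are assumptions sketched for illustration, not confirmed by this excerpt:

```shell
# Create a virtual folder, upload data into it, and mount it into a session
# (subcommand/option names are hypothetical):
backend.ai vfolder create mydata
backend.ai vfolder upload mydata ./dataset.csv
backend.ai run --mount mydata python -c 'import os; print(os.listdir("."))'
```

Because vfolders sit on top of the underlying NFS/SMB storage, the same folder can then be mounted into other users' sessions according to the sharing privileges it was granted.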
Architecture
------------

For comprehensive system architecture, component interactions, and infrastructure details, see the **Component Architecture Documentation**. This document covers:
• System architecture diagrams and component flow
• Port numbers and infrastructure setup
• Component dependencies and communication protocols
• Development and production environment configuration

Contents in This Repository
---------------------------

This repository contains all open-source server-side components and the client SDK for Python as a reference implementation of API clients.

Directory Structure
• : Source codes
• : Manager as the cluster control-plane
• : Manager API handlers
• : Unified user profile and SSO management
• : Agent as per-node controller
• : Agent's Docker backend
• : Agent's Kubernetes backend
• : Agent's dummy backend
• : Agent's kernel runner counterpart
• : Agent's in-kernel prebuilt binaries
• : Agent's in-kernel helper package
• : Shared utilities
• : Client SDK
• : Unified CLI for all components
• : SCIE-based TUI installer
• : Storage proxy for offloading storage operations
• : Storage proxy's manager-facing and client-facing APIs
• : App proxy for accessing container apps from outside
• : App proxy coordinator which provisions routing circuits
• : App proxy worker which forwards the traffic
• : Web UI server
• : Backend.AI WebUI release artifacts
• : Logging subsystem
• : Plugin subsystem
• : Integration test suite
• : Shared utilities used by unit tests
• : Legacy meta package
• : Intrinsic accelerator plugins
• : Unified documentation
• , , ...: Per-component unit tests
• , , ...: Per-component sample configurations
• : Dockerfiles for auxiliary containers
• , ...: Per-component fixtures for development setup and tests
• : A directory to place plugins such as accelerators, monitors, etc.
• : Scripts to assist development workflows
• : The single-node development setup script from the working copy
• : Type annotation stub packages written by us
• : A directory to host Pants-related tooling
• : A directory to put build artifacts (.whl files) and Pants-exported virtualenvs
• : News fragments for towncrier
• : The Pants configuration
• : Tooling configuration (towncrier, pytest, mypy)
• : The root build config file
• : Per-directory build config files
• : An indicator to mark the build root directory for Pants
• : The steering guide for agent-assisted development
• : The unified requirements file
• , : The dependency lock files
• : Per-version recommended halfstack container configs
• : This file
• : The migration guide for updating between major release…